Have you ever copy and pasted code for datasets ranging from 2005 to 2019? You know what I’m talking about. I’m talking about this:

DO NOT RUN THIS!

proc sql;
    create table    want_10
    select
                    enrolid,
                    dx1,
                    dx2
    from
                    raw.cc_ot_svc_10;
quit;

proc sql;
    create table    want_11
    select
                    enrolid,
                    dx1,
                    dx2
    from
                    raw.cc_ot_svc_11;
quit;


proc sql;
    create table    want_12
    select
                    enrolid,
                    dx1,
                    dx2
    from
                    raw.cc_ot_svc_12;
quit;

Some of you might be thinking:

If it ain’t broke, don’t fix it!

But we know this can get tedious, and you might end up obsessively checking each block of code for accuracy. It’s not necessarily wrong!

Alternatively, some of you may even be into macros, and your macros may look like this:

DO NOT RUN THIS!

%macro get_truven_ot (year);
proc sql;
    create table    want_&year.
    select
                    enrolid,
                    dx1,
                    dx2
    from
                    raw.cc_ot_svc_&year.;
quit;
%mend get_truven_ot;

%get_truven_ot(10);
%get_truven_ot(11);
%get_truven_ot(12);

/* You could even sandwich in a %DO% statement to minimize copy-pasting the %get_truven_ip call each time. You could even drop the year parameter */

%macro get_truven_ot;
%do year = 10 %to 12;
proc sql;
    create table    want_&year.
    select
                    enrolid,
                    dx1,
                    dx2
    from
                    raw.cc_ot_svc_&year.;
%end;
%mend get_truven_ot;

And you know, there’s nothing wrong with those either. To iterate:

If it ain’t broke, don’t fix it!

I firmly disagree with ‘program’ or ‘code shaming’ – All our techniques are unique and should be appreciated no matter how efficient they are!

I have code shamed before. :(

Nonetheless, let’s mimic the Truven data to a degree with the following code.

If you want to know what all is going on and the comments don’t help, please feel free to contact me. Otherwise, this just creates dummy datasets kind of like they would be on the server.

RUN THIS!

%macro fake_claims;							/*	Begin macro that will create 10 data sets with differing numbers of DX's.			*/

%do yr = 10 %to 20;							/* 	Setting yr macro variable to iterate from 10 (2010) to 20 (2020). 					*/

data 		work.cc_ot_svc_&yr (drop = i); 	/* 	Below creates the actual data sets.													*/
	array 	dx [2] 3. dx1-dx2;
	length 	year $4. totaldx 3.;
	
    do  	pt = 1 to 5000;					/* 	Create a record for each patient													*/
	do		i = 1 to dim(dx);				/* 	Do the following for each diagnosis in the array.									*/
			dx[i] = rand("bernoulli", .4);	/*	Generate a random number between 0 and 1. Just fake placeholders.					*/
			totaldx = sum(of dx:);			/* 	Totaling the number of dx's for a given patient.									*/
			year = catt("20", &yr.);
	end; 									/*	End the <do i = 1 to dim(dx)> do statement											*/
	output;									/* 	Necessary to output information for each patient.									*/
	end;									/*	End the <do pt = 1 to 5000> do statement.											*/
    
run;										/* 	Run the data step.																	*/
%end;										/* 	End the %DO processing above.														*/
%mend fake_claims;							/*	End the macro.																		*/
/* Execute the macro */
%fake_claims;

With all that output, you may be thinking it’s time to copy and paste a bunch of code/create a macro to get down to your final cohort.

Alt Text

Fire up the old reliable way, right?

DO NOT RUN THIS!

%macro subset (dsn);
proc sql;
	create table		copy_&dsn. as
		select
				pt,
				totaldx,
				year
		from
				&dsn.
		where
				totaldx = 2;
quit;
%mend extract;

%subset(cc_ot_svc_10);
%subset(cc_ot_svc_11);
%subset(cc_ot_svc_12);
%subset(cc_ot_svc_13);
%subset(cc_ot_svc_14);
%subset(cc_ot_svc_15);
%subset(cc_ot_svc_16);
%subset(cc_ot_svc_17);
%subset(cc_ot_svc_18);
%subset(cc_ot_svc_19);
%subset(cc_ot_svc_20);

Alt Text

Alright, now we have a bunch of datasets in our WORK library in SAS. Now what if I told you that you don’t need to go through and count the years, copy and paste the file names, etc.?

Using SAS’s dictionary tables and views is extremely helpful for data-driven programming. You’ve already seen them when you run a PROC CONTENTS, but you may just examine it and move on.

I find that the documentation on it is somewhat lacking, but lots of people know how to use it.

Let’s begin. Go ahead and run this.:

RUN THIS!

proc sql noprint;
    select
                memname                             /*      #1      */
                    into: ot_ds separated by " "    /*      #2      */
    from
                dictionary.tables                   /*      #3      */
    where
                lowcase(libname) = "work" and
                lowcase(memname) contains ("cc_ot_svc");          /*      #4    */
quit;

There’s a lot going on up there. I wanted to break it down a little longer than the code would let me:

  1. memname is basically the dataset name in the work library.
  2. The into: statement creates a macro variable called ot_ds and its values (the dataset names) are separated by spaces (“ “).
    • This is essentially going to take every dataset from our work library and store them into a single macro variable, with each dataset being separated by a space.
  3. We are taking this information from the tables view in the dictionary library. Again, information on datasets is stored here, and we are able to extract some interesting things from it. Much more than I can go into with a single post.
  4. You could also specify libname = "WORK" here since SAS converts everything to uppercase on the backend. In other words, if you use all upper-case, you won’t have an issue here. I try to type in all lower-case, so I just convert it at the beginning.

We can check our new macro variable, &ot_ds. by using the %PUT statement in our program. Note: %PUT is used with macro variables, while PUT can be used in DATA steps.

RUN THIS!

%put &ot_ds.;

You will see this in the log:

CC_OT_SVC_10 CC_OT_SVC_11 CC_OT_SVC_12 CC_OT_SVC_13 CC_OT_SVC_14 CC_OT_SVC_15 CC_OT_SVC_16 CC_OT_SVC_17 CC_OT_SVC_18 CC_OT_SVC_19 CC_OT_SVC_20

Okay, so we have extracted every single dataset name that follows the “cc_ot_svc” pattern and stored it into a single macro variable. What’s next then?

We’re now going to parse that macro variable and feed its contents to whatever procedure/extraction/subset that we need to do.

Here we go:

RUN THIS!

%macro subset;

%do i = 1 %to %sysfunc(countw(&ot_ds., " "));   /*  #1  */

%let dsn = %scan(&ot_ds., &i., " ");    /*  #2   */

proc sql;
	create table		copy_&dsn. as
		select
				pt,
				totaldx,
				year
		from
				&dsn.
		where
				totaldx = 2;    /*  #3  */
quit;

%end;   /*  #4  */

%mend subset;

%subset; /*  #5  */

Looks a little weird, right?

What is %SYSFUNC, COUNTW, and %SCAN? There’s a lot going on in here. But let’s try breaking it down together:

  1. The first line in the %macro subset call involves calling a %DO loop.
    • %DO loops and DO loops are not the same. %DO loops involve macro processing, while DO loops can be used in DATA steps.
    • This %DO statement is followed by the letter i, which is essentially a macro variable that serves as a numeric placeholder for our position within the %DO loop.
    • This %DO statement is telling SAS to execute the underlying code from 1 to however many datasets/values are in the &ot_ds macro variable.
      • To figure out how many datasets are in the &ot_ds macro variable, we need to use the COUNTW function. This function counts words and requires you to specify a delimiter. Since we used a space earlier, we specify “ “ after the &ot_ds variable.
      • Remember, however, we are operating within a macro. Regular functions are not available to us in a macro environment. We must use %SYSFUNC to access and execute the COUNTW function.
  2. This %LET statement is temporarily defining the dataset name within the %DO loop.
    • Notice that we have the %SCAN function which is the macro equivalent of the SCAN function. You may be able to wrap SCAN with a %SYSFUNC, but the %SCAN macro function is readily available.
      • Specifically, we are scanning the &ot_ds macro variable at the i‘th iteration based on our delimiter. Essentially, if it’s the first time through the %DO loop (&i = 1), %SCAN will return the first value in &ot_ds before the space delimiter ” “, which is “CC_OT_SVC_10”. When the %DO loop returns to the top, it will increment i to 2 and extract the second dataset name, “CC_OT_SVC_11”.
  3. This is our regular old query. We want to get the patient id, summed number of diagnoses, and the year of the data file. We want those individuals who have at least two diagnoses.
    • Notice that our FROM clause is pulling from &dsn.. This is temporarily defined to be whatever dataset we’re examining in the %DO loop.
  4. %END represents the end of the %DO loop. If the loop has exceeded the number of words defined in the macro variable &ot_ds, the loop will stop. If it has not exceeded the number of words defined in the macro variable &ot_ds, then it will return to the top, increment the number by 1, and run the query for ther next dataset.
  5. Execute the macro. You don’t have any parameters to define or anything. It’s all housed in the macro.

I find this method to be particularly powerful and less prone to errors. While it may take a bit more up-front work, it usually pays off and is less tedious. Once you try it a couple times, you may find yourself doing it all the time. I usually try to set up everything this way.

If you have any questions, concerns, issues, or need something explained in more detail, please email me at michaelqmaguire2@ufl.edu or stop by my office at HPNP 2330!