This post is largely for myself, but I hope it can help others.

I can’t tell you how many times attempting to count records in SAS throws me for a loop.

Counting comes down to identifiying whether a record is similar or different from the previous record. Counting is preferential and depends on your exact motive in how you want to classify your records.

The dataset below is fabricated. Please excuse any absurdities – I entered random drug names and quantities.

Copy the code below and paste it into your program. Again, you can access SAS here or through UFApps.

data have;
infile datalines delimiter = ",";
input enrolid date :mmddyy10. gennme :$18. qty_dspnd;
format date mmddyy10.;
datalines;
1,2/4/2019,Gabapentin,30
1,2/5/2019,Meloxicam,30
1,2/5/2019,Hydrocodone,20
1,2/5/2019,Hydrocodone,20
1,2/5/2019,Oxycodone,20
1,2/5/2019,Morphine,20
1,2/6/2019,Oxycodone,20
1,4/5/2020,Eszopiclone,15
1,4/6/2020,Eszopiclone,15
2,3/31/2020,Codeine,15
2,4/11/2021,Morphine,5
2,4/12/2021,Morphine,5
2,4/13/2021,Oxymorphone,10
2,4/13/2021,Alprazolam,5
3,1/2/2020,Prozac,30
3,3/4/2021,Fluoxetine,30
4,2/2/2019,Dextroamphetamine ,20
5,3/4/2017,Amiloride,30
5,3/5/2019,Amiloride,30
5,4/5/2021,Amiloride,30
5,4/6/2021,Triamterene,30
5,4/30/2021,Triamterene,30
5,9/10/2021,Amiloride,25
6,11/4/2018,Morphine,25
6,11/5/2018,Morphine,5
6,11/12/2019,Morphine,5
7,9/9/2018,Prednisone,25
7,9/8/2019,Prednisolone,30
7,4/15/2020,Prednisone,30
8,5/31/2021,Ibuprofen,10
8,6/20/2021,Morphine,15
8,6/25/2021,Hydrocodone,20
8,8/3/2021,Fluoxetine,30
;

There are a couple steps to counting in SAS. These steps will be illustrated below.

  1. DO: Sort your dataset by the variables you want to count.
  2. DECIDE: How do you want to count the variables? Do you want to repeat the count for the variable(s) of interest? Or do you want to increment it by that variable?
  3. DO: Create a new dataset with a DATA step.
    • DO: Use a BY statement count records accordingly.
    • DO: Use the RETAIN statement to hold the values in the PDV across observations.
    • Use IF-THEN/ELSE-IF/ELSE and FIRST./LAST. processing to increment records accordingly.

Step 1: Let’s say we want to count by the unique identifier, ENROLID. Sort it by that.

proc sort
	data = have;
		by enrolid;
run;

Step 2: I want to increment a counter by the ENROLID. In other words, if the same ENROLID shows up multiple times, I want each record incremented by 1.

Step 3: Create a new dataset.

data count_by_patient_record;
	set have;
	by enrolid; /* Variable I want to count by */
	retain count; /* Retain count variable value in PDV */
    
        /* If it's the first ENROLID, set count = 1. */
		if first.enrolid then count = 1;
			/* If it isn't the first ENROLID, increment by 1. */
            else if not first.enrolid then count + 1; 

run;

Step 3 gives us this:

enrolid	date	   gennme	            qty_dspnd	count

1       02/04/2019 Gabapentin           30          1 
1       02/05/2019 Meloxicam            30          2 
1       02/05/2019 Hydrocodone          20          3 
1       02/05/2019 Hydrocodone          20          4 
1       02/05/2019 Oxycodone            20          5 
1       02/05/2019 Morphine             20          6 
1       02/06/2019 Oxycodone            20          7 
1       04/05/2020 Eszopiclone          15          8 
1       04/06/2020 Eszopiclone          15          9 
2       03/31/2020 Codeine              15          1 
2       04/11/2021 Morphine             5           2 
2       04/12/2021 Morphine             5           3 
2       04/13/2021 Oxymorphone          10          4 
2       04/13/2021 Alprazolam           5           5 
3       01/02/2020 Prozac               30          1 
3       03/04/2021 Fluoxetine           30          2 
4       02/02/2019 Dextroamphetamine    20          1 
5       03/04/2017 Amiloride            30          1 
5       03/05/2019 Amiloride            30          2 
5       04/05/2021 Amiloride            30          3 
5       04/06/2021 Triamterene          30          4 
5       04/30/2021 Triamterene          30          5 
5       09/10/2021 Amiloride            25          6 
6       11/04/2018 Morphine             25          1 
6       11/05/2018 Morphine             5           2 
6       11/12/2019 Morphine             5           3 
7       09/09/2018 Prednisone           25          1 
7       09/08/2019 Prednisolone         30          2 
7       04/15/2020 Prednisone           30          3 
8       05/31/2021 Ibuprofen            10          1 
8       06/20/2021 Morphine             15          2 
8       06/25/2021 Hydrocodone          20          3 
8       08/03/2021 Fluoxetine           30          4 

Notice how each ENROLID’s first record has 1. If it’s the same ENROLID, it is incremented by 1.

Great, let’s add a little more complexity. What if I want to count by ENROLID and DATE?

Go back to the flow above.

Step 1: Sort by the variables that you want to count by.

proc sort
	data = have;
		by enrolid date;
run;

Step 2: How do we want to count? Let’s say that we want to increment by 1 if it’s a new date for a given ENROLID, but we want records with the same date for a given ENROLID to have the same count value.

Step 3: Create the new dataset and use the appropriate statements.

data count_by_patient_and_date;
	set have;
    /* These are the variables I want to count by. */
	by enrolid date;
    /* Hold the values in the PDV across observations. */ 
	retain count; 

        /* If it's the first ENROLID, populate PDV with count value of 1. */
		if first.enrolid then count = 1; 
            /* By ENROLID and DATE, if it's the first date, increment by 1. */
			else if first.date then count + 1;

run;

Any time you use the BY statement, those variables listed in the BY statement initiate temporary FIRST and LAST variables in the PDV that do not get written to the dataset. One way to see this is with a PUT statement in your DATA step.

data count_by_patient_and_date;
	set have;
	by enrolid date; /* These are the variables I want to count by. */
	retain count; /* Hold the values in the PDV across observations. */

        /* If it's the first ENROLID, populate PDV with count value of 1. */
		if first.enrolid then count = 1; 
            /* By ENROLID and DATE, if it's the first date, increment by 1. */
			else if first.date then count + 1;

    put _all_; /* Output PDV to log. */

run;

It isn’t pretty, but you can track down the records in the log and examine what is going on.

Step 3 gives us this:

enrolid	date	    gennme	            qty_dspnd	count
1       02/04/2019  Gabapentin          30          1 
1       02/05/2019  Hydrocodone         20          2 
1       02/05/2019  Hydrocodone         20          2 
1       02/05/2019  Meloxicam           30          2 
1       02/05/2019  Morphine            20          2 
1       02/05/2019  Oxycodone           20          2 
1       02/06/2019  Oxycodone           20          3 
1       04/05/2020  Eszopiclone         15          4 
1       04/06/2020  Eszopiclone         15          5 
2       03/31/2020  Codeine             15          1 
2       04/11/2021  Morphine            5           2 
2       04/12/2021  Morphine            5           3 
2       04/13/2021  Alprazolam          5           4 
2       04/13/2021  Oxymorphone         10          4 
3       01/02/2020  Prozac              30          1 
3       03/04/2021  Fluoxetine          30          2 
4       02/02/2019  Dextroamphetamine   20          1 
5       03/04/2017  Amiloride           30          1 
5       03/05/2019  Amiloride           30          2 
5       04/05/2021  Amiloride           30          3 
5       04/06/2021  Triamterene         30          4 
5       04/30/2021  Triamterene         30          5 
5       09/10/2021  Amiloride           25          6 
6       11/04/2018  Morphine            25          1 
6       11/05/2018  Morphine            5           2 
6       11/12/2019  Morphine            5           3 
7       09/09/2018  Prednisone          25          1 
7       09/08/2019  Prednisolone        30          2 
7       04/15/2020  Prednisone          30          3 
8       05/31/2021  Ibuprofen           10          1 
8       06/20/2021  Morphine            15          2 
8       06/25/2021  Hydrocodone         20          3 
8       08/03/2021  Fluoxetine          30          4 

Notice that the COUNT variable is set to 1 if it is the first ENROLID. When the date changes, however, the COUNT variable is incremented by 1. If it’s the same DATE for a given ENROLID, the RETAIN statement holds this value in the PDV until it encounters either (1) a new ENROLID or (2) a new DATE.

Let’s do one more iteration.

I want to do the same thing as above, but if a different drug is prescribed on the same day, I want to increment the count by 1.

Let’s go back to our all-too-familiar steps.

Step 1: Sort your dataset by the variables you want to count.

proc sort
    data = have;
        by enrolid date gennme;
run;

Step 2: How do we want to count? Well, I already said how! I want to increment records only if a different drug is prescribed on the same day for a given ENROLID on a given DATE.

Step 3: Create our dataset.

data count_by_patient_date_drug;
	set have;
    /* Counting by these variables. Populates PDV with FIRST. LAST. for each. */
	by enrolid date gennme; 
    /* Hold the values constant across PDV for the BY variables. */
	retain count;

        /* If it's the first DATE for a given ENROLID, set the COUNT variable to 1. */
		if first.date then count = 1;
            /* If it's the first time a drug appears on a given date, increment COUNT by 1. */
			else if first.gennme then count + 1;

run;

Here is our output:

enrolid date        gennme              qty_dspnd   count 
1       02/04/2019  Gabapentin          30          1 
1       02/05/2019  Hydrocodone         20          1 
1       02/05/2019  Hydrocodone         20          1 
1       02/05/2019  Meloxicam           30          2 
1       02/05/2019  Morphine            20          3 
1       02/05/2019  Oxycodone           20          4 
1       02/06/2019  Oxycodone           20          1 
1       04/05/2020  Eszopiclone         15          1 
1       04/06/2020  Eszopiclone         15          1 
2       03/31/2020  Codeine             15          1 
2       04/11/2021  Morphine            5           1 
2       04/12/2021  Morphine            5           1 
2       04/13/2021  Alprazolam          5           1 
2       04/13/2021  Oxymorphone         10          2 
3       01/02/2020  Prozac              30          1 
3       03/04/2021  Fluoxetine          30          1 
4       02/02/2019  Dextroamphetamine   20          1 
5       03/04/2017  Amiloride           30          1 
5       03/05/2019  Amiloride           30          1 
5       04/05/2021  Amiloride           30          1 
5       04/06/2021  Triamterene         30          1 
5       04/30/2021  Triamterene         30          1 
5       09/10/2021  Amiloride           25          1 
6       11/04/2018  Morphine            25          1 
6       11/05/2018  Morphine            5           1 
6       11/12/2019  Morphine            5           1 
7       09/09/2018  Prednisone          25          1 
7       09/08/2019  Prednisolone        30          1 
7       04/15/2020  Prednisone          30          1 
8       05/31/2021  Ibuprofen           10          1 
8       06/20/2021  Morphine            15          1 
8       06/25/2021  Hydrocodone         20          1 
8       08/03/2021  Fluoxetine          30          1 

Again, remember that when we invoke the BY statement, SAS automatically creates temporary FIRST and LAST variables in the PDV for each variable listed.

Our ELSE IF statement does not explicitly account for the ENROLID variable, but if the DATE changes OR the GENNME changes, it will increment by 1.

I know this may not capture your exact situation, but playing around with the PUT statement and changing your IF-THEN and ELSE-IF statements will get you to where you need to be. I tried to show a couple different variants to show how you can implement something based on your needs.

Of course, you can always stop by my office, shoot me an email, or message me on Teams. Feel free to let me know if you notice an issue or problem with the code above, or if you think there’s a better way to conduct counting.