This post is largely for myself, but I hope it can help others.
I can’t tell you how many times attempting to count records in SAS throws me for a loop.
Counting comes down to identifiying whether a record is similar or different from the previous record. Counting is preferential and depends on your exact motive in how you want to classify your records.
The dataset below is fabricated. Please excuse any absurdities – I entered random drug names and quantities.
Copy the code below and paste it into your program. Again, you can access SAS here or through UFApps.
data have;
infile datalines delimiter = ",";
input enrolid date :mmddyy10. gennme :$18. qty_dspnd;
format date mmddyy10.;
datalines;
1,2/4/2019,Gabapentin,30
1,2/5/2019,Meloxicam,30
1,2/5/2019,Hydrocodone,20
1,2/5/2019,Hydrocodone,20
1,2/5/2019,Oxycodone,20
1,2/5/2019,Morphine,20
1,2/6/2019,Oxycodone,20
1,4/5/2020,Eszopiclone,15
1,4/6/2020,Eszopiclone,15
2,3/31/2020,Codeine,15
2,4/11/2021,Morphine,5
2,4/12/2021,Morphine,5
2,4/13/2021,Oxymorphone,10
2,4/13/2021,Alprazolam,5
3,1/2/2020,Prozac,30
3,3/4/2021,Fluoxetine,30
4,2/2/2019,Dextroamphetamine ,20
5,3/4/2017,Amiloride,30
5,3/5/2019,Amiloride,30
5,4/5/2021,Amiloride,30
5,4/6/2021,Triamterene,30
5,4/30/2021,Triamterene,30
5,9/10/2021,Amiloride,25
6,11/4/2018,Morphine,25
6,11/5/2018,Morphine,5
6,11/12/2019,Morphine,5
7,9/9/2018,Prednisone,25
7,9/8/2019,Prednisolone,30
7,4/15/2020,Prednisone,30
8,5/31/2021,Ibuprofen,10
8,6/20/2021,Morphine,15
8,6/25/2021,Hydrocodone,20
8,8/3/2021,Fluoxetine,30
;
There are a couple steps to counting in SAS. These steps will be illustrated below.
- DO: Sort your dataset by the variables you want to count.
- DECIDE: How do you want to count the variables? Do you want to repeat the count for the variable(s) of interest? Or do you want to increment it by that variable?
- DO: Create a new dataset with a
DATA
step.- DO: Use a
BY
statement count records accordingly. - DO: Use the
RETAIN
statement to hold the values in the PDV across observations. - Use
IF-THEN/ELSE-IF/ELSE
andFIRST./LAST.
processing to increment records accordingly.
- DO: Use a
Step 1: Let’s say we want to count by the unique identifier, ENROLID. Sort it by that.
proc sort
data = have;
by enrolid;
run;
Step 2: I want to increment a counter by the ENROLID. In other words, if the same ENROLID shows up multiple times, I want each record incremented by 1.
Step 3: Create a new dataset.
data count_by_patient_record;
set have;
by enrolid; /* Variable I want to count by */
retain count; /* Retain count variable value in PDV */
/* If it's the first ENROLID, set count = 1. */
if first.enrolid then count = 1;
/* If it isn't the first ENROLID, increment by 1. */
else if not first.enrolid then count + 1;
run;
Step 3 gives us this:
enrolid date gennme qty_dspnd count
1 02/04/2019 Gabapentin 30 1
1 02/05/2019 Meloxicam 30 2
1 02/05/2019 Hydrocodone 20 3
1 02/05/2019 Hydrocodone 20 4
1 02/05/2019 Oxycodone 20 5
1 02/05/2019 Morphine 20 6
1 02/06/2019 Oxycodone 20 7
1 04/05/2020 Eszopiclone 15 8
1 04/06/2020 Eszopiclone 15 9
2 03/31/2020 Codeine 15 1
2 04/11/2021 Morphine 5 2
2 04/12/2021 Morphine 5 3
2 04/13/2021 Oxymorphone 10 4
2 04/13/2021 Alprazolam 5 5
3 01/02/2020 Prozac 30 1
3 03/04/2021 Fluoxetine 30 2
4 02/02/2019 Dextroamphetamine 20 1
5 03/04/2017 Amiloride 30 1
5 03/05/2019 Amiloride 30 2
5 04/05/2021 Amiloride 30 3
5 04/06/2021 Triamterene 30 4
5 04/30/2021 Triamterene 30 5
5 09/10/2021 Amiloride 25 6
6 11/04/2018 Morphine 25 1
6 11/05/2018 Morphine 5 2
6 11/12/2019 Morphine 5 3
7 09/09/2018 Prednisone 25 1
7 09/08/2019 Prednisolone 30 2
7 04/15/2020 Prednisone 30 3
8 05/31/2021 Ibuprofen 10 1
8 06/20/2021 Morphine 15 2
8 06/25/2021 Hydrocodone 20 3
8 08/03/2021 Fluoxetine 30 4
Notice how each ENROLID’s first record has 1. If it’s the same ENROLID, it is incremented by 1.
Great, let’s add a little more complexity. What if I want to count by ENROLID and DATE?
Go back to the flow above.
Step 1: Sort by the variables that you want to count by.
proc sort
data = have;
by enrolid date;
run;
Step 2: How do we want to count? Let’s say that we want to increment by 1 if it’s a new date for a given ENROLID, but we want records with the same date for a given ENROLID to have the same count value.
Step 3: Create the new dataset and use the appropriate statements.
data count_by_patient_and_date;
set have;
/* These are the variables I want to count by. */
by enrolid date;
/* Hold the values in the PDV across observations. */
retain count;
/* If it's the first ENROLID, populate PDV with count value of 1. */
if first.enrolid then count = 1;
/* By ENROLID and DATE, if it's the first date, increment by 1. */
else if first.date then count + 1;
run;
Any time you use the BY
statement, those variables listed in the BY
statement initiate temporary FIRST
and LAST
variables in the PDV that do not get written to the dataset. One way to see this is with a PUT
statement in your DATA
step.
data count_by_patient_and_date;
set have;
by enrolid date; /* These are the variables I want to count by. */
retain count; /* Hold the values in the PDV across observations. */
/* If it's the first ENROLID, populate PDV with count value of 1. */
if first.enrolid then count = 1;
/* By ENROLID and DATE, if it's the first date, increment by 1. */
else if first.date then count + 1;
put _all_; /* Output PDV to log. */
run;
It isn’t pretty, but you can track down the records in the log and examine what is going on.
Step 3 gives us this:
enrolid date gennme qty_dspnd count
1 02/04/2019 Gabapentin 30 1
1 02/05/2019 Hydrocodone 20 2
1 02/05/2019 Hydrocodone 20 2
1 02/05/2019 Meloxicam 30 2
1 02/05/2019 Morphine 20 2
1 02/05/2019 Oxycodone 20 2
1 02/06/2019 Oxycodone 20 3
1 04/05/2020 Eszopiclone 15 4
1 04/06/2020 Eszopiclone 15 5
2 03/31/2020 Codeine 15 1
2 04/11/2021 Morphine 5 2
2 04/12/2021 Morphine 5 3
2 04/13/2021 Alprazolam 5 4
2 04/13/2021 Oxymorphone 10 4
3 01/02/2020 Prozac 30 1
3 03/04/2021 Fluoxetine 30 2
4 02/02/2019 Dextroamphetamine 20 1
5 03/04/2017 Amiloride 30 1
5 03/05/2019 Amiloride 30 2
5 04/05/2021 Amiloride 30 3
5 04/06/2021 Triamterene 30 4
5 04/30/2021 Triamterene 30 5
5 09/10/2021 Amiloride 25 6
6 11/04/2018 Morphine 25 1
6 11/05/2018 Morphine 5 2
6 11/12/2019 Morphine 5 3
7 09/09/2018 Prednisone 25 1
7 09/08/2019 Prednisolone 30 2
7 04/15/2020 Prednisone 30 3
8 05/31/2021 Ibuprofen 10 1
8 06/20/2021 Morphine 15 2
8 06/25/2021 Hydrocodone 20 3
8 08/03/2021 Fluoxetine 30 4
Notice that the COUNT variable is set to 1 if it is the first ENROLID. When the date changes, however, the COUNT variable is incremented by 1. If it’s the same DATE for a given ENROLID, the RETAIN
statement holds this value in the PDV until it encounters either (1) a new ENROLID or (2) a new DATE.
Let’s do one more iteration.
I want to do the same thing as above, but if a different drug is prescribed on the same day, I want to increment the count by 1.
Let’s go back to our all-too-familiar steps.
Step 1: Sort your dataset by the variables you want to count.
proc sort
data = have;
by enrolid date gennme;
run;
Step 2: How do we want to count? Well, I already said how! I want to increment records only if a different drug is prescribed on the same day for a given ENROLID on a given DATE.
Step 3: Create our dataset.
data count_by_patient_date_drug;
set have;
/* Counting by these variables. Populates PDV with FIRST. LAST. for each. */
by enrolid date gennme;
/* Hold the values constant across PDV for the BY variables. */
retain count;
/* If it's the first DATE for a given ENROLID, set the COUNT variable to 1. */
if first.date then count = 1;
/* If it's the first time a drug appears on a given date, increment COUNT by 1. */
else if first.gennme then count + 1;
run;
Here is our output:
enrolid date gennme qty_dspnd count
1 02/04/2019 Gabapentin 30 1
1 02/05/2019 Hydrocodone 20 1
1 02/05/2019 Hydrocodone 20 1
1 02/05/2019 Meloxicam 30 2
1 02/05/2019 Morphine 20 3
1 02/05/2019 Oxycodone 20 4
1 02/06/2019 Oxycodone 20 1
1 04/05/2020 Eszopiclone 15 1
1 04/06/2020 Eszopiclone 15 1
2 03/31/2020 Codeine 15 1
2 04/11/2021 Morphine 5 1
2 04/12/2021 Morphine 5 1
2 04/13/2021 Alprazolam 5 1
2 04/13/2021 Oxymorphone 10 2
3 01/02/2020 Prozac 30 1
3 03/04/2021 Fluoxetine 30 1
4 02/02/2019 Dextroamphetamine 20 1
5 03/04/2017 Amiloride 30 1
5 03/05/2019 Amiloride 30 1
5 04/05/2021 Amiloride 30 1
5 04/06/2021 Triamterene 30 1
5 04/30/2021 Triamterene 30 1
5 09/10/2021 Amiloride 25 1
6 11/04/2018 Morphine 25 1
6 11/05/2018 Morphine 5 1
6 11/12/2019 Morphine 5 1
7 09/09/2018 Prednisone 25 1
7 09/08/2019 Prednisolone 30 1
7 04/15/2020 Prednisone 30 1
8 05/31/2021 Ibuprofen 10 1
8 06/20/2021 Morphine 15 1
8 06/25/2021 Hydrocodone 20 1
8 08/03/2021 Fluoxetine 30 1
Again, remember that when we invoke the BY
statement, SAS automatically creates temporary FIRST
and LAST
variables in the PDV for each variable listed.
Our ELSE IF
statement does not explicitly account for the ENROLID variable, but if the DATE changes OR the GENNME changes, it will increment by 1.
I know this may not capture your exact situation, but playing around with the PUT
statement and changing your IF-THEN
and ELSE-IF
statements will get you to where you need to be. I tried to show a couple different variants to show how you can implement something based on your needs.
Of course, you can always stop by my office, shoot me an email, or message me on Teams. Feel free to let me know if you notice an issue or problem with the code above, or if you think there’s a better way to conduct counting.