Overview
A post came across the forums recently where someone needed their results transposed.
The hitch - there were multiple records for their key or identifying variables.
They wanted it down to one line for each.
The problem
Here was the dataset:
data have;
input ID_PRSN :$3. ID_CPS :$3. CD_ALLEGATION CD_FINDING;
datalines;
001 001 1 9
001 001 3 1
001 002 1 1
002 003 3 2
002 003 4 1
002 003 5 9
002 004 1 1
002 004 2 1
002 005 1 1
003 006 4 9
;
run;
A simple PROC TRANSPOSE
, right?
proc transpose
data = have out = want;
by id_prsn id_cps;
var cd_allegation cd_finding;
run;
My Solution
You can use PROC TRANSPOSE
twice to solve the issue above (plus a DATA
step to make a counter variable), but why not get it done in two PROCS
?
PROC SUMMARY
is a little gem for this, and it’s kind of difficult to find.
It’s such a gem that it helped identify a coding issue in a SAS published book.
Step 1: Determine how many fields you need
First, I needed to create a counter
flag - this flag would represent the number of records under a given id_prsn
and id_cps
.
Whatever the largest value was for counter
was going to be the number of variables outputted.
proc sql noprint;
select
max(counter)
into: counter trimmed
from
( select
count(*) as counter
from
have
group by
id_prsn, id_cps
);
quit;
This may look a little weird, but let me point you in the right direction for interpretation.
- This query does not start at the
SELECT
statement - it starts at theFROM
clause. - Look at the
FROM
clause - it involves a subquery. This subquery is returning a column of the highest values byid_prsn
andid_cps
. - Now, go back to the opening
SELECT
statement. We are selecting the maximum value ofcounter
from that subquery.- Additionally, we are selecting that maximum value and placing it into a macro variable called
counter
.Note: The
trimmed
option is needed because this process returns a string with whitespace.
- Additionally, we are selecting that maximum value and placing it into a macro variable called
What is the value?
%put &counter.;
This means we have up to three variables for a given individual. We will need three variables each for CD_ALLEGATION
and CD_FINDING
to hold all values for all individuals.
Step 2: Apply your macro variable with PROC SUMMARY
Now for some fun, arcane syntax.
If you read the documentation, you will see that its similar to the MEANS
procedure. In fact, I just tested whether this code ran in PROC MEANS
and, go figure, it worked!
Nonetheless:
proc summary data = have nway; /* #1 */
class id_prsn id_cps; /* #2 */
output out = want (drop = _:) /* #3 */
idgroup(out[&counter.](cd_allegation cd_finding) = algtn find); /* #4 */
run;
NWAY
is telling SAS to give me all combinations for eachid_prsn
andid_cps
. Don’t give me justid_prsn
orid_cps
- we want it all.- What are we classifying or grouping the variables by? Self-evident:
id_prsn
andid_cps
. - Whenever you want to output a dataset with
MEANS
orSUMMARY
, you specifyOUTPUT OUT = <your-new-dataset>
.- Here, I’m creating a dataset named
want
and I am usingDROP
to prevent automatic variables (_TYPE_
,_FREQ_
) from being attached to the dataset.
- Here, I’m creating a dataset named
- Now the fun:
IDGROUP
allows you to make output variables with the same nameOUT
is referring to our output dataset,want
.[&counter.]
is our macro variable from thePROC SQL
step.- This is telling
PROC SUMMARY
that our output dataset should have three variables.
- This is telling
- But what variables? Remember
CD_ALLEGATION
andCD_FINDING
? Those are what we wanted to be transposed. So we follow the macro variable in brackets with the variables we want three of in our output dataset,CD_ALLEGATION
andCD_FINDING
.Note the parentheses surrounding the variables.
- Finally, we separate our existing variables and our new variables with an
=
sign.- On the left we have our existing variables, but on the right we have our new variables,
algtn
andfind
. They will be appended with the sequence they are processed (e.g.,algtn_1
,algtn_2
, etc.)
- On the left we have our existing variables, but on the right we have our new variables,