Overview
While we often generate binary variables from categorical variables, you might run into a situation where you want to do the opposite.
In other words, you may want to generate a single categorical variable from mutiple binary variables.
I sometimes do this to ensure my fields are being flagged correctly.
Creating a categorical variable
Assume that we have the following dataset:
data have;
infile datalines delimiter = ',' dsd;
input rx_hdc rx_warf rx_hep rx_gabp rx_qtp rx_nap 3.;
datalines;
1,0,0,0,0
0,1,0,0,0
0,0,1,0,0
0,0,0,1,0
0,1,0,0,0
0,1,0,0,0
0,0,0,0,1
0,1,0,0,0
0,0,0,0,1
;
run;

First, let me address the major assumption in this dataset – that is, the fields are mutually exclusive.
We could go through and do something like this:
data want;
set have;
if rx_hdc = 1 then prescription = 'hdc';
else if rx_hep = 1 then prescription = 'hep';
else if rx_warf = 1 then prescription = 'warf';
...
...
run;
This will certainly get what you need. However, if you have a lot of binary variables, this may get a bit tedious.
A clear solution is through an ARRAY. But how can we extract the variable names dynamically without typing them?
Note: I assume your binary variables have a common prefix. If they do not, you can specify them with a double dash (e.g.,
ARRAY _rx [*] rx_hdc -- rx_nap). This also assumes your fields are in a particular order.
data want_single;
set have;
array _rx [*] rx:;
do i = 1 to dim(_rx);
if _rx[i] = 1 then prescription = strip(tranwrd(vname(_rx[i]), 'rx_', ''));
end;
run;

I’ve covered arrays in a previous post. If that post wasn’t helpful or didn’t explain enough, I encourage you to examine the documentation.
You may wonder what is happening inside the DO loop. If the value of the variable within the ARRAY is equal to 1, then we do the following (from inside to outside) for the variable prescription:
- We are extracting the variable name (
VNAME) for that instance where our variable within theARRAYis equal to 1.- This avoids having to manually set the
prescriptionvalue.
- This avoids having to manually set the
- Next, we use the
TRANWRDfunction to replace the prefixrx_with a blank''. - Finally, we remove any leading or trailing whitespace with the
STRIPfunction.- Found this out the hard way. This is relevant for my next example.
Formatting your categorical variable
But what if we want a more descriptive format? We can use PROC FORMAT before we run the DATA step above, and then use PUT to format it accordingly.
proc format;
value $ prescrip
'hdc' = 'Hydrocodone'
'hep' = 'Heparin'
'warf' = 'Warfarin'
'gabp' = 'Gabapentin'
'qtp' = 'Quetiapine'
'nap' = 'Naproxen';
run;
data want_single_fmt;
set have;
array _rx [*] rx:;
do i = 1 to dim(_rx);
if _rx[i] = 1 then prescription = put(strip(tranwrd(vname(_rx[i]), 'rx_', '')), $prescrip.);
end;
run;

If you do not use the STRIP function, the variable will not format due to leading whitespace.
Multiple categorical values
Okay, but what if we have multiple binary variables that are equal to 1?
Fair point, and this one is a little more involved. The structure of this DATA step is similar to the one I made in my last post.
Here’s our dataset with multiple binary variables flagged.
data have_multiple;
infile datalines delimiter = ',' dsd;
input rx_hdc rx_warf rx_hep rx_gabp rx_qtp rx_nap 3.;
datalines;
1,0,0,0,1,1
0,1,0,0,0,1
0,0,1,0,0,0
0,0,0,1,0,0
0,1,0,1,0,0
0,1,0,0,0,1
0,0,0,0,1,0
0,1,0,1,0,1
0,1,0,0,1,1
;
run;

And this is the corresponding code.
data want_multiple;
set have_multiple;
length combined $100.;
array _rx [*] rx:;
do i = 1 to dim(_rx);
if _rx[i] = 1 then do;
prescription = put(strip(tranwrd(vname(_rx[i]), 'rx_', '')), $prescrip.);
combined = catx(', ', combined, prescription);
end;
end;
run;

Two differences:
- We’re creating a new variable called
combinedand setting its length to 100. You need to determine the correct length beforehand. This will hold all of our variables. - Instead of just doing a usual
DOloop, we need to explicitly tell SAS to do the following when_rx[i]is equal to 1:- Create the prescription value like we did earlier.
- Concatenate
combined(blank at the beginning) andprescriptioneach time_rx[i]is equal to 1.
Code is available for download here.