Let’s say you have data in long format:
The following dataset is in the
sashelpdirectory, and its name isiris. It is a common example dataset across languages. No matter which edition of SAS you are using, you should have access to this dataset.
You want to consolidate all SepalLength values for a given species into a single field.
This transformation is similar to long-to-wide transformations, except we are combining values into a single field instead of creating separate variables for each value.
If you want to create multiple variables from multiple values, I would recommend
PROC TRANSPOSEor aDATAstep with anARRAYstatement.
One method is using a DO-UNTIL loop.
data want;
do until (last.species); /* #1 */
set sashelp.iris; /* #2 */
by species; /* #3 */
length all_lengths $300.; /* #4 */
retain all_lengths; /* #5 */
if first.species then all_lengths = sepallength; /* #6 */
else all_lengths = catx(', ', all_lengths, sepallength); /* #7 */
if last.species then output; /* #8 */
end; /* #9 */
run;
This dataset is already sorted by
species. If your data are not sorted, you will need to sort it viaPROC SORTor aPROC SQLquery.
- The
DO UNTILstatement is going to process until a condition is satisfied.- The condition that we’re using here is
last.species. We’re telling SAS to continue to process untillast.speciesis equal to 1.
- The condition that we’re using here is
- We’re reading the
sashelp.irisdata into the PDV. - We initiate
BY-GROUPprocessing. This is required since we’re using theDO UNTILstatement.- This tells SAS to create automatic variables
first.speciesandlast.speciesand set each respective record (first and last) to 1. If it isn’t the first or last record, SAS marks the respective field as 0.
- This tells SAS to create automatic variables
- Tell SAS how long you want the new variable,
all_lengthsto be. You can always modify this later, so I set it to a long length to ensure we capture everything. RETAINwill hold the value ofall_lengthsacross observations for a givenspecies.- If
first.species = 1(implied withfirst.species), setall_lengthstosepallength. - Otherwise, if
first.species = 0,all_lengthsshould combine what already exists inall_lengthswith the current record’s value ofsepallength.- We’re combining with the
CATXfunction, which adds a delimiter (in this case,,) between each value.
- We’re combining with the
- If
last.species = 1(again, implied), output the record to thewantdataset. - The
ENDfor theDO UNTILstatement returns to the top and reads the next group of species into the PDV.
This results in the following: