Let’s say you have data in long format:
The following dataset is in the
sashelp
directory, and its name isiris
. It is a common example dataset across languages. No matter which edition of SAS you are using, you should have access to this dataset.
You want to consolidate all SepalLength
values for a given species
into a single field.
This transformation is similar to long-to-wide transformations, except we are combining values into a single field instead of creating separate variables for each value.
If you want to create multiple variables from multiple values, I would recommend
PROC TRANSPOSE
or aDATA
step with anARRAY
statement.
One method is using a DO-UNTIL
loop.
data want;
do until (last.species); /* #1 */
set sashelp.iris; /* #2 */
by species; /* #3 */
length all_lengths $300.; /* #4 */
retain all_lengths; /* #5 */
if first.species then all_lengths = sepallength; /* #6 */
else all_lengths = catx(', ', all_lengths, sepallength); /* #7 */
if last.species then output; /* #8 */
end; /* #9 */
run;
This dataset is already sorted by
species
. If your data are not sorted, you will need to sort it viaPROC SORT
or aPROC SQL
query.
- The
DO UNTIL
statement is going to process until a condition is satisfied.- The condition that we’re using here is
last.species
. We’re telling SAS to continue to process untillast.species
is equal to 1.
- The condition that we’re using here is
- We’re reading the
sashelp.iris
data into the PDV. - We initiate
BY-GROUP
processing. This is required since we’re using theDO UNTIL
statement.- This tells SAS to create automatic variables
first.species
andlast.species
and set each respective record (first and last) to 1. If it isn’t the first or last record, SAS marks the respective field as 0.
- This tells SAS to create automatic variables
- Tell SAS how long you want the new variable,
all_lengths
to be. You can always modify this later, so I set it to a long length to ensure we capture everything. RETAIN
will hold the value ofall_lengths
across observations for a givenspecies
.- If
first.species = 1
(implied withfirst.species
), setall_lengths
tosepallength
. - Otherwise, if
first.species = 0
,all_lengths
should combine what already exists inall_lengths
with the current record’s value ofsepallength
.- We’re combining with the
CATX
function, which adds a delimiter (in this case,,
) between each value.
- We’re combining with the
- If
last.species = 1
(again, implied), output the record to thewant
dataset. - The
END
for theDO UNTIL
statement returns to the top and reads the next group of species into the PDV.
This results in the following: