Overview

I’ll never forget feeling so confused at my first job using SAS.

Why were my variables being overwritten? Why were certain variables truncated after a MERGE?

I’d look at my computer like:

Alt Text

The history

My job relied heavily on a procedure called PROC COMPARE - if my results didn’t match my supervisors, then we had to determine who was right and who was wrong.

I would sometimes have fewer characters on a variable. Sometimes I would have a completely different value for a variable.

With the MERGE statement, the PDV and all its attributes is set based on the left most dataset. That means merging variables with the same names may produce unexpected results.

With PROC SQL you might get a warning saying that a column exists on more than one table.

The non-programmatic way

The obvious way to handle this situation was to run a PROC CONTENTS - this would provide us with an HTML listing of what was in each dataset. But that could sometimes be a bit cumbersome.

proc contents data = sashelp.class;
run;

proc contents data = sashelp.heart;
run;

Alt Text Alt Text

We could even go into our directory, right click on the dataset and navigate to the columns tab to see names, lengths, etc.

Alt Text

Nonetheless, it wasn’t terribly efficient, and I sometimes felt more confused than I initially did.

The programmatic way

I recently sat down (yesterday, actually) and developed a MACRO to help cut down on these issues.

The MACRO is called shared_cols. %shared_cols() has two arguments:

  1. tables: what datasets are you comparing?
    • These datasets be two level names (e.g., sashelp.class)
    • You must separate your datasets with a pipe (e.g., %shared_cols(sashelp.class|sashelp.heart))
  2. lib_dsn_out: create an output dataset to examine.
    • Again, you must provide a two-level name (e.g., sashelp.class_and_heart);

It has one limitation that I’m aware of at the moment:

  1. You’re comparing two and only two datasets.
    • I will work on developing an extension of this in the future.

There are some preliminary error checks in the program to prevent misspecifying the parameters.

This macro was tested on both SAS 9.4 (local) and SAS Studio.

The output

Here’s what sample output would look like if you requested a comparison.

Alt Text

Under the hood

In essence, this MACRO uses dictionary.columns to identify shared columns through PROC SQL. dictionary.columns contains attributes about a variable in a given dataset. I use an INNER JOIN (we only care about shared columns) to grab shared columns and their given attributes.

Here’s the bare macro. It is also available for viewing on my GitHub

%macro shared_cols (tables, lib_dsn_out);

/* Check for pipe separation in `tables` */
%if %sysfunc(find(&tables., |)) = 0 %then %do;
	%put ERROR: No pipe specified to separate datasets. Please place a pipe between your datasets in the `tables` argument.;
	%return;
%end;

/* Check to see if files exists */
%if %sysfunc(exist(%scan(&tables., 1, |))) = 0 & %sysfunc(exist(%scan(&tables., -1, |))) = 0 %then %do;
	%put ERROR: Neither file exists. Please ensure that you have used libnames and datasets in the `tables` argument.;
	%return;
%end;
	/* Check to see if first dataset exists */
	%else %if %sysfunc(exist(%scan(&tables., 1, |))) = 0 %then %do;
		%put ERROR: The file %scan(&tables., 1, |) does not exist. Please ensure that you have specified a LIBNAME along with the dataset name (e.g., work.dataset).;
		%return;
	%end;
	/* Check to see if second dataset exists */
	%else %if %sysfunc(exist(%scan(&tables., -1, |))) = 0 %then %do;
		%put ERROR: The file %scan(&tables., -1, |) does not exist. Please ensure that you have specified a LIBNAME along with the dataset name (e.g., work.dataset).;
		%return;
	%end;

/* Initiate local macro variables that exist within scope of macro call */
%local ln1 mn1 ln2 mn2 sc;

/* Extract libname of first dataset */
%let ln1 = %upcase(%scan(%scan(&tables., 1, |), 1, .));
/* Extract dataset name of first dataset */
%let mn1 = %upcase(%scan(%scan(&tables., 1, |), -1, .));
/* Extract libname of second dataset */
%let ln2 = %upcase(%scan(%scan(&tables., -1, |), 1, .));
/* Extract dataset name of second dataset */
%let mn2 = %upcase(%scan(%scan(&tables., -1, |), -1, .));

/* Check to see if there are shared columns */
proc sql noprint;
	select
				count(t1.name) 
					into: sc trimmed
	from
				dictionary.columns as t1
					inner join
				dictionary.columns as t2
						on	t1.name = t2.name
		where
					t1.libname = "&ln1." 	and
					t1.memname = "&mn1."	and
					t2.libname = "&ln2."	and
					t2.memname = "&mn2.";
quit;

/* If no variables match, send warning to log and exit the macro */
%if &sc. = 0 %then %do;
	%put WARNING: %upcase(%scan(&tables., 1, |)) and %upcase(%scan(&tables., -1, |)) did not share any columns.;
	%return;
%end;

/* Extract shared variable and each attribute of that column in both datasets. */
proc sql;
	/* If there are shared columns and a dataset is requested, create the dataset in the desired library */
%if &sc. ~=  0 & %length(&lib_dsn_out.) > 0 %then %do;
	create table 	
					&lib_dsn_out. as
		select
					t1.name as t1_name label = "Name in: %scan(&tables., 1, |)",
					t1.type as t1_type label = "Type in: %scan(&tables., 1, |)",
					t2.type as t2_type label = "Type in: %scan(&tables., -1, |)",
					t1.length as t1_length label = "Length in: %scan(&tables., 1, |)",
					t2.length as t2_length label = "Length in: %scan(&tables., -1, |)",
					t1.label as t1_label label = "Label in: %scan(&tables., 1, |)",
					t2.label as t2_label label = "Label in: %scan(&tables., -1, |)",
					t1.format as t1_format label = "Format in: %scan(&tables., 1, |)",
					t2.format as t2_format label = "Format in: %scan(&tables., -1, |)",
					t1.informat as t1_informat label = "Informat in %scan(&tables., 1, |)",
					t2.informat as t2_informat label = "Informat in %scan(&tables., -1, |)"					
		from
					dictionary.columns as t1
						inner join
					dictionary.columns as t2
							on	t1.name = t2.name
		where
					t1.libname = "&ln1." 	and
					t1.memname = "&mn1."	and
					t2.libname = "&ln2."	and
					t2.memname = "&mn2.";

		title "Shared Columns between %upcase(%scan(&tables., 1, |)) [t1] and %upcase(%scan(&tables., -1, |)) [t2]";
		select 
					*
		from
					&lib_dsn_out.;
		title;

%end;
	/* Otherwise, just print the results. */
	%else %do;

		title "Shared Columns between %upcase(%scan(&tables., 1, |)) [t1] and %upcase(%scan(&tables., -1, |)) [t2]";
		select
					t1.name as t1_name label = "Name in: %scan(&tables., 1, |)",
					t1.type as t1_type label = "Type in: %scan(&tables., 1, |)",
					t2.type as t2_type label = "Type in: %scan(&tables., -1, |)",
					t1.length as t1_length label = "Length in: %scan(&tables., 1, |)",
					t2.length as t2_length label = "Length in: %scan(&tables., -1, |)",
					t1.label as t1_label label = "Label in: %scan(&tables., 1, |)",
					t2.label as t2_label label = "Label in: %scan(&tables., -1, |)",
					t1.format as t1_format label = "Format in: %scan(&tables., 1, |)",
					t2.format as t2_format label = "Format in: %scan(&tables., -1, |)",
					t1.informat as t1_informat label = "Informat in %scan(&tables., 1, |)",
					t2.informat as t2_informat label = "Informat in %scan(&tables., -1, |)"					
		from
					dictionary.columns as t1
						inner join
					dictionary.columns as t2
							on	t1.name = t2.name
		where
					t1.libname = "&ln1." 	and
					t1.memname = "&mn1."	and
					t2.libname = "&ln2."	and
					t2.memname = "&mn2.";
		title;

		%end;

quit;

/* End macro */
%mend shared_cols;