We often need to flag whether a patient was diagnosed with a given medical condition.

There are many ways to do this, and I’m only showing a couple ways that are generally effective.

First, here is some example data that largely mimic the format of claims data.

data work.patients;
  infile datalines dsd truncover;
  input enrolid:3. dx1:$7. dx2:$7. dx3:$7. dx4:$7.;
datalines;
2,A821,E133519,F1194,E133299
4,E75241,E731,E088,E083291
4,E75241,A5139,E088,E083291
5,E0810,A5139,E133291,A710
5,E0810,A360,E133291,A710
6,A3790,A4289,A5041,F19121
7,F810,A4289,A5041,A300
7,F810,E083291,A5041,A300
7,F810,E083291,A5041,E103591
8,E531,A4150,A5041,E103591
8,E531,A4150,A5433,F0630
9,A5275,F04,A5433,F0630
10,E853,A231,E781,A5139
11,F554,A5423,E13641,A5139
12,E848,A5423,E13641,E7881
12,E848,E75240,E13641,E7881
13,A663,E000,F19980,A5423
15,F11129,A3790,A5433,E75240
15,F11129,A3790,A5433,E731
16,A3700,A3790,A5433,E731
16,A3700,E133519,E13641,E0939
17,A5139,E133519,E13641,E0939
18,E030,F524,A231,A1859
20,A5002,E7500,A3952,F18951
20,A5002,F13280,A839,F18951
21,F524,F13280,A839,E089
22,F11129,E1137X9,A5041,E103219
25,A4289,E71313,F1293,A218
25,A4289,E71313,F04,A218
27,A5139,E030,A5209,E103219
27,A5139,F0781,A5209,E103219
30,E28319,F13232,A5209,F812
31,E08311,F13232,A5209,F1821
32,F13280,A5139,E731,A5209
32,F13280,A5402,A5275,A218
33,E103591,A5402,A5275,A218
33,E103591,F554,A5275,A218
33,E103591,F554,E093499,F18951
34,E13641,F521,E093499,F18951
35,E1137X9,F810,E133519,E093541
36,A5002,F554,E133519,F11288
38,A523,F15980,F13181,A5139
44,E853,E731,A5423,F1194
45,E0939,F1194,E0939,E133299
45,E0939,F521,E0939,E133299
46,A300,F521,A4150,F19121
46,A300,A199,E103591,E211
47,F13181,A199,E103591,E211
47,F13181,E58,E030,F19980
49,A3700,E853,F19980,F0781
;

If you saw my last post, I discussed how you can convert large datasets into formats to make useful labels. That post is here.

Here’s what the above dataset is showing us:

enrolid dx1 dx2 dx3 dx4
2 Urban rabies Other specified diabetes mellitus with proliferative diabetic retinopathy with macular edema Opioid use Other specified diabetes mellitus with mild nonproliferative diabetic retinopathy without macular edema
4 Niemann-Pick disease type B Secondary lactase deficiency Diabetes mellitus due to underlying condition with unspecified complications Diabetes mellitus due to underlying condition with mild nonproliferative diabetic retinopathy without macular edema
4 Niemann-Pick disease type B Other secondary syphilis of skin Diabetes mellitus due to underlying condition with unspecified complications Diabetes mellitus due to underlying condition with mild nonproliferative diabetic retinopathy without macular edema
5 Diabetes mellitus due to underlying condition with ketoacidosis without coma Other secondary syphilis of skin Other specified diabetes mellitus with mild nonproliferative diabetic retinopathy without macular edema Initial stage of trachoma
5 Diabetes mellitus due to underlying condition with ketoacidosis without coma Pharyngeal diphtheria Other specified diabetes mellitus with mild nonproliferative diabetic retinopathy without macular edema Initial stage of trachoma
6 Whooping cough Other forms of actinomycosis Late congenital syphilitic meningitis Other psychoactive substance abuse with intoxication delirium
7 Specific reading disorder Other forms of actinomycosis Late congenital syphilitic meningitis Indeterminate leprosy
7 Specific reading disorder Diabetes mellitus due to underlying condition with mild nonproliferative diabetic retinopathy without macular edema Late congenital syphilitic meningitis Indeterminate leprosy
7 Specific reading disorder Diabetes mellitus due to underlying condition with mild nonproliferative diabetic retinopathy without macular edema Late congenital syphilitic meningitis Type 1 diabetes mellitus with proliferative diabetic retinopathy without macular edema
8 Pyridoxine deficiency Gram-negative sepsis Late congenital syphilitic meningitis Type 1 diabetes mellitus with proliferative diabetic retinopathy without macular edema
8 Pyridoxine deficiency Gram-negative sepsis Gonococcal keratitis Mood disorder due to known physiological condition
9 Syphilis of kidney and ureter Amnestic disorder due to known physiological condition Gonococcal keratitis Mood disorder due to known physiological condition
10 Secondary systemic amyloidosis Brucellosis due to Brucella abortus Pure hyperglyceridemia Other secondary syphilis of skin
11 Abuse of vitamins Gonococcal infection of other male genital organs Other specified diabetes mellitus with hypoglycemia with coma Other secondary syphilis of skin
12 Cystic fibrosis with other manifestations Gonococcal infection of other male genital organs Other specified diabetes mellitus with hypoglycemia with coma Lipoid dermatoarthritis
12 Cystic fibrosis with other manifestations Niemann-Pick disease type A Other specified diabetes mellitus with hypoglycemia with coma Lipoid dermatoarthritis
13 Hyperkeratosis of yaws Congenital iodine-deficiency syndrome Other psychoactive substance use Gonococcal infection of other male genital organs
15 Opioid abuse with intoxication Whooping cough Gonococcal keratitis Niemann-Pick disease type A
15 Opioid abuse with intoxication Whooping cough Gonococcal keratitis Secondary lactase deficiency
16 Whooping cough due to Bordetella pertussis without pneumonia Whooping cough Gonococcal keratitis Secondary lactase deficiency
16 Whooping cough due to Bordetella pertussis without pneumonia Other specified diabetes mellitus with proliferative diabetic retinopathy with macular edema Other specified diabetes mellitus with hypoglycemia with coma Drug or chemical induced diabetes mellitus with other diabetic ophthalmic complication
17 Other secondary syphilis of skin Other specified diabetes mellitus with proliferative diabetic retinopathy with macular edema Other specified diabetes mellitus with hypoglycemia with coma Drug or chemical induced diabetes mellitus with other diabetic ophthalmic complication
18 Congenital hypothyroidism with diffuse goiter Premature ejaculation Brucellosis due to Brucella abortus Other tuberculosis of eye
20 Early congenital syphilitic osteochondropathy GM2 gangliosidosis Meningococcal myocarditis Inhalant use
20 Early congenital syphilitic osteochondropathy Sedative Mosquito-borne viral encephalitis Inhalant use
21 Premature ejaculation Sedative Mosquito-borne viral encephalitis Diabetes mellitus due to underlying condition without complications
22 Opioid abuse with intoxication Type 2 diabetes mellitus with diabetic macular edema Late congenital syphilitic meningitis Type 1 diabetes mellitus with mild nonproliferative diabetic retinopathy with macular edema
25 Other forms of actinomycosis Glutaric aciduria type II Cannabis use Other forms of tularemia
25 Other forms of actinomycosis Glutaric aciduria type II Amnestic disorder due to known physiological condition Other forms of tularemia
27 Other secondary syphilis of skin Congenital hypothyroidism with diffuse goiter Other cardiovascular syphilis Type 1 diabetes mellitus with mild nonproliferative diabetic retinopathy with macular edema
27 Other secondary syphilis of skin Postconcussional syndrome Other cardiovascular syphilis Type 1 diabetes mellitus with mild nonproliferative diabetic retinopathy with macular edema
30 Asymptomatic premature menopause Sedative Other cardiovascular syphilis Mathematics disorder
31 Diabetes mellitus due to underlying condition with unspecified diabetic retinopathy with macular edema Sedative Other cardiovascular syphilis Inhalant dependence
32 Sedative Other secondary syphilis of skin Secondary lactase deficiency Other cardiovascular syphilis
32 Sedative Gonococcal vulvovaginitis Syphilis of kidney and ureter Other forms of tularemia
33 Type 1 diabetes mellitus with proliferative diabetic retinopathy without macular edema Gonococcal vulvovaginitis Syphilis of kidney and ureter Other forms of tularemia
33 Type 1 diabetes mellitus with proliferative diabetic retinopathy without macular edema Abuse of vitamins Syphilis of kidney and ureter Other forms of tularemia
33 Type 1 diabetes mellitus with proliferative diabetic retinopathy without macular edema Abuse of vitamins Drug or chemical induced diabetes mellitus with severe nonproliferative diabetic retinopathy without macular edema Inhalant use
34 Other specified diabetes mellitus with hypoglycemia with coma Sexual aversion disorder Drug or chemical induced diabetes mellitus with severe nonproliferative diabetic retinopathy without macular edema Inhalant use
35 Type 2 diabetes mellitus with diabetic macular edema Specific reading disorder Other specified diabetes mellitus with proliferative diabetic retinopathy with macular edema Drug or chemical induced diabetes mellitus with proliferative diabetic retinopathy with combined traction retinal detachment and rhegmatogenous retinal detachment
36 Early congenital syphilitic osteochondropathy Abuse of vitamins Other specified diabetes mellitus with proliferative diabetic retinopathy with macular edema Opioid dependence with other opioid-induced disorder
38 Neurosyphilis Other stimulant use Sedative Other secondary syphilis of skin
44 Secondary systemic amyloidosis Secondary lactase deficiency Gonococcal infection of other male genital organs Opioid use
45 Drug or chemical induced diabetes mellitus with other diabetic ophthalmic complication Opioid use Drug or chemical induced diabetes mellitus with other diabetic ophthalmic complication Other specified diabetes mellitus with mild nonproliferative diabetic retinopathy without macular edema
45 Drug or chemical induced diabetes mellitus with other diabetic ophthalmic complication Sexual aversion disorder Drug or chemical induced diabetes mellitus with other diabetic ophthalmic complication Other specified diabetes mellitus with mild nonproliferative diabetic retinopathy without macular edema
46 Indeterminate leprosy Sexual aversion disorder Gram-negative sepsis Other psychoactive substance abuse with intoxication delirium
46 Indeterminate leprosy Miliary tuberculosis Type 1 diabetes mellitus with proliferative diabetic retinopathy without macular edema Secondary hyperparathyroidism
47 Sedative Miliary tuberculosis Type 1 diabetes mellitus with proliferative diabetic retinopathy without macular edema Secondary hyperparathyroidism
47 Sedative Dietary calcium deficiency Congenital hypothyroidism with diffuse goiter Other psychoactive substance use
49 Whooping cough due to Bordetella pertussis without pneumonia Secondary systemic amyloidosis Other psychoactive substance use Postconcussional syndrome

It’s obviously a lot of information, and that’s the last time I will show it with labels.

Also, these diagnoses were randomly selected, so pardon any random/weird combinations.

Flagging diagnoses: the naive method

Let’s say that you need to flag any individual who has a diagnosis code related to opioid use. We could use the following code:

data patients_data_naive;
	set work.patients;
	length opioid_flag 3.;
		if dx1 = "F1194" or dx2 = "F1194" or dx3 = "F1194" or dx4 = "F1194" then opioid_flag = 1;
			else opioid_flag = 0;
run;

This is obviously DATA step code, and it uses IF-THEN-ELSE processing to flag whether someone has an opioid use diagnosis code (F1194).

We could also do it using PROC SQL and the CASE WHEN statement:

proc sql;
	create table	work.patients_sql_naive as
		select
					*,
					case
						when dx1 = "F1194" or dx2 = "F1194" or dx3 = "F1194" or dx4 = "F1194" then 1
						else 0
					end as opioid_flag length = 3
		from
					work.patients;
quit;

Both of these methods produce the following output:

enrolid dx1 dx2 dx3 dx4 opioid_flag
2 A821 E133519 F1194 E133299 1
4 E75241 E731 E088 E083291 0
4 E75241 A5139 E088 E083291 0
5 E0810 A5139 E133291 A710 0
5 E0810 A360 E133291 A710 0
6 A3790 A4289 A5041 F19121 0
7 F810 A4289 A5041 A300 0
7 F810 E083291 A5041 A300 0
7 F810 E083291 A5041 E103591 0
8 E531 A4150 A5041 E103591 0
8 E531 A4150 A5433 F0630 0
9 A5275 F04 A5433 F0630 0
10 E853 A231 E781 A5139 0
11 F554 A5423 E13641 A5139 0
12 E848 A5423 E13641 E7881 0
12 E848 E75240 E13641 E7881 0
13 A663 E000 F19980 A5423 0
15 F11129 A3790 A5433 E75240 0
15 F11129 A3790 A5433 E731 0
16 A3700 A3790 A5433 E731 0
16 A3700 E133519 E13641 E0939 0
17 A5139 E133519 E13641 E0939 0
18 E030 F524 A231 A1859 0
20 A5002 E7500 A3952 F18951 0
20 A5002 F13280 A839 F18951 0
21 F524 F13280 A839 E089 0
22 F11129 E1137X9 A5041 E103219 0
25 A4289 E71313 F1293 A218 0
25 A4289 E71313 F04 A218 0
27 A5139 E030 A5209 E103219 0
27 A5139 F0781 A5209 E103219 0
30 E28319 F13232 A5209 F812 0
31 E08311 F13232 A5209 F1821 0
32 F13280 A5139 E731 A5209 0
32 F13280 A5402 A5275 A218 0
33 E103591 A5402 A5275 A218 0
33 E103591 F554 A5275 A218 0
33 E103591 F554 E093499 F18951 0
34 E13641 F521 E093499 F18951 0
35 E1137X9 F810 E133519 E093541 0
36 A5002 F554 E133519 F11288 0
38 A523 F15980 F13181 A5139 0
44 E853 E731 A5423 F1194 1
45 E0939 F1194 E0939 E133299 1
45 E0939 F521 E0939 E133299 0
46 A300 F521 A4150 F19121 0
46 A300 A199 E103591 E211 0
47 F13181 A199 E103591 E211 0
47 F13181 E58 E030 F19980 0
49 A3700 E853 F19980 F0781 0

Arrays in SAS

One issue with the naive method is that it requires you to explicitly type out each variable. This isn’t helpful if you have up to fifty diagnosis/procedure codes in your data sets.

This is where arrays are extremely helpful. Arrays allow you to reference all of these variables with a single keyword. Below is an example of an array statement:

array _icd [4] dx1 dx2 dx3 dx4

Now, you may be saying that this isn’t helpful – you’re still typing out all the variables! Let’s break it down.

  1. We have the ARRAY statment itself.
  2. The name of your array, _icd. I prefer to begin my ARRAY names with an underscore. Think of this as a shortcut or nickname for all the variables you want to reference.
  3. The subscript. This identifies how many variables/elements are in your ARRAY.
  4. Finally, the variables that are in your ARRAY.

An array with an explicit subscript and explict variables

One way to define your array is through explicit subscripts and variables, much like the example above. Below is a full example:

data work.patients_expl_array;
	set work.patients;
	length opioid_flag 3.; /* #1 */
	array _icd [4] dx1 - dx4; /* #2 */
		opioid_flag = 0; /* #3 */
	do i = 1 to 4; /* #4 */
		if _icd[i] = "F1194" then opioid_flag = 1; /* #5 */
	end; /* #6 */
	drop i; /* #7 */
run;

Results:

enrolid dx1 dx2 dx3 dx4 opioid_flag
2 A821 E133519 F1194 E133299 1
4 E75241 E731 E088 E083291 0
4 E75241 A5139 E088 E083291 0
5 E0810 A5139 E133291 A710 0
5 E0810 A360 E133291 A710 0
6 A3790 A4289 A5041 F19121 0
7 F810 A4289 A5041 A300 0
7 F810 E083291 A5041 A300 0
7 F810 E083291 A5041 E103591 0
8 E531 A4150 A5041 E103591 0
8 E531 A4150 A5433 F0630 0
9 A5275 F04 A5433 F0630 0
10 E853 A231 E781 A5139 0
11 F554 A5423 E13641 A5139 0
12 E848 A5423 E13641 E7881 0
12 E848 E75240 E13641 E7881 0
13 A663 E000 F19980 A5423 0
15 F11129 A3790 A5433 E75240 0
15 F11129 A3790 A5433 E731 0
16 A3700 A3790 A5433 E731 0
16 A3700 E133519 E13641 E0939 0
17 A5139 E133519 E13641 E0939 0
18 E030 F524 A231 A1859 0
20 A5002 E7500 A3952 F18951 0
20 A5002 F13280 A839 F18951 0
21 F524 F13280 A839 E089 0
22 F11129 E1137X9 A5041 E103219 0
25 A4289 E71313 F1293 A218 0
25 A4289 E71313 F04 A218 0
27 A5139 E030 A5209 E103219 0
27 A5139 F0781 A5209 E103219 0
30 E28319 F13232 A5209 F812 0
31 E08311 F13232 A5209 F1821 0
32 F13280 A5139 E731 A5209 0
32 F13280 A5402 A5275 A218 0
33 E103591 A5402 A5275 A218 0
33 E103591 F554 A5275 A218 0
33 E103591 F554 E093499 F18951 0
34 E13641 F521 E093499 F18951 0
35 E1137X9 F810 E133519 E093541 0
36 A5002 F554 E133519 F11288 0
38 A523 F15980 F13181 A5139 0
44 E853 E731 A5423 F1194 1
45 E0939 F1194 E0939 E133299 1
45 E0939 F521 E0939 E133299 0
46 A300 F521 A4150 F19121 0
46 A300 A199 E103591 E211 0
47 F13181 A199 E103591 E211 0
47 F13181 E58 E030 F19980 0
49 A3700 E853 F19980 F0781 0

Let’s break this down:

  1. It’s always good practice to set the length of numeric variables to the smallest number of bytes (3) when you’re creating a dichotomous flag. SAS will default to 8 if you do not set it here.
  2. Here’s our array statement. Instead of typing out each variable (dx1, dx2, etc.), I use a hyphen to specify the range of variables between dx1 and dx4.
  3. I set the default value of opioid_flag to 0. This means each individuak will have a value of 0 for opioid_flag unless they have the diagnosis code for opioid use.
  4. To use our ARRAY statement, we have to use the DO statement. This statement tells SAS that for each observation/record, DO or execute the following code from 1 to 4 times.
    • Also, notice the i here. We’re essentially creating a variable, i, that takes on the value based on the DO statement or loop. Each time the DO loop iterates, this value will increase by 1. Once the value exceeds i (here, it would be 5), the DO loop or statement stops and proceeds to the next observation.
  5. Here’s where we use our ARRAY statement. Remember, our ARRAY name is basically a shortcut or nickname to refer to all of our dx variables.
    • I’m going to focus right on the _icd[i] part since that often trips people up and makes them avoid arrays.
    • Since we are in a DO loop, _icd[i] is first going to look at our value of i. If we are in the first iteration of the DO loop for a single observation, i will be equal to 1. Accordingly, _icd[1] will resolve to dx1.
    • We can then read it like our naive DATA step or PROC SQL query – that is, if dx1 is equal to F1194, then opioid_flag is equal to 1.
  6. For every DO statement, you must have an END statement. SAS will reach this statement, and if i has not exceeded the end value in our DO loop, it will return to the top, increment i by 1 and repeat the process outlined in point 5 above.
  7. Finally, we’re going to use the drop statement to remove the variable i from the workpatients_expl_array data set.
    • If you choose to keep i in your data set, you will see that it holds a value of 5 for each observation. This is exactly what I was talking about in step 6.

An array with an implicit subscript and implicit variables

The above code can become problematic if you’re not careful. Let’s say that you have 5 diagnosis codes, but you leave the subscript as 4 – you’re not going to examine that fifth diagnosis code, and your results may be problematic.

This is where implicit subscripts and implicit variables become helpful. They’re basically shorthand ways of letting the computer do the counting and grouping for you. What do I mean? You will probably see that most of your cohort members use this kind of ARRAY.

data work.patients_impl_array;
	set work.patients;
	length opioid_flag 3.;
	array _icd [*] dx:; /* #1 */
		opioid_flag = 0;
	do i = 1 to dim(_icd); /* #2 */
		if _icd[i] = "F1194" then opioid_flag = 1;
		leave; /* #3 */
	end;
	drop i;
run;

Since most of the steps are the same, I only commented out sections that I did not explain previously:

  1. Here, we are doing the same thing as above. We use the ARRAY statement and our ARRAY name (_icd).
    • Instead of explicitly defining how many variables or elements exist in our array, I use an asterisk * to tell SAS to count the number of variables included in my ARRAY.
    • So what variables are in my ARRAY? Anything beginning with dx. The colon (:) tells SAS to include all variables that begin with this prefix.
  2. Again, this is a DO loop. The only difference is that we are using the DIM function.
    • We are telling SAS to create that i variable that takes on a value from 1 to however many variables are in the ARRAY. The DIM function essentially acts as an upper bound that is defined by however many variables SAS found with the dx prefix.
  3. Here, I added a leave statement. This will save you a lot of time if you’re working with large files, which is obviously the case with claims data.
    • This tells SAS to leave the DO loop if the IF-THEN statement resolves to 1.

I’m not going to show the results since they are the same.

Arrays with multiple conditions

This doesn’t require much explanation. It builds on the same thing as above, except we use the IN operator.

I’m flagging the following conditions: “Diabetes mellitus due to underlying condition with unspecified diabetic retinopathy with macular edema” and “Diabetes mellitus due to underlying condition with unspecified complications”.

data work.patients_array_mult_dx;
	set work.patients;
	length diabetes_flag 3.;
	array _icd [*] dx:;
		diabetes_flag = 0;
	do i = 1 to dim(_icd);
		if _icd[i] in ("E08311", "E088") then diabetes_flag = 1;
	end;
	drop i;
run;

Results:

enrolid dx1 dx2 dx3 dx4 diabetes_flag
2 A821 E133519 F1194 E133299 0
4 E75241 E731 E088 E083291 1
4 E75241 A5139 E088 E083291 1
5 E0810 A5139 E133291 A710 0
5 E0810 A360 E133291 A710 0
6 A3790 A4289 A5041 F19121 0
7 F810 A4289 A5041 A300 0
7 F810 E083291 A5041 A300 0
7 F810 E083291 A5041 E103591 0
8 E531 A4150 A5041 E103591 0
8 E531 A4150 A5433 F0630 0
9 A5275 F04 A5433 F0630 0
10 E853 A231 E781 A5139 0
11 F554 A5423 E13641 A5139 0
12 E848 A5423 E13641 E7881 0
12 E848 E75240 E13641 E7881 0
13 A663 E000 F19980 A5423 0
15 F11129 A3790 A5433 E75240 0
15 F11129 A3790 A5433 E731 0
16 A3700 A3790 A5433 E731 0
16 A3700 E133519 E13641 E0939 0
17 A5139 E133519 E13641 E0939 0
18 E030 F524 A231 A1859 0
20 A5002 E7500 A3952 F18951 0
20 A5002 F13280 A839 F18951 0
21 F524 F13280 A839 E089 0
22 F11129 E1137X9 A5041 E103219 0
25 A4289 E71313 F1293 A218 0
25 A4289 E71313 F04 A218 0
27 A5139 E030 A5209 E103219 0
27 A5139 F0781 A5209 E103219 0
30 E28319 F13232 A5209 F812 0
31 E08311 F13232 A5209 F1821 1
32 F13280 A5139 E731 A5209 0
32 F13280 A5402 A5275 A218 0
33 E103591 A5402 A5275 A218 0
33 E103591 F554 A5275 A218 0
33 E103591 F554 E093499 F18951 0
34 E13641 F521 E093499 F18951 0
35 E1137X9 F810 E133519 E093541 0
36 A5002 F554 E133519 F11288 0
38 A523 F15980 F13181 A5139 0
44 E853 E731 A5423 F1194 0
45 E0939 F1194 E0939 E133299 0
45 E0939 F521 E0939 E133299 0
46 A300 F521 A4150 F19121 0
46 A300 A199 E103591 E211 0
47 F13181 A199 E103591 E211 0
47 F13181 E58 E030 F19980 0
49 A3700 E853 F19980 F0781 0

Arrays with the IN: operator

The IN operator is great, but we’re only grabbing two diagnoses within the E08 category that deal with diabetes. An awesome, short-hand technique to access all codes with a certain prefix is the IN with the colon modifier (IN:).

It’s not very well-documented in the SAS documentation, but it is available in a paper here.

data work.patients_array_mult_dx_in;
	set work.patients;
	length diabetes_flag 3.;
	array _icd [*] dx:;
		diabetes_flag = 0;
	do i = 1 to dim(_icd);
		if _icd[i] in: ("E08") then diabetes_flag = 1;
	end;
	drop i;
run;

Using a leave statement with IN: doesn’t flag records properly. I’m not quite sure why, but this was an issue for me on SAS Studio (free) and my personal, local version of SAS 9.4.

Results:

enrolid dx1 dx2 dx3 dx4 diabetes_flag
2 A821 E133519 F1194 E133299 0
4 E75241 E731 E088 E083291 1
4 E75241 A5139 E088 E083291 1
5 E0810 A5139 E133291 A710 1
5 E0810 A360 E133291 A710 1
6 A3790 A4289 A5041 F19121 0
7 F810 A4289 A5041 A300 0
7 F810 E083291 A5041 A300 1
7 F810 E083291 A5041 E103591 1
8 E531 A4150 A5041 E103591 0
8 E531 A4150 A5433 F0630 0
9 A5275 F04 A5433 F0630 0
10 E853 A231 E781 A5139 0
11 F554 A5423 E13641 A5139 0
12 E848 A5423 E13641 E7881 0
12 E848 E75240 E13641 E7881 0
13 A663 E000 F19980 A5423 0
15 F11129 A3790 A5433 E75240 0
15 F11129 A3790 A5433 E731 0
16 A3700 A3790 A5433 E731 0
16 A3700 E133519 E13641 E0939 0
17 A5139 E133519 E13641 E0939 0
18 E030 F524 A231 A1859 0
20 A5002 E7500 A3952 F18951 0
20 A5002 F13280 A839 F18951 0
21 F524 F13280 A839 E089 1
22 F11129 E1137X9 A5041 E103219 0
25 A4289 E71313 F1293 A218 0
25 A4289 E71313 F04 A218 0
27 A5139 E030 A5209 E103219 0
27 A5139 F0781 A5209 E103219 0
30 E28319 F13232 A5209 F812 0
31 E08311 F13232 A5209 F1821 1
32 F13280 A5139 E731 A5209 0
32 F13280 A5402 A5275 A218 0
33 E103591 A5402 A5275 A218 0
33 E103591 F554 A5275 A218 0
33 E103591 F554 E093499 F18951 0
34 E13641 F521 E093499 F18951 0
35 E1137X9 F810 E133519 E093541 0
36 A5002 F554 E133519 F11288 0
38 A523 F15980 F13181 A5139 0
44 E853 E731 A5423 F1194 0
45 E0939 F1194 E0939 E133299 0
45 E0939 F521 E0939 E133299 0
46 A300 F521 A4150 F19121 0
46 A300 A199 E103591 E211 0
47 F13181 A199 E103591 E211 0
47 F13181 E58 E030 F19980 0
49 A3700 E853 F19980 F0781 0

Sorry that I didn’t do any gifs – I’m tired.

Let me know if you need anything clarified or if there are any issues in the post.

Program for download is here.