You ever had 10-45 diagnoses that you want to see, but they have some uninterpretable ICD/CPT code?
Maybe you have a large csv file that has every single ICD code and its respective CCS cross-walk.
In reality, you can have a dataset in your
WORK
folder that you can run through this same process. If you have it as a dataset, you can turn it into aFORMAT
.
Someone had the following issue the other day:
1. They had up to 10 CPT codes per patient eyeing them down that didn't make any sense.
2. They had a large csv containing a CPT code range, a CCS code, and a CCS label.
3. They wanted the CCS label for each CPT code displayed.
Here is a mock-up of some patient-level data with some fake CPT codes for those that want to follow along:
data have_pt;
infile datalines delimiter = "," missover;
length enrolid $7. cpt1-cpt10 $5.;
input enrolid cpt1-cpt10;
datalines;
1234567,96020,G0255,96020,,96020,96020,NAAAH,64461
2234567,96105,64446,61150,C9729,COOL,22861
3322112,95999,63655,0169T,NOPE,NOPE,??CPT,IHATE,CPTCD,JUST,JOKIN
4221312,62267,28035,64461,62161,62164,62165
4222145,95965,0434T,0386T,63000,0109T,0437T
;
What it looked like after a PROC PRINT
:
enrolid | cpt1 | cpt2 | cpt3 | cpt4 | cpt5 | cpt6 | cpt7 | cpt8 | cpt9 | cpt10 |
---|---|---|---|---|---|---|---|---|---|---|
1234567 | 96020 | G0255 | 96020 | 96020 | 96020 | NAAAH | 64461 | |||
2234567 | 96105 | 64446 | 61150 | C9729 | COOL | 22861 | ||||
3322112 | 95999 | 63655 | 0169T | NOPE | NOPE | ??CPT | IHATE | CPTCD | JUST | JOKIN |
4221312 | 62267 | 28035 | 64461 | 62161 | 62164 | 62165 | ||||
4222145 | 95965 | 0434T | 0386T | 63000 | 0109T | 0437T |
Here’s a portion of the CPT to CCS conversion we were looking at:
data have_ccs;
infile datalines delimiter = "," truncover;
input code_range :$15. ccs ccs_label $100.;
datalines;
'61000-61055',1,Incision and excision of CNS
'61105-61106',1,Incision and excision of CNS
'61108-61130',1,Incision and excision of CNS
'61150-61156',1,Incision and excision of CNS
'61250-61315',1,Incision and excision of CNS
'61320-61323',1,Incision and excision of CNS
'61340-61340',1,Incision and excision of CNS
'61345-61440',1,Incision and excision of CNS
'61470-61490',1,Incision and excision of CNS
'61510-61516',1,Incision and excision of CNS
'61518-61530',1,Incision and excision of CNS
'61534-61545',1,Incision and excision of CNS
'61556-61557',1,Incision and excision of CNS
'61570-61576',1,Incision and excision of CNS
'61582-61596',1,Incision and excision of CNS
'61598-61608',1,Incision and excision of CNS
'61615-61616',1,Incision and excision of CNS
'61712-61735',1,Incision and excision of CNS
'61880-61880',1,Incision and excision of CNS
'62161-62164',1,Incision and excision of CNS
'0169T-0169T',2,Insertion, replacement, or removal of extracranial ventricular shunt
'61107-61107',2,Insertion, replacement, or removal of extracranial ventricular shunt
'61210-61210',2,Insertion, replacement, or removal of extracranial ventricular shunt
'62160-62160',2,Insertion, replacement, or removal of extracranial ventricular shunt
'62180-62258',2,Insertion, replacement, or removal of extracranial ventricular shunt
'0202T-0202T',3,Laminectomy, excision intervertebral disc
'0274T-0275T',3,Laminectomy, excision intervertebral disc
'22856-22856',3,Laminectomy, excision intervertebral disc
'22858-22858',3,Laminectomy, excision intervertebral disc
'22861-22861',3,Laminectomy, excision intervertebral disc
'22864-22864',3,Laminectomy, excision intervertebral disc
'61343-61343',3,Laminectomy, excision intervertebral disc
'62287-62287',3,Laminectomy, excision intervertebral disc
'62351-62351',3,Laminectomy, excision intervertebral disc
'62380-63252',3,Laminectomy, excision intervertebral disc
'63655-63655',3,Laminectomy, excision intervertebral disc
'63709-63709',3,Laminectomy, excision intervertebral disc
'63740-63740',3,Laminectomy, excision intervertebral disc
'C9729-C9729',3,Laminectomy, excision intervertebral disc
'G0276-G0276',3,Laminectomy, excision intervertebral disc
'S2348-S2348',3,Laminectomy, excision intervertebral disc
'S2350-S2350',3,Laminectomy, excision intervertebral disc
'S2351-S2351',3,Laminectomy, excision intervertebral disc
'62270-62272',4,Diagnostic spinal tap
'62263-62264',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'62274-62282',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'62288-62289',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'62298-62298',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'62310-62350',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'62355-62355',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'63600-63610',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'63650-63650',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'63657-63688',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'63750-63780',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'64412-64412',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'64416-64416',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'64446-64446',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'64448-64449',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'64461-64484',5,Insertion of catheter or spinal stimulator and injection into spinal canal
'28035-28035',6,Decompression peripheral nerve
'29848-29848',6,Decompression peripheral nerve
'64702-64727',6,Decompression peripheral nerve
'0077T-0077T',7,Other diagnostic nervous system procedures
'0106T-0110T',7,Other diagnostic nervous system procedures
'0199T-0199T',7,Other diagnostic nervous system procedures
'0285T-0285T',7,Other diagnostic nervous system procedures
'0333T-0333T',7,Other diagnostic nervous system procedures
'0341T-0341T',7,Other diagnostic nervous system procedures
'0381T-0386T',7,Other diagnostic nervous system procedures
'0434T-0436T',7,Other diagnostic nervous system procedures
'61140-61140',7,Other diagnostic nervous system procedures
'61750-61751',7,Other diagnostic nervous system procedures
'61795-61795',7,Other diagnostic nervous system procedures
'62267-62267',7,Other diagnostic nervous system procedures
'62269-62269',7,Other diagnostic nervous system procedures
'63690-63690',7,Other diagnostic nervous system procedures
'64795-64795',7,Other diagnostic nervous system procedures
'92516-92516',7,Other diagnostic nervous system procedures
'95857-95872',7,Other diagnostic nervous system procedures
'95874-95882',7,Other diagnostic nervous system procedures
'95885-95943',7,Other diagnostic nervous system procedures
'95965-95982',7,Other diagnostic nervous system procedures
'95999-96003',7,Other diagnostic nervous system procedures
'96020-96020',7,Other diagnostic nervous system procedures
'96105-96115',7,Other diagnostic nervous system procedures
'G0255-G0255',7,Other diagnostic nervous system procedures
'G0453-G0453',7,Other diagnostic nervous system procedures
'S3900-S3900',7,Other diagnostic nervous system procedures
'S3905-S3905',7,Other diagnostic nervous system procedures
'S8040-S8040',7,Other diagnostic nervous system procedures
;
This is what 10 of those observations looked like after a PROC PRINT
:
code_range | ccs | ccs_label |
---|---|---|
‘61000-61055’ | 1 | Incision and excision of CNS |
‘61105-61106’ | 1 | Incision and excision of CNS |
‘61108-61130’ | 1 | Incision and excision of CNS |
‘61150-61156’ | 1 | Incision and excision of CNS |
‘61250-61315’ | 1 | Incision and excision of CNS |
‘61320-61323’ | 1 | Incision and excision of CNS |
‘61340-61340’ | 1 | Incision and excision of CNS |
‘61345-61440’ | 1 | Incision and excision of CNS |
‘61470-61490’ | 1 | Incision and excision of CNS |
‘61510-61516’ | 1 | Incision and excision of CNS |
So, we wanted those CPT codes in have_pt
to be displayed with the according CCS labels in have_ccs
.
One of the best tools, in my opinion, for this kind of task is PROC FORMAT
.
PROC FORMAT
has an option called CNTLIN =
. This can save you lots of time in scenarios where you have some large, existing dataset with formats already written for you.
First, though, we need to prepare the data set for PROC FORMAT
.
The code to get this is a bit peculiar and esoteric – just know that. There’s nothing you can do but learn this weird code. It makes sense once you do it enough.
Let me highlight the necessary elements before I throw the code at you:
You need a
start
variable representing the starting value for the label/format.
You need an
end
variable representing the ending value for the label/format.
You need a
label
variable representing, you guessed it, the label.
You need a
fmtname
variable representing the name of the format.
You need a
type
variable representing the kind of format you want (e.g., ‘c’ for character, ‘n’ for numeric).
You need an
hlo
varaiable representing the ‘high-low-other’ parameters that you can use in a regularPROC FORMAT
request.
You also need to add the
END =
option on yourSET
statement so you can add an additional observation at the end of the dataset. You’ll see what I mean later.
data have_fmt (drop = code_range); /* #1 */
set have_ccs (
keep = code_range ccs_label /* #2(a) */
rename = (ccs_label = label) /* #2(b) */
)
end = z; /* #2(c) */
retain fmtname "$cpt_fmt" type "c"; /* #3 */
start = scan(compress(code_range, "'"), 1, "-"); /* #4 */
end = scan(compress(code_range, "'"), -1, "-"); /* #5 */
output; /* #6 */
if z then do; /* #7 */
start = ""; /* #7(a) */
end = ""; /* #7(b) */
label = "Missing"; /* #7(c) */
hlo = "O"; /* #7(d) */
output; /* #7(e) */
end; /* #7(f) */
run; /* #8 */
Woohoo, lots of fun, step-by-step explanations for me!
-
Our
DATA
step is a usualDATA
step. For those that don’t know, including aDROP =
on yourDATA
statement drops variables at the very, very end of processing, before it’s written to the dataset. -
Our
SET
statement does a couple things. Similiar to (1), anyKEEP
,DROP
,RENAME
that you conduct on yourSET
statement affects how the data is read into SAS. It usually speeds up processing. Just know thatDROP
,KEEP
, andRENAME
options on theDATA
step happen at the end, whileDROP
,KEEP
, andRENAME
options happen at the beginning of processing.a. We are keeping
code_range
andccs_label
from the input data set. Since this is on theSET
statement, these are the only variables read in.b. I’m also going to
RENAME
ccs_label
tolabel
, which is required forPROC FORMAT
. I really meant that this code was esoteric.c. Notice that I used the
END =
option before I insert a semicolon to end theSET
statement. This creates, I believe, a temporary variable in the PDV designating whether a record in the data set is the last observation in the data set. If it is, it get’s assigned the value “z”. You could make it anything, though (e.g., a, b, c, gator). -
We use
RETAIN
to create a variable namefmtname
, and we set the value forfmtname
to be equal to$cpt_fmt
. We also create a variable namedtype
and set the value toc
, which standsd for character. As an aside,RETAIN
can be used to create static variables across a data set, and it’s much faster than explicitly creating it likefmtname = "$cpt_fmt";
.a. I think the
$
is unnecessary, but I like to put it in there for good measure. -
Here, I’m cleaning to get the
start
value for the format.i. First, I’m using
SCAN
to extract the value that I want at a given location.ii. Second, I’m using
COMPRESS
to remove those pesky quotes surrounding thecode_range
value. Note, this is my first argument forSCAN
, so we’re essentially going toSCAN
the unquoted version ofcode_range
.iii.
SCAN
requires a position argument for you to extract your value of interest. Here, I want to take the first value (1
), and I want it to be the first value before the hyphen (-
). -
Literally the exact same as (4), except I’m getting the
end
value. The only difference is my positional argument inSCAN
, which I change to-1
. Specifying a-1
takes the outermost number (think right to left) before the hyphen (-
). -
Now we add an explicit
OUTPUT
statement. In SAS, there is an implicitOUTPUT
statement at the end of aDATA
step/when an observation has been fully processed in the PDV. Using an explicitOUTPUT
statement overrides the implicitOUTPUT
. So here, we’re usingOUTPUT
to send each observation that is read in and cleaned to the data sethave_fmt
once it has been cleaned. -
Here’s that
END
option that we talked about in 2(c) comes into play. We’re basically sayingif
this is the last observation (z
),then do
the following:a. For the
start
variable, set it to missing (""
).b. For the
end
variable mentioned in (5), set it to missing (""
). This is not theEND =
option.c. For the
label
variable, I’m going to set it toMissing
so that it’s helpful to see when something isn’t coded correctly.d. Now, here’s that
hlo
variable. I set it toO
to represent “other” codes. Basically, any CPT code in thehave_pt
that does not have a corresponding CCS code inhave_ccs
will have a missing label (""
).e. We want to
OUTPUT
this record to the dataset AFTER all the other codes that were read in.f.
END
thatif-then-do
processing.
Weird, huh?
Let’s see what the first ten observations look like:
label | fmtname | type | start | end | hlo |
---|---|---|---|---|---|
Incision and excision of CNS | $cpt_fmt | c | 61000 | 61055 | |
Incision and excision of CNS | $cpt_fmt | c | 61105 | 61106 | |
Incision and excision of CNS | $cpt_fmt | c | 61108 | 61130 | |
Incision and excision of CNS | $cpt_fmt | c | 61150 | 61156 | |
Incision and excision of CNS | $cpt_fmt | c | 61250 | 61315 | |
Incision and excision of CNS | $cpt_fmt | c | 61320 | 61323 | |
Incision and excision of CNS | $cpt_fmt | c | 61340 | 61340 | |
Incision and excision of CNS | $cpt_fmt | c | 61345 | 61440 | |
Incision and excision of CNS | $cpt_fmt | c | 61470 | 61490 | |
Incision and excision of CNS | $cpt_fmt | c | 61510 | 61516 |
Let’s look at the last observations of that just to see what happened with all that hlo
stuff:
label | fmtname | type | start | end | hlo |
---|---|---|---|---|---|
Other diagnostic nervous system procedures | $cpt_fmt | c | S8040 | S8040 | |
Missing | $cpt_fmt | c | O |
See how the if z then do
outputted an observation to the bottom of the dataset? The record right above the hlo = "O"
was marked internally as z
with our end = z
option. Once we explictly outputted that record, SAS encountered the Z
and concatenated an additional record with our specifications.
If you’re kind of lost with the above DATA
step and prefer a PROC SQL
alternative, you can do this:
proc sql;
create table have_fmt_sql as
select
scan(compress(code_range,("'")), 1, "-") as start,
scan(compress(code_range,("'")), -1, "-") as end,
ccs_label as label,
"$cpt_fmt" as fmtname,
"c" as type,
"" as hlo
from
have_ccs;
insert into have_fmt_sql
values ("", "", "Missing", "$cpt_fmt", "c", "O");
quit;
They provide the same results, but the DATA
step version requires a bit more manipulation of the PDV, as well as some weird options that you don’t always use. The PROC SQL
version ends up looking a little cleaner, in my opinion.
Okay, now we’re ready to throw it in PROC FORMAT
.
Take either the DATA
step version or the PROC SQL
version and use it as your argument for CNTLIN
:
proc format
cntlin = have_fmt;
run;
That was a bit anticlimactic. This is what your log should show, though:
138 proc format
139 cntlin = have_fmt;
NOTE: Format $CPT_FMT has been output.
140 run;
Alright, so now we have converted have_ccs
from a data set into a FORMAT
. As most of you probably know, applying a FORMAT
is pretty easy.
Let’s say that we want to see the variables themselves in the dataset.
data have_pt_w_fmt_vars (drop = i); /* #1 */
set have_pt; /* #2 */
array _cpt [*] cpt:; /* #3 */
array _fcpt [10] $100. _fcpt1 - _fcpt10; /* #4 */
do i = 1 to dim(_cpt); /* #5 */
_fcpt[i] = put(_cpt[i], $cpt_fmt.); /* #5(a) */
end; /* #5(b) */
run;
-
Create a new data set. Drop
i
right at the end. -
Set our
have_pt
data set. -
Create an
ARRAY
that groups together all thecpt
variables (cpt1 - cpt10). The colon is a modifier that tells SAS to include all variables starting withcpt
to be included in this array or group. -
Create another
ARRAY
that creates ten variables, _fcpt1 - _fcpt10 (meaning formatted CPT). I’m also setting the length of these character variables to $100. -
Do the following from the first instance of the
_cpt
array to the last instance.DIM
essentially sets the upper bound of the array for you.a. The
DO
loop will iterate from 1 to 10 here. So the first instance will do the following:- _fcpt1 = put(cpt1, $cpt_fmt.);
b. End the
DO
loop and incrementi
by 1. It will return to the top of the do loop and repeat (e.g., next instance will be 2) until thei
variable exceeds the upper bound of the array (Here,i
will become 11. You can comment out thedrop = i
from theDATA
step above to see).
This creates ten new variables, and each one can accomodate a format of up to 100 characters. It’s also good to cross-walk them this way to ensure you mapped on the formats correctly, but it can take up a lot of space depending on the number of variables. I recommend some other ways below so that you don’t have to permanently alter the values within your data set.
enrolid | cpt1 | cpt2 | _fcpt1 | _fcpt2 |
---|---|---|---|---|
1234567 | 96020 | G0255 | Other diagnostic nervous system procedures | Other diagnostic nervous system procedures |
2234567 | 96105 | 64446 | Other diagnostic nervous system procedures | Insertion of catheter or spinal stimulator and injection into spinal canal |
3322112 | 95999 | 63655 | Other diagnostic nervous system procedures | Laminectomy, excision intervertebral disc |
4221312 | 62267 | 28035 | Other diagnostic nervous system procedures | Decompression peripheral nerve |
4222145 | 95965 | 0434T | Other diagnostic nervous system procedures | Other diagnostic nervous system procedures |
Further, we can see that those with incorrect CPT codes didn’t get mapped.
enrolid | cpt6 | cpt7 | cpt8 | _fcpt6 | _fcpt7 | _fcpt8 |
---|---|---|---|---|---|---|
1234567 | NAAAH | 64461 | Insertion of catheter or spinal stimulator and injection into spinal canal | |||
2234567 | 22861 | Laminectomy, excision intervertebral disc | ||||
3322112 | ??CPT | IHATE | CPTCD | |||
4221312 | 62165 | |||||
4222145 | 0437T |
We can, of course, apply the FORMAT
directly to the dataset variables, but it permanently alters them. You could accomplish that by doing this:
data permanent;
set have_pt;
format cpt: $cpt_fmt.;
run;
A lot less code, but you’ve overwritten everything and now you have a lot of long strings instead of those short codes.
You can always just PROC PRINT
to check a couple observations. Here, I format the first PROC PRINT
, but leave the second untouched.
proc print
data = have_pt (obs = 10);
format cpt: $cpt_fmt.;
run;
proc print
data = have_pt (obs = 10);
run;
But you can apply that format everywhere. We could do a PROC FREQ
with the PLOTS = FREQPLOT
option and apply the format. Let’s look at cpt2
.
proc freq
data = have_pt;
tables cpt2 / plots = freqplot;
format cpt2 $cpt_fmt.;
run;
FYI: you can specify
cpt1 - cpt10
(orcpt:
) on theTABLES
statement to get all of them printed out with a frequency plot. You need to adjust the format toFORMAT
toformat cpt1-cpt10 $cpt_fmt.
(orcpt:
).
Nonetheless, our frequency plot of cpt2
shows this:
Finally, let’s say we want to see the most common CPT codes cross-walked with their respective CCS labels.
First, I need to transpose the data:
proc transpose
data = have_pt
out = have_pt_tp (
rename = (
col1 = cpt_code
_name_ = cpt_position
)
);
by enrolid;
var cpt:;
run;
This gives us this. This is truncated to ten observations:
enrolid | cpt_position | cpt_code |
---|---|---|
1234567 | cpt1 | 96020 |
1234567 | cpt2 | G0255 |
1234567 | cpt3 | 96020 |
1234567 | cpt4 | 96020 |
1234567 | cpt5 | 96020 |
1234567 | cpt6 | NAAAH |
1234567 | cpt7 | 64461 |
1234567 | cpt8 | |
1234567 | cpt9 | |
1234567 | cpt10 | |
2234567 | cpt1 | 96105 |
2234567 | cpt2 | 64446 |
2234567 | cpt3 | 61150 |
2234567 | cpt4 | C9729 |
2234567 | cpt5 | COOL |
Now I’m going to plot it with PROC SGPLOT
.
Pardon my weird formatting - I find that SAS graphing commands/statements are awkward with spacing and difficult to follow.
I tend to add tabs and carriage returns to normalize things. I also add semicolons below the statement to denote exactly where a statement ends intstead of searching for it at the end of a statement.
title "CPT codes crosswalked to respective CCS labels";
proc sgplot
data = have_pt_tp
;
hbar cpt_code / stat = freq
group = cpt_code
datalabel
missing
categoryorder = respdesc
;
format cpt_code $cpt_fmt.
;
run;
title;
And here is that plot.
I hope this was a helpful resource for you all.
If you have any questions or run into any issues, please let me know and I will address them as soon as possible.
The program for this post can be downloaded here.