Using SAS Macro to Create Data Driven Format Statement for Clinical Data Flagging

Pilita Canete, EDP Contract Services, Bala Cynwyd, PA
Melanie Paules, SmithKline Beecham Pharmaceuticals, Collegeville, PA
Shi-Tao Yeh, EDP Contract Services, Bala Cynwyd, PA

ABSTRACT
The idea of data flagging is simple enough. It is a process of categorizing the data into ranges based on the study-defined significant values. One can use a variety of SAS methods, such as IF/THEN or SELECT/WHEN, to do this grouping. However, these methods entail writing new code every time the values change. This paper presents a method of accomplishing data flagging by simply specifying the range(s) of interest in a macro call. 'Range of interest' is defined here as the limits that are deemed significant for the study population. The limits can be user-defined or can be derived from the data itself. For purposes of this paper, the concept is applied to a clinical scenario; that is, the focus is on clinical data flagging based on percentiles. The paper shows a method of flagging the data based on the 5th and 95th percentiles, whereby values below the 5th percentile are flagged as Low, values above the 95th percentile as High, and values within the 5th and 95th percentiles as Normal. The paper goes a step further by giving an example in which the data are dynamically divided into equal-sized groups.

INTRODUCTION
In a clinical trial, data such as laboratory results, vital signs, and electrocardiograms are collected for analysis. Clinical data flagging is a programmatic process that marks each record in the database with derived flags according to specific criteria. The flagged data can then be used to answer questions of the following types: 1) How many patients are outside the normal range? 2) How many patients have values high or low enough to cause clinical concern about their health or welfare?

Clinicians usually provide the normal and clinical concern ranges. In some studies, however, the ranges are derived from the collected data itself. The clinical problem then translates to defining clinical ranges pertinent to a particular study and flagging the data accordingly. This implies that a new range format has to be created for every new study and for every distinct variable, which is tedious if done manually. The data driven flagging process uses the percentiles from the data distribution to construct the ranges and add the flags to each record. This paper describes a macro that reads the data and flags it according to user-defined percentiles.

This paper is comprised of five sections. Section 1 includes the abstract and the introduction. Section 2 describes a macro for flagging three range intervals. Section 3 is devoted to a macro for flagging five range intervals. Section 4 deals with a macro for equal-sized grouping. The last section concludes the paper.

SAS MACRO FOR THREE RANGE INTERVALS
The UNIVARIATE procedure computes a variety of quantiles and measures of location, and outputs the values to a SAS data set. The following SAS macro uses the user-defined upper and lower percentiles to compute the corresponding statistics of the data distribution. The data steps then assign the values of the statistics to macro variables. The macro variables are used to produce value labels and to generate the SAS code for a PROC FORMAT step. The resulting format is used to flag the clinical data.

libname mart '/home/yehs1/mart';

%macro range1(dsin=, where=, val=, fname=, var1=, var2=);
  /* Subset the input data and keep only the measurement variable */
  data f1; set &dsin; &where; keep &val;
  /* Compute the requested lower and upper percentiles */
  proc univariate data=f1 noprint;
    var &val;
    output out=p1 &var1=&var1 &var2=&var2;
  proc transpose data=p1 out=tp1(keep=col1);
  proc sort data=tp1; by col1;
  /* Middle range: lower percentile through upper percentile */
  data p2(keep=range); set p1;
    length range $20.;
    range = compress(&var1) || ' - ' || compress(&var2) || ' = ' || "'I'";
  /* Outer ranges: shift each cut-off by 0.1 so the ranges do not overlap */
  data tp2(keep=range); set tp1;
    length range $20.;
    if _n_ = 1 then do;
      col1 = col1 - 0.1;
      range = 'LOW' || '- ' || compress(col1) || ' = ' || "'L'";
    end;
    else do;
      col1 = col1 + 0.1;
      range = compress(col1) || ' - ' || 'HIGH' || ' = ' || "'H'";
    end;
  data p3; set p2 tp2;
  /* Store each range in macro variables p1, p2, ... and their count in pnum */
  data _null_; set p3 nobs=_nobs;
    call symput("p" || compress(put(_n_, 10.)), range);
    call symput("pnum", put(_nobs, 4.));
  run;
  %let f1 = "proc format;";
  %let f2 = " value &fname";
  /* Write the PROC FORMAT program to the file &fname..sas */
  data _null_;
    file "&fname..sas";
    put &f1;
    put &f2;
    %do i=1 %to &pnum;
      put " &&p&i ";
    %end;
    put " ; ";
  run;
%mend range1;
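The percentile step at the heart of the macro can be tried in isolation. The following is a minimal sketch rather than the authors' code: the data set toy and its values are hypothetical, the macro variable names lo and hi are assumptions, and CALL SYMPUTX (available in SAS 9 and later) is used in place of CALL SYMPUT because it trims blanks automatically.

```sas
/* Hypothetical systolic readings; not data from the paper */
data toy;
  input vit_val @@;
  datalines;
98 104 110 115 118 120 122 125 128 131 135 140 146 152 160
;
run;

/* Write the 5th and 95th percentiles to a one-row data set */
proc univariate data=toy noprint;
  var vit_val;
  output out=pct p5=p5 p95=p95;
run;

/* Copy the cut-offs into macro variables (names lo and hi are assumptions) */
data _null_;
  set pct;
  call symputx('lo', p5);
  call symputx('hi', p95);
run;

%put NOTE: lower cut-off=&lo upper cut-off=&hi;
```

Once the cut-offs are held in macro variables, they can be pasted into any generated code, which is what range1 does when it writes the PROC FORMAT program.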

The parameters used in range1 are:
dsin = read in data set name,
where = where clause to subset data,
val = variable name that contains measurement,
fname = SAS program name that contains PROC FORMAT statements,
var1 = lower percentile,
var2 = upper percentile.

The following examples show how to invoke this macro. A vital signs data set contains systolic (SYS) and diastolic (DIA) blood pressure data. Records falling below the lower percentile (p5) are flagged 'L', and records falling above the upper percentile (p95) are flagged 'H'. Records with values between p5 and p95 are flagged 'I'. The first example selects vital signs parameter 'DIA' for flagging.

data vital;
  set mart.vital;
  if vit_prmc in ('SYS', 'DIA');
run;
%range1(dsin=vital, where=if vit_prmc='DIA', val=vit_val, fname=dia, var1=p5, var2=p95);
/* Run the generated program so that the format dia. is defined */
%include 'dia.sas';

After execution of macro range1, a SAS program dia.sas containing the PROC FORMAT statements is created automatically.

The following example selects vital signs parameter 'SYS' for flagging.

%range1(dsin=vital, where=if vit_prmc='SYS', val=vit_val, fname=sys, var1=p5, var2=p95);
/* Run the generated program so that the format sys. is defined */
%include 'sys.sas';

A SAS program sys.sas containing the PROC FORMAT statements is also created. A data step follows to assign a flag to each record.

data vital (keep=vit_prmc vit_val flag);
  set vital;
  length flag $2.;
  if vit_prmc='SYS' then flag = put(vit_val, sys.);
  else if vit_prmc='DIA' then flag = put(vit_val, dia.);
proc print data=vital (obs=100); run;

Figure 1. Sample Output from Macro Range1
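As an illustration of the kind of program range1 writes and of how the resulting format flags values, consider the sketch below. The cut-offs 94 and 152 are illustrative assumptions, not output reproduced from the paper, and the data set check is hypothetical.

```sas
/* A three-range format of the shape range1 generates.
   The cut-offs 94 and 152 are illustrative assumptions. */
proc format;
  value sys
    94 - 152     = 'I'
    LOW - 93.9   = 'L'
    152.1 - HIGH = 'H'
  ;
run;

/* Apply the format to a few hypothetical readings */
data check;
  input vit_val @@;
  length flag $2;
  flag = put(vit_val, sys.);
  datalines;
90 94 120 152 153
;
run;

proc print data=check; run;
```

With these ranges, 90 is flagged L, the values 94, 120 and 152 are flagged I, and 153 is flagged H. Changing the cut-offs only requires regenerating the format, while the flagging data step stays the same.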

SAS MACRO FOR FIVE RANGE INTERVALS
The following macro, range2, extends macro range1 to allow five range intervals.

%macro range2(dsin=, where=, val=, fname=, var1=, var2=, var3=, var4=);
  data f1; set &dsin; &where; keep &val;
  proc univariate data=f1 noprint;
    var &val;
    output out=p1 &var1=&var1 &var2=&var2 &var3=&var3 &var4=&var4;
  proc transpose data=p1 out=tp1(keep=col1);
  proc sort data=tp1; by col1;
  /* Inner boundaries shifted by 0.1 so adjacent ranges do not overlap */
  data p3; set p1;
    p951 = &var3 + 0.1;
    p51 = &var2 - 0.1;
  /* The three inner ranges: I, H and L */
  data p2(keep=range); set p3;
    length range $20.;
    range = compress(&var2) || ' - ' || compress(&var3) || ' = ' || "'I'"; output;
    range = compress(p951) || ' - ' || compress(&var4) || ' = ' || "'H'"; output;
    range = compress(&var1) || ' - ' || compress(p51) || ' = ' || "'L'"; output;
  /* The two outer ranges, built from the smallest and largest percentile */
  data tp2(keep=range); set tp1;
    length range $20.;
    range = ' ';
    if _n_ = 1 then do;
      col1 = col1 - 0.1;
      range = 'LOW' || ' - ' || compress(col1) || ' = ' || "'-'";
    end;
    else if _n_ = 4 then do;
      col1 = col1 + 0.1;
      range = compress(col1) || '- ' || 'HIGH' || ' = ' || "'+'";
    end;
  data p4; set p2 tp2;
    if range = ' ' then delete;
  proc print; run;
  data _null_; set p4 nobs=_nobs;
    call symput("p" || compress(put(_n_, 10.)), put(range, $20.));
    call symput("pnum", put(_nobs, 4.));
  run;
  %let f1 = "proc format;";
  %let f2 = " value &fname";
  data _null_;
    file "&fname..sas";
    put &f1;
    put &f2;
    %do i=1 %to &pnum;
      put " &&p&i ";
    %end;
    put ";";
  run;
%mend range2;

The parameters used in range2 are:
dsin = read in data set name,
where = where clause to subset data,
val = variable name that contains measurement,
fname = SAS program name that contains PROC FORMAT statements,
var1 = lower percentile,
var2 = second lower percentile,
var3 = second upper percentile,
var4 = upper percentile.

data vital;
  set mart.vital;
  if vit_prmc in ('SYS', 'DIA');
run;
%range2(dsin=vital, where=if vit_prmc='SYS', val=vit_val, fname=sys, var1=p1, var2=p5, var3=p95, var4=p99);
/* Run the generated program so that the format sys. is defined */
%include 'sys.sas';
%range2(dsin=vital, where=if vit_prmc='DIA', val=vit_val, fname=dia, var1=p1, var2=p5, var3=p95, var4=p99);
%include 'dia.sas';

data vitflag(keep=vit_prmc vit_val flag);
  set vital;
  length flag $2.;
  if vit_prmc='SYS' then flag = put(vit_val, sys.);
  else if vit_prmc='DIA' then flag = put(vit_val, dia.);
proc print data=vitflag; run;
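Both macros shift adjacent boundaries by 0.1 to keep the ranges from overlapping. If a measurement carries more than one decimal place, a value can fall between two ranges and match neither. One possible alternative, not used in the paper, is PROC FORMAT's range exclusion operators (-< and <-). The sketch below uses illustrative cut-offs, a hypothetical format name sysx, and a hypothetical data set gaps.

```sas
/* Contiguous five-range format using exclusion operators.
   Cut-offs (88, 94, 152, 164) are illustrative assumptions. */
proc format;
  value sysx
    low -<  88  = '-'   /* below 88, excluding 88        */
    88  -<  94  = 'L'   /* 88 up to but excluding 94     */
    94  -  152  = 'I'   /* 94 through 152, both included */
    152 <- 164  = 'H'   /* above 152 through 164         */
    164 <- high = '+'   /* above 164                     */
  ;
run;

/* Values that would fall into a 0.1-wide gap are still flagged */
data gaps;
  input vit_val @@;
  length flag $2;
  flag = put(vit_val, sysx.);
  datalines;
87.95 93.95 152.05
;
run;

proc print data=gaps; run;
```

Here 87.95 is flagged '-', 93.95 is flagged 'L', and 152.05 is flagged 'H'. Whether this matters depends on the precision of the collected measurements.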

The generated programs dia.sas and sys.sas contain:

proc format;
  value dia
  58 - 90 = 'I'
  90.1 - 96 = 'H'
  48 - 57.9 = 'L'
  LOW - 47.9 = '-'
  96.1- HIGH = '+'
  ;

proc format;
  value sys
  94 - 152 = 'I'
  152.1 - 164 = 'H'
  88 - 93.9 = 'L'
  LOW - 87.9 = '-'
  164.1- HIGH = '+'
  ;

SAS MACRO FOR EQUAL-SIZED GROUPING
When the data set needs to be divided into approximately equal-sized groups, PROC RANK is useful for grouping clinical data into equal-sized ranges. The GROUPS=n option in the PROC RANK statement produces grouping scores, where n is the number of groups. The lowest values are in the first group; the highest values are in the last group.

%macro rank1(dsin=, where=, val=, groupn=, fname=);
  data dsin; set &dsin; &where; keep &val;
  proc rank data=dsin groups=&groupn ties=mean out=dsout;
    var &val;
    ranks ranka;
  run;
  proc sort data=dsout; by ranka &val; run;
  /* Keep the largest value in each group as that group's upper bound */
  data dsout; set dsout;
    by ranka &val;
    if last.ranka;
  /* Lower bound = previous group's upper bound + 0.1 (0 for the first group) */
  data dsout; set dsout;
    lag1 = lag(&val) + 0.1;
    if lag1 = . then lag1 = 0;
    rankc = "'" || ranka || "'";
    rankb = compress(lag1) || ' - ' || compress(&val) || '=' || compress(rankc);
  run;
  data _null_; set dsout nobs=_nobs;
    call symput("p" || compress(put(_n_, 10.)), put(rankb, $20.));
  run;
  %let f1 = "proc format;";
  %let f2 = " value &fname";
  data _null_;
    file "&fname..sas";
    put &f1;
    put &f2;
    %do i=1 %to &groupn;
      put " &&p&i ";
    %end;
    put ";";
  run;
%mend rank1;

data vital;
  set mart.vital;
run;
%rank1(dsin=vital, where=if vit_prmc='SYS', val=vit_val, groupn=10, fname=sys);
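The grouping step on which rank1 relies can also be examined on its own. The sketch below is an illustration, not part of the paper's macro. It ranks a small hypothetical data set into four groups and lists the smallest and largest value in each group. The names toy, ranked, bounds and grp are assumptions.

```sas
/* Hypothetical readings; not data from the paper */
data toy;
  input vit_val @@;
  datalines;
101 105 110 112 118 121 125 130 137 142 150 158
;
run;

/* GROUPS=4 assigns group scores 0 through 3, lowest values in group 0 */
proc rank data=toy groups=4 ties=mean out=ranked;
  var vit_val;
  ranks grp;
run;

/* Smallest and largest value in each group */
proc means data=ranked noprint nway;
  class grp;
  var vit_val;
  output out=bounds min=lo max=hi;
run;

proc print data=bounds;
  var grp lo hi;
run;
```

The hi column corresponds to the upper bounds that rank1 keeps with last.ranka. Note that heavily tied data can yield fewer distinct groups than requested.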

/* Run the generated program so that the format sys. is defined */
%include 'sys.sas';
data vitrank;
  set vital;
  length rank $2.;
  if vit_prmc='SYS' then do;
    rank = put(vit_val, sys.);
    output;
  end;
  keep vit_prmc vit_val rank;
proc print data=vitrank(obs=40); run;

CONCLUSION
Data flagging is one of the most important data summary methods used in reporting clinical data. It is applied to clinically important parameters such as laboratory results, vital signs, and electrocardiograms, to name a few. To do the clinical flagging, separate code has to be written for each parameter that uses a different clinical reference range. Furthermore, if the clinical limits depend on the data distribution and are hard-coded, the code may need to be modified repeatedly. Doing that for laboratory parameters, which can run into the hundreds, is not very efficient. This paper showed a method of accomplishing this task by simply specifying the range of interest and passing it into a macro. The paper showed how to do this dynamically by using percentiles and PROC RANK. As far as data flagging is concerned, the benefits of this approach can be summarized as follows:
- less repeated code;
- ranges are not limited to specific values but can use the data distribution (i.e., percentiles, PROC RANK) to dynamically generate clinical cut-off values.

SAS and Microsoft are registered trademarks of SAS Institute Inc. and Microsoft Inc., respectively, in the USA and other countries. ® indicates USA registration.

Authors
Melanie Paules (610)917-5104(W)
Pilita Canete (610)917-6909(W)
Shi-Tao Yeh, Ph.D. (610)917-5883(W)