®

SAS and Scripts: Two Great Tasks That Task Great Together John Lujan, Kaiser Permanente, Oakland, CA

ABSTRACT “Hey, you got your SAS in my Script!” “No, you got your Script in my SAS!” Presenting data in a manner that is meaningful, useful (and familiar) to an analyst’s intended audience presents a unique set of challenges. At Kaiser Permanente’s Quality and Operations Support (QOS), we are often tasked with providing data and metrics to our Physician clients in just such a manner; one that imparts a strong visual component while also conforming to infrastructural dogma. At QOS, infrastructural dogma translates directly to Excel. As such, by “standing on the shoulders of (SAS) giants” we have created and successfully implemented an extremely simple and flexible reporting process that leverages the power of PC SAS, the ubiquity of Excel and the flexibility of Visual Basic Scripting (VBS) to create highly customized, visually impacting Excel based reports. Although this paper will be demonstrating the technique using VBS and Excel, there is no real reason to limit ourselves to these products. The following method can be easily extended or adapted to utilize a myriad of scripting languages and applications.

INTRODUCTION For better or worse, Excel has elbowed its way to the forefront of reporting tools. Many organizations have come to find themselves dependant on Microsoft’s scrappy spreadsheet application to disseminate data and Kaiser Permanente is no exception. Primarily due to Excel’s flexibility and ubiquity, a great deal of Quality and Operations Support’s reporting processes ultimately end in the creation of multiple, unique, Excel based, dashboard style reports. The need to create multiple versions of a report is due to the fact that a majority of QOS’ reporting is concerned with Protected Health Information (PHI)—data that cannot be simply broadcast across the organization but instead always kept highly secure. As a result of PHI data’s required level of security, report metrics must be rolled up in various manners, from facility level all the way down to provider level rollups. Because of the many levels data is rolled up, the data that is written to each individual report changes considerably from release to release. Of course the calculated report metric data itself will shift from release to release but it is the dynamic nature of report metric metadata, such as the change in the number of observations and variables between releases that is more challenging to deal with, especially when report specifications call for bells and whistles like conditional formatting of calculated data output (as they often do). In the past, much of the Excel formatting and data placement was done by hand, rendering the above reporting process time consuming, labor intensive and error prone. The following methodology, adapted and extended from ideas gleaned from two SUGI papers (see the Recommended Reading section) effectively automates Excel reporting processes through use of the SAS v9 Excel LIBNAME engine, data set metadata and thoughtful usage of Visual Basic Script “embedded” with SAS macro variables. In order to demonstrate this method, a sample “dynamic” dashboard report will be created for an imaginary health care organization. The report will be dynamic in the sense that it will be built to accept and accommodate an arbitrarily sized n x m (n facilities, m metrics) SAS data set of computed health care metrics; each metric value will be conditionally formatted based on whether or not a metric target has been met. Two SAS data sets will provide the metric and target data (named metric_set and target_set, respectively). A generic Excel template will be utilized as a flexible foundation and VBS will be used to format the report. Figure 1 illustrates the process.

1

SAS data set: metric_set

SAS data set: target_set

SAS Process: 1. Get metadata 2. Create VBS based on metadata

VBS Process: 1. Copy Template 2. Format Excel Wrkbk

Formatted, unpopulated Excel Report

SAS Process: Write data to Excel

Formatted, populated Excel report

Excel Template

Figure 1. Process Overview It should be stated that this process was developed using PC SAS 9.1.3 and Excel 2003. The following methodology will have to be modified slightly if different versions of SAS and/or Excel are to be used.

EXCEL NAMED RANGES AND THE SAS V9 EXCEL LIBNAME ENGINE For this discussion, a short primer on Excel named ranges and the SAS v9 Excel LIBNAME engine is warranted. In Excel, a named range is exactly what it purports to be, a cell or a block of cells that has been assigned a text name and then may be referenced by its assigned name rather than the cell address(es). Creating a named range in Excel is exceedingly simple. The quickest method to name a range is to select a cell (or cells) and assign it (them) a name by typing a string into Excel’s Name Box, which by default is directly above the top-left corner of the cell grid. Briefly consider the Excel LIBNAME engine syntax and usage to replace data in Excel named ranges:

libname xls excel ‘\\path\report_template.xls’ ver=2002; proc datasets library=xls; delete EXCEL_NAMED_RANGE; quit; data xls.EXCEL_NAMED_RANGE (dblabel=yes); set SAS_DATA_SET; run; libname xls clear;

The above code: 1.

opens the report_template.xls using a LIBNAME statement, using the VER= option to specify Excel 2002 compatibility (regarding the VER= option, specify the version that most closely matches your Excel installation)

2.

erases data that might be residing in EXCEL_NAMED_RANGE with a DATASETS Procedure DELETE statement (note that the named range itself is not deleted, only the data)

3.

copies the data contained in SAS_DATA_SET, including the variable labels via the DBLABEL= option, to EXCEL_NAMED_RANGE (the Excel LIBNAME engine will automatically resize named ranges in Excel to the required dimensions)

4.

closes report_template.xls

It is important to note that when copying a data set to a named range in Excel via the Excel LIBNAME engine, the structure of the data set will largely stay in tact. That is to say, data formats and the order of data set variables will be written to Excel as they exist in the originating SAS data set.

2

THE EXCEL TEMPLATE As stated previously, a simple Excel template and Visual Basic Script will be used to create a dynamic and customized dashboard report. Figure 2 illustrates how the production dashboard should ultimately look. The elements of the Excel template that will be affected by scripting include: •

The report title—centered on the report display columns (A).



Colorfully formatted column headers (B).



A target for each metric (C).



An n x m data matrix, conditionally formatted to display blue or red depending whether or not the metric target was met or missed by the detailed Facility (D).

Now that the report specifications have been defined, a generic template can be created.

Figure 2. Dashboard Mockup When creating the report template, the fact that the Facility metric data set will be an arbitrary size (n x m) must always be kept in mind. In fact, it is the arbitrary size of the metric data set (and thus an arbitrary number of cells in Excel that must be formatted) that provides the main challenge in creating this particular report. Due to the uncertainty of the size of the Facility metric data set, the template will be created with only three initial columns: the Facility name column and two metric columns. Visual Basic Script will be utilized alongside the Excel LIBNAME engine to write the data and format the various cells as required. Figure 3 details the generic template’s layout, which is simply a partially formatted skeleton of the full dashboard report which contains two named ranges. Some items of note: •

The named range for the data matrix, “METRIC_RNG” (selected in Figure 2) is defined in the range B8:D9 (A).



The named range for the target data, “TARGET_RNG” is defined in the range C6:D7 (B). The first row of “TARGET_RNG” is not meant to be displayed—this row will be removed later via the scripting process.



The cells in the first row of each named range have placeholder values. If placeholder values are not present in the first row of a named range, an error might occur when using the Excel LIBNAME engine to replace the named range data (A) (B).



The column header text (C) is Figure 3. Generic Template referenced from the first row of the “METRIC_RNG“ named range (i.e. cell C5 contains the formula “=C8”). The referenced text (C8 and D8) is formatted white in order to “hide” the text for a cleaner looking layout.



The background color of the column header cells (C) is white in the generic template. During the scripting process, background colors will be applied to the metric column header cells.

3



There is report title placeholder text in cell B2 (D). During the scripting process, we will replace the placeholder text with a more descriptive report title.

Cell number formatting (General, Percentage, etc.), border formatting and text formatting (font sizes, colors, styles, etc.) are finalized in the generic template. Further, the red or blue conditional formatting of the data matrix cells is in place in the range C9:D9. The conditional formatting formula is dependant on the metric target values that will be written to Row 7. During the scripting process, this conditional formatting will be expanded to accommodate the number of cells the input data set requires. Resizing column A and Row 1 to five pixels and starting the actual display in column B is functionally unnecessary and was purely a matter of taste. The template was built in this manner to reduce the number of Visual Basic Script operations that must be performed in order to format the final report.

SAS DATA SET PREPARATION The dashboard report will make use of two SAS data sets, one for metric targets and the other for the metric data itself. There are really only two major considerations that must be made when building the input data sets for this particular report. First, for obvious reasons, the order of the variables in the metric data set must match the order of the variables in the target data set. Second, descriptive labels must be defined for the metric data set since our template uses the SAS data set labels written to the “METRIC_RNG” named range to display column headers. One quick note about the data set variable labels. It is inadvisable to use a string containing a “.” (dot) or “!” (exclamation point) in the variable labels as their inclusion might cause an Excel LIBNAME engine error to occur. However, if a “#” (hash) is substituted for a “.” in the data set variable label, the “#” will be written as a “.” in the Excel named range. For the purposes of this discussion, a SAS data set containing eight variables (seven metrics and a facility name) and ten observations (one per facility) was constructed and named metric_set. The metric variables in metric_set were populated with randomly generated numbers. A target data set, named target_set, was also constructed. The target set contains one observation and seven variables (one variable for each metric). The variable values were created by averaging the randomly generated values for each metric.

THE SCRIPTING CONSTRUCT The main driver of the whole process is surprisingly simple. In a nutshell, we use a FILE statement and a PUT statement within a DATA step to create a script file and subsequently execute the script file via the X command. For example: data _null_; file ‘\\path_to_script_file\script_file.ext’; put ‘script code’; run; x ‘\\path_to_script_file\script_file.ext’; It’s time to put this construct to work!

SAS CODE AND CONSTRUCT USAGE The SAS code below has been segmented into three sections: Setup, VBS creation/execution and data insertion. The following code makes two assumptions: it is assumed that an appropriate folder hierarchy exists on C:\ and that the input (metric and target) SAS data sets have been created and named metric_set and target_set. The first section, “Setup”, performs three main duties: options are declared, the paths to the various project folders and the report title string are stored in macro variables and a macro is employed to count the number of observations and variables in the metric data set (the obvar_count macro is a slightly modified version of a macro that can be found in the online SAS 9.2 Documentation). The NOXWAIT option instructs SAS not to wait for the user to manually exit the X command’s command prompt before proceeding to the next procedure or DATA step and will instead automatically exit the command prompt upon completion of the X command’s given task.

4

/******************************************** * SECTION 1 – Setup * ********************************************/ options macrogen symbolgen noxwait; /*dashboard title*/ %let dash_title = Eastern Kingdoms: Facility Level Metrics; %let %let %let %let %let

homefldr = C:\WUSS_2010\; data_in_fldr = &homefldr.data_in\; data_out_fldr = &homefldr.data_out\; template_fldr = &homefldr.template\; vbs_fldr = &homefldr.vbs\;

libname datafldr "&data_in_fldr"; /*macro to count observations and variables in data sets*/ %macro obvar_count(data_in); %global n_vars n_obs; %let data_in_id = %sysfunc(open(&data_in)); %if &data_in_id %then %do; %let n_obs =%sysfunc(attrn(&data_in_id,nobs)); %let n_vars=%sysfunc(attrn(&data_in_id,nvars)); %let return_code = %sysfunc(close(&data_in_id)); %end; %else %do; %put ERROR: Open for data set & data_in_id failed - %sysfunc(sysmsg())!; %abort; %end; %mend obvar_count; /*call obvar_count*/ %obvar_count(datafldr.metric_set)

The second section, “Create and execute VBS”, contains the scripting construct. In depth exploration of the syntax of the Visual Basic Script itself is largely beyond the scope of this discussion. However, an individual with even limited experience with Visual Basic or Visual Basic for Applications should be able to ascertain the script’s operations with minimal effort. What is definitely not beyond the scope of this discussion is the placement of the various macro variables within the PUT statement of the DATA step.

/******************************************** * SECTION 2 – Create and execute VBS * ********************************************/ /*create VBS to make a copy of the generic template and format the copy for data insertion and output*/ data _null_; file "&vbs_fldr.modify_template.vbs"; put ‘Const OverwriteExisting = True’ / ‘Set FSO = CreateObject("Scripting.FileSystemObject")’ / ‘FSO.CopyFile "‘"&template_fldr.template.xls"’", _’ / ‘ "‘"&data_out_fldr.final dash.xls"’", _’ / ‘ OverwriteExisting’ / ‘Set XL = CreateObject("Excel.Application")’ / ‘XL.Visible = True’ / ‘XL.DisplayAlerts = False’ / ‘XL.Workbooks.Open("‘"&data_out_fldr.final dash.xls"‘")’ / ‘XL.Sheets("DASHBOARD").Activate’ / ‘XL.ActiveSheet.Range("D1:D9").Select’ / ‘XL.Selection.AutoFill XL.Range("D1:D9").Resize(9,’"&n_vars"‘ - 2), _’ / ‘ xlFillDefault’ / ‘For i = 3 to (‘"&n_vars"‘ + 1)’ /

5

‘ XL.ActiveSheet.Columns(i).ColumnWidth = 12’ / ‘ j = (i - 2) Mod 7’ / ‘ Select Case j’ / ‘ Case 1’ / ‘ colorCode = 34’ / ‘ Case 2’ / ‘ colorCode = 35’ / ‘ Case 3’ / ‘ colorCode = 36’ / ‘ Case 4’ / ‘ colorCode = 37’ / ‘ Case 5’ / ‘ colorCode = 38’ / ‘ Case 6’ / ‘ colorCode = 39’ / ‘ Case 0’ / ‘ colorCode = 40’ / ‘ End Select’ / ‘ XL.ActiveSheet.Cells(5,i).Interior.ColorIndex = colorCode’ / ‘Next’ / ‘XL.ActiveSheet.Cells(3, 2).Value = "’”&dash_title"’”’ / ‘XL.ActiveSheet.Range(XL.ActiveSheet.Cells(3, 2), _’ / ‘ XL.ActiveSheet.Cells(3,’"&n_vars"‘ + 1)).Merge’ / ‘XL.ActiveSheet.Cells(3, 2).HorizontalAlignment = -4108’ / ‘XL.ActiveSheet.Range("B9").Resize(1,’"&n_vars"‘).Select’ / ‘XL.Selection.Copy’ / ‘Set pasteRng= XL.ActiveSheet.Range("B9").Resize(‘"&n_obs"‘,’"&n_vars"‘)’ / ‘pasteRng.PasteSpecial -4122’ / ‘XL.Activesheet.Rows(6).Hidden = True’ / ‘XL.ActiveSheet.Cells(1,1).Select’ / ‘XL.ActiveWorkbook.Save’ / ‘XL.ActiveWorkbook.Close’ / ‘XL.DisplayAlerts = True’ / ‘XL.Quit’; run; /*Execute VB script*/ x "&vbs_fldr.modify_template.vbs"; In short, the macro variables declared in SECTION 1 (n_vars, n_obs, etc.) are used in SECTION 2’s DATA step to embed file folder locations, the report title string and dimensional metadata from the metric_set SAS data set into a dynamically generated Visual Basic Script. It is the embedding of the macro variables within the VBS that gives this method its flexibility and power. The only real caveat is that the programmer must have an understanding of how the single quote and double quote characters are appropriately employed in SAS. For simplicity’s sake, strings that include special characters are avoided in the code sample above but through use of character masking SAS functions such as %STR, more complex strings are certainly possible. The X command is resident in section 2 as well. Recall that the X command is used here to execute the Visual Basic Script. Figure 4 details how the final dash appears after the execution of the VBS. All of the final formatting has been performed and the dash looks like the mockup, minus the data.

Figure 4. Dashboard After VBS Execution

6

The third section of the SAS code simply employs the Excel LIBNAME engine to write data to the newly formatted dashboard report’s named ranges. /******************************************** * SECTION 3 – Write data to worksheet * ********************************************/ /*clear xls libname*/ libname xls clear; /*assign xls libname to process dash*/ libname xls excel "&data_out_fldr.final dash.xls" ver=2002; /*delete data from process dash named ranges*/ proc datasets lib=xls; delete METRIC_RNG TARGET_RNG; quit; /*write data to target named range*/ data xls.TARGET_RNG; set datafldr.target_set; run; /*write data to metric named range*/ data xls.METRIC_RNG (dblabel=yes); set datafldr.metric_set; run; /*clear xls libname*/ libname xls clear; Observe rows 6 and 7 of the final dashboard in Figure 5; the Excel LIBNAME engine will write data to Excel named ranges even if one or more rows are hidden.

Figure 5. The Final Dashboard 7

The dashboard is complete, formatted and ready for delivery! What’s more, next period when the next dashboard is due, no modifications will need to be made to the SAS code that was used to format and write data to this period’s dashboard despite input data set or metadata changes. The scripting construct and its supporting code will perform as intended regardless.

CONCLUSION Of course there are a number of alternate methods and tools available to SAS programmers that will achieve the same results as the aforementioned process. However, the real strength in the scripting construct presented in this paper is its flexibility of use. Obviously, the scripting construct is great for a Windows/Microsoft Office/VBS process but in no way should it be limited as such. The scripting construct could easily be adapted to build and execute scripts in other languages such as Perl, JavaScript or Python as well as other-than-scripting uses like dynamic HTML and Latex generation—the possibilities are (almost) infinite.

REFERENCES SAS Institute. “Macro Functions: %SYSFUNC and %QSYSFUNC Functions” SAS(R) 9.2 Macro Language: Reference.

ACKNOWLEDGMENTS Thanks to Lynn Emerson, Jason Huang, Quality and Operations Support and Kaiser Permanente. Special thanks to Patrick F O’Neill for his expertise and ideas.

RECOMMENDED READING •

Choate, Paul A, Martell, Carol A. De-Mystifying the SAS® LIBNAME Engine in Microsoft Excel: A Practical Guide, Proceedings of the Thirty-first Annual SAS® Users Group International Conference, 2006



Conway, Ted. Sur La Table: Creating Microsoft Excel PivotTables in a Jiffy from SAS® Data, Proceedings of the Thirty-first Annual SAS® Users Group International Conference, 2006

CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: John Lujan Kaiser Permanente, Quality and Operations Support 1800 Harrison St. Oakland, CA 94612 [email protected] SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

8