GRAPH )

Paper TT17 Stellar Graphics Using SAS® (but not necessarily SAS/GRAPH®) P. Chris Holland 1 , U.S. Food and Drug Administration, Silver Spring, MD Yan...
Author: Janel Baker
16 downloads 3 Views 747KB Size
Paper TT17

Stellar Graphics Using SAS® (but not necessarily SAS/GRAPH®) P. Chris Holland 1 , U.S. Food and Drug Administration, Silver Spring, MD Yan Zhu, Rockville, MD ABSTRACT

SAS® has come a long way since its old days of text-based scatter plots. With Version 9, high-resolution graphics via SAS/GRAPH® are now virtually ubiquitous throughout SAS statistical procedures and the output delivery system (ODS). These tools can help advance SAS’s stature for producing publication-quality graphics. There are still, however, other lesser-known alternatives that, despite their ease-of-use and high-quality, continue to hide in the shadows of the more cutting-edge pure SAS-based solutions. This paper demonstrates some of these alternatives and compares them with those that instead involve exclusive use of SAS/GRAPH for producing graphical output.

INTRODUCTION As far as SAS/GRAPH software has come in the past few years, it still has its limitations. The almost limitless amount of available customizations that SAS programmers have at their disposal comes at the price of making simple, nicelooking output sometimes hard to achieve without a tried-and-true template to start with. As a result, other packages are often used in lieu of SAS/GRAPH for graphical summaries. Two such packages are Microsoft Excel (known for it’s ease of use and wide availability) and R, an open-source statistical package that is similar to S and S-plus. The problem with using outside packages, however, is the seam requiring human intervention in order to transfer data from SAS to the non-SAS software. This paper looks at some SAS solutions that make communicating with and transferring data to other software packages more seamless. These solutions use features like Dynamic Data Exchange (DDE) for communicating with Windows® software (such as Excel), PROC EXPORT for creating data files that other packages can read and/or open, and the X command for running other packages via SAS. This paper will explore 3 options for using SAS to create 4 types of graphical output. The 3 options include pure SAS/GRAPH output, SAS-controlled Excel graphics, and SAS-controlled R graphs. The 4 types of graphical output will include scatter plots, bar charts, Kaplan-Meier curves, and lattice plots. All solutions will be geared toward a “production environment”, meaning that the code is meant to be run in batch mode with the objective of creating a number of different graphic files automatically—with no manual intervention during program execution. All resulting figures appear in the appendix. Full program code and the output files are available online at http://www.hollandhut.com/pharmasug06.

THE SCATTER PLOT The scatter plot has been one of the most common diagnostic tools for statisticians for quite some time. Although it is probably most often used for simple, crude looks at data distributions, there is sometimes the need to refine such output for presentation purposes. Three methods for producing such output are demonstrated below. SAS/GRAPH Output The advantage to a pure SAS solution for producing graphics is the lack of a need for any sort of data importing or exporting. The following SAS/GRAPH code creates a scatter plot of the relationship between study subjects’ baseline values for a given efficacy endpoint and the post-baseline results following treatment with one of two study medications—a placebo and the experimental treatment: 101 102 103 104 105 106 108 109 110 111 112 113 114

%let path = [insert path name here]; libname library "&path"; proc format; value ntrt 0 = 'Placebo' 1 = 'Test Drug' ; run; data scatter; set library.scatter; run; filename scatter "&path.\sas-graph-scatter.cgm";

1

Disclaimer: Views expressed in this paper are those of the author and not, necessarily, of the Food and Drug Administration and must not be taken to represent policy or guidance on behalf of the FDA.

1

115 116 117 118 119 120 121 123 124 125 126 127 128 129 121 122 123 124 125 126 127 129 130 131 132 133

goptions device=cgmof97l vsize=4.5in hsize=6.5in chartype=6 gsfmode=replace gsfname=scatter; symbol1 ci=red v=star; symbol2 ci=blue v=circle; legend1 value=(h=1.0) label=(h=1.0 "Treatment Group") frame position=(bottom center outside) ; axis1 value=(h=1.0) label=(h=1.0 'Baseline Value'); axis2 value=(h=1.0) label=(a=90 h=1.0 "Follow-Up Value"); title1 h=1.2 font="TimesRomanBold" "PROC GPLOT Scatter Plot Output"; proc gplot data = scatter; plot value * baseline = trt / haxis=axis1 vaxis=axis2 legend=legend1; format trt ntrt. ; run; quit;

Perhaps the most important part of creating SAS/GRAPH output is the choice of the device driver, as specified with the DEVICE= option in the GOPTION statement. A list of all available devices that one has to choose from can be generated by simply running PROC GDEVICE. A computer graphics metafile (CGM) is a particularly useful format for creating output intended for an MS Office product such as Word or Power Point. The device being used throughout this paper, cgmof97l, creates such a file. The resulting output is Figure 1.1 in the Appendix. DDE and Excel As mentioned before, the options available with SAS/GRAPH output are almost limitless. Finding out how to implement these options, however, can sometimes require a limitless amount of time. This is why MS Excel is so useful for creating graphs. It has a “Chart Wizard” for creating nice-looking plots very easily, and then has point-andclick functionality for adding customizations to the Chart Wizard’s default results. Using Excel to produce graphics, however, does not necessarily mean having to create each one by hand. By using an Excel chart as a template and SAS’s DDE functionality, SAS can be used to create countless Excel graphs automatically. With this approach, we first create a template Excel file with sample data and a pre-created Excel scatter plot with all of the options we want. SAS is then used to open up the Excel file template and insert new data into it. The file is then saved with a new name and closed via the DDE syntax. One approach for creating the Excel template is to simply experiment with the Excel Chart Wizard. Once a basic chart is created via the Wizard, customizations such as colors and plotting symbols can be made. The figure below displays the spreadsheet data structure and the dialog box for defining the range of data that the chart will use. Since we are using this initially as a template, it is important to make sure that the data range in the template is broad enough to cover all expected data sizes. The template data, however, should be small enough to ensure that any new set of data that gets inserted into the template file has enough observations to overwrite the template data. For example, the screen shot below shows only 5 rows of data, but a data range of 215 rows (rows 2 to 216), to accommodate new data.

2

Once the template is created, SAS can be used to open the template file, replace the template data with actual data, save the new Excel spreadsheet and chart in a new file, and then close Excel. The following code demonstrates how to open Excel via the X command, and then how to define a data range (the sheet and rows and columns within that sheet) that SAS will write to via DDE: 301 302 303 304 305 306 307 308 309 310 310 311 312 313 314 315 316

%let excelpath=c:\progra~1\micros~2\office10\excel.exe; %let template=scatter-template.xls; %let newfile=excel-scatter.xls; *--- open the template file ---; x "&excelpath. &template "; *--- delete previous “new” file ---; x "del &newfile"; *--- name the rows and columns to open ---*; filename ddedata dde "excel|sheet1!r2c1:r500c6" NOTAB; data _null_; ** be sure to wait for the file to open; x = sleep(1); run;

Note that the path of the Excel executable, specified on line 301, will vary from one computer to the next. Also note the deletion of the file via the X command on line 309. This is to prevent Excel from asking the user whether it will be ok to overwrite the existing file of the same name after implementing a ‘Save As’ instruction (the objective is have the SAS program run without manual intervention during execution). The next step is to replace the data in the template file with the new data. For greater flexibility with varying data set sizes, the Excel template was set up so that each “Series” (treatment group) has data for each axis in a separate column. Therefore, some data manipulation is needed, but nothing that can not be handled in the single data step, as shown in the code below: 318 *-- update the spreadsheet with the new data --*; 319 data _null_; 310 merge scatter ( 321 rename=(baseline=pbobaseline value=pbofu) 322 where=(trt=0 and pbobaseline ne . and pbofu ne .) 323 ) 324 scatter ( 325 rename=(baseline=testbaseline value=testfu) 326 where=(trt=1 and testbaseline ne . and testfu ne .) 327 ) 328 ;

3

329 330 331 332 333

file ddedata;

334 335 run;

placebo='Placebo'; testdrug='TestDrug'; put pbobaseline '09'x pbofu '09'x Placebo '09'x testbaseline '09'x testfu '09'x TestDrug '09'x ; format pbobaseline pbofu testbaseline testfu null.;

Note the absence of a BY statement. This is one rare instance where such a thing would be permissible with a MERGE. Also note the last line of the data step-- the FORMAT statement. The NULL. format (not shown), assigns any missing value to the text string “=1/0” (missing values will result in the data step if the TRT=0 and TRT=1 groups do not have an equal number of observations). This is done so that Excel will read this string as a formula and assign a null value to the cell-- it will show up as “#DIV/0!”. Otherwise, empty or missing cells that are inside the defined data range will cause problems in the chart. If the value is one that Excel recognizes as a null value, however, the value will be ignored. For some reason, writing “#NULL!” to the cell did not result in Excel recognizing the field as a null value. Once the data step above is executed, the spreadsheet data will have been updated. The file then just needs to be closed with the new file name. 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348

*-- name Excel system commands --*; filename ddecmds dde "excel|system"; *-- save and close Excel --*; *-- note, macro vars can not be used here --*; data _null_; file ddecmds; put '[save.as("excel_scatter.xls")]'; put '[quit()]'; run; data _null_; ** be sure to wait for the file to save and close; x = sleep(1); run;

The resulting scatter plot is saved within the new Excel file. It can then be imported as a Chart Object into files such as Word documents or Power Point presentations, which is how Figure 1.2 was imported. The “point-and-click” means by which options can be set in the Excel file is a very user-friendly feature. One limitation, however, is the inability to make the title and axis names data driven, so they need to be added by hand for each graph. The next solution will not have these limitations. SAS Running R Code The last alternative first requires some background since this approach is probably the least familiar to readers. The solution involves the use of the R statistical package. R is a free package that is very similar to the S language and environment, which later developed into S-plus. It has a particular strength with graphics, which makes it worth exploring as an alternative to SAS graphics. With some advanced set-up, SAS can be used to interact with R automatically. With our “production-environment” approach in mind, we first need to develop an R function (similar to a SAS macro) to accommodate the variations to be expected from one plot to the next. For our scatter plot, we are using a function called scatter_plot_function. In conjunction with development of the function, we need to also develop some sample R code that will call the function. Once the R code is finalized, we will write a SAS program that will create the data file(s) that R can read from and then write iterations of this R code to an R-script file. Each iteration will provide new parameter values for the R function, such as a new sub-set of data, new titles and footnotes, and a new output file name. As the last step, we will use the X command to batch submit the R program that was just written by SAS. The R code below is for a simplified example that provides just one iteration: 401 ## R program for creating scatter plots using the scatter_plot_function.r file 402 ## Source: source('[insert path here]/r-scatter.r'); 403 404 path