Data Transfers Across Diverse Platforms

Data Transfers Across Diverse Platforms Denis Michel, PharmaNet Inc., Princeton, NJ ABSTRACT SAS® programs written on one system often have to be run...
Author: Ferdinand Ray
0 downloads 1 Views 98KB Size
Data Transfers Across Diverse Platforms Denis Michel, PharmaNet Inc., Princeton, NJ

ABSTRACT SAS® programs written on one system often have to be run on a different system. This issue has mandated that programmers write portable SAS code that can easily be modified for different platforms. Data and output are also transferred routinely. In the pharmaceutical industry, we transfer various file types, including data, programs, and output, among global affiliates, other companies (e.g. contract research organizations) and regulatory agencies (e.g. FDA). The systems and policies in place at the organizations sending and receiving these files are diverse.

Informats are not required unless the data are to be updated after being ported. User-defined formats are to be avoided as much as possible in portable data sets. While these formats are valuable in entry data sets, they may cause problems in transferred data. By default, SAS issues error messages and stops processing if it encounters variables with unknown formats associated with them. In order to read the data and change the error to a note in the SAS log, use the following statement. options nofmterr;

This paper focuses on the issues involved in transferring files across these diverse platforms from a pharmaceutical SAS programming perspective. These issues include SAS transport files, FDA Guidelines for electronic submissions, converting SAS files to other formats, documentation of data contents, transfer media, and writing SAS programs for portability.

If user-defined formats must be stored as variable attributes, either the SAS format catalog or the SAS PROC FORMAT code is provided with the data sets. It is preferable to use the PUT function to assign values to character variables based on user-defined formats.

INTRODUCTION

• • •

Why do SAS programmers routinely transfer files? There are several benefits of data transfers over sending printed output to our customers. Electronic files facilitate sharing of work. Pharmaceutical companies are able to outsource parts of large projects by transferring data and programs to and from contract research organizations, laboratories, and global affiliates. Electronic data transfers also enhance data review. Regulatory agencies, such as the FDA, accept electronic submissions so that the reviewers can analyze data. This may or may not reduce the review time, but it certainly eliminates the “dead” time in waiting for the sponsor to provide paper answers to requests. The most important reason for electronic transfers is that our customers have become sophisticated in their use of software, including MS Excel, MS Access, and SAS/JMP. They prefer to do their own analysis of data using their favorite tools.

THE STARTING POINT The starting point of a data transfer is a SAS data library containing organized SAS files. The SAS data sets to be transferred should have attributes that are consistent with portability. The order of the variables should be in logical sequence. For example, a data set whose observations are uniquely identified by patient, visit, and date should have these three variables first. Each variable has a descriptive label. SAS date and time variables are formatted using SAS standard formats (e.g. DATE. or MMDDYY. for dates, TIME. or HHMM. for times).

In summary, portable SAS data sets have the following characteristics.

• •

Variables in logical order Variables labeled clearly Date and time variables are numeric with standard SAS formats No informat attributes No user-defined format attributes

Adherence to these standards can easily be verified using PROC CONTENTS. The POSITION option is useful to confirm the order of variables.

SAS TRANSPORT FILES SAS files (data sets and catalogs) need to be copied to transport format in order to be read by a different operating system. There are two different methods of creating SAS transport files, the XPORT engine and the CPORT procedure. The XPORT engine was introduced in SAS version 6 and is used with the COPY procedure. PROC COPY can be used to copy individual data sets to transport files as discussed in detail below in the context of FDA guidelines. The code below copies all data sets in a permanent

SAS data library to a SAS transport file.

consequences of different decimal precision across platforms. *** data library reference;

FDA GUIDELINES

libname saslib ‘\\server\folder\data subfolder’;

Clinical programmers in the pharmaceutical industry routinely transfer data to FDA. Guidance documents are available to download as PDF files from the web site www.fda.gov. The Center for Drug Evaluation and Research (CDER) and Center for Biologics Evaluation and Research (CBER) co-published the document “Guidance for Industry: Providing Regulatory Submissions in Electronic Format - General Considerations” in January, 1999.

*** transport file name; libname xlib xport ‘\\server\\folder\transport subfolder\xport.xpt’; ***copy entire data library to transport file; proc copy in=saslib out=xlib memtype=data; run;

SAS issues the following informational note when copying data to the XPORT engine. NOTE: BUFSIZE is not cloned when copying across dissimilar engines. System Option for BUFSIZE was used. A separate PROC COPY statement should be used to copy SAS catalogs, such as format libraries. That is, SAS data sets and SAS catalogs cannot be copied into the same SAS transport file using the XPORT engine. PROC CPORT can be used to copy both SAS data sets and SAS catalogs into a transport file. The syntax is different from the XPORT engine method as shown by the code below.

*** data library reference; libname saslib ‘\\server\folder\data subfolder’; *** transport file name; filename tranfile ‘\\server\\folder\transport subfolder\xport.xpt’; *** Copy the data library to transport format; proc cport library=saslib file=tranfile; run;

Note that PROC CPORT uses a FILENAME statement to define the transport file instead of a LIBNAME statement. Listed below are important issues concerning SAS transport files. • • •

The files must be moved using a binary transfer protocol. An ASCII transfer would corrupt the data. SAS increases the lengths of all numeric variables with lengths of less than 8 bytes by 1 byte. Numeric variables containing decimal values are subject to losing precision in transport format. The ROUND function can be used to minimize the

Another important CDER document for clinical programmers is “Providing Regulatory Submissions in Electronic Format - NDAs” published in January 1999. The corresponding CBER document, revised in November 1999, is “Providing Regulatory Submissions to the Center for Biologics Evaluation and Research (CBER) in Electronic Format – Biologics Marketing Applications”. The submission of SAS data sets is discussed in both of these documents under Item 11 – Case Report Tabulations (CRTs). CBER has another important document, “CBER Computer Assisted License Application (CALA) Questionnaire”, also available on the web site. The FDA guidance documents provide the framework for interactions with the agency reviewers. Some of the specific guidelines related to SAS data transfers are summarized below. • • • • • • •

SAS System XPORT Transport Format (also known as Version 5 SAS Transport Format) PROC COPY using XPORT engine File extension should be .XPT Do not use PROC CPORT Do not compress Single .XPT file for each SAS data set Generally, files should be maximum 25 MB, with large data sets divided logically

The last point above concerns large data sets. If the reviewer requests dividing a large data set, this should be done in a way that is logical to the reviewer. For example, if the data set contains laboratory data, a logical division may be to separate hematology, blood chemistry, and urinalysis. Additional divisions can be discussed with the reviewer to maximize the efficiency of the review. The data provided to FDA should be accompanied by data definition tables, which document the data contents. The data definition tables are provided as a single document named define.pdf. The following columns are contained in the tables. • • • • •

Variable (variable name up to 8 characters) Label (descriptive label up to 32 characters) Type (number, character or date) Codes (description of codes and decodes used) Comments (further description of variable)

Typically, the data provided contain raw variables (collected on the case report form, e.g. dose date), and

derived variables (calculated, e.g. duration of treatment). The comments column would provide the case report form location for raw variables, or the method of calculation for derived variables. The guidance documents provide media recommendations. Small submissions (less than 10 MB) can be provided on 3.5-inch floppy disks. Submissions less than 3.25 GB can be provided on CD-ROM. Large submissions should be provided on digital tape.

AUTOMATED TRANSPORT FILE MACRO The SAS macro below automates the creation of individual SAS transport files for all data sets contained in a SAS data library. The macro reads the SAS data set names from PROC SQL using DICTIONARY.TABLES and then uses CALL EXECUTE to write the individual transport files.

%***FDA format - 1 transport file per data set; %***Specify: libref (SAS data library reference); %*** xpref (Directory location for xport files); %macro mcopyfda (libref=, xpref=); %***Get the data set names; proc sql; create table dsets as select memname from dictionary.tables where libname=%upcase("&libref"); quit; %***Write the transport files; data _null_; set dsets; call execute("libname " || memname || " xport ' " || "&xpref.\" || trim(memname) || ".XPT'; " || "proc copy in=&libref out= " || memname || "; select " || memname || "; run; "); run; %mend mcopyfda;

The macro call below copies all data sets in the SAS data library rawdata to the specified location on the server.

%mcopyfda(libref=rawdata, xpref=\\server\folder);

DATA CONVERSION Clinical programmers often convert data between SAS and other formats. Popular file formats include Excel workbooks (.xls files) and ASCII files (.txt or comma separated .csv files). Excel files are very popular and easy to use, but they can cause data conversion problems. Values can be lost or truncated because each cell in a spreadsheet is independent, so the Excel columns that convert to SAS variables may contain a mix of numbers, dates, and long character strings. The number of rows is limited. Excel 5 has a maximum of 16,384 rows. The maximum rows in Excel 97 are 65,536. So, a SAS data set with 70,000 observations would lose records in a data conversion to Excel. Comma separated ASCII or database files (.dbf) are preferred. Data conversion software includes the SAS Import/Export Facility Wizard. This wizard has been provided in Base SAS for Windows since version 6.12. The wizard supports limited file formats such as .CSV and ASCII. If SAS/Access to PC File Formats is licensed, additional file formats are supported including .XLS and .DBF. The wizard is run interactively, but the code generated can be recalled in the program window and saved for modification and submission. Two software products are widely used in the pharmaceutical industry for data conversion. DBMS/COPY from Conceptual Software, Inc. and STAT/Transfer from Circle Systems, Inc. provide data conversion between SAS files and many file formats.

DATA DOCUMENTATION Documentation of SAS data sets can be easily created using PROC CONTENTS to write the variable name, label, and type (character or numeric). Date and time variables can be identified from the format attribute. This information can be written to a text file with delimiters, and Microsoft Word can read the text file. Once in Word, the “convert text to table” tool can be used to generate nicely formatted tables. The following SAS code writes the text file using “$” as a delimiter for data definition tables documenting the contents of a SAS data library.

***File references; libname sasdata ‘\\server\sasfolder’; filename out ‘\\server\folder\define.txt’; ***Type of variables format; proc format; value ctype 1='number' 2='character' 3='date' 4='time'; run; ***Read data library contents and write a ***temporary data set; proc datasets library=sasdata nolist; contents data=_all_ noprint out=work.tempcont (keep=memname varnum name type format label); quit; ***Sort by variable order within data sets; ***Variables should be in logical order; proc sort data=tempcont; by memname varnum; run; ***Write text file with delimiter; data _null_; length ctype $10; file out; set tempcont; by memname varnum; if type=1 and (index(format,'DATE')>0 or index(format,'MMDDYY')>0) then type=3; else if type=1 and (index(format,'TIME')>0 or index(format,'HHMM')>0) then type=4; ctype=put(type,ctype10.); ***Header for each data set; if first.memname then put / @1 memname ' Dataset Variables' / @1 ‘Variable $ Label $ Type $ Codes $ Comments'; ***Upper case for first character of label; ***other text lower case; label=upcase(substr(label,1,1)) || lowcase(substr(label,2)); ***Write the delimited text; ***2 blank columns at end; ***for codes and comments; put @1 name +1 '$' +1 label +1 '$' +1 ctype $10. +1 '$ $ '; run; This program creates the text file, which is read into Word, and then the table menu selection for converting text to table is used to generate the data definition tables.

TRANSFER MEDIA The transfer media are decided based on the policies of the parties involved. Email is used for data transfers, but some corporate policies do not allow data transferred over the Internet due to security issues. If data are transferred via email, the files should be compressed (e.g. using WinZip) to prevent long download times. An alternative approach to electronic data transfer is the use of a secure dedicated line to transfer files. This method of data transfer eliminates the security issues of email transfers over the Internet, but involves additional cost and maintenance by the information technology department. Magnetic media can be used for data transfers. 3.5 inch disks offer low cost, low capacity, and usually require data compression. High capacity magnetic disks (Zip and Jaz) are also available at higher cost. CDs are now common data transfer media, offering high capacity (650 MB) and low cost.

PORTABLE SAS PROGRAMS In addition to the transfer of data, most customers require that SAS programs are transferred in order to verify results and modify analyses. It is therefore important to write SAS programs that are portable to diverse systems. Fortunately, SAS code is very portable. Statements defining file locations (FILENAME, LIBNAME) are not portable and should be avoided in report and analysis programs. Instead, these operating system specific statements can be coded in a single file for a project and the file called with %INCLUDE statements. Alternatively, these statements can be coded into an AUTOEXEC.SAS program for each project.

COMMON PREVENTABLE MISTAKES There are some common mistakes made in data transfers that can be prevented. One common problem is sending a virus to a customer, easily prevented using anti-virus software. Moving SAS transport files across platforms using an ASCII transfer corrupts the file. SAS transport files should be moved using a binary transfer. If sending a customer a compressed file, it is important to assure that the customer has decompression software. If not, a self-extracting file should be sent. Another problem occurs if the method of creating the SAS transport file is not specified (PROC CPORT or XPORT engine). The final media should always be validated to assure the transfer occurred according to specifications.

CONCLUSION Data transfers continue to be an important part of clinical data programming in the pharmaceutical industry. Understanding SAS transport files, FDA guidelines, data conversion tools, and data documentation are necessary to be an effective SAS programmer in the current environment. There are many opportunities for error in the data transfer process. It is important to document and follow standard procedures to avoid problems. Manual processes should be automated to the extent possible, such as using PROC CONTENTS instead of manually typing data definition tables.

REFERENCES Michel, Denis, “Portable Data Processing”, Proceedings of the Tenth Annual NorthEast SAS Users Group Conference, 1997. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER) and Center for Biologics Evaluation and Research (CBER), “Guidance for Industry: Providing Regulatory Submissions in Electronic Format General Considerations”, January 1999. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), “Guidance for Industry: Providing Regulatory Submissions in Electronic Format - NDAs”, January 1999. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Biologics Evaluation and Research (CBER), “Guidance for Industry: Providing Regulatory Submissions to the Center for Biologics Evaluation and Research (CBER) in Electronic Format Biologics Marketing Applications”, November 1999, Revised. SAS Institute Inc., SAS® Companion for the Microsoft Windows Environment, Version 6, First Edition, Cary, NC: SAS Institute Inc., 1993. SAS Institute Inc., SAS/ACCESS® Interface to PC File Formats: Usage and Reference, Version 6, First Edition, Cary, NC: SAS Institute Inc., 1993. SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks of their respective companies.

CONTACT Denis Michel PharmaNet, Inc. 504 Carnegie Center Princeton, NJ 08540 Email: [email protected]

Suggest Documents