
Joint Statistical Meetings - Section on Statistics in Epidemiology

A Suite of Statistical Programs for the PDA: EPItome Statistical Calculator

Raymond G. Hoffmann, Ph.D., Medical College of Wisconsin
Thomas J Hoffmann, University of Wisconsin Madison
Paul G. Hoffmann, Marquette University

Raymond G. Hoffmann, Biostatistics Division, The Medical College of Wisconsin, PO Box 26509, Milwaukee, WI 53226. [email protected]

Software Available at: http://iago.lib.mcw.edu/biostat/phome/rh.html

Introduction

The Palm “pocket computer” is a portable computer whose computational capacity exceeds that of the original Macintosh (its embedded processor is a derivative of the Motorola 68000 chip used in the original Macintosh). It also has more storage, especially with the larger SD storage cards (32-64 MB). On the other hand, almost no statistical software is available for the Palm or for the Pocket PC, Microsoft’s counterpart to the Palm device. The only statistical software is Palmstats, which is somewhat buggy and rather limited in methods, and a nicely done freeware sample size and power estimator by Bob Wheeler of ECHIP that runs only on the Palm.



Consequently, the purpose of this work has been to (1) investigate the software tools available for the Palm and Pocket PC 2002 that can be used for developing statistical software and (2) develop a set of programs that can help answer the kinds of questions that a practicing applied statistician or epidemiologist might face in the field. It is not intended to replace the portable PC or portable Macintosh, for which very sophisticated tools are available that easily analyze data sets of up to 1 gigabyte.

The programs that are available fall into several categories:
• Sample size programs for quick questions about study design (comparing means, proportions, correlations, odds ratios and median survival times)
• Power programs for questions about the ability of negative studies to detect a difference
• Randomization programs (for generating simple schemes for the design of clinical trials)
• Categorical data analysis programs for testing tables of count data (r × c chi-square, trends in proportions, 2 by 2 tables with different sampling structures), for simple matched pair analysis, and for estimating the corresponding risk and its confidence intervals
• Continuous data analysis where the data are collected as means or correlations (one-way ANOVA based on means with post hoc contrasts, t-tests of means, correlations, and partial correlations)
• Tables of statistical distributions (z, F, t, chi-square, cumulative binomial and cumulative Poisson)

Software Development

Development software for the Palm and PocketPC is available on several levels, including cross-development systems that run on Windows PCs, Linux and the Apple Macintosh:
• Metrowerks CodeWarrior: a development system in C++ on the PC for the Palm
• Microsoft’s Embedded Visual C++ for Windows for the PocketPC
• Prc-tools (gcc compiler): a development system for the Palm that runs on Linux or under Cygwin
• PocketC: a cross-development system that runs on the Windows PC and natively on the Palm and PocketPC 2002, but with different APIs for the two devices
• SmallBasic: a rapid development system in Basic that runs on the Palm
• Some Java virtual machines are under development (Sun has released MIDP for the PalmOS, but has halted production of a virtual machine for WindowsCE)


CodeWarrior was quickly excluded because of its high initial cost. The PocketC language was originally chosen for this development because it is available on both devices, and it is the current choice for the Palm release of this software. However, because of PocketC’s limited computational speed (see the figure below) and its lack of double precision, production using PocketC was halted for the PocketPC device. Instead, Microsoft’s Embedded Visual C++ version 3.0 was chosen for the PocketPC.

Figure: Relative Computation Time (Pocket PC). Recoverable labels: More Computation, More Loops, PocketC, VC++.

The software goals have been to balance speed and accuracy while keeping an environment in which the code can be easily maintained and rapidly developed.

The software has been tested on the 33 MHz Dragonball processor in a PalmOS device and on the 206 MHz StrongARM processor in a PocketPC 2002 device. The PocketPC version produces double precision results in a fraction of the time that the Palm version takes to produce single precision results. However, there is little noticeable slowdown on the Palm device, and all algorithms are those that run correctly in single precision as well as double precision.

Process Time
Palm, PocketC, Single
Palm, PocketC, Single
iPaq 3850 PocketPC, EVC++, Double
iPaq 3850 PocketPC, PocketC, Single

Figure: The PocketPC and Palm emulators running on a Windows PC.

Notes on the Algorithms

1) The cumulative binomial and the cumulative Poisson use a recursive relationship,

$$h(x+1) = \frac{\lambda}{x+1}\, h(x)$$

for the Poisson and

$$h(x+1) = \frac{p\,(n-x)}{(1-p)\,(x+1)}\, h(x)$$

for the binomial, to speed up calculations and preserve precision.

2) The sample size calculations use Snedecor and Cochran’s formulas

$$n = (Z_{\alpha/2} + Z_{\beta})^2\, \frac{\sigma_D^2}{D^2}$$

where $\sigma_D^2 = 2\sigma^2$ for Student’s t-test or $\sigma_D^2 = \sigma^2$ for the paired t-test, except for the survival analysis, which is based on the formulas in Dupont and Plummer’s paper with a modification for unequal group size. For example, the equation

$$\sigma_{Pooled}^2 = \sigma^2 + \sigma^2$$

was replaced with

$$\sigma_{Pooled}^2 = \sigma^2 + \frac{\sigma^2}{k}$$

3) The formulas for the distribution functions are based on Abramowitz and Stegun’s algorithms.
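As a rough illustration of notes 1 and 2, the sketch below accumulates the cumulative Poisson and binomial probabilities term by term from the pmf recurrences and evaluates the normal-approximation sample size formula. It is written in Python rather than the PocketC or Embedded Visual C++ of the actual programs, and the function names, the default alpha and power, and the rounding up to a whole subject are assumptions made for illustration, not details taken from the EPItome source.

```python
import math
from statistics import NormalDist


def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Poisson(lam), using h(k+1) = lam / (k+1) * h(k)."""
    term = math.exp(-lam)              # h(0)
    total = term
    for k in range(x):
        term *= lam / (k + 1)          # step from h(k) to h(k+1)
        total += term
    return total


def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), using the pmf recurrence."""
    if p >= 1.0:                       # degenerate case: all mass at x = n
        return 1.0 if x >= n else 0.0
    term = (1.0 - p) ** n              # h(0)
    total = term
    for k in range(min(x, n)):
        term *= p * (n - k) / ((1.0 - p) * (k + 1))   # h(k) -> h(k+1)
        total += term
    return total


def sample_size(diff, sigma, alpha=0.05, power=0.80, paired=False):
    """n = (Z_{alpha/2} + Z_beta)^2 * sigma_D^2 / D^2, rounded up.

    Assumption: sigma_D^2 is 2*sigma^2 for the two-sample t-test and
    sigma^2 for the paired t-test, as in note 2; rounding up is our choice.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    var_d = sigma ** 2 if paired else 2.0 * sigma ** 2
    return math.ceil((z_alpha + z_beta) ** 2 * var_d / diff ** 2)
```

Under these defaults, `sample_size(diff=1.0, sigma=1.0)` evaluates to 16 per group, and to 8 pairs with `paired=True`. Building each term from the previous one avoids computing factorials or large powers directly, which is the motivation the note gives for the recurrence.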

4) The exact binomial confidence interval uses Straehling and Sullivan’s algorithm translated to C.

5) The 2 by 2 tables use the test-based CI

$$\exp\left[\left(1 \pm \frac{z_{\alpha/2}}{\sqrt{\chi^2_{obs}}}\right)\log(OR)\right] = OR^{\,1 \pm z_{\alpha/2}/\sqrt{\chi^2_{obs}}}$$

for the odds ratio (OR) in the case-control study, the relative risk (RR) in the prospective study and the prevalence odds ratio (POR) in the cross-sectional study.

Figure: The respective menu systems of the Pocket PC and the Palm.

6) The Mantel-Haenszel OR uses the equation

$$OR = \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i}$$

and the test-based CI relies on Kahn and Sempos’s one-df test using the asymptotic variance, while the test of homogeneity uses an ANOVA-type method for comparing the log(OR) across strata (Kahn and Sempos).

7) The test for trend in proportions is based on the regression of the proportions on the integers from one to the number of categories (the test in Snedecor and Cochran).

8) The test for comparing correlations uses the z transform

$$z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) \sim N\left(0, \frac{1}{n-3}\right)$$

under $H_0$, adapted from Daniel.

9) The matched pairs analysis uses the test-based CI from Kahn and Sempos.

Running the Application

Although you will need to enter numbers manually when using this application, a menu-driven system makes this straightforward. The menu system differs a little between the Palm version and the Pocket PC version, but both are easy to use: just tap on the function you want. The Pocket PC version will execute the same command repeatedly until you select a different command from the menu, while the Palm version returns to the place in the menu where you previously were.

Using the applications is fairly intuitive; the program pauses and prompts you for input at each step. While you are entering data, your entire session is dated and logged to a memo entitled “StatLog” on the Palm, or to the file “/My Documents/statLog.txt” on the PocketPC. This way you have a running record of everything you have typed in. The PocketPC will also retain a limited amount of information on screen for you to scroll through.

An Example

This is the output from a log file from running a few of the programs; it mirrors what you see on screen.

StatLog

SESSION: 7/25/02, 1:05 pm

~chi sq. distn
df? 2
value? 3.14159
P(chi sq w 2 df > 3.14159)=0.2078

~t distn
t-value? 1.54
df? 15
Pr(t>1.54)=0.07218
Pr(|t| > 1.54)=0.1443

~z distn
zval (z>0)? 2.156
Pr(Z>2.156)=0.01554
Pr(|Z|>2.156)=0.03108

~F distn
F-value? 2.17
num df? 3
den df? 2
Pr(F>2.17)=0.3006

~cum. binomial
P( x= 3 for 5 subjects with 0.6999 prevalence) = 0.9189


~treatment randomization
Blocked(0) or Simple(1) Rand? 1
number per group? 4
number of groups? 2
1 1 2 1 2 2 2 1

~cum. poisson distn
P( x