Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy

Statistical Science 1986, Vol. 1, No. 1, 54-77

B. Efron and R. Tibshirani

Abstract. This is a review of bootstrap methods, concentrating on basic ideas and applications rather than theoretical considerations. It begins with an exposition of the bootstrap estimate of standard error for one-sample situations. Several examples, some involving quite complicated statistical procedures, are given. The bootstrap is then extended to other measures of statistical accuracy such as bias and prediction error, and to complicated data structures such as time series, censored data, and regression models. Several more examples are presented illustrating these ideas. The last third of the paper deals mainly with bootstrap confidence intervals.

Key words: Bootstrap method, estimated standard errors, approximate confidence intervals, nonparametric methods.

1. INTRODUCTION

A typical problem in applied statistics involves the estimation of an unknown parameter θ. The two main questions asked are (1) what estimator θ̂ should be used? (2) Having chosen to use a particular θ̂, how accurate is it as an estimator of θ? The bootstrap is a general methodology for answering the second question. It is a computer-based method, which substitutes considerable amounts of computation in place of theoretical analysis. As we shall see, the bootstrap can routinely answer questions which are far too complicated for traditional statistical analysis. Even for relatively simple problems computer-intensive methods like the bootstrap are an increasingly good data analytic bargain in an era of exponentially declining computational costs.

This paper describes the basis of the bootstrap theory, which is very simple, and gives several examples of its use. Related ideas like the jackknife, the delta method, and Fisher's information bound are also discussed. Most of the proofs and technical details are omitted. These can be found in the references given, particularly Efron (1982a). Some of the discussion here is abridged from Efron and Gong (1983) and also from Efron (1984).

B. Efron is Professor of Statistics and Biostatistics, and Chairman of the Program in Mathematical and Computational Science at Stanford University. His mailing address is Department of Statistics, Sequoia Hall, Stanford University, Stanford, CA 94305. R. Tibshirani is a Postdoctoral Fellow in the Department of Preventive Medicine and Biostatistics, Faculty of Medicine, University of Toronto, McMurrick Building, Toronto, Ontario M5S 1A8, Canada.

Before beginning the main exposition, we will describe how the bootstrap works in terms of a problem where it is not needed, assessing the accuracy of the sample mean. Suppose that our data consist of a random sample from an unknown probability distribution F on the real line,

(1.1) X₁, X₂, ..., Xₙ ~iid F.

Having observed X₁ = x₁, X₂ = x₂, ..., Xₙ = xₙ, we compute the sample mean x̄ = Σ_{i=1}^{n} xᵢ/n, and wonder how accurate it is as an estimate of the true mean θ = E_F{X}.

If the second central moment of F is μ₂(F) = E_F X² − (E_F X)², then the standard error σ(F; n, x̄), that is, the standard deviation of x̄ for a sample of size n from distribution F, is

(1.2) σ(F) = [μ₂(F)/n]^{1/2}.

The shortened notation σ(F) ≡ σ(F; n, x̄) is allowable because the sample size n and the statistic of interest x̄ are known; only F is unknown. The standard error is the traditional measure of x̄'s accuracy. Unfortunately, we cannot actually use (1.2) to assess the accuracy of x̄, since we do not know μ₂(F), but we can use the estimated standard error

(1.3) σ̂ = [μ̂₂/n]^{1/2},

where μ̂₂ = Σ_{i=1}^{n} (xᵢ − x̄)²/(n − 1), the unbiased estimate of μ₂(F).
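As a quick numerical check of (1.3), here is a minimal Python sketch; the data vector is ours, purely illustrative:

```python
import math

def estimated_se_of_mean(xs):
    """Estimated standard error (1.3) of the sample mean:
    sigma_hat = [mu2_hat / n]^(1/2), where mu2_hat is the unbiased
    variance estimate sum((x_i - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    mu2_hat = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return math.sqrt(mu2_hat / n)

# Illustrative data (not from the paper):
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
se = estimated_se_of_mean(xs)
print(se)  # sqrt(2.5 / 5) = sqrt(0.5), about 0.7071
```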

There is a more obvious way to estimate σ(F). Let F̂ indicate the empirical probability distribution,

(1.4) F̂: probability mass 1/n on x₁, x₂, ..., xₙ.

Then we can simply replace F by F̂ in (1.2), obtaining

(1.5) σ(F̂) = [μ₂(F̂)/n]^{1/2},

as the estimated standard error for x̄. This is the bootstrap estimate. The reason for the name "bootstrap" will be apparent in Section 2, when we evaluate σ(F̂) for statistics more complicated than x̄. Since

(1.6) μ₂(F̂) = Σ_{i=1}^{n} (xᵢ − x̄)²/n,

σ(F̂) is not quite the same as σ̂, but the difference is too small to be important in most applications. Of course we do not really need an alternative


formula to (1.3) in this case. The trouble begins when we want a standard error for estimators more complicated than x̄, for example, a median or a correlation or a slope coefficient from a robust regression. In most cases there is no equivalent to formula (1.2), which expresses the standard error σ(F) as a simple function of the sampling distribution F. As a result, formulas like (1.3) do not exist for most statistics.

This is where the computer comes in. It turns out that we can always numerically evaluate the bootstrap estimate σ̂ = σ(F̂), without knowing a simple expression for σ(F). The evaluation of σ̂ is a straightforward Monte Carlo exercise described in the next section. In a good computing environment, as described in the remarks in Section 2, the bootstrap effectively gives the statistician a simple formula like (1.3) for any statistic, no matter how complicated.

Standard errors are crude but useful measures of statistical accuracy. They are frequently used to give approximate confidence intervals for an unknown parameter θ,

(1.7) θ ∈ θ̂ ± σ̂ z^{(α)},

where z^{(α)} is the 100·α percentile point of a standard normal variate, e.g., z^{(.95)} = 1.645. Interval (1.7) is sometimes good, and sometimes not so good. Sections 7 and 8 discuss a more sophisticated use of the bootstrap, which gives better approximate confidence intervals than (1.7).

The standard interval (1.7) is based on taking literally the large sample normal approximation (θ̂ − θ)/σ̂ ~ N(0, 1). Applied statisticians use a variety of tricks to improve this approximation. For instance if θ is the correlation coefficient and θ̂ the sample correlation, then the transformation φ = tanh⁻¹(θ), φ̂ = tanh⁻¹(θ̂) greatly improves the normal approximation, at least in those cases where the underlying sampling distribution is bivariate normal. The correct tactic then is to transform, compute the interval (1.7) for φ, and transform this interval back to the θ scale.

We will see that bootstrap confidence intervals can automatically incorporate tricks like this, without requiring the data analyst to produce special techniques, like the tanh⁻¹ transformation, for each new situation. An important theme of what follows is the substitution of raw computing power for theoretical analysis. This is not an argument against theory, of course, only against unnecessary theory. Most common statistical methods were developed in the 1920s and 1930s, when computation was slow and expensive. Now that computation is fast and cheap we can hope for and expect changes in statistical methodology. This paper discusses one such potential change; Efron (1979b) discusses several others.

2. THE BOOTSTRAP ESTIMATE OF STANDARD ERROR

This section presents a more careful description of the bootstrap estimate of standard error. For now we will assume that the observed data y = (x₁, x₂, ..., xₙ) consist of independent and identically distributed (iid) observations X₁, X₂, ..., Xₙ ~iid F, as in (1.1). Here F represents an unknown probability distribution on 𝒳, the common sample space of the observations. We have a statistic of interest, say θ̂(y), to which we wish to assign an estimated standard error.

Fig. 1 shows an example. The sample space 𝒳 is the positive quadrant of the plane. We have observed n = 15 bivariate data points, each corresponding to an American law school. Each point xᵢ consists of two summary statistics for the 1973 entering class at law school i,

(2.1) xᵢ = (LSATᵢ, GPAᵢ);

LSATᵢ is the class' average score on a nationwide exam called "LSAT"; GPAᵢ is the class' average undergraduate grades. The observed Pearson correlation

FIG. 1. The law school data (Efron, 1979b). The data points (LSAT on the horizontal axis, GPA on the vertical), beginning with School 1, are (576, 3.39), (635, 3.30), (558, 2.81), (578, 3.03), (666, 3.44), (580, 3.07), (555, 3.00), (661, 3.43), (651, 3.36), (605, 3.13), (653, 3.12), (575, 2.74), (545, 2.76), (572, 2.88), (594, 2.96).
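The correlation quoted in the text can be recomputed directly from the data listed in the Fig. 1 caption; a short Python check (plain Pearson correlation, no bootstrap yet):

```python
import math

law_data = [(576, 3.39), (635, 3.30), (558, 2.81), (578, 3.03),
            (666, 3.44), (580, 3.07), (555, 3.00), (661, 3.43),
            (651, 3.36), (605, 3.13), (653, 3.12), (575, 2.74),
            (545, 2.76), (572, 2.88), (594, 2.96)]

def pearson(pairs):
    # Plain Pearson correlation of the (LSAT, GPA) pairs.
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r = pearson(law_data)
print(round(r, 3))  # 0.776, matching the text
```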


coefficient for these 15 points is θ̂ = .776. We wish to assign a standard error to this estimate.

Let σ(F) indicate the standard error of θ̂, as a function of the unknown sampling distribution F,

(2.2) σ(F) = [Var_F{θ̂(y)}]^{1/2}.

Of course σ(F) is also a function of the sample size n and the form of the statistic θ̂(y), but since both of these are known they need not be indicated in the notation. The bootstrap estimate of standard error is

(2.3) σ̂ = σ(F̂),

where F̂ is the empirical distribution (1.4), putting probability 1/n on each observed data point xᵢ. In the law school example, F̂ is the distribution putting mass 1/15 on each point in Fig. 1, and σ̂ is the standard deviation of the correlation coefficient for 15 iid points drawn from F̂.

In most cases, including that of the correlation coefficient, there is no simple expression for the function σ(F) in (2.2). Nevertheless, it is easy to numerically evaluate σ̂ = σ(F̂) by means of a Monte Carlo algorithm, which depends on the following notation: y* = (x₁*, x₂*, ..., xₙ*) indicates n independent draws from F̂, called a bootstrap sample. Because F̂ is the empirical distribution of the data, a bootstrap sample turns out to be the same as a random sample of size n drawn with replacement from the actual sample {x₁, x₂, ..., xₙ}.

The Monte Carlo algorithm proceeds in three steps: (i) using a random number generator, independently draw a large number of bootstrap samples, say y*(1), y*(2), ..., y*(B); (ii) for each bootstrap sample y*(b), evaluate the statistic of interest, say θ̂*(b) = θ̂(y*(b)), b = 1, 2, ..., B; and (iii) calculate the sample standard deviation of the θ̂*(b) values,

(2.4) σ̂_B = [Σ_{b=1}^{B} (θ̂*(b) − θ̂*(·))²/(B − 1)]^{1/2}, where θ̂*(·) = Σ_{b=1}^{B} θ̂*(b)/B.

It is easy to see that as B → ∞, σ̂_B will approach σ̂ = σ(F̂), the bootstrap estimate of standard error. All we are doing is evaluating a standard deviation by Monte Carlo sampling. Later, in Section 9, we will discuss how large B need be taken. For most situations B in the range 50 to 200 is quite adequate. In what follows we will usually ignore the difference between σ̂_B and σ̂, calling both simply "σ̂."

Why is each bootstrap sample taken with the same sample size n as the original data set? Remember that σ(F) is actually σ(F, n, θ̂), the standard error for the statistic θ̂(·) based on a random sample of size n from the unknown distribution F. The bootstrap estimate σ̂ is actually σ(F, n, θ̂) evaluated at F = F̂. The Monte Carlo algorithm will not converge to σ̂ if the bootstrap sample size differs from the true n. Bickel and Freedman (1981) show how to correct the algorithm to give σ̂ if in fact the bootstrap sample size is taken different than n, but so far there does not seem to be any practical advantage to be gained in this way.

Fig. 2 shows the histogram of B = 1000 bootstrap replications of the correlation coefficient from the law school data. For convenient reference the abscissa is plotted in terms of θ̂* − θ̂ = θ̂* − .776. Formula (2.4) gives σ̂_B = .127 as the bootstrap estimate of standard error. This can be compared with the usual normal theory estimate of standard error for θ̂,

(2.5) σ̂_NORM = (1 − θ̂²)/(n − 3)^{1/2} = .115

[Johnson and Kotz (1970, p. 229)].

FIG. 2. Histogram of B = 1000 bootstrap replications of θ̂* for the law school data. The normal theory density curve has a similar shape, but falls off more quickly at the upper tail.

REMARK. The Monte Carlo algorithm leading to σ̂_B (2.4) is simple to program. On the Stanford version of the statistical computing language S, Professor Arthur Owen has introduced a single command which bootstraps any statistic in the S catalog. For instance the bootstrap results in Fig. 2 are obtained simply by typing tboot(lawdata, correlation, B = 1000). The execution time is about a factor of B greater than that for the original computation.

There is another way to describe the bootstrap estimate of standard error: F̂ is the nonparametric maximum likelihood estimate (MLE) of the unknown distribution F (Kiefer and Wolfowitz, 1956). This means that the bootstrap estimate σ̂ = σ(F̂) is the nonparametric MLE of σ(F), the true standard error.

In fact there is nothing which says that the bootstrap must be carried out nonparametrically. Suppose for instance that in the law school example we believe the true sampling distribution F must be bivariate normal. Then we could estimate F with its parametric MLE F̂_NORM, the bivariate normal distribution having the same mean vector and covariance matrix as the data.
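The three-step Monte Carlo algorithm (i)-(iii) takes only a few lines of code. Here is a minimal Python sketch applied to the law school correlation; the data are re-listed so the fragment is self-contained, and with B = 1000 the result should sit near the σ̂_B = .127 quoted in the text, up to Monte Carlo error:

```python
import math
import random

law_data = [(576, 3.39), (635, 3.30), (558, 2.81), (578, 3.03),
            (666, 3.44), (580, 3.07), (555, 3.00), (661, 3.43),
            (651, 3.36), (605, 3.13), (653, 3.12), (575, 2.74),
            (545, 2.76), (572, 2.88), (594, 2.96)]

def pearson(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_se(y, stat, B=1000, rng=random):
    """Steps (i)-(iii): draw B bootstrap samples by resampling the data
    with replacement, evaluate the statistic on each, and return the
    sample standard deviation (2.4) of the B replications."""
    reps = [stat([rng.choice(y) for _ in y]) for _ in range(B)]
    mean = sum(reps) / B
    return math.sqrt(sum((r - mean) ** 2 for r in reps) / (B - 1))

random.seed(0)
se = bootstrap_se(law_data, pearson)
print(se)  # about .127, as quoted in the text (up to Monte Carlo error)
```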


The bootstrap samples at step (i) of the algorithm could then be drawn from F̂_NORM instead of F̂, and steps (ii) and (iii) carried out as before.

The smooth curve in Fig. 2 shows the results of carrying out this "normal theory bootstrap" on the law school data. Actually there is no need to do the bootstrap sampling in this case, because of Fisher's formula for the sampling density of a correlation coefficient in the bivariate normal situation (see Chapter 32 of Johnson and Kotz, 1970). This density can be thought of as the bootstrap distribution for B = ∞. Expression (2.5) is a close approximation to σ̂_NORM = σ(F̂_NORM), the parametric bootstrap estimate of standard error.

TABLE 1
A sampling experiment comparing the bootstrap and jackknife estimates of standard error for the 25% trimmed mean, sample size n = 15

                         F standard normal         F negative exponential
                         Ave     SD     CV         Ave     SD     CV
Bootstrap σ̂ (B = 200)    .287    .071   .25        .242    .078   .32
Jackknife σ̂_J            .280    .084   .30        .224    .085   .38
True (minimum CV)        .286           (.27)      .232           (.19)

In considering the merits or demerits of the bootstrap, it is worth remembering that all of the usual formulas for estimating standard errors, like 𝓘^{−1/2} where 𝓘 is the observed Fisher information, are essentially bootstrap estimates carried out in a parametric framework. This point is carefully explained in Section 5 of Efron (1982c). The straightforward nonparametric algorithm (i)-(iii) has the virtues of avoiding all parametric assumptions, all approximations (such as those involved with the Fisher information

expression for the standard error of an MLE), and in fact all analytic difficulties of any kind. The data analyst is free to obtain standard errors for enormously complicated estimators, subject only to the constraints of computer time. Sections 3 and 6 discuss some interesting applied problems which are far too complicated for standard analyses.

How well does the bootstrap work? Table 1 shows the answer in one situation. Here 𝒳 is the real line, n = 15, and the statistic θ̂ of interest is the 25% trimmed mean. If the true sampling distribution F is N(0, 1), then the true standard error is σ(F) = .286. The bootstrap estimate σ̂ is nearly unbiased, averaging .287 in a large sampling experiment. The standard deviation of the bootstrap estimate σ̂ is itself .071 in this case, with coefficient of variation .071/.287 = .25. (Notice that there are two levels of Monte Carlo involved in Table 1: first drawing the actual samples y = (x₁, x₂, ..., x₁₅) from F, and then drawing bootstrap samples (x₁*, x₂*, ..., x₁₅*) with y held fixed. The bootstrap samples evaluate σ̂ for a fixed value of y. The standard deviation .071 refers to the variability of σ̂ due to the random choice of y.)

The jackknife, another common method of assigning nonparametric standard errors, is discussed in Section 10. The jackknife estimate σ̂_J is also nearly unbiased for σ(F), but has higher coefficient of variation (CV). The minimum possible CV for a scale-invariant estimate of σ(F), assuming full knowledge of the parametric model, is shown in brackets. The bootstrap is seen to be moderately efficient in both cases considered in Table 1.

Table 2 returns to the case of θ̂ the correlation coefficient. Instead of real data we have a sampling experiment in which the true F is bivariate normal, true correlation ρ = .50, sample size n = 14. Table 2 is abstracted from a larger table in Efron (1981b), in which some of the methods for estimating a standard error required the sample size to be even.

TABLE 2
Estimates of standard error for the correlation coefficient θ̂ and for φ̂ = tanh⁻¹(θ̂); sample size n = 14, bivariate normal distribution with true correlation ρ = .5 (from a larger table in Efron, 1981b). Summary statistics for 200 trials.

                                           Standard error estimates for θ̂    Standard error estimates for φ̂
                                           Ave    SD     CV    √MSE          Ave    SD     CV    √MSE
1. Bootstrap B = 128                       .206   .066   .32   .067          .301   .065   .22   .065
2. Bootstrap B = 512                       .206   .063   .31   .064          .301   .062   .21   .062
3. Normal smoothed bootstrap B = 128       .200   .060   .30   .063          .296   .041   .14   .041
4. Uniform smoothed bootstrap B = 128      .205   .061   .30   .062          .298   .058   .19   .058
5. Uniform smoothed bootstrap B = 512      .205   .059   .29   .060          .296   .052   .18   .052
6. Jackknife                               .223   .085   .38   .085          .314   .090   .29   .091
7. Delta method (infinitesimal jackknife)  .175   .058   .33   .072          .244   .052   .21   .076
8. Normal theory                           .217   .056   .26   .056          .302   0      0     .003
True standard error                        .218                              .299


The left side of Table 2 refers to θ̂, while the right side refers to φ̂ = tanh⁻¹(θ̂) = .5 log[(1 + θ̂)/(1 − θ̂)]. For each estimator of standard error, the root mean squared error of estimation [E(σ̂ − σ)²]^{1/2} is given in the column headed √MSE.

The bootstrap was run with B = 128 and also with B = 512, the latter value yielding only slightly better estimates, in accordance with the results of Section 9. Further increasing B would be pointless. It can be shown that B = ∞ gives √MSE = .063 for θ̂, only .001 less than B = 512. The normal theory estimate (2.5), which we know to be ideal for this sampling experiment, has √MSE = .056.

We can compromise between the totally nonparametric bootstrap estimate σ̂ and the totally parametric bootstrap estimate σ̂_NORM. This is done in lines 3, 4, and 5 of Table 2. Let Σ̂ = Σ_{i=1}^{n} (xᵢ − x̄)(xᵢ − x̄)'/n be the sample covariance matrix of the observed data. The normal smoothed bootstrap draws the bootstrap sample from F̂ ⊕ N₂(0, .25Σ̂), ⊕ indicating convolution. This amounts to estimating F by an equal mixture of the n distributions N₂(xᵢ, .25Σ̂), that is, by a normal window estimate. Each point xᵢ* in a smoothed bootstrap sample is the sum of a randomly selected original data point xⱼ, plus an independent bivariate normal point zⱼ ~ N₂(0, .25Σ̂). Smoothing makes little difference on the left side of the table, but is spectacularly effective in the φ̂ case. The latter result is suspect since the true sampling distribution is bivariate normal, and the function φ̂ = tanh⁻¹(θ̂) is specifically chosen to have nearly constant standard error in the bivariate normal family. The uniform smoothed bootstrap samples from F̂ ⊕ 𝒰(0, .25Σ̂), where 𝒰(0, .25Σ̂) is the uniform distribution on a rhombus selected so 𝒰 has mean vector 0 and covariance matrix .25Σ̂. It yields moderate reductions in √MSE for both sides of the table.

Line 7 of Table 2 refers to the delta method, which is the most common method of assigning nonparametric standard errors. Surprisingly enough, it is badly biased downward on both sides of the table. The delta method, also known as the method of statistical differentials, the Taylor series method, and the infinitesimal jackknife, is discussed in Section 10.
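The normal smoothed bootstrap draw described above can be sketched in Python with NumPy. The data here are synthetic and illustrative, not the Table 2 sampling experiment, and the function name is ours:

```python
import numpy as np

def normal_smoothed_bootstrap_sample(X, shrink=0.25, rng=None):
    """One smoothed bootstrap sample: each x*_i is a randomly selected
    row of X plus independent N_2(0, shrink * Sigma_hat) noise, i.e. a
    draw from F_hat convolved with N_2(0, .25 Sigma_hat)."""
    rng = rng or np.random.default_rng()
    n, p = X.shape
    cov = np.cov(X, rowvar=False, bias=True)  # Sigma_hat, divisor n
    idx = rng.integers(0, n, size=n)          # resample rows with replacement
    noise = rng.multivariate_normal(np.zeros(p), shrink * cov, size=n)
    return X[idx] + noise

# Illustrative use on synthetic bivariate data, n = 14 as in Table 2:
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=14)
Xstar = normal_smoothed_bootstrap_sample(X, rng=rng)
print(Xstar.shape)  # (14, 2)
```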

3. EXAMPLES

Example 1: Cox's Proportional Hazards Model

In this section we apply bootstrap standard error estimation to some complicated statistics.

The data for this example come from a study of leukemia remission times in mice, taken from Cox (1972). They consist of measurements of remission time (y) in weeks for two groups, treatment (x = 0) and control (x = 1), and a 0-1 variable (δᵢ) indicating whether or not the remission time is censored (0) or complete (1). There are 21 mice in each group.

The standard regression model for censored data is Cox's proportional hazards model (Cox, 1972). It assumes that the hazard function h(t|x), the probability of going into remission in the next instant given no remission up to time t for a mouse with covariate x, is of the form

(3.1) h(t|x) = h₀(t)e^{βx}.

Here h₀(t) is an arbitrary unspecified function. Since x here is a group indicator, this means simply that the hazard for the control group is e^β times the hazard for the treatment group. The regression parameter β is estimated independently of h₀(t) through maximization of the so-called "partial likelihood"

(3.2) PL = Π_{i∈D} [e^{βxᵢ} / Σ_{j∈Rᵢ} e^{βxⱼ}],

where D is the set of indices of the failure times and Rᵢ is the set of indices of those at risk at time yᵢ. This maximization requires an iterative computer search.

The estimated β̂ for these data turns out to be 1.51. Taken literally, this says that the hazard rate is e^{1.51} = 4.53 times higher in the control group than in the treatment group, so the treatment is very effective. What is the standard error of β̂? The usual asymptotic maximum likelihood theory, one over the square root of the observed Fisher information, gives an estimate of .41. Despite the complicated nature of the estimation procedure, we can also estimate the standard error using the bootstrap. We sample with replacement from the triples {(y₁, x₁, δ₁), ..., (y₄₂, x₄₂, δ₄₂)}. For each bootstrap sample {(y₁*, x₁*, δ₁*), ..., (y₄₂*, x₄₂*, δ₄₂*)} we form the partial likelihood and numerically maximize it to produce the bootstrap estimate β̂*. A histogram of 1000 bootstrap values is shown in Fig. 3. The bootstrap estimate of the standard error of β̂ based on these 1000 numbers is .42. Although the bootstrap and standard estimates agree, it is interesting to note that the bootstrap distribution is skewed to the right. This leads us to ask: is there other information that we can extract from the bootstrap distribution other than a standard error estimate? The answer is yes: in particular, the bootstrap distribution can be used to form a confidence interval for β, as we will see in Section 9. The shape of the bootstrap distribution will help determine the shape of the confidence interval.

In this example our resampling unit was the triple (yᵢ, xᵢ, δᵢ), and we ignored the unique elements of the problem, i.e., the censoring, and the particular model being used. In fact, there are other ways to bootstrap

FIG. 3. Histogram of 1000 bootstrap replications for the mouse leukemia data.
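The resampling scheme of Example 1 can be sketched as follows (Python). The remission times below are the Gehan leukemia data analyzed in Cox (1972), as commonly reproduced, and the Breslow convention for tied failure times is our assumption; the paper does not state how ties were handled:

```python
import math
import random

# (y, x, delta): remission time in weeks, group (0 = treatment, 1 = control),
# and delta = 1 for a complete time, 0 for a censored one.
control = [(t, 1, 1) for t in
           [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]]
treatment = [(6, 0, 1), (6, 0, 1), (6, 0, 1), (6, 0, 0), (7, 0, 1), (9, 0, 0),
             (10, 0, 1), (10, 0, 0), (11, 0, 0), (13, 0, 1), (16, 0, 1),
             (17, 0, 0), (19, 0, 0), (20, 0, 0), (22, 0, 1), (23, 0, 1),
             (25, 0, 0), (32, 0, 0), (32, 0, 0), (34, 0, 0), (35, 0, 0)]
data = control + treatment

def cox_beta(triples, iters=25, tol=1e-9):
    """Newton maximization of the log partial likelihood (3.2) for one
    scalar covariate (Breslow treatment of tied failure times)."""
    events = [(y, x) for (y, x, d) in triples if d == 1]
    risk = {y: [xj for (yj, xj, _) in triples if yj >= y] for (y, _) in events}
    beta = 0.0
    for _ in range(iters):
        score = info = 0.0
        for (y, x) in events:
            w = [math.exp(beta * xj) for xj in risk[y]]
            s0 = sum(w)
            s1 = sum(wi * xj for wi, xj in zip(w, risk[y]))
            s2 = sum(wi * xj * xj for wi, xj in zip(w, risk[y]))
            score += x - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

beta_hat = cox_beta(data)
print(beta_hat)  # close to the 1.51 quoted in the text

random.seed(0)
B = 50  # small, for illustration; the paper used B = 1000
reps = [cox_beta([random.choice(data) for _ in data]) for _ in range(B)]
mean = sum(reps) / B
se = math.sqrt(sum((r - mean) ** 2 for r in reps) / (B - 1))
print(se)  # rough bootstrap standard error for beta_hat
```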

this problem. We will see this when we discuss bootstrapping censored data in Section 5.

Example 2: Linear and Projection Pursuit Regression

We illustrate an application of the bootstrap to standard linear least squares regression as well as to a nonparametric regression technique.

Consider the standard regression setup. We have n observations on a response Y and covariates (X₁, X₂, ..., X_p). Denote the ith observed vector of covariates by xᵢ = (xᵢ₁, xᵢ₂, ..., xᵢ_p)'. The usual linear regression model assumes

(3.3) E(Yᵢ) = α + Σ_{j=1}^{p} βⱼ xᵢⱼ.

Friedman and Stuetzle (1981) introduced a more general model, the projection pursuit regression model

(3.4) E(Yᵢ) = Σ_{j=1}^{m} sⱼ(aⱼ · xᵢ).

The p vectors aⱼ are unit vectors ("directions"), and the functions sⱼ(·) are unspecified. Estimation of {a₁, s₁(·)}, ..., {aₘ, sₘ(·)} is performed in a forward stepwise manner as follows. Consider {a₁, s₁(·)}. Given a direction a₁, s₁(·) is estimated by a nonparametric smoother (e.g., running mean) of y on a₁ · x. The projection pursuit regression algorithm searches over all unit directions to find the direction â₁ and associated function ŝ₁(·) that minimize Σᵢ [yᵢ − s₁(a₁ · xᵢ)]². Then residuals are taken and the next direction and function are determined. This process is continued until no additional term significantly reduces the residual sum of squares.

Notice the relation of the projection pursuit regression model to the standard linear regression model. When the function s₁(·) is forced to be linear and is estimated by the usual least squares method, a one-term projection pursuit model is exactly the same as the standard linear regression model. That is to say, the fitted model ŝ₁(â₁ · xᵢ) exactly equals the least squares fit α̂ + Σⱼ β̂ⱼ xᵢⱼ. This is because the least squares fit, by definition, finds the best direction and the best linear function of that direction. Note also that adding another linear term s₂(â₂ · xᵢ) would not change the fitted model, since the sum of two linear functions is another linear function.

Hastie and Tibshirani (1984) applied the bootstrap to the linear and projection pursuit regression models to assess the variability of the coefficients in each. The data they considered are taken from Breiman and Friedman (1985). The response Y is Upland atmospheric ozone concentration (ppm); the covariates are X₁ = Sandburg Air Force base temperature (°C), X₂ = inversion base height (ft), X₃ = Daggot pressure gradient (mm Hg), X₄ = visibility (miles), and X₅ = day of the year. There are 330 observations. The number of terms (m) in the model (3.4) is taken to be two. The projection pursuit algorithm chose directions â₁ = (.80, −.38, .37, −.24, −.14)' and â₂ = (.07, .16, .04, −.05, −.98)'. These directions consist mostly of Sandburg Air Force temperature and day of the year, respectively.

FIG. 4. Smoothed histograms of the bootstrapped coefficients for the first term in the projection pursuit regression model. Solid histograms are for the usual projection pursuit model; the dotted histograms are for linear ŝ(·).
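The linear special case (the broken histograms in Fig. 4) is easy to reproduce generically: bootstrap the least squares coefficients and rescale them to a unit direction. A sketch on synthetic data (NumPy; we do not have the ozone data here, so the design and coefficients are invented):

```python
import numpy as np

def ls_direction(X, y):
    # Least squares fit with intercept; return the slope vector
    # rescaled to unit length, i.e. the "direction" of the linear fit.
    A = np.column_stack([np.ones(len(y)), X])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    slopes = coef[1:]
    return slopes / np.linalg.norm(slopes)

rng = np.random.default_rng(0)
n, p = 330, 5
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

directions = np.array([
    ls_direction(X[idx], y[idx])
    for idx in (rng.integers(0, n, size=n) for _ in range(200))
])
print(directions.shape)        # (200, 5): one unit direction per replication
print(directions.std(axis=0))  # bootstrap spread of each coefficient
```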


(We do not show graphs of the estimated functions ŝ₁(·) and ŝ₂(·), although in a full analysis of the data they would also be of interest.) Forcing ŝ₁(·) to be linear results in the direction â = (.90, −.37, .03, −.14, −.19)'. These are just the usual least squares estimates β̂₁, ..., β̂₅, scaled so that Σⱼ β̂ⱼ² = 1.

To assess the variability of the directions, a bootstrap sample is drawn with replacement from {(y₁, x₁₁, ..., x₁₅), ..., (y₃₃₀, x₃₃₀₁, ..., x₃₃₀₅)} and the projection pursuit algorithm is applied. Figs. 4 and 5 show histograms of the directions â₁* and â₂* for 200 bootstrap replications. Also shown in Fig. 4 (broken histogram) are the bootstrap replications of â₁ with ŝ₁(·) forced to be linear.

The first direction of the projection pursuit model is quite stable and only slightly more variable than the corresponding linear regression direction. But the second direction is extremely unstable! It is clearly unwise to put any faith in the second direction of the original projection pursuit model.

FIG. 5. Smoothed histograms of the bootstrapped coefficients for the second term in the projection pursuit model.

Example 3: Cox's Model and Local Likelihood Estimation

In this example, we return to Cox's proportional hazards model described in Example 1, but with a few added twists.

The data that we will discuss come from the Stanford heart transplant program and are given in Miller and Halpern (1982). The response y is survival time in weeks after a heart transplant, the covariate x is age at transplant, and the 0-1 variable δ indicates whether the survival time is censored (0) or complete (1). There are measurements on 157 patients. A proportional hazards model was fit to these data, with a quadratic term, i.e.,

h(t|x) = h₀(t)e^{β₁x+β₂x²}.

Both β̂₁ and β̂₂ are highly significant; the broken curve in Fig. 6 is β̂₁x + β̂₂x² as a function of x.

For comparison, Fig. 6 shows (solid line) another estimate. This was computed using local likelihood estimation (Tibshirani and Hastie, 1984). Given a general proportional hazards model of the form h(t|x) = h₀(t)e^{s(x)}, the local likelihood technique assumes nothing about the parametric form of s(x); instead it estimates s(x) nonparametrically using a kind of local averaging. The algorithm is very computationally intensive, and standard maximum likelihood theory cannot be applied.

A comparison of the two estimates reveals an important qualitative difference: the parametric estimate suggests that the hazard decreases sharply up to age 34, then rises; the local likelihood estimate stays approximately constant up to age 45, then rises. Has the forced fitting of a quadratic function produced a misleading result? To answer this question, we can bootstrap the local likelihood estimate. We sample with replacement from the triples {(y₁, x₁, δ₁), ..., (y₁₅₇, x₁₅₇, δ₁₅₇)} and apply the local likelihood algorithm to each bootstrap sample. Fig. 7 shows estimated curves from 20 bootstrap samples.

Some of the curves are flat up to age 45, others are decreasing. Hence the original local likelihood estimate is highly variable in this region, and on the basis of these data we cannot determine the true behavior of the function there. A look back at the original data shows that while half of the patients were under 45, only 13% of the patients were under 30. Fig. 7 also shows that the estimate is stable near the middle ages but unstable for the older patients.

FIG. 6. Estimates of log relative risk for the Stanford heart transplant data. Broken curve: parametric estimate. Solid curve: local likelihood estimate.
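The same recipe, resample the (y, x, δ) triples and refit the curve, works for any smoother. As a generic stand-in for the local likelihood algorithm (not reimplemented here), the sketch below bootstraps a simple Nadaraya-Watson kernel smoother; the smoother, bandwidth, and synthetic data are all our assumptions:

```python
import numpy as np

def kernel_smooth(x, y, grid, bandwidth=5.0):
    # Nadaraya-Watson estimate of E(y | x) on a grid, Gaussian kernel.
    out = []
    for g in grid:
        w = np.exp(-0.5 * ((x - g) / bandwidth) ** 2)
        out.append(np.sum(w * y) / np.sum(w))
    return np.array(out)

rng = np.random.default_rng(0)
n = 157
age = rng.uniform(10, 60, size=n)
risk = np.where(age < 45, 0.0, 0.1 * (age - 45)) + rng.normal(0, 0.5, size=n)

grid = np.linspace(15, 55, 9)
curves = np.array([
    kernel_smooth(age[idx], risk[idx], grid)
    for idx in (rng.integers(0, n, size=n) for _ in range(20))
])
print(curves.shape)        # (20, 9): one refitted curve per bootstrap sample
print(curves.std(axis=0))  # where the spread is large, the fit is unstable
```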


TABLE 3
BHCG blood serum levels for 54 patients having metastasized breast cancer, in ascending order
