Better Binomial Confidence Intervals

Journal of Modern Applied Statistical Methods Volume 6 | Issue 1

Article 15

5-1-2007

Better Binomial Confidence Intervals James F. Reed III Lehigh Valley Hospital and Health Network

Follow this and additional works at: http://digitalcommons.wayne.edu/jmasm Part of the Applied Statistics Commons, Social and Behavioral Sciences Commons, and the Statistical Theory Commons Recommended Citation Reed, James F. III (2007) "Better Binomial Confidence Intervals," Journal of Modern Applied Statistical Methods: Vol. 6: Iss. 1, Article 15. Available at: http://digitalcommons.wayne.edu/jmasm/vol6/iss1/15

This Regular Article is brought to you for free and open access by the Open Access Journals at DigitalCommons@WayneState. It has been accepted for inclusion in Journal of Modern Applied Statistical Methods by an authorized administrator of DigitalCommons@WayneState.

Copyright © 2007 JMASM, Inc. 1538 – 9472/07/$95.00

Journal of Modern Applied Statistical Methods May, 2007, Vol. 6, No. 1, 153-161

Better Binomial Confidence Intervals James F. Reed III Lehigh Valley Hospital & Health Network

The construction of a confidence interval for a binomial parameter is a basic analysis in statistical inference. Most introductory statistics textbook authors present the binomial confidence interval based on the asymptotic normality of the sample proportion and an estimated standard error: the Wald method. For the one-sample binomial confidence interval, the Clopper-Pearson exact method has been regarded as definitive because it eliminates both overshoot and zero-width intervals. The Clopper-Pearson exact method is the most conservative and is unquestionably a better alternative to the Wald method. Other viable alternatives include Wilson's Score, the Agresti-Coull method, and the Borkowf SAIFS-z.

Key words: Binomial distribution, confidence intervals, coverage probability, Wald method, Clopper-Pearson method, Score method, Agresti-Coull method.

Introduction

The International Committee of Medical Journal Editors indicated that confidence intervals are preferred over simple point estimates and p-values. This applies to over 300 international medical/scientific journals. Most introductory statistics textbook authors present the binomial confidence interval based on the asymptotic normality of the sample proportion and an estimated standard error. This approximate method is referred to as the Wald interval. In order to avoid approximation, some advanced statistics textbooks recommend the Clopper-Pearson exact binomial confidence interval. Other methods, asymptotic as well as exact, have been proposed and appear sporadically in introductory textbooks. There is a rather large set of articles, primarily in the statistics literature, about these and other less common methods of constructing binomial confidence intervals. The purpose of this article is to review alternatives to the Wald method for computing a binomial confidence interval and to provide a set of tractable and better methods of constructing binomial confidence intervals for a single proportion.

Methodology

When a binomial confidence interval is reported, the computational method is rarely given. This may imply that there is only one standard method for computing a binomial confidence interval: the Wald method (W). The W binomial confidence interval, either with or without a continuity correction, is found in every introductory statistics text. Typically, a warning or rule of thumb for determining when not to use W is included, but usually ignored. Occasionally, the Wald with a continuity correction (WCC) is included. For a single proportion the W and WCC lower bound (LB) and upper bound (UB) are defined as:

James Reed III is the Interim Chief of Health Studies and Director of Research at Lehigh Valley Hospital and Health Network. He has published over 100 journal articles and book chapters. His interests include applied statistical analyses, medical education, and statistical methods in simulation studies. Email: [email protected]

$$\text{W:}\quad LB = p - z_{\alpha/2}\sqrt{pq/n}, \qquad UB = p + z_{\alpha/2}\sqrt{pq/n},$$

$$\text{WCC:}\quad LB = p - \left(z_{\alpha/2}\sqrt{pq/n} + \frac{1}{2n}\right), \qquad UB = p + \left(z_{\alpha/2}\sqrt{pq/n} + \frac{1}{2n}\right)$$

where p = r/n, q = 1 − p, r is the number of successes, and n is the total sample size. Even though these two confidence interval methods are similar to large-sample formulas for means, both the W and WCC confidence intervals behave poorly in terms of zero-width intervals and overshoot (Beal, 1987; Vollset, 1993; Newcombe, 1998; Pires, 2002; Reiczigel, 2003; Agresti, 2003). For instance, when r = 0 or r = n, W and WCC have zero-width or degenerate confidence intervals. Despite the known poor performance of the W and WCC confidence intervals, they continue to dominate in statistics textbooks, typically accompanied by warnings that when np is small, usually less than 5 or 10, exact or score methods should be used. A slightly different version of the rule of thumb requires that npq should be greater than or equal to 5. A better rule is to not compute confidence bounds for a proportion using the W method but rather to use one of the better methods.

For small proportions the calculated lower bound can be below zero. Conversely, when a proportion approaches one, such as in the sensitivity and specificity of diagnostic or screening tests, the upper bound may exceed one. This overshoot is avoided by truncating the interval to lie within [0, 1]. Overshoot and zero-width confidence intervals may be avoided by a variety of better methods.

One of the standard measures of binomial confidence interval performance is the coverage probability, C(π|n,α). Given X = k, n, and α, let δ(π|k,n,α) = 1 if π ∈ [LB(k,n,α), UB(k,n,α)], and δ(π|k,n,α) = 0 otherwise. Then, C(π|n,α) for a given π is:

$$C(\pi \mid n, \alpha) = \sum_{k=0}^{n} P(X = k \mid n, \pi)\,\delta(\pi \mid k, n, \alpha)$$

Figure 1 shows the 95% confidence interval coverage probability of the standard Wald methods {W, WCC} as a function of π, π ∈ [0,1], for n = 20. The coverage probability curves demonstrate the subnominal coverage for values of π near 0 and 1.
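The W and WCC bounds and the coverage probability defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's code: the function names `wald_interval` and `coverage` are hypothetical, the normal quantile comes from the standard library, and the bounds are truncated to [0, 1] as described in the text.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(r, n, alpha=0.05, continuity=False):
    """Wald (W) or continuity-corrected Wald (WCC) bounds, truncated to [0, 1]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    p = r / n
    half = z * sqrt(p * (1 - p) / n)
    if continuity:
        half += 1 / (2 * n)  # continuity correction 1/(2n)
    return max(0.0, p - half), min(1.0, p + half)


def coverage(interval, pi, n, alpha=0.05):
    """C(pi | n, alpha): total P(X = k) over all k whose interval contains pi."""
    total = 0.0
    for k in range(n + 1):
        lb, ub = interval(k, n, alpha)
        if lb <= pi <= ub:  # delta(pi | k, n, alpha) = 1
            total += comb(n, k) * pi**k * (1 - pi) ** (n - k)
    return total


# r = 0 gives a degenerate (zero-width) Wald interval.
print(wald_interval(0, 20))
# Coverage of the nominal 95% Wald interval at pi = 0.05, n = 20.
print(coverage(wald_interval, 0.05, 20))
```

Evaluating `coverage` over a grid of π values in [0, 1] gives a curve of the kind described for Figure 1.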

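The Clopper-Pearson (CP) binomial confidence interval is the best-known exact method for interval estimation and is considered by most to be the gold standard (Clopper & Pearson, 1934). The CP confidence interval eliminates overshoot and zero-width intervals and is strictly conservative. The CP lower and upper limits are defined by inverting the exact binomial tests with equal-tailed acceptance regions:

$$\text{CP } LB = \begin{cases} 0 & \text{if } r = 0, \\ (\alpha/2)^{1/n} & \text{if } r = n, \\ \left[1 + \dfrac{n - r + 1}{r\,F_{2r,\,2(n-r+1),\,1-\alpha/2}}\right]^{-1} & \text{otherwise,} \end{cases}$$

$$\text{CP } UB = \begin{cases} 1 - (\alpha/2)^{1/n} & \text{if } r = 0, \\ 1 & \text{if } r = n, \\ \left[1 + \dfrac{n - r}{(r + 1)\,F_{2(r+1),\,2(n-r),\,\alpha/2}}\right]^{-1} & \text{otherwise.} \end{cases}$$

Here $F_{\nu_1,\,\nu_2,\,\gamma}$ is read as the point of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom that has upper-tail probability $\gamma$; with this reading the general expressions reduce to the closed forms given for r = 0 and r = n.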
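The phrase "inverting the exact binomial tests with equal-tailed acceptance regions" can be made concrete numerically. The sketch below does not use the F-distribution form; it assumes a simple bisection on the binomial tail probabilities, which should agree with the closed forms up to the tolerance. The names `binom_cdf`, `solve`, and `clopper_pearson` are illustrative only.

```python
from math import comb


def binom_cdf(k, n, pi):
    """P(X <= k) for X ~ Binomial(n, pi)."""
    return sum(comb(n, j) * pi**j * (1 - pi) ** (n - j) for j in range(k + 1))


def solve(f, target, increasing, tol=1e-12):
    """Bisection on [0, 1] for f(pi) = target, where f is monotone in pi."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies to the left of mid
    return (lo + hi) / 2


def clopper_pearson(r, n, alpha=0.05):
    """Equal-tailed exact bounds: each tail test is inverted at alpha/2."""
    # Lower bound solves P(X >= r | pi) = alpha/2 (increasing in pi).
    lb = 0.0 if r == 0 else solve(
        lambda pi: 1 - binom_cdf(r - 1, n, pi), alpha / 2, True)
    # Upper bound solves P(X <= r | pi) = alpha/2 (decreasing in pi).
    ub = 1.0 if r == n else solve(
        lambda pi: binom_cdf(r, n, pi), alpha / 2, False)
    return lb, ub


print(clopper_pearson(0, 20))   # upper bound should match 1 - (alpha/2)**(1/n)
print(clopper_pearson(10, 20))
```

Bisection is chosen here only because it needs nothing beyond the standard library; a statistical package with beta or F quantiles would give the same bounds directly.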

Fleiss (1981) preferred a more computationally intense binomial confidence interval with a continuity correction (SCC) attributed to Wilson (Wilson, 1927). For a single proportion, Wilson's Score (S) and Wilson's Score with continuity correction (SCC) LB and UB are defined as:

$$\text{S:}\quad LB = \frac{2np + z^2 - z\sqrt{z^2 + 4npq}}{2(n + z^2)}, \qquad UB = \frac{2np + z^2 + z\sqrt{z^2 + 4npq}}{2(n + z^2)},$$

$$\text{SCC:}\quad LB = \frac{2np + z^2 - 1 - z\sqrt{z^2 - 2 - 1/n + 4p(nq + 1)}}{2n + 2z^2}, \qquad UB = \frac{2np + z^2 + 1 + z\sqrt{z^2 + 2 - 1/n + 4p(nq - 1)}}{2n + 2z^2}.$$

Blyth and Still (1983) investigated the performance of W, WCC, CP, Sterne's binomial confidence interval method (Sterne, 1954), and Pratt's (P) approximate confidence interval method (Pratt, 1968). Their results demonstrate the need for a continuity correction even when n is large. Blyth and Still then suggested a modification to W (WBS). While the WBS was an improvement over W and WCC, they concluded that it still was not satisfactory. The LB and UB for WBS are defined as:

$$LB = p - \frac{z}{\sqrt{n - z^2 - 2z/\sqrt{n} - 1/n}}\left[\sqrt{pq} + \frac{1}{2n}\right], \quad \text{except } LB = 0 \text{ when } r = 0,$$

$$UB = p + \frac{z}{\sqrt{n - z^2 - 2z/\sqrt{n} - 1/n}}\left[\sqrt{pq} + \frac{1}{2n}\right], \quad \text{except } UB = 1 \text{ when } r = n.$$

Figure 1. Coverage Probabilities (n=20) for the Wald and Wald CC Binomial Confidence Interval Methods. [Panels: Wald (W) and Wald CC (WCC); vertical axis: CP (coverage probability), 0.80 to 1.00; horizontal axis: P.]

Vollset (1993) compared thirteen methods for computing binomial confidence intervals using evaluative criteria of C(P), interval width, and errors relative to limits. Vollset proposed a mean Pratt (MP), a modification of P that is a closed form approximation to the mid-P exact interval. Define the UB of P as:

$$\text{P } UB = \left[1 + \left(\frac{r + 1}{n - r}\right)^2 \left(\frac{A - B}{C}\right)^3\right]^{-1},$$

with

$$A = 81(r + 1)(n - r) - 9n - 8,$$

$$B = 3z\sqrt{9(r + 1)(n - r)(9n + 5z^2) + n + 1},$$

and

$$C = 81(r + 1)^2 - 9(r + 1)(2 + z^2) + 1.$$

For P LB, replace r with r − 1 and z with −z. The Vollset MP lower and upper bounds are then defined as:

$$\text{MP } LB = \frac{P_L(r) + P_L(r + 1)}{2}, \qquad \text{MP } UB = \frac{P_U(r) + P_U(r - 1)}{2},$$

where $P_L(\cdot)$ and $P_U(\cdot)$ denote the Pratt lower and upper bounds evaluated at the indicated number of successes. Vollset argued that W and WCC were unsatisfactory and that the Clopper-Pearson, Pratt's approximation, MP, S, and SCC are methods that may be safely used in all applications.

Newcombe (1998) compared seven methods for constructing two-sided binomial confidence intervals (W, WCC, S, SCC, Clopper-Pearson, mid-P, and a likelihood-based method). The W and WCC were quickly judged as being inadequate, highly anti-conservative, asymmetrical in coverage, and incurring a higher risk of unacceptable boundary limits. Newcombe argued that neither W nor WCC should be acceptable methods for the scientific literature since other methods are tractable and all perform much better. Newcombe further argued that the use of the simple asymptotic standard error of a proportion should be restricted to sample size planning and introductory teaching purposes. Newcombe preferred three methods: the Clopper-Pearson method, the Score method, and the mid-P binomial-based method.

Agresti and Coull (1998), in noting the poor performance of the Wald interval and the conservativeness of the Clopper-Pearson interval, proposed a straightforward adjustment: add 4 to Wald. They suggested simply adding two successes and two failures and then using the Wald formula. Alternatively, one could add z²/2 successes and z²/2 failures before computing the Wald confidence interval.

The latter is preferred. The Agresti-Coull adjusted Wald (AC) lower and upper bounds are:

$$LB = p' - z\sqrt{p'q'/n'}, \qquad UB = p' + z\sqrt{p'q'/n'},$$

where $p' = (2r + z^2)/(2(n + z^2))$, $q' = 1 - p'$, and $n' = n + z^2$. This is the Wald formula applied after adding z²/2 successes and z²/2 failures, so that p' = (r + z²/2)/(n + z²).

Pires (2002) compared twelve methods for constructing confidence intervals for a binomial proportion and concluded that a clear classification of conservative methods included the Clopper-Pearson, the Score, and two arcsine transformation methods. A second tier of recommended confidence interval construction methods included a Bayesian method and the SCC.

Agresti (2003) argued for reducing the effects of discreteness in binomial confidence intervals by inverting two-sided tests rather than two one-sided tests. In most statistical practice, for interval estimation of a proportion or a difference or ratio of proportions, the inversion of the asymptotic score test is the best choice. If one wants to be a bit more conservative, mid-P adaptations or the Clopper-Pearson are recommended. For teaching purposes, the Wald-type interval, plus and minus a normal-score multiple of a standard error, is simplest.

Reiczigel (2003) compared several methods for constructing binomial confidence intervals: Wilson's Score, the Agresti and Coull adjusted Wald, the Clopper-Pearson, the mid-P, and Sterne's interval. Unique to this study is the recommendation of using the Sterne interval and the Agresti-Coull adjusted Wald interval for binomial confidence intervals.

Tobi et al. (2005) compared the performance of seven approximate methods and the exact Clopper-Pearson confidence intervals for small proportions. Three criteria were used to evaluate the performance of confidence intervals: coverage, confidence interval width, and aberrant confidence intervals. They concluded that: (1) one should compute confidence intervals for small proportions even when the number of events equals zero, (2) the method used for confidence interval calculation should be reported, (3) the W method should be discarded, and (4) the Clopper-Pearson and the SCC are the best choices to calculate confidence intervals for small proportions.

Borkowf (2005) argued that even though the Agresti-Coull binomial confidence intervals are substantially better than those of the Wald method, the method can yield subnominal coverage for some values of π for moderate sample sizes. Borkowf proposed a binomial confidence interval that results in near nominal coverage and is easy to calculate: the original data are first augmented with a single imaginary failure to compute the lower confidence bound and with a single imaginary success to compute the upper confidence bound. This is the single augmentation with an imaginary failure or success (SAIFS) method. The lower and upper SAIFS confidence bounds are then:

$$\text{SAIFS } LB = p_1 - \xi_{1-\alpha/2}\sqrt{p_1 q_1/n}, \qquad \text{SAIFS } UB = p_2 + \xi_{1-\alpha/2}\sqrt{p_2 q_2/n},$$

with $p_1 = (r + 0)/(n + 1)$ and $p_2 = (r + 1)/(n + 1)$.

Borkowf (2005) evaluated two forms of the SAIFS. The first uses the z-quantiles ($\xi_{1-\alpha/2}$) and the second uses the t-quantiles ($\tau_{n-1,\,1-\alpha/2}$). Compared to the Clopper-Pearson method, the SAIFS method using either the z or t quantiles results in confidence intervals with mean widths that are narrower for proportion parameters near 0 or 1 and whose coverage probabilities are marginally better over all values of π. The SAIFS-Z is preferred.
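As a rough illustration of the two augmentation-based intervals just described, the sketch below implements the Agresti-Coull adjusted Wald bounds (adding z²/2 successes and z²/2 failures) and the SAIFS-z bounds. The function names are hypothetical, and truncating both intervals to [0, 1] is an assumption added here, following the truncation remedy mentioned earlier for overshoot.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull(r, n, alpha=0.05):
    """Adjusted Wald: add z^2/2 successes and z^2/2 failures, then apply Wald."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_adj = n + z**2                   # n' = n + z^2
    p_adj = (r + z**2 / 2) / n_adj     # p' = (r + z^2/2) / (n + z^2)
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Truncation to [0, 1] is an assumption of this sketch.
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


def saifs_z(r, n, alpha=0.05):
    """SAIFS with z-quantiles: one imaginary failure (LB), one imaginary success (UB)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1 = r / (n + 1)          # data augmented with a single imaginary failure
    p2 = (r + 1) / (n + 1)    # data augmented with a single imaginary success
    lb = p1 - z * sqrt(p1 * (1 - p1) / n)
    ub = p2 + z * sqrt(p2 * (1 - p2) / n)
    # Truncation to [0, 1] is an assumption of this sketch.
    return max(0.0, lb), min(1.0, ub)


# Unlike the Wald interval, both methods give a non-degenerate interval at r = 0.
print(agresti_coull(0, 20))
print(saifs_z(0, 20))
```

Swapping the normal quantile for a t-quantile with n − 1 degrees of freedom would give the SAIFS-t variant; that is omitted here because the standard library has no t-quantile function.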


Figure 2 shows the 95% confidence interval coverage probability as a function of π, π ∈ [0,1], for n = 20 for CP, WBS, S, SCC, AC, and SAIFS-Z. Note that the sawtooth appearance of the coverage functions is due to the discontinuities at values of π corresponding to any lower or upper limit in the set of n + 1 confidence intervals. The Clopper-Pearson and Borkowf SAIFS-z methods give at least nominal coverage for all values of π ∈ [0,1], with severe overcoverage near 0 and 1. The Score CC method gives at least nominal coverage for all values of π ∈ [0,1] and avoids the overcoverage of either the Clopper-Pearson or Score methods. The Score and Agresti-Coull methods yield nearly nominal coverage for all values of π ∈ [0,1].

Conclusion

For the one-sample binomial confidence interval, a new generation of introductory and medical statistics textbooks should emphasize the poor performance properties of W and WCC and include better binomial confidence interval methods. At least one from the set of Clopper-Pearson, S, SCC, Agresti-Coull, or the SAIFS-z methods should be mentioned. With the widespread use of laptop computers and access to computing resources on the internet, the complexity of computing binomial confidence intervals should not be an issue.

The question remains as to which method to use. The Clopper-Pearson exact method has been regarded as definitive because it eliminates both overshoot and zero-width intervals. The Clopper-Pearson exact method is the most conservative and is unquestionably a better alternative to the W when constructing and reporting binomial confidence intervals. In terms of programming ease, the Clopper-Pearson is easily programmed, as are the Blyth & Still, Wilson's Score, Score with a continuity correction, the Agresti-Coull method, and the Borkowf SAIFS-z.

Figure 2. Coverage Probabilities (n=20) for the Clopper-Pearson, Score, Score CC, Agresti-Coull, and Borkowf SAIFS-z Binomial Confidence Interval Methods. [Panels: Clopper-Pearson (CP), Score (S), Score CC (SCC); vertical axis: CP (coverage probability), 0.80 to 1.00; horizontal axis: P.]

Figure 2 (Continued). Coverage Probabilities (n=20) for the Clopper-Pearson, Score, Score CC, Agresti-Coull, and Borkowf SAIFS-z Binomial Confidence Interval Methods. [Panels: Agresti-Coull (AC), Borkowf SAIFS-z (SAIFSZ); vertical axis: CP (coverage probability), 0.80 to 1.00; horizontal axis: P, 0 to 1.]


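Table 1. Methods for Calculation of Confidence Intervals for a Single Proportion

Method: Clopper-Pearson (CP)
LB: 0 if r = 0; $(\alpha/2)^{1/n}$ if r = n; otherwise $\left[1 + (n - r + 1)/(r\,F_{2r,\,2(n-r+1),\,1-\alpha/2})\right]^{-1}$
UB: $1 - (\alpha/2)^{1/n}$ if r = 0; 1 if r = n; otherwise $\left[1 + (n - r)/((r + 1)\,F_{2(r+1),\,2(n-r),\,\alpha/2})\right]^{-1}$

Method: Score (Wilson) (S)
LB: $\left(2np + z^2 - z\sqrt{z^2 + 4npq}\right)/\left(2(n + z^2)\right)$
UB: $\left(2np + z^2 + z\sqrt{z^2 + 4npq}\right)/\left(2(n + z^2)\right)$

Method: Score with continuity correction (SCC)
LB: $\left[2np + z^2 - 1 - z\sqrt{z^2 - 2 - 1/n + 4p(nq + 1)}\right]/(2n + 2z^2)$
UB: $\left[2np + z^2 + 1 + z\sqrt{z^2 + 2 - 1/n + 4p(nq - 1)}\right]/(2n + 2z^2)$

Method: Agresti-Coull (AC)
LB: $p' - z\sqrt{p'q'/n'}$
UB: $p' + z\sqrt{p'q'/n'}$
where $p' = (2r + z^2)/(2(n + z^2))$, $q' = 1 - p'$, and $n' = n + z^2$.

Method: Borkowf (SAIFS)
LB: $p_1 - \xi_{1-\alpha/2}\sqrt{p_1 q_1/n}$
UB: $p_2 + \xi_{1-\alpha/2}\sqrt{p_2 q_2/n}$
with $p_1 = (r + 0)/(n + 1)$ and $p_2 = (r + 1)/(n + 1)$, where $\xi_{1-\alpha/2}$ are the z-quantiles or $\tau_{n-1,\,1-\alpha/2}$ the t-quantiles.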
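The two Score rows of Table 1 translate almost directly into code. The following is a sketch under stated assumptions, not a reference implementation: the function names are hypothetical, and the rule setting the SCC lower bound to 0 when r = 0 and the upper bound to 1 when r = n is a boundary convention added here (it is not stated in the table) so that the continuity-corrected bounds stay sensible at the extremes.

```python
from math import sqrt
from statistics import NormalDist


def wilson(r, n, alpha=0.05):
    """Wilson score (S) bounds from Table 1."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = r / n
    q = 1 - p
    root = z * sqrt(z**2 + 4 * n * p * q)
    denom = 2 * (n + z**2)
    lb = (2 * n * p + z**2 - root) / denom
    ub = (2 * n * p + z**2 + root) / denom
    # Clamping only guards against floating-point rounding at r = 0 or r = n.
    return max(0.0, lb), min(1.0, ub)


def wilson_cc(r, n, alpha=0.05):
    """Wilson score with continuity correction (SCC) from Table 1."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = r / n
    q = 1 - p
    denom = 2 * n + 2 * z**2
    # Assumed boundary convention: LB = 0 when r = 0, UB = 1 when r = n.
    if r == 0:
        lb = 0.0
    else:
        lb = (2 * n * p + z**2 - 1
              - z * sqrt(z**2 - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    if r == n:
        ub = 1.0
    else:
        ub = (2 * n * p + z**2 + 1
              + z * sqrt(z**2 + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    return max(0.0, lb), min(1.0, ub)


print(wilson(5, 20))
print(wilson_cc(5, 20))  # the corrected interval contains the uncorrected one
```

Passing either function to a coverage routine over a grid of π values would reproduce curves of the kind summarized in Figure 2.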

References

Agresti, A. & Coull, B. A. (1998). Approximate is better than 'exact' for interval estimation of binomial proportions. The American Statistician, 52, 119-126.

Agresti, A. & Min, Y. (2001). On small-sample confidence intervals for parameters in discrete distributions. Biometrics, 57, 963-971.

Agresti, A. (2003). Dealing with discreteness: Making 'exact' confidence intervals for proportions, differences of proportions, and odds ratios more exact. Statistical Methods in Medical Research, 12, 3-21.

Blyth, C. R. & Still, H. A. (1983). Binomial confidence intervals. Journal of the American Statistical Association, 78, 108-116.

Bonett, D. G. & Price, R. M. (2005). Confidence intervals for a ratio of binomial proportions based on paired data. Statistical Methods Medical Research, 15.

Borkowf, C. B. (2005). Constructing binomial confidence intervals with near nominal coverage by adding a single imaginary failure or success. Statistics in Medicine, 25.

Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26, 404-413.

Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: John Wiley & Sons.

Newcombe, R. G. (1998). Two-sided confidence intervals for the single proportion: Comparison of seven methods. Statistics in Medicine, 17, 857-872.

Pires, A. M. (2002). Confidence intervals for a binomial proportion: Comparison of methods and software evaluation. Proceedings of the Conference ComStat 2002. http://www.math.ist.utl.pt/~apires.

Pratt, J. W. (1968). A normal approximation for binomial, F, Beta, and other common, related tail probabilities. Journal of the American Statistical Association, 63, 1457-1483.

Radhakrishna, S., Murthy, B. N., Nair, N. G. K., Jayabal, P., & Jayasri, R. (1992). Confidence intervals in medical research. Indian Journal of Medical Research [B], 96, 199-205.

Reiczigel, J. (2003). Confidence intervals for the binomial parameter: Some new considerations. Statistics in Medicine, 22, 611-621.

Sterne, T. E. (1954). Some remarks on confidence or fiducial limits. Biometrika, 41, 275-278.

Tobi, H., van den Berg, P. B., & de Jong-van den Berg, L. T. W. (2005). Small proportions: What to report for confidence intervals. Pharmacoepidemiology and Drug Safety, 14, 239-247.

Vollset, S. E. (1993). Confidence intervals for a binomial proportion. Statistics in Medicine, 12, 809-824.

Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22, 209-212.
