Author: Dorothy Dixon

METHODOLOGY

ARTICLE

The Area under an ROC Curve with Limited Information

Wilbert B. van den Hout, PhD

The area under the receiver operating characteristic (ROC) curve of a diagnostic test can be used as a summary measure for its discriminative ability. If only a single point of an ROC curve is available, then the entire form of the ROC curve is unknown and the area under it cannot be calculated. Assuming that the unknown ROC curve is either monotone or concave, lower and upper bounds are derived for the area. From these bounds, the minmax approximations are obtained. Compared to only assuming monotonicity, assuming that the unknown ROC curve is concave renders a higher minmax approximation for the area under it, with tighter bounds.

Key words: ROC curve analysis; area under the curve; monotonicity; concavity. (Med Decis Making 2003;23:160–166)

Many diagnostic tests are based on a characteristic that can have a range of values, such as diastolic blood pressure. For such tests, a cutoff point needs to be specified to distinguish normal values from abnormal values, thus indicating absence and presence of disease. If a less extreme cutoff point is used, then more patients are indicated as having the disease. This improves the sensitivity of the test (i.e., the probability of correctly concluding the disease is present in diseased patients) but at the expense of deteriorating the specificity (i.e., the probability of correctly concluding the disease is absent in healthy patients). Receiver operating characteristic (ROC) curves are used to describe the possible combinations of sensitivity and specificity, depending on the cutoff point that is chosen. Without specifying the cutoff point, the area under this ROC curve can be used as a summary measure for the discriminative ability of the test.1,2 The value of the area can be used to compare the diagnostic performance of different tests or different groups of observers, even if the groups use different cutoff points.

A diagnostic test need not have a range of possible cutoff points. If the test characteristic is binary, then the ROC curve consists of only a single combination of sensitivity (Se) and specificity (Sp). The entire form of the ROC curve is then unknown or even nonexistent, so the area under it cannot be calculated. Nevertheless, it is still tempting to try to specify it as an indicator for the discriminative ability of the test. Cantor and Kattan3 proposed a simple method to do so and determined the area at 1/2 (Se + Sp). Desbiens4 later showed that this formula is the smallest possible area among all concave ROC curves that contain the specified combination of sensitivity and specificity. Therefore, he concluded that the approximation was not unbiased but was in fact a lower bound of the true area under the unknown ROC curve. In addition, he proposed an upper bound for the area. In response, Cantor and Kattan5 showed that the proposed upper bound was not a proper upper bound, and they proposed a new upper bound. However, whereas the upper bound proposed by Desbiens was inaccurately low, we will show here that the upper bound proposed by Cantor and Kattan was unnecessarily high. They themselves provided an example in which their upper bound was 0.95, whereas the area under the unknown concave ROC curve could not have been more than 0.92.

This paper will first describe and discuss some general characteristics of ROC curves, including monotonicity and concavity. Then, assuming either of these characteristics, bounds and approximations are derived for the area under an ROC curve of which only a single point is available.

Received 5 December 2001 from the Department of Medical Decision Making, Leiden University Medical Center, Leiden, the Netherlands. Financial support for this study was provided entirely by a contract with the Leiden University Medical Center. The funding agreement ensured the author’s independence in designing the study, interpreting the data, writing, and publishing the report. Revision accepted for publication 2 December 2002. Address correspondence and reprint requests to Wilbert B. van den Hout, Department of Medical Decision Making, Leiden University Medical Center, P.O. Box 9600, 2300 RC, Leiden, the Netherlands; telephone: 31-(71)-526-4577; fax: 31-(71)-526-6838; e-mail: W.B.van_ [email protected].

DOI: 10.1177/0272989X03251246

160 • MEDICAL DECISION MAKING/MAR–APR 2003

ROC CURVES

Figure 1 shows the ROC plane. ROC curves are traditionally graphed by plotting the true-positive fraction (TPF = Se) on the vertical axis against the false-positive fraction (FPF = 1 – Sp) on the horizontal axis. Particular regions of the ROC plane correspond with particular types of diagnostic performance. The top-left region of the ROC plane represents good diagnostic performance, with few false positives and many true positives. The extreme lower left corner represents an observer that classifies all patients as normal, whereas the extreme top-right corner represents an observer that classifies all patients as diseased. The chance diagonal contains all tests that are essentially just as good as tossing an unbiased coin (the center of the diagonal) or a biased coin (the remainder of the diagonal). Diagnostic performance below the diagonal is even worse than tossing a coin.

Figure 1 Monotonicity and concavity of receiver operating characteristic (ROC) curves. [Plot of true-positive fraction (sensitivity) against false-positive fraction (1 – specificity), showing points a to f and the chance diagonal.]

ROC curves can have several interesting characteristics, with some more reasonable than others. To begin with, we will assume in this paper that an ROC curve assigns a unique TPF value to each FPF value on the horizontal axis. This assumption can be disputed with good reason. If the test characteristic is binary (such as being pregnant or not) or has a discrete number of possible cutoff points (such as the number of previous deliveries), then the ROC curve consists of a set of disjoint points. Moreover, for a variety of reasons, even a test characteristic with a continuous range of cutoff values (such as blood pressure) may have only a discrete number of viable cutoff points (such as whole values of mm Hg). In such cases, only a limited number of FPF values have an associated TPF value. In between these points, the ROC curve is nonexistent, so theoretically the area under the ROC curve is not a properly defined concept. Because we are interested in just that area, it is reasonable here to assume that each FPF value has an associated TPF value, albeit possibly unknown.

Another reasonable characteristic for ROC curves is that they are monotone (i.e., nondecreasing). This basically means that, along the ROC curve, the FPF and the TPF cannot both deteriorate at the same time. The bent curve in Figure 1 shows an ROC curve that is not monotone: following the curve from point a to point b, the FPF increases and the TPF decreases. Although a test with this ROC curve is conceivable, monotonicity is in practice rarely violated. All tests based on a cutoff value are of necessity monotone. Moreover, any test with a nonmonotone ROC curve can easily be improved. In Figure 1, all combinations between a and c are dominated by the diagnostic accuracy associated with point a: the FPF in a is lower and the TPF is higher. Therefore, it is reasonable to refrain from using this dominated part of the ROC curve, for example, by using point a instead. This way, the dominated nonmonotone part of the ROC curve is removed, and thus a monotone ROC curve is obtained.

Although less obvious, concavity is also a reasonable characteristic for ROC curves.6,7 An ROC curve is concave if the slope of the curve is nonincreasing (i.e., if the slope is high in the lower left corner and monotonically decreases toward the top-right corner). In Figure 1, neither the original nor the monotone version of the ROC curve is concave: initially, the slope does decrease, but between c and e the curve becomes steeper and steeper. Concavity is a desirable property, for one, because it implies monotonicity and ensures that the ROC curve is never below the chance diagonal. A further reason to assume concavity is that, like tests with nonmonotone ROC curves, tests with nonconcave ROC curves can be improved. In Figure 1, consider the following randomized test: depending on the toss of an unbiased coin, use the original test either with cutoff point d or with cutoff point e. Point f shows the diagnostic accuracy of this randomized test, which is the average of the accuracy for cutoff points d and e. Using biased coins, the entire line segment between d and e can be obtained. This line segment is above the nonconcave part of the original ROC curve, so the randomized test dominates the original test. For any nonconcave ROC curve, such a new, improved test can be constructed by randomizing different cutoff points. The peculiarity of the randomized improvement shows all the more how reasonable concavity is. A final reason to assume that ROC curves are concave is that criteria for selecting the optimal point of an ROC curve are based on the slope of the curve and exclude selection of points in nonconcave parts of the ROC curve.8

Like monotonicity, concavity is a reasonable but not a necessary characteristic of ROC curves. Whereas monotonicity is rarely violated, nonconcave ROC curves are published quite frequently. For example, ROC curves that are at some point below the chance diagonal can be monotone but cannot be concave. Likewise, ROC curves with more than 1 horizontal line segment are necessarily nonconcave. Theoretically, concavity also excludes the well-known class of binormal ROC curves. This class of ROC curves is obtained by assuming that the test characteristic has 2 normal distributions, depending on whether disease is absent or present. At either of the end points, binormal ROC curves are necessarily nonconcave and below the chance diagonal (unless the binormal ROC curve is symmetric).9 However, the region where binormal ROC curves misbehave is usually small.

THE AREA UNDER ROC CURVES

The area under an ROC curve is a measure for the discriminative ability of a test.
Tests with good diagnostic performance have an ROC curve that is close to the top-left corner, with an area that is close to 1. In general, the area under an ROC curve can range from 0 to 1. The area under the chance diagonal is 1/2, so any ROC curve with an area less than 1/2 is usually classified as particularly bad. If an ROC curve is not entirely known, then it is not possible to calculate the area under the curve. The problem this paper addresses is determining the range of values that the area under an ROC curve can attain if only a single interior point of that ROC curve is available. In general, the area under the ROC curve can then still range from 0 to 1. The lower bound 0 corresponds to the ROC curve that follows the lower right boundary of the ROC plane, with only a short peak at the available combination. Similarly, the upper bound 1 corresponds to the ROC curve that follows the upper left boundary of the ROC plane, with only a short drop at the available combination. Therefore, knowing 1 point of the ROC curve does not reduce the range of values that the area under the ROC curve can attain, unless additional assumptions are made.

THE AREA UNDER MONOTONE ROC CURVES

Consider first the case in which we are willing to assume that the unknown ROC curve is monotone but not necessarily concave. In Figure 2, the single available interior point of the ROC curve is denoted by the combination (FPF; TPF) = (1 – Sp; Se). A nondecreasing ROC curve could not contain this observed combination and also pass through the interior of block B. Therefore, among the monotone ROC curves containing the observed combination, the one with the smallest area is the one that closely follows the upper left boundary of block B. The area under that curve is the lower bound for the area under the unknown monotone ROC curve: Lmon = B = Se × Sp.

Likewise, a nondecreasing ROC curve could not pass through the interior of block D and also contain the observed combination. Therefore, among the monotone ROC curves containing the observed combination, the one with the largest area is the one that closely follows the lower right boundary of block D. The area under that curve is the upper bound for the area under the unknown monotone ROC curve: Umon = 1 – D = 1 – (1 – Se) × (1 – Sp).

If the unknown ROC curve is monotone, then the derived lower and upper bounds specify the range of values that the area under the unknown ROC curve can attain. A plausible approximation for the true area under the unknown monotone ROC curve is the middle of the derived range:

$$A_{\mathrm{mon}} = \tfrac{1}{2}\,(L_{\mathrm{mon}} + U_{\mathrm{mon}}) = \tfrac{1}{2}\,(\mathrm{Se} + \mathrm{Sp}).$$
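As a minimal sketch, the monotone bounds and their midpoint can be computed directly from a single (Se, Sp) pair. The function name `monotone_bounds` and its return layout are illustrative assumptions, not part of the paper.

```python
def monotone_bounds(se, sp):
    """Bounds on the area under a monotone ROC curve through (1 - sp, se).

    Returns (lower, upper, approximation, largest possible error).
    The name and return layout are illustrative assumptions.
    """
    lower = se * sp                      # L_mon: area of block B
    upper = 1 - (1 - se) * (1 - sp)      # U_mon: 1 minus area of block D
    approx = (lower + upper) / 2         # A_mon, algebraically (se + sp) / 2
    max_error = (upper - lower) / 2      # half the width of the range
    return lower, upper, approx, max_error


if __name__ == "__main__":
    print(monotone_bounds(0.85, 0.6))
```

The midpoint always coincides with (Se + Sp)/2, because the product terms in the two bounds cancel.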

Because the approximation is defined as the middle of the range of possible values, it is equally close to both bounds of the range. Any other approximation would have a larger potential error because it would have to be further away from either the lower bound or the upper bound. Therefore, the proposed approximation is the minmax approximation that minimizes the maximum error. Its largest possible error depends on the observed diagnostic performance. It can range from 0 (for observed accuracy at the top-left or the lower right corner) to 0.5 (at the lower left and the top-right corner). The median largest possible error over the entire ROC plane is 0.25. As an example, consider the case shown in Figure 2, in which the observed diagnostic accuracy has sensitivity 0.85 and specificity 0.6. If we assume that the unknown ROC curve is monotone, then its area can range from the lower bound Lmon = 0.85 × 0.6 = 0.51 to the upper bound Umon = 1 – 0.15 × 0.4 = 0.94. The corresponding minmax approximation is Amon = 1/2 (0.51 + 0.94) = 0.725, with largest possible error 1/2 (0.94 – 0.51) = 0.215.

Figure 2 Monotone receiver operating characteristic (ROC) curves containing the observed diagnostic accuracy. [Plot of true-positive fraction (sensitivity) against false-positive fraction (1 – specificity), showing the observed point (1 – Sp; Se) and regions A, B, C, and D.]

Figure 3 Concave receiver operating characteristic (ROC) curves containing the observed diagnostic accuracy. [Plot of true-positive fraction (sensitivity) against false-positive fraction (1 – specificity), showing the observed point (1 – Sp; Se), points a and b, slope r, and regions A, B, C, and D.]
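The stated range and median of the largest possible error of the monotone approximation can be checked numerically. The sketch below assumes that "over the entire ROC plane" means observed points spread uniformly over the unit square, and it approximates that with a midpoint grid; the grid size `n` and the helper name `monotone_max_error` are illustrative choices.

```python
from statistics import median


def monotone_max_error(se, sp):
    """Half the width of the monotone range [Se*Sp, 1 - (1-Se)*(1-Sp)]."""
    return (1 - (1 - se) * (1 - sp) - se * sp) / 2


# Midpoint grid over the unit square (assumed uniform weighting of the ROC plane).
n = 200
grid = [(i + 0.5) / n for i in range(n)]
errors = [monotone_max_error(se, sp) for se in grid for sp in grid]
median_error = median(errors)

if __name__ == "__main__":
    print(round(median_error, 4))
    # Corner values quoted in the text: 0 at top-left, 0.5 at lower left.
    print(monotone_max_error(1.0, 1.0), monotone_max_error(0.0, 1.0))
```

Writing the error as 1/4 – (Se – 1/2)(Sp – 1/2) shows why the median is 0.25 under this weighting: the product term is symmetric around zero.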

THE AREA UNDER CONCAVE ROC CURVES

If one is willing to assume that the unknown ROC curve is not only monotone but also concave, then the range of values the area under the ROC curve can attain can be further reduced. Concave ROC curves have an area that is at least a half because concave ROC curves are never below the chance diagonal. It only makes sense to assume concavity if the observed diagnostic performance is also not below the chance diagonal (Se + Sp ≥ 1).

As observed by Desbiens,4 the lower bound for the area under a concave ROC curve is the formula derived by Cantor and Kattan.3 In Figure 3, a concave ROC curve containing the observed combination (1 – Sp; Se) cannot also pass through the interior of triangles A or C. Therefore, among the concave ROC curves containing the observed combination, the one with the smallest area is the one that closely follows the upper boundaries of triangles A and C. The area under that curve is a lower bound for the area under the unknown concave ROC curve:

$$L_{\mathrm{con}} = A + B + C = \tfrac{1}{2}\,\mathrm{Se}\,(1 - \mathrm{Sp}) + \mathrm{Se} \times \mathrm{Sp} + \tfrac{1}{2}\,(1 - \mathrm{Se})\,\mathrm{Sp} = \tfrac{1}{2}\,(\mathrm{Se} + \mathrm{Sp}).$$
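The concave lower bound is the area under the two straight segments from the lower left corner through the observed point to the top-right corner, so it can be reproduced with a plain trapezoidal sum. This is a minimal sketch; `trapezoid_area` and `concave_lower_bound` are hypothetical helper names.

```python
def trapezoid_area(points):
    """Area under a piecewise-linear curve through (fpf, tpf) points sorted by fpf."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2   # one trapezoid per segment
    return area


def concave_lower_bound(se, sp):
    """L_con: area under (0,0) -> (1 - sp, se) -> (1,1), i.e. regions A + B + C."""
    return trapezoid_area([(0.0, 0.0), (1 - sp, se), (1.0, 1.0)])


if __name__ == "__main__":
    print(concave_lower_bound(0.85, 0.6))
```

The first trapezoid is triangle A and the second is block B plus triangle C, which matches the three terms of the formula.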

Surprisingly, this lower bound for concave ROC curves is equal to the approximation Amon that was derived for monotone ROC curves. It is considerably more difficult to derive the upper bound of the range of values that the area under the concave ROC curve can attain. Consider the curve in Figure 3 consisting of the following 3 line segments: from the lower left corner, through points a and b, to the top-right corner. The middle segment from a to b contains the observed combination and has a slope denoted by r. It intersects the boundary of the ROC plane at

$$a = (0;\ \mathrm{TPF}_a) = \bigl(0;\ \mathrm{Se} - r\,(1 - \mathrm{Sp})\bigr), \qquad b = (\mathrm{FPF}_b;\ 1) = \Bigl((1 - \mathrm{Sp}) + \tfrac{1}{r}\,(1 - \mathrm{Se});\ 1\Bigr).$$

For this constructed ROC curve to remain concave, the middle segment should remain above the lower left and the top-right corner of the ROC plane. Therefore, the slope must be at least r1 = (1 – Se)/Sp and can be at most r2 = Se/(1 – Sp). These boundaries for the slope are equal to the slopes of the hypotenuses of triangles C and A, respectively, and they are known as the negative and the positive likelihood ratios. The area under the constructed 3-segment ROC curve is a function of the slope r. That function A(r) and its first and second derivatives are equal to

$$A(r) = 1 - D = 1 - \tfrac{1}{2}\,(1 - \mathrm{TPF}_a)\,\mathrm{FPF}_b = 1 - \frac{1}{2r}\,\bigl\{(1 - \mathrm{Se}) + r\,(1 - \mathrm{Sp})\bigr\}^{2},$$

$$A'(r) = \frac{1}{2r^{2}}\,\bigl\{(1 - \mathrm{Se})^{2} - r^{2}\,(1 - \mathrm{Sp})^{2}\bigr\},$$

$$A''(r) = -\frac{1}{r^{3}}\,(1 - \mathrm{Se})^{2} \le 0.$$

Different values of r constitute different ROC curves. These curves need not be entirely above the unknown concave ROC curve. However, we do know that the unknown ROC curve is concave. Previously, a curve was defined as concave if the slope of the curve was nonincreasing. An equivalent alternative definition of concavity is that for every point on the curve, there is a straight line through that point that is completely on or above the curve. The line segments from a to b are candidates for the straight line through the observed point, so there must be at least one slope r for which the 3-segment ROC curve is completely on or above the unknown concave ROC curve. For that particular r, the area A(r) must be at least as large as the area under the unknown concave ROC curve. So, if we choose the largest possible value of A(r), with r1 ≤ r ≤ r2, then that largest possible value must also be at least as large as the true area under the unknown concave ROC curve. This formulates a classic maximization problem of a concave univariate function on a convex domain: A(r) is concave because A″(r) is never positive, and the domain is the interval r1 ≤ r ≤ r2. In such problems, the maximum is attained either where the first derivative A′(r) is 0 (i.e., at r̂ = (1 – Se)/(1 – Sp)) or, if that point lies outside the domain, at the nearest end point of the domain (i.e., at r1 if r̂ < r1 or at r2 if r̂ > r2). This renders the following upper bound for the area under the unknown concave ROC curve:

$$U_{\mathrm{con}} = \begin{cases} A(r_1) & \text{if } \hat{r} < r_1, \\ A(\hat{r}) & \text{if } r_1 \le \hat{r} \le r_2, \\ A(r_2) & \text{if } \hat{r} > r_2, \end{cases} \;=\; \begin{cases} 1 - \tfrac{1}{2}\,(1 - \mathrm{Se})/\mathrm{Sp} & \text{if } \mathrm{Sp} < \tfrac{1}{2} \text{ and } \mathrm{Se} + \mathrm{Sp} \ge 1, \\ 1 - 2\,(1 - \mathrm{Se})(1 - \mathrm{Sp}) & \text{if } \mathrm{Sp} \ge \tfrac{1}{2} \text{ and } \mathrm{Se} \ge \tfrac{1}{2}, \\ 1 - \tfrac{1}{2}\,(1 - \mathrm{Sp})/\mathrm{Se} & \text{if } \mathrm{Se} < \tfrac{1}{2} \text{ and } \mathrm{Se} + \mathrm{Sp} \ge 1. \end{cases}$$
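The closed-form upper bound can be compared against a brute-force maximization of A(r) over the admissible slopes. The sketch below assumes an interior observed point (0 < Se < 1, 0 < Sp < 1) on or above the chance diagonal; the function names and the grid resolution `steps` are illustrative assumptions.

```python
def area_three_segment(r, se, sp):
    """A(r): area under the curve (0,0) -> a -> b -> (1,1) with middle slope r."""
    return 1 - ((1 - se) + r * (1 - sp)) ** 2 / (2 * r)


def concave_upper_bound(se, sp):
    """Closed-form U_con for an interior point with se + sp >= 1."""
    if se + sp < 1:
        raise ValueError("concavity requires se + sp >= 1")
    if sp < 0.5:                       # stationary slope below r1: use A(r1)
        return 1 - (1 - se) / (2 * sp)
    if se < 0.5:                       # stationary slope above r2: use A(r2)
        return 1 - (1 - sp) / (2 * se)
    return 1 - 2 * (1 - se) * (1 - sp)  # stationary slope inside [r1, r2]


def concave_upper_bound_numeric(se, sp, steps=10_000):
    """Grid search for the maximum of A(r) on r1 <= r <= r2."""
    r1, r2 = (1 - se) / sp, se / (1 - sp)
    return max(
        area_three_segment(r1 + (r2 - r1) * k / steps, se, sp)
        for k in range(steps + 1)
    )


if __name__ == "__main__":
    for se, sp in [(0.85, 0.6), (0.9, 0.3), (0.3, 0.9)]:
        print(se, sp, concave_upper_bound(se, sp), concave_upper_bound_numeric(se, sp))
```

The three test points exercise one branch each of the piecewise formula; the grid search only confirms the closed form and is not needed in practice.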

This upper bound corresponds to a particular 3-segment ROC curve. That 3-segment ROC curve need not be entirely on or above the unknown concave ROC curve, but it does have an area under it that is at least as large as the area under the unknown concave ROC curve. Moreover, it is itself a proper concave ROC curve, so it could actually be the unknown concave ROC curve. Therefore, the derived upper bound is not unnecessarily high. As for monotone ROC curves, the minmax approximation for the true area under the unknown concave ROC curve is the middle of the derived range:

$$A_{\mathrm{con}} = \tfrac{1}{2}\,(L_{\mathrm{con}} + U_{\mathrm{con}}) = \begin{cases} \tfrac{1}{4}\,\bigl(2 + \mathrm{Se} + \mathrm{Sp} - (1 - \mathrm{Se})/\mathrm{Sp}\bigr) & \text{if } \mathrm{Sp} < \tfrac{1}{2} \text{ and } \mathrm{Se} + \mathrm{Sp} \ge 1, \\ \tfrac{1}{4}\,\bigl(2 + \mathrm{Se} + \mathrm{Sp} - 4\,(1 - \mathrm{Se})(1 - \mathrm{Sp})\bigr) & \text{if } \mathrm{Sp} \ge \tfrac{1}{2} \text{ and } \mathrm{Se} \ge \tfrac{1}{2}, \\ \tfrac{1}{4}\,\bigl(2 + \mathrm{Se} + \mathrm{Sp} - (1 - \mathrm{Sp})/\mathrm{Se}\bigr) & \text{if } \mathrm{Se} < \tfrac{1}{2} \text{ and } \mathrm{Se} + \mathrm{Sp} \ge 1. \end{cases}$$
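A minimal sketch of the concave minmax approximation, computed as the midpoint of the concave lower and upper bounds. The function name `concave_minmax` and its return layout are assumptions for illustration; the input is assumed to be an interior point with Se + Sp ≥ 1.

```python
def concave_minmax(se, sp):
    """Return (L_con, U_con, A_con, largest possible error) for se + sp >= 1."""
    if se + sp < 1:
        raise ValueError("concave approximation requires se + sp >= 1")
    lower = (se + sp) / 2
    if sp < 0.5:
        upper = 1 - (1 - se) / (2 * sp)
    elif se < 0.5:
        upper = 1 - (1 - sp) / (2 * se)
    else:
        upper = 1 - 2 * (1 - se) * (1 - sp)
    approx = (lower + upper) / 2
    max_error = (upper - lower) / 2
    return lower, upper, approx, max_error


if __name__ == "__main__":
    print(concave_minmax(0.85, 0.6))
```

For an observed point exactly on the chance diagonal, both bounds collapse to 0.5 and the largest possible error is 0.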

The largest possible error of this concave approximation depends on the observed diagnostic performance and ranges from 0 (for observed accuracy at the top-left corner point or on the chance diagonal) to 0.25 (on the boundary of the ROC plane near the lower left and the top-right corner point). For observed diagnostic accuracy above the chance diagonal, the median largest possible error is 0.07. As an example, consider again the case with observed sensitivity 0.85 and observed specificity 0.6. If we assume that the unknown ROC curve is concave, then its area can range from the lower bound Lcon = 1/2 (0.85 + 0.6) = 0.725 to the upper bound Ucon = 1 – 2 × 0.15 × 0.4 = 0.88. The corresponding minmax approximation is Acon = 1/2 (0.725 + 0.88) = 0.8025, with largest possible error 1/2 (0.88 – 0.725) = 0.0775. The monotone and concave bounds and approximations always have the same ranking:

0 ≤ Lmon ≤ Lcon = Amon ≤ Acon ≤ Ucon ≤ Umon ≤ 1.

Figure 4 Bounds and approximations for the area under the unknown receiver operating characteristic (ROC) curve for fixed observed sensitivity 0.85. [Plot of Umon, Ucon, Acon, Lcon = Amon, and Lmon against false-positive fraction (1 – specificity).]
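The ranking of the bounds and approximations can be spot-checked on a grid of observed points on or above the chance diagonal. This is only a numerical sanity check under assumed grid spacing, not a proof; `bounds` and `ranking_holds` are hypothetical helper names.

```python
def bounds(se, sp):
    """Return [0, L_mon, L_con (= A_mon), A_con, U_con, U_mon, 1] for se + sp >= 1."""
    l_mon = se * sp
    u_mon = 1 - (1 - se) * (1 - sp)
    l_con = (se + sp) / 2
    if sp < 0.5:
        u_con = 1 - (1 - se) / (2 * sp)
    elif se < 0.5:
        u_con = 1 - (1 - sp) / (2 * se)
    else:
        u_con = 1 - 2 * (1 - se) * (1 - sp)
    a_con = (l_con + u_con) / 2
    return [0.0, l_mon, l_con, a_con, u_con, u_mon, 1.0]


def ranking_holds(se, sp, tol=1e-12):
    """True if the seven values are nondecreasing, up to rounding tolerance."""
    values = bounds(se, sp)
    return all(x <= y + tol for x, y in zip(values, values[1:]))


# Interior grid points i/n, j/n with i + j >= n (on or above the chance diagonal).
n = 20
violations = [
    (i / n, j / n)
    for i in range(1, n)
    for j in range(1, n)
    if i + j >= n and not ranking_holds(i / n, j / n)
]

if __name__ == "__main__":
    print("violations:", violations)
```

Integer indices are used for the diagonal test so that floating-point rounding of Se + Sp does not exclude points that lie exactly on the chance diagonal.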

At the expense of assuming concavity, the concave approximation is higher than the monotone approximation, with tighter bounds. For observed diagnostic accuracy above the chance diagonal, the median increase of the concave approximation compared to the monotone approximation is 0.07 (9%). Figure 4 shows the bounds and approximations for fixed observed sensitivity 0.85, with the observed FPF on the horizontal axis varying from 0 to 1. Larger values of the observed FPF correspond with worse diagnostic performance, so all curves are decreasing. If the FPF is larger than 0.85, then the observed diagnostic performance is worse than tossing a coin, the unknown ROC curve cannot be concave, and the concave approximation cannot be calculated.

DISCUSSION

It is not uncommon that primary diagnostic studies only report test results in 2 categories such as positive and negative, so that only a single combination of sensitivity and specificity can be estimated.10 In this paper, bounds and approximations were derived for the area under the otherwise unknown ROC curve, assuming either monotonicity or concavity. Compared to only assuming monotonicity, assuming that the unknown ROC curve is concave rendered a higher minmax approximation with tighter bounds.

Surprisingly, it turned out that the monotone minmax approximation was equal to the concave lower bound. This may explain the disagreement about the validity of the initial formula by Cantor and Kattan,3 who did not mention concavity and may implicitly also have considered nonconcave ROC curves. In that case, their formula is indeed an unbiased estimate. Desbiens4 explicitly considered only concave ROC curves, in which case the same formula is a lower bound.

The concepts discussed in this paper are closely related to the trapezoidal rule to calculate the area under an ROC curve and the concordance index. Like the concave lower bound, the trapezoidal rule calculates the area under an ROC curve by connecting disjoint points of the ROC curve by straight line segments. In accordance with the results found in this paper, the trapezoidal rule has a reputation for underestimating the area under an ROC curve if the number of disjoint points is limited.2 The concordance index is the probability that the test characteristic of an arbitrary diseased patient is more indicative of disease than the test characteristic of an arbitrary normal patient, with ties in the test characteristic resolved by tossing an unbiased coin.2,11 Without ties, the concordance index is identical to the area under the ROC curve. The solution to use a coin in case of a tie is essentially the same as the trapezoidal rule to connect disjoint points of the ROC curve.

Because the monotone approximation differs from the concave approximation, the question arises about which one should be used. We have argued that, in general, concavity is a reasonable assumption for ROC curves. Therefore, we advise using the concave approximation, unless for some special reason concavity is not a reasonable assumption.
An example of such a special case is when the observed diagnostic performance is below the chance diagonal (Se + Sp < 1). In that case, the diagnostic performance is worse than tossing a coin, the concave approximation cannot be applied, and one needs to resort to the monotone approximation. Another example is when the test characteristic is inevitably binary, such as being pregnant or not. Except for the single available point, the ROC curve is not only unknown but even nonexistent. Therefore, monotonicity and concavity do not apply. For these cases, using the formula of the monotone approximation maintains the link with the concordance index.

Because the derived approximations are minmax approximations, they are in a sense the best possible approximation. Nevertheless, they still can have considerable error. Therefore, they should only be used if it is not possible to obtain additional information on the ROC curve. Additional information can be obtained by considering several cutoff points for the degree of abnormality,1,2 but this may be too laborious or infeasible if one needs to rely on previously published studies. Even then, it may still be possible to estimate the area under the ROC curve, using parametric techniques that incorporate variability among different studies or observations.10,12 The approximations proposed in this paper should only be used if these other techniques fail (i.e., only if it is infeasible to obtain more than 1 combination of the ROC curve or to obtain data from multiple studies or respondents).

REFERENCES

1. Swets JA, Pickett RM. Evaluation of Diagnostic Systems. New York: Academic Press; 1982.
2. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.
3. Cantor SB, Kattan MW. Determining the area under the ROC curve for a binary diagnostic test. Med Decis Making. 2000;20:468–70.
4. Desbiens NA. Area under the ROC curve for a binary diagnostic test. Med Decis Making. 2001;21:421.
5. Cantor SB, Kattan MW. Response to: Area under the ROC curve for a binary diagnostic test. Med Decis Making. 2001;21:422.
6. Tosteson AN, Begg CB. A general regression methodology for ROC curve estimation. Med Decis Making. 1988;8:204–15.
7. Hilden J. The area under the ROC curve and its competitors. Med Decis Making. 1991;11:95–101.
8. Sox HC, Blatt MA, Higgins MC, Marton KI. Medical Decision Making. Boston: Butterworths; 1988.
9. Halpern EJ, Albert M, Krieger AM, Metz CE, Maidment AD. Comparison of receiver operating characteristic curves on the basis of optimal operating points. Statistics for Radiology. 1996;3:245–53.
10. Irwig L, Macaskill P, Glasziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol. 1995;48:119–30.
11. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361–87.
12. Schouw YT, Straatman H, Verbeek AL. ROC curves and the areas under them for dichotomized tests: empirical findings for logistically and normally distributed diagnostic test results. Med Decis Making. 1994;14:374–81.