
Predicting the Pathology Results of Radical Prostatectomy from Preoperative Information: A Validation Study

Robin T. Vollmer, M.D.¹
David W. Keetch, M.D.²
Peter A. Humphrey, M.D., Ph.D.³

¹ Laboratory Medicine, VA Medical Center and Department of Pathology, Duke University Medical Center, Durham, North Carolina.

² Division of Urologic Surgery, Washington University School of Medicine, St. Louis, Missouri.

³ Lauren Ackerman Division of Surgical Pathology, Washington University School of Medicine, St. Louis, Missouri.

BACKGROUND. There are now over 13 published models for predicting the outcomes of radical prostatectomy using preoperative information. Because their ability to predict the pathology of the prostatectomy is key in deciding who benefits the most from this surgery, it is important to know how well these models work for new data.

METHODS. The patients in this study were 100 men diagnosed with prostate carcinoma in the prostate specific antigen (PSA)-based screening program at Washington University Medical Center. To test the models, the authors used preoperative information and the published algorithms to predict postoperative pathology outcomes. Statistical methods included plots of predicted probability against observed probability, boxplots of predicted probability against observed outcomes, logistic regression, and linear regression.

RESULTS. Although none of the published models predicted the outcomes of radical prostatectomy perfectly, those that predicted tumor volume performed best, and in general those that were multivariate also performed best. Nevertheless, the ability of any of these models to discriminate binary outcomes was not very great.

CONCLUSIONS. The results of this study suggest that preoperative variables based on serum PSA and the results of needle biopsies can be used in multivariate models to predict tumor volume, but these models need to be improved. Predicting locally advanced tumor stage is likely to be more difficult and may require information beyond what needle biopsies can provide. Cancer 1998;83:1567–80. © 1998 American Cancer Society.

KEYWORDS: prostate carcinoma, prostatectomy, modeling, pathology.

Address for reprints: Robin T. Vollmer, M.D., Laboratory Medicine (113), VA Medical Center, Durham, NC 27705. Received January 2, 1998; revision received March 26, 1998; accepted March 26, 1998.

Deciding the treatment of patients with newly diagnosed and clinically localized prostate carcinoma remains a troublesome area for men with this disease as well as their clinicians, and deciding who will benefit from prostatectomy is a major issue. Clearly, some of these men die before their prostate carcinoma causes significant problems, and such tumors have been described as “insignificant” or “minimal.”1–3 Others undergo radical prostatectomy only to discover that their tumors have grown beyond the prostate (stage pT3 or greater), and 50% of these men will suffer recurrence. Finally, there are patients with outcomes in between the ones just described, and these are the patients most likely to benefit from radical prostatectomy. The key question is how to use preoperative information to decide who may benefit from prostatectomy, and an important part of this question is how to predict the pathology outcomes of patients with radical prostatectomy.


CANCER October 15, 1998 / Volume 83 / Number 8

In this regard, many have studied how preoperative features, such as prostate specific antigen (PSA), histologic grade, volume of tumor in sextant biopsies, and prostate specific antigen density (PSAD), relate to postoperative pathologic measures made on the radical prostatectomy specimen.1,2,4–18 For example, 15 groups of authors have developed over 16 models to predict a variety of postoperative pathology outcomes,1,2,4–18 including the volume of tumor; volume-related cutpoints, such as “insignificant” versus “significant” tumor; and outcomes related to stage, such as involvement of periprostatic fat, margins, or seminal vesicles. Although such statistically significant correlations between preoperative variables and postoperative results are noteworthy, before using the models it is also crucial to see how well they work with independently identified groups of patients. In this article we report on the validation of 13 of these models for a group of 100 men with newly diagnosed, localized prostate carcinoma who underwent radical prostatectomy.

METHODS

One hundred men consecutively diagnosed with prostate carcinoma in the PSA-based screening program at Washington University Medical Center constituted the cohort in this study. None had a prior history of prostate carcinoma, and none had clinical evidence of prostatitis. Serum PSA was assayed using the Tandem-R method of Hybritech (San Diego, CA) as previously described.19 PSAD was calculated by dividing PSA by prostate volume determined by ultrasonography as previously described.20,21 Sextant biopsies were performed under ultrasound guidance using an 18-gauge needle and biopsy gun. Grade was determined by one of us (P.A.H.) using the Gleason method.22 The extent of carcinoma in the biopsies was quantified by four methods: the number of positive core biopsies, the total length of each tumor in mm, the percentage of cores that contained tumor, and the percentage of each biopsy length with tumor, as previously described.20,21

The radical prostatectomy specimens were completely embedded and completely examined histologically as previously described.20,21 Capsular penetration was defined as tumor in periprostatic connective tissue. Positive surgical margins were those showing tumor in the shave bladder neck tissues, in distal urethral shave tissues, or in the inked edges of the remaining portions of the prostate. Tumor volume was calculated using the SAMBA 4000 image analyzer (Imaging Products International, Chantilly, VA) and the prostate SAM software program (Imaging Products International). The raw data fed into this program consisted of surface areas of tumor traced with a digitizing pad as previously described.21 Table 1 summarizes the characteristics of the patients and their tumors, which we used for the validation data set.

TABLE 1
Data on Patient Characteristics Used for Validation

Characteristics                                                Value
Total no. of patients                                          100
Age (yrs)
  Mean                                                         63.3
  Range                                                        51–76
Distribution of clinical stages (no. of patients)
  T1c                                                          74
  T2a                                                          22
  T2b                                                          3
  T2c                                                          1
Distribution of biopsy Gleason scores (no. of patients)
  3                                                            1
  4                                                            12
  5                                                            32
  6                                                            42
  7                                                            10
  8                                                            3
  >8                                                           0
Preoperative PSA (ng/mL)
  Mean                                                         6.7
  Range                                                        0.4–36.2
Information obtained from radical prostatectomy
  Tumor volume (cc)
    Mean                                                       1.7
    Range                                                      0.01–13.3
  % of patients with positive margins                          35
  % with extraprostatic tumor                                  40
  % with positive seminal vesicles                             6
  % classified as pT3                                          40
  % with positive lymph nodes                                  1

The 13 models tested came from references 1, 2, 4–7, 12, and 15–17. To be included in the analysis, a model had to provide a way to estimate tumor volume in a radical prostatectomy or the probability of pathologic postoperative outcome (i.e., positive predictive value), and these estimates had to come from preoperative variables that we had documented in our data or that we could approximate. Some of the details of how we estimated model variables or how the models were applied to our data are given in the results for specific models. We excluded several published models,3,8,9,10,11,13,14,18 either because we did not have sufficient details to regenerate the models’ algorithms or because they required data that we had not collected. Examples of the positive outcomes we tested included tumor volume as a continuous variable, tumor volume in excess of 0.5 cc, tumor outside the prostate, tumor involving surgical margins, tumor involving seminal vesicles, and any tumor of stage T3.

Validation of Prostate Models/Vollmer et al.


FIGURE 1. The Terris et al. model is represented. Left: The irregular short line is a plot of observed probability of a small tumor (Psmall) versus the Psmall predicted by the model. The plot was made using the lowess function of S-PLUS. For comparison, the long, straight line indicates how perfect prediction would appear. The dots at “0” on the vertical axis indicate observed nonsmall tumors, and the dots at “1” indicate observed small tumors. Right: Boxplots show the model’s predicted Psmall for tumors observed to be <0.5 cc versus those observed to be ≥0.5 cc. The shaded areas indicate the range of predicted Psmall from the 25th to the 75th percentiles, and the horizontal white bar indicates the median. The more distant lines indicate more extreme values of predicted Psmall.

We tested each model by using two graphic techniques and by reanalyzing our data using either logistic regression or a general linear model as well as the authors’ preoperative variables. For each model, we plotted the observed probabilities of an outcome against the probability predicted by the model. To estimate the observed probability, we used the lowess function23,24 of the S-PLUS software (Statsci, Seattle, WA) in a fashion described recently by Kattan et al.25 In these plots, the observed correlation appeared as an irregular, often short line; and for comparison on each plot, a longer, straight line indicated what a perfect model should yield. Second, we used a boxplot to relate each model’s predicted probability of an outcome against its observed occurrence. Models that discriminate outcome well should produce low predicted probabilities when the outcome does not occur, high probabilities when the outcome occurs, and little overlap in between.

Finally, to test a model’s choice of preoperative variables, we refitted each model to our data using the model’s preoperative variables, but with new regression coefficients; then we determined the model likelihood ratio (LR), the percentage of deviance residual explained by the model (PDE), and the individual P values for the variables used. For binary outcomes we used the logistic regression model,26 and for the one continuous variable of tumor volume we used a general linear model.27 Because the LR is the decrease in the deviance residuals caused by the model’s variables, both it and PDE provide continuous measures of the amount of information provided by those variables, and therefore both of these provide ways to rank the models. Although there is no specific cutpoint of LR that proves a good model, in general LR over 20 reflects useful models. All statistical analyses were performed with S-PLUS software (Statsci, Seattle, WA).
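As a rough illustration of how LR and PDE relate to deviance for a binary outcome, the following Python sketch computes both from a list of observed outcomes and fitted probabilities. It is not the S-PLUS code used in the study. The function names and the toy data are hypothetical, and the sketch assumes that the null model is an intercept-only model that predicts the overall prevalence for every patient and that PDE is the LR expressed as a percentage of the null deviance.

```python
import math


def binomial_deviance(outcomes, probabilities):
    """Return -2 * log-likelihood of binary outcomes under the given probabilities."""
    log_likelihood = 0.0
    for y, p in zip(outcomes, probabilities):
        # Clip probabilities so that log(0) cannot occur.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        log_likelihood += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * log_likelihood


def likelihood_ratio_and_pde(outcomes, fitted_probabilities):
    """Return (LR, PDE) relative to an intercept-only (prevalence) model."""
    prevalence = sum(outcomes) / len(outcomes)
    null_deviance = binomial_deviance(outcomes, [prevalence] * len(outcomes))
    model_deviance = binomial_deviance(outcomes, fitted_probabilities)
    lr = null_deviance - model_deviance      # decrease in deviance
    pde = 100.0 * lr / null_deviance         # percentage of deviance explained
    return lr, pde


# Toy data (hypothetical, not from the study): 1 = outcome observed.
toy_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
toy_fitted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_lr, toy_pde = likelihood_ratio_and_pde(toy_outcomes, toy_fitted)
print(f"LR = {toy_lr:.2f}, PDE = {toy_pde:.0f}%")
```

In a real reanalysis the fitted probabilities would come from a maximum likelihood fit of the logistic model; the toy values here are only placeholders for that step.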

RESULTS

Terris et al. Model

In 1992, Terris et al.4 demonstrated a significant correlation between the length of tumor on one biopsy core and the likelihood of a postoperative tumor volume of less than 0.5 cc. Although they did not use multivariate statistical regression, the authors did publish their observed probabilities of a tumor being less than 0.5 cc (Psmall) for several cutpoints of tumor length in the biopsies. We will refer to these as “predicted Psmall” with respect to our data. Because this set of Psmall was primarily designed for those with 1 core biopsy containing tumor, we applied this model to the subset of our data with tumor in just 1 positive core biopsy (60 patients). The results are shown in Figure 1. The left side of the figure shows the observed Psmall on our data versus the predicted values of Psmall from the model. The crooked line is the result, and the straight line shows what ideal prediction would achieve. We see from the plot that the model did not follow the ideal line and tended to underestimate the likelihood of small tumors. On the right are two boxplots of predicted Psmall. The left is for our group of tumors found to be <0.5 cc, and the right is for those tumors ≥0.5 cc. For each boxplot the solid bar indicates the range of predicted Psmall from the 25th to the 75th percentile, and the white bar indicates the median. The plots demonstrate that the model’s predicted Psmall was about the same regardless of whether or not the tumor was small.

FIGURE 2. The Partin et al. model is represented. Top left: The irregular line is the plot of observed probability of tumor outside the prostate (Pout) versus the Pout predicted by the model. For comparison, the long, straight line indicates how perfect prediction would appear. Top right: Boxplots, analogous to the right side of Figure 1, show the model’s predicted Pout for tumors that did not involve the fat and for those that did. Bottom left: The irregular line is the plot of observed versus predicted probability of tumor involving the seminal vesicles (Psv); and for comparison, the long, straight line indicates how perfect prediction would appear. Bottom right: Boxplots show the model’s predicted Psv for those without tumor involving the seminal vesicles versus those with involvement of seminal vesicles.

Partin et al. Model

In 1993, Partin et al.5 presented a model based on the preoperative variables of PSA, clinical stage, and Gleason score, and demonstrated how it could predict postoperative outcomes of capsular penetration, involvement of seminal vesicles, and involvement of lymph nodes. In 1997,6 they updated the model and used an extensive database involving 4133 patients from three large institutions. They published tables of the probabilities expected for each outcome and for common combinations of the three preoperative variables. From these tables we were able to determine their models’ predicted probabilities and apply them to our data. We confined our validation to the two outcomes of tumor involvement of extraprostatic tissues and tumor involvement of seminal vesicles. Although just six of our patients had tumor in the seminal vesicles, our results did provide a limited test of the Partin et al. model for this outcome. For simplicity, we refer to the probability of extraprostatic tissue as “Pout” and the probability of involvement of seminal vesicles as “Psv.”

Figure 2 shows a composite of four graphs that give the results and are analogous to Figure 1. The left upper figure shows that the Partin et al. model tended to underestimate the probability of tumor outside the prostate, because the predicted Pout was lower than the observed Pout. The boxplots on the right show the predicted Pout was about the same regardless of whether the tumor involved extraprostatic tissue. The bottom left plot shows that predicted Psv was close to the ideal line, but once again the bottom right boxplots show that predicted Psv was about the same for those with and those without tumor in the seminal vesicles.

Ackerman et al. Model

In 1993, Ackerman et al.7 developed a logistic regression model for the postoperative outcome associated with a positive margin. The authors used the number of positive core biopsies and PSA density (PSAD) as preoperative variables, and they published the final predictive model as a logistic regression model for the probability of a tumor involving the margin (Pmarg). This algorithm can be concisely written as a set of three formulas:

$$\mathrm{Pmarg} = \frac{1}{1 + \exp(-E)} \tag{1}$$

with E defined as:

$$E = -2.18 + 1.43 \cdot S + 1.42 \cdot \mathrm{PSAD} \tag{2}$$

and S defined as:

$$S = \begin{cases} 0 & \text{if the number of positive cores} = 1 \\ 1 & \text{if the number of positive cores} > 1 \end{cases} \tag{3}$$

FIGURE 3. The Ackerman et al. model is represented. Left: The short, irregular line is the plot of observed probability of tumor in margin (Pmarg) versus the Pmarg predicted by the model. For comparison, the long, straight line indicates how perfect prediction would appear. Right: Boxplots show the model’s predicted Pmarg for tumors that did not involve the margin versus those that did.
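The three formulas for Pmarg can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients are read from Equations 1 to 3 with the intercept taken as -2.18 and both slopes as positive, and the function name and the example PSAD value of 0.1 are hypothetical rather than taken from the study data.

```python
import math


def ackerman_pmarg(n_positive_cores, psad):
    """Predicted probability of a positive margin (Equations 1 to 3).

    Assumes the intercept is -2.18 and the slopes are +1.43 and +1.42.
    """
    if n_positive_cores < 1:
        raise ValueError("the model applies to patients with at least 1 positive core")
    s = 0 if n_positive_cores == 1 else 1       # Equation 3
    e = -2.18 + 1.43 * s + 1.42 * psad          # Equation 2
    return 1.0 / (1.0 + math.exp(-e))           # Equation 1


# Hypothetical patients with PSAD = 0.1 and either 1 or 3 positive cores.
pmarg_one_core = ackerman_pmarg(1, 0.1)
pmarg_three_cores = ackerman_pmarg(3, 0.1)
print(round(pmarg_one_core, 2), round(pmarg_three_cores, 2))
```

With these illustrative inputs the predicted Pmarg is roughly 0.12 for one positive core and roughly 0.35 for more than one, which shows how strongly the binary variable S shifts the prediction relative to PSAD.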

Using equations 1–3, we calculated the expected Pmarg for our data, and Figure 3 shows the results. The left plot shows that the predicted Pmarg was not far from the idealized line, although it underpredicted the likelihood of a positive margin. The right boxplots show that, although the median predicted Pmarg (white horizontal bars) was higher for those with positive margins than for those without positive margins (0.36 vs. 0.13), there was considerable overlap of predicted Pmarg between these two groups.

Modified Epstein et al. Model

In 1994, Epstein et al. published two articles1,2 describing an algorithm designed for tumors of stage T1c. It used two logical rules for categorizing preoperative variables and one logical rule for categorizing tumors based on postoperative findings, and the authors included a table of observed probabilities of the postoperative categories that could be determined, at least empirically, from the preoperative variables. The logistic regression model was used to optimize the algorithm, but the coefficients of the model were not published. If we denote the presence of a binary feature as “1” and its absence as “0,” we can summarize the authors’ rules with the following formulas. First, they categorized biopsies as follows:

$$\mathrm{bad} = \begin{cases} 1 & \text{if (any Gleason 4 or 5)} \\ 1 & \text{if (3 cores positive)} \\ 1 & \text{if (>50\% carcinoma on any core)} \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Next, they categorized preoperative variables as follows:

$$\mathrm{preop} = \begin{cases} 0 & \text{if (PSAD} < 0.1 \text{ and bad} = 0) \\ 0 & \text{if (PSAD} \le 0.15 \text{ and carcinoma} < 3 \text{ mm on positive cores)} \\ 1 & \text{otherwise} \end{cases} \tag{5}$$

Finally, the authors categorized tumors as “insignificant,” “minimal,” “moderate,” or “advanced,” according to rules based largely on the postoperative tumor volume and pathologic stage. These can be written as follows:

$$\mathrm{insignificant} = \begin{cases} 1 & \text{if (volume} < 0.2 \text{ cc) and (combined grade} < 7) \text{ and (pathologic stage} < \text{pT3) and (margins negative)} \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

$$\mathrm{minimal} = \begin{cases} 1 & \text{if (} 0.2 \le \text{vol} < 0.5 \text{ cc) and (combined grade} < 7) \text{ and (pathologic stage} < \text{pT3) and (margins negative)} \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

$$\mathrm{moderate} = \begin{cases} 1 & \text{if (volume} > 0.5 \text{ cc) and (pathologic stage} < \text{pT3) and (margins negative)} \\ 1 & \text{if (focal extraprostatic tumor) and (seminal vesicles negative) and (combined grade} < 7) \\ 1 & \text{if (extraprostatic tumor) and (seminal vesicles negative) and (combined grade} < 7) \text{ and (margins negative)} \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

$$\mathrm{advanced} = \begin{cases} 1 & \text{if (extraprostatic tumor) and (combined grade} > 6) \\ 1 & \text{if (extraprostatic tumor) and (margins positive)} \\ 1 & \text{if (seminal vesicles positive)} \\ 1 & \text{if (lymph nodes positive)} \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

FIGURE 4. A modification of the Epstein et al. model is represented. Top left: The short line (“Observed”) is the plot of observed probability of insignificant tumor (Pinsig) versus the Pinsig predicted by the model. For comparison, the longer, straight line indicates how perfect prediction would appear. Top right: Boxplots show the model’s predicted Pinsig for tumors observed to be either significant (“No”) or insignificant (“Yes”). Bottom left: The short line (“Observed”) is the plot of observed versus predicted probability of small tumor (Psmall); and for comparison, the longer, straight line indicates how perfect prediction would appear. Bottom right: Boxplots show the model’s predicted Psmall obtained for tumors found to be either large (“No”) or small (“Yes”).
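The logical rules of Equations 5 to 7 translate directly into boolean tests. The Python sketch below is a hypothetical illustration rather than the authors’ implementation: the function and argument names are invented, the value of “bad” from Equation 4 is passed in rather than derived, and pathologic stage and margin status are reduced to booleans.

```python
def epstein_preop(psad, bad, tumor_mm):
    """Equation 5: return 0 for a favorable preoperative profile, else 1."""
    if psad < 0.1 and bad == 0:
        return 0
    if psad <= 0.15 and tumor_mm < 3:
        return 0
    return 1


def postoperative_category(volume_cc, combined_grade, stage_below_pt3, margins_negative):
    """Equations 6 and 7: label a tumor 'insignificant', 'minimal', or 'other'."""
    favorable = combined_grade < 7 and stage_below_pt3 and margins_negative
    if favorable and volume_cc < 0.2:
        return "insignificant"
    if favorable and 0.2 <= volume_cc < 0.5:
        return "minimal"
    return "other"   # moderate or advanced; Equations 8 and 9 are not coded here


def is_small(volume_cc, combined_grade, stage_below_pt3, margins_negative):
    """A tumor counts as 'small' if it is either insignificant or minimal."""
    category = postoperative_category(
        volume_cc, combined_grade, stage_below_pt3, margins_negative
    )
    return category in ("insignificant", "minimal")


# Hypothetical patient: PSAD 0.08, no adverse biopsy feature, 2 mm of tumor,
# and a 0.3 cc organ-confined tumor of combined grade 6 with negative margins.
example_preop = epstein_preop(psad=0.08, bad=0, tumor_mm=2.0)
example_category = postoperative_category(0.3, 6, True, True)
print(example_preop, example_category)
```

Keeping the preoperative rule and the postoperative category in separate functions mirrors the structure of the algorithm: the first is the predictor, and the second is the outcome it is meant to predict.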

Although the authors confined this algorithm to tumors of clinical stage T1c, we decided to apply it to our data, which also included patients with disease of clinical stages T2a–T2c. If the algorithm were successful for a broader group of tumors, then certainly that result would be of great interest. To use the model on our data, we had to modify the original algorithm in two ways. We substituted the average percentage of cancer per positive core for the threshold of “50%” in Equation 4, and we substituted the average length of tumor per positive core for the threshold “3 mm” in Equation 5. We made these changes because we had recorded the length of cancer and the percentage of cancer collectively for the sextant biopsies rather than separately for each biopsy core. On the other hand, this modification extended the model to include cases with fragmented tumor cores, which in our experience are common phenomena.

The outcomes we examined with the modified model were the tumor categories “insignificant” and “minimal,” because a binary “preop” variable could not produce an algorithm for accurately categorizing tumors into four types. Thus, we classed tumors as “insignificant” according to the rule of Equation 6, and we classed tumors as “small” if they were either insignificant or minimal according to Equations 6 and 7. In our data, 27 patients had small tumors and 73 had nonsmall tumors. The results appear in Figure 4. The left upper plot shows that, when the model’s predicted probability of insignificant tumors (Pinsig) was high, the observed Pinsig was low, and this is why the short line fell below the line of ideal prediction. For example, the model predicted a Pinsig of 0.66 for tumors with “preop = 0,” but in our data the observed probability was just 0.29. The right upper boxplots show that, although the median Pinsig was higher in truly insignificant tumors than in tumors that were not insignificant, there was considerable overlap of predicted Pinsig. The lower plots demonstrate similar results for the probability of a “small” tumor (Psmall). When the model predicted a Psmall of 0.79 for tumors with “preop = 0,” the observed Psmall was just 0.47.

FIGURE 5. The Peller et al. model is represented. Top left: The short, irregular line provides the plot of observed probability of tumor volume greater than 0.5 cc, or P(cc >0.5), versus the P(cc >0.5) predicted by the model. For comparison, the long, straight line indicates how perfect prediction would appear. Top right: Boxplots show the model’s predicted P(cc >0.5) obtained for tumors found to be less than or equal to 0.5 cc versus greater than 0.5 cc. Bottom left: The long, curved line is the plot of observed versus predicted probability of tumor volume greater than 4 cc, or P(cc >4), and the long, straight line indicates how perfect prediction would appear. Bottom right: Boxplots show the model’s predicted P(cc >4) obtained for tumors found to be less than or equal to 4 cc versus those found to be greater than 4 cc.

Peller et al. Models

In 1995, Peller et al.12 used the logistic regression model to relate three preoperative variables (the number of positive core biopsies [npos], Gleason grade, and serum PSA) to three postoperative outcomes (tumor volume greater than 0.5 cc, tumor volume greater than 4.0 cc, and the presence of stage pT3 disease). They found that the best model for predicting tumors >0.5 cc used the variables npos and PSA, that the best model for predicting tumors >4 cc used npos alone, and that the best model for predicting pT3 used npos and Gleason grade. Although the authors did not publish the coefficients of the logistic models, they did publish tables containing the probabilities of these three outcomes, which we symbolize respectively as P(cc >0.5), P(cc >4), and P(T3). These, in turn, allowed us to estimate those coefficients and calculate the predicted probabilities for our data.

The results appear in Figures 5 and 6. The left upper plot of Figure 5 shows that the predicted P(cc >0.5) follows the ideal line closely, and the upper right boxplots show that the predicted P(cc >0.5) was generally over 0.6 for those tumors that truly exceeded 0.5 cc. This boxplot also shows that the predicted P(cc >0.5) for tumors that were less than 0.5 cc overlapped considerably with these values. The lower left plot shows once again that the model’s predicted P(cc >4) follows the ideal line closely, and the lower right boxplots demonstrate better discrimination between tumors truly ≤4 cc and those >4 cc. Figure 6 shows that the Peller et al. model does less well for predicting disease of pathologic stage pT3.

FIGURE 6. The Peller et al. model is continued. Left: The irregular line is the plot of observed versus predicted probability of stage pT3, or P(T3). For comparison, the long, straight line indicates how perfect prediction would appear. Right: Boxplots show the model’s predicted P(T3) obtained for tumors found to be less than stage T3 versus those at stage T3.

Ravery et al. Model

In 1996, Ravery et al.15 related several preoperative variables to the postoperative findings of capsular penetration, positive surgical margins, pathologic stage, and progression of disease. Although the authors did not use a multivariate statistical model, they performed frequency analyses on several subsets of their data and concluded that the preoperative variable providing the “most valid threshold” for predicting these postoperative outcomes was the percentage of biopsy with carcinoma (pca). The cutpoint they favored for pca was 10%, and they found that this cutpoint was significantly associated with tumor involvement of the extraprostatic tissue and seminal vesicles. They also provided the probabilities for these two outcomes based on pca ≤10 versus pca >10, meaning that we could use these as predicted probabilities to determine how well the authors’ model fit our validation data set. Figure 7 shows the results. The top left plot shows that the predicted probability of tumor outside the prostate (Pout) was close to the observed Pout, but the right upper boxplot shows that predicted Pout did not serve well as a discriminator between truly extraprostatic tumors and those within the prostate. The results for predicted probability of the involvement of seminal vesicles (Psv) were similar.

FIGURE 7. The Ravery et al. model is represented. Top left: The short upper line is the plot of observed probability of tumor outside the prostate (Pout) versus the Pout predicted by the model. For comparison, the straight line indicates how perfect prediction would appear. Top right: Boxplots show the model’s predicted Pout obtained for tumors found to be inside the capsule versus outside the capsule. Bottom left: The short lower line is the plot of observed versus predicted probability of tumor involving seminal vesicles (Psv), and the long, straight line shows the ideal of perfect prediction. Bottom right: Boxplots show the model’s predicted Psv obtained for tumors found to spare or involve the seminal vesicles.

Goto et al. Model

In 1996, Goto et al.16 developed a model for predicting the likelihood of an “insignificant” tumor, which they defined as one with a volume less than or equal to 0.5 cc, a Gleason grade between 1 and 3, and confinement to the prostate. Although they used the logistic regression model to show the likelihood that insignificant tumor was associated with both maximum length of tumor in the biopsy and PSAD, they rejected this model because it “did not have a clinically good sensitivity and specificity combination.” Instead they developed a logical rule much like that of Epstein et al.,1,2 and in an analogous fashion this rule can be written as follows:

$$\mathrm{preop} = \begin{cases} 0 & \text{if (PSAD} < 0.1 \text{ and mm} \le 2 \text{ and Gleason score} < 7) \\ 1 & \text{otherwise} \end{cases} \tag{12}$$

They reported that the probability of insignificant tumor (Pinsig) was 0.75 if preop = 0 and that otherwise it was 0.05. Applying this model to our data produced the results shown in Figure 8. On the left, the plot of observed versus predicted values of Pinsig fell close to the ideal line, although the model underpredicted the observed Pinsig. The boxplots on the right show that the model predicted the same range and median values (lower bracketed line) of Pinsig for tumors that were truly insignificant as well as for those that were not. This was because the preoperative variable was “0” for just 6 of the 28 tumors found to be insignificant by the authors’ definition (sensitivity = 21%).

FIGURE 8. The Goto et al. model is represented. Left: The upper short, straight line (“Observed”) is the plot of observed versus predicted probability of insignificant tumor (Pinsig). For comparison, the longer, straight line indicates how perfect prediction would appear. Right: Boxplots show the model’s predicted Pinsig obtained for tumors found not to be insignificant (“No”) versus those that were insignificant (“Yes”).
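The rule of Equation 12, the two reported values of Pinsig, and the sensitivity quoted for this model can be reproduced with a short Python sketch. The function names are hypothetical, and the rule is read as PSAD < 0.1, tumor length of at most 2 mm, and Gleason score < 7; only the figures 0.75, 0.05, and 6 of 28 come from the text.

```python
def goto_preop(psad, tumor_mm, gleason_score):
    """Equation 12: return 0 when all three favorable conditions hold, else 1."""
    if psad < 0.1 and tumor_mm <= 2 and gleason_score < 7:
        return 0
    return 1


def goto_pinsig(preop):
    """Reported probability of insignificant tumor for each value of preop."""
    return 0.75 if preop == 0 else 0.05


def sensitivity(true_positives, all_positives):
    """Fraction of truly positive cases that the rule flags as positive."""
    return true_positives / all_positives


# Hypothetical patient with PSAD 0.09, 1.5 mm of tumor, and Gleason score 6.
example_pinsig = goto_pinsig(goto_preop(0.09, 1.5, 6))

# In the validation data, preop was 0 for 6 of the 28 insignificant tumors.
observed_sensitivity = sensitivity(6, 28)
print(example_pinsig, f"{observed_sensitivity:.0%}")
```

Because the rule yields only two possible predicted probabilities, the boxplots for the two outcome groups can differ only in how many patients fall at each value, which is why a low sensitivity produces nearly identical boxes.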

Modified D’Amico et al. Model for Tumor Volume

In 1997, D’Amico et al.17 developed a model for estimating tumor volume in the prostatectomy specimen from three preoperative variables: Gleason grade, PSA, and volume of the prostate gland as determined by ultrasound. Executing their algorithm required assumptions of an epithelial fraction of 0.2, a PSA “leak” of benign epithelium of 0.33, and a tumor-associated PSA “leak” that was an exponential function of the average Gleason grade. To obtain the average grade, the authors recorded the grade and the number of positive biopsies separately for the right and left sides of the prostate and then used a weighted average, that is, an average weighted by the number of cores that contained tumor. We resorted to a modified version of this model, because in our validation data we did not measure the Gleason score separately for right and left sets of biopsies and because we did not record the amount of each grade. In our modification, we estimated that, on average, the first Gleason grade comprised roughly 60% of the pattern and the second 40%, so we approximated the average Gleason score as follows:

$$\text{Average Gleason score} = 0.6 \times \text{major grade} + 0.4 \times \text{minor grade}$$

This modification follows the spirit of using an average score; and because many are unlikely to record the detail of grading score that the authors originally defined, it produces a model that may be more generally applicable. Thus, with this modification we then used their formulas to estimate tumor volume from preoperative variables, and we compared this estimate to the observed tumor volume we found in the prostatectomy specimen.

Figure 9 shows the results. On the left is a plot of the observed tumor volume versus what the modified D’Amico et al. model predicted. We see that, for tumors over 2 cc, there was a significant scatter of observations away from the ideal line, which marked perfect agreement. Nevertheless, the median error was 0.31 cc, and for 50% of the tumors the error fell between -0.25 cc and 1.5 cc. The right plot shows that error (the difference between the observed and predicted tumor volume) increased with increasing tumor volume. For tumors less than 2 cc the error was small, whereas for tumors over 2 cc the model tended to underestimate the observed tumor volume.

FIGURE 9. A modification of the D’Amico et al. model is represented. Left: This plot shows observed tumor volume in cc versus tumor volume predicted by the model. The long, straight line shows what perfect prediction would achieve, and the dots indicate the actual correlation observed. Right: This plot shows the difference between observed and predicted tumor volume (i.e., the error) in cc versus observed tumor volume.
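The averaging approximation used in the modified model, and the definition of error as observed minus predicted volume, can be illustrated with a brief Python sketch. The function names and the example values are hypothetical; only the 60%/40% weighting and the sign convention of the error come from the text, and the remainder of the D’Amico et al. volume formula is not reproduced here.

```python
def approximate_average_gleason(major_grade, minor_grade, major_weight=0.6):
    """Weighted average of the major and minor Gleason grades (60% / 40%)."""
    return major_weight * major_grade + (1.0 - major_weight) * minor_grade


def prediction_error(observed_cc, predicted_cc):
    """Error as defined in the text: observed minus predicted tumor volume."""
    return observed_cc - predicted_cc


# Hypothetical biopsy graded 3 + 4: the approximation gives about 3.4.
example_average = approximate_average_gleason(3, 4)

# Hypothetical tumor: a positive error means the model underestimated the volume.
example_error = prediction_error(observed_cc=3.0, predicted_cc=1.8)
print(round(example_average, 2), round(example_error, 2))
```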

Regression Analyses of Prostatectomy Outcome

Table 2 shows the results of a number of regression analyses of our data that used the preoperative variables from these 13 models. Each analysis pairs an outcome with one or more preoperative variables, and each gives an overall likelihood ratio (LR) for the result as well as the percentage of deviance residual explained (PDE) by the model. Although none of the models explained much of the deviance (PDE ranged from 2 to 30), the results did allow us to rank the models. Those with the highest LR and PDE were the best in the sense that they showed a significant association between preoperative and prostatectomy variables. For example, both the modified D’Amico et al. model and the Peller et al. model provided good information for predicting tumor volume. The results demonstrate that the LR and PDE are higher for outcomes related to tumor volume than for outcomes related to tumor outside the prostate. For example, for outcomes related to tumor volume, the LR ranged from 9.6 to 155.5, and PDE ranged from 9 to 30. By contrast, for outcomes of tumor beyond the prostate, LR ranged from 0.7 to 6.9, and PDE ranged from 2 to 9. Thus, these models work better for estimating volume than for estimating local stage.

Table 2 also demonstrates which preoperative variables showed promise as predictors. Those with lower P values were more strongly associated with prostatectomy findings. For example, as univariate factors, tumor length and number of positive cores appeared closely related to tumor volume. On the other hand, the variables that combined several preoperative factors performed better, and these included the modified D’Amico et al. estimated volume and the combined variables of Equations 5 and 12.
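For readers who want to relate LR and PDE to deviances, the following is a minimal sketch for a binary outcome. It assumes the conventional definitions, which the text does not spell out: LR is the drop in deviance from an intercept-only model to the fitted model, and PDE is that drop expressed as a percentage of the intercept-only deviance. The function names, outcomes, and probabilities are hypothetical and are not the study data.

```python
import math


def binomial_deviance(outcomes, probabilities):
    """Return -2 * log-likelihood for binary outcomes and predicted probabilities."""
    total = 0.0
    for y, p in zip(outcomes, probabilities):
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total


def lr_and_pde(outcomes, probabilities):
    """Likelihood ratio statistic and percentage of deviance explained.

    Assumes the probabilities come from a fitted model and lie strictly
    between 0 and 1, and that both outcome classes are present.
    """
    base_rate = sum(outcomes) / len(outcomes)
    null_dev = binomial_deviance(outcomes, [base_rate] * len(outcomes))
    model_dev = binomial_deviance(outcomes, probabilities)
    lr = null_dev - model_dev
    pde = 100.0 * lr / null_dev
    return lr, pde


# Hypothetical outcomes (1 = outcome present) and model probabilities.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]
lr, pde = lr_and_pde(y, p)
print(f"LR = {lr:.2f}, PDE = {pde:.1f}")
```

Under these definitions a model that predicts the overall outcome rate for every patient has LR = 0 and PDE = 0, which is why low values in Table 2 indicate weak association.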

The Importance of a Multivariate Approach

To illustrate further the importance of combining preoperative variables, we constructed five general linear models for predicting tumor volume in the prostatectomy specimen. The first used no preoperative variables; the second used tumor length; and the third through fifth added, consecutively, the number of positive cores, PSA, and Gleason Grade 4 or 5. Figure 10 shows a bar plot of the residual error for each of these five models, and it demonstrates that, as the number of preoperative variables increased from 0 to 4, the error dropped; that is, the fit of the model improved. Figure 11 illustrates this further by showing that the observed and predicted tumor volumes came closer to the ideal line as the number of preoperative variables increased from 1 to 4.
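The nested-model comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the data are simulated, the coefficients used to simulate tumor volume are arbitrary, and ordinary least squares with an intercept stands in for the general linear model.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def residual_sum_of_squares(predictors, response):
    """Fit least squares with an intercept and return the residual sum of squares."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, response))


# Simulated cases (hypothetical values, not the study's patients).
random.seed(0)
cases, volumes = [], []
for _ in range(100):
    length = random.uniform(0.5, 20.0)   # tumor length in cores (mm)
    cores = random.randint(1, 6)         # number of positive cores
    psa = random.uniform(2.0, 20.0)      # serum PSA
    high_grade = random.randint(0, 1)    # 1 if Gleason Grade 4 or 5 present
    cases.append([length, cores, psa, high_grade])
    volumes.append(0.08 * length + 0.3 * cores + 0.05 * psa
                   + 0.5 * high_grade + random.gauss(0.0, 0.8))

# Models with 0, 1, 2, 3, and 4 preoperative variables, added in order.
rss_by_model = [residual_sum_of_squares([c[:k] for c in cases], volumes)
                for k in range(5)]
for k, rss in enumerate(rss_by_model):
    print(f"{k} variables: residual sum of squares = {rss:.2f}")
```

For nested least-squares models the residual error on the fitting data cannot increase when a variable is added, so a falling bar plot by itself shows improved fit, not improved prediction on new patients.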

Validation of Prostate Models/Vollmer et al.


TABLE 2
Reanalysis of Postoperative Outcomes with the Validation Data Using Suggested Preoperative Variables^a

Postoperative outcome           LR      PDE   Preoperative variables      P value

Terris et al. model
  Small tumor                   9.6     12    Tumor length                0.0019
Partin et al. models
  Extraprostatic tumor          2.5     2     Gleason score               0.30
                                              Clinical stage              0.78
                                              PSA categories              0.26
  Tumor in seminal vesicles     4.0     9     Gleason score               0.04
                                              Clinical stage              0.82
                                              PSA categories              0.67
Ackerman et al. model
  Tumor in margin               6.2     5     Pos. cores                  0.033
                                              PSA density                 0.19
Modified Epstein et al. model
  Small tumor                   12.7    11    Combined^b (Equation 5)     0.00036
Peller et al. model
  Small tumor                   14.3    11    Pos. cores                  0.0016
                                              PSA categories              0.22
  Large tumor                   15.7    28    Pos. cores                  7 × 10^-5
  pT3 classification            6.9     5     Pos. cores                  0.01
                                              Gleason score               0.55
Ravery et al. model
  Extraprostatic tumor          3.1     2     % tumor in cores            0.08
  Tumor in seminal vesicles     0.7     2     % tumor in cores            0.39
Goto model
  Insignificant tumor           11.1    9     Combined^b (Equation 12)    0.00087
Modified D’Amico model
  Observed tumor volume         155.5   30    Estimated volume^c          2.9 × 10^-9
  pT3 classification            6.2     5     Estimated volume^c          0.013

LR: overall model likelihood ratio; PDE: percentage of deviance residual explained by the model; Pos. cores: no. of positive core biopsies; PSA: prostate specific antigen.
a For every discrete outcome (all except tumor volume), the logistic regression model was used. For tumor volume, the general linear model was used.
b “Combined” implies that several preoperative variables are used in a logical rule to yield an overall variable.
c “Estimated volume” is the estimate of tumor volume calculated by the modified D’Amico algorithm.

FIGURE 10. This barplot shows the residual error after modeling versus the number of preoperative variables used in modeling. The error is that remaining after the general linear model is used to relate postoperative tumor volume to the following preoperative variables: tumor length (mm), number of positive cores, prostate specific antigen, and Gleason Grade 4 or 5. The “None” column gives the error when no preoperative variables are used, and the “One,” “Two,” “Three,” and “Four” columns give the error after modeling using, respectively, the first (tumor length), first two, first three, and all four preoperative variables.

Deterioration of a Model with Validation

In validation it is natural to find that a model’s performance will deteriorate when it is applied to new data. To illustrate this, we divided our cases into two groups: the “training set” and the “test set.” With the “training set” we used logistic regression to relate the outcome of stage ≥pT3 to PSA density (PSAD) and tumor length (< vs. ≥2.5 mm). Both variables were significantly associated with stage ≥pT3, and their individual P values were 0.0014 and 0.01, respectively. The overall LR was 16.9, and the percentage of deviance explained was 25. The top left plot of Figure 12 shows that the model’s observed and predicted P(T3) were close to the ideal line, and the top right plot of Figure 12 demonstrates that this model discriminated reasonably well between those tumors observed to be less than stage T3 and those greater than or equal to stage T3. However, when we applied this model to the “test set,” it did not perform as well, and the bottom plots of Figure 12 show these results. The observed and predicted P(T3) fell farther from the ideal line, and there was, in fact, little discrimination between those without or with stage pT3. Finally, when we redid the logistic model on our test set, we found that neither PSAD nor tumor length was significant (P > 0.2).
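The training-set and test-set comparison in this section can be sketched in code. The example below is a hedged illustration only: the 100 cases are simulated, the coefficients used to simulate outcomes are arbitrary, the 50/50 split is an assumption, and the small Newton-Raphson fitter stands in for whatever statistical software was actually used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, row):
    """Predicted probability for one case; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))


def deviance(beta, rows, outcomes):
    """Binomial deviance (-2 * log-likelihood) of the model on a data set."""
    total = 0.0
    for row, y in zip(rows, outcomes):
        p = min(max(predict(beta, row), 1e-12), 1.0 - 1e-12)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def null_model(outcomes, n_predictors):
    """Intercept-only model: the logit of the observed outcome rate."""
    rate = sum(outcomes) / len(outcomes)
    return [math.log(rate / (1.0 - rate))] + [0.0] * n_predictors


def fit_logistic(rows, outcomes, iterations=25):
    """Newton-Raphson with step halving, started from the intercept-only model."""
    beta = null_model(outcomes, len(rows[0]))
    k = len(beta)
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, y in zip(rows, outcomes):
            x = [1.0] + list(row)
            p = predict(beta, row)
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        step = solve(hess, grad)
        current = deviance(beta, rows, outcomes)
        scale = 1.0
        while scale > 1e-6:
            trial = [b + scale * s for b, s in zip(beta, step)]
            if deviance(trial, rows, outcomes) <= current:
                beta = trial
                break
            scale /= 2.0
    return beta


# Hypothetical data: 100 simulated cases, not the study's patients.
random.seed(1)
rows, outcomes = [], []
for _ in range(100):
    psad = random.uniform(0.05, 0.5)     # simulated PSA density
    long_tumor = random.randint(0, 1)    # 1 if tumor length >= 2.5 mm
    p_true = sigmoid(-2.0 + 4.0 * psad + 1.0 * long_tumor)
    rows.append([psad, long_tumor])
    outcomes.append(1 if random.random() < p_true else 0)

train_x, train_y = rows[:50], outcomes[:50]
test_x, test_y = rows[50:], outcomes[50:]
beta = fit_logistic(train_x, train_y)
null_beta = null_model(train_y, 2)

for name, xs, ys in (("training", train_x, train_y), ("test", test_x, test_y)):
    explained = deviance(null_beta, xs, ys) - deviance(beta, xs, ys)
    print(f"{name}: deviance explained relative to the training null = {explained:.2f}")
```

On the training half the fitted model can never do worse than the intercept-only model, because the fit starts there and only accepts steps that lower the deviance. No such guarantee holds for the test half, which is the point of validating on separate cases.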

FIGURE 11. Four plots show observed tumor volume in cc versus that predicted by a general linear model using one, two, three, or all four of the preoperative variables cited in Figure 10. For comparison, the long, straight lines indicate how perfect prediction would appear.

FIGURE 12. This is a logistic regression model of the outcome of stage pT3 disease in our data. Top left: The curved line is the plot of observed probability in the training data set of tumor greater than or equal to stage pT3, or P(T3), versus the P(T3) predicted by the model. For comparison, the long, straight line shows how perfect prediction would appear. Top right: Boxplots show the model’s predicted P(T3) obtained for tumors found in the training data set to be stage <T3 versus those ≥T3. Bottom left: The curved line is the plot of observed probability in the test data set of tumor greater than or equal to stage pT3, or P(T3), versus the P(T3) predicted by the model. For comparison, the long, straight line indicates perfect prediction. Bottom right: Boxplots show the model’s predicted P(T3) obtained for tumors found in the test data set to be stage <T3 versus those ≥T3.

DISCUSSION

Although the number of test cases we used was limited to 100 and their extent of disease was also limited, we believe that the results are valid and suggest several important generalizations. First, measurements derived from the prostate will correlate best with outcomes confined to the prostate. This is simple logic, and our results support it by demonstrating that these models perform better in predicting tumor volume than in predicting tumor outside of the prostate. In this regard, our results are in agreement with those of Kattan et al.,25 who demonstrated a progressive decline in the power of the Partin et al. model for predicting outcomes at increasing distance from the prostate, namely, extraprostatic tumor, seminal vesicle involvement, and lymph node involvement. Together, these two studies suggest that, although we may be able to improve our ability to estimate tumor volume by using serum PSA and multiple variables obtained from needle biopsies of the prostate, our best collection of such variables will probably not predict local tumor stage as well. To estimate tumor stage more accurately, we may be required to sample sites outside the prostate by either imaging or biopsy, including fine-needle aspiration. For example, endorectal coil magnetic resonance imaging has shown promise in detecting tumor outside the prostate or tumor in the seminal vesicles.28 Molecular analyses of small biopsies may help, and they include reverse transcriptase polymerase chain reaction for mRNA associated with either PSA29 or PSM.30

For estimating tumor volume, the best predictive model will undoubtedly be a multivariate one. For the moment, the amount of residual error with models using four or fewer key predictive variables is not likely to be acceptable, and to improve on this will require additional variables or samples. Because each core biopsy is just a sample of the prostate, which can be viewed as an assembly of potential core biopsies numbering from 300 to 500, the simplest way to increase information may be to increase the number of biopsies. By taking 6 biopsies, we sample 1–2% of the volume of the prostate, so that by sampling more our estimate of tumor volume should improve. Additional variables could also help, and these include percent free PSA,31 microvessel density,18 and molecular markers such as p53 and bcl-2.32,33

Finally, our results show how important it is to validate published models. Whereas the development of good predictive models requires good fits of the data and low P values, such fits and P values are not sufficient and do not automatically apply to new data. For a model to be useful, it should retain significant predictive ability on new data; that is, it should validate. Authors of models can assist in this process by publishing the complete details of their models. For example, they should provide the formulas and coefficients for any regression models used. Although it is ideal if the original authors have sufficient data to validate their own model with an independent test set of patients, in practice this is seldom done. To do so requires either more patients or more time to gather the patients, so that it mostly falls to others to test and validate previously published models. This is a necessary step that deserves more emphasis.
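The sampling argument in the Discussion (six cores taken from a gland viewed as 300 to 500 potential cores) reduces to simple arithmetic, sketched below. The function name is illustrative, and the 12-core row is a hypothetical comparison that does not come from the study.

```python
def sampled_fraction(cores_taken: int, potential_cores: int) -> float:
    """Fraction of the gland sampled if it is viewed as equal-sized potential cores."""
    return cores_taken / potential_cores


for potential in (300, 500):
    for taken in (6, 12):  # 12 cores is a hypothetical comparison
        percent = 100 * sampled_fraction(taken, potential)
        print(f"{taken} cores of {potential} potential cores: {percent:.1f}%")
```

Six cores correspond to 2.0% of a 300-core gland and 1.2% of a 500-core gland, which matches the 1–2% quoted in the text.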

REFERENCES

1. Epstein JI, Walsh PC, Brendler CB. Radical prostatectomy for impalpable prostate cancer: the Johns Hopkins experience with tumors found on transurethral resection (stages T1A and T1B) and on needle biopsy (stage T1C). J Urol 1994;152:1721–9.
2. Epstein JI, Walsh PC, Carmichael M, Brendler CB. Pathologic and clinical findings to predict tumor extent of nonpalpable (stage T1c) prostate cancer. JAMA 1994;271:368–74.
3. Dugan JA, Bostwick DG, Myers RP, Qian J, Bergstraih EJ, Oesterling JE. The definition and preoperative prediction of clinically insignificant prostate cancer. JAMA 1996;275:288–94.
4. Terris MK, McNeal JE, Stamey TA. Detection of clinically significant prostate cancer by transrectal ultrasound–guided systematic biopsies. J Urol 1992;148:829–32.
5. Partin AW, Yoo J, Carter HB, Pearson JD, Chan DW, Epstein JI, et al. The use of prostate specific antigen, clinical stage and Gleason score to predict pathological stage in men with localized prostate cancer. J Urol 1993;150:110–4.
6. Partin AW, Kattan MW, Subong ENP, Walsh PC, Wojno KJ, Oesterling JE, et al. Combination of prostate specific antigen, clinical stage, and Gleason score to predict pathologic stage of localized prostate cancer: a multi-institutional update. JAMA 1997;277:1445–51.
7. Ackerman DA, Barry JM, Wicklund RA, Olson N, Lowe BA. Analysis of risk factors associated with prostate cancer extension to the surgical margin and pelvic node metastasis at radical prostatectomy. J Urol 1993;150:1845–50.
8. Irwin MB, Trappasso JG. Identification of insignificant prostate cancers: analysis of preoperative parameters. Urology 1994;44:862–8.
9. Dietrick DD, McNeal JE, Stamey TA. Core cancer length in ultrasound-guided systematic sextant biopsies: a preoperative evaluation of prostate cancer volume. Urology 1995;45:987–92.
10. Cupp MR, Bostwick DG, Myers RP, Oesterling JE. The volume of prostate cancer in the biopsy specimen cannot reliably predict the quantity of cancer in the radical prostatectomy specimen on an individual basis. J Urol 1995;153:1543–8.
11. Terris MK, Haney DJ, Johnstone IM, McNeal JE, Stamey TA. Prediction of prostate cancer volume using prostate-specific antigen levels, transrectal ultrasound, and systematic sextant biopsies. Urology 1995;45:75–80.
12. Peller PA, Young DC, Marmaduke DP, Marsh WL, Badalament RA. Sextant prostate biopsies: a histopathologic correlation with radical prostatectomy specimens. Cancer 1995;75:530–8.
13. Badalament RA, Miller MC, Peller PA, Young DC, Gahn DK, et al. An algorithm for predicting non-organ-confined prostate cancer using the results obtained from sextant core biopsies with prostate specific antigen level. J Urol 1996;156:1275–380.
14. Huland H, Hammerer P, Henke R, Huland E. Preoperative prediction of tumor heterogeneity and recurrence after radical prostatectomy for localized prostate carcinoma with digital rectal examination, prostate specific antigen and the results of 6 systematic biopsies. J Urol 1996;155:1344–7.
15. Ravery V, Schmid H, Toublanc M, Boccon-Gibod L. Is the percentage of cancer in biopsy cores predictive of extracapsular disease in T1–T2 prostate cancer? Cancer 1996;78:1079–84.
16. Goto Y, Ohori M, Arakawa A, Kattan MW, Wheeler TM, Scardino PT. Distinguishing clinically important from unimportant prostate cancers before treatment: value of systematic biopsies. J Urol 1996;156:1059–63.
17. D’Amico AV, Chang H, Holupka E, Renshaw AA, Desjarden A, Chen M, et al. Calculated prostate cancer volume: the optimal predictor of actual cancer volume and pathologic stage. Urology 1997;49:385–91.
18. Bostwick DG, Wheeler TM, Blute M, Barrett DM, MacLennan GT, et al. Optimized microvessel density analysis improves prediction of cancer stage from prostate needle biopsies. Urology 1996;48:47–57.
19. Catalona WJ, Smith DS, Ratliff TL, Dodds KM, Coplen DE, Yuan JJJ, et al. Measurement of prostate specific antigen in serum as a screening test for prostate cancer. N Engl J Med 1991;324:1156–61.
20. Humphrey PA, Baty J, Keetch D. Relationship between serum prostate specific antigen, needle biopsy findings, and histopathologic features of prostatic carcinoma in radical prostatectomy tissues. Cancer 1995;75:1842–9.
21. Humphrey PA, Keetch DW, Smith DS, Shepherd DL, Catalona WJ. Prospective characterization of pathological features of prostatic carcinomas detected via serum prostate specific antigen based screening. J Urol 1996;155:816–20.
22. Gleason DF. Histologic grading of prostate cancer: a perspective. Hum Pathol 1992;23:273–9.
23. Cleveland WS, Grosse E, Shyu WM. Local regression models. In: Chambers JM, Hastie TJ, editors. Statistical models in S. Chapter 8. Pacific Grove: Wadsworth & Brooks/Cole Advanced Books & Software, 1992:314.
24. Chambers JM, Cleveland WS, Kleiner B, Tukey PA. Graphical methods for data analysis. Belmont, CA: Wadsworth, 1983:121–2.
25. Kattan MW, Stapleton AMF, Wheeler TM, Scardino PT. Evaluation of a nomogram used to predict the pathologic stage of clinically localized prostate carcinoma. Cancer 1997;79:528–37.
26. Vollmer RT. Multivariate statistical analysis for pathologists. Part I: The logistic model. Am J Clin Pathol 1996;105:115–26.
27. McCullagh P, Nelder JA. Generalized linear models. London: Chapman & Hall, 1989.
28. D’Amico AV, Whittington R, Schnall M, Malkowicz SB, Tomaszewski JE, Schultz D, et al. The impact of the inclusion of endorectal coil magnetic resonance imaging in a multivariate analysis to predict clinically unsuspected extraprostatic tumor. Cancer 1995;75:2368–72.
29. Gomella LG, Raj GV, Moreno JG. Reverse transcriptase polymerase chain reaction for prostate specific antigen in the management of prostate cancer. J Urol 1997;158:326–37.
30. Ferrari AC, Stone NN, Eyler JN, Gao M, Mandeli J, Unger P, et al. Prospective analysis of prostate-specific markers in pelvic lymph nodes of patients with high-risk prostate cancer. J Natl Cancer Inst 1997;89:1498–504.
31. Arcangeli CG, Shepherd DL, Smith DS, Humphrey PA, Keetch DW, Catalona WJ. Correlation of percent free PSA with pathologic features of prostatic carcinomas. J Urol 1996;155(Suppl):415A.
32. Bauer JJ, Sesterhenn IA, Mostofi FK, McLeod DG, Srivastava S, Moul JW. Elevated levels of apoptosis regulator proteins p53 and bcl-2 are independent prognostic biomarkers in surgically treated clinically localized prostate cancer. J Urol 1996;156:1511–6.
33. Kallakury BVS, Figge J, Leibovich B, Hwang J, Rifkin M, Kaufman R, et al. Increased bcl-2 protein levels in prostatic adenocarcinomas are not associated with rearrangements in the 2.8 kb major breakpoint region or with p53 protein accumulation. Mod Pathol 1996;9:41–7.
