Lung Cancer Screening - Radiological Aspects

Bartjan de Hoop

Lung Cancer Screening: Radiological Aspects
Thesis, Utrecht University, with a summary in Dutch
© B.J. de Hoop, 2010.
ISBN 978-90-393-5311-0
Cover: ‘Smoking-induced Lung Diseases’ by Karin van Rijnbach and Bartjan de Hoop
Lay-out: Karin van Rijnbach
Print: Gildeprint Drukkerijen B.V., Enschede, The Netherlands

Lung Cancer Screening: Radiological Aspects
Longkanker Screening: Radiologische Aspecten (with a summary in Dutch)

THESIS (PROEFSCHRIFT)

submitted for the degree of doctor at Utrecht University, by authority of the rector magnificus, prof.dr. J.C. Stoof, in accordance with the decision of the Doctorate Board, to be defended in public on Tuesday 25 May 2010 at 12.45 p.m.

by

Berend Jan de Hoop, born on 10 December 1980 in Arnhem

Supervisors (promotoren):

Prof. dr. W.M. Prokop
Prof. dr. J.W.J. Lammers

Co-supervisors (co-promotoren):

Dr. H.A. Gietema
Dr. B. van Ginneken

Publication of this thesis was financially supported by: Stichting Nationaal Fonds tegen Kanker – voor onderzoek naar reguliere en aanvullende therapieën te Amsterdam, Philips Healthcare, Siemens Medical Solutions, Boehringer Ingelheim bv, Röntgen Stichting Utrecht and IMAGO Utrecht (Graduate School for Biomedical Image Sciences)

Contents

1. General introduction

Part 1: Nodule detection in lung cancer screening

2. Screening for lung cancer with digital chest radiography: sensitivity and number of secondary work-up CT examinations.
   Radiology. (in press)

3. Computer-aided detection of lung cancer on CXR: how CAD and human observers interact.
   Radiology. (in revision)

Part 2: Nodule differentiation in lung cancer screening

4. Recognition of pulmonary perifissural nodules on computed tomography: a chance to reduce the number of follow-up examinations?
   Submitted

5. A comparison of six software packages for evaluation of solid lung nodules using semi-automated volumetry: what is the minimum increase in size to detect growth in repeated CT examinations?
   Eur. Radiol. 2009 Apr;19(4):800-8.

6. Lung nodule volumetry: segmentation algorithms within the same software package cannot be used interchangeably.
   Eur. Radiol. (in press)

7. Increase in mass of pulmonary ground glass nodules as an early indicator of growth.
   Radiology. 2010 April; 255:199-206.

8. Ground glass nodules detected during lung cancer screening: a diagnostic and therapeutic dilemma.
   Submitted

Part 3: Additional findings in lung cancer screening: emphysema

9. Automatic segmentation of pulmonary segments from volumetric chest CT scans.
   IEEE Trans Med Imaging. 2009 Apr;28(4):621-30.

10. Regional progression of emphysema in smokers; a quantitative assessment based on the pulmonary segments.

11. CT-quantified emphysema in heavy smokers: prevalence and predictive value for rate of lung function decline.
   Submitted

12. Summary and general discussion

13. Summary in Dutch (Nederlandse samenvatting)

Acknowledgements (Dankwoord)
List of publications
Curriculum vitae
NELSON study investigators (Onderzoekers NELSON studie)

Chapter 1

General introduction

Chapter 1 | General Introduction

The effects of smoking are a major health problem. The number of smoking-related deaths worldwide is projected to be 8.3 million in 2030, at which point it will represent almost 10% of all deaths globally 1. Of those smoking-related deaths, approximately 41% are due to lung cancer and 27% to chronic obstructive pulmonary disease (COPD) 2. Lung cancer was the first disease to be causally linked to smoking. During the 1920s and 1930s doctors began to notice a strong increase in lung cancer, largely reflecting increased rates of smoking 3. Nowadays, smoking accounts for approximately 90% of lung cancer cases 4.

Survival varies markedly depending on the stage at diagnosis. The overall 5-year survival rate of lung cancer is 16% 5. Patients with stage I disease, however, have a much better prognosis, with a 5-year survival of more than 60% 6. The clearly defined risk factor for developing lung cancer, combined with the increased survival rate associated with early detection, has led to the initiation of lung cancer screening studies. Nevertheless, screening remained difficult because an accurate test to differentiate healthy and ill subjects was lacking. Large randomized controlled trials have been conducted since 1960 using sputum cytology and conventional screen-film chest X-ray (CXR) as screening tests, but neither test was able to accurately differentiate healthy from diseased subjects. No prolongation of life expectancy of individuals with lung cancer could be demonstrated 7. Screening for lung cancer was therefore, until recently, considered inappropriate.

The rapid development of multi-detector Computed Tomography (CT) technology has renewed the interest in lung cancer screening. CT enables the visualization of even small intrapulmonary lesions. Pilot studies using CT to screen participants for lung cancer have reported that 55-85% of detected cancers are stage I. Whether this early detection will also result in a mortality reduction is still unknown. To answer this question, the Dutch-Belgian Lung Cancer Screening trial (NELSON) was initiated in 2004. The NELSON trial is a randomised controlled population-based screening trial that tests whether screening for lung cancer by low-dose CT in high-risk subjects will lead to a 25% decrease in lung cancer mortality 8.

This thesis evaluates methods and techniques that are used in lung cancer screening. The aim is to increase the effectiveness and efficiency of lung cancer screening. Part one deals with the optimal imaging technique for the detection of lung nodules. Subsequently, different methods are discussed that can differentiate the detected nodules into benign or malignant lesions. The mortality reduction intended by lung cancer screening could possibly be increased when screening is expanded to other smoking-induced diseases such as COPD. The last part of this thesis therefore focuses on the detection and quantification of emphysema in participants of a lung cancer screening trial. The predictive value of emphysema for decline in lung function is also discussed.

Outline of the Thesis

Part 1 Nodule Detection in Lung Cancer Screening

Compared to CT, CXR has the advantage of low cost, low radiation dose and easy accessibility. The conventional CXR, however, was not sensitive enough to make it a successful screening tool in past studies 9. Techniques for performing CXRs have radically changed since then. Modern digital CXR equipment with highly quantum-efficient detectors and elaborate processing tools has improved visualization of pulmonary structures. Current CXR may therefore be a more suitable screening tool than conventional CXR. In chapter 2 the performance of state-of-the-art digital CXR in the detection of lung cancer is estimated. Using digital CXR, we tested whether observers were able to detect lung cancer cases that were found during CT screening. A nested case-control setup within the NELSON study was used. NELSON participants with a screening-detected pulmonary malignancy between the start of the study and July 2007 were defined as cases. A selection of the participants in whom CT did not demonstrate nodules larger than 5 mm were defined as controls.

In order to increase CXR sensitivity to pulmonary nodules, Computer Aided Detection (CAD) systems are currently being developed. The concept of a CAD system is the identification of lesions that would otherwise have been missed by the reader, under the assumption that these missed lesions were overlooked. Most CAD systems act as a second reader. For such a CAD system to have an effect on the final detection rate, the reader has to acknowledge the annotations produced by the system. In chapter 3 we assess the effect of a state-of-the-art, commercially available CAD system on reader performance for nodule detection with CXR. We also discuss the interaction between the readers and the CAD system.

Part 2 Nodule Differentiation in Lung Cancer Screening

Once a nodule has been detected, the clinician needs to know whether it is benign or malignant. Eventually, less than 2% of all detected nodules prove to be malignant 10. Accurate characterization of each nodule is important to prevent anxiety and morbidity related to unnecessary work-up. A false positive screening result can even lead to the resection of benign lesions. One way to discriminate benign and malignant nodules is by evaluating nodule morphology. In chapter 4, the morphology and growth rate of pulmonary perifissural nodules (PFNs) are described. PFNs are fissure-attached nodules with a triangular or lentiform shape. They have been suggested to be lymph nodes and thus benign 11. All baseline nodules in the NELSON screening were re-evaluated and classified as PFN or non-PFN. We then assessed the prevalence and malignancy rate of PFNs. If they are always benign, they would offer a chance to reduce the number of follow-up scans for false positive nodules.

The NELSON study protocol uses nodule size and volume doubling time to discriminate benign and malignant nodules. Volumetry software was used to determine volume and volume doubling times for every detected nodule. The measurement of nodule volume is, like all tasks requiring image interpretation, subject to observer variability. Variation in the results of volumetry may result in false positive or false negative conclusions with potentially serious consequences for the patient. To avoid overinterpreting random changes in volumetric measurements, the diagnosis of real growth or regression typically requires that the difference in measured nodule size exceeds the variation in the measurement. In chapter 5, we used CT studies of cancer patients who were scanned twice on the same day to test for variability. Multiple modern software packages for evaluation of solid lung nodules using volumetry are compared to assess the minimum increase in size needed to detect growth.

Interchangeability of nodule volumetry software was evaluated in chapter 6 in collaboration with the Danish Lung Cancer Screening Trial. In this study 2 observers had to choose one of 3 different volumetry algorithms to segment 188 nodules at baseline and follow-up. We assessed the quality of the segmentations and interobserver variability. We also evaluated the effect of using different volumetry algorithms for measurements at baseline and follow-up.

Solid nodules are characterized based on size or growth rate during the NELSON trial. Ground Glass Nodules (GGNs), in contrast, are also characterized by changes in density 12-14. GGNs are non-solid or part-solid lung nodules. They are less frequent than solid nodules but have a much higher malignancy rate. In chapter 7, the estimation of GGN mass is introduced as a new method for measuring change in GGNs on CT images. Nodule mass is a parameter that integrates volume and density changes and thus should be especially suitable for identifying GGNs with a high risk of malignancy. GGN mass was retrospectively measured for all GGNs detected during the NELSON trial. We compared the variability in the measurements of diameter, volume and mass and also assessed the increase of each parameter for each of the GGNs. These data were used to evaluate which method can best be used to determine change over time in order to identify malignant GGNs.

Despite the high proportion of GGNs that are malignant, GGNs have also been shown to have a very slow growth rate 15. The current guidelines for pulmonary nodules state that nodules with a volume doubling time (VDT) outside a window of 20 to 400 days have a low probability of malignancy 16-20, but these guidelines may not be sufficient for GGNs. Consensus on the appropriate follow-up strategies for GGNs is currently still lacking. In chapter 8 we describe growth rates and pathology results of GGNs that were found during the NELSON lung cancer screening study in order to gain a better understanding of these lesions.
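To make the volume doubling time criterion concrete, the following is a minimal sketch that assumes simple exponential growth between two volume measurements, so that VDT = Δt · ln 2 / ln(V2/V1). The text above does not state which formula the volumetry software applies, so the function names and the exponential-growth model are assumptions for illustration only; the 20 to 400 day window is the one quoted above.

```python
import math


def volume_doubling_time(v1_mm3, v2_mm3, interval_days):
    """Volume doubling time in days, assuming exponential growth (an assumption).

    v1_mm3 and v2_mm3 are nodule volumes at baseline and follow-up,
    interval_days is the time between the two scans.
    """
    if v1_mm3 <= 0 or v2_mm3 <= 0 or interval_days <= 0:
        raise ValueError("volumes and interval must be positive")
    if v2_mm3 == v1_mm3:
        return math.inf  # no measured change: the nodule never doubles
    # Negative result means the measured volume decreased.
    return interval_days * math.log(2) / math.log(v2_mm3 / v1_mm3)


def in_suspicious_window(vdt_days, low=20.0, high=400.0):
    """True if the VDT lies inside the 20 to 400 day window quoted in the text."""
    return low <= vdt_days <= high
```

A nodule that doubles in volume over 90 days has a VDT of 90 days and falls inside the window; a shrinking nodule yields a negative value and falls outside it. Because measurement variability affects both volumes, a computed VDT is only meaningful when the volume change exceeds that variability.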

Part 3 Additional Findings in Lung Cancer Screening: Emphysema

The average smoking history of the participants of the NELSON lung cancer trial was 40.3 pack years at baseline. Only current or former heavy smokers were included, as these smokers have a high chance of developing lung cancer. Smoking is also the main cause of Chronic Obstructive Pulmonary Disease (COPD) and contributes to about 85% of the risk of developing this disease. COPD is expected to be the fourth leading cause of death worldwide by 2020 21. COPD is usually defined as an irreversible airflow obstruction caused either by increased resistance of the small airways, an increase in lung compliance due to emphysema, or a combination of the two. Part 3 of this thesis focuses on the emphysema component of COPD.

Emphysema is defined as a permanent widening of airspaces distal to the terminal bronchioles caused by destruction of the alveoli. It can be objectively quantified using CT because destruction of alveoli in persons with emphysema results in a lower lung density compared to non-emphysematous persons. The first step in the quantification of emphysema is to isolate the lung parenchyma from the chest CT. A completely automatic method to segment the lungs, lobes and pulmonary segments from volumetric chest CT scans is presented and evaluated in chapter 9. The next step is the quantification of low-attenuation areas, which are used as a measure of emphysema.

The segmentation into lobes and segments can be used to evaluate the distribution of emphysema. The pattern of emphysema distribution has been reported to be closely related to functional impairment and mortality caused by the disease 22, 23. Also, knowledge of the distribution is crucial in the treatment of severe emphysema with lung-volume-reduction surgery. Chapter 10 describes the distribution of smoking-induced emphysema in smokers participating in the lung cancer screening trial. We used baseline and follow-up screening CTs to assess changes in distribution over time. This knowledge may help to estimate the prognosis for patients with different degrees and distribution patterns of emphysema.

The strong correlation between lung density and degree of emphysema has been extensively validated and used in many studies 24. Still, it remains unclear whether the presence of CT-detected emphysema in patients with no or only mild obstruction is a risk factor for developing (more severe) obstructive disease, i.e. COPD, in the future. This is important since stabilizing the disease and emphasizing smoking cessation is the best one can do in the early stages of COPD. The correlation between CT-quantified emphysema and decline in lung function over time is discussed in chapter 11.
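The density-based quantification described above can be sketched as the percentage of lung voxels whose attenuation falls below a threshold, computed separately per lobe or segment to describe the distribution. This is only an illustration: the threshold of -950 Hounsfield units, the function name and the flat-list input format are assumptions, not the method used in chapters 9 to 11.

```python
from collections import defaultdict


def low_attenuation_percentage(hu_values, region_labels, threshold_hu=-950):
    """Percentage of voxels below threshold_hu for each labelled lung region.

    hu_values: attenuation of each voxel in Hounsfield units.
    region_labels: region (e.g. lobe or segment) of each voxel, or None for
        voxels outside the segmented lung. The labels are hypothetical.
    threshold_hu: assumed emphysema threshold; not taken from the text.
    """
    totals = defaultdict(int)
    low = defaultdict(int)
    for hu, label in zip(hu_values, region_labels):
        if label is None:
            continue  # voxel is not part of the lung segmentation
        totals[label] += 1
        if hu < threshold_hu:
            low[label] += 1
    return {label: 100.0 * low[label] / totals[label] for label in totals}
```

Comparing the per-region percentages between a baseline and a follow-up scan gives a simple view of regional progression, provided both scans use the same segmentation and threshold.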

Reference List

1. WHO. The global burden of disease: 2004 update. 2008.
2. Begg S, Vos T, Barker B, Stevenson C, Stanley L, Lopez AD. The burden of disease and injury in Australia 2003. 2007.
3. Wynder EL, Graham EA. Tobacco smoking as a possible etiologic factor in bronchiogenic carcinoma; a study of 684 proved cases. J Am Med Assoc 1950 May 27;143(4):329-36.
4. Alberg AJ, Samet JM. Epidemiology of lung cancer. Chest 2003 January;123(1 Suppl):21S-49S.
5. Alberg AJ, Ford JG, Samet JM. Epidemiology of lung cancer: ACCP evidence-based clinical practice guidelines (2nd edition). Chest 2007 September;132(3 Suppl):29S-55S.
6. Mountain CF. Revisions in the International System for Staging Lung Cancer. Chest 1997 June;111(6):1710-7.
7. Bach PB, Kelley MJ, Tate RC, McCrory DC. Screening for lung cancer: a review of the current literature. Chest 2003 January;123(1 Suppl):72S-82S.
8. van Iersel CA, de Koning HJ, Draisma G, Mali WP, Scholten ET, Nackaerts K, Prokop M, Habbema JD, Oudkerk M, van Klaveren RJ. Risk-based selection from the general population in a screening trial: selection criteria, recruitment and power for the Dutch-Belgian randomised lung cancer multi-slice CT screening trial (NELSON). Int J Cancer 2007 February 15;120(4):868-74.
9. Flehinger BJ, Melamed MR. Current status of screening for lung cancer. Chest Surg Clin N Am 1994 February;4(1):1-15.
10. van Klaveren RJ, Oudkerk M, Prokop M, Scholten ET, Nackaerts K, Vernhout R, van Iersel CA, van den Bergh KA, van ‘t WS, van der AC, Thunnissen E, Xu DM, Wang Y, Zhao Y, Gietema HA, de Hoop BJ, Groen HJ, de Bock GH, van OP, Weenink C, Verschakelen J, Lammers JW, Timens W, Willebrand D, Vink A, Mali W, de Koning HJ. Management of lung nodules detected by volume CT scanning. N Engl J Med 2009 December 3;361(23):2221-9.
11. Matsuki M, Noma S, Kuroda Y, Oida K, Shindo T, Kobashi Y. Thin-section CT features of intrapulmonary lymph nodes. J Comput Assist Tomogr 2001 September;25(5):753-6.
12. Aoki T, Nakata H, Watanabe H, Nakamura K, Kasai T, Hashimoto H, Yasumoto K, Kido M. Evolution of Peripheral Lung Adenocarcinomas: CT Findings Correlated with Histology and Tumor Doubling Time. Am J Roentgenol 2000 March 1;174(3):763-8.
13. Kakinuma R, Ohmatsu H, Kaneko M, Kusumoto M, Yoshida J, Nagai K, Nishiwaki Y, Kobayashi T, Tsuchiya R, Nishiyama H, Matsui E, Eguchi K, Moriyama N. Progression of focal pure ground-glass opacity detected by low-dose helical computed tomography screening for lung cancer. J Comput Assist Tomogr 2004 January;28(1):17-23.
14. Lindell RM, Hartman TE, Swensen SJ, Jett JR, Midthun DE, Tazelaar HD, Mandrekar JN. Five-year Lung Cancer Screening Experience: CT Appearance, Growth Rate, Location, and Histologic Features of 61 Lung Cancers. Radiology 2007 February 1;242(2):555-62.
15. Hasegawa M, Sone S, Takashima S, Li F, Yang ZG, Maruyama Y, Watanabe T. Growth rate of small lung cancers detected on mass CT screening. Br J Radiol 2000 December 1;73(876):1252-9.
16. Ost D, Fein AM, Feinsilver SH. Clinical practice. The solitary pulmonary nodule. N Engl J Med 2003 June;348(25):2535-42.
17. Usuda K, Saito Y, Sagawa M, Sato M, Kanma K, Takahashi S, Endo C, Chen Y, Sakurada A, Fujimura S. Tumor doubling time and prognostic assessment of patients with primary lung cancer. Cancer 1994 October 15;74(8):2239-44.
18. Winer-Muram HT, Jennings SG, Tarver RD, Aisen AM, Tann M, Conces DJ, Meyer CA. Volumetric Growth Rate of Stage I Lung Cancer prior to Treatment: Serial CT Scanning. Radiology 2002 June 1;223(3):798-805.
19. Yankelevitz DF, Henschke CI. Does 2-year stability imply that pulmonary nodules are benign? AJR Am J Roentgenol 1997 February;168(2):325-8.
20. Yankelevitz DF, Reeves AP, Kostis WJ, Zhao B, Henschke CI. Small pulmonary nodules: volumetrically determined growth rates based on CT evaluation. Radiology 2000 October;217(1):251-6.
21. Lopez AD, Shibuya K, Rao C, Mathers CD, Hansell AL, Held LS, Schmid V, Buist S. Chronic obstructive pulmonary disease: current burden and future projections. Eur Respir J 2006 February;27(2):397-412.
22. Gurney JW, Jones KK, Robbins RA, Gossman GL, Nelson KJ, Daughton D, Spurzem JR, Rennard SI. Regional distribution of emphysema: correlation of high-resolution CT with pulmonary function tests in unselected smokers. Radiology 1992 May;183(2):457-63.
23. Martinez FJ, Foster G, Curtis JL, Criner G, Weinmann G, Fishman A, DeCamp MM, Benditt J, Sciurba F, Make B, Mohsenifar Z, Diaz P, Hoffman E, Wise R, for the NETT Research Group. Predictors of Mortality in Patients with Emphysema and Severe Airflow Obstruction. Am J Respir Crit Care Med 2006 June 15;173(12):1326-34.
24. Parr DG, Stoel BC, Stolk J, Stockley RA. Validation of computed tomographic lung densitometry for monitoring emphysema in α1-antitrypsin deficiency. Thorax 2006 June 1;61(6):485-90.

Chapter 2

Bartjan de Hoop, Cornelia Prokop-Schaefer, Hester A Gietema, Pim A de Jong, Bram van Ginneken, Rob van Klaveren, Mathias Prokop

Screening for lung cancer with digital chest radiography: sensitivity and number of secondary work-up CT examinations

Abstract

Purpose
To estimate the performance of digital chest radiography (CXR) for detection of lung cancer.

Method and Materials
Using a nested case-control design within an ethics-committee-approved lung cancer screening trial, we studied 55 cases with Computed Tomography (CT)-detected, histology-proven lung cancers and a sample of 72 of 4873 controls without nodules on CT. All subjects underwent direct detector digital CXR in two projections within 2 months of the screening CT. Four radiologists with varying experience identified and localized potential cancers on CXRs using a confidence scale of 1 (no lesion) to 5 (definite lesion). Localization receiver operating characteristic (ROC) analysis was performed. On the basis of the assumption that suspicious lesions seen at CXR would initiate further work-up by CT, the number of work-up CT examinations per detected cancer (CT examinations per cancer) was calculated at various confidence levels for the screening population (cancer rate in study population, 1.3%).

Results
Tumor size ranged from 6.8 mm to 50.7 mm (median 11.8 mm). Areas under the localization ROC curve ranged from 0.52 to 0.69. Detection rates varied substantially with the observers’ experience and confidence level: at a confidence level of 5, detection rates ranged from 18% at one CT examination per cancer to 53% at 13 CT examinations per cancer. At a confidence level of 2 or higher, detection rates ranged from 94% at 62 CT examinations per cancer to 78% at 44 CT examinations per cancer.

Conclusion
A detection rate of 94% for lung tumors with a diameter of 6.8-50.7 mm found at CT screening was achievable with CXR only at the expense of a high false-positive rate and an excessive number of work-up CT examinations. Detection performance is strongly observer dependent.

Chapter 2 | Screening for lung cancer with digital chest radiography: sensitivity and number of secondary work-up CT examinations

Introduction

Chest radiography (CXR) is still the most commonly used technique in clinical practice to rule out chest disease, to study the effects of treatment and to monitor patients with chest abnormalities. Computed tomography (CT) has a much higher sensitivity for the detection of small intrapulmonary lesions than CXR, but CXR has the advantage of low cost, low radiation dose and easy accessibility.

Historically, lung cancer screening studies using cytology and/or conventional screen-film CXR have yielded disappointing results 1. Screening with conventional CXR was therefore considered inappropriate. These studies, however, used analogue screen-film techniques for chest radiography. Modern digital equipment with highly quantum-efficient detectors and elaborate processing tools improves visualization of pulmonary structures with CXR 2-5, so digital CXR may be a more suitable screening tool than conventional CXR. Up to now, little has been known about the performance of modern digital radiography for lung cancer screening.

We used a nested case-control setup based on data from the Dutch-Belgian Lung Cancer Screening Trial (NELSON) 6 to study how the confidence level for the presence of a lesion affected observer performance. Assuming that a positive CXR screening result would initiate a work-up CT examination, we also estimated the number of work-up CTs needed to detect one lung cancer in the NELSON study cohort. The aim was to estimate the performance of digital CXR for detection of lung cancer.

Material and Methods

Study Population

We recruited our participants from two screening sites (Utrecht and Groningen, The Netherlands) and included all 4938 participants who underwent baseline and one-year follow-up screening until July 2007. The NELSON trial was approved by the ethics committees of both institutions and a waiver was received for our part of the study. All participants of the NELSON trial are former or current heavy smokers 7. A total of 65 lung cancers were detected at baseline screening and 1-year follow-up in this group, which results in a cancer rate of 1.3%.

Cases

We recruited our case cohort from all 65 subjects in whom a pulmonary malignancy was detected with low-dose CT at one of the two screening sites. All malignancies were histologically proved. The NELSON study setup does not include an obligatory acquisition of a CXR at inclusion time. However, CXR is still part of clinical practice during the diagnostic work-up of suspicious lesions. Therefore, the positive cases in our study group comprise a selection of patients who underwent CXR in two projections as part of clinical practice, either during the diagnostic work-up or for pre-operative screening. Consequently, all CXRs were performed after detection of the malignancy by CT. Ten patients who did not receive a CXR within 6 weeks after detection of a suspicious lung nodule were excluded. This resulted in a study group of 55 cases with at least one malignant nodule. The study group included twelve cases whose cancer was detected at 1-year follow-up but was retrospectively visible at baseline CT screening. All twelve were also reported at baseline, but did not meet the criteria for referral at that time 6.

Controls

From all participants who were not cases and who were screened at the two study sites (n=4873), we included all participants in whom CT did not demonstrate nodules larger than 5 mm in diameter and in whom a CXR was available that had been performed within 2 months of CT screening for reasons other than a suspicious lung nodule. CXRs for which the radiology report mentioned pulmonary abnormalities other than those related to chronic obstructive pulmonary disease (COPD) were excluded. Seventy-two subjects met these criteria. Indications for acquisition of the CXR were exclusion of acute cardiovascular disease (n=18), follow-up of COPD (n=17), screening for lung abnormalities because of rheumatoid arthritis (n=13), pre-operative screening for cardiovascular surgery (n=11), unexplained fever (n=11), trauma (n=1), and malaise (n=1).

We tested for differences in prevalence of COPD in the case and control groups because this disease may affect visibility of nodules. We were able to compare prevalence of COPD because a large subsample of the whole screening population underwent pulmonary function testing (PFT) as part of a substudy of the lung cancer screening trial. PFT was available in 43 (78%) of our lung cancer cases, 46 (64%) of our control subjects, and 2547 (52%) of the participants without cancer in the lung cancer screening trial. The remaining participants did not undergo PFT. PFT included forced expiratory volume in one second (FEV1) and forced vital capacity (FVC). A participant with an FEV1/FVC ratio of less than 0.7 was considered to have COPD 8.

Table 1 Demographics of study participants.

                          Cases      Controls   All non-cases    P value,    P value,
                          (n=55)     (n=72)     minus controls   cases vs    controls vs
                                                (n=4801)         controls    non-cases
Age in years (mean±SD)    64.5±6.4   62.4±5.3   61.4±5.5         0.52        0.46
Male/female               49/6       59/13      4291/510         0.32        0.02
COPD                      63%        38%        38%              0.14        0.88

Acquisition and Evaluation of Screening CTs

All CT examinations were acquired and evaluated for nodules according to the NELSON protocol 6. Volume and mean diameter of detected nodules were assessed using volumetric software (LungCare, Siemens, Erlangen, Germany) 6.

Acquisition and Evaluation of Chest Radiographs

Acquisition technique was identical to that of conventional CXRs performed in our hospital (University Medical Center Utrecht, Utrecht, The Netherlands). All CXRs were obtained by using a cesium iodide amorphous silicon flat-panel-detector unit (Philips, DigitalDiagnost, Best, The Netherlands). Images were processed using non-linear multi-frequency band processing 9; parameters recommended by the manufacturer were used. For all patients posterior-anterior and lateral projections were available. Images were evaluated on 2K LCD monitors (MFGD 3220D, Barco, Kortrijk, Belgium) without and with grey scale reversal. Options for magnification and adaptation of window settings were used only when observers were unsure about a region.

Cases and controls were presented in alphabetical order on the basis of patient name to four independent observers with varying levels of experience: two chest radiologists (C.S. and M.P.) with more than 20 years of experience (observers A and B), a general radiologist with more than 20 years of experience (observer C), and a third-year resident with special experience and interest in chest radiology (observer D). Observers were aware of the study population but did not know the number of malignancies in the group. Nodules smaller than 5 mm in diameter and calcified granulomas were ignored. Posterior-anterior and lateral radiographs were evaluated.
The observers scored the presence of focal opacities that were suspicious for malignancy using a five point

confidence scale: 1, no lesion; 2, irregularity, probably no lesion; 3, indeterminate for the presence of a lesion; 4, probably lesion present; and 5, definitely lesion present. Readers had to manually localize the lesion on the radiograph. If more than one suspicious area was detected, the observer had to mark the most suspicious area. The CXR reading was considered true positive only when the localization of the lesion was correct. Observers were not forced to place a marking; they could also rate a radiograph as normal, i.e. no nodule present (confidence level 1). Reading time was unlimited and ranged from 140 to 175 minutes per observer for the total study; the mean time per patient examination ranged from 70 to 97 seconds. After acquisition of all reading data and with knowledge of the CT findings, observer A retrospectively determined whether the lesions that had not been seen by any of the observers were visible on the CXR. In addition, the same observer determined whether lesions were obscured by anatomic structures on the posterior-anterior radiograph.

Chapter 2 | Screening for lung cancer with digital chest radiography: sensitivity and number of secondary work-up CT examinations

Statistical Evaluation

A nested case-control setup within the NELSON lung cancer screening trial was used. This design uses a selection of the control subjects to represent all control subjects in the full cohort, enabling reconstruction of the results for the full cohort 10. For that purpose the results for the control group are multiplied by the reciprocal of the sample fraction. The cases of cancer in which no CXR was available (10/65 = 15.4%) were excluded. To determine the sample fraction, the number of non-cases therefore had to be reduced by the same proportion (4873 × (1 − 0.154) ≈ 4123) to match the 1.3% cancer rate in the NELSON cohort at the time of this study. The sample fraction in our study, therefore, was 72/4123. We tested whether the control subjects in the observer study were representative of all non-cases in the full cohort (a requirement of a nested case-control study). Categorical parameters were evaluated with a χ2 test; continuous parameters were evaluated with a Student t test. In all calculations we assumed 100% sensitivity for CT. Confidence intervals (CI) were calculated by using the Wilson score. The following four parameters were used to assess the performance of CXR as a screening tool for lung cancer:

1) Localization receiver operating characteristic (ROC) curve analysis. This analysis summarizes the reader’s sensitivity and specificity in a single value. Localization ROC analysis differs from normal ROC analysis in that it takes into account the correct localization of a reader’s marking. Localization ROC analyses were performed as described by Swensson 11. Jackknife free-response ROC software (Chakraborty D, University of Pittsburgh, PA) 12, 13 was used to test for statistically significant differences between the localization ROC areas.

2) Sensitivity and specificity. These statistics were calculated individually for each reader and each confidence level. Only correctly localized tumors were considered true-positive findings.

3) Number of work-up CTs per CXR-detected cancer. This number describes how many CT scans need to be obtained in the whole screening cohort to find one positive case that was correctly suspected from the CXR. It is based on the assumption that a suspicious CXR finding initiates a CT for further diagnostic work-up. The nested case-control setup allows us to estimate the positive predictive value (PPV) of CXR for the total screening population. Because PPV describes the proportion of true positive CXRs among all positive CXRs, PPV is equal to the proportion of positive work-up CT examinations (cancer at the suspected location) among all work-up CTs performed for a positive CXR. Therefore the number of work-up CT examinations per CXR-detected cancer can be calculated as 1 / PPV.

4) Total percentage of malignancies detected during CT work-up. There is a small chance that a work-up CT will reveal a cancer in a different location than is suspected at CXR. This chance increases with the total number of work-up CTs performed. The total percentage of detected malignancies therefore includes the reader’s true positive findings on CXR plus an estimated number of malignancies incidentally found on the work-up CT examinations performed for a false positive density on the CXR. To calculate the total percentage of malignancies detected at CT, we also determined the chance that a CT examination, based on a false positive CXR report, would incidentally reveal a malignancy. This chance is equal to the number of non-detected malignancies divided by the number of participants for whom the CXR had negative or false-positive results. This chance was multiplied by the number of CTs performed for false positive CXR reports to calculate the number of incidentally CT-detected malignancies. The number of true-positive CXRs plus incidentally CT-detected malignancies formed the total number of detected malignancies. These incidentally detected lesions also result in a slightly different number of CTs per detected cancer, which hereafter we call CT examinations per cancer.

P-values less than 0.05 were considered significant.
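The reweighting of the control group and the 1 / PPV calculation described under Statistical Evaluation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function name is hypothetical, and the reader counts used in the example (27 correctly localized cancers and 6 false-positive controls for observer A at confidence level 5) are back-calculated from the reported 49% sensitivity and 92% specificity rather than taken from the source.

```python
# Cohort figures quoted in the text.
N_CONTROLS_SAMPLED = 72   # controls read in the observer study
N_NONCASES_FULL = 4123    # adjusted number of non-cases in the full cohort


def workup_cts_per_cxr_detected_cancer(true_pos: int, false_pos_controls: int) -> float:
    """Return 1 / PPV, with PPV estimated for the full screening cohort.

    true_pos: cancers correctly localized on CXR (cases are not reweighted).
    false_pos_controls: sampled controls with a positive CXR.
    """
    if true_pos <= 0:
        raise ValueError("at least one true-positive CXR is needed to define PPV")
    # Each sampled control stands for 1 / (sample fraction) non-cases.
    weight = N_NONCASES_FULL / N_CONTROLS_SAMPLED
    false_pos_full_cohort = false_pos_controls * weight
    ppv = true_pos / (true_pos + false_pos_full_cohort)
    return 1.0 / ppv


# Assumed (back-calculated) counts for observer A at confidence level 5.
cts = workup_cts_per_cxr_detected_cancer(true_pos=27, false_pos_controls=6)
print(f"work-up CTs per CXR-detected cancer: {cts:.1f}")
```

With these assumed counts the sketch gives about 13.7, which rounds to the 14 reported for observer A in Table 3.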
Table 2 Diameters of detected and undetected nodules.

Observer  p-value  Median diameter of detected nodules (mm)  Median diameter of undetected nodules (mm)
A         0.10     12.2 (7.6 - 50.7)                         10.6 (4.2 - 18.5)
B         0.09     11.8 (7.6 - 50.7)                         11.4 (4.2 - 21.0)
C         0.36     11.8 (8.1 - 50.7)                         10.7 (6.8 - 35.4)
D         0.001    17.1 (8.1 - 50.7)                         11.4 (3.5 - 21.0)

Note – Only ‘probably present’ and ‘definite’ nodules (confidence levels 4 and 5) were counted. Numbers in parentheses are ranges. For observer D, the detected nodules were significantly larger than the nodules that were not detected by this observer.


Results

Study Population

Our case cohort did not significantly differ from our control cohort with respect to age, gender and prevalence of COPD. Compared with the non-cases in the full cohort, age and prevalence of COPD were not different in our control cohort, but the control cohort contained relatively more women (p=0.02) than the non-case group in the full cohort (Table 1).

Malignancies

The diameter of the malignancies ranged from 6.8mm to 50.7mm (median of 11.8mm). Four malignancies manifested as ground-glass opacity (GGO) on a CT scan; one was nonsolid and three were part-solid lesions. Two of the part-solid GGOs were detected by three observers and the two other GGOs were not detected by any observer. Most lesions were located in the right upper lobe (n=25). The remaining lesions were located in the right middle lobe (n=3), right lower lobe (n=9), left upper lobe (n=8) and left lower lobe (n=10). Twenty-six lesions were at least partially obscured on the PA radiograph by overlying anatomic structures such as (hilar) vascular structures (n=7), the clavicle (n=9), the heart (n=3), a rib (n=6) or the diaphragmatic recess (n=1). Obscured lesions accounted for a mean of 43% of all undetected lesions; individual rates were 36% for observer A, 41% for observer B, 39% for observer C and 55% for observer D.


Figure 1 Localization ROC curves and AUC values for the different observers. Note the better performance of the chest radiologists (observers A and B). The difference between observer A and C was significant (P < .05). AUC = area under the curve.


Table 3 Relationship between sensitivity, specificity, and number of work-up CT examinations per cancer detected.

Observer  Minimum level of confidence included *      Sensitivity % (CI)  Specificity % (CI)  Number of work-up CTs per CXR-detected cancer (CI)
A         2+3+4+5 (irregularity, probably no lesion)  75 (61-85)          24 (15-35)          78 (57-107)
A         3+4+5 (indeterminate)                       75 (61-85)          68 (56-78)          33 (24-45)
A         4+5 (probably lesion present)               73 (59-84)          82 (71-90)          20 (14-27)
A         5 (definite lesion present)                 49 (36-63)          92 (82-97)          14 (9-20)
B         2+3+4+5 (irregularity, probably no lesion)  69 (55-80)          51 (39-63)          54 (39-75)
B         3+4+5 (indeterminate)                       67 (53-79)          60 (47-71)          46 (33-64)
B         4+5 (probably lesion present)               60 (46-73)          82 (71-90)          24 (17-33)
B         5 (definite lesion present)                 45 (32-59)          96 (87-99)          8 (5-12)
C         2+3+4+5 (irregularity, probably no lesion)  60 (46-73)          56 (43-67)          57 (40-81)
C         3+4+5 (indeterminate)                       60 (46-73)          57 (45-68)          55 (39-78)
C         4+5 (probably lesion present)               53 (39-66)          68 (56-78)          46 (32-68)
C         5 (definite lesion present)                 40 (27-54)          88 (77-94)          24 (16-38)
D         2+3+4+5 (irregularity, probably no lesion)  67 (53-79)          50 (38-62)          57 (40-79)
D         3+4+5 (indeterminate)                       53 (39-66)          88 (77-94)          19 (13-27)
D         4+5 (probably lesion present)               36 (24-50)          99 (50-76)          4 (3-6)
D         5 (definite lesion present)                 18 (10-32)          100 (94-100)        1 (1-2)

Note – The cancer rate in the lung cancer screening cohort was 1.3%. While for readers A, B, and C, inclusion of the lower levels of confidence (levels 2 and 3) did not substantially increase sensitivity, observer D had a generally low level of confidence and would gain in sensitivity if low levels of confidence were included. Numbers in parentheses are 95% CIs. * 5 point ROC scale: 1, no lesion; 2, irregularity, probably no lesion; 3, indeterminate for the presence of a lesion; 4, probably lesion present; and 5, definite lesion present
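The methods state that confidence intervals were calculated with the Wilson score. A minimal Python sketch of the standard (uncorrected) Wilson interval follows. The source does not say whether a continuity correction or another variant was used, so the limits it returns may differ by about a percentage point from those in Table 3. The function name and the example counts (27 of 55) are illustrative assumptions.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction).

    z defaults to the two-sided 95% normal quantile.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed example: 27 correctly localized cancers out of 55.
low, high = wilson_interval(27, 55)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

Unlike the simple normal approximation, the Wilson interval stays within 0-100% and remains usable at the extremes, for example when a reader has no false positives among the 72 controls.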

Twenty-four (44%) malignancies were correctly localized by all observers. Seven (13%) malignancies were localized by none of the observers. Three of these seven were not visible on CXR even with the knowledge of the CT findings (Figure 2). Median diameter of correctly localized lesions ranged from 11.8mm to 17.1mm, depending on the reader. Median diameter of undetected lesions ranged from 10.6mm to 11.1mm. The difference in size reached significance for only one reader (P = .001, Table 2).

Localization ROC Analysis

The area under the localization ROC curve ranged from 0.52 for observer C to 0.69 for observer A. Localization ROC analysis indicated a better performance for the two chest radiologists (A and B) compared to readers C and D, but differences were significant only for observers A and C (P < .05) (Figure 1).

Sensitivity

At the highest level of confidence (level 5 = definitely lesion present), sensitivity for correctly localized malignant lesions on CXR varied from 18% (95% CI: 10%-32%) to 49% (95% CI: 36%-63%) at a specificity of 100% (95% CI: 94%-100%) to 92% (95% CI: 82%-97%), respectively. The false positive rates in the control group at this level of confidence ranged from 0% (0 of 72) to 13% (9 of 72) for observers D and C, respectively. When lesions rated as probably present (level 4) were also taken into account, sensitivity increased to between 36% (95% CI: 24%-50%) for observer D and 73% (95% CI: 59%-84%) for observer A with a specificity of 99% (95% CI: 50%-76%) and 82% (95% CI: 71%-90%), respectively (Table 3). Most lesions that had been rated as ‘definitely present’ on the CXR turned out to be correctly localized malignancies: the positive predictive values of nodules rated as ‘definitely present’ were 82% (95% CI: 64%-92%), 89% (95% CI: 71%-97%), 71% (95% CI: 52%-85%) and 100% (95% CI: 66%-100%) for observers A, B, C and D, respectively.

Number of Work-up CTs per CXR-detected Cancer

When only ‘definite lesions’ (confidence level 5) were taken into account, we calculated that positive calls would have initiated 1 (95% CI: 1-2) to 24 (95% CI: 16-38) work-up CT examinations per CXR-detected lung cancer in the total screening cohort. When lesions rated as ‘probably present’ (level 4) were also taken into account, the number of CT examinations per CXR-detected cancer ranged from 4 (95% CI: 3-6) to 46 (95% CI: 32-68). Table 3 summarizes the relationship between sensitivity and number of work-up CT examinations.

Total Percentage of Malignancies Detected During CT Work-up

The percentage of additional, incidentally detected, malignancies ranged from 0% to 8% of the total number of malignancies for the various observers at the highest level of confidence. At this confidence level, the total percentage of malignancies detected during work-up varied from 53% with 13 work-up CT examinations per cancer to 18% with 1 work-up CT examination per cancer. For all observers, the percentage of additional incidentally detected malignancies increased as the confidence level decreased (Table 4).

Table 4 Estimated total percentages of malignancies detected during screening work-up in screening in which positive chest radiograph would lead to initiation of work-up CT.

Observer  Minimum level of confidence included *      Sensitivity of CXR (%)  Incidentally detected malignancies during work-up CT (%)  Total percentage of malignancies detected during work-up (%) †  Number of work-up CTs/cancer
A         2+3+4+5 (irregularity, probably no lesion)  75                      19                                                        94                                                              62
A         3+4+5 (indeterminate)                       75                      8                                                         83                                                              30
A         4+5 (probably lesion present)               73                      5                                                         78                                                              18
A         5 (definite lesion present)                 49                      4                                                         53                                                              13
B         2+3+4+5 (irregularity, probably no lesion)  69                      15                                                        84                                                              44
B         3+4+5 (indeterminate)                       67                      14                                                        80                                                              38
B         4+5 (probably lesion present)               60                      7                                                         67                                                              21
B         5 (definite lesion present)                 45                      3                                                         48                                                              7
C         2+3+4+5 (irregularity, probably no lesion)  60                      18                                                        78                                                              44
C         3+4+5 (indeterminate)                       60                      17                                                        77                                                              43
C         4+5 (probably lesion present)               53                      15                                                        68                                                              36
C         5 (definite lesion present)                 40                      8                                                         47                                                              21
D         2+3+4+5 (irregularity, probably no lesion)  67                      17                                                        84                                                              46
D         3+4+5 (indeterminate)                       53                      6                                                         59                                                              17
D         4+5 (probably lesion present)               36                      1                                                         37                                                              4
D         5 (definite lesion present)                 18                      0                                                         18                                                              1

† This number includes the truly localized malignancies plus the malignancies detected incidentally by CT examinations performed for false positive CXRs.

Note – * 5 point ROC scale: 1, no lesion; 2, irregularity, probably no lesion; 3, indeterminate for the presence of a lesion; 4, probably lesion present; and 5, definite lesion present
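The columns of Table 4 follow from the estimate of incidentally detected malignancies described in the methods. The Python sketch below works through that arithmetic for one reader. It is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, and the counts used (41 correctly localized cancers and 55 false-positive controls for observer A at the lowest threshold) are back-calculated from the reported 75% sensitivity and 24% specificity.

```python
N_CASES = 55              # cancers with a CXR available
N_CONTROLS_SAMPLED = 72   # controls read in the observer study
N_NONCASES_FULL = 4123    # adjusted number of non-cases in the full cohort


def total_detection(true_pos: int, false_pos_controls: int) -> dict[str, float]:
    """Estimate incidental findings, total detection and CTs per cancer."""
    # Scale false positives among sampled controls up to the full cohort.
    fp_full = false_pos_controls * N_NONCASES_FULL / N_CONTROLS_SAMPLED
    missed = N_CASES - true_pos
    # Chance that a CT done for a false-positive CXR reveals a missed cancer:
    # missed cancers / participants with a negative or false-positive CXR.
    not_true_pos = N_CASES + N_NONCASES_FULL - true_pos
    incidental = missed / not_true_pos * fp_full
    detected = true_pos + incidental
    total_cts = true_pos + fp_full
    return {
        "incidental_pct": 100 * incidental / N_CASES,
        "total_pct": 100 * detected / N_CASES,
        "total_cts": total_cts,
        "cts_per_cancer": total_cts / detected,
    }


# Assumed (back-calculated) counts for observer A, threshold 2+3+4+5.
result = total_detection(true_pos=41, false_pos_controls=55)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

With these assumed counts the sketch yields roughly 19% incidental detections, 94% total detection, about 3191 work-up CTs and 62 CT examinations per cancer, in line with the values reported for observer A.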


Discussion

Even in the era of digital imaging, detection of lung cancer with CXR is a challenging task that shows high interreader variability. When only lesions rated as “probably present” and “definitely present” were considered to require further diagnostic work-up, the percentage of detected malignant lesions ranged from 37% to 78% (Table 4), depending on the reader. Thus, in instances in which we would rely on CXR alone, 22-63% of the lung cancers would be missed at a stage of disease at which they could be detected with CT. Even for the most experienced chest radiologist, a detection rate beyond 90% (94%) could be achieved only when the threshold for performing a CT was reduced to the lowest confidence level (irregularity, probably no lesion) and malignancies present at work-up CT examinations for false-positive CXR were counted as well. This, however, would have resulted in 62 work-up CT examinations per cancer, adding up to 3191 CTs in the whole population, and would still have left 3 cancers undetected. The PPV of CT during the first round of the NELSON trial was 35.7% 14, resulting in approximately 3 referrals to a pulmonologist to diagnose 1 cancer. This demonstrates that even modern CXR performs much worse than CT for lung cancer screening and is hampered by either large numbers of false positive readings or low detection rates. On the other hand, despite the lower detection rate of CXR, about half of the CT-proved malignancies were detected with CXR by all readers. This may substantially affect the power of randomized lung cancer screening trials if CXR is used in the control arm 15. However, screening with CXR commonly involves only frontal radiographs; we used posterior-anterior and lateral images in our study.

Whether using CXR as primary screening tool would also influence the outcome of patients in terms of mortality and survival cannot be determined on the basis of these data alone, since the biological behavior of primary bronchogenic carcinoma varies greatly and the prognosis of lesions differs according to their time of diagnosis. On average, lesions detected at CXR were larger than undetected lesions, although the differences in size were small. We showed that the area under the localization ROC curve increased substantially with years of experience and subspecialization. However, the difference in performance was significant only for observers A and C. The two chest radiologists (observers A and B) had the greatest area under the curve, but the resident with special experience in chest radiology (observer D) performed better than the general radiologist (observer C) with more than 20 years of experience. These results suggest that special training is advantageous for reading CXRs in a lung cancer screening setting. Observer D demonstrated a generally lower level of confidence and, therefore, showed a stronger increase in sensitivity than the other three observers when lesions detected with lower levels of confidence were also taken into account. Observer behavior, therefore, has a stronger impact on performance with CXR than it does with CT 16.

Sensitivity of CXR for malignancies detected in a lung cancer screening trial has previously been studied by using conventional radiography 17. The diameter of the malignancies was similar to those in our study. Similar percentages of nodules were obscured by anatomical structures. The main differences are the imaging technique (digital versus conventional) and the inclusion of lateral radiographs in our study. Conventional CXR had a sensitivity of 23% and a specificity of 96%. While maintaining the same level of specificity, digital radiography and inclusion of lateral images showed a sensitivity level approximately twice as high for each observer. Researchers in several other studies have assessed the sensitivity of conventional and digital CXR; sensitivity has ranged from 36% to 84% depending on the study population 18-22. These studies retrospectively assessed the performance of CXR for CT-proven lesions but did not separately quantify the number of lesions detected by CT but impossible to visualize on a projection radiograph 18, 21, 23.

Figure 2 (a, b) Small 11.4-mm-diameter malignancy that was correctly localized and scored as ‘definitely present’ by all observers (arrow) at (a) digital chest radiography and (b) CT. (c) Digital chest radiograph and (d) CT scan of malignancy (arrow) that was not detected by any of the observers during the study or retrospectively with knowledge of CT findings.

Because of the procedure we used to select participants, the control subjects may have been skewed towards COPD. It is known that the increased nodular and reticulo-nodular markings on a chest radiograph, often seen in smokers suffering from COPD, affect the ability of an observer to spot focal opacities 24. The PFT results, however, showed that the prevalence of COPD in our control cohort did not differ from that of the non-cases in the full cohort.

The main limitation of this study is the absence of an independent reference standard. Only malignancies detected at CT were included. Sensitivity of CT in the NELSON trial, defined as the ratio between CT-detected cancers and all pulmonary cancers diagnosed during the first year following CT screening, was greater than 94% 14. Within the screening trial only one interval cancer was misinterpreted on CT; this cancer turned out to be retrospectively visible at the screening CT. Thus, our cohort of radiologically detectable cases would have been practically the same had we used CT or an independent reference standard. The cancer rate was higher in our study than it was in the NELSON cohort. Although the observers were not aware of the exact disease frequency, they were aware of the higher prevalence, which could potentially have led towards overdiagnosis 25. Conversely, most lesions rated as ‘definitely present’ were indeed malignant. Furthermore, the number of CTs per detected cancer may have been underestimated because control subjects in our study did not have any nodules larger than 5mm, as proved by CT. In a usual screening situation, however, a certain percentage of patients without malignancies would have presented with benign nodules, which again could have led to false-positive readings. Finally, four observers performed the readings in this study. Although they had varying experience and rating behavior, they still represent a selected group. More observers are needed to quantify the impact of reader behavior if CXR were to be used as a screening tool on a large scale.
In conclusion, high rates of lung cancer detection can be achieved with digital chest radiographs at a stage when lesions are seen at CT screening, but only at the expense of a low specificity that results in an excessive number of work-up CT examinations. The detection performance with CXR strongly depends on the observer’s confidence level and experience. Therefore, even the use of modern digital technology instead of analogue screen-film technique does not make CXR as efficient as low-dose CT for lung cancer screening.


Reference List

1. Flehinger BJ, Melamed MR. Current status of screening for lung cancer. Chest Surg Clin N Am 1994 February;4(1):1-15.
2. Hennigs SP, Garmer M, Jaeger HJ, Classen R, Jacobs A, Gissler HM, Christmann A, Mathias K. Digital chest radiography with a large-area flat-panel silicon X-ray detector: clinical comparison with conventional radiography. Eur Radiol 2001;11(9):1688-96.
3. Fink C, Hallscheidt PJ, Noeldge G, Kampschulte A, Radeleff B, Hosch WP, Kauffmann GW, Hansmann J. Clinical Comparative Study with a Large-Area Amorphous Silicon Flat-Panel Detector: Image Quality and Visibility of Anatomic Structures on Chest Radiography. Am J Roentgenol 2002 February 1;178(2):481-6.
4. Garmer M, Hennigs SP, Jager HJ, Schrick F, van de Loo T, Jacobs A, Hanusch A, Christmann A, Mathias K. Digital Radiography Versus Conventional Radiography in Chest Imaging: Diagnostic Performance of a Large-Area Silicon Flat-Panel Detector in a Clinical CT-Controlled Study. Am J Roentgenol 2000 January 1;174(1):75-80.
5. Redlich U, Hoeschen C, Effenberger O, Fessel A, Preuss H, Reissberg S, Scherlach C, Dohring W. Comparison of four digital and one conventional radiographic image systems for the chest in a patient study with subsequent system optimization. Rofo 2005 February;177(2):272-8.
6. Xu DM, Gietema H, de Koning H, Vernhout R, Nackaerts K, Prokop M, Weenink C, Lammers JW, Groen H, Oudkerk M, van Klaveren R. Nodule management protocol of the NELSON randomised lung cancer screening trial. Lung Cancer 2006 November;54(2):177-84.
7. van Iersel CA, de Koning HJ, Draisma G, Mali WP, Scholten ET, Nackaerts K, Prokop M, Habbema JD, Oudkerk M, van Klaveren RJ. Risk-based selection from the general population in a screening trial: selection criteria, recruitment and power for the Dutch-Belgian randomised lung cancer multi-slice CT screening trial (NELSON). Int J Cancer 2007 February 15;120(4):868-74.
8. Rabe KF, Hurd S, Anzueto A, Barnes PJ, Buist SA, Calverley P, Fukuchi Y, Jenkins C, Rodriguez-Roisin R, van WC, Zielinski J. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. Am J Respir Crit Care Med 2007 September 15;176(6):532-55.
9. Stahl M, Aach T, Dippel S. Digital radiography enhancement by nonlinear multiscale processing. Med Phys 2000 January;27(1):56-65.
10. Biesheuvel CJ, Vergouwe Y, Oudega R, Hoes AW, Grobbee DE, Moons KG. Advantages of the nested case-control design in diagnostic research. BMC Med Res Methodol 2008;8:48.
11. Swensson RG. Unified measurement of observer performance in detecting and localizing target objects on images. Med Phys 1996 October;23(10):1709-25.
12. Zheng B, Chakraborty DP, Rockette HE, Maitz GS, Gur D. A comparison of two data analyses from two observer performance studies using Jackknife ROC and JAFROC. Med Phys 2005 April;32(4):1031-4.
13. Chakraborty DP, Berbaum KS. Observer studies involving detection and localization: modeling, analysis, and validation. Med Phys 2004 August;31(8):2313-30.
14. Klaveren R, Oudkerk M, Prokop M, Scholten ET, Nackaerts K, Vernhout R, van Iersel CA, van den Bergh K, van ‘t Westeinde S, van der Aalst C, Thunissen E, Xu DM, Wang Y, Zhao Y, Gietema H, de Hoop B, Groen H, de Bock T, van Ooijen P, Weenink C, Verschakelen J, Lammers JW, Timens W, Willebrand D, Vink A, Mali WP, de Koning HJ. A Novel Management Strategy for CT Detected Lung Nodules Based on Volumetry. N Engl J Med 2009 January 11.
15. National Lung Cancer Screening Trial, website: http://www.nci.nih.gov/nlst. 2008.
16. Awai K, Murao K, Ozawa A, Komi M, Hayakawa H, Hori S, Nishimura Y. Pulmonary Nodules at Chest CT: Effect of Computer-aided Diagnosis on Radiologists’ Detection Performance. Radiology 2004 February 1;230(2):347-52.
17. Sone S, Li F, Yang ZG, Takashima S, Maruyama Y, Hasegawa M, Wang JC, Kawakami S, Honda T. Characteristics of small lung cancers invisible on conventional chest radiography and detected by population based screening using spiral CT. Br J Radiol 2000 February 1;73(866):137-45.
18. Quekel LGBA, Kessels AGH, Goei R, van Engelshoven JMA. Detection of lung cancer on the chest radiograph: a study on observer performance. European Journal of Radiology 2001 August;39(2):111-6.
19. Potchen EJ, Cooper TG, Sierra AE, Aben GR, Potchen MJ, Potter MG, Siebert JE. Measuring Performance in Chest Radiography. Radiology 2000 November 1;217(2):456-9.
20. Gavelli G, Giampalma E. Sensitivity and specificity of chest X-ray screening for lung cancer: review article. Cancer 2000 December 1;89(11 Suppl):2453-6.
21. Li F, Arimura H, Suzuki K, Shiraishi J, Li Q, Abe H, Engelmann R, Sone S, MacMahon H, Doi K. Computer-aided Detection of Peripheral Lung Cancers Missed at CT: ROC Analyses without and with Localization. Radiology 2005 November 1;237(2):684-90.
22. Toyoda Y, Nakayama T, Kusunoki Y, Iso H, Suzuki T. Sensitivity and specificity of lung cancer screening using chest low-dose computed tomography. Br J Cancer 2008 May 20;98(10):1602-7.
23. Monnier-Cholley L, Carrat F, Cholley BP, Tubiana JM, Arrive L. Detection of Lung Cancer on Radiographs: Receiver Operating Characteristic Analyses of Radiologists’, Pulmonologists’, and Anesthesiologists’ Performance. Radiology 2004 December 1;233(3):799-805.
24. Samei E, Flynn MJ, Eyler WR. Detection of Subtle Lung Nodules: Relative Influence of Quantum and Anatomic Noise on Chest Radiographs. Radiology 1999 December 1;213(3):727-34.
25. Egglin TK, Feinstein AR. Context bias. A problem in diagnostic radiology. JAMA 1996 December 4;276(21):1752-5.


Chapter 3

Bartjan de Hoop, Diederik W De Boo, Hester Gietema, Frans van Hoorn, Banafsche Mearadji, Laura Schijf, Bram van Ginneken, Mathias Prokop, Cornelia Prokop-Schaefer

Computer-aided detection of lung cancer on CXR: how CAD and human observers interact

Abstract

Purpose

To assess how computer-aided detection (CAD) affects reader performance for the task of detecting early lung cancer on chest radiographs (CXR).

Method and Materials

We included 51 individuals with 55 CT-detected, histology-proven lung cancers and 65 patients without nodules on CT. All subjects participated in a lung cancer screening trial. CXRs were obtained within 2 months after the screening CT. Four radiology residents and two experienced radiologists were asked to identify and localize potential cancers on CXR, first without and subsequently with the use of Riverain OnGuard 5.0 CAD software. Figure of merit (FOM) was calculated using Free-Response Operating Characteristic (FROC) analysis.

Results

Tumor diameter ranged from 6.8mm to 50.7mm (median 11.8mm). Fifty-one percent (28/55) of lesions were subtle and detected by two or fewer readers. Stand-alone CAD sensitivity was 55% with an average of 2.4 false positive (FP) annotations per CXR. Average sensitivity was 56% for radiologists at 0.24 FP per CXR and 44% for residents at 0.46 FP per CXR. FOM did not change significantly for any of the observers after using CAD. CAD marked between 5 and 16 cancers that were initially missed by readers. These correctly CAD-detected lesions were rejected by radiologists in 92% of cases and by residents in 77% of cases.

Conclusion

The sensitivity of CAD for identifying lung cancers detected by CT screening was similar to that of experienced radiologists. However, CAD did not improve cancer detection because, especially for subtle lesions, observers were unable to sufficiently differentiate true from false positive annotations.

Chapter 3 | Computer-aided detection of lung cancer on CXR: how CAD and human observers interact


Introduction

Chest X-ray (CXR) is still the most commonly used technique in clinical routine to rule out chest disease, to study the effect of treatment and to follow up patients. Missing a lung cancer on CXR is one of the most frequent causes for malpractice lawsuits in radiology 1. However, the task of detecting focal lung lesions is challenging: sensitivity for detecting bronchopulmonary malignancies with CXR ranges from only 36% to 84%, depending on study population and tumor size 2-6. Several authors have reported that many missed lesions can be detected in retrospect 7-10. Pulmonary lesions can be missed for two reasons: either they are overlooked or they are misinterpreted as normal structures. In order to increase CXR sensitivity for pulmonary nodules, computer-aided detection (CAD) systems are currently being developed. The goal of CAD is to identify lesions that might be overlooked and missed by the reader. The current standard paradigm for the use of CAD systems is to employ CAD as a second reader: after the radiologist has evaluated the image, CAD offers a number of candidate lesions, which the radiologist subsequently has to accept as true positives or reject as false positive annotations. The final detection rate of the reader will be influenced by the interaction between the reader's own perception, the performance of the CAD and the reader's capability to differentiate true from false positive candidate lesions. CAD systems are constantly being improved with the aim of increasing sensitivity while simultaneously decreasing the number of false positive lesions. The purpose of this study was to assess how CAD affects reader performance for the task of detecting early lung cancer on CXR.

Materials and Methods

Study Population

All CXRs used in this study were taken from participants from two centers (Utrecht, Groningen, The Netherlands) of the NELSON lung cancer CT-screening trial 11. This trial was approved by the Ministry of Health and by the ethics committee of each participating hospital. Participants are aged between 50 and 75 years and are current or former heavy smokers, reflecting a population at high risk of developing lung cancer. In this population of 4938 participants at the two centers, CXR may be ordered for preoperative routine workup or for follow-up of screening-detected lesions but also for clinical causes unrelated to screening. We included all CXRs taken between April 2004 and January 2008 in this group of subjects under the following conditions: in patients in whom a pulmonary malignancy was detected by screening CT and histologically confirmed (cases), the CXR had to be performed within 6 weeks after the screening CT; in the other subjects (controls), the CXR had to be performed within 2 months of the screening CT, and the screening CT had to show no nodules larger than 5mm in diameter. CXRs for which the radiology report mentioned pulmonary abnormalities other than COPD were excluded.

Table 1 Demographics of study participants.

                    Cases (n=51)  Controls (n=65)  p
Mean age, yrs (SD)  64.2 (6.3)    62.5 (5.3)       0.13
Male/female         46/5          54/11            0.30

Acquisition

All CXRs were obtained using a cesium iodide amorphous silicon flat-panel-detector unit (DigitalDiagnost, Philips, Best, The Netherlands). Images were processed using non-linear multi-frequency band processing 12; parameters were used as recommended by the manufacturer. For all patients a posterior-anterior and a lateral projection were available. The screening CT examinations were performed with 16x0.75mm collimation at 30mAs and 120 to 140 kV, depending on weight. Sections of 1mm thickness were reconstructed every 0.7mm.

Standard of Reference

In the cancer-positive cases the exact location of each nodule on CXR was determined by an observer who did not participate as a reader (BdH). An independent chest radiologist (MP) judged whether lesions were retrospectively visible on CXR. Both had access to CXR as well as screening CT examinations. All screening CT examinations were evaluated for nodules according to the criteria set by the lung cancer screening program 13. Volumetric software (LungCare, Siemens) was used to assess nodule volume and mean diameter.

CAD System

We used a commercially available CAD system (Riverain, OnGuard 5.0). The software highlights regions suspicious for containing a focal lung lesion by placing a circle of 5cm in diameter around the suspicious area (Figure 1). Images are automatically processed in the background, so that results are immediately available on demand when the CXR is being read by a radiologist. The program analyzes the posterior-anterior or anterior-posterior projection. According to the manufacturer, the algorithm was optimized to detect nodules of 9 to 30mm, although in practice it also marks larger and smaller nodules.

Stand-alone CAD Performance To asses stand-alone performance of the CAD system annotations were labeled TP if the center of the circle indicating a suspicious lesion intersected the location of a true lesion on CXR. Observer Study Images were evaluated on 2K LCD monitors (BARCO, MFGD 3220D). CXRs were


Figure 1 CXR with CAD annotations of a patient with a malignancy in the left upper lobe. The arrow indicates a true positive CAD annotation. The remaining CAD annotations are false positive.

randomized and shown to 6 independent observers. The observers varied in level of experience: one general radiologist with 6 years of experience (observer A), one chest radiologist with more than 20 years of experience (observer B), and 4 radiology residents with experience varying from 1 to 4 years (observers C-F). Observers knew that the study group was chosen from a lung cancer screening trial; they were also told that some patients might have more than one malignant lesion. Each CXR was first evaluated without and subsequently with CAD results, and observer readings were recorded separately. Observers had to mark all potentially malignant focal abnormalities on the CXR and score them using a four-point scale of confidence (1, potential lesion, very low degree of suspicion; 2, dubious lesion; 3, probable lesion; and 4, definite lesion). Observers were allowed to mark multiple suspicious lesions on each CXR. They were instructed, however, to ignore nodules smaller than 5 mm in diameter. A marking was considered true positive (TP) if its location matched a nodule on CXR. Locations that did not match a lesion were classified as false positive (FP). Data Analysis Free-Response Operating Characteristic (FROC) analysis of the observer study was


Results Sample Characteristics A total of 51 participants with 55 histology-proven pulmonary malignancies met the criteria for the cancer-positive cases. Sixty-five subjects met the criteria for control cases. Indications for acquisition of the CXR in the control group were exclusion of acute cardiovascular pathology (n=18), COPD (n=18), screening for lung abnormalities because of rheumatoid arthritis (n=10), pre-operative screening (n=10), unexplained fever (n=4), chronic cough (n=3), malaise (n=1) and trauma (n=1). Cases did not significantly differ from controls with respect to age and gender (Table 1). Tumor diameter ranged from 6.8 mm to 50.7 mm (median 11.8 mm) with 2 lesions larger than 30mm. Conspicuity of malignancies was very variable: 10/55 (18%) of the malignancies were detected by all observers without the use of CAD. Others were very difficult to detect: 14/55 (25%) malignancies were not detected by any of the observers without or with the use of CAD and 7 of those were not visible even with the knowledge of the CT (Figure 2).


performed as described by Swensson 14 on a per-marker basis. Jackknife FROC (JAFROC), especially developed to analyze observer free-response tasks 15-17, was used to analyze the FROC data. JAFROC software version 2.3a 15, 18, 19 was used to compute a figure of merit (FOM). The FOM is defined as the probability that lesions (including unmarked lesions) are rated higher than non-lesion marks on control CXRs 16, or, in other words, that lesions are given a higher confidence rating for presence of malignancy than normal findings. Normal images with no marks and unmarked lesions are assigned zero ratings. Sensitivity was calculated as the number of true positive markings divided by the total number of malignancies. All observer markers, even those that were scored with low confidence, were included to calculate sensitivity and FP rate. Since it is controversial whether application of CAD as second reader also allows for discharge of candidates seen without CAD 20, we also evaluated the situation in which the observers could only increase their suspicion with CAD while preserving all lesion locations seen without CAD. In an effort to understand the effect of lesion conspicuity on our results, we performed a separate JAFROC analysis on conspicuous nodules, defined as lesions that were detected by 3 or more readers. To test for demographic differences between the cases and controls, we compared both groups with respect to gender using a chi-square test and age with a Student’s t-test. P-values < 0.05 were considered significant.
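The figure of merit and the summary measures described above can be illustrated with a minimal sketch. It is not the JAFROC software itself: the function names are hypothetical, counting ties as one half is an assumption, and the jackknife significance testing is omitted. It only shows the empirical probability that a lesion is rated higher than the highest-rated non-lesion mark on a control CXR, with unmarked lesions and unmarked control images assigned a rating of zero.

```python
from typing import Sequence


def jafroc_style_fom(lesion_ratings: Sequence[float],
                     control_image_marks: Sequence[Sequence[float]]) -> float:
    """Empirical probability that a lesion outranks the highest-rated
    non-lesion mark on a control image (ties count as 0.5, an assumption).

    lesion_ratings: one confidence rating per lesion, 0 if unmarked.
    control_image_marks: for each control image, the ratings of its
    non-lesion marks (an empty sequence if the image has no marks).
    """
    if not lesion_ratings or not control_image_marks:
        raise ValueError("need at least one lesion and one control image")
    # Control images without marks are assigned a zero rating.
    highest_fp = [max(marks, default=0) for marks in control_image_marks]
    score = 0.0
    for lesion in lesion_ratings:
        for fp in highest_fp:
            if lesion > fp:
                score += 1.0
            elif lesion == fp:
                score += 0.5
    return score / (len(lesion_ratings) * len(highest_fp))


def sensitivity(n_true_positive: int, n_malignancies: int) -> float:
    """True positive markings divided by the total number of malignancies."""
    return n_true_positive / n_malignancies


def fp_per_image(n_false_positive: int, n_images: int) -> float:
    """Average number of false positive markings per CXR."""
    return n_false_positive / n_images


# Hypothetical ratings on the four-point confidence scale (0 = unmarked).
example_fom = jafroc_style_fom([4, 0, 2], [[], [3], [1, 2]])
```

With these made-up ratings the three lesions are compared against three control images, giving nine lesion-image pairs in total.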

CAD The CAD stand-alone sensitivity was 30/55 (55%) with on average 2.4 FPs (range 0-5) per CXR. CAD detected 3 malignancies that were initially not detected by any of the observers. The diameter of the CAD-detected malignancies ranged from 7.0 mm to 50.7 mm.


Figure 2 Two cancer-positive cases with a tumor, shown by the arrow. One was detected by all observers (a,b) and the second was detected by none of the observers (c,d). Images a and c are CXRs as they were used in this study. Images b and d are coronal slices from the screening CT.

Observer Performance without CAD Without CAD, the FOM was 0.68 for radiologists and 0.54 for residents (Table 2, Figure 3). The radiologists had an average sensitivity of 56% with 0.24 FPs per CXR. The residents had an average sensitivity of 44% with 0.46 FPs per CXR. Twenty-seven lesions were detected by at least 3 observers. In this sub-selection of more conspicuous lesions, the average FOM was 0.93 for radiologists and 0.76 for residents. Observer Performance with CAD, Lowering Confidence Score Allowed When readers were allowed to change their ratings depending on CAD suggestions, average FOM for the radiologists did not change (0.68, P=.92). Average FOM for the residents increased from 0.54 to 0.57 but the improvement was not significant

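| Observer | FOM without CAD | FOM with CAD | Sensitivity (%) without CAD | Sensitivity (%) with CAD | FP markings per CXR without CAD | FP markings per CXR with CAD |
|---|---|---|---|---|---|---|
| Radiologist A | 0.69 | 0.71 | 56 | 51 | 0.25 | 0.11 |
| Radiologist B | 0.67 | 0.66 | 56 | 58 | 0.22 | 0.28 |
| Radiologist average | 0.68 | 0.68 | 56 | 55 | 0.24 | 0.19 |
| Resident C | 0.44 | 0.50 | 35 | 36 | 0.59 | 0.41 |
| Resident D | 0.54 | 0.60 | 62 | 58 | 0.76 | 0.57 |
| Resident E | 0.59 | 0.62 | 33 | 36 | 0.15 | 0.14 |
| Resident F | 0.58 | 0.58 | 45 | 49 | 0.33 | 0.35 |
| Resident average | 0.54 | 0.57 | 44 | 45 | 0.46 | 0.36 |

Note - Lowering of the confidence score was allowed.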

(P=.09) (Table 2). With CAD, the average sensitivity of radiologists and residents remained virtually unchanged, 55% and 45%, respectively. Specificity improved from 0.24 to 0.19 FPs per CXR for radiologists and from 0.46 to 0.37 FPs for residents. In the sub-selection of conspicuous lesions, average FOM remained 0.93 for radiologists, but significantly improved for residents (from 0.76 to 0.82, P400

57

13

VDT 20%). (4) ‘failure’: No segmentation or the result has no similarity with the lesion. An example of each classification can be found in Figure 1. In order to exclude the influence of failed segmentations on the reproducibility of a software package, nodules were grouped into ‘adequately’ (group 1 and 2) and ‘inadequately’ (group 3 and 4) segmented nodules. Inadequately segmented nodules were excluded from the calculations of inter-examination variability as these segmentations have no value and greatly influence volume measurement reproducibility, making meaningful comparisons impossible.

Reproducibility of Visual Assessment of Segmentation The intra- and interobserver reproducibility for the visual assessment of segmentation accuracy was tested. On one system, the observer performed the visual assessment twice with 1 week in between readings. A second observer, a CT technician with special training in evaluating and reporting cancer screening CTs with the use of volumetric software (> 4000 examinations in 3 years), repeated the visual assessment of the segmentation accuracy as well, on the same and on a second system. We identified the percentage of nodules in which the visual assessment of segmentation changed between adequate and inadequate.

Statistical Evaluation All statistics were calculated using Microsoft Excel XP (Microsoft, Redmond, Wash.) and the SPSS statistical software package version 15 (SPSS, Chicago, Ill.). To assess the effects of inspiration level, we calculated the Pearson correlation coefficient between the relative difference and the ratio of lung volumes (first / second examination). In order to compare the number of adequately segmented nodules per system, a binomial test was applied, using the percentage of adequately segmented nodules of the best system as test proportion. Differences in volume (∆V) were calculated by subtracting the volume measured on the first scan (V1) from the volume measured on the second scan (V2).
This difference was then normalized with respect to mean nodule volume to assess relative differences:

$$\Delta V_{rel} = 100\% \cdot \frac{V_2 - V_1}{(V_1 + V_2)/2}$$

The histogram of relative differences showed a normal distribution for all packages (tested with the Kolmogorov-Smirnov test). Because the same nodule was measured twice on successive chest CTs, a mean relative difference close to 0 can be expected. In fact, none of the packages had a mean relative difference higher than 1.1%. We therefore decided to use only the upper limit of agreement of the 95% CI of the relative differences as assessed according to the method proposed by Bland and Altman 7 as the measure of interexamination variability. An increase in nodule volume above this upper limit of agreement can, with 95% confidence, be attributed to real growth. To compare the various software packages with respect to interexamination variability, we used an F-test on a subgroup of nodules that were adequately segmented on both scans by all packages. We also tested whether there was a significant difference in interexamination variability between excellently and satisfactorily segmented nodules. For each software package separately, an F-test was used to compare interexamination variability for all those nodules that were classified as excellently or satisfactorily segmented with this specific software. An F-test was also used to test for differences in interexamination variability before and after manual correction by the user. Influence of nodule diameter on interexamination variability was tested using one-way ANOVA. In order to detect systematic differences in measured volumes between packages, we performed a mixed model variance analysis of nodule volumes on a subset of nodules that were adequately segmented by all programs.

Results Nodule Characteristics A total of 214 solid pulmonary nodules fulfilled the inclusion criteria. Mean diameter, measured with the electronic ruler, was 10.9 mm (range 3.3 mm – 30.0 mm). The number of eligible nodules per patient ranged from 0 (no metastases visible after therapy) in nine patients to 50 nodules in one patient, with a median of 19 in the patients with nodules. The database included 91 round, 39 lobular and 84 irregular nodules. Seventy-six had no contact with pulmonary structures, 60 were attached to the pleura and 78 nodules were attached to a vessel or to both vessel and pleura.

Chapter 5 | A comparison of six software packages for evaluation of solid lung nodules using semi-automated volumetry
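As a worked illustration of the statistical evaluation described above, the following minimal sketch computes the relative volume difference between two scans and a Bland-Altman style upper limit of agreement (mean plus 1.96 standard deviations of the relative differences). The function names and the example volumes are hypothetical, and the sketch is not the SPSS or Excel procedure used in the study.

```python
from statistics import mean, stdev
from typing import Sequence


def relative_difference(v1: float, v2: float) -> float:
    """Volume difference (second minus first scan) in percent of the
    mean of both measurements."""
    return 100.0 * (v2 - v1) / ((v1 + v2) / 2.0)


def upper_limit_of_agreement(rel_diffs: Sequence[float]) -> float:
    """Upper 95% limit of agreement: mean + 1.96 * sample SD."""
    return mean(rel_diffs) + 1.96 * stdev(rel_diffs)


def exceeds_variability(v1: float, v2: float, upper_limit: float) -> bool:
    """True if the increase from v1 to v2 is larger than the
    interexamination variability expressed by upper_limit (in percent)."""
    return relative_difference(v1, v2) > upper_limit


# Hypothetical repeated measurements (mm3) of nodules that did not grow.
scan1 = [100.0, 250.0, 80.0, 400.0]
scan2 = [104.0, 240.0, 86.0, 396.0]
diffs = [relative_difference(a, b) for a, b in zip(scan1, scan2)]
limit = upper_limit_of_agreement(diffs)
```

An increase that exceeds `limit` would, under the normality assumption stated in the text, be unlikely to result from measurement variability alone.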

Influence of Inspirational level on Interexamination Variability Ratio of lung volume between scan 1 and 2 ranged from 81% to 126%, with a mean of 102%, weighted to the number of nodules per patient. Inspirational level did not significantly correlate to the relative difference for any of the software

Figure 2 Correlation between ratios of lung volumes (lung volume on second scan divided by lung volume on first scan) and differences in measured nodule size. The graphs show the software package with the strongest correlation (a; software A, r= -0.18, p= 0.10) and the weakest correlation (b; software F, r= 0.02, p= 0.87). Axes in both panels: ratio of lung volumes and relative difference in nodule volume.

packages. Mean correlation coefficient for all software packages was -0.11 (P = .3). Figure 2 shows the software packages with the weakest and strongest correlation to inspirational level. Reproducibility of Visual Assessment of Segmentation Accuracy Both intra- and interobserver reproducibility of the visual assessment of segmentation accuracy was high. Only for 2 of 214 nodules (1%), the visual assessment of segmentation changed between adequate and inadequate when the same observer repeated the measurements. The second observer changed the

| | Software package A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| a. Common dataset | 17.0% | 13.1% | 20.8% | 13.4% | 20.5% | 19.6% |
| b. Individual datasets: excellent | 15.9% | 15.4% | 21.0% | 13.5% | 20.4% | 19.5% |
| b. Individual datasets: satisfactory | 21.9% | 18.0% | 22.9% | 27.6% | 19.5% | 19.7% |
| c. Influence of size: < 8mm | 21.2% | 18.5% | 24.9% | 19.5% | 25.6% | 24.5% |
| c. Influence of size: ≥ 8mm | 17.1% | 14.5% | 16.8% | 16.1% | 12.9% | 13.8% |
| d. Median nodule volume | 80 mm3 | 78 mm3 | 59 mm3 | 85 mm3 | 57 mm3 | 95 mm3 |

Note – (a) Variability in the common dataset of 89 nodules that were adequately segmented by all software packages. (b) Variability for the individual datasets of all nodules that were excellently and satisfactorily segmented by each particular software package. A nodule that was rated as excellently segmented in one CT examination and satisfactorily in the other was classified as ‘satisfactory’. Significant differences are printed bold. (c) Variability for different sizes of nodules, also for the individual datasets. Note that results for different software packages are therefore not comparable. Significant differences are printed in bold. (d) Median nodule volume in the common dataset of 89 nodules that were sufficiently segmented by all software packages. Despite the fact that the sets contain identical nodules, there are substantial differences in nodule volumes measured by the various programs.

accuracy score for 3/214 (1.4%) nodules on the first and 6/214 (2.8%) nodules on the second system. Segmentation Accuracy The number of nodules that were segmented adequately by each individual package varied significantly, between 71% and 86% before manual correction and between 71% and 98% after correction, depending on the software package used (P < .001) (Figure 3).


Table 2 Comparison of interexamination variability.

Interexamination Variability Eighty-nine (42%) of all nodules were adequately segmented by all packages. In this dataset, interexamination variability of software packages B and D was significantly lower than that of packages C, E and F. Extent of variability of package C also differed significantly from package A (Table 2). Significant differences were seen in interexamination variability between excellently and satisfactorily segmented nodules for three of the six software packages (Table 2). The overall measurement variability of the software packages before and after manual correction is given in Figure 4. For each individual software package, Figure 4 shows the variability for all nodules adequately segmented by


Figure 3 Comparison of visual rating of segmentation accuracy for the various software packages. The graph displays the percentage of nodules for which segmentation was rated “excellent” and/or “satisfactory” on scan 1 and scan 2. Nodules for which segmentation was rated “poor” or “failure” on at least one of the scans are summarized as “inadequately” segmented nodules. (a) displays the raw results without manual adjustment of nodule contours, while (b) displays results after manual correction of the segmentation. Note that systems A and E did not allow for manual correction.


Figure 4 Comparison of overall variability in repeated CT examinations for the various software packages, without (a) and after (b) manual adjustments. For each package only those nodules were considered, for which segmentation was visually rated as adequate (excellent or satisfactory on both scans). The percentage of nodules included in these calculations therefore varied between Figure 4a and 4b, and per software package (see Figure 3). Overall interexamination variability is expressed as the upper limit of the 95% CI for the relative differences between scan 1 and 2.

this particular software package. Note that the numbers differ from those in Table 2 because for each package all adequately segmented nodules were included and not only the subset of 89 nodules that were adequately segmented by all software packages. Compared to the automated results, the upper limit of agreement did not change significantly after manual correction for any of the software packages. Nodule diameter significantly influenced the extent of interexamination variability in five out of six software packages. This influence is shown in Table 2. Due


Figure 5 Systematic volume differences between software packages. Although both packages draw an accurate outline around the nodule, (a) shows a volume of 82 mm3 (diameter ~5.4 mm), while in (b) the calculated volume was only 32 mm3 (diameter ~3.9 mm)
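The diameters quoted in the Figure 5 caption are consistent with the diameter of a sphere of equal volume, d = (6V/π)^(1/3). That the caption values were derived this way is an assumption; the short sketch below, with a hypothetical function name, simply reproduces the arithmetic.

```python
import math


def equivalent_diameter_mm(volume_mm3: float) -> float:
    """Diameter (mm) of a sphere with the given volume (mm3)."""
    if volume_mm3 < 0:
        raise ValueError("volume must be non-negative")
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


# Volumes reported for the same nodule by the two packages in Figure 5.
diameter_a = round(equivalent_diameter_mm(82.0), 1)  # 5.4
diameter_b = round(equivalent_diameter_mm(32.0), 1)  # 3.9
```

Because the diameter scales with the cube root of the volume, the volume reported in (a) is more than 2.5 times that in (b), yet the equivalent diameters differ by only about 1.5 mm.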

to a low number of nodules (25% was observed in 4%. When the readers used different algorithms, 83% of measurements showed a difference of >25%. Conclusion Modern volumetric software failed to correctly segment a high number of screen detected nodules. While choosing a different algorithm can yield better segmentation of a lung nodule, reproducibility of volumetric measurements deteriorates substantially when different algorithms were used. It is crucial even in the same software package to choose identical parameters for follow-up.


Chapter 6 | Lung nodule volumetry: segmentation algorithms within the same software package cannot be used interchangeably


Background Since the introduction of Computed Tomography (CT) the technique has improved significantly, and many more nodules are detected with modern techniques. Thinner slices and faster rotation time allow a rapid and detailed evaluation of the lung 1. Furthermore, low-dose CT techniques have reduced the radiation exposure and made use of repeat imaging more acceptable from an ethical point of view 2,3. Assessment of growth is a key issue in the diagnostic workup of lung nodules found on CT 4. Rapid growth of lung nodules is associated with malignant lung disease and repeat imaging is essential 5. Previously, assessment of lung nodules was performed manually, by measuring the nodule in three dimensions (x, y and z) 6. Recently, pulmonary nodule evaluation software has been launched that allows for semi-automated volumetric measurements and is increasingly being used for the diagnostic workup of lung nodules 7,8. In lung cancer screening trials with low-dose CT, nodule volumetry is increasingly used for follow-up of indeterminate nodules in order to detect growth and thus identify suspected malignant lesions 9. Nodule volumetry software is available from various vendors but has been shown to vary with respect to absolute measured volume as well as reproducibility of volumetric measurements 10. Correct segmentation of a pulmonary nodule is the prerequisite for accurate volumetry. In this study we examined one particular volumetry software package 11 that approaches the issue of nodule segmentation by providing three distinct segmentation options: a generic segmentation (All-Size) and two segmentation options that are specifically aimed at small nodules (Small-Size) and non-solid nodules (Subsolid). We examined inter-observer variability in a lung cancer screening setting under the condition that observers would start with one specific algorithm and then step up to the next algorithm should nodule segmentation with the first one fail.
We compared inter-observer variability if both observers chose the same algorithm and if they chose different algorithms. For each approach the percentage of nodules in which differences in measured volumes exceeded 25% was recorded.

Materials and Methods Patients The study population was selected from the Danish Lung Cancer Screening trial (DLCST). The DLCST is a 5-year trial investigating the effect of annual screening with low-dose CT on lung cancer mortality. Participants were current or former smokers at an age between 50 and 70 years at inclusion with a smoking history of more than 20 pack years 12. The CT images were screened by two radiologists (KSB and HH) and all noncalcified nodules with a diameter over 5 mm (manual measurement) were included in this study. All screen detected nodules were tabulated along with information regarding the lung segment in which the nodule was found. In the event of disagreement between the radiologists consensus was obtained and registered.

Depending on the radiological degree of suspicion the nodules were either surgically resected or underwent repeat imaging after 3 months to evaluate growth. Included in this study were nodules > 5 mm detected at baseline screening starting November 2004, and their follow-up images up to April 2008. All imaging was performed on multidetector Computed Tomography (MDCT) (16-row, MX 8000 IDT, Philips Medical Systems, Cleveland, OH, USA). Imaging was performed supine at full inspiration in the caudo-cranial direction including the entire lungs. A low-dose technique with 140 kV and 40 mAs was used. Imaging was performed with spiral data acquisition with the following acquisition parameters: section collimation 16 × 0.75 mm, pitch 1.5 and rotation time 0.5 seconds. Images were reconstructed with 3-mm slice thickness at 1.5-mm increments using a soft algorithm (Kernel A) 12. The reproducibility readings of the present study were done by two trained observers (1st Reader: HA and 2nd Reader: BdH) with more than 2 years’ experience in evaluating lung screening imaging with semiautomated nodule volumetry software 11 (Syngo LungCARE CT, Siemens Medical Solutions, Erlangen, Germany). The observers were participating in different screening trials, the Danish DLCST (HA) and the Dutch-Belgian NELSON (BdH) trials. To ensure that nodules were correctly


Figure 1 Screenshot of Siemens LungCare software. The right lower window displays a visual 3D presentation of the nodule.


Table 1 Algorithm applied by two readers, nodule volume and differences between readings.

| Algorithm of 1st reader | Algorithm of 2nd reader | No. of nodule readings (%) | Mean nodule volume in mm3 (range) | Difference between readers (1st - 2nd), % of minimal reading: Mean (SD) | Difference between readers (1st - 2nd), % of minimal reading: Range | No difference in volume | Difference in volume > 25% |
|---|---|---|---|---|---|---|---|
| Same algorithms | | | | | | | |
| Small-Size | Small-Size | 252 (65) | 160 (7 – 2088) | -1 (11) | -65 – 48 | 58% | 4% |
| All-Size | All-Size | 36 (9) | 1056 (146 – 3486) | 1 (6) | -14 – 31 | 22% | 3% |
| Subsolid | Subsolid | 23 (6) | 392 (51 – 1296) | -1 (3) | -5 – 8 | 4% | 0% |
| Total | | 311 (80) | 281 (7 – 3486) | 0 (10) | -65 – 48 | 50% | 4% |
| Different algorithms | | | | | | | |
| Small-Size | All-Size | 35 (9) | 255 (6 – 995) | -100 (154) | -890 – -6 | 0% | 77% |
| All-Size | Small-Size | 34 (8) | 304 (9 – 1780) | 77 (74) | 14 – 406 | 0% | 82% |
| Small/All Size | Subsolid | 11 (3) | 249 (5 – 2107) | -1428 (1368) | -4379 – -196 | 0% | 100% |
| Subsolid | Small/All Size | 0 (0) | - | - | - | - | - |
| Total | | 80 (20) | 275 (5 – 2107) | -207 (705) | -4379 – 406 | 0% | 83% |


Results At baseline screening, 188 nodules were found in 161 participants. Including repeat imaging for follow-up of these nodules, 545 nodules on 488 CT examinations could be included in this study. In 154 of the 545 nodules (28%), one (10%) or both (18%) readers were unable to correctly segment the nodule using all available segmentation algorithms. In the remaining 391 cases in which both readers found at least one algorithm that correctly segmented the nodule, they chose the same algorithm in 311 cases (80%) (Table 1). When the two readers chose the same algorithm, they found exactly the same volume in 50% of cases. In 4% of cases the difference in volume was larger than 25%. The percentage variation in volume measurements (% of minimal reading) between readers was significantly smaller for the Subsolid algorithm compared with the All-Size algorithm (F test, P < .001), which again had less variation than the Small-Size algorithm (F test, P < .001) (Figure 2). However, when measuring the variation in absolute terms (i.e. mm3), the Small-Size algorithm showed the least variability and the All-Size algorithm the highest (P < .01, data not shown). When the readers chose different algorithms the volume determined by the Subsolid algorithm was always larger than that obtained with the All-Size algorithm (P < .001), which again was always larger than that of the Small-Size algorithm (Figure 3) (P < .001). All-Size measurements were on average 89% (95% confidence


matched, a CT slice on which the nodule was clearly marked was available for both readers. Otherwise each reader was blinded to the readings of the other reader. The analysis procedure for solid nodules consisted of a step-up approach in which first the Small-Size algorithm was tried, and in the event of failure of proper segmentation the All-Size algorithm was tried. This evaluation was performed independently by the two readers. In particular, the following steps were taken: after positioning a seed point in the nodule, the software produced a visual 3D presentation of the detected nodule highlighting the voxels of the nodule for which the volume was calculated (Figure 1). If the segmentation was visually judged to include the whole nodule and no surrounding structures such as vessels and pleura, the segmentation was considered successful. If this visual validation of the nodule showed incorrect segmentation, the reader tried to segment the nodule three times with the same algorithm before concluding that the nodule could not be correctly segmented by this algorithm. In the case of part-solid nodules, only the solid part was analysed either with the Small-Size or the All-Size algorithm. The Subsolid algorithm was applied in the case of pure non-solid ground glass opacity. Bland-Altman plots were used to compare volumetric results for those nodules in which the readers had used the same algorithm and for those nodules in which the readers had used different algorithms. Results were analysed using R statistical software version 2.7.1, and a significance level of 0.05 was applied. The differences between readers were normally distributed. An F-test was used to compare the variances achieved with the various algorithms.
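The comparison between readers can be sketched as follows. The exact definition of "% of minimal reading" is not spelled out in the text; this sketch assumes it is the difference between the first and second reader divided by the smaller of the two volumes. The function names and the example volume pairs are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def percent_of_minimal_reading(v_first: float, v_second: float) -> float:
    """Difference between readers (1st - 2nd) in percent of the smaller
    of the two measured volumes (assumed definition)."""
    smaller = min(v_first, v_second)
    if smaller <= 0:
        raise ValueError("volumes must be positive")
    return 100.0 * (v_first - v_second) / smaller


def summarize_agreement(pairs: Sequence[Tuple[float, float]],
                        threshold: float = 25.0) -> Dict[str, float]:
    """Percentage of nodules with identical readings and with an absolute
    difference above the threshold (25% in the study protocol)."""
    diffs = [percent_of_minimal_reading(a, b) for a, b in pairs]
    n = len(diffs)
    return {
        "identical_pct": 100.0 * sum(d == 0 for d in diffs) / n,
        "above_threshold_pct": 100.0 * sum(abs(d) > threshold for d in diffs) / n,
    }


# Hypothetical (1st reader, 2nd reader) volumes in mm3.
summary = summarize_agreement([(100.0, 100.0), (100.0, 130.0),
                               (200.0, 210.0), (50.0, 50.0)])
```

Dividing by the smaller reading makes the measure symmetric in magnitude: swapping the readers only changes the sign of the result.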


Figure 2 Bland-Altman plots when the same algorithm was chosen by both readers. Small-Size = 252 nodules, All-Size = 36 nodules and SubSolid = 23 nodules. Dotted lines indicate 25% variation. Because of the relationship between nodule size and variability (and to simplify), lines corresponding to the 95% confidence interval were omitted.

Figure 3 Bland-Altman plots when the readers chose different algorithms. Small/All Size = 35 nodules, All/Small Size = 34 nodules and Small/All Size – SubSolid = 11 nodules. Dotted lines indicate 25% variation. Because of the relationship between nodule size and variability (and to simplify), lines corresponding to the 95% confidence interval were omitted.


Discussion In this study we examined one particular volumetry software package (LungCARE CT version VE25A, Siemens Medical Solutions, Erlangen, Germany) that offers several algorithms for the analysis of nodules depending on the morphology of the nodule. Former versions of the software without this option have been tested 13,14,15. Although several options may broaden the utility of the software, full understanding of these new features and a high reproducibility of measurements are key issues before software can be used in clinical decision making. We found that volumetric measurements of screen-detected nodules were reasonably reproducible when readers used the same algorithm. However, a step-up approach with different segmentation algorithms is not advisable as a way to optimise nodule segmentation. Even if the vendor offers different algorithms within the same software package, the same algorithm should always be used and recorded in order to avoid massive measurement errors. In the NELSON lung cancer screening trial, where nodules were segmented by volumetry software, volume measurements were identical in a very high percentage (89%) 13. However, as in other recently published studies 14,15, this study excluded subsolid, semi-solid, pleura-based and vessel-connected nodules. Only nodules surrounded by lung tissue (intraparenchymal nodules) were included, and they are known to be more reproducible when evaluated by pulmonary nodule evaluation software 15. A study 15 by the NELSON group of 4225 nodules in 2239 participants found complete agreement in volume in 86% and a disagreement ≥ 25% in only 2% of nodules. However, this study included solid 15-500 mm3 nodules only, and if readers manually modified the volume in the prospective lung cancer screening study, the nodules were excluded as well.
In the present study no nodules were excluded; we also included semisolid and ground glass lesions which, in our opinion, will provide a more representative result, as in everyday clinical practice all sorts of nodules must be assessed. In 28% of the readings at least one of the readers could not determine the volume (n=154). The relatively high number of nodules that could not be measured emphasises the necessity of visual validation of nodule segmentation with skilled


interval (CI): 60% – 118%) larger than Small-Size measurement of the same nodule, and in 80% All-Size readings were more than 25% larger than Small-Size readings. Subsolid measurements were always more than 25% larger than readings using one of the algorithms for solid nodules, i.e. Small-Size or All-Size. On average volumetric results obtained with the Subsolid algorithm were 1428% (CI: 508% - 2347%) larger than those from algorithms for solid nodules (Table 1, Figure 4 A-B-C).


Figure 4 The same nodule analysed at the same time with the Subsolid (a), Small-Size (b) and All-Size (c) algorithms.

human intervention in the reading process and demonstrates why fully automated detection of lung nodules is still unrealistic with the currently tested state-of-the-art software. A semiautomatic approach with a manual selection of nodules and supervision of the nodule rendering procedure is necessary to ensure accuracy of the volumetric measurement. When comparing the difference between measurements in % of the minimal volume, the inter-observer variability was highest for Small-Size measurements and lowest with the Subsolid algorithm (P < .01) (Figure 2). However, when comparing volume differences in absolute values, Small-Size measurements showed the least variability and All-Size the highest (P < .01) (data not shown). This is because the size of the nodule has great influence on the variation coefficient, which is visualised in Figure 2 where variability increases with decreasing volume. This effect is also seen when different algorithms are applied (Figure 3); the difference between the readers tends to be smaller with increasing nodule size. Based on our data it could not be decided which algorithm was most reproducible because the algorithm was not randomly chosen. Furthermore, the variability of the volume measurement depended on several factors such as nodule size, morphological characteristics of the nodule, and whether the variability was calculated in relative or in absolute terms. One explanation for the observed volume disagreement between two readers is related to the fact that semiautomatic volumetric measurements may vary according to the positioning of the seed point by the observer, which is a nonautomated part of the procedure. When a spherical 3D template gradually expands from the seed point, different starting positions within the nodules may lead to different volumetric results. Obviously, the chance of picking the same seed point is inversely related to the size of the nodule, and this may explain the high percentage


of identical readings when using the Small-Size algorithm (Table 1). In 20% of the correctly segmented nodules (n=80), the readers used different algorithms to analyse the same nodule, and this had a significant influence on the volume measured. If the same nodule was measured with two different algorithms, the All-Size measurements were always larger than the measurements in which the Small-Size algorithm was used, and usually (80%) the difference in volume exceeded 25%. Furthermore, measurements performed with the Subsolid algorithm (Figure 4A) were always larger than the Small-Size and All-Size measurements (Figure 4B and Figure 4C). This shows that the Subsolid algorithm detects a larger volume compared with the solid algorithms, as part-solid nodules usually have a solid core surrounded by a larger subsolid sphere. The difference between All-Size and Small-Size measurements was less obvious and tended to be smaller with increasing nodule size (Figure 3). Some previous studies have indicated that a minimal growth of 25% is required to avoid confusion with random measurement variability 14,15,16. In the NELSON and DLCST trials the limit of 25% growth is implemented in the study protocol. Nodules with growth above 25% are referred for additional diagnostic workup 9,12. Therefore, when using pulmonary nodule evaluation software for repeated nodule measurements, the variability should be below 25%, because variability above 25% in repeated measurements may result in false positive growth estimation. The NELSON group has reported on the variability of volume analysis using pulmonary nodule evaluation software, and in one of their first studies, analysing 430 nodules 13, they found that, for nodules with a discrepancy between two readings, 95% of the variability was between -22% and 29%. However, for most nodules (89%) there was no difference between readings. The variability was above 25%, i.e. false positive growth, in only 1.2% of nodules.
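The 25% criterion discussed above can be made concrete with a small sketch. It expresses the difference between two readings as a percentage of the smaller (minimal) volume, as in our comparison, and flags pairs that exceed a threshold. The function names and the example volumes are hypothetical and serve illustration only; they are not part of the LungCARE software or of the trial protocols.

```python
def relative_difference_pct(vol_a_mm3, vol_b_mm3):
    """Absolute difference between two volume readings,
    expressed as a percentage of the smaller (minimal) volume."""
    smaller = min(vol_a_mm3, vol_b_mm3)
    if smaller <= 0:
        raise ValueError("volumes must be positive")
    return 100.0 * abs(vol_a_mm3 - vol_b_mm3) / smaller


def exceeds_threshold(vol_a_mm3, vol_b_mm3, threshold_pct=25.0):
    """True if the readings differ by more than threshold_pct.
    The 25% default mirrors the growth limit quoted in the text."""
    return relative_difference_pct(vol_a_mm3, vol_b_mm3) > threshold_pct


# Hypothetical readings of the same nodule by two readers (mm^3).
print(relative_difference_pct(100.0, 120.0))  # 20% of the smaller volume
print(exceeds_threshold(100.0, 120.0))        # within the 25% limit
print(exceeds_threshold(100.0, 130.0))        # would mimic growth
```

Because the denominator is the smaller reading, the same absolute difference weighs more heavily in a small nodule, which is consistent with the larger relative variability observed for small nodules.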
In a later study of 218 nodules, also from the NELSON group 14, variability was found to depend on nodule size: variability decreased with increasing size, which is consistent with our findings (Figure 3). In this study 14, significant growth was defined as growth beyond the 95% CI of the variability, which was estimated to be -21% to 24% and was comparable to the first study 13. Furthermore, the NELSON group investigated the influence of nodule morphology and concluded that the cutoff point for significant growth should be 30% relative growth for irregular nodules and only 15% for spherical nodules. These findings were later confirmed in a large study from NELSON 15 consisting of 2367 nodules, which also included attached nodules. The odds ratio for irregular nodules having variability above 15% was 9.1 (CI: 6.1 - 15.1) compared with spherical nodules. In all the NELSON studies 13,14,15 the same software package (LungCARE) was applied, and a comparison of the performance of 6 different software packages showed that none of the systems had a variability of more than 22.3% for adequately segmented nodules 10. Overall, the studies above comply well with the 25% definition of significant growth used in most lung cancer screening trials 9,12,16. However, when subgroups of nodules are analysed by size and morphology, the cut-off of 25% may be challenged, as smaller and irregular nodules may require a higher cut-off. A recent study 17 further challenged the 25%


definition of significant growth by suggesting that even 30% observed growth may not prove real growth. In our study, 4% of measurements had variability over 25% when the same algorithm was applied, and use of different algorithms resulted in variability above 25% in 83% (Table 1). As mentioned previously, we also observed larger variability for smaller nodules, indicating that it is inappropriate to apply the same threshold for growth to all nodules. To avoid false positive growth, which is essential in lung cancer screening programs, the size and morphology of the nodule should be taken into account and a sensible and customised approach to the definition of significant growth should be applied. This study has limitations. Although the software is widely used, all results reported are valid only for the particular software release we used (LungCARE VE25A). Furthermore, we used a 3-mm slice thickness, while 1-mm slices are preferred because segmentation accuracy and reproducibility have been shown to be superior with thin slices 18. However, in the DLCST the radiologist used 3-mm slices when screening for nodules 12, and therefore this slice thickness was chosen in the present study. We consider it a strength of the study that all nodules were included, and not just nodules surrounded by lung tissue (intraparenchymal nodules). The fact that the double readings were performed in two institutions in different countries also strengthens the external validity of the study. The readers were completely blinded in the sense that they had no information regarding the choice of algorithm or the results of the other reader. New and improved pulmonary nodule evaluation software is a promising tool for the diagnostic workup of indeterminate lung nodules. In this context, the reproducibility of the volumetric measurements is a key issue before the results can be used in everyday clinical decision making.
New versions of software are not always comparable with former versions. Siemens syngo LungCARE CT version VE25A used in the DLCST allowed the choice of various algorithms for the analysis of lung nodules. Former versions of the software used in the NELSON study did not have this option. The use of different algorithms presents a challenge when pooling data from several trials with the aim of gaining more statistical power, as volumetric measurements may not be directly comparable. A complete and independent understanding and validation of the different software packages requires full access to technical details behind the algorithm, and this is usually incompatible with the policy of software companies. However, a close cooperation between clinicians and software companies is desirable to ensure continued development of high-quality software.

Conclusion We found volumetric measurements to be reproducible using Siemens syngo LungCARE CT version VE25A, when using the same nodule-analysing algorithm. Modern volumetric software failed to correctly segment a high number of manually detected screen nodules (28%). Provided the same software algorithm was used,


96% of the volumetric measurements showed a variability of less than 25%. However, segmentation algorithms within the same software package cannot be used interchangeably, and using the same analysing algorithm is essential for correct longitudinal assessment of lung nodules.


Reference List
1. Verschakelen JA, Bogaert J, De Wever W. Computed tomography in staging for lung cancer. Eur Respir J 2002 19:40-48.
2. Henschke CI, McCauley DI, Yankelevitz DF, et al. Early Lung Cancer Action Project: overall design and findings from baseline screening. Lancet 1999 354:99-105.
3. Swensen SJ, Jett JR, Hartman TE, et al. Lung cancer screening with CT: Mayo Clinic experience. Radiology 2003 226:756-761.
4. Xu DM, van der Zaag-Loonen HJ, Oudkerk M, et al. Smooth or attached solid indeterminate nodules detected at baseline CT screening in the NELSON study: cancer risk during 1 year of follow-up. Radiology 2009 250:264-272.
5. Hasegawa M, Sone S, Takashima S, et al. Growth rate of small lung cancers detected on mass CT screening. Br J Radiol 2000 73:1252-1259.
6. Pauls S, Kürschner C, Dharaiya E, et al. Comparison of manual and automated size measurements of lung metastases on MDCT images: potential influence on therapeutic decisions. Eur J Radiol 2008 66:19-26.
7. Kostis WJ, Reeves AP, Yankelevitz DF, et al. Three-dimensional segmentation and growth-rate estimation of small pulmonary nodules in helical CT images. IEEE Trans Med Imaging 2003 22:1259-1274.
8. Yankelevitz DF, Reeves AP, Kostis WJ, et al. Small pulmonary nodules: volumetrically determined growth rates based on CT evaluation. Radiology 2000 217:251-256.
9. Xu D, Gietema H, de Koning H, et al. Nodule management protocol of the NELSON randomised lung cancer screening trial. Lung Cancer 2006 54:177-184.
10. De Hoop B, Gietema HA, van Ginneken B, et al. A comparison of six software packages for evaluation of solid lung nodules using semi-automated volumetry: what is the minimum increase in size to detect growth in repeated CT examinations? Eur Radiol 2009 19:800-808.
11. Siemens LungCARE CT and syngo LungCAD Homepage. Accessed 1 March 2009. http://www.medical.siemens.com/webapp/wcs/stores/servlet/ProductDisplay~q_catalogId~e_-11~a_catTree~e_100010,1007660,12752,1008405,1008410~a_langId~e_-11~a_productId~e_11611~a_storeId~e_10001.htm
12. Pedersen JH, Ashraf H, Dirksen A, et al. The Danish Randomized Lung Cancer CT Screening Trial: overall design and results of the prevalence round. J Thorac Oncol 2009 4:608-614.
13. Gietema HA, Wang Y, Xu D, et al. Pulmonary nodules detected at lung cancer screening: interobserver variability of semiautomated volume measurements. Radiology 2006 241:251-256.
14. Gietema HA, Schaefer-Prokop CM, Mali WPTM, et al. Pulmonary nodules: interscan variability of semiautomated volume measurements with multisection CT: influence of inspiration level, nodule size, and segmentation performance. Radiology 2007 245:888-894.
15. Wang Y, van Klaveren RJ, van der Zaag-Loonen HJ, et al. Effect of nodule characteristics on variability of semiautomated volume measurements in pulmonary nodules detected in a lung cancer screening program. Radiology 2008 248:625-631.
16. Marchiano A, Calabro E, Civelli E, et al. Pulmonary nodules: volume repeatability at multidetector CT lung cancer screening. Radiology 2009 251:919-925.
17. Rampinelli C, De Fiori E, Raimondi S, et al. In vivo repeatability of automated volume calculations of small pulmonary nodules with CT. Am J Roentgenol 2009 192:1657-1661.
18. Gurung J, Maataoui A, Khan M, et al. Automated detection of lung nodules in multidetector CT: influence of different reconstruction protocols on performance of a software prototype. Fortschr Röntgenstr 2006 178:71-77.


Chapter 7

Bartjan de Hoop Hester Gietema Saskia van de Vorst Keelin Murphy Rob van Klaveren Mathias Prokop

Increase in mass of pulmonary ground glass nodules as an early indicator of growth

Abstract

Purpose
To compare manual measurements of diameter, volume, and mass of pulmonary ground-glass nodules (GGNs) to establish which method is best for identifying malignant GGNs by determining change across time.

Method and Materials
In this ethics committee–approved retrospective study, baseline and follow-up CT examinations of 52 GGNs detected in a lung cancer screening trial were included, resulting in 127 GGN data sets for evaluation. Two observers measured GGN diameter with electronic calipers, manually outlined GGNs to obtain volume and mass, and scored whether a solid component was present. Observer 1 repeated all measurements after 2 months. Coefficients of variation and limits of agreement were calculated by using Bland-Altman methods. In a subgroup of GGNs containing all resected malignant lesions, the ratio between intraobserver variability and growth (growth-to-variability ratio) was calculated for each measurement technique. In this subgroup, the mean time for growth to exceed the upper limit of agreement of each measurement technique was determined.

Results
The κ values for intra- and interobserver agreement for identifying a solid component were 0.55 and 0.38, respectively. Intra- and interobserver coefficients of variation were smallest for GGN mass (P < .001). Thirteen malignant GGNs were resected. Mean growth-to-variability ratios were 11, 28, and 35 for diameter, volume, and mass, respectively (P = .03); mean times required for growth to exceed the upper limit of agreement were 715, 673, and 425 days, respectively (P = .02).

Conclusion
Mass measurements can enable detection of growth of GGNs earlier and are subject to less variability than are volume or diameter measurements.


Introduction

Ground glass nodules (GGNs) are regularly encountered during computed tomography (CT) screening for lung cancer. In the Early Lung Cancer Action Project (ELCAP) trial, for example, the majority of detected malignancies were GGNs 1. These GGNs pose a challenge to the clinician, as they grow slowly 2 but at the same time have a malignancy rate as high as 63% 1. A GGN is a circumscribed area of increased lung attenuation with preservation of the bronchial and vascular margins. A GGN can be part-solid (part of the ground glass opacity completely obscures the parenchyma) or nonsolid (no completely obscured areas) 2. Whereas a large range of benign diseases (eg, inflammatory disease or fibrosis) can manifest as GGN, most GGNs that persist over longer periods of time represent either atypical adenomatous hyperplasia, bronchoalveolar carcinoma 3, or minimally invasive adenocarcinoma 4-7. Differentiation of benign and malignant GGNs is done largely on the basis of change in size or on the development of a solid component in a previously nonsolid GGN 8-10. Close follow-up is considered justified when a GGN is stable, i.e. when no growth or development of a solid component can be demonstrated 9, 11. However, when a GGN increases in size or a solid component develops, it should be resected 2, 8-10, 12. Because GGNs usually grow slowly, a method that can demonstrate subtle changes in size and density is required 8, 11. Changes in GGNs are usually evaluated using diameter measurements and visual assessment of the appearance of a solid component 9, 11, 13. Nodule volumetry is superior to diameter measurements in solid nodules in terms of accuracy and reproducibility 14, but volumetric measurement techniques are not yet regularly used for GGNs. Mass is a parameter that integrates volume and density: mass increases if the volume of a nodule increases or if its density increases. Mass should, therefore, be especially suitable for identifying GGNs with a high risk for malignancy. In this study, we introduce the estimation of GGN mass as a new method for measuring change in GGNs on CT images. Mass can be calculated from CT data because x-ray attenuation values are proportional to tissue density (i.e. mass per unit volume) 15. Nodule mass can be calculated by multiplying nodule volume and density. Whereas nodule volume weights each voxel in the volume of interest identically, nodule mass gives less weight to voxels that contain more air or even pure lung tissue, which should make nodule mass less sensitive than nodule volume to variability in nodule segmentation. We compared manual measurements of diameter, volume and mass of pulmonary GGNs to establish which method is best for identifying malignant GGNs by determining change across time.
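The relation between volume, density and mass can be illustrated with a minimal sketch. It assumes a simple linear conversion from CT attenuation to physical density, in which air (-1000 HU) corresponds to 0 mg/mm3 and water (0 HU) to 1 mg/mm3; this conversion, the function name and the example values are assumptions made for illustration and do not describe the software used in this study.

```python
def nodule_volume_and_mass(hu_values, voxel_volume_mm3):
    """Return (volume in mm^3, mass in mg) of an outlined nodule.

    hu_values: attenuation (HU) of every voxel inside the outline.
    Assumed conversion: density [mg/mm^3] = (HU + 1000) / 1000,
    so air (-1000 HU) contributes no mass and water (0 HU) 1 mg/mm^3.
    """
    n_voxels = len(hu_values)
    if n_voxels == 0:
        raise ValueError("the outline contains no voxels")
    volume_mm3 = n_voxels * voxel_volume_mm3
    mean_hu = sum(hu_values) / n_voxels
    density_mg_per_mm3 = (mean_hu + 1000.0) / 1000.0
    return volume_mm3, volume_mm3 * density_mg_per_mm3


# Hypothetical nodule of four voxels, each 0.5 mm^3.
tight_outline = [-600, -600, -400, -400]
# The same nodule outlined slightly too generously: one air-like voxel is added.
loose_outline = tight_outline + [-1000]

print(nodule_volume_and_mass(tight_outline, 0.5))  # volume 2.0 mm^3
print(nodule_volume_and_mass(loose_outline, 0.5))  # volume 2.5 mm^3, mass unchanged
```

In this toy example the generous outline increases volume by 25%, whereas mass stays the same because the added voxel has the density of air under the assumed conversion. This is the mechanism by which mass is expected to be less sensitive to segmentation variability.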

Study Participants
All participants were recruited from the randomized Dutch-Belgian lung cancer screening trial (NELSON) 16. The trial was approved by the Ministry of Health and the institutional review boards of the participating centers. Written informed consent was obtained from all participants. All participants were current or former heavy smokers 17. For the current analysis, all CT examinations performed between April 2004 and April 2009 at one of the study sites (University Medical Center Utrecht, The Netherlands) were included. All CT examinations were read for pulmonary nodules. Depicted nodules were characterized as solid nodule or GGN as part of the screening trial. All detected GGNs were used in our current evaluation. Patients with a growing GGN were referred to a pulmonologist, who decided whether resection was necessary 16.

Detected Ground Glass Nodules
A total of 2994 volunteers were scanned. Fifty-two GGNs were detected, originating from 45 participants (42 men, 3 women; mean age 62 years, range 53 – 73 years). Seven patients had 2 GGNs each. Including follow-up examinations, 127 datasets containing a GGN were available for evaluation. Eighteen lesions were visible on only one CT study, 6 lesions on two, 17 lesions on three, 9 lesions on four and 2 lesions on five consecutive CT examinations. Median diameter of the GGNs was 13.9 mm (range 3.9 – 29.7 mm), as measured with electronic calipers.

Malignant Ground Glass Nodules
One benign and thirteen malignant GGNs were resected in 12 patients (9 men, 3 women; mean age 60 years). Two patients each had 2 malignant GGNs resected. Histologic findings included adenocarcinoma (n=7), bronchoalveolar carcinoma (n=6) and atypical adenomatous hyperplasia (n=1). Median diameter of the malignant GGNs was 18.2 mm (range 8.6 – 29.7 mm), as measured on the last follow-up CT study.


Materials and Methods

Study Design
To determine which method is best for measuring potentially malignant change in GGNs, we took the following three-step approach. First, the intra- and interobserver variability of each method were assessed. All baseline and follow-up CT studies in which at least one GGN had been detected were selected. Two independent observers performed all measurements (observer 1: B.d.H., radiology researcher with 3 years of experience in CT lung cancer screening [>1000 examinations]; observer 2: S.v.d.V., medical researcher with special training in evaluating lung cancer screening CT scans and 2 years of experience [>4000 examinations]). Each observer independently categorized each GGN in the data set as nonsolid or part-solid. Observer 1 repeated all measurements after an interval of two months in order to estimate intraobserver variability. Second, we used a subgroup of all GGNs that had been resected and proved


malignant to evaluate the ability of each method to demonstrate growth. For each malignant GGN, we related the total change in diameter, volume and mass during follow-up to the variability of each measurement method (growth-to-variability ratio). Third, we compared the three methods with respect to the time that we would have needed to follow a malignant nodule in our study before its growth would be discernable (i.e. exceed the upper limit of agreement of intraobserver variability). Intraobserver variability was estimated from the repeated measurements of observer 1. The “time to discernable growth” was determined in the subgroup of resected malignant GGNs. We used the actual data from the screening study; applied the diameter, volume and mass measurements; and determined the time between the baseline CT examination and the first CT examination in which growth became discernable. Time required for growth to become discernable was determined for the measurements of both observers and for each GGN. CT Scanning Patients underwent imaging with a 16-detector-row CT (Mx8000 IDT or Brilliance-16, Philips Medical Systems, Cleveland, OH) in helical mode with 16x0.75mm collimation and 15mm table feed per rotation. Scans were made with the patient at full inspiration, without previous training. No intravenous contrast was injected. Exposure settings were 30mAs at 120kVp for patients weighing 5000 and 630 days- >5000, respectively. In the total follow-up period, none of the GGNs showed a sudden increase in growth rate; the shortest VDT and MDT at any point during follow-up being 336 and 349 days respectively. Conclusion Growing GGNs have a very high malignancy rate but tend to grow slowly and predictably. Our data suggest that long-term follow-up with CT may be a valid option to monitor changes in persistent GGNs. Resection should be considered in any GGN that shows discernable growth. 

Chapter 8 | Ground glass nodules detected during lung cancer screening: a diagnostic and therapeutic dilemma

Introduction
Pulmonary ground glass nodules (GGNs) are frequently encountered during computed tomography (CT) based lung cancer screening trials. Although GGNs are not as frequently observed as solid nodules, they accounted for the majority of lung cancers detected in early CT lung cancer screening studies 1, 2. A GGN is a circumscribed area of increased lung attenuation with preservation of the bronchial and vascular margins. A GGN can be part-solid (part of the nodule completely obscures the parenchyma) or nonsolid (no completely obscured areas) 3. While transient GGNs can represent a large range of diseases, malignancy rates ranging from 19.4% to 79.1% have been reported for GGNs that were persistent over time 4, 5. Malignant GGNs most often represent bronchioloalveolar carcinoma (BAC) 6 or minimally invasive adenocarcinoma 7-10. The malignancy rate of GGNs is higher than that of solid nodules. The ELCAP study reported a malignancy rate among their positive findings of 34% for nonsolid GGNs and 63% for part-solid GGNs, while only 7% of the solid nodules were malignant 1. Despite their high malignancy rate, GGNs have also been shown to grow slowly 3, 11. Current guidelines for pulmonary nodules state that nodules with a volume doubling time (VDT) outside a window of 20 to 400 days have a low probability of malignancy 12-16. However, these guidelines are based on experience with solid nodules and may result in misclassification of many malignant GGNs. On the other hand, even if a GGN is malignant, a slow growth rate may indicate that it has no clinical impact in patients with a limited life expectancy 17-19. Early detection and treatment of these lesions might not reduce mortality. This phenomenon is called overdiagnosis. Therefore, GGNs may require a different management strategy than solid nodules. So far, consensus on treatment of GGNs is lacking due to the apparent discrepancy between a high malignancy rate and a slow growth pattern.
The purpose of this study was to explore the natural course of solitary GGN with long-term follow-up in order to improve the management strategy for these lesions.

Methods

Study Participants
All participants at one of the study sites (Utrecht) of the screening arm of the Dutch-Belgian CT lung cancer screening trial (NELSON) were included 20. The trial was approved by the Ministry of Health and a waiver was received for the current analysis. All participants were current or former heavy smokers 20. Detected nodules were characterized as solid nodule or ground glass nodule as part of the screening trial. All participants with a persistent GGN who were screened between April 2004 and January 2010 were included in the current analysis. A persistent GGN was defined as a GGN that was visible on two consecutive CT examinations performed at least 3 months apart. Follow-up of patients was performed using the NELSON database and patient hospital files.


Measurements The NELSON trial uses volumetry software to measure nodule dimensions because volumetry has been proven to be superior to diameter measurements in terms of accuracy and reproducibility 22. For GGNs however, the software sometimes failed to produce a good segmentation. For the present study, we therefore manually measured GGN volume and mass (nodule volume x mean CT density) in order to obtain a precise three-dimensional measurement of each GGN. Differentiation of benign and malignant GGNs is largely based on change in size or on the development of a solid component in a previously nonsolid GGN. GGN mass is a parameter that integrates volume and density, and will therefore not only increase when volume increases, but also when a solid component within a lesion occurs or progresses. Mass measurements are more reproducible than volume or diameter measurements in GGNs and can detect growth of GGN earlier 23. The use of mass measurements enables the detection of very small changes in GGNs. In order to minimize observer variability, we used the mean of the measurements of two experienced observers. Manual volumetry was performed by electronically outlining the lesion perimeter on all axial sections on which the GGN was visible using in-house developed software, followed by calculation of the GGN volume and mass by the computer. Mass of a GGN was calculated by expressing attenuation values in terms of physical density. Details were described previously 23.


CT Scanning and Work-up of GGNs The NELSON protocol includes a low-dose CT-examination at baseline, year 1 and year 3 plus additional follow-up CT exams in case suspicious nodules were detected 21. Patients were scanned using a 16-detector-row CT (Mx8000 IDT or Brilliance-16, Philips Medical Systems, Cleveland, OH) in helical mode with 16x0.75mm collimation and 15mm table feed per rotation (pitch 1.3). Scans were made in full inspiration. No intravenous contrast was injected. Exposure settings were 30mAs at 120kVp for patients weighing 5000). Five non-resected GGNs showed a decrease in mass. After exclusion of those nodules, the median MDT for the non-resected GGNs was 1733 days (range 630 – >5000). Growth of each nonresected GGN is plotted in Figure 2. The fastest growing malignant GGN within our study had a minimal VDT of 336 days and a MDT of 349 days between two consecutive follow-up examinations (Figure 3).
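For readers less familiar with doubling times, the sketch below shows how a volume doubling time (VDT) or mass doubling time (MDT) follows from two measurements under the common assumption of exponential growth, using the standard relation doubling time = interval × ln 2 / ln(second / first). The function names, the convention of returning infinity for lesions that do not grow, and the example values are illustrative assumptions and do not describe the software used in this study.

```python
import math


def doubling_time_days(first, second, interval_days):
    """Doubling time in days from two measurements (volume or mass),
    assuming exponential growth between the two examinations.
    Returns math.inf when the lesion did not grow (illustrative convention)."""
    if first <= 0 or second <= 0 or interval_days <= 0:
        raise ValueError("measurements and interval must be positive")
    if second <= first:
        return math.inf
    return interval_days * math.log(2) / math.log(second / first)


def extrapolate(value, doubling_time, days_ahead):
    """Project a measurement forward along the same exponential curve."""
    return value * 2.0 ** (days_ahead / doubling_time)


# Hypothetical nodule: 500 mm^3 at baseline, 1000 mm^3 one year later.
vdt = doubling_time_days(500.0, 1000.0, 365)
print(vdt)                             # one doubling per 365 days
print(extrapolate(1000.0, vdt, 730))   # two further doublings
print(doubling_time_days(1000.0, 900.0, 365))  # shrinking lesion: inf
```

The same function yields an MDT when masses are supplied instead of volumes. Extrapolation of this kind only holds as long as growth remains exponential at a constant rate.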


Figure 3 The fastest growing malignant GGN in this study: this GGN measured 6mm in diameter at baseline and had a VDT of 385 days and MDT of 365 days. At the time of resection the maximum diameter was 14.6mm. Pathology showed bronchioloalveolar carcinoma, without lymph node involvement (stage 1A).

The optimum interval for follow-up is long enough to detect growth in GGNs and short enough to avoid too much growth in faster growing GGNs. In order to define the optimum interval for follow-up, the percentage of GGNs that did not show discernable increase in volume or mass at several follow-up intervals is presented in Table 1. Table 1 also lists the maximum increase in GGN volume and mass at different intervals.

PET
Only 1 of 16 (6%) malignant GGNs was positive on the PET scan; this was a part-solid BAC with a diameter of 3 cm. Five adenocarcinomas showed a mildly increased standardized uptake value (range 1.3 – 3.9). Ten malignancies showed no FDG uptake with PET scanning.

Morbidity and Mortality
No surgery-related mortality was reported. Surgery was complicated by a persistent air leak (n=5), requiring a second surgery in two cases, wound infection (n=2), atelectasis (n=2), pneumonia (n=1) and atrial fibrillation (n=1). During follow-up, 1 of 41 (2.4%) patients died due to ischemic heart disease.

Discussion So far, no consensus exists on treatment of GGNs, especially for very slow or stable GGNs. We report a very high (94%) rate of malignancy among growing GGNs. Malignant GGNs grew very slowly with VDT ranging from 385 to 1733 days. Despite slow growth of the GGNs, one patient developed stage IV disease. With this observation, we would like to recommend resection to be considered for every GGN that shows a discernable increase in either volume or mass. On the other hand, 25/41 (61%) of the GGNs had not shown any sign of growth


Table 1a The percentage of GGNs that did not show discernable growth in volume or mass at different follow-up intervals versus the maximum growth in the fastest growing GGN.

Follow-up     GGNs with no discernable growth     Maximum growth in fastest growing GGN
interval      volumetry      mass                 volumetry      mass
3 months      100%           100%                 18%            19%
6 months      100%           95%                  39%            41%
9 months      88%            73%                  64%            68%
12 months     88%            61%                  93%            100%
18 months     61%            51%                  168%           183%
24 months     56%            34%                  272%           300%
36 months     41%            27%                  618%           701%

Note – Discernable increase is defined as an increase that exceeds the intraobserver variability of the measurement. Interpolated data was used for intervals for which no actual follow-up data was available.

Table 1b Growth for the 25/41 (61%) GGNs that had no discernable increase in mass at 1 year follow-up.

Follow-up     GGNs with no discernable growth     Maximum growth in fastest growing GGN
interval      volumetry      mass                 volumetry      mass
12 months     100%           100%                 29%            29%
18 months     96%            84%                  47%            47%
24 months     92%            60%                  67%            67%
30 months     88%            56%                  89%            89%
36 months     76%            48%                  115%           115%
42 months     72%            48%                  145%           145%

Note – Follow-up interval starts at baseline.

after one year. We used an approach of frequent follow-up with CT to monitor these lesions. None of the GGNs showed a sudden increase in growth rate over the total follow-up period; making growth rate of GGNs quite predictable. Moreover, none of the patients with a non-growing GGN was diagnosed with bronchogenic carcinoma during the follow-up period after the last CT scan. Therefore follow-up with CT using precise measurement methods seems to be a safe approach towards screening detected GGNs that do not show growth. The Fleischner guidelines for lung nodules do not provide exact recommendations for follow-up intervals for GGNs. For solid nodules larger than 6mm in diameter, the guidelines recommend a 3-6 months follow-up CT in high-risk patients. The


Figure 4 Examples of growing GGNs with different prognosis. The top row shows a BAC that measured 1139 mm3 (diameter ~13.0mm, VDT 1733 days, MDT 990 days) at the time of resection. This GGN would measure 3193 mm3 (diameter ~18.3mm) 10 years after baseline (extrapolation of growth curve) and probably represents overdiagnosis. The middle row shows an adenocarcinoma that measured 2152mm3 at resection (diameter ~16.0mm, VDT 866 days, MDT 630 days) with a large solid component. It had already invaded the visceral pleura at the time of resection, making overdiagnosis unlikely. The bottom row shows a GGN that measured 1122 mm3 (diameter 12.9mm, VDT 383, MDT 444 days) at the time of resection but proved benign (chronic inflammatory infiltration).

3-6 months follow-up CT can confirm persistency of the GGN and can detect fast growing nodules, but our results show that the interval is too short to differentiate between growing and non-growing GGNs. None of the GGNs in our study had shown discernable growth after 3 months. The interval for the follow-up CT should be much longer for growth to be detected in GGNs. Conversely, the interval should not be so long that relatively fast growing GGNs become too large or disseminate. This trade-off is shown in Table 1. If, for example, our goal is to detect growth in at least 40% of all GGNs, then the follow-up interval should be around 12 months. The

110

Alternative methods for differentiation of GGNs The authors of a recent publication suggested to perform CT-guided biopsy in all persistent GGNs 25. Consequently, an invasive procedure will be performed in many patients with GGNs that will not grow over time. Moreover, no definite differentiation between atypical adenomatous hyperplasia, BAC and adenocarcinoma can be drawn from biopsy as the definition of BAC requires that there is no evidence of invasive features within the entire lesion 26. The majority of the malignant GGN was PET negative, even when they represented adenocarcinoma. Poor PET-FDG results have been reported in several other studies 25, 27 . PET-scanning therefore has little value in the diagnostic work-up of GGNs. Overdiagnosis Overdiagnosis has been reported to be a problem in lung cancer screening with chest radiography 28 as well as for screening with CT 29. Overdiagnosis may have occurred in our study as some GGN were small and very slow growing (Figure 4). To define overdiagnosis, a VDT of nodules greater than 400 days has previously been proposed 19. This threshold labels 15 out of 16 malignant GGNs in our study as overdiagnosis, including the patient who developed stage IV disease. A second GGN (Figure 4) with a VDT of 866 days had largely turned solid at the time of resection and had invaded the visceral pleura, leaving overdiagnosis unlikely. These cases suggest that the threshold of 400 days to define overdiagnosis may be too low for GGNs.

Chapter 8 | Ground glass nodules detected during lung cancer screening: a diagnostic and therapeutic dilemma

fastest growing GGN would have doubled in mass (100% increase) by that time. This first follow-up CT can detect the relatively fast growing GGNs. The remaining GGNs that do not show growth at the first follow-up CT can either be non-growing or very slow growing. Further follow-up is needed to make this differentiation. Table 1b shows for this category of GGNs the trade-off between the interval that is needed to detect growth versus growth of the fastest growing GGN. When we apply the same criteria: detection of growth in at least 40% of the remaining GGNs, than the follow-up interval should be 24 months from baseline. The fastest growing GGN shows an increase of 67% in volume and mass at that point. The establishment of guidelines for the time to follow-up is problematic because is not known and maybe cannot be known, to what extent growth can be accepted before a malignancy will disseminate.
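The percentage increases, doubling times and diameters quoted in this discussion are linked by simple arithmetic if growth is assumed to be exponential, which is the assumption implicit in a volume or mass doubling time. The following Python sketch is illustrative only and is not part of the measurement software used in the study; all function names are hypothetical. It converts a doubling time and a follow-up interval into a percentage increase, derives a doubling time from two measurements, and converts a volume into the diameter of a sphere of equal volume, which appears consistent with the approximate diameters given in the caption of Figure 4.

```python
import math


def growth_factor(interval_days, doubling_time_days):
    """Size ratio after `interval_days`, assuming exponential growth."""
    return 2.0 ** (interval_days / doubling_time_days)


def percent_increase(interval_days, doubling_time_days):
    """Percentage increase in volume (or mass) over the interval."""
    return 100.0 * (growth_factor(interval_days, doubling_time_days) - 1.0)


def doubling_time(size_start, size_end, interval_days):
    """Doubling time implied by two measurements `interval_days` apart."""
    return interval_days * math.log(2.0) / math.log(size_end / size_start)


def equivalent_diameter_mm(volume_mm3):
    """Diameter (mm) of a sphere with the given volume (mm^3)."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


if __name__ == "__main__":
    # A lesion whose doubling time equals the interval grows by 100%.
    print(percent_increase(365, 365))
    # Yearly volume increase of a lesion exactly at the 400-day VDT threshold.
    print(round(percent_increase(365, 400), 1))
    # Sphere-equivalent diameters for the volumes quoted in Figure 4.
    for volume in (1139, 2152, 1122):
        print(volume, round(equivalent_diameter_mm(volume), 1))
```

Under this model a lesion at the 400-day threshold still grows by well over half its volume in a year, which illustrates why a single fixed threshold is a coarse criterion for slow-growing lesions.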

GGN influence on lung cancer screening
High survival rates are reported for patients with cancers detected during lung cancer screening. For example, an 86.2% survival rate after long-term follow-up was reported for patients with a pulmonary malignancy detected during CT screening 2. However, 35 out of 57 (61%) of the CT-detected cancers appeared as GGNs. This proportion is much higher than the proportion of lung cancers with ground glass aspect (4.4%) 30 diagnosed during regular clinical practice. The slower-growing GGNs have a longer asymptomatic phase than faster-growing tumors and are therefore more likely to be detected during screening, which introduces a length-time bias into lung cancer screening. Patients with a slow-growing tumor usually survive longer than patients with a faster-growing tumor. The overrepresentation of GGNs may therefore be, at least partially, responsible for the high survival rates seen in lung cancer screening studies.

Limitations
The follow-up period in our study was limited to approximately 4.5 years, and the growth pattern of (non-growing) GGNs after this period is unknown. Lifelong follow-up may be needed to monitor changes indicating malignant degeneration in previously non-growing GGNs. GGNs are responsible for a substantial part of all detected malignancies in the lung cancer screening study but are still less common than solid nodules, which limited the number of GGNs included in this study. More evidence is needed to exclude less frequent types of growth patterns in GGNs and to establish definitive guidelines for treatment of these lesions.

Conclusion
GGNs that show growth have a high malignancy rate but tend to grow slowly. Our data suggest that long-term follow-up with CT may be a valid option to monitor changes in persistent GGNs. Resection should be considered in any GGN that shows discernible growth.

Reference List

1. Henschke CI, Yankelevitz DF, Mirtcheva R, McGuinness G, McCauley D, Miettinen OS. CT screening for lung cancer: frequency and significance of part-solid and nonsolid nodules. AJR Am J Roentgenol 2002 May;178(5):1053-7.
2. Sone S, Nakayama T, Honda T, Tsushima K, Li F, Haniuda M, Takahashi Y, Suzuki T, Yamanda T, Kondo R, Hanaoka T, Takayama F, Kubo K, Fushimi H. Long-term follow-up study of a population-based 1996-1998 mass screening programme for lung cancer using mobile low-dose spiral computed tomography. Lung Cancer 2007 December;58(3):329-41.
3. Hasegawa M, Sone S, Takashima S, Li F, Yang ZG, Maruyama Y, Watanabe T. Growth rate of small lung cancers detected on mass CT screening. Br J Radiol 2000 December 1;73(876):1252-9.
4. Oh JY, Kwon SY, Yoon HI, Lee SM, Yim JJ, Lee JH, Yoo CG, Kim YW, Han SK, Shim YS, Kim TJ, Lee KW, Chung JH, Jheon SH, Sung SW, Lee CT. Clinical significance of a solitary ground-glass opacity (GGO) lesion of the lung detected by chest CT. Lung Cancer 2007 January;55(1):67-73.
5. Park CM, Goo JM, Lee HJ, Lee CH, Chun EJ, Im JG. Nodular Ground-Glass Opacity at Thin-Section CT: Histologic Correlation and Evaluation of Change at Follow-up. RadioGraphics 2007 March 1;27(2):391-408.
6. Noguchi M, Morikawa A, Kawasaki M, Matsuno Y, Yamada T, Hirohashi S, Kondo H, Shimosato Y. Small adenocarcinoma of the lung. Histologic characteristics and prognosis. Cancer 1995 June 15;75(12):2844-52.
7. Nakajima R, Yokose T, Kakinuma R, Nagai K, Nishiwaki Y, Ochiai A. Localized pure ground-glass opacity on high-resolution CT: histologic characteristics. J Comput Assist Tomogr 2002 May;26(3):323-9.
8. Nakata M, Sawada S, Saeki H, Takashima S, Mogami H, Teramoto N, Eguchi K. Prospective study of thoracoscopic limited resection for ground-glass opacity selected by computed tomography. Ann Thorac Surg 2003 May;75(5):1601-5.
9. Suzuki K, Asamura H, Kusumoto M, Kondo H, Tsuchiya R. “Early” peripheral lung cancer: prognostic significance of ground glass opacity on thin-section computed tomographic scan. Ann Thorac Surg 2002 November 1;74(5):1635-9.
10. Seki N, Sawada S, Nakata M, Inoue T, Nishimura R, Segawa Y, Shibakuki R, Eguchi K. Lung cancer with localized ground-glass attenuation represents early-stage adenocarcinoma in nonsmokers. J Thorac Oncol 2008 May;3(5):483-90.
11. Aoki T, Nakata H, Watanabe H, Nakamura K, Kasai T, Hashimoto H, Yasumoto K, Kido M. Evolution of Peripheral Lung Adenocarcinomas: CT Findings Correlated with Histology and Tumor Doubling Time. AJR Am J Roentgenol 2000 March 1;174(3):763-8.
12. Ost D, Fein AM, Feinsilver SH. Clinical practice. The solitary pulmonary nodule. N Engl J Med 2003 June;348(25):2535-42.
13. Usuda K, Saito Y, Sagawa M, Sato M, Kanma K, Takahashi S, Endo C, Chen Y, Sakurada A, Fujimura S. Tumor doubling time and prognostic assessment of patients with primary lung cancer. Cancer 1994 October 15;74(8):2239-44.
14. Winer-Muram HT, Jennings SG, Tarver RD, Aisen AM, Tann M, Conces DJ, Meyer CA. Volumetric Growth Rate of Stage I Lung Cancer prior to Treatment: Serial CT Scanning. Radiology 2002 June 1;223(3):798-805.
15. Yankelevitz DF, Henschke CI. Does 2-year stability imply that pulmonary nodules are benign? AJR Am J Roentgenol 1997 February;168(2):325-8.
16. Yankelevitz DF, Reeves AP, Kostis WJ, Zhao B, Henschke CI. Small pulmonary nodules: volumetrically determined growth rates based on CT evaluation. Radiology 2000 October;217(1):251-6.
17. Jett JR, Midthun DE. Commentary: CT Screening for Lung Cancer--Caveat Emptor. Oncologist 2008 April 1;13(4):439-44.
18. Lee HJ, Goo JM, Lee CH, Yoo CG, Kim YT, Im JG. Nodular ground-glass opacities on thin-section CT: size change during follow-up and pathological results. Korean J Radiol 2007 January;8(1):22-31.
19. Yankelevitz DF, Kostis WJ, Henschke CI, Heelan RT, Libby DM, Pasmantier MW, Smith JP. Overdiagnosis in chest radiographic screening for lung carcinoma: frequency. Cancer 2003 March 1;97(5):1271-5.
20. van Iersel CA, de Koning HJ, Draisma G, Mali WP, Scholten ET, Nackaerts K, Prokop M, Habbema JD, Oudkerk M, van Klaveren RJ. Risk-based selection from the general population in a screening trial: selection criteria, recruitment and power for the Dutch-Belgian randomised lung cancer multi-slice CT screening trial (NELSON). Int J Cancer 2007 February 15;120(4):868-74.
21. Xu DM, Gietema H, de Koning H, Vernhout R, Nackaerts K, Prokop M, Weenink C, Lammers JW, Groen H, Oudkerk M, van Klaveren R. Nodule management protocol of the NELSON randomised lung cancer screening trial. Lung Cancer 2006 November;54(2):177-84.
22. Marten K, Engelke C. Computer-aided detection and automated CT volumetry of pulmonary nodules. European Radiology 2007 April 11;17(4):888-901.
23. de Hoop B, Gietema H, van de Vorst S, Murphy K, van Klaveren RJ, Prokop M. Pulmonary Ground-Glass Nodules: Increase in Mass as an Early Indicator of Growth. Radiology 2010 April;255:199-206.
24. Guiot C, Degiorgis PG, Delsanto PP, Gabriele P, Deisboeck TS. Does tumor growth follow a “universal law”? J Theor Biol 2003 November 21;225(2):147-51.
25. Infante M, Lutman RF, Imparato S, Di Rocco M, Ceresoli GL, Torri V, Morenghi E, Minuti F, Cavuto S, Bottoni E, Inzirillo F, Cariboni U, Errico V, Incarbone MA, Ferraroli G, Brambilla G, Alloisio M, Ravasi G. Differential diagnosis and management of focal ground-glass opacities. Eur Respir J 2009 April 1;33(4):821-7.
26. Colby TV, Noguchi M, Henschke CI, Vazquez M, Geiniger K, Yokosa T. Adenocarcinoma. In: Travis WD, Brambilla E, Muller-Hermelink HK, Harris CC, editors. World Health Organization classification of tumours: Tumours of the lung, pleura, thymus and heart. 1st ed. Lyon: IARC Press; 2004. p. 35-44.
27. Kim HY, Shim YM, Lee KS, Han J, Yi CA, Kim YK. Persistent Pulmonary Nodular Ground-Glass Opacity at Thin-Section CT: Histopathologic Comparisons. Radiology 2007 October 1;245(1):267-75.
28. Flehinger BJ, Melamed MR. Current status of screening for lung cancer. Chest Surg Clin N Am 1994 February;4(1):1-15.
29. Toyoda Y, Nakayama T, Kusunoki Y, Iso H, Suzuki T. Sensitivity and specificity of lung cancer screening using chest low-dose computed tomography. Br J Cancer 2008 May 20;98(10):1602-7.
30. Suzuki K, Asamura H, Kusumoto M, Kondo H, Tsuchiya R. “Early” peripheral lung cancer: prognostic significance of ground glass opacity on thin-section computed tomographic scan. Ann Thorac Surg 2002 November 1;74(5):1635-9.


Chapter 9

Eva M. van Rikxoort Bartjan de Hoop Saskia van de Vorst Mathias Prokop Bram van Ginneken

Automatic segmentation of pulmonary segments from volumetric chest CT scans

Abstract

Purpose
Automated extraction of pulmonary anatomy provides a foundation for computerized analysis of computed tomography (CT) scans of the chest. A completely automatic method is presented to segment the lungs, lobes and pulmonary segments from volumetric CT chest scans.

Method and Materials
The method starts with lung segmentation based on region growing and standard image processing techniques. Next, the pulmonary fissures are extracted by a supervised filter. Subsequently, the lung lobes are obtained by voxel classification in which the positions of voxels within the lung and relative to the fissures are used as features. Finally, each lobe is subdivided into its pulmonary segments by applying another voxel classification that employs features based on the detected fissures and the relative position of voxels in the lobe. The method was evaluated on 100 low-dose CT scans obtained from a lung cancer screening trial and compared to estimates of both inter- and intra-observer agreement.

Results and Conclusion
The method was able to segment the pulmonary segments with high accuracy (77%), comparable to both inter- and intra-observer accuracy (74% and 80%, respectively).

Chapter 9 | Automatic segmentation of pulmonary segments from volumetric chest CT scans


I Introduction The advent of multidetector scanners that can acquire up to 64 slices per rotation has made high-resolution, sub millimeter isotropic computed tomography (CT) imaging the modality of choice for analysis of the lungs. Modern CT scanners can generate a complete chest scan well within a single breath hold. A downside of this high-resolution imaging is that the analysis of these large amounts of data is very time consuming. This can be facilitated by computerized image analysis. Accurate segmentation is a prerequisite for automated analysis. One important task is the division of the lungs into anatomical regions. The human lungs are divided into lobes 1. The physical boundaries between the lobes are the pulmonary fissures. The lobes are further subdivided into segments that are defined based on bronchial supply. There are usually no physical boundaries between the segments. The pulmonary segments are a reference system for radiologists, pulmonologists and surgeons to indicate the position of lesions in the lungs. Knowledge about the localization of lesions can guide bronchoscopy or surgery. In addition, the spatial distribution of a parenchymal disease can be of clinical importance. Pathologic abnormalities may be limited to one or several segments, in which case only the affected segments can be removed instead of complete lobes or lungs. Laros et al. 2 published a 30 year survey of 30 bronchiectatic patients for which more than 10 lung segments were removed. They concluded that as long as at least 6 normal lung segments were preserved, both the quantitative and qualitative functionality of the lungs did not deteriorate over the years. For patients with advanced emphysema, lung volume reduction surgery is a potentially valuable treatment 3. For emphysema, the degree of deterioration can vary significantly between the left and right lung or between different lobes and segments. 
Although segmentation of lobes and segments is useful in clinical practice, especially segments are underused because determining the segmental boundaries is time-consuming on 3D CT data and usually hard on axial sections. In this paper we present a completely automatic method to identify both the pulmonary lobes and the pulmonary segments on 3D CT data. Several papers have been published about segmentation of pulmonary lobes. Zhang et al. 4 proposed an automatic lobe segmentation framework with atlas initialization for the lobar fissure segmentation. Next, a ridgeness measure was computed from 2D slices to enhance fissure contrast, and fuzzy logic was used to extract the final lobar fissure positions. The extracted fissures were used to segment the lobes. Only the major fissure was extracted in both the left and right lung, so the right lung was divided into two lobes instead of three. It was shown that the similarity between the automatically found lobes and the manual lobe segmentation was high. The detection scheme was also applied to the minor fissure separating the upper and middle lobe in the right lung but this was not evaluated. Kuhnigk et al. 5 detected the lobar fissures using a distance transform to a vessel segmentation and the original CT values. Next, an interactive 3D watershed transform was used to segment the lobes. They showed that the final result of their lobe segmentation did not vary substantially for different manual initializations. Wang et al. 6 proposed a method in which the major fissures were first segmented in a subset of sections, and


subsequently a 3D fissure interpolation was applied to obtain the fissure surface. Manual interaction was needed in 2.4% of sections. The segmentation results approximated the gold standard and were comparable to variation between human observers. The middle lobe in the right lung was not segmented since only the major fissure was extracted. Automatic segmentation of the pulmonary segments has received little attention so far and the topic was listed in a recent survey 7 as a completely open research area. Kuhnigk et al. 5 proposed a method to extract pulmonary segments from 3D CT data. After applying a lobe segmentation, they approximated the lung segments by assigning each lung voxel to the nearest point of the segmented bronchial tree in the same lobe. Since their bronchial tree segmentation includes an anatomic classification of the segmental branches this induces a partition of the lobes into segments. They evaluated the segment segmentation in vitro with two specimens of the left human lung. Based on this evaluation they concluded that for multidetector CT data more than 80% of the voxels in the left lung were assigned to the correct segment for the two human left lung specimens. However, no evaluation was done on the real CT data nor on the right lung. In this paper we will present the results of a completely automatic segment segmentation system and evaluate it on 100 low-dose CT scans obtained from a lung cancer screening trial. In this screening trial all abnormal findings are annotated by radiological experts and the segment they reside in is recorded. The performance of the automatic system will be compared to estimates of inter-and intra-observer agreement. In the next section the pulmonary anatomy is briefly described, followed by a description of the data in Section IV. In Section III the methods for lung segmentation, fissure segmentation, lobe segmentation and segment segmentation are detailed. 
Section V gives the results of the automatic system and compares them to those of human observers. Section VI discusses the results and concludes.

II Pulmonary Anatomy
The human lungs are divided into lobes, and the physical boundaries between the lobes are the lobar fissures. The right lung consists of three lobes: the upper lobe, the middle lobe, and the lower lobe. The left lung consists of two lobes and does not have a middle lobe. The equivalent of the middle lobe in the left lung is called the lingula. The lingula is occasionally considered a separate lobe in the left lung; however, it is almost never separated from the upper lobe by a fissure 1. In this paper, we will not consider the lingula to be a separate lobe in the left lung. The lobes are separately supplied by the first subdivisions of the bronchial tree after the main bronchi (Figure 1). Since the vascular, nerve and lymphatic supply from the hilum to each lobe is also mostly separate, the lobes function relatively independently within the lungs. The lobar fissures are often incomplete 8, in which case two lobes are (partly) connected. The middle row of Figure 2 shows the lobes on cross-sectional slices from 3D CT data.


Figure 1 Schematic drawing of the bronchial tree up to segmental level. Based on the anatomy of the bronchial tree 10 segments are defined in the right lung and 8 in the left lung.

The lobes are further subdivided into segments. There are almost never physical boundaries between the segments. Occasionally, an accessory fissure forms a physical boundary between two segments. The segments are defined based on supply from a tertiary branch of the bronchial tree. Based on the anatomy of the bronchial tree, 10 segments are defined in the right lung and 8 in the left lung. Compared to the right lung, no segment 7 is defined in the left lung, and segments 1 and 2 are considered a single segment, indicated as segment 1/2. Figure 1 shows a schematic drawing of the bronchial tree up to the segmental level. The bottom row of Figure 2 shows the pulmonary segments on cross-sectional slices from 3D CT data. The segments form anatomical and functional regions of the lung parenchyma. However, the segments are less functionally independent than the lobes, since veins tend to drain neighboring segments. Abnormalities are often localized in one or several segments.
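The segment numbering described above lends itself to a small lookup table. The Python sketch below is illustrative only. The segment counts, the absence of segment 7 and the merged segment 1/2 in the left lung follow the text, whereas the assignment of segments to lobes follows the conventional anatomical numbering and is an assumption here, since the text does not spell it out. All names are hypothetical.

```python
# Segments per lobe, keyed by (lung, lobe). The lobe assignment follows the
# conventional anatomical numbering (an assumption, not stated in the text).
SEGMENTS_BY_LOBE = {
    ("right", "upper"): ["1", "2", "3"],
    ("right", "middle"): ["4", "5"],
    ("right", "lower"): ["6", "7", "8", "9", "10"],
    ("left", "upper"): ["1/2", "3", "4", "5"],  # 4 and 5 form the lingula
    ("left", "lower"): ["6", "8", "9", "10"],   # no segment 7 on the left
}


def segments_in_lung(lung):
    """Return all segment labels of one lung ('right' or 'left')."""
    return [
        segment
        for (side, _lobe), segments in SEGMENTS_BY_LOBE.items()
        if side == lung
        for segment in segments
    ]


def lobe_of(lung, segment):
    """Convert a segment label to the lobe that contains it."""
    for (side, lobe), segments in SEGMENTS_BY_LOBE.items():
        if side == lung and segment in segments:
            return lobe
    raise KeyError(f"no segment {segment!r} in the {lung} lung")


if __name__ == "__main__":
    print(len(segments_in_lung("right")), len(segments_in_lung("left")))
    print(lobe_of("left", "1/2"))
```

A table of this kind is also a convenient way to convert a recorded segment label into a lobe label.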

III Materials and Methods
Before developing and evaluating the segment segmentation system, all scans were subsampled by a factor of 2 in each direction using block averaging (the mean of 8 voxels becomes the new voxel value) to reduce the required computation times. All computations were performed on these subsampled data. The segment segmentation system consists of four steps. We start by segmenting the lungs. Next, the lobar fissures are segmented using a supervised approach. Based on the lung and fissure segmentations, the lobes are extracted. Finally, the segments are extracted per lobe. Since the fissure, lobe and segment segmentation are all based on classification, a general section about the classifiers and features used is provided first. Next, each of the four steps of the algorithm is described.

A. Features and Classification
After the lungs are segmented, three similar approaches in which voxels are classified are used to perform fissure, lobe and segment segmentation. This is an example of supervised learning in which we map input features to an output for each voxel. This section describes the features and the classifiers that are used.

1) Gray-scale features: For the detection of the fissures inside the lungs, two types of gray-scale features were employed: the eigenvalues of the Hessian matrix and gray-value information on different scales. Both sets of features are described in this section. The Hessian matrix is a symmetric matrix composed of the six independent second-order derivatives. An eigenvalue analysis of the Hessian matrix extracts the principal directions (v̂0, v̂1, v̂2) into which the local second-order structure can be decomposed. The corresponding eigenvalues (|λ0| ≥ |λ1| ≥ |λ2|) are real and denote the second-order derivatives in the principal directions. One expects λ0 to be large in magnitude at a fissure, because when moving through the fissure the Hounsfield values change from low to high to low. The two other eigenvalues λ1 and λ2 are typically both small, since the fissure itself is locally a plane with constant density. Thus the three eigenvalues together contain information that can be used to distinguish voxels on fissures from other voxels. To compute second-order image derivatives, we used the standard procedure of convolving the image with a Gaussian kernel with a specific scale σ, followed by obtaining derivatives with finite differences 9. We used two scales, i.e. σ = 1 and σ = 3 voxels. The density itself is also an important characteristic, as voxels on fissures tend to have Hounsfield values in a specific range. We therefore added the gray value L at the same scales σ = 1, 3 and also the voxel values themselves, indicated as L0, which can be seen as observing the image at scale σ = 0 9.

2) Position features: For the lobe and segment segmentation, features based on positions inside and relative to objects are employed. Two types of features were used: relative positions inside an object and positions relative to another object. These features are described in this section. To capture the position inside an object, we computed so-called cumulative positions inside the object. The cumulative positions indicate for each voxel in the object which percentage of the object volume is above, beside or behind it in the respective direction. Formally, the cumulative position (x̆, y̆, z̆) of a point (x, y, z) inside an object is defined as follows:


1 Z x = ∫ V 0

Y

x

0

0

∫ ∫

I(x' , y' , z')dx'dy'dz'

1 Z y X y = I(x' , y' , z')dx' dy' dz'

V ∫0

1 z z = ∫ V 0

∫ ∫ 0

0

Y

X

∫ ∫ 0

0

(1)

I(x' , y' , z' )dx' dy' dz'

where V is the total number of voxels in the object, I is the segmentation image which is 1 inside the object and 0 elsewhere and X, Y and Z are the dimensions of the image. Use of the cumulative position maps locations inside the object to a standardized coordinate system, in the sense that voxels that are at a similar spatial location inside the object in different scans will get approximately the same coordinates. Next to position inside an object, the position relative to another object can be employed in classification. To capture this relative position the direction to the closest point on the object was computed. This direction is normalized to unit length and the distance to the closest point on the object is added as a feature since that information is lost by normalizing the directional features. This results in four features; the direction in the x, y and z direction and the distance. Clearly there are other ways in which this relative position can be encoded. We tried a number of different possibilities in pilot experiments on the training data and found our particular choice to work well in the context of the complete system. 3) Classification: The fissures, lobes and segments are segmented using a supervised approach, i.e. a system is constructed using sample input and output data and classifiers from pattern recognition theory 10. In supervised learning two stages can be distinguished: a training stage in which the system is developed, and a test stage in which the system is applied to previously unseen data. In the training stage, a number of voxels are sampled from training images, a set of features is calculated for each voxel and a classifier is trained 10. In order to be able to train the classifier, a ground truth is required which gives the class label (i.e. belonging to the class to be segmented or not) for each voxel. The choice of a classifier depends on the application. 
The performance of four classifiers was evaluated for each application in pilot experiments on the training data, i.e. the linear discriminant classifier (LDC), the quadratic discriminant classifier (QDC), the nearest mean classifier (NMC) and the k-nearest neighbor classifier (KNNC) 10. The classifier that performed best in those pilot experiments was chosen as the classifier for the specific application. For the task of fissure enhancement KNNC performed best. LDC performed best for both the lobe and segment segmentation task. LDC assumes a Gaussian distribution for the samples of each class and equal covariance matrices for each distribution 10. The different means and the single covariance matrix that determine the discriminant were estimated from the training data using maximum likelihood.
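To make the position features of Section III-A concrete, the following Python sketch evaluates a discrete version of the cumulative position of Eq. (1) for an object given as a list of voxel coordinates. It is a minimal illustration and not the implementation used in this work; the function name and the choice to include the query voxel's own slab (coordinates less than or equal to the query) are assumptions.

```python
def cumulative_position(voxels, point):
    """Fraction of object voxels at or before `point` along x, y and z.

    `voxels` is an iterable of (x, y, z) integer coordinates at which the
    segmentation image I equals 1; `point` is the (x, y, z) query voxel.
    """
    voxels = list(voxels)
    total = len(voxels)  # V in Eq. (1): number of voxels in the object
    if total == 0:
        raise ValueError("object contains no voxels")
    return tuple(
        sum(1 for v in voxels if v[axis] <= point[axis]) / total
        for axis in range(3)
    )


def make_block(nx, ny, nz):
    """Toy object: a solid nx x ny x nz block of voxels."""
    return [(x, y, z) for x in range(nx) for y in range(ny) for z in range(nz)]


if __name__ == "__main__":
    small = make_block(4, 2, 2)
    large = make_block(8, 4, 4)
    # Corresponding locations in objects of different size.
    print(cumulative_position(small, (1, 0, 1)))
    print(cumulative_position(large, (3, 1, 3)))
```

The two toy blocks differ in size, yet corresponding locations receive the same cumulative coordinates, which is the standardization property described in the text.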


Figure 2 Illustration of pulmonary lobar and segmental anatomy from actual results of the different steps of the segment segmentation algorithm on a randomly selected scan. From left to right: an axial slice, a coronal slice, a sagittal slice of the right lung and a sagittal slice of the left lung. The top row shows slices of the original scan. The second row shows the result of the lobe segmentation, in the bottom row the corresponding segment segmentation is shown.

B. Lung Segmentation
The lung fields were segmented with an automatic 3D algorithm comparable to the algorithms proposed by Sluimer et al. 11 and Hu et al. 12. This algorithm consists of the following steps:
1) First, a structure comprising the trachea and the main parts of the bronchial tree is found using region growing. A seed point for this structure is determined by searching for a connected region on axial slices below -950 HU that matches certain size, location and shape criteria. Starting at the top, each slice is examined until a suitable region is found. The seed is the point with the lowest Hounsfield value within this region. From this seed, the trachea and main stem bronchi are grown using explosion-controlled region growing. This means that the threshold applied is slightly increased in each iteration and the grown structure is frozen when its size increases by a factor of two compared to the previous threshold. Such a volume increase (the explosion) indicates that the bronchi have merged with the lung parenchyma. The first threshold used is -950 HU. From the determined structure, the point with the lowest Hounsfield value is taken as seed point for the next step.
2) From the new seed point, region growing is applied to find the lungs. Optimal thresholding as described by Hu et al. 12 is used to determine the upper threshold for this region growing operation.
3) After the lungs are grown, the trachea and bronchi found in the first step are removed from the result to obtain only the lungs. Sometimes only one connected component is found. In those cases, the left and right lung are separated only by a low-contrast junction line formed by the pleura. To separate the lungs in this case, dynamic programming in axial slices is applied, similar to Hu et al. 12.
4) As a final step for the lung segmentation, each lung is smoothed separately using 3D hole filling and morphological closing with a spherical structuring element of size 11 to include vessels in the segmentation and smooth the borders.
Example results of the lung segmentation can be seen in Figures 2 and 4.

Figure 3 Example of the fissure segmentation: (a) the original CT slice, (b) result of supervised enhancement, (c) the final fissure segmentation.

C. Fissure Segmentation
After the lungs are segmented, the fissures in the lungs are detected. Fissure segmentation is initiated by applying a supervised enhancement filter as described in 13. The result of the enhancement filter is, for each voxel inside the lungs, a probability that it belongs to a fissure. Next, the major and minor fissures are extracted from the enhanced image by grouping neighboring voxels into plates. In this section, the enhancement filter is described first, followed by a description of the grouping process that leads to the final segmentation of the major and minor fissures.

1) Fissure enhancement: The fissures inside the lungs are enhanced by applying a supervised enhancement filter 13. To be able to train the fissure enhancement filter, examples of positive and negative voxels are needed. All voxels of the manually segmented fissures (see Section IV) were used as positive examples. An equal number of points elsewhere in the lungs were selected as negative examples. Since the voxels around the fissures are the most challenging negative voxels, half of the


negative voxels were forced to be within 5 voxels of the manually drawn fissures; the other half was more than 5 voxels away. For each voxel in the training data set, a set of features is calculated that describes certain characteristics of that voxel. The set of features is chosen in such a way that the characteristics that are important for the structure to be enhanced are represented in the features. Since the fissures are bright plates in the lungs, the filter used the eigenvalues of the Hessian matrix (|λ0| ≥ |λ1| ≥ |λ2|) and gray-value information (L and L0) as features (see Section III-A). All features except L0 were calculated on two scales, i.e. σ = 1 and σ = 3 voxels. This results in 9 features in total: λ0, λ1, λ2 and L at two scales, plus L0. Based on the features and class labels for each voxel in the training set, a KNNC was trained to assign to unseen voxels a probability that they belong to a fissure. On the training data, the value of k was varied between 1 and 45 and found to be optimal at 15. To apply the fissure enhancement filter developed in the training stage to a test image, each voxel inside the lungs is classified by the trained classifier. The result of the fissure enhancement is, for each voxel inside the lungs, a probability that it belongs to a fissure. An example result of the enhancement filter can be seen in Figure 3b.

2) Fissure segmentation: Since we are only interested in the major and minor fissures, i.e. the boundaries between the lobes, the enhanced fissures are converted to a segmentation of the major and minor fissures. To accomplish this, the probabilities resulting from the fissure enhancement (i.e. posterior probabilities) are first thresholded at 0.5 to eliminate all voxels with a low probability of being on a fissure. Next, since fissures are plates in the lungs, the remaining voxels are grouped into plates using a method inspired by the technique described in 14,15.
This grouping method joins neighboring voxels based on their likelihood of constituting a plane given second-order image information. As described in Section III-A, the direction of the largest curvature is perpendicular to the plate and is described by the eigenvector v̂0 of the Hessian matrix. We employ v̂0 calculated at scale σ = 1 to group voxels belonging to the same plate. For a voxel va, all voxels vb within a distance d are considered to be on the same plate if 1) the directions of v̂0 are similar for both voxels and 2) the voxels belong to the same plate (and not to parallel plates). The first condition is checked by taking the inner product of the normalized eigenvectors v̂0 at the locations of va and vb; if va and vb have similar orientation, the product will be close to one. The second condition is checked by taking the outer product of the normalized difference vector ŵ between the locations of va and vb and v̂0 of va; if the voxels are on the same plate, ŵ is perpendicular to v̂0 and the magnitude of the outer product will be close to one. To discard accessory fissures and spurious detections, only plates of sufficient size were retained. The algorithm as described above contains four parameters that need to be set: 1) the maximum distance d between the two voxels under consideration, 2) a threshold Tv on the product of the two eigenvectors, the closer Tv is to 1 the less curvature is allowed in the resulting plates, 3) a threshold Tw on the product between the difference vector and the eigenvector of the grouped


vector, the closer Tw is to 1 the less chance that two parallel plates are considered the same plate and 4) the minimum number of voxels in a plate to be considered a major or minor fissure. The final result of the fissure segmentation is a set of plates in the lungs which form the major and minor fissures. An example of the final segmentation result can be seen in Figure 3c.

D. Lobe Segmentation

The lobes cannot be obtained directly from the fissure segmentations; the fissures are often incomplete or invisible on the CT scan or may not be detected correctly by our algorithm. To cope with this, we segment the lobes using a supervised approach. Lobe segmentation is applied to each lung separately. To train the lobe segmentation, we used the voxels in the training data for which a segment label is known; this segment label is converted to a lobe label. See Section IV for a detailed description of the labeled voxels. Next, for each voxel a set of features is calculated which are important for lobe segmentation. Based on the data available, i.e. the lung and fissure segmentations, the most important feature for a voxel is its position relative to the fissures. We captured this relative position using the direction and the distance to the closest point on the fissures (see Section III-A). This results in four features: the direction in the x, y and z direction and the distance. In cases where the fissure segmentation is not complete, additional information can be obtained from the position inside the lungs. To capture the position inside the lung we employed the cumulative position in the lung as features (see Section III-A). The advantage of using cumulative positions is that locations inside the lungs are mapped to a standardized coordinate system. With the cumulative positions in the x, y and z directions added to the feature set, 7 features were used in total. To be able to apply the lobe segmentation to test data, the feature set created using the training data is used to train an LDC.
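The 7-element feature vector described above can be sketched as follows. This is a hedged illustration, not the code used in this study. Two points are assumptions: the direction to the closest fissure point is stored as a unit vector, and the cumulative position along an axis is taken to be the fraction of lung voxels whose coordinate is less than or equal to that of the voxel (the defining equation from Section III-A is not reproduced here). All function names are hypothetical.

```python
import math

def fissure_features(voxel, fissure_points):
    """Direction (x, y, z) and distance to the closest fissure point: 4 features.

    The direction is normalised to unit length, which is an assumption.
    """
    closest = min(fissure_points, key=lambda p: math.dist(p, voxel))
    distance = math.dist(closest, voxel)
    if distance == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    direction = tuple((c - v) / distance for c, v in zip(closest, voxel))
    return direction + (distance,)

def cumulative_position(voxel, lung_voxels):
    """Per axis, the fraction of lung voxels at or below the voxel's coordinate.

    Assumed definition: it maps every lung to the range (0, 1] on each axis.
    """
    n = len(lung_voxels)
    return tuple(
        sum(1 for p in lung_voxels if p[axis] <= voxel[axis]) / n
        for axis in range(3)
    )

def lobe_feature_vector(voxel, fissure_points, lung_voxels):
    """Concatenate the 4 fissure features and the 3 cumulative positions."""
    return fissure_features(voxel, fissure_points) + cumulative_position(voxel, lung_voxels)

# Toy lung of 4 x 2 x 2 voxels with a single fissure point.
lung = [(x, y, z) for x in range(4) for y in range(2) for z in range(2)]
print(lobe_feature_vector((1, 0, 0), [(3, 0, 0)], lung))
```

Because the cumulative positions are fractions of the lung volume, they stay comparable between subjects with lungs of different size, which is the standardisation the text refers to.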
The result of the lobe segmentation will be for each voxel a set of probabilities that it belongs to each lobe. To obtain the final segmentation result, the posterior probabilities were first post-processed by a Gaussian blurring with a scale σ = 2 voxels. This has the effect of pooling local evidence and smoothing the lobe borders. Next, each voxel is assigned to the lobe with the highest probability. A result of the lobe segmentation can be seen in the second row of Figure 2.

E. Segment Segmentation

As a final step, each voxel is assigned to a pulmonary segment. Since there are typically no physical boundaries between the segments we must use more indirect evidence and we attempted to exploit the fact that the spatial arrangement between the segments in a lobe is relatively constant. Segment segmentation is performed per lobe using a voxel classification approach similar to the one described in Section III-D. As training data for the segment segmentation the locations from the training data for which a segment label is known were used (see Section IV for the description of the training data). Because for the assignment of a segment label to a voxel its position inside the


Figure 4 Typical results of the automatic segment segmentation for 3 (the columns) randomly selected scans on different cross-sectional slices. Slices have been chosen in such a way that all segments are shown in each direction at least once.


lobe is most important, the cumulative position features are used as features with the object I in Eq. 1 being the lobe segmentation. In addition to the position inside the lobe, the position relative to the lobar fissures adds useful information. Therefore the distance and direction to the closest point on the fissures were added as features, which makes 7 features in total. An LDC was trained to assign to each voxel in a test image a probability that it belongs to each segment. The posterior probabilities resulting from the classification were blurred with a scale σ = 1 voxel before each voxel was assigned to the segment with the highest posterior probability. Examples of the segment segmentation can be seen in the bottom row of Figure 2 and in Figure 4.
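The post-processing of the posterior probabilities (Gaussian blurring followed by assignment to the most probable label) can be illustrated with the sketch below. For brevity it works on a one-dimensional row of voxels, whereas the method described here blurs three-dimensional probability maps. The kernel radius of 3σ and the renormalisation at the borders are assumptions of this example, and the function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at 3 sigma (assumed radius)."""
    radius = int(3 * sigma)
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(values, sigma):
    """Gaussian-smooth a list of probabilities, renormalising at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    smoothed = []
    for i in range(len(values)):
        acc = 0.0
        norm = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - radius
            if 0 <= idx < len(values):
                acc += w * values[idx]
                norm += w
        smoothed.append(acc / norm)
    return smoothed

def label_from_posteriors(posteriors, sigma=1.0):
    """posteriors: dict mapping label -> per-voxel probabilities along a row.

    Returns, per voxel, the label with the highest smoothed posterior.
    """
    smoothed = {label: smooth(p, sigma) for label, p in posteriors.items()}
    n = len(next(iter(smoothed.values())))
    return [max(smoothed, key=lambda lab: smoothed[lab][i]) for i in range(n)]

# One isolated voxel favours label "B"; smoothing pools evidence from its neighbours.
row = {
    "A": [0.9, 0.9, 0.9, 0.3, 0.9, 0.9, 0.9],
    "B": [0.1, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1],
}
print(label_from_posteriors(row, sigma=1.0))
```

The isolated voxel in the middle of the row would be labelled "B" without smoothing. After blurring it follows its neighbours, which is the pooling of local evidence mentioned in the text.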

IV Data

For this study data was taken from the NELSON study, a Dutch lung cancer screening trial 16 with low dose CT (30 mAs at 120 kV for patients weighing ≤ 80 kg and 30 mAs at 140 kV for those weighing over 80 kg). Data was acquired on Mx8000IDT or Brilliance-16 CT scanners (Philips Medical Systems, Cleveland, OH) in about 12 seconds in spiral mode with 16 × 0.75 mm collimation and 15 mm table feed per rotation (pitch = 1.3). Axial images of 1.0 mm thickness at 0.7 mm increment were reconstructed using a moderately soft kernel (Philips "B") with the smallest field of view that included the outer rib margins at the widest dimension of the thorax. All scans were reconstructed to 512 × 512 matrices. In the lung cancer screening trial, all abnormal findings (mostly pulmonary nodules) are annotated by radiologists and for each finding the pulmonary segment in which it resides is recorded. From all scans obtained in our hospital between January 2004 and May 2006 containing more than 4 findings, 600 scans from different subjects were randomly selected. From these, 500 scans with 3439 findings were used as a training set for the segmentation system. The remaining 100 scans with 697 findings were set aside and used for testing. They were not used in any way during system development. For this test set, a second radiologist assigned a pulmonary segment to all findings. Moreover, the radiologist who annotated the findings for the screening data repeated the assignment; this second assignment was done in a blind fashion, i.e. without access to the results of the first reading. This allows us to compare the performance of the automatic system to estimates of both the inter- and intra-observer agreement. From the three annotations that exist for the evaluation data, a consensus set was constructed which contained only those lesions for which all three annotations agree on the segment label.
In total, the observers agreed on 467 out of 697 findings (67%); 275 of those lesions were in the right lung and 192 in the left lung. In total there are thus 4 sets of annotations available for evaluation of the automatic system:

1) the annotations from the screening trial, which will be referred to as the reference.
2) the annotations of the second radiologist, from which an estimate of the inter-observer variability is computed.

Table 1 Number of findings recorded in each segment.

Right lung
Segment   500 training scans   100 evaluation scans   Consensus data
1         154                  30                     27
2         239                  39                     26
3         221                  61                     52
4         215                  49                     34
5         278                  53                     15
6         305                  61                     56
7         19                   1                      0
8         168                  29                     16
9         214                  41                     26
10        126                  33                     23

Left lung
Segment   500 training scans   100 evaluation scans   Consensus data
1/2       219                  44                     31
3         173                  38                     21
4         128                  30                     11
5         100                  22                     8
6         242                  67                     52
8         200                  27                     19
9         311                  44                     24
10        127                  28                     26

Note – Number of findings recorded in each segment per lung in the 500 scans used for training the system, in the 100 scans used for evaluating the system (based on the reference annotations), and in the consensus data.


3) the second reading of the reference radiologist, from which an estimate of the intra-observer variability is computed.
4) the consensus set which contains only those lesions on which all three annotations agree.

The accuracy compared to the reference will be computed for the automatic system as well as for the human observers. In addition, the accuracy of the automatic system on the consensus set will be calculated. The number of findings in each segment of each lung is given in Table 1 for the 500 training scans, the reference annotations of the 100 test scans and the consensus data. For the fissure segmentation that is applied as one of the parts of the segment segmentation (see Section III), a set of 13 normal dose (120 kV, 100 to 150 mAs) inspiration CT chest scans of 13 different patients was randomly selected from clinical practice at our hospital as training data. This data was acquired on the same scanners and reconstructed with the same kernel as the data from the screening trial described above. In these scans a human observer manually indicated the lobar fissures in every fourth coronal slice. The fissures were segmented by clicking points on the fissure; between two points, a straight line was automatically drawn.
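The way the agreement figures and the consensus set follow from the annotation sets can be sketched in a few lines of Python. The labels below are invented toy values, not data from this study, and the function names are hypothetical.

```python
def agreement(reference, other):
    """Percentage of findings that received the same segment label in both lists."""
    same = sum(1 for r, o in zip(reference, other) if r == o)
    return 100.0 * same / len(reference)

def consensus_indices(*annotations):
    """Indices of the findings on which all annotation lists agree."""
    return [
        i for i, labels in enumerate(zip(*annotations))
        if len(set(labels)) == 1
    ]

# Toy segment labels for five findings (invented for illustration).
reference = [1, 2, 3, 6, 9]        # screening trial annotation
second_reader = [1, 3, 3, 6, 8]    # second radiologist (inter-observer)
repeat_reading = [1, 2, 3, 6, 8]   # reference radiologist, repeated (intra-observer)

print(agreement(reference, second_reader))    # inter-observer agreement in percent
print(agreement(reference, repeat_reading))   # intra-observer agreement in percent
print(consensus_indices(reference, second_reader, repeat_reading))
```

The accuracy of the automatic system is computed with the same agreement function, with the system output in place of the second list. On the consensus set only the findings returned by the second function are evaluated.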

V Experiments and Results

The lobe and segment segmentation systems were trained using the 500 scans in the training data set for which one annotation exists for each lesion. Next, the complete system was applied to the 100 scans in the evaluation set for which three annotations exist: the reference, the inter-observer and the intra-observer annotations. To obtain the segmentation of the images in their original size, the segmentation results were super-sampled to the original resolution using nearest neighbor interpolation. As described in Section III-C.2, four parameters need to be set for the task of fissure segmentation. For the current application d was set to 3, Tv and Tw were both set to 0.985 and the minimum number of voxels was set to 500. These settings were determined in pilot experiments. In Figure 4 typical results of the segment segmentation are shown for three scans in different cross-sectional slices. The slices were chosen in such a way that each segment is visible at least once in each direction. In Table 2 the confusion matrices for the left and right lung as produced by the automatic system and the estimates of inter- and intra-observer variability are given as compared to the reference for the complete data set of 100 scans (697 findings). In addition, computer performance on the consensus data only (467 findings) is listed. For the complete test data set, the accuracy of the automatic system was 81% for the left lung and 74% for the right lung. The total accuracy for the automatic segment segmentation was 77%. The inter-observer agreement was 75% for the left lung and 74% for the right lung; the intra-observer agreement was 81% for the left lung and 79% for the right lung. In the last section of Table 2 the confusion matrices for the automatic system as compared to the consensus data are shown. The accuracy for both lungs in this case was 88%. The lung segmentation that was applied as a first step was very successful; all 697 findings were assigned to the correct lung. The lobe segmentation that is applied as part of the segment segmentation system was evaluated using the same data as for the evaluation of the segment segmentation. In Table 3 the confusion matrices for the lobe segmentations for each lung are given. For the left lung, the total accuracy of the lobe segmentation as compared to the reference was 97%; for the right lung this was 90%. The inter-observer agreement for the lobe segmentation was 97% for the left lung and 87% for the right lung; the intra-observer agreements were 96% and 88%, respectively. On the consensus data the accuracy for the lobe segmentation of the automatic system was 98% for the left lung and 95% for the right lung.

VI Discussion and Conclusion

We have presented the first fully automatic approach to extraction of pulmonary segments from chest CT data. The results indicate that the system is able to assign a segment label to each voxel inside the lungs with an accuracy comparable to that of human observers. Not only are the overall results comparable to those of human observers, the confusion matrices as presented in Section V show that most of the errors produced by the automatic system are similar to disagreement between expert human observers. The system performs slightly better than an estimate of the inter-observer agreement and slightly worse than the estimated intra-observer agreement. One should note that the human observers assigned segment labels to the lesions in the 100 test scans in one session. Therefore it is possible that they were more consistent in their labelling than in a normal clinical setting. In general, the automatic system performed better for the left lung than for the right lung. Inspection of the confusion matrices shows that this is mainly due to relatively low accuracies obtained for segments 8, 9 and 10 in the right lung (56%, 66% and 71%, respectively). This is, however, comparable to performance of the human observers, especially the inter-observer agreement. The second observer labelled findings in segments 8, 9, and 10 with an accuracy of 59%, 67%, and 70%, respectively. The human observers and the automatic system assigned the lesions that were supposed to be in those segments to the same segments; segment 8 was mainly confused with segment 4, segment 9 was mainly confused with segments 6 and 8, and segment 10 was mainly confused with segment 6. Note that all those pairs of segments are neighboring. In addition, in the right lung segment 2 was often confused with segments 1 and 3 (15% and 11%, respectively) by the automatic system. This is comparable to the inter-observer agreement (15% and 13%, respectively). For both lungs, distinguishing segments 4 and 5 was problematic for both the human observers and the automatic system. For segment 4 the automatic system performed comparably to the human observers in both lungs; for segment 5, however, the automatic system had a much higher accuracy, compared to our reference, than the human observers. The human observers not only confused


Table 2 Confusion matrices for the segment segmentation in each lung.

Automatic system on full test data (697 findings)

Right lung (rows: reference segment, columns: classified segment)
Ref     1    2    3    4    5    6    7    8    9   10
1      90   10    0    0    0    0    0    0    0    0
2      23   63   11    0    0    3    0    0    0    0
3       2    0   79    5   14    0    0    0    0    0
4       0    0    4   79   13    4    0    0    0    0
5       0    0    7    7   80    4    0    0    2    0
6       0    6    0    5    5   77    0    2    0    5
7       0    0    0    0    0    0  100    0    0    0
8       0    0    0   19    7    0    7   56   11    0
9       0    0    0    0    0   12    0   22   66    0
10      0    0    0    0    0   13    3    0   13   71

Left lung (rows: reference segment, columns: classified segment)
Ref   1/2    3    4    5    6    8    9   10
1/2    88   10    2    0    0    0    0    0
3       5   81   14    0    0    0    0    0
4       0    7   59   24   10    0    0    0
5       4    0   10   86    0    0    0    0
6       0    0    8    0   81   11    0    0
8       0    0    0    0    5   77   18    0
9       0    0    0    0    2   12   84    2
10      0    0    0    0    3    0    4   93

Inter observer on full test data (697 findings)

Right lung (rows: reference segment, columns: classified segment)
Ref     1    2    3    4    5    6    7    8    9   10
1      93    7    0    0    0    0    0    0    0    0
2      15   67   13    0    0    5    0    0    0    0
3       2    1   95    0    2    0    0    0    0    0
4       0    2    0   82    4   12    0    0    0    0
5       0    0   34   24   30    8    2    2    0    0
6       0    4    0    0    0   93    0    0    0    3
7       0    0    0    0    0  100    0    0    0    0
8       0    0    3   10    3   14    4   59    7    0
9       0    0    0    0    0   19    0   12   67    2
10      0    0    0    0    0   27    0    0    3   70

Left lung (rows: reference segment, columns: classified segment)
Ref   1/2    3    4    5    6    8    9   10
1/2    91    9    0    0    0    0    0    0
3      24   68    5    0    3    0    0    0
4       3   27   47   10   10    0    3    0
5       5    4   32   50    9    0    0    0
6       0    0    0    0   92    0    3    5
8       0    0    0    3    7   79   11    0
9       0    0    0    0    7   32   57    4
10      0    0    0    0    7    0    0   93

Intra observer on full test data (697 findings)

Right lung (rows: reference segment, columns: classified segment)
Ref     1    2    3    4    5    6    7    8    9   10
1      90    7    3    0    0    0    0    0    0    0
2       8   87    0    0    0    5    0    0    0    0
3       0    7   88    2    3    0    0    0    0    0
4       0    2    0   72   14    8    0    2    0    2
5       0    0   26   11   55    4    0    2    2    0
6       0    5    1    5    0   85    0    1    1    2
7       0    0    0    0    0    0  100    0    0    0
8       0    0    0   10    4    0    0   79    7    0
9       0    0    0    0    0   10    0   12   71    7
10      0    0    0    0    0    9    0    0    3   88

Left lung (rows: reference segment, columns: classified segment)
Ref   1/2    3    4    5    6    8    9   10
1/2   100    0    0    0    0    0    0    0
3      21   68    5    3    3    0    0    0
4       0   20   57   10   10    0    3    0
5       5    0   36   50    9    0    0    0
6       2    0    2    0   92    2    2    0
8       0    0    4    4    0   74   18    0
9       0    0    0    0    0   18   77    5
10      0    0    0    0    0    0    0  100

Automatic system on consensus data only (467 findings)

Right lung (rows: reference segment, columns: classified segment)
Ref     1    2    3    4    5    6    7    8    9   10
1      93    7    0    0    0    0    0    0    0    0
2      16   84    0    0    0    0    0    0    0    0
3       2    0   82    4   12    0    0    0    0    0
4       0    0    0   91    3    3    0    0    3    0
5       0    0    0    7   93    0    0    0    0    0
6       0    1    0    2    4   89    0    0    0    4
7       0    0    0    0    0    0    0    0    0    0
8       0    0    0    0    0    0    6   81   13    0
9       0    0    0    0    0    0    0   15   85    0
10      0    0    0    0    0    0    5    0    5   90

Left lung (rows: reference segment, columns: classified segment)
Ref   1/2    3    4    5    6    8    9   10
1/2    90   10    0    0    0    0    0    0
3       0   90   10    0    0    0    0    0
4       0    0   64   36    0    0    0    0
5       0    0    0  100    0    0    0    0
6       0    0    5    0   85   10    0    0
8       0    0    0    0    6   82   12    0
9       0    0    0    0    4    0   96    0
10      0    0    0    0    4    0    4   92

Note – Numbers are percentages. The columns indicate the classification, the rows indicate the ground truth.
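A confusion matrix in the form used in Table 2, with one row per reference segment expressed in percentages, can be derived from paired labels as in the following sketch. The labels are invented toy values and the function names are hypothetical. Rounding each cell to the nearest integer is an assumption, which would also explain why a row does not always sum to exactly 100.

```python
from collections import Counter

def confusion_percentages(truth, predicted, labels):
    """Row-normalised confusion matrix: matrix[t][p] is the percentage of
    findings with reference label t that were classified as p."""
    counts = Counter(zip(truth, predicted))
    matrix = {}
    for t in labels:
        row_total = sum(counts[(t, p)] for p in labels)
        matrix[t] = {
            p: (round(100.0 * counts[(t, p)] / row_total) if row_total else 0)
            for p in labels
        }
    return matrix

def overall_accuracy(truth, predicted):
    """Percentage of findings whose classified label equals the reference label."""
    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    return 100.0 * correct / len(truth)

# Toy example with three segments (labels invented for illustration).
truth = [8, 8, 8, 8, 9, 9, 9, 9]
predicted = [8, 8, 8, 4, 9, 9, 9, 8]
matrix = confusion_percentages(truth, predicted, labels=[4, 8, 9])
for t, row in matrix.items():
    print(t, row)
print(overall_accuracy(truth, predicted))
```

A segment without any reference findings yields a row of zeros in this sketch, as is the case for segment 7 of the right lung in the consensus data.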

Table 3 Confusion matrices for the lobe segmentation in each lung.

Automatic system

Right lung (rows: reference lobe, columns: classified lobe)
Ref     1    2    3
1      91    9    0
2       5   90    5
3       2    8   90

Left lung (rows: reference lobe, columns: classified lobe)
Ref     1    3
1      98    2
3       3   97

Inter observer

Right lung (rows: reference lobe, columns: classified lobe)
Ref     1    2    3
1      90    1    9
2      18   70   12
3       2    3   95

Left lung (rows: reference lobe, columns: classified lobe)
Ref     1    3
1      95    5
3       1   99

Intra observer

Right lung (rows: reference lobe, columns: classified lobe)
Ref     1    2    3
1      89    2    9
2      14   76   10
3       2    4   94

Left lung (rows: reference lobe, columns: classified lobe)
Ref     1    3
1      95    5
3       3   97

Automatic system on consensus data

Right lung (rows: reference lobe, columns: classified lobe)
Ref     1    2    3
1      92    8    0
2       0   96    4
3       1    3   97

Left lung (rows: reference lobe, columns: classified lobe)
Ref     1    3
1     100    0
3       3   97

Note – The numbers are percentages. The columns indicate the classification, the rows indicate the ground truth. 1 indicates the upper lobe, 2 indicates the middle lobe and 3 the lower lobe.
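The overall lobe accuracies quoted in Section V can be related to the per-lobe values in Table 3 by weighting each lobe with its number of findings in Table 1. The sketch below does this for the automatic system on the full test data. The assignment of segments to lobes is not spelled out in the text; the mapping used here follows the standard anatomical convention and is therefore an assumption, as are the function names.

```python
# Segment-to-lobe mapping (assumed, standard anatomy): 1 = upper, 2 = middle, 3 = lower.
RIGHT_LOBE_OF_SEGMENT = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 3, 10: 3}
LEFT_LOBE_OF_SEGMENT = {"1/2": 1, 3: 1, 4: 1, 5: 1, 6: 3, 8: 3, 9: 3, 10: 3}

# Findings per segment in the 100 evaluation scans (Table 1).
RIGHT_FINDINGS = {1: 30, 2: 39, 3: 61, 4: 49, 5: 53, 6: 61, 7: 1, 8: 29, 9: 41, 10: 33}
LEFT_FINDINGS = {"1/2": 44, 3: 38, 4: 30, 5: 22, 6: 67, 8: 27, 9: 44, 10: 28}

def findings_per_lobe(findings, lobe_of_segment):
    """Sum the segment counts within each lobe."""
    totals = {}
    for segment, n in findings.items():
        lobe = lobe_of_segment[segment]
        totals[lobe] = totals.get(lobe, 0) + n
    return totals

def weighted_accuracy(per_lobe_accuracy, per_lobe_count):
    """Overall accuracy as the count-weighted mean of the per-lobe accuracies."""
    total = sum(per_lobe_count.values())
    return sum(per_lobe_accuracy[lobe] * n for lobe, n in per_lobe_count.items()) / total

right_counts = findings_per_lobe(RIGHT_FINDINGS, RIGHT_LOBE_OF_SEGMENT)
left_counts = findings_per_lobe(LEFT_FINDINGS, LEFT_LOBE_OF_SEGMENT)

# Diagonal of Table 3 for the automatic system on the full test data.
right_accuracy = weighted_accuracy({1: 91, 2: 90, 3: 90}, right_counts)
left_accuracy = weighted_accuracy({1: 98, 3: 97}, left_counts)
print(round(right_accuracy), round(left_accuracy))
```

Under these assumptions the weighted means round to 90% for the right lung and 97% for the left lung, in line with the accuracies reported for the automatic lobe segmentation.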

segments 4 and 5 with each other; segment 5 in the right lung and segment 4 in the left lung were also often confused with segment 3 of the respective lung. In the right lung this was probably due to incompleteness or absence of the minor fissure separating the upper and middle lobe. Since there is usually no minor fissure in the left lung, it is even more difficult to assign the correct segmental labels to positions near segmental borders. The fact that different segments were confused with segment 3 in each lung is due to the different orientation of segments 4 and 5 in the two lungs, as can be seen from Figure 4. When only the consensus data was used for evaluation, computer results improved to an accuracy of 88% for both lungs. For the consensus data the main


Figure 5 Example of PFO’s and the corresponding lobe segmentation as produced by the automatic lobe segmentation. The cross indicates the point marked in the nodule.

problems were the same as for the total evaluation; in the right lung segments 8 and 9 were the most difficult and in the left lung segment 4 was most often confused with segment 5. Table 1 shows that the distribution of findings over the different segments is not uniform. In particular, there is only one finding in segment 7 of the right lung in the test data, and there is no consensus for this finding among the observers. Segment 7 is relatively small and pulmonary nodules are apparently relatively rare in that part of the lungs. To obtain a qualitative evaluation of the automatic segment segmentation, an expert radiologist visually inspected the pulmonary segments extracted by the automatic system for several scans and found the result to be convincing. The quantitative evaluation performed in this paper is based on random points in a large set of scans and reflects the way the segments are used by radiologists: as a reference for precise localization of lesions in the lungs. The results from Tables 2 and 3 in combination with the images in Figures 2 and 4 show that the segmentation of the pulmonary segments in the evaluation set was successful. The performance of the lobe segmentation was evaluated using the same data and was comparable to both the intra- and inter-observer agreement. For the middle lobe in the right lung, the automatic system performed better than the human observers. This is probably due to the common absence or incompleteness of the minor fissure separating the upper and middle lobe. This is illustrated by the performance of the system on the consensus data; the accuracy of the automatic system for the middle lobe improved to 96%, which means that only 2 lesions were assigned to the wrong lobe. Both those lesions were assigned to the upper lobe instead of the middle lobe. We inspected all lesions that were assigned to the wrong lobe for the consensus data. For the left lung in total 3 out of 192 lesions were misclassified.
We found that all those points were perifissural opacities (PFO’s) 17, 18, i.e. nodules attached to a fissure. It is difficult to assign the correct label to such findings. A point that is marked inside the nodule might be on the other side of the fissure than the main body of the nodule. In Figure 5 two examples of PFO’s and the corresponding lobe segmentations are given. For the right lung 14 out of 275 lesions in total were misclassified of which 6 were PFO’s. Of the other errors, 3 positions were on the


Figure 6 Example result of the segment segmentation on a scan containing pathologic abnormalities. Despite the presence of abnormalities the result of the segment segmentation is comparable to the results on the screening data.

border between two lobes where no fissure was found, 4 errors were due to a segmentation error in the lobe segmentation and one error was a nodule on the border of the lungs which was not included in our lung segmentation. All scans were subsampled by a factor of 2 in each direction to reduce the amount of computation time required. The algorithm was implemented in C++, and the code was not optimized. It takes on average around 10 minutes to process a complete scan on a 3 GHz Pentium 4 machine. This time is subdivided into 1 minute for the lung segmentation, 4 minutes for the fissure segmentation, 2 minutes for the lobe segmentation and finally 3 minutes for the segment segmentation. It is important to note that in the test data used in the current study no significant pathologic abnormalities were present except for emphysema, since the data was taken from a lung cancer screening trial. The presence of emphysema does not appear to have a detrimental effect on system performance. We expect the system


to be fairly robust against pathologic abnormalities due to the combination of positional and fissure features. However, when an abnormality locally resembles a fissure or if the lung segmentation fails due to the presence of large abnormalities, the complete system is likely to produce poor results. A successful fissure enhancement filter is essential for calculating the distance to the fissures. Visual inspection showed that fissures that were visible on the CT data were always accurately detected. The fact that the training data for the fissure filter consisted of clinical dose scans while the test data here are low dose screening scans did not appear to have a detrimental effect on the fissure filter. Figure 6 shows an example result of the automatic segmentation on a scan containing substantial pathologic abnormalities. A quantitative evaluation of automatic segmentation of pulmonary segments in scans that contain abnormalities is an important topic for future work. Since fissures are often more than one voxel in width, some of the voxels that are used as negative examples will actually be fissure voxels. Because the number of such voxels is small compared to the total number of negative voxels, this will not have much influence on the performance of the system. Although the automatic segmentation of the pulmonary segments as presented in this paper is largely successful, the system might be further improved by including a segmentation of the bronchial tree and the arterial system. A scheme in which the bronchial tree and arterial system provide additional evidence in cases where the current system fails might be more robust. It is, however, not clear how the substantial variations that exist in bronchial tree anatomy between subjects influence the segmental boundaries.
In conclusion, this study has shown that it is feasible to assign with high accuracy pulmonary segment labels to voxels on the basis of a small set of features that encode a voxel’s position in the lungs, the lobes and relative to automatically detected fissures. The system was evaluated using random positions in a large set of scans from a screening trial. The results showed that the automatic system performed well, with an accuracy comparable to that of human experts.

Reference List

1. M. Prokop and M. Galanski, Eds. Spiral and multislice computed tomography of the body. Stuttgart, Germany: Thieme, 2003.
2. C. D. Laros, J. M. M. van den Bosch, C. J. J. Westermann, P. G. M. Bergstein, R. G. J. Vanderschueren, and P. Knaepen. Resection of more than 10 lung segments; a 30-year survey of 30 bronchiectatic patients. Journal of Thoracic and Cardiovascular Surgery 1988 vol. 95(1): 119–123.
3. N. E. T. T. R. Group. Patients at high risk of death after lung-volume reduction surgery. The New England Journal of Medicine 2001 vol. 345(15): 1075–1083.
4. L. Zhang, E. A. Hoffman, and J. M. Reinhardt. Atlas-driven lung lobe segmentation in volumetric x-ray CT images. IEEE Transactions on Medical Imaging 2006 vol. 25(1): 1–16.
5. J.-M. Kuhnigk, V. Dicken, S. Zidowitz, L. Bornemann, B. Kuemmerlen, S. Krass, H.-O. Peitgen, S. Yuval, H.-H. Jend, W. S. Rau, and T. Achenbach. New tools for computer assistance in thoracic CT part 1. Functional analysis of lungs, lung lobes and bronchopulmonary segments. Radiographics 2005 vol. 25: 525–536.
6. J. Wang, M. Betke, and J. P. Ko. Pulmonary fissure segmentation on CT. Medical Image Analysis 2006 vol. 10: 530–547.
7. I. C. Sluimer, A. M. R. Schilham, M. Prokop, and B. van Ginneken. Computer analysis of computed tomography scans of the lung: a survey. IEEE Transactions on Medical Imaging 2006 vol. 25(4): 385–405.
8. K. Hayashi, A. Aziz, K. Ashizawa, H. Hayashi, K. Nagaoki, and H. Otsuji. Radiographic and CT appearances of the major fissures. Radiographics 2001 vol. 21(4): 861–874.
9. T. Lindeberg. Scale-Space Theory in Computer Vision. Dordrecht, the Netherlands: Kluwer Academic Publishers, 1994.
10. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd ed. New York: John Wiley and Sons, 2001.
11. I. C. Sluimer, M. Prokop, and B. van Ginneken. Towards automated segmentation of the pathological lung in CT. IEEE Transactions on Medical Imaging 2005 vol. 24(8): 1025–1038.
12. S. Hu, E. A. Hoffman, and J. M. Reinhardt. Automatic lung segmentation for accurate quantitation of volumetric X-ray CT images. IEEE Transactions on Medical Imaging 2001 vol. 20(6): 490–498.
13. E. M. van Rikxoort, B. van Ginneken, M. Klik, and M. Prokop. Supervised enhancement filters: application to fissure detection in chest CT scans. IEEE Transactions on Medical Imaging 2008 vol. 27(1): 1–10.
14. S. N. Kalitzin, J. J. Staal, B. M. ter Haar Romeny, and M. A. Viergever. A computational method for segmenting topological point sets and application to image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001 vol. 23(5): 447–459.
15. J. J. Staal, M. D. Abràmoff, M. Niemeijer, M. A. Viergever, and B. van Ginneken. Ridge based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging 2004 vol. 23(4): 501–509.
16. D. M. Xu, H. Gietema, H. de Koning, R. Vernhout, K. Nackaerts, M. Prokop, C. Weenink, J. Lammers, H. Groen, M. Oudkerk, and R. van Klaveren. Nodule management protocol of the NELSON randomised lung cancer screening trial. Lung Cancer 2006 vol. 54(2): 177–184.
17. M. A. J. Klik, E. M. van Rikxoort, J. F. Peters, H. A. Gietema, M. Prokop, and B. van Ginneken. Improved classification of pulmonary nodules by automated detection of benign subpleural lymph nodes. Abstract International Symposium on Biomedical Imaging 2006: 494–497.
18. M. I. Ahn, A. McWilliams, S. MacDonald, S. Lam, and J. Mayo. Perifissural Opacities (PFO’s) Detected at Chest CT Screening for Lung Cancer. Abstract in Radiological Society of North America 2004: 324–324.

Chapter 10

Regional progression of emphysema in smokers; a quantitative assessment based on the pulmonary segments

Bartjan de Hoop, Eva M. van Rikxoort, Hester Gietema, Bram van Ginneken, Rob van Klaveren, Mathias Prokop

Abstract

Purpose: Distribution of emphysema is closely related to functional impairment, and knowledge about emphysema distribution is crucial for lung volume reduction surgery. We describe distribution and regional progression of emphysema in the pulmonary lobes and segments in smokers.

Methods: We included 2994 (former) heavy smokers participating in a lung cancer screening trial. All participants underwent Computed Tomography (CT) scanning at baseline and after a median follow-up of 710 days. Lungs, lobes and bronchopulmonary segments were automatically segmented. The percentage of voxels with an attenuation below -950 Hounsfield Units (HU) was used to quantify low-attenuation areas (%LAA950) as a measure of emphysema. Participants with a %LAA950 above the previously reported measurement variability (1.1% of total lung volume) were considered to suffer from emphysema.

Results: Emphysema was found in 738 participants at baseline. Most emphysema was found in both upper lobes, especially in the right apical segment. The average increase in emphysema was significantly higher for the emphysema patients than for the non-emphysema patients (P < .001). A discernible (i.e. >1.1%) increase in extent of emphysema was found in 152 of the emphysema patients. In those patients, median baseline emphysema in the individual lung segments was well correlated with median emphysema increase in those segments at follow-up (r = 0.85, P < .001).

Conclusion: There is not only lobar variability in the distribution of emphysema but also segmental variation, with the right apical segment being most affected. Progression of emphysema is correlated with the severity of emphysema at baseline, both at patient level and at regional level within the lungs.

Chapter 10 | Regional progression of emphysema in smokers; a quantitative assessment based on the pulmonary segments

Introduction

Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of morbidity and mortality worldwide 1. Emphysema is one of its major components and is present in the majority of patients with severe lung function impairment. Emphysema is anatomically defined as an abnormal permanent enlargement of the airspaces distal to the terminal bronchioles without fibrosis, so a definite diagnosis requires histology. However, computed tomography (CT) has been shown to quantify the extent of emphysema reliably and non-invasively, and can therefore provide information about the progression of disease over time 2. The technique has been extensively validated and used in many studies 3-11. CT further enables the visualization and quantification of emphysema distribution and of regional differences in progression of emphysema.

Emphysema is often inhomogeneously distributed throughout the lungs 12. The distribution pattern has been reported to be closely related to functional impairment and mortality caused by the disease 13-15. A predominantly basal distribution has been reported to be associated with greater impairment of FEV1 and a higher mortality compared to an apical predominance of emphysema. Knowledge about emphysema distribution is also important for treatment of severe emphysema with lung-volume-reduction surgery, as surgery is only beneficial in patients with predominantly upper lobe emphysema and will have no effect in patients with other emphysema distribution patterns 16.

Previous cross-sectional studies describing emphysema distribution often used table position to describe the location of emphysema. The natural boundaries between the lobes and segments, however, are mostly positioned oblique or parallel to the table. The longitudinal studies that evaluated progression of emphysema often used a global measure to quantify emphysema. Differences in regional progression of emphysema could therefore not be assessed.

In this study we used the normal anatomy of the lung to separate the different regions within the lung. The lungs are anatomically divided into lobes and subdivided into bronchopulmonary segments. The recognition of the individual lobes and segments by the computer makes it possible to objectively describe the distribution and progression of emphysema without crossing anatomical boundaries. The most common cause of emphysema is smoking. The purpose of this study was to describe the distribution of smoking-induced emphysema in a large cohort of heavy smokers and to describe regional differences in progression of emphysema over time.

Materials and Methods

Participants

The study was conducted among participants of the Dutch-Belgian lung cancer screening trial (NELSON) 17. All participants were current or former heavy smokers, fit enough to undergo surgery. The trial was approved by the Dutch Ministry of Health and by the ethics committee of each participating hospital. For this part of the study a waiver was received. Subjects meeting the inclusion criteria of having smoked a minimum of 16 cigarettes/day for 25 years or 11 cigarettes/day for 30 years were invited to participate in the study. Subjects with a moderate or poor self-reported health status and/or not able to climb two flights of stairs were excluded from participation. For this study, data from one of the participating centers (University Medical Center, Utrecht, The Netherlands) were used.

CT Imaging

All participants received low-dose CT screening. A 16-detector CT scanner was used (Mx8000 IDT or Brilliance 16P, Philips Medical Systems, Cleveland, OH). No intravenous contrast material was used. Exposure settings were 30 mAs at 120 kVp or 140 kVp, depending on weight. Scan data were obtained in spiral mode, with 16 x 0.75 mm collimation. Scans were made in full inspiration without spirometric gating. Axial images were reconstructed with 1.0 mm thickness and 0.7 mm increment, using the smallest field of view to include the outer rib margins at the widest dimension of the thorax. All scans were reconstructed with a soft kernel (Philips ‘B’) at a 512 x 512 matrix.

Lung densitometry is sensitive to CT number shifts due to, for example, X-ray tube aging. The CT scanner was therefore calibrated every week and a phantom was scanned as quality control before each data acquisition session 11. No significant shift in CT number occurred during the study period. The applied low-dose protocol may result in an increase in noise, which can influence densitometry results 18. All CT scans were therefore subjected to a specially developed noise-reducing filter 19 to minimize this influence.

Automated Determination of Anatomical Structures

All CT scans were automatically analyzed using in-house developed software (Image Sciences Institute, Utrecht, The Netherlands) 20. The software allows for automatic, accurate three-dimensional image analysis by segmentation of the lungs with an algorithm based on region growing and morphological processing. Segmentation failures were automatically detected based on statistical deviation from a range of volume and shape measurements. In those cases a multi-atlas based algorithm using non-rigid registration was applied to obtain correct segmentation results. Airways were separately excluded to ensure that only lung parenchyma was analyzed. The software was previously evaluated and reported to have successfully segmented 50 random CTs from the same lung cancer screening trial with accuracy comparable to a human observer 20. Still, cases with outlying lung volumes or lung densities were visually checked, and segmentation failures for which the error was estimated to be more than 5% in volume were excluded.

Next, the software automatically segmented pulmonary lobes and bronchopulmonary segments 21 (Figure 1). The fissures were first extracted by a filter. Subsequently each voxel was assigned to a pulmonary lobe based on its position in the lung and its position relative to the fissures. Finally, each lobe was subdivided into its bronchopulmonary segments based on the position of voxels within the lobe, relative to the detected fissures. Accuracy of the software is comparable to intraobserver variability 21.
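The lung segmentation described above is reported to rest on region growing and morphological processing. As a rough illustration of the region-growing idea only, the sketch below grows a region from a seed pixel over all connected pixels darker than a cut-off. The 2D toy grid, the -500 HU cut-off, the 4-connectivity and the function name are assumptions made for illustration; they are not taken from the thesis software, which works in three dimensions and adds error detection.

```python
from collections import deque


def region_grow(image, seed, max_hu=-500):
    """Return the set of (row, col) pixels connected to `seed` with value < max_hu.

    Toy 2D, 4-connected region growing; `max_hu` is an assumed cut-off.
    """
    rows, cols = len(image), len(image[0])
    if image[seed[0]][seed[1]] >= max_hu:
        return set()  # the seed itself is not lung-like
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and image[nr][nc] < max_hu:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Made-up 4 x 5 slice in HU: soft tissue (40), lung (about -850)
# and one isolated air pocket (-1000) that is not connected to the seed.
toy_slice = [
    [40, 40, 40, 40, 40],
    [40, -850, -860, 40, -1000],
    [40, -840, -870, 40, 40],
    [40, 40, 40, 40, 40],
]
lung = region_grow(toy_slice, seed=(1, 1))
print(sorted(lung))
```

The isolated pocket at position (1, 4) is dark enough but is not reached from the seed, which is the property that lets region growing separate the lungs from air outside the body.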


Figure 1 Illustration of a random CT with the lobe and segment segmentation as it was performed by the software. The top row shows the original scan, the middle row shows the different lobes and the bottom row the segments.

Emphysema Quantification and Patient Selection

The percentage of voxels with an attenuation below -950 Hounsfield units (HU) was used to quantify low-attenuation areas (%LAA950) as a measure of emphysema. Voxels with an attenuation below -950 HU in each lung, lobe or segment were automatically counted. The %LAA950 was then expressed as the percentage of low-attenuation voxels in each lung, lobe or segment. Objective quantification of emphysema with CT has been validated against pathology 8, 9 and has been shown to be more accurate and more reproducible than visual methods 22.

A threshold was used to select the emphysema patients in the cohort. CT quantification of emphysema is subject to variability due to, for example, noise 19 and level of inspiration 23. Healthy individuals may therefore have low but non-zero %LAA950 values. To date, no definite normal values are available. To ensure that only patients with real emphysema were included, we excluded all participants with an emphysema score within the variability range of the measurement. The upper limit of agreement for measurement variability was previously evaluated at our institution and reported to be 1.1% at the -950 HU threshold 11. This means that when two separate CT examinations are performed on a patient with no real change in emphysema, there is a 95% chance that the emphysema measurements will differ by no more than 1.1%. All participants with a baseline emphysema score of more than 1.1% were considered emphysema patients and included in the study.
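The density score defined above amounts to counting voxels below a cut-off per region. The following is a minimal sketch under stated assumptions: voxels are given as a flat list of HU values with a parallel list of region labels, the label strings are hypothetical, and the data are made up. The sketch also computes the 15th percentile density (Perc15), the alternative measure this chapter reports alongside %LAA950; the nearest-rank rule used for it is an assumption, since the text does not specify an interpolation method.

```python
VARIABILITY_LIMIT = 1.1  # upper limit of agreement (%), as quoted in the text


def laa_percent(hu_values, threshold=-950):
    """Percentage of voxels with attenuation strictly below `threshold` HU."""
    if not hu_values:
        raise ValueError("region contains no voxels")
    low = sum(1 for hu in hu_values if hu < threshold)
    return 100.0 * low / len(hu_values)


def laa_by_region(hu_values, labels, threshold=-950):
    """%LAA per region label (lung, lobe or segment)."""
    grouped = {}
    for hu, label in zip(hu_values, labels):
        grouped.setdefault(label, []).append(hu)
    return {label: laa_percent(v, threshold) for label, v in grouped.items()}


def perc15(hu_values):
    """Highest HU among the 15% darkest voxels (nearest-rank rule, assumed)."""
    ordered = sorted(hu_values)
    k = max(1, (15 * len(ordered)) // 100)  # integer arithmetic avoids float rounding
    return ordered[k - 1]


def has_emphysema(whole_lung_laa):
    """Selection rule from the text: score above the variability limit."""
    return whole_lung_laa > VARIABILITY_LIMIT


# Made-up voxels; "RUL-1" and "RLL-10" are hypothetical segment labels.
hu = [-980, -900, -960, -870, -955, -800, -940, -900]
labels = ["RUL-1"] * 4 + ["RLL-10"] * 4
scores = laa_by_region(hu, labels)
total = laa_percent(hu)
print(scores, total, perc15(hu), has_emphysema(total))
```

Because the selection rule uses a strict inequality, a participant scoring exactly 1.1% falls within the variability range and is not classified as an emphysema patient.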

To exclude differences in emphysema values due to measurement variability, the same strategy was used in the follow-up of emphysema patients: patients were considered to have a significant increase in emphysema when the difference between the baseline and the follow-up CT was more than the variability in the measurement, i.e. more than 1.1%.

An alternative method to quantify emphysema is the 15th percentile (Perc15) technique 10. This technique provides the maximum HU of the 15% of voxels with the lowest lung density. The %LAA950 and Perc15 are very similar as they are based on the same principle. Still, no definite consensus exists as to which method is preferred, and both are used in current literature to quantify emphysema. Due to this lack of consensus we also present the correlation for the degree of emphysema in the segments at baseline and follow-up calculated with the Perc15.

Statistical Evaluation

Mean and standard deviation (SD) values were calculated for normally distributed data; for non-normally distributed data, median and 25th – 75th percentile values were used. The degree of emphysema in each lobe was compared to each of the other lobes in patients with emphysema. The degree of emphysema in each lobe was averaged over all emphysema patients and compared to the values of each of the remaining lobes with a paired Student’s t-test. The significance level was adjusted for multiple comparisons with the Holm-Bonferroni method 24. This approach was also used to test for segmental differences. The correlation between the degree of emphysema at baseline and the rate of emphysema progression was evaluated in the emphysema patients that had discernible progression of emphysema. For those patients, the median baseline degree of emphysema in each of the segments was correlated to the median progression in each of the segments, using the Pearson correlation coefficient. Medians were used because baseline %LAA950 was not normally distributed.

Results

In total 2994 participants (2478 men, 516 women) received baseline screening. After exclusion of 60 cases with no follow-up CT available and 17 cases for which the software failed to produce an accurate segmentation, 2917 cases remained for whom a baseline CT and at least one follow-up CT were available. Mean age of the participants included in the analysis was 58 years (SD 5.5 years), with a mean smoking history of 41 pack-years (SD 18 pack-years).
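The Holm-Bonferroni adjustment named under Statistical Evaluation can be sketched as a step-down procedure: sort the p-values, compare the smallest with alpha/m, the next with alpha/(m-1), and stop at the first comparison that fails. With five lobes each compared to every other lobe there are ten pairwise tests. The function name, the overall alpha of 0.05 and the p-values below are assumptions for illustration; the chapter does not list its individual p-values.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null hypothesis is rejected.

    Step-down Holm procedure; `alpha` is the family-wise error rate (assumed 0.05).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # this and all larger p-values are retained
    return reject


# Hypothetical p-values from four pairwise comparisons.
p = [0.001, 0.04, 0.03, 0.012]
decisions = holm_bonferroni(p)
print(decisions)
```

In this made-up example 0.03 and 0.04 would each pass an unadjusted 0.05 level, but neither survives the adjustment, because 0.03 is compared with 0.05/2 and the procedure stops there.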

Baseline Emphysema

In total 738 participants had a %LAA950 above 1.1 at baseline screening and were considered emphysema patients. Median baseline %LAA950 in the emphysema patients was 2.18 (1.45-3.95). Median %LAA950 values for the left and right lung were 2.36 (1.67-3.88) and 1.90 (1.19-4.00), respectively. In emphysematous patients the left upper lobe was most affected compared to all other lobes (P < .004). Median %LAA950 in the left upper lobe (LUL) was 2.64. The lowest degree of emphysema was found in the right lower lobe. Within the upper lobes, the right apical segment was most affected (P < .001). Median %LAA950 in the right apical segment was 4.33, while it was 2.85 in the apicoposterior segment of the left upper lobe, 1.33 in the right posterior segment, 1.31 in the left anterior segment and 1.26 in the right anterior segment. Three other segments that had relatively high %LAA950 values were the left inferior segment of the upper lobe (2.86), the left anteriormedial basal segment (3.86) and the medial basal segment on the right (2.63). The lowest %LAA950 values were found in the lateral segment of the right middle lobe, both superior segments of the lower lobes, both lateral basal segments and both posterior basal segments (Table 1, Figure 2).

Table 1 Median baseline emphysema in each of the pulmonary lobes and bronchopulmonary segments.

Lobe                 Baseline %LAA950    25th – 75th percentile
Left upper lobe      2.64                1.75 – 4.72
Right upper lobe     2.12                0.96 – 5.19
Right middle lobe    1.52                0.60 – 3.34
Left lower lobe      1.76                1.05 – 3.04
Right lower lobe     1.29                0.68 – 2.54

Right lung
Lobe          Segment                 Segment nr.    Baseline %LAA950    25th – 75th percentile
Upper lobe    Apical                  1              4.33                1.73 – 11.17
Upper lobe    Posterior               2              1.33                0.57 – 4.05
Upper lobe    Anterior                3              1.26                0.47 – 4.34
Middle lobe   Lateral                 4              0.45                0.13 – 1.48
Middle lobe   Medial                  5              1.95                0.79 – 4.43
Lower lobe    Superior                6              0.57                0.29 – 1.37
Lower lobe    Medial basal            7              2.63                1.19 – 5.06
Lower lobe    Anterior basal          8              1.53                0.48 – 4.45
Lower lobe    Lateral basal           9              0.52                0.17 – 1.19
Lower lobe    Posterior basal         10             0.40                0.15 – 0.96

Left lung
Lobe          Segment                 Segment nr.    Baseline %LAA950    25th – 75th percentile
Upper lobe    Apicoposterior          2              2.85                1.51 – 5.32
Upper lobe    Anterior                3              1.31                0.51 – 4.65
Lingula       Superior                4              2.15                1.00 – 4.67
Lingula       Inferior                5              2.86                0.93 – 5.78
Lower lobe    Superior                6              0.99                0.56 – 1.91
Lower lobe    Anteriormedial basal    8              3.86                1.69 – 6.91
Lower lobe    Lateral basal           9              0.58                0.19 – 1.54
Lower lobe    Posterior basal         10             1.16                0.52 – 2.29

Note – Both names and numbers are given for the segments.

Figure 2 Median baseline emphysema in the individual lobes and bronchopulmonary segments in the emphysema patients. Error bars represent the 25th and 75th percentile.

Progression of Emphysema

The median time between the baseline and follow-up CT was 710 days (364 days – 731 days) for both emphysema and non-emphysema patients. The average increase in emphysema was significantly higher in the emphysema patients than in the non-emphysema patients, 0.32 (SD 1.80) and 0.13 (SD 0.39) respectively (P < .001). A discernible increase (>1.1% of total lung volume) in %LAA950 was found in 21% (152/738) of the emphysema patients. The median difference in %LAA950 for these 152 participants was 2.16 (1.52-3.22). Baseline lung volume for all participants was 6452 ml. The median difference in lung volume between baseline and follow-up CT (lung volume on follow-up CT minus lung volume on baseline CT) was -93 ml (-358 ml – 181 ml). Median difference in lung volume for the participants with discernible progression of emphysema was 202 ml (-101 ml – 503 ml). The median baseline %LAA950 values for the individual lung segments showed a significant correlation with the median increase in %LAA950 in the individual lung segments at follow-up (r = 0.85, P < .001) (Figure 3). This correlation was confirmed when we repeated the analysis in the same 152 patients using the Perc15 method to quantify emphysema (r = -0.61, P = .007).

Figure 3 The correlation between baseline emphysema in each of the bronchopulmonary segments and the progression of emphysema in those segments. Values are given as the median baseline %LAA950 and median difference in %LAA950 in all 152 emphysema patients that had a discernible increase in emphysema between baseline and follow-up CT.

Discussion

The results of our study confirm the hypothesis that smoking-induced emphysema is mainly located in the upper lobes 12. However, a substantial amount of emphysema was also found in the lower parts of the lungs. That smoking-induced emphysema is not only located in the apical parts of the lungs was also reported by two other studies that plotted emphysema against table position and used the slope of the regression line to calculate the cranio-caudal distribution 25,26. Both patients with smoking-related emphysema and patients with emphysema due to α1-antitrypsin deficiency (AATD) were evaluated. Interestingly, these studies reported a mainly basal predominance in smokers, although the patients with AATD had a significantly stronger basal predominance of emphysema. One other study evaluated the distribution of emphysema using the pulmonary fissures to separate the different regions within the lungs. The lung lobes were segmented in 47 smokers and upper lobe predominance was seen in 70% of the cases 27. Our results show that smoking-related emphysema indeed has a predominant upper lobe distribution, but some regions within the lower lobes are also often substantially affected.

The fact that more emphysema was found than expected in some of the lower segments may be explained by the fact that a human observer mostly notices large bullae, which are often in the top of the lungs, while the computer looks at low-attenuation areas that are not necessarily part of a bulla. These low-attenuation areas prove to be frequent in the lower part of the lungs as well, where they are less easily recognized as emphysema by the human observer (Figure 4).

Figure 4 Example of a typical emphysema distribution pattern in a patient with large prominent emphysematous bullae in both upper lobes (top images) and more subtle emphysema in segment 5 on the right and in segment 8 on both sides (lower images). The original images are shown in the left column, the emphysema overlay is shown in the right column. Note that, despite the noise filter, some noise can still be seen in the apical regions of the lung.

Central to the pathogenesis of emphysema is an excess of proteases, which causes the destruction of elastin and thus severe alveolar breakdown 28. The best-known sources of proteases in the lung are inflammatory cells recruited to the airspaces by stimulators of inflammation, such as cigarette smoke. People that are susceptible to smoke can develop severe emphysema, while others remain almost unaffected. This was also reflected in our results, as progression was higher in patients that showed emphysema at baseline than in the participants without baseline emphysema (P < .001).

Progression of emphysema was previously evaluated by Mishima et al 29. They used statistical properties of clusters of low-attenuation areas to study the complexity of the terminal airspace geometry and reported lower complexity in emphysema patients compared to healthy subjects. They interpreted these results by using a large elastic spring model and reported that neighboring small clusters of low-attenuation areas tend to coalesce and form larger clusters as the weak elastic fibers separating them break under tension. This hypothesis was confirmed by Bakker et al. in patients with emphysema due to AATD; the more emphysema was distributed basally, the more progression was found in the basal area 25. Unlike our results, they did not find a correlation between baseline and follow-up distribution for smoking-induced emphysema. The differences in the results might be explained by the method of evaluation. Bakker et al. plotted emphysema in each CT slice against table position and used the slope of the regression line to describe the distribution. This method works when progression is predominantly in the upper or lower lobes, but becomes unstable when progression is more heterogeneous. The comparison of progression in the individual segments enabled a more precise evaluation of the location of emphysema at baseline and follow-up. A strong correlation was found between the baseline degree of emphysema in each segment and the progression of emphysema in that segment at follow-up (r = 0.85, P < .001). This shows that progression of smoking-induced emphysema is strongest in already affected areas.

The results of our study are in concordance with the finding that some smokers develop emphysema while others do not. Patients that already showed some emphysema on CT were more likely to develop more severe emphysema at follow-up, while subjects with no signs of emphysema were more likely to remain less affected. This is also true for regions within the lung. The stronger progression of emphysema in already affected regions may be important for possible treatment with lung volume reduction surgery, as it helps to predict which regions will progress most strongly over time.

A strong point of this study is the use of three-dimensional data and natural boundaries within the lung, which ensures that each region within the lung evaluated at baseline matches the region evaluated at follow-up, even if the patient is in a slightly different position on the table or at a different level of inspiration. This study was limited by the differences in lung volume between the baseline and follow-up scan. Median lung volume at follow-up was 202 ml larger than at baseline for the patients with discernible progression of emphysema. As a larger lung volume lowers the measured lung density, real progression may be slightly lower than measured. On the other hand, the increase in lung volume can also be the result of emphysema, as emphysema is also characterized by overinflation of the lungs. A second limitation was the use of non-gated CT which, through movement artifacts around the heart, may make emphysema measurements close to the heart less reliable. Beam hardening between the shoulders may also have influenced measurements in the apical segments of the lung.
Nevertheless, these limitations apply to both baseline and follow-up CT and are therefore unlikely to have influenced the evaluation of progression in this large group of subjects. In summary, we used a fully automatic method to quantify emphysema distribution in a cohort of (former) heavy smokers. There are segmental differences in distribution of smoking-induced emphysema with the upper lobes being most severely affected. Progression of emphysema was found to be related to severity of emphysema at baseline, being more pronounced in patients who already suffered from emphysema and in areas of the lung that were more affected beforehand.
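The segment-level correlation reported in this chapter (r = 0.85) was computed on per-segment medians, as stated under Statistical Evaluation: first take the median over patients within each segment, then correlate the resulting medians across segments. A minimal sketch of that two-step calculation follows. The segment names, the patient values and the function name are hypothetical and serve only to show the order of operations; they are not study data.

```python
from math import sqrt
from statistics import median


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: segment -> list of (baseline %LAA950, increase) per patient.
patients = {
    "segment A": [(4.0, 1.5), (6.0, 2.5), (2.0, 1.0)],
    "segment B": [(1.0, 0.4), (2.0, 0.6), (1.5, 0.5)],
    "segment C": [(0.4, 0.1), (0.6, 0.3), (0.5, 0.2)],
    "segment D": [(2.5, 0.6), (3.0, 1.0), (2.0, 0.8)],
}

# Step 1: one median per segment, taken over patients.
baseline_medians = [median(b for b, _ in v) for v in patients.values()]
increase_medians = [median(d for _, d in v) for v in patients.values()]

# Step 2: correlate the medians across segments.
r = pearson_r(baseline_medians, increase_medians)
print(baseline_medians, increase_medians, round(r, 3))
```

Note that the unit of analysis in step 2 is the segment, not the patient, so the correlation describes where in the lung progression concentrates rather than which patients progress.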


Reference List

1. Rabe KF, Hurd S, Anzueto A, Barnes PJ, Buist SA, Calverley P, Fukuchi Y, Jenkins C, Rodriguez-Roisin R, van Weel C, Zielinski J. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. Am J Respir Crit Care Med 2007 September 15;176(6):532-55.
2. Stolk J, Putter H, Bakker EM, Shaker SB, Parr DG, Piitulainen E, Russi EW, Grebski E, Dirksen A, Stockley RA, Reiber JHC, Stoel BC. Progression parameters for emphysema: A clinical investigation. Respiratory Medicine 2007 September;101(9):1924-30.
3. Arakawa A, Yamashita Y, Nakayama Y, Kadota M, Korogi H, Kawano O, Matsumoto M, Takahashi M. Assessment of lung volumes in pulmonary emphysema using multidetector helical CT: comparison with pulmonary function tests. Comput Med Imaging Graph 2001 September;25(5):399-404.
4. Kinsella M, Muller NL, Abboud RT, Morrison NJ, DyBuncio A. Quantitation of emphysema by computed tomography using a “density mask” program and correlation with pulmonary function tests. Chest 1990 February;97(2):315-21.
5. Madani A, Zanen J, de Maertelaer V, Gevenois PA. Pulmonary Emphysema: Objective Quantification at Multi-Detector Row CT--Comparison with Macroscopic and Microscopic Morphometry. Radiology 2006 January 19;2382042196.
6. Stern EJ, Frank MS. CT of the lung in patients with pulmonary emphysema: diagnosis, quantification, and correlation with pathologic and physiologic findings. AJR Am J Roentgenol 1994 April;162(4):791-8.
7. Thurlbeck WM, Muller NL. Emphysema: definition, imaging, and quantification. AJR Am J Roentgenol 1994 November;163(5):1017-25.
8. Gevenois PA, De Maertelaer V, De Vuyst P, Zanen J, Yernault JC. Comparison of computed density and macroscopic morphometry in pulmonary emphysema. Am J Respir Crit Care Med 1995 August;152(2):653-7.
9. Gevenois PA, De Vuyst P, De Maertelaer V, Zanen J, Jacobovitz D, Cosio MG, Yernault JC. Comparison of computed density and microscopic morphometry in pulmonary emphysema. Am J Respir Crit Care Med 1996 July;154(1):187-92.
10. Gould GA, MacNee W, McLean A, Warren PM, Redpath A, Best JJ, Lamb D, Flenley DC. CT measurements of lung density in life can quantitate distal airspace enlargement--an essential defining feature of human emphysema. Am Rev Respir Dis 1988 February;137(2):380-92.
11. Gietema HA, Schilham AM, van Ginneken B, van Klaveren RJ, Lammers JW, Prokop M. Monitoring of Smoking-induced Emphysema with CT in a Lung Cancer Screening Setting: Detection of Real Increase in Extent of Emphysema. Radiology 2007 September 1;244(3):890-7.
12. Thurlbeck WM, Muller NL. Emphysema: definition, imaging, and quantification. AJR Am J Roentgenol 1994 November;163(5):1017-25.
13. Gurney JW, Jones KK, Robbins RA, Gossman GL, Nelson KJ, Daughton D, Spurzem JR, Rennard SI. Regional distribution of emphysema: correlation of high-resolution CT with pulmonary function tests in unselected smokers. Radiology 1992 May;183(2):457-63.
14. Martinez FJ, Foster G, Curtis JL, Criner G, Weinmann G, Fishman A, DeCamp MM, Benditt J, Sciurba F, Make B, Mohsenifar Z, Diaz P, Hoffman E, Wise R, for the NETT Research Group. Predictors of Mortality in Patients with Emphysema and Severe Airflow Obstruction. Am J Respir Crit Care Med 2006 June 15;173(12):1326-34.
15. Parr DG, Stoel BC, Stolk J, Stockley RA. Pattern of Emphysema Distribution in α1-antitrypsin Deficiency Influences Lung Function Impairment. Am J Respir Crit Care Med 2004 August 11.
16. Fishman A, Martinez F, Naunheim K, Piantadosi S, Wise R, Ries A, Weinmann G, Wood DE. A randomized trial comparing lung-volume-reduction surgery with medical therapy for severe emphysema. N Engl J Med 2003 May 22;348(21):2059-73.
17. van Iersel CA, de Koning HJ, Draisma G, Mali WP, Scholten ET, Nackaerts K, Prokop M, Habbema JD, Oudkerk M, van Klaveren RJ. Risk-based selection from the general population in a screening trial: selection criteria, recruitment and power for the Dutch-Belgian randomised lung cancer multi-slice CT screening trial (NELSON). Int J Cancer 2007 February 15;120(4):868-74.
18. Yuan R, Mayo JR, Hogg JC, Pare PD, McWilliams AM, Lam S, Coxson HO. The Effects of Radiation Dose and CT Manufacturer on Measurements of Lung Densitometry. Chest 2007 August 1;132(2):617-23.
19. Schilham AM, van Ginneken B, Gietema H, Prokop M. Local noise weighted filtering for emphysema scoring of low-dose CT images. IEEE Trans Med Imaging 2006 April;25(4):451-63.
20. van Rikxoort EM, de Hoop B, Viergever MA, Prokop M, van Ginneken B. Automatic lung segmentation from thoracic computed tomography scans using a hybrid approach with error detection. Med Phys 2009 July;36(7):2934-47.
21. van Rikxoort EM, de Hoop B, van de Vorst S, Prokop M, van Ginneken B. Automatic segmentation of pulmonary segments from volumetric chest CT scans. IEEE Trans Med Imaging 2009 April;28(4):621-30.
22. Bankier AA, De Maertelaer V, Keyzer C, Gevenois PA. Pulmonary Emphysema: Subjective Visual Grading versus Objective Quantification with Macroscopic Morphometry and Thin-Section CT Densitometry. Radiology 1999 June;211(3):851-8.
23. Shaker SB, Dirksen A, Laursen LC, Skovgaard LT, Holstein-Rathlou NH. Volume adjustment of lung density by computed tomography scans in patients with emphysema. Acta Radiol 2004 July;45(4):417-23.
24. Holm S. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 1979;6:65-70.
25. Bakker ME, Putter H, Stolk J, Shaker SB, Piitulainen E, Russi EW, Stoel BC. Assessment of Regional Progression of Pulmonary Emphysema with CT Densitometry. Chest 2008 Nov;134(5):931-7.
26. Stavngaard T, Shaker SB, Bach KS, Stoel BC, Dirksen A. Quantitative Assessment of Regional Emphysema Distribution in Patients with Chronic Obstructive Pulmonary Disease (COPD). Acta Radiologica 2006;47(9):914-21.
27. Revel MP, Faivre JB, Remy-Jardin M, Deken V, Duhamel A, Marquette CH, Tacelli N, Bakai AM, Remy J. Automated lobar quantification of emphysema in patients with severe COPD. Eur Radiol 2008 December;18(12):2723-30.
28. Shifren A, Mecham RP. The stumbling block in lung repair of emphysema: elastic fiber assembly. Proc Am Thorac Soc 2006 July;3(5):428-33.
29. Mishima M, Hirai T, Itoh H, Nakano Y, Sakai H, Muro S, Nishimura K, Oku Y, Chin K, Ohi M, Nakamura T, Bates JH, Alencar AM, Suki B. Complexity of terminal airspace geometry assessed by lung computed tomography in normal subjects and patients with chronic obstructive pulmonary disease. Proc Natl Acad Sci U S A 1999 August 3;96(16):8829-34.


Chapter 11

Bartjan de Hoop, Firdaus Mohamed Hoesein, Pieter Zanen, Hester Gietema, Bram van Ginneken, Ivana Isgum, Christian Mol, Rob van Klaveren, Akkelies Dijkstra, H. Marike Boezen, Harry M. Groen, Dirkje S. Postma, Jan-Willem J. Lammers, Mathias Prokop

CT-quantified emphysema in heavy smokers: predictive value for rate of lung function decline

Abstract

Aim
To assess the predictive value of computed tomography (CT) detected emphysema for the rate of lung function decline in smokers during follow-up.

Methods
We included (former) heavy smokers participating in a lung cancer screening trial. All participants underwent CT and pulmonary function tests (PFT) at baseline. PFT was repeated after a median follow-up time of 3 years and included forced expiratory volume in 1 second (FEV1) and the FEV1/forced vital capacity ratio (FEV1/FVC). Emphysema quantification on CT was based on the 15th percentile (Perc15) technique. Analysis of covariance was used to evaluate the effect of CT-quantified emphysema.

Results
We included 2,087 males, mean age (SD) 59.8 (5.3) years. Median (25th-75th percentile) Perc15 at baseline was -937 HU (-948 HU to -924 HU). A lower Perc15 was correlated to a lower FEV1 (r = 0.12) and FEV1/FVC (r = 0.39) at baseline (P < .001) and was significantly but weakly related to a stronger decline in FEV1 and FEV1/FVC (P ≤ 0.002) over time. Baseline non-obstructive patients who showed obstruction at follow-up had a significantly lower Perc15 than those who remained non-obstructive.

Conclusion
More emphysema is related to lower baseline lung function and a stronger rate of lung function decline. This suggests that quantification of CT-detected emphysema identifies early manifestations of COPD in patients with a (still) normal lung function.

Chapter 11 | CT-quantified emphysema in heavy smokers: predictive value for rate of lung function decline

Introduction

Chronic obstructive pulmonary disease (COPD) is a common lung disease and likely to become the fourth ranking cause of death worldwide by 2020 1. Unfortunately, it remains underdiagnosed, especially in elderly people 2. Tobacco smoking is the main risk factor for developing COPD, but not every smoker will develop COPD 3. COPD is characterized by a mixture of small airway disease (obstructive bronchiolitis) and parenchymal destruction (emphysema), and their relative contribution varies between individuals 4. Emphysema is frequently found on computed tomography (CT) in lung cancer screening trials and can be present without airway obstruction 5. Whether the presence of emphysema is a risk factor for an accelerated lung function decline in these (still) non-obstructive smokers has not been established so far. This is important, however, because smoking cessation offers an opportunity to stabilize the disease.

Objective quantification of emphysema can be performed using lung densitometry. Lung densitometry uses the relationship between tissue density and its X-ray attenuation on CT. Destruction of alveoli in persons with emphysema results in a lower lung density. Objective quantification of emphysema with lung densitometry has been shown to be more accurate and more reproducible than visual methods 6. The purpose of the study was to assess the severity of CT-detected emphysema in a lung cancer screening trial at the start of the study and to investigate the correlation between this emphysema and lung function decline in heavy smokers.
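The relationship between attenuation and the CT number used in lung densitometry can be written out. The expression below is the standard definition of the Hounsfield unit, given here as background rather than taken from this chapter; the symbols are assumptions for illustration, with \mu(x) the linear attenuation coefficient at voxel x and \mu_{\text{water}}, \mu_{\text{air}} those of water and air.

```latex
\[
  \mathrm{HU}(x) \;=\; 1000 \times
  \frac{\mu(x) - \mu_{\text{water}}}{\mu_{\text{water}} - \mu_{\text{air}}}
\]
```

Under this definition water maps to 0 HU and air to -1000 HU, so lung tissue that has lost alveolar walls and contains relatively more air moves toward -1000 HU.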

Material and Methods

Participants

The study was conducted among participants of the Dutch-Belgian Lung Cancer Screening Trial (NELSON), a population-based CT screening trial that studies current and former heavy smokers fit enough to undergo surgery 7. The trial was approved by the Dutch Ministry of Health and by the ethics committee of each participating hospital. Written informed consent was obtained from each participant. Detailed information on smoking habits (duration of smoking, number of cigarettes per day and, if applicable, duration of smoking cessation) was collected through questionnaires at baseline. Participants meeting the inclusion criteria of having smoked a minimum of 16 cigarettes/day for 25 years or 11 cigarettes/day for 30 years were invited to participate. Since males are more likely to meet these inclusion criteria, male participants were recruited first. Participants with moderate or poor self-reported health status or unable to climb two flights of stairs were excluded. For this study, data from two of the participating centers were used (University Medical Center Utrecht and University Medical Center Groningen, The Netherlands).

CT Scanning and Quantification of Emphysema

All participants received low-dose CT without intravenous contrast injection. At both screening sites 16-detector MDCT scanners were used (Mx8000 IDT or Brilliance 16P, Philips Medical Systems, Cleveland, OH, or Sensation-16, Siemens Medical Solutions, Forchheim, Germany). Scan data were obtained in spiral mode, with 16x0.75mm collimation and in full inspiration. No spirometric gating was applied since it has been reported that this does not improve repeatability of lung density measurements 8, 9. Axial images were reconstructed with 1.0mm thickness at 0.7mm increment. All scans were reconstructed with a soft reconstruction filter (Philips B, Siemens B30f) at a 512x512 matrix. Exposure settings were 30mAs at 120kVp or 140kVp, depending on the participant’s weight. This low-dose CT protocol was applied in order to reduce the risk of radiation-induced neoplasms. The low-dose technique has been shown to have no significant effect on the CT quantification of emphysema 10, 11 and was previously used to quantify emphysema in COPD patients and heavy smokers 12-14.

All CT scans were automatically analyzed. The software allows for automatic, high-precision 3-dimensional image analysis by segmentation of the lungs with an algorithm based on region growing and morphological processing 15. Segmentation failures were automatically detected based on statistical deviation from a range of volume and shape measurements. In those cases a multi-atlas based algorithm using non-rigid registration was applied to obtain a correct segmentation. Airways were excluded to ensure that only lung parenchyma was analyzed. All CT examinations were recalibrated using the air in the trachea to ensure comparability between the two centers. Air calibration is critical in multicenter lung densitometry studies and incorporation of a correction factor is essential for quantitative image analysis 16.

Emphysema quantification was based on the 15th percentile (Perc15) technique. This technique provides the cut-off value in Hounsfield units (HU) below which 15% of all voxels are distributed. The lower the Perc15 value, i.e. the closer to -1000HU, the more emphysema is present. This method of emphysema quantification has been validated against pathology 17 and has been used in multiple studies 18.
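The Perc15 measure and the tracheal air recalibration described above can be sketched in a few lines. This is a minimal illustration, not the software used in the study: the function names are hypothetical, the recalibration is assumed to be a plain offset that maps the measured tracheal air value to -1000HU (the text does not specify the correction), and the voxel values are synthetic.

```python
import numpy as np


def recalibrate_to_tracheal_air(hu_values, tracheal_air_mean, nominal_air=-1000.0):
    """Shift HU values so that measured tracheal air maps to nominal_air.

    ASSUMPTION: a plain offset correction. The source only states that
    scans were recalibrated on tracheal air, not how this was done.
    """
    return np.asarray(hu_values, dtype=float) + (nominal_air - tracheal_air_mean)


def perc15(lung_hu_values):
    """Return the HU value below which 15% of the lung voxels fall."""
    return float(np.percentile(np.asarray(lung_hu_values, dtype=float), 15))


# Synthetic voxel samples (illustrative only, not study data).
rng = np.random.default_rng(0)
reference_lung = rng.normal(loc=-860.0, scale=40.0, size=10_000)
low_attenuation = rng.normal(loc=-960.0, scale=15.0, size=5_000)
emphysema_like = np.concatenate([reference_lung, low_attenuation])

print("Perc15, reference sample:     ", round(perc15(reference_lung), 1))
print("Perc15, emphysema-like sample:", round(perc15(emphysema_like), 1))
```

Adding low-attenuation voxels to the sample pulls the 15th percentile towards -1000HU, which matches the interpretation that a lower Perc15 indicates more emphysema. In practice the input would be the segmented lung parenchyma with airways excluded, after recalibration.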

Pulmonary Function Tests

Pulmonary function tests (PFT) included forced expiratory volume in one second (FEV1), forced vital capacity (FVC), the FEV1/FVC ratio and maximum expiratory flow at 50% of FVC (MEF50). Reversibility was not assessed. PFT was performed on the same day as the baseline screening CT and was repeated after three years of follow-up. All participants with an airflow obstruction, i.e. an FEV1/FVC [...] 70% (66.4%). Further participant demographics are provided in Table 1. Median follow-up time between the first and the last lung function test was 3 years (2.9 – 3.1).

Baseline Emphysema

Overall mean baseline Perc15 was -934.9HU (19.5). Participants with COPD had a mean Perc15 of -942.6HU (17.6), compared to -931.0HU (19.2) for participants without COPD (P < .001) (Figure 1). Increasing age was correlated with lower Perc15 values (r = -0.21, P < .001). Former smokers had on average higher Perc15 values than current smokers: -929.7HU (19.5) and -941.3HU (17.3), respectively (P < .001). There was no significant correlation between baseline Perc15 and the number of pack-years (P = .171).

Baseline Lung Function

Mean baseline FEV1/FVC was 72.2% (9.4), mean FEV1 was 3.4 L (0.77), which is 98.5% (18.5) of predicted, and mean MEF50 was 3.2 L/s (1.39), or 69.9% (29.7) of predicted. Participants with lower Perc15 values had lower lung function values: FEV1/FVC (r=0.39, p