Users’ Guide to the Urological Literature How to Use a Systematic Literature Review and Meta-Analysis Timothy Y. Tseng,*,† Philipp Dahm,† Rudolf W. Poolman, Glenn M. Preminger, Benjamin J. Canales and Victor M. Montori From Duke University Medical Center, Durham, North Carolina (TYT, GMP), University of Florida, College of Medicine, Gainesville, Florida (PD, BJC), Department of Orthopedic Surgery, Onze Lieve Vrouwe Gasthuis, Amsterdam, The Netherlands (RWP), and the Knowledge and Encounter Research Unit, Department of Medicine, Mayo Clinic, Rochester, Minnesota (VMM)

Purpose: This article introduces practicing urologists to the critical appraisal of systematic reviews and meta-analyses to guide their evidence-based clinical practice.

Materials and Methods: Using a urological clinical case scenario we introduce a 3-step process for evaluating systematic reviews and meta-analyses by considering 1) the validity of the review results, 2) what the results are, and 3) the extent to which the results can and should be applied to patient care.

Results: A systematic review seeks to synthesize the medical literature about a specific clinical question using explicit methods to perform a comprehensive literature search, identify and select eligible studies, critically appraise their methods, and judiciously summarize the results considering how they vary with study characteristics. When this summary involves statistical methods, ie a meta-analysis, reviewers can offer a pooled estimate that will have greater precision and will apply more broadly than the individual studies. The quality of the underlying studies, the consistency of results across studies and the precision of the pooled estimate can considerably affect the strength of inference from systematic reviews.

Conclusions: Valid systematic reviews of high quality studies can increase the confidence with which urologists and patients make evidence-based decisions. Thus, urologists need to recognize the inherent limitations, understand the results and apply them judiciously to patient care.

Key Words: evidence-based medicine, urology, review literature as topic, meta-analysis as topic

Evidence-based clinical practice requires practitioners to consider the best available evidence when making decisions. When multiple studies of a specific clinical question exist, the available evidence from all of these studies should be considered critically in patient decision making. This task is greatly facilitated by the availability of systematic summaries of the evidence.1 In contrast to narrative reviews, which often seek to provide a broad overview of an ad hoc selection of studies that may reflect a particular viewpoint, so-called systematic reviews seek to answer a focused question by summarizing studies identified and selected using explicit, protocol driven procedures with attention to the validity of each study. Meta-analysis refers to the statistical methods reviewers use to arrive at a pooled estimate from a collection of studies. Meta-analyses are particularly credible when they are applied to studies identified and selected in the context of a systematic review rather than to an arbitrary set of studies. Systematic reviews are becoming increasingly popular in the urological literature as a means of summarizing the results of multiple clinical trials that are frequently small, imprecise and nondefinitive. It is important to realize that such reviews are not without limitations. The quality of a systematic review is critically dependent on the quality of the primary studies being reviewed. Furthermore, the authors of a systematic review need to follow defined methodological standards to ensure that the conclusions drawn are valid. To distinguish well conducted systematic reviews that are likely to yield valid conclusions from those of lesser quality, urologists should be familiar with these standards. This article introduces the reader to a conceptual framework for critically appraising a systematic review and, thereby, determining the validity of its results, what the results are, and the extent to which they can and should be applied to the care of an individual patient (Appendix 1).

Submitted for publication April 28, 2008. Nothing to disclose. * Correspondence: DUMC Box 2922, Division of Urology, Department of Surgery, Duke University Medical Center, Durham, North Carolina 27710 (telephone: 919-684-5693; FAX: 919-681-7423; e-mail: [email protected]). † Equal study contribution.

0022-5347/08/1804-1249/0 THE JOURNAL OF UROLOGY® Copyright © 2008 by AMERICAN UROLOGICAL ASSOCIATION. Vol. 180, 1249-1256, October 2008. Printed in U.S.A. DOI:10.1016/j.juro.2008.06.046

Editor's Note: This article is the second of 5 published in this issue for which category 1 CME credits can be earned. Instructions for obtaining credits are given with the questions on pages 1578 and 1579.

THE PROCESS OF CONDUCTING A SYSTEMATIC REVIEW

To better understand how to interpret and apply a systematic review it is helpful to understand how one is conducted, as summarized in Appendix 2. Investigators should begin by defining a focused clinical question, and identifying specific inclusion and exclusion criteria for the patient population, as well as the intervention, outcome and study methodology. They should then conduct
a comprehensive search for published and unpublished studies designed to capture all available evidence on the question of interest. The predefined inclusion and exclusion criteria are then applied to the identified studies. Next, often working in pairs to limit bias, reviewers assess the methodological quality of each study and extract data using structured forms. Finally, investigators pool the data and perform additional analyses as appropriate. Performing a high quality systematic review is therefore a multistep, often labor intensive undertaking that requires more than 1 investigator and multiple procedures to safeguard against the intrusion of bias. The conduct and reporting of systematic reviews should follow the standards set forth by the Cochrane Collaboration and the Quality of Reporting of Meta-Analyses (QUOROM) statement, which has recently been renamed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).2,3 To the extent that reviewers follow proper methods, the review will provide valid conclusions with limited bias.

CLINICAL SCENARIO

You are a general urologist in a multipartner private practice group with more than 20 years of clinical experience. Recently your group hired a new partner, who has introduced a number of new approaches to your group, including the routine use of medical expulsive therapy for patients with symptomatic urolithiasis, which he claims is highly effective. Having acquired a healthy skepticism over the years toward everything that claims to be new and better, you decide to research the clinical effectiveness of medical expulsive therapy on your own. The clinic visit of a 53-year-old, nonseptic patient who presents with left side renal colic and is subsequently diagnosed with a 5 mm distal ureteral stone prompts you to follow through with this plan. Therefore, you set out to perform a literature search for high quality evidence to inform your clinical decision making on whether medical expulsive therapy would be effective in your patient.

THE SEARCH

Recalling what you learned from a previous article in the Users' Guide to the Urological Literature series by Krupski et al on how to search the medical literature effectively, you apply the PICOT mnemonic, which stands for population, intervention, comparison, outcome and type of study, to develop the focused clinical question that you wish to answer.4 In patients with symptomatic ureteral stones (P), how does treatment with medical expulsive therapy using α-blockers or calcium channel blockers (I) compare to placebo (C) in improving stone-free rates (O) when investigated in a randomized controlled clinical trial (T)? To perform the literature search you go to PubMed® as a freely available public resource. A search combining the terms "ureteral stones" and "alpha blockers" with the AND operator yields 20 citations (search performed April 5, 2008). Using the filter for randomized controlled trials under "limits" you identify 4 such studies. Realizing that you would ideally like to find a systematic review that combines the evidence from all available randomized controlled trials, you instead use the clinical query for systematic reviews, which identifies 4 studies. Among these is a systematic review and meta-analysis published by Hollingsworth et al in The Lancet in 2006 that addresses exactly your question.5 Intrigued by this finding, you decide to retrieve the study for a careful review.

STUDY SUMMARY

The study by Hollingsworth et al that you identified is a systematic review and meta-analysis addressing the question of whether α-blockers or calcium channel blockers are more effective than placebo in helping patients pass urinary stones.5 The authors pooled the results of 9 randomized controlled trials that enrolled 693 patients, and found stone passage rates of 78% and 47% in the medical therapy and no medical therapy arms, respectively. The main conclusion of this study was that medical therapy increases the chance of passing ureteral stones.

ARE THE RESULTS VALID?

Did the Review Explicitly Address a Sensible Clinical Question? To provide meaningful information regarding treatment recommendations, systematic reviews and the underlying primary studies must address a focused clinical question.6 A question that is too broad (for example, comparing the stone passage rate resulting from such diverse interventions as medical therapy, shock wave lithotripsy and ureteroscopic stone fragmentation to the rate with no treatment) would not be useful because these interventions are likely associated with widely varying stone-free rates. Thus, pooling results across such therapies would yield estimates that do not accurately reflect the true efficacy of any of the 3 interventions (commonly referred to as pooling apples and oranges) and, therefore, may be meaningless (the pooled result will be a mango, which represents neither the apples nor the oranges). On the other hand, if reviewers focus the question too narrowly, they may find that no studies can answer their specific question. Therefore, to address a sensible clinical question a systematic review should clearly define the range of patients, interventions and outcomes to be considered. The provision of such explicit eligibility criteria facilitates the interpretation of the study results and the decision of whether those results are applicable to patients in one's clinical practice. Predefined inclusion and exclusion criteria also guard against the arbitrary selection of studies that support a certain hypothesis. Hollingsworth et al explicitly mention that inclusion and exclusion criteria were established before the literature search.5 They included randomized controlled trials in which an α-blocker or calcium channel blocker was used as the main therapy, and excluded studies in which medical therapy was examined as an adjuvant to surgical intervention.

Some readers at this point may consider it problematic to pool results across studies of α-antagonists and calcium channel blockers. For this to be reasonable we need to postulate a common biological link between the use of these agents and stone expulsion. As the authors of the review explain, calcium channel blockers and α1-blockers cause relaxation of the pelvic-ureteral smooth muscle wall by a combination of spasmolytic action and coordination of peristaltic activity within the ureter. Since this seems plausible, we conclude that this review addresses a sensible, ie sufficiently focused, clinical question.

Was the Search for Relevant Studies Detailed and Exhaustive? To include the entire body of evidence for a given clinical question and avoid the omission of relevant studies, a good systematic review must involve a thorough search for published and unpublished studies, including those in non-English journals. There is ample evidence, including data from the urological literature, that compared with studies with statistically significant results, studies with negative findings (ie studies that report not finding a significant result) either do not get published at all or, when published, appear in obscure journals long after they were conducted.7–9 Despite controversies regarding the inclusion of such studies due to their potentially lower methodological quality, the omission of unpublished studies may lead to publication bias, in which studies with positive results are systematically overrepresented, thereby leading to an overestimation of the treatment effect.10,11 Overviews based on a small number of studies with few events are the most susceptible to publication bias. Potential publication bias can be explored in a number of ways, including visual representation in the form of an inverted funnel plot, which plots the effect size against the sample size of the individual studies.12 An inverted, funnel-shaped symmetrical appearance of dots suggests that no study has been left out, whereas an asymmetrical distribution suggests possible publication bias (fig. 1). In this case one typically observes an absence of negative studies of low sample size, whereas larger studies are more likely to be published regardless of the direction of their results.
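As a rough illustration, the small-study asymmetry that a funnel plot shows visually can also be checked numerically. The sketch below uses invented study data (the sample sizes and risk ratios are hypothetical, not taken from any published review) and compares the mean log risk ratio of small versus large trials; a clearly positive excess in the small trials is the numerical counterpart of an asymmetrical funnel.

```python
import math

# Hypothetical studies: (sample size, risk ratio). These numbers are
# invented for illustration only -- they are not the trials in the
# Hollingsworth review.
studies = [
    (40, 2.10), (55, 1.90), (80, 1.75), (120, 1.70),
    (200, 1.55), (400, 1.50), (800, 1.48),
]

# A funnel plot graphs effect size (x axis, log scale) against sample
# size (y axis). Here we approximate the visual asymmetry check
# numerically: small trials reporting systematically larger effects
# than big trials hint at publication bias against small negative trials.
def small_study_excess(studies, cutoff=100):
    small = [math.log(rr) for n, rr in studies if n < cutoff]
    large = [math.log(rr) for n, rr in studies if n >= cutoff]
    return sum(small) / len(small) - sum(large) / len(large)

excess = small_study_excess(studies)
print(f"mean log RR excess in small trials: {excess:.3f}")
```

With these invented data the small trials show a positive excess, the pattern one would see as asymmetry in panel B of figure 1. This crude two-bin comparison is only a teaching device; formal asymmetry tests (such as regression-based approaches) are used in practice.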
FIG. 1. Inverted funnel plot to detect publication bias. Sample size is plotted against treatment effect (risk ratio). A, symmetric distribution of studies with no evidence of publication bias. B, if small negative trials with large variances are not included, the plot will appear asymmetrical, suggesting publication bias against such negative trials.

However, the funnel plot is of little use when there are few studies or when the studies are inconsistent in their results.13,14

Hollingsworth et al describe a search strategy that appears to have been fairly exhaustive.5 The authors searched for English and non-English language publications within a prespecified time frame (1981 to 2005), identified their primary search databases (MEDLINE®, Pre-MEDLINE, CINAHL® and EMBASE®), listed their search terms and asked a medical librarian to perform an independent search to confirm its completeness. In addition, they hand searched abstract proceedings, corresponded with the first and senior authors of published trials, and contacted major drug companies. In summary, the outlined methods represent a good example of how a systematic literature search should be conducted and reported.

Were the Primary Studies of High Methodological Quality? The quality of a systematic review is only as good as that of the primary studies from which the data are derived. Pooling data from different studies increases the sample size, thereby improving the precision of the effect size estimate. However, it does nothing to enhance the accuracy of the study findings. Therefore, it is critically important that the authors of a systematic review consider the methodological quality of the individual eligible studies. Although there is currently no single definitive instrument for assessing methodological quality, a number of criteria have been empirically associated with decreased bias and improved validity in the assessment of primary studies. For randomized trials, for example, these include random and concealed allocation; blinding of patients, practitioners, data collectors, data analysts and outcome assessors; intent to treat analysis; and completeness of followup, which have been reviewed in detail previously.15

The study by Hollingsworth et al included an assessment of the methodological quality of the eligible randomized controlled trials, although this information was not used to include or exclude any of the studies. They found that of the 9 randomized controlled trials, 3 identified the method of randomization, none described concealed allocation and only 1 study applied blinding. Meanwhile, all studies included in the meta-analysis used intent to treat analysis and reported a low percentage of patients lost to followup. Thus, the reviewers appropriately concluded that the methodological quality of the primary studies was limited, which raises concerns about the validity of the findings.

Were Assessments of Studies Reproducible? For a systematic review to be valid its results must be reproducible. When investigators make judgments about the
eligibility of primary studies and about their methodological quality, they could introduce bias. Therefore, to limit the intrusion of bias arising from the preconceived notions or agendas of reviewers, the assessment of eligibility, the assessment of study quality and the abstraction of data from the individual studies should follow a well planned and standardized process in which 2 or more investigators assess each study independently. The investigators' responses can then be compared and used to generate measures of interobserver agreement. A high level of agreement speaks to the reproducibility of the judgments. A valuable measure of interobserver agreement is the kappa statistic, which evaluates agreement beyond chance. Values of 0.4 to 0.6 indicate moderate agreement, values of 0.6 to 0.8 indicate substantial agreement and values of 0.8 to 1.0 indicate a high degree of agreement. Further discussion of how the kappa statistic is calculated is beyond the scope of this article and the interested reader is referred to the article by McGinn et al.16 Hollingsworth et al inform us that "two reviewers independently extracted data from every study using a standardized form."5 They further report that reviewers were blinded to the source of the publication and author names. With regard to interobserver agreement the authors state that "inconsistencies were resolved through discussion until a consensus was reached." While it would have been preferable for the reader to be given a sense of how frequently the reviewers disagreed and on which variables, it is unclear how important this would have been in the context of this systematic review.

Summary of the Validity Assessment of the Review Before moving on to an analysis of the results and their applicability to the care of individual patients, it is helpful to summarize the results of the validity assessment.
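To make the kappa statistic concrete, the minimal sketch below computes Cohen's kappa for two hypothetical reviewers screening 10 candidate studies; the include/exclude judgments are invented for illustration and do not come from the Hollingsworth review.

```python
# Illustrative only: Cohen's kappa for two reviewers' include/exclude
# judgments on 10 hypothetical candidate studies.
reviewer_a = ["in", "in", "out", "in", "out", "in", "out", "out", "in", "in"]
reviewer_b = ["in", "in", "out", "in", "in", "in", "out", "out", "out", "in"]

def cohens_kappa(a, b):
    n = len(a)
    # Observed agreement: fraction of studies rated identically.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: probability that both raters independently
    # pick the same label, summed over all labels.
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa(reviewer_a, reviewer_b)
print(f"kappa = {kappa:.2f}")
```

Here the reviewers agree on 8 of 10 studies, but because roughly half that agreement is expected by chance alone, kappa comes out near 0.58, ie moderate rather than substantial agreement.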
If the study results are unlikely to be valid, it would not be worth the time and effort to review the results any further. In this particular case the study by Hollingsworth et al addressed a sensible and clinically relevant question.5 The authors performed a detailed and exhaustive search for relevant studies to guard against publication bias. However, they were unable to identify studies of high methodological quality. For example, few studies described the method of randomization, none described a concealed allocation process and only 1 study was blinded. These findings raise concerns that the observed effect size might be exaggerated. Nevertheless, these concerns are not grave enough to discount the study entirely. Therefore, we will examine the results of the meta-analysis in a similarly systematic fashion.

WHAT ARE THE RESULTS?

Were the Results Similar From Study to Study? A major aim of performing a systematic review and meta-analysis is to increase the precision of the estimates of effect (ie the effect of α-blockers and calcium channel blockers in promoting the passage of ureteral stones) of several individual studies by combining them as if all patients were part of 1 larger study. However, individual studies are likely to vary in many ways, including their patient populations, interventions and outcomes. This is referred to as clinical heterogeneity. When considering pooling surgical trials, possible differences in surgical technique also need to be taken into account. Seemingly small differences in surgical interventions may yield large and clinically relevant differences in outcomes. Therefore, clinical expertise is critical in guiding the decision of whether it is appropriate to pool study results.17 Study results may also vary because of methodological heterogeneity due to differences in study design (ie blinded and nonblinded studies). Finally, a certain amount of variation due to chance, called statistical heterogeneity, is expected, especially if the sample size is small and the event rate is low. The validity of pooled results is inversely related to the extent of inconsistency in the results across trials. In some cases the inconsistency can be such that reviewers may deem it insensible to conduct a meta-analysis. Criteria for this judgment should be established before the data analysis phase. There are 2 criteria to consider when deciding whether the results are sufficiently similar to warrant a single estimate of treatment effect across populations, interventions and outcomes: 1) the similarity of the estimates of treatment effect (ie the point estimates) from the individual studies and 2) the extent to which the confidence intervals overlap (fig. 2).12 As the distance between point estimates increases and the confidence intervals cease to overlap, inferences drawn from pooled estimates become weaker. Conversely, stronger inferences from the pooled estimate result when the individual point estimates are similar and the confidence intervals overlap widely (fig. 2). If there are considerable differences between the point estimates and the confidence intervals of the various studies, investigators should seriously question the validity of pooling the results.
In the presence of relatively minor differences between study results, investigators can perform statistical testing to assess whether the observed differences across studies are greater than might be expected due to chance alone. Two types of statistical procedures are commonly used, namely Cochran's Q-test as the traditional test for heterogeneity (also referred to as a chi-square test) and the increasingly popular I2 statistic.1,18,19 The Q-test null hypothesis states that all apparent heterogeneity is due to chance. The test then generates a probability (p) that differences in results between studies as large as or larger than those observed may have occurred simply by chance. The main limitation of this test is that it can lack power in the setting of few studies with small sample sizes and low event rates (or have excessive power in the context of many studies). The I2 statistic is intended to overcome these limitations of the Q-test by providing an estimate of the percentage of variability in results across studies that is likely to be due to true differences in treatment effect as opposed to chance, and it performs well with small sample sizes. For example, if I2 is 0%, chance is a likely explanation for any observed variability and pooling appears reasonable. As I2 increases, we should increasingly search for explanations other than chance alone for the observed variability.
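The relationship between Cochran's Q and I2 can be sketched in a few lines. The per-study log risk ratios and variances below are invented for illustration (they are not data from the review); Q is the variance-weighted sum of squared deviations from the fixed-effect pooled estimate, and I2 expresses how much of that exceeds what chance (the degrees of freedom) would predict.

```python
import math

# Hypothetical per-study log risk ratios and their variances -- invented
# numbers for illustration, not data from the Hollingsworth review.
log_rr = [0.80, -0.10, 0.62, 0.20, 0.50]
var = [0.04, 0.06, 0.09, 0.05, 0.07]

# Cochran's Q: weighted sum of squared deviations of each study's
# effect from the fixed-effect pooled estimate (weights = 1/variance).
w = [1 / v for v in var]
pooled = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)
q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, log_rr))

# I^2: percentage of total variability beyond what chance alone
# (Q equal to its degrees of freedom) would explain.
df = len(log_rr) - 1
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
print(f"Q = {q:.2f}, I^2 = {i2:.0f}%")
```

With these invented effects Q clearly exceeds its 4 degrees of freedom and I2 is roughly 58%, so more than half of the observed variability would call for an explanation other than chance, and pooling these studies would be questionable.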

What are the Overall Results of the Review? After determining that the results of a systematic review are likely to be valid, the next step is to consider the actual results of the review. Reviewers who decide not to pool data can present a summary of qualitative results in a table or in the text. Reviewers who decide to pool typically present the overall results as a summary pooled measure with its associated CI. Depending on the type of outcome variable (ie categorical or continuous), different measures of effect size may be used. Risk ratios and odds ratios are commonly used summary measures for categorical variables (eg rate of stone passage), whereas differences in means are often used for continuous variables (eg analgesic requirements in mg). A further discussion of the different measures of effect size is beyond the scope of this article. Individual study results and pooled results are typically presented graphically as a forest plot (fig. 2). Studies are typically weighted by the inverse of their variance (related to the number of participants and events), with larger studies with lower variance receiving more weight.20

FIG. 2. Results of a hypothetical systematic review and meta-analysis in the form of forest plots. In forest plots individual studies are typically plotted on the y-axis with the summary estimate at the bottom. The position of the marker for each study on the x-axis represents the point estimate of the treatment effect, and the size of the marker reflects the weight of the study. Confidence intervals (95% CI) are generally displayed as a horizontal line flanking the point estimate. The vertical midline denotes the line of no treatment effect that separates the portion favoring the intervention group from the portion favoring the control group. CIs that do not cross the no-effect line indicate a statistically significant difference between the 2 intervention groups; CIs that do cross the line indicate a statistically nonsignificant difference. The pooled estimate at the bottom of the chart (large diamond) provides the best guess of the underlying treatment effect. A, all point estimates favor treatment over control and the confidence intervals overlap; investigators performing a systematic review with these 7 studies would be satisfied that it is appropriate to pool. B, the point estimates are far apart and there is little overlap of the confidence intervals; taken together these 2 criteria suggest that it is not appropriate to pool these studies.

How Precise Were the Results? The determination of the precision of the result is as important as the overall weighted average. Precision is usually represented as a CI around a point estimate, which takes into account the average effect of the intervention, the standard deviation and the sample size. Although not the precise statistical definition, we can consider a 95% CI to represent a range of values that includes the true effect with 95% probability. However, 99% CIs (wider) and 90% CIs (narrower) are also used on occasion. In general, studies with smaller sample sizes and few events result in wider confidence intervals, while studies with larger sample sizes and more events result in narrower confidence intervals.
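The inverse-variance weighting and CI construction described above can be sketched for risk ratios. The 2x2 counts below are invented for illustration (they are not the trials analyzed by Hollingsworth et al); pooling is done on the log risk ratio scale, where the sampling distribution is approximately normal.

```python
import math

# Invented per-trial counts for three hypothetical trials:
# (treatment events, treatment n, control events, control n).
trials = [
    (30, 40, 18, 40),
    (50, 70, 32, 70),
    (22, 30, 14, 30),
]

# Fixed-effect inverse-variance pooling on the log risk ratio scale.
weights, effects = [], []
for a, n1, c, n2 in trials:
    rr = (a / n1) / (c / n2)
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2   # approximate var of ln(RR)
    effects.append(math.log(rr))
    weights.append(1 / var)                  # lower variance -> more weight

pooled_log = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
se = math.sqrt(1 / sum(weights))             # SE of the pooled log RR
rr_pooled = math.exp(pooled_log)
ci = (math.exp(pooled_log - 1.96 * se), math.exp(pooled_log + 1.96 * se))
print(f"pooled RR = {rr_pooled:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

Note how the pooled CI is narrower than any single trial could achieve with these sample sizes, which is precisely the gain in precision that motivates meta-analysis; the point estimate itself is only as trustworthy as the underlying trials.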
Results of the Meta-Analysis of Medical Therapy vs Observation for Ureteral Stones Hollingsworth et al chose the proportion of patients in each individual trial who passed a stone as the primary end point of their analysis and compared patients who had received medical therapy with those who had not.5 The calculated pooled risk ratio was 1.65 (95% CI 1.45–1.88), indicating a 65% increased chance of passing a stone on medical therapy. The associated CI informs us that we can be reasonably certain, with 95% probability, that the true benefit of medical therapy is an increased chance of passing a stone that lies between a low of 45% and a high of 88%. The authors subsequently tested for heterogeneity using the Q-test, which was not statistically significant (p = 0.196), and the I2 statistic, which was calculated to be 28%.5 This finding suggests that 28% of the observed variability cannot be explained by chance alone. They subsequently performed a number of subgroup analyses to identify sources of heterogeneity. For example, they separately pooled the studies that used tamsulosin without a calcium channel blocker (4), nifedipine alone (2) or nifedipine in combination with a corticosteroid (3) to investigate the presence of clinical heterogeneity. They also separately pooled the studies that described the method of randomization (3), had no loss to followup (6) or were published as full-text manuscripts. While the magnitude of effect varied, the authors observed a consistent, relatively large benefit of medical therapy across subsets, thereby supporting their primary conclusion that medical expulsive therapy increases the rate of stone passage. However, the key procedure for determining whether there is a subgroup effect is to test whether the estimate in 1 subgroup differs statistically from the estimate in another, sometimes referred to as the treatment-subgroup interaction test.21,22

In summary, the authors reported a large effect size that was measured with good precision and appears consistent across different subgroups. The pooled point estimate suggests that medical therapy with calcium channel blockers and α-adrenergic antagonists results in a substantial, clinically meaningful increase in the chance of stone passage. Particularly helpful in gaining perspective on the effect size is that the authors provide the reader with a number needed to treat. The NNT is an absolute effect size measure that describes how many patients, on average, need to be treated for 1 additional patient to have an event. Assuming the risk ratio of 1.65 observed in this study and a baseline stone passage rate of 47% in the control group, the NNT is 4 (95% CI 3–4). In light of these clinically relevant results we may proceed to the third step of the critical appraisal process, which asks whether the results can reasonably be applied to the care of our patients. Readers will have to be careful to estimate the NNT that corresponds to the stone passage rate in their own practice. If the spontaneous passage rate without medical therapy is greater, the NNT will be smaller; if it is smaller, the NNT will be larger.
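The NNT arithmetic just described is easy to verify. The sketch below uses the figures reported in the text (pooled RR 1.65, 47% spontaneous passage in the control group) and then varies the baseline rate to show why readers must substitute the rate seen in their own practice.

```python
import math

# NNT from a pooled risk ratio and a baseline (control) event rate.
# RR 1.65 and the 47% baseline are the figures reported in the text;
# the alternative baseline rates are hypothetical.
def nnt(risk_ratio, control_rate):
    treated_rate = min(risk_ratio * control_rate, 1.0)  # cap at 100%
    arr = treated_rate - control_rate   # absolute risk difference
    return math.ceil(1 / arr)           # NNT is rounded up by convention

print(nnt(1.65, 0.47))  # 4, matching the NNT reported in the review
print(nnt(1.65, 0.60))  # higher spontaneous passage rate -> smaller NNT: 3
print(nnt(1.65, 0.30))  # lower spontaneous passage rate -> larger NNT: 6
```

With a fixed risk ratio the absolute benefit scales with the baseline rate, which is why the same pooled RR of 1.65 can translate into quite different NNTs across practices; the CI of 3–4 reported in the review is not reproduced here.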

HOW CAN I APPLY THE RESULTS TO PATIENT CARE?

How Can I Best Interpret the Results to Apply Them to the Care of Patients in My Practice? Even when a meta-analysis demonstrates results in favor of a certain intervention, urologists should ensure that the findings are applicable to their own patients. One way to approach this issue of the external validity of a meta-analysis (and of any other clinical research study, for that matter) is to review the treatment settings, inclusion/exclusion criteria and patient characteristics. Instead of dismissing the results if your patient does not meet the exact enrollment criteria, ask whether there is any compelling reason why the findings, in general, would not be applicable to your patients. The systematic review by Hollingsworth et al included a total of 9 clinical trials with 491 patients.5 Mean patient age ranged from 34.4 to 46.5 years, 25% to 60% of the patients were women and all patients were treated in an outpatient setting. Mean stone size ranged from 3.9 to 7.8 mm, and all but 1 study included only patients with stones located in the distal third of the ureter. With regard to study settings, all but 1 study were performed outside the United States, with 4 studies originating from Turkey, 2 from Greece, 2 from Italy and 1 from Iran. As a result, evaluation for the presence of a stone relied on measures such as plain x-ray imaging and ultrasound, which are less accurate than what might be considered the standard of care (ie computerized tomography) in many countries, including the United States. However, despite the differences in diagnostic technology and treatment location, the results appear compelling enough to warrant incorporation into most practices.

Were All Clinically Important Outcomes Considered? In addition to the results, one must consider whether the studies have taken all important patient outcomes into account. While all well designed clinical studies should have a defined primary end point such as stone-free rate, other outcomes such as time to stone passage, number of pain episodes, analgesic requirements, and treatment costs, including those related to medications, followup visits and diagnostic imaging, are relevant to clinical decision making. Although this may appear intuitive, there is ample evidence in the medical research literature that authors tend to place less emphasis on the reporting of harm and correspondingly greater emphasis on the reporting of therapeutic benefit.23 Ideally prospective trials of stone management should include validated pain questionnaires, assess quality of life and include a cost analysis. In the meta-analysis of medical expulsive stone treatment the authors attempted to assess additional variables such as analgesic requirements, number and severity of pain episodes, days lost from work, unplanned medical visits and the need for secondary procedures. However, they found this information infrequently reported. The authors stated that "side-effects were not rigorously reported for all studies" and also found no information on direct or indirect costs of treatment.

Are the Benefits Worth the Costs and Potential Risks? Finally, treatment recommendations must weigh the expected benefits of a given treatment (ie higher stone passage rates) against the potential harms (ie adverse drug effects) and costs. While a well performed systematic review may provide extensive information on many clinically important outcomes, these outcomes must be balanced against patient preferences and values. For example, the cost of treatment may represent a major issue for patients of lower economic means, while certain individuals, such as aircraft pilots, may find medical and observational management unacceptable and opt for prompt surgical intervention. Thus, in general, systematic reviews should refrain from making practice recommendations, as they cannot take into account the specific circumstances and values encountered across varied clinical settings; such recommendations are the work of guideline panels.

RESOLUTION OF THE CLINICAL SCENARIO

The study by Hollingsworth et al on medical vs observational management of ureteral stones meets most of the validity criteria for a systematic review and meta-analysis, with explicit eligibility criteria, a comprehensive search strategy and an assessment of individual study validity.5 The authors found a large benefit of medical therapy compared to observation with few associated adverse effects. Furthermore, the pooling of study results appeared justified based on the reasonable similarity of results and the widely overlapping confidence intervals. Nevertheless, the quality of the individual studies was relatively poor, with many failing to conceal randomization or ensure blinding of patients, study personnel and outcomes assessors. Since poor methodological quality has been associated with exaggerated effect sizes, medical expulsive therapy may not be as effective as this study would suggest.24-26 However, despite these limitations in study quality, the magnitude of the effect was sufficiently large to suggest that the inference that medical expulsive therapy provides substantially higher stone passage rates is not without merit. Although questions remain about the adverse effects of medical therapy, the collective urological experience with these drugs in other settings, such as benign prostatic hyperplasia, suggests that such effects are rare. Furthermore, the cost of a short course of a generic alpha-blocker or calcium channel antagonist should not be prohibitive for most patients. Therefore, urologists who treat patients with distal ureteral stones similar to those presented in this systematic review can safely inform their patients that current evidence favors medical therapy compared to observation alone. When patients differ significantly from those included in this meta-analysis (ie patients with distal ureteral stones larger than 7 mm, preexisting hypotension, or a propensity for asthenia/weakness), the applicability of these conclusions should be questioned. Furthermore, any conversation with patients and the decision making process will be hindered by the relative paucity of harm data. For example, will the patient experience less severe pain, or more severe pain but for a briefer period of time? In every clinical situation urologists should seek to determine patient preference with regard to the desired amount of information and degree of involvement in the decision making process.12,27 Urologists should be aware that patient preferences may vary considerably with the nature of the decision, the options available and the particular outcomes under consideration, and should do their best to ensure that important decisions remain consistent with these values and preferences.
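The judgment that pooling was justified rests on combining study-level estimates weighted by their precision and checking how much the studies disagree. The following minimal sketch (Python; the three trials and their numbers are hypothetical, not taken from the Hollingsworth review) illustrates fixed-effect inverse-variance pooling of log risk ratios together with Cochran's Q and the I-squared statistic commonly used to quantify heterogeneity.

```python
import math

def pool_fixed_effect(log_rrs, variances):
    """Inverse-variance fixed-effect pooling of study log risk ratios."""
    weights = [1 / v for v in variances]
    pooled = sum(w * lr for w, lr in zip(weights, log_rrs)) / sum(weights)
    se = math.sqrt(1 / sum(weights))  # standard error of the pooled estimate
    return pooled, se

def heterogeneity(log_rrs, variances, pooled):
    """Cochran's Q and I^2: the share of variability beyond chance."""
    q = sum((lr - pooled) ** 2 / v for lr, v in zip(log_rrs, variances))
    df = len(log_rrs) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2

# Hypothetical data: three trials with risk ratios 1.5, 1.8 and 1.6.
log_rrs = [math.log(1.5), math.log(1.8), math.log(1.6)]
variances = [0.04, 0.09, 0.06]  # variance of each log risk ratio

pooled, se = pool_fixed_effect(log_rrs, variances)
lo, hi = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
q, i2 = heterogeneity(log_rrs, variances, pooled)
print(f"pooled RR {math.exp(pooled):.2f} (95% CI {lo:.2f}-{hi:.2f}), I2 {i2:.0%}")
```

Note that the fixed-effect model is defensible only when, as in this hypothetical example, the study results are similar and I-squared is low; with substantial heterogeneity a random-effects model would generally be preferred, and with marked heterogeneity pooling may not be appropriate at all.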

CONCLUSIONS

The large number of small, potentially underpowered randomized controlled clinical trials found in the urological literature provides a strong argument for investigators to perform systematic reviews. It is important that those planning future systematic reviews adhere to accepted methodological standards for conduct and publication so as to provide the best available evidence for defined clinical questions. As users of the medical literature, all urologists should have a basic framework for critically appraising such reviews. Although the quality of the primary studies will always represent a limiting factor in drawing valid conclusions from a systematic review, the methodological quality of the review itself is equally important in protecting the results from the intrusion of bias.

ACKNOWLEDGMENTS

Concepts in this article have been taken from the Users' Guide to the Medical Literature edited by Gordon Guyatt and Drummond Rennie.28

APPENDIX 1
Users' Guide to Interpreting Review Articles

Are the results valid?
  Did the review explicitly address a sensible clinical question?
  Was the search for relevant studies detailed and exhaustive?
  Were the primary studies of high methodological quality?
  Were assessments of studies reproducible?
What are the results?
  Were the results similar from study to study?
  What are the overall results of the review?
  How precise were the results?
How can I apply the results to patient care?
  How can I best interpret the results to apply them to the care of patients in my practice?
  Were all clinically important outcomes considered?
  Are the benefits worth the costs and potential risks?

APPENDIX 2
The Process of Conducting a Systematic Review

Define the question
  Specify inclusion and exclusion criteria
    Population
    Intervention or exposure
    Outcome
    Methodology
  Establish a priori hypotheses to explain heterogeneity
Conduct literature search
  Decide on information sources: databases, experts, funding agencies, pharmaceutical companies, personal files, registries, citation lists of retrieved articles
  Determine restrictions: time frame, unpublished data, language
  Identify titles and abstracts
Apply inclusion and exclusion criteria
  Apply inclusion and exclusion criteria to titles and abstracts
  Obtain full articles for eligible titles and abstracts
  Apply inclusion and exclusion criteria to full articles
  Select final eligible articles
  Assess agreement between reviewers on study selection
Abstract data
  Abstract data on participants, interventions, comparison interventions, study design
  Abstract results data
  Assess methodological quality
  Assess agreement between reviewers on validity assessment
Conduct analysis
  Determine method for pooling of results
  Pool results (if appropriate)
  Decide on handling missing data
  Explore heterogeneity
    Sensitivity and subgroup analysis
  Explore possibility of publication bias

Abbreviations and Acronyms

NNT = number needed to treat

REFERENCES

1. Montori VM, Swiontkowski MF and Cook DJ: Methodologic issues in systematic reviews and meta-analyses. Clin Orthop Relat Res 2003; 413: 43.
2. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D and Stroup DF: Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 1999; 354: 1896.
3. Many reviews are systematic but some are more transparent and completely reported than others. PLoS Med 2007; 4: e147.
4. Krupski TL, Dahm P, Fesperman SF and Schardt CM: How to perform a literature search. J Urol 2008; 179: 1264.
5. Hollingsworth JM, Rogers MA, Kaufman SR, Bradford TJ, Saint S, Wei JT et al: Medical therapy to facilitate urinary stone passage: a meta-analysis. Lancet 2006; 368: 1171.
6. Burton M and Clarke M: Systematic reviews of surgical interventions. Surg Clin North Am 2006; 86: 101.
7. Bhandari M, Devereaux PJ, Guyatt GH, Cook DJ, Swiontkowski MF, Sprague S et al: An observational study of orthopaedic abstracts and subsequent full-text publications. J Bone Joint Surg Am 2002; 84-A: 615.
8. Krzyzanowska MK, Pintilie M and Tannock IF: Factors associated with failure to publish large randomized trials presented at an oncology meeting. JAMA 2003; 290: 495.
9. Smith WA, Cancel QV, Tseng TY, Sultan S, Vieweg J and Dahm P: Factors associated with the full publication of studies presented in abstract form at the annual meeting of the American Urological Association. J Urol 2007; 177: 1084.
10. Cook DJ, Guyatt GH, Ryan G, Clifton J, Buckingham L, Willan A et al: Should unpublished data be included in meta-analyses? Current convictions and controversies. JAMA 1993; 269: 2749.
11. Dickersin K: The existence of publication bias and risk factors for its occurrence. JAMA 1990; 263: 1385.
12. Bhandari M, Devereaux PJ, Montori V, Cina C, Tandan V and Guyatt GH: Users' guide to the surgical literature: how to use a systematic literature review and meta-analysis. Can J Surg 2004; 47: 60.
13. Lau J, Ioannidis JP, Terrin N, Schmid CH and Olkin I: The case of the misleading funnel plot. BMJ 2006; 333: 597.
14. Sterne JA, Egger M and Smith GD: Systematic reviews in health care: investigating and dealing with publication and other biases in meta-analysis. BMJ 2001; 323: 101.
15. Scales CD Jr, Preminger GM, Keitz SA and Dahm P: Evidence based clinical practice: a primer for urologists. J Urol 2007; 178: 775.
16. McGinn T, Wyer PC, Newman TB, Keitz S, Leipzig R and Guyatt G: Tips for learners of evidence-based medicine: 3. Measures of observer variability (kappa statistic). CMAJ 2004; 171: 1369.
17. Hopayian K: The need for caution in interpreting high quality systematic reviews. BMJ 2001; 323: 681.
18. Hatala R, Keitz S, Wyer P and Guyatt G: Tips for learners of evidence-based medicine: 4. Assessing heterogeneity of primary studies in systematic reviews and whether to combine their results. CMAJ 2005; 172: 661.
19. Schoenfeld PS and Loftus EV Jr: Evidence-based medicine (EBM) in practice: understanding tests of heterogeneity in meta-analysis. Am J Gastroenterol 2005; 100: 1221.
20. Oxman AD, Cook DJ and Guyatt GH: Users' guides to the medical literature. VI. How to use an overview. Evidence-Based Medicine Working Group. JAMA 1994; 272: 1367.
21. Altman DG and Bland JM: Interaction revisited: the difference between two estimates. BMJ 2003; 326: 219.
22. Oxman A and Guyatt G: When to believe a subgroup analysis. In: Users' Guide to the Medical Literature, 5th ed. Edited by G Guyatt and D Rennie. Chicago: American Medical Association Press 2002; pp 553-565.
23. Papanikolaou PN and Ioannidis JP: Availability of large-scale evidence on specific harms from systematic reviews of randomized trials. Am J Med 2004; 117: 582.
24. Wood L, Egger M, Gluud LL, Schulz KF, Juni P, Altman DG et al: Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 2008; 336: 601.
25. Juni P, Altman DG and Egger M: Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ 2001; 323: 42.
26. Juni P, Holenstein F, Sterne J, Bartlett C and Egger M: Direction and impact of language bias in meta-analyses of controlled trials: empirical study. Int J Epidemiol 2002; 31: 115.
27. Bhandari M, Guyatt GH, Montori V, Devereaux PJ and Swiontkowski MF: User's guide to the orthopaedic literature: how to use a systematic literature review. J Bone Joint Surg Am 2002; 84-A: 1672.
28. Guyatt GH and Rennie D: Users' Guide to the Medical Literature, 4th ed. Chicago: American Medical Association Press 2002; p 706.