POINT OF VIEW

Exome Sequencing: Dual Role as a Discovery and Diagnostic Tool

Chee-Seng Ku,1 David N. Cooper,2 Constantin Polychronakos,3 Nasheen Naidoo,4 Mengchu Wu,1 and Richie Soong1

Recent developments in high-throughput sequence capture methods and next-generation sequencing technologies have now made exome sequencing a viable approach to elucidate the genetic basis of Mendelian disorders with hitherto unknown etiology. In addition, exome sequencing is increasingly being employed as a diagnostic tool for specific genetic diseases, particularly in the context of those disorders characterized by significant genetic and phenotypic heterogeneity, for example, Charcot-Marie-Tooth disease and congenital disorders of glycosylation. Such disorders are challenging to interrogate with conventional polymerase chain reaction–Sanger sequencing methods, because of the inherent difficulty in prioritizing candidate genes for diagnostic testing. Here, we explore the value of exome sequencing as a diagnostic tool and discuss whether exome sequencing can come to serve a dual role in diagnosis and discovery. We summarize the current status of exome sequencing, the technical challenges facing it, and its adaptation to diagnostics, and make recommendations for the use of exome sequencing as a routine diagnostic tool. Finally, we discuss pertinent ethical concerns, such as the use of exome sequencing data, originally generated in a diagnostic context, in research investigations.

ANN NEUROL 2012;71:5–14

Recent developments in high-throughput sequence capture methods and next-generation sequencing (NGS) technologies have made exome sequencing not only technically feasible but also extremely cost-effective. The advent of exome sequencing over the past 2 years has led to a paradigm shift in our approach to the identification of new causal mutations and genes for numerous previously unresolved rare disorders.1–7 In addition, exome sequencing is increasingly being explored as a diagnostic tool in the context of genetic diseases, good examples being congenital chloride-losing diarrhea,8 neonatal diabetes,9 and Charcot-Marie-Tooth disease (CMT), an inherited peripheral neuropathy characterized by extensive locus heterogeneity.10 The first proof-of-principle study demonstrating the feasibility of using exome sequencing in disease diagnosis was performed by Choi et al.8 A diagnosis of congenital chloride-losing diarrhea was made through the identification of a homozygous missense variant in SLC26A3, a gene known to underlie the disease. The patient was initially suspected of having Bartter syndrome; however, the diagnosis was based on very superficial phenotyping.8 Thus, the distinction between the role of exome sequencing in the contexts of genetic discovery and diagnosis has become somewhat blurred as exome sequencing increasingly acquires a dual role. The success of exome sequencing in the discovery of novel causal mutations for rare diseases is well established, but the question remains as to whether it may also be applicable as an effective clinical diagnostic tool, given the considerable technical challenges to be overcome in this setting.

The conventional approach to Mendelian disorders relies on the interrogation of all exons and adjacent intronic regions by polymerase chain reaction (PCR) amplification and Sanger sequencing of a gene, as these regions harbor >85% of mutations in monogenic disease.11 This requires knowledge of the target gene(s) for diagnosis, or the narrowing down of candidates to a manageable number for locus discovery. Exome sequencing bypasses both these requirements by harnessing the power of 2 recent technical developments to sequence all known exons (estimated at 200,000).12–14

View this article online at wileyonlinelibrary.com. DOI: 10.1002/ana.22647 Received Sep 16, 2011, and in revised form Oct 4, 2011. Accepted for publication Oct 5, 2011. Address correspondence to Ku, Cancer Science Institute of Singapore (CSI Singapore), 28 Medical Drive, Centre for Life Sciences (CeLS), Level 2, Singapore 117456. E-mail: [email protected] From the 1Cancer Science Institute of Singapore, National University of Singapore, Singapore; 2Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom; 3Department of Pediatrics/Human Genetics, McGill University Health Center, Montreal, Quebec, Canada; 4Centre for Molecular Epidemiology, Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.

© 2012 American Neurological Association

First, the target regions are isolated from the rest of the genome by hybridization of the mechanically fragmented DNA sample to biotinylated complementary bait sequences (capture). The DNA sequences are then subjected to NGS, whose output is sequence reads of a very large number of individual DNA molecules. Read lengths vary from 100 bp (Illumina Genome Analyzer/HiSeq or ABI SOLiD) to several hundreds of base pairs (Roche 454 GS FLX). Reads are then computationally aligned to the known target sequence. The logistics of the experiment are planned so that each nucleotide of target sequence is represented in a large number of reads, typically at least 30 on average (depth of sequencing) or, as sequencing costs keep decreasing, 50- to 100-fold. Naturally, the coverage of specific sequences varies around this average, with the minimum coverage required to call a variant depending upon homozygosity versus heterozygosity and on the error rate of the method. Sequencing costs have been decreasing more rapidly than the cost of capture, and both now stand at about $1,000 each per sample from large-volume, centralized providers. Figure 1 and Figure 2 depict the workflow of exome sequencing; Figure 3 summarizes the commonly adopted approaches to identifying causal mutations.

FIGURE 1: Workflow of exome sequencing from genomic DNA extraction to biological interpretation and the identification of the causal mutation. The workflow is divided into 3 phases, that is, sample preparation and sequencing (steps 1–4), primary data processing (steps 5–7), and secondary data processing (steps 8–10). The genomic DNA is used for sequencing library preparation involving several steps such as DNA fragmentation and adapter ligation (steps 1 and 2). Exome capture and enrichment is usually performed using commercial kits such as the Agilent, Illumina, and Nimblegen human exome capture kits. Exome sequencing is then performed using high-throughput next-generation sequencing (NGS) technologies such as Illumina Genome Analyzer or HiSeq, Life Technologies SOLiD, and Roche 454 Genome Sequencer. These steps are performed according to the manufacturer’s recommended protocols. An average coverage of approximately 30-fold is deemed sufficient for the accurate calling of variants (steps 3 and 4). Subsequently, the data from NGS platforms are processed to raw sequence reads (step 5). These reads are then aligned to the reference genome using conventional alignment tools. Polymerase chain reaction (PCR) duplicates, which introduce noise to variant calling, are removed (steps 6 and 7). The analytical pipeline is then followed by variant calling and filtering to remove false positives according to commonly applied quality control criteria (step 8). Finally, the variants are annotated to obtain information such as genomic position and functional effect (eg, missense and nonsense variants; step 9), before they are examined to identify the causal mutations (step 10). Steps 2 and 3 are further illustrated and explained in Figure 2. [Color figure can be viewed in the online issue, which is available at www.annalsofneurology.org.]
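The relationship between read count, read length, and average depth described in the text can be made concrete with a small calculation. The following is a minimal sketch, not a description of any particular pipeline: the target size of 50 Mb and the on-target fraction are illustrative assumptions (the article gives neither), and the function names are hypothetical.

```python
import math


def mean_depth(n_reads: int, read_length_bp: int, target_size_bp: int,
               on_target_fraction: float = 1.0) -> float:
    """Expected average depth: on-target sequenced bases divided by target size."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    return n_reads * read_length_bp * on_target_fraction / target_size_bp


def reads_needed(depth: float, read_length_bp: int, target_size_bp: int,
                 on_target_fraction: float = 1.0) -> int:
    """Smallest number of reads whose expected average depth reaches `depth`."""
    return math.ceil(depth * target_size_bp / (read_length_bp * on_target_fraction))


# Assumed values for illustration only: a 50 Mb target and 100 bp reads.
ASSUMED_TARGET_BP = 50_000_000
print(reads_needed(30, 100, ASSUMED_TARGET_BP))        # ideal capture
print(reads_needed(30, 100, ASSUMED_TARGET_BP, 0.5))   # half the reads off target
```

Because this is only an average, individual exons will fall above and below it, which is why the text distinguishes average depth from the minimum coverage needed at any one position.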

Dual Role as a Discovery and Diagnostic Tool

Exome sequencing has obvious utility as an important diagnostic tool for disorders that are characterized by significant genetic heterogeneity. However, its diagnostic utility is also becoming evident in the face of wide, and previously unsuspected, phenotypic heterogeneity. Thus, for example, a homozygous PEX1 mutation has been identified in a patient with a clinical diagnosis of Leber congenital amaurosis, which is identical to a well-established mutation causing Zellweger syndrome.15 Disorders characterized by locus heterogeneity may result from a causal mutation present in any of a number of candidate genes, each of which has to be screened in its entirety by PCR and Sanger sequencing, a laborious and expensive process. In a similar vein, phenotypic heterogeneity blurs the accurate assessment of clinical manifestations, leading potentially to an erroneous or ambiguous clinical diagnosis. For example, congenital disorders of glycosylation constitute a group of >30 autosomal recessive disorders caused by deficient glycosylation, thereby rendering a precise diagnosis in these patients somewhat laborious.16 A similar challenge is faced in primary ciliary dyskinesia, an autosomal recessive disorder with a wide range of clinical manifestations.17 As a result, only the most likely causal genes tend to be prioritized for screening in most cases. This is well illustrated by the example of neonatal diabetes mellitus, for which it is recommended in clinical diagnosis to first screen for a chromosome 6q24 abnormality (predicts quick recovery from transient disease) or a KCNJ11 mutation (predicts response to oral medication versus insulin; single exon with known hot spots), before seeking, if negative, mutations in ABCC8 (oral-agent predictor; multiexon) and INS (only 2 coding exons; no therapeutic or prognostic consequence). This is a small and time-consuming improvement over screening all 42 coding exons in these genes at once.9 Similarly, for disorders caused by mutations in multiple candidate genes such as CMT10 and retinitis pigmentosa,18 the current molecular diagnostic strategy involves essentially a one-by-one approach. This is a problem when accurate and timely molecular diagnosis can result in dramatic improvements in patient care.19 To address this limitation, the idea of using exome sequencing as a diagnostic aid was conceived.

Its advantages for diagnosing disorders that are particularly heterogeneous genetically, such as CMT and retinitis pigmentosa, are obvious in comparison to the conventional approach. Clinically, CMT can be divided into 2 major groups, namely CMT1 and CMT2, which are together caused by mutations in >35 different genes. Similarly, causal mutations for retinitis pigmentosa have been identified in >50 genes. A recent study of exome sequencing of 2 affected members in a family with CMT identified a nonsynonymous mutation in GJB1, a known CMT gene, using a simple and efficient method after applying commonly used filtering criteria (see Fig 3).10 Similarly, Lupski et al identified 2 mutations in SH3TC2 causing recessively inherited CMT by whole-genome sequencing. However, these mutations could also have been identified more rapidly and cost-effectively by means of exome sequencing.20 Although many candidate genes for CMT have been identified to date, approximately 70% of CMT2 cases still have no identifiable genetic cause, suggesting additional heterogeneity. The clinical utility of exome sequencing to diagnose rare disorders has also been demonstrated in a family with retinitis pigmentosa and skeletal abnormalities.21 Table 1 and Table 2 summarize, respectively, exome and targeted-gene sequencing studies using NGS for genetic diagnosis.

FIGURE 2: (A) Schematic illustration of library preparation using array-based and in-solution hybrid selection or capture. Reprinted from Haas J, Katus HA, Meder B, "Next-Generation Sequencing Entering the Clinical Arena," Mol Cell Probes 2011 Sep 8 [Epub ahead of print], copyright 2011, with permission from Elsevier. (B) Workflow of library preparation and exome or targeted genomic regions capture. In vitro random/shotgun library is generated from genomic DNA through fragmentation (steps 1 and 2). The fragment ends are repaired and ligated with common adapters flanking each fragment (steps 3 and 4). The library (a collection of adapter-ligated fragments) is hybridized to oligonucleotide probes tethered on a high-density exome-capture microarray or custom-made microarray for targeted genomic regions (step 5). The difference between exome capture and custom or targeted genomic region capture lies with the probes tethered on the microarray. The difference between on-solid and in-solution capture methods is that the oligonucleotide probes are tethered on microarray or are suspended in-solution, respectively. The capture of the adapter-ligated fragments is based on their complementarity with the sequences of oligonucleotide probes. After hybridization, unbound fragments are removed by washing, followed by elution of specifically hybridized fragments (steps 6 and 7). The enriched fragment pool is amplified by polymerase chain reaction (PCR). Subsequently, the success of the enrichment is checked by quantitative PCR at control loci (step 8). Finally, the end product is a sequencing library enriched for target regions, which is then sequenced by high-throughput sequencing (step 9). QC = quality control.

Volume 71, No. 1, January 2012


FIGURE 3: Commonly adopted approaches to identify causal mutations. The 3 universal criteria used to filter the less likely causal variants are (1) removing common variants, (2) focusing on deleterious variants, and (3) predicting and retaining variants with functional effects (ie, criteria 1–3). The other criteria are dependent upon the study design, for example, whether unrelated or family samples are sequenced (criteria 4–6). The variant filtering or analysis also depends on whether linkage and homozygosity data from the families are available, because such data significantly reduce the search space for potentially causal variants (criterion 7). Finally, the mode of inheritance of the Mendelian disorder will determine whether the focus should be placed upon homozygous, compound heterozygous, or heterozygous variants (criterion 8). Additional criteria may be needed, for example, restricting cases that are highly similar in terms of their phenotypic manifestations to minimize the clinical/phenotypic heterogeneity that may mask the identification of causal mutations. However, these criteria are likely to vary from study to study.
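The filtering logic summarized in Figure 3 can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the workflow of any cited study: the `Variant` record, the 1% frequency cut-off, the list of protein-altering effect classes, and the gene labels are all hypothetical, and criteria 2 and 3 are collapsed into a single check on annotated effect. Only criteria 1 to 3 and the inheritance criterion (criterion 8) are modeled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    effect: str             # annotated effect, e.g. "missense", "synonymous"
    population_freq: float  # frequency in a reference panel; 0.0 if never seen
    genotype: str           # "het" or "hom"


# Assumed set of effect classes treated as potentially deleterious.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def candidate_variants(variants, max_freq=0.01, inheritance="recessive"):
    """Apply frequency, effect, and inheritance filters in turn."""
    rare = [v for v in variants if v.population_freq <= max_freq]
    damaging = [v for v in rare if v.effect in PROTEIN_ALTERING]
    if inheritance == "dominant":
        return [v for v in damaging if v.genotype == "het"]
    # Recessive: keep homozygous variants, or genes carrying at least two
    # heterozygous variants (possible compound heterozygotes).
    het_per_gene = Counter(v.gene for v in damaging if v.genotype == "het")
    return [v for v in damaging
            if v.genotype == "hom" or het_per_gene[v.gene] >= 2]


calls = [
    Variant("GENE_A", "missense", 0.0, "hom"),
    Variant("GENE_B", "synonymous", 0.0, "hom"),
    Variant("GENE_C", "nonsense", 0.2, "hom"),
    Variant("GENE_D", "missense", 0.001, "het"),
    Variant("GENE_D", "frameshift", 0.0, "het"),
    Variant("GENE_E", "missense", 0.0, "het"),
]
print([v.gene for v in candidate_variants(calls)])
```

Two heterozygous variants in one gene are only a candidate compound heterozygote; confirming that they lie on different parental chromosomes requires family data, which is where criteria 4 to 7 come in.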

For cases of genetically heterogeneous clinical phenotypes that are not yet explained by mutations at known loci, the distinction between diagnostic applications based on known mutations and the discovery of new causal mutations is blurred. However, a definitive genetic diagnosis cannot be established solely on the basis of a newly identified mutation; further screening of additional cases is invariably required. A mutation of the same gene in additional cases from different families constitutes the only rigorous confirmation of causality. However, it might be difficult, in the context of extremely rare disorders, to find additional cases to validate the newly identified causal variant. Biochemical validation of the putative pathological variant, to confirm the functional significance of newly identified mutations, can be considered confirmatory only when the mutated gene has a clear role in a well-defined molecular pathology of the disease as, for example, in the case of mutations that impair glycosylation in congenital disorders of glycosylation.16 This, unfortunately, is not the case for most phenotypes. Cosegregation in large dominant pedigrees, autozygosity mapping to a small fraction of the genome,22 or the de novo nature of a mutation may constitute acceptable alternative lines of evidence in unique cases/pedigrees. Sometimes the known function of a mutated gene can constitute a compelling argument, for example, in a case of an inborn error of folate metabolism, found to have biallelic MTHFD1 mutations.23 Absence in population controls constitutes much less reliable evidence. Indeed, perfectly harmless polymorphisms can be unique, and the presence of a given allele in 1 of 500 control genomes is compatible with a recessive phenotype with an incidence of 1 in a million live births (500 × 500 × 4).

If the purpose of exome sequencing were solely diagnostic, and all cases could be explained by known causal genes, then it would be unnecessary to sequence the entire human exome corresponding to >25,000

TABLE 1: Summary of Exome Sequencing Studies for Genetic Diagnosis

| Condition/Patient | Genetic Diagnosis | Reference |
| --- | --- | --- |
| A patient with a suspected diagnosis of Bartter syndrome (a renal salt-wasting disease). | The molecular diagnosis was based on the finding of a homozygous missense D652N mutation in SLC26A3 (the known congenital chloride-losing diarrhea locus). Clinical follow-up confirmed the diagnosis. | Choi 2009 (ref. 8) |
| A patient presenting with permanent neonatal diabetes mellitus (KCNJ11, ABCC8, and INS mutations and chromosome 6q24 abnormalities were previously excluded). | Identification of a novel nonsynonymous mutation in ABCC8 (c.1455G>C/p.Q485H). This mutation, confirmed by Sanger sequencing, was not present in 348 controls or in the patient's mother, father, and young brother, all of whom were normoglycemic. | Bonnefond 2010 (ref. 9) |
| Two affected individuals in a family with CMT disease. | Identification of a nonsynonymous GJB1 (Cx32) mutation. This variant had been reported previously as pathogenic in X-linked CMT families. | Montenegro 2011 (ref. 10) |
| A cognitively normal patient with severe visual loss, absent electrical signals from the photoreceptors (detected by electroretinogram) and nystagmus due to Leber congenital amaurosis, associated with hearing loss and Arnold-Chiari malformation. | Detection of a homozygous PEX1 mutation (p.Gly843Asp). Peroxisome biochemical studies on the patient confirmed a peroxisome biogenesis disorder in the Zellweger spectrum. | Majewski 2011 (ref. 15) |
| Four members of a family presenting with spondyloepiphyseal dysplasia and retinitis pigmentosa. | Identified a 6bp deletion in GNPTG, the gene implicated in mucolipidosis type IIIc. The diagnosis was confirmed by biochemical studies and serves to broaden the mucolipidosis type III phenotype. | Schrader 2011 (ref. 21) |
| An infant with megaloblastic anemia, atypical hemolytic uremic syndrome, severe combined immune deficiency, elevated blood levels of homocysteine and methylmalonic acid, and a selective decreased synthesis of methylcobalamin in cultured fibroblasts. | Two mutations were identified in the MTHFD1 gene, which encodes a protein that catalyses 3 reactions involved in cellular folate metabolism. Both parents carry a single mutation, and an unaffected sibling carries neither mutation. | Watkins 2011 (ref. 23) |
| A boy who presented at 15 months with perianal abscesses and proctitis, progressing to transmural pancolitis with colocutaneous fistulae, consistent with a Crohn disease–like illness. | Identified a novel, hemizygous missense mutation in the X-linked inhibitor of apoptosis gene (XIAP), substituting a tyrosine for a highly conserved and functionally important cysteine. Functional assays demonstrated increased susceptibility to activation-induced cell death and defective responsiveness to NOD2 ligands, consistent with the loss of normal X-linked inhibitor of apoptosis protein function in apoptosis and NOD2 signaling. Based on this medical history, genetic and functional data, the child was diagnosed as having an X-linked inhibitor of apoptosis deficiency. | Worthey 2011 (ref. 19) |

CMT = Charcot-Marie-Tooth.
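The population-control argument made in the text preceding Table 1, namely that an allele seen in 1 of 500 control genomes is still compatible with a recessive incidence of 1 in a million (500 × 500 × 4), can be checked with a short calculation. The sketch below rests on simplifying assumptions that the article does not state explicitly: random mating, the 1-in-500 figure read as a carrier frequency, and a single causal allele. The function name is hypothetical.

```python
from fractions import Fraction


def recessive_incidence(carrier_frequency: Fraction) -> Fraction:
    """Incidence of affected births when both parents must be carriers."""
    both_parents_carriers = carrier_frequency * carrier_frequency
    # Each child of two carriers has a 1-in-4 chance of inheriting both copies.
    return both_parents_carriers * Fraction(1, 4)


incidence = recessive_incidence(Fraction(1, 500))
print(incidence)  # 1/1000000
```

Read the other way round, finding a variant once among a few hundred controls says very little against its being causal for a very rare recessive disorder.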


TABLE 2: Summary of Targeted-Gene Studies Using Next-Generation Sequencing (NGS) for Genetic Diagnosis

| Disease | Sequencing Platform | Targeted Genes and Capturing Approaches | Major Findings | Reference |
| --- | --- | --- | --- | --- |
| Congenital disorders of glycosylation | ABI SOLiD | Targeted 24 genes known to cause congenital disorders of glycosylation using both RainDance and Fluidigm PCR-isolation platforms. | The disease-causing mutations were identified by NGS for all 12 positive controls. | Jones in press (ref. 16) |
| Breast and ovarian cancer | Illumina Genome Analyzer IIx | Designed custom oligonucleotides in solution to target complete genomic sequence of 21 genes responsible for inherited risk of these cancers. Tested in 20 women diagnosed with breast or ovarian cancer and with a known mutation in 1 of the genes responsible for inherited predisposition to these diseases. | All single-nucleotide substitutions, small insertion and deletion mutations, and large genomic duplications and deletions were detected. | Walsh 2010 (ref. 25) |
| Primary ciliary dyskinesia | Roche GS FLX | Designed a custom array to capture 2,089 exons from 79 genes associated with primary ciliary dyskinesia or ciliary function. Tested in 4 individuals with a variety of previously identified primary ciliary dyskinesia mutations. | Three of 3 substitution mutations and 1 of 3 small insertion/deletion mutations were identified. One small insertion mutation was clearly observed after adjusting the bioinformatic handling of previously described SNPs. This process failed to detect 2 known mutations: a single-nucleotide insertion and a whole-exon deletion. | Berg 2011 (ref. 17) |
| Autosomal recessive ataxia | Roche GS FLX | The genomic sequences of 7 disease genes, together with 2 control loci, were targeted on a sequence-capture array. Tested in 5 subjects with known mutations and 2 unaffected controls. | Detection of deletions and heterozygous and homozygous point mutations for 6 of 7 mutant alleles. | Hoischen 2010 (ref. 24) |
| ATS | Roche GS Junior | Used a strategy based on the locus-specific amplification of genomic DNA, amplifying each amplicon separately for COL4A3, COL4A4, and COL4A5 genes. Tested in 3 patients (patient 1 had an uncertain diagnosis of ATS, patient 2 and patient 3 had a confident diagnosis of ATS). | Identified the second mutation (ie, not previously detected) in 2 ATS patients and reconsidered the diagnosis of ATS in a third patient. | Artuso in press (ref. 30) |
| Peripheral neuropathies | Roche GS FLX | Used multiplex PCR reactions and sequenced the complete coding regions of 7 genes implicated in peripheral neuropathies in 40 individuals. | Detection of all variants present. Also demonstration of exploitation of the multiplexed PCR amplicons to determine individual copy number variation. | Goossens 2009 (ref. 27) |

ATS = Alport syndrome; NGS = next-generation sequencing; PCR = polymerase chain reaction; SNP = single nucleotide polymorphism.


genes; a few tens of causal/candidate genes would probably suffice, even for highly heterogeneous genetic disorders. To this end, custom-designed options for sequence enrichment17,24,25 and multiplex PCR or microfluidic PCR arrays are already available as viable alternatives to whole-exome capture.16,26,27 Indeed, this targeted gene approach has several advantages over whole-exome sequencing, such as greater sequencing depth achieved at a relatively low cost. Nevertheless, it also raises a logistical challenge when employing the current high-throughput NGS technologies (eg, the most commonly used Illumina Genome Analyzer), where several exomes can be multiplexed and sequenced by 1 of the 8 lanes per flow cell to an adequate depth. With the targeted-gene approach, more samples would be needed for multiplexing or bar-coding to avoid oversequencing. Thus, for the congenital disorders of glycosylation, excess coverage was achieved using targeted PCR-based enrichment of 24 known genes with an average coverage of >400-fold per base over the entire gene set,16 a depth unnecessary for variant identification that negates some of the efficiency of targeted sequencing. Clearly, this targeted-gene approach is not practical as a diagnostic application when only single cases are available. However, the recently launched medium-throughput NGS machines, such as the Ion Torrent Personal Genome Machine,28 Illumina MiSeq (http://www.illumina.com/systems/miseq.ilmn), and Roche 454 GS Junior,29,30 have throughputs ranging from 10Mb to >1Gb and offer a better alternative for diagnostic applications. Although another potential solution would be to consign the clinical diagnostic application of exome sequencing to high-turnover research-oriented laboratories, there are concerns as to whether the sequencing would be carried out to a standard that would meet clinical requirements and hence whether these results could be used to inform patients about their carrier or mutational status. 
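The oversequencing problem raised above for targeted panels on high-throughput instruments follows directly from dividing a fixed lane yield among samples. The sketch below uses purely illustrative numbers: the lane yield of 5 Gb, the 50 Mb exome, and the 100 kb panel are assumptions for the sake of the arithmetic, not figures from the article, and capture is treated as perfectly efficient.

```python
def samples_per_lane(lane_yield_bp: int, target_size_bp: int, depth: int) -> int:
    """How many samples one lane can carry at the requested average depth."""
    if target_size_bp <= 0 or depth <= 0:
        raise ValueError("target size and depth must be positive")
    return lane_yield_bp // (target_size_bp * depth)


# All three values below are assumptions chosen for illustration.
ASSUMED_LANE_YIELD_BP = 5_000_000_000
ASSUMED_EXOME_BP = 50_000_000
ASSUMED_PANEL_BP = 100_000

print(samples_per_lane(ASSUMED_LANE_YIELD_BP, ASSUMED_EXOME_BP, 30))
print(samples_per_lane(ASSUMED_LANE_YIELD_BP, ASSUMED_PANEL_BP, 30))
```

Under these assumptions a lane holds a handful of exomes but well over a thousand panel samples, so a laboratory with only a few cases either wastes most of the lane or sequences each sample far deeper than variant calling requires.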
In a recent study of attention deficit/hyperactivity disorder, an "unrelated finding" was not disclosed to the research subject who had idiopathic hemolytic anemia (IHA) and was identified as a carrier for 2 rare nonsynonymous mutations in PKLR, a known cause of IHA.31 The investigators explained that their exome sequencing, used specifically as a research application, did not meet the standards necessary for a clinical test as required by the Clinical Laboratory Improvement Amendments (CLIA) program (http://www.cms.gov/clia/).32

Potential diagnostic applications in the context of complex, non-Mendelian disease are also a possibility that can only be speculated upon at this time. Genomewide association studies (GWAS) are discovering multiple associated variants for each of an increasing number of diseases. However, with few exceptions, these alleles are common, have weak effects, and are therefore of little diagnostic or prognostic use to the individual. Collectively, they typically explain a small fraction of disease heritability.33 It is quite possible that a significant fraction of the remaining missing heritability is due to variants with strong biological effects that escaped GWAS detection because of their low allele frequency.34 For example, using exome sequencing in 14 schizophrenia patients, Girard et al demonstrated that de novo mutations occur at a much higher frequency than that expected by chance alone, in terms of both the number and the proportion of nonsense mutations among them.35 Although each such variant (inherited or de novo) involves a very small fraction of cases, collectively they may involve a substantial proportion. Because of their strong biological effects, they will be important in understanding potentially different biology underlying cases bearing the same diagnostic label. If confirmed, these effects can be used for the stratification of cases in therapeutic intervention trials and the application of their results to each case (individualized medicine). Finally, exome sequencing would allow consideration of modifier genes,36 that is, genes harboring variants that have the potential to modify the clinical phenotype in the context of a particular gene defect, in a hypothesis-free approach.

The importance of detecting large deletions and duplications by exome sequencing should also be emphasized as an additional advantage as compared to Sanger sequencing,37 as causal genes for some Mendelian disorders are also affected by these gross rearrangements.
For example, in the detection of BRCA1 and BRCA2 mutations, a separate diagnostic test has been offered as a supplementary test to detect large exonic deletions and duplications not detectable by PCR-based approaches.25 Importantly, NGS has been shown to be capable of successfully detecting large genomic deletions and duplications in a diagnostic setting as well as point mutations and small indels.25,38
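The article does not spell out how large deletions and duplications are called from NGS data. One common idea, assumed here purely for illustration, is to compare each exon's share of the read depth in the patient with its share in a reference sample processed in the same way. The thresholds, exon labels, and function name below are hypothetical.

```python
def copy_number_calls(sample_depth, reference_depth,
                      loss_below=0.75, gain_above=1.25):
    """Label exons by their normalised depth ratio against a reference sample."""
    sample_total = sum(sample_depth.values())
    reference_total = sum(reference_depth.values())
    if sample_total <= 0 or reference_total <= 0:
        raise ValueError("both samples need some sequence coverage")
    calls = {}
    for exon, ref_depth in reference_depth.items():
        if ref_depth == 0:
            # No reads in the reference either: nothing can be inferred.
            calls[exon] = "uninformative"
            continue
        ratio = (sample_depth.get(exon, 0) / sample_total) / (ref_depth / reference_total)
        if ratio < loss_below:
            calls[exon] = "possible deletion"
        elif ratio > gain_above:
            calls[exon] = "possible duplication"
        else:
            calls[exon] = "no change"
    return calls


reference = {"exon1": 40, "exon2": 40, "exon3": 40, "exon4": 40}
patient = {"exon1": 40, "exon2": 40, "exon3": 20, "exon4": 40}
print(copy_number_calls(patient, reference))
```

A comparison of this kind depends on capture and sequencing being reasonably even between samples, so it is sensitive to the coverage nonuniformity discussed under the technical challenges.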

Technical Challenges

Aside from the question of whether exome sequencing is needed for diagnostic applications, have we yet reached the point at which it can be used as a routine clinical diagnostic tool? In our view, several technical challenges remain to be addressed and overcome, and various improvements introduced, before this point can be reached.39–41 First, a major challenge will be to analyze and manage the large amounts of sequencing data generated. Exome sequencing typically generates >10,000 genetic variants that must be carefully filtered to identify the causal mutations. Therefore, the analytical pipeline must be simple enough for the analysis to be handled by the nonspecialist, requiring the development of robust user-friendly software, usable in the clinical setting.

Second, the sensitivity and specificity of the detection of microlesions (ie, point mutations, small indels, and tandem repeats) and macrolesions (ie, deletions and duplications) need to be improved to a standard similar to other clinical diagnostic tests. Currently available NGS technologies have higher raw base-calling error rates than Sanger sequencing, although this should be remediable to some extent by increasing coverage to achieve higher consensus accuracy. An adequate depth of sequencing coverage is also critical for identifying heterozygotes (whether causing dominant Mendelian disorders or occurring de novo) or compound heterozygotes (in the context of recessive disorders). Currently, the causal mutations detected by exome sequencing are generally validated by Sanger sequencing. However, for a clinical diagnostic application, this would unnecessarily increase costs. Although these 2 parameters should be optimized further, false positives are less of a critical challenge for the technology, because variant calling criteria can be made more stringent, the depth of coverage can be increased, and final findings can be validated by alternative means. However, false positives could potentially mask the identification of true disease-causing mutations if not allowed for properly. By contrast, false negatives must be managed more carefully, and it will be essential to provide clinicians and medical geneticists with a clear idea of how effective the procedure has been for each patient prior to communicating the diagnostic results.

Third, the success of exome sequencing is reliant upon the complete capture of the exome followed by an adequate depth of sequencing.
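Why depth matters for heterozygotes can be illustrated with a simple binomial model. The sketch below assumes that each read at a heterozygous site carries the variant allele with probability 0.5, that reads are independent, and that a caller needs some minimum number of variant-supporting reads. It ignores sequencing error and allelic bias, so it is a rough guide rather than a model of any real variant caller, and the function name is hypothetical.

```python
from math import comb


def prob_detect_het(depth: int, min_alt_reads: int, alt_fraction: float = 0.5) -> float:
    """Probability that at least `min_alt_reads` of `depth` reads show the variant."""
    if min_alt_reads <= 0:
        return 1.0
    # Sum the probabilities of seeing too few variant reads, then take the complement.
    too_few = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min(min_alt_reads, depth + 1))
    )
    return 1.0 - too_few


for depth in (8, 15, 30):
    print(depth, round(prob_detect_het(depth, min_alt_reads=3), 4))
```

Because coverage varies from exon to exon, a site covered at well below the run average can miss a heterozygous variant even when the average depth looks adequate.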
Thus, further improvements in the efficiency of exome capture methods are needed to ensure that all the target exons are captured. For example, the CMT-related genes, such as NEFL, HSN2, and SEPT9, were not investigated using commercial exome capture kits in the Montenegro et al study.10 This would have given a false-negative result or, at the very least, could have required further sequencing of the missing exons using conventional PCR–Sanger methods. Another problem associated with exome sequencing is the nonuniformity of sequence capture and uneven sequencing resulting in uneven coverage depth. Higher sequencing depth would be required to ensure that the sequence reads would provide adequate coverage of as many regions or bases as possible. Uneven sequencing also adversely affects our ability to detect large deleted and/or duplicated regions by quantitative evaluation of depth. Incomplete capture and uneven coverage may be due to technical limitations or the nature of the DNA sequences; for example, GC-rich sequence stretches can be difficult to capture. In the worst case scenario, these GC-rich regions would not be captured at all. This is well demonstrated by the findings of Hoischen et al; 2 exons lacking any coverage were found to exhibit very high GC contents of 76.1% and 63.6%, respectively, compared to the average GC content of 37.6% for the 50 best covered exons in the 7 ataxia genes that were enriched by a sequence capture array.24 It might be expected that even with more robust enrichment kits and sequencing chemistries, there will always be uneven coverage and gaps in the sequencing coverage. Therefore, for diagnostic applications, it is recommended that reports are generated that detail the quality of an exome sequencing run, for example, on what was sequenced confidently, what was sequenced unreliably (low coverage), and what was missing (failed to be captured entirely). The details can range from very low-level data, such as the assignment of quality scores to every base sequenced, to high-level data, for example, clinician-oriented reports outlining which exons in the disease genes may lack coverage and hence may need to be followed up by alternative approaches. Interpretation of results should also be performed by a medical geneticist with proper training in these new technologies.

Fourth, the time-to-diagnosis, when using exome sequencing as a diagnostic application, is also an important factor to consider, especially in the case of disorders for which a quick and accurate genetic diagnosis would significantly influence clinical management. For example, in congenital disorders of glycosylation it is important to identify the precise gene defect responsible, because several reasonably effective therapies for several subtypes of these disorders are available.16 Currently available sequencing instruments take several days for the sequencing to be completed.
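Returning to the per-run quality report recommended under the third challenge, the three categories named there (sequenced confidently, low coverage, and missing) lend themselves to a very simple high-level summary. The following is a minimal sketch only: the 20-fold threshold for "confident" coverage, the exon labels, and the function name are assumptions made for illustration, and a real report would also draw on base-level quality scores.

```python
def coverage_report(exon_depth: dict[str, float],
                    min_depth: float = 20.0) -> dict[str, list[str]]:
    """Sort exons into confidently sequenced, low-coverage, and missing."""
    report: dict[str, list[str]] = {"confident": [], "low_coverage": [], "missing": []}
    for exon, depth in sorted(exon_depth.items()):
        if depth == 0:
            report["missing"].append(exon)       # not captured at all
        elif depth < min_depth:
            report["low_coverage"].append(exon)  # candidates for follow-up sequencing
        else:
            report["confident"].append(exon)
    return report


# Hypothetical mean depths for three exons of a disease gene.
depths = {"GENE_A:exon1": 85.0, "GENE_A:exon2": 6.5, "GENE_A:exon3": 0.0}
for category, exons in coverage_report(depths).items():
    print(category, exons)
```

A clinician-oriented version of such a summary would list the "low_coverage" and "missing" exons of the relevant disease genes as those needing follow-up by an alternative method such as PCR–Sanger sequencing.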
However, the recently launched Ion Torrent sequencing platform achieved the fastest sequencing time (completion in