Next Generation Sequencing Update
Karl V. Voelkerding, MD
Professor of Pathology, University of Utah
Medical Director for Genomics and Bioinformatics, ARUP Laboratories
AACC-AMP 2012 Molecular Pathology Course
Disclosures • Grant/Research Support: NIH • Salary/Consultant Fees: None • Committees: College of American Pathologists • Stocks/Bonds: None • Honorarium/Expenses: None • Intellectual Property/Royalty Income: None
Learning Objectives • Explain Principles of NGS • Describe Current and Future NGS Platform Options • Discuss Spectrum of NGS Clinical Applications
First Next Generation Sequencing Publication
454 Life Sciences, Nature 437 (7057): 376-380, 2005
Paradigm Shift
Sanger Sequencing: Electrophoretic Separation of Chain Termination Products
Next Generation Sequencing: Sequence Clonally Amplified DNA Templates in a Flow Cell, Massively Parallel Configuration
Process
Genomic DNA or Enriched Genes
Fragmentation (150 – 500 bp)
End Repair and Adapter Ligation
“Fragment Library”: Fragments A, B, and C, each flanked by an Adapter at both ends
Process
“Fragment Library” (Fragments A, B, C)
Clonal Amplification of Each Fragment: Emulsion Bead PCR or Surface Clusters
Sequencing of Clonal Amplicons in a Flow Cell
Process
Sequencing of Clonal Amplicons in a Flow Cell
• Pyrosequencing (454)
• Sequencing by Ligation (SOLiD)
• Reversible Dye Terminators (Illumina)
Generation of Luminescent or Fluorescent Images
Conversion to Sequence
454/Roche: Bead Emulsion PCR, Pyrosequencing, 200 – 400 base reads
Solexa/Illumina: Surface Bridge PCR, Reversible dye terminators, 36 – 75 base reads
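Solexa/Illumina Sequencing
[Figure: fluorescent images for the four bases A, T, C, G]
Qualitative and Quantitative Information
[Figure: Illumina reads aligned to the Ref Seq showing a G>A variant, with a Coverage track]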
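The qualitative (which base) and quantitative (how many reads) information in aligned reads can be pictured as a per-position tally. The following is a minimal sketch, not a variant caller: the reference, the reads, and the helper `pileup_column` are hypothetical, and the alignments are assumed to be gap-free.

```python
from collections import Counter


def pileup_column(reads, position):
    """Count the bases that aligned reads place at one reference position.

    `reads` is a list of (start, sequence) pairs with 0-based start
    coordinates. Alignments are assumed to contain no gaps (illustrative
    simplification).
    """
    counts = Counter()
    for start, seq in reads:
        offset = position - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return counts


# Hypothetical toy data: the reference has G at position 4,
# and two of the four overlapping reads carry A instead (G>A).
reference = "ACGTGGCTA"
reads = [
    (0, "ACGTGG"),
    (1, "CGTAGC"),
    (2, "GTAGCT"),
    (3, "TGGCTA"),
]

position = 4
counts = pileup_column(reads, position)
depth = sum(counts.values())          # coverage at this position
alt_fraction = counts["A"] / depth    # fraction of reads supporting G>A

print(f"ref={reference[position]} counts={dict(counts)} "
      f"depth={depth} alt_fraction={alt_fraction:.2f}")
```

The same tally underlies a statement such as "26.5% mutant alleles": it is the number of reads carrying the variant base divided by the coverage at that position.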
Next Generation Sequencing
• Sequence up to billions of fragments simultaneously
• Iterative/cyclic sequencing
Detection: Luminescence (Roche), pH Detection (Ion Torrent), Fluorescence (Illumina, SOLiD)
Signal to Noise Processing
Cyclic Base Calls: A T G C
Base Quality Scores: A33 T30 G28 C30
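Base quality scores of this kind are conventionally Phred-scaled, Q = -10 log10(p), where p is the estimated probability that the base call is wrong. A minimal sketch of the conversion, applied to the scores shown above (the function names are illustrative):

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score to an estimated base-call error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


# Base calls and quality scores as shown on the slide.
calls = [("A", 33), ("T", 30), ("G", 28), ("C", 30)]
for base, q in calls:
    p = phred_to_error_probability(q)
    print(f"{base}{q}: estimated error probability {p:.5f}")
```

On this scale Q30 corresponds to an estimated 1 error in 1,000 base calls, and Q20 to 1 in 100.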
Next Generation Sequencing Data
Primary Sequence Alignment: BWA
Refined Sequence Alignment: GATK/Picard
Variant Calling: SAMTools/GATK
Variant Annotation: Annovar
FastQ File Format
@HW-ST573_75:1:1:1353:4122/11
CAATCGAATGGAATTATCGAATGCAATCGAATAGAATCATCGAATGGACTCGAATGGAATCATCGAA
+
ggfggggggggggggfgggggggfgeggggfdfeefeggggggggegbgegegggdeYedgggggeg
@HW-ST573_75:1:1:1347:4151/11
ATCTGTTCTTGTCTTTAACTCTCAAGGCACCACCTTCCATGGTCAATAATGAACAACGCCAGCATGC
+
effffggggggggggggfgggggggggggggdggggfgggfgdggaffffgfggffgdggggggdfg
@HW-ST573_75:1:1:1485:4153/11
GAGGAGAGATATTTTGACTTCCTCTCTTCATATTTGGATGCTTTTTACTTATCTCTCTTGACTAATT
+
dZdddbXc`_ccccbeeedbeaedeeeee^aeeedcaZca_`^c[eeeeed]eeecd[dd^eeba[d
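Each FastQ record has four parts: a header beginning with "@", the base sequence, a "+" separator, and one quality character per base. The sketch below parses such records and decodes the quality characters. It assumes a Phred+64 offset, which the lowercase quality characters on the slide suggest (older Illumina pipelines) but which the slide does not state; many current files use Phred+33 instead. The example record is the first read from the slide, truncated to its first 10 bases, and the function names are illustrative.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) from four-line FastQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FastQ text must contain four lines per record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def decode_qualities(qual, offset=64):
    """Turn quality characters into Phred scores (offset 64 is an assumption)."""
    return [ord(ch) - offset for ch in qual]


# First record from the slide, truncated to 10 bases for brevity.
example = """\
@HW-ST573_75:1:1:1353:4122/11
CAATCGAATG
+
ggfggggggg
"""

records = list(parse_fastq(example))
for read_id, seq, qual in records:
    scores = decode_qualities(qual)
    print(read_id, seq, scores)
```

If the offset were wrong, the decoded scores would fall outside a plausible range (for example, values near 70 under Phred+33), which is a quick sanity check on the encoding.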
Variant g.34142190T>C in TPM1
Next Generation Sequencers
First Wave
• 454/Roche (2004/5): GS FLX
• Solexa/Illumina (2006/7): Genome Analyzer
• ABI/Life Tech (2007/8): SOLiD
Second Wave - SMS
• Helicos: HeliScope
• Pacific Biosciences: SMRT
Third Wave
• 454/Roche: GS Junior
• Illumina: GAIIx, GAIIe, HiScanSQ, HiSeq, MiSeq (2011)
• Life Technologies: SOLiD 5500, SOLiD 5500xl, Ion Torrent PGM (2011)
Clinical Dissemination
Illumina HiSeq 2000
• 2 X 100 base pairs
• 2 Independent Flow Cells, 8 Lanes per Flow Cell
• 540-600 Gb Output
• 8-11 Day Sequencing Run
• Multiple Gene Panel Samples per Lane
• 2-3 Exome(s) per Lane
• 2 Genomes per Flow Cell
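The HiSeq 2000 figures above can be related by simple arithmetic. This is a rough sketch under stated assumptions: output is split evenly across lanes, the human genome size of 3.1 Gb is an assumption not given on the slide, and the result is raw yield per genome before duplicates, unaligned reads, and filtering reduce usable coverage.

```python
def per_lane_output_gb(run_output_gb, flow_cells=2, lanes_per_flow_cell=8):
    """Raw output per lane, assuming output is spread evenly over all lanes."""
    return run_output_gb / (flow_cells * lanes_per_flow_cell)


def mean_coverage(output_gb, target_size_gb):
    """Raw mean fold coverage: sequenced bases divided by target size."""
    return output_gb / target_size_gb


GENOME_SIZE_GB = 3.1        # assumption: approximate human genome size
GENOMES_PER_FLOW_CELL = 2   # from the slide

for run_output in (540, 600):  # Gb per run, from the slide
    lane = per_lane_output_gb(run_output)
    per_flow_cell = run_output / 2
    per_genome = per_flow_cell / GENOMES_PER_FLOW_CELL
    coverage = mean_coverage(per_genome, GENOME_SIZE_GB)
    print(f"{run_output} Gb run: {lane:.2f} Gb/lane, "
          f"{per_genome:.0f} Gb/genome, ~{coverage:.0f}x raw coverage")
```

Under these assumptions a lane yields roughly 34 to 38 Gb, and two genomes per flow cell corresponds to raw coverage in the mid 40s per genome.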
Illumina MiSeq
• Reversible Dye Terminators
• 2 X 150 bp, 2 X 250 bp
• 2.0 – 7.0 Gb Output
• ~27 Hrs Sequencing Run
Applications
• Multi-Gene Panels: Genetics, Oncology, Microbiology
• Viral and Bacterial Genomes
• Transcriptomes
Illumina MiSeq Transcriptome Sequencing GAPDH Sequence Reads
Ion Torrent
Nucleotide incorporation releases a Hydrogen Ion and Pyrophosphate
Monitors H+ Release
Ion Torrent
• Monitors H+ Release
• 100 – 200 base pairs
• 10 Mb – 1.0 Gb Output
• ~2 Hrs Sequencing Run
Applications
• Multi-Gene Panels: Genetics, Oncology, Microbiology
• Viral and Bacterial Genomes
• Transcriptomes
Ion Torrent
BRAF, c.1799T>A, p.V600E 26.5% mutant alleles
Technology Advances for 2012/13
Illumina HiSeq 2000
• Current: 2 X 100 base pairs, 540-600 Gb Output, 11 Day Sequencing Run
• Late 2012 Upgrade Module: 120 Gb in 27+ Hours
• Single Genome in 27+ Hours
• Multiple Exomes in 27+ Hours
Ion Torrent Proton (Late 2012): Exomes/Genome in “Several Hours”
Oxford Nanopore Technologies Processive Enzyme
Protein Nanopore in Polymer Membrane
Current Disruption Based Electronic Signal
MinION – Late 2012
The Meeting Place
Biotechnology: Sequence Generation
Bioinformatics: Sequence Analysis, Interpretation
Biomedical Question
• What is the Genetic Landscape of a Tumor?
• What Pathogen is Responsible for an Outbreak?
• What Genetic Contributors Account for a Phenotype?
Clinical Applications, in order of Increasing Complexity
• Multi-Gene Diagnostics
• Whole Exome
• Whole Genome
Multi-Gene Diagnostics
Clinical Phenotype
• Multiple Genes: Locus Heterogeneity
• Mutational Spectrum: Allelic Heterogeneity
Multi-Gene Diagnostics
• “New First Tier” Genetic Testing
• Scaling Increases Interpretive Complexity
• Can Yield Non-Definitive Results
• Gateway to Exome/Genome
Multi-Gene Diagnostics
Genomic DNA
Enrichment of Target Genes
NGS Library Preparation
Next Generation Sequencing
Bioinformatics
Interpretation
Gene Enrichment Approaches
Genomic DNA
Amplification Based
• Methods: PCR or LR-PCR, RainDance ePCR, Fluidigm, HaloGenomics
• Advantage: Enrichment Specificity
• Drawbacks: Not as Scalable, Instrument and Chip Costs
Array Capture Based
• Methods: Solid Surface or In Solution
• Advantage: Scalable to Exome
• Drawbacks: Homologous Sequence Capture, Manually Complex
Enriched Genes
NGS
Clinical Applications, in order of Increasing Complexity
• Multi-Gene Diagnostics
• Whole Exome
• Whole Genome
Human Exome: “Journey to the Center of the Genome”
• ~ 30+ Megabases (~ 1.5% of the genome)
• ~ 180,000 exons (~ 20,500 genes)
• Harbors “Majority” of Mendelian Mutations
Exome Sequencing History
• “Genetic Diagnosis by Whole Exome Capture and Massively Parallel DNA Sequencing”, Choi et al, PNAS 2009: Congenital Chloride Diarrhea
• ~45 Gene Discovery Publications as of May 2012: Recessive, Dominant, De Novo
Genomic DNA
Library Preparation
Next Generation Sequencing Library
Hybridize to Exome Capture Probes
Exome Enriched Library
Next Generation Sequencing
Bioinformatics Analysis
Comparison of Exome DNA Sequencing Technologies
Clark et al Nature Biotech Vol 29(10) Oct 2011
Exome Sequencing - Coverage of Coding Regions is Variable
[Figure: Coverage, Aligned reads, Reference, and Capture probes tracks for Exon 1 of MAZ and Exon 1 of HLA-DOB; Nimblegen Exome Capture and Illumina HiSeq]
Exome Sequencing – Performance Characteristics
• Define Proportion of Exome “Adequately Covered”
• Conversely, Define Proportion of Exome “Not Adequately Covered”
Dependent On
• Capture Technology – Probe Design and Capture Efficiency
• Sequencing Depth
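One way to make “Adequately Covered” concrete is to report the fraction of targeted bases whose read depth meets a chosen minimum. The sketch below assumes per-base depths are already available; the depth values and the 20x threshold are hypothetical placeholders, since the slide does not specify a cutoff, and each laboratory would set its own.

```python
def fraction_adequately_covered(depths, min_depth):
    """Fraction of targeted bases with read depth >= min_depth."""
    if not depths:
        raise ValueError("no target positions supplied")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Hypothetical per-base depths across a small targeted region.
depths = [0, 5, 12, 25, 30, 41, 38, 22, 19, 3]
MIN_DEPTH = 20  # assumption: illustrative threshold only

adequately = fraction_adequately_covered(depths, MIN_DEPTH)
not_adequately = 1 - adequately

print(f"Adequately covered (>= {MIN_DEPTH}x): {adequately:.0%}")
print(f"Not adequately covered: {not_adequately:.0%}")
```

Reporting both numbers, as the slide suggests, makes explicit which part of the exome a negative result does not cover.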
Exome Sequencing – Performance Characteristics
Define Proportion of Exome “Accurately Sequenced”
• Co-Capture Component
• Difficult to Sequence Regions
• Pseudogenes
• Repetitive Elements
• Paralogs and Homologs
Mendelian Disorders – Working Hypothesis
Seeking “Rare” Variants in a Single Gene(s): Needle(s) in the Haystack(s)
Bioinformatics: Annotated Variants
Prioritization by Heuristic Filtering
• Filter Out Common Variants: dbSNP/1000 genomes, Variant frequency
• Pedigree Information: Linkage/SGS/IBD Intersects
• Variant Binning: Missense, Nonsense/Frameshift/Splice Site/Indels
• Pathogenicity Prediction Filtering: SIFT/PolyPhen, GERP
• Cross Reference Databases: HGMD/OMIM/Locus Specific
Prioritization by Likelihood Prediction
• VAAST Algorithm
Candidate Genes/Potential Causative Variants
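The heuristic filtering route can be sketched as a chain of simple filters over annotated variants. This is an illustration only, not the pipeline used in the talk and not the VAAST algorithm: the `Variant` fields, the 1% frequency cutoff, the gene names, and the `predicted_damaging` flag (standing in for SIFT/PolyPhen-style output) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    consequence: str             # e.g. "missense", "nonsense", "synonymous"
    population_frequency: float  # hypothetical frequency from a reference panel
    predicted_damaging: bool     # stand-in for a pathogenicity prediction


# Consequence classes retained for follow-up (illustrative binning).
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site", "indel"}


def prioritize(variants, max_frequency=0.01):
    """Keep rare, protein-altering variants; drop benign-predicted missense."""
    kept = []
    for v in variants:
        if v.population_frequency > max_frequency:
            continue  # filter out common variants
        if v.consequence not in PROTEIN_ALTERING:
            continue  # keep only the binned consequence classes
        if v.consequence == "missense" and not v.predicted_damaging:
            continue  # pathogenicity prediction applies to missense calls
        kept.append(v)
    return kept


# Hypothetical annotated variants.
variants = [
    Variant("GENE_A", "missense", 0.25, True),     # common: removed
    Variant("GENE_B", "synonymous", 0.001, False),  # not protein altering: removed
    Variant("GENE_C", "missense", 0.0005, False),   # predicted benign: removed
    Variant("GENE_D", "nonsense", 0.0, False),      # rare, truncating: kept
    Variant("GENE_E", "missense", 0.002, True),     # rare, predicted damaging: kept
]

candidates = prioritize(variants)
print([v.gene for v in candidates])
```

Each filter trades sensitivity for a shorter candidate list, so thresholds of this kind are usually tuned to the inheritance model and checked against pedigree information before a variant is treated as potentially causative.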
Exome Sequencing vs Genome Sequencing
Exome Sequencing
• Genomic DNA
• Library Preparation
• Next Generation Sequencing Library
• Hybridize to Exome Capture Probes
• Exome Enriched Library
• Next Generation Sequencing
• Bioinformatics Analysis
Genome Sequencing
• Genomic DNA
• Library Preparation
• Next Generation Sequencing Library
• Next Generation Sequencing
• Bioinformatics Analysis
Cost – Coverage – Complexity
Whole Genome Sequencing Chr 10: g.43,615,633C>G in RET
Horizon • Continued Evolution of Sequencing and Bioinformatics • College of American Pathologists Checklist Requirements for Next Generation Sequencing • Professional Societies Guidelines for Clinical Next Generation Sequencing
Self Assessment Questions • Describe Process Steps for NGS • List NGS Platform Options and Capabilities • Relate Spectrum of Clinical NGS Applications