Next Generation Sequencing Update


Karl V. Voelkerding, MD
Professor of Pathology, University of Utah
Medical Director for Genomics and Bioinformatics, ARUP Laboratories
AACC-AMP 2012 Molecular Pathology Course

Disclosures
• Grant/Research Support: NIH
• Salary/Consultant Fees: None
• Committees: College of American Pathologists
• Stocks/Bonds: None
• Honorarium/Expenses: None
• Intellectual Property/Royalty Income: None

Learning Objectives
• Explain Principles of NGS
• Describe Current and Future NGS Platform Options
• Discuss Spectrum of NGS Clinical Applications

First Next Generation Sequencing Publication
454 Life Sciences, Nature 437(7057):376-380, 2005

Paradigm Shift
• Sanger Sequencing: Electrophoretic Separation of Chain Termination Products
• Next Generation Sequencing: Sequencing of Clonally Amplified DNA Templates in a Flow Cell, in a Massively Parallel Configuration

Process
• Fragmentation of Genomic DNA or Enriched Genes (150 – 500 bp)
• End Repair and Adapter Ligation, producing a “Fragment Library” in which each fragment (A, B, C, ...) is flanked by adapters
• Clonal Amplification of Each Fragment, by Emulsion Bead PCR or as Surface Clusters
• Sequencing of Clonal Amplicons in a Flow Cell
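To make the library steps concrete, here is a minimal Python sketch that cuts a DNA string into 150 – 500 bp pieces and flanks each with adapters. It simplifies heavily: real fragmentation is random shearing of many genome copies (so fragments overlap), end repair is not modeled, and the adapter sequences `ADAPTER_5`/`ADAPTER_3` are hypothetical placeholders, not any vendor's design.

```python
import random

# Hypothetical placeholder adapters; real platforms use their own adapter designs.
ADAPTER_5 = "CTGACTGA"
ADAPTER_3 = "TCAGTCAG"


def fragment(dna: str, min_len: int = 150, max_len: int = 500, seed: int = 0) -> list[str]:
    """Cut a DNA string into consecutive fragments of random length.

    Simplification: fragments tile one copy of the input end to end, and the
    last fragment may be shorter than min_len.
    """
    rng = random.Random(seed)
    fragments = []
    pos = 0
    while pos < len(dna):
        length = rng.randint(min_len, max_len)
        fragments.append(dna[pos:pos + length])
        pos += length
    return fragments


def ligate_adapters(fragments: list[str]) -> list[str]:
    """Flank every fragment with adapters to form a toy "fragment library"."""
    return [ADAPTER_5 + frag + ADAPTER_3 for frag in fragments]


if __name__ == "__main__":
    rng = random.Random(42)
    genomic = "".join(rng.choice("ACGT") for _ in range(2000))
    library = ligate_adapters(fragment(genomic))
    print(f"{len(library)} library molecules")
    for molecule in library[:3]:
        print(len(molecule), molecule[:20] + "...")
```

Every library molecule then shares the same known flanking sequence, which is what lets one set of primers amplify and sequence all fragments in parallel.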

Process: Sequencing of Clonal Amplicons in a Flow Cell
• Pyrosequencing (454)
• Sequencing by Ligation (SOLiD)
• Reversible Dye Terminators (Illumina)
Each chemistry generates luminescent or fluorescent images, which are then converted to sequence.

• 454/Roche: Bead Emulsion PCR; Pyrosequencing; 200 – 400 base reads
• Solexa/Illumina: Surface Bridge PCR; Reversible Dye Terminators; 36 – 75 base reads

Solexa/Illumina Sequencing: each base (A, T, C, G) carries a distinct fluorescent label
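The base-calling idea behind reversible dye terminators can be sketched as choosing the brightest of the four channels in each cycle. This is an illustration under stated assumptions: the intensity values are invented, real pipelines also correct for channel cross-talk and phasing, and the purity score follows the commonly described "chastity" ratio (brightest / (brightest + second brightest)) rather than any vendor's exact implementation.

```python
BASES = "ACGT"


def call_cycle(intensities: dict[str, float]) -> tuple[str, float]:
    """Call one base from the four dye-channel intensities of a single cycle.

    Returns the brightest channel and a chastity-style purity score,
    brightest / (brightest + second brightest).
    """
    ranked = sorted(BASES, key=lambda base: intensities[base], reverse=True)
    top = intensities[ranked[0]]
    second = intensities[ranked[1]]
    total = top + second
    purity = top / total if total > 0 else 0.0
    return ranked[0], purity


def call_read(cycles: list[dict[str, float]]) -> tuple[str, list[float]]:
    """Reversible terminators add one base per cycle, so each cycle yields one call."""
    calls = [call_cycle(cycle) for cycle in cycles]
    sequence = "".join(base for base, _ in calls)
    purities = [purity for _, purity in calls]
    return sequence, purities


if __name__ == "__main__":
    # Invented intensities for three cycles of one cluster.
    cycles = [
        {"A": 900, "C": 50, "G": 40, "T": 60},
        {"A": 30, "C": 40, "G": 70, "T": 800},
        {"A": 100, "C": 120, "G": 700, "T": 80},
    ]
    sequence, purities = call_read(cycles)
    print(sequence, [round(p, 3) for p in purities])
```

A low purity value flags a cycle where two channels were nearly equally bright, which is the kind of ambiguity that later shows up as a lower base quality score.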

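Qualitative and Quantitative Information
[Figure: Illumina reads aligned to the reference sequence (Ref Seq), showing a G>A variant and the read coverage at that position]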
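The quantitative side of an alignment can be sketched by counting the bases observed at one reference position and converting them to allele fractions; a value such as the 26.5% BRAF mutant allele fraction shown later is a fraction of this kind. The pileup below is hypothetical and the sketch ignores base and mapping quality.

```python
from collections import Counter


def allele_fractions(pileup_bases: str) -> dict[str, float]:
    """Fraction of reads supporting each base at one reference position."""
    counts = Counter(pileup_bases.upper())
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {base: n / depth for base, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical position with reference G: 74 reads show G and 26 show A (G>A).
    pileup = "G" * 74 + "A" * 26
    print("depth:", len(pileup))
    print("fractions:", allele_fractions(pileup))
```

The qualitative answer (which base changed) and the quantitative answer (what fraction of reads carry it) come from the same counts; the depth determines how much that fraction can be trusted.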

Next Generation Sequencing
• Sequence up to billions of fragments simultaneously
• Iterative/cyclic sequencing, with detection by Luminescence (Roche), pH (Ion Torrent), or Fluorescence (Illumina, SOLiD)
• Signal to Noise Processing
• Cyclic Base Calls: A T G C ...
• Base Quality Scores: A33 T30 G28 C30 ...
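Base quality scores on the Phred scale relate to error probability as Q = -10 log10(P), so a score of 30 corresponds to roughly a 1 in 1,000 chance that the call is wrong. A minimal conversion sketch, applied to the example scores above:

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred quality score, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score implied by an error probability, Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Quality scores from the cyclic base-call example (A33 T30 G28 C30).
    for base, q in [("A", 33), ("T", 30), ("G", 28), ("C", 30)]:
        print(f"{base}: Q{q} -> error probability {phred_to_error(q):.5f}")
```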

Next Generation Sequencing Data
• Primary Sequence Alignment: BWA
• Refined Sequence Alignment: GATK/Picard
• Variant Calling: SAMtools/GATK
• Variant Annotation: ANNOVAR
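Primary alignment places each read on the reference. BWA does this with a Burrows-Wheeler transform index; the Python sketch below swaps in a much simpler technique (an exact k-mer seed followed by mismatch counting) purely to show the seed-and-extend idea. The function names and the toy reference are hypothetical, and the sketch seeds only from the first k-mer of each read, so a mismatch inside that seed would be missed.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def align_read(read: str, reference: str, index: dict[str, list[int]], k: int,
               max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Seed with the read's first k-mer, then count mismatches along the full read.

    Returns (reference position, mismatch count) for each acceptable placement.
    """
    hits = []
    for start in index.get(read[:k], []):
        window = reference[start:start + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    reference = "TTAGCCGATCGGATACCGTAGGCTAACGTTAGC"
    index = build_kmer_index(reference, k=5)
    read = reference[6:22]
    mutated = read[:10] + "T" + read[11:]  # one substitution outside the seed
    print(align_read(mutated, reference, index, k=5))
```

The mismatch reported at the aligned position is exactly what downstream variant calling later evaluates across many overlapping reads.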

FASTQ File Format (four lines per read: identifier, sequence, “+” separator, base quality string)
@HW-ST573_75:1:1:1353:4122/11
CAATCGAATGGAATTATCGAATGCAATCGAATAGAATCATCGAATGGACTCGAATGGAATCATCGAA
+
ggfggggggggggggfgggggggfgeggggfdfeefeggggggggegbgegegggdeYedgggggeg
@HW-ST573_75:1:1:1347:4151/11
ATCTGTTCTTGTCTTTAACTCTCAAGGCACCACCTTCCATGGTCAATAATGAACAACGCCAGCATGC
+
effffggggggggggggfgggggggggggggdggggfgggfgdggaffffgfggffgdggggggdfg
@HW-ST573_75:1:1:1485:4153/11
GAGGAGAGATATTTTGACTTCCTCTCTTCATATTTGGATGCTTTTTACTTATCTCTCTTGACTAATT
+
dZdddbXc`_ccccbeeedbeaedeeeee^aeeedcaZca_`^c[eeeeed]eeecd[dd^eeba[d
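The four-line layout is straightforward to parse. The sketch assumes the quality characters use the older Illumina Phred+64 encoding (a `g` would then mean Q39, which fits the example reads); Sanger-style and Illumina 1.8+ files use an offset of 33 instead. `parse_fastq` and `FastqRecord` are hypothetical helpers, not library functions.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]


def parse_fastq(lines: list[str], offset: int = 64) -> Iterator[FastqRecord]:
    """Parse four-line FASTQ records and decode quality characters to Phred scores.

    offset=64 assumes the older Illumina Phred+64 encoding; use 33 for
    Sanger-style or Illumina 1.8+ files.
    """
    clean = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(clean) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(clean), 4):
        header, sequence, separator, quality = clean[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ: {header}")
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


if __name__ == "__main__":
    example = [
        "@HW-ST573_75:1:1:1353:4122/11",
        "CAATCGAATGGAATTATCGAATGCAATCGAATAGAATCATCGAATGGACTCGAATGGAATCATCGAA",
        "+",
        "ggfggggggggggggfgggggggfgeggggfdfeefeggggggggegbgegegggdeYedgggggeg",
    ]
    for record in parse_fastq(example):
        print(record.name, len(record.sequence), record.qualities[:5])
```

Guessing the wrong offset shifts every score by 31, so production tools typically detect or require the encoding explicitly.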

Variant g.34142190T>C in TPM1

Next Generation Sequencers
First Wave
• 454/Roche (2004/5): GS FLX
• Solexa/Illumina (2006/7): Genome Analyzer
• ABI/Life Tech (2007/8): SOLiD
Second Wave: Single Molecule Sequencing (SMS)
• Helicos: HeliScope
• Pacific Biosciences: SMRT
Third Wave
• GS Junior
• GAIIx, GAIIe, HiScanSQ, HiSeq
• MiSeq (2011)
• SOLiD 5500, SOLiD 5500xl
• Ion Torrent (Life Technologies) PGM (2011)
Clinical Dissemination

Illumina HiSeq 2000
• 2 × 100 base pairs
• 2 Independent Flow Cells, 8 Lanes per Flow Cell
• 540-600 Gb Output
• 8-11 Day Sequencing Run
• Multiple Gene Panel Samples per Lane
• 2-3 Exomes per Lane
• 2 Genomes per Flow Cell
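The capacity figures above translate into average depth with simple arithmetic (mean coverage = bases sequenced / bases targeted). The sketch assumes 600 Gb per run, a ~3.1 Gb haploid human genome, and the ~30 Mb exome target; the results are raw averages before removal of duplicates, off-target reads, and low-quality bases.

```python
HUMAN_GENOME_BP = 3.1e9   # approximate haploid human genome size (assumption)
EXOME_TARGET_BP = 30e6    # ~30 Mb exome target (assumption)


def mean_coverage(total_bases: float, target_bp: float) -> float:
    """Average depth: total bases sequenced divided by bases targeted (C = N * L / G)."""
    return total_bases / target_bp


run_output = 600e9                 # upper end of the 540-600 Gb run output
per_flow_cell = run_output / 2     # 2 independent flow cells
per_lane = per_flow_cell / 8       # 8 lanes per flow cell

genome_depth = mean_coverage(per_flow_cell / 2, HUMAN_GENOME_BP)  # 2 genomes per flow cell
exome_depth = mean_coverage(per_lane / 3, EXOME_TARGET_BP)         # 3 exomes per lane

if __name__ == "__main__":
    print(f"per lane: {per_lane / 1e9:.1f} Gb")
    print(f"genome: ~{genome_depth:.0f}x mean depth")
    print(f"exome:  ~{exome_depth:.0f}x mean depth (before off-target losses)")
```

Under these assumptions each genome works out to roughly 48× and each exome to roughly 417× raw depth; the exome figure shrinks substantially once off-target reads are discarded.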

Illumina MiSeq (Reversible Dye Terminators)
• 2 × 150 bp or 2 × 250 bp
• 2.0 – 7.0 Gb Output
• ~27 Hrs Sequencing Run
• Multi-Gene Panels: Genetics, Oncology, Microbiology
• Viral and Bacterial Genomes
• Transcriptomes

Illumina MiSeq Transcriptome Sequencing: GAPDH Sequence Reads

Ion Torrent
Nucleotide incorporation releases pyrophosphate and a hydrogen ion; the chip monitors H+ release.
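Because each Ion Torrent flow supplies a single nucleotide and the H+ signal scales roughly with the number of bases incorporated, a run can be read out in "flow space." The sketch below simply rounds each flow's signal to a homopolymer length; the flow order `TACG` and the signal values are assumptions for illustration, and real base callers model phasing and noise, since homopolymer length is a well-known source of error for this chemistry.

```python
def flows_to_sequence(signals: list[float], flow_order: str = "TACG") -> str:
    """Convert per-flow signals into bases.

    Each flow exposes the chip to one nucleotide (cycling through flow_order).
    The signal is roughly proportional to the number of bases incorporated,
    so ~2.0 is read as a two-base homopolymer and ~0 as no incorporation.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, round(signal))
        bases.append(nucleotide * count)
    return "".join(bases)


if __name__ == "__main__":
    # Invented signals for eight flows (two passes through T, A, C, G).
    signals = [1.05, 0.10, 2.10, 0.90, 0.05, 1.02, 0.00, 3.10]
    print(flows_to_sequence(signals))  # TCCGAGGG
```

A signal of 3.4 versus 3.6 is the difference between calling three or four bases, which is why homopolymer runs are harder for this chemistry than isolated bases.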

Ion Torrent
• 100 – 200 base pairs
• 10 Mb – 1.0 Gb Output
• ~2 Hrs Sequencing Run
• Monitors H+ Release
• Multi-Gene Panels: Genetics, Oncology, Microbiology
• Viral and Bacterial Genomes
• Transcriptomes

Ion Torrent: BRAF c.1799T>A, p.V600E, 26.5% mutant alleles

Technology Advances for 2012/13
Illumina HiSeq 2000
• Current: 2 × 100 base pairs, 540-600 Gb Output, 11 Day Sequencing Run
• Upgrade Module (Late 2012): 120 Gb in 27+ Hours
• Single Genome in 27+ Hours
• Multiple Exomes in 27+ Hours
Ion Torrent Proton (Late 2012)
• Exomes/Genome in “Several Hours”

Oxford Nanopore Technologies
• Processive Enzyme
• Protein Nanopore in Polymer Membrane
• Current Disruption Based Electronic Signal
• MinION: Late 2012

The Meeting Place
• Biotechnology: Sequence Generation
• Bioinformatics: Sequence Analysis and Interpretation
• Biomedical Question, for example:
  What is the genetic landscape of a tumor?
  What pathogen is responsible for an outbreak?
  What genetic contributors account for a phenotype?

Clinical Applications, in Order of Increasing Complexity
• Multi-Gene Diagnostics
• Whole Exome
• Whole Genome

Multi-Gene Diagnostics
• A Clinical Phenotype caused by Multiple Genes: Locus Heterogeneity
• A Broad Mutational Spectrum within a Gene: Allelic Heterogeneity

Multi-Gene Diagnostics: “New First Tier” Genetic Testing
• Scaling Increases Interpretive Complexity
• Can Yield Non-Definitive Results
• Gateway to Exome/Genome

Multi-Gene Diagnostics Workflow
Genomic DNA → Enrichment of Target Genes → NGS Library Preparation → Next Generation Sequencing → Bioinformatics → Interpretation


Gene Enrichment Approaches
• Genomic DNA Amplification Based: PCR or LR-PCR (long-range PCR), RainDance ePCR, Fluidigm, HaloGenomics
  Advantage: Enrichment Specificity
  Drawbacks: Not as Scalable; Instrument and Chip Costs
• Array Capture Based: Solid Surface or In Solution
  Advantage: Scalable to Exome
  Drawbacks: Homologous Sequence Capture; Manually Complex
The Enriched Genes are then sequenced by NGS.
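Enrichment specificity is often summarized as the fraction of reads that land on the targeted regions. A minimal sketch for a single chromosome, using half-open intervals; the read positions and target intervals are hypothetical.

```python
def on_target_fraction(read_starts: list[int], read_length: int,
                       targets: list[tuple[int, int]]) -> float:
    """Fraction of reads overlapping at least one target interval.

    All coordinates are on one chromosome; intervals are half-open [start, end).
    """
    if not read_starts:
        return 0.0
    on_target = 0
    for start in read_starts:
        end = start + read_length
        if any(start < t_end and end > t_start for t_start, t_end in targets):
            on_target += 1
    return on_target / len(read_starts)


if __name__ == "__main__":
    targets = [(1000, 1200), (5000, 5300)]           # hypothetical target intervals
    read_starts = [950, 1150, 3000, 5250, 7000]      # hypothetical 100 bp reads
    print(on_target_fraction(read_starts, 100, targets))  # 0.6
```

Reads from homologous sequence that are co-captured would still count as "on target" here if they mis-map into the targets, which is one reason specificity alone does not guarantee accuracy.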


Human Exome: “Journey to the Center of the Genome”
• ~30+ Megabases (~1.5% of the genome)
• ~180,000 exons (~20,500 genes)
• Harbors the “Majority” of Mendelian Mutations

Exome Sequencing History
• Choi et al, PNAS 2009: “Genetic Diagnosis by Whole Exome Capture and Massively Parallel DNA Sequencing” (Congenital Chloride Diarrhea)
• ~45 Gene Discovery Publications as of May 2012, covering Recessive, Dominant, and De Novo disorders

Exome Sequencing Workflow
Genomic DNA → Library Preparation → Next Generation Sequencing Library → Hybridize to Exome Capture Probes → Exome Enriched Library → Next Generation Sequencing → Bioinformatics Analysis

Comparison of Exome DNA Sequencing Technologies (Clark et al, Nature Biotechnology 29(10), Oct 2011)


Exome Sequencing: Coverage of Coding Regions is Variable
[Figure: coverage, aligned reads, reference, and capture probes across Exon 1 of MAZ and of HLA-DOB; NimbleGen Exome Capture and Illumina HiSeq]

Exome Sequencing: Performance Characteristics
• Define Proportion of Exome “Adequately Covered”
• Conversely, Define Proportion of Exome “Not Adequately Covered”
• Dependent On:
  Capture Technology: Probe Design and Capture Efficiency
  Sequencing Depth
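The proportion of the exome "adequately covered" can be computed from per-base depth over the target. The sketch below assumes a 20× minimum depth, an illustrative threshold rather than one taken from this presentation, and uses made-up depth values.

```python
def coverage_summary(depths: list[int], min_depth: int = 20) -> dict[str, float]:
    """Summarize per-base read depth across the targeted bases."""
    if not depths:
        raise ValueError("no target bases supplied")
    adequate = sum(1 for d in depths if d >= min_depth) / len(depths)
    return {
        "mean_depth": sum(depths) / len(depths),
        "fraction_adequately_covered": adequate,
        "fraction_not_adequately_covered": 1 - adequate,
    }


if __name__ == "__main__":
    # Made-up depths for ten target bases, including a poorly captured stretch.
    depths = [0, 5, 18, 20, 35, 60, 80, 120, 150, 12]
    print(coverage_summary(depths))
```

In this toy example the mean depth is 50× yet 40% of bases fall below the threshold, which is why the proportion covered is reported alongside, not inferred from, the mean.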

Exome Sequencing: Performance Characteristics
• Define Proportion of Exome “Accurately Sequenced”
• Difficult to Sequence Regions
• Co-Capture Component: Pseudogenes, Repetitive Elements, Paralogs and Homologs

Mendelian Disorders: Working Hypothesis
• Seeking “Rare” Variants in a Single Gene (or a Few Genes)
• Needle(s) in the Haystack(s)

Bioinformatics: From Annotated Variants to Candidate Genes/Potential Causative Variants
• Prioritization by Heuristic Filtering
  Filter Out Common Variants: dbSNP/1000 Genomes Variant Frequency
  Pedigree Information: Linkage/SGS/IBD Intersects
  Variant Binning: Missense; Nonsense/Frameshift/Splice Site/Indels
  Pathogenicity Prediction Filtering: SIFT/PolyPhen, GERP
  Cross Reference Databases: HGMD/OMIM/Locus Specific
• Prioritization by Likelihood Prediction: VAAST Algorithm
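The heuristic filtering steps can be sketched as a chain of simple rules. Everything here is hypothetical: the `Variant` fields stand in for annotations such as dbSNP/1000 Genomes frequencies and SIFT/PolyPhen predictions, the 1% frequency cutoff is an assumption, and this is not the VAAST algorithm, which scores variants by likelihood instead of applying hard filters.

```python
from dataclasses import dataclass

# Consequence classes retained by this hypothetical filter (variant binning).
DAMAGING_CLASSES = {"missense", "nonsense", "frameshift", "splice_site", "indel"}


@dataclass
class Variant:
    gene: str
    consequence: str           # hypothetical annotation label, e.g. "missense"
    population_af: float       # stand-in for a dbSNP/1000 Genomes allele frequency
    predicted_damaging: bool   # stand-in for a SIFT/PolyPhen-style prediction


def heuristic_filter(variants: list[Variant], max_af: float = 0.01) -> list[Variant]:
    """Apply simple sequential filters and return the surviving candidates."""
    kept = []
    for v in variants:
        if v.population_af > max_af:
            continue  # filter out common variants
        if v.consequence not in DAMAGING_CLASSES:
            continue  # drop classes such as synonymous or intronic
        if v.consequence == "missense" and not v.predicted_damaging:
            continue  # pathogenicity prediction applied to missense changes
        kept.append(v)
    return kept


if __name__ == "__main__":
    variants = [
        Variant("GENE1", "missense", 0.20, True),
        Variant("GENE2", "synonymous", 0.0001, False),
        Variant("GENE3", "missense", 0.0005, False),
        Variant("GENE4", "missense", 0.0002, True),
        Variant("GENE5", "frameshift", 0.0, False),
    ]
    print([v.gene for v in heuristic_filter(variants)])  # ['GENE4', 'GENE5']
```

Hard filters like these are easy to audit but can discard a true causative variant that happens to exceed a cutoff, which is part of the motivation for likelihood-based prioritization.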

Exome Sequencing vs Genome Sequencing
• Exome Sequencing: Genomic DNA → Library Preparation → Next Generation Sequencing Library → Hybridize to Exome Capture Probes → Exome Enriched Library → Next Generation Sequencing → Bioinformatics Analysis
• Genome Sequencing: Genomic DNA → Library Preparation → Next Generation Sequencing Library → Next Generation Sequencing → Bioinformatics Analysis
• Trade-offs: Cost, Coverage, Complexity
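The cost and coverage trade-off can be estimated by working backward from a desired depth to the raw sequence required. The sketch assumes 100× mean depth over a ~30 Mb exome with 60% of bases usable on target, and 30× over a ~3.1 Gb genome with 90% usable; all of these parameters are illustrative assumptions, not recommendations.

```python
def raw_bases_needed(target_bp: float, mean_depth: float, usable_fraction: float) -> float:
    """Raw sequence needed so that the usable fraction still reaches mean_depth."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return target_bp * mean_depth / usable_fraction


# Illustrative assumptions, not recommendations.
exome_bases = raw_bases_needed(30e6, 100, 0.6)   # ~30 Mb exome, 100x, 60% usable on target
genome_bases = raw_bases_needed(3.1e9, 30, 0.9)  # ~3.1 Gb genome, 30x, 90% usable

if __name__ == "__main__":
    print(f"exome:  {exome_bases / 1e9:.1f} Gb")
    print(f"genome: {genome_bases / 1e9:.1f} Gb")
    print(f"ratio:  {genome_bases / exome_bases:.0f}x more sequence for the genome")
```

Under these assumptions a genome needs about 5 Gb versus about 103 Gb of raw sequence, roughly 20 times as much, before accounting for the larger number of variants to interpret.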

Whole Genome Sequencing Chr 10: g.43,615,633C>G in RET

Horizon
• Continued Evolution of Sequencing and Bioinformatics
• College of American Pathologists Checklist Requirements for Next Generation Sequencing
• Professional Societies Guidelines for Clinical Next Generation Sequencing

Self Assessment Questions
• Describe Process Steps for NGS
• List NGS Platform Options and Capabilities
• Relate Spectrum of Clinical NGS Applications