Whole Genome Sequencing by Next-Generation Methods: Genome-forward Medicine
Elaine R. Mardis, Ph.D.
Professor of Genetics
Washington University School of Medicine, St. Louis, MO
2011 ASCP Annual Meeting
Conflict of Interest Slide
I have the following conflicts to declare:
Speaker’s Bureau: Illumina, Inc.
Scientific Advisory Board: Pacific Biosciences, Inc.
Stockholder: Life Technologies, Inc.
Overview
• Next-Generation Sequencers (NGS)
  • Roche/454
  • Illumina
  • Life Technologies
• Third Generation Sequencers
  • Pacific Biosciences
  • Ion Torrent
  • MiSeq
  • Oxford Nanopore
• NGS Practical Considerations
  • Coverage: depth and breadth
  • Error model
  • Representation bias
• NGS Applications
Defining “Genome-Forward” Medicine
What is “Genome-Forward” Medicine?
Informing the physician about genomic aspects of a patient’s diagnosis, using next-generation sequencing methods in place of conventional approaches. In certain applications of genome-forward medicine, having information about the patient’s genome, or that of their microbial or viral infecting genome, may better inform the physician about treatment choices and in turn, improve patient response.
The Trajectory of Throughput: 10 years
E.R. Mardis, Nature (2011) 470: 198-203
Comparative costs: sequencing a human genome
• Capillary technology: Applied Biosystems 3730xl (2004), $15,000,000
• Next-gen technology: Illumina HiSeq (2011), $10,000
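As a quick worked check of the two cost figures on this slide, the short Python sketch below computes the fold reduction between them. The function name `fold_reduction` is illustrative only and not part of the talk.

```python
def fold_reduction(old_cost: float, new_cost: float) -> float:
    """Return how many times cheaper new_cost is than old_cost."""
    if new_cost <= 0:
        raise ValueError("new_cost must be positive")
    return old_cost / new_cost


capillary_2004 = 15_000_000  # Applied Biosystems 3730xl figure from the slide
hiseq_2011 = 10_000          # Illumina HiSeq figure from the slide

reduction = fold_reduction(capillary_2004, hiseq_2011)
print(f"{reduction:,.0f}-fold lower cost per human genome")
```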
Next-generation sequencers
The Fundamentals
Next-generation DNA sequencing instruments
All commercially-available sequencers have the following shared attributes:
• Random fragmentation of starting DNA, ligation with custom linkers = “a library”
• Library amplification on a solid surface (either bead or glass)
• Direct step-by-step detection of each nucleotide base incorporated during the sequencing reaction
• Hundreds of thousands to hundreds of millions of reactions imaged per instrument run = “massively parallel sequencing”
• Shorter read lengths than capillary sequencers
• A “digital” read type that enables direct quantitative comparisons
• A sequencing mechanism that samples both ends of every fragment sequenced (“paired end” reads)
What are paired-end reads?
Paired-end reads
All next-gen platforms now offer methods to derive sequence data from each end of the library fragments. Differences exist in the _distance_ between read pairs, based on the approach/platform.
• “paired ends”: linear fragment with the ability to sample both ends in separate reactions
• “mate pairs”: circularized fragment of >1 kb, sequenced either by a single reaction read or by two end reads (platform dependent)
In general, paired end reads offer advantages for human whole genome sequencing due to the repetitive nature of our DNA, and the difficulty in accurate placement (“mapping”) of NGS reads.
Read alignment to the Human Reference Genome
The Human Genome Reference
• Mapping short reads to the genome reference sequence is a required step for next-generation sequence data analysis, regardless of the preparatory steps used
• Once mapped to the genome, “localized” assembly of select reads can be used to define genomic alterations at single nucleotide resolution
• The human genome sequence serves as a “reference” to which we can compare other human genomes…
  • SNPs (single nucleotide polymorphisms)
  • Point mutations (maintain or change amino acids)
  • Insertion/deletions (add or remove ≥1 base in the sequence)
  • Translocations of chromosome fragments (inter- or intra-)
  • Inversions (an entire fragment swaps ends)
  • Amplification (multiple repeated copies of a genome segment) or Deletion (large blocks removed)
Roche/454 Library Prep
• Random fragmentation
• Adapter ligation
• Single-stranded adapter-ligated library
• emPCR for clonal amplification
Roche/454 Pyrosequencing
[Figure: pyrosequencing workflow. Beads are loaded into the PicoTiter™Plate by centrifugation, followed by the enzyme beads. Each DNA capture bead contains millions of copies of a single clonal fragment. After primer annealing, sequencing by synthesis proceeds: PPi released on nucleotide incorporation is converted with APS to ATP by sulfurylase, and luciferase uses the ATP and luciferin to produce light + oxyluciferin.]
454 Instrumentation Specifics
Instrument            Run Time (hr)  Read Length (bp)  Yield (Mb/run)  Error Type  Error Rate (%)  Purchase Cost (x1000)
454 FLX+              18-20          700               900             Indel       1               $30 (A)
454 FLX Titanium      10             400               500             Indel       1               $500
454 GS Jr. Titanium   10             400               50              Indel       1               $108
(A) Requires the 454 FLX Titanium. This is the upgrade cost.
Notable:
• Mate pair paired end reads of 3 kb, 8 kb and 20 kb separation without an increase in run time.
• Cost per run makes sequencing an entire human genome cost-prohibitive relative to other technologies (~$20/Mbp).
• Great platform for targeted validation.
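To see why ~$20/Mbp is cost-prohibitive for a whole human genome, the arithmetic can be sketched in Python. This is a rough estimate under stated assumptions: a haploid genome size of about 3,100 Mbp (an approximation not given on the slide) and the 30-fold oversampling quoted elsewhere in this talk for WGS. The function and constant names are illustrative.

```python
def cost_per_genome(genome_size_mbp: float, fold_coverage: float,
                    cost_per_mbp: float) -> float:
    """Estimated reagent cost to sequence a genome at the given coverage."""
    return genome_size_mbp * fold_coverage * cost_per_mbp


HUMAN_GENOME_MBP = 3_100   # assumption: approximate haploid human genome size
FOLD_COVERAGE = 30         # the ~30-fold oversampling cited for WGS in this talk
COST_PER_MBP_454 = 20      # ~$20/Mbp, from the slide

estimate = cost_per_genome(HUMAN_GENOME_MBP, FOLD_COVERAGE, COST_PER_MBP_454)
print(f"Estimated 454 cost for a 30x human genome: ${estimate:,.0f}")
```

Under these assumptions the estimate lands in the millions of dollars, against the $10,000 quoted earlier for an Illumina HiSeq genome in 2011.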
Illumina Sequencing: Library Preparation
• Automated processing
• Low gDNA inputs
Illumina Sequencing by Synthesis
[Figure: sequencing-by-synthesis cycle: incorporate, detect (excitation and emission), de-block, cleave fluor.]
Illumina Instrumentation Specifics
Key updates
• 2010: HiSeq 2000
  • Two flow cells per run
  • 100 Gbp/FC or two genome equivalents per run
  • New scanning mechanics: scans both surfaces of FC lanes
• 2011: HiSeq 2000
  • Improved chemistry (v. 3): increased yield and accuracy
• 2011: MiSeq
Instrument      Run Time (days)  Read Length (bp)  Yield (Gb/run)   Error Type       Error Rate (%)  Purchase Cost (x1000)
GAIIx           14               150 x 150         96               Sub              >0.1            $525
HiSeq 2000      8                100 x 100         200 x 2          Sub              >0.1            $700
HiSeq 2000 v3   10               100 x 100         [not recovered]  [not recovered]  0.1             $700
MiSeq           1                150 x 150         1                Sub*             >0.1*           $125
Life Technologies: sequencing by ligation
• Custom adapter library
• emPCR on magnetic beads
• Sequencing by ligation using fluorescent probes from a common primer
• Sequential rounds of ligation from a series of primers
• Fixed/known nucleotides for each probe set identify two bases each cycle, or “two base encoding”
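The idea of “two base encoding” can be illustrated with a small Python sketch. The slide does not give the color table, so this assumes the commonly documented SOLiD scheme: with A=0, C=1, G=2, T=3, each color is the XOR of the codes of two adjacent bases. Decoding needs the known first base (from the adapter). The function names are illustrative.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def encode_colors(sequence: str) -> list[int]:
    """Convert a base sequence into color calls, one per adjacent base pair."""
    return [BASE_TO_CODE[a] ^ BASE_TO_CODE[b]
            for a, b in zip(sequence, sequence[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Convert color calls back to bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        # XOR is its own inverse: next = previous XOR color
        bases.append(CODE_TO_BASE[BASE_TO_CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGCA")
decoded = decode_colors("A", colors)
print(colors, decoded)
```

Because every base contributes to two adjacent colors, a true single-base variant changes two consecutive colors, whereas an isolated color change is more likely a measurement error. This is the basis of the accuracy claim for two-base encoding.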
SOLiD Instrumentation Specifics
Instrument      Run Time (days)  Read Length (bp)         Yield (Gb/run)  Error Type  Error Rate (%)  Purchase Cost (x1000)
SOLiD 4         12               50 x 35 PE               71              A-T Bias    >0.06           $475
SOLiD 5500 xl   8                75 x 35 PE, 60 x 60 MP   155             A-T Bias    >0.01           $595
5500 xl
• Front-end automation addresses bottlenecks at emPCR, breaking, and enrichment of beads
• 6-lane Flow Chip with independent lanes/2 per run
• Cost per whole genome data set is predicted to be $6K by 2011
• Very high accuracy data due to two-base encoding
• ECC Module – An optional 6th primer that increases accuracy to 99.999%
• Direct conversion of color space to base space
• True paired-end chemistry enabled – Ligation reaction can be used in either direction
Third Generation Sequencing Instruments
Third generation sequencers??
Recently, new sequencing platforms were introduced. The Pacific Biosciences sequencer is a single molecule detection system that marries nanotechnology with molecular biology. The Ion Torrent uses pH rather than light to detect nucleotide incorporations. The MiSeq is a scaled down version of the HiSeq, with faster chemistry and scanning. All offer a faster run time, lower cost per run, reduced amount of data generated relative to 2nd Gen platforms, and the potential to address genetic questions in the clinical setting.
Comparisons to Third-Generation Sequencers
Company              Platform Name  Sequencing                        Amplification         Run Time
Roche                454 Ti         DNA Polymerase “Pyrosequencing”   emPCR                 10 hours
Illumina             HiSeq/MiSeq    DNA Polymerase                    Bridge amplification  10 days/24 hours
Life                 SOLiD/5500     DNA Ligase                        emPCR                 12 days
Ion Torrent          PGM            Synthesis, H+ detection           emPCR                 2 hours
Pacific Biosciences  RS             Synthesis                         NONE                  45 min
Pacific Biosciences RS
Sample Prep
• Shearing (Covaris/Hydroshear)
• Polish ends
• SMRTbell™ ligation
• Sequencing primer annealing
Library/Polymerase Complex
• DNA polymerase binding
• Load library/polymerase complex onto SMRT cell
Sequencing
• Movie 1 (v1.1 & v1.2): raw reads, post-filter reads, mapped reads
• Movie 2 (only v1.2): raw reads, post-filter reads, mapped reads
Zero Mode Waveguides (ZMWs)
• Ver. 1.1 = 1 x 45,000 per SMRT cell
• Ver. 1.2 = 2 x 75,000 per SMRT cell
SMRTbell Library Types
• Circular Consensus: small (~250 bp), single movie, multiple passes, many sub-reads
• Standard: large (~2 kbp), single movie, few sub-reads
• VLR (“very long read”): larger (~6 kbp), single 45 minute movie; a single long read provides linking information
[Figure labels: Read, Sub Reads, Consensus Read]
Pacific Biosciences RS Instrumentation Specifics
Instrument  Run Time (Hours)    Read Length (bp)  Yield (Mb)       Error Type  Error Rate (%)  Purchase Cost (x1000)
RS          14 (~8 SMRTCells)   1500              45 per SMRTCell  Insertions  15              $695
Mapped read statistics:
• mean mapped sub-read accuracy: 85.7% (±1.3%)
• mean mapped sub-read length: 697 (±501)
• maximum mapped read length: 7,772 bp
• maximum mapped sub-read length: 5,601 bp
Protocols:
• Strobe polymerase/strobe reagent/strobe protocol (45 min movie): 1x45 min movie, 8 SMRT cells
• Strobe polymerase/standard reagent/standard protocol: 2x45 min movies, 8 SMRT cells
Ion Torrent PGM Instrument
Data Output per Run Trajectory: Ion Torrent
At present, only the 314 and 316 chips are commercially available
Ion Torrent Data Yield Improves with Automation
Ion Torrent Yield with Automation Modules
Oxford Nanopore Sequencing
• Exonuclease-aided sequencing
• Pore translocation sequencing
NGS: Practical Considerations
Importance of coverage, error model, representation bias
Importance of Coverage
What is “coverage”?
• Coverage is a general term to describe the fold-oversampling of a DNA target by sequencing data
• In covering a target region or genome, increasing depth of coverage leads to increased certainty of variant detection
• Coverage levels may vary due to G+C content, amplification or deletion in the region, or other biases, the uniqueness of the target region (“mappability”), and the error model of the data type
• Coverage breadth is as important as depth!
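Fold-oversampling can be made concrete with a short calculation. The sketch below uses the standard definition of mean depth (total sequenced bases divided by target size). The read count, read length and genome size in the example are illustrative assumptions, not figures from the talk.

```python
def mean_depth(n_reads: int, read_length_bp: int, target_size_bp: int) -> float:
    """Mean fold-coverage: total sequenced bases divided by target size."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    return n_reads * read_length_bp / target_size_bp


# Assumed example: 930 million 100 bp reads against a ~3.1 Gbp genome.
depth = mean_depth(n_reads=930_000_000, read_length_bp=100,
                   target_size_bp=3_100_000_000)
print(f"Mean depth: {depth:.1f}x")
```

Mean depth says nothing about how evenly the reads are spread, which is why breadth is reported alongside it.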
Alignments: Coverage breadth
Breadth-of-coverage topography is altered greatly by the application of a minimum depth filter. The table shows breadth-of-coverage attrition through a range of minimum depth filters (1x, 5x, 10x, 15x, 20x) for a given region of interest (chr7:930225-931129). Note that the breadth of coverage drops from almost 100% at a 1x depth requirement to ~20% when we require >= 20x depth for coverage membership.
Coverage breadth topography
Depth   Breadth
1X      97.3%
5X      67.3%
10X     43.5%
15X     27.8%
20X     22.2%
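Breadth at a minimum depth filter is simply the fraction of target positions whose depth meets the threshold. A minimal Python sketch follows. The per-base depths are made-up toy values, not the chr7 data from the slide.

```python
def breadth_of_coverage(depths: list[int], min_depth: int) -> float:
    """Fraction of positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Toy per-base depths for a 10 bp region (illustrative only).
depths = [0, 3, 7, 12, 18, 25, 31, 9, 4, 1]
breadths = {t: breadth_of_coverage(depths, t) for t in (1, 5, 10, 15, 20)}
for threshold, breadth in breadths.items():
    print(f"{threshold}X: {breadth:.0%}")
```

As in the slide's table, breadth can only stay the same or fall as the depth requirement rises.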
Bias Assessment: GC content impacts coverage
Applying Next-generation Sequencing
• Whole Genome Sequencing (WGS)
• Hybrid Capture: Exome or Targeted sequencing
• Protein:DNA binding
• Non-coding RNA sequencing: discovery/variants
• Transcriptome sequencing (RNA-seq)
• Genome-wide methylation of DNA (Methyl-seq)
• Clinical sequencing for therapeutic decisions
E.R. Mardis, Annual Reviews in Genetics & Genomics (2008)
E.R. Mardis, Nature (2011) 470: 198-203
Whole Genome Sequencing Process
WGS: Data Production and Alignment
• Prepare paired end libraries as whole genome fragment/shotgun by random shearing of genomic DNA, adapter ligation, size selection.
• Produce paired end data from each end of billions of library fragments, over-sampling about 30-fold to cover at a depth sufficient to find all types of genome alterations.
• Computer programs align the read pair sequences onto the Human Reference Genome and several algorithms are used to discover variants genome-wide.
Data Analysis Workflow for Variant Detection
• 30-fold coverage of tumor genome, PE Illumina reads
• Align read pairs to reference genome
• Detect single-nucleotide variants and focused insertion/deletions
• Detect anomalous read pair mapping, assemble reads and identify structural variations (inversions, translocations)
• Use normalized read coverage levels and HMM-based algorithm to identify CNA and LOH regions
• Compare to SNP array analysis
• Design custom capture probes for each putative variant site
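The “anomalous read pair mapping” step in this workflow can be sketched as a simple classifier. This is a minimal illustration, not the pipeline used in the talk. It assumes forward/reverse paired-end orientation, hypothetical span limits of 200-600 bp, and measures span between the two reported mapping positions. The `ReadPair` type and the label names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair: ReadPair, min_span: int = 200, max_span: int = 600) -> str:
    """Label a mapped read pair as concordant or as a structural-variant hint."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"      # candidate translocation
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "same_strand"           # candidate inversion
    if left_strand == "-":
        return "everted"               # reads point away from each other
    span = right_pos - left_pos
    if span > max_span:
        return "span_too_large"        # candidate deletion
    if span < min_span:
        return "span_too_small"        # candidate insertion
    return "concordant"


pairs = [
    ReadPair("chr7", 1000, "+", "chr7", 1400, "-"),
    ReadPair("chr7", 1000, "+", "chr7", 9000, "-"),
    ReadPair("chr7", 1000, "+", "chr7", 1400, "+"),
    ReadPair("chr7", 1000, "+", "chr12", 500, "-"),
]
labels = [classify_pair(p) for p in pairs]
print(labels)
```

In practice a single anomalous pair is weak evidence. Clusters of pairs supporting the same breakpoint would then be passed to the local assembly step named in the workflow.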
Somatic Mutations in 50 AML Genomes
Identifying Prognostic Mutations
DNMT3A mutations in AML
Ley et al. NEJM 2010.
• DNMT3A mutations are present in 22% of de novo AMLs, and 34% of cytogenetically normal patients.
• DNMT3A mutations are strongly associated with poor prognosis.
Hybrid Capture of Genomic Regions or Genes
• Hybrid capture: fragments from a whole genome library are selected by combining with probes that correspond to targeted genes.
• The probe DNAs are biotinylated, making selection from solution with streptavidin magnetic beads an effective means of purification.
• The human “exome”, by definition, is the exons of all ~21,000 genes annotated in the human reference genome. Commercial exome kits target most but not all genes.
• Custom capture reagents can be synthesized to target specific loci that may be of interest in a clinical context.
Comparing Exome to Whole Genome Cancer Sequencing
• Exome sequencing costs ~1/10 WGS
• Simplified analysis
• Sequence more samples
vs.
• WGS captures SVs and focal amp/dels not resolved by SNP arrays
• Non-exonic mutations (“tier 2”) are likely to play a role in cancer
• Exome reagents do not capture all exons
• Tier 2/3 mutations facilitate heterogeneity analysis
Tumor Heterogeneity from Deep Read Count Analysis
Deep Digital Sequencing by Custom Hybrid Capture
Custom capture probes can be designed to sample from somatic point mutations, in/dels and structural variant regions in the tumor genome. High depth next-generation sequencing of these captured regions provides a digital read-out of the allele frequency of each mutation in the tumor cell population. The prevalence of a mutation reflects its ‘history’ in the tumor evolution. Older mutations are present at higher prevalence or allele frequency, newer ones at lower allele frequency. Using these data and statistical approaches, we can model the tumor heterogeneity for any sequenced sample, determining the mutational profile of each subclone and its proportion of the tumor cell population.
“Deep Digital Sequencing”
All cells carry a heterozygous mutation:
• Total reads = 1000; Mutant reads = 500; Mutant Allele Frequency = 500/1000 = 50%
Wild-type & het mutant cells:
• Total reads = 1000; Mutant reads = 250; Mutant Allele Frequency = 250/1000 = 25%
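The two read-count examples on this slide can be reproduced in a few lines of Python. The second helper converts allele frequency to the fraction of cells carrying the mutation under an explicit assumption: the mutation is heterozygous in a diploid, copy-number-neutral region, so each mutant cell contributes one mutant allele out of two. Function names are illustrative.

```python
def mutant_allele_frequency(mutant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mutant_reads / total_reads


def fraction_of_cells_mutant(maf: float) -> float:
    """Cell fraction carrying a heterozygous mutation (diploid, copy-neutral)."""
    return min(1.0, 2.0 * maf)


all_het = mutant_allele_frequency(500, 1000)   # all cells heterozygous
mixed = mutant_allele_frequency(250, 1000)     # wild-type and het cells mixed

print(f"{all_het:.0%} MAF -> {fraction_of_cells_mutant(all_het):.0%} of cells")
print(f"{mixed:.0%} MAF -> {fraction_of_cells_mutant(mixed):.0%} of cells")
```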
Tumor heterogeneity: single dominant clone
Single cluster of somatic mutations
Tumor heterogeneity: multiple clones
Cluster 3: 88% tumor allele frequency
Cluster 2: 57% tumor allele frequency
Cluster 1: 35% tumor allele frequency
22% Tumor Content of Tumor Sample
0.3% Tumor Content of Normal Sample
• Tumor allele frequencies are adjusted for normal cell contamination
• Kernel density estimation determines the number of discrete clones
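How kernel density estimation can suggest the number of clones is sketched below in plain Python. The slide names the technique but not its details, so the Gaussian kernel, the 0.04 bandwidth, the evaluation grid and the toy allele frequencies (scattered around the slide's three cluster centers) are all assumptions for illustration.

```python
import math


def gaussian_kde(values: list[float], bandwidth: float):
    """Return a density function built from Gaussian kernels at each value."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x: float) -> float:
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
                   for v in values) / norm

    return density


def find_modes(values: list[float], bandwidth: float = 0.04,
               grid_points: int = 201) -> list[float]:
    """Locate local maxima of the KDE on an even grid over [0, 1]."""
    density = gaussian_kde(values, bandwidth)
    grid = [i / (grid_points - 1) for i in range(grid_points)]
    dens = [density(x) for x in grid]
    return [grid[i] for i in range(1, grid_points - 1)
            if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]]


# Toy adjusted allele frequencies around 35%, 57% and 88% (illustrative).
frequencies = [0.33, 0.34, 0.35, 0.36, 0.37,
               0.55, 0.56, 0.57, 0.58, 0.59,
               0.86, 0.87, 0.88, 0.89, 0.90]
modes = find_modes(frequencies)
print(f"{len(modes)} density peaks near {[round(m, 2) for m in modes]}")
```

The number of peaks depends on the bandwidth: too wide merges neighboring clones, too narrow splits one clone into several, so the choice needs care on real data.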
Comparing heterogeneity of primary and metastatic tumors
[Figure: plot of Mutant Allele Frequency in Primary Tumor (%) against Mutant Allele Frequency in Metastasis (%). Annotations: (1) Primary = absent, Metastasis = all cells; (2) Primary tumor = heterozygous mutation present in all cells, Metastasis = heterozygous mutation present in all cells.]
De Novo vs. Relapse: Monoclonal Disease
Tumor subpopulations: Single dominant clone
Clinical Data
• Sex: Male
• Age: 53
• Cyto: Normal
• Time to relapse: 235
• OS months: 30
• Induction: 7+3+3E
• Consolidation: HDAC
• SCT: No
De Novo vs. Relapse: Oligoclonal Disease
Tumor subpopulations: Multiple clones
Clinical Data
• Sex: Male
• Age: 68
• Cyto: Normal
• Time to relapse: 805
• OS months: 46.8
• Induction: 7+3D
• Consolidation: HDAC
• SCT: No
Relapse is based on clonal evolution
Clonal Progression in AML1 Relapse
• Additional data from 7 AML tumor:relapse whole genome sequencing reveals that this scenario is common: a single dominant mutation cluster in a founder subclone persists through the chemotherapy and re-emerges with new somatic alterations.
• Each tumor showed clear-cut evidence of clonal evolution at relapse, and a higher frequency of transversions that were probably induced by DNA damage from chemotherapy.
Clonal Evolution: MDS to sAML
• By WGS of the sAML tumor, we identified somatic mutations.
• All validated mutations were evaluated by deep digital sequencing in both banked MDS and sAML to calculate allele frequency and heterogeneity.
RNA Sequencing
RNA isolate
Size selection for nc RNA classes
polyA priming, SAGE tags RT, ds DNA Fragment, RT w/randoms
Adapter-ligated fragments for next-gen sequencing
Alignment to reference database & discovery:
- Expression levels
- Novel splice isoforms
- Allelic bias in transcription
- Fusion transcripts
RNA-seq informs therapeutic choices
EN1 – a gene that is not expressed
[Figure: IGV screenshot with Normal WGS, Tumor WGS and Tumor RNA-seq tracks.]
EN1 is not expressed (0 RNA-seq reads out of ~146 million mapped)
Metastatic breast cancer (to brain)
RNA-seq provides correlative data
RNA Correlation of Somatic Copy Number Alterations
HER2
HER2 / ERBB2 is heavily amplified in this tumor
Metastatic breast cancer (to brain)
RNA-seq correlates with IHC Diagnosis
RNA-seq confirms the HER2, PR, & ER status
• Gene expression values from RNA-seq
• Used to confirm HER2, PR, & ER status of each patient obtained by IHC
• Tumor is HER2+, PR-, ER-
[Figure panels: HER2 +ive, PR-, ER-]
Metastatic breast cancer (to brain) vs. four primary HER2 +ive breast cancers
Human Microbiome Project
Goals:
- Defining Core Microbiomes in “healthy” subjects
- Changes in Microbiomes with disease
[Figure: project diagram with the elements Subjects; Samples; Community Phenotypes; Sources of strains; Microbial Communities; Reference Sequences; Metagenomics WGS; 16S rRNA; Data Coordination, Analysis and Concordance to Clinical Data.]
HMP: Human Virome Stability
Stability of virome over time
[Figure labels: Alphapapillomavirus, Gammapapillomavirus, Human papillomavirus, Polyomavirus, Roseolovirus, Lymphocryptovirus, Mastadenovirus; body sites: GI, Oral, Vaginal, Nasal.]
Kristine Wylie, WUGI
NGS of MRSA Isolates
400 MRSA Genomes
[Figure: diagram labeled with individual MRSA isolate identifiers (for example MRSA.6307.307N28 and MRSA.6345.345N44); the remaining labels and the legend are not recoverable from the extraction.]
NGS Application to Clinical Samples
Examples from our work
Genomics of Aromatase Inhibitor Response in Luminal Breast Cancer
[Figure: trial schema. Core biopsy; postmenopausal ER+, Allred 6-8, clinical stage 2 and 3; exemestane, letrozole or anastrozole; surgery, with tumor Ki67; continued therapy with an AI where possible, radiotherapy and chemotherapy discretionary. Accrual @ 377.]
• 46 “luminal” subtype breast cancer cases: 25 AI-responsive and 21 AI-unresponsive tumors (per resection Ki67 index).
• 46 genome pairs completed from pre-treatment core biopsies with >70% neoplastic cellularity. Manuscript in revision.
• We have now sequenced the resected post-treatment tumor genomes for most of these patients.
A comprehensive mutational spectrum of ER+ Breast Cancer
• These mutations were tested in a panel of 121 additional luminal cases from the two clinical trials to ascertain significantly mutated genes.
• 70/167 tumors carried PIK3CA mutations (41.9%).
• TP53 was mutant in 15.7% (26 out of 167 tumors).
PIK3CA Mutations in Luminal Breast Cancer
• Two novel in-frame deletions were discovered in PIK3CA.
• 20 of 26 point mutations in PIK3CA reside in the cluster with the highest allele frequency, indicating these are likely initiating mutations in the founder clones.
• PIK3CA was the most frequently mutated gene at 43% of patients.
MAP3K1 Mutations in Luminal Breast Cancer
• MAP3K1 is a serine/threonine kinase that activates the ERK and JNK pathways.
• We identified 6 mutations in the 46 tumors (2 nonsense, 3 frameshift, 1 missense).
• Four patients carried two MAP3K1 mutations, indicating bi-allelic loss.
Deletion of MAP3K1
Normal
Tumor
Chromothripsis in Luminal Breast Cancer
• Twelve cases lacked any translocations.
• Seven cases had multiple complex translocations with breakpoints suggestive of a single cellular catastrophe (“chromothripsis”).
Potentially Druggable Somatic Mutations
When combining common and rare mutations across 46 samples, 96% of patients have a potentially druggable mutation identified by WGS.
The Challenges Ahead for Diagnostic NGS
• Full and interpretive analysis
• Time to results in a clinical timeframe
• Demonstration of clinical efficacy (“Genome-forward medicine”)
• Adequacy of NGS + data analysis in the CLIA environment, compared to the current standard in pathology
• Uniformity of results across sample preservation methods
• Ability to inform patients about what might and might not result from genomic data interpretation
• Minimizing false negative results (meeting the need with the appropriate technology)
WGS for diagnosis
Defining actionable targets in cancer patients
Conclusion: “…whole genome characterization will become a routine part of cancer pathology.”
Clinical Course of a WGS Patient
In cancer, whole genome analysis will be done not once, but multiple times during the course of the disease for tumor subtyping, monitoring response to therapy and diagnosing the reasons for recurrences or therapeutic failures.
Welch et al., JAMA April 20, 2011
Atypical APL Presentation
Clinical case: “AML52” 37 y.o. female with de novo AML; M3 morphology
Chemo + ATRA Complex cytogenetics, persistent leukemia
Chemo only First remission, referred to WU for SCT. rBM: normal morphology, cytogenetics; negative for PML/RARA.
??? Allogeneic SCT
Consolidation + ATRA
“Genome-Guided Medicine”: An early example
Detection of PML-RARA by WGS, Confirmed by FISH, RT-PCR
Consolidation: Chemo + ATRA
37 y.o. female with de novo AML, M3 morphology, CTG, no PML-RARA. Referred to WUSM for SCT.
Sustained remission
R.K.Wilson 2011
A Paradigm for Clinical Sequencing
Output per flow cell ~300 Gbp in a 10 day run time
Combine the following data types per flow cell per clinical cancer case:
• Normal WGS (30X)
• Tumor WGS (60X)
• Tumor exome (1 lane) and normal exome (1 lane) > Validation plus deep digital sequencing
• Tumor RNA-seq (1 lane) > Mutation expression + fusion transcripts
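As a rough check on the whole-genome part of this budget, depth can be converted to raw yield by multiplying by genome size. The sketch below assumes a haploid human genome of about 3.1 Gbp, which is an assumption and not a figure from the slide. It ignores duplicate and unmapped reads and does not model the exome or RNA-seq lanes. The function and constant names are illustrative.

```python
# Rough yield estimate for the WGS components of one clinical case.
# ASSUMPTION: haploid human genome size of ~3.1 Gbp (not stated on the slide).
GENOME_SIZE_GBP = 3.1
FLOW_CELL_OUTPUT_GBP = 300.0  # "~300 Gbp" per flow cell, from the slide


def yield_for_depth(depth_x, genome_size_gbp=GENOME_SIZE_GBP):
    """Raw sequence (Gbp) needed to reach a mean depth of `depth_x`."""
    return depth_x * genome_size_gbp


normal_wgs = yield_for_depth(30)  # normal genome at 30X
tumor_wgs = yield_for_depth(60)   # tumor genome at 60X
wgs_total = normal_wgs + tumor_wgs
wgs_share = wgs_total / FLOW_CELL_OUTPUT_GBP

print(f"Normal WGS: {normal_wgs:.0f} Gbp")
print(f"Tumor WGS:  {tumor_wgs:.0f} Gbp")
print(f"WGS total:  {wgs_total:.0f} Gbp ({wgs_share:.0%} of stated output)")
```

Under these assumptions the two genomes alone account for most of the stated output, so the real allocation depends on the actual genome size and per-lane yield.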
Integrated data analysis
Pharmaco-oncogenomic prediction of targeted therapy
Acute lymphocytic leukemia Example case
ALL-1: Case History
• Male patient, mid-30s.
• Initial presentation of ALL, induction therapy to remission, BMT from sibling (blasts banked).
• Patient relapsed in July 2011. Induction therapy to apparent remission.
• Using WGS of initial ALL blasts, two subclones identified, each with two distinct large deletions.
• Probes designed to detect these deletions used for flowFISH of remission marrow. Minimal residual disease detected.
• Treatment options assessed, based on WGS and RNA-seq data analysis and DrugBank evaluation.
ALL-1: Somatic copy number alterations
Acute lymphocytic leukemia
ALL-1: Somatic single nucleotide variations
• 91 somatic ‘tier 1’ SNVs
• 42 with evidence for expression in RNA-seq
• Variant allele frequencies of the 10 most highly expressed are listed below

Gene       Ref.  Var.  AA  Type      WGS     Exome   RNA-seq
UBXN4      T     C     DD  silent    51.25%  40.56%  53.40%
OGT        G     T     CF  missense  39.22%  38.7%   35.79%
KIAA1033   C     T     TI  missense  38.89%  48.1%   50.75%
C15orf39   C     G     AA  silent    47.37%  37.5%   48.74%
SPTAN1     T     C     LP  missense  42.39%  49.03%  50.17%
DDX6       A     G     LP  missense  21.78%  14.29%  18.14%
CCDC47     C     T     AT  missense  22.34%  21.9%   23.36%
NF1        C     T     R*  nonsense  64.58%  64.04%  47.06%
TNRC6B     G     A     SN  missense  13.25%  12.8%   19.46%
KIAA1462   G     C     PA  missense  70%     45.05%  41.14%
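Each percentage in the table is a variant allele frequency: the reads supporting the variant divided by all reads covering the site. The sketch below shows that calculation, then compares one variant across the three data types using the NF1 row from the table. The read counts in the first example are hypothetical, and the function names are illustrative.

```python
def variant_allele_frequency(variant_reads, total_reads):
    """VAF as a percentage: variant-supporting reads / total reads at the site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * variant_reads / total_reads


def platform_spread(vafs):
    """Difference between the highest and lowest VAF across data types."""
    return max(vafs.values()) - min(vafs.values())


# HYPOTHETICAL read counts, for illustration only.
example_vaf = variant_allele_frequency(variant_reads=31, total_reads=48)

# NF1 row from the table (values in percent).
nf1 = {"WGS": 64.58, "Exome": 64.04, "RNA-seq": 47.06}
nf1_spread = platform_spread(nf1)

print(f"Example VAF: {example_vaf:.2f}%")
print(f"NF1 spread across data types: {nf1_spread:.2f} percentage points")
```

A large spread between DNA-based and RNA-based frequencies may reflect allele-specific expression or sampling noise at low coverage; the sketch does not attempt to distinguish these.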
Acute lymphocytic leukemia
ALL-1: Tumor heterogeneity
[Figure: proportion and read coverage (X) plotted against tumor variant allele frequency; NF1 labeled]
2,074 tier 1-3 somatic variants; 91 are tier 1 (coding exons).
Acute lymphocytic leukemia
RNA-seq Analysis
RNA-seq reveals unusual FLT3 expression levels
Acute lymphocytic leukemia
Quantitation of FLT3 Expression
FLT3 expression is not just high, it is extreme…
• Of 49,887 genes, 28,671 had an FPKM greater than zero.
• FLT3 ranks #229 by FPKM.
• This places it inside the top 1% of all expressed genes.
• Based on FLT3 overexpression in the tumor cells, predict the patient will respond to the FLT3 inhibitor Sunitinib (Sutent) [DrugBank].
[Figure: Distribution of non-zero gene expression values. x-axis: gene expression (log2(FPKM + 0.00001)); y-axis: frequency; FLT3 labeled]
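The "top 1%" statement follows from dividing the FLT3 rank by the number of expressed genes. The sketch below reproduces that arithmetic from the slide's figures and shows the log transform named on the plot's x-axis. The function names are illustrative, not taken from the analysis pipeline.

```python
import math

TOTAL_GENES = 49_887      # genes assessed (from the slide)
EXPRESSED_GENES = 28_671  # genes with FPKM > 0 (from the slide)
FLT3_RANK = 229           # FLT3 rank by FPKM (from the slide)
PSEUDOCOUNT = 0.00001     # offset used in the plot's x-axis label


def rank_percentile(rank, n_ranked):
    """Percentage of ranked genes at or above the given rank."""
    return 100.0 * rank / n_ranked


def log2_expression(fpkm, pseudocount=PSEUDOCOUNT):
    """Log-scale expression value as labeled on the plot's x-axis."""
    return math.log2(fpkm + pseudocount)


flt3_percentile = rank_percentile(FLT3_RANK, EXPRESSED_GENES)
expressed_fraction = EXPRESSED_GENES / TOTAL_GENES

print(f"FLT3 is in the top {flt3_percentile:.2f}% of expressed genes")
print(f"{expressed_fraction:.1%} of genes had FPKM > 0")
```

The pseudocount keeps the transform defined for genes with an FPKM of zero, which would otherwise require taking the logarithm of zero.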
Acute lymphocytic leukemia
ALL-1: Patient status
Patient has been prescribed Sutent as of 2 weeks ago.
Initial blood tests after 3 days on Sutent indicated blast production had improved by ~50% post-remission.
Plan to re-evaluate remission/MRD status using flowFISH probes and a bone marrow biopsy after 3 weeks of Sutent therapy.
If remission is complete, a bone marrow donor has been identified and BMT will follow.
The Future of Pathology?
Am J Clin Pathol 2011; 135:668-672
Conclusions
The interplay of surgery, oncology, pathology and cancer genomics is beginning to transform our understanding of the tumor genome.
In addition to identifying targetable mutations, we can determine the mutational spectrum of the tumor cell subpopulations present, providing potentially important information in therapeutic decisions.
The maturity and scale of commercial NGS platforms and interpretational software are beginning to match the clinical need for WGS.
Acknowledgements
The Genome Institute: Li Ding, Dong Shen, Chris Miller, David Larson, Malachi Griffith, Nathan Dees, Christine Wylie, Todd Wylie, Jason Walker, Vince Magrini, Ryan Demeter, Sean McGrath
Washington University: Tim Ley, MD; Peter Westervelt, MD; John DiPersio, MD; John Welch, MD, Ph.D.; Tim Graubert, MD; Lukas Wartman, MD; Matthew Ellis, MB, Ph.D.
George Weinstock, Rick Wilson