Whole Genome Sequencing by Next-Generation Methods: Genome-forward Medicine

Author: Barrie Norman

Whole Genome Sequencing by Next-Generation Methods: Genome-forward Medicine

Elaine R. Mardis, Ph.D.
Professor of Genetics
Washington University School of Medicine
St. Louis, MO

2011 ASCP Annual Meeting

Conflict of Interest Slide

I have the following conflicts to declare:

Speaker’s Bureau: Illumina, Inc.

Scientific Advisory Board: Pacific Biosciences, Inc.

Stockholder: Life Technologies, Inc.

Overview

• Next-Generation Sequencers (NGS): Roche/454, Illumina, Life Technologies
• Third Generation Sequencers: Pacific Biosciences, Ion Torrent, MiSeq, Oxford Nanopore
• NGS Practical Considerations: coverage (depth and breadth), error model, representation bias
• NGS Applications

Defining “Genome-Forward” Medicine

What is “Genome-Forward” Medicine?

Informing the physician about genomic aspects of a patient’s diagnosis, using next-generation sequencing methods in place of conventional approaches.

In certain applications of genome-forward medicine, having information about the patient’s genome, or about the genome of an infecting microbe or virus, may better inform the physician about treatment choices and, in turn, improve patient response.

The Trajectory of Throughput: 10 years

E.R. Mardis, Nature (2011) 470: 198-203

Comparative costs: sequencing a human genome

• Capillary technology: Applied Biosystems 3730xl (2004), $15,000,000
• Next-gen technology: Illumina HiSeq (2011), $10,000

Next-generation sequencers

The Fundamentals

Next-generation DNA sequencing instruments

All commercially-available sequencers have the following shared attributes:

• Random fragmentation of starting DNA, ligation with custom linkers = “a library”
• Library amplification on a solid surface (either bead or glass)
• Direct step-by-step detection of each nucleotide base incorporated during the sequencing reaction
• Hundreds of thousands to hundreds of millions of reactions imaged per instrument run = “massively parallel sequencing”
• Shorter read lengths than capillary sequencers
• A “digital” read type that enables direct quantitative comparisons
• A sequencing mechanism that samples both ends of every fragment sequenced (“paired end” reads)

What are paired-end reads?

Paired-end reads

All next-gen platforms now offer methods to derive sequence data from each end of the library fragments. The distance between read pairs differs by approach and platform.

• “Paired ends”: linear fragment, with both ends sampled in separate reactions
• “Mate pairs”: circularized fragment of >1 kb, sequenced either by a single-reaction read or by two end reads (platform dependent)

In general, paired-end reads offer advantages for human whole genome sequencing because our DNA is repetitive, which makes accurate placement (“mapping”) of NGS reads difficult.

Read alignment to the Human Reference Genome

The Human Genome Reference

• Mapping short reads to the genome reference sequence is a required step for next-generation sequence data analysis, regardless of the preparatory steps used.
• Once mapped to the genome, “localized” assembly of select reads can be used to define genomic alterations at single nucleotide resolution.
• The human genome sequence serves as a “reference” to which we can compare other human genomes:
  • SNPs (single nucleotide polymorphisms)
  • Point mutations (maintain or change amino acids)
  • Insertion/deletions (add or remove one or more bases in the sequence)
  • Translocations of chromosome fragments (inter- or intra-)
  • Inversions (an entire fragment swaps ends)
  • Amplification (multiple repeated copies of a genome segment) or deletion (large blocks removed)

Roche/454 Library Prep

• Random fragmentation
• Adapter ligation, giving a single-stranded adapter-ligated library
• emPCR for clonal amplification

Roche/454 Pyrosequencing

[Figure: Roche/454 pyrosequencing. Beads are loaded into the PicoTiter™Plate by centrifugation, followed by enzyme beads. Each DNA capture bead contains millions of copies of a single clonal fragment. Sequencing by synthesis proceeds from an annealed primer: sulfurylase converts APS and the PPi released on nucleotide incorporation into ATP, and luciferase uses ATP and luciferin to produce light and oxyluciferin.]

454 Instrumentation Specifics

Instrument | Run Time (hr) | Read Length (bp) | Yield (Mb/run) | Error Type | Error Rate (%) | Purchase Cost (x1000)
454 FLX+ | 18-20 | 700 | 900 | Indel | 1 | $30 (A)
454 FLX Titanium | 10 | 400 | 500 | Indel | 1 | $500
454 GS Jr. Titanium | 10 | 400 | 50 | Indel | 1 | $108

(A) Requires the 454 FLX Titanium. This is the upgrade cost.

Notable:
• Mate-pair paired-end reads of 3 kb, 8 kb and 20 kb separation without an increase in run time
• Cost per run makes sequencing an entire human genome cost-prohibitive relative to other technologies (~$20/Mbp)
• Great platform for targeted validation
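The cost-prohibitive claim can be made concrete with a rough calculation. This is only a sketch: the ~$20/Mbp figure is the slide's estimate, while the 3,000 Mbp genome size and the 30-fold coverage target are assumptions chosen for illustration, and the function name is hypothetical.

```python
def sequencing_cost_usd(genome_mbp, fold_coverage, cost_per_mbp):
    """Cost to sequence a genome of genome_mbp megabases at fold_coverage depth."""
    return genome_mbp * fold_coverage * cost_per_mbp

# Assumed inputs: ~3,000 Mbp human genome and 30-fold coverage.
# The ~$20/Mbp figure is the slide's estimate for the 454 platform.
cost = sequencing_cost_usd(genome_mbp=3_000, fold_coverage=30, cost_per_mbp=20)
print(f"${cost:,.0f}")  # $1,800,000
```

Under these assumptions a single 30-fold human genome would cost roughly $1.8 million on this platform, which is why it is positioned for targeted validation instead.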

Illumina Sequencing: Library Preparation

• Automated processing
• Low gDNA inputs

Illumina Sequencing by Synthesis

[Figure: sequencing-by-synthesis cycle: incorporate, detect (excitation and emission), de-block, cleave fluor.]

Illumina Instrumentation Specifics

Key updates:
• 2010: HiSeq 2000
  • Two flow cells per run
  • 100 Gbp/FC or two genome equivalents per run
  • New scanning mechanics: scans both surfaces of FC lanes
• 2011: HiSeq 2000
  • Improved chemistry (v. 3): increased yield and accuracy
• 2011: MiSeq

Instrument | Run Time (days) | Read Length (bp) | Yield (Gb/run) | Error Type | Error Rate (%) | Purchase Cost (x1000)
GAIIx | 14 | 150 x 150 | 96 | Sub | >0.1 | $525
HiSeq 2000 | 8 | 100 x 100 | 200 x 2 | Sub | >0.1 | $700
HiSeq 2000 v3 | 10 | 100 x 100 | | | 0.1 | $700
MiSeq | 1 | 150 x 150 | 1 | Sub* | >0.1* | $125

Life Technologies: sequencing by ligation

• Custom adapter library
• emPCR on magnetic beads
• Sequencing by ligation using fluorescent probes from a common primer
• Sequential rounds of ligation from a series of primers
• Fixed/known nucleotides for each probe set identify two bases each cycle, or “two base encoding”
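Two-base encoding can be illustrated with a small sketch. It assumes the commonly described color-space scheme, in which each of four colors represents a set of dinucleotides; representing bases as 2-bit codes and taking their XOR is an illustrative implementation choice, not something stated in the slides, and the function names are hypothetical.

```python
# Assumed 2-bit codes per base; the color of a dinucleotide is the XOR of its two codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}

def encode_colors(seq):
    """Translate a base sequence into one color (0-3) per overlapping dinucleotide."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]

def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)

reference = encode_colors("ATGGC")  # [3, 1, 0, 3]
variant = encode_colors("ATCGC")    # [3, 2, 3, 3]
changed = [i for i, (r, v) in enumerate(zip(reference, variant)) if r != v]
print(changed)                      # [1, 2]
print(decode_colors("A", reference))  # ATGGC
```

Because every base is interrogated by two overlapping colors, a true single-base change alters two adjacent colors, whereas an isolated color change is more likely a measurement error. This is one way to see why two-base encoding supports high accuracy.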

SOLiD Instrumentation Specifics

Instrument | Run Time (days) | Read Length (bp) | Yield (Gb/run) | Error Type | Error Rate (%) | Purchase Cost (x1000)
SOLiD 4 | 12 | 50 x 35 PE | 71 | A-T Bias | >0.06 | $475
SOLiD 5500 xl | 8 | 75 x 35 PE, 60 x 60 MP | 155 | A-T Bias | >0.01 | $595

5500 xl:
• Front-end automation addresses bottlenecks at emPCR, breaking, and enrichment of beads
• 6-lane Flow Chip with independent lanes/2 per run
• Cost per whole genome data set is predicted to be $6K by 2011
• Very high accuracy data due to two-base encoding
• ECC Module: an optional 6th primer that increases accuracy to 99.999%
• Direct conversion of color space to base space
• True paired-end chemistry enabled: ligation reaction can be used in either direction

Third Generation Sequencing Instruments

Third generation sequencers??

Recently, new sequencing platforms were introduced. The Pacific Biosciences sequencer is a single molecule detection system that marries nanotechnology with molecular biology. The Ion Torrent uses pH rather than light to detect nucleotide incorporations. The MiSeq is a scaled down version of the HiSeq, with faster chemistry and scanning. All offer a faster run time, lower cost per run, reduced amount of data generated relative to 2nd Gen platforms, and the potential to address genetic questions in the clinical setting.

Comparisons to Third-Generation Sequencers

Company | Platform Name | Sequencing | Amplification | Run Time
Roche | 454 Ti | DNA Polymerase “Pyrosequencing” | emPCR | 10 hours
Illumina | HiSeq/MiSeq | DNA Polymerase | Bridge amplification | 10 days/24 hours
Life | SOLiD/5500 | DNA Ligase | emPCR | 12 days
Ion Torrent | PGM | Synthesis H+ detection | emPCR | 2 hours
Pacific Biosciences | RS | Synthesis | NONE | 45 min

Pacific Biosciences RS

• Sample prep: shearing (Covaris/Hydroshear), polish ends, SMRTbell™ ligation, sequencing primer annealing
• Library/polymerase complex: DNA polymerase binding, load library/polymerase complex onto SMRT cell
• Sequencing: Movie 1 (v1.1 & v1.2) and Movie 2 (only v1.2), each yielding raw reads, post-filter reads and mapped reads
• Zero Mode Waveguides (ZMWs): Ver. 1.1 = 1 x 45,000 per SMRT cell; Ver. 1.2 = 2 x 75,000 per SMRT cell

SMRTbell Library Types

• Circular Consensus: small (~250 bp); single movie; multiple passes; many sub-reads, combined into a consensus read
• Standard: large (~2 kbp); single movie; few sub-reads
• VLR (“very long read”): larger (~6 kbp); single 45 minute movie; single long read provides linking information

Pacific Biosciences RS Instrumentation Specifics

Instrument | Run Time (Hours) | Read Length (bp) | Yield (Mb) | Error Type | Error Rate (%) | Purchase Cost (x1000)
RS | 14 (~8 SMRTCells) | 1500 | 45 per SMRTCell | Insertions | 15 | $695

Run configurations:
• Strobe polymerase/strobe reagent/strobe protocol (45 min movie): 1x45 min movie, 8 SMRT cells
• Strobe polymerase/standard reagent/standard protocol: 2x45 min movies, 8 SMRT cells

Mapped read statistics:
• Mean mapped sub-read accuracy: 85.7% (±1.3%)
• Mean mapped sub-read length: 697 (±501)
• Maximum mapped read length: 7,772 bp
• Maximum mapped sub-read length: 5,601 bp

Ion Torrent PGM Instrument

Data Output per Run Trajectory: Ion Torrent

At present, only the 314 and 316 chips are commercially available

Ion Torrent Data Yield Improves with Automation

Ion Torrent Yield with Automation Modules

Oxford Nanopore Sequencing

• Exonuclease-aided sequencing
• Pore translocation sequencing

NGS: Practical Considerations

Importance of coverage, error model, representation bias

Importance of Coverage

What is “coverage”?

• Coverage is a general term to describe the fold oversampling of a DNA target by sequencing data.
• In covering a target region or genome, increasing depth of coverage leads to increased certainty of variant detection.
• Coverage levels may vary due to G+C content, amplification or deletion in the region or other biases, the uniqueness of the target region (“mappability”), and the error model of the data type.
• Coverage breadth is as important as depth!
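Fold oversampling can be sketched as total sequenced bases divided by target size. The example below is a minimal illustration, assuming uniform sampling and an approximate human genome size of 3 Gb; the function names and inputs are hypothetical and not taken from the talk.

```python
import math

def mean_depth(num_reads, read_length_bp, target_size_bp):
    """Average fold coverage: total sequenced bases divided by target size."""
    return num_reads * read_length_bp / target_size_bp

def reads_needed(depth, read_length_bp, target_size_bp):
    """Smallest number of reads that reaches the requested average depth."""
    return math.ceil(depth * target_size_bp / read_length_bp)

GENOME_BP = 3_000_000_000  # assumed approximate human genome size

print(mean_depth(900_000_000, 100, GENOME_BP))  # 30.0
print(reads_needed(30, 100, GENOME_BP))         # 900000000
```

This is an average only. As the points above note, actual depth at any position varies with G+C content, copy number and mappability, so the mean says little about breadth.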

Alignments: Coverage breadth

Breadth-of-coverage topography is altered greatly by the application of a minimum depth filter. The table below shows breadth-of-coverage attrition through a range of minimum depth filters (1x, 5x, 10x, 15x, 20x) for a given region of interest (chr7:930225-931129). Note that the breadth of coverage drops from almost 100% at a 1x depth requirement to ~20% when we require >= 20x depth for coverage membership.

Coverage breadth topography

Depth | Breadth
1X | 97.3%
5X | 67.3%
10X | 43.5%
15X | 27.8%
20X | 22.2%
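The attrition in the table can be reproduced in principle by counting positions that meet each minimum depth. The sketch below uses a short, made-up list of per-base depths rather than the chr7 region from the slide, and the function name is hypothetical.

```python
def breadth_of_coverage(depths, min_depth):
    """Fraction of positions whose depth is at least min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)

# Hypothetical per-base depths across a 10 bp region (not data from the talk).
depths = [0, 3, 7, 12, 25, 30, 18, 9, 1, 22]

for threshold in (1, 5, 10, 15, 20):
    print(f"{threshold}X: {breadth_of_coverage(depths, threshold):.0%}")
```

As in the slide, breadth falls steadily as the depth requirement rises, so a region that looks fully covered at 1X may be mostly unusable for confident variant calling.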

Bias Assessment: GC content impacts coverage

Applying Next-generation Sequencing

• Whole Genome Sequencing (WGS)
• Hybrid Capture: exome or targeted sequencing
• Protein:DNA binding
• Non-coding RNA sequencing: discovery/variants
• Transcriptome sequencing (RNA-seq)
• Genome-wide methylation of DNA (Methyl-seq)
• Clinical sequencing for therapeutic decisions

E.R. Mardis, Annual Reviews in Genetics & Genomics (2008)
E.R. Mardis, Nature (2011) 470: 198-203

Whole Genome Sequencing Process

WGS: Data Production and Alignment

• Prepare paired end libraries as whole genome fragment/shotgun by random shearing of genomic DNA, adapter ligation, size selection.
• Produce paired end data from each end of billions of library fragments, over-sampling about 30-fold to cover at a depth sufficient to find all types of genome alterations.
• Computer programs align the read pair sequences onto the Human Reference Genome and several algorithms are used to discover variants genome-wide.

Data Analysis Workflow for Variant Detection

• 30-fold coverage of tumor genome with PE Illumina reads
• Align read pairs to reference genome
• Detect single-nucleotide variants and focused insertion/deletions
• Detect anomalous read pair mapping, assemble reads and identify structural variations (inversions, translocations)
• Use normalized read coverage levels and HMM-based algorithm to identify CNA and LOH regions
• Compare to SNP array analysis
• Design custom capture probes for each putative variant site
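The idea of anomalous read pair mapping can be sketched with a simple classifier. This is an illustration only, not the pipeline used in the talk: the `ReadPair` type, the insert-size thresholds and the category labels are all hypothetical, a forward/reverse library orientation is assumed, and the span is measured between leftmost mapped positions for simplicity.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    chrom1: str
    pos1: int     # leftmost mapped position of read 1
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int     # leftmost mapped position of read 2
    strand2: str

def classify_pair(pair, min_insert=200, max_insert=500):
    """Label a mapped read pair as concordant or as a structural-variant hint."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal (candidate translocation)"
    # Order the two reads by position so orientation can be checked left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "same-strand (candidate inversion)"
    if left_strand == "-" and right_strand == "+":
        return "everted (candidate tandem duplication)"
    span = right_pos - left_pos
    if span > max_insert:
        return "too far apart (candidate deletion)"
    if span < min_insert:
        return "too close (candidate insertion)"
    return "concordant"

print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1300, "-")))  # concordant
print(classify_pair(ReadPair("chr1", 1000, "+", "chr9", 5000, "-")))
```

A single anomalous pair proves little. In practice, clusters of pairs supporting the same event would be collected and the reads assembled, as the workflow above describes.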

Somatic Mutations in 50 AML Genomes

Identifying Prognostic Mutations

DNMT3A mutations in AML (Ley et al. NEJM 2010)

• DNMT3A mutations are present in 22% of de novo AMLs, and 34% of cytogenetically normal patients.
• DNMT3A mutations are strongly associated with poor prognosis.

Hybrid Capture of Genomic Regions or Genes

• Hybrid capture: fragments from a whole genome library are selected by combining with probes that correspond to targeted genes.
• The probe DNAs are biotinylated, making selection from solution with streptavidin magnetic beads an effective means of purification.
• The human “exome”, by definition, is the exons of all ~21,000 genes annotated in the human reference genome. Commercial exome kits target most but not all genes.
• Custom capture reagents can be synthesized to target specific loci that may be of interest in a clinical context.

Comparing Exome to Whole Genome Cancer Sequencing

Exome:
• Exome sequencing costs ~1/10 WGS
• Simplified analysis
• Sequence more samples

vs. whole genome:
• WGS captures SVs and focal amp/dels not resolved by SNP arrays
• Non-exonic mutations (“tier 2”) are likely to play a role in cancer
• Exome reagents do not capture all exons
• Tier 2/3 mutations facilitate heterogeneity analysis

Tumor Heterogeneity from Deep Read Count Analysis

Deep Digital Sequencing by Custom Hybrid Capture

Custom capture probes can be designed to sample from somatic point mutations, in/dels and structural variant regions in the tumor genome. High depth next-generation sequencing of these captured regions provides a digital read-out of the allele frequency of each mutation in the tumor cell population.

The prevalence of a mutation reflects its ‘history’ in the tumor evolution. Older mutations are present at higher prevalence or allele frequency, newer ones at lower allele frequency.

Using these data and statistical approaches, we can model the tumor heterogeneity for any sequenced sample, determining the mutational profile of each subclone and its proportion of the tumor cell population.

“Deep Digital Sequencing”

All cells carry a heterozygous mutation:
• Total reads = 1000
• Mutant reads = 500
• Mutant allele frequency = 500/1000 = 50%

Wild-type & het mutant cells:
• Total reads = 1000
• Mutant reads = 250
• Mutant allele frequency = 250/1000 = 25%
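The two cases above follow from simple arithmetic. The sketch below assumes a diploid locus with no copy number change, so each mutant cell contributes one mutant allele out of two; the function names are hypothetical.

```python
def mutant_allele_frequency(mutant_reads, total_reads):
    """Fraction of reads at a site that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mutant_reads / total_reads

def fraction_of_cells_with_het_mutation(maf):
    """Share of cells carrying a heterozygous mutation.

    Assumes a diploid locus without copy number change, so the cell
    fraction is twice the allele frequency (capped at 100%).
    """
    return min(1.0, 2 * maf)

for mutant_reads in (500, 250):
    maf = mutant_allele_frequency(mutant_reads, 1000)
    cells = fraction_of_cells_with_het_mutation(maf)
    print(f"MAF = {maf:.0%}, cells carrying the mutation = {cells:.0%}")
```

With 500 of 1000 mutant reads the mutation is inferred to be in all cells; with 250 of 1000 it is in about half of them, matching the mixed wild-type and heterozygous population in the second case.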

Tumor heterogeneity: single dominant clone

Single cluster of somatic mutations

Tumor heterogeneity: multiple clones

• Cluster 3: 88% tumor allele frequency
• Cluster 2: 57% tumor allele frequency
• Cluster 1: 35% tumor allele frequency
• 22% tumor content of tumor sample
• 0.3% tumor content of normal sample

Tumor allele frequencies are adjusted for normal cell contamination. Kernel density estimation determines the number of discrete clones.
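Both steps can be sketched in a few lines. The slides do not give the adjustment formula, kernel or bandwidth, so everything here is an assumption: the adjustment divides the observed frequency by tumor purity (normal cells contribute only wild-type alleles), the kernel is Gaussian with a bandwidth of 3 percentage points, and the input frequencies are invented to resemble the three clusters above.

```python
import math

def adjust_for_normal_contamination(observed_vaf_pct, tumor_purity):
    """Rescale an observed allele frequency (%) by tumor purity (0-1], capped at 100."""
    if not 0 < tumor_purity <= 1:
        raise ValueError("tumor_purity must be in (0, 1]")
    return min(100.0, observed_vaf_pct / tumor_purity)

def kde_peaks(values, bandwidth=3.0, grid=range(0, 101)):
    """Grid positions where a Gaussian kernel density estimate has a local maximum."""
    xs = list(grid)
    density = [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in xs
    ]
    return [
        xs[i]
        for i in range(1, len(xs) - 1)
        if density[i] > density[i - 1] and density[i] > density[i + 1]
    ]

# Hypothetical purity-adjusted allele frequencies (%), not data from the talk.
adjusted = [33, 34, 35, 36, 37, 55, 56, 57, 58, 59, 86, 87, 88, 89, 90]
print(adjust_for_normal_contamination(17.5, 0.5))  # 35.0
print(kde_peaks(adjusted))                          # [35, 57, 88]
```

The number of peaks is sensitive to the bandwidth: too narrow and noise splits one clone into several, too wide and distinct clones merge. Any real analysis would need to choose it with care.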

Comparing heterogeneity of primary and metastatic tumors

[Figure: mutant allele frequency in metastasis (%) plotted against mutant allele frequency in primary tumor (%). Annotations: “Primary = absent, Metastasis = all cells”; “Primary tumor = heterozygous mutation present in all cells, Metastasis = heterozygous mutation present in all cells”.]

De Novo vs. Relapse: Monoclonal Disease

Tumor subpopulations: Single dominant clone

Clinical Data
• Sex: Male
• Age: 53
• Cyto: Normal
• Time to relapse: 235
• OS months: 30
• Induction: 7+3+3E
• Consolidation: HDAC
• SCT: No

De novo vs. Relapse: Oligoclonal Disease

Tumor subpopulations: Multiple clones

Clinical Data
• Sex: Male
• Age: 68
• Cyto: Normal
• Time to relapse: 805
• OS months: 46.8
• Induction: 7+3D
• Consolidation: HDAC
• SCT: No

Relapse is based on clonal evolution

Clonal Progression in AML1 Relapse

• Additional data from whole genome sequencing of 7 AML tumor:relapse pairs reveal that this scenario is common: a single dominant mutation cluster in a founder subclone persists through the chemotherapy and re-emerges with new somatic alterations.
• Each tumor showed clear-cut evidence of clonal evolution at relapse, and a higher frequency of transversions that were probably induced by DNA damage from chemotherapy.

Clonal Evolution: MDS to sAML

• By WGS of the sAML tumor, we identified somatic mutations.
• All validated mutations were evaluated by deep digital sequencing in both banked MDS and sAML to calculate allele frequency and heterogeneity.

RNA Sequencing

• RNA isolate
• Size selection for ncRNA classes
• polyA priming, SAGE tags; RT, ds DNA; fragment, RT w/randoms
• Adapter-ligated fragments for next-gen sequencing
• Alignment to reference database & discovery:
  • Expression levels
  • Novel splice isoforms
  • Allelic bias in transcription
  • Fusion transcripts

RNA-seq informs therapeutic choices

EN1: a gene that is not expressed

[Figure: IGV screenshot with Normal WGS, Tumor WGS and Tumor RNA-seq tracks.]

EN1 is not expressed (0 RNA-seq reads out of ~146 million mapped).

Metastatic breast cancer (to brain)
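Read counts like these are usually normalized before genes are compared. The sketch below uses RPKM (reads per kilobase of transcript per million mapped reads), a common normalization that the slides do not explicitly name; the gene length and the non-zero read count are hypothetical values chosen for contrast.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

TOTAL_MAPPED = 146_000_000  # ~146 million mapped reads, as on the slide
GENE_LENGTH_BP = 2_000      # hypothetical transcript length

print(rpkm(0, GENE_LENGTH_BP, TOTAL_MAPPED))      # 0.0
print(rpkm(1_460, GENE_LENGTH_BP, TOTAL_MAPPED))  # 5.0
```

A gene with zero reads has an RPKM of zero whatever its length, which is why the EN1 call needs no normalization. Normalization matters once expressed genes of different lengths are compared.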

RNA-seq provides correlative data

RNA Correlation of Somatic Copy Number Alterations

HER2 / ERBB2 is heavily amplified in this tumor.

Metastatic breast cancer (to brain)

RNA-seq correlates with IHC Diagnosis

RNA-seq confirms the HER2, PR, & ER status

• Gene expression values from RNA-seq
• Used to confirm HER2, PR, & ER status of each patient obtained by IHC
• Tumor is HER2+, PR-, ER-

[Figure: HER2, PR and ER expression in the metastatic breast cancer (to brain) vs. four primary HER2 +ive breast cancers.]

Human Microbiome Project

Goals:
• Defining core microbiomes in “healthy” subjects
• Changes in microbiomes with disease

[Figure: project components: subjects; samples; community phenotypes; sources of strains; microbial communities; reference sequences; metagenomics WGS; 16S rRNA; data coordination, analysis and concordance to clinical data.]

HMP: Human Virome Stability

Stability of virome over time

[Figure: viruses detected in GI, oral, vaginal and nasal samples: Alphapapillomavirus, Gammapapillomavirus, Human papillomavirus, Polyomavirus, Roseolovirus, Lymphocryptovirus, Mastadenovirus. Credit: Kristine Wylie, WUGI.]

NGS of MRSA Isolates

400 MRSA Genomes

[Figure: 400 MRSA isolate genomes, each labeled by isolate ID (for example, MRSA.6307.307N28). Individual figure labels omitted.]

NGS Application to Clinical Samples

Examples from our work

Genomics of Aromatase Inhibitor Response in Luminal Breast Cancer Core Biopsy

• Eligibility: postmenopausal, ER+, Allred 6-8, clinical stage 2 and 3
• Core biopsy before treatment
• Aromatase inhibitor: exemestane, letrozole or anastrozole
• Surgery, with tumor Ki67 measured
• Continued therapy with an AI where possible; radiotherapy and chemotherapy discretionary
• Accrual @ 377

46 “luminal” subtype breast cancer cases: 25 AI-responsive and 21 AI-unresponsive tumors (per resection Ki67 index). 46 genome pairs completed from pre-treatment core biopsies with >70% neoplastic cellularity. Manuscript in revision. We now have sequenced the resected post-treatment tumor genomes for most of these patients.

A comprehensive mutational spectrum of ER+ Breast Cancer

• These mutations were tested in a panel of 121 additional luminal cases from the two clinical trials to ascertain significantly mutated genes.
• 70/167 tumors carried PIK3CA mutations (41.9%).
• TP53 was mutant in 15.7% (26 out of 167 tumors).

PIK3CA Mutations in Luminal Breast Cancer

• Two novel in-frame deletions were discovered in PIK3CA. • 20 of 26 point mutations in PIK3CA reside in the cluster with the highest allele frequency, indicating these are likely initiating mutations in the founder clones. •PIK3CA was the most frequently mutated gene at 43% of patients.

MAP3K1 Mutations in Luminal Breast Cancer

• MAP3K1 is a serine/threonine kinase that activates the ERK and JNK pathways.
• We identified 6 mutations in the 46 tumors (2 nonsense, 3 frameshift, 1 missense).
• Four patients carried two MAP3K1 mutations, indicating bi-allelic loss.

Deletion of MAP3K1

[Figure: normal and tumor panels; image not recoverable.]

Chromothripsis in Luminal Breast Cancer

• Twelve cases lacked any translocations.
• Seven cases had multiple complex translocations with breakpoints suggestive of a single cellular catastrophe (“chromothripsis”).

Potentially Druggable Somatic Mutations

When combining common and rare mutations across 46 samples, 96% of patients have a potentially druggable mutation identified by WGS.

The Challenges Ahead for Diagnostic NGS

• Full and interpretive analysis
• Time to results in a clinical timeframe
• Demonstration of clinical efficacy (“Genome-forward medicine”)
• Adequacy of NGS + data analysis in the CLIA environment, compared to the current standard in pathology
• Uniformity of results across sample preservation methods
• Ability to inform patients about what might and might not result from genomic data interpretation
• Minimizing false negative results (meeting the need with the appropriate technology)

WGS for diagnosis

Defining actionable targets in cancer patients

Conclusion: “…whole genome characterization will become a routine part of cancer pathology.”

Clinical Course of a WGS Patient

In cancer, whole genome analysis will be done not once, but multiple times during the course of the disease for tumor subtyping, monitoring response to therapy and diagnosing the reasons for recurrences or therapeutic failures.

Welch et al., JAMA April 20, 2011

Atypical APL Presentation

Clinical case: “AML52” 37 y.o. female with de novo AML; M3 morphology

Chemo + ATRA Complex cytogenetics, persistent leukemia

Chemo only First remission, referred to WU for SCT. rBM: normal morphology, cytogenetics; negative for PML/RARA.

??? Allogeneic SCT

Consolidation + ATRA

“Genome-Guided Medicine”: An early example

Detection of PML-RARA by WGS, Confirmed by FISH, RT-PCR

Consolidation: Chemo + ATRA

37 y.o. female with de novo AML, M3 morphology, complex cytogenetics, no PML-RARA. Referred to WUSM for SCT.

Sustained remission

R.K. Wilson 2011

A Paradigm for Clinical Sequencing

Output per flow cell: ~300 Gbp in a 10-day run time.

Combine the following data types per flow cell per clinical cancer case:

• Normal WGS (30X)
• Tumor WGS (60X)
• Tumor exome (1 lane) and normal exome (1 lane) > Validation plus deep digital sequencing
• Tumor RNA-seq (1 lane) > Mutation expression + fusion transcripts

Integrated data analysis

Pharmaco-oncogenomic prediction of targeted therapy

Acute lymphocytic leukemia Example case

ALL-1: Case History

Male patient, mid-30s. Initial presentation of ALL, induction therapy to remission, BMT from sibling (blasts banked). Patient relapsed in July 2011. Induction therapy to apparent remission. Using WGS of initial ALL blasts, two subclones were identified, each with two distinct large deletions. Probes designed to detect these deletions were used for flowFISH of remission marrow. Minimal residual disease detected. Treatment options assessed, based on WGS and RNA-seq data analysis and DrugBank evaluation.

ALL-1: Somatic copy number alterations

ALL-1: Somatic single nucleotide variations

91 somatic ‘tier 1’ SNVs
42 with evidence for expression in RNA-seq
Variant allele frequencies of the 10 most highly expressed are listed below.

Gene      Ref.  Var.  AA  Type      WGS     Exome   RNA-seq
UBXN4     T     C     DD  silent    51.25%  40.56%  53.40%
OGT       G     T     CF  missense  39.22%  38.7%   35.79%
KIAA1033  C     T     TI  missense  38.89%  48.1%   50.75%
C15orf39  C     G     AA  silent    47.37%  37.5%   48.74%
SPTAN1    T     C     LP  missense  42.39%  49.03%  50.17%
DDX6      A     G     LP  missense  21.78%  14.29%  18.14%
CCDC47    C     T     AT  missense  22.34%  21.9%   23.36%
NF1       C     T     R*  nonsense  64.58%  64.04%  47.06%
TNRC6B    G     A     SN  missense  13.25%  12.8%   19.46%
KIAA1462  G     C     PA  missense  70%     45.05%  41.14%
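
The WGS allele frequencies in this table fall into visibly separate ranges, which is the kind of signal used to infer tumor subclones. As a rough illustration only, the Python sketch below groups the ten WGS values by splitting wherever consecutive sorted frequencies differ by more than a threshold. The function name `group_by_vaf_gap` and the 10-percentage-point `min_gap` are assumptions made for this example, not the method used in the talk, and ten variants are far too few for a real clonality analysis.

```python
# WGS variant allele frequencies (%) copied from the slide's table.
WGS_VAF = {
    "UBXN4": 51.25, "OGT": 39.22, "KIAA1033": 38.89, "C15orf39": 47.37,
    "SPTAN1": 42.39, "DDX6": 21.78, "CCDC47": 22.34, "NF1": 64.58,
    "TNRC6B": 13.25, "KIAA1462": 70.0,
}


def group_by_vaf_gap(vafs, min_gap=10.0):
    """Split genes into groups wherever sorted VAFs jump by more than min_gap.

    min_gap is in percentage points and is an arbitrary, illustrative choice.
    Returns a list of groups, each a list of (gene, vaf) tuples.
    """
    ordered = sorted(vafs.items(), key=lambda item: item[1])
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for gene, vaf in ordered[1:]:
        previous_vaf = groups[-1][-1][1]
        if vaf - previous_vaf > min_gap:
            groups.append([])  # large jump: start a new group
        groups[-1].append((gene, vaf))
    return groups


groups = group_by_vaf_gap(WGS_VAF)
for members in groups:
    mean_vaf = sum(vaf for _, vaf in members) / len(members)
    genes = ", ".join(gene for gene, _ in members)
    print(f"mean VAF {mean_vaf:5.1f}%: {genes}")
```

With these inputs the split yields three groups: TNRC6B, DDX6 and CCDC47 at the lower frequencies, five genes between roughly 39% and 51%, and NF1 and KIAA1462 above 60%.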

ALL-1: Tumor heterogeneity

[Figure: proportion and read coverage (X) plotted against tumor variant allele frequency; NF1 labeled.]

2,074 tier 1-3 somatic variants; 91 are tier 1 (coding exons).

RNA-seq Analysis

RNA-seq reveals unusual FLT3 expression levels

Quantitation of FLT3 Expression

FLT3 expression is not just high, it is extreme…
• Of 49,887 genes, 28,671 had an FPKM greater than zero.
• FLT3 ranks #229 by FPKM.
• This places it inside the top 1% of all expressed genes.
• Based on FLT3 overexpression in the tumor cells, predict the patient will respond to the FLT3 inhibitor Sunitinib (Sutent) [DrugBank].

[Figure: Distribution of non-zero gene expression values. x-axis: gene expression (log2(FPKM + 0.00001)); y-axis: frequency; FLT3 marked.]
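
The ranking claim on this slide can be checked with simple arithmetic. The sketch below uses only the numbers given on the slide: it computes where rank #229 falls among the 28,671 genes with non-zero FPKM, and applies the log2(FPKM + 0.00001) transform named on the histogram axis. The function names are hypothetical, and the FPKM values passed to `log2_fpkm` are made-up inputs, not data from this case.

```python
import math

EXPRESSED_GENES = 28_671  # genes with FPKM > 0 (from the slide)
FLT3_RANK = 229           # FLT3 rank by FPKM (from the slide)
PSEUDOCOUNT = 0.00001     # offset used on the histogram axis


def top_percent(rank, total):
    """Return the percentage of genes ranked at or above `rank`."""
    return 100.0 * rank / total


def log2_fpkm(fpkm, pseudocount=PSEUDOCOUNT):
    """Transform an FPKM value as on the histogram axis: log2(FPKM + pseudocount)."""
    return math.log2(fpkm + pseudocount)


flt3_top_percent = top_percent(FLT3_RANK, EXPRESSED_GENES)
print(f"FLT3 is in the top {flt3_top_percent:.2f}% of expressed genes")

# Illustrative (made-up) FPKM values showing the scale of the transform.
for example_fpkm in (0.001, 1.0, 1024.0):
    print(f"FPKM {example_fpkm:>8}: log2 value {log2_fpkm(example_fpkm):.2f}")
```

Rank 229 of 28,671 is about 0.8%, consistent with the slide's statement that FLT3 sits inside the top 1% of expressed genes.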

ALL-1: Patient status

Patient has been prescribed Sutent as of 2 weeks ago.

Initial blood tests after 3 days on Sutent indicated blast production had improved by ~50% post-remission.

Plan to re-evaluate remission/MRD status using flowFISH probes and a bone marrow biopsy after 3 weeks of Sutent therapy.

A bone marrow donor has been identified; if remission is complete, BMT will follow.

The Future of Pathology?

Am J Clin Pathol 2011; 135:668-672

Conclusions

The interplay of surgery, oncology, pathology and cancer genomics is beginning to transform our understanding of the tumor genome.

In addition to identifying targetable mutations, we can determine the mutational spectrum of the tumor cell subpopulations present, providing potentially important information in therapeutic decisions.

The maturity and scale of commercial NGS platforms and interpretational software are beginning to match the clinical need for WGS.

Acknowledgements

The Genome Institute
Li Ding, Dong Shen, Chris Miller, David Larson, Malachi Griffith, Nathan Dees, Christine Wylie, Todd Wylie, Jason Walker, Vince Magrini, Ryan Demeter, Sean McGrath

Washington University
Tim Ley, MD; Peter Westervelt, MD; John DiPersio, MD; John Welch, MD, Ph.D.; Tim Graubert, MD; Lukas Wartman, MD; Matthew Ellis, MB, Ph.D.

George Weinstock, Rick Wilson