Introduction to the MiSeq: Technology and Applications

Author: Clifford Austin

Joseph Aman Field Applications Scientist

© 2013 Illumina, Inc. All rights reserved. Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.

Overview

Introduction to the MiSeq
Sample preparation kits
– DNA
– RNA
16S Sequencing Overview
Illumina sequencing portfolio update

MiSeq System Proven Pedigree – Bench top Friendly


Illumina Sequencing Workflow

1. Library Preparation
2. Cluster Generation
3. Sequencing
4. Data Analysis

Library Preparation

Active Chromatin

Genomic DNA

Small RNA

mRNA

ChIP-Sequencing

Other Apps

Sample Prep is Critical for Successful Sequencing

Dual Index Library shown

The aim of the sample prep step is to obtain nucleic acid fragments with adapters attached on both ends


The flow cell

Everything except sample preparation is completed on the flow cell:
• Template annealing (1 - 96 samples)
• Template amplification
• Sequencing primer hybridization
• Sequencing-by-synthesis reaction
• Generation of fluorescent signal

Cluster Generation

Bind single DNA molecules to surface

Amplify on surface

~1000 molecules per ~ 1 µm cluster


MiSeq: Industry Best Data Quality, Paired-End Reads, High Output, and Flexible Read Length

Read Length    Quality Scores >Q30
1x36 bp        >90% bases
2x25 bp        >90% bases
2x100 bp       >85% bases
2x150 bp       >80% bases
2x250 bp       >75% bases
2x300 bp       >75% bases

Data output (figure): values of 610 Mb, 850 Mb, 3.4G, 5.1G, 8.5G, and 15G plotted against Read Length (bp) of 1x36, 2x25, 2x100, 2x150, 2x250, and 2x300
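The ">Q30" thresholds on this slide are Phred quality scores, where Q = -10 * log10(P) and P is the estimated probability that a base call is wrong, so Q30 corresponds to roughly 1 error in 1,000 calls. The sketch below is illustrative only: the function names are made up for this example, the Phred+33 ASCII offset is assumed (the usual encoding in current Illumina FASTQ files), and "at or above Q30" is used as the cutoff.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


def decode_phred33(qual_string):
    """Decode a FASTQ quality string, assuming the Phred+33 ASCII offset."""
    return [ord(ch) - 33 for ch in qual_string]


def fraction_at_or_above(quals, threshold=30):
    """Fraction of base calls whose quality is at or above the threshold."""
    quals = list(quals)
    if not quals:
        return 0.0
    return sum(q >= threshold for q in quals) / len(quals)


if __name__ == "__main__":
    print(phred_to_error_prob(30))            # error probability at Q30
    print(fraction_at_or_above([40, 20, 30, 35]))
```

Applying `fraction_at_or_above` to every quality value in a run gives the same kind of figure as the "% bases" column above.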

MiSeq Applications

PCR Amplicon


TruSeq Amplicon (TruSight Panels)

Targeted RNA Expression

Illumina Sequencing Publications
More than 5,200 publications using Illumina’s SBS technology

Greater than 85% of all SRA projects done on MiSeq
13,473 MiSeq projects*

Figure: publications per year, 2007 to 2013
Figure: SRA projects by platform: MiSeq 86%, PGM 8%, GS Jr 5%, Proton 0.3%

*Projects in NCBI's Sequence Read Archive (SRA) as of 01-02-14

Illumina Sample Preparation Portfolio

TruSeq Amplicon Cancer Panel

Nextera Enrichment Custom/Exome

TruSeq Custom Amplicon

TruSeq ChIP

Nextera XT

TruSeq RNA v2

TruSeq Targeted RNA

Nextera DNA

TruSeq Stranded mRNA/ Total RNA

Nextera Mate Pair

TruSeq DNA


Illumina Sequencing

TruSeq small RNA

Questions to ask yourself

What kind of organism/s am I working with?
– Is there a genome or transcriptome reference? Do I need to make one?
– What level of ploidy does it have? 1n, 2n, 4n, 8n?
– How big is the genome/region of interest?
– Does it have a balanced base composition?

What kind of samples do I have access to?
– Fresh? Frozen? FFPE? Dried? Ancient? Environmental?
– Quantity. How much nucleic acid starting material can I get? Do I only get one biological replicate?
– Quality. Will my starting material be degraded or contain potential contaminants?

How many samples do I need?
– What number of samples will I need to be confident of my discovery? What false positive/false negative rates will I tolerate?
– How much biological/technical replication is prudent?

What are my limitations?
– Time, budget, manpower, skills, tools, access to instrumentation
– Do I need to operate within a regulatory framework?
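Two of these questions (how big is the genome, how many samples) combine into a simple planning estimate: mean coverage is roughly total sequenced bases divided by genome size, split across the pooled samples. The sketch below is a back-of-the-envelope aid, not an Illumina tool. It assumes equal pooling and ignores duplicates, unmapped reads, and uneven coverage; the 5 Mb genome is a hypothetical example, and 8.5 Gb is one of the run outputs quoted earlier in the deck.

```python
def mean_coverage(run_output_bases, genome_size_bases, n_samples=1):
    """Approximate mean coverage per sample for equally pooled samples."""
    if run_output_bases < 0 or genome_size_bases <= 0 or n_samples <= 0:
        raise ValueError("inputs must be positive")
    return run_output_bases / (genome_size_bases * n_samples)


def max_samples(run_output_bases, genome_size_bases, target_coverage):
    """Largest number of equally pooled samples that still reach the target coverage."""
    if genome_size_bases <= 0 or target_coverage <= 0:
        raise ValueError("genome size and target coverage must be positive")
    return int(run_output_bases // (genome_size_bases * target_coverage))


if __name__ == "__main__":
    run_output = 8_500_000_000   # 8.5 Gb run output
    genome = 5_000_000           # hypothetical 5 Mb bacterial genome
    print(mean_coverage(run_output, genome, n_samples=96))
    print(max_samples(run_output, genome, target_coverage=30))
```

In practice the usable yield is lower than the nominal output, so a safety margin on the target coverage is sensible.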

Illumina DNA Sequencing Applications Portfolio
The growing family of sample prep

DNA Sample Prep      Summary                                   Time    Apps                                Input    Indexing        Quality
TruSeq DNA PCR-Free  Eliminates PCR-induced bias               ~5 h    WGRS                                1ug+     96              Best!
TruSeq Nano DNA      The new gold standard with low input      ~6 h    WGRS                                100ng+   96              Best!
Nextera Mate Pair    Long-insert; gel-free                     1.5 d   De novo, SVs                        1ug+     48 (gel-free)   +++
Nextera              Low-input, fast, MiSeq                    1.5 h   WGRS                                50ng     96              +++
Nextera XT           Lowest-input, fast, MiSeq, small genomes  1.5 h   Amplicons, plasmids, small genomes  1ng      96              +++

TruSeq DNA Sample Prep Workflow DNA

RNA

Gel size selection, if needed


Nextera and Nextera XT Sample Prep Kit Features


Highlights
• Fast sample prep, 90 min.
• Single read or paired-end compatibility

Sample input and indices
• 1 ng (Nextera XT) to 50 ng (Nextera) DNA per sample
• Index up to 96 samples
• Parallel processing of up to 96 samples

Specific considerations
• Construct completed at the PCR step
• Gel-free protocol
• Enzymatic fragmentation

Suitable for:
• Large and small whole genomes
• Now used for all enrichment workflows
• Nextera Rapid Enrichment!

Nextera DNA Sample Prep

Figure: Nextera library construction
1. Transposons + Genomic DNA
2. Tagmentation (~ 300 bp)
3. Reduced-Cycle PCR Amplification (Enrichment)
4. Sequencing-Ready Fragment: p5, Index 2, Rd1 SP, insert, Rd2 SP, Index 1, p7

TruSeq Custom Amplicon Assay Time
Go from DNA to called variants in ~2 days

Day 1: Receive custom oligos; hybridization setup
Day 1: Assay biochemistry
Day 1-2: Cluster gen and sequencing on MiSeq
Day 2: Finished at 5:00 PM; real-time analysis

• DesignStudio for custom panels; fixed panels also available
• 1536 amplicons, 96 samples per plate
• Automated sequencing
• Simple, efficient, automatic data analysis and variant calling

Nextera® Mate-Pair Sample Prep Kit
Industry’s only gel-free protocol, lowest DNA input, ideal for de novo seq

Enables wide range of whole-genome apps
– De novo assembly of small genomes & complex genomes (cancer)
– Genome finishing: ‘close-to-finished’ ref. genomes from single library type
– Samples w/ limited input DNA (metagenomics)
– Detection of structural variation

Optimized combo of Nextera & TruSeq DNA Sample Prep
– Gel-free option: 3-15 kb gap size; 1 ug input; 1.5 days, 3 hrs hands-on time
– Gel+ option for more refined gap size, 4 ug input

TruSeq ChIP Seq - How does this work?
1. Cross link w/ formalin, shear
2. Remaining DNA protected by proteins
3. Immunoprecipitate with antibody against target protein of interest
4. Reverse crosslinks, use DNA as input for library generation (TS ChIP Sample Prep)
5. Sequence, reads indicate where protein was bound

Figure from Szalkowski, A.M., and Schmid, C.D. (2010). Rapid innovation in ChIP-seq peak-calling algorithms is outdistancing benchmarking efforts. Briefings in Bioinformatics.

Illumina RNA Portfolio

TruSeq RNA
• TruSeq Stranded RNA: Stranded mRNA, Stranded Total RNA
• TruSeq RNA v2: mRNA
• TruSeq Small RNA: Small RNA

Sample Prep Solutions for RNA from Discovery to Validation
Transcriptome to Targeted TruSeq RNA-Seq Portfolio

Whole Transcriptome Analysis

RNA-seq for Expression Profiling

TruSeq Targeted RNA

TruSeq Stranded mRNA
TruSeq Stranded Total RNA w/RiboZero
• HRM
• Gold
• Plant
• Globin
TruSeq RNA

TruSeq Targeted RNA Expression

TruSeq Small RNA


TruSeq mRNA Sample Prep Workflow RNA


TruSeq Total RNA Sample Prep Workflow

1. Total RNA
2. Add rRNA Removal Solution
3. Add rRNA Removal Beads
4. Remove RRB-bound rRNA
5. Ribosomal Depleted RNA

TruSeq Targeted RNA
Create custom panels, select pre-validated fixed, or add-on custom

Accurate targeting of virtually the entire transcriptome for Human, Mouse, Rat
– Assay specific gene families including alternative isoforms
– Individual exons and splice junctions
– cSNP detection for allele specific expression
– Non-coding RNA transcripts

Validation of over 10,000 assay designs
Add custom content to Fixed Panels

Pre-Validated Fixed Panels

Immune Response

Cardiotox

Lung Cancer

Apoptosis

Breast Cancer

Neuro Panel

Stem Cell

Prostate Cancer

P53 Pathway

Wnt Pathway

Cytochrome P450

Cell Cycle

NFκB Pathway

Hedgehog


Design Studio: Custom Panel Creation

Rapid Workflow
Modified from existing DASL Sample Prep Chemistry

• Sample to answer in 1.5 days with < 4 hrs hands-on time
• Sample prep for 48-384 samples per run
• Single MiSeq run equivalent to 15,000 qPCR reactions or 40 384-well plates
• On-instrument analysis with MSR

TruSeq Small RNA Workflow

Starts directly from total RNA
1.0 ug or less input

Interrogate regulatory miRNAs

Pre-pool samples for single gel excision step


16S Metagenomics Analysis on the MiSeq System:


Metagenomics & Microbiomics
The unexplored living world

Metagenomics is the analysis of genomic DNA from a whole community
– 16S DNA
– Whole Genome
– Targeted Gene

Survey of microorganisms in specific environments
– Soil
– Aqueous / Marine
– Medical / Ag

What can we learn
– Taxonomic diversity (‘who is there’)
– Physiology (‘what are they doing’)
– Gene discovery

> 99% of the bacteria present in nature are non-culturable
Operational Taxonomic Unit - OTU

sciencemolecular.blogspot.com

Metagenomics

Approaches to Study the Microbiome and Metagenomics


16S Ribosomal RNA Gene is conserved across all bacteria


Choosing a 16S protocol: Four Questions to answer

Biological question asked?

How many samples per run?

ROI amplicon size?

Data analysis?

Available options:

Illumina Demonstrated Protocol, 2x PCR

Nextera XT Protocol

Caporaso, et al Protocol (ISME)

• 96 samples

Alternative methods allow increased multiplexing on MiSeq and HiSeq

However, more modifications are required…

Option #3 - the Caporaso et al method (ISME 2012)


16S metagenomics using the Caporaso et al (2012) ISME protocol

• Libraries are prepared by direct amplification with primer sets that amplify the region (V4 in this case) and add the clustering and sequencing primer regions.
• Protocols developed for both the MiSeq and HiSeq platforms are available on the Earth Microbiome Project (EMP) web site (www.earthmicrobiome.org).
• Average 15M reads on the MiSeq (2x150) and 70M on the HiSeq (2x100).
• Employs a 12-base index tag (Golay code).
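To illustrate how a 12-base index tag separates pooled samples, the sketch below assigns an index read to a sample by nearest Hamming distance. This is deliberately simpler than true Golay error correction, which the protocol's barcodes support, and it is not the EMP or MSR implementation. The barcodes and sample names are invented for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcode_to_sample, max_mismatches=1):
    """Return the sample whose barcode is uniquely closest, or None if unassignable."""
    best_sample = None
    best_dist = max_mismatches + 1
    tie = False
    for barcode, sample in barcode_to_sample.items():
        dist = hamming(index_read, barcode)
        if dist < best_dist:
            best_sample, best_dist, tie = sample, dist, False
        elif dist == best_dist and dist <= max_mismatches:
            tie = True  # two barcodes are equally close: ambiguous read
    return None if tie else best_sample


if __name__ == "__main__":
    # Hypothetical 12-base barcodes, for illustration only.
    barcodes = {"ACGTACGTACGT": "soil_A", "TTGGCCAATTGG": "marine_B"}
    print(assign_sample("ACGTACGTACGA", barcodes))  # one mismatch to soil_A
    print(assign_sample("GGGGGGGGGGGG", barcodes))  # too far from both
```

With 15M reads spread evenly over 96 samples, each sample would receive on the order of 150,000 reads before filtering.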


MSR/BaseSpace Classification Summary Output


Collaboration with San Diego Zoo Global
Conservation of the Desert Tortoise

Wildlife Disease Laboratories
– Bruce Rideout (Director)
– Josephine Braun (Scientist)
– Goal is to remove disease as a roadblock to conservation
– Carry out the following tasks for the animals at the San Diego Zoo, San Diego Zoo Safari Park, and field conservation programs
  • Disease surveillance efforts
  • Diagnostic test development
  • Outbreak investigations

Desert Tortoise 1 - Healthy

Species: Desert Tortoise
Individual: 18919
Disease Severity: 0 HEALTHY CONTROL
Sample Name: DNA-732
Origin: Nasal flush
Total # Sequences PF: 322,916
Unclassified (Chorny & Probert): 44,769
Classified (Chorny & Probert): 278,147

Rank  Species                  Total # sequences  Total % sequences  Cumulative %  Notes
                               in sample          in sample          total
1     Calothrix parietina      106,004            38.11%             38.1%         Blue-green algae on rocks.
2     Rickettsia sp.           15,631             5.62%              43.7%         Tick, flea, lice borne - some diseases.
3     Acinetobacter sp.        5,764              2.07%              45.8%         Soil bacteria.
4     Symploca atlantica       4,917              1.77%              47.6%         Algae.
5     Proteobacteria           4,109              1.48%              49.0%         Variable. Some pathogens known. Needs more detail.
6     Thiomonas thermosulfata  3,847              1.38%              50.4%         Extremophile.
7     Bacteria                 3,789              1.36%              51.8%         ??
8     Blautia sp.              3,705              1.33%              53.1%         Gut bacteria.
9     Blautia coccoides        3,330              1.20%              54.3%         Gut bacteria.
10    Pedobacter               2,881              1.04%              55.4%         Soil bacteria.
      Total                    153,977

Other Species of Note
      Mycoplasma agassizii     138                0.05%
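The percentage columns in the healthy-control table appear to be computed against the classified read count rather than the total (106,004 / 278,147 = 38.11%). The sketch below reproduces that arithmetic for the top three species. It is an illustration of the calculation, not the MSR/BaseSpace code, and the function name is hypothetical.

```python
def abundance_table(counts, classified_total):
    """Rank species by read count; report percent and cumulative percent of classified reads."""
    if classified_total <= 0:
        raise ValueError("classified_total must be positive")
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for rank, (species, n) in enumerate(ranked, start=1):
        running += n
        rows.append({
            "rank": rank,
            "species": species,
            "reads": n,
            "percent": 100 * n / classified_total,
            "cumulative_percent": 100 * running / classified_total,
        })
    return rows


if __name__ == "__main__":
    # Counts taken from the healthy-control table (top three species only).
    counts = {
        "Calothrix parietina": 106_004,
        "Rickettsia sp.": 15_631,
        "Acinetobacter sp.": 5_764,
    }
    for row in abundance_table(counts, classified_total=278_147):
        print(f"{row['rank']:>2}  {row['species']:<22} {row['reads']:>8,}"
              f"  {row['percent']:6.2f}%  {row['cumulative_percent']:5.1f}%")
```

The same function applied to the diseased samples reproduces their percentages as well, for example 31,924 / 175,595 = 18.2% for Mycoplasma agassizii in tortoise 16025.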

Desert Tortoise 6 - Diseased

Species: Desert Tortoise
Individual: 16025
Severity: 4 DISEASED
Sample Name: DNA-1283
Origin: Nasal flush
Total # Sequences: 198,672
Unclassified (Chorny & Probert): 23,077
Classified (Chorny & Probert): 175,595

Rank  Species                  Total # sequences  Total % sequences  Cumulative %  Notes
                               in sample          in sample          total
1     Mycoplasma agassizii     31,924             18.2%              18.2%         Proven as an etiologic agent of URTD in Desert Tortoise (Brown et al., 1994).
2     Myroides odoratus        11,654             6.6%               24.8%
3     Flavobacteriaceae        11,481             6.5%               31.4%
4     Chelonobacter sp.        11,384             6.5%               37.8%         Associated with diseased tortoises (Gregersen et al., 2009).
5     Pedobacter sp.           7,894              4.5%               42.3%         Soil bacteria.
6     Flavobacterium swingsii  5,492              3.1%               45.5%
7     Proteus penneri          5,263              3.0%               48.5%         Found intestinal tract. Invasive pathogen. MDR. Infects urinary tract.
8     Pseudomonas brenneri     4,538              2.6%               51.0%         Water borne. Biofilms.
9     Pseudomonas sp.          4,193              2.4%               53.4%         Water borne. Biofilms.
10    Pseudomonas marginalis   4,160              2.4%               55.8%         Water borne. Biofilms.
      Total                    97,983

Desert Tortoise 8 - Diseased

Species: Desert Tortoise
Individual: 13498
Severity: 4 DISEASED
Sample Name: DNA-1300 (S10)
Origin: Nasal flush
Total # Sequences PF: 271,248
Unclassified (Chorny & Probert): 25,323
Classified (Chorny & Probert): 245,925

Rank  Species                  Total # sequences  Total % sequences  Cumulative %  Notes
                               in sample          in sample          total
1     Chelonobacter sp.        141,734            57.6%              57.6%         Associated with diseased tortoises (Gregersen et al., 2009).
2     Chelonobacter oris       37,883             15.4%              73.0%         Associated with diseased tortoises (Gregersen et al., 2009).
3     Mycoplasma agassizii     7,639              3.1%               76.1%         Proven as an etiologic agent of URTD in Desert Tortoise (Brown et al., 1994).
4     Granulicatella adiacens  5,825              2.4%               78.5%         Normal commensal human mucosal membranes.
5     Flavobacteriaceae        5,443              2.2%               80.7%         Water borne.
6     Myroides odoratus        4,855              2.0%               82.7%         Human nosocomial infection.
7     Pedobacter sp.           4,108              1.7%               84.4%         Soil bacteria.
8     Deinococcus sp.          2,051              0.8%               85.2%         World's Toughest Bacteria. Soil, water.
9     Flavobacterium swingsii  1,800              0.7%               85.9%         Water borne.
10    Proteus penneri          1,625              0.7%               86.6%         Found intestinal tract. Invasive pathogen. MDR. Infects urinary tract.
      Total                    212,963

Summary
Nasal Microbiome of Desert Tortoises

Mycoplasma agassizii present in majority of tortoises
– 8 of 9 diseased Desert Tortoises contain the pathogen
– In 7 of 9 diseased Desert Tortoise nasal flush samples, this pathogen is in the top ten bacteria
  • In Tortoise-16025 it is the number one bacterium (18.2%)
  • In Tortoise-18963 it is the number two bacterium (13.9%)
  • In Tortoise-13498 it is the number three bacterium (3.1%)
– Previously recognized as etiologic agent of URTD in Desert Tortoise

Chelonobacter sp. also present in many of the tortoises
– In 8 of 10 Desert Tortoise nasal flush samples, this bacterium is in the top ten
– Chelonobacter previously linked/associated with URTD

Healthy Control Desert Tortoise is dominated by a diverse microbiome of soil and water borne bacteria
– Points to “healthy” microbiome constituent species?
– Healthy Control Desert Tortoise also contains a trace of M. agassizii at 0.05%!

New Illumina Sequencing Portfolio

Focused Power: MiSeq
Speed and simplicity for targeted and small genome sequencing

Flexible Power: NextSeq 500
Speed and simplicity for personal scale genomics

Production Power: HiSeq 2000/2500
Power and efficiency for large-scale genomics

Population Power: HiSeq XTen
$1000 human genome and extreme throughput for population-scale sequencing

Introducing NextSeq 500
A new sequencer that combines high throughput NGS applications with the speed, ease of use and affordability of a desktop sequencer

The most flexible applications of any desktop sequencer
– Exome, transcriptome, whole genome sequencing in a single run
– Industry-leading SBS chemistry: >75% >Q30, no homopolymer issues

Sample size flexibility
– 2 output modes: high and mid flow cells and reagents

Push-button simplicity
– Load & Go workflows
– Integrated sample-to-results solution: streamlined informatics on-premise or in cloud

Accessible affordability
– Runs starting at $1,000 (Human Genome ~$4k)
– System list price $250K

Fast Applications

Human Genome: 30 hours (2 x 150bp)
Exome | T-ome: 18 hours (2 x 75bp)
NIPT | GEx: 12 hours (1 x 75bp)

One System, Two output modes

High-Output: Up to 120 Gb; 400M clusters PF; 1 x 75 bp to 2 x 150 bp
Mid-Output: Up to 40 Gb; 130M clusters PF; 2 x 75 bp to 2 x 150 bp

Per-run examples shown (figure): 30x genome; 2-3 exomes; 6-12 exomes; 2-4 RNA-Seq samples; 20 GEX profiles; NIPT; 6-36 panels
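The "up to" output figures for the two modes follow from clusters passing filter multiplied by the bases read per cluster: 400M clusters at 2 x 150 bp gives 120 Gb, and 130M clusters at 2 x 150 bp gives about 39 Gb, which the slide rounds to 40 Gb. A minimal sketch of that arithmetic, with a hypothetical function name and no allowance for index reads:

```python
def run_output_gb(clusters_pf, read_length_bp, paired_end=True):
    """Nominal run output in gigabases: clusters x read length x reads per cluster."""
    if clusters_pf < 0 or read_length_bp < 0:
        raise ValueError("inputs must be non-negative")
    reads_per_cluster = 2 if paired_end else 1
    return clusters_pf * read_length_bp * reads_per_cluster / 1e9


if __name__ == "__main__":
    print(run_output_gb(400_000_000, 150))                   # High-Output, 2 x 150 bp
    print(run_output_gb(130_000_000, 150))                   # Mid-Output, 2 x 150 bp
    print(run_output_gb(400_000_000, 75, paired_end=False))  # High-Output, 1 x 75 bp
```

The same relation explains why shorter or single-read runs on the same flow cell yield proportionally less data.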