Introduction to the MiSeq: Technology and Applications
Joseph Aman Field Applications Scientist
© 2013 Illumina, Inc. All rights reserved. Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
Overview
• Introduction to the MiSeq
• Sample preparation kits
  – DNA
  – RNA
• 16S Sequencing Overview
• Illumina sequencing portfolio update
MiSeq System Proven Pedigree – Bench top Friendly
Illumina Sequencing Workflow
1. Library Preparation
2. Cluster Generation
3. Sequencing
4. Data Analysis
Library Preparation
Active Chromatin
Genomic DNA
Small RNA mRNA
ChIP-Sequencing
Other Apps
Sample Prep is Critical for Successful Sequencing
Dual Index Library shown
The aim of the sample prep step is to obtain nucleic acid fragments with adapters attached on both ends
The flow cell
Everything except sample preparation is completed on the flow cell
• Template annealing (1 - 96 samples)
• Template amplification
• Sequencing primer hybridization
• Sequencing-by-synthesis reaction
• Generation of fluorescent signal
Cluster Generation
Bind single DNA molecules to surface
Amplify on surface
~1000 molecules per ~ 1 µm cluster
MiSeq: Industry Best Data Quality, Paired-End Reads, High Output, and Flexible Read Length
Read Length | Quality Scores >Q30 | Data Output
1x36 bp | >90% bases | 610 Mb
2x25 bp | >90% bases | 850 Mb
2x100 bp | >85% bases | 3.4 Gb
2x150 bp | >80% bases | 5.1 Gb
2x250 bp | >75% bases | 8.5 Gb
2x300 bp | >75% bases | 15 Gb
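To make the quality and output figures above concrete, here is a minimal Python sketch. It assumes the listed outputs pair with the read lengths in ascending order (610 Mb for 1x36 up to 15 Gb for 2x300), uses the standard Phred definition Q = -10 log10(p), and treats output as split evenly across samples, which real runs only approximate. The 5 Mb genome and the 12-sample pool are hypothetical illustration values, not figures from the slides.

```python
# Maximum output per run in bases, as read from the slide
# (assumed pairing of chart values to read lengths).
MISEQ_OUTPUT_BASES = {
    "1x36": 610e6,
    "2x25": 850e6,
    "2x100": 3.4e9,
    "2x150": 5.1e9,
    "2x250": 8.5e9,
    "2x300": 15e9,
}


def phred_error_probability(q):
    """Per-base error probability for a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_coverage(run_config, genome_size_bp, n_samples=1):
    """Idealized mean depth per sample, assuming an even split of output."""
    return MISEQ_OUTPUT_BASES[run_config] / (genome_size_bp * n_samples)


if __name__ == "__main__":
    # Q30 corresponds to roughly 1 error in 1,000 base calls.
    print(phred_error_probability(30))
    # Hypothetical: 12 bacterial genomes of 5 Mb each on one 2x250 run.
    print(round(mean_coverage("2x250", 5_000_000, n_samples=12)))
```

Actual depth will be lower than this estimate once reads failing filter, duplicates, and uneven pooling are taken into account.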
MiSeq Applications
PCR Amplicon
TruSeq Amplicon (TruSight Panels)
Targeted RNA Expression
Illumina Sequencing Publications
More than 5,200 publications using Illumina’s SBS technology
[Figure: publications per year, 2007-2013]
Greater than 85% of all SRA projects done on MiSeq: 13,473 MiSeq projects*
[Figure: share of SRA projects by platform: MiSeq 86%, PGM 8%, GS Jr 5%, Proton 0.3%]
*Projects in NCBI's Sequence Read Archive (SRA) as of 01-02-14
Illumina Sample Preparation Portfolio
TruSeq Amplicon Cancer Panel
Nextera Enrichment Custom/Exome
TruSeq Custom Amplicon
TruSeq ChIP
Nextera XT
TruSeq RNA v2
TruSeq Targeted RNA
Nextera DNA
TruSeq Stranded mRNA/ Total RNA
Nextera Mate Pair
TruSeq DNA
Illumina Sequencing
TruSeq small RNA
Questions to ask yourself
What kind of organism/s am I working with?
– Is there a genome or transcriptome reference? Do I need to make one?
– What level of ploidy does it have? 1n, 2n, 4n, 8n?
– How big is the genome/region of interest?
– Does it have a balanced base composition?
What kind of samples do I have access to?
– Fresh? Frozen? FFPE? Dried? Ancient? Environmental?
– Quantity. How much nucleic acid starting material can I get? Do I only get one biological replicate?
– Quality. Will my starting material be degraded or contain potential contaminants?
How many samples do I need?
– What number of samples will I need to be confident of my discovery? What false positive/false negative rates will I tolerate?
– How much biological/technical replication is prudent?
What are my limitations?
– Time, budget, manpower, skills, tools, access to instrumentation
– Do I need to operate within a regulatory framework?
Illumina DNA Sequencing Applications Portfolio
The growing family of sample prep
DNA Sample Prep | TruSeq DNA PCR-Free | TruSeq Nano DNA | Nextera Mate Pair | Nextera | Nextera XT
Summary | Eliminates PCR-induced bias | The new gold standard, with low input | Long-insert; gel-free | Low-input, fast, MiSeq | Lowest-input, fast, MiSeq, small genomes
Time | ~5 h | ~6 h | 1.5 d | 1.5 h | 1.5 h
Apps | WGRS | WGRS | De novo, SVs | WGRS | Amplicons, plasmids, small genomes
Input | 1 µg+ | 100 ng+ | 1 µg+ | 50 ng | 1 ng
Indexing | 96 | 96 | 48 (gel-free) | 96 | 96
Quality | Best! | Best! | +++ | +++ | +++
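The comparison of the five DNA kits can be turned into a simple lookup. The sketch below is only an illustration: the `Kit` class and `candidate_kits` function are hypothetical names, the input and indexing values are the ones listed on the slide (with 1.5 d written as 36 h), and the filter looks only at input mass and index count, ignoring application fit and quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kit:
    name: str
    min_input_ng: float   # minimum DNA input listed on the slide
    prep_time_h: float    # approximate prep time (1.5 d taken as 36 h)
    max_indexes: int      # samples that can be indexed together


KITS = [
    Kit("TruSeq DNA PCR-Free", 1000, 5, 96),
    Kit("TruSeq Nano DNA", 100, 6, 96),
    Kit("Nextera Mate Pair", 1000, 36, 48),
    Kit("Nextera", 50, 1.5, 96),
    Kit("Nextera XT", 1, 1.5, 96),
]


def candidate_kits(available_ng, n_samples):
    """Names of kits whose input and indexing limits fit the experiment."""
    return [
        kit.name
        for kit in KITS
        if kit.min_input_ng <= available_ng and kit.max_indexes >= n_samples
    ]


if __name__ == "__main__":
    # 75 ng per sample and 96 samples leaves only the Nextera kits.
    print(candidate_kits(available_ng=75, n_samples=96))
```

A filter like this only narrows the options; the "Apps" row (for example, de novo assembly versus amplicons) still decides which remaining kit is appropriate.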
TruSeq DNA Sample Prep Workflow DNA
RNA
Gel size selection, if needed
Nextera and Nextera XT Sample Prep Kit Features
Highlights
• Fast sample prep, 90 min.
• Single read or paired-end compatibility
Sample input and indices
• 50 ng (Nextera) or 1 ng (Nextera XT) DNA per sample
• Index up to 96 samples
• Parallel processing of up to 96 samples
Specific considerations
• Construct completed at the PCR step
• Gel-free protocol
• Enzymatic fragmentation
Suitable for:
• Large and small whole genomes
• Now used for all enrichment workflows
• Nextera Rapid Enrichment!
Nextera DNA Sample Prep
[Figure: transposons tagment genomic DNA into ~300 bp fragments (Tagmentation); Reduced-Cycle PCR Amplification then adds the indexes and flow cell adapters. Sequencing-Ready Fragment: p5 - Index 2 - Rd1 SP - insert - Rd2 SP - Index 1 - p7]
TruSeq Custom Amplicon Assay Time
Go from DNA to called variants in ~2 days
• Day 1: Receive custom oligos; hybridization setup
• Day 1: Assay biochemistry
• Day 1-2: Cluster generation and automated sequencing on MiSeq
• Day 2: Finished at 5:00 PM; real-time analysis
DesignStudio for custom panels; fixed panels also available
1536 amplicons, 96 samples per plate
Simple, efficient, automatic data analysis and variant calling
Nextera® Mate-Pair Sample Prep Kit
Industry’s only gel-free protocol, lowest DNA input, ideal for de novo sequencing
Enables a wide range of whole-genome applications
– De novo assembly of small genomes and complex genomes (cancer)
– Genome finishing: ‘close-to-finished’ reference genomes from a single library type
– Samples with limited input DNA (metagenomics)
– Detection of structural variation
Optimized combination of Nextera and TruSeq DNA Sample Prep
– Gel-free option: 3-15 kb gap size; 1 µg input; 1.5 days, 3 hrs hands-on time
– Gel+ option for more refined gap size; 4 µg input
TruSeq ChIP Seq - How does this work?
1. Cross link w/ formalin, shear
2. Remaining DNA protected by proteins
3. Immunoprecipitate with antibody against target protein of interest
4. Reverse crosslinks, use DNA as input for library generation (TruSeq ChIP Sample Prep)
5. Sequence; reads indicate where protein was bound
Figure from Szalkowski, A.M., and Schmid, C.D. (2010). Rapid innovation in ChIP-seq peak-calling algorithms is outdistancing benchmarking efforts. Briefings in Bioinformatics.
Illumina RNA Portfolio
TruSeq RNA TruSeq Stranded RNA
TruSeq RNA v2
TruSeq Small RNA
Stranded mRNA
mRNA
Small RNA
Stranded Total RNA
Sample Prep Solutions for RNA from Discovery to Validation Transcriptome to Targeted TruSeq RNA-Seq Portfolio Whole Transcriptome Analysis
RNA-seq for Expression Profiling
TruSeq Targeted RNA
TruSeq Stranded mRNA TruSeq Stranded Total RNA w/RiboZero • HRM • Gold • Plant • Globin TruSeq RNA
TruSeq Targeted RNA Expression
TruSeq Small RNA
TruSeq mRNA Sample Prep Workflow RNA
TruSeq Total RNA Sample Prep Workflow
Ribosomal Depleted RNA
[Figure: Total RNA; Add rRNA Removal Solution; Add rRNA Removal Beads (B); Remove RRB-bound rRNA]
TruSeq Targeted RNA
Create custom panels, select pre-validated fixed, or add-on custom
Accurate targeting of virtually the entire transcriptome for Human, Mouse, Rat
– Assay specific gene families including alternative isoforms
– Individual exons and splice junctions
– cSNP detection for allele specific expression
– Non-coding RNA transcripts
Validation of over 10,000 assay designs
Add custom content to Fixed Panels
Pre-Validated Fixed Panels
Immune Response
Cardiotox
Lung Cancer
Apoptosis
Breast Cancer
Neuro Panel
Stem Cell
Prostate Cancer
P53 Pathway
Wnt Pathway
Cytochrome P450
Cell Cycle
NFκB Pathway
Hedgehog
Design Studio: Custom Panel Creation
Rapid Workflow
Modified from existing DASL Sample Prep Chemistry
Sample to answer in 1.5 days with < 4 hrs hands-on time
Sample prep for 48-384 samples per run
Single MiSeq run equivalent to 15,000 qPCR reactions or 40 384-well plates
On-instrument analysis with MSR
TruSeq Small RNA Workflow Starts directly from total RNA 1.0 ug or less input
Interrogate regulatory miRNAs
Pre-pool samples for single gel excision step
16S Metagenomics Analysis on the MiSeq System:
Metagenomics & Microbiomics The unexplored living world
Metagenomics is the analysis of genomic DNA from a whole community
– 16S DNA
– Whole Genome
– Targeted Gene
Survey of microorganisms in specific environments
– Soil
– Aqueous / Marine
– Medical / Ag
What can we learn
– Taxonomic diversity (‘who is there’)
– Physiology (‘what are they doing’)
– Gene discovery
> 99% of the bacteria present in nature are non-culturable
Operational Taxonomic Unit - OTU
Metagenomics
Approaches to Study the Microbiome and Metagenomics
16S Ribosomal RNA Gene is conserved across all bacteria
Choosing a 16S protocol: Four Questions to answer
Biological question asked?
How many samples per run?
ROI amplicon size?
Data analysis?
Available options:
Illumina Demonstrated Protocol, 2x PCR
Nextera XT Protocol
Caporaso, et al Protocol (ISME)
• 96 samples
Alternative methods allow increased multiplexing on MiSeq and HiSeq
However, more modifications are required…
Option #3 - the Caporaso et al method (ISME 2012)
16S metagenomics using the Caporaso et al. (2012) ISME protocol
• Libraries are prepared by direct amplification with primer sets that amplify the region (V4 in this case) and add the clustering and sequencing primer regions.
• Protocols developed for both the MiSeq and HiSeq platforms; available on the Earth Microbiome (EMP) web site (www.earthmicrobiome.org).
• Average 15M reads on the MiSeq (2x150) and 70M on the HiSeq (2x100).
• Employs a 12-base index tag (Golay code).
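The 12-base index tag is what lets many samples share one run: each read is assigned to a sample by its index sequence. The sketch below shows the general idea with a plain Hamming-distance match that tolerates one mismatch. It is not Golay decoding, which works on a bit-encoded form of the barcode and can correct more errors, and the barcode sequences and sample names here are hypothetical placeholders.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, max_mismatches=1):
    """Return the sample whose barcode is uniquely closest to the index read.

    Returns None when the best match exceeds max_mismatches or is tied.
    """
    hits = [(hamming(index_read, bc), sample) for bc, sample in barcodes.items()]
    best = min(distance for distance, _ in hits)
    winners = [sample for distance, sample in hits if distance == best]
    if best > max_mismatches or len(winners) != 1:
        return None
    return winners[0]


if __name__ == "__main__":
    # Hypothetical 12-base barcodes for three samples.
    barcodes = {
        "AGCCTTCGTCGC": "sample_A",
        "TCCATACCGGAA": "sample_B",
        "AGCCCTGCTACA": "sample_C",
    }
    print(assign_sample("AGCCTTCGTCGC", barcodes))  # exact match
    print(assign_sample("AGCCTTCGTCGA", barcodes))  # one mismatch
    print(assign_sample("GGGGGGGGGGGG", barcodes))  # no acceptable match
```

In practice the published EMP tooling handles the error-correcting decode; this simplified version is only meant to show why well-separated barcodes make sample assignment robust to sequencing errors.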
MSR/BaseSpace Classification Summary Output
Collaboration with San Diego Zoo Global Conservation of the Desert Tortoise
Wildlife Disease Laboratories
– Bruce Rideout (Director)
– Josephine Braun (Scientist)
– Goal is to remove disease as a roadblock to conservation
– Carry out the following tasks for the animals at the San Diego Zoo, San Diego Zoo Safari Park, and field conservation programs:
  • Disease surveillance efforts
  • Diagnostic test development
  • Outbreak investigations
Desert Tortoise 1 - Healthy
Species: Desert Tortoise
Individual: 18919
Disease Severity: 0 HEALTHY CONTROL
Sample Name: DNA-732
Origin: Nasal flush
Total # Sequences PF: 322,916
Unclassified (Chorny & Probert): 44,769
Classified (Chorny & Probert): 278,147

Rank | Species | Total # sequences in sample | Total % sequences in sample | Cumulative % total | Notes
1 | Calothrix parietina | 106,004 | 38.11% | 38.1% | Blue-green algae on rocks.
2 | Rickettsia sp. | 15,631 | 5.62% | 43.7% | Tick, flea, lice borne - some diseases.
3 | Acinetobacter sp. | 5,764 | 2.07% | 45.8% | Soil bacteria.
4 | Symploca atlantica | 4,917 | 1.77% | 47.6% | Algae.
5 | Proteobacteria | 4,109 | 1.48% | 49.0% | Variable. Some pathogens known. Needs more detail.
6 | Thiomonas thermosulfata | 3,847 | 1.38% | 50.4% | Extremophile.
7 | Bacteria | 3,789 | 1.36% | 51.8% | ??
8 | Blautia sp. | 3,705 | 1.33% | 53.1% | Gut bacteria.
9 | Blautia coccoides | 3,330 | 1.20% | 54.3% | Gut bacteria.
10 | Pedobacter | 2,881 | 1.04% | 55.4% | Soil bacteria.
Total | | 153,977 | | |
Other Species of Note | Mycoplasma agassizii | 138 | 0.05% | |
Desert Tortoise 6 - Diseased
Species: Desert Tortoise
Individual: 16025
Severity: 4 DISEASED
Sample Name: DNA-1283
Origin: Nasal flush
Total # Sequences: 198,672
Unclassified (Chorny & Probert): 23,077
Classified (Chorny & Probert): 175,595

Rank | Species | Total # sequences in sample | Total % sequences in sample | Cumulative % total | Notes
1 | Mycoplasma agassizii | 31,924 | 18.2% | 18.2% | Proven as an etiologic agent of URTD in Desert Tortoise (Brown et al., 1994).
2 | Myroides odoratus | 11,654 | 6.6% | 24.8% |
3 | Flavobacteriaceae | 11,481 | 6.5% | 31.4% |
4 | Chelonobacter sp. | 11,384 | 6.5% | 37.8% | Associated with diseased tortoises (Gregersen et al., 2009).
5 | Pedobacter sp. | 7,894 | 4.5% | 42.3% | Soil bacteria.
6 | Flavobacterium swingsii | 5,492 | 3.1% | 45.5% |
7 | Proteus penneri | 5,263 | 3.0% | 48.5% | Found intestinal tract. Invasive pathogen. MDR. Infects urinary tract.
8 | Pseudomonas brenneri | 4,538 | 2.6% | 51.0% | Water borne. Biofilms.
9 | Pseudomonas sp. | 4,193 | 2.4% | 53.4% | Water borne. Biofilms.
10 | Pseudomonas marginalis | 4,160 | 2.4% | 55.8% | Water borne. Biofilms.
Total | | 97,983 | | |
Desert Tortoise 8 - Diseased
Species: Desert Tortoise
Individual: 13498
Severity: 4 DISEASED
Sample Name: DNA-1300 (S10)
Origin: Nasal flush
Total # Sequences PF: 271,248
Unclassified (Chorny & Probert): 25,323
Classified (Chorny & Probert): 245,925

Rank | Species | Total # sequences in sample | Total % sequences in sample | Cumulative % total | Notes
1 | Chelonobacter sp. | 141,734 | 57.6% | 57.6% | Associated with diseased tortoises (Gregersen et al., 2009).
2 | Chelonobacter oris | 37,883 | 15.4% | 73.0% | Associated with diseased tortoises (Gregersen et al., 2009).
3 | Mycoplasma agassizii | 7,639 | 3.1% | 76.1% | Proven as an etiologic agent of URTD in Desert Tortoise (Brown et al., 1994).
4 | Granulicatella adiacens | 5,825 | 2.4% | 78.5% | Normal commensal human mucosal membranes.
5 | Flavobacteriaceae | 5,443 | 2.2% | 80.7% | Water borne.
6 | Myroides odoratus | 4,855 | 2.0% | 82.7% | Human nosocomial infection.
7 | Pedobacter sp. | 4,108 | 1.7% | 84.4% | Soil bacteria.
8 | Deinococcus sp. | 2,051 | 0.8% | 85.2% | World's Toughest Bacteria. Soil, water.
9 | Flavobacterium swingsii | 1,800 | 0.7% | 85.9% | Water borne.
10 | Proteus penneri | 1,625 | 0.7% | 86.6% | Found intestinal tract. Invasive pathogen. MDR. Infects urinary tract.
Total | | 212,963 | | |
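The percentage columns in these classification summaries follow directly from the read counts: each taxon's count is divided by the number of classified reads, and the cumulative column is a running sum. The sketch below reproduces the first four rows for tortoise 13498 using the counts reported above; the function name `abundance_table` is a hypothetical helper, not part of MSR or BaseSpace.

```python
from itertools import accumulate

# Classified read total and top taxa counts for tortoise 13498 (from the slide).
CLASSIFIED_READS = 245_925
TOP_TAXA = [
    ("Chelonobacter sp.", 141_734),
    ("Chelonobacter oris", 37_883),
    ("Mycoplasma agassizii", 7_639),
    ("Granulicatella adiacens", 5_825),
]


def abundance_table(taxa, classified_total):
    """Return (name, percent, cumulative percent) rows, rounded to 0.1%."""
    running_totals = accumulate(count for _, count in taxa)
    return [
        (
            name,
            round(100 * count / classified_total, 1),
            round(100 * cumulative / classified_total, 1),
        )
        for (name, count), cumulative in zip(taxa, running_totals)
    ]


if __name__ == "__main__":
    for name, pct, cum in abundance_table(TOP_TAXA, CLASSIFIED_READS):
        print(f"{name:25s} {pct:5.1f}% {cum:5.1f}%")
```

Note that the denominator is the classified read count, not the total reads passing filter, which is why the healthy control's 106,004 Calothrix reads come out as 38.11% of 278,147 rather than of 322,916.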
Summary: Nasal Microbiome of Desert Tortoises
Mycoplasma agassizii present in majority of tortoises
– 8 of 9 diseased Desert Tortoises contain the pathogen
– In 7 of 9 diseased Desert Tortoise nasal flush samples, this pathogen is in the top ten bacteria
  • Tortoise-16025: number one bacterium (18.2%)
  • Tortoise-18963: number two bacterium (13.9%)
  • Tortoise-13498: number three bacterium (3.1%)
– Previously recognized as etiologic agent of URTD in Desert Tortoise
Chelonobacter sp. also present in many of the tortoises
– In 8 of 10 Desert Tortoise nasal flush samples, this bacterium is in the top ten
– Chelonobacter previously linked/associated with URTD
Healthy Control Desert Tortoise is dominated by a diverse microbiome of soil and water borne bacteria
– Points to “healthy” microbiome constituent species?
– Healthy Control Desert Tortoise also contains a trace of M. agassizii, at 0.05%!
New Illumina Sequencing Portfolio
Focused Power | MiSeq | Speed and simplicity for targeted and small genome sequencing
Flexible Power | NextSeq 500 | Speed and simplicity for personal scale genomics
Production Power | HiSeq 2000/2500 | Power and efficiency for large-scale genomics
Population Power | HiSeq XTen | $1000 human genome and extreme throughput for population-scale sequencing
Introducing NextSeq 500
A new sequencer that combines high throughput NGS applications with the speed, ease of use and affordability of a desktop sequencer
The most flexible applications of any desktop sequencer
– Exome, transcriptome, whole genome sequencing in a single run
– Industry-leading SBS chemistry: >75% >Q30, no homopolymer issues
Sample size flexibility
– 2 output modes: high and mid flow cells and reagents
Push-button simplicity
– Load & Go workflows
– Integrated sample-to-results solution: streamlined informatics on-premise or in cloud
Accessible affordability
– Runs starting at $1,000 (Human Genome ~$4k)
– System list price $250K
Fast Applications
Human Genome | 30 HOURS | 2 x 150bp
Exome, T-ome | 18 HOURS | 2 x 75bp
NIPT, GEx | 12 HOURS | 1 x 75bp
One System, Two output modes
Mode | Output | Clusters PF | Read lengths
High-Output | Up to 120 Gb | 400M | 1 x 75 bp to 2 x 150 bp
Mid-Output | Up to 40 Gb | 130M | 2 x 75 bp to 2 x 150 bp
[Figure: samples per run by application for each output mode. Values and labels in slide order: 30x, 2-3, genome, exomes, 6-12, 2-4, exomes, RNA-Seq samples, 20 GEX profiles, NIPT, 6-36 panels]
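The output ceilings for the two NextSeq 500 modes follow from the cluster counts and read lengths quoted on the slide. As a minimal sketch (the function name `run_yield_gb` is hypothetical, and the calculation assumes every cluster passing filter yields full-length reads):

```python
def run_yield_gb(clusters_pf, read_length_bp, paired_end=True):
    """Approximate run yield in gigabases from clusters passing filter."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters_pf * read_length_bp * reads_per_cluster / 1e9


if __name__ == "__main__":
    # High-output: 400M clusters PF at 2 x 150 bp
    print(run_yield_gb(400e6, 150))
    # Mid-output: 130M clusters PF at 2 x 150 bp
    print(run_yield_gb(130e6, 150))
    # High-output at 1 x 75 bp (single read)
    print(run_yield_gb(400e6, 75, paired_end=False))
```

With these inputs the high-output mode works out to 120 Gb, matching the slide, and the mid-output mode to 39 Gb, consistent with the quoted "up to 40 Gb".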