The Nancy and Stephen Grand Israel National Center for Personalized Medicine (G-INCPM)
Introduction to Illumina Next Generation Sequencing Technology
Oren Ben-Ami, PhD
June 21st 2015
DNA sequencing: a process of determining the precise order of nucleotides (A, C, G, T) within a DNA molecule
The order, or sequence, of nucleotides determines the genetic information available for building and maintaining an organism
Sequence variation:
• Natural polymorphism
• Mutation
Sequencing primer
DNA template (cloned and isolated from an E. coli colony)
Frederick Sanger 1918-2013
Replication products are separated by electrophoresis
The Human Genome Project
• Used Sanger Sequencing
• Global international effort involving 20 Research Centers
• Lasted 15 years (first draft completed in 2004)
• Cost: $3,000,000,000!
• Facilitated the discovery of more than 1800 disease-associated genes
Sanger Sequencing: resolves the sequence of a single DNA template per run
Next Generation Sequencing: resolves hundreds of millions of DNA sequences per run; higher sensitivity
van Dijk EL et al. Trends Genet. 2014 Sep;30(9):418-26
Schematic view of Illumina NGS Technology
NGS resolves hundreds of millions of DNA sequences in a single run!
1. Complex DNA sample
2. Attachment to solid surface
3. Parallel sequencing of all DNA fragments
4. Data analysis
Next Generation Sequencing Workflow (by Illumina)
Step 1 SAMPLE PREPARATION
Step 2 CLUSTER GENERATION
Step 3 SEQUENCING
Step 4 DATA ANALYSIS
Adapted from Illumina
Common NGS Applications
• Whole Genome Sequencing
• Whole Transcriptome (RNA-Seq)
• Whole Exome Sequencing
• Chromatin Immunoprecipitation (ChIP)-Seq
• Small-RNA Seq
• DNA Methylation Analysis
Illumina Sequencing Overview
Session Objectives
By the end of this training, you will be able to:
– List the major steps in the Illumina sequencing workflow
– Describe cluster generation
– Discuss the Sequencing By Synthesis process
Adapted from illumina
FOR RESEARCH USE ONLY
Sequencing Workflow Review
Illumina Sequencing Workflow
1. Library Preparation
2. Cluster Generation (cBot, MiSeq, HiSeq 2500)
3. Sequencing (HiSeq, HiScan SQ, GA IIx, MiSeq)
4. Data Analysis (ICS/RTA, CASAVA, MSR, BaseSpace)
Sample (“Library”) Preparation Overview
Aim: obtaining nucleic acid fragments with adapters attached on both ends
1. Nucleic acid (DNA/RNA)
2. Modify to proper insert size
3. Add adapters with sites for:
- Flow cell binding
- Sequencing primer binding
Same general template architecture regardless of application
Sample (“Library”) Preparation Overview: Sample Indexing
Index = a known short DNA sequence included in the DNA adapter, which labels all DNA molecules of a particular sample
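Because every molecule of a sample carries that sample's index, pooled reads can be sorted back to their samples after sequencing (demultiplexing). The sketch below is only an illustration of the idea: the sample names, the index sequences in `SAMPLE_INDEXES`, and the one-mismatch tolerance are assumptions chosen for the example, not values from a particular kit or from Illumina software.

```python
from collections import defaultdict

# Hypothetical sample sheet: index sequence -> sample name (illustrative values).
SAMPLE_INDEXES = {
    "ATCACG": "sample_A",
    "CGATGT": "sample_B",
    "TTAGGC": "sample_C",
}


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_indexes=SAMPLE_INDEXES, max_mismatches=1):
    """Return the single sample whose index is within max_mismatches, else None."""
    hits = [
        name
        for idx, name in sample_indexes.items()
        if len(idx) == len(index_read) and hamming(idx, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads):
    """Group (insert_sequence, index_read) pairs by sample name."""
    bins = defaultdict(list)
    for insert_seq, index_read in reads:
        bins[assign_sample(index_read) or "undetermined"].append(insert_seq)
    return dict(bins)


reads = [
    ("ACGT", "ATCACG"),  # exact match to sample_A
    ("GGCC", "CGATGT"),  # exact match to sample_B
    ("TTTT", "ATCACC"),  # one mismatch from sample_A's index
    ("AAAA", "GGGGGG"),  # matches no index closely enough
]
result = demultiplex(reads)
```

Reads whose index cannot be assigned unambiguously end up in an "undetermined" bin rather than being forced into a sample.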
Adapted from illumina
Single- vs. Dual-indexed NGS Libraries
Single-indexed libraries (figure labels: P5, index sequence, P7)
Dual-indexed libraries (figure labels: P5, P7)
The number of samples pooled determines the need for single vs. dual indexing
What is a Flow Cell?
Cluster generation occurs on a flow cell
A flow cell is a thick glass slide with channels or lanes
Each lane is randomly coated with a lawn of oligos that are complementary to library adapters
Instrumentation
Single DNA Library
Amplified Clonal Cluster
cBot
Sequencer
Hybridize Fragment & Extend
Adapter sequence
Single DNA libraries are hybridized to primer lawn
Bound libraries are then extended by polymerases
Surface of flow cell coated with a lawn of oligo pairs
3’ extension
Denature Double-Stranded DNA
Double-stranded molecule is denatured
Original template is washed away (discarded)
Newly synthesized strand is covalently attached to flow cell surface
Single-Stranded DNA
NOTE: Single molecules bind to flow cell in a random pattern
Bridge Amplification
Single-stranded molecule flips over and forms a bridge by hybridizing to adjacent, complementary primer
Hybridized primer is extended by polymerases
Bridge Amplification
Double-stranded bridge is formed
Denature Double-Stranded Bridge
Double-stranded bridge is denatured
Result: Two copies of covalently bound single-stranded templates
Bridge Amplification
Single-stranded molecules flip over to hybridize to adjacent primers
Hybridized primer is extended by polymerase
Bridge Amplification
Bridge amplification cycle is repeated until multiple bridges are formed
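As a rough illustration of why repeating the cycle builds a cluster quickly: if every bound strand were copied once per cycle, the number of strands would double each time. The sketch below models only that idealized arithmetic. The `efficiency` parameter and the cycle count are assumptions for the example; real cluster growth is limited by factors such as the supply of nearby lawn primers.

```python
def strands_after_cycles(cycles, efficiency=1.0):
    """Idealized strand count in one cluster after a number of bridge cycles.

    efficiency is the assumed fraction of strands copied per cycle
    (1.0 means perfect doubling).
    """
    strands = 1.0  # a single template molecule seeds the cluster
    for _ in range(cycles):
        strands += strands * efficiency
    return strands


perfect = strands_after_cycles(10)        # ideal doubling for 10 cycles
partial = strands_after_cycles(10, 0.5)   # half of the strands copied per cycle
```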
Linearization
dsDNA bridges are denatured
Reverse Strand Cleavage
Reverse strands are cleaved and washed away, leaving a cluster with forward strands only
Blocking
Free 3’ ends are blocked to prevent unwanted DNA priming
Read 1 Primer Hybridization
Sequencing primer is hybridized to adapter sequence
Sequencing By Synthesis
Add 4 Fl-NTPs + polymerase
Incorporated Fl-NTP is imaged
Terminator and fluorescent dye are cleaved from the Fl-NTP
Cycle is repeated 36-251 times
Reversible Terminator Chemistry
• All 4 labeled nucleotides in 1 reaction
• Higher accuracy
• No problems with homopolymer repeats
Each cycle:
• Incorporation
• Detection
• Deblock
• Fluor removal
Next cycle
Clusters (of DNA molecules sequenced): Cluster Intensities collected following every base addition
Scale bar: 100 microns
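To make the link between per-cycle cluster intensities and a read concrete, here is a minimal sketch of base calling for one cluster. It is not the RTA algorithm: the intensity values, the purity ratio (brightest signal divided by the sum of the two brightest), and the 0.6 cutoff used to emit "N" are assumptions made for illustration.

```python
def call_base(intensities):
    """Pick the brightest channel for one cluster in one cycle.

    intensities maps each base (A, C, G, T) to its measured signal.
    Returns (base, purity), where purity compares the two brightest signals.
    """
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    top = intensities[ranked[0]]
    second = intensities[ranked[1]]
    total = top + second
    purity = top / total if total > 0 else 0.0
    return ranked[0], purity


def call_read(cycles, min_purity=0.6):
    """Build a read from per-cycle intensities, writing N for ambiguous cycles."""
    calls = (call_base(cycle) for cycle in cycles)
    return "".join(base if purity >= min_purity else "N" for base, purity in calls)


# Hypothetical intensities for three cycles of a single cluster.
cycles = [
    {"A": 900, "C": 50, "G": 30, "T": 20},   # clear A
    {"A": 10, "C": 40, "G": 800, "T": 30},   # clear G
    {"A": 400, "C": 380, "G": 10, "T": 5},   # A and C too close to call
]
read = call_read(cycles)
```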
Data Analysis Overview
Analysis Type | Software | Outputs
Sequencing | ICS/RTA | Images/TIFF files
Primary Analysis | ICS/RTA | Intensities, Base Calling
Secondary Analysis | HiSeq Analysis Software | Alignments and Variant Detection
Paired End Sequencing
Single End Sequencing
Paired End Sequencing
Reference: This is really the best way to do sequencing
Single-reads: This is … is really … really the … the best … sequencing
Paired-reads: This is (----100 characters-------) sequencing
Assembly becomes easier!!
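The same idea can be shown on a DNA fragment: Read 1 comes from one end of the fragment and Read 2 from the opposite end, on the complementary strand, with a known approximate distance between them. The sketch below is illustrative only; the fragment sequence and read length are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_length):
    """Simulate Read 1 and Read 2 taken from the two ends of one fragment."""
    read1 = fragment[:read_length]
    # Read 2 is sequenced from the other end, on the opposite strand.
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


def unsequenced_gap(fragment_length, read_length):
    """Number of bases between the two reads that are not sequenced."""
    return max(fragment_length - 2 * read_length, 0)


fragment = "AACCGGTTACGTAGCTAGCTTTGA"  # hypothetical 24-base insert
read1, read2 = paired_end_reads(fragment, 5)
gap = unsequenced_gap(len(fragment), 5)
```

Knowing that the two reads come from the same fragment, in opposite orientation and a roughly known distance apart, is what helps alignment and assembly.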
Paired End Sequencing
Sequenced strand is stripped off
3’-ends of template strands and lawn primers are unblocked
Paired End Sequencing
Single-stranded template loops over to form a bridge by hybridizing with a lawn primer (bridge formation)
The 3’-end of the lawn primer is extended (3’ extension)
Paired End Sequencing
Double stranded DNA
Paired End Sequencing
Bridges are linearized and the original forward template is cleaved
Paired End Sequencing
Free 3’ ends of the reverse template and lawn primers are blocked to prevent unwanted DNA priming
Sequencing primer is hybridized to the adapter sequence of the reverse strand template
Sequencing By Synthesis 2nd Read
Add 4 Fl-NTPs + polymerase
Incorporated Fl-NTP is imaged
Terminator and fluorescent dye are cleaved from the Fl-NTP
Cycle is repeated 36-251 times
Sequencing Paired End Libraries with Single Index Read
Paired End Sequencing of Single-indexed Libraries
Utilizes 3 sequencing reads:
1. Read 1 (Read 1 Seq Primer, HP6)
2. Index read (Index Seq Primer, HP8)
Paired End Turnaround
3. Read 2 (Read 2 Seq Primer, HP7)
Sequencing Paired End Libraries with Dual Index Read
Paired End Sequencing of Dual Indexed Libraries
Utilizes 4 sequencing reads (labeled 1-4 in the figure), with a Paired End Turnaround
Questions?
Part II: NGS Library Preparation and Quality Control
user responsibility
illumina
user / illumina
Taken from: http://rnaseq.uoregon.edu/library_prep.html
NGS Applications common at the WIS
RNA-Seq
DNA-Seq (Whole genome, ChIP-Seq)
RNA-Seq library preparation protocol (TruSeq RNA v2, Illumina):
Total RNA
Purify and Fragment mRNA
cDNA Synthesis (First & Second strand)
End Repair
Adenylate 3’ Ends
Ligate Indexed Paired-End Adapters
PCR Amplification
Not much different is the…
DNA-Seq library preparation protocol (TruSeq Nano DNA, Illumina):
Purified Genomic DNA
Fragment Genomic DNA
End Repair
Adenylate 3’ Ends
Ligate Indexed Paired-End Adapters
PCR Amplification
A 5-step procedure, with steps separated by bead-based size selection
Step 1: DNA/RNA Fragmentation
• Physical Fragmentation
  Acoustic shearing: breaks DNA into 100 bp-5 kb fragments (Covaris)
  Sonication: shears chromatin & DNA into 150 bp-1 kb fragments (Bioruptor)
• Enzymatic Fragmentation (DNA endonucleases, transposase)
  Considered consistent, but less random than physical DNA-shearing methods
• Chemical Fragmentation
  Heat and divalent metal cation (Mg2+/Zn2+): used to break up RNA molecules
  Ideally results in 115-350 nt RNA molecules
Illumina Sequencing Workflow
1. Sample Preparation
2. Library Validation: accurate quantification; library size & quality (Agilent Tapestation, Invitrogen Qubit)
3. Cluster Generation (cBot, MiSeq)
4. Sequencing (HiSeq, HiScan SQ, GA IIx, MiSeq)
5. Data Analysis
Accurate Library Quantification: Maximizes Data Quality and Quantity
Optimized flow cell clustering determines data quality and overall data yield
Figure panels: 20 pM, 10 pM, 5 pM, 1 pM
Overclustering can result in:
• Loss of data quality and data output
• Loss of focus
• Reduced base calls and Q30 scores
• Complete run failure
Underclustering can result in:
• Loss of time and money
• Loss of focus
• Complete run failure
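For readers unfamiliar with the Q30 score mentioned above: a Phred quality score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q30 means roughly 1 error in 1,000 calls. The helper names and the example quality values below are assumptions for illustration only.

```python
import math


def phred_to_error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def percent_q30(quality_scores):
    """Percentage of base calls with a quality score of at least 30."""
    if not quality_scores:
        return 0.0
    passing = sum(q >= 30 for q in quality_scores)
    return 100.0 * passing / len(quality_scores)


# Hypothetical per-base quality scores for one short read.
example_scores = [35, 32, 28, 12]
example_percent = percent_q30(example_scores)
```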
Accurate Quantification Is Critical When Multiplexing

Calculated concentration is 10X higher for one library in pool
Sample | Expected Output | Actual Output
1 | 16% | 20%
2 | 16% | 20%
3 | 16% | 20%
4 | 16% | 20%
5 | 16% | 20%
6 | 16% | 2%

Calculated concentration is 10X lower for one library in pool
Sample | Expected Output | Actual Output
1 | 16% | 66%
2 | 16% | 6%
3 | 16% | 6%
4 | 16% | 6%
5 | 16% | 6%
6 | 16% | 6%
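The percentages in these tables follow from simple arithmetic. A library whose concentration is overestimated 10X is added at one tenth of the intended amount, and an underestimated one at ten times the intended amount. The sketch below reproduces that calculation under the assumptions that all other libraries are quantified correctly and that output share is proportional to the amount loaded; the function and parameter names are hypothetical.

```python
def actual_pool_shares(n_libraries, fold_error, misquantified_index=0):
    """Percentage of output per library when one library is mis-quantified.

    fold_error is calculated concentration divided by true concentration
    for the mis-quantified library (10 = overestimated, 0.1 = underestimated).
    """
    # Pooling by calculated concentration loads 1/fold_error of the intended amount.
    amounts = [1.0] * n_libraries
    amounts[misquantified_index] = 1.0 / fold_error
    total = sum(amounts)
    return [100.0 * amount / total for amount in amounts]


# Sample 6 overestimated 10X: it is under-represented in the pool.
over = actual_pool_shares(6, 10, misquantified_index=5)

# Sample 1 underestimated 10X: it dominates the pool.
under = actual_pool_shares(6, 0.1, misquantified_index=0)
```

With six libraries this gives about 19.6% and 2.0% in the first case and about 66.7% and 6.7% in the second, in line with the rounded values on the slide.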
Quantification Methods of NGS Libraries
UV spectrophotometer (Nanodrop)
• Detects nucleic acids nonspecifically
• Contaminants elevate values
• Should not be used for input or library quantification
Bioanalyzer 2100
• Accuracy highly dependent on dilution and sample handling
• Recommended for quality control only
Fluorescence-based dsDNA assay (Qubit or PicoGreen)
• Specifically detects double-stranded DNA
• Does not discriminate incomplete libraries
qPCR
• Specifically measures full-length libraries
• Detection very sensitive
Library Quantification using qPCR
Library qPCR Overview
qPCR:
• Uses primers complementary to adapters to mimic amplification on the flow cell
• Designed to quantify only cluster-forming fragments in the samples
Only amplifies and quantifies library fragments with proper adapters at both ends
Steps for Quantifying Libraries with qPCR
Step 1: Create a control standard curve using a control template of known concentration
Step 2: Run qPCR on the control template standard curve and the unknown libraries
Step 3: Extrapolate the concentration of the unknown libraries from the standard curve
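The three steps can be sketched numerically: fit a straight line of quantification cycle (Cq) against log10 of the standard concentrations, then read unknown libraries off that line. This is a minimal sketch and not the procedure of any specific kit. The standard concentrations, Cq values, and the dilution factor are made-up example numbers, and a plain least-squares fit is assumed.

```python
import math


def fit_standard_curve(concentrations_pm, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations_pm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_cq(cq, slope, intercept, dilution_factor=1.0):
    """Concentration (pM) of an unknown, corrected for its dilution before qPCR."""
    return 10 ** ((cq - intercept) / slope) * dilution_factor


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means product doubles every cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


# Step 1: hypothetical control dilution series (pM) and measured Cq values.
standards_pm = [20, 2, 0.2, 0.02, 0.002]
standards_cq = [8.0, 11.32, 14.64, 17.96, 21.28]

# Step 2: fit the curve from the control measurements.
slope, intercept = fit_standard_curve(standards_pm, standards_cq)

# Step 3: an unknown library measured at Cq 14.64 after a 1:10,000 dilution.
library_pm = concentration_from_cq(14.64, slope, intercept, dilution_factor=10_000)
```

A slope near -3.32 corresponds to an efficiency close to 100%, which is a common sanity check on the standard curve before trusting the unknowns.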
Assessing Library Quality with Bioanalyzer
Agilent Bioanalyzer 2100: Overview
Image from “Bioanalyzer Applications for Next-Gen Sequencing: updates and tips” from Agilent Technologies
Understanding a Bioanalyzer Trace
Upper Marker Lower Marker
Sample Peak
Baseline
Understanding a Bioanalyzer Report
Summary Page
Sample Details
Bioanalyzer Details Region can be set in 2100 Expert software
Average Library Size
Don’t use to quantify
Calculation of Library Molar Concentration
Library weight concentration (ng/µl) (fluorometric assay Qubit, qPCR)
+
Average library size (bp) (BioAnalyzer/ Tapestation)
Library Molar Concentration
Optimized flow cell clustering & seq. data
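The conversion combines the two measurements above. A commonly used approximation takes the average mass of one double-stranded DNA base pair as about 660 g/mol, which gives: molarity (nM) = concentration (ng/µl) × 10^6 / (660 × average size in bp). The sketch below applies that formula and a simple C1V1 = C2V2 dilution; the example library values and target concentration are assumptions, not recommendations.

```python
def library_molarity_nm(conc_ng_per_ul, mean_size_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA library from ng/ul to nM using its average size in bp."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_size_bp)


def dilution_volumes(stock_nm, target_nm, final_volume_ul):
    """Return (stock volume, diluent volume) in ul for a C1V1 = C2V2 dilution."""
    if target_nm > stock_nm:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_nm * final_volume_ul / stock_nm
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical library: 2.0 ng/ul by fluorometric assay, 400 bp average size.
molarity = library_molarity_nm(2.0, 400)

# Diluting a 10 nM stock to 2 nM in a final volume of 20 ul.
stock_ul, diluent_ul = dilution_volumes(10.0, 2.0, 20.0)
```

Because the average size sits in the denominator, an error in the sizing measurement shifts the molar concentration by the same proportion.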
Summary
Accurate quantitation is critical for maximizing high quality data output
Library quantitation is especially critical when pooling indexed libraries
Library Validation
Use a recommended method to quantify final libraries prior to sequencing
Check library quality using a Bioanalyzer 2100
Questions?