Introduction to illumina Next Generation Sequencing Technology

Author: Daniel Palmer
The Nancy and Stephen Grand Israel National Center for Personalized Medicine (G-INCPM)

Introduction to illumina Next Generation Sequencing Technology
Oren Ben-Ami, PhD
June 21st, 2015

DNA sequencing: the process of determining the precise order of nucleotides (A, C, G, T) within a DNA molecule

The order, or sequence, of nucleotides determines the genetic information available for building and maintaining an organism

Sequence variation
• Natural polymorphism
• Mutation

Sanger Sequencing (Frederick Sanger, 1918-2013)
• DNA template (cloned & isolated from an E. coli colony)
• Sequencing primer
• Replication products are separated by electrophoresis

The Human Genome Project
• Used Sanger Sequencing
• Global international effort involving 20 Research Centers
• Lasted 13 years (1990 to 2003; first draft published in 2001)
• Cost: about $3,000,000,000!
• Facilitated the discovery of more than 1800 disease-associated genes

Sanger Sequencing vs. Next Generation Sequencing
• Sanger Sequencing: resolves the sequence of a single DNA template per run
• Next Generation Sequencing: resolves hundreds of millions of DNA sequences per run; higher sensitivity

van Dijk EL et al. Trends Genet. 2014 Sep;30(9):418-26

Schematic view of illumina NGS Technology

NGS resolves hundreds of millions of DNA sequences in a single run!

• Complex DNA sample
• Attachment to solid surface
• Parallel sequencing of all DNA fragments
• Data analysis

Next Generation Sequencing Workflow (by illumina)

Step 1: SAMPLE PREPARATION
Step 2: CLUSTER GENERATION
Step 3: SEQUENCING
Step 4: DATA ANALYSIS

Adapted from illumina

Common NGS Applications
• Whole Genome Sequencing
• Whole Transcriptome (RNA-Seq)
• Whole Exome Sequencing
• Chromatin Immunoprecipitation (ChIP)-Seq
• Small-RNA Seq
• DNA Methylation Analysis

Illumina Sequencing Overview

© 2013 Illumina, Inc. All rights reserved. Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.

Session Objectives

By the end of this training, you will be able to:
– List the major steps in the Illumina sequencing workflow
– Describe cluster generation
– Discuss the Sequencing By Synthesis process

Adapted from illumina

FOR RESEARCH USE ONLY

Sequencing Workflow Review


Illumina Sequencing Workflow
• Library Preparation
• Cluster Generation: cBot, MiSeq, HiSeq 2500
• Sequencing: HiSeq, HiScan SQ, GA IIx, MiSeq
• Data Analysis: ICS/RTA, CASAVA, MSR, BaseSpace

Sample (“Library”) Preparation Overview

Aim: obtaining nucleic acid fragments with adapters attached on both ends

• Start from nucleic acid (DNA/RNA)
• Modify to proper insert size
• Add adapters with sites for:
  - Flow cell binding
  - Sequencing primer binding

Same general template architecture regardless of application

Sample Indexing

Index = a known short DNA sequence, included in the DNA adapter, which labels all DNA molecules of a particular sample
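To illustrate how an index labels every read of a sample in a pooled run, here is a minimal sketch that sorts reads by exact index match. The sample sheet, the read sequences and the `demultiplex` helper are all hypothetical placeholders, not part of any Illumina software, and real demultiplexers usually also tolerate index mismatches.

```python
from collections import defaultdict

# Hypothetical index-to-sample sheet; treat the sequences as placeholders.
SAMPLE_SHEET = {
    "ATCACG": "sample_A",
    "CGATGT": "sample_B",
}

def demultiplex(reads, sample_sheet):
    """Group (index_read, insert_read) pairs by sample using exact index matches."""
    by_sample = defaultdict(list)
    for index_seq, insert_seq in reads:
        # Reads whose index is not in the sheet are kept apart.
        sample = sample_sheet.get(index_seq, "undetermined")
        by_sample[sample].append(insert_seq)
    return dict(by_sample)

# Hypothetical pooled reads: (index read, insert read).
pooled_reads = [
    ("ATCACG", "GGCTTAAC"),
    ("CGATGT", "TTGACCGA"),
    ("ATCACG", "ACGTACGT"),
    ("NNNNNN", "CCCCAAAA"),
]
print(demultiplex(pooled_reads, SAMPLE_SHEET))
```

Exact matching keeps the sketch short; it is a simplification chosen for illustration, not a description of how production software behaves.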

Adapted from illumina

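Single- vs. Dual-indexed NGS Libraries
• Single-indexed libraries: P5 adapter end, insert, then one index sequence at the P7 adapter end
• Dual-indexed libraries: an index sequence at each of the P5 and P7 adapter ends

Number of samples pooled determines the need for single vs. dual indexing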
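The arithmetic behind that choice can be sketched as follows, assuming each sample needs a unique index label and that dual indexing pairs one P7-side index with one P5-side index. The function name and the kit sizes (12 and 8) are hypothetical and only show the calculation.

```python
def max_pooled_samples(n_p7_indexes, n_p5_indexes=0):
    """Number of samples that can each receive a unique index label.

    Single indexing uses only the P7-side index; dual indexing combines
    one P7-side index with one P5-side index.
    """
    if n_p5_indexes == 0:
        return n_p7_indexes
    return n_p7_indexes * n_p5_indexes

# Hypothetical kit sizes, chosen only to show the arithmetic.
print(max_pooled_samples(12))      # single indexing
print(max_pooled_samples(12, 8))   # dual indexing
```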


What is a Flow Cell?

Cluster generation occurs on a flow cell

A flow cell is a thick glass slide with channels or lanes

Each lane is randomly coated with a lawn of oligos that are complementary to library adapters


Instrumentation
• Single DNA library → amplified clonal cluster
• Instruments: cBot, Sequencer

Hybridize Fragment & Extend
• Surface of flow cell is coated with a lawn of oligo pairs
• Single DNA libraries are hybridized to the primer lawn through their adapter sequence
• Bound libraries are then extended by polymerases (3’ extension)

Denature Double-Stranded DNA
• Double-stranded molecule is denatured
• Original template is washed away (discarded)
• Newly synthesized strand is covalently attached to flow cell surface

Single-Stranded DNA

NOTE: Single molecules bind to flow cell in a random pattern


Bridge Amplification

Single-stranded molecule flips over and forms a bridge by hybridizing to adjacent, complementary primer

Hybridized primer is extended by polymerases


Bridge Amplification

Double-stranded bridge is formed


Denature Double-Stranded Bridge

Double-stranded bridge is denatured

Result: Two copies of covalently bound single-stranded templates


Bridge Amplification

Single-stranded molecules flip over to hybridize to adjacent primers

Hybridized primer is extended by polymerase


Bridge Amplification

Bridge amplification cycle is repeated until multiple bridges are formed
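As a rough picture of why repeated cycles build a cluster, the sketch below counts strands under an idealized assumption: every bound strand is copied exactly once per cycle, so the count doubles. Real amplification is less efficient, and the cycle counts used here are arbitrary examples.

```python
def strands_after_cycles(n_cycles, start=1):
    """Idealized strand count: every bound strand is copied once per cycle."""
    strands = start
    for _ in range(n_cycles):
        strands *= 2  # denature the bridge: two bound single strands per original
    return strands

# Arbitrary example cycle counts.
for n in (1, 2, 10):
    print(n, strands_after_cycles(n))
```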


Linearization

dsDNA bridges are denatured


Reverse Strand Cleavage

Reverse strands are cleaved and washed away, leaving a cluster with forward strands only


Blocking

Free 3’ ends are blocked to prevent unwanted DNA priming


Read 1 Primer Hybridization

Sequencing primer is hybridized to adapter sequence



Sequencing By Synthesis
• Add 4 Fl-NTPs + polymerase
• Incorporated Fl-NTP is imaged
• Terminator & fluorescent dye are cleaved from the Fl-NTP
• Cycle is repeated 36 to 251 times

Reversible Terminator Chemistry
• All 4 labeled nucleotides in 1 reaction
• Higher accuracy
• No problems with homopolymer repeats

Steps of each cycle (then on to the Next Cycle):
• Incorporation
• Detection
• Deblock
• Fluor Removal
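The cycle of incorporation, detection, deblocking and fluor removal can be sketched as a loop that adds exactly one base per cycle. This is a toy model under stated assumptions: strand orientation, chemistry and imaging are ignored, and the function name and template sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(template, n_cycles):
    """Return the read built one base per cycle against a template strand.

    Each cycle mimics: incorporate one terminated, labeled nucleotide,
    image it (record the base), then cleave terminator and dye.
    """
    read = []
    for cycle in range(min(n_cycles, len(template))):
        incorporated = COMPLEMENT[template[cycle]]  # incorporation
        read.append(incorporated)                   # detection (imaging)
        # Deblock and fluor removal: nothing to store, the next cycle can proceed.
    return "".join(read)

# Hypothetical template with a homopolymer run (TTT): still one base per cycle.
print(sequence_by_synthesis("TTTACG", 4))
```

Because the terminator allows only one incorporation per cycle, the homopolymer run is read as separate bases, which is the point made on the slide about homopolymer repeats.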

Clusters (of DNA molecules sequenced): cluster intensities are collected following every base addition

Scale: 100 microns
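A simplified way to think about turning per-cycle cluster intensities into bases is to pick the brightest of four channels at each cycle. The sketch below does only that; the intensity values are invented, the channel order is an assumption, and actual base calling software applies corrections that are not modeled here.

```python
CHANNELS = "ACGT"  # assumed channel order for this illustration

def call_bases(intensities):
    """Call one base per cycle by picking the brightest of four channels."""
    calls = []
    for cycle in intensities:
        brightest = max(range(4), key=lambda i: cycle[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)

# Invented intensities for one cluster over three cycles (A, C, G, T channels).
cluster_intensities = [
    (910.0, 40.0, 35.0, 20.0),
    (30.0, 25.0, 870.0, 45.0),
    (22.0, 18.0, 41.0, 905.0),
]
print(call_bases(cluster_intensities))
```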


Data Analysis Overview

Analysis Type        Software                  Outputs
Sequencing           ICS/RTA                   Images/TIFF files
Primary Analysis     ICS/RTA                   Intensities, Base Calling
Secondary Analysis   HiSeq Analysis Software   Alignments and Variant Detection

Paired End Sequencing


Single End Sequencing


Paired End Sequencing

Reference: This is really the best way to do sequencing

Single-reads: This is … is really … really the … the best … sequencing

Paired-reads: This is (----100 characters-------) sequencing

Assembly becomes easier!!
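In DNA terms, the two "words" of a pair come from opposite ends of the same fragment, with Read 2 taken from the opposite strand. The sketch below is a minimal illustration of that idea; the fragment sequence, read length and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment, read_length):
    """Read 1 from one end of the fragment; Read 2 from the opposite strand at the other end."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2

# Hypothetical 24-base fragment, read 6 bases from each end.
fragment = "ACGTTGCAAGGCTTAACCGGTATC"
r1, r2 = paired_end_reads(fragment, 6)
print(r1, r2)
```

Knowing that the two reads sit a roughly known distance apart on the same fragment is what makes alignment and assembly easier than with isolated single reads.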

Paired End Sequencing
• Sequenced strand is stripped off
• Blocked 3’-ends of template strands and lawn primers are unblocked

Paired End Sequencing
• Bridge formation: single-stranded template loops over to form a bridge by hybridizing with a lawn primer
• 3’ extension: the 3’-end of the lawn primer is extended

Paired End Sequencing

Double stranded DNA


Paired End Sequencing

Bridges are linearized and the original forward template is cleaved


Paired End Sequencing
• Free 3’ ends of the reverse strand template and lawn primers are blocked to prevent unwanted DNA priming
• Sequencing primer is hybridized to adapter sequence

Sequencing By Synthesis 2nd Read
• Add 4 Fl-NTPs + polymerase
• Incorporated Fl-NTP is imaged
• Terminator & fluorescent dye are cleaved from the Fl-NTP
• Cycle is repeated 36 to 251 times

Sequencing Paired End Libraries with Single Index Read


Paired End Sequencing of Single-indexed Libraries

Utilizes 3 sequencing reads:
1. Read 1: Read 1 Seq Primer (HP6)
2. Index read: Index Seq Primer (HP8)
   Paired End Turnaround
3. Read 2: Read 2 Seq Primer (HP7)

Sequencing Paired End Libraries with Dual Index Read


Paired End Sequencing of Dual-indexed Libraries

Utilizes 4 sequencing reads (numbered 1 to 4 on the slide) and a Paired End Turnaround

Questions?

Part II: NGS Library Preparation and Quality Control

Division of responsibility along the workflow: user responsibility; illumina; user / illumina

Taken from: http://rnaseq.uoregon.edu/library_prep.html

NGS Applications common at the WIS

RNA-Seq

DNA-Seq (Whole genome, ChIP-Seq)

RNA-Seq library preparation protocol: (TruSeq RNA v2, illumina)
• Total RNA
• Purify and Fragment mRNA
• cDNA Synthesis (First & Second strand)
• Ends Repair
• Adenylate 3’ Ends
• Ligate Indexed Paired-End Adapters
• PCR Amplification

Not much different is the…

DNA-Seq library preparation protocol: (TruSeq Nano DNA, illumina)
• Purified Genomic DNA
• Fragment Genomic DNA
• Ends Repair
• Adenylate 3’ Ends
• Ligate Indexed Paired-End Adapters
• PCR Amplification

5-step procedure; the steps are separated by bead-based size selection

Step 1: DNA/RNA Fragmentation
• Physical Fragmentation
  - Acoustic shearing: breaks DNA into 100 bp to 5 kb fragments (Covaris)
  - Sonication: shears chromatin & DNA into 150 bp to 1 kb fragments (Bioruptor)
• Enzymatic Fragmentation (DNA endonucleases, transposase)
  - Considered consistent, but less random than physical DNA-shearing methods
• Chemical Fragmentation
  - Heat and divalent metal cation (Mg2+/Zn2+): used to break up RNA molecules
  - Ideally results in 115-350 nt RNA molecules

Illumina Sequencing Workflow
• Sample Preparation
• Library Validation: accurate quantification (Invitrogen Qubit); library size & quality (Agilent Tapestation)
• Cluster Generation: cBot, MiSeq
• Sequencing: HiSeq, HiScan SQ, GA IIx, MiSeq
• Data Analysis

Accurate Library Quantification: Maximizes Data Quality and Quantity

Optimized flow cell clustering determines data quality and overall data yield

Loading concentrations shown: 20 pM, 10 pM, 5 pM, 1 pM

Overclustering can result in:
• Loss of data quality and data output
• Loss of focus
• Reduced base calls and Q30 scores
• Complete run failure

Underclustering can result in:
• Loss of time and money
• Loss of focus
• Complete run failure

Accurate Quantification Is Critical When Multiplexing

Calculated concentration is 10X higher for one library in pool

Sample   Expected Output   Actual Output
1        16%               20%
2        16%               20%
3        16%               20%
4        16%               20%
5        16%               20%
6        16%               2%

Calculated concentration is 10X lower for one library in pool

Sample   Expected Output   Actual Output
1        16%               66%
2        16%               6%
3        16%               6%
4        16%               6%
5        16%               6%
6        16%               6%
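The figures in these two pooling scenarios follow from simple proportions. The sketch below assumes each library's share of the output is proportional to the amount actually loaded; the function name is hypothetical. A library whose concentration was calculated 10X too high is loaded at one tenth of the intended amount, and one calculated 10X too low is loaded at ten times the intended amount.

```python
def output_shares(true_relative_amounts):
    """Share of reads per library, proportional to the amount actually loaded."""
    total = sum(true_relative_amounts)
    return [amount / total for amount in true_relative_amounts]

# Six libraries pooled for equal representation (expected about 16% each).
# Case 1: library 6 measured 10X too high, so 10X too little was loaded.
under_loaded = output_shares([1, 1, 1, 1, 1, 0.1])
# Case 2: library 1 measured 10X too low, so 10X too much was loaded.
over_loaded = output_shares([10, 1, 1, 1, 1, 1])

print([f"{share:.1%}" for share in under_loaded])
print([f"{share:.1%}" for share in over_loaded])
```

The computed shares come out close to the slide's rounded values: roughly 20% and 2% in the first case, and roughly two thirds versus about 6 to 7% in the second.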

Quantification Methods of NGS Libraries

UV-spectrophotometer (Nanodrop)
• Detects nucleic acids nonspecifically
• Contaminants elevate values
• Should not be used for input or library quantification

Bioanalyzer 2100
• Accuracy highly dependent on dilution and sample handling
• Recommended for quality control only

Fluorescence-based ds-DNA assay (Qubit or PicoGreen)
• Specifically detects double-stranded DNA
• Does not discriminate incomplete libraries

qPCR
• Specifically measures full-length libraries
• Detection very sensitive

Library Quantification using qPCR

Library qPCR Overview
• Designed to quantify only cluster-forming fragments in the samples
• Uses primers complementary to adapters to mimic amplification on the flow cell
• Only amplifies and quantifies library fragments with proper adapters at both ends

Steps for Quantifying Libraries with qPCR
• Step 1: Create a control standard curve using a control template of known concentration
• Step 2: Run qPCR on the control template standard curve and the unknown libraries
• Step 3: Extrapolate the concentration of the unknown libraries from the standard curve
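The three steps can be sketched numerically as a straight-line fit followed by an inversion. This assumes the usual linear relation between the quantification cycle (called Cq here, a term not used on the slides) and the log10 of concentration. The dilution series, Cq values and function names are invented for illustration, and any dilution factor applied to the unknown library is left out.

```python
import math

def fit_standard_curve(concentrations_pm, cq_values):
    """Least-squares line Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations_pm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def concentration_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate a concentration from a measured Cq."""
    return 10 ** ((cq - intercept) / slope)

# Hypothetical control dilution series (pM) and Cq values, invented for illustration.
standards_pm = [20.0, 2.0, 0.2, 0.02]
standards_cq = [10.0, 13.3, 16.6, 19.9]

slope, intercept = fit_standard_curve(standards_pm, standards_cq)
unknown_pm = concentration_from_cq(14.95, slope, intercept)
print(round(slope, 2), round(unknown_pm, 3))
```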

Assessing Library Quality with Bioanalyzer

Agilent Bioanalyzer 2100: Overview

Image from “Bioanalyzer Applications for Next-Gen Sequencing: updates and tips” from Agilent Technologies

Understanding a Bioanalyzer Trace

Trace features: Lower Marker, Sample Peak, Upper Marker, Baseline

Understanding a Bioanalyzer Report
• Summary Page
• Sample Details
• Bioanalyzer Details
• Region can be set in 2100 Expert software
• Average Library Size
• Don’t use to quantify

Calculation of Library Molar Concentration

Library weight concentration (ng/µl) (fluorometric assay: Qubit; qPCR)
+
Average library size (bp) (Bioanalyzer/Tapestation)
→ Library Molar Concentration
→ Optimized flow cell clustering & seq. data
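One common way to combine the two measurements is sketched below. It assumes a double-stranded DNA library and an average mass of 660 g/mol per base pair; that constant, the function name and the example values are assumptions for illustration and do not come from the slides.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one double-stranded base pair

def library_molarity_nm(conc_ng_per_ul, avg_size_bp):
    """Convert a dsDNA library concentration (ng/ul) and average size (bp) to nM."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * avg_size_bp)

# Hypothetical library: 2.0 ng/ul with a 400 bp average size.
print(round(library_molarity_nm(2.0, 400), 2))
```

The factor of 1e6 converts ng/µl divided by g/mol into nmol/L, so the result is directly in nM and can then be diluted to the pM loading range.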

Summary
• Accurate quantitation is critical for maximizing high quality data output
• Library quantitation is especially critical when pooling indexed libraries

Library Validation
• Use recommended method to quantify final libraries prior to sequencing
• Check library quality using a Bioanalyzer 2100

Questions?