Illumina • Our analysis will focus on Illumina sequencers (mainly GAs or HiSeq) for DNA processing.
Flowcell LANES
Sheared DNA: random fragments from across the whole genome
End Repair
Addition of A overhang
Adapter Ligation
Adapters
1. Specific to the flowcell
2. May include multiplex indices
What is on a flowcell? Short nucleotide sequences have been fixed to the surface. These sequences are complementary to the adapters used for library preparation.
Why Cluster Generation?
• Cluster generation creates regions of identical sequence on the flowcell.
• This is important for later detection of the incorporated base: the same base is added to every sequence in a region, which makes the color signal easier to detect.
Summary of Cluster Generation
• Sequences bound to the flowcell surface are complementary to the adapter sequences.
• An adapter binds to its complementary sequence on the flowcell and the fragment is copied, leaving one strand bound to the flowcell.
• The other adapter then binds to its complementary flowcell-bound sequence and the strand is copied again.
• Repeating this multiple times produces many copies of the sequence bound to the flowcell.
• This is known as cluster generation.
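The repeated copying summarized above can be sketched as a simple growth model. This is an idealized illustration only: the function name and the `efficiency` parameter are hypothetical, and real amplification is limited by space on the flowcell surface.

```python
def cluster_copies(cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of strands in one cluster after `cycles` rounds of copying.

    Assumption for this sketch: in each round every bound strand is copied
    with probability `efficiency`, so the expected count grows by a factor
    of (1 + efficiency) per round. Starts from a single bound strand.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # With perfect copying the strand count doubles every round.
    for n in (0, 1, 5, 10):
        print(n, cluster_copies(n))
```

With `efficiency=1.0` the count doubles each round, which is why a modest number of rounds is enough to give a cluster a detectable signal.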
Cluster Generation
Illumina: cBot Fully Automated Clonal Cluster Generation for Illumina Sequencing
Cluster Generation: bridge amplification
1. Anneal DNA to flowcell polynucleotide sequences
2. Bridge amplification
3. Generation of clusters of identical sequence
4. Anneal primer sequence (the sequencing primer)
Sequencing
• One strand is cleaved so that only strands in one direction remain attached to the flowcell.
• Blockers are added to prevent bases being added to the wrong strand.
• Bases are incorporated one per cycle, and an image is taken after each incorporation.
Sequencing by Synthesis
Image Analysis
• Instrument software must identify where each cluster is located on the surface of the flowcell.
• This analysis is done across the first N cycles and is facilitated by high sequence heterogeneity.
• At each cluster position the intensity of each colour is determined.
• Software analysis of these intensities makes a base call at each cluster position.
Cluster = single read
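The intensity-to-base-call step can be sketched as follows. This is a minimal illustration, not the instrument software: the channel order, the purity ratio and the `min_purity` threshold are all assumptions made for the example.

```python
# Assumed channel order for this sketch: one intensity per base, in A, C, G, T order.
CHANNELS = "ACGT"


def call_base(intensities, min_purity=0.6):
    """Return (base, purity) for one cluster in one cycle.

    purity = brightest / (brightest + second brightest). If the signal is
    too mixed (purity below the hypothetical `min_purity`), call "N".
    """
    if len(intensities) != 4:
        raise ValueError("expected one intensity per channel (4 values)")
    ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
    top = intensities[ranked[0]]
    second = intensities[ranked[1]]
    total = top + second
    purity = top / total if total > 0 else 0.0
    base = CHANNELS[ranked[0]] if purity >= min_purity else "N"
    return base, purity


def call_read(cycles):
    """Concatenate the per-cycle calls for one cluster into a single read."""
    return "".join(call_base(c)[0] for c in cycles)


if __name__ == "__main__":
    # Three cycles for one cluster; the third cycle has two similar signals.
    cluster = [[900, 50, 30, 20], [10, 20, 800, 40], [300, 290, 10, 5]]
    print(call_read(cluster))
```

Each cluster yields one call per cycle, so the calls for a cluster read in cycle order give the sequence of that single read.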
Paired End Sequencing
• After the first end has completed its cycles, the process is repeated with the second sequencing primer and sequencing by synthesis is restarted.
• Cluster positions are maintained, so the two reads of a pair can be matched by cluster position.
Multiplexing
• For many applications, the capacity of a single lane far exceeds the required depth of sequencing.
• Strategies have been developed to allow multiple samples to be run in a single lane.
• These require modifications to both sample prep and instrument processing.
AGGGTAACGNNNNNNNNA
TCCCATTGCNNNNNNNNT
• Different adapter sequences for each sample, consisting of:
  – Common sites for binding to the flowcell and sequence priming
  – Variable index sites for identifying the sample
• Single and dual barcodes
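Assigning reads back to samples by their index can be sketched as below. This is an illustration under stated assumptions, not the instrument's own procedure: the sample names, index sequences, the `max_mismatches` tolerance and the "undetermined" bin are all hypothetical choices for the example.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_index, max_mismatches=1):
    """Group reads by sample using the index read attached to each cluster.

    reads        : iterable of (index_read, sequence) pairs
    sample_index : dict mapping sample name -> expected index sequence
    A read is assigned only if exactly one sample index is within
    `max_mismatches`; otherwise it goes to "undetermined".
    """
    bins = defaultdict(list)
    for index_read, sequence in reads:
        matches = [
            sample
            for sample, index in sample_index.items()
            if hamming(index_read, index) <= max_mismatches
        ]
        key = matches[0] if len(matches) == 1 else "undetermined"
        bins[key].append(sequence)
    return dict(bins)


if __name__ == "__main__":
    # Hypothetical indices and reads, for illustration only.
    indices = {"sample_A": "ATCACG", "sample_B": "CGATGT"}
    reads = [("ATCACG", "AAAA"), ("CGATGA", "CCCC"), ("GGGGGG", "GGGG")]
    print(demultiplex(reads, indices))
```

Allowing a small number of mismatches tolerates sequencing errors in the index read, but only works if the chosen indices differ from each other at more positions than the tolerance.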
On Instrument Files
IMAGES, high resolution tiffs: 8 lanes x 64 tiles x 300 cycles x 4 colours = 614400 images
INTENSITY, cif binary files
BASECALLS files, bcl binary format: organized by tile and cluster, NOT by cycle
CASAVA: off-instrument software for aggregating the calls from each cluster and generating the final consensus sequence
fastq files: basecalls by cycle (sequence) with quality
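The final fastq output stores each read as four lines: a header, the sequence, a separator and a quality string. A minimal reader might look like the sketch below. The quality offset of 33 is an assumption (some older Illumina pipelines used 64), and the function name and example records are made up for illustration.

```python
import io


def read_fastq(handle, phred_offset=33):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream.

    `phred_offset` is an assumption of this sketch; each quality character
    is decoded as ord(character) - phred_offset.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ: " + header)
        yield header[1:], sequence, [ord(c) - phred_offset for c in quality]


# Made-up records, for illustration only.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!5?I\n"

if __name__ == "__main__":
    for name, seq, quals in read_fastq(io.StringIO(EXAMPLE)):
        print(name, seq, quals)
```

Each position in the quality list corresponds to one sequencing cycle of that read.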
Potential Problem Areas
• Sample mixup
• Size selection
  – Adapter dimers