Illumina. Our analysis will focus on Illumina sequencers (mainly GAs or HiSeq) for DNA processing

Author: Bertina Harris
Illumina Sequencing

Illumina
• Our analysis will focus on Illumina sequencers (mainly GAs or HiSeq) for DNA processing.

Flowcell LANES

Sheared DNA, random fragments from across whole genome
End Repair
Addition of A overhang
Adapter Ligation

Adapters
1. Specific to the flowcell
2. May include multiplex indices

What is on a flowcell?
Short nucleotide sequences have been fixed to the surface.
These sequences are complementary to the adapters used for library preparation.
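The complementarity between adapter and flowcell oligo can be illustrated with a minimal sketch. The adapter sequence below is made up for illustration, not a real Illumina adapter, and `can_anneal` is a hypothetical helper that only checks for a perfect reverse-complement match.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def can_anneal(adapter, surface_oligo):
    """True if the surface oligo is the exact reverse complement of the adapter.

    Simplifying assumption: perfect, full-length pairing only.
    """
    return reverse_complement(adapter) == surface_oligo.upper()

# Hypothetical adapter fragment and the oligo that would capture it.
adapter = "AGGGTAACG"
surface_oligo = reverse_complement(adapter)
print(adapter, surface_oligo, can_anneal(adapter, surface_oligo))
```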

Why Cluster Generation?
• Cluster generation creates regions of identical sequence on the flowcell.
• This is important for later detection of the incorporated base: the same base will be added to all the sequences in the same region, making the color signal easier to detect.

Summary of Cluster Generation
• Sequences bound to the flowcell surface are complementary to the adaptor sequence.
• An adaptor binds to its complementary sequence on the flowcell and the fragment is copied, resulting in one strand now bound to the flowcell.
• The other adaptor binds to its complementary flowcell-bound sequence and the strand is copied again.
• Repeating this multiple times results in many copies of the sequence bound to the flowcell.
• This is known as cluster generation.

Cluster Generation

Illumina: cBot Fully Automated Clonal Cluster Generation for Illumina Sequencing

Cluster Generation: bridge amplification
1. Anneal DNA to flowcell polynucleotide sequences
2. Bridge Amplification
3. Generation of clusters of identical sequence
4. Anneal Primer Sequence (the sequencing primer)

Sequencing
• One strand is cleaved so that only strands in one direction remain attached to the flowcell.
• Blockers are added to prevent bases being added to the wrong strand.
• Each base is incorporated and then images are taken for each base.

Sequencing by Synthesis

Image Analysis
• Instrument software must identify where each cluster is located on the surface of the flowcell.
• This analysis is done across the first N cycles and is facilitated by high sequence heterogeneity.
• At each cluster position the intensity of each colour is determined.
• Software analysis of these intensities makes a base call at each cluster position.
Cluster = single read
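The step from colour intensities to a base call can be sketched as follows. This is a deliberately naive illustration that takes the brightest channel at each cycle. The channel order and the intensity values are assumptions, and the real instrument software applies corrections that are not modelled here.

```python
# Assumed channel order; the real mapping of colours to bases is instrument specific.
CHANNELS = ("A", "C", "G", "T")

def call_base(intensities):
    """Return the base whose channel has the highest intensity."""
    if len(intensities) != len(CHANNELS):
        raise ValueError("expected one intensity per channel")
    best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    return CHANNELS[best]

def call_read(cycles):
    """Concatenate the per-cycle calls of one cluster into a single read."""
    return "".join(call_base(c) for c in cycles)

# Made-up intensities for one cluster over three cycles.
cluster = [
    (812.0, 40.5, 33.1, 20.0),
    (25.0, 30.2, 901.4, 41.0),
    (18.3, 15.9, 22.7, 640.8),
]
print(call_read(cluster))
```

Each cluster yields one string, which matches the "Cluster = single read" note above.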

Paired End Sequencing
• After the first end has completed its cycles, the priming step is repeated with the second sequencing primer and sequencing by synthesis is restarted.
• Cluster positions are maintained.

Multiplexing
• For many applications, the capacity of a single lane far exceeds the necessary depth of sequencing required
• Strategies developed to allow multiple samples to be run in a single lane
• Modification to both Sample Prep and Instrument processing

AGGGTAACGNNNNNNNNA
TCCCATTGCNNNNNNNNT

• Different adapter sequences for each sample consisting of
  • Common sites for binding to flowcell and sequence priming
  • Variable Index sites for identifying sample
• Single and Dual Barcodes
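Assigning reads back to samples by their index can be sketched as below. The index sequences, sample names and the one-mismatch tolerance are illustrative assumptions, not values from any real sample sheet or from the Illumina software.

```python
from collections import defaultdict

# Hypothetical sample sheet: index sequence -> sample name.
SAMPLE_INDEXES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}

def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_sample(index_read, indexes=SAMPLE_INDEXES, max_mismatches=1):
    """Return the sample whose index matches, or None if none or several match."""
    hits = [
        name
        for seq, name in indexes.items()
        if len(seq) == len(index_read) and hamming(seq, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None

def demultiplex(reads):
    """Group (index_read, insert_read) pairs by sample name."""
    bins = defaultdict(list)
    for index_read, insert_read in reads:
        bins[assign_sample(index_read) or "undetermined"].append(insert_read)
    return dict(bins)

reads = [
    ("ACGTAC", "AAAA"),  # exact match to sample_1
    ("ACGTAA", "CCCC"),  # one mismatch to sample_1
    ("TGCATG", "GGGG"),  # exact match to sample_2
    ("GGGGGG", "TTTT"),  # matches nothing
]
print(demultiplex(reads))
```

Reads whose index matches no sample, or more than one, are set aside rather than guessed.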

On Instrument Files

IMAGES, high resolution tifs
8 lanes x 64 tiles x 300 cycles x 4 colours = 614400
INTENSITY, cif binary files

BASECALLS files, bcl binary format: one file per tile and cycle, holding a call for every cluster in the tile, NOT one sequence per read
CASAVA: off-instrument software for aggregating the calls from each cluster and generating the final consensus sequence
fastq files: for each read, the basecalls in cycle order (the sequence) with quality
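A fastq record is four lines: a header starting with "@", the sequence, a separator starting with "+", and a quality string with one character per base. The sketch below reads such records and decodes the qualities. It assumes the Phred+33 encoding; older pipelines used an offset of 64, so the offset is left as a parameter. The example record is made up.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, quality

def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (offset is an assumption)."""
    return [ord(ch) - offset for ch in quality]

# Made-up record, read from memory instead of a file.
example = io.StringIO("@read_1\nACGT\n+\nII5!\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, phred_scores(qual))
```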

Potential Problem Areas
• Sample mixup
• Size Selection
  – Adaptor dimers
• PCR amplification
• Overclustering
  – Difficulty making calls
  – Chastity Filters
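The chastity filter can be sketched as below. The slides do not define it, so the definition used here is an assumption taken from the commonly described Illumina convention: chastity is the brightest intensity divided by the sum of the two brightest, and a cluster passes if at most one of the first 25 cycles falls below 0.6. All three numbers are exposed as parameters, and the intensities are made up.

```python
def chastity(intensities):
    """Brightest intensity divided by the sum of the two brightest."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0

def passes_filter(cycles, threshold=0.6, window=25, max_failures=1):
    """True if at most max_failures of the first `window` cycles are below threshold.

    The default values are assumptions, not taken from the slides.
    """
    failures = sum(1 for c in cycles[:window] if chastity(c) < threshold)
    return failures <= max_failures

# A clean cluster: one channel dominates at every cycle.
clean = [(900.0, 50.0, 30.0, 20.0)] * 25
# An overlapping (overclustered) spot: two channels are almost equally bright.
mixed = [(500.0, 450.0, 30.0, 20.0)] * 25

print(passes_filter(clean), passes_filter(mixed))
```

When neighbouring clusters overlap, two channels light up together, the ratio drops, and the spot is filtered out. This is why overclustering lowers the number of usable reads.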

 

On instrument viewer

Metrics for Sequencers

Platform    Run Duration  Sequencing Unit  Read Length  Reads/Seq Unit  Gb/Seq Unit
HiSeq2000   10 days       Lane             2x101        200M            40Gb
HiSeq2500   27 hours      Lane             2x150        150M            30Gb
MiSeq       27 hours      Lane             2x150        5M              2Gb
IonTorrent  80 mins       IonChip          1x100        100-500k        100-500Mb
PacBioRS    90 mins       SMRTCell         2000         150k            300Mb
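The yield column is roughly the number of reads multiplied by the read length. A minimal sketch of that arithmetic follows. It assumes that the "Reads" figure for paired-end runs counts clusters, so each contributes two reads. That assumption reproduces the HiSeq2000 figure but is not stated in the source.

```python
def yield_in_bases(reads, read_length, ends=1):
    """Total bases per sequencing unit: reads x read length x ends per read."""
    return reads * read_length * ends

# Figures taken from the metrics table; ends=2 for a paired-end (2x101) run.
hiseq2000 = yield_in_bases(200_000_000, 101, ends=2)
pacbio_rs = yield_in_bases(150_000, 2000)

print(hiseq2000 / 1e9, "Gb")  # compare with the 40Gb listed for HiSeq2000
print(pacbio_rs / 1e6, "Mb")  # compare with the 300Mb listed for PacBioRS
```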
