DNA Sequencing
• Shear DNA into millions of small fragments
• Read 500–700 nucleotides at a time from the small fragments (Sanger method)
10/18/2013
COMP 465 Fall 2013
2
Fragment Assembly
• Assembles the individual overlapping short fragments (reads) into a genomic sequence
• The Shortest Superstring problem from last time is an overly simplified abstraction
• Problems:
  – DNA read error rate of 1% to 3%
  – Can’t separate coding and template strands
  – DNA is full of repeats
• Let’s take a closer look
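One of the problems listed above is that a read may come from either strand. A minimal sketch of how an assembler might cope, assuming reads contain only A, C, G, T; the helper names `reverse_complement` and `canonical` are hypothetical and not from the lecture:

```python
# Hypothetical helpers: treat a read and its reverse complement as the same item.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(read: str) -> str:
    """Return the opposite strand of the read, written 5' to 3'."""
    return read.translate(COMPLEMENT)[::-1]

def canonical(read: str) -> str:
    """Pick one representative (the lexicographically smaller) of the two strands."""
    return min(read, reverse_complement(read))

print(reverse_complement("ATGCC"))  # GGCAT
print(canonical("GGCAT"))           # ATGCC
```

With a canonical form, a read and its reverse complement compare equal, so the strand a read came from no longer matters when counting or matching.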
Construction of Repeat Graph
• Construction of the repeat graph from k-mers: emulates a sequencing-by-hybridization (SBH) experiment with a huge (virtual) DNA chip.
• Breaking reads into k-mers: transforms sequencing data into virtual DNA chip data.
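Breaking reads into k-mers can be sketched as follows. This is an illustration only: the function names `kmers` and `kmer_spectrum` and the toy reads are assumptions, not part of the lecture.

```python
from collections import Counter

def kmers(read, k):
    """All length-k substrings of a read, in left-to-right order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def kmer_spectrum(reads, k):
    """Count every k-mer across all reads (the 'virtual DNA chip' data)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

print(kmers("ATGCG", 3))  # ['ATG', 'TGC', 'GCG']
```

The spectrum plays the role of the hybridization signal in SBH: it records which k-mers occur, and here also how often.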
Construction of Repeat Graph (cont’d)
• Error correction in reads: a “consensus first” approach to fragment assembly that makes reads (almost) error-free BEFORE the assembly even starts.
• Using reads and mate-pairs to simplify the repeat graph (Eulerian Superpath Problem).
Approaches to Fragment Assembly
Find a path visiting every VERTEX exactly once in the OVERLAP graph: Hamiltonian path problem
NP-complete: no efficient (polynomial-time) algorithm is known
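A minimal sketch of the overlap-graph approach, assuming error-free reads from a single strand. The function names and the `min_len` parameter (assumed to be at least 1) are hypothetical; the brute-force search over all orderings is only there to make the exponential cost visible.

```python
from itertools import permutations

def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def overlap_graph(reads, min_len=1):
    """Edges (a, b) -> overlap length, for every ordered pair that overlaps."""
    return {(a, b): overlap(a, b, min_len)
            for a in reads for b in reads
            if a != b and overlap(a, b, min_len) > 0}

def hamiltonian_assembly(reads, min_len=1):
    """Try every ordering of the reads: n! candidates, so toy inputs only."""
    edges = overlap_graph(reads, min_len)
    for order in permutations(reads):
        pairs = list(zip(order, order[1:]))
        if all(pair in edges for pair in pairs):
            seq = order[0]
            for a, b in pairs:
                seq += b[edges[(a, b)]:]  # append only the non-overlapping tail
            return seq
    return None

print(hamiltonian_assembly(["GTAA", "ATGC", "GCGT"], min_len=2))  # ATGCGTAA
```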
Approaches to Fragment Assembly (cont’d)
Find a path visiting every EDGE exactly once in the REPEAT graph: Eulerian path problem
Linear-time algorithms are known
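A sketch of the Eulerian approach, using a de Bruijn-style graph in which every k-mer is an edge between its (k-1)-mer prefix and suffix, and Hierholzer's algorithm to walk every edge once. The function names are hypothetical, and the code assumes the k-mers are error-free and that an Eulerian path exists.

```python
from collections import defaultdict

def de_bruijn(kmer_list):
    """Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for km in kmer_list:
        graph[km[:-1]].append(km[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; runs in time linear in the number of edges."""
    out_edges = {u: list(vs) for u, vs in graph.items()}
    indeg = defaultdict(int)
    for vs in out_edges.values():
        for v in vs:
            indeg[v] += 1
    # Start where out-degree exceeds in-degree by one, else at any node.
    start = next(iter(out_edges))
    for u, vs in out_edges.items():
        if len(vs) - indeg[u] == 1:
            start = u
            break
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if out_edges.get(u):
            stack.append(out_edges[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: emit the node
    return path[::-1]

def spell(path):
    """Glue the nodes of a path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])

kms = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
print(spell(eulerian_path(de_bruijn(kms))))
```

When a (k-1)-mer repeats, more than one Eulerian path can exist, so the printed sequence is one valid reconstruction with the given k-mer content, not necessarily the original.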
Making Repeat Graph Without DNA
• Problem: Construct the repeat graph from a collection of reads.
• Solution: Break the reads into smaller pieces.
Repeat Sequences: Emulating a DNA Chip
• Virtual DNA chip allows the biological problem to be solved within the technological constraints.
Repeat Sequences: Emulating a DNA Chip (cont’d)
• Reads are constructed from an original sequence in lengths that allow biologists a high level of certainty.
• They are then broken again to allow the technology to sequence each within a reasonable array.
Minimizing Errors
• If an error exists in one of the 20-mer reads, the error will be perpetuated among all of the smaller pieces broken from that read.
Minimizing Errors (cont’d)
• However, that error will not be present in the other reads that cover the same 20-mer.
• So it is possible to eliminate most single-base (point) errors before reconstructing the original sequence.
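The idea can be sketched with k-mer counts: a k-mer produced by a read error is rare, while the correct k-mer is seen in every other read covering that position. The code below is a simplified illustration, not the lecture's method; the names, the `min_count` threshold, and the restriction to one substitution per read are all assumptions.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer over all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def weak_kmers(read, counts, k, min_count):
    """Number of k-mers in the read seen fewer than min_count times overall."""
    return sum(1 for i in range(len(read) - k + 1)
               if counts[read[i:i + k]] < min_count)

def correct_read(read, counts, k, min_count=2):
    """Try each single-base substitution; keep the first that leaves no weak k-mer."""
    if weak_kmers(read, counts, k, min_count) == 0:
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if weak_kmers(candidate, counts, k, min_count) == 0:
                return candidate
    return read  # no single substitution fixes it; leave the read unchanged

reads = ["ATGCGTAC"] * 3 + ["ATGCCTAC"]  # last read has one wrong base
counts = kmer_counts(reads, 4)
print(correct_read("ATGCCTAC", counts, 4))  # ATGCGTAC
```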
Conclusion from Previous Lecture
• Graph theory is a vital tool for solving biological problems
• Wide range of applications, including sequencing, motif finding, protein networks, and many more
DNA Sequencing Timeline
10/21/2013
Generations of Sequences
10/22/2013
High-Throughput Sequencing
• Also referred to as Next-Generation Sequencing
• Parallelizes the sequencing process, producing thousands or millions of sequences concurrently
• Lowers the cost of DNA sequencing beyond what is possible with standard dye-terminator methods
• In ultra-high-throughput sequencing, as many as 500,000 sequencing-by-synthesis operations may be run in parallel
Next Generation Sequencing: Amplified Single Molecule Sequencing
454 Sequencing
454 Sequencing / Pyrosequencing
SOLiD
Sequencing By Ligation
Illumina
Which Next-Gen Sequencer to Choose for your Project?
Human Genome Project
• On Dec. 1, 1999, researchers in the Human Genome Project announced the complete sequencing of the DNA making up human chromosome 22.
• In 2000, the completion of a “working draft” DNA sequence of the human genome was announced.
• Special issues of Nature and Science came out in February 2001 with the complete working draft of the human genome.
Human Genome Project
• International HapMap Project began in 2002.
• Special issue of Nature: Human Genome Collection (2006)
• On June 13, 2013, the U.S. Supreme Court ruled that naturally occurring DNA cannot be patented, but that synthetically created cDNA is patent-eligible.
References
• Simons, Robert W. Advanced Molecular Genetics Course, UCLA (2002).
• Batzoglou, S. Computational Genomics Course, Stanford University (2006). http://ai.stanford.edu/~serafim/CS262_2006/
• Vierstraete, Andy. Next Generation Sequencing, University of Ghent. http://users.ugent.be/~avierstr/nextgen/nextgen.html
Next Time
• Protein Sequencing
• Sections 8.10-8.15