Lecture 14: DNA Sequencing

Author: Dustin Stevens
Study Chapter 8.9

10/18/2013

COMP 465 Fall 2013

DNA Sequencing
• Shear DNA into millions of small fragments
• Read 500–700 nucleotides at a time from the small fragments (Sanger method)

Fragment Assembly
• Assembles the individual overlapping short fragments (reads) into a genomic sequence
• The Shortest Superstring problem from last time is an overly simplified abstraction
• Problems:
  – DNA read error rate of 1% to 3%
  – Can’t separate coding and template strands
  – DNA is full of repeats
• Let’s take a closer look
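
Because a read may come from either strand, an assembler usually has to treat a read and its reverse complement as the same piece of evidence. The following is a minimal sketch of that idea; the function names and the "pick the lexicographically smaller string" convention are illustrative assumptions, not something taken from the lecture.

```python
# Complement table for the four DNA bases (N marks an unknown base).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(read: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return read.translate(_COMPLEMENT)[::-1]


def canonical(read: str) -> str:
    """Pick one fixed representative for a read and its reverse complement.

    Assumption for illustration: the lexicographically smaller string is
    used as the representative, so both strands map to the same key.
    """
    return min(read, reverse_complement(read))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(canonical("TTTT"))
```

With this convention, two reads taken from opposite strands of the same region compare equal after `canonical`, which is one simple way to work around the strand ambiguity.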

Construction of Repeat Graph
• Construction of the repeat graph from k-mers: emulates an SBH (sequencing by hybridization) experiment with a huge (virtual) DNA chip.
• Breaking reads into k-mers: transforms sequencing data into virtual DNA chip data.
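
Breaking reads into k-mers can be sketched in a few lines. This is a minimal illustration, assuming reads are plain strings over A, C, G, T; the names `kmers` and `spectrum` are hypothetical helpers introduced here, and the multiset they build plays the role of the "virtual DNA chip" data.

```python
from collections import Counter
from typing import Iterable, List


def kmers(read: str, k: int) -> List[str]:
    """Return every length-k substring of a read, in left-to-right order."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A read shorter than k yields an empty list.
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def spectrum(reads: Iterable[str], k: int) -> Counter:
    """Count how often each k-mer occurs across all reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


if __name__ == "__main__":
    print(kmers("ATGCG", 3))
    print(spectrum(["ATGCG", "TGCGA"], 3))
```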

Construction of Repeat Graph (cont’d)
• Error correction in reads: a “consensus first” approach to fragment assembly. Makes reads (almost) error-free BEFORE the assembly even starts.
• Using reads and mate-pairs to simplify the repeat graph (Eulerian Superpath Problem).

Approaches to Fragment Assembly
Find a path visiting every VERTEX exactly once in the OVERLAP graph: Hamiltonian path problem

NP-complete: no efficient (polynomial-time) algorithm is known
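
To make the overlap-graph formulation concrete, here is a small sketch that tries every ordering of the reads and accepts the first one in which each consecutive pair overlaps. It is a brute-force illustration only (factorial time, error-free reads, one strand), and the names `overlap`, `hamiltonian_superstring` and the `min_len` parameter are assumptions made for this example.

```python
from itertools import permutations
from typing import Optional, Sequence


def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len characters exists.
    """
    shortest_allowed = max(1, min_len)
    for n in range(min(len(a), len(b)), shortest_allowed - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def hamiltonian_superstring(reads: Sequence[str], min_len: int = 1) -> Optional[str]:
    """Search all read orderings for a Hamiltonian path in the overlap graph.

    Each read is a vertex; an edge a -> b exists when overlap(a, b) > 0.
    Returns the string spelled by the first valid path, or None.
    """
    if not reads:
        return None
    for order in permutations(reads):
        merged = order[0]
        valid = True
        for a, b in zip(order, order[1:]):
            n = overlap(a, b, min_len)
            if n == 0:
                valid = False
                break
            merged += b[n:]  # append only the non-overlapping tail of b
        if valid:
            return merged
    return None


if __name__ == "__main__":
    print(hamiltonian_superstring(["CATT", "ATGC", "GCAT"], min_len=2))
```

The loop over `permutations` is what makes this approach impractical: the number of orderings grows factorially with the number of reads.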

Approaches to Fragment Assembly (cont’d)

Find a path visiting every EDGE exactly once in the REPEAT graph: Eulerian path problem

Linear-time algorithms are known
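
One well-known linear-time method for the Eulerian path problem is Hierholzer's algorithm. The sketch below is an illustrative implementation for directed graphs given as a list of edges; the function names and the tiny example graph are assumptions for this example, not material from the slides.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def eulerian_path(edges: Sequence[Edge]) -> Optional[List[Hashable]]:
    """Return a vertex sequence that uses every directed edge exactly once.

    Returns None when no Eulerian path exists. Runs in time proportional
    to the number of edges (Hierholzer's algorithm).
    """
    if not edges:
        return []
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per vertex
    for u, v in edges:
        graph[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    starts = [n for n, b in balance.items() if b == 1]
    if any(abs(b) > 1 for b in balance.values()) or len(starts) > 1:
        return None
    start = starts[0] if starts else edges[0][0]
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: vertex is finished
    path.reverse()
    # If some edges were unreachable, the graph is not connected enough.
    return path if len(path) == len(edges) + 1 else None


def spell(path: Sequence[str]) -> str:
    """Spell the sequence for a path whose vertices are (k-1)-mers."""
    return path[0] + "".join(v[-1] for v in path[1:])


if __name__ == "__main__":
    example = [("AT", "TG"), ("TG", "GC"), ("GC", "CA"), ("CA", "AT"), ("AT", "TT")]
    print(spell(eulerian_path(example)))
```

Each edge is pushed and popped once, which is where the linear running time comes from.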

Making Repeat Graph Without DNA
• Problem: Construct the repeat graph from a collection of reads.
• Solution: Break the reads into smaller pieces.
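
A common way to turn "smaller pieces" into a graph is the de Bruijn-style construction: each distinct k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix. The sketch below assumes error-free reads from a single strand and treats this graph as a stand-in for the repeat graph; that equivalence, the function name, and the choice to keep one edge per distinct k-mer are assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def de_bruijn_edges(reads: Iterable[str], k: int) -> List[Tuple[str, str]]:
    """Build one edge per distinct k-mer found in the reads.

    The edge for a k-mer goes from its first k-1 characters to its last
    k-1 characters. Edges are returned in order of first appearance.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    seen = set()
    edges = []
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:  # k-mers shared between reads are merged
                seen.add(kmer)
                edges.append((kmer[:-1], kmer[1:]))
    return edges


if __name__ == "__main__":
    for edge in de_bruijn_edges(["ATGCA", "GCATT"], 3):
        print(edge)
```

Keeping only distinct k-mers discards multiplicity, so a k-mer that occurs in several copies of a repeat appears once in this graph; a fuller treatment would have to track how many times each edge should be traversed.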

Repeat Sequences: Emulating a DNA Chip
• A virtual DNA chip allows the biological problem to be solved within the technological constraints.

Repeat Sequences: Emulating a DNA Chip (cont’d)
• Reads are constructed from an original sequence in lengths that allow biologists a high level of certainty.
• They are then broken again to allow the technology to sequence each within a reasonable array.

Minimizing Errors
• If an error exists in one of the 20-mer reads, the error will be perpetuated among all of the smaller pieces broken from that read.

Minimizing Errors (cont’d)
• However, that error will not be present in the other instances of the 20-mer read.
• So it is possible to eliminate most point mutation errors before reconstructing the original sequence.
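
The idea that an error shows up in only one copy of a read can be sketched with k-mer counts: k-mers seen in several reads are trusted, and a read containing rare k-mers is repaired by the single-base substitution that makes all of its k-mers trusted. This is a simplified illustration, not the lecture's algorithm; the names `kmer_counts`, `is_solid`, `correct_read` and the threshold `min_count=2` are assumptions.

```python
from collections import Counter
from typing import Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(read[i:i + k] for i in range(len(read) - k + 1))
    return counts


def is_solid(read: str, counts: Counter, k: int, min_count: int) -> bool:
    """True when every k-mer of the read was seen at least min_count times."""
    return all(counts[read[i:i + k]] >= min_count
               for i in range(len(read) - k + 1))


def correct_read(read: str, counts: Counter, k: int, min_count: int = 2) -> str:
    """Try to fix one point error by a single-base substitution.

    Returns the first single-substitution variant whose k-mers are all
    solid, or the read unchanged if it is already solid or no fix exists.
    """
    if is_solid(read, counts, k, min_count):
        return read
    for i, base in enumerate(read):
        for alt in "ACGT":
            if alt != base:
                candidate = read[:i] + alt + read[i + 1:]
                if is_solid(candidate, counts, k, min_count):
                    return candidate
    return read


if __name__ == "__main__":
    reads = ["ATGCATT"] * 3 + ["ATGGATT"]  # last read has one point error
    counts = kmer_counts(reads, 4)
    print(correct_read("ATGGATT", counts, 4))
```

Running the correction before assembly is what the slides call "consensus first": the graph is built only after rare, probably erroneous k-mers have been removed.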

Conclusion from Previous Lecture
• Graph theory is a vital tool for solving biological problems
• Wide range of applications, including sequencing, motif finding, protein networks, and many more

DNA Sequencing Timeline

Generations of Sequences

High-Throughput Sequencing
• Also referred to as Next-Generation Sequencing
• Parallelizes the sequencing process, producing thousands or millions of sequences concurrently
• Lowers the cost of DNA sequencing beyond what is possible with standard dye-terminator methods
• In ultra-high-throughput sequencing, as many as 500,000 sequencing-by-synthesis operations may be run in parallel

Next Generation Sequencing: Amplified Single Molecule Sequencing

454 Sequencing

454 Sequencing / Pyrosequencing

SOLiD

Sequencing By Ligation

Illumina

Which Next-Gen Sequencer to Choose for your Project?

Mouse Genomes Project
• http://www.sanger.ac.uk/cgibin/modelorgs/mousegenomes/lookseq/index.pl?show=8:101738730101738871,paired_pileup&lane=C3H_HeJ.bam&width=900&win=141&display=|perfect|single|inversions|pairlinks|potsnps|uniqueness|gc|coverage|orientation|annotation|gc|coverage|&maxdist=1000

Sequence Comparisons

Human Genome Project
• On Dec. 1, 1999, researchers in the Human Genome Project announced the complete sequencing of the DNA making up human chromosome 22.
• In 2000, the completion of a “working draft” DNA sequence of the human genome was announced.
• Special issues of Nature and Science came out in February 2001 with the complete working draft of the human genome.

Human Genome Project (cont’d)
• The International HapMap Project began in 2002.
• Special issue of Nature: Human Genome Collection (2006)
• On June 13, 2013, the U.S. Supreme Court ruled that naturally occurring DNA cannot be patented, but that synthetically created cDNA is patent-eligible.

References
• Simons, Robert W. Advanced Molecular Genetics Course, UCLA (2002).
• Batzoglou, S. Computational Genomics Course, Stanford University (2006). http://ai.stanford.edu/~serafim/CS262_2006/
• Vierstraete, Andy. Next Generation Sequencing, University of Ghent. http://users.ugent.be/~avierstr/nextgen/nextgen.html

Next Time
• Protein Sequencing
• Sections 8.10-8.15
