1/27/13
Whole genome sequencing as typing tool
Birgitta Duim
Utrecht University, Faculty of Veterinary Medicine
Bacterial Whole Genome Sequencing
§ The genomic era began in 1995 with Sanger sequencing of Haemophilus influenzae
§ 2005: first high-throughput (next-generation) sequencers
  - Platforms for sequencing DNA libraries of clonally amplified templates
  - Single-molecule sequencing platforms
§ Variation in performance, read length, error rate, cost and run time
Next-generation technologies
Generation | Chemistry | Platform
Second | Pyrosequencing | Roche 454 GS FLX Titanium, 454 GS Junior
Second | Reversible dye terminator (sequencing by synthesis) | Illumina HiSeq, MiSeq
Second | Ligation | AB SOLiD
Third | Semiconductor | Ion Torrent Proton, PGM
Third | Direct detection | Pacific Biosciences
Suzuki et al., PLoS One 2011; Dunne et al., Eur J Clin Microbiol Infect Dis 2012
Next-generation sequencing platforms
Three-stage workflow: library preparation, template amplification, DNA sequencing
Library preparation and template amplification
Shotgun sequencing
Tagmentation: Tn5 transposase + adaptors
• Mate-pair sequencing (3 kb, 6 kb or 20 kb fragments joined to form circular molecules)
• Paired-end sequencing
Loman et al., Nature Reviews, 2012
Amplification and sequencing chemistry
Platform | Template amplification | Sequencing chemistry
Illumina | Solid-surface flow cell | Solexa chemistry: 3′-O-azidomethyl reversible terminator
Roche 454 | Solid beads, emulsion PCR | Pyrosequencing
Ion Torrent | Ion Sphere Particles, emulsion PCR, silicon chip | Proton detection
PacBio | None (single-molecule) | Real-time incorporation of dye-labeled nucleotides by a φ29-derived DNA polymerase
Loman et al., 2012
Benchtop high-throughput sequencers
454 GS Junior (© Roche)
  Technology: lower set-up, emulsion PCR, pyrosequencing
  Pros: long read lengths
  Cons: hands-on time, high reagent costs, high error rate in homopolymers
Ion Torrent PGM (© Life Technologies)
  Technology: emulsion PCR, native dNTP chemistry, silicon chip for H+ ion detection, bidirectional sequencing
  Pros: short run time, flexible chip
  Cons: hands-on time, high error rate in homopolymers
MiSeq (© Illumina)
  Technology: Solexa sequencing-by-synthesis, smaller flow cell, reduced imaging time, faster microfluidics
  Pros: cost-effectiveness, short run time, hands-on time
  Cons: short read length
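A toy flow-space model shows why pyrosequencing and Ion Torrent struggle with homopolymers. Nucleotides are flowed in a fixed cyclic order, and each flow incorporates an entire homopolymer run, so the signal has to encode the run length rather than a single base. The sketch assumes an idealised signal and a simple cyclic TACG flow order; real instruments use their own flow orders, noise models and calibration.

```python
def flowgram(seq: str, flow_order: str = "TACG") -> list[int]:
    """Ideal flow signals: number of bases incorporated at each nucleotide flow."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases not in the flow order")
    signals, i, flow = [], 0, 0
    while i < len(seq):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # One flow extends through the whole homopolymer run of this base.
        while i < len(seq) and seq[i] == base:
            run += 1
            i += 1
        signals.append(run)
        flow += 1
    return signals


def call_bases(signals: list[float], flow_order: str = "TACG") -> str:
    """Decode by rounding each (possibly noisy) signal to the nearest run length."""
    return "".join(flow_order[k % len(flow_order)] * round(s)
                   for k, s in enumerate(signals))


print(flowgram("TTAGGGC"))                    # [2, 1, 0, 3, 0, 0, 1]
print(call_bases([2, 1, 0, 3.4, 0, 0, 1]))    # TTAGGGC
print(call_bases([2, 1, 0, 3.6, 0, 0, 1]))    # TTAGGGGC: a homopolymer indel
```

A small amount of noise on the G flow is enough to turn a 3-mer into a 4-mer, producing exactly the kind of homopolymer-associated indel listed under the cons above.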
Handling Whole Genome Sequence Data
• Demands on the local information technology infrastructure
• Storage/archiving
• Each platform delivers data in a slightly different format
• Two analytical approaches:
  - reads can be aligned to a known reference sequence
  - reads are subjected to de novo assembly
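Because platforms deliver their data in slightly different formats, reads are usually converted to a common format such as FASTQ before analysis. The minimal parser below is a sketch that assumes plain four-line FASTQ records with Phred+33 quality encoding (the Sanger convention, also used by current Illumina software). It decodes each quality character to a Phred score Q, which corresponds to an error probability p = 10^(−Q/10).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    seq: str
    quals: list[int]


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a four-line-per-record FASTQ stream (Phred+33 assumed)."""
    while True:
        header = handle.readline().rstrip()
        if not header:  # end of file
            return
        seq = handle.readline().rstrip()
        plus = handle.readline().rstrip()
        qual = handle.readline().rstrip()
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed record: {header!r}")
        # Phred+33: the quality score is the ASCII code minus 33.
        yield Read(header[1:], seq, [ord(c) - 33 for c in qual])


def error_prob(q: int) -> float:
    """Convert a Phred quality score to the probability of a base-call error."""
    return 10 ** (-q / 10)


for read in parse_fastq(io.StringIO("@read1\nACGT\n+\nII5#\n")):
    print(read.name, read.quals)  # read1 [40, 40, 20, 2]
```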
Alignment to a reference sequence
• Suitable for short sequence reads
• Depends on the availability of a curated database of reference sequences
• Depends on the intended biological application:
  - genomic epidemiology
  - pathogen biology
• Short-read mapping tools
Problems: repetitive regions, and genes absent from the reference genome
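The repeat problem noted above can be reproduced with a toy mapper. The sketch below is a deliberate simplification, not how production short-read mappers such as BWA or Bowtie work internally: it indexes every k-mer of the reference, seeds each read with its first k-mer, and keeps positions where the whole read matches without gaps and with at most a few mismatches. A read drawn from a repeated region returns several equally good positions, so its true origin is ambiguous.

```python
from collections import defaultdict


def build_index(ref: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def map_read(read: str, ref: str, index: dict[str, list[int]], k: int,
             max_mismatches: int = 1) -> list[int]:
    """Return all positions where the read aligns ungapped within the mismatch limit."""
    hits = []
    # Seed with the read's first k-mer (a read with an error in the seed is missed).
    for pos in index.get(read[:k], []):
        window = ref[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


# Toy reference in which the segment GGCATCGA occurs twice.
ref = "TTGACCA" + "GGCATCGA" + "TCTTAG" + "GGCATCGA" + "CAAT"
idx = build_index(ref, 4)
print(map_read("GGCATCGA", ref, idx, 4))  # [7, 21]: ambiguous, repeat
print(map_read("ACCAGG", ref, idx, 4))    # [3]: unique placement
```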
De novo assembly
• For a new strain of a known species/pathogen
• Assemblers: Velvet, MIRA, Newbler (Roche) (Loman et al., Nature Reviews, 2012)
• Significant impact of sequencing errors:
  - Resequencing/increasing the depth of coverage
  - For homopolymeric tracts, combining data from different platforms can be necessary (PacBio + HiSeq) (Bashir et al., Nature Biotech, 2012)
• Gap closing with Sanger sequencing
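Velvet is built on a de Bruijn graph, in which reads are cut into k-mers and k-mers sharing a (k−1)-base overlap are linked. The toy sketch below (exact k-mers, a single strand, no error correction, and far simpler than Velvet itself) builds such a graph and walks unbranched paths into contigs. Adding one read that carries a single substitution shows why sequencing errors matter: the erroneous k-mers create a branch that splits one contig into three.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def contigs(graph: dict[str, set[str]]) -> list[str]:
    """Spell out maximal unbranched paths (isolated cycles are ignored in this toy)."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def is_simple(n: str) -> bool:
        # Exactly one way in and one way out: the path can continue unambiguously.
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    result = []
    for start in nodes:
        if is_simple(start):
            continue
        for nxt in graph.get(start, ()):
            path = start + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            result.append(path)
    return sorted(result)


reads = ["ACGTTGCA", "TTGCATGT", "CATGTC"]
print(contigs(de_bruijn(reads, 4)))                # ['ACGTTGCATGTC']
print(contigs(de_bruijn(reads + ["TGCAAGT"], 4)))  # ['ACGTTGCA', 'GCAAGT', 'GCATGTC']
```

Real assemblers clean up such error-induced branches; Velvet, for example, removes short dead-end "tips", and higher coverage makes it easier to tell the true path from the erroneous one.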
From molecular to genomic typing
• Next-generation sequencing platforms: massive sequence output
• Reduced cost per base
• Small size of microbial genomes
Detection of:
• multiple resistance determinants
• virulence factors
• epidemiological markers
July 20, 2011
• Outbreak of Shiga-toxin-producing E. coli O104:H4
• In Germany, 830 cases of hemolytic uremic syndrome (HUS) and 46 deaths since May 2011
• Sequencing with Ion Torrent PGM and optical mapping
• Phylogenetic analyses of 1144 core E. coli genes
• The outbreak strain was an EAEC/EHEC hybrid that had acquired a fimbrial gene cluster and CTX-M-15
• Rapid data release and use of crowdsourcing
Rohde et al., N Engl J Med, 2011
E. coli O104:H4 for platform comparison: 454 GS Junior, Ion Torrent PGM, MiSeq
Loman et al., Nature Biotech 2012
• De novo assembly with Velvet and MIRA
• Different coverage of plasmid sequences
• All assemblies mapped to 95% of the reference genome
• None of the assemblies recovered full-length sequences of a multi-domain protease present in multiple copies
• MiSeq: highest throughput per run, lowest error rate
• 454 GS Junior: longest reads, lowest throughput
• Homopolymer-associated indel errors: Ion Torrent PGM (1.5/100 bp) and 454 GS Junior (0.38/100 bp)
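A figure such as "mapped to 95% of the reference genome" is typically obtained by aligning the contigs to the reference, merging the overlapping aligned intervals, and dividing the covered bases by the reference length. The sketch below assumes the alignments are already available as hypothetical 0-based, half-open (start, end) intervals on the reference; the numbers in the example are invented.

```python
def fraction_covered(intervals: list[tuple[int, int]], ref_length: int) -> float:
    """Fraction of reference bases covered by at least one aligned interval."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: bank it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_length


# Hypothetical contig alignments on a 1,000 bp reference.
print(fraction_covered([(0, 400), (350, 600), (700, 950)], 1000))  # 0.85
```

Collapsed repeats, such as multi-copy genes, still count as "covered" at one copy in this kind of summary, which is why a high coverage figure can coexist with the missing full-length protease copies noted above.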
Whole genome sequencing for outbreak analysis: Staphylococcus aureus
Köser et al., NEJM, 2012; Harris et al., The Lancet, 2012
• MiSeq 150 bp paired-end reads
• MLST
• Resistance genes (resistome)
• Toxin genes (toxome)
• Identification of SNPs and indels in the core genome
Epidemiologically directed
WGS for outbreak analysis: Mycobacterium tuberculosis
Gardy et al., NEJM, 2011
• Higher resolution of WGS than MIRU-VNTR (Schürch, Infect Genet Evol 2012)
• Step-wise accumulation of SNPs between transmitting individuals
Walker et al., Lancet Infect Dis, 2012: HiSeq analysis of 390 strains
  - Within-host diversity < 5 SNPs per patient
  - Isolates differing by > 12 SNPs belonged to another cluster
  - Super-spreading in 2 community clusters
Quantitative data to define the number of SNPs accumulated during infection and to trace transmission
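SNP thresholds like these are applied to pairwise distances between isolates. The sketch below counts SNPs between aligned genome sequences, skipping positions uncalled ('N' or '-') in either isolate, and then groups isolates by single linkage: two isolates join the same cluster if they are within a chosen threshold. Using the slide's 12-SNP figure as a single-linkage cutoff is an assumption made here for illustration, not the published method, and the sequences in the example are toy data.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions, skipping positions uncalled ('N' or '-') in either."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b) if x not in "N-" and y not in "N-")


def cluster(isolates: dict[str, str], threshold: int) -> list[list[str]]:
    """Single-linkage clusters: isolates are linked if within `threshold` SNPs."""
    names = list(isolates)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if snp_distance(isolates[a], isolates[b]) <= threshold:
            parent[find(a)] = find(b)
    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy aligned sequences (hypothetical patients p1-p4).
iso = {"p1": "A" * 30, "p2": "CC" + "A" * 28,
       "p3": "G" * 20 + "A" * 10, "p4": "G" * 20 + "A" * 9 + "N"}
print(cluster(iso, 12))  # [['p1', 'p2'], ['p3', 'p4']]
```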
§ Paired-end reads on HiSeq; 200 strains of S. Typhimurium, E. coli, E. faecalis and E. faecium
§ Comparison with EUCAST cut-off values for antimicrobial susceptibility testing and with MLST typing
§ Web servers ResFinder and MLST (www.genomicepidemiology.org)
§ Curated database for identification of resistance genes in WGS data at 98% identity
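The identity threshold mentioned above acts as a filter on alignments between assembled sequence and curated reference genes. The sketch below computes percent identity from a pair of aligned strings and keeps genes whose hits reach 98% identity. The additional minimum fraction of the reference gene covered by the alignment (0.6 here) is a hypothetical parameter added for illustration, and the hit values in the example are invented.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str
    identity: float      # percent identity of the alignment
    aligned_length: int  # alignment length on the reference gene
    gene_length: int     # full length of the reference gene


def percent_identity(query_aln: str, ref_aln: str) -> float:
    """Percent identity over aligned columns; gap columns count as mismatches."""
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned strings must have equal length")
    matches = sum(q == r and q != "-" for q, r in zip(query_aln, ref_aln))
    return 100 * matches / len(ref_aln)


def resistance_calls(hits: list[Hit], min_identity: float = 98.0,
                     min_coverage: float = 0.6) -> list[str]:
    """Genes with at least one hit passing both thresholds (min_coverage is hypothetical)."""
    return sorted({h.gene for h in hits
                   if h.identity >= min_identity
                   and h.aligned_length / h.gene_length >= min_coverage})


print(percent_identity("ACG-T", "ACGAT"))  # 80.0
hits = [Hit("blaCTX-M-15", 100.0, 876, 876),  # invented example values
        Hit("tet(A)", 95.2, 1200, 1200),
        Hit("sul1", 99.1, 300, 840)]
print(resistance_calls(hits))  # ['blaCTX-M-15']
```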
J Clin Microbiol. 2013 Jan 23. [Epub ahead of print] A Genomic Day in the Life of a Clinical Microbiology Laboratory.
Long SW et al., Dept. Pathology and Genomic Medicine, The Methodist Hospital, Houston, USA.
One day of microbiology:
• DNA extraction and library preparation of 130 samples
• 96 samples in one MiSeq run (paired-end 2 × 250 bp reads)
• Second run for fungi and yeast
• Velvet assembly and NCBI BLAST of contigs
Results:
• 6.3-fold coverage per organism
• 88.5% matched MALDI-TOF data
Conclusions:
• Low coverage: sample and library preparation need optimization
• For 10 species, no reference genome sequences were available
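Mean coverage is simply total sequenced bases divided by genome size, c = N·L/G. Under the idealised Lander–Waterman (Poisson) model, the expected fraction of the genome receiving no reads at all is e^(−c). The sketch below computes both; the read count and genome size in the example are hypothetical, chosen only to produce 6.3-fold coverage.

```python
import math


def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean coverage c = N * L / G."""
    return n_reads * read_length / genome_size


def fraction_uncovered(depth: float) -> float:
    """Lander-Waterman / Poisson estimate of the fraction of bases with zero reads."""
    return math.exp(-depth)


# Hypothetical: 63,000 reads of 250 bp on a 2.5 Mb genome.
c = mean_depth(63_000, 250, 2_500_000)
print(round(c, 1))                      # 6.3
print(round(fraction_uncovered(c), 5))  # 0.00184
```

Real coverage is far from uniform, so actual gaps exceed this idealised estimate, and de novo assemblers typically need considerably higher depth than this to produce contiguous assemblies.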
Future of S. aureus characterization
Price et al., CMI, Nov 2012
Requirements of WGS for epidemiological typing
• Standardization of sample and library preparation
• Validation of sequence generation
• Interpretation:
  - Define which SNPs to include in outbreak sequencing
  - Use of core genome (lacks information on horizontal gene transfer)
  - Automated annotation
  - Curated sequence databases
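One common way to define which SNPs to include is to restrict the analysis to the core genome: alignment columns called in every isolate, and of those, only the variable ones. The sketch below assumes a whole-genome alignment is already available as equal-length strings with 'N' or '-' marking uncalled positions; the example alignment is a toy. Because anything missing from even one isolate is dropped, accessory elements acquired by horizontal gene transfer are excluded, which is the limitation noted above.

```python
def core_snp_positions(alignment: dict[str, str]) -> list[int]:
    """Columns called in every isolate (no 'N' or '-') that are polymorphic."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        column = {s[i] for s in seqs}
        if column & set("N-"):
            continue  # missing in at least one isolate: not core
        if len(column) > 1:
            positions.append(i)  # called everywhere and variable: core SNP
    return positions


aln = {"a": "ACGTA", "b": "ACTTA", "c": "AC-TG"}
print(core_snp_positions(aln))  # [4]
```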
Concluding remark
Now that the cost and complexity of sequencing platforms are declining and curated databases of resistance and virulence markers are becoming available, WGS will become a standard tool for studying the molecular epidemiology of microorganisms.