Demo: Analysing Whole Exome Sequencing Data

Author: Osborn Lynch
Patrick May
Bioinformatics Core (LCSB/Luxembourg)
Family Genomics Group (ISB/Seattle)

GEF4 2014 26/08/2014

Outline
•  Whole Exome Sequencing (WES)
•  WES analysis pipeline
•  GATK best practices
•  Variant analysis pipeline
•  Demo: Corpasome

Whole Exome Sequencing (WES)

Why whole exome sequencing?
•  The exome makes up about 2% of the human genome, but contains ~85% of known disease-causing variants
•  WES is a cost-effective alternative to whole-genome sequencing
•  In case/control (GWAS) studies, only common variants can be detected, and only with large sample sizes
•  WES allows the detection of rare variants

Rare and common variants

WES analysis (DEMO)
Exome Sequencing → Read QC → Read Mapping → Variant Calling → Variant Annotation → Variant Filtering → Validation

Foo et al. 2012, Nature Reviews Neurology
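The Read QC step can be illustrated with the sketch below. It computes the mean base quality at each read position, which is the quantity behind FastQC's "per base sequence quality" plot. This is a simplified stand-in, not FastQC's implementation. It assumes FASTQ records of exactly four lines with Phred+33 quality encoding (Illumina 1.8+). The function names are hypothetical.

```python
from io import StringIO
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (read id, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def mean_quality_per_position(records: Iterable[Record], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset  # decode the Phred+33 character
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Usage with two toy reads ('I' = Q40, '#' = Q2, '5' = Q20)
toy = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n#I5\n")
print(mean_quality_per_position(read_fastq(toy)))  # [21.0, 40.0, 30.0, 40.0]
```

A real QC run would also look at per-read quality, GC content and adapter content. Those checks are outside this sketch.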

WES analysis: Recessive Mutations (DEMO)
Exome Sequencing → Read QC → Read Mapping → Variant Calling → Variant Annotation → Variant Filtering → Validation

Chahrour et al. 2012, PLOS Genetics

GATK Best Practices http://www.broadinstitute.org/gatk/guide/best-practices DNAseq

GATK Best Practices - DNAseq

DEMO GEBI2012 20/08/2012

Variant analysis pipeline

Variant Analysis Pipeline
•  Pedigree + union of variants (VCF / testvariants) → table of variants
•  Mode of inheritance (recessive, de novo, dominant, X-linked) → candidate variants
•  Annotate genes (ANNOVAR) → protein-coding + ncRNA variants
•  Filter common variants (1000 Genomes, EVS, dbSNP) → novel variants
•  Rank variants (SIFT, MutationTaster, prior genes, expression, ...) → final list of variants
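The first step of the pipeline combines the union of all variants into a single table. A minimal sketch of that step follows. The data layout is an assumption made for illustration: calls are stored as a per-sample dict keyed by (chrom, pos, ref, alt), and genotypes are VCF-style strings. `union_table` is a hypothetical helper, not one of the demo's scripts. A variant with no call in a sample is recorded as `./.` rather than as reference, because a missing call alone does not show that the sample is homozygous reference.

```python
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def union_table(calls: Dict[str, Dict[Variant, str]],
                samples: List[str]) -> Dict[Variant, Dict[str, str]]:
    """Build {variant: {sample: genotype}} over the union of all samples' calls.

    `calls` maps sample name -> {variant: VCF-style genotype such as "0/1"}.
    A variant not called in a sample is filled with "./." (unknown).
    """
    all_variants: Set[Variant] = set()
    for sample in samples:
        all_variants.update(calls.get(sample, {}))  # adds the variant keys
    return {
        var: {s: calls.get(s, {}).get(var, "./.") for s in samples}
        for var in sorted(all_variants)  # sorted by chrom (as text), then position
    }


# Hypothetical toy calls for the trio (positions are made up)
calls = {
    "father": {("chr22", 100, "A", "G"): "0/1"},
    "mother": {("chr22", 100, "A", "G"): "0/1", ("chr22", 250, "C", "T"): "0/1"},
    "daughter": {("chr22", 100, "A", "G"): "1/1"},
}
for variant, genotypes in union_table(calls, ["father", "mother", "daughter"]).items():
    print(variant, genotypes)
```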

Mode of inheritance
•  Diseases can be dominant, recessive or X-linked
•  Compound heterozygosity (recessive)
•  De novo variants (trio)
•  Incomplete penetrance
•  Time of disease onset (modifier)

Dominant: heterozygous in B, C, D; reference in A, E
Recessive: homozygous in B, C, D; heterozygous in A, E
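The two genotype patterns above can be checked mechanically. The sketch below is a minimal version under three assumptions. First, B, C and D are the affected individuals and A and E the unaffected ones, which is what the patterns imply. Second, genotypes are coded as ALT-allele counts (0 = reference, 1 = heterozygous, 2 = homozygous). Third, only the dominant and recessive patterns from the slide are encoded. Compound heterozygosity, de novo events and incomplete penetrance would need additional rules. The names are hypothetical.

```python
from typing import Dict, List

AFFECTED = ("B", "C", "D")   # assumed affected individuals
UNAFFECTED = ("A", "E")      # assumed unaffected individuals

# Expected ALT-allele count (0 = ref, 1 = het, 2 = hom-alt) per group, from the slide
PATTERNS = {
    "dominant": {"affected": 1, "unaffected": 0},
    "recessive": {"affected": 2, "unaffected": 1},
}


def matching_moi(genotypes: Dict[str, int]) -> List[str]:
    """Return the inheritance patterns that a variant's genotypes fit exactly.

    Individuals missing from `genotypes` never match, so incomplete data is rejected.
    """
    hits = []
    for moi, expected in PATTERNS.items():
        affected_ok = all(genotypes.get(p) == expected["affected"] for p in AFFECTED)
        unaffected_ok = all(genotypes.get(p) == expected["unaffected"] for p in UNAFFECTED)
        if affected_ok and unaffected_ok:
            hits.append(moi)
    return hits


print(matching_moi({"A": 1, "B": 2, "C": 2, "D": 2, "E": 1}))  # ['recessive']
print(matching_moi({"A": 0, "B": 1, "C": 1, "D": 1, "E": 0}))  # ['dominant']
print(matching_moi({"A": 0, "B": 1, "C": 0, "D": 1, "E": 0}))  # []
```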

Annotation: ANNOVAR (Wang et al. NAR 2010)
-  Annotation of exonic, intronic, UTR, splice-site, up-/downstream and ncRNA variants
-  Different annotation sets can be used: refGene, ucsc, ensembl, ccds, encode
-  All annotations are combined per variant
-  Filtering according to pre-compiled variant sets, e.g., EVS or 1000 Genomes
-  Searching for variants previously annotated in other databases such as OMIM, GWAS or dbSNP
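Frequency-based filtering against pre-compiled variant sets can be sketched as follows. This is not ANNOVAR's code. `pop_af` is an assumed lookup of population allele frequencies, for example from 1000 Genomes or EVS. Variants absent from it are treated as novel. The 1% cut-off is an illustrative default, not a value from these slides.

```python
from typing import Dict, Iterable, List, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def filter_common(variants: Iterable[Variant],
                  pop_af: Dict[Variant, float],
                  max_af: float = 0.01,
                  novel_only: bool = False) -> List[Variant]:
    """Remove common variants using population allele frequencies.

    Variants missing from `pop_af` are treated as novel and always kept.
    Known variants are kept only if their frequency is <= max_af and
    novel_only is False.
    """
    kept = []
    for v in variants:
        if v not in pop_af:
            kept.append(v)  # not seen in the reference panel: novel
        elif not novel_only and pop_af[v] <= max_af:
            kept.append(v)  # known, but rare enough to keep
    return kept


# Hypothetical frequencies; positions are made up
common = ("chr22", 100, "A", "G")
rare = ("chr22", 250, "C", "T")
novel = ("chr22", 300, "G", "A")
af = {common: 0.20, rare: 0.005}
print(filter_common([common, rare, novel], af))                   # rare and novel kept
print(filter_common([common, rare, novel], af, novel_only=True))  # only novel kept
```

Setting `novel_only=True` corresponds to the "novel variants" output of the pipeline. The default keeps rare known variants as well.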

Ranking: Functional impact of variants
Nonsynonymous SNPs, InDels:
-  exome-wide, pre-calculated datasets for SIFT, PolyPhen-2, MutationTaster, LRT
Conservation:
-  PhyloP, GERP++, SiPhy
Other scores:
-  CADD, GWAVA, SILVA
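A simple way to combine several predictions into one ranking is to count how many tools flag a variant as damaging. The sketch assumes that pre-computed scores are available for each variant. It uses these illustrative cut-offs:
-  SIFT < 0.05 (low means damaging)
-  PolyPhen-2 ≥ 0.85 (high means damaging)
-  CADD phred ≥ 20 (high means damaging)
-  GERP++ RS ≥ 2 (high means damaging)

The thresholds and the voting scheme are assumptions for demonstration, not the ranking used in the demo. Ties are broken by the CADD score, and missing scores do not vote.

```python
from typing import Dict, List, Optional

Scores = Dict[str, Optional[float]]

# (score name, illustrative cut-off, True if higher values mean "more damaging")
PREDICTORS = [
    ("sift", 0.05, False),
    ("polyphen2", 0.85, True),
    ("cadd_phred", 20.0, True),
    ("gerp_rs", 2.0, True),
]


def damaging_votes(scores: Scores) -> int:
    """Count predictors whose score passes its cut-off; missing scores do not vote."""
    votes = 0
    for name, cutoff, higher_is_worse in PREDICTORS:
        value = scores.get(name)
        if value is None:
            continue
        damaging = value >= cutoff if higher_is_worse else value < cutoff
        if damaging:
            votes += 1
    return votes


def rank_variants(variants: Dict[str, Scores]) -> List[str]:
    """Variant IDs from most to least likely damaging (votes, then CADD phred)."""
    return sorted(
        variants,
        key=lambda vid: (damaging_votes(variants[vid]),
                         variants[vid].get("cadd_phred") or 0.0),
        reverse=True,
    )


# Hypothetical scores for three candidate variants
candidates = {
    "v1": {"sift": 0.01, "polyphen2": 0.95, "cadd_phred": 25.0, "gerp_rs": 4.5},
    "v2": {"sift": 0.40, "polyphen2": 0.10, "cadd_phred": 5.0, "gerp_rs": -1.0},
    "v3": {"sift": 0.02, "polyphen2": None, "cadd_phred": 18.0, "gerp_rs": 3.0},
}
print(rank_variants(candidates))  # ['v1', 'v3', 'v2']
```

Prior knowledge (candidate genes, expression) could be added as further key components. It is left out here because the slide gives no scoring rule for it.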

The Corpasome
CORPAS family trio WES data, by Manuel Corpas
http://figshare.com/articles/6_files_with_1GB_per_file/106340

CORPAS family WES trio data
•  Released under a CC-BY license, for license compatibility.
•  “At this point you have permission to use these data in any way you wish as long as you attribute it to the Corpas family.” (from Manuel Corpas' blog)
•  For a more detailed explanation, see: http://manuelcorpas.com/crowdsourcing/

CORPAS family WES trio data
•  FASTQ files from whole exome sequencing of the Corpas family: mother, father, daughter.
•  The data come from 3 human saliva samples.
•  Exome capture was performed using Agilent SureSelect Human All Exon 44.
•  Sequenced using Illumina’s HiSeq technology.

CORPAS family WES trio data
•  Only chr22 data are used
•  We introduced two variants related to schizophrenia:
   –  one de novo mutation, present only in the child
   –  one recessive mutation, inherited by the child from both parents
•  Pedigree/Mode of inheritance (MOI), genotype of each family member:

   MOI         Father   Mother   Daughter
   recessive   0/1      0/1      1/1
   de novo     0/0      0/0      0/1

   (0 = reference allele, 1 = variant allele)

From: Renton & Traynor. Nature Neuroscience, 2013
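The two rows of the genotype table translate directly into a lookup on a trio's genotypes. The sketch below assumes biallelic sites and VCF-style GT strings (`0/1`, `1|1`, ...). Phasing is ignored, and missing calls never match. It works only on hard genotype calls. Dedicated tools such as DeNovoGear use a probabilistic model instead, so this sketch only illustrates the pattern matching. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def alt_count(gt: str) -> Optional[int]:
    """ALT-allele count of a diploid GT string ("0/1", "1|1", ...); None if missing."""
    alleles = gt.replace("|", "/").split("/")  # ignore phasing
    if len(alleles) != 2 or "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


# (father, mother, daughter) ALT-allele counts for each MOI, from the table
TRIO_PATTERNS: Dict[Tuple[int, int, int], str] = {
    (1, 1, 2): "recessive",  # 0/1 x 0/1 -> 1/1
    (0, 0, 1): "de novo",    # 0/0 x 0/0 -> 0/1
}


def classify_trio(father: str, mother: str, child: str) -> Optional[str]:
    """Return the matching MOI label, or None if no pattern fits."""
    key = (alt_count(father), alt_count(mother), alt_count(child))
    return TRIO_PATTERNS.get(key)  # keys containing None never match


print(classify_trio("0/1", "0/1", "1/1"))  # recessive
print(classify_trio("0/0", "0|0", "0/1"))  # de novo
print(classify_trio("0/1", "0/0", "0/1"))  # None (inherited from the father)
```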

DEMO: Corpasome
Your turn!

Demo Tools
•  FastQC
•  BWA
•  SAMtools
•  Picard
•  GATK
•  BEDTools
•  ANNOVAR
•  DeNovoGear
•  vcflib + custom Perl and bash scripts

UEF setup
•  Use SSH (secure shell)
•  Log in to: intron.uef.fi
•  cp -R /home/work/public/WEScourse $HOME
•  cd $HOME/WEScourse

For Visitor-uef users:
•  mkdir $HOME/
•  cp -R /home/work/public/WEScourse $HOME
•  cd $HOME/WEScourse

UEF setup
•  ls  # list directory contents
•  cd corpasome/scripts
•  vi runAnalysis.sh
   –  Change the SOURCE_DIR path to your own path
