Whole-genome association studies of schizophrenia & bipolar disorder
Shaun Purcell
Psychiatric & Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
Broad Institute of Harvard and MIT, Cambridge, MA, USA
[email protected] http://pngu.mgh.harvard.edu/purcell/
Whole genome association studies (WGAS)
Rationale behind WGAS
Study design, quality control & data analysis
“Next generation” analyses
Two ongoing WGAS; schizophrenia, bipolar disorder
Building computational tools: PLINK, etc
Data management
Summary statistics
Population stratification
Association analysis
IBD estimation
Strong evidence for causal role of genes in schizophrenia & bipolar
[Figure: prevalence (%, axis 0 to 70) of SCZ and BP given an affected relative: general population, aunts/uncles, nephews/nieces, grandchildren, half sibs, siblings, parents, children, DZ twins, MZ twins]
From Gottesman 1991 and Tsuang and Faraone 1990
Several decades of linkage and candidate gene studies
Linkage studies inconclusive
Meta-analysis of 32 linkage studies in SCZ (3200 pedigrees, 7500 affecteds). Lewis, Ng, and Levinson, October 2007
Many reported associations but little consistency, e.g. DTNBP1
Mutsuddi, …, Daly, Scolnick, Sklar AJHG 2006
Gene finding prediction circa 1987
Eight to ten moderate-sized pedigrees are sufficient to perform definitive mapping … …affected sib-pair method will require 25 pairs…
Rare disease, major gene effect
Genotype          AA      AG      GG
Risk of disease   0.001   0.001   0.95
Disease prevalence: ~1 in 1000
Individuals with GG are ~1000 times more likely to get disease
Frequency of G in controls: ~5%
Frequency of G in cases: ~96%
Common disease, polygenic effects
Genotype          AA     AG      GG
Risk of disease   0.01   0.012   0.0144
Disease prevalence: ~1 in 100
Each extra G allele increases risk by ~1.2 times
Frequency of G in controls: ~5%
Frequency of G in cases: ~6%
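The case and control allele frequencies for the polygenic example follow from the genotype risks. The sketch below is one way to work this through, assuming Hardy-Weinberg proportions and a population G frequency of 5% (both assumptions, not stated on the slide); the function name `allele_freqs` is illustrative.

```python
def allele_freqs(p, risks):
    """Return (prevalence, G frequency in cases, G frequency in controls).

    p     : population frequency of the G allele (assumed value)
    risks : disease risk for genotypes (AA, AG, GG)
    Genotype frequencies are assumed to follow Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    geno_freq = [q * q, 2 * p * q, p * p]      # AA, AG, GG
    g_fraction = [0.0, 0.5, 1.0]               # share of G alleles per genotype
    prevalence = sum(f * r for f, r in zip(geno_freq, risks))
    g_cases = sum(f * r * g for f, r, g in zip(geno_freq, risks, g_fraction)) / prevalence
    g_controls = sum(
        f * (1 - r) * g for f, r, g in zip(geno_freq, risks, g_fraction)
    ) / (1 - prevalence)
    return prevalence, g_cases, g_controls


# Common disease, polygenic effects: risks 0.01, 0.012, 0.0144
prevalence, g_cases, g_controls = allele_freqs(0.05, (0.01, 0.012, 0.0144))
print(prevalence, g_cases, g_controls)
```

Under these assumptions the prevalence comes out close to 1 in 100 and the G frequency rises from about 5% in controls to about 6% in cases, which is the small shift a WGAS has to detect.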
Whole-genome association studies (WGAS, GWAS)
Genotype matrix: ~500K+ Single Nucleotide Polymorphisms (one SNP per row) by ~1000s individuals (one individual per column)
AA AG AA GG AG AA ..
AC AC AC AA AC CC ..
CT CC CC CT CC CC ..
GG GG GG AG GG GG ..
TT AT AA AT AA AT ..
CT TT CT TT CT CC ..
.. .. .. .. .. .. ..
Bird’s eye view of a WGAS
[Figure: association, -log10(p), across chromosomes]
Some success stories…
10Mar2005
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-678 The Wellcome Trust Case Control Consortium
A WGAS of bipolar disorder
Sample: case/control N
US: 955 / 1498
UK: 506 / 510
Combine WGAS data
Replicate ~150 SNPs: Edinburgh 273 / 313; US 409 trios
WTCCC: 1868 / 2943
Combine WGAS data
Extension samples: 960 / 473
>4200 cases, >5200 controls and counting…
Pamela Sklar, Hugh Gurling, Nick Craddock, Douglas Blackwood, Jordan Smoller, Ed Scolnick
Yan Meng, Manuel Ferreira
QC is central: looking at the raw data
A good-looking SNP
Not so pretty…
Pairwise allele-sharing metric
[Figure labels: Reference; Same population; Different population]
Hierarchical clustering
Multidimensional scaling/PCA
[Figure: Han Chinese and Japanese samples]
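A pairwise allele-sharing metric can be illustrated with identity-by-state (IBS) sharing. The following is a minimal sketch, assuming genotypes coded as minor-allele counts (0/1/2, `None` for missing); the function names and toy data are hypothetical and this is not PLINK's implementation. The resulting distance matrix is the kind of input that hierarchical clustering or multidimensional scaling would take.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Proportion of alleles shared identical-by-state over non-missing SNPs."""
    shared = 0
    total = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue                      # skip SNPs missing in either individual
        shared += 2 - abs(a - b)          # 2, 1 or 0 alleles shared at this SNP
        total += 2
    return shared / total if total else float("nan")


def ibs_distance_matrix(genotypes):
    """Return a nested dict of pairwise distances (1 - IBS similarity)."""
    ids = list(genotypes)
    dist = {i: {j: 0.0 for j in ids} for i in ids}
    for i, j in combinations(ids, 2):
        d = 1.0 - ibs_similarity(genotypes[i], genotypes[j])
        dist[i][j] = dist[j][i] = d
    return dist


# Toy data (invented): ind1 and ind2 are similar, ind3 is more distant
toy = {
    "ind1": [0, 1, 2, 0, 1, 2],
    "ind2": [0, 1, 2, 0, 2, 2],
    "ind3": [2, 1, 0, 2, 0, None],
}
dist = ibs_distance_matrix(toy)
print(dist["ind1"]["ind2"], dist["ind1"]["ind3"])
```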
Population differences in BP study
[Figure panels: CONTROLS and CASES; USA (STEP-BD) and UK (UCL)]
Evaluated via permutation that, within each site, the average case is as similar to the average control as to another case
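One way such a within-site permutation check could look is sketched below. It assumes a precomputed pairwise similarity matrix (for example IBS sharing) and uses an illustrative statistic, mean case-case similarity minus mean case-control similarity; the exact statistic used in the study is not given on the slide, and all names here are hypothetical.

```python
import random


def mean_pair_stat(sim, labels):
    """Mean case-case similarity minus mean case-control similarity."""
    case_case, case_control = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == 1 and labels[j] == 1:
                case_case.append(sim[i][j])
            elif labels[i] != labels[j]:
                case_control.append(sim[i][j])
    return sum(case_case) / len(case_case) - sum(case_control) / len(case_control)


def permutation_p(sim, labels, n_perm=1000, seed=0):
    """Two-sided empirical p-value from shuffling case/control labels."""
    rng = random.Random(seed)
    observed = mean_pair_stat(sim, labels)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)                 # keeps the numbers of cases and controls fixed
        if abs(mean_pair_stat(sim, perm)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy site (invented): cases (1) and controls (0) form two distinct clusters
n = 8
labels = [1, 1, 1, 1, 0, 0, 0, 0]
sim = [[0.9 if (i < 4) == (j < 4) else 0.7 for j in range(n)] for i in range(n)]
observed, p_value = permutation_p(sim, labels)
print(observed, p_value)
```

In this toy example cases resemble each other more than they resemble controls, so the empirical p-value is small; in a well-matched site the statistic would sit near zero.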
Bipolar study QC data flow
500,568 SNPs on two chips
BRLMM genotype calling
Individuals 4,139 Low genotyping rate (=5%, including multimarker tests) Expected (-logP)
Initial results of primary BP scan
Best single SNP p = 1.7e-7 in Myosin 5b (MYO5B)
No signal in (modest) replication samples
A number of other “interesting” genes, but nothing that replicates and/or provides an unambiguous “genome-wide significant hit”
No replication of “top hits” in two previous BP WGAS
Ad hoc comparison of results across studies
WTCCC, Nature, 2007; PALB2, rs420259
Baum et al. Molecular Psychiatry, 2007; DGKH
multiple hits near DFNB31 (chr 9) but appear to be independent
calcium channel gene (CACNA1C) gives strongest signal seen in both our study & WTCCC
The next step? A bigger dataset: more individuals, more SNPs
Imputing ungenotyped SNPs
your observed genotype data + a reference panel (HapMap) + haplotype phase estimation = “imputation”
Increase coverage
Facilitate merging of data/results across different WGAS platforms
More sophisticated QC procedures
can test a SNP without looking at a single genotype for that SNP
Imputing untyped HapMap SNPs
Reference chromosomes
… A C ? G T C C G C T A T C G A …
… A C G C C T A A C T A T A ? A …
… G T G C C T C G C T A C C G T …
Observed SNP data
… ? C G ? C ? C ? T ? A ? ? ? ? ? C ? ? …
… ? T ? ? ? ? A ? ? ? ? ? C ? ? …
… ? C ? ? ? ? C ? ? ? ? ? A ? ? …
Note! Doesn’t actually work exactly like this. Different models, handles phase uncertainty, etc.
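The cartoon version of the idea can be written as a toy: pick the reference haplotype that best matches the typed sites and copy its alleles into the untyped positions. This is only a sketch of the intuition, with invented sequences and a hypothetical `impute` function; as noted above, real methods such as MACH1 and PLINK use different models and handle phase uncertainty.

```python
def impute(observed, reference):
    """Fill '?' sites from the reference haplotype closest at the typed sites.

    observed  : string of alleles with '?' at untyped positions
    reference : list of fully typed haplotype strings of the same length
    """
    def mismatches(ref):
        # Count disagreements at sites that were actually genotyped
        return sum(1 for o, r in zip(observed, ref) if o != "?" and o != r)

    best = min(reference, key=mismatches)
    return "".join(r if o == "?" else o for o, r in zip(observed, best))


# Invented toy reference panel and one observed haplotype with two typed sites
reference = ["ACGTCCGA", "ACCTAACT", "GTCTCGCA"]
observed = "?C??A???"
imputed = impute(observed, reference)
print(imputed)
```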
Imputation analysis of DISC1
[Figure: two panels of -log10(p) (0 to 5) against physical position (bp), 228000000 to 228600000: observed genotypes, and with imputation]
84 SNPs on 500K product → 397 HapMap SNPs imputed with high confidence (using two different algorithms, MACH1 and PLINK)
Data-sharing and meta-analysis
Need to increase N individuals as well as N SNPs…
GAIN
WTCCC
Many other individual studies
Effort to combine all psychiatric WGAS (Patrick Sullivan et al)
dbGAP
Adding WTCCC and new BP data: rs1006737 (CACNA1C)
Sample              Cases (n)   Controls (n)   Case freq   Cont freq   P value    OR
STEP-BD/UCL         1461        2003           0.357       0.315       3.02E-04   1.21
WTCCC               1868        2943           0.359       0.324       5.28E-04   1.17
EXTENSION SAMPLES   960         473            0.346       0.293       4.00E-03   1.28
ALL                 4289        5419           0.354       0.318       1.88E-08   1.19
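The per-sample odds ratios in this table can be recovered from the allele frequencies, and the three samples can be pooled with a standard fixed-effect (inverse-variance) meta-analysis. The sketch below assumes an allelic odds ratio and allele counts of 2N per group; the slide does not say how its P values and combined OR were computed, so the pooled result here is only expected to be close to the "ALL" row, not identical. Function names are illustrative.

```python
import math


def allelic_or(case_freq, control_freq):
    """Allelic odds ratio from case and control allele frequencies."""
    return (case_freq / (1 - case_freq)) / (control_freq / (1 - control_freq))


def log_or_se(n_cases, n_controls, case_freq, control_freq):
    """Standard error of log(OR) from the implied 2x2 allele-count table."""
    a = 2 * n_cases * case_freq
    b = 2 * n_cases * (1 - case_freq)
    c = 2 * n_controls * control_freq
    d = 2 * n_controls * (1 - control_freq)
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def fixed_effect_meta(studies):
    """Inverse-variance weighted pooled OR and two-sided p-value."""
    num = den = 0.0
    for n_ca, n_co, f_ca, f_co in studies:
        weight = 1.0 / log_or_se(n_ca, n_co, f_ca, f_co) ** 2
        num += weight * math.log(allelic_or(f_ca, f_co))
        den += weight
    beta = num / den
    z = beta * math.sqrt(den)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return math.exp(beta), p


# (cases, controls, case freq, control freq) from the table rows
studies = [
    (1461, 2003, 0.357, 0.315),   # STEP-BD/UCL
    (1868, 2943, 0.359, 0.324),   # WTCCC
    (960, 473, 0.346, 0.293),     # Extension samples
]
per_study = [round(allelic_or(f_ca, f_co), 2) for _, _, f_ca, f_co in studies]
combined_or, combined_p = fixed_effect_meta(studies)
print(per_study, combined_or, combined_p)
```

The point the table makes survives the approximation: three modest signals in the same direction combine into one that passes a genome-wide threshold.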
The International Schizophrenia Consortium
David St. Clair Noelle Kuan
Michael Gill Aiden Corvin Derek Morris
Pamela Sklar Shaun Purcell Jennifer Stone
Edward Scolnick Pamela Sklar Shaun Purcell Jennifer Stone Kimberly Chambert
Douglas Blackwood Walter Muir Kevin McGhee
Michael Owen Michael O’Donovan Nick Craddock George Kirov
Hugh Gurling Andrew McQuillin
Carlos Pato Michele Pato
Patrick Sullivan
Paul Lichtenstein Christina Hultman
Sample Distribution
Institution   Pop. Ethnicity   # Cases   # Controls
USC/MGH       Portuguese       397       232
Karolinska    Swedish          582       401
UCL           UK/Irish         600       600
Aberdeen      Scottish         800       962
Dublin        Irish            301       1000
Edinburgh     Scottish         484       384
Cardiff       Bulgarian        501       706
Total                          3665      4285
Affymetrix Chip Evolution
500K 2-chip (Classic Chip)
500K 1-chip (5.0 Chip)
1 Million (6.0 Chip)
500K SNPs
400K CNV
900K SNPs 900K CNV
UCL Controls: 600 Control Samples
UCL Cases, Portuguese, Swedes I, Aberdeen: 3296 samples
Edinburgh, Cardiff, Swedes II, Dublin: 3888 samples
Challenges combining across platforms
[Figure: MDS plot, C1 vs. C2, all SNPs, showing “stratification” (red cluster). UK (red) (different chip to rest), UK (green), Swedish (purple), Portuguese (blue), Scottish (yellow). Legend: Bipolar, Aberdeen, UCL, Portuguese, Sweden 1]
[Figure: MDS plot, C1 vs. C2: QC passing SNPs with 100% genotyping and MAF > 0.1. Legend: Bipolar, Aberdeen, UCL, Portuguese, Sweden]
Geographic Locations of Sample Collection
Sweden; Scotland; Bulgaria; Ireland; Fall River, MA; England; Azores; Madeira; Portugal
Evaluation of population stratification within sample sites
[Figure: MDS plots, C1 vs C2, one panel per site: Aberdeen, Sweden I, Sweden II, Portuguese, Cardiff, Edinburgh, Dublin. Gray = control samples; color = case samples]
Interim Association Analysis Results
[Figure: -log(p-value), axis 0 to 8, by chromosome (1 to 22, X)]
3214 cases/2904 controls
695655 SNPs
UCL control and Dublin samples not included in analysis
QQ plot : Interim Results
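The points of a QQ plot pair each observed -log10(p) with the value expected if all p-values were uniform under the null. A minimal sketch follows, assuming the common i/(n+1) plotting position for the expected quantiles; the function name and the toy p-values are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, sorted from smallest to largest."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected order statistics of n uniform p-values are i / (n + 1)
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Invented toy p-values; a real scan would have hundreds of thousands
points = qq_points([0.5, 0.01, 0.2, 0.9])
for exp_val, obs_val in points:
    print(round(exp_val, 3), round(obs_val, 3))
```

Points lying along the diagonal indicate a well-calibrated scan; a systematic departure across the whole range suggests stratification or other confounding rather than true association.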
Interim conclusions
WGAS strategy has, in general, proved capable of robustly detecting common variants via LD/SNPs
But, currently, far from complete elucidation...
Coverage & power: meta-analysis, imputation
Role of (multiple?) rare, non-SNP (and de novo) variation?
Rare segregating variants
Epistasis & interactions
CNVs, inversions, epigenetics etc
Intermediate phenotypes, e.g. brain imaging
Pathways & gene-based tests
Unit of analysis is now a gene, not an allelic variant
aggregate multiple, semi-independent single variant effects
Various methods, probably similar
Sum statistics, combining chi-squares
Truncated rank product, combining p-values
Rank-based approaches, cf. gene-set enrichment analysis
Data-mining approaches to multivariate data
Important to correct for LD and gene size
Increased susceptibility to stratification & confounding a concern
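As an illustration of the sum-statistic idea, the sketch below adds up single-SNP allelic chi-squares within a gene and obtains an empirical p-value by permuting case/control labels. Because the genotypes stay fixed while labels are shuffled, the null distribution reflects the gene's own LD structure and number of SNPs. This is an assumed, simplified variant with invented toy data, not the specific method used in the study.

```python
import random


def allelic_chi2(genos, labels):
    """1-df allelic chi-square for one SNP (genotypes coded 0/1/2, labels 1=case, 0=control)."""
    a = sum(g for g, y in zip(genos, labels) if y == 1)   # minor alleles in cases
    c = sum(g for g, y in zip(genos, labels) if y == 0)   # minor alleles in controls
    n_case = 2 * sum(labels)
    n_ctrl = 2 * (len(labels) - sum(labels))
    b, d = n_case - a, n_ctrl - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return (n_case + n_ctrl) * (a * d - b * c) ** 2 / denom if denom else 0.0


def gene_sum_stat(snps, labels):
    """Sum of single-SNP chi-squares over all SNPs in the gene."""
    return sum(allelic_chi2(g, labels) for g in snps)


def gene_based_p(snps, labels, n_perm=2000, seed=1):
    """Empirical gene-level p-value from label permutation."""
    rng = random.Random(seed)
    observed = gene_sum_stat(snps, labels)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if gene_sum_stat(snps, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Invented toy gene: two correlated SNPs, 10 cases followed by 10 controls
labels = [1] * 10 + [0] * 10
snps = [
    [1, 1, 2, 1, 1, 0, 1, 2, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [1, 0, 2, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
]
observed, p_value = gene_based_p(snps, labels)
print(observed, p_value)
```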
Calcium channel genes and BPD
Subunit   Gene       Type   Chr   p-value < 0.05?
A         CACNA1A    P/Q    19    N
A         CACNA1B    N      9     Y
A         CACNA1C    L      12    Y
A         CACNA1D    L      3     Y
A         CACNA1F    L      X     n/a
A         CACNA1S    L      1     Y
A         CACNA1E    R      1     N
A         CACNA1G    T      17    N
A         CACNA1H    T      16    N
A         CACNA1I    T      22    N
A2D       CACNA2D1          7     N
A2D       CACNA2D2          3     N
A2D       CACNA2D3          3     N
A2D       CACNA2D4          12    N
B         CACNB1            17    N
B         CACNB2            10    N
B         CACNB3            12    N
B         CACNB4            2     N
G         CACNG1            17    N
G         CACNG2            22    N
G         CACNG3            16    N
G         CACNG4            17    N
G         CACNG5            17    N
G         CACNG6            19    N
G         CACNG7            19    N
G         CACNG8            19    N
G         TMEM37            2     N
27 genes, 4 subunits
No inter-gene linkage disequilibrium
Cluster of significant gene-based results for genes closely related to CACNA1C (L-type calcium ion channel A1 genes)
Association studies have low power to detect rare variants
MAF      GRR = 2   GRR = 4
0.1      139       29
0.01     1171      219
0.001    11,520    2,131
0.0001   115,000   21,250
Number of case/control pairs needed to achieve 80% power at a 5% type I error rate
MAF = minor allele frequency; GRR = genotypic relative risk
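The scaling in this table can be approximated with a textbook two-proportion sample-size formula applied to allele frequencies. The sketch below assumes a multiplicative per-allele relative risk, a rare disease (so the control allele frequency is roughly the population MAF), and a two-sided allelic test; the slide does not state which model produced its numbers, so the output should land near the tabulated values rather than match them exactly. The function name `pairs_needed` is illustrative.

```python
import math
from statistics import NormalDist


def pairs_needed(maf, grr, alpha=0.05, power=0.80):
    """Approximate number of case/control pairs for an allelic association test."""
    p0 = maf                                   # allele frequency in controls (assumed = MAF)
    p1 = maf * grr / (maf * grr + 1 - maf)     # allele frequency in cases, multiplicative model
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)    # two alleles per individual


for maf in (0.1, 0.01, 0.001, 0.0001):
    print(maf, pairs_needed(maf, 2), pairs_needed(maf, 4))
```

The required sample size grows roughly in inverse proportion to the allele frequency, which is why variants with MAF of 0.001 or below are out of reach for samples of a few thousand.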
Common: old, universal, segregate on short haplotypes
Rare: recent, population-specific, segregate on long haplotypes
“Seemingly unrelated” HapMap individuals
International HapMap Phase II analysis group (Nature, 2007)
Population-based linkage analysis
[Figure: shared segments along the chromosome around a disease locus. Homozygosity (shared segment): case (or case/case) versus control (or not case/case)]
Multiple, rare variants → increased sharing among cases at disease locus
Best overall results from shared IBD segment analysis: 13q33 in Portuguese
[Figure: -log10(p) for chromosomes 1 – 22, Portugal; ~9Mb region]
Schizophrenia IBD (1st in Portuguese)
Schizophrenia IBD (2nd in Swedes)
Bipolar study HBD (2nd in combined samples)
Schizophrenia HBD (2nd in combined samples)
Methods
Pak Sham
MGH/Broad: Mark Daly, Ben Neale, Manuel Ferreira, Douglas Ruderfer, Kathe Todd-Brown, Yan Meng, Andrew Kirby
International Schizophrenia Consortium
U. of Aberdeen: David St. Clair, Noel Kwan
Cardiff University: Michael O’Donovan, Michael Owen, George Kirov
U. College London: Hugh Gurling, Andrew McQuillin
U. of Edinburgh: Douglas Blackwood, Walter Muir, Kevin McGhee
U. of Southern California: Carlos Pato, Michele Pato
U. of North Carolina: Patrick Sullivan
Karolinska Institute: Christina Hultman, Paul Lichtenstein
Trinity College: Michael Gill, Aiden Corvin, Derek Morris
MGH/Broad Institute/Stanley Center: Pamela Sklar, Jennifer Stone, Edward Scolnick
STEP-BD/UCL Bipolar Study
Broad Institute/MPG: Pamela Sklar, Manuel Ferreira, Lauren Weiss, Jinbo Fan, Edward Scolnick
STEP genetics collaborators: Bernie Devlin, Steve Faraone, Nan Laird, Matt McQueen, Vishwajit Nimgaonkar, Jordan Smoller
STEP clinical collaborators: Gary Sachs, Roy Perlis
UK-US Bipolar Disorder: Hugh Gurling, Andrew McQuillin, Nick Bass, Jacob Lawrence
NIMH Genetics Initiative Control Collection: Pablo Gejman, Alan Sanders, Farooq Amin, Nancy Buccola, William Byerley, Robert Cloninger, Raymond Crowe, Donald Black, Robert Freedman, Douglas Levinson, Bryan Mowry, Jeremy Silverman