Whole-genome association studies of schizophrenia & bipolar disorder

Author: Abraham Day
Shaun Purcell
Psychiatric & Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
Broad Institute of Harvard and MIT, Cambridge, MA, USA

http://pngu.mgh.harvard.edu/purcell/

Whole genome association studies (WGAS)
• Rationale behind WGAS
• Study design, quality control & data analysis
• “Next generation” analyses
• Two ongoing WGAS: schizophrenia, bipolar disorder

Building computational tools: PLINK, etc.
• Data management
• Summary statistics
• Population stratification
• Association analysis
• IBD estimation

Strong evidence for causal role of genes in schizophrenia & bipolar disorder

[Figure: prevalence of SCZ and BP given an affected relative, by relationship: general population, aunts/uncles, nephews/nieces, grandchildren, half sibs, siblings, parents, children, DZ twins, MZ twins. From Gottesman 1991 and Tsuang and Faraone 1990.]

Several decades of linkage and candidate gene studies
• Linkage studies inconclusive
  - Meta-analysis of 32 linkage studies in SCZ (3200 pedigrees, 7500 affecteds). Lewis, Ng, and Levinson, October 2007
• Many reported associations but little consistency, e.g. DTNBP1
  - Mutsuddi, …, Daly, Scolnick, Sklar AJHG 2006

Gene finding prediction circa 1987

Eight to ten moderate-sized pedigrees are sufficient to perform definitive mapping … …affected sib-pair method will require 25 pairs…

Rare disease, major gene effect
Genotype          AA      AG      GG
Risk of disease   0.001   0.001   0.95

Disease prevalence ~1 in 1000
Individuals with GG are ~1000 times more likely to get disease
Frequency of G in controls ~5%
Frequency of G in cases ~96%

Common disease, polygenic effects
Genotype          AA      AG      GG
Risk of disease   0.01    0.012   0.0144

Disease prevalence ~1 in 100
Each extra G allele increases risk by ~1.2 times
Frequency of G in controls ~5%
Frequency of G in cases ~6%
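The arithmetic behind the common-disease example can be checked with a short sketch. It assumes a multiplicative (per-allele) risk model with Hardy-Weinberg equilibrium and treats the control G frequency as the population frequency, which is reasonable for a ~1% disease; the function names are illustrative only.

```python
# Sketch of the "common disease" example under an assumed multiplicative model.

def genotype_risks(baseline, per_allele_rr):
    """Disease risk for AA, AG, GG when each G allele multiplies risk."""
    return {g: baseline * per_allele_rr ** k for k, g in enumerate(("AA", "AG", "GG"))}


def case_allele_freq(p, per_allele_rr):
    """Expected G frequency in cases, given population G frequency p.

    With HWE and a multiplicative model, case genotype frequencies are
    proportional to q^2, 2pqR, p^2R^2, which simplifies to pR / (q + pR).
    """
    return p * per_allele_rr / (1 - p + p * per_allele_rr)


risks = genotype_risks(0.01, 1.2)
print({g: round(r, 4) for g, r in risks.items()})  # {'AA': 0.01, 'AG': 0.012, 'GG': 0.0144}
print(round(case_allele_freq(0.05, 1.2), 4))        # 0.0594, i.e. ~6% in cases
```

The rare-disease example is not multiplicative (AA and AG share the same risk), so this formula does not apply to it.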

Whole-genome association studies (WGAS, GWAS)

~1000s individuals →
~500K+ SNPs ↓
AA AG AA GG AG AA ..
AC AC AC AA AC CC ..
CT CC CC CT CC CC ..
GG GG GG AG GG GG ..
TT AT AA AT AA AT ..
CT TT CT TT CT CC ..
.. .. .. .. .. .. ..

SNPs = single nucleotide polymorphisms
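Each SNP (row) in such a matrix is tested separately. A minimal sketch of one common per-SNP test, the 1-df allelic chi-square on a 2×2 table of allele counts in cases and controls, is below; the genotypes are hypothetical, and real pipelines add QC, covariates and other test options.

```python
import math


def allelic_test(case_genotypes, control_genotypes, allele):
    """1-df allelic chi-square test for a single SNP.

    Genotypes are two-letter strings such as "AG"; None marks a missing call.
    Returns (chi-square, p-value, allelic odds ratio for `allele`).
    """
    def allele_counts(genotypes):
        called = [g for g in genotypes if g is not None]
        n_allele = sum(g.count(allele) for g in called)
        return n_allele, 2 * len(called) - n_allele

    a, b = allele_counts(case_genotypes)     # cases: tested allele, other allele
    c, d = allele_counts(control_genotypes)  # controls: tested allele, other allele
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:  # monomorphic SNP or empty group: nothing to test
        return 0.0, 1.0, float("nan")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a 1-df chi-square
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p, odds_ratio


# Hypothetical genotypes for one SNP (not study data)
cases = ["GG"] * 10 + ["AG"] * 10 + ["AA"] * 30
controls = ["GG"] * 5 + ["AG"] * 10 + ["AA"] * 35
chi2, p, odds_ratio = allelic_test(cases, controls, "G")
print(f"chi2={chi2:.3f} p={p:.3g} OR={odds_ratio:.2f}")
```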

Bird’s eye view of a WGAS

[Figure: association, -log10(p), plotted across chromosomes]

Some success stories…

10 Mar 2005

Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-678 The Wellcome Trust Case Control Consortium

A WGAS of bipolar disorder (case/control N)
US 955 / 1498
UK 506 / 510
Replicate ~150 SNPs: Edinburgh 273 / 313; US 409 trios
Combine WGAS data: WTCCC 1868 / 2943
Extension samples 960 / 473
>4200 cases, >5200 controls and counting…

• Pamela Sklar, Hugh Gurling, Nick Craddock, Douglas Blackwood, Jordan Smoller, Ed Scolnick
• Yan Meng, Manuel Ferreira

QC is central: looking at the raw data
• A good-looking SNP
• Not so pretty…

Pairwise allele-sharing metric

[Figure: allele sharing against a reference for individuals from the same vs. a different population; hierarchical clustering; multidimensional scaling/PCA separating Han Chinese and Japanese samples]
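A pairwise allele-sharing metric can be computed as identity-by-state (IBS) sharing: at each SNP two individuals share 0, 1 or 2 alleles. The sketch below assumes genotypes coded as minor-allele counts and averages sharing over SNPs called in both individuals; 1 - IBS then serves as a distance for hierarchical clustering or MDS. The individuals are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Proportion of alleles shared identical-by-state between two individuals.

    Genotypes are minor-allele counts (0, 1, 2); None marks a missing call.
    At each SNP the number of shared alleles is 2 - |g1 - g2|.
    """
    shared = 0
    n_snps = 0
    for x, y in zip(g1, g2):
        if x is None or y is None:
            continue  # skip SNPs missing in either individual
        shared += 2 - abs(x - y)
        n_snps += 1
    return shared / (2 * n_snps) if n_snps else float("nan")


def ibs_distance_matrix(genotypes):
    """Pairwise distance 1 - IBS similarity; one genotype list per individual."""
    n = len(genotypes)
    return [[1.0 - ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical individuals: two from each of two populations
people = [
    [0, 0, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [2, 2, 1, 0, 0, 2],
    [2, 1, 1, 0, 0, 2],
]
dist = ibs_distance_matrix(people)
print([round(d, 3) for d in dist[0]])  # [0.0, 0.083, 0.833, 0.75]
```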

Population differences in BP study

[Figure: cases and controls from the USA (STEP-BD) and UK (UCL) samples]

Evaluated via permutation that, within site, the average case is as similar to the average control as it is to another case
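The permutation check can be sketched for a single site: compare mean case-case similarity with mean case-control similarity, then shuffle case/control labels to build the null distribution. The similarity matrix (e.g. IBS sharing) and the one-sided difference-of-means statistic are assumptions about how such a test might be set up, not the study's exact procedure.

```python
import random
from itertools import combinations


def case_excess_similarity(sim, is_case):
    """Mean case-case similarity minus mean case-control similarity."""
    case_case, case_control = [], []
    for i, j in combinations(range(len(is_case)), 2):
        if is_case[i] and is_case[j]:
            case_case.append(sim[i][j])
        elif is_case[i] != is_case[j]:
            case_control.append(sim[i][j])
    if not case_case or not case_control:
        raise ValueError("need at least two cases and one control")
    return sum(case_case) / len(case_case) - sum(case_control) / len(case_control)


def permutation_test(sim, is_case, n_perm=1000, seed=1):
    """One-sided permutation p-value for excess case-case similarity at one site."""
    rng = random.Random(seed)
    observed = case_excess_similarity(sim, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between labels and genetic similarity
        if case_excess_similarity(sim, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical site where cases are more similar to each other than to controls
is_case = [True] * 5 + [False] * 5
sim = [[0.9 if (is_case[i] and is_case[j]) else 0.5 for j in range(10)] for i in range(10)]
observed, p = permutation_test(sim, is_case, n_perm=999)
print(round(observed, 3), p)
```

Under the null described on the slide, the observed difference should look like a typical permuted value; here the cases cluster, so the p-value is small.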

Bipolar study QC data flow
• 500,568 SNPs on two chips; BRLMM genotype calling
• Individuals: 4,139
• Low genotyping rate (=5%, including multimarker tests)

[Figure: QQ plot, observed vs. expected (-logP)]
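Genotyping-rate filters of the kind in this QC flow can be sketched as below. The 5% missingness thresholds are assumptions (the extracted slide text does not make the exact cut-off clear), and removing individuals before re-checking SNPs is one ordering choice among several.

```python
def missing_rates(matrix):
    """Per-SNP and per-individual missing-call rates.

    matrix: one row per SNP, one column per individual; None = missing call.
    """
    n_snps, n_ind = len(matrix), len(matrix[0])
    snp_rates = [sum(call is None for call in row) / n_ind for row in matrix]
    ind_rates = [sum(row[i] is None for row in matrix) / n_snps for i in range(n_ind)]
    return snp_rates, ind_rates


def qc_filter(matrix, max_snp_missing=0.05, max_ind_missing=0.05):
    """Drop low-call-rate individuals first, then re-check SNP call rates.

    Returns (indices of SNPs kept, indices of individuals kept).
    """
    _, ind_rates = missing_rates(matrix)
    keep_ind = [i for i, r in enumerate(ind_rates) if r <= max_ind_missing]
    if not keep_ind:
        return [], []
    reduced = [[row[i] for i in keep_ind] for row in matrix]
    snp_rates, _ = missing_rates(reduced)
    keep_snp = [s for s, r in enumerate(snp_rates) if r <= max_snp_missing]
    return keep_snp, keep_ind


# Hypothetical 3 SNPs x 4 individuals (0/1/2 genotype codes, None = no call)
matrix = [
    [0, 1, 2, 0],
    [0, None, 1, None],
    [2, 1, 0, 1],
]
print(qc_filter(matrix, max_ind_missing=0.5))  # ([0, 2], [0, 1, 2, 3])
```

With the stricter default individual threshold, the two individuals with a missing call are removed first and the second SNP then survives, which shows why the order of filters matters.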

Initial results of primary BP scan
• Best single SNP p = 1.7e-7 in Myosin 5b (MYO5B)
  - No signal in (modest) replication samples
• A number of other “interesting” genes, but nothing that replicates and/or provides an unambiguous “genome-wide significant hit”
• No replication of “top hits” in two previous BP WGAS
  - WTCCC, Nature, 2007; PALB2, rs420259
  - Baum et al. Molecular Psychiatry, 2007; DGKH
• Ad hoc comparison of results across studies
  - multiple hits near DFNB31 (chr 9) but appear to be independent
  - calcium channel gene (CACNA1C) gives strongest signal seen in both our study & WTCCC
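Why p = 1.7e-7 falls short of an unambiguous genome-wide significant hit can be seen with a Bonferroni correction. This sketch assumes one test per SNP on the 500,568-SNP two-chip set; because nearby SNPs are correlated, Bonferroni is conservative, and a fixed threshold of 5e-8 is a widely used convention.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold controlling family-wise error at `alpha`."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 500_568)  # assumed: all SNPs on the two chips
print(f"threshold = {threshold:.2e}, -log10 = {-math.log10(threshold):.2f}")  # 9.99e-08, 7.00
print("MYO5B p = 1.7e-7 passes:", 1.7e-7 < threshold)                        # False
print("5e-8 convention on -log10 scale:", round(-math.log10(5e-8), 2))        # 7.3
```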

The next step? A bigger dataset: more individuals, more SNPs

Imputing ungenotyped SNPs
Your observed genotype data + a reference panel (HapMap) + haplotype phase estimation = “imputation”
• Increase coverage
• Facilitate merging of data/results across different WGAS platforms
• More sophisticated QC procedures
  - can test a SNP without looking at a single genotype for that SNP

Imputing untyped HapMap SNPs

Reference chromosomes:
… A C ? G T C C G C T A T C G A …
… A C G C C T A A C T A T A ? A …
… G T G C C T C G C T A C C G T …

Observed SNP data:
… ? C G ? C ? C ? T ? A ? ? ? ? ? C ? ? …
… ? T ? ? ? ? A ? ? ? ? ? C ? ? …
… ? C ? ? ? ? C ? ? ? ? ? A ? ? …

Note! Doesn’t actually work exactly like this: different models, handling of phase uncertainty, etc.
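The cartoon can be made concrete with a deliberately naive imputation: pick the reference haplotype that best matches the typed alleles and copy its alleles into the untyped positions. As the note says, real methods (e.g. MACH) do not work this way; they model mosaics of reference haplotypes, recombination and phase uncertainty. The sequences below are hypothetical.

```python
def impute_toy(observed, reference_haplotypes, missing="?"):
    """Fill untyped positions by copying from the best-matching reference haplotype.

    observed and each reference haplotype are equal-length strings of alleles,
    with `missing` marking untyped positions. Ties go to the first reference.
    """
    for ref in reference_haplotypes:
        if len(ref) != len(observed):
            raise ValueError("reference and observed sequences must be the same length")

    def matches(ref):
        # count agreements at positions typed in both sequences
        return sum(o == r for o, r in zip(observed, ref) if o != missing and r != missing)

    best = max(reference_haplotypes, key=matches)
    # copy reference alleles only where the observed call is missing
    return "".join(r if o == missing else o for o, r in zip(observed, best))


reference = ["ACGTTCA", "ACGCCTA", "GTGCCTC"]
print(impute_toy("A?G?T?A", reference))  # ACGTTCA
```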

Imputation analysis of DISC1

[Figure: two panels of -log10(p) vs. physical position (bp), 228,000,000 to 228,600,000: observed genotypes only; with imputation]

84 SNPs on 500K product → 397 HapMap SNPs imputed with high confidence (using two different algorithms, MACH1 and PLINK)

Data-sharing and meta-analysis
• Need to increase N individuals as well as N SNPs…
  - GAIN, WTCCC, many other individual studies
• Effort to combine all psychiatric WGAS (Patrick Sullivan et al.)
• dbGaP

Adding WTCCC and new BP data: rs1006737 (CACNA1C)

Sample              Cases (n)  Controls (n)  Case freq  Cont freq  P value   OR
STEP-BD/UCL              1461          2003      0.357      0.315  3.02E-04  1.21
WTCCC                    1868          2943      0.359      0.324  5.28E-04  1.17
Extension samples         960           473      0.346      0.293  4.00E-03  1.28
All                      4289          5419      0.354      0.318  1.88E-08  1.19
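The combined rs1006737 signal can be approximated with an inverse-variance fixed-effect meta-analysis of the three samples. This is a sketch: each sample's standard error is back-calculated from the rounded OR and P value in the table (assuming Wald-type statistics), so the pooled numbers will be close to, but need not equal, the "All" row, which may have come from a pooled analysis of the genotype data.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def se_from_or_and_p(odds_ratio, p_two_sided):
    """Approximate SE of log(OR) implied by an OR and its two-sided Wald p-value."""
    z = _STD_NORMAL.inv_cdf(1 - p_two_sided / 2)
    return abs(math.log(odds_ratio)) / z


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect meta-analysis.

    studies: list of (odds_ratio, se_of_log_odds_ratio).
    Returns (pooled OR, SE of pooled log OR, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    log_or = sum(w * math.log(or_) for w, (or_, _) in zip(weights, studies)) / total
    se = 1.0 / math.sqrt(total)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(log_or), se, p


# rs1006737 per-sample (OR, P value) from the table
rows = {
    "STEP-BD/UCL": (1.21, 3.02e-4),
    "WTCCC": (1.17, 5.28e-4),
    "Extension samples": (1.28, 4.00e-3),
}
studies = [(or_, se_from_or_and_p(or_, p)) for or_, p in rows.values()]
pooled_or, pooled_se, pooled_p = fixed_effect_meta(studies)
print(f"pooled OR = {pooled_or:.3f}, p = {pooled_p:.2e}")
```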

The International Schizophrenia Consortium

David St. Clair, Noelle Kuan
Michael Gill, Aiden Corvin, Derek Morris
Edward Scolnick, Pamela Sklar, Shaun Purcell, Jennifer Stone, Kimberly Chambert
Douglas Blackwood, Walter Muir, Kevin McGhee
Michael Owen, Michael O’Donovan, Nick Craddock, George Kirov
Hugh Gurling, Andrew McQuillin
Carlos Pato, Michele Pato
Patrick Sullivan
Paul Lichtenstein, Christina Hultman

Sample distribution

Institution   Pop. ethnicity   # Cases   # Controls
USC/MGH       Portuguese           397          232
Karolinska    Swedish              582          401
UCL           UK/Irish             600          600
Aberdeen      Scottish             800          962
Dublin        Irish                301         1000
Edinburgh     Scottish             484          384
Cardiff       Bulgarian            501          706
Total                             3665         4285

Affymetrix chip evolution
• 500K 2-chip (Classic chip): 500K SNPs; UCL controls (600 control samples)
• 500K 1-chip (5.0 chip): 500K SNPs, 400K CNV; UCL cases, Portuguese, Swedes I, Aberdeen (3296 samples)
• 1 Million (6.0 chip): 900K SNPs, 900K CNV; Edinburgh, Cardiff, Swedes II, Dublin (3888 samples)

Challenges combining across platforms

[Figure: MDS plots, C1 vs. C2, using all SNPs, QC-passing SNPs, and SNPs with 100% genotyping and MAF > 0.1. Samples: UK (red; different chip to rest), UK (green), Swedish (purple), Portuguese (blue), Scottish (yellow), Bipolar Aberdeen, UCL, Portuguese, Sweden 1. One plot shows “stratification” (red cluster).]

Geographic locations of sample collection: Sweden, Scotland, Bulgaria, Ireland, England, Portugal, Azores, Madeira, Fall River, MA

Evaluation of population stratification within sample sites

[Figure: MDS plots (C1 vs. C2) per site: Aberdeen, Sweden I, Sweden II, Portuguese, Cardiff, Edinburgh, Dublin. Gray = control samples; color = case samples.]
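The C1 and C2 axes in MDS plots like these are the first two components of a multidimensional scaling of pairwise genetic distances (for example 1 minus IBS allele sharing). Below is a pure-Python sketch of classical MDS; it uses power iteration for clarity and assumes the leading eigenvalues are positive and well separated, whereas a real analysis of thousands of samples would use a linear-algebra library.

```python
import math
import random


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical (Torgerson) MDS: k coordinates per individual from a distance matrix.

    Eigenvectors of the double-centred matrix B = -1/2 * J D^2 J are found by
    power iteration with deflation.
    """
    n = len(dist)
    d2 = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in d2]
    grand_mean = sum(row_means) / n
    b = [[-0.5 * (d2[i][j] - row_means[i] - row_means[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for comp in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # eigenvalue estimate (Rayleigh quotient of the unit vector v)
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][comp] = v[i] * scale
        # deflate so the next pass finds the next component
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical samples from two well-separated groups (1-D positions -> distances)
xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in xs] for a in xs]
coords = classical_mds(dist)
print([round(c[0], 2) for c in coords])  # C1 separates the two groups
```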

Interim association analysis results

[Figure: -log(p-value) by chromosome (1 to 22, X); 3214 cases/2904 controls; 695,655 SNPs]

UCL control and Dublin samples not included in analysis

QQ plot: interim results

Interim conclusions
• WGAS strategy has, in general, proved capable of robustly detecting common variants via LD/SNPs
• But, currently, far from complete elucidation...
  - Coverage & power: meta-analysis, imputation
  - Role of (multiple?) rare, non-SNP (and de novo) variation?
  - Rare segregating variants
  - Epistasis & interactions
  - CNVs, inversions, epigenetics etc.
  - Intermediate phenotypes, e.g. brain imaging

Pathways & gene-based tests
• Unit of analysis is now a gene, not an allelic variant
  - aggregate multiple, semi-independent single variant effects
• Various methods, probably similar
  - Sum statistics, combining chi-squares
  - Truncated rank product, combining p-values
  - Rank-based approaches, cf. gene-set enrichment analysis
  - Data-mining approaches to multivariate data
• Important to correct for LD and gene size
• Increased susceptibility to stratification & confounding is a concern
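A sum-statistic gene-based test can be sketched as follows: add up per-SNP trend chi-squares over the SNPs assigned to a gene and get a p-value by permuting case/control labels. Permuting labels keeps the LD between the gene's SNPs and the number of SNPs fixed, which is one way to handle the LD and gene-size issues above; it does not address stratification. SNP-to-gene assignment and the choice of trend test are assumptions, and the data are hypothetical.

```python
import random


def trend_chi2(dosages, phenotype):
    """Armitage trend statistic computed as N * r^2 (r: dosage vs 0/1 phenotype)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or single-class phenotype
    return n * sxy * sxy / (sxx * syy)


def gene_sum_statistic(snp_dosages, phenotype):
    """Sum of per-SNP trend chi-squares over all SNPs assigned to a gene."""
    return sum(trend_chi2(d, phenotype) for d in snp_dosages)


def gene_based_test(snp_dosages, phenotype, n_perm=1000, seed=1):
    """Permutation p-value for the summed statistic (labels shuffled, genotypes fixed)."""
    observed = gene_sum_statistic(snp_dosages, phenotype)
    rng = random.Random(seed)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if gene_sum_statistic(snp_dosages, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical gene with 3 SNPs; the first two are in perfect LD
phenotype = [1] * 10 + [0] * 10
snp1 = [2] * 8 + [1] * 2 + [0] * 8 + [1] * 2
snp2 = list(snp1)
snp3 = [0, 1, 2, 1, 0] * 4
stat, p = gene_based_test([snp1, snp2, snp3], phenotype, n_perm=500)
print(round(stat, 2), p)
```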

Calcium channel genes and BPD

Subunit  Gene      Type  Chr  p-value < 0.05?
A        CACNA1A   P/Q   19   N
A        CACNA1B   N     9    Y
A        CACNA1C   L     12   Y
A        CACNA1D   L     3    Y
A        CACNA1F   L     X    n/a
A        CACNA1S   L     1    Y
A        CACNA1E   R     1    N
A        CACNA1G   T     17   N
A        CACNA1H   T     16   N
A        CACNA1I   T     22   N
A2D      CACNA2D1        7    N
A2D      CACNA2D2        3    N
A2D      CACNA2D3        3    N
A2D      CACNA2D4        12   N
B        CACNB1          17   N
B        CACNB2          10   N
B        CACNB3          12   N
B        CACNB4          2    N
G        CACNG1          17   N
G        CACNG2          22   N
G        CACNG3          16   N
G        CACNG4          17   N
G        CACNG5          17   N
G        CACNG6          19   N
G        CACNG7          19   N
G        CACNG8          19   N
G        TMEM37          2    N

• 27 genes, 4 subunits
  - No inter-gene linkage disequilibrium
• Cluster of significant gene-based results for genes closely related to CACNA1C (L-type calcium ion channel A1 genes)

Association studies have low power to detect rare variants

MAF      GRR = 2   GRR = 4
0.1          139        29
0.01       1,171       219
0.001     11,520     2,131
0.0001   115,000    21,250

Number of case/control pairs needed to achieve 80% power at a 5% type I error rate. MAF = minor allele frequency; GRR = genotypic relative risk.
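Numbers of this kind can be approximated with the standard two-proportion sample-size formula for an allelic test. The sketch treats GRR as a per-allele multiplicative relative risk and assumes a rare disease (control allele frequency equal to the MAF); the model behind the table is not stated, so results land in the same range (about 140 pairs for MAF 0.1, GRR 2) but are not guaranteed to match every entry.

```python
import math
from statistics import NormalDist


def pairs_needed(maf, grr, alpha=0.05, power=0.80):
    """Approximate case/control pairs for a two-sided allelic test.

    Assumes a multiplicative model (per-allele relative risk = grr), a rare
    disease so control allele frequency ~ MAF, and the normal approximation
    for comparing two proportions (unpooled variance).
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    p_control = maf
    p_case = maf * grr / (1 - maf + maf * grr)
    variance = p_control * (1 - p_control) + p_case * (1 - p_case)
    alleles_per_group = z_total ** 2 * variance / (p_case - p_control) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per individual


for maf in (0.1, 0.01, 0.001, 0.0001):
    print(maf, pairs_needed(maf, 2), pairs_needed(maf, 4))
```

The required N grows roughly tenfold for each tenfold drop in MAF, which is the point of the table.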

Common variants: old, universal, segregate on short haplotypes
Rare variants: recent, population-specific, segregate on long haplotypes

[Figure: long shared haplotypes among “seemingly unrelated” HapMap individuals; International HapMap Phase II analysis group (Nature, 2007)]

Population-based linkage analysis

[Figure: homozygosity (shared segments) along a chromosome for cases (or case/case pairs) vs. controls (or not case/case pairs), around a disease locus]

Multiple, rare variants → increased sharing among cases at disease locus
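Homozygosity-by-descent segments can be approximated by runs of homozygosity. The sketch scans one chromosome for stretches of consecutive homozygous calls; the thresholds (5 SNPs, 1 Mb) are illustrative assumptions, and real implementations such as PLINK's use sliding windows that tolerate occasional heterozygous or missing calls. The data are hypothetical.

```python
def runs_of_homozygosity(genotypes, positions, min_snps=5, min_length_bp=1_000_000):
    """Return (start_bp, end_bp, n_homozygous_snps) for each qualifying run.

    genotypes: minor-allele counts 0/1/2 per SNP (None = missing), ordered by
    position; positions: base-pair coordinates on one chromosome.
    A heterozygous call (1) ends a run; missing calls are skipped.
    """
    runs = []
    start = end = None
    n_hom = 0

    def close_run():
        if (start is not None and n_hom >= min_snps
                and positions[end] - positions[start] >= min_length_bp):
            runs.append((positions[start], positions[end], n_hom))

    for i, g in enumerate(genotypes):
        if g is None:
            continue
        if g == 1:  # heterozygote breaks the current run
            close_run()
            start, end, n_hom = None, None, 0
        else:
            if start is None:
                start = i
            end = i
            n_hom += 1
    close_run()
    return runs


# Hypothetical chromosome segment: one SNP every 200 kb
positions = [i * 200_000 for i in range(20)]
genotypes = [0, 2, 0, 0, 2, 2, 0, 0, 2, 0, 1, 2, 2, 0, 0, 2, 1, 0, 0, 0]
print(runs_of_homozygosity(genotypes, positions))  # [(0, 1800000, 10)]
```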

Best overall results from shared IBD segment analysis: 13q33 in Portuguese

[Figure: -log10(p) for chromosomes 1 to 22, Portugal; ~9 Mb region]

• Schizophrenia IBD (1st in Portuguese)
• Schizophrenia IBD (2nd in Swedes)
• Bipolar study HBD (2nd in combined samples)
• Schizophrenia HBD (2nd in combined samples)

Methods

International Schizophrenia Consortium

STEP-BD/UCL Bipolar Study

Pak Sham

U. of Aberdeen

MGH/Broad

Mark Daly

David St. Clair, Noel Kwan.

Ben Neale

Cardiff University

Manuel Ferreira

Michael O’Donovan, Michael Owen,

Douglas Ruderfer

George Kirov

Kathe Todd-Brown

U. College London

Yan Meng

Hugh Gurling, Andrew McQuillin

Andrew Kirby

U. of Edinburgh Douglas Blackwood, Walter Muir, Kevin McGhee

Pamela Sklar, Manuel Ferreira, Lauren Weiss, Jinbo Fan, Edward Scolnick STEP genetics collaborators Bernie Devlin, Steve Faraone, Nan Laird, Matt McQueen, Vishwajit Nimgaonkar, Jordan Smoller STEP clinical Collaborators Gary Sachs, Roy Perlis UK-US Bipolar Disorder Hugh Gurling, Andrew McQuillin, Nick Bass, Jacob Lawrence

Broad Institute/MPG

U. of Southern California Carlos Pato, Michele Pato U. of North Carolina Patrick Sullivan Karolinska Institute Christina Hultman, Paul Lichtenstein Trinity College Michael Gill, Aiden Corvin, Derek Morris MGH/Broad Institute/Stanley Center Pamela Sklar, Jennifer Stone, Edward Scolnick

U. of Edinburgh Douglas Blackwood, Walter Muir, Kevin McGhee NIMH Genetics Initiative Control Collection Pablo Gejman, Alan Sanders Farooq Amin, Nancy Buccola William Byerley, Robert Cloninger, Raymond Crowe, Donald Black, Robert Freedman, Douglas Levinson, Bryan Mowry, Jeremy Silverman
