Identity by Descent and NPL Analysis


Kenneth Lange

Departments of Biomathematics and Human Genetics David Geffen School of Medicine at UCLA

November 2003


Semantics

The definition of a kinship coefficient involves certain implicit semantic conventions:

1. An allele is not a gene. Alleles are labels for the different varieties of genes that can occur at a given locus.
2. Genes are replicated and passed from generation to generation. Alleles are labels, not physical entities.
3. Genes can be compared. Two genes can be identical by state, i.e. have the same allelic label, without being identical by descent, i.e. have the same ancestral source.

Kinship Coefficients

1. Kinship coefficients quantify the degree of relationship between two relatives. Theoretical kinship coefficients ignore marker data while conditional kinship coefficients take it into account.
2. Two genes are identical by descent (ibd) if one is a copy of the other or they are both copies of the same ancestral gene. The theoretical kinship coefficient $\Phi_{ij}$ is the probability that a randomly sampled gene from j is ibd to a randomly sampled gene from the same arbitrary locus of i.
3. Examples: $\Phi_{ij} = 1/2$ if i = j and $\Phi_{ij} = 1/4$ if i and j are first degree relatives. In both cases, no inbreeding is allowed.

Theoretical Kinship Coefficients for an Inbred Pedigree

PEDIGREE  PEDIGREE  PERSON  PERSON  KINSHIP
NUMBER    NAME      NAME    NAME    COEFFICIENT
1         INBRED    P1      P1      0.500000
1         INBRED    P2      P1      0.000000
1         INBRED    P2      P2      0.500000
............................................
1         INBRED    P5      P3      0.375000
1         INBRED    P5      P6      0.375000
1         INBRED    P5      P5      0.625000

PEDIGREE INBRED HAS AVERAGE INBREEDING COEFFICIENT 0.08333.

These coefficients on a brother-sister mating appear in Summary10a.out.
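The tabulated values can be reproduced with the standard recursive rule for kinship coefficients. The sketch below is illustrative only and is not Mendel's code. It assumes that P1 and P2 are unrelated, non-inbred founders, that P3 and P4 are their children, and that P5 and P6 are the children of the P3 by P4 brother-sister mating. The parent assignments, the `PEDIGREE` dictionary, and the function name `kinship` are hypothetical choices made to match the table.

```python
from functools import lru_cache

# Assumed structure of the brother-sister mating pedigree: person -> (father, mother).
# Founders have (None, None). These parent assignments are an assumption.
PEDIGREE = {
    "P1": (None, None),
    "P2": (None, None),
    "P3": ("P1", "P2"),
    "P4": ("P1", "P2"),
    "P5": ("P3", "P4"),
    "P6": ("P3", "P4"),
}

# People are listed so that parents always precede their children.
ORDER = {name: rank for rank, name in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(i, j):
    """Theoretical kinship coefficient between persons i and j."""
    # Always recurse on the person listed later, so that j is never
    # a descendant of i when the parental rule is applied.
    if ORDER[i] < ORDER[j]:
        i, j = j, i
    father, mother = PEDIGREE[i]
    if father is None:
        # Founders are assumed non-inbred and unrelated to everyone listed earlier.
        return 0.5 if i == j else 0.0
    if i == j:
        # Self-kinship is (1 + inbreeding coefficient) / 2.
        return 0.5 * (1.0 + kinship(father, mother))
    # A gene drawn from i comes from either parent with probability 1/2.
    return 0.5 * (kinship(father, j) + kinship(mother, j))


# Inbreeding coefficient of each person is 2 * self-kinship - 1.
average_inbreeding = sum(2 * kinship(p, p) - 1 for p in PEDIGREE) / len(PEDIGREE)

print(kinship("P5", "P3"), kinship("P5", "P6"), kinship("P5", "P5"))
print(round(average_inbreeding, 5))
```

Under these assumptions the recursion gives 0.375, 0.375, and 0.625 for the three P5 rows, and an average inbreeding coefficient of 1/12, in agreement with the reported 0.08333.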

Conditional Kinship Coefficients

1. Option 10 also computes conditional kinship coefficients on small pedigrees, and, if asked, compares average values of these to theoretical kinship coefficients.
2. A big discrepancy suggests that the relationship between a pair is mis-specified.
3. The most discrepant pairs are listed in the summary file.
4. The next slide shows how Mendel can distinguish identical twins from non-identical twins by using a discrepancy statistic.

Most Deviant Pairs

STATISTIC  PEDIGREE  PEDIGREE  PERSON  PERSON  STATISTIC
NUMBER     NUMBER    NAME      NAME    NAME    VALUE
1          153       430       3       4       0.2420
1          181       579       4       5       0.2418
1          12        30        4       5       0.2415
1          126       334       3       4       0.2408
1          49        132       3       4       0.2389
......................................................

Each flagged pair in this autism study consists of identical twins mis-labeled as ordinary siblings. Three different statistics are reported in Summary10b.out.

NPL Statistics

1. Nonparametric linkage (NPL) analysis is useful for complex traits that do not conform to Mendelian segregation patterns.
2. Mendel introduces recessive and dominant BLOCKS statistics in addition to the traditional ALL and PAIRS statistics. Mendel handles only small to medium sized pedigrees. For large pedigrees, use SimWalk.
3. Mendel also incorporates a new method for approximating p-values.
4. Mendel gives greater weight to pedigrees loaded with affecteds.

IBD Functions

1. NPL statistics measure identity-by-descent (IBD) sharing among the affecteds of a pedigree.
2. Consider a partition of n genes $g = (g_1, \ldots, g_n)$ into m IBD blocks $b_1, \ldots, b_m$. These genes are sampled from the same locus. If block $b_i$ contains $s_i$ genes, then some functions that capture excess sharing are

$$R_{\mathrm{blocks}}(g) = -m \qquad \text{(negative of the number of blocks)}$$

$$R_{\mathrm{pairs}}(g) = \sum_{i=1}^{m} \binom{s_i}{2} \qquad \text{(number of ibd pairs)}$$

$$R_{\mathrm{all}}(g) = \prod_{i=1}^{m} s_i!$$

Numerical Example

Consider six genes labeled 1, ..., 6 sampled at the same locus from various people. Enclose blocks of genes that are ibd in curly brackets. Example:

$$b_1 = \{1, 2, 3\}, \quad b_2 = \{4\}, \quad b_3 = \{5\}, \quad b_4 = \{6\}$$

Then the statistics are:

$$R_{\mathrm{blocks}} = -4$$

$$R_{\mathrm{pairs}} = \binom{3}{2} + 3\binom{1}{2} = 3$$

$$R_{\mathrm{all}} = 3!\,(1!)^3 = 6$$

A large value of a statistic is indicative of excess sharing.
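The three IBD functions are easy to check numerically. The following minimal sketch recomputes the example, assuming the partition is supplied as a list of blocks. The function name `ibd_statistics` is a hypothetical helper, not part of Mendel.

```python
from math import comb, factorial, prod


def ibd_statistics(blocks):
    """Return (R_blocks, R_pairs, R_all) for a partition of genes into IBD blocks."""
    sizes = [len(block) for block in blocks]
    r_blocks = -len(sizes)                       # negative of the number of blocks
    r_pairs = sum(comb(s, 2) for s in sizes)     # number of ibd pairs
    r_all = prod(factorial(s) for s in sizes)    # product of factorials of block sizes
    return r_blocks, r_pairs, r_all


# The partition from the numerical example.
example = [{1, 2, 3}, {4}, {5}, {6}]
print(ibd_statistics(example))  # (-4, 3, 6)

# With no sharing at all, every gene is its own block.
no_sharing = [{1}, {2}, {3}, {4}, {5}, {6}]
print(ibd_statistics(no_sharing))  # (-6, 0, 1)
```

Comparing the two partitions shows that all three functions increase as genes merge into larger blocks.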

Sampling of Genes

Mendel combines IBD functions with kinship style sampling. If the n affecteds in a pedigree have genes $g = (g_1, \ldots, g_{2n})$, then the ALL statistic generates three new statistics

$$S_{\mathrm{all}}^{\mathrm{rec}} = R_{\mathrm{all}}(g)$$

$$S_{\mathrm{all}}^{\mathrm{kin}} = \frac{1}{2^n} \sum_{h} R_{\mathrm{all}}(h)$$

$$S_{\mathrm{all}}^{\mathrm{dom}} = \max_{h} R_{\mathrm{all}}(h).$$

Here the genotype vector h ranges over the $2^n$ samples of n genes from $g_1, \ldots, g_{2n}$, taking exactly one gene from each affected. These new statistics are useful in capturing recessive, additive, and dominant inheritance, respectively. Sampling can also be combined with the $R_{\mathrm{pairs}}(g)$ and $R_{\mathrm{blocks}}(g)$ functions to give nine statistics in all.
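The sampling scheme can be sketched by brute-force enumeration when the IBD state of the affecteds is known. In the sketch below each affected is represented by a pair of integer labels, and two genes are ibd exactly when their labels are equal. This encoding and the function names are assumptions made for illustration. In practice the IBD state is not observed.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def r_all(genes):
    """R_all for a list of gene labels: product of factorials of IBD block sizes."""
    counts = Counter(genes)
    return prod(factorial(c) for c in counts.values())


def sampled_all_statistics(affecteds):
    """Return (S_rec, S_kin, S_dom) for the ALL function.

    affecteds is a list of (gene, gene) label pairs, one pair per affected.
    """
    g = [gene for pair in affecteds for gene in pair]   # all 2n genes
    # Each sample h takes exactly one gene from each affected: 2^n samples.
    values = [r_all(h) for h in product(*affecteds)]
    s_rec = r_all(g)
    s_kin = sum(values) / len(values)
    s_dom = max(values)
    return s_rec, s_kin, s_dom


# Two affected sibs who share exactly one gene (label 1) ibd.
sibs = [(1, 2), (1, 3)]
print(sampled_all_statistics(sibs))  # (2, 1.25, 2)
```

Swapping `r_all` for analogous pairs or blocks functions would give the remaining six statistics.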

NPL Statistics as Conditional Expectations

1. Neither the genes g among affecteds nor any sample h from them is directly observable.
2. We generate a test statistic such as $T_{\mathrm{all}}^{\mathrm{kin}}$ by taking the conditional expectation of the corresponding statistic $S_{\mathrm{all}}^{\mathrm{kin}}$ with respect to the observed marker data and the current position of the hypothetical trait locus.
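Written out, the conditional expectation takes roughly the following form. The symbol M for the observed marker data is notation introduced here for illustration, and the sum is understood to run over the possible IBD configurations g of the affecteds at the current position of the trait locus.

```latex
T_{\mathrm{all}}^{\mathrm{kin}}
  = \mathrm{E}\left( S_{\mathrm{all}}^{\mathrm{kin}} \mid M \right)
  = \sum_{g} \Pr(g \mid M) \, \frac{1}{2^n} \sum_{h} R_{\mathrm{all}}(h)
```

The inner sum is the sampling average over the selections h of one gene per affected, and the outer sum weights each configuration by its probability given the markers.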

Combining Statistics from Different Pedigrees

If $T_k$ is the value of a test statistic on the kth pedigree of a study sample, then we can construct a sample-wide statistic T by forming the weighted sum

$$T = \sum_{k} w_k \left[ T_k - \mathrm{E}(T_k) \right],$$

where $w_k$ is a positive weight assigned to pedigree k. Mendel takes

$$w_k = \sqrt{\frac{r_k}{\mathrm{Var}(T_k)}}$$

for a pedigree with $r_k$ affecteds.
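As a small arithmetic illustration of the weighted sum, the sketch below reads the weight as the square root of $r_k / \mathrm{Var}(T_k)$. The function name and the input numbers are made up for the example and do not come from any data set.

```python
from math import sqrt


def combined_statistic(pedigrees):
    """Sum of w_k * (T_k - E(T_k)) with w_k = sqrt(r_k / Var(T_k)).

    pedigrees is a list of tuples (t_k, mean_k, var_k, r_k).
    """
    total = 0.0
    for t_k, mean_k, var_k, r_k in pedigrees:
        w_k = sqrt(r_k / var_k)          # more affecteds -> larger weight
        total += w_k * (t_k - mean_k)    # centered contribution of pedigree k
    return total


# Illustrative values only: (T_k, E(T_k), Var(T_k), r_k).
study = [
    (3.0, 2.0, 1.0, 4),    # weight 2, contribution 2.0
    (1.5, 1.0, 0.25, 1),   # weight 2, contribution 1.0
]
print(combined_statistic(study))  # 3.0
```

Dividing by the standard deviation puts pedigrees on a common scale, and the factor involving $r_k$ then favors pedigrees with more affecteds.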

Approximation of P-Values

1. This is difficult because of the conditional expectations used and the differences among pedigrees.
2. Mendel uses two stage simulation.
3. In stage 1, the values of a statistic $T_k$ for pedigree k are sampled or exhaustively enumerated. For a specified number of replicates, sampling is done by gene dropping. Exhaustive enumeration ignores the marker data and is conservative.
4. In the second stage, the distributional pool for each pedigree k is sampled and the resulting $T_k$ are summed. A p-value is approximated by taking the fraction of sampled T's greater than or equal to the observed T.
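The second stage can be sketched as follows. This is an illustration, not Mendel's implementation. It assumes that stage 1 has already produced, for each pedigree, a pool of weighted and centered contributions $w_k [T_k - \mathrm{E}(T_k)]$, and it draws from each pool uniformly at random. The function name is hypothetical, and the default sample count and seed simply echo the SAMPLES and SEED keyword values shown on the keywords slide.

```python
import random


def approximate_p_value(pools, observed, samples=100000, seed=17237):
    """Estimate P(T >= observed) by resampling the per-pedigree pools.

    pools    -- list of lists; pools[k] holds stage 1 values for pedigree k
    observed -- the observed sample-wide statistic T
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw one value per pedigree and sum to get a simulated T.
        t = sum(rng.choice(pool) for pool in pools)
        if t >= observed:
            hits += 1
    return hits / samples


# Toy pools: each pedigree contributes 0 or 1 with equal probability.
pools = [[0.0, 1.0], [0.0, 1.0]]
print(approximate_p_value(pools, observed=2.0, samples=20000))  # close to 0.25
```

Because the pools are built once per pedigree, the second stage is cheap even when the number of samples is large.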

Important Keywords for NPL Analysis

ANALYSIS_OPTION = NPL                !analysis option
AFFECTED_LOCUS_OR_FACTOR = HEALTH    !field for affecteds
AFFECTED = 2                         !symbol for affected person
REPETITIONS = 50                     !for stage 1 sampling; omit for fast default
SAMPLES = 100000                     !for stage 2 sampling
INTERIOR_POINTS = 1                  !number of points between each pair of markers
SEED = 17237                         !random seed for simulations

Output for BRCA1 Data

AFFECTED DESIGNATOR = AFFECTED
AFFECTED_LOCUS_OR_FACTOR = HEALTH

P-VALUES OF NONPARAMETRIC MARKER ALLELE SHARING STATISTICS

LOCUS     NEUTER MAP     RECESSIVE    ADDITIVE    ADDITIVE   DOMINANT
NAME      LOCATION(cM)   BLOCKS STAT  PAIRS STAT  ALL STAT   BLOCKS STAT
MARKER1    0.00          0.007240     0.000370    0.001580   0.000160
          17.33          0.115860     0.044990    0.040600   0.062550
MARKER2   34.66          0.269350     0.195800    0.170540   0.236490
