Computational Genetics Lecture 1
Background Readings: Chapters 2 and 3 of An Introduction to Genetic Analysis, Griffiths et al., 2000, Seventh Edition (CS/Fishbach/Other libraries). This class has been edited from several sources, primarily Terry Speed’s homepage at Stanford and the Technion course “Introduction to Genetics”. Changes made by Dan Geiger.
Course Information
Meetings:
- Lecture, by Dan Geiger: Thursdays 14:30–16:30, Taub 4.
- Tutorial, by Ma’ayan Fishelson: Thursdays 16:30–17:30.
Grade:
- 50% in five question sets. These question sets are obligatory. Each contains 4-6 theoretical problems. Submit in pairs within two weeks.
- 50% take-home exam. (A few students may be allowed to replace it with a seminar lecture.)
Information and handouts:
- http://webcourse.technion.ac.il/236608/
- A brochure with photocopied material (if needed) at the Taub library.
Course Prerequisites
Computer Science and Probability Background:
- Algorithms 1 (cs234247)
- Probability (any course)
- Algorithms in computational biology (or take in parallel).
Some Biology Background:
- Formally: none, to allow CS students to take this course.
- Recommended: Introduction to Genetics (or take in parallel).
Course Goals
Learning about computational and mathematical methods for genetic analysis.
- We will focus on gene hunting: finding genes for simple human diseases.
- Methods covered in depth: linkage analysis (using pedigree data) and association analysis (using random samples).
- Another goal is to learn more about the use of Bayesian networks for genetic linkage analysis.
Human Genome
Most human cells contain 46 chromosomes:
- 2 sex chromosomes (X, Y): XY in males, XX in females.
- 22 pairs of chromosomes, named autosomes.
Genetic Information
- Gene: the basic unit of genetic information. Genes determine the inherited characters.
- Genome: the collection of genetic information.
- Chromosomes: storage units of genes.
Sexual Reproduction
[Figure: meiosis produces the gametes (egg and sperm), which combine to form the zygote.]
The Double Helix
[Figure: the DNA double helix. Source: Alberts et al.]
Central Dogma
Gene → (transcription) → mRNA → (translation) → Protein
Cells express different subsets of the genes in different tissues and under different conditions.
Chromosome Logical Structure
- Marker: genes, SNPs, tandem repeats.
- Locus: the location of a marker.
- Allele: one variant form of a marker.
Example: Locus 1 has possible alleles A1, A2; Locus 2 has possible alleles B1, B2, B3.
Alleles - the ABO locus example

Phenotype    Genotype
A            A/A, A/O
B            B/B, B/O
AB           A/B
O            O/O

- O is recessive to A; A is dominant over O.
- A and B are codominant.
- Multiple alleles: A, B, O.
- Trait = Character = Phenotype
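The genotype-to-phenotype table for the ABO locus can be written as a small lookup. The following is a minimal sketch; the function name `abo_phenotype` and the `table` variable are illustrative choices, not part of the lecture.

```python
from itertools import combinations_with_replacement

def abo_phenotype(genotype):
    """Map an unordered pair of ABO alleles, e.g. ("A", "O"), to a phenotype."""
    alleles = set(genotype)
    if alleles == {"A", "B"}:
        return "AB"  # A and B are codominant: both are expressed
    if "A" in alleles:
        return "A"   # A is dominant over O
    if "B" in alleles:
        return "B"   # B/B and B/O both show phenotype B
    return "O"       # only O/O shows the recessive phenotype

# Rebuild the phenotype -> genotypes table by enumerating all unordered pairs.
table = {}
for g in combinations_with_replacement("ABO", 2):
    table.setdefault(abo_phenotype(g), []).append("/".join(g))

for phenotype, genotypes in table.items():
    print(phenotype, genotypes)
```

Enumerating unordered pairs reflects that A/O and O/A are the same genotype: six genotypes collapse into four phenotypes.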
Concepts:
1. Recessive and dominant alleles. When both the recessive and the dominant allele are present in a cell, the phenotype determined by the dominant allele prevails.
2. AA and aa are homozygotes for the dominant and the recessive allele, respectively. Aa is a heterozygote.
3. Multiple alleles (A, B, O).
X-linked (sex-linked) inheritance
[Figure: genotype and phenotype.]
- b is the dominant allele: genotypes (b,b) and (b,w) are Black.
- w is the recessive allele: only (w,w) is White.
This is an example of an X-linked (sex-linked) trait/character. For males, b alone is Black and w alone is White, because there is no homologous gene on the Y chromosome.
Mendel’s Work
Modern genetics began with Mendel’s experiments on garden peas (although the ramifications of his work were not realized during his lifetime). He studied seven contrasting pairs of characters, including:
- The form of ripe seeds: round, wrinkled
- The color of the seed albumen: yellow, green
- The length of the stem: long, short
Mendel, Gregor. 1866. Experiments on Plant Hybridization. Transactions of the Brünn Natural History Society.
Mendel’s first law
Characters are controlled by pairs of genes which separate during the formation of the reproductive cells (meiosis).
[Diagram: an Aa individual produces gametes A and a.]
P:  AA × aa
F1: Aa

F1 × F1: Aa × Aa
Gametes: A, a from each parent.
F2:
         A     a
    A    AA    Aa
    a    Aa    aa
Genotypes: 1 AA : 2 Aa : 1 aa
Phenotypes: 3 A : 1 a

Test cross: Aa × aa
Gametes: A, a from the Aa parent; a from the aa parent.
Offspring: Aa, aa
Phenotypes: 1 A : 1 a
Concepts:
1. Crossing F1 with itself: in the F2 generation, the ratio between offspring showing the dominant phenotype and those showing the recessive phenotype is 3:1.
2. Test cross: crossing F1 offspring with the parent that has the recessive phenotype. The ratio between offspring showing the dominant phenotype and those showing the recessive phenotype is 1:1.
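The 1:2:1 genotype ratio, the 3:1 phenotype ratio, and the 1:1 test-cross ratio all follow from enumerating the equally likely gamete pairs. A minimal sketch, assuming one locus with a dominant allele `A` and a recessive allele `a` (the helper names `cross` and `phenotype_ratio` are hypothetical):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Genotype probabilities among offspring of two one-locus genotypes such as "Aa"."""
    # Each parent passes one of its two alleles with probability 1/2 (Mendel's first law).
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

def phenotype_ratio(offspring, dominant="A"):
    """Return (pr(dominant phenotype), pr(recessive phenotype))."""
    dom = sum(p for genotype, p in offspring.items() if dominant in genotype)
    return dom, 1 - dom

f1 = cross("AA", "aa")         # all offspring are Aa
f2 = cross("Aa", "Aa")         # 1 AA : 2 Aa : 1 aa
test_cross = cross("Aa", "aa") # 1 Aa : 1 aa

print(f2, phenotype_ratio(f2))
print(test_cross, phenotype_ratio(test_cross))
```

Exact fractions are used instead of floats so that the ratios come out as 1/4, 1/2, and 3/4 without rounding.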
Mendel's first law. Results of crosses in which parents differed for one character

Parental phenotype               F1         F2                          F2 ratio
1. Round × wrinkled seeds        round      5474 round; 1850 wrinkled   2.96:1
2. Yellow × green seeds          yellow     6022 yellow; 2001 green     3.01:1
3. Purple × white petals         purple     705 purple; 224 white       3.15:1
4. Inflated × pinched pods       inflated   882 inflated; 299 pinched   2.95:1
5. Green × yellow pods           green      428 green; 152 yellow       2.82:1
6. Axial × terminal flowers      axial      651 axial; 207 terminal     3.14:1
7. Long × short stems            long       787 long; 277 short         2.84:1

Conclusion, first law: The two members of a gene pair segregate from each other into the gametes.
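The F2 ratios in the table can be recomputed directly from the counts. This is a small sketch using the counts as given; the dictionary layout and the pooled ratio are illustrative additions.

```python
# F2 counts from the table: (dominant phenotype, recessive phenotype).
f2_counts = {
    "round : wrinkled seeds": (5474, 1850),
    "yellow : green seeds": (6022, 2001),
    "purple : white petals": (705, 224),
    "inflated : pinched pods": (882, 299),
    "green : yellow pods": (428, 152),
    "axial : terminal flowers": (651, 207),
    "long : short stems": (787, 277),
}

# Ratio of dominant to recessive offspring for each character, to two decimals.
ratios = {name: round(dom / rec, 2) for name, (dom, rec) in f2_counts.items()}

# Pooling all seven experiments (an illustrative summary, not in the slides).
total_dom = sum(dom for dom, _ in f2_counts.values())
total_rec = sum(rec for _, rec in f2_counts.values())
pooled = total_dom / total_rec

for name, ratio in ratios.items():
    print(f"{name}: {ratio}:1")
print(f"pooled: {pooled:.2f}:1")
```

Each ratio is close to the 3:1 expected when F1 heterozygotes are crossed with each other.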
[Figure: an example of a pedigree with a recessive mutation (a marriage between cousins).]
Polydactyly – A dominant mutation
Brachydactyly – A dominant mutation
Maximum Likelihood Principle
- What is the probability of the data for this pedigree, assuming a recessive mutation?
- What is the probability of the data for this pedigree, assuming a dominant mutation?
Maximum likelihood principle: choose the model that maximizes the probability of the data.
One locus: founder probabilities
Founders are individuals whose parents are not in the pedigree. They may or may not be typed (namely, have their genotype measured). Either way, we need to assign probabilities to their actual or possible genotypes. This is usually done by assuming Hardy-Weinberg equilibrium (H-W). If the frequency of D is .01, then for a founder 1 with genotype Dd, H-W says:
pr(Dd) = 2 × .01 × .99
Genotypes of founder couples are (usually) treated as independent. For a father 1 with genotype Dd and a mother 2 with genotype dd:
pr(pop Dd, mom dd) = (2 × .01 × .99) × (.99)^2
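The Hardy-Weinberg founder probabilities can be computed for any allele frequency. A minimal sketch, assuming a two-allele locus (D, d); the function name `hardy_weinberg` is an illustrative choice.

```python
def hardy_weinberg(p):
    """Genotype probabilities at a two-allele locus, where p is the frequency of D."""
    q = 1 - p  # frequency of allele d
    return {"DD": p * p, "Dd": 2 * p * q, "dd": q * q}

hw = hardy_weinberg(0.01)

# A single founder with genotype Dd: 2 x .01 x .99
pr_father_Dd = hw["Dd"]

# A founder couple, treated as independent: (2 x .01 x .99) x (.99)^2
pr_couple = hw["Dd"] * hw["dd"]

print(pr_father_Dd, pr_couple)
```

The three genotype probabilities sum to 1 because p² + 2pq + q² = (p + q)².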
One locus: transmission probabilities
Children get their genes from their parents’ genes, independently, according to Mendel’s laws; also independently for different children.
[Pedigree: father 1 (Dd), mother 2 (Dd), child 3 (dd).]
pr(kid 3 dd | pop 1 Dd & mom 2 Dd) = 1/2 × 1/2
One locus: transmission probabilities - II
[Pedigree: parents 1 (Dd) and 2 (Dd); children 3 (dd), 4 (Dd), 5 (DD).]
pr(3 dd & 4 Dd & 5 DD | 1 Dd & 2 Dd) = (1/2 × 1/2) × (2 × 1/2 × 1/2) × (1/2 × 1/2).
The factor 2 comes from summing over the two mutually exclusive and equiprobable ways 4 can get a D and a d.
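The transmission probabilities above, including the factor 2 for the heterozygous child, fall out of enumerating the four equally likely allele pairs a child can receive. A minimal sketch; `transmission` and `sibship` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

def transmission(child, father, mother):
    """pr(child genotype | parental genotypes) at one locus, by Mendel's first law."""
    target = sorted(child)
    # Each of the 2 x 2 (paternal allele, maternal allele) pairs has probability 1/4.
    ways = sum(1 for a, b in product(father, mother) if sorted(a + b) == target)
    return Fraction(ways, 4)

def sibship(children, father, mother):
    """Joint probability of several children, independent given the parents."""
    prob = Fraction(1)
    for child in children:
        prob *= transmission(child, father, mother)
    return prob

p_dd = transmission("dd", "Dd", "Dd")           # 1/2 x 1/2
p_Dd = transmission("Dd", "Dd", "Dd")           # 2 x 1/2 x 1/2: two ways to get D and d
p_all = sibship(["dd", "Dd", "DD"], "Dd", "Dd") # product over the three children

print(p_dd, p_Dd, p_all)
```

Counting matching allele pairs is what produces the factor 2: the child can receive D from the father and d from the mother, or the reverse.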
One locus: penetrance probabilities
Pedigree analyses usually suppose that, given the genotype at all loci, and in some cases age and sex, the chance of having a particular phenotype depends only on the genotype at one locus, and is independent of all other factors: genotypes at other loci, environment, genotypes and phenotypes of relatives, etc.
Complete penetrance: pr(affected | DD) = 1
Incomplete penetrance: pr(affected | DD) = .8
One locus: penetrance - II
Age- and sex-dependent penetrance (liability classes). For a male with genotype DD, aged 45:
pr(affected | DD, male, 45 y.o.) = .6
Incomplete penetrance: an example of a dominant mutation in which the mutant phenotype is not always expressed.
[Figure: this healthy woman passes the dominant mutation on to her daughter.]
One locus: putting it all together
[Pedigree: parents 1 (Dd) and 2 (Dd); children 3 (dd), 4 (Dd), 5 (DD).]
Assume penetrances pr(affected | dd) = .1, pr(affected | Dd) = .3, pr(affected | DD) = .8, and that allele D has frequency .01. The probability of the data for this pedigree, assuming penetrances of α1 = 0.1 and α2 = 0.3, is the product:
(2 × .01 × .99 × .7) × (2 × .01 × .99 × .3) × (1/2 × 1/2 × .9) × (2 × 1/2 × 1/2 × .7) × (1/2 × 1/2 × .8)
This is a function of the penetrances. By the maximum likelihood principle, the values of α1 and α2 that maximize this probability are the ML estimates.
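The product above combines founder, transmission, and penetrance terms, and can be evaluated as a function of the penetrances. The sketch below makes one assumption that the text does not state explicitly: the affection statuses are read off the penetrance factors in the product (individuals 1, 3 and 4 unaffected; 2 and 5 affected). The function name `pedigree_likelihood` and the grid search are illustrative only.

```python
def pedigree_likelihood(pen, p=0.01):
    """Probability of the five-person pedigree for given penetrances.

    pen maps genotype -> pr(affected | genotype); p is the frequency of D.
    Assumed statuses (inferred from the factors .7, .3, .9, .7, .8):
    individuals 1, 3, 4 unaffected; individuals 2, 5 affected.
    """
    q = 1 - p
    hw_Dd = 2 * p * q                         # Hardy-Weinberg probability of Dd
    founder1 = hw_Dd * (1 - pen["Dd"])        # 1: Dd, unaffected
    founder2 = hw_Dd * pen["Dd"]              # 2: Dd, affected
    child3 = 0.5 * 0.5 * (1 - pen["dd"])      # 3: dd, unaffected
    child4 = 2 * 0.5 * 0.5 * (1 - pen["Dd"])  # 4: Dd, unaffected
    child5 = 0.5 * 0.5 * pen["DD"]            # 5: DD, affected
    return founder1 * founder2 * child3 * child4 * child5

likelihood = pedigree_likelihood({"dd": 0.1, "Dd": 0.3, "DD": 0.8})
print(likelihood)

# Coarse grid search over alpha1 = pr(affected | dd) and alpha2 = pr(affected | Dd),
# holding pr(affected | DD) fixed at .8 (an illustrative choice).
grid = [i / 100 for i in range(101)]
best = max(
    ((a1, a2) for a1 in grid for a2 in grid),
    key=lambda a: pedigree_likelihood({"dd": a[0], "Dd": a[1], "DD": 0.8}),
)
print(best)
```

Under these assumptions the likelihood is proportional to (1 − α1) × α2 × (1 − α2)², so the grid search is only a stand-in for a closed-form maximization. In real pedigrees genotypes are typically not all observed, and the likelihood is a sum over possible genotypes.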
Mendel’s second law
When two or more pairs of genes segregate simultaneously, they do so independently.
For a parent with genotype Aa; Bb, the gamete probabilities are:
AB: P_AB = P_A × P_B
Ab: P_Ab = P_A × P_b
aB: P_aB = P_a × P_B
ab: P_ab = P_a × P_b
Mendel's second law. A dihybrid cross for color and shape of pea seeds
P:  wrinkled and yellow (rrYY) × round and green (RRyy)
F1: round yellow (RrYy) × round yellow (RrYy)
F2:
round yellow       315
round green        108
wrinkled yellow    101
wrinkled green      32
Total              556
a. Check the segregation pattern for each character in F2:
416 yellow : 140 green (2.97:1)
423 round : 133 wrinkled (3.18:1)
Conclusion: both traits behave as single genes, each carrying two different alleles.
Question: Is there independent assortment of alleles of the different genes?
- Probability to get yellow is 3/4; probability to get round is 3/4; probability to get yellow round is 3/4 × 3/4, namely 9/16.
- Probability to get yellow is 3/4; probability to get wrinkled is 1/4; probability to get yellow wrinkled is 3/4 × 1/4, namely 3/16.
- Probability to get green is 1/4; probability to get round is 3/4; probability to get green round is 1/4 × 3/4, namely 3/16.
- Probability to get green is 1/4; probability to get wrinkled is 1/4; probability to get green wrinkled is 1/4 × 1/4, namely 1/16.
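Under independent assortment, each joint phenotype probability is the product of the two single-character probabilities. A minimal sketch of the 9:3:3:1 computation (variable names are illustrative):

```python
from fractions import Fraction

# Single-character F2 phenotype probabilities (3:1 for each character).
color = {"yellow": Fraction(3, 4), "green": Fraction(1, 4)}
shape = {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)}

# Independence: pr(color and shape) = pr(color) x pr(shape).
joint = {(c, s): pc * ps for c, pc in color.items() for s, ps in shape.items()}

for (c, s), prob in joint.items():
    print(f"{c} {s}: {prob}")
```

The four products are 9/16, 3/16, 3/16 and 1/16, which sum to 1.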
A standard presentation in terms of expected counts

Phenotype          Expected ratio   Expected count   Observed
yellow round       9                312.75           315
yellow wrinkled    3                104.25           101
green round        3                104.25           108
green wrinkled     1                34.75            32
Total              16               556              556

Conclusion, second law: Different gene pairs assort independently in gamete formation.
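The expected counts in the table are the 9:3:3:1 ratios scaled to the 556 observed seeds. A short sketch that reproduces them and shows how far each observed count deviates (the variable names are illustrative):

```python
observed = {
    "yellow round": 315,
    "yellow wrinkled": 101,
    "green round": 108,
    "green wrinkled": 32,
}
ratio = {
    "yellow round": 9,
    "yellow wrinkled": 3,
    "green round": 3,
    "green wrinkled": 1,
}

total = sum(observed.values())      # 556 seeds
parts = sum(ratio.values())         # 16 parts in a 9:3:3:1 ratio

# Expected count for each class under independent assortment.
expected = {k: total * r / parts for k, r in ratio.items()}

# Deviation of observed from expected counts.
deviation = {k: observed[k] - expected[k] for k in observed}

for k in observed:
    print(f"{k}: expected {expected[k]}, observed {observed[k]}, deviation {deviation[k]:+}")
```

All four deviations are only a few seeds out of several hundred, which is what the slide's conclusion rests on.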
“Exceptions” to Mendel’s Second Law
Morgan’s fruit fly data (1909): 2,839 flies
Eye color: A = red, a = purple. Wing length: B = normal, b = vestigial.
Crosses: AABB × aabb gives AaBb; then AaBb × aabb gives:

Genotype   Expected   Observed
AaBb       710        1,339
Aabb       710        151
aaBb       710        154
aabb       710        1,195

The alleles A and B stick together more than expected from Mendel’s law.
Morgan’s explanation
[Figure: chromosome diagrams for the crosses AB/AB × ab/ab (F1: AB/ab) and AB/ab × ab/ab (F2). The F2 chromosomes that carry A with b, or a with B, are labeled "Crossover has taken place".]
Parental types: AaBb, aabb
Recombinants: Aabb, aaBb
The proportion of recombinants between the two genes (or characters) is called the recombination fraction between these two genes. It is usually denoted by r or θ.
For Morgan’s traits: r = (151 + 154)/2839 = 0.107
If r < 1/2: the two genes are said to be linked.
If r = 1/2: independent segregation (Mendel’s second law).
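The recombination fraction for Morgan's data is the share of recombinant offspring among all offspring. A minimal sketch using the observed counts from the test cross AaBb × aabb (variable names are illustrative):

```python
# Observed offspring counts from Morgan's test cross.
counts = {"AaBb": 1339, "Aabb": 151, "aaBb": 154, "aabb": 1195}
recombinant = {"Aabb", "aaBb"}  # offspring that carry a recombinant gamete

total = sum(counts.values())
r = sum(counts[g] for g in recombinant) / total

# Under independent segregation every class would be expected at total / 4.
expected_if_independent = total / 4

linked = r < 0.5
print(f"r = {r:.3f}, linked = {linked}, expected per class = {expected_if_independent}")
```

Here r is well below 1/2, so the two genes are classified as linked, which matches the large excess of parental types over the 710 expected per class.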
Recombination Phenomenon (happens during meiosis)
[Figure labels: male or female; recombination; haplotype; gametes: egg or sperm.]
[Figure: paired chromosomes showing chiasmata.]
The chiasma is the cytological manifestation of crossing over.
Example: ABO, AK1 on Chromosome 9
[Pedigree figure: five individuals (1-5) typed for ABO blood group and AK1 genotype (A1/A1, A2/A2, A1/A2), with the inferred phase shown and one recombinant offspring marked.]
Recombination fraction is 12/100 in males and 20/100 in females.
One centimorgan means one recombination per 100 meioses. One centimorgan corresponds to approximately 1 million nucleotides (with large variance), depending on location and sex.
[Figure: conventional symbols used in pedigrees.]