Current Status and Future of Y-STR Analysis. AFDAA Workshop June 2012

Author: Shanon Daniels
Current Status and Future of Y-STR Analysis AFDAA Workshop June 2012 Jack Ballantyne, PhD National Center for Forensic Science University of Central Florida

UCF

Schedule Part 1. Biology and Forensic Uses • Y Chromosome Biology • Forensic Use of Y-STRs – Development and Validation of Y-STR Multiplexes – Extended Interval Post-Coital Samples – Improved Detection of Male DNA

• Future Developments Part 2. Y-STR Match Statistics • Basic Y-STR statistics • Use of the National US YSTR Database

Y Chromosome Biology

Function of Y Chromosome? • Aneuploidies for the X and Y – 47, XXY (Klinefelter synd.) → males • 48, XXXY; 49, XXXXY; 50, XXXXXY → males – 45, XO (Turner synd.) → females – 47, XXX (triple-X karyotype) ‘normal’ female – 47, XYY karyotype ‘normal’ male • Sex Reversed Humans – XY → female (Y minus TDF) – XX → male (X plus TDF)


Classic View of Y-Chromosome

TDF master gene

patrilineal inheritance

no recombination in NRY recombination in PAR

junk-rich, gene poor


SDA: sex determining allele; PAR: pseudo-autosomal region; SRY: sex determining region on the Y


Y Chromosome NRY is a Mosaic of Discrete Sequence Classes Euchromatin = 23 Mb


Mutation rates: Yfiler: ~3 × 10^-3; RM: ~2 × 10^-2


Forensic Uses of Y-Chromosome STRs

Reasons Y? • Males – 80% of all violent crime – 95% of all sex offenses – (Criminal Victimization in United States, BJS 2001)

• When trying to determine the genetic profile of the male donor in a male/female DNA admixture (when F/M > 20, often >1000) and autosomal STR analysis fails (is not informative) or not possible • Determination of number of semen donors • Missing persons (MP) – haplotype of MP determined by typing male relative • son, brother, father, uncle, nephew

Reasons Y? • Additional statistical discrimination – mixture/partial profiles

• Familial Searching – Reduce number of potential male relatives obtained by low stringency match of sample profile to offender database

• RM Y-STRs may differentiate father/son/grandfather etc

Y-STR Development Timeline (1997 to 2012)
• Kayser et al: ‘minimal haplotype’ (1997)
• White et al: novel Y-STRs
• Ayub et al: novel Y-STRs
• Kayser et al: novel Y-STRs
• Prinz et al: forensic validation (pentaplex)
• UCF: begin Y-chromosome project
• Y-PLEX-5™ released (Reliagene)
• Y-PLEX-6™ released (Reliagene)
• Y-PLEX-12™ released (Reliagene)
• PowerPlex® Y released (Promega)
• SWGDAM core loci
• Yfiler™ released (ABI)
• Hanson et al: Y-STR physical map
• US Y-STR Database released (online)
• K. Ballantyne et al: rapidly mutating Y-STRs (RM loci)
• PowerPlex® Y23 (Promega)

“Minimal Haplotype” Loci (Kayser et al. 1997): DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385 I/II

• First used in Europe • Formed the basis for multiplex development in the U.S.

SWGDAM Core Loci • Recommended in 2003

• DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385 I/II • Also: DYS438, DYS439

Commercially Available Y-STR Multiplex Kits

• Reliagene: Y-PLEX-5, -6, and -12 – No longer available

• ABI: YfilerTM – 17 loci in a single reaction

• Promega: PowerPlex YTM – 12 loci in a single reaction

Novel Markers?

DYS388, DYS463, DYS522, DYS588, DYS688, DYS425, DYS464, DYS525, DYS590, DYS698, DYS385, DYS426, DYS468, DYS527, DYS591, DYS703, DYS389I/II, DYS434, DYS476, DYS531, DYS593, DYS707, DYS435, DYS481, DYS533, DYS594, Y-GATA-A7.1, DYS436, DYS484, DYS534, DYS596, Y-GATA-A7.2, DYS391, DYS441, DYS485, DYS535, DYS598, Y-GATA-A10, DYS392, DYS442, DYS488, DYS540, DYS607, YAP, DYS443, DYS490, DYS543, DYS617, DYS444, DYS494, DYS549, DYS622, DYS437, DYS445, DYS495, DYS552, DYS627, DYS438, DYS446, DYS497, DYS556, DYS630, DYS447, DYS499, DYS557, DYS631, DYS449, DYS503, DYS561, DYS634, DYS448, DYS450, DYS505, DYS562, DYS636, DYS456, DYS452, DYS508, DYS570, DYS638, DYS453, DYS510, DYS572, DYS641, DYS454, DYS511, DYS575, DYS643, Y-GATA-C4, DYS455, DYS513, DYS576, DYS637, Y-GATA-H4, DYS459, DYS520, DYS578, DYS660, DYS462, DYS521, DYS587, DYS685, DYS19, DYS390, DYS393, DYS439, DYS458

Legend: NCFS loci; SWGDAM core + commercial kits

NCFS Developed Multiplexes
• Multiplex I (9 loci)
• Multiplex II (10 loci)
• Multiplex III (8 loci, 1 InDel)
• Multiplex IV (19 loci)
• Multiplex V (13 loci)
• Multiplex VI (14 loci)
• Multiplex VII (12 loci)
• Multiplex VIII (11 loci)
• Multiplex IX (7 loci)
• Multiplex X (6 loci)

= 109 Markers

*Additional 25 loci screened and rejected → 34% of known loci evaluated

Sexual Assault Investigations

SEXUAL ASSAULT INVESTIGATION • Assume that a woman (victim, V) is raped by an individual (perpetrator, P) but shortly before this she has consensual intercourse with her husband/boyfriend (B) • Before consideration of genetic marker results: – semen may only have come from P – semen may only have come from B – semen may comprise a mixture of semen from P and B – semen may not be present

• Absence of semen from P does not mean that P did not rape V – semen from P does not prove he raped V (could be consensual)

Occasional Problem
• some rape victims provide vaginal samples > 24-36 hours after the incident
• ability to obtain an autosomal STR profile of the semen donor from the living victim diminishes rapidly as the post-coital interval is extended (> 24-48 h difficult, > 72 h not normally possible)
• from classical forensic serology and reproductive biology studies – sperm persist in the post-coital vaginal canal up to 3 days after intercourse – from the medical literature sperm may be detectable up to 7 days after intercourse in the cervix
• sperm are few in number after these extended intervals

Sperm Loss Over Time
1. vaginal lavage
2. vaginal drainage
3. normal intra-cervicovaginal sperm degradative changes
   – sperm become damaged and fragile
4. below analytical sensitivity of the test
5. during the differential extraction process within the laboratory
   – multiple manipulations of the sample
   – few remaining sperm lyse into female fraction
     » kinetics of the PCR process
     » majority component in an admixture will titrate out critical PCR reagents
     » female/male DNA ratio > 300/1
     » failure to type the male component

Potential Solutions to Problem? • use Y-STRs – theoretically can detect male in female/male admixtures • ‘ignores’ female component

– no differential extraction • avoids unnecessary sample loss

• low copy number approach – add large quantities of DNA (300-450 ng) • to permit adequate sampling of small number of still persisting sperm

– increased PCR cycle number (34-36 cycles)

• ensure cervicovaginal sampling – low to mid-vaginal sampling may not recover sperm after extended post coital interval

Not All Y-STRs Are Made Equal • Y chromosome retains high level of sequence homology with the X

• Different Y-STR primers will possess different degrees of homology with the X chromosome

Early Work Conclusions
• it is possible to obtain the genetic profile of the semen donor in post-coital cervicovaginal samples recovered up to 4 d after intercourse
  – use of carefully selected Y-STR markers • chosen for their ability to detect the male donor in an overwhelming quantity of background female DNA
  – no differential extraction
  – 300-450 ng of input template DNA
  – increased cycle number (34-35 cycles)
  – cervicovaginal sampling

• such strategies may significantly impact the recovery and processing of rape evidence

More recent work • Number of DNA profile enhancement strategies employed – Cervical brushing – Differential versus non-differential DNA extraction – Post-PCR purification

• Standard manufacturers’ cycling conditions used • Y-STR profiles – Full profiles routinely 3-5 days after intercourse – Profiles > 6 days after intercourse but mainly partial – Use of post PCR purification significantly improved ability to obtain profile, especially from the 5-6 day samples – Better profiles with differential lysis (sperm fraction) – 8 locus Y-STR profile from 7 day post-coital sample • Approaching limit for sperm detection in cervix

Enhanced DNA Profiling for Detection of the Male Donor in Trace DNA Samples

Nested-PCR Strategy

MinElute purification to remove excess primers

Nested PCR Pre-Amplification Multiplex

Yfiler Amplification kit (ABI)

Improvement in allele recovery and signal intensity with prior pre-amplification

Alleles recovered without versus with pre-amplification:
• 5 pg input male DNA (single source): electropherograms shown without and with pre-amp
• LCM, 1 epithelial cell: 1/17 alleles without, 11/17 alleles with
• Door handle: 0/17 alleles without, 17/17 alleles with
• 7 days after intercourse: 5/17 alleles without, 16/17 alleles with

FUTURE?

• Improved Discrimination Additional Y-STR Markers (new kits) • Incorporation into CODIS or other local investigative Databases • Ability to distinguish males from same lineage • Surname calling card

Novel Y-STR Markers • Promega: PowerPlex Y23 TM – 23 loci in a single reaction – Not yet available

• RM loci • Work done at NCFS

PowerPlex® Y23 • 17 Y-STR loci currently contained in commercially available Y-STR kit (YfilerTM) • Six new highly discriminating Y-STR loci: – DYS481, DYS533, DYS549, DYS570, DYS576, DYS643

• Adds more evidential power to a forensic Y-STR analysis • Lebanese population (n = 502) – Y Filer 431 unique haplotypes, 86% unique – Y-23 478 unique haplotypes, 95% unique

RM loci • With current Y-STRs, often a failure to distinguish between males of the same lineage • Possible to utilize highly mutating Y-STR loci – Differences between males in same lineage

• 13 Y-STRs with the highest mutability recommended – “RM Y-STR” loci – Ballantyne, K. et al. Am. J. Hum Genet 2010

• Likely to have significant impact on Y-STR analysis – Additional discriminatory power to currently used loci – In some cases differentiate male lineage relatives

Example (RM Loci)

Son (Twin 1)

Father

Son (Twin 2)

Statistics

“A detailed understanding of the influence of all factors on the evolution of profile proportions requires a lifetime of study, and more” (Balding 2005)

“Some people believe football is a matter of life and death. I'm very disappointed with that attitude. I can assure you it is much, much more important than that.” (Bill Shankly, 1960s)

“Those who cannot learn from history are doomed to repeat it” George Santayana – The Life of Reason (1905-1906)

Tasteful Statistics Jokes • A statistician is a professional who diligently collects facts and data and then carefully draws confusions about them • You can lie with statistics but even better without • Statistics means you never have to say you’re certain (wrong)


Seriously True Precepts • All models are wrong. Some are useful. – George Box

• There are no facts, only interpretations. – Friedrich Nietzsche


Basic Y-STR Interpretation Guidelines

Similar general issues to autosomal STRs • Thresholds for detection and interpretation • Stutter • Probability of a match (STATISTICS!) • Mixtures

Profile and Match Probabilities • estimating the rarity of a Y DNA profile is performed differently than for autosomal DNA markers • because of linkage, each haplotype is treated as an allele and the total number of possible haplotypes comprise the alleles of a single locus – Composite multi-locus profile is treated as a single locus or haplotype

• no evidence for recombination across the majority of the Y-chromosome • cannot employ the product rule to estimate the rarity of the Y types in a profile

Current Approach: Counting Method • The counting method is very simple • A Y-haplotype (evidence sample) is compared to a reference database(s) of unrelated individuals • The number of times the Y-haplotype is observed in the database is counted – The size of a database can be and is often limited – With databases (e.g., n = 100 to 20,000), many possible haplotypes will not be observed and there will be sampling error

• A confidence interval can be placed on the observation (ie statistical sampling correction) – Can convey with a high degree of confidence that the rarity of the evidence Y-haplotype among unrelated individuals in a given population(s) is less than the upper bound of the estimate

Estimation of haplotype frequencies • If haplotype A is seen x times in a sample of n haplotypes, the sample frequency is $\tilde{p}_A = x/n$ • Tendency to – regard this quantity as being normally distributed – acknowledge sampling variation with a 100(1-α)% confidence interval in the form of:

$$\tilde{p}_A \pm z_{1-\alpha/2}\sqrt{\frac{\tilde{p}_A(1-\tilde{p}_A)}{n}}$$

For 95% confidence, α = 0.05 so that $z_{1-\alpha/2} = 1.96$ (two-sided) and $z_{1-\alpha} = 1.645$ (one-sided)
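The normal-approximation upper limit can be sketched in a few lines. This is only an illustration: the function name `normal_upper_limit` is made up here, and the inputs (x = 1 observation in n = 100 haplotypes) are taken from the first row of the comparison table later in the deck.

```python
import math

def normal_upper_limit(x, n, z):
    """Upper limit x/n + z * sqrt(p(1-p)/n) under the normal approximation."""
    p = x / n  # sample haplotype frequency
    return p + z * math.sqrt(p * (1 - p) / n)

# One haplotype observation in a database of 100 haplotypes.
one_tail = normal_upper_limit(1, 100, 1.645)  # one-sided 95% limit
two_tail = normal_upper_limit(1, 100, 1.96)   # upper end of the two-sided 95% interval
print(round(one_tail, 4), round(two_tail, 4))  # 0.0264 0.0295
```

With such a small expected count (n × p = 1), the normal approximation is being used outside the range where it is reliable, which is the point made on the next slide.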

Estimation of Haplotype Frequencies • The actual sampling distribution is binomial

• This is approximately normal ONLY for moderately large values of $n\,p_A$ (> 5)

Estimation of haplotype frequencies • An exact confidence interval based on the binomial distribution was described by Clopper and Pearson • An upper 100(1-α)% one-sided limit $p_0$ is found by solving:

– If the sample does not contain any copies of A, then $p_0 = 1 - \alpha^{1/n}$ – For a 95% confidence limit, this is approximately equal to 3/n once n is 100 or more

Clopper-Pearson Exact Confidence Interval Clopper, C.J. and E.S. Pearson, The use of confidence or fiducial intervals illustrated in the case of the binomial. Biometrika (1934). 26: p. 64-69.

n k nk     k  p0 (1  p0 )  0.05 k 0   x

The formula is the cumulative binomial distribution for all values from 0 matches to k matches (the observed count, x in the formula) given a database size of N (n in the formula) and a frequency of p. Since N and k are fixed after a search, the goal is to determine the p at which 95% of the observations are expected to be more than k, and 5% of the observations (the 0.05 in the formula) are expected to be between 0 and k. This program increases p by small increments until this balance point (~95% of the possible comparisons expected to give you >k matches, and ~5% expected to give you k or fewer matches) has been reached. By finding what amounts to a left-hand 1-sided 95% confidence interval (i.e., the lower limit) for the distribution of possible matches given the frequency p (all as it relates to the k and N observed from the search), this then also provides a 95% upper limit for p. Beyond that point, it is considered too unlikely that a haplotype with a more common frequency would give so few matches.


If the haplotype has not been observed in the database, then the upper (95%) bound of the CI is

$$1 - (0.05)^{1/N}$$

or, by the ‘rule of 3’, approximately 3/N
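The Clopper-Pearson upper limit and the zero-observation shortcut can be checked numerically. The sketch below is an assumption-laden illustration, not the database's actual program: it uses bisection rather than the small-increment search described above, and the function names are invented for this example.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(K <= x) for K ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def clopper_pearson_upper(x, n, alpha=0.05):
    """One-sided upper limit p0 solving binom_cdf(x, n, p0) = alpha (bisection)."""
    if x >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # The CDF falls as p rises, so a CDF above alpha means p is still too small.
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 100
unseen_exact = clopper_pearson_upper(0, n)   # haplotype not observed
unseen_closed = 1 - 0.05 ** (1 / n)          # closed form for x = 0
rule_of_three = 3 / n
seen_once = clopper_pearson_upper(1, n)      # haplotype observed once
print(unseen_exact, unseen_closed, rule_of_three, seen_once)
```

For x = 0 the bisection result agrees with the closed form, and both sit just under 3/n, which is why the rule of 3 is a slightly conservative shortcut.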

Exact vs. Normal Confidence Intervals

n         X     P         HP (1-tail)   HP (2-tail)   CP
100       1     0.01      0.026         0.029         0.047
100       2     0.02      0.043         0.047         0.062
100       10    0.10      0.149         0.159         0.164
1,000     1     0.001     0.0026        0.0029        0.0047
1,000     2     0.002     0.0043        0.0048        0.0063
1,000     10    0.010     0.0152        0.0162        0.0169
10,000    1     0.0001    0.0003        0.0003        0.0005
10,000    2     0.0002    0.0004        0.0005        0.0006
10,000    10    0.0010    0.0015        0.0016        0.0017

National U.S. Y-STR Database

Y-STR Database Goals • To compile and consolidate Y-STR data from all available ‘legitimate’ sources • To create a Y-STR Consortium comprised of stake holders and data contributors from the forensic community • Expand data – Type additional samples using core loci

• Provide custodial and managerial responsibility • Develop quality indicators for data inclusion and submission – ‘Proficiency testing’ for labs who wish to contribute data – Screen data and remove duplicate & related samples

• Ensure allele-call consistency among different primer sets • Provide accessibility and statistical data to the forensic community via the Internet

Y-STR Consortium Members
• NCFS – Jack Ballantyne – Lyn Fatolitis
• University of Arizona – Mike Hammer
• NIST – John Butler
• MN Dept of Public Health – Ann Marie Gross
• University of North Texas – Arthur Eisenberg
• FBI – Bruce Budowle
• NYC OCME – Mecki Prinz
• Applied Biosystems – Lisa Calandro
• Promega – Curtis Knox
• ReliaGene – Sudhir Sinha
• Orchid Cellmark – Cassie Johnson
• NIJ – John Paul Jones

The Y-STR Consortium was formed at the 2006 AAFS Meeting in Seattle, WA to assist in sample consolidation and the design and development of the database.

Y-STR Interpretation Guidelines Scientific Working Group on DNA Analysis Methods (SWGDAM) FSC January 2009

5. Statistical Interpretation 5.1. Y-STR loci are located on the non-recombining part of the Y-chromosome and, therefore, should be considered linked as a single locus. A Y-STR database must consist of haplotype frequencies rather than allele frequencies. The source of the population database(s) used should be documented. Relevant population(s) for which the frequency will be estimated should be identified. A consolidated U.S. Y-STR database (http://usystrdatabase.org) has been established and should be used for population frequency estimation. A number of other Y-STR haplotype frequency databases exist online. (See available listing on the NIST [National Institute of Standards and Technology] STRBase Web site at http://www.cstl.nist.gov/biotech/strbase/y_strs.htm.)

Database Home Page: www.usystrdatabase.org


Release 2.6 – Current Version
• Was made available on January 3, 2012
• Comprised of 18,719 haplotypes
  – An additional 61 Yfiler haplotypes were uploaded • 35 Caucasian • 3 African American • 7 Asian • 16 Hispanic
  – San Diego Sheriff’s Crime Laboratory (39 samples)
  – Santa Clara County Crime Laboratory (22 samples)

• 6301 African American (All, Undefined, or Select by State)
• 1008 Asian (All, Asian, Chinese, Filipino, Oriental, S. Indian, Vietnamese, Jordanian)
• 6998 Caucasian (All, US, Canada, Europe, Undefined)
• 3429 Hispanic (All, Undefined, or Select by State)
• 983 Native American (All, Apache, Navajo, Shoshone, Sioux)

Samples by contributor:
• NCFS: 2,440
• ReliaGene: 3,037
• Promega: 3,800
• Applied Biosystems: 6,159
• University of Arizona: 2,462
• The Illinois State Police: 398
• Orange County CA Coroner: 30
• Santa Clara County: 143
• CA DOJ Sacramento: 32
• Marshall University: 113
• WSP Vancouver: 40
• Richland County Sheriff SC: 7
• San Diego Sheriff: 39

• Release 2.6 contains 18,719 samples with a complete 11-marker SWGDAM core haplotype
• 15,395 samples have a complete 12-marker PowerPlex Y haplotype
• 8,548 samples have a complete 17-marker Yfiler haplotype





Release 2.6 contains 15,395 complete PowerPlex Y (12-locus) haplotypes. Of these, 6,194 haplotypes are unique (i.e., seen only once in the database) while 9,201 haplotypes are seen more than once, giving a DP of 40.2%. The database contains 8,548 complete Yfiler (17-locus) haplotypes. Of these, 6,934 haplotypes are unique while 1,614 haplotypes are seen more than once, giving a DP of 81.1%.
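The DP figures quoted here are the fraction of database samples whose haplotype is seen only once. A small sketch of that arithmetic follows; the helper `fraction_unique` and the toy haplotype labels are hypothetical, while the counts are the ones stated on the slide.

```python
from collections import Counter

def fraction_unique(haplotypes):
    """Fraction of samples whose haplotype occurs exactly once in the list."""
    counts = Counter(haplotypes)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(haplotypes)

# Toy database: h2 is shared by two samples; h1, h3 and h4 are unique.
toy = ["h1", "h2", "h2", "h3", "h4"]
toy_dp = fraction_unique(toy)

# Slide figures: unique haplotypes / complete haplotypes.
dp_powerplex_y = 6194 / 15395
dp_yfiler = 6934 / 8548
print(toy_dp, round(100 * dp_powerplex_y, 1), round(100 * dp_yfiler, 1))
```

The jump from roughly 40% to roughly 81% shows how much the five extra Yfiler loci reduce haplotype sharing within the same database.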

• Since the release of the US Y-STR Database in January 2008, over 86,200 database search queries have been performed. Up until January of 2011, the database had an average of approximately 800 searches per month. Between January and August 2011 the average use increased to >8,000 searches per month. Since this time, the average is over 1200 searches per month.

Court Support Frye Hearings

• State of California v Miszkewycz (county of Placer)

– “..contends that the statistical analysis applied in this case is faulty because of an unreliable or unknown data base used to formulate the statistics” – “Court is satisfied that accepted scientific procedures and principals (sic) were properly used in this case” (After “Ms Caser’s” testimony)

• State of Kansas v Gonzalez

– Judge Pokorny, Seventh Judicial District for the District of Kansas, Douglas County, KS – Challenge that over a period of six months the frequency of the evidence/suspect haplotype changed from 1/2717 to 1 in 1786 – 4 October 2010: found that Y-STR database is fit for purpose and motion to deny/exclude Y-STR haplotype evidence denied

Peer Reviewed Journal Article

• US forensic Y-chromosome short tandem repeats database. Ge, Budowle, Planz, Eisenberg, Ballantyne, Chakraborty. Legal Medicine 12 289-295 (2010)

Database Expansion / Data Solicitation
• Created Sample Submission section on database website – Created quality control competency testing procedure using liquid blood samples donated by UNT for this purpose – Certificate of Participation is issued to qualifying laboratories – Sample submission template and information available on website
• Solicitation for data was posted on our behalf by Lynne Burley of Santa Clara County Crime Lab on the Yahoo group, forens-DNA, a technical discussion group of forensic DNA technology
• Updated U.S. Department of Justice and NCFS websites to solicit for samples and / or data
• Routinely make appeals for samples and data at all meetings, presentations, workshops, etc.

Certificate No.: 00010

DuPage County Forensic Science Center 501 North County Farm Road Wheaton, IL 60187 USA

The National Center for Forensic Science University of Central Florida 12354 Research Parkway, Suite 225 Orlando, FL 32816-2367 USA

US Y-STR Database

Certificate of Participation

DuPage County Forensic Science Center has participated in the Y-STR Haplotyping Quality Assurance Exercise

The alleles at all loci tested have been typed correctly according to the published nomenclature and the ISFG guidelines for Y-STR Analysis (Int J Legal Med 114 (2001) 305-309)

Granted: March 30, 2010

Jack Ballantyne, Ph.D. Associate Director (Research)

Lyn Fatolitis US Y-STR Database Manager


QC Participants
– IL State Police Crime Laboratory
– Jan Bashinski DNA Crime Laboratory, CA Department of Justice
– Orange County CA Sheriff – Coroner, Forensic Science Services
– State of Connecticut Forensic Science Services Laboratory
– County of Santa Clara CA Crime Laboratory, Office of the DA
– CA Department of Justice, Sacramento Crime Laboratory
– WA State Patrol Crime Laboratory – Vancouver
– Marshall University Forensic Science Center
– AZ Department of Public Safety Central Regional Crime Laboratory
– DuPage County IL Forensic Science Center
– Richland County Sheriff’s DNA/Trace Department – SC
– Miami-Dade Police Department Forensic Services Bureau
– Michigan State Police Biology Unit – Lansing
– San Diego Sheriff’s Regional Crime Laboratory
– FBI Lab
– UNT

Users’ Feedback – Database Improvements
• Several changes and improvements have been made based upon recommendations and suggestions from users in the field
  – Followed suggestions from the Santa Clara County DA Crime Laboratory and the Centre of Forensic Sciences in Ontario to alter some of the verbiage in the displayed results
  – Added a “News” section to the database homepage to allow for announcements and updates to keep users informed
  – Created and validated an automatic haplotype upload interface, allowing users to simultaneously upload multiple haplotypes directly from Genotyper® and GeneMapper® text files for database searches, modeled after Applied Biosystems’ Yfiler Database interface
  – Adjusted the > (greater than) and < (less than) queries of the database. Rather than returning just exact matches, the query now returns all alleles greater than or less than the entry and calculates these haplotypes into the statistic statements.


Release 2.7 – Next Version
• FBI: 1691
• Marshall: 249
• San Diego: 15
• UNT: 951 (+ Y23)

• N increased by 2906 • NIST: 642 new Y23 markers

Future Goals
• To continue to solicit data and / or samples from forensic laboratories – to expand continuously the number of individuals (N) for each ancestral group and geographical location – plan updates approximately every 6 months if samples are available
• To continuously incorporate the suggestions and recommendations received from users – to improve the design and functionality of the database to better serve the needs of the forensic community

Population Substructure Issues
• Correction for population structure may be necessary, although this depends upon the no. of loci typed
• Effective population size ¼ of autosomal loci
• Substructure effects less in US than ancestral populations
• Use when reference database considered not representative

Haplotype Frequencies • Sample frequency ($\tilde{p}_A$) • Frequency in the sub-population ($p^*_A$) • Frequency over all sub-populations ($p_A$)

Autosomal Markers • With random sampling of individuals – And if the population in question is in Hardy-Weinberg equilibrium

• The number of A alleles in a sample of n genotypes has a binomial distribution with parameters 2n and $p^*_A$, so that the variance of $\tilde{p}_A$ is:

$$\mathrm{Var}(\tilde{p}_A) = \frac{p^*_A\,(1-p^*_A)}{2n}$$

Autosomal markers • Population genetic theories aimed at sensible match probabilities – Must address another level of sampling: • That inherent to the underlying evolutionary process

For a very wide class of evolutionary models, the variance of the actual allele frequency $p^*_A$ over populations is

$$\mathrm{Var}(p^*_A) = p_A\,(1-p_A)\,\theta, \qquad \theta = F_{ST}$$

Autosomal Markers • If HW equilibrium within the population of interest – Probability that a randomly drawn profile is AA: $P^*_{AA} = (p^*_A)^2$ – Expected value over populations = $p_A^2 + p_A(1-p_A)\,\theta$

– The ‘match probability’ that two randomly and independently drawn individuals in the same population of interest are both AA is $(p^*_A)^4$ • Probability that an unknown person has AA given that one instance of AA has been seen in the population: $\Pr(AA \mid AA) = \Pr(AA, AA)/\Pr(AA) = (p^*_A)^2$

• Match probability is the same as the profile probability in this case

Matching with lineage markers • Y chromosome haplotype (type A) – Match and profile probabilities for a specific sub-population are both equal to $p^*_A$ • Which is the profile frequency in that sub-population

• With population structure – Match and profile probabilities are no longer the same – Expected value of $(p^*_A)^2$ is needed – the match probability

*Note: match probability within a particular sub-population is also greater than the haplotype frequency in the whole population since $\theta + (1-\theta)\,p_A > p_A$

$$\Pr(A, B) = \Pr(A \mid B)\,\Pr(B)$$

$$\Pr(A, A) = \Pr(A \mid A)\,\Pr(A)$$

$$\Pr(A \mid A) = \frac{\Pr(A, A)}{\Pr(A)} = \frac{p_A^2 + p_A(1-p_A)\,\theta}{p_A} = p_A + (1-p_A)\,\theta = p_A + \theta - p_A\theta = \theta + p_A(1-\theta)$$
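A short numerical check of this result, the match probability F_ST + p_A(1 − F_ST), may help. The haplotype frequency and the F_ST (θ) values below are hypothetical illustration values, not figures from the database.

```python
def match_probability(p_a, theta):
    """Pr(A | A) = theta + p_A * (1 - theta), with theta = F_ST."""
    return theta + p_a * (1 - theta)

p_a = 0.001  # hypothetical population-wide haplotype frequency
no_structure = match_probability(p_a, 0.0)
mild_structure = match_probability(p_a, 0.01)

# The same value obtained from the ratio Pr(A, A) / Pr(A).
theta = 0.01
from_ratio = (p_a**2 + p_a * (1 - p_a) * theta) / p_a
print(no_structure, mild_structure, from_ratio)
```

With no structure the match probability collapses to p_A, while even a modest θ dominates the result when the haplotype is rare.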

Bayesian (Frequency Surveying) Approach

• Beta posterior distribution when a sample of size n contains x copies of A • Expected value of this distribution:

$$\frac{(1-\theta)\,p_A/n + \theta\,\tilde{p}_A}{(1-\theta)/n + \theta}$$

• Weighted mean of $p_A$ and $\tilde{p}_A$ – with decreasing weight on the prior as the sample size increases

• Bayesian analog of a confidence interval is the credible interval based on 100(1-α)% of the posterior distribution
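The weighted-mean behaviour can be illustrated numerically. This is a sketch under an assumption not stated on the slide: a Beta prior with mean p_A and parameters (1 − θ)p_A/θ and (1 − θ)(1 − p_A)/θ, where θ is F_ST, which is one prior that reproduces the expected value given above. All input values are hypothetical.

```python
def posterior_mean(x, n, p_a, theta):
    """Weighted mean of the prior frequency p_A and the sample frequency x/n."""
    sample = x / n
    return ((1 - theta) * p_a / n + theta * sample) / ((1 - theta) / n + theta)

def beta_posterior_mean(x, n, p_a, theta):
    """Same quantity via the assumed Beta prior, updated with x copies in n."""
    a = (1 - theta) * p_a / theta
    b = (1 - theta) * (1 - p_a) / theta
    return (a + x) / (a + b + n)

p_a, theta = 0.002, 0.01          # hypothetical prior frequency and F_ST
small = posterior_mean(1, 100, p_a, theta)
large = posterior_mean(100, 10000, p_a, theta)
check = beta_posterior_mean(1, 100, p_a, theta)
print(small, large, check)
```

Both samples have the same frequency x/n = 0.01, but the larger sample pulls the estimate much closer to it, showing the decreasing weight on the prior.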

Estimating θ • Autosomal markers – Usual to assign a value in the range 0.01 – 0.05

• Non-Bayesian framework – Necessary to have data from at least two sub-populations in order to estimate θ – Because θ describes the normalized variance of allele frequencies over sub-populations

• If a sample was available from the relevant sub-population – Sample haplotype frequency from those data could be used directly for calculating match probabilities without having to invoke θ

Estimating θ cont’d • Otherwise, any estimation of θ would have to use data from a set of samples from sub-populations that were considered relevant • Average haplotype frequency over these samples could be used as an estimate of $p_A$ – Along with θ estimated from the between-subpopulation variation of the $\tilde{p}_A$ values.

Ewens’ sampling theory • Number, k, of distinct alleles (or haplotypes) in a sample is sufficient for estimating parameter ψ – where 1/(1+ψ) is the probability that two alleles (haplotypes) drawn randomly from a population are the same

• Latter probability is also a definition of θ – For haploid population of size N with an infinite-alleles mutation rate of μ: ψ = 2Nμ so that θ = 1/(1+2Nμ)

• Note: θ decreases as the mutation rate increases; haplotype mutation rates increase as the number of constituent markers increases

Ewens’ sampling theory • Maximum likelihood estimate of ψ (or θ) – Found by setting the observed value of k equal to its expected value

$$E(k) = \sum_{i=0}^{n-1} \frac{\psi}{\psi + i} = \sum_{i=0}^{n-1} \frac{1-\theta}{1 + (i-1)\,\theta}$$

– Numerical methods are needed to solve this equation for θ – Note: • Frequencies of particular alleles are not used here • If every haplotype in a sample is unique – k = n – Provides an estimate of zero for θ corresponding to (indefinitely) large mutation rates
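The numerical solution can be sketched with bisection. The sketch assumes the standard Ewens result that the expected number of distinct types in a sample of n is the sum over i = 0 … n−1 of ψ/(ψ + i), where ψ is the Ewens parameter (named `psi` here), and converts to F_ST via 1/(1 + ψ). Function names and the example inputs are illustrative only.

```python
import math

def expected_distinct(psi, n):
    """Expected number of distinct haplotypes in a sample of n (Ewens)."""
    return sum(psi / (psi + i) for i in range(n))

def estimate_psi(k, n):
    """Solve expected_distinct(psi, n) = k for psi by bisection."""
    if not 1 <= k <= n:
        raise ValueError("k must lie between 1 and n")
    if k == n:
        return math.inf   # every haplotype unique: unbounded parameter
    if k == 1:
        return 0.0        # a single type in the sample
    lo, hi = 1e-12, 1.0
    while expected_distinct(hi, n) < k:   # expand until the root is bracketed
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected_distinct(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def theta_from_psi(psi):
    """F_ST analogue: probability that two random haplotypes are the same."""
    return 1 / (1 + psi)

k_at_10 = expected_distinct(10.0, 100)   # expected count for psi = 10, n = 100
psi_hat = estimate_psi(k_at_10, 100)     # round trip should recover ~10
all_unique = theta_from_psi(estimate_psi(100, 100))
print(psi_hat, theta_from_psi(psi_hat), all_unique)
```

When every haplotype in the sample is unique the estimate of θ is zero, matching the note on the slide.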

Brenner’s approach • κ: proportion of haplotypes seen only once in a sample of size (n-1) augmented by the crime-scene haplotype • Approximated the match probability between an innocent suspect and a previously unseen trace haplotype as (1-κ)/n • In most cases, will lead to a likelihood ratio of n/(1-κ) – rather than n that would result if the match probability was set to 1/n

• 1/(1-κ) referred to as “inflation factor”

Brenner’s approach • If trace haplotype had been seen (x-1) times in a database of size (n-1) – Likelihood ratio modified to n/[x(1-κ)]

• Note: – If all haplotypes are seen only once in a database • κ = 1 • Match probability is 0 – Same quantification of evidential strength is applied to all types of a lineage marker

• Expected value of κ is (from Ewens and Brenner)
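The quantities in Brenner's approach can be computed directly from a list of database haplotypes. The sketch below follows the slide's definitions (singleton proportion in the database augmented by the trace haplotype, match probability x(1 − proportion)/n); the function name and the toy haplotype labels are hypothetical.

```python
from collections import Counter

def brenner_lr(database, trace):
    """Return (kappa, match_probability, likelihood_ratio) as defined above."""
    augmented = list(database) + [trace]   # database (n-1) plus trace haplotype
    n = len(augmented)
    counts = Counter(augmented)
    kappa = sum(1 for c in counts.values() if c == 1) / n  # singleton proportion
    x = counts[trace]                      # trace count including itself
    if kappa == 1.0:
        return kappa, 0.0, float("inf")    # all singletons: match probability 0
    match_prob = x * (1 - kappa) / n
    return kappa, match_prob, 1 / match_prob

toy_db = ["A", "A", "B", "C"]              # toy database of four haplotypes
unseen = brenner_lr(toy_db, "D")           # trace not in the database
seen = brenner_lr(toy_db, "A")             # trace already seen twice
print(unseen, seen)
```

For the unseen trace the likelihood ratio is n/(1 − proportion of singletons) = 5/0.4 = 12.5, compared with n = 5 if the match probability were simply set to 1/n.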

Combination of lineage and autosomal markers • It can be done • e.g. Walsh, Redd and Hammer. Joint match probabilities for Y chromosomal and autosomal markers. Forensic Sci Int 174 (2008) 234-238 • SWGDAM, Budowle, Weir and Buckleton

Another Approach: Estimate of θ

Autosomal: $\theta = \dfrac{1}{1 + 4N\mu}$, where N = effective population size (~10^5) and μ = mutation rate

Y-chromosome: $\tilde{\theta} = \dfrac{1}{1 + 4Nm\mu}$, where N = effective population size, m = number of loci and μ = mutation rate
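A quick sketch of how the Y-chromosome estimate behaves as loci are added. The effective population size used below is a hypothetical round number, and the per-locus mutation rate of 3 × 10^-3 is borrowed from the Yfiler figure quoted earlier in the deck; neither is presented here as the value used by the author.

```python
def theta_autosomal(n_eff, mu):
    """theta = 1 / (1 + 4 * N * mu) for an autosomal locus."""
    return 1 / (1 + 4 * n_eff * mu)

def theta_y(n_eff, loci, mu):
    """theta~ = 1 / (1 + 4 * N * m * mu) for an m-locus Y haplotype."""
    return 1 / (1 + 4 * n_eff * loci * mu)

n_eff = 10_000   # hypothetical effective population size
mu = 3e-3        # per-locus mutation rate (Yfiler figure from the deck)
by_loci = {m: theta_y(n_eff, m, mu) for m in (1, 11, 12, 17, 23)}
for m, value in by_loci.items():
    print(m, value)
```

Because the haplotype mutation rate scales with the number of loci, the estimate shrinks as markers are added, which fits the earlier remark that the need for a substructure correction depends on the number of loci typed.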

Statistics and genetic sampling?

[Figure: ‘among subpopulation’ component and ‘within subpopulation’ component]

2 person Mixed Y-STR profile

[Figure: mixed profile with peaks a/b, c/d, e/f, g and h/i]

No. of possible haplotypes = 2^n, where n = no. of (non-385) loci exhibiting two alleles (as opposed to one) = 2^4 = 16

No. of haplotypes = 2^n × (k(k + 1)/2), where k = no. of 385 peaks

A database search (N = 5000) reveals that 9 of the 16 haplotypes have been observed at least once:

acegh 1    bcegh 0
acegi 1    bcegi 2
acfgh 0    bcfgh 1
acfgi 0    bcfgi 1
adegh 0    bdegh 0
adegi 0    bdegi 0
adfgh 1    bdfgh 1
adfgi 1    bdfgi 1

LR V = acegh S = bdfgi = 0.0034

RMNE/PE

1. Using only haplotypes OBSERVED in the database
Count how many times each of the possible haplotypes is observed in the database, sum them, determine an overall frequency, and that constitutes the PI/RMNE. Thus 10/5000 = 1/500 = 0.002 which, with the binomial 95% CI, is 0.00338 (1/296). Then 1 - PI = PE = 1 - 0.00338 = 0.99662 (i.e. 99.7% can be excluded).

2. Using all haplotypes whether or not they have been observed in the database
However, what about the 7 haplotypes that could be components of the mixture but whose presence has not been accounted for?
• The 7 haplotypes could each occur with a frequency of 3/N
  − Thus 7 × 3/N = 21/N = 21/5000 = 0.0042
• The PI to take into account all possible contributors to the mixture is 0.00338 + 0.0042 = 0.00758 (1/132)
  − PE = 1 - 0.00758 = 0.99242 = 99.2%

PE = 99.7% (observed possible haplotypes) versus 99.2% (all possible haplotypes)
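The RMNE arithmetic can be reproduced as follows. This is a sketch only: the Clopper-Pearson limit is found by bisection (the slide does not say how its 0.00338 was computed), the helper names are invented here, and the exact limit differs from the slide's rounded figure in the last digit.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(K <= x) for K ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def upper_limit(x, n, alpha=0.05):
    """One-sided Clopper-Pearson upper limit, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

N = 5000
observed_total = 10   # summed database counts of the 9 observed haplotypes
unobserved = 7        # possible haplotypes never seen in the database

pi_observed = upper_limit(observed_total, N)   # slide: 0.00338
pi_all = pi_observed + unobserved * 3 / N      # add 3/N per unseen haplotype
pe_observed = 1 - pi_observed
pe_all = 1 - pi_all
print(pi_observed, 1 / pi_observed, pi_all, 1 / pi_all, pe_observed, pe_all)
```

Adding the 3/N allowance for each unobserved haplotype roughly doubles the inclusion probability, from about 1 in 296 to about 1 in 132.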

Hp = Prosecution hypothesis = mixture comprises (1 known + 1 unknown) OR (2 known) individuals

In this case assume victim + suspect DNA comprises the mixture, thus Pr(E|Hp) = 1

Hd = Defense hypothesis = the mixture comprises DNA from 2 random unrelated males

haplotype        alleles   count   Pr(Hi) (with binomial sampling correction)
H1 = victim      acegh     1       0.0034
H2               acegi     1       0.0034
H3               acfgh     0       0.0006
H4               acfgi     0       0.0006
H5               adegh     0       0.0006
H6               adegi     0       0.0006
H7               adfgh     1       0.0034
H8               adfgi     1       0.0034
H9               bcegh     0       0.0006
H10              bcegi     2       0.0068
H11              bcfgh     1       0.0034
H12              bcfgi     1       0.0034
H13              bdegh     0       0.0006
H14              bdegi     0       0.0006
H15              bdfgh     1       0.0034
H16 = suspect    bdfgi     1       0.0034

There are sixteen combinations of haplotypes that can produce the evidence haplotype (e.g. H1 + H16)

Combinations of haplotypes that could comprise the 2 person mixture
• (1 + 16) = (16 + 1): Pr = 0.0034 × 0.0034 = 0.00001156
• (2 + 15) = (15 + 2): Pr = 0.0034 × 0.0034 = 0.00001156
• (3 + 14) = (14 + 3): Pr = 0.0006 × 0.0006 = 0.00000036
• (4 + 13) = (13 + 4): Pr = 0.0006 × 0.0006 = 0.00000036
• (5 + 12) = (12 + 5): Pr = 0.0006 × 0.0034 = 0.00000204
• (6 + 11) = (11 + 6): Pr = 0.0006 × 0.0034 = 0.00000204
• (7 + 10) = (10 + 7): Pr = 0.0034 × 0.0068 = 0.00002312
• (8 + 9) = (9 + 8): Pr = 0.0034 × 0.0006 = 0.00000204
∑ = 0.00005308

LR Calculation:

LR = Pr(E|Hp) / Pr(E|Hd) = 1/(2 × 0.00005308) = 1/0.00010616 = 9419

Thus the DNA profiling results were 9419 times more likely if the mixture comprised DNA from the victim and the suspect than if it came from two random unrelated individuals. Is this true? (random individuals chosen from the database, or who might be expected to be present in the database?)
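The pairing and summation behind this likelihood ratio can be reproduced in a few lines. The sketch takes the per-haplotype probabilities exactly as tabulated on the slides (0.0006, 0.0034 and 0.0068 for counts of 0, 1 and 2); the variable names and the `complement` helper are illustrative, not part of any published tool.

```python
from itertools import product

# Alleles per locus in the mixture: four loci with two peaks, one with a single peak.
loci = [("a", "b"), ("c", "d"), ("e", "f"), ("g",), ("h", "i")]
haplotypes = ["".join(h) for h in product(*loci)]   # 16 candidates, H1..H16 order

# Database counts (N = 5000) for the observed haplotypes; all others are 0.
counts = {"acegh": 1, "acegi": 1, "adfgh": 1, "adfgi": 1, "bcegi": 2,
          "bcfgh": 1, "bcfgi": 1, "bdfgh": 1, "bdfgi": 1}
prob_by_count = {0: 0.0006, 1: 0.0034, 2: 0.0068}   # values used on the slides
prob = {h: prob_by_count[counts.get(h, 0)] for h in haplotypes}

swap = {"a": "b", "b": "a", "c": "d", "d": "c",
        "e": "f", "f": "e", "g": "g", "h": "i", "i": "h"}

def complement(h):
    """The partner haplotype that, together with h, explains every peak."""
    return "".join(swap[allele] for allele in h)

pairs = {frozenset((h, complement(h))) for h in haplotypes}   # 8 unordered pairs
pair_sum = sum(prob[h1] * prob[h2] for h1, h2 in (tuple(p) for p in pairs))
lr = 1 / (2 * pair_sum)   # the factor 2 accounts for the two orderings of each pair
print(len(haplotypes), len(pairs), pair_sum, lr)
```

Enumerating the candidates this way also makes it easy to see how the result would change if different frequency estimates were substituted for the unobserved haplotypes.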

RMNE versus LR and Unresolved Questions
• RMNE = 1 in 296, or 1 in 132 (taking into account unobserved haplotypes), males are included as potential donors to the mixture
• LR = DNA results are 9400 times more likely if the suspect is admixed with the victim than if DNA from two random males is present (cf. LR of 1/0.0034 = 294 for the suspect’s haplotype if single source)
• Population substructure correction added instead of, or in addition to, binomial sampling correction?
• Denominator of LR − Instead of haplotype frequencies, some suggest using the frequency of pairwise haplotypes from the database that can explain the mixture versus all other pairs of haplotypes that are possible
  – 8/(½ × 5000 × 4999) = 8/12,497,500 = 0.00000064 (without binomial correction)
  – LR = 1/0.00000064 ≈ 1,562,000





Y-STR Mixture Tools were added to the database website on June 20, 2010. Users can select the tool that best suits their needs and the tool opens in a new window.

CA DOJ Y-STR Mixture Tool

Harris County MEO Y-STR Mixture Tool

Summary: Statistical Issues • correct for population substructure (θ)? • how to measure θ? • correct for both genetic and statistical sampling? • do not take into account the specific profile but use purely statistical approach (Brenner)? • mixtures?

We shall never cease from exploration And the end of all our exploring Will be to arrive where we started And know the place for the first time. T. S. Eliot

Thank you for your attention!
