A Strategy for Using Multiple Linked Markers for Genetic Counseling

Am J Hum Genet 37:984-997, 1985

ARAVINDA CHAKRAVARTI AND KENNETH H. BUETOW

SUMMARY

A strategy for using multiple linked markers for genetic counseling is to test individual markers sequentially until a diagnosis can be made. We show that, in order to minimize the number of tests performed per case while diagnosing all informative cases, the order in which the markers are tested is critical. We describe an algorithm to obtain this order using the parameter "I", the frequency of informative cases. The I value for a specific locus depends on the marker frequency, on its association with the disease locus, and also on the informativeness of the marker loci already tested. Realizing that a direct assay for the β^S gene already exists, and that most cases of β-thalassemia in Mediterraneans can be directly diagnosed using synthetic oligonucleotide probes, we illustrate the above technique by examining nine DNA polymorphisms in the human β-globin cluster for their ability to diagnose sickle-cell anemia in American blacks and β-thalassemia in Mediterraneans. This analysis shows that 95.39% of all sickle-cell pregnancies can be diagnosed by testing a subset of only six markers chosen by our algorithm. Furthermore, six markers can also diagnose 88.03% of β-thalassemia in Greeks and 83.56% of β-thalassemia in Italians. The test set is different from that suggested by the individual informative frequencies because of nonrandom associations between the restriction sites.

Received November 29, 1984; revised March 14, 1985. Part of this research was supported by grants AM13983 and GM33771 from the National Institutes of Health. Both authors: Department of Biostatistics, Human Genetics Program, University of Pittsburgh, Pittsburgh, PA 15261. © 1985 by the American Society of Human Genetics. All rights reserved. 0002-9297/85/3705-0016$02.00

INTRODUCTION

There is some misconception that the use of linked marker genes for genetic counseling is of recent origin (see, for example, [1]). It is not: the first explicit mention and use of this method was in 1942 by the Dutch physician Hoogvliet [2], and it was subsequently reiterated by Haldane and Smith [3] in 1947. The usefulness of linkage for prenatal diagnosis was first mentioned by Edwards [4] in 1956, and later again by Renwick [5] in 1969. Renwick [5] wrote, "It may be a vindication of what has long been a matter of faith - that knowledge of human chromosome maps would be of real value in preventive medicine." Since then, classical markers have been successfully used for the prenatal diagnosis of hemophilia [6, 7] and myotonic dystrophy [8, 9].

This technique was not generally applicable because of the paucity of known disease-marker linkages, the large recombination values often observed, and the low degree of polymorphism at most marker loci. This situation has changed dramatically since Kan and Dozy's [10] discovery of a DNA polymorphism in the human β-globin cluster and its successful use in prenatal diagnosis of sickle-cell anemia [11]. The general applicability of the linkage principle requires both highly polymorphic markers and negligible recombination between disease and marker loci [12]. A general strategy for ensuring that both of these criteria can be satisfied by DNA polymorphisms was presented by Botstein et al. [13].

The major application of linkage analysis using DNA markers for prenatal diagnosis has been, to date, for sickle-cell anemia and β-thalassemia. Boehm et al. [14] report their study of 95 at-risk pregnancies, of which 82 cases were successfully diagnosed using DNA markers. However, this success rate of 86% was obtained by using 10 different DNA polymorphisms. Several mathematical studies of the utility and efficiency of linked marker genes in genetic counseling have been performed ([15, 16] and references cited therein) and demonstrate that several marker loci are necessary to obtain high efficiency.
Fortunately, several closely linked DNA polymorphisms can often be discovered, as in the β-cluster [17], growth-hormone cluster [18], albumin locus [19], or the immunoglobulin C,, locus [20]. Since these markers are in strong linkage disequilibrium with each other [21-23], a diagnosis obtained using one marker may yield the same result as that using another linked marker. Thus, using all available markers for each pregnancy leads to a great deal of redundancy in testing. We propose an alternative approach where a subset of markers can be used sequentially until a diagnosis is made. We choose the set of most informative markers and determine the sequence in which they should be used. We demonstrate that our procedure leads to minimum testing (minimum cost) and yet identifies all cases that can be correctly diagnosed using linkage information (maximum benefit). Our approach is illustrated using data on restriction site polymorphisms and sickle-cell anemia and β-thalassemia. This is for illustrative purposes only, since all cases of sickle-cell anemia and the majority of cases of β-thalassemia can now be directly diagnosed [24-26].

ACCURACY OF DIAGNOSIS

Accuracy is defined as the probability that an individual's disease status is correctly predicted from his or her marker genotype. This depends on knowledge of the disease and marker phenotypes for the individual's parents [27, 28] and the linkage phase of the doubly heterozygous parent(s) [14, 29]. Under these conditions, the accuracy is a function of the recombination value between the disease and marker loci and can be made as high as desired by choosing markers as close to the disease gene as possible.

FREQUENCY OF INFORMATIVE CASES

Nei [27, 28] defined the proportion of informative families as the frequency in the population of those families in whose offspring disease diagnosis could be performed with high accuracy. This has been extensively studied by several authors [12, 15, 27-31] when recombination can occur and for different modes of inheritance. When recombination can be ignored and nonrandom associations exist, Asmussen and Clegg [16] showed how the proportion of informative families can be computed. When recombination is negligible, informative families are those couples among whose offspring the disease can either be positively diagnosed or ruled out. Therefore, we call this the frequency of informative cases (I). This is a function of the mode of disease inheritance and the frequency of haplotypes for the disease and marker loci. These frequencies are computed in the APPENDIX for the case where the marker locus has multiple codominant alleles. For closely linked DNA polymorphisms, we can construct marker haplotypes. These haplotypes can be considered, operationally, as "alleles" at a marker "locus" defined by only those polymorphisms considered. This is analogous to the "alleles" at the Rhesus or MNSs blood group loci.

MULTIPLE LINKED MARKERS
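The idea, introduced under FREQUENCY OF INFORMATIVE CASES, of treating haplotypes as "alleles" of a marker "locus" defined by a chosen subset of polymorphisms can be made concrete with a short sketch. The Python fragment below is only an illustration under stated assumptions: the function name `collapse_haplotypes`, the encoding of a haplotype as a string of "+" and "-" characters, and the counts in `toy_counts` are hypothetical and are not taken from tables 1 or 2.

```python
from collections import defaultdict


def collapse_haplotypes(haplotype_counts, sites):
    """Merge full haplotypes that agree at the chosen sites.

    haplotype_counts: dict mapping a string of '+'/'-' characters (one per
        restriction site) to the number of chromosomes carrying it.
    sites: 0-based positions of the polymorphisms that define the marker "locus".
    Returns a dict mapping each reduced haplotype ("allele") to its frequency.
    """
    totals = defaultdict(int)
    for haplotype, count in haplotype_counts.items():
        allele = "".join(haplotype[s] for s in sites)
        totals[allele] += count
    n_chromosomes = sum(totals.values())
    return {allele: count / n_chromosomes for allele, count in totals.items()}


# Hypothetical counts for three sites (illustration only, not the paper's data).
toy_counts = {"++-": 4, "+--": 2, "-+-": 3, "--+": 1}

# "Alleles" of the compound locus defined by the first two sites.
two_site_alleles = collapse_haplotypes(toy_counts, sites=[0, 1])

# "Alleles" of the single-site locus defined by the first site alone.
one_site_alleles = collapse_haplotypes(toy_counts, sites=[0])

print(two_site_alleles)
print(one_site_alleles)
```

Applying the same function separately to normal and mutant chromosomes would give the allele frequencies that enter the APPENDIX equations for I; those equations are not reproduced in this sketch.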

When several closely linked genetic markers are available, there will be considerable redundancy if all cases are typed for all markers. One strategy that can be employed is to type cases with a subset of markers, one at a time, until a firm diagnosis (affected or unaffected) can be made. We show that the order in which markers are tested determines the efficiency of the testing procedure. Consider n marker loci $A_1, A_2, \ldots, A_n$. We assume that at each locus $A_i$ ($1 \le i \le n$) there is an arbitrary number of alleles, from whose frequencies on normal and mutant gene-bearing chromosomes the proportion of informative cases can be computed for any disease using the equations in the APPENDIX. Without loss of generality, we assume that $A_1$ is the most informative marker, followed by $A_2$, then $A_3$, and so on. In other words,

$$I_{A_1} \ge I_{A_2} \ge \cdots \ge I_{A_n}. \qquad (1)$$

Clearly, $A_1$ is the first marker to be tested. We now need to choose that marker that is most informative conditional on cases where $A_1$ was uninformative (denoted by $\bar{A}_1$). The frequency of informative cases using $A_i$ ($i = 2, 3, \ldots, n$) given that $A_1$ is not useful is, from elementary probability theory [32],

$$I_{A_i \mid \bar{A}_1} = \frac{I_{A_i \wedge \bar{A}_1}}{I_{\bar{A}_1}} = \frac{I_{A_1, A_i} - I_{A_1}}{1 - I_{A_1}}, \qquad (2)$$

where the symbols "|", "∧", and "," mean "conditional on," "and," and "and/or," respectively. Thus, we compute the quantity in equation (2) for all other markers and choose that marker that gives the maximum. Since $I_{A_1}$ in the above equation is a constant, we choose next that marker $A_i$ ($2 \le i \le n$) for which $I_{A_1, A_i}$ is the maximum. Observe that if all markers are associated at random, then [32]

$$I_{A_i \wedge \bar{A}_1} = I_{A_i} \, I_{\bar{A}_1}, \qquad (3)$$

which on substituting in equation (2) yields $I_{A_i \mid \bar{A}_1} = I_{A_i}$, as expected, so that $A_2$ would be the second marker chosen. The procedure outlined by equation (2) is then: choose the most informative marker first, choose as the second marker that which is most informative in combination with the first, choose as the third marker that which is most informative in combination with the first and second, and so on. The quantities $I_{A_i}$, $I_{A_i, A_j}$, $I_{A_i, A_j, A_k}, \ldots$ can be easily computed for any type of disease using the equations in the APPENDIX and by considering the marker locus as "$A_i A_j A_k \ldots$" with haplotypes defined by these loci as the alleles of the defined locus.

PROPERTIES
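Before examining these properties, the selection procedure defined by equation (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: cases are grouped into hypothetical classes, each with a population frequency and the set of markers that are informative for it, so that the "and/or" frequency I for a marker set is the total frequency of classes informative for at least one marker in the set. In practice these I values would come from the APPENDIX equations. The names `informative_frequency`, `greedy_order`, and `toy_classes`, and all numbers, are made up for illustration.

```python
def informative_frequency(classes, markers):
    """'And/or' frequency I: share of cases informative for at least one marker."""
    chosen = set(markers)
    return sum(freq for freq, informative in classes if informative & chosen)


def greedy_order(classes, markers):
    """Order markers so that each step maximizes the cumulative I.

    Maximizing I for (markers already chosen + candidate) is equivalent to
    maximizing the conditional quantity in equation (2), because the
    denominator does not depend on the candidate.
    """
    order, cumulative = [], []
    remaining = list(markers)
    while remaining:
        best = max(remaining,
                   key=lambda m: informative_frequency(classes, order + [m]))
        remaining.remove(best)
        order.append(best)
        cumulative.append(informative_frequency(classes, order))
    return order, cumulative


# Hypothetical case classes: (frequency, markers informative for that class).
toy_classes = [
    (0.40, {"A", "B"}),   # A and B are redundant for these cases
    (0.20, {"A"}),
    (0.10, {"B"}),
    (0.20, {"C"}),
    (0.10, set()),        # cases that no marker can diagnose
]
markers = ["A", "B", "C"]

order, cumulative = greedy_order(toy_classes, markers)

# For comparison: ranking by individual I values alone.
by_individual = sorted(
    markers,
    key=lambda m: informative_frequency(toy_classes, [m]),
    reverse=True,
)

print(order, [round(c, 2) for c in cumulative])
print(by_individual)
```

In this toy setting marker B has a higher individual I than C (0.5 versus 0.2), yet C is chosen second because B is largely redundant with A. This is the effect of nonrandom association that the procedure is designed to exploit.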

To gauge the efficiency of the above procedure, we compute the distribution of the number of tests that are necessary before a diagnosis can be made. However, it is first necessary to understand the degree of redundancy expected if all n markers were used. The mean number of diagnoses per individual if all n marker loci were used is

$$\sum_{i=1}^{n} I_{A_i} = I_{A_1} + I_{A_2} + \cdots + I_{A_n} \qquad (4)$$

and can be fairly large. We now compute the distribution of the number of markers to be tested per individual pregnancy before the first diagnosis is made. For simplicity, assume that the order in which the markers were chosen is $A_1, A_2, \ldots, A_n$. Then, the number of tests is $y = y_1 + y_2 + \cdots + y_n$, where

$$y_i = \begin{cases} 1 & \text{if test } i \text{ was performed} \\ 0 & \text{if test } i \text{ was not performed} \end{cases}$$

($1 \le i \le n$). Note that $y_1 = 1$ and that [32]

$$y_i \sim \mathrm{Bin}\left(1,\; 1 - I_{A_1, A_2, \ldots, A_{i-1}}\right),$$

where "~" means "distributed as" and "Bin" means "Binomial." Then

$$\mu = E(y) = \sum_{i=1}^{n} \mathrm{Prob}(y_i = 1) = n - \sum_{i=1}^{n-1} I_{A_1, A_2, \ldots, A_i} \qquad (5)$$

and

$$\sigma^2 = V(y) = \sum_{i=1}^{n-1} I_{A_1, A_2, \ldots, A_i}\left(1 - I_{A_1, A_2, \ldots, A_i}\right), \qquad (6)$$

where E( ) and V( ) denote the mean and variance, respectively.

EXAMPLE: SICKLE-CELL ANEMIA AND β-THALASSEMIA
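As a numerical preliminary, equations (4) to (6) can be evaluated directly from the individual and cumulative I values reported in table 3. The sketch below is illustrative only; the function names are hypothetical. The function `random_association_benchmark` uses the product formula 1 - (1 - I_1)(1 - I_2)...(1 - I_n), which is an assumption of this sketch for what random association among markers implies and is not written out in the text.

```python
import math

# Cumulative I values by order of site selection (table 3).
CUMULATIVE = {
    "U.S. blacks": [.6771, .8338, .8988, .9313, .9530, .9539, .9539, .9539, .9539],
    "Greeks":      [.5313, .7403, .8368, .8613, .8728, .8805, .8817, .8817, .8817],
    "Italians":    [.4354, .6480, .7688, .8153, .8329, .8358, .8358, .8358, .8358],
}

# Individual I values for U.S. blacks, in the same order (table 3).
INDIVIDUAL_US_BLACKS = [.6771, .4185, .3703, .2186, .5579, .0574, .1466, .2258, .2258]


def redundancy(individual):
    """Equation (4): mean number of diagnoses per case if all markers are typed."""
    return sum(individual)


def mean_tests(cumulative):
    """Equation (5): n minus the sum of the first n-1 cumulative I values."""
    return len(cumulative) - sum(cumulative[:-1])


def variance_tests(cumulative):
    """Equation (6): sum of I(1 - I) over the first n-1 cumulative I values."""
    return sum(c * (1 - c) for c in cumulative[:-1])


def random_association_benchmark(individual):
    """Assumed benchmark: cumulative I if all markers were independent."""
    return 1 - math.prod(1 - i for i in individual)


for population, cumulative in CUMULATIVE.items():
    print(population,
          round(mean_tests(cumulative), 2),
          round(variance_tests(cumulative), 2))

print(round(redundancy(INDIVIDUAL_US_BLACKS), 2))
print(round(random_association_benchmark(INDIVIDUAL_US_BLACKS), 4))
```

With these inputs the mean number of tests comes out at about 1.84, 2.51, and 2.99 for the three populations, the redundancy for U.S. blacks is close to three, and the independence benchmark is about 0.980.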

At the outset, we wish to clarify that this section is for illustrative purposes only. We do not wish to imply that this method be used for these diseases, since sickle-cell anemia can be directly diagnosed using restriction analysis [24, 25] and most cases of β-thalassemia in Mediterraneans can be diagnosed using synthetic oligonucleotides [26]. Sickle-cell anemia and β-thalassemia have been studied with respect to several restriction site polymorphisms [26, 33, 34]. In collaboration with Dr. H. H. Kazazian, we have been maintaining a database of restriction site polymorphism haplotypes of β-globin genes, from which we have selected nine polymorphisms, constructed haplotypes from family data, and ascertained their frequencies on normal (β^A) and mutant (β^S, β^thal) chromosomes. These data are presented in tables 1 and 2. From these data, we first compute the conditional marker frequencies on β^A and β^S or β^thal chromosomes, for any collection of sites, using equations (A3.1) and (A3.2) in the APPENDIX. Next, using equation (A4), we compute the frequency of informative cases (I). Finally, equation (2) is repeatedly used to select the order in which the nine polymorphisms are to be used. Our results are presented in table 3 and figure 1a-c (the solid line).

TABLE 1
FREQUENCY OF HAPLOTYPES FOR NINE RESTRICTION SITE POLYMORPHISMS ON β^A- AND β^S-CARRYING CHROMOSOMES IN AMERICAN BLACKS

[Table body not reproduced: presence (+) or absence (-) of restriction sites (1) to (9)* for each haplotype, with its frequency on β^A and β^S chromosomes. Column totals: 31 and 36.]

* These polymorphic sites are, in order (5' to 3'): HindIII-Gγ, -Aγ; HincII-ψβ1, -3'ψβ1; HinfI-5'β; HgiAI-β; AvaII-β; HpaI-3'β; BamHI-3'β (see reference [33]). "+" and "-" denote the presence and absence of a restriction site at a particular locus.

TABLE 2

[Table body not reproduced: haplotype data for Greeks and Italians.]

TABLE 3
FREQUENCY OF INFORMATIVE CASES USING NINE CLOSELY LINKED DNA POLYMORPHISMS FOR DIAGNOSING SICKLE-CELL ANEMIA IN U.S. BLACKS AND β-THALASSEMIA IN GREEKS AND ITALIANS

ORDER OF SITE    U.S. BLACKS               GREEKS                    ITALIANS
SELECTION*       Individual   Cumulative   Individual   Cumulative   Individual   Cumulative
I                .6771        .6771        .5313        .5313        .4354        .4354
II               .4185        .8338        .3837        .7403        .3352        .6480
III              .3703        .8988        .2472        .8368        .2826        .7688
IV               .2186        .9313        .3472        .8613        .3603        .8153
V                .5579        .9530        .4945        .8728        .2511        .8329
VI               .0574        .9539        .0956        .8805        .1063        .8358
VII              .1466        .9539        .2311        .8817        .2741        .8358
VIII             .2258        .9539        .2472        .8817        .3788        .8358
IX               .2258        .9539        .3143        .8817        .3352        .8358

* The orders of site selection are: (a) 5, 1, 4, 3, 8, 9, 2, 6, and 7 in U.S. blacks (see table 1); (b) 1, 9, 6, 2, 4, 5, 3, 7, and 8 in Greeks (see table 2); (c) 1, 6, 8, 2, 9, 5, 3, 4, and 7 in Italians (see table 2).

Table 3 and figure 1a-c demonstrate that the order in which the sites have been chosen is not based on the informativeness of each marker locus per se but on their joint utility with other markers. For sickle-cell anemia, these nine markers can diagnose 95.39% of all pregnancies correctly; however, the first six markers can also achieve 95.39% efficiency. This is a key feature of nonrandom association, in that the use of additional markers will not increase the utility. In comparison, if all nine markers were randomly associated with each other, then 98.03% of all pregnancies would benefit from the technique. For β-thalassemia, an identical picture emerges, namely, that even if the nine markers can correctly diagnose 88.17% of all thalassemia cases in Greeks and 83.58% in Italians, the use of the first six markers indicated in table 3 gives the same utility.

The efficiency of our sequential testing procedure can be evaluated in the following way. When all nine markers are available and tested sequentially, equation (5) shows that, on the average, 1.84 tests for sickle-cell anemia, 2.51 tests for β-thalassemia in Greeks, and 2.99 tests in Italians will have to be performed before a diagnosis can be made. These numbers are considerably smaller than the nine required if all markers are used. On the other hand, equation (4) shows that each case would be informative for approximately three markers for the hemoglobinopathies.

Even though the algorithm we propose has maximum efficiency, other algorithms may have near-maximum efficiency, such as choosing markers in decreasing order of their individual I values. The successive I values so obtained are also plotted in figure 1a-c by the dotted lines. At the beginning and the end of either of these testing schemes, the cumulative I values are identical, but their trajectories are clearly different. The maximum efficiency procedure we suggest will reach the maximum before any other testing procedure, and, thus, fewer markers need to be tested. The difference in efficiency of the two procedures lies in the diversity of the mutations. The "heterozygosity" of the haplotypes containing mutant genes is 0.62 for β^S in U.S. blacks and β^thal in Greeks but 0.82 for β^thal in Italians. Thus, as is clear in figure 1, the greater the diversity, the more efficient is our procedure over others that can be envisioned.

DISCUSSION

We have provided a theoretical framework for evaluating the efficiency of a DNA polymorphism for genetic counseling. Most importantly, when a series of tightly linked DNA polymorphisms is available, we show how to choose a subset of marker loci and determine the order in which these should be sequentially tested to achieve maximum efficiency and minimum cost. Of course, reliable data on marker haplotype frequencies on normal-gene- and disease-gene-bearing chromosomes are necessary. The most preferable method for disease diagnosis is a direct test. This will sometimes be possible using a restriction endonuclease that detects the mutation (sickle-cell anemia [24]), a genomic probe that detects deletions (growth hormone deficiency [18]), or synthetic oligonucleotides specific to particular mutations (α1-antitrypsin deficiency [35], PKU [36], β-thalassemia [34]). However, β-thalassemia is also a good example of the difficulty in this approach. Currently, over 30 distinct β-thalassemia mutants have been isolated, but each has a distinct geographic distribution [34]. Thus, given the great diversity of mutations and the population specificity of each, it will probably not be
