Copyright © 1991 by the Genetics Society of America

A Cluster of Vitellogenin Genes in the Mediterranean Fruit Fly Ceratitis capitata: Sequence and Structural Conservation in Dipteran Yolk Proteins and Their Genes

M. Rina* and C. Savakis*,†

*Institute of Molecular Biology and Biotechnology, Research Center of Crete, Foundation of Research and Technology, Heraklion, Crete, Greece, and †Division of Medical Sciences, Medical School, University of Crete, Crete, Greece

Manuscript received September 27, 1990
Accepted for publication November 29, 1990

ABSTRACT Four genes encoding the major egg yolk polypeptides of the Mediterranean fruit fly Ceratitis capitata, vitellogenins 1 and 2 (VG1 and VG2), were cloned, characterized and partially sequenced. The genes are located in the same region of chromosome 5 and are organized in pairs, each pair encoding the two polypeptides on opposite DNA strands. Restriction and nucleotide sequence analyses indicate that the gene pairs arose from an ancestral pair by a relatively recent duplication event. The transcribed part is very similar to that of the Drosophila melanogaster yolk protein genes Yp1, Yp2 and Yp3. The Vg1 genes have two introns at the same positions as those in D. melanogaster Yp3; the Vg2 genes have only one of the introns, as do D. melanogaster Yp1 and Yp2. Comparison of the five polypeptide sequences shows extensive homology, with 27% of the residues being invariant. The sequence similarity of the processed proteins lies in two regions separated by a nonconserved region of varying size. Secondary structure predictions suggest a highly conserved secondary structure pattern in the two regions, which probably correspond to structural and functional domains. The carboxy-terminal domain of the C. capitata proteins shows the same sequence similarities to triacylglycerol lipases that have been reported previously for the D. melanogaster yolk proteins. Analysis of codon usage shows significant differences between D. melanogaster and C. capitata vitellogenins, with the latter exhibiting a less biased representation of synonymous codons.

Genetics 127: 769-780 (April, 1991)

THE major egg yolk proteins (vitellogenins) of higher Diptera are polypeptides of 44,000 to 50,000 daltons and differ from those of other egg-laying animals. In contrast, the vitellogenins from species as diverse as the locust, nematode, frog and chicken probably have a common evolutionary origin. They are generally larger, are encoded by multigene families and share amino acid sequence similarities; no significant sequence similarity can be detected between them and the dipteran yolk proteins (SPIETH et al. 1985; NARDELLI et al. 1987). The latter show local amino acid sequence similarity with members of the triacylglycerol lipase family (BOWNES et al. 1988). The best studied dipteran vitellogenins are the three yolk proteins of Drosophila melanogaster. These polypeptides, designated YP1, YP2 and YP3, are synthesized in the fat body of adult females, secreted into the hemolymph and taken up by developing oocytes. In addition, D. melanogaster vitellogenins are synthesized in the developing follicular epithelium and transported directly into the oocyte (BOWNES and HAMES 1978; WARREN and MAHOWALD 1979; BRENNAN et al. 1982). Their transcription is stimulated by ecdysteroid hormones; β-ecdysone induces yolk protein synthesis in the fat body of adult males, which normally do not produce yolk proteins (POSTLETHWAIT, BOWNES and JOWETT 1980). The three proteins are highly related to each other and are encoded by single-copy genes (Yp1, Yp2 and Yp3) located on the X chromosome. Genes Yp1 and Yp2 are transcribed divergently and are separated by 1.2 kb of DNA, while Yp3 is located approximately one megabase away (HUNG and WENSINK 1983; GARABEDIAN et al. 1987; YAN, KUNERT and POSTLETHWAIT 1987). Tissue-specific transcriptional enhancers common to both genes are located in the Yp1/Yp2 intergenic region (GARABEDIAN, HUNG and WENSINK 1985; GARABEDIAN, SHEPHERD and WENSINK 1986). A very similar arrangement was described for the vitellogenin genes of Drosophila grimshawi, a Hawaiian endemic species that belongs to a different subgenus from D. melanogaster. This species has three genes that cross-hybridize to each other and to the D. melanogaster Yp genes; S1 nuclease analysis has shown that two of the genes are closely linked and transcribed in opposite orientations, with their 5' ends 1.75 kb apart (HATZOPOULOS and KAMBYSELLIS 1987). The D. melanogaster vitellogenins are characterized by three regions that show extensive primary and secondary structure homology among the three proteins, but not with one another, separated by non-conserved regions of variable length (HUNG and WENSINK 1983). One of the conserved regions has significant sequence similarity with part of the lipid-binding domain of triacylglycerol lipases; it has been suggested that this region has a lipid-binding function (BOWNES et al. 1988; PERSSON et al. 1989).

The vitellogenins of Ceratitis capitata have been studied in considerable detail. This species has two major yolk polypeptides, designated VG1 and VG2, with molecular weights of 49,000 and 46,000 daltons, respectively. They have been purified and show immunological cross-reactivity with their D. melanogaster homologs (RINA and MINTZAS 1987, 1988). As in D. melanogaster (KOZMA and BOWNES 1986), they are synthesized in the fat body and follicular epithelial cells and are induced in males by β-ecdysone (RINA and MINTZAS 1988).

The Mediterranean fruit fly C. capitata (family Tephritidae; medfly) is a higher dipteran that offers several important advantages as an organism for comparative molecular studies with D. melanogaster. It is phylogenetically close enough to D. melanogaster to allow cloning of Drosophila gene homologs by interspecific nucleic acid hybridization, but distant enough for comparisons to be meaningful. Furthermore, it is easily adapted to inexpensive laboratory culture, it has a 24-day life cycle, and it has well-characterized polytene chromosomes (ZACHAROPOULOU 1990). Last but not least, the medfly is an insect of economic importance that is amenable to biological control methods such as the sterile male technique. Cloning of two chorion protein genes and one actin gene from C. capitata was reported recently (KONSOLAKI et al. 1990; TOLIAS et al. 1990; HAYMER et al. 1990). Of these, the actin gene and one of the chorion genes (Ccs36) were cloned by heterologous hybridization to D. melanogaster probes, while chorion gene Ccs38 was cloned by a differential screening procedure; Ccs38 was subsequently shown to cross-hybridize with the D. melanogaster s38 gene.
In an effort to obtain and analyze C. capitata promoters that are active in only one sex, we cloned the genes encoding the vitellogenins. Here we report the structure of these genes and the results of DNA sequence analysis.

MATERIALS AND METHODS

Flies and materials: A C. capitata strain obtained from A. MINTZAS (Department of Biology, University of Patras, Greece) was used for all experiments. The strain was originally established in the laboratory by P. A. MOURIKIS (Benakeion Institute of Phytopathology, Athens, Greece) with flies from the Southern Peloponnese (Greece) and Palermo (Italy). Insects were raised at 22-25° as described previously (MINTZAS et al. 1983). Adults were maintained on a diet of one part sucrose to one part dried yeast. Under these conditions, embryonic development lasts about 48 hours, and the complete life cycle of the insect is approximately 24 days. 32P-labeled nucleotides were from Amersham, and restriction and modification enzymes were from MinoTech (Heraklion), Pharmacia and Bethesda Research Laboratories.

Construction and screening of the genomic library: DNA (20 µg) from 24-hr-old medfly embryos was partially digested with restriction endonuclease MboI, extracted with phenol/chloroform, precipitated with ethanol, and fractionated on a 10-40% sucrose gradient. Fractions containing fragments 15-20 kb in length were retained for ligation to vector DNA. For ligation, 0.25 µg of genomic DNA fragments was combined with 0.83 µg of lambda EMBL4 arms produced by digestion of phage DNA with BamHI. In vitro packaging was as described previously (MANIATIS, FRITSCH and SAMBROOK 1982). Approximately 250,000 plaques were screened, as described by BENTON and DAVIS (1977), by hybridization at 55° to a D. grimshawi cDNA clone (HATZOPOULOS and KAMBYSELLIS 1987) corresponding to the vitellogenin 1 mRNA of this species. This clone was used because it was available to us and had been shown to cross-hybridize strongly with its D. melanogaster homolog.

General methods: Genomic DNA was prepared essentially as described previously (HOLMES and BONNER 1973). Preparation of phage and plasmid DNA, agarose gel electrophoresis of DNA, and blotting to nitrocellulose membranes were carried out using standard procedures (MANIATIS, FRITSCH and SAMBROOK 1982). DNA probes were prepared by nick-translation (MANIATIS, FRITSCH and SAMBROOK 1982) or by random hexanucleotide priming (FEINBERG and VOGELSTEIN 1983). Hybridizations of 32P-labeled probes to blotted nucleic acids were performed as described by MANIATIS, FRITSCH and SAMBROOK (1982), at 38° (D. grimshawi probe) or 42° (C. capitata probes) in 50% formamide, 5× SSC, 0.5% SDS, 10 mM EDTA, 100 µg/ml sonicated, heat-denatured herring sperm DNA, and 5× DENHARDT (1966) solution. DNA sequencing was done by the double-stranded dideoxy chain termination method (WALLACE et al. 1981).

DNA and protein sequence analysis: The program packages ANALYSEQ and ANALYSEP (STADEN 1984) were used for sequence analysis. Optimal alignment of protein sequences was carried out with the IALIGN program (DAYHOFF, BARKER and HUNT 1983) of the Protein Identification Resource (National Biomedical Research Foundation) and with the multiple alignment program CLUSTAL (HIGGINS and SHARP 1988). All programs were run on a VAX/VMS computer. Graphics were processed and plotted with a Macintosh microcomputer running terminal emulation software.

RESULTS AND DISCUSSION

Cloning of C. capitata vitellogenin genes: Figure 1 shows the results of a hybridization experiment in which vitellogenin DNA from another higher dipteran, D. grimshawi, was used to probe a C. capitata genomic DNA blot at low hybridization stringency. The probe was a cDNA clone (plasmid clone c357) that contains approximately 900 nt corresponding to the carboxy-terminal half of the vitellogenin 1 mRNA from D. grimshawi (HATZOPOULOS and KAMBYSELLIS 1987). At least four prominent HindIII fragments and two EcoRI fragments were detected by this probe in C. capitata DNA. The same probe hybridized to three HindIII fragments of D. grimshawi DNA, each corresponding to one of the three vitellogenin genes identified in this species (HATZOPOULOS and KAMBYSELLIS 1987).

[Figure 1: Southern blot autoradiograph, lanes 1-5, with size markers in kb]

FIGURE 1.-Southern blot hybridization of D. grimshawi vitellogenin DNA to C. capitata genomic DNA. Embryonic DNAs from D. grimshawi (lane 2) and C. capitata (lanes 3 and 4) were digested with restriction enzymes, separated on a 0.9% agarose gel and blotted onto a nitrocellulose membrane filter. The blot was hybridized at low stringency (5× SSC, 38°, 50% formamide) with nick-translated plasmid c357 DNA. Lanes 2 and 3, DNA digested with HindIII. Lane 4, DNA digested with EcoRI. Lanes 1 and 5 are size standards.

To clone the genomic vitellogenin-related sequences, a C. capitata DNA library was constructed in vector EMBL4 using genomic DNA fragments produced by partial digestion with restriction endonuclease MboI. Approximately 2.5 × 10⁵ recombinant phage plaques were screened for hybridization to the D. grimshawi vitellogenin cDNA probe. Five clones gave a strong signal and were isolated and characterized by a combination of restriction mapping and blot hybridization. Figure 2 summarizes the results of this analysis. The restriction maps of three of these clones (ccv71, ccv51 and ccv53) clearly overlap. The short overlap between clone ccv72 and clones ccv51 and ccv53 was confirmed by Southern blot hybridization analysis (data not shown). The four overlapping clones cover a 37-kb region of the C. capitata genome that contains four distinct small regions of vitellogenin-related sequence. Southern analysis of these clones showed four non-contiguous fragments hybridizing strongly to the D. grimshawi cDNA probe: a 0.56-kb HindIII/BamHI and a 1.4-kb HindIII/HindIII fragment from clone ccv72, and a 0.93-kb HindIII/BamHI and a 0.56-kb HindIII/BamHI fragment from the other three clones. The four regions corresponding to these fragments were designated α, β, γ and δ (Figure 2).

Two lines of evidence strongly suggest that the restriction fragments hybridizing to the D. grimshawi vitellogenin probe contain medfly vitellogenin gene sequences. First, hybridization of one of these fragments (the 0.93-kb HindIII/BamHI fragment from clone ccv71, corresponding to region γ) to total fat body RNA from staged female and male medflies gave a single band in females 36 hours old or older, but not in males (RINA and MINTZAS 1988); this pattern accurately reflects the levels of translatable vitellogenin mRNA in the fat body of the developing medfly. Second, hybrid-selected translations showed that mRNAs hybridizing to clone ccv71 DNA are translated into two polypeptides with the same mobilities in SDS-polyacrylamide gels as newly synthesized vitellogenins 1 and 2 (data not shown). The arrangement of the vitellogenin-related regions α to δ, combined with the observed symmetry of some of the restriction sites around these regions, suggests the presence of a cluster of duplicated genes. This was confirmed by sequence analysis (see below). The restriction map of the fifth clone, ccv81, clearly does not overlap with that of the 37-kb region covered by the other clones. However, part of the ccv81 map is indistinguishable from that of the 37-kb region between positions 20 and 31 kb (Figure 2; clone ccv81 has been aligned accordingly).
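The library screen described above (about 2.5 × 10⁵ plaques from a library of 15-20-kb MboI partial-digest fragments) can be put in perspective with the standard Clarke-Carbon estimate, P = 1 - (1 - I/G)^N, for the probability P that a given locus is present in at least one of N random clones with insert size I drawn from a genome of size G. This is a minimal sketch, not part of the original analysis: the paper does not state the medfly genome size, so `genome_kb` is an input the reader must supply, the genome size in the demo is a hypothetical placeholder, and the formula assumes randomly distributed inserts, which a partial MboI digest only approximates.

```python
import math


def clone_coverage_probability(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Clarke-Carbon estimate: chance that a given locus lies in at least
    one of `n_clones` random inserts of `insert_kb` from a `genome_kb` genome."""
    if not 0 < insert_kb < genome_kb:
        raise ValueError("insert size must be positive and smaller than the genome")
    fraction = insert_kb / genome_kb  # chance that a single clone covers the locus
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(probability: float, insert_kb: float, genome_kb: float) -> int:
    """Number of random clones (rounded up) giving at least `probability` coverage."""
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    if not 0 < insert_kb < genome_kb:
        raise ValueError("insert size must be positive and smaller than the genome")
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


if __name__ == "__main__":
    # Hypothetical genome size for illustration only; NOT a value from the paper.
    GENOME_KB = 500_000
    # Mid-range insert size of the 15-20 kb fractions used for ligation.
    print(clone_coverage_probability(250_000, 17.5, GENOME_KB))
    print(clones_needed(0.99, 17.5, GENOME_KB))
```

The point of the sketch is the shape of the trade-off (larger inserts or more plaques raise P), not a claim about how completely this particular library represents the medfly genome.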
Clone ccv81 may represent an adjacent duplication of the γ-δ gene pair or a third pair of vitellogenin genes not directly linked to the α-δ cluster, or, alternatively, it could be the result of a cloning artefact. It is notable that in genomic DNA blots, four EcoRI fragments (approximately 6.0, 9.5, 15.0 and >20.0 kb in size) are detected with vitellogenin DNA probes (Figure 1). Since only the 15- and >20-kb fragments can be accounted for by the composite map of the α-δ cluster and by the restriction map of clone ccv81, more vitellogenin genes may exist in the medfly genome. However, all cloned medfly vitellogenin genes are probably closely linked, since clones ccv72, ccv71 and ccv81 hybridize in situ to the same band on chromosome 5 of the medfly (A. ZACHAROPOULOU, M. FRISARDI, A. ROBINSON and C. SAVAKIS, manuscript in preparation). We have concentrated our analysis on the genes of the α-δ cluster.

FIGURE 2.-Restriction maps and arrangement of C. capitata genomic clones containing vitellogenin-related sequences. Restriction maps of the four overlapping clones ccv72, ccv53, ccv51 and ccv71 are shown at the top. The sites shown are: BamHI (B), SalI (S), HindIII (H) and EcoRI (E). The bars below clones ccv72 and ccv53 correspond to the fragments produced by double BamHI/HindIII digestion that hybridize to the D. grimshawi probe. The composite map of the cluster is shown below the clones. The four vitellogenin genes (α to δ) were identified by sequencing (see text). The restriction map of the non-overlapping clone ccv81 is shown below the size scale.

Structure and chromosomal localization of the γ and δ genes; conserved features between C. capitata and D. melanogaster: Sequencing of the γ and δ regions showed that each contains a gene highly homologous to the D. melanogaster yolk protein genes. Figure 3 shows the sequence of 2364 bp of genomic DNA covering the γ region. The sequence extends from base +953 to base -1411 relative to the HindIII site at 23 kb of the composite map shown in Figure 2. Conceptual translation in all six frames showed three open reading frames (all in the orientation indicated in Figure 2) with significant similarity to the D. melanogaster yolk polypeptides, suggesting the presence of two introns. By aligning the derived polypeptide sequences to the available D. melanogaster yolk protein sequences, we arrived at the intron/exon structure indicated in Figure 3. The first coding part begins with an ATG at base 433 of the sequence and ends at position 655, which is the first base of codon 74. The nucleotide sequence surrounding the initiation codon (C A A C A T G) is in good agreement with the consensus sequence C/A A A A/C A T G flanking translational start sites in Drosophila (CAVENER 1987). A 67-bp intron separates the first coding part from a second, 389-bp exon beginning at base 723. A second intron lies between bases 1112 and 1179. The 5' and 3' ends of both introns conform to the consensus sequences (G T A/G A G T … Y N Y Y Y Y N Y A G) (MOUNT 1982; TEEM et al. 1984); in addition, both introns contain versions of the internal splice signal C/T T A/G A C/T (KELLER and NOON 1984) upstream from the 3' splice site. The third coding part is 699 bp long, ending with a TAA at base 1879. Two tandem repeats of the consensus polyadenylation signal sequence A A T A A A A (PROUDFOOT and BROWNLEE 1976) are located 100 nucleotides downstream from the termination codon.

The 1702-bp sequence from the δ region is shown in Figure 4. This sequence extends from base -1137 to base +565 relative to the BamHI site at approximately 30.3 kb of the composite map shown in Figure 2. The sequence contains two open reading frames, which are read in the opposite direction from the γ gene and also show considerable sequence similarity to the D. melanogaster Yp genes. The proposed intron/exon structure of the gene is as follows. The first coding part is 211 bp long and begins with the ATG at base 237 of the sequence. This ATG is also embedded within a sequence similar to the Drosophila consensus (see Figure 6A). The exon ends at position 447, which is the first base of codon 71, and is followed by an 89-bp intron with acceptable splice signals. The second coding part is 1055 bp long, ending with a TAA codon at base 1592. Although the proposed intron/exon structures of the γ and δ genes were not subjected to direct testing, such as S1 protection experiments or cDNA sequence analysis, we believe that they represent the correct structures because of their striking similarity to the D. melanogaster yolk protein genes.

D. melanogaster has three genes, Yp1, Yp2 and Yp3, coding for the three yolk proteins found in this species. The structures and the arrangement of consensus sequences are very similar in these genes. Yp1 and Yp2 have a short exon followed by a single short intron (75 bp in Yp1 and 68 bp in Yp2) and then by a longer exon. Yp3 has two short introns of 62 and 72 bp; the first is at the same position as the intron in Yp1 and Yp2. The 5' consensus sequences of the three genes (TATA box, capping site and translation initiation consensus sequence) are also at very similar positions; in addition, all three genes have rather short (51 to 61 bp) 5' untranslated regions (HUNG and WENSINK 1983; GARABEDIAN et al. 1987; YAN, KUNERT and POSTLETHWAIT 1987). The two sequenced medfly genes have the following structural features in common with their D. melanogaster homologs:

Sequence alignment and position of introns: Removal of the two introns from the γ gene gives an open reading frame of 1311 nucleotides, which can encode a 437-amino acid polypeptide with a molecular weight of 48,122 daltons. Similarly, the δ gene can encode a 422-amino acid polypeptide with a molecular weight of 45,434 daltons. Medfly vitellogenins 1 and 2 have molecular weights of 49,000 and 46,000 daltons, respectively (RINA and MINTZAS 1987). Based on the good agreement between the calculated and the observed molecular weights, we suggest that the γ gene encodes VG1 and the δ gene encodes VG2. Figure 5 is an alignment of these deduced polypeptide sequences to the known sequences of the three D. melanogaster yolk proteins. The five sequences can be aligned with only two major gaps, which are located within exons. This alignment illustrates that the position of the introns is strictly conserved in the five genes: intron 1 of the Vg1 gene and the single intron of the Vg2 gene are at the same position as the intron present in all D. melanogaster yolk protein genes, i.e., after the first base of a codon for tyrosine (HUNG and WENSINK 1983); intron 2 of the Vg1 gene is at the same position as the second intron of the D. melanogaster Yp3 gene (GARABEDIAN et al. 1987). With respect to intron/exon structure, therefore, the Vg1 gene appears to be homologous to the Yp3 gene, while the Vg2 gene appears to be homologous to the Yp1 and Yp2 genes.

Size of introns: The two introns in the Vg1 gene and the single intron in the Vg2 gene are, as in the D. melanogaster yolk protein genes (HUNG and WENSINK 1983; GARABEDIAN et al. 1987), very short: 67 and 89 nucleotides for intron 1 of the Vg1 and Vg2 genes, respectively, and 68 nucleotides for intron 2 of Vg1.
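Two kinds of bookkeeping in this section can be checked mechanically: the lengths implied by the printed exon and intron coordinates, and the comparison of short sites with degenerate consensus sequences (for example, the γ initiation context C A A C A T G against C/A A A A/C A T G). The sketch below is an illustration only, not the authors' software (they used the STADEN and PIR programs). Assumptions: coordinates are 1-based and inclusive as printed; the end of the γ third coding part (1878) is derived from its stated 699-bp length, and each final coding part is taken to end just before its TAA; slash notation is encoded with IUPAC letters (C/A = M, A/C = M, G/T = K, C/T = Y, C/G = S); and `N` positions are skipped when scoring, which is one reading that reproduces the "9 of 12" count quoted later in this section for the hsp27 comparison.

```python
# Coordinates as printed in the text (1-based, inclusive).
VG1_GAMMA_EXONS = [(433, 655), (723, 1111), (1180, 1878)]  # 1878 derived from 699 bp
VG2_DELTA_EXONS = [(237, 447), (537, 1591)]

# IUPAC codes standing in for the slash notation used in the text (e.g. G/T -> K).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT", "N": "ACGT",
}


def coding_summary(exons):
    """Return (coding length in nt, number of codons) for the spliced exons."""
    length = sum(end - start + 1 for start, end in exons)
    if length % 3:
        raise ValueError(f"coding length {length} is not a multiple of 3")
    return length, length // 3


def intron_lengths(exons):
    """Intron lengths implied by the gaps between consecutive exons."""
    return [nxt_start - prev_end - 1
            for (_, prev_end), (nxt_start, _) in zip(exons, exons[1:])]


def consensus_matches(site, consensus, score_n=False):
    """Count positions of `site` allowed by the degenerate `consensus`.

    Positions where the consensus is N are skipped unless `score_n` is True.
    """
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have equal length")
    score = 0
    for base, code in zip(site.upper(), consensus.upper()):
        if code == "N" and not score_n:
            continue
        if base in IUPAC[code]:
            score += 1
    return score


if __name__ == "__main__":
    print(coding_summary(VG1_GAMMA_EXONS), intron_lengths(VG1_GAMMA_EXONS))  # (1311, 437) [67, 68]
    print(coding_summary(VG2_DELTA_EXONS), intron_lengths(VG2_DELTA_EXONS))  # (1266, 422) [89]
    # gamma initiation context vs. C/A A A A/C A T G (CAVENER 1987)
    print(consensus_matches("CAACATG", "MAAMATG"))  # 7
    # gamma and delta candidate cap sites vs. insect consensus A T C A G/T T C/T
    print(consensus_matches("CACAGTT", "ATCAKTY"), consensus_matches("TACAGTT", "ATCAKTY"))  # 5 5
    # hsp27 element vs. G A G N T C A A G/T G/T C G/C
    print(consensus_matches("GGGTTCAATGCA", "GAGNTCAAKKCS"))  # 9
```

The printed coordinates reproduce the reported 1311-nt/437-codon and 1266-nt/422-codon reading frames and the 67-, 68- and 89-nt introns. Counting the `N` position as a match (`score_n=True`) would give 10 of 12 for the hsp27 comparison, so the scoring convention matters when quoting such figures.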
Figure 6B shows a comparison of intron sizes and consensus splicing sequences between the vitellogenin genes of C. capitata and D. melanogaster.

5' Consensus sequences: The D. melanogaster yolk protein genes have, like most eukaryotic genes, a T A T A (Hogness-Goldberg) box beginning at position -29 to -32 relative to the capping nucleotide (HUNG and WENSINK 1983; GARABEDIAN et al. 1987; YAN, KUNERT and POSTLETHWAIT 1987). In addition, the capping sites of all three genes match the insect cap site consensus sequence (HULTMARK, KLEMENZ and GEHRING 1986). The γ gene has the sequence T A T A T A A between bases -61 and -55 relative to the initiation ATG; the δ gene has a T A T A A A A between -109 and -103 from the ATG. Alignment of the five nucleotide sequences at the TATA boxes showed that each of the medfly genes has a capping site-like sequence at the canonical distance downstream of its putative TATA box. The γ gene has the sequence C A C A G T T and the δ gene has the sequence T A C A G T T, 30 and 32 bases downstream from the TATA box, respectively. These heptanucleotides represent 5/7 matches to the insect cap site consensus A T C A G/T T C/T. These features are shown in Figure 6A.

Size of the 5' untranslated regions: The D. melanogaster yolk protein mRNAs have rather short 5' leaders: 61, 51 and 56 nucleotides for Yp1, Yp2 and Yp3, respectively (HUNG and WENSINK 1983; GARABEDIAN et al. 1987; YAN, KUNERT and POSTLETHWAIT 1987). If the putative capping sites identified are used by the γ and δ genes, then their 5' leaders are also rather short, ca. 30 and 76 nucleotides, respectively.

Similarities in the 5' flanking sequences: As in D. melanogaster, the sequence homology observed in the coding parts of the C. capitata vitellogenin genes does not extend into the 5' flanking DNA. However, several short nucleotide sequences have been identified in D. melanogaster that are repeated several times in the 5' flanking DNA of the yolk protein genes (GARABEDIAN et al. 1987; YAN, KUNERT and POSTLETHWAIT 1987). The heptamer A/T A/T T G C A A or its complement is encountered seven times within 800 bp upstream from the Yp3 gene, and five times in the 1.2-kb intergenic region between Yp1 and Yp2 (YAN, KUNERT and POSTLETHWAIT 1987). Matches to this heptamer or its complement are found at positions -88 and -343 (relative to the first T of the TATA box) of the γ gene, and at positions -62 and -72 of the δ gene (Figure 6C). A comparison between D. melanogaster and C. capitata flanking sequences also showed the presence of single copies of the sequence G A G N T C A A G/T G/T C G/C at distances from -575 to -124 relative to the TATA box in the Yp2, Yp3, γ and δ genes (Figure 6D). The consensus, and indeed three of the four actual sequences, are close relatives (9 of 12 nucleotides) of the sequence G G G T T C A A T G C A found at the ecdysone responsive element of the D. melanogaster hsp27 gene promoter (RIDDIHOUGH and PELHAM 1987). Although the significance of these similarities is not known, transformation experiments in which in vitro modified medfly genes are introduced into the D. melanogaster genome would test the possibility that they represent regulatory elements conserved between D. melanogaster and C. capitata.

FIGURE 3.-… termination codons, polyadenylation signals, and first and last two bases of each intron are underlined. Numbers refer to nucleotide and amino acid positions.

FIGURE 4.-Nucleotide sequence of the Vg2-δ gene and surroundings. The predicted amino acid sequence is shown above the DNA sequence. The putative TATA box, cap site, initiation and termination codons, polyadenylation signals, and first and last two bases of the intron are underlined. Numbers refer to nucleotide and amino acid positions.

Another similarity between Drosophila and medfly vitellogenin genes is their position in the genome. In Drosophila these genes are on the X chromosome, while in the medfly they are located on chromosome 5. The X chromosome of the medfly is heterochromatic (ZACHAROPOULOU 1990), and recent in situ hybridization studies of medfly polytene chromosomes have revealed that several medfly genes homologous to Drosophila X-linked genes (including the vitellogenins and the two chorion genes s36 and s38) are located on chromosome 5 (A. ZACHAROPOULOU, personal communication). These syntenic associations, combined with those discovered for other medfly genes by MALACRIDA et al. (1986), further support Muller's hypothesis about the evolution of the Diptera (MULLER 1940) and add to the significance of comparisons between the medfly and Drosophila.

FIGURE 5.-Comparison of the amino acid sequences of the three D. melanogaster yolk polypeptides to the deduced C. capitata vitellogenin polypeptide sequences. YP1, D. melanogaster yolk protein 1 (NBRF database accession No. A03332); YP2, D. melanogaster yolk protein 2 (NBRF database accession No. A03333); YP3, D. melanogaster yolk protein 3 (sequence from GARABEDIAN et al. 1987); VG1 and VG2, C. capitata vitellogenins 1 and 2 (genes γ and δ), respectively. The five proteins were aligned using the program CLUSTAL (HIGGINS and SHARP 1988). The positions of the gaps were first determined by pairwise comparisons using the program IALIGN (DAYHOFF, BARKER and HUNT 1983) and then adjusted manually for maximum similarity. Residues that are identical in at least four of the sequences are boxed. The positions of the introns are indicated by filled triangles (intron 2 is found only in the D. melanogaster Yp3 and C. capitata Vg1 genes). The similarity of these sequences to pig triacylglycerol lipase (NBRF database accession No. A00732) is shown above the YP1 sequence. Capital letters show identities between the lipase and at least two of the vitellogenins; lower-case letters indicate conservative replacements. Numbers indicate amino acid positions. Three small gaps in the lipase/vitellogenin alignment are not shown.

The α and β genes have arisen from a recent duplication of the γ-δ pair: The restriction map of the α-β region shown in Figure 2 suggests that this region represents an inverted duplication of the γ-δ region. This was confirmed by partially sequencing the α and β loci. Two parts of the α locus (429 and 565 bp) were sequenced. The first part is identical to bases 1 to 429 of the δ gene, with three differences: a 202A → G substitution in the 5' untranslated region, a 260T → C silent substitution at codon 8, and 363G → A, which results in a replacement, 43Asn → Ile. The second part is identical to bases 1138 to 1702 of the δ sequence, with three differences: a 1330T → C silent substitution at codon 335, and two A → C changes at bases 1593 and 1594; these changes replace the TAA termination codon with a codon for serine, which is followed by the codons GAC and AAC and a TAA termination codon. As a result, the carboxy terminus of the polypeptide encoded by the α gene differs from that of the δ gene product by an extra SerAspAsn tripeptide. Similar results were obtained by partially sequencing the β locus. The sequenced part (992 bp) is more than 98% identical to nucleotides 1130 to 2121 of the γ gene. Of the thirteen differences found, five are in intron 2 (1134A → T, 1153A → G, 1157G → A, 1168T → C, 1174C → G), seven are located downstream from the termination codon (1971T → G, 2056A → G, 2069C → G, 2071T → C, 2074T → C, 2093C → T, 2110G → A) and only one occurs in exon 3 (1187A → T, resulting in a replacement, 206Asn → Ile). We conclude that the α-β and γ-δ pairs of genes

776

[Figure 6, panels A to D: aligned noncoding sequences of the YP1, YP2, YP3, VG1 and VG2 genes]

FIGURE 6.-Sequence similarities at noncoding regions in the vitellogenin genes of D. melanogaster and C. capitata. A, Regions of nucleotide sequence similarity around the transcription and translation initiation sites. B, Comparison of the splice sites and intron sizes. C, Occurrences of the repeated heptamer (see text). D, Matches to the D. melanogaster hsp27 ecdysone response element. For comparison, nucleotide positions in C and D are given relative to the first T of the TATA or putative TATA boxes.

have arisen from a relatively recent duplication event. The data are not sufficient to rule out the possibility that α and β are pseudogenes. However, the findings that the potentially coding regions have diverged less than the noncoding regions, and that all parts of the partial open reading frames that have been sequenced in α and β (450 codons in total) are free of stop codons, strongly suggest that these genes code for variants of vitellogenins 2 and 1, respectively. This is also supported by sequence analysis of vitellogenin cDNA clones from medfly ovaries (K. PALIAKASIS and C. SAVAKIS, unpublished results).

D. melanogaster and C. capitata vitellogenin proteins show extensive sequence and structural conservation: A striking degree of conservation between the D. melanogaster and the C. capitata vitellogenins is revealed when the five amino acid sequences are aligned for maximum similarity. This conservation pertains to primary sequence, hydrophobicity patterns and predicted secondary structure (Figures 5 and 7). For comparison, we divide each sequence into five regions (a to e in Figure 7): The conserved amino-terminal region of all the proteins (region a) is 19 or 20 residues long and

hydrophobic (Fig. 7, bottom); we conclude that this is the signal sequence for secretion (BLOBEL and DOBBERSTEIN 1975). Region b, corresponding to residues 26 to 159 of YP1, is characterized by a low degree of sequence conservation: Twenty-six residues (19%) are invariant in all five proteins. However, there are virtually no gaps in the alignment (only a two-residue deletion in YP1), and there are several conservative replacements, which result in a pronounced conservation of secondary structure and hydrophobicity patterns (Figure 7). A short region between regions a and b cannot be aligned without the introduction of insertions/deletions. Region c corresponds to amino acids 160 to 201 of YP1 and shows no apparent conservation, with the exception of a SerSerGluGlu sequence shared by all proteins. This region varies in size, from 42 amino acids in YP1 to 33 amino acids in VG2. It contains many amino acids with charged and polar side chains (Figure 7, bottom), but does not seem to be conserved at the secondary structure level. Region d is the most conserved one. It is 228 amino acids long, spanning residues 202 to 427 of YP1. The


FIGURE 7.-Hydrophobicity and secondary structure predictions for the vitellogenins of D. melanogaster and C. capitata. (Top) Prediction of α-helix and β-sheet structure according to GARNIER et al. (1978). Regions predicted to have a given secondary structure are shown above the line and are shaded. (Bottom) Hydrophobicity plots of the five proteins. Regions above the line are hydrophobic, and those below are hydrophilic. Calculation and graphic representation were done using the ANALYSEP software package (STADEN 1984). Bars below the graphs represent the consensus vitellogenin sequence, with similarity regions a to e indicated. Open triangles indicate the positions of introns.

five sequences are aligned along this region with only a single amino acid deletion in the YP3 sequence. Ninety-two positions (40%) are invariant, and 90 positions are occupied by structurally conservative residues. As a result, this region is predicted to have a highly conserved secondary structure and shows a highly conserved hydrophobicity pattern (Fig. 7). In addition, the amino-terminal two-thirds of region d shows sequence similarity with pig triacylglycerol lipase (Figure 5), as previously shown for the D. melanogaster yolk proteins (BOWNES et al. 1988). Region e, corresponding to the carboxy end of the proteins, is not conserved and varies in size from 3 (YP3) to 14 amino acids (YP2). The conservation of secondary structure previously observed between YP1 and YP2 has led to the proposition that secondary structure is important in the


common functions of the two proteins (HUNG and WENSINK 1983). Our results favor this idea. The extensive conservation between species estimated to have diverged 120 myr ago (BEVERLEY and WILSON 1984) indicates that these genes are under strong selective pressure. Although the vitellogenins are often envisaged as nutritional storage proteins, they have a number of properties that may put severe constraints on changes of their secondary, as well as tertiary, structure. First, they are known to oligomerize. Second, they probably interact with specific receptors for active transport across the follicular epithelium. Such protein-protein interactions could have strict structural requirements. Finally, there is evidence that they may play a regulatory role in embryogenesis by binding conjugated ecdysteroid hormones (BOWNES et al. 1988). The highly conserved region d is of particular interest. It is almost coincident with the large exon of YP3 and VG1, and is separated from the rest of the molecule by a stretch of hydrophilic amino acids with no apparent secondary structure. It has previously been shown that this region of the D. melanogaster proteins shows weak but significant sequence similarity with the "central homology region" shared by all members of the triacylglycerol lipase family (BOWNES et al. 1988; PERSSON et al. 1989). Our results confirm and extend this observation: 25 out of 145 residues in this region are invariant between pig triacylglycerol lipase and all five vitellogenin sequences, while 65 residues show conservative amino acid replacements (Figure 5). The lipase "central homology region" lies within the lipid binding domain of the lipases.
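The invariant and conservative tallies quoted above (25 of 145 lipase/vitellogenin positions invariant and 65 with conservative replacements, or 92 of 228 positions invariant in region d), as well as the boxing rule of Figure 5 (identical in at least four of the five sequences), are column-by-column counts over a gapped multiple alignment. The Python sketch below illustrates such a tally under stated assumptions; it is not the procedure used for this paper. The amino acid groups that define a "conservative" replacement are a hypothetical simplification (the text does not state its criterion), gapped columns are simply excluded from the invariant and conservative classes, and the five toy sequences are invented rather than taken from Figure 5.

```python
from collections import Counter

# Hypothetical amino acid groups used to call a replacement "conservative";
# the paper does not state which criterion it used.
GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("NQ"),
          set("ST"), set("AG"), set("C"), set("P")]


def group_of(residue):
    """Index of the group containing `residue`, or None if it is in none."""
    for index, group in enumerate(GROUPS):
        if residue in group:
            return index
    return None


def classify_columns(alignment, box_threshold=4):
    """Tally the columns of an equal-length alignment ('-' marks a gap).

    Returns (invariant, conservative, boxed):
      invariant    - ungapped column with the same residue in every sequence
      conservative - ungapped, not invariant, all residues in one group
      boxed        - some residue occurs in >= box_threshold sequences
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    invariant = conservative = boxed = 0
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        if counts and counts.most_common(1)[0][1] >= box_threshold:
            boxed += 1
        if len(residues) < len(column):
            continue  # gapped column: neither invariant nor conservative
        if len(counts) == 1:
            invariant += 1
        else:
            groups = {group_of(r) for r in residues}
            if len(groups) == 1 and None not in groups:
                conservative += 1
    return invariant, conservative, boxed


# Invented toy alignment of five sequences (not real vitellogenin data).
toy = ["MKLVSE", "MKIVSD", "MRLVTE", "MKLV-E", "MKMVSE"]
print(*classify_columns(toy))  # 2 3 4
```

With a different grouping of amino acids (or a substitution-matrix threshold) the conservative count would change, so such figures are only comparable between studies when the criterion is stated.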
Because no catalytic activities have been attributed to the vitellogenins, it has been proposed that this region constitutes a lipid binding domain that serves a lipid carrier function; this prediction is supported by the finding that fatty acid-conjugated ecdysteroids are bound to purified D. melanogaster vitellogenins (BOWNES et al. 1988). Our results, combined with those from D. melanogaster and D. grimshawi, indicate that the vitellogenins of higher Diptera are highly adapted proteins that are subject to strong selective pressure. It is attractive to hypothesize that the conserved regions b and d correspond to discrete structural domains, each derived from a different ancestral peptide and each involved in a different function (or functions). In searches of the NBRF and SWISS-PROT databases with sequences corresponding to regions b and c we did not obtain any significant similarities with other known proteins. Region d, which is in part similar to the vertebrate lipases, is probably a domain involved, at least in part, in lipid binding. The functions of the other regions are unknown and difficult to reveal, mainly because of the lack of point mutations in the

778


D. melanogaster vitellogenins. This drawback, which is probably caused by the presence of multiple vitellogenin genes in the genome, could be overcome by introducing in vitro mutagenized gene copies into the germline and studying their effects on the wild-type genes, as originally suggested by HERSKOWITZ (1987). From an evolutionary point of view, it is remarkable that the vitellogenins of higher Diptera have a different ancestry from all other known vitellogenins, such as those of the chicken, frog, locust and nematode. The latter are generally longer proteins encoded by genes that are interrupted by many introns, and they share sequence similarities that indicate a common evolutionary origin. No sequence similarity can be detected between the dipteran and the other vitellogenins. It appears, therefore, that during the evolution of higher Diptera the functions of the major yolk proteins were taken over by a gene that has a common origin with the present-day triacylglycerol lipases. It is intriguing that another dipteran protein, the enzyme alcohol dehydrogenase (ADH), has an analogous evolutionary history. The D. melanogaster enzyme is related to prokaryotic ribitol dehydrogenases and shows no sequence or intron/exon structure similarity to any of the sequenced eukaryotic ADHs. In contrast, the ADH proteins from yeast, plants and mammals are all related to each other, having presumably evolved from a common ancestral gene (JORNVALL, PERSSON and JEFFREY 1981; JORNVALL et al. 1984; SULLIVAN, ATKINSON and STARMER 1990). With the increasing accumulation of new protein sequences, it will be interesting to see whether such events have also occurred during the evolution of other taxa.

C. capitata and D. melanogaster genes have different codon usage patterns: Synonymous codons are not used with equal frequency and, often, genes from one species share similarities in codon usage (GRANTHAM et al. 1980, 1981).
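Comparisons of codon usage such as those summarized in Table 1 start from counting which base occupies the third position of each codon. A minimal Python sketch of that count and of the %(G+C) figure follows. The helper names are hypothetical, and the sketch assumes an in-frame coding sequence beginning at the initiation codon, skips stop codons and does not exclude Met and Trp codons (which have no synonymous alternatives); the text does not say which of these conventions was used for Table 1. Applied to the YP1 counts of Table 1 (353 G or C out of 440 third positions), it gives 80.227, in line with the tabulated 80.22.

```python
from collections import Counter

# Standard-code stop codons; skipped so that only sense codons are counted.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def third_position_counts(cds):
    """Count A, T, G and C at codon position III of an in-frame CDS.

    Hypothetical helper: assumes `cds` starts at the initiation codon and its
    length is a multiple of 3. Stop codons are not counted.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    counts = Counter({base: 0 for base in "ATGC"})
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon in STOP_CODONS:
            continue
        counts[codon[2]] += 1
    return counts


def percent_gc3(counts):
    """%(G+C) at position III: 100 * (G + C) / (A + T + G + C)."""
    total = sum(counts[base] for base in "ATGC")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Invented toy sequence: ATG GCC AAG TTT CTG GAA TAA
toy = third_position_counts("ATGGCCAAGTTTCTGGAATAA")
print(dict(toy))  # {'A': 1, 'T': 1, 'G': 3, 'C': 1}

# YP1 third-position counts from Table 1; only the G + C total (353 of 440)
# affects the result.
yp1 = {"A": 17, "T": 70, "G": 146, "C": 207}
print(f"{percent_gc3(yp1):.3f}")  # 80.227
```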
Moreover, in species showing nonrandom codon usage, individual genes differ from each other in the degree, rather than in the direction, of the bias (SHARP et al. 1988). Nonrandom usage of alternative codons can be generated by biases in mutation patterns and by selection operating at the level of translation. Selection for more efficient translation is probably driving the codon usage patterns of several genes in prokaryotes and yeast; in these species, abundantly expressed genes show strong codon usage biases, while weakly expressed genes have a more even representation of synonymous codons (GOUY and GAUTIER 1982; IKEMURA 1985; SHARP and LI 1986). In mammals, codon usage varies among genes mainly in (G+C) content, and specifically in the frequency of the dinucleotide CpG, which correlates with base composition around the gene and in introns (AOTA and IKEMURA 1988). D. melanogaster genes also show considerable codon usage bias (BODMER and

TABLE 1
Base utilization at position III of codons for vitellogenin and chorion genes of D. melanogaster and C. capitata

Species (a)     Drosophila                              Ceratitis
Base        YP1     YP2     YP3     s36     s38     VG1     VG2     s36     s38
A            17      19      15      25      34      90      82      74      60
T            70      68      65      52      75     113     137     116     106
G           146     151     139      87      64      83      81      42      38
C           207     205     202     123     134     151     137      87      78
Total       440     443     421     287     307     437     421     321     282
% (G+C)   80.22   80.36   80.99   73.17   64.49   53.31   51.78   40.18   41.13

(a) Sources of sequences: Drosophila YP1, YP2 and YP3 have EMBL nucleic acid database accession numbers V00248, J01157 and M15898, respectively. Drosophila s36 and s38 chorion DNA sequences are from EMBL entry X12635; Ceratitis s36 and s38 sequences are EMBL entries X51342 and X55886, respectively.

ASHBURNER 1984; ASHBURNER, BODMER and LEMEUNIER 1984). On average, there is a strong deficiency of A and a weaker deficiency of T in the third position, and a more marked under-representation of NTA and NAA codons. Among different D. melanogaster genes there is a correlation between the degree of synonymous codon nonrandomness and levels of expression, suggesting that translational selection may be operating in D. melanogaster, as in prokaryotes and yeast (SHIELDS et al. 1988).

We compared codon usage in D. melanogaster and in C. capitata for the vitellogenin genes and for two chorion protein genes, s36 and s38 (KONSOLAKI et al. 1990; TOLIAS et al. 1990). A summary of the results is shown in Table 1. The D. melanogaster genes exhibit the deficiency of A and T in the third position that is typical of most abundantly expressed genes of this species; the bias is stronger in the vitellogenins (80% to 81% G+C in the third position) than in the chorion genes (73% and 64% for s36 and s38, respectively). In contrast, the vitellogenin genes of the medfly show a rather even synonymous codon usage (approximately 50% G+C in the third position), while the medfly chorion genes show a slightly reversed bias (approximately 40% G+C). In all four medfly genes there is also a small bias against GTA, ATA and NCG codons, which is not observed in D. melanogaster (results not shown). There are two alternative explanations for the observed difference between the two species. First, it is possible that selective pressure for specific synonymous codons is weaker in the vitellogenin and chorion genes of the medfly, because these genes are not as abundantly expressed as in D. melanogaster. All four genes code for proteins required at high levels during oogenesis, and C. capitata, with a life cycle twice as long as that of D. melanogaster and a comparable


egg output, may have lower rates of expression of these genes than D. melanogaster. Alternatively, selection of synonymous codons may not be operating at all in the medfly if its effective population size is small. As has been pointed out (SHIELDS et al. 1988), the selection coefficients for synonymous codons are expected to be very low, and selection is possible as long as $N_e s > 1$, where $N_e$ is the effective population size and $s$ is the difference in selection coefficients between synonymous codons. Sequence data from medfly genes encoding weakly and highly expressed proteins may resolve this question. Differences in codon usage patterns have been observed even among species of the genus Drosophila. In members of the Sophophora subgenus, the gene encoding alcohol dehydrogenase exhibits a more biased codon usage (similar to that of D. melanogaster) than in members of the Drosophila subgenus (SULLIVAN, ATKINSON and STARMER 1990). Taken together with the medfly data, these observations should serve as a caution whenever phylogenetic distances are inferred from synonymous substitution rates. To minimize the influence of differences in codon bias, we propose that weakly expressed genes with demonstrated low codon usage bias be used in such studies.

We thank F. C. KAFATOS for a critical reading of the manuscript. This work was supported by a U.S. Department of Agriculture grant.

LITERATURE CITED
AOTA, S., and T. IKEMURA, 1988 Diversity in G + C content in the third position of codons in vertebrate genes and its cause. Nucleic Acids Res. 14: 6345-6355.
ASHBURNER, M., M. BODMER and F. LEMEUNIER, 1984 On the evolutionary relationships of Drosophila melanogaster. Dev. Genet. 4: 295-312.
BENTON, W. D., and R. W. DAVIS, 1977 Screening λgt recombinant clones by hybridization to single plaques in situ. Science 196: 180-182.
BEVERLEY, S. M., and A. C. WILSON, 1984 Molecular evolution in Drosophila and the higher Diptera. II. A time scale for fly evolution. J. Mol. Evol. 21: 1-13.
BLOBEL, G., and B. DOBBERSTEIN, 1975 Transfer of proteins across membranes. J. Cell Biol. 67: 852-862.
BODMER, M., and M. ASHBURNER, 1984 Conservation and change in the DNA sequences coding for alcohol dehydrogenase in sibling species of Drosophila. Nature 304: 425-430.
BOWNES, M., and B. D. HAMES, 1978 Analysis of the yolk proteins in Drosophila melanogaster. FEBS Lett. 96: 327-330.
BOWNES, M., A. SHIRRAS, M. BLAIR, J. COLLINS and A. COULSON, 1988 Evidence that insect embryogenesis is regulated by ecdysteroids released from yolk proteins. Proc. Natl. Acad. Sci. USA 85: 1554-1557.
BRENNAN, M. D., A. J. WEINER, T. J. GORALSKI and A. P. MAHOWALD, 1982 The follicle cells are a major site of vitellogenin synthesis in Drosophila melanogaster. Dev. Biol. 89: 225-236.
CAVENER, D. R., 1987 Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates. Nucleic Acids Res. 15: 1353-1361.
DAYHOFF, M. O., W. C. BARKER and L. T. HUNT, 1983 Establishing homologies in protein sequences. Methods Enzymol. 91: 524-545.


DENHARDT, D. T., 1966 A membrane-filter technique for the detection of complementary DNA. Biochem. Biophys. Res. Commun. 23: 641-652.
FEINBERG, A. P., and B. VOGELSTEIN, 1983 A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132: 6-13.
GARABEDIAN, M. J., M.-C. HUNG and P. C. WENSINK, 1985 Independent control elements that determine yolk protein gene expression in alternative Drosophila tissues. Proc. Natl. Acad. Sci. USA 82: 1396-1400.
GARABEDIAN, M. J., B. M. SHEPHERD and P. C. WENSINK, 1986 A tissue-specific transcription enhancer from the Drosophila yolk protein 1 gene. Cell 45: 859-867.
GARABEDIAN, M. J., A. D. SHIRRAS, M. BOWNES and P. C. WENSINK, 1987 The nucleotide sequence of the gene coding for Drosophila melanogaster yolk protein 3. Gene 55: 1-8.
GARNIER, J., D. J. OSGUTHORPE and B. ROBSON, 1978 Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120: 97-120.
GOUY, M., and C. GAUTIER, 1982 Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10: 7055-7074.
GRANTHAM, R., C. GAUTIER, M. GOUY, R. MERCIER and A. PAVE, 1980 Codon catalogue usage and the genome hypothesis. Nucleic Acids Res. 8: r49-r62.
GRANTHAM, R., C. GAUTIER, M. GOUY, M. JACOBZONE and R. MERCIER, 1981 Codon catalogue usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res. 9: r43-r74.
HATZOPOULOS, P., and M. P. KAMBYSELLIS, 1987 Isolation and structural analysis of Drosophila grimshawi vitellogenin genes. Mol. Gen. Genet. 206: 475-484.
HAYMER, D. S., J. E. ANLEITNER, M. HE, S. THANAPHUM, S. H. SAUL, J. IVY, K. HOUTCHENS and L. ARCANGELI, 1990 Actin genes in the Mediterranean fruit fly, Ceratitis capitata. Genetics 125: 155-160.
HERSKOWITZ, I., 1987 Functional inactivation of genes by dominant negative mutations. Nature 329: 219-222.
HIGGINS, D. G., and P. M. SHARP, 1988 CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73: 237-244.
HOLMES, D. S., and J. BONNER, 1973 Preparation, molecular weight, base composition, and secondary structure of giant nuclear ribonucleic acid. Biochemistry 12: 2330-2338.
HULTMARK, D., R. KLEMENZ and W. J. GEHRING, 1986 Translational and transcriptional control elements in the untranslated leader of the heat-shock gene hsp22. Cell 44: 429-438.
HUNG, M.-C., and P. C. WENSINK, 1983 Sequence and structure conservation in yolk proteins and their genes. J. Mol. Biol. 164: 481-492.
IKEMURA, T., 1985 Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2: 13-34.
JORNVALL, H. M., M. PERSSON and J. JEFFREY, 1981 Alcohol and polyol dehydrogenases are both divided into two protein types, and structural properties cross-relate the different enzyme activities within each type. Proc. Natl. Acad. Sci. USA 78: 4226-4230.
JORNVALL, H. M., H. BAHR-LINDSTROM, K. D. JANY, W. ULMER and M. FROSCHLE, 1984 Extended superfamily of short alcohol-polyol-sugar dehydrogenases: structural similarities between glucose and ribitol dehydrogenases. FEBS Lett. 165: 190-196.
KELLER, E., and W. A. NOON, 1985 Intron splicing: a conserved internal signal in introns of Drosophila pre-mRNAs. Nucleic Acids Res. 13: 4971-4981.
KONSOLAKI, M., K. KOMITOPOULOU, P. P. TOLIAS, D. L. KING, C. SWIMMER and F. C. KAFATOS, 1990 The chorion genes of the medfly, Ceratitis capitata. I. Structural and regulatory conservation of the s36 gene relative to two Drosophila species. Nucleic Acids Res. 18: 1731-1737.
KOZMA, R., and M. BOWNES, 1986 Yolk protein induction in males of several Drosophila species. Insect Biochem. 16: 263-271.
MALACRIDA, A., G. GASPERI, G. F. BISCALDI and R. MILANI, 1986 Persistence of linkage relationships among enzyme loci in some Dipteran species. Atti Assoc. Genet. Ital. 31: 121-122.
MANIATIS, T., E. F. FRITSCH and J. SAMBROOK, 1982 Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
MINTZAS, A. C., G. CHRYSANTHIS, C. CHRISTODOULOU and V. J. MARMARAS, 1983 Translation of the mRNAs coding for the major haemolymph proteins of Ceratitis capitata in a cell-free system: comparison of the translatable mRNA levels to the respective biosynthetic levels of the proteins in the fat body during development. Dev. Biol. 95: 492-496.
MOUNT, S. M., 1982 A catalogue of splice junction sequences. Nucleic Acids Res. 10: 459-472.
MULLER, H. J., 1940 Bearings of the Drosophila work on systematics, pp. 185-268 in The New Systematics, edited by J. HUXLEY. Clarendon Press, Oxford.
NARDELLI, D., S. GERBER-HUBER, F. D. VAN HET SCHIP, M. GRUBER, G. AB and W. WAHLI, 1987 Vertebrate and nematode genes coding for yolk proteins are derived from a common ancestor. Biochemistry 26: 6397-6402.
PERSSON, B., G. BENGTSSON-OLIVECRONA, S. ENERBACK, T. OLIVECRONA and H. JORNVALL, 1989 Structural features of lipoprotein lipase. Lipase family relationships, binding interactions, non-equivalence of lipase cofactors, vitellogenin similarities and functional subdivision of lipoprotein lipase. Eur. J. Biochem. 179: 39-45.
POSTLETHWAIT, J. H., M. BOWNES and T. JOWETT, 1980 Sexual phenotype and vitellogenin synthesis in Drosophila melanogaster. Dev. Biol. 79: 379-387.
PROUDFOOT, N. J., and G. G. BROWNLEE, 1976 3' Non-coding region sequences in eukaryotic messenger RNA. Nature 263: 211-214.
RIDDIHOUGH, G., and H. R. B. PELHAM, 1987 An ecdysone response element in the Drosophila hsp27 promoter. EMBO J. 6: 3729-3734.
RINA, M. D., and A. C. MINTZAS, 1987 Two vitellins-vitellogenins of the Mediterranean fruit fly Ceratitis capitata: a comparative biochemical and immunological study. Comp. Biochem. Physiol. 86B: 801-808.
RINA, M. D., and A. C. MINTZAS, 1988 Biosynthesis and regulation of two vitellogenins in the fat body and ovaries of Ceratitis capitata (Diptera). Roux's Arch. Dev. Biol. 197: 167-174.
SHARP, P. M., and W.-H. LI, 1986 On the rate of DNA sequence evolution in Drosophila. Nucleic Acids Res. 14: 7737-7749.
SHARP, P. M., E. COWE, D. G. HIGGINS, D. C. SHIELDS, K. H. WOLFE and F. WRIGHT, 1988 Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res. 16: 8207-8211.
SHIELDS, D. C., P. M. SHARP, D. G. HIGGINS and F. WRIGHT, 1988 "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5: 704-716.
SPIETH, J., K. DENISON, S. KIRTLAND, J. CANE and T. BLUMENTHAL, 1985 The C. elegans vitellogenin genes: short sequence repeats in the promoter regions and homology to the vertebrate genes. Nucleic Acids Res. 13: 5283-5295.
STADEN, R., 1984 Graphic methods to determine the function of nucleic acid sequences. Nucleic Acids Res. 12: 521-538.
SULLIVAN, D. T., P. W. ATKINSON and W. T. STARMER, 1990 Molecular evolution of alcohol dehydrogenase genes in the genus Drosophila. Evol. Biol. 24: 107-147.
TEEM, J. L., N. A. ABOVICH, N. F. KAUFER, W. F. SCHWINDINGER, J. R. WARNER, A. LEVY, J. WOOLFORD, R. J. LEER, M. M. C. VAN RAAMSDONK-DUIN, W. H. MAGER, R. J. PLANTA, L. SCHULTZ, J. D. FRIESEN, H. FRIED and M. ROSBASH, 1984 A comparison of yeast ribosomal protein gene DNA sequences. Nucleic Acids Res. 12: 8295-8312.
TOLIAS, P. P., M. KONSOLAKI, K. KOMITOPOULOU and F. C. KAFATOS, 1990 The chorion genes of the medfly, Ceratitis capitata. II. Characterization of three novel cDNA clones obtained by differential screening of an ovarian library. Dev. Biol. 140: 105-112.
WALLACE, R. B., M. J. JOHNSON, S. Y. SUGGS, K. MIYOSHI, R. BHATT and K. ITAKURA, 1981 A set of synthetic oligodeoxyribonucleotide primers for DNA sequencing in the plasmid vector pBR322. Gene 16: 21-26.
WARREN, T. J., and A. P. MAHOWALD, 1979 Isolation and partial chemical characterization of the three major egg yolk polypeptides from Drosophila melanogaster. Dev. Biol. 68: 130-139.
YAN, Y. L., C. J. KUNERT and J. H. POSTLETHWAIT, 1987 Sequence homologies among the three yolk polypeptide (Yp) genes in Drosophila melanogaster. Nucleic Acids Res. 15: 67-85.
ZACHAROPOULOU, A., 1990 Polytene chromosome maps in the medfly Ceratitis capitata. Genome 33: 184-197.

Communicating editor: W. M. GELBART