Proc. Natl. Acad. Sci. USA Vol. 90, pp. 10783-10787, November 1993 Developmental Biology
The precursor region of a protein active in sperm-egg fusion contains a metalloprotease and a disintegrin domain: Structural, functional, and evolutionary implications (PH-30/spermatogenesis/snake venom/astacin/cell adhesion)
TYRA G. WOLFSBERG†‡, J. FERNANDO BAZAN‡§, CARL P. BLOBEL†‡¶, DIANA G. MYLES‖, PAUL PRIMAKOFF‖, AND JUDITH M. WHITE†‡**
Departments of †Pharmacology and ‡Biochemistry and Biophysics, University of California, San Francisco, CA 94143; and ‖Department of Physiology, University of Connecticut Health Center, Farmington, CT 06030
Communicated by Bruce M. Alberts, August 4, 1993
ABSTRACT PH-30, a sperm surface protein involved in sperm-egg fusion, is composed of two subunits, α and β, which are synthesized as precursors and processed, during sperm development, to yield the mature forms. The mature PH-30 α/β complex resembles certain viral fusion proteins in membrane topology and predicted binding and fusion functions. Furthermore, the mature subunits are similar in sequence to each other and to a family of disintegrin domain-containing snake venom proteins. We report here the sequences of the PH-30 α and β precursor regions. Their domain organizations are similar to each other and to precursors of snake venom metalloproteases and disintegrins. The α precursor region contains, from amino to carboxyl terminus, pro, metalloprotease, and disintegrin domains. The β precursor region contains pro and metalloprotease domains. Residues diagnostic of a catalytically active metalloprotease are present in the α, but not the β, precursor region. We propose that the active sites of the PH-30 α and snake venom metalloproteases are structurally similar to that of astacin. PH-30, acting through its metalloprotease and/or disintegrin domains, could be involved in sperm development as well as sperm-egg binding and fusion. Phylogenetic analysis indicates that PH-30 stems from a multidomain ancestral protein.
PH-30, a guinea pig sperm surface protein, is a candidate sperm-egg membrane binding and fusion protein (1-5). The PH-30 subunits found on fertilization-competent sperm, mature α and mature β, share membrane topologies and other characteristics with viral binding and fusion proteins. The β subunit contains a potential receptor binding domain, a disintegrin domain, related to soluble integrin ligands found in snake venom. The α subunit contains a potential fusion peptide. In addition, the two subunits share sequence similarity. Snake venom disintegrins derive from precursors that also contain zinc-dependent metalloprotease domains (6, 7). Interestingly, PH-30 α and β are present on testicular spermatogenic cells as larger precursors, termed here pro-α and pro-β (2). Here we show that the precursor regions of PH-30 α and β (the regions amino-terminal to the mature proteins and found on developing, but not fertilization-competent, sperm) share further amino acid identity with each other as well as with this family of metalloprotease and disintegrin domain-containing snake venom proteins.††
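Amino acid identity between two proteins is conventionally scored over aligned positions. As a minimal sketch, assuming two sequences that have already been aligned and that use "-" for gaps, percent identity excluding gaps could be computed as follows. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from the paper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns where both sequences carry a residue.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment (illustration only, not PH-30 sequence data).
toy_a = "MKT-LCDGQ"
toy_b = "MRTAL-DGE"
print(f"{percent_identity(toy_a, toy_b):.1f}% identical over ungapped columns")
```

Columns containing a gap in either sequence are dropped from both the numerator and the denominator, which is one common reading of "excluding gaps"; other conventions (for example, dividing by the length of the shorter sequence) would give different values.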
MATERIALS AND METHODS Cloning. A portion of the α precursor region sequence was obtained from a PCR product generated by the nested RACE (rapid amplification of cDNA ends) protocol (3). The sequences of the remainder of the α precursor region and the entire β precursor region were determined from clones of α and β isolated at high stringency (3) from a guinea pig whole-testis cDNA library (8).
Northern Analysis. RNA was isolated from adult male guinea pig tissues (9), electrophoresed in a formaldehyde/agarose gel, and transferred and cross-linked to a Hybond-N nylon membrane (Amersham). High-stringency prehybridization and hybridization with PH-30 α and β ³²P-labeled DNA probes were carried out at 65°C in 5× standard saline citrate (SSC)/5× Denhardt's solution/0.1% SDS containing salmon sperm DNA at 0.2 mg/ml. The membrane was washed, 10 min per wash, in 2× SSC/0.1% SDS once at room temperature and twice at 65°C and then in 0.2× SSC/0.1% SDS twice at 65°C. Hybridization with a mouse β-actin probe was carried out in the same solution at 55°C, with identical wash conditions.
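The stringency of the hybridization and wash steps above is governed largely by salt concentration and temperature. The sketch below tabulates the approximate sodium concentration at each step. It assumes the conventional 1× SSC recipe (150 mM NaCl, 15 mM trisodium citrate), which the text does not state explicitly, and the helper names are illustrative only.

```python
# Assumption: conventional 1x SSC = 150 mM NaCl + 15 mM trisodium citrate.
SSC_1X_MM = {"NaCl": 150.0, "trisodium citrate": 15.0}


def ssc_composition(fold: float) -> dict:
    """Component concentrations (mM) of an n-fold SSC solution."""
    return {name: conc * fold for name, conc in SSC_1X_MM.items()}


def sodium_mM(fold: float) -> float:
    """Total Na+ (mM): one per NaCl plus three per trisodium citrate."""
    comp = ssc_composition(fold)
    return comp["NaCl"] + 3.0 * comp["trisodium citrate"]


# Steps as described in the Northern analysis paragraph.
steps = [
    ("prehybridization/hybridization", 5.0, "65°C"),
    ("wash 1", 2.0, "room temperature"),
    ("washes 2-3", 2.0, "65°C"),
    ("washes 4-5", 0.2, "65°C"),
]

for label, fold, temperature in steps:
    print(f"{label:32s} {fold:>4}x SSC  ~{sodium_mM(fold):6.1f} mM Na+  {temperature}")
```

Lower salt at a fixed temperature destabilizes imperfectly matched hybrids, so the final 0.2× SSC washes at 65°C are the most stringent steps in the series.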
RESULTS AND DISCUSSION The amino acid sequences of the PH-30 α and β precursor regions were deduced from cDNA sequences and are shown in Fig. 1. Following their signal sequences, the α and β precursor regions contain sequences similar to those in the prodomains of disintegrin domain-containing snake venom proteins, and then sequences which align with the snake venom zinc-dependent metalloprotease domain (Figs. 1 and 2). α contains the consensus active-site residues for a metalloprotease (see below); β does not. Following the metalloprotease domain, both proteins contain a disintegrin domain (Figs. 1 and 2). The cleavage site which generates mature α falls within the disintegrin domain (3) (arrows, Figs. 1 and 2). The cleavage site which generates mature β lies at the amino terminus of the disintegrin domain (3) (arrow before position 383, Figs. 1 and 2). The sequence alignment of mature PH-30 α and β with the snake venom proteins continues through the cysteine-rich domain (Figs. 1 and 2). No snake venom proteins include either the epidermal growth factor repeat or the transmembrane and cytoplasmic segments of α and β (Figs. 1 and 2). Additional mammalian genes encode proteins with domain organizations identical to those of PH-30 α and β (Figs. 1 and 2). EAP I, cloned from rat and monkey, is an androgen-regulated protein located on the apical surface of epididymal epithelial cells (13). Cyritestin is a mouse testis cDNA (GenBank accession no. X64227).
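Whether a sequence carries consensus metalloprotease active-site residues can be checked by a simple pattern scan. The sketch below assumes the extended zinc-binding motif HEXXHXXGXXH that is commonly described for astacin and the snake venom metalloproteases; the consensus used in the paper is not spelled out in this section, so the pattern is an assumption, and the function name and toy sequences are hypothetical.

```python
import re

# Assumed pattern: extended zinc-binding motif H-E-x-x-H-x-x-G-x-x-H.
# A lookahead is used so that overlapping occurrences are all reported.
ZINC_MOTIF = re.compile(r"(?=HE..H..G..H)")


def find_zinc_motif(sequence: str) -> list:
    """Return 1-based start positions of the assumed motif in a protein sequence."""
    return [m.start() + 1 for m in ZINC_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequences (illustration only, not PH-30 sequence data).
toy_with_motif = "MKAAHELGHNLGMEHDDSK"
toy_without_motif = "MKAAQELGQNLGMEQDDSK"

for name, seq in [("with motif", toy_with_motif), ("without motif", toy_without_motif)]:
    hits = find_zinc_motif(seq)
    print(f"{name}: positions {hits}" if hits else f"{name}: no motif found")
```

A match indicates only that the diagnostic residues are present in the expected spacing; it does not by itself show that a domain is catalytically active.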
§Present address: Department of Molecular Biology, DNAX, Palo Alto, CA 94304.
¶Present address: Department of Cellular Biochemistry and Biophysics, Sloan-Kettering Institute, New York, NY 10021.
**To whom reprint requests should be addressed.
††The sequences reported in this paper have been deposited in the GenBank data base (accession nos. Z11719 and Z11720).
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
[FIG. 1. Multiple sequence alignment of PH-30 α, PH-30 β, cyritestin, EAP I, and the snake venom proteins jararhagin, Ht-e, and trigramin, with the metalloprotease domain and the disintegrin domain labeled. The aligned residues are not recoverable from this copy.]
Swiss-Prot (Release 24) data base. Prodomain. PH-30 α and β are 3[?]% identical (excluding gaps) over this region. The first amino acid residue shown for α and β is that of the first in-frame methionine. α contains four potential start methionines (underlined), none of which is followed by an obvious signal sequence (10). The methionine at position 7 is encoded by an AUG codon in the most optimal context for translation initiation (11). The putative start methionine of β (underlined), encoded by an AUG codon in a good context to initiate translation (11), is followed by a potential signal sequence (10). Potential sites for signal-sequence cleavage are marked with arrows. Stars mark cysteine residues which may be involved in a matrix metalloprotease-like "cysteine switch" (12) activation of the metalloprotease domain. Metalloprotease domain. PH-30 α and β are 27% identical (excluding gaps) over this region. The consensus snake venom metalloprotease active-site sequence,