The fold recognition of CP2 transcription factors gives new insights into the function and evolution of tumor suppressor protein p53

Cell Cycle

ISSN: 1538-4101 (Print) 1551-4005 (Online) Journal homepage: http://www.tandfonline.com/loi/kccy20

Katarzyna Kokoszyńska, Jerzy Ostrowski, Leszek Rychlewski & Lucjan S. Wyrwicz

To cite this article: Katarzyna Kokoszyńska, Jerzy Ostrowski, Leszek Rychlewski & Lucjan S. Wyrwicz (2008) The fold recognition of CP2 transcription factors gives new insights into the function and evolution of tumor suppressor protein p53, Cell Cycle, 7:18, 2907-2915, DOI: 10.4161/cc.7.18.6680

To link to this article: http://dx.doi.org/10.4161/cc.7.18.6680

Published online: 15 Sep 2008.


[Cell Cycle 7:18, 2907-2915; 15 September 2008]; ©2008 Landes Bioscience

Report

Katarzyna Kokoszynska,1 Jerzy Ostrowski,1,2 Leszek Rychlewski3 and Lucjan S. Wyrwicz1,*

1Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology; Warsaw, Poland; 2Medical Center for Postgraduate Education; Warsaw, Poland; 3BioInfoBank Institute; Poznan, Poland

Abbreviations: gi, GenBank identifier; Ig, immunoglobulin; MQAP, model quality assessment program; PDB, protein data bank


of important transcriptional regulators was determined. CP2 and homologous proteins belonging to the beta-scaffold factors comprise a family of transcription factors of as yet unknown structure. CP2 has been observed to contribute to the regulation of gene expression from early development to terminal differentiation.1 It was first discovered in a mouse model as a factor binding to the α-globin promoter and stimulating the transcription of related genes in both fetal and adult tissues.2 Subsequently, several homologous proteins were found in other organisms, including Drosophila, chicken and human.1,3-5 The family of related transcription factors (CP2 family) consists of proteins of around 500 amino acids in length, present in genomes throughout the Metazoa/Fungi group. Experimental studies defined two major functional regions of the CP2 proteins: the most conserved, N-terminal segment of 220–360 amino acids, involved in DNA binding,6,7 and a shorter C-terminal domain taking part in protein-protein interactions, mostly dimerization.8 This organization is observed for all members of the gene family. The highly divergent spectrum of actions associated with specific members of the CP2 family results from variations in the DNA-binding module, a different interactome and a specific pattern of tissue distribution.5

One of the most studied genes of this family, the Drosophila Grainyhead gene (GRH), was reported to be involved in several important biological processes, including the regulation of the cell cycle, cell growth and development.3 The protein, expressed in a number of epithelial structures, acts as a transcriptional activator regulating genes involved in epidermal development. The protein product of GRH is synthesized during oogenesis and stored in the developing oocytes. The Grainyhead protein is responsible for apical membrane growth and tube elongation in tracheal development. The loss of its activity blocks neuroblast apoptosis.9 Its human homolog, Grainyhead-like 1 protein (GRHL1; Mammalian Grainyhead, MGR), plays a role in the early embryogenesis of the central nervous system as well as in the later stages of cuticle development, and is highly expressed in brain, pancreas, tonsil, placenta, kidney and liver.3 Two additional mammalian members of the CP2 family have also been identified, i.e., Sister-of-MGR (SOM), also known as Grainyhead-like protein 2 (GRHL2), and Brother-of-MGR (BOM; Grainyhead-like protein 3;


The CP2 transcription factor (TFCP2) is a critical regulator of erythroid gene expression. Apart from its involvement in the transcriptional switch of globin gene promoters, it activates an array of cellular and viral gene promoters. A number of homologous proteins were identified in genomes of Metazoa, with five additional homologues encoded by the human genome (TFCP2L1, UBP1, GRHL1, GRHL2, GRHL3). Although several experimental studies have already been published, knowledge of the molecular mechanism of activity of these transcription factors remains very limited.


Key words: grainy head transcription factors, CP2 family, TFCP2, TP53, bioinformatics, protein structure prediction


Here we present the application of fold recognition and protein structure prediction in drafting the structure-to-function relationship of the CP2 family. The employed procedure clearly shows that the family adopts a DNA-binding immunoglobulin fold homologous to the p53 (TP53) core domain, while a novel type of ubiquitin-like domain and a sterile alpha motif (SAM) form oligomerization modules. With a traceable evolution of the CP2 family throughout the Metazoa group, this protein family is highly likely to represent an ancestor of the critical cell cycle regulator p53. Based on this observation, several functional hypotheses on the structure-to-function relationship of p53 were drawn. The DNA motif recognized by p53 is a result of further specialization of the CP2 DNA-binding module. The analysis also shows the critical role of protein oligomerization for the function of this protein superfamily. Finally, the identification of distant homologs of TP53 allowed a phylogenetic footprinting analysis explaining the role of specific amino acids important for both protein folding and DNA binding.


Introduction

In the post-genome era, a number of transcription factor families were identified. As of now the structure of the majority

*Correspondence to: Lucjan S. Wyrwicz; Department of Gastroenterology; Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology; Roentgena 5; Warsaw 02-781 Poland; Tel.: +48.22.546.2933; Fax: +48.22.644.0209; Email: [email protected]

Submitted: 07/10/08; Revised: 07/23/08; Accepted: 07/25/08

Previously published online as a Cell Cycle E-publication: http://www.landesbioscience.com/journals/cc/article/6680


Modeling of CP2 transcription factors

expression in adult kidney, intestine, lung and skin.21 The mechanism of the transcriptional silencing remains undetermined.21 Although the members of the CP2 family are involved in the regulation and execution of various processes, knowledge of the direct mechanism of their activity is very limited. Here we present a protein structure prediction for the CP2 family in order to elucidate the molecular mechanism of CP2-directed regulation of gene expression.

Results


Defining structural and functional domains. The obtained secondary structure predictions, together with the inspection of sequence conservation, pointed to the existence of three distinct sequence modules within the CP2 protein family. Of these, the constantly present N-terminal domain was mapped to the previously reported DNA-binding region (residues 37–264 of human TFCP2; GenBank gi|21361278), which corresponds to the PfamA database entry PF04516. The short C-terminal domain of circa 80 amino acids was mapped in all analyzed proteins to the region previously annotated as involved in dimerization (421–502 of TFCP2). The remaining internal segment of approximately 90 amino acids was observed to be the most variable region. Here, consistent predictions of secondary structure (PsiPred, ProfSec) and a globular amino acid signature (GlobPlot; data not shown) were observed only for TFCP2 and two closely related genes (TFCP2L1 and UBP1). This fragment, corresponding to residues 327–415 in human TFCP2, was refined for further fold recognition. The overview of the domain composition of the analyzed family is shown in Figure 1.

Fold recognition of the domains. To elucidate the mechanism of action of the DNA-binding domain, the protein sequences were submitted to various methods of protein structure prediction accessed via the Protein Structure Prediction MetaServer. The applied fold recognition methodology, especially advanced profile-profile methods such as MetaBasic,25 Orfeus39 and FFAS03,26 as well as threading methods (3D-PSSM,28 INUB40), provided consistent predictions pointing to the family of p53-like transcription factors (b.2.5 fold in the SCOP classification of structural domains). The prediction was confirmed by the prediction quality assessment tool 3D-Jury,33 with a confident score of up to 74.44 for Drosophila ELF1 (Tables 2 and 3); previous tests indicated that predictions reaching a 3D-Jury score above 50 are associated with less than 5% error.41 The manually refined alignment is shown in Figure 2.

The profile-sequence searches against the PfamA database performed with HMMer42 resulted in a marginal hit to the sterile alpha motif (SAM; PfamA entry PF00536), while secondary structure prediction applied to the internal domain of TFCP2 suggested that this domain represents an all-alpha fold of five helices. The applied methods of distant homology mapping (profile-profile: FFAS03; meta-profile: MetaBasic) and threading methods (INUB, mGenThreader, HHpred2) confirmed that the discussed domain is distantly related to SAM, with a 3D-Jury score of 60.22 for human TFCP2 (Fig. 3). For the C-terminal dimerization domain, too, the applied fold recognition methodology pointed to a highly consistent mapping to ubiquitin-like folds (SAM-T02, SAM-T06, FUGUE, FFAS03, MetaBasic; d.15.1 entry in the SCOP database of protein folds). As shown in the alignment in Figure 4, all predicted


GRHL3).3,10 These genes exhibit a high degree of sequence similarity, with traceable homology within the functional regions involved in DNA binding, protein dimerization and activation.10 GRHL2 is highly expressed in various organs, including placenta, prostate, brain and kidney. Mutations of this gene are responsible for autosomal dominant non-syndromic sensorineural deafness type 28 (DFNA28).11 The third member of the family, GRHL3, is expressed in brain, colon, pancreas, placenta and kidney, and functions as a tissue-specific transcription factor in these tissues.10

Another member of the CP2 family is the highly conserved transcription factor TFCP2, critical for the enhanced transcription of globin genes during erythroid differentiation and hemoglobin synthesis. Its DNA-binding sites (cis-sites) were identified in promoters of multiple erythroid-specific genes and in promoters of all human globin genes.12,13 In the regulation of globin gene expression, the TFCP2 protein interacts with other erythroid-specific transcription factors. Finally, it plays a central role in the hemoglobin switching mechanism by forming a heteromeric complex with developmental stage-specific proteins (stage selector protein; SSP), a critical regulator of fetal hematopoiesis in human.14,15

Apart from the regulation of the globins, TFCP2 might additionally be involved in the regulation of an array of cellular and viral genes. The list of promoters potentially regulated by this transcription factor is still growing (summarized in Table 1). Binding sites for TFCP2 were defined for the murine γ-fibrinogen promoter, chicken αA-crystallin, the SV40 virus major late promoter, and human interleukin-4, ornithine decarboxylase, the transcription factor c-myc and the c-fos gene, whose activity is modulated by cell growth signals.1 It acts as a regulator of BMP4 (bone morphogenetic protein 4), with TFCP2 overexpression stimulating transcription of this gene during osteoblast differentiation. Various distinct binding motifs of CP2 with a low degree of similarity were identified, suggesting that TFCP2 function may rely on interaction with many distinct target proteins.1,16 Also, the interaction between TFCP2 and YY1 forms a complex recognizing the initiation region of the human immunodeficiency virus type 1 (HIV-1).5

Apart from binding to the specified gene promoters, some general activities of TFCP2 await detailed description. TFCP2 was identified as a key factor in the T-cell proliferative response: it participates in the activation of T helper cells in response to mitogenic stimulation.17 Here, TFCP2 is involved in the negative regulation of cytokine genes expressed in T cells. This protein was also identified as a genetic factor probably involved in the development of Alzheimer disease (reviewed in ref. 18), but its detailed role in this process still needs further characterization.19

The Upstream-binding protein 1 (UBP1), also known as the leader-binding protein 1 (LBP1), is highly similar to TFCP2, with 88% sequence identity between the human proteins. Similarly to TFCP2, its protein product is involved in the regulation of the alpha-globin gene in erythroid cells and exhibits a highly specific tissue distribution.13 The strong interaction between UBP1 and the initiator element and inducer of the HIV genome was described previously.20 Another member of the CP2 family, TFCP2L1, is closely related to TFCP2 in terms of protein primary structure (70% sequence identity). This protein is also known as CP2-Related Transcriptional Repressor-1 (CRTR-1; LBP-9), since it can play an opposite role in cells, suppressing CP2-directed transcription. TFCP2L1 is differentially distributed throughout the organism, with detectable


Table 1. Summary of known cis-sites recognized by the family of CP2 transcription factors

Protein; Organism; Bound sequence; Sequence; Reference (PubMed ID)

TFCP2 M. musculus α-globin promoter

92-GACCAAACACG-104 107-CGAGATGGGTCA-119 123-GAACAATCCCG-134

2233727

TFCP2 H. sapiens GSK-3β promoter

1292-GCGCACACCAA-1282 1-GCCCGGGCCAA-10

16645641

174-CTGATTTCACAGG-162

10973979

TFCP2

H. sapiens

IL-4 promoter

TFCP2

H. sapiens

SV40 promoter

repeat GGGCGG

2159933

α-Globin γ-Fibrinogen SV40 late promoter HIV-1 promoter

GAGCAAGCACAAACCAGCCAA TGACCAGTTCCAGCCACTCTT GATCATGGGCGGAACTGGGCGGAG GATCGTACTGGGTCTCTCTGGTTAGACCAGATC

8157699

TFCP2 H. sapiens M. musculus M. musculus

α-globin promoter

CCTAACAAGTTTTACTGG(N6)AAGCACAAACCAG

8349681

TFCP2

G. gallus

cENS-1 promoter

CAAGTCCAGGCAAGT

15107494

TCCAGTGAGGCCAGGGGCCGGCGGCTGGCTAGGGATGA AGTATCCTCTTGGGGGCGCCTTCCCCACAC CGGAGCAAGCACAAACCAGCCAA CGTGACCAGTTCCAGCCACTCTT AGCUTGGGGAAGAGGAGGGGCCCGGCGGAGGCGATAAAAGT GGGGACACAGACA

7828600

GATCAAACAATCTGGTTTTGAGCGTTA CTAGAGCGATTGAACCGGTCCTGCGGT TGCCAACTGGTTTGATTGTTCACACTTTTT TTTCATACTGGTTGCTTGA AAACAAACCTGATATATT ACTTTTCCCTGGGCTAATG

12888489


TFCP2


Ubx (D. melanogaster) Ddc (D. melanogaster) PCNA (D. melanogaster) dbl-1I (C. elegans) mab-5 (C. elegans) CeDdc I (C. elegans)


GRH C. elegans


TFCP2 H. sapiens SSE HPFH α-globin CAATbox γ-fibrinogen: NF-E4

Ig-like proteins were T-box, implicated in the development of a number of different tissues, NDT80 and LAG-1 (Longevity assurance protein 1). Although the general fold topology is preserved, the diversity of DNA sequences recognized by such Ig-fold proteins is related to different binding modules as well as to the specific composition of protein-DNA complexes (homodimers: STAT3, T-domain; heterodimers: Runx1, NFAT1; tetramer: TP53).

As shown in Table 2, the applied methods consistently mapped the CP2 DNA-binding domain to the p53 family. Although the p53 family consists of several distinct transcription factors, the majority of knowledge about this family is based on studies of its most important member, TP53. Owing to its ability to control progression through the cell cycle in response to DNA damage, TP53 is referred to as a tumor suppressor protein. It contains four functionally distinct domains spread throughout the 408 amino acid open reading frame: the N-terminal transactivation domain (residues 1–70; gi|120407068),44 the core DNA-binding domain (96–289), the regulatory domain (residues 320–360) and the C-terminal tetramerization domain. With the ability to recognize nearly 300 human promoters,45 it regulates a broad range of cellular pathways, including apoptosis, cellular senescence and cell cycle arrest.46,47 Crystallographic studies and mutational assays of TP53 allowed the identification of several functional sites (reviewed in ref. 48). In general, the DNA-binding domain of beta-sandwich architecture utilizes two large loops involved in binding a zinc ion and the loop-sheet-helix motif as an interface for binding to the major groove of DNA.49 Among the residues highly conserved throughout the whole TP53 family, three cysteines (C176, C238, C242) and the histidine


secondary structure elements were mapped onto the observed structures of the distantly similar domain. Surprisingly, the poly-glutamine stretches of TFCP2 and the two other related transcription factors (TFCP2L1 and UBP1) are located within the first loop of this ubiquitin-like fold (compare Fig. 4).

The structural models obtained with Modeller for all domains, based on the alignments from Figures 2–4, were assessed with MQAPs. The ProSA-Web tool returned Z-scores in the range of native conformations, while ProQ (MaxSub) classified the models as ‘fairly good’ (Table 4).
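The ProQ classification mentioned above can be reproduced from the cut-offs quoted in the note to Table 4. The following is a minimal sketch, not the ProQ program itself: it only applies those published thresholds to the MaxSub and LG values listed in Table 4. The function names and the label used for scores below the lowest cut-off are assumptions made for illustration.

```python
# Cut-offs as quoted in the note to Table 4 (strictly "greater than").
LG_CUTOFFS = [(4.0, "extremely good"), (2.5, "very good"), (1.5, "fairly good")]
MAXSUB_CUTOFFS = [(0.8, "extremely good"), (0.5, "very good"), (0.1, "fairly good")]

# (domain, ProQ MaxSub score, ProQ LG score) for the TFCP2 models in Table 4.
MODELS = [
    ("CP2 DNA binding", 0.137, 1.183),
    ("SAM", 0.385, 2.289),
    ("Ubiquitin-like", 0.133, 1.233),
]


def classify(score, cutoffs):
    """Return the label of the highest cut-off that the score exceeds."""
    for threshold, label in cutoffs:
        if score > threshold:
            return label
    # Label for this case is an assumption; the table note does not name it.
    return "below 'fairly good'"


def summarize(models=MODELS):
    """Map each domain to its (MaxSub label, LG label) pair."""
    return {
        name: (classify(maxsub, MAXSUB_CUTOFFS), classify(lg, LG_CUTOFFS))
        for name, maxsub, lg in models
    }


if __name__ == "__main__":
    for name, (maxsub_label, lg_label) in summarize().items():
        print(f"{name}: MaxSub -> {maxsub_label}; LG -> {lg_label}")
```

With the Table 4 values, all three models pass the MaxSub "fairly good" cut-off, which matches the statement in the text; only the SAM model also passes the LG cut-off of 1.5.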


Discussion


The application of state-of-the-art methods of protein structure prediction allowed the mapping of distant similarity of three distinct structural domains of the CP2 protein onto proteins of known structure, with a p53-like fold as the DNA-binding domain. The p53-like transcription factors (b.2.5 entry of the SCOP classification) represent a set of proteins utilizing a consistent immunoglobulin-like (Ig) fold in binding to DNA, with great versatility in the mechanism of protein-DNA interaction. Among the proteins assigned to this superfamily, six different DNA-binding modules have already been identified.43 Apart from the central regulator of the cell cycle, the tumor suppressor TP53 (p53) protein, several other transcription factors were mapped to this fold. These include the STAT domain of the signal transducers and activators of transcription, the RUNT domain present in human and mouse, and the DNA-binding domain of Rel/Dorsal transcription factors (reviewed in ref. 43). The remaining crystallographically solved structures of DNA-binding


(H179) placed on two adjacent loops tightly bind a zinc ion (numbered according to the sequence of human TP53; gi|120407068). The interaction with the major groove of DNA involves arginines R282 and R283, while asparagine N239, serine S241 and arginine R248 are located in the minor groove. Apart from that, arginine R248, supported by glycine G245 and two arginines (R175, R249), plays a key role in docking the symmetrical core dimers.50

Figure 1. General overview of the domain composition of CP2 and TP53 protein families. The corresponding proteins encoded in human genome are listed on the right.

Although the overall sequence identity between the investigated CP2 domain and the DNA-binding domain of p53 is very low (about 13%), there is strong evidence supporting the correctness of the fold assignment. Apart from the confident 3D-Jury score and a consistent fold assignment by a variety of applied algorithms, we observed high agreement in terms of the secondary structure of both domains (compare Fig. 2). The general distribution of charged and hydrophobic amino acids was conserved. In order to confirm that CP2, adopting the fold of p53, retains a similar mechanism of DNA binding, the critical residues involved in the interaction were analyzed. The localization of these conserved residues in the sequence alignment is shown in Figure 2.

Out of 11 amino acids of TP53 located within the DNA major groove interaction site (located throughout S10 and H2), six are conserved. In particular, arginine R273 and valine V274 of human TP53 (gi|120407068), located in the distal region of the beta-strand (S10), are preserved in the CP2 family (K236, V237; TFCP2; gi|21361278). The most important part of the helix, packed directly against the major groove, is highly conserved, with glycine (G279), aspartate (D281) and arginines (R282, R283) of TP53 nearly completely unchanged in terms of physicochemical properties (G242, D244, R245, K246 of TFCP2, respectively). The unpreserved residues located in the inner part of this region (TP53:238-241) form a loop which is not directly involved in DNA binding. Also, two out of three amino acids involved in the interaction with the minor groove of DNA are conserved (serine S241 and arginine R249 of TP53; threonine T194 and lysine K205 of TFCP2, respectively).

The inspection of the sequence alignment revealed the absence of a zinc ion binding site throughout the CP2 family. The obtained models of CP2 structure did not reveal the presence of any cysteine-histidine groups which could form a zinc ion binding cluster (data not shown). While a zinc ion is critical for stabilization of the DNA-binding domain internal dimerization site of TP53, this observation is not surprising if we consider that the region involved in this process is actually the most diverged between the two analyzed protein families. Apart from the differences in primary structure, the secondary structure and patterns of general physicochemical properties of amino acids flanking the most proximal helix (H1; Fig. 2) demonstrate a high degree of discrepancy.

The central domain located between the DNA-binding immunoglobulin-like fold domain and the interaction domain is present in TFCP2 and two closely related proteins (CRTR-1 and UBP1). A methodology similar to that described previously was used for characterizing the architecture of this fold. Despite the low sequence identity (ca. 25%), the predicted secondary structure (PsiPred, ProfSec) of the annotated proteins matches the observed elements of the template secondary structure (Fig. 3). SAM domains are found in a variety of regulatory proteins, including protein kinases, transcription, regulators of lipid metabolism and the ETS family of transcription factors. In general, SAM domains are involved in protein-protein interaction in the process of either homo- or hetero-oligomerization.51 This simple five-helix bundle was initially identified in Drosophila proteins52 and determined to be an important protein-protein interaction module in developmental regulation.53 The SAM domain was previously observed as a functional module of the Ig-fold DNA-binding TP53 homologs TP63 and TP73,54-57 as well as of selected orthologs of TP53.58 The initial observation was confirmed by determination of the protein structure with crystallography59 and NMR spectroscopy.60 The overall picture of the SAM domain's function in TP53 homologs is not clear, but it was recently shown that this domain can mediate the negative regulation of p53-like activity.61,62

Notably, the presence of the SAM domain seems not to be critical for the function of TFCP2. Indirect evidence for this comes from the analysis of the products of alternative splicing of the human gene. The analysis of the mRNA sequences derived from the NCBI NR database revealed the presence of a sequence with missing exons 10–12, which resulted in a tentative protein lacking 78 amino acids, including the whole 62-residue SAM domain (gi|50480471).

The identification of the common domain shared by both families can augment the previous suggestions on the molecular evolution within the p53 family. As suggested by Yang et al., p63 was pointed to as the most “primitive” member of this family63 and, as such, as an ancestor of both p53 and p73.64 The presence of the common protein domain somewhat supports this observation. Still, the question of the order of introduction of the latter two proteins remains unsolved. Since for such paralogs the overall sequence similarity does not need to correlate with the relative order of gene duplication events, we need to look for additional data. Alternatively, the pattern of gene co-occurrence (i.e., the presence in a given genome of only either p63 and p73 or p63 and p53) could be the best indicator of the history


Table 2. Summary of the fold recognition for DNA-binding domain of CP2 family according to the 3D-Jury prediction assessment method

Gene                     GenBank sequence identifier   3D-Jury score   Method      PDB      SCOP
ELF1, D. melanogaster    gi|28573543: 870-1093         74.44           MetaBasic   2geq_A   b.2.5
Q6GMM0, D. rerio         gi|50345086: 18-240           60.67           MetaBasic   1tsr_A   b.2.5
GRHL3, H. sapiens        gi|122889194: 197-423         48.67           FFAS3       1tsr_A   b.2.5
GRHL1, H. sapiens        gi|90101332: 216-444          48.56           FFAS3       1tsr_A   b.2.5
TF2L1, M. musculus       gi|90101766: 18-244           47.56           MetaBasic   1tsr_A   b.2.5
GRHL2, H. sapiens        gi|74736618: 210-440          46.00           MetaBasic   1tsr_A   b.2.5
TFCP2, H. sapiens        gi|21361278: 37-264           46.00           MetaBasic   1tsr_A   b.2.5
Q9N3N7, C. elegans       gi|71995136: 170-389          40.38           MetaBasic   1tsr_A   b.2.5
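As a reading aid for Table 2, the sketch below applies the 3D-Jury cut-off quoted in the Results (scores above 50 were associated with less than 5% error in the cited benchmark) to the tabulated scores. It is only a filter over the published numbers, not a reimplementation of 3D-Jury; the variable and function names are illustrative assumptions.

```python
# 3D-Jury score above which the cited benchmark reports <5% error.
CONFIDENT_CUTOFF = 50.0

# (gene, 3D-Jury score, best template) transcribed from Table 2.
TABLE2 = [
    ("ELF1 (D. melanogaster)", 74.44, "2geq_A"),
    ("Q6GMM0 (D. rerio)", 60.67, "1tsr_A"),
    ("GRHL3 (H. sapiens)", 48.67, "1tsr_A"),
    ("GRHL1 (H. sapiens)", 48.56, "1tsr_A"),
    ("TF2L1 (M. musculus)", 47.56, "1tsr_A"),
    ("GRHL2 (H. sapiens)", 46.00, "1tsr_A"),
    ("TFCP2 (H. sapiens)", 46.00, "1tsr_A"),
    ("Q9N3N7 (C. elegans)", 40.38, "1tsr_A"),
]


def confident_hits(rows, cutoff=CONFIDENT_CUTOFF):
    """Genes whose 3D-Jury score is strictly above the cut-off."""
    return [gene for gene, score, _template in rows if score > cutoff]


def templates(rows):
    """Sorted set of PDB templates selected across the family."""
    return sorted({template for _gene, _score, template in rows})


if __name__ == "__main__":
    print("Above cut-off:", confident_hits(TABLE2))
    print("Templates used:", templates(TABLE2))
```

Two family members score above the cut-off on their own, while every sequence in the table selects a p53 core domain structure (1tsr_A or 2geq_A) as its template, which is the consistency the text relies on.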

Table 3. Summary of the fold recognition analysis for DNA-binding domain of Drosophila melanogaster ELF1 (gi|28573543: 870-1093)

3D-Jury score   Method         Selected hit   Method's score   PDB identifier   Hit name   SCOP family   Organism
74.44           MetaBasic      2              28.27            2geq_A           TP53       b.2.5         M. musculus
74.11           MetaBasic      1              28.56            1tsr_A           TP53       b.2.5         H. sapiens
70.67           FFAS3          5              -5.4             1tsr_A           TP53       b.2.5         H. sapiens
69.89           FFAS3          10             -5.3             1uol_A           TP53       b.2.5         H. sapiens
68.33           FFAS3          7              -5.3             2geq_A           TP53       b.2.5         M. musculus
54.33           INUB           1              21.67            1gzh_A           TP53       b.2.5         H. sapiens
51.89           3D-PSSM        1              0.0922           1ycs_A           TP53       b.2.5         H. sapiens
51.67           mGenThreader   4              0.339            1tup_A           TP53       b.2.5         H. sapiens
39.44           MetaBasic      6              21.89            1t4w_A           TP53       b.2.5         C. elegans
38.56           INUB           2              12.96            1t4w_A           TP53       b.2.5         C. elegans

The top scored hits for each method are shown in the bold font.


of this family. Unfortunately, the availability of complete genome sequences of early Chordata is highly limited, and as of now we lack any evidence on the order of the gene duplication events.

The applied fold recognition methodology suggests that the C-terminal domain adopts a ubiquitin-like β-grasp fold (ββαββ topology). Ubiquitin is a highly conserved protein which is covalently bound to proteins to direct them to the protein degradation machinery (proteasome). The structural similarity defines a single superfamily consisting of proteins performing various functions, not only related to protein degradation.65 This type of domain, lacking the typical C-terminal diglycine motif, is also observed in the CP2 family. Such domains are often observed to be involved in forming homodimers (caspase-activated DNase (CAD), Phox and Bem1p) and heterodimers (streptokinase, superantigen toxins, N-terminal glutamine synthetase domain).66,67 To the best of our knowledge this is the first observation of a ubiquitin-like fold in transcriptional regulators. The location of the domain at the C-terminus is consistent with the previously mapped dimerization domain of TFCP2.3

While dimerization is critical for the activity of any palindrome-recognizing DNA-binding protein (e.g., the CP2 and TP53 families), the possession of a protein domain devoted to dimer formation in the ancestral protein family can indirectly suggest the importance of oligomerization for the relatively weak interaction of immunoglobulin-like DNA-binding proteins. In this light, in the TP53 family, the


activity of such a domain was substituted by the acquisition of a specialized region of TP53 involved in this process (the H1 region, supported by the gain of the zinc ion binding site; compare Fig. 2). The additional domain (SAM) spacing the N-terminal DNA-binding domain and the C-terminal dimerization module may be the major cause of the more flexible manner of recognition of the DNA-binding site, resulting in the shorter cis-sites recognized by the CP2 family (Fig. 5).

How the CP2 protein performs dimerization via the beta-grasp domain remains an open question. The low overall sequence similarity to other proteins of this fold makes it impossible to point precisely to the mechanism of such an interaction. However, the constant restriction of the sequence length in the C-terminal direction from this domain may suggest the mechanism of “beta-strand exchange” observed for some ubiquitin-like fold proteins, e.g., the autophagy-related AtATG12 (crystallographic structure deposited in PDB: 1wz3). In this protein, during the formation of a homodimer, two protein molecules exchange two-beta-strand fragments, forming an extended bridge between the two domains. In the discussed protein family we can observe an insertion in the loop located at the potential “bridging” region (marked in Fig. 4 with asterisks).

Finally, using the data on somatic cancer mutations of TP53 deposited in the Cosmic database, we tested whether this additional information allows a phylogenetic footprinting analysis. As shown in Figure 2, the commonly mutated residues were identified.


Figure 2. Sequence-to-structure alignment of the DNA-binding domains of CP2 and TP53, coded with GenBank identifiers (gi). Secondary structure elements, observed (Protein Data Bank entry 2geqA) and predicted (Psipred), are marked below (H: helix, E: extended). Numbers in brackets refer to the length of fragments removed for clarity. The positions of known functional sites of TP53 are marked above the alignment (Z: zinc binding, D: dimerization, M: major groove interaction, m: minor groove interaction). The commonly observed cancer mutations are coded with numbers below the sequences (1: most often observed; numbers refer to the quartiles of the logged distribution of mutations).
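The Discussion quotes sequence identities (about 13% and ca. 25%) and describes residues as preserved "in terms of physicochemical properties". The sketch below shows one common way to compute both quantities from a pairwise alignment. It is not the procedure used in the paper: the residue grouping, the choice of denominator (columns where neither sequence has a gap) and the function name are assumptions made for illustration.

```python
# Coarse physicochemical classes; this particular grouping is an assumption.
GROUPS = ["KRH", "DE", "STNQ", "AVLIM", "FWY", "G", "P", "C"]
CLASS_OF = {aa: index for index, group in enumerate(GROUPS) for aa in group}


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identical, % same physicochemical class) over gap-free columns.

    Both inputs are rows of one alignment, so they must have equal length;
    '-' marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0, 0.0
    identical = 0
    similar = 0
    for a, b in pairs:
        if a == b:
            identical += 1
        class_a, class_b = CLASS_OF.get(a), CLASS_OF.get(b)
        if class_a is not None and class_a == class_b:
            similar += 1
    total = len(pairs)
    return 100.0 * identical / total, 100.0 * similar / total


if __name__ == "__main__":
    # The six TP53/TFCP2 residue pairs quoted in the Discussion
    # (R273/K236, V274/V237, G279/G242, D281/D244, R282/R245, R283/K246),
    # written as two short strings; they are not a contiguous alignment.
    tp53 = "RVGDRR"
    tfcp2 = "KVGDRK"
    print(identity_and_similarity(tp53, tfcp2))
```

For these six quoted pairs, four residues are identical and all six fall into the same class, which illustrates how a site can be conserved in character even when identity is incomplete.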

Figure 3. Sequence-to-structure alignment of SAM domains of CP2 family, TP53-like proteins and SAM domains of resolved structure (for description refer to Fig. 2).

As expected, a notable part of these residues was located at the known functional amino acids of TP53 (19 out of a total of 52 mutated residues; summarized in Table 6). Surprisingly, more than 60% of the remaining residues (20/33) were observed to be preserved throughout the CP2-TP53 alignment in terms of similar physicochemical properties of the amino acids. The inspection of the structure of the DNA-binding domain revealed that the majority of such residues are located in the inner part of the domain, influencing the domain stability, which is complementary to the suggestion raised by Ang et al. in their study.68
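The proportions quoted in this paragraph follow directly from the three counts given in the text. The short check below only restates that arithmetic; the combined figure at the end is derived here from the same counts and is not a number reported by the authors.

```python
# Counts as stated in the text.
TOTAL_MUTATED = 52            # commonly mutated TP53 residues
IN_FUNCTIONAL_SITES = 19      # of these, located at known functional sites
CONSERVED_AMONG_REST = 20     # of the rest, preserved in the CP2-TP53 alignment


def percent(part, whole):
    """Percentage of `part` in `whole`."""
    return 100.0 * part / whole


REMAINING = TOTAL_MUTATED - IN_FUNCTIONAL_SITES

if __name__ == "__main__":
    print(f"functional sites: {percent(IN_FUNCTIONAL_SITES, TOTAL_MUTATED):.1f}%")
    print(f"conserved among remaining {REMAINING}: "
          f"{percent(CONSERVED_AMONG_REST, REMAINING):.1f}%")
    # Derived here, not reported in the paper: functional or conserved.
    combined = IN_FUNCTIONAL_SITES + CONSERVED_AMONG_REST
    print(f"functional or conserved: {percent(combined, TOTAL_MUTATED):.1f}%")
```

The remaining set has 33 residues, and 20 of 33 is about 60.6%, consistent with "more than 60%" in the text.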

Materials and Methods

The full-length sequences of human CP2 and homologous proteins deposited in the PfamA22 database (accession PF04516) were retrieved from GenBank. Initially the proteins were subjected to a collection of secondary structure prediction (PsiPred,23 ProfSec24),


Figure 4. Sequence-to-structure alignment of ubiquitin-like CP2 dimerization domain (for description refer to Fig. 2). The region of insertion potentially forming a linker in beta-strand switching mechanism is marked with asterisks.

Table 5 Collected cis-sites recognized by TFCP2—modified after ref. 12 Gene

CP2 DNA binding

65–259

1ghzA

-3.63

0.137

1.183

LMO2

SAM

326–388 1wwuA

-6

0.385

2.289

UROS

Ubiquitin-like

384–505 1yqbA

-3.46

0.133

1.233

HMBS

Sequence of cis-site

human

CCAG N5 CAAG N4 CCAG

human

CAAG N16 CATG N6 CTTG

human

CTGG N4 CCTG CAAG N3 CTTG

ALAS2

CTGG N4 CCAG

OT D



human



CTGG N5 CAGG

ANK1

CAAG N5 CTGG

human

ON

(LG score: >1.5, fairly good; >2.5, very good; >4, extremely good model; MaxSub score: >0.1, fairly good; >0.5, very good; >0.8, extremely good model).

Organism

IST RIB

Position Template ProSA Z ProQ MaxSub ProQ LG score score score

UT E

.

Table 4  Results of MQAP for TFCP2 domains

CTCG N5 CTGG

ITGA2B

human

CTGG N5 CAAG

HBB

human

CCAG CAGG N7 CAGG

KLF1

mouse

CCTG N5 CTGG

NFE2

mouse

CTGG N2 CAGG CAGG

IEN CE

.D



BIO

SC

Figure 5. Sequence logos of recognized cis-sites (CP2 sites derived from Table 5) obtained with WebLogo (weblogo.berkley.edu).69

©

20

08

LA

ND

ES

homology modeling (MetaBasic,25 FFAS3,26 HHpred2,27) and fold recognition methods (3D-PSSM,28 Fugue,29 INBGU30) via Structure Prediction Meta Server (http://bioinfo.pl/meta).31 The presence of domain boundaries was assessed by combination of GlobPlot and the consistent secondary structure predictions as reported previously.32 The identified potential structural domains were resubmitted to the MetaServer. The obtained results were screened with the consensus fold recognition method—3D-Jury33 and the collected structural templates were realigned to the multiple sequence alignment of CP2 homologues. The manual subsequent adjustment according to the secondary structure and presence functional residues was applied. The homology models were built with Modeller34 and the quality assessment tool Verify3D was applied for the final verification and refinement of the structural alignments.35 The quality of obtained models were tested with model quality assessment programs (MQAP)-ProQ36 and ProsaWeb.37 The position of TP53 cancer mutations was retrieved from Cosmic database (Catalogue Of Somatic Mutations In Cancer; Sanger, UK).38 Amino acid positions of which mutations were reported more than once were included in this study. The collected cancer mutations of TP53 were clustered in four bins according to quartiles of the logged distribution of mutations number. www.landesbioscience.com

GFI1B

mouse

CCAG N1 CTGG N1 CCTG

EPOR

mouse

CCAG CCTG N5 CTGG N9 CTGG

TFRC

mouse

CAGG N8 CAGG
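The binning step described in Materials and Methods (positions mutated more than once, grouped into four bins by quartiles of the log-transformed mutation counts) can be sketched as follows. This is only an illustration under stated assumptions: the counts are invented placeholder values, not Cosmic data; the function name `quartile_bins` is ours; bin 1 is assumed to hold the most frequently mutated positions, following the legend of Fig. 2; and since the text does not state the logarithm base or the quantile convention, the sketch uses the natural logarithm and the default method of `statistics.quantiles`.

```python
import math
from statistics import quantiles


def quartile_bins(counts):
    """Assign each position a bin 1-4 by quartile of log(count).

    Positions reported only once are dropped, as described in the text.
    Bin 1 = most frequently mutated (assumed, after the Fig. 2 legend).
    """
    kept = {pos: n for pos, n in counts.items() if n > 1}
    logged = {pos: math.log(n) for pos, n in kept.items()}
    # Three cut points that split the logged values into four groups.
    q1, q2, q3 = quantiles(logged.values(), n=4)
    bins = {}
    for pos, value in logged.items():
        if value > q3:
            bins[pos] = 1
        elif value > q2:
            bins[pos] = 2
        elif value > q1:
            bins[pos] = 3
        else:
            bins[pos] = 4
    return bins


# Hypothetical mutation counts per residue position (placeholder values only).
example_counts = {101: 1, 102: 2, 103: 4, 104: 8, 105: 16,
                  106: 32, 107: 64, 108: 128, 109: 256}
example_bins = quartile_bins(example_counts)
print(example_bins)
```

In this toy input, position 101 is excluded because it was reported only once, and the remaining eight positions fall two per bin. With real data the bins need not be equally populated, because many positions share the same count.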

Conclusions

The applied state-of-the-art methods of protein structural bioinformatics revealed the general anatomy of an important family of transcription factors. The analysis of the CP2 family also provided an important link to the molecular ancestry of the cell cycle regulator and tumor suppressor protein TP53. Knowledge of the structure can help elucidate the mechanisms of recognition of specific DNA sequence motifs, which can be useful in drafting the repertoire of promoters regulated by CP2 factors. The presence of a SAM domain in a location homologous to that observed in TP53-like proteins (TP63, TP73), as well as the "substitution" of the internal dimerization module observed in TP53 with a dedicated beta-grasp fold domain, strongly suggests that multimerization is highly important for the proper activity of CP2-TP53 DNA-binding proteins utilizing the immunoglobulin-like domain. The identification of the fold of the dimerization module can lead to functional assays in which the complete set of CP2-interacting proteins can be identified. Also, the observation that the poly-glutamine stretch of TFCP2 is located within the ubiquitin-like domain can shed new light on the role of this protein in neurodegenerative disorders.
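To illustrate how the cis-site notation of Table 5 could feed a promoter scan of the kind mentioned above, the sketch below converts a site such as `CTGG N5 CAGG` into a regular expression, reading `Nk` as a spacer of exactly k arbitrary bases and joining tokens written side by side without a gap. This reading of the notation, the helper names `site_to_regex` and `find_sites`, and the toy sequence are our assumptions for illustration; the sketch scans one strand only and is not the procedure used in this study.

```python
import re


def site_to_regex(site):
    """Translate e.g. 'CTGG N5 CAGG' into the pattern 'CTGG[ACGT]{5}CAGG'."""
    parts = []
    for token in site.split():
        spacer = re.fullmatch(r"N(\d+)", token)
        if spacer:
            # 'Nk' is read as k unspecified bases (an assumption).
            parts.append("[ACGT]{%s}" % spacer.group(1))
        else:
            # Any other token is taken as a literal half-site.
            parts.append(token)
    return "".join(parts)


def find_sites(sequence, site):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # Wrapping the pattern in a lookahead reports overlapping occurrences too.
    pattern = re.compile("(?=%s)" % site_to_regex(site))
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Toy sequence, not a real promoter: one CTGG N5 CAGG site starting at index 3.
toy = "aaaCTGGtttttCAGGaaa"
print(site_to_regex("CTGG N5 CAGG"))
print(find_sites(toy, "CTGG N5 CAGG"))
```

A fuller scan would also test the reverse complement and tolerate the half-site variants listed in Table 5, for example by building one pattern per listed site.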


Although our report describes only the application of theoretical methods of fold recognition and protein structure prediction, the applied verification protocols make our suggestions very likely to be confirmed in further experimental studies.

Table 6. Summary of commonly mutated residues of TP53 in cancer samples according to the data from the Cosmic database (description in text)

                                                      Total sites   Sites at conserved positions
Total residues commonly mutated in cancer             52            28 (53.8%)
Mutations of known functional sites, including:       19            8 (42.1%)
  dimerization site                                   5             1
  zinc ion binding                                    3             0
  major groove contacts                               8             5
  minor groove contacts                               3             2
Mutations located outside of known functional sites   33            20 (60.6%)

Acknowledgements

This work was supported by MNiSW and the European Commission (LSHG-CT-2003-503265), MNiSW (N401 050 32/1181, PBZ-MNiI-2/1/2005).

References

1. Kang HC, Chae JH, Kim BS, Han SY, Kim SH, Auh CK, et al. Transcription factor CP2 is involved in activating mBMP4 in mouse mesenchymal stem cells. Mol Cells 2004; 17:454-61.
2. Barnhart KM, Kim CG, Banerji SS, Sheffery M. Identification and characterization of multiple erythroid cell proteins that interact with the promoter of the murine alpha-globin gene. Molecular and cellular biology 1988; 8:3215-26.
3. Wilanowski T, Tuckfield A, Cerruti L, O’Connell S, Saint R, Parekh V, et al. A highly conserved novel family of mammalian developmental transcription factors related to Drosophila grainyhead. Mech Dev 2002; 114:37-50.
4. Murata T, Nitta M, Yasuda K. Transcription factor CP2 is essential for lens-specific expression of the chicken alphaA-crystallin gene. Genes Cells 1998; 3:443-57.
5. Kang HC, Chung BM, Chae JH, Yang SI, Kim CG, Kim CG. Identification and characterization of four novel peptide motifs that recognize distinct regions of the transcription factor CP2. Febs J 2005; 272:1265-77.
6. Shirra MK, Zhu Q, Huang HC, Pallas D, Hansen U. One exon of the human LSF gene includes conserved regions involved in novel DNA-binding and dimerization motifs. Molecular and cellular biology 1994; 14:5076-87.
7. Volker JL, Rameh LE, Zhu Q, DeCaprio J, Hansen U. Mitogenic stimulation of resting T cells causes rapid phosphorylation of the transcription factor LSF and increased DNA-binding activity. Genes Dev 1997; 11:1435-46.
8. Uv AE, Thompson CR, Bray SJ. The Drosophila tissue-specific factor Grainyhead contains novel DNA-binding and dimerization domains which are conserved in the human protein CP2. Molecular and cellular biology 1994; 14:4020-31.
9. Cenci C, Gould AP. Drosophila Grainyhead specifies late programmes of neural proliferation by regulating the mitotic activity and Hox-dependent apoptosis of neuroblasts. Development 2005; 132:3835-45.
10. Ting SB, Wilanowski T, Cerruti L, Zhao LL, Cunningham JM, Jane SM. The identification and characterization of human Sister-of-Mammalian Grainyhead (SOM) expands the grainyhead-like family of developmental transcription factors. Biochem J 2003; 370:953-62.
11. Peters LM, Anderson DW, Griffith AJ, Grundfast KM, San Agustin TB, Madeo AC, et al. Mutation of a transcription factor, TFCP2L3, causes progressive autosomal dominant hearing loss, DFNA28. Hum Mol Genet 2002; 11:2877-85.
12. Bose F, Fugazza C, Casalgrandi M, Capelli A, Cunningham JM, Zhao Q, et al. Functional interaction of CP2 with GATA-1 in the regulation of erythroid promoters. Molecular and cellular biology 2006; 26:3942-54.
13. Chae JH, Kim CG. CP2 binding to the promoter is essential for the enhanced transcription of globin genes in erythroid cells. Mol Cells 2003; 15:40-7.
14. Zhou W, Zhao Q, Sutton R, Cumming H, Wang X, Cerruti L, et al. The role of p22 NF-E4 in human globin gene switching. The Journal of biological chemistry 2004; 279:26227-32.
15. Jane SM, Nienhuis AW, Cunningham JM. Hemoglobin switching in man and chicken is mediated by a heteromeric complex between the ubiquitous transcription factor CP2 and a developmentally specific protein. Embo J 1995; 14:97-105.
16. Swendeman SL, Spielholz C, Jenkins NA, Gilbert DJ, Copeland NG, Sheffery M. Characterization of the genomic structure, chromosomal location, promoter, and development expression of the alpha-globin transcription factor CP2. The Journal of biological chemistry 1994; 269:11663-71.
17. Casolaro V, Keane-Myers AM, Swendeman SL, Steindler C, Zhong F, Sheffery M, et al. Identification and characterization of a critical CP2-binding element in the human interleukin-4 promoter. The Journal of biological chemistry 2000; 275:36605-11.
18. Lendon C, Craddock N. Is LBP-1c/CP2/LSF a disease-modifying gene for Alzheimer’s disease? Lancet 2001; 358:1029-30.
19. Xu Y, Kim HS, Joo Y, Choi Y, Chang KA, Park CH, et al. Intracellular domains of amyloid precursor-like protein 2 interact with CP2 transcription factor in the nucleus and induce glycogen synthase kinase-3beta expression. Cell death and differentiation 2007; 14:79-91.
20. Yoon JB, Li G, Roeder RG. Characterization of a family of related cellular transcription factors which can modulate human immunodeficiency virus type 1 transcription in vitro. Molecular and cellular biology 1994; 14:1776-85.
21. Rodda S, Sharma S, Scherer M, Chapman G, Rathjen P. CRTR-1, a developmentally regulated transcriptional repressor related to the CP2 family of transcription factors. The Journal of biological chemistry 2001; 276:3324-32.
22. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, et al. Pfam: clans, web tools and services. Nucleic acids research 2006; 34:247-51.
23. McGuffin LJ, Bryson K, Jones DT. The PSIPRED protein structure prediction server. Bioinformatics 2000; 16:404-5.
24. Rost B, Yachdav G, Liu J. The PredictProtein server. Nucleic acids research 2004; 32:3216.
25. Ginalski K, von Grotthuss M, Grishin NV, Rychlewski L. Detecting distant homology with Meta-BASIC. Nucleic acids research 2004; 32:576-81.
26. Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A. FFAS03: a server for profile-profile sequence alignments. Nucleic acids research 2005; 33:284-8.
27. Soding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction. Nucleic acids research 2005; 33:244-8.
28. Kelley LA, MacCallum RM, Sternberg MJ. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 2000; 299:499-520.
29. Shi J, Blundell TL, Mizuguchi K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001; 310:243-57.
30. Fischer D. Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pac Symp Biocomput 2000:119-30.
31. Bujnicki JM, Elofsson A, Fischer D, Rychlewski L. Structure prediction meta server. Bioinformatics 2001; 17:750-1.
32. Wyrwicz L, Koczyk G, Rychlewski L, Plewczynski D. ProteinSplit: splitting of multidomain proteins using prediction of ordered and disordered regions in protein sequences for virtual structural genomics. J Physics Cond Mat 2007; 19:1-8.
33. Ginalski K, Elofsson A, Fischer D, Rychlewski L. 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 2003; 19:1015-8.
34. Sanchez R, Sali A. Comparative protein structure modeling. Introduction and practical examples with modeller. Methods Mol Biol 2000; 143:97-129.
35. Eisenberg D, Luthy R, Bowie JU. VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol 1997; 277:396-404.
36. Wallner B, Elofsson A. Identification of correct regions in protein models using structural, alignment and consensus information. Protein Sci 2006; 15:900-13.
37. Wiederstein M, Sippl MJ. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic acids research 2007; 35:407-10.
38. Forbes S, Clements J, Dawson E, Bamford S, Webb T, Dogan A, et al. Cosmic 2005. British journal of cancer 2006; 94:318-22.
39. Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L. ORFeus: Detection of distant homology using sequence profiles and predicted secondary structure. Nucleic acids research 2003; 31:3804-7.
40. Fischer D. 3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor. Proteins 2003; 51:434-41.
41. Ginalski K, Grishin NV, Godzik A, Rychlewski L. Practical lessons from protein structure prediction. Nucleic acids research 2005; 33:1874-91.
42. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, et al. The Pfam protein families database. Nucleic acids research 2008; 36:281-8.
43. Rudolph MJ, Gergen JP. DNA-binding by Ig-fold proteins. Nat Struct Biol 2001; 8:3846.
44. Kaustov L, Yi GS, Ayed A, Bochkareva E, Bochkarev A, Arrowsmith CH. p53 transcriptional activation domain: a molecular chameleon? Cell cycle (Georgetown, Tex) 2006; 5:489-94.
45. Tokino T, Thiagalingam S, el-Deiry WS, Waldman T, Kinzler KW, Vogelstein B. p53 tagged sites from human genomic DNA. Hum Mol Genet 1994; 3:1537-42.
46. Efeyan A, Serrano M. p53: guardian of the genome and policeman of the oncogenes. Cell cycle (Georgetown, Tex) 2007; 6:1006-10.
47. Levine AJ, Hu W, Feng Z. The P53 pathway: what questions remain to be explored? Cell death and differentiation 2006; 13:1027-36.
48. Rippin TM, Freund SM, Veprintsev DB, Fersht AR. Recognition of DNA by p53 core domain and location of intermolecular contacts of cooperative binding. J Mol Biol 2002; 319:351-8.
49. Cho Y, Gorina S, Jeffrey PD, Pavletich NP. Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science 1994; 265:346-55.
50. Kitayner M, Rozenberg H, Kessler N, Rabinovich D, Shaulov L, Haran TE, et al. Structural basis of DNA recognition by p53 tetramers. Mol Cell 2006; 22:741-53.
51. Qiao F, Bowie JU. The many faces of SAM. Sci STKE 2005; 2005:7.
52. Ponting CP. SAM: a novel motif in yeast sterile and Drosophila polyhomeotic proteins. Protein Sci 1995; 4:1928-30.
53. Schultz J, Ponting CP, Hofmann K, Bork P. SAM as a protein interaction domain involved in developmental regulation. Protein Sci 1997; 6:249-53.
54. Blandino G, Dobbelstein M. p73 and p63: why do we still need them? Cell cycle (Georgetown, Tex) 2004; 3:886-94.
55. Finlan LE, Hupp TR. p63: the phantom of the tumor suppressor. Cell cycle (Georgetown, Tex) 2007; 6:1062-71.
56. Flores ER. The roles of p63 in cancer. Cell cycle (Georgetown, Tex) 2007; 6:300-4.
57. Trink B, Osada M, Ratovitski E, Sidransky D. p63 transcriptional regulation of epithelial integrity and cancer. Cell cycle (Georgetown, Tex) 2007; 6:240-5.
58. Arrowsmith CH. Structure and function in the p53 family. Cell death and differentiation 1999; 6:1169-73.
59. Wang WK, Proctor MR, Buckle AM, Bycroft M, Chen YW. Crystallization and preliminary crystallographic studies of a SAM domain at the C-terminus of human p73alpha. Acta crystallographica 2000; 56:769-71.
60. Cicero DO, Falconi M, Candi E, Mele S, Cadot B, Di Venere A, et al. NMR structure of the p63 SAM domain and dynamical properties of G534V and T537P pathological mutants, identified in the AEC syndrome. Cell biochemistry and biophysics 2006; 44:475-89.
61. Levrero M, De Laurenzi V, Costanzo A, Gong J, Wang JY, Melino G. The p53/p63/p73 family of transcription factors: overlapping and distinct functions. J Cell Sci 2000; 113:1661-70.
62. Scoumanne A, Harms KL, Chen X. Structural basis for gene activation by p53 family members. Cancer Biol Ther 2005; 4:1178-85.
63. Yang A, Kaghad M, Wang Y, Gillett E, Fleming MD, Dotsch V, et al. p63, a p53 homolog at 3q27-29, encodes multiple products with transactivating, death-inducing and dominant-negative activities. Molecular cell 1998; 2:305-16.
64. Strano S, Rossi M, Fontemaggi G, Munarriz E, Soddu S, Sacchi A, et al. From p63 to p53 across p73. FEBS letters 2001; 490:163-70.
65. Walters KJ, Goh AM, Wang Q, Wagner G, Howley PM. Ubiquitin family proteins and their relationship to the proteasome: a structural perspective. Biochim Biophys Acta 2004; 1695:73-87.
66. Burroughs AM, Balaji S, Iyer LM, Aravind L. Small but versatile: the extraordinary functional and structural diversity of the beta-grasp fold. Biology direct 2007; 2:18.
67. Sumimoto H, Kamakura S, Ito T. Structure and function of the PB1 domain, a protein interaction module conserved in animals, fungi, amoebas and plants. Sci STKE 2007; 2007:6.
68. Ang HC, Joerger AC, Mayer S, Fersht AR. Effects of common cancer mutations on stability and DNA binding of full-length p53 compared with isolated core domains. The Journal of biological chemistry 2006; 281:21934-41.
69. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome research 2004; 14:1188-90.
