Genetic and biochemical studies of human apobec family of proteins

Wayne State University DigitalCommons@WayneState Wayne State University Dissertations 1-1-2012 Genetic and biochemical studies of human apobec fami...
Author: Oswin Patterson
Wayne State University

DigitalCommons@WayneState Wayne State University Dissertations

1-1-2012

Genetic and biochemical studies of human apobec family of proteins Priyanga Wijesinghe Wayne State University,

Follow this and additional works at: http://digitalcommons.wayne.edu/oa_dissertations Recommended Citation Wijesinghe, Priyanga, "Genetic and biochemical studies of human apobec family of proteins" (2012). Wayne State University Dissertations. Paper 584.

This Open Access Dissertation is brought to you for free and open access by DigitalCommons@WayneState. It has been accepted for inclusion in Wayne State University Dissertations by an authorized administrator of DigitalCommons@WayneState.

GENETIC AND BIOCHEMICAL STUDIES OF HUMAN APOBEC FAMILY OF PROTEINS by PRIYANGA WIJESINGHE DISSERTATION Submitted to the Graduate School of Wayne State University, Detroit, Michigan in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY 2012 MAJOR : CHEMISTRY (Biochemistry) Approved by:

Advisor

Date

© COPYRIGHT BY PRIYANGA WIJESINGHE 2012 All Rights Reserved

DEDICATION

To my wife Thiloka and daughter Senuli


ACKNOWLEDGMENTS

I would like to thank my thesis advisor, Dr. Ashok S. Bhagwat, for his exceptional guidance, supervision and help during my graduate research career. I take this opportunity to thank my thesis committee, Dr. Andrew L. Feig, Dr. Jeremy Kodanko and Dr. T.R. Reddy, for their valuable comments and suggestions. I would also like to extend my thanks to Dr. Thomas Holland, Dr. David Rueda, and Dr. John SantaLucia. I must thank my present lab members Sophia Shalhout, Thisari Guruge, Shaqiao Wei, Anita Chalasani, Amanda Arnorld, Casey Jackson, Nadeem Kandalaft and Richard Evans for their friendship and mutual support. Special thanks go to my former lab members Dr. Mala Samaranayake, Dr. Chandrika Canugovi, Dr. Michael Capenter, Dr. Todd Roy, Huang He, Dr. Rachel Parisien, Dr. Vijay Parashar, Erandi Rajagurubandara, Asanka Rathnayake and Richard Savage for their help and the great times we shared back in the day. I am grateful to Dr. Amanda Solem for proof-reading my thesis in spite of her busy schedule. I thank my father, mother, brother and sister at this point. I want to thank my mother-in-law for her unstinting support in taking care of my daughter during my graduate studies. I am grateful to her and her family for the help given. Last but not least, I thank my wife Thiloka for standing beside me in good times as well as bad times.


TABLE OF CONTENTS

Dedication
Acknowledgements
List of Tables
List of Figures
List of Abbreviations

CHAPTER 1 Introduction
1.1 Overview
1.2 Cytosine deamination and the APOBEC family of proteins
1.2.1 Innate immunity and the APOBEC3 cluster
1.2.2 APOBEC3A (A3A)
1.2.3 APOBEC3G (A3G)
1.2.4 Structure of APOBEC3G carboxy terminal domain (A3G-CTD)
1.2.5 Adaptive immunity and AID
1.2.6 Generation of antibody diversity
1.2.7 Discovery of AID
1.2.8 Somatic hypermutation (SHM)
1.2.9 Class switch recombination (CSR)
1.3 Cytosine methylation as an epigenetic mark
1.3.1 Methylation sites are hot spots for mutations
1.3.2 The intermediate of the 5mC to T mutation
1.3.3 Why 5mC to T mutation frequency is higher compared to C to T
1.3.4 Mutational hot spots in human genes
1.3.5 DNA methylation and DNMTs
1.3.6 DNA demethylation
1.3.7 AID dependent DNA demethylation
1.4 E.coli as a model organism to study cytosine and 5mC deamination
1.5 Scope and aims of the research projects

CHAPTER 2 MATERIALS AND METHODS
2.1 Bacterial strains
2.2 Plasmids
2.3 Kanamycin-resistance reversion assay
2.4 Amplification of the kan alleles and sequence analysis
2.5 Uracil quantification assay
2.6 Purification of A3A and its mutant
2.7 Biochemical assay for C and 5mC deamination
2.8 Purification of A3G-CTD and A3G-AID hybrids
2.9 In vitro deamination activity assay with A3G-CTD and its hybrids

CHAPTER 3 RESULTS
3.1 Reexamination of the ability of AID to deaminate 5-methyl cytosine
3.1.1 Validation of the genetic system for 5mC to T deamination
3.1.1 (A) Methylation protection of genomic DNA against restriction enzymes
3.1.1 (B) Effect of MTase on the KanR reversion frequency
3.1.1 (C) Sequencing of the KanR revertants
3.1.1 (D) Very Short Patch repair
3.1.2 Human AID is not an efficient 5mC deaminase
3.1.3 Efficient 5mC deamination by other APOBEC proteins
3.1.4 In vitro cytosine and 5mC deamination by A3A
3.2 Domain swap to determine the DNA binding regions of A3G and AID
3.2.1 Experimental design
3.2.2 Properties of AID hybrids containing A3G DNA binding regions
3.2.3 Properties of A3G hybrids containing AID DNA binding regions
3.2.4 Biochemical properties of A3G-AID hybrids
3.2.5 DNA binding domain of A3A confers AID the ability to deaminate 5mC

CHAPTER 4 DISCUSSION
4.1 Reexamination of the ability of AID to deaminate 5-methyl cytosine
4.2 Domain swap to determine the DNA binding regions of A3G and AID

References
Abstract
Autobiographical Statement

LIST OF TABLES

Table 1.1 A summary of known members in human APOBEC family and their properties
Table 1.2 APOBEC enzymes with known substrate sequence preferences
Table 2.1 Oligonucleotides used in genetic constructions and assays
Table 2.2 DNA oligomers used for in vitro deamination assay
Table 2.3 DNA oligomers used for deaminations assay
Table 3.1 Sequences of spontaneous revertants
Table 3.2 Sequences of the revertants with AID/APOBEC expression

LIST OF FIGURES

Figure 1.1 Hydrolytic deamination of cytosine
Figure 1.2 Members of the human APOBEC family
Figure 1.3 Role of APOBEC and Vif in retroviral restriction
Figure 1.4 Catalytic pocket of A3G-CTD
Figure 1.5 Structure of A3G-CTD determined by NMR and crystal structure
Figure 1.6 Sequence alignment of human AID and carboxy terminal domain of A3G
Figure 1.7 Schematic of the antibody molecule
Figure 1.8 Molecular mechanisms underlying antibody maturation
Figure 1.9 DNA cytosine methylation
Figure 1.10 The intermediate of 5mC●G to T:A mutation
Figure 1.11 Comparison of C to T versus 5mC to T mutations and their repair mechanisms
Figure 1.12 DNA methylation profile during the early embryogenesis
Figure 1.13 The two major phases of genome-wide DNA demethylation in mammals
Figure 1.14 Proposed mechanisms for active DNA demethylation
Figure 2.1 Verification of BH143 for the absence of Dcm methylation
Figure 2.2 Construction of E.coli strains
Figure 2.3 Salient features of the plasmid maps
Figure 2.4 Schematic of constructing hybrid proteins between AID/A3G and AID/A3A
Figure 2.5 Scheme of the uracil quantification assay
Figure 3.1 Five different sequence contexts of 5-methyl cytosine in kan alleles
Figure 3.2 Protection of genomic DNA against restriction enzymes by MTases
Figure 3.3 Effects of MTases on kanamycin reversion frequency
Figure 3.4 Very Short Patch Repair pathway of E.coli
Figure 3.5 KanR revertant frequency with or without VSP repair
Figure 3.6 Cytosine to uracil deamination by AID
Figure 3.7 5mC to T deamination by AID
Figure 3.8 KanR revertants due to AID promoted 5mC to T deamination
Figure 3.9 Quantification of genomic uracils due to AID
Figure 3.10 Comparison of cytosine to uracil deamination by AID and A3G
Figure 3.11 Comparison of 5mC to T deamination by AID and A3G
Figure 3.12 Comparison of cytosine to uracil deamination by AID, A3A and A3C
Figure 3.13 Comparison of 5mC to T deamination by AID, A3A and A3C
Figure 3.14 5mC deamination by A3A
Figure 3.15 5mC deamination by A3A depends on catalysis
Figure 3.16 SDS-PAGE gel of partially purified A3A and A3A-E72A mutant
Figure 3.17 In vitro cytosine deamination by A3A
Figure 3.18 In vitro 5mC deamination by A3A
Figure 3.19 Quantification of cytosine and 5mC deamination by A3A
Figure 3.20 Kinetics of A3A on cytosine versus 5mC deamination
Figure 3.21 SDS-PAGE gel and western blot of the AID-A3G hybrid proteins
Figure 3.22 Sequence selectivity and kinetics of A3G in cytosine deamination
Figure 3.23 Kinetics of cytosine deamination by A3G-AID hybrids
Figure 3.24 Schematic to illustrate domain swap between AID and A3A
Figure 3.25 Comparison of 5mC deamination by AID, A3A and AID-A3AR2
Figure 4.1 Modeling of 2´-deoxycytidylate with APOBEC3G-CTD
Figure 4.2 KanR revertant frequency of A3G-CTD and its Y315 substituent mutants
Figure 4.3 Distinct DNA binding regions in A3A and A3G-CTD

LIST OF ABBREVIATIONS

5caC : 5-carboxyl cytosine
5fC : 5-formyl cytosine
5hmC : 5-hydroxy methyl cytosine
5mC : 5-methyl cytosine
A : Adenine
AAV2 : Adeno Associated Virus 2
AID : Activation Induced Deaminase
APOBEC : Apolipoprotein B mRNA editing Enzyme-Catalytic polypeptide
BER : Base Excision Repair
bla : β lactamase
C : Cytosine
CDA : cytosine deaminase (domain)
cDNA : complementary Deoxyribonucleic Acid
cmp : chloramphenicol
CSR : Class Switch Recombination
CTD : Carboxy Terminal Domain
DBD : DNA Binding Domain
Dcm : DNA cytosine methylase
DNMT : DNA methyltransferase
DTT : Dithiothreitol
E.coli : Escherichia coli
EDTA : Ethylenediaminetetraacetic acid
ES : Embryonic Stem (cells)
G : Guanine
GI : gastrointestinal
GST : Glutathione S-transferase
His : Histidine
HIV-1 : Human Immunodeficiency Virus-1
ICM : Inner Cell Mass
Ig : Immunoglobulin
IL : Interleukin
IPTG : Isopropyl β-D-1-thiogalactopyranoside
Kan : kanamycin
KanR : kanamycin resistant
KanS : kanamycin sensitive
LINE : Long Interspersed Nuclear Elements
LTR : Long Terminal Repeats
MBD4 : Methyl-CpG Binding Domain protein 4
MMR : Mismatch Repair
mRNA : messenger Ribonucleic acid
MTase : Methyltransferase
NER : Nucleotide Excision Repair
NMR : Nuclear Magnetic Resonance
ɸ : 1,10-phenanthroline
PAGE : Polyacrylamide Gel Electrophoresis
PBMC : Peripheral Blood Mononuclear Cells
PGC : Primordial Germ Cells
Pol I : DNA Polymerase I
RT : Reverse Transcriptase
S : Switch (regions)
SAM : S-adenosyl methionine
SDS : Sodium Dodecyl Sulfate
SHM : Somatic Hypermutation
T : Thymine
TDG : Thymine DNA Glycosylase
TET : Ten Eleven Translocation (proteins)
U : Uracil
UDG : Uracil DNA Glycosylase
UGI : Uracil DNA glycosylase Inhibitor
Ung : Uracil N-glycosylase (aka Uracil DNA Glycosylase)
VDJ : Variable - Diversity - Junctions
Vif : Viral infectivity factor
VSP : Very Short Patch (repair pathway)
Vsr : Very short patch repair (enzyme)

CHAPTER 1 INTRODUCTION

1.1 Overview

Deoxyribonucleic acid (DNA) is a mixed polymer of four nucleotides, carrying the bases adenine, guanine, cytosine and thymine, linked by phosphodiester bonds. The sequence of these letters determines the genetic makeup (genotype) of an organism. Any change in this sequence is a mutation, which in turn can alter the observable traits of an organism, known as its phenotype. Surprisingly, not all phenotypic changes are due to changes in the genotype. Phenotypic changes that occur without altering the primary sequence of DNA are epigenetic changes. In the conventional view, DNA is a stable molecule with the following features that protect the integrity of the genetic information. DNA exists as a double stranded helix, which allows the genetic information to be segregated equally through semi-conservative replication during cell division. The presence of 2´-H instead of 2´-OH in the pentose sugar makes DNA more inert than RNA and well suited to storing the genetic information of organisms. The stacking of the nitrogenous bases inside the structure shields them from both endogenous and exogenous damaging agents. However, when DNA damage does occur, several DNA repair mechanisms (Base Excision Repair, Nucleotide Excision Repair, Mismatch Repair, etc.) have evolved to restore the damaged DNA bases or strands. The double stranded nature of DNA plays a role in repair as well, providing a template from which to recover the information missing in the damaged strand. In spite of all the above protective mechanisms, the cytosine in DNA undergoes a multitude of changes that add an extra layer of information to the primary sequence. This versatility has been achieved through covalent modifications of the base by a set of enzymes. The mechanics and the significance of RNA cytosine modification have long been studied, and this field continues to expand [1-2]. In contrast, modification of cytosine in DNA comes with rewards as well as risks. The targeted deamination of cytosines in the immunoglobulin genes generates diversity in antibodies, while misregulation of the process leads to B-cell lymphomas [3]. The marking of cytosines with a methyl group distinguishes self DNA from foreign DNA in prokaryotes. DNA methylation in eukaryotes is essential for gene regulation and epigenetic modification, although it comes at the price of frequent 5mC to T mutations [4]. The long-term goal of our lab is to study the prevention as well as the promotion of mutations in life processes. My thesis work explores the importance as well as the deleterious effects of cytosine deamination, cytosine methylation and the recently discovered DNA demethylation in living cells.

1.2 Cytosine deamination and the APOBEC family of proteins

The deamination of cytosine to uracil (Figure 1.1) is a spontaneous process (hydrolytic deamination) that occurs at a rate of 100-500 per eukaryotic cell per day, and the rate of deamination in single stranded DNA is 140 fold higher than in double stranded DNA [5]. The APOBEC (Apolipoprotein B mRNA editing enzyme-catalytic polypeptide) family of proteins in higher vertebrates has evolved to catalyze the same reaction, acting as single strand specific DNA or RNA cytosine deaminases (Figure 1.2). APOBEC is the only family of enzymes known in biology that damages DNA or RNA to achieve its biological functions, such as antibody maturation [6] or retrovirus restriction [7].


Figure 1.1 Hydrolytic deamination of cytosine. Cytosine deamination is initiated by the nucleophilic attack of a free water molecule at the C4 carbon of cytosine (left); the amino group leaves as ammonia, resulting in uracil (right).


Figure 1.2 Members of the human APOBEC family. APOBEC1 is a single stranded RNA cytosine deaminase. The other members have either no activity on nucleic acids or are single stranded DNA cytosine deaminases. All enzymes of the family possess at least one cytosine deamination sequence motif, as highlighted in the box. Modified from a figure in reference [8].

APOBEC1 was the first member to be discovered and has a role in lipid metabolism [9-10]. This is a single-stranded RNA cytosine deaminase although it can act on DNA as well [11]. The known proteins of the human APOBEC family with their properties are summarized below (Table 1.1).

Table 1.1 A summary of known members in human APOBEC family and their properties

Name | Tissue expression | Enzyme activity | Associated function
Apobec1 | GI tract | RNA/DNA deaminase | ApoB mRNA editing
Apobec2 | Heart and skeletal muscle | None | Unknown
Apobec3A | Primary monocytes, keratinocytes | DNA deaminase | Viral and retrotransposon restriction
Apobec3B | T cells, PBMC | DNA deaminase | Viral and retrotransposon restriction
Apobec3C | Many tissues and cancer cell lines | DNA deaminase | Viral and retrotransposon restriction
Apobec3DE | Many tissues | None | Viral restriction
Apobec3F | Many tissues | DNA deaminase | Viral and retrotransposon restriction
Apobec3G | Many tissues | DNA deaminase | Viral and retrotransposon restriction
Apobec3H | Poorly expressed due to a premature stop codon | Unknown | Viral restriction once expression is optimized
Apobec4 | Unknown | None | Unknown
AID | Activated B-cells, primordial germ cells | DNA deaminase | Antibody maturation

Adapted from the references [12-13]

In performing the cytosine deamination reaction on polynucleotide substrates, APOBECs show DNA/RNA sequence preferences (Table 1.2). Typically, APOBECs have a strong preference for the base at position -1 relative to the target cytosine and a weaker preference for the base at position -2. In one study, 89% of all the mutations caused by activation induced deaminase (AID) occurred within the WRC sequence context, where W is A or T and R is a purine [14]. In another study, 75% of the mutations caused by APOBEC3G were in the CCC sequence [15]. Other proteins in the family are less well studied but also have DNA sequence preferences (Table 1.2). However, the amino acid residues that determine these sequence preferences were not understood. In this study, putative DNA binding regions were swapped among the APOBECs and the role of those amino acid residues in determining sequence specificity was tested.

Table 1.2 APOBEC enzymes with known substrate sequence preferences

Enzyme | DNA/RNA sequence preference | References
Apobec1 | 5´-TC / C6666 of ApoB mRNA | [16]
Apobec3A | 5´-TC / 5´-CC | [17]
Apobec3C | 5´-TC > 5´-CC | [18]
Apobec3DE | 5´-WC | [19]
Apobec3F | 5´-TC | [20]
Apobec3G | 5´-CCC | [16],[15]
AID | 5´-WRC | [14]

The target of deamination is the 3´-terminal C of each motif. W is A or T and R is a purine.
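The sequence preferences in Table 1.2 can be illustrated with a short script that scans a DNA sequence for candidate target cytosines. This is only a minimal sketch: the function names and the example sequence are hypothetical, and it assumes that the target is the last C of each motif and that W and R follow the IUPAC codes given above.

```python
import re

# IUPAC codes used in Table 1.2: W = A or T, R = A or G (purine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}


def motif_to_regex(motif):
    """Translate a motif such as 'WRC' into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


def target_cytosines(seq, motif):
    """Return 0-based positions of the last C of every motif occurrence.

    A lookahead is used so that overlapping occurrences (for example
    the two CCC motifs in 'CCCC') are all reported.
    Assumption: the deaminated C is the 3'-terminal base of the motif.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() + len(motif) - 1 for m in pattern.finditer(seq.upper())]


# Hypothetical single stranded DNA sequence, used only for illustration.
example = "TTAGCTACCCGTAACT"
print("WRC (AID) targets:", target_cytosines(example, "WRC"))
print("CCC (A3G) targets:", target_cytosines(example, "CCC"))
```

Such a scan only marks sequence contexts that match the reported preferences; it says nothing about how efficiently a given site would be deaminated.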


1.2.1 Innate immunity and the APOBEC3 cluster

The mammalian immune system carries out a myriad of protective mechanisms to keep pathogens in check. The reaction mounted against a pathogen is known as the immune response, and there are two types of response, innate and adaptive. The innate immune response is readily available to combat an array of pathogens but does not lead to lasting immunity or immunological memory. This response is not specific for a particular pathogen. Multiple defense mechanisms make up the eukaryotic innate immune system, including phagocytic cells, antimicrobial peptides, signaling molecules like interferons, etc. In contrast to innate immunity, the adaptive immune response is specific for a pathogen and develops as an adaptation to the particular pathogen or pathogen derived molecules (antigens). APOBECs play a central role in both of these arms: the APOBEC3 cluster of proteins in innate immunity and AID in the adaptive immune response. The APOBEC3 genes in humans form a sub-branch of the family (Figure 1.2) that originated through tandem duplications of an ancestral APOBEC3 gene [21]. Six of the human APOBEC3 genes (APOBEC3A to APOBEC3G) form a 130 kb cluster on chromosome 22 [22]. Although the region containing the APOBEC3 cluster on human chromosome 22 is syntenic to a segment of murine chromosome 15, there is only one APOBEC3 gene in mice [23]. Complex rearrangements of the APOBEC3 locus due to retroviral activity may have contributed to the expansion of the APOBEC3 proteins in primates. In support of this hypothesis, ~19% of the human APOBEC3 locus comprises relics of mainly LTR type retroelement DNA [23]. The two APOBEC3 proteins I studied during my thesis work (APOBEC3A and APOBEC3G) are described below in detail. AID will be described in the context of adaptive immunity.


1.2.2 APOBEC3A (A3A)

A3A is a single domain cytosine deaminase (Figure 1.2) and can enter the nucleus [24]. A3A inhibits retrotransposition of LINE-1 elements [24] and adeno-associated virus 2 (AAV2) [17]. Although a recent study showed that A3A is capable of restricting HIV-1 in human myeloid cells [25], it is in general inactive against HIV-1 in cell lines [26]. This inability of A3A to act on HIV-1 seems to be due to improper targeting of the protein to the viral nucleoprotein complex. The forced incorporation of A3A as an A3G-A3A hybrid not only facilitated packaging of the protein into HIV-1 but also induced a high level of editing (cytosine deamination) and a replication blockade [27]. Overexpression of A3A leads to hyper-editing of papillomavirus genomes [28] and transfected plasmid DNA [29]. A recent study indicated the potential of A3A to activate the DNA damage response and cell cycle arrest when the protein is overexpressed [30]. This ability seems to be unique to A3A, as the authors compared A3B, A3C and A3G with A3A in that study. Another recent study proposed a new role for A3A in DNA catabolism based on mitochondrial DNA editing by A3A [31]. The expression of human A3A is initiated from two different methionine codons (M1 and M13), resulting in two isoforms that are visible as a doublet on a western blot [32]. In this study, I examined the ability of A3A to deaminate cytosine and 5-methyl cytosine both in vivo and in vitro.


1.2.3 APOBEC3G (A3G)

Among the APOBEC3s, A3G is the most intensively studied member. A3G inhibits the replication of HIV-1 if the virus is deficient for the viral infectivity factor (Vif) protein (Figure 1.3) [33]. HIV-1 Vif counteracts A3G activity by targeting the protein for polyubiquitination and proteasomal degradation [34-35]. In the absence of Vif, A3G restricts HIV-1 using at least two different mechanisms. In the first, the catalytically inactive CDA domain (Figure 1.2) is essential for binding viral RNA, for interactions with HIV-1 Gag and for incorporation into HIV-1 virions [36-38]. In the second, the catalytically active CDA domain introduces C to T mutations in the DNA copy of the viral genome as it is being reverse transcribed (G to A mutations in the viral genomic RNA sequence). Due to this hypermutation event, essential viral proteins may be inactivated or cellular repair enzymes may degrade the viral cDNA [39-41]. A clear understanding of the structure of the A3G protein and its interaction with nucleic acids is paramount to elucidating these mechanisms.
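The relationship between C to T changes on the minus strand DNA and G to A changes in the plus strand sequence can be made concrete with a small sketch. The function names and the example sequence below are hypothetical. The code assumes that the last C of every CCC motif on the minus strand is deaminated, and it writes the resulting uracil as T because a uracil templates an A during plus strand synthesis.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def deaminate_minus_strand(plus_strand, motif="CCC"):
    """Deaminate the last C of each motif on the minus strand.

    Returns the plus strand sequence that would be copied from the
    edited minus strand. Uracil is written as T because it pairs
    with A when the plus strand is synthesized.
    """
    original = reverse_complement(plus_strand)
    edited = list(original)
    k = len(motif)
    for i in range(len(original) - k + 1):
        if original[i:i + k] == motif:
            edited[i + k - 1] = "T"  # C -> U, read as T
    return reverse_complement("".join(edited))


# Hypothetical plus strand fragment; GGG here is CCC on the minus strand.
before = "ATGGGAT"
after = deaminate_minus_strand(before)
print(before, "->", after)
```

In this toy example the only change visible in the plus strand is a G to A substitution, which is the signature described in the text for A3G editing of the viral genome.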


Figure 1.3 Role of APOBEC3G and Vif in retroviral restriction and infection. APOBEC3G (maroon) and the viral infectivity factor (Vif, pink circle) are key determinants of retroviral infectivity. The HIV-1 Vif protein targets APOBEC3G for proteasomal degradation in the producer cells (cells that propagate viruses). In the absence of Vif, or if APOBEC3G escapes Vif, APOBEC3G is incorporated into budding HIV-1 virions along with the viral genomic RNA and is carried into the target cell, which is initially free of virus but subsequently becomes infected. The APOBEC3G in target cells can deaminate the newly synthesized minus DNA strand of the virus. RT: reverse transcriptase. Adapted from a figure in reference [7].


1.2.4 Structure of APOBEC3G carboxy terminal domain (A3G-CTD)

Recent solution structures of the APOBEC3G carboxy terminal domain determined by NMR spectroscopy [42-43] and the crystal structure of the same domain determined by X-ray crystallography [44] shed light on the catalytic pocket of the enzyme (Figure 1.4) and the DNA binding regions (Figure 1.5). The catalytic pocket of the A3G-CTD contains a Zn2+ ion (red dot) coordinated by Cys288, Cys291, His257 and a water molecule (cyan). This structure provides insight into how this class of enzymes interacts with DNA and performs catalysis.

Figure 1.4 Catalytic pocket of A3G-CTD. The active site of A3G-CTD contains a Zn2+ ion (red) coordinated by C288, C291, H257 and a water molecule (cyan). E259 hydrogen bonds with the water molecule and presumably activates it for an attack at C4 of the target cytosine in single stranded DNA. PDB: 3E1U [44].

Although both the NMR and crystal structures predicted the path of DNA binding to the protein (Figure 1.5), there are significant differences between the two structures. These structures were not solved with bound DNA, and the authors used different criteria to determine the putative DNA binding regions. The differences between the two structures arise mainly because the loops of the protein appear to adopt different three dimensional structures in solution (NMR) than in the crystalline form. The position of "loop 1" is substantially different in the two structures. Chen et al. [42] used NMR chemical shift perturbations in the presence of a 5´-CCT oligomer to identify the amino acids that may interact with DNA. In contrast, Holden et al. [44] relied on a deep groove present in the X-ray structure to determine the DNA binding residues.

Figure 1.5 The structures of A3G-CTD determined by NMR (left, PDB: 2JYW) and X-ray crystallography (right, PDB: 3E1U). The predicted path of DNA in each structure is shown along with selected amino acids (R213, R215, R313 and D316) and the Zn2+ ion to illustrate the difference between the two structures. Adapted from a figure in reference [45].

Both groups performed site directed mutagenesis and deaminase assays to narrow the list of residues down to one (Region-2, X-ray structure) or two (Region-1 and Region-2, NMR structure) putative DNA binding regions (Figure 1.6). To obtain the NMR structure, Chen et al. used a more soluble variant of A3G-CTD with five mutations, called 2K3A; these mutations are shown in Figure 1.6 in an alignment with the AID amino acid sequence. The L234K and F310K mutations improved the solubility of the variant protein, whereas C243A, C321A and C356A minimized intermolecular disulfide bond formation [42]. The experiments in my thesis work were carried out using the A3G-CTD-2K3A variant.

Figure 1.6 Sequence alignment of human AID and A3G-CTD. The amino acids shared by both proteins are shown in blue. The residues changed to obtain the 2K3A variant are indicated with arrows. The two putative DNA binding regions are underlined (DBD-1 and DBD-2). The two cysteines and the histidine that coordinate the Zn2+ in the catalytic pocket are indicated with boxes. Adapted from a figure in reference [45].


1.2.5 Adaptive immunity and AID

Adaptive immunity includes humoral immunity, which is mediated by antibodies (Figure 1.7) produced by B-cells against antigens. Two key features of humoral immunity are immunological memory and the generation of antibody diversity. Antibody diversity is critical to the ability of higher vertebrates to overcome a limitless number of antigens using a limited number of antibody genes. The origin of this diversity was an enigma for a long time. The early ideas about antibody gene diversification were developed on a theoretical basis because sufficient experimental tools were lacking [46]. Although many theories were postulated to explain the mechanism of primary antibody diversity, it was recombinant DNA technology that provided the direct evidence needed to characterize the mouse immunoglobulin gene loci. Tonegawa et al. were the first to provide experimental data demonstrating rearrangement of the antibody genes [47].

Figure 1.7 Schematic of the antibody molecule. A typical antibody molecule consists of two heavy chains (longer chains in the middle) and two light chains (shorter outside chains). The N-terminus is specialized to bind to the antigen and shows high amino acid variability. In contrast, the C-terminus has a relatively constant amino acid composition. The disulfide bonds between the chains are shown.


1.2.6 Generation of antibody diversity

In humans and mice, antibodies gain diversity through three main processes called V(D)J recombination, somatic hypermutation (SHM), and class switch recombination (CSR) (Figure 1.8). V(D)J recombination is a combinatorial process that combines three types of protein coding DNA units called variable (V), diversity (D), and junction (J) segments. The identification of RAG1 and RAG2 (recombination-activating genes) as the "V(D)J recombinase" by the Baltimore laboratory was a major contribution to delineating the underlying mechanism of this process [48-49]. There are scores of V, D (only in heavy chains) and J segments in the human immunoglobulin gene region. During early B-cell development, each cell creates a unique combination of one segment from each category [V, (D) and J] to form the variable region. This happens prior to the exposure of B-cells to any antigen and is capable of producing millions of clones, each capable of making a distinct antibody. However, the primary repertoire of antibodies generated through V(D)J rearrangement alone is not sufficient to provide high affinity antibodies (Ka 10^9 M^-1) in any organism [50]. It was speculated that mutational processes acting at the N-terminus of the antibody molecule introduce the additional diversity, and Lederberg was one of the pioneers who put forward this hypothesis [51]. Brenner and Milstein bolstered Lederberg's 1959 proposal of a mutational process by suggesting that antibody diversity might be achieved through localized DNA synthesis driven by error-prone repair of a lesion created by an unidentified DNA cleaving enzyme [52].
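The combinatorial arithmetic behind V(D)J recombination can be sketched in a few lines. The segment counts used here are hypothetical round numbers chosen only to illustrate how "scores" of segments can yield millions of combinations; they are not values taken from this work, and the sketch ignores junctional diversity at the segment joints.

```python
def combinatorial_diversity(n_v, n_j, n_d=1):
    """Number of distinct V(D)J combinations for one chain.

    Light chains have no D segments, so n_d defaults to 1.
    """
    return n_v * n_d * n_j


# Hypothetical segment counts, for illustration only.
heavy = combinatorial_diversity(n_v=40, n_d=25, n_j=6)
light = combinatorial_diversity(n_v=40, n_j=5)

# Any heavy chain can in principle pair with any light chain.
total = heavy * light
print("heavy chain combinations:", heavy)
print("light chain combinations:", light)
print("heavy x light pairings:", total)
```

With these assumed counts the pairings already exceed one million, which is consistent with the statement that V(D)J recombination alone can produce millions of distinct clones.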


1.2.7 Discovery of AID

About 30 years later, Muramatsu et al. discovered an APOBEC family protein, named activation induced deaminase (AID), as the mutator enzyme [53]. The discovery of AID is one of the conceptual breakthroughs in understanding the genetics of antibody gene diversification. AID is expressed mainly, but not exclusively, in germinal center B cells [53-54]. It is a single stranded DNA specific cytosine deaminase [55-56], and the conversion of cytosine to uracil by AID is required for both somatic hypermutation (SHM) and class switch recombination (CSR). Once AID deaminates cytosines in DNA to uracils, the downstream cellular pathways act differently in the two processes, introducing either point mutations in the case of SHM or region specific recombination in the case of CSR (Figure 1.8). Some animals, like chickens and pigs, can perform an additional AID-dependent diversification of antibody genes called gene conversion [57], but it will not be discussed here.


Figure 1.8 Molecular mechanisms underlying antibody maturation. The human antibody gene loci contain scores of V (variable), D (diversity) and J (joining) sequences upstream of the constant gene segments. V(D)J recombination takes place in two steps. A “D” segment recombines with a “J” segment, followed by a “V” segment combining with the already rearranged “D-J” segment. The shuffling of gene segments allows the primary antibody repertoire to reach millions of different combinations. During somatic hypermutation, AID introduces point mutations into the already rearranged V(D)J region that encodes the variable region of the antibody molecule. Asterisks within the variable region depict the point mutations. Class switch recombination is a region-specific recombinational process in which AID-mediated point mutations lead to strand breaks in the S (switch) regions of antibody genes. The downstream processes resolve those strand breaks while switching the antibody class from IgM to a different isotype such as IgG, giving the antibody a different effector function. During this process, the intervening DNA segment between the two S-regions is eliminated as a circle.

1.2.8 Somatic hypermutation (SHM)

SHM is a process whereby the affinity of an antibody molecule toward a particular antigen is increased through the introduction of point mutations into the variable region of the antibody gene (Figure 1.8). Some of these mutations improve the fit with the antigen and the corresponding B-cells are clonally selected to proliferate, whereas B-cells with deleterious mutations are discarded through apoptosis [46]. SHM results in a mutation rate in the immunoglobulin genes of about 1 mutation per 1000-10000 base pairs per cell division, a rate that is a million-fold higher than the background mutation rate (Berek, 1998). One of the remarkable intrinsic features of SHM is that this mutational process is directed only to a ~2 kb region of the antibody variable gene [58] out of the 3 billion base pair human genome.

1.2.9 Class Switch Recombination (CSR)

While SHM is a mutational process that increases the affinity of the variable regions of both light and heavy chains, CSR is a region-specific recombinational process that occurs only in the heavy chain genes of antibody molecules (Figure 1.7). The first antibody isotype produced, prior to CSR, is IgM with the constant gene µ. CSR exchanges the µ constant region with another (e.g. γ) to alter the effector function of the antibody. The different antibody isotypes, with their different effector functions, determine properties such as the half-life and tissue localization of the antibody. Triggering of the CSR machinery at S-regions (Figure 1.8) is precisely regulated by cytokines, which induce S-transcripts at the 5´ repetitive S-regions [59]. For example, the stimulation of mouse B-lymphocytes with interleukin (IL) 4 induces CSR to IgG1 and IgE. AID initiates CSR by deaminating cytosines in single-stranded S-regions, creating U●G mismatches [60]. The U●G mismatches introduced by AID are processed by uracil DNA glycosylase (UNG), creating abasic sites that lead to DNA double strand breaks, which are essential intermediates in CSR [61]. These DNA double strand breaks in two different switch (S) regions are joined while leaving the intervening DNA segment as a circle (Figure 1.8). This exchanges the µ constant region of the immature immunoglobulin genes with one of the downstream constant segments (e.g. γ), switching from IgM to a different isotype such as IgG.

AID has also been implicated in DNA demethylation in the context of early embryogenesis in mammals [62-64], and this will be the main focus of this thesis work. This proposal is based in part on the reported ability of AID to deaminate 5-methyl cytosine (5mC) to thymine [54]. The proposed molecular processes of AID-dependent 5mC deamination followed by DNA demethylation, and the underlying principles of the research tools used to examine 5mC deamination by AID, are discussed below.
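The scale of the SHM rate quoted in section 1.2.8 can be made concrete with a few lines of arithmetic. The sketch below only restates the figures given in the text (1 mutation per 1000 to 10000 bp per division, a million-fold elevation, a ~2 kb target in a 3 billion bp genome); the constant and function names are illustrative and not part of the original work.

```python
# Back-of-the-envelope arithmetic for the SHM figures quoted in the text.
SHM_RATE_HIGH = 1 / 1_000          # mutations per bp per division (upper bound)
SHM_RATE_LOW = 1 / 10_000          # mutations per bp per division (lower bound)
FOLD_OVER_BACKGROUND = 1_000_000   # "a million-fold higher"
TARGET_BP = 2_000                  # ~2 kb variable gene region
GENOME_BP = 3_000_000_000          # ~3 billion bp human genome


def implied_background(rate, fold=FOLD_OVER_BACKGROUND):
    """Background mutation rate implied by the quoted fold difference."""
    return rate / fold


def expected_mutations(rate, length_bp=TARGET_BP):
    """Expected number of mutations in the target region per division."""
    return rate * length_bp


# Fraction of the genome that the SHM target region represents.
target_fraction = TARGET_BP / GENOME_BP

if __name__ == "__main__":
    print(expected_mutations(SHM_RATE_LOW), expected_mutations(SHM_RATE_HIGH))
    print(implied_background(SHM_RATE_LOW), implied_background(SHM_RATE_HIGH))
    print(target_fraction)
```

Under these figures the ~2 kb target would acquire roughly 0.2 to 2 mutations per division while representing less than one millionth of the genome.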


1.3 Cytosine methylation as an epigenetic mark

The tagging of cytosines with a methyl group in DNA (Figure 1.9) is a pivotal epigenetic mark in vertebrates. This provides an additional layer of information for growth and development which is not encoded in the primary sequence of DNA. The regulation of cytosine methylation in DNA has become an area of great interest in a number of fields including cell reprogramming [65] and differentiation [66], inhibition of retroelements [67], genomic imprinting [68] and X-chromosome inactivation [69]. As DNA methylation is required for normal development [70-71], aberrant methylation leads to several human diseases including cancer [72].

Figure 1.9 DNA cytosine methylation. DNA methyltransferase (DNMT) catalyses the transfer of a methyl group (CH3) from S-adenosylmethionine to the 5th carbon of cytosine in DNA.

The methylation of DNA occurs predominantly in a CpG dinucleotide sequence context, allowing symmetrical modification to facilitate inheritance through cell divisions. About 70-80% of cytosines in CpG sites are methylated on both strands in mammalian somatic cells [68]. A genomic region of about 1 kb that has a high G-C content (>55%) and is rich in CpG dinucleotides, but is usually hypomethylated (has a lower level of methylation), is termed a “CpG island” [73]. Methylation of CpG islands in the promoters of genes results in gene silencing [68], whereas methylation of CpG islands within gene bodies regulates alternative promoter usage [74]. Furthermore, cytosine methylation in non-CpG contexts (also known as CpH = CpA, CpC or CpT) has been found in human embryonic stem (ES) cells [75] and in adult mouse brain [76].
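The sequence criteria mentioned above (G-C content above 55% and a high density of CpG dinucleotides) are straightforward to compute. The following is a minimal sketch; the function names are hypothetical and the 0.55 cutoff is taken from the text, while any threshold on CpG density would be an additional assumption and is therefore left to the caller.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides (a C immediately followed by a G)."""
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    return seq.upper().count("CG")


def meets_gc_threshold(seq: str, min_gc: float = 0.55) -> bool:
    """True if the G-C content exceeds the cutoff quoted in the text."""
    return gc_content(seq) > min_gc


if __name__ == "__main__":
    example = "CGCGAT"  # toy sequence, far shorter than a real ~1 kb island
    print(gc_content(example), cpg_count(example), meets_gc_threshold(example))
```

A real scan would slide a window of about 1 kb along the genome and apply these functions to each window.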

1.3.1 Methylation sites are hot spots for mutations

Over 30 years ago, 5mC was recognized as a hot spot for spontaneous C to T mutations in E. coli [77]. Since then, several studies have supported this finding both in E. coli and in humans. The E. coli K-12 genome codes for a single cytosine methyltransferase called Dcm [78] that methylates the second C in the sequence context CCWGG, where W is A or T [79]. In addition to the Dcm sites in the lacI gene of E. coli, the cI gene of phage λ [80] and the gene for kanamycin resistance, kan [81], also contain mutational hot spots. Another important feature is that only some of the 5mCs are mutational hot spots in a given genomic region containing multiple 5mCs. This observation was first made in the lacI gene of E. coli [77] and later in the human p53 mutation spectrum, where only a fraction of methylated CpG sites were known hot spots [4]. Formation of a cruciform or other types of local secondary structure may determine the length of time that the 5mC is in an unpaired state, and thereby the “temperature” of the hot spot [82]. The special proline codon of the kanamycin resistance genetic assay used in this study is located at a mutational hot spot [83].
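The Dcm recognition rule described above (the second C of CCWGG, with W being A or T) can be expressed as a short pattern search. This is an illustrative sketch only; the function name is hypothetical and the example sequence is made up.

```python
import re

# Lookahead so that every CCWGG occurrence is reported, even if sites overlap.
DCM_SITE = re.compile(r"(?=(CC[AT]GG))")


def dcm_methylated_positions(seq: str) -> list:
    """Return 0-based positions of the second C in each CCWGG site."""
    return [m.start() + 1 for m in DCM_SITE.finditer(seq.upper())]


if __name__ == "__main__":
    print(dcm_methylated_positions("AACCAGGTTCCTGGA"))
```

Because CCWGG reads as CCWGG on the complementary strand as well, each site found this way also carries a methylated cytosine on the opposite strand; the sketch reports top-strand positions only.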


1.3.2 The intermediate of the 5mC to T mutation

The two possible intermediates in the conversion of a 5mC●G base pair into T:A are 5mC●A or T●G mismatches (Figure 1.10). The former mispair could be a result of misincorporation of an adenine during replication. However, there is no support for DNA polymerase copying a cytosine incorrectly when it is methylated [84]. Instead, hydrolytic deamination has been shown to convert 5mC into T [85-86] and therefore, the likely intermediate during the conversion of 5mC●G into T:A is T●G. There were a number of other hypotheses in the field to explain the origin of 5mC to T mutation through T●G mismatch [87-88] but their validity has been disputed.

Figure 1.10 The intermediate of 5mC●G to T:A mutation.

1.3.3 Why is the 5mC to T mutation frequency higher than that of C to T?

Comparing the mutational consequences of 5mC to T deamination with those of C to U deamination provides evidence for the underlying mechanisms. There are at least two reasons for the higher mutation frequency of 5mC to T compared to C to U to T (Figure 1.11). First, the deamination rate of 5mC is fourfold higher than that of cytosine [85]. Second, there is a biological reason: cells contain multiple pathways that repair U●G mispairs more efficiently than T●G mismatches [89]. The ubiquitous enzyme uracil-DNA glycosylase (UDG) is very efficient in removing uracils compared to VSP repair of E. coli [90] or TDG/MBD4 of mammals [91], which have evolved to remove T from a T●G mismatch.

Figure 1.11 Comparison of C to T (A) versus 5mC to T (B) mutations and their repair mechanisms. BER: Base Excision Repair.
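The fate of an unrepaired deamination product can be followed with a toy model of semi-conservative replication. The sketch below tracks a single base pair and assumes no repair at all; uracil is copied as if it were thymine. The names `PAIRS`, `replicate` and `descendants` are illustrative only.

```python
# Partner inserted opposite each template base; U templates A, like T does.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def replicate(duplex):
    """Replicate one base pair (top, bottom) into two daughter pairs.

    Each parental base is kept and templates a newly synthesized partner.
    """
    top, bottom = duplex
    return [(top, PAIRS[top]), (PAIRS[bottom], bottom)]


def descendants(duplex, rounds):
    """All base pairs present after the given number of replication rounds."""
    population = [duplex]
    for _ in range(rounds):
        population = [d for parent in population for d in replicate(parent)]
    return population


if __name__ == "__main__":
    print(descendants(("U", "G"), 1))  # U:A appears, no T:A yet
    print(descendants(("U", "G"), 2))  # T:A appears in the second round
    print(descendants(("T", "G"), 1))  # a T●G mispair gives T:A in one round
```

In this model an unrepaired U●G mispair needs two rounds of replication to yield a T:A pair, whereas an unrepaired T●G mispair yields T:A after a single round.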

1.3.4 Mutational hot spots in human genes

Analysis of DNA sequence changes in human genetic diseases revealed a high proportion of mutations at CpG sites [92]. These include a number of human genetic diseases such as β-thalassemia, hypercholesterolemia, hemophilia and human cancer [93-95]. The enumeration of sequence changes in these genes revealed that the mutation frequency at CpG sites was 15-fold higher than expected by random chance [93]. In the case of human genetic diseases, it is hard to determine whether the mutations occurred “spontaneously” or were due to external agents. However, the following examples provide evidence that at least the majority of the mutations at CpG sites arose spontaneously. When lacI and lacZ genes were inserted in mice, there were hot spots at CpG sequences for unselected C to T mutations [96-97]. Similar results were observed when the lacI gene was inserted in rats [98]. The study of p53 mutational spectra led to a more detailed understanding of this process. Inactivating mutations in the p53 tumor suppressor gene are the single most common genetic event in human cancers that affects a specific gene. Most of the mutations lie in the segment of the p53 gene that codes for the DNA binding region [99]. All CpG sequences analyzed in p53 were completely methylated in all human tissue samples tested [4], and methylated CpGs contained more than one third of all cancer mutations. Five out of six p53 mutational hot spots (codons 175, 245, 248, 273 and 282) had methylated CpG sites. Transition mutations at CpG sites of p53 were abundant in colorectal and brain cancers (www.p53.iarc.fr/index.html).


1.3.5 DNA methylation and DNMTs

The enzymes that carry out DNA methylation, DNA methyltransferases (DNMTs), are well conserved in mammals and plants [100]. There are two categories of DNMTs: de novo and maintenance methyltransferases [101]. The initial pattern of methylation is established by the two de novo methyltransferases DNMT3A and DNMT3B (Figure 1.12) during early embryogenesis in mammals [71]. This established methylation pattern is faithfully maintained by the maintenance methyltransferase, DNMT1, during multiple cell divisions (Figure 1.12). DNMT1 prefers to act on hemi-methylated DNA, which helps the enzyme carry out its function during DNA replication [102]. DNMT3B- and DNMT1-deficient mice are embryonic lethal [70-71] and DNMT3A null mice die around four weeks of age [71], confirming the requirement for these enzymes during development.

1.3.6 DNA demethylation

Studies done during the last decade have revealed that DNA methylation is a dynamic process in which removal of the methyl group, or DNA demethylation, also takes place [103]. DNA demethylation can occur via two processes, called active and passive demethylation (Figure 1.12). Passive DNA demethylation occurs if the access of DNMT1 is blocked during successive rounds of DNA replication. This process is relatively well understood and accepted. In contrast, active DNA demethylation is a poorly understood enzymatic process that removes methyl groups [103]. DNA demethylation occurs both genome-wide during early embryogenesis [104] and gene-specifically when somatic cells respond to signals, as in activated T-lymphocytes [105].


Figure 1.12 DNA methylation profile during early embryogenesis. The de novo methyltransferases, DNMT3A and DNMT3B, establish the initial methylation patterns. These methylation marks are maintained by the maintenance methyltransferase DNMT1 during replication at each cell division. However, if DNMT1 is excluded or inhibited, the methyl groups will be diluted away with each successive replication cycle, leading to passive demethylation. In contrast, active demethylation takes place through an enzymatic process that replaces the 5mC with a C. Adapted from a figure in reference [106].

Genome-wide DNA demethylation occurs in mammalian development at two distinct stages (Figure 1.13): first, when primordial germ cells (PGCs) have reached the embryonic gonads (in mouse, embryonic day E10.5 to E13.5), and second, in the early embryo, beginning in the zygote immediately after fertilization and continuing until the morula stage [107-108]. What is striking here is that the paternal genome of the pronuclear zygote undergoes replication-independent, genome-wide DNA demethylation during the first few hours after fertilization [104, 109]. Although plants carry out 5mC demethylation with the help of glycosylases such as Demeter and Demeter-like glycosylases, no mammalian counterparts of this class of enzymes have been found to date [63, 110].

Figure 1.13 The two major phases of genome-wide DNA demethylation in mammals. The solid arrows in the inner circle represent the time in which AID is expressed during embryogenesis. Adapted from a figure in reference [111].

1.3.7 AID-dependent DNA demethylation

DNA deaminases and the base excision repair (BER) pathway have recently been proposed to act in DNA demethylation, along with other proposed mechanisms (Figure 1.14) [112].

Figure 1.14 Proposed mechanisms for active DNA demethylation. In plants, there is direct evidence for the removal of 5mC using 5mC-specific glycosylases like DME/ROS1 family of enzymes. In contrast, in mammals no efficient 5mC removal has been detected. Instead there are a number of proposed mechanisms with some experimental evidence. One model is based on DNA deamination followed by BER and the candidate deaminases are AID and APOBEC1. According to the model, the resulting thymine after 5mC deamination is removed by T●G mismatch specific glycosylases like MBD4 or TDG. Another recently proposed mechanism is based on the oxidation of 5mC by the TET (Ten Eleven Translocation) family of proteins followed by repair. A modified figure from reference [113].

Morgan et al. first reported in 2004 that AID is expressed in primordial germ cells (PGCs) and in early embryos at a time when demethylation occurs [54] (Figure 1.13). The suggestion that AID participates in demethylation received strong support from three recent publications. In zebrafish embryos, upregulation of AID correlates with DNA demethylation, and coexpression of the glycosylase MBD4 through transfection results in DNA demethylation at specific embryonic stages [64]. Popp et al. used bisulfite/NextGen sequencing technology to show that in AID-/- mice, PGCs have significantly higher methylation at many genomic loci than those from AID+/+ mice [63]. Using a system based on mouse-human cell fusions (heterokaryons), Bhutani et al. showed that reprogramming of human cells to induced pluripotency required expression of AID [62]. Together, these studies strongly implicate a role for AID in DNA demethylation in early embryogenesis and PGCs.

However, none of the papers clarifies the mechanism by which AID promotes DNA demethylation. Conclusive evidence from both genetics and biochemistry is lacking to support the above role of AID. Genetically, AID knockout mice are viable and not sterile. Similarly, APOBEC1 knockout mice develop to adulthood and are fertile [114-115]. Biochemically, the activity of AID on 5mC is controversial (see below). As DNA methylation occurs symmetrically in genomic DNA, deamination of both strands should give rise to T●G/G●T double mispairs. Since there is no evidence that either MBD4 or TDG acts on a double mismatch, the proposed model (Figure 1.14) coupling AID/APOBEC1 with MBD4/TDG is questionable. Moreover, the repair of a double mismatch would lead to a DNA double strand break due to the activity of the AP endonuclease that processes the two opposing abasic sites (Figure 1.14). There are several problems with this mechanism. First, if this mechanism were active, early zygotes would be under a great deal of stress repairing thousands of DNA demethylation-induced double strand breaks. In addition, as AID acts on single-stranded DNA, it is not clear how AID would be targeted to deaminate genomic DNA even before the first round of replication in the zygote [54]. Finally, it is not clear that AID can act on 5mC. A previous report in 2003 showed that 5mC is a poor substrate for AID [116], and it was subsequently concluded that methylation of cytosine “protects” the base from AID-promoted deamination [117]. Supporting this line of evidence, two recent papers contained data showing that AID is inefficient in deaminating 5mC both in vitro [118] and in vivo [119]. One of the reasons this question remains unresolved is the difficulty of purifying the protein to perform in vitro assays. Therefore, in this study, I reexamined the ability of AID to deaminate 5mC using a powerful E. coli-based genetic assay.

1.4 E. coli as a model organism to study cytosine and 5mC deamination

Uracils that arise from cytosine deamination, if unrepaired, result in a genomic C to T mutation after two rounds of replication (Figure 1.11). Since the mutation rate of single-stranded DNA is 140-fold higher than that of double-stranded DNA, E. coli-based genetic assays can be developed in which the DNA is exposed as a single strand. One such biological process is transcription, and several studies have examined the effect of transcription on cytosine deamination in E. coli [120-122]. Once AID was discovered and its sequence was compared to known proteins of the APOBEC family, the sequence conserved between AID and APOBEC1 led Muramatsu et al. [123] to propose an RNA editing activity for AID. The first evidence for AID being a DNA deaminase came from early studies carried out in E. coli [124], in which the authors showed that AID is a mutator in E. coli that causes C to T mutations. Subsequent E. coli-based assays reproduced a number of biological characteristics of the human enzyme [15, 55, 125]. When C to U to T mutations caused by APOBEC proteins are studied in E. coli, the sensitivity of the assay can be amplified by using an ung- strain [89]. The gene ung codes for UDG, which is very efficient in removing uracil and initiating repair of the base through the base excision repair pathway [126]. Hence, when the strain is deficient in ung, the majority of the C to T mutations are fixed in the genome. The same ung- phenotype can be obtained by expressing the inhibitor of UDG, uracil DNA glycosylase inhibitor (UGI), extra-chromosomally [127].

Methylation of cytosines in DNA is found in all three kingdoms of life, and E. coli (eubacteria) uses it to distinguish self from non-self. However, the presence of 5mC in DNA comes at the price of frequent deamination to T [85]. To study 5mC deamination, one can use the Dcm methyltransferase encoded by E. coli [78], which methylates the second C in the CCWGG sequence. To study 5mC deamination in other sequence contexts, methyltransferases from other organisms (for example, M.HpaII from Haemophilus parainfluenzae and M.MspI from Moraxella species) can also be expressed in E. coli. However, since E. coli does not tolerate methylation at sites other than Dcm sites, the experiments have to be carried out in methylation restriction-deficient strains with mcrA-, mrr- genotypes. Moreover, if the detection of 5mC to T mutations is the goal of the study, an ung+ E. coli strain has to be selected to avoid C to U to T mutations. Furthermore, when Dcm methylation is studied, a vsr- strain is required to prevent the repair of 5mC to T mutations at the CCWGG sites.


1.5 Scope and aims of the research projects

My thesis project focused mainly on two areas of the APOBEC family of proteins.

1. Reexamination of the ability of AID to deaminate 5-methyl cytosine

AID has recently been implicated in DNA demethylation based on its proposed role in 5mC deamination in early embryogenesis (Figure 1.14). However, the ability of AID to deaminate 5mC efficiently remains controversial. The major difficulty is the lack of pure AID protein for biochemical experiments. In this project, I adapted a powerful and rapid genetic system to test this proposed new role for AID. The ability of AID to deaminate 5mC was tested in five different sequence contexts including CpG and WRC. Other APOBEC proteins were also incorporated into the study to test their ability to deaminate 5mC. Based on the genetic data obtained, biochemical experiments were designed to complement this study.

2. Domain swap to determine the DNA binding regions of A3G and AID

Although APOBECs are DNA binding proteins with specific substrate preferences, the sequence determinants of the enzymes (e.g. which amino acid residues produce WRC sequence specificity in AID and last C of CCC in A3G) were not known. In this project a series of biochemical studies were carried out to verify the putative DNA binding sites predicted in the recent NMR [42] and X-ray crystal structures [44] of the A3G carboxy terminal domain. In this study, hybrid proteins were made by swapping the putative DNA binding regions between the two proteins AID and A3G; next I determined whether or not the hybrid proteins change the substrate specificities.
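The two substrate preferences named above can be written as simple sequence patterns: WRC for AID (W = A or T, R = A or G, with the C being the target) and the last C of CCC for A3G. The following sketch locates candidate target cytosines on one strand; the function and pattern names are hypothetical and the example sequence is invented.

```python
import re

# Lookaheads allow overlapping motifs (e.g. CCCC contains two CCC motifs).
WRC = re.compile(r"(?=([AT][AG]C))")   # AID preference: W = A/T, R = A/G
CCC = re.compile(r"(?=(CCC))")         # A3G preference: last C of CCC


def target_cytosines(seq: str, pattern) -> list:
    """0-based positions of the third base (the target C) of each motif."""
    return [m.start() + 2 for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    seq = "TAGCTCCCA"
    print(target_cytosines(seq, WRC))
    print(target_cytosines(seq, CCC))
```

Comparing such position lists for a wild-type enzyme and a hybrid protein is one simple way to summarize whether a domain swap has shifted the preferred sequence context.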


CHAPTER 2 MATERIALS AND METHODS

2.1 Bacterial strains

E. coli strain BH143 [Δ(mrr-hsdRMS-mcrBC) mcrA ɸ80dlacZΔM15 ΔlacX74 deoR endA1 araD139 Δ(ara, leu)7697 galU galK rpsL nupG Δ(dcm-vsr)] was used to insert the kan alleles into the bacterial chromosome. The strain was verified to be dcm– by restriction digestion of plasmid DNA isolated from transformed BH143 (Figure 2.1).

Figure 2.1 Verification of BH143 for the absence of Dcm methylation. The E. coli strain DH10B, which is dcm+, was used as a control. The same plasmid DNA cannot be cleaved with PspGI when extracted from DH10B (lane 3) but is digested when isolated from BH143 (lane 6), confirming that the latter strain does not express Dcm. BstNI is an isoschizomer of PspGI but is not sensitive to Dcm methylation. Uncut plasmid DNA from DH10B (lane 1), DNA from DH10B cut with BstNI (lane 2), uncut DNA from BH143 (lane 4) and DNA from BH143 cut with BstNI (lane 5).


The Red/ET recombination system from Gene Bridges (Heidelberg, Germany) was used to insert the different kan alleles into the genome of the BH143 through recombineering (Figure 2.2).

Figure 2.2 Construction of E. coli strains. The kan alleles were introduced into the manX gene in the E. coli chromosome through homologous recombination. The recombinants were selected using bleomycin resistance conferred by the ble+ gene.

The kan alleles in the plasmids pUP31, pUP41 and pUP44 [128] were amplified using the 70-mer primer pair (Recombinant Forward and Recombinant Reverse, Table 2.1), each of which contained 50 nucleotides identical to the manX gene in the genome. The amplification products, containing the bleomycin resistance gene in addition to the kan alleles (kan-ble cassette prepared in advance), were used to construct the recombinants. Briefly, the PCR product was digested with DpnI and gel purified to remove traces of circular parental plasmid DNA. The Red/ET recombination protein expression plasmid pRedET was transformed into BH143 using the E. coli Pulser. Temperature-sensitive transformants were obtained by incubating at 30°C; they were grown to OD600 ~0.3 and induced with L-arabinose at a final concentration of 0.3%. The cells were incubated at 37°C for 1 hour to express the Red/ET recombination proteins. The cells were spun down and made electro-competent by washing the pellet twice with chilled ddH2O and resuspending it in 20-30 µL of water. 1-2 µL (~500 ng) of the kan-ble cassette was electroporated into the competent cells. The cells were incubated for 3 hours at 37°C and plated on zeocin (20 µg/mL) plates. The recombinants obtained on zeocin plates were confirmed by PCR amplification and DNA sequencing. The three new strains were named BH400 (from pUP31), BH300 (from pUP41) and BH500 (from pUP44). BH214 is BH158 containing the DE3 prophage, which carries the T7 RNA polymerase gene under the control of the lac promoter, and has been described before [55]. For protein expression, E. coli B strain BL21DE3 or BL21DE3 codon+RIL (Stratagene) was used.

Table 2.1 Oligonucleotides used in genetic constructions and assays

Name of the primer     Sequence
Recombinant Forward    5´-GTT GAT ACA TGG GGA GGC AGC CCG TTC AAT GCT GCC AGC CGC ATT GTC GTC TGG ATA ATG TTT TTT GCG CCG
Recombinant Reverse    5´-GCA TTG GAA TGT TAA CGC CTG CAA TGA CTT CAT AAT GCT CTT TGT CGG AAA ACG GGA AGA CAC ACT CAT G
UGI-F                  5´-CCC GAA TTC TAG GAG TAC GAT AAT GAC AAA TTT ATC TGA CA
UGI-R                  5´-GGG GAA TTC TTA TAA CAT TTT AAT TTT ATT TTC TCC ATT AC
A3A-F                  5´-GGG GAC AAG CTT ATG GAA GCC AGC CCA G
A3A-R                  5´-GGG GGG AAT TCT CAG TTT CCC TGA TTC
AID-E58A-F             5´-CGG CTG CCA CGT AGC GCT GCT CTT CCT C
AID-E58A-R             5´-GAG GAA GAG CAG CGC TAC GTG GCA GCC G
A3A-E72A-F             5´-GGC CGC CAT GCG GCC CTG CGC TTC TTG
A3A-E72A-R             5´-CAA GAA GCG CAG GGC CGC ATG GCG GCC
pET-A3A-F              5´-GGA GGA ATT CAT GGA AGC CAG CCC AG
pET-A3A-R              5´-GGG AGA GGC TCG AGG TTT CCC TGA TTC TG
Kan-bleo-F             5´-CGA CAC CAC TAA AGG CGT GCT
Kan-bleo-R             5´-TGG AGC CGC TTT TGG TGC T
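The forward and reverse mutagenic primers in Table 2.1 (AID-E58A and A3A-E72A) appear to be exact reverse complements of each other, which is the usual arrangement for whole-plasmid PCR mutagenesis. A quick way to confirm this for primer sequences copied from the table is sketched below; the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(primer: str) -> str:
    """Remove the spaces used for readability in the table and uppercase."""
    return primer.replace(" ", "").upper()


def reverse_complement(primer: str) -> str:
    """Reverse complement of a DNA primer written 5' to 3'."""
    return clean(primer).translate(COMPLEMENT)[::-1]


def is_complementary_pair(forward: str, reverse: str) -> bool:
    """True if the reverse primer is the reverse complement of the forward."""
    return reverse_complement(forward) == clean(reverse)


# Sequences copied from Table 2.1.
AID_E58A_F = "CGG CTG CCA CGT AGC GCT GCT CTT CCT C"
AID_E58A_R = "GAG GAA GAG CAG CGC TAC GTG GCA GCC G"
A3A_E72A_F = "GGC CGC CAT GCG GCC CTG CGC TTC TTG"
A3A_E72A_R = "CAA GAA GCG CAG GGC CGC ATG GCG GCC"

if __name__ == "__main__":
    print(is_complementary_pair(AID_E58A_F, AID_E58A_R))
    print(is_complementary_pair(A3A_E72A_F, A3A_E72A_R))
```

The same helpers can be used to check primer lengths after removing the spacing, for example `len(clean(AID_E58A_F))`.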

2.2 Plasmids

The pMB1-based plasmids carrying genes for M.HpaII [pM.HpaII, [83]], Dcm [pDCM21, [129]], and M.MspI [pQ8, [130]] have been described before. The plasmid pDCM22 contains the dcm+ and vsr+ genes and has been described [129]. Human AID, APOBEC3A (A3A), and APOBEC3G (A3G) genes were cloned into the p15A-based plasmid pSU24, creating pSUAID, pSUA3A and pSUA3G, respectively (Figure 2.3). The primers used for the cloning are listed in Table 2.1. The clone for A3A cDNA was kindly provided by Reuben Harris (University of Minnesota) and A3G cDNA was obtained from ATCC (Manassas, VA). The gene for UGI was amplified from a plasmid kindly provided by Umesh Varshney (Indian Institute of Science, Bangalore, India) and inserted at an EcoRI site in pSUAID to create pSUAID-UGI. Catalytic mutants of AID (AID-E58A) and APOBEC3A (A3A-E72A), as well as the hybrid genes A3G-AIDR1, A3G-AIDR1R2 and pSUA3GAIDR2, were constructed using a whole-plasmid PCR mutagenesis strategy [131] (Table 2.1). For the purification of the APOBEC3G carboxy terminal domain (A3G-CTD) or the A3G-AID hybrids (Figure 2.4), the genes were cloned into the pGEX-6P-2 vector with a GST tag. Both A3A and the catalytic mutant A3A-E72A were amplified (Table 2.1) and cloned into pET28a(+) as EcoRI-XhoI fragments for purification. The hybrid genes A3G-AIDR2 and AID-A3A were synthesized by DNA 2.0 (Menlo Park, CA); A3G-AIDR2 was cloned into pGEX-6P-2 and AID-A3AR2 was cloned into pSU24.


Figure 2.3 Salient features of the plasmid maps. The promoter (arrow), inserted gene/s, (light), tags (black, if present), origin of replication (mid gray) and the antibiotic resistance (dark gray) have been shown from left to right respectively.


Figure 2.4 Schematic of hybrid proteins combining segments of AID (green), A3G-CTD (red), or A3A (violet). The swapped regions are illustrated as boxes.

2.3 Kanamycin-resistance Reversion Assay

The reversion assay has been described previously [81, 83]. To quantify 5mC to T deaminations, AID, AID-E58A, AID-A3AR2, A3A, A3A-E72A or A3G and one of the MTase genes (M.HpaII, Dcm or M.MspI) were co-expressed from compatible plasmids, and kanamycin-resistant revertants (50 µg/mL, phenotype KanR) were scored. The KanR revertant frequency is given by the ratio (number of KanR revertants)/(total number of viable cells). Twelve independent cultures were grown per condition and the median value was calculated, with the Mann-Whitney test performed using GraphPad Prism 5 software. When the data were analyzed to quantify C to U deaminations, either the UGI gene was expressed from the same plasmid as AID or an ung strain was used as the host. To study repair of T●G mispairs by VSP repair, the plasmid pDCM21 or pDCM22 was introduced into BH400 and the KanR revertant frequencies were determined.
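The revertant frequency defined above, and the median over independent cultures, can be computed as in the following sketch. The colony counts are invented for illustration, and the hand-written `mann_whitney_u` returns only the U statistic (ties counted as one half); it is not a substitute for the p-values obtained from GraphPad Prism in this work.

```python
from statistics import median


def revertant_frequency(kanr_revertants: float, viable_cells: float) -> float:
    """Number of KanR revertants divided by total number of viable cells."""
    if viable_cells <= 0:
        raise ValueError("viable cell count must be positive")
    return kanr_revertants / viable_cells


def median_frequency(cultures) -> float:
    """Median revertant frequency over (revertants, viable cells) pairs."""
    return median(revertant_frequency(r, v) for r, v in cultures)


def mann_whitney_u(sample_x, sample_y) -> float:
    """U statistic for sample_x: pairs where x > y, ties counted as 0.5."""
    u = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


if __name__ == "__main__":
    # Hypothetical counts for three cultures (the assay used twelve).
    cultures = [(10, 1e9), (20, 1e9), (30, 1e9)]
    print(median_frequency(cultures))
    print(mann_whitney_u([3, 4, 5], [1, 2]))
```

Using the median rather than the mean keeps a single "jackpot" culture, in which a revertant arose early, from dominating the result.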

2.4 Amplification of the kan alleles and sequence analysis

A single colony from the center of a kanamycin-containing plate was selected based solely on its location and not on its size. Cells from 12 independent plates per condition were each dispersed in 30 µL of sterile water, and 2 µL of this mixture was used to perform PCR. A part of the kan-ble cassette covering the targeted proline codon that reverts to leucine or serine was PCR amplified using the two primers kan-ble-F and kan-ble-R (Table 2.1), and the products were purified using a PCR purification kit (Epoch Life Science). The purified PCR products were sequenced using kan-ble-F as the primer. The sequences of the revertants were compared to the wild type sequences of the pUP31, pUP41 and pUP44 plasmids using MacVector software (MacVector, Inc., Cary, NC) and the mutations were identified.

2.5 Uracil Quantification Assay

The genomic DNA was extracted and used for uracil quantification as described before [142]. Briefly, BH500 was co-transformed with M.MspI and either pSUAID or pSUAID-UGI. The transformants were grown under the same conditions used for the KanR reversion assay. Purified genomic DNA was incubated with methoxyamine (100 mM) for 90 minutes at 37 °C to block the pre-existing abasic sites (Figure 2.5). The DNA was ethanol precipitated and treated simultaneously with E. coli UDG (1 unit/0.1 µg of DNA, New England Biolabs) and aldehyde-reactive probe (2 mM, Dojindo Laboratories) for another 90 minutes at 37 °C. DNA was transferred onto a positively charged nylon membrane (Immobilon, Millipore) and cross-linked to the membrane using a UV Stratalinker 1800 (Beckman). The membrane was pre-equilibrated in Starting Block blocking buffer (Fisher) for 30 minutes and incubated with 5 × 10⁻⁴ mg/mL of streptavidin-Cy5 (GE Healthcare). The membrane was washed, dried and scanned using a phosphorimager (Typhoon 9210). The results were analyzed with ImageQuant software. The standard was generated using a 75-mer duplex containing a single uracil.

Figure 2.5 Scheme of the uracil quantification assay. DNA is first treated with methoxyamine (Mx) to block the pre-existing abasic sites. Uracil in DNA is next removed with E. coli uracil DNA glycosylase (UDG) and the resulting abasic sites are labeled with aldehyde-reactive probe (ARP). The DNA is then transferred to a positively charged nylon membrane and incubated with Cy5-streptavidin. The membrane is scanned on a phosphorimager and the Cy5 fluorescence is determined. Adapted from a figure in reference [142].
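Converting Cy5 fluorescence into an amount of uracil relies on the standard generated with the uracil-containing duplex. One common way to do this, assumed here rather than taken from the text, is a linear least-squares fit of signal against the known amount of standard, followed by inversion of the fitted line. All numbers and names in the sketch are hypothetical.

```python
def fit_line(amounts, signals):
    """Least-squares slope and intercept for signal = slope * amount + b."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate the uracil amount in a sample."""
    return (signal - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards: uracil amount (arbitrary units) vs fluorescence.
    standard_amounts = [0, 1, 2, 3]
    standard_signals = [5, 15, 25, 35]
    slope, intercept = fit_line(standard_amounts, standard_signals)
    print(amount_from_signal(20, slope, intercept))
```

Estimates are only meaningful for sample signals that fall within the range covered by the standards.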

2.6 Purification of A3A and its mutant

The human A3A gene and the catalytic mutant A3A-E72A were cloned into the EcoRI/XhoI sites of the pET28a(+) vector to construct the A3A-His tag fusion gene and transformed into BL21DE3. The cell culture was grown at 37 °C until it reached mid-log phase. Transcription of the gene was induced with IPTG at a final concentration of 0.2 mM. The cells were grown for another 5 hours at 20 °C and harvested by centrifugation. The cell pellet was resuspended in 25 mL of 1X Tris buffer [20 mM Tris-HCl (pH 7.5), 50 mM NaCl] along with half a tablet of Complete EDTA-free protease inhibitor (Roche Diagnostics, Indianapolis, IN) and lysozyme (5 µg/mL). The cells were broken using a French Pressure Cell Press (Thermo Spectronic) and the cell-free lysate was cleared by centrifugation. The lysate was passed over a Ni-NTA column (Novagen, Madison) to bind the His-tagged proteins. The bound proteins were washed with ~5 column volumes each of Tris buffer (20 mM Tris-HCl, 50 mM NaCl) and of Tris buffer containing 5 mM, 10 mM and 40 mM imidazole (Sigma-Aldrich), in that order. Finally, the bound A3A (or A3A-E72A) protein was eluted with 250 mM imidazole. Proteins from different fractions were separated on a 12% SDS-polyacrylamide gel. The final eluate containing the protein of interest was dialyzed using a Slide-A-Lyzer dialysis cassette (Thermo Scientific, Rockford, IL). Proteins were concentrated using Amicon Ultra centrifugal devices (Millipore, Billerica, MA). The concentrated proteins were equilibrated in the storage buffer (20 mM Tris-HCl, 50 mM NaCl, 1 mM EDTA, 1 mM DTT, 10% glycerol).

2.7 Biochemical assay for C and 5mC deamination

Cytosine deamination by A3A was studied using the oligomer A3A-C (Table 2.2). Six picomoles of oligomer were incubated at 37 °C with ~140 ng of A3A enzyme in a 10 µL volume in deamination reaction buffer [40 mM Tris-HCl (pH 7.5), 5 mM EDTA, 1 mM DTT, 40 mM NaCl]. For the time-course studies, the reaction mixtures were scaled up to six times the volume of a typical reaction. At the indicated time points, 10 µL aliquots were removed and the reaction was stopped by adding 1,10-phenanthroline (Sigma-Aldrich) to 5 mM. Two units of E. coli UDG (New England Biolabs) were added to the reaction and incubation was continued at 37 °C for one hour. The reactions were stopped by adding NaOH to 0.1 M and heating to 95 °C for 7 minutes. The products were separated on a 15% sequencing gel and the gel was scanned using a Typhoon 9210. ImageJ software was used to quantify the intensities of the substrates and the deaminated products.

5mC deamination by A3A was studied using the A3A-5mC oligo (Table 2.2). Two picomoles of the 5mC oligo were incubated with 140 ng of purified A3A protein (or A3A-E72A) at 37 °C for 1 hour in a 10 µL volume in the deamination reaction buffer. The complementary G-containing oligo was then added at a three-fold molar excess to create a T●G mismatch. The mixture was heated to 95 °C for 5 min, cooled slowly to room temperature over a period of 1 hour and placed on ice for 5 minutes to promote duplex formation. The duplex was incubated with 1.5 units of thermostable thymine DNA glycosylase (Trevigen) for 1 hour at 47 °C in a thermocycler, and the deaminated products were processed and analyzed as for C deamination.
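The quantification step described above can be sketched as a simple ratio of band intensities. This is a minimal illustration and not the analysis script used in this work: the function name is hypothetical, the intensities are invented placeholder values, and it assumes that background-corrected substrate and product band intensities have already been exported from ImageJ.

```python
def fraction_deaminated(substrate_intensity, product_intensity):
    """Fraction of oligo converted to cleaved product.

    Assumes both values are background-corrected band intensities
    from the same gel lane (arbitrary units).
    """
    total = substrate_intensity + product_intensity
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return product_intensity / total


# Hypothetical intensities for one lane; not data from this study.
lane = {"substrate": 2500.0, "product": 7500.0}
print(f"{fraction_deaminated(lane['substrate'], lane['product']):.0%}")
```

Expressing the product as a fraction of the total signal in each lane makes the value independent of lane-to-lane differences in loading.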

Table 2.2 DNA oligomers used for in vitro deamination assay.

Name                     Sequence
A3A-U                    5´- ATT ATT ATT ATT ATU GAT TTA TTT ATT TAT TTA TTT ATT T -3´ (6-FAM)
A3A-C                    5´- ATT ATT ATT ATT ATC GAT TTA TTT ATT TAT TTA TTT ATT T -3´ (6-FAM)
A3A-T                    5´- ATT ATT ATT ATT ATT GAT TTA TTT ATT TAT TTA TTT ATT T -3´ (6-FAM)
A3A-5mC                  5´- ATT ATT ATT ATT ATO GAT TTA TTT ATT TAT TTA TTT ATT T -3´ (6-FAM)
A3A reverse complement   5´- AAA TAA ATA AAT AAA TAA ATA AAT CGA TAA TAA TAA TAA T -3´

O = 5mC

2.8 Purification of A3G-CTD and A3G-AID hybrids

The plasmids encoding the hybrid proteins with the GST tag were introduced into the bacterial strain BL21(DE3). The transformants were grown at 37 °C to OD600 ~0.5 in the presence of the antibiotic carbenicillin (50 µg/mL). Transcription of the wild type A3G or hybrid genes was induced by adding IPTG to a final concentration of 200 µM. The cells were grown at 24 °C for 4 more hours and harvested by centrifugation. The cell pellet was resuspended in 30 mL of 1X Tris buffer (150 mM NaCl and 20 mM Tris-HCl, pH 7.5) containing one tablet of the EDTA-free protease inhibitor cocktail (Roche Diagnostics, Indianapolis, IN) and lysozyme (1 µg/mL). The cells were sonicated and the suspension was cleared by centrifugation. Glutathione-sepharose beads were added to the cleared lysate and mixed on a rocker for 30 minutes at 4 °C. The mixture was then loaded onto a column so that the glutathione-sepharose beads packed together with the bound GST-tagged proteins. The bound proteins were washed with ~20 column volumes of 1X Tris buffer. Next, the bound proteins were eluted with reduced glutathione (Sigma-Aldrich) in 50 mM Tris-HCl, pH 7.5. Different fractions of the eluates were run on a 12% SDS-PAGE gel to identify the fractions with the wild type or hybrid proteins. The fractions containing the proteins were pooled and dialyzed using Slide-A-Lyzer dialysis cassettes (Thermo Scientific, Rockford, IL). The proteins were concentrated using Amicon Ultra centrifugal filter devices according to the manufacturer’s instructions (Millipore, Billerica, MA). These proteins were further purified by ion-exchange chromatography on a MonoQ column (GE Healthcare). The protein was eluted using a 0-1 M gradient of NaCl and collected in 1 mL fractions. The appropriate fractions of the hybrid proteins, identified from the UV absorption profile, were pooled, concentrated and equilibrated with the storage buffer (25 mM Tris, pH 7.5, 1 mM EDTA, 1 mM DTT and 10% glycerol).

2.9 In vitro deamination activity assay with A3G-CTD and its hybrids

A 17-mer oligomer with overlapping WRC and CCC sequences (Table 2.3) was used as the substrate to study the sequence specificities of the A3G-AID hybrid proteins. One picomole of the oligomer was labeled at the 5´ end with 33P using T4-PNK (New England Biolabs). The radio-labeled substrate oligo was incubated with 2 µg of the protein at 37 °C in a 10 µL volume in the deamination reaction buffer (25 mM Tris-HCl, pH 7.5, 50 mM NaCl, 1 mM DTT, 1 mM EDTA). For the time-course studies, the reaction mixtures were scaled up six fold. At various time points (0, 2, 8, 20 and 60 minutes), 10 µL aliquots were removed from the master tube and the reactions were stopped by adding 1,10-phenanthroline (Sigma-Aldrich) to 5 mM. One unit of E. coli UDG (New England Biolabs) was added to the reactions and the incubation was continued at 37 °C for 45 min. The reactions were terminated by addition of NaOH to 0.1 M followed by heating to 95 °C for 7 min. The products were separated by 20% PAGE. The gel was scanned using the Typhoon 9210 scanner. ImageJ software was used to quantify the intensities of the gel bands.

Table 2.3 DNA oligomers used for deamination assay.

Name      Sequence
CCC-17    5´- ATTATTACCCATTTATT
UCC-17    5´- ATTATTAUCCATTTATT
CUC-17    5´- ATTATTACUCATTTATT
CCU-17    5´- ATTATTACCUATTTATT
WRC-17    5´- ATTATTATACATTTATT
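The overlap between the WRC and CCC motifs in the CCC-17 substrate can be made explicit with a short sketch. The helper names below are hypothetical. The fragment-length rule is an assumption of this illustration: cleavage at the uracil is taken to leave a 5´-labeled fragment containing only the nucleotides 5´ of the deaminated base, and end-group chemistry is ignored.

```python
import re

# IUPAC codes used by the motifs in this section (W = A or T, R = A or G).
IUPAC = {"W": "[AT]", "R": "[AG]"}


def motif_target_positions(seq, motif):
    """1-based positions of the last base of every (overlapping) motif match."""
    pattern = "".join(IUPAC.get(base, base) for base in motif)
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() + len(motif) for m in re.finditer(f"(?=({pattern}))", seq)]


def cleaved_fragment_length(position):
    """Nucleotides left on the 5'-labeled fragment when the base at
    `position` (1-based) is converted to uracil and the strand is cleaved there."""
    return position - 1


ccc17 = "ATTATTACCCATTTATT"  # CCC-17 from Table 2.3
print(motif_target_positions(ccc17, "WRC"))  # [8]
print(motif_target_positions(ccc17, "CCC"))  # [10]
print(cleaved_fragment_length(8), cleaved_fragment_length(10))  # 7 9
```

Under these assumptions, deamination of the first cytosine (the WRC target, position 8) and of the third cytosine (the last C of CCC, position 10) give labeled fragments of different lengths, which is what allows the two activities to be distinguished on a gel.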


CHAPTER 3 RESULTS

3.1 Reexamination of the ability of AID to deaminate 5-methyl cytosine

3.1.1 Validation of the genetic system for 5mC to T deamination

Our lab has previously used two defective alleles of the kanamycin-resistance gene (kan) that can be methylated in E. coli by different methyltransferases (MTases) to quantify 5mC to thymine conversions due to hydrolytic deamination [83]. In this plasmid-based system the kan- alleles revert to kan+ (phenotype KanR) through 5mC to T deamination, increasing the KanR frequency by at least 10 fold. I adapted and expanded this system to study 5mC to T conversions by the AID/APOBEC family of proteins [132]. This new genetic system should be a better tool to reexamine the proposed ability of AID to deaminate 5mC (Figure 1.16) because the protein is very difficult to purify. To reduce the number of plasmids maintained in the cells and to increase the sensitivity of the assay, I constructed three E. coli strains (BH300, BH400 and BH500), each with a different kan allele inserted into the manX gene of the E. coli chromosome (Material and Methods, Figure 2.1). Three different MTase genes on plasmids were introduced into the three strains to create five different sequence contexts for cytosine methylation (Figure 3.1). The sequences included CpG as well as the WRC context preferred by AID. In the other three sequence contexts, the methylated cytosine was in CpH, where H is any nucleotide other than guanine (i.e. CpA, CpC, or CpT). Before testing the potential of AID to deaminate 5mC, the genetic system was characterized using four different validation experiments [see 3.1.1-(A) to 3.1.1-(D) below] [132].


Figure 3.1 Sequence contexts of 5-methylcytosines in kan alleles. Three different kan genes are present in E. coli strains BH300, BH400 and BH500. The gene for the methyltransferase M.HpaII, M.MspI or Dcm was introduced in these strains to methylate one of the cytosines in a proline codon (underlined in the figure) to obtain five possible combinations. The methylated C is indicated with “M” above and five bases unique to each sequence context are indicated by a bracket below the sequence. 5mC to T conversion changes the proline codon to either leucine or serine and changes the cellular phenotype from kanamycin sensitive (KanS) to kanamycin resistant (KanR).
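The codon logic behind the reversion assay can be illustrated with the standard genetic code. This is a sketch for orientation only: it assumes the proline codon is read on the coding strand, and the helper name is hypothetical. Only the codon-table entries needed for the example are included.

```python
# Subset of the standard genetic code needed for this example.
CODON_TABLE = {
    "CCA": "Pro", "CCC": "Pro", "CCG": "Pro", "CCT": "Pro",
    "TCA": "Ser", "TCC": "Ser", "TCG": "Ser", "TCT": "Ser",
    "CTA": "Leu", "CTC": "Leu", "CTG": "Leu", "CTT": "Leu",
}


def c_to_t(codon, index):
    """Return the codon after a C-to-T change at the given 0-based position."""
    if codon[index] != "C":
        raise ValueError("position does not hold a cytosine")
    return codon[:index] + "T" + codon[index + 1:]


for third_base in "ACGT":
    codon = "CC" + third_base
    first, second = c_to_t(codon, 0), c_to_t(codon, 1)
    print(f"{codon} ({CODON_TABLE[codon]}): "
          f"first C -> {first} ({CODON_TABLE[first]}), "
          f"second C -> {second} ({CODON_TABLE[second]})")
```

For any proline codon (CCN), a C to T change at the first position gives serine (TCN) and a change at the second position gives leucine (CTN), so which amino acid appears in a revertant reports which cytosine was methylated and deaminated.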


3.1.1-(A) Methylation protection of genomic DNA against restriction enzymes

First, the genetic system was validated by testing whether the chromosomal DNA in these strains was appropriately methylated. The genomic DNA from cells expressing M.HpaII, Dcm or M.MspI was digested with HpaII, EcoRII or MspI respectively and was found to be resistant to the endonucleases (Figure 3.2). This observation confirms that all three MTases are active in E. coli and are expressed in sufficient quantities to methylate the entire genome at their respective cognate sites.

Figure 3.2 Protection of genomic DNA against restriction enzymes by the MTases. Genomic DNA resolved on a 0.8% agarose gel.


3.1.1-(B) Effect of MTase on the KanR reversion frequency

The spontaneous KanR revertant frequencies in the presence and absence of the methyltransferases were compared to characterize the genetic system. In the presence of an MTase, the KanR reversion frequency went up ~100 fold in each case (Figure 3.3), and this result is consistent with previously reported values [81, 83]. There is a chemical as well as a biological reason for the large increase in KanR reversion frequency in the presence of the MTases. The chemical reason is that the rate of 5mC to thymine deamination is 4 times higher than that of cytosine to uracil deamination [85]. The biological reason is that E. coli does not possess DNA glycosylases like MBD4 or TDG that excise thymines from T●G mismatches, and the T●G mismatch specific endonuclease Vsr has been deleted from the strains used for this assay (see the genotype in Material and Methods). Consequently, this control experiment confirms that the MTase methylates the specific proline codon in the kan allele (Figure 3.1), which then reverts to either leucine or serine due to hydrolytic deamination.


[Figure 3.3 plot: y-axis, KanR revertant frequency (log scale, 10^-10 to 10^-5); x-axis, strains BH300, BH400 and BH500, each with the indicated methylated sequence or with no MTase.]

Figure 3.3 Effects of MTases on kanamycin reversion frequency. The KanR frequency with or without the MTase is shown. The E. coli strain used (BH300, BH400 or BH500) and the respective methylated sequences are shown. The genes for MTases M.HpaII, M.MspI or Dcm were introduced in the cells to methylate the DNA. The letter “M” within the sequence represents 5-methylcytosine and the horizontal line within the data points is the mean value.
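The fold increases quoted for the KanR assay can be reproduced from colony counts with a few lines of arithmetic. This sketch assumes that the revertant frequency of a culture is the number of KanR colonies divided by the number of viable cells plated, which is the usual definition but is not spelled out in this section. All names and counts below are hypothetical.

```python
from statistics import mean, median


def revertant_frequency(kan_r_colonies, viable_cells):
    """KanR revertants per viable cell for one independent culture."""
    return kan_r_colonies / viable_cells


def fold_change(test_freqs, control_freqs, summary=median):
    """Ratio of the summary statistic (median by default, or mean) of two groups."""
    return summary(test_freqs) / summary(control_freqs)


# Hypothetical counts for three independent cultures per condition.
with_mtase = [revertant_frequency(n, 1e9) for n in (90, 100, 110)]
without_mtase = [revertant_frequency(n, 1e10) for n in (9, 10, 11)]

print(f"median-based fold change: {fold_change(with_mtase, without_mtase):.0f}")
print(f"mean-based fold change:   {fold_change(with_mtase, without_mtase, mean):.0f}")
```

The summary statistic is left as a parameter because the figures in this chapter report the mean in some panels and the median in others.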

3.1.1-(C) Sequencing of the KanR revertants

To characterize the new genetic system, 12 independent KanR revertants from all three MTases were sequenced, and nearly all the KanR revertants had the methylated cytosine changed to thymine (Table 3.1). This validates the genetic system for proper reversion of the 5mC at the specific proline codon.

Table 3.1 Sequences of spontaneous revertants as a ratio to the number of revertants sequenced. M is 5-methyl cytosine. The proline codon in the kan alleles that changes to serine or leucine is underlined.

3.1.1-(D) Very Short Patch repair

All of the control experiments carried out to validate the new genetic system confirmed that the KanR reversion frequency is a reliable measure of 5mC to T mutations in the system. Finally, the very short patch (VSP) repair mechanism of E. coli was utilized to verify that a majority of the revertants originate through the conversion of a 5mC:G pair to a T●G mispair. This was done by introducing the vsr+ gene into the strain BH400 in addition to Dcm. Vsr is a T●G mismatch specific endonuclease that hydrolyzes the phosphodiester linkage immediately 5´ of the mispaired T (Figure 3.4), initiating VSP repair that replaces the T with a C [133].


Figure 3.4 Very Short Patch repair (VSP) pathway of E. coli. VSP repair corrects T●G mismatches created by the deamination of one of the 5mC in the Dcm sequence (CCWGG) context. The Vsr endonuclease hydrolyzes the phosphodiester linkage immediately upstream of the mispaired T, and DNA polymerase I (Pol I) and DNA ligase complete the reaction. M is 5-methylcytosine.

When the dcm+ gene alone was introduced in the BH400 strain, the KanR reversion frequency increased ~55 fold compared to the vector control. When both the dcm+ and vsr+ genes were introduced, the reversion frequency declined to only about 4 fold over the control (Figure 3.5). This result confirms that the 5mC to T mutations arise through methylation followed by deamination at the second C of the CCWGG context. Therefore, all the validating results together confirmed that the novel genetic system can be used to quantify 5mC to T deaminations.
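The two fold values above can be turned into a rough estimate of how much of the Dcm-dependent increase is removed by VSP repair. The calculation below is an illustration under a stated assumption: the vector control is taken as a fold value of 1, and the excess over that baseline is compared with and without vsr+. The function name is hypothetical.

```python
def fraction_suppressed(fold_without_repair, fold_with_repair):
    """Share of the increase over the control (fold = 1) that repair removes."""
    excess_without = fold_without_repair - 1.0
    excess_with = fold_with_repair - 1.0
    return (excess_without - excess_with) / excess_without


# Fold values quoted in the text: ~55 fold with Dcm alone, ~4 fold with Dcm and Vsr.
print(f"{fraction_suppressed(55.0, 4.0):.0%}")  # 94%
```

By this estimate, roughly 94% of the Dcm-dependent increase is sensitive to Vsr, in line with the conclusion that most revertants arise through a T●G intermediate.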


Figure 3.5 KanR revertant frequency with or without VSP repair. The KanR frequencies in the presence of only the MTase (Dcm) or both Dcm and Vsr in the strain BH400 (sequence context CCWGG) are compared to the vector control. The mean value is shown as a horizontal line within the data points.

3.1.2 Human AID: an efficient cytosine deaminase, not a 5mC deaminase

When the human AID gene was introduced into BH500 without a methyltransferase (MTase), the reversion frequency was ~10^-6 (Figure 3.6). As BH500 is proficient in repairing U●G mismatches (ung+), the bulk of the uracil generated by cytosine deamination can be removed from the genome. To quantify the full extent of cytosine deamination due to AID, the UDG inhibitor UGI was co-expressed. Expression of both AID and UGI increased the KanR reversion frequency ~200 fold, confirming efficient cytosine deamination by AID (Figure 3.6) [132].


Figure 3.6 Cytosine to uracil deamination by AID. KanR revertant frequencies in the strain BH500 due to AID alone or AID and UGI are compared.

In contrast, when M.MspI was introduced along with AID in BH500 to methylate the proline codon of the kan allele, only a modest increase in 5mC deamination was detected in the KanR assay. Here, the reversion frequency due to AID was only 1.9 fold higher than that of the vector and 1.5 fold higher than that of the catalytic mutant of AID (Figure 3.7). This confirms that although AID is an efficient cytosine deaminase, it is not a 5mC deaminase in the same assay [132].

Figure 3.7 5-methyl cytosine to T deamination by AID. KanR revertant frequencies in BH500 expressing AID, the AID-E58A mutant or vector alone are shown. M.MspI is present in all three conditions but is not shown for clarity. The median value is shown as a horizontal line within the data points.

Similarly, AID was inefficient in deaminating 5mC in four other sequence contexts including CpG (Figure 3.8).

Figure 3.8 KanR revertants due to AID-promoted 5mC to T deamination. The respective methyltransferases are present in all conditions but are not shown for clarity. Median values are shown by a horizontal line.

I wanted to verify that in cells where AID deaminates 5mC poorly, it was still deaminating cytosine efficiently. This was achieved by quantifying the genomic uracil accumulated due to AID activity using a biochemical assay. The assay labels the locations of the uracil with a Cy5 tag and uses Cy5 fluorescence to quantify the uracil levels (see Material and Methods, Section 2.5). According to the results, AID increases the amount of uracil in DNA by 10 fold (Figure 3.9) due to cytosine deamination; however, the increase in 5mC to T deamination by AID is less than 2 fold (Figure 3.7) in the same cells [132].

[Figure 3.9 panels (A) and (B). Panel (B) plot: y-axis, uracils per million bp (0 to 1000); x-axis, AID and AID+UGI.]

Figure 3.9 Quantification of genomic uracils due to AID. (A) Uracil-containing oligo standard. (B) The amount of genomic uracils created by AID alone or AID and UGI is shown. The error bars indicate the standard deviation.
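One common way to convert Cy5 fluorescence into uracil content is a linear standard curve built from the uracil-containing oligo standard. The sketch below is a generic illustration of that idea and is not taken from the protocol in Section 2.5: the function names, the units (picomoles of uracil and picomoles of base pairs) and all numbers are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def uracils_per_million_bp(signal, slope, intercept, dna_pmol_bp):
    """Interpolate uracil amount from the standard curve and scale to 10^6 bp."""
    uracil_pmol = (signal - intercept) / slope
    return uracil_pmol / dna_pmol_bp * 1e6


# Hypothetical standards: pmol of uracil spotted vs. Cy5 signal (arbitrary units).
standard_pmol = [0.0, 1.0, 2.0, 4.0]
standard_signal = [50.0, 250.0, 450.0, 850.0]
slope, intercept = fit_line(standard_pmol, standard_signal)

# Hypothetical genomic sample: signal of 150 from DNA containing 1000 pmol of bp.
print(uracils_per_million_bp(150.0, slope, intercept, 1000.0))
```

The interpolation is only meaningful for sample signals that fall within the range spanned by the standards.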


3.1.3 Efficient 5mC deamination by other APOBEC proteins

Since AID is not an efficient 5mC deaminase, I next tested the ability of other APOBECs to deaminate 5mC. APOBEC3G (A3G) was tested first because it is well characterized and is a very important enzyme in retroviral restriction. As expected, full-length A3G expressed in an ung- E. coli strain (BH214) was very efficient in deaminating cytosines in its preferred substrate context (a run of C’s) compared to AID (Figure 3.10). However, when A3G was tested for 5mC deamination in the same sequence context, no increase in KanR reversion was detected (Figure 3.11).

Figure 3.10 Comparison of cytosine to uracil deamination by AID and A3G. KanR revertant frequencies in BH300 strain expressing AID, AID-E58A mutant or A3G are shown. The mean value is shown as a horizontal line within the data points.


Figure 3.11 Comparison of 5mC to T deamination by AID and A3G. KanR revertant frequencies in BH300 strain with vector alone, AID or A3G with M.HpaII are shown. The median is shown as a horizontal line within the data points.

Next, two APOBEC proteins that are localized to the nucleus (APOBEC3A and APOBEC3C) were tested. APOBEC3A is not only an efficient cytosine deaminase (Figure 3.12) but also deaminates 5mC very efficiently (Figures 3.13 and 3.14). In contrast, A3C showed some toxicity in E. coli (Figure 3.13). The ability of A3A to deaminate 5mC efficiently is dependent on its catalytic activity because the A3A-E72A catalytic mutant showed a background level of reversion frequency (Figure 3.15). I also sequenced the independent revertants obtained in experiments in which cells were expressing an MTase along with AID or A3A. According to the sequences of the revertants, an overwhelming majority of the mutations occurred at the methylated cytosines (Table 3.2). Taken together, these data confirm that AID is not an efficient 5mC deaminase whereas A3A deaminates 5mC very efficiently in the same genetic system [132].

Figure 3.12 Comparison of cytosine to uracil deamination by AID, A3A and A3C. KanR revertant frequencies in BH500 strain expressing AID, AID-E58A mutant, A3A or A3C are shown. The mean is shown as a horizontal line.


Figure 3.13 Comparison of 5-methyl cytosine to T deamination by AID, A3A and A3C. KanR revertant frequencies in BH500 strain expressing AID, the AID-E58A mutant, A3A or A3C are shown. The MTase is present in all the conditions but is not shown for clarity. The mean is shown as a horizontal line.

Figure 3.14 5-methyl cytosine deamination by A3A. KanR revertant frequencies in BH500 strain with vector alone, AID and A3A. The MTase is present in all the conditions but is not shown for clarity. The horizontal line within the data points is the median.


Figure 3.15 5-methyl cytosine deamination by A3A depends on catalysis. KanR revertant frequencies in BH500 strain with vector alone, A3A and the catalytic mutant E72A of A3A. The MTase is present in all the conditions but is not shown for clarity. The horizontal line within the data points is the median.

Table 3.2 Sequences of the revertants with AID/APOBEC. M is 5-methyl cytosine. The proline codon in the kan alleles that changes to serine or leucine is underlined.

E. coli strain   MTase     Enzyme expressed   Sequence of kan allele   Revertant sequence   (Revertants with M to T change) / (Number of revertants sequenced)
BH400            Dcm       AID                TACMAGG                  TACTAGG              10/12
BH400            Dcm       A3A                TACMAGG                  TACTAGG              12/12
BH500            M.MspI    AID                TAMCGG                   TATCGG               11/12
BH500            M.HpaII   A3A                TACMGG                   TACTGG               12/12

3.1.4 In vitro cytosine and 5-methyl cytosine deamination by A3A

To complement the genetic data on 5mC deamination by A3A, the activity of A3A was characterized in vitro using purified protein. A3A was cloned into an expression vector with a C-terminal His tag and partially purified over a Ni-affinity column (Figure 3.16). The purified A3A was active and was able to completely convert cytosine into uracil in a synthetic oligonucleotide (Figure 3.17). Based on the purity of the protein (Figure 3.16), approximately 2 picomoles of the enzyme reacted with 6 picomoles of the oligo within an hour. The purified A3A also deaminated 5mC efficiently. When an oligomer with a single 5mC was treated with the enzyme and processed as shown in the schematic (Figure 3.18), A3A converted ~78% of the 5mC substrate into product, while converting ~99% of C to U. As expected, the catalytic mutant A3A-E72A did not convert 5mC to T (Figure 3.18). When the time course of deamination by A3A was studied, a significant conversion of 5mC was detected even at the earliest time points (Figures 3.19 and 3.20). Due to technical difficulties there was some set-to-set variation in the data (Figure 3.20). However, the in vitro assays clearly confirmed that 5mC deamination by A3A is comparable to cytosine deamination [132]. Hence, both genetic and biochemical data confirm that A3A is an efficient deaminase of both C and 5mC.
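The enzyme-to-substrate ratio quoted above can be checked with a back-of-the-envelope conversion. The sketch below assumes a molecular weight of 28 kDa for A3A (the size indicated in Figure 3.16) and ignores the extra mass of the His tag; the function name is hypothetical and the result is only a rough estimate.

```python
def nanograms_to_picomoles(mass_ng, molecular_weight_da):
    """Convert a protein mass to picomoles (ng / (g/mol) = nmol; x1000 = pmol)."""
    return mass_ng / molecular_weight_da * 1000.0


total_protein_pmol = nanograms_to_picomoles(140.0, 28_000.0)  # ~5 pmol of total protein
enzyme_pmol = 2.0      # estimate quoted in the text, based on purity
substrate_pmol = 6.0   # oligo per reaction

print(f"total protein: {total_protein_pmol:.1f} pmol")
print(f"implied A3A fraction of the preparation: {enzyme_pmol / total_protein_pmol:.0%}")
print(f"substrate molecules converted per enzyme in one hour: {substrate_pmol / enzyme_pmol:.0f}")
```

Under these assumptions, 140 ng of a 28 kDa protein corresponds to about 5 pmol, so the quoted 2 pmol of enzyme implies that each A3A molecule turned over about three substrate molecules within the hour.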


Figure 3.16 SDS-PAGE gel of A3A (or the mutant) partially purified over a Ni-affinity column. Wild type A3A is in lane 2 whereas the A3A-E72A mutant is in lane 3. The protein of the size of APOBEC3A (28 kDa) is indicated by an arrow.


Figure 3.17 In vitro cytosine deamination by A3A. (A) Schematic of the major steps of the in vitro assay. Fluorescently labeled (6-FAM) single-stranded DNA oligomer with a single cytosine was treated with purified A3A and UDG. At the indicated time points, the reaction was stopped by adding 1,10-phenanthroline. The DNA strand was cleaved and electrophoresed on a 12% polyacrylamide gel. (B) The oligomer with a uracil serves as a control to compare the size of the deaminated and cleaved product. The incubation time of the oligo with the protein is indicated in minutes.


Figure 3.18 In vitro 5-methyl cytosine deamination by A3A. (A) Schematic of the major steps of the in vitro assay. Fluorescently labeled oligomer with a single cytosine or 5mC was treated with purified A3A or the E72A mutant. The 5mC-containing oligo was hybridized to its complement, creating a T●G mismatch, and treated with TDG; the strand was then cleaved and electrophoresed on a 12% polyacrylamide gel. (B) The oligomers with a uracil (U), cytosine (C) and thymine (T) serve as controls to compare the size of the deaminated and cleaved product as well as the efficiency of C versus 5mC deamination by A3A.

[Figure 3.19 panels (A) and (B); panel titles: C-deamination and 5mC-deamination.]

Figure 3.19 Kinetics of cytosine and 5mC deamination by A3A. (A) Single-stranded DNA oligomers with a single C or 5mC were treated with A3A and reactions were stopped at the indicated time points. The complementary strand was hybridized to the oligo and UDG or TDG was added to cleave the DNA strand. DNA duplexes containing U●G and T●G mismatches serve as controls for the efficiency of reactions with UDG and TDG respectively. (B) Quantification of data in part (A) above. The data have been normalized for UDG and TDG reaction efficiencies. The density of the signal at time zero has been subtracted from all data points.
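The two corrections named in the caption (subtraction of the time-zero signal and normalization for glycosylase efficiency) can be sketched as follows. The order of the two operations and the definition of the efficiency term are assumptions of this illustration, since the caption does not specify them; the function name and all values are hypothetical.

```python
def normalize_time_course(raw_fractions, control_efficiency):
    """Correct a cleavage time course for background and glycosylase efficiency.

    raw_fractions: product fraction at each time point, first entry is time zero.
    control_efficiency: fraction of the U.G (or T.G) control duplex that was
        cleaved by UDG (or TDG) under the same conditions.
    """
    if not 0.0 < control_efficiency <= 1.0:
        raise ValueError("control efficiency must be in (0, 1]")
    baseline = raw_fractions[0]
    return [(value - baseline) / control_efficiency for value in raw_fractions]


# Hypothetical 5mC time course with a TDG control that cleaved 90% of a T.G duplex.
raw = [0.02, 0.20, 0.47]
print([round(v, 2) for v in normalize_time_course(raw, 0.9)])
```

Normalizing each substrate to its own glycosylase control is what allows the C and 5mC curves to be compared, because UDG and TDG do not cleave their respective mismatches with the same efficiency.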


Figure 3.20 Kinetics of A3A on cytosine versus 5mC deamination. The reaction conditions are similar to those of Figure 3.19.


3.2 Domain swap to determine the DNA binding regions of A3G and AID

In this study we created segment swaps between the putative DNA binding regions predicted in solution structures of A3G and the respective conserved regions of AID (Figure 1.6). Although other APOBEC proteins have been less well studied, the substrate specificities of different enzymes have been examined. The variety of target sequences implies that the APOBEC family of proteins has evolved to target cytosines in different sequence contexts. Therefore, these proteins may be pliable enough to allow the DNA binding regions to be swapped between the two proteins. The sequence specificity determinants of the proteins were verified using AID-A3G hybrid proteins in both genetic and biochemical assays [45]. I studied the biochemical properties of the hybrid proteins to complement the genetic data obtained by two other graduate students in the lab. A more soluble variant of A3G-CTD called A3G-CTD-2K3A [42] was used for the biochemical studies.

3.2.1 Experimental design

A former graduate student, Dr. Michael Carpenter, created the hybrids between AID and A3G using a whole plasmid PCR mutagenesis strategy [45, 131]. The construction of hybrids between AID and A3G is shown in Figure 2.3. Once the activity of the hybrid proteins was verified in E. coli, they were tested using genetic assays. Dr. Michael Carpenter carried out the Lac+ papillation assay and another graduate student, Erandi Rajagurubandara, performed the rifampicin resistance (RifR) forward mutation assay to determine the sequence specificities of the hybrid proteins [45]. The broad-spectrum RifR assay [134] is well suited to detect the sequence specificity of the hybrid proteins. Mutations that occur in the rpoB gene confer resistance to the antibiotic rifampicin by changing the β-subunit of RNA polymerase. Sequencing of a portion (called cluster 2) using only two primer pairs covers a majority of the mutations in the rpoB gene.


3.2.2 Properties of AID hybrids containing A3G DNA binding regions

AID-A3GR1 (AID containing the A3G R1 DNA binding region) did not show an A3G-like sequence preference. This shows that region 1 of A3G alone is not sufficient to confer the sequence selectivity of A3G on AID. However, AID-A3GR2 was significantly more like A3G than AID in targeting the last C in a run of cytosines in DNA. Moreover, AID-A3GR1R2 showed an even stronger preference for CCC. Hence, when both R1 and R2 of A3G are introduced into AID, they cooperate in the hybrid protein to act like A3G on DNA at CCC sequences.

3.2.3 Properties of A3G hybrids containing AID DNA binding regions

The hybrid containing only R1 from AID in A3G (A3G-AIDR1) behaved similarly to wild type A3G and had a mutation spectrum similar to that of A3G. However, the mutation spectra of both A3G-AIDR2 (A3G containing the AID R2 binding region) and A3G-AIDR1R2 were considerably different from that of either A3G or AID. To understand this new behavior of the A3G-AID hybrids and also to complement the genetic data with biochemical data, I carried out in vitro deamination assays with the purified A3G-AID hybrid proteins (Figure 3.21).


Figure 3.21 SDS-PAGE (left) and western blot (right) of the affinity purified hybrid proteins. The right size of the proteins is indicated with two lines in both protein gel and the western blot. A3G-2K3A* - purified protein from glutathione affinity column followed by ion-exchange column.

3.2.4 Biochemical properties of A3G-AID hybrids

The single-stranded DNA substrate used in the assay contained overlapping WRC and CCC sites to detect deamination at both sites [Figure 3.22 (A)]. When wild type A3G was reacted with the oligonucleotide, it readily converted only the third cytosine in CCC to uracil but did not convert the first or the second cytosine at detectable levels [Figure 3.22 (B) and (C)]. The kinetics results confirm that A3G preferentially deaminates the third C over the first C [Figure 3.22 (C)] [45].


Figure 3.22 Sequence selectivity and kinetics of A3G in cytosine deamination. (A) Sequence of the oligonucleotide substrate. (B) Image of a gel scan showing the kinetics of cytosine deamination by A3G. The left-most lane contains the products of the three oligos in which uracil replaces one of the three cytosines in CCC, as controls. The incubation time is indicated in minutes for the remaining lanes. (C) The kinetics of cytosine deamination based on the gel scan in part (B).

A3G-AIDR1, like wild type A3G, showed a preference for the third cytosine in CCC over the first C [Figure 3.23 (A)]. In contrast to A3G-AIDR1, A3G-AIDR2 and A3G-AIDR1R2 showed drastic differences in substrate specificity [Figure 3.23 (B) and (C)] [45]. Both enzymes converted the C in WRC at significant levels. However, both enzymes also retained the ability to deaminate the last C in CCC and therefore kept the A3G-like specificity. Taken together, these results show that substitution of the DNA binding regions R1 and/or R2 of A3G with the corresponding regions of AID did not replace the sequence specificity of A3G with that of AID. Instead, it resulted in a broadening or relaxation of sequence preference to include WRC in addition to the CCC sequence [45].


Figure 3.23 Kinetics of cytosine deamination by A3G-AID hybrids. Quantification of products created by (A) A3G-AIDR1, (B) A3G-AIDR2 and (C) A3G-AIDR1R2.


3.2.5 DNA binding domain of A3A confers on AID the ability to deaminate 5mC

Based on the findings of the DNA binding domain (DBD) swap experiments, I hypothesized that the principal DBD of A3A may be able to accommodate 5mC. If this is true, replacement of the DBD of AID with that of A3A may allow AID to deaminate 5mC. To test this possibility, the hybrid protein was constructed by replacing the corresponding regions as shown (Figure 3.24 and Figure 2.4, Material and Methods). The hybrid protein was tested for 5mC deamination using the genetic assay. The chimeric protein AID-A3AR2 enhanced the KanR revertant frequency by approximately an order of magnitude compared to the vector control (Figure 3.25). This suggests that the putative DBD of A3A contains a portable 5mC deaminating motif [132].

Figure 3.24 Schematic to illustrate domain swap between AID and A3A. The sequence of the putative DNA binding domains (DBD) of AID and A3A are shown. The principal DNA binding domain of A3A was identified by aligning the sequence of the A3A gene with that of AID. AID-A3AR2 contains all the amino acids except its DBD which is replaced with eight amino acids DBD from A3A. The amino acid residue numbers are indicated.


Figure 3.25 Comparison of 5mC deamination by AID, A3A and AID-A3AR2. KanR revertant frequency due to the different proteins in BH500 strain is shown. The median is shown with a horizontal line within the data points.


CHAPTER 4 DISCUSSION

4.1 Reexamination of the ability of AID to deaminate 5-methyl cytosine

In this study I found that AID is not an efficient 5-methyl cytosine deaminase (Figures 3.7 and 3.8), although several recent publications have controversially proposed that AID acts on 5mC. Instead, APOBEC3A (A3A) showed very high 5mC deamination activity in the same assay (Figures 3.13, 3.14 and 3.15). The complementary biochemical studies confirmed that A3A deaminates 5mC efficiently in vitro as well (Figures 3.18 and 3.19). It should be pointed out that no enzyme from any organism has previously been shown to deaminate 5mC efficiently. However, both AID and A3G converted cytosines into uracils efficiently. This efficient cytosine deamination by AID (Figure 3.6) in the same assay in which 5mC to T conversion is very low (Figure 3.7) argues that the poor 5mC deamination activity of AID was not due to low-level expression of the protein or its instability in E. coli. The ability of AID to deaminate 5mC was tested in all the relevant sequence contexts, including CpG, where the majority of 5mCs are found in mammalian cells; CpH, which has been shown to carry 5mC during embryogenesis; and WRC, which is preferred by AID. In all the cases the increase in 5mC to T mutations is less than two fold compared to the vector, and therefore these results are inconsistent with the largely qualitative studies of Morgan et al. [54] and the models that are based on 5mC deamination as the initiating step of DNA demethylation [62-63].

It is possible to argue that AID is not an efficient 5mC deaminase because a key accessory factor or a protein modification is not present in E. coli. Although possible, this is unlikely for several reasons. First, a number of studies have shown that AID is an efficient cytosine deaminase in E. coli and the novel genetic system reproduces the same observation. Second, the only post-translational modification of human AID purified from insect cells is serine 38 phosphorylation [135], which is thought to affect its substrate specificity [136]. Even the AID protein purified from insect cells is 10 fold less efficient on 5mC than on cytosine [116-117]. Third, I have shown here that AID can be converted to a more active 5mC deaminase by replacing its DNA binding domain with the corresponding region from A3A (Figures 3.24 and 3.25). This implies that A3A is capable of accommodating 5mC in its active site whereas AID does not have that ability. In fact, a previous attempt in our laboratory to dock 2´-deoxycytidylate into the active site of A3G (Figure 4.1) using AutoDock Vina (http://autodock.scripps.edu/) found that tyrosine 315 is the amino acid that blocks the methyl group from being accommodated in the catalytic pocket (M. Carpenter and A.S. Bhagwat, unpublished data).

Figure 4.1 Modeling of 2´-deoxycytidylate with APOBEC3G-CTD (PDB : 3E1U)

Once the docking project suggested that Y315 is the principal residue that blocks the accommodation of 5mC in the catalytic pocket of A3G and, by analogy, of AID, I carried out a rational mutagenesis study to substitute Y315 of A3G with other amino acids. The primary goal was to introduce a less bulky side chain so that the new catalytic pocket would leave room for the methyl group of 5mC. First, the Y315A mutant was constructed by site-directed mutagenesis and tested for 5mC deamination. This mutant was catalytically inactive on both 5mC and cytosine. As Y315A could be a drastic amino acid change, the other amino acid substitutions were constructed using the oligo 5´-ACAAACACGTGAGCCTGTGCATCAAGACTGCACGCATCNNNGATGATTCAAGGAAGAGCTCAG as the top strand, in which NNN codes for the other amino acids at position 315. The mutants in which Y315 was replaced by other amino acids also showed reduced cytosine deamination activity, as shown in Figure 4.2.

[Figure 4.2 plot. Y-axis: KanR revertant frequency (log scale, 1.0×10^-8 to 1.0×10^-5). X-axis: A3G-CTD, the Y315 substitution mutants (F, G, L, A, I, K, R, Q, S) and vector.]

Figure 4.2 KanR revertant frequency of A3G-CTD and its Y315 substituent mutants.
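The degenerate NNN codon in the mutagenic oligo described above can stand for any of the 64 triplets. The sketch below builds the standard genetic code, lists the codons for a chosen amino acid, and places one of them at the NNN position of the top-strand oligo. It assumes that the space inside the printed oligo is a line-wrap artifact. The helper names are hypothetical, and the codons actually recovered for each mutant are not stated in the text.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Top-strand oligo from the text (assumption: the internal space was a line wrap).
OLIGO = "ACAAACACGTGAGCCTGTGCATCAAGACTGCACGCATCNNNGATGATTCAAGGAAGAGCTCAG"


def codons_for(amino_acid: str) -> list[str]:
    """All codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def oligo_with_codon(codon: str) -> str:
    """Replace the degenerate NNN position with a specific codon."""
    if codon not in CODON_TABLE:
        raise ValueError(f"not a DNA codon: {codon}")
    return OLIGO.replace("NNN", codon, 1)


# Codon options for the wild-type tyrosine and for the substitutions in Figure 4.2.
for aa in "YFGLAIKRQS":
    print(aa, codons_for(aa))
```

Because NNN is fully degenerate, a single oligo yields a pool of mutants that must be sequenced to identify the amino acid at position 315 in each clone.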

Homology modeling of AID and other APOBECs suggests that this tyrosine is similarly positioned in AID and A3G, whereas it is further away in A3A [137]. Therefore, the conserved tyrosine in the catalytic pocket of AID and A3G may be the principal cause of the poor ability of these enzymes to deaminate 5mC. The conserved Y130 in A3A is farther away from the catalytic pocket, whereas the corresponding residue of A3G (Y315 of A3G-CTD) is located in the catalytic pocket (Figure 4.3).

Figure 4.3 Distinct DNA binding regions in A3A and A3G-CTD. The catalytic site pocket is highlighted in yellow. The conserved tyrosines in A3A (Y130) and in A3G-CTD (Y315) are circled in red for clarity. Adapted from a figure in reference [133].

In the novel genetic assay the introduction of MTases alone increased the KanR reversion frequency by about two orders of magnitude (Figure 3.3, Results section). Although this observation provided evidence for proper methylation of the proline codon, which is key to the KanR assay, I was concerned about this high level of background, as any enzyme

that deaminates 5mC should show an increase surpassing the threshold level of KanR reversion frequency due to MTase alone. This was the reason for testing two additional APOBEC3 proteins, A3A and A3C (both shown to be localized to the nucleus), in the genetic system. The expression of A3C was toxic to E. coli and did not give consistent results in the assay. However, A3A was very efficient in deaminating 5mC in all the sequence contexts examined and showed at least a 15-fold increase compared to the vector. This observation confirmed that the genetic assay can detect 5mC deamination above the background reversion frequency caused by MTase alone.

I confirmed biochemically that A3A deaminates 5mC by partially purifying the protein from E. coli and testing it in vitro for C and 5mC deamination. A3A deaminated both C and 5mC, with a slight preference for C over 5mC. It should be noted that the efficiency of the thymine DNA glycosylase (TDG) reaction is low and varied from reaction to reaction. The optimum temperature for the thermophilic TDG enzyme is 65 °C, but as the DNA duplex is unstable at that temperature, 47 °C was used. Consequently, we are underestimating the amount of 5mC deamination by A3A in the first step of the reaction. Regardless, these biochemical data complemented the genetic data showing that A3A is an efficient deaminase of 5mC.

The finding that A3A is an efficient 5mC deaminase does not necessarily mean that this protein plays a major role in DNA demethylation in early embryogenesis. Although A3A has been shown to localize to the nucleus [138], there are no reports on its expression in embryonic tissues. Therefore, the biological role of A3A may be to inhibit human viruses and retrotransposons [17, 26], and to restrict foreign DNA [29]. However, there is some evidence to support the activity of A3A on 5mC in genomic DNA [31]. In that study a majority of the CpG sites of c-myc and p53 showed either C to T or G to A mutations.
Because about 70-80% of cytosines in CpG sites are methylated on both strands in mammalian somatic cells [68], at least by inference this

should represent the activity of A3A in deaminating 5mC to T at CpG sites. However, more studies are required in a mammalian system to establish its true biological function based on 5mC deamination.

I have shown here that the AID-A3AR2 hybrid protein has the potential to deaminate 5mC efficiently. As this hybrid protein is still largely AID-like (190/198 amino acids), it would be interesting to repeat the experiments that have been done to show that AID plays a role in DNA demethylation [54, 62-64] to check whether the hybrid protein enhances the effects that the authors observed in those experiments.

A number of biochemical arguments can be raised against DNA demethylation initiated through 5mC deamination by AID, A3A or any other cytosine deaminase in early embryogenesis. First, cytosine is the preferred substrate over 5mC for all APOBEC deaminases known. Since cytosines are about 30-fold more abundant than 5mC in the mammalian genome, it is hard to see how the APOBECs could deaminate 5mCs without overwhelmingly deaminating cytosines. Second, all APOBECs are single-stranded DNA deaminases. Therefore, the paternal genome of the zygote would need a mechanism to present a large portion of the genome as single strands to these enzymes even before the first round of replication, and there is no evidence that this occurs. Third, replacement of all 5mCs in the genome with cytosines through base excision repair would require several million repair events per cell, which could lead to DNA double-strand breaks at CpG sites. Moreover, as mammalian base excision repair depends on polymerase β, which has an error rate of 1 in 10^4 [139], this would lead to an unacceptably high mutational load.

In summary, the results of the first project of my thesis work clearly showed that AID is not an efficient 5mC deaminase in five different sequence contexts including WRC. This strongly argues against the feasibility of the recently postulated role of AID in

DNA demethylation. Instead, A3A showed efficient 5mC deamination both in vivo and in vitro. Moreover, the potential of A3A to deaminate 5mC lies in a segment of the protein that can be switched to confer 5mC deamination activity upon AID.
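Two of the biochemical arguments raised above are quantitative and can be sketched as simple arithmetic. The block below assumes a proportional model in which substrates are deaminated according to abundance times relative efficiency, and it assumes one misincorporation opportunity per repair event. The value of three million repair events is a hypothetical stand-in for "several million", and the function names are illustrative.

```python
def fraction_at_5mc(c_to_5mc_abundance: float, preference_for_c: float) -> float:
    """Fraction of deamination events expected at 5mC under a proportional model.

    c_to_5mc_abundance: how many cytosines exist per 5mC (about 30 in the text).
    preference_for_c:   relative efficiency on C versus 5mC (1.0 = no preference).
    """
    weight_c = c_to_5mc_abundance * preference_for_c
    return 1.0 / (1.0 + weight_c)


def expected_repair_errors(repair_events: float, error_rate: float) -> float:
    """Expected misincorporations, assuming one error opportunity per event."""
    return repair_events * error_rate


# 30-fold excess of C over 5mC, with no preference and with a 10-fold preference for C.
print(f"no preference:      {fraction_at_5mc(30, 1.0):.3f} of events at 5mC")
print(f"10-fold preference: {fraction_at_5mc(30, 10.0):.4f} of events at 5mC")

# Hypothetical 3 million repair events with a polymerase error rate of 1 in 10^4.
print(f"expected errors: {expected_repair_errors(3e6, 1e-4):.0f} per cell")
```

Even with no preference for cytosine, this model places only about 3% of deamination events at 5mC, which is the difficulty the first argument points to.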


4.2 Domain swap to determine the DNA binding regions of A3G and AID

The hybrid proteins were constructed by swapping the putative DNA binding domains of A3G and AID to determine their actual DNA binding regions. The results indicate that a segment of about 10-12 amino acids within A3G (region 2, Figure 1.6) is the principal DNA binding region. When this region was introduced into the AID framework, the sequence specificity of the hybrid protein (AID-A3GR2) changed from AID-like (WRC) to A3G-like (CCC). However, when the opposite swap was made (introducing region 2 of AID into A3G), the new hybrid protein A3G-AIDR2 did not change its sequence specificity entirely from A3G (CCC) to AID (WRC). Instead, according to the genetic data, A3G-AIDR2 began to act on additional sequence contexts. I studied this broadening or relaxation of the sequence specificity using biochemical assays. The in vitro data suggest that the hybrid protein still retains the ability to act on the CCC sequence context (Figure 3.23). Furthermore, the hybrid protein displayed an additional ability to act on WRC. Hence, unlike the DNA binding region of A3G, that of AID is not sufficient to switch the sequence specificity of A3G from CCC to WRC.

The introduction of DNA binding region 1 alone, either from A3G into AID or from AID into A3G, did not change the specificity of the hybrid proteins. Instead it enhanced the sequence specificity conferred by region 2. This observation from the genetic data is compatible with the biochemistry shown in Figure 3.23 (A), where A3G-AIDR1 acted on the third C in CCC rather than on WRC. However, the hybrid protein carrying R1 in addition to R2 (A3G-AIDR1R2) deaminated WRC more rapidly than the hybrid carrying R2 alone (A3G-AIDR2) [Figure 3.23 (A) and (B)]. Another noticeable fact is that the CCC to CCU deamination is faster than the WRC to WRU deamination [Figure 3.23 (A) and (B)].
The eventual decline of the product corresponding to CCU is explained by the fact that the oligo was labeled at the 5´ end and the WRC is closer to the 5´ end than the CCC. As a result, once the uracil in WRU is converted to a nick to resolve in the gel, the downstream

sequence is lost. The apparent targeting of the last C in CCC earlier than WRC is hard to explain. A number of studies have shown that full-length A3G binds single-stranded DNA and slides in both directions. According to those studies, the enzyme targets cytosines nearer the 5´ end more frequently [140-141], whereas the hybrid proteins in this work show the opposite preference. This difference could be due to several reasons. Whereas other studies used full-length A3G, this work is based only on the carboxy terminal domain of the protein. Moreover, this work used a variant (2K3A) of A3G-CTD, and 2K3A may differ from the original protein in its processivity. The DNA oligomer used in this assay contained overlapping WRC and CCC sequences, and this too is a possible reason.

In summary, the results of the second project showed that DNA binding region 2 of A3G is the principal sequence-specificity determinant of the protein. Region 1 enhances the sequence specificity determined by region 2. Regions 1 and 2 of AID together were not sufficient to confer its sequence specificity upon A3G. Replacement of region 2 of A3G with the corresponding region from AID relaxes the specificity of A3G.
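The loss of the CCU product once the upstream WRC is also deaminated follows from the 5´ end labeling, and a small sketch makes the logic concrete. The oligo below is hypothetical and is not the substrate used in this work. The fragment length is a simplification that counts the nucleotides 5´ of the cleaved uracil, and the function names are illustrative.

```python
import re


def motif_target_positions(seq: str) -> dict[str, list[int]]:
    """1-based positions of the target C in each WRC and the 3' C of each CCC."""
    wrc = [m.start() + 3 for m in re.finditer(r"(?=[AT][AG]C)", seq)]
    ccc = [m.start() + 3 for m in re.finditer(r"(?=CCC)", seq)]
    return {"WRC": wrc, "CCC": ccc}


def visible_labeled_fragment(deaminated_positions: list[int]):
    """Length of the 5'-labeled fragment after cleavage at every uracil.

    Only the cut nearest the 5' label yields a labeled product; fragments
    downstream of that cut carry no label and are not seen on the gel.
    """
    if not deaminated_positions:
        return None  # uncut, full-length oligo
    return min(deaminated_positions) - 1


# Hypothetical 5'-labeled oligo with a WRC (AGC) upstream of a CCC.
oligo = "TTTTAGCTTTTTCCCTTTT"
targets = motif_target_positions(oligo)
print(targets)

ccc_only = visible_labeled_fragment(targets["CCC"])
both = visible_labeled_fragment(targets["WRC"] + targets["CCC"])
print(f"CCC deaminated only: labeled fragment of {ccc_only} nt")
print(f"WRC and CCC deaminated: labeled fragment of {both} nt")
```

In this picture a molecule deaminated at both sites reports only the WRC cut, so the CCU band declines as WRC deamination accumulates even though the CCC site was also modified.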


REFERENCES

1.

Gott, J.M. and R.B. Emeson, Functions and mechanisms of RNA editing. Annu Rev Genet, 2000. 34: p. 499-531.

2.

Ishitani, R., S. Yokoyama, and O. Nureki, Structure, dynamics, and function of RNA modification enzymes. Curr Opin Struct Biol, 2008. 18(3): p. 330-9.

3.

Okazaki, I.M., A. Kotani, and T. Honjo, Role of AID in tumorigenesis. Adv Immunol, 2007. 94: p. 245-73.

4.

Tornaletti, S. and G.P. Pfeifer, Complete and tissue-independent methylation of CpG sites in the p53 gene: implications for mutations in human cancers. Oncogene, 1995. 10(8): p. 1493-9.

5.

Lindahl, T., DNA repair enzymes. Annu Rev Biochem, 1982. 51: p. 61-87.

6.

Samaranayake, M., et al., Evaluation of molecular models for the affinity maturation of antibodies: roles of cytosine deamination by AID and DNA repair. Chem Rev, 2006. 106(2): p. 700-19.

7.

Harris, R.S. and M.T. Liddament, Retroviral restriction by APOBEC proteins. Nat Rev Immunol, 2004. 4(11): p. 868-77.

8.

Goila-Gaur, R. and K. Strebel, HIV-1 Vif, APOBEC, and intrinsic immunity. Retrovirology, 2008. 5: p. 51.

9.

Navaratnam, N., et al., The p27 catalytic subunit of the apolipoprotein B mRNA editing enzyme is a cytidine deaminase. J Biol Chem, 1993. 268(28): p. 20709-12.

10.

Chester, A., et al., RNA editing: cytidine to uridine conversion in apolipoprotein B mRNA. Biochim Biophys Acta, 2000. 1494(1-2): p. 1-13.

11.

Petersen-Mahrt, S.K. and M.S. Neuberger, In vitro deamination of cytosine to uracil in single-stranded DNA by apolipoprotein B editing complex catalytic subunit 1 (APOBEC1). J Biol Chem, 2003. 278(22): p. 19583-6.

12.

Sousa, M.M., H.E. Krokan, and G. Slupphaug, DNA-uracil and human pathology. Mol Aspects Med, 2007. 28(3-4): p. 276-306.

13.

Prochnow, C., R. Bransteitter, and X.S. Chen, APOBEC deaminases-mutases with defensive roles for immunity. Sci China C Life Sci, 2009. 52(10): p. 893-902.

14.

Pham, P., et al., Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature, 2003. 424(6944): p. 103-7.

15.

Beale, R.C., et al., Comparison of the differential context-dependence of DNA deamination by APOBEC enzymes: correlation with mutation spectra in vivo. J Mol Biol, 2004. 337(3): p. 585-96.

16.

Harris, R.S., S.K. Petersen-Mahrt, and M.S. Neuberger, RNA editing enzyme APOBEC1 and some of its homologs can act as DNA mutators. Mol Cell, 2002. 10(5): p. 1247-53.

17.

Chen, H., et al., APOBEC3A is a potent inhibitor of adeno-associated virus and retrotransposons. Curr Biol, 2006. 16(5): p. 480-5.

18.

Langlois, M.A., et al., Mutational comparison of the single-domained APOBEC3C and double-domained APOBEC3F/G anti-retroviral cytidine deaminases provides insight into their DNA target site specificities. Nucleic Acids Res, 2005. 33(6): p. 1913-23.

19.

Dang, Y., et al., Identification of APOBEC3DE as another antiretroviral factor from the human APOBEC family. J Virol, 2006. 80(21): p. 10522-33.

20.

Liddament, M.T., et al., APOBEC3F properties and hypermutation preferences indicate activity against HIV-1 in vivo. Curr Biol, 2004. 14(15): p. 1385-91.

21.

Conticello, S.G., The AID/APOBEC family of nucleic acid mutators. Genome Biol, 2008. 9(6): p. 229.

22.

Jarmuz, A., et al., An anthropoid-specific locus of orphan C to U RNA-editing enzymes on chromosome 22. Genomics, 2002. 79(3): p. 285-96.

23.

Conticello, S.G., et al., Evolution of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases. Mol Biol Evol, 2005. 22(2): p. 367-77.

24.

Bogerd, H.P., et al., Cellular inhibitors of long interspersed element 1 and Alu retrotransposition. Proc Natl Acad Sci U S A, 2006. 103(23): p. 8780-5.

25.

Berger, G., et al., APOBEC3A is a specific inhibitor of the early phases of HIV-1 infection in myeloid cells. PLoS Pathog, 2011. 7(9): p. e1002221.

26.

Bogerd, H.P., et al., APOBEC3A and APOBEC3B are potent inhibitors of LTR-retrotransposon function in human cells. Nucleic Acids Res, 2006. 34(1): p. 89-95.

27.

Goila-Gaur, R., et al., Targeting APOBEC3A to the viral nucleoprotein complex confers antiviral activity. Retrovirology, 2007. 4: p. 61.

28.

Vartanian, J.P., et al., Evidence for editing of human papillomavirus DNA by APOBEC3 in benign and precancerous lesions. Science, 2008. 320(5873): p. 230-3.

29.

Stenglein, M.D., et al., APOBEC3 proteins mediate the clearance of foreign DNA from human cells. Nat Struct Mol Biol, 2010. 17(2): p. 222-9.

30.

Landry, S., et al., APOBEC3A can activate the DNA damage response and cause cell-cycle arrest. EMBO Rep, 2011. 12(5): p. 444-50.

31.

Suspene, R., et al., Somatic hypermutation of human mitochondrial and nuclear DNA by APOBEC3 cytidine deaminases, a pathway for DNA catabolism. Proc Natl Acad Sci U S A, 2011. 108(12): p. 4858-63.

32.

Thielen, B.K., et al., Innate immune signaling induces high levels of TC-specific deaminase activity in primary monocyte-derived cells through expression of APOBEC3A isoforms. J Biol Chem, 2010. 285(36): p. 27753-66.

33.

Sheehy, A.M., et al., Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature, 2002. 418(6898): p. 646-50.

34.

Conticello, S.G., R.S. Harris, and M.S. Neuberger, The Vif protein of HIV triggers degradation of the human antiretroviral DNA deaminase APOBEC3G. Curr Biol, 2003. 13(22): p. 2009-13.

35.

Marin, M., et al., HIV-1 Vif protein binds the editing enzyme APOBEC3G and induces its degradation. Nat Med, 2003. 9(11): p. 1398-403.

36.

Navarro, F., et al., Complementary function of the two catalytic domains of APOBEC3G. Virology, 2005. 333(2): p. 374-86.

37.

Newman, E.N., et al., Antiviral function of APOBEC3G can be dissociated from cytidine deaminase activity. Curr Biol, 2005. 15(2): p. 166-70.

38.

Gooch, B.D. and B.R. Cullen, Functional domain organization of human APOBEC3G. Virology, 2008. 379(1): p. 118-24.

39.

Harris, R.S., et al., DNA deamination mediates innate immunity to retroviral infection. Cell, 2003. 113(6): p. 803-9.

40.

Mangeat, B., et al., Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts. Nature, 2003. 424(6944): p. 99-103.

41.

Zhang, H., et al., The cytidine deaminase CEM15 induces hypermutation in newly synthesized HIV-1 DNA. Nature, 2003. 424(6944): p. 94-8.

42.

Chen, K.M., et al., Structure of the DNA deaminase domain of the HIV-1 restriction factor APOBEC3G. Nature, 2008. 452(7183): p. 116-9.

43.

Furukawa, A., et al., Structure, interaction and real-time monitoring of the enzymatic reaction of wild-type APOBEC3G. EMBO J, 2009. 28(4): p. 440-51.

44.

Holden, L.G., et al., Crystal structure of the anti-viral APOBEC3G catalytic domain and functional implications. Nature, 2008. 456(7218): p. 121-4.

45.

Carpenter, M.A., et al., Determinants of sequence-specificity within human AID and APOBEC3G. DNA Repair (Amst), 2010.

46.

Lederberg, J., Ontogeny of the clonal selection theory of antibody formation. Reflections on Darwin and Ehrlich. Ann N Y Acad Sci, 1988. 546: p. 175-82.

47.

Tonegawa, S., Somatic generation of antibody diversity. Nature, 1983. 302(5909): p. 575-81.

48.

Schatz, D.G., M.A. Oettinger, and D. Baltimore, The V(D)J recombination activating gene, RAG-1. Cell, 1989. 59(6): p. 1035-48.

49.

Oettinger, M.A., et al., RAG-1 and RAG-2, adjacent genes that synergistically activate V(D)J recombination. Science, 1990. 248(4962): p. 1517-23.

50.

Di Noia, J.M. and M.S. Neuberger, Molecular mechanisms of antibody somatic hypermutation. Annu Rev Biochem, 2007. 76: p. 1-22.

51.

Lederberg, J., Genes and antibodies. Science, 1959. 129(3364): p. 1649-53.

52.

Brenner, S. and C. Milstein, Origin of antibody variation. Nature, 1966. 211(5046): p. 242-3.

53.

Muramatsu, M., et al., Specific expression of activation-induced cytidine deaminase (AID), a novel member of the RNA-editing deaminase family in germinal center B cells. J Biol Chem, 1999. 274(26): p. 18470-6.

54.

Morgan, H.D., et al., Activation-induced cytidine deaminase deaminates 5-methylcytosine in DNA and is expressed in pluripotent tissues: implications for epigenetic reprogramming. J Biol Chem, 2004. 279(50): p. 52353-60.

55.

Sohail, A., et al., Human activation-induced cytidine deaminase causes transcription-dependent, strand-biased C to U deaminations. Nucleic Acids Res, 2003. 31(12): p. 2990-4.

56.

Dickerson, S.K., et al., AID mediates hypermutation by deaminating single stranded DNA. J Exp Med, 2003. 197(10): p. 1291-6.

57.

Papavasiliou, F.N. and D.G. Schatz, Somatic hypermutation of immunoglobulin genes: merging mechanisms for genetic diversity. Cell, 2002. 109 Suppl: p. S35-44.

58.

Rajewsky, K., Clonal selection and learning in the antibody system. Nature, 1996. 381(6585): p. 751-8.

59.

Stavnezer-Nordgren, J. and S. Sirlin, Specificity of immunoglobulin heavy chain switch correlates with activity of germline heavy chain genes prior to switching. EMBO J, 1986. 5(1): p. 95-102.

60.

Maul, R.W., et al., Uracil residues dependent on the deaminase AID in immunoglobulin gene variable and switch regions. Nat Immunol, 2011. 12(1): p. 70-6.

61.

Imai, K., et al., Human uracil-DNA glycosylase deficiency associated with profoundly impaired immunoglobulin class-switch recombination. Nat Immunol, 2003. 4(10): p. 1023-8.

62.

Bhutani, N., et al., Reprogramming towards pluripotency requires AID-dependent DNA demethylation. Nature, 2010. 463(7284): p. 1042-7.

63.

Popp, C., et al., Genome-wide erasure of DNA methylation in mouse primordial germ cells is affected by AID deficiency. Nature, 2010. 463(7284): p. 1101-5.

64.

Rai, K., et al., DNA demethylation in zebrafish involves the coupling of a deaminase, a glycosylase, and gadd45. Cell, 2008. 135(7): p. 1201-12.

65.

Mikkelsen, T.S., et al., Dissecting direct reprogramming through integrative genomic analysis. Nature, 2008. 454(7200): p. 49-55.

66.

Reik, W., Stability and flexibility of epigenetic gene regulation in mammalian development. Nature, 2007. 447(7143): p. 425-32.

67.

Walsh, C.P., J.R. Chaillet, and T.H. Bestor, Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nat Genet, 1998. 20(2): p. 116-7.

68.

Bird, A., DNA methylation patterns and epigenetic memory. Genes Dev, 2002. 16(1): p. 6-21.

69.

Sado, T., et al., X inactivation in the mouse embryo deficient for Dnmt1: distinct effect of hypomethylation on imprinted and random X inactivation. Dev Biol, 2000. 225(2): p. 294-303.

70.

Li, E., T.H. Bestor, and R. Jaenisch, Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell, 1992. 69(6): p. 915-26.

71.

Okano, M., et al., DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell, 1999. 99(3): p. 247-57.

72.

Robertson, K.D., DNA methylation and human disease. Nat Rev Genet, 2005. 6(8): p. 597-610.

73.

Illingworth, R.S. and A.P. Bird, CpG islands--'a rough guide'. FEBS Lett, 2009. 583(11): p. 1713-20.

74.

Maunakea, A.K., et al., Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature, 2010. 466(7303): p. 253-7.

75.

Ramsahoye, B.H., et al., Non-CpG methylation is prevalent in embryonic stem cells and may be mediated by DNA methyltransferase 3a. Proc Natl Acad Sci U S A, 2000. 97(10): p. 5237-42.

76.

Xie, W., et al., Base-resolution analyses of sequence and parent-of-origin dependent DNA methylation in the mouse genome. Cell, 2012. 148(4): p. 816-31.

77.

Coulondre, C., et al., Molecular basis of base substitution hotspots in Escherichia coli. Nature, 1978. 274(5673): p. 775-80.

78.

Marinus, M.G. and N.R. Morris, Isolation of deoxyribonucleic acid methylase mutants of Escherichia coli K-12. J Bacteriol, 1973. 114(3): p. 1143-50.

79.

Bigger, C.H., K. Murray, and N.E. Murray, Recognition sequence of a restriction enzyme. Nat New Biol, 1973. 244(131): p. 7-10.

80.

Lieb, M., Spontaneous mutation at a 5-methylcytosine hotspot is prevented by very short patch (VSP) mismatch repair. Genetics, 1991. 128(1): p. 23-7.

81.

Wyszynski, M., S. Gabbara, and A.S. Bhagwat, Cytosine deaminations catalyzed by DNA cytosine methyltransferases are unlikely to be the major cause of mutational hot spots at sites of cytosine methylation in Escherichia coli. Proc Natl Acad Sci U S A, 1994. 91(4): p. 1574-8.

82.

Lutsenko, E. and A.S. Bhagwat, Principal causes of hot spots for cytosine to thymine mutations at sites of cytosine methylation in growing cells. A model, its experimental support and implications. Mutat Res, 1999. 437(1): p. 11-20.

83.

Bandaru, B., M. Wyszynski, and A.S. Bhagwat, HpaII methyltransferase is mutagenic in Escherichia coli. J Bacteriol, 1995. 177(10): p. 2950-2.

84.

Shen, J.C., et al., A comparison of the fidelity of copying 5-methylcytosine and cytosine at a defined DNA template site. Nucleic Acids Res, 1992. 20(19): p. 5119-25.

85.

Lindahl, T. and B. Nyberg, Heat-induced deamination of cytosine residues in deoxyribonucleic acid. Biochemistry, 1974. 13(16): p. 3405-10.

86.

Ehrlich, M., et al., DNA cytosine methylation and heat-induced deamination. Biosci Rep, 1986. 6(4): p. 387-93.

87.

Vairapandi, M. and N.J. Duker, Enzymic removal of 5-methylcytosine from DNA by a human DNA-glycosylase. Nucleic Acids Res, 1993. 21(23): p. 5323-7.

88.

Yang, A.S., et al., HhaI and HpaII DNA methyltransferases bind DNA mismatches, methylate uracil and block DNA repair. Nucleic Acids Res, 1995. 23(8): p. 1380-7.

89.

Visnes, T., et al., Uracil in DNA and its processing by different DNA glycosylases. Philos Trans R Soc Lond B Biol Sci, 2009. 364(1517): p. 563-8.

90.

Lieb, M., Specific mismatch correction in bacteriophage lambda crosses by very short patch repair. Mol Gen Genet, 1983. 191(1): p. 118-25.

91.

Hendrich, B., et al., The thymine glycosylase MBD4 can bind to the product of deamination at methylated CpG sites. Nature, 1999. 401(6750): p. 301-4.

92.

Cooper, D.N. and J.F. Clayton, DNA polymorphism and the study of disease associations. Hum Genet, 1988. 78(4): p. 299-312.

93.

Cooper, D.N. and H. Youssoufian, The CpG dinucleotide and human genetic disease. Hum Genet, 1988. 78(2): p. 151-5.

94.

Giannelli, F., et al., Haemophilia B: database of point mutations and short additions and deletions. Nucleic Acids Res, 1990. 18(14): p. 4053-9.

95.

Jones, P.A., et al., Methylation, mutation and cancer. Bioessays, 1992. 14(1): p. 33-6.

96.

Kohler, S.W., et al., Spectra of spontaneous and mutagen-induced mutations in the lacI gene in transgenic mice. Proc Natl Acad Sci U S A, 1991. 88(18): p. 7958-62.

97.

Douglas, G.R., et al., Sequence spectra of spontaneous lacZ gene mutations in transgenic mouse somatic and germline tissues. Mutagenesis, 1994. 9(5): p. 451-8.

98.

Zimmer, D.M., et al., Spontaneous and ethylnitrosourea-induced mutation fixation and molecular spectra at the lacI transgene in the Big Blue rat-2 embryo cell line. Environ Mol Mutagen, 1996. 28(4): p. 325-33.

99.

Olivier, M., et al., TP53 mutation spectra and load: a tool for generating hypotheses on the etiology of cancer. IARC Sci Publ, 2004(157): p. 247-70.

100.

Law, J.A. and S.E. Jacobsen, Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet, 2010. 11(3): p. 204-20.

101.

Goll, M.G. and T.H. Bestor, Eukaryotic cytosine methyltransferases. Annu Rev Biochem, 2005. 74: p. 481-514.

102.

Hermann, A., R. Goyal, and A. Jeltsch, The Dnmt1 DNA-(cytosine-C5)-methyltransferase methylates DNA processively with high preference for hemimethylated target sites. J Biol Chem, 2004. 279(46): p. 48350-9.

103.

Ooi, S.K. and T.H. Bestor, The colorful history of active DNA demethylation. Cell, 2008. 133(7): p. 1145-8.

104.

Mayer, W., et al., Demethylation of the zygotic paternal genome. Nature, 2000. 403(6769): p. 501-2.

105.

Bruniquel, D. and R.H. Schwartz, Selective, stable demethylation of the interleukin-2 gene enhances transcription by an active process. Nat Immunol, 2003. 4(3): p. 235-40.

106.

Wu, S.C. and Y. Zhang, Active DNA demethylation: many roads lead to Rome. Nat Rev Mol Cell Biol, 2010. 11(9): p. 607-20.

107.

Hemberger, M., W. Dean, and W. Reik, Epigenetic dynamics of stem cells and cell lineage commitment: digging Waddington's canal. Nat Rev Mol Cell Biol, 2009. 10(8): p. 526-37.

108.

Surani, M.A., K. Hayashi, and P. Hajkova, Genetic and epigenetic regulators of pluripotency. Cell, 2007. 128(4): p. 747-62.

109.

Oswald, J., et al., Active demethylation of the paternal genome in the mouse zygote. Curr Biol, 2000. 10(8): p. 475-8.

110.

Gehring, M., W. Reik, and S. Henikoff, DNA demethylation by DNA repair. Trends Genet, 2009. 25(2): p. 82-90.

111.

Feng, S., S.E. Jacobsen, and W. Reik, Epigenetic reprogramming in plant and animal development. Science, 2010. 330(6004): p. 622-7.

112.

Hajkova, P., et al., Genome-wide reprogramming in the mouse germ line entails the base excision repair pathway. Science, 2010. 329(5987): p. 78-82.

113.

Chen, Z.X. and A.D. Riggs, DNA methylation and demethylation in mammals. J Biol Chem, 2011. 286(21): p. 18347-53.

114.

Hirano, K., et al., Targeted disruption of the mouse apobec-1 gene abolishes apolipoprotein B mRNA editing and eliminates apolipoprotein B48. J Biol Chem, 1996. 271(17): p. 9887-90.

115.

Morrison, J.R., et al., Apolipoprotein B RNA editing enzyme-deficient mice are viable despite alterations in lipoprotein metabolism. Proc Natl Acad Sci U S A, 1996. 93(14): p. 7154-9.

116.

Bransteitter, R., et al., Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA but requires the action of RNase. Proc Natl Acad Sci U S A, 2003. 100(7): p. 4102-7.

117.

Larijani, M., et al., Methylation protects cytidines from AID-mediated deamination. Mol Immunol, 2005. 42(5): p. 599-604.

118.

Kohli, R.M., et al., A portable hot spot recognition loop transfers sequence preferences from APOBEC family members to activation-induced cytidine deaminase. J Biol Chem, 2009. 284(34): p. 22898-904.

119.

Guo, J.U., et al., Hydroxylation of 5-methylcytosine by TET1 promotes active DNA demethylation in the adult brain. Cell, 2011. 145(3): p. 423-34.

120.

Beletskii, A. and A.S. Bhagwat, Correlation between transcription and C to T mutations in the non-transcribed DNA strand. Biol Chem, 1998. 379(4-5): p. 549-51.

121.

Beletskii, A. and A.S. Bhagwat, Transcription-induced mutations: increase in C to T mutations in the nontranscribed strand during transcription in Escherichia coli. Proc Natl Acad Sci U S A, 1996. 93(24): p. 13919-24.

122.

Beletskii, A. and A.S. Bhagwat, Transcription-induced cytosine-to-thymine mutations are not dependent on sequence context of the target cytosine. J Bacteriol, 2001. 183(21): p. 6491-3.

123.

Muramatsu, M., et al., Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell, 2000. 102(5): p. 553-63.

124.

Petersen-Mahrt, S.K., R.S. Harris, and M.S. Neuberger, AID mutates E. coli suggesting a DNA deamination mechanism for antibody diversification. Nature, 2002. 418(6893): p. 99-103.

125.

Ramiro, A.R., et al., Transcription enhances AID-mediated cytidine deamination by exposing single-stranded DNA on the nontemplate strand. Nat Immunol, 2003. 4(5): p. 452-6.

126.

Sandigursky, M., G.A. Freyer, and W.A. Franklin, The post-incision steps of the DNA base excision repair pathway in Escherichia coli: studies with a closed circular DNA substrate containing a single U:G base pair. Nucleic Acids Res, 1998. 26(5): p. 1282-7.

127.

Asensio, J.L., et al., Novel dimeric structure of phage phi29-encoded protein p56: insights into uracil-DNA glycosylase inhibition. Nucleic Acids Res, 2011. 39(22): p. 9779-88.

128.

Carpenter, M., et al., Sequence-dependent enhancement of hydrolytic deamination of cytosines in DNA by the restriction enzyme PspGI. Nucleic Acids Res, 2006. 34(13): p. 3762-70.

129.

Sohail, A., et al., A gene required for very short patch repair in Escherichia coli is adjacent to the DNA cytosine methylase gene. J Bacteriol, 1990. 172(8): p. 4214-21.

130.

Dubey, A.K., B. Mollet, and R.J. Roberts, Purification and characterization of the MspI DNA methyltransferase cloned and overexpressed in E. coli. Nucleic Acids Res, 1992. 20(7): p. 1579-85.

131.

Weiner, M.P. and G.L. Costa, Rapid PCR site-directed mutagenesis. PCR Methods Appl, 1994. 4(3): p. S131-6.

132.

Wijesinghe, P. and A.S. Bhagwat, Efficient deamination of 5-methylcytosines in DNA by human APOBEC3A, but not by AID or APOBEC3G. Nucleic Acids Res, 2012.

133.

Hennecke, F., et al., The vsr gene product of E. coli K-12 is a strand- and sequence-specific DNA mismatch endonuclease. Nature, 1991. 353(6346): p. 776-8.

134.

Garibyan, L., et al., Use of the rpoB gene to determine the specificity of base substitution mutations on the Escherichia coli chromosome. DNA Repair (Amst), 2003. 2(5): p. 593-608.

135.

Pham, P., et al., Impact of phosphorylation and phosphorylation-null mutants on the activity and deamination specificity of activation-induced cytidine deaminase. J Biol Chem, 2008. 283(25): p. 17428-39.

136.

Basu, U., et al., The AID antibody diversification enzyme is regulated by protein kinase A phosphorylation. Nature, 2005. 438(7067): p. 508-11.

137.

Bulliard, Y., et al., Structure-function analyses point to a polynucleotide-accommodating groove essential for APOBEC3A restriction activities. J Virol, 2011. 85(4): p. 1765-76.

138.

Chiu, Y.L. and W.C. Greene, The APOBEC3 cytidine deaminases: an innate defensive network opposing exogenous retroviruses and endogenous retroelements. Annu Rev Immunol, 2008. 26: p. 317-53.

139.

Chan, K.K., Q.M. Zhang, and G.L. Dianov, Base excision repair fidelity in normal and cancer cells. Mutagenesis, 2006. 21(3): p. 173-8.

140.

Chelico, L., et al., APOBEC3G DNA deaminase acts processively 3' --> 5' on single-stranded DNA. Nat Struct Mol Biol, 2006. 13(5): p. 392-9.

141.

Coker, H.A. and S.K. Petersen-Mahrt, The nuclear DNA deaminase AID functions distributively whereas cytoplasmic APOBEC3G has a processive mode of action. DNA Repair (Amst), 2007. 6(2): p. 235-43.

142.

Parisien, R. and Bhagwat, A.S. In: Grosjean, H. (ed.), DNA and RNA Modification Enzymes: Comparative Structure, Mechanism, Functions, Cellular Interactions and Evolution. Austin, TX: Landes Bioscience (2009).


ABSTRACT

GENETIC AND BIOCHEMICAL STUDIES OF HUMAN APOBEC FAMILY OF PROTEINS

by

PRIYANGA WIJESINGHE

December 2012

Advisor: Dr. Ashok S. Bhagwat

Major: Chemistry (Biochemistry)

Degree: Doctor of Philosophy

The AID/APOBEC family of proteins in higher vertebrates converts cytosines in DNA or RNA into uracils. These proteins have essential roles in either innate or adaptive immunity. Recently, AID has also been implicated in DNA demethylation during early embryogenesis in mammals. This is partly based on the reported ability of AID to deaminate 5-methylcytosine to thymine (5mC to T). I reexamined this proposed new role of AID (5mC deamination), together with two members of the APOBEC family, in a novel Escherichia coli-based genetic system. My results confirmed that while all three enzymes are strong cytosine deaminases, only APOBEC3A (A3A) is an efficient deaminator of 5mC. Partially purified A3A converted 5mC to T in vitro as well. This is the first report of efficient deamination of 5mC by any enzyme from any organism. Although AID did not deaminate 5mC efficiently, when the DNA-binding region of AID was replaced with the corresponding segment from A3A, the resulting hybrid protein deaminated 5mC efficiently. Together these results suggest that human AID deaminates 5mCs very weakly. Consequently, AID is unlikely to promote genome-wide DNA demethylation unless its ability to deaminate 5mC is enhanced through covalent modifications or accessory factors.

APOBECs are DNA- or RNA-binding proteins with distinct substrate preferences. APOBEC3G targets the last cytosine (C) in a run of Cs (usually three Cs), whereas AID prefers to act on WRC (where W is A or T and R is a purine) in DNA. Guided by the solution structures of the APOBEC3G carboxy-terminal domain (A3G-CTD), two putative DNA-binding regions (R1 and R2) were identified in A3G-CTD. These potential DNA-binding regions of A3G-CTD were swapped with the corresponding regions in AID. The biochemistry of the hybrid proteins A3G-AIDR1 (A3G protein containing AID DNA-binding region 1), A3G-AIDR2 and A3G-AIDR1R2 was studied using an in vitro deamination assay. The data revealed that regions 1 and 2 of AID are not sufficient to confer its sequence specificity upon A3G. Instead, replacement of region 2 of A3G with the corresponding region from AID broadens the specificity of A3G so that it acts on WRC in addition to CCC.
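The sequence preferences stated above (WRC for AID, the last C in a run of Cs for APOBEC3G) can be illustrated with a minimal sketch. The function names, the example sequence and the default run length of three are assumptions made for illustration only; they are not part of the assays described in this work.

```python
import re

# W = A or T, R = purine (A or G); the target is the C that ends the motif.
# A lookahead is used so that overlapping WRC motifs are all reported.
WRC_PATTERN = re.compile(r"(?=[AT][AG]C)")


def wrc_targets(seq):
    """Return 0-based positions of cytosines that sit in a WRC context."""
    seq = seq.upper()
    return [m.start() + 2 for m in WRC_PATTERN.finditer(seq)]


def c_run_targets(seq, min_run=3):
    """Return 0-based positions of the last C in each run of >= min_run Cs.

    min_run=3 is an illustrative assumption based on "usually three Cs".
    """
    seq = seq.upper()
    run = re.compile("C{%d,}" % min_run)
    return [m.end() - 1 for m in run.finditer(seq)]


example = "TTAGCTCCCAATGCA"  # hypothetical single-stranded DNA sequence
print(wrc_targets(example))    # cytosines in AGC and TGC contexts
print(c_run_targets(example))  # last C of the CCC run
```

For the hypothetical sequence shown, the first call reports the cytosines at positions 4 and 13 and the second reports position 8, so the two rules select non-overlapping sites in this example.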


AUTOBIOGRAPHICAL STATEMENT

PRIYANGA WIJESINGHE

Wayne State University, Department of Chemistry, 440 Chemistry, Detroit MI 48202

Phone: (313) 577-3094 E-mail: [email protected]

Education:

Ph.D., Biochemistry, 2012, Wayne State University, Detroit, MI, USA. Research Advisor: Prof. Ashok S. Bhagwat

Publications:

Wijesinghe, P. and Bhagwat, A.S. (2012) Efficient Deamination of 5-Methylcytosines in DNA by Human APOBEC3A, but not by AID or APOBEC3G. Nucleic Acids Res. PMID: 22798497

Carpenter, M.A., Rajagurubandara, E., Wijesinghe, P. and Bhagwat, A.S. (2010) Determinants of sequence-specificity within human AID and APOBEC3G. DNA Repair (Amst), 9, 579-587. PMID: 20338830

Honors and Awards:

2012 Summer Dissertation Fellowship

2010 Exceptional service as a Graduate Teaching Assistant

2009-present Member of the Phi Lambda Upsilon honor society
