The study of protein-protein interactions involved in lagging strand DNA replication and repair

The University of Toledo

The University of Toledo Digital Repository Theses and Dissertations

2008

The study of protein-protein interactions involved in lagging strand DNA replication and repair
Jennifer M. Hinerman
The University of Toledo

Follow this and additional works at: http://utdr.utoledo.edu/theses-dissertations

Recommended Citation
Hinerman, Jennifer M., "The study of protein-protein interactions involved in lagging strand DNA replication and repair" (2008). Theses and Dissertations. 1191. http://utdr.utoledo.edu/theses-dissertations/1191

This Dissertation is brought to you for free and open access by The University of Toledo Digital Repository. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of The University of Toledo Digital Repository. For more information, please see the repository's About page.

A Dissertation Entitled

The Study of Protein-Protein Interactions Involved in Lagging Strand DNA Replication and Repair

By Jennifer M. Hinerman

Submitted as partial fulfillment of the requirements for the Doctor of Philosophy in Chemistry

_______________________________________ Advisor: Dr. Timothy C. Mueser

_______________________________________ College of Graduate Studies

The University of Toledo August 2008

Copyright © 2008 The document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.

An Abstract of The Study of Protein-Protein Interactions Involved in Lagging Strand DNA Replication and Repair

Jennifer M. Hinerman

Submitted as partial fulfillment of the requirements for the Doctor of Philosophy in Chemistry

The University of Toledo August 2008

The organization and coordination of DNA replication machinery at the replication fork is important for accurate, efficient DNA synthesis in all organisms. The initial organization of the replication fork is vital for initiating lagging strand replication, while the regulation of proteins involved in Okazaki fragment processing is important for generating a complete daughter DNA strand. These DNA replication and repair proteins recognize DNA in a structure-specific manner; thus, the recognition of these particular DNA structures promotes the formation of certain protein-DNA and protein-protein complexes that are essential for DNA replication and repair to occur. Organisms such as bacteriophage T4 (T4) and Aeropyrum pernix (Ape) are model systems for the study of binary and ternary complexes that form during DNA replication and repair.

Reassembling the replication fork would allow the determination of the mechanism used to synchronize replication on both the leading and lagging strands. Proteins (helicase assembly protein and single-stranded DNA binding protein) from T4 were used to study the complexes involved in initiation of lagging strand replication. The protein-protein interactions between the helicase assembly protein (59 protein), single-stranded DNA binding protein (32 protein), and truncations of the 32 protein have been investigated (ITC, DSC, DLS, native gels, crystallography). The 59 protein had a moderate interaction with 32 protein (KD = 3.7 µM) and with 32-B (KD = 3.6 µM). DNA-protein interactions between the 59 protein and fork DNA substrate (with and without 32 protein) have been studied (fluorescence). X-ray data were collected on a truncation of the 32 protein (32-B). Models of the 59 protein-32-B complex have been elucidated (SAXS, SANS). Ape proteins (proliferating cell nuclear antigen, DNA polymerase B, DNA ligase, and flap endonuclease-1) were characterized to study Okazaki fragment processing. Subunits (Ape0162, Ape0441, and Ape2182) from the heterotrimeric proliferating cell nuclear antigen (PCNA) were cloned, expressed, purified, and characterized (DSC, DLS). DNA ligase was cloned, expressed, and purified. DNA polymerase B has been cloned and expressed. Protein-protein interactions between each of the PCNA subunits, as well as the interactions between each individual subunit and DNA ligase, DNA polymerase B, and flap endonuclease-1, were investigated using mass spectrometry.


Acknowledgements

I would like to thank my advisor, Dr. Mueser (Sir), for the opportunity to work and learn in his lab. I will definitely miss all the “story times”. Thank you for your guidance and for always taking the time to answer all of my questions. I would also like to thank Dr. Hanson for all of his help and guidance. To my fellow lab members, thank you for all the good times, conversations, and friendships. These memories will last forever. I would like to thank my committee members, Dr. Viola, Dr. Funk, and Dr. Lee, for all their comments and suggestions.

I would like to thank all of my friends, new and old, for their support and friendship. A special thanks goes out to all the new friends I have made at Toledo, especially to the friends who have made this journey with me. Your friendships helped make the journey more exciting and bearable. To the group of friends who have supported me as I moved all over the state to go to college and grad school, thank you for everything. Your friendship will always have a special place in my heart.

None of this would have been possible without the love and support from my husband, Sam, and family. I could never thank them enough for supporting me through all the good and bad times. No matter how tough things got, they were always there encouraging me the entire way. Their endless support, understanding, and love will always mean the world to me.


Table of Contents

List of Figures
List of Tables
List of Abbreviations
Chapter 1: Introduction
  1.1 Bacteriophage T4 DNA replication
    1.1.1 Helicase assembly protein
    1.1.2 DNA helicase protein
    1.1.3 Single-stranded DNA binding protein
    1.1.4 59 protein interactions with 41 helicase and 32 protein
  1.2 Ape lagging strand replication and repair proteins
    1.2.1 Proliferating cell nuclear antigen
    1.2.2 PCNA - DNA polymerase interactions
    1.2.3 PCNA - Flap Endonuclease 1 interactions
    1.2.4 PCNA - DNA ligase interactions
Chapter 2: Experimental methods
  2.1 Vector cloning
    2.1.1 Polymerase chain reaction (PCR)
    2.1.2 Restriction cloning
    2.1.3 Gateway cloning
    2.1.4 Cloning and expression vectors and hosts
    2.1.5 Preparation of competent cells and transformation protocols
  2.2 Protein expression and purification
    2.2.1 Protein expression and lysis
    2.2.2 Protein purification
    2.2.3 Protein solubility
  2.3 Biophysical studies
    2.3.1 Dynamic light scattering
    2.3.2 Isothermal titration calorimetry
    2.3.3 Differential scanning calorimetry
    2.3.4 Fluorescence anisotropy
    2.3.5 Small angle X-ray scattering
    2.3.6 Small angle neutron scattering
    2.3.7 X-ray crystallography
Chapter 3: T-even Bacteriophage DNA replication
  3.1 Bacteriophage T4 results and discussion
    3.1.1 Protein expression and purification
    3.1.2 T4 protein characterization
      3.1.2.1 Differential scanning calorimetry
      3.1.2.2 Dynamic light scattering
      3.1.2.3 Size exclusion chromatography
      3.1.2.4 Fluorescence anisotropy
      3.1.2.5 Small angle X-ray scattering of T4 proteins
      3.1.2.6 Small angle neutron scattering of T4 proteins
    3.1.3 T4 protein-protein complexes characterizations
      3.1.3.1 T4 protein complex preparation
      3.1.3.2 Native gels of the T4 complexes
      3.1.3.3 Size exclusion chromatography of the T4 complexes
      3.1.3.4 Dynamic light scattering of the T4 complexes
      3.1.3.5 Isothermal titration calorimetry of the T4 complexes
      3.1.3.6 Small angle X-ray scattering of the T4 complexes
      3.1.3.7 Small angle neutron scattering of T4 complexes
    3.1.4 T4 protein complex crystallization studies
    3.1.5 T4 32-B crystallization studies
  3.2 Bacteriophage KVP40 results and discussion
    3.2.1 Protein expression and purification
    3.2.2 KVP40 protein characterization
    3.2.3 KVP40 protein complex studies
  3.3 Bacteriophage RB69 59 results and discussion
    3.3.1 Cloning of RB69 59
    3.3.2 RB69 59 expression and purification
    3.3.3 RB69 59 crystallographic studies
    3.3.4 RB69 59+32-B crystallographic studies
Chapter 4: Aeropyrum pernix replication and repair proteins
  4.1 PCNA results and discussion
    4.1.1 Ape0162
    4.1.2 Ape0441
    4.1.3 Ape2182
    4.1.4 PCNA subunits characterization
    4.1.5 Ape DNA ligase
    4.1.6 Ape DNA polymerase B
  4.2 PCNA summary
Conclusion
References
Appendix


List of Figures

FIGURE 1.1: BACTERIOPHAGE T4 REPLICATION FORK (TROMBONE MODEL) CONTAINING ALL TEN PROTEINS NEEDED FOR THE REPLICATION PROCESS.
FIGURE 1.2: PROTEIN-PROTEIN INTERACTIONS OCCURRING AT THE REPLICATION FORK (COLLAPSED, TROMBONE MODEL).
FIGURE 1.3: PROTEIN SEQUENCE ALIGNMENTS OF BACTERIOPHAGE T4, BACTERIOPHAGE KVP40, BACTERIOPHAGE RB49, AND BACTERIOPHAGE RB69.
FIGURE 1.4: 59 PROTEIN RECOGNIZES DNA SUBSTRATES THAT MIMIC REPLICATION AND RECOMBINATION STRUCTURES.
FIGURE 1.5: STRUCTURE OF THE 59 PROTEIN.
FIGURE 1.6: COMPARISON OF 59 PROTEIN AND HMG PROTEIN STRUCTURES.
FIGURE 1.7: COMPARISON OF HELICASE STRUCTURES.
FIGURE 1.8: DOMAINS OF 32 PROTEIN.
FIGURE 1.9: STRUCTURE OF THE 32 CORE.
FIGURE 1.10: PROPOSED MODELS OF THE 59-32 CORE INTERACTIONS.
FIGURE 1.11: LAGGING STRAND MATURATION IN PROKARYOTES, EUKARYOTES, AND ARCHAEAL ORGANISMS.
FIGURE 1.12: SEQUENCE ALIGNMENT OF PCNA SUBUNITS.
FIGURE 1.13: STRUCTURES OF PCNA FROM SEVERAL SPECIES.
FIGURE 1.14: ARCHITECTURE OF HUMAN PCNA.
FIGURE 1.15: STRUCTURE OF PCNA-DNA POLYMERASE PEPTIDE.
FIGURE 1.16: PCNA-FEN-1 PROTEIN-PEPTIDE STRUCTURE.
FIGURE 1.17: STRUCTURE OF S. SOLFATARICUS DNA LIGASE.
FIGURE 2.1: GATEWAY CLONING SCHEME.
FIGURE 2.2: FLUORESCENTLY LABELED DNA SUBSTRATE.
FIGURE 2.3: FLUORESCENCE EQUATIONS FOR FRACTION OF PROTEIN BOUND SUBSTRATE.
FIGURE 3.1: PURIFICATION SCHEME FOR THE 59, 59C42S, 32, 32 CORE, 32-A, AND 32-B PROTEINS.
FIGURE 3.2: SDS-PAGE OF 59 PROTEIN FROM POROS HS COLUMN.
FIGURE 3.3: SDS-PAGE OF 59C42S PROTEIN FROM POROS HS COLUMN.
FIGURE 3.4: SDS-PAGE OF 32 PROTEIN FROM POROS HQ.
FIGURE 3.5: SDS-PAGE OF 32-A PROTEIN FROM POROS HQ.
FIGURE 3.6: SDS-PAGE OF 32-B PROTEIN FROM POROS HQ.
FIGURE 3.7: DSC THERMOGRAMS FOR 59 PROTEIN AND 32-B PROTEIN.
FIGURE 3.8: DLS 59 PROTEIN AND 32-B PROTEIN DATA.
FIGURE 3.9: FLUORESCENCE ANISOTROPY RESULTS OF 59 PROTEIN AND FORK DNA.
FIGURE 3.10: FLUORESCENCE ANISOTROPY OF THE 59/32/DNA TERNARY COMPLEX.
FIGURE 3.11: FLUORESCENCE ANISOTROPY OF THE 59/32-B/DNA TERNARY COMPLEX.
FIGURE 3.12: EXAMPLE OF SAXS SCATTERING PROFILES.
FIGURE 3.13: SCATTERING CURVES OF THE INDIVIDUAL T4 PROTEINS.
FIGURE 3.14: GUINIER PLOTS OF THE INDIVIDUAL T4 PROTEINS.
FIGURE 3.15: EXPERIMENTAL SCATTERING CURVES FOR 59 AND 32-B PROTEINS.
FIGURE 3.16: THEORETICAL SCATTERING CURVES CREATED WITH CRYSOL.
FIGURE 3.17: DISTANCE DISTRIBUTION PLOTS OF THE 59 AND 32-B PROTEINS.
FIGURE 3.18: MOLECULAR ENVELOPES OF THE 59 PROTEIN.
FIGURE 3.19: EXPERIMENTAL AND THEORETICAL SCATTERING CURVES OF THE 59 PROTEIN.
FIGURE 3.20: MOLECULAR ENVELOPE OF 32-B GENERATED WITH GASBOR.
FIGURE 3.21: EXPERIMENTAL AND THEORETICAL SCATTERING CURVES OF THE 32-B PROTEIN.
FIGURE 3.22: CREDO AND CHADD MODELS OF 32-B.
FIGURE 3.23: SANS SCATTERING PROFILE FOR H59 AND D32-B PROTEINS.
FIGURE 3.24: SANS SCATTERING CURVES OF THE H59 AND D32-B PROTEINS.
FIGURE 3.25: GUINIER PLOTS OF H59 AND D32-B PROTEINS.
FIGURE 3.26: NATIVE GEL OF THE T4 PROTEIN-PROTEIN COMPLEXES.
FIGURE 3.27: DLS T4 COMPLEX DATA.
FIGURE 3.28: ISOTHERMAL TITRATION CALORIMETRY THERMOGRAM OF 59-32 PROTEIN COMPLEX.
FIGURE 3.29: ISOTHERMAL TITRATION CALORIMETRY THERMOGRAM OF 59-32-B PROTEIN COMPLEX.
FIGURE 3.30: EXAMPLE OF SAXS SCATTERING PROFILES FOR THE T4 COMPLEX.
FIGURE 3.31: SCATTERING CURVES OF THE T4 PROTEIN COMPLEXES.
FIGURE 3.32: GUINIER PLOTS OF THE T4 PROTEIN COMPLEXES.
FIGURE 3.33: EXPERIMENTAL SCATTERING CURVES AND PR PLOTS FOR THE 59 + 32-B COMPLEX.
FIGURE 3.34: MOLECULAR ENVELOPES OF THE 59 + 32-B COMPLEX.
FIGURE 3.35: THEORETICAL AND EXPERIMENTAL SCATTERING CURVES OF 59 + 32-B COMPLEXES.
FIGURE 3.36: THEORETICAL SCATTERING CURVES FOR DIFFERENT 59 + 32-B COMPLEX CONFORMATIONS.
FIGURE 3.37: SANS SCATTERING PROFILE FOR THE 59 + 32-B COMPLEXES.
FIGURE 3.38: SANS SCATTERING CURVES OF THE 59 + 32-B COMPLEXES.
FIGURE 3.39: 32-B EXPANSION CRYSTALS.
FIGURE 3.40: DIFFRACTION PATTERNS OF 32-B.
FIGURE 3.41: ANALYSIS OF 32-B DATA FOR TWINNING.
FIGURE 3.42: MODELS OF P32 AND P3221 FROM PHASER.
FIGURE 3.43: ELECTRON DENSITY AT THE N-TERMINUS AND C-TERMINUS OF THE 32-B.
FIGURE 3.44: DENSITY MISSING FROM THE 32 CORE MODEL.
FIGURE 3.45: FINAL BUILD OF 32-B PROTEIN.
FIGURE 3.46: PURIFICATION SCHEME FOR THE KVP40 59 AND 41 PROTEINS.
FIGURE 3.47: SDS-PAGE OF KVP40 59 AND 41 FINAL PURIFICATION COLUMNS.
FIGURE 3.48: SOLUBILITY SCREEN OF KVP40 41 HELICASE.
FIGURE 3.49: SDS-PAGE RESULTS OF KVP40 41/59 COMPLEX SOLUBILITY SCREEN.
FIGURE 3.50: RESULTS OF CLONING RB69 59 GENE.
FIGURE 3.51: PURIFICATION SCHEME FOR THE RB69 59 PROTEIN.
FIGURE 3.52: SDS-PAGE OF RB69 59 PROTEIN FROM HQ.
FIGURE 3.53: SOLUBILITY SCREEN OF RB69 59.
FIGURE 4.1: CLONING OF APE0162 INTO PET28.
FIGURE 4.2: SDS-PAGE OF APE0162/PET28 EXPRESSION AND LYSIS.
FIGURE 4.3: EXPRESSION AND LYSIS OF APE0162/PET28 IN BL-21 DE3 ROS2 PLYS S.
FIGURE 4.4: SDS-PAGE OF APE0162 Q SEPHAROSE AND POROS HQ PURIFICATIONS.
FIGURE 4.5: SDS-PAGE OF MODIFIED PURIFICATION OF APE0162.
FIGURE 4.6: CLONING OF APE0162 INTO PET101.
FIGURE 4.7: SDS-PAGE OF APE0162/PET101 PROTEIN EXPRESSION AND LYSIS.
FIGURE 4.8: SDS-PAGE OF PURIFICATION OF APE0162.
FIGURE 4.9: ANALYSIS OF APE0162 PROTEIN USING MASS SPECTROMETRY.
FIGURE 4.10: SDS-PAGE OF APE0162 OPTIMIZED PURIFICATION.
FIGURE 4.11: CLONING OF APE0441 INTO PDRIVE AND PET28.
FIGURE 4.12: SDS-PAGE OF APE0441/PET28 EXPRESSION AND LYSIS.
FIGURE 4.13: RESTRICTION DIGEST OF APE0441 FROM PET28 AND INSERTION INTO PET21.
FIGURE 4.14: SDS-PAGE OF APE0441/PET21 EXPRESSION AND LYSIS.
FIGURE 4.15: CLONING OF FL-APE0441 INTO PET101.
FIGURE 4.16: SDS-PAGE OF FL-APE0441/PET101 EXPRESSION AND LYSIS.
FIGURE 4.17: PURIFICATION OF FULL LENGTH APE0441.
FIGURE 4.18: ANALYSIS OF FL-APE0441 PROTEIN USING MASS SPECTROMETRY.
FIGURE 4.19: CLONING OF FL-APE0441 INTO PENTR-D AND PDESTC1.
FIGURE 4.20: SDS-PAGE OF FL-APE0441/PDESTC1 EXPRESSION, LYSIS, AND HEATING.
FIGURE 4.21: SDS-PAGE OF FL-APE0441 PURIFICATION.
FIGURE 4.22: CLONING OF APE2182 INTO PDRIVE AND PET28.
FIGURE 4.23: SDS-PAGE OF HIS-APE2182 EXPRESSION AND LYSIS.
FIGURE 4.24: CLONING OF APE2182 INTO PET21.
FIGURE 4.25: SDS-PAGE OF APE2182/PET21 EXPRESSION AND LYSIS.
FIGURE 4.26: SDS-PAGE OF APE2182 PURIFICATION.
FIGURE 4.27: SDS-PAGE OF MODIFIED APE2182 PURIFICATION.
FIGURE 4.28: PCNA DLS RESULTS.
FIGURE 4.29: DSC THERMOGRAM APE 2182.
FIGURE 4.30: APE2182 PROTEIN CRYSTALS.
FIGURE 4.31: IMAGES OF APE2182 X-RAY DIFFRACTION.
FIGURE 4.32: ANALYSIS OF APE2182 DATA FOR TWINNING.
FIGURE 4.33: SDS-PAGE OF DNA LIGASE/PET21 EXPRESSION AND LYSIS.
FIGURE 4.34: DNA LIGASE SEQUENCE COMPARISON WITH MS PEPTIDES.
FIGURE 4.35: SDS-PAGE OF DNA LIGASE PURIFICATION.
FIGURE 4.36: DNA LIGASE SOLUBILITY SCREEN.
FIGURE 4.37: MODIFIED LYSIS AND EXTRACTION PROCEDURE FOR DNA LIGASE.
FIGURE 4.38: AMPLIFICATION AND CLONING OF APE DNA POLYMERASE B.
FIGURE 4.39: CLONING OF DNA POLYMERASE B INTO PENTR-D AND PDEST C1.
FIGURE 4.40: EXPRESSION AND LYSIS STUDIES OF DNA POLYMERASE B.
FIGURE 4.41: SOLUBILITY RESULTS OF DNA POLYMERASE B.
FIGURE A1: CHROMATOGRAM OF 59 PROTEIN PURIFICATION ON THE POROS HS.
FIGURE A2: CHROMATOGRAM OF 59C42S PROTEIN PURIFICATION ON THE POROS HS.
FIGURE A3: CHROMATOGRAM OF 32 PROTEIN PURIFICATION ON THE POROS HQ.
FIGURE A4: CHROMATOGRAM OF 32-A PROTEIN PURIFICATION ON THE POROS HQ.
FIGURE A5: CHROMATOGRAM OF 32-B PROTEIN PURIFICATION ON THE POROS HQ.
FIGURE A6: CHROMATOGRAM OF KVP40 41 HELICASE PURIFICATION ON THE POROS HQ.
FIGURE A7: CHROMATOGRAM OF KVP40 59 PROTEIN PURIFICATION ON THE POROS HS.
FIGURE A8: CHROMATOGRAM OF RB69 59 PROTEIN PURIFICATION ON THE POROS HS.
FIGURE A9: CHROMATOGRAMS OF APE0162 PURIFICATIONS.
FIGURE A10: CHROMATOGRAMS OF FL-APE0441 PURIFICATION.
FIGURE A11: CHROMATOGRAM OF APE2182 Q SEPHAROSE PURIFICATION.
FIGURE A12: CHROMATOGRAM OF APE2182 POROS HQ PURIFICATION.
FIGURE A13: CHROMATOGRAM OF DNA LIGASE SP SEPHAROSE PURIFICATION.
FIGURE A14: CHROMATOGRAM OF DNA LIGASE POROS HS PURIFICATION.


List of Tables

TABLE 1.1: REVIEW OF HOMOLOGOUS DNA REPLICATION AND REPAIR PROTEINS IN PROKARYOTES, EUKARYOTES, AND ARCHAEA.
TABLE 1.2: COMPARISON OF PROTEIN SEQUENCES FROM VARIOUS BACTERIOPHAGES.
TABLE 1.3: SEQUENCE HOMOLOGY OF PCNA SUBUNITS FROM DIFFERENT SPECIES.
TABLE 2.1: PCR REACTION MIXTURE COMPOSITION.
TABLE 2.2: A PCR CYCLER TEMPLATE PROGRAM.
TABLE 2.3: CLONING OF PCR PRODUCT INTO PDRIVE.
TABLE 2.4: LIGATION OF PCR PRODUCT INTO EXPRESSION VECTOR USING T4 DNA LIGASE.
TABLE 2.5: DESCRIPTION OF VECTORS AND CELLS USED FOR THE GENE CLONING AND PROTEIN EXPRESSION.
TABLE 2.6: HPLC COLUMNS.
TABLE 2.7: HPLC PROGRAMS FOR EACH COLUMN UTILIZED IN PROTEIN PURIFICATION.
TABLE 2.8: SANS DATA COLLECTION DATA SETS.
TABLE 3.1: ELUTION PARAMETERS FOR 59, 59C42S, 32, 32-B AND 32-A.
TABLE 3.2: DYNAMIC LIGHT SCATTERING RESULTS OF THE INDIVIDUAL T4 PROTEINS.
TABLE 3.3: ANALYSIS OF THE T4 PROTEINS SCATTERING DATA WITH GNOM.
TABLE 3.4: COMPARISON OF THE T4 PROTEIN RADIUS OF GYRATION FROM PRIMUS AND GNOM.
TABLE 3.5: DYNAMIC LIGHT SCATTERING RESULTS OF T4 PROTEIN COMPLEXES.
TABLE 3.6: ANALYSIS OF 59 + 32-B COMPLEX SCATTERING DATA WITH GNOM.
TABLE 3.7: COMMERCIAL AND IN HOUSE CRYSTALLIZATION SCREENS USED FOR CRYSTALLIZATION STUDIES OF THE T4 PROTEIN COMPLEXES.
TABLE 3.8: CRYSTAL HITS OF T4 PROTEIN COMPLEXES.
TABLE 3.9: CRYSTAL HITS OF T4 PROTEIN COMPLEXES.
TABLE 3.10: T4 32-B CRYSTAL HITS.
TABLE 3.11: PROCESSING STATISTICS OF 32-B DATA AS P32 AND P3221.
TABLE 3.12: ELUTION PARAMETERS OF THE KVP40 PROTEIN PURIFICATIONS.
TABLE 3.13: RESULTS OF SOLUBILITY SCREENS ON KVP40 41.
TABLE 3.14: DLS RESULTS FOR KVP40 41 HELICASE.
TABLE 3.15: COMMERCIAL AND IN HOUSE CRYSTALLIZATION SCREENS USED FOR CRYSTALLIZATION STUDIES OF THE KVP40 PROTEINS.
TABLE 3.16: CRYSTAL HITS OF KVP40 PROTEINS.
TABLE 3.17: COMMERCIAL AND IN HOUSE CRYSTALLIZATION SCREENS USED FOR CRYSTALLIZATION STUDIES OF THE RB69 59 PROTEIN.
TABLE 3.18: CRYSTAL HITS OF RB69 59 PROTEIN.
TABLE 3.19: CRYSTAL HITS OF RB69 59 + 32-B COMPLEX.
TABLE 4.1: CRYSTAL HITS OF APE2182.
TABLE 4.2: APE2182 X-RAY DATA PROCESSING STATISTICS FOR TRUNCATED AND UNTRUNCATED DATA.
TABLE 4.3: COMMERCIAL AND IN HOUSE CRYSTALLIZATION SCREENS USED FOR CRYSTALLIZATION STUDIES OF APE DNA LIGASE.
TABLE 4.4: CRYSTAL HITS OF APE DNA LIGASE PROTEIN.


List of Abbreviations

ABS       absorbance
AEBSF     4-(2-aminoethyl) benzenesulfonyl fluoride hydrochloride
Ape       Aeropyrum pernix
APS       Advanced Photon Source
ATP       adenosine triphosphate
BIS-TRIS  bis[2-Hydroxyethyl]iminotris[hydroxymethyl]-methane
BME       beta-mercaptoethanol
BSA       bovine serum albumin
CAPS      3-[Cyclohexylamino]-2-hydroxy-1-propanesulfonic acid
CV        column volume
DLS       dynamic light scattering
Dmax      maximum diameter
DNA       deoxyribonucleic acid
DSC       differential scanning calorimetry
dsDNA     double-stranded DNA
DTT       dithiothreitol
E. coli   Escherichia coli
EDTA      ethylenediaminetetraacetic acid
FAD       flavin adenine dinucleotide
FEN       flap endonuclease
HA        Hydroxyapatite
HEPES     N-[2-Hydroxyethyl]piperazine-N’-[2-ethanesulfonic acid]


HEX       hexachloro-6-carboxy fluorescein
HPLC      high pressure liquid chromatography
HQ        Poros HQ
HS        Poros HS
IDCL      interdomain connecting loop
IPTG      isopropyl-β-D-1-thiogalactopyranoside
ITC       isothermal titration calorimetry
LB        Luria-Bertani broth
MES       2-[N-Morpholino]ethanesulfonic acid
MW        molecular weight
NAD       nicotinamide adenine dinucleotide
NEB       New England Biolabs
NTP       nucleoside triphosphate
OB fold   oligonucleotide-oligosaccharide binding
OD        Optical density
ORNL      Oak Ridge National Labs
PCNA      proliferating cell nuclear antigen
PCR       polymerase chain reaction
PDB       protein data bank
PE        Poros PE
PEI       polyethyleneimine
PIP       PCNA interacting protein
PIPES     Piperazine-N,N’-bis[2-hydroxypropanesulfonic acid]


QS        Q sepharose
RFC       replication factor C
RG        radius of gyration
RH        hydrodynamic radius
SANS      small angle neutron scattering
SAXS      small angle X-ray scattering
SDS       sodium dodecyl sulfate
ssDNA     single-stranded DNA
SP        SP sepharose
T4        Bacteriophage T4
TA        topoisomerase-assisted
TAE       tris-acetate EDTA
TAPS      N-tris[Hydroxymethyl]methyl-3-aminopropane-sulfonic acid
TRIS      Tris(hydroxymethyl)aminomethane
v/v       volume per volume
w/v       weight per volume
X-gal     5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside
XRD       X-ray diffraction


Chapter 1: Introduction

DNA replication is a semi-conservative, semi-discontinuous process utilized by all life forms to duplicate genomic DNA. Replication initiates at specific origins where the antiparallel double-stranded DNA (dsDNA) undergoes strand displacement. Replication machinery is then organized onto the single-stranded DNA (ssDNA), allowing DNA synthesis to begin. The DNA polymerase synthesizes the leading strand continuously along the 3’→ 5’ template, while the lagging strand DNA polymerase copies the 5’→ 3’ template discontinuously, in segments called Okazaki fragments. At the end of the replication process, there are two new molecules of DNA, each containing one parent strand and one newly synthesized strand of DNA. Duplicating DNA appears to be a simple task, but the process is quite complex and vital to an organism’s survival. The organization of the replication process, specifically the coordination of the interactions between the various DNA replication proteins and DNA, is essential for accurate and time-efficient DNA synthesis (Table 1.1). Proteins involved in initiating lagging strand replication (Bacteriophage T4) and Okazaki fragment processing (Aeropyrum pernix, Ape) were characterized to elucidate the roles of the protein-protein complexes in DNA replication and repair.

T-even bacteriophages are linear, double-stranded DNA viruses that infect Escherichia coli (E. coli). During infection, these phages are capable of replicating their own DNA rapidly, without utilizing the host’s proteins.1, 2


Bacteriophages such as T4, KVP40, RB49, and RB69 are model systems used to study DNA replication, as their genomes encode all of the proteins required for the replication process. Bacteriophage T4 DNA replication has been reconstituted in vitro.1, 3 By studying these bacteriophages, the mechanisms that control the architecture and processing at the replication fork can be determined.

Aeropyrum pernix (Ape), a member of the crenarchaeal family, is an extreme thermophile discovered in ocean vents off the coast of Japan. These gram negative bacterial-type cells (0.8-1.0 µm) are aerobic, unlike other crenarchaeal organisms, and exist in environments that are around 90-95 °C.4, 5 The circular double-stranded DNA has a GC content around 67 mol %; the replication machinery and processes are most similar to eukaryotic DNA replication and repair.5, 6 The function and structure of archaeal proteins are similar to those of prokaryotes, but more closely related to those of eukaryotes. Information gained from the studies of Ape proteins will elaborate the role of these protein-protein interactions and further the understanding of these processes in eukaryotic organisms.

The main goal of this dissertation work is to characterize protein-protein interactions in several protein complexes, and eventually to determine the structures of the complexes. Biophysical techniques, such as dynamic light scattering, isothermal titration calorimetry, and fluorescence, were used to examine the binding interactions between the proteins. X-ray crystallography/diffraction and small angle scattering experiments provided information about the 3D structure and determined the shape of the molecular envelope.


Table 1.1: Review of homologous DNA replication and repair proteins in prokaryotes, eukaryotes, and archaea. Proteins similar in structure and function are found in all organisms, with slight differences in composition or activity. The archaeal proteins are divided into two subcategories, crenarchaeal and euryarchaeal, based on similarities to prokaryotes or eukaryotes.7-9 Crenarchaeal organisms lack cell division proteins, histones, and RPA.9

| Replication proteins | Prokaryotes | Eukaryotes | Crenarchaeal | Euryarchaeal |
| --- | --- | --- | --- | --- |
| Origin recognition | DnaA [one subunit] | Origin recognition complex (ORC) [six subunits, ORC1-6] | Cdc6/ORC [single or multiple homologues] | Cdc6/ORC [single or multiple homologues] |
| Helicase loader | DnaC [one subunit] | CDC6 [one subunit] | Cdc6/ORC [single or multiple homologues] | Cdc6/ORC [single or multiple homologues] |
| Replicative helicase | DnaB [one subunit] | MCM [six subunits] | MCM [single homologue] | MCM [single or multiple homologues] |
| Single stranded DNA binding protein (SSB) | SSB [one subunit] | Replication protein A (RPA) [three subunits] | SSB [one subunit] | RPA [one or three subunits] |
| Primase | DnaG [one subunit] | polα/primase complex [four subunits] | Primase [two subunits] | Primase [two subunits] |
| Polymerase/exonuclease | Pol III core [three subunits, α, ε, θ] | Pol δ [three or four subunits]; Pol ε [at least five subunits] | Pol B [single or multiple homologues] | Pol B [single or multiple homologues]; Pol D [two subunits] |
| Clamp loader | γ-complex [five subunits, γ, δ, δ’, χ, ψ] | RFC [five subunits, RFC1-5] | RFC [two subunits] | RFC [two subunits] |
| Sliding clamp | β | PCNA [one subunit] | PCNA [three subunits] | PCNA [one subunit] |
| Removal of primers | Pol I; RNase H | FEN-1; RNase H | FEN-1 | FEN-1 |
| Lagging strand maturation | DNA ligase (NAD-dependent) | DNA ligase I (ATP-dependent) | DNA ligase I (ATP-dependent) | DNA ligase I (ATP-dependent) |


1.1 Bacteriophage T4 DNA replication

The 168 kb double-stranded, linear genome of Bacteriophage T4 is synthesized in concatemers (multiple linked copies), which are further processed to create genomic segments that are terminally redundant and circularly permuted.2, 10 During the early stages of infection, T4 synthesizes its DNA via origin-dependent replication. This process produces long linear molecules of genomic DNA, complete with redundant ends containing a repeat of the next gene. The 3’ repeat ends then invade double-stranded DNA in order to initiate recombination-dependent replication, which is the predominant form of replication in the later stages of T4 infection.2, 10, 11 Origin-dependent replication uses several origins of replication, via formation of R-loops (RNA-DNA hybrid structures) created with RNA primers.2, 12, 13 Recombination-dependent replication utilizes D-loops (DNA-DNA structures) to initiate replication, where the 3’ repeat ends of the DNA invade double-stranded DNA.2

Bacteriophage T4 has ten replication proteins that function during DNA replication and repair (Figure 1.1). These proteins include the gene 43 DNA polymerase (43 polymerase), gene 45 sliding clamp (45 clamp), genes 44 and 62 clamp loader (44/62 clamp loader), gene 41 helicase (41 helicase), gene 59 helicase assembly protein (59 protein), gene 32 single-stranded DNA binding protein (32 protein), gene 61 primase (61 primase), RNase H, and gene 30 ligase (30 ligase).3



Figure 1.1: Bacteriophage T4 replication fork (trombone model) containing all ten proteins needed for the replication process. The 43 polymerase (dark blue circle) synthesizes DNA along the template, starting at the primer, on the leading and lagging strands. The 43 polymerase is tethered onto the DNA by the 45 clamp (green oval), which is loaded by the 44/62 clamp loader (pink triangle). The 41 helicase (red) is loaded onto the lagging strand by the 59 protein (light blue circle). The primers (pink line/arrow) are synthesized by the 61 primase-41 helicase complex (red oval and purple circle), and the primers are removed by RNase H (yellow circle) after elongation by the 43 polymerase. The gap formed after the removal of the primers is filled by the 43 polymerase, and the nick is sealed by 30 ligase (blue green rectangle). 32 protein (orange circle) protects the ssDNA.1

There are two major complexes of proteins that form during the DNA replication process: the replisome (43 polymerase and 45 clamp) and the primosome (41 helicase and 61 primase). 43 polymerase synthesizes complementary strands of DNA on the leading and lagging strands in a 5’→ 3’ manner and has a 3’→ 5’ exonuclease activity for repairs. The processivity of the 43 polymerase increases in the presence of the 45 clamp, which tethers the 43 polymerase to the DNA, allowing synthesis of longer DNA segments. The 44/62 clamp loader proteins load the 45 clamp onto the double-stranded DNA via an ATP-dependent process.1, 3


The dsDNA is unwound ahead of the replication fork by the 41 helicase in an ATP-dependent, 5’→ 3’ manner. The rate of unwinding dsDNA controls the rate of synthesis of the DNA. 41 helicase is loaded onto the ssDNA lagging strand by the 59 protein, which increases the rate of loading of the 41 helicase. The 61 primase synthesizes primers on the lagging strand. 61 primase activity is stimulated via interaction with the 41 helicase as the 41 helicase unwinds the dsDNA.1, 3 The primers signal the 43 polymerase to begin synthesis of the Okazaki fragments on the lagging strand of DNA.

There are also several proteins responsible for the processing of the Okazaki fragments on the lagging strand as well as repairing other regions of DNA. The single-stranded DNA binding protein (32 protein) is responsible for protecting single-stranded DNA. 32 protein also increases the processivity of RNase H. RNase H is a 5’→ 3’ exonuclease that is responsible for removing the RNA primers from the lagging strand after the synthesis of the Okazaki fragment. DNA ligase repairs nicks in the DNA via an ATP-dependent process.1, 3

Protein-protein interactions between all of these proteins help regulate the rate and accuracy of duplication of the genomic DNA. Several models propose how these protein-protein interactions can control the DNA replication process. In the “trombone” model (proposed by Alberts and colleagues), the 43 polymerases on the leading and lagging strands are in constant contact with each other during the replication process (Figure 1.1).14 The 43 polymerase on the lagging strand is also in contact with the primosome at the DNA fork. As the lagging strand polymerase synthesizes the new strand of DNA, the primosome proceeds to unwind the duplex DNA ahead of the replication fork. 32 proteins protect the newly unwound DNA by binding to the DNA, which facilitates formation of a loop (trombone conformation) that allows the polymerases to remain in contact during replication.1, 14

Nossal and colleagues propose that as the lagging strand replisome synthesizes the progeny strand, the primosome is inactive at the fork.14 Once the new strand is elongated, the replisome releases from the lagging strand, but remains in close proximity to the leading strand replisome and primosome. The primosome can then unwind the duplex DNA and create the next RNA primer.1 Most recently, electron microscopy experiments have shown that the 43 polymerases on the leading and lagging strands are coordinated during replication, and the 59 protein is located on or near the fork during the elongation process (Figure 1.2).14

Figure 1.2: Protein-protein interactions occurring at the replication fork (Collapsed, Trombone model). The replisomes (43 polymerase and 45 clamp) on the leading and lagging strands are in contact with each other, synthesizing DNA on both strands. Contact between the lagging strand replisome and the primosome (41 helicase and 61 primase) facilitates the regulation of leading and lagging strand synthesis. Proteins present at the replication fork are 32 protein (orange circle), 59 protein (light blue oval), 61 primase (purple circle), 41 helicase (red ovals), 43 polymerase (blue circle), and 45 clamp (green oval).


Both models recognize that there must be some type of interaction between the lagging and leading strand replisomes as well as the primosome in order to coordinate DNA synthesis on both strands. This coordination is vital to control the rate of DNA synthesis and to ensure the accurate and timely completion of genome duplication. The initial recognition of the replication fork is important for the initiation of lagging strand synthesis, while the interactions between the primosome and replisome are essential for successful replication of the lagging strand. The replication fork is recognized by the 59 protein, which interacts with 32 protein, the 41 helicase, and 43 polymerase. Studying the protein-protein and protein-DNA interactions between the 59 protein, 32 protein, 41 helicase, and DNA would allow us to elucidate how these proteins recognize and help organize the replication fork. These proteins are being studied from Bacteriophage T4, but to increase the chances of crystallizing the complexes, the same proteins from related phages were also studied. The relatives of T4 being studied include Bacteriophage RB69 and RB49 (close relatives) and Bacteriophage KVP40 (distant relative) (Figure 1.3). Alignments of protein sequences show how similar these proteins are to each other: whether aligned residues are identical (% identity), of a similar type (% positive), or different (Table 1.2).
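As a minimal sketch of how a percent identity value of the kind reported in Table 1.2 can be derived from a pair of aligned sequences, the Python function below counts identical residues over the columns in which neither sequence has a gap. The function name and the choice of denominator (gap-free columns) are assumptions made for illustration; alignment programs differ in how they define the denominator, so the result need not reproduce the tabulated values. The % positive value additionally requires a residue similarity scoring matrix and is not computed here.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns in which both residues are identical.

    Assumes seq_a and seq_b are rows of the same alignment (equal length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns where neither row has a gap
    identical = 0  # compared columns with the same residue in both rows
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# First twelve alignment columns of T4 59 and RB69 59 (Figure 1.3A),
# used here only as a short illustrative input.
t4_59_start = "--MIKLRMPAGG"
rb69_59_start = "--MIKIRMPPDG"
print(percent_identity(t4_59_start, rb69_59_start))  # 70.0
```

Over this short window, 7 of the 10 gap-free columns match, giving 70%; the full-length comparison is what a table such as Table 1.2 would report.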


Table 1.2: Comparison of protein sequences from various bacteriophages. Protein sequence comparison of Bacteriophage T4, RB69, RB49, and KVP40 proteins. Proteins from T4 are very similar to those from RB69, similar to those from RB49, and slightly similar to those from KVP40.

| Proteins | % identity | % positive |
| --- | --- | --- |
| T4 59 : RB69 59 | 85 | 90 |
| T4 59 : RB49 59 | 56 | 77 |
| T4 59 : KVP40 59 | 33 | 59 |
| T4 32 : RB69 32 | 92 | 94 |
| T4 32 : RB49 32 | 62 | 78 |
| T4 32 : KVP40 32 | 48 | 64 |
| T4 41 : RB69 41 | 85 | 92 |
| T4 41 : RB49 41 | 85 | 92 |
| T4 41 : KVP40 41 | 49 | 69 |


A. 59 protein sequence alignment

T4 59     --MIKLRMPAGGERYIDGKSVYKLYLMIKQHMNGKYDVIKYNWCMRVSDAAYQKRRDKYFFQKLSEKYKLKELALIFISNLVANQDAWIGDISDADALVF
RB69 59   --MIKIRMPPDGERYINGKSVYKLYLMCKQHFNGRYDVIKYNWCMRVSDNAYQKRRDKYFFEKLAEKYKLKELTLIFISNLVANQDAWIGEISDADALVF
RB49 59   -------------MRINAKSVYMLYCMMKAHMNGRYDCVKYNWRMRLSDAAFNKRRDKYYFAKLAEKYNLRELYFLFLSNLVANTDAWVGEITDVDAYGF
KVP40 59  MKRLLEVDLVLPRTWVNGYSVFRVYCAVKAHMSGKYDISKYKLGMKTPRSAFEKRSDKVFFERLAKRLTLNDCYQLLVSNLAANPNAISYEIAGADAHEF
          ::. **: :* * *:.*:** **: *: . *::** ** :* :*::: .*.: :::***.** :* :*:..** *

T4 59     YREYIGRLKQIKFKFEEDIRNIYYFSKKVEVSAFKEIFEYNPKVQSSYIFKLLQSNIISFETFILLDSFLNIIDKHDEQT-DNLVWNNYSIKLKAYRKIL
RB69 59   YREYIGRLKQAKETFAEDVRNIYYFSKKVEVSALQEIFDYNNKVQSSYIFKLLQSNIISFETFLLLDSFLNIIDKHDELT-DNLVWQNYSTKLKAYRKIL
RB49 59   YMEYIGKITDAATRFDDDIKSLVYFSEKRGVT-LKELITYNPATETSGIFKLLQNRIISFESFMLLDSFLNIIDEHDKVA-KDLIWQTYSVKMKAYKKLL
KVP40 59  WLKHTGFLEIYSQHYKNELLSIFTLINK----ENKKFKDLFRGEGHPVIMQLVLRNTISIETFVILNRLLNFVPVIDKTYSDDIFWHEFKTRALAYDKLL
          : :: * : : ::: .: : :* ::: . *::*: . **:*:*::*: :**:: *: .::.*: :. : ** *:*

T4 59     NIDSQKAKNVFIETVKSCKY----------
RB69 59   QIDVNAAKKLFIETIKSCKY----------
RB49 59   VIDAAKARALFVKVVKECKEISQLKVETNS
KVP40 59  KIDEELAKELFIQVKNQSKI----------
          ** *: :*::. :..*

B. 32 protein sequence alignment

T4 32     MFKRKSTAELAAQMAKLNGNKGFSSEDKGEWKLKLDNAGNGQAVIRFLPSKNDEQAPFAILVNHGFKKNGKWYIETCSSTHGDYDSCPVCQYISKNDLYN
RB69 32   MFKRKSTADLAAQMAKLNGNKGFSSEDKGEWKLKLDASGNGQAVIRFLPAKTDDALPFAILVNHGFKKNGKWYIETCSSTHGDYDSCPVCQYISKNDLYN
RB49 32   MFKRQDSSKLQAQLAALKGGNGFSKEDPKEWKIKTDAAGNGEALIRFLPSKTEDGLPIVKLVNHSFKVNGKWYWENCSSTHGDFDSCPVCKYINEKDLYN
KVP40 32  MFKRKSPAQLQEKLEKMSSKKSFDN--ADEWKLTTDKLGNGSAVIRFLPAKGEDDLPFVKIFTHGFKENGNWFIENCPSTIDLPCPCCAANGELWKTEIE
          ****:..:.* :: :.. :.*.. ***:. * ***.*:*****:* :: *:. :..*.** **:*: *.*.** . .* ..: : :

T4 32     TDNKEYSLVKRKTSYWANILVVKDPAAPENEGKVFKYRFGKKIWDKINAMIAVDVEMGETPVDVTCPWEGANFVLKVKQVSGFSNYDESKFLNQSAIPNI
RB69 32   TNKTEYSQLKRKTSYWANILVVKDPQAPDNEGKVFKYRFGKKIWDKINAMIAVDTEMGETPVDVTCPWEGANFVLKVKQVSGFSNYDESKFLNQSAIPNI
RB49 32   TDKTEWQLIKRKASYYANILVLKDPQAPENEGKVFKFRFGQKIYDKIVAMVNVNTDMGEVPVDVTCVFSGANFLLKAKKVDKHQNYDDSRFMQQSQLPKI
KVP40 32  DNQNIARKRKRTLSYWANIVVIKDDAAPENEGKVFKYRFGKKILDKITQAAQADEDLGVPGMDVTCVFDGANFSLKAKKVSGFPNYDDSKFGPSTELYGG
          ::. **. **:***:*:** **:*******:***:** *** .: ::* :**** :.**** **.*:*. . ***:*:* .: :

T4 32     DDESFQKELFEQMVDLSEMTSKDKFKSFEELNTKFGQVMGTAVMGGAAATAAKKADKVADDLDAFNVDDFNTKTEDDFMSSSSGSSSSADDTDLDDLLND
RB69 32   DDESFQKELFEQMVDLSEMTSKDKFKSFEELNTKFNQVLGTAALGGAAAAAASVADKVASDLDDFDKDMEAFSSAKTEDDFMS--SSSSDDGDLDDLLAG
RB49 32   EDEAYQKYLMENMVDLSEIVAPSQFKPFDELETKFKKTMGVSMVAGAAASSAAAAINSQLDDFDAALNAFDSGIAAASTVVGINAASTAADSSADDLMAS
KVP40 32  -DEAKLKEVWDAMHDLNAIIAPSAFKSEAELQKRFLQVTGAAQPKASAAQNLEAQLNTSAPAQAN-APKAAAKPAAASVDVDSEPVTDSVDDELDALLAD
          **: * : : * **. : : . **. **:.:* :. *.: .:** : : : * . * *: .

T4 32     L---------------------
RB69 32   L---------------------
RB49 32   LDLGGGTSSASDLDDLDSLLAM
KVP40 32  LELGDD----------------
          *


C. 41 helicase sequence alignment

T4 41     --MVEIILSHLIFDQAYFSKVWPYMDSEYFESGPAKNTFKLIKSHVNEYHSVPSINALNVALENSS-FTETEYSGVKTLISKLADSP-EDHSWLVKETEK
RB69 41   --MVEIILSHLVYDQAYFSKVWPYMDSDYFERGPAKNVFKIIKSHVNEYNAMPSINALKVALDNSS-LTEAEYKGTSDLIEKLADTP-EDHEWLVKETEK
RB49 41   --MIKTIFSQLIYNGAFFTNVWPHLKADYFTEDEEKLIYKLIKKHVDEYSNIPTPTALQIALDKER-GNQTTYEGAKTLLDSLENTP-EDLDWLMKETEK
KVP40 41  MQIEKSIFSNLIENQEYFATVIPHLKKEYFQSTAHQHIFMMIKNHADEYKRRATPEVLKVCLESLTGISEYDYAEIQQTINELDPHPSHDLRWLIDMTEQ
          : : *:*:*: : :*:.* *::. :** : : :**.*.:** .: .*::.*:. .: * . :..* * .* **:. **:

T4 41     YVQQRAMFNATSKIIEIQTNAELPPEKRNKKMPDVGAIPDIMRQALSISFDSYVGHDWMDDYEARWLSYMNKARKVPFKLRILNKITKGGAETGTLNVLM
RB69 41   YVQQKAMYNATSKIIEIQSNAELPPEQRNKKMPDVGAIPDIMRQALSISFDSYVGHDWMEDYEARWLSYLNKARKVPFKLNILNKITKGGAETGTLNVLM
RB49 41   FIRDRAMYNAMSRALEIQANAQLPKEKQNKKLPSEGAIPEIMQEALSITFDSDVGHDWFNDFEKRFLLYQTKANKIPFKLPMLNKITKGGAERKTLNVLL
KVP40 41  FCIEQSVFNALSESLAIQENAAKPLDQQNKRIKPLGAIPELMRDALNVCFDTSVGHDYFEDWEPRYKSYIEKAAKIPFKMNILNKITQGGVERSTLNLLL
          : :::::** *. : ** ** * :::**:: ****::*::**.: **: ****:::*:* *: * ** *:***: :*****:**.* ***:*:

T4 41     AGVNVGKSLGLCSLAADYLQLGHNVLYISMEMAEEVCAKRIDANMLDVSLDDIDDGHISYAEYKGKMEKWREK----STLGRLIVKQYPTGGADANTFRS
RB69 41   AGVNVGKSLGLCSLAADYLQTGHNVLYISMEMAEEVCAKRIDANMLDVSLDDIDDGNVSYAEYKAKMEKWRSK----TTLGRLVVKQYPTGGANANTFRA
RB49 41   AGVNVGKSLGLCSLAADYMECGYDVLYISMEMAEHVVAKRIDANLLDITLDELDNGNVSFAEYSARMKRVKNKKIGDRPVGKLMIKQYPTGGANSNHFEA
KVP40 41  AGSNVGKSLALCHLATEYLLQGYNVLYISMEMSEAAVSKRIDANLMDISMDDFDT--ITEKTYGNKIQNLEKK-----TQGKLFIKQFPTAGANVTHFNT
          ** ******.** **::*: *::********:* . :******::*:::*::* :: * :::. ..* . *:*.:**:**.**: . *.:

T4 41     LLNELKLKKNFVPTIIIVDYLGICKSCRIRVYSENSYTTVKAIAEELRALAVETETVLWTAAQVGKQAWDSSDVNMSDIAESAGLPATADFMLAVIETEE
RB69 41   LLNELKLKKNFVPSVIMVDYLGICGSCRIRVYTENSYTLVKAIAEELRALAVESETVLWTAAQVGRAAWDASDMNMSDIAESAGLPATADFMLAVIETEE
RB49 41   LMNELRLKKQFKPDVVIVDYLGICASSRLRVYSENSYTLVKAIAEELRGFFVKWDVVGWTAAQTTRAGWDASDLNMSDTAESAGLPATADFMLGVIETED
KVP40 41  LMNELRTKKNVIPDIVIVDYLGICASSRVSSS-ENTYVHVKAIAEEIRGFAVEHNVAVWSAAQTTRNAWDASDMGMGDIAESAGLAHTADLILGIMETEE
          *:***: **:. * :::******* *.*: **:*. *******:*.: *: :.. *:***. : .**:**:.*.* ******. ***::*.::***:

T4 41     LAAAEQQLIKQIKSRYGDKNKWNKFLMGVQKGNQKWVEIEQD--STPTEVNEVAGSQ----QIQAEQNRYQRN---ESTRAQLDALANELKF
RB69 41   LAQAEQQLIKQIKSRYGDKNKWNKFLMGVRKGNQKWVEIEQEGMNTPNTVNENAGAQ----MRQAEVNRTERVGKAKATRADLDSLANELKF
RB49 41   LAKMGVQLMKQIKSRYGDKNYYSRFNVGVKKGNQRWYEVPNQIADQENAQVKPQSAQ----QAEK--------------REKLDELANNMTF
KVP40 41  TVALGQQRVKQIKSRYADRNQDQTFMIAVNKGKQRWGDIDGTANYSAPAQSQAKSASPFAQ------------------KKESSAKAEAVNW
          . * :*******.*:* . * :.*.**:*:* :: : .:. : . . *: :.:

Figure 1.3: Protein sequence alignments of Bacteriophage T4, Bacteriophage KVP40, Bacteriophage RB49, and Bacteriophage RB69. A) 59 protein, B) 32 protein, C) 41 helicase. The Bacteriophage T4 sequences are at least 95% similar to the RB69 proteins, but only 69% similar to KVP40 proteins. - represents gaps in the sequence, * represents conserved amino acid residues, : represents strongly conserved amino acid residues (similar), and . represents weakly conserved amino acid residues.15


1.1.1 Helicase assembly protein

The 59 helicase assembly protein is a 26 kDa protein that was isolated and characterized in the early 1990s concurrently by Bruce Alberts et al. and Tetsuro Yonesaki.16, 17 The 59 protein was originally classified as a recombination/replication mediator protein (RMP).18 Alberts suggested that the 59 protein is needed for replication and recombination.16, 17 The 59 protein has been shown to interact with the single-stranded DNA binding protein, 41 helicase, and ssDNA, but not with additional 59 protein.17-19 Mutants of the 59 protein are defective in recombination-dependent replication, causing DNA arrest in the cell. Experiments have shown that the 59 protein is able to facilitate the loading of the 41 helicase onto ssDNA, as well as inhibit the activity of the 41 helicase if no 32 protein is present.20 The 59 protein accelerates loading of the helicase and is essential for recombination-dependent replication.20 The 59 protein rapidly loads the 41 helicase onto R loops during origin-dependent replication, onto D loops during recombination-dependent replication, and at stalled replication forks.10, 20

The 59 protein functions as a gatekeeper that helps regulate replication. It can also prevent the leading strand polymerase from elongating the leading strand until the primosome synthesizes primers on the lagging strand.10 Replication is reactivated when the lagging strand primer is synthesized and the 59 protein is displaced or moved by the 32 protein.20, 21 Recently, electron microscopy experiments have shown that the 59 protein remains attached to the replication fork throughout various stages of replication, suggesting that the 59 protein must travel along the DNA via interactions with other DNA replication proteins.14


Originally, this protein was believed to be a single-stranded binding protein since it could bind to ssDNA with a binding site of 9-10 nucleotides.16, 22 Further research has shown that the binding site for 59 protein must be at least six nucleotides long but no longer than 12 nucleotides.20 In extreme cases, 59 protein has been shown to bind to forks with a five nucleotide gap when a double-stranded arm is present.20 Since then, it has been discovered that the helicase assembly protein recognizes and binds with far greater affinity to a variety of DNA structures, including fork and flap substrates (Figure 1.4).13, 20, 22 The 59 helicase assembly protein will bind to recombination-like structures such as four-way junctions (Holliday junctions) and cruciform-like three-stranded structures (ssDNA that invades duplex DNA), but not three-way junctions.12 A 59 protein mutant, I87A, was shown to bind poorly to fork DNA and was unable to stimulate helicase activity.13 Another 59 mutant, I37A, had reduced affinity for fork DNA, but was able to activate helicase activity.13 Therefore, in complexes, both residues must be located near the lagging strand arm of the DNA in order to interact with the 41 helicase. Determination of the structure of the 59 protein bound to DNA would show how the 59 protein interacts with the different DNA substrates.


Figure 1.4: 59 protein recognizes DNA substrates that mimic replication and recombination structures. 59 protein binds strongly to fork (A), flap (B), and double-flap (C) DNA substrates. This protein also recognizes and binds tightly to substrates that mimic recombination-specific structures such as Holliday junctions (D) and three-stranded DNA substrates that mimic strand invasion (E).

There are 13 alpha helices, 2 short beta sheets, and several coils in the prolate shaped 59 protein structure (PDB: 1C1K, Figure 1.5).20 This protein is divided into two domains, an N-terminal domain (residues 1-108) and a C-terminal domain (residues 109-217), with clusters of acidic residues found on the surface of the N-terminal domain and on the C-terminal domain near the interface between both domains. The surface of the protein is covered with basic residues and hydrophobic clusters which may be responsible for interactions with other DNA replication proteins and with the phosphate backbone and bases of DNA.12, 20 Despite the determination of the structure of 59 protein, it is still unclear where this protein binds to DNA and how it interacts with other replication proteins.

Figure 1.5: Structure of the 59 protein. Helicase assembly protein (PDB #1C1K) ribbon, cartoon structure with the N terminus in blue and the C terminus in red. Predominately alpha helical in character, it is unknown where the 59 protein binds to fork DNA and other DNA replication proteins. Figure produced and rendered in PyMOL.23

There is minimal structural homology between the 59 protein and other proteins. The N-terminal domain of the 59 protein is similar to the dsDNA binding domain of the eukaryotic high mobility group family of proteins (e.g., rat HMG1A), but functional similarity is uncertain.10, 20 This homology is seen only within the first three helices of the 59 protein (Figure 1.6).


Figure 1.6: Comparison of 59 protein and HMG protein structures. Rat HMG DNA binding protein (PDB 1AAB, orange) superimposed onto 59 protein (PDB 1C1K, blue). The HMG protein helices overlap the first three alpha helices of the 59 protein. Figure modeled in Coot, produced and rendered in PyMOL.23-25

The proteins that are functionally homologous to the 59 protein are found in bacteria (DnaC, PriC, and PriA) and bacteriophage (λP).20, 26 PriA and 59 protein both recognize similar DNA structures such as forks, D loops, and 3-stranded junctions, but only 59 protein binds to Holliday junctions.20 PriA has a 3’-5’ helicase activity, but it also recognizes stalled replication forks and binds tightly to the fork in order to organize the assembly of replication restart proteins.26 DnaC complexes tightly with DnaB (helicase), preventing DnaB from prematurely loading onto ssDNA. PriC assembles the helicase on ssDNA during origin-independent replication and on stalled replication forks.26 The helicase loading activity of the 59 protein is similar to that of the DnaC or λP proteins, although, unlike these related proteins, the 59 protein can also bind to ssDNA.16 These are the only proteins (to date) to have shown any kind of structural, functional, or gene similarities to the T4 59 protein.27

1.1.2 DNA helicase protein

The 41 helicase is a 52.4 kDa protein that functions as a hexamer and unwinds dsDNA concurrent with the hydrolysis of ATP. Unwinding requires one or two hexameric rings present during strand separation. The hexamers assemble from a “dimer of trimers” or a “trimer of dimers”.28, 29 41 helicase translocates along the lagging strand in the 5’→ 3’ direction, unwinding the DNA in front of the replisome. The helicase will bind to the 5’ end of the single-stranded arm of the duplex DNA and remove the complementary strand at the 3’ terminus.30 Strand displacement is an ATP/GTP dependent process stimulated by the presence of ssDNA; one ATP molecule is consumed for every base pair separated.31 Not only is the activity of 41 helicase energy and polarity dependent, it is also space dependent. A gap of 10 base pairs on the 5’ arm is needed for the 41 helicase to be loaded onto the single-stranded lagging strand.12 The 41 helicase activity will decrease significantly if the gap on the lagging strand is only five base pairs wide.12, 32 However, experiments have shown that 41 helicase will maintain activity if the five base pair gap is on the 3’ end of the leading strand.12 Overall, the greatest activity is seen at fork DNA substrates.13, 33 41 helicase can self-load onto the lagging strand, but if the 32 protein is present, the helicase cannot load itself onto the DNA.10, 30, 34 The 59 protein is also needed to load the 41 helicase onto the DNA substrates that mimic recombination-dependent DNA replication (four-stranded cruciform or three-stranded structure).12, 34

The primosome complex, which consists of the 41 helicase and the 61 primase, is an important component of the replication process. This complex is responsible for the unwinding of the dsDNA as well as the priming of the lagging strand in order for DNA synthesis to occur.10, 32 The 61 primase-41 helicase interaction promotes RNA pentanucleotide primer synthesis on the lagging strand, where the 5’ RNA is complementary to the 3’ DNA.10, 30, 32, 33 This priming activity is still seen if a mutant of the 41 helicase (minus 20 amino acids from the C-terminal domain) is interacting with the 61 primase, but only if 32 protein is not present.12, 35 The 41 helicase interacts with 61 primase weakly in solution, but in the presence of DNA the protein-protein interaction is strong.35 Evidence shows that the 61 primase also forms a hexamer when forming a complex with the 41 helicase.30

There is a proposed mechanism for primosome assembly onto the duplex DNA that begins with 59 protein binding to the fork DNA and interacting with the 32 protein. If the 32 protein is already present, 59 protein displaces 32 protein, creating room to load the 41 helicase onto the ssDNA. Once assembled onto the DNA, the 41 helicase recruits the 61 primase to the lagging strand, followed by the removal or displacement of the 59 protein from the fork.30

Bacteriophage T4 41 helicase belongs to the DnaB-like helicase family, along with the Bacteriophage T7 gp4, E. coli DnaB and DnaA.28, 31 The structures of the gp4, E1 papillomavirus, and DnaB (from several species) helicases have been solved; most helicases are structurally homologous to each other (Figure 1.7).29, 31, 36 Each monomer has nucleotide binding domains containing Walker A and B motifs, which are composed of lysine and DEAD box moieties for nucleotide binding and metal coordination.28 The Walker motifs play an important role in the hydrolysis of ATP during strand displacement. ATP hydrolysis occurs via a base-catalyzed ester hydrolysis initiated by a glutamate from the DEAD box that interacts with a metal-coordinated water, which attacks the gamma phosphate on the nucleotide.28
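As a rough worked example of the one-ATP-per-base-pair figure quoted above for strand displacement, the sketch below estimates the minimum number of ATP molecules hydrolyzed to unwind a duplex of a given length. The function name and the decision to apply the figure to the full 168 kb T4 genome are assumptions made for illustration only; the estimate ignores helicase reloading, slippage, GTP usage, and every other ATP-consuming step at the fork.

```python
def min_atp_for_unwinding(base_pairs: int, atp_per_bp: float = 1.0) -> float:
    """Lower-bound ATP count for unwinding a duplex of `base_pairs` length.

    atp_per_bp defaults to 1.0, the stoichiometry stated in the text;
    it is exposed as a parameter because it is an assumption, not a constant.
    """
    if base_pairs < 0:
        raise ValueError("base_pairs must be non-negative")
    return base_pairs * atp_per_bp


T4_GENOME_BP = 168_000  # 168 kb genome size given in Section 1.1
print(min_atp_for_unwinding(T4_GENOME_BP))  # 168000.0
```

Under these assumptions, unwinding one copy of the genome costs on the order of 1.7 x 10^5 ATP molecules, which is only a floor on the true energetic cost.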


Figure 1.7: Comparison of helicase structures. Structural similarities between gp4 (A: PDB:1E0K) and E1 (B: PDB:2GXA) helicases. Side view of the gp4 protein and top view of the E1 protein. Characteristic alpha helices, beta sheets, and hairpin loops are present to interact with the ssDNA and ATP. E1 protein has ssDNA bound in the center of the helicase. Figures modeled and rendered in PyMOL.23

There are several models describing how the helicase separates dsDNA and translocates along the DNA. Early models speculated that the lagging strand would pass through the center of the hexameric ring, with or without interaction with the leading strand. Another model suggested that the outer surface of the helicase destabilizes the DNA helix through protein-DNA interactions. Newer models describe the protein-DNA interactions in more specific detail. In the “bucket brigade” model, sequential loops of the protein bind to nucleotides and the nucleotides are passed from loop to loop until separation is achieved.28 Essentially, the helicase travels along the DNA and performs the unfavorable strand separation using the energy released by NTP hydrolysis. Helicase monomers will bind to base pairs and use thermal fluctuations to separate the base pair interactions.28

Most recently, a new model (“coordinated escort mechanism”) was developed from the analysis of the structure of the papillomavirus protein E1 bound to ssDNA.31 E1 protein interacts with the sugar-phosphate backbone of the DNA via hairpins from each monomer. The orientation of the monomers allows the N-terminal domain to interact with the 5’ end of the ssDNA and the C-terminal domain to interact with the 3’ end. The structure reveals that one hairpin binds to one nucleotide and remains in contact as the single-stranded DNA moves from the top to the bottom of the protein. This movement occurs during the hydrolysis of one ATP molecule. Once the hairpin reaches the bottom of the staircase, the ssDNA is released; the helicase binds another ATP and the process starts from the beginning.28, 31, 36

1.1.3 Single-stranded DNA binding protein

The 32 single-stranded DNA binding protein is a 33.5 kDa protein that has many functions during all phases of infection. This protein is found in large quantities in the cell since it is utilized during DNA metabolism, cell cycle regulation and transcription regulation.37 Throughout DNA metabolism, the 32 protein protects the DNA from nuclease degradation and melts any secondary structure that might form after initial strand displacement.38 This protein-DNA interaction facilitates replisome formation and allows recombination-dependent replication to occur.11, 38 Along with protecting ssDNA, the 32 protein also enhances the functions of the T4 RNase H, but inhibits the function of 61 primase and self-loading of the 41 helicase. 32 protein increases the processivity of the lagging strand DNA polymerase by increasing the rate at which the 44/62 clamp loader loads the 45 clamp onto the DNA.11 32 protein enhances the rate of primer synthesis by 61 primase and the degradation of the primers by RNase H.11 Protein-protein interactions have also been detected between the 32 protein and both the 41 helicase and the 59 protein. 32 protein affects the loading of 41 helicase onto DNA without 59 protein.11

Limited proteolytic cleavage of the 32 protein divides the protein into three segments: the B domain (residues 1-21), the core (residues 22-253) and the A domain (residues 254-301) (Figure 1.8).39-41 Based on this information, several truncations were created to study the properties of these domains. 32-B is 32 protein minus the B domain (N-terminal domain), 32 core contains only the core domain, and 32-A is 32 protein minus the A domain (C-terminal domain). The B domain, or N-terminal domain, is responsible for the cooperative binding between 32 proteins, especially in the presence of ssDNA.11, 42 The B domain contains a LAST motif ([Lys/Arg]3 [Ser/Thr]3) that enhances the cooperativity between 32 molecules and ssDNA binding.43 Removal of the B domain results in a loss of 32-32 protein interactions, while mutations in the LAST motif affect both the protein-protein and protein-DNA interactions.40, 43

[Figure 1.8 schematic labels: B domain (N-terminal domain, from residue 1); Core Domain (from residue 22); A domain (C-terminal domain, residues 254-301).]

Figure 1.8: Domains of 32 protein. Proteolytic cleavage of the single-stranded DNA binding protein produces 3 domains, the B domain, core, and A domain. The B domain is responsible for the cooperative binding of 32 proteins, the core binds to ssDNA, and the A domain is responsible for the protein-protein interactions.

The structure of the 32 core domain was solved by XRD in the Steitz lab; this structure contains residues 22-239 with an oligonucleotide bound (PDB # 1GPC) (Figure 1.9).38 The core binds ssDNA noncooperatively, via the oligonucleotide-oligosaccharide binding domain (OB-fold).11 In the solved structure, the ssDNA bound to the core was located in the hydrophobic and positively charged pocket of the OB fold. In the pocket, the aromatic residues interact with the DNA bases while the positively charged amino acids interact with the DNA phosphate groups. Hydrogen bonds form between the protein and the DNA bases and phosphate groups, stabilizing this protein-DNA complex. Zinc, a structural metal ion, is coordinated to a histidine and three cysteines, which is suspected to stabilize the structure of the core.38, 40, 44, 45 It has been shown that the N-terminal domain and C-terminal domain do not affect the fold of the core; in fact, the C-terminal domain facilitates the binding of large substrates to the core and protects the core from protease degradation.43 The core also contains a LAST motif that facilitates binding to ssDNA. When DNA is not bound to the core, it is speculated that the C terminus of 32 protein interacts with this LAST motif to stabilize the protein.43 The C-terminal domain is very acidic, like DNA; therefore it is probable that the A domain wraps around to interact with the LAST motifs in the core and the B domain.43 The A domain is involved in interactions with the helicase assembly protein, DNA polymerase, RNase H, and other proteins involved in cell cycle/transcription regulation.11, 41 These protein-protein interactions are lost if the A domain is missing, and research has shown that the A domain is not needed for interactions with DNA.46

Figure 1.9: Structure of the 32 core. Structure of 32 protein core containing the OB fold that interacts with ssDNA. The protein was crystallized with an oligonucleotide, but the oligonucleotide could not be modeled into the binding site because of poor electron density. Figure modeled and rendered in PyMOL.23

The 32 protein binds cooperatively to single-stranded DNA and recognizes the 5’ end of the ssDNA.13, 43 The T4 32 protein binds with a footprint of 6-7 nucleotides per protein. Once one 32 protein binds to the ssDNA, additional 32 proteins are added or removed from the 3’ end of the ssDNA.11 Experiments have shown that the 32 protein can bind weakly to fork DNA substrates, but only if there are single-stranded arms. 32 protein will bind to the leading strand flap DNA, but will interact with lagging strand flap DNA only if the protein is highly concentrated.11 Truncations of the 32 protein also affect the protein-DNA interactions. 32-B binds poorly to flap DNA, unlike 32-A, which binds tightly to fork DNA. In fact, studies have shown that 32-A binds to ssDNA more tightly than native 32 protein, suggesting that the B domain enhances the binding to DNA.11

Structural and functional homologs of T4 32 have been identified in prokaryotes, eukaryotes, and archaea. Single-stranded DNA binding proteins from all organisms share the same functions: organizing the replication fork for synthesis and protecting DNA from degradation.37, 47, 48 T4 32 is similar to E. coli single-stranded binding proteins (ssb) and replication protein A (RPA) from eukaryotes and archaea.47, 49-52 Euryarchaeal organisms contain ssb that resemble RPA, while crenarchaeal organisms contain ssb that are similar to prokaryotic ssb.51 The domains of each of the proteins are very similar, each containing OB folds and C termini that are required for interactions with other DNA replication proteins. SSB functions as a homotetramer with four OB folds, compared to the heterocomplex of RPA.50, 51 RPA70, RPA32, and RPA14 interact together to bind to DNA and interact with other replication proteins.49, 50, 52 RPA70 has four OB folds to regulate binding to DNA, and the N-terminal domain is involved in protein-protein interactions.51 RPA32 forms a complex with RPA14; in turn this complex binds to RPA70, facilitating the completion of RPA70 folding and binding to DNA.37, 53 The occluded DNA binding site for RPA is around 30 nucleotides, which is substantially larger than the T4 6-8 nucleotide sites.48 Archaeal single-stranded binding proteins contain an occluded binding site of 12-20 nucleotides based on the functional dimers.50, 51, 54 Whether the ssb proteins function as a homotetramer or a heterocomplex, the structural architecture is similar between species, especially in the DNA binding domain.47, 48, 51, 53-55
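The occluded site sizes quoted above translate directly into how many proteins are needed to coat a given stretch of ssDNA. The following is a minimal sketch of that arithmetic rather than a method used in this work. It assumes complete, gap-free coverage and ignores cooperativity; the 120-nucleotide stretch, the dictionary, and the function names are hypothetical choices made for illustration.

```python
import math

# Occluded site sizes (nucleotides per protein) as quoted in the text.
# Where the text gives a range, both ends are kept as (smallest, largest).
SITE_SIZE_NT = {
    "T4 32 protein": (6, 8),
    "archaeal ssb": (12, 20),
    "eukaryotic RPA": (30, 30),
}


def proteins_to_coat(ssdna_nt, site_nt):
    """Smallest number of proteins whose sites cover ssdna_nt nucleotides."""
    return math.ceil(ssdna_nt / site_nt)


def coating_range(ssdna_nt, protein):
    """Return (fewest, most) proteins needed, from the largest and smallest site size."""
    smallest_site, largest_site = SITE_SIZE_NT[protein]
    return (
        proteins_to_coat(ssdna_nt, largest_site),
        proteins_to_coat(ssdna_nt, smallest_site),
    )


if __name__ == "__main__":
    # Hypothetical 120-nucleotide stretch of exposed lagging strand ssDNA.
    for name in SITE_SIZE_NT:
        print(name, coating_range(120, name))
```

Under these assumptions, the same stretch of ssDNA requires several times more T4 32 proteins than RPA complexes, which is the point of the site-size comparison.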

1.1.4 59 protein interactions with 41 helicase and 32 protein

The 32 protein has been shown to inhibit the activity of 41 helicase via protein-protein interactions.33 The 41 helicase has a low affinity for ssDNA; therefore the competition between 41 helicase and 32 protein for ssDNA will prevent the helicase from self-loading onto the DNA.18 41 helicase cannot load itself onto ssDNA that is coated with 32 protein, unless the 59 protein is present.35 32 protein is not needed for leading strand synthesis if the helicase self-assembles onto the lagging strand arm.11 Leading strand synthesis will not occur if 41 helicase is loaded onto the lagging strand by the 59 protein without 32 protein. On the other hand, rapid synthesis of the leading strand will occur if the 43 polymerase interacts with the 41 helicase. Coupling of the 43 polymerase and the 41 helicase is prevented when 59 protein is bound to the fork, unless 59 protein is moved by the 32 protein.11, 13 59 protein can stall replication at the fork if no other replication protein, like 32 protein, is present to shift the 59 protein from the DNA replication fork. 59 protein is unable to load the helicase onto the lagging strand in the presence of the 32 truncation 32-A.21 The 59 protein displaces or moves 32 proteins to create room for the 41 helicase to assemble onto the ssDNA.18, 22 Also, this binary complex (59+32) is believed to interact in clusters, not linearly, in order to create enough room for the 41 helicase to load.18 Research has shown that the 59-32 complex will remain at the replication fork in the presence of the helicase-ATPγS complex, but if helicase is bound to ATP, then the 59-32 complex is removed from the fork.30 This 59-32 complex is also responsible for the regulation of primer synthesis on the lagging strand.14 All three proteins, 41 helicase, 59 protein, and 32 protein, are needed to prepare the replication fork for replication as well as to stimulate primer synthesis on the lagging strand.11

Investigating specific protein-protein and protein-DNA interactions is important in understanding the replication process. The C-terminus of the 59 protein has been shown to interact with the C-terminus of the 41 helicase.35 The presence of the 59 protein drives the oligomerization of 41 helicase from monomer to the active hexamer state.13, 56

If ssDNA is absent, 41 helicase can be cross-linked to 59 protein via the cysteine at residue 215. Perhaps the binding of the lagging strand DNA to the C domain of the 59 protein prevents the 41 helicase from being in close proximity to the 59 protein. The 59 protein truncation (59∆KY, a deletion of residues 215-216) has shown a decrease in loading of 32 protein in the absence of 41 helicase and a decrease in loading of 41 helicase onto ssDNA in the absence of 32 protein.13

The 59 protein was shown to self-associate into a pentamer when bound to 32 protein.21 59 protein interacts with 32 protein in a 1:1 stoichiometry; 59 protein interacts with 32 protein, 32-B protein and the A domain of 32 protein.13, 18, 57 Fluorescence experiments determined tight binding for the 59+32-B and 59+A domain complexes, with dissociation constants in the low nM range (3 nM and 2 nM, respectively).18 Cross-linking experiments suggest that cysteine 166, located in the core of 32 protein, can cross-link to either cysteine in the 59 protein (residue 42 or 215).11 It is possible that the 32 protein interacts with the 59 protein via the C-terminal domain and/or core domain of the 32 protein. 59 protein can prevent the proteolytic cleavage between the core and A domain of 32 but not the cleavage between the B domain and the core.13 Conformational changes in the structure of 59 protein were monitored when the A domain of 32 protein binds to the 59 protein, with the overall structure of the complex resembling a prolate shape. Morrical performed hydrodynamic and cross-linking experiments that place the A domain of the 32 protein close to the N-terminal domain of the 59 protein.18 Mueser modeled the 32 protein interacting with the 59 protein, with the 32 core near the N-terminal domain and the A domain of 32 interacting with the C-terminal domain of the 59 protein (Figure 1.10).20 Benkovic modeled the 59-32 interactions based on cross-linking studies, with the 32 core domain interacting with the N-terminal domain of 59 (cross-linking 59 C42 to 32 C166), while the A domain of 32 interacts with the C-terminal domain of the 59 protein.21
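To give a sense of what dissociation constants of 2-3 nM imply, the fraction of one partner held in complex can be estimated from a simple binding isotherm. This is a hedged sketch, not the analysis used in the cited fluorescence experiments: it assumes simple 1:1 binding (consistent with the stoichiometry reported above) and that the titrated partner is in excess, so its free concentration approximates its total concentration. The function name and the example concentrations are hypothetical.

```python
def fraction_bound(free_partner_nM, kd_nM):
    """Fraction of protein in complex for simple 1:1 binding.

    theta = [L] / (Kd + [L]), where [L] is the free concentration of the
    binding partner. Both values must be in the same units (nM here).
    """
    if free_partner_nM < 0:
        raise ValueError("concentration must be non-negative")
    if kd_nM <= 0:
        raise ValueError("Kd must be positive")
    return free_partner_nM / (kd_nM + free_partner_nM)


if __name__ == "__main__":
    # Kd values quoted in the text; partner concentrations are illustrative.
    for label, kd in (("59 + 32-B", 3.0), ("59 + A domain", 2.0)):
        for conc in (1.0, kd, 30.0):
            print(f"{label}: [partner] = {conc} nM, bound = {fraction_bound(conc, kd):.2f}")
```

By definition, half of the protein is in complex when the free partner concentration equals the dissociation constant, so under these assumptions both complexes would be half-formed at only a few nanomolar partner.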

Figure 1.10: Proposed models of the 59-32 core interactions. A) Benkovic model with 32 core interacting with N-terminal domain of 59 protein and the 32 A domain interacting with the surface of the C-terminal domain of 59 protein. B) Mueser model with 32 core interacting with the C-terminal domain of the 59 protein and the 32 A domain interacting with the N-terminal domain of the 59 protein. The 32 A domain is depicted by the transparent yellow oval, the 59 protein is pink and the 32 core is blue and green. (59 PDB: 1CK1 and 32 core PDB: 1GPC) Protein complexes created in COOT, modeled and rendered in PyMOL.23, 24

The 59 protein promotes the binding of 32 protein onto fork DNA substrates that are too short for multiple 32 proteins to bind. This phenomenon would optimize the replication process, for as the 41 helicase unwinds the dsDNA, the 59 protein could attract 32 protein to the short stretch of ssDNA. The 32 protein could protect the ssDNA until primers could be synthesized in order for DNA synthesis to take place.11, 43 In contrast, the 59∆KY mutant is capable of binding to 32 protein, but is unable to attract 32 protein to fork DNA substrates that are too short.13 Both 32 protein and the 32-A truncation interact with the 59 protein in the presence of DNA; 32 protein binds more tightly to DNA than the 59 protein.13, 43 It is speculated that the 59 protein binds to the 32 protein, destabilizing the 32-ssDNA complex and the cooperative binding of multiple 32 proteins. Interactions between the 59 protein and the A domain of 32 protein disrupt the protein-DNA interaction, allowing the 41 helicase to assemble.58 This destabilization occurs either through direct contact between both proteins or by altering the structure of the 32-DNA complex.21

1.2 Ape lagging strand replication and repair proteins

The major differences in DNA replication between T4 and Ape occur during Okazaki processing and in the function of the sliding clamp/PCNA. The proliferating cell nuclear antigen (PCNA) has only a replication role in prokaryotes, while in eukaryotes and archaea, this protein also has a role in cell cycle regulation, translation, and even post-translational modification.7, 59 Processing of the lagging strand flaps varies between prokaryotes and eukaryotes/archaea. For example, removal of the flap by RNase H/FEN-1 varies in the presence of the PCNA; PCNA increases the processivity of FEN-1 but increases the activity of RNase H only on nicked substrates.60 DNA ligase is ATP-dependent in eukaryotes and archaea, as compared to the NAD+-dependent ligases in prokaryotes.7 The ability of PCNA to interact with the proteins responsible for Okazaki processing allows for the proper and timely completion of lagging strand replication. The main proteins involved in this important process include the DNA polymerase, DNA ligase, and flap endonuclease 1. PCNA is very important for lagging strand maturation because it recruits and organizes the various proteins needed to join the Okazaki fragments together.

The process of lagging strand replication varies between prokaryotes and eukaryotes/archaea (primer/DNA removal via RNase H vs. primer/DNA displacement via DNA polymerase, respectively) (Figure 1.11). In archaeal organisms, such as Ape, the lagging strand is primed by the polymerase α/primase complex, which synthesizes around 10 bp of RNA, followed by 10-20 bp of DNA.61, 62 The presence of PCNA facilitates the switching of polymerase α to polymerase δ after primer synthesis, allowing the initiation of lagging strand synthesis by polymerase δ. As the polymerase reaches the next completed Okazaki fragment downstream, the polymerase displaces the RNA/DNA primer, creating a flap. The polymerase dissociates from the PCNA and then the flap is processed. Depending on the length of the flap, there are two modes of flap removal (short and long). Flap endonuclease-1 removes a short flap (up to 12 nucleotides), creating a nick in the DNA. A long flap (20-30 nucleotides) is protected by RPA (single-stranded binding proteins) until Dna2 binds to the flap. Dna2 is an endonuclease with ATPase and helicase activities; RPA stimulates Dna2 activity, allowing Dna2 to cleave the flap until no RPA is bound, leaving a short flap processable by FEN-1.61 PCNA recruits FEN-1 to the 5’ flap, where FEN-1 cleaves the flap. When FEN-1 is released from PCNA, DNA ligase interacts with PCNA and is coordinated to the nicked portion of DNA. DNA ligase repairs the nick in the DNA, joining two Okazaki fragments together. This entire process suggests that the PCNA translocates in both directions and gathers at specific DNA structures. It remains at these sites while anchoring proteins which repair the dsDNA (Figure 1.11).59, 61, 63
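The two modes of flap removal described above can be summarized as a small decision rule. The sketch below is only an illustration of the text, not a model from this work: the length thresholds are the ones quoted in the paragraph, the function and constant names are hypothetical, and flap lengths between the two quoted ranges are deliberately left unclassified because the text does not assign them to either mode.

```python
# Length ranges quoted in the text (nucleotides).
SHORT_FLAP_MAX_NT = 12                       # "short flap (up to 12 nucleotides)"
LONG_FLAP_MIN_NT, LONG_FLAP_MAX_NT = 20, 30  # "long flap (20-30 nucleotides)"


def flap_pathway(flap_nt):
    """Return the ordered processing steps for a 5' flap of the given length.

    Returns None for lengths the text does not assign to either mode.
    """
    if flap_nt < 1:
        raise ValueError("flap length must be at least 1 nucleotide")
    if flap_nt <= SHORT_FLAP_MAX_NT:
        return ["FEN-1 cleaves the flap", "DNA ligase seals the nick"]
    if LONG_FLAP_MIN_NT <= flap_nt <= LONG_FLAP_MAX_NT:
        return [
            "RPA coats the flap",
            "Dna2 shortens the flap",
            "FEN-1 cleaves the flap",
            "DNA ligase seals the nick",
        ]
    return None  # not covered by the ranges given in the text


if __name__ == "__main__":
    for length in (8, 15, 25):
        print(length, flap_pathway(length))
```

In both modes the final two steps are the same; the long-flap mode only adds the RPA and Dna2 steps that reduce the flap to a length FEN-1 can process.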

Figure 1.11: Lagging strand maturation in prokaryotes, eukaryotes, and archaeal organisms. There are several pathways utilized to effectively synthesize and join Okazaki fragments on the lagging strand of DNA. The columns (A-C) are examples of lagging strand maturation in virus/prokaryotes (A), eukaryotes/archaeal short flap (B) and long flap (C), and the rows (I-VI) depict stages of processing. I. DNA polymerase (blue circle) is synthesizing the complementary strand of DNA (black line) on the lagging strand. II. In the case of B and C, the polymerase runs into the downstream Okazaki fragment and displaces the RNA primer (red line) and part of the DNA. III. For longer flaps, RPA (light orange circle) coats the ssDNA, recruiting Dna2 (gray hexagon) to bind and shorten the flap. IV. In viruses, RNase H (yellow circle) removes the primer as 32 protein (yellow circle) binds to ssDNA in front of the replisome. In prokaryotes, DNA polymerase I removes the primer. As for eukaryotes/archaea, FEN-1 recognizes the 5’ flap and cleaves the flap. In all cases, DNA polymerase fills the remaining gap. V. DNA ligase binds to the nicked portion of DNA and seals the nick. VI. Completed product of two joined Okazaki fragments. PCNA (green oval) recruits many of the replication proteins to the sites of repair and tethers the proteins to the DNA in order to increase the fidelity/processivity of the proteins.

1.2.1 Proliferating cell nuclear antigen

Proliferating cell nuclear antigen (PCNA) was originally isolated around the late 1970s from patients with the autoimmune disease systemic lupus erythematosus.64 In the early 1980s PCNA was again isolated from cells during the S phase of the cell cycle, where this protein was expected to facilitate the regulation of the cell cycle.65, 66 Later, this antigen/cyclin was determined to have a role in the replication process.67 Research has shown that the PCNA is a homologue of the sliding clamp and/or the DNA polymerase accessory protein found in viruses, prokaryotes, and eukaryotes. Archaeal PCNA, similar to eukaryotic PCNA, also functions during DNA repair, lagging strand DNA replication, and DNA modification/packaging.7, 9, 59 An abundance of this protein is found in the nuclei of the cell during the S phase, suggesting that the PCNA might have roles in other processes.59, 68 Further investigation into eukaryotic and archaeal PCNA has shown that these protein complexes are vital for cell cycle control/regulation, transcription, and post-translational modifications.9, 59, 63, 68 The ability of this protein clamp to interact with a variety of proteins that are involved in many different processes makes the PCNA very important to the survival of the organism. The PCNA interacts with several proteins (p21, p57, gadd45, cyclin-D, etc.) that regulate the cell cycle at multiple checkpoints, more specifically, the transitions between the phases.

These transitions are vital checkpoints that assess whether replication will continue or be stalled based on the condition of the DNA.59, 63, 69, 70 These protein-protein interactions would allow the damaged DNA to be recognized and repaired before continuing into the S phase, where the genomic DNA is replicated.59, 66, 68 PCNA also interacts with proteins that are involved in cell growth arrest and in signaling for cell apoptosis.59, 63 In viruses, such as baculovirus or bacteriophages T4 and RB69, the sliding clamp (PCNA homologue) was determined to be needed for transcription, predominantly during the later stages of infection.68, 71, 72 PCNA has also been detected during post-translational DNA modification via interactions with a DNA cytosine-5 methyltransferase after replication has occurred.68 The assembly and cohesion between sister chromatids occurs in the presence of PCNA (shown in Drosophila and SV40), showing interactions between the PCNA and chromatin.59, 63 The large abundance of PCNA in the cell allows this protein to act as a scaffold to facilitate the function of a variety of proteins involved in important activities of the cell.

The PCNA is vital for the proper and timely completion of DNA replication, including the repair of the DNA. PCNA has roles in DNA repair processes such as nucleotide-excision, base-excision, and mismatch repair. Here, PCNA helps coordinate the appropriate proteins to the sites of damage in order to allow the DNA to be repaired.59, 63, 68 There are similarities between the PCNA and what is known as the 9-1-1 complex found in eukaryotic organisms.61

The 9-1-1 complex (Rad9/Rad1/Hus1) associates near the damaged section of DNA, signaling for the proper repair proteins to respond to that particular region of the DNA. Both PCNA and 9-1-1 protein complexes are similar in structure and interact with the single-stranded DNA binding protein, DNA ligase and the flap endonuclease-1 during the repair of the damaged DNA.61, 73 Proper repair of the DNA allows the cell to continue through the cell cycle, allowing the organism to survive.

This project investigates the transient interactions that occur during lagging strand processing and repair. PCNA interacts with DNA polymerase, on both leading and lagging strands, with flap endonuclease-1 (FEN-1), and with DNA ligase. Depending on the organism, this multifunctional PCNA can interact with multiple subunits of the DNA polymerase, as well as another type of nuclease, RNase H.

The structure and function of PCNAs are highly conserved among all organisms, but there is minimal primary sequence similarity (Figure 1.12).59 The amino acid sequence varies, as does the length of the genes, all of which affects the subunit size.74 The sequence identity between the archaeal PCNA subunits from different species ranges from 10-30% identical residues (Table 1.3). Depending on the species, the PCNA assembles as a homotrimer, homodimer, or heterotrimer; each type contains six domains that form a ring-like structure. Most PCNA found in prokaryotes, eukaryotes, and viruses are composed of a single subunit repeat, while some archaeal PCNA are composed of three different subunits. Prokaryotic PCNA assembles as a homodimer with three domains per subunit, while eukaryotic PCNA assembles as a homotrimer with two domains per subunit. Crenarchaeal PCNA assembles as a heterotrimer (two domains per subunit), while the euryarchaeal PCNA is a homotrimer (two domains per subunit).74
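The statement that every assembly type yields a six-domain ring follows from multiplying the number of subunits by the number of domains per subunit. The short check below only restates the values given in the paragraph above; the dictionary and function names are hypothetical, introduced for illustration.

```python
# (number of subunits, domains per subunit) for each assembly described in the text.
PCNA_ASSEMBLIES = {
    "prokaryotic homodimer": (2, 3),
    "eukaryotic homotrimer": (3, 2),
    "crenarchaeal heterotrimer": (3, 2),
    "euryarchaeal homotrimer": (3, 2),
}


def ring_domains(subunits, domains_per_subunit):
    """Total number of domains in the assembled ring."""
    return subunits * domains_per_subunit


if __name__ == "__main__":
    for name, (subunits, per_subunit) in PCNA_ASSEMBLIES.items():
        print(f"{name}: {subunits} x {per_subunit} = {ring_domains(subunits, per_subunit)} domains")
```

Each entry evaluates to six domains, which is why dimeric and trimeric clamps can share the same overall ring architecture.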

[Figure 1.12: Clustal W alignment of 13 PCNA subunit sequences, shown in four blocks. Rows, in order: S. cerevisiae, H. sapiens, S. solfataricus 3, S. tokodaii, P. furiosus, A. pernix Ape0162, A. pernix Ape0441, S. solfataricus 2, A. pernix Ape2182, A. fulgidus, S. solfataricus 1, E. coli, and T4 gp45.]

Figure 1.12: Sequence alignment of PCNA subunits. The primary sequences of PCNA from viruses, prokaryotes, eukaryotes and archaea are aligned via the computer program Clustal W.15, 76 There is very little sequence similarity between the domains, or even between species. (.) represents similar amino acid residues while (:) represents conserved amino acids in the majority of the subunits. Sequences used in the alignment include Saccharomyces cerevisiae (S. cerevisiae), Homo sapiens (H. sapiens), Sulfolobus solfataricus (S. solfataricus), Sulfolobus tokodaii (S. tokodaii), Pyrococcus furiosus (P. furiosus), Aeropyrum pernix (A. pernix), Archaeoglobus fulgidus (A. fulgidus), Escherichia coli (E. coli), and Bacteriophage T4 (T4).

Table 1.3: Sequence homology of PCNA subunits from different species. The percent identity and similarity between several PCNA subunits were calculated with a computer program bl2seq.75 The PCNAsubunits from Aeropyrum pernix were compared against Pyrococcus furiosus (PFU), Sulfolobus solfataricus (SSO), Archaeoglobus fulgidus (AFU), Homo sapiens (human), Escherichia coli (E. coli) and Bacteriophage T4 (T4). SSO 0397, E. coli, and T4 had no significant similarity between the sequences; therefore a value of 0 was assigned.

PCNA subunit   Ape0162                   Ape0441                   Ape2182
               % Identity   % Positive   % Identity   % Positive   % Identity   % Positive
Ape0162        100          100          28           55           31           51
Ape0441        28           55           100          100          23           46
Ape2182        31           51           23           46           100          100
PFU            36           58           30           57           29           50
AFU            20           44           23           43           25           45
SSO 0405       27           54           24           53           26           47
SSO 1047       25           50           24           51           21           43
SSO 0397       0            0            21           41           25           45
Human          20           45           22           48           22           43
T4             0            0            0            0            0            0
E. coli        0            0            0            0            0            0
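The percent identity values in Table 1.3 were produced by bl2seq. As a rough illustration of what such a number measures, the sketch below counts identical residues in a pair of sequences that have already been aligned. It is a simplified stand-in and not the bl2seq algorithm: gap columns are skipped here, and the function name and the toy sequences in the usage note are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free aligned columns that hold identical residues.

    Both inputs must already be aligned (same length, '-' for gaps).
    This is a simplified illustration, not the bl2seq scoring scheme.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

For example, calling `percent_identity("LAPK-V", "LAPRIV")` on these made-up fragments compares five gap-free columns, four of which match.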

The X-ray structures of PCNA homologues from H. sapiens, S. cerevisiae, E. coli, bacteriophages RB69 and T4, S. solfataricus, P. furiosus, S. tokodaii, and A. fulgidus have been determined; each of the ringed structures has a total of 6 domains, with 2-3 domains per subunit (Figure 1.13).69 Each domain contains 2-3 β strands with 4-6 α helices, thus there is a total of ~6 β sheets and ~12 α helices per subunit. The globular domains within each subunit are connected via an interdomain connecting loop (IDCL) that is approximately 10-15 residues long.77 The N-terminal domain of one globular domain is connected to the C-terminal domain of the next globular domain, thus forming the IDCL.78 All of the PCNA structures have similar architectures with a common motif (βαβββ) found in each of the domains (Figure 1.14a).72, 77, 79-86

The β sheet structure resembles a Greek key where two antiparallel β strands pack together at the interface between each monomer (Figure 1.14b).80, 81, 86 α-Helices line the inner section of the ring and pack against the β sheets via hydrophobic interactions. Electrostatic and hydrophobic interactions are essential for the PCNA subunit-subunit interactions and PCNA-DNA interactions.59, 63, 77, 80-83, 85


Figure 1.13: Structures of PCNA from several species. A) heterotrimeric Sulfolobus solfataricus (2IX2); B) monomeric Sulfolobus tokodaii (1UD9) (crystallized as a monomer with 4 molecules in the asymmetric unit); C) homodimeric Escherichia coli (2POL); D) homotrimeric Bacteriophage T4 (1CZD); E) the superimposed structures of human (orange), SSO (cyan), and E. coli PCNA (green). Each species has the same overall structure, whether the PCNA is a dimer or trimer. The models are very similar, with minor differences appearing around the IDCL. Structures prepared using PyMOL and COOT.23-25


Figure 1.14: Architecture of human PCNA. A) Overall structure of human PCNA (1AXC) showing the presence of three subunits, containing six domains comprised of 2-3 β strands and 4-5 α helices. Each domain is connected by the interdomain connecting loop (IDCL) (noted by black arrow) which is essential for protein-protein interactions. The blue box contains one domain in a subunit. B) Structure of the Greek key (βαβββ, green) found in each domain. Structures modeled and rendered in PyMOL.23


Views of the electrostatic potential surface of PCNA show that the outer region of the ring is negatively charged, while the inner region of the ring is predominantly positively charged.82, 83 This ring-shaped protein has an average outer diameter of 80-85 Å, with an inner diameter of around 30-35 Å, large enough for dsDNA (20-25 Å) to pass through. This arrangement suggests that these residues would allow the PCNA to travel along the dsDNA, but allow particular DNA structures to block the progression of PCNA. The subunits interact in a head-to-tail manner (the N-terminus of one monomer interacts with the C-terminus of the next monomer), with hydrogen bonds and hydrophobic interactions stabilizing the interactions between the subunits.59, 87 The number of hydrogen bonds stabilizing the subunit interactions varies between species; human and yeast PCNA have 8 hydrogen bonds, while bacterial and archaeal PCNA have around 4 hydrogen bonds. The number of hydrogen bonds present between the subunits has been linked to the stability of the protomer, and has prompted speculation about how the ringed structure forms around dsDNA.88-90

The IDCL is an important site for the protein-protein interactions between PCNA and the multitude of proteins that interact with it. These protein-protein interactions are stabilized through hydrophobic and electrostatic interactions with the IDCL and the pocket below the IDCL.77, 86

Due to the circular structure of PCNA, a clamp loader, replication factor C (RFC), is required to load the PCNA onto dsDNA. Replication factor C is a member of the AAA+ family of ATPases and is essential for loading PCNA in prokaryotes, eukaryotes, and archaea.90 Recently, research has suggested that archaeal PCNA can self-assemble onto dsDNA due to weaker interactions between the PCNA subunits, while the majority of the literature has reported that the RFC is needed for loading PCNA in all other organisms.9, 88-90 The RFC is a multi-subunit protein that binds to PCNA and, through an ATP-dependent reaction, loads the PCNA onto dsDNA.59, 89 RFC is composed of five subunits, γ3δδ’ (E. coli), or gp44(4)/62(1) (bacteriophages), or A-E/1-5 (archaeal/eukaryotic subunits), where A/1 is the large subunit and B-E/2-5 are the small subunits.87 The five subunits form a spiral conformation which allows three of the five subunits (A-C) to bind to the PCNA, disrupting the interactions between the PCNA subunits.46, 63, 87 RFC recognizes specific DNA structures, like DNA-RNA primer duplexes, and assembles at the 3’ end in the proper orientation, which allows interaction with other proteins, like the DNA polymerase.59, 66

Experiments have shown that after RFC binds to PCNA, the clamp loader binds ATP, triggering the opening of the PCNA ring structure.91-93 Molecular dynamics simulations show that the flat ring is distorted via bending and twisting motions that occur when the RFC binds to the PCNA.94 This ring is positioned around DNA, where upon ATP hydrolysis, the protein-protein interactions between the clamp and the clamp loader are weakened, allowing the PCNA ring to close and RFC to dissociate.59 It has also been suggested that the RFC is involved in unloading the PCNA after the lagging strand repair.59, 95

Once loaded onto the dsDNA, PCNA is capable of traveling along the DNA to facilitate repair and replication. Generally, most proteins interact with the C-terminus (C-face) of the PCNA subunit as well as with the IDCL. PCNA interacting proteins (PIP) contain a particular motif, the PIP box, that has been shown to interact with a hydrophobic pocket under the interdomain connecting loop.63, 77, 90 In eukaryotic PCNA, there are two conserved amino acids (Leu126 and Ile128) that are essential for these interactions, along with the PIP box located at this hydrophobic pocket.88 The PIP box can be located at either the N-terminus or C-terminus of the target proteins (p21, FEN-1, DNA ligase, RFC, etc.) and contains a specific sequence of residues, Qxx(I/L/M)xx(F/Y)(F/Y) (where x represents any amino acid).59, 69, 86 The PIP sequence tends to fold into a 3₁₀ helix (a conserved motif seen in PIP) upon binding to the hydrophobic pocket on the PCNA subunit. This structural formation presents the side chains in a manner that would facilitate their insertion into the hydrophobic patch.63, 78 There are deviations in the PIP box sequence, which might facilitate the specificity of binding as well as regulate the protein-protein interactions needed for DNA replication and repair. These differences in the sequences may also affect the strength of multiple interactions that could induce competitive binding interactions between the different subunits of PCNA.63 Also, sequence variations within the IDCL region of PCNA, especially in heterotrimers, promote specificity in the binding of PCNA interacting proteins.90
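The canonical PIP box consensus quoted above, Qxx(I/L/M)xx(F/Y)(F/Y), can be written as a regular expression and used to scan a protein sequence for candidate motifs. The following is a minimal sketch under that assumption; the function name and the toy sequences in the usage note are hypothetical, and, because known PIP boxes deviate from the consensus, a strict pattern match of this kind would miss non-canonical motifs.

```python
import re

# Consensus from the text: Q-x-x-(I/L/M)-x-x-(F/Y)-(F/Y), x = any residue.
PIP_BOX = re.compile(r"Q..[ILM]..[FY][FY]")


def find_pip_boxes(protein):
    """Return (1-based start position, matched motif) for each canonical PIP box."""
    sequence = protein.upper()
    return [(m.start() + 1, m.group()) for m in PIP_BOX.finditer(sequence)]
```

With a made-up sequence such as `"MAAQKLIEEFFGG"`, the function reports one candidate motif starting at residue 4; replacing the isoleucine with alanine removes the match.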

1.2.2 PCNA - DNA polymerase interactions

DNA polymerases are responsible for the synthesis of DNA during replication and repair. The function and overall architecture of DNA polymerases are conserved throughout the many families, but each of the domains is different, including the residues in the active site.96 There are about five (or six) families (A - D, X and Y) of DNA polymerases based on sequence similarity; family A consists of replicative and repair polymerases (e.g. pol I, T7 polymerase), family B consists of replicative polymerases which have a 3’-5’ exonuclease activity (mainly eukaryotic, some archaeal and bacteriophage), family C consists of bacterial replicative polymerases, family D consists of archaeal replicative polymerases that are not well characterized, and family X (and Y) consists of polymerases involved in different types of DNA repair.96 Euryarchaeal organisms possess DNA polymerases from both the B and D families. In the case of crenarchaeal organisms, like Ape, there are sequences for two DNA polymerases that are both in the B family.97, 98 The family D DNA polymerase is heterodimeric, with a small subunit (DP1) similar to polymerase subunits responsible for primer and DNA synthesis, and a large subunit (DP2) that has no similarity to other polymerases.89, 90

DNA polymerase is responsible for the synthesis of daughter strands of DNA. The DNA polymerase adds nucleotides to a 3’ end, complementary to a template, in a proposed two-metal-ion mechanism.99, 100 In the active site of the DNA polymerase, both the previously added nucleotide and a new dNTP are coordinated between three carboxyl groups from the polymerase and two Mg2+ ions. One Mg2+ ion activates the 3’ OH group of the previous nucleotide to attack the α-phosphate on the dNTP (joining the nucleotides together), while the other Mg2+ ion coordinates to the β and γ phosphates.96 Both metal ions are important in the stabilization of the structure as well as neutralizing the active site.

The processivity of the DNA polymerase is increased on the leading and lagging strands by the presence of PCNA; the clamp tethers the polymerase to the DNA, allowing longer segments of DNA to be synthesized at one time.63 PCNA is also needed for “pol switching” in eukaryotic lagging strand replication; the presence of PCNA, RFC, pol α and pol δ is required to switch from the priming DNA polymerase α to the replicative DNA polymerase δ.62 As the DNA polymerase approaches the downstream Okazaki fragment, PCNA prevents the polymerase from dissociating from the DNA. This allows read-through by the polymerase to replace the RNA/DNA primer stretch with the correct DNA segment.62

PIP boxes are located at the C-terminus of DNA polymerase, which would position the PCNA behind the DNA polymerase as it synthesizes the complementary strand of DNA.88 The PIP boxes interact with the C-face of the PCNA subunit, which orients the polymerase on the elongation side of the clamp.63 The structure of a peptide of human DNA polymerase δ (subunit p66) complexed to human PCNA shows that the PIP box from p66 is inserted into the hydrophobic pocket under the IDCL (Figure 1.15).78 Studies have shown that DNA polymerase binds to the RNA/DNA junction with a dissociation constant in the nanomolar range, while the DNA polymerase has a weaker affinity for DNA alone. This protein-protein interaction increases the binding affinity of DNA polymerase for DNA, therefore increasing the processivity of the DNA polymerase.78


Figure 1.15: Structure of PCNA-DNA polymerase peptide. Left: X-ray structure of human PCNA-DNA polymerase-δ, subunit p66 peptide complex (1U76). The PCNA protein is in green and the DNA polymerase peptide is in red. Right: The PIP box from p66 forms a 3₁₀ helix that interacts with the hydrophobic pocket under the IDCL of PCNA. Structures modeled and rendered with PyMOL.23


1.2.3 PCNA - Flap Endonuclease 1 interactions

Flap endonuclease 1 (FEN-1) is a structure-specific 5’ endonuclease that is responsible for the removal of the RNA primers and facilitates DNA repair. Functional and structural homologues are found in prokaryotes, eukaryotes, and archaea, but the manner of activity varies (e.g. RNase H vs. FEN-1). FEN-1 is crucial to survival of the organism; most mutations and deletions to this protein tend to be lethal.73 Early FEN-1 homologues were discovered to have a variety of activities, including cleavage of mismatched sequences, cleavage of 5’ ends of dsDNA, and cleavage of 5’ end flap DNA.73 This endonuclease recognizes the 5’ end of RNA and/or DNA and binds to the substrate, where cleavage will occur at the base of the 5’ flap. FEN-1 activity is inhibited if the 5’ end is blocked by another protein (such as RPA).61 Further studies have shown that FEN-1 prefers substrates with a one-nucleotide 3’ flap overhang (double flap), although FEN-1 will also act on flaps up to 10-12 nucleotides long.73

Several structural components appear to be important to the activity of FEN-1. The flexible helical loop is speculated to be involved in moving the DNA flap through the protein, following an induced conformational change.73 Recently, the structure of the FEN-1 homologue, (species) RNase H, showed that the 5’ arm is positioned below this bridge region.60 Two metal binding sites and a helix-3-turn-helix motif are found below the bridge and coordinate substrate binding. A cluster of basic residues, two groups of acidic residues, and both metals are shown to interact with the substrate.73

It has been shown that PCNA is able to enhance both exonuclease and endonuclease activities of FEN-1.61 Acetylation of FEN-1 inhibits activity and reduces substrate binding, but allows FEN-1 to interact with PCNA.61 This protein-protein interaction stimulates the activity of FEN-1, allowing the protein to cleave the flap DNA.61 Research has shown that the KM of FEN-1 for the flap substrate is lowered in the presence of PCNA, suggesting that PCNA stabilizes the nuclease at the site of repair.78 Two protein-peptide (human, AFU) and one protein-protein (SSO) structures of the PCNA-FEN-1 complex have been solved; all of the structures suggest that there might be two modes of interaction that could be vital for activating FEN-1 (Figure 1.16).84, 90 A flexible hinge region between the core domain and the C-terminal region is believed to be important for controlling the activity of FEN-1.79 This hinge region allows the FEN-1 to lock down into an inactive conformation, unable to cleave DNA, until the enzyme is replaced by the next repair protein.79, 90 The C-terminal tail of FEN-1 contains the PIP motif that folds into a βαβ motif (human) or a 3₁₀ helical turn (archaeal) that is involved in the interaction with the C-face of PCNA. This interaction forms a β-zipper with the PCNA residues, which enhances the nuclease substrate binding.79, 84, 86

Figure 1.16: PCNA-FEN-1 protein-peptide structure. AFU PCNA-FEN-1 complex (green, 1RXZ) superimposed onto the human PCNA-FEN-1 complex (blue, 1U7B), with the FEN-1 peptides in orange. The C-terminal tail of FEN-1 forms a 3₁₀ helical turn, which interacts with the IDCL as well as the hydrophobic pocket below the IDCL. Structures superimposed with COOT and rendered in PyMOL.23-25


1.2.4 PCNA - DNA ligase interactions

DNA ligase is an enzyme that catalyzes the formation of a phosphodiester bond on single or double-stranded DNA. To date, three different mammalian families of DNA ligase have been isolated, DNA ligase I, III, and IV, while prokaryotes and archaea contain one homologue similar to DNA ligase I.101, 102 DNA ligase I is responsible for lagging strand maturation and has been shown to interact with PCNA.103, 104 There are two classes of DNA ligases, NAD+-dependent and ATP-dependent; bacterial ligases are NAD+-dependent, while eukaryotic, bacteriophage, and archaeal ligases are ATP-dependent.105, 106 While the structures are varied, function is highly conserved and there are three steps that occur with all ligases to seal a nick in DNA. Step 1 involves DNA ligase binding to ATP or NAD+ through a conserved active site lysine, followed by the release of pyrophosphate (ATP) or nicotinamide mononucleotide (NAD+). In step 2, the remaining AMP moiety transfers from the ligase to the 5’ phosphate group on the nicked DNA, activating the 5’ phosphate group. During the final step, the 3’ hydroxyl group on the nicked DNA attacks the 5’ phosphate group, releasing the AMP and sealing the nick.102, 106-108 Aeropyrum pernix DNA ligase is an ATP and ADP-dependent ligase that utilizes Mg2+, Mn2+, Ca2+, or Co2+ for sealing of the nick.105

Conserved domains in archaeal DNA ligase include an adenylation domain, an OB fold domain, and a DNA binding domain or helix-hairpin-helix domain (Figure 1.17a).101, 106, 107 The adenylation domain contains a conserved motif (KXDG) that interacts with the adenylate cofactor.105 The OB fold is a flexible linker that connects the adenylation domain to the C-terminal domain; this domain binds to dsDNA, which increases the activity of the adenylation domain. The helix-hairpin-helix domain interacts with DNA in a nonspecific manner, while the DNA binding domain, found in higher eukaryotes, binds to DNA as well.101, 106 This DNA binding domain is located in the non-catalytic N-terminal domain that has a high affinity for DNA.102 The structure of human DNA ligase bound to nicked DNA shows that the ligase forms a circle around the nicked substrate (Figure 1.17b).109 Wrapping around the substrate allows the OB fold domain and the DNA binding domain to interact with both sides of the DNA as well as position the adenylation domain at the nicked site.109

Figure 1.17: Structure of S. solfataricus DNA ligase. A) Structure of SSO DNA ligase (2HIV) in an open conformation with the OB fold domain (yellow ribbon), adenylation domain (green ribbon), and DNA binding domain (red ribbon). B) Structure of human DNA ligase (1X9N) in the closed conformation bound to DNA. The adenylate moiety binds to the adenylation domain at the site of the nicked DNA, while the DNA ligase wraps around the DNA, orienting the OB fold domain and the DNA binding domain to bind to the dsDNA.77 Structures modeled and rendered in PyMOL.23


Evidence shows that the N-terminal domain of a single DNA ligase interacts with the IDCL of one subunit of the trimeric PCNA.101, 108 The DNA ligase-PCNA interaction is important for lagging strand maturation, but it is unclear whether this interaction stimulates ligase activity.101, 104 Mutations or removal of the N-terminal domain of DNA ligase inhibited this protein-protein interaction, but DNA ligase remained active.101, 103, 104 Evidence suggests that this interaction is important for targeting DNA ligase to the nicked region; DNA ligase recognizes nicked DNA substrates in vitro, and the presence of PCNA would facilitate the recognition of the nicked DNA in vivo.101, 104 Small angle X-ray scattering experiments indicate that SSO DNA ligase interacts with one subunit of PCNA in an “open conformation” in the absence of DNA.77 In the presence of DNA, PCNA clamps around the duplex and directs the DNA ligase to the nicked substrate, where a conformational change in DNA ligase from an open conformation to a closed conformation occurs. Like FEN-1, DNA ligase has a flexible hinge region that is speculated to facilitate the structural conformational changes between the open and closed states.77 Since DNA ligase wraps around the DNA, it prevents other replication proteins, including another ligase, from interacting with the remaining two sites on the PCNA.90


Chapter 2: Experimental methods

2.1 Vector cloning

2.1.1 Polymerase chain reaction (PCR)

Gene amplification from genomic DNA or a plasmid source was achieved using PCR with either the Proofstart DNA polymerase (Qiagen) or Easy A cloning DNA polymerase (Stratagene) and an Eppendorf Mastercycler Personal PCR machine. The reaction mixtures for both PCR reactions were similar and were prepared as described in Table 2.1. The primer mix was prepared by adding 2 µL of 250 µM forward primer and 2 µL of 250 µM reverse primer and diluting the mixture to 50 µL with 10 mM Tris pH 8.0 (final primer concentration of 10 µM per primer). The PCR reaction contained 2 µM of both primers, 600 µM dNTP, 1X PCR buffer (Proofstart buffer contains 15 mM MgSO4 and Easy A buffer contains 2 mM MgCl2), and 2.5 units of enzyme.

Table 2.1: PCR reaction mixture composition. Total reaction volume size is 50 µL.

Reagents                   Proofstart DNA polymerase   Easy A DNA polymerase
10X PCR buffer             5 µL                        5 µL
dNTP                       3 µL                        3 µL
Primer mix                 10 µL                       10 µL
Sterile water              30.0 µL                     30.5 µL
DNA polymerase             1 µL                        1 µL
Genomic or plasmid DNA     1 µL                        0.5 µL
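The primer concentrations quoted above follow from the dilution relation C1·V1 = C2·V2, and the volumes in Table 2.1 should add up to the stated 50 µL. The short sketch below reproduces that arithmetic as a check; the function and variable names are illustrative only and are not part of the original protocol.

```python
def diluted_concentration(stock_conc, stock_vol, final_vol):
    """Concentration after diluting stock_vol of stock to final_vol (C1*V1 = C2*V2)."""
    return stock_conc * stock_vol / final_vol


# Primer mix: 2 uL of 250 uM primer diluted to 50 uL.
primer_mix_uM = diluted_concentration(250.0, 2.0, 50.0)

# PCR reaction: 10 uL of primer mix in a 50 uL reaction.
primer_in_pcr_uM = diluted_concentration(primer_mix_uM, 10.0, 50.0)

# Volumes (uL) from Table 2.1, in row order.
proofstart_volumes = [5.0, 3.0, 10.0, 30.0, 1.0, 1.0]
easy_a_volumes = [5.0, 3.0, 10.0, 30.5, 1.0, 0.5]

print(primer_mix_uM, primer_in_pcr_uM)
print(sum(proofstart_volumes), sum(easy_a_volumes))
```

The two primer values correspond to the 10 µM primer mix and the 2 µM final concentration stated in the text, and both volume columns sum to the 50 µL reaction volume.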


A standard PCR cycle program was used for the reactions (Table 2.2), with the annealing temperature dependent on the length and composition of the primer and the concentration of salt present in the reaction. Generally, the annealing temperature for these reactions ranged from 55-60 °C. Higher annealing temperatures were required with higher salt concentrations, with high GC content primers, and with longer primers. The forward primers contained either a CACC topoisomerase cloning site or the Nde I restriction sequence, while the reverse primers had a stop codon and sometimes included Hind III or Bam HI restriction sites. All additional sequences were used to facilitate the insertion of the gene into the cloning vector. The PCR reaction begins with an initial heating step at 95 °C to activate the DNA polymerase, followed by multiple extension steps at 72 °C and a final extension step at 72 °C in order to ensure completion of the amplified DNA. The size of the gene determines the number of cycles of steps 4-6 as well as the length of time for each step.

Table 2.2: A PCR cycler template program. Typical program utilized to synthesize the gene including the temperature and time for each step.

Step   Temperature (°C)   Time (min)     Notes
1      95                 5              Activate polymerase
2      Annealing          1              Primer dependent; typically 53 ± 3 °C
3      72                 2              Primer extension
4      95                 1              DNA duplex melting
5      Annealing          1              Primer dependent; typically 53 ± 3 °C
6      72                 2              Primer extension
7      repeat             25-30 cycles   Repeat steps 4-6 for amplification
8      72                 20             Final extension to complete all products
9      4                  Hold           Reaction is chilled for further processing
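The programmed hold times in Table 2.2 give a lower bound on the length of a run. The sketch below simply adds them up for a chosen number of repeats of steps 4-6; it assumes the table is read literally (steps 1-3 once, then the repeated block, then the final extension) and ignores the time the block spends ramping between temperatures. The function name is illustrative.

```python
def program_minutes(cycles):
    """Sum of programmed hold times (min) for the program in Table 2.2."""
    initial = 5 + 1 + 2        # steps 1-3: activation, annealing, extension
    per_cycle = 1 + 1 + 2      # steps 4-6: melting, annealing, extension
    final_extension = 20       # step 8
    return initial + cycles * per_cycle + final_extension


for n in (25, 30):
    print(n, "cycles:", program_minutes(n), "min")
```

For the 25-30 cycles listed in the table this works out to roughly two to two and a half hours of hold time before the 4 °C hold.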


The PCR products were purified using a MinElute PCR Purification kit (Qiagen) to remove the excess primers and unused dNTPs. The size of the PCR products was verified on agarose gels. The agarose gels run for analysis of PCR and cloning products were generally 1 % (w/v) agarose prepared in 1X TAE pH 8.0 (40 mM Tris-acetate, 1 mM EDTA). An equal volume of either 50 % (v/v) glycerol or a gel-loading buffer (0.01 % (w/v) bromophenol blue, 40 % (v/v) glycerol, autoclaved Milli Q water) was added to each sample to assist in loading the samples into the wells. The gels were run at 90 V for 1-2 hr in the 1X TAE pH 8.0 buffer. The PCR products were cut from the gel, extracted, and concentrated with a MinElute Gel Extraction kit (Qiagen). Once purified, the PCR product was ready for insertion into the cloning vector. Typically, only 10 µL of the PCR product remained after all the purification steps; therefore the concentration of the product was estimated from the agarose gel band.

2.1.2 Restriction cloning

The PCR product from the Easy A reaction was inserted into the pDrive cloning vector (Qiagen) (Table 2.3). The 3’ A overhang, created by the Easy A polymerase, facilitates the insertion of the PCR product into the cloning vector, but the directionality of gene insertion is not controlled. Generally, 1 µL of the MinElute gel-extracted PCR product exceeds the recommended molar excess (5X to 10X) for this reaction. Following the pDrive cloning instructions, the reaction mixture was incubated at 16 °C for 5 min., 10 °C for 5 min., 4 °C for 15 min., 10 °C for 10 min., 16 °C for 25 min., and then 1 µL of the reaction was transformed into a cloning host (see section 2.1.4). Insertion of the gene into pDrive was verified using X-gal LB agar plates (blue/white screening, Qiagen). White colonies were selected and grown in 5 mL of LB (37 °C, overnight), and the plasmids were isolated (Qiagen Spin Miniprep kit). The size of the plasmids was verified on an agarose gel.

Table 2.3: Cloning of PCR product into pDrive. The gene contains overhanging 3’ adenines to promote the insertion of the gene into the cloning vector pDrive. The reaction mixture is incubated through a stepwise temperature gradient for one hour; the total volume of the mixture is 10 µL.

Reagent                 Volume (µL)
pDrive cloning vector   1
PCR product             1
Sterile water           3
2X ligation mixture     5

After isolating plasmids of the appropriate size, the genes were restricted from the cloning vector and ligated into the expression vector (see Section 2.1.4). Three different restriction enzymes were used to remove the genes from the cloning vectors: Nde I (CA/TATG), Hind III (A/AGCTT), and Bam HI (G/GATCC). The Nde I enzyme restricts the 5’ end of the gene, preserving the ATG start codon, while Hind III and Bam HI were utilized to restrict the 3’ end of the gene. Three controls were run with each reaction in order to verify each restriction step: 1) construct digestion with Nde I alone, 2) construct digestion with Hind III or Bam HI alone, and 3) construct incubated with all reagents except for the restriction enzymes. The complete restriction reaction mixture contained 25 µL of the purified plasmid from the mini-prep, 20 µL of sterile water, 5 µL of 10X restriction enzyme buffer (NEB buffer 2), and 0.5 µL 100X BSA, followed by additions of 0.5 µL of the restriction enzymes. For a typical reaction, the construct was first restricted with Nde I for one hr. at 37 °C, then the second enzyme (Hind III or Bam HI) was added and incubated for another hr. at 37 °C. After the completion of the reactions, the enzymes were inactivated by incubation of the reaction mixture at 65 °C for 20 min. (which denatures Nde I and Hind III) or by the addition of stop buffer (50 % glycerol, 0.05 % bromophenol blue, 0.05 M EDTA, when Bam HI is used). For each sample, the gene was gel purified (Qiagen MinElute Gel Extraction kit) from a 1 % agarose gel. The same set of reactions was performed on the expression vector in order to prepare the vector for the insertion of the gene. The use of two different restriction enzymes helped ensure that the gene was inserted into the vector in the correct orientation.

The gene was inserted into the expression vector using the overhangs created by the restriction enzymes mentioned above. T4 DNA ligase ligated the gene into the vector. Table 2.4 describes the preparation of the ligation reaction mixture; this mixture was incubated at 25 °C for 10 min. A small sample (1 µL) of the completed ligation reaction was transformed into a cloning host and grown overnight in 5 mL of LB. Each new plasmid was isolated from the cloning hosts (Qiagen Spin Miniprep kit), the size of the construct was verified on a 1 % agarose gel, and the DNA sequence was confirmed (Plant-Microbe Genomics Facility @ OSU).
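Because the digests rely on Nde I, Hind III, and Bam HI cutting only at the ends of the gene, it can be useful to check a sequence for internal copies of these sites before cloning. The sketch below scans a DNA string for the three recognition sequences given above and reports where each enzyme would cut. It is an illustrative helper rather than part of the original protocol; the function name and the toy insert in the usage note are hypothetical. All three sites are palindromic, so scanning one strand is sufficient.

```python
# Recognition site and cut offset within the site, from the text:
# Nde I CA/TATG, Hind III A/AGCTT, Bam HI G/GATCC.
SITES = {
    "Nde I": ("CATATG", 2),
    "Hind III": ("AAGCTT", 1),
    "Bam HI": ("GGATCC", 1),
}


def find_cut_positions(seq):
    """Map each enzyme to the number of bases 5' of every cut on the given strand."""
    seq = seq.upper()
    hits = {}
    for enzyme, (site, offset) in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + offset)
            start = seq.find(site, start + 1)  # allow overlapping sites
        hits[enzyme] = positions
    return hits
```

For a made-up insert such as `"CATATGAAACCCGGGTTTTAAAAGCTT"`, the function reports one Nde I cut near the 5’ end, one Hind III cut near the 3’ end, and no Bam HI sites, which is the pattern expected for a gene prepared for Nde I/Hind III cloning.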


Table 2.4: Ligation of PCR product into expression vector using T4 DNA ligase. The reaction mixture total is 20 µL and this mixture is incubated at 25 °C for 10 min.

Reagent                           Volume
Sterile water                     13 µL
10X T4 Ligation Buffer            2 µL
Restricted PCR gene (purified)    2 µL
Restricted Expression Vector      2 µL
T4 DNA ligase                     1 µL

2.1.3 Gateway cloning

The Gateway cloning method is different from restriction based cloning methods in that the PCR product is inserted into the cloning vector via a topoisomerase-assisted (TA) insertion, followed by gene transfer into the destination vector via a transposase reaction (λ Integrase and Excisionase) (Figure 2.1). The TA insertion uses a CACC overhang to insert the gene into the cloning vector, conserving the directionality of the gene. The transposase reaction allows gene insertion into a variety of different expression vectors (see section 2.1.4).

In standard Gateway cloning reactions, the PCR product was inserted into the cloning (entry) vector during a 5 min. incubation period at room temperature (20-22 °C). The reaction mixture (6 µL total) contained 1 µL of salt solution, 1 µL of the TOPO vector, 0.5-1 µL PCR product, and 3-3.5 µL of sterile water. For optimal gene insertion, the molar ratio of PCR product:TOPO vector can vary from 0.5:1 to 2:1; in most cases 1 µL of PCR product was used in the reaction and 1 µL of the cloning reaction, transformed into a cloning host, yielded several colonies harboring the proper construct.
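The 0.5:1 to 2:1 window above is a molar ratio, so it depends on the lengths of the two DNA molecules as well as on their masses. A minimal sketch of that conversion is shown below, using the fact that the number of moles of double-stranded DNA scales as mass divided by length. The function names are illustrative, and the masses in the usage note are hypothetical; only the 2580 bp vector size is taken from Table 2.5 (pENTR-D).

```python
def molar_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Insert:vector molar ratio; moles of dsDNA scale as mass / length."""
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


def in_recommended_range(ratio, low=0.5, high=2.0):
    """True if the ratio lies inside the 0.5:1 to 2:1 window quoted in the text."""
    return low <= ratio <= high


# Hypothetical amounts: 5 ng of a 750 bp PCR product with 20 ng of a 2580 bp vector.
ratio = molar_ratio(5.0, 750.0, 20.0, 2580.0)
print(round(ratio, 2), in_recommended_range(ratio))
```

Because the gel-based estimate of PCR product concentration is approximate, a calculation of this kind is best treated as a rough guide to whether the amount added falls near the recommended window.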


Figure 2.1: Gateway cloning scheme. A) The gene of interest (GOI) is inserted into the kanamycin resistant (Kan) cloning vector using a topoisomerase-assisted (TA) reaction facilitated by the addition of a CACC site on the GOI. B) Once inserted into the cloning vector, the gene can be transferred into the ampicillin resistant (Amp) destination vector via a transposase reaction using the LR clonase enzyme. The LR clonase reaction transposes the ccdB selection gene with the gene of interest.

The plasmids were isolated using the Spin Miniprep kit (Qiagen) and the size of the construct was verified using a 1 % agarose gel. The gene was then ready for transfer from the cloning vector into any of the destination expression vectors. The efficiency of the LR clonase transposase reaction is dependent on the molar ratio between the destination vector and the cloning construct. Since the two vectors are approximately equal in size, the molar ratio can be estimated using weight to volume measurements. The concentration of the cloning construct was determined by measuring the absorbance of the construct at 260 nm, and the amount needed to reach 100-300 ng/µL, matching the concentration of the destination vector, was calculated. The cloning construct was added to 1 µL of the destination vector (150 ng/µL), and the volume was brought up to 8 µL using 1X TE pH 8.0 (10 mM Tris pH 8.0, 2 mM EDTA). 2 µL of the LR clonase enzyme was added to the reaction mixture and the solution was incubated at 25 °C for one hr. To stop the reaction, 1 µL of Proteinase K was added to the mixture and incubated at 37 °C for ten min. After completion, 1 µL of the LR clonase reaction was transformed into a cloning host and grown overnight in 5 mL of LB, and the plasmid was isolated using the Spin Miniprep kit (Qiagen). The size of the construct was visualized on a 1 % agarose gel and the sequence was verified through DNA sequencing (Plant-Microbe Genomics Facility @ OSU).
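The volumes for an LR reaction of this kind can be worked out from the A260 reading. The sketch below uses the standard conversion for double-stranded DNA (an A260 of 1.0 at a 1 cm path length corresponds to about 50 ng/µL) and then computes how much entry clone and TE are needed to reach the 8 µL pre-enzyme volume. Matching the entry clone mass to the 150 ng of destination vector is an assumption based on the equal-size argument above, and the function names and the example reading are hypothetical.

```python
def dsdna_ng_per_ul(a260, dilution_factor=1.0):
    """dsDNA concentration from A260 (1 cm path): about 50 ng/uL per absorbance unit."""
    return a260 * 50.0 * dilution_factor


def lr_reaction_volumes(entry_ng_per_ul, target_ng=150.0,
                        dest_vol_ul=1.0, pre_enzyme_vol_ul=8.0):
    """Return (entry clone uL, TE uL) for the 8 uL mix described in the text.

    target_ng defaults to the mass in 1 uL of 150 ng/uL destination vector
    (an assumed 1:1 mass match, since the two vectors are similar in size).
    """
    entry_vol = target_ng / entry_ng_per_ul
    te_vol = pre_enzyme_vol_ul - dest_vol_ul - entry_vol
    if te_vol < 0:
        raise ValueError("entry clone is too dilute to fit in the 8 uL mix")
    return entry_vol, te_vol


# Hypothetical reading: A260 = 0.06 measured on a 50-fold dilution.
conc = dsdna_ng_per_ul(0.06, dilution_factor=50.0)
entry_ul, te_ul = lr_reaction_volumes(conc)
print(round(conc, 1), round(entry_ul, 2), round(te_ul, 2))
```

Adding the 2 µL of LR clonase then brings the reaction to 10 µL, as in the protocol above.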

2.1.4 Cloning and expression vectors and hosts

A variety of vectors and hosts were used in different combinations based on the techniques used and the antibiotics needed for the study. Common antibiotics used in these cloning and expression experiments include ampicillin (Amp), carbenicillin (Carb), kanamycin (Kan), streptomycin (Strep), tetracycline (Tet), and chloramphenicol (Cam). Amp and Carb are used interchangeably since both antibiotics are derivatives of penicillin.


Table 2.5: Description of vectors and cells used for the gene cloning and protein expression. The vectors and cell lines are classified into two categories, cloning or expression, and information about each is included.

Vector      Classification       Antibiotic resistance (concentration used in media)   Size (bp)
pDrive      Cloning              Ampicillin (45 mg/mL)                                 3850
pENTR-D     Cloning              Kanamycin (25 mg/mL)                                  2580
pET28       Expression           Kanamycin (25 mg/mL)                                  5369
pET21       Expression           Ampicillin (45 mg/mL)                                 5443
pET101      Cloning/Expression   Ampicillin (45 mg/mL)                                 5753
pDEST 17    Expression           Ampicillin (45 mg/mL)                                 6354
pDEST C1    Expression           Streptomycin (50 mg/mL)                               5334

Cell line           Classification   Antibiotic resistance (concentration used in media)      Induction Method
TOP10               Cloning          Streptomycin (50 mg/mL)                                  none
DH5α                Cloning          none                                                     none
XL10                Cloning          none                                                     none
BL21 DE3 plysS      Expression       Chloramphenicol (35 mg/mL)                               IPTG
BL21 star DE3       Expression       none                                                     IPTG
BL21 RIL            Expression       Chloramphenicol (35 mg/mL)                               IPTG
BL21 RIPL           Expression       Chloramphenicol (35 mg/mL)                               IPTG
T7 express Lac Iq   Expression       Tetracycline (15 mg/mL)                                  IPTG
Rosetta             Expression       Chloramphenicol (35 mg/mL)                               IPTG
Rosetta Blue DE3    Expression       Chloramphenicol (35 mg/mL), Tetracycline (15 mg/mL)      IPTG
Rosetta 2 plysS     Expression       Chloramphenicol (35 mg/mL)                               IPTG
AR120               Expression       none                                                     Nalidixic acid
N4830               Expression       none                                                     Heat, 42 °C
OR1265              Expression       none                                                     Heat, 42 °C

2.1.5 Preparation of competent cells and transformation protocols

Cloning and expression hosts were prepared by inoculating 5 mL LB with the cells (from a glycerol stock, prepared by adding 1 mL of the culture growth to 1 mL 50 % glycerol) and incubating the culture at 37 °C overnight at 180 rpm. The appropriate antibiotic was added to the culture (for the cell lines that required antibiotics). The next morning, 500 µL of the overnight culture was used to inoculate 50 mL of LB in a 250 mL culture flask. This culture was incubated at 37 °C at 200 rpm until the culture grew to an


optical density of 0.4 at 600 nm (OD600). The culture was transferred to a 50 mL Falcon tube and incubated on ice for 10 min. The cells were gently pelleted via centrifugation at 2,000 × g at 4 °C for 10 min. The supernatant was decanted and the cells were made competent by resuspending them in cold CaCl2-containing transformation buffer and incubating on ice for another 10 min. The cloning host transformation buffer consisted of 10 mM PIPES pH 6.5, 250 mM KCl, and 15 mM CaCl2, while the expression host transformation buffer consisted of 20 mM PIPES pH 6.5, 60 mM CaCl2, and 15 % glycerol. After incubation, the cells were again gently pelleted via centrifugation at 2,000 × g at 4 °C for 10 min. The cells were resuspended in 1 mL transformation buffer; 7 % (v/v) DMSO and 15 % (v/v) glycerol were added to the cells intended for storage. The competent cells were either used directly or aliquoted into Eppendorf tubes, frozen on dry ice, and stored at -80 °C.

The competent cells were transformed with cloning and expression constructs. For a typical reaction, 0.5 µL or 1.0 µL of the construct was added to 50 µL of competent cells and mixed by flicking the tube. The mixture was incubated on ice for 30 min., followed by heat shock at 42 °C for 50 sec. The mixture was placed back on ice for 2 min. and then 250 µL SOC was added to the cell mix. The culture was incubated at 37 °C at 200 rpm for one hr., after which the culture was either plated onto LB plates or used to inoculate 5 mL LB. The plate or culture was then incubated at 37 °C overnight.


2.2 Protein expression and purification

2.2.1 Protein expression and lysis

The protein expression protocol was very similar in all projects, with slight differences in the method of protein induction. Three flasks containing 100 mL of LB, with the proper antibiotics, were inoculated with serial dilutions of the transformed expression host (usually from a frozen glycerol stock). These flasks were incubated at 37 °C at 180 rpm overnight. The next morning, 50 mL of the growth culture was used to inoculate six 2 L Fernbach flasks containing 1 L of autoclaved LB with the proper antibiotics. The flasks were incubated at 37 °C and shaken at 200 rpm until the bacteria grew to an optical density at 600 nm (OD600) of around 0.6. Once the culture reached this log phase growth, a 1 mL sample (0 hr) was taken as a control and protein production was induced by the addition of either 1 mL 1 M isopropyl β-D-1-thiogalactopyranoside (IPTG) (final concentration 1 mM) or 1 mL 100 mg/mL nalidixic acid (final concentration 100 µg/mL). The flasks were then incubated at 37 °C at 200 rpm for three hours, after which a second 1 mL sample (3 hr) was taken. The OD600 was measured again after the three hr. incubation in order to normalize the amount of cells in the 3 hr sample to the amount of cells in the 0 hr sample; the OD of the cells dictates the volume of cells pelleted for gel analysis. For heat induced vectors, the flasks were shaken at 28 °C until the OD600 reached 0.6, then the temperature was increased to 42 °C until the media reached 42 °C (usually one hour for a 1 L culture). The culture was then incubated at 37 °C for an additional two hours. The cells were harvested by centrifugation at 5,500 × g in 30 minute intervals. The cell pellet was weighed and stored at -20 °C.
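The induction and sampling arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original protocol: the function names are hypothetical, and the 3 hr OD600 of 1.8 is an assumed example reading.

```python
def final_concentration(stock_conc, stock_volume_ml, culture_volume_ml):
    """Dilution rule C1*V1 = C2*V2, with V2 = culture volume + added stock."""
    return stock_conc * stock_volume_ml / (culture_volume_ml + stock_volume_ml)


def normalized_sample_volume(reference_od, reference_volume_ml, sample_od):
    """Volume of a denser culture holding the same amount of cells
    as the reference sample (OD * volume is held constant)."""
    return reference_volume_ml * reference_od / sample_od


# 1 mL of 1 M (1000 mM) IPTG added to a 1 L culture
iptg_final_mM = final_concentration(1000.0, 1.0, 1000.0)

# 1 mL of 100 mg/mL nalidixic acid added to a 1 L culture, reported in ug/mL
nalidixic_final_ug_ml = final_concentration(100.0, 1.0, 1000.0) * 1000.0

# 0 hr sample: 1 mL at OD600 0.6; the 3 hr OD600 of 1.8 is an assumed example
volume_3hr_ml = normalized_sample_volume(0.6, 1.0, 1.8)

print(f"IPTG: {iptg_final_mM:.2f} mM")
print(f"Nalidixic acid: {nalidixic_final_ug_ml:.1f} ug/mL")
print(f"3 hr sample volume to pellet: {volume_3hr_ml:.2f} mL")
```

With these inputs the final concentrations come out at approximately 1 mM IPTG and 100 µg/mL nalidixic acid, matching the values quoted in the text.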

A 1 M IPTG solution (total volume 10 mL) was prepared by adding 2.38 g IPTG to 10 mL of Milli Q water, followed by filter sterilization. A 100 mg/mL nalidixic acid solution was prepared by adding 0.6 g nalidixic acid to 5.940 mL of 0.02 M NaOH and 60 µL 2 M NaOH, followed by filter sterilization.

The protein was extracted from the cell pellet using a lysis process. The lysis buffers are described for each specific protein and range from buffer only to high salt concentrations. The cell pellet was resuspended in lysis buffer that included either the protease inhibitor 4-(2-aminoethyl) benzenesulfonyl fluoride hydrochloride (AEBSF) or a protease inhibitor cocktail tablet (Boehringer Mannheim). Aliquots (10-15 mL) of lysis buffer were added to the multiple cell pellets in centrifuge tubes and a sterile wooden stick was used to resuspend each pellet. The resuspended pellets were combined in a beaker with ~1 mg of lysozyme and the mixture was stirred at room temperature for 20 min. Next, the mixture was sonicated for two min. at a 70 % duty cycle and 80 % power. The sonication step was repeated a second time if the solution remained viscous. A sample (50 µL) was taken after sonication to determine the location of the protein. The cell debris was pelleted by centrifugation at 16,000 × g for 30 minutes. 50 % glycerol was added to the lysate (final concentration 20 %) and the lysate was frozen on dry ice (stored at -80 °C). Deviations from this standard lysis procedure (such as addition of PEI or deoxycholic acid) are mentioned in the individual sections of the projects.

The cell culture samples taken from the expression study were centrifuged (12,000 × g for 3 min. at room temperature) to pellet the cells. The media was removed and the cells were resuspended in 50 µL Bugbuster, to which 50 µL of 2x SDS loading dye was added. A 50 µL sample of the sonicated mixture was centrifuged


(14,000 × g for 3 min. at room temperature) and the supernatant was transferred to a new Eppendorf tube. 50 µL of 2x SDS loading dye was added to the supernatant, while 100 µL of 2x SDS loading dye was added to the pellet. These samples were boiled for 10 min., centrifuged (12,000 × g for 2 min. at room temperature) and run on an SDS-PAGE gel at 200 V for 35 min. The gel was stained in Coomassie blue overnight and destained in a solution containing 10 % (v/v) acetic acid, 30 % (v/v) methanol and deionized water. The gel was fixed in a drying solution (30 % (v/v) methanol, 5 % (v/v) glycerol, deionized water) and stored in cellophane.
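The stock solutions and the glycerol addition described in this section can be checked with a short calculation. The sketch below is illustrative only: the helper names are hypothetical, the molecular weight of IPTG (238.3 g/mol) is supplied here rather than taken from the text, the 30 mL lysate volume is an assumed example, and the dissolved solid is assumed not to change the solution volume.

```python
IPTG_MW = 238.3  # g/mol, molecular weight of IPTG (not stated in the text)


def molarity(mass_g, mw_g_per_mol, volume_ml):
    """Molar concentration of a solute, assuming the stated final volume."""
    return mass_g / mw_g_per_mol / (volume_ml / 1000.0)


def stock_volume_to_add(sample_volume_ml, stock_percent, final_percent):
    """Volume v of stock such that stock*v = final*(sample + v)."""
    return sample_volume_ml * final_percent / (stock_percent - final_percent)


# 2.38 g IPTG in a total volume of 10 mL
iptg_molar = molarity(2.38, IPTG_MW, 10.0)

# 0.6 g nalidixic acid in 5.940 mL + 0.060 mL of NaOH solutions
nalidixic_mg_ml = 0.6 * 1000.0 / (5.940 + 0.060)

# 50 % glycerol needed to bring an assumed 30 mL of lysate to 20 % glycerol
glycerol_ml = stock_volume_to_add(30.0, 50.0, 20.0)

print(f"IPTG stock: {iptg_molar:.3f} M")
print(f"Nalidixic acid stock: {nalidixic_mg_ml:.0f} mg/mL")
print(f"50 % glycerol to add: {glycerol_ml:.1f} mL")
```

Under these assumptions the recipes reproduce the nominal 1 M and 100 mg/mL stocks, and two volumes of 50 % glycerol are needed for every three volumes of lysate.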

2.2.2 Protein purification

These projects utilized a variety of HPLC columns (GE, Pharmacia, Applied Biosystems, Biorad, etc.) to purify and characterize proteins and protein-protein complexes. Table 2.6 describes each of the columns, including the resin functionality and bead size. The procedure began with the equilibration of the column and preparation of the lysate. Preparation of the column included a water rinse, followed by equilibration with Buffer A. The conductivity of the lysate sample must not exceed the conductivity of Buffer A, to ensure that the protein will interact with the column resin. The lysate was diluted with a buffer (e.g. Tris, Bis Tris) to match the conductivity of Buffer A. The lysate was filtered through a 0.22 µm filter to remove any precipitated material, or centrifuged (5,000 × g for 10 minutes) to pellet any remaining cellular debris. The lysate (or pooled protein fraction) was then loaded onto the column by a peristaltic pump (1st column) or by an HPLC pump (subsequent columns). A linear gradient from Buffer A to Buffer B (Buffer A plus 750 mM to 1 M salt) was utilized to separate the protein from the


contaminants, followed by an isocratic flow of buffer B to remove material remaining on the column after the linear gradient.
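As an illustration of the gradient described above, the following sketch estimates the added salt concentration at a given elution volume. It is a simplified model under stated assumptions: the function name is hypothetical, Buffer A is assumed to contain no added salt, Buffer B is assumed to contain 1 M salt, the gradient volume is taken as 200 mL, and column dead volume and mixing delay are ignored.

```python
def salt_at_volume(volume_ml, gradient_ml, salt_a_mM, salt_b_mM):
    """Salt concentration delivered after volume_ml of a linear A-to-B gradient.

    Past the end of the gradient the flow is isocratic Buffer B, so the
    fraction of Buffer B is clamped to the range 0..1.
    """
    fraction_b = min(max(volume_ml / gradient_ml, 0.0), 1.0)
    return salt_a_mM + fraction_b * (salt_b_mM - salt_a_mM)


# Assumed example: 200 mL gradient from 0 mM (Buffer A) to 1000 mM (Buffer B)
for volume in (0, 50, 100, 200, 250):
    conc = salt_at_volume(volume, 200.0, 0.0, 1000.0)
    print(f"{volume:>3} mL -> {conc:.0f} mM salt")
```

A protein eluting halfway through such a gradient would therefore come off at roughly 500 mM salt, which is why the pooled fractions generally need dilution or dialysis before the next ion exchange step.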

Table 2.6: HPLC columns

| Column | Company | Bead size | Functionality |
|---|---|---|---|
| Q Sepharose | Amersham Biosciences | 90 µm | quaternary ammonium |
| SP Sepharose | Amersham Biosciences | 90 µm | sulfopropyl |
| Poros HQ | Applied Biosystems | 20 µm | quaternary polyethyleneimine |
| Poros HS | Applied Biosystems | 20 µm or 50 µm | sulfopropyl |
| Poros PE | Applied Biosystems | 20 µm | phenyl |
| Hydroxyapatite | BioRad | 20 µm | Ca2+, PO43-, OH- |
| Talon | Clontech Laboratories | 60-160 µm | Co2+, tetradentate chelator |
| Superdex 75 | GE Healthcare Biosciences | 13 µm | Dextran (3,000-70,000 MW) cross-linked to agarose beads |
| Superdex 200 | Amersham Pharmacia Biotech | 13 µm | Dextran (10,000-600,000 MW) cross-linked to agarose beads |

Typically, the lysate was first loaded onto a low resolution ion exchange column (i.e. Q Sepharose or SP Sepharose). The fractions containing the protein of interest were then loaded onto a high-resolution ion exchange column (Poros HQ or Poros HS) as the final purification step. In most cases the conductivity of the protein sample after the first column run was high; therefore, the conductivity was decreased by dilution of the protein sample or by dialysis. Most proteins were >95 % pure after only two column runs. The size exclusion columns (Superdex 200 or Superdex 75) were utilized to examine the oligomeric state of the individual proteins and to investigate protein-protein complexes. The hydrophobic column (Poros PE) and the hydroxyapatite column (HA) were utilized to remove contaminating endogenous nucleases. For some proteins, HA

could separate the contaminating nuclease and DNA by differential affinities. In the case of the PE, the proteins of interest did not stick to the hydrophobic column, while the contaminating nucleases adhered to the column. An overview of the chromatography programs for each column type is provided in Table 2.7.

Table 2.7: HPLC programs for each column utilized in protein purification.

| Column | Column volume (mL) | Linear gradient (mL) | Isocratic flow (mL) | Flow rate (mL/min) | Fraction volume (mL) | Number of fractions |
|---|---|---|---|---|---|---|
| Q Sepharose | 20 or 60 | 200 or 600 | 60 | 5 | 6 | 100 |
| SP Sepharose | 22 | 200 | 100 | 4 | 5 | 60 |
| Poros HQ | 20 | 200 | 60 | 10 | 4 | 65 |
| Poros HS | 20 | 200 | 80 | 8 | 5 | 56 |
| Superdex 75 | 30 | N/A | 40 | 1 | 1 | 40 |
| Superdex 200 | 84 | N/A | 90 | 1 | 1 | 90 |
| Poros PE | 12 | N/A | N/A | 10 | N/A | N/A |
| Hydroxyapatite | 10 | 120 | 20 | 3 | 3 | 47 |
| Talon | 20 | 60 | 36 | 2 | 2 | 48 |
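The entries of Table 2.7 are internally related: the number of fractions follows from the total elution volume (linear gradient plus isocratic flow) divided by the fraction volume, and the run time follows from the flow rate. The sketch below checks this for the columns that use a single gradient volume. The function names are hypothetical, and rounding up the final partial fraction is an assumption about how the fraction collector was programmed.

```python
import math

# (linear gradient mL, isocratic flow mL, flow rate mL/min,
#  fraction volume mL, number of fractions listed in Table 2.7)
PROGRAMS = {
    "SP Sepharose": (200, 100, 4, 5, 60),
    "Poros HQ": (200, 60, 10, 4, 65),
    "Poros HS": (200, 80, 8, 5, 56),
    "Hydroxyapatite": (120, 20, 3, 3, 47),
    "Talon": (60, 36, 2, 2, 48),
}


def expected_fractions(gradient_ml, isocratic_ml, fraction_ml):
    """Fractions needed to collect the whole elution, rounding up."""
    return math.ceil((gradient_ml + isocratic_ml) / fraction_ml)


def run_time_min(gradient_ml, isocratic_ml, flow_ml_per_min):
    """Elution time in minutes at a constant flow rate."""
    return (gradient_ml + isocratic_ml) / flow_ml_per_min


for name, (grad, iso, flow, frac, listed) in PROGRAMS.items():
    n = expected_fractions(grad, iso, frac)
    minutes = run_time_min(grad, iso, flow)
    print(f"{name}: {n} fractions (table lists {listed}), {minutes:.0f} min")
```

For each of these columns the computed fraction count agrees with the number listed in the table.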

2.2.3 Protein solubility

A solubility screen was performed to find the optimal salts and buffers to increase the solubility of a protein. For this screen, the partitioning between amorphous precipitate and solubilized protein was measured. In many cases the protein can be precipitated by simple concentration. If not, the protein can be dialyzed against water, or an excess of PEG can be added directly to the sample, in order to precipitate the protein. A suspension of the precipitated protein was aliquoted into 17 Eppendorf tubes. The samples were centrifuged at 20,000 × g at room temperature for 5 min. in order to repellet the protein. The supernatant was removed from each tube and 25 µL of the different 100 mM salts and buffers were added to redissolve the pellets in each tube. The mixture was incubated

at room temperature for 30 min. The samples were centrifuged at 20,000 × g at room temperature for 5 min. in order to pellet the undissolved protein. Next, the amount of protein present in the supernatant was detected using the Bradford reagent and measuring the absorbance at 595 nm. The Bradford reagent was prepared by adding 5 mL of the 5X Bradford reagent to 20 mL of autoclaved Milli Q water. 5 µL of the supernatant was added to 995 µL of 1X Bradford reagent, mixed well, and incubated at room temperature for 5 min. After 5 min., the absorbance at 595 nm was measured and recorded. The best salts and buffers were chosen based on the relative amounts of protein redissolved.

Another solubility experiment performed was to solubilize protein from the inclusion bodies that form during cell lysis. The inclusion body was located in the pellet with the cell debris after lysis and centrifugation. The lysate was removed and the pellet was rinsed with lysis buffer plus 0.5 % (v/v) Triton X 100 to extract the endogenous membrane fraction. The pellet was resuspended with the buffer/detergent mixture and then centrifuged at 10,000 × g at 4 °C for 5 min. This step was repeated several times (3-4 times) until the pellet was opalescent. The pellet was suspended in a buffer (protein dependent) and this resuspension was aliquoted into several tubes. Different denaturants (urea, KSCN, guanidine hydrochloride) were added to the samples in a 1:1 (v/v) ratio and incubated at different temperatures (4, 20, 37, 75, 90 °C) for different time periods (one hour to 24 hours) to determine the optimal conditions for the sample to solubilize. Next, the denaturants were removed from the mixture by dialysis (quick or step wise); this step was optimized in order to remove the denaturants while preventing the protein from precipitating.


2.3 Biophysical studies

2.3.1 Dynamic light scattering

A DynaPro Titan (Wyatt) with the DynaPro temperature-controlled sampler was used for these experiments. Dynamic light scattering (DLS) is the detection of laser light scattered from a protein solution and is used to estimate the molecular weight, the percent polydispersity, and the radius of hydration (hydrodynamic radius) of the proteins in solution. The concentration of the sample used in these experiments was 1 mg/mL, though for large molecular weight proteins or complexes the concentration was reduced because larger particles give an increased signal. The typical sample volume prepared ranged from 30-50 µL; the cuvette holds ~15 µL. Extra sample was prepared in case an aggregation effect caused by temperature was determined to be irreversible. The sample was passed through a filter with a pore size of 0.45 µm or smaller and centrifuged at 20,000 × g for 15-30 min. to partition large particles (dust) or large aggregates to the bottom of the Eppendorf tube. Before the sample was measured, the cuvette was thoroughly cleaned and the intensity/counts for water were measured in order to determine the buffer background. The number of counts for water at a laser power of 80 % (W) should be around 2,500-3,500; the number increases slightly (3,000-4,000) at a laser power of 100 %. The samples were loaded slowly into the cuvette in order to reduce the chance of creating air bubbles. The outside of the cuvette was cleaned with a Kimwipe and an air source was used to remove any Kimwipe fragments from the cuvette. The sample/cuvette was allowed to equilibrate to the instrument’s temperature (usually 2-4 min.) before the beginning of each measurement. Most measurements consisted of 10-15 acquisitions, where each acquisition was


comprised of ten 1 sec. readings. The data was analyzed with the software included with the instrument (Wyatt Technology Corp.).
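DLS reports a hydrodynamic radius by converting the measured translational diffusion coefficient through the Stokes-Einstein relation, R_h = k_B T / (6 π η D). The sketch below shows that conversion only; it is not the instrument software's algorithm. The function name, the example diffusion coefficient, and the use of the viscosity of water at 20 °C are assumptions made for illustration.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def hydrodynamic_radius_nm(diffusion_m2_per_s, temperature_k=293.15,
                           viscosity_pa_s=1.002e-3):
    """Stokes-Einstein radius of a sphere with the given diffusion coefficient.

    Defaults assume water at 20 degrees C (viscosity about 1.002 mPa*s);
    real buffers with glycerol or high salt are more viscous.
    """
    radius_m = BOLTZMANN * temperature_k / (
        6.0 * math.pi * viscosity_pa_s * diffusion_m2_per_s)
    return radius_m * 1e9


# Assumed example: D = 1.0e-10 m^2/s, typical of a small globular protein
print(f"R_h = {hydrodynamic_radius_nm(1.0e-10):.2f} nm")
```

Because the radius scales inversely with the diffusion coefficient, slowly diffusing aggregates appear as large radii, which is one reason filtration and centrifugation of the sample are important before measurement.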

2.3.2 Isothermal titration calorimetry

Isothermal titration calorimetry (ITC) measures the heat changes associated with chemical reactions and allows one to determine the thermodynamic parameters of different biochemical interactions. For these projects, ITC was utilized to determine the binding affinities and binding stoichiometries of protein-protein interactions. Two protein samples were required for each ITC run: the protein placed in the sample cell and the protein placed in the syringe for the titration. The concentration of the syringe protein was at least 20 times greater than the concentration of the sample cell protein in order to obtain a complete titration. In most cases, the sample cell protein was around 30 µM and the syringe protein ranged from 600-900 µM. The instrument (MicroCal VP-ITC Microcalorimeter) was prepared for measurements by first rinsing the reference cell, sample cell, and syringe with water, followed by rinsing with the buffer solution (matching the exact buffer composition of the proteins; the used dialysis buffer is best). It is important to remove the entire volume of buffer from each area in order to reduce the effects of sample dilution. Several control runs were completed to evaluate the performance of the instrument. The controls consisted of injections of buffer into buffer, buffer into the sample cell protein, and the syringe protein into buffer. These controls allow the subtraction of the heats of dilution, mixing, and injection when evaluating the data.


For the protein-protein runs, the protein samples and buffer were degassed for at least 5 minutes at the temperature of the ITC run. The protein samples were loaded into the sample cell and syringe carefully to avoid introducing air bubbles. The buffer was loaded into the reference cell. The syringe sample was purged several times to reduce the dilution effect. The syringe was placed into the sample cell and the run was started. The experimental parameters for each run included a total of 40 protein injections, a cell temperature of 20 °C, a reference power of 10 µcal/sec, an initial delay of 60 sec., a stirring speed of 310 rpm, a high feedback mode, and the fast, auto equilibration setting. The first injection consisted of 1 µL over a two-sec. duration with a 300 sec. spacing period. The subsequent injections consisted of 5 µL over a 10 sec. duration with a 300 sec. spacing period between the injections. The data was analyzed with the program available with the instrument (Origin 7).
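The injection schedule above determines the molar ratio of syringe protein to cell protein reached at each point of the titration. The following sketch tabulates that ratio under simplifying assumptions: the function names are hypothetical, the 40 injections are taken to include the initial 1 µL injection, the sample cell volume of 1400 µL is an assumed nominal value that is not given in the text, and displacement of cell contents by the injected volume is ignored.

```python
def injection_volumes_ul(total_injections=40, first_ul=1.0, later_ul=5.0):
    """One small first injection followed by equal-volume injections."""
    return [first_ul] + [later_ul] * (total_injections - 1)


def molar_ratios(cell_uM, syringe_uM, cell_volume_ul, volumes_ul):
    """Cumulative (syringe protein) / (cell protein) ratio after each injection."""
    ratios = []
    injected_ul = 0.0
    for volume in volumes_ul:
        injected_ul += volume
        ratios.append(syringe_uM * injected_ul / (cell_uM * cell_volume_ul))
    return ratios


volumes = injection_volumes_ul()
# 30 uM in the cell and 600 uM in the syringe, as in the text;
# the 1400 uL cell volume is an assumption
ratios = molar_ratios(30.0, 600.0, 1400.0, volumes)

print(f"Total injected volume: {sum(volumes):.0f} uL")
print(f"Final molar ratio: {ratios[-1]:.2f}")
```

Under these assumptions a 20-fold concentration difference carries the titration to a final molar ratio of nearly 3, comfortably past the equivalence point of a 1:1 complex.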

2.3.3 Differential scanning calorimetry

Differential scanning calorimetry (DSC) detects changes in temperature as two samples are heated and determines the thermodynamic parameters involved with the denaturation of the protein. This technique was primarily used to determine the temperature at which the protein unfolds. The concentration of the protein used in this experiment varied from 30 µM-100 µM. The instrument (MicroCal VP-DSC Microcalorimeter) was prepared by rinsing the sample cell with water and with buffer (the buffer used for dialysis). The protein and buffer were degassed for about 5 min. at the initial temperature. During each run, the sample or buffer was equilibrated at 4 °C or 10 °C for 15 min., then the temperature was


increased at a rate of 90 °C/hr (upscan) until reaching a final temperature of 75 °C or 125 °C. The final temperature was held for 15 min. and then the temperature was decreased to 4 °C or 10 °C at a rate of 60 °C/hr (downscan). The buffer was measured first in order to establish a stable baseline, which at times took several upscans and downscans. Once the upscans overlapped, the buffer solution was exchanged for protein. This exchange took place during the downscan when the temperature was around 20-25 °C; the protein should be incubated in this temperature range (during the degassing period) before loading into the sample cell. Collection of one upscan and one downscan was usually sufficient for the protein data collection. The data was analyzed using the program available with the instrument (Origin 7).
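The scan rates and hold times above fix the duration of each upscan/downscan cycle, which is useful when planning how many buffer baselines can be collected before the protein is loaded. The sketch below computes that duration; the function names are hypothetical and the instrument's own re-equilibration overhead is not included.

```python
def scan_minutes(start_c, end_c, rate_c_per_hr):
    """Time to ramp between two temperatures at a constant scan rate."""
    return abs(end_c - start_c) / rate_c_per_hr * 60.0


def cycle_minutes(low_c, high_c, up_rate=90.0, down_rate=60.0,
                  hold_low_min=15.0, hold_high_min=15.0):
    """Equilibration, upscan, hold at the final temperature, then downscan."""
    return (hold_low_min
            + scan_minutes(low_c, high_c, up_rate)
            + hold_high_min
            + scan_minutes(high_c, low_c, down_rate))


for low, high in ((4.0, 75.0), (10.0, 125.0)):
    total = cycle_minutes(low, high)
    print(f"{low:.0f} to {high:.0f} C: {total:.0f} min per cycle")
```

A 4 °C to 75 °C cycle takes roughly two and a half hours, so establishing a baseline with several buffer scans can occupy most of a working day.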

2.3.4 Fluorescence anisotropy

Fluorescence anisotropy was used to detect the binding interactions between two macromolecules, one of which is labeled with a fluorescent tag. This technique was used to determine the binding affinities for protein-DNA interactions. In these experiments, the DNA substrate was a 30+30 fork (Figure 2.2) labeled with a 5’ HEX (a fluorescein derivative, hexachloro-6-carboxyfluorescein) and protein was titrated into the DNA solution. The concentration of labeled DNA required for a titration was based on the dissociation constant, which was estimated to be in the low µM range. The concentration of the DNA for the initial titration was chosen to be 50 µM. The protein was titrated into the DNA solution until the anisotropy reached a plateau.


[Figure 2.2 schematic: forked DNA substrate drawn with four segments labeled 15, the 5’ and 3’ ends marked, and * at the fluorescent tag]

Figure 2.2: Fluorescently labeled DNA substrate. The fork DNA substrate used for the fluorescence titration is a 30+30 mer (IDT, Coralville, IA) with a 5’ HEX (a fluorescein derivative, hexachloro-6-carboxyfluorescein) label on the 5’ arm of the substrate. The * represents the location of the fluorescent tag on the fork substrate. The sequence of the fork DNA substrate is:

     3’- TGC AAC TGA TGG CAG AAC TCC GTC TCA CCA -5’
                             ||| ||| ||| ||| |||
5’/5HEX/ CAA GCA GTC CTA ACT TTG AGG CAG AGT CCG -3’

The fluorimeter (PTI Photon Technology Inc) has automated excitation and emission polarizers. The excitation slits were set to a band pass of 2 nm and the emission slits were set to a band pass of 8 nm. The wavelength for excitation was 535 nm and the wavelength for emission was 556 nm. Each measurement was time averaged for 8-10 readings providing the error for each step in the titration. The first measurement was the DNA substrate in the protein buffer (total volume 425 µL). The subsequent readings were made after each addition of small volumes of concentrated protein; protein was added to the cuvette using a micropipettor and the mixture was mixed thoroughly via pipetting. In order to determine the dissociation constant for protein binding to labeled DNA, several equations were used to calculate the fraction bound for each step of the


titration (Figure 2.3). Anisotropy data from complete titrations were analyzed using equations in the BioKin DynaFit 3 software.

A.

$$K_D = \frac{[P][L]}{[PL]}$$

$$K_D = \frac{(L_T - X L_T)(P_T - X L_T)}{X L_T}$$

B.

$$0 = X^2 L_T - X (P_T + L_T + K_D) + P_T$$

C.

$$X = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

$$a = L_T \; ([\mathrm{DNA}]), \qquad b = -(L_T + P_T + K_D), \qquad c = P_T \; ([\mathrm{protein}])$$

Figure 2.3: Fluorescence equations for the fraction of protein-bound substrate. A). The dissociation constant is the product of free protein [P] and free ligand [L] divided by the complex [PL], which can be expanded using the equivalents [PL] = X[LT], [P] = [PT] - X[LT] and [L] = [LT] - X[LT], where X is the fraction of substrate bound. B). The solution is a quadratic with known quantities of total ligand [LT] and total protein [PT], and X as the fraction of substrate bound. C). The quadratic equation with parameters defined as provided to the DynaFit program.
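The quadratic in Figure 2.3 can be evaluated directly for any point in a titration. The sketch below is a minimal illustration and not the DynaFit implementation: the function names are hypothetical, the smaller root is chosen because it is the one that stays between 0 and 1, and the anisotropy model assumes the observed signal is a linear mix of free and bound DNA with no change in fluorescence intensity on binding.

```python
import math


def fraction_bound(kd, protein_total, dna_total):
    """Fraction X of DNA bound, from 0 = a*X^2 + b*X + c with
    a = L_T, b = -(L_T + P_T + K_D), c = P_T (all in the same units)."""
    a = dna_total
    b = -(dna_total + protein_total + kd)
    c = protein_total
    discriminant = b * b - 4.0 * a * c
    # The root with the minus sign is the physical one (0 <= X <= 1).
    return (-b - math.sqrt(discriminant)) / (2.0 * a)


def observed_anisotropy(x_bound, r_free, r_bound):
    """Anisotropy as a weighted average of free and bound DNA (assumption)."""
    return r_free + x_bound * (r_bound - r_free)


# Example with K_D = 1, P_T = 1 and L_T = 1 (arbitrary but consistent units)
x = fraction_bound(1.0, 1.0, 1.0)
kd_check = (1.0 - x) * (1.0 - x) / x  # [L][P]/[PL] after dividing by L_T

print(f"Fraction bound: {x:.4f}")
print(f"K_D recovered from X: {kd_check:.4f}")
```

Substituting the computed X back into the definition of the dissociation constant returns the input K_D, which is a quick way to confirm that the signs of the quadratic coefficients are consistent.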

2.3.5 Small angle X-ray scattering

Small angle X-ray scattering (SAXS) is used to determine the molecular envelopes of macromolecules based on low resolution X-ray scattering. The scattering from the protein systems is concentration and size dependent, and the total concentration of the protein samples (both individual and complex samples) was maintained at ~3 mg/mL. The concentration of the protein can be increased for smaller proteins and decreased for larger proteins, which would scatter more based on size and

shape. The protein (50-100 µL) was loaded and pulled into the top of a capillary. During data collection, the protein flowed through the X-ray beam towards the bottom of the capillary at a flow rate of 0.2 µL/second. Continuous flow of the protein past the beam reduces the effect of radiation damage on the proteins, reducing the possibility of protein aggregation. Data was collected at different time intervals (5, 10, 20, 40 seconds), where the data collections at 20 sec. and 40 sec. intervals were repeated 4-5 times. This procedure was used on the individual proteins, protein-protein complexes and the buffers. After each run, the capillary was rinsed well with buffer and water.

The data was collected at the Advanced Photon Source (APS), beamline 15-ID, ChemMatCARS, with guidance from beamline scientist David Cookson. The instrument was set up to collect data at a 1.5 Å wavelength, with a capillary diameter of 1.5 mm, at a camera length of 560 mm, and a q range of 0.0200910-0.836965 Å⁻¹ (q = s = 4π sin θ/λ), corresponding to distances of 7.5-312.7 Å. The transmission of the empty capillary was measured in order to correct the data for any incoherent scattering from the capillary. Buffer measurements were taken at 5, 10, 20, 30, and 40 sec. intervals. Protein sample measurements were taken at 10, 20, and 40 sec. intervals, with multiple collections taken at 20 and 40 sec. The 40 sec. data had the highest intensity and this data was processed. The beamline program, SAXS/WAXS V3.2 94, was used to process the SAXS data. The program uploads a parameter text file and a log file (both created for the data collection software); the data from the buffer and proteins were entered into the program and the buffer run was subtracted from the protein run. The resulting output data file was further analyzed and modeled to obtain a molecular envelope as discussed below.
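The distance range quoted alongside the q range follows from the relation d = 2π/q, and q itself follows from the scattering geometry. The short sketch below reproduces the conversion; the function names are hypothetical, and θ is taken as half of the scattering angle, which is the usual convention for q = 4π sin θ/λ.

```python
import math


def q_from_angle(theta_deg, wavelength_angstrom):
    """Momentum transfer q = 4*pi*sin(theta)/lambda, theta = half the scattering angle."""
    return 4.0 * math.pi * math.sin(math.radians(theta_deg)) / wavelength_angstrom


def d_spacing(q_inv_angstrom):
    """Real-space distance probed at a given q: d = 2*pi/q."""
    return 2.0 * math.pi / q_inv_angstrom


q_min, q_max = 0.0200910, 0.836965  # q range of the SAXS setup, in 1/Angstrom

print(f"q_min = {q_min} -> d = {d_spacing(q_min):.1f} Angstrom")
print(f"q_max = {q_max} -> d = {d_spacing(q_max):.1f} Angstrom")
```

The same conversion applies to any other q range, so the lower q limit sets the largest particle dimension that a given camera length can resolve.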


A suite of programs developed by Dmitri Svergun and colleagues (available at http://www.embl-hamburg.de/ExternalInfo/Research/Sax/software.html) was used to analyze the data and develop ab initio models of the molecular envelope. The program Primus was used to investigate the Guinier region of the data to determine the radius of gyration (Rg) of the protein.111 The program GNOM was used to determine the Rg as well, but the GNOM analysis included all the data, not just the Guinier region, allowing the program to take into consideration concentration and aggregation effects.112, 113 GNOM created a distance distribution plot (P(r) plot, the Fourier transform of the scattering curve into real space) for the given maximum diameter of the protein system. The curvature of the P(r) plot suggests an overall shape for the protein and the peak of the curve is an estimate of Rg. The q range was adjusted for particular ab initio programs during the GNOM analysis. Output files from the GNOM analysis were used as the input files for the various ab initio programs utilized to generate a molecular envelope for the protein or protein system.

The intensity (I0) of scattering is reported by both the Primus and GNOM programs and is proportional to the concentration and molecular weight of the scattering complex. Using the intensity and molecular weight of a known scatterer allows the molecular weight of an unknown species to be determined. In the case of this project, the I0 and molecular weight of the 59 protein were used to calculate the molecular weight of the 59 + 32-B complex using the following equation: (I0 unknown * MW known)/ I0 known = MW unknown, for samples at the same mass concentration.

There were three main programs (Dammin, Gasbor, and SasRef) used to generate molecular envelopes from the Svergun suite. Dammin performs the shape determination analysis using simulated annealing and spherical harmonics to create a model using


densely packed dummy atoms.114 Dammin performs best when working with just the low q data and has four modes of structure determination. The program can be run in fast mode (default mode), slow mode (uses more dummy atoms, slow annealing steps, and fits to more spherical harmonics), jagged mode (faster than slow mode), and keep mode (generates 5-15 models which can be averaged with another program, Damaver). Generally, the fast mode is utilized, but if the model is not adequate, it is suggested to use the jagged mode next and then the slow mode if necessary. For these experiments, the keep mode was utilized often and the program Damaver (in tandem with additional programs) was used to average the envelope reconstructions.115 Damaver is run in automatic mode, where the program calls all the side programs in the suite (damsel, damsup, damaver, damfilt, and damstart) to average the models, find the most probable model, and calculate a probability map. Damsel compares all the models and determines the most probable model (using the subroutine supcomb13/20). Supcomb13/20 calculates the normalized spatial discrepancy (NSD, a proximity measurement between 3D objects) for each atom in all the models and calculates the mean value for these distances. The model with the lowest NSD is used as the reference model and the models with high NSD are discarded.115 Based on the NSD and damsel results, damsup aligns all the models to the most probable model (also using supcomb13/20). Damaver averages the aligned models and computes a probability map, followed by the filtering of the averaged model at a particular volume by the program damfilt. The output is a molecular envelope for the protein at the given low q range. Gasbor is another ab initio program that performs shape determination using simulated annealing, and includes spherical harmonic approximations.116


Unlike dammin, gasbor builds the envelope using an ensemble of chain-like dummy atoms based on all of the data, including both low and high q data. Gasbor also takes into account the number of atoms present in the system. There are two gasbor programs, gasbor22iqw (reciprocal space) and gasbor22pqw (real space), which create molecular envelopes by fitting either the intensity in reciprocal space (the scattering curve) or the distribution in real space (the P(r) curve). It is recommended to use gasbor22iqw because the program provides a better fit to the experimental data; thus, this program was used for the modeling in this project.

SasRef is a rigid body refinement program that utilizes simulated annealing, spherical harmonics, and molecular envelopes created from component PDB coordinates to create an envelope based on the experimental data for the complexes.117 This program takes theoretical scattering curves from known structures solved by X-ray diffraction or NMR and creates models of complexes in conformations that would produce a scattering curve similar to the experimental curve. The theoretical scattering curves are produced in the program Crysol.118 Crysol reads the atomic coordinates from the PDB of a solved structure and calculates the partial scattering amplitudes of the structure. These partial scattering amplitudes are used by SasRef to construct the protein-protein complex in order to minimize any discrepancy between the experimental data and the theoretical scattering data. This program is predominantly used to create models of complexes.

There is a suite of programs (CREDO, CHADD, GLOOPY) available to facilitate the building of a missing domain onto a protein.119 All of the programs are similar to Gasbor in that spherical harmonics, simulated annealing, and chain-like dummy residues are used to create the molecular envelope as well as build in the missing domain. Each of these programs will build in the missing region, but in order to do that, the programs use


the PDB of the solved region of the molecule as a reference model and use the experimental data to build the missing domain from the reference model. CREDO uses free dummy residues to build the missing domain from a low-resolution model. CHADD uses chain-like dummy residues with spring forces between the neighbors (0.38 nm separation from (i+1)-th) to build the missing domain from a high-resolution model. GLOOPY is similar to CHADD with the addition of the protein sequence into the program. Both CHADD and GLOOPY require information about which residue the missing domain should be built from. CREDO, CHADD, and GLOOPY were used to build in the missing domain of 32-B (adding the C-terminal domain onto the core domain).

A set of programs developed by William Heller (ORNL), which includes Ga_struct and ELLSTAT,120 can also be used to create a molecular envelope based on SAXS scattering data. Both programs use the processed data file directly. ELLSTAT creates models of ellipsoids or cylinders, while Ga_struct generates several reconstructed models based on the experimental data. Ga_struct was used to create a molecular envelope for the complexes and the 59 protein.
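The forward-scattering estimate of molecular weight described earlier in this section can be written as a one-line scaling. Because I0 is proportional to both concentration and molecular weight, the ratio of concentration-normalized I0 values equals the ratio of molecular weights. The sketch below is illustrative only: the function name is hypothetical, and the I0 values and the 26 kDa reference weight are placeholder numbers, not measurements from this work.

```python
def molecular_weight_from_i0(i0_unknown, i0_known, mw_known,
                             conc_unknown=1.0, conc_known=1.0):
    """Scale a reference molecular weight by the ratio of I0/c values.

    Concentrations are mass concentrations in the same units; with the
    defaults the two samples are assumed to be at equal concentration.
    """
    return mw_known * (i0_unknown / conc_unknown) / (i0_known / conc_known)


# Placeholder values: a 26 kDa reference and a complex scattering twice as
# strongly at the same mass concentration
mw_complex = molecular_weight_from_i0(i0_unknown=2.0, i0_known=1.0, mw_known=26.0)
print(f"Estimated molecular weight: {mw_complex:.0f} kDa")
```

The estimate is only as good as the concentration measurements and assumes that both species have similar scattering contrast, so it is best treated as a consistency check on the expected stoichiometry.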

2.3.6 Small angle neutron scattering

Small angle neutron scattering (SANS), similar to SAXS, is used to determine the molecular envelope of macromolecules in solution. With SANS, contrast matching can be implemented in order to analyze the positions of proteins within a complex as long as one of the proteins is perdeuterated. When the complex consists of one hydrogenated protein and one perdeuterated protein, varying the concentration of D2O in the buffer will


affect which protein contributes to the scattering. If the complex is in 40 % deuterated buffer, the only scattering detected is from the deuterated protein. If the complex is in 100 % deuterated buffer, the only scattering detected is from the hydrogenated protein. Using both data sets, the distance between proteins in the complex can be calculated. Several data sets were collected at the ORNL HFIR CG-2 beamline with detector distances of 2 m and 6 m (Table 2.8).

Table 2.8: SANS data collection data sets. Several data sets were collected under different complex set ups at 2 m and 6 m. Table includes protein and protein complex data sets, detector to sample distance, exposure time, and buffer condition.

Protein system                               Buffer                       Distance   Exposure time
hydrogenated 59 protein                      100 % deuterated buffer      2 m        2 hours
perdeuterated 32-B                           100 % hydrogenated buffer    2 m        2 hours
hydrogenated 59C42S + hydrogenated 32-B      100 % deuterated buffer      6 m        5 hours
hydrogenated 59C42S + hydrogenated 32-B      100 % deuterated buffer      2 m        2 hours
hydrogenated 59 + perdeuterated 32-B         50 % deuterated buffer       2 m        2 hours
hydrogenated 59C42S + perdeuterated 32-B     100 % deuterated buffer      6 m        5 hours

Data was collected at a wavelength of 4.76 Å, with a sample to detector distance of 2815 mm or 6815 mm, a source distance of 9249 mm, and in a 1 mm cuvette. Protein or buffer (250 µL) was loaded into the cuvette and the data was collected for 2 hrs. or 6 hrs. The q range for the 2 m data collection was 0.012546-0.400785 Å⁻¹ and for the 6 m data collection was 0.007575-0.185709 Å⁻¹. With each experimental set up, several controls were collected as well; if the distance or exposure time was changed, the controls would need to be re-run. The controls included a buffer scan (0%, 50% and 100% deuterated buffer), an empty cuvette scan, a dark scan (shutter shut), and an efficiency scan (using a Lucite plate to test the beam flux). Each of the control exposures was utilized during data processing with subroutines written for the IGOR platform. IGOR subroutines were used to obtain a scattering curve, analyze the Guinier region and determine the Rg, and output a text file to be used with GNOM. Data analysis from this point is identical to SAXS, and Svergun's suite of programs is used. An additional program specific to SANS data is Cryson (Svergun), which is used to evaluate neutron scattering relative to solved structures.121
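The dependence of the accessible q range on wavelength and detector distance can be illustrated with the standard relation q = (4π/λ) sin θ, where 2θ is the scattering angle. The sketch below assumes a flat detector perpendicular to the beam; the function names and the example radius are hypothetical and are not taken from the IGOR subroutines.

```python
import math


def q_from_radius(radius_mm: float, distance_mm: float, wavelength_A: float) -> float:
    """Momentum transfer q (1/angstrom) for a pixel at a radial offset from the beam center."""
    two_theta = math.atan2(radius_mm, distance_mm)  # full scattering angle
    return 4.0 * math.pi * math.sin(two_theta / 2.0) / wavelength_A


def radius_from_q(q: float, distance_mm: float, wavelength_A: float) -> float:
    """Inverse relation: radial offset on the detector (mm) at which a given q is recorded."""
    theta = math.asin(q * wavelength_A / (4.0 * math.pi))
    return distance_mm * math.tan(2.0 * theta)


if __name__ == "__main__":
    wavelength = 4.76  # angstrom, as stated in the text
    example_radius = 100.0  # mm, hypothetical pixel position
    for distance in (2815.0, 6815.0):  # mm, as stated in the text
        q = q_from_radius(example_radius, distance, wavelength)
        print(f"distance {distance:.0f} mm: q = {q:.5f} 1/angstrom")
```

Moving the detector farther from the sample maps the same detector position to a smaller q, which is why the 6 m setting reaches lower q than the 2 m setting.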

2.3.7 X-ray crystallography

Once the protein and protein complexes were monodisperse, the samples were screened with a variety of commercial and in-house screens. Greiner (96 wells with three drops per well) and Corning trays (96 wells with one drop per well) were used in the crystallization screens. Several parameters were varied to increase the odds of obtaining protein crystals. The concentrations of the proteins were varied, additives/cofactors were added, the trays were stored at different temperatures (4 °C, 20 °C or at room temperature), and the crystallization screen conditions were used full strength and diluted. The drops for the screens usually contained 0.5 µL or 1 µL of protein and 0.5 µL or 1 µL of the well solution. The trays were scored on days 1, 3, 5, 7 and 10. Any crystals that were noted (crystal hits) were tested with the IZIT dye or by breaking the crystal in order to determine whether the crystal was protein or salt.

Crystal hits were expanded to reproduce the conditions that favor crystal nucleation and growth. Expansion trays were set up using an AB gradient, and the most common set ups for expansion trays were 4x6, 2x12, and 1x24.122 With the expansion trays, only one parameter was changed at a time per set up or per row. Variations in pH, buffer type, concentrations of salt, precipitating agents, additives, and protein, and drop sizes can affect crystal nucleation and growth. Once optimal crystallization conditions were developed and diffraction quality crystals obtained, several methods of cryoprotection were investigated (soaking, serial soaking, addition to crystal tray) in order to protect the crystal during data collection. Different chemicals (high salt concentrations, organics, polyalcohols) used in cryoprotection were varied to determine the proper conditions for crystal protection. During the crystal optimizations, monitoring the crystal diffraction facilitated the determination of the proper conditions. Testing for crystal diffraction was conducted either at room temperature or frozen at 100 K (nitrogen cryostat). For cryo-collection, the crystals were flash-cooled using a helium cryostat, producing a clear freeze when conditions were optimal. Crystals that diffracted well were stored in liquid nitrogen and taken to APS for data collection, or data was collected in-house on the Rigaku FRE using the CCD detector.

The Strategy program, found in Crystal Clear, HKL2000, and Mosflm, helps determine the optimal data collection parameters. Mosflm was used to integrate the data sets. The CCP4 suite is a collection of crystallography programs for data scaling, phase determination, detwinning, and refinement. CCP4 SCALA was used to scale the data and the CCP4 molecular replacement programs, MolRep and Phaser, were used to determine the initial phases for the model. The program COOT was used to build and adjust the model into the electron density maps and the programs REFMAC5 and CNS were used to refine the model after the COOT building session. For each building/modeling cycle, one or two residues were added at a time and then the new model was refined against the experimental data using both REFMAC5 (positional refinement) and CNS (simulated annealing). The outputs from both refinements were compared and the best model was used to continue building. This process was repeated until the model building was completed.

Chapter 3: T-even Bacteriophage DNA replication

During the middle and late stages in the life cycle of Bacteriophage T4, recombination-dependent DNA replication is the primary form of DNA replication. The 59 protein plays an important role in recognizing the replication fork and in initiating DNA synthesis by assisting in the loading of the helicase onto the lagging strand. This research focused on the further characterization and structure determination of the helicase assembly protein in binary complexes with the single-stranded binding protein and the helicase, as well as in the ternary complex of 59 protein-32 protein-fork DNA. The protein-protein interactions were investigated using chromatography, biophysical techniques and gels, while the structure was analyzed using small angle scattering and x-ray diffraction experiments. Crystallographic studies of the 59 protein interacting with fork DNA substrate would reveal how the 59 protein binds to DNA and detect any structural conformation changes. Previous research has shown that the 59 protein binds to fork DNA with a low nM affinity, but at higher concentrations the 59 protein precipitates in the presence of DNA. This aggregation phenomenon facilitates the oligomerization of the 59 protein, perhaps through cross-linking between cysteine residues 42 and 215. Mutations at both cysteine positions (59C42S, 59C42A, 59C215A, and 59C215S) were cloned in order to prevent this oligomerization and stabilize the protein.

Since the 59 protein binds to the fork DNA, then recruits the 32 protein to the fork, perhaps the 59-32 interaction is needed to help stabilize the 59 protein in the presence of DNA. 32 protein truncations (32-B, 32-A, and 32 core) will also be complexed to the 59 protein. Interactions between the 59 protein with the 32 protein and 32-B were detected with a KD around 3.7 µM and 3.6 µM, respectively. The mutant 59C42S also interacted with the 32 protein and 32-B with moderate affinity. The 59 protein binds tightly to fork DNA substrates (KD ~ 91.2 nM) and upon addition of 32 protein or 32-B into the binary protein-DNA complex, both proteins bind to the complex with a KD ~ 1.8 µM and ~18.2 µM, respectively. Protein crystals of the complex formed, but failed to optimize. Small angle X-ray scattering experiments were used to calculate molecular envelopes for the individual proteins and the complex. The protein complexes appear elongated in shape. Analytical ultracentrifugation experiments suggest that the 32-B protein is a dimer in solution.

3.1 Bacteriophage T4 results and discussion

3.1.1 Protein expression and purification

Expression vectors (transformed into expression hosts) for each of the proteins were gifts from Drs. Nancy Nossal (59 and 59C42S), Charles Jones (32), Rich Karpel (32-A and 32-B), and Yousif Shamoo (32 core). The 59 protein (pNN2859) expression in BL21 (DE3) pLysS yielded 14-16 g of cell paste from a 6 L expression. The 59C42S protein was expressed in BL21 (DE3) pLysS, producing 8-11 g of cell paste. The 59 protein was lysed in a low salt buffer (25 mM Bis Tris pH 6.5, 150 mM NH4Cl, 10 mM MgCl2, 2 mM beta-mercaptoethanol (BME), 0.3 % polyethylenimine (PEI)), and extracted from the pellet with a high salt buffer (25 mM Bis Tris pH 6.5, 1 M NH4Cl, 10 mM MgCl2, 2 mM BME, 0.3 % PEI). 59C42S was lysed in a high salt buffer (25 mM Bis Tris pH 6.5, 750 mM NH4Cl, 10 mM MgCl2, 2 mM BME, 0.3 % PEI). Both the native and mutated proteins have theoretical isoelectric points (pI) around 9.41 and cation-exchange columns were used to purify the proteins (Figure 3.1).

The 32 protein (pAS6) was expressed in E. coli N4830 cells, with a typical 6 L harvest of around 10-16 g of cells. The 32-A (pEKF1) and 32-B (pEKF2) are in AR120 E. coli cells and produced 8 g and 9-13 g of cell pellet, respectively. The 32 core (pKC30) is in OR1265 cells and produced a 38 g cell pellet. The 32 protein was lysed with a low salt lysis buffer containing 25 mM Bis Tris pH 6.5, 50 mM NaCl, 1 mM EDTA, and 2 mM BME. 32-A, 32-B, and 32 core were lysed with a low salt lysis buffer (40 mM Tris pH 8.0, 100 mM NaCl, 10 mM MgCl2, 2 mM CaCl2, and 1 mM EDTA). The theoretical isoelectric points for the 32 protein and the truncations range from 4.60 to 6.67; these acidic proteins were purified using anion-exchange columns (Figure 3.1).

Purification scheme of T4 proteins

Protein                      Step 1           Step 2      Step 3
32, 32 core, 32-A, 32-B      Q Sepharose      Poros PE    Poros HQ
59                           Hydroxyapatite   Poros PE    Poros HS
59C42S                       SP Sepharose     Poros PE    Poros HS

Figure 3.1: Purification scheme for the 59, 59C42S, 32, 32 core, 32-A, and 32-B proteins. Hydroxyapatite removes DNA and contaminating basic proteins, Poros PE removes endogenous nucleases, and SP Sepharose, Q Sepharose, Poros HQ, and Poros HS remove contaminating proteins.
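The reasoning behind the choice of cation- versus anion-exchange columns can be sketched as a simple sign rule: a protein carries a net positive charge when the buffer pH is below its pI and a net negative charge when the pH is above it. The helper below is a hypothetical illustration of that rule only; it ignores local charge patches and other effects that influence real binding behavior.

```python
def exchanger_for(theoretical_pI: float, buffer_pH: float) -> str:
    """Suggest an ion-exchange resin type from the sign of the expected net charge."""
    if buffer_pH < theoretical_pI:
        return "cation"  # protein is net positive, binds a negatively charged resin
    if buffer_pH > theoretical_pI:
        return "anion"   # protein is net negative, binds a positively charged resin
    return "none"        # at the pI the net charge is approximately zero


if __name__ == "__main__":
    # pI values quoted in the text; pH 6.5 is the Bis Tris buffer pH used in this section.
    print("59 protein (pI 9.41):", exchanger_for(9.41, 6.5))
    print("acidic end of the 32 family (pI 4.60):", exchanger_for(4.60, 6.5))
```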

Lysate containing the 59 protein was prepared for purification by decreasing the conductivity of the sample below the conductivity of buffer A (buffer A: 25 mM Tris pH 7.5, 100 mM NaCl). The Hydroxyapatite (HA) column was equilibrated with buffer A and the lysate containing the 59 protein was loaded onto the column. A 10 column volume (CV) linear gradient was used to purify the 59 protein (buffer B: 25 mM Tris pH 7.5, 1 M (NH4)2SO4); during the run, most of the contaminants bound to the resin while the 59 protein was found in the effluent. Next, small aliquots of 3 M (NH4)2SO4 were added to the 59 protein sample to increase the conductivity of the sample (to match the conductivity of buffer B, 25 mM Tris pH 7.5, 1 M (NH4)2SO4). The Poros PE (PE) was equilibrated with buffer B and the 59 protein was run over the PE to remove any endogenous nucleases. The contaminating nucleases adhered to the column, while the 59 protein was found in the effluent.123 The 59 protein PE sample was dialyzed (dialysis buffer: 25 mM Bis Tris pH 6.5, 60 mM NH4Cl, 10 mM MgCl2, 2 mM BME) to decrease the concentration of salt in the protein solution below that of buffer A (buffer A: 25 mM Bis Tris pH 6.5, 75 mM NH4Cl, 10 mM MgCl2, 2 mM BME). The 59 protein was loaded onto the Poros HS (HS) and eluted (~ 450 mM) in a broad peak during a 10 CV linear gradient (buffer B: 25 mM Bis Tris pH 6.5, 1 M NH4Cl, 10 mM MgCl2, 2 mM BME) (Figure A1). The 59 protein was fairly pure (> 95 % as assessed with SDS-PAGE) after the Poros HS run (Figure 3.2). Purification of the 6 L expression pellet produced around 100 mg of protein (Table 3.1).

[Figure 3.2 image: SDS-PAGE gel, lanes 1-4; molecular weight markers at 31.0 kDa and 21.5 kDa; 59 protein band at 25.9 kDa.]

Figure 3.2: SDS-PAGE of 59 protein from Poros HS column. The 59 protein eluted from the Poros HS in a broad peak around 450 mM NH4Cl (fractions 10-24). Lane 1: 59 HS fraction 10; lane 2: 59 HS fraction 22; lane 3: 59 HS fraction 24; lane 4: MW ladder.
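The relationship between a linear salt gradient and the point at which a protein elutes can be made concrete with a short calculation. This sketch assumes an ideal linear gradient running from 100 % buffer A to 100 % buffer B with no dead volume or gradient delay; the function names are hypothetical. The example uses the Poros HS values quoted for the 59 protein (75 mM NH4Cl in buffer A, 1 M NH4Cl in buffer B, 10 CV gradient, elution near 450 mM).

```python
def gradient_concentration(start_mM: float, end_mM: float,
                           gradient_cv: float, elapsed_cv: float) -> float:
    """Salt concentration (mM) at a given point in an ideal linear gradient."""
    if not 0.0 <= elapsed_cv <= gradient_cv:
        raise ValueError("elapsed_cv must lie within the gradient")
    return start_mM + (end_mM - start_mM) * elapsed_cv / gradient_cv


def elution_position(start_mM: float, end_mM: float,
                     gradient_cv: float, elution_mM: float) -> float:
    """Column volumes into the gradient at which a given salt concentration is reached."""
    return gradient_cv * (elution_mM - start_mM) / (end_mM - start_mM)


if __name__ == "__main__":
    cv = elution_position(start_mM=75.0, end_mM=1000.0, gradient_cv=10.0, elution_mM=450.0)
    print(f"450 mM is reached about {cv:.1f} CV into the gradient")
```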

Lysate containing the 59C42S protein was prepared for purification by decreasing the conductivity of the sample below the conductivity of buffer A (buffer A: 25 mM Bis Tris pH 6.5, 50 mM NH4Cl, 10 mM MgCl2, 2 mM BME). The SP Sepharose (SP) was equilibrated with buffer A and lysate containing the 59C42S protein was loaded onto the column. The 59C42S was eluted (~ 450 mM) in a broad peak during a 10 CV linear gradient run (buffer B: 25 mM Bis Tris pH 6.5, 1.5 M NH4Cl, 10 mM MgCl2, 2 mM BME). The SP fractions containing the 59C42S protein were combined and the conductivity of the sample was increased (by addition of either 4 M NH4Cl or 3 M (NH4)2SO4) to match the conductivity of buffer B. The 59C42S protein was run over the PE, where the nucleases bound to the resin while the 59C42S was located in the effluent. The conductivity of the 59C42S PE sample was decreased below the conductivity of buffer A through dialysis (25 mM PIPES pH 6.5, 40 mM NH4Cl, 10 mM MgCl2, 2 mM BME). The 59C42S eluted from the HS (~ 450 mM) in a narrow peak during a 20 CV linear gradient (buffer B: 25 mM Bis Tris pH 6.5, 1 M NH4Cl, 10 mM MgCl2, 2 mM BME) (Figure A2). The 59C42S was fairly pure (> 95 % as assessed with SDS-PAGE) after the Poros HS run (Figure 3.3). Purification of the 6 L expression pellet of 59C42S produced around 100 mg of protein (Table 3.1).

[Figure 3.3 image: SDS-PAGE gel, lanes 1-5; molecular weight markers at 31.0 kDa and 21.5 kDa; 59C42S band at 25.9 kDa.]

Figure 3.3: SDS-PAGE of 59C42S protein from Poros HS column. The 59C42S protein eluted from the Poros HS in a narrow peak around 450 mM NH4Cl. Lane 1: MW ladder; lane 2: Flow through; lane 3: 59C42S HS fraction 28; lane 4: 59C42S HS fraction 30; lane 5: 59C42S HS fraction 32.

Lysate containing the 32 protein was prepared for purification by decreasing the conductivity of the lysate to below the conductivity of buffer A (buffer A: 25 mM Bis Tris pH 6.5, 50 mM NaCl, 1 % glycerol, 2 mM BME). The Q Sepharose (QS) was equilibrated with buffer A and the lysate containing the 32 protein was loaded onto the column. The 32 protein eluted from the QS in a linear stretch (~ 375 mM) during a 10 CV linear gradient (buffer B: 25 mM Bis Tris pH 6.5, 1 M NaCl, 1 % glycerol, 2 mM BME). The conductivity of the QS fractions was increased (addition of 2 M NaCl and 3 M (NH4)2SO4) to match the conductivity of buffer B. The 32 protein sample was run over the PE and the contaminating nuclease bound to the column, while the 32 protein was found in the effluent. The effluent was dialyzed (25 mM Bis Tris pH 6.5, 25 mM NaCl, 1 % glycerol, 2 mM BME) to reduce the conductivity of the sample below the conductivity of buffer A. The 32 protein sample was then loaded onto the Poros HQ (HQ), which was equilibrated with buffer A. The 32 protein eluted from the HQ (~ 450 mM) in a narrow peak during the 10 CV linear gradient run (Figure A3). The 32 protein had minor contaminants at the beginning of the peak, with relatively pure (> 95 %, assessed by SDS-PAGE) 32 protein at the end of the peak (Figure 3.4). Purification of the 6 L expression pellet of 32 protein produced around 40-50 mg of protein (Table 3.1).

[Figure 3.4 image: SDS-PAGE gel, lanes 1-5; molecular weight markers at 36.5 kDa and 31.0 kDa; 32 protein band at 33.5 kDa.]

Figure 3.4: SDS-PAGE of 32 protein from Poros HQ. The 32 protein eluted from the Poros HQ in a narrow peak around 450 mM NaCl. Lane 1: MW ladder; lane 2: 32 HQ fraction 29; lane 3: 32 HQ fraction 32; lane 4: 32 HQ fraction 34; lane 5: 32 HQ fraction 38.

Lysate containing 32-A was prepared for purification by decreasing the conductivity of the lysate to below the conductivity of buffer A (buffer A: 25 mM Bis Tris pH 6.5, 50 mM NH4Cl, 1 % glycerol, 2 mM BME). The Q Sepharose column was equilibrated with buffer A and the lysate containing the 32-A protein was loaded onto the column. 32-A eluted from the QS in a broad peak (~ 250 mM) during a 10 CV linear gradient run (buffer B: 25 mM Bis Tris pH 6.5, 1 M NH4Cl, 1 % glycerol, 2 mM BME). The conductivity of the QS fractions was increased (addition of 2 M NaCl and 3 M (NH4)2SO4) to match the conductivity of buffer B. 32-A was run over the PE and the nuclease adhered to the column while the 32-A was located in the effluent. The conductivity of 32-A was decreased during dialysis (25 mM Bis Tris pH 6.5, 25 mM NaCl, 1 % glycerol, 2 mM BME) to a conductivity lower than the conductivity of buffer A. The dialyzed 32-A was loaded onto the HQ, after the HQ was equilibrated with buffer A. 32-A was eluted from the HQ in a narrow peak (~ 250 mM) during a 10 CV linear gradient run (Figure A4). The 32-A was fairly pure (> 95 %, assessed by SDS-PAGE) in the fractions towards the end of the peak (Figure 3.5). Purification of the 6 L expression pellet of 32-A protein produced around 100 mg of protein (Table 3.1).

[Figure 3.5 image: SDS-PAGE gel, lanes 1-13; molecular weight markers at 31.0 kDa and 21.5 kDa; 32-A band at 28.6 kDa.]

Figure 3.5: SDS-PAGE of 32-A protein from Poros HQ. The 32-A protein eluted from the Poros HQ in a narrow peak around 250 mM NaCl. Lane 1: MW ladder; lane 2: HQ flow through; lane 3: 32-A HQ fraction 8; lane 4: 32-A HQ fraction 9; lane 5: 32-A HQ fraction 10; lane 6: 32-A HQ fraction 11; lane 7: 32-A HQ fraction 12; lane 8: 32-A HQ fraction 13; lane 9: 32-A HQ fraction 14; lane 10: 32-A HQ fraction 15; lane 11: 32-A HQ fraction 16; lane 12: 32-A HQ fraction 17; lane 13: 32-A HQ fraction 18.

Lysate containing 32-B protein was prepared for purification by decreasing the conductivity of the lysate below the conductivity of buffer A (buffer A: 25 mM Tris pH 7.5, 50 mM Na3citrate). The Q Sepharose column was equilibrated with buffer A and the lysate containing the 32-B protein was loaded onto the column. 32-B eluted from the QS in a broad peak (~ 275 mM) during a 10 CV linear gradient run (buffer B: 25 mM Tris pH 7.5, 50 mM Na3citrate, 1 M NaCl). The conductivity of the QS fractions was increased (addition of 2 M NaCl and 3 M (NH4)2SO4) to match the conductivity of buffer B. 32-B was run over the PE and the nuclease bound to the resin while the 32-B was located in the effluent. The conductivity of 32-B was decreased during dialysis (25 mM Tris pH 7.5, 25 mM Na3citrate) to a conductivity lower than the conductivity of buffer A (25 mM Tris pH 7.5, 25 mM Na3citrate). The dialyzed 32-B was loaded onto the HQ, after the HQ was equilibrated with buffer A. 32-B was eluted from the HQ in a broad peak (~ 150 mM) during a 10 CV linear gradient run, and 32-B was also located in the effluent (Figure A5). The HQ fractions containing the 32-B and the effluent were fairly pure (< 95 %, assessed by SDS-PAGE) (Figure 3.6). Several bands were also present with the 32-B band; the MW of the bands corresponded to 32-B at higher oligomeric states (dimers). The addition of a reducing agent dissociated the oligomers and only monomeric 32-B remained. Purification of the 6 L expression pellet of 32-B protein produced around 200-400 mg of protein (Table 3.1).

[Figure 3.6 image: SDS-PAGE gel, lanes 1-5; molecular weight markers at 36.5 kDa and 31.0 kDa; 32-B band at 31.8 kDa.]

Figure 3.6: SDS-PAGE of 32-B protein from Poros HQ. The 32-B protein eluted from the Poros HQ in a broad peak around 150 mM NaCl. Lane 1: HQ flow through; lane 2: HQ flow through 2; lane 3: 32-B HQ fraction 6; lane 4: 32-B HQ fraction 18; lane 5: MW ladder.

Table 3.1: Elution parameters for 59, 59C42S, 32, 32-B and 32-A. Table includes the column used for protein purification, the fraction range in which the protein eluted from the column, and the conductivity range of the elution buffer at which the protein eluted from the column. The 32 core was purified by Laurence Boutemy.

Protein    Column   Fraction elution   Conductivity range (mS/cm)   [salt] (mM)   Protein yield (mg)
59         HS       12-22              35-70                        450           100
59C42S     SP       13-26              33-75                        400           N/A
59C42S     HS       29-33              35-70                        450           100
32         QS       37-49              40-55                        375           N/A
32         HQ       30-38              60-75                        450           50
32-A       QS       20-32              27-48                        250           N/A
32-A       HQ       11-16              25-35                        250           100
32-B       QS       32-44              35-50                        275           N/A
32-B       HQ       6-18               10-30                        150           300
32 core    QS       59-71              25-35                        350           N/A
32 core    HQ       9-20               15-35                        175           60

3.1.2 T4 protein characterization

3.1.2.1 Differential scanning calorimetry

Differential scanning calorimetry experiments were performed to determine the temperature stability of the proteins. The experimental program started with an equilibration step at 4 °C for 15 minutes, followed by an upscan at a rate of 90 °C/hr to a final temperature of 75 °C. The temperature was held at 75 °C for 15 minutes, followed by a downscan at the rate of 60 °C/hr to a final temperature of 4 °C. The concentration of the 59 protein and 32-B was 50 µM for both runs. The 59 protein had two transitions, 52.8 °C and 59 °C, and 32-B also had two transitions, 49.4 °C and 61.3 °C. Data analysis calculated a TM of 52.4 °C for the 59 protein and two TM for the 32-B protein, TM 1 49.9 °C and TM 2 62.5 °C (Figure 3.7). Multiple peaks suggest that the 32-B protein is unfolding but the increase in temperature is either creating aggregates or dissociating the soluble aggregates.

[Figure 3.7 image: DSC thermograms. Panel labels: 59 protein, TM 52.4 °C; 32-B protein, TM 1 49.9 °C and TM 2 62.5 °C.]

Figure 3.7: DSC thermograms for 59 protein and 32-B protein. The 59 protein had a TM around 52.4 °C, while the 32-B protein had two TM, 49.9 °C and 62.5 °C. Two TM for 32-B protein suggests protein unfolding and association or dissociation of aggregates.

3.1.2.2 Dynamic light scattering

Dynamic light scattering experiments (Table 2.3) were performed multiple times, obtaining reproducible results in most cases. Data were collected at a protein concentration around 1 mg/mL with 10 acquisitions (consisting of ten 1-second measurements per acquisition) at a laser power of 80 % (W). The 59 and 59C42S proteins were monomeric in solution and predominantly monodisperse, with a hydrodynamic radius (RH) ranging from 2.3 nm to 2.5 nm. The 32 protein was oligomeric in solution and very polydisperse. There were at least five to ten 32 proteins in the oligomer, with an RH ranging from 5.5 nm to 6.9 nm. The 32-A truncation was oligomeric in solution and the molecular weight suggested that there were six to seven 32-A proteins present in the oligomer. The hydrodynamic radius of 32-A indicated formation of aggregates with an average radius between 5.3 nm and 5.8 nm. Several experiments suggested that the 32-B truncation was either a monomer or dimer in solution. The monomer had an RH range of 2.5 nm to 2.8 nm, while the dimer had an RH between 3.3 nm and 3.6 nm. This variation was seen at different temperatures (4 °C or 20 °C) and in different protein preps/batches. Even though the purification runs appeared to be the same per protein batch, differences in buffer preparation or the oxidation of the reducing agent could alter the oligomeric state of the protein in solution.

[Figure 3.8 image: DLS size distributions. Panel labels: 59C42S protein; 32-B protein.]

Figure 3.8: DLS data for the 59 protein and 32-B protein. The data for the 59 protein suggest that the protein has a low polydispersity and is a monomer. The 32-B protein has two peaks, indicating a slightly polydisperse sample.

Table 3.2: Dynamic light scattering results of the individual T4 proteins. Protein samples (1 mg/mL) were measured at 80% laser power for 15 acquisitions (10 readings per acquisition). Representative runs of each of the individual proteins are shown. Table includes the temperature at which the experiment was run, the degree of polydispersity of the protein sample, the hydrodynamic radius of the protein and the estimated molecular weight of the protein.

Protein    Temperature (°C)   % Polydispersity   Radius (Å)   Molecular weight (kDa)   Calculated MW (kDa)
59C42S     4                  9.8                23 ± 2       27.0                     25.9
59C42S     25                 18.6               25 ± 4       29.0                     25.9
59         4                  7.2                22 ± 2       22.0                     25.9
59         25                 20.6               23 ± 5       23.0                     25.9
32         4                  26.2               56 ± 15      192.0                    33.5
32         25                 13.9               64 ± 9       262.0                    33.5
32-A       4                  20.0               57 ± 10      194.0                    28.6
32-B       4                  27.8               28 ± 8       37.0                     31.8
32-B       4                  16.0               32 ± 4       52.0                     31.8
32-B       25                 12.2               28 ± 3       37.0                     31.8
32-B       25                 27.3               36 ± 6       69.0                     31.8
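The oligomeric states discussed above follow from dividing the DLS molecular weight estimate by the calculated monomer molecular weight. The short sketch below applies that ratio to a few rows of Table 3.2; the function name is hypothetical, and the DLS molecular weight itself is a shape-dependent estimate, so the ratios are only approximate.

```python
def apparent_subunits(estimated_mw_kda: float, monomer_mw_kda: float) -> float:
    """Apparent number of subunits from a DLS molecular weight estimate."""
    return estimated_mw_kda / monomer_mw_kda


# (protein, temperature in deg C, DLS MW estimate in kDa, calculated monomer MW in kDa)
# Values copied from Table 3.2.
rows = [
    ("59C42S", 4, 27.0, 25.9),
    ("32", 4, 192.0, 33.5),
    ("32", 25, 262.0, 33.5),
    ("32-A", 4, 194.0, 28.6),
    ("32-B", 4, 37.0, 31.8),
    ("32-B", 25, 69.0, 31.8),
]

if __name__ == "__main__":
    for name, temperature, mw_dls, mw_calc in rows:
        n = apparent_subunits(mw_dls, mw_calc)
        print(f"{name:7s} {temperature:3d} C  apparent subunits: {n:.1f}")
```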

3.1.2.3 Size exclusion chromatography

The Superdex 75 (size exclusion chromatography) verified that 59 (2.1 mg/mL) and 59C42S (3.0 mg/mL) proteins were monomeric in solution, but 32-B (4.5 mg/mL) was a dimer in solution (data not shown). 32-B eluted in the void fraction of the column (MW cutoff around 60 kDa), which would indicate the presence of a dimer species. The formation of 32-B dimers might be concentration dependent; however, both monomers and dimers were seen in DLS experiments, where the concentrations were always around 1 mg/mL.

3.1.2.4 Fluorescence anisotropy

Fluorescence anisotropy was utilized to determine the binding affinity of the 59 protein for a fork DNA substrate, with and without the presence of 32 protein. 59 protein was titrated into 200 µM 30+30 fork DNA (5’ HEX tag) and the dissociation constant was 91.2 nM (Figure 3.9). The percentage of 59 protein bound to DNA at the end of the titration was greater than 95 %.

[Figure 3.9 image: plot titled "Titration of 59 protein into 30+30 fork DNA"; x axis: [59 protein] (nM), 0-2500; y axis: Anisotropy, 0.09-0.21.]

Figure 3.9: Fluorescence anisotropy results of 59 protein and fork DNA. Plot of concentration of 59 protein ([Rev] nM) and the anisotropy (signal) of binding to 30+30 fork DNA. The dissociation constant for the titration of 59 protein into 200 µM DNA was 91.2 nM.
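The link between the dissociation constant and the fraction bound at the end of the titration can be checked with a simple one-site binding model. This is a minimal sketch, assuming a single binding site and that the free protein concentration is approximately equal to the total protein added (ligand depletion is ignored); it is not the fitting procedure used to analyze the data, and the anisotropy endpoints in the second function are hypothetical parameters.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """One-site binding isotherm: fraction of DNA bound at a given free protein concentration."""
    return protein_nM / (kd_nM + protein_nM)


def anisotropy(protein_nM: float, kd_nM: float, r_free: float, r_bound: float) -> float:
    """Observed anisotropy as a weighted average of free and bound DNA signals.

    r_free and r_bound are hypothetical endpoint values supplied by the caller.
    """
    f = fraction_bound(protein_nM, kd_nM)
    return r_free + (r_bound - r_free) * f


if __name__ == "__main__":
    kd = 91.2        # nM, dissociation constant reported in the text
    final = 2500.0   # nM, upper end of the titration axis in Figure 3.9
    print(f"fraction bound at {final:.0f} nM: {fraction_bound(final, kd):.3f}")
```

Under these assumptions, a KD of 91.2 nM gives a bound fraction above 0.95 at 2500 nM, which agrees with the statement that more than 95 % was bound at the end of the titration.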

In the next experiment, the 32 protein was titrated into a solution of 59 protein already bound to DNA (59:DNA 1:1 molar ratio, 400 nM 59:400 nM DNA, around 50 % of 59 protein bound to the DNA). The dissociation constant for the ternary complex was 1.8 µM with around 70.8 % of 32 bound to the 59/DNA complex (Figure 3.10).

[Figure 3.10 image: plot titled "Titration of 32 protein into 59 : fork DNA"; x axis: [32 protein] (nM), 0-5000; y axis: Anisotropy, 0.16-0.24.]

Figure 3.10: Fluorescence anisotropy of the 59/32/DNA ternary complex. Plot of concentration of 32 protein ([Rev] nM) and the anisotropy (signal) of binding to 59/DNA. The dissociation constant for the titration of 32 protein into 400 nM 59/DNA was 1.8 µM.

32-B was also titrated into the 59/DNA mixture (200 nM 59: 200 nM 30+30 fork DNA). The dissociation constant for the 59/32-B/DNA ternary complex was 18.2 µM with around 70.0 % of 32-B binding to the 59/DNA binary complex (Figure 3.11). Both 32 and 32-B proteins bind to 59 protein with moderate affinity (KD around 3.8 µM), but in the presence of DNA, the dissociation constant changes. Binding becomes tighter between 59 protein and 32 protein in the presence of fork DNA, while there is a weaker interaction between 59 and 32-B in the presence of DNA. The 32 protein binds more tightly because of the presence of the N-terminal domain, which contains a LAST domain that facilitates binding to DNA. To correctly evaluate the results, a control (32 and 32-B titrated into 30+30 fork DNA) needs to be performed so that the binding constants can be compared to determine the effect of 32 binding to the binary complex (59 + DNA).

[Figure 3.11 image: plot titled "Titration of 32-B protein into 59 : fork DNA"; x axis: [32-B protein] (nM), 0-30000; y axis: Anisotropy, 0.14-0.20.]

Figure 3.11: Fluorescence anisotropy of the 59/32-B/DNA ternary complex. Plot of concentration of 32-B protein ([Rev] nM) and the anisotropy (signal) of binding to 59/DNA. The dissociation constant for the titration of 32-B protein into 200 nM 59/DNA was 18.2 µM.

3.1.2.5 Small angle X-ray scattering of T4 proteins

Small angle X-ray scattering (SAXS) data collection on individual proteins (with solved X-ray structures) allowed the development of a procedure in which the SAXS data could be processed, analyzed, and modeled. The q range for data collection was 0.0200910-0.836965 Å⁻¹ (312.7-7.5 Å). The scattering data for each of the individual proteins was collected at 20 second and 40 second intervals (Figure 3.12). There was more intense scattering with the 40 second data runs and these data runs were averaged together, followed by a background subtraction of the buffer (also collected at the 40 second interval) (Figure 3.13). In each data set, the intensity decreased from low resolution (low q) to high resolution (high q). The larger the drop in intensity across the q range, the better the quality of the data. The processed scattering curves for the 59 protein and 32 protein decreased at least two orders of magnitude, suggesting the data was good. On the other hand, the 32-B data did not follow this trend, with the intensity decreasing only about 1.5 orders of magnitude.
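The real-space distances quoted in parentheses for the q range follow from the relation d = 2π/q, and the "orders of magnitude" criterion is the base-10 logarithm of the ratio of the highest to the lowest intensity. The helper names below are hypothetical; the q values are those stated in the text.

```python
import math


def d_spacing(q_inv_angstrom: float) -> float:
    """Real-space distance (angstrom) corresponding to a momentum transfer q."""
    return 2.0 * math.pi / q_inv_angstrom


def orders_of_magnitude(i_max: float, i_min: float) -> float:
    """Decades spanned by a scattering curve between its highest and lowest intensity."""
    return math.log10(i_max / i_min)


if __name__ == "__main__":
    for q in (0.0200910, 0.836965):
        print(f"q = {q} 1/angstrom corresponds to d = {d_spacing(q):.1f} angstrom")
```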

Figure 3.12: Example of SAXS scattering profiles. Example of the scattering profiles for the 59 protein at 20 second intervals (left panel) and at 40 second intervals (right panel). The darker the spots, the greater the intensity of the scattered X-ray beam. The 40 second interval data was processed, analyzed, and modeled for all of the individual proteins.

[Figure 3.13 image: three scattering-curve panels labeled 32, 59, and 32-B.]

Figure 3.13: Scattering curves of the individual T4 proteins. The 59 protein data decreased 3 orders of magnitude, with the processed 20 second interval data in blue and the 40 second interval data in purple. The 32 protein data decreased almost 4 orders of magnitude, with the processed 20 second interval data in green and the 40 second interval data in purple. The 32-B data only decreased 1.5 orders of magnitude, with the processed 20 second interval data in orange and the 40 second interval data in pink.

During data analysis and modeling, the radius of gyration (Rg) was calculated and the molecular envelopes for each of the proteins were determined. Analysis of the Guinier region (extreme low q data) using the program Primus calculated an Rg of 23.7 Å for the 59 protein, 47.0 Å for the 32 protein, and 28.7 Å for the 32-B (Figure 3.14).111 Aggregation was seen in the 32 protein sample, which was evident from the large Rg (steep Guinier slopes are characteristic of protein aggregation). The low resolution data for 32 protein showed the characteristics of aggregate formation or of a heterogeneous solution containing a variety of large multimers. The Rg for the 59 protein was similar to the hydrodynamic radius determined by DLS, suggesting the protein was monodisperse, with little to no aggregation present, and the Rg for 32-B was within the range of radii determined by DLS experiments.

[Figure 3.14 image: Guinier plots. Panel A: 59 protein; panel B: 32 protein; panel C: 32-B protein.]

Figure 3.14: Guinier plots of the individual T4 proteins. Primus was utilized to examine the Guinier region of the data for each of the proteins, where the Rg is derived from the slope of the Guinier plot. A). The Rg from the Guinier plot of the 59 protein was 23.7 Å. B). The Rg from the Guinier plot of the 32 protein was 47.0 Å; the steepness of the slope suggests that the protein is aggregated or heterogeneous. C). The Rg from the Guinier plot of 32-B was 28.7 Å.111
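The Guinier analysis described above rests on the approximation ln I(q) = ln I(0) - (Rg² q²)/3, valid at low q (commonly taken as q·Rg below about 1.3). A straight-line fit of ln I against q² therefore has a slope of -Rg²/3. The following is a minimal sketch of that fit on synthetic data; it is not the Primus implementation, and the function name and synthetic I(0) value are hypothetical.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares line through (q^2, ln I); returns (Rg, I0)."""
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    rg = math.sqrt(-3.0 * slope)  # slope = -Rg^2 / 3
    return rg, math.exp(intercept)


# Synthetic, noise-free data generated with the Rg reported for the 59 protein.
rg_true = 23.7    # angstrom
i0_true = 100.0   # arbitrary units (hypothetical)
q_values = [0.005 + 0.002 * k for k in range(20)]  # q * Rg stays below about 1.3
i_values = [i0_true * math.exp(-((qi * rg_true) ** 2) / 3.0) for qi in q_values]

rg_fit, i0_fit = guinier_fit(q_values, i_values)

if __name__ == "__main__":
    print(f"fitted Rg = {rg_fit:.2f} angstrom, I(0) = {i0_fit:.2f}")
```

With aggregated samples the low-q points curve upward in the Guinier plot, so the fitted slope steepens and the apparent Rg becomes inflated.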

Unlike Primus, the program GNOM was used to evaluate all of the scattering data (low and high q), and through this analysis, any effects from the concentration or aggregation of the proteins can be compensated for.112, 113 With GNOM, several variables were changed during data analysis and the resulting Rg varied as well. Several parameters (Dmax, q range, Chi value) were monitored to evaluate the data analysis using the program GNOM (Table 3.3). After designating the number of scattering points to omit from the beginning and end of the run (truncating the q range) and determining the maximum linear distance (Dmax), the data was fit to a theoretical scattering curve (Figure 3.15). The better the fit of the experimental data to the theoretical curve, the more accurate the Dmax was for that q range.
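The connection between the pair distance distribution P(r), Dmax, and Rg can be illustrated with the real-space relation Rg² = ∫ r² P(r) dr / (2 ∫ P(r) dr), integrated from 0 to Dmax. The sketch below evaluates this numerically for the analytical P(r) of a uniform sphere, used here only as a test shape; it is not the GNOM algorithm, and the 80 Å diameter is simply borrowed from one of the Dmax values explored for the 59 protein.

```python
import math


def rg_from_pr(r, p):
    """Radius of gyration from a sampled P(r), using trapezoidal integration."""
    numerator = 0.0
    denominator = 0.0
    for k in range(len(r) - 1):
        dr = r[k + 1] - r[k]
        numerator += 0.5 * dr * (r[k] ** 2 * p[k] + r[k + 1] ** 2 * p[k + 1])
        denominator += 0.5 * dr * (p[k] + p[k + 1])
    return math.sqrt(numerator / (2.0 * denominator))


def sphere_pr(r, diameter):
    """Unnormalized P(r) of a uniform sphere; zero at r = 0 and r = diameter."""
    x = r / diameter
    return x * x * (2.0 - 3.0 * x + x ** 3)


diameter = 80.0  # angstrom, illustrative Dmax
n_points = 2000
r_grid = [diameter * k / n_points for k in range(n_points + 1)]
p_grid = [sphere_pr(ri, diameter) for ri in r_grid]
rg_sphere = rg_from_pr(r_grid, p_grid)

if __name__ == "__main__":
    print(f"sphere with Dmax {diameter:.0f} angstrom: Rg = {rg_sphere:.2f} angstrom")
```

For a uniform sphere the result should equal sqrt(3/5) times the radius, which gives a simple check on the numerical integration.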

Table 3.3: Analysis of the T4 protein scattering data with GNOM. The scattering data for each individual T4 protein (59 protein and 32-B) were analyzed with GNOM112 while varying several parameters (q range and Dmax). The best analyses based on curve fitting and Chi values are listed below.

Protein   Rg (Å)        Dmax (Å)   q range (Å⁻¹)    Chi value
59        23.2 ± 0.73   80         0.0366-0.3759    0.98
59        23.9 ± 0.26   90         0.0366-0.2585    1.00
59        24.3 ± 0.71   90         0.0366-0.1406    0.83
59        23.5 ± 0.09   85         0.0366-0.3759    0.83
32-B      29.3 ± 0.29   100        0.0366-0.2585    0.98
32-B      29.5 ± 0.18   105        0.0366-0.3759    0.98
32-B      31.3 ± 0.50   125        0.0366-0.1642    0.99


A. 59 protein: Dmax 80 Å, q range 0.0366-0.3759 (left); Dmax 90 Å, q range 0.0366-0.1406 (right)
B. 32-B: Dmax 100 Å, q range 0.0366-0.2585 (left); Dmax 125 Å, q range 0.0366-0.1642 (right)

Figure 3.15: Experimental scattering curves for 59 and 32-B proteins. A). The scattering curves for the 59 protein (left panel: Dmax 80 Å, q range: 0.0366-0.3759) (right panel: Dmax 90 Å, q range: 0.0366-0.1406). B). The scattering curves for 32-B (left panel: Dmax 100 Å, q range: 0.0366-0.2585) (right panel: Dmax 125 Å, q range: 0.0366-0.1642). In both cases, the experimental data overlaps the theoretical curve at these Dmax. The q range is on the x axis, while the I0 (intensity) is on the y axis.112

The 59 and 32-B protein scattering curves indicated the presence of a small amount of aggregation. The aggregation could be due to the slow movement of the protein sample through the capillary and the temperature of the run (ambient), thus increasing the radiation damage to the protein.


The Chi value was also monitored for the different q ranges and Dmax, where a Chi value around 1 was optimal. GNOM was run many times with different q ranges and Dmax in order to find the optimal range of data for the modeling stage. The 59 protein data went through 21 different data runs, the 32 protein had 6 data runs, and the 32-B had 22 data runs. The top data runs for each protein are found in Table 2.5; the top choices were chosen based on curve fitting, Pr plots, low Rg error, and a Chi value close to one. The Rg values calculated with GNOM were quite similar to the values calculated in Primus for each protein, except for the 32 protein. The 59 protein had an Rg around 23.2 Å and the 32-B protein had an Rg of 29.5 Å. It was very difficult to determine an Rg for the 32 protein because of the aggregation and heterogeneity of the solution. The scattering curve of the 32 protein was compared to the theoretical scattering curves of several proteins; the comparison showed that the larger the oligomer/protein, the steeper the slope of the scattering curve (Figure 3.16). This suggests that the 32 protein is a large multimer in solution, agreeing with the DLS data collected on the 32 protein.
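The role of the Chi value as a goodness-of-fit measure can be sketched as follows. The sketch assumes the conventional definition, the square root of the error-weighted mean squared residual between the experimental and fitted intensities; the exact expression used internally by GNOM may differ. The function name and all intensity values are hypothetical.

```python
import math

def chi_value(i_exp, i_fit, sigma):
    """Square root of the reduced chi-square between experimental and fitted data."""
    n = len(i_exp)
    total = sum(((e - f) / s) ** 2 for e, f, s in zip(i_exp, i_fit, sigma))
    return math.sqrt(total / (n - 1))

# Hypothetical intensities, fit values, and experimental errors.
i_exp = [100.0, 80.0, 60.0, 40.0, 20.0]
i_fit = [102.0, 78.0, 60.0, 42.0, 18.0]
sigma = [2.0, 2.0, 2.0, 2.0, 2.0]

chi = chi_value(i_exp, i_fit, sigma)
print(f"Chi = {chi:.2f}")
```

Under this definition, a value near 1 means the fit deviates from the data by about one experimental error on average; values well above 1 indicate a poor fit, and values well below 1 suggest overestimated errors or overfitting.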

Table 3.4: Comparison of the T4 protein radius of gyration from Primus and GNOM. Each of the programs calculated similar values for each of the proteins, except for the 32 protein, suggesting aggregation or a heterogeneous solution. The Rg was compared to the hydrodynamic radius (RH) determined from DLS as a control in the analysis of the scattering data.

Protein    Primus (Rg) Å    GNOM (Rg) Å    DLS (RH) Å
59         23.7 ± 0.08      23.9 ± 0.26    22-23
32         47.0 ± 0.07      N/A            56-64
32-B       28.7 ± 0.09      29.5 ± 0.18    28-36


Theoretical scattering curves of atomic structures via Crysol. Axes: I0 versus q (1/Å). Curves: Helicase (55.7 kDa), DNA polymerase (68.0 kDa), DnaB-DnaG complex (122 kDa), 59 protein (26.0 kDa), 32 protein (33.5 kDa), Helicase (334.2 kDa).

Figure 3.16: Theoretical scattering curves created with Crysol. Theoretical scattering curves were calculated with Crysol from known X-ray structures (helicase monomer, polymer, DNA polymerase, and protein complex DnaB-DnaG) and compared to the experimental scattering curves of the 59 protein and the 32 protein. The larger the MW of the system, the steeper the slope of the scattering curve. All of the theoretical scattering curves plateau at the extreme low q region; lack of this characteristic suggests aggregation in the protein.

Along with the scattering curves, GNOM generates a distance distribution function (Pr plot), from which the Rg is also calculated. The shape of the curve depends on the distribution of distances between pairs of points within the system, up to the maximum dimension (Dmax); thus the shape of the curve suggests the overall shape of the system. The Pr plots of the 59 protein suggested that the 59 had an overall elongated shape, consistent with the X-ray structure, while the Pr plots of the 32-B protein were also indicative of an elongated shape (Figure 3.17).
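The relationship between a distance distribution function and the Rg can be sketched numerically. The sketch assumes the standard real-space relation Rg² = ∫ r² P(r) dr / (2 ∫ P(r) dr) and checks it against the analytical P(r) of a uniform sphere, for which Rg = √(3/5)·R. The 30 Å sphere radius and the helper names are illustrative assumptions, not values or routines from GNOM.

```python
import math

def trapezoid(y, x):
    """Simple trapezoidal integration of y over x."""
    return sum((y[k] + y[k + 1]) * (x[k + 1] - x[k]) / 2.0 for k in range(len(x) - 1))

def rg_from_pr(r, p):
    """Rg from the second moment of the distance distribution function."""
    numerator = trapezoid([ri ** 2 * pv for ri, pv in zip(r, p)], r)
    denominator = trapezoid(p, r)
    return math.sqrt(numerator / (2.0 * denominator))

def sphere_pr(r, dmax):
    """Analytical P(r) (unnormalized) of a uniform sphere with diameter dmax."""
    x = r / dmax
    return r ** 2 * (1.0 - 1.5 * x + 0.5 * x ** 3)

radius = 30.0                      # Å, hypothetical sphere radius
dmax = 2.0 * radius                # maximum dimension of the sphere
r = [dmax * k / 2000 for k in range(2001)]
p = [sphere_pr(ri, dmax) for ri in r]

rg = rg_from_pr(r, p)
print(f"Rg from P(r) = {rg:.2f} Å, analytical = {math.sqrt(0.6) * radius:.2f} Å")
```

A sphere gives a symmetric, bell-shaped P(r) peaking near Dmax/2, whereas an elongated particle gives a peak at short distances followed by a long tail toward Dmax.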


A. 59 protein: Dmax 80 Å, q range 0.0366-0.3759 (left); Dmax 90 Å, q range 0.0366-0.1406 (right)
B. 32-B: Dmax 100 Å, q range 0.0366-0.2585 (left); Dmax 125 Å, q range 0.0366-0.1642 (right)

Figure 3.17: Distance distribution plots of the 59 and 32-B proteins. The scattering data for the 59 and 32-B proteins were analyzed with GNOM at different Dmax and q ranges, with two examples shown above. A). Pr plots of the 59 protein (left panel: Dmax 80 Å, q range: 0.0366-0.3759) (right panel: Dmax 90 Å, q range: 0.0366-0.1406). B). Pr plots of 32-B (left panel: Dmax 100 Å, q range: 0.0366-0.2585) (right panel: Dmax 125 Å, q range: 0.0366-0.1642). All the plots suggest an elongated shape for each of the proteins.112

The programs Gasbor and Dammin were used to create the molecular envelopes for the 59 and 32-B proteins. Gasbor and Dammin produced similar envelopes for the 59 protein; each envelope appeared to be the correct size, but the shape was different from what was expected (Figure 3.18). Different q ranges were used with the Gasbor program, and the models were similar despite the different q ranges.

Figure 3.18: Molecular envelopes of the 59 protein. Gasbor and Dammin were used to generate molecular envelopes for the 59 protein. Models B and C have a characteristic “tail” extending from the elongated shape. Model A (dark blue) was created with Gasbor (q range 0.0366-0.1406, Dmax 80 Å); model B (yellow) was created with Dammin (Keep mode) and averaged in Damaver (q range 0.0366-0.2585, Dmax 90 Å); model C was created with Gasbor (q range 0.0366-0.1406, Dmax 80 Å); model D was created with Gasbor (q range 0.0366-0.3759, Dmax 80 Å); and model E is model D with the X-ray coordinates of the 59 protein superimposed into the envelope. The surface envelopes were generated with PYMOL and the protein atomic coordinates were superimposed into the envelope using COOT; the models were created in PYMOL.23, 24


Model A had a Chi value of 3.4, model C had a Chi value of 0.85, model D had a Chi value of 2.5, and model E had a Chi value of 1.7. Model E appeared to fit the X-ray coordinates well and had the lowest Chi value. The theoretical scattering curve of the 59 protein X-ray coordinates was compared to the experimental scattering curve (using the program Crysol) and the curves overlap in the low q range with major deviations in the high q range (Figure 3.19).118 The differences between the curves could be due to any movement of the protein during the SAXS data collection as well as the presence of flexible areas in the protein.

Figure 3.19: Experimental and theoretical scattering curves of the 59 protein. A theoretical scattering curve for the 59 protein was calculated with the program Crysol and compared to the experimental curve. There is an overlap in the low q region but the rest of the data does not overlap (Chi value 7.2). Differences in the curves can be attributed to flexibility of the protein. The experimental scattering data are shown as red circles and the theoretical scattering curve as a blue line.118


Several models of the 32-B protein were generated by Dammin and Gasbor, but the Chi values for many of the models were above 2. One model, generated with Gasbor, did have a Chi value of 1.24 with two 32-B molecules fitting into the envelope (Figure 3.20). The molecular weight for the 32-B system was calculated from the I0 (from Primus and GNOM) and a known protein (59 protein, known molecular weight and I0). The MW of the 32-B for this scattering data ranged from 61.9 kDa (Primus) to 61.4-64.2 kDa (GNOM), suggesting that the 32-B was a dimer in solution. These data agree with the preliminary data from analytical ultracentrifugation experiments, which have shown that the 32-B was a dimer in solution (not shown). A theoretical scattering curve was generated containing two 32-B molecules and, when compared to the experimental data, the curves were very different from each other (Figure 3.21). The difference in the curves may arise because two A-domains (120 residues) are missing from the model, because the proteins were not superimposed into the envelope in the correct conformation, and because of the flexibility of the proteins.
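The molecular weight estimate from I0 against a known protein can be sketched as follows. The sketch assumes the common relative-standard relation, in which I0 normalized by mass concentration is proportional to molecular weight for proteins of similar contrast; whether concentration was normalized in exactly this way is an assumption. The I0 values and concentrations are hypothetical placeholders, not measured data; the monomer masses (59 protein 25.9 kDa, 32-B 31.8 kDa) are those stated in this chapter.

```python
def mw_from_i0(i0_sample, conc_sample, i0_std, conc_std, mw_std):
    """Molecular weight relative to a standard of known MW (same units as mw_std)."""
    return mw_std * (i0_sample / conc_sample) / (i0_std / conc_std)

MW_59 = 25.9    # kDa, 59 protein used as the known standard
MW_32B = 31.8   # kDa, 32-B monomer

# Hypothetical forward scattering intensities (arbitrary units) and
# concentrations (mg/mL), chosen only to illustrate the arithmetic.
i0_59, conc_59 = 1000.0, 5.0
i0_32b, conc_32b = 2390.0, 5.0

mw_32b = mw_from_i0(i0_32b, conc_32b, i0_59, conc_59, MW_59)
oligomer_number = mw_32b / MW_32B
print(f"MW = {mw_32b:.1f} kDa, about {oligomer_number:.2f} monomers")
```

An estimate of roughly 62 kDa divided by the 31.8 kDa monomer mass gives a value close to 2, which is the reasoning behind assigning a dimer.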

Figure 3.20: Molecular envelope of 32-B generated with Gasbor. Gasbor generated a molecular envelope that includes two 32-B molecules (q range 0.0366-0.1406, Dmax 100 Å). The calculation of the MW for the 32-B data suggests that the 32-B exists as a dimer in solution. The surface envelopes were generated with PYMOL and the protein atomic coordinates were superimposed into the envelope using COOT; the models were created in PYMOL.23, 24


Figure 3.21: Experimental and theoretical scattering curves of the 32-B protein. A theoretical scattering curve for two 32-B was calculated with the program Crysol and the curves were compared to each other. There is an overlap in the low q region but the rest of the data do not overlap. Differences in the curves can be due to flexibility of the proteins, two missing A-domains, and the orientations of both proteins.118

The programs CREDO and CHADD were used to build in the missing A domain and the models appeared to be the correct size, conserving the shape of the core, but the resulting Chi values were all above 5 (Figure 3.22). Both programs were used to build in the A-domain at Dmax of 90 Å and 100 Å and in two different q ranges (0.0390-0.1996 and 0.0366-0.1406). Three out of the four models orient the C-terminus of the 32 core near the extended, open region, indicating a place for the missing A domain (reverse orientation for the CREDO model at 100 Å). In all four models, the characteristic OB fold was seen in the envelope and each suggests that the A-domain resembles a tail extending from the core. For the building programs, many of the input GNOM files contained only low q data; using more of the scattering data might generate more accurate models of the 32-B with a lower Chi value.

32-B CHADD: Dmax 100 Å q range: 0.0366-0.1406

32-B CHADD: Dmax 90 Å q range: 0.0390-0.1996

32-B CREDO: Dmax 90 Å q range: 0.0390-0.1996

32-B CREDO: Dmax 100 Å q range: 0.0366-0.1406

Figure 3.22: CREDO and CHADD models of 32-B. Programs CREDO and CHADD built models of the 32 core with the missing A-domain at two different Dmax (90 Å and 100 Å) and two different q ranges (0.0390-0.1996 and 0.0366-0.1406). All the models, except for the CREDO model at 100 Å, positioned the missing A-domain toward the open region of the model. The characteristic OB fold can be seen in the molecular envelope with an area for the A-domain to reside. The surface envelopes were generated with PYMOL and the protein atomic coordinates were superimposed into the envelope using COOT; the models were created in PYMOL.23, 24

The molecular envelopes for the 59 protein are similar to the actual X-ray structure of the 59 protein. As for the 32-B, the building programs suggest that the missing A-domain extends from the core in a tail-like fashion, while the ab initio envelope suggests the 32-B exists as a dimer. Additional runs of the building and ab initio programs should be performed in order to determine if these envelopes are in fact the shape of the 32-B protein in solution.

3.1.2.6 Small angle neutron scattering of T4 proteins

Small angle neutron scattering (SANS), like SAXS, provided low resolution information on the shape of the individual proteins. Data were collected on the 59 protein (H59, hydrogenated) and the 32-B (D32-B, deuterated) at 15 °C for 2 hours at 2 m (q range 0.012546-0.400785 Å-1, corresponding to 15.7-500.8 Å). The H59 data were collected in 100 % deuterated buffer (DB) (25 mM Bis Tris pH 6.5, 150 mM ND4Cl, 10 mM MgCl2, all in D2O) and the D32-B data were collected in 100 % hydrogenated buffer (HB) (25 mM Bis Tris pH 6.5, 150 mM NH4Cl, 10 mM MgCl2, 2 mM BME). D32-B (2 mg/mL) scattered more intensely than the H59 protein (5.0 mg/mL) (Figure 3.23).
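The real-space distances quoted alongside the q ranges follow from the relation d = 2π/q. A minimal sketch of that conversion is given below, applied to the SANS range above and to the SAXS range reported for the complexes (0.0200910-0.836965 Å-1); the function name `q_to_d` is illustrative.

```python
import math

def q_to_d(q):
    """Real-space distance (Å) corresponding to a scattering vector q (Å^-1)."""
    return 2.0 * math.pi / q

sans_range = (0.012546, 0.400785)    # Å^-1, SANS data collected at 2 m
saxs_range = (0.0200910, 0.836965)   # Å^-1, SAXS data for the complexes

for label, (q_min, q_max) in (("SANS", sans_range), ("SAXS", saxs_range)):
    print(f"{label}: {q_to_d(q_max):.1f}-{q_to_d(q_min):.1f} Å")
```

The lowest q sets the largest distance that can be probed, which is why aggregation shows up in the extreme low q region.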

Figure 3.23: SANS scattering profile for H59 and D32-B proteins. The data was collected for each protein at 2 m (q range 0.012546-0.400785 Å-1) for 2 hours. The H59 was collected in 100% deuterated buffer and the D32-B was collected in 100 % hydrogenated buffer.


The data for both proteins were processed with the program IGOR. Several additional scans were collected in order to properly process the protein data; each was collected at 2 m for 2 hours: 100 % DB, 100 % HB, a dark scan (shutter blocking the neutron beam), an efficiency scan (Lucite), and an empty cuvette scan. Each of the scans was used to correct the protein data, including the background subtraction of the buffer. After processing the data, scattering curves were generated with IGOR (Figure 3.24).

H59 in 100 % DB   D32-B in 100 % HB

Figure 3.24: SANS scattering curves of the H59 and D32-B proteins. Neutron scattering curves generated in IGOR of the H59 in 100 % DB (left panel) and D32-B in 100 % HB (right panel). The steepness of the H59 curve suggests that the protein was aggregating. The D32-B curve plateaus at the low q range and dramatically decreases at the high q range.

The Guinier region was examined for both proteins using IGOR and an Rg of 26.3 Å was calculated for the D32-B, but the Rg for the H59 was 12.9 Å, an inaccurate measurement due to protein aggregation (Figure 3.25). The Rg of the D32-B agrees with the Rg determined with the SAXS data. The processed data for both proteins were output as a 1D text file, which was read by the program GNOM. Analysis with GNOM was unsuccessful: the scattering points did not form a curve and were instead scattered across the plot. The 2 m, 2 hour data collection did not generate enough data points to properly analyze the data with the program GNOM.

D32-B

H59

Figure 3.25: Guinier plots of H59 and D32-B proteins. The program IGOR was utilized to investigate the Guinier region of the H59 and D32-B data. The Rg for H59 was 12.8 Å and the Rg for the D32-B was 26.3 Å. The Rg for the D32-B coincides with the SAXS Rg, while the aggregation of the H59 protein prevents the accurate measurement of the Rg.

3.1.3 T4 protein-protein complexes characterizations

3.1.3.1 T4 protein complex preparation

A variety of protein-protein complexes were studied in order to better understand the complexes required for initiation of lagging strand synthesis. Upon mixing 59 (native and/or mutant) with 32 (native and/or truncations), a white precipitate formed. The precipitate formed whether 32 was added to 59 or 59 was added to 32. Initial studies of this complex performed at 0.36 nmol/µL in a volume of 25 µL resulted in a clear solution. When that volume was doubled, the preparation of the complex created a white precipitate even though the concentration was the same. Later experiments showed that if the 32 protein was added to the 59 protein dropwise with constant stirring, both proteins remained in solution. This modified preparation step was utilized for the final crystal expansion trays that were set up. Different temperatures were investigated to determine their effect on the stability of the complex. The complexes were prepared at 4 °C (on ice) or at room temperature (bench top) and incubated on ice, at room temperature, or at 37 °C (heating block) for ½ hour to one hour. In all experiments, the precipitate formed and SDS-PAGE verified that both 59 and 32 proteins were present in the precipitate, but the majority of both proteins were found in the supernatant. After further investigation, it was determined that when the complex was incubated at room temperature (½ hour or more) and then centrifuged gently, the majority of the precipitate resolubilized, leaving a very small, clear, gel-like pellet. The pellet size increased as the volume and concentration of the complex increased. Additives were included in the 59C42S + 32-B complex to increase the solubility of the complex. Sugars (sucrose and glucose) and polyalcohols (ethylene glycol, glycerol, polyethylene glycol, 2-methyl-2,4-pentanediol) were added to the protein solutions, but the proteins still precipitated. 10 % sucrose and 10 % MPD stabilized the complex at room temperature, while 10 % glycerol, 10 % PEG 400, 10 % glucose, and 10 % ethylene glycol had minimal effects on the stability of the complex. There was no obvious effect on the stability of the complex in the presence of additives at higher concentrations of the complex.


3.1.3.2 Native gels of the T4 complexes

Native gels were utilized to visualize the protein-protein interactions between the 59 and 32 proteins (Figure 3.26). 32 protein interacted strongly with both 59 and 59C42S, while 32-B had a moderate interaction with both the 59 and 59C42S proteins. There was no obvious shift for the 32-A/59 complex, suggesting no interaction with this truncation, but there was a slight shift in the 59 protein in the 59/32 core complex. Perhaps there is a very weak interaction between the 59 protein and the 32 core that can be stabilized in the presence of the A domain or DNA. Excess 32-B protein in the 59/32-B complex was detected on the agarose gel; therefore an additional gel was run to determine the stoichiometry of binding for this complex. Several ratios (1:1, 0.75:1, 1:2, 2:1) were tested and in each case, excess 32-B was seen (data not shown). Native gels are not sensitive enough to detect the stoichiometries of these moderate interactions. Several mutations of the 32-B located in the core (I151D, I60D, His-I151D, His-I60D) (generated from the investigation of the X-ray crystal interface between RNase H and 32-B) were used to further study the interactions between the 59 protein and 32-B. Mutant 32-B proteins still interacted with the 59 protein, agreeing with previous experiments that show that the 59 protein does not primarily interact with the core of the 32 protein. The mutant 32-B proteins have an N-terminal hexahistidine tag and the presence of this tag does not interfere with the 59-32-B interaction, suggesting that the interaction occurs with the A domain of 32-B.


Gel lanes: 32, 59, 59C42S, 32-A, 32-B, and 32 core run individually, and the mixtures 59 + 32, 59C42S + 32, 59 + 32-A, 59C42S + 32-A, 59 + 32-B, 59C42S + 32-B, 59 + 32 core, and 59C42S + 32 core.

Figure 3.26: Native gel of the T4 protein-protein complexes. Native gel investigating protein-protein interactions between 59 proteins (mutant) and 32 proteins (truncations). The proteins will migrate toward the anode or cathode based on the theoretical isoelectric point (pI) of the proteins. The formation of the complex should change the overall pI of the protein, thus altering the migration of the complex. Interactions were seen between 59 and 32, 59 and 32-B based on the change in the protein shifts on the gel.

3.1.3.3 Size exclusion chromatography of the T4 complexes

Size exclusion chromatography showed the formation of the 59/32-B (10 mg/mL) and 59C42S/32-B (6.5 mg/mL) complexes, with both complexes (MW ~58 kDa) eluting in the void fraction of the Superdex 75 (molecular weight cutoff of 60 kDa). These data suggest that the complex is quite stable at room temperature. No additional peaks were present, indicating that the interaction between the proteins is moderate, with slow dissociation.

3.1.3.4 Dynamic light scattering of the T4 complexes

DLS confirmed the formation or lack of formation of each of the above complexes (Table 3.5). Measurements were taken at different temperatures in order to determine the optimal temperature range needed for the formation of a monodisperse complex. The complexes tended to be slightly less polydisperse at 4 °C, but these readings do not take into account the temperature at which the samples were prepared (Figure 3.27). These experiments were performed numerous times and the results were generally reproducible; thus the temperature of complex formation does not appear to affect most of the complexes.

A. 59+32 complex   B. 59+32-B complex

Figure 3.27: DLS T4 complex data. A). The data for the 59+32 complex shows multiple peaks indicating a polydisperse sample (~34 %). B). The data for the 59+32-B complex shows two peaks indicating a slightly polydisperse sample (~13 %).


Table 3.5: Dynamic light scattering results of T4 protein complexes. Representative sampling of experiments performed on the different complexes at 4 °C, 20 °C and 25 °C. Table includes the temperature at which the experiment was run, the degree of polydispersity of the protein complex sample, the hydrodynamic radius of the complex and the estimated molecular weight of the complex. MW of 59 protein is 25.9 kDa, 32 protein 33.5 kDa, 32-A protein 28.6 kDa, and 32-B protein 31.8 kDa.

Protein complex    Temperature (°C)    % Polydispersity    Radius (Å)    Molecular Weight (kDa)    Ratio 59:32
59C42S + 32-B      4                   12.6                43 ± 5        101                       2:2
59C42S + 32-B      25                  14.9                32 ± 5        51                        1:1
59C42S + 32        4                   14.1                46 ± 6        120                       N/A
59C42S + 32-A      4                   31.7                49 ± 7        139                       N/A
59C42S + 32-A      25                  41.3                51 ± 9        156                       N/A
59 + 32-B          4                   13.8                34 ± 5        57                        1:1
59 + 32-B          20                  25.8                31 ± 8        49                        1:1
59 + 32            4                   13.8                62 ± 9        237                       N/A
59 + 32            25                  34.3                52 ± 18       159                       N/A

Readings for the 59C42S-32-A complex were very similar in molecular weight and hydrodynamic radius to the individual 32-A run, suggesting that there was no complex formation. Similarly, the RH and molecular weight for the 59-32 and 59C42S-32 complexes also resembled the data collected for the 32 protein. Literature and the native gel results show that 59 and 59C42S proteins interact with 32 protein; perhaps the large aggregates mask any scattering that might occur from the smaller 59-32 complex. On the other hand, the 59-32-B and 59C42S-32-B complexes were determined to be interacting in a 1:1 binding ratio according to the estimated molecular weight. In one case, the data suggested that the binding ratio for 59C42S-32-B might be 2:2 at 4 °C, but a subsequent experiment showed the formation of a complex in a 1:1 binding ratio. This information was used to set up the parameters for other biophysical techniques such as isothermal titration calorimetry and small angle X-ray scattering.

3.1.3.5 Isothermal titration calorimetry of the T4 complexes

Isothermal titration calorimetry was utilized to determine the binding affinity and molar ratio of the protein-protein complexes (59-32, 59-32-B, and 59-32 core). Several controls (buffer into buffer, protein into buffer, and buffer into protein) were run in order to account for the heat of dilution, heat of mixing, and heat of stirring. In the buffer into buffer, buffer into 59, 32-B into buffer, and buffer into 32 experiments, the heat released was minimal. Titrating 59 into buffer and 32 core into buffer produced heats larger than the other controls but smaller than the protein-protein experiments. Generally, the control runs had a stable baseline, and typically the best control run was used to subtract the baseline from the protein-protein run. For the 59-32-B protein titration, the concentration of the 59 protein was 24.6 µM and the concentration of 32-B was 882.6 µM. For the 59-32 protein titration, the concentration of the 59 protein was 641 µM and the concentration of 32 protein was 22.2 µM. For the 59-32 core titration, the concentration of the 59 protein was 27.1 µM and the concentration of 32 core was 450 µM. 59 protein bound to 32 protein with a moderate KD of 3.7 µM (Figure 3.28); similarly, 59 bound to 32-B with a KD of 3.6 µM (Figure 3.29). These affinities are weaker than those previously described in the literature (KD of 2 nM and 3 nM).18 The nM KD values were determined from fluorescence anisotropy experiments where the anisotropy decreased as 32 protein (or truncations) was titrated into fluorescently labeled 59 protein. Data suggested that the initial increase in anisotropy was due to the aggregation of the 59 protein and the decrease in anisotropy occurred as the 32 protein was titrated into the cuvette. This decrease in anisotropy occurred as the sample was diluted and as the 32 protein bound to the 59 protein, decreasing the particle size. Interestingly, the 59 protein binds to 32 protein in a 0.5:1 molar ratio, while the 59 protein binds to 32-B in a 1.3:1 molar ratio. The 59-32-B complex data from DLS support the molar ratio calculated with ITC; the estimated molecular weights from DLS suggest one 59 protein interacting with one 32-B protein. Several different complexes could be detected with DLS for the 59-32 complex. The estimated molecular weights suggest that either two 59 proteins are interacting with two 32 proteins, or multiple 32 proteins are interacting with one 59 protein. Since the 32 protein binds cooperatively to other 32 proteins, one 32 protein could bind to the 59 protein and then attract other 32 proteins to bind. The 59-32 core complex was also studied with ITC; the resulting data show that a weak titration occurred within the first 4 or 5 injections. Subsequent injections reached a uniform heat release, but never returned to baseline. The analysis of these data shows that there was no binding interaction between the 59 protein and the 32 core, because the heat released in the controls was of the same magnitude as that of the actual protein-protein run.
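The dissociation constants quoted above follow from the association constants (K) reported in the one-site fits of Figures 3.28 and 3.29, since KD = 1/K. The sketch below reproduces that arithmetic and, as a consistency check, compares ΔG = -RT ln K with ΔH - TΔS. It assumes that K is in M-1, ΔH in cal/mol, ΔS in cal/(mol·K), and that the temperature is 20 °C (293.15 K); these units are not stated with the fit output and are assumptions here.

```python
import math

R_CAL = 1.987      # gas constant, cal mol^-1 K^-1
TEMP_K = 293.15    # 20 °C

def kd_micromolar(ka_per_molar):
    """Dissociation constant in µM from an association constant in M^-1."""
    return 1e6 / ka_per_molar

def delta_g_cal(ka_per_molar, temp_kelvin):
    """Binding free energy in cal/mol from the association constant."""
    return -R_CAL * temp_kelvin * math.log(ka_per_molar)

# Fit values as reported in Figures 3.28 and 3.29 (units assumed as noted above).
fits = {
    "59-32":   {"K": 2.72e5, "dH": -7242.0, "dS": 0.164},
    "59-32-B": {"K": 2.80e5, "dH": -4120.0, "dS": 10.9},
}

results = {}
for name, fit in fits.items():
    kd = kd_micromolar(fit["K"])
    dg_from_k = delta_g_cal(fit["K"], TEMP_K)
    dg_from_h_s = fit["dH"] - TEMP_K * fit["dS"]
    results[name] = (kd, dg_from_k, dg_from_h_s)
    print(f"{name}: KD = {kd:.1f} µM, dG(K) = {dg_from_k:.0f}, dG(dH - T dS) = {dg_from_h_s:.0f} cal/mol")
```

Under these assumptions the two free energy estimates agree to within a few cal/mol, and both complexes have nearly the same ΔG despite different enthalpic and entropic contributions.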


Upper panel: µcal/sec versus time (min). Lower panel: kcal/mole of injectant versus molar ratio. Fit: Data: A130085932_NDH; Model: OneSites; Chi^2/DoF = 5634; N 0.538 ± 0.0137; K 2.72E5 ± 2.07E4; ∆H -7242 ± 240.6; ∆S 0.164.

Figure 3.28: Isothermal titration calorimetry thermogram of 59-32 protein complex. 59 protein was titrated into 32 protein in 5 µL injections for a total of 40 injections at 20 °C. The dissociation constant for this complex is 3.7 µM, with a stoichiometry of binding of 0.538.


Upper panel: µcal/sec versus time (min). Lower panel: kcal/mole of injectant versus molar ratio. Fit: Data: A524075932b_NDH; Model: OneSites; Chi^2/DoF = 7098; N 1.34 ± 0.0308; K 2.80E5 ± 3.12E4; ∆H -4120 ± 145.2; ∆S 10.9.

Figure 3.29: Isothermal titration calorimetry thermogram of 59-32-B protein complex. 32-B protein was titrated into 59 protein in 5 µL injections for a total of 40 injections at 20 °C. The dissociation constant for this complex is 3.6 µM, with a stoichiometry of binding of 1.34.


3.1.3.6 Small angle X-ray scattering of the T4 complexes

Small angle X-ray scattering experiments agree with the DLS and ITC results, showing that 59 protein interacts with the 32 protein and the 32-B truncation. Each complex had a radius of gyration larger than the individual proteins and similar to the hydrodynamic radius of the complexes. The data were collected in the q range of 0.0200910-0.836965 Å-1 (312.7-7.5 Å) at 20 second and 40 second intervals (Figure 3.30). The 40 second interval scattering data were slightly more intense than the 20 second interval data; therefore the 40 second data were processed, analyzed, and modeled.

Figure 3.30: Example of SAXS scattering profiles for the T4 complex. Example of scattering profiles for the 59 + 32-B complex at 20 second intervals (left panel) and at 40 second intervals (right panel). The darker the spots, the greater the intensity of the scattered X-ray beam. The 40 second interval data were processed, analyzed, and modeled for all of the complexes.

The 40 second interval data were averaged and the background was subtracted using the 40 second interval buffer data. The 59 + 32-B data decreased two orders of magnitude, while the 59 + 32 data decreased almost three orders of magnitude (Figure 3.31).


Figure 3.31: Scattering curves of the T4 protein complexes. The 59 + 32-B complex data decreased 2 orders of magnitude, with the processed 20 second interval data in blue and the 40 second interval data in green. The 59 + 32 complex data decreased almost 3 orders of magnitude, with the processed 20 second interval data in pink and the 40 second interval data in purple.

The same techniques that were used to analyze and model the individual proteins were used for the protein complexes. Data analysis of the 59 + 32 complex with the Primus program calculated an Rg of 54.8 Å. The Guinier region for this complex was steep and the data did not exactly fit the line (Figure 3.32). GNOM analysis of the 59 + 32 complex did not produce usable scattering curves or Pr plots. Both programs suggested that the 59 + 32 complex had multiple complex species in solution or that there was a large degree of aggregation occurring in the sample. It would be very difficult to continue analyzing or modeling the 59 + 32 complex when the sample was not monodisperse. The Rg of the 59 + 32-B complex in the Guinier region was 35.3 Å (Figure 3.32). GNOM analysis of the 59 + 32-B complex calculated an Rg around 36.2 Å, with minor deviation in the Rg value as the Dmax and q range varied (Table 3.6). The Rg values from GNOM and Primus were very similar for the 59 + 32-B, suggesting data analysis was consistent between the programs.

A. 59 + 32-B   B. 59 + 32

Figure 3.32: Guinier plots of the T4 protein complexes. Primus was utilized to examine the Guinier region of the data for each of the protein complexes; the Rg is obtained from the slope of the Guinier plot. A). The Rg from the Guinier plot for the 59 + 32-B complex was 35.3 Å. B). The Rg from the Guinier plot for the 59 + 32 complex was 54.8 Å; the steepness of the slope suggests that the protein is aggregated or heterogeneous. The fitted line is not a very good fit for the 59 + 32 data.111

Table 3.6: Analysis of 59 + 32-B complex scattering data with GNOM. The scattering data for the 59 + 32-B complex were analyzed with GNOM112 while varying several parameters (q range and Dmax). The best analyses based on curve fitting and Chi values are listed below.

Protein      Rg (Å)         Dmax (Å)    q range (Å-1)     Chi value
59 + 32-B    25.8 ± 0.16    120         0.0366-0.3759     0.99
59 + 32-B    36.9 ± 0.19    125         0.0366-0.1406     0.99
59 + 32-B    37.1 ± 0.19    127         0.0366-0.1406     0.99
59 + 32-B    36.2 ± 0.32    124         0.0366-0.3172     1.00
59 + 32-B    35.2 ± 0.19    115         0.0366-0.2585     0.92


The 59 + 32-B experimental scattering curves overlapped the theoretical scattering curves well at a Dmax of 124 Å and 125 Å (Figure 3.33). The Pr plots at these Dmax suggested that the complex had an elongated shape (Figure 3.33). The Chi value was above 0.9 for many of the data analyses and the scattering curves overlapped well; thus various data analyses were used to generate molecular envelopes for the 59 + 32-B complex.

A. 59 + 32-B complex, Dmax 125 Å, q range 0.0366-0.1406 (scattering curve and Pr plot)
B. 59 + 32-B complex, Dmax 124 Å, q range 0.0366-0.3172 (scattering curve and Pr plot)

Figure 3.33: Experimental scattering curves and Pr plots for the 59 + 32-B complex. A). The scattering curves for the 59 + 32-B complex (left panel: scattering curve; right panel: Pr plot; Dmax 125 Å, q range: 0.0366-0.1406). B). The scattering curves for the 59 + 32-B complex (left panel: scattering curve; right panel: Pr plot; Dmax 124 Å, q range: 0.0366-0.3172). In both cases, the experimental data overlaps the theoretical curve at these Dmax and the distance distribution plots suggest the shape of the complex was elongated. For the scattering plot the q range is on the x axis, while the I0 (intensity) is on the y axis.112


Several programs were used to generate models of the 59 + 32-B complexes; persistent envelopes were generated regardless of the type of program or the specific parameters used during envelope modeling (Figure 3.34). Gasbor models appeared to be the correct size for a 1:1 complex, but the overall model had more of a flattened shape (Figure 3.34 B and C). The Dammin models appeared to be an ideal shape, but the volume was larger than expected; two 59 + 32-B complexes could fill the Dammin molecular envelope (Figure 3.34 A and E). The Chi values for the models in Figure 3.34 ranged from 1.1 to 5.0, and the best Chi value was for model A (Chi value of 1.1; Dammin/Damaver model at Dmax 125 Å). The volume difference between the Gasbor and Dammin models can be attributed to the way each program is run. The number of dummy residues (DR) used in Gasbor is dictated by the number of residues in the actual system, while the number of DR used in Dammin is dictated by the mode (fast, slow, etc.) of the run. The size of the Ga_struct model was similar to that of the Gasbor model, where only one complex could fit into the envelope; the size of these models is also dictated by the number of points specified for the program to use. The molecular weight of the scattering complex was calculated (against the known 59 protein); the Primus data suggested a MW of 86.0 kDa, while several calculations of the GNOM data suggested a MW range of 89.3-92.2 kDa. A complex ratio of 1:1 (59:32-B) has a MW of 57.8 kDa, a ratio of 1:2 has a MW of 89.7 kDa, a ratio of 2:1 has a MW of 83.8 kDa, and a ratio of 2:2 has a MW of 115.7 kDa. The MW estimated from the data suggested that a 1:2 complex, with one 59 protein and two 32-B proteins, existed in solution. AUC and SAXS results indicate that the 32-B exists as a dimer in solution, which would agree with the 1:2 complex ratio for the 59 + 32-B.
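The stoichiometry comparison above can be laid out as a short calculation. This is a sketch under stated assumptions: the monomer masses (25.9 kDa for 59 and 31.9 kDa for 32-B) are not quoted in this section but are back-calculated from the 1:1 and 1:2 complex masses given in the text, and they reproduce the quoted 2:1 and 2:2 masses to within 0.1 kDa. The function names are hypothetical.

```python
from itertools import product

# Assumed monomer masses (kDa), back-calculated from the complex masses
# quoted in the text: 1:1 = 57.8 kDa and 1:2 = 89.7 kDa.
MW_59 = 25.9
MW_32B = 31.9

def complex_mass(n59, n32b):
    """Mass (kDa) of a complex with n59 copies of 59 and n32b copies of 32-B."""
    return n59 * MW_59 + n32b * MW_32B

def rank_stoichiometries(observed_kda, max_copies=2):
    """Return candidate (59:32-B) ratios sorted by distance from the observed mass."""
    candidates = []
    for n59, n32b in product(range(1, max_copies + 1), repeat=2):
        mass = complex_mass(n59, n32b)
        candidates.append(((n59, n32b), mass, abs(mass - observed_kda)))
    return sorted(candidates, key=lambda c: c[2])

# Primus estimate and the two ends of the GNOM range reported in the text
for observed in (86.0, 89.3, 92.2):
    (n59, n32b), mass, diff = rank_stoichiometries(observed)[0]
    print(f"{observed:5.1f} kDa -> closest ratio {n59}:{n32b} "
          f"({mass:.1f} kDa, off by {diff:.1f} kDa)")
```

On mass alone, the GNOM estimates sit closest to the 1:2 ratio, whereas the Primus estimate of 86.0 kDa lies between the 2:1 and 1:2 masses. The independent evidence that 32-B is a dimer in solution is therefore what distinguishes the two assignments.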

[Figure 3.34 panels A-G. Panel labels: Dammin/Damaver, Dmax 125 Å, q range 0.0366-0.1406; Gasbor, Dmax 115 Å, q range 0.0366-0.2585; Gasbor, Dmax 124 Å, q range 0.0366-0.3172; Ga_struct, Dmax 115 Å, q range 0.0366-0.2000; Dammin/Damaver, Dmax 115 Å, q range 0.0366-0.1406.]

Figure 3.34: Molecular envelopes of the 59 + 32-B complex. A). Molecular envelope generated with Dammin (Keep mode) and averaged with Damaver (Chi value 1.1); B). envelope generated with Gasbor (Chi value 1.5); C). envelope generated with Gasbor (Chi value 1.2); D). envelope generated with Ga_struct (Chi value 5.0); E). envelope generated with Dammin (Keep mode) and averaged with Damaver (Chi value 1.3); F). 59 and 32-B superimposed into the Ga_struct envelope; G). 59 and 32-B superimposed into the Dammin/Damaver envelope. The surface envelopes were generated with PYMOL and the protein atomic coordinates were superimposed into the envelopes using COOT; the final models were created in PYMOL.23, 24

When the 59 and 32-B were superimposed into the Ga_struct model and a Dammin model, the resulting coordinates were supplied to Crysol and a theoretical curve was calculated (Figure 3.35). The theoretical curve overlapped the experimental curve in the low q range, but there were deviations in the high q range for both of the models. In order to determine the conformation of this protein-protein interaction, several different complex models were created in COOT and the theoretical scattering curves were calculated in Crysol (Figure 3.36).24, 118 Of all the models, the scattering curve for the side-by-side model overlapped the experimental data best and had the lowest Chi value.
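The comparison of a theoretical curve with experimental data can be illustrated with a goodness-of-fit calculation. This sketch assumes a commonly used form of the discrepancy, chi^2 = (1 / (N - 1)) * sum(((I_exp - c * I_calc) / sigma)^2), with c a least-squares scale factor; the exact expression used by Crysol is not given in the text, and the curves below are synthetic Guinier-type curves rather than the measured data. The names `scaled_chi` and `model_curve` are hypothetical.

```python
import numpy as np

def scaled_chi(i_exp, sigma, i_calc):
    """Chi between experimental and calculated intensities.

    The calculated curve is first multiplied by the weighted least-squares
    scale factor, since model and data are on arbitrary intensity scales.
    """
    weights = 1.0 / sigma ** 2
    scale = np.sum(weights * i_exp * i_calc) / np.sum(weights * i_calc ** 2)
    residuals = (i_exp - scale * i_calc) / sigma
    chi2 = np.sum(residuals ** 2) / (len(i_exp) - 1)
    return np.sqrt(chi2), scale

def model_curve(q, rg):
    """Toy scattering curve (Guinier form), used only to exercise the fit."""
    return np.exp(-((q * rg) ** 2) / 3.0)

q = np.linspace(0.0366, 0.1406, 50)      # q range taken from Table 3.6
i_exp = 2.5 * model_curve(q, 35.3)       # synthetic "experimental" data
sigma = 0.03 * i_exp                     # assumed 3 % errors

chi_good, scale_good = scaled_chi(i_exp, sigma, model_curve(q, 35.3))
chi_bad, scale_bad = scaled_chi(i_exp, sigma, model_curve(q, 45.0))
print(f"matching model:    chi = {chi_good:.2f}, scale = {scale_good:.2f}")
print(f"mismatching model: chi = {chi_bad:.2f}, scale = {scale_bad:.2f}")
```

A lower Chi value indicates closer agreement, which is the sense in which the side-by-side model is described as the best of the docked conformations.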


[Figure 3.35 panel labels: Ga_struct model, one 59 + 32-B complex; Dammin/Damaver model, two 59 + 32-B complexes.]

Figure 3.35: Theoretical and experimental scattering curves of 59 + 32-B complexes. Theoretical scattering curves were generated for two different 59 + 32-B complexes and the curves were compared to the experimental scattering data. There is greater overlap with the Ga_struct model (Chi value 5.1) than with the Dammin model (Chi value 17.0), but both models have major differences between the scattering curves at mid and high q regions. The experimental data are shown as red circles and the theoretical curve as a blue line.118


Figure 3.36: Theoretical scattering curves for different 59 + 32-B complex conformations. 59 protein (purple) was docked to the 32 core (cyan) in various orientations using COOT and the theoretical scattering curves were calculated with Crysol.24, 118 A). The scattering curve for the side-by-side model (Chi value 5.6) overlapped the experimental data in the low q but had some deviations in the mid and high q range. B). The scattering curve for the top-to-bottom model (Chi value 18.9) did not overlap the experimental data. C). The scattering curve for the diagonal model (Chi value 10.1) overlapped the experimental data in minor regions.118


3.1.3.7 Small angle neutron scattering of T4 complexes

Small angle neutron scattering data was collected on several variations of the 59 + 32-B complex at 2 m (q range 0.012546-0.400785 Å-1) for 5 hours and at 6 m (q range 0.007575-0.187509 Å-1) for 5 hours. All of the data was collected at 15 °C, at a wavelength of 4.76 Å. Data for the H59 + H32-B in 100 % DB (~ 4 mg/mL) and H59 + D32-B in 50 % DB (~ 3 mg/mL) complexes were first collected at 2 m for 2 hours, but the count rate (number of neutron scattering points) was too low. The data for the same complexes were then collected at 2 m for 5 hours and the count rate increased. The initial analysis of the data suggested that the q range was not optimal for the size of the complex; thus, the sample-to-detector distance was increased to 6 m. At 6 m, two data sets (H59C42S + H32-B in 100 % DB and H59C42S + D32-B in 100 % DB) were collected for 5 hours; the concentration of the H59C42S + H32-B complex was ~ 11.0 mg/mL and that of the H59C42S + D32-B complex was ~ 10.0 mg/mL (Figure 3.37).
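To make the two detector settings easier to compare, the q limits quoted above can be converted into approximate real-space length scales using the standard relation d = 2π/q. This is only an illustrative conversion, not part of the original analysis; the dictionary keys and the function name `real_space_distance` are hypothetical.

```python
import math

def real_space_distance(q):
    """Length scale d (Å) probed at scattering vector q (Å^-1), d = 2*pi/q."""
    return 2.0 * math.pi / q

# q ranges quoted in the text for the two sample-to-detector distances
q_ranges = {
    "2 m": (0.012546, 0.400785),
    "6 m": (0.007575, 0.187509),
}

for setting, (q_min, q_max) in q_ranges.items():
    d_large = real_space_distance(q_min)   # largest distance sampled
    d_small = real_space_distance(q_max)   # smallest distance sampled
    print(f"{setting}: q = {q_min}-{q_max} Å^-1 "
          f"-> d from {d_small:.0f} Å to {d_large:.0f} Å")
```

Moving the detector from 2 m to 6 m lowers the minimum q, so larger distances are sampled at the cost of the highest-q data.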

Figure 3.37: SANS scattering profile for the 59 + 32-B complexes. The data was collected for each complex at 6 m (q range 0.007575-0.187509 Å-1) for 5 hours; one data set was H59C42S + H32-B in 100 % DB and the other data set was H59C42S + D32-B in 100 % DB.


Scattering curves for the protein complexes were generated with IGOR, and the scattering curves were noisy in the low q region with indications of some protein aggregation (Figure 3.38). Initial analysis of the data with IGOR was unsuccessful in determining a reasonable Rg for either data set (values too low, 12-15 Å). The initial scattering curves generated with GNOM suggested that more data points needed to be collected. Longer data collection time or a more concentrated sample would increase the number of data points.

[Figure 3.38 panel labels: H59C42S + H32-B in 100 % DB; H59C42S + D32-B in 100 % DB.]

Figure 3.38: SANS scattering curves of the 59 + 32-B complexes. Neutron scattering curves generated in IGOR of the H59C42S + H32-B in 100 % DB (top panel) and H59C42S + D32-B in 100 % DB (bottom panel). Both scattering curves were noisy in the low q region and the steepness of the curves indicates protein aggregation.


3.1.4 T4 protein complex crystallization studies

The 59/32 protein complexes were subjected to a multitude of crystal screen conditions in order to crystallize the complexes. Four different complexes have been studied, 59C42S/32-B, 59C42S/32, 59/32, and 59/32-B, each of which has been screened for possible protein crystal hits. Table 3.7 describes the screens used for each complex, the drop size, the temperature(s) at which the trays were stored, and the number of times each crystallization screen was utilized. The majority of the crystal screens were set up with the Honeybee robotics system at ambient temperature and then transferred to the cold room (4 °C) for a week. After the trays were scored at 4 °C, they were transferred to a cabinet (ambient temperature) for another week. Many of the screens were repeated for the complexes, but some of the crystal hits were not duplicated in the same conditions. This lack of duplication might be due to inconsistency in preparing the protein complexes. The majority of the complexes were set up in a 1:1 molar ratio for each screen, where the molar concentration varied from 14.3 – 27.3 nM. The 59C42S/32-B complex was screened in 1:1, 2:1, and 3:1 ratios (59C42S : 32-B); the 59/32-B complex was screened in 1:1 and 0.5:1 ratios (59 : 32-B). The majority of the crystal hits were obtained in conditions set up in a 1:1 molar ratio. These crystal hits were tested for protein with Izit Dye (Hampton) or by crushing the crystals. There were several hits for the 59/32, 59C42S/32, and 59C42S/32-B complexes (Tables 3.8 and 3.9). The optimal protein buffer contained 25 mM Bis Tris pH 6.5, 100-150 mM NH4Cl, 10 mM MgCl2, and 2 mM BME.


Table 3.7: Commercial and in house crystallization screens used for crystallization studies of the T4 protein complexes. 4/RT denotes that trays were stored at 4 °C then ambient, room temperature (RT). The drop size consists of protein : screen condition. # of setup specifies the number of times the complex was screened against that particular crystallization screen. *- this tray screening the 59C42S/32-B complex against Wizard I and II was set up by hand at RT then stored at 4 °C.

| Protein Complex | Crystallization Screen | Temperature (°C) | Drop size (µL) | # of setup |
|---|---|---|---|---|
| 59C42S/32-B | PEG Ion / Natrix | 4/RT | 0.5 + 0.5 | 2 |
| 59C42S/32-B | Crystal Screen I and II | 4/RT | 0.5 + 0.5 | 3 |
| 59C42S/32-B | Additive | 4/RT | 0.5 + 0.5 | 1 |
| 59C42S/32-B | Cryo I and II | 4/RT | 0.5 + 0.5 | 1 |
| 59C42S/32-B | Wizard I and II | 4/RT | 0.5 + 0.5 | 1 |
| 59C42S/32-B | Wizard I and II* | 4/RT | 1 + 1 | 1 |
| 59C42S/32-B | Index | 4 | 0.5 + 0.5 | 1 |
| 59C42S/32-B | Na/KPO4 and (NH4)2SO4 / PEG Ion | 4 | 0.5 + 0.5 | 1 |
| 59C42S/32-B | PEG Ion / Natrix | 4 | 1 + 1 | 2 |
| 59C42S/32-B | Additive | 4 | 1 + 1 | 2 |
| 59C42S/32-B | Wizard I and II | 4 | 1 + 1 | 3 |
| 59C42S/32-B | Cryo I and II | 4 | 1 + 1 | 2 |
| 59C42S/32-B | Crystal Screen I and II | 4 | 1 + 1 | 3 |
| 59C42S/32 | PEG Ion / Natrix | 4/RT | 0.5 + 0.5 | 1 |
| 59C42S/32 | Crystal Screen I and II | 4/RT | 0.5 + 0.5 | 1 |
| 59C42S/32 | Additive | 4/RT | 0.5 + 0.5 | 1 |
| 59C42S/32 | Cryo I and II | 4/RT | 0.5 + 0.5 | 1 |
| 59C42S/32 | Wizard I and II | 4/RT | 1 + 1 | 2 |
| 59C42S/32 | Cryo I and II | 4/RT | 1 + 1 | 1 |
| 59C42S/32 | Crystal Screen I and II | 4/RT | 1 + 1 | 1 |
| 59C42S/32 | PEG Ion / Natrix | RT | 1 + 1 | 1 |
| 59C42S/32 | Additive | RT | 1 + 1 | 1 |
| 59C42S/32 | Crystal Screen I and II | RT | 1 + 1 | 1 |
| 59C42S/32 | Wizard I and II | RT | 1 + 1 | 1 |
| 59/32-B | PEG Ion / Natrix | RT Imager | 1 + 1 | 1 |
| 59/32-B | Crystal Screen I and II | RT Imager | 1 + 1 | 1 |
| 59/32-B | Wizard I and II | RT Imager | 1 + 1 | 1 |
| 59/32-B | Additive | RT Imager | 1 + 1 | 1 |
| 59/32-B | Cryo I and II | RT Imager | 1 + 1 | 1 |
| 59/32 | PEG Ion / Natrix | RT | 1 + 1 | 2 |
| 59/32 | Crystal Screen I and II | RT | 1 + 1 | 2 |
| 59/32 | Wizard I and II | RT | 1 + 1 | 2 |
| 59/32 | Additive | RT | 1 + 1 | 2 |

Table 3.8: Crystal hits of T4 protein complexes. Crystal hits of 59C42S/32 and 59C42S/32-B, including commercial screen and crystallization conditions. The Crystal Screen II (38) condition was stained with the Izit dye and the crystals absorbed the dye (protein crystal).

| # | Protein Complex | Screen | Crystallization Condition |
|---|---|---|---|
| 1 | 59C42S + 32 | Crystal Screen II (35) | 70 % (v/v) MPD; 0.1 M Bis Tris pH 5.5 |
| 2 | 59C42S + 32-B | PEG Ion (39) | 20 % (w/v) PEG 4,000; 0.1 M TAPS pH 8.5; 0.2 M KCl |
| 3 | 59C42S + 32-B | Natrix (10) | 5 % (w/v) PEG 4,000; 0.05 M MES pH 6.0; 0.005 M MgSO4 |
| 4 | 59C42S + 32-B | Crystal Screen II (38) | 20 % (w/v) PEG 10,000; 0.1 M HEPES pH 7.5 |
| 5 | 59C42S + 32-B | Cryo (19) | 35 % (v/v) 2-propanol; 0.1 M Tris pH 8.0 |
| 6 | 59C42S + 32-B | Crystal Screen II (46) | 30 % (w/v) PEG MME; 0.1 M Bicine pH 9.0; 0.1 M NaCl |
| 7 | 59C42S + 32-B | Crystal Screen II (29) | 30 % MPD; 0.1 M HEPES pH 7.5; 0.5 M (NH4)2SO4 |

Table 3.9: Crystal hits of T4 protein complexes. Crystal hits of 59C42S/32-B and 59/32, including commercial screen and crystallization conditions. PEG Ion (31 and 32) conditions were stained with the Izit dye and crystals absorbed the dye (protein crystal).

| # | Protein Complex | Screen | Crystallization Condition |
|---|---|---|---|
| 8 | 59C42S + 32-B | PEG Ion (31) | 20 % (w/v) PEG 4,000; 0.1 M HEPES pH 7.5; 0.2 M Na Formate |
| 9 | 59C42S + 32-B | PEG Ion (32) | 20 % (w/v) PEG 4,000; 0.1 M HEPES pH 7.5; 0.2 M Na Acetate |
| 10 | 59C42S + 32-B | Crystal Screen II (39) | 3.4 M 1,6-Hexanediol; 0.1 M Tris pH 8.5; 0.2 M MgCl2 hexahydrate |
| 11 | 59C42S + 32-B | Crystal Screen I (41) | 20 % (w/v) PEG 4,000; 10 % (v/v) 2-propanol; 0.1 M HEPES pH 7.5 |
| 12 | 59 + 32 | Natrix (33) | 30 % (v/v) 1,6-Hexanediol; 0.05 M HEPES pH 7.0; 0.2 M NH4Cl; 0.01 M MgCl2 |
| 13 | 59 + 32 | Wizard I (48) | 20 % (v/v) PEG 1,000; 0.1 M acetate pH 4.5; 0.2 M Zn(OAc)2 |
| 14 | 59 + 32 | Wizard II (46) | 1.0 M (NH4)2PO4; 0.1 M Imidazole pH 8; 0.2 M NaCl |
| 15 | 59 + 32 | Crystal Screen I (19) | 30 % (v/v) 2-propanol; 0.1 M Tris pH 8.5; 0.2 M NH4OAc |

Expansion trays were prepared to optimize the crystal conditions to produce larger, diffraction quality crystals. The 59C42S/32-B complex formed a crystalline precipitate in many of the PEG Ion conditions, especially in HEPES pH 7.5 with a variety of salts (NaCl, Na cacodylate, Na acetate, and Na formate). Expansion trays (2 + 2 + 4 µL; 2 + 2 µL hanging drops) were set up varying the concentration of these salts, as well as trays examining the PEG Ion conditions # 31 and # 32 (Table 3.9, rows 8 and 9). All of these trays were set up and stored at 4 °C; most of the drops were precipitated or had a crystalline precipitate for these screen conditions. The Crystal Screen II # 38 condition (Table 3.8, row 4) was set up to test three different parameters: varying the pH of the buffer (4 °C; 2 + 2 µL hanging drops), testing different stoichiometries (4 °C; 1 + 1 µL hanging drops), and testing the original condition at ambient temperature (2 + 2 µL hanging drops). Drops at pH 6, 7, and 9 had crystalline precipitate, while drops at pH 8 had dark precipitate with some crystalline precipitate. The stoichiometry tests produced clear drops when the 59 protein was in excess, precipitated drops when 32-B was in excess, and dark/crystalline precipitate when the proteins were in a 1:1 ratio. Conditions from Table 3.8, rows 2, 3, and 5, and Table 3.9, rows 8, 9, 10, and 11, were expanded at ambient temperature (2 + 2 µL hanging drops). Most of the drops formed a crystalline, oily, or dark precipitate. The 59/32 complex crystal hits (Table 3.9, rows 12, 13, 14, and 15) were expanded at ambient temperature (2 + 2 µL hanging drops). Row 12 drops contained dark and crystalline precipitate, row 13 drops were clear or contained crystalline precipitate, row 14 drops were clear, and row 15 drops produced a light precipitate. Overall, most of the crystal hits could not be replicated.


3.1.5 T4 32-B crystallization studies

When screening the complexes, the individual proteins were also screened as a control. Several crystal hits were obtained for the 32-B protein truncation (Table 3.10) and the conditions were expanded to optimize the crystallization conditions. Many of the expansion drops were clear or contained a dark or crystalline precipitate. The Natrix (2) and PEG Ion (26) conditions consistently produced micro-crystals. When expanding the Natrix (2) condition, MgSO4 was substituted for Mg(OAc)2 and this substitution did not yield crystals. The drops in the PEG Ion (26) condition were clear or contained a crystalline precipitate, while the Wizard II (39) condition produced precipitate or crystalline precipitate. Drops in the Natrix (2) condition contained either dark or crystalline precipitate. The Index (5) condition produced clear drops or drops with an oily precipitate. 32-B crystals from the Crystal Screen II (# 41) condition were optimized; this condition was discovered by another graduate student (Laurence Boutemy). This same screen was used when screening the complexes, but no 32-B crystals were produced because the protein was screened at lower concentrations that do not facilitate crystal growth. This condition was optimized by varying the pH and the concentration of the precipitant. Optimal crystal growth was seen in Bis Tris pH 6 and Tris pH 7, with a precipitant range of 1.75-1.85 M Li2SO4 (Figure 3.39). These crystals took over a month to grow and, in most cases, the crystal expansion trays produced small crystals or predominantly micro crystals. A film formed on the surface of the hanging drops, covering the crystals that grew near the surface, so the film had to be carefully cut from some of the crystals. An SDS-PAGE gel verified that the crystals contained 32-B.


Table 3.10: T4 32-B crystal hits. 32-B crystallization conditions and the commercial screens that produced the crystals. Conditions 1-6 were dyed with the Izit dye (blue color absorption is indicative of protein crystals), while condition 7 was photographed with a polarizer.

| # | Protein | Screen | Crystallization Condition |
|---|---|---|---|
| 1 | 32-B | Wizard I (32) | 10 % (w/v) PEG 3,000; 0.1 M Na/KPO4 pH 6.2 |
| 2 | 32-B | Wizard II (39) | 20 % (w/v) PEG 8,000; 0.1 M CAPS pH 10.5; 0.2 M NaCl |
| 3 | 32-B | PEG Ion (26) | 20 % (w/v) PEG 4,000; 0.1 M HEPES pH 7.5; 0.2 M NaCl |
| 4 | 32-B | Natrix (2) | 2.5 M (NH4)2SO4; 0.05 M MES pH 5.6; 0.01 M Mg(OAc)2 |
| 5 | 32-B | Index (5) | 2.0 M (NH4)2SO4; 0.1 M HEPES pH 7.5 |
| 6 | 32-B | Crystal Screen II (41) | 1.0 M Li2SO4; 0.1 M Tris pH 7.0; 0.01 M NiCl2 hexahydrate |

Figure 3.39: 32-B expansion crystals. 32-B crystals from expansion trays grown in the Li2SO4 condition. Crystals grew in the presence of precipitate, as multiple sizes per drop, as micro-crystals at high concentrations, and with a film covering the drop.

Crystals were frozen in a substituted mother liquor that contained a cryo protectant using a Helium cryostat. Li2SO4 was used as the precipitating agent and also acted as a cryo protectant; substitute mother liquors were prepared with 1.75, 1.80, 1.85, 1.90, 1.95, 2.2, and 2.5 M Li2SO4 concentrations. Crystals were soaked in the various mother liquors, and the smaller crystals (≤ 0.2 µm) were stable at Li2SO4 concentrations of 2 M. The larger crystals (≥ 0.2 µm) were unstable when soaked in any condition for longer than 20 seconds. The higher concentrations of Li2SO4 tended to stabilize the protein crystals during short soaks (10-15 seconds). Crystallization conditions were also set up at higher concentrations of Li2SO4, but only micro crystals or small crystals grew. Crystals turned brown and cracked when soaked in 10 % glycerol, while crystals soaked in glutaraldehyde did not turn yellow (a yellow color would signify cross-linking). Many of the crystals from each of the different cryo studies were tested for diffraction, and many of them did not diffract. 32-B crystals were stable when soaked in Fomblin (HVAC 40/11 or LVAC 14/6) and when frozen straight from the hanging drop without additional soaks in cryo protectants.


X-ray data were collected at BioCARS beam-line 14 (APS) on several crystals. The best data set (crystal size 0.2 x 0.3 µm) was collected at a detector distance of 450 mm, 0.5 ° oscillations, and 15 second exposures for 120 ° (Figure 3.40). During the run, it appeared that ice was forming on the crystal so the data collection was stopped. Liquid nitrogen was poured over the crystal and the data collection was resumed. The first set of data had 92 frames (46 ° of data) and the second set of data had 110 frames (55 ° of data). Both data sets were processed with Mosflm, combined, and scaled with Scala to 3.3 Å (ccp4 suite). Mosflm suggested a space group of P3 (sg 150) and Scala calculated an Rmerge of 7.9 %, with 17,806 unique reflections and 96.1 % completeness. The cell dimensions for the data were 79.591 Å, 79.591 Å, 173.12 Å, 90 °, 90 °, 120 °.

Figure 3.40: Diffraction patterns of 32-B. Images of 32-B diffraction at BioCARS BM 14 at APS. This crystal diffracted to 2.9 Å with a relatively long cell edge. Data was collected with 15 second exposures at 0.5 ° oscillation for 202 frames at a distance of 450 mm. Data was processed to 3.3 Å with an initial space group of P3.


The data set was tested for the presence of twinning because of the long cell edge and the closely spaced spots. A merohedral twin detector program determined that the 32-B data set was not twinned (Figure 3.41).124 Pointless was used to determine the space group of the 32-B data and a space group of P3221 was suggested. The processed data was examined using HKL View and the screw axis was evident based on the systematic absences, but it was difficult to detect the two-fold symmetry. To verify the correct space group, the data was scaled in each of the P3 space groups and used in molecular replacement (MR). The statistics were more reasonable with the space groups P32 (sg 145) and P3221 (sg 154) after scaling and MR (Table 3.11). The Matthews coefficient was calculated to be 2.49 Å3/Da for both space groups, with 50.61 % solvent.
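The Matthews coefficient quoted above can be approximately reproduced from the reported unit cell. The sketch below assumes a 32-B molecular weight of 31,900 Da (inferred from the complex masses given earlier, not stated in this section), the conventional solvent-content estimate of 1 - 1.23/Vm, and the standard numbers of symmetry operators for P32 (3) and P3221 (6). The function names are hypothetical.

```python
import math

def trigonal_cell_volume(a, c):
    """Unit cell volume (Å^3) for a = b, alpha = beta = 90°, gamma = 120°."""
    return a * a * c * math.sin(math.radians(120.0))

def matthews(volume, n_symops, n_mol_asu, mw_da):
    """Return (Vm in Å^3/Da, fractional solvent content)."""
    vm = volume / (n_symops * n_mol_asu * mw_da)
    solvent = 1.0 - 1.23 / vm
    return vm, solvent

MW_32B_DA = 31_900.0  # assumed molecular weight of 32-B (Da)

volume = trigonal_cell_volume(79.591, 173.12)  # cell from the text

# P32 has 3 symmetry operators (4 molecules per ASU);
# P3221 has 6 symmetry operators (2 molecules per ASU).
vm_p32, solv_p32 = matthews(volume, 3, 4, MW_32B_DA)
vm_p3221, solv_p3221 = matthews(volume, 6, 2, MW_32B_DA)

print(f"P32:   Vm = {vm_p32:.2f} Å^3/Da, solvent = {100 * solv_p32:.1f} %")
print(f"P3221: Vm = {vm_p3221:.2f} Å^3/Da, solvent = {100 * solv_p3221:.1f} %")
```

Because both choices place twelve molecules in the unit cell, the two space groups necessarily give the same Vm. The small difference from the reported 2.49 Å3/Da reflects the assumed molecular weight.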

Figure 3.41: Analysis of 32-B data for twinning. The 32-B data was analyzed using the Padilla-Yeates server. The red line is the theoretical untwinned curve, the red curve is the theoretical twinned curve, and the blue line is the data. The blue line is below the red, verifying that the data is untwinned, but this indicated a problem since it is below the theoretical limit.124


Table 3.11: Processing statistics of 32-B data as P32 and P3221. Data was processed as P3 with Mosflm, but then sorted as different space groups and scaled with Scala. The number of molecules in the asymmetric unit was calculated by determining the Matthews coefficient. Phaser produced reasonable solutions for both space groups while MolRep only found a solution for P3221.

| Statistic | P32 (145) | P3221 (154) |
|---|---|---|
| Rmerge (3.3 Å) | 7.9% (29.4%) | 8.7% (31.8%) |
| Obs. Reflections | 51,045 | 50,925 |
| Unique Reflections | 17,806 | 9,897 |
| Molecules in ASU | 4 | 2 |
| Data/atom | -- | 2.2 |
| Completeness (3.3 Å) | 96.1% (99.1%) | 98.2% (99.9%) |
| Multiplicity (3.3 Å) | 2.9 (3.0) | 5.1 (5.5) |
| Rejections | 295 | 388 |

| Phaser Output | Molecule | RFZ | TFZ | PAK | LLG |
|---|---|---|---|---|---|
| 4 molecule P32 | 1 | 6.1 | 12.0 | 0 | 128 |
| 4 molecule P32 | 2 | 4.9 | 17.0 | 0 | 453 |
| 4 molecule P32 | 3 | 4.9 | 20.9 | 0 | 831 |
| 4 molecule P32 | 4 | 5.3 | 26.5 | 0 | 1273 |
| 2 molecule P3221 | 1 | 5.9 | 17.9 | 0 | 230 |
| 2 molecule P3221 | 2 | 5.6 | 23.5 | 0 | 649 |

| MolRep Output | Result |
|---|---|
| 4 molecule P32 | No solution |
| 2 molecule P3221 | TFcnt = 25.7; Rfac = 0.558; Scor = 0.341 |
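As a quick consistency check on Table 3.11, the multiplicity should equal the number of observed reflections divided by the number of unique reflections. The sketch below uses only the counts listed in the table; the variable names are hypothetical.

```python
# Observed and unique reflection counts taken from Table 3.11
reflections = {
    "P32": {"observed": 51_045, "unique": 17_806},
    "P3221": {"observed": 50_925, "unique": 9_897},
}

multiplicity = {
    space_group: counts["observed"] / counts["unique"]
    for space_group, counts in reflections.items()
}

for space_group, value in multiplicity.items():
    print(f"{space_group}: multiplicity = {value:.1f}")
```

The higher-symmetry space group merges more symmetry-equivalent observations into each unique reflection, which is why P3221 has roughly half the unique reflections and a correspondingly higher multiplicity.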

Molecular replacement was performed using the coordinates from the structure of the T4 32 core (PDB: 1GPC) (refer to Figures 1.8 and 1.9). The core contains amino acid residues 22-239, whereas this protein truncation has an additional 66 amino acid residues (six residues off the N-terminal domain and 60 residues off the C-terminal domain); therefore, phasing information will be missing for these residues.


Phaser produced solutions for both space groups, while MolRep found a solution only for P3221. The Phaser solutions had translation function Z-scores greater than 8, no packing clashes, and a very large, positive log-likelihood gain. The MolRep output for P3221 had a large translational value with a moderate R-factor and overall score. Each solution was opened in COOT in order to investigate the packing of the molecules. The P32 solution could be superimposed onto the P3221 solution, and the remaining molecules of P32 overlapped with symmetry molecules (Figure 3.42). Since both space groups appeared to be correct, the higher symmetry space group was chosen to begin building and refinement.

Figure 3.42: Models of P32 and P3221 from Phaser. P32 (145) is the orange ribbon structure, P3221 (154) is the cyan ribbon structure, and symmetry molecules are the purple ribbon cartoons. Left figure: Super position of 145 onto 154. Right figure: Super position of 145 onto 154 and the overlapping symmetry units onto the remaining molecules of 145.23-25

Programs in CNS were used to perform the initial refinements on the 32-B. A rigid body refinement with simulated annealing (R 43.5 %, Rfree 48.9 %) was utilized with non-crystallographic symmetry in order to treat both molecules in the ASU the same. The majority of the molecule was located within the density; extra connected density was seen off the N-terminus and broken density off of the C-terminus (Figure 3.43).

[Figure 3.43 panel labels: G22; M239.]

Figure 3.43: Electron density at the N-terminus and C-terminus of the 32-B. Left panel: broken density extending off of the C-terminal methionine. Right panel: Continuous density extending off of the N-terminal glycine. Figures created in PyMol.23

This density was used to begin remodeling the core as well as building in the new residues. The 32 core residues were examined in groups; after a group of residues was remodeled, the entire molecule was refined in Refmac 5 (ccp4) using NCS restraints. During this process, it appeared that regions of the molecule were slightly shifted to the right inside the density. Therefore, this 32-B model was submitted to a program (TLS Motion Determination) that analyzes structures for flexible regions.125, 126 Upon analysis of each solution, the TLS division of 32-B into three regions was used in the refinements. TLS and restrained refinements with NCS were performed until the final group of the 32 core was remodeled. The TLS refinements adjusted the model only slightly to fit the density; the molecule would need to be manually moved to properly fit the density.


A walk through of the 32 core model was performed using COOT; after every 20-40 residues, the model was refined using CNS and REFMAC 5. The best refinement model, judged by the R and Rfree values and by how well the model fit the experimental density, was used to continue walking through the model. The electron density for the core model was not well defined; therefore, the only adjustment performed was to move the Cα backbone chain into density, not the side chains. Some regions lacked electron density for the side chains as well as for the Cα backbone. Despite the automatic and manual manipulations to fit the model to the experimental data, regions of the protein still could not be placed into electron density (Figure 3.44).

Figure 3.44: Density missing from the 32 core model. Several residues of the 32 core can not be modeled into the electron density because there is no density present. Left panel: There is no density to model the Cα backbone for residues 79 (Ser) and 81 (His). Right panel: There is no density to model the Cα backbone for residue 179 (Gln), as well as several side chains, 178 (Lys) and 180 (Val).


Next, two zinc atoms were added to each monomer using COOT and refined with a library created in Refmac 5. The three cysteines located in the same vicinity tended to bond together during refinements. The presence of the Zn atom should prevent these bonds from forming, but in most cases, the bond still formed with the Zn present during the refinement. The final round of refinement of the core with a zinc ion had an R 38.6 % and Rfree 46.5 %. The program DM (density modification) was used to improve the phasing information from the 32 core in order to improve the electron density. DM did not improve the density in the regions where the amino acid residues needed to be built. After the core of 32 was modeled, the N-terminus of the 32 core was rebuilt to fit the density before building in the six amino acid residues to the N-terminus. The entire molecule was refined using CNS and REFMAC 5 after the addition of 1-2 residues. An additional 15 residues were added onto the C-terminus of the core with refinements occurring after the addition of 2-3 residues. During the build, it appeared that the new residues were not forming any secondary structure and that the new residues were just extending out into density that might be contributed from symmetry molecules (Figure 3.45). Several times, extension residues were rebuilt into other density that appeared after refinement. The models from both refinements varied slightly at the N-terminus and C-terminus, with minor deviations throughout the core.


Figure 3.45: Final build of 32-B protein. Six residues were added to the N-terminus of the 32 core and 15 residues were added to the C-terminus of the 32 core until it appeared that the additions were not forming any secondary structure. The pink stick molecule is from the REFMAC 5 refinement and the yellow stick molecule is from the CNS refinement. A). N-terminal residue additions onto the 32 core. B). C-terminal residue additions onto the 32 core. C). Partial 32-B molecule (residues 16-252) with the new N-terminus in blue and the new C-terminus in red. Both additions appear to have a random coil structure.

There was minimal density in the region of the missing A domain, preventing the further addition of residues from the A domain. In order to obtain more density, the project focused on growing bigger, better diffracting crystals to obtain better data for building the A domain. Ten expansion trays were set up using the Crystal Screen II condition 41 and a variation of that condition. Crystals grew in around 1.5 M Li2SO4 (0.2 µm and smaller) and, as the concentration of the lithium sulfate increased, smaller crystals were formed. Fourteen crystals were tested for diffraction; none of them diffracted further than 7 Å, and some did not diffract at all. Two trays were set up using microbridges to facilitate larger drop sizes (8 µL + 8 µL), and the crystals that grew were still no bigger than 0.2 µm. Glutaraldehyde was added to a larger crystal (0.3 x 0.4 µm) to cross-link the crystal in order to improve diffraction, but the crystal did not diffract. To shift the equilibrium past the salting-in phase, the Li2SO4 concentration was held constant at 1.5 M and (NH4)2SO4 was used in the gradient for several trays. Crystals formed at low concentrations of (NH4)2SO4 (0-50 mM), where most of the crystals were 0.2 µm or smaller. The last couple of trays that were set up did not produce crystals; the drops contained a crystalline precipitate or an oily precipitate, or were clear.

3.2 Bacteriophage KVP40 results and discussion

3.2.1 Protein expression and purification

Expression vectors of KVP40 59 protein and 41 helicase were gifts from Dr. Nancy Nossal. The KVP40 41 (pRB2301) was expressed in BL21 (DE3) cells, with a typical 6 L expression yielding 5-8 g of cell pellet. KVP40 59 (pRB1906) was expressed in BL21 (DE3) pLys S cells, with a typical 6 L expression producing 8-9 g of cell pellet. 41 helicase was lysed in a low salt lysis buffer (50 mM Tris pH 7.5, 100 mM NaCl, 1 mM EDTA, 20 % sucrose). Because the theoretical isoelectric point of the 41 protein is around 5.64, anion exchange chromatography was used to purify it. 59 protein was lysed in a low salt lysis buffer (50 mM Bis Tris pH 6.5, 100 mM NH4Cl, 2 mM DTT). The theoretical isoelectric point of 59 protein is around 9.59 and the protein was purified using cation exchange columns (Figure 3.46).
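The choice of exchanger described above follows the usual rule of thumb: a protein carries a net negative charge when the buffer pH is above its pI (so it binds an anion exchanger) and a net positive charge when the pH is below its pI (so it binds a cation exchanger). The sketch below encodes only that rule; the function name is hypothetical, and pH 6.5 is the pH of the Bis Tris column buffers described in this section.

```python
def exchanger_for(theoretical_pi, buffer_ph):
    """Suggest an ion-exchange mode from the protein pI and the buffer pH."""
    if buffer_ph > theoretical_pi:
        return "anion exchange"    # protein is net negative
    if buffer_ph < theoretical_pi:
        return "cation exchange"   # protein is net positive
    return "undetermined"          # no net charge at pH == pI

BUFFER_PH = 6.5  # Bis Tris column buffers used for both proteins

proteins = {"KVP40 41 helicase": 5.64, "KVP40 59 protein": 9.59}
for name, pi_value in proteins.items():
    print(f"{name} (pI {pi_value}): {exchanger_for(pi_value, BUFFER_PH)}")
```

This matches the scheme in the text: Q Sepharose and Poros HQ (anion exchangers) for the 41 helicase, and SP Sepharose and Poros HS (cation exchangers) for the 59 protein.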

[Purification scheme of KVP40 proteins. 41: Q Sepharose, Poros PE, Poros HQ. 59: SP Sepharose, Poros PE, Poros HS.]

Figure 3.46: Purification scheme for the KVP40 59 and 41 proteins. Poros PE removes endogenous nucleases; SP Sepharose, Q Sepharose, Poros HQ, and Poros HS remove contaminating proteins.

Lysate containing the 41 helicase was prepared for purification by decreasing the conductivity of the lysate to below the conductivity of buffer A (buffer A: 25 mM Bis Tris pH 6.5, 25 mM Na3citrate, 50 mM NaCl). The Q Sepharose (Q Sep.) was equilibrated with buffer A and the lysate containing the 41 helicase was loaded onto the column. The 41 helicase eluted from the Q Sep. in a small, broad peak (~ 175 mM) during a 10 CV linear gradient (buffer B: 25 mM Bis Tris pH 6.5, 25 mM Na3citrate, 1 M NaCl). The Q Sep. fractions containing the 41 helicase were combined and the conductivity of the sample was increased (addition of 2 M NaCl) to match the conductivity of buffer B. The 41 helicase sample was run over the Poros PE (PE) and the contaminating nuclease bound to the column, while the 41 helicase was found in the effluent. The 41 helicase was dialyzed (25 mM Bis Tris pH 6.5, 25 mM Na3citrate, 25 mM NaCl) to reduce the conductivity of the sample below the conductivity of buffer A. The 41 helicase sample was then loaded onto the Poros HQ (HQ), which was equilibrated with buffer A (buffer A: 12.5 mM Bis Tris pH 6.5, 6 mM Na3citrate, 12 mM NaCl). 41 helicase eluted from the HQ (~ 100 mM) in a broad peak during the 10 CV linear gradient run (Figure A6). The 41 helicase was fairly pure in the fractions towards the end of the HQ peak (> 95 %, assessed by SDS-PAGE) (Figure 3.47). Purification of 41 helicase yielded between 70-90 mg of protein per 6 L expression (Table 3.12).

Lysate containing the 59 protein was prepared for purification by decreasing the conductivity of the sample below the conductivity of buffer A (buffer A: 25 mM Bis Tris pH 6.5, 50 mM NH4Cl, 10 mM MgCl2). The SP Sepharose (SP Sep.) was equilibrated with buffer A and lysate containing the 59 protein was loaded onto the column. The 59 protein eluted (~ 400 mM) in a broad peak during a 10 CV linear gradient run (buffer B: 25 mM Bis Tris pH 6.5, 750 mM NH4Cl, 10 mM MgCl2). The SP Sep. fractions containing the 59 protein were combined and the conductivity of the sample was increased (addition of 4 M NH4Cl) to match the conductivity of buffer B. The 59 protein was run over the PE, where the nucleases bound to the resin while the 59 protein was located in the effluent. The conductivity of the 59 protein PE sample was decreased below the conductivity of buffer A through dialysis (25 mM Bis Tris pH 6.5, 40 mM NH4Cl, 10 mM MgCl2). The 59 protein eluted from the Poros HS (HS) (~ 450 mM) in a narrow peak during a 20 CV linear gradient (Figure A7). The 59 protein was fairly pure (> 95 %, assessed by SDS-PAGE) after the Poros HS run (Figure 3.47). Purification of the 6 L expression pellet of 59 protein produced around 50-60 mg of protein (Table 3.12).
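The elution salt concentrations above can be related to the position within a linear gradient. The sketch below is a simplified illustration: it assumes an ideal linear gradient running from 0 to 100 % buffer B over the stated column volumes, considers only the gradient salt (NaCl or NH4Cl), and ignores system dead volume. The function name is hypothetical.

```python
def gradient_position(c_elute_mm, c_a_mm, c_b_mm, gradient_cv):
    """Return (fraction of buffer B, column volumes into the gradient)
    at which a given salt concentration is reached in a linear gradient."""
    fraction_b = (c_elute_mm - c_a_mm) / (c_b_mm - c_a_mm)
    return fraction_b, fraction_b * gradient_cv

# 41 helicase on Q Sepharose: 50 mM NaCl (A) to 1 M NaCl (B), 10 CV, ~175 mM
frac_41, cv_41 = gradient_position(175.0, 50.0, 1000.0, 10.0)

# 59 protein on SP Sepharose: 50 mM NH4Cl (A) to 750 mM NH4Cl (B), 10 CV, ~400 mM
frac_59, cv_59 = gradient_position(400.0, 50.0, 750.0, 10.0)

print(f"41 on Q Sep.:  {100 * frac_41:.0f} % B, {cv_41:.1f} CV into the gradient")
print(f"59 on SP Sep.: {100 * frac_59:.0f} % B, {cv_59:.1f} CV into the gradient")
```

Under these assumptions the 41 helicase elutes early in its gradient, while the 59 protein elutes near the midpoint, consistent with its tighter binding to the cation exchanger.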


Table 3.12: Elution parameters of the KVP40 protein purifications. Table includes the columns used for protein purification, the fraction range in which the protein eluted off the column, and the conductivity range of the elution buffer and an estimate of the salt concentration at the peak at which the protein eluted from the column.

| Protein | Column | Fraction elution | Conductivity range (mS/cm) | mM salt | Protein yield (mg) |
|---|---|---|---|---|---|
| 41 | Q | 24-35 | 20-35 | 175 | N/A |
| 41 | HQ | 5-18 | 7-30 | 100 | 90 |
| 59 | SP | 10-24 | 20-65 | 400 | N/A |
| 59 | HS | 17-20 | 50-65 | 450 | 55 |

[Figure 3.47 gel images. 41 helicase gel: lanes 1-9, with sample labels MW ladder and fractions 6, 9, 11, 15, and 18 (the remaining lane labels are not recoverable); marker labels 55.4 kDa and 36.5 kDa. 59 protein gel: lanes 1-8, with flow through, fractions 15, 17, 18, 19, and 20, one fraction with no recoverable number, and MW ladder; marker labels 31.0 kDa and 21.5 kDa.]

Figure 3.47: SDS-PAGE of KVP40 59 and 41 final purification columns. Both proteins were fairly pure after the final column purification with slight contaminants at the beginning of the peaks. Both 41 and 59 proteins were monomeric with molecular weights around 55 kDa and 26 kDa, respectively.

3.2.2 KVP40 protein characterization

The lysis and purification of the KVP40 41 helicase were originally based on the purification of the T4 41 helicase. The KVP40 41 helicase was lysed in the no salt lysis buffer and in a new low salt lysis buffer (50 mM Tris pH 7.5, 100 mM NaCl, 1 mM EDTA, 20 % (w/v) sucrose). The 41 helicase was found in both the pellet and the lysate after lysis in buffer containing no or low salt concentrations. For subsequent purifications, the KVP40 41 protein was lysed with the low salt buffer and then purified with anion exchange columns, based on its acidic theoretical pI of 5.64.

The purified 41 helicase precipitated as it was concentrated. The precipitated protein was collected and used in a solubility screen to determine the optimal salts and buffer to keep the protein soluble (Figure 3.48). The solubility studies were repeated after changes in the purification process; in each case, one or two new additives/salts were added to the list of solubilizing reagents (Table 3.13). With each screen, a new solution was used for the characterization of the protein, usually a buffer at pH 6.5 with a combination of salts chosen from Na3citrate, NaCl, MgSO4, and KCl. These solutions were predominantly used for DLS, Superdex 200, DSC, and early crystallization experiments. The final solubility screen determined that the 41 protein was most soluble in TAPS pH 8.0, sodium citrate, sodium sulfate, magnesium chloride, and sodium chloride. The optimal buffer used for this protein contained 25 mM Tris pH 8.0 and 25 mM Na3citrate. This buffer was used for later crystallization trials of the KVP40 41 and the T4 41 helicases.


Table 3.13: Results of solubility screens on KVP40 41. Results of three solubility screens on the 41 helicase after changes in the purification process. The helicase is most soluble in basic buffers with citrate and sodium.

Screen 1: sodium HEPES pH 7.0; sodium citrate; sodium sulfate; magnesium chloride; lithium chloride; potassium chloride; sodium chloride

Screen 2: sodium TAPS pH 8.0; sodium HEPES pH 7.0; sodium citrate; sodium sulfate; sodium formate; calcium chloride; lithium chloride; potassium chloride; sodium chloride

Screen 3: sodium TAPS pH 8.0; sodium HEPES pH 7.0; sodium PIPES pH 6.0; sodium citrate; sodium sulfate; sodium cacodylate; sodium acetate; sodium formate; magnesium chloride; lithium chloride; sodium chloride

[Bar chart: Solubility Screen of KVP40 41. Category axis (Salts and Buffers): H2O, TAPS, HEPES, PIPES, MES, Na citrate, Na phosphate, Na sulfate, Na cacodylate, Na acetate, Na formate, CaCl2, MgCl2, LiCl, KCl, NaCl, NH4Cl. Value axis: ABS (595 nm), 0 to 0.7. Bar labels in the order extracted: 0.01157, 0.56436, 0.44905, 0.43281, 0.016396, 0.59106, 0.032822, 0.42988, 0.36937, 0.34425, 0.36085, 0.11232, 0.31709, 0.25666, 0.067636, 0.13481, 0.057511.]

Figure 3.48: Solubility screen of KVP40 41 helicase. The precipitated 41 helicase was incubated with individual salts and buffers, and the absorbance of the supernatant was measured at 595 nm to detect soluble protein (Bradford reagent). The helicase was very soluble in TAPS pH 8.0, sodium citrate, sodium sulfate, magnesium chloride, and sodium chloride.


Several organic additives were also tested for their ability to enhance the solubility of the KVP40 41 helicase during these experiments. Glycerol, glucose, sucrose, 2-methyl-2,4-pentanediol, ethylene glycol, and 1,2-propanediol (all at 10 % v/v) were added to small sample volumes of precipitated KVP40 41 protein. The absorbance of the isolated supernatants was measured at 280 nm; sucrose re-solubilized more of the protein than the other organic additives. 10 % sucrose was used in several crystallization studies.

DLS and Superdex 200 experiments suggested that the 41 helicase existed as a multimer in solution, most likely a dimer or trimer. DLS (results in Table 3.14) showed that the 41 helicase had a hydrodynamic radius ranging from 48.0-61.0 Å, with estimated molecular weights predominantly around the size of a trimer. Measurements of KVP40 41 with ATPγS could not be performed because the count rate was too high, suggesting the presence of large aggregates. The samples were spun at 20,000 × g and diluted, but readings were still not obtainable. Size exclusion chromatography was performed many times, with the protein eluting at the same elution time in every run. The protein eluted in two peaks, very early in the run. Overall, the data suggest that the protein was in equilibrium between the monomeric state and the dimer/trimer state. Most helicases function as hexamers; perhaps the presence of the helicase loader protein, or of ssDNA with ATP and Mg2+, would promote the formation of the hexamer.


Table 3.14: DLS results for KVP40 41 helicase. The KVP40 41 helicase was an oligomer in solution, with the molecular weight suggesting the 41 was a dimer (110 kDa) or trimer (165 kDa).

| Protein | Temperature (°C) | % Polydispersity | Radius (Å) | Molecular weight (kDa) |
|---|---|---|---|---|
| KVP40 41 | 4 | 8.6 | 61.0 | 227.0 |
| KVP40 41 | 4 | 27.1 | 52.0 | 159 |
| KVP40 41 | 20 | 30.2 | 51.0 | 158 |
| KVP40 41 | 4 | 8.9 | 48.0 | 131 |
| KVP40 41 | 20 | 9.2 | 51.0 | 153 |
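The dimer/trimer interpretation of Table 3.14 amounts to dividing each DLS molecular weight estimate by the monomer mass. The sketch below illustrates that arithmetic, assuming a monomer mass of about 55 kDa (the value quoted for the 41 helicase in Figure 3.47); the helper name is illustrative. DLS molecular weights are themselves model-dependent estimates, so the resulting subunit numbers are only approximate.

```python
MONOMER_KDA = 55.0  # approximate KVP40 41 monomer mass (assumed from Figure 3.47)

# Molecular weight estimates (kDa) from the five DLS measurements in Table 3.14
DLS_MW_KDA = [227.0, 159.0, 158.0, 131.0, 153.0]


def subunit_estimate(mw_kda, monomer_kda=MONOMER_KDA):
    """Return the apparent number of subunits implied by a DLS mass estimate."""
    return mw_kda / monomer_kda


for mw in DLS_MW_KDA:
    n = subunit_estimate(mw)
    print(f"{mw:6.1f} kDa -> {n:.1f} subunits (nearest whole number: {round(n)})")
```

Most of the measurements come out nearest to three subunits, in line with the predominantly trimer-sized estimates described in the text.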

The KVP40 59 protein was fairly pure and stable after the final purification column. Experiments with the 59 protein were designed around experiments that had been performed by a previous graduate student (Anne Senger). That student screened the individual protein with and without fork DNA, but obtained little information about its crystallization. For this project, the 59 protein was used to stabilize and study the 41-59 protein complex. Before studying the complex, DSC was performed to determine the temperature stability of the 59 protein in comparison to the T4 59 protein. Two peaks were present in the DSC run, the first around 53.3 °C and the second around 64.5 °C. When the run was performed a second time, one peak was present at 53.9 °C and the end of the run was noisy between 66.2 °C and 90 °C. The KVP40 59 protein unfolds at around the same temperature as the T4 59 protein (53 °C). The second peak might arise from the formation of aggregates or from the protein precipitating out of solution.

Both the KVP40 41 and 59 proteins were screened individually in a variety of crystal screens (Table 3.15). The screens were set up with the Honeybee robotics system at ambient temperature and then stored at 4 °C or ambient temperature. The 41 helicase was screened at several concentrations (5.4 mg/mL, 7.1 mg/mL, and 15.2 mg/mL), and many of the drops were clear or precipitated. The helicase was not very stable at the highest concentration (15.2 mg/mL): it started to precipitate before the trays were set up, so those trays were prepared with the partially precipitated protein and had predominantly precipitated sitting drops. The KVP40 41 helicase was screened alone and with additives such as ATP, ATPγS, MgCl2, and sucrose. The protein screened with sucrose (41 at 7.1 mg/mL) resulted in many clear drops. Drops containing 41 helicase (7.7 mg/mL) and 10 mM ATP or ATPγS were a mixture of light and dark precipitate, with a few clear drops. When the helicase was screened with magnesium (5 mM MgCl2) and ATP (5 mM ATP), the protein began to precipitate over time. This complex was screened at ambient temperature and the drops were mainly precipitated.

The T4 41 was screened concurrently with the KVP40 41, individually and in combination with magnesium and ATP. The T4 41 helicase was screened at concentrations around 18.0 mg/mL and was quite stable compared to the KVP40 41, even in the presence of ATP. The majority of the T4 41 drops were precipitated, with some clear drops. Several conditions produced crystals, but the majority of the crystals were salt (Table 3.16). One condition (Wizard II condition 20) was expanded and the crystals were soaked in 25 % ethylene glycol. The crystals were screened for diffraction on the diffractometer, but the diffraction pattern was characteristic of salt.


Table 3.15: Commercial and in house crystallization screens used for crystallization studies of the KVP40 proteins. The temperature denotes tray storage temperature. The drop size consists of protein : screen condition. # of setup specifies the number of times the complex was screened against that particular crystallization screen.

| Protein(s) (Additive) | Temperature (°C) | Drop size (µL) | Crystallization Screen | # of setup |
|---|---|---|---|---|
| KVP40 41 (10% sucrose or ATPγS) | Ambient | 0.5 + 0.5 | Index | 2 |
| KVP40 41 (10% sucrose or ATPγS) | Ambient | 0.5 + 0.5 | PEG Ion/Natrix | 2 |
| KVP40 41 (10% sucrose) | Ambient | 0.5 + 0.5 | Wizard I and II | 1 |
| KVP40 41 (10% sucrose or ATPγS) | Ambient | 0.5 + 0.5 | Crystal Screen I and II | 2 |
| KVP40 41 (ATPγS) | 4 | 0.5 + 0.5 | Index | 1 |
| KVP40 41 (ATPγS) | 4 | 0.5 + 0.5 | PEG Ion/Natrix | 1 |
| KVP40 41 (ATPγS) | 4 | 0.5 + 0.5 | Wizard I and II | 1 |
| KVP40 41 (ATPγS) | 4 | 0.5 + 0.5 | Crystal Screen I and II | 1 |
| KVP40 41 | Ambient | 1.5 + 0.5 | Index | 1 |
| KVP40 41 | Ambient | 1.5 + 0.5 | PEG Ion/Natrix | 1 |
| KVP40 41 | Ambient | 1.5 + 0.5 | Cryo I and II | 1 |
| KVP40 41 | Ambient | 1.5 + 0.5 | Wizard I and II | 1 |
| KVP40 41 | Ambient | 1.5 + 0.5 | Crystal Screen I and II | 1 |
| KVP40 41 (with ATP and ATPγS) | 4 | 0.5 + 0.5 | Index | 1 |
| KVP40 41 (with ATP and ATPγS) | 4 | 0.5 + 0.5 | Wizard I and II | 1 |
| KVP40 41 (with ATP and ATPγS) | Ambient | 0.5 + 0.5 | Index | 1 |
| KVP40 41 (with ATP and ATPγS) | Ambient | 0.5 + 0.5 | Wizard I and II | 1 |
| KVP40 41 and T4 41 | 4 | 0.5 + 0.5 | Additive | 1 |
| KVP40 41 and T4 41 | Ambient | 0.5 + 0.5 | Additive | 1 |
| KVP40 41 | Ambient | 0.5 + 0.5 | Cryo and Memfac | 1 |
| KVP40 41 | Ambient | 0.5 + 0.5 | Cryo I and II | 1 |
| KVP40 41 | 4 | 0.5 + 0.5 | Cryo and Memfac | 1 |
| KVP40 41 | 4 | 0.5 + 0.5 | Cryo I and II | 1 |
| KVP40 41 (MgCl2 + ATP) and T4 41 (MgCl2 + ATP) | 4 | 0.5 + 0.5 | Additive | 1 |
| KVP40 41 (MgCl2 + ATP) and T4 41 (MgCl2 + ATP) | Ambient | 0.5 + 0.5 | Additive | 1 |
| KVP40 59 | Ambient | 0.5 + 0.5 | Additive | 1 |
| KVP40 59 | Ambient | 0.5 + 0.5 | Crystal Screen I and II | 1 |
| KVP40 59 | Ambient | 0.5 + 0.5 | Cryo I and II | 1 |
| KVP40 59 | Ambient | 0.5 + 0.5 | Index | 1 |


Table 3.16: Crystal hits of KVP40 proteins. Crystal hits of KVP40 41 and 59, including commercial screen and crystallization conditions. Several conditions were stained with the Izit dye (blue drops) and the crystals did not absorb the stain.

| # | Protein | Screen | Crystallization Condition |
|---|---|---|---|
| 1 | KVP40 41 | Wizard II (20) | 15 % ethanol; 0.1 M MES pH 6.0; 0.2 M Zn(OAc)2 |
| 2 | KVP40 41 | Crystal Screen I (24) | 20 % (v/v) 2-propanol; 0.1 M Na acetate pH 4.6; 0.2 M CaCl2 |
| 3 | KVP40 41 | Additive (74) | 30 % (v/v) 2-methyl-2,4-pentanediol; 0.025 M sodium oxalate |
| 4 | KVP40 41 | Crystal Screen II (12) | 30 % v/v PEG 400; 0.1 M Na acetate pH 4.6; 0.1 M CaCl2 |
| 5 | KVP40 59 | Cryo I (6) | 40 % (v/v) PEG-600; 0.1 M cacodylate pH 6.5; 0.2 M Ca(OAc)2 |
| 6 | KVP40 59 | Cryo I (17) | 40 % (v/v) 1,2-propanediol; 0.1 M HEPES pH 7.5 |
| 7 | KVP40 59 | Cryo I (35) | 40 % v/v ethylene glycol; 0.1 M HEPES pH 7.5; 5 % w/v PEG-3000 |

[Picture column: crystal images not reproduced.]

The KVP40 59 protein was screened at 10.7 mg/mL in several screens at ambient temperature. Drops were either clear or precipitated in all of the trays. Several crystals were produced, but the majority did not turn blue with the Izit dye, nor did they break easily, suggesting that the crystals were salt.

Another interest would be to study each of the proteins in the presence of DNA, so the proteins were tested for contaminating nuclease activity. Initial experiments indicated that both the 41 helicase and the 59 protein were nuclease free, but in the case of the 41 helicase, no magnesium was present in the activity assay to activate the nuclease. Our collaborators (Drs. Nancy Nossal and Charlie Jones) detected nuclease activity with the 41 helicase, which had not been removed during the purification process. The 59 protein contained magnesium in the protein buffer; therefore, the results were most likely correct for this protein.

3.2.3. KVP40 protein complex studies

Small scale studies were performed to determine the solubility of the 59 protein in the presence of the 41 helicase. When 1 µL of 41 helicase was added to 1 µL of 59 protein, the drops remained clear. Since the proteins appeared to be stable, the complex was prepared in a larger volume and at a higher concentration for crystal screens. Upon the addition of 41 helicase to the 59 protein (1:1 ratio), both proteins precipitated at the higher concentrations. This precipitated complex was used in the crystal screens set up at 4 °C and ambient temperature using the commercial/in-house screens PEG Ion/Natrix and Crystal Screen I and II. The majority of the sitting drops were precipitated, with few clear drops.


Different preparation techniques and additives were investigated in order to obtain a soluble 41-59 complex. Precipitate formed when the 59 protein was added to an excess of 41 protein (1:6 ratio). At a 1:1 ratio, precipitation was seen whether the 59 protein was added to the 41 helicase or the 41 helicase was added to the 59 protein. When KVP40 59 was added to a solution of 41 helicase with 10 mM ATP, the complex was initially clear, but a few minutes later it precipitated. The precipitate and supernatant were run on SDS-PAGE, and the supernatant was also checked for protein by UV-Vis. Both proteins were found in the precipitate and the supernatant; therefore, a new complex sample was prepared in order to perform a solubility screen on the precipitate. The solubility screen suggested several reagents that would facilitate the solubilization and stabilization of the complex. The samples from the solubility screen were run on SDS-PAGE to determine whether both proteins or just one protein had been resolubilized by the different buffers and salts (Figure 3.49). Both the 41 helicase and the 59 protein were found in the supernatant of ammonium chloride, sodium chloride, potassium chloride, lithium chloride, sodium sulfate, sodium citrate, sodium PIPES pH 6.0, sodium HEPES pH 7.0, sodium TAPS pH 8.0, and water. At this point, the project was discontinued until the complex could be stabilized, the nuclease activity removed, and the 59 protein further characterized.


[Figure 3.49 gel image; marker labels 55.4 kDa and 21.5 kDa. Lanes: 1, MW ladder; 2, NH4Cl; 3, NaCl; 4, KCl; 5, LiCl; 6, MgCl2; 7, CaCl2; 8, Na formate; 9, Na acetate; 10, Na cacodylate; 11, Na sulfate; 12, Na phosphate; 13, Na citrate; 14, MES pH 5.6; 15, PIPES pH 6.5; 16, HEPES pH 7.5; 17, TAPS pH 8.5; 18, H2O; 19, MW ladder.]

Figure 3.49: SDS-PAGE results of KVP40 41/59 complex solubility screen. The supernatant of the complex in the presence of different salts and buffers shows which conditions were able to solubilize the complex. Both proteins were present in basic buffers and a variety of salts (sodium, ammonium, citrate, potassium, phosphate).

3.3 Bacteriophage RB69 59 results and discussion

3.3.1. Cloning of RB69 59

The RB69 59 gene was isolated from the bacteriophage RB69 genome by PCR using an annealing/melting temperature of 57 °C. Two bands were present on the agarose gel after PCR, the first (~550 bp) corresponding to the 59 gene and the second, lower band corresponding to an oligomer of the primers (Figure 3.50 a). The 59 gene band was purified, inserted into the pENTR-D cloning plasmid, and transformed into the XL10 cloning host. The 3,132 bp isolated vector (2,580 bp pENTR-D + 552 bp 59 gene) was run on an agarose gel, giving a band around 3,100 bp (Figure 3.50 b). The LR clonase reaction was used to transfer the gene from the cloning vector into the expression plasmid (pDEST17). A band around 5,200 bp was seen on the gel, verifying the transfer of the 59 gene into pDEST17 (Figure 3.50 c). Several vectors from each cloning reaction were isolated and verified on agarose gels. The 59 gene was also successfully cloned directly into the expression vector pET101 by an undergraduate student (Hillary Voss).

Two different expression vectors were used because of difficulties that surfaced during protein expression and lysis. The pDEST17 added an N-terminal 6X His tag to the 59 protein, while the pET101 did not add a tag, so only the native protein was expressed. pET101 normally adds a C-terminal 6X His tag to the protein, but the presence of the stop codon prevented the addition of the His tag to the 59 protein. The 59/pDEST17 construct also had a TEV protease cut site engineered in front of the 59 gene, with a linker located between the His tag and the TEV protease site. Both expression clones were sequenced, verifying that the 59 gene sequence was correct and that no mutations had occurred in either construct.


[Figure 3.50 gel images. Panel A: lanes 1-4 (1, 100 bp ladder; 2, 59 #1 PCR; 3, 59 #2 PCR; 4, 100 bp ladder); marker labels 1,000 bp and 500 bp. Panel B: lanes 1-7 (1, λ BSTE ladder; 2, pDrive (3.8 kb); 3, 59 #1 pENTR-D; 4, 59 #2 pENTR-D; 5, 59 #3 pENTR-D; 6, 59 #4 pENTR-D; 7, pET28 (5.5 kb)); marker labels 3,675 bp and 2,323 bp. Panel C: lanes 1-5 (1, supercoiled DNA plasmid ladder; 2, 32 #1 pDEST17; 3, 32 #2 pDEST17; 4, 59 #1 pDEST17; 5, 59 #2 pDEST17); marker labels 6 kb and 5 kb.]

Figure 3.50: Results of cloning RB69 59 gene. A) The 59 gene product from PCR has a band around 550 bp. B) Insertion of the 59 gene into cloning plasmid pENTR-D with a band around 3,100 bp. 59#2 was sequenced properly. The pDrive is a standard cloning plasmid and pET28 is a standard expression plasmid. C) Insertion of the 59 gene into the expression plasmid pDEST17 with a band around 5,200 bp.

3.3.2. RB69 59 expression and purification

The 59/pDEST17 vector was transformed into the E. coli cell line BL21 (DE3) pLysS and checked for expression. The 59 protein was expressed, with a band appearing around 23 kDa; the 59 protein appeared to run smaller than its actual size (26 kDa). The sequence of the 59/pDEST17 vector was correct; therefore, the protein may migrate further on the gel because of the amount of SDS bound to it. A low salt lysis buffer (100 mM NH4Cl) was used to extract the 59 protein from the expression cell pellet, but the protein was not soluble. The insoluble pellet was used in a lysis solubility screen in order to determine the proper conditions to solubilize the protein. When 1 M salts (NH4Cl or NaCl), detergent (Bug Buster), or buffer (Tris pH 8.0) were added to the insoluble material, the protein remained insoluble. The RB69 59 protein also remained insoluble when it was expressed at 16 °C and the lysis study was performed.

Comparing the 0 hr expression sample (before IPTG induction) to the 3 hr sample, a small band of the 59 protein can be seen before induction, suggesting that the pDEST17 has a leaky promoter. In fact, there is no lac control for the pDEST17 plasmid; therefore, the leaky expression of a potentially toxic protein can have several effects on the bacterial host. Production of a toxic protein can kill the host, prevent further protein production, or cause the protein to form inclusion bodies. On examination, the insoluble material in the lysis pellet was a white, opalescent, bead-like material resembling inclusion bodies. Extraction of the RB69 59 protein from the inclusion bodies was unsuccessful. The RB69 59 construct has an N-terminal 6X His tag, which might interfere with the folding of the protein. The N-terminus of the T4 59 protein is located within the protein once folded, so perhaps the His tag prevents the 59 protein from folding properly, thus making the protein insoluble. Therefore, the 59/pET101 construct was studied, since the pET101 is under lac control and there is no additional tag attached to the protein.


The 59 protein was expressed in the BL21 (DE3) Star cell line, with a typical 6 L expression producing a 9-10 g cell pellet. The cells were lysed in a low salt lysis buffer containing 25 mM Tris pH 7.5, 150 mM NaCl, 10 mM MgCl2, 2 mM EDTA, 5 % sucrose, and 0.3 % PEI. With a theoretical pI of 9.15, the 59 protein was purified using cation-exchange columns (Figure 3.51).

[Flow chart: Purification Scheme of RB69 59 Protein. 59, SP Sepharose, Poros PE, Poros HS.]

Figure 3.51: Purification scheme for the RB69 59 protein. Poros PE removes endogenous nucleases; SP Sepharose and Poros HS remove contaminating proteins. The 59 protein does not stick to the Poros PE and is fairly pure after the Poros HS run.

Lysate containing the 59 protein was prepared for purification by decreasing the conductivity of the sample below the conductivity of buffer A (buffer A: 25 mM Bis Tris pH 6.5, 50 mM NH4Cl, 2 mM MgCl2, 2 mM BME). The SP Sepharose (SP) was equilibrated with buffer A and lysate containing the 59 protein was loaded onto the column. The 59 protein eluted from the SP (~450 mM) in a broad peak during a 10 CV linear gradient run (buffer B: 25 mM Bis Tris pH 6.5, 1 M NH4Cl, 10 mM MgCl2). The SP fractions containing the 59 protein were combined and the conductivity of the sample was increased (by addition of 4 M NH4Cl) to match the conductivity of buffer B. The 59 protein was run over the PE column, where the nucleases bound to the resin, while the 59 protein was located in the effluent. The conductivity of the 59 protein PE effluent was decreased below the conductivity of HS buffer A through dialysis (25 mM Bis Tris pH 6.5, 40 mM NH4Cl, 10 mM MgCl2, 2 mM BME). The 59 protein eluted from the HS (~250 mM) in a narrow peak during a 20 CV linear gradient. The 59 protein was fairly pure (> 95 %, assessed with SDS-PAGE) after the Poros HS run (Figure 3.52). Purification of the 6 L expression pellet of 59 protein produced around 50-60 mg of protein.

The 59 protein began to precipitate as the concentration increased, especially when the concentrated protein was stored overnight at 4 °C. This precipitated protein was used in a solubility screen; the RB69 59 protein prefers buffers at pH 5, 7, and 8 and several salts (ammonium, lithium, magnesium, and calcium chlorides, and sodium formate, cacodylate, sulfate, and citrate) (Figure 3.53). Based on the solubility screen results, new purification buffers were designed for optimal protein solubility (25 mM Tris pH 7.5, 50 mM-1 M NH4Cl, 5 mM MgCl2) in place of the original purification buffers (25 mM Bis Tris pH 6.5, 100 mM-1 M NaCl, 10 mM MgCl2). The protein eluted from both the SP Sepharose and Poros HS columns at around the same conductivity ranges, with purity similar to the previous purification runs. The 59 protein eluted from the SP Sepharose in fractions 9-22 (~450 mM NH4Cl) and from the Poros HS in fractions 9-15 (~300 mM NH4Cl) (Figure A8).
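Raising the salt of the pooled SP fractions with 4 M NH4Cl can be estimated from a simple mass balance before the conductivity is checked. The sketch below is only an illustration under stated assumptions: ideal mixing, a pool at roughly the ~450 mM elution concentration, and a hypothetical 30 mL pool volume that is not taken from the purification records. In the work described above, the end point was set by matching the measured conductivity to buffer B, not by calculation.

```python
def stock_volume_to_add(sample_ml, sample_mm, stock_mm, target_mm):
    """Return the volume (mL) of concentrated stock needed to reach target_mm.

    Derived from the mass balance
        sample_ml * sample_mm + v * stock_mm = (sample_ml + v) * target_mm
    and assumes ideal mixing (volumes are additive).
    """
    if not sample_mm <= target_mm < stock_mm:
        raise ValueError("need sample_mm <= target_mm < stock_mm")
    return sample_ml * (target_mm - sample_mm) / (stock_mm - target_mm)


# Hypothetical pool: 30 mL at ~450 mM NH4Cl, raised to 1 M (buffer B)
# using the 4 M NH4Cl stock.
pool_ml = 30.0
added_ml = stock_volume_to_add(pool_ml, 450.0, 4000.0, 1000.0)
final_mm = (pool_ml * 450.0 + added_ml * 4000.0) / (pool_ml + added_ml)
print(f"Add about {added_ml:.1f} mL of stock; final salt is about {final_mm:.0f} mM")
```

A calculation of this kind gives only a starting volume; the final adjustment would still be made against the conductivity meter.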


[Figure 3.52 gel image; marker labels 31.0 kDa and 21.5 kDa. Lanes: 1, Fraction 14; 2, Fraction 15; 3, Fraction 16; 4, Fraction 17; 5, Fraction 18; 6, MW ladder.]

Figure 3.52: SDS-PAGE of RB69 59 protein from HS. The RB69 59 protein was fairly pure after the Poros HS column run, with a few faint contaminant bands towards the middle and end of the peak. The protein band ran at the same molecular weight as the T4 59 protein (around 26 kDa, although the band appears only slightly above the 21.5 kDa standard band).

[Bar chart: Solubility Screen of RB69 59. Category axis (Salts and Buffers): H2O, Na TAPS, Na HEPES, Na PIPES, Na MES, Na citrate, Na phosphate, Na sulfate, Na cacodylate, Na acetate, Na formate, CaCl2, MgCl2, LiCl, KCl, NaCl, NH4Cl. Value axis: ABS (595 nm), 0 to 0.35. Bar labels in the order extracted: 0.22794, 0.21793, 0.27389, 0.1685, 0.28605, 0.322277, 0.16122, 0.2609, 0.26578, 0.18582, 0.25307, 0.24509, 0.19, 0.26972, 0.18524, 0.0084521, 0.17635.]

Figure 3.53: Solubility screen of RB69 59. The precipitated protein was incubated with individual salts and buffers, and the absorbance of the supernatant was measured at 595 nm to detect soluble protein (Bradford reagent). The helicase assembly protein was most soluble in a buffer of pH 5 or 7, with sodium citrate or cacodylate and ammonium, lithium, or calcium chlorides.

3.3.3 RB69 59 crystallographic studies

The 59 protein was screened with a variety of crystallization screens using two different buffers. The first buffer (1) contained 25 mM Tris pH 7.0, 300 mM NH4Cl, 25 mM sodium tri-citrate, 5 mM MgCl2, and 2 mM BME, while the second buffer (2) contained 25 mM Tris pH 7.5, 250 mM NH4Cl, 10 mM MgCl2, and 2 mM BME (Table 3.17). The protein was screened at several temperatures, protein concentrations, and screen concentrations. The protein screens were set up using the Honeybee robotics system and some of the later trays were scored with the imager.

Table 3.17: Commercial and in house crystallization screens used for crystallization studies of the RB69 59 protein. The temperature denotes the tray storage temperature. The drop size consists of protein: screen condition. # of setup specifies the number of times the complex was screened against that particular crystallization screen. The (number) after the protein refers to the tray number.

| Protein | Temperature (°C) | Drop size (µL) | Crystallization Screen | # of setup |
|---|---|---|---|---|
| RB69 59 (1) | 4 | 1 + 1 | PEG Ion/Natrix | 1 |
| RB69 59 (2) | 4 | 1 + 1 | Crystal Screen I and II | 1 |
| RB69 59 (3) | 4 | 1 + 1 | Additive | 1 |
| RB69 59 (4) | Ambient | 1 + 1 | PEG Ion/Natrix | 1 |
| RB69 59 (5) | Ambient | 1 + 1 | Cryo I and II | 1 |
| RB69 59 (6) | Ambient | 1 + 1 | Wizard I and II | 1 |
| RB69 59 (7) | Ambient | 1 + 1 | Crystal Screen I and II | 1 |
| RB69 59 (8) | Ambient | 1 + 1 | Additive | 1 |
| RB69 59 (9) | 4 | 1 + 1 | PEG Ion/Natrix (50 µL screen + 50 µL water) | 1 |
| RB69 59 (10) | 4 | 1 + 1 | Crystal Screen I and II (50 µL screen + 50 µL water) | 1 |
| RB69 59 (11) | 4 | 1 + 1 | Cryo I and II (50 µL screen + 50 µL water) | 1 |

The 59 protein was screened at a concentration of 12.46 mg/mL in trays 1-3, 7.82 mg/mL in trays 4-8, and 9.75 mg/mL in trays 9-11. Protein buffer 1 was used in trays 1-8, while protein buffer 2 was used in trays 9-11. The sitting drops in all trays were predominantly dark precipitates and crystalline precipitates, regardless of protein concentration, screen concentration, or temperature. Crystals were found in trays 5-7, although the crystals were salt (Table 3.18).

Table 3.18: Crystal hits of RB69 59 protein. Crystal hits of RB69 59, including commercial screen and crystallization conditions. Several conditions were stained with the Izit dye (blue drops) and the crystals did not absorb the stain. The crystals were in high salt conditions and were determined to be salt in all cases.

| # | Protein | Screen | Crystallization Condition |
|---|---|---|---|
| 1 | RB69 59 | Cryo I (25) | 30 % (w/v) PEG 200; 0.1 M Tris pH 8.5; 0.2 M (NH4)2HPO4 |
| 2 | RB69 59 | Wizard I (9) | 1 M (NH4)2HPO4; 0.1 M acetate pH 4.6 |
| 3 | RB69 59 | Wizard II (46) | 1 M (NH4)2HPO4; 0.1 M imidazole pH 8.0; 0.2 M NaCl |
| 4 | RB69 59 | Crystal Screen II (43) | 50 % (v/v) 2-methyl-2,4-pentanediol; 0.1 M Tris pH 8.5; 0.2 M (NH4)H2PO4 |

[Picture column: crystal images not reproduced.]

Since the majority of the crystal screens resulted in precipitated drops or salt crystals, there were two pathways to choose from: 1) further optimize the protein buffer and the concentration of the protein in the screen, or 2) clone the 32 protein to stabilize the 59 protein. Another route would be to investigate the protein-DNA interactions between the RB69 59 protein and fork DNA.

3.3.4 RB69 59+32-B crystallographic studies

The 59-32-B complex was screened against two crystallization screens, PEG Ion/Natrix and Crystal Screen I and II. The complex buffer contained 25 mM Tris pH 7.5, 200 mM NH4Cl, 2 mM MgCl2, and 2 mM BME; the concentration of 59 was 2.1 mg/mL and that of 32-B was 1.8 mg/mL. The low concentration of the complex was due to the limited availability of the 32-B (prepared by Lindsey Easthon). The trays were set up by hand with 1 µL + 1 µL drops and were stored at room temperature. The PEG Ion/Natrix screen produced mainly precipitated and lightly precipitated drops, while the Crystal Screen I and II drops were a mixture of clear, crystalline precipitate, precipitate, and dark precipitate. Several protein complex hits were obtained in the Crystal Screen I and II conditions (Table 3.19). Out of eleven hits, only six did not have the same type of crystal in the well solution. The Izit dye was added to all eleven hits and not one crystal absorbed the dye.


Table 3.19: Crystal hits of RB69 59 + 32-B complex. Crystal hits of the 59+32-B complex from commercial screens. Several conditions were stained with the Izit dye (blue drops) and the crystals did not absorb the stain. Some pictures were taken with a polarizer.

| # | Protein | Screen | Crystallization Condition |
|---|---|---|---|
| 1 | 59+32B | Crystal Screen I (1) | 30 % v/v 2-methyl-2,4-pentanediol; 0.1 M Na acetate pH 4.6; 0.02 M CaCl2 |
| 2 | 59+32B | Crystal Screen I (6) | 30 % w/v PEG 4,000; 0.1 M Tris pH 8.5; 0.2 M MgCl2 |
| 3 | 59+32B | Crystal Screen I (18) | 20 % w/v PEG 8,000; 0.1 M Na cacodylate pH 6.5; 0.2 M Mg acetate |
| 4 | 59+32B | Crystal Screen I (22) | 30 % w/v PEG 4,000; 0.1 M Tris pH 8.5; 0.2 M Na acetate |
| 5 | 59+32B | Crystal Screen I (24) | 20 % v/v 2-propanol; 0.1 M Na acetate pH 4.6; 0.1 M CaCl2 |
| 6 | 59+32B | Crystal Screen II (46) | 30 % w/v PEG monomethylether 550; 0.1 M Bicine pH 9.0; 0.1 M NaCl |

[Picture column: crystal images not reproduced.]

Chapter 4: Aeropyrum pernix replication and repair proteins

Most of the research on PCNA subunit interactions to date has been on archaeal SSO and human PCNA. The main goal of this project was to study the protein-protein interactions between the ApePCNA subunits and between the subunits and each of the proteins involved in lagging strand maturation (DNA polymerase, DNA ligase, and FEN-1). In homotrimeric PCNA, the subunits are identical; therefore, each replication protein is capable of interacting with any of the subunits. It is not known whether three ligases or three FEN-1 will interact with the homotrimer simultaneously in vivo, but crystal structures show that the replication proteins/peptides are bound to all three subunits.79, 84 The heterotrimeric PCNAs have three different subunits (1, 2, and 3), and early work has shown that each subunit specifically interacts with a particular protein, i.e. SSO PCNA1-FEN-1, PCNA2-DNA polymerase, and PCNA3-DNA ligase.86, 110 Recent data confirm that the different PCNA subunits bind preferentially to specific proteins, but that they also have a slightly weaker affinity for all the replication proteins.74

Interactions between the heterotrimeric subunits, and between the subunits and the replication proteins, have been studied in two archaeal species: initially in Sulfolobus solfataricus (Sso) and very recently in Aeropyrum pernix (Ape). Both studies have shown that PCNA can form heterotrimers, homotrimers, dimers, and monomers.74, 110 In late 2007, and in direct competition with the work reported here, Imamura and co-workers determined the intersubunit binding affinity (8.3 nM-974 nM) for the A. pernix PCNA and the affinity of the subunits for replication proteins, which ranged from high to moderate (0.14 µM-463 µM).74

The PCNA subunits Ape0162, Ape0441, and Ape2182, the Ape DNA ligase, and the Ape DNA polymerase were cloned, expressed, and purified. The PCNA subunits were characterized using DLS, DSC, and crystallography, while the protein-protein interactions between the PCNA subunits were characterized with mass spectrometry. Subsequent protein-protein interactions between the PCNA subunits and other replication proteins were also characterized with mass spectrometry. Ape0162 was thermostable to ~98 °C and Ape2182 was thermostable to ~87 °C, while Ape0441 had two transitions, at 46 °C and 112 °C. The estimated molecular weight suggested that each subunit existed as a monomer or dimer, depending on temperature.

4.1 PCNA results and discussion

4.1.1 Ape0162

The gene for Ape0162 was originally synthesized and cloned into pDrive by a former lab technician (Pooja Talaty). This project began with the restriction of the Ape0162 gene from the pDrive vector with Nde I and Hind III, in order to isolate and verify the presence of the Ape0162 gene (~789 bp) (Figure 4.1a). The gene was then inserted into pET28, resulting in a band around 6,150 bp; a linear DNA ladder (BSTE digested λ ladder) was run with the sample to get an estimate of the size of the vector (Figure 4.1b). The construct was sequenced, verifying the proper insertion of Ape0162 into pET28 and the addition of an N-terminal hexahistidine tag.

[Figure 4.1 gel images (panels A and B): marker bands labeled 8,454 bp, 1,000 bp, and 500 bp; Ape0162 bands labeled 789 bp (insert) and 6,150 bp (construct).]

Figure 4.1: Cloning of Ape0162 into pET28. A) Restriction of Ape0162 from pDrive using Nde I and Hind III, resulting in the correct gene size ~789 bp. Lane 1: 100 bp DNA ladder; Lane 2: BSTE digested λ ladder (linear); Lane 3: Ape0162 (Nde I and Hind III); Lane 4: Ape0162 (Nde I); Lane 5: Ape0162 (Hind III); Lane 6: Ape0162 (no enzymes). B) Insertion of Ape0162 into pET28 resulting in a construct size ~6,150 bp. Lanes 1 and 4: BSTE digested λ ladder (linear); Lanes 2 and 3: Ape0162 in pET28.

The Ape0162/pET28 construct was transformed into BL21 DE3 RIL, and expression of recombinant His-Ape0162 was evident from a band on the SDS-PAGE around 31.2 kDa (Figure 4.2). The molecular weight of Ape0162 is 29,443.8 Da and the presence of the hexahistidine tag increases the molecular weight to 31,244.7 Da; thus the band around 31.2 kDa on the SDS-PAGE is most likely His-Ape0162. The cell pellet from the protein expression was lysed with a standard low salt lysis buffer (100 mM Tris HCl pH 8.0, 10 mM EDTA, 150 mM NH4Cl, 10 % sucrose, 0.3 % PEI, 5 mM DTT), and the protein was present in both the pellet and the lysate. The minimal quantity present in the lysate was not enough to proceed, and improvements in extraction were therefore attempted. The pellet was divided and resuspended with either 1 M NaCl or Bug Buster (detergent). After a ½ hour incubation at room temperature, each reagent was able to solubilize a small fraction of the protein. A sample from the low salt lysis suspension was heated to 80 °C for 24 hrs and the protein remained insoluble.

[Figure 4.2 gel image: lanes 1-5; Ape0162 band marked; MW markers labeled 36.5 kDa and 31 kDa.]

Figure 4.2: SDS-PAGE of Ape0162/pET28 expression and lysis. Ape0162 was expressed in cell line BL-21 DE3 RIL and a low salt lysis buffer was used to isolate the protein in the lysate. Ape0162 was found in the pellet and lysate after lysis. Lane 1: Ape0162 RIL 0 hr expression; lane 2: Ape0162 RIL 3 hr expression; lane 3: Ape0162 lysis pellet; lane 4: Ape0162 lysate; lane 5: MW ladder.

In order to obtain a larger quantity of soluble protein, different cell lines, expression temperatures, concentrations of IPTG, and lysis buffers/additives were investigated. First, His-Ape0162 was expressed in cell lines BL21 DE3 RILP and BL21 DE3 ROS. His-Ape0162 was completely insoluble when expressed in the ROS cell line; the white lysis pellet after detergent extraction suggested that the protein was in inclusion bodies. Several denaturing agents, including urea (4-6.6 M) and guanidine HCl (2.5-4.2 M), were added to the inclusion body samples (3 mL volume) and incubated at either room temperature or 90 °C. The sample pellets were solubilized when incubated at 90 °C, while the samples at room temperature remained precipitated. After the removal of the denaturing agents (either by rapid dialysis or step-wise dialysis), the protein began to precipitate out of solution, with only a small amount of protein remaining in solution (~ 1 mg).

The unfolding and refolding of His-Ape0162 from the inclusion bodies was minimally successful because the majority of the protein was lost in the process. Therefore, other cell lines and experimental conditions were investigated in order to increase the yield of soluble protein. Induction of His-Ape0162 in ROS cells (a “Tuner” cell line sensitive to IPTG concentrations) at 20 °C with several concentrations of IPTG (1 mM, 0.5 mM, 0.3 mM, 0.1 mM, and 0.05 mM) produced about the same quantity of protein. The protein was found in the pellet after lysis (low salt lysis buffer, 100 mM NaCl) and His-Ape0162 could not be extracted from the pellet using high salt (1 M NaCl) or detergent (Bug Buster). His-Ape0162 was expressed in RILP cells, but again only a minimal amount of protein was extracted from the pellet with 1 M NaCl.

The Ape0162/pET28 construct was transformed into the BL21 DE3 ROS2 pLysS cell line and expression was observed. The expression of His-Ape0162 was monitored after induction with 1 mM IPTG; the expression levels after 3 hrs of incubation and after 21 hrs were the same, thus an incubation period of three hours is sufficient for the production of His-Ape0162 (Figure 4.3a). This time a new no salt lysis buffer (50 mM Tris pH 8.0, 1 mM EDTA, 10 % glycerol, 1 mM BME) was used (Figure 4.3 b), and the protein was soluble in quantities suitable for purification. The soluble protein in the lysate fraction was heated to 75 °C for one hour and the protein remained soluble (Figure 4.3 c). Benzonase was added to a soluble fraction of His-Ape0162 in order to remove any contaminating DNA or RNA, and the protein remained in solution. Addition of PEI to the His-Ape0162 after the heating step irreversibly precipitated the protein. This suggests that the presence of PEI in the lysis buffer of all previous experiments might be part of the reason why His-Ape0162 was insoluble.

[Figure 4.3 gel images (panels A, B, and C): Ape0162 band marked at 31.2 kDa; MW markers labeled 36.5 kDa and 31.0 kDa.]

Figure 4.3: Expression and lysis of Ape0162/pET28 in BL-21 DE3 ROS2 pLys S. A). Ape0162 is expressed in cell line BL-21 DE3 ROS2 pLys S with a band appearing around 31.2 kDa. Lane 1: Ape0162 ROS2 0 hr expression; lane 2: Ape0162 ROS2 3 hr expression; lanes 3 and 4: Ape0162 ROS2 21 hr expression; lane 5: MW ladder. B). The protein is isolated in the lysate after the cells are treated with a no salt lysis buffer. Lane 1: Ape0162 lysis pellet; lane 2: Ape0162 lysate. C). Ape0162 is soluble after heating the protein at 75 °C for one hour. Lane 1: Ape0162 lysate; lane 2: Ape0162 heated pellet; lane 3: Ape0162 heated supernatant; lane 4: MW ladder.


For unknown reasons, the affinity column did not extract the His-tagged protein from crude lysates; therefore standard IEC was performed. The Ape0162 protein is rather acidic, with a theoretical pI around 4.8; thus anion exchange columns were used to purify the protein. The Q Sepharose column (QS) was equilibrated with buffer A (25 mM Tris pH 8.0, 2 mM EDTA, 1 % glycerol, 50 mM NaCl) and His-Ape0162 was loaded onto the column. The conductivity of the lysate containing the protein did not need to be decreased, since there was no salt in the lysis buffer. His-Ape0162 eluted from the QS in a broad peak (~450 mM) during a 10 CV linear gradient (buffer B: 25 mM Tris pH 8.0, 2 mM EDTA, 1 % glycerol, 1 M NaCl). Fractions of His-Ape0162 were pooled and dialyzed (25 mM Tris pH 8.0, 2 mM EDTA, 1 % glycerol) overnight at room temperature to decrease the conductivity below that of buffer A. His-Ape0162 eluted from the Poros HQ column (HQ) in a broad peak (~ 300 mM) during a 10 CV linear gradient. The His-Ape0162 was relatively pure (> 95 % by inspection of SDS-PAGE) after the HQ run. During some protein purifications, two bands appeared on the SDS-PAGE for Ape0162. Assuming both bands are Ape0162, the protein was likely degraded (Figure 4.4).
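As a rough aid for reading the elution values quoted above, the salt concentration delivered at a given point in a linear gradient can be estimated from the buffer A and buffer B concentrations. The sketch below assumes an ideal linear gradient with no system delay volume; the function names are illustrative and are not part of the original work.

```python
def gradient_concentration(cv_elapsed, total_cv, conc_a_mm, conc_b_mm):
    """Salt concentration (mM) after cv_elapsed column volumes of a linear A-to-B gradient."""
    if not 0 <= cv_elapsed <= total_cv:
        raise ValueError("cv_elapsed must lie within the gradient")
    fraction_b = cv_elapsed / total_cv
    return conc_a_mm + fraction_b * (conc_b_mm - conc_a_mm)


def cv_at_concentration(target_mm, total_cv, conc_a_mm, conc_b_mm):
    """Column volumes into the gradient at which target_mm is reached."""
    if not min(conc_a_mm, conc_b_mm) <= target_mm <= max(conc_a_mm, conc_b_mm):
        raise ValueError("target concentration lies outside the gradient")
    return total_cv * (target_mm - conc_a_mm) / (conc_b_mm - conc_a_mm)


# Values quoted in the text: 10 CV gradient, buffer A 50 mM NaCl, buffer B 1 M NaCl.
print(gradient_concentration(5, 10, 50, 1000))            # midpoint of the gradient
print(round(cv_at_concentration(450, 10, 50, 1000), 2))   # point where ~450 mM is reached
```

In practice the measured conductivity trace, not the programmed gradient, is the better guide to where a protein elutes.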

[Figure 4.4 gel images (panels A and B): lanes 1-4 in each panel; MW markers labeled 36.5 kDa and 31 kDa.]

Figure 4.4: SDS-PAGE of Ape0162 Q Sepharose and Poros HQ purifications. A). Q Sepharose run of Ape0162 where Ape0162 eluted from the column in 500-700 mM NaCl. Lane 1: MW ladder; lane 2: Ape0162 Q Sepharose fraction 48; lane 3: Ape0162 Q Sepharose fraction 54; lane 4: Ape0162 Q Sepharose fraction 64. B). Poros HQ runs of Ape0162 where Ape0162 eluted from the column at 190-450 mM NaCl and 670-730 mM NaCl. Lane 1: Ape0162 Poros HQ (run 1) fraction 13; lane 2: Ape0162 Poros HQ (run 1) fraction 18; lane 3: Ape0162 Poros HQ (run 2) fraction 7; lane 4: MW ladder. The red arrow highlights Ape0162 and the set of two bands present at the molecular weight of Ape0162.

Upon checking the concentration of the protein, the absorbance reading at 260 nm (A260) was greater than the reading at 280 nm (A280). The A260/A280 ratio is ~0.6-0.7 for protein and above 1.5 for DNA. The A260/A280 ratio for His-Ape0162 was 1.8, suggesting that perhaps there was a cofactor (e.g., ATP, GTP, NAD, FAD, etc.) or RNA/DNA interacting with Ape0162. Several different conditions were tested to determine if the contaminant could be removed from the protein. NaCl (1 M) and 0.4 % PEI were added to Ape0162 to attempt to compete away the contaminant DNA or cofactor. After addition of the salt, Ape0162 was still in solution, but as PEI was added, the protein precipitated. A sample of the Ape0162 was run over the HA column to facilitate removal of any DNA. Ape0162 eluted from the column in a broad peak (~500 mM) during a 10 CV linear gradient (buffer A: 25 mM Tris pH 7.5, 50 mM NaCl; buffer B: 25 mM Tris pH 7.5, 1 M (NH4)2SO4), but all samples still had a high A260/A280 ratio. An ammonium sulfate precipitation at several different salt concentrations (1.5-3.5 M (NH4)2SO4) was performed, with concentrations over 2 M salt precipitating His-Ape0162. There were difficulties re-solubilizing the precipitated His-Ape0162, and the remaining soluble protein sample still had an absorbance that was greater at 260 nm. It was noted that samples that were not heat-treated did not have the contaminant; thus the contaminant-protein interaction was irreversible after the heating step. In addition, the PEI results suggested that DNA might be bound to the PCNA subunit, causing the protein to co-precipitate rather than separate from DNA in the presence of PEI. The best diagnosis so far is that the premature heating of the protein before purification allows the PCNA to interact irreversibly with a small strand of DNA, thus increasing the absorbance of the sample at 260 nm. The next step was to perform a large scale protein lysis without heating the lysate.

Unheated His-Ape0162 lysate was purified using the Q Sepharose and Poros HQ. The QS was equilibrated with buffer A (25 mM Tris pH 8.0, 2 mM EDTA, 1 % glycerol, 50 mM NaCl) and the lysate containing His-Ape0162 was loaded onto the QS. His-Ape0162 eluted from the column in a narrow peak (~450 mM) during a 10 CV linear gradient (buffer B: 25 mM Tris pH 8.0, 2 mM EDTA, 1 % glycerol, 1 M NaCl) (Figure 4.5 a). The QS fractions containing His-Ape0162 were pooled together and dialyzed (25 mM Tris pH 8.0, 2 mM EDTA, 1 % glycerol) to decrease the conductivity below the conductivity of buffer A. The protein was loaded onto the HQ and His-Ape0162 was eluted from the column in a narrow peak (~ 350 mM) during a 10 CV linear gradient (Figure 4.5 b). After the HQ run, contaminants were still present and the absorbance at 260 nm was equal to the absorbance at 280 nm. The pooled His-Ape0162 fractions from the HQ were heated at 75 °C for one hour, and the His-Ape0162 remained soluble (Figure 4.5 c). After heating, the A260/A280 ratio was 0.6, indicating that the strongly absorbing contaminant was no longer present. After the purification and heating steps, His-Ape0162 was fairly pure (>95 % as assessed from SDS-PAGE).
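The A260/A280 reasoning used throughout this section can be written out as a small check. The sketch below is illustrative only: the cutoffs are the approximate values quoted in the text (about 0.6-0.7 for protein, above 1.5 for DNA), and the function names and the "mixed or uncertain" label are assumptions introduced here.

```python
def a260_a280_ratio(a260, a280):
    """Ratio of absorbance at 260 nm to absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def interpret_ratio(ratio, protein_max=0.7, nucleic_min=1.5):
    """Coarse interpretation using the approximate cutoffs quoted in the text."""
    if ratio <= protein_max:
        return "protein-like"
    if ratio >= nucleic_min:
        return "nucleic acid-like"
    return "mixed or uncertain"


# Ratios reported in this section for different His-Ape0162 preparations.
for ratio in (1.8, 1.0, 0.6):
    print(ratio, interpret_ratio(ratio))
```

A ratio between the two cutoffs, such as the value of about 1 seen after the HQ run, is consistent with a protein sample that still carries some strongly 260 nm-absorbing material, but the ratio alone cannot identify the contaminant.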

[Figure 4.5 gel images (panels A, B, and C): Ape0162 band marked at 31.2 kDa; MW markers labeled 36.5 kDa and 31.0 kDa.]

Figure 4.5: SDS-PAGE of modified purification of Ape0162. A) Ape0162 Q Sepharose purification run with Ape0162 eluting from the column in fractions 24-25. Lane 1: Ape0162 Q Sepharose fraction 16; lane 2: Ape0162 Q Sepharose fraction 21; lane 3: Ape0162 Q Sepharose fraction 23; lane 4: Ape0162 Q Sepharose fraction 25; lane 5: Ape0162 Q Sepharose fraction 28; lane 6: MW ladder. B) Ape0162 Poros HQ purification run with Ape0162 eluting from the column in fractions 9-12. Lane 1: MW ladder; lane 2: Ape0162 Poros HQ fraction 7; lane 3: Ape0162 Poros HQ fraction 8; lane 4: Ape0162 Poros HQ fraction 9; lane 5: Ape0162 Poros HQ fraction 10; lane 6: Ape0162 Poros HQ fraction 12. C) Ape0162 pooled fractions (9-12) from Poros HQ were heated at 75 °C and Ape0162 remained soluble. Lane 1: MW ladder; lane 2: heated precipitate; lane 3: heated soluble Ape0162.


After purification, the subunit-subunit interactions were to be investigated, and the presence of the N-terminal His tag would facilitate the isolation of the heterotrimer. Purified His-Ape0162 (0.15 mg/mL) was run over a Co2+ Talon affinity column, as a control, to determine where His-Ape0162 would elute from the column. His-Ape0162 did not adhere to the column, and the protein was located in the load and rinse fractions. A second run was attempted (Ape0162 at 2.39 mg/mL), but the His-Ape0162 protein again did not bind to the Talon column. Perhaps the His tag was not accessible to the Co2+ resin, thus preventing the protein from binding to the resin. Since the protein has no affinity for the column, it is not advantageous to keep the tag on His-Ape0162; therefore the Ape0162 gene was recloned to remove the N-terminal hexahistidine tag. The Ape0162 gene was amplified from the A. pernix genome, while adding a CACC site onto the 5’ region of the gene (forward primer: 5’ CACC ATG TCC TCT GAG GCC ACC CTA 3’; reverse primer: 5’ GAA TCC TTA CTC GAT CTT CGG CGA TAC ATA GAA 3’; Tm 60 °C) (Figure 4.6 a). Several annealing temperatures (48, 55, 58 °C) were tried until the reaction worked at 60 °C; the concentration of salt present in the reaction buffer required the increase in the Tm. The PCR product (789 bp) was inserted into pET101, resulting in two constructs around 6,542 bp (Figure 4.6 b). The Ape0162/pET101 construct was designed to have the stop codon before the C-terminal hexahistidine tag, to prevent the addition of the tag. This plasmid was sequenced, indicating a correct construct, and tested for protein expression.
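For orientation, a primer melting temperature can be roughly estimated from base composition alone. The sketch below uses a common approximate formula for oligonucleotides longer than about 13 nt, Tm = 64.9 + 41 (G + C - 16.4) / N; it ignores salt and primer concentration, and it is not necessarily how the Tm quoted above was obtained. The function name is illustrative.

```python
def basic_tm(primer):
    """Approximate Tm (deg C) from base composition; intended for oligos longer than ~13 nt."""
    seq = "".join(primer.upper().split())
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


# Primer sequences as printed in the text (spaces are ignored).
forward = "CACC ATG TCC TCT GAG GCC ACC CTA"
reverse = "GAA TCC TTA CTC GAT CTT CGG CGA TAC ATA GAA"

for name, primer in (("forward", forward), ("reverse", reverse)):
    print(name, round(basic_tm(primer), 1))
```

Estimates of this kind are only a starting point; as the text notes, the working annealing temperature was found empirically and depends on the salt in the reaction buffer.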

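[Figure 4.6 gel images (panels A and B): Ape0162 PCR product labeled 789 bp with 1,000 bp and 500 bp markers; Ape0162/pET101 construct labeled 6,542 bp with 7 kb and 5 kb markers.]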

Figure 4.6: Cloning of Ape0162 into pET101. A). PCR product of Ape0162 with a band around 789 bp. Lane 1: 100 bp ladder; lane 2: Ape0162 PCR product. B) Insertion of Ape0162 into pET101 with a band around 6500 bp. Lane 1: supercoiled DNA plasmid ladder; lanes 2 and 3: Ape0162/pET101 constructs.

This construct was transformed into several different cell types of BL21 DE3 (RIL, RILP, ROS2) and T7 express Lac Iq to test for protein expression. Each of the cell lines was tested at least two to three times, with each attempt resulting in no protein expression. Protein expression was eventually achieved with BL-21 DE3 RIL, and Ape0162 was soluble after a no salt lysis (25 mM Tris pH 8.0, 5 mM EDTA, 5 % glycerol, 1 mM BME) (Figure 4.7). A band was present around 29.4 kDa on the gel, running lower than that of the previous construct (pET28), which was reasonable given the removal of the hexahistidine tag. The protein appeared to be running slightly lower than expected, but the amount of SDS bound to the protein could be affecting the distance traveled by the denatured protein.

[Figure 4.7 gel image: lanes 1-4; Ape0162 band marked at 29.4 kDa; MW markers labeled 31.0 kDa and 21.0 kDa.]

Figure 4.7: SDS-PAGE of Ape0162/pET101 protein expression and lysis. Expression of Ape0162 from the pET101 construct in the BL-21 DE3 RIL cell line produced a band around 29.4 kDa. The protein was soluble in a no salt buffer. Lane 1: MW ladder; lane 2: Ape0162 RIL 0 hr expression; lane 3: Ape0162 RIL 3 hr expression; lane 4: Ape0162 lysate.

The same purification scheme was utilized to purify Ape0162 (Q Sepharose and Poros HQ), but the lysate was not heated. Both columns were equilibrated with buffer A (25 mM Tris pH 8.0, 2 mM EDTA, 2 M NaCl) and the conductivity of the sample before each column run was reduced below the conductivity of buffer A. Ape0162 was found mostly in the effluent, and also eluted (~ 75 mM) during the 10 CV linear gradient (buffer B: 25 mM Tris pH 8.0, 2 mM EDTA, 2 M NaCl) (Figure 4.8). These QS fractions were fairly pure (> 95 %, assessed by SDS-PAGE), and the protein was concentrated with no evidence of an additional contaminant.

[Figure 4.8 gel image: lanes 1-3; Ape0162 band marked at 29.4 kDa; MW markers labeled 31.0 kDa and 21.5 kDa.]

Figure 4.8: SDS-PAGE of purification of Ape0162. Ape0162 (without the hexahistidine tag) was purified using the Q Sepharose column. Ape0162 was isolated in the FT and fractions 1-10 of the purification run. Both samples were fairly pure. Lane 1: Ape0162 Q Sepharose FT; lane 2: Ape0162 Q Sepharose pooled fractions 1-10; lane 3: MW ladder.

Ape0162 was heated at 75 °C to remove any remaining contaminants, and upon heating the protein precipitated. Because this Ape0162 was not thermostable and purified differently than the His-Ape0162 protein, a sample was examined using mass spectrometry (MS) to verify the identity of the protein. The MS results for the intact mass (MS) and tryptic digest analysis (MS and MS/MS) experiments indicated that the protein was actually beta-lactamase, a protein associated with antibiotic resistance in bacteria (Figure 4.9). After a tryptic digest analysis of the protein, there was a 78 % sequence similarity between the sample sequence and the known sequence for beta-lactamase. The beta-lactamase gene was located on the pET101 plasmid for antibiotic resistance and was being transcribed, due to read-through transcription, instead of the expected Ape0162. Both proteins are of similar molecular weights, but the key differences are that Ape0162 is heat stable and has affinity for the anion exchange columns, unlike beta-lactamase.

[Figure 4.9 spectra: m/z spectrum with labeled charge states between [M+20H]20+ and [M+34H]34+; inset deconvoluted spectrum (Mass, Da) with peaks labeled 28,885 and 28,904 and oxidation peaks labeled 28,920 and 28,937. Mass (ave) calculated from protein sequence (0162): 29,445.0 Da.]

Figure 4.9: Analysis of Ape0162 protein using Mass Spectrometry. Analysis of Ape0162 using intact mass experiments and tryptic digest analysis determined that the protein in question is beta-lactamase, not Ape0162. The calculated mass of Ape0162 is 29,445.0 Da; the experiments determined the mass of the protein to be 28,904.0 Da. Further analysis determined the sequence of the protein in question is 78 % homologous to the beta-lactamase protein sequence.
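The relationship between the charge-state peaks in an electrospray spectrum and the intact mass can be sketched as follows. This is a generic illustration of the [M+nH]n+ arithmetic, assuming protonation only; it is not the deconvolution software used for Figure 4.9, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz_for_charge(mass_da, charge):
    """m/z of the [M+nH]n+ ion for a neutral mass M and charge n."""
    return (mass_da + charge * PROTON_MASS) / charge


def mass_from_mz(mz, charge):
    """Neutral mass recovered from one peak of known charge."""
    return charge * (mz - PROTON_MASS)


def charge_from_adjacent_peaks(mz_low, mz_high):
    """Charge of the higher-m/z peak, given two adjacent charge states (z+1 and z)."""
    if mz_high <= mz_low:
        raise ValueError("mz_high must be greater than mz_low")
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


observed_mass = 28904.0   # Da, measured intact mass reported in the text
expected_mass = 29445.0   # Da, calculated from the Ape0162 sequence

mz_30 = mz_for_charge(observed_mass, 30)
mz_29 = mz_for_charge(observed_mass, 29)
print(round(mz_30, 2), round(mz_29, 2))
print(charge_from_adjacent_peaks(mz_30, mz_29))        # charge of the higher-m/z peak
print(round(mass_from_mz(mz_30, 30), 1))               # recovers the observed mass
print(expected_mass - observed_mass)                   # shortfall relative to Ape0162
```

The gap between the calculated and measured masses is several hundred daltons, far larger than the spacing of the oxidation peaks in the figure, which is why the intact mass alone already argued against the sample being Ape0162.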


Due to time constraints, and since the Ape0162/pET28 construct produced soluble protein, it was decided to use this construct to produce His-Ape0162 for protein-protein studies. Optimal expression was achieved in the BL21 DE3 ROS2 pLysS cells with the Ape0162/pET28 construct, producing 15.0 g of cells from a 6 L expression. The protein band (31.2 kDa) is located near the 31.0 kDa band in the molecular marker, as expected. The protein was present in the lysate of a no salt lysis buffer (50 mM Tris pH 8.0, 5 mM EDTA, 5 % glycerol), and was purified using two anion exchange columns. Samples were eluted from the columns using a 10 CV linear gradient of Buffer A (25 mM Tris pH 8.0, 5 mM EDTA, 50 mM NaCl) to Buffer B (25 mM Tris pH 8.0, 5 mM EDTA, 2 M NaCl). During the runs, the effluent saturated the UV detector, so to determine where the protein eluted, samples across the chromatogram were analyzed using SDS-PAGE (Figure 4.10). His-Ape0162 eluted from the QS in a broad peak (~ 350 mM) (Figure A9) and eluted from the HQ in a broad peak (~ 400 mM) (Figure A9). Many contaminants remained after both column runs. Ape0162 was further purified by heating the protein to 75 °C for one hour; Ape0162 remained in solution and was fairly pure (>95 %, assessed by SDS-PAGE). After heating, the concentration of Ape0162 was measured (~250.0 mg of protein from the 6 L expression) and the A260/A280 was ~1. At this time, there is no sure explanation for why this absorbance phenomenon occurs, but several ideas have been discussed. There are no tryptophans in Ape0162 (8 tyrosines and 12 phenylalanines), which can explain the low absorbance at 280 nm, but that does not explain why in some cases the OD at 280 nm is greater than the OD at 260 nm and in other cases the reverse is seen. This protein prep was performed at a large scale (all 15 g of cells purified at once), as compared to the previous smaller scale purifications (~5 g level), which had an A260/A280 < 1. Perhaps the large scale prep was less pure and more of the contaminating agent was present during the heating step, allowing the irreversible interaction with Ape0162 and increasing the OD at 260 nm. Despite the higher absorbance at 260 nm, Ape0162 was prepared for analysis by mass spectrometry via buffer exchange into 100 mM NH4OAc.

[Figure 4.10 gel images (panels A, B, and C): Ape0162 band marked at 31.2 kDa; MW markers labeled 36.5 kDa and 31.0 kDa.]

Figure 4.10: SDS-PAGE of Ape0162 optimized purification. A). Ape0162 Q Sepharose run with the protein eluting in fractions 9 and 12. Lane 1: MW ladder, lane 2: Ape0162 Q Sepharose flow through, lane 3: Ape0162 Q Sepharose fraction 6, lane 4: Ape0162 Q Sepharose fraction 9, lane 5: Ape0162 Q Sepharose fraction 12, lane 6: Ape0162 Q Sepharose fraction 15, lane 7: Ape0162 Q Sepharose fraction 18, lane 8: Ape0162 Q Sepharose fraction 21, lane 9: Ape0162 Q Sepharose fraction 24, lane 10: Ape0162 Q Sepharose fraction 27. B). Ape0162 Poros HQ run with the protein eluting in fractions 5-13. Lane 1: MW ladder, lane 2: Ape0162 Poros HQ flow through, lane 3: Ape0162 Poros HQ fraction 7, lane 4: Ape0162 Poros HQ fraction 10, lane 5: Ape0162 Poros HQ fraction 12, lane 6: Ape0162 Poros HQ fraction 13, lane 7: Ape0162 Poros HQ fraction 15, lane 8: Ape0162 Poros HQ fraction 18, lane 9: Ape0162 Poros HQ fraction 22, lane 10: Ape0162 Poros HQ fraction 25. C). Ape0162 heated to 75 °C to further purify the protein. Lane 1: MW ladder, lane 2: Ape0162 heated pellet, lane 3: Ape0162 heated lysate.


4.1.2 Ape0441

The gene for Ape0441 was originally synthesized and cloned into pDrive by a former lab technician (Pooja Talaty). This project began with the restriction of Ape0441 from pDrive to insert into pET28. The isolated restriction product was not the proper size; therefore the gene (699 bp) was amplified from the A. pernix genome using an annealing temperature of 60 °C (forward primer: 5’ CAT ATG GTC GCC TCT ATC GAG AAG ATT 3’; reverse primer: 5’ GGA TCC TTA CTC GAT CTT CGG CGA TAC ATA GAA 3’). Several annealing temperatures were attempted (48, 55, 58, 60 °C) until the correct gene size was obtained; the salt concentration (15 mM MgSO4) increased the annealing temperature of the reaction. The PCR product (699 bp) was inserted into pDrive, transformed into the XL10 cloning host, and three colonies yielded three constructs (~ 4,549 bp) (Figure 4.11 a, b). The Ape0441 gene was restricted from pDrive using Nde I and Bam HI and inserted into pET28. Transformation of the reaction into XL10 cells produced colonies, and several constructs around 6,068 bp were isolated (Figure 4.11 c). The Ape0441/pET28 construct was sequenced, verifying correct gene insertion with the N-terminal hexahistidine tag present.

[Figure 4.11 gel images (panels A, B, and C): band labels 0441 (699 bp), 0441 (4,549 bp), 0441 (6,068 bp), and pET28 (5,369 bp); markers labeled 3,675 bp, 1,000 bp, and 500 bp.]

Figure 4.11: Cloning of Ape0441 into pDrive and pET28. A). PCR product of Ape0441 resulting in a band around 751 bp. Lane 1: 100 bp ladder; lane 2: Ape0441 PCR product. B). Ape0441 insertion into pDrive resulting in three constructs around 4,600 bp. Lanes 1-3: Ape0441/pDrive constructs (4,600 bp); lane 4: BSTE digested λ ladder (linear). C). Ape0441 insertion into pET28 (6,120 bp). Lane 1: pET28 (5,369 bp); lanes 2-5: Ape0441/pET28 construct #1-4 (6,120 bp); lane 6: BSTE digested λ ladder (linear).
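The construct sizes quoted for the pET28 clones follow from adding the insert length to the vector length. The sketch below is a simplified check that ignores the short vector fragment removed between the two restriction sites; the pET28 size is the value printed in Figure 4.11, and the helper names and the 5 % tolerance are assumptions made for illustration.

```python
PET28_BP = 5369  # vector size as labeled in Figure 4.11


def construct_size(vector_bp, insert_bp):
    """Approximate construct length as vector plus insert (restriction-site losses ignored)."""
    return vector_bp + insert_bp


def size_matches(observed_bp, expected_bp, tolerance=0.05):
    """True if a gel-estimated size lies within a fractional tolerance of the expected size."""
    return abs(observed_bp - expected_bp) <= tolerance * expected_bp


# Insert lengths quoted in this chapter.
for label, insert_bp in (("Ape0441, 699 bp", 699), ("FL-Ape0441, 751 bp", 751), ("Ape0162, 789 bp", 789)):
    print(label, construct_size(PET28_BP, insert_bp))

# The Ape0162/pET28 band was estimated at about 6,150 bp on the gel.
print(size_matches(6150, construct_size(PET28_BP, 789)))
```

A 699 bp insert corresponds to the roughly 6,068 bp construct quoted in the text, while a 751 bp insert corresponds to the 6,120 bp value quoted in the caption.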

The Ape0441/pET28 construct was transformed into several cell lines to test for protein expression. Ape0441 was expressed in BL21 DE3 RIL, ROS, and RILP, producing a protein expression band around 27.6 kDa. The His-Ape0441 protein was insoluble after lysis with a low salt buffer (100 mM Tris HCl pH 8.0, 10 mM EDTA, 150 mM NH4Cl, 10 % sucrose, 0.3 % PEI, 5 mM DTT) for all cell lines. Additions of 1 M NaCl or Bug Buster to the protein pellet did not resolubilize the protein (Figure 4.12 a, b). His-Ape0441 expression in RIL and RILP produced inclusion bodies, and several different denaturants were used to attempt to recover the protein. The inclusion bodies were resolubilized in 6.6 M urea and 4 M guanidine hydrochloride, after an overnight incubation at 90 °C. A precipitate formed as the urea and guanidine hydrochloride were slowly dialyzed out of the samples, and His-Ape0441 was found in the precipitate with a small fraction remaining in the supernatant. The majority of the protein was still insoluble with this technique; therefore other experiments were attempted to produce soluble protein.

[Figure 4.12 gel images (panels A and B): Ape0441 band marked at 27.6 kDa; MW markers labeled 31.0 kDa and 21.0 kDa.]

Figure 4.12: SDS-PAGE of Ape0441/pET28 expression and lysis. A). Expression of Ape0441 produces a band ~ 27.6 kDa. Lane 1: MW ladder; lane 2: Ape0441 RIL expression 0 hr; lane 3: Ape0441 RIL expression 3 hr. B). Ape0441 was insoluble in low and high salt conditions. Lane 1: Ape0441 RIL low salt lysis pellet; lane 2: Ape0441 RIL low salt lysis lysate; lane 3: Ape0441 high salt extraction pellet; lane 4: Ape0441 high salt extraction supernatant; lane 5: MW ladder.

Expression of Ape0441 in ROS was examined using several concentrations of IPTG (1 mM, 0.5 mM, and 0.05 mM), with protein expression only at an IPTG concentration of 1 mM. After low salt lysis the protein was found in the pellet, and the addition of 1 M NaCl or Bug Buster to the protein pellet did not resolubilize the His-Ape0441. Expression at a lower temperature (30 °C) in RILP cells produced insoluble protein as well. It was speculated that perhaps the N-terminal hexahistidine tag might be affecting the solubility of Ape0441; therefore the Ape0441 gene was inserted into pET21 to remove the tag. Ape0441 was restricted from pET28 using Nde I and Bam HI (gene size around 699 bp) and inserted into pET21, creating a construct around 6,109 bp (Figure 4.13 a, b). The Bam HI restriction fragment from Ape0441/pET28 included a stop codon, eliminating the C-terminal hexahistidine tag. This new construct was sequenced, indicating proper insertion. This construct (0441/pET21) was tested for protein expression using several cell lines, including BL21 DE3 RIL, RILP, and ROS2 pLysS. Protein expression was achieved in all cell lines, with a band appearing on the SDS-PAGE around 25.8 kDa (Figure 4.14). The protein was slightly smaller in molecular weight than previously seen since there was no hexahistidine tag attached to the protein. Ape0441 expressed in ROS2 was insoluble after lysis (50 mM Tris pH 8.0, 1 mM EDTA, 10 % glycerol, 1 mM BME) and the addition of 100 mM or 1 M NaCl did not solubilize the protein (Figure 4.14). Solubilization studies of Ape0441 protein with no salt lysis buffer, low salt (150 mM NaCl), high salt (1 M NaCl), and Bug Buster, all incubated at room temperature or 65 °C, were performed. In each study, Ape0441 protein was still located in the pellet. Ape0441 expressed in the RIL and RILP cell lines was also used in lysis solubility studies, with each case resulting in insoluble protein.

[Figure 4.13 gel images (panels A and B): band labels pET28 (5,369 bp), 0441 (699 bp), and 0441/pET21 (6,109 bp); markers labeled 7 kb, 5 kb, 1,000 bp, and 500 bp.]

Figure 4.13: Restriction digest of Ape0441 from pET28 and insertion into pET21. A). The Ape0441 gene is restricted from pET28 with Nde I and Bam HI (699 bp). Lane 1: restricted Ape0441 gene #1; lane 2: restricted Ape0441 gene #2; lane 3: 100 bp ladder. B). Insertion of Ape0441 gene into pET21 vector producing a band around 6,100 bp. Lane 1: supercoiled DNA plasmid ladder; lanes 2-6: Ape0441/pET21 construct #1-5. Construct #1 appears to be the correct size for the insertion of Ape0441 into pET21.

[Figure 4.14 gel image: lanes 1-6; Ape0441 band marked at 25.8 kDa; MW markers labeled 31.0 kDa and 21.5 kDa.]

Figure 4.14: SDS-PAGE of Ape0441/pET21 expression and lysis. Ape0441/pET21 was expressed in BL21 DE3 ROS2 cells, producing a protein band around 25 kDa. The protein was insoluble after a no salt and low salt lysis. Lane 1: MW ladder; lane 2: Ape0441 ROS2 expression 0 hr; lane 3: Ape0441 ROS2 expression 3 hr; lane 4: Ape0441 no salt lysis pellet; lane 5: Ape0441 no salt lysate; lane 6: Ape0441 low salt lysis pellet. No band was present for the low salt lysate.


Ishino and colleagues noted that their recombinant Ape0441 was smaller in molecular weight than the native Ape0441 isolated from Aeropyrum pernix.89 This suggested to us that the gene sequence of Ape0441 may have been annotated incorrectly. Upon further examination of the A. pernix genome, an upstream ATG was found around 52 base pairs away from the original gene start. These additional 17 amino acids were in the reading frame with the original annotated sequence; perhaps these additional amino acids are essential for the solubility of Ape0441. The full length gene of Ape0441 (FL-Ape0441, 751 bp) was amplified from the genome using an annealing temperature of 60 °C (forward primer: 5’ CACC ATG GCC GAC GCT AGG TTC TAC TTT A 3’; reverse primer: 5’ GGA TCC TTA CTC GAT CTT CGG CGA TAC ATA GAA 3’) (Figure 4.15 a). The gene was inserted into the expression vector pET101 and transformed into the XL10 cloning host. Several colonies grew, and constructs around 6,504 bp were isolated from the colonies (Figure 4.15 b). This FL-Ape0441/pET101 construct was designed with a stop codon to prevent the addition of the C-terminal hexahistidine tag. FL-Ape0441/pET101 was sequenced, indicating correct gene insertion, and then tested for protein expression.

[Figure 4.15 gel images (panels A and B): FL-Ape0441 PCR product labeled 751 bp with 500 bp and 1,000 bp markers; 7 kb and 5 kb markers in panel B.]

Figure 4.15: Cloning of FL-Ape0441 into pET101. A). FL-Ape0441 amplification from genomic DNA, producing a band around 751 bp. Lane 1: 100 bp ladder; lanes 2-5: FL-Ape0441 PCR product. B). FL-Ape0441 insertion into pET101 resulting in a band around 6,504 bp. Lane 1: supercoiled DNA plasmid ladder; lanes 2-7: FL-Ape0441/pET101 constructs #1-6. All constructs appear to be the correct molecular weight except for construct #5.

FL-Ape0441 was transformed into four different cell lines, BL21 DE3 RILP, ROS Blue, ROS2 pLysS, and T7 express, with expression occurring in two cell lines, T7 express (co-transformed with the RILP codon plasmid) and RILP. A protein band around 27.9 kDa suggests the proper expression of FL-Ape0441 with no additional tags present. The RILP expression of Ape0441 was soluble in a no salt lysis buffer (50 mM Tris pH 8.0, 1 mM EDTA, 10 % glycerol, 1 mM BME), but upon heating (90 °C) or the addition of 0.5 % PEI, the protein precipitated. Approximately half of the protein remained in the pellet after lysis, and a solubility/temperature study was performed to determine if the protein could be extracted from the pellet. The pellet was resuspended in the no salt lysis buffer and salt was added to a final concentration of 100 mM or 1 M NH4Cl. Each sample was incubated at room temperature or 90 °C for 1 hour. A small portion of FL-Ape0441 was extracted from the pellet with the low and high salt additions that were incubated at room temperature. Bug Buster was also successful in solubilizing a small fraction of FL-Ape0441 from the pellet, but improvements were insignificant. Optimal expression of FL-Ape0441 was obtained with BL21 DE3 RIL cells, where soluble protein was isolated with a no salt lysis buffer (Figure 4.16). Protein precipitation upon the addition of PEI suggested the presence of DNA in the lysate containing FL-Ape0441.


Figure 4.16: SDS-PAGE of FL-Ape0441/pET101 expression and lysis. FL-Ape0441 is expressed in RILP, producing a protein band around 27.9 kDa. FL-Ape0441 is isolated in the pellet and lysate after the no salt lysis. Lane 1: FL-Ape0441 RILP expression 0 hr; lane 2: FL-Ape0441 RILP expression 3 hr; lane 3: FL-Ape0441 T7 express expression 0 hr; lane 4: FL-Ape0441 T7 express expression 3 hr; lane 5: MW ladder; lane 6: FL-Ape0441/RILP lysis pellet; lane 7: FL-Ape0441/RILP lysate.


The FL-Ape0441 purification scheme consisted of two anion exchange columns since the protein is rather acidic, with a theoretical pI around 4.7. The columns were equilibrated with buffer A (25 mM Tris pH 8.0, 2 mM EDTA, 50 mM NaCl) and the conductivity of the lysate containing FL-Ape0441 was already below that of buffer A. After each 10 CV linear gradient run, fraction samples across the chromatogram were analyzed using SDS-PAGE and the samples containing FL-Ape0441 were pooled together. The conductivity of the pooled samples was reduced below that of buffer A before each column run. FL-Ape0441 eluted from the QS in a broad peak (~ 125 mM) (Figure 4.17 a). The pooled fractions from the QS were loaded onto the HQ, where FL-Ape0441 did not bind to the column and was located in the effluent (~ 50 mM NaCl) (Figure 4.17 b). FL-Ape0441 was fairly pure after the Poros HQ run (> 95 %, assessed by SDS-PAGE). No additional heating step was utilized since FL-Ape0441 was pure and not thermostable. Unlike the other PCNA subunits, there was not a problem with the protein absorbing more at 260 nm than at 280 nm. This suggests that heating the PCNA subunits induced binding of DNA or a cofactor to the subunit. In the case of FL-Ape0441, there are 11 tyrosines, 12 phenylalanines, and only one tryptophan, and the A260/A280 ratio was around 0.6 after protein purification.


Figure 4.17: Purification of full length Ape0441. A). FL-Ape0441 elutes from Q Sepharose in around 50 mM NaCl. Lane 1: MW ladder; lane 2: FL-Ape0441 Q Sepharose flow through; lane 3: FL-Ape0441 Q Sepharose fraction # 2; lane 4: FL-Ape0441 Q Sepharose fraction # 5; lane 5: FL-Ape0441 Q Sepharose fraction # 9; lane 6: FL-Ape0441 Q Sepharose fraction # 14; lane 7: FL-Ape0441 Q Sepharose fraction # 17; lane 8: FL-Ape0441 Q Sepharose fraction # 21; lane 9: FL-Ape0441 Q Sepharose fraction # 24; lane 10: FL-Ape0441 Q Sepharose fraction # 32; lane 11: FL-Ape0441 Q Sepharose fraction # 34; lane 12: FL-Ape0441 Q Sepharose fraction # 38. B). FL-Ape0441 does not bind to the Poros HQ and is fairly pure after the run. Lane 1: FL-Ape0441 Poros HQ flow through; lane 2: MW ladder.

Mass spectrometry was utilized to verify the identity of FL-Ape0441, as was done for Ape0162. Regrettably, this protein was also identified to be beta lactamase (Figure 4.18). Analysis of the intact mass of the protein gave a molecular weight of 28,904.0 Da, while the calculated mass of Ape0441 is 27,935.8 Da. The sequence of the fragments produced by the tryptic digest of the putative Ape0441 was 78 % identical to the sequence of beta lactamase. The beta lactamase gene was again transcribed from the pET101 plasmid instead of the gene of interest. Since expression of Ape0441 was not achieved, the gene would need to be re-cloned into a different plasmid.
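The mismatch between the measured and calculated masses can be made explicit with a short calculation. The following is a minimal sketch, not the analysis software used for the experiment: it takes the two masses quoted above, computes their difference, and lists the m/z values expected for the charge states labelled in Figure 4.18 under the usual assumption that each charge is carried by one added proton. The function name `mz` is illustrative.

```python
# Sketch: intact-mass comparison and expected m/z for multiply protonated ions.
PROTON_MASS = 1.007276  # Da

def mz(neutral_mass, charge):
    """m/z of an [M + zH]z+ ion for a given neutral mass and charge z."""
    return (neutral_mass + charge * PROTON_MASS) / charge

observed_mass = 28904.0     # Da, intact mass from the experiment
calculated_mass = 27935.8   # Da, calculated from the Ape0441 sequence

difference = observed_mass - calculated_mass
print(f"Observed - calculated = {difference:.1f} Da")

# Charge states labelled in Figure 4.18
for z in (20, 23, 27, 30, 34):
    print(f"[M+{z}H]{z}+  m/z = {mz(observed_mass, z):.2f}")
```

A difference of nearly 1 kDa is far larger than the mass added by common modifications such as oxidation, which is consistent with the conclusion that a different protein was expressed.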


Figure 4.18: Analysis of FL-Ape0441 protein using mass spectrometry. Analysis of FL-Ape0441 using intact mass experiments and tryptic digest analysis determined that the protein in question is beta-lactamase, not FL-Ape0441. The calculated mass of FL-Ape0441 is 27,935.8 Da; the experiments determined the mass of the protein to be 28,904.0 Da. Further analysis determined that the sequence of the protein in question is 78 % homologous to the beta-lactamase protein sequence.

The FL-Ape0441 PCR product with the 5’ CACC was inserted into the Gateway cloning vector, pENTR-D, and transformed into the Top10 cloning host. Several FL-Ape0441/pENTR-D constructs (~3,381 bp) were isolated from the colonies (Figure 4.19 a). FL-Ape0441 was then transferred into pDESTC1, transformed into DH5α, and plasmids (~4,486 bp) were isolated from the colonies (Figure 4.19 b). FL-Ape0441/pDESTC1 was sequenced and the correct gene was inserted with the addition of an N-terminal hexahistidine tag.


Figure 4.19: Cloning of FL-Ape0441 into pENTR-D and pDESTC1. A). Insertion of FL-Ape0441 into pENTR-D resulting in a construct around 3,381 bp. All seven constructs were the correct size. Lane 1: supercoiled DNA plasmid ladder; lanes 2-7: FL-Ape0441/pENTR-D constructs. B). Insertion of FL-Ape0441 into pDESTC1 creating a construct around 4,486 bp. Each sample has a band around the correct size of FL-Ape0441/pDESTC1, but constructs 2-4 contain additional plasmid bands. Construct # 1 appears to be the correct size at 4,486 bp and was sequenced correctly. Lane 1: supercoiled DNA plasmid ladder; lane 2: FL-Ape0441/pDESTC1 construct # 1; lane 3: FL-Ape0441/pDESTC1 construct # 2; lane 4: FL-Ape0441/pDESTC1 construct # 3; lane 5: FL-Ape0441/pDESTC1 construct # 4.

This His-FL-Ape0441 construct was tested for protein expression in several cell lines including BL21 DE3 RIL, RILP, ROS2 plysS, T7 express and BL21 star DE3 (both co-transformed with the RILP rare codon plasmid). His-FL-Ape0441 was only expressed in the BL21 Star cell line with a protein band appearing around 31.4 kDa, suggesting the presence of FL-Ape0441 with the additional hexahistidine tag (Figure 4.20 a). His-FL-Ape0441 was soluble after a no salt lysis (50 mM Tris pH 8.0, 5 mM EDTA, 5 % glycerol), but heating the protein at 75 °C for one hour precipitated the protein (Figure 4.20 b,c). A heavy band around 24 kDa appeared after heating His-FL-Ape0441, suggesting the possible degradation of FL-Ape0441.


Figure 4.20: SDS-PAGE of FL-Ape0441/pDestC1 expression, lysis, and heating. A). Expression of FL-Ape0441/pDestC1 in BL21 star DE3 resulting in a band around 31 kDa. Lane 1: FL-Ape0441 BL21 star DE3 expression 0 hr; lane 2: FL-Ape0441 BL21 star DE3 expression 3 hr; lane 3: MW ladder. B). No salt lysis of FL-Ape0441 isolated protein in the pellet and lysate. Lane 1: FL-Ape0441 no salt lysis pellet; lane 2: FL-Ape0441 no salt lysate; lane 3: MW ladder. C). Heating FL-Ape0441 to 90 °C precipitates the protein. The presence of a band below FL-Ape0441 (~ 24 kDa) suggested the degradation or truncation of FL-Ape0441. Lane 1: MW ladder; lane 2: heated FL-Ape0441 pellet; lane 3: heated FL-Ape0441 supernatant.


Large scale expression of FL-Ape0441 in the BL21 star DE3 cell line (cotransformed with the rare codon plasmid isolated from RILP) produced a large cell pellet around 25 g. The protein was extracted into the lysate using a no salt lysis buffer (50 mM Tris pH 8.0, 5 mM EDTA, 5 % glycerol), and the protein was purified using two anion exchange columns (Q Sep and HQ). Samples were eluted from the columns using a 10 column volume linear gradient of Buffer A (25 mM Tris pH 8.0, 5 mM EDTA, 50 mM NaCl) to Buffer B (25 mM Tris pH 8.0, 5 mM EDTA, 2 M NaCl). During the runs, the effluent saturated the UV detector, so samples across the chromatogram were analyzed using SDS-PAGE to determine where the protein eluted. His-FL-Ape0441 eluted from the QS (~ 425 mM) in a broad peak and the fractions containing the protein were pooled together (Figure A10). The conductivity of the protein sample was reduced below the conductivity of buffer A and the pooled fractions were loaded onto the HQ. His-FL-Ape0441 eluted from the HQ (~ 500 mM) in a broad peak with minor contaminants remaining in the fractions containing His-FL-Ape0441 (Figure A10). These fractions were heated to 75 °C for one hour in an attempt to remove the remaining contaminants. The presence of the 31.4 kDa band assumed to be FL-Ape0441 was verified by SDS-PAGE for each purification step (Figure 4.21). After the purification and heating steps, FL-Ape0441 is partially pure, with a major contaminant around 24 kDa.
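To relate the quoted elution salt concentrations to positions in the gradient, the linear gradient can be expressed as a simple interpolation between Buffer A (50 mM NaCl) and Buffer B (2 M NaCl) over 10 column volumes. The sketch below assumes an ideal linear gradient with no system delay volume, which is a simplification; the function names are illustrative and do not come from any chromatography software.

```python
# Sketch: position of an elution salt concentration within a linear A-to-B gradient.
def percent_b(target_mm, a_mm=50.0, b_mm=2000.0):
    """Percent Buffer B at which the gradient reaches target_mm NaCl."""
    return 100.0 * (target_mm - a_mm) / (b_mm - a_mm)

def column_volumes(target_mm, gradient_cv=10.0, a_mm=50.0, b_mm=2000.0):
    """Column volumes into the gradient at which target_mm NaCl is reached."""
    return gradient_cv * percent_b(target_mm, a_mm, b_mm) / 100.0

for column, salt_mm in (("Q Sepharose", 425.0), ("Poros HQ", 500.0)):
    print(f"{column}: {salt_mm:.0f} mM NaCl = {percent_b(salt_mm):.1f} % B, "
          f"{column_volumes(salt_mm):.2f} CV into the gradient")
```

Under these assumptions both elutions fall within roughly the first quarter of the gradient.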


Figure 4.21: SDS-PAGE of FL-Ape0441 purification. A). FL-Ape0441 eluted from the Q Sepharose in fractions 10-13 in 425 mM NaCl. Lane 1: MW ladder; lane 2: FL-Ape0441 Q Sepharose flow through; lane 3: FL-Ape0441 Q Sepharose fraction 6; lane 4: Q Sepharose fraction 9; lane 5: Q Sepharose fraction 12; lane 6: Q Sepharose fraction 15; lane 7: Q Sepharose fraction 18; lane 8: Q Sepharose fraction 22; lane 9: Q Sepharose fraction 29. B). FL-Ape0441 eluted from the Poros HQ in fractions 6-16 in 500 mM NaCl. Lane 1: FL-Ape0441 Poros HQ flow through; lane 2: FL-Ape0441 Poros HQ fraction 7; lane 3: FL-Ape0441 Poros HQ fraction 10; lane 4: FL-Ape0441 Poros HQ fraction 13; lane 5: FL-Ape0441 Poros HQ fraction 16; lane 6: FL-Ape0441 Poros HQ fraction 19; lane 7: FL-Ape0441 Poros HQ fraction 23; lane 8: FL-Ape0441 Poros HQ fraction 26; lane 9: FL-Ape0441 Poros HQ fraction 29. C). FL-Ape0441 was further purified with a heating step to 75 °C. Lane 1: FL-Ape0441 Poros HQ fractions 6-10 heated pellet; lane 2: FL-Ape0441 Poros HQ fractions 6-10 heated supernatant; lane 3: FL-Ape0441 Poros HQ fractions 11-16 heated pellet; lane 4: FL-Ape0441 Poros HQ fractions 11-16 heated supernatant; lane 5: MW ladder.

4.1.3 Ape2182

The gene for Ape2182 was originally synthesized and cloned into pDrive by a former lab technician (Pooja Talaty). This project began with the restriction of Ape2182 from pDrive and insertion of the gene into pET28. The isolated Ape2182 fragment was not the proper size, therefore the gene (753 bp) was amplified from the genome by PCR at the TM of 58 °C (forward primer: 5’ CAT ATG TTC AGA CTA GTA TAC ACT GCC TC 3’; reverse primer: 5’ GGA TCC TTA GCC AGC TAG GCT GGG CG 3’) (Figure 3.31a). Multiple TM (48, 55, 58 °C) were tried before the proper gene size was obtained. Two bands appeared on the agarose gel; the band around 750 bp corresponds to the correct gene size of Ape2182, while the smaller band around 100 bp corresponds to the presence of the primers (Figure 4.22 a). The isolated Ape2182 PCR product (~750 bp) was inserted into pDrive and transformed into XL10 cells. Ape2182/pDrive constructs (~4,603 bp) were isolated from the colonies and the Ape2182 gene was restricted from pDrive using Nde I and Bam HI (Figure 4.22 b). Ape2182 was inserted into pET28 resulting in a construct around 6,122 bp, transformed into XL10 cells, and the construct was isolated from the colonies (Figure 4.22 c). The Ape2182/pET28 construct was sequenced, indicating correct gene insertion with the addition of the N-terminal hexahistidine tag. His-Ape2182 was expressed in BL21 DE3 RIL with a protein band appearing around 30.4 kDa; the molecular weight of Ape2182 is 28,617.7 Da and the 6X His tag increases the MW to 30,418.6 Da (Figure 4.23 a). The protein was insoluble after lysis with a low salt buffer (100 mM Tris HCl pH 8.0, 10 mM EDTA, 150 mM NH4Cl, 10 % sucrose, 0.3 % PEI, and 5 mM DTT). NaCl (1 M) was added to the insoluble pellet, extracting a small fraction of His-Ape2182 from the pellet (Figure 4.23 b). Since some His-Ape2182 was soluble in high salt, a high salt lysis buffer (low salt lysis buffer with 1 M NH4Cl) was utilized to extract His-Ape2182, but the protein was still found in the pellet.
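The primer design described above can be checked programmatically. The sketch below uses the two primer sequences as quoted and the standard recognition sequences of Nde I (CATATG) and Bam HI (GGATCC); the helper names are illustrative. It confirms that each primer begins with the expected restriction site and that the reverse primer carries the complement of a stop codon directly after the Bam HI site.

```python
# Sketch: sanity checks on the Ape2182 cloning primers.
NDE_I = "CATATG"    # Nde I recognition site (includes the ATG start codon)
BAM_HI = "GGATCC"   # Bam HI recognition site

forward = "CAT ATG TTC AGA CTA GTA TAC ACT GCC TC".replace(" ", "")
reverse = "GGA TCC TTA GCC AGC TAG GCT GGG CG".replace(" ", "")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

print("Forward starts with Nde I site:", forward.startswith(NDE_I))
print("Reverse starts with Bam HI site:", reverse.startswith(BAM_HI))
# The three bases after the Bam HI site, read on the coding strand
print("Codon encoded after Bam HI site:", reverse_complement(reverse[6:9]))
for name, primer in (("forward", forward), ("reverse", reverse)):
    print(f"{name}: {len(primer)} nt, GC = {gc_fraction(primer):.0%}")
```

The same arithmetic applies to the tag: 30,418.6 Da minus 28,617.7 Da gives about 1.8 kDa contributed by the N-terminal extension from pET28.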


Figure 4.22: Cloning of Ape2182 into pDrive and pET28. A). PCR product of Ape2182 resulting in a gene around 750 bp. The lower band corresponds to the primers utilized in the reaction. Lane 1: 100 bp ladder; lanes 2-3: Ape2182 PCR products. B). Insertion of Ape2182 into pDrive resulting in a construct size around 4,603 bp. Lane 1: BSTE λ ladder (linear); lane 2: Ape2182/pDrive # 1; lane 3: Ape2182/pDrive # 2; lane 4: Ape2182/pDrive # 3; lane 5: pDrive. Each construct is a higher molecular weight than pDrive alone, suggesting Ape2182 was inserted into pDrive. C). Insertion of Ape2182 into pET28 resulting in a construct size around 6,122 bp. Lane 1: Ape2182/pET28 # 1; lane 2: Ape2182/pET28 # 2; lane 3: Ape2182/pET28 # 3; lane 4: BSTE λ ladder (linear); lane 5: pET28. Ape2182/pET28 construct # 3 appears to be the proper size since the band is above the pET28 plasmid (lane 5). Ape2182/pET28 constructs # 1 and # 2 are smaller in size, indicating no insertion or an incorrect insertion of Ape2182 into pET28.


Figure 4.23: SDS-PAGE of His-Ape2182 expression and lysis. A). Ape2182/pET28 RIL expression results in a protein band around 30.4 kDa with the presence of the hexahistidine tag. Lane 1: His-Ape2182 RIL expression 0 hr; lane 2: His-Ape2182 RIL expression 3 hr; lane 3: His-Ape2182 RIL expression 24 hr; lane 4: MW ladder. B). His-Ape2182 low salt lysis where His-Ape2182 is not soluble, but addition of 1 M NaCl solubilizes a fraction of His-Ape2182. Lane 1: His-Ape2182 low salt lysis pellet; lane 2: His-Ape2182 low salt lysate; lane 3: His-Ape2182 high salt pellet; lane 4: His-Ape2182 high salt supernatant; lane 5: MW ladder.

Examination of the pellet suggested that the insoluble His-Ape2182 was in inclusion bodies. Two denaturing agents, urea (4-6.6 M) and guanidine HCl (2.5-4.2 M), were successful in solubilizing His-Ape2182, with the majority of the pellet solubilizing at 90 °C rather than room temperature. A white precipitate formed when dialyzing out the denaturant and as the sample was cooled from 90 °C to room temperature; in both cases His-Ape2182 was found in the precipitate. Urea and/or guanidine HCl was removed from the sample with rapid dialysis or by step-wise dialysis and in both cases, a precipitate formed containing His-Ape2182 and other contaminants. Only a small fraction of His-Ape2182 (~ 1 mg of protein) remained in solution after the removal of the denaturant. Other cell lines, different expression temperatures, and different concentrations of IPTG were examined in order to obtain soluble His-Ape2182.

His-Ape2182 was expressed in the cell line BL21 DE3 ROS, but the protein was insoluble after low and high salt lysis. Addition of Bug Buster to the pellet did not solubilize the protein either. Different concentrations of IPTG (1 mM, 0.5 mM, 0.3 mM, 0.1 mM, and 0.05 mM) were studied to determine if the amount of protein produced affects the solubility of this protein. Based on the intensity of the protein bands on the SDS-PAGE, it appeared that each concentration of IPTG produced similar amounts of His-Ape2182. The expression pellets from 0.5 mM and 0.05 mM inductions were lysed with a low salt lysis buffer and in both cases His-Ape2182 was insoluble. This experiment using different concentrations of IPTG was attempted again and resulted in insoluble His-Ape2182. His-Ape2182 was expressed in BL21 DE3 RILP cells, but the protein was insoluble after lysis with low salt. His-Ape2182 in RILP was expressed at a lower expression temperature (20 °C) for a 15 hour incubation period and protein expression was seen after three hours and after 15 hours. The protein was still insoluble after lysis with a low salt buffer, but a small fraction of the protein was extracted from the pellet with the addition of 1 M NaCl directly to the pellet. His-Ape2182 remained insoluble after the addition of Bug Buster directly to the lysis pellet. His-Ape2182 was expressed in multiple cell lines, but remained insoluble after many attempts to isolate the protein.

Perhaps the presence of the N-terminal hexahistidine tag prevents the protein from folding properly or initiates aggregation of the protein. Therefore, Ape2182 was recloned into pET21 in order to remove the N-terminal tag. Ape2182 was restricted from pET28 using Nde I and Bam HI, isolating the gene around 753 bp (Figure 4.24 a). This gene was ligated into pET21, transformed into XL10, and the Ape2182/pET21 construct (6,163 bp) was isolated from several colonies (Figure 4.24 b). The Ape2182/pET21 plasmid was designed with a stop codon before the C-terminal hexahistidine tag in order to prevent the His-tag extension. Ape2182/pET21 was not sequenced, but theoretically the Ape2182 sequence should still be correct since it was sequenced when inserted into the pET28 vector, and it will be confirmed later.


Figure 4.24: Cloning of Ape2182 into pET21. A). Restriction digest of Ape2182/pET28 with Nde I and Bam HI. The Ape2182 gene was isolated around 753 bp. Lane 1: 100 bp ladder; lanes 2 and 3: restricted Ape2182/pET28; lane 4: 100 bp ladder. The lowest band in lanes 2 and 3 corresponds to the Ape2182 gene size of 753 bp. B). Insertion of Ape2182 into pET21 resulting in a construct around 6,163 bp. Lane 1: Ape2182/pET21 # 1; lane 2: Ape2182/pET21 # 2; lane 3: Ape2182/pET21 # 3; lane 4: supercoiled DNA plasmid ladder.


Protein expression was seen in the cell line BL21 DE3 ROS2 plysS, with a protein band appearing around 28.6 kDa (Figure 4.25 a). Ape2182 was soluble after a no salt lysis (50 mM Tris pH 8.0, 5 mM EDTA, 5 % glycerol). The Ape2182 lysate was heated at 75 °C for one hour and Ape2182 remained in solution. Addition of 0.5 % PEI precipitated about half of the Ape2182 out of solution (Figure 4.25 b).


Figure 4.25: SDS-PAGE of Ape2182/pET21 expression and lysis. A). Ape2182 was expressed in BL21 DE3 ROS2 cells producing a band around 28.6 kDa. Lane 1: MW ladder; lane 2: Ape2182/pET21 ROS2 expression 0 hr; lane 3: Ape2182/pET21 ROS2 expression 3 hr. B). No salt lysis of Ape2182 resulting in soluble protein where upon heating half of the protein precipitates. Lane 1: MW ladder; lane 2: Ape2182 no salt lysis pellet; lane 3: Ape2182 no salt lysate; lane 4: Ape2182 heated pellet; lane 5: Ape2182 heated supernatant; lane 6: Ape2182 heated PEI pellet; lane 7: Ape2182 heated PEI supernatant.

Ape2182 was purified using anion exchange chromatography since the protein is rather acidic, with a theoretical pI around 4.7; the heated lysate was loaded directly onto the Poros HQ, skipping the QS, since the lysate was clear and only a few contaminants remained (Figure 4.25 b lane 5). The HQ was equilibrated with buffer A (25 mM Tris pH 8.0, 2 mM EDTA, 1 % glycerol, 50 mM NaCl) and the conductivity did not need to be reduced since the lysate did not contain any salt. Ape2182 eluted from the HQ in a broad peak (~ 450 mM), but the majority of the protein was found in the effluent (Figure 4.26). The effluent and pooled fractions containing Ape2182 were concentrated together and, upon measuring the concentration of Ape2182, the A260/A280 ratio was around 1.6.


Figure 4.26: SDS-PAGE of Ape2182 purification. The majority of Ape2182 (~ 29.4 kDa) was located in the flow through, as well as in fractions 11-40, where only the flow through and fractions 15-40 were kept based on the purity of the samples. Lane 1: MW ladder; lane 2: Ape2182 Poros HQ FT; lane 3: Ape2182 Poros HQ fraction 11; lane 4: Ape2182 Poros HQ fraction 22; lane 5: Ape2182 Poros HQ fraction 34; lane 6: Ape2182 Poros HQ fraction 40.

As previously discussed, it was speculated that some type of cofactor (ATP, GTP, NAD, FAD, etc.) or perhaps RNA/DNA was interacting with Ape2182, based on the A260/A280 ratio. Several attempts were made to remove the contaminant from the protein. Ape2182 was incubated at room temperature in the presence of 1 M NaCl and upon addition of 0.4 % PEI, a white precipitate formed. Ape2182 was found in the pellet and supernatant, but the supernatant still had a higher absorbance at 260 nm. The Ape2182 was run over a HA column and eluted in a broad peak (~500 mM) during a 10 CV linear gradient (buffer A: 25 mM Tris pH 7.5, 50 mM NaCl; buffer B: 25 mM Tris pH 7.5, 1 M (NH4)2SO4). The A260/A280 ratio was still greater than the typical protein value. Identical to the work with Ape0162, an ammonium sulfate precipitation at several different salt concentrations (1.5-3.5 M (NH4)2SO4) was performed, with concentrations over 2 M salt precipitating Ape2182. There were difficulties re-dissolving the precipitated protein, and absorbance readings were still larger at 260 nm than 280 nm.

As speculated before, the heating step may have induced binding of a cofactor or stretch of nucleotides to the Ape2182 subunit, therefore the next experiment was to purify unheated lysate. Unheated Ape2182 lysate was purified using the Q Sep and HQ using a 10 CV linear gradient (buffer A: 25 mM Tris pH 8.0, 2 mM EDTA, 1 % glycerol, 50 mM NaCl; buffer B: 25 mM Tris pH 8.0, 2 mM EDTA, 1 % glycerol, 2 M NaCl). The conductivity of the Ape2182 samples was reduced to match the conductivity of buffer A before each column run. The effluent saturated the UV detector, so samples across the chromatogram were analyzed using SDS-PAGE to determine where the protein eluted. Ape2182 eluted from the QS (~ 500 mM) in a broad peak with minor contaminants (Figure A11). The pooled fractions containing Ape2182 were loaded onto the HQ and eluted in a broad peak (~ 250 mM) (Figure A12). Some contaminants were still present after the Poros HQ run and the Ape2182 was heated at 75 °C for one hour. White precipitate appeared during the heating step, but Ape2182 was soluble and upon checking the concentration of Ape2182, the absorbance at 280 nm was greater than at 260 nm. Ape2182 has three tryptophans, 8 tyrosines, and 13 phenylalanines so there should be a greater absorbance at 280 nm than at 260 nm, which should be consistent regardless of heating steps. After heating Ape2182, the protein was fairly pure, despite the presence of faint bands which can be seen when the sample is overloaded. The presence of Ape2182 was verified with each purification step on SDS-PAGE (Figure 4.27).
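The expected absorbance of Ape2182 at 280 nm can be estimated from the residue counts given above. The following is a minimal sketch using the widely used per-residue molar extinction values at 280 nm (5,500 for tryptophan, 1,490 for tyrosine, 125 for each disulfide-bonded cystine, in M^-1 cm^-1); it is not the method reported in this work, and the assumption of zero cystines is made here only because no disulfide count is given.

```python
# Sketch: estimated molar extinction coefficient at 280 nm from residue counts.
EXT_TRP = 5500      # M^-1 cm^-1 per tryptophan
EXT_TYR = 1490      # M^-1 cm^-1 per tyrosine
EXT_CYSTINE = 125   # M^-1 cm^-1 per disulfide-bonded cystine

def extinction_280(n_trp, n_tyr, n_cystine=0):
    """Estimated extinction coefficient at 280 nm in M^-1 cm^-1."""
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

ape2182_mw = 28617.7                  # Da, untagged Ape2182
eps_ape2182 = extinction_280(3, 8)    # 3 Trp, 8 Tyr; cystines assumed to be 0
a280_per_mg_ml = eps_ape2182 / ape2182_mw

print(f"Estimated extinction coefficient: {eps_ape2182} M^-1 cm^-1")
print(f"Estimated A280 of a 1 mg/mL solution (1 cm path): {a280_per_mg_ml:.2f}")
```

Phenylalanine is left out because its contribution at 280 nm is negligible, so the 13 phenylalanines do not change the estimate.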


Figure 4.27: SDS-PAGE of modified Ape2182 purification. Ape2182 eluted from the Q Sepharose in fractions 19-45 (500 mM NaCl). These fractions were further purified on the Poros HQ; Ape2182 eluted from the Poros HQ in fractions 5-15 (250 mM NaCl). Ape2182 was heated at 75 °C to further purify the protein, resulting in a fairly pure, soluble protein. Lane 1: MW ladder; lane 2: Ape2182 Q Sepharose pooled fractions 19-45; lane 3: Ape2182 Q Sepharose pooled fractions 46-66; lane 4: Ape2182 dialyzed supernatant; lane 5: Ape2182 Poros HQ pooled fractions 5-15; lane 6: heated Ape2182 pellet; lane 7: heated Ape2182 supernatant.


4.1.4 PCNA subunits characterization

Samples of Ape0162, Ape0441, and Ape2182 subunits were analyzed using dynamic light scattering in order to determine the oligomeric state and homogeneity of each of the proteins (Figure 4.28). Ape0441 has a 13.7 % polydispersity with an estimated molecular weight around 111 kDa, suggesting the protein exists as a higher multimer with a radius of 45 Å (± 6 Å). At 4 °C, Ape0162 has an estimated molecular weight around 43.0 kDa, with an 11.9 % polydispersity and a radius of 30 Å (± 4 Å). Ape2182 has an estimated molecular weight around 26.0 kDa with an 11.3 % polydispersity at 4 °C and a radius of 24 Å (± 4 Å). The molecular weight of Ape0441 is 28.8 kDa, Ape2182 is 28.6 kDa, and Ape0162 is 31.2 kDa. The estimated MW suggests that Ape0441 is a trimer, Ape0162 is a monomer, and Ape2182 is a monomer.
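The comparison behind these assignments is the ratio of the DLS-estimated molecular weight to the monomer molecular weight. The sketch below simply tabulates that ratio from the values quoted above. It is only a rough guide: DLS mass estimates are derived from the hydrodynamic radius under a shape model, so the ratio is not expected to be an exact integer.

```python
# Sketch: ratio of DLS-estimated MW to monomer MW for each subunit.
# Values are (estimated MW in kDa, monomer MW in kDa) as quoted in the text.
subunits = {
    "Ape0441": (111.0, 28.8),
    "Ape0162": (43.0, 31.2),
    "Ape2182": (26.0, 28.6),
}

ratios = {name: estimated / monomer
          for name, (estimated, monomer) in subunits.items()}

for name, ratio in ratios.items():
    print(f"{name}: estimated MW / monomer MW = {ratio:.2f}")
```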


Figure 4.28: PCNA DLS results. Each of the PCNA subunits has a polydispersity of around 11-13 %, with estimated molecular weights suggesting that Ape0162 and Ape2182 are monomers, while Ape0441 is a trimer.


The stability of each of the PCNA subunits was tested using differential scanning calorimetry. Ape0162 was very thermostable, with a single peak occurring at 98 °C, signifying the unfolding of Ape0162. No transition was seen during the down scan, suggesting that the unfolding of Ape0162 was not reversible. Ape2182 was also very thermostable, with a small peak occurring at 90 °C, corresponding to the unfolding of Ape2182. Similar to Ape0162, there was no transition during the down scan for Ape2182, showing that the unfolding of Ape2182 was not reversible. Unfolding of the subunits at such high temperatures was expected because the organism from which they originate also lives at high temperatures.

Figure 4.29: DSC thermogram of Ape2182. The TM for Ape2182 was 83.4 °C, indicating that Ape2182 is very stable at high temperatures.


The Ape0162 and Ape2182 were screened for crystallization using a variety of commercial and in house crystallization screens. The proteins were screened at ambient temperature, with drop sizes of 1 µL + 1 µL, using the PEG Ion/Natrix, Crystal Screen I and II, Wizard I and II, and Cryo I and II screens. Several crystal hits were obtained for Ape2182, where the majority of the conditions contain high salt concentrations (Table 4.1). Three of the conditions (Wizard I (47), Wizard II (38), Crystal Screen II (9)) were expanded in 4x6 screens122; the only expansion screen to produce crystals was Crystal Screen II (condition 9: 1.8 M-2.2 M NaCl, 0.1 M NaOAc pH 5.0). Nine additional expansion screens of this condition were set up to find the optimal conditions for diffraction quality crystals. Two expansion trays were set up testing various buffers (0.1 M Na Citrate pH 5, Bis Tris pH 6, Tris pH 7, Tris pH 8, Bis Tris Propane pH 9, and CAPS pH 10.5) and salt concentrations (1.4 M, 1.8 M, 2.2 M, 2.4 M and 2.6 M NaCl). Ape2182 crystals only grew in Na Citrate pH 5 at salt concentrations of 2.4 M NaCl or lower. Many of the crystals that grew appeared to be melting along the edges, as well as growing in the presence of a dark precipitate (Figure 4.30 a). Mass spectrometry verified the presence of Ape2182 in the protein crystals.
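The buffer and salt expansion described above amounts to a grid of conditions. The sketch below enumerates that grid from the six buffers and five NaCl concentrations listed in the text. How the conditions were distributed across the two trays is not stated, so the code only lists the combinations; the variable names are illustrative.

```python
# Sketch: enumerate the buffer x NaCl grid used in the expansion trays.
from itertools import product

buffers = [
    "Na Citrate pH 5",
    "Bis Tris pH 6",
    "Tris pH 7",
    "Tris pH 8",
    "Bis Tris Propane pH 9",
    "CAPS pH 10.5",
]
nacl_molar = [1.4, 1.8, 2.2, 2.4, 2.6]

# Every buffer paired with every salt concentration
conditions = list(product(buffers, nacl_molar))

for index, (buffer_name, salt) in enumerate(conditions, start=1):
    print(f"{index:2d}: {buffer_name}, {salt:.1f} M NaCl")
print(f"Total conditions: {len(conditions)}")
```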


Table 4.1: Crystal hits of Ape2182. Protein crystal hits of Ape2182 from commercial screens. Each picture is taken with a polarizer.

#   Protein   Screen                  Crystallization Condition
1   2182      Natrix (44)             10 % w/v PEG 4000, 0.05 M Tris pH 7.5, 0.2 M KCl, 0.05 M MgCl2
2   2182      Crystal Screen II (9)   2.0 M NaCl, 0.1 M NaOAc pH 4.6
3   2182      Crystal Screen I (35)   1.6 M Na3PO4, 0.1 M HEPES pH 7.5
4   2182      Wizard II (29)          1.26 M (NH4)2SO4, 0.1 M CHES pH 9.5, 0.2 M NaCl
5   2182      Wizard I (47)           1.26 M (NH4)2SO4, 0.1 M Tris pH 8.5, 0.2 M Li2SO4

Next, several more expansions were set up to optimize the buffer and pH (Na Citrate pH 5, NaOAc pH 5, NaH2PO4 pH 4) in the salt concentration range 1.2 M-2.4 M NaCl. Ape2182 crystals grew better in Na Citrate pH 5. Crystals did grow in NaOAc pH 5, but they had slight defects. Crystals in both pH 5 buffers still appeared to be melting around the edges. At pH 4, Ape2182 crystals were produced, but the crystals were smaller in size (0.1 µm compared to ≥ 0.15 µm at pH 5) and the crystals were not melting (Figure 4.30 b,c). Additional expansion trays were set up at pH 5 and pH 4 against the two different salt concentration gradients (1.3 M - 1.8 M and 1.2 M - 2.4 M) with several protein concentrations (10 mg/mL, 15 mg/mL, 20 mg/mL). The crystals formed (in the presence of dark precipitate) in each of the screens at various sizes ranging from 0.1 µm to 0.4 µm, with the crystals getting smaller in size as the salt concentration increased.


Figure 4.30: Ape2182 protein crystals. Ape2182 protein crystals from the Crystal Screen II condition 9, with variations to the crystallization conditions. A). Crystals grown at Na citrate pH 5 in 1.5 M NaCl. B). Ape2182 crystal in Na citrate pH 5, at 1.2 M NaCl with an additional 100 mM NaCl added to the protein solution. C). Crystals grown in NaH2PO4 pH 4 in 2.4 M NaCl. D). Ape2182 crystals in Na citrate pH 5, 1.4 M NaCl. Diffraction data from these crystals were collected at APS SER CAT BM 19.

For cryopreservation during data collection, Ape2182 crystals were soaked in a cryo/mother liquor solution containing 0.1 M NaOAc pH 5, 2 M NaCl, 2.5 M Na malonate; during the soak the crystals were quite stable. The crystals were frozen in a helium cryostream with a clean, clear freeze. A total of 40 crystals were screened in house (FRE-CCD) and 13 of the crystals diffracted to highest resolutions ranging from 4 Å to 3.1 Å. These 13 crystals were taken to SER CAT BM 19 (at APS) and most diffracted to around 2 Å. However, the majority of the crystals were twinned, and the data could not be processed further. One crystal (~0.3 x 0.25 µm) did not appear to be twinned and data were collected at a distance of 300 mm, for three second exposures, at 0.5 ° oscillations for 180 frames (Figure 3.39d). Two passes were collected on the crystal to collect 180 ° of data, with the crystal diffracting to around 2.1 Å (Figure 4.31).
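The data collection strategy above follows from simple arithmetic: the rotation covered per pass is the number of frames multiplied by the oscillation width. The sketch below reproduces this with the quoted parameters; the total exposure time ignores detector readout and other overhead, which are not given in the text.

```python
# Sketch: rotation range and exposure time for the two-pass data collection.
frames_per_pass = 180
oscillation_deg = 0.5     # degrees per frame
exposure_s = 3.0          # seconds per frame
passes = 2

degrees_per_pass = frames_per_pass * oscillation_deg
total_degrees = passes * degrees_per_pass
total_exposure_s = passes * frames_per_pass * exposure_s  # excludes readout overhead

print(f"Rotation per pass: {degrees_per_pass:.0f} degrees")
print(f"Total rotation over {passes} passes: {total_degrees:.0f} degrees")
print(f"Total exposure time: {total_exposure_s / 60:.0f} min")
```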

Figure 4.31: Images of Ape2182 X-ray diffraction. Ape2182 protein crystals diffracted to ~2.1 Å at APS beamline 19 SER CAT. Data were collected for 3 second exposures, at a distance of 300 mm, with 0.5 ° oscillations for 180 images. A second pass was collected, continuing from the first pass. The left image is the first image collected, while the right image is the last image collected in the first pass.

The data were processed using Mosflm in space group P 23 (cubic), as suggested by Mosflm and Pointless. There were numerous spots per image so the spot threshold was set to 70, while the autoindexing level was set to 60 in order to get a better solution. The beam center (x = 148.52; y = 146.6) and detector distance (301.5) were not refined to facilitate data processing. The data were integrated with Mosflm and scaled with SCALA (CCP4 suite), resulting in a 2.5 Å inner shell Rmerge of 12 % with 40,799 unique reflections with 100 % completeness (Table 4.2). The cell dimensions for the P 23 space group were 170.4 Å, 170.4 Å, 170.4 Å, 90 °, 90 °, 90 °. The Matthews coefficient was 2.40 with 48.8 % solvent content and 2 PCNA trimers in the asymmetric unit. The scaled data were checked for the presence of twinning using a merohedral twin detector program; data analysis indicates that the crystal is partially twinned (Figure 4.32).124
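The Matthews coefficient and solvent content quoted above can be reproduced from the cell dimensions. The following is a minimal sketch, assuming 12 asymmetric units per cell for space group P 23, six copies of untagged Ape2182 (28,617.7 Da each, i.e. two trimers) in the asymmetric unit, and the standard approximation that the solvent fraction is 1 - 1.23/V_M. The function names are illustrative.

```python
# Sketch: Matthews coefficient and solvent content for a cubic P 23 cell.
def matthews_coefficient(cell_volume_a3, asu_per_cell, mass_per_asu_da):
    """V_M in cubic angstroms per dalton."""
    return cell_volume_a3 / (asu_per_cell * mass_per_asu_da)

def solvent_fraction(v_m):
    """Approximate solvent fraction from V_M (standard 1.23 constant)."""
    return 1.0 - 1.23 / v_m

cell_edge = 170.4                 # angstroms, a = b = c
cell_volume = cell_edge ** 3      # cubic cell, all angles 90 degrees
asu_per_cell = 12                 # symmetry operators in P 23
monomer_mass = 28617.7            # Da, untagged Ape2182
copies_per_asu = 6                # two trimers

v_m = matthews_coefficient(cell_volume, asu_per_cell, copies_per_asu * monomer_mass)
solvent = solvent_fraction(v_m)

print(f"V_M = {v_m:.2f} A^3/Da")
print(f"Solvent content = {solvent:.1%}")
```

With these inputs the calculation returns values matching the 2.40 and 48.8 % reported above.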

Figure 4.32: Analysis of Ape2182 data for twinning. The Ape2182 data were analyzed using the Padilla-Yeates server. The red line is the theoretical untwinned, the red curve is the theoretical twinned and the blue line is the data. The blue line is above the red, suggesting the crystal is partially twinned.124

The detwinning operation was performed on the intensities, not on the structure factors. The data were therefore rescaled to 2.8 Å without truncating the data (Table 3.4). In both cases the Rmerge was relatively high, perhaps because the data were twinned or were processed in an incorrect space group. The output from SCALA was then analyzed to get the twin fraction (0.18), which was then utilized to detwin the data using the CCP4 program Detwin. The Truncate program was used to convert the intensities to structure factors and the detwinned data were verified with the SFCheck program.127 The Rmerge decreased to 4.5 %, the completeness dropped to 90.6 %, and the transformation to detwin the data was +k, h, -l.
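The reason detwinning works on intensities can be seen from the standard algebra for a two-domain merohedral twin: each observed intensity is a weighted sum of the true intensities of two twin-related reflections, and the pair of equations can be inverted when the twin fraction is known and is not 0.5. The sketch below illustrates that algebra with the twin fraction of 0.18 quoted above. It is not a description of how the Detwin program is implemented, and the intensity values are made up for the round-trip check.

```python
# Sketch: twinning and detwinning of a pair of twin-related intensities.
def twin(i1, i2, alpha):
    """Observed intensities for true intensities i1, i2 and twin fraction alpha."""
    return (1 - alpha) * i1 + alpha * i2, alpha * i1 + (1 - alpha) * i2

def detwin(j1, j2, alpha):
    """Recover true intensities from observed twin-related intensities."""
    denominator = 1 - 2 * alpha
    if abs(denominator) < 1e-9:
        raise ValueError("Perfect twin (alpha = 0.5) cannot be detwinned this way")
    return (((1 - alpha) * j1 - alpha * j2) / denominator,
            ((1 - alpha) * j2 - alpha * j1) / denominator)

alpha = 0.18                      # twin fraction from the text
true_pair = (1000.0, 200.0)       # hypothetical true intensities
observed_pair = twin(*true_pair, alpha)
recovered_pair = detwin(*observed_pair, alpha)

print("Observed:", observed_pair)
print("Recovered:", recovered_pair)
```

Because the inversion divides by 1 - 2α, measurement errors are amplified as the twin fraction approaches 0.5, and weak reflections can end up with unusable intensities after detwinning.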

Table 4.2: Ape2182 X-ray data processing statistics for truncated and untruncated data. Ape2182 X-ray data were processed as space group P23 (cubic) and truncated. The data were determined to be partially twinned, so the data had to be untruncated in order to detwin the data. The statistics were very similar whether the data contained intensities or structure factors.

Statistic             2182 data (truncated)    2182 data (untruncated)
Rmerge                13.6 %                   12.0 %
Total Reflections     600,359                  440,172
Unique Reflections    57,081                   40,799
I/(σ I)               18.0                     20.0
Completeness (%)      100                      100
Multiplicity          10.5                     10.8
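As a quick consistency check on Table 4.2, the multiplicity should equal the total number of reflections divided by the number of unique reflections. A short sketch using only the values from the table:

```python
# Sketch: multiplicity = total reflections / unique reflections (Table 4.2).

STATS = {
    "truncated": {"total": 600_359, "unique": 57_081},
    "untruncated": {"total": 440_172, "unique": 40_799},
}


def multiplicity(total, unique):
    """Average number of observations per unique reflection."""
    return total / unique


for name, counts in STATS.items():
    value = multiplicity(counts["total"], counts["unique"])
    print(f"{name}: multiplicity = {value:.1f}")
```

Both ratios round to the multiplicities listed in the table (10.5 and 10.8).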

Several molecular replacement (MR) programs (Phaser and MolRep) were unsuccessful in generating a model for the detwinned Ape2182 data using the solved structures of PCNA from S. solfataricus and P. furiosus. The resolution range of the detwinned Ape2182 data was decreased to 10-3 Å for the molecular replacement, but still no solutions were obtained. The search models were also varied for each species; the PCNA from S. solfataricus is a heterotrimer, while the P. furiosus PCNA is a homotrimer, so the PDB entry for Pfu contains only a monomer. Phaser and MolRep were used to search for one trimer using the S. solfataricus atomic coordinates, as well as for three monomers using the P. furiosus atomic coordinates. In both cases, neither program was able to produce a proper model to fit the Ape2182 data. The PFU and SSO coordinates were converted to poly-alanine chains and used as search models, but no correct solutions resulted. Since only Ape2182 was in the crystal, each MR program also searched for one or two subunits in the asymmetric unit. Phaser searched for models in alternative space groups within the cubic system with both SSO and PFU. Despite all the attempts, a model for the Ape2182 data could not be created. The presence of the twinning hindered both the processing and modeling of the Ape2182 data. Since several structures of PCNA have been solved, including heterotrimeric and homotrimeric clamps, the focus was returned to analyzing the protein-protein interactions with other repair proteins that function on the lagging strand.

4.1.5 Ape DNA ligase

Ape DNA ligase was amplified using PCR and cloned into pET21 by a former undergraduate student (Nate George). The project for this dissertation began with sequencing and expression of the ligase/pET21 construct. Sequencing confirmed proper insertion into the plasmid with the position of the stop codon directly after the protein, preventing the addition of the C-terminal hexahistidine tag. Several cell lines were tested for DNA ligase expression, including BL21 DE3 RIL, RILP, ROS2 plysS, ROS Blue, and T7 express. Expression of DNA ligase was seen in T7 express with the addition of a rare codon plasmid (RILP), with a band appearing around 65 kDa (Figure 4.33 a). The theoretical molecular weight of ligase is 69,196.2 Da; therefore, the protein was migrating to a lower position on the gel than expected. The further migration of the band might be due to the amount of SDS attached to the protein. The protein was partially extracted during lysis with a low salt buffer (100 mM Tris pH 8.0, 100 mM NaCl, 5 mM EDTA) (Figure 4.33 b). This DNA ligase lysate was then heated at 75 °C for one hour, with the formation of a white precipitate. DNA ligase remained soluble during heating, and PEI was added to the heated supernatant in order to remove any DNA from the solution. Upon addition of the PEI, a white precipitate also formed, but ligase remained soluble and fairly pure after the heating and PEI steps (Figure 4.33 b). A second band (~52 kDa), below the DNA ligase band, appeared after the heating step and became more prevalent after the addition of PEI. The tryptic digest of this band was analyzed using MS (Michigan Proteome Consortium [MPC]). Upon analysis, 24 peptides were identified to be APE DNA ligase with 100 % certainty (Figure 4.34). The peptides were aligned manually, and the alignment suggests that part of the N-terminal region of DNA ligase is missing in the truncated band. The sequence obtained by MPC was not the full sequence of APE DNA ligase; the sequence was missing the first 17 amino acids, and thus a total of 124 amino acids are unaccounted for with the MS analysis. Since we are interested in studying the full length DNA ligase, the next step was to separate both bands from each other during purification or prevent the truncation during the lysis step.

[Figure 4.33 image: SDS-PAGE gels, panels A and B. Marker positions 66.3 kDa and 55.4 kDa; labeled bands: DNA ligase (69.2 kDa) and truncated DNA ligase.]

Figure 4.33: SDS-PAGE of DNA ligase/pET21 expression and lysis. A). DNA ligase was expressed in T7 express (RILP plasmid) with a protein band appearing below 66.3 kDa. B). Low salt lysis of DNA ligase isolates the protein to the pellet and lysate. Lane 1: DNA ligase low salt lysis pellet; lane 2: DNA ligase low salt lysate; lane 3: DNA ligase heated pellet; lane 4: DNA ligase heated supernatant; lane 5: DNA ligase/PEI pellet; lane 6: DNA ligase/PEI supernatant.

  1  MGCLVLASSS GGVGGGDMPF KPVAEAFASM ERITSRTQLT LLLTRLFKST PPGAIGIVVY
 61  LIQGKLGPDW KGLPELGVGE KLLVKAIALA YKATEERVER YKSVGDLGSV AERLSREYRS
121  RAARAVTLEA FMAGGGEALT VRRVYNTLYR IAMAQGEGSR DIKLRLLAGL LADAEPVEAK
181  YIVRFVEGRL RVGVGDATVL DALAMAFGGG AHARPVIERA YNLRADLGYI AEVVAREGVD
241  ALRGVKPQVG VPIRPMLAER RDPAEILRKV GGRAVVEYKY DGERAQIHKK DGEVYIYSRR
301  LENITRMFPD VVEMARKGLK AGEAIVEGEI VAVDPDNYEI QPFQVLMQRK RKHDIHRVMR
361  EVPVAVFLFD ALYVDGEDLT SKPLPERRRR LKEIVVETPL WRLAESIETS DPEELWTFFL
421  KAIEEGAEGV MVKAVHRDSV YTAGVRGWLW VKLKRDYKSE MMDTVDLVVV GAFYGRGKRG
481  GKLSSLLMAA YDPDRDVFPT VCKVATGFTD EELDRMNEML KKHIIPRKHP RVESRIEPDV
541  WVEPALVAEI LGAELTLSPM HTCCLNTVRP GVGISIRFPR FIRWRDDKSP EDATTTHELL
601  EMYKRQLRRV EEPAEQV

Figure 4.34: DNA ligase sequence comparison with MS peptides. The complete protein sequence of DNA ligase (black) overlapped with the peptides from the tryptic digest of the 52 kDa band analyzed by MS (red). 124 amino acids from the N-terminus are missing in the MS analysis.
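The masses discussed for the full length and truncated ligase can be estimated directly from the sequence. The following is a rough sketch, assuming that the sequence is read row by row from the 10-residue blocks of Figure 4.34, that standard average residue masses apply, and that the truncated species is simply the chain without its first 124 residues. The MS peptide coverage alone does not prove that last assumption, and the name `fragment` is hypothetical.

```python
# Sketch: average mass of a protein chain from its sequence.

ROWS = [
    "MGCLVLASSS GGVGGGDMPF KPVAEAFASM ERITSRTQLT LLLTRLFKST PPGAIGIVVY",
    "LIQGKLGPDW KGLPELGVGE KLLVKAIALA YKATEERVER YKSVGDLGSV AERLSREYRS",
    "RAARAVTLEA FMAGGGEALT VRRVYNTLYR IAMAQGEGSR DIKLRLLAGL LADAEPVEAK",
    "YIVRFVEGRL RVGVGDATVL DALAMAFGGG AHARPVIERA YNLRADLGYI AEVVAREGVD",
    "ALRGVKPQVG VPIRPMLAER RDPAEILRKV GGRAVVEYKY DGERAQIHKK DGEVYIYSRR",
    "LENITRMFPD VVEMARKGLK AGEAIVEGEI VAVDPDNYEI QPFQVLMQRK RKHDIHRVMR",
    "EVPVAVFLFD ALYVDGEDLT SKPLPERRRR LKEIVVETPL WRLAESIETS DPEELWTFFL",
    "KAIEEGAEGV MVKAVHRDSV YTAGVRGWLW VKLKRDYKSE MMDTVDLVVV GAFYGRGKRG",
    "GKLSSLLMAA YDPDRDVFPT VCKVATGFTD EELDRMNEML KKHIIPRKHP RVESRIEPDV",
    "WVEPALVAEI LGAELTLSPM HTCCLNTVRP GVGISIRFPR FIRWRDDKSP EDATTTHELL",
    "EMYKRQLRRV EEPAEQV",
]
SEQUENCE = "".join(ROWS).replace(" ", "")

# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def average_mass(seq):
    """Sum of residue masses plus one water for the chain termini."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


full_length = average_mass(SEQUENCE)
fragment = average_mass(SEQUENCE[124:])  # assumption: residues 125 onward
print(f"{len(SEQUENCE)} residues, full length ~ {full_length / 1000:.1f} kDa")
print(f"{len(SEQUENCE[124:])} residues, fragment ~ {fragment / 1000:.1f} kDa")
```

Comparing the fragment estimate with the apparent size of the lower band on the gel gives a rough check on whether loss of the N-terminal region alone could account for the truncated species.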


DNA ligase is a basic protein with a theoretical pI around 8.71; therefore, cation exchange chromatography was used to purify the protein. Samples from both column purifications were analyzed by SDS-PAGE to verify the purity and presence of the protein (Figure 4.35). Both the SP and HS columns were equilibrated with buffer A (25 mM Tris pH 7.5, 50 mM NaCl, 1 % glycerol), and the conductivity of the samples containing DNA ligase was reduced below the conductivity of buffer A. DNA ligase eluted from the SP column in a narrow peak (~ 200 mM) during the 10 CV linear gradient run (buffer B: 25 mM Tris pH 7.5, 1 M NaCl, 1 % glycerol) (Figure A13). Only two bands were present in the single peak, DNA ligase and the truncated DNA ligase band. DNA ligase was also present in the SP effluent, but this material was not saved for further purification. The protein was fairly pure towards the beginning of the peak, but at the end of the peak several additional contaminants appeared (Figure 4.35). The SP fractions containing DNA ligase were pooled and the conductivity of the sample was decreased by dialysis (25 mM Tris pH 7.5, 75 mM NaCl, 1 % glycerol). DNA ligase eluted from the HS column in a broad peak (~ 375 mM), split into two peaks, during a 10 CV linear gradient run (Figure A14). The two peaks contained both the DNA ligase and truncated DNA ligase, with the truncated DNA ligase becoming more prominent towards the middle and end of the peak. The presence of two different contaminants (bands ~ 12 kDa and 8 kDa), each eluting in a different peak, could be contributing to the separation of a single peak into the two peaks present on the chromatogram. The concentration of the truncated DNA ligase, more prominent in the beginning of the run, could also be contributing to the two peaks. Despite the presence of the truncated DNA ligase, the protein samples were fairly pure after both column purifications.
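The elution concentrations above can be related to positions in the salt gradient. This is a minimal sketch, assuming that each 10 CV gradient ran linearly from 0 % to 100 % buffer B and that the same buffers A and B were used on both columns; neither detail is stated explicitly in the text.

```python
# Sketch: convert an elution NaCl concentration into % buffer B and
# into a position within a linear gradient.

BUFFER_A_NACL_MM = 50.0     # NaCl in buffer A, from the text
BUFFER_B_NACL_MM = 1000.0   # NaCl in buffer B (1 M), from the text
GRADIENT_CV = 10.0          # gradient length in column volumes, from the text


def percent_b(nacl_mm):
    """Percent buffer B that gives the requested NaCl concentration."""
    span = BUFFER_B_NACL_MM - BUFFER_A_NACL_MM
    return 100.0 * (nacl_mm - BUFFER_A_NACL_MM) / span


def gradient_position_cv(nacl_mm):
    """Column volumes into a 0-100 % B linear gradient (an assumption)."""
    return GRADIENT_CV * percent_b(nacl_mm) / 100.0


for column, nacl in (("SP Sepharose", 200.0), ("Poros HS", 375.0)):
    print(f"{column}: {nacl:.0f} mM NaCl ~ {percent_b(nacl):.1f} % B, "
          f"{gradient_position_cv(nacl):.2f} CV into the gradient")
```

Under these assumptions the SP peak sits at roughly 16 % B and the HS peak at roughly 34 % B, so a shallower gradient over that range is one way to try to resolve the full length and truncated species.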

[Figure 4.35 image: SDS-PAGE gels, panels A (lanes 1-12) and B (lanes 1-15). Marker positions 66.3 kDa and 55 kDa; band labeled Ligase 69.2 kDa.]

Figure 4.35: SDS-PAGE of DNA ligase purification. A). DNA ligase eluted from the SP Sepharose column in 230-480 mM NaCl (conductivity range 12.9-37.1 mS/cm). Lane 1: MW ladder; lane 2: DNA ligase SP Sepharose flow through; lane 3: DNA ligase SP Sepharose fraction 8; lane 4: DNA ligase SP Sepharose fraction 9; lane 5: DNA ligase SP Sepharose fraction 10; lane 6: DNA ligase SP Sepharose fraction 11; lane 7: DNA ligase SP Sepharose fraction 12; lane 8: DNA ligase SP Sepharose fraction 13; lane 9: DNA ligase SP Sepharose fraction 14; lane 10: DNA ligase SP Sepharose fraction 15; lane 11: DNA ligase SP Sepharose fraction 16; lane 12: DNA ligase SP Sepharose fraction 17. B). DNA ligase eluted from the Poros HS column in 310-480 mM NaCl (conductivity range 29.6-44.9 mS/cm). Lane 1: MW ladder; lane 2: DNA ligase Poros HS flow through; lane 3: DNA ligase Poros HS fraction 22; lane 4: DNA ligase Poros HS fraction 23; lane 5: DNA ligase Poros HS fraction 24; lane 6: DNA ligase Poros HS fraction 25; lane 7: DNA ligase Poros HS fraction 26; lane 8: DNA ligase Poros HS fraction 27; lane 9: DNA ligase Poros HS fraction 28; lane 10: DNA ligase Poros HS fraction 29; lane 11: DNA ligase Poros HS fraction 30; lane 12: DNA ligase Poros HS fraction 31; lane 13: DNA ligase Poros HS fraction 32; lane 14: DNA ligase Poros HS fraction 33; lane 15: DNA ligase Poros HS fraction 34.

The pooled DNA ligase fractions were placed in centrifugal concentrators, and the protein started to precipitate as its concentration increased. A solubility screen was performed on the precipitated material (containing both full length and truncated versions of DNA ligase), and the proteins were most soluble at pH 5.6 and 6.5, in the presence of sodium, magnesium, calcium, citrate, and sulfate (Figure 4.36). Relative protein concentration was determined using a Bradford assay reagent; therefore, it is not possible to determine whether the full length and/or truncated protein was resolubilizing in each of the salts and buffers unless the samples were run on an SDS-PAGE.

[Figure 4.36 chart: Ape DNA ligase solubility study. Abs 595 nm for each 100 mM salt/buffer: Supernatant 0.43451; H2O 0.046043; TAPS 0.12172; HEPES 0.036574; PIPES 0.42966; MES 0.57633; Citrate 0.47024; Phosphate 0.26437; Sulfate 0.44827; Cacodylate 0.12083; Acetate 0.1392; Formate 0.1385; CaCl2 0.71958; MgCl2 1.2112; LiCl 0.22367; KCl 0.20558; NaCl 0.27971; NH4Cl 0.23257.]

Figure 4.36: DNA ligase solubility screen. The results from the solubility screen on precipitated full length and truncated DNA ligase. The solubilized supernatant was checked for the presence of protein using the Bradford assay reagent. The proteins were most soluble at a pH of 6.5 and 5.6, and in sodium, magnesium, calcium, sulfate, and citrate.

Using the results from the solubility screen, an optimized lysis buffer was created in order to isolate more protein during lysis. DNA ligase was again located in both the pellet and the lysate after lysis with a modified lysis buffer (100 mM Bis Tris pH 6.5, 50 mM NaCl, 10 mM MgCl2, 5 mM EDTA). The pellet was resuspended in 100 mM Bis Tris pH 6.5, 20 mM MgCl2 and a variety of salts were added to the resuspension. The resuspended protein samples were subjected to a variety of heating steps in order to solubilize the protein. DNA ligase was resolubilized in 1 M NaCl, 1 M NH4Cl and after heating to 90 °C (with addition of 5 mM CaCl2). Full length DNA ligase was insoluble in 1 M (NH4)2SO4 while the truncated DNA ligase was soluble. Since an increased amount of protein was solubilized with 1 M NaCl, another lysis was performed in the presence of 2 M NaCl (with 100 mM Bis Tris pH 6.5, 5 mM EDTA, 2 mM BME, 10 mM ZnOAc), but the majority of the protein was still located in the pellet. The pellet was heated to 90 °C for one hour, but the protein remained in the pellet. This pellet was tested in the presence of 1 M salts (MgCl2, NaCl) with and without 10 mM ZnOAc, all samples incubated at 90 °C, and the protein was solubilized in all cases. DNA ligase could also be solubilized in CAPS pH 10.5 with 1 M NaCl and 10 mM MgCl2. It appeared that the presence of ZnOAc, in tandem with the protease inhibitor AEBSF and the protease cocktail tablet, limited the amount of truncated DNA ligase produced during the lysis step.

A modified two-step extraction was developed to solubilize the DNA ligase (Figure 4.37). The expression cells were first lysed in 100 mM Bis Tris pH 6.5, 100 mM NaCl, 10 mM MgCl2, 5 mM CaCl2, and 10 mM ZnOAc. The lysate was heated to 75 °C for one hour, followed by the addition of PEI (final concentration 0.5 %) and then centrifugation to pellet the precipitate, resulting in soluble full length DNA ligase (Figure 4.37). The pellet from the lysis was resuspended in the above buffer with the addition of 1 M NaCl and heated to 75 °C for one hour, followed by the addition of PEI (final concentration 0.5 %). The precipitate was removed by centrifugation, resulting in soluble full length DNA ligase (Figure 3.48).

DNA ligase lysis
Resuspend expression cells in lysis buffer (100 mM Bis Tris pH 6.5, 100 mM NaCl, 10 mM MgCl2, 10 mM ZnOAc, 5 mM CaCl2, AEBSF, protease inhibitor tablet)

Stir cells at RT with lysozyme for 20 min, then sonicate and centrifuge.
Lysate (* lane 3): Heat 75 °C, 1 hr (lane 4); add PEI (0.5 %); centrifuge; pellet and lysate (* lane 5).
Pellet (* lane 2): Resuspend pellet in above lysis buffer with 1 M NaCl (pellet lane 6, supernatant lane 7); heat 75 °C, 1 hr (* lane 8); add PEI (0.5 %); centrifuge; pellet and lysate (* lane 9).

Figure 4.37: Modified lysis and extraction procedure for DNA ligase. DNA ligase was located in both the pellet and lysate after lysis with the optimal buffer and salt conditions. Additional protein was extracted from the pellet using a high salt buffer. Both lysate and pellet extraction samples were heated to 75 °C and PEI was used to remove extraneous DNA fragments. The red asterisks represent the presence of DNA ligase, while the red # corresponds to the sample wells on the gel. Lane 1: MW ladder; lane 2: DNA ligase lysis pellet; lane 3: DNA ligase lysis lysate; lane 4: DNA ligase heated lysate; lane 5: DNA ligase heated, PEI treated lysate; lane 6: DNA ligase resuspended pellet; lane 7: DNA ligase resuspended supernatant; lane 8: DNA ligase heated resuspended supernatant; lane 9: DNA ligase heated, PEI treated resuspended supernatant.


Purified DNA ligase was screened for protein crystals using a variety of commercial and in house screens. Seven different screens at two temperatures (ambient or 4 °C), protein concentrations (9-15 mg/mL), drop sizes (0.5 +0.5 µL or 1+1 µL), and the presence of ATP or ADP were tested for protein crystal production (Table 4.3). The majority of the screens were performed with both full length and truncated DNA ligase, either of which could initiate the nucleation of crystals.

Table 4.3: Commercial and in house crystallization screens used for crystallization studies of APE DNA ligase. RT temperature corresponds to ambient temperature of the lab. The drop size consists of protein : screen condition. # of setup specifies the number of times the complex was screened against that particular crystallization screen.

Protein/(tray #)               Temperature (°C)   Drop size (µL)   Crystallization Screen     # of setup
DNA ligase (1)                 RT                 0.5 + 0.5        Index                      1
DNA ligase (2)                 RT                 0.5 + 0.5        Crystal Screen I and II    1
DNA ligase (3)                 RT                 0.5 + 0.5        Wizard I and II            1
DNA ligase (4 & 7)             RT                 0.5 + 0.5        PEG Ion/Natrix             2
DNA ligase (5)                 RT                 0.5 + 0.5        Salt RX                    1
DNA ligase (6)                 RT                 0.5 + 0.5        Additive                   1
DNA ligase (7)                 RT                 0.5 + 0.5        Cryo I and II              1
DNA ligase + 1 mM ATP (9)      4                  1 + 1            Crystal Screen I and II    1
DNA ligase + 1 mM ATP (10)     4                  1 + 1            Wizard I and II            1
DNA ligase + 1 mM ATP (11)     4                  1 + 1            PEG Ion/Natrix             1
DNA ligase + 50 µM ADP (12)    RT                 1 + 1            PEG Ion/Natrix             1
DNA ligase + 50 µM ADP (13)    RT                 1 + 1            Crystal Screen I and II    1
DNA ligase + 50 µM ADP (14)    RT                 1 + 1            Wizard I and II            1
DNA ligase + 50 µM ADP (15)    RT                 1 + 1            Additive                   1
DNA ligase + 50 µM ADP (16)    RT                 1 + 1            Cryo I and II              1

Several conditions produced crystals that were further investigated to determine whether the crystals contained protein (Table 4.4). Trays 1-4 were set up with DNA ligase (25 mM HEPES pH 7.5, 300 mM NaCl) at 15.6 mg/mL with a drop size of 0.5 + 0.5 µL at ambient temperature. Many drops were either clear, precipitated, or contained a darker precipitate. Several conditions in the Index screen produced similar crystalline precipitate clusters. There were crystal hits from Crystal Screen I (A9) and Wizard I (G1 and H5), but there was no definite confirmation that the crystals were protein. Trays 5-8 were set up with DNA ligase (25 mM Bis Tris pH 6.5, 50 mM NaCl, 10 mM MgCl2) at 12.3 mg/mL with a drop size of 0.5 + 0.5 µL at ambient temperature. The majority of the drops were clear, with precipitate and dark precipitate in some of the drops. Crystal hits were obtained from the Additive screen (C3, E2, E8, F2, D8) and Cryo II (G8), but many of the crystals did not absorb the IZIT dye. Trays 9-11 were set up with DNA ligase (25 mM Bis Tris pH 6.5, 175 mM NaCl, 10 mM MgCl2) at 8.5 mg/mL with 1 mM ATP in a drop size of 1 + 1 µL at 4 °C. Many of the drops contained a white or dark precipitate, with only one crystal hit appearing in Crystal Screen I (C12), two crystal hits in Wizard I, and two crystal hits in Wizard II. The rest of the wells were clear or had some crystalline precipitate. The majority of the crystal hits were in high salt conditions and resulted in the formation of salt crystals. Tray 11 was predominantly precipitate in the PEG Ion screen, while the Natrix screen was a mixture of clear, crystalline precipitate, and precipitated drops. Trays 12-16 were set up with DNA ligase (25 mM Bis Tris pH 6.5, 400 mM NaCl, 5 mM MgCl2, 1 % glycerol) with 50 µM ADP at 15.1 mg/mL with a drop size of 1 + 1 µL at ambient temperature. Drops were either clear or contained light precipitate or precipitate; several crystal hits appeared in two different screens, Wizard I and II (A2, B5, B8, E12) and Cryo I and II (A4, B4). These crystal hits were suspected to be salt, since the crystals did not absorb the IZIT dye.


Table 4.4: Crystal hits of APE DNA ligase protein. Crystal hits of DNA ligase, including commercial screen and crystallization conditions. Many of the crystal hits were dyed with the IZIT dye to test for the presence of protein in the crystals.

#    Protein                   Screen                     Crystallization Condition
1    DNA ligase                Index                      -----------
2    DNA ligase                Crystal Screen II (17)     35 % (v/v) tert-butanol; 0.1 M Sodium citrate pH 5.6
3    DNA ligase                Wizard I (7)               10 % (w/v) PEG 8,000; 0.1 M MES pH 6.0; 0.2 M Zn(OAc)2
4    DNA ligase                Wizard I (40)              10 % (v/v) 2-propanol; 0.1 M MES pH 6.0; 0.2 M Ca(OAc)2
5    DNA ligase                Additive (13)              1 M (NH4)2HPO4
6    DNA ligase                Additive (D8)              20 % (w/v) PEG 4,000; 10 mM Sodium Iodide
7    DNA ligase                Additive (F2)              2 M (NH4)2HPO4
8    DNA ligase                Cryo II (G8)               50 % (w/v) PEG 200; 0.1 M Na/K PO4 pH 6.2; 0.2 M NaCl
9    DNA ligase + 1 mM ATP     Wizard I (9)               1 M (NH4)2HPO4; 0.1 M acetate pH 4.5
10   DNA ligase + 1 mM ATP     Crystal Screen II (C12)    50 % (v/v) MPD; 0.1 M Tris pH 8.5; 0.2 M (NH4)2HPO4
11   DNA ligase                Additive (C3)              20 % (w/v) PEG 20,000
12   DNA ligase + 50 µM ADP    Wizard II (10)             20 % (w/v) PEG 2,000 MME; 0.1 M Tris pH 7
13   DNA ligase + 50 µM ADP    Wizard II (45)             1.26 M (NH4)2HPO4; 0.1 M MES pH 6.0
14   DNA ligase + 50 µM ADP    Cryo I (26)                40 % (w/v) PEG 200; 0.1 M CHES pH 9.5; 0.2 M NaCl

An expansion tray was set up for the Additive C3 and D8 conditions to optimize the crystals, but no crystals formed. The condition C3 expansion tested a PEG 20,000 gradient from 5 % to 24 %. The condition D8 expansion tested a PEG 4,000 gradient from 5 % to 35 %. These conditions resulted in precipitate or an oily precipitate in each of the drops. DNA ligase was also prepared for MS analysis to investigate the protein-protein interactions between ligase and the PCNA subunits. A buffer exchange replaced the original ligase buffer with 300 mM NH4OAc.


4.1.6 Ape DNA polymerase B

The Ape DNA polymerase (ApePolB) gene was amplified from the A. pernix genome using an annealing temperature of 60 °C (forward primer: 5’ CACCATGAGGGGGTCAACCCCC 3’; reverse primer: 5’ TTATTTCCCCCGCCTCATGAAG 3’) (Figure 4.38 a). A band around 2,354 bp was isolated and inserted into pET101. A stop codon was placed before the hexahistidine tag, preventing the addition of a hexahistidine tag onto the ApePolB gene. This cloning reaction was performed three times until two plasmids were isolated around 8,100 bp (gene + vector ~ 8,107 bp) (Figure 4.38 b). The ApePolB/pET101 construct was sequenced, verifying the correct sequence for DNA polymerase (construct # 4).
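The two primers can be inspected with a few lines of code. This is a sketch only: the melting temperatures use the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is a rough approximation for primers of this length and is not necessarily how the 60 °C annealing temperature was chosen. The helper names are hypothetical.

```python
# Sketch: basic checks on the ApePolB primers quoted in the text.

FORWARD = "CACCATGAGGGGGTCAACCCCC"
REVERSE = "TTATTTCCCCCGCCTCATGAAG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(f"{name}: {len(primer)} nt, GC = {gc_fraction(primer):.2f}, "
          f"Wallace Tm ~ {wallace_tm(primer)} C")

# The forward primer carries a CACC prefix directly ahead of the ATG start.
print("forward starts with CACC + ATG:", FORWARD.startswith("CACCATG"))
# Read on the coding strand, the reverse primer ends in a stop codon.
print("coding-strand end of reverse primer:", reverse_complement(REVERSE)[-3:])
```

On the coding strand the reverse primer ends in TAA, consistent with the statement that a stop codon precedes the vector-encoded hexahistidine tag.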

[Figure 4.38 image: agarose gels, panels A (lanes 1-2) and B (lanes 1-8). Ladder labels: 8 kb, 5 kb, 3,675 bp, 2,323 bp; band labeled PolB 2,354 bp.]

Figure 4.38: Amplification and cloning of Ape DNA polymerase B. A). PCR product of ApePolB resulting in a band around 2,354 bp. Lane 1: BSTE digested λ ladder (linear); lane 2: ApePolB gene. B). ApePolB/pET101 constructs with the correct size of the construct seen in lanes 5 and 6 (~8,100 bp). Lane 1: supercoiled DNA plasmid ladder; lane 2: ApePolB/pET101 construct # 1; lane 3: ApePolB/pET101 construct # 2; lane 4: ApePolB/pET101 construct # 3; lane 5: ApePolB/pET101 construct # 4; lane 6: ApePolB/pET101 construct # 5; lane 7: ApePolB/pET101 construct # 6; lane 8: supercoiled DNA plasmid ladder.


The ApePolB/pET101 construct was transformed into several cell lines (RILP, RIL, ROS 2 plysS, T7 express/RILP plasmid) to test for protein expression, but no expression was seen in any of the cell lines. These cell lines were used several times to express PolB and in each case expression was not achieved. Since ApePolB did not express in pET101, another expression vector, pDEST C1, was tested for protein expression. The ApePolB gene was inserted into the cloning vector pENTR-D and transformed into Top10 cells. Several plasmids were isolated from the colonies, resulting in a band around 5,000 bp (ApePolB/pENTR-D 4,934 bp) (Figure 4.39 a). A transposase reaction was utilized to transfer the ApePolB gene from pENTR-D into pDESTC1, then the reaction mixture was transformed into DH5α cells. Several plasmids were isolated, resulting in bands around 6,100 bp (ApePolB/pDEST C1 6,089 bp) (Figure 4.39 b). This ApePolB/pDEST C1 construct was sequenced, indicating proper gene insertion with the addition of the N-terminal hexahistidine tag.
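The expected plasmid sizes used to judge the gels are simply the insert plus the vector sequence retained in each construct. This is a small sketch, assuming the same 2,354 bp insert in every construct and back-calculating the vector contribution from the construct sizes quoted in the text; the function name is hypothetical.

```python
# Sketch: vector contribution to each ApePolB construct (sizes from the text).

INSERT_BP = 2_354  # ApePolB PCR product

CONSTRUCTS_BP = {
    "ApePolB/pET101": 8_107,
    "ApePolB/pENTR-D": 4_934,
    "ApePolB/pDEST C1": 6_089,
}


def vector_contribution(construct_bp, insert_bp=INSERT_BP):
    """Base pairs of the construct that do not come from the insert."""
    return construct_bp - insert_bp


for name, size in CONSTRUCTS_BP.items():
    print(f"{name}: {size} bp total, "
          f"{vector_contribution(size)} bp outside the insert")
```

Because supercoiled plasmids migrate differently from linear DNA, such sizes are best compared against a supercoiled ladder, as was done in Figures 4.38 and 4.39.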

[Figure 4.39 image: agarose gels, panels A (lanes 1-5) and B (lanes 1-3). Ladder labels: 7 kb, 6 kb, 5 kb; bands labeled PolB/pENTR-D 4,934 bp and PolB/pDEST C1 6,089 bp.]

Figure 4.39: Cloning of DNA polymerase B into pENTR-D and pDEST C1. A). Insertion of ApePolB into pENTR-D resulting in a band around 5,000 bp (4,934 bp). Lane 1: Supercoiled DNA plasmid ladder; lane 2: ApePolB/pENTR-D construct #1; lane 3: ApePolB/pENTR-D construct # 2; lane 4: ApePolB/pENTR-D construct # 3; lane 5: ApePolB/pENTR-D construct # 4. Each construct appeared to be the correct size (5,000 bp). B). Insertion of ApePolB into pDEST C1 resulting in a band around 6,100 bp (6,089 bp). Lane 1: Supercoiled DNA plasmid ladder; lane 2: ApePolB/pDEST C1 construct # 1; lane 3: ApePolB/pDEST C1 construct # 2. ApePolB/pDEST C1 construct # 2 appeared to be the correct size (6,100 bp) and was sequenced properly.

The ApePolB/pDEST C1 construct was transformed into RIL, RILP, Ros2 plysS, T7 express (+ RILP plasmid), and BL21 star DE3 (+ RILP plasmid), with protein expression occurring in only the BL21 star DE3 cell line (Figure 4.40 a). A band around 93.0 kDa was present in the three hour sample after inducing protein production. A 6 L expression of this protein produced ~35 g of cell pellet. ApePolB was located in the lysis pellet after using a high pH, no salt lysis buffer (25 mM Tris pH 8.0, 5 mM EDTA, 5 % glycerol). Since the theoretical pI of ApePolB is around 8.9, ApePolB may be insoluble because the pH of the lysis buffer was too close to the pI of the protein. The lysis pellet was resuspended in 50 mM Bis Tris pH 6.5 and several salt concentrations and temperatures were used to attempt to solubilize ApePolB. Only a small amount of ApePolB was soluble in 100 mM NaCl when the sample was heated to 75 °C for one hour. The protein was insoluble in 1 M NaCl, whether the protein was incubated at ambient temperature or 75 °C (Figure 4.40 b).

[Figure 4.40 image: SDS-PAGE gels, panels A (lanes 1-3) and B (lanes 1-11). Marker positions 97.4 kDa and 66.3 kDa; band labeled PolB 93.0 kDa.]

Figure 4.40: Expression and lysis studies of DNA polymerase B. A). Expression of ApePolB in BL21 star DE3 with the RILP plasmid produces a protein band around 93 kDa. Lane 1: ApePolB zero hour expression; lane 2: ApePolB three hour expression; lane 3: MW ladder. B). Solubility study on the insoluble ApePolB with small amount of ApePolB found in heated samples. Lane 1: ApePolB pH 6.5 RT pellet; lane 2: ApePolB pH 6.5 RT supernatant; lane 3: ApePolB pH 6.5 75 °C pellet; lane 4: ApePolB pH 6.5 75 °C supernatant; lane 5: ApePolB 100 mM NaCl RT pellet; lane 6: ApePolB 100 mM NaCl RT supernatant; lane 7: ApePolB 100 mM NaCl 75 °C supernatant; lane 8: ApePolB 1M NaCl RT pellet; lane 9: ApePolB 1M NaCl RT supernatant; lane 10: ApePolB 1M NaCl 75 °C supernatant; lane 11: MW ladder.

The resuspended protein at pH 6.5 was subjected to a solubility study, and ApePolB solubility was noticeable in ammonium chloride, lithium chloride, magnesium chloride, and sodium cacodylate. Faint bands of ApePolB were also seen on the SDS-PAGE in the presence of calcium chloride, sodium formate, sodium acetate, sodium sulfate, sodium phosphate, and sodium citrate. In the overall comparison, the study suggested that the protein was more soluble at pH 6.5, in NH4Cl (instead of NaCl), MgCl2 (needed for activity), and sulfate (other polymerases are soluble with the addition of sulfate). A second cell lysis was performed using an optimized buffer (25 mM Bis Tris pH 6.5, 100 mM NH4Cl, 10 mM MgCl2), but ApePolB was still insoluble. The pellet was resuspended in 50 mM Bis Tris pH 6.5 and the suspension was aliquoted (1 mL) into tubes where either 1 M (NH4)2SO4 or 0.1 % PEI was added. Each sample was incubated at room temperature and at 95 °C for one hour. A faint band around 93 kDa was seen in the supernatant of the sample with just the buffer at pH 6.5, with addition of (NH4)2SO4, and with 0.1 % PEI, all heated to 95 °C (Figure 4.41). The heating step also functions as a purification step because most contaminants precipitate upon incubation at 75 or 95 °C. Since ApePolB solubilities were similar, a larger sample of the resuspended pellet (~15 mL) was heated to 95 °C for six hours in the presence of 0.1 % PEI. The protein remained entirely in the pellet for this large scale experiment. ApePolB was expressed around 30 °C overnight in order to increase the solubility of the protein, but ApePolB was still insoluble, even after repeating the solubility experiments above. Overall, the protein was expressed at multiple temperatures and remained insoluble after many attempts to solubilize it.

[Figure 4.41 image: SDS-PAGE gel, lanes 1-13. Marker positions 97.4 kDa and 66.3 kDa; band labeled PolB 93.0 kDa.]

Figure 4.41: Solubility results of DNA polymerase B. A small amount of ApePolB was solubilized upon the heating (95 °C) of the resuspended protein in the presence of ammonium sulfate or PEI. Lane 1: MW ladder; lane 2: ApePolB pH 6.5 RT pellet; lane 3: ApePolB pH 6.5 RT supernatant; lane 4: ApePolB 1 M (NH4)2SO4 RT pellet; lane 5: ApePolB 1 M (NH4)2SO4 RT supernatant; lane 6: ApePolB 0.1% PEI RT pellet; lane 7: ApePolB 0.1% PEI RT supernatant; lane 8: ApePolB pH 6.5 95 °C pellet; lane 9: ApePolB pH 6.5 95 °C supernatant; lane 10: ApePolB 1 M (NH4)2SO4 95 °C pellet; lane 11: ApePolB 1 M (NH4)2SO4 95 °C supernatant; lane 12: ApePolB 0.1% PEI 95 °C pellet; lane 13: ApePolB 0.1% PEI 95 °C supernatant.

4.2 PCNA summary

Ape0162 was successfully cloned into pET28 and an N-terminal hexahistidine tag was attached to the protein. Ape0441 was successfully cloned into pDESTC1, also with an N-terminal hexahistidine tag attached to the protein. Ape2182 was successfully cloned into pET21, sans any tag. Each protein has been expressed, lysed, and a purification protocol has been developed for each protein. All the proteins are thermostable and have a low polydispersity.

Conclusion

The primary goal of this research was to determine the structure of bacteriophage T4 59 helicase assembly protein in complex with the T4 32 single stranded DNA binding protein. The recognition of the replication fork by the 59 protein is essential for initiating lagging strand DNA replication during T4 phage infection. The movement of the 59 protein from the replication fork, after the assembly of the primosome, has to occur to allow replication to resume. The interactions between the 59 protein and the 32 protein could suggest a mechanism by which the 59 protein is removed from the fork. The 59 protein interacts with the full length 32 protein and the 32-B truncation, but there were no interactions with the 32 core truncation or the 32-A truncation. This agreed with previous research suggesting that the A domain is essential for this protein-protein interaction. The 59C42S mutant also interacted with the 32 protein and 32-B protein, which suggested that C42 is not essential for the protein-protein interaction. The cooperative binding between the 32 proteins interfered with the detection of the 59-32 protein complex; therefore, the 32-B protein truncation was used exclusively in scattering experiments to study the protein-protein complex. DLS results suggested that, at the concentrations used in these studies, the 32-B is in a monomer-dimer equilibrium with both species present in solution. Sedimentation velocity results indicate that the 32-B truncation was primarily a dimer in solution, while the full length 32 protein exists as a monomer, trimer, and higher multimers. This information suggests that in the absence of the N-terminal domain, the 32-B truncation forms a dimer, which should also be present in the X-ray structure.


The 59 protein binds tightly to fork DNA (KD ~ 91 nM). The interactions between 59 protein and 32 protein (KD 3.7 µM) and between 59 protein and 32-B (KD 3.6 µM) are moderately tight. The 32 protein binds more tightly to the 59 protein in the presence of fork DNA (KD 1.8 µM), while the 32-B/59 protein complex is weaker in the presence of fork DNA (KD 18.2 µM).
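To put these dissociation constants on a common scale, they can be converted into standard binding free energies with ΔG° = RT ln(KD). This is a sketch only, assuming a temperature of 25 °C and a 1 M standard state; the temperature of the fluorescence titrations is not given in this section, and the dictionary labels are hypothetical.

```python
import math

# Sketch: standard binding free energy from a dissociation constant.

R_KCAL = 1.987204e-3     # gas constant in kcal/(mol K)
TEMPERATURE_K = 298.15   # assumption: 25 C

KD_MOLAR = {
    "59 + fork DNA": 91e-9,
    "59 + 32": 3.7e-6,
    "59 + 32-B": 3.6e-6,
    "59 + 32, with fork DNA": 1.8e-6,
    "59 + 32-B, with fork DNA": 18.2e-6,
}


def delta_g(kd_molar, temperature_k=TEMPERATURE_K):
    """Standard binding free energy in kcal/mol (negative means favorable)."""
    return R_KCAL * temperature_k * math.log(kd_molar)


for label, kd in KD_MOLAR.items():
    print(f"{label}: KD = {kd:.2e} M, dG ~ {delta_g(kd):.2f} kcal/mol")

fold_weaker = KD_MOLAR["59 + 32-B, with fork DNA"] / KD_MOLAR["59 + 32-B"]
print(f"32-B/59 binding is ~{fold_weaker:.1f}-fold weaker with fork DNA")
```

On this scale the roughly five-fold change for the 32-B/59 complex corresponds to about 1 kcal/mol, a modest energetic difference.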

Comparing the fluorescence results for the ternary

complexes, the affinity of the 32 protein for the 59 protein increases due to the presence of the N-terminal domain (important for binding to DNA). In order to evalutate these experiments more efficently, the 32 and 32-B proteins need to be titrated into the fork DNA substrate to determine the effect of the DNA in the presence and absence of the Nterminal domain. SAXS data suggests that the 59 protein interacts with the 32-B in a side-by-side or a diagonal manner; comparing the experimental scattering curve to the theoretical scattering curve suggested that these are more realistic models than the previously proposed models. The molecular envelopes are consistent, (depending on the ab inito program used) with an elongated envelope suggesting a side-by-side model (Dammin), while another elongated model with an upward extension suggesting a diagonal model (Gasbor). The second goal of this research was to reconstitute the Okazaki repair complex from the crenarchaeal organism, Aeropyrum pernix. The efficient and proper completion of Okazaki processing is vital in DNA replication of the lagging strand. The interaction between the DNA ligase, DNA polymerase, and FEN-1 with PCNA is important in the timely completion of each procedure. The subunits of the heterotrimeric PCNA, the DNA polymerase, and the DNA ligase have been cloned, expressed and purified. These

243

proteins, along with the FEN-1 and RPA protein previously prepared, constitute the repair complex. Most of the effort was placed on the heterotrimeric PCNA. Each PCNA subunit unfolds at temperatures around 85-90 °C and each subunit is less polydispersed at 4 °C. The PCNA subunit Ape2182 was crystallized and analyzed by XRD. The crystals proved to be twinned and unsolvable. The interactions and stoichiometries are currently being analyzed using mass spectrometry.
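As a rough illustration of what the dissociation constants reported in this Conclusion for the T4 59 protein complexes imply, the sketch below converts each KD into a standard binding free energy using ΔG° = RT ln(KD / 1 M) and computes the fold change in KD caused by fork DNA. The temperature (298.15 K), the 1 M standard state, and the helper names `binding_free_energy` and `fold_change` are assumptions made for illustration only; they are not taken from the experiments described here.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)
T = 298.15       # ASSUMED temperature in K; not stated in this section

# Dissociation constants quoted in the Conclusion, converted to mol/L.
KD_MOLAR = {
    "59 + fork DNA": 91e-9,
    "59 + 32": 3.7e-6,
    "59 + 32-B": 3.6e-6,
    "59 + 32 (fork DNA present)": 1.8e-6,
    "59 + 32-B (fork DNA present)": 18.2e-6,
}


def binding_free_energy(kd_molar, temperature=T):
    """Standard binding free energy in kJ/mol, dG = RT ln(KD / 1 M).

    More negative values correspond to tighter binding.
    """
    return R * temperature * math.log(kd_molar) / 1000.0


def fold_change(kd_condition, kd_reference):
    """Ratio of two KD values; > 1 means weaker binding in the condition."""
    return kd_condition / kd_reference


if __name__ == "__main__":
    for name, kd in KD_MOLAR.items():
        print(f"{name}: dG = {binding_free_energy(kd):.1f} kJ/mol")

    # Effect of fork DNA on each 59 protein complex.
    full = fold_change(KD_MOLAR["59 + 32 (fork DNA present)"], KD_MOLAR["59 + 32"])
    trunc = fold_change(KD_MOLAR["59 + 32-B (fork DNA present)"], KD_MOLAR["59 + 32-B"])
    print(f"full-length 32: KD changes {full:.2f}-fold with fork DNA")
    print(f"32-B truncation: KD changes {trunc:.2f}-fold with fork DNA")
```

With the reported values, fork DNA lowers the KD of the full-length 32/59 interaction about twofold (3.7 to 1.8 µM) and raises the KD of the 32-B/59 interaction about fivefold (3.6 to 18.2 µM), which is the contrast the Conclusion attributes to the N-terminal domain.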



Appendix

Figure A1: Chromatogram of 59 protein purification on the Poros HS. The 59 protein eluted from the column in fractions 10-22 in 450 mM NH4Cl. The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of the 59 protein. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified.

Figure A2: Chromatogram of 59C42S protein purification on the Poros HS. The 59C42S protein eluted from the column in fractions 29-33 in 400 mM NH4Cl. The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of the 59C42S protein. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified.

Figure A3: Chromatogram of 32 protein purification on the Poros HQ. The 32 protein eluted from the column in fractions 30-38 in 450 mM NaCl. The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of the 32 protein. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified.

Figure A4: Chromatogram of 32-A protein purification on the Poros HQ. The 32-A protein eluted from the column in fractions 10-16 in 250 mM NaCl. The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of the 32-A protein. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified.

Figure A5: Chromatogram of 32-B protein purification on the Poros HQ. The 32-B protein eluted from the column in fractions 6-16 in 150 mM NaCl. The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of the 32-B protein. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified.

Figure A6: Chromatogram of KVP40 41 helicase purification on the Poros HQ. The 41 protein eluted from the column in fractions 5-18 in 100 mM NaCl. The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of the 41 protein. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified.

Figure A7: Chromatogram of KVP40 59 protein purification on the Poros HS. The 59 protein eluted from the column in fractions 18-21 in 450 mM NH4Cl. The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of the KVP40 59 protein. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified.

Figure A8: Chromatogram of RB69 59 protein purification on the Poros HS. The 59 protein eluted from the column in fractions 9-15 in 250 mM NH4Cl. The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of the 59 protein. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified.

Figure A9: Chromatograms of Ape0162 purifications. Ape0162 eluted from Q Sepharose (A) in fractions 9-12 (350 mM NaCl) and from Poros HQ (B) in fractions 5-13 (400 mM NaCl). The red Xs denote the fractions run on an SDS-PAGE gel to verify the presence of Ape0162, the red arrows indicate the lines monitoring the % buffer B and the conductivity of the buffer, and the protein absorbance was measured with a mercury source at 250 nm.

Figure A10: Chromatograms of FL-Ape0441 purification. (A) FL-Ape0441 eluted from the Q Sepharose in fractions 10-13 in 425 mM NaCl. (B) FL-Ape0441 eluted from the Poros HQ in fractions 6-16 in 500 mM NaCl. The red Xs denote the fractions run on an SDS-PAGE gel to verify the presence of FL-Ape0441, the red arrows indicate the lines monitoring the % buffer B and the conductivity of the buffer, and the protein absorbance was measured with a mercury lamp based detector at 250 nm.

Figure A11: Chromatogram of Ape2182 Q Sepharose purification. Ape2182 eluted from the Q Sepharose in fractions 19-45 (500 mM NaCl). The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of Ape2182. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified. The absorbance at both wavelengths saturated during the run, flattening the curves (purple and green lines).

Figure A12: Chromatogram of Ape2182 Poros HQ purification. Ape2182 eluted from the Poros HQ in fractions 5-15 (250 mM NaCl). The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of Ape2182. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified. The absorbance at both wavelengths saturated during the run, flattening the curves (purple and green lines).

Figure A13: Chromatogram of DNA ligase SP Sepharose purification. DNA ligase eluted from the SP Sepharose in fractions 8-17 (~ 200 mM). The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of the DNA ligase. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified.

Figure A14: Chromatogram of DNA ligase Poros HS purification. DNA ligase eluted from the Poros HS in fractions 22-34 (~ 375 mM). The red Xs denote the fractions of the purification run that were run on a gel to determine the purity of the DNA ligase. The conductivity of the elution buffer (red line), the elution gradient (black line), and the absorbance of the protein at two wavelengths (260 nm, purple line; 280 nm, green line) were monitored as the protein was purified.
