Protein Structure Prediction: Integral Membrane Proteins

Author: Lindsey Wood
Integral membrane proteins have segments embedded within the lipid bilayer of the cell membrane (or of internal membranes). Current estimates are that 20-40% of all encoded proteins, across species, are of this kind. (20-30% are helix bundle proteins, the kind mainly discussed today.) As we have seen, these include various active and passive transport channels, and receptor proteins. All interactions with the cell’s environment pass through such proteins. Further, the segments of such a protein that face into the lipid bilayer are typically hydrophobic, which makes these proteins difficult to crystallize, and hence makes it difficult to obtain accurate X-ray diffraction measurements of their structures. All of this makes them a useful and important target of sequence-based analysis.

Protein structure vocabulary: primary, secondary, tertiary and quaternary structure. Primary structure means the amino acid sequence. Secondary structure means the most basic recurring structural constituents; we have already seen the α-helix as the first example of such an element. Tertiary structure is how the secondary structure elements are arranged in three-dimensional space to form the protein. Quaternary structure refers to multimer configurations of proteins: e.g., the active protein might be a homodimer composed of two copies of the same protein monomer, held in the three-dimensional configuration needed for activity.

At the level of secondary structure, integral membrane proteins involve the two most common secondary structure elements: α-helices and β-strands. The most common alternative to structures built out of α-helices traversing the bilayer is the so-called β-barrel.

Here are cartoons of these two classes:

Here is a schematic for the α-helix:

Here is a cartoon of a porin β-barrel:

The yellow ribbons are the (anti-parallel) β-strands. Porins generally provide a simple diffusion pathway across the membrane for molecules smaller than 1 kDa, with little substrate selectivity. (There is a class of proteins called aquaporins which allow passage of H2O across the membrane, but which are composed mainly of α-helices.)

Today we study TMHMM, the tool used last time to “parse” GPCRs (or “7TM” proteins in general) into internal, trans-membrane (lipid bilayer) and external portions, i.e., the “topology” of the protein. It was developed by A. Krogh and coworkers, most notably G. von Heijne. TMHMM is an HMM tool, available via an online server: http://www.cbs.dtu.dk/services/TMHMM/. Because the lipid bilayer is so different from the intracellular or extracellular environment, there is a strong signal implicit in the membrane-spanning portions of these proteins, so the basic problem should be amenable to treatment by HMMs. More precisely, the tool predicts trans-membrane helices rather than just membrane-embedded portions of proteins. β-barrels are the target of another, less widely used tool by Krogh and co-workers.

Classically (i.e., 15 years ago) two features, or rules, were used to determine or predict TM protein topology: (1) the “positive inside rule”, according to which positively charged residues, where present, lie mostly in the cytoplasmic segments of the protein, and (2) the hydrophobic residues should face the lipid layer. For β-barrels, this can be somewhat misleading (green = polar, white = aromatic, red = non-polar):
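For helical proteins, the two classical rules can be tried out by hand before any HMM is involved. The following is a minimal sketch, not the method used by TMHMM: it assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a threshold of 1.6 (conventional choices that do not come from these notes), and the function names are made up for illustration. Rule (2) is applied by looking for strongly hydrophobic windows, and rule (1) by comparing the Lys/Arg counts of the loops on either side.

```python
# Kyte-Doolittle hydropathy values (an assumed scale, not taken from the notes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_helices(seq, window=19, threshold=1.6):
    """Merge overlapping or abutting windows whose mean reaches the threshold.

    Returns 0-based, end-exclusive (start, end) spans.
    """
    spans = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        if spans and i <= spans[-1][1]:
            spans[-1] = (spans[-1][0], i + window)
        else:
            spans.append((i, i + window))
    return spans


def positive_charge(segment):
    """Count Lys and Arg, the residues the positive inside rule looks at."""
    return sum(segment.count(aa) for aa in "KR")


def guess_n_terminus(seq, spans):
    """Guess whether the N-terminus is 'inside' (cytoplasmic) or 'outside'."""
    # Loops alternate sides, so even-numbered loops share the N-terminal side.
    bounds = [0] + [x for span in spans for x in span] + [len(seq)]
    loops = [seq[bounds[k]:bounds[k + 1]] for k in range(0, len(bounds), 2)]
    same_side = sum(positive_charge(loop) for loop in loops[0::2])
    other_side = sum(positive_charge(loop) for loop in loops[1::2])
    return "inside" if same_side >= other_side else "outside"
```

Because windows are merged whenever they touch, the reported spans are wider than the true helix and two nearby helices can fuse into one span; this is one of the weaknesses that a model with explicit length constraints is meant to address.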

First, the HMM architecture of TMHMM, which follows the simplest idea of the general structure of such a protein:

Notice that there are seven submodules here: helix core, cytoplasmic and non-cytoplasmic cap regions, cytoplasmic loops, non-cytoplasmic long and short loops, and “globular” regions. The last of these is really a catch-all for things like cassettes in the interior of the cell and receptor structures on the cell exterior. The caps are there because of the ambiguity, even at the wet lab bench, about exactly how much of the helix lies strictly within the bilayer: they are a kind of transition segment. Some TM proteins have active elements here, such as charged residues, so the composition of the caps may indeed differ from that of the core, and it is a good idea to model them as a possibly different signal. The helix core has a maximum length, an estimate of the largest number of residues needed to traverse the bilayer (25, not counting the two end cap regions!), and a minimum length (5, plus caps).
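To make the role of the cyclic layout and of the length limits concrete, here is a much reduced sketch in the same spirit. It is not the TMHMM model: it keeps only an inside loop, an outside loop and two helix chains (one per direction), drops the caps and globular regions, groups residues into three classes, and uses made-up probabilities rather than trained ones. All names are hypothetical. The helix is a chain of states that cannot be left before `min_len` residues and must be left after `max_len`, and the best path is found with the Viterbi algorithm.

```python
import math

HYDROPHOBIC = set("AVLIMFWC")

# Toy emission probabilities per residue class (illustrative, not trained).
EMIT = {
    "helix": {"hydrophobic": 0.80, "positive": 0.02, "other": 0.18},
    "i":     {"hydrophobic": 0.15, "positive": 0.35, "other": 0.50},
    "o":     {"hydrophobic": 0.15, "positive": 0.05, "other": 0.80},
}


def emit(state, aa):
    """Probability that `state` emits the class of residue `aa`."""
    if aa in "KR":
        cls = "positive"
    elif aa in HYDROPHOBIC:
        cls = "hydrophobic"
    else:
        cls = "other"
    return EMIT["helix" if state.startswith("h_") else state][cls]


def build_transitions(min_len, max_len, p_stay=0.9):
    """Loops self-transition; each helix is a chain of max_len states that
    may exit to the opposite loop once min_len residues have been emitted."""
    trans = {"i": {"i": p_stay, "h_io_1": 1 - p_stay},
             "o": {"o": p_stay, "h_oi_1": 1 - p_stay}}
    for direction, exit_loop in (("io", "o"), ("oi", "i")):
        for k in range(1, max_len + 1):
            state = f"h_{direction}_{k}"
            nxt = f"h_{direction}_{k + 1}"
            if k == max_len:
                trans[state] = {exit_loop: 1.0}
            elif k >= min_len:
                trans[state] = {nxt: 0.5, exit_loop: 0.5}
            else:
                trans[state] = {nxt: 1.0}
    return trans


def viterbi(seq, min_len=5, max_len=25):
    """Most probable labelling of a non-empty sequence: i, M (helix) or o."""
    trans = build_transitions(min_len, max_len)
    neg_inf = float("-inf")
    score = {s: neg_inf for s in trans}
    for s in ("i", "o"):                      # assume the chain starts in a loop
        score[s] = math.log(0.5) + math.log(emit(s, seq[0]))
    back = []
    for aa in seq[1:]:
        new_score = {s: neg_inf for s in trans}
        pointer = {}
        for prev, outs in trans.items():
            if score[prev] == neg_inf:
                continue
            for nxt, p in outs.items():
                cand = score[prev] + math.log(p) + math.log(emit(nxt, aa))
                if cand > new_score[nxt]:
                    new_score[nxt] = cand
                    pointer[nxt] = prev
        back.append(pointer)
        score = new_score
    # Assume the chain also ends in a loop, then trace the pointers back.
    path = [max(("i", "o"), key=lambda s: score[s])]
    for pointer in reversed(back):
        path.append(pointer[path[-1]])
    path.reverse()
    return "".join("M" if s.startswith("h_") else s for s in path)
```

For example, `viterbi("KRKRSK" + "L" * 20 + "SSDSSD")` labels the positively charged N-terminal stretch as inside, the leucine run as helix and the remainder as outside. Because an inside loop can only be followed by an inside-to-outside helix, the sides alternate automatically, which is the point of the cyclic architecture.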

Here are some results: first, for single sequences, the method is about 97.5% accurate for predicting TM helices. The topology is correct about 77% of the time, with an additional 7% if you allow a flip of cytoplasmic/exterior. These figures are based on genome-wide screens for several species.

Here is the text output of TMHMM for human sulfonylurea receptor 1 (SwissProt Q09428):


    # sp_Q09428_ACC8_HUMAN Length: 1580
    # sp_Q09428_ACC8_HUMAN Number of predicted TMHs: 16
    # sp_Q09428_ACC8_HUMAN Exp number of AAs in TMHs: 352.04622
    # sp_Q09428_ACC8_HUMAN Exp number, first 60 AAs: 21.39772
    # sp_Q09428_ACC8_HUMAN Total prob of N-in: 0.00094
    # sp_Q09428_ACC8_HUMAN POSSIBLE N-term signal sequence
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  outside      1    28
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix     29    51
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  inside      52    71
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix     72    94
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  outside     95   103
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix    104   123
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  inside     124   134
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix    135   157
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  outside    158   166
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix    167   189
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  inside     190   300
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix    301   323
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  outside    324   349
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix    350   367
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  inside     368   426
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix    427   449
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  outside    450   453
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix    454   476
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  inside     477   536
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix    537   559
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  outside    560   573
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix    574   596
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  inside     597  1005
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix   1006  1028
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  outside   1029  1061
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix   1062  1084
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  inside    1085  1153
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix   1154  1176
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  outside   1177  1247
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix   1248  1270
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  inside    1271  1274
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  TMhelix   1275  1297
    sp_Q09428_ACC8_HUMAN  TMHMM2.0  outside   1298  1580
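Output of this kind is easy to post-process. The sketch below assumes the server's usual long format, in which each segment is one whitespace-separated row (sequence id, program version, location, start, end) and summary lines begin with `#`; the `Segment` type and both function names are hypothetical helpers introduced here, not part of TMHMM.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    seq_id: str
    location: str   # "inside", "outside" or "TMhelix"
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive


def parse_tmhmm(text):
    """Parse TMHMM long-format text into a list of Segment records."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue                      # skip blank and summary lines
        seq_id, _version, location, start, end = fields
        segments.append(Segment(seq_id, location, int(start), int(end)))
    return segments


def is_consistent(segments, length):
    """True if the segments tile residues 1..length without gaps or overlaps
    and successive loop segments lie on opposite sides of the membrane."""
    position = 1
    previous_loop = None
    for seg in segments:
        if seg.start != position or seg.end < seg.start:
            return False
        if seg.location != "TMhelix":
            if seg.location == previous_loop:
                return False
            previous_loop = seg.location
        position = seg.end + 1
    return position == length + 1


def helix_count(segments):
    """Number of predicted trans-membrane helices."""
    return sum(1 for seg in segments if seg.location == "TMhelix")
```

Applied to the full listing for Q09428, `helix_count` should agree with the reported number of predicted TMHs (16), and `is_consistent(segments, 1580)` checks that the rows cover the whole sequence.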



Here is the TMHMM graphical output for a Halobacterium archaerhodopsin, one of the 7TM proteins we encountered earlier, showing the three posterior probabilities (inside, trans-membrane, outside) computed at each position:
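Posterior probabilities of this kind come from the forward-backward algorithm rather than from the single best path. The following is a minimal sketch on a four-state toy model (inside loop, outside loop and one helix state per direction); the states, residue classes and all probabilities are invented for illustration and are far simpler than the real TMHMM architecture. It returns, for each position, the three probabilities that would be plotted.

```python
HYDROPHOBIC = set("AVLIMFWC")
STATES = ["i", "h_io", "o", "h_oi"]

# Toy parameters (illustrative, not trained): the cycle i -> h_io -> o -> h_oi -> i.
TRANS = {
    "i":    {"i": 0.90, "h_io": 0.10},
    "h_io": {"h_io": 0.95, "o": 0.05},
    "o":    {"o": 0.90, "h_oi": 0.10},
    "h_oi": {"h_oi": 0.95, "i": 0.05},
}
START = {"i": 0.5, "o": 0.5}
EMIT = {
    "helix": {"hydrophobic": 0.80, "positive": 0.02, "other": 0.18},
    "i":     {"hydrophobic": 0.15, "positive": 0.35, "other": 0.50},
    "o":     {"hydrophobic": 0.15, "positive": 0.05, "other": 0.80},
}


def emit(state, aa):
    """Probability that `state` emits the class of residue `aa`."""
    if aa in "KR":
        cls = "positive"
    elif aa in HYDROPHOBIC:
        cls = "hydrophobic"
    else:
        cls = "other"
    return EMIT["helix" if state.startswith("h_") else state][cls]


def posteriors(seq):
    """Per-position posterior probabilities for a non-empty sequence."""
    n = len(seq)
    # Forward pass, rescaled at every position to avoid numerical underflow.
    fwd, scale = [], []
    for t, aa in enumerate(seq):
        row = {}
        for s in STATES:
            if t == 0:
                prior = START.get(s, 0.0)
            else:
                prior = sum(fwd[t - 1][r] * TRANS[r].get(s, 0.0) for r in STATES)
            row[s] = prior * emit(s, aa)
        c = sum(row.values())
        fwd.append({s: v / c for s, v in row.items()})
        scale.append(c)
    # Backward pass, reusing the forward scale factors.
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in STATES}
    for t in range(n - 2, -1, -1):
        bwd[t] = {
            s: sum(TRANS[s].get(r, 0.0) * emit(r, seq[t + 1]) * bwd[t + 1][r]
                   for r in STATES) / scale[t + 1]
            for s in STATES
        }
    # Combine: the products already sum to one at each position.
    result = []
    for t in range(n):
        gamma = {s: fwd[t][s] * bwd[t][s] for s in STATES}
        result.append({"inside": gamma["i"],
                       "membrane": gamma["h_io"] + gamma["h_oi"],
                       "outside": gamma["o"]})
    return result
```

Unlike the Viterbi labelling, these values show how confident the model is at each residue, so weakly supported helices and uncertain helix boundaries appear as intermediate probabilities instead of being hidden behind a single hard assignment.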
