Protein Structure Bioinformatics Session1: Introduction

Author: Easter Dean
Introduction to Bioinformatics online course: IBT_2016

Protein Structure Bioinformatics, Session 1: Introduction
Rehab Ahmed
CBSB, Faculty of Science, University of Khartoum
Faculty of Pharmacy, University of Khartoum

Introduction to Bioinformatics online course: IBT_2016 Protein Structural Bioinformatics, Trainer: Rehab Ahmed

Learning Objectives
• To recap some basics of amino acids and proteins.
• To study the different levels of protein structure.
• To shed light on how protein structures are determined.
• To learn about some relevant databases, file formats and file viewers.

Learning Outcomes

By the end of this session and practical, students are expected to be able to:
• Explore some resources and tools in the PDB database.
• Use some web servers to predict protein secondary structure.

Structure of an Amino Acid

https://www.mun.ca/biology/scarr/iGen3_06-01.html

Aliphatic R Groups

• Name • 3 letter • One letter

Aromatic R Groups

Sulfur-containing R Groups

Side Chains with Polar Alcohol Groups

Basic R Groups

Acidic R Groups

http://iweb.langara.bc.ca/biology/mario/Biol2315notes/biol2315chap3.html

https://online.science.psu.edu/sites/default/files/biol110/tutorial16_R_groups.jpg

Molecular interactions: bonds and protein structures

https://researchpeptides.com/images/misc/peptide-bond-animation.gif

Intermolecular Forces

• Dipole interactions
• Hydrogen bonds
• van der Waals forces
• Hydrophobic interactions
• Others

http://www.chem.ucla.edu/~harding/IGOC/D/disulfide_bridge.html

https://researchpeptides.com/images/misc/Structures-Proteins.jpg

Structure is encoded in the sequence!
• Anfinsen's dogma: Christian B. Anfinsen (1916–1995), U.S. biochemist; Nobel Prize in Chemistry, 1972.
• Anfinsen CB. "Principles that Govern the Folding of Protein Chains". Science. 20 Jul 1973; 181 (4096): 223–230.

Secondary structure

https://online.science.psu.edu/biol011_sandbox_7239/node/7390

α-helix

Linus Pauling (1901–1994), Nobel Prizes in Chemistry and Peace

Other types of helices

Helices differ in their backbone hydrogen-bonding pattern (residue i bonded to residue i+n):
• α-helix: (i, i+4)
• Others:
  - 3-10 helix: (i, i+3)
  - π-helix: (i, i+5)

https://en.wikipedia.org/wiki/File:Pi-helix_within_an_alpha-helix.jpg

Beta Strands (β-strands)

Parallel and anti-parallel beta sheets
• Hairpin
• Crossover

Loops/turns

Motifs in Proteins (Super-Secondary Structure)

• http://swift.cmbi.ru.nl/gv/students/mtom/hmotif.jpg
• Psi-loop: https://en.wikipedia.org/wiki/File:5CPAgood.png

DSSP (Dictionary of Protein Secondary Structure)
• Defines criteria for assigning secondary structure.
• Programmed as a pattern-recognition process of hydrogen-bonded and geometrical features extracted from X-ray coordinates.

Kabsch W, Sander C (1983). "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features". Biopolymers. 22 (12): 2577–637.

DSSP (Helix, Strand and Loops)

Secondary structure        Symbol
Alpha helix                H
3-10 helix                 G
π-helix                    I
Beta bridge                B
Beta strand                E
Turn                       T
High curvature             S
Space/no rule applies      C
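The DSSP symbols are easy to work with programmatically. The sketch below maps each one-letter code to its name and collapses the eight classes into three states (helix, strand, coil). The three-state grouping (H/G/I to helix, E/B to strand, everything else to coil) is one commonly used convention and is an assumption here, not something defined in the slides; the names `DSSP_NAMES`, `THREE_STATE` and `reduce_to_three_states` are hypothetical.

```python
# DSSP one-letter codes and their meaning.
DSSP_NAMES = {
    "H": "alpha helix",
    "G": "3-10 helix",
    "I": "pi helix",
    "B": "beta bridge",
    "E": "beta strand",
    "T": "turn",
    "S": "high curvature",
    "C": "no rule applies",
    " ": "no rule applies",
}

# Assumed three-state reduction (a common convention, not from the slides).
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three_states(dssp_string):
    """Collapse a DSSP string into helix (H), strand (E) and coil (C)."""
    return "".join(THREE_STATE.get(code, "C") for code in dssp_string)


reduced = reduce_to_three_states("HHHHGGG TTEEEEB S")
print(reduced)  # HHHHHHHCCCEEEEECC
```

The output has the same length as the input, so it stays aligned with the amino acid sequence residue by residue.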

Experimental determination of Secondary Structure
• Spectroscopy
• UV circular dichroism (CD)
• IR spectroscopy
• NMR

http://www.ap-lab.com/images/CD_STANDARDS.gif

Secondary structure prediction
• Early/empirical methods:
  - Probabilities and pre-computed residue preferences.
  - Chou-Fasman method (~60% accurate): Chou PY, Fasman GD (Jan 1974). "Prediction of protein conformation". Biochemistry. 13 (2): 222–245.
• CFSSP: Chou & Fasman Secondary Structure Prediction Server
• http://www.biogem.org/tool/chou-fasman/

Secondary structure prediction

• For instance, the helical propensity of residue type X:
• Pα(X) = frequency (X in helix) / frequency (X)
• Pα > 1: favours helix (e.g., Pα(Glu) = 1.51)
• Pα < 1: disfavours helix (e.g., Pα(Gly) = 0.57)

Gerard J. Kleywegt’s slide

Secondary structure prediction
• Database of 2000 residues
• 100 are alanines
• 500 residues are in a helix
• 50 alanines are in a helix
• What is the propensity for Ala to be in a helix?
• Is Ala a good helix former?

Gerard J. Kleywegt’s slide

Secondary structure prediction
• Pα(X) = frequency (X in helix) / frequency (X)
• Pα(Ala) = freq (Ala, α) / freq (Ala)
• freq (Ala, α) = 50/500 = 0.1
• freq (Ala) = 100/2000 = 0.05
• Pα(Ala) = 0.1/0.05 = 2.0
• Ala is a good helix former!

Gerard J. Kleywegt’s slide
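The alanine calculation can be reproduced in a few lines of Python. This is only a sketch of the propensity formula using the counts from the example; the function name `helix_propensity` and its parameter names are hypothetical.

```python
def helix_propensity(n_x_in_helix, n_helix, n_x_total, n_total):
    """P_alpha(X) = freq(X in helix) / freq(X)."""
    # Fraction of helical residues that are of type X.
    freq_x_in_helix = n_x_in_helix / n_helix
    # Fraction of all residues in the database that are of type X.
    freq_x = n_x_total / n_total
    return freq_x_in_helix / freq_x


# Counts from the example: 2000 residues, 100 Ala, 500 in helix, 50 Ala in helix.
p_ala = helix_propensity(n_x_in_helix=50, n_helix=500, n_x_total=100, n_total=2000)
print(round(p_ala, 2))  # 2.0
print("good helix former" if p_ala > 1 else "poor helix former")
```

A value above 1 means the residue occurs in helices more often than its overall abundance would suggest.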

Secondary structure prediction
• Current machine learning-based methods combine information from multiple sequence alignments and information theory with algorithms such as artificial neural networks and Bayesian networks, or a combination of those.
• E.g., PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/

Tertiary structure
• The tertiary structure is the final specific geometric shape that a protein assumes.
• It is determined by a variety of bonding interactions between the side chains of the amino acids.
• The bonds involved include hydrogen bonds, salt bridges, disulfide bonds and non-polar hydrophobic interactions.

http://chemistry.elmhurst.edu/vchembook/567tertprotein.html

Methods of 3D Structure Determination
Information on 3D structure can be obtained by:
• X-ray crystallography,
• NMR spectroscopy, or
• Cryo-electron microscopy.
Structures are submitted to the PDB by biologists and biochemists from around the world and are freely accessible on the Internet via the websites of its member organizations.

X-ray Crystallography
• According to the Online Dictionary of Crystallography, the term resolution describes the ability to distinguish between neighboring features in an electron density map.
• The R factor is one measure of model quality: the level of agreement between calculated and observed intensities. It ranges from about 0 to 0.6.
• An R factor > 0.5 is considered to indicate poor quality.
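To make the R factor concrete, here is a minimal sketch of the conventional definition, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, which compares observed and calculated structure-factor amplitudes (the amplitudes are derived from the measured intensities). The function name `r_factor` and the amplitude values are made up for illustration, and the scaling between observed and calculated data that real refinement programs apply is omitted.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Made-up amplitudes for four reflections (illustrative values only).
observed = [100.0, 80.0, 60.0, 40.0]
calculated = [90.0, 85.0, 66.0, 35.0]
r = r_factor(observed, calculated)
print(f"R = {r:.3f}")  # R = 0.093
```

A perfect model would give R = 0; the larger the value, the worse the model explains the data.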

X-ray Crystallography

Resolution   Evaluation   Interpretation
1.2 Å        Excellent    Backbone and most side chains very clear. Some hydrogens may be resolved.
2.5 Å        Good         Backbone and many side chains clear.
3.5 Å        OK           Backbone and bulky side chains.
5.0 Å        Poor         Backbone mostly clear; side chains not clear.

http://proteopedia.org/wiki/index.php/Resolution

Databases

wwPDB

RCSB PDB
• Repository of information about the 3D structures of large biological molecules.
• The PDB was established in 1971 at Brookhaven National Laboratory.
• The Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB in 1998.

PDB ID(s)

• A 4-character ID, e.g. 8CAT.
• A unique, immutable identifier.
• IDs are assigned automatically and do not carry any meaning.
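A quick format check for such IDs can be sketched as follows. The pattern used (one digit followed by three letters or digits) is an assumption based on the classic 4-character scheme, not something stated in the slides, and `looks_like_pdb_id` is a hypothetical helper name. The check only tests the shape of the string; it does not tell you whether the entry exists.

```python
import re

# Assumed pattern for classic PDB IDs: one digit followed by three alphanumerics.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text):
    """Return True if text has the shape of a classic 4-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


for candidate in ["8CAT", "101M", "CATH", "8CATS"]:
    print(candidate, looks_like_pdb_id(candidate))
```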

Domains
• The domain is the basic building block of a protein structure. A domain:
  1. Is a spatially separated unit of the protein structure.
  2. May have sequence and/or structural resemblance to another protein structure or domain.
  3. May have a specific function associated with it.

http://www.proteinstructures.com/Structure/Structure/protein-domains.html

Pfam
• Pfam 30.0
• 16306 entries (06.2016)
• Information about protein families (HMMs)
• Annotations
• Links to other databases: RCSB PDB, CATH, SCOP, Proteopedia, etc.

CATH
The domains are classified within the CATH structural hierarchy:
• Class (C) level: classification based on secondary structure content, i.e. all alpha, all beta, a mixture of alpha and beta, or little secondary structure.
• Architecture (A) level: based on the arrangement of secondary structures in three-dimensional space.
• Topology/fold (T) level: how the secondary structure elements are connected and arranged.
• Homologous superfamily (H) level: assignments are made if there is good evidence that the domains are related by evolution, i.e. they are homologous.

http://www.cathdb.info/wiki
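CATH identifiers are written as four dot-separated numbers, one per level of the hierarchy. The sketch below splits such a code into its C, A, T and H parts. The code "3.40.50.300" is used only as an example, the class numbering 1 to 4 is assumed from CATH's usual convention, and `CathId` and `parse_cath_id` are hypothetical names.

```python
from typing import NamedTuple


class CathId(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


# Class numbering assumed from CATH's usual convention.
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha and beta",
    4: "little secondary structure",
}


def parse_cath_id(text):
    """Split a dotted C.A.T.H code such as '3.40.50.300' into its four levels."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {text!r}")
    return CathId(*(int(p) for p in parts))


cath = parse_cath_id("3.40.50.300")
print(cath)
print(CLASS_NAMES.get(cath.class_, "unknown class"))
```

Two domains that share the first three numbers have the same fold; sharing all four places them in the same homologous superfamily.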

CATH

CATH v4.1
PDB release        01-01-2015
Domains            308999
Superfamilies      2737
Annotated PDBs     108378

Proteopedia
• A wiki web resource whose pages have embedded three-dimensional structures surrounded by descriptive text.
• http://proteopedia.org/wiki/index.php/Main_Page

File formats
• Sequence files: FASTA.
• Secondary structure files: FASTA-formatted file ("ss.txt").
• PDB entry files: PDB, PDBx/mmCIF, XML.
• Small molecule files: PDB, CIF, SDF, etc.
• Large structures (containing >62 chains and/or 99999 ATOM records) are represented in mmCIF/PDBx.
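The size limits for large structures translate into a simple rule of thumb, sketched below. The thresholds are the ones quoted above; the function name `needs_mmcif` and the chain and atom counts in the examples are hypothetical.

```python
# Limits quoted above for the legacy PDB format.
MAX_CHAINS_LEGACY_PDB = 62
MAX_ATOMS_LEGACY_PDB = 99999


def needs_mmcif(n_chains, n_atom_records):
    """True if a structure exceeds the legacy PDB-format limits."""
    return n_chains > MAX_CHAINS_LEGACY_PDB or n_atom_records > MAX_ATOMS_LEGACY_PDB


print(needs_mmcif(n_chains=4, n_atom_records=12000))
print(needs_mmcif(n_chains=80, n_atom_records=12000))
print(needs_mmcif(n_chains=4, n_atom_records=150000))
```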

FASTA-formatted file ("ss.txt")

>101M:A:sequence
MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRVKHLKTEAEMKASEDLKKHGVTV
LTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDI
AAKYKELGYQG
>101M:A:secstr
HHHHHHHHHHHHHHGGGHHHHHHHHHHHHHHH GGGGGG TTTTT SHHHHHH HHHHHHHHHHHHHHHHHHTTTT HHHHHHHHHHHHHTS HHHHHHHHHHHHHHHHHH GGG SHHHHHHHHHHHHHHHHHHHHHHHHTT
>102L:A:sequence
MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVI
TKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRW
DEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
>102L:A:secstr
HHHHHHHHH EEEEEE TTS EEEETTEEEESSS TTTHHHHHHHHHHTS TTB HHHHHHHHHHHHHHHHHHHHH TTHHHHHHHS HHHHHHHHHHHHHHHHHHHHT HHHHHHHHTT HHHHHHHHHSSHHHHHSHHHHHHHHHHHHHSSSGGG
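Records in this layout can be read with a small parser that pairs each chain's sequence with its secondary structure string. This is a sketch under two assumptions: headers always have the form `>ID:CHAIN:sequence` or `>ID:CHAIN:secstr`, and blanks inside a `secstr` record are meaningful (a position with no assignment), so they are kept. The function name `parse_ss_txt` and the toy entry `1ABC` are hypothetical.

```python
from collections import defaultdict


def parse_ss_txt(text):
    """Group 'sequence' and 'secstr' records by (PDB ID, chain)."""
    records = defaultdict(dict)
    key = kind = None
    for line in text.splitlines():
        if line.startswith(">"):
            # Header layout assumed to be >ID:CHAIN:kind
            pdb_id, chain, kind = line[1:].strip().split(":")
            key = (pdb_id, chain)
            records[key][kind] = ""
        elif key is not None:
            # Keep blanks: in 'secstr' lines a blank means "no assignment".
            records[key][kind] += line
    return dict(records)


# Toy entry, not a real PDB record.
example = ">1ABC:A:sequence\nMKTAY\n>1ABC:A:secstr\n HHH \n"
parsed = parse_ss_txt(example)
entry = parsed[("1ABC", "A")]
print(entry["sequence"])
print(repr(entry["secstr"]))
```

Because blanks are preserved, the two strings have the same length and can be compared position by position.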

PDB File formats

Molecular Graphics Software
• Cn3D: http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
• iCn3D: http://www.ncbi.nlm.nih.gov/Structure/icn3d/docs/icn3d_about.html
• UCSF Chimera: http://www.cgl.ucsf.edu/chimera/index.html
• Visual Molecular Dynamics (VMD): http://www.ks.uiuc.edu/Research/vmd/
• PyMOL: https://www.pymol.org/
• Etc.

Molecular Representation

Questions

• What do we mean by structural bioinformatics?
• Why protein structure bioinformatics?

Structural bioinformatics

• Structural bioinformatics is a branch of bioinformatics that deals with the structure of biological macromolecules: DNA, RNA and proteins. "Deals with" here covers analysis, storage, visualization, prediction, etc.

Why Protein Structure Bioinformatics?

• Proteins are the building blocks of all cells.
• In the world of proteins, structure = function!?
• DNA encodes life, yes! But proteins carry out life's processes: replication, reproduction, defense, etc.

• This first structural bioinformatics (SB) session is meant to cover some basics and fundamentals and to help bring us all onto the same page.

Resources/References
• The Anatomy and Taxonomy of Protein Structure (by Jane S. Richardson): http://kinemage.biochem.duke.edu/teaching/anatax/
• http://www.rcsb.org/
• http://sbkb.org/
• http://www.proteinstructures.com/index.html
• http://proteopedia.org/wiki/index.php/Main_Page