Biological Sequences: DNA, RNA, Protein

Author: Horatio Spencer
Nucleotides and Nucleic Acids

• biological molecules that possess heterocyclic nitrogenous bases as principal components of their structure

Biochemical roles of nucleotides are numerous
• nucleotides participate as essential intermediates in all aspects of cellular metabolism
• nucleic acids are linear polymers of nucleotides, i.e. polynucleotides, linked by phosphodiester bridges
• nucleic acids are elements of heredity and are involved in the synthesis of proteins

An orderly sequence of nucleotide residues in a nucleic acid can encode information. The convention in all notations of nucleic acid structure is to read the polynucleotide chain from the 5’-end to the 3’-end.

There are two basic kinds of nucleic acids
• ribonucleic acid (RNA)
• deoxyribonucleic acid (DNA)

Basic characteristics of DNA and RNA
• DNA has only one biological role, but it is a central one. The information to make all the functional macromolecules of an organism (even DNA itself) is preserved in DNA and accessed through transcription of the information into RNA copies. Simple life forms (e.g. bacteria) have a single chromosome in the form of one DNA molecule. Eukaryotic cells have many chromosomes. In addition to the nucleus, mitochondria and chloroplasts have their own DNA, which encodes the proteins and RNAs unique to those organelles.
• RNA occurs in multiple copies and various forms. Cells contain as much as 8 times more RNA than DNA. RNA molecules are categorized into several major types: messenger RNA, ribosomal RNA, and transfer RNA. Eukaryotic cells contain an additional type, small nuclear RNA.


DNA

• a thread-like molecule
• the DNA isolated from different cells and viruses consists of two polynucleotide strands wound together to form a long, slender, helical molecule, the DNA double helix
• each DNA strand consists of four types of nucleotides, named after their bases: adenine (A), cytosine (C), guanine (G), and thymine (T)
• the strands run in opposite directions, that is, they are antiparallel
• the strands are held together in the double-helical structure through interchain hydrogen bonds
• the H-bonds pair the bases of nucleotides in one chain to complementary bases in the other (so-called base pairing)

The first clue to base pairing came from Erwin Chargaff in the late 1940s. His data showed that the four bases commonly found in DNA do not occur in equimolar amounts and that the relative amounts of each vary from species to species. Chargaff noted, however, that adenine and thymine, and guanine and cytosine, are always found in a 1:1 ratio. That is, [A] = [T] and [C] = [G].
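Base pairing and the antiparallel arrangement together explain Chargaff's ratios. The following is a minimal Python sketch, not taken from the original notes: it builds the partner strand as the reverse complement (so that both strands are written 5’ to 3’) and checks that [A] = [T] and [C] = [G] over the whole duplex. The function names and the example sequence are hypothetical.

```python
from collections import Counter

# Watson-Crick pairing in DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the partner strand, written 5'->3' like the input.

    The strands are antiparallel, so the partner is read in the
    opposite direction: complement each base, then reverse.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def duplex_base_counts(strand):
    """Count each base over both strands of the double helix."""
    return Counter(strand) + Counter(reverse_complement(strand))


strand = "ATGCGGATTTAC"  # hypothetical example sequence
counts = duplex_base_counts(strand)

print(reverse_complement(strand))
print(counts["A"] == counts["T"], counts["C"] == counts["G"])
```

Note that the 1:1 ratios hold for the duplex as a whole; a single strand taken alone need not contain equal amounts of A and T.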

Watson and Crick’s Double Helix

• proposed by James Watson and Francis Crick in 1953
• there are three main conformations of the DNA double helix: A-DNA, B-DNA, and Z-DNA
• B-DNA (the form with the usual major and minor grooves) is preferred in vivo; A-DNA is a conformation that exists in vitro, while in Z-DNA (zig-zag) the helix is left-handed

The size of DNA molecules is usually expressed in nucleotide base pairs (e.g. the E. coli chromosome consists of roughly 4.6 million base pairs).


DNA in the Form of Chromosomes

• the single chromosome in prokaryotic cells is typically a circular DNA molecule, and it is associated with very little protein
• DNA molecules of eukaryotic cells are linear and divided among many chromosomes; each DNA molecule is accompanied by proteins
• a class of arginine- and lysine-rich basic proteins, the histones, interact ionically with the anionic phosphate groups in the DNA backbone to form nucleosomes, structures in which the DNA double helix is wound around a protein core built from four types of histone polypeptides (two copies of each)
• chromosomes also contain a varying mixture of other proteins, the so-called non-histone chromosomal proteins, many of which are involved in regulating which genes in DNA are transcribed at any given moment


RNA

• consists of four types of nucleotides, named after their bases: adenine (A), uracil (U), cytosine (C), and guanine (G)
• in contrast to DNA, the backbone contains ribose sugars (ribose has an OH group at the 2’ position of the sugar ring, which deoxyribose lacks)

Messenger RNA (mRNA)

• single-stranded macromolecule synthesized during transcription
• serves to carry the information (or message) that is encoded in the genes to the sites of protein synthesis in the cell
• because it is directly “transcribed” from the DNA, it is said to be a DNA-like RNA; however, only genetic units of DNA are transcribed into mRNA
• in prokaryotes, a single mRNA encodes one or more polypeptide chains
• in eukaryotes, a single mRNA encodes only one polypeptide chain; eukaryotic mRNA is much more complex: it is synthesized in the nucleus in the form of a much larger precursor called heterogeneous nuclear RNA (hnRNA)
• hnRNA contains stretches of nucleotide sequence that have no protein-coding capacity (intervening sequences)
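At the level of sequence, "DNA-like" means that the mRNA reads the same as the coding (non-template) DNA strand with U in place of T. The sketch below illustrates this under simplifying assumptions: sequences are plain uppercase strings written 5’ to 3’, and no processing of the transcript is modeled. The function names and sequences are hypothetical.

```python
# Pairing used when RNA is synthesized against a DNA template:
# A in the template pairs with U in the RNA, T with A, C with G, G with C.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_coding_strand(coding_strand):
    """mRNA has the same sequence as the coding strand, with U for T."""
    return coding_strand.replace("T", "U")


def transcribe_template_strand(template_strand):
    """Build the mRNA (5'->3') from the template strand (given 5'->3').

    The RNA is complementary and antiparallel to the template,
    so pair each base and reverse the direction of reading.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_strand))


coding = "ATGGCC"    # hypothetical coding strand, 5'->3'
template = "GGCCAT"  # its reverse complement, 5'->3'

print(transcribe_coding_strand(coding))
print(transcribe_template_strand(template))
```

Both calls describe the same transcript from two points of view, which is why they produce the same RNA string.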

Ribosomal RNA (rRNA)

• folds into characteristic secondary structures as a consequence of intramolecular H-bond interactions
• each subunit of a ribosome contains one or more rRNA molecules
• contains chemically modified nucleotides

Transfer RNA (tRNA)

• serves as a carrier of amino acid residues for protein synthesis
• folds into a characteristic secondary structure

(Figure: tRNA structure)

Small Nuclear RNA (snRNA)

• similar to both tRNA and rRNA, but identical to neither
• located in the nucleus
• their biological purpose is to help process hnRNA into mature mRNA (before it moves from the nucleus to the cytoplasm)


Splicing

• a modification of the originally transcribed RNA molecule
• splicing is the removal of certain pieces of the RNA molecule called “intervening sequences” or introns
• what remains are the so-called “expressed sequences” or exons
• a large majority of eukaryotic introns start with “GT” and end with “AG” (written as DNA; “GU” and “AG” in the RNA itself)

So, only exons encode proteins. When the exons of a gene can be joined in different combinations to produce more than one protein, we call this phenomenon alternative splicing. This means that one gene can encode many proteins. This form of encoding is very efficient, and new proteins can evolve much faster than in prokaryotes.
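Splicing and alternative splicing can be illustrated with simple string operations. The sketch below is an assumption-laden toy model, not a description of the cellular machinery: the gene is a hypothetical list of labeled exon and intron pieces written as DNA, introns are removed by position, and exon skipping is modeled by removing a span that runs from the start of one intron to the end of the next.

```python
def intron_spans(pieces):
    """Return half-open (start, end) positions of each intron."""
    spans, position = [], 0
    for kind, sequence in pieces:
        if kind == "intron":
            spans.append((position, position + len(sequence)))
        position += len(sequence)
    return spans


def splice(transcript, spans):
    """Remove the given spans and join what remains (the exons)."""
    kept, position = [], 0
    for start, end in sorted(spans):
        kept.append(transcript[position:start])
        position = end
    kept.append(transcript[position:])
    return "".join(kept)


def follows_gt_ag_rule(intron):
    """Check the common intron boundary pattern: starts GT, ends AG."""
    return intron.startswith("GT") and intron.endswith("AG")


# Hypothetical gene: three exons separated by two introns.
pieces = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTCCAG"),
    ("exon", "TTTGAA"),
    ("intron", "GTCCCTAG"),
    ("exon", "CGATAA"),
]
transcript = "".join(sequence for _, sequence in pieces)
spans = intron_spans(pieces)

# Constitutive splicing: remove both introns, keep all three exons.
print(splice(transcript, spans))

# Alternative splicing (exon skipping): remove one span that covers
# intron 1, exon 2, and intron 2, so exon 1 is joined directly to exon 3.
skip_span = (spans[0][0], spans[1][1])
print(splice(transcript, [skip_span]))

print(all(follows_gt_ag_rule(transcript[s:e]) for s, e in spans))
```

The same transcript yields two different mature sequences, which is the sense in which one gene can encode more than one protein.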


Proteins

• diverse and abundant class of macromolecules
• constitute more than 50% of the dry weight of cells
• play the central role in virtually all aspects of cell structure and function
• called the “machinery of life”

Proteins are linear polymers of amino acids.

Peptide classification
• two amino acid residues – dipeptide
• three amino acid residues – tripeptide
• four amino acid residues – tetrapeptide
• about 10 or more – oligopeptides
• about 20 or more – polypeptides
but the naming conventions are not precise.

What is a protein?

• proteins are composed of one or more polypeptide chains
• proteins composed of only one chain are called monomeric proteins (monomers)
• proteins composed of more than one chain are called multimeric proteins
• multimeric proteins may contain only one kind of polypeptide chain, in which case they get the prefix “homo”, or they may consist of different polypeptide chains, in which case they get the prefix “hetero”
• so, a protein consisting of two identical polypeptide chains is called a homodimer, and a protein consisting of four different chains is called a heterotetramer (or heteromultimer)
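The naming rule above is mechanical enough to write down as a short function. This is only an illustrative sketch: chains are represented by hypothetical labels (identical labels meaning identical polypeptide chains), and only the size names up to "tetramer" are spelled out, with anything larger falling back to the generic "multimer".

```python
# Size names for small assemblies; larger ones are just "multimer" here.
SIZE_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer"}


def describe_protein(chains):
    """Name a protein from its list of polypeptide chain labels."""
    if not chains:
        raise ValueError("a protein has at least one polypeptide chain")
    if len(chains) == 1:
        return "monomer"
    # One kind of chain -> "homo"; more than one kind -> "hetero".
    prefix = "homo" if len(set(chains)) == 1 else "hetero"
    return prefix + SIZE_NAMES.get(len(chains), "multimer")


print(describe_protein(["A"]))
print(describe_protein(["A", "A"]))
print(describe_protein(["A", "B", "C", "D"]))
```

Under this rule a four-chain protein with two kinds of chain, such as two copies each of hypothetical chains A and B, is also named a heterotetramer.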

Proteins are built from 20 standard amino acids (you should know them all!)

Architecture of Protein Molecules
Some of the forms are
• fibrous proteins – have a relatively simple, regular structure; have structural roles in cells
• globular proteins – roughly spherical in shape; compact
• membrane proteins – found in association with the various membrane systems of cells

Levels of Protein Structure

• primary structure – the amino acid sequence
• secondary structure – the local arrangement of amino acid residues that is a consequence of interactions between adjacent residues (H-bonds); there are three broad categories here: helical structure, sheet structure, and other types of structure (called “other” or “coil”)
• tertiary structure – the spatial arrangement of secondary structure elements; the tertiary structure is often referred to as the protein’s 3-D conformation or shape
• quaternary structure – complexes of polypeptide chains (called subunits)


Notice that primary structure is determined by covalent bonds, while higher-order structures are predominantly (though not always) determined by weak interactions.
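Because primary structure is just the order of residues, a protein can be handled computationally as a string over a 20-letter alphabet. The sketch below assumes the standard one-letter amino acid codes, which the notes themselves do not list, and uses a hypothetical example sequence; it checks a sequence for non-standard letters and reports its composition.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids (assumed notation).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence):
    """Return the letters in the sequence that are not standard residues."""
    return sorted(set(sequence.upper()) - AMINO_ACIDS)


def composition(sequence):
    """Return the fraction of the sequence made up by each residue."""
    counts = Counter(sequence.upper())
    total = len(sequence)
    return {residue: counts[residue] / total for residue in sorted(counts)}


peptide = "MKTAYIAKQR"  # hypothetical example sequence
print(invalid_residues(peptide))
print(composition(peptide))
```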



Biological Function of Proteins

Enzymes
• the largest class of proteins
• their main function is to accelerate the rates of biological reactions (by as much as 10^16 times)
• virtually every step in metabolism is catalyzed by an enzyme

Regulatory proteins
• regulate the ability of other proteins to perform their biological functions (e.g. insulin)
• regulate gene expression (usually bind to DNA and either activate or inhibit transcription, e.g. repressors)

Transport proteins
• their function is to transport specific substances from one place to another (e.g. hemoglobin or serum albumin)
• a different type of transport is performed by membrane proteins: these proteins take up metabolite molecules on one side of the membrane, transport them across the membrane, and release them on the other side (they form channels in the membrane through which the transported substances pass)

Storage proteins
• their biological function is to provide a reservoir of an essential nutrient (e.g. casein is the major nitrogen source for mammalian infants)

Contractile and motile proteins
• provide a cell with unique capabilities for movement
• cell division, muscle contraction, and cell motility represent some of the ways in which cells execute motion
• these proteins are filamentous or polymerize to form filaments (e.g. myosin)
• another class of proteins involved in motion is the so-called motor proteins, which drive the movement of vesicles, granules, and organelles

Structural proteins
• an apparently passive, but very important role of proteins
• provide strength and protection to cells and tissues
• monomeric units of structural proteins typically polymerize to generate long fibers (as in hair) or protective sheets of fibrous arrays
• collagen is an important fibrous protein found in bones, connective tissue, tendons, and cartilage, where it forms inelastic fibers of great strength
• elastin is an important component of ligaments and it has elastic properties
• fibroin is the major constituent of silk and spider webs

Scaffold (adapter) proteins
• have a recently discovered role in the complex response of cells to hormones and growth factors
• possess a modular organization in which specific parts of the protein’s structure recognize and bind certain structural elements in other proteins through protein-protein interactions
• scaffold proteins act as a scaffold onto which a set of different proteins is assembled into a multiprotein complex
• anchoring proteins bind other proteins, causing them to associate with other structures in the cell

Protective and exploitive proteins
• have a biologically active role in cell defense or protection
• a prominent example is the immunoglobulins (or antibodies) produced by the lymphocytes of vertebrates; they recognize and neutralize “foreign” molecules resulting from the invasion of the organism by bacteria, viruses, or other infectious agents
• blood-clotting proteins (e.g. thrombin) prevent the loss of blood when the circulatory system is damaged
• antifreeze proteins protect the blood of arctic/antarctic fish against freezing
• various toxins and defensive proteins (e.g. ricin deters predation by herbivores)

Exotic proteins
• do not quite fit the previous classification
• monellin – found in an African plant, has a very sweet taste and may be used as an artificial sweetener
• glue proteins – secreted by some marine organisms, enabling them to stick to hard surfaces

Many proteins contain chemical groups other than amino acids. These proteins are termed conjugated proteins. If the non-protein part is essential to protein function, it is referred to as a prosthetic group. Some of the conjugated proteins are
• glycoproteins – contain carbohydrates
• lipoproteins – conjugated with lipids
• nucleoproteins
• phosphoproteins – have phosphate groups attached
• metalloproteins

Sequence Homology

• in biology, two or more structures are said to be homologous if they are alike because of shared ancestry
• at the DNA or protein sequence level, homology is usually inferred when two sequences are similar
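Since similarity is the evidence from which homology is inferred, it helps to see how similarity can be quantified. The sketch below computes percent identity for two sequences that are assumed to be already aligned (equal length, with "-" marking gaps). The denominator convention used here, counting every column in which at least one sequence has a residue, is only one of several in use and is an assumption of this example, as are the function name and sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns holding identical residues.

    Columns where both sequences have a gap are ignored; a residue
    aligned against a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap and residue_b == gap:
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no columns to compare")
    return 100.0 * matches / compared


# Hypothetical alignment: 4 identical columns out of 6 compared.
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

The result is a degree of similarity; deciding whether the two sequences are homologous is a separate, yes-or-no inference drawn from it.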

Homologous sequences arise through gene duplication (paralogs) and/or speciation (orthologs) during evolution. Important: distinguish similar sequences from homologous sequences. Two sequences are either homologous or they are not (a binary decision), whereas similarity is a matter of degree. Sequence identity is used to measure how similar two sequences are.

Sources:

• Biochemistry by Reginald H. Garrett and Charles M. Grisham
• Fundamental Concepts of Bioinformatics by Dan E. Krane and Michael L. Raymer
• Internet (good for nucleotide information, has videos)