Introduction to molecular biology

Author: Gertrude Austin

Summary

• Cells
• Chromosomes
• DNA
• RNA
• Amino acids
• Proteins
• Genomics
• Transcriptomics
• Proteomics

Cells

All living beings are composed of cells, which are the basic unit of life. Every cell derives from another cell.

Prokaryotes
• No nucleus or internal membranes.

Eukaryotes
• Nucleus.
• Internal membranes.
• Organelles inside the cell that play different and specific roles.

Organisms can be:

Unicellular
• Prokaryotes: bacteria, archaea.
• Eukaryotes: baker's yeast.

Multicellular
• Eukaryotes: animals, plants, fungi…

Human beings: on the order of 10^13 cells (tens of trillions), 320 different types

Cell composition

70% Water

7% Small molecules:
• Salts
• Lipids
• Amino acids
• Nucleotides

23% Macromolecules:
• Proteins
• Polysaccharides

Cell functions: A cell contains all the information it needs to replicate itself (a virus does not!). Processes carried out by cells include:
• Metabolic pathways
• Translation of RNA into proteins
• …

Chromosomes
• The nucleus of eukaryotes contains one or more double-stranded DNA molecules. Each of these supermolecules is called a chromosome.
• For example, human beings have 22 pairs of autosomes and 1 pair of sex chromosomes.

Cells

• Almost all the cells in an organism have the same genome (sometimes there are slight differences).
• The DNA holds all the information needed by the cell to perform its functions.

Three basic macromolecules for life

• DNA
– It contains all the information needed by the cell (the “hard drive”).
– Actually, since almost all the cells in an organism share the same genome, it contains all the information needed by ANY cell to perform its functions.
– It stays (almost) always in the nucleus.

• RNA
– RNA has two main functions:
  • It copies the information in the DNA (located in the nucleus) and migrates to other parts of the cell where this information is used (messenger RNA, mRNA).
  • It has a crucial role in protein synthesis (transfer RNA, tRNA).

• Proteins – Many different functions (signalling, structural, enzymes, regulation…). They are the key constituents of the organism.

Central dogma of molecular biology

• It is not a DOGMA
– A dogma is an important part of a faith that must be believed.
– The researcher who coined the term eventually admitted: “I did not know what dogma meant”.
– There is strong evidence supporting it, so it is not a dogma (or at least there are other fields of knowledge that deserve the term much more ☺).

DNA vs RNA

DNA: code of life DeoxyriboNucleic Acid

There are four different nucleotides shared by all living beings: Adenine (A), Guanine (G), Cytosine (C) and Thymine (T). They form two complementary pairs: A-T and C-G.

DNA structure

DNA replication

Structure of a nucleotide

• Purines: Adenine (A) and Guanine (G). Double ring.
• Pyrimidines: Thymine (T), Cytosine (C) and Uracil (U). Single ring. Thymine is substituted by Uracil in RNA.

One “nucleotide” is a compound formed by one base (A, C, T or G), one sugar molecule and a phosphate group.

How to read a DNA sequence?

Every nucleotide has two binding positions: 5’ and 3’. The numbers refer to the positions of the carbon atoms in the sugar molecule. Consecutive nucleotides are joined by a phosphodiester bond.

The DNA molecule is built when bonds form between the 5’ and 3’ positions of consecutive nucleotides. DNA is always read from 5’ to 3’. Reading it in the wrong direction leads to funny mix-ups…

Sequence: TGACT

DNA Code:

• It can be seen as a code with only 4 letters instead of 2 (binary coding).
• How many symbols? 16 (counting the gap) to describe the different possibilities.

Symbol   Meaning             Origin of the name
G        G                   Guanine
A        A                   Adenine
T        T                   Thymine
C        C                   Cytosine
R        G or A              puRine
Y        T or C              pYrimidine
M        A or C              aMino
K        G or T              Keto
S        G or C              Strong interaction (3 H bonds)
W        A or T              Weak interaction (2 H bonds)
H        A or C or T         not-G, H follows G in the alphabet
B        G or T or C         not-A, B follows A
V        G or C or A         not-T (not-U), V follows U
D        G or A or T         not-C, D follows C
N        G or A or T or C    aNy
-        ---                 None (gap)
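As a minimal sketch of how this ambiguity code can be used, the table can be written as a Python dictionary and used to check whether a concrete sequence fits an ambiguous pattern. The names `IUPAC` and `matches` and the example pattern are illustrative assumptions, not part of the original slides; the gap symbol is left out for simplicity.

```python
# Ambiguity codes transcribed from the table (gap symbol omitted).
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT",
    "N": "GATC",
}

def matches(pattern, sequence):
    """Return True if `sequence` is one of the sequences described by `pattern`."""
    if len(pattern) != len(sequence):
        return False
    # Every concrete base must be allowed by the symbol at the same position.
    return all(base in IUPAC[symbol] for symbol, base in zip(pattern, sequence))

print(matches("TGRCY", "TGACT"))  # R allows A and Y allows T
print(matches("TGRCY", "TGTCT"))  # R does not allow T
```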

DNA is double stranded

• Hydrogen bonds hold the nucleotide pairs together.
• DNA is not symmetric!! It has two directions and is always read from 5’ to 3’.
• Both strands are complementary: A-T and C-G
– Forward strand
– Reverse strand
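Because the two strands are complementary and both are read from 5’ to 3’, the reverse strand can be derived from the forward strand by complementing every base and reversing the result. A minimal Python sketch follows (the function name `reverse_complement` is an illustrative choice; the example reuses the sequence TGACT from the earlier slide):

```python
# Map each base to its complementary partner (A-T, C-G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Return the opposite strand, written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]

forward = "TGACT"
reverse = reverse_complement(forward)
print(forward, reverse)
```

Applying the function twice returns the original strand, which is a quick sanity check for this kind of code.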

Mitochondrial DNA

Mitochondrial DNA (mtDNA) is the DNA located in organelles called mitochondria. All mtDNA is inherited from the mother (since the mitochondria of the zygote are provided by the egg cell). Mitochondria are sometimes described as "cellular power plants" because they generate ATP, used as a source of chemical energy.

RNA

(RiboNucleic Acid)

• Protein synthesis occurs in the ribosomes.
– Organelles located in the cytoplasm, outside the nucleus.
• DNA is in the nucleus.
• RNA transports the information from the nucleus to the ribosomes.
• The mechanism that creates RNA complementary to DNA is called transcription.
• DNA vs RNA: thymine (T) is substituted by uracil (U).
• RNA is single stranded. It can fold back on itself and form double-stranded stretches where its sequence is palindromic (“Sit on a potato pan, Otis”).
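A minimal Python sketch of transcription as a string operation, assuming the template strand is given written 5’ to 3’ (the function name `transcribe` and the short example sequence are illustrative assumptions): the RNA is complementary to the template and uses U instead of T.

```python
# Complement each DNA base with the matching RNA base (A pairs with U in RNA).
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")

def transcribe(template_strand):
    """Return the RNA (5' to 3') complementary to a template strand (5' to 3')."""
    return template_strand.translate(DNA_TO_RNA)[::-1]

coding = "ATGGAA"      # forward (coding) strand
template = "TTCCAT"    # its reverse complement, used as template
rna = transcribe(template)
print(rna)
```

The resulting RNA has the same sequence as the coding strand with every T replaced by U.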

Messenger RNA (mRNA)

• Part of the DNA is transcribed into RNA (the RNA is a copy of that DNA).
• The RNA goes to the cytoplasm where, in the ribosomes, the mRNA is used to build proteins.
• The RNA itself is the message from the nucleus to the cytoplasm.

Transcription process

Initiation
In the first stage, RNA polymerase binds to a region of the DNA called the promoter. The enzyme opens the DNA and allows the creation of an RNA molecule whose sequence is complementary to the DNA.

Elongation
RNA polymerase moves along the template strand, and RNA nucleotides are added to the new RNA molecule.

Termination
Termination is a complex process (it involves palindromes, i.e. hairpins, in prokaryotes and more complex mechanisms in eukaryotes). Once it has finished, the DNA closes again and the RNA moves from the nucleus to the cytoplasm.

Transcription in action

RNA maturation

• In eukaryotes, the sequence that appears in the genome is not exactly the one that is translated.
• RNA goes through a maturation process:
– Intermediate sequences called introns are removed.
– The exons are joined (splicing, carried out by the spliceosome).
• A single gene (DNA) can give rise to several variants (using different exons). This process is called alternative splicing.
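The maturation step can be sketched in Python as joining selected intervals of a pre-mRNA. The sequence below and its exon coordinates are invented toy data, and `splice` is a hypothetical helper name; real splice sites are determined by the cell's machinery, not by fixed coordinates.

```python
def splice(pre_mrna, exons):
    """Join the exons, given as (start, end) intervals (0-based, end exclusive)."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Toy pre-mRNA: exon 1, intron, exon 2, intron, exon 3 (invented sequences).
pre_mrna = "AUGGAA" + "GUAAGUCCAG" + "GUAUUU" + "GUGAGUACAG" + "AAAUAA"
exons = [(0, 6), (16, 22), (32, 38)]

full_length = splice(pre_mrna, exons)                # all three exons
exon_skipped = splice(pre_mrna, [exons[0], exons[2]])  # alternative splicing: exon 2 skipped
print(full_length)
print(exon_skipped)
```

Choosing a different subset of exons from the same gene yields a different mature mRNA, which is the idea behind alternative splicing.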

What is a gene?



Promoter region:
It contains the sequences needed to activate or deactivate the gene. Its limits are fuzzy and depend on the gene. The proximal promoter is considered to be 1000-50000 bp upstream of the TSS (transcription start site).

Exons:
Coding regions of the gene (they are converted into protein). 1 to 178 exons/gene (average: 8.8). 8 bp to 17 kb/exon (average: 145 bp).

Introns:
Non-coding regions flanked by two exons. Size (average): 1 kb to 50 kb/intron (much larger than exons).

Size of a gene:
The largest: 2.4 Mb (Dystrophin). Average: 27 kb.

PROTEINS

Amino acids
Amino acids are the basic structural building units of proteins. An amino acid is a molecule that contains both amine and carboxyl functional groups, with the general formula H2NCHRCOOH, where R is an organic substituent.

They form polymer chains
Short ones are called peptides; large ones are called polypeptides or proteins.

Translation
The process that forms the protein according to the mRNA template. As both the amine and carboxylic acid groups of amino acids can react to form amide bonds, one amino acid molecule can react with another and become joined through an amide linkage. This polymerization of amino acids is what creates proteins.

Amino acids

• 20 standard amino acids
– Bricks to build proteins.
• 10 essential amino acids
– Cannot be synthesized by the human body.
– They must therefore be obtained from food.
– Plants synthesize all of them.

Amino acids

Amino acid      3-Letter  1-Letter  Polarity  Acidity  Hydrophobicity
Alanine         Ala       A         nonpolar  neutral   1.8
Arginine        Arg       R         polar     basic    -4.5
Asparagine      Asn       N         polar     neutral  -3.5
Aspartic acid   Asp       D         polar     acidic   -3.5
Cysteine        Cys       C         polar     neutral   2.5
Glutamic acid   Glu       E         polar     acidic   -3.5
Glutamine       Gln       Q         polar     neutral  -3.5
Glycine         Gly       G         nonpolar  neutral  -0.4
Histidine       His       H         polar     basic    -3.2
Isoleucine      Ile       I         nonpolar  neutral   4.5
Leucine         Leu       L         nonpolar  neutral   3.8
Lysine          Lys       K         polar     basic    -3.9
Methionine      Met       M         nonpolar  neutral   1.9
Phenylalanine   Phe       F         nonpolar  neutral   2.8
Proline         Pro       P         nonpolar  neutral  -1.6
Serine          Ser       S         polar     neutral  -0.8
Threonine       Thr       T         polar     neutral  -0.7
Tryptophan      Trp       W         nonpolar  neutral  -0.9
Tyrosine        Tyr       Y         polar     neutral  -1.3
Valine          Val       V         nonpolar  neutral   4.2
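One simple use of the hydrophobicity column is to average it over a peptide to get a rough idea of how hydrophobic the peptide is. The sketch below copies the values from the table; the function name `mean_hydrophobicity` and the example peptide are illustrative assumptions, and the average is only a crude summary, not a structure prediction.

```python
# Hydrophobicity values per 1-letter code, copied from the table.
HYDROPHOBICITY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "E": -3.5, "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydrophobicity(peptide):
    """Average hydrophobicity of a peptide written in 1-letter codes."""
    return sum(HYDROPHOBICITY[aa] for aa in peptide) / len(peptide)

peptide = "MEVFKAPPIGI"  # example peptide in 1-letter codes
print(round(mean_hydrophobicity(peptide), 2))
```

A positive average suggests a mostly hydrophobic peptide, a negative one a mostly hydrophilic peptide.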

Proteins

• Proteins are large molecules composed of amino acids.
• Their 3D structure is complex.
– It is not a double helix like DNA: the shape is different for each protein.
• Proteins fold. This folding plays a crucial role in their function.
– For example, mad cow disease is produced by the abnormal folding of a protein.

Protein structure
• Protein structure is crucial in determining their chemical properties and even their function.
• The 3D structure determines which amino acids are on the surface.
• There are 4 levels at which structure can be studied:
1. Amino acid sequence
2. Polypeptide folding
3. Protein shape
4. Protein interactions (which include changes in the positions of the atoms).

Central Dogma (once again)

• Transcription brings the data from DNA to RNA.
• RNA travels from the nucleus to the ribosomes.
• Translation produces the protein according to the genetic code and the corresponding mRNA.
– tRNA is used as a lorry to carry the amino acids, as we will see.

Translation

• Translation is the second step in the central dogma.
• mRNA is decoded using the genetic code.
• Amino acids follow the sequence given by the mRNA.
• This process takes place in the cytoplasm.
• tRNA is used as a “lorry” to carry the amino acids.
• Ribosomes are the factories that build the proteins.

Transfer RNA (tRNA)

• tRNA is an RNA that carries amino acids to the ribosomes in order to build the proteins.
• tRNA is more abundant than mRNA (roughly 15% vs 5% of the RNA in the cell); most of the RNA in the cell is ribosomal RNA (rRNA).
• tRNA recognizes the mRNA and transfers the corresponding amino acid to the protein being created.

Genetic code
Codon: a sequence of 3 nucleotides that codes for an amino acid according to this table.

AUG codes for methionine, and is also the start codon. The first AUG in the mRNA is where translation starts.

Some exceptions:

Genetic code is almost universal.

Other considerations…
• A codon is a sequence of 3 nucleotides (DNA or RNA) that codes for a particular amino acid.
– There are 4 possible bases (RNA): A, C, G and U
– 3 bases per codon

• Therefore, there are 4 * 4 * 4 = 64 possible codons.
• Special codons:
– Start codon: AUG. Translation starts at this codon. It also codes for an amino acid (methionine).
– Stop codons (three flavours): UAA, UAG, UGA

• That leaves 61 codons (AUG included) to code for the 20 amino acids.
– The genetic code is redundant: the same amino acid may be coded by several codons.
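The counting argument can be checked with a short Python sketch that enumerates all codons of the standard genetic code. The compact 64-letter string is the usual way of writing the standard table with bases ordered U, C, A, G (`*` marks a stop); the variable names are illustrative assumptions.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the codon -> amino acid table for all 4 * 4 * 4 codons.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE), stop_codons, len(sense_codons))
print(len(codons_per_amino_acid), codons_per_amino_acid.most_common(3))
```

The counter makes the redundancy visible: some amino acids have six codons, while methionine and tryptophan have only one.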

Translation again:
Anticodon: a sequence of 3 nucleotides in tRNA that recognizes the corresponding codon in mRNA. Using the anticodon, the amino acid to include in the protein is selected. tRNA carries the “free” amino acid and, in the ribosome, it is joined to the polypeptide chain that is being created. For example, the tRNA with anticodon UAC corresponds to the AUG codon that, in turn, codes for methionine.

ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG…
ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G…
M   E   V   F   K   A   P   P   I   G   I   stop
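The worked example can be reproduced with a minimal Python sketch of translation. It uses the standard genetic code written over the DNA alphabet (T instead of U), starts at the first ATG and stops at the first stop codon; the function name `translate` is an illustrative assumption, and real translation involves many details this sketch ignores.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate from the first ATG up to (not including) the first stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"))
```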

Translation in action

In brief:
• Proteins are coded in the genes in the DNA located in the nucleus. DNA always stays in the nucleus.
• Ribosomes are factories that build proteins, located in the cytoplasm. mRNA carries the message from the nucleus to the ribosomes. There is an intermediate step called mRNA maturation in which introns are excluded and exons are retained.
• Ribosomes build what the mRNA codes, using amino acids that, in turn, are carried by tRNA.
– Ribosomes are composed of proteins and rRNA (a third class of RNA… and there are even more!!)

Some important Definitions in BIOINFORMATICS

ORF (Open Reading Frame):

• Coding from DNA to protein is done by codons. There are three possibilities (starting at the first, the second or the third nucleotide of the sequence), and we can use one strand (forward) or the other (reverse). Each of these six possibilities is called a reading frame. Only one of them is valid. For example, on the forward strand this sequence can be read in three ways:
  Frame 1: ATG CC → ATG (M)
  Frame 2: A TGC C → TGC (C)
  Frame 3: AT GCC → GCC (A)
• A sequence flanked by a start codon and a stop codon is called an Open Reading Frame (ORF).

[Figure: a genomic sequence with an open reading frame running from ATG to TGA]
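The six reading frames can be listed with a short Python sketch: three offsets on the forward strand and three on its reverse complement. The function name `reading_frames` is an illustrative assumption, and incomplete codons at the end of a frame are simply dropped.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reading_frames(dna):
    """Return the six reading frames of `dna`, each as a list of complete codons."""
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse strand, 5' to 3'
    frames = []
    for strand in (dna, reverse):
        for offset in range(3):
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames.append(codons)
    return frames

for number, codons in enumerate(reading_frames("ATGCC"), start=1):
    print(number, codons)
```

For ATGCC the three forward frames start with ATG, TGC and GCC, matching the M, C and A of the example.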

ORFs as gene candidates
• A gene candidate is an open reading frame that begins with a start codon (ATG).
• Most prokaryotic genes code for proteins that are 60 or more amino acids in length.
• The probability that a random sequence of n codons has no stop codons is (61/64)^n.
• When n is 50, there is a probability of about 91% that the random sequence contains a stop codon.
• When n is 100, this probability exceeds 99%.
– A long sequence without stop codons is probably coding for a protein.
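The figures above follow directly from the formula and can be checked with a few lines of Python. The sketch assumes all 64 codons are equally likely at every position, which is the idealized model behind the formula rather than a property of real genomes; `prob_contains_stop` is an illustrative name.

```python
def prob_contains_stop(n_codons):
    """Probability that a random sequence of n codons has at least one stop codon."""
    # 61 of the 64 codons are not stop codons, so (61/64)^n is the chance of no stop.
    return 1 - (61 / 64) ** n_codons

for n in (50, 60, 100):
    print(n, round(prob_contains_stop(n), 4))
```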

Definitions:
• Nucleic acids are composed of nucleotides, counted as bases or base pairs.
• Short forms: nt (nucleotides), bp (base pairs).
• 400 nt means 400 nucleotide positions (in double-stranded DNA that is 800 nucleotides!).
• 400 bp means 400 base pairs.
• 1,000,000 bp = 1,000 kb = 1 Mb

Genomic analysis

How to build a whole genome in four steps:
– Cut it!
  • Restriction enzymes break the DNA at specific sites.
  • It is divided into short pieces.

– Copy it!
  • It is easy to copy DNA (it was designed for that!).
  • We get many copies of each DNA fragment using the Polymerase Chain Reaction (PCR).
    – Each PCR cycle doubles the amount of DNA.
      » 20 cycles of PCR increase it by 2^20 (about 1 million times).

– Read it!
  • Electrophoresis to read small fragments.

– Assemble it!
  • The fragments contain overlapping sequences that can be used to perform the assembly (just like building a puzzle).
  • This puzzle has “large sky regions” that are difficult to build: large parts of the genome are quite repetitive (and they are also important).
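The puzzle-like final step can be illustrated with a toy greedy assembler in Python: it repeatedly merges the two fragments with the longest suffix/prefix overlap. This is only a sketch under strong assumptions (error-free reads, a single strand, no repeats); the function names and fragments are invented for illustration and this is not how production assemblers work.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Merge fragments pairwise, always taking the largest overlap first."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: the remaining pieces stay separate
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags

print(greedy_assemble(["ATGGAAGT", "AAGTATTT", "ATTTAAAG"]))
```

Repetitive regions are exactly where this greedy strategy breaks down, because many fragments then overlap equally well in several places.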

Genomic analysis and bioinformatics

Once we have the sequence, we can find genes (using the statistical properties of the intragenic regions). It is also important to measure gene expression and to predict gene function.

Gene hunting

DNA sequence analysis
• 2001: First draft version of the human genome.
• 2003: Human genome curated. First “release version”. Mouse genome completed.

Protein sequence analysis
• Protein function can be inferred from sequence and structure. Structure analysis gives better results.

Bioinformatics analysis:
• DNA
– Useful for genomic diseases
  • Single gene (Mendelian), chromosomal.
  • Multifactorial or complex diseases.
– Predisposition to develop a disease
– Does not change if the organism has an acquired disease condition → Not valid as a marker of an acquired disease

• RNA
– Easy to measure
– RNA concentration changes with the disease state → Early marker for different diseases

• Proteins
– It is difficult to perform a whole proteome analysis.
– They ultimately explain most of the disease targets → Closer to the biological fact → Most reasonable drug targets

Genomes:

ORGANISM                               CHROMOSOMES   SIZE (bp)       GENE NUMBER
Homo sapiens (Humans)                  23            3,200,000,000   ~30,000
Mus musculus (Mouse)                   20            2,600,000,000   ~30,000
Drosophila melanogaster (Fruit Fly)    4             180,000,000     ~18,000
Saccharomyces cerevisiae (Yeast)       16            14,000,000      ~6,000
Zea mays (Corn)                        10            2,400,000,000   ???

Transcriptome:

• The different mRNAs (including splice forms) of a particular organism.
– About 30,000 genes
– About 250,000 splice variants

• Other RNA functions are related to the regulation of gene expression.
– miRNA: small pieces of RNA that silence a gene after transcription (they block the translation of its mRNA or trigger its degradation).

Proteome
• The complete collection of proteins in an organism
– Nobody knows how many… At least several millions.
– For each splice variant, post-translational modifications can produce different proteins (with different functions).

• One gene → Several splice forms → Several proteins
– Proteins are modified by other molecules that are attached to them.
  • Phosphate, acyl, methyl, sugars, lipids, etc.
  • They radically change the activity of the protein.
– Many proteins have two forms: idle and active. The transition is made by adding a phosphate group.

• Many disparate biological activities can be assigned to a single gene.

Questions?

Problem:

Genetic code

AUG codes methionine, and it is also the start codon.
