
The world of non-protein-coding RNAs
Francis de Morais Franco Nunes
18/07/14
Author: Joella Harvey

Focus on protein-coding RNAs


Differential gene expression over time (temporal)

[Diagram: genes A, B and C; DNA → transcription → mRNA → translation → proteins]

Differential gene expression in space (spatial)


Classes of RNAs

... Focus also on genetic elements

Humans: ~98%

"Junk DNA"

Genes: < 2%

Non-coding: > 98%

...

...

>90% of the human genome (euchromatin) is transcribed

TRANSCRIBED GENOME

Birney et al. 2007 - Nature; Kapranov et al. 2007 - Nature Rev. Genet.

Next-generation sequencing: >90%

~75%
- Non-repetitive sequences
- Functional or noise?
- Numerous classes of ncRNAs
- Complexity = genetic information + fine regulation

Nagalakshmi et al. 2008 - Science

PERVASIVE TRANSCRIPTION

Chromatin of different cells

E = euchromatin; H = heterochromatin

Source: http://www2.uah.es/biologia_celular/LaCelula/Cel4Nuc.html

Why ncRNAs ... long


Non-coding RNA


- They orchestrate DIVERSE biological processes and molecular functions

Replication, transcription, translation, silencing, chromosome stability, RNA modification/processing/stability, protein stability/localization…

Behavior, senescence, reproduction, metabolism, apoptosis, cell proliferation and differentiation, morphogenesis, stress responses…

Perturbations in these regulators result in disorders!

June 2011

Expression level

Oligoribonuclease exonuclease


PIWI-interacting RNAs (piRNAs)

Size = 28–33 nt

Origin = germline cells. In Drosophila, they have also been found in somatic cells. Some mouse piRNAs originate from repetitive sequences.

Action = they associate with proteins of the PIWI family and control transposon activity in the germline, as well as germline viability, in C. elegans, Drosophila, fish and mammals.
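The size criterion on this slide can be illustrated with a short Python sketch. It is only a first-pass filter under stated assumptions: the 28–33 nt window is the one quoted above, the function name `pirna_sized` and the sample reads are hypothetical, and read length alone does not establish that a sequence is a piRNA.

```python
from typing import Iterable, List

# Size window quoted on the slide for piRNAs (inclusive, in nucleotides).
PIRNA_MIN_NT = 28
PIRNA_MAX_NT = 33


def pirna_sized(reads: Iterable[str],
                min_nt: int = PIRNA_MIN_NT,
                max_nt: int = PIRNA_MAX_NT) -> List[str]:
    """Return the reads whose length falls inside the piRNA size window.

    Hypothetical helper: length is a necessary-looking criterion only,
    not proof of PIWI association.
    """
    return [read for read in reads if min_nt <= len(read) <= max_nt]


# Made-up reads of length 22, 28, 30, 33 and 40 nt.
reads = ["A" * 22, "U" * 28, "G" * 30, "C" * 33, "A" * 40]
kept = pirna_sized(reads)
print([len(read) for read in kept])
```

The bounds are parameters because other sources quote slightly different piRNA size ranges.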

© The Authors Journal compilation © 2013 Biochemical Society Essays Biochem. (2013) 54, 79–90: doi: 10.1042/BSE0540079


Role of small nuclear RNAs in eukaryotic gene expression

Saba Valadkhan1 and Lalith S. Gunawardane

Center for RNA Molecular Biology, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, U.S.A.

Abstract

Eukaryotic cells contain small, highly abundant, nuclear-localized non-coding RNAs [snRNAs (small nuclear RNAs)] which play important roles in splicing of introns from primary genomic transcripts. Through a combination of RNA–RNA and RNA–protein interactions, two of the snRNPs, U1 and U2, recognize the splice sites and the branch site of introns. A complex remodelling of RNA–RNA and protein-based interactions follows, resulting in the assembly of catalytically competent spliceosomes, in which the snRNAs and their bound proteins play central roles. This process involves formation of extensive base-pairing interactions between U2 and U6, U6 and the 5′ splice site, and U5 and the exonic sequences immediately adjacent to the 5′ and 3′ splice sites. Thus RNA–RNA interactions involving U2, U5 and U6 help position the reacting groups of the first and second steps of splicing. In addition, U6 is also thought to participate in formation of the spliceosomal active site. Furthermore, emerging evidence suggests additional roles for snRNAs in regulation of various aspects of RNA biogenesis, from transcription to polyadenylation and RNA stability. These snRNP-mediated regulatory roles probably serve to ensure the co-ordination of the different processes involved in biogenesis of RNAs and point to the central importance of snRNAs in eukaryotic gene expression.

Keywords: group II intron, ribozyme, small nuclear RNA, spliceosome.

1 To whom correspondence should be addressed (email [email protected]).

Figure 1. U6 and U2 snRNAs and the mRNA at the time of first and second steps of splicing
The location of U6, U2 and the U6 ISL is shown. The intron is shown by a thick light blue line connecting the two exons. Position of the 5′ splice site (5′SS), 3′ splice site (3′SS) and branch site are shown. Solid arrows point to the site of the nucleophilic attack during the two steps of splicing. The first step involves a nucleophilic attack by the 2′ hydroxy group of a specific adenosine residue in the intron, the branch site adenosine (the bulged A), on the 5′ splice site. This leads to a trans-esterification reaction in which the 2′ oxygen of the branch-site adenosine replaces the 3′ oxygen of the last nucleotide of the upstream exon. The result of this reaction is the release of the first exon and the formation of an unusual 2′–5′ linkage between the branch site adenosine and the first nucleotide of the intron (right-hand panel). During the second step, the free 3′ hydroxyl moiety of the newly released exon is activated for a similar nucleophilic attack on the 3′ splice site, resulting in ligation of the two exons and release of the intron as a branched lariat. Base-pairing interactions are shown by short black lines. The location of the 2′–5′ linkage formed after the first step of splicing at the branch site is shown.
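The two trans-esterification steps described in the figure caption can be mimicked on a plain string. The following Python sketch is a toy illustration, not a model of the spliceosome: the function `splice`, the `SplicingResult` type, the coordinate convention and the example sequence are all assumptions made for this example, and the lariat is returned as a linear string with the branch position recorded separately.

```python
from typing import NamedTuple


class SplicingResult(NamedTuple):
    mrna: str           # exon 1 ligated to exon 2
    lariat: str         # excised intron (sequence only, topology not modelled)
    branch_offset: int  # position of the branch-site A within the intron


def splice(pre_mrna: str, five_ss: int, branch_site: int, three_ss: int) -> SplicingResult:
    """Toy two-step splicing (hypothetical helper).

    five_ss     : index of the first intron nucleotide
    branch_site : index of the branch-site adenosine, inside the intron
    three_ss    : index of the first nucleotide of the downstream exon
    """
    if not (0 < five_ss <= branch_site < three_ss <= len(pre_mrna)):
        raise ValueError("expected 0 < five_ss <= branch_site < three_ss <= len(pre_mrna)")
    if pre_mrna[branch_site] != "A":
        raise ValueError("the branch site must be an adenosine")

    # Step 1: the branch-site 2' hydroxyl attacks the 5' splice site,
    # releasing exon 1 and leaving an intron-exon 2 lariat intermediate.
    exon1 = pre_mrna[:five_ss]
    lariat_intermediate = pre_mrna[five_ss:]

    # Step 2: the free 3' hydroxyl of exon 1 attacks the 3' splice site,
    # ligating the exons and releasing the intron.
    intron_length = three_ss - five_ss
    intron = lariat_intermediate[:intron_length]
    exon2 = lariat_intermediate[intron_length:]

    return SplicingResult(exon1 + exon2, intron, branch_site - five_ss)


# Made-up example transcript.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCUAACAG", "GAAUAA"
pre = exon1 + intron + exon2
five_ss = len(exon1)
branch = five_ss + intron.rindex("A")  # use the last A in the intron as branch site
three_ss = five_ss + len(intron)

result = splice(pre, five_ss, branch, three_ss)
print(result.mrna)
print(result.lariat, result.branch_offset)
```

Keeping the two steps as separate statements mirrors the caption: exon 1 is released first, and ligation happens only in the second step.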


Essays in Biochemistry volume 54 2013

abolish either transcription or translation (Figure 1A) [3]. A second class of pseudogene, the duplicated pseudogene (Figure 1B), is formed when replication of the chromosome is performed incorrectly [2]. Such duplication events often lead to the formation of functional gene families, such as those found in the Hox gene clusters, but if part of the gene is not faithfully copied then these can lead to frameshift mutations or the loss of a promoter or enhancer, thus resulting in a non-functional duplicated pseudogene. The final class, known as the processed pseudogene (Figure 1C), is formed when an mRNA molecule is reverse-transcribed and integrated into a new location in the parental genome [4]. Because processed pseudogenes are produced from mRNA, they usually lack introns and a promoter, and are therefore only transcribed if they become integrated close to a pre-existing promoter [5]. The sequencing of a range of genomes, including the human genome, has revealed the extent of pseudogene abundance [6–8]. Estimates for the number of human pseudogenes range from 10000 to 20000, making them almost as prevalent as coding genes [9]. The majority of these are processed pseudogenes [6–8] and less than 100 are unitary pseudogenes [3]. Interestingly, the processed pseudogenes found in the human genome have been formed from just 10% of the coding genes [6,8], suggesting that either not all genes are capable of producing processed pseudogenes, or that only the processed pseudogenes produced by certain types of gene are selected for by evolution. The types of genes that produce processed pseudogenes are predominantly highly expressed housekeeping genes or shorter RNAs such as genes encoding ribosomal proteins [10]. It is of note that whereas mammalian genomes are particularly well endowed with pseudogene numbers [9], they are by no means the only species that harbour them. 
Pseudogenes have been found in various species [11], including bacteria, plants, insects and nematode worms, examples of which can be found in various databases [12]. Pseudogenes have often been labelled as ‘junk DNA’ because they lack protein-coding capacity. In fact, some genes that appear to be pseudogenized may in fact code for proteins

Figure 1. Different classes of pseudogene (A) A unitary pseudogene is formed when a spontaneous mutation occurs in a coding gene. Such mutations may ablate transcription from the promoter or cause premature stop codons or frameshifts to occur. (B) A duplicated pseudogene is formed when a gene is duplicated, but in such a way that mutations in the copy prevent formation of a protein. (C) Processed pseudogenes arise when DNA is transcribed into RNA, which is then reverse-transcribed into copy DNA (cDNA) and integrated into the genome. Such pseudogenes often lack promoter activity and may have deletions or truncations that prevent protein formation. Closed boxes depict exons; open boxes depict introns; ‘X’ shows a mutation that prevents the DNA from being able to make a protein.


Metazoan genomes are currently predicted to contain thousands of these loci, from approximately 1119 in the fruitfly [9] to more than 8000 in the human genome [10,11]. Such loci can be described as genes since they show some of the transcriptional, chromatin and evolutionary features of protein-coding genes. Nevertheless, this should not be meant to imply that each is functional. For example, some transcripts (RNA molecules) may not themselves transact a function, even if the act of their transcription is functional, for example by transcriptional interference [7]. Like the majority of mRNAs, many lncRNAs are thought to be polyadenylated and transcribed by RNA Pol II (RNA polymerase II). Recent mouse and human lincRNA sets have been defined using chromatin immunoprecipitation experiments targeting H3K4me3 (histone H3 Lys4 trimethylation) and H3K36me3 (histone H3 Lys 36 trimethylation) modifications [12,13] which are markers for RNA Pol II activity. Such lncRNAs may be spliced, and show a tendency to be expressed in a low and tissue–specific manner, with many thought not to be


Figure 1. Two examples of mouse lncRNA loci whose transcripts’ sequences overlap protein–coding loci UCSC genes are shown in blue, with supporting mRNA sequence evidence in black and a conservation track across 30 vertebrate species is shown below. lncRNA loci are highlighted using yellow boxes. (a) Airn, an imprinted lncRNA locus. (b) Evf2, also known as Dlx6os. An antisense transcript Dlx6as is also apparent.

Essays Biochem. (2013) 54, 103–112: doi: 10.1042/BSE0540103

8

Pseudogenes as regulators of biological function


Ryan C. Pink and David R.F. Carter1

School of Life Sciences, Oxford Brookes University, Gipsy Lane, Headington, Oxford OX3 0BP, U.K.

Abstract

A pseudogene arises when a gene loses the ability to produce a protein, which can be due to mutation or inaccurate duplication. Previous dogma has dictated that because the pseudogene no longer produces a protein it becomes functionless and evolutionarily inert, being neither conserved nor removed. However, recent evidence has forced a re-evaluation of this view. Some pseudogenes, although not translated into protein, are at least transcribed into RNA. In some cases, these pseudogene transcripts are capable of influencing the activity of other genes that code for proteins, thereby influencing expression and in turn affecting the phenotype of the organism. In the present chapter, we will define pseudogenes, describe the evidence that they are transcribed into non-coding RNAs and outline the mechanisms by which they are able to influence the machinery of the eukaryotic cell.

Keywords: non-coding RNA, pseudogene, RNA, transcription.

Introduction

A pseudogene is generally defined as a copy of a gene that has lost the capacity to produce a functional protein. They were first discovered in the 1970s when a copy of the 5S rRNA gene was found in Xenopus laevis with homology to the active gene, but with a clear truncation that rendered it non-functional [1]. Sporadic discovery and characterization of pseudogenes over the following 20 years has revealed a number of mechanisms for pseudogene formation [2]. Unitary pseudogenes are formed when spontaneous mutations occur in a coding gene that

1 To whom correspondence should be addressed (email [email protected]).


Figure 2. Mechanisms of pseudogene functionality (A) Pseudogene RNA transcribed in the reverse (antisense) direction can combine with forward (sense) transcripts from the coding gene to produce dsRNA. This can inhibit translation of the coding RNA, or produce siRNAs that go into the RNAi pathway and cause the coding RNA to be degraded. siRNAs that destroy the coding transcript can also be generated by (B) pairing between sense and antisense transcribed pseudogenes and (C) double-stranded regions formed by secondary structure within a single pseudogene transcript. (D) Pseudogene transcripts may share binding sites for miRNAs or trans-acting proteins that regulate the stability of the mRNA. Increased levels of pseudogene transcripts can compete for these factors and therefore shield the coding transcripts from their effects.
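The competition described in panel (D) can be made concrete with a deliberately simple Python sketch. It is not taken from the chapter: it assumes a fixed miRNA pool that spreads over binding sites in proportion to their number, equal affinity for coding and pseudogene transcripts, and one site per transcript. The function name `mirna_share_on_coding` and the numbers are hypothetical.

```python
def mirna_share_on_coding(coding_sites: float, pseudogene_sites: float) -> float:
    """Fraction of a shared miRNA pool that lands on the coding transcript.

    Toy proportional-sharing assumption: every binding site is equally
    likely to capture a miRNA, whichever transcript carries it.
    """
    if coding_sites < 0 or pseudogene_sites < 0:
        raise ValueError("site counts must be non-negative")
    total = coding_sites + pseudogene_sites
    if total == 0:
        return 0.0
    return coding_sites / total


# Raising pseudogene transcript levels shields the coding transcript.
for pseudogene_level in (0, 100, 300):
    share = mirna_share_on_coding(100, pseudogene_level)
    print(pseudogene_level, share)
```

Under these assumptions, adding pseudogene transcripts with the same binding sites lowers the share of miRNA acting on the coding mRNA, which is the shielding effect the caption describes.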

Interestingly, the siRNAs generated did not always come from the pairing of a pseudogene RNA with a coding gene mRNA. Sometimes they were generated from the pairing of two pseudogenes (one transcribed in the sense direction and the other in the antisense), but the siRNA then represses the coding parent gene, such as in the case of HDAC1 (encoding a histone deacetylase enzyme) (Figure 2B) [34]. In other instances the siRNAs were generated from the internal pairing of different regions within the same pseudogene transcript (i.e. from double-stranded regions formed by secondary structure folding). An example of the latter is the formation of hairpin loop structures in the Au76 pseudogene RNA, which are processed into siRNAs that repress expression of the homologous coding gene Rangap1 (encoding a protein that regulates G-coupled receptor signalling) (Figure 2C) [35]. Other organisms, including rice [36] and trypanosomes [37] have been shown to generate siRNAs from pseudogenes, which

Long ncRNAs

CNS and PNS

Mesoderm and germ cells

Constitutive

Long non-coding RNAs

2. Regulatory ncRNAs

Types (in source order):
- Short ncRNA
- miRNA (22-23 nt)
- piRNA (26-31 nt, piwi-interacting RNA)
- Medium ncRNA (50-200 nt)
- paRNA (promoter-associated ncRNA)
- Long (large) ncRNA (>200 nt)
- Intergenic ncRNA
- Intronic ncRNA
- […] LncRNA
- Antisense transcript
- Pseudogene transcript: generation of siRNAs or ceRNAs; stabilization of its coding transcript by competitive binding of miRNAs
- Enhancer-like ncRNA (eRNA): activation of promoter activity by unknown mechanism
- Mitochondrial ncRNA: cell cycle regulation and more unknown functions
- Repeat-associated ncRNA: regulation of repeat silencing
- Satellite ncRNA: involvement in formation and function of centromere-associated complexes

Functions (in source order):
- rRNA modification
- RNA splicing, polyadenylation
- Degradation of mRNA or repression of translation
- Silencing of transposons
- Gene repression […] via interacting with PRC2
- Epigenetic regulators of transcription […]
- mRNA stability of its homologous coding gene

MINI REVIEW ARTICLE
published: 09 January 2012
doi: 10.3389/fgene.2011.00107

The long non-coding RNAs: a new (p)layer in the “dark matter”

Thomas Derrien 1*, Roderic Guigó 1,2 and Rory Johnson 1

1 Bioinformatics and Genomics, Centre for Genomic Regulation, Universitat Pompeu Fabra, Barcelona, Spain
2 Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain

Edited by: Philipp Kapranov, St. Laurent Institute, USA
Reviewed by: Yohei Kirino, Cedars-Sinai Medical Center, USA; Chris Ponting, MRC Functional Genomics Unit, UK
*Correspondence: Thomas Derrien, Bioinformatics and Genomics Group, Centre for Genomic Regulation, Biomedical Research Park of Barcelona, C. Dr. Aiguader 88, Barcelona 08003, Spain. e-mail: [email protected]

The transcriptome of a cell is represented by a myriad of different RNA molecules with and without protein-coding capacities. In recent years, advances in sequencing technologies have allowed researchers to more fully appreciate the complexity of whole transcriptomes, showing that the vast majority of the genome is transcribed, producing a diverse population of non-protein coding RNAs (ncRNAs). Thus, the biological significance of non-coding RNAs (ncRNAs) has been largely underestimated. Amongst these multiple classes of ncRNAs, the long non-coding RNAs (lncRNAs) are apparently the most numerous and functionally diverse. A small but growing number of lncRNAs have been experimentally studied, and a view is emerging that these are key regulators of epigenetic gene regulation in mammalian cells. LncRNAs have already been implicated in human diseases such as cancer and neurodegeneration, highlighting the importance of this emergent field. In this article, we review the catalogs of annotated lncRNAs and the latest advances in our understanding of lncRNAs.

Keywords: non-coding RNAs, regulation, long non-coding RNA, epigenetics

Am J Transl Res 2012;4(2):127-150
www.ajtr.org /ISSN:1943-8141/AJTR1202001

Review Article
Long non-coding RNAs: versatile master regulators of gene expression and crucial players in cancer

Lei Nie, Hsing-Ju Wu, Jung-Mao Hsu, Shih-Shin Chang, Adam M LaBaff, Chia-Wei Li, Yan Wang, Jennifer L. Hsu, Mien-Chie Hung

Department of Molecular and Cellular […], The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030; Center for Molecular Medicine, China Medical University Hospital, Taichung, Taiwan; Graduate Institute of Cancer Biology, College of Medicine, China Medical University, Taichung, Taiwan; Asia University, Taichung, Taiwan

Received February 3, 2012; accepted February 16, 2012; Epub April 8, 2012; Published April 30, 2012

Abstract: With rapid development of sequencing technologies such as deep sequencing and whole genome high-density tiling array, we now know that most of the “junk” genomic sequences are transcribed as non-coding RNAs (ncRNAs). A large number of long ncRNA transcripts (> 200bp) have been identified, and these long ncRNAs (LncRNAs) are found to be crucial regulators for epigenetic modulation, transcription, and translation. In this review, we briefly summarize the regulatory function of LncRNAs with a particular focus on the underlying mechanisms of LncRNAs in oncogenesis, tumor metastasis and suppression.

Keywords: Long non-coding RNA (LncRNA), epigenetic regulation, competitive endogenous RNA (ceRNA), oncogenic lncRNA, pseudogene transcript, natural antisense RNA (NAT)

Derrien et al.

THE CELL, AN RNA-DEPENDENT MACHINERY

Some of the most fundamental cellular processes rely on anciently conserved non-coding RNAs (ncRNAs). These include, for instance, the ribosomal RNAs which are assembled together to constitute ribosomes, the factories for translation of messenger RNAs (mRNAs) into proteins. Other ancient ncRNAs include the transport of amino acids through the ribosomes via the transfer RNAs (tRNAs) or the splicing of introns of pre-mRNA which is mediated in part by the snRNAs (small nuclear RNAs). More recently, the crucial role of ncRNA in post-transcriptional gene regulation has been highlighted by the discovery of microRNAs (miRNAs), which repress gene expression by targeting semi-complementary motifs in target mRNAs (Lee et al., 1993). Many additional classes of ncRNAs have been discovered in the last […] et al., 2008).

Amongst the various ncRNA classes, we know probably least about the long non-coding RNAs (lncRNAs). In particular, what is the total number of lncRNAs in mammalian genomes? Where are they localized? What is their significance in the context of evolution, and particularly in the evolution of complex processing in primate brains? Now that good catalogs of lncRNAs have become available, the most critical question is to address the functionality of these transcripts. This question is particularly acute given that we have no a priori methods for the prediction of lncRNA function based on sequence alone, in contrast to proteins where confident inferences on protein function can be made by simply analysis of the amino acid sequence. Given the sheer number of new unexplored lncRNA transcripts (∼15,000 at last count; Derrien et al., submitted), the field must move forward to address this question of function by using large-scale functional screens. Such moves are already underway, with groups such as Eric Lander’s carrying out siRNA screens (Guttman et al., 2011). Large-scale analysis of protein-binding partners will also add another layer of valuable information to such annotation of lncRNA catalogs. Hopefully, advances in bioinformatic annotation of RNA structures (Torarinsson et al., 2006; Parker et al., 2011), and methods to predict functions based on this, will be developed. In this way, we might build up a richly annotated catalog of lncRNAs with functional predictions, that will enable us to integrate them into the existing knowledge of the cell, and infer possible roles in human diseases.

Cis AND trans FUNCTIONS FOR lncRNAs

[…] recently, only a handful of lncRNAs have been described in […] protein-coding transcript which is responsible for the inactivation of one of the two X chromosomes in placental females through DNA methylation (Brockdorff et al., 1992). Other examples of lncRNAs located in imprinted regions, such as Airn (Sleutels et al., 2002; Nagano et al., 2008), H19 (Gabory et al., 2009), NESPAS (Wroe et al., 2000), or Kcnq1ot1 (Mancini-Dinardo et al., 2006; Mohammad et al., 2010) are involved in the inactivation of gene expression via specific associations with chromatin-modifying complexes. More recently, the HOTAIR lncRNA was shown to epigenetically repress the HOXD locus via the recruitment of the PRC2 complex (Rinn et al., 2007). Strikingly, this study described a trans mechanism of action of a lncRNA located on human Chromosome 12 which modulates expression of multiple genes clustered on human Chromosome 2 (HOXD locus; Rinn et al., 2007). Supporting this hypothesis, two recent papers (Cabili et al.,

Table 1 | Description of published human lncRNAs catalogs.

References; Number of lncRNA elements; lncRNA classes considered; Type of annotation; PolyA type; Experimental evidence
Khalil et al. (2009); ∼3,300; Intergenic; Bioinformatic predictions; PolyA+; (ChiPSeq) + expression array
Jia et al. (2010); 6,736; Genic + intergenic; Bioinformatic predictions + manual curation; PolyA+; Full-length cDNAs
Kapranov et al. (2010); 580; Intergenic; Bioinformatic predictions; PolyA+ PolyA−; Single-molecule sequencing (SMS) of Helicos
Ørom et al. (2010); 3,019; Intergenic; Manual curation; Mainly polyA+; cDNA/ESTs + RNAseq
Cabili et al. (2011); 8,263; Intergenic; Bioinformatic predictions + manual curation; PolyA+; (ChiPSeq) + RNAseq
Derrien et al. (submitted); 9,277; Genic + intergenic; Manual curation; PolyA+ PolyA−; (ChiPSeq) + cDNA/ESTs + RNAseq + CAGE/diTAG

Nie et al. (Am J Transl Res 2012;4(2):127-150)

Introduction

When the whole human genome was determined, it was a surprise that there are only about 20,000-25,000 protein-coding genes, representing less than 2 […]. It was hard to imagine that most of the genomic sequences are “junk” DNA since simple organisms such as Drosophila melanogaster and Caenorhabditis elegans have a very close number of protein-coding genes. The limited number of protein-coding genes cannot explain the developmental and physiological complexity of humans. With the rapid development in high-throughput sequencing technologies such as deep sequencing and whole genome high-density tiling array, it is now known that about […] is quite complicated. In […]

A large variety of ncRNAs can be divided into two classes: structural and regulatory ncRNAs (Table 1). Structural ncRNAs include tRNA, rRNA, and snoRNA. Based on ncRNA length, regulatory ncRNA can be further divided into at least three groups: (1) short ncRNA including microRNA (miRNA) (22-23 nts) and piwi-interacting RNA (piRNA) (26-31 nts); (2) medium ncRNA (50-200 nts); (3) long ncRNA (>200 nts). […] ncRNAs, which are collectively referred to as large or long ncRNAs (LncRNAs). Based on large-scale sequencing and prediction from chromatin-state maps of full length cDNA libraries [FANTOM2 and 3] as well as human transcriptomes, more than 4,600 LncRNAs in mouse and over 3,300 LncRNAs in human have been identified, with a total of approximately 23,000 LncRNAs in a mammalian genome [3-7].

LncRNAs are transcribed by RNA polymerase (RNAP) II, but some LncRNAs have been reported to be transcribed by RNAP III, and the majority of LncRNAs are spliced, polyadenylated and 5'-capped. Some LncRNAs are evolutionarily conserved and can be expressed at low level. A large proportion of LncRNAs has highly conserved proximal promoter sequence, exonic sequences, intronic sequences or secondary RNA structures. LncRNAs originate from in-

Figure 1. Origins of LncRNA. Arrows represent different types of LncRNA transcripts.

[…] their promoters, showed statistically significant, non-random […] the genome. Approximately one third (Derrien et al., submitted) […]
include, conservation, strongly suggesting a functional role for these ncRto one These half (Jia et al., 2010) of lncRNAs overlap protein-coding analysis of protein-binding addInterestingly, another layer for instance, the ribosomal RNAs which areloci assembled together NAs. about one third of the 15,000 lncRNAs disin some way – “genic” lncRNAs. It seems thereforepartners essentialwill to also question of function by using large-scale functional scree THE CELL, AN RNA-DEPENDENT MACHINERY of intergenic valuable information to such since annotation of lncRNA catalogs. to constitute ribosomes, the factories for translation of messena primate-specific pattern of conservation (Derrien et al., annotate lncRNAs both in and coding regions (i) play Some of the most fundamental cellular processes rely on moves are already underway, with groups such as Eric L Hopefully, advances bioinformatic annotation of RNA strucger RNAs (mRNAs) into proteins. Other ancient rolesboundaries of ncRNAsof protein-coding submitted). the exact genes in is frequently subject anciently conservedsequencing non-coding RNAs (ncRNAs). These include, carrying out siRNA screens (Guttman et al., 2011). Lar 2006; Gingeras, Parker et al., 2011), methods include the transport of amino acids through ribosomesand via reannotations the tures (Torarinsson Usingand whole transcriptome (RNAseq) of 16 human to variations (Denoeud et al., 2007; ribosomal RNAs which are assembled together analysis of protein-binding partners will also add anoth to predict on this, into will be Infor thisinstance, way, transfer RNAs (tRNAs) or the splicing of introns of pre-mRNA cell lines produced in thethe framework of the ENCODE consortium 2007) and thus could lead to the functions revision ofbased a lincRNAs a developed. 
to constitute ribosomes, factories fortissues translation might build up a richly annotated of lncRNAs with which is mediated in part by the snRNAs (small nuclear RNAs).(ii) we (ENCODE Project Consortium et al.,the2007) and 16 from of messen- of valuable information to such annotation of lncRNA c bona-fide lncRNAs, thousands of protein-coding genes har-catalog ger RNAs (mRNAs) into proteins. Other ancient roles of ncRNAs Hopefully, advances in bioinformatic annotation of RNA functionalbelonging predictions, thatlncRNAs will enable integrate More recently, the crucial role of ncRNA inbor post-transcriptional Humanthem Bodyinto Map project (www.illumina.com), we showed natural antisense transcripts to the classus tothe include the transport of amino acids through ribosomes via the tures (Torarinsson et al., 2006; Parker et al., 2011), and m existing knowledge of the lncRNAs cell, and infer rolesofin human gene regulation has been highlighted by the (He discovery microRthat 94% the GENCODE lncRNAs transcripts are expressed et al.,of2008; iii) numerous functional genic over- possible transfer RNAs (tRNAs) or the splicing of introns of pre-mRNA to predict functions based on this, will be developed. In t diseases. NAs (miRNAs), which repress gene expression by targeting semi- genes lapping protein-coding have been experimentally validated, in at least one of these tissue/cell line studied. Strikingly, the which is of mediated part by the snRNAstimes (smalllower nuclear RNAs). we might build up a richly annotated catalog of lncRN polyA+inlncRNAs is ∼10–20 especially in disease complementary motifs in target mRNAs (Lee et al., 1993). Many states (Faghihi et al., 2008; Pasmant et al., level of expression functional predictions, that will enable us to integrate th More recently, the crucial role of ncRNA in post-transcriptional than protein-coding transcripts reinforcing the need to use deep 2011; Wapinski 2011). 
A recent catalog of both genic additional classes of ncRNAs have been discovered in theand lastChang, Cis AND trans FUNCTIONS FOR lncRNAs gene hastobeen highlighted the discovery sequencing basedregulation technologies identify these lowbyexpressed non- of microR- existing knowledge of the cell, and infer possible roles in intergenic lncRNAs Until has been released based on genome-wide decade reinforcing the view that they are ofand central importance recently, only a handful of lncRNAs have been described in NAs (miRNAs), which repress genethat expression bytend targeting semi- diseases. coding (Figure 1.). We also demonstrated lncRNAs computational approachthe combined with manual anno- was in the functioning of cells from all the branches of life (Amaral literature. Oneintensive of the earliest examples XIST,loci a 19 kb noncomplementary in target mRNAs (Lee et latter al., 1993). Many be the enriched in nucleus inmotifs comparison with mRNAs; this tation. This led to the identification ohtranscript 6,736 lncRNA genes in to for et al., 2008). protein-coding which is responsible inactivation additional classeswith of the ncRNAs have been discovered observation consistent idea that many lncRNAs may in the last Cis AND trans FUNCTIONS FOR lncRNAs (Jia et al.,least 2010) among localized within Amongst the various ncRNA classes, wehuman know probably of one which of the63% two are X chromosome in or placental females being through decade reinforcing the view that they are of central devoted to gene of regulation in the nucleus. Finally, the questionimportance Until recently, only a handful of lncRNAs have been desc in In a close proximity kb) of known protein coding genes about the long non-coding RNAs (lncRNAs). particular, what(
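The "genic" versus intergenic distinction used for lncRNA catalogs (whether a lncRNA locus overlaps a protein-coding locus in some way) can be illustrated with a simple interval-overlap check. The following is only a minimal sketch: the `Interval` type, the function names and all coordinates are hypothetical, strand and exon structure are ignored, and the published catalogs rely on far more elaborate annotation pipelines.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical, illustrative representation)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_lncrna(lncrna: Interval, coding_genes: list[Interval]) -> str:
    """Label a lncRNA 'genic' if it overlaps any protein-coding locus, else 'intergenic'."""
    return "genic" if any(overlaps(lncrna, g) for g in coding_genes) else "intergenic"


# Made-up coordinates, for illustration only.
coding_genes = [Interval("chr1", 1_000, 5_000), Interval("chr2", 10_000, 12_000)]
lncrnas = {
    "lnc_a": Interval("chr1", 4_500, 6_000),  # overlaps the chr1 gene
    "lnc_b": Interval("chr1", 8_000, 9_000),  # no overlap
    "lnc_c": Interval("chr2", 500, 900),      # no overlap
}

labels = {name: classify_lncrna(iv, coding_genes) for name, iv in lncrnas.items()}
genic_fraction = sum(label == "genic" for label in labels.values()) / len(labels)

print(labels)
print(f"genic fraction: {genic_fraction:.2f}")
```

In this toy set one of three lncRNAs is genic. A real analysis would also need to account for strand (for example, natural antisense transcripts) and for changes in gene boundaries between annotation releases.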
The rise of regulatory RNA
