Constraints in RNA Secondary structure prediction



Ronny Lorenz, University of Vienna

Benasque, Spain, July 30, 2015

RNA secondary structure prediction

• can be done efficiently via DP (typically) in O(n^3)
• very good accuracy for small RNAs
• accuracy drops to 40%-70% for longer sequences
• variation of the same scheme allows one to predict:
  1 MFE
  2 Suboptimals
  3 Partition function → Equilibrium probabilities
  4 Consensus structures
  5 RNA-RNA interactions
  6 Classified DP (DoS, RNAshapes, RNAbor, RNA2Dfold, RNAheliCes)
  7 ...

RNA Secondary structure prediction

Recursive decomposition scheme (grammar)

[Figure: decomposition diagrams for the F, C, M, and M1 cases, including hairpin and interior loops, over positions i, u, k, l, j]

What is constraint folding

What happens during secondary structure prediction:
• Candidate space is generated
• Candidates are evaluated (using Nearest Neighbor Energy parameters)
• Candidate scores are selected (or aggregated)

But the energy model is not perfect:
• experiment (e.g. SHAPE) may suggest something different
• RNA is not 'alone': bound molecules (proteins, small ligands, etc.) prohibit certain structure features and/or induce a change in free energy

Secondary structure constraints:
• Hard: disallow certain parses of the decomposition scheme
• Soft: modify the energy contributions of the model
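As a toy illustration of the two constraint types, the sketch below runs the generate / evaluate / select steps over a tiny hand-written candidate set. The candidate structures, their energies, and the helper `select_structure` are hypothetical; they stand in for a real decomposition scheme and nearest neighbor evaluation.

```python
# Hypothetical candidate space: dot-bracket structure -> made-up energy (kcal/mol).
CANDIDATES = {
    "((...))": -2.0,
    "(.....)": -0.5,
    ".......": 0.0,
}


def select_structure(candidates, hard=lambda s: True, soft=lambda s: 0.0):
    """Return the lowest-energy candidate.

    hard(s) -> bool   removes candidates from the space (hard constraint)
    soft(s) -> float  pseudo energy added to a candidate's score (soft constraint)
    """
    scored = {s: e + soft(s) for s, e in candidates.items() if hard(s)}
    return min(scored, key=scored.get)


# Unconstrained selection.
unconstrained = select_structure(CANDIDATES)

# Hard constraint: the second position (index 1) must stay unpaired.
hard_result = select_structure(CANDIDATES, hard=lambda s: s[1] == ".")

# Soft constraint: every paired position costs an extra 1.5 kcal/mol.
soft_result = select_structure(
    CANDIDATES, soft=lambda s: 1.5 * (len(s) - s.count("."))
)
```

The hard constraint changes which candidates exist at all, while the soft constraint leaves the candidate space untouched and only changes the ranking.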

What is constraint folding

Hard constraints allow for cutting out / inserting (1) points in the secondary structure energy landscape

[Figure, see footnote 2]

(1) circumvention of built-in constraints, e.g. canonical base pairs
(2) Gobierno de Álvaro Colom, Guatemala

What is constraint folding

Soft constraints allow for shifting points in the landscape up or down

[Figures: Mount Rushmore 1925; Mount Rushmore today; Mount Rushmore from the back]

Secondary Structure constraints

...have been used for decades

Examples
• suboptimal structures sensu M. Zuker
• mark modified bases (as unpaired)
• recompute optimal structure given a consensus
• simulations of translocating an RNA through a pore
• incorporate protein/ligand binding
• incorporate probing data (SHAPE, DMS, PARS)
• ...

Soft constraints and SHAPE reactivity

Pseudo energy terms
• Deigan et al. [2009] (stacked pairs)

\[ \Delta G(i) = m \cdot \ln(\mathrm{reactivity}[i] + 1) + b \]

[Figure: secondary structure drawing with positions 1 to 75 labeled]
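The Deigan et al. pseudo energy term is simple enough to evaluate directly. The following is a minimal sketch: the function name, the parameter values, and the handling of negative reactivities as missing data are assumptions made for illustration and are not taken from the slides.

```python
import math


def deigan_pseudo_energy(reactivity, m, b):
    """Pseudo energy m * ln(reactivity + 1) + b for one nucleotide."""
    if reactivity < 0:
        # Assumption: a negative reactivity marks missing data, so no term is added.
        return 0.0
    return m * math.log(reactivity + 1.0) + b


# Illustrative slope and intercept only (assumed values, in kcal/mol).
M_SLOPE = 1.8
B_INTERCEPT = -0.6

reactivities = [0.0, 0.3, 1.2, -1.0]
pseudo = [deigan_pseudo_energy(r, M_SLOPE, B_INTERCEPT) for r in reactivities]
```

With a positive slope and a negative intercept, a reactivity of zero yields a stabilizing bonus equal to `b`, and the term grows into a penalty as the reactivity increases.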

Soft constraints and SHAPE reactivity

Pseudo energy terms
• Zarringhalam et al. [2012] (unpaired bases and base pairs)

\[ \Delta G(x, i) = \beta \cdot |x - q_i|, \qquad x \in [0\ (\text{unpaired}),\ 1\ (\text{paired})] \]

[Figure: secondary structure drawing with positions 1 to 75 labeled]
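A minimal sketch of the Zarringhalam et al. term follows. It assumes that q_i is a value in [0, 1] derived from the probing data and expressed on the same scale as x (0 = unpaired, 1 = paired), as the notation above suggests; the function names and the example numbers are hypothetical.

```python
def zarringhalam_pseudo_energy(paired, q_i, beta):
    """Pseudo energy beta * |x - q_i| with x = 1 if paired, else 0."""
    x = 1.0 if paired else 0.0
    return beta * abs(x - q_i)


def structure_pseudo_energy(dot_bracket, q, beta):
    """Sum the per-position terms over a dot-bracket structure."""
    if len(dot_bracket) != len(q):
        raise ValueError("structure and q must have the same length")
    return sum(
        zarringhalam_pseudo_energy(symbol != ".", q_i, beta)
        for symbol, q_i in zip(dot_bracket, q)
    )


# Hypothetical three-nucleotide example: the structure agrees well with q,
# so the accumulated penalty is small.
example_total = structure_pseudo_energy("(.)", [0.9, 0.1, 0.8], beta=1.0)
```

Because the term is defined for both states of a position, it penalizes disagreement for unpaired bases and for base pairs alike.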

Soft constraints and SHAPE reactivity

Pseudo energy terms
• Washietl et al. [2012] (unpaired bases)

Objective function

\[ F(\vec{\epsilon}) = \sum_{i=1}^{n} \frac{\epsilon_i^2}{\tau^2} + \sum_{i=1}^{n} \frac{(p_i(\vec{\epsilon}) - q_i)^2}{\sigma^2} \to \min \]

[Figure: secondary structure drawing with positions 1 to 75 labeled]
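The objective function of Washietl et al. balances the size of the perturbation against the remaining discrepancy between predicted and observed values. The sketch below only evaluates F for given inputs. It assumes that the predicted values p_i have already been computed for the perturbation vector (in practice this needs a partition function computation, which is not shown), and all names and numbers are hypothetical.

```python
def objective(epsilon, p, q, tau, sigma):
    """Evaluate F = sum(eps_i^2) / tau^2 + sum((p_i - q_i)^2) / sigma^2.

    epsilon: perturbation energies, one per position
    p:       predicted per-position values for this epsilon (computed elsewhere)
    q:       observed per-position values from the experiment
    """
    if not (len(epsilon) == len(p) == len(q)):
        raise ValueError("epsilon, p and q must have the same length")
    perturbation_cost = sum(e * e for e in epsilon) / tau**2
    discrepancy_cost = sum((p_i - q_i) ** 2 for p_i, q_i in zip(p, q)) / sigma**2
    return perturbation_cost + discrepancy_cost


# Hypothetical two-position example.
value = objective([1.0, -1.0], p=[0.2, 0.9], q=[0.0, 1.0], tau=1.0, sigma=0.5)
```

A small tau makes perturbations expensive, while a small sigma makes disagreement with the data expensive, so the ratio of the two controls how strongly the data is trusted.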

Implementations

Constraint-aware secondary structure prediction programs:

Hard constraints:
• UNAfold (Markham et al., 2008)
• ViennaRNA Package (Hofacker et al., 1994; Lorenz et al., 2011)

Hard and Soft constraints:
• RNAstructure (SHAPE) (Reuter et al., 2010)
• RNApbfold (SHAPE) (Washietl et al., 2012)
• ViennaRNA Package ≥ v2.2 (SHAPE, generalized constraints)

Not to mention all the programs for specific use-cases resulting from
• code duplication
• from-scratch implementations

What is constraint folding

Where do current implementations apply structure constraints?
• positions that are unpaired
• base pairs
• base pair stacks

Are the above implementations sufficient? Of course NOT!

On generalizing Hard constraints

Typical implementations:

\[ N_{ij} = X_{ii} N_{i+1,j} + \sum_{k=i+1}^{j} X_{ik} N_{i+1,k-1} N_{k+1,j} \]

Add discriminative power:

1 Go beyond Nussinov scheme
  Substitute X with X^τ, where τ now denotes the different types of loops:
  • exterior loop
  • hairpin loops
  • interior loops (closing, enclosed)
  • components of multi-loops (closing, enclosed)

2 Go to full NN scheme
  Express X in terms of a boolean function f : \mathbb{N}^m × D → {0, 1} with m nucleotide positions, and decomposition step d ∈ D.
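The Nussinov-style recursion with hard constraints can be sketched as a structure-counting routine. In this minimal version X is passed as a callable `allowed(i, k)`, where `allowed(i, i)` answers whether position i may stay unpaired. The 0-based indexing, the value 1 for empty intervals, and the helper `canonical_rule` (canonical pairs with a minimum hairpin size of 3) are assumptions chosen for illustration.

```python
from functools import lru_cache


def count_structures(n, allowed):
    """Count structures on positions 0..n-1 that respect the hard constraints.

    allowed(i, i) -> bool: position i may be unpaired (X_ii)
    allowed(i, k) -> bool: positions i < k may form a base pair (X_ik)
    """

    @lru_cache(maxsize=None)
    def N(i, j):
        if i > j:
            return 1  # empty interval: exactly one (empty) structure
        total = N(i + 1, j) if allowed(i, i) else 0
        for k in range(i + 1, j + 1):
            if allowed(i, k):
                total += N(i + 1, k - 1) * N(k + 1, j)
        return total

    return N(0, n - 1)


CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def canonical_rule(seq, min_loop=3):
    """Built-in style constraints: canonical pairs, at least min_loop bases enclosed."""

    def allowed(i, k):
        if i == k:
            return True
        return k - i > min_loop and (seq[i], seq[k]) in CANONICAL

    return allowed


seq = "GAAAC"
n_canonical = count_structures(len(seq), canonical_rule(seq))
n_unrestricted = count_structures(4, lambda i, k: True)
```

Replacing the single `allowed` callable by one callable per loop type corresponds to the step from X to X^τ, and passing the decomposition step as an extra argument corresponds to the boolean function f.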

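On generalizing Soft constraints

Position dependent pseudo energy:

\[
\begin{aligned}
E(\psi) &= E_0(\psi) + \sum_{i \in \psi^{p}} b_i^{p} + \sum_{i \in \psi^{u}} b_i^{u} \\
        &= E_0(\psi) + \sum_{i=1}^{n} b_i^{p} + \sum_{i \in \psi^{u}} (b_i^{u} - b_i^{p}) \\
        &= E_0(\psi) + E' + \sum_{i \in \psi^{u}} \delta_i
\end{aligned}
\]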
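The rewriting of the position dependent pseudo energy can be checked numerically. The sketch below compares the direct sum over paired and unpaired positions with the rewritten form, reading E' as the sum of all b_i^p and δ_i as b_i^u − b_i^p. E_0(ψ) is left out because it appears unchanged on both sides; the structure and the pseudo energy values are made up for illustration.

```python
def pseudo_energy_direct(structure, b_paired, b_unpaired):
    """Sum b_i^p over paired positions and b_i^u over unpaired positions."""
    return sum(
        b_unpaired[i] if symbol == "." else b_paired[i]
        for i, symbol in enumerate(structure)
    )


def pseudo_energy_rewritten(structure, b_paired, b_unpaired):
    """Structure-independent constant E' plus delta_i over unpaired positions."""
    e_prime = sum(b_paired)
    delta = [u - p for u, p in zip(b_unpaired, b_paired)]
    return e_prime + sum(
        delta[i] for i, symbol in enumerate(structure) if symbol == "."
    )


# Hypothetical structure and pseudo energies (kcal/mol).
structure = "((..))"
b_paired = [0.5, -1.0, 0.25, 0.0, 2.0, -0.5]
b_unpaired = [1.0, 0.0, -0.75, 1.5, 0.25, 0.5]

direct = pseudo_energy_direct(structure, b_paired, b_unpaired)
rewritten = pseudo_energy_rewritten(structure, b_paired, b_unpaired)
```

Since E' does not depend on the structure, only the δ_i terms for unpaired positions need to be handled inside the recursions.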

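Base pair specific pseudo energies:

\[
\begin{aligned}
E(\psi) &= E_0(\psi) + \sum_{(i,j) \in \psi} b_{ij}^{p} + \sum_{(i,j) \notin \psi} b_{ij}^{u} \\
        &= E_0(\psi) + \sum_{i \ldots} b_{ij}^{u} \ldots
\end{aligned}
\]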