Stockholm Bioinformatics Centre Annual Report 2009

Director’s summary

2009 was a year with many bioinformatics conferences and workshops. In January, SBC organized the workshop "Bioinformatics in industry and academia" (picture), with participants from both companies and research groups in the Stockholm area. In June, the world's largest bioinformatics conference, ISMB-ECCB, was held in Stockholm for the first time. SBC personnel were engaged in the organization at many levels and contributed several presentations and special sessions, helping to make this an exceptionally successful ISMB event.

The increased governmental funding to the universities via the “strategic research areas” was distributed in competitive calls during 2009. This increase is meant to provide a permanent funding addition of 1.3 billion SEK per year from 2012, starting at a lower level in 2010. SBC groups were involved in two areas: Molecular biosciences and E-science. The former resulted in a 140 MSEK/year grant to set up the Science for Life Laboratories in Uppsala and Stockholm, to which SBC will probably relocate. The latter resulted in a 30 MSEK/year grant to the Swedish e-Science Research Centre (SeRC), in which Bioinformatics is the second largest division.

The senior staff at SBC did not undergo any major changes during 2009. Unfortunately, the assistant professorship promised by DBB in 2005 has still not been activated, due to the financial situation of that department. On the upside, we have finally filled the position of assistant system manager, sponsored by CBR as compensation for the services SBC provides to CBR staff. The new sysadmin (20%), Roman Valls, started in December and will help the sometimes heavily overloaded Erik Sjölund with systems support.

During 2009, SBC servers were substantially upgraded: a new remote backup system was put in place, new web server systems (InParanoid, Pfam) were deployed, many web servers were migrated to new server hardware, and the server architecture was consolidated.
However, there are still many issues to tackle, for instance handling old web servers that have stopped working or that work so poorly that they negatively affect other services. All desktop machines were upgraded to the Ubuntu distribution of Linux.

Scientifically, 2009 was yet another productive year for SBC, as witnessed by the solid publication record and the large number of conference presentations listed in this report. Two PhD students were promoted: Ali Tofigh and Anna Johansson.

SBC participates in two Master programmes: the KTH Master in Computational and Systems Biology and the SU Master in Bioinformatics. Due to the foolishness of starting Master programmes three years before any Swedish Bachelors have graduated, there is only one Swede in each programme. However, the KTH programme has managed to find almost 20 foreign students, mostly from Iran, and is near its target number, while the SU programme still has almost no enrollments.

Personnel during 2009

Prof. Arne Elofsson**
  Åsa Björklund, PhD student
  Johannes Frey-Skött, PhD student
  Kristoffer Illergård, PhD student
  Per Larsson, PhD student
  Wiktor Jurkowski, Post-doc
  Anni Kauko, Post-doc
  Marcin Skwark, PhD student
  Karin Julenius, Post-doc

Prof. Jens Lagergren
  *Ali Tofigh, PhD student
  Hossein Farahani, PhD student
  Joel Sjöstrand, PhD student

Ass. Prof. Lukas Käll
  Luminita Moruz, PhD student

Ass. Prof. Erik Lindahl**
  *Anna Johansson, PhD student
  Aron Hennerdal, PhD student
  Pär Bjelkmar, PhD student
  *Yana Vereshchaga, Post-doc
  Sander Pronk, Research associate
  Berk Hess, Research associate
  Szilard Pall, PhD student
  Arjun Ray, PhD student
  Christine Schwaiger, PhD student
  Rossen Apostolov, Post-doc
  Samuel Murail, Post-doc
  Björn Wallner, Research associate
  Teemu Murtola, Post-doc

Prof. Gunnar von Heijne**
  Patrik Björkholm, PhD student

Prof. Erik Sonnhammer (Director of the SBC)
  *Andrey Alexeyenko, Post-doc
  Kristoffer Forslund, PhD student
  Anna Henricson, PhD student
  Gabriel Östlund, PhD student
  David Messina, PhD student
  Oliver Frings, PhD student
  Sanjit Roopra, PhD student
  Thomas Schmitt, PhD student

Lars Arvestad, Associate professor
Bengt Sennblad, Senior researcher
*Olof Emanuelsson, Research associate
Erik Sjölund, System administrator
Roman Valls, Assistant system administrator

*) Left during 2009
**) Group located at Arrhenius Laboratory, Frescati

Collaboration partners

SU Molecular Biology & Functional Genomics (Prof. Marie Öhman)
KTH Biotechnology (Prof. Tuula Teeri, Prof. Peter Savolainen, Prof. Joakim Lundeberg, Prof. Mathias Uhlén, and Vincent Bulone)
KI Aging Research Center (Dr. Lars Bäckman)
KI MMK (Prof. Anna Wedell)
KI CMB (Prof. Björn Andersson)
KI CCK (Prof. Arne Östman)
KI KBC (Janne Lehtiö)
KI CMM (Anders Hamsten and Jacob Odeberg)
Uppsala University (Dr. van der Spoel)
Uppsala University (Prof. Hans Ellegren)
Linköping University (Prof. Fredrik Elinder)
Göteborg University (Prof. Bengt Oxelman)
LU Clinical Genetics (Mattias Höglund)
AstraZeneca (Prof. Hugh Salter)
Bioinformatics Laboratory, BioInfoBank Institute, Poznan (Dr. Leszek Rychlewski)
Institut Pasteur, Paris (Dr. Marc Delarue)
University of Oxford (Drs. Hugh Watkins, John Peden, and Martin Farrall)
Stanford University (Prof. Michael Levitt, Prof. Vijay S. Pande, Prof. James Trudell)
University of Wyoming (Dr. David Liberles)
McGill Centre for Bioinformatics (Dr. Mike Hallett)
University of British Columbia (Dr. Wyeth Wasserman)
Yale University, New Haven, CT (Dr. Mark Gerstein)
University of Buffalo (Dr. Daniel Fischer)
Cornell University, Ithaca, NY (Dr. Klaas van Wijk)
The Sanger Institute, Hinxton, UK (Drs. Richard Durbin & Alex Bateman)
Janelia Farm, VA, USA (Dr. Sean Eddy)
University of Valencia, Spain (Dr. Gustavo Camps-Valls)
University of Rochester Medical Center (Dr. Fred Hagen)
University of Paris René Descartes (Prof. Jean-Laurent Casanova)
Max Planck Inst. für Genetik, Berlin (Alexander Schliep)
Duke University (Dr. Joel Meyer)
EU bioinformatics network Biosapiens
EU bioinformatics network Embrace
EU bioinformatics network Genefun

Scientific publications

2009 was again a very fruitful year for SBC in terms of publications, with 26 bioinformatics papers in total. Many of them describe novel algorithms and databases that represent substantial improvements on previous methods, testifying to SBC's position at the forefront of bioinformatics research and to its sustained ability to produce important advances in the field. Four of the papers were published in PNAS, and one in Genome Research.

Based on http://www.sbc.su.se/publications:

Johansson, A.C. and Lindahl, E. (2009) Protein contents in biological membranes can explain abnormal solvation of charged and polar residues. Proc Natl Acad Sci U S A 106 (37): 15684-15689.
Klammer, M., Messina, D.N., Schmitt, T. and Sonnhammer, E.L. (2009) MetaTM - a consensus method for transmembrane protein topology prediction. BMC Bioinformatics 10: 314.
Gabaldon, T., Dessimoz, C., Huxley-Jones, J., Vilella, A.J., Sonnhammer, E.L. and Lewis, S. (2009) Joining forces in the quest for orthologs. Genome Biol 10 (9): 403.
Björkholm, P. and Sonnhammer, E.L. (2009) Comparative analysis and unification of domain-domain interaction networks. Bioinformatics 25 (22): 3020-3025.
Forslund, K. and Sonnhammer, E.L. (2009) Benchmarking homology detection procedures with low complexity filters. Bioinformatics 25 (19): 2500-2505.
Fugelstad, J., Bouzenzana, J., Djerbi, S., Guerriero, G., Ezcurra, I., Teeri, T.T., Arvestad, L. and Bulone, V. (2009) Identification of the cellulose synthase genes from the Oomycete Saprolegnia monoica and effect of cellulose synthesis inhibitors on gene expression and enzyme activity. Fungal Genet Biol 46 (10): 759-767.
Jaud, S., Fernandez-Vidal, M., Nilsson, I., Meindl-Beinker, N.M., Hubner, N.C., Tobias, D.J., von Heijne, G. and White, S.H. (2009) Insertion of short transmembrane helices by the Sec61 translocon. Proc Natl Acad Sci U S A 106 (28): 11588-11593.
Larsson, P., Skwark, M.J., Wallner, B. and Elofsson, A. (2009) Assessment of global and local model quality in CASP8 using Pcons and ProQ. Proteins 77 (S9): 167-172.
Illergard, K., Ardell, D.H. and Elofsson, A. (2009) Structure is three to ten times more conserved than sequence - a study of structural response in protein cores. Proteins 77 (3): 499-508.
Hessa, T., Reithinger, J.H., von Heijne, G. and Kim, H. (2009) Analysis of transmembrane helix integration in the endoplasmic reticulum in S. cerevisiae. J Mol Biol 386 (5): 1222-1228.
Johansson, A.C. and Lindahl, E. (2009) The role of lipid composition for insertion and stabilization of amino acids in membranes. J Chem Phys 130 (18): 185101.
Bernsel, A., Viklund, H., Hennerdal, A. and Elofsson, A. (2009) TOPCONS: consensus prediction of membrane protein topology. Nucleic Acids Res 37 (Web Server issue): W465-W468.
Spivak, M., Weston, J., Bottou, L., Kall, L. and Noble, W.S. (2009) Improvements to the Percolator Algorithm for Peptide Identification from Shotgun Proteomics Data Sets. J Proteome Res 8 (7): 3737-3745.
Åkerborg, Ö., Sennblad, B., Arvestad, L. and Lagergren, J. (2009) Simultaneous Bayesian gene tree reconstruction and reconciliation analysis. Proc Natl Acad Sci U S A 106 (14): 5714-5719.
Messina, D.N. and Sonnhammer, E.L. (2009) DASher: a stand-alone protein sequence client for DAS, the Distributed Annotation System. Bioinformatics 25 (10): 1333-1334.
Alexeyenko, A. and Sonnhammer, E.L. (2009) Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Res 19 (6): 1107-1116.
Enquist, K., Fransson, M., Boekel, C., Bengtsson, I., Geiger, K., Lang, L., Pettersson, A., Johansson, S., von Heijne, G. and Nilsson, I. (2009) Membrane-integration characteristics of two ABC transporters, CFTR and P-glycoprotein. J Mol Biol 387 (5): 1153-1164.
Bjelkmar, P., Niemela, P.S., Vattulainen, I. and Lindahl, E. (2009) Conformational Changes and Slow Dynamics through Microsecond Polarized Atomistic Molecular Simulation of an Integral Kv1.2 Ion Channel. PLoS Comput Biol 5 (2): e1000289.
Juncker, A.S., Jensen, L.J., Pierleoni, A., Bernsel, A., Tress, M.L., Bork, P., von Heijne, G., Valencia, A., Ouzounis, C.A., Casadio, R. and Brunak, S. (2009) Sequence-based feature prediction and annotation of proteins. Genome Biol 10 (2): 206.
Kall, L., Storey, J.D. and Noble, W.S. (2009) QVALITY: non-parametric estimation of q-values and posterior error probabilities. Bioinformatics 25 (7): 964-966.
Barth, P., Wallner, B. and Baker, D. (2009) Prediction of membrane protein structures with complex topologies using limited constraints. Proc Natl Acad Sci U S A 106 (5): 1409-1414.
Johansson, A.C. and Lindahl, E. (2009) Titratable amino acid solvation in lipid membranes as a function of protonation state. J Phys Chem B 113 (1): 245-253.
Lassmann, T., Frings, O. and Sonnhammer, E.L. (2009) Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Res 37 (3): 858-865.
Arvestad, L., Lagergren, J. and Sennblad, B. (2009) The gene evolution model and computing its associated probabilities. J ACM 56: 1-44.


Sennblad, B. and Lagergren, J. (2009) Probabilistic Orthology Analysis. Syst Biol 58: 411-424.
Hessa, T., Reithinger, J.H., von Heijne, G. and Kim, H. (2009) Quantitative analysis of transmembrane helix integration in the endoplasmic reticulum in S. cerevisiae. J Mol Biol 386 (5): 1222-1228.

Courses and workshops

Bioinformatics 6 hp DD2396 (KTH) by Lars Arvestad
Applied Bioinformatics 7.5 hp DD2397 (KTH) / DA2397 (SU) by Lars Arvestad
Algorithmic bioinformatics 6 hp DD2450 (KTH) by Jens Lagergren
Omic Data and Systems Biology 7.5 hp DD2399 (KTH) by Jens Lagergren
Bioinformatics 7.5 hp KB7004 (SU) by Arne Elofsson
Molecular modelling 7.5 hp KB8005 (SU) by Erik Lindahl
Applied scientific programming 7.5 hp KB8009 (SU)
Comparative Genomics 7.5 hp KB8007 (SU) by Erik Sonnhammer
Structure prediction of globular and membrane proteins 7.5 hp KB8008 (SU) by Lukas Käll
Protein physics 7.5 hp KB8011 (SU) by Erik Lindahl

Invited lectures and seminars

"A joint model for genes, their sequences, and substitution rate variation", International Workshop on Tree Reconciliation, Max Planck Institute for Molecular Genetics, Berlin, 25–26 May 2009, by Lars Arvestad.
“Multidomain protein evolution”, 3D-sig, ISMB 2009, Stockholm, June 2009, by Arne Elofsson.
“Membrane protein bioinformatics”, Protein Structure Prediction workshop, Budapest, Aug 2009, by Arne Elofsson.
"Assigning confidence to Peptide-Spectrum Matches", European Science Foundation workshop on Quality Control in Proteomics, Cambridge, UK, November 2009, by Lukas Käll.
"Scoring peptides by their retention time", 57th American Society for Mass Spectrometry Conference, Philadelphia, USA, May-June 2009, by Lukas Käll.
INRIA Focus K3D conference (invited lecture). Nice, France. March 2009. by Erik Lindahl.
Barcelona Supercomputing Center (invited lecture). Barcelona, Spain. March 2009. by Erik Lindahl.
Nordita nanoscale materials conference (invited lecture). Stockholm, March 2009. by Erik Lindahl.
Nordic Physical Society conference (invited lecture). Lyngby, Denmark. June 2009. by Erik Lindahl.

ISMB/ECCB Meeting 2009 (invited lecture). Stockholm, Sweden. June 2009. by Erik Lindahl.
Wallenberg Consortium North (invited lecture). Stockholm, Sweden. September 2009. by Erik Lindahl.
NSC 20 year jubilee & PRACE conference (invited lecture). Linköping, Sweden. October 2009. by Berk Hess.
15th Brazilian Theoretical Chemistry Conference (keynote lecture). Poços de Caldas, Brazil. October 2009. by Erik Lindahl.
Frontiers of Molecular Simulation (invited lecture). Barcelona, Spain. November 2009. by Erik Lindahl.
“Probabilistic Orthology Analysis”, International Workshop on Tree Reconciliation, Max Planck Institute for Molecular Genetics, Berlin, 25–26 May 2009, by Bengt Sennblad.
“OrthoXML and InParanoid 7”, Workshop “Quest for orthologs”, Cambridge, UK, 4 July 2009, by Erik Sonnhammer.
“FunCoup: global networks of functional coupling in eukaryotes”, ISMB 2009 Highlights track, Stockholm, 2 July 2009, by Erik Sonnhammer.
“InParalogs, InParanoid, and OrthoXML”, ISMB 2009 Special Session on Orthology Inference, Stockholm, 29 June 2009, by Erik Sonnhammer.
“Using FunCoup to find novel disease genes”, ISMB 2009 Special Interest Group “Biopathways”, Stockholm, 28 June 2009, by Erik Sonnhammer.
“Pfam 2009 in Stockholm”, Xfam meeting, Stockholm, 28 May 2009, by Erik Sonnhammer.
“FunCoup: global networks of functionally coupled genes/proteins”, Evolutionary Biology Centre, Uppsala University, 10 February 2009, by Erik Sonnhammer.
“Global networks of functional coupling from comprehensive data integration”, Biotechnology Centre of Oslo, University of Oslo, 3 February 2009, by Erik Sonnhammer.
The 34th Lorne Conference on Protein Structure (invited lecture). Lorne, Australia. February 2009. by Gunnar von Heijne.
Workshop on “Membrane Biology Frontiers: Dynamics, Energy, Structure and Technology” (invited lecture). Mykonos, Greece. June 2009. by Gunnar von Heijne.
VIII European Symposium of the Protein Society (invited lecture). Zürich, Switzerland. June 2009. by Gunnar von Heijne.
ISMB/ECCB Meeting 2009 (honorary chair, session chair and invited lecture). Stockholm, Sweden. June 2009. by Gunnar von Heijne.

23rd Annual Symposium of the Protein Society (invited lecture). Boston, USA. July 2009. by Gunnar von Heijne.
Meeting on “Structure and Function of Membrane Proteins” (invited lecture). Freiburg, FRG. September 2009. by Gunnar von Heijne.
Heidelberg Molecular Life Science Conference “Cellular Protein Transport” (invited lecture). Heidelberg, FRG. October 2009. by Gunnar von Heijne.
IB Conference on Biomembranes 2009 (invited award lecture). Utrecht, Netherlands. October 2009. by Gunnar von Heijne.

Computer infrastructure

The SBC employs a highly standardized computer system in which each workplace has an identically configured desktop computer. All user disk storage is hosted at PDC and accessed via the AFS file system (in 5-10 GB volumes). Heavy computation is carried out on the 5440-core compute cluster Ferlin, also maintained by PDC. A summary of the infrastructure is given below.

Desktop computers: ~40 desktops running Ubuntu Linux
Ferlin compute cluster: 5440 cores in total, 2.66 GHz CPUs on 672 8-CPU compute nodes with 8 GB RAM (shared with SNIC)
Disk servers: 2 servers, ~8 TB in total
Internal servers: mail, cups, life, mickey, sbcdb

Web servers:

http://www.sbc.su.se: Intel Core2 Quad 2.40 GHz, 4 GB RAM, 250 GB RAID2 disk, CentOS Linux. Accessed from 9,000-15,000 unique IP addresses per month.
Hosted services:
* PRIMETV: Visualize tree reconciliations
* PrIME-GSR: A Bayesian integrated model for genes, sequences, and rates
* MapDP: Factorizing branch lengths into divergence times and rates
* PrIME-GEM: Probabilistic orthology analysis (binaries downloadable)
* Pmembr: A threading method for membrane proteins.
* HMMER: High capacity site for use of HMMER to search SCOP or Pfam
* ProQ: A protein model quality predictor.
* PeroxiP: Predict peroxisomal proteins and Pfam domains
* PRODIV-TMHMM: Topology and reentrant predictions.
* TMHMMfix: TMHMM with optional fixing and reliability score calculation.
* DAS: Prediction of Transmembrane Regions.

* NucPred: Nuclear localization prediction.
* DRIP-PRED: Disorder/order prediction for proteins.
* GPCPRED: Contact map prediction for proteins.
* SVMHC: Prediction of MHC class I binding peptides.
* PhylProM: Phylogenetic profiles
* OVOP: Automatic view generation for protein structures (source code available)
* modhmm: A modular HMM program used in PRO(DIV)-TMHMM and other studies.
* LGscore: A program to measure the similarity between proteins.
* Palign: Our alignment/threading programs.
* ssHMM: Secondary structure HMMs based on HMMER
* LEPRA: Protein modelling C++ library.
* TAED: The Adaptive Evolution Database.
* www.genefun.org: GeneFun EU collaboration
* www.perlgp.org: PerlGP, The Open Source Perl Genetic Programming System
* www.socbin.org: Society for Bioinformatics in Northern Europe
* prime.sbc.su.se: Probabilistic Integrated Models of Evolution.

neon.sbc.su.se: Intel Quad CPU, 4 * 2.66 GHz, 8 GB RAM, 1 TB RAID2 disks, CentOS Linux
Hosted services:
• InParanoid.sbc.su.se: A comprehensive database of orthologs and inparalogs in eukaryotes.
• Pfam.sbc.su.se: A comprehensive database of protein domain families.
• FunCoup.sbc.su.se: Comprehensive protein networks of functional coupling.
• Phobius.sbc.su.se: A combined transmembrane topology and signal peptide predictor.
• Avdist: A tool for analyzing haplotype differences.
• Repeatalign: Binary Repeat Align server.
• RefSense: An alternative to Pubmed.
• excap
• facs
• modelestimator
• octopus.cbr.su.se
• scampi.cbr.su.se
• topcons

helium.sbc.su.se: 2 * 2.80 GHz Pentium 4, 2 GB RAM, 750 GB RAID2 disks, CentOS Linux
Hosted services:
• jSquid.sbc.su.se: A Java tool to visualize networks and edge scores in FunCoup.
• Sfinx.sbc.su.se: Prediction of functional and structural features in proteins.
• Sfixem.sbc.su.se: A Java viewer for Sfinx.
• GPCRHMM.sbc.su.se: A hidden Markov model for GPCR detection.
• Humanoid.sbc.su.se: Human ortholog groups and functional shift analysis of subfamilies.
• MSA.sbc.su.se: Multiple alignments and assessment of alignment accuracy.
• DASher
• MultiParanoid.sbc.su.se
• DAS services for Phobius, signalP, HMMTOP, PhD, Toppred, etc. (http://das.sbc.su.se:9000/das/*)

argon.sbc.su.se: 4 * 2.8 GHz Pentium 4, 4 GB RAM, 1.25 TB disks, CentOS Linux
