John M. Walker, SERIES EDITOR

Author: Cleopatra Lizbeth Moody

Bioinformatics

METHODS IN MOLECULAR BIOLOGY™

John M. Walker, SERIES EDITOR

460. Essential Concepts in Toxicogenomics, edited by Donna L. Mendrick and William B. Mattes, 2008
459. Prion Protein Protocols, edited by Andrew F. Hill, 2008
458. Artificial Neural Networks: Methods and Applications, edited by David S. Livingstone, 2008
457. Membrane Trafficking, edited by Ales Vancura, 2008
456. Adipose Tissue Protocols, Second Edition, edited by Kaiping Yang, 2008
455. Osteoporosis, edited by Jennifer J. Westendorf, 2008
454. SARS- and Other Coronaviruses: Laboratory Protocols, edited by Dave Cavanagh, 2008
453. Bioinformatics, Volume II: Structure, Function and Applications, edited by Jonathan M. Keith, 2008
452. Bioinformatics, Volume I: Data, Sequence Analysis and Evolution, edited by Jonathan M. Keith, 2008
451. Plant Virology Protocols: From Viral Sequence to Protein Function, edited by Gary Foster, Elisabeth Johansen, Yiguo Hong, and Peter Nagy, 2008
450. Germline Stem Cells, edited by Steven X. Hou and Shree Ram Singh, 2008
449. Mesenchymal Stem Cells: Methods and Protocols, edited by Darwin J. Prockop, Douglas G. Phinney, and Bruce A. Brunnell, 2008
448. Pharmacogenomics in Drug Discovery and Development, edited by Qing Yan, 2008
447. Alcohol: Methods and Protocols, edited by Laura E. Nagy, 2008
446. Post-translational Modification of Proteins: Tools for Functional Proteomics, Second Edition, edited by Christoph Kannicht, 2008
445. Autophagosome and Phagosome, edited by Vojo Deretic, 2008
444. Prenatal Diagnosis, edited by Sinhue Hahn and Laird G. Jackson, 2008
443. Molecular Modeling of Proteins, edited by Andreas Kukol, 2008
442. RNAi: Design and Application, edited by Sailen Barik, 2008
441. Tissue Proteomics: Pathways, Biomarkers, and Drug Discovery, edited by Brian Liu, 2008
440. Exocytosis and Endocytosis, edited by Andrei I. Ivanov, 2008
439. Genomics Protocols, Second Edition, edited by Mike Starkey and Ramnanth Elaswarapu, 2008
438. Neural Stem Cells: Methods and Protocols, Second Edition, edited by Leslie P. Weiner, 2008
437. Drug Delivery Systems, edited by Kewal K. Jain, 2008
436. Avian Influenza Virus, edited by Erica Spackman, 2008
435. Chromosomal Mutagenesis, edited by Greg Davis and Kevin J. Kayser, 2008
434. Gene Therapy Protocols: Volume II: Design and Characterization of Gene Transfer Vectors, edited by Joseph M. LeDoux, 2008
433. Gene Therapy Protocols: Volume I: Production and In Vivo Applications of Gene Transfer Vectors, edited by Joseph M. LeDoux, 2008
432. Organelle Proteomics, edited by Delphine Pflieger and Jean Rossier, 2008
431. Bacterial Pathogenesis: Methods and Protocols, edited by Frank DeLeo and Michael Otto, 2008
430. Hematopoietic Stem Cell Protocols, edited by Kevin D. Bunting, 2008
429. Molecular Beacons: Signalling Nucleic Acid Probes, Methods and Protocols, edited by Andreas Marx and Oliver Seitz, 2008
428. Clinical Proteomics: Methods and Protocols, edited by Antonia Vlahou, 2008
427. Plant Embryogenesis, edited by Maria Fernanda Suarez and Peter Bozhkov, 2008
426. Structural Proteomics: High-Throughput Methods, edited by Bostjan Kobe, Mitchell Guss, and Huber Thomas, 2008
425. 2D PAGE: Sample Preparation and Fractionation, Volume II, edited by Anton Posch, 2008
424. 2D PAGE: Sample Preparation and Fractionation, Volume I, edited by Anton Posch, 2008
423. Electroporation Protocols: Preclinical and Clinical Gene Medicine, edited by Shulin Li, 2008
422. Phylogenomics, edited by William J. Murphy, 2008
421. Affinity Chromatography: Methods and Protocols, Second Edition, edited by Michael Zachariou, 2008
420. Drosophila: Methods and Protocols, edited by Christian Dahmann, 2008
419. Post-Transcriptional Gene Regulation, edited by Jeffrey Wilusz, 2008
418. Avidin–Biotin Interactions: Methods and Applications, edited by Robert J. McMahon, 2008
417. Tissue Engineering, Second Edition, edited by Hannsjörg Hauser and Martin Fussenegger, 2007
416. Gene Essentiality: Protocols and Bioinformatics, edited by Svetlana Gerdes and Andrei L. Osterman, 2008
415. Innate Immunity, edited by Jonathan Ewbank and Eric Vivier, 2007
414. Apoptosis in Cancer: Methods and Protocols, edited by Gil Mor and Ayesha Alvero, 2008
413. Protein Structure Prediction, Second Edition, edited by Mohammed Zaki and Chris Bystroff, 2008
412. Neutrophil Methods and Protocols, edited by Mark T. Quinn, Frank R. DeLeo, and Gary M. Bokoch, 2007
411. Reporter Genes: A Practical Guide, edited by Don Anson, 2007
410. Environmental Genomics, edited by Cristofre C. Martin, 2007
409. Immunoinformatics: Predicting Immunogenicity In Silico, edited by Darren R. Flower, 2007
408. Gene Function Analysis, edited by Michael Ochs, 2007
407. Stem Cell Assays, edited by Vemuri C. Mohan, 2007
406. Plant Bioinformatics: Methods and Protocols, edited by David Edwards, 2007
405. Telomerase Inhibition: Strategies and Protocols, edited by Lucy Andrews and Trygve O. Tollefsbol, 2007

METHODS IN MOLECULAR BIOLOGY™

Bioinformatics
Volume I: Data, Sequence Analysis and Evolution

Edited by

Jonathan M. Keith, PhD
School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia

Editor: Jonathan M. Keith, School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia

Series Editor: John Walker, Hatfield, Hertfordshire AL10 9NP, UK

ISBN: 978-1-58829-707-5 ISSN 1064-3745 DOI: 10.1007/978-1-60327-159-2

e-ISBN: 978-1-60327-159-2 e-ISSN: 1940-6029

Library of Congress Control Number: 2007943036

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, 999 Riverview Drive, Suite 208, Totowa, NJ 07512 USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Cover illustration: Fig. 4, Chapter 19, “Inferring Ancestral Protein Interaction Networks,” by José M. Peregrín-Alvarez

Printed on acid-free paper

9 8 7 6 5 4 3 2 1

springer.com

Preface

Bioinformatics is the management and analysis of data for the life sciences. As such, it is inherently interdisciplinary, drawing on techniques from Computer Science, Statistics, and Mathematics and bringing them to bear on problems in Biology. Moreover, its subject matter is as broad as Biology itself. Users and developers of Bioinformatics methods come from all of these fields. Molecular biologists are some of the major users of Bioinformatics, but its techniques are applicable across a range of life sciences. Other users include geneticists, microbiologists, biochemists, plant and agricultural scientists, medical researchers, and evolution researchers.

The ongoing exponential expansion of data for the life sciences is both the major challenge and the raison d’être for twenty-first century Bioinformatics. To give one example among many, the completion and success of the human genome sequencing project, far from being the end of the sequencing era, motivated a proliferation of new sequencing projects. And it is not only the quantity of data that is expanding; new types of biological data continue to be introduced as a result of technological development and a growing understanding of biological systems.

Bioinformatics describes a selection of methods from across this vast and expanding discipline. The methods are some of the most useful and widely applicable in the field. Most users and developers of Bioinformatics methods will find something of value to their own specialties here, and will benefit from the knowledge and experience of its 86 contributing authors. Developers will find these methods useful as components of larger methods, and as sources of inspiration for new methods. Volume II, Section IV in particular is aimed at developers; it describes some of the “meta-methods” (widely applicable mathematical and computational methods that inform and lie behind other more specialized methods) that have been successfully used by bioinformaticians.

For users of Bioinformatics, this book provides methods that can be applied as is, or with minor variations, to many specific problems. The Notes section in each chapter provides valuable insights into important variations and when to use them. It also discusses problems that can arise and how to fix them. This work is also intended to serve as an entry point for those who are just beginning to discover and use methods in Bioinformatics. As such, this book is also intended for students and early career researchers.

As with other volumes in the Methods in Molecular Biology™ series, the intention of this book is to provide the kind of detailed description and implementation advice that is crucial for getting optimal results out of any given method, yet which often is not incorporated into journal publications. Thus, this series provides a forum for the communication of accumulated practical experience.

The work is divided into two volumes, with data, sequence analysis, and evolution the subjects of the first volume, and structure, function, and application the subjects of the second. The second volume also presents a number of “meta-methods”: techniques that will be of particular interest to developers of bioinformatic methods and tools.

Within Volume I, Section I deals with data and databases. It contains chapters on a selection of methods involving the generation and organization of data, including sequence data, RNA and protein structures, microarray expression data, and functional annotations.

Section II presents a selection of methods in sequence analysis, beginning with multiple sequence alignment. Most of the chapters in this section deal with methods for discovering the functional components of genomes, whether genes, alternative splice sites, non-coding RNAs, or regulatory motifs.

Section III presents several of the most useful and interesting methods in phylogenetics and evolution. The wide variety of topics treated in this section is indicative of the breadth of evolution research. It includes chapters on some of the most basic issues in phylogenetics: modelling of evolution and inferring trees. It also includes chapters on drawing inferences about various kinds of ancestral states, systems, and events, including gene order, recombination events and genome rearrangements, ancestral interaction networks, lateral gene transfers, and patterns of migration. It concludes with a chapter discussing some of the achievements and challenges of algorithm development in phylogenetics.

In Volume II, Section I, some methods pertinent to the prediction of protein and RNA structures are presented. Methods for the analysis and classification of structures are also discussed.

Methods for inferring the function of previously identified genomic elements (chiefly protein-coding genes) are presented in Volume II, Section II. This is another very diverse subject area, and the variety of methods presented reflects this. Some well-known techniques for identifying function, based on homology, “Rosetta stone” genes, gene neighbors, phylogenetic profiling, and phylogenetic shadowing, are discussed, alongside methods for identifying regulatory sequences, patterns of expression, and participation in complexes. The section concludes with a discussion of a technique for integrating multiple data types to increase the confidence with which functional predictions can be made. This section, taken as a whole, highlights the opportunities for development in the area of functional inference.

Some medical applications, chiefly diagnostics and drug discovery, are described in Volume II, Section III. The importance of microarray expression data as a diagnostic tool is a theme of this section, as is the danger of over-interpreting such data. The case study presented in the final chapter highlights the need for computational diagnostics to be biologically informed.

The final section presents just a few of the “meta-methods” that developers of Bioinformatics methods have found useful. For the purpose of designing algorithms, it is as important for bioinformaticians to be aware of the concept of fixed parameter tractability as it is for them to understand NP-completeness, since these concepts often determine the types of algorithms appropriate to a particular problem. Clustering is a ubiquitous problem in Bioinformatics, as is the need to visualize data. The need to interact with massive databases and multiple software entities makes the development of computational pipelines an important issue for many bioinformaticians. Finally, the chapter on text mining discusses techniques for addressing the special problems of interacting with and extracting information from the vast biological literature.

Jonathan M. Keith

Contents

Preface
Contributors
Contents of Volume II

SECTION I: DATA AND DATABASES

1. Managing Sequence Data
   Ilene Karsch Mizrachi
2. RNA Structure Determination by NMR
   Lincoln G. Scott and Mirko Hennig
3. Protein Structure Determination by X-Ray Crystallography
   Andrea Ilari and Carmelinda Savino
4. Pre-Processing of Microarray Data and Analysis of Differential Expression
   Steffen Durinck
5. Developing an Ontology
   Midori A. Harris
6. Genome Annotation
   Hideya Kawaji and Yoshihide Hayashizaki

SECTION II: SEQUENCE ANALYSIS

7. Multiple Sequence Alignment
   Walter Pirovano and Jaap Heringa
8. Finding Genes in Genome Sequence
   Alice Carolyn McHardy
9. Bioinformatics Detection of Alternative Splicing
   Namshin Kim and Christopher Lee
10. Reconstruction of Full-Length Isoforms from Splice Graphs
    Yi Xing and Christopher Lee
11. Sequence Segmentation
    Jonathan M. Keith
12. Discovering Sequence Motifs
    Timothy L. Bailey

SECTION III: PHYLOGENETICS AND EVOLUTION

13. Modeling Sequence Evolution
    Pietro Liò and Martin Bishop
14. Inferring Trees
    Simon Whelan
15. Detecting the Presence and Location of Selection in Proteins
    Tim Massingham
16. Phylogenetic Model Evaluation
    Lars Sommer Jermiin, Vivek Jayaswal, Faisal Ababneh, and John Robinson
17. Inferring Ancestral Gene Order
    Julian M. Catchen, John S. Conery, and John H. Postlethwait
18. Genome Rearrangement by the Double Cut and Join Operation
    Richard Friedberg, Aaron E. Darling, and Sophia Yancopoulos
19. Inferring Ancestral Protein Interaction Networks
    José M. Peregrín-Alvarez
20. Computational Tools for the Analysis of Rearrangements in Mammalian Genomes
    Guillaume Bourque and Glenn Tesler
21. Detecting Lateral Genetic Transfer: A Phylogenetic Approach
    Robert G. Beiko and Mark A. Ragan
22. Detecting Genetic Recombination
    Georg F. Weiller
23. Inferring Patterns of Migration
    Paul M.E. Bunje and Thierry Wirth
24. Fixed-Parameter Algorithms in Phylogenetics
    Jens Gramm, Arfst Nickelsen, and Till Tantau

Index
Evolution Index

Contributors

FAISAL ABABNEH • Department of Mathematics and Statistics, Al-Hussein Bin Talal University, Ma’an, Jordan
TIMOTHY L. BAILEY • ARC Centre of Excellence in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
ROBERT G. BEIKO • Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
MARTIN BISHOP • CNR-ITB Institute of Biomedical Technologies, Segrate, Milano, Italy
GUILLAUME BOURQUE • Genome Institute of Singapore, Singapore, Republic of Singapore
PAUL M.E. BUNJE • Department of Biology, Lehrstuhl für Zoologie und Evolutionsbiologie, University of Konstanz, Konstanz, Germany
JULIAN M. CATCHEN • Department of Computer and Information Science and Institute of Neuroscience, University of Oregon, Eugene, OR
JOHN S. CONERY • Department of Computer and Information Science, University of Oregon, Eugene, OR
AARON E. DARLING • ARC Centre of Excellence in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
STEFFEN DURINCK • Katholieke Universiteit Leuven, Leuven, Belgium
RICHARD FRIEDBERG • Department of Physics, Columbia University, New York, NY
JENS GRAMM • Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Tübingen, Germany
MIDORI A. HARRIS • European Molecular Biology Laboratory – European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
YOSHIHIDE HAYASHIZAKI • Genome Exploration Research Group, RIKEN Yokohama Institute, Yokohama, Kanagawa, Japan; and Genome Science Laboratory, RIKEN Wako Institute, Wako, Saitama, Japan
MIRKO HENNIG • Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC
JAAP HERINGA • Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
ANDREA ILARI • CNR Institute of Molecular Biology and Pathology (IBPM), Department of Biochemical Sciences, University of Rome, “Sapienza,” Roma, Italy
VIVEK JAYASWAL • School of Mathematics and Statistics, Sydney Bioinformatics and Centre for Mathematical Biology, University of Sydney, Sydney, New South Wales, Australia
LARS SOMMER JERMIIN • School of Biological Sciences, Sydney Bioinformatics and Centre for Mathematical Biology, University of Sydney, Sydney, New South Wales, Australia
HIDEYA KAWAJI • Functional RNA Research Program, Frontier Research System, RIKEN Wako Institute, Wako, Saitama, Japan
JONATHAN M. KEITH • School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
NAMSHIN KIM • Molecular Biology Institute, Institute for Genomics and Proteomics, Department of Chemistry and Biochemistry, University of California, Los Angeles, CA
CHRISTOPHER LEE • Molecular Biology Institute, Institute for Genomics and Proteomics, Department of Chemistry and Biochemistry, University of California, Los Angeles, CA
PIETRO LIÒ • Computer Laboratory, University of Cambridge, Cambridge, United Kingdom
TIM MASSINGHAM • European Molecular Biology Laboratory – European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
ALICE CAROLYN MCHARDY • IBM Thomas J. Watson Research Center, Yorktown Heights, NY
ILENE KARSCH MIZRACHI • National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
ARFST NICKELSEN • Institut für Theoretische Informatik, Universität zu Lübeck, Lübeck, Germany
JOSÉ M. PEREGRÍN-ALVAREZ • SickKids Research Institute, Toronto, Ontario, Canada
WALTER PIROVANO • Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
JOHN H. POSTLETHWAIT • Institute of Neuroscience, University of Oregon, Eugene, OR
MARK A. RAGAN • ARC Centre of Excellence in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
JOHN ROBINSON • School of Mathematics and Statistics and Centre for Mathematical Biology, University of Sydney, Sydney, New South Wales, Australia
CARMELINDA SAVINO • CNR-Institute of Molecular Biology and Pathology (IBPM), Department of Biochemical Sciences, University of Rome, “Sapienza,” Roma, Italy
LINCOLN G. SCOTT • Cassia, LLC, San Diego, CA
TILL TANTAU • Institut für Theoretische Informatik, Universität zu Lübeck, Lübeck, Germany
GLENN TESLER • Department of Mathematics, University of California, San Diego, La Jolla, CA
GEORG F. WEILLER • Research School of Biological Sciences and ARC Centre of Excellence for Integrative Legume Research, The Australian National University, Canberra, Australian Capital Territory, Australia
SIMON WHELAN • Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
THIERRY WIRTH • Museum National d’Histoire Naturelle, Department of Systematics and Evolution, Herbier, Paris, France
YI XING • Department of Internal Medicine, Carver College of Medicine and Department of Biomedical Engineering, University of Iowa, Iowa City, IA
SOPHIA YANCOPOULOS • The Feinstein Institute for Medical Research, Manhasset, NY

Contents of Volume II

SECTION I: STRUCTURES

1. UNAFold: Software for Nucleic Acid Folding and Hybridization
   Nicholas R. Markham and Michael Zuker
2. Protein Structure Prediction
   Bissan Al-Lazikani, Emma E. Hill, and Veronica Morea
3. An Introduction to Protein Contact Prediction
   Nicholas Hamilton and Thomas Huber
4. Analysis of Mass Spectrometry Data in Proteomics
   Rune Matthiesen and Ole N. Jensen
5. The Classification of Protein Domains
   Russell L. Marsden and Christine A. Orengo

SECTION II: INFERRING FUNCTION

6. Inferring Function from Homology
   Richard D. Emes
7. The Rosetta Stone Method
   Shailesh V. Date
8. Inferring Functional Relationships from Conservation of Gene Order
   Gabriel Moreno-Hagelsieb
9. Phylogenetic Profiling
   Shailesh V. Date and José M. Peregrín-Alvarez
10. Phylogenetic Shadowing: Sequence Comparisons of Multiple Primate Species
    Dario Boffelli
11. Prediction of Regulatory Elements
    Albin Sandelin
12. Expression and Microarrays
    Joaquín Dopazo and Fátima Al-Shahrour
13. Identifying Components of Complexes
    Nicolas Goffard and Georg Weiller
14. Integrating Functional Genomics Data
    Insuk Lee and Edward M. Marcotte

SECTION III: APPLICATIONS AND DISEASE

15. Computational Diagnostics with Gene Expression Profiles
    Claudio Lottaz, Dennis Kostka, Florian Markowetz, and Rainer Spang
16. Analysis of Quantitative Trait Loci
    Mario Falchi
17. Molecular Similarity Concepts and Search Calculations
    Jens Auer and Jürgen Bajorath
18. Optimization of the MAD Algorithm for Virtual Screening
    Hanna Eckert and Jürgen Bajorath
19. Combinatorial Optimization Models for Finding Genetic Signatures from Gene Expression Datasets
    Regina Berretta, Wagner Costa, and Pablo Moscato
20. Genetic Signatures for a Rodent Model of Parkinson’s Disease Using Combinatorial Optimization Methods
    Mou’ath Hourani, Regina Berretta, Alexandre Mendes, and Pablo Moscato

SECTION IV: ANALYTICAL AND COMPUTATIONAL METHODS

21. Developing Fixed-Parameter Algorithms to Solve Combinatorially Explosive Biological Problems
    Falk Hüffner, Rolf Niedermeier, and Sebastian Wernicke
22. Clustering
    Geoffrey J. McLachlan, Richard W. Bean, and Shu-Kay Ng
23. Visualization
    Falk Schreiber
24. Constructing Computational Pipelines
    Mark Halling-Brown and Adrian J. Shepherd
25. Text Mining
    Andrew B. Clegg and Adrian J. Shepherd