CENTER FOR COMPUTING FOR LIFE SCIENCES (CCLS) PRESENTS: DATA FOR DECISION MAKING: LAB, ENTERPRISE, WEB
FROM INDIVIDUAL EXPERIMENTS TO INFORMED DECISION MAKING: CHALLENGES, SUCCESS STORIES AND OPPORTUNITIES IN COLLABORATIVE SCIENCE
Erich Gombocz, VP & Chief Science Officer, IO Informatics Inc.
WHERE DO WE START?
• THE MORE I KNOW … THE MORE CONFIDENT MY DECISIONS!
[Figure: Data, “Actionable Knowledge”]
WHAT WE WANT
• DATA INTEGRATION IN CONTEXT
• “CLEAN DATA” – VARIETY, SIZES, FORMATS
  • Harmonization & transformation challenges (synonyms, domain-specific naming, units, provenance, dynamic fields, cross-domain concepts)
• SYSTEMS APPROACH
• COLLABORATIONS, CONSORTIA, DATA SHARING
• NEED TO BE FLEXIBLE, ADAPTABLE, EXTENSIBLE
  • W3C RDF standard formats, semantic web, machine readable
  • Provenance, versioning, access management, compliance
CHALLENGES: HOW TO DEAL WITH THEM
A “SEMANTIC KNOWLEDGEBASE”?
• DATA IN CONTEXT
  • How much does an APPLE cost?
• BASED ON THE RESOURCE DESCRIPTION FRAMEWORK (RDF)
  • Gene encodes Protein
  • Protein regulates Pathway
  • Pathway modulated in Disease
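The three statements on this slide can be sketched in plain Python as (subject, predicate, object) tuples. This is only a minimal illustration, not the presenter's implementation: the lowercase node names and the `follow` helper are assumptions made for the example, and a real knowledgebase would identify each resource by URI in an RDF store.

```python
# Hypothetical sketch: the slide's statements as (subject, predicate, object) triples.
TRIPLES = [
    ("gene", "encodes", "protein"),
    ("protein", "regulates", "pathway"),
    ("pathway", "modulated in", "disease"),
]


def follow(start, triples):
    """Walk outgoing edges from `start` and return the chain of statements reached."""
    chain = []
    current = start
    seen = {start}
    while True:
        # Find the first statement whose subject is the current node.
        step = next((t for t in triples if t[0] == current), None)
        if step is None or step[2] in seen:
            break
        chain.append(step)
        current = step[2]
        seen.add(current)
    return chain


path = follow("gene", TRIPLES)
for s, p, o in path:
    print(f"{s} --{p}--> {o}")
```

Chaining the statements is what places a gene "in context": starting from the gene, the walk reaches the disease through the protein and the pathway.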
SEMANTIC KNOWLEDGEBASE
• A NETWORK OF RELATED INFORMATION
  • To understand the systems-biology relevance of complex mechanisms
  • A network able to define pattern-based signatures of biological functions
  • A network which allows inference and reasoning
• DATA OUT OF CONTEXT IS MEANINGLESS, DATA IN CONTEXT MEANS EVERYTHING
  • “A protein is present at elevated concentration.” So what?
  • “It’s a key enzyme in a disease-related pathway.” Aha!
  • “It’s only expressed during the growth cycle of the pathogen.” Aha!
CORE SEMANTIC TECHNOLOGIES
• RESOURCE DESCRIPTION FRAMEWORK (RDF)
  • Open, W3C-standards-based framework, interoperable and extensible by design
• ONTOLOGIES
  • describe resources and relationships according to their explicit meaning, and can be modified, merged, and iteratively applied to data
• SPARQL (RDF QUERY LANGUAGE) AND OWL (WEB ONTOLOGY LANGUAGE)
  • support network queries, pattern-based screening, inference and rules-based transformations
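To make "pattern-based screening" concrete, the sketch below matches a list of triple patterns against a small set of triples, roughly the way a SPARQL basic graph pattern binds variables. It is plain Python rather than a SPARQL engine, and the data, the predicate names, and the `match` helper are all assumptions invented for the illustration.

```python
# Hypothetical data and helper names; variables start with "?" as in SPARQL.
TRIPLES = {
    ("geneA", "encodes", "proteinA"),
    ("proteinA", "regulates", "pathwayX"),
    ("pathwayX", "modulatedIn", "diseaseD"),
    ("geneB", "encodes", "proteinB"),
}


def match(pattern, triples, binding=None):
    """Return every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not pattern:
        return [binding]
    first, rest = pattern[0], pattern[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the value it was bound to earlier.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.extend(match(rest, triples, candidate))
    return results


# "Which genes encode a protein that regulates a pathway modulated in a disease?"
query = [
    ("?gene", "encodes", "?protein"),
    ("?protein", "regulates", "?pathway"),
    ("?pathway", "modulatedIn", "?disease"),
]
hits = match(query, TRIPLES)
print(hits)
```

Only `geneA` satisfies the whole pattern; `geneB` drops out because its protein has no `regulates` statement, which is the sense in which a pattern acts as a screening signature.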
WE NEED HARMONIZATION … WHEN WE WANT TO MAP, INTEGRATE & SHARE DATASETS
• “DOMAIN BABEL”
  • Namespaces, URIs
  • Ontologies
  • Terminologies
  • Synonyms (Classes, Instances, Relationships)
• TRANSFORMATIONS
  • Time & Dates, Units, …
• “DIRTY DATA” CLEAN-UP
  • Duplicates, inconsistent assertions
  • Contradicting versions
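The harmonization steps listed above (synonym mapping, unit transformation, duplicate removal) can be sketched as a single pass over incoming records. The synonym table, the unit factors, the record layout, and the `harmonize` function are assumptions made up for this example; resolving contradicting versions would need provenance information and is not attempted here.

```python
# Hypothetical synonym table, unit factors, and records; none come from the talk.
SYNONYMS = {"tp53": "TP53", "p53": "TP53", "tumor protein 53": "TP53"}
TO_MG_PER_L = {"mg/L": 1.0, "g/L": 1000.0, "ug/mL": 1.0}


def harmonize(records):
    """Map synonyms to one preferred label, convert units, and drop duplicates."""
    seen = set()
    clean = []
    for name, value, unit in records:
        label = SYNONYMS.get(name.strip().lower(), name.strip())
        mg_per_l = round(value * TO_MG_PER_L[unit], 6)
        key = (label, mg_per_l)
        if key not in seen:  # identical assertions from different sources collapse
            seen.add(key)
            clean.append({"analyte": label, "value": mg_per_l, "unit": "mg/L"})
    return clean


raw = [
    ("p53", 2.0, "mg/L"),
    ("TP53", 0.002, "g/L"),  # same measurement, different name and unit
    ("Tumor Protein 53", 2.0, "ug/mL"),
    ("albumin", 40.0, "g/L"),
]
harmonized = harmonize(raw)
print(harmonized)
```

The first three records differ only in naming and units, so after harmonization they collapse into one assertion.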
HARMONIZED RDF INTEGRATION: NAMESPACES, ONTOLOGIES, THESAURI, INFERENCE
INTEGRATED. AND NOW?
• UNDERSTAND YOUR NETWORK!
• ENRICHMENT:
  • which resources? source quality?
    • OMICs, assays, pathways, molecular interactions, diseases, clinical, taxonomic, animal models, demographic, geospatial …
  • what for?
    • qualification, validation, functional biology, mechanisms of action, adverse effects, efficacy, drug targets
• THE LINKED OPEN DATA INITIATIVE
LINKED OPEN DATA (LOD) CLOUD
[Figure: LOD cloud diagrams for 2008, 2009, 2010, 2011]
INTEGRATIVE WORKFLOW EXAMPLE: COLLABORATIVE PRECISION MEDICINE
[Workflow figure. Node labels: Genomics Group, Proteomics Group, Clinical Lab Tests, Outcome. Stage labels: Capture, Manage, Filter; Normalize, Analyze; Integrate; Contextualize, Qualify, Refine; Validate, Apply.]
BIOMARKER EXAMPLE: FROM SAMPLE TO SYSTEMS-BIOLOGY-BASED KNOWLEDGE ABOUT MICROBIAL PATHOGENS
SEMANTIC MAPPING WALK-THROUGH FROM DATA TO RDF GRAPH
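As a rough illustration of what mapping tabular data to an RDF graph involves, the sketch below turns CSV rows into N-Triples lines using only the Python standard library. The `http://example.org/lab/` namespace, the column names, the sample values, and the `rows_to_ntriples` function are hypothetical choices for the example and do not reproduce the mapping shown in the talk.

```python
import csv
import io

# Hypothetical namespace and column names, chosen only for this illustration.
BASE = "http://example.org/lab/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

CSV_TEXT = """sample_id,protein,concentration
S1,PROT_A,4.2
S2,PROT_B,0.7
"""


def rows_to_ntriples(text):
    """Turn each row into N-Triples lines: one subject URI per sample, one statement per column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}sample/{row['sample_id']}>"
        protein = row["protein"]
        value = row["concentration"]
        # An object that is itself a resource is written as a URI.
        lines.append(f"{subject} <{BASE}hasProtein> <{BASE}protein/{protein}> .")
        # An object that is a plain value is written as a typed literal.
        lines.append(f'{subject} <{BASE}concentration> "{value}"^^<{XSD_DECIMAL}> .')
    return lines


ntriples = rows_to_ntriples(CSV_TEXT)
print("\n".join(ntriples))
```

The design choice worth noting is that identifiers become URIs while measurements become typed literals, so that the same protein mentioned in two datasets resolves to one node in the merged graph.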
WORKFLOW FROM DATA TO RDF GRAPH
SUCCESS STORIES: TIME, MONEY, LIVES SAVED
PHARMACEUTICAL MANUFACTURING: FORMULATION INFLUENCE ON DRUG STABILITY
Semantic integration provides immediate report verification and manufacturing based on the effect of compound formulation on drug stability and purity
SPECIES-INDEPENDENT BIOMARKERS: REDUCING ANIMAL TESTING
Genomic, proteomic, and imaging endpoints across species to discover species-independent biomarkers applicable to human adverse events and diseases
PATIENTS AT SERIOUS RISK: ORGAN TRANSPLANT FAILURE
Integration of gene, protein, clinical and reference sources for combinatorial marker-based screening of transplant patients for likelihood of organ failure
CLINICAL DECISION SUPPORT: ASK FOR PRECISION MEDICINE (APPLIED SEMANTIC KNOWLEDGEBASE)
• Web-based dashboard
• Applies patterns for predictive screening
• Weighting and scoring of results
• Brings “hits” back into the Knowledge Network for validation of hypotheses and algorithms
Screening of transplant patients for likelihood of organ failure, based on combinatorial biomarker patterns, and physician alerting
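One simple way to picture pattern-based screening with weighted scoring is shown below. It is a sketch under stated assumptions, not the ASK implementation: the marker names, thresholds, weights, alert cutoff, and patient values are invented for illustration and carry no clinical meaning.

```python
# Hypothetical markers, thresholds, weights, and cutoff: illustrative only,
# not the ASK patterns and not clinically meaningful.
PATTERN = {
    "marker_a": {"threshold": 2.0, "weight": 0.5},
    "marker_b": {"threshold": 10.0, "weight": 0.3},
    "marker_c": {"threshold": 0.8, "weight": 0.2},
}
ALERT_CUTOFF = 0.6


def score(patient, pattern=PATTERN):
    """Sum the weights of all markers whose measured value exceeds its threshold."""
    return sum(
        rule["weight"]
        for marker, rule in pattern.items()
        if patient.get(marker, 0.0) > rule["threshold"]
    )


def screen(patients, cutoff=ALERT_CUTOFF):
    """Return (patient_id, score) pairs at or above the cutoff, highest score first."""
    scored = ((pid, round(score(values), 3)) for pid, values in patients.items())
    return sorted((hit for hit in scored if hit[1] >= cutoff), key=lambda h: -h[1])


patients = {
    "patient_1": {"marker_a": 3.1, "marker_b": 12.0, "marker_c": 0.2},
    "patient_2": {"marker_a": 1.0, "marker_b": 11.0, "marker_c": 0.9},
    "patient_3": {"marker_a": 2.5, "marker_b": 15.0, "marker_c": 1.1},
}
alerts = screen(patients)
print(alerts)
```

In this toy setup the combination of markers, rather than any single value, decides whether a patient is flagged; the flagged "hits" are what would be fed back into the knowledge network for validation.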
COMBINATION TREATMENT EFFECTIVENESS IN PROSTATE CANCER
Assess effectiveness of different combination treatments for prostate cancer based on multi-platform genomic and proteomic marker profiles and patient match
CORONARY PLAQUE RUPTURE: RISK ASSESSMENT IN ACUTE ATHEROSCLEROSIS
Inflammatory-response precursor markers confirm the onset of plaque rupture and help improve the response in acute atherosclerosis
TOXICITY CLASSIFICATION: IDENTIFICATION OF TYPES OF TOXICITY (NIST ATP)
Development of multi-modal gene expression and metabolic biomarkers to classify and identify types of toxicity
OPPORTUNITIES: EMERGING AND FUTURE PROJECTS
DATA SHARING INITIATIVES
• BEST PRACTICES FOR DATA SHARING WORKING GROUP
• W3C HCLS GROUPS
  • Semantic web standards and applications
• PISTOIA ALLIANCE
  • Biomarker Standards Initiative
  • Standards for Exchanging Screening Data
• CDD, SAGE, VIVO, SADI
EXAMPLE OF WHAT’S NEXT: SENSOR-BASED REAL-TIME BIOLOGICAL THREAT ALERT
• Automated marker-based screening and alerting for biological threats (Mobile devices)
• Multiplex Protein Chip Development for Real-Time Sensor-Based Monitoring
• Development of preventive measures (Drugs, Vaccines) effective for entire classes of microorganisms
Integrated analysis of real-time data and biological profiles from 30+ resources for microbial pathogen-caused disease pathway interactions to determine threat characteristics
TAKE HOME
• KNOWLEDGE GAINED FROM CONTEXTUAL INTEGRATION OF EXPERIMENTS, ENRICHED WITH PUBLIC LINKED OPEN DATA CLOUD RESOURCES, ADDRESSES:
  1. THE REQUIREMENTS TO MEANINGFULLY MERGE DIFFERENT RESOURCES TOWARDS ACTIONABLE KNOWLEDGE, AND
  2. THE ADVANCE IN UNDERSTANDING OF COMPLEX, INTRICATE BIOLOGICAL FUNCTIONS WHICH ARE USED AS DECISION SUPPORT IN CLINICAL AND PHARMACEUTICAL SETTINGS
DISCUSSION, QUESTIONS
[email protected]