Artificial Intelligence Core Lecture, Summer Term 2016: 27, 28 June; 04, 05, 11, 13 July
Prof. Wolfgang Wahlster, Saarland University, Chair for AI & CEO of DFKI
FR 6.2 Informatik, Campus E1 1, 66041 Saarbrücken
Tel.: (0681) 302-2363 (Univ) or 85775 5252 (DFKI), E-mail: [email protected]
WWW: http://www.dfki.de/~wahlster
The Four Phases of AI Research: 60 Years of AI in 2016 (figure axis: intelligence level)
Phase 1 (1950–1970): General Problem Solving Methods
Phase 2 (1970–1990): Hand-crafted Knowledge Bases and Rule-Based Reasoning
Phase 3 (1990–2010): Machine Learning based on Mass Data and Probabilistic Reasoning
Phase 4 (since 2010): Hybrid Architectures: Knowledge Bases combined with Machine Learning (in Embodied Systems)
Core Areas and Applications of AI
Applications: Natural Language Understanding Systems, Image Understanding Systems, Expert Systems, Robotics, Multiagent Systems, Intelligent Tutoring Systems, Intelligent Help Systems, Intelligent User Interfaces
Core areas: Subsymbolic Processing, Signal2Symbol Transformation, Knowledge Representation, Knowledge Processing (Search, Inference, Learning), Knowledge Presentation, AI Programming Languages, AI Programming Methods, AI Tools, AI Hardware
DFKI Operates Six Large-Scale Intelligent Environments as Living Labs
Big Data: Data as Tradeable Assets
Sensor Data for Weather, Climate, Smart City and Smart Home
3D-Internet-Data and Media Streams
Production and Machine Data from Industry 4.0
Mass Data from Individualized Medicine, from Genome Analysis and Imaging Methods
Mass Data from Smart Grid and Smart Metering
Financial Messaging Data: Supervision of Banks, Stock Exchanges and High-Speed Trading
Life Log Data for Individuals and Objects: Digital Product Memories
Mass Data from Mobility by Car2Car-Communication and Logs from Single Vehicles
Mass Data from Social Networks
95% of the 1.2 zettabytes of digital data worldwide are unstructured, and the data volume grows by 62% per year.
Development of Global Data Volumes by 2020 (in Zettabyte)
Source: A.T. Kearney 2013. Mainly machine-generated data, but also data encoded in natural language for human inspection; multilingual natural language generation is becoming more and more important.
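What 62% annual growth means can be made concrete with compound-growth arithmetic. The sketch below assumes the rate stays constant (the slide gives no growth model), so it is only an illustration: at 62% per year the volume doubles roughly every 1.44 years, since the doubling time is ln 2 / ln 1.62.

```python
import math

def doubling_time(annual_growth: float) -> float:
    """Years until a quantity doubles at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

def project_volume(start_zb: float, annual_growth: float, years: int) -> float:
    """Volume after `years` years of constant compound growth (assumption)."""
    return start_zb * (1 + annual_growth) ** years

# Figures quoted on the slide: 1.2 ZB, 62% growth per year (assumed constant).
print(f"doubling time: {doubling_time(0.62):.2f} years")
for n in range(4):
    print(n, round(project_volume(1.2, 0.62, n), 2))
```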
Human-Generated Internet Content: Zettabytes of Unstructured BIG Data
Exponential Growth of Internet Data, Every Minute:
• 100,000 new tweets
• More than 2 million queries
• 48 hours of videos uploaded
• 3,125 new photos uploaded
• 47,000 apps downloaded
• 284,166,667 new emails sent
• 684,478 messages
• 571 new homepages
Commercial Spoken Language Access: Siri (Apple), Google Now (Google), S Voice (Samsung), Cortana (Microsoft). Still missing: crosslingual information retrieval
From DB to BD: New BIG DATA Services
New services based on cloud nets: decision support, prediction, simulation, knowledge discovery, information trading, fusion, optimization, modeling
(Figure axes: complexity; degree of structure, from structured to unstructured)
• Classical data material: structured, high information density, much used for ICT; databases, data warehouses, WWW; $10^{12}$ (Tera) to $10^{15}$ (Peta) bytes
• New data material: lower degree of structure (unstructured), low information density, less used for ICT; BIG DATA of $10^{18}$ (Exa) to $10^{21}$ (Zetta) bytes, handled by machine learning, multimodal interaction and information extraction from text and video
Language Technology as a Key Enabler for BIG DATA Analytics
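The data volumes on this slide use decimal SI prefixes: giga = 10^9, tera = 10^12, peta = 10^15, exa = 10^18 and zetta = 10^21. A minimal conversion sketch (the function names are illustrative, not from any library):

```python
# Decimal (SI) prefixes for data volumes, as exponents of 10.
SI_PREFIX_EXPONENT = {
    "giga": 9,
    "tera": 12,
    "peta": 15,
    "exa": 18,
    "zetta": 21,
}

def to_bytes(amount: float, prefix: str) -> float:
    """Convert an amount given with a decimal SI prefix to bytes."""
    return amount * 10 ** SI_PREFIX_EXPONENT[prefix.lower()]

def convert(amount: float, from_prefix: str, to_prefix: str) -> float:
    """Re-express an amount in another SI prefix, e.g. zettabytes in exabytes."""
    exponent = SI_PREFIX_EXPONENT[from_prefix.lower()] - SI_PREFIX_EXPONENT[to_prefix.lower()]
    return amount * 10 ** exponent

print(to_bytes(1.2, "zetta"))        # 1.2 ZB expressed in bytes
print(convert(1.2, "zetta", "exa"))  # 1.2 ZB expressed in EB
```

Each step on the slide's scale is a factor of 1,000. Binary prefixes (e.g. the zebibyte, 2^70 bytes) are slightly larger and would need a separate table.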
From Data to Meta Knowledge
• Data
• Information (interpreted data); IT support: data mining
• Knowledge (use of information); IT support: information extraction
• Meta knowledge (availability of knowledge); IT support: knowledge representation, knowledge management
History of Digital Knowledge Bases (timeline 1985 to 2015)
• Cyc: from humans for machines, e.g. axioms such as
  $\forall x: \mathit{human}(x) \rightarrow (\exists y: \mathit{mother}(x,y) \land \exists z: \mathit{father}(x,z))$
  $\forall x,u,w: (\mathit{mother}(x,u) \land \mathit{mother}(x,w)) \rightarrow u = w$
• WordNet: from humans for humans, e.g. guitarist → {player, musician} → artist; algebraist → mathematician → scientist
• Wikipedia: from humans for humans; 4.5 million English articles, 20 million contributors
• Google Knowledge Vault: from algorithms for machines
adapted from Gerhard Weikum
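Axioms like the two Cyc-style formulas can be checked mechanically against a finite set of ground facts. The sketch below uses a hypothetical toy model (the individuals and facts are invented for illustration, not taken from Cyc) and returns counterexamples for each axiom rather than proving anything:

```python
from itertools import product

# Hypothetical toy model: finite sets of ground facts.
human = {"ben", "clara"}
mother = {("ben", "anna"), ("clara", "anna")}   # mother(x, y)
father = {("ben", "david")}                     # father(x, z)

def violations_parents(human, mother, father):
    """Axiom 1: every human x has some mother y and some father z."""
    return [x for x in sorted(human)
            if not any(m_x == x for m_x, _ in mother)
            or not any(f_x == x for f_x, _ in father)]

def violations_unique_mother(mother):
    """Axiom 2: mother(x,u) and mother(x,w) imply u = w."""
    return sorted({(x, u, w)
                   for (x, u), (x2, w) in product(mother, repeat=2)
                   if x == x2 and u != w})

print(violations_parents(human, mother, father))  # ['clara']: no father asserted
print(violations_unique_mother(mother))           # []: everyone has one mother
```

Adding a second mother for ben, e.g. `mother | {("ben", "eva")}`, makes the second check report both orderings of the conflicting pair.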
The 5 Steps of Google's Semantic Search
Microsoft Academic Graph
Source: K. Wang, Microsoft Research
The Semantic Web versus the Knowledge Web
• Semantic Web: human-readable vs. machine-readable contents; Knowledge Web: the machine reads human-readable contents
• Semantic Web: humans define standards for data formats and models; Knowledge Web: the machine learns to conflate different formats of the same thing
• Semantic Web: explicit and precise specification of knowledge representation that everyone has to agree upon; Knowledge Web: latent and fuzzy representation of knowledge learned by mining big data
Source: K. Wang, Microsoft Research
Knowledge Bases: Comparison
Name | # of Entity Types | # Entities | # Predicates | # Confident Triples
Knowledge Vault (KV) | 1,100 | 45M | 4,469 | 271M
DeepDive | 4 | 2.7M | 34 | 7M
NELL | 271 | 5.1M | 306 | 0.435M
Yago3 | 350,000 | 10M | 100 | 120M
Freebase | 1,500 | 40M | 35,000 | 637M
Knowledge Graph | 1,500 | 570M | 35,000 | 18,000M
Google Knowledge Vault: The Largest Triple Store
Sources: data from the Web (unstructured text, semi-structured DOM trees, structured web tables) and prior data from all social websites
Example confidence scores of extracted triples (figure): .99, .96, .76
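A triple store holds subject-predicate-object facts; in a system like Knowledge Vault each extracted triple also carries a confidence, and only high-confidence triples are counted as "confident". The sketch below is a hypothetical in-memory toy, not Knowledge Vault's implementation; the facts are invented, and the 0.9 cut-off for "confident" is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    confidence: float  # estimated probability that the triple is true

class TripleStore:
    """Toy in-memory triple store with wildcard pattern queries."""

    def __init__(self) -> None:
        self.triples: list[Triple] = []

    def add(self, s: str, p: str, o: str, confidence: float) -> None:
        self.triples.append(Triple(s, p, o, confidence))

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None, min_conf: float = 0.0) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in self.triples:
            if ((s is None or t.subject == s) and (p is None or t.predicate == p)
                    and (o is None or t.obj == o) and t.confidence >= min_conf):
                yield t

store = TripleStore()
# Hypothetical extractions; the scores reuse the figure's example values.
store.add("Las_Vegas", "nickname", "Sin_City", 0.99)
store.add("Las_Vegas", "locatedIn", "Nevada", 0.96)
store.add("Las_Vegas", "locatedIn", "California", 0.76)  # a wrong extraction

CONFIDENT = 0.9  # assumed threshold for a "confident" triple
confident = list(store.query(s="Las_Vegas", min_conf=CONFIDENT))
print([t.obj for t in confident])
```

The threshold filters out the low-scoring wrong extraction while keeping both correct triples.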
IBM Watson searches large Knowledge Bases to Recommend a Chemotherapy to the Doctor
Fundamental Terms of Knowledge Representation
• Knowledge: collection of multimodal content, skills, experiences and problem-solving methods, providing the background for complex information processing.
• Knowledge Representation: operational as well as formal, and therefore computer-understandable, description of knowledge.
• Knowledge Representation Language: formal language for the systematic representation of knowledge.
• Representation Construct: subset of a knowledge representation language.
• Knowledge Base: knowledge that can be used by an AI system (figure: knowledge base, knowledge sources, knowledge units).
• Meta Knowledge: knowledge about the knowledge in the knowledge base.
• Heterogeneous Knowledge: knowledge base using different knowledge representation languages to encode knowledge units.
• Multiple Representation: representation of the same knowledge using different knowledge representation languages in the same knowledge base.
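Three Layers of Mark-Up Languages in the Semantic Web
A WWW document is described on three layers:
• Content: OWL
• Structure: XML
• Form: HTML
Content : Structure : Form = 1 : n : m (the same content can be given n different structures and m different forms)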
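The 1 : n : m ratio can be illustrated with a single fact that is serialized in two XML structures and rendered in two HTML forms. This is a hypothetical sketch: a plain Python dict stands in for the OWL content layer, and the element names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# One piece of content (the "1"); a dict stands in for an OWL statement.
content = {"subject": "Italy", "predicate": "wonWorldCup", "object": "2006"}

def structure_as_attributes(fact: dict) -> str:
    """Structure 1: the fact as XML attributes."""
    return ET.tostring(ET.Element("fact", fact), encoding="unicode")

def structure_as_elements(fact: dict) -> str:
    """Structure 2: the same fact as nested XML elements."""
    root = ET.Element("fact")
    for key, value in fact.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def form_as_sentence(fact: dict) -> str:
    """Form 1: an HTML paragraph (template written for this one predicate)."""
    return f"<p>{escape(fact['subject'])} won the World Cup in {escape(fact['object'])}.</p>"

def form_as_table(fact: dict) -> str:
    """Form 2: an HTML table row listing subject, predicate and object."""
    cells = "".join(f"<td>{escape(v)}</td>" for v in fact.values())
    return f"<table><tr>{cells}</tr></table>"

structures = [structure_as_attributes(content), structure_as_elements(content)]  # n = 2
forms = [form_as_sentence(content), form_as_table(content)]                      # m = 2
for text in structures + forms:
    print(text)
```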
First Book on Semantic AI Technologies
March 2003, ISBN 0-262-06232-1, 8 x 9, 392 pp., 98 illus., $40.00/£26.95. Edited by Dieter Fensel, James A. Hendler, Henry Lieberman and Wolfgang Wahlster; foreword by Tim Berners-Lee
Knowledge Representation for Dialogue Systems: Wahlster, W. (ed.): SmartKom: Foundations of Multimodal Dialogue Systems. Cognitive Technologies Series, Heidelberg, Germany: Springer, 2006, 644 pp.
Most Recent Book about Semantic Product Memories in the Springer Series "Cognitive Technologies": SemProM: Foundations of Semantic Product Memories for the Internet of Things. Series: Cognitive Technologies. Wahlster, Wolfgang (Ed.), 412 pages
ISBN 978-3-642-37376-3. Electronic Order: http://www.springer.com/computer/ai/book/978-3-642-37376-3
Most Recent Book about Semantic Technologies: Towards the Internet of Services: The THESEUS Program. Series: Cognitive Technologies. Wahlster, Wolfgang (Ed.), 488 pages
ISBN 978-3-319-06754-4. Electronic Order: http://www.springer.com/computer/ai/book/978-3-319-06754-4
Semantic Information Access (SmartWeb 2008)
Example questions to the Q&A system: Who was world champion in 1990? Show me the mascot of the championship! When was England world champion? When was Germany champion for the last time?
Architecture components (figure): Interactive Semantic Access Services, Q&A System, Semantic Mediator, Agent-Based Web Access, Web Service Access, Knowledge Server with Ontology & Facts, Semantic Modelling of Web Resources (Web Pages, Web Services, Web Apps.)
• Semantic Crawler • Generation of semantic Web pages • Automatic acquisition of ontologies
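Questions like these can be answered from ontology facts once they are mapped to a query pattern. The sketch below is a hypothetical simplification, not SmartWeb's implementation: a dict of FIFA World Cup winners up to 2014 stands in for the knowledge server, and each question type is one hand-written function. West Germany's titles are counted as Germany, as in FIFA's title tallies.

```python
from typing import Optional

# Stand-in for ontology facts: FIFA World Cup winners, 1930 to 2014.
CHAMPION = {
    1930: "Uruguay", 1934: "Italy", 1938: "Italy", 1950: "Uruguay",
    1954: "Germany", 1958: "Brazil", 1962: "Brazil", 1966: "England",
    1970: "Brazil", 1974: "Germany", 1978: "Argentina", 1982: "Italy",
    1986: "Argentina", 1990: "Germany", 1994: "Brazil", 1998: "France",
    2002: "Brazil", 2006: "Italy", 2010: "Spain", 2014: "Germany",
}

def who_was_champion(year: int) -> Optional[str]:
    """'Who was world champion in <year>?'"""
    return CHAMPION.get(year)

def when_was_champion(team: str) -> list[int]:
    """'When was <team> world champion?'"""
    return sorted(y for y, t in CHAMPION.items() if t == team)

def last_title(team: str) -> Optional[int]:
    """'When was <team> champion for the last time?'"""
    years = when_was_champion(team)
    return years[-1] if years else None

print(who_was_champion(1990))        # Germany
print(when_was_champion("England"))  # [1966]
print(last_title("Germany"))         # 2014
```

A real system also has to parse the spoken or typed question and resolve references such as "the championship"; here that step is replaced by choosing the function by hand.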
SmartWeb: Getting Answers on the Go
Who won the World Football Championship in 2006?
Italy
Personal guide for the FIFA world cup
SmartWeb Test Example for an open-domain question: Who performed last year at the Salzburg Opera Festival in La Traviata?
Jeopardy Quiz Show (IBM Watson 14-16 Feb 2011)
This town is known as "Sin City" and its downtown is "Glitter Gulch"
Las Vegas
With much gravity, this young fellow of Trinity became the Lucasian Professor of Mathematics
Isaac Newton
This US city has two airports named for a World War II hero and a World War II battle
Chicago
adapted from Gerhard Weikum
Jeopardy Quiz Show (IBM Watson 14-16 Feb 2011): Question Analysis and Knowledge Retrieval
Clue: This town is known as "Sin City" and its downtown is "Glitter Gulch". Answer: Las Vegas
• Question analysis: "Sin City" → movie, graphic novel, nickname for city, …
• Knowledge retrieval: "Vegas" → Vega (star), Suzanne Vega, Vincent Vega, Las Vegas, …; "Strip" → comic strip, striptease, Las Vegas Strip, …
adapted from Gerhard Weikum
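The ambiguity lists above suggest a simple disambiguation step: score each candidate meaning against the words of the clue. The sketch below uses the candidate lists from the slide, but the context words for each candidate are hypothetical, and plain word overlap is a swapped-in technique, not Watson's actual scoring.

```python
# Candidate meanings for ambiguous surface forms (from the slide).
CANDIDATES = {
    "Sin City": ["movie", "graphic novel", "nickname for city"],
    "Vegas": ["Vega (star)", "Suzanne Vega", "Vincent Vega", "Las Vegas"],
    "Strip": ["comic strip", "striptease", "Las Vegas Strip"],
}

# Hypothetical context words describing each candidate.
DESCRIPTION = {
    "movie": {"film", "actor"},
    "graphic novel": {"comic", "book"},
    "nickname for city": {"town", "city", "downtown"},
    "Vega (star)": {"star", "constellation"},
    "Suzanne Vega": {"singer", "song"},
    "Vincent Vega": {"film", "character"},
    "Las Vegas": {"town", "city", "downtown", "casino"},
    "comic strip": {"comic", "newspaper"},
    "striptease": {"dance"},
    "Las Vegas Strip": {"town", "city", "casino", "street"},
}

def disambiguate(mention: str, clue: str) -> str:
    """Pick the candidate whose description shares the most words with the clue."""
    clue_words = set(clue.lower().replace('"', " ").split())
    return max(CANDIDATES[mention],
               key=lambda c: len(DESCRIPTION[c] & clue_words))

clue = 'This town is known as "Sin City" and its downtown is "Glitter Gulch"'
for mention in CANDIDATES:
    print(mention, "->", disambiguate(mention, clue))
```

For this clue the words "town", "city" and "downtown" push all three mentions toward the Las Vegas readings.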
SmartWeb-Car: Mobile Web Access in a Mercedes A-Class or R-Class
Who has scored the most goals at the FIFA World Cup? Where do I get the lowest-priced diesel?
Where are speed traps today?
DFKI Cooperation with Siemens, DaimlerChrysler and Fraunhofer.
SIAM: Multimodal Question Answering for Car Drivers
Representation of Data
• Computer Science (core): Theory of Data Structures
• AI Research: Theory of Knowledge Representation
Knowledge Representation Languages
• Logic: Predicate Logic, Sortal Logic, Modal Logic, Temporal Logic, Probabilistic and Fuzzy Logic, Non-Monotonic Logic, Description Logic/Terminological Logic
• Production Systems
• Semantic Networks
• Frames
• Inference Networks
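A production system stores knowledge as if-then rules and applies them to a working memory of facts. The sketch below is a minimal forward-chaining loop under simplifying assumptions: the rules and facts are hypothetical, facts are ground strings without pattern variables, and rules fire in list order instead of using a real conflict-resolution strategy.

```python
# Hypothetical rules: (name, set of condition facts, fact to add when all hold).
RULES = [
    ("R1", {"human(ben)"}, "mortal(ben)"),
    ("R2", {"man(ben)", "parent(ben)"}, "father(ben)"),
    ("R3", {"father(ben)"}, "human(ben)"),
]

def forward_chain(facts: set[str], rules) -> tuple[set[str], list[str]]:
    """Fire matching rules until no rule adds a new fact (a fixpoint)."""
    working_memory = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            # Match: all conditions in memory; act: add the conclusion once.
            if conditions <= working_memory and conclusion not in working_memory:
                working_memory.add(conclusion)
                trace.append(name)
                changed = True
    return working_memory, trace

memory, fired = forward_chain({"man(ben)", "parent(ben)"}, RULES)
print(fired)           # ['R2', 'R3', 'R1']
print(sorted(memory))
```

R1 cannot fire on the first pass because human(ben) is only derived by R3, so a second pass is needed; the loop stops when a full pass adds nothing.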
Four Description Layers for Knowledge Representation Languages
1) Implementation Layer: objects, pointers
2) Logical Layer: predicates, quantifiers
3) Epistemological Layer: inheritance relations, structuring primitives
4) Ontological Layer: primitive concepts, primitive relations
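Inheritance relations on the epistemological layer can be sketched with frames: each frame has local slots and an is_a link to a more general frame, and slot lookup climbs that link until a value is found. The frames below are hypothetical and reuse the WordNet chain guitarist → musician → artist; the dicts and links themselves correspond to the implementation layer (objects, pointers).

```python
from typing import Optional

# Hypothetical frames: an optional is_a parent plus local slot values.
FRAMES = {
    "artist":    {"is_a": None,       "slots": {"creates": "art"}},
    "musician":  {"is_a": "artist",   "slots": {"creates": "music"}},
    "guitarist": {"is_a": "musician", "slots": {"instrument": "guitar"}},
}

def get_slot(frame: str, slot: str) -> Optional[str]:
    """Look up a slot, inheriting along is_a links; the most specific value wins."""
    current: Optional[str] = frame
    while current is not None:
        slots = FRAMES[current]["slots"]
        if slot in slots:
            return slots[slot]
        current = FRAMES[current]["is_a"]
    return None

print(get_slot("guitarist", "instrument"))  # guitar (local slot)
print(get_slot("guitarist", "creates"))     # music (inherited from musician)
```

Because the search stops at the first frame that defines the slot, musician's value overrides the more general default in artist.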
The Terminological Box (TBox) and the Assertional Box (ABox) in Knowledge Representation
TBox: Father = (and Man Parent)
ABox: Father(a)
The TBox and ABox in Description Logics
• A TBox (Terminological Box) is a set of schema axioms (sentences) defining the vocabulary used to describe situations in a domain
• An ABox (Assertional Box) is a set of data axioms (ground facts) describing a specific state of a domain, e.g. Father(a)
• A Knowledge Base (KB) is just a TBox plus one or more ABoxes
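The interplay of TBox and ABox can be sketched for the fragment used on the slide, where a defined concept is a conjunction of atomic concepts, as in Father = (and Man Parent). This is a naive closure computation, not a real description logic reasoner: there are no roles, and a missing fact only means "not entailed" under the open-world assumption. The Mother definition and the individual b are hypothetical additions.

```python
# TBox: defined concepts as conjunctions of atomic concepts.
# Father is from the slide; Mother is a hypothetical extra definition.
TBOX = {
    "Father": {"Man", "Parent"},
    "Mother": {"Woman", "Parent"},
}

# ABox: concept assertions (concept, individual) as ground facts.
ABOX = {("Man", "a"), ("Parent", "a"), ("Woman", "b")}

def expand(abox):
    """Add every assertion the TBox definitions imply, until nothing changes."""
    facts = set(abox)
    changed = True
    while changed:
        changed = False
        individuals = {ind for _, ind in facts}
        for concept, parts in TBOX.items():
            for ind in individuals:
                # Realization: Man(a) and Parent(a) entail Father(a).
                if all((p, ind) in facts for p in parts) and (concept, ind) not in facts:
                    facts.add((concept, ind))
                    changed = True
                # Unfolding: Father(a) entails Man(a) and Parent(a).
                if (concept, ind) in facts:
                    for p in parts:
                        if (p, ind) not in facts:
                            facts.add((p, ind))
                            changed = True
    return facts

kb = expand(ABOX)
print(("Father", "a") in kb)  # True
print(("Mother", "b") in kb)  # False: Parent(b) is not entailed
```

The KB here is the pair (TBOX, ABOX); `expand` only makes its consequences explicit.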