Artificial Intelligence Core Lecture, Summer Term 2016 27, 28 June, 04, 05, 11, 13 July

Prof. Wolfgang Wahlster
Saarland University, Chair for AI & CEO of DFKI
FR 6.2 Informatik, Campus E1 1, 66041 Saarbrücken
Tel.: (0681) 302-2363 (Univ) or 85775 5252 (DFKI)
E-mail: [email protected]
WWW: http://www.dfki.de/~wahlster

The Four Phases of AI Research: 60 Years of AI in 2016

(Vertical axis: Intelligence Level)

Phase 4 (since 2010): Hybrid Architectures: Knowledge Bases combined with Machine Learning (in Embodied Systems)

Phase 3 (1990 - 2010): Machine Learning based on Mass Data and Probabilistic Reasoning

Phase 2 (1970 - 1990): Hand-crafted Knowledge Bases and Rule-based Reasoning

Phase 1 (1950 - 1970): General Problem Solving Methods

Core Areas and Applications of AI

Natural Language Understanding Systems

Image Understanding Systems

Expert Systems

Robotics

Multiagent Systems

Subsymbolic Processing

Signal2Symbol Transformation

Knowledge Representation

Knowledge Processing: Search, Inference, Learning

Knowledge Presentation

AI Programming Languages

Intelligent Tutoring Systems

Intelligent Help Systems

AI Hardware

AI Tools

AI Programming Methods

Intelligent User Interfaces

DFKI Operates Six Large-Scale Intelligent Environments as Living Labs

Big Data: Data as Tradeable Assets

Sensor Data for Weather, Climate, Smart City and Smart Home

3D-Internet-Data and Media Streams

Production and Machine Data from Industry 4.0

Mass Data from Individualized Medicine, from Genome Analysis and Imaging Methods

Mass Data from Smart Grid and Smart Metering

Financial Messaging Data: Supervision of Banks, Stock Exchanges and High-Speed Trading

Life Log Data for Individuals and Objects: Digital Product Memories

Mass Data from Mobility by Car2Car-Communication and Logs from Single Vehicles

Mass Data from Social Networks

95% of the 1.2 zettabytes of digital data worldwide are unstructured, and the data volume grows by 62% per year.

Development of Global Data Volumes by 2020 (in Zettabyte)

Source: AT Kearney 2013

Mainly machine-generated data, but also encoded in natural language for human inspection. Multilingual natural language generation is becoming more and more important.
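The 62% annual growth figure compounds quickly. As a rough sketch only: the slides do not state the base year of the 1.2 zettabyte figure, and the code below assumes (hypothetically) that the rate stays constant from year to year. The function names are illustrative.

```python
import math


def projected_volume(base_zb: float, annual_growth: float, years: int) -> float:
    """Compound growth: base * (1 + rate) ** years, in zettabytes."""
    return base_zb * (1.0 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years needed for the volume to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Assumption: 1.2 ZB is the volume in an unspecified base year (year 0).
    for years in range(6):
        print(f"after {years} years: {projected_volume(1.2, 0.62, years):.2f} ZB")
    print(f"doubling time: {doubling_time(0.62):.2f} years")
```

Under this assumption the volume doubles in well under two years, which is why the chart is described as exponential growth.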

Human-Generated Internet Content: Zettabyte of Unstructured BIG Data

Exponential Growth of Internet Data, Every Minute:

• 100,000 new Tweets

• More than 2 million queries

• 48 hours of video uploaded

• 3125 new photos uploaded

• 47,000 apps downloaded

• 284,166,667 new emails sent

• 68.4478 messages

• 571 new homepages

Commercial Spoken Language Access: Siri (Apple), Google Now (Google), S Voice (Samsung), Cortana (Microsoft)

Still Missing: Crosslingual Information Retrieval

From DB to BD: New BIG DATA Services

New Services based on Cloud Nets: Decision Support, Prediction, Simulation, Knowledge Discovery, Information Trading, Fusion, Optimization, Modeling

Lower degree of Structure unstructured structured

Machine Learning

Multimodal Interaction

Information Extraction from Text and Video

10^18 = EXA - ZETTA = 10^21

BIG DATA

10^12 = TERA - PETA = 10^15

Databases, Data Warehouses, WWW

Complexity

Language Technology as a Key Enabler for BIG DATA Analytics

New Data Material: Low Information Density, Less used for ICT

Classical Data Material: High Information Density, Much used for ICT

From Data to Meta Knowledge

Interpreted data

Data

IT support

Use of information

Information

Data Mining

Availability of knowledge

Knowledge

Information extraction

MetaKnowledge

Knowledge Representation Knowledge Management

History of Digital Knowledge Bases

Cyc

∀x: human(x) ⇒ (∃y: mother(x,y) ∧ ∃z: father(x,z))

∀x,u,w: (mother(x,u) ∧ mother(x,w) ⇒ u=w)

WordNet

guitarist ⊂ {player, musician} ⊂ artist

algebraist ⊂ mathematician ⊂ scientist

from humans for humans or machines (Cyc)

1985

1990

from algorithms for machines

Wikipedia

4.5 million English articles, 20 million contributors

Google Knowledge Vault

2000

2005

2010

2015

adapted from Gerhard Weikum
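The two early styles of knowledge base on this slide can be sketched in a few lines: a WordNet-style hyponym hierarchy answers "is-a" questions by following hypernym links, and a Cyc-style axiom acts as a constraint on facts. This is only an illustration. It assumes the chain on the slide means that a guitarist is both a player and a musician, each of which is an artist; the function names and the sample individuals are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Direct hypernym links, read off the WordNet-style chains on the slide
# (assumed reading: guitarist is-a player, guitarist is-a musician, ...).
HYPERNYMS: Dict[str, Set[str]] = {
    "guitarist": {"player", "musician"},
    "player": {"artist"},
    "musician": {"artist"},
    "algebraist": {"mathematician"},
    "mathematician": {"scientist"},
}


def ancestors(term: str) -> Set[str]:
    """All concepts reachable from `term` by following hypernym links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in HYPERNYMS.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(term: str, concept: str) -> bool:
    """True if `term` equals `concept` or is a (transitive) hyponym of it."""
    return term == concept or concept in ancestors(term)


def violates_unique_mother(mother_facts: List[Tuple[str, str]]) -> Set[str]:
    """Individuals x with mother(x,u) and mother(x,w) where u != w.

    This checks the second Cyc-style axiom: a person has at most one mother.
    """
    first_mother: Dict[str, str] = {}
    violations: Set[str] = set()
    for child, mother in mother_facts:
        if child in first_mother and first_mother[child] != mother:
            violations.add(child)
        first_mother.setdefault(child, mother)
    return violations


if __name__ == "__main__":
    print(is_a("guitarist", "artist"))
    print(is_a("algebraist", "artist"))
    # Hypothetical individuals, used only to exercise the constraint check.
    print(violates_unique_mother([("ann", "eve"), ("ann", "mia"), ("bob", "eve")]))
```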

The 5 Steps of Google's Semantic Search

Microsoft Academic Graph

Source: K. Wang, Microsoft Research

The Semantic Web versus the Knowledge Web

Semantic Web

• Human readable vs machine readable contents

• Human defines standard for data formats and models

• Explicit and precise specification of knowledge representation that everyone has to agree upon

Knowledge Web

• Machine reads human readable contents

• Machine learns to conflate different formats of the same thing

• Latent and fuzzy representation of knowledge learned by mining big data

Source: K. Wang, Microsoft Research

Knowledge Bases: Comparison

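Name                 | # of Entity Types | # Entities | # Predicates | # Confident Triples
---------------------|-------------------|------------|--------------|--------------------
Knowledge Vault (KV) | 1100              | 45M        | 4469         | 271M
DeepDive             | 4                 | 2.7M       | 34           | 7M
NELL                 | 271               | 5.1M       | 306          | 0.435M
Yago3                | 350,000           | 10M        | 100          | 120M
Freebase             | 1500              | 40M        | 35,000       | 637M
Knowledge Graph      | 1500              | 570M       | 35,000       | 18,000M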
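The "# Confident Triples" column presupposes that every extracted triple carries a confidence score and that only triples above some cutoff are counted. The following is a minimal sketch of such a store, not a description of how any of the listed systems is implemented. The `TripleStore` class and the cutoff of 0.9 are assumptions. The example facts are taken from elsewhere in this lecture, and pairing them with the confidence values .99, .96 and .76 is purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    confidence: float  # estimated probability that the triple is true


class TripleStore:
    """Hypothetical in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str, confidence: float) -> None:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        self._triples.append(Triple(subject, predicate, obj, confidence))

    def query(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
        min_confidence: float = 0.0,
    ) -> List[Triple]:
        """Return triples matching every given field and the confidence cutoff."""
        return [
            t
            for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
            and t.confidence >= min_confidence
        ]


def build_example_store() -> TripleStore:
    # Illustrative facts and confidences; predicate names are made up.
    store = TripleStore()
    store.add("Las Vegas", "nickname", "Sin City", 0.99)
    store.add("Isaac Newton", "heldPosition", "Lucasian Professor of Mathematics", 0.96)
    store.add("Italy", "wonWorldCupIn", "2006", 0.76)
    return store


if __name__ == "__main__":
    store = build_example_store()
    # Assumed cutoff of 0.9 for a triple to count as "confident".
    for triple in store.query(min_confidence=0.9):
        print(triple)
```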

Google Knowledge Vault: The Largest Triple Store

• Data from Web: Unstructured Text, Semi-Structured DOM Trees, Structured Web Tables

• Prior Data From All Social Websites

Confidence values in the figure: .99, .96, .76

Google Knowledge Vault

IBM Watson searches large Knowledge Bases to Recommend a Chemotherapy to the Doctor

Fundamental Terms of Knowledge Representation

Knowledge: Collection of multimodal content, skills, experiences and problem solving methods, providing the background for complex information processing.

Knowledge Representation: Operational as well as formal, and therefore computer-understandable, description of knowledge.

Knowledge Representation Language: Formal language for the systematic representation of knowledge.

Representation Construction: Subset of a knowledge representation language.

Knowledge Base: Knowledge that can be used by an AI system.

Knowledge Base, Knowledge Sources, Knowledge Units

Meta Knowledge: Knowledge about the knowledge in the knowledge base.

Heterogeneous Knowledge: Knowledge base using different knowledge representation languages to encode knowledge units.

Multiple Representation: Representation of the same knowledge using different knowledge representation languages in the same knowledge base.

Three Layers of Mark-Up Languages in the Semantic Web

Content: OWL

Structure: XML

Form: HTML

WWW Document

Content : Structure : Form = 1 : n : m

First Book on Semantic AI Technologies

March 2003 ISBN 0-262-06232-1 8 x 9, 392 pp., 98 illus. $40.00/£26.95 Edited by Dieter Fensel, James A. Hendler, Henry Lieberman and Wolfgang Wahlster Foreword by Tim Berners-Lee

Knowledge Representation for Dialogue Systems

Wahlster, W. (ed.): SmartKom: Foundations of Multimodal Dialogue Systems. Cognitive Technologies Series, Heidelberg, Germany: Springer, 2006, 644 pp.

Most Recent Book about Semantic Product Memories in the Springer Series "Cognitive Technologies"

SemProM: Foundations of Semantic Product Memories for the Internet of Things. Series: Cognitive Technologies. Wahlster, Wolfgang (Ed.), 412 Pages

ISBN 978-3-642-37376-3 Electronic Order: http://www.springer.com/computer/ai/book/978-3-642-37376-3

Most Recent Book about Semantic Technologies

Towards the Internet of Services: The THESEUS Program. Series: Cognitive Technologies. Wahlster, Wolfgang (Ed.), 488 Pages

ISBN 978-3-319-06754-4 Electronic Order: http://www.springer.com/computer/ai/book/978-3-319-06754-4

Semantic Information Access (SmartWeb 2008)

Who was world champion in 1990?

Interactive Semantic Access Services Q&A System

Show me the mascot of the championship! When was England world champion? When was Germany champion for the last time?

Web Service Access

Web Pages

Web Resources

Web Services Semantic Modelling

Semantic Mediator

Agent Based Web Access

Knowledge Server

Web Apps.

Ontology & Facts

• Semantic Crawler • Generation of semantic Web pages • Automatic acquisition of ontologies

SmartWeb: Getting Answers on the Go

Who won the World Football Championship in 2006?

Italy

ITALY

Personal guide for the FIFA world cup

SmartWeb Test Example for an open-domain question: Who performed last year at the Salzburg Opera Festival in La Traviata ?

Jeopardy Quiz Show (IBM Watson 14-16 Feb 2011)

This town is known as "Sin City" and its downtown is "Glitter Gulch"

Las Vegas

With much gravity, this young fellow of Trinity became the Lucasian Professor of Mathematics

Isaac Newton

This US city has two airports named for a World War II hero and a World War II battle

Chicago

adapted from Gerhard Weikum

Jeopardy Quiz Show (IBM Watson 14-16 Feb 2011)

question analysis

This town is known as "Sin City" and its downtown is "Glitter Gulch"

knowledge retrieval

Las Vegas

With much gravity, this young fellow of Trinity became the Lucasian Professor of Mathematics

Isaac Newton

This US city has two airports named for a World War II hero and a World War II battle

Chicago

Q: Sin City ? → movie, graphic novel, nickname for city, …

A: Vegas ? → Vega (star), Suzanne Vega, Vincent Vega, Las Vegas, …

A: Strip ? → comic strip, striptease, Las Vegas Strip, …

adapted from Gerhard Weikum
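The ambiguity shown on this slide (each phrase has several possible readings) can be illustrated with a simple voting scheme: collect the candidate readings for every phrase and prefer the entity that is supported by the most phrases. This is a toy sketch and not IBM Watson's actual pipeline. The candidate lists follow the slide; the mapping of "nickname for city" to Las Vegas and of "Las Vegas Strip" to Las Vegas is an assumption, and all names in the code are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

# Candidate readings per phrase, following the slide.
CANDIDATES: Dict[str, List[str]] = {
    "Sin City": ["Sin City (movie)", "Sin City (graphic novel)", "Las Vegas"],
    "Vegas": ["Vega (star)", "Suzanne Vega", "Vincent Vega", "Las Vegas"],
    "Strip": ["comic strip", "striptease", "Las Vegas Strip"],
}

# Assumed link from a reading to the entity it is evidence for.
RELATED_ENTITY: Dict[str, str] = {"Las Vegas Strip": "Las Vegas"}


def best_entity(phrases: List[str]) -> Optional[str]:
    """Return the entity supported by the largest number of phrases."""
    votes: Counter = Counter()
    for phrase in phrases:
        # Each phrase votes at most once per entity.
        entities = {RELATED_ENTITY.get(c, c) for c in CANDIDATES.get(phrase, [])}
        votes.update(entities)
    if not votes:
        return None
    entity, _count = votes.most_common(1)[0]
    return entity


if __name__ == "__main__":
    print(best_entity(["Sin City", "Vegas", "Strip"]))
```

A real question answering system would weight such evidence with learned scores rather than count votes; the sketch only shows why combining several ambiguous phrases narrows the answer.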

SmartWeb-Car: Mobile Web Access in a Mercedes A-Class or R-Class

Who has scored the most goals at the FIFA world cup?

Where do I get the lowest-priced diesel?

Where are speed traps today?

DFKI Cooperation with Siemens, DaimlerChrysler and Fraunhofer.

SIAM: Multimodal Question Answering for Car Drivers

Computer Science Representation of Data AI Research

Computer Science (core)

Theory of Knowledge Representation

Theory of Data Structures

Knowledge Representation Languages

Logic

• Predicate Logic

• Sortal Logic

• Modal Logic

• Temporal Logic

• Probabilistic and Fuzzy Logic

• Non-Monotonic Logic

• Description Logic/Terminological Logic

Production Systems

Semantic Networks

Frames

Inference Networks

Four Description Layers for Knowledge Representation Languages

1) Implementation Layer

• objects

• pointers

2) Logical Layer

• predicates

• quantifiers

3) Epistemological Layer

• inheritance relations

• structuring primitives

4) Ontological Layer

• primitive concepts

• primitive relations

The Terminological Box (TBox) and the Assertional Box (ABox) in Knowledge Representation

TBox Father = (and Man Parent)

ABox Father(a)

The TBox and ABox in Description Logics

• A TBox (Terminological Box) is a set of schema axioms (sentences), defining the vocabulary to describe situations in a domain

• An ABox (Assertional Box) is a set of data axioms (ground facts), describing a specific state of a domain

• A Knowledge Base (KB) is just a TBox plus one or more ABoxes
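The interplay of TBox and ABox can be sketched with the example from the previous slide, where the TBox defines Father = (and Man Parent) and the ABox asserts Father(a). The code below is a minimal illustration, not a description logic reasoner. It assumes that every TBox entry is a definition by conjunction of atomic concepts, and the function name `saturate` is hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Fact = Tuple[str, str]  # (concept, individual), e.g. ("Father", "a")

# TBox: schema axioms. Father = (and Man Parent)
TBOX: Dict[str, FrozenSet[str]] = {"Father": frozenset({"Man", "Parent"})}

# ABox: ground facts about a specific state of the domain. Father(a)
ABOX: Set[Fact] = {("Father", "a")}


def saturate(tbox: Dict[str, FrozenSet[str]], abox: Set[Fact]) -> Set[Fact]:
    """Apply the TBox definitions in both directions until nothing changes."""
    facts: Set[Fact] = set(abox)
    changed = True
    while changed:
        changed = False
        individuals = {individual for _, individual in facts}
        for concept, conjuncts in tbox.items():
            for individual in individuals:
                if (concept, individual) in facts:
                    # C(x) entails every conjunct of the definition of C.
                    for part in conjuncts:
                        if (part, individual) not in facts:
                            facts.add((part, individual))
                            changed = True
                elif all((part, individual) in facts for part in conjuncts):
                    # All conjuncts hold for x, so the defined concept holds too.
                    facts.add((concept, individual))
                    changed = True
    return facts


if __name__ == "__main__":
    for fact in sorted(saturate(TBOX, ABOX)):
        print(fact)
```

Keeping the TBox separate from the ABox is what allows one TBox to be combined with several ABoxes: the same call works unchanged for any other set of ground facts.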