INF3580 – Semantic Technologies – Spring 2012 Lecture 2: Resource Description Framework (RDF)

Martin G. Skjæveland 24th January 2012

Department of Informatics

University of Oslo

Mandatory Exercise 1

Mandatory Exercise 1 is published on the course web site immediately after the lecture. Topic: RDF.
Hand in by next Thursday, using Devilry: https://devilry.ifi.uio.no/.
Use Mr. Oblig, http://sws.ifi.uio.no/mroblig, to test your delivery.
The next mandatory exercise is published next Tuesday.

INF3580 :: Spring 2012

Lecture 2 :: 24th January


Today’s Plan

1. Introduction
2. RDF data model
3. RDF vocabularies
4. RDF serialisations
5. RDF on the web
6. Subtleties
7. Summary


Introduction

Outline

1. Introduction
2. RDF data model
   Technicalities
   Features
3. RDF vocabularies
4. RDF serialisations
   Turtle
5. RDF on the web
6. Subtleties
7. Summary


RDF: W3C Overview

RDF is a data model.
RDF is a standard model for data interchange on the Web.
It has features that facilitate data merging even if the underlying schemas differ.
It extends the linking structure of the Web by using URIs to name the relationship between things as well as the two ends of the link; such a link is called a “triple”.
This allows data to be mixed, exposed, and shared across different applications.
This linking structure forms a directed, labeled graph.
This graph view is the easiest mental model for RDF and is often used in visual explanations.
RDF has many serialisations.

Adapted from http://w3c.org/RDF.


Semantic Web Stack

RDF is the central block in the SW stack, and the first “semantic” block in the stack.

In the course we will explore: RDF, SPARQL, RDFS/OWL, Applications.

[Figure: the Semantic Web stack. Layers, bottom to top: Identifiers: URI and Chr. set: UNICODE; Syntax: XML; Data interchange: RDF; Querying: SPARQL, Vocabularies: RDFS, Ontologies: OWL, Rules: SWRL; Unifying logic; Proof; Trust; User interface and applications. Cryptography runs alongside the layers.]


RDF data model

Technicalities

RDF Triples

RDF is a data model.
All information in RDF is expressed using a triple pattern.
A triple consists of a subject, a predicate, and an object.

Examples:

subject        predicate      object
Norway         has capital    Oslo
Oslo           has mayor      Fabian Stang
Fabian Stang   born year      1955

Another word for an RDF triple is a statement or fact.
The elements of an RDF triple are either URI resources, blank nodes, or literals.


Uniform Resource Identifiers (URIs)

RDF (Resource Description Framework) talks about resources. Almost anything is a resource.

Resources are identified by URIs (Uniform Resource Identifiers). E.g., in dbpedia.org:

Norway:        http://dbpedia.org/resource/Norway
has capital:   http://dbpedia.org/ontology/capital
Oslo:          http://dbpedia.org/resource/Oslo
has mayor:     http://dbpedia.org/ontology/leaderName
Fabian Stang:  http://dbpedia.org/resource/Fabian_Stang

As identifiers, think of them as just strings (in a special format). They are not necessarily dereferenceable.


URIs and QNames

RDF is a data model, but has different serialisations. We use Turtle.
URIs are often long and hard to read and write.
Most serialisations use an abbreviation mechanism: define “prefixes” for “namespaces”.

E.g., in Turtle serialisation:

  @prefix dbp:     <http://dbpedia.org/resource/> .
  @prefix dbp-ont: <http://dbpedia.org/ontology/> .

A QName like dbp:Oslo stands for http://dbpedia.org/resource/Oslo.
Remember: It’s all just URIs!
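The abbreviation mechanism can be sketched in a few lines of Python. This is only an illustration, not a Turtle parser: the names `PREFIXES` and `expand` are made up for this example, and the namespace URIs are the DBpedia ones used in this lecture.

```python
# Prefix table playing the role of the @prefix declarations (illustrative only).
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "dbp-ont": "http://dbpedia.org/ontology/",
}


def expand(qname: str) -> str:
    """Replace the prefix of a QName with its namespace URI."""
    prefix, _, local = qname.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return PREFIXES[prefix] + local


print(expand("dbp:Oslo"))         # http://dbpedia.org/resource/Oslo
print(expand("dbp-ont:capital"))  # http://dbpedia.org/ontology/capital
```

The expansion is plain string concatenation, which is why a QName and its full URI denote exactly the same resource.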


RDF Graphs

An RDF graph is a set of triples. E.g.,

  dbp:Norway  dbp-ont:capital     dbp:Oslo .
  dbp:Oslo    dbp-ont:leaderName  dbp:Fabian_Stang .

is an RDF graph containing two triples.

RDF graphs are often represented as directed labeled graphs:

  dbp:Norway --dbp-ont:capital--> dbp:Oslo --dbp-ont:leaderName--> dbp:Fabian_Stang


Literals

Literals are used to represent data values. Literals can be

Plain, without language tag:
  dbp:Oslo dbp-ont:officialName "Oslo" .
Plain, with language tag:
  dbp:Norway rdfs:label "Norge"@no .
  dbp:Norway rdfs:label "Norwegen"@de .
Typed, with a URI indicating the type:
  dbp:Oslo dbp-ont:population "611491"^^xsd:integer .

But not both, i.e., typed and with a language tag.

Usually represented with rectangles:

                                   +-----------------------+
  dbp:Oslo --dbp-ont:population--> | "611491"^^xsd:integer |
                                   +-----------------------+


Blank Nodes

Blank nodes are like resources without a URI.
Use when resource is unknown, or has no (natural) identifier.

Norway’s capital has population 611491 (a blank node is drawn as ( )):

  dbp:Norway --:capital--> ( ) --:population--> "611491"

The address of UiO is Problemveien 7, 0313 Oslo:

  :UiO --:address--> ( ) --:street-->   "Problemveien 7"
                      |---:place-->     "Oslo"
                      |---:postcode-->  "0313"


RDF Triple Grammar

Literals and blank nodes may not appear everywhere in triples:

                                                      s   p   o
• URI resources may occur in all positions            ✓   ✓   ✓
• Literals may only occur in object position          ✗   ✗   ✓
• Blank nodes may not occur in predicate position     ✓   ✗   ✓

Why?
Literals are just values, no relationships from literals allowed.
Blank nodes in predicate position deemed “too meaningless” and confusing.
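The position rules can be written down as a small check. The sketch below is a hypothetical model, not part of any RDF library: the classes `URI`, `BlankNode` and `Literal` and the function `is_valid_triple` are names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(s, p, o) -> bool:
    """Subject: URI or blank node. Predicate: URI only. Object: any kind of term."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


oslo = URI("http://dbpedia.org/resource/Oslo")
population = URI("http://dbpedia.org/ontology/population")

print(is_valid_triple(oslo, population, Literal("611491")))  # True
print(is_valid_triple(Literal("611491"), population, oslo))  # False: literal as subject
print(is_valid_triple(oslo, BlankNode("b1"), oslo))          # False: blank node as predicate
```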


Features

Why URIs?

URIs naturally have a “global” scope, unique throughout the web.
Contrast this with, e.g., keys in a relational DB, which are unique only within a table.
Helps to avoid name clashes. Example: merging two product catalogues.
  http://www.abc-company.com/category/item/123
  http://www.xyz-company.com/product/123

URLs are also addresses. Exploit the well-functioning machinery of web browsing. Find data by following data identifiers, i.e., URIs.

“A web of data.”


Why Triples?

Any information format can be transformed to triples. Examples:

  Tabular (spreadsheets, DBs):  row --column--> cell
  Trees (XML):                  parent --path--> child

Relationships are made explicit and are elements in their own right.
The predicate, i.e., the relationship, is an element in the triple, unlike DB columns and binary predicates.
Predicates can themselves be described in RDF. “Self-documenting”.

Again, “A web of data”.
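The tabular case can be sketched as follows. This is a minimal illustration under assumptions: rows are plain dictionaries, one column (here the hypothetical `id` column) identifies the row, and the function name `table_to_triples` is invented for the example.

```python
# One table row as a dictionary; the "id" column names the row (assumed layout).
rows = [
    {"id": "Oslo", "officialName": "Oslo", "population": "611491"},
]


def table_to_triples(rows, key):
    """Turn every cell into a (row, column, cell) triple."""
    triples = set()
    for row in rows:
        subject = row[key]
        for column, cell in row.items():
            if column != key:
                triples.add((subject, column, cell))
    return triples


for triple in sorted(table_to_triples(rows, "id")):
    print(triple)
```

Each column name ends up in predicate position, so the relationship becomes an element of the data rather than part of a fixed schema.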


Why Graphs?

A single, but highly versatile, format. Everything is in the same format: triples!

Since RDF graphs are just sets of triples, basic set operations are well-defined.
Merging RDF graphs? Just take their union!
With tabular data, table dimensions must match.
With trees, a node can only have one parent.
Note that graphs need not be connected.

Extending an RDF graph? Just add more triples!
No need to redefine the database table or to restructure the XML schema.
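Merging and extending can be illustrated with Python sets. This sketch assumes that triples are modelled as tuples of QName strings and that no blank nodes are involved; the variable names are chosen for the example only.

```python
# Two small graphs that share one triple.
norway = {
    ("dbp:Norway", "dbp-ont:capital", "dbp:Oslo"),
    ("dbp:Oslo", "dbp-ont:leaderName", "dbp:Fabian_Stang"),
}
oslo = {
    ("dbp:Oslo", "dbp-ont:leaderName", "dbp:Fabian_Stang"),
    ("dbp:Oslo", "dbp-ont:officialName", '"Oslo"'),
}

# Merging is set union: the shared triple appears only once.
merged = norway | oslo

# Extending is adding a triple: no schema needs to change.
merged.add(("dbp:Norway", "rdfs:label", '"Norge"@no'))

print(len(merged))  # 4
```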


RDF vocabularies

Vocabularies

Families of related notions are grouped into vocabularies.
Usually the same namespace/prefix is shared.

Some important, well-known namespaces and prefixes:

  rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  – RDF
  rdfs:     <http://www.w3.org/2000/01/rdf-schema#>        – RDF Schema
  foaf:     <http://xmlns.com/foaf/0.1/>                   – Friend of a friend
  dcterms:  <http://purl.org/dc/terms/>                    – Dublin Core

Usually, a description is published at the namespace base URI.
Note that the prefix is not standardised. However, in practice many are: binding rdf: to any other namespace would be highly irregular.


Vocabularies: Classes and Properties

A vocabulary usually defines a set of classes and properties. Resources may be divided into groups called classes. The members of a class are known as instances of the class. rdf:type relates an instance to its class.

A property is a relation between subject and object resources. Predicates are properties. Classes and properties are themselves resources, and identified by URIs.
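As an illustration of how rdf:type ties instances to classes, the sketch below looks up class members in a small graph. It assumes triples are tuples of QName strings; the helper names `instances_of` and `properties_used` are invented for this example.

```python
RDF_TYPE = "rdf:type"

graph = {
    ("dbp:Oslo", RDF_TYPE, "dbp-ont:Place"),
    ("ifi:martige", RDF_TYPE, "foaf:Person"),
    ("ifi:martige", "foaf:knows", "ifi:martingi"),
}


def instances_of(graph, cls):
    """All subjects related to the class by rdf:type."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == cls}


def properties_used(graph):
    """All resources occurring in predicate position."""
    return {p for (_, p, _) in graph}


print(instances_of(graph, "foaf:Person"))  # {'ifi:martige'}
print(sorted(properties_used(graph)))      # ['foaf:knows', 'rdf:type']
```

Note that rdf:type is itself just a property: it shows up among the predicates like any other.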


Example Vocabularies: RDF, RDFS

Some example resources:

RDF: describing RDF graphs.
  rdf:Statement
  rdf:subject, rdf:predicate, rdf:object
  rdf:type

RDFS: describing RDF vocabularies.
  rdfs:Class
  rdfs:subClassOf, rdfs:subPropertyOf
  rdfs:domain, rdfs:range
  rdfs:label

Examples:

  dbp:Oslo    rdf:type         dbp-ont:Place
  dbp:Norway  rdfs:label       "Norge"@no
  :Capital    rdfs:subClassOf  :City


Example Vocabularies: FOAF, Dublin Core

Some example resources:

FOAF: person data and relations.
  foaf:Person
  foaf:knows
  foaf:firstName, foaf:lastName, foaf:gender

Dublin Core: library metadata.
  dcterms:creator, dcterms:contributor
  dcterms:format, dcterms:language, dcterms:license

Examples:

  ifi:martige   rdf:type         foaf:Person
  ifi:martige   foaf:knows       ifi:martingi
  :rdf-lecture  dcterms:creator  ifi:martige


RDF serialisations

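RDF Serialisations

There are many serialisations for the RDF data model:

RDF/XML: the W3C standard. Complicated!

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <rdf:Description rdf:about="http://dbpedia.org/resource/Fabian_Stang">
      <foaf:name>Fabian Stang</foaf:name>
    </rdf:Description>
  </rdf:RDF>

Turtle: convenient, human readable/writable; our choice.

  @prefix dbp:  <http://dbpedia.org/resource/> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  dbp:Fabian_Stang foaf:name "Fabian Stang" .

N-Triples: one triple per line. No abbreviations.

  <http://dbpedia.org/resource/Fabian_Stang> <http://xmlns.com/foaf/0.1/name> "Fabian Stang" .

Others: N3, TriX, TriG, RDF/JSON, ...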
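To show how little structure N-Triples has, here is a hedged sketch of a formatter for a single triple. The function `to_ntriples` is a name invented for this example; it assumes full URIs as input and handles only plain literals, escaping just backslashes and double quotes.

```python
def to_ntriples(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Format one triple as an N-Triples line (URIs and plain literals only)."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        o = f'"{escaped}"'
    else:
        o = f"<{obj}>"
    return f"<{subject}> <{predicate}> {o} ."


line = to_ntriples(
    "http://dbpedia.org/resource/Fabian_Stang",
    "http://xmlns.com/foaf/0.1/name",
    "Fabian Stang",
    literal=True,
)
print(line)
```

Because every line is a complete, unabbreviated triple, N-Triples files can be processed line by line, at the cost of being long and repetitive.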


Turtle

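URI Resources and Triples

Full URIs are surrounded by < and >:

  <http://dbpedia.org/resource/Oslo>

Statements are triples terminated by a period:

  <http://dbpedia.org/resource/Oslo> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Place> .

Use ‘a’ to abbreviate rdf:type:

  <http://dbpedia.org/resource/Oslo> a <http://dbpedia.org/ontology/Place> .

Turtle allows any non-zero amount of whitespace between elements in triples.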


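Namespaces

QNames are written without any special characters.
Namespace prefixes are declared with @prefix:

  @prefix dbp: <http://dbpedia.org/resource/> .
  dbp:Oslo a <http://dbpedia.org/ontology/Place> .

A base namespace (the empty prefix) may be declared:

  @prefix dbp: <http://dbpedia.org/resource/> .
  @prefix :    <http://dbpedia.org/ontology/> .
  dbp:Oslo a :Place .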


Literals

Literal values are enclosed in double quotes:

  @prefix dbp: <http://dbpedia.org/resource/> .
  @prefix :    <http://dbpedia.org/ontology/> .
  dbp:Oslo :officialName "Oslo" .

Possibly with type or language information:

  dbp:Norway rdfs:label "Norge"@no .
  dbp:Oslo :population "611491"^^xsd:integer .

Numbers and booleans can be written without quotes:

  dbp:Oslo :population 611491 .
  dbp:Oslo :isCapital true .


Statements sharing elements

Statements may share a subject with ‘;’:

  dbp:Oslo :officialName "Oslo" ;
           :population 611491 ;
           :leaderName dbp:Fabian_Stang .

Statements may share subject and predicate with ‘,’:

  dbp:Norway rdfs:label "Norway"@en ,
                        "Norwegen"@de ,
                        "Norge"@no .

... and in combination:

  dbp:Norway rdfs:label "Norway"@en, "Norwegen"@de, "Norge"@no ;
             :capital dbp:Oslo .
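The ‘;’ and ‘,’ abbreviations are purely syntactic: the same set of triples is written with less repetition. The sketch below groups triples accordingly; it is an illustration only, the function name `to_turtle` is invented, and terms are assumed to be already formatted as Turtle strings.

```python
from collections import defaultdict


def to_turtle(triples):
    """Group triples so shared subjects use ';' and shared predicates use ','."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        by_subject[s][p].append(o)

    statements = []
    for s, predicates in by_subject.items():
        parts = [f"{p} {' , '.join(objects)}" for p, objects in predicates.items()]
        statements.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(statements)


triples = [
    ("dbp:Norway", "rdfs:label", '"Norway"@en'),
    ("dbp:Norway", "rdfs:label", '"Norge"@no'),
    ("dbp:Norway", ":capital", "dbp:Oslo"),
]
print(to_turtle(triples))
```

Three triples go in and one Turtle statement comes out; parsing that statement again would yield the same three triples.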


Blank Nodes

Blank nodes are designated with underscores or [...].

Norway has a capital with population 611491:

  dbp:Norway :capital _:someplace .
  _:someplace :population 611491 .

There is a place with official name Oslo:

  [] a :Place ;
     :officialName "Oslo" .

UiO has address Problemveien 7, 0313 Oslo:

  :UiO :address [ :street "Problemveien 7" ;
                  :place "Oslo" ;
                  :postcode "0313" ] .


Other Things

Use ‘#’ to comment:

  # This is a comment.
  dbp:Oslo a dbp-ont:Place . # This is another comment.

Use ‘\’ to escape special characters:

  :someGuy foaf:name "James \"Mr. Man\" Olson" .

Turtle specification: http://www.w3.org/TR/turtle/.


RDF on the web

Where is it?

In files:
  In some serialisation: RDF/XML, Turtle, ...
  Typically small RDF graphs, i.e., at most a few hundred triples, e.g.,
    Vocabularies: http://xmlns.com/foaf/spec/index.rdf.
    Tiny datasets: http://folk.uio.no/martingi/foaf.rdf.

From SPARQL endpoints:
  Data is kept in a triple store, i.e., a database.
  RDF is served from the endpoint as results of SPARQL queries.
  Data is exposed (in different formats) through endpoint frontends, e.g., http://dbpedia.org/resource/Norway, or by direct SPARQL query: http://dbpedia.org/sparql.

There are many RDFizers which convert data to RDF.
  W3C keeps a list: http://www.w3.org/wiki/ConverterToRdf.


Publishing RDF on the Web

Make the URIs of your data items dereferenceable.
  This is the case for all full URIs in this lecture.

Make data available in different formats, using redirects.
  Typically: HTML for humans, RDF for computers.
  Redirect the request to a page describing the data item.
  Distinguish the data item URI from the page that describes it.
  Choosing the format according to what the client asks for is called content negotiation.
  Example: http://dbpedia.org/resource/Norway (the data item) and http://dbpedia.org/page/Norway (the page that describes it).

Endpoint frontends will do this for you.
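The redirect decision can be sketched as a pure function. Everything here is an assumption made for illustration: the function name `redirect_target`, the example.org host, and the `/resource/`, `/page/`, `/data/` URL layout (only the `/resource/` and `/page/` pattern appears in the DBpedia example). Real content negotiation also weighs the quality values in the Accept header, which this sketch ignores.

```python
RDF_MEDIA_TYPES = ("text/turtle", "application/rdf+xml")


def redirect_target(resource_uri: str, accept: str) -> str:
    """Pick the document to redirect to, based on the HTTP Accept header."""
    base, name = resource_uri.rsplit("/resource/", 1)
    if any(media_type in accept for media_type in RDF_MEDIA_TYPES):
        return f"{base}/data/{name}"   # RDF for computers (hypothetical path)
    return f"{base}/page/{name}"       # HTML for humans


uri = "http://www.example.org/resource/Norway"
print(redirect_target(uri, "text/html"))
print(redirect_target(uri, "text/turtle"))
```

The data item URI itself never returns a document; it only identifies the item and points the client to a description in a suitable format.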


Creating RDF Data and Vocabularies

Designing an easy-to-use and robust namespace is non-trivial. Naming is difficult.
Reuse existing vocabularies if possible. Don’t reinvent.
URIs are also addresses, so consider publishing issues when naming.
Adhere to the policies described in best practice documents:
  Best Practice Recipes for Publishing RDF Vocabularies
  http://www.w3.org/TR/2008/NOTE-swbp-vocab-pub-20080828/
  Cool URIs for the Semantic Web
  http://www.w3.org/TR/cooluris/

Use http://www.example.[com|net|org] for prototyping and documentation.


Linked Open Data

Tim Berners-Lee’s recipe for 5 star web data:

★      Make data available on the Web (any format) under an open license.
★★     Make it available as structured data (e.g., Excel, not image scans).
★★★    Use non-proprietary formats (e.g., CSV instead of Excel).
★★★★   Use URIs to identify data items; make them referable on the Web.
★★★★★  Link your data to others’ data to provide context.

(★★★★★★ Allow your data to be accessed under arbitrary views, i.e., make it queryable.)

Adapted from http://www.w3.org/DesignIssues/LinkedData.html.


Web of Data

The point of publishing data as described in this lecture is to have self-describing and self-documenting data.
This decouples data from applications and lightens the programming burden.
Semantic Web applications should be generic and general purpose, exploiting the rich and knowledge-intensive data sets.


Subtleties

URIs are not necessarily unique

URIs are just strings, not a “global identification service”.
There is nothing stopping you from using rdf:type as the URI for your favourite data item. However, don’t do that!
The simple rule of only creating URIs in a namespace domain you control should keep you out of trouble.
Again, publish the data at the URI’s address.

Trust is an important (and work-in-progress) layer in the SW stack.


RDF Graphs are not graphs

  dbp:Norway --dbp-ont:capital--> dbp:Oslo
  rdf:type --rdf:type--> rdf:Property

Drawing dbp:Norway dbp-ont:capital dbp:Oslo is straightforward.
But what about rdf:type rdf:type rdf:Property?
RDF graphs are sets of triples, not graphs.
The set of nodes, i.e., subjects and objects, and the set of edges, i.e., predicates, of an RDF graph need not be disjoint.
However, nodes and edges in an RDF graph are usually disjoint: data resides in the nodes, edges are vocabulary elements.
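The overlap between nodes and edges is easy to compute once a graph is treated as a set of triples. A minimal sketch, assuming triples are tuples of QName strings:

```python
graph = {
    ("dbp:Norway", "dbp-ont:capital", "dbp:Oslo"),
    ("rdf:type", "rdf:type", "rdf:Property"),
}

# Nodes are the terms in subject or object position; edges are the predicates.
nodes = {s for s, _, _ in graph} | {o for _, _, o in graph}
edges = {p for _, p, _ in graph}

print(nodes & edges)  # {'rdf:type'}
```

In an ordinary graph this intersection would be empty by definition; here rdf:type is both a node and an edge label.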


Be careful when merging RDF files

Merging the two RDF files containing named blank nodes

File 1:
  ifi:martige :owns _:myCar .
  _:myCar a lotus:Esprit .

File 2:
  ifi:martingi :owns _:myCar .
  _:myCar a cit:Sahara .

gives the RDF graph File 1 ∪ File 2:

  ifi:martige :owns _:myCar .
  ifi:martingi :owns _:myCar .
  _:myCar a lotus:Esprit, cit:Sahara .

  ifi:martige  --:owns--> _:myCar --rdf:type--> lotus:Esprit
  ifi:martingi --:owns--> _:myCar --rdf:type--> cit:Sahara

The two blank nodes happen to share a label, so they collapse into one node: both persons now own the same car.


Rename Blank Nodes

Renaming _:myCar to _:myCar2 in File 2.

File 1:
  ifi:martige :owns _:myCar .
  _:myCar a lotus:Esprit .

File 2:
  ifi:martingi :owns _:myCar2 .
  _:myCar2 a cit:Sahara .

gives the RDF graph File 1 ∪ File 2:

  ifi:martige :owns _:myCar .
  ifi:martingi :owns _:myCar2 .
  _:myCar a lotus:Esprit .
  _:myCar2 a cit:Sahara .

  ifi:martige  --:owns--> _:myCar  --rdf:type--> lotus:Esprit
  ifi:martingi --:owns--> _:myCar2 --rdf:type--> cit:Sahara
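The renaming step can be automated. The sketch below is an illustration under assumptions, not a library API: triples are tuples of strings, blank nodes are recognised by the `_:` prefix, and the helper names `rename_blank_nodes` and `merge` are invented here.

```python
def rename_blank_nodes(graph, suffix):
    """Append a file-specific suffix to every blank node label."""
    def rename(term):
        return term + suffix if term.startswith("_:") else term
    return {(rename(s), p, rename(o)) for s, p, o in graph}


def merge(*graphs):
    """Union of the graphs after making blank node labels unique per graph."""
    merged = set()
    for i, graph in enumerate(graphs, start=1):
        merged |= rename_blank_nodes(graph, f"_{i}")
    return merged


file1 = {("ifi:martige", ":owns", "_:myCar"), ("_:myCar", "rdf:type", "lotus:Esprit")}
file2 = {("ifi:martingi", ":owns", "_:myCar"), ("_:myCar", "rdf:type", "cit:Sahara")}

naive = file1 | file2        # the two cars collapse into one blank node
safe = merge(file1, file2)   # each file keeps its own car

print(sorted({s for s, _, _ in naive if s.startswith("_:")}))  # ['_:myCar']
print(sorted({s for s, _, _ in safe if s.startswith("_:")}))   # ['_:myCar_1', '_:myCar_2']
```

Blank node labels are only meaningful within the file they come from, which is why renaming them before taking the union does not change what either file says.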


Summary

RDF: W3C Overview

RDF is a data model.
RDF is a standard model for data interchange on the Web.
It has features that facilitate data merging even if the underlying schemas differ.
It extends the linking structure of the Web by using URIs to name the relationship between things as well as the two ends of the link; such a link is called a “triple”.
This allows data to be mixed, exposed, and shared across different applications.
This linking structure forms a directed, labeled graph.
This graph view is the easiest mental model for RDF and is often used in visual explanations.
RDF has many serialisations.

Adapted from http://w3c.org/RDF.
