INF3580 – Semantic Technologies – Spring 2012 Lecture 2: Resource Description Framework (RDF)
Martin G. Skjæveland 24th January 2012
Department of Informatics
University of Oslo
Mandatory Exercise 1
Mandatory Exercise 1 published on course web site immediately after lecture. RDF. Hand-in by next Thursday. Use Devilry to hand-in: https://devilry.ifi.uio.no/. Use Mr. Oblig, http://sws.ifi.uio.no/mroblig to test your delivery. Next Mandatory Exercise published next Tuesday.
INF3580 :: Spring 2012
Lecture 2 :: 24th January
2 / 43
Today’s Plan
1 Introduction
2 RDF data model
3 RDF vocabularies
4 RDF serialisations
5 RDF on the web
6 Subtleties
7 Summary
Introduction
Outline
1 Introduction
2 RDF data model
    Technicalities
    Features
3 RDF vocabularies
4 RDF serialisations
    Turtle
5 RDF on the web
6 Subtleties
7 Summary
RDF: W3C Overview
RDF is a data model.
RDF is a standard model for data interchange on the Web.
It has features that facilitate data merging even if the underlying schemas differ.
It extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (called a “triple”).
This allows data to be mixed, exposed, and shared across different applications.
This linking structure forms a directed, labeled graph.
This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.
RDF has many serialisations.
Adapted from http://w3c.org/RDF.
Semantic Web Stack
[Figure: the Semantic Web stack. Layers from top to bottom: User interface and applications; Trust; Proof; Unifying logic; Querying: SPARQL, Ontologies: OWL, Rules: SWRL; Vocabularies: RDFS; Data interchange: RDF; Syntax: XML; Identifiers: URI, Character set: UNICODE. Cryptography runs alongside the layers.]
RDF: central block in the SW stack. First “semantic” block in stack.
In the course we will explore: RDF, SPARQL, RDFS/OWL, applications.
RDF data model
Technicalities
RDF Triples
RDF is a data model.
All information in RDF is expressed using a triple pattern.
A triple consists of a subject, a predicate, and an object.
Examples:
  subject        predicate     object
  Norway         has capital   Oslo
  Oslo           has mayor     Fabian Stang
  Fabian Stang   born year     1955
Another word for an RDF triple is a statement or fact.
The elements of an RDF triple are either URI resources, blank nodes, or literals.
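As a rough illustration of the triple pattern, the three example statements can be held as plain three-field records. This is only a sketch in Python: the `Triple` type and the `describe` helper are names made up here, and the informal strings stand in for the URIs, blank nodes, and literals that real RDF uses.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# The three example statements from the slide, as informal strings.
statements = [
    Triple("Norway", "has capital", "Oslo"),
    Triple("Oslo", "has mayor", "Fabian Stang"),
    Triple("Fabian Stang", "born year", "1955"),
]


def describe(resource: str, triples: list[Triple]) -> list[Triple]:
    """Return every statement that has `resource` as its subject."""
    return [t for t in triples if t.subject == resource]


oslo_facts = describe("Oslo", statements)
```

Note that "Oslo" is the object of one statement and the subject of another; this is what lets statements chain together.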
Uniform Resource Identifiers (URIs)
RDF (Resource Description Framework) talks about resources. Almost anything is a resource.
Resources are identified by URIs (Uniform Resource Identifiers). E.g., in dbpedia.org:
  Norway:       http://dbpedia.org/resource/Norway
  has capital:  http://dbpedia.org/ontology/capital
  Oslo:         http://dbpedia.org/resource/Oslo
  has mayor:    http://dbpedia.org/ontology/leaderName
  Fabian Stang: http://dbpedia.org/resource/Fabian_Stang
As identifiers, think of them as just strings (in a special format). They are not necessarily dereferenceable.
URIs and QNames
RDF is a data model, but has different serialisations. We use Turtle.
URIs are often long and hard to read and write.
Most serialisations use an abbreviation mechanism: define “prefixes” for “namespaces”.
E.g., in Turtle serialisation:
  @prefix dbp: <http://dbpedia.org/resource/> .
  @prefix dbp-ont: <http://dbpedia.org/ontology/> .
A QName like dbp:Oslo stands for http://dbpedia.org/resource/Oslo
Remember: It’s all just URIs!
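The abbreviation mechanism amounts to string concatenation, which a small Python sketch can make concrete. The function name `expand_qname` and the `PREFIXES` table are assumptions for illustration; the two namespace URIs are the DBpedia ones used in this lecture.

```python
# Prefix table for the two DBpedia namespaces used in this lecture.
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "dbp-ont": "http://dbpedia.org/ontology/",
}


def expand_qname(qname: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Replace the prefix of a QName with its namespace URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


oslo_uri = expand_qname("dbp:Oslo")
capital_uri = expand_qname("dbp-ont:capital")
```

After expansion nothing of the prefix remains: two documents that pick different prefixes for the same namespace still talk about the same URIs.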
RDF Graphs
An RDF graph is a set of triples. E.g.,
  dbp:Norway dbp-ont:capital    dbp:Oslo .
  dbp:Oslo   dbp-ont:leaderName dbp:Fabian_Stang .
is an RDF graph containing two triples.
RDF graphs are often represented as a directed labeled graph:
  dbp:Norway --dbp-ont:capital--> dbp:Oslo --dbp-ont:leaderName--> dbp:Fabian_Stang
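Because an RDF graph is a set, stating the same triple twice changes nothing. A minimal Python sketch, assuming triples are modelled as tuples of QName strings (the `outgoing` helper is a made-up name):

```python
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("dbp:Norway", "dbp-ont:capital", "dbp:Oslo"),
    ("dbp:Oslo", "dbp-ont:leaderName", "dbp:Fabian_Stang"),
}

# Adding a triple that is already present changes nothing: a graph is a set.
graph.add(("dbp:Norway", "dbp-ont:capital", "dbp:Oslo"))


def outgoing(node: str, g: set[Triple]) -> set[tuple[str, str]]:
    """Edges leaving `node`, as (edge label, target node) pairs."""
    return {(p, o) for (s, p, o) in g if s == node}


oslo_edges = outgoing("dbp:Oslo", graph)
```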
Literals
Literals are used to represent data values. Literals can be
Plain, without language tag:
  dbp:Oslo dbp-ont:officialName "Oslo" .
Plain, with language tag:
  dbp:Norway rdfs:label "Norge"@no .
  dbp:Norway rdfs:label "Norwegen"@de .
Typed, with a URI indicating the type:
  dbp:Oslo dbp-ont:population "611491"^^xsd:integer .
But not both, i.e., typed and with a language tag.
Usually represented with rectangles:
  dbp:Oslo --dbp-ont:population--> [ "611491"^^xsd:integer ]
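The three kinds of literal, and the rule that a literal is never both typed and language-tagged, could be captured in a small value class. This is a sketch under assumptions: the class name `Literal`, its fields, and `to_turtle` are invented here and do not come from any RDF library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """A data value with an optional language tag or an optional datatype."""
    lexical: str
    language: Optional[str] = None   # e.g. "no", "de"
    datatype: Optional[str] = None   # e.g. "xsd:integer"

    def __post_init__(self) -> None:
        # The rule from the slide: typed or language-tagged, never both.
        if self.language is not None and self.datatype is not None:
            raise ValueError("a literal cannot be both typed and language-tagged")

    def to_turtle(self) -> str:
        """Render the literal in Turtle syntax, escaping backslashes and quotes."""
        text = '"' + self.lexical.replace("\\", "\\\\").replace('"', '\\"') + '"'
        if self.language is not None:
            return f"{text}@{self.language}"
        if self.datatype is not None:
            return f"{text}^^{self.datatype}"
        return text


plain = Literal("Oslo")
tagged = Literal("Norge", language="no")
typed = Literal("611491", datatype="xsd:integer")
```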
Blank Nodes
Blank nodes are like resources without a URI.
Use when resource is unknown, or has no (natural) identifier.
Norway’s capital has population 611491:
  dbp:Norway --:capital--> ( ) --:population--> "611491"
The address of UiO is Problemveien 7, 0313 Oslo:
  :UiO --:address--> ( )
  ( ) --:street--> "Problemveien 7"
  ( ) --:place--> "Oslo"
  ( ) --:postcode--> "0313"
RDF Triple Grammar
Literals and blank nodes may not appear everywhere in triples:
                                                      s   p   o
• URI resources may occur in all positions            ✔   ✔   ✔
• Literals may only occur in object position          ✘   ✘   ✔
• Blank nodes may not occur in predicate position     ✔   ✘   ✔
Why?
Literals are just values, no relationships from literals allowed.
Blank nodes in predicate position deemed “too meaningless” and confusing.
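The position rules can be written down as a small lookup table and checked mechanically. In this Python sketch the way terms are classified (a leading double quote means literal, a leading `_:` means blank node, anything else counts as a URI) is a simplifying assumption, and all names are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    URI = "uri"
    BLANK = "blank"
    LITERAL = "literal"


# Which kinds of term are allowed in which position.
ALLOWED = {
    "subject": {Kind.URI, Kind.BLANK},
    "predicate": {Kind.URI},
    "object": {Kind.URI, Kind.BLANK, Kind.LITERAL},
}


def kind_of(term: str) -> Kind:
    """Classify a term by its Turtle-like spelling (a simplifying assumption)."""
    if term.startswith('"'):
        return Kind.LITERAL
    if term.startswith("_:"):
        return Kind.BLANK
    return Kind.URI


def is_valid_triple(s: str, p: str, o: str) -> bool:
    """Check each position of a triple against the ALLOWED table."""
    return (
        kind_of(s) in ALLOWED["subject"]
        and kind_of(p) in ALLOWED["predicate"]
        and kind_of(o) in ALLOWED["object"]
    )
```

For instance, `is_valid_triple('"Oslo"', ":isNameOf", "dbp:Oslo")` is rejected because a literal sits in subject position.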
Features
Why URIs?
URIs naturally have a “global” scope, unique throughout the web.
  Contrasts with, e.g., keys in relational databases, which are unique only within a table.
  Helps to avoid name clashes. Example: merging two product catalogues.
    http://www.abc-company.com/category/item/123
    http://www.xyz-company.com/product/123
URLs are also addresses.
  Exploit the well-functioning machinery of web browsing.
  Find data by following data identifiers, i.e., URIs.
“A web of data.”
Why Triples?
Any information format can be transformed to triples. Examples:
                                 subject   predicate   object
  Tabular (spreadsheets, DBs):   row       column      cell
  Trees (XML):                   parent    path        child
Relationships are made explicit and are elements in their own right.
  The predicate, i.e., the relationship, is an element in the triple.
  Unlike DB columns and binary predicates.
  Can be described in RDF. “Self-documenting”.
Again, “A web of data”.
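For the tabular case, the row/column/cell reading translates almost directly into code. The following Python sketch assumes one column acts as the row identifier; the function name, the table, and the second row (an invented "Exampleville") are purely illustrative.

```python
def rows_to_triples(rows: list[dict[str, str]], key: str) -> set[tuple[str, str, str]]:
    """Turn each cell into a triple: (row identifier, column name, cell value)."""
    triples = set()
    for row in rows:
        subject = row[key]
        for column, cell in row.items():
            if column != key:
                triples.add((subject, column, cell))
    return triples


# A hypothetical two-row table; "city" is the key column.
# The second row is made-up placeholder data.
table = [
    {"city": "Oslo", "country": "Norway", "population": "611491"},
    {"city": "Exampleville", "country": "Examplestan", "population": "1000"},
]
city_triples = rows_to_triples(table, key="city")
```

A real conversion would also mint URIs for rows and columns instead of using bare strings.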
Why Graphs?
A single, but highly versatile, format. Everything is in the same format: triples!
Since RDF graphs are just sets of triples, basic set operations are well-defined.
Merging RDF graphs? Just take their union!
  With tabular data, table dimensions must match.
  With trees, a node can only have one parent.
  Note that graphs need not be connected.
Extending an RDF graph? Just add more triples!
  No need to redefine the database table or to restructure the XML schema.
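Merging and extending can be tried out with ordinary set operations. This Python sketch models triples as tuples of strings and assumes the graphs contain no blank nodes, which need extra care (see the Subtleties section).

```python
Graph = set[tuple[str, str, str]]

g1: Graph = {
    ("dbp:Norway", "dbp-ont:capital", "dbp:Oslo"),
    ("dbp:Oslo", "dbp-ont:leaderName", "dbp:Fabian_Stang"),
}
g2: Graph = {
    ("dbp:Oslo", "dbp-ont:leaderName", "dbp:Fabian_Stang"),  # also in g1
    ("dbp:Oslo", "dbp-ont:population", '"611491"^^xsd:integer'),
}

# Merging: set union; the triple shared by g1 and g2 appears only once.
merged = g1 | g2

# Extending: add one more triple, no schema change needed.
extended = merged | {("dbp:Norway", "rdfs:label", '"Norge"@no')}
```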
RDF vocabularies
Vocabularies
Families of related notions are grouped into vocabularies. Usually the same namespace/prefix is shared.
Some important, well-known namespaces and their usual prefixes:
  rdf: – RDF
  rdfs: – RDF Schema
  foaf: – Friend of a friend
  dcterms: – Dublin Core
Usually, a description is published at the namespace base URI.
Note that the prefix is not standardised. However, in practice many are: binding rdf: to any namespace other than that of RDF would be highly irregular.
Vocabularies: Classes and Properties
A vocabulary usually defines a set of classes and properties.
Resources may be divided into groups called classes.
  The members of a class are known as instances of the class.
  rdf:type relates an instance to its class.
A property is a relation between subject and object resources.
  Predicates are properties.
Classes and properties are themselves resources, and identified by URIs.
Example Vocabularies: RDF, RDFS
Some example resources:
RDF: describing RDF graphs.
  rdf:Statement
  rdf:subject, rdf:predicate, rdf:object
  rdf:type
RDFS: describing RDF vocabularies.
  rdfs:Class
  rdfs:subClassOf, rdfs:subPropertyOf
  rdfs:domain, rdfs:range
  rdfs:label
Examples:
  dbp:Oslo rdf:type dbp-ont:Place
  dbp:Norway rdfs:label "Norge"@no
  :Capital rdfs:subClassOf :City
Example Vocabularies: FOAF, Dublin Core
Some example resources:
FOAF: person data and relations.
  foaf:Person
  foaf:knows
  foaf:firstName, foaf:lastName, foaf:gender
Dublin Core: library metadata.
  dcterms:creator, dcterms:contributor
  dcterms:format, dcterms:language, dcterms:license
Examples:
  ifi:martige rdf:type foaf:Person
  ifi:martige foaf:knows ifi:martingi
  :rdf-lecture dcterms:creator ifi:martige
RDF serialisations
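RDF Serialisations
There are many serialisations for the RDF data model:
RDF/XML: the W3C standard. Complicated!
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <rdf:Description rdf:about="http://dbpedia.org/resource/Fabian_Stang">
      <foaf:name>Fabian Stang</foaf:name>
    </rdf:Description>
  </rdf:RDF>
Turtle: convenient, human readable/writable. Our choice.
  @prefix dbp: <http://dbpedia.org/resource/> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  dbp:Fabian_Stang foaf:name "Fabian Stang" .
N-Triples: one triple per line. No abbreviations.
  <http://dbpedia.org/resource/Fabian_Stang> <http://xmlns.com/foaf/0.1/name> "Fabian Stang" .
Others: N3, TriX, TriG, RDF/JSON, ...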
Turtle
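URI Resources and Triples
Full URIs are surrounded by < and >:
  <http://dbpedia.org/resource/Oslo>
Statements are triples terminated by a period:
  <http://dbpedia.org/resource/Norway> <http://dbpedia.org/ontology/capital> <http://dbpedia.org/resource/Oslo> .
Use ‘a’ to abbreviate rdf:type:
  <http://dbpedia.org/resource/Oslo> a <http://dbpedia.org/ontology/Place> .
Turtle allows any non-zero amount of space between elements in triples.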
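Namespaces
QNames are written without any special characters.
Namespace prefixes are declared with @prefix:
  @prefix dbp: <http://dbpedia.org/resource/> .
  dbp:Oslo a <http://dbpedia.org/ontology/Place> .
A default namespace (the empty prefix) may be declared:
  @prefix dbp: <http://dbpedia.org/resource/> .
  @prefix : <http://dbpedia.org/ontology/> .
  dbp:Oslo a :Place .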
Literals
Literal values are enclosed in double quotes:
  @prefix dbp: <http://dbpedia.org/resource/> .
  @prefix : <http://dbpedia.org/ontology/> .
  dbp:Oslo :officialName "Oslo" .
Possibly with type or language information:
  dbp:Norway rdfs:label "Norge"@no .
  dbp:Oslo :population "611491"^^xsd:integer .
Numbers and booleans can be written without quotes:
  dbp:Oslo :population 611491 .
  dbp:Oslo :isCapital true .
Statements sharing elements
Statements may share a subject with ‘;’:
  dbp:Oslo :officialName "Oslo" ;
           :population 611491 ;
           :leaderName dbp:Fabian_Stang .
Statements may share subject and predicate with ‘,’:
  dbp:Norway rdfs:label "Norway"@en ,
                        "Norwegen"@de ,
                        "Norge"@no .
... and in combination:
  dbp:Norway rdfs:label "Norway"@en, "Norwegen"@de, "Norge"@no ;
             :capital dbp:Oslo .
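The two abbreviations are pure syntax: each abbreviated block still denotes one full triple per object. One way to see this is to go the other way and group plain triples back into the abbreviated form. The Python sketch below does that for already-formatted term strings; the function name `to_turtle` is an assumption and the output layout is just one possible choice.

```python
from collections import defaultdict


def to_turtle(triples: list[tuple[str, str, str]]) -> str:
    """Group triples by subject (';') and by subject plus predicate (',')."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    blocks = []
    for s, by_pred in grouped.items():
        # One "predicate object , object ..." part per predicate.
        parts = [f"{p} {' , '.join(objs)}" for p, objs in by_pred.items()]
        blocks.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(blocks)


triples = [
    ("dbp:Norway", "rdfs:label", '"Norway"@en'),
    ("dbp:Norway", "rdfs:label", '"Norwegen"@de'),
    ("dbp:Norway", "rdfs:label", '"Norge"@no'),
    ("dbp:Norway", ":capital", "dbp:Oslo"),
]
turtle_text = to_turtle(triples)
```

Here four triples collapse into a single Turtle statement block with one ‘;’ and two ‘,’.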
Blank Nodes
Blank nodes are written with the _: prefix or with [...].
Norway has a capital with population 611491:
  dbp:Norway :capital _:someplace .
  _:someplace :population 611491 .
There is a place with official name Oslo:
  [] a :Place ;
     :officialName "Oslo" .
UiO has address Problemveien 7, 0313 Oslo:
  :UiO :address [ :street "Problemveien 7" ;
                  :place "Oslo" ;
                  :postcode "0313" ] .
Other Things
Use ‘#’ to comment:
  # This is a comment.
  dbp:Oslo a dbp-ont:Place . # This is another comment.
Use ‘\’ to escape special characters:
  :someGuy foaf:name "James \"Mr. Man\" Olson" .
Turtle specification: http://www.w3.org/TR/turtle/.
RDF on the web
Where is it?
In files:
  In some serialisation: RDF/XML, Turtle, ...
  Typically small RDF graphs, i.e., at most a few hundred triples, e.g.,
    Vocabularies: http://xmlns.com/foaf/spec/index.rdf.
    Tiny datasets: http://folk.uio.no/martingi/foaf.rdf.
From SPARQL endpoints:
  Data kept in a triple store, i.e., a database.
  RDF is served from the endpoint as results of SPARQL queries.
  Data is exposed (in different formats) through endpoint frontends, e.g., http://dbpedia.org/resource/Norway, or by direct SPARQL query: http://dbpedia.org/sparql.
There are many RDFizers which convert data to RDF. W3C keeps a list: http://www.w3.org/wiki/ConverterToRdf.
Publishing RDF on the Web
Make the URIs of your data items dereferenceable.
  This is the case for all full URIs in this lecture.
Make data available in different formats, using redirects.
  Typically: HTML for humans, RDF for computers.
  Redirect the request to a page describing the data item, in the format the client asks for. This is called content negotiation.
  Distinguish the data item URI from the page that describes it.
  Example:
    data item: http://dbpedia.org/resource/Norway
    page:      http://dbpedia.org/page/Norway
Endpoint frontends will do this for you.
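From the client side, asking for a particular format usually means sending an HTTP Accept header with the data item URI. The following is a sketch using Python's standard `urllib.request`; the helper names are made up, and whether a given server honours `text/turtle` for this URI is an assumption that has to be checked against the server.

```python
import urllib.request


def build_request(uri: str, media_type: str) -> urllib.request.Request:
    """Prepare a GET request that asks for one representation of `uri`."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch(uri: str, media_type: str) -> tuple[str, bytes]:
    """Send the request; urlopen follows redirects. Returns (final URL, body)."""
    with urllib.request.urlopen(build_request(uri, media_type), timeout=10) as resp:
        return resp.geturl(), resp.read()


# Same data item URI, two requested formats.
# Nothing is sent over the network until fetch() is called.
for_humans = build_request("http://dbpedia.org/resource/Norway", "text/html")
for_machines = build_request("http://dbpedia.org/resource/Norway", "text/turtle")
```

Calling `fetch` with the two media types and comparing the returned final URLs shows where the server redirects each kind of client.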
Creating RDF Data and Vocabularies
Designing an easy-to-use and robust namespace is non-trivial. Naming is difficult.
Reuse existing vocabularies if possible. Don’t reinvent.
URIs are also addresses, so consider publishing issues when naming.
Adhere to the policies described in best practice documents:
  Best Practice Recipes for Publishing RDF Vocabularies: http://www.w3.org/TR/2008/NOTE-swbp-vocab-pub-20080828/
  Cool URIs for the Semantic Web: http://www.w3.org/TR/cooluris/
Use http://www.example.[com|net|org] for prototyping and documentation.
Linked Open Data
Tim Berners-Lee’s recipe for 5 star web data:
★ Make data available on the Web (any format) under an open license.
★★ Make it available as structured data (e.g., Excel, not image scans).
★★★ Use non-proprietary formats (e.g., CSV instead of Excel).
★★★★ Use URIs to identify data items; make them referable on the Web.
★★★★★ Link your data to others’ data to provide context.
(★★★★★★ Allow your data to be accessed under arbitrary view, i.e., make it queryable.)
Adapted from http://www.w3.org/DesignIssues/LinkedData.html.
Web of Data
The point of publishing data as described in this lecture is to have self-describing and self-documenting data.
  Decouples data from applications.
  Lightens the programming burden.
Semantic Web applications should be generic and general purpose, exploiting the rich and knowledge intensive data sets.
Subtleties
URIs are not necessarily unique
URIs are just strings, not a “global identification service”.
There is nothing stopping you from using rdf:type as the URI for your favourite data item.
However, don’t do that!
The simple rule of only creating URIs in a namespace domain you control should keep you out of trouble.
Again, put data at the URI address.
Trust is an important (and work-in-progress) layer in the SW stack.
RDF Graphs are not graphs
  dbp:Norway --dbp-ont:capital--> dbp:Oslo
  dbp-ont:capital --rdf:type--> rdf:Property
  rdf:type --rdf:type--> rdf:Property
Drawing dbp:Norway dbp-ont:capital dbp:Oslo is straightforward.
But what about rdf:type rdf:type rdf:Property?
RDF graphs are sets of triples, not graphs.
The set of nodes, i.e., subjects and objects, and the set of edges, i.e., predicates, of an RDF graph need not be disjoint.
However, nodes and edges in an RDF graph are usually disjoint: data resides in the nodes, edges are vocabulary elements.
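The overlap between nodes and edges can be computed directly from the set of triples. In this Python sketch the triple typing dbp-ont:capital as an rdf:Property is included as an illustrative assumption, alongside the two triples named in the text.

```python
graph = {
    ("dbp:Norway", "dbp-ont:capital", "dbp:Oslo"),
    ("dbp-ont:capital", "rdf:type", "rdf:Property"),  # illustrative assumption
    ("rdf:type", "rdf:type", "rdf:Property"),
}

# Nodes are everything in subject or object position; edges are the predicates.
nodes = {s for s, _, _ in graph} | {o for _, _, o in graph}
edge_labels = {p for _, p, _ in graph}

# Resources used both as a node and as an edge label.
both = nodes & edge_labels
```

In an ordinary graph `both` would always be empty; here it contains dbp-ont:capital and rdf:type.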
Be careful when merging RDF files
Merging the two RDF files containing named blank nodes
File 1:
  ifi:martige :owns _:myCar .
  _:myCar a lotus:Esprit .
File 2:
  ifi:martingi :owns _:myCar .
  _:myCar a cit:Sahara .
by simply putting their triples together gives the RDF graph File 1 ∪ File 2:
  ifi:martige :owns _:myCar .
  ifi:martingi :owns _:myCar .
  _:myCar a lotus:Esprit, cit:Sahara .
Blank node labels are local to a file, but here the two _:myCar nodes have become one node:
  ifi:martige --:owns--> _:myCar <--:owns-- ifi:martingi
  _:myCar --rdf:type--> lotus:Esprit
  _:myCar --rdf:type--> cit:Sahara
Rename Blank Nodes
Renaming _:myCar to _:myCar2 in File 2.
File 1:
  ifi:martige :owns _:myCar .
  _:myCar a lotus:Esprit .
File 2:
  ifi:martingi :owns _:myCar2 .
  _:myCar2 a cit:Sahara .
gives the RDF graph File 1 ∪ File 2:
  ifi:martige :owns _:myCar .
  ifi:martingi :owns _:myCar2 .
  _:myCar a lotus:Esprit .
  _:myCar2 a cit:Sahara .
  ifi:martige --:owns--> _:myCar --rdf:type--> lotus:Esprit
  ifi:martingi --:owns--> _:myCar2 --rdf:type--> cit:Sahara
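The renaming step can be automated: before taking the union, give every blank node in each input graph a fresh label. The Python sketch below is one possible way to do it, with triples as tuples of strings; the function name `merge` and the `_:bN` label scheme are assumptions, not a standard API.

```python
from itertools import count

Triple = tuple[str, str, str]


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Union of graphs after giving each graph's blank nodes fresh labels."""
    fresh = count(1)
    merged: set[Triple] = set()
    for g in graphs:
        renamed: dict[str, str] = {}  # per-graph mapping: old label -> new label

        def rename(term: str) -> str:
            if not term.startswith("_:"):
                return term  # URIs and literals are left untouched
            if term not in renamed:
                renamed[term] = f"_:b{next(fresh)}"
            return renamed[term]

        merged |= {(rename(s), p, rename(o)) for s, p, o in g}
    return merged


file1 = {("ifi:martige", ":owns", "_:myCar"), ("_:myCar", "a", "lotus:Esprit")}
file2 = {("ifi:martingi", ":owns", "_:myCar"), ("_:myCar", "a", "cit:Sahara")}

naive = file1 | file2        # plain union: the two _:myCar nodes collapse into one
safe = merge(file1, file2)   # renamed first: two distinct blank nodes remain
```

Within one file a label is renamed consistently, so each owner stays connected to the right car.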
Summary
RDF: W3C Overview
RDF is a data model.
RDF is a standard model for data interchange on the Web.
It has features that facilitate data merging even if the underlying schemas differ.
It extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (called a “triple”).
This allows data to be mixed, exposed, and shared across different applications.
This linking structure forms a directed, labeled graph.
This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.
RDF has many serialisations.
Adapted from http://w3c.org/RDF.