XML, DTD, and XPath
CPS 116 Introduction to Database Systems
Author: Suzan Gilmore

Announcements (October 17)

- Project milestone #1 feedback will be ready by Thursday
- Homework #3 will be assigned Thursday

From HTML to XML (eXtensible Markup Language)

- HTML describes presentation of content. Rendered example:
    Bibliography
    Foundations of Databases
    Abiteboul, Hull, and Vianu
    Addison Wesley, 1995
    …
- XML describes only the content:
    <bibliography>
      <book>
        <title>Foundations of Databases</title>
        <author>Abiteboul</author>
        <author>Hull</author>
        <author>Vianu</author>
        <publisher>Addison Wesley</publisher>
        <year>1995</year>
        …
      </book>
      …
    </bibliography>

Separation of content from presentation simplifies content extraction and allows the same content to be presented easily in different looks.
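The point about content extraction can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustration: the element names follow the tree representation shown later in these slides, and the two output formats are made up for the example.

```python
import xml.etree.ElementTree as ET

# Markup reconstructed from the element names used elsewhere in the slides.
DOC = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>"""

root = ET.fromstring(DOC)
book = root.find("book")

# Content extraction: the tags say what each piece of text means.
title = book.findtext("title")
authors = [a.text for a in book.findall("author")]
publisher = book.findtext("publisher")
year = book.findtext("year")

# The same content rendered in two different (illustrative) looks.
citation = f"{', '.join(authors)}. {title}. {publisher}, {year}."
html_item = f"<li><i>{title}</i> ({year})</li>"

print(citation)
print(html_item)
```

Doing the same against the HTML version would mean guessing which italic or bold run happens to be the title.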

Other nice features of XML

- Portability: Just like HTML, you can ship XML data across platforms
    - Relational data requires heavy-weight protocols, e.g., JDBC
- Flexibility: You can represent any information (structured, semi-structured, documents, …)
    - Relational data is best suited for structured data
- Extensibility: Since data describes itself, you can change the schema easily
    - Relational schema is rigid and difficult to change

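XML terminology

Example: <bibliography> <book> <title>Foundations of Databases</title> <author>Abiteboul</author> <author>Hull</author> <author>Vianu</author> <publisher>Addison Wesley</publisher> <year>1995</year> … </book> … </bibliography>

- Tag names: book, title, …
- Start tags: <book>, <title>, …
- End tags: </book>, </title>, …
- An element is enclosed by a pair of start and end tags: <book>…</book>
    - Elements can be nested: <book>…<title>…</title>…</book>
    - Empty elements have nothing between the start and end tags
        • Can be abbreviated to a single tag ending in />
- Elements can also have attributes: <book ISBN="…" price="…">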

Well-formed XML documents

A well-formed XML document
- Follows XML lexical conventions
    - Wrong: We show that x < 0…
    - Right: We show that x &lt; 0…
        • Other special entities: > becomes &gt; and & becomes &amp;
- Contains a single root element
- Has tags that are properly matched and elements that are properly nested
    - Right: <section>…<title>…</title>…</section>
    - Wrong: <section>…<title>…</section>…</title>
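The three well-formedness rules can be checked mechanically. The sketch below uses Python's standard library parser, which rejects anything that is not well-formed; the helper name `is_well_formed` and the element names `p`, `a`, `b` are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(text):
    """Return True if text parses as one well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Lexical conventions: a literal "<" in text must be written as an entity.
raw = "We show that x < 0"
unescaped_doc = "<p>" + raw + "</p>"
escaped_doc = "<p>" + escape(raw) + "</p>"   # escape() rewrites &, < and >

checks = {
    "unescaped <": is_well_formed(unescaped_doc),
    "escaped &lt;": is_well_formed(escaped_doc),
    "properly nested": is_well_formed("<a><b></b></a>"),
    "improperly nested": is_well_formed("<a><b></a></b>"),
    "two root elements": is_well_formed("<a/><b/>"),
}
for label, ok in checks.items():
    print(f"{label}: {ok}")
```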

More XML features

- Comments: <!-- … -->
- CDATA: <![CDATA[…,…]]>
- ID's and references (example entries: Homer…, Marge…, Bart……)
- Namespaces allow external schemas and qualified names
- Processing instructions for apps: <?… ?>
- And more…
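A small sketch of how three of these features look to a parser, using Python's `xml.etree.ElementTree`. The namespace prefix `bib` and the URI `http://example.org/bib` are hypothetical placeholders, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL namespace prefix and URI, chosen only for this example.
DOC = """<bib:book xmlns:bib="http://example.org/bib">
  <!-- comments are dropped by the default parser -->
  <bib:title><![CDATA[Tags look like <book>, <title>]]></bib:title>
</bib:book>"""

root = ET.fromstring(DOC)

# Qualified names are expanded to {namespace-URI}local-name.
title = root.find("{http://example.org/bib}title")

print(root.tag)      # expanded name of the root element
print(title.text)    # CDATA content arrives as plain, unparsed text
print(len(root))     # the comment is not counted as a child
```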

Valid XML documents

- A valid XML document conforms to a Document Type Definition (DTD)
    - A DTD is optional
- A DTD specifies
    - A grammar for the document
    - Constraints on structures and values of elements, attributes, etc.
- Example: <!DOCTYPE bibliography [ … ]>

DTD explained

- In a content declaration, ? means zero or one and * means zero or more
- book consists of a title, zero or more authors, an optional publisher, and zero or more sections, in sequence
- book has a required ISBN attribute which is a unique identifier
- book has an optional (#IMPLIED) price attribute which contains character data
- Example instance: <book ISBN="…" price="…"> <title>Foundations of Databases</title> <author>Abiteboul</author> <author>Hull</author> <author>Vianu</author> <publisher>Addison Wesley</publisher> <year>1995</year> … </book>
- Other attribute types include IDREF (reference to an ID), IDREFS (space-separated list of references), enumerated list, etc.
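A content declaration is essentially a regular expression over child element names. The sketch below checks the book content model described above by hand, since Python's standard library does not validate against DTDs. One assumption is flagged in the code: an optional year is added so that the running example (which contains a year element) passes; the slide text itself does not list year in the content model.

```python
import re
import xml.etree.ElementTree as ET

# Content model for book as a regex over child tag names, each followed by
# a space: title, author*, publisher?, section*.
# ASSUMPTION: "(year )?" is added so the running example validates.
BOOK_MODEL = re.compile(r"title (author )*(publisher )?(year )?(section )*")

def children_match(element, model):
    """Check the sequence of child tags against a content-model regex."""
    sequence = "".join(child.tag + " " for child in element)
    return model.fullmatch(sequence) is not None

good = ET.fromstring(
    "<book><title>T</title><author>A</author><author>B</author>"
    "<publisher>P</publisher><year>1995</year></book>"
)
title_only = ET.fromstring("<book><title>T</title></book>")
wrong_order = ET.fromstring("<book><author>A</author><title>T</title></book>")

print(children_match(good, BOOK_MODEL))
print(children_match(title_only, BOOK_MODEL))
print(children_match(wrong_order, BOOK_MODEL))
```

A validating parser does the same kind of matching for every element, and additionally checks attribute constraints such as required attributes and ID uniqueness.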

DTD explained (cont'd)

- PCDATA is text that will be parsed (<…> will be treated as a markup tag and &lt; etc. will be treated as entities); CDATA is unparsed character data
- title, author, publisher, and year all contain parsed character data (#PCDATA)
- Each section starts with a title, followed by some optional text and then zero or more subsections
- Example:
    <section><title>Introduction</title>
      In this section we introduce XML and DTD…
      <section><title>XML</title>
        XML stands for…
      </section>
      <section><title>DTD</title>
        <section><title>Definition</title>
          DTD stands for…
        </section>
        <section><title>Usage</title>
          You can use DTD to…
        </section>
      </section>
    </section>

"Deterministic" content declaration

- Catch: the following declaration does not work:
    - …
    - Because when looking at name, the XML processor would not know which way to go without looking further ahead
- Requirement: content declaration must be "deterministic" (i.e., no look-ahead required)
- Can we rewrite the above declaration into an equivalent, but deterministic one?
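The declaration from the slide is not reproduced here, so the sketch below uses a hypothetical one with the same problem: two alternatives that both begin with name. Only the element name `name` comes from the slide; `contact`, `address`, and `phone` are invented for illustration. Factoring out the common prefix gives a deterministic declaration, and a brute-force check over short child sequences suggests the two accept the same sequences.

```python
import re
from itertools import product

# HYPOTHETICAL declarations (only "name" appears on the slide):
#   Non-deterministic: <!ELEMENT contact ((name, address) | (name, phone))>
#   Deterministic:     <!ELEMENT contact (name, (address | phone))>
AMBIGUOUS = re.compile(r"(name address )|(name phone )")
FACTORED = re.compile(r"name (address |phone )")

def accepts(model, tags):
    """True if the tag sequence matches the content-model regex exactly."""
    return model.fullmatch("".join(t + " " for t in tags)) is not None

ALPHABET = ["name", "address", "phone"]
# Every child sequence of length 0 to 3 over the three element names.
sequences = [seq for n in range(4) for seq in product(ALPHABET, repeat=n)]

same_language = all(
    accepts(AMBIGUOUS, s) == accepts(FACTORED, s) for s in sequences
)
accepted = [s for s in sequences if accepts(FACTORED, s)]

print(same_language)
print(accepted)
```

Python's regex engine backtracks, so it accepts both forms; the point is only that the factored form lets a processor decide after reading each child, with no look-ahead.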

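Using DTD

- DTD can be included in the XML source file
    - <?xml version="1.0"?> <!DOCTYPE bibliography [ … ]> <bibliography> … </bibliography>
- DTD can be external
    - <!DOCTYPE bibliography SYSTEM "…"> <bibliography> … </bibliography>
    - <!DOCTYPE … PUBLIC "…" "…"> …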

Why use DTD's?

- Benefits of not using DTD
    - Unstructured data is easy to represent
    - Overhead of DTD validation is avoided
- Benefits of using DTD

XML versus relational data

Relational data
- Schema is always fixed in advance and difficult to change
- Simple, flat table structures
- Ordering of rows and columns is unimportant
- Data exchange is problematic
- "Native" support in all serious commercial DBMS

XML data
- Well-formed XML does not require predefined, fixed schema
- Nested structure; ID/IDREF(S) permit arbitrary graphs
- Ordering forced by document format; may or may not be important
- Designed for easy exchange
- Often implemented as an "add-on" on top of relations

Query languages for XML

- XPath
    - Path expressions with conditions
    - Building block of other standards (XQuery, XSLT, XLink, XPointer, etc.)
- XQuery
    - XPath + full-fledged SQL-like query language
- XSLT
    - XPath + transformation templates

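Example DTD and XML

<!DOCTYPE bibliography [ … ]>
<bibliography>
  <book ISBN="…" price="…">
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
    <section>…</section>…
  </book>
  …
</bibliography>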

A tree representation

bibliography
    book
        title: Foundations of Databases
        author: Abiteboul
        author: Hull
        author: Vianu
        publisher: Addison Wesley
        year: 1995
        section
            title: Introduction
            In this section we introduce …
            section …
            section …
    book …

XPath

- XPath specifies path expressions that match XML data by navigating down (and occasionally up and across) the tree
- Example
    - Query: /bibliography/book/author
        • Like a UNIX path
    - Result: all author elements reachable from root via the path /bibliography/book/author
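The example query can be tried with Python's `xml.etree.ElementTree`, which implements a limited subset of XPath. Its paths are evaluated relative to an element rather than from the document root, so the sketch below starts at the bibliography root element and uses the remainder of the path. The second book and its contents are hypothetical filler.

```python
import xml.etree.ElementTree as ET

# The second book is HYPOTHETICAL data added so the result spans two books.
DOC = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
  </book>
  <book>
    <title>Another Book</title>
    <author>Someone</author>
  </book>
</bibliography>"""

root = ET.fromstring(DOC)   # root is the <bibliography> element

# XPath: /bibliography/book/author
# Relative to root this is book/author: one step per path component.
authors = root.findall("book/author")
names = [a.text for a in authors]

print(names)
```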

Basic XPath constructs

/       separator between steps in a path
name    matches any child element with this tag name
*       matches any child element
@name   matches the attribute with this name
@*      matches any attribute
//      matches any descendant element or the current element itself
.       matches the current element
..      matches the parent element

Simple XPath examples

- All book titles: /bibliography/book/title
- All book ISBN numbers: /bibliography/book/@ISBN
- All title elements, anywhere in the document: //title
- All section titles, anywhere in the document: //section/title
- Authors of bibliographical entries (suppose there are articles, reports, etc. in addition to books): /bibliography/*/author
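The five queries above can be approximated with Python's `xml.etree.ElementTree`. Two caveats apply to this sketch: ElementTree paths are relative to the starting element (here the bibliography root), and it cannot select attribute nodes with `@name`, so the ISBN query is emulated by reading the attribute from each book. The attribute values and the article entry are hypothetical test data.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL data: attribute values and the <article> entry are made up.
DOC = """<bibliography>
  <book ISBN="isbn-1" price="80.00">
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <section><title>Introduction</title>
      <section><title>XML</title></section>
    </section>
  </book>
  <article>
    <title>Some Article</title>
    <author>Someone</author>
  </article>
</bibliography>"""

root = ET.fromstring(DOC)

# /bibliography/book/title
book_titles = [t.text for t in root.findall("book/title")]
# /bibliography/book/@ISBN (emulated: ElementTree has no attribute step)
isbns = [b.get("ISBN") for b in root.findall("book")]
# //title
all_titles = [t.text for t in root.findall(".//title")]
# //section/title
section_titles = [t.text for t in root.findall(".//section/title")]
# /bibliography/*/author
entry_authors = [a.text for a in root.findall("*/author")]

for result in (book_titles, isbns, all_titles, section_titles, entry_authors):
    print(result)
```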

Predicates in path expressions

- [condition] matches the current element if condition evaluates to true on the current element
- Books with price lower than $50: /bibliography/book[@price
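The query above is cut off in the source. Assuming the intended condition is "price attribute less than 50", as the description says, the predicate can be emulated in Python. ElementTree supports only a few predicate forms (such as attribute presence) and no numeric comparisons, so the condition is applied as a filter over the candidate elements. All book data below is hypothetical.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL data: three books, one of them without a price attribute.
DOC = """<bibliography>
  <book ISBN="isbn-1" price="80.00"><title>Expensive Book</title></book>
  <book ISBN="isbn-2" price="35.50"><title>Cheap Book</title></book>
  <book ISBN="isbn-3"><title>Unpriced Book</title></book>
</bibliography>"""

root = ET.fromstring(DOC)

def price_below(book, limit):
    """Condition evaluated on one element; false when price is missing."""
    price = book.get("price")
    return price is not None and float(price) < limit

# Assumed intent of the truncated query: books whose price is below 50.
cheap_books = [b for b in root.findall("book") if price_below(b, 50)]
cheap_titles = [b.findtext("title") for b in cheap_books]

# A predicate form ElementTree does support: attribute presence.
priced_books = root.findall("book[@price]")

print(cheap_titles)
print(len(priced_books))
```

A book with no price attribute fails the condition, which matches how an XPath comparison against an empty node-set behaves.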
