The Relational Model. Our Mathematical Foundation

The Relational Model Our Mathematical Foundation Origins • First proposed by E.F. Codd, 1969-70 • subsequently modified and extended • An abstract ...
Author: Philip Grant
5 downloads 2 Views 141KB Size
The Relational Model Our Mathematical Foundation

Origins • First proposed by E.F. Codd, 1969-70 • subsequently modified and extended

• An abstract theory of data • based on aspects of maths • set theory

• Basis of most modern DBMS • none implement it entirely • we can compare them with the idea

3 Aspects of the Model • It concerns • 1) data objects • storing it

• 2) data integrity • making sure it corresponds to reality

• 3) data manipulation • working with it

Relational Data Objects Storing information

Tables – or is it? • We say that databases have tables • data are stored in them

• This is a simplification • helps user understanding • may be misleading

Two sets of nomenclature Model

RDBMS (Inituitive)

Relation

Table

Tuple

Row (record)

Attribute

Column

Primary key

Column(s) uniquely identifying a row A pool of possible values for an attribute

Domain

Domain • Pool of possible values for an attribute • Each tuple has one of these values for the attribute • Allows meaningful comparisons • They are data types • traditionally supported poorly in most systems • area of recent development

Relation • Based on a collection of domains • Heading • a set of attribute:domain pairs, such that each attribute Ai has its own domain Di. • {, , …, , … }

• Body • a set of tuples • each tuple is a set of name:value pairs • {, , …, , … }

Illustration of notation As a table

UID

Tel

nwh

2434

hoh

2436

xxh

7659

As a relation Heading {, } Body { {, }, {, }, {, } }

Venn diagram notation {, } {, } {, }

Are Relations Tables? • A table is a practical way to write down a relation • Relations are defined on sets • • • •

a set of attributes in the heading a set of tuples in the body sets have no ordering attributes come in no particular sequence • but columns do have sequence

• tuples come in no particular sequence • but rows do have sequence

Properties of relations • Tuples and attributes are unordered • There are no duplicate tuples • All attributes are atomic

Relations/attributes are unordered • Tables seems to be • Don’t believe it!

• We do not work in terms of: • “next row” • “previous record” • “first tuple”

• We do not rely on:

Output can be made to be ordered

• “next attribute” • “first column”

• Do not try to write for loops

No duplicate tuples (= rows) • i.e.. • No two tuples in a relation have all the same values for corresponding attributes

• Crucial point • Can seem like a weakness • It is a strength • Learn to exploit it

Example of duplication Name

Age

Employee ID

Jones

42

3895

Smith

27

6830

Jones

42

3895

Williams

62

4692

• We do not want two records for Jones • The DBMS will prevent this silly duplication • A simple example of exploiting “no duplication”

All attribute values are atomic   Jackie Chan

Acting, Filming, Computing

Jackie Chan

Acting

Bob Dylan

Singing, Dancing

Jackie Chan

Filming

Jackie Chan

Computing

Bob Dylan

Singing

Bob Dylan

Dancing





Atomic Values • Strings have characters, incl. spaces • not the problem

• Problem: Jim

Monday

Squash

Wednesday Cinema Saturday

Chess

Mary Wednesday Cinema Friday

Tennis

• Bad solution Person Day1

Act1

Day2

Act2

Day3

Jim

Mon

Squash Wed

Cinema Sat

Mary

Wed

Cinema Fri

Tennis

• Which is “day 1” How do we search this? Could get the day and activity separated Think about what we are modelling

Act3 Chess

• Solution: Name Day

Activity

Jim

Monday

Squash

Jim

Wednesday Cinema

Jim

Saturday

Chess

Mary Wednesday Cinema Mary Friday

• • • •

Tennis

Looks like adding more lines Not complicated therefore Can seem “cosmetic” Will return to this

Relational Data Objects End

Relational Data Integrity making sure the data corresponds to reality

A database as a model • A DB “models” a real-world enterprise • i.e. the DB must abstract from reality • the attribute values and their combination must reflect the true state of the world

• We try to enforce plausibility • We do this by implementing integrity rules (constraints)

Data-specific integrity rules • Domain specific • Employee age is between 20 and 70 • We only sell CDs in multiples of 10 • Car registrations must be of the form: A DDD AAA • where A = alphabetic, D = digit • A temperature cannot be lower than −273.15 deg C

• Inter-attribute (inter-relational) • Only senior managers and sales reps can have cars over 2000cc

• “Cardinality” constraints • A team has 11 players

General Integrity Rules •

There are two general integrity features Part of the relational model 1. Entity integrity - tuple identification through candidate/primary keys 2. Referential integrity – foreign keys



There may be application-specific rules • • •

must be identified and implemented may be able to use DBMS we will return to this

Candidate Keys (CKs) • A candidate key for relation R is a subset K of the attributes of R such that • no two tuples of R have the same value of K

the “uniqueness property” • no proper subset of K has the uniqueness property

the “irreducibility property” • All relations have at least 1 (everything) • may have several

• Uniqueness applies to all possible tuples • not just the current ones

What are candidate keys for? • Tuple-level addressing • allows unambiguous identification of 1 tuple • content addressing

• The tuple where X has value Y • not unique unless X is a candidate key

• Access mechanisms are more general • e.g. indexes • although CKs may be implemented using them

The primary key (PK) • A specified candidate key • The choice of PK is arbitrary • There may only be one candidate

• Practical factors may help decide • It is common to think only of “the” key (i.e. the primary key) • but remember there may be other candidates

Why bother with CKs ? Suppose that we have attributes: uid, ucas, national heath, surname, first name, d.o.b., city. Here are some data: UID xxh8 jjo1 jja1 llo2

UCAS 987632 003872 013873 003874

NH 234179 139873 243179 264902

Surname Kerr Jones Jones Lamb

FirstNameDOB Kevin 26.07.82 Jane 13.12,81 Jane 10.06.81 Louis 14.07.82

City Borth Lampeter Llanon Borth

Let us declare UCAS to be the primary key. This constrains UCAS numbers to be distinct for different people, but permits UID xxh8 jja1 jja1 llo2

UCAS 987632 003872 013873 003874

!

NH 243179 139873 243179 264902

Surname Kerr Jones Jones Lamb

FirstNameDOB Kevin 26.07.82 Jane 13.12,81 Jane 10.06.81 Louis 14.07.82

City Borth Lampeter Llanon Borth

:

Penalty of not declaring CKs • A real-world model would not want repetitions of UID and NH for distinct persons. • We want UID to be unique for each person. Similarly for NH. • Remedy: declare UID to be a candidate key. Similarly for NH. • Then, either of UID, NH gives unique identification. This captures two constraints. • Continue to use UCAS as the PK. • UCAS, UID, NH are 3 candidate keys.

Foreign Keys • A reference mechanism between relations • The target of a reference must exist in the referenced relation

no “dangling references” Referential Integrity

Example, Consider a table of employees and a table of car allocations:

Foreign Keys Name(PK)

Jones Smith Davies

Post



M.D. Accountant Porter Car(PK)

Violates referential integrity

No car - OK

Allocated

R 345 XYZ Jones P 987 CBA Davies M 567 GHJ Brown



Foreign Keys (FKs)- definition • Linking relation R2 to relation R1: • A foreign key in a relation R2 is a subset of its attributes such that: • there is a relation R1 with a candidate key CK • For each value of FK in R2 there must exist an identical value in the CK in some tuple of R1. Interactive Fiction

Foreign keys - notes • All keys are sets of attributes • A candidate key can contain a value not currently found in the foreign key • Chains of references can build up • Relations can reference themselves • personnel relation can have a “manager” attribute - managers are personnel

Foreign key - examples • Earlier example: PERSON{Name(PK), Post} CAR_ALLOC{Car(PK), Allocated*} FK {Allocated} references PERSON

Notation

• Self referential example: a surgeon is supervised by a senior surgeon, called a “consultant”) SURGEON {Surgeon(PK), Consultant*} FK {Consultant} references SURGEON

Referential Integrity • The database must not hold any unmatched foreign keys • The DBMS should prevent the situation arising - most do today • The DBMS can: • reject operation which would compromise integrity or • make other changes to retain integrity

Maintaining R. Integrity • Attempt to delete the • Attempt to update target of a foreign the candidate key key • only allow if there is no matching FK value

or

• only allow if there is no matching FK value

or

• cascade-update the • cascade-delete FK in the matching tuples with FK tuples matches Interactive Fiction

Beware Autogenerated Keys! Some systems readily offer to generate key values for you – e.g. every time another tuple is entered, a numeric key value is automatically allocated. This permits:

!

UID xxh8 jjo1 jja1 xxh8

UCAS 987632 003872 013873 987632

NH 234179 139873 243179 234179

Surname Kerr Jones Jones Kerr

FirstNameDOB Kevin 26.07.82 Jane 13.12,81 Jane 10.06.81 Kevin 26.07.82

City AutoKey Borth 1000 Lampeter 1001 Llanon 1002 Borth 1003

- the same information inserted twice with distinct autokey values!

Beware Autogenerated Keys! • Remedy: make Autokey, UID, UCAS, NH candidate keys and select one as PK. • Remedy: avoid introducing unnecessary new keys. Do we need an AutoKey in this example? • Caution: if you do introduce a new key, you still need to identify other candidate keys (or risk bad modelling).

Multiple Multi-attribute keys • Consider a timetable, with entries for Day, Time, Module, Room, Building, LecturerID • Assume that a Room label such as A6 can appear in different buildings. • Some data:

Day Monday Monday Friday Monday

Time 09.00 09.00 09.00 17.00

Module CI25910 CS23710 CS27310 CS25610

Room C22 A6 A6 C22

Building Lecturer Hugh Owen rrp Hugh Owen dap Llandinam hoh Hugh Owen dap

Candidate Keys? There might be several.

Multiple Multi-attribute Keys •

With business rules: e.g. •



a module has only one lecturer, a lecturer lectures in 1 room at a time, only 1 module in a room at a time ...

3 Candidate keys in this case: 1. {Day,Time, Room, Building}, – 4 attributes 2. {Day, Time, Lecturer} – 3 attributes 3. {Day, Time, Module}, – 3 attributes



Choose one to be the PK.

Multi-attribute Foreign Keys • A multi-attribute PK may be referenced from another relation • the referencing foreign key then needs to be declared with same structure as that PK (or generally, the CK) • {Room, Building} above could be declared to be a FK that references a relation with PK {Room, Building} describing facilities: Room A6 C22 A6

Building Hugh Owen Hugh Owen Llandinam

BoardType None Fixed Roller

Projection Facilities Lighting Yes Fixed Yes Dimming Yes Dimming

Data Integrity End

Suggest Documents