RELATIONAL DATABASES. Pierre Fierz. Bern University of Applied Sciences

Bern University of Applied Sciences Engineering and Information Technology RELATIONAL DATABASES Pierre Fierz Keywords: relational model, relational ...

Author: Arron Walker

8 downloads 0 Views 659KB Size

Report

Download PDF

Recommend Documents

Bern University of Applied Sciences

UNIVERSITY OF APPLIED SCIENCES

University of Applied Sciences

UNIVERSITY OF APPLIED SCIENCES

University of Applied Sciences

UNIVERSITY OF APPLIED SCIENCES

University of Applied Sciences

Conversion of Blaise Databases to Relational Databases *

University of Applied Sciences, Deutschland

TAMPERE UNIVERSITY OF APPLIED SCIENCES

SATAKUNTA UNIVERSITY OF APPLIED SCIENCES

University of Applied Sciences Mittweida

CENTRIA UNIVERSITY OF APPLIED SCIENCES

Laurea University of Applied Sciences

Lecture Overview. Applied Databases SQL. Relational Databases. Example Instance of Student Relation. Querying Multiple Relations

Automating Layout of Relational Databases

Relational Geographic Databases

Watermarking Relational Databases

XML and Relational Databases

Object-Relational Databases Exercices

Mining Massive Relational Databases

Technische Fachhochschule Berlin University of Applied Sciences

Bern University of Applied Sciences Engineering and Information Technology

RELATIONAL DATABASES Pierre Fierz

Keywords: relational model, relational algebra, normalforms, entity-relationshipmodel, SQL, JDBC [File rdb.tex, Date 02.06.09]

© Pierre Fierz

Chapter 1 The Model 1.1

What is a Database?

Definition 1.1 [Database] A database is a system for description, storage and recovery of large sets of data. The datas are usualy shared by several applications and users. A database consists of two main parts: • The actual data and • the Database Management System (DBMS), which is reponsible for storing, changing, searching and deleting datas in accordance with the description of the datas. Figure 1-1 represents the different parts of a database. actual data application 1

meta data application 2

auxiliary data Database−Managementsystem

application 3

Database

Figure 1-1: The DBMS is an interface between the data and the users

1.2

ANSI-SPARC Architecture

The ANSI-SPARC Architecture (American National Standards Institute, Standards Planning And Requirements Committee) is an abstract design standard for a Database Management System, first proposed in 1975. Most modern commercial DBMS are based on this system. The ANSI-SPARC model never became a formal standard.

1-1

1.2.1

Three-level architecture

The objective of the three-level architecture is to separate the users view(s) of the database from the way that it is physically represented. This is desirable since: • It allows independent customised user views. Each user should be able to access the same data, but have a different customised view of the data. These should be independent: changes to one view should not affect others. • It hides the physical storage details from users. Users should not have to deal with physical database storage details. They should be allowed to work with the data itself, without concern for how it is physically stored. • The database administrator should be able to change the database storage structures without affecting the users views. From time to time rationalisations or other changes to the structure of an organisation’s data will be required. • The structure of the database should be unaffected by changes to the physical aspects of the storage. For example, a changeover to a new disk. • The database administrator should be able to change the conceptual or global structure of the database without affecting the users. This should be possible while still maintaining the desired individual users views.

1.2.2

The Model

The architecture of the ANSI-SPARC model is pictured in figure 1-2.

Figure 1-2: ANSI-SPARC Architecture for databases

1-2

The overall description of a database is called the database schema. There are three different types of schema corresponding to the three levels in the ANSI-SPARC architecture. External Level (User Views) A user’s view of the database describes a part of the database that is relevant to a particular user. It excludes irrelevant data as well as data which the user is not authorised to access. The external schema describe the different external views of the data and there may be many external schemas for a given database. Conceptual Level The conceptual level is a way of describing what data is stored within the whole database and how the data is inter-related. The conceptual level does not specify how the data is physically stored. The conceptual schema describes all the data items and relationships between them, together with integrity constraints. There is only one conceptual schema per database Internal Level The internal level involves how the database is physically represented on the computer system. It describes how the data is actually stored in the database and on the computer hardware. The internal schema contains definitions of the stored records, the methods of representation, the data fields, and indexes. There is only one internal schema per database.

1.2.3

Data Independence

Data independence is the type of data transparency that matters for a centralized DBMS. It refers to the immunity of user applications to make changes in the definition and organization of data, and vice-versa. There are two levels of data independence 1. Logical data independence: The ability to change the logical schema without changing the external schema is called Logical Data Independence. Example: Addition or removal of new entities, attributes, or relationships to the conceptual schema should be possible without having to change existing external schemas or having to rewrite existing application programs. 2. Physical data independence: The ability to change the physical schema without changing the logical schema is called Physical Data Independence. Example: A change to the internal schema, such as using different file organisation or storage structures, storage devices, or indexing strategy, should be possible without having to change the conceptual or external schemas

1-3

1.3

The Relational Model

The relational model for database management is a database model based on predicate logic and set theory. It was first formulated and proposed in 1970 by Edgar Codd. The relational model is used at the conceptual level of the ANSI-SPARC architecture.

1.3.1

Attribute and Domain

Definition 1.2 [Attribute and Domain] Let U be a nonempty, finite set called the Universe. An element A ∈ U is called an Attribute. Let D = {D1 , . . . , Dm } be a finite set of nonempty finite sets. Di are Domains also called data type. There is a function dom : U 7→ D. dom(A) is called the domain of A. An element w ∈ dom(A) is called an attribute value of A. Remark 1.1 [Atomic Values] All domains contains only atomic values. This means, that the value of an attribute is not a set or a relation. Examples of atomic values are numbers, strings, dates . . . . Example 1.1 [Attributes and Domains] The next table shows attributes and possible domains. Attribute name city quantity

Domain Character string with length ≤ 30 Character string with length ≤ 25 0 < Number ≤ 1500

Definition 1.3 [Null value] A Null value is a special value used to indicate that a data value does not exist in the database. The Null value is simply added to the domain of an attribute. The meaning of a null value is “missing information” or “inapplicable information”. We will denote null values by the symbol ⊥. Remark 1.2 [A null value is not 0] A null value is not the same than the number 0 or the empty character string but a special marker with the given meaning.

1.3.2

Relation Schema, Relation and Tuple

Definition 1.4 [Relation Schema] A non empty subset of the universe R ⊆ U is called a relation schema. The relation schema defines the type of an object (tuple) in the relational model. Definition 1.5 [Relation and Tupel] A relation r over the relation schema R = {A1 , . . . , An } (written r(R)) is a finite set of functions of the form t : R −→

n [

dom(Ai ) with t|Ai ∈ dom(Ai ) for i = 1, . . . , n

i=1

1-4

t|Ai is the restriction of the function t to Ai . The functions t are called tuple over R. The set of all relations over a given relation schema is denoted by REL(R) := {r|r(R)}. For X ⊆ R the restriction t|X is called an X value of t. It is also denoted by t(X). Example 1.2 [Relation and Tuple] Consider the following relation schema representing the attributes of suppliers. supplier = {sNr, name, city, zip} The domains are dom(sN r) = int, dom(name) = string, dom(city) = {Denver, N ewY ork, . . .}, dom(zip) = [01000 − 99999] Now we can define tuples. Each tuple represents a supplier. t1 (sNr) = 1, t1 (name) = ’Smith’, t1 (city) = ’Denver’, t1 (zip) = ’80202’ t2 (sNr) = 2, t2 (name) = ’Davis’, t2 (city) = ’New York’, t2 (zip) = ’10019’ t3 (sNr) = 3, t3 (name) = ’Jones’, t3 (city) = ’Dallas’, t3 (zip) = ’75277’ The set r(supplier) = {t1 , t2 , t3 } is a relation over the schema supplier. Remark 1.3 [Cartesian Product] Originaly a relation r over a relation schema R = {A1 . . . An } was defined as a subset of the cartesian product of the domains of the attributes of R r(R) ⊆ dom(A1 ) × . . . × dom(An ) The problem is, that with the cartesien product we introduce an ordering of the attributes. But this ordering has no semantic meaning in the relational model. This definition gives its name to the model because in mathematics a relation is defined as a subset of the cartesian product. In terms of the relational model of databases, a table can be considered a convenient representation of a relation, but the two are not strictly equivalent. For instance, an SQL table can potentially contain duplicate rows, whereas a true relation cannot contain duplicate tuples. Similarly, representation as a table implies a particular ordering to the rows and columns, whereas a relation is explicitly unordered. In the figure 1-3 the supplier relation is represented as a table. Definition 1.6 [Local Integrity Constraint] Let R be a relation schema. An integrity constraint is a funktion b : REL(R) → {true, f alse}. We can extend a relation schema R by a set of integrity constraints B. The extended schema is denoted by R = (R, B). A relation r over B satisfies all constraints b(r) = true ∀b ∈ B. The set of all relations over R is denoted by SATR (B) := {r|r ∈ r(R)}.

1-5

String

Numeric

name

S#

Relation

Denver New York etc.

01000−99999

city

zip

1

Smith

Denver

80202

2

Davis

New York

10019

3

Jones

Dallas

75277

Domains

Relation Schema Tuple

. . .

Figure 1-3: The Relation supplier

1.3.3

Semantic Key

Definition 1.7 [Semantic Key] A semantic key K ⊆ R for a relation r(R) is a minimal set {A1 , . . . , Ak } ⊆ R with ∀t1 , t2 ∈ r(R)[t1 6= t2 ⇔ ∃Aj ∈ K : t1 (Aj ) 6= t2 (Aj )]

A semantic key identifies each tuple in a relation. A semantic key exists for every relation, because a relation is defined as a set of tuples. Example 1.3 [Examples of semantic keys] Consider the following relation schema representing the attributes of suppliers. supplier = {sNr, name, zip, city} We will examine different relations over the schema supplier r1

=

{, }

zip is a key for relation r1 . r2

=

{, , }

In this example there are two keys sNr and name r3

=

{, , }

In this example the two keys are {name, zip} and {name, city}

1-6

1.3.4

Syntactic Key

In the example 1.3 the semantic keys may change when the relation is changing. This is not a good way to identify the tuples in a relation. We need a key, which does not change over time. Such a key is called syntactic, because it is defined by the designer of the schema. Definition 1.8 [Syntactic Key] A syntactic Key for a relation schema R is a set K := {A1 , . . . , Ak } ⊆ R, which satisfy the following local integrity constraints bK (r(R)) =

(

true : f alse :

if ∀t1 , t2 ∈ r(R) : t1 6= t2 ⇒ ∃Aj ∈ K : t1 (Aj ) 6= t2 (Aj ) otherwise

A syntactic Key K1 is never a proper subset of another syntactic Key K2 Remark 1.4 [Keys an Null values] Attributes of a Key never can contain a null value. It is not possible to identify an objekt with the value “unknown”. This fact is called key integrity. Definition 1.9 [Primary and Candidate Key] If there is more than one possible Key for the same relational schema then one of them is selected as the primary Key. The other keys (if such keys exist) are called candidate keys of the schema. Remark 1.5 [The choice of the primary key] In practice we always choose a primary key who never changes. Example 1.4 [Keys] Consider the following relation schema representing the products supplied by a supplier. product =({pNr, pname, price, stockQty, minimalQty}, {pNr}, ck({pname})) In this example we choose pNr as the primary key of the relation. Because distinct products should have distinct names, we define the candidate key pname.

1.3.5

Database and Database Schema

Definition 1.10 [Database and Database Schema] A database schema is a finite non empty set of relation schematas S := {R1 , . . . , Rp } over the universe U. A database value (short: database) over a database schema S is a set of relations d(S) := {r1 (R1 ), . . . , rp (Rp )}. A relation r(R) ∈ d(S) is called a basis relation. As we did it for a relation schema, we define global integrity constraints for a database schema. Definition 1.11 [Global Integrity Constraints] Let S be a database schema. A global integrity constraint for S is a function γ : d(S) −→ {true, f alse}}. Let Γ := {γ|γ : {d|d(S)} −→ {true, f alse}} be a set of integrity constraint for the database schema S. Then S := (S, Γ) is called a extended database schema. d(S) is a database d(S) which satisfies all constraints in Γ. The set of all databases over S is defined by DAT(S) := {d|d(S)}. 1-7

1.3.6

Foreign Key

In the context of relational databases, a foreign key is a global integrity constraint between two tables. Definition 1.12 [Foreign Key] A foreign key relationship between two relation r1 (R1 ) and r2 (R2 ) is an expession of the form X(R1 ) → Y (R2 ) with X(R1 ) ⊆ R1 and Y (R2 ) ⊆ R2 , which satisfy the following constraints: • Y (R2 ) is the primary key (or a candidate key) for R2 • {t(X)|t ∈ r1 (R1 )} ⊆ {t(Y )|t ∈ r2 (R2 )} X(R1 ) is a foreign key, r1 (R1 ) is called the referencing relation and r2 (R2 ) the referenced relation. The system must enforce the constraints. This concept is known under the name referential integrity. Example 1.5 [Foreign key] Consider the two relational schematas supplier sorder

= ({sNr, name, city, zip}, {sNr}) = ({oNr, sNr, orderDate, deliveryDate nv}, {oNr}, fk({sNr}, supplier))

We introduce the following foreing key relationship. X(sorder) → Y (supplier) with X(sorder), Y (supplier) = {sNr} sNr is the primary key in supplier and a foreign key in sorder. This models the fact that each order belongs to exactly one supplier and that each supplier owns more than one order. Now we introduce orderItems and products. orderItem = ({oNr, pNr, qty}, {oNr,pNr}, fk({oNr}, order), fk({pNr, product)) product = ({pNr, pname, price stockQty, minimalQty}, {pNr}, ck({pname})) In this example there are two foreign keys. X(orderItem) → Y (sorder) with X(orderItem), Y (sorder) = oNr and X(orderItem) → Y (product) with X(ordeItem), Y (product) = pNr and This models the fact that each order owns one or more order items and that each order item is referencing exactly one product. The relation orderItem represents a manyto-many relationship between orders and products. orderItem is also called associative entity. Remark 1.6 [Foreign Key and Null Value] A foreign key containing null values cannot match the values of a primary key, since a primary key by definition can have 1-8

no null values. However, a null foreign key value is always valid, regardless of the value of any of its non-null parts. When a tuple is deleted in a referenced relation the referential integrity must be preserved. To do this three different strategies can be applied. 1. Restricted delete A tuple in the referenced relation can only be deleted if it is not referenced by any tuple in the referencing relation. 2. Cascaded delete All referencing tuples are also deleted. Eventually this process must be cascaded. 3. Nullify All values of the attributes of the foreign key in the referencing relation are nullified.

1.4

The Entity-Relationship-Model (ERM)

The relational model is not appropriated for design purposes. The Entity-RelationshipModel (ERM) is more suitable to design good database schematas. The translation from the ERM to the relational model is an easy task. An entity-relationship model is an abstract conceptual representation of structured data. Entity-relationship modeling is a relational schema database modeling method, used in software engineering to produce a type of conceptual data model (or semantic data model) of a system, and of its requirements, in a top-down fashion. Diagrams created using this process are called entity-relationship diagrams, or ER diagrams for short. Originally proposed in 1976 by Dr. Pin-Shan (Peter) Chen, many variants of the process have subsequently been devised.

1.4.1

The Elements of the Model

Definition 1.13 [Entity] An entity is a real-world object distinguishable from all other objects in the world. An entity has one or more properties. An entity can be a Person, an abstract concept, an event etc. The Name of an entity is always a noun. Examples for entities are: • person: employee, student, physician, patient . . . • place: city, department . . . • object: machine, building, car . . . • concept: account, course, workCenter . . . • event: customerOrder, . . .

1-9

Definition 1.14 [Entity Set] The set of all properties of an entity is called the type of the entity. An entity set is a set of entities of the same type. Definition 1.15 [Domain] A domain is a collection (set) of the valid values for a property of an entity. Every property of an entity has a domain. For the domain of a property we will write dom(prop). Definition 1.16 [Entity Attribute] An attribute is a mapping from an entity set to the domain of a property of the entity. If the property has more than one value, then the mapping will be to the powerset of the domain. The kind of the mapping will determine the cardinality of the attribute. There are four types of cardinality: • Type 1 cardinality: In this case the mapping is a function, e.g. every person has exactly one name. name : person 7→ dom(name) • Typ C cardinality (C for conditional): In this case the mapping is a partial function, e.g. The patient has one or none health insurance. insurance : patient ֒→ dom(insurance) • Typ M cardinality (M for multiple): In this case the mapping is a function to the powerset of the domain without the empty set, e.g. a person speaks one ore more languages. language : person 7→ P(dom(language)) \ {} • Typ MC cardinality (MC for multiple conditional): In this case the mapping is a function to the powerset of the domain, e.g. a person has no diploma, one diploma or more than one diploma. diploma : person 7→ P(dom(diploma)) Definition 1.17 [Entity Key] A entity key is an attribute (or a set of attributes), whose values identifies each entity in a entity set. The key selection must follow certain criteria. These include: 1. Key should NOT change its value over the life of each entity instance. 2. Key should be guaranteed to NOT be NULL. 3. Avoid intelligent-keys (stored data as part of a key such as a code). For example, a key where the first two digits indicates a warehouse location would be a poor choice because the coding scheme could change over time. 4. Substitute single-attribute surrogate keys for large composite keys. A surrogate key can be a system generated sequence of unique numbers.

1-10

To draw the entity relationship diagramm we will use the UML (unified modeling language) notation. An example for an entity set is shown in figure 1-4. Patient pNr name birthdate healthInsurance [0..1] language [1..*] allergy [0..*]

Figure 1-4: Entity set and Attributes The cardinalities 1, C, M and MC for the attributes are denoted in the following way: name Type 1 healthInsurance[0..1] Type C language[1..*] Type M allergy[0..*] Type MC Definition 1.18 [Relationship] A relationship is an association between instances of one or more entity types. We only will use binary relationships. Definition 1.19 [Relationship set] A relationship set is a collection of relationships of the same kind. An Example is the belongs to relationship between the entity supplier and sorder. The cardinality (C:C, 1:1, C:1, M:M, MC:MC etc.) expresses the number of occurrences of one entity related to another entity. Examples: • employee 1:1 parkingPlace: Each employee has one park place. Each park place belongs to exactly one employee. • supplier 1:M order: Each order belongs to exactly one supplier. A supplier may have multiple orders • physician M:M patient: A physician treats several patients. A patient is treated by several physicians. Definition 1.20 [Recursive Relationship] A recursive relationship (or unary relationship) is a relationship between entties of the same entity set. As an example we can consider the parts list of a product. This is a relationship with cardinality MC:MC between products. product MC:MC product In UML a relationship is represented by a line between the concerned entity sets. The name of the relationship is written near the line. The cardinalities are writen on both sides of the line. Some possibilities of relationships between physician and patient are shown in figure 1-5. 1-11

physician

1..*

treats

1..*

patient

a) A physician treats several patients. A patient is treated by several physicians. physician

1

1..*

treats

patient

b) A physician treats several patients. A patient is treated by one physician. physician

1

0..*

treats

patient

c) A physician treats zero, one or several patients. A patient is treated by one physician.

Figure 1-5: Relationships (M:M, 1:M and 1:MC) The arrow near the name of the relationship indicates the direction in wich the text is read. The relationship itself is not directed. Definition 1.21 [intersection attribute] An intersection attribute is associated with a relationship. The attribute is only meaningful if a relationship exists between two entities. As an example consider the parts list of a product. To manufacture a product, two informations are essential. Which produkt is needed and which quantity of the product is needed. The quantity is not an attribute of the product but an attribute of the parts list relationship. When a relationship has attributes, then it is also considered as an entity. This entity is called an associative entity. In UML this is represented by an own entity attached to the relationship. An example is shown in figure 1-6. Physician

1..*

treats

1..*

Patient

PhTreatsP diagnosis

Figure 1-6: intersection attribute The ER-Model allows generalization and specialization of entity. In the object oriented world this is called inheritance. Definition 1.22 [Generalization and Specialization] An entity type that includes one or more distinct subgroupings of its occurrences is called a generalization. A distinct subgrouping of occurrences of an entity type is called a specialization. Example 1.6 [Generalisation] Consider an hospital where we have an entity physician and an entity patient. Both entity type have attributes in common like the first 1-12

name, the name, the birthdate etc. But it is possible that a physician becomes a patient. Then the same person would have a name as a physician and a name as a patient. This introduces redundancy in the model. To solve this problem we introduce a generalization. The generalization will be the entity person. Physician and patient become specialisations of person. With generalization/specialisation there are different possibilities to consider. 1. The specialisations may be disjoint or not disjoint. 2. The generalization can be abstract or not. Abstract means that each entity in the generalization must also be an entity of one (or more) specialisation. Generalization is defined in UML by a hollow arrow on the parent side of the relationship. Generalization is not a relationship between two entity types. It is a derivation of the general entity type to the specialized entity type. Cardinality is not allowed on generalization relationships. For an abstract class the name is written in italic. The different cases are shown in figure 1-7 Person

Room {disjoint}

{disjoint} woman

Man

case a)

operatingRoom case b)

Person

Person

{overlapping} Employee case c)

hospitalRoom

{overlapping}

Patient

physician

Patient

case d)

Figure 1-7: Notation for specialization

1.4.2

Developing an Entity Relationship Model (one way of doing it)

1. Identify Entities Identify the roles, events, locations, tangible things or concepts about which the end-users want to store data. 2. Find Relationships and determine the cardinalities Find the natural associations between pairs of entities (possibly use a relationship matrix). Determine the number of occurrences of one entity for a single occurrence of the related entity. 1-13

3. Identify generalization/specialization Determine if groups of entities can be grouped to a generalized entity. 4. Define Entity Keys Identify the data attribute(s) that uniquely identify one and only one occurrence of each entity. 5. Identify and Map Attributes Name the information details which are essential to the system under development. For each attribute, match it with exactly one entity that it describes. 6. Draw the diagramm 7. Check Results Does the final Entity Relationship Diagram accurately depict the system data? Example 1.7 [Developement of an ERM] The following simple example will be developed in the following sections: 1. A company has several departments. 2. Each department has a supervisor and at least one employee. 3. Employees must be assigned to at least one, but possibly more departments. 4. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. 5. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number. 6. A further information who must be stored is the number of hours that an employee works for a project. 1.4.2.1

Identify Entities

In this stage, you look through the information about the system and seek to identify the roles, events, locations, concepts and other tangible things that you wish to store data about. One approach to this is to work through the information and underline those words which you think correspond to entities. The entities in this system are Department, Employee, Supervisor and Project. One is tempted to make Company an entity, but it is a false entity because it has only one instance in this problem. True entities must have more than one instance.

1-14

1.4.2.2

Find Relationships

From point 2. in the description we know that there is exactly one supervisor for each department. It is not clear if a supervisor is in charge of only one department so we will assume this. run by: Department 1:1 Supervisor From point 2. Each department has at least one employee. From point 3. An employee belongs to at least one, but possibly more departments is assigned: Employee M:M Department From point 4. every project has at least one employee. An employee does not have to be assigned to a project. We assume that an employee may be assigned to more than one project. works on: Employee M:MC Project 1.4.2.3

Identify generalization/specialization

The entities employee and supervisor are both roles played by person in the system. Both entities will have common attributes (name, address etc.). In this case it is appropriate to introduce a generalized entity person. This entity is abstract because each person in the system is either an employee or a supervisor. The specialisations employee and supervisor are disjoint. 1.4.2.4

Define the Entity Keys

Always introduce keys without semantic meaning. The simplest thing is to give every entity a number to identify it. Note that specializations don’t need a key because they inherite there keys from the generalization. We introduce the Keys PersonNumber, DepartmentNumber and ProjectNumber. 1.4.2.5

Identify and Map Attributes

From point 5. All entities have a name. DepartmentName, PersonName (inherited by employee and supervisor) and ProjectName. From point 6. The number of hours an employee works for a project is an intersection attribute. So we introduce a new entity ProjectHour to map this attribute.

1-15

1.4.2.6

Draw the Diagram

Now all information is awayable to draw the diagram. This is given in figure 1-8 = ’1-jan-2007’

Conditions

Till now all examples only use comparison between an attribute and a literal or between two attributes in the WEHRE clause. In SQL more complex conditions are allowed. The

LIKE

Predicate With LIKE it is possible to compare an attribute with a pattern.

::= [NOT]

LIKE

In the pattern the metacharacter ’%’ stands for an arbitrary string (the empty string is also allowed). The metacharacter ’_’ stands for exactly one character. The next example finds all suppliers whose name has an A at the beginning. SELECT FROM WHERE

* supplier name LIKE ’A%’

In the next example all products with a ’%’ in the name are selected. Notice the use of the escape Character ’\%’ SELECT FROM WHERE

* product pname LIKE ’%\%%’

Subqueries in the WHERE Clause A subquery in the WHERE clause defines a set of tuples. The predicate IN permits to test if a value is in this set. The predicate EXISTS permits to test if the set is empty or not. Remark 2.7 [Nesting] Subqueries may be nested to arbitrary depth. 2-17

Predicate

IN

Syntax of this predicate is:

::= {|} [NOT] ::= (literal [,literal]...) | ()

IN

Example with a list of literals SELECT FROM WHERE

* supplier city IN (’Detroit’, ’New York’, ’Los Angeles’)

Select all suppliers where the product ’Sugar’ is present in an orderItem. SELECT FROM WHERE

* supplier sup ’Sugar’ IN (SELECT FROM

WHERE

pname sorder so

orderItem oi NATURAL JOIN product p sup.sNr = so.sNr) NATURAL JOIN

Remark 2.8 [Scope of the Aliases] In this example the alias “sup” from the outer query is used in the subquery. The logical meaning is that the inner query is performed for each tuple of the outer query. So in the subquery the value of “sup” is always well defined and contain the values of the actual supplier. Predicate

EXISTS

This predicate tests if the result of the subquery is not empty.

The example selects all supplier with an oderItem containing the product with pNr = 2. SELECT FROM WHERE

* supplier sup EXISTS ( SELECT FROM WHERE

* sorder so NATURAL JOIN orderItem io sup.sNr = so.sNr AND io.pNr = 2)

2-18

2.3.2.8

ORDER BY Clause

A relation is a set so there is no ordering of the tuples. In SQL the tuples of the result can be ordered by an ORDER BY clause. ::=

ORDER BY

[ASC|DESC], [ [ASC|DESC]]. . .

The example selects orderItems. They are ordered by supplier name, order date in descending order and product number. SELECT FROM ORDER BY

2.3.2.9

s.name, so.orderDate, io.pNr, io.qty supplier s NATURAL JOIN sorder so NATURAL JOIN orderItem oi s.name, so.orderDate DESC, io.pNr

GROUP BY Clause

A GROUP BY clause in SQL specifies that a SQL SELECT statement returns a list that is grouped by one or more columns, usually in order to apply some sort of aggregate function to certain columns. ::=

GROUP BY

The attributes appearing in the SELECT clause must all be defined in the clause. All aggregate functions may also appear in the SELECT clause.

GROUP BY

The next example lists how many suppliers are present in a city. SELECT FROM GROUP BY

s.city, COUNT(*) supplier s s.city

AS

supplierCount

The next command lists the number of ordered items of every product since 1 january 2007. SELECT FROM WHERE GROUP BY

p.pname, SUM(io.qty) AS itemCount (sorder so NATURAL JOIN orderItem oi) b.bestelldatum >= ’1-jan-2007’ p.pNr, p.pname

2-19

NATURAL JOIN

product p

2.3.2.10 The

HAVING Clause

HAVING

clause is a filter for groups created by the

GROUP BY

clause.

The next example lists all cities where more than 3 suppliers are present. SELECT FROM GROUP BY HAVING

2.3.2.11

s.city, COUNT(*) AS supplierCount supplier s s.city supplierCount >= 3

Null Values

Null values stand for undefined or unknown values. Null values can not be compared with other values. In SQL the comparison of a null value with another value is always false. With the predicate IS [NOT] NULL it is possible to find out if an attribute contains a null value. SELECT FROM WHERE

* order deliveryDate

IS NULL

Remark 2.9 [Aggregate Functions and Null Value] During the computation of the aggregate functions SUM, AVG, MAX and MIN tuples with a null value in the given attribute are simply skiped. So the functions compute the value of all tuples where the value is not null.

2-20

Chapter 3 Transactions and Data Updates 3.1

Insert, Update and Delete Operations

Remark 3.1 [Update is possible only on one Table] Updates are possible only on one table of the database. If more than one table is involved in a change then there must be an update operation for each table. To preserve the integrity of the database, update operations can be grouped in a transaction so that all operations are performed in an atomic way (all or nothing).

3.1.1

INSERT

With the INSERT command new tuples can be inserted into a table. The syntax of the operation is given below.

::=

[()] ::= VALUES () | SELECT FROM . . . [WHERE . . . ] INSERT INTO

The attributes in the

INTO

clause and in the

SELECT

clause must be compatible.

supplier (sNr, name, city, zip) VALUES (27, ’Smith’, ’Detroit’, ’48226’) INSERT INTO

3.1.2

UPDATE

With the UPDATE command existing tuples in a table can be changed. The update accepts a where clause to select the tuples to change. The syntax of the operation is given below. 3-1

::=

SET [WHERE ] ::= = [, = ]. . . ::= literal | attribut-name | () UPTATE

sorder orderDate = ’12-jun-2007’ orderDate = ’10-jun-2007’ AND sNr = 1

UPDATE SET WHERE

In the next example we set the stockQty of the produc with pNr = 1 to the sum of all orderItems of this product. UPDATE SET

product stockQty = (

SELECT SUM(qty) FROM WHERE

WHERE

orderItem pNr = 1)

pNr = 1

3.1.3

DELETE

With the DELETE command tuple can be removed from a table. The tuples are choosed with the WHERE clause.

::=

DELETE FROM WHERE

DELETE FROM [WHERE ]

supplier sNr = 12

If no tuple is referencing this supplier then it is deleted. Else it depends on then delete mode of the foreign key.

3.2

Data Integrity and Transactions

Generally many users work with the same data base. An important task of the DBMS consists in coordinating the read and the write operations of the different processes. Each user must see the database as if it would belong to him. The operations must be executed in such a way, that no user has to wait for a long time for a result. Another task of the database system is to ensure that the datas remain consistent even after a system crash (recovery). 3-2

3.2.1

Consistency of the Database

A major requirement for database system is to ensure the consistency of the data at any time Definition 3.1 [Consistency] A database is consistent if 1. all integrity constraints are achieved (Primary key constraint, referential integrity, assertions etc) 2. all attribute values are consistent with the actual reality. 3. all relevant data are stored in the database. Violation of the first point can be caused by • System crashes, disk crashes, hardware errors etc. • Errors in the application programms. • Concurrent updates. This problems can be handled by transactions. The user has the resposability to ensure the conditions two and three. This can not be tested by the database management system.

3.2.2

The Transaction Model

Definition 3.2 [Transaction] A transaction is a sequence of read and write operations which transform a consistent database in another consistent database. The duration of a transaction is determined by the application. The transaction begins with the first SQL statement and ends with a COMMIT or ROLLBACK command. A ROLLBACK will restore the database to the state it had at the beginning of the transaction. Transactions have the following four properties (ACID): Atomarity At the end of a transaction either all update operations are executed or none. Consistency If the state of the database is consistent before the transaction starts then it is also consistent at the end of the transaction. During the transaction the state may be temporarly inconsistent. Isolation Other user do not see the changes to the database until the end of the transaction. Durability At the end of the transaction all changes are permanantly stored and are secure. 3-3

3.2.3

Atomarity and Durability

3.2.3.1

The Logfile

The atomarity is enforced with the help of the database wide logfile. The following informations are stored in the file: • Begin of transaction •

UPDATE, INSERT

and DELETE operations. To enable rollback and recovery the old data values and the new data values are stored in the file. The transaction identification is also stored for each operation.

• End of transaction. Remark 3.2 [Write Ahead Log Protocoll] All changes must be written to the logfile before the changes a stored to the database. This principle is called write ahead log protocol. Entries in the logfile are written without the help of write buffer or memory cache so that they are not lost after a system crash. The logfile should not be located on the same disk than the database. If more security is needed , the disk where the logfile is located can be mirrored. The logfile contains enough informations to recover a consistent state of the database after a system crash. 3.2.3.2

Checkpoints

A transaction is considered as finished as soon as the commit-mark is writen to the logfile. Now all changes of the transaction could be written to the database. To obtain a better performance the data are not written directly but buffered in the main memory. The writing of the data to the database is delayed as long as possible. There are two situations when the data are written to disk: 1. The cash buffer is full and we need space for new data from the disk. Then the data, which were not used for the longest time, are written to the disk. The database system manages the buffer automatically. 2. When a Checkpoint is reached: In this case all modified data in the cash buffer are written to the disk. Checkpoints are fired automatically by the system at given time intervals or after a given number of transactions. During the writing all other update operations are blocked. When the writing is terminated, this information is written to the logfile with the identification of all running transactions. Checkpoints are used to accelerate the recovery after a system crash. They define a known state of the data in the database.

3-4

3.2.3.3

Recovery

Consider figure 3-1. There five paralel transactions are shown. REDO UNDO A1

T1

A2 A2

A1

T2

A3 A1

T3 A1

T4

A2 A1

T5

time Checkpoint

Crash

Logfile: BOT T1

BOT

A1

BOT

A1

A2

A1

EOT

A2

T4

T1

T2

T4

T1

T2

T1

T2

Checkpoint aktiv T2,T4

BOT

A3

A1

A2

BOT

A1

EOT

T3

T2

T3

T4

T5

T5

T3

BOT

A3

A1

A2

BOT

A1

EOT

Recovery Operations BOT

A1

Undo

Undo

T4

T4

Redo Redo Redo Undo Undo Undo Redo T3

BOT = Begining of Transaction

T2

T3

T4

T5

T5

T3

EOT = End of Transaction

Figure 3-1: Recovery with help of the logfile Transaktion T1 This transaction ended before the last checkpoint. All changes are stored in the database. Transaktion T2 This transaction is finished. We know that A1 and A2 are stored in the database. For the operation A3 a REDO operation is necessary. Transaktion T3 This transaction is finished. A REDO operation is necessary for A1. Transaktion T4 und T5 This transactions are not terminated. For all operations of the transactions an UNDO operation is necessary.

3.2.4

Isolation

When transactions are running concurrently, certain problems can arise. This problems and the solutions are discussed in this section. 3-5

3.2.4.1

lost Update

The situation is depicted in figure 3-2. Transaction A

time

FETCH R

t1 t2

UPDATE R

Transaction B

FETCH R

t3 t4

UPDATE R

Figure 3-2: Lost Update Transactions A and B both change the same tuple R. When time t4 is reached, the changes made by A at time t3 are lost. 3.2.4.2

Uncommitted Dependency

The situation is depicted in the figures 3-3 and 3-4.

Figure 3-3: Problems with open transactions In the first example at time t2, transaction B reads data modified by transaction A at time t1. At time t3 the data which were read by transaction B are no more correct because of the ROLLBACK command. Transaction A FETCH R

time

Transaction B

t1

UPDATE R t2

FETCH R

UPDATE R ROLLBACK

t3

Figure 3-4: The Update is lost

3-6

In this example the modifications of transaction B at time t3 are destroyed by the ROLLBACK of transaction A. 3.2.4.3

Inconsitent Analysis

The situation is depicted in figure 3-5. Because transaction B transferes 10 $ from account3 to account1 the sum computed by transaction A is 110 $ and not 120 $. account1 40

account2 50

Transaction A

FETCH account1 (40) sum = 40 FETCH account2 (50) sum = 90

FETCH account3 (20) sum = 110 (not 120)

Zeit

account3 30 Transaction B

t1 t2 t3

FETCH account3 (30)

t4

UPDATE account3 30 20

t5

FETCH account11 (40)

t6

UPDATE account1 40 50

t7

COMMIT

t8

Figure 3-5: Transaction a computes a wrong value

3.2.5

Locking

The problems described so far can be solved by a locking mechanism. When a transaction reads or writes data, it will lock the data. The data can not be changed by another transaction as long as they are locked. After a commit operation the locks are released. Now another transaction can access the data. 1. There are two sorts of locks. Exclusive locks (X-lock) and shared locks (S-lock). 2. If transaction A puts a X-lock on a tuple, no other transaction can lock this tuple. If transaction B wants to access the tuple, it must wait that A releases the lock. 3. If transaction A puts a S-lock on a tuple, then another transaction can put a S-lock on the same tuple. If transaction B wants to put a X-lock on the tuple, it must wait that all S-lock for this tuple are released.

3-7

The compatibility between different locks are shown in the table 3.1. X S X No No Yes S No Yes Yes - Yes Yes Yes Table 3.1: Compatibility of Locks 4. The locks are put on the tuple implicitly. If a transaction reads a tuple, then a S-lock is put on the tuple. If a transaction writes a tuple successfully, then a X-lock is put on the tuple. 5. At the end of the transaction (COMMIT or ROLLBACK) all X-locks and all S-locks are released. Now Consider what happens with the three problems (lost Update, Uncommitted Dependency and inconsistent Analysis) when locking is introduced. 3.2.5.1

lost Update

The situation is depicted in figure 3-6. Transaction A

time

FETCH R S−Lock auf R

Transaction B

t1 t2

t3 UPDATE R Wait for X−Lock wait t4 wait wait wait wait Deadlock

FETCH R S−Lock auf R

UPDATE R waiting for X−Lock wait wait

Figure 3-6: Lost Update with locking Now the updates of transaction A are no more overwritten, but there is a deadlock. Transaktion A waits that transaction B releases the S-Lock. Transaction B waits that transaction A releases the S-lock. 3.2.5.2

Uncommitted Dependency

The situation is depicted in the figures 3-7 and 3-8. In this case no deadlock arises. After transaction A and B the database is consistent.

3-8

Transaction A

time

Transaction B

t1

UPDATE R X−Lock on R

t2 t3

ROLLBACK Release Locks

t4

FETCH R wait for S−Lock wait for S−Lock wait for S−Lock wait for S−Lock resume:FETCH R S−Lock on R

Figure 3-7: Transaction B is waiting fro transaction A to finish Transaction A

time

Transaction B

t1

FETCH R

UPDATE R X−Lock on R ROLLBACK Release Locks

t2

FETCH R

t3

wait for S−Lock wait for S−Lock wait for S−Lock

t4

resume:FETCH R

UPDATE R X−Lock on R

Figure 3-8: Transaction B is waiting fro transaction A to finish to update the tuple

3.2.5.3

Inconsitent Analysis

The situation is depicted in figure 3-9. account1 40

account2 50

Transaction A

time

account3 30 Transaction B

FETCH account1 (40)

S−Lock on account 1

t1

sum = 40 FETCH account2 (50)

t2

S−Lock on account 2 sum = 90

t3

FETCH account3 (30)

S−Lock on account3 t4

UPDATE Konto3

X−Lock on account3 t5

30 20 FETCH account1 (40)

S−Lock on account1 t6 FETCH account3 (20)

t7

wait for S−Lock account3 wait wait Deadlock

UPDATE account1

wait for X−Lock account1 wait wait wait wait

Figure 3-9: Deadlock happens In this case there is no more inconsistency but a deadlock occurs. 3.2.5.4

Handling Deadlocks

The system is able to recognize deadlock situations. When the system detects a deadlock, then one of the responsible transaction will be selected. The system enforces a 3-9

rollback for this transaction. Another possibility is to abort the read or write operation for the selected transaction. In this case the transaction is not terminated and the user can try to repeat the operation.

3.2.6

Transactions and SQL

3.2.6.1

Begin of transaction

In SQL transactions are started implicitly. If no transaction is active and a “transaction initiating” statement is executed, then a new transaction is started. Properties of transactions can be set by the user with the SET TRANSACTION command. Remark 3.3 [Explicit Start of Transactions] In SQL-2003 transactions can be started explicitly with the START TRANSACTION command. The properties of the transaction can be set with this command. The following list shows all statements which do not initiate a new transaction. All other statements are “transaction initiating” statements. CONNECT DISCONNECT COMMIT ROLLBACK DECLARE CURSOR DECLARE LOCAL TEMPORARY TABLE BEGIN DECLARE SECTION END DECLARE SECTION WHENEVER all SET Befehle all GET Befehle 3.2.6.2

End of Transaction

A transaction is terminated by a COMMIT or a ROLLBACK statement. When a transaction T1 makes changes to the database this is only visible for other transactions after the COMMIT statement. Actions of the

COMMIT

statement.

1. Close all open cursor 2. Test all delayed integrity constraints. 3. All changes become persistent. Actions of the

ROLLBACK

statement.

1. Close all open cursor 3-10

2. All changes which was made during the transaction are canceled. Remark 3.4 [Nested Transaction] In SQL-2003 there is the posibility of nested transaction. At each time during a transaction a savepoint can be set. Now it is possible to make a rollback to a given savepoint. The main transaction remains open and can be terminated later. 3.2.6.3

Property of Transactions

The “access mode” and the “isolation level” of a transation can be set with the SET TRANSACTION statement. This statement can only be executed if no transaction is active. Syntax of the command: transactionoption ::= SET TRANSACTION [accessmode] [,isolationlevel] accessmode ::= READ ONLY | READ WRITE isolationlevel ::= SERIALIZABLE | REAPEATABLE READ | READ COMMITTED | READ UNCOMITTED

“access mode”: A READ ONLY transaction can only read the data. No updates are possible during a read only transaction. The default is READ WRITE. If the isolation level is READ UNCOMMITED, then the access mode must be READ ONLY. “isolation level”: The highest degree of isolation level is serializability. If all transactions on this degree work well, then it is guaranteed that the result of concurrent transactions gives the same result, as if the transactions where executed in serial manner (one speaks then of the ACIDS principle). The default for the isolation level is SERIALIZABLE. If another level is selected by the user, then the database system may select a higher one. The ordering of isolation levels is: READ UNCOMMITTED < READ COMMITTED < REAPEATABLE READ < SERIALIZABLE Depending on the isolation level of the transaction, the following may happen. 1. Dirty read: This is the same than “uncommited dependency” (section 3.2.4). Transaction T1 updates a tupel, then transaction T2 reads the tupel. Now transaction T1 makes a rollback and the data read by T2 are “dirty”. 2. Nonrepeatable read: The transaction T1 reads a tuple, then T2 update the tuple. Now T1 reads the tuple again and sees another value. This is the same problem than “inconsistent analysis” (section 3.2.4). 3. Phantoms Transaction T1 selects a set of tuples with a given select conditions. Now transaction T2 inserts a tuple which meets the given conditions. If the same query is executed again by transaction T1, there will be some new tuples (phantoms) in the result set. 3-11

The “isolation level” in SQL says which of the effects may hapen during a transaction. The following table shows wich effects are possible with which isolation level. isolation level dirty read nonrepeatable read phantom READ UNCOMMITTED yes yes yes READ COMMITTED no yes yes REPEATABLE READ no no yes SERIALIZABLE no no no

3-12

Chapter 4 Definition of the Database Schema 4.1

Data Definition Language (SQL/DDL)

The D(ata) D(efinition) L(anguage) (DDL) is the part of SQL with wich one can define the database Schema.

4.1.1

Data Type

The following table shows the standard data types. Type BOOLEAN SMALLINT

Instance Logical Value Integer Value

Example TRUE 4771

Fixpoint Number

1003.65

Floatingpoint Number

1.5E-4

Alphanumeric Character String

’A String’

Binary String

B’110100101’

Date Time Timestamp Time Interval

DATE’1951-11-24’ TIME’11:39:49’ TIMESTAMP’2002-08-23 14:15:00’ INTERVAL’48’ HOUR

INTEGER BIGINT DECIMAL(p,q) NUMERIC(p,q) FLOAT(p) REAL DOUBLE PRECISION CHAR(q) VARCHAR(q) CLOB BIT(q) BIT VARYING(q) BLOB DATE TIME TIMESTAMP INTERVAL

4-1

4.1.2

Relation Schema

A relation schema is called a table and is defined by its attributes. Each attribute must have a data type. 4.1.2.1

New Relation Schema

::= CREATE TABLE ( [,]... ) ::= | ::= { | } [] []...

The definition of and will be given in the section 4.1.3. Example 4.1 [New Relation Schema] Here is the definition of the schema for suppliers. CREATE TABLE

sNr name city zip

supplier ( INTEGER, VARCHAR(30), VARCHAR(30), CHAR(8)

)

4.1.2.2

Altering a Relation Schema

With the command ALTER TABLE the definition of a schema can be changed. It is not possible to alter the data type of an attribute. ::= ::=

ADD [COLUMN]

| | | |

The command

ALTER [COLUMN] {SET | DROP DEFAULT} DROP [COLUMN] [RESTRICT | CASCADE] ADD DROP CONSTRAINT [RESTRICT | CASCADE]

ALTER TABLE

DROP [COLUMN]

can not be executed in the following cases.

• Trying to delete the last attribute of the table.

4-2

• The delete mode is RESTRICT and the attribute is referenced in a constraint or a view definition. If the delete mode is CASCADE, then views and constraints referencing the attribute are also dropped. Example 4.2 [Altering a Relation Schema] Define the new attribute “street in the supplier table. ALTER TABLE

supplier

ADD

street

VARCHAR(30) DEFAULT

’unknown’

Drop the attribute ”zip“ from the supplier table. ALTER TABLE

supplier

DROP

zip

RESTRICT

The table can be altered even if tuples already exist in the table. 4.1.2.3

Dropping a Relation Schema

With the command DROP TABLE a table can be dropped from the database. The command also deletes all tuples in the given table. ::=

DROP TABLE

[RESTRICT |

The table is not dropped if the delete mode is a constraint or a view definition.

RESTRICT

CASCADE]

and the table is referenced in

Example 4.3 [Dropping a Relation Schema] Delete the supplier table from the database. DROP TABLE

4.1.3

supplier

CASCADE

Constraints

Constraints permits to exclude ”semantic incorrect“ datas from the database. SQL:1992 supports constraints for attributtes, relation schematas and general constraints called Assertion. All constraints have an explicit or implicit (created by the system) name. The name of the constraint is used in error messages. ::= [CONSTRAINT ] ::= |

4.1.3.1

Constraints for Attributes

This constraints are defined at the same time as an attribute and are valid for exactly one attribute in a table.

4-3

::=

NOT NULL

| | |

|

UNIQUE PRIMARY KEY

[ON DELETE ] [ON UPDATE ] CHECK()

REFERENCES

Explantion: •

NOT NULL

the attribute can not contain a null value.

•

UNIQUE

•

PRIMARY KEY

•

REFERENCES

•

CHECK

the value of the attribute must be unique over the whole table. One exception is the value NULL which can appear an arbitrary number times. defines the attribute as the primary key.

defines the attribute as a foreign key.

is a boolean expression which limits the range of possible values for this attribute.

Example 4.4 [Constraints for Attributes] CREATE TABLE

person (

peNr name firstName securityNr

INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, VARCHAR(30) NOT NULL, CHAR(14) UNIQUE

) CREATE TABLE

phobby (

peNr

INTEGER REFERENCES

person (peNr)

ON DELETE CASCADE,

hobby

VARCHAR(20) NOT NULL

(hobby PRIMARY KEY(peNr, hobby) CHECK

IN

(’Golf’, ’Theater’, ’Football’))

) 4.1.3.2

Constraints for Relation Schema

::=

{PRIMARY KEY | UNIQUE} () | FOREING KEY () REFERENCES table-name [()] [MATCH {SIMPLE | PARTIAL | FULL}] [ON DELETE ] 4-4

[ON UPDATE ] | CHECK (conditional-expression) ::= [,]... ::= {NO ACTION | CASCADE | SET DEFAULT |

•

UNIQUE

•

PRIMARY KEY

•

FOREIGN KEY

SET NULL}

key over more than one attribute. primary key (more than one attribute).

foreign key (more than one attribute). It is possible to specify matching rules and referential actions. Matching Rules specify the behavior of the system if an attribute of the foreign key contains a null value. If the value of an attribute is null, then the referential integrity is not tested. This is the default. MATCH PARTIAL either all attributes are null or the attributes wich are not null, match the attributes of at least one tuple in the referenced table. MATCH FULL either all attributes are null or the attributes wich are not null, match the attributes of exactly one tuple in the referenced table. MATCH SIMPLE

Referential Actions specify the behavior of the system when a tuple is deleted (ON DELETE) or changed (ON UPDATE) in the referenced table. do not delete the tuple if it is referenced by a tuple of this table. This is the default. CASCADE also delete all tuples in this table which reference the tuple in the referenced table. SET NULL attributes of the foreign key are set to null in the referencing tuples. NO ACTION

•

defines a integrity constraint for the table. The constraint can contain subqueries.

CHECK

Example 4.5 [Constraints for Relation Schema] CREATE TABLE

peNr name firstName

person ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, VARCHAR(20) NOT NULL,

UNIQUE(name,

firstName)

); CREATE TABLE

name firstName

hobbys ( VARCHAR(30), VARCHAR(20),

4-5

hobby

VARCHAR(20) NOT NULL,

PRIMARY KEY FOREIGN KEY

(name, firstName, hobby), (name, firstName) REFERENCES person (name, firstName)

ON DELETE CASCADE ON UPDATE CASCADE CHECK

(hobby

IN

(’Golf’, ’Theater’, ’Boxing’, ’Piano’))

)

Example 4.6 [Names for Constraints] In the next example the name of the constraint is defined explicitly by the user. CREATE TABLE

person (

p_nr

INTEGER,

name firstName

"Primary Key for person"PRIMARY KEY VARCHAR(30) NOT NULL, VARCHAR(20) NOT NULL, CONSTRAINT

,

"Name und firstName are a canditate key" firsName)

CONSTRAINT

UNIQUE(name,

);

4.1.3.3

ASSERTIONS

An assertion is a global constraint which is not bound to a given table. ::=

CREATE ASSERTION

CHECK

()

conditional-expression is an arbitrary SQL statement which is to true or false. The expression can contain more than one table. Example 4.7 [Assertions] The next assertion ensures that each person has at least one hobby. personhobby (NOT EXISTS (SELECT * FROM person p

CREATE ASSERTION CHECK

WHERE NOT EXISTS

(SELECT * FROM phobby ph WHERE p.peNr = ph.peNr)))

4-6

4.1.3.4

Modes for constraints

Each constraint has an initial constraint mode and a delay mode. • the constraint mode defines when the constraint is tested during a transaction. 1.

IMMEDIATE

tests the constraint immediatly after the execution of a SQL

statement. 2.

DEFERRED

delays the test to the end of the current transaction.

The initiale constraint mode is valid at the beginning of a transaction. It can be set explicitly by the INITIALLY IMMEDIATE or INITIALLY DEFERRED clause. • Only deferrable constraints can have the mode DEFERRED. Deferrabilty is set implicitly by the INITIALLY DEFERRED clause or explicitly by the DEFERRABLE clause. Now the full syntax for a constraint clause: [CONSTRAINT ] [[NOT] DEFERRABLE] [INITIALLY {IMMEDIATE | DEFERRED]} The defaults are

NOT DEFERRABLE

and

INITIALLY IMMEDIATE.

The delay mode can be changed during the transaction if the constraint is deferrable. ::= ::=

{DEFERRED | IMMEDIAT} | [, ]...

SET CONSTRAINTS ALL

There exist many examples where a constraint must be deferred. In the example 4.7 no person can be inserted because there is no hobby for this person. On the other hand the hobby can not be inserted as long the person does not exist. This problem can only be solved if the test of the constraint is delayed to the end of the transaction. The correct definition for the assertion must have the following form. CREATE ASSERTION CHECK

personhobby

(NOT EXISTS (SELECT * FROM person p WHERE NOT EXISTS

(SELECT * FROM phobby ph p.peNr = ph.peNr)))

WHERE

DEFERRABLE INITIALLY DEFERRED

4-7

4.2

Normalization

Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.

4.2.1

First Normal form 1NF

Definition 4.1 [First Normal Form 1NF] A table is in first normal form (1NF) if and only if it faithfully represents a relation. One requirement of a relation is that every table contain exactly one value for each attribute. Given that database tables embody a relation-like form, the defining characteristic of one in first normal form is that it does not allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents duplicate rows) and without any nullable columns is in 1NF. Example 4.8 [Relation in 1NF] Consider the following relation schema about lecturer and the course they give. FacultyCourse = ({lNr, name, city, fNr, facultyName, cNr, courseName, weekDay}, {lNr, cNr}) Some explanation: • Every lecturer has a name, a city and belongs to exactly one faculty. • A lecturer can give more than one course. It ist possible that the lecturer has no courses (sabbatical). • The same course can be teached by more than one lecturer. • Each course has a code and a unique name. • Each faculty has a code and a unique name. • weekDay is the day when the course is teached by this lecturer. In the next table a possible relation over this schema is shown. lNr 1 1 2 2

name Jones Jones Davis Davis

city New York New York Boston Boston

Lecturer and Their fNr facultyName 120 Mathematics 120 Mathematics 135 Physics 135 Physics 4-8

Courses cNr courseName 222-1 Analysis 440-1 Linear Algebra 334-2 Mechanics 440-1 Linear Algebra

weekDay Monday Friday Tuesday Thursday

The relation in the example 4.8 is in first normal form but has some update anomalies insert: It is not possible to insert a lecturer if he gives no course because cNr belongs to the primary key. delete: When the last course of a lecturer is deleted, all informations about the lecturer are lost. update: If Jones moves from New York to Boston, all tuples about Jones must be changed. Otherwise the data base is no longer consistent.

4.2.2

Functionale Dependency

In the normalization process we have to know how attributes are dependent of each other. This leads to the term of functional dependency (FD). Definition 4.2 [Functional Dependency] Let R be a relation schema, X ⊆ R and Y ∈ R. Attribute Y has a funcional dependency on Xif, for each value of the attributes in X, there is exactly one value of the attribute B. The notation is R.X → R.Y Formaly: Let R be a relation schema, X ⊆ R and Y ∈ R. Attribute Y has a functional dependency on X if, for all relation r ∈ REL(R) R.X → R.Y ⇔ ∀t1 , t2 ∈ r : t1 (X) = t2 (X) ⇒ t1 (Y ) = t2 (Y ) All non key attributes in a relation are functionaly dependent on the attributes of a key (primary or candidate). In the example 4.8 we can find the following FDs. lNr → name cNr → courseName (lNr, cNr) → weekDay Definition 4.3 [Full functional dependency] An attribute Y of a relation schema R is fully functionally dependent on a set of attributes X ⊆ R if it is • functionally dependent on X, and • not functionally dependent on any proper subset of X. In the example 4.8 the following functional dependency hold. (lNr, name) → city It is not a full functional dependency because the following FD also holds. lNr → city Remark 4.1 [FD is a Semantik Notion] The notion of functional dependency is a semantic one. Recognizing FDs is only possible if the meaning of the attributes are known. Each FD is modelling a fact from the system. 4-9

4.2.3

Second Normal Form 2NF

The problems in example 4.8 is, that there are attributes which are not fully functional dependent on the primary key. This introduces redundancy in the database. All full functional dependency are shown in figure 4-1. name city

facultyName Key lNr

fNr

weekDay cNr

courseName

Figure 4-1: Full functional dependency in the relation schema FacultyCourse Definition 4.4 [Second Normal Form 2NF] A relation schema R is in second normal form (2NF) if • it is in first normal form and • every non key attribute is fully functional dependent on every key in the schema. The task now is to transform the relation schema in 1NF in a system of relation schematas in 2NF. To do this we consider wich non key attributes are full functional dependent on which attributes of the primary key. • The attributes name, city, fNr and facultyName are fully functional dependent on the attribute lNr. • The attribute courseName is fully functional dependent on the attribute cNr • The attribute weekday is fully functional dependent on the attributes (lNr, cNr). For each of this groups we build a new relation schema. So we obtain the three schematas: Lecturer Course LecturerCourse

= ({lNr, name, city, fNr, facultyName} {lNr}) = ({cNr, courseName}, {cNr}) = ({lNr, cNr, weekDay}, {lNr, cNr}, fk({lNr}, Lecturer), fk({cNr, Course}))

4-10

To obtain the relation over the new schematas we simply make the three projections: • Lecturer = πlN r,name,city,f N r,f acultyN ame (F acultyCourse). • Course = πcN r,courseN ame (F acultyCourse) • LecturerCourse = πlN r,cN r,weekDay (F acultyCourse) The result of this projections are shown inf figure 4-2.

lNr name 1 Jones 2 Davis

Lecturer city fNr New York 120 Boston 135 lNr 1 1 2 2

facultyName Mathematics Physics

cNr 222-1 440-1 334-2

Courses courseName Analysis Linear Algebra Mechanics

LecturerCourse cNr weekDay 222-1 Monday 440-1 Friday 334-2 Tuesday 440-1 Thursday

Figure 4-2: Lecturer and their courses in 2NF Remark 4.2 [Relationship] The relation schema LecturerCourse represents a M:M relationship between Lecturer and Courses. The attribute weekDay is an intersection attribute. Consider the case where the attribute weekDay is not present. Then the schema LectureCourse is still necessary to represent the M:M relationship between Lecturer and Course. All full functional dependency of the Lecturer-Course system in 2NF are shown in figure 4-3. Lecturer

facultyName fNr

lNr

city name

Course cNr

LecturerCourse lNr

courseName

weekDay cNr

Figure 4-3: Full functional dependency of the Lecturer-Course system in 2NF The two relation schematas Course and LecturerCourse are not only in 2NF but also in 3NF. Only the relation schema Lecturer is not fully normalized. The following update anomalies are still present: 4-11

insert: A new faculty can only be inserted if a lecturer belongs to the faculty. delete: If we delete the last lecturer of a faculty, the informations about the faculty are also lost. update: If the name of the faculty changes then this must be changed for every lecturer of the faculty.

4.2.4

Third Normal Form (3NF)

The problems in the relational schema Lecturer is caused by the two following functional dependencies. lN r → f N r and f N r → f acultyN ame This is called a transitive dependency. The attribute facultyName is transitvely depependent on the attribute lNr over the attribute fNr. This introduces redundancy in the relation. Definition 4.5 [Third Normal Form 3NF] A relationschema R is in third normal form (3NF) if • it is in 2NF and • Every non key attribute of the table is non-transitively dependent on each key of the schema. In order to eliminate a transitive dependency in a relational schema, it must be split into two different schematas. Eliminate the transitive attribute in the old schema and place it, with the non-key attribute it depends on, in the new schema. For the relation schema Lecturer we obtain the to schematas: Lecturer Faculty

= ({lNr, name, city, fNr}, {lNr} fk({fNr}, Faculty)) = ({fNr, facultyName}, {fNr})

To obtain the relation over the new schematas we simply make the two projections: • F aculty = πf N r,f acultyN ame (Lecturer) • Lecturer = πlN r,name,city,f N r (Lecturer). The result of this projections are shown inf figure 4-4. All full functional dependency of the Lecturer-Course system in 3NF are shown in figure 4-5. Theorem 4.1 [Transformation to 3NF] Every relation schema can be decomposed in a system of relational schematas which are in 3NF and preserves all dependencies of the original schema. This means no informations is lost. 4-12

lNr 1 2

Lecturer name city fNr Jones New York 120 Davis Boston 135

fNr 120 135

Faculty facultyName Mathematics Physics

Figure 4-4: Lecturer in 3NF Lecturer

Faculty fNr

lNr

fNr

city

facultyName

name

LecturerCourse Course lNr cNr

weekDay

courseName

cNr

Figure 4-5: full functional dependency of the Lecturer-Course system in 3NF

4.2.5

The Boyce/Codd Normal Form BCNF

Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table which does not contain multiple overlapping candidate keys is guaranteed to be in BCNF. Depending on what its functional dependencies are, a 3NF table containing two or more overlapping candidate keys may or may not be in BCNF. Consider the following relation schema with the attributes TutorID, TutorSocSecurityNr and StudentID with the following rules • A tutor has exactly one social security number and a social security number is assigned to exactly one tutor. • More than one student can be associated with a single tutor. • More than one tutor can be associated with a single student. From the rules we can deduce the following candidate keys, which happen to overlap. • {TutorID, StudentID} • {TutorSocSecurityNr, StudentID} From the rules we can also deduce the following functional dependencies: • T utorID → T utorSocSecurityN r 4-13

• T utorSocSecurityN r → T utorID An example for a relation over this schema is shown in the table 4.1. Tutor/Student Cross-Reference TutorID TutorSocSecurityNr StudentID 1078 088-51-0074 31850 1078 088-51-0074 37921 1293 096-77-4146 46224 1480 072-21-2223 31850 Table 4.1: Tutor/Student Cross-Reference Because all attributes belongs to a key the relation schema is in 3NF. But the schema still has some update anomalies. The candidate keys of the Tutor/Student CrossReference table cannot prevent a situation in which two different Tutor IDs are shown, illegitimately, as corresponding to the same tutor social security number. So the functional dependency T utorSocSecurityN r → T utorID can be violated. Definition 4.6 [Boyce/Codd Normal Form BCNF] A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {EmployeeID, EmployeeAddress} → EmployeeAddress is trivial EmployeeAddress → EmployeeAddress is also trivial A table is in Boyce-Codd normal form (BCNF) if and only if, for every one of its nontrivial functional dependencies X → Y , X is a superkey that is, X is either a candidate key or a superset thereof. In the example T utorSocSecurityN r → T utorID is a non trivial functional dependency but TutorSocSecurityNr is not a superkey of this schema. The solution of this problem is to split the relation schema. TutorStudent = ({TutorID, StudentID}, {TutorID, StudentID}, fk({TutorID}, Tutor)) Tutor = ({TutorID, TutorSocSecurityNr}, {TutorID}, ck(TutorSocSecurityNr)) Both tables are in BCNF; and so long as the uniqueness of all candidate keys is enforced, there is no possibility of showing multiple distinct Tutor IDs against the same Tutor Social Security Number or vice-versa. The resulting relations are shown in table 4.2 4.2.5.1

Achievability of BCNF

In some cases, a non-BCNF table cannot be decomposed into tables that satisfy BCNF and preserve the dependencies that held in the original table. Beeri and Bernstein showed in 1979 that, for example, a set of functional dependencies {A, B → C, C → B} 4-14

Tutor/Student Cross-Reference TutorID StudentID 1078 31850 1078 37921 1293 46224 1480 31850

Tutor TutorID TutorSocSecurityNr 1078 088-51-0074 1293 096-77-4146 1480 072-21-2223

Table 4.2: Tutor/Student Cross-Reference cannot be represented by a BCNF schema. Thus, unlike the first three normal forms, BCNF is not always achievable. Consider the following relation schema with the attributes Student, Course and Lecturer with the following rules: • A course is teached to a student by exactly one lecturer. • Each lecturer teaches exactly one course. • Each course can be teached by more than one lecturer. From the rules we can deduce the following functional dependencies. • (Student, Course) → Lecturer • Lecturer → Course Those this schema follow the {A, B → C, C → B} pattern. An example for a relation over this schema is shown in the table 4.3. Student-Course-Lecturer-Relation Student Course Lecturer Peter Mathematics Smith Peter Physics Jones Mark Mathematics Smith Mark Physics Davis Table 4.3: Student-Course-Lecturer-Relation From the rules and the functional dependency we can deduce that the relation has two candidate keys. {Student, Course} and {Student, Lecturer}. Since there is no non-key attribute in this relation schema, it is in 3NF. But the schema still has some update anomalies. insert: The fact that a lecturer is teaching a course can not be recorded if no student takes the course. 4-15

delete: Deleting the fact that Mark takes the course in physics, also destroys the fact that Davis teaches physics. In the example Lecturer → Course is a non trivial functional dependency but Lecturer is not a superkey of this schema. The solution of this problem is to split the relation schema. StudentLecturer = ({Student, Lecturer}, {Student, Lecturer}, fk({Lecturer}, LecturerCourse)) LecturerCourse = ({Lecturer, Course}, {Lecturer}) The relations are shown in table 4.4. StudentLecturer Student Lecturer Peter Smith Peter Jones Mark Smith Mark Davis

LecturerCourse Dozent Fach Smith Mathematics Jones Physics Davis Physics

Table 4.4: Relation Schematas in BCNF Unfortunately, although this design adheres to BCNF, it is unacceptable on different grounds: it allows us to record multiple lecturer which teaches the same course against the same student. In other words, its candidate keys do not guarantee that the functional dependency (Student, Course) → Lecturer will be respected. The only solution is to introduce a global integrity

4.3

ERD and the Database Schema

In this section a methode is described to transform an ERD to a database schema with the following property 1. All informations in the ERD must be mapped to the database schema. 2. All relations are in 3NF 3. the number of relation schematas is minimal. The translation is relativly simple and could be performed by an algorithm. For every element in the ERD a table or an attribute is created in the database schema.

4-16

4.3.1

Entity sets and Attributes

For entity sets create a table with the same name. The key of the entity becomes the primary key. The figure 4-6 shows the entity person person peNr title[0..1] name firstName birthDate hobby[1..*]

Figure 4-6: Entity Person The steps to create a schema are described below 1. Create a table with the same name as the entity and define the entity key as primary key. CREATE TABLE

peNr

person ( INTEGER PRIMARY KEY

) 2. All attributes with type 1 cardinality can be added to the schema with the NULL constraint. CREATE TABLE

person (

peNr name firstName birthDate

INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, VARCHAR(30) NOT NULL, DATE NOT NULL

) 3. Then add all attributes with type C cardinality without constraint. CREATE TABLE

peNr title name firsName birthDate

person ( INTEGER PRIMARY KEY, VARCHAR(15), VARCHAR(30) NOT NULL, VARCHAR(30) NOT NULL, DATE NOT NULL

)

4-17

NOT

4. Attributes with type M or type MC cardinality must be stored in an own table (one table for each such attribute). The primary key is the primary key of the main schema plus the attribute. Then there must be a foreign key to the main table. In the example hobby has cardinality MC and must be stored in an own table. CREATE TABLE

personhobby (

peNr

INTEGER REFERENCES

hobby

VARCHAR(20)

person(peNr)

ON DELETE CASCADE, CHECK(hobby IN PRIMARY KEY(peNr,

(’Football’, ’Piano’, ’Theater’, ’Golf’, ’Boxing’, ’Cooking’)),

hobby)

)

In this example type MC cardinality is realized for the attribute hobby. For type M cardinality an assertion is necessary. "Hobby has cardinality M" (NOT EXISTS (SELECT * FROM person LEFT JOIN personhobby WHERE hobby IS NULL))

CREATE ASSERTION CHECK

DEFERRABLE INITIALLY DEFERRED

Note that The test of the assertion must be deferred to the end of the transaction

4.3.2

Generalization and Specialisation

There are two possibilities for the implementation of specialization: 4.3.2.1

Vertical Representation

In this representation the each specialization is implemented as a table. The primary key of this tables are also foreign keys to the entity who represents the generalization. Example 4.9 [Implementation of Generalization] Consider the three relation schematas. CREATE TABLE

peNr name firstName birtDate

person ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, VARCHAR(30) NOT NULL, DATE NOT NULL

)

4-18

CREATE TABLE

physician (

peNr speciality

INTEGER PRIMARY KEY, VARCHAR(30)

) CREATE TABLE

patient (

peNr healthInsurance

INTEGER PRIMARY KEY, VARCHAR(30)

) To represent person as a generalization of physician and patient we introduce two foreign keys. CREATE TABLE

physician (

p_nr

INTEGER PRIMARY KEY REFERENCES

person(p_nr)

ON DELETE CASCADE,

speciality

VARCHAR(30)

) CREATE TABLE

patient (

peNr

INTEGER PRIMARY KEY REFERENCES

person(p_nr)

ON DELETE CASCADE,

healthInsurance

VARCHAR(30)

) This representation is correct: • A physician or a patient can be only one person because peNr is the primary key and the foreign key. • An entity patient or physician can only exists if it also exists as a person. This representation only fits case d) in the figure 1-7. Cases a) to c) can only be implemented correctly with some constraints. • Case a) and b): The specialisations are disjoint: "Physician and Patient are disjoint" (NOT EXISTS (SELECT * FROM person NATURAL JOIN physician NATURAL JOIN patient))

CREATE ASSERTION CHECK

• Case a) und c): A person must be physician or patient.

4-19

CREATE ASSERTION CHECK

"person is abstract"

(NOT EXISTS (SELECT * FROM person p WHERE NOT EXISTS

(SELECT * FROM physician ph WHERE p.peNr = ph.peNr) AND NOT EXISTS

(SELECT * FROM patient pa WHERE p.peNr = pa.peNr))) DEFERRABLE INITIALLY DEFERRED

Remark 4.3 [Assertions] Today no database system implements assertions. They must be simulated in the application with triggers and stored routines. 4.3.2.2

Horizontal Representation

In this representation specialization has no own entity. All attributes of the specialization are stored in th main entity. In the main entity an attribute is introduced to store which sort of entity is present. Example 4.10 [Horizontal Representation] person, patient and physician example. CREATE TABLE

person (

peNr name firstName birthDate isphysician speciality ispatient healthInsurance

INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, VARCHAR(30) NOT NULL, DATE NOT NULL, BOOLEAN NOT NULL, VARCHAR(30), BOOLEAN NOT NULL, VARCHAR(30)

) If physician and patient are disjoint the following constraint will express this. person ADD CONSTRAINT "Physician and Patient are disjoint" CHECK (NOT(isphysician = TRUE AND ispatient = TRUE)) ALTER TABLE

If person is abstract, then add the following constraint. ADD CONSTRAINT "person is abstract" CHECK (isphysician = TRUE OR ispatient = TRUE)

4.3.3

ALTER TABLE

Relationships

Figure 4-7 shows the most important cases of relationships. 4-20

person

Physician phNr

1

treats

1..*

Patient paNr

Physician phNr

1..*

treats

1..*

Patient paNr

Figure 4-7: Relationships 4.3.3.1

Type 1 to Type M Relationship

In this case a foreign key is introduced from patient to physician. CREATE TABLE

phNr . .

physician ( INTEGER PRIMARY KEY

) CREATE TABLE

paNr phNr . .

patient ( INTEGER PRIMARY KEY, INTEGER NOT NULL REFERENCES

physician(phNr),

)

If it is possible that patients exist which are treated by no physician, then null values for the foreign key are allowed. CREATE TABLE

paNr phNr . .

patient ( INTEGER PRIMARY KEY, INTEGER REFERENCES

physician(phNr),

)

To be sure that each physician is treating at least one patient, the following constraint must be introduced. "Physician treats 1 Patient" (NOT EXISTS (SELECT * FROM physician LEFT JOIN patient WHERE paNr IS NULL)

CREATE ASSERTION CHECK

DEFERRABLE INITIALLY DEFERRED

4-21

4.3.3.2

Type M to Type M Relationship

In this case a new table is introduced to represent the relationship. This table has a foreign key to physician and one to patient. The primary key is composed of both foreign keys. CREATE TABLE

phNr . .

physician ( INTEGER PRIMARY KEY

) CREATE TABLE

paNr . .

patient ( INTEGER PRIMARY KEY,

) CREATE TABLE

phNr paNr

PhTreatsP ( INTEGER REFERENCES INTEGER REFERENCES

PRIMARY KEY

physician(phNr), patient(paNr),

(phNr, paNr)

)

Also in this example the conditional case (MC:MC) is represented. To represent (M:M or MC:M) an assertion is necessary.

4.3.4

Intersection Attribute

The representation of intersection Attributes depends on the cardinality of the relationship. 4.3.4.1

Type 1 to Type M Relationship

In figure 4-8 diagnosis and medicine are intersection attributes. In this example there is a 1:M cardinality between physician and patient, so the intersection attributes can be stored in patient. Because medicine has cardinality MC it must be stored in an own relation. CREATE TABLE

phNr . .

physician ( INTEGER PRIMARY KEY

) 4-22

Physician phNr

1

treats

1..*

Patient paNr

PhTreatsP diagnosis medicine[0..*]

Figure 4-8: Intersection Attributes in a 1 to M Relationship

CREATE TABLE

patient (

paNr phNr diagnosis . .

INTEGER PRIMARY KEY, INTEGER REFERENCES

physician(phNr),

CLOB,

) CREATE TABLE

PaMedicine (

paNr

INTEGER REFERENCES

patient(paNr)

ON DELETE CASCADE

medicine

VARCHAR(30),

PRIMARY KEY

(paNr, medicine)

)

4.3.4.2

Type M to Type M Relationship

In figure 4-9 diagnosis and medicine are intersection attributes. In this example there is a M:M cardinality between physician and patient. But in this case the relationship is represented by the entity PhtreatsP so the intersection attributes can be stored there. The attribute medicine has cardinality MC and must be stored in a new relation. Physician

1..*

treats

1..*

Patient

PhTreatsP diagnosis

Figure 4-9: Beziehungsattribut bei einer textbfM zu M Beziehung Die entsprechenden Relationenschematas sehen also folgendermassen aus: CREATE TABLE

phNr

physician ( INTEGER PRIMARY KEY

4-23

. . ) CREATE TABLE

patient (

paNr . .

INTEGER PRIMARY KEY,

) CREATE TABLE

PhTreatsP (

phNr paNr diagnosis

INTEGER REFERENCES INTEGER REFERENCES

physician(phNr), patient(paNr),

CLOB,

PRIMARY KEY

(phNr, paNr)

) CREATE TABLE

PaMedicine (

phNr paNr medicine

INTEGER INTEGER VARCHAR(30),

(phNr, paNr, medicine) FOREIGN KEY(phNr, paNr) REFERENCES PhTreatsP(phNr, paNr) PRIMARY KEY

ON DELETE CASCADE

)

4.3.5

Name Conflict

If possible the attributes of a foreign key have the same name than the attributes of the referenced primary key. Sometimes this is not possible. Consider the example in figure 4-10. Product prNr 0..*

Person peNr

0..*

1 sells produces

Figure 4-10: Namenskonflikte 4-24

1

In this example there are two relationships between person and product. The rules say that two foreign keys must be introduced in the relation product. This two foreign keys can not share the same attributes. We have to rename the attributes of at least one foreign key. We choose sPeNr for selling and pPeNr for producing. CREATE TABLE

peNr

person ( INTEGER PRIMARY KEY

CREATE TABLE

prNr pPeNr sPeNr

)

product ( INTEGER PRIMARY KEY, INTEGER REFERENCES INTEGER REFERENCES

person(peNr), person(peNr)

)

4.3.6

Verify 3NF

The constructed database schema must be verified to ensure that all relation schematas are in 3NF. 4.3.6.1

All Schematas are in 2NF

Because all entities in the ERM have a surogate key, all schematas which represent an entity have a key with only one attribute an can not violate the 2NF. Only relation schematas representing M:M relationships or attributes with cardinality M have composed primary keys in the database schema. 1. Attributes with cardinality M In this case all attributes of the schema belong to the primary key and a violation of the 2NF is not possible. 2. M to M relationship In this case all attributes which do not belong to the primary key are intersection attributes. An intersection attribute depends on both entity of the relationship. This means the attribute is full functional dependent on the primary key of the entity representing the relationship. 4.3.6.2

Violation of the 3NF

There are two cases where the 3NF can be violated. Local violation of the 3NF This can only happen if the ERD is not correct and that a depency between two attributes is not represented. This leads to a transitve dependency in the database schema. In this case normalize the concerned relation schema. 4-25

Global violation of the 3NF This case happens if there is a functional dependency of the form sNr → zip → city But unlike to the local case the dependency between zip and city was represented in the ERD. This means that in the database schema there exists a relation of the form CREATE TABLE

zip city . .

zipCity ( CHAR(8) PRIMARY KEY, VARCHAR(30) NOT NULL,

),

which records this fact. In this case the normalization is very simple. Example 4.11 [Global violation of the 3NF] In this example city is functionaly dependent on zip. This fact is recorded in the relation schema zipCity. CREATE TABLE

sNr name zip city

supplier ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, CHAR(8) NOT NULL, VARCHAR(30) NOT NULL

) CREATE TABLE

zip city . .

zipCity ( CHAR(8) PRIMARY KEY, VARCHAR(30) NOT NULL

)

In supplier there is the transitiv dependency sNr → zip → city. To correct this introduce a foreign key between supplier and zipCity and eliminate the attribute city in supplier. CREATE TABLE

sNr name zip

supplier ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, CHAR(8) NOT NULL REFERENCES

)

4-26

zipCity(zip)

There is a criterion, to find global violation of the 3NF in the database schema. A relation schema R possibly violates the 3NF if the two following conditions hold. 1. The relation schema R has a foreign key a which references a relation ER 2. The schematas R and ER have a common attribute b.

4.3.7

Example

4.3.7.1

Description of the problem

Consider the following rules for a company: 1. A person operates several machines. 2. A person produces several products. 3. A machine is operated by one person. 4. A machine produces several products. 5. The production of a product requires one machine. 6. The production of a product requires one person. 7. Each person has a name. 8. Each product has a name. 9. The dates of maintenace of a machine are recorded. There are not more than one maintenance for the same machine on the same day. 10. The production of a product needs one or more other products. The ERD is shown in figure 4-11. 0..*

Product prNr pName

Machine mNr machineType maintenance[0..*]

8

9

7

Person peNr name

10 0..*

1..*

1..*

5

needs

6

1 produces

4

1..*

3

1 operates

1

produces

Figure 4-11: Das Personen-Produkt-Maschine Beispiel

4-27

1

2

4.3.7.2

Impelementing the Database Schema

Entities and Attributes For each entity in the ERD we define a table in the database. CREATE TABLE

peNr name

Person ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL

) CREATE TABLE

prNr pName

Product ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL

) Machine ( mNr INTEGER PRIMARY KEY, machinType VARCHAR(40) NOT NULL

CREATE TABLE

)

Create an own schema for the maintenance date. CREATE TABLE

mNr

MaschineMaintenance ( INTEGER REFERENCES

Maschine(m_nr)

ON DELETE CASCADE,

maintenance DATE, PRIMARY KEY(mNr,

maintenance)

)

Generalization and Specialization In this example there are no generalizations. Relationships Introduce foreign keys for the three 1:M relationships ”person produces product“, ”machine produces product“ and ”person operates machine“. CREATE TABLE

pNr pName mNr peNr

Product ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, INTEGER REFERENCES INTEGER REFERENCES

) CREATE TABLE

Machine ( 4-28

Machine(mNr), Person(peNr)

mNr machineType peNr

INTEGER PRIMARY KEY, VARCHAR(40) NOT NULL, INTEGER REFERENCES

Person(pe_nr)

)

Introduce a relation schema for the MC:MC relationship ”product needs product“. CREATE TABLE

PartsList (

prNr

INTEGER REFERENCES

component

INTEGER REFERENCES

Product(prNr)

ON DELETE CASCADE,

PRIMARY KEY(prNr,

Product(prNr),

component)

)

Intersection Attributes In this example there are no intersection attributes. Verify the 3NF The relation schema Product has a foreign key which references the schema Machine. Product and Machine have the common attribute peNr. The criterion for a violation of the 3NF is fulfilled. In the example there is realy a violation of the 3NF. The functional dependency mN r → peN r (”Machine is operated by person“) is stored in the schema Product and in the schema machine. So dropp the attribute peNr from the schema Machine. Here is the complete database schema. CREATE TABLE

peNr name

Person ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL

) CREATE TABLE

prNr pName mNr

Product ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, INTEGER REFERENCES

Machine(mNr)

) Machine ( mNr INTEGER PRIMARY KEY, machinetype VARCHAR(40) NOT NULL, peNr INTEGER REFERENCES Person(peNr)

CREATE TABLE

) CREATE TABLE

MachineMaintenance ( 4-29

mNr

INTEGER REFERENCES

Machine(mNr)

ON DELETE CASCADE,

maintenance

DATE,

PRIMARY KEY(mNr,

maintenance)

)

Remark 4.4 [3NF] To find the global violation of the 3NF it is important to test if a transitive dependency really exists. In our example the functional dependency mNr → peNr is valid in Product and Machine. This is not always true. If the attribute peNr in product has the meaning ’person sells product’, then there is no funtional dependency mNr → peNr in Product. In this case there is no transitive dependency and the attribute peNr can not be dropped from Produkt.

4-30

Chapter 5 Java Database Connection JDBC This chapter is a short introduction to JDBC. More informations and a complete description of all methods can be found at the internetaddress http://java.sun.com/j2se/1.5/docs/api/index.html the online documentation is in the package java.sql.

5.1

What is JDBC?

Here are some general informations about JDBC. • JDBC = Java DataBase Connectivity can be compared to ODBC (Open Database Connectivity) from Microsoft. • JDBC is a low level or call level API to – Get a connection to a given database – Send SQL-Queries and receive the results. – Analyze results of SQL-Queries – Get informations about the database metadata (tables, keys, procedures etc). • With JDBC it is possible to write programs in a product independent manner, to get portable code. It is also possible to use dedicated commands from a given database.

5.2 5.2.1

Architecture of JDBC Structure of the Packages

In the figure 5-1 the structure of the JDBC packages is depicted.

5-1

UserPackage

java.sql

VendorPackage

Figure 5-1: Structure of the packages

Description java.sql: This package belongs to the standard since version 1.1 of the java API. VendorPackage: This package contains an implementation class for every interface in sql.java. It contains the functionality of JDBC and is provided by the database vendor. UserPackage: This package is provided by the user and uses the different interfaces in java.sql. The package is also responsible for loading the correct JDBC driver for the given database product.

5.2.2

The package java.sql

Remark 5.1 [Rules for the use of java.sql] So that the code remains independent from the given database the following rules must be observed when objects are created: • Objects of implementation classes of the vendor package are never created with new, but with a corresponding get- or create-method in an other interface. • The names of the implementation classes are never used in the user package. The only known name is the name of the driver for the specific database. The drivers must be loaded by the user. Here there is the description of the functionality of each interface and each class in the figure 5-2. Driver: The driver is responsible for the communication between the database and the user. The implementation of the driver must have a “static code section”. This code is executed when the driver class is loaded. It should perform the following tasks 1. Create an instance of the driver. 2. Register the instance of the driver in the class Drivermanager with the method DriverManager.registerDriver. It is possible to load different drivers. So it is also possible, to communicate with different databases in the same program A driver can be loaded with the static method Class.forname. The statement 5-2

ResultSet

Statement

DriverManager

Driver

ResultSetMetaData

PreparedStatement

Connection

DatabaseMetaData

CallableStatement

Figure 5-2: The package java.sql Class.forname("com.mysql.jdbc.Driver") will load the Mysql JDBC driver. To load the driver make sure, that it is in the class path. DriverManager: This class contains only static methods. The constructor is private, so no instance of this class can be created. For the user, the only important method is getConnection, which creates a connection to the database. A JDBC URL must be passed as a parameter to the method. This URL identifies which database to use. The JDBC URL has the following format jdbc:: – is the official name of the driver or of a connectivity mechanism (eg. odbc). The producers of drivers can register the name of there subprotocols at Javasoft. – this part of the URL identifies the database. The exact syntax depends on the subprotocol. The statement Connection con = DriverManager.getConnection( "jdbc:mysql://localhost/demodb" "Name", "Password") creates a link to the database demodb on localhost and creates a new connection object con. In this example mysql is the subprotocol and //localhost/demodb is the subname. Connection, Statement, ResultSet: With the connection object it is possible to create statement objects (Statement, PreparedStatement and CallableStatement). 5-3

With a Statement object, sql commands can be sent to the database.The interface Statement has three important methods 1. executeQuery: to send sql SELECT statements to the database. 2. executeUpdate: to send INSERT, UPDATE, DELETE, CREATE, DROP etc sql statements to the database. 3. execute: if the type of the sql statement is unknown (select or update) then this method is used to send the command to the database. The result of a sql query is always a table. This table is stored in an object of the type ResultSet. With the methods first(), last(), next() and previous(), it is possible, to navigate from tuple to tuple in the result table. With the method getxxx() (xxx is a datatype) the attributes of a tuple can be converted to a java object. The interface CallableStatement standardizes the calling of stored procedure for all different databases. The input and output parameters and the return value can be defined in a standardized way. DatabaseMetadata: This interface makes it possible to get informations about the database schema (tables, attributes, keys etc). ResultSetMetaData: This interface makes it possible to get informations about a ResultSet object. This is necessary, if the schema of the result is not known at compile time. The following classes also belong to the package java.sql Exceptions In JDBC there are three kind of exceptions: • SQLExceptions SQLExceptions are thrown, if a sql statement is not executed because it is incorrect or a table does not exist or the user has not enough privileges etc. • SQLWarnings Warnings are created, if a statement is executed, but the result is unusual. For instance, if an update statement does not change anything in the database. Warnings are not thrown, but there are attached to the object, which makes the call to the database. The warnings can be obtained with the method getWarnings(). • DataTruncation If the value of an attribute is truncated when it is written to the database, then the JDBC-driver throws a DataTruncation exception. Date, Time, TimeStamp: This classes are only present for compatibility reasons with the java Date and Time classes. DriverPropertyInfo: This class stores informations about a given driver property. The method getPropertyInfo of the interface driver returns all properties of a driver. So the programmer can get information about the parameters which are necessary to open the database (user, password etc). 5-4

Types: This class only contains constants, which define the generic SQL-Types. The following table shows, how the generic SQL-Types are translated to Java types. A X in the table means that this is the preferred conversion. A x in the table means that the conversion is possible.

getByte getShort getInt getLong getFloat getDouble getBigDecimal getBoolean getString getBytes getDate getTime getTimestamp getAsciiStream getUnicodeStream getBinaryStream getObject

5.3

T I N Y I N T

S M A L L I N T

I N T E G E R

B I G I N T

R E A L

F L O A T

D O U B L E

D E C I M A L

N U M E R I C

B I T

C H A R

V A R C H A R

L O N G V A R C H A R

X x x x x x x x x

x X x x x x x x x

x x X x x x x x x

x x x X x x x x x

x x x x X x x x x

x x x x x X x x x

x x x x x X x x x

x x x x x x X x x

x x x x x x X x x

x x x x x x x X x

x x x x x x x x X

x x x x x x x x X

x x x x x x x x x

x x x x x

x x x x x

x x x X X

x

x

x

x

x

x

x

x

x

x

x

x

x

B I N A R Y

V A R B I N A R Y

L O N G V A R B I N A R Y

D A T E

T I M E

T I M E S T A M P

x X

x X

x x

x

x

x

x

X x

x x X

x

x

x

X

x x x x

x x x x

x x X x

Programming with JDBC

Here are some applications of JDBC.

5.3.1

Connection to the database and simple selects

The following program shows how to connect to the database (for Mysql). It also shows how to do a simple select. The query is sent to the database with a Statement object. Then the tuples can be read with the returned ResultSet object. Example 5.1 [Connection] // Import the JDBC package java.sql import java.sql.*; public class JdbcExample { public static void main(String args[]) { Connection con = null;

5-5

if (args.length < 2) { System.out.println("Usage: java JdbcExamples "); System.exit(0); } String user = args[0]; String passwd = args[1]; String database = args[2]; // Load the Mysql driver it is also // possible to load more than one driver. try { Class.forName("com.mysql.jdbc.Driver"); } catch (Exception e) { // Print error and exit System.out.println(e); System.exit(1); } // Connect to the database. The subprotocol is // "mysql" so the Drivermanager knows, that this is // a connection to a Mysql database (on localhost). // // For Mysql the user and the password are mandatory try { con = DriverManager.getConnection( "jdbc:mysql://localhost/" + database, user, passwd ); } catch (Exception e) { // Connection failed. Exit program System.out.println(e); System.exit(1); } // Example for a SELECT command try { Statement stmt = con.createStatement(); // Send the query to the database ResultSet rs = stmt.executeQuery( "SELECT sNr, name, city, zip " + "FROM supplier"); while (rs.next()) { // Get the sNr and the name. Notice, that the // index begins with 1 and not with 0 System.out.print(rs.getInt(1) + " / " + rs.getString(2) + " / "); // It is also possible to get the values of the attributes // by name. System.out.println(rs.getString("zip") + " " + rs.getString("city")); 5-6

} // Close the statement explicitly. As long as there // is no garbage collection, the resources in the // database will not be released. stmt.close(); } catch (Exception e) { System.out.println(e); } } }

5.3.2

Update-Operation

The following program shows how tuples can be inserted in a table. It also shows how the database schema can be changed. Example 5.2 [JDBC Updates] ... // UPDATE Commands try { Statement stmt = con.createStatement(); // Send the INSERT command to the database. Use the // method executeUpdate. int changes = stmt.executeUpdate( "INSERT INTO supplier (sNr, name, city, zip)" + "VALUES (102, ’Clinton’, ’New York’, ’10010’)"); // Display the number of updated tuples System.out.println("Number of changes: ", changes); stmt.close(); } catch (Exception e) { System.out.println(e); } // Create a new table try { Statement stmt = con.createStatement(); stmt.executeUpdate("CREATE TABLE client (" + "cNr INTEGER PRIMARY KEY, " + "name VARCHAR(30) NOT NULL, " + "city VARCHAR(30) NOT NULL"); stmt.close(); } catch (Exception e) { System.out.println(e); } ... 5-7

5.3.3

Prepared Statement

When the same command is used more than once but with different parameters, then it is wise to use prepared Statements. First the execution is faster and second the passing of the parameters is much simpler Example 5.3 [Insert tuples with prepared Statements] ... // Prepared Statements with passing of parameters. // So a statement can be used more than once. try { PreparedStatement stmt = con.prepareStatement( "INSERT INTO supplier " + "(sNr, name, city, zip) VALUES (?,?,?,?)"); // Pass the parameters stmt.setInt(1, 103); stmt.setString(2, "Davis"); stmt.setString(3, "Dallas"); stmt.setString(4, "200200"); // Now execute the command int changes = stmt.executeUpdate(); stmt.close(); } catch (Exception e) { System.out.println(e); } ...

5.3.4

Null values

If the value of an attribute is null and the data type is an object, then the method getXXX will simply return null in Java. If the data type is not an object, then it is possible to test if the attribute contains a null value with the method wasNull of the interface ResultSet. Example 5.4 [Null values] ... // Handling null values try { Statement stmt = con.createStatement(); // Build the query ResultSet rs = stmt.executeQuery( "SELECT peNr, name, father FROM person"); while (rs.next()) { int father = rs.getInt("father"); if (rs.wasNull()) 5-8

System.out.println("Person " + rs.getInt("peNr") + ". " + rs.getString("name") + " has no father"); } stmt.close(); } catch (Exception e) { System.out.println(e); } ...

5.3.5

Scrolling in Resultset

It is possible to scroll in a Resultset. The cursor can be positioned at any tuple in the set and with the methods next() and previous() navigating from tuple to tuple is possible. Example 5.5 [Scrolling] ... // Example of scrolling in a ResultSet try { // The Statement must be declared as scroll. Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY); // Execute the query ResultSet rs = stmt.executeQuery("SELECT * FROM supplier " + "ORDER BY sNr"); // List all suppliers by descending sNr rs.afterLast(); while (rs.previous()) { System.out.print(rs.getString("name")); } // Direct adressing of tuples. Here we position // the cursor on the seventh tupel and navigate // to the next tuples rs.absolute(7); do { System.out.print(rs.getString("sNr") + " " +rs.getString("name")); } while (rs.next()); stmt.close(); } catch (Exception e) { System.out.println(e); } ...

5-9

5.3.6

Updates in a Resultset

Tuples in a ResultSet can be modified. It is also possible to insert new tuples in a ResultSet. This tuples are then stored in the database. Finaly there is also possible to delete tuples in the Resultset. This operations are only possible, if the tuples in the ResutSet are from one table. Example 5.6 [Update in Resultsets] ... // Example: Update, Insert and delete in a Resultset try { // The statement must be declared as updatable. Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE); // Execute the Query. The Select statement // only contains one Table ResultSet rs = stmt.executeQuery("SELECT * FROM supplier " + "ORDER BY sNr"); // // Update the city in the seventh tuple // rs.absolute(7); rs.updateString("city", "New York"); // Now update the row rs.updateRow(); // Insert a new supplier. For this we must go to // the "InsertRow" rs.moveToInsertRow(); // Fill the datas rs.updateInt("sNr", 201); rs.updateString("name", "Clinton"); rs.updateString("city", "Denver"); rs.updateString("zip", "200201"); // Store the new tuple rs.insertRow(); // Return to the cursor position rs.moveToCurrentRow(); // // Delete the first tuple in the set // rs.first(); rs.deleteRow(); // Commit the changes. 5-10

con.commit(); stmt.close(); } catch (Exception e) { System.out.println(e); } ...

5.4

Meta-Informations

There are some problems which can not be solved with the presented classes and methods. This problems happen, when the SQL-Statements are not known at compile time. Example 5.7 [dbframe] The program dbframe is an example of such a problem. Because the statements are written by the user at runtime, the program does not know the form of this commands. They must be interpreted at runtime. Some more classes and methods are necessary to solve such problems. 1. It must be possible to get the relation schema of a ResultSet. This can be achieved with the inteface ResultSetMetaData. 2. It must be possible to get informations about the database schema at runtime. This can be achieved with the inteface DatabaseMetaData.

5.4.1

The class ResultSetMetaData

In the next example an arbitrary query is analyzed with the methods of the interface ResultSetMetaData. The JDBC datatypes are defined in the class Types. Example 5.8 [Analyzing ResultSets] ... // Example for using ResultSetMetadata try { Statement stmt = con.createStatement(); // Execute a query ResultSet rs = stmt.executeQuery( "SELECT * FROM supplier natural join sorder " + "WHERE supplier.sNr = 1"); // Get the Metadata ResultSetMetaData rsm = rs.getMetaData(); // Get the numbers of columns in the result schema int cols = rsm.getColumnCount(); // Loop over the tuples while (rs.next()) { 5-11

// Get the name, datatype and value of each // Column for (int i = 1; i = 4 and pName = ’Sugar’) select(person, isnotnull(father) and isnotnull(mother)) Projection π project(rel-expr, attribute {,attribute}) Example: project(supplier, Snr, name) Natural join ⊲⊳ natjoin(rel-expr, rel-expr) Example: natjoin(supplier, sorder) Union ∪ union(rel-expr, rel-expr) Example: union(select(person, name = ’Meier’), select(person, ort = ’Bern’)) Intersection ∩ intersect(rel-expr, rel-expr) Example: intersect(select(person, name = ’Meier’), select(person, ort = ’Bern’)) Setminus \ except(rel-expr, rel-expr) Example: except(select(person, name = ’Meier’), select(person, ort = ’Bern’)) Renaming β rename(rel-expr, newattr