Bern University of Applied Sciences Engineering and Information Technology
RELATIONAL DATABASES Pierre Fierz
Keywords: relational model, relational algebra, normalforms, entity-relationshipmodel, SQL, JDBC [File rdb.tex, Date 02.06.09]
© Pierre Fierz
Chapter 1 The Model 1.1
What is a Database?
Definition 1.1 [Database] A database is a system for description, storage and recovery of large sets of data. The datas are usualy shared by several applications and users. A database consists of two main parts: • The actual data and • the Database Management System (DBMS), which is reponsible for storing, changing, searching and deleting datas in accordance with the description of the datas. Figure 1-1 represents the different parts of a database. actual data application 1
meta data application 2
auxiliary data Database−Managementsystem
application 3
Database
Figure 1-1: The DBMS is an interface between the data and the users
1.2
ANSI-SPARC Architecture
The ANSI-SPARC Architecture (American National Standards Institute, Standards Planning And Requirements Committee) is an abstract design standard for a Database Management System, first proposed in 1975. Most modern commercial DBMS are based on this system. The ANSI-SPARC model never became a formal standard.
1-1
1.2.1
Three-level architecture
The objective of the three-level architecture is to separate the users view(s) of the database from the way that it is physically represented. This is desirable since: • It allows independent customised user views. Each user should be able to access the same data, but have a different customised view of the data. These should be independent: changes to one view should not affect others. • It hides the physical storage details from users. Users should not have to deal with physical database storage details. They should be allowed to work with the data itself, without concern for how it is physically stored. • The database administrator should be able to change the database storage structures without affecting the users views. From time to time rationalisations or other changes to the structure of an organisation’s data will be required. • The structure of the database should be unaffected by changes to the physical aspects of the storage. For example, a changeover to a new disk. • The database administrator should be able to change the conceptual or global structure of the database without affecting the users. This should be possible while still maintaining the desired individual users views.
1.2.2
The Model
The architecture of the ANSI-SPARC model is pictured in figure 1-2.
Figure 1-2: ANSI-SPARC Architecture for databases
1-2
The overall description of a database is called the database schema. There are three different types of schema corresponding to the three levels in the ANSI-SPARC architecture. External Level (User Views) A user’s view of the database describes a part of the database that is relevant to a particular user. It excludes irrelevant data as well as data which the user is not authorised to access. The external schema describe the different external views of the data and there may be many external schemas for a given database. Conceptual Level The conceptual level is a way of describing what data is stored within the whole database and how the data is inter-related. The conceptual level does not specify how the data is physically stored. The conceptual schema describes all the data items and relationships between them, together with integrity constraints. There is only one conceptual schema per database Internal Level The internal level involves how the database is physically represented on the computer system. It describes how the data is actually stored in the database and on the computer hardware. The internal schema contains definitions of the stored records, the methods of representation, the data fields, and indexes. There is only one internal schema per database.
1.2.3
Data Independence
Data independence is the type of data transparency that matters for a centralized DBMS. It refers to the immunity of user applications to make changes in the definition and organization of data, and vice-versa. There are two levels of data independence 1. Logical data independence: The ability to change the logical schema without changing the external schema is called Logical Data Independence. Example: Addition or removal of new entities, attributes, or relationships to the conceptual schema should be possible without having to change existing external schemas or having to rewrite existing application programs. 2. Physical data independence: The ability to change the physical schema without changing the logical schema is called Physical Data Independence. Example: A change to the internal schema, such as using different file organisation or storage structures, storage devices, or indexing strategy, should be possible without having to change the conceptual or external schemas
1-3
1.3
The Relational Model
The relational model for database management is a database model based on predicate logic and set theory. It was first formulated and proposed in 1970 by Edgar Codd. The relational model is used at the conceptual level of the ANSI-SPARC architecture.
1.3.1
Attribute and Domain
Definition 1.2 [Attribute and Domain] Let U be a nonempty, finite set called the Universe. An element A ∈ U is called an Attribute. Let D = {D1 , . . . , Dm } be a finite set of nonempty finite sets. Di are Domains also called data type. There is a function dom : U 7→ D. dom(A) is called the domain of A. An element w ∈ dom(A) is called an attribute value of A. Remark 1.1 [Atomic Values] All domains contains only atomic values. This means, that the value of an attribute is not a set or a relation. Examples of atomic values are numbers, strings, dates . . . . Example 1.1 [Attributes and Domains] The next table shows attributes and possible domains. Attribute name city quantity
Domain Character string with length ≤ 30 Character string with length ≤ 25 0 < Number ≤ 1500
Definition 1.3 [Null value] A Null value is a special value used to indicate that a data value does not exist in the database. The Null value is simply added to the domain of an attribute. The meaning of a null value is “missing information” or “inapplicable information”. We will denote null values by the symbol ⊥. Remark 1.2 [A null value is not 0] A null value is not the same than the number 0 or the empty character string but a special marker with the given meaning.
1.3.2
Relation Schema, Relation and Tuple
Definition 1.4 [Relation Schema] A non empty subset of the universe R ⊆ U is called a relation schema. The relation schema defines the type of an object (tuple) in the relational model. Definition 1.5 [Relation and Tupel] A relation r over the relation schema R = {A1 , . . . , An } (written r(R)) is a finite set of functions of the form t : R −→
n [
dom(Ai ) with t|Ai ∈ dom(Ai ) for i = 1, . . . , n
i=1
1-4
t|Ai is the restriction of the function t to Ai . The functions t are called tuple over R. The set of all relations over a given relation schema is denoted by REL(R) := {r|r(R)}. For X ⊆ R the restriction t|X is called an X value of t. It is also denoted by t(X). Example 1.2 [Relation and Tuple] Consider the following relation schema representing the attributes of suppliers. supplier = {sNr, name, city, zip} The domains are dom(sN r) = int, dom(name) = string, dom(city) = {Denver, N ewY ork, . . .}, dom(zip) = [01000 − 99999] Now we can define tuples. Each tuple represents a supplier. t1 (sNr) = 1, t1 (name) = ’Smith’, t1 (city) = ’Denver’, t1 (zip) = ’80202’ t2 (sNr) = 2, t2 (name) = ’Davis’, t2 (city) = ’New York’, t2 (zip) = ’10019’ t3 (sNr) = 3, t3 (name) = ’Jones’, t3 (city) = ’Dallas’, t3 (zip) = ’75277’ The set r(supplier) = {t1 , t2 , t3 } is a relation over the schema supplier. Remark 1.3 [Cartesian Product] Originaly a relation r over a relation schema R = {A1 . . . An } was defined as a subset of the cartesian product of the domains of the attributes of R r(R) ⊆ dom(A1 ) × . . . × dom(An ) The problem is, that with the cartesien product we introduce an ordering of the attributes. But this ordering has no semantic meaning in the relational model. This definition gives its name to the model because in mathematics a relation is defined as a subset of the cartesian product. In terms of the relational model of databases, a table can be considered a convenient representation of a relation, but the two are not strictly equivalent. For instance, an SQL table can potentially contain duplicate rows, whereas a true relation cannot contain duplicate tuples. Similarly, representation as a table implies a particular ordering to the rows and columns, whereas a relation is explicitly unordered. In the figure 1-3 the supplier relation is represented as a table. Definition 1.6 [Local Integrity Constraint] Let R be a relation schema. An integrity constraint is a funktion b : REL(R) → {true, f alse}. We can extend a relation schema R by a set of integrity constraints B. The extended schema is denoted by R = (R, B). A relation r over B satisfies all constraints b(r) = true ∀b ∈ B. The set of all relations over R is denoted by SATR (B) := {r|r ∈ r(R)}.
1-5
String
Numeric
name
S#
Relation
Denver New York etc.
01000−99999
city
zip
1
Smith
Denver
80202
2
Davis
New York
10019
3
Jones
Dallas
75277
Domains
Relation Schema Tuple
. . .
Figure 1-3: The Relation supplier
1.3.3
Semantic Key
Definition 1.7 [Semantic Key] A semantic key K ⊆ R for a relation r(R) is a minimal set {A1 , . . . , Ak } ⊆ R with ∀t1 , t2 ∈ r(R)[t1 6= t2 ⇔ ∃Aj ∈ K : t1 (Aj ) 6= t2 (Aj )]
A semantic key identifies each tuple in a relation. A semantic key exists for every relation, because a relation is defined as a set of tuples. Example 1.3 [Examples of semantic keys] Consider the following relation schema representing the attributes of suppliers. supplier = {sNr, name, zip, city} We will examine different relations over the schema supplier r1
=
{, }
zip is a key for relation r1 . r2
=
{, , }
In this example there are two keys sNr and name r3
=
{, , }
In this example the two keys are {name, zip} and {name, city}
1-6
1.3.4
Syntactic Key
In the example 1.3 the semantic keys may change when the relation is changing. This is not a good way to identify the tuples in a relation. We need a key, which does not change over time. Such a key is called syntactic, because it is defined by the designer of the schema. Definition 1.8 [Syntactic Key] A syntactic Key for a relation schema R is a set K := {A1 , . . . , Ak } ⊆ R, which satisfy the following local integrity constraints bK (r(R)) =
(
true : f alse :
if ∀t1 , t2 ∈ r(R) : t1 6= t2 ⇒ ∃Aj ∈ K : t1 (Aj ) 6= t2 (Aj ) otherwise
A syntactic Key K1 is never a proper subset of another syntactic Key K2 Remark 1.4 [Keys an Null values] Attributes of a Key never can contain a null value. It is not possible to identify an objekt with the value “unknown”. This fact is called key integrity. Definition 1.9 [Primary and Candidate Key] If there is more than one possible Key for the same relational schema then one of them is selected as the primary Key. The other keys (if such keys exist) are called candidate keys of the schema. Remark 1.5 [The choice of the primary key] In practice we always choose a primary key who never changes. Example 1.4 [Keys] Consider the following relation schema representing the products supplied by a supplier. product =({pNr, pname, price, stockQty, minimalQty}, {pNr}, ck({pname})) In this example we choose pNr as the primary key of the relation. Because distinct products should have distinct names, we define the candidate key pname.
1.3.5
Database and Database Schema
Definition 1.10 [Database and Database Schema] A database schema is a finite non empty set of relation schematas S := {R1 , . . . , Rp } over the universe U. A database value (short: database) over a database schema S is a set of relations d(S) := {r1 (R1 ), . . . , rp (Rp )}. A relation r(R) ∈ d(S) is called a basis relation. As we did it for a relation schema, we define global integrity constraints for a database schema. Definition 1.11 [Global Integrity Constraints] Let S be a database schema. A global integrity constraint for S is a function γ : d(S) −→ {true, f alse}}. Let Γ := {γ|γ : {d|d(S)} −→ {true, f alse}} be a set of integrity constraint for the database schema S. Then S := (S, Γ) is called a extended database schema. d(S) is a database d(S) which satisfies all constraints in Γ. The set of all databases over S is defined by DAT(S) := {d|d(S)}. 1-7
1.3.6
Foreign Key
In the context of relational databases, a foreign key is a global integrity constraint between two tables. Definition 1.12 [Foreign Key] A foreign key relationship between two relation r1 (R1 ) and r2 (R2 ) is an expession of the form X(R1 ) → Y (R2 ) with X(R1 ) ⊆ R1 and Y (R2 ) ⊆ R2 , which satisfy the following constraints: • Y (R2 ) is the primary key (or a candidate key) for R2 • {t(X)|t ∈ r1 (R1 )} ⊆ {t(Y )|t ∈ r2 (R2 )} X(R1 ) is a foreign key, r1 (R1 ) is called the referencing relation and r2 (R2 ) the referenced relation. The system must enforce the constraints. This concept is known under the name referential integrity. Example 1.5 [Foreign key] Consider the two relational schematas supplier sorder
= ({sNr, name, city, zip}, {sNr}) = ({oNr, sNr, orderDate, deliveryDate nv}, {oNr}, fk({sNr}, supplier))
We introduce the following foreing key relationship. X(sorder) → Y (supplier) with X(sorder), Y (supplier) = {sNr} sNr is the primary key in supplier and a foreign key in sorder. This models the fact that each order belongs to exactly one supplier and that each supplier owns more than one order. Now we introduce orderItems and products. orderItem = ({oNr, pNr, qty}, {oNr,pNr}, fk({oNr}, order), fk({pNr, product)) product = ({pNr, pname, price stockQty, minimalQty}, {pNr}, ck({pname})) In this example there are two foreign keys. X(orderItem) → Y (sorder) with X(orderItem), Y (sorder) = oNr and X(orderItem) → Y (product) with X(ordeItem), Y (product) = pNr and This models the fact that each order owns one or more order items and that each order item is referencing exactly one product. The relation orderItem represents a manyto-many relationship between orders and products. orderItem is also called associative entity. Remark 1.6 [Foreign Key and Null Value] A foreign key containing null values cannot match the values of a primary key, since a primary key by definition can have 1-8
no null values. However, a null foreign key value is always valid, regardless of the value of any of its non-null parts. When a tuple is deleted in a referenced relation the referential integrity must be preserved. To do this three different strategies can be applied. 1. Restricted delete A tuple in the referenced relation can only be deleted if it is not referenced by any tuple in the referencing relation. 2. Cascaded delete All referencing tuples are also deleted. Eventually this process must be cascaded. 3. Nullify All values of the attributes of the foreign key in the referencing relation are nullified.
1.4
The Entity-Relationship-Model (ERM)
The relational model is not appropriated for design purposes. The Entity-RelationshipModel (ERM) is more suitable to design good database schematas. The translation from the ERM to the relational model is an easy task. An entity-relationship model is an abstract conceptual representation of structured data. Entity-relationship modeling is a relational schema database modeling method, used in software engineering to produce a type of conceptual data model (or semantic data model) of a system, and of its requirements, in a top-down fashion. Diagrams created using this process are called entity-relationship diagrams, or ER diagrams for short. Originally proposed in 1976 by Dr. Pin-Shan (Peter) Chen, many variants of the process have subsequently been devised.
1.4.1
The Elements of the Model
Definition 1.13 [Entity] An entity is a real-world object distinguishable from all other objects in the world. An entity has one or more properties. An entity can be a Person, an abstract concept, an event etc. The Name of an entity is always a noun. Examples for entities are: • person: employee, student, physician, patient . . . • place: city, department . . . • object: machine, building, car . . . • concept: account, course, workCenter . . . • event: customerOrder, . . .
1-9
Definition 1.14 [Entity Set] The set of all properties of an entity is called the type of the entity. An entity set is a set of entities of the same type. Definition 1.15 [Domain] A domain is a collection (set) of the valid values for a property of an entity. Every property of an entity has a domain. For the domain of a property we will write dom(prop). Definition 1.16 [Entity Attribute] An attribute is a mapping from an entity set to the domain of a property of the entity. If the property has more than one value, then the mapping will be to the powerset of the domain. The kind of the mapping will determine the cardinality of the attribute. There are four types of cardinality: • Type 1 cardinality: In this case the mapping is a function, e.g. every person has exactly one name. name : person 7→ dom(name) • Typ C cardinality (C for conditional): In this case the mapping is a partial function, e.g. The patient has one or none health insurance. insurance : patient ֒→ dom(insurance) • Typ M cardinality (M for multiple): In this case the mapping is a function to the powerset of the domain without the empty set, e.g. a person speaks one ore more languages. language : person 7→ P(dom(language)) \ {} • Typ MC cardinality (MC for multiple conditional): In this case the mapping is a function to the powerset of the domain, e.g. a person has no diploma, one diploma or more than one diploma. diploma : person 7→ P(dom(diploma)) Definition 1.17 [Entity Key] A entity key is an attribute (or a set of attributes), whose values identifies each entity in a entity set. The key selection must follow certain criteria. These include: 1. Key should NOT change its value over the life of each entity instance. 2. Key should be guaranteed to NOT be NULL. 3. Avoid intelligent-keys (stored data as part of a key such as a code). For example, a key where the first two digits indicates a warehouse location would be a poor choice because the coding scheme could change over time. 4. Substitute single-attribute surrogate keys for large composite keys. A surrogate key can be a system generated sequence of unique numbers.
1-10
To draw the entity relationship diagramm we will use the UML (unified modeling language) notation. An example for an entity set is shown in figure 1-4. Patient pNr name birthdate healthInsurance [0..1] language [1..*] allergy [0..*]
Figure 1-4: Entity set and Attributes The cardinalities 1, C, M and MC for the attributes are denoted in the following way: name Type 1 healthInsurance[0..1] Type C language[1..*] Type M allergy[0..*] Type MC Definition 1.18 [Relationship] A relationship is an association between instances of one or more entity types. We only will use binary relationships. Definition 1.19 [Relationship set] A relationship set is a collection of relationships of the same kind. An Example is the belongs to relationship between the entity supplier and sorder. The cardinality (C:C, 1:1, C:1, M:M, MC:MC etc.) expresses the number of occurrences of one entity related to another entity. Examples: • employee 1:1 parkingPlace: Each employee has one park place. Each park place belongs to exactly one employee. • supplier 1:M order: Each order belongs to exactly one supplier. A supplier may have multiple orders • physician M:M patient: A physician treats several patients. A patient is treated by several physicians. Definition 1.20 [Recursive Relationship] A recursive relationship (or unary relationship) is a relationship between entties of the same entity set. As an example we can consider the parts list of a product. This is a relationship with cardinality MC:MC between products. product MC:MC product In UML a relationship is represented by a line between the concerned entity sets. The name of the relationship is written near the line. The cardinalities are writen on both sides of the line. Some possibilities of relationships between physician and patient are shown in figure 1-5. 1-11
physician
1..*
treats
1..*
patient
a) A physician treats several patients. A patient is treated by several physicians. physician
1
1..*
treats
patient
b) A physician treats several patients. A patient is treated by one physician. physician
1
0..*
treats
patient
c) A physician treats zero, one or several patients. A patient is treated by one physician.
Figure 1-5: Relationships (M:M, 1:M and 1:MC) The arrow near the name of the relationship indicates the direction in wich the text is read. The relationship itself is not directed. Definition 1.21 [intersection attribute] An intersection attribute is associated with a relationship. The attribute is only meaningful if a relationship exists between two entities. As an example consider the parts list of a product. To manufacture a product, two informations are essential. Which produkt is needed and which quantity of the product is needed. The quantity is not an attribute of the product but an attribute of the parts list relationship. When a relationship has attributes, then it is also considered as an entity. This entity is called an associative entity. In UML this is represented by an own entity attached to the relationship. An example is shown in figure 1-6. Physician
1..*
treats
1..*
Patient
PhTreatsP diagnosis
Figure 1-6: intersection attribute The ER-Model allows generalization and specialization of entity. In the object oriented world this is called inheritance. Definition 1.22 [Generalization and Specialization] An entity type that includes one or more distinct subgroupings of its occurrences is called a generalization. A distinct subgrouping of occurrences of an entity type is called a specialization. Example 1.6 [Generalisation] Consider an hospital where we have an entity physician and an entity patient. Both entity type have attributes in common like the first 1-12
name, the name, the birthdate etc. But it is possible that a physician becomes a patient. Then the same person would have a name as a physician and a name as a patient. This introduces redundancy in the model. To solve this problem we introduce a generalization. The generalization will be the entity person. Physician and patient become specialisations of person. With generalization/specialisation there are different possibilities to consider. 1. The specialisations may be disjoint or not disjoint. 2. The generalization can be abstract or not. Abstract means that each entity in the generalization must also be an entity of one (or more) specialisation. Generalization is defined in UML by a hollow arrow on the parent side of the relationship. Generalization is not a relationship between two entity types. It is a derivation of the general entity type to the specialized entity type. Cardinality is not allowed on generalization relationships. For an abstract class the name is written in italic. The different cases are shown in figure 1-7 Person
Room {disjoint}
{disjoint} woman
Man
case a)
operatingRoom case b)
Person
Person
{overlapping} Employee case c)
hospitalRoom
{overlapping}
Patient
physician
Patient
case d)
Figure 1-7: Notation for specialization
1.4.2
Developing an Entity Relationship Model (one way of doing it)
1. Identify Entities Identify the roles, events, locations, tangible things or concepts about which the end-users want to store data. 2. Find Relationships and determine the cardinalities Find the natural associations between pairs of entities (possibly use a relationship matrix). Determine the number of occurrences of one entity for a single occurrence of the related entity. 1-13
3. Identify generalization/specialization Determine if groups of entities can be grouped to a generalized entity. 4. Define Entity Keys Identify the data attribute(s) that uniquely identify one and only one occurrence of each entity. 5. Identify and Map Attributes Name the information details which are essential to the system under development. For each attribute, match it with exactly one entity that it describes. 6. Draw the diagramm 7. Check Results Does the final Entity Relationship Diagram accurately depict the system data? Example 1.7 [Developement of an ERM] The following simple example will be developed in the following sections: 1. A company has several departments. 2. Each department has a supervisor and at least one employee. 3. Employees must be assigned to at least one, but possibly more departments. 4. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. 5. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number. 6. A further information who must be stored is the number of hours that an employee works for a project. 1.4.2.1
Identify Entities
In this stage, you look through the information about the system and seek to identify the roles, events, locations, concepts and other tangible things that you wish to store data about. One approach to this is to work through the information and underline those words which you think correspond to entities. The entities in this system are Department, Employee, Supervisor and Project. One is tempted to make Company an entity, but it is a false entity because it has only one instance in this problem. True entities must have more than one instance.
1-14
1.4.2.2
Find Relationships
From point 2. in the description we know that there is exactly one supervisor for each department. It is not clear if a supervisor is in charge of only one department so we will assume this. run by: Department 1:1 Supervisor From point 2. Each department has at least one employee. From point 3. An employee belongs to at least one, but possibly more departments is assigned: Employee M:M Department From point 4. every project has at least one employee. An employee does not have to be assigned to a project. We assume that an employee may be assigned to more than one project. works on: Employee M:MC Project 1.4.2.3
Identify generalization/specialization
The entities employee and supervisor are both roles played by person in the system. Both entities will have common attributes (name, address etc.). In this case it is appropriate to introduce a generalized entity person. This entity is abstract because each person in the system is either an employee or a supervisor. The specialisations employee and supervisor are disjoint. 1.4.2.4
Define the Entity Keys
Always introduce keys without semantic meaning. The simplest thing is to give every entity a number to identify it. Note that specializations don’t need a key because they inherite there keys from the generalization. We introduce the Keys PersonNumber, DepartmentNumber and ProjectNumber. 1.4.2.5
Identify and Map Attributes
From point 5. All entities have a name. DepartmentName, PersonName (inherited by employee and supervisor) and ProjectName. From point 6. The number of hours an employee works for a project is an intersection attribute. So we introduce a new entity ProjectHour to map this attribute.
1-15
1.4.2.6
Draw the Diagram
Now all information is awayable to draw the diagram. This is given in figure 1-8 = ’1-jan-2007’
Conditions
Till now all examples only use comparison between an attribute and a literal or between two attributes in the WEHRE clause. In SQL more complex conditions are allowed. The
LIKE
Predicate With LIKE it is possible to compare an attribute with a pattern.
::= [NOT]
LIKE
In the pattern the metacharacter ’%’ stands for an arbitrary string (the empty string is also allowed). The metacharacter ’_’ stands for exactly one character. The next example finds all suppliers whose name has an A at the beginning. SELECT FROM WHERE
* supplier name LIKE ’A%’
In the next example all products with a ’%’ in the name are selected. Notice the use of the escape Character ’\%’ SELECT FROM WHERE
* product pname LIKE ’%\%%’
Subqueries in the WHERE Clause A subquery in the WHERE clause defines a set of tuples. The predicate IN permits to test if a value is in this set. The predicate EXISTS permits to test if the set is empty or not. Remark 2.7 [Nesting] Subqueries may be nested to arbitrary depth. 2-17
Predicate
IN
Syntax of this predicate is:
::= {|} [NOT] ::= (literal [,literal]...) | ()
IN
Example with a list of literals SELECT FROM WHERE
* supplier city IN (’Detroit’, ’New York’, ’Los Angeles’)
Select all suppliers where the product ’Sugar’ is present in an orderItem. SELECT FROM WHERE
* supplier sup ’Sugar’ IN (SELECT FROM
WHERE
pname sorder so
orderItem oi NATURAL JOIN product p sup.sNr = so.sNr) NATURAL JOIN
Remark 2.8 [Scope of the Aliases] In this example the alias “sup” from the outer query is used in the subquery. The logical meaning is that the inner query is performed for each tuple of the outer query. So in the subquery the value of “sup” is always well defined and contain the values of the actual supplier. Predicate
EXISTS
This predicate tests if the result of the subquery is not empty.
The example selects all supplier with an oderItem containing the product with pNr = 2. SELECT FROM WHERE
* supplier sup EXISTS ( SELECT FROM WHERE
* sorder so NATURAL JOIN orderItem io sup.sNr = so.sNr AND io.pNr = 2)
2-18
2.3.2.8
ORDER BY Clause
A relation is a set so there is no ordering of the tuples. In SQL the tuples of the result can be ordered by an ORDER BY clause. ::=
ORDER BY
[ASC|DESC], [ [ASC|DESC]]. . .
The example selects orderItems. They are ordered by supplier name, order date in descending order and product number. SELECT FROM ORDER BY
2.3.2.9
s.name, so.orderDate, io.pNr, io.qty supplier s NATURAL JOIN sorder so NATURAL JOIN orderItem oi s.name, so.orderDate DESC, io.pNr
GROUP BY Clause
A GROUP BY clause in SQL specifies that a SQL SELECT statement returns a list that is grouped by one or more columns, usually in order to apply some sort of aggregate function to certain columns. ::=
GROUP BY
The attributes appearing in the SELECT clause must all be defined in the clause. All aggregate functions may also appear in the SELECT clause.
GROUP BY
The next example lists how many suppliers are present in a city. SELECT FROM GROUP BY
s.city, COUNT(*) supplier s s.city
AS
supplierCount
The next command lists the number of ordered items of every product since 1 january 2007. SELECT FROM WHERE GROUP BY
p.pname, SUM(io.qty) AS itemCount (sorder so NATURAL JOIN orderItem oi) b.bestelldatum >= ’1-jan-2007’ p.pNr, p.pname
2-19
NATURAL JOIN
product p
2.3.2.10 The
HAVING Clause
HAVING
clause is a filter for groups created by the
GROUP BY
clause.
The next example lists all cities where more than 3 suppliers are present. SELECT FROM GROUP BY HAVING
2.3.2.11
s.city, COUNT(*) AS supplierCount supplier s s.city supplierCount >= 3
Null Values
Null values stand for undefined or unknown values. Null values can not be compared with other values. In SQL the comparison of a null value with another value is always false. With the predicate IS [NOT] NULL it is possible to find out if an attribute contains a null value. SELECT FROM WHERE
* order deliveryDate
IS NULL
Remark 2.9 [Aggregate Functions and Null Value] During the computation of the aggregate functions SUM, AVG, MAX and MIN tuples with a null value in the given attribute are simply skiped. So the functions compute the value of all tuples where the value is not null.
2-20
Chapter 3 Transactions and Data Updates 3.1
Insert, Update and Delete Operations
Remark 3.1 [Update is possible only on one Table] Updates are possible only on one table of the database. If more than one table is involved in a change then there must be an update operation for each table. To preserve the integrity of the database, update operations can be grouped in a transaction so that all operations are performed in an atomic way (all or nothing).
3.1.1
INSERT
With the INSERT command new tuples can be inserted into a table. The syntax of the operation is given below.
::=
[()] ::= VALUES () | SELECT FROM . . . [WHERE . . . ] INSERT INTO
The attributes in the
INTO
clause and in the
SELECT
clause must be compatible.
supplier (sNr, name, city, zip) VALUES (27, ’Smith’, ’Detroit’, ’48226’) INSERT INTO
3.1.2
UPDATE
With the UPDATE command existing tuples in a table can be changed. The update accepts a where clause to select the tuples to change. The syntax of the operation is given below. 3-1
::=
SET [WHERE ] ::= = [, = ]. . . ::= literal | attribut-name | () UPTATE
sorder orderDate = ’12-jun-2007’ orderDate = ’10-jun-2007’ AND sNr = 1
UPDATE SET WHERE
In the next example we set the stockQty of the produc with pNr = 1 to the sum of all orderItems of this product. UPDATE SET
product stockQty = (
SELECT SUM(qty) FROM WHERE
WHERE
orderItem pNr = 1)
pNr = 1
3.1.3
DELETE
With the DELETE command tuple can be removed from a table. The tuples are choosed with the WHERE clause.
::=
DELETE FROM WHERE
DELETE FROM [WHERE ]
supplier sNr = 12
If no tuple is referencing this supplier then it is deleted. Else it depends on then delete mode of the foreign key.
3.2
Data Integrity and Transactions
Generally many users work with the same data base. An important task of the DBMS consists in coordinating the read and the write operations of the different processes. Each user must see the database as if it would belong to him. The operations must be executed in such a way, that no user has to wait for a long time for a result. Another task of the database system is to ensure that the datas remain consistent even after a system crash (recovery). 3-2
3.2.1
Consistency of the Database
A major requirement for database system is to ensure the consistency of the data at any time Definition 3.1 [Consistency] A database is consistent if 1. all integrity constraints are achieved (Primary key constraint, referential integrity, assertions etc) 2. all attribute values are consistent with the actual reality. 3. all relevant data are stored in the database. Violation of the first point can be caused by • System crashes, disk crashes, hardware errors etc. • Errors in the application programms. • Concurrent updates. This problems can be handled by transactions. The user has the resposability to ensure the conditions two and three. This can not be tested by the database management system.
3.2.2
The Transaction Model
Definition 3.2 [Transaction] A transaction is a sequence of read and write operations which transform a consistent database in another consistent database. The duration of a transaction is determined by the application. The transaction begins with the first SQL statement and ends with a COMMIT or ROLLBACK command. A ROLLBACK will restore the database to the state it had at the beginning of the transaction. Transactions have the following four properties (ACID): Atomarity At the end of a transaction either all update operations are executed or none. Consistency If the state of the database is consistent before the transaction starts then it is also consistent at the end of the transaction. During the transaction the state may be temporarly inconsistent. Isolation Other user do not see the changes to the database until the end of the transaction. Durability At the end of the transaction all changes are permanantly stored and are secure. 3-3
3.2.3
Atomarity and Durability
3.2.3.1
The Logfile
The atomarity is enforced with the help of the database wide logfile. The following informations are stored in the file: • Begin of transaction •
UPDATE, INSERT
and DELETE operations. To enable rollback and recovery the old data values and the new data values are stored in the file. The transaction identification is also stored for each operation.
• End of transaction. Remark 3.2 [Write Ahead Log Protocoll] All changes must be written to the logfile before the changes a stored to the database. This principle is called write ahead log protocol. Entries in the logfile are written without the help of write buffer or memory cache so that they are not lost after a system crash. The logfile should not be located on the same disk than the database. If more security is needed , the disk where the logfile is located can be mirrored. The logfile contains enough informations to recover a consistent state of the database after a system crash. 3.2.3.2
Checkpoints
A transaction is considered as finished as soon as the commit-mark is writen to the logfile. Now all changes of the transaction could be written to the database. To obtain a better performance the data are not written directly but buffered in the main memory. The writing of the data to the database is delayed as long as possible. There are two situations when the data are written to disk: 1. The cash buffer is full and we need space for new data from the disk. Then the data, which were not used for the longest time, are written to the disk. The database system manages the buffer automatically. 2. When a Checkpoint is reached: In this case all modified data in the cash buffer are written to the disk. Checkpoints are fired automatically by the system at given time intervals or after a given number of transactions. During the writing all other update operations are blocked. When the writing is terminated, this information is written to the logfile with the identification of all running transactions. Checkpoints are used to accelerate the recovery after a system crash. They define a known state of the data in the database.
3-4
3.2.3.3
Recovery
Consider figure 3-1. There five paralel transactions are shown. REDO UNDO A1
T1
A2 A2
A1
T2
A3 A1
T3 A1
T4
A2 A1
T5
time Checkpoint
Crash
Logfile: BOT T1
BOT
A1
BOT
A1
A2
A1
EOT
A2
T4
T1
T2
T4
T1
T2
T1
T2
Checkpoint aktiv T2,T4
BOT
A3
A1
A2
BOT
A1
EOT
T3
T2
T3
T4
T5
T5
T3
BOT
A3
A1
A2
BOT
A1
EOT
Recovery Operations BOT
A1
Undo
Undo
T4
T4
Redo Redo Redo Undo Undo Undo Redo T3
BOT = Begining of Transaction
T2
T3
T4
T5
T5
T3
EOT = End of Transaction
Figure 3-1: Recovery with help of the logfile Transaktion T1 This transaction ended before the last checkpoint. All changes are stored in the database. Transaktion T2 This transaction is finished. We know that A1 and A2 are stored in the database. For the operation A3 a REDO operation is necessary. Transaktion T3 This transaction is finished. A REDO operation is necessary for A1. Transaktion T4 und T5 This transactions are not terminated. For all operations of the transactions an UNDO operation is necessary.
3.2.4
Isolation
When transactions are running concurrently, certain problems can arise. This problems and the solutions are discussed in this section. 3-5
3.2.4.1
lost Update
The situation is depicted in figure 3-2. Transaction A
time
FETCH R
t1 t2
UPDATE R
Transaction B
FETCH R
t3 t4
UPDATE R
Figure 3-2: Lost Update Transactions A and B both change the same tuple R. When time t4 is reached, the changes made by A at time t3 are lost. 3.2.4.2
Uncommitted Dependency
The situation is depicted in the figures 3-3 and 3-4.
Figure 3-3: Problems with open transactions In the first example at time t2, transaction B reads data modified by transaction A at time t1. At time t3 the data which were read by transaction B are no more correct because of the ROLLBACK command. Transaction A FETCH R
time
Transaction B
t1
UPDATE R t2
FETCH R
UPDATE R ROLLBACK
t3
Figure 3-4: The Update is lost
3-6
In this example the modifications of transaction B at time t3 are destroyed by the ROLLBACK of transaction A. 3.2.4.3
Inconsitent Analysis
The situation is depicted in figure 3-5. Because transaction B transferes 10 $ from account3 to account1 the sum computed by transaction A is 110 $ and not 120 $. account1 40
account2 50
Transaction A
FETCH account1 (40) sum = 40 FETCH account2 (50) sum = 90
FETCH account3 (20) sum = 110 (not 120)
Zeit
account3 30 Transaction B
t1 t2 t3
FETCH account3 (30)
t4
UPDATE account3 30 20
t5
FETCH account11 (40)
t6
UPDATE account1 40 50
t7
COMMIT
t8
Figure 3-5: Transaction a computes a wrong value
3.2.5
Locking
The problems described so far can be solved by a locking mechanism. When a transaction reads or writes data, it will lock the data. The data can not be changed by another transaction as long as they are locked. After a commit operation the locks are released. Now another transaction can access the data. 1. There are two sorts of locks. Exclusive locks (X-lock) and shared locks (S-lock). 2. If transaction A puts a X-lock on a tuple, no other transaction can lock this tuple. If transaction B wants to access the tuple, it must wait that A releases the lock. 3. If transaction A puts a S-lock on a tuple, then another transaction can put a S-lock on the same tuple. If transaction B wants to put a X-lock on the tuple, it must wait that all S-lock for this tuple are released.
3-7
The compatibility between different locks are shown in the table 3.1. X S X No No Yes S No Yes Yes - Yes Yes Yes Table 3.1: Compatibility of Locks 4. The locks are put on the tuple implicitly. If a transaction reads a tuple, then a S-lock is put on the tuple. If a transaction writes a tuple successfully, then a X-lock is put on the tuple. 5. At the end of the transaction (COMMIT or ROLLBACK) all X-locks and all S-locks are released. Now Consider what happens with the three problems (lost Update, Uncommitted Dependency and inconsistent Analysis) when locking is introduced. 3.2.5.1
lost Update
The situation is depicted in figure 3-6. Transaction A
time
FETCH R S−Lock auf R
Transaction B
t1 t2
t3 UPDATE R Wait for X−Lock wait t4 wait wait wait wait Deadlock
FETCH R S−Lock auf R
UPDATE R waiting for X−Lock wait wait
Figure 3-6: Lost Update with locking Now the updates of transaction A are no more overwritten, but there is a deadlock. Transaktion A waits that transaction B releases the S-Lock. Transaction B waits that transaction A releases the S-lock. 3.2.5.2
Uncommitted Dependency
The situation is depicted in the figures 3-7 and 3-8. In this case no deadlock arises. After transaction A and B the database is consistent.
3-8
Transaction A
time
Transaction B
t1
UPDATE R X−Lock on R
t2 t3
ROLLBACK Release Locks
t4
FETCH R wait for S−Lock wait for S−Lock wait for S−Lock wait for S−Lock resume:FETCH R S−Lock on R
Figure 3-7: Transaction B is waiting fro transaction A to finish Transaction A
time
Transaction B
t1
FETCH R
UPDATE R X−Lock on R ROLLBACK Release Locks
t2
FETCH R
t3
wait for S−Lock wait for S−Lock wait for S−Lock
t4
resume:FETCH R
UPDATE R X−Lock on R
Figure 3-8: Transaction B is waiting fro transaction A to finish to update the tuple
3.2.5.3
Inconsitent Analysis
The situation is depicted in figure 3-9. account1 40
account2 50
Transaction A
time
account3 30 Transaction B
FETCH account1 (40)
S−Lock on account 1
t1
sum = 40 FETCH account2 (50)
t2
S−Lock on account 2 sum = 90
t3
FETCH account3 (30)
S−Lock on account3 t4
UPDATE Konto3
X−Lock on account3 t5
30 20 FETCH account1 (40)
S−Lock on account1 t6 FETCH account3 (20)
t7
wait for S−Lock account3 wait wait Deadlock
UPDATE account1
wait for X−Lock account1 wait wait wait wait
Figure 3-9: Deadlock happens In this case there is no more inconsistency but a deadlock occurs. 3.2.5.4
Handling Deadlocks
The system is able to recognize deadlock situations. When the system detects a deadlock, then one of the responsible transaction will be selected. The system enforces a 3-9
rollback for this transaction. Another possibility is to abort the read or write operation for the selected transaction. In this case the transaction is not terminated and the user can try to repeat the operation.
3.2.6
Transactions and SQL
3.2.6.1
Begin of transaction
In SQL transactions are started implicitly. If no transaction is active and a “transaction initiating” statement is executed, then a new transaction is started. Properties of transactions can be set by the user with the SET TRANSACTION command. Remark 3.3 [Explicit Start of Transactions] In SQL-2003 transactions can be started explicitly with the START TRANSACTION command. The properties of the transaction can be set with this command. The following list shows all statements which do not initiate a new transaction. All other statements are “transaction initiating” statements. CONNECT DISCONNECT COMMIT ROLLBACK DECLARE CURSOR DECLARE LOCAL TEMPORARY TABLE BEGIN DECLARE SECTION END DECLARE SECTION WHENEVER all SET Befehle all GET Befehle 3.2.6.2
End of Transaction
A transaction is terminated by a COMMIT or a ROLLBACK statement. When a transaction T1 makes changes to the database this is only visible for other transactions after the COMMIT statement. Actions of the
COMMIT
statement.
1. Close all open cursor 2. Test all delayed integrity constraints. 3. All changes become persistent. Actions of the
ROLLBACK
statement.
1. Close all open cursor 3-10
2. All changes which was made during the transaction are canceled. Remark 3.4 [Nested Transaction] In SQL-2003 there is the posibility of nested transaction. At each time during a transaction a savepoint can be set. Now it is possible to make a rollback to a given savepoint. The main transaction remains open and can be terminated later. 3.2.6.3
Property of Transactions
The “access mode” and the “isolation level” of a transation can be set with the SET TRANSACTION statement. This statement can only be executed if no transaction is active. Syntax of the command: transactionoption ::= SET TRANSACTION [accessmode] [,isolationlevel] accessmode ::= READ ONLY | READ WRITE isolationlevel ::= SERIALIZABLE | REAPEATABLE READ | READ COMMITTED | READ UNCOMITTED
“access mode”: A READ ONLY transaction can only read the data. No updates are possible during a read only transaction. The default is READ WRITE. If the isolation level is READ UNCOMMITED, then the access mode must be READ ONLY. “isolation level”: The highest degree of isolation level is serializability. If all transactions on this degree work well, then it is guaranteed that the result of concurrent transactions gives the same result, as if the transactions where executed in serial manner (one speaks then of the ACIDS principle). The default for the isolation level is SERIALIZABLE. If another level is selected by the user, then the database system may select a higher one. The ordering of isolation levels is: READ UNCOMMITTED < READ COMMITTED < REAPEATABLE READ < SERIALIZABLE Depending on the isolation level of the transaction, the following may happen. 1. Dirty read: This is the same than “uncommited dependency” (section 3.2.4). Transaction T1 updates a tupel, then transaction T2 reads the tupel. Now transaction T1 makes a rollback and the data read by T2 are “dirty”. 2. Nonrepeatable read: The transaction T1 reads a tuple, then T2 update the tuple. Now T1 reads the tuple again and sees another value. This is the same problem than “inconsistent analysis” (section 3.2.4). 3. Phantoms Transaction T1 selects a set of tuples with a given select conditions. Now transaction T2 inserts a tuple which meets the given conditions. If the same query is executed again by transaction T1, there will be some new tuples (phantoms) in the result set. 3-11
The “isolation level” in SQL says which of the effects may hapen during a transaction. The following table shows wich effects are possible with which isolation level. isolation level dirty read nonrepeatable read phantom READ UNCOMMITTED yes yes yes READ COMMITTED no yes yes REPEATABLE READ no no yes SERIALIZABLE no no no
3-12
Chapter 4 Definition of the Database Schema 4.1
Data Definition Language (SQL/DDL)
The D(ata) D(efinition) L(anguage) (DDL) is the part of SQL with wich one can define the database Schema.
4.1.1
Data Type
The following table shows the standard data types. Type BOOLEAN SMALLINT
Instance Logical Value Integer Value
Example TRUE 4771
Fixpoint Number
1003.65
Floatingpoint Number
1.5E-4
Alphanumeric Character String
’A String’
Binary String
B’110100101’
Date Time Timestamp Time Interval
DATE’1951-11-24’ TIME’11:39:49’ TIMESTAMP’2002-08-23 14:15:00’ INTERVAL’48’ HOUR
INTEGER BIGINT DECIMAL(p,q) NUMERIC(p,q) FLOAT(p) REAL DOUBLE PRECISION CHAR(q) VARCHAR(q) CLOB BIT(q) BIT VARYING(q) BLOB DATE TIME TIMESTAMP INTERVAL
4-1
4.1.2
Relation Schema
A relation schema is called a table and is defined by its attributes. Each attribute must have a data type. 4.1.2.1
New Relation Schema
::= CREATE TABLE ( [,]... ) ::= | ::= { | } [] []...
The definition of and will be given in the section 4.1.3. Example 4.1 [New Relation Schema] Here is the definition of the schema for suppliers. CREATE TABLE
sNr name city zip
supplier ( INTEGER, VARCHAR(30), VARCHAR(30), CHAR(8)
)
4.1.2.2
Altering a Relation Schema
With the command ALTER TABLE the definition of a schema can be changed. It is not possible to alter the data type of an attribute. ::= ::=
ADD [COLUMN]
| | | |
The command
ALTER [COLUMN] {SET | DROP DEFAULT} DROP [COLUMN] [RESTRICT | CASCADE] ADD DROP CONSTRAINT [RESTRICT | CASCADE]
ALTER TABLE
DROP [COLUMN]
can not be executed in the following cases.
• Trying to delete the last attribute of the table.
4-2
• The delete mode is RESTRICT and the attribute is referenced in a constraint or a view definition. If the delete mode is CASCADE, then views and constraints referencing the attribute are also dropped. Example 4.2 [Altering a Relation Schema] Define the new attribute “street in the supplier table. ALTER TABLE
supplier
ADD
street
VARCHAR(30) DEFAULT
’unknown’
Drop the attribute ”zip“ from the supplier table. ALTER TABLE
supplier
DROP
zip
RESTRICT
The table can be altered even if tuples already exist in the table. 4.1.2.3
Dropping a Relation Schema
With the command DROP TABLE a table can be dropped from the database. The command also deletes all tuples in the given table. ::=
DROP TABLE
[RESTRICT |
The table is not dropped if the delete mode is a constraint or a view definition.
RESTRICT
CASCADE]
and the table is referenced in
Example 4.3 [Dropping a Relation Schema] Delete the supplier table from the database. DROP TABLE
4.1.3
supplier
CASCADE
Constraints
Constraints permits to exclude ”semantic incorrect“ datas from the database. SQL:1992 supports constraints for attributtes, relation schematas and general constraints called Assertion. All constraints have an explicit or implicit (created by the system) name. The name of the constraint is used in error messages. ::= [CONSTRAINT ] ::= |
4.1.3.1
Constraints for Attributes
This constraints are defined at the same time as an attribute and are valid for exactly one attribute in a table.
4-3
::=
NOT NULL
| | |
|
UNIQUE PRIMARY KEY
[ON DELETE ] [ON UPDATE ] CHECK()
REFERENCES
Explantion: •
NOT NULL
the attribute can not contain a null value.
•
UNIQUE
•
PRIMARY KEY
•
REFERENCES
•
CHECK
the value of the attribute must be unique over the whole table. One exception is the value NULL which can appear an arbitrary number times. defines the attribute as the primary key.
defines the attribute as a foreign key.
is a boolean expression which limits the range of possible values for this attribute.
Example 4.4 [Constraints for Attributes] CREATE TABLE
person (
peNr name firstName securityNr
INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, VARCHAR(30) NOT NULL, CHAR(14) UNIQUE
) CREATE TABLE
phobby (
peNr
INTEGER REFERENCES
person (peNr)
ON DELETE CASCADE,
hobby
VARCHAR(20) NOT NULL
(hobby PRIMARY KEY(peNr, hobby) CHECK
IN
(’Golf’, ’Theater’, ’Football’))
) 4.1.3.2
Constraints for Relation Schema
::=
{PRIMARY KEY | UNIQUE} () | FOREING KEY () REFERENCES table-name [()] [MATCH {SIMPLE | PARTIAL | FULL}] [ON DELETE ] 4-4
[ON UPDATE ] | CHECK (conditional-expression) ::= [,]... ::= {NO ACTION | CASCADE | SET DEFAULT |
•
UNIQUE
•
PRIMARY KEY
•
FOREIGN KEY
SET NULL}
key over more than one attribute. primary key (more than one attribute).
foreign key (more than one attribute). It is possible to specify matching rules and referential actions. Matching Rules specify the behavior of the system if an attribute of the foreign key contains a null value. If the value of an attribute is null, then the referential integrity is not tested. This is the default. MATCH PARTIAL either all attributes are null or the attributes wich are not null, match the attributes of at least one tuple in the referenced table. MATCH FULL either all attributes are null or the attributes wich are not null, match the attributes of exactly one tuple in the referenced table. MATCH SIMPLE
Referential Actions specify the behavior of the system when a tuple is deleted (ON DELETE) or changed (ON UPDATE) in the referenced table. do not delete the tuple if it is referenced by a tuple of this table. This is the default. CASCADE also delete all tuples in this table which reference the tuple in the referenced table. SET NULL attributes of the foreign key are set to null in the referencing tuples. NO ACTION
•
defines a integrity constraint for the table. The constraint can contain subqueries.
CHECK
Example 4.5 [Constraints for Relation Schema] CREATE TABLE
peNr name firstName
person ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, VARCHAR(20) NOT NULL,
UNIQUE(name,
firstName)
); CREATE TABLE
name firstName
hobbys ( VARCHAR(30), VARCHAR(20),
4-5
hobby
VARCHAR(20) NOT NULL,
PRIMARY KEY FOREIGN KEY
(name, firstName, hobby), (name, firstName) REFERENCES person (name, firstName)
ON DELETE CASCADE ON UPDATE CASCADE CHECK
(hobby
IN
(’Golf’, ’Theater’, ’Boxing’, ’Piano’))
)
Example 4.6 [Names for Constraints] In the next example the name of the constraint is defined explicitly by the user. CREATE TABLE
person (
p_nr
INTEGER,
name firstName
"Primary Key for person"PRIMARY KEY VARCHAR(30) NOT NULL, VARCHAR(20) NOT NULL, CONSTRAINT
,
"Name und firstName are a canditate key" firsName)
CONSTRAINT
UNIQUE(name,
);
4.1.3.3
ASSERTIONS
An assertion is a global constraint which is not bound to a given table. ::=
CREATE ASSERTION
CHECK
()
conditional-expression is an arbitrary SQL statement which is to true or false. The expression can contain more than one table. Example 4.7 [Assertions] The next assertion ensures that each person has at least one hobby. personhobby (NOT EXISTS (SELECT * FROM person p
CREATE ASSERTION CHECK
WHERE NOT EXISTS
(SELECT * FROM phobby ph WHERE p.peNr = ph.peNr)))
4-6
4.1.3.4
Modes for constraints
Each constraint has an initial constraint mode and a delay mode. • the constraint mode defines when the constraint is tested during a transaction. 1.
IMMEDIATE
tests the constraint immediatly after the execution of a SQL
statement. 2.
DEFERRED
delays the test to the end of the current transaction.
The initiale constraint mode is valid at the beginning of a transaction. It can be set explicitly by the INITIALLY IMMEDIATE or INITIALLY DEFERRED clause. • Only deferrable constraints can have the mode DEFERRED. Deferrabilty is set implicitly by the INITIALLY DEFERRED clause or explicitly by the DEFERRABLE clause. Now the full syntax for a constraint clause: [CONSTRAINT ] [[NOT] DEFERRABLE] [INITIALLY {IMMEDIATE | DEFERRED]} The defaults are
NOT DEFERRABLE
and
INITIALLY IMMEDIATE.
The delay mode can be changed during the transaction if the constraint is deferrable. ::= ::=
{DEFERRED | IMMEDIAT} | [, ]...
SET CONSTRAINTS ALL
There exist many examples where a constraint must be deferred. In the example 4.7 no person can be inserted because there is no hobby for this person. On the other hand the hobby can not be inserted as long the person does not exist. This problem can only be solved if the test of the constraint is delayed to the end of the transaction. The correct definition for the assertion must have the following form. CREATE ASSERTION CHECK
personhobby
(NOT EXISTS (SELECT * FROM person p WHERE NOT EXISTS
(SELECT * FROM phobby ph p.peNr = ph.peNr)))
WHERE
DEFERRABLE INITIALLY DEFERRED
4-7
4.2
Normalization
Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.
4.2.1
First Normal form 1NF
Definition 4.1 [First Normal Form 1NF] A table is in first normal form (1NF) if and only if it faithfully represents a relation. One requirement of a relation is that every table contain exactly one value for each attribute. Given that database tables embody a relation-like form, the defining characteristic of one in first normal form is that it does not allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents duplicate rows) and without any nullable columns is in 1NF. Example 4.8 [Relation in 1NF] Consider the following relation schema about lecturer and the course they give. FacultyCourse = ({lNr, name, city, fNr, facultyName, cNr, courseName, weekDay}, {lNr, cNr}) Some explanation: • Every lecturer has a name, a city and belongs to exactly one faculty. • A lecturer can give more than one course. It ist possible that the lecturer has no courses (sabbatical). • The same course can be teached by more than one lecturer. • Each course has a code and a unique name. • Each faculty has a code and a unique name. • weekDay is the day when the course is teached by this lecturer. In the next table a possible relation over this schema is shown. lNr 1 1 2 2
name Jones Jones Davis Davis
city New York New York Boston Boston
Lecturer and Their fNr facultyName 120 Mathematics 120 Mathematics 135 Physics 135 Physics 4-8
Courses cNr courseName 222-1 Analysis 440-1 Linear Algebra 334-2 Mechanics 440-1 Linear Algebra
weekDay Monday Friday Tuesday Thursday
The relation in the example 4.8 is in first normal form but has some update anomalies insert: It is not possible to insert a lecturer if he gives no course because cNr belongs to the primary key. delete: When the last course of a lecturer is deleted, all informations about the lecturer are lost. update: If Jones moves from New York to Boston, all tuples about Jones must be changed. Otherwise the data base is no longer consistent.
4.2.2
Functionale Dependency
In the normalization process we have to know how attributes are dependent of each other. This leads to the term of functional dependency (FD). Definition 4.2 [Functional Dependency] Let R be a relation schema, X ⊆ R and Y ∈ R. Attribute Y has a funcional dependency on Xif, for each value of the attributes in X, there is exactly one value of the attribute B. The notation is R.X → R.Y Formaly: Let R be a relation schema, X ⊆ R and Y ∈ R. Attribute Y has a functional dependency on X if, for all relation r ∈ REL(R) R.X → R.Y ⇔ ∀t1 , t2 ∈ r : t1 (X) = t2 (X) ⇒ t1 (Y ) = t2 (Y ) All non key attributes in a relation are functionaly dependent on the attributes of a key (primary or candidate). In the example 4.8 we can find the following FDs. lNr → name cNr → courseName (lNr, cNr) → weekDay Definition 4.3 [Full functional dependency] An attribute Y of a relation schema R is fully functionally dependent on a set of attributes X ⊆ R if it is • functionally dependent on X, and • not functionally dependent on any proper subset of X. In the example 4.8 the following functional dependency hold. (lNr, name) → city It is not a full functional dependency because the following FD also holds. lNr → city Remark 4.1 [FD is a Semantik Notion] The notion of functional dependency is a semantic one. Recognizing FDs is only possible if the meaning of the attributes are known. Each FD is modelling a fact from the system. 4-9
4.2.3
Second Normal Form 2NF
The problems in example 4.8 is, that there are attributes which are not fully functional dependent on the primary key. This introduces redundancy in the database. All full functional dependency are shown in figure 4-1. name city
facultyName Key lNr
fNr
weekDay cNr
courseName
Figure 4-1: Full functional dependency in the relation schema FacultyCourse Definition 4.4 [Second Normal Form 2NF] A relation schema R is in second normal form (2NF) if • it is in first normal form and • every non key attribute is fully functional dependent on every key in the schema. The task now is to transform the relation schema in 1NF in a system of relation schematas in 2NF. To do this we consider wich non key attributes are full functional dependent on which attributes of the primary key. • The attributes name, city, fNr and facultyName are fully functional dependent on the attribute lNr. • The attribute courseName is fully functional dependent on the attribute cNr • The attribute weekday is fully functional dependent on the attributes (lNr, cNr). For each of this groups we build a new relation schema. So we obtain the three schematas: Lecturer Course LecturerCourse
= ({lNr, name, city, fNr, facultyName} {lNr}) = ({cNr, courseName}, {cNr}) = ({lNr, cNr, weekDay}, {lNr, cNr}, fk({lNr}, Lecturer), fk({cNr, Course}))
4-10
To obtain the relation over the new schematas we simply make the three projections: • Lecturer = πlN r,name,city,f N r,f acultyN ame (F acultyCourse). • Course = πcN r,courseN ame (F acultyCourse) • LecturerCourse = πlN r,cN r,weekDay (F acultyCourse) The result of this projections are shown inf figure 4-2.
lNr name 1 Jones 2 Davis
Lecturer city fNr New York 120 Boston 135 lNr 1 1 2 2
facultyName Mathematics Physics
cNr 222-1 440-1 334-2
Courses courseName Analysis Linear Algebra Mechanics
LecturerCourse cNr weekDay 222-1 Monday 440-1 Friday 334-2 Tuesday 440-1 Thursday
Figure 4-2: Lecturer and their courses in 2NF Remark 4.2 [Relationship] The relation schema LecturerCourse represents a M:M relationship between Lecturer and Courses. The attribute weekDay is an intersection attribute. Consider the case where the attribute weekDay is not present. Then the schema LectureCourse is still necessary to represent the M:M relationship between Lecturer and Course. All full functional dependency of the Lecturer-Course system in 2NF are shown in figure 4-3. Lecturer
facultyName fNr
lNr
city name
Course cNr
LecturerCourse lNr
courseName
weekDay cNr
Figure 4-3: Full functional dependency of the Lecturer-Course system in 2NF The two relation schematas Course and LecturerCourse are not only in 2NF but also in 3NF. Only the relation schema Lecturer is not fully normalized. The following update anomalies are still present: 4-11
insert: A new faculty can only be inserted if a lecturer belongs to the faculty. delete: If we delete the last lecturer of a faculty, the informations about the faculty are also lost. update: If the name of the faculty changes then this must be changed for every lecturer of the faculty.
4.2.4
Third Normal Form (3NF)
The problems in the relational schema Lecturer is caused by the two following functional dependencies. lN r → f N r and f N r → f acultyN ame This is called a transitive dependency. The attribute facultyName is transitvely depependent on the attribute lNr over the attribute fNr. This introduces redundancy in the relation. Definition 4.5 [Third Normal Form 3NF] A relationschema R is in third normal form (3NF) if • it is in 2NF and • Every non key attribute of the table is non-transitively dependent on each key of the schema. In order to eliminate a transitive dependency in a relational schema, it must be split into two different schematas. Eliminate the transitive attribute in the old schema and place it, with the non-key attribute it depends on, in the new schema. For the relation schema Lecturer we obtain the to schematas: Lecturer Faculty
= ({lNr, name, city, fNr}, {lNr} fk({fNr}, Faculty)) = ({fNr, facultyName}, {fNr})
To obtain the relation over the new schematas we simply make the two projections: • F aculty = πf N r,f acultyN ame (Lecturer) • Lecturer = πlN r,name,city,f N r (Lecturer). The result of this projections are shown inf figure 4-4. All full functional dependency of the Lecturer-Course system in 3NF are shown in figure 4-5. Theorem 4.1 [Transformation to 3NF] Every relation schema can be decomposed in a system of relational schematas which are in 3NF and preserves all dependencies of the original schema. This means no informations is lost. 4-12
lNr 1 2
Lecturer name city fNr Jones New York 120 Davis Boston 135
fNr 120 135
Faculty facultyName Mathematics Physics
Figure 4-4: Lecturer in 3NF Lecturer
Faculty fNr
lNr
fNr
city
facultyName
name
LecturerCourse Course lNr cNr
weekDay
courseName
cNr
Figure 4-5: full functional dependency of the Lecturer-Course system in 3NF
4.2.5
The Boyce/Codd Normal Form BCNF
Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table which does not contain multiple overlapping candidate keys is guaranteed to be in BCNF. Depending on what its functional dependencies are, a 3NF table containing two or more overlapping candidate keys may or may not be in BCNF. Consider the following relation schema with the attributes TutorID, TutorSocSecurityNr and StudentID with the following rules • A tutor has exactly one social security number and a social security number is assigned to exactly one tutor. • More than one student can be associated with a single tutor. • More than one tutor can be associated with a single student. From the rules we can deduce the following candidate keys, which happen to overlap. • {TutorID, StudentID} • {TutorSocSecurityNr, StudentID} From the rules we can also deduce the following functional dependencies: • T utorID → T utorSocSecurityN r 4-13
• T utorSocSecurityN r → T utorID An example for a relation over this schema is shown in the table 4.1. Tutor/Student Cross-Reference TutorID TutorSocSecurityNr StudentID 1078 088-51-0074 31850 1078 088-51-0074 37921 1293 096-77-4146 46224 1480 072-21-2223 31850 Table 4.1: Tutor/Student Cross-Reference Because all attributes belongs to a key the relation schema is in 3NF. But the schema still has some update anomalies. The candidate keys of the Tutor/Student CrossReference table cannot prevent a situation in which two different Tutor IDs are shown, illegitimately, as corresponding to the same tutor social security number. So the functional dependency T utorSocSecurityN r → T utorID can be violated. Definition 4.6 [Boyce/Codd Normal Form BCNF] A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {EmployeeID, EmployeeAddress} → EmployeeAddress is trivial EmployeeAddress → EmployeeAddress is also trivial A table is in Boyce-Codd normal form (BCNF) if and only if, for every one of its nontrivial functional dependencies X → Y , X is a superkey that is, X is either a candidate key or a superset thereof. In the example T utorSocSecurityN r → T utorID is a non trivial functional dependency but TutorSocSecurityNr is not a superkey of this schema. The solution of this problem is to split the relation schema. TutorStudent = ({TutorID, StudentID}, {TutorID, StudentID}, fk({TutorID}, Tutor)) Tutor = ({TutorID, TutorSocSecurityNr}, {TutorID}, ck(TutorSocSecurityNr)) Both tables are in BCNF; and so long as the uniqueness of all candidate keys is enforced, there is no possibility of showing multiple distinct Tutor IDs against the same Tutor Social Security Number or vice-versa. The resulting relations are shown in table 4.2 4.2.5.1
Achievability of BCNF
In some cases, a non-BCNF table cannot be decomposed into tables that satisfy BCNF and preserve the dependencies that held in the original table. Beeri and Bernstein showed in 1979 that, for example, a set of functional dependencies {A, B → C, C → B} 4-14
Tutor/Student Cross-Reference TutorID StudentID 1078 31850 1078 37921 1293 46224 1480 31850
Tutor TutorID TutorSocSecurityNr 1078 088-51-0074 1293 096-77-4146 1480 072-21-2223
Table 4.2: Tutor/Student Cross-Reference cannot be represented by a BCNF schema. Thus, unlike the first three normal forms, BCNF is not always achievable. Consider the following relation schema with the attributes Student, Course and Lecturer with the following rules: • A course is teached to a student by exactly one lecturer. • Each lecturer teaches exactly one course. • Each course can be teached by more than one lecturer. From the rules we can deduce the following functional dependencies. • (Student, Course) → Lecturer • Lecturer → Course Those this schema follow the {A, B → C, C → B} pattern. An example for a relation over this schema is shown in the table 4.3. Student-Course-Lecturer-Relation Student Course Lecturer Peter Mathematics Smith Peter Physics Jones Mark Mathematics Smith Mark Physics Davis Table 4.3: Student-Course-Lecturer-Relation From the rules and the functional dependency we can deduce that the relation has two candidate keys. {Student, Course} and {Student, Lecturer}. Since there is no non-key attribute in this relation schema, it is in 3NF. But the schema still has some update anomalies. insert: The fact that a lecturer is teaching a course can not be recorded if no student takes the course. 4-15
delete: Deleting the fact that Mark takes the course in physics, also destroys the fact that Davis teaches physics. In the example Lecturer → Course is a non trivial functional dependency but Lecturer is not a superkey of this schema. The solution of this problem is to split the relation schema. StudentLecturer = ({Student, Lecturer}, {Student, Lecturer}, fk({Lecturer}, LecturerCourse)) LecturerCourse = ({Lecturer, Course}, {Lecturer}) The relations are shown in table 4.4. StudentLecturer Student Lecturer Peter Smith Peter Jones Mark Smith Mark Davis
LecturerCourse Dozent Fach Smith Mathematics Jones Physics Davis Physics
Table 4.4: Relation Schematas in BCNF Unfortunately, although this design adheres to BCNF, it is unacceptable on different grounds: it allows us to record multiple lecturer which teaches the same course against the same student. In other words, its candidate keys do not guarantee that the functional dependency (Student, Course) → Lecturer will be respected. The only solution is to introduce a global integrity
4.3
ERD and the Database Schema
In this section a methode is described to transform an ERD to a database schema with the following property 1. All informations in the ERD must be mapped to the database schema. 2. All relations are in 3NF 3. the number of relation schematas is minimal. The translation is relativly simple and could be performed by an algorithm. For every element in the ERD a table or an attribute is created in the database schema.
4-16
4.3.1
Entity sets and Attributes
For entity sets create a table with the same name. The key of the entity becomes the primary key. The figure 4-6 shows the entity person person peNr title[0..1] name firstName birthDate hobby[1..*]
Figure 4-6: Entity Person The steps to create a schema are described below 1. Create a table with the same name as the entity and define the entity key as primary key. CREATE TABLE
peNr
person ( INTEGER PRIMARY KEY
) 2. All attributes with type 1 cardinality can be added to the schema with the NULL constraint. CREATE TABLE
person (
peNr name firstName birthDate
INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, VARCHAR(30) NOT NULL, DATE NOT NULL
) 3. Then add all attributes with type C cardinality without constraint. CREATE TABLE
peNr title name firsName birthDate
person ( INTEGER PRIMARY KEY, VARCHAR(15), VARCHAR(30) NOT NULL, VARCHAR(30) NOT NULL, DATE NOT NULL
)
4-17
NOT
4. Attributes with type M or type MC cardinality must be stored in an own table (one table for each such attribute). The primary key is the primary key of the main schema plus the attribute. Then there must be a foreign key to the main table. In the example hobby has cardinality MC and must be stored in an own table. CREATE TABLE
personhobby (
peNr
INTEGER REFERENCES
hobby
VARCHAR(20)
person(peNr)
ON DELETE CASCADE, CHECK(hobby IN PRIMARY KEY(peNr,
(’Football’, ’Piano’, ’Theater’, ’Golf’, ’Boxing’, ’Cooking’)),
hobby)
)
In this example type MC cardinality is realized for the attribute hobby. For type M cardinality an assertion is necessary. "Hobby has cardinality M" (NOT EXISTS (SELECT * FROM person LEFT JOIN personhobby WHERE hobby IS NULL))
CREATE ASSERTION CHECK
DEFERRABLE INITIALLY DEFERRED
Note that The test of the assertion must be deferred to the end of the transaction
4.3.2
Generalization and Specialisation
There are two possibilities for the implementation of specialization: 4.3.2.1
Vertical Representation
In this representation the each specialization is implemented as a table. The primary key of this tables are also foreign keys to the entity who represents the generalization. Example 4.9 [Implementation of Generalization] Consider the three relation schematas. CREATE TABLE
peNr name firstName birtDate
person ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, VARCHAR(30) NOT NULL, DATE NOT NULL
)
4-18
CREATE TABLE
physician (
peNr speciality
INTEGER PRIMARY KEY, VARCHAR(30)
) CREATE TABLE
patient (
peNr healthInsurance
INTEGER PRIMARY KEY, VARCHAR(30)
) To represent person as a generalization of physician and patient we introduce two foreign keys. CREATE TABLE
physician (
p_nr
INTEGER PRIMARY KEY REFERENCES
person(p_nr)
ON DELETE CASCADE,
speciality
VARCHAR(30)
) CREATE TABLE
patient (
peNr
INTEGER PRIMARY KEY REFERENCES
person(p_nr)
ON DELETE CASCADE,
healthInsurance
VARCHAR(30)
) This representation is correct: • A physician or a patient can be only one person because peNr is the primary key and the foreign key. • An entity patient or physician can only exists if it also exists as a person. This representation only fits case d) in the figure 1-7. Cases a) to c) can only be implemented correctly with some constraints. • Case a) and b): The specialisations are disjoint: "Physician and Patient are disjoint" (NOT EXISTS (SELECT * FROM person NATURAL JOIN physician NATURAL JOIN patient))
CREATE ASSERTION CHECK
• Case a) und c): A person must be physician or patient.
4-19
CREATE ASSERTION CHECK
"person is abstract"
(NOT EXISTS (SELECT * FROM person p WHERE NOT EXISTS
(SELECT * FROM physician ph WHERE p.peNr = ph.peNr) AND NOT EXISTS
(SELECT * FROM patient pa WHERE p.peNr = pa.peNr))) DEFERRABLE INITIALLY DEFERRED
Remark 4.3 [Assertions] Today no database system implements assertions. They must be simulated in the application with triggers and stored routines. 4.3.2.2
Horizontal Representation
In this representation specialization has no own entity. All attributes of the specialization are stored in th main entity. In the main entity an attribute is introduced to store which sort of entity is present. Example 4.10 [Horizontal Representation] person, patient and physician example. CREATE TABLE
person (
peNr name firstName birthDate isphysician speciality ispatient healthInsurance
INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, VARCHAR(30) NOT NULL, DATE NOT NULL, BOOLEAN NOT NULL, VARCHAR(30), BOOLEAN NOT NULL, VARCHAR(30)
) If physician and patient are disjoint the following constraint will express this. person ADD CONSTRAINT "Physician and Patient are disjoint" CHECK (NOT(isphysician = TRUE AND ispatient = TRUE)) ALTER TABLE
If person is abstract, then add the following constraint. ADD CONSTRAINT "person is abstract" CHECK (isphysician = TRUE OR ispatient = TRUE)
4.3.3
ALTER TABLE
Relationships
Figure 4-7 shows the most important cases of relationships. 4-20
person
Physician phNr
1
treats
1..*
Patient paNr
Physician phNr
1..*
treats
1..*
Patient paNr
Figure 4-7: Relationships 4.3.3.1
Type 1 to Type M Relationship
In this case a foreign key is introduced from patient to physician. CREATE TABLE
phNr . .
physician ( INTEGER PRIMARY KEY
) CREATE TABLE
paNr phNr . .
patient ( INTEGER PRIMARY KEY, INTEGER NOT NULL REFERENCES
physician(phNr),
)
If it is possible that patients exist which are treated by no physician, then null values for the foreign key are allowed. CREATE TABLE
paNr phNr . .
patient ( INTEGER PRIMARY KEY, INTEGER REFERENCES
physician(phNr),
)
To be sure that each physician is treating at least one patient, the following constraint must be introduced. "Physician treats 1 Patient" (NOT EXISTS (SELECT * FROM physician LEFT JOIN patient WHERE paNr IS NULL)
CREATE ASSERTION CHECK
DEFERRABLE INITIALLY DEFERRED
4-21
4.3.3.2
Type M to Type M Relationship
In this case a new table is introduced to represent the relationship. This table has a foreign key to physician and one to patient. The primary key is composed of both foreign keys. CREATE TABLE
phNr . .
physician ( INTEGER PRIMARY KEY
) CREATE TABLE
paNr . .
patient ( INTEGER PRIMARY KEY,
) CREATE TABLE
phNr paNr
PhTreatsP ( INTEGER REFERENCES INTEGER REFERENCES
PRIMARY KEY
physician(phNr), patient(paNr),
(phNr, paNr)
)
Also in this example the conditional case (MC:MC) is represented. To represent (M:M or MC:M) an assertion is necessary.
4.3.4
Intersection Attribute
The representation of intersection Attributes depends on the cardinality of the relationship. 4.3.4.1
Type 1 to Type M Relationship
In figure 4-8 diagnosis and medicine are intersection attributes. In this example there is a 1:M cardinality between physician and patient, so the intersection attributes can be stored in patient. Because medicine has cardinality MC it must be stored in an own relation. CREATE TABLE
phNr . .
physician ( INTEGER PRIMARY KEY
) 4-22
Physician phNr
1
treats
1..*
Patient paNr
PhTreatsP diagnosis medicine[0..*]
Figure 4-8: Intersection Attributes in a 1 to M Relationship
CREATE TABLE
patient (
paNr phNr diagnosis . .
INTEGER PRIMARY KEY, INTEGER REFERENCES
physician(phNr),
CLOB,
) CREATE TABLE
PaMedicine (
paNr
INTEGER REFERENCES
patient(paNr)
ON DELETE CASCADE
medicine
VARCHAR(30),
PRIMARY KEY
(paNr, medicine)
)
4.3.4.2
Type M to Type M Relationship
In figure 4-9 diagnosis and medicine are intersection attributes. In this example there is a M:M cardinality between physician and patient. But in this case the relationship is represented by the entity PhtreatsP so the intersection attributes can be stored there. The attribute medicine has cardinality MC and must be stored in a new relation. Physician
1..*
treats
1..*
Patient
PhTreatsP diagnosis
Figure 4-9: Beziehungsattribut bei einer textbfM zu M Beziehung Die entsprechenden Relationenschematas sehen also folgendermassen aus: CREATE TABLE
phNr
physician ( INTEGER PRIMARY KEY
4-23
. . ) CREATE TABLE
patient (
paNr . .
INTEGER PRIMARY KEY,
) CREATE TABLE
PhTreatsP (
phNr paNr diagnosis
INTEGER REFERENCES INTEGER REFERENCES
physician(phNr), patient(paNr),
CLOB,
PRIMARY KEY
(phNr, paNr)
) CREATE TABLE
PaMedicine (
phNr paNr medicine
INTEGER INTEGER VARCHAR(30),
(phNr, paNr, medicine) FOREIGN KEY(phNr, paNr) REFERENCES PhTreatsP(phNr, paNr) PRIMARY KEY
ON DELETE CASCADE
)
4.3.5
Name Conflict
If possible the attributes of a foreign key have the same name than the attributes of the referenced primary key. Sometimes this is not possible. Consider the example in figure 4-10. Product prNr 0..*
Person peNr
0..*
1 sells produces
Figure 4-10: Namenskonflikte 4-24
1
In this example there are two relationships between person and product. The rules say that two foreign keys must be introduced in the relation product. This two foreign keys can not share the same attributes. We have to rename the attributes of at least one foreign key. We choose sPeNr for selling and pPeNr for producing. CREATE TABLE
peNr
person ( INTEGER PRIMARY KEY
CREATE TABLE
prNr pPeNr sPeNr
)
product ( INTEGER PRIMARY KEY, INTEGER REFERENCES INTEGER REFERENCES
person(peNr), person(peNr)
)
4.3.6
Verify 3NF
The constructed database schema must be verified to ensure that all relation schematas are in 3NF. 4.3.6.1
All Schematas are in 2NF
Because all entities in the ERM have a surogate key, all schematas which represent an entity have a key with only one attribute an can not violate the 2NF. Only relation schematas representing M:M relationships or attributes with cardinality M have composed primary keys in the database schema. 1. Attributes with cardinality M In this case all attributes of the schema belong to the primary key and a violation of the 2NF is not possible. 2. M to M relationship In this case all attributes which do not belong to the primary key are intersection attributes. An intersection attribute depends on both entity of the relationship. This means the attribute is full functional dependent on the primary key of the entity representing the relationship. 4.3.6.2
Violation of the 3NF
There are two cases where the 3NF can be violated. Local violation of the 3NF This can only happen if the ERD is not correct and that a depency between two attributes is not represented. This leads to a transitve dependency in the database schema. In this case normalize the concerned relation schema. 4-25
Global violation of the 3NF This case happens if there is a functional dependency of the form sNr → zip → city But unlike to the local case the dependency between zip and city was represented in the ERD. This means that in the database schema there exists a relation of the form CREATE TABLE
zip city . .
zipCity ( CHAR(8) PRIMARY KEY, VARCHAR(30) NOT NULL,
),
which records this fact. In this case the normalization is very simple. Example 4.11 [Global violation of the 3NF] In this example city is functionaly dependent on zip. This fact is recorded in the relation schema zipCity. CREATE TABLE
sNr name zip city
supplier ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, CHAR(8) NOT NULL, VARCHAR(30) NOT NULL
) CREATE TABLE
zip city . .
zipCity ( CHAR(8) PRIMARY KEY, VARCHAR(30) NOT NULL
)
In supplier there is the transitiv dependency sNr → zip → city. To correct this introduce a foreign key between supplier and zipCity and eliminate the attribute city in supplier. CREATE TABLE
sNr name zip
supplier ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, CHAR(8) NOT NULL REFERENCES
)
4-26
zipCity(zip)
There is a criterion, to find global violation of the 3NF in the database schema. A relation schema R possibly violates the 3NF if the two following conditions hold. 1. The relation schema R has a foreign key a which references a relation ER 2. The schematas R and ER have a common attribute b.
4.3.7
Example
4.3.7.1
Description of the problem
Consider the following rules for a company: 1. A person operates several machines. 2. A person produces several products. 3. A machine is operated by one person. 4. A machine produces several products. 5. The production of a product requires one machine. 6. The production of a product requires one person. 7. Each person has a name. 8. Each product has a name. 9. The dates of maintenace of a machine are recorded. There are not more than one maintenance for the same machine on the same day. 10. The production of a product needs one or more other products. The ERD is shown in figure 4-11. 0..*
Product prNr pName
Machine mNr machineType maintenance[0..*]
8
9
7
Person peNr name
10 0..*
1..*
1..*
5
needs
6
1 produces
4
1..*
3
1 operates
1
produces
Figure 4-11: Das Personen-Produkt-Maschine Beispiel
4-27
1
2
4.3.7.2
Impelementing the Database Schema
Entities and Attributes For each entity in the ERD we define a table in the database. CREATE TABLE
peNr name
Person ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL
) CREATE TABLE
prNr pName
Product ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL
) Machine ( mNr INTEGER PRIMARY KEY, machinType VARCHAR(40) NOT NULL
CREATE TABLE
)
Create an own schema for the maintenance date. CREATE TABLE
mNr
MaschineMaintenance ( INTEGER REFERENCES
Maschine(m_nr)
ON DELETE CASCADE,
maintenance DATE, PRIMARY KEY(mNr,
maintenance)
)
Generalization and Specialization In this example there are no generalizations. Relationships Introduce foreign keys for the three 1:M relationships ”person produces product“, ”machine produces product“ and ”person operates machine“. CREATE TABLE
pNr pName mNr peNr
Product ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, INTEGER REFERENCES INTEGER REFERENCES
) CREATE TABLE
Machine ( 4-28
Machine(mNr), Person(peNr)
mNr machineType peNr
INTEGER PRIMARY KEY, VARCHAR(40) NOT NULL, INTEGER REFERENCES
Person(pe_nr)
)
Introduce a relation schema for the MC:MC relationship ”product needs product“. CREATE TABLE
PartsList (
prNr
INTEGER REFERENCES
component
INTEGER REFERENCES
Product(prNr)
ON DELETE CASCADE,
PRIMARY KEY(prNr,
Product(prNr),
component)
)
Intersection Attributes In this example there are no intersection attributes. Verify the 3NF The relation schema Product has a foreign key which references the schema Machine. Product and Machine have the common attribute peNr. The criterion for a violation of the 3NF is fulfilled. In the example there is realy a violation of the 3NF. The functional dependency mN r → peN r (”Machine is operated by person“) is stored in the schema Product and in the schema machine. So dropp the attribute peNr from the schema Machine. Here is the complete database schema. CREATE TABLE
peNr name
Person ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL
) CREATE TABLE
prNr pName mNr
Product ( INTEGER PRIMARY KEY, VARCHAR(30) NOT NULL, INTEGER REFERENCES
Machine(mNr)
) Machine ( mNr INTEGER PRIMARY KEY, machinetype VARCHAR(40) NOT NULL, peNr INTEGER REFERENCES Person(peNr)
CREATE TABLE
) CREATE TABLE
MachineMaintenance ( 4-29
mNr
INTEGER REFERENCES
Machine(mNr)
ON DELETE CASCADE,
maintenance
DATE,
PRIMARY KEY(mNr,
maintenance)
)
Remark 4.4 [3NF] To find the global violation of the 3NF it is important to test if a transitive dependency really exists. In our example the functional dependency mNr → peNr is valid in Product and Machine. This is not always true. If the attribute peNr in product has the meaning ’person sells product’, then there is no funtional dependency mNr → peNr in Product. In this case there is no transitive dependency and the attribute peNr can not be dropped from Produkt.
4-30
Chapter 5 Java Database Connection JDBC This chapter is a short introduction to JDBC. More informations and a complete description of all methods can be found at the internetaddress http://java.sun.com/j2se/1.5/docs/api/index.html the online documentation is in the package java.sql.
5.1
What is JDBC?
Here are some general informations about JDBC. • JDBC = Java DataBase Connectivity can be compared to ODBC (Open Database Connectivity) from Microsoft. • JDBC is a low level or call level API to – Get a connection to a given database – Send SQL-Queries and receive the results. – Analyze results of SQL-Queries – Get informations about the database metadata (tables, keys, procedures etc). • With JDBC it is possible to write programs in a product independent manner, to get portable code. It is also possible to use dedicated commands from a given database.
5.2 5.2.1
Architecture of JDBC Structure of the Packages
In the figure 5-1 the structure of the JDBC packages is depicted.
5-1
UserPackage
java.sql
VendorPackage
Figure 5-1: Structure of the packages
Description java.sql: This package belongs to the standard since version 1.1 of the java API. VendorPackage: This package contains an implementation class for every interface in sql.java. It contains the functionality of JDBC and is provided by the database vendor. UserPackage: This package is provided by the user and uses the different interfaces in java.sql. The package is also responsible for loading the correct JDBC driver for the given database product.
5.2.2
The package java.sql
Remark 5.1 [Rules for the use of java.sql] So that the code remains independent from the given database the following rules must be observed when objects are created: • Objects of implementation classes of the vendor package are never created with new, but with a corresponding get- or create-method in an other interface. • The names of the implementation classes are never used in the user package. The only known name is the name of the driver for the specific database. The drivers must be loaded by the user. Here there is the description of the functionality of each interface and each class in the figure 5-2. Driver: The driver is responsible for the communication between the database and the user. The implementation of the driver must have a “static code section”. This code is executed when the driver class is loaded. It should perform the following tasks 1. Create an instance of the driver. 2. Register the instance of the driver in the class Drivermanager with the method DriverManager.registerDriver. It is possible to load different drivers. So it is also possible, to communicate with different databases in the same program A driver can be loaded with the static method Class.forname. The statement 5-2
ResultSet
Statement
DriverManager
Driver
ResultSetMetaData
PreparedStatement
Connection
DatabaseMetaData
CallableStatement
Figure 5-2: The package java.sql Class.forname("com.mysql.jdbc.Driver") will load the Mysql JDBC driver. To load the driver make sure, that it is in the class path. DriverManager: This class contains only static methods. The constructor is private, so no instance of this class can be created. For the user, the only important method is getConnection, which creates a connection to the database. A JDBC URL must be passed as a parameter to the method. This URL identifies which database to use. The JDBC URL has the following format jdbc:: – is the official name of the driver or of a connectivity mechanism (eg. odbc). The producers of drivers can register the name of there subprotocols at Javasoft. – this part of the URL identifies the database. The exact syntax depends on the subprotocol. The statement Connection con = DriverManager.getConnection( "jdbc:mysql://localhost/demodb" "Name", "Password") creates a link to the database demodb on localhost and creates a new connection object con. In this example mysql is the subprotocol and //localhost/demodb is the subname. Connection, Statement, ResultSet: With the connection object it is possible to create statement objects (Statement, PreparedStatement and CallableStatement). 5-3
With a Statement object, sql commands can be sent to the database.The interface Statement has three important methods 1. executeQuery: to send sql SELECT statements to the database. 2. executeUpdate: to send INSERT, UPDATE, DELETE, CREATE, DROP etc sql statements to the database. 3. execute: if the type of the sql statement is unknown (select or update) then this method is used to send the command to the database. The result of a sql query is always a table. This table is stored in an object of the type ResultSet. With the methods first(), last(), next() and previous(), it is possible, to navigate from tuple to tuple in the result table. With the method getxxx() (xxx is a datatype) the attributes of a tuple can be converted to a java object. The interface CallableStatement standardizes the calling of stored procedure for all different databases. The input and output parameters and the return value can be defined in a standardized way. DatabaseMetadata: This interface makes it possible to get informations about the database schema (tables, attributes, keys etc). ResultSetMetaData: This interface makes it possible to get informations about a ResultSet object. This is necessary, if the schema of the result is not known at compile time. The following classes also belong to the package java.sql Exceptions In JDBC there are three kind of exceptions: • SQLExceptions SQLExceptions are thrown, if a sql statement is not executed because it is incorrect or a table does not exist or the user has not enough privileges etc. • SQLWarnings Warnings are created, if a statement is executed, but the result is unusual. For instance, if an update statement does not change anything in the database. Warnings are not thrown, but there are attached to the object, which makes the call to the database. The warnings can be obtained with the method getWarnings(). • DataTruncation If the value of an attribute is truncated when it is written to the database, then the JDBC-driver throws a DataTruncation exception. Date, Time, TimeStamp: This classes are only present for compatibility reasons with the java Date and Time classes. DriverPropertyInfo: This class stores informations about a given driver property. The method getPropertyInfo of the interface driver returns all properties of a driver. So the programmer can get information about the parameters which are necessary to open the database (user, password etc). 5-4
Types: This class only contains constants, which define the generic SQL-Types. The following table shows, how the generic SQL-Types are translated to Java types. A X in the table means that this is the preferred conversion. A x in the table means that the conversion is possible.
getByte getShort getInt getLong getFloat getDouble getBigDecimal getBoolean getString getBytes getDate getTime getTimestamp getAsciiStream getUnicodeStream getBinaryStream getObject
5.3
T I N Y I N T
S M A L L I N T
I N T E G E R
B I G I N T
R E A L
F L O A T
D O U B L E
D E C I M A L
N U M E R I C
B I T
C H A R
V A R C H A R
L O N G V A R C H A R
X x x x x x x x x
x X x x x x x x x
x x X x x x x x x
x x x X x x x x x
x x x x X x x x x
x x x x x X x x x
x x x x x X x x x
x x x x x x X x x
x x x x x x X x x
x x x x x x x X x
x x x x x x x x X
x x x x x x x x X
x x x x x x x x x
x x x x x
x x x x x
x x x X X
x
x
x
x
x
x
x
x
x
x
x
x
x
B I N A R Y
V A R B I N A R Y
L O N G V A R B I N A R Y
D A T E
T I M E
T I M E S T A M P
x X
x X
x x
x
x
x
x
X x
x x X
x
x
x
X
x x x x
x x x x
x x X x
Programming with JDBC
Here are some applications of JDBC.
5.3.1
Connection to the database and simple selects
The following program shows how to connect to the database (for Mysql). It also shows how to do a simple select. The query is sent to the database with a Statement object. Then the tuples can be read with the returned ResultSet object. Example 5.1 [Connection] // Import the JDBC package java.sql import java.sql.*; public class JdbcExample { public static void main(String args[]) { Connection con = null;
5-5
if (args.length < 2) { System.out.println("Usage: java JdbcExamples "); System.exit(0); } String user = args[0]; String passwd = args[1]; String database = args[2]; // Load the Mysql driver it is also // possible to load more than one driver. try { Class.forName("com.mysql.jdbc.Driver"); } catch (Exception e) { // Print error and exit System.out.println(e); System.exit(1); } // Connect to the database. The subprotocol is // "mysql" so the Drivermanager knows, that this is // a connection to a Mysql database (on localhost). // // For Mysql the user and the password are mandatory try { con = DriverManager.getConnection( "jdbc:mysql://localhost/" + database, user, passwd ); } catch (Exception e) { // Connection failed. Exit program System.out.println(e); System.exit(1); } // Example for a SELECT command try { Statement stmt = con.createStatement(); // Send the query to the database ResultSet rs = stmt.executeQuery( "SELECT sNr, name, city, zip " + "FROM supplier"); while (rs.next()) { // Get the sNr and the name. Notice, that the // index begins with 1 and not with 0 System.out.print(rs.getInt(1) + " / " + rs.getString(2) + " / "); // It is also possible to get the values of the attributes // by name. System.out.println(rs.getString("zip") + " " + rs.getString("city")); 5-6
} // Close the statement explicitly. As long as there // is no garbage collection, the resources in the // database will not be released. stmt.close(); } catch (Exception e) { System.out.println(e); } } }
5.3.2
Update-Operation
The following program shows how tuples can be inserted in a table. It also shows how the database schema can be changed. Example 5.2 [JDBC Updates] ... // UPDATE Commands try { Statement stmt = con.createStatement(); // Send the INSERT command to the database. Use the // method executeUpdate. int changes = stmt.executeUpdate( "INSERT INTO supplier (sNr, name, city, zip)" + "VALUES (102, ’Clinton’, ’New York’, ’10010’)"); // Display the number of updated tuples System.out.println("Number of changes: ", changes); stmt.close(); } catch (Exception e) { System.out.println(e); } // Create a new table try { Statement stmt = con.createStatement(); stmt.executeUpdate("CREATE TABLE client (" + "cNr INTEGER PRIMARY KEY, " + "name VARCHAR(30) NOT NULL, " + "city VARCHAR(30) NOT NULL"); stmt.close(); } catch (Exception e) { System.out.println(e); } ... 5-7
5.3.3
Prepared Statement
When the same command is used more than once but with different parameters, then it is wise to use prepared Statements. First the execution is faster and second the passing of the parameters is much simpler Example 5.3 [Insert tuples with prepared Statements] ... // Prepared Statements with passing of parameters. // So a statement can be used more than once. try { PreparedStatement stmt = con.prepareStatement( "INSERT INTO supplier " + "(sNr, name, city, zip) VALUES (?,?,?,?)"); // Pass the parameters stmt.setInt(1, 103); stmt.setString(2, "Davis"); stmt.setString(3, "Dallas"); stmt.setString(4, "200200"); // Now execute the command int changes = stmt.executeUpdate(); stmt.close(); } catch (Exception e) { System.out.println(e); } ...
5.3.4
Null values
If the value of an attribute is null and the data type is an object, then the method getXXX will simply return null in Java. If the data type is not an object, then it is possible to test if the attribute contains a null value with the method wasNull of the interface ResultSet. Example 5.4 [Null values] ... // Handling null values try { Statement stmt = con.createStatement(); // Build the query ResultSet rs = stmt.executeQuery( "SELECT peNr, name, father FROM person"); while (rs.next()) { int father = rs.getInt("father"); if (rs.wasNull()) 5-8
System.out.println("Person " + rs.getInt("peNr") + ". " + rs.getString("name") + " has no father"); } stmt.close(); } catch (Exception e) { System.out.println(e); } ...
5.3.5
Scrolling in Resultset
It is possible to scroll in a Resultset. The cursor can be positioned at any tuple in the set and with the methods next() and previous() navigating from tuple to tuple is possible. Example 5.5 [Scrolling] ... // Example of scrolling in a ResultSet try { // The Statement must be declared as scroll. Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY); // Execute the query ResultSet rs = stmt.executeQuery("SELECT * FROM supplier " + "ORDER BY sNr"); // List all suppliers by descending sNr rs.afterLast(); while (rs.previous()) { System.out.print(rs.getString("name")); } // Direct adressing of tuples. Here we position // the cursor on the seventh tupel and navigate // to the next tuples rs.absolute(7); do { System.out.print(rs.getString("sNr") + " " +rs.getString("name")); } while (rs.next()); stmt.close(); } catch (Exception e) { System.out.println(e); } ...
5-9
5.3.6
Updates in a Resultset
Tuples in a ResultSet can be modified. It is also possible to insert new tuples in a ResultSet. This tuples are then stored in the database. Finaly there is also possible to delete tuples in the Resultset. This operations are only possible, if the tuples in the ResutSet are from one table. Example 5.6 [Update in Resultsets] ... // Example: Update, Insert and delete in a Resultset try { // The statement must be declared as updatable. Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE); // Execute the Query. The Select statement // only contains one Table ResultSet rs = stmt.executeQuery("SELECT * FROM supplier " + "ORDER BY sNr"); // // Update the city in the seventh tuple // rs.absolute(7); rs.updateString("city", "New York"); // Now update the row rs.updateRow(); // Insert a new supplier. For this we must go to // the "InsertRow" rs.moveToInsertRow(); // Fill the datas rs.updateInt("sNr", 201); rs.updateString("name", "Clinton"); rs.updateString("city", "Denver"); rs.updateString("zip", "200201"); // Store the new tuple rs.insertRow(); // Return to the cursor position rs.moveToCurrentRow(); // // Delete the first tuple in the set // rs.first(); rs.deleteRow(); // Commit the changes. 5-10
con.commit(); stmt.close(); } catch (Exception e) { System.out.println(e); } ...
5.4
Meta-Informations
There are some problems which can not be solved with the presented classes and methods. This problems happen, when the SQL-Statements are not known at compile time. Example 5.7 [dbframe] The program dbframe is an example of such a problem. Because the statements are written by the user at runtime, the program does not know the form of this commands. They must be interpreted at runtime. Some more classes and methods are necessary to solve such problems. 1. It must be possible to get the relation schema of a ResultSet. This can be achieved with the inteface ResultSetMetaData. 2. It must be possible to get informations about the database schema at runtime. This can be achieved with the inteface DatabaseMetaData.
5.4.1
The class ResultSetMetaData
In the next example an arbitrary query is analyzed with the methods of the interface ResultSetMetaData. The JDBC datatypes are defined in the class Types. Example 5.8 [Analyzing ResultSets] ... // Example for using ResultSetMetadata try { Statement stmt = con.createStatement(); // Execute a query ResultSet rs = stmt.executeQuery( "SELECT * FROM supplier natural join sorder " + "WHERE supplier.sNr = 1"); // Get the Metadata ResultSetMetaData rsm = rs.getMetaData(); // Get the numbers of columns in the result schema int cols = rsm.getColumnCount(); // Loop over the tuples while (rs.next()) { 5-11
// Get the name, datatype and value of each // Column for (int i = 1; i = 4 and pName = ’Sugar’) select(person, isnotnull(father) and isnotnull(mother)) Projection π project(rel-expr, attribute {,attribute}) Example: project(supplier, Snr, name) Natural join ⊲⊳ natjoin(rel-expr, rel-expr) Example: natjoin(supplier, sorder) Union ∪ union(rel-expr, rel-expr) Example: union(select(person, name = ’Meier’), select(person, ort = ’Bern’)) Intersection ∩ intersect(rel-expr, rel-expr) Example: intersect(select(person, name = ’Meier’), select(person, ort = ’Bern’)) Setminus \ except(rel-expr, rel-expr) Example: except(select(person, name = ’Meier’), select(person, ort = ’Bern’)) Renaming β rename(rel-expr, newattr