Sarajane Marques Peres, Ph.D. – University of São Paulo www.each.usp.br/sarajane Based on Elsmari, Navathe / Silberschatz, Korth, Sudarshan‘s books
FORMAL LANGUAGES FOR THE RELATIONAL MODEL
RelaJonal Algebra • MoJvaJon – It provides a formal foundaJon for relaJonal model operaJons; – It is used as a basis for implemenJng and opJmizing queries in the query processing and opJmizaJon modules in RDBMs; – Its concepts are incorporated into the SQL standard query language for RDBMs.
• • • •
Unary RelaJon OperaJons RelaJonal Algebra OperaJon from Set Theory Binary RelaJonal OperaJons: JOIN and DIVISION AddiJonal RelaJonal OperaJons
RelaJonal Algebra -‐ Unary RelaJon OperaJons • SELECT Opera6on – Choose a subset of the tuples from a relaJon that saJsfies a selecJon condiJon; • Filter • Horizontal parJJon
RelaJonal Algebra -‐ Unary RelaJon OperaJons • SELECT Opera6on – Choose a subset of the tuples from a relaJon that saJsfies a selecJon condiJon; • Filter • Horizontal parJJon
Select the EMPLOYEE tuples whose department is 4, or those whose salary is greater than $30,000.
RelaJonal Algebra -‐ Unary RelaJon OperaJons • SELECT Opera6on – – – – – –
– –
The Boolean condiJons AND, OR and NOT have their normal interpretaJons. The SELECT operator is unary. It is applied to each tuple individually. The degree of the relaJon resulJng from a SELECT operaJon – its number of a[ributes – is the same as the degree of R. The number of tuples in the resulJng relaJon is always less than or equal to the number of tuples in R. Selec6vity of the condiJon: the raJo of the number of records (tuples) that saJsfy the condiJon to the total number of records (tuples) in the file (relaJon). SELECT operaJon is commuta6ve. We can always combine a sequence of SELECT operaJon into a single SELECT operaJon with a conjunc6ve (AND) condiJon.
RelaJonal Algebra -‐ Unary RelaJon OperaJons • SELECT opera6on – SELECT operaJon is commuta6ve.
– We can always combine a sequence of SELECT operaJon into a single SELECT operaJon with a conjunc6ve (AND) condiJon.
RelaJonal Algebra -‐ Unary RelaJon OperaJons • PROJECT opera6on – It selects certain columns from the table and discards the other columns. • VerJcal parJJon
RelaJonal Algebra -‐ Unary RelaJon OperaJons • PROJECT opera6on – It selects certain columns from the table and discards the other columns. • VerJcal parJJon
List each employee’s first and last name and salary.
RelaJonal Algebra -‐ Unary RelaJon OperaJons • PROJECT opera6on – The PROJECT operator is unary. – The a[ributes specified in appear (in the resulJng relaJon) in the same order as they appear in the list. – Its degree is equal to the number of a[ributes in . – If the a[ribute list includes only nonkey aVributes of R, duplicate tuples are likely to occur. The PROJECT operaJon removes any duplicate tuples. • If duplicates are not eliminated, the result would be a mul6set or bag of tuples rather than a set. – The number of tuples in the resulJng relaJon is always less than or equal to the number of tuples in R. • If the projecJon list is a superkey of R, the resulJng relaJon has the same number of tuples as R. – The commuta6vity does not hold on PROJECT.
RelaJonal Algebra -‐ Unary RelaJon OperaJons • PROJECT opera6on • The commuta6vity does not hold on PROJECT.
RelaJonal Algebra -‐ Unary RelaJon OperaJons • Sequences of Opera6on and the RENAME Opera6on
RelaJonal Algebra -‐ Unary RelaJon OperaJons • Sequences of Opera6on and the RENAME Opera6on
• PossibiliJes
RelaJonal Algebra OperaJon from Set Theory • UNION, INTERSECTION and SET DIFFERECE (MINUS or EXCEPT) – UNION: the result of this operaJon is a relaJon that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated.
– INTERSECTION: The result of this operaJon is a relaJon that includes all tuples that are in both R and S.
– SET DIFFERENCE: The result of this operaJon is a relaJon that includes all tuples that are in R but not in S.
RelaJonal Algebra OperaJon from Set Theory • UNION, INTERSECTION and SET DIFFERECE (MINUS or EXCEPT)
Retrieve the Social Security numbers of all employees who either work in department 5 or directly supervise an employee who works in department 5.
RelaJonal Algebra OperaJon from Set Theory • UNION, INTERSECTION and SET DIFFERECE (MINUS or EXCEPT) – These are binary operaJons, that is, each is applied to two sets (of tuples) – The two relaJons on which any of these three operaJons are applied must have the same type of tuples; this condiJon has been called union compa6bility or type compa6bility. • This means that the two relaJons have the same number of a[ributes and each corresponding pair of a[ributes has the same domain.
RelaJonal Algebra OperaJon from Set Theory • UNION, INTERSECTION and SET DIFFERECE (MINUS or EXCEPT) – UNION and INTERSECTION are commuta6ve operaJons
– UNION and INTERSECTION are associa6ve operaJons
RelaJonal Algebra OperaJon from Set Theory • UNION, INTERSECTION and SET DIFFERECE (MINUS or EXCEPT) – SET DIFFERENCE operaJon is not commutaJve; that is, in general.
R – S
S – R
RelaJonal Algebra OperaJon from Set Theory • UNION, INTERSECTION and SET DIFFERECE (MINUS or EXCEPT) – SET DIFFERENCE operaJon is not associaJve;
(R – S) – T
R – (S – T)
RelaJonal Algebra OperaJon from Set Theory • UNION, INTERSECTION and SET DIFFERECE (MINUS or EXCEPT) INTERSECTION can be expressed in terms of union and set difference opera6ons.
RelaJonal Algebra OperaJon from Set Theory • CARTESIAN PRODUCT (CROSS PRODUCT) – This set operaJon produces a new element by combining every member (tuple) from one relaJon (set) with every member (tuple) from the other relaJon (set). R x S – It is a binary operaJon. – The resulJng relaJon Q has degree n + m a[ributes (n is the degree of R and M is the degree of S). – The resulJng relaJon Q has one tuple for each combina6on of tuples – one from R and one from S. – Q has n * m tuples.
RelaJonal Algebra OperaJon from Set Theory • CARTESIAN PRODUCT (CROSS PRODUCT) Suppose that we want to retrieve a list of names of each female employee’s dependents.
RelaJonal Algebra -‐ Binary RelaJonal OperaJons • JOIN Opera6on – It is used to combine related tuples from two relaJons into a single “longer” tuples (binary opera6on). – It allows us to process rela6onships among relaJons.
RelaJonal Algebra -‐ Binary RelaJonal OperaJons • JOIN Opera6on – It is used to combine related tuples from two relaJons into a single “longer” tuples (binary opera6on). – It allows us to process rela6onships among relaJons. Retrieve the name of the manager of each department.
RelaJonal Algebra -‐ Binary RelaJonal OperaJons • JOIN Opera6on – It can be specified as a CARTESIAN PRODUCT operaJon followed by a SELECT operaJon.
RelaJonal Algebra -‐ Binary RelaJonal OperaJons • JOIN Opera6on – The result of the JOIN is a relaJon Q with n + m aVributes (n from R and m from S), in that order. – Q has one tuple for each combinaJon of tuples – one from R and one from S – whenever the combina6on sa6sfies the join condi6on. – Tuples whose join a[ributes are NULL or for which the join condiJon is FALSE do not appear in the result. – NoJce that if no combinaJon of tuples saJsfies the join condiJon, the result of a JOIN is an empty relaJon with zero tuples.
RelaJonal Algebra -‐ Binary RelaJonal OperaJons • EQUIJOIN – A JOIN operaJon where the only comparison operator used is =, is called an EQUIJOIN.
• NATURAL JOIN – It was created to get rid of the second (superfluous) a[ribute in an EQUIJOIN condiJon. – It requires that the two join a[ributes (or each pair of join a[ributes) have the same name in both relaJons. If this is not the case, a renaming operaJon is applied first. – Only one join aVribute value is kept. – If there is no join condi6on, all combinaJons of tuples qualify and the JOIN degenerates into a CARTESIAN PRODUCT.
RelaJonal Algebra -‐ Binary RelaJonal OperaJons • INNER JOINS – A single JOIN operaJon used to combine data from two relaJons so that related informaJon can be presented in a single table. – Formally, it is a combinaJon of CARTESIAN PRODUCT and SELECTION.
• N-‐WAY JOINS
RelaJonal Algebra -‐ Binary RelaJonal OperaJons • DIVISION opera6on Retrieve the names of employees who works on ALL the projects that John Smith works on.
RelaJonal Algebra -‐ Binary RelaJonal OperaJons • DIVISION opera6on Retrieve the names of employees who works on ALL the projects that John Smith works on.
RelaJonal Algebra -‐ Binary RelaJonal OperaJons • DIVISION opera6on – Binary operaJon R(Z) ./. S(X), where the a[ributes of S are a subset of the a[ributes of R. – Let Y be the set of a[ributes of R that are not a[ributes of S; • Y = Z – X
– The result of DIVISION is a relaJon T(Y) that includes a tuple t if: • tuples tR appear in R with tR[Y] = t AND with tR[X] = tS for every tuple tS in S
– This means that, for a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combina6on with every tuple in S.
Exercise!!!!
RelaJonal Algebra • It has been shown that the set of relaJonal operaJons below is a COMPLETE SET;
– That is, any of the other original relaJonal algera operaJons can be expressed as a sequence of operaJons from this set. • INTERSECTION à UNION and MINUS • JOIN à CARTESIAN PRODUCT and SELECT • DIVISION à PROJECT, CARTESIAN PRODUCT and MINUS
RelaJonal Algebra • NotaJon for Query Trees – Query evaluaJon tree or query execuJon tree – A query tree is a tree data structure that corresponds to a relaJonal algebra expression. • Input relaJons: leaf nodes • Execu6on: execuJng an internal node operaJon whenever its operands are available, and then replacing that internal node by the resulJng relaJon. • The execuJon terminates when the root node is executed. • Output: resulJng relaJon for the query.
RelaJonal Algebra • NotaJon for Query Trees
RelaJonal Algebra -‐ AddiJonal RelaJonal OperaJons • Generalized Projec6on – It extends the projecJon operaJon by allowing funcJons of a[ributes to be included in the projecJon list.
– Where F1, F2, … Fn are funcJons over the a[ributes in relaJon R and may involve arithmeJc operaJons and constant values.
RelaJonal Algebra -‐ AddiJonal RelaJonal OperaJons • Generalized Projec6on
RelaJonal Algebra -‐ AddiJonal RelaJonal OperaJons • Aggregate Func6ons and Grouping – They are mathemaJcal aggregate funcJons applied on collecJons of values. Ex.: SUM, AVERAGE, MAXIMUM, MINIMUM and COUNT.
• Grouping opera6on – It involves grouping the tuples in a relaJon by the value of some of their a[ributes and then applying an aggregate funcJon independently to each group.
• where is a list of a[ributes of the relaJon specified in R, and is a list of () pairs.
RelaJonal Algebra -‐ AddiJonal RelaJonal OperaJons • Ex.: Retrive each department number, the number of employees in the department, and their average salary, while renaming the resulJng a[ributes.
RelaJonal Algebra -‐ AddiJonal RelaJonal OperaJons • Ex.: Retrive each department number, the number of employees in the department, and their average salary, while renaming the resulJng a[ributes.
RelaJonal Algebra -‐ AddiJonal RelaJonal OperaJons • Recursive Closure Opera6ons – Specify the SSNs of all employees e’ directly supervised – at level one – by the employee e whose name is ‘James Borg’.
– And at level 2?
RelaJonal Algebra -‐ AddiJonal RelaJonal OperaJons • Recursive Closure Opera6ons – Specify the SSNs of all employees e’ directly supervised – at level one – by the employee e whose name is ‘James Borg’.
– And at level 2?
RelaJonal Algebra -‐ AddiJonal RelaJonal OperaJons • OUTER JOIN Opera6ons – They were developed for the case where the user wants to keep all the tuples in R, or all those in S, or all those in both relaJons in the result of the JOIN, regardless of whether or not they have matching tuples in the other relaJon. Print the list of all employee names as well as the name of the departments they manage IF THEY HAPPEN TO MANAGE A DEPARTMENT.
This is a LEFT OUTER JOIN. Similars operaJons are RIGHT OUTER JOIN and FULL OUTER JOIN.
RelaJonal Algebra -‐ AddiJonal RelaJonal OperaJons Print the list of all employee names as well as the name of the departments they manage IF THEY HAPPEN TO MANAGE A DEPARTMENT.
Exercise
Retrieve the names of all employees in department 5 who work more than 10 hours per week on the ProductX project.
Exercise Retrieve the names of all employees in department 5 who work more than 10 hours per week on the ProductX project.
π Fname (σ Hours > ‘10’ (σ Dno = ‘5’ (Employee) |X| Works_on |X| (σ Pname = ‘Product X’ (Project))))
Exercise
For each project, list the project name and the total hours per week (by all employees) spent on that project.
Exercise For each project, list the project name and the total hours per week (by all employees) spent on that project.
Pname SUM Hours (Works_on |X| Project)
Exercise
Exercises • Retrieve the name and address of all employees who work for the ‘Research’ department. • For every project located in ‘Stafford’, list the project number, the controlling department number, and the department manager’s last name, address, and birth date. • Find the names of employees who work on all the projects controlled by department number 5. • Make a list of project numbers for projects that involve an employee whose last name is ‘Smith’, either as a worker or as a manager of the department that controls the project. • List the names of all employees with two or more dependents. • Retrieve the names of employees who have no dependents. • List the names of managers who have at least one dependent.
Tuple RelaJonal Calculus • The tuple relaJonal calculus is based on specifying a number of tuple variables. • A simple tuple relaJonal calculus query: { t | COND(t)} • where t is a tuple variable and COND(t) is a condiJonal (boolean) expression involving t that evaluates to either TRUE or FALSE for different assignments of tuples to the variable t. • The result of such a query is the set of all tuples t that evaluate COND(t) to TRUE.
Tuple RelaJonal Calculus • All employees whose salary is above $50,000 {t | EMPLOYEE(t) AND t.Salary > 50000} • All a[ributes values for each select EMPLOYEE tuple t will be retrieved. To retrieve only the fist and last names … {t.Fname, t.Lname | EMPLOYEE(t) AND t.Salary > 50000}
Tuple RelaJonal Calculus • Retrieve the birth date and address of the employee (or employees) whose name is John B. Smith. {t.Bdate, t.Address | EMPLOYEE(t) AND t.Fname=‘John’ AND t.Minit=‘B’ AND t.Lname=‘Smith’}
Tuple RelaJonal Calculus Existencial QuanJfier • List the name and address of all employees who work for the ‘Research’ department. {t.Fname, t.Lname, t.Address | EMPLOYEE(t) AND (∃d)(DEPARTMENT(d) AND d.Dname=‘Research’ AND d.Dnumber=t.Dno)}
Tuple RelaJonal Calculus Existencial QuanJfier • For every project located in ‘Stafford’, list the project number, the controlling department number, and the department manager’s last name, birth date, and address.
{p.Pnumber, p.Dnum, m.Lname, m.Bdate, m.Address | PROJECT(p) AND EMPLOYEE(m) AND p.PlocaJon=‘Stafford’ AND ((∃d)(DEPARTMENT(d) AND p.Dnum=d.Dnumber AND d.Mgr_ssn=m.Ssn))}
Tuple RelaJonal Calculus Existencial QuanJfier • List the name of each employee who works on some project controlled by department number 5
{e.Lname, e.Fname | EMPLOYEE(e) AND ((∃x)(∃w)(PROJECT(x) AND WORKS_ON(w) AND x.Dnum=5 AND w.Essn=e.Ssn AND x.Pnumber=w.Pno))}
Tuple RelaJonal Calculus Existencial QuanJfier
• Make a list of project numbers for projects that involve an employee whose last name is ‘Smith’, either as a worker or as manager of the controlling department for the project. { p.Pnumber | PROJECT(p) AND (((∃e)(∃w)(EMPLOYEE(e) AND WORKS_ON(w) AND w.Pno=p.Pnumber AND e.Lname=‘Smith’ AND e.Ssn=w.Essn) ) OR ((∃m)(∃d)(EMPLOYEE(m) AND DEPARTMENT(d) AND p.Dnum=d.Dnumber AND d.Mgr_ssn=m.Ssn AND m.Lname=‘Smith’)))}
Tuple RelaJonal Calculus Universal QuanJfier • List the names of employees who work on all the projects controlled by department number 5.
{e.Lname, e.Fname | EMPLOYEE(e) AND ((∀x)(NOT(PROJECT(x)) OR NOT (x.Dnum=5) OR ((∃w)(WORKS_ON(w) AND w.Essn=e.Ssn AND x.Pnumber=w.Pno))))}
Exercise Tuple RelaJonal Calculus • Retrieve the names of all employees in department 5 who work more than 10 hours per week on the ProductX project. • List the names of all employees who have a dependent with the same first name as themselves. • Find the names of all employees who are directly supervised by ‘Franklin Wong’. • Retrieve the names of all employees who work on every project. • Retrieve the names of all employees who do not work on any project.
Sarajane Marques Peres, Ph.D. – University of São Paulo www.each.usp.br/sarajane Based on Elsmari, Navathe / Silberschatz, Korth, Sudarshan‘s books
FORMAL LANGUAGES FOR THE RELATIONAL MODEL