Realization of DBS
11. Table Operations – Implementation Theo Härder www.haerder.de
Goals
- Systematic development of relational processing concepts for a single table or for several tables - Realization of plan operators
Main reference: Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 11. Goetz Graefe: Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25:2, June 1993, pp. 73-170. Realization of Database Systems – SS 2011 © 2011 AG DBIS
Realization of DBS
Table Operations - Implementation
Operations of the relational algebra -
Table operations
Unary operations: Binary operations:
, , , , , , –
Plan operators
Set operations
T
operations of the relational algebra. They are further transformed into access plans. So-called plan operators implement these logical operations
Plan operators on a single table
Selection
Operators across several tables
Join algorithms -
© 2011 AG DBIS
R
SQL queries contain logical expressions which can be mapped to the
Hash join
Distributed joins
S
Nested-loops & sort/merge join
Joins on typespanning paths
Sort
Nested-loops join, Sort-merge join Hash join (classic hashing, simple hash join, hybrid hash join) Exploitation of type-crossing access paths Distributed join algorithms
Further binary operations (set operations) 11-2
Realization of DBS
Plan Operators on a Single Table
Selection – general ways of evaluation • Direct access via a given TID, via a hash method or a one- resp. multidimensional index structure • Sequential search in a table • Search via an index structure (index table, bitlist) • Selection using several pointer lists where more than a single index structure can be exploited • Search via a multi-dimensional index structure
Table operations
Plan operators Nested-loops & sort/merge join Hash join
Joins on typespanning paths
Projection
Modification
Distributed joins
is typically performed in combination with sorting, selection, or join
• Updates are set-oriented in SQL, but restricted to a single table • INSERT, DELETE and UPDATE are directly mapped to the corresponding operations of the storage structures • “Automatic” execution of maintenance operations
Set operations
- to update access paths, - to guarantee clustering and reorganization etc.
• Provisions for logging and recovery etc. 11-3
© 2011 AG DBIS
Realization of DBS
Plan Operators for the Selection
Use of Scan Operators •
Table operations Plan operators
•
Plan operators 1. Table scan (relation scan) - Always possible - SCAN operator implements selection operation
Nested-loops & sort/merge join
2. Index scan - Selection of most cost-effective index - Specification of search range (start-, stop condition)
Hash join
3. k-d scan
Joins on typespanning paths
- Evaluation of multi-dimensional search criteria - Use of differing evaluation directions by navigation
4. TID algorithm
Distributed joins
-
Set operations
© 2011 AG DBIS
Definition of start- and stop condition Definition of simple search arguments
Evaluation of all “useable" useable index structures Location of TID lists of variable lengths Boolean connection of the lists Access to the records according to the hit list (result list)
Further plan operators in combination with selection • Sorting • Grouping (see sort operator) • Special operators e.g. in Data-Warehouse applications for grouping and aggregation (CUBE operator)
11-4
Realization of DBS
Operators Across Several Tables
SQL allows complex queries across k tables • One-variable expressions: describe conditions for the selection of elements from a table • Two-variable expressions: describe conditions for the combination of elements from two tables • Typically, k-variable expressions are decomposed into one- and two-variable expressions i and d evaluated l t d by b corresponding di plan l operators t
Table operations
Plan operators Nested-loops & sort/merge join
Plan operators across several tables • General ways for the evaluation: - Nested iteration for each element of outer table To traversal of inner table Ti
Hash join
Joins on typespanning paths
• •
Distributed joins
-
Set operations
Merge method iterating traversals through T1, T2 • • •
-
© 2011 AG DBIS
Realization of DBS
O(N1 + N2) additional sort costs, if necessary important application: merging join
Hashing Partitioning of inner table Ti and partition-wise loading in HT in memory. “Probing” by outer table To or its resp. partitions using HT: O(p · No + Ni)
11-5
Operators Across Several Tables (2)
n-way joins • • • • •
Table operations Plan operators
Nested-loops & sort/merge join Hash join
Joins on typespanning paths
O(No · Ni + No) important application: nested-loops join
2
Decomposition into n-1 two-way joins Number of possible join sequences is dependent on the join attributes chosen Maximal n! different sequences possible Use of pipelining techniques O i l evaluation Optimal l i sequence dependent d d on - Plan operators - “Fitting” sort orders for join attributes - Size of operands etc.
Some join sequences using two-way joins (n=5) result
result
Distributed joins
result T5
Set operations
T2
T4
T5
T3 T1
T2
left-deep tree © 2011 AG DBIS
2.
T4 T5
T1
T2 T3
T4
bushy tree
T3
T1
right-deep tree
Analogous proceeding in case of set operations Practicality test (Guy Lohman test for join techniques): Does a new technique apply to joining three inputs without interrupting data flow between the join operators?
11-6
Realization of DBS
Plan Operators for the Join
Join • • • •
Table operations
Plan operators
Nested-loops & sort/merge join
Record-type-spanning operation: usually very expensive Frequent use: important optimization candidate Typical application: equi-join General Θ-join infrequent
Implementation p of the jjoin operation p
can process, at the same time, selections (and projections) on the participating tables R and S
SELECT FROM WHERE
Hash join
Joins on typespanning paths
• •
* R, S R.JA Θ S.JA AND PR AND PS
JA: join attribute PR and PS: predicates defined on selection attributes (SA) of R and S
Distributed joins
Possible access p paths
Set operations
• Scans over R and S
(always)
• Scans over IR(JA), IS(JA) deliver sort sequence according to JA
(if present)
• Scans over IR(SA), IS(SA) if necessary, fast selection for PR and PS
(if present)
• Scans over other index structures if necessary, faster location of all records
(if present)
© 2011 AG DBIS
Realization of DBS
Nested-Loops Join
Plan operators
Nested-loops & sort/merge join
Joins on typespanning paths
Records in R and S are not ordered according to join attributes Index structures IR(JA) and IS(JA) do not exist
Algorithm for Θ-join
Scan over S, for each record s, if PS: scan over R, R for each record r, if PR AND (r.JA Θ s.JA): execute join, i.e., write combined record (r, s) into the result set.
Complexity: O(N*M) Nested-loops join using index access
Nested-block join
Distributed joins
Set operations
© 2011 AG DBIS
Assumptions • •
Table operations
Hash join
11-7
Scan over S, for each record s, if PS: determine via access to IR(JA) all TIDs for records satisfying r.JA = s.JA, for each TID: fetch record r, if PR: write combined b d record d (r, ( s ) into the h result l set. Scan over S, for each page (resp. set of contiguous pages) of S: scan over R, for each page (resp. set of contiguous pages) of R: for each record s of the S-page, if PS: for each record r of the R-page, if PR AND (r.JA Θ s.JA): write combined record (r, s) into the result set.
11-8
Realization of DBS
Sort-Merge Join
Algorithm consists of 2 phases • Phase 1:
Table operations
• Phase 2:
Plan operators Nested-loops & sort/merge join
Sorting of R and S w.r.t R(JA) and S(JA) (if not already present); in doing so, early elimination of records not needed ( PR, PS) Iterating scans over sorted R- and S-records where join is performed in case of r.JA = s.JA
Complexity: O(N log N)
Special case If either IR(JA) and IS(JA) or GAPS over R(JA) and S(JA) (join index) is present:
Hash join
exploitation of index structures on join attributes Joins on typespanning paths
Iterating scans over IR(JA) and IS(JA): for each with two keys from IR(JA) and IS(JA), if r.JA = s.JA: fetch the records using the related TIDs, if PR and PS: write combined record (r, (r s) into the result set
Distributed joins
Set operations
11-9
© 2011 AG DBIS
Realization of DBS
Hash Join
• Step 1: Partitioned read of (smaller) table R and construction of a hash table using hH(r(JA)) w.r.t. values of R(JA) of partitions Ri (1 i p): each partition fits into the available memory and each record satisfies PR
Table operations Plan operators
• Step 2: Probing for records of S using PS; if successful, execution of join • Step 3: Repeat steps 1 and 2 as long as R is exhausted
Nested-loops & sort/merge join Hash join
Simplest case (classic hashing)
Construction of hash tables and probing Scan over R; building hash tables Hi (1 i p) one at a time in memory
Joins on typespanning paths
R
H1 Scan over S with probing of H1
S
Distributed joins
... Set operations
R S
© 2011 AG DBIS
Hp
Scan over S with probing of HP
Complexity: O(p · N) Special case R fits into memory: one partition (p = 1) a single scan over S is sufficient!
11-10
Realization of DBS
Hash Join (2)
Table operations
Partitioning of R with hp(r(JA)) #records / JA-value
Plan operators Nested-loops & sort/merge join
0
100
#records / JA’-value
Hash join
JA
h p (r(JA))
Joins on typespanning paths Distributed joins
0.33
0 R1
Set operations
0.66 R2
JA’
1 R3
11-11
© 2011 AG DBIS
Realization of DBS
Hash Join (3)
Table operations Plan operators
Partitioning • Partitioning of R in subsets R1, R2, ..., Rp: a record r of R is in Ri, if h(r) is in Hi R
Nested-loops & sort/merge join Hash join
Joins on typespanning paths
Distributed joins
... H1
H2
Hp
Why is this partitioning a critical operation?
Set operations
Which auxiliary operations may be required? Is the use of a hash function needed for partitioning? • Table S is partitioned with same function hP while evaluating PS © 2011 AG DBIS
11-12
Realization of DBS
Hash Join (4)
Table operations
Variants of hash join are primarily distinguished by the kind of partitioning Partitioning technique in case of simple hash join shown for construction and probing of H1 R
step 1:
Plan operators
H1
1. iteration
Nested-loops & sort/merge join
Rrest Srest
Hash join
step 2:
Joins on typespanning paths Distributed joins
S
Simple hash join • Step p 1: Execute scan on R ((smaller table), ), evaluate PR and apply pp y hash function hP to each qualified record r. Is hP(r(JA)) in the chosen range, insert record into H1. Otherwise, write r in an output buffer for a file Rrest for “pretermitted” r-records.
Set operations
• Step 2: Execute scan on S, evaluate PS and apply hash function hP to each qualified record s. Is hP(s(JA)) in the chosen range, search a join counterpart (probing) in H1. If successful, form a join record and put it to the result. Otherwise, write s to an output buffer for a file Srest for “pretermitted” s-records. • Step 3: Repeat step 1 and 2 using the so far “pretermitted” records on Hi as long as Rrest is exhausted. Here, evaluation of PR and PS is not required anymore. 11-13
© 2011 AG DBIS
Realization of DBS
Hash Join (5)
Table operations Plan operators
Grace join (grace join) • Partitioning of R and S takes place before join starts • Partitions Ri and Si are stored in temporary files on disk • Construction of Hi (having M pages) in memory with Ri and probing with Si
Nested-loops & sort/merge join
R1
Hash join
S1
Joins on typespanning paths
H1
Scan over S1 with probing of H1 ...
RP
HP
Distributed joins
SP Set operations
Scan over SP with probing of HP
What is the minimal memory size required?
© 2011 AG DBIS
11-14
Realization of DBS
Hash Join (6) Hybrid hash join
• Optimization such that construction and probing of H1 is done in parallel to partitioning
Table operations
Plan operators
Scan
1)
R
Nested-loops & sort/merge join
R2
Hash join
a)
memory
Joins on typespanning paths
b)
Distributed joins
R3
RP
R1 constructed in H1
memory area: 1 page each S2
immediate probing of S1-records
S
S3
SP
Scan
Set operations
2)
H2
R2 S2
as in case of Grace join
Scan ...
3)
11-15
© 2011 AG DBIS
Realization of DBS
Hash Join - Example
# r ec o r d s / J A - v a lu e
Partitioning a) Partitioning of R with hP(r(JA))
Table operations
JA 0
100
# r ec o r d s / J A ‘ - v a lu e
Plan operators
Nested-loops & sort/merge join
h p ( r (JA ) )
JA‘ 0 .3 3
0
b) Partitioning of S with hP(s(JA))
R1
0 .6 6 R2
1 R3
Hash join
Joins on typespanning paths
II. Join
1)
R1 S1
Distributed joins
Set operations
2) R2
H1
in memory with hH(r(JA)) JA’: 0.0 – 0.33 read, probing with of hH(s(JA))
JA’: 0.0 – 0.33
H2 JA’: 0.34 – 0.66
S2 3)
R3 S3
© 2011 AG DBIS
JA’: 0.34 – 0.66 H3 JA’: 0.67 – 1.0 JA’: 0.67 – 1.0
11-16
Realization of DBS
Use of Type-Spanning Access Paths
Join via link structures •
Table operations
Use of hierarchical access paths for equi-join
Scan over R (Owner table), for each record r, if PR: Scan over related link structure LR-S(JA), for each record ss, if PS: write combined record (r, s ) into the result set.
Plan operators Nested-loops & sort/merge join Hash join
Further methods • Join indexes which are built for certain Θ-joins
Joins on typespanning paths Distributed joins
Set operations
VIR:
VIS:
R
S
R
S
TIDr2 TIDr1 TIDr2 TIDr2
TIDs4 TIDs3 TIDs2 TIDs6
TIDr1 TIDr2 TIDr2 TIDr2
TIDs3 TIDs2 TIDs4 TIDs6
Logical view
Index for TIDR
R
S
TIDs2 TIDs3 TIDs4 TIDs6
TIDr2 TIDr1 TIDr2 TIDr2
Index for TIDS
11-17
© 2011 AG DBIS
Realization of DBS
Use of Type-Spanning Access Paths (2) • Use of generalized access path structures (GAPS)
Table operations
K53 Plan operators
Nested-loops & sort/merge join
K25
K36
K47
K58
K78
K88
Hash join
...
Joins on typespanning paths
TIDs for Dept
Distributed joins
... TIDs for Mgr
. . . K55 1 3 1 4 TID TID TID TID TID TID TID TID TID . . .
Set operations
PRIOR
© 2011 AG DBIS
...
NEXT
TIDs for Emp
TIDs for Equipment
optional reference to overflow page
11-18
Realization of DBS
Join Algorithms - Comparison input stream 2 e21 e22 e23
inputt stream 1
Table operations
Plan operators Nested-loops & sort/merge join
e21 e22 e23
e11 e12
e11
e13 ...
e13 ...
e13 ...
e12
Hash partitions
Hash join
Joins on typespanning paths
e21 e22 e23
e11 e12
(a) Nested-loops join
(b) Merge join
element comparison
Distributed joins
Set operations
(c) Hash join
successful element comparison
Nested-loops join is always applicable, however, scanning of complete search space has to be taken into account. Merge join needs lowest search costs, requires, however, sorted input streams. Index structures on both join attributes satisfy this prerequisite. Otherwise, explicit sorting of both tables w.r.t. join attributes reduces cost advantage substantially. Nevertheless, sort-merge join can own additional advantages, if the result is required in sorted sequence and sorting of the large result is more expensive than sorting of two small result sets. Hash join partitions search space. Fig. c assumes that the same hash function h is applied to tables R and S. The partition size of the (smaller) table is given by the available buffer size in memory. A reduction of the partition size, to approximate case b, causes higher preparation costs and is therefore not recommendable. 11-19
© 2011 AG DBIS
Realization of DBS
Join Algorithms in Distributed DBS
Problem statement •
Table operations Plan operators
•
Determination of evaluation strategy •
Nested-loops & sort/merge join Hash join
•
•
Distributed joins
-
Minimal number of messages
-
Very high transfer volumes
Request for every join value in the first table related records from the second table
-
Large number of messages
-
Only relevant records are considered
Trade-off solution: Semi-join resp. extensions such as Bit-vector join (hash filter join)
Semi-join • • •
Shipping of a list of JA values of R to node of S Determination of join counterparts in S and returning them to node of R Then join processing at node of R
Bit-vector join •
© 2011 AG DBIS
Send participating tables completely to a node and compute join locally (“ship whole”)
(“fetch as needed“)
Joins on typespanning paths
Set operations
Query in node K, which requires a join between (sub-)table R at node KR and (sub-)table S at node KS Determination of processing node: K, KR or KS
•
Similar to Semi-join, only shipping of a bit vector (Bloom Filter) created using a hash function 11-20 Returning a superset of join counterparts in S
Realization of DBS
Semi-Join and Bit-Vector Join Dept Dno 47 39 64
Frankfurt Table operations
Plan operators
Loc
Dno 69 28 75 47 47 44
Dno D 47 39 64 find join counterparts
Dept DNo Loc Mgr 47 39 64 create bit vector by hashing
Frankfurt
Distributed joins
Set operations
1
0
0
1
1
Munich
0
0
return projections of join counterpart records
Name
Address
Phone
Hans Anna
Name
Address
Phone
return the potential join candidates
0
0
0
1
1
0
0
0
hashing of Dno values to find potential join candidates
© 2011 AG DBIS
Dno 47
check + join
ship bit vector
1
Name Hans Anna
Emp
ship the whole JA column
Munich
Hash join
Realization of DBS
Dno 47 47
join
Nested-loops & sort/merge join
Joins on typespanning paths
Mgr
Emp Dno 69 28 75 47 91 44
Name
Address
Phone
11-21
Set Operations3
Table operations
Which set operations are needed? S R A
Plan operators
B
R, S A, B, C
C
union-compatible input streams element sets
Nested-loops & sort/merge join Hash join
Joins on typespanning paths
Distributed joins
operation result
matching in all attributes
A
difference (R-S)
anti-semi-join (S, R)
B
intersection
join, semi-join (S, R)
C
difference (S-R)
anti-semi-join (R, S)
A, B A, C
Set operations
left-sided outer join anti-difference
B, C A, B, C
© 2011 AG DBIS
3.
matching in one or several attributes
anti-join right-sided outer join
union
symmetrical outer join
Which algorithms can be used for these set operations? • What has to be compared at a time? • How can a relationship to the join algorithms be found?
Graefe, G.: Query evaluation techniques for large databases, ACM Computing Surveys 25:2, 1993, pp. 73-170
11-22
Realization of DBS
Set Operations (2)
Binary matching operations • •
Table operations
• Plan operators Nested-loops & sort/merge join
•
Same logical proceeding • • •
Hash join
Joins on typespanning paths Distributed joins
Three element sets are formed from R and S: A, B, C Elements in B fit together ! How can these three element sets be formed? - Using nested iteration - Using merge method - Using hash method
Unified realization concept • • •
Set operations
Solve the same task, in principle: “one-to-one matching operations” An input element contributes to the output dependent of its “match” with another input element Operations repeatedly require the same steps and, therefore, can be implemented using the same algorithms p are closelyy connected! Set- and jjoin operations
Comparison of join- vs. vs primary-key attributes Commonality: records are grouped on the basis of attribute values Some unary operations are possible with special measures - Grouping and sorting enable simple duplicate elimination - In case of aggregation, an attribute value per group is determined - In case of join, grouping of potential join counterparts is cost-effective (either in partitions or a sort order) - Using set operations, the element sets A, B, C can be found; at the same time, duplicate elimination is possible
© 2011 AG DBIS
Realization of DBS
Summary
Distributed joins
Set operations
Use of the same algorithm classes, in principle Variation of executing comparisons
Extensibility infrastructure in object-relational DBMS • •
© 2011 AG DBIS
Nested-loops join Sort-merge join Hash join And variations
Set operations • •
Nested iteration Merge method Hashing
Many options for processing of join operations • • • •
Joins on typespanning paths
Existing access path types require tailor-made operations and efficient mapping Combination of various access paths possible (TID algorithm)
General classes of evaluation methods for binary operations • • •
Nested-loops & sort/merge join Hash join
Selection operations • •
Table operations Plan operators
11-23
Creation of user-defined functions and operators Generalization: user-defined table operators with n input tables and m output tables
11-24