Algorithms, Fourth Edition
Robert Sedgewick | Kevin Wayne
http://algs4.cs.princeton.edu
3.3 Balanced Search Trees
‣ 2-3 search trees
‣ red-black BSTs
‣ B-trees
Last updated on Mar 1, 2015, 9:42 AM
Symbol table review

implementation                     | guarantee             | average case              | ordered ops? | key interface
                                   | search insert delete  | search hit insert delete  |              |
sequential search (unordered list) | N      N      N       | N          N      N       |              | equals()
binary search (ordered array)      | log N  N      N       | log N      N      N       | ✔            | compareTo()
BST                                | N      N      N       | log N      log N  √N      | ✔            | compareTo()
goal                               | log N  log N  log N   | log N      log N  log N   | ✔            | compareTo()

Challenge. Guarantee performance.
This lecture. 2-3 trees, left-leaning red-black BSTs, B-trees.
2-3 tree
Allow 1 or 2 keys per node.
・2-node: one key, two children.
・3-node: two keys, three children.
Symmetric order. Inorder traversal yields keys in ascending order.
Perfect balance. Every path from root to null link has same length. [how to maintain?]
[Figure: a 2-3 tree with root M. Its left child is the 3-node E J, with children A C (smaller than E), H (between E and J), and L (larger than J). Its right child is the 2-node R, with children P and S X. Null links hang off the bottom nodes.]
2-3 tree demo
Search.
・Compare search key against keys in node.
・Find interval containing search key.
・Follow associated link (recursively).
[Figure: search for H in the 2-3 tree with root M, children E J and R, and bottom nodes A C, H, L, P, S X.]
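The three search steps above can be sketched in Java as follows. This is a minimal illustration, not code from the lecture: the class `TwoThreeSearchSketch`, its `Node` layout (a null `key2` marks a 2-node), and the helpers `contains` and `demoTree` are all hypothetical names chosen for this example. The demo tree is the one shown in the search demo.

```java
// Sketch only: Node, contains, and demoTree are hypothetical names, not from the slides.
final class TwoThreeSearchSketch {
    static final class Node {
        final String key1;          // smaller (or only) key
        final String key2;          // larger key; null means this is a 2-node
        Node left, middle, right;   // a 2-node uses left and right only

        Node(String key1, String key2) { this.key1 = key1; this.key2 = key2; }
    }

    // Compare against the keys in the node, find the interval, follow that link.
    static boolean contains(Node x, String key) {
        while (x != null) {
            int c1 = key.compareTo(x.key1);
            if (c1 == 0) return true;
            if (c1 < 0) { x = x.left; continue; }
            if (x.key2 == null) { x = x.right; continue; }   // 2-node: only two intervals
            int c2 = key.compareTo(x.key2);
            if (c2 == 0) return true;
            x = (c2 < 0) ? x.middle : x.right;
        }
        return false;   // reached a null link
    }

    // The tree from the search demo: root M, children (E J) and R.
    static Node demoTree() {
        Node ej = new Node("E", "J");
        ej.left = new Node("A", "C");
        ej.middle = new Node("H", null);
        ej.right = new Node("L", null);
        Node r = new Node("R", null);
        r.left = new Node("P", null);
        r.right = new Node("S", "X");
        Node m = new Node("M", null);
        m.left = ej;
        m.right = r;
        return m;
    }

    public static void main(String[] args) {
        System.out.println("H found: " + contains(demoTree(), "H"));
        System.out.println("B found: " + contains(demoTree(), "B"));
    }
}
```

Searching for H goes left at M, then takes the middle link of E J, matching the demo.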
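2-3 tree: insertion
Insertion into a 2-node at bottom.
・Add new key to 2-node to create a 3-node.
[Figure: insert G. Before: root L, children E and R, bottom nodes A C, H, P, S X. After: the 2-node H becomes the 3-node G H; the rest of the tree is unchanged.]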
2-3 tree: insertion
Insertion into a 3-node at bottom.
・Add new key to 3-node to create temporary 4-node.
・Move middle key in 4-node into parent.
・Repeat up the tree, as necessary.
・If you reach the root and it's a 4-node, split it into three 2-nodes.
[Figure: insert Z. Before: root L, children E and R, bottom nodes A C, H, P, S X. After: root L, children E and R X, bottom nodes A C, H, P, S, Z.]
2-3 tree construction demo
[Figure: insert S into an empty tree, giving the single node S.]
[Figure: the final 2-3 tree: root L, children E and R, bottom nodes A C, H, P, S X.]
2-3 tree: global properties
Invariants. Maintains symmetric order and perfect balance.
Pf. Each transformation maintains symmetric order and perfect balance.
[Figure: Splitting a temporary 4-node in a 2-3 tree (summary). Cases shown: the 4-node is the root; the parent is a 2-node (4-node is the left or right child); the parent is a 3-node (4-node is the left, middle, or right child).]
2-3 tree: performance
Splitting a 4-node is a local transformation: constant number of operations.
[Figure: Splitting a 4-node is a local transformation that preserves balance. The 4-node b c d under parent a e becomes the 2-nodes b and d under parent a c e. The six subtrees (less than a, between a and b, between b and c, between c and d, between d and e, greater than e) are not touched.]
Balanced search trees: quiz 1
What is the height of a 2-3 tree with N keys in the worst case?
A. ~ log_3 N
B. ~ log_2 N
C. ~ 2 log_2 N
D. ~ N
E. I don't know.
2-3 tree: performance
Perfect balance. Every path from root to null link has same length.
[Figure: Typical 2-3 tree built from random keys]
Tree height.
・Worst case: lg N. [all 2-nodes]
・Best case: log_3 N ≈ .631 lg N. [all 3-nodes]
・Between 12 and 20 for a million nodes.
・Between 18 and 30 for a billion nodes.
Bottom line. Guaranteed logarithmic performance for search and insert.
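The height ranges quoted above follow directly from the two bounds log_3 N and lg N. A small Java sketch of that arithmetic (the class and method names are hypothetical, chosen only for this illustration):

```java
// Sketch only: evaluates the best-case (log_3 N) and worst-case (lg N) height bounds.
final class TwoThreeHeightBounds {
    static double lg(double n)   { return Math.log(n) / Math.log(2); }
    static double log3(double n) { return Math.log(n) / Math.log(3); }

    public static void main(String[] args) {
        for (double n : new double[] { 1e6, 1e9 }) {
            System.out.printf("N = %.0f: log_3 N = %.2f, lg N = %.2f%n", n, log3(n), lg(n));
        }
    }
}
```

Rounding log_3 N down and lg N up gives the ranges on the slide: 12 to 20 for a million keys and 18 to 30 for a billion.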
ST implementations: summary

implementation                     | guarantee             | average case              | ordered ops? | key interface
                                   | search insert delete  | search hit insert delete  |              |
sequential search (unordered list) | N      N      N       | N          N      N       |              | equals()
binary search (ordered array)      | log N  N      N       | log N      N      N       | ✔            | compareTo()
BST                                | N      N      N       | log N      log N  √N      | ✔            | compareTo()
2-3 tree                           | log N  log N  log N   | log N      log N  log N   | ✔            | compareTo()

[2-3 tree row: but hidden constant c is large (depends upon implementation)]
2-3 tree: implementation?
Direct implementation is complicated, because:
・Maintaining multiple node types is cumbersome.
・Need multiple compares to move down tree.
・Need to move back up the tree to split 4-nodes.
・Large number of cases for splitting.

fantasy code:

    public void put(Key key, Value val)
    {
       Node x = root;
       // walk down to the bottom, splitting 4-nodes along the way
       while (x.getTheCorrectChild(key) != null)
       {
          x = x.getTheCorrectChild(key);
          if (x.is4Node()) x.split();
       }
       // grow the bottom node by one key
       if      (x.is2Node()) x.make3Node(key, val);
       else if (x.is3Node()) x.make4Node(key, val);
    }

“ Beautiful algorithms are not always the most useful. ” – Donald Knuth

Bottom line. Could do it, but there's a better way.
How to implement 2-3 trees with binary trees?
Challenge. How to represent a 3-node?
Approach 1. Regular BST.
・No way to tell a 3-node from a 2-node.
・Cannot map from BST back to 2-3 tree.
Approach 2. Regular BST with red "glue" nodes.
・Wastes space, wasted link.
・Code probably messy.
Approach 3. Regular BST with red "glue" links.
・Widely used in practice.
・Arbitrary restriction: red links lean left.
[Figure: the 3-node E R drawn under each of the three approaches.]
Left-leaning red-black BSTs (Guibas-Sedgewick 1979 and Sedgewick 2007)
1. Represent 2–3 tree as a BST.
2. Use "internal" left-leaning links as "glue" for 3–nodes.
・Black links connect 2-nodes and 3-nodes.
・Red links "glue" nodes within a 3-node.
[Figure: Encoding a 3-node with two 2-nodes connected by a left-leaning red link. The 3-node a b becomes the node b (larger key is root) with a as its left child across a red link. The subtrees less than a, between a and b, and greater than b keep their positions.]
[Figure: 1−1 correspondence between red-black and 2-3 trees: a 2-3 tree, the corresponding red-black BST, and the same red-black BST drawn with horizontal red links.]
Left-leaning red-black BSTs: 1-1 correspondence with 2-3 trees
Key property. 1–1 correspondence between 2–3 and LLRB.
[Figure: 1−1 correspondence between red-black and 2-3 trees: the red−black tree, the same tree drawn with horizontal red links, and the 2-3 tree.]
An equivalent definition
A BST such that:
・No node has two red links connected to it.
・Every path from root to null link has the same number of black links. ["perfect black balance"]
・Red links lean left.
[Figure: a red−black tree and the same tree drawn with horizontal red links.]
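The three conditions of this definition can be checked mechanically. The following Java sketch is one way to do so, under stated assumptions: the class `LlrbInvariantSketch`, its `Node` type, and the methods `colorsOk`, `blackBalanced`, and `isLlrb` are hypothetical names for this illustration, and the color stored in a node is taken to be the color of the link from its parent.

```java
// Sketch only: a checker for the three conditions; all names here are hypothetical.
final class LlrbInvariantSketch {
    static final boolean RED = true, BLACK = false;

    static final class Node {
        final String key;
        final boolean color;   // color of the link from the parent
        Node left, right;

        Node(String key, boolean color) { this.key = key; this.color = color; }
    }

    static boolean isRed(Node x) { return x != null && x.color == RED; }   // null links are black

    // Red links lean left, and no node has two red links connected to it.
    static boolean colorsOk(Node x) {
        if (x == null) return true;
        if (isRed(x.right)) return false;              // a red link leans right
        if (isRed(x) && isRed(x.left)) return false;   // parent link and left link both red
        return colorsOk(x.left) && colorsOk(x.right);
    }

    // Every path from root to a null link passes the same number of black nodes.
    static boolean blackBalanced(Node root) {
        int black = 0;
        for (Node x = root; x != null; x = x.left)
            if (!isRed(x)) black++;
        return blackBalanced(root, black);
    }

    private static boolean blackBalanced(Node x, int black) {
        if (x == null) return black == 0;
        if (!isRed(x)) black--;
        return blackBalanced(x.left, black) && blackBalanced(x.right, black);
    }

    static boolean isLlrb(Node root) { return colorsOk(root) && blackBalanced(root); }

    public static void main(String[] args) {
        Node e = new Node("E", BLACK);
        e.left = new Node("A", RED);   // the 3-node A E
        System.out.println("valid LLRB: " + isLlrb(e));
    }
}
```

Because right links are never red, the "two red links" condition reduces to a node's parent link and left link not both being red.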
Search implementation for red-black BSTs
Observation. Search is the same as for elementary BST (ignore color). [but runs faster because of better balance]

    public Value get(Key key)
    {
       Node x = root;
       while (x != null)
       {
          int cmp = key.compareTo(x.key);
          if      (cmp  < 0) x = x.left;      // smaller: go left
          else if (cmp  > 0) x = x.right;     // larger: go right
          else if (cmp == 0) return x.val;    // search hit
       }
       return null;                           // search miss
    }

Remark. Most other ops (e.g., floor, iteration, selection) are also identical.
[Figure: the red−black tree, the same tree drawn with horizontal red links, and the corresponding 2-3 tree.]
Red-black BST representation
Each node is pointed to by precisely one link (from its parent) ⇒ can encode color of links in nodes.

    private static final boolean RED   = true;
    private static final boolean BLACK = false;

    private class Node
    {
       Key key;
       Value val;
       Node left, right;
       boolean color;   // color of parent link
    }

    private boolean isRed(Node x)
    {
       if (x == null) return false;   // null links are black
       return x.color == RED;
    }

[Figure: a node h whose left link is red and whose right link is black: h.left.color is RED, h.right.color is BLACK.]
Insertion into a LLRB tree: overview
Basic strategy. Maintain 1-1 correspondence with 2-3 trees.
During internal operations, maintain:
・Symmetric order.
・Perfect black balance.
[ but not necessarily color invariants ]
[Figure: temporary violations of the color invariants on keys A, E, S: right-leaning red link; two red children (a temporary 4-node); left-left red (a temporary 4-node); left-right red (a temporary 4-node).]
How? Apply elementary red-black BST operations: rotation and color flip.
Elementary red-black BST operations
Left rotation. Orient a (temporarily) right-leaning red link to lean left.
[Figure: rotate E left. Before: h = E with a red right link to x = S. After: x = S with a red left link to h = E. The subtrees less than E, between E and S, and greater than S keep their order.]

    private Node rotateLeft(Node h)
    {
       assert isRed(h.right);
       Node x = h.right;
       h.right = x.left;
       x.left = h;
       x.color = h.color;
       h.color = RED;
       return x;
    }

Invariants. Maintains symmetric order and perfect black balance.
Elementary red-black BST operations
Right rotation. Orient a left-leaning red link to (temporarily) lean right.
[Figure: rotate S right. Before: h = S with a red left link to x = E. After: x = E with a red right link to h = S. The subtrees less than E, between E and S, and greater than S keep their order.]

    private Node rotateRight(Node h)
    {
       assert isRed(h.left);
       Node x = h.left;
       h.left = x.right;
       x.right = h;
       x.color = h.color;
       h.color = RED;
       return x;
    }

Invariants. Maintains symmetric order and perfect black balance.
Elementary red-black BST operations
Color flip. Recolor to split a (temporary) 4-node.
[Figure: flip colors. Before: h = E with red links to both children A and S. After: the links to A and S are black and the link from the parent to E is red. The subtrees less than A, between A and E, between E and S, and greater than S are not touched.]

    private void flipColors(Node h)
    {
       assert !isRed(h);
       assert isRed(h.left);
       assert isRed(h.right);
       h.color = RED;
       h.left.color = BLACK;
       h.right.color = BLACK;
    }

Invariants. Maintains symmetric order and perfect black balance.
Insertion into a LLRB tree
Warmup 1. Insert into a tree with exactly 1 node.
[Figure: Insert into a single 2-node (two cases).
Left: root b; search ends at this null link; red link to new node containing a converts 2-node to 3-node.
Right: root a; search ends at this null link; attached new node b with red link; rotated left to make a legal 3-node.]
Insertion into a LLRB tree
Case 1. Insert into a 2-node at the bottom.
・Do standard BST insert; color new link red. [to maintain symmetric order and perfect black balance]
・If new red link is a right link, rotate left. [to fix color invariants]
[Figure: Insert into a 2-node at the bottom. Insert C: add new node here; right link red so rotate left.]
Insertion into a LLRB tree
Warmup 2. Insert into a tree with exactly 2 nodes.
[Figure: Insert into a single 3-node (three cases).
Larger: search ends at this null link; attached new node with red link; colors flipped to black.
Smaller: search ends at this null link; attached new node with red link; rotated right; colors flipped to black.
Between: search ends at this null link; attached new node with red link; rotated left; rotated right; colors flipped to black.]
Insertion into a LLRB tree
Case 2. Insert into a 3-node at the bottom.
・Do standard BST insert; color new link red. [to maintain symmetric order and perfect black balance]
・Rotate to balance the 4-node (if needed).
・Flip colors to pass red link up one level.
・Rotate to make lean left (if needed).
[rotations and color flip: to fix color invariants]
[Figure: Insert into a 3-node at the bottom. Inserting H: add new node here; two lefts in a row so rotate right; both children red so flip colors; right link red so rotate left.]
Insertion into a LLRB tree: passing red links up the tree
Case 2. Insert into a 3-node at the bottom.
・Do standard BST insert; color new link red. [to maintain symmetric order and perfect black balance]
・Rotate to balance the 4-node (if needed).
・Flip colors to pass red link up one level.
・Rotate to make lean left (if needed).
・Repeat case 1 or case 2 up the tree (if needed).
[rotations and color flip: to fix color invariants]
[Figure: Passing a red link up the tree. Inserting P, with step annotations: add new node here; both children red so flip colors; right link red so rotate left; two lefts in a row so rotate right.]
Red-black BST construction demo
[Figure: insert S into an empty tree, giving the single node S.]
[Figure: the final red-black BST.]
Insertion into a LLRB tree: Java implementation
Same code for all cases.
・Right child red, left child black: rotate left.
・Left child, left-left grandchild red: rotate right.
・Both children red: flip colors.

    private Node put(Node h, Key key, Value val)
    {
       if (h == null) return new Node(key, val, RED);   // insert at bottom (and color it red)
       int cmp = key.compareTo(h.key);
       if      (cmp  < 0) h.left  = put(h.left,  key, val);
       else if (cmp  > 0) h.right = put(h.right, key, val);
       else if (cmp == 0) h.val = val;

       if (isRed(h.right) && !isRed(h.left))     h = rotateLeft(h);    // lean left
       if (isRed(h.left)  && isRed(h.left.left)) h = rotateRight(h);   // balance 4-node
       if (isRed(h.left)  && isRed(h.right))     flipColors(h);        // split 4-node

       return h;
    }

[only a few extra lines of code provides near-perfect balance]
[Figure: Passing a red link up a red-black tree: left rotate, right rotate, flip colors.]
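To try the insertion code end to end, the pieces shown so far (node representation, rotations, color flip, recursive put, and get) can be assembled into one small class. The sketch below does that under a few assumptions that are not on the slides: the class name `LlrbPutSketch`, the `height` helper, and the public `put` wrapper that resets the root to black after each insertion are additions made for this example.

```java
// Sketch only: the slide code assembled into one class; wrapper and height() are assumptions.
final class LlrbPutSketch<Key extends Comparable<Key>, Value> {
    private static final boolean RED = true, BLACK = false;

    private final class Node {
        Key key; Value val; Node left, right;
        boolean color;   // color of the link from the parent
        Node(Key key, Value val, boolean color) { this.key = key; this.val = val; this.color = color; }
    }

    private Node root;

    private boolean isRed(Node x) { return x != null && x.color == RED; }

    private Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left; x.left = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    private Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right; x.right = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    private void flipColors(Node h) {
        h.color = RED; h.left.color = BLACK; h.right.color = BLACK;
    }

    // Assumption: a public wrapper that keeps the root link black after every insertion.
    public void put(Key key, Value val) {
        root = put(root, key, val);
        root.color = BLACK;
    }

    private Node put(Node h, Key key, Value val) {
        if (h == null) return new Node(key, val, RED);          // insert at bottom, red link
        int cmp = key.compareTo(h.key);
        if      (cmp < 0) h.left  = put(h.left,  key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else              h.val = val;
        if (isRed(h.right) && !isRed(h.left))     h = rotateLeft(h);    // lean left
        if (isRed(h.left)  && isRed(h.left.left)) h = rotateRight(h);   // balance 4-node
        if (isRed(h.left)  && isRed(h.right))     flipColors(h);        // split 4-node
        return h;
    }

    public Value get(Key key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if      (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else              return x.val;
        }
        return null;
    }

    // Number of links on the longest path from the root to a node (-1 for an empty tree).
    public int height() { return height(root); }

    private int height(Node x) {
        return x == null ? -1 : 1 + Math.max(height(x.left), height(x.right));
    }

    public static void main(String[] args) {
        LlrbPutSketch<Integer, Integer> st = new LlrbPutSketch<>();
        for (int i = 1; i <= 255; i++) st.put(i, i * i);   // ascending order, bad for a plain BST
        System.out.println("height after 255 ascending insertions: " + st.height());
    }
}
```

A plain BST fed the same ascending keys would degenerate to height 254; here the height stays logarithmic.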
Insertion into a LLRB tree: visualization
[Figure: 255 insertions in ascending order]
[Figure: 255 insertions in descending order]
[Figure: 255 random insertions]
Balanced search trees: quiz 2
What is the height of a LLRB tree with N keys in the worst case?
A. ~ log_3 N
B. ~ log_2 N
C. ~ 2 log_2 N
D. ~ N
E. I don't know.
Balance in LLRB trees
Proposition. Height of tree is ≤ 2 lg N in the worst case.
Pf.
・Black height = height of corresponding 2-3 tree ≤ lg N.
・Never two red links in-a-row.
So no path has more red links than black links, and the longest path is at most twice the black height.
Property. Height of tree is ~ 1.0 lg N in typical applications.
ST implementations: summary

implementation                     | guarantee             | average case              | ordered ops? | key interface
                                   | search insert delete  | search hit insert delete  |              |
sequential search (unordered list) | N      N      N       | N          N      N       |              | equals()
binary search (ordered array)      | log N  N      N       | log N      N      N       | ✔            | compareTo()
BST                                | N      N      N       | log N      log N  √N      | ✔            | compareTo()
2-3 tree                           | log N  log N  log N   | log N      log N  log N   | ✔            | compareTo()
red-black BST                      | log N  log N  log N   | log N      log N  log N   | ✔            | compareTo()

[red-black BST row: hidden constant c is small (at most 2 lg N compares)]
RED-BLACK BST (WITHOUT USING A COLOR BIT)
Red-black BST representation. BST, where each node has a color bit.
Challenge. Represent without using extra memory for color.
[Figure: a red-black BST on the keys A, C, E, M, R, S, X.]
War story: why red-black?
Xerox PARC innovations. [1970s]
・Alto.
・GUI.
・Ethernet.
・Smalltalk.
・InterPress.
・Laser printing.
・Bitmapped display.
・WYSIWYG text editor.
・...
[Figure: Xerox Alto]
[Figure: first page of the paper, transcribed below]
A DICHROMATIC FRAMEWORK FOR BALANCED TREES
Leo J. Guibas, Xerox Palo Alto Research Center, Palo Alto, California, and Carnegie-Mellon University
and
Robert Sedgewick*, Program in Computer Science, Brown University, Providence, R. I.
ABSTRACT
In this paper we present a uniform framework for the implementation and study of balanced tree algorithms. We show how to imbed in this framework the best known balanced tree techniques and then use the framework to develop new algorithms which perform the update and rebalancing in one pass, on the way down towards a leaf. We conclude with a study of performance issues and concurrent updating.
[excerpt from the introduction] the way down towards a leaf. As we will see, this has a number of significant advantages over the older methods. We shall examine a number of variations on a common theme and exhibit full implementations which are notable for their brevity. One implementation is examined carefully, and some properties about its behavior are proved. In both sections 1 and 2 particular attention is paid to practical implementation issues, and complete implementations are given for
War story: red-black BSTs
Telephone company contracted with database provider to build real-time database to store customer information.
Database implementation.
・Red-black BST search, insert, and delete.
・Exceeding height limit of 80 triggered error-recovery process. [allows for up to 2^40 keys]
Extended telephone service outage. [did not rebalance BST during delete]
・Main cause = height bound exceeded!
・Telephone company sues database provider.
・Legal testimony: “ If implemented properly, the height of a red-black BST with N keys is at most 2 lg N. ” – expert witness
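The "2^40 keys" figure comes from inverting the height bound: if the height is at most 2 lg N, a limit of 80 is safe as long as lg N ≤ 40. A tiny Java sketch of that arithmetic (the class and method names are hypothetical, for illustration only):

```java
// Sketch only: inverts the bound height <= 2 lg N for a given height limit.
final class HeightLimitSketch {
    // Largest N with 2 lg N <= heightLimit, i.e. N = 2^(heightLimit / 2).
    // Valid while heightLimit / 2 is below 63, so the result fits in a long.
    static long maxKeys(int heightLimit) {
        return 1L << (heightLimit / 2);
    }

    public static void main(String[] args) {
        System.out.println("keys supported by a height limit of 80: " + maxKeys(80));
    }
}
```

With a limit of 80 this gives 2^40, roughly a trillion keys, which is why exceeding the limit pointed to a rebalancing bug rather than to data volume.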
File system model
Page. Contiguous block of data (e.g., a 4,096-byte chunk).
Probe. First access to a page (e.g., from disk to memory).
[Figure labels: slow, fast]
Property. Time required for a probe is much larger than time to access data within a page.
Cost model. Number of probes.
Goal. Access data using minimum number of probes.
B-trees (Bayer-McCreight, 1972)
B-tree. Generalize 2-3 trees by allowing up to M keys per node. [choose M as large as possible so that M keys fit in a page (M = 1,024 is typical)]
・At least ⎣ M / 2 ⎦ keys in all nodes (except root).
・Every path from root to leaf has same number of links.
[Figure: a B-tree (M = 6), with root G P U]
Search in a B-tree
・Start at root.
・Check if node contains key. [could use binary search (but all ops are considered free)]
・Otherwise, find interval for search key and take corresponding link.
[Figure: a B-tree (M = 6), with root G P U]
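The search steps above can be sketched in Java with each node standing in for one page. This is an illustration under stated assumptions, not the lecture's implementation: the class `BTreeSearchSketch`, its `Node` layout, the method `probesToFind`, and the small demo tree are hypothetical (the demo tree is not the one in the figure). Binary search within a node uses `java.util.Arrays.binarySearch`, which returns `-(insertion point) - 1` on a miss; the insertion point is the index of the link to follow.

```java
// Sketch only: in-memory stand-in for pages; all names and the demo tree are hypothetical.
final class BTreeSearchSketch {
    static final class Node {
        final String[] keys;     // sorted keys stored in this page
        final Node[] children;   // null for a leaf, else keys.length + 1 links

        Node(String[] keys, Node[] children) { this.keys = keys; this.children = children; }
    }

    // Returns the number of probes (pages touched) on a search hit, or -1 on a miss.
    static int probesToFind(Node root, String key) {
        int probes = 0;
        Node x = root;
        while (x != null) {
            probes++;                                            // first access to this page
            int i = java.util.Arrays.binarySearch(x.keys, key);
            if (i >= 0) return probes;                           // node contains key
            if (x.children == null) return -1;                   // leaf: search miss
            x = x.children[-(i + 1)];                            // link for the key's interval
        }
        return -1;
    }

    static Node demoTree() {
        Node a = new Node(new String[] { "A", "C", "D" }, null);
        Node b = new Node(new String[] { "J", "K", "L" }, null);
        Node c = new Node(new String[] { "Q", "R", "T" }, null);
        return new Node(new String[] { "G", "P" }, new Node[] { a, b, c });
    }

    public static void main(String[] args) {
        System.out.println("probes to find K: " + probesToFind(demoTree(), "K"));
    }
}
```

Only the page accesses are counted, in line with the cost model: work done inside a page is treated as free.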
Insertion in a B-tree
・Search for new key.
・Insert at bottom.
・Split nodes with M + 1 keys on the way back up the B-tree (moving middle key to parent).
[Figure: a B-tree (M = 6), with root G P U]
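The split step can be illustrated on the keys alone. The Java sketch below takes the sorted keys of an overfull node (M + 1 keys), moves the middle key out for the parent, and leaves two halves; it deliberately ignores child links and the parent update. The names `BTreeSplitSketch`, `Split`, and `split` are hypothetical.

```java
// Sketch only: splits the keys of an overfull node; child links are not handled.
final class BTreeSplitSketch {
    static final class Split {
        final String[] left;    // keys that stay in the left node
        final String middle;    // key that moves up to the parent
        final String[] right;   // keys that go to the new right node

        Split(String[] left, String middle, String[] right) {
            this.left = left; this.middle = middle; this.right = right;
        }
    }

    // keys must be sorted and hold M + 1 entries (one more than a node may keep).
    static Split split(String[] keys) {
        int mid = keys.length / 2;
        return new Split(
            java.util.Arrays.copyOfRange(keys, 0, mid),
            keys[mid],
            java.util.Arrays.copyOfRange(keys, mid + 1, keys.length));
    }

    public static void main(String[] args) {
        Split s = split(new String[] { "A", "C", "D", "E", "F", "G", "H" });   // M = 6, so 7 keys
        System.out.println(String.join(" ", s.left) + " | " + s.middle + " | " + String.join(" ", s.right));
    }
}
```

For M = 6 each half keeps 3 keys, which meets the ⎣ M / 2 ⎦ minimum.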
Balance in B-tree
Proposition. A search or an insertion in a B-tree of order M with N keys requires between ~ log_M N and ~ log_{M/2} N probes.
Pf. All nodes (except possibly root) have between ⎣ M / 2 ⎦ and M keys.
In practice. Number of probes is at most 4. [M = 1024; N = 62 billion; log_{M/2} N ≤ 4]
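The "at most 4 probes" claim can be checked by evaluating both logarithms for the numbers on the slide. A short Java sketch (the class and method names are hypothetical):

```java
// Sketch only: evaluates log_M N and log_{M/2} N for M = 1024 and N = 62 billion.
final class BTreeProbeBoundSketch {
    static double logBase(double base, double n) {
        return Math.log(n) / Math.log(base);
    }

    public static void main(String[] args) {
        double m = 1024, n = 62e9;
        System.out.printf("log_M N     = %.2f%n", logBase(m, n));
        System.out.printf("log_{M/2} N = %.2f%n", logBase(m / 2, n));
    }
}
```

Since 512^4 = 2^36 is about 68.7 billion, which exceeds 62 billion, log_{M/2} N stays just under 4.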
Balanced search trees: quiz 3
Which of the following does the B in B-tree not mean?
A. Bayer
B. Balanced
C. Binary
D. Boeing
E. I don't know.
“ the more you think about what the B in B-trees could mean, the more you learn about B-trees and that is good. ” – Rudolph Bayer
Balanced trees in the wild
Red-black trees are widely used as system symbol tables.
・Java: java.util.TreeMap, java.util.TreeSet.
・C++ STL: map, multimap, multiset.
・Linux kernel: completely fair scheduler, linux/rbtree.h.
・Emacs: conservative stack scanning.
B-tree cousins. B+ tree, B*tree, B# tree, …
B-trees (and cousins) are widely used for file systems and databases.
・Windows: NTFS.
・Mac: HFS, HFS+.
・Linux: ReiserFS, XFS, Ext3FS, JFS, BTRFS.
・Databases: ORACLE, DB2, INGRES, SQL, PostgreSQL.
Red-black BSTs in the wild
Common sense. Sixth sense. Together they're the FBI's newest team.
55
Red-black BSTs in the wild
56