Algorithms 3.3 Balanced Search Trees: 2-3 search trees, red-black BSTs, B-trees. Robert Sedgewick, Kevin Wayne.

Author: Bruno Bennett
Algorithms, Fourth Edition
Robert Sedgewick | Kevin Wayne
http://algs4.cs.princeton.edu

3.3 Balanced Search Trees
‣ 2-3 search trees
‣ red-black BSTs
‣ B-trees

Last updated on Mar 1, 2015, 9:42 AM

Symbol table review

                                    guarantee               average case
implementation                      search  insert  delete  search hit  insert  delete  ordered ops?  key interface
sequential search (unordered list)  N       N       N       N           N       N       no            equals()
binary search (ordered array)       log N   N       N       log N       N       N       yes           compareTo()
BST                                 N       N       N       log N       log N   √N      yes           compareTo()
goal                                log N   log N   log N   log N       log N   log N   yes           compareTo()

Challenge. Guarantee performance.
This lecture. 2-3 trees, left-leaning red-black BSTs, B-trees.

‣ 2-3 search trees

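2-3 tree
Allow 1 or 2 keys per node.
・2-node: one key, two children.
・3-node: two keys, three children.

Symmetric order. Inorder traversal yields keys in ascending order.
Perfect balance. Every path from root to null link has the same length.  [how to maintain?]

[Figure: a 2-3 tree with root M. Its left child is the 3-node E J, whose three links lead to A C (smaller than E), H (between E and J), and L (larger than J); its right child is the 2-node R, with children P and S X. Null links hang below the bottom nodes.]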

2-3 tree demo
Search.
・Compare search key against keys in node.
・Find interval containing search key.
・Follow associated link (recursively).

[Demo figure: search for H in the 2-3 tree with root M; E J and R; A C, H, L, P, S X.]
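The three search steps can be sketched in Java as follows. This is a minimal illustration, not the book's implementation: the Node class (one key for a 2-node, two sorted keys for a 3-node, with char keys and one more link than keys) and the hard-coded example() tree are hypothetical names for this sketch.

```java
public class TwoThreeSearch {
    // Hypothetical node: one key (2-node) or two sorted keys (3-node),
    // with keys.length + 1 links; links are null at the bottom level.
    static final class Node {
        final char[] keys;
        final Node[] kids;
        Node(String keys, Node... kids) {
            this.keys = keys.toCharArray();
            this.kids = (kids.length == 0) ? new Node[this.keys.length + 1] : kids;
        }
    }

    static boolean contains(Node x, char key) {
        while (x != null) {
            int i = 0;
            // find the interval containing the search key
            while (i < x.keys.length && key > x.keys[i]) i++;
            if (i < x.keys.length && key == x.keys[i]) return true; // search hit
            x = x.kids[i];                                         // follow associated link
        }
        return false;                                              // null link: search miss
    }

    // The example tree: root M; E J and R; bottom nodes A C, H, L, P, S X.
    static Node example() {
        Node ej = new Node("EJ", new Node("AC"), new Node("H"), new Node("L"));
        Node r  = new Node("R",  new Node("P"), new Node("SX"));
        return new Node("M", ej, r);
    }

    public static void main(String[] args) {
        Node root = example();
        System.out.println(contains(root, 'H'));  // true
        System.out.println(contains(root, 'B'));  // false
    }
}
```

The loop version is equivalent to the recursive description: each iteration examines one node and descends one level, so the number of nodes examined is at most the height of the tree.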

2-3 tree: insertion
Insertion into a 2-node at bottom.
・Add new key to 2-node to create a 3-node.

[Figure: insert G into the 2-3 tree with root L; E and R; A C, H, P, S X. The 2-node H at the bottom becomes the 3-node G H.]

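2-3 tree: insertion
Insertion into a 3-node at bottom.
・Add new key to 3-node to create temporary 4-node.
・Move middle key in 4-node into parent.
・Repeat up the tree, as necessary.
・If you reach the root and it's a 4-node, split it into three 2-nodes.

[Figure: insert Z into the 2-3 tree with root L; E and R; A C, H, P, S X. The 3-node S X becomes the temporary 4-node S X Z; its middle key X moves into the parent, which becomes R X with children P, S, Z.]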
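The insertion steps above can be sketched in Java as follows. This is a hypothetical illustration with int keys and list-based nodes (TwoThreeInsert, split(), isBalanced() and inorder() are names made up for this sketch): a key is added at the bottom, a node that reaches three keys is a temporary 4-node, and split() moves its middle key into the parent, repeating up the tree; a 4-node at the root becomes three 2-nodes.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoThreeInsert {
    // A node holds 1 key (2-node) or 2 keys (3-node); 3 keys only as a temporary 4-node.
    static final class Node {
        final List<Integer> keys = new ArrayList<>();  // sorted
        final List<Node> kids = new ArrayList<>();     // keys.size() + 1 links; empty at bottom
        boolean isLeaf() { return kids.isEmpty(); }
    }

    private Node root;

    public void insert(int key) {
        if (root == null) { root = new Node(); root.keys.add(key); return; }
        insert(root, key);
        if (root.keys.size() == 3) {        // root is a temporary 4-node:
            Node newRoot = new Node();      // split it into three 2-nodes
            newRoot.kids.add(root);
            split(newRoot, 0);
            root = newRoot;                 // the only place the height grows
        }
    }

    private void insert(Node x, int key) {
        int i = 0;
        while (i < x.keys.size() && key > x.keys.get(i)) i++;    // find interval
        if (i < x.keys.size() && key == x.keys.get(i)) return;  // already present
        if (x.isLeaf()) { x.keys.add(i, key); return; }          // 2-node -> 3-node, or 3-node -> 4-node
        Node child = x.kids.get(i);
        insert(child, key);
        if (child.keys.size() == 3) split(x, i);                 // repeat up the tree as necessary
    }

    // Split the temporary 4-node parent.kids[i] = (a b c) into 2-nodes a and c; b moves into parent.
    private static void split(Node parent, int i) {
        Node full = parent.kids.get(i);
        Node left = new Node(), right = new Node();
        left.keys.add(full.keys.get(0));
        right.keys.add(full.keys.get(2));
        if (!full.isLeaf()) {
            left.kids.addAll(full.kids.subList(0, 2));
            right.kids.addAll(full.kids.subList(2, 4));
        }
        parent.keys.add(i, full.keys.get(1));
        parent.kids.set(i, left);
        parent.kids.add(i + 1, right);
    }

    // Number of links on the leftmost root-to-bottom path.
    public int height() {
        int h = 0;
        for (Node x = root; x != null && !x.isLeaf(); x = x.kids.get(0)) h++;
        return h;
    }

    // Perfect balance: every bottom node sits at depth height().
    public boolean isBalanced() { return root == null || bottomAt(root, 0, height()); }

    private boolean bottomAt(Node x, int depth, int h) {
        if (x.isLeaf()) return depth == h;
        for (Node c : x.kids) if (!bottomAt(c, depth + 1, h)) return false;
        return true;
    }

    // Symmetric order: inorder traversal yields keys in ascending order.
    public List<Integer> inorder() {
        List<Integer> out = new ArrayList<>();
        if (root != null) inorder(root, out);
        return out;
    }

    private void inorder(Node x, List<Integer> out) {
        for (int j = 0; j < x.keys.size(); j++) {
            if (!x.isLeaf()) inorder(x.kids.get(j), out);
            out.add(x.keys.get(j));
        }
        if (!x.isLeaf()) inorder(x.kids.get(x.keys.size()), out);
    }

    public static void main(String[] args) {
        TwoThreeInsert t = new TwoThreeInsert();
        for (int k = 1; k <= 1000; k++) t.insert(k);   // ascending keys: worst case for a plain BST
        System.out.println(t.height() + " " + t.isBalanced());
    }
}
```

Because the tree gains height only when the root splits, every root-to-bottom path grows by one level at the same moment, which is why isBalanced() holds after any sequence of insertions.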

2-3 tree construction demo
[Demo figures: a 2-3 tree built one insertion at a time, starting with insert S.]

2-3 tree: global properties
Invariants. Maintains symmetric order and perfect balance.
Pf. Each transformation maintains symmetric order and perfect balance.

[Figure: Splitting a temporary 4-node in a 2-3 tree (summary). Six cases: the 4-node is the root, the left or right child of a 2-node parent, or the left, middle, or right child of a 3-node parent. In every case the middle key moves up into the parent; at the root, the 4-node splits into three 2-nodes.]

2-3 tree: performance
Splitting a 4-node is a local transformation: constant number of operations.

[Figure: Splitting a 4-node is a local transformation that preserves balance. The 4-node b c d below the 3-node parent a e splits into b and d while c moves up, making the parent a c e. The six subtrees (less than a, between a and b, between b and c, between c and d, between d and e, greater than e) are unchanged.]

Balanced search trees: quiz 1
What is the height of a 2-3 tree with N keys in the worst case?
A. ~ log3 N
B. ~ log2 N
C. ~ 2 log2 N
D. ~ N
E. I don't know.

2-3 tree: performance
Perfect balance. Every path from root to null link has the same length.

[Figure: typical 2-3 tree built from random keys.]

Tree height.
・Worst case: lg N.  [all 2-nodes]
・Best case: log3 N ≈ .631 lg N.  [all 3-nodes]
・Between 12 and 20 for a million nodes.
・Between 18 and 30 for a billion nodes.

Bottom line. Guaranteed logarithmic performance for search and insert.
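The height bounds above follow from counting keys level by level. Writing h for the number of levels of nodes, the worst case packs the fewest keys per level and the best case the most:

```latex
% h = number of levels of nodes in the 2-3 tree
\begin{aligned}
\text{all 2-nodes (worst case):}\quad & N = \sum_{i=0}^{h-1} 2^i = 2^h - 1
  &&\Rightarrow\ h = \lg(N+1)\\
\text{all 3-nodes (best case):}\quad & N = \sum_{i=0}^{h-1} 2\cdot 3^i = 3^h - 1
  &&\Rightarrow\ h = \log_3(N+1) = \frac{\lg(N+1)}{\lg 3} \approx 0.631\,\lg(N+1)\\
N = 10^6:\quad & \log_3 N \approx 12.6, \qquad \lg N \approx 19.9\\
N = 10^9:\quad & \log_3 N \approx 18.9, \qquad \lg N \approx 29.9
\end{aligned}
```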

ST implementations: summary

                                    guarantee               average case
implementation                      search  insert  delete  search hit  insert  delete  ordered ops?  key interface
sequential search (unordered list)  N       N       N       N           N       N       no            equals()
binary search (ordered array)       log N   N       N       log N       N       N       yes           compareTo()
BST                                 N       N       N       log N       log N   √N      yes           compareTo()
2-3 tree                            log N   log N   log N   log N       log N   log N   yes           compareTo()

Note. For the 2-3 tree, the hidden constant c is large (depends upon implementation).

2-3 tree: implementation?
Direct implementation is complicated, because:
・Maintaining multiple node types is cumbersome.
・Need multiple compares to move down tree.
・Need to move back up the tree to split 4-nodes.
・Large number of cases for splitting.

fantasy code

public void put(Key key, Value val)
{
   Node x = root;
   while (x.getTheCorrectChild(key) != null)
   {
      x = x.getTheCorrectChild(key);
      if (x.is4Node()) x.split();
   }
   if      (x.is2Node()) x.make3Node(key, val);
   else if (x.is3Node()) x.make4Node(key, val);
}

"Beautiful algorithms are not always the most useful." (Donald Knuth)

Bottom line. Could do it, but there's a better way.

‣ red-black BSTs

How to implement 2-3 trees with binary trees?
Challenge. How to represent a 3-node?  [for example, the 3-node E R]

Approach 1. Regular BST.
・No way to tell a 3-node from a 2-node.
・Cannot map from BST back to 2-3 tree.

Approach 2. Regular BST with red "glue" nodes.
・Wastes space, wasted link.
・Code probably messy.

Approach 3. Regular BST with red "glue" links.
・Widely used in practice.
・Arbitrary restriction: red links lean left.

Left-leaning red-black BSTs (Guibas-Sedgewick 1979 and Sedgewick 2007)
1. Represent 2-3 tree as a BST.
2. Use "internal" left-leaning links as "glue" for 3-nodes.

[Figure: Encoding a 3-node with two 2-nodes connected by a left-leaning red link. The 3-node a b becomes node b (the larger key is the root) with a red left link to a; the three subtrees hold keys less than a, between a and b, and greater than b.]

[Figure: Correspondence between red-black and 2-3 trees. The 2-3 tree with root M, children E J and R, and bottom nodes A C, H, L, P, S X, shown next to the corresponding red-black BST and the same BST drawn with horizontal red links. Black links connect 2-nodes and 3-nodes; red links "glue" nodes within a 3-node.]

Left-leaning red-black BSTs: 1-1 correspondence with 2-3 trees
Key property. 1-1 correspondence between 2-3 trees and LLRBs.

[Figure: 1-1 correspondence between red-black and 2-3 trees: a red-black BST, the same BST with its red links drawn horizontally, and the corresponding 2-3 tree (root M; E J and R; A C, H, L, P, S X).]

An equivalent definition
A BST such that:
・No node has two red links connected to it.
・Every path from root to null link has the same number of black links.  ["perfect black balance"]
・Red links lean left.

[Figure: a red-black BST, also drawn with horizontal red links.]

Search implementation for red-black BSTs
Observation. Search is the same as for elementary BST (ignore color), but runs faster because of better balance.

public Value get(Key key)
{
   Node x = root;
   while (x != null)
   {
      int cmp = key.compareTo(x.key);
      if      (cmp < 0)  x = x.left;
      else if (cmp > 0)  x = x.right;
      else if (cmp == 0) return x.val;
   }
   return null;
}

[Figure: a red-black BST and the corresponding 2-3 tree.]

Remark. Most other ops (e.g., floor, iteration, selection) are also identical.

Red-black BST representation
Each node is pointed to by precisely one link (from its parent) ⇒ can encode color of links in nodes.

private static final boolean RED   = true;
private static final boolean BLACK = false;

private class Node
{
   Key key;           // key
   Value val;         // associated data
   Node left, right;  // subtrees
   boolean color;     // color of link from parent to this node
   // (a variant also keeps int N, the # nodes in this subtree)

   Node(Key key, Value val, boolean color)
   {
      this.key   = key;
      this.val   = val;
      this.color = color;
   }
}

private boolean isRed(Node x)
{
   if (x == null) return false;   // null links are black
   return x.color == RED;
}

[Figure: a node h whose left link is red (h.left.color is RED) and whose right link is black (h.right.color is BLACK).]

Insertion into a LLRB tree: overview
Basic strategy. Maintain 1-1 correspondence with 2-3 trees.
During internal operations, maintain:
・Symmetric order.
・Perfect black balance.
[but not necessarily color invariants]

[Figure: color-invariant violations, shown with keys A, E, S: a right-leaning red link; two red children (a temporary 4-node); left-left red (a temporary 4-node); left-right red (a temporary 4-node).]

How? Apply elementary red-black BST operations: rotation and color flip.

Elementary red-black BST operations
Left rotation. Orient a (temporarily) right-leaning red link to lean left.

[Figure: rotate E left (before). h is E, with a red right link to x = S; the subtrees hold keys less than E, between E and S, and greater than S.]

private Node rotateLeft(Node h)
{
   assert isRed(h.right);
   Node x = h.right;
   h.right = x.left;
   x.left = h;
   x.color = h.color;
   h.color = RED;
   return x;
}

Invariants. Maintains symmetric order and perfect black balance.

[Figure: rotate E left (after). x = S is now on top, with a red left link to h = E; the subtrees (less than E, between E and S, greater than S) keep their left-to-right order.]

Right rotation. Orient a left-leaning red link to (temporarily) lean right.

[Figure: rotate S right (before). h is S, with a red left link to x = E; the subtrees hold keys less than E, between E and S, and greater than S.]

private Node rotateRight(Node h)
{
   assert isRed(h.left);
   Node x = h.left;
   h.left = x.right;
   x.right = h;
   x.color = h.color;
   h.color = RED;
   return x;
}

Invariants. Maintains symmetric order and perfect black balance.

[Figure: rotate S right (after). x = E is now on top, with a red right link to h = S; the subtrees keep their left-to-right order.]

Color flip. Recolor to split a (temporary) 4-node.

[Figure: flip colors (before). h is E, with red links to A and S; the subtrees hold keys less than A, between A and E, between E and S, and greater than S.]

private void flipColors(Node h)
{
   assert !isRed(h);
   assert isRed(h.left);
   assert isRed(h.right);
   h.color = RED;
   h.left.color = BLACK;
   h.right.color = BLACK;
}

Invariants. Maintains symmetric order and perfect black balance.

[Figure: flip colors (after). The link to h = E is now red and the links to A and S are black; the subtrees are unchanged.]
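The three operations can be exercised on the small trees from the figures. This is a hypothetical demo (class RedBlackOps, with char keys and no values) that reuses the bodies of rotateLeft(), rotateRight() and flipColors() shown above; inorder() prints the keys in symmetric order, which the rotations should leave unchanged.

```java
public class RedBlackOps {
    static final boolean RED = true;
    static final boolean BLACK = false;

    // Simplified node for this demo: char keys, no values.
    static final class Node {
        final char key;
        Node left, right;
        boolean color;   // color of the link from the parent to this node
        Node(char key, boolean color) { this.key = key; this.color = color; }
    }

    static boolean isRed(Node x) { return x != null && x.color == RED; }

    static Node rotateLeft(Node h) {
        assert isRed(h.right);
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    static Node rotateRight(Node h) {
        assert isRed(h.left);
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    static void flipColors(Node h) {
        assert !isRed(h) && isRed(h.left) && isRed(h.right);
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    // Keys in symmetric order; a rotation must not change this string.
    static String inorder(Node x) {
        return x == null ? "" : inorder(x.left) + x.key + inorder(x.right);
    }

    // E with a right-leaning red link to S; subtrees A (< E), H (between E and S), X (> S).
    static Node example() {
        Node e = new Node('E', BLACK);
        e.left = new Node('A', BLACK);
        e.right = new Node('S', RED);
        e.right.left = new Node('H', BLACK);
        e.right.right = new Node('X', BLACK);
        return e;
    }

    // A temporary 4-node: black E with red children A and S.
    static Node fourNode() {
        Node h = new Node('E', BLACK);
        h.left = new Node('A', RED);
        h.right = new Node('S', RED);
        return h;
    }

    public static void main(String[] args) {
        Node s = rotateLeft(example());
        System.out.println(s.key + " " + isRed(s.left) + " " + inorder(s));          // S true AEHSX
        Node e = rotateRight(s);
        System.out.println(e.key + " " + isRed(e.right) + " " + inorder(e));         // E true AEHSX
        Node h = fourNode();
        flipColors(h);
        System.out.println(isRed(h) + " " + isRed(h.left) + " " + isRed(h.right));   // true false false
    }
}
```

Note that x.color = h.color is what keeps perfect black balance: the subtree's top link keeps whatever color it had, so no path gains or loses a black link.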

Insertion into a LLRB tree
Warmup 1. Insert into a tree with exactly 1 node.

[Figure: Insert into a single 2-node (two cases).
・Left: insert a into the 2-node b. The search ends at the null link to the left of b; a red link to the new node containing a converts the 2-node to a 3-node.
・Right: insert b into the 2-node a. The search ends at the null link to the right of a; the new node is attached with a red link, then rotated left to make a legal 3-node.]

Insertion into a LLRB tree
Case 1. Insert into a 2-node at the bottom.
・Do standard BST insert; color new link red.  [to maintain symmetric order and perfect black balance]
・If new red link is a right link, rotate left.  [to fix color invariants]

[Figure: Insert into a 2-node at the bottom. Insert C into the tree with root E, left child A, and right child S (with a red left link to R). C is added as a red right child of A; right link red, so rotate left, leaving C with a red left link to A.]

Insertion into a LLRB tree
Warmup 2. Insert into a tree with exactly 2 nodes.

[Figure: Insert into a single 3-node (three cases). In each case the search ends at a null link and the new node is attached with a red link.
・larger: new key is larger than both; colors flipped to black.
・smaller: new key is smaller than both (two lefts in a row); rotated right, then colors flipped to black.
・between: new key is between the two; rotated left, then rotated right, then colors flipped to black.
In every case the result is the middle key b with black children a and c.]

Insertion into a LLRB tree
Case 2. Insert into a 3-node at the bottom.
・Do standard BST insert; color new link red.  [to maintain symmetric order and perfect black balance]
・Rotate to balance the 4-node (if needed).  [to fix color invariants]
・Flip colors to pass red link up one level.
・Rotate to make lean left (if needed).

[Figure: Insert into a 3-node at the bottom. Inserting H into the tree with root E, left child C (red left link to A), and right child S (red left link to R). H is added as a red left child of R; two lefts in a row, so rotate right; both children red, so flip colors; right link red, so rotate left. The result has R on top with a red left link to E.]

Insertion into a LLRB tree: passing red links up the tree
Case 2. Insert into a 3-node at the bottom.
・Do standard BST insert; color new link red.
・Rotate to balance the 4-node (if needed).
・Flip colors to pass red link up one level.
・Rotate to make lean left (if needed).
・Repeat case 1 or case 2 up the tree (if needed).

[Figure: Passing a red link up the tree. Inserting P into a red-black BST containing A C E H M R S; the fix-up steps (right link red, so rotate left; two lefts in a row, so rotate right; both children red, so flip colors) repeat at successive levels as the red link moves up.]

Red-black BST construction demo
[Demo figures: a red-black BST built one insertion at a time, starting with insert S; the final red-black BST contains A C E H L M P R S X.]

Insertion into a LLRB tree: Java implementation
Same code for all cases.
・Right child red, left child black: rotate left.
・Left child, left-left grandchild red: rotate right.
・Both children red: flip colors.

private Node put(Node h, Key key, Value val)
{
   // insert at bottom (and color it red)
   if (h == null) return new Node(key, val, RED);
   int cmp = key.compareTo(h.key);
   if      (cmp < 0)  h.left  = put(h.left,  key, val);
   else if (cmp > 0)  h.right = put(h.right, key, val);
   else if (cmp == 0) h.val   = val;

   if (isRed(h.right) && !isRed(h.left))     h = rotateLeft(h);   // lean left
   if (isRed(h.left) && isRed(h.left.left))  h = rotateRight(h);  // balance 4-node
   if (isRed(h.left) && isRed(h.right))      flipColors(h);       // split 4-node

   return h;
}

Only a few extra lines of code provide near-perfect balance.

[Figure: Passing a red link up a red-black tree: at node h, left rotate, right rotate, flip colors.]
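Putting the pieces together gives a small, self-contained LLRB symbol table built from the slide code (get(), the private put(), rotateLeft(), rotateRight(), flipColors()). Two parts are assumptions added for this sketch rather than code shown on the slides: the public put() wrapper, which recolors the root black after each insertion because the root has no parent link, and the height() helper used to check the balance bound. The rotation asserts are omitted for brevity.

```java
public class LLRB<Key extends Comparable<Key>, Value> {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private class Node {
        Key key;
        Value val;
        Node left, right;
        boolean color;   // color of link from parent to this node
        Node(Key key, Value val, boolean color) { this.key = key; this.val = val; this.color = color; }
    }

    private Node root;

    private boolean isRed(Node x) { return x != null && x.color == RED; }

    public Value get(Key key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if      (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else              return x.val;
        }
        return null;
    }

    // Assumed wrapper: the root has no parent link, so keep it black.
    public void put(Key key, Value val) {
        root = put(root, key, val);
        root.color = BLACK;
    }

    private Node put(Node h, Key key, Value val) {
        if (h == null) return new Node(key, val, RED);            // insert at bottom, red link
        int cmp = key.compareTo(h.key);
        if      (cmp < 0) h.left  = put(h.left,  key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else              h.val   = val;

        if (isRed(h.right) && !isRed(h.left))    h = rotateLeft(h);   // lean left
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);  // balance 4-node
        if (isRed(h.left) && isRed(h.right))     flipColors(h);       // split 4-node
        return h;
    }

    private Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    private Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    private void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    // Height in links (-1 for an empty tree), for checking the 2 lg N bound.
    public int height() { return height(root); }

    private int height(Node x) { return x == null ? -1 : 1 + Math.max(height(x.left), height(x.right)); }

    public static void main(String[] args) {
        LLRB<Integer, Integer> st = new LLRB<>();
        for (int i = 1; i <= 255; i++) st.put(i, i * i);   // 255 insertions in ascending order
        System.out.println("height = " + st.height() + ", get(12) = " + st.get(12));
    }
}
```

The main() mirrors the ascending-order visualization: ascending keys are the worst input for an unbalanced BST (height N - 1), while here the height stays within the 2 lg N bound.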

Insertion into a LLRB tree: visualization
[Animations: 255 insertions in ascending order; 255 insertions in descending order; 255 random insertions.]

Balanced search trees: quiz 2
What is the height of a LLRB tree with N keys in the worst case?
A. ~ log3 N
B. ~ log2 N
C. ~ 2 log2 N
D. ~ N
E. I don't know.

Balance in LLRB trees
Proposition. Height of tree is ≤ 2 lg N in the worst case.
Pf.
・Black height = height of corresponding 2-3 tree ≤ lg N.
・Never two red links in-a-row.

Property. Height of tree is ~ 1.0 lg N in typical applications.
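The two facts in the proof combine as follows, writing h_23 for the number of levels in the corresponding 2-3 tree (the black height) and measuring height in links:

```latex
% h_{23}: number of levels in the corresponding 2-3 tree
\begin{aligned}
N &\ge 2^{h_{23}} - 1 \;\Rightarrow\; h_{23} \le \lg(N+1)
   && \text{fewest keys: all 2-nodes}\\
\text{height} &\le (h_{23} - 1) + h_{23} = 2h_{23} - 1
   && \text{black links between levels, plus at most one red link per 2-3 node}\\
&\le 2\lg(N+1) - 1 \;\sim\; 2\lg N
\end{aligned}
```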

ST implementations: summary

                                    guarantee               average case
implementation                      search  insert  delete  search hit  insert  delete  ordered ops?  key interface
sequential search (unordered list)  N       N       N       N           N       N       no            equals()
binary search (ordered array)       log N   N       N       log N       N       N       yes           compareTo()
BST                                 N       N       N       log N       log N   √N      yes           compareTo()
2-3 tree                            log N   log N   log N   log N       log N   log N   yes           compareTo()
red-black BST                       log N   log N   log N   log N       log N   log N   yes           compareTo()

Note. For the red-black BST, the hidden constant c is small (at most 2 lg N compares).

Red-black BST (without using a color bit)
Red-black BST representation. BST, where each node has a color bit.
Challenge. Represent without using extra memory for color.

[Figure: a red-black BST with keys A C E M R S X.]

War story: why red-black?
Xerox PARC innovations. [1970s]
・Alto.
・GUI.
・Ethernet.
・Smalltalk.
・InterPress.
・Laser printing.
・Bitmapped display.
・WYSIWYG text editor.
・...

[Image: Xerox Alto.]

[Image: first page of "A Dichromatic Framework for Balanced Trees" by Leo J. Guibas (Xerox Palo Alto Research Center, Palo Alto, California, and Carnegie-Mellon University) and Robert Sedgewick (Program in Computer Science, Brown University, Providence, R. I.).
Abstract. "In this paper we present a uniform framework for the implementation and study of balanced tree algorithms. We show how to imbed in this framework the best known balanced tree techniques and then use the framework to develop new algorithms which perform the update and rebalancing in one pass, on the way down towards a leaf. We conclude with a study of performance issues and concurrent updating."
Excerpt. "...the way down towards a leaf. As we will see, this has a number of significant advantages over the older methods. We shall examine a number of variations on a common theme and exhibit full implementations which are notable for their brevity. One implementation is examined carefully, and some properties about its behavior are proved. In both sections 1 and 2 particular attention is paid to practical implementation issues, and complete implementations are given for"]

War story: red-black BSTs
Telephone company contracted with database provider to build real-time database to store customer information.

Database implementation.
・Red-black BST search, insert, and delete.
・Exceeding height limit of 80 triggered error-recovery process.  [allows for up to 2^40 keys]

Extended telephone service outage.
・Main cause = height bound exceeded!  [did not rebalance BST during delete]
・Telephone company sues database provider.
・Legal testimony: "If implemented properly, the height of a red-black BST with N keys is at most 2 lg N." (expert witness)

‣ B-trees

File system model
Page. Contiguous block of data (e.g., a 4,096-byte chunk).
Probe. First access to a page (e.g., from disk to memory).

[Figure: slow (disk) vs. fast (memory).]

Property. Time required for a probe is much larger than time to access data within a page.
Cost model. Number of probes.
Goal. Access data using minimum number of probes.

B-trees (Bayer-McCreight, 1972)
B-tree. Generalize 2-3 trees by allowing up to M keys per node.
・At least ⌊M / 2⌋ keys in all nodes (except root).
・Every path from root to leaf has same number of links.

Choose M as large as possible so that M keys fit in a page (M = 1,024 is typical).

[Figure: a B-tree (M = 6) with root G P U and bottom nodes A C D F, I J K L O, Q R T, V W X Y Z.]

Search in a B-tree
・Start at root.
・Check if node contains key.
・Otherwise, find interval for search key and take corresponding link.
[Could use binary search within a node (but all ops within a page are considered free).]

[Figure: search in a B-tree (M = 6) with root G P U.]
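A minimal sketch of this search, assuming single-character keys and in-memory Page objects standing in for disk pages (Page, contains() and the static probes counter are hypothetical names for this illustration). The probes counter counts pages touched, matching the cost model above; the within-page binary search uses Collections.binarySearch and is treated as free. Like the figure, it assumes keys can also live in the upper nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BTreeSearch {
    // Hypothetical in-memory stand-in for a page: sorted keys plus keys.size() + 1 links
    // (no links at the bottom level).
    static final class Page {
        final List<Character> keys = new ArrayList<>();
        final List<Page> children;
        Page(String keys, Page... children) {
            for (char c : keys.toCharArray()) this.keys.add(c);
            this.children = Arrays.asList(children);
        }
    }

    static int probes;  // pages accessed by the most recent search (demo only)

    static boolean contains(Page root, char key) {
        probes = 0;
        Page x = root;                                     // start at root
        while (x != null) {
            probes++;                                      // first access to this page
            int i = Collections.binarySearch(x.keys, key); // in-page work counts as free
            if (i >= 0) return true;                       // node contains key
            if (x.children.isEmpty()) return false;        // bottom reached: search miss
            x = x.children.get(-i - 1);                    // link for the key's interval
        }
        return false;
    }

    // The B-tree (M = 6) from the figure.
    static Page example() {
        return new Page("GPU",
                new Page("ACDF"), new Page("IJKLO"), new Page("QRT"), new Page("VWXYZ"));
    }

    public static void main(String[] args) {
        Page root = example();
        System.out.println(contains(root, 'K') + " " + probes); // true 2
        System.out.println(contains(root, 'B') + " " + probes); // false 2
    }
}
```

When the key is absent from a page, Collections.binarySearch returns -(insertion point) - 1, so -i - 1 is exactly the index of the link whose interval contains the key.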

Insertion in a B-tree
・Search for new key.
・Insert at bottom.
・Split nodes with M + 1 keys on the way back up the B-tree (moving middle key to parent).

[Figure: insertion in a B-tree (M = 6) with root G P U and bottom nodes A C D F, I J K L O, Q R T, V W X Y Z.]
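The insertion rule can be sketched for a general M (assumed M ≥ 2) as below. This is an in-memory illustration with int keys, not a disk-based implementation; BTreeInsert, split() and checkShape() are hypothetical names. A node holding M + 1 keys is split around its middle key, which moves into the parent, and the tree grows only when the root splits; with M = 2 the same code performs 2-3 tree insertion.

```java
import java.util.ArrayList;
import java.util.List;

public class BTreeInsert {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();  // sorted keys in this "page"
        final List<Node> kids = new ArrayList<>();     // keys.size() + 1 links; empty at bottom
        boolean isLeaf() { return kids.isEmpty(); }
    }

    private final int M;             // maximum keys per node (assumed M >= 2)
    private Node root = new Node();

    public BTreeInsert(int M) { this.M = M; }

    public void insert(int key) {
        insert(root, key);
        if (root.keys.size() > M) {  // root overflowed: new root, tree grows at the top
            Node newRoot = new Node();
            newRoot.kids.add(root);
            split(newRoot, 0);
            root = newRoot;
        }
    }

    private void insert(Node x, int key) {
        int i = 0;
        while (i < x.keys.size() && key > x.keys.get(i)) i++;    // find interval
        if (i < x.keys.size() && key == x.keys.get(i)) return;  // key already present
        if (x.isLeaf()) { x.keys.add(i, key); return; }          // insert at bottom
        Node child = x.kids.get(i);
        insert(child, key);
        if (child.keys.size() > M) split(x, i);                  // split on the way back up
    }

    // parent.kids[i] holds M + 1 keys: keep both halves, move the middle key to parent.
    private void split(Node parent, int i) {
        Node full = parent.kids.get(i);
        int mid = (M + 1) / 2;
        Node left = new Node(), right = new Node();
        left.keys.addAll(full.keys.subList(0, mid));
        right.keys.addAll(full.keys.subList(mid + 1, M + 1));
        if (!full.isLeaf()) {
            left.kids.addAll(full.kids.subList(0, mid + 1));
            right.kids.addAll(full.kids.subList(mid + 1, M + 2));
        }
        parent.keys.add(i, full.keys.get(mid));
        parent.kids.set(i, left);
        parent.kids.add(i + 1, right);
    }

    public List<Integer> inorder() {
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    private void inorder(Node x, List<Integer> out) {
        for (int j = 0; j < x.keys.size(); j++) {
            if (!x.isLeaf()) inorder(x.kids.get(j), out);
            out.add(x.keys.get(j));
        }
        if (!x.isLeaf()) inorder(x.kids.get(x.keys.size()), out);
    }

    // Links on every root-to-bottom path (measured along the leftmost path).
    public int height() {
        int d = 0;
        for (Node x = root; !x.isLeaf(); x = x.kids.get(0)) d++;
        return d;
    }

    // Non-root nodes hold between floor(M/2) and M keys; all bottom nodes share one depth.
    public boolean checkShape() { return checkShape(root, 0, height(), true); }

    private boolean checkShape(Node x, int depth, int bottom, boolean isRoot) {
        int n = x.keys.size();
        if (n > M || (!isRoot && n < M / 2)) return false;
        if (x.isLeaf()) return depth == bottom;
        if (x.kids.size() != n + 1) return false;
        for (Node c : x.kids) if (!checkShape(c, depth + 1, bottom, false)) return false;
        return true;
    }

    public static void main(String[] args) {
        BTreeInsert t = new BTreeInsert(6);
        for (int k = 0; k < 10_000; k++) t.insert(k);
        System.out.println(t.height() + " " + t.checkShape());
    }
}
```

Splitting M + 1 keys around index (M + 1) / 2 leaves at least ⌊M / 2⌋ keys on each side, which is how the fill invariant from the definition survives every split.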

Balance in B-tree
Proposition. A search or an insertion in a B-tree of order M with N keys requires between ~ log_M N and ~ log_{M/2} N probes.
Pf. All nodes (except possibly root) have between ⌊M / 2⌋ and M keys.

In practice. Number of probes is at most 4.  [M = 1024; N = 62 billion ⇒ log_{M/2} N ≤ 4]
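The "at most 4" figure is a direct calculation: with M = 1024, every non-root node holds at least M/2 = 512 keys, so the number of probes lies between the two logarithms below for N = 62 billion:

```latex
% M = 1024,  N = 62 billion = 6.2 \times 10^{10}
\begin{aligned}
\log_{M/2} N &= \frac{\ln(6.2\times 10^{10})}{\ln 512} \approx \frac{24.85}{6.24} \approx 3.98 \le 4\\
\log_{M} N &= \frac{\ln(6.2\times 10^{10})}{\ln 1024} \approx \frac{24.85}{6.93} \approx 3.59
\end{aligned}
```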

Balanced search trees: quiz 3
Which of the following does the B in B-tree not mean?
A. Bayer
B. Balanced
C. Binary
D. Boeing
E. I don't know.

"The more you think about what the B in B-trees could mean, the more you learn about B-trees and that is good." (Rudolph Bayer)

Balanced trees in the wild
Red-black trees are widely used as system symbol tables.
・Java: java.util.TreeMap, java.util.TreeSet.
・C++ STL: map, multimap, multiset.
・Linux kernel: completely fair scheduler, linux/rbtree.h.
・Emacs: conservative stack scanning.

B-tree cousins. B+ tree, B*tree, B# tree, …

B-trees (and cousins) are widely used for file systems and databases.
・Windows: NTFS.
・Mac: HFS, HFS+.
・Linux: ReiserFS, XFS, Ext3FS, JFS, BTRFS.
・Databases: ORACLE, DB2, INGRES, SQL, PostgreSQL.

Red-black BSTs in the wild
[Image caption: "Common sense. Sixth sense. Together they're the FBI's newest team."]
