Search Trees

Algorithm Theory, Hiroaki Kobayashi, 12/20/02
Author: Rosanna Wright

Outline

- Dictionary ADT
- Log file
- Binary search
- Lookup table
- Binary search tree
  - Search
  - Insertion
  - Deletion
  - Performance
- AVL tree
  - Definition
  - Insertion
  - Restructuring
  - Deletion
  - Performance
- (2,4) tree
  - Multi-way search tree
  - (2,4) tree
    - Definition
    - Search
    - Insertion
    - Deletion
- Comparison of dictionary implementations

Dictionary Abstract Data Type (ADT)

- The dictionary ADT models a searchable collection of key-element items
- The main operations of a dictionary are searching, inserting, and deleting items
- Multiple items with the same key are allowed
- Applications:
  - address book
  - credit card authorization
  - mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)
- Dictionary ADT methods:
  - findElement(k): if the dictionary has an item with key k, returns its element; otherwise returns the special element NO_SUCH_KEY
  - insertItem(k, o): inserts item (k, o) into the dictionary
  - removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element; otherwise returns the special element NO_SUCH_KEY
  - size(), isEmpty()
  - keys(), elements()
  - findAllElements(k), removeAllElements(k)

Log File

- A log file is a dictionary implemented by means of an unsorted sequence
  - We store the items of the dictionary in a sequence (based on a doubly-linked list or a circular array), in arbitrary order
- Performance:
  - insertItem takes O(1) time since we can insert the new item at the beginning or at the end of the sequence
  - findElement and removeElement take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
- The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)
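The log file idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `LogFile`, the snake_case method names, and the `NO_SUCH_KEY` sentinel object are chosen here, and a Python list stands in for the doubly-linked list or circular array mentioned on the slide (appending to it is O(1) amortized rather than strictly O(1)).

```python
NO_SUCH_KEY = object()  # sentinel standing in for the slides' special element


class LogFile:
    """Dictionary backed by an unsorted list of (key, element) pairs."""

    def __init__(self):
        self._items = []

    def insert_item(self, k, o):
        # No search needed: append at the end of the sequence.
        self._items.append((k, o))

    def find_element(self, k):
        # Worst case (key absent): the whole sequence is scanned.
        for key, elem in self._items:
            if key == k:
                return elem
        return NO_SUCH_KEY

    def remove_element(self, k):
        # Scan for the first item with key k, then delete it.
        for i, (key, elem) in enumerate(self._items):
            if key == k:
                del self._items[i]
                return elem
        return NO_SUCH_KEY

    def size(self):
        return len(self._items)


log = LogFile()
log.insert_item("alice", "09:02")
log.insert_item("bob", "09:15")
print(log.find_element("bob"))
print(log.find_element("carol") is NO_SUCH_KEY)
```

The login-record data is made up for the example; it mirrors the workstation-login use case named above, where insertions dominate.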

Ordered Dictionaries

If searching a dictionary is the most common operation, elements should be sorted by their keys.

- Keys are assumed to come from a total order.
- New operations:
  - closestKeyBefore(k): return the key of the item with largest key less than or equal to k
  - closestElemBefore(k): return the element for the item with largest key less than or equal to k
  - closestKeyAfter(k): return the key of the item with smallest key greater than or equal to k
  - closestElemAfter(k): return the element for the item with smallest key greater than or equal to k

Lookup Table

- A lookup table is a dictionary implemented by means of a sorted sequence
  - We store the items of the dictionary in an array-based sequence, sorted by key
  - We use an external comparator for the keys
- Performance:
  - findElement takes O(log n) time, using binary search
  - insertItem takes O(n) time since in the worst case we have to shift n/2 items to make room for the new item
  - removeElement takes O(n) time since in the worst case we have to shift n/2 items to compact the items after the removal
- The lookup table is effective only for dictionaries of small size or for dictionaries on which searches are the most common operations, while insertions and removals are rarely performed (e.g., credit card authorizations)
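A minimal sketch of a lookup table, including two of the ordered-dictionary operations, might look as follows. It assumes Python's standard `bisect` module for the binary search and the natural `<` ordering of the keys instead of an external comparator; the class name, method names, and `NO_SUCH_KEY` sentinel are illustrative choices, not part of the slides.

```python
from bisect import bisect_left, bisect_right

NO_SUCH_KEY = object()  # sentinel standing in for the slides' special element


class LookupTable:
    """Dictionary kept as two parallel lists sorted by key."""

    def __init__(self):
        self._keys = []
        self._elems = []

    def find_element(self, k):
        i = bisect_left(self._keys, k)          # O(log n) binary search
        if i < len(self._keys) and self._keys[i] == k:
            return self._elems[i]
        return NO_SUCH_KEY

    def insert_item(self, k, o):
        i = bisect_right(self._keys, k)
        self._keys.insert(i, k)                  # O(n): shifts later items
        self._elems.insert(i, o)

    def remove_element(self, k):
        i = bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            del self._keys[i]                    # O(n): compacts the array
            return self._elems.pop(i)
        return NO_SUCH_KEY

    def closest_key_before(self, k):
        i = bisect_right(self._keys, k)          # number of keys <= k
        return self._keys[i - 1] if i > 0 else NO_SUCH_KEY

    def closest_key_after(self, k):
        i = bisect_left(self._keys, k)           # index of first key >= k
        return self._keys[i] if i < len(self._keys) else NO_SUCH_KEY


table = LookupTable()
for key in [8, 3, 14, 5]:
    table.insert_item(key, str(key))
print(table.closest_key_before(7), table.closest_key_after(7))
```

Keeping keys and elements in parallel lists lets `bisect` work on the keys directly; the element-returning variants (closestElemBefore, closestElemAfter) would index `_elems` at the same position.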

Binary Search

- Binary search performs operation findElement(k) on a dictionary implemented by means of an array-based sequence, sorted by key
  - similar to the high-low game
  - at each step, the number of candidate items is halved
  - terminates after a logarithmic number of steps
- Example: findElement(7)

[Figure: four successive states of the sorted array 0 1 3 4 5 7 8 9 11 14 16 18 19, marking the low (l), middle (m), and high (h) positions; the search ends with l = m = h at key 7.]
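The halving steps of the example can be traced with a short iterative sketch. The function name `binary_search` and the returned trace of (l, m, h) index triples are illustrative additions; the array is the one from the example above.

```python
def binary_search(keys, k):
    """Return (index of k or None, list of (l, m, h) triples examined)."""
    trace = []
    l, h = 0, len(keys) - 1
    while l <= h:
        m = (l + h) // 2          # middle of the current candidate range
        trace.append((l, m, h))
        if k == keys[m]:
            return m, trace
        if k < keys[m]:
            h = m - 1             # discard the upper half
        else:
            l = m + 1             # discard the lower half
    return None, trace


keys = [0, 1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19]
index, trace = binary_search(keys, 7)
for l, m, h in trace:
    print(f"l={l} m={m} h={h} key[m]={keys[m]}")
```

For key 7 the loop examines four ranges and stops when l, m, and h coincide, which matches the four rows of the figure.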

Binary Search Tree

- A binary search tree is a binary tree storing keys (or key-element pairs) at its internal nodes and satisfying the following property:
  - Let u, v, and w be three nodes such that u is in the left subtree of v and w is in the right subtree of v. We have key(u) ≤ key(v) ≤ key(w)
- External nodes do not store items
  - They are null or references to a NULL object
- An inorder traversal of a binary search tree visits the keys in increasing order

[Figure: binary search tree with root 6; 6 has children 2 and 9; 2 has children 1 and 4; 9 has left child 8.]

Search

- To search for a key k, we trace a downward path starting at the root
- The next node visited depends on the outcome of the comparison of k with the key of the current node
- If we reach a leaf, the key is not found and we return NO_SUCH_KEY
- Example: findElement(4)

Algorithm findElement(k, v)
  if T.isExternal(v)
    return NO_SUCH_KEY
  if k < key(v)
    return findElement(k, T.leftChild(v))
  else if k = key(v)
    return element(v)
  else { k > key(v) }
    return findElement(k, T.rightChild(v))

[Figure: the search path for key 4 in the tree with root 6: 4 < 6, go left to 2; 4 > 2, go right to 4; 4 = 4, found.]
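The pseudocode translates almost line by line into Python. In this sketch, which is an illustration and not the course's reference code, `None` plays the role of an external node, and the `Node` class, the `NO_SUCH_KEY` sentinel, and the string elements are assumptions made for the example. The tree is the one from the slides.

```python
NO_SUCH_KEY = object()  # sentinel standing in for the slides' special element


class Node:
    def __init__(self, key, element=None, left=None, right=None):
        self.key, self.element = key, element
        self.left, self.right = left, right   # None acts as an external node


def find_element(k, v):
    if v is None:                              # reached an external node
        return NO_SUCH_KEY
    if k < v.key:
        return find_element(k, v.left)
    if k == v.key:
        return v.element
    return find_element(k, v.right)            # k > v.key


root = Node(6, "six",
            Node(2, "two", Node(1, "one"), Node(4, "four")),
            Node(9, "nine", Node(8, "eight")))
print(find_element(4, root))
print(find_element(5, root) is NO_SUCH_KEY)
```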

Analysis of Binary Tree Searching

[Figure: tree T of height h; the search spends O(1) time per level.]

Total time: O(h)

Insertion

- To perform operation insertItem(k, o), we search for key k
- Assume k is not already in the tree, and let w be the leaf reached by the search
- We insert k at node w and expand w into an internal node
- Example: insert 5

[Figure: the search for 5 in the tree with root 6 follows 6, 2, 4 and ends at the external node w; after the insertion, w is an internal node storing 5.]
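As a rough sketch of the insertion step, the following Python code walks down from the root until it falls off the tree and places the new node there. It represents external nodes as `None`, so "expanding w" becomes attaching a new `Node`; the helper names and the assumption that k is absent are choices made for this example.

```python
class Node:
    def __init__(self, key, element=None):
        self.key, self.element = key, element
        self.left = self.right = None


def insert_item(root, k, o=None):
    """Insert (k, o) and return the root; assumes k is not already present."""
    if root is None:
        return Node(k, o)
    v = root
    while True:
        if k < v.key:
            if v.left is None:        # external node reached: expand it
                v.left = Node(k, o)
                return root
            v = v.left
        else:
            if v.right is None:       # external node reached: expand it
                v.right = Node(k, o)
                return root
            v = v.right


def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)


root = None
for key in [6, 2, 9, 1, 4, 8]:       # rebuilds the tree from the slides
    root = insert_item(root, key)
root = insert_item(root, 5)          # the slide's example: insert 5
print(inorder(root))
```

After the last call, 5 hangs below 4, as in the figure.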

Deletion

- To perform operation removeElement(k), we search for key k
- Assume key k is in the tree, and let v be the node storing k
- If node v has a leaf child w, we remove v and w from the tree with operation removeAboveExternal(w)
- Example: remove 4

[Figure: the tree with root 6 before and after removing 4; node v stores 4 and has an external child w, and 5 takes the place of v.]


2, an AVL tree of height h contains the root node, one AVL subtree of height h-1 and another of height h-2.

- That is, n(h) = 1 + n(h-1) + n(h-2)
- Knowing n(h-1) > n(h-2), we get n(h) > 2n(h-2). So n(h) > 2n(h-2), n(h) > 4n(h-4), n(h) > 8n(h-6), … (by induction), n(h) > 2^i n(h-2i)
- Solving the base case, where h-2i = 2, we get: n(h) > 2^{h/2+1}
- Taking logarithms: h < 2 log n(h) - 2
- Thus the height of an AVL tree is O(log n)
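The recurrence and the resulting bound can be checked numerically. The base cases of the proof are cut off in this text, so the values n(1) = 2 and n(2) = 4 used below are an assumption, inferred from the stated bound n(h) > 2^{h/2+1} at h - 2i = 2; the function name `min_nodes` is likewise illustrative.

```python
from math import log2


def min_nodes(h, n1=2, n2=4):
    """n(h) from n(h) = 1 + n(h-1) + n(h-2); n1 and n2 are assumed base cases."""
    if h == 1:
        return n1
    a, b = n1, n2                     # a = n(j-1), b = n(j), starting at j = 2
    for _ in range(3, h + 1):
        a, b = b, 1 + a + b
    return b


for h in range(3, 11):
    n = min_nodes(h)
    # Columns: height, minimum number of nodes, right-hand side of the height bound
    print(h, n, round(2 * log2(n) - 2, 2))
```

With these base cases the printed bound stays above h for every height tried, in line with h < 2 log n(h) - 2.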

Insertion in an AVL Tree

- Insertion is as in a binary search tree
- Always done by expanding an external node.
- Example:

[Figure: the AVL tree with keys 44, 17, 78, 32, 50, 88, 48, 62 before insertion (balanced), and the same tree after inserting 54 at node w (unbalanced), with the nodes c=z, a=y, and b=x marked.]

Trinode Restructuring

- let (a, b, c) be an inorder listing of x, y, z
- perform the rotations needed to make b the topmost node of the three
  - case 1: single rotation (a left rotation about a)
  - case 2: double rotation (a right rotation about c, then a left rotation about a)
  - (other two cases are symmetrical)

[Figure: for each case, the subtree rooted at a=z with subtrees T0, T1, T2, T3 before the restructuring and the subtree rooted at b afterwards; in case 1, b=y and c=x; in case 2, b=x and c=y.]
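One way to sketch trinode restructuring in Python is to identify a, b, c and the four subtrees T0 to T3 for each of the four cases and then relink them so that b ends up on top. The code below is an illustration under assumptions: `None` stands for an external node, heights are recomputed recursively for brevity (a real AVL tree would cache them), and the function picks y and x as the taller child and grandchild of the unbalanced node z. The example is the tree from the insertion slide after 54 has been inserted.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def height(v):
    # Recomputed recursively for brevity; not how a production AVL tree does it.
    return 0 if v is None else 1 + max(height(v.left), height(v.right))


def restructure(z):
    """Rebalance the subtree rooted at the unbalanced node z; return its new root b."""
    y = z.left if height(z.left) > height(z.right) else z.right
    if height(y.left) != height(y.right):
        x = y.left if height(y.left) > height(y.right) else y.right
    else:
        x = y.left if y is z.left else y.right   # tie: pick the single-rotation case
    if y is z.left and x is y.left:              # single rotation
        a, b, c = x, y, z
        t0, t1, t2, t3 = x.left, x.right, y.right, z.right
    elif y is z.left:                            # double rotation
        a, b, c = y, x, z
        t0, t1, t2, t3 = y.left, x.left, x.right, z.right
    elif x is y.right:                           # single rotation
        a, b, c = z, y, x
        t0, t1, t2, t3 = z.left, y.left, x.left, x.right
    else:                                        # double rotation
        a, b, c = z, x, y
        t0, t1, t2, t3 = z.left, x.left, x.right, y.right
    a.left, a.right = t0, t1
    c.left, c.right = t2, t3
    b.left, b.right = a, c
    return b


def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)


def is_balanced(v):
    return v is None or (abs(height(v.left) - height(v.right)) <= 1
                         and is_balanced(v.left) and is_balanced(v.right))


root = Node(44, Node(17, None, Node(32)),
            Node(78, Node(50, Node(48), Node(62, Node(54))), Node(88)))
root.right = restructure(root.right)             # z is the node storing 78
print(inorder(root), is_balanced(root))
```

Relinking through (a, b, c) and T0 to T3 handles single and double rotations with the same final three assignments, which is the appeal of the trinode formulation.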

Insertion Example, continued

[Figure: the tree after inserting 54 (unbalanced), with nodes x, y, z and subtrees T0, T1, T2, T3 marked and each node annotated with its height, followed by the same tree after restructuring (balanced).]

Restructuring (as Single Rotations)

Single rotations:

[Figure: the two single-rotation cases. In the first, a=z, b=y, c=x; in the second, c=z, b=y, a=x. In both, b=y becomes the topmost node, and the subtrees T0, T1, T2, T3 are shown at depths h, h+1, h+2 from the root.]

Restructuring (as Double Rotations)

Double rotations:

[Figure: the two double-rotation cases. In the first, a=z, c=y, b=x; in the second, c=z, a=y, b=x. In both, b=x becomes the topmost node, and the subtrees T0, T1, T2, T3 are shown at depths h, h+1, h+2 from the root.]

Removal in an AVL Tree

- Removal begins as in a binary search tree, which means the node removed will become an empty external node. Its parent, w, may cause an imbalance.
- Example:

[Figure: the AVL tree with keys 44, 17, 32, 62, 50, 78, 48, 54, 88 before deletion of 32 (balanced), and the tree after deletion (unbalanced).]

Rebalancing after a Removal

- Let z be the first unbalanced node encountered while travelling up the tree from w. Also, let y be the child of z with the larger height, and let x be the child of y with the larger height.
- We perform restructure(x) to restore balance at z.
- As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached

[Figure: the example tree before and after restructure(x), with the nodes w, a=z, b=y, and c=x marked.]

Running Times for AVL Trees

- a single restructure is O(1)
  - using a linked-structure binary tree
- find is O(log n)
  - height of tree is O(log n), no restructures needed
- insert is O(log n)
  - initial find is O(log n)
  - restructuring up the tree, maintaining heights, is O(log n)
- remove is O(log n)
  - initial find is O(log n)
  - restructuring up the tree, maintaining heights, is O(log n)

[Figure: AVL tree T of height O(log n); O(1) time per level in the down phase and in the up phase. Worst-case time: O(log n)]

(2,4) Trees: Bounded-Depth Search Trees

[Figure: a (2,4) tree with root (9) and children (2 5 7) and (10 14).]

Multi-Way Search Tree

- A multi-way search tree is an ordered tree such that
  - Each internal node has at least two children and stores d − 1 key-element items (k_i, o_i), where d is the number of children
  - For a node with children v_1 v_2 … v_d storing keys k_1 k_2 … k_{d−1}
    - keys in the subtree of v_1 are less than k_1
    - keys in the subtree of v_i are between k_{i−1} and k_i (i = 2, …, d − 1)
    - keys in the subtree of v_d are greater than k_{d−1}
  - The leaves store no items and serve as placeholders

[Figure: multi-way search tree with root (11 24) and children (2 6 8), (15), and (27 32); the node (27 32) has a child (30) between 27 and 32.]

Multi-Way Inorder Traversal

- We can extend the notion of inorder traversal from binary trees to multi-way search trees
- Namely, we visit item (k_i, o_i) of node v between the recursive traversals of the subtrees of v rooted at children v_i and v_{i+1}
- An inorder traversal of a multi-way search tree visits the keys in increasing order

[Figure: the multi-way search tree with root (11 24), annotated with the order (1 to 19) in which the traversal visits its items and external nodes.]

Multi-Way Searching

- Similar to search in a binary search tree
- At each internal node with children v_1 v_2 … v_d and keys k_1 k_2 … k_{d−1}
  - k = k_i (i = 1, …, d − 1): the search terminates successfully
  - k < k_1: we continue the search in child v_1
  - k_{i−1} < k < k_i (i = 2, …, d − 1): we continue the search in child v_i
  - k > k_{d−1}: we continue the search in child v_d
- Reaching an external node terminates the search unsuccessfully
- Example: search for 30

[Figure: the search path for 30 in the tree with root (11 24): from the root to the child (27 32), then to its child (30).]
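The search rule and the inorder traversal of a multi-way search tree can be sketched together. In this illustration, `MNode`, `multiway_search`, and `inorder` are names chosen here, `None` stands for an external node, keys are assumed distinct, and `bisect_left` from the standard library finds the child to descend into. The tree is the one from the example.

```python
from bisect import bisect_left


class MNode:
    def __init__(self, keys, children=None):
        self.keys = keys
        # A node with d - 1 keys has d children; None marks an external node.
        self.children = children or [None] * (len(keys) + 1)


def multiway_search(node, k):
    """Return (found, list of key lists of the nodes visited)."""
    path = []
    while node is not None:
        path.append(node.keys)
        i = bisect_left(node.keys, k)        # first position with keys[i] >= k
        if i < len(node.keys) and node.keys[i] == k:
            return True, path
        node = node.children[i]              # k lies between keys[i-1] and keys[i]
    return False, path


def inorder(node):
    if node is None:
        return []
    out = []
    for i, key in enumerate(node.keys):
        out += inorder(node.children[i])     # subtree v_i comes before key k_i
        out.append(key)
    return out + inorder(node.children[-1])


root = MNode([11, 24], [
    MNode([2, 6, 8]),
    MNode([15]),
    MNode([27, 32], [None, MNode([30]), None]),
])
print(multiway_search(root, 30))
print(inorder(root))
```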

(2,4) Tree

- A (2,4) tree (also called 2-4 tree or 2-3-4 tree) is a multi-way search tree with the following properties
  - Node-Size Property: every internal node has at most four children
  - Depth Property: all the external nodes have the same depth
- Depending on the number of children, an internal node of a (2,4) tree is called a 2-node, 3-node, or 4-node

[Figure: a (2,4) tree with root (10 15 24) and children (2 8), (12), (18), and (27 32); the node (27 32) has a child (30).]

Height of a (2,4) Tree

- Theorem: A (2,4) tree storing n items has height O(log n)
- Proof:
  - Let h be the height of a (2,4) tree with n items
  - Since there are at least 2^i items at depth i = 0, …, h − 1 and no items at depth h, we have n ≥ 1 + 2 + 4 + … + 2^{h−1} = 2^h − 1
  - Thus, h ≤ log (n + 1)
- Searching in a (2,4) tree with n items takes O(log n) time

depth    items
0        1
1        2
h−1      2^{h−1}
h        0

Insertion

- We insert a new item (k, o) at the parent v of the leaf reached by searching for k
  - We preserve the depth property but
  - We may cause an overflow (i.e., node v may become a 5-node)
- Example: inserting key 30 causes an overflow

[Figure: a (2,4) tree with root (10 15 24) and children (2 8), (12), (18), and v = (27 32 35); after inserting 30, v = (27 30 32 35) overflows.]

Overflow and Split

- We handle an overflow at a 5-node v with a split operation:
  - let v_1 … v_5 be the children of v and k_1 … k_4 be the keys of v
  - node v is replaced by nodes v' and v"
    - v' is a 3-node with keys k_1 k_2 and children v_1 v_2 v_3
    - v" is a 2-node with key k_4 and children v_4 v_5
  - key k_3 is inserted into the parent u of v (a new root may be created)
- The overflow may propagate to the parent node u

[Figure: before the split, u = (15 24) with children (12), (18), and v = (27 30 32 35); after the split, u = (15 24 32) with children (12), (18), v' = (27 30), and v" = (35).]
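Insertion with splitting can be sketched as follows. This is a simplified illustration, not the course's implementation: `Node24` and `insert` are names chosen here, keys are assumed distinct, elements are omitted, and a node with an empty `children` list is one whose children are all external. The example reproduces the split shown in the figure.

```python
from bisect import bisect_left


class Node24:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list: all children are external

    def is_leaf(self):
        return not self.children


def insert(root, k):
    """Insert key k into the (2,4) tree and return the (possibly new) root."""
    path = []                            # internal nodes on the way down
    v = root
    while not v.is_leaf():
        path.append(v)
        v = v.children[bisect_left(v.keys, k)]
    v.keys.insert(bisect_left(v.keys, k), k)
    while len(v.keys) > 3:               # overflow: v has become a 5-node
        k3 = v.keys[2]
        left = Node24(v.keys[:2], v.children[:3])    # v' with keys k1 k2
        right = Node24(v.keys[3:], v.children[3:])   # v" with key k4
        if not path:                     # v was the root: create a new root
            return Node24([k3], [left, right])
        u = path.pop()
        i = u.children.index(v)
        u.keys.insert(i, k3)             # k3 moves up into the parent u
        u.children[i:i + 1] = [left, right]
        v = u                            # the overflow may propagate to u
    return root


u = Node24([15, 24], [Node24([12]), Node24([18]), Node24([27, 32, 35])])
u = insert(u, 30)
print(u.keys, [c.keys for c in u.children])
```

Recording the downward path avoids parent pointers; an implementation built on the positional Tree ADT would instead ask each node for its parent.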

Exercise

Show the sequence of insertions into a (2,4) tree for the input {4, 6, 12, 15, 3, 5, 10}

[Figure: the tree after the first insertion, containing only the key 4, followed by "??" for the remaining steps.]

Cascade Split

As a consequence of a split operation on node v, a new overflow may occur at the parent u of v.

[Figure: inserting 17 into the leaf (13 14 15) of a tree with root (5 10 12) and leaves (3 4), (6 8), (11), (13 14 15). The leaf overflows and is split, moving 15 into the root; the root (5 10 12 15) overflows in turn and is split, producing a new root (12) with children (5 10) and (15).]

Analysis of Insertion

Algorithm insertItem(k, o)
  1. We search for key k to locate the insertion node v
  2. We add the new item (k, o) at node v
  3. while overflow(v)
       if isRoot(v)
         create a new empty root above v
       v ← split(v)

- Let T be a (2,4) tree with n items
  - Tree T has O(log n) height
  - Step 1 takes O(log n) time because we visit O(log n) nodes
  - Step 2 takes O(1) time
  - Step 3 takes O(log n) time because each split takes O(1) time and we perform O(log n) splits
- Thus, an insertion in a (2,4) tree takes O(log n) time

Deletion

- We reduce deletion of an item to the case where the item is at the node with leaf children
- Otherwise, we replace the item with its inorder successor (or, equivalently, with its inorder predecessor) and delete the latter item
- Example: to delete key 24, we replace it with 27 (inorder successor)

[Figure: before, root (10 15 24) with children (2 8), (12), (18), and (27 32 35); after, root (10 15 27) with children (2 8), (12), (18), and (32 35).]

Underflow and Fusion

- Deleting an item from a node v may cause an underflow, where node v becomes a 1-node with one child and no keys
- To handle an underflow at node v with parent u, we consider two cases
- Case 1: the adjacent siblings of v are 2-nodes
  - Fusion operation: we merge v with an adjacent sibling w and move an item from u to the merged node v'
  - After a fusion, the underflow may propagate to the parent u

[Figure: before the fusion, u = (9 14) with children (2 5 7), w = (10), and the empty node v; after the fusion, u = (9) with children (2 5 7) and v' = (10 14).]

Underflow and Transfer

- To handle an underflow at node v with parent u, we consider two cases
- Case 2: an adjacent sibling w of v is a 3-node or a 4-node
  - Transfer operation:
    1. we move a child of w to v
    2. we move an item from u to v
    3. we move an item from w to u
  - After a transfer, no underflow occurs

[Figure: before the transfer, u = (4 9) with children (2), w = (6 8), and the empty node v; after the transfer, u = (4 8) with children (2), w = (6), and v = (9).]

Exercise

Draw the sequence of removals {4, 12, 13} from the (2,4) tree below.

[Figure: a (2,4) tree with root (12), internal children (5 10) and (15), and leaves (4), (6 8), (11), (13 14), and (17), followed by "??" for the steps to be drawn.]

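Propagating Sequence of Fusion

Draw a sequence of fusions in the case of removal of 14

[Figure: a (2,4) tree containing the keys 5, 6, 8, 10, 11, 14, 15, and 17, with 11 at the root, followed by "??" for the steps to be drawn; the label "Remove 4" also appears in the original figure.]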

Analysis of Deletion

- Let T be a (2,4) tree with n items
  - Tree T has O(log n) height
- In a deletion operation
  - We visit O(log n) nodes to locate the node from which to delete the item
  - We handle an underflow with a series of O(log n) fusions, followed by at most one transfer
  - Each fusion and transfer takes O(1) time
- Thus, deleting an item from a (2,4) tree takes O(log n) time

Total Order Relation

- Keys in a priority queue can be arbitrary objects on which an order is defined
- Two distinct items in a priority queue can have the same key
- Mathematical concept of total order relation ≤
  - Reflexive property: x ≤ x
  - Antisymmetric property: x ≤ y ∧ y ≤ x ⇒ x = y
  - Transitive property: x ≤ y ∧ y ≤ z ⇒ x ≤ z

Tree Terminology

- Root: node without parent (A)
- Internal node: node with at least one child (A, B, C, F)
- External node (a.k.a. leaf): node without children (E, I, J, K, G, H, D)
- Ancestors of a node: parent, grandparent, grand-grandparent, etc.
- Depth of a node: number of ancestors
- Height of a tree: maximum depth of any node (3)
- Descendant of a node: child, grandchild, grand-grandchild, etc.
- Subtree: tree consisting of a node and its descendants

[Figure: a tree with nodes A through K, rooted at A, with one subtree highlighted.]

Tree ADT

- We use positions to abstract nodes
- Generic methods:
  - integer size()
  - boolean isEmpty()
  - objectIterator elements()
  - positionIterator positions()
- Accessor methods:
  - position root()
  - position parent(p)
  - positionIterator children(p)
- Query methods:
  - boolean isInternal(p)
  - boolean isExternal(p)
  - boolean isRoot(p)
- Update methods:
  - swapElements(p, q)
  - object replaceElement(p, o)
- Additional update methods may be defined by data structures implementing the Tree ADT

Preorder Traversal

- A traversal visits the nodes of a tree in a systematic manner
- In a preorder traversal, a node is visited before its descendants
- Application: print a structured document

Algorithm preOrder(v)
  visit(v)
  for each child w of v
    preOrder(w)

[Figure: a structured document as a tree, with nodes numbered in preorder: 1 Make Money Fast!, 2 1. Motivations, 3 1.1 Greed, 4 1.2 Avidity, 5 2. Methods, 6 2.1 Stock Fraud, 7 2.2 Ponzi Scheme, 8 2.3 Bank Robbery, 9 References.]
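The preOrder pseudocode can be tried on the document tree from the figure. In this sketch the `TreeNode` class and the `visit` callback are illustrative assumptions; only the traversal order comes from the algorithm above.

```python
class TreeNode:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def pre_order(v, visit):
    visit(v)                          # the node comes before its descendants
    for w in v.children:
        pre_order(w, visit)


doc = TreeNode("Make Money Fast!", [
    TreeNode("1. Motivations", [TreeNode("1.1 Greed"), TreeNode("1.2 Avidity")]),
    TreeNode("2. Methods", [TreeNode("2.1 Stock Fraud"),
                            TreeNode("2.2 Ponzi Scheme"),
                            TreeNode("2.3 Bank Robbery")]),
    TreeNode("References"),
])

order = []
pre_order(doc, lambda v: order.append(v.label))
print("\n".join(order))
```

The printed lines appear in the same order as the section numbering of the document, which is why preorder suits printing a table of contents.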

Postorder Traversal

- In a postorder traversal, a node is visited after its descendants
- Application: compute space used by files in a directory and its subdirectories

Algorithm postOrder(v)
  for each child w of v
    postOrder(w)
  visit(v)

[Figure: a directory tree with nodes numbered in postorder: 1 h1c.doc (3K), 2 h1nc.doc (2K), 3 homeworks/, 4 DDR.java (10K), 5 Stocks.java (25K), 6 Robot.java (20K), 7 programs/, 8 todo.txt (1K), 9 cs16/.]
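The disk-usage application can be sketched with a postorder traversal that returns the total for each subtree. The `Entry` class and `disk_usage` function are illustrative names. One assumption is made explicit in the code: directories themselves are counted as 0K, because any directory sizes in the original figure did not survive extraction.

```python
class Entry:
    def __init__(self, name, size_kb=0, children=()):
        self.name = name
        self.size_kb = size_kb        # assumption: directories occupy 0K themselves
        self.children = list(children)


def disk_usage(v, visited):
    """Return the total size under v; record each node after its descendants."""
    total = v.size_kb
    for w in v.children:
        total += disk_usage(w, visited)
    visited.append((v.name, total))   # the "visit" happens last (postorder)
    return total


cs16 = Entry("cs16/", children=[
    Entry("homeworks/", children=[Entry("h1c.doc", 3), Entry("h1nc.doc", 2)]),
    Entry("programs/", children=[Entry("DDR.java", 10),
                                 Entry("Stocks.java", 25),
                                 Entry("Robot.java", 20)]),
    Entry("todo.txt", 1),
])

visited = []
total = disk_usage(cs16, visited)
for name, size in visited:
    print(f"{name}: {size}K")
```

A directory's total is known only once all of its children have been processed, which is exactly the order postorder provides.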

Properties of Binary Trees

Notation
- n: number of nodes
- e: number of external nodes
- i: number of internal nodes
- h: height

Properties:
- e = i + 1
- n = 2e − 1
- h ≤ i
- h ≤ (n − 1)/2
- e ≤ 2^h
- h ≥ log_2 e
- h ≥ log_2 (n + 1) − 1

Inorder Traversal

- In an inorder traversal a node is visited after its left subtree and before its right subtree
- Application: draw a binary tree
  - x(v) = inorder rank of v
  - y(v) = depth of v

Algorithm inOrder(v)
  if isInternal(v)
    inOrder(leftChild(v))
  visit(v)
  if isInternal(v)
    inOrder(rightChild(v))

[Figure: a binary tree with nodes numbered 1 to 9 by inorder rank; the root has rank 6 and its children have ranks 2 and 8.]
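The drawing application can be sketched by assigning each node the coordinates x(v) = inorder rank and y(v) = depth during an inorder traversal. In this illustration `None` stands for an external node, the function name `layout` is chosen here, and the example tree is one reading of the figure, with each node labeled by its inorder rank (the shape below the root's children is an assumption).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def layout(v, depth=0, coords=None):
    """Map each key to (x, y) with x = inorder rank (from 1) and y = depth."""
    if coords is None:
        coords = {}
    if v is not None:
        layout(v.left, depth + 1, coords)
        coords[v.key] = (len(coords) + 1, depth)   # visit between the subtrees
        layout(v.right, depth + 1, coords)
    return coords


tree = Node(6,
            Node(2, Node(1), Node(4, Node(3), Node(5))),
            Node(8, Node(7), Node(9)))
coords = layout(tree)
for key in sorted(coords):
    print(key, coords[key])
```

Because a node's x-coordinate is its inorder rank, no two nodes share a column and every left subtree is drawn to the left of its parent.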

Operations to Expand and Remove an External Node

- expandExternal(v)
- removeAboveExternal(w)

[Figure: expandExternal(v) turns the external node v into an internal node with two external children (∅); removeAboveExternal(w) removes the external node w together with its parent v.]