Binary Search Trees, etc.

The Dictionary data type is a collection of elements, each of which has a key component. If we use a hash table to represent a dictionary, it takes only Θ(1) time, on average, to carry out basic operations such as search, insertion, and deletion. However, the worst-case time is O(n). A hash table also cannot provide good average time for other operations, such as output in ascending order of keys or finding the kth element in ascending order. Using a balanced search tree, all of the basic operations can be done in O(log n), a sorted output can be done in Θ(n), and the other operations can be done in O(log n). We will begin with the binary search tree, which is not always "balanced."

Binary Search Trees

A binary search tree is a binary tree that may be empty. A nonempty binary search tree has the following properties:
1) Every element has a key component, and the key values are distinct.
2) The keys in the left subtree of the root are smaller than the key in the root.
3) The keys in the right subtree of the root are larger than the key in the root.
4) Both the left and right subtrees are also binary search trees.

An indexed BST is derived from a BST by adding a field, LeftSize, to each tree node, whose value is the number of nodes in its left subtree plus 1.

The BSTree class

Since the number of nodes in a binary search tree, as well as its shape, changes frequently, it is appropriate to use a linked structure to represent it. Moreover, it is convenient to derive it from the class of binary trees. For example, we can call the InOutput operation defined for binary trees to output all the elements in ascending order.

    template<class E, class K>
    class BSTree : public BinaryTree<E> {
    public:
        bool Search(const K& k, E& e) const;
        BSTree<E,K>& Insert(const E& e);
        BSTree<E,K>& Delete(const K& k, E& e);
        // An inorder traversal visits the keys in ascending order.
        // this-> is needed because InOutput lives in a dependent base class.
        void Ascend() { this->InOutput(); }
    };

The search operation

When we search for an element with key x, we begin at the root. If the root is NULL, then x can't be found. Otherwise, we compare x with the key of the root. If they are equal, the search is a success. If x is larger than the key of the root, it must be in the right subtree, if it is in the tree at all, so we continue the search for x in the right subtree. Otherwise, we search the left one.

    template<class E, class K>
    bool BSTree<E,K>::Search(const K& k, E& e) const
    {
        // Walk down from the root; p is the node currently being examined.
        BinaryTreeNode<E> *p = this->root;
        while (p) {
            if (k < p->data) p = p->LeftChild;        // k can only be in the left subtree
            else if (k > p->data) p = p->RightChild;  // k can only be in the right subtree
            else { e = p->data; return true; }        // found: copy the element out
        }
        return false;  // fell off the tree: no element with key k
    }

Obviously, the search can be done in O(h), where h is the height of the tree. With an indexed BST, we can also search for the kth element in O(h). For example, to look for the third element in the following indexed BST, we check the LeftSize field of the root. Since it is 4, the root is the fourth element, so the third element must be in the left subtree. We then check this field for the root of the left subtree. It is 2, so this node is the second element; hence, the element we are looking for is the smallest one (3 − 2 = 1) in its right subtree. The LeftSize of the root of that subtree is 1, so we conclude that this root itself is the element we are looking for.
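
The walk described above can be sketched as a short loop. This is only an illustration under assumptions: the node type IndexedNode, its field names, and the function findKth are hypothetical and are not part of the BSTree class; only the meaning of LeftSize is taken from the text.

```cpp
// Hypothetical node type for illustration; only the meaning of
// leftSize (nodes in the left subtree, plus 1) comes from the text.
struct IndexedNode {
    int key;
    int leftSize;
    IndexedNode* left;
    IndexedNode* right;
};

// Returns the node holding the k-th smallest key (k starts at 1),
// or nullptr when k is out of range.
const IndexedNode* findKth(const IndexedNode* root, int k) {
    const IndexedNode* p = root;
    while (p != nullptr) {
        if (k == p->leftSize) return p;  // p itself is the k-th element
        if (k < p->leftSize) {
            p = p->left;                 // the k-th element is in the left subtree
        } else {
            k -= p->leftSize;            // skip the left subtree and p itself
            p = p->right;
        }
    }
    return nullptr;
}
```

Each iteration moves one level down, so the loop runs at most h + 1 times, which matches the O(h) bound.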

The insertion operation

To insert an element with key k into a BST, we first verify that this key does not occur anywhere in the tree by searching for it. If the search succeeds, the insertion is not done; otherwise, the element is inserted at the point where the search fails. The following are two examples:

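If we insert into an indexed BST, we also have to adjust the LeftSize fields on the path from the insertion point back to the root: LeftSize increases by 1 at every node whose left subtree contains the new element. In both cases, the insertion takes O(h), where h is the height of the tree.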
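
One way to keep LeftSize correct is to make two passes over the same root-to-leaf path: the first only checks for a duplicate, the second updates the counters while descending. The sketch below assumes a hypothetical node type IndexedNode and function insertIndexed; it is not the Insert member of BSTree.

```cpp
// Hypothetical node type for illustration; leftSize is the number of
// nodes in the left subtree, plus 1.
struct IndexedNode {
    int key;
    int leftSize;
    IndexedNode* left;
    IndexedNode* right;
};

// Inserts key and returns true, or returns false and leaves the tree
// unchanged when the key is already present.
bool insertIndexed(IndexedNode*& root, int key) {
    // Pass 1: make sure the key is new before touching any counter.
    for (const IndexedNode* p = root; p != nullptr; ) {
        if (key == p->key) return false;
        p = (key < p->key) ? p->left : p->right;
    }
    // Pass 2: follow the same path; every time we turn left, the new
    // node will end up in that node's left subtree.
    IndexedNode** link = &root;
    while (*link != nullptr) {
        IndexedNode* p = *link;
        if (key < p->key) {
            ++p->leftSize;
            link = &p->left;
        } else {
            link = &p->right;
        }
    }
    *link = new IndexedNode{key, 1, nullptr, nullptr};
    return true;
}
```

Both passes walk a single path, so the cost stays O(h). Nodes reached by a right turn keep their LeftSize, since their left subtrees do not change.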

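The code

    template<class E, class K>
    BSTree<E,K>& BSTree<E,K>::Insert(const E& e)
    {
        BinaryTreeNode<E> *p = this->root,  // search pointer
                          *pp = 0;          // parent of p
        while (p) {
            pp = p;
            if (e < p->data) p = p->LeftChild;
            else if (e > p->data) p = p->RightChild;
            else throw BadInput();  // duplicate key
        }
        // The search failed at an empty position below pp; attach the new node there.
        BinaryTreeNode<E> *r = new BinaryTreeNode<E>(e);
        if (this->root) {
            if (e < pp->data) pp->LeftChild = r;
            else pp->RightChild = r;
        }
        else this->root = r;  // insertion into an empty tree
        return *this;
    }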

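The deletion operation

If the node to be deleted is a leaf, we simply delete it. It is also easy when it has just one child: that child takes the place of the deleted node. When it has two children, it becomes a bit tricky, since we have only one location for two subtrees to be hooked on. In this case, we replace the element by the largest element in its left subtree (or the smallest in its right subtree) and delete the node that held the replacement, which has at most one child.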

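    template<class E, class K>
    BSTree<E,K>& BSTree<E,K>::Delete(const K& k, E& e)
    {
        // Locate the node p with key k; pp is its parent.
        BinaryTreeNode<E> *p = this->root, *pp = 0;
        while (p && p->data != k) {
            pp = p;
            if (k < p->data) p = p->LeftChild;
            else p = p->RightChild;
        }
        if (!p) throw BadInput();  // no element with key k
        e = p->data;               // save the element being deleted

        // Two children: overwrite p with the largest element of its left
        // subtree, then delete the node that held it instead.
        if (p->LeftChild && p->RightChild) {
            BinaryTreeNode<E> *s = p->LeftChild, *ps = p;
            while (s->RightChild) { ps = s; s = s->RightChild; }
            p->data = s->data;
            p = s;
            pp = ps;
        }

        // p now has at most one child; c is that child (or 0).
        BinaryTreeNode<E> *c;
        if (p->LeftChild) c = p->LeftChild;
        else c = p->RightChild;

        // Hook c into the place of p.
        if (p == this->root) this->root = c;
        else {
            if (p == pp->LeftChild) pp->LeftChild = c;
            else pp->RightChild = c;
        }
        delete p;
        return *this;
    }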

BST could be high

The height of a binary search tree can be quite large. For example, if we use the insertion function above to add the elements 1, 2, . . . , n, in this order, to an empty BST, then the resulting BST is actually a linear list. As a result, a basic operation can take as much as O(n). But if insertions and deletions are made at random, then the height of such a BST is O(log n) on average.
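
The effect of the insertion order on the height can be observed with a small experiment. The sketch below is a minimal, self-contained illustration; the names Node, bstInsert, bstHeight, bstDestroy, and heightAfterInserting are hypothetical, and the height is counted in nodes (an empty tree has height 0), which is an assumption about the convention.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal node type, used only for this illustration.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Plain BST insertion with no rebalancing; duplicates are ignored.
void bstInsert(Node*& root, int key) {
    Node** link = &root;
    while (*link != nullptr) {
        if (key == (*link)->key) return;
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    *link = new Node{key, nullptr, nullptr};
}

// Height counted in nodes on the longest root-to-leaf path.
int bstHeight(const Node* t) {
    if (t == nullptr) return 0;
    return 1 + std::max(bstHeight(t->left), bstHeight(t->right));
}

// Frees every node of the tree.
void bstDestroy(Node* t) {
    if (t == nullptr) return;
    bstDestroy(t->left);
    bstDestroy(t->right);
    delete t;
}

// Builds a BST from the keys in the given order and reports its height.
int heightAfterInserting(const std::vector<int>& keys) {
    Node* root = nullptr;
    for (int k : keys) bstInsert(root, k);
    int h = bstHeight(root);
    bstDestroy(root);
    return h;
}
```

Inserting 1, 2, . . . , 7 in ascending order gives a chain of height 7, while the order 4, 2, 6, 1, 3, 5, 7 gives a tree of height 3. The recursive helpers are meant for small inputs; on a very long chain an iterative version would avoid deep recursion.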

A little assignment

Homework 11.1. Generate a random permutation of the integers 1 through n, for each n ∈ {100, 500, 1000, 10000, 20000, 50000}. Insert them into an initially empty binary search tree in that random order. Measure the height of the resulting search tree. Repeat this experiment for several random permutations and then calculate the average of the measured heights. Compare this average with the theoretical figure of 2⌈log₂(n + 1)⌉.

AVL trees

We want to define the BST operations in such a way that the trees stay balanced at all times, while preserving the binary search tree properties. The essential algorithms for insertion and deletion are exactly the same as for a BST, except that, as they might destroy the balance, we have to restore the balance of the tree when needed. Such a tree is called a balanced tree. It is guaranteed that all the basic operations applied to a balanced tree are done in O(log n). AVL trees are one of the more popular classes of balanced trees. An empty binary tree is an AVL tree. A nonempty binary tree with TL and TR as its left and right subtrees is an AVL tree iff both TL and TR are AVL trees and |hL − hR| ≤ 1, where hL and hR are the heights of TL and TR, respectively.

AVL search trees

An AVL search tree is a binary search tree that is also an AVL tree. It has the following properties.
1. The height of an AVL tree with n nodes is O(log n).
2. For every n ≥ 0, there exists an AVL tree with n nodes.
3. We can search an n-element AVL search tree in O(log n).
4. A new element can be added to an n-element AVL search tree in O(log n), so that the resulting (n + 1)-element tree is also an AVL tree.
5. An element can be deleted from an n-element AVL search tree in O(log n), so that the resulting (n − 1)-element tree is also an AVL tree.

Fibonacci tree

We first calculate the least number of nodes in a balanced binary tree with depth h. Let S(h) be this number. Obviously, S(0) = 1 and S(1) = 2. In general, we can construct a balanced tree with depth h and the least number of nodes by attaching to a new root two subtrees of depths h − 1 and h − 2, each of which has the least number of nodes. We call such trees Fibonacci trees for the obvious reason. By an inductive argument, we have that S(h) = S(h − 1) + S(h − 2) + 1 ≥ fib(h), which leads to S(h) > (3/2)^(h−1). Thus, the least number of nodes in such a tree, n0, satisfies n0 > (3/2)^(h−1).
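
The recurrence and the bound can be checked numerically for small depths. The sketch below is only a sanity check, not a proof; the function names minNodes and boundHolds are hypothetical, and the base cases S(0) = 1 and S(1) = 2 are taken from the text.

```cpp
#include <cmath>
#include <cstdint>

// S(h): least number of nodes in a balanced binary tree of depth h,
// computed from S(h) = S(h - 1) + S(h - 2) + 1 with S(0) = 1, S(1) = 2.
// Assumes h >= 0.
std::uint64_t minNodes(int h) {
    std::uint64_t prev = 1;  // S(0)
    std::uint64_t curr = 2;  // S(1)
    if (h == 0) return prev;
    for (int i = 2; i <= h; ++i) {
        std::uint64_t next = curr + prev + 1;
        prev = curr;
        curr = next;
    }
    return curr;
}

// Checks the bound S(h) > (3/2)^(h - 1) for one value of h.
bool boundHolds(int h) {
    return static_cast<double>(minNodes(h)) > std::pow(1.5, h - 1);
}
```

The first values are S(2) = 4, S(3) = 7, S(4) = 12, and S(5) = 20, each well above the corresponding power of 3/2.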

Height of an AVL tree

Therefore, given any AVL tree with n nodes and height h, we have that n ≥ n0 > (3/2)^(h−1). Further calculation leads to the following result: h ≤ 1.44 log(n + 2) − 0.328. A simpler, but unproved, result is that in practice the height is about log(n + 1) + 0.25.

Represent an AVL tree

An AVL tree is usually represented using the linked representation for binary trees. To provide the information needed to restore the balance, a balance factor is added to each node. For any node, this factor is defined as the height of its left subtree minus the height of its right subtree.
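
To make the definition concrete, the sketch below recomputes the balance factor from subtree heights and uses it to test the AVL condition. This is an illustration only: a real AVL implementation stores the factor in each node and updates it incrementally rather than recomputing heights, and the names AvlNode, height, balanceFactor, and isAvl are hypothetical.

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical node type for illustration.
struct AvlNode {
    int key;
    AvlNode* left;
    AvlNode* right;
};

// Height counted in nodes; an empty tree has height 0.
int height(const AvlNode* t) {
    if (t == nullptr) return 0;
    return 1 + std::max(height(t->left), height(t->right));
}

// Height of the left subtree minus height of the right subtree.
// Assumes t is not null.
int balanceFactor(const AvlNode* t) {
    return height(t->left) - height(t->right);
}

// True when every node of the tree has a balance factor of -1, 0, or 1.
bool isAvl(const AvlNode* t) {
    if (t == nullptr) return true;
    return std::abs(balanceFactor(t)) <= 1 && isAvl(t->left) && isAvl(t->right);
}
```

Recomputing heights this way costs more than constant time per node, which is the reason for storing the factor in the node itself.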

It is easy to see that any search takes O(log n).

Restore the Balance

When a tree becomes unbalanced, as a result of either insertion or deletion, we can restore its balance by rotating the tree at and/or below the node where the imbalance is detected.

Case 1: Assume that the height of the left subtree of k2, rooted at k1, exceeds that of its right subtree by 2, caused by adding a node into subtree X or deleting a node from subtree Z. We apply a right rotation: k1 becomes the root of the subtree, k2 becomes the right child of k1, and the former right subtree of k1 becomes the left subtree of k2. This restores the balance of the tree, which is now rooted at k1.

Notice that the mirror-image case, in which the right subtree is too high, is handled by the symmetric left rotation.
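
A single rotation only rearranges three links. The sketch below uses the labels k1 and k2 from Case 1; the node type AvlNode and the function names rotateRight and rotateLeft are hypothetical, and the bookkeeping of balance factors is left out to keep the pointer changes visible.

```cpp
// Hypothetical node type for illustration.
struct AvlNode {
    int key;
    AvlNode* left;
    AvlNode* right;
};

// Right rotation at k2. Assumes k2 has a left child k1.
// Returns the new root of the subtree, which is k1.
AvlNode* rotateRight(AvlNode* k2) {
    AvlNode* k1 = k2->left;
    k2->left = k1->right;  // the right subtree of k1 moves under k2
    k1->right = k2;        // k2 becomes the right child of k1
    return k1;
}

// Mirror image: left rotation at k1. Assumes k1 has a right child k2.
// Returns the new root of the subtree, which is k2.
AvlNode* rotateLeft(AvlNode* k1) {
    AvlNode* k2 = k1->right;
    k1->right = k2->left;  // the left subtree of k2 moves under k1
    k2->left = k1;         // k1 becomes the left child of k2
    return k2;
}
```

The caller stores the returned pointer wherever the old subtree root was referenced, for example `parent->left = rotateRight(parent->left);`. Inorder order is preserved, so the result is still a binary search tree.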

An example

The following shows how to restore the balance of a tree by applying a single rotation.

Notice that here we just added a node with value 6½ into the left subtree rooted at 7, labeled k1, which itself is the left subtree of the tree rooted at 8, labeled k2.

Restore the Balance

Case 2: Suppose that the imbalance is caused by the right subtree rooted at k1 being too high, in such a way that the balance can't be restored by a single rotation. We must then apply two successive single rotations, as follows.

When it is caused by the left subtree being too high, we have the following similar double rotation.
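
A double rotation is two single rotations applied in sequence: first at the child, then at the node where the imbalance was detected. The sketch below is an illustration under assumptions; the node type AvlNode and all function names are hypothetical, the parameter name top simply denotes the unbalanced node, and balance-factor updates are omitted.

```cpp
// Hypothetical node type for illustration.
struct AvlNode {
    int key;
    AvlNode* left;
    AvlNode* right;
};

// Single rotations; each returns the new root of the rotated subtree.
AvlNode* rotateRight(AvlNode* t) {
    AvlNode* l = t->left;
    t->left = l->right;
    l->right = t;
    return l;
}

AvlNode* rotateLeft(AvlNode* t) {
    AvlNode* r = t->right;
    t->right = r->left;
    r->left = t;
    return r;
}

// The right subtree of the left child of top is too high:
// rotate left at the left child, then right at top.
AvlNode* rotateLeftRight(AvlNode* top) {
    top->left = rotateLeft(top->left);
    return rotateRight(top);
}

// The left subtree of the right child of top is too high:
// rotate right at the right child, then left at top.
AvlNode* rotateRightLeft(AvlNode* top) {
    top->right = rotateRight(top->right);
    return rotateLeft(top);
}
```

For instance, for a node 7 whose right child 15 has a left child 14, `rotateRightLeft` makes 14 the root of the subtree, with 7 as its left child and 15 as its right child.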

An example

The following shows how to restore the balance of a tree by applying a double rotation.

Notice that here we just added a node with value 14, labeled k2, into the left subtree rooted at 15, labeled k1, which itself is the right subtree of the tree rooted at 7, labeled k3.

Histogramming

We start with a collection of n keys and must output a list of the distinct keys and their frequencies. Assume that n = 10 and keys = [2, 4, 2, 2, 3, 4, 2, 6, 4, 2].

When the keys are nonnegative integers and their range is quite small, this problem can be solved in Θ(n) by using an array h[0..r], where r is the maximum key value, and letting h[i] store the frequency of i.
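
The array method amounts to a single counting loop. The sketch below is a minimal version; the function name arrayHistogram is hypothetical, and it assumes that every key lies in the range 0..r.

```cpp
#include <vector>

// Returns h[0..r], where h[i] is the number of occurrences of i in keys.
// Assumes 0 <= key <= r for every key.
std::vector<int> arrayHistogram(const std::vector<int>& keys, int r) {
    std::vector<int> h(r + 1, 0);
    for (int k : keys) {
        ++h[k];
    }
    return h;
}
```

Initializing the array costs Θ(r) and the loop costs Θ(n), so the total is Θ(n) as long as r is not larger than a constant multiple of n. For the ten keys in the example and r = 6, the result has h[2] = 5, h[3] = 1, h[4] = 3, h[6] = 1, and zeros elsewhere.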

When the key values are not integers, the above procedure can't be used. Instead, we can sort the n keys in Θ(n log n), then scan the sorted list from left to right to find the frequencies of the respective keys. This solution can be further improved when m, the number of distinct keys, is quite small. We can use a (balanced) BST, each of whose nodes contains two fields, key and frequency. We insert all the n keys into such a tree; when a key is already there, we simply increment its frequency by 1. This procedure has an expected complexity of Θ(n log m). When a balanced tree is used, this complexity is guaranteed, even in the worst case.
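
The tree-based method can be tried out without writing a tree by hand. The sketch below substitutes std::map from the C++ standard library for the BST described above; std::map guarantees logarithmic insertion and lookup and is typically implemented as a balanced search tree, but it is not the BSTree class of these notes. The function name treeHistogram is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Counts how often each key occurs. The map holds one entry per
// distinct key, so each update costs O(log m) for m distinct keys.
std::map<std::string, int> treeHistogram(const std::vector<std::string>& keys) {
    std::map<std::string, int> freq;
    for (const std::string& k : keys) {
        ++freq[k];  // operator[] inserts the key with count 0 if it is absent
    }
    return freq;
}
```

Iterating over the returned map visits the distinct keys in ascending order, which corresponds to an inorder traversal of the search tree.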

m−way search trees

An m−way search tree, when it is not empty, satisfies the following conditions: 1) Each internal node has up to m children and contains between 1 and m − 1 elements. 2) Every node with p elements has exactly p + 1 children. 3) For any node with p elements. Let k1