Binary Split Tree Insertion and Deletion Algorithms


David A. Spuler and Gopal K. Gupta
Dept. of Computer Science
James Cook University of North Queensland

Abstract

Split trees were designed for static data sets with skewed distributions, and therefore algorithms for split tree insertion and deletion have not received attention. However, these algorithms are important in the situation where a table containing a set of keys with known probabilities is being constructed, but must still allow occasional subsequent insertions and deletions. Insertion in a split tree when the access frequencies are not stored in the nodes is analogous to ordinary insertion in a binary search tree. Insertion is slightly more difficult when access frequencies are stored with the keys, since the constraint that the root key must be a maximally weighted key in its subtree must be ensured. Deletion algorithms for split trees are similar to binary search tree deletion except when an internal node is deleted, in which case shuffling upwards of value keys ensures highly weighted keys stay close to the root.

1. Introduction

Split trees were introduced by Sheil [1978] as an efficient method of searching static data sets with skewed probability distributions. Each node contains two keys, one called the value key and one called the split key. The search algorithm for a binary split tree is as follows:

    function search(tree: TreePtr; search_key: KeyType) : TreePtr;
    begin
      if tree = nil then
        search := nil                                { NOT FOUND }
      else if search_key = tree^.value_key then
        search := tree                               { FOUND IT! }
      else if search_key < tree^.split_key then
        search := search(tree^.left, search_key)     { search left subtree }
      else
        search := search(tree^.right, search_key);   { search right subtree }
    end;
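The search above can be tried out with a short Python sketch. It is a transliteration under stated assumptions, not the authors' code: the `Node` class, integer keys, the iterative loop, and the three-key example tree are all chosen here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value_key: int                    # the only key a search can return
    split_key: int                    # routes the search; never returned itself
    left: Optional["Node"] = None     # holds keys <  split_key
    right: Optional["Node"] = None    # holds keys >= split_key


def search(tree: Optional[Node], key: int) -> Optional[Node]:
    """Return the node whose value key equals `key`, or None if absent."""
    while tree is not None:
        if key == tree.value_key:
            return tree
        tree = tree.left if key < tree.split_key else tree.right
    return None


# Hypothetical data: keys 10, 20, 40, with 10 assumed to be the heaviest.
# The root stores 10 as its value key, while split key 30 still divides
# the remaining keys evenly between the two subtrees.
root = Node(10, 30, left=Node(20, 20), right=Node(40, 40))
```

In this example the root's split key, 30, is not a value key of any node, so searching for 30 fails even though 30 appears in the tree as a routing key.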

As can be seen from the search algorithm, only the value keys can be found during search and the split keys serve to indicate which subtree to search. This is in contrast to the binary search tree where the single key in each node performs both functions. Using two keys gives a split tree greater flexibility since it is possible to store the maximally weighted key in the root node and still have a balanced tree structure. For example, the median split tree given by Sheil [1978] always has the structure of a completely balanced tree.

Insertion and deletion algorithms for split trees do not appear to have been examined by any authors. Split trees are most appropriate for static data sets with skewed access probabilities and therefore insertions would seem to be unnecessary. However, insertions and deletions are useful in the situation where an initial data set is known, but later updates are still required. A good strategy in this situation is to initially use the median split tree constructed from the known data set. A median split tree is a complete binary split tree with good average and worst case search performance. Note that the median split tree is only nearly optimal; algorithms that produce the optimal split tree are known, but have at least O(n^4) time complexity; refer to [Hester et al, 1986], [Huang and Wong, 1984a], [Perl, 1984].

2. Insertion Without Stored Weights

A simple insertion algorithm is possible for a split tree where the weights are not stored in the nodes. A new key is simply added as a leaf node in an identical manner to the ordinary insertion algorithm for binary search trees. This new node has both its value key and its split key equal to the new key.


    procedure insert(var tree: TreePtr; new_key: KeyType);
    begin
      if tree = nil then { Reached leaf. Insert it }
      begin
        new(tree);
        tree^.left := nil;
        tree^.right := nil;
        tree^.value_key := new_key;
        tree^.split_key := new_key;
      end
      else if new_key = tree^.value_key then
        complain_about_duplicate
      else if new_key < tree^.split_key then
        insert(tree^.left, new_key)    { Insert in left subtree }
      else
        insert(tree^.right, new_key);  { Insert in right subtree }
    end;
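A rough Python equivalent of this leaf insertion is sketched below. Python has no `var` parameters, so the function returns the (possibly new) subtree root instead; raising `KeyError` for a duplicate and the helper `keys_coincide` are assumptions made for the example, since the paper leaves `complain_about_duplicate` unspecified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value_key: int
    split_key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(tree: Optional[Node], new_key: int) -> Node:
    """Add new_key as a leaf and return the root of the updated subtree."""
    if tree is None:
        # Reached an empty slot: the new leaf uses the key in both roles.
        return Node(new_key, new_key)
    if new_key == tree.value_key:
        raise KeyError(f"duplicate key: {new_key}")   # assumed error policy
    if new_key < tree.split_key:
        tree.left = insert(tree.left, new_key)
    else:
        tree.right = insert(tree.right, new_key)
    return tree


def keys_coincide(tree: Optional[Node]) -> bool:
    """True if every node has value_key == split_key, i.e. a plain BST."""
    if tree is None:
        return True
    return (tree.value_key == tree.split_key
            and keys_coincide(tree.left)
            and keys_coincide(tree.right))


# Building from the empty tree with this routine alone.
root = None
for k in (30, 10, 40, 20):
    root = insert(root, k)
```

Starting from an empty tree, every node created this way has identical value and split keys, so `keys_coincide(root)` holds and the result has the shape of an ordinary binary search tree.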

This algorithm will be adequate if only a few insertions are required and the probability of accessing the new key is not very high relative to the other keys. The insertions using this algorithm must be performed after constructing an optimal or nearly optimal split tree from a known set of probabilities. Using this algorithm to build a split tree starting with the empty tree is not satisfactory, since it will simply build a split tree that is isomorphic to a binary search tree: all nodes will have two identical keys. Furthermore, applying such insertions to a median split tree will cause it to lose its complete tree property, and a large number of insertions may lead to a poorly balanced tree.

3. Insertion With Stored Weights

A more effective insertion algorithm is possible if we are willing to sacrifice an extra O(n) storage to store the weights in the nodes, and if the weight of the new key being inserted is always known. It is possible to maintain the constraint that each key is the maximally weighted key in its subtree, thereby ensuring that heavily weighted keys are always close to the root. The insertion algorithm is modified so that when a key is met on the search path with a lesser weight than the new key, then the new key displaces the key at that node and the displaced key is inserted into its subtree.

It is important to be clear exactly what is meant by the stored weights. One possibility is to assume that weights are probabilities that sum to one, but this constraint would be violated by the first insertion or deletion. Hence we assume that weights are integers that approximate the probabilities of search.

    procedure insert(var tree: TreePtr; new_key: KeyType; new_weight: integer);
    var
      temp_key: KeyType;
      temp: integer;
    begin
      if tree = nil then { Reached leaf. Insert it }
      begin
        new(tree);
        tree^.left := nil;
        tree^.right := nil;
        tree^.value_key := new_key;
        tree^.split_key := new_key;
        tree^.weight := new_weight;
      end
      else if new_key = tree^.value_key then
        complain_about_duplicate
      else
      begin
        if new_weight > tree^.weight then
        begin { Displace lower weighted key }
          temp := tree^.weight;
          temp_key := tree^.value_key;
          tree^.value_key := new_key;
          tree^.weight := new_weight;
          new_key := temp_key;   { Prepare to insert displaced key }
          new_weight := temp;    { .. and its weight }
        end;
        { Now insert new key or displaced key in a subtree }
        if new_key < tree^.split_key then
          insert(tree^.left, new_key, new_weight)    { Insert in left subtree }
        else
          insert(tree^.right, new_key, new_weight);  { Insert in right subtree }
      end;
    end;
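The displacement step can be traced with a small Python sketch. The `Node` layout, the returned-root convention, the `KeyError` policy, and the example keys and weights are assumptions for illustration. The sketch also assumes the new key is not already stored deeper in the tree: as in the listing above, only the keys met before a displacement are checked for duplicates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value_key: int
    split_key: int
    weight: int                       # integer weight of value_key
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(tree: Optional[Node], new_key: int, new_weight: int) -> Node:
    """Insert (new_key, new_weight), displacing lighter keys on the way down."""
    if tree is None:
        return Node(new_key, new_key, new_weight)
    if new_key == tree.value_key:
        raise KeyError(f"duplicate key: {new_key}")   # assumed error policy
    if new_weight > tree.weight:
        # The heavier key takes over this node; the displaced key and its
        # weight continue downwards in its place.
        tree.value_key, new_key = new_key, tree.value_key
        tree.weight, new_weight = new_weight, tree.weight
    if new_key < tree.split_key:
        tree.left = insert(tree.left, new_key, new_weight)
    else:
        tree.right = insert(tree.right, new_key, new_weight)
    return tree


# Hypothetical starting tree: value key 20 (weight 5) at the root with
# split key 30, key 10 (weight 2) on the left, key 40 (weight 3) on the right.
root = Node(20, 30, 5, left=Node(10, 10, 2), right=Node(40, 40, 3))

# Key 35 with weight 9 outweighs the root, so it displaces 20;
# 20 in turn displaces 10 in the left child, and 10 becomes a new leaf.
root = insert(root, 35, 9)
```

After the call, the root holds 35, its left child holds 20, and 10 sits in a new leaf to the right of 20 (because 10 is not less than that node's split key, 10). Split keys are never changed, so existing searches are routed exactly as before.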

This insertion algorithm is unlikely to maintain the complete tree arrangement as provided by the median split tree construction algorithm. Therefore the algorithm is only adequate for a small number of insertions. After a large number of insertions it may be necessary to reconstruct the entire tree.

This algorithm has a slight danger in that it displaces keys rather than rearranging nodes and pointers. If the program maintains pointers to nodes for some other purpose, these pointers may point to nodes containing the wrong keys after the insertion. However, modifying the above algorithm to rearrange nodes rather than shifting keys is not conceptually difficult, and it is left as an exercise to the reader.

Note that it would be possible to use the ordinary split tree insertion algorithm of the previous section even if weights are stored, but this will not create a split tree in the sense of Sheil [1978], but instead a generalized split tree as examined by Huang and Wong [1984b]. The simple insertion algorithm makes no use of the weights and will not always maintain a good tree.

4. Deletion Without Stored Weights

A simple deletion algorithm is possible for a split tree where the weights are not stored in the nodes. Deletion is almost identical to the ordinary deletion algorithm for binary search trees. The simple cases are handled in the same manner as for binary search trees. A leaf node is simply deleted by setting its parent's pointer to nil; a node with only one non-nil subtree has its parent's pointer set to point to that subtree.

The difficult case of an internal node with two non-nil subtrees can be handled exactly as for the binary search tree, that is, by replacing it with the rightmost key in the left subtree (or the leftmost key in the right subtree). In fact, we could move any key up to replace the deleted key, and the choice of the rightmost key in the left subtree is just a convenient method of finding a node which is not an internal node (i.e. which has zero or one non-nil subtree). Any method of finding a node in either subtree which is not an internal node (and therefore requires only simple deletion) could be used.

However, this is not the best solution, since this involves moving up a node that is far away from the root, and therefore assumed to have a low access weight. It would be preferable to simply move the key immediately below the deleted key upwards to replace it, and this is exactly what can be done. The algorithm below simply moves upwards the key in the left child, and repeats the shuffling upwards process until the node is not an internal node, and can be deleted simply.

Numerous variants on this idea are possible. For example, it may be undesirable to always go left, and a more symmetric deletion algorithm could be designed by making a random direction decision. Regardless of which variant is chosen, the resulting tree is still likely to lose the complete tree property of the median split tree.

    procedure delete(var tree: TreePtr; key: KeyType);
    var
      temp, parent: TreePtr;
      done: boolean;
    begin
      if tree = nil then { Key not found }
        complain_not_found
      else if key = tree^.value_key then
      begin { FOUND IT! }
        if tree^.left = nil then { Leaf, or right subtree only }
        begin
          temp := tree;
          tree := tree^.right;
          dispose(temp);
        end
        else if tree^.right = nil then { Left subtree only }
        begin
          temp := tree;
          tree := tree^.left;
          dispose(temp);
        end
        else
        begin { Deleting internal node }
          parent := tree;
          done := false;
          repeat
            temp := parent^.left;
            parent^.value_key := temp^.value_key; { Shuffle key up }
            if temp^.right = nil then
            begin
              parent^.left := temp^.left;
              dispose(temp);
              done := true;
            end
            else if temp^.left = nil then
            begin
              parent^.left := temp^.right;
              dispose(temp);
              done := true;
            end;
            parent := temp;
          until done;
        end;
      end
      else if key < tree^.split_key then
        delete(tree^.left, key)    { Delete from left subtree }
      else
        delete(tree^.right, key);  { Delete from right subtree }
    end;
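The shuffle-up deletion can be exercised with the following Python sketch. It assumes a `Node` class, a returned-root convention in place of `var` parameters, `KeyError` for a missing key, and a small hand-built example tree; Python's garbage collector stands in for `dispose`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value_key: int
    split_key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(tree: Optional[Node], key: int) -> Optional[Node]:
    while tree is not None:
        if key == tree.value_key:
            return tree
        tree = tree.left if key < tree.split_key else tree.right
    return None


def delete(tree: Optional[Node], key: int) -> Optional[Node]:
    """Delete `key` and return the root of the updated subtree."""
    if tree is None:
        raise KeyError(key)                       # assumed error policy
    if key != tree.value_key:
        if key < tree.split_key:
            tree.left = delete(tree.left, key)
        else:
            tree.right = delete(tree.right, key)
        return tree
    if tree.left is None:                         # leaf, or right subtree only
        return tree.right
    if tree.right is None:                        # left subtree only
        return tree.left
    parent = tree                                 # internal node: shuffle keys up
    while True:
        child = parent.left
        parent.value_key = child.value_key        # left child's key moves up
        if child.right is None:
            parent.left = child.left              # unlink the emptied child
            break
        if child.left is None:
            parent.left = child.right
            break
        parent = child                            # child is internal: repeat
    return tree


# Hypothetical tree: 50 at the root, 30 (with children 20 and 40) on the
# left, 70 on the right; every node starts with value_key == split_key.
root = Node(50, 50,
            left=Node(30, 30, left=Node(20, 20), right=Node(40, 40)),
            right=Node(70, 70))
root = delete(root, 50)
```

In this run 30 moves up into the root and 20 moves up into the old 30 node, whose left link becomes empty. The root keeps split key 50 although 50 is no longer stored, and all remaining keys are still found by `search`.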

Note that this deletion algorithm can cause the situation where a split key is not actually stored as a value key anywhere in the tree. However, this situation is not dangerous since search, insertion and deletion algorithms will still perform correctly. The split keys only serve as an index over the other keys and it does not matter whether or not they are members of the key set.

This algorithm also causes problems if the program maintains other pointers to nodes in the tree, since it modifies the keys in the nodes. A variant of this algorithm to move nodes rather than shuffle keys between nodes is not conceptually difficult, although it may involve many details.

5. Deletion With Stored Weights

The above deletion algorithm can be made slightly more effective when the access frequencies are stored with the keys in the nodes. The deletion of an internal node can be modified to always shuffle up the heaviest key, rather than arbitrarily choosing the key in the left child. Since each node's weight describes its value key, the weight is moved up together with the key.

    procedure delete(var tree: TreePtr; key: KeyType);
    var
      temp, parent: TreePtr;
      done: boolean;
    begin
      if tree = nil then { Key not found }
        complain_not_found
      else if key = tree^.value_key then
      begin { FOUND IT! }
        if tree^.left = nil then { Leaf, or right subtree only }
        begin
          temp := tree;
          tree := tree^.right;
          dispose(temp);
        end
        else if tree^.right = nil then { Left subtree only }
        begin
          temp := tree;
          tree := tree^.left;
          dispose(temp);
        end
        else
        begin { Deleting internal node }
          {**** THIS IS THE ONLY CHANGED CODE ***}
          parent := tree;
          done := false;
          repeat
            if parent^.left^.weight > parent^.right^.weight then
              temp := parent^.left
            else
              temp := parent^.right;
            parent^.value_key := temp^.value_key; { Shuffle key up }
            parent^.weight := temp^.weight;       { .. and its weight }
            if temp^.right = nil then
            begin
              if temp = parent^.left then
                parent^.left := temp^.left
              else
                parent^.right := temp^.left;
              dispose(temp);
              done := true;
            end
            else if temp^.left = nil then
            begin
              if temp = parent^.left then
                parent^.left := temp^.right
              else
                parent^.right := temp^.right;
              dispose(temp);
              done := true;
            end;
            parent := temp;
          until done;
        end;
      end
      else if key < tree^.split_key then
        delete(tree^.left, key)    { Delete from left subtree }
      else
        delete(tree^.right, key);  { Delete from right subtree }
    end;
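A Python sketch of the weighted variant follows. The `Node` layout, the returned-root convention, `KeyError` for a missing key, and the example keys and weights are assumptions for illustration. Note that the sketch copies the weight up together with the key, on the assumption that a node's weight field always describes the value key currently stored in that node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value_key: int
    split_key: int
    weight: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def delete(tree: Optional[Node], key: int) -> Optional[Node]:
    """Delete `key`, shuffling up the heavier child key at each level."""
    if tree is None:
        raise KeyError(key)                       # assumed error policy
    if key != tree.value_key:
        if key < tree.split_key:
            tree.left = delete(tree.left, key)
        else:
            tree.right = delete(tree.right, key)
        return tree
    if tree.left is None:
        return tree.right
    if tree.right is None:
        return tree.left
    parent = tree
    while True:
        go_left = parent.left.weight > parent.right.weight
        child = parent.left if go_left else parent.right
        parent.value_key = child.value_key        # heavier key moves up ...
        parent.weight = child.weight              # ... together with its weight
        if child.right is None:
            replacement = child.left
        elif child.left is None:
            replacement = child.right
        else:
            parent = child                        # child is internal: repeat
            continue
        if go_left:
            parent.left = replacement
        else:
            parent.right = replacement
        return tree


# Hypothetical tree of seven keys with weights chosen for the example.
root = Node(50, 50, 10,
            left=Node(30, 30, 4, left=Node(20, 20, 1), right=Node(40, 40, 2)),
            right=Node(70, 70, 7, left=Node(60, 60, 3), right=Node(80, 80, 5)))
root = delete(root, 50)
```

Here the right child (weight 7) outweighs the left child (weight 4), so 70 moves into the root; one level down, 80 (weight 5) outweighs 60 (weight 3) and moves up, and the leaf that held 80 is unlinked.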

6. Modifying a Weight

Another interesting problem in the maintenance of a split tree with stored weights is how to rearrange the keys if one weight changes. Consider what happens if the weight of a key is reduced below that of one or both of its children. It appears to be a simple matter of repeatedly swapping the key with one of its immediate children, thereby shuffling the lower weighted key further down the tree. However, whereas this worked correctly for the insertion algorithm, it fails for this problem. The difference is that it requires a key already in the tree to be moved further down the tree, whereas for insertion it was always a "new" key that was moved downwards.

Consider the tree shown in Figure 1 when the weight of the value key, A, in the root node is reduced. Assuming that A's weight is now less than that of B or C, we should move the heaviest key upwards (i.e. either B or C). However, if C is heavier than B, this creates a problem since A's split key (not shown, but obviously either B or C) indicates that A cannot be placed into the right subtree. Hence, A can only be swapped with B, which has a lesser weight than C, and the tree has become a generalized split tree in the sense examined by Huang and Wong [1984b].

There is no obvious solution to this problem, and it appears that the only possibility is to restrict the swapping to the child in only one of the subtrees (depending on the respective ordering of the root node's value and split keys). Rotations which modify both the value and the split keys are a solution for the special case when the two children have no subtrees, but a more general rotation scheme appears impossible.

FIGURE 1. Split tree with only value keys shown

        A
       / \
      B   C
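The difficulty with Figure 1 can be made concrete with a small Python sketch. The numeric keys (A = 10, B = 20, C = 40), the root split key 30, the weights, and the helpers `swap_value` and `build` are all hypothetical choices for this illustration; the paper does not give concrete values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value_key: int
    split_key: int
    weight: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(tree: Optional[Node], key: int) -> Optional[Node]:
    while tree is not None:
        if key == tree.value_key:
            return tree
        tree = tree.left if key < tree.split_key else tree.right
    return None


def swap_value(a: Node, b: Node) -> None:
    """Exchange value keys and weights of two nodes; split keys stay put."""
    a.value_key, b.value_key = b.value_key, a.value_key
    a.weight, b.weight = b.weight, a.weight


def build() -> Node:
    # A = 10 with its weight just reduced to 1; B = 20 (weight 2);
    # C = 40 (weight 5). Root split key 30 places A's key range on the left.
    return Node(10, 30, 1, left=Node(20, 20, 2), right=Node(40, 40, 5))


# Attempt 1: promote the heaviest child, C. A lands in the right subtree,
# but a search for 10 is routed left by the root's split key.
bad = build()
swap_value(bad, bad.right)
lost = search(bad, 10)            # None: A can no longer be found

# Attempt 2: promote B instead. Every key stays reachable, but the root
# is now lighter than its right child.
ok = build()
swap_value(ok, ok.left)
```

The first attempt loses key A entirely, while the second keeps the tree searchable at the cost of the rule that each node holds the maximally weighted key of its subtree.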


The problem of rearranging keys when a weight is increased appears even more difficult. If a node's weight becomes larger than that of its parent then the two value keys should be swapped. However, the swap is only possible if the parent node's value key has the correct order relative to the parent node's split key. If the ordering is wrong then there is nothing that can be done. Again, rotations are possible in special cases but no general scheme presents itself.

7. Conclusions

Two different insertion algorithms have been presented for binary split trees. The first is almost identical to the ordinary insertion algorithm for binary search trees and can be used regardless of whether weights are stored. If weights are stored in the nodes and the weight of the new key being inserted is also known, a better insertion algorithm is possible: one which maintains the rule that the maximally weighted key should be stored in the root of its subtree, and therefore a good tree is maintained.

Deletion algorithms are also presented for both situations. The algorithms are identical to those for binary search tree deletion except for deleting internal nodes, in which case keys are shuffled upwards to replace the deleted key. The only advantage of storing the weights in the nodes is that the deletion algorithm can make an informed choice as to whether to shuffle the key from the left or right subtree, rather than making an arbitrary choice.

The problem of modifying a stored weight has also been examined. It appears that the most consistent method of handling a modification of a key's weight is to perform a deletion followed by an immediate re-insertion of the key. A simpler shuffling algorithm is possible for weight reduction, but only at the expense of the usual split tree rule that the root node contains the maximally weighted key of its subtree, and the use of key shuffling for a weight increase is very limited, since the key with the increased weight can only be swapped with its parent approximately half the time.

8. References

Hester, J.H., Hirschberg, D.S., Huang, S-H.S., and Wong, C.K., "Faster Construction of Optimal Binary Split Trees", Journal of Algorithms, Vol 7, No 3, p412-424, 1986.

Huang, S-H.S. and Wong, C.K., "Optimal Binary Split Trees", Journal of Algorithms, Vol 5, No 1, p65-79, 1984a.

Huang, S-H.S. and Wong, C.K., "Generalized Binary Split Trees", Acta Informatica, Vol 21, No 1, p113-123, 1984b.

Perl, Y., "Optimum Split Trees", Journal of Algorithms, Vol 5, No 3, p367-374, Sep 1984.

Sheil, B.A., "Median Split Trees: A Fast Lookup Technique for Frequently Occurring Keys", Communications of the ACM, Vol 21, No 11, p947-958, Nov 1978.
