2009 International Conference on Information Management and Engineering

A Survey on Maintaining Binary Search Tree in Optimal Shape


Inayat-ur-Rehman
Department of Computer Science COMSATS Institute of Information Technology Islamabad, Pakistan inayat@comsats.edu.pk

Saif-ur-Rehman khan
Department of Computer Science COMSATS Institute of Information Technology Islamabad, Pakistan saif_rehman@comsats.edu.pk

M. Sikandar Hayat Khayal


Department of Computer Science Fatima Jinnah Women University Rawalpindi, Pakistan m.sikandarhayat@yahoo.com

Abstract: The Binary Search Tree (BST) is one of the most widely used techniques for searching in non-linear data structures. If a BST is not maintained in optimal shape, searching and insertion may need extra comparisons. Several BST algorithms have been proposed in the literature to maintain a BST in optimal shape. Previous research has focused mainly on the total running time of these algorithms, but none has addressed which BST algorithm to use under which scenario. In this paper, we present a thorough comparison of existing techniques that enables software developers to select a particular BST technique according to the data management scenario they face.

Keywords: Non-linear data structure; BST; BST balancing techniques.

978-0-7695-3595-1/09 $25.00 © 2009 IEEE, DOI 10.1109/ICIME.2009.128

Authorized licensed use limited to: COMSATS INSTITUTE OF INFORMATION TECHNOLOGY. Downloaded on July 23, 2009 at 03:22 from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Data structures are an important subject in the field of computing. The subject focuses on organizing data so that it can be accessed and manipulated efficiently. Data structures generally fall into two main categories: 1) linear data structures (e.g., arrays, lists, stacks, queues), and 2) non-linear data structures (e.g., trees, graphs, heaps). Trees are built as a parent-child hierarchy. A single node at the top (the root node) has one or more child nodes; at the next level, these child nodes act as parents of their own children, and so on. A BST is a special type of tree in which data is stored in sorted order, which supports efficient and effective searching. It arranges data so that a unique path exists between any two nodes of the tree. Finally, it helps the developer perform optimal searching operations.

A BST is generally constructed by comparing each new value with the root node: if the value is greater, it is placed in the root's right subtree; if it is smaller, it is placed in the left subtree, and the comparison is repeated at each level until an empty position is found. The height of a BST plays a vital role in searching for a particular element: as the height increases, the number of comparisons needed by a search also increases [11]. A randomly built BST requires approximately 39% more comparisons than a balanced BST [2]. Height also plays an important role in data insertion and deletion. In the worst case, the height of an unbalanced BST with $n$ nodes can reach $n-1$, whereas a fully balanced BST has height $\lfloor \log_2 n \rfloor$ [20]. Consequently, an operation on an unbalanced BST takes O(n) time in the worst case, compared with O(log n) for a balanced BST [1]. Several BST algorithms have been proposed in the literature to maintain the BST in optimal shape [3, 5-11, 13-19], and each technique has its own pros and cons. In this paper, we perform a comparative study of existing BST techniques, which serves as a guideline for software developers in selecting a suitable technique for the data management problem at hand.

This paper is organized as follows: Section II provides a thorough background of existing BST techniques and highlights the issues related to them. Section III gives a brief analysis of existing BST techniques. Section IV concludes the paper, and Section V discusses future work.

II. EXISTING BST TECHNIQUES

Several BST techniques have been proposed by researchers. In the following subsections, we discuss these algorithms along with their strengths and weaknesses.

A. AVL Trees

In 1962, two Russian mathematicians, G. M. Adelson-Velskii and E. M. Landis, proposed a nearly balanced BST technique [3]. The basic motive of the AVL tree is to keep the tree in optimal shape, since a BST loses its shape through random insertions and deletions. It maintains balance so that the heights of every node's left and right subtrees differ by no more than 1 [21]. Insertion into an AVL tree works as in an ordinary BST, but after a new node is inserted its balance is checked, and if the rule is violated a single or double rotation is applied to rebalance the tree. If a node's left subtree is too tall and is itself left-heavy, a single right rotation is applied; symmetrically, if the right subtree is too tall and right-heavy, a single left rotation is applied. A double rotation is applied when the left subtree is right-heavy or the right subtree is left-heavy. The main cost of AVL trees is that every insertion and deletion must examine the tree shape and rebalance it if necessary. AVL trees also need extra memory to store a balance factor for every node. Insertion and deletion in an AVL tree of $n$ nodes take O(log n) time [1]. The height of an AVL tree does not exceed roughly $1.44 \log_2 n$, so a search path is nearly $\log_2 n$ long, and the average number of comparisons for a search is about $\log_2 n + 0.25$ when $n$ is large [4].

B. Red-Black Trees

Red-Black trees originate from the symmetric binary B-trees invented by R. Bayer [16]. They are a special case of B-trees with the following characteristics [20]: each node is colored either red or black; the root node is always black; no red node has a red child; and every path from the root to a leaf contains the same number of black nodes. A new node is colored red. If its parent is black, the insertion is complete; otherwise, if the parent's sibling (the new node's uncle) is also red, the parent and the uncle are recolored black and the grandparent is recolored red (if the uncle is black, a rotation is needed instead).
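The coloring rules listed above can be checked mechanically. The sketch below is a minimal Python illustration, assuming a hypothetical `Node` class with `key`, `color`, `left` and `right` fields (our names, not taken from [16] or [20]); a `None` child is treated as a black leaf, as is conventional. It verifies only the coloring rules, not the BST key order.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node:
    # Hypothetical node layout, used only for this illustration.
    key: int
    color: str = RED  # new nodes are red, as in the insertion rule above
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def black_height(node: Optional[Node]) -> int:
    """Return the number of black nodes on each path from `node` down to a
    leaf (a None child counts as one black leaf). Raise ValueError if the
    subtree breaks the red-red rule or the equal-black-count rule."""
    if node is None:
        return 1
    if node.color not in (RED, BLACK):
        raise ValueError(f"node {node.key} is neither red nor black")
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError(f"red node {node.key} has a red child")
    left_bh = black_height(node.left)
    right_bh = black_height(node.right)
    if left_bh != right_bh:
        raise ValueError(f"unequal black counts below node {node.key}")
    return left_bh + (1 if node.color == BLACK else 0)


def is_red_black(root: Optional[Node]) -> bool:
    """Check the coloring rules, including a black root (not the key order)."""
    if root is not None and root.color != BLACK:
        return False
    try:
        black_height(root)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    valid = Node(2, BLACK, Node(1, RED), Node(3, RED))
    red_red = Node(3, BLACK, Node(2, RED, Node(1, RED)), Node(4, BLACK))
    print(is_red_black(valid), is_red_black(red_red))  # True False
```

Returning the black height (rather than just a Boolean) lets each recursive call compare its two subtrees in a single pass over the tree.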

The main advantage of Red-Black trees is that deleting a node requires fewer operations than in an AVL tree. However, a Red-Black tree can be taller than an AVL tree. Consider a red-black tree with $n$ nodes: its height is at least $\lceil \log_2(n+1) \rceil$ but not greater than $2\log_2(n+1)$. Similarly, a red-black tree of height $h$ has at least $2^{h/2} - 1$ and at most $2^h - 1$ nodes. By comparison, an optimally balanced BST with $n$ nodes has height $\lceil \log_2(n+1) \rceil$, and an optimally balanced BST of height $h$ has between $2^{h-1}$ and $2^h - 1$ nodes [20].

C. Rebalancing with Sorting Algorithm

This algorithm was developed by W. A. Martin and D. N. Ness [7]. Unlike AVL and Red-Black trees, it takes a static approach rather than restructuring the tree incrementally. The traditional BST insertion algorithm can produce many different shapes for the same $n$ keys, depending on which of the $n!$ possible orders they are inserted in. In this approach, the traditional BST insertion algorithm is used to construct the BST, and a sorting step is then applied to balance the constructed BST [7]. Whereas AVL trees and similar approaches readjust the tree after each operation, this algorithm uses a stack to traverse the tree and then divides the total number of nodes ($n$) by 2 to find the median. The time taken is proportional to the number of nodes in the tree: to put the tree in optimal shape, each node must be visited at least once, so the time complexity of the algorithm is O(n). The approach is also less memory-efficient, because it needs extra memory in the form of a stack. Another problem with the technique is deciding when the tree has become unbalanced enough to need rebalancing [7].

D. Threaded BST

In an ordinary BST, all null pointers are wasted because they do not point to any node of the tree. Day [17] relied on a technique called the threaded BST, which makes effective use of these null pointers. The main idea is that if a node's left pointer would be null, it instead points to the node's in-order predecessor; similarly, if its right pointer would be null, it points to the node's in-order successor [17]. Building a threaded BST is simple, but its main drawback is that a single node can be reached by as many as three different paths, whereas in a simple BST there is only one. Each pointer is paired with a flag that stores a Boolean value: true when the pointer is a thread and false otherwise.
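The threading rule described for the threaded BST can be sketched in Python. This is a minimal illustration, assuming a double-threaded tree in which each node carries two Boolean flags (`lthread`, `rthread`); the class and function names are hypothetical and are not taken from [17]. With threads in place, an in-order traversal needs neither recursion nor a stack.

```python
from typing import Iterator, Optional


class TNode:
    """Node of a double-threaded BST (hypothetical layout for illustration)."""

    def __init__(self, key):
        self.key = key
        self.left: Optional["TNode"] = None   # left child, or predecessor thread
        self.right: Optional["TNode"] = None  # right child, or successor thread
        self.lthread = True  # True: `left` is a thread (or None), not a child
        self.rthread = True  # True: `right` is a thread (or None), not a child


def insert(root: Optional[TNode], key) -> TNode:
    """Insert `key` and return the root; equal keys go to the right."""
    new = TNode(key)
    if root is None:
        return new  # sole node: no predecessor or successor, both pointers None
    cur = root
    while True:
        if key < cur.key:
            if cur.lthread:  # no real left child: attach here
                new.left, new.right = cur.left, cur   # inherit predecessor; cur is successor
                cur.left, cur.lthread = new, False
                return root
            cur = cur.left
        else:
            if cur.rthread:  # no real right child: attach here
                new.left, new.right = cur, cur.right  # cur is predecessor; inherit successor
                cur.right, cur.rthread = new, False
                return root
            cur = cur.right


def leftmost(node: Optional[TNode]) -> Optional[TNode]:
    # Follow real left links only; checking the flag keeps us from
    # following a thread back up the tree and looping forever.
    while node is not None and not node.lthread:
        node = node.left
    return node


def inorder(root: Optional[TNode]) -> Iterator:
    """Yield keys in sorted order with no recursion and no explicit stack."""
    node = leftmost(root)
    while node is not None:
        yield node.key
        if node.rthread:
            node = node.right            # thread: jump straight to the successor
        else:
            node = leftmost(node.right)  # successor = leftmost node of right subtree
```

For example, inserting 50, 30, 70, 20, 40, 60, 80 and iterating `inorder(root)` yields the keys in ascending order, and node 40, which has no right child, keeps a thread to its successor 50. The rightmost node's right pointer stays `None`, which is what ends the traversal loop.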

The flag distinguishes a thread from a real link, which also prevents infinite loops during traversal. The right pointer of the rightmost node and the left pointer of the leftmost node remain null. In an ordinary BST, a node gives no direct access to the next or previous value, whereas in a threaded BST a threaded right pointer refers to the next element and a threaded left pointer to the previous element. The main advantage of a threaded BST is that it can be traversed iteratively, without recursion, which is faster than a recursive traversal. The overall running time of this technique is O(n) [17].

E. Globally Balancing Algorithm

Hsi Chang and Sitharama Iyengar proposed a global balancing technique that balances the tree by readjusting its pointers [8]. The algorithm runs in linear time: it recursively traverses the tree in order to obtain its data in sorted order and then rebalances it. It also uses a partitioning approach, but reduces the number of partitions by applying a folding method [8]. In the first phase, the tree is traversed to determine the order of the nodes. This ordering is stored in an array of links such that the value at index $i$ is greater than the value at index $i-1$ and less than the value at index $i+1$. In the second phase, the new tree is constructed by taking as the new root the left median of the list, i.e., the element at index $\lfloor (n-1)/2 \rfloor$. The values less than the left median become part of the left subtree, and the remaining values become the right subtree. This process is repeated on each part until all left and right subtrees have been built. The algorithm requires additional memory to store the pointers in an array, and the array requires consecutive memory locations. It therefore fails when the system has plenty of total memory but not enough contiguous memory. Since balancing the tree requires an in-order traversal of every node to sort the data, it needs O(n) time [8].

F. Splay Tree

The splay tree was invented by Daniel Dominic Sleator and Robert Endre Tarjan [5]. It works on the principle of temporal locality, according to which 90% of the accesses are to 10% of the data items [12]. In a splay tree, an additional operation called splaying is performed to move each retrieved element to the top of the tree, so that it can be accessed quickly the next time [5]. Splaying is carried out through operations called zig, zig-zig and zig-zag. Splay trees are memory-efficient because they do not store additional information such as balance factors. They perform well in applications that repeatedly need recently accessed data [5]. Because a splay tree can become highly unbalanced, a single access to a node can be quite expensive: an individual operation may take O(n) time [4]. Over a whole sequence of operations, however, a splay tree is as efficient as a balanced BST, because its amortized cost per operation is O(log n). Another drawback is that a splay tree restructures itself on every access, including plain searches, so it performs more adjustments than other balanced trees.

G. DSW Algorithm

This algorithm was proposed by Stout and Warren [10] as a modification of Day's algorithm [17], which was written in FORTRAN. Because FORTRAN did not support recursion, Day's algorithm relied on a threaded BST. The DSW algorithm works in two phases. In the first phase, the tree is converted into a backbone (also called a vine): right rotations are applied until no node has a left child, at which point the nodes form a sorted chain along their right pointers. In the second phase, left rotations are applied repeatedly from the top of the backbone until the tree becomes a complete or almost complete binary tree. In a complete binary tree, every level except possibly the last is completely filled, and the nodes of the last level are placed from left to right. Stout and Warren apply the algorithm to a simple BST instead of a threaded BST, which makes it more memory-efficient. It needs only a small amount of additional memory, far less than other techniques require: it does not need a stack or an array to convert the tree into its intermediate form, and its running time is linear in the number of nodes. Since every node is visited to convert the tree into a vine and then to rebalance it, the running time of the algorithm is O(n) [9].
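The two DSW phases can be sketched as follows. This is a minimal Python illustration of the widely published tree-to-vine / vine-to-tree formulation, assuming a temporary pseudo-root node placed above the real root; the function names (`tree_to_vine`, `compress`, `vine_to_tree`, `dsw_balance`) are hypothetical and not taken verbatim from [9] or [10]. The helpers `bst_insert`, `height` and `inorder_keys` exist only for the demonstration.

```python
from typing import List, Optional


class Node:
    """Plain BST node: DSW needs no balance factors, colors, or threads."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def bst_insert(root: Optional[Node], key) -> Node:
    """Ordinary (unbalanced) BST insertion, used here only to build test trees."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def tree_to_vine(pseudo_root: Node) -> int:
    """Phase 1: right-rotate until no node has a left child. The tree below
    pseudo_root becomes a sorted vine linked through right pointers.
    Returns the number of nodes."""
    tail, rest, count = pseudo_root, pseudo_root.right, 0
    while rest is not None:
        if rest.left is None:
            tail, rest = rest, rest.right  # already in vine form: move down
            count += 1
        else:
            pivot = rest.left               # right rotation at `rest`
            rest.left = pivot.right
            pivot.right = rest
            rest = pivot
            tail.right = pivot
    return count


def compress(pseudo_root: Node, count: int) -> None:
    """Perform `count` left rotations at every second node down the spine."""
    scanner = pseudo_root
    for _ in range(count):
        child = scanner.right
        scanner.right = child.right
        scanner = scanner.right
        child.right = scanner.left
        scanner.left = child


def vine_to_tree(pseudo_root: Node, size: int) -> None:
    """Phase 2: fold the vine into a complete (or almost complete) tree."""
    # Nodes beyond the largest perfect tree (2^k - 1 nodes) go to the bottom level.
    leaves = size + 1 - 2 ** ((size + 1).bit_length() - 1)
    compress(pseudo_root, leaves)
    size -= leaves
    while size > 1:
        compress(pseudo_root, size // 2)
        size //= 2


def dsw_balance(root: Optional[Node]) -> Optional[Node]:
    pseudo = Node(None, right=root)  # temporary pseudo-root: O(1) extra memory
    vine_to_tree(pseudo, tree_to_vine(pseudo))
    return pseudo.right


def height(node: Optional[Node]) -> int:
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def inorder_keys(node: Optional[Node]) -> List:
    out, stack = [], []
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out
```

As a usage example, inserting the keys 1 to 100 in ascending order with `bst_insert` produces a degenerate tree of height 100; after `dsw_balance` the height is 7, the minimum for 100 nodes, and the in-order sequence is unchanged. The only extra storage is the pseudo-root, in line with the memory advantage noted for DSW.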

H. AA Trees

The AA tree was proposed by Arne Andersson as a variant of the Red-Black tree [19], and it is considered a simpler form of the Red-Black tree. Its additional property is that a left child may not be red. This simplifies the computation by eliminating half of the restructuring cases; by contrast, Red-Black trees require many different rotations, and their delete operation is trickier. An AA tree implementation stores the level of each node instead of its color. The skew and split operations simplify the maintenance of an AA tree: a skew removes a left horizontal link, and a split removes consecutive horizontal links. An AA tree can be rebalanced with three skew and two split operations. AA trees are therefore suitable when deletions are frequent, because the rule that a left child may not be red simplifies deletion [20].

I. Treap

The treap was designed by Seidel and Aragon [18]; it behaves both as a BST and as a heap. A treap is a BST in which each node stores a key value and a randomly chosen priority. The priorities may be organized as a max heap (each parent node has a greater priority than each of its child nodes) or as a min heap (each child node has a greater priority than its parent node). To insert data, a node is created with its value and a priority. The node is first inserted as in an ordinary BST; its priority is then compared with its parent's, and while the heap order is violated a rotation moves the new node one level up. Insertion maintains the following rules [18]: 1) if p is in the left subtree of k, then the value in node p is less than the value in node k; 2) if p is in the right subtree of k, then the value in node p is greater than the value in node k; and 3) if p is a child of k, then the priority assigned to node p is greater than or equal to the priority of k.

J. Maintaining a Random BST Dynamically

This algorithm was proposed by Vinod et al. in 2006 [13]. A BST becomes unbalanced after a series of insertion and/or deletion operations, and most rebalancing algorithms take the whole tree as input and rebalance it. Vinod et al. exploit the fact that most of the nodes of a random BST have only one child. Their algorithm does not allow any grandparent in the tree to have an empty subtree, and the height of the tree never grows beyond n/2 [14]. To insert a new node, the algorithm first examines the grandparent of the insertion point: if the grandparent has an empty subtree, the subtree is rotated in the appropriate direction; otherwise, the node is inserted with the traditional BST algorithm. Two additional pointers are maintained, one to the grandparent and one to the grandparent's parent: the grandparent pointer is used to check whether either of the grandparent's subtrees is empty, and the pointer to the grandparent's parent is used to perform the rotation. For deletion, the insert procedure may be called to readjust the grandparent. The algorithm therefore rebalances only the portion of the tree that needs adjustment. It needs no extra memory and fewer steps to balance the tree. Compared with traditional BST algorithms, the modified algorithm produces a tree of lower height and needs fewer comparisons [13].

III. ANALYSIS OF EXISTING BST TECHNIQUES

TABLE I: CHARACTERISTICS OF EXISTING BST TECHNIQUES

| Technique | Characteristics |
|---|---|
| AVL [3] | The heights of each node's left and right subtrees differ by no more than 1. |
| Red Black [16] | Requires fewer operations than an AVL tree when deleting a node; no need for a recursive implementation. |
| Balancing with Sorting Algorithm [7] | First proposed static technique; creates a perfectly balanced tree by repeatedly subdividing the nodes by 2. |
| Threaded BST [17] | No need for recursion. |
| Globally Balancing Algorithm [8] | Global technique that balances the tree by readjusting pointers; runs in linear time, recursively traversing the tree in order to obtain the sorted data and then rebalancing it. |
| Splay Tree [5] | Works on the principle of temporal locality. |
| DSW Algorithm [10] | No need for extra memory to convert the tree into its intermediate form. |
| AA Tree [19] | A left child may not be red, which simplifies the computation by eliminating half of the restructuring cases. |
| Treap [18] | Maintains a BST with node priorities. |
| Random BST [13] | Rebalances only the portion that loses balance. |

Table I presents the main features of the BST techniques discussed above. The most widely used techniques are AVL and Red-Black trees. Splay trees outperform the other techniques when recently accessed data is needed again. A treap is the most efficient technique when the developer has to take node priorities into account. A threaded BST can be traversed without recursion, which ultimately saves the extra memory that recursion requires.

IV. CONCLUSION

BSTs are widely used for data management and support efficient data insertion and searching. Maintaining a BST in optimal shape remains a major concern for development organizations. The overall complexity of BST operations depends on the height of the tree, and this height depends heavily on the order of the input. Previous surveys on this topic have mainly focused on the running time of the algorithms. We have primarily analyzed the areas where these techniques can be applied effectively and then compared them. Based on this study, we propose guidelines to help developers choose the most suitable algorithm for a given scenario.

V. FUTURE WORK

In the future, we plan to study tries and other dynamic data structures. Another research direction is to investigate whether tree balancing is worthwhile for large amounts of random data.

REFERENCES
[1] Michael T. Goodrich and Roberto Tamassia, Algorithm Design: Foundations, Analysis, and Internet Examples, John Wiley & Sons, Inc., ISBN: 0-471-38365-1.
[2] D. E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching, Pearson Education Asia, 1999.
[3] G. M. Adelson-Velskii and E. M. Landis, "An Algorithm for the Organization of Information," 1962, Accession number: AD0406009.
[4] Robert L. Kruse and Alexander J. Ryba, Data Structures and Program Design in C++, Prentice Hall, 1998, ISBN: 0-13768995-0.
[5] Daniel Dominic Sleator and Robert Endre Tarjan, "Self-Adjusting Binary Search Trees," Journal of the Association for Computing Machinery, Vol. 32, No. 3, July 1985.
[6] Ben Pfaff, An Introduction to Binary Search Trees and Balanced Trees, Libavl Binary Search Tree Library, Volume 1: Source Code, Version 2.0.2.
[7] W. A. Martin and D. N. Ness, "Optimizing Binary Search Trees Grown with a Sorting Algorithm," Communications of the ACM, Vol. 15, No. 2, pp. 88-93, 1972.
[8] Hsi Chang and Sitharama Iyengar, "Efficient Algorithms to Globally Balance a Binary Search Tree," Communications of the ACM, Vol. 27, No. 7, pp. 695-702, 1984.
[9] Timothy J. Rolfe, "One-Time Binary Search Tree Balancing: The Day/Stout/Warren (DSW) Algorithm," SIGCSE Bulletin, Vol. 34, No. 4, 2002.
[10] Quentin F. Stout and Bette L. Warren, "Tree Rebalancing in Optimal Time and Space," Communications of the ACM, Vol. 29, No. 9, 1986.
[11] Jeffrey L. Eppinger, "An Empirical Study of Insertion and Deletion in Binary Search Trees," Communications of the ACM, Vol. 26, No. 9, 1983.
[12] William Stallings, Computer Organization and Architecture, 7th Edition, Pearson Education, ISBN: 817758-993-8.
[13] Vinod, P., Suri, P., and Maple, C., "Maintaining a Binary Search Tree Dynamically," Proceedings of the 10th International Conference on Information Visualization, pp. 483-488, 2006.
[14] Suri Pushpa, Prasad Vinod, and Jilani Abdul Khader, "Insertion and Deletion on Binary Search Tree Using Modified Insert Delete Pair: An Empirical Study," International Journal of Computer Science and Network Security (IJCSNS), Vol. 7, No. 12, 2007.
[15] Travis Gagie, "New Ways to Construct Binary Search Trees," ISAAC 2003, LNCS 2906, Springer-Verlag, pp. 537-543, 2003.
[16] R. Bayer, "Symmetric Binary B-trees: Data Structure and Maintenance Algorithms," Acta Informatica, Vol. 1, pp. 290-306, 1972.
[17] A. C. Day, "Balancing a Binary Tree," Computer Journal, Vol. XIX, pp. 360-361, 1976.
[18] Raimund Seidel and Cecilia R. Aragon, "Randomized Search Trees," Algorithmica, Vol. 16, No. 4/5, pp. 464-497, 1996.
[19] A. Andersson, "Balanced Search Trees Made Simple," WADS, 1993.
[20] Mark Allen Weiss, Data Structures and Problem Solving Using Java, 3rd Edition, Addison-Wesley, ISBN: 9780321322135.
[21] Michael T. Goodrich, Roberto Tamassia, and David M. Mount, Data Structures and Algorithms in C++, John Wiley & Sons, Inc., ISBN: 0-471-20208-8.
[22] Dominique A. Heger, "A Disquisition on the Performance Behavior of Binary Search Tree Data Structures," European Journal for the Informatics Professional, Vol. V, No. 5, October 2004.
