TREES
BINARY SEARCH TREES
DEFINITIONS
A tree is a data structure consisting of data nodes connected to each other with pointers, in much the same spirit as a linked list. Each node in a tree may be connected to two or more other nodes, rather than the single node allowed in a linked list. The maximum number of nodes to which any single node may be connected is called the order of the tree. The simplest tree is of order 2, and is called a binary tree.
The topmost node in the tree is called the root node. A node with no children is called a leaf.
Each node is shown with its three main components: the data area, and the pointers to the left and right children. These pointers are used in the same way as pointers in the linked list. When a new node is to be added to the tree, memory is allocated for the new node using the new operator in Java, and the address of this location is loaded into the appropriate pointer (left or right) in the new nodes parent. Just as a null pointer indicates the end of a linked list, a null pointer in a tree indicates the end of a branch in that direction. A leaf is a node, both of whose pointers are null.
TREE OPERATIONS
A binary tree as defined above can contain any type of data in any order, depending on the use to which the tree is to be put. The most common use of trees however, is as yet another method for sorting and searching data. Trees, which are used for this purpose, are called search trees, and binary trees used for sorting and searching data are called binary search trees.
Trees (binary and otherwise) have much the same basic types of operations as other data structures:
One way that a binary search tree can be used to sort data is as follows. Let us suppose that we have a list of integers, such as 5, 2, 8, 4, and 1, that we wish to sort into ascending order. We can perform a two-stage sorting algorithm. The first stage involves inserting the integers into a binary search tree:
Having inserted the numbers into the binary search tree, it may not be immediately obvious how that has helped us. In order to produce a sorted list of the data, we need to traverse the tree, that is list the nodes in some specific order. Various types of tree traversal exist, but the most common is inorder traversal, which means that we follow the algorithm:
Steps 2 and 4 in this algorithm use recursion: they call the same algorithm to process the left and right children of the current pointer.
JAVA CODE FOR A SIMPLE BINARY SEARCH TREE
BUILDING THE TREE
The development of Java code for a binary search tree is very similar to that for the linked list. Since each node in the tree must store not only some data, but also pointers to the node’s left and right children.
TRAVERSING THE TREE
Having inserted some data into the binary search tree, the next question is obviously, ‘How do we extract the information from the tree?" This depends to a large extent on the use to which the information is to be put.
DELETION FROM A BINARY SEARCH TREE
Algorithm for deleting a node from a binary search tree in such a way that the remaining nodes still have the same inorder traversal.
There are three categories of nodes we may wish to delete:
EFFICIENCY OF BINARY SEARCH TREES
The binary search tree routines illustrate a way of using trees to sort and search data. The treesort and treesearch algorithms work well if the initial data are in a jumbled order, since the binary search tree will be fairly well balances, which means that there are roughly equal numbers of nodes in the left and right subtree of any node. Balanced trees tend to be ‘bushy’: they have few levels and spread out width-wise. This makes for efficient sorting and searching routines, because both these routines work their way vertically throughout the tree to locate nodes. Trees that are wide and shallow have only a few levels, so the insertion and searching routines have relatively few steps. In fact, in these cases, the sorting routine in 0(n log n)., so it is of comparable efficiency to mergesort and quicksort.
AVL
CONSTRUCTION OF AN AVL TREE
An algorithm for constructing balanced binary search trees in which the trees remain as balanced as possible after every insertion was devised in 1962 by two Russian mathematicians, B.M. Adelson-Vel’sky and E. M. Landis (hence the name AVL tree). An AVL tree is a binary search tree in which the left and right subtrees of any node may differ height by at most 1, and in which both the subtrees are themselves. AVL trees (the definition is recursive).
An AVL tree must have only the differences –1 0 or 1 between the two subtrees at any node.
An AVL tree is constructed in the same manner as an ordinary binary search tree, except that after the addition of each new node, a check must be made to ensure that the AVL balance conditions have not been violated. If all is well, no further action need be taken. If the new node causes an imbalance in the tree, however, some rearrangement of the tree’s nodes must be done. The insertion of a new node and test for an imbalance are done using the following algorithm:
We perform an operation known as a rotation when the tree goes out of balance. There are two types of rotation used in AVL trees: single and double rotations. The rules for deciding which type of rotation to use are quite simple:
In summary, the steps involved in inserting a new node in an AVL tree are:
JAVA CODE FOR AVL TREES
As the AVL tree is essentially an ordinary binary search tree with a balancing operation added to the insertion routine, it may seem logical to design an AVL tree template by inheriting the classes we designed earlier for the binary search tree and its node, and overloading the Insert ( ) methods.
THE EFFICIENCY OF AVL TREES
The improved efficiency of AVL trees comes at a rather severe cost in terms of the amount of effort required to program them.
HEAPS, HEAPSORT, AND PRIORITY QUEUES
THE HEAP DATA STRUCTURE
The final application of a purely binary tree that we shall examine is a data structure called a heap Heaps are unusual in the menagerie of tree structures in that they represent trees as arrays rather than linked structures using pointers. To see how a binary tree can be represented as an array, consider.
Any binary tree can be represented in an array in this fashion if we leave blank those array elements where no corresponding tree nodes exist.
The definition of the heap data type is as follow. A heap is a binary tree satisfying the restrictions:
A heap offers us a way to process its contents in a sorted order by extracting one element at a time and not bothering to fully sort the remainder of the data until it is needed. The most common application of a heap is a priority queue.
A queue is a data structure in which items are inserted at the tail of the queue and extracted from the head, in the FIFO order. A priority queue is similar in spirit to an ordinary queue, except that each item in the queue has an associated priority. Items with higher priority will get processed before items with lower priority, even if they arrive in the queue after them.
CONSTRUCTING A HEAP
A priority queue may begin with a list of unsorted data. To organize this initial data so that it has the heap properties. Once we have made our initial heap, we may deal with the priority queue in a continuous fashion by extracting the item at the top and processing it or inserting another item into the queue while maintaining the heap property.
HEAPSORT AND PRIORITY QUEUES
Having a created the heap, we may now use it in applications. As mentioned the main use of a heap is as a priority queue, in which largest element is extracted from the root node with the remaining element being rearranged to form a smaller heap. A variation on the queue provides us with yet another sorting method, known as a heapsort. Heapsort makes use of the fact that a heap is stored in and know array.
SUMMARY
The tree data structure in this chapter. A tree has a single root node. Each node may have one or more branches to other nodes in the tree. A node at the end of a branch is called a child of the node at the beginning of the branch. An node with no children is called a leaf. The maximum number of branches emanating from any one node is called the order of the tree. The most common form of tree is the binary tree.
We considered algorithms for inserting nodes into a binary tree in such a way that when the nodes of the tree are traversed in inorder fashion (the left subtree is traversed, followed by the root node, and then the right subtree, and so on recursively), the nodes will be listed in sorted order. These routines were then coded in Java. A routine for deleting items from a binary tree was considered.
Since the efficiency of a binary tree improves if the tree is balanced (the left and right subtrees of any node are roughly the same height). We studied the Avl algorithm for constructing a binary tree by balancing the tree, if necessary, after each node is added. The AVL algorithm requires that a few nodes in the tree are rearranged using either a single or double rotation, if the tree loses its balance after an insertion. Java code was given for inserting nodes into a tree using the AVL algorithm.
We then considered the heap. A heap is a binary tree, which is stored as an array, rather than as a linked structure. The nodes in a heap are arranged so that the largest element is always at the root. The main use of a heap is as a priority queue, that is , a queue in which items are processed in order of priority rather than strictly in the order in which they were added to the queue. Algorithms for constructing a heap from randomly ordered data and for extracting the root element and rearranging the remaining nodes to retain the heap property are given and coded in Java.
The B-tree, which is a multiway search three. Each node in a B-tree of order N may have up to N branches and store up to N – 1 data items. Although the depth of a B-tree is much less than that of a binary tree storing the same data, the fact that several data items are stored at each node means that extra work is required to store and find a data item at each node. The main use of B-trees is in large database systems where the entire data set is too large to store in RAM. Blocks of data are read from the hard disk and then searched in RAM. Since a disk access is much slower than processing in RAM a B-tree is used to minimize the number of disk accesses. The algorithm for constructing a b-tree is given but no Java code is given since B-trees are not often used in small or medium sized program.