TREES

BINARY SEARCH TREES

DEFINITIONS

A tree is a data structure consisting of data nodes connected to each other with pointers, in much the same spirit as a linked list. Each node in a tree may be connected to two or more other nodes, rather than the single node allowed in a linked list. The maximum number of nodes to which any single node may be connected is called the order of the tree. The simplest tree is of order 2, and is called a binary tree.

The topmost node in the tree is called the root node. A node with no children is called a leaf.

Each node is shown with its three main components: the data area, and the pointers to the left and right children. These pointers are used in the same way as pointers in the linked list. When a new node is to be added to the tree, memory is allocated for the new node using the new operator in Java, and the address of this location is loaded into the appropriate pointer (left or right) in the new nodes parent. Just as a null pointer indicates the end of a linked list, a null pointer in a tree indicates the end of a branch in that direction. A leaf is a node, both of whose pointers are null.

TREE OPERATIONS

A binary tree as defined above can contain any type of data in any order, depending on the use to which the tree is to be put. The most common use of trees however, is as yet another method for sorting and searching data. Trees, which are used for this purpose, are called search trees, and binary trees used for sorting and searching data are called binary search trees.

Trees (binary and otherwise) have much the same basic types of operations as other data structures:

Inserting a new node
Deleting a node
Listing or visiting the nodes of the tree.

One way that a binary search tree can be used to sort data is as follows. Let us suppose that we have a list of integers, such as 5, 2, 8, 4, and 1, that we wish to sort into ascending order. We can perform a two-stage sorting algorithm. The first stage involves inserting the integers into a binary search tree:

If the current pointer is 0, create a new node, store the data and return the address of the new node.
Otherwise, compare the integer to the data stored at the current node. If the new integer is less than the integer at the current node, insert the new integer into the left child of the current node (by recursively applying the same algorithm). Otherwise insert it into the right child of the current node.

Having inserted the numbers into the binary search tree, it may not be immediately obvious how that has helped us. In order to produce a sorted list of the data, we need to traverse the tree, that is list the nodes in some specific order. Various types of tree traversal exist, but the most common is inorder traversal, which means that we follow the algorithm:

If the current pointer is not 0, then:
Traverse the left child of the current pointer.
Visit (or print) the data at the current pointer.
Traverse the right child of the current pointer.

Steps 2 and 4 in this algorithm use recursion: they call the same algorithm to process the left and right children of the current pointer.

JAVA CODE FOR A SIMPLE BINARY SEARCH TREE

BUILDING THE TREE

The development of Java code for a binary search tree is very similar to that for the linked list. Since each node in the tree must store not only some data, but also pointers to the node’s left and right children.

TRAVERSING THE TREE

Having inserted some data into the binary search tree, the next question is obviously, ‘How do we extract the information from the tree?" This depends to a large extent on the use to which the information is to be put.

DELETION FROM A BINARY SEARCH TREE

Algorithm for deleting a node from a binary search tree in such a way that the remaining nodes still have the same inorder traversal.

There are three categories of nodes we may wish to delete:

Leaves – these are the easiest, since all we need to do is delete the leaf and set the pointer from its parent (if any) to 0.
Nodes with a single child- these too are fairly easy, since we just redirect the pointer from the node’s parent so that it points to the child.
Nodes with both children – these can be fairly tricky, since we must rearrange the tree to some extend. The method we shall use is to replace the node being deleted by the rightmost node in its left subtree. Because of the rules for inorder traversals, this method quarantines the same traversal.

EFFICIENCY OF BINARY SEARCH TREES

The binary search tree routines illustrate a way of using trees to sort and search data. The treesort and treesearch algorithms work well if the initial data are in a jumbled order, since the binary search tree will be fairly well balances, which means that there are roughly equal numbers of nodes in the left and right subtree of any node. Balanced trees tend to be ‘bushy’: they have few levels and spread out width-wise. This makes for efficient sorting and searching routines, because both these routines work their way vertically throughout the tree to locate nodes. Trees that are wide and shallow have only a few levels, so the insertion and searching routines have relatively few steps. In fact, in these cases, the sorting routine in 0(n log n)., so it is of comparable efficiency to mergesort and quicksort.

AVL

CONSTRUCTION OF AN AVL TREE

An algorithm for constructing balanced binary search trees in which the trees remain as balanced as possible after every insertion was devised in 1962 by two Russian mathematicians, B.M. Adelson-Vel’sky and E. M. Landis (hence the name AVL tree). An AVL tree is a binary search tree in which the left and right subtrees of any node may differ height by at most 1, and in which both the subtrees are themselves. AVL trees (the definition is recursive).

An AVL tree must have only the differences –1 0 or 1 between the two subtrees at any node.

An AVL tree is constructed in the same manner as an ordinary binary search tree, except that after the addition of each new node, a check must be made to ensure that the AVL balance conditions have not been violated. If all is well, no further action need be taken. If the new node causes an imbalance in the tree, however, some rearrangement of the tree’s nodes must be done. The insertion of a new node and test for an imbalance are done using the following algorithm:

Insert the new node using the same algorithm as for an ordinary binary search tree.
Beginning with the new node, calculate the difference in heights of the left and right subtrees of each node on the path leading from the new node back up the tree towards the root.
Continue these checks until either the root node is encountered and all nodes along the path have differences of no greater than 1, or until the first difference greater than 1 is found.
If an imbalance is found, perform a rotation of the nodes to correct the imbalance. Only one such correction is ever needed for any one node.

We perform an operation known as a rotation when the tree goes out of balance. There are two types of rotation used in AVL trees: single and double rotations. The rules for deciding which type of rotation to use are quite simple:

When you have found the first node that is out of balance (according to the algorithm above), restrict your attention to that node and two nods in the two layers immediately below it (on the path you followed up from the new node).
If these three nodes lie in a straight line, a single rotation is needed to restore the balance.
If these three nodes lie in a ‘dog-leg’ pattern (that is a bend in the path) you need a double rotation to restore the balance.

In summary, the steps involved in inserting a new node in an AVL tree are:

Insert the node in the same way as in an ordinary binary search tree.
Beginning with the new node, trace a path back towards the root, checking the difference in height of the two subtrees at each node along the way.
If you find a node with an imbalance (a height difference other than 0, +1, or –1). Stop your trace at this point.
Consider the node with the imbalance and the two nodes on the layers immediately below this point on the path back to the new node.
If these three nodes lie in a straight line, apply a single rotation to correct the imbalance.
If these three nodes lie in a dogleg pattern, apply a double rotation to correct the imbalance.

JAVA CODE FOR AVL TREES

As the AVL tree is essentially an ordinary binary search tree with a balancing operation added to the insertion routine, it may seem logical to design an AVL tree template by inheriting the classes we designed earlier for the binary search tree and its node, and overloading the Insert ( ) methods.

THE EFFICIENCY OF AVL TREES

The improved efficiency of AVL trees comes at a rather severe cost in terms of the amount of effort required to program them.

HEAPS, HEAPSORT, AND PRIORITY QUEUES

THE HEAP DATA STRUCTURE

The final application of a purely binary tree that we shall examine is a data structure called a heap Heaps are unusual in the menagerie of tree structures in that they represent trees as arrays rather than linked structures using pointers. To see how a binary tree can be represented as an array, consider.

Any binary tree can be represented in an array in this fashion if we leave blank those array elements where no corresponding tree nodes exist.

The definition of the heap data type is as follow. A heap is a binary tree satisfying the restrictions:

All leaves are on two adjacent levels.
All leaves on the lowest level occur at the left of the tree.
All levels above the lowest are completely filled.
Both children of any node are again heaps.
The value stored at any node is a least as large as the values in its two children.

A heap offers us a way to process its contents in a sorted order by extracting one element at a time and not bothering to fully sort the remainder of the data until it is needed. The most common application of a heap is a priority queue.

A queue is a data structure in which items are inserted at the tail of the queue and extracted from the head, in the FIFO order. A priority queue is similar in spirit to an ordinary queue, except that each item in the queue has an associated priority. Items with higher priority will get processed before items with lower priority, even if they arrive in the queue after them.

CONSTRUCTING A HEAP

A priority queue may begin with a list of unsorted data. To organize this initial data so that it has the heap properties. Once we have made our initial heap, we may deal with the priority queue in a continuous fashion by extracting the item at the top and processing it or inserting another item into the queue while maintaining the heap property.

HEAPSORT AND PRIORITY QUEUES

Having a created the heap, we may now use it in applications. As mentioned the main use of a heap is as a priority queue, in which largest element is extracted from the root node with the remaining element being rearranged to form a smaller heap. A variation on the queue provides us with yet another sorting method, known as a heapsort. Heapsort makes use of the fact that a heap is stored in and know array.

SUMMARY

The tree data structure in this chapter. A tree has a single root node. Each node may have one or more branches to other nodes in the tree. A node at the end of a branch is called a child of the node at the beginning of the branch. An node with no children is called a leaf. The maximum number of branches emanating from any one node is called the order of the tree. The most common form of tree is the binary tree.

We considered algorithms for inserting nodes into a binary tree in such a way that when the nodes of the tree are traversed in inorder fashion (the left subtree is traversed, followed by the root node, and then the right subtree, and so on recursively), the nodes will be listed in sorted order. These routines were then coded in Java. A routine for deleting items from a binary tree was considered.

Since the efficiency of a binary tree improves if the tree is balanced (the left and right subtrees of any node are roughly the same height). We studied the Avl algorithm for constructing a binary tree by balancing the tree, if necessary, after each node is added. The AVL algorithm requires that a few nodes in the tree are rearranged using either a single or double rotation, if the tree loses its balance after an insertion. Java code was given for inserting nodes into a tree using the AVL algorithm.

We then considered the heap. A heap is a binary tree, which is stored as an array, rather than as a linked structure. The nodes in a heap are arranged so that the largest element is always at the root. The main use of a heap is as a priority queue, that is , a queue in which items are processed in order of priority rather than strictly in the order in which they were added to the queue. Algorithms for constructing a heap from randomly ordered data and for extracting the root element and rearranging the remaining nodes to retain the heap property are given and coded in Java.

The B-tree, which is a multiway search three. Each node in a B-tree of order N may have up to N branches and store up to N – 1 data items. Although the depth of a B-tree is much less than that of a binary tree storing the same data, the fact that several data items are stored at each node means that extra work is required to store and find a data item at each node. The main use of B-trees is in large database systems where the entire data set is too large to store in RAM. Blocks of data are read from the hard disk and then searched in RAM. Since a disk access is much slower than processing in RAM a B-tree is used to minimize the number of disk accesses. The algorithm for constructing a b-tree is given but no Java code is given since B-trees are not often used in small or medium sized program.