DATA STRUCTURES

INTRODUCTION

Fixed-size structures such as single-subscripted arrays, double-sub-scripted arrays and structs. Dynamic data structures that grow and shrink during execution. Linked lists are collections of data items "lined up in a row: insertions and removals are made anywhere in a linked list. Stacks are important in compilers and operating system—insertions and removals are mode only at one end of at stack—its top. Queues represent waiting lines; insertions are made at the back (also referred to as the tail) of a queue and removals are made from the front (also referred to as the head) of a queue. Binary trees facilitate high-speed searching and sorting of data, efficient elimination of duplicate data items, representing file system directories and compiling expressions into machine language.

SELF-REFERENTIAL CLASSES

A self-referential class contains a pointer member that points to a class object of the same class type.

Self-referential class objects can be linked together to form useful data structures such as list, queues, stacks, and trees.

DYNAMIC MEMORY ALLOCATION creating and maintaining dynamic data structures requires dynamic memory allocation—the ability for a program to obtain more memory space at execution time to hold new nodes and to release space no longer needed. The limit for dynamic memory allocation can be as large as the amount of available physical memory in the computer or the amount of available virtual memory in a virtual memory system. Often, the limits are much smaller because available memory must be shared among many users.

LINKED LISTS

A linked list is a linear collection of self-referential class objects, called nodes, connected by pointer links—hence, the term ‘linked’ list. A linked list is accessed via a pointer to the first node of the list. Subsequent nodes are accessed via the link-pointer member stored in each node. By convention, the link pointer in the last node of a list is set to null (zero) to mark the end of the list. Data are stored in a linked list dynamically—each node is created as necessary. A node can contain data of any type including objects of other classes. If nodes contain base-class pointers or base class references to base-class and derived-class objects related by inheritance, we can have a linked list of such nodes and use virtual function calls to process these objects polymorphically. Stack and queues are also linear data structures, and as we will see are constrained versions of linked lists. Trees are nonlinear data structures.

Lists of data can be stored in arrays, but linked lists provide several advantages. A linked list is appropriate when the number of data elements to be represented in the data structure at one is unpredictable. Linked lists are dynamic, so the length of a list can be increase or decrease as necessary. The size of a ‘conventional array, however, cannot be altered, because the array size is fixed at compile time. "conventional" arrays can become full. Linked lists become full only when the system has insufficient memory to satisfy dynamic storage allocation requests.

An array can be declared to contain more elements than the number of items expected, but this can waste memory . Linked lists can provide better memory utilization in these situations. Linked lists allow the program to adapt at run time.

Insertion and deletion in a sorted array can be time consuming—all the elements following the inserted or deleted element must be shifted appropriately.

The elements of an array are stored contiguously in memory. This allows immediate access to any array element because the address of any element can be calculated directly based on its position relative to the beginning of the array. Linked lists do no afford such immediate "direct access" to their elements.

Linked list nodes are normally not stored contiguously in memory. Logically, however, the nodes of a linked list appear to be contiguous.

Using dynamic memory allocation (instead of arrays) for data structures that grow and shrink at execution time can save memory. Pointers occupy space and that dynamic memory allocation incurs the overhead of function calls.

Assign null (zero) to the link member of a new node Pointers should be initialized before they are used.

The kind of linked list we have been discussing is a singly-linked-list—the list begins with a pointer to the first node and each node contains a pointer to the next node "in sequence". This list terminates with a node whose pointer member is 0. A singly-linked-list may be traversed in only one direction.

A circular, singly-linked list begins with a pointer to the first node and each node contains a pointer to the next node. The "last node" does not contains a 0 pointer; rather , the pointer in the last node points back to the first node, thus closing the "circle".

A doubly linked list allows traversals both forwards and backwards. Such a list is often implemented with two "start pointers"—one that points to the first element of the list to allow front-to-back traversal of the list, and one that points to the last element of the list to allow back-to-front traversal of the list. Each node has both a forward pointer to the next node in the list in the forward direction and a backward pointer to the next node in the list in the backward direction.

In a circular, doubly-linked list, the forward pointer of the last node pointers to the first node and the backward pointer of the first node points to the last node, thus closing the "circle".

STACKS

A stack is a constrained version of a linked list—new nodes can be added to a stack and removed from a stack only at the top. A stack is referred to as a LIFO data structure. The link member in the list node of the stack is set to null (zero) to indicate the bottom of the stack.

Not setting the link in the bottom node of a stack to null (zero) is a programming error.

The primary member functions used to manipulate a stack are push and pop. Function push adds a new node to the top of the stack. Function pop removes a node from the top of the stack, stores the popped value in a reference variable that is passed to the calling function, and returns true if the pop operation was successful.

Stack have many interesting applications. When a function call is made, the called function must know how to return to its caller, so the return address is pushed onto a stack. If a series of function calls occurs, the successive return values are pushed onto the stack in LIFO order so that each function can return to its caller. Stacks support recursive function calls in the same manner as conventional nonrecursive calls.

When the function returns to it caller, the space for that function’s automatic variables is popped off the stack, and those variables are no longer known to the program.

Stacks are used by compilers in the process of evaluating expressions and generating machine language code.

QUEUES

A queue is similar to a supermarket checkout line—the first person in line is serviced first and other customers enter the line at the end and wait to be serviced. Queue nodes are removed only from the head of the queue and are inserted only at the tail of the queue. A queue is referred to as FIFO data structure.

Queues have many applications in computer systems. Most computers have only a single processor, so only one user at a time can be serviced. Entries for the other users are places in a queue. Each entry gradually advances to the front of the queue as users receive service. The entry at the front of the queue is the next to receive service.

Queues are also used to support print spooling. A multi-user environment may have only a single printer. Many users may be generating outputs to be printed. If the printer is busy, other outputs may still be generated. These are "spooled " to disk where they wait in a queue until the printer become available.

TREES

Linked lists, stacks, and queues are linear data structures. A tree in a nonlinear, two-dimensional data structure with special properties. Tree nodes contain two or more links. Binary trees—trees whose nodes all contain two links (none, one or both of, which may be null). The root node is the first node in a tree. Each link in the root node refers to a child. The left child is the root node of the left subtree, and the right child is the root node of the right subtree. The children of a node are called siblings. A node with no children is called a leaf node. Normally draw trees from the root node down—exactly the opposite of trees in nature.

A binary search tree With no duplicate node values) has the characteristic that the value in any left subtree are less than the value in its parent node, and the values in any right subtree are greater than the value in its parent node. A binary search tree and traverses three ways—using recursive inorder, preorder, and postorder traversals.

A node can only be inserted as a leaf node in a binary search tree. If the tree is empty an new node is created, initialized, and inserted in the tree.

In order Traversal of a binary search tree prints the node values in ascending order. The process of creating a binary search tree actually sorts the data—and thus this process is called the binary tree sort. The steps are

Process the value in the node.
Traverse the left subtree with a preordertraversal.
Traverse the right subtree with a preordertraversal.

The binary search tree facilitates duplicate elimination. As the tree is being created, an attempt to insert a duplicate value will be recognized because a duplicate will follow the same " go left" or " go right" decisions on each comparison as the original value did. The duplicate will eventually be compared with a node contains the same value. The duplicate value may simply be discarded at this point.

Searching a binary tree for a value that matches a key value is also fast. If the tree is tightly packed, then each level contains about twice as many elements as the previous level.

SUMMARY

Elf-referential classes contain members called links that point to objects of the same class type.
Self-referential classes enable many class objects to be linked together in stacks, queues, lists, and trees.
Dynamic memory allocation reserves a block of bytes in memory to store an object during program execution.
A linked list is a linear collection of self-referential class objects.
A linked list is a dynamic data structure—the length of the list can increase or decrease as necessary.
Linked lists can continue to grow until memory is exhausted.
Linked lists provide a mechanism for simple insertion and deletion of data by pointer manipulation.
A singly-linked list begins with a pointer the first node and each node contains a pointer to the next node "in sequence". This list terminates with a node whose pointer member is 0. A singly linked list may be traversed in only one direction.
A circular, singly-linked list begins with a pointer to the first node and each node contains a pointer to the next node. The pointer in the last node points back to the first node, thus closing the '‘circle’.
A doubly-linked list allow traversals both forwards and backwards. Each node has both a forward pointer to the next node in the list in the forward direction, and a backward pointer to the next node in the list in the backward direction.
In a circular, doubly-linked list, the forward pointer of the last node points to the first node and the backward pointer of the first node points to the last node, thus closing the ‘circle’.
Stacks and queues are constrained versions of linked lists.
New stack nodes are added to a stack and are removed from a stack only at the top of the stack. A stack is referred to as a LIFO data structure.
The link member in the last node of the stack is set to null to indicate the bottom of the stack.
The two primary operation used to manipulate a stack are push and pop. The push operation creates a new node and places it on the top of the stack. The pop operation removes a node from the top of the stack, deletes the memory that was allocated to the popped node, and returns the popped value.
In a queue data structure, nodes are removed from the head and added to the tail. For this reason, a queue is referred to as a FIFO data structure. The add and remove operations are known as enqueue and dequeue.
Trees are two-dimensional data structures requiring two or more links per node.
Binary trees contain two links per node.
The root node is the first node in the tree.
Each of the pointer in the root node refers to a child . the lest child is the first node in the left subtree, and the right child is the first node in the right subtree. The children of a node are called siblings. Any tree node that does not have any children is called a leaf node.
A binary search tree has the characteristic that the value in the left child of a node is less than the value in its parent node, and the value in the right child of a node is greater than or equal to the value in its parent node. If there are no duplicate data values, the value in the right child is simply greater than the value in its parent node.
An inorder traversal of a binary tree traverses the left subtree inorder, processes the value in the root node, and then traverses the right subtree inorder. The value in a node is not processed until the values in its left subtree are processed
A preorder traversal process the value in the root node, traverses the left subtree preorder, then traverses the right subtree preorder. The value in each node is processed as the node is encountered.
A postorder traversal traverses the left subtree postorder, traverses the right subtree postorder, then processes the value in the root node. The value in each node is not processed until the values in both its subtrees are processed.

COMMON PROGRAMMING ERRORS

Not setting the link in the list node of a list to null
Assuming that the size of a class object is simply the sum of the sizes of its data members.
Not returning dynamically allocated memory when it is no longer needed can cause the system to run out of memory prematurely. This is sometimes called a ‘memory leak’.
Deleting memory with delete that was not allocated dynamically with new.
Referring to memory that has been deleted.
Trying to delete memory that has already been deleted can lead to unpredictable results at execution time.
Not setting the link in the bottom node of a stack to null
Not setting the link in the last node of a queue to null
Not setting to null the links in leaf nodes of a tree.
GOOD PROGRAMMING PRACTICES
When memory that was dynamically allocated with new is no longer needed, use delete to return the memory to the system immediately.
Assign null to the link member of a new node. Pointers should be initialized before they are used.

PERFORMANCE TIPS
An array can be declared to contain more elements than the number of items expected, but this can waste memory. Linked lists can provide better memory utilization in these situation. Linked lists allow the program to adapt at run time.
Insertion and deletion in a sorted array can be time consuming all the elements following the inserted or deleted element must be shifted appropriately.
The elements of an array are stored contiguously in memory. This allows immediate access to any array element because the address of any element can be calculated directly based on its position relative to the beginning of the array. Linked lists do not afford such immediate ‘direct access’ to their elements.
Using dynamic memory allocation instead of arrays for data structures that grow and shrink at execution time can save memory. Keep in mind, however, that pointers occupy space, and that dynamic memory allocation incurs the overhead of function calls.