CHAPTER 14 FILE PROCESSING-

INTRODUCTION-Storage of data in variables and arrays is temporary. Files are used for permanent retention of large amounts of data. Computers store files on secondary storage devices such as magnetic disks, optical disks, and tapes.

THE DATA HIERARCHY-All data items processed by digital computers are reduced to combinations of zeros and ones. It is simple and economical to build electronic devices that can assume two stable states—one state represents 0 and the other state represents 1.

The smallest data item in a computer can assume the value 0 or the value 1. Such a data item is called a bit (binary digit)—a digit that can assume one of two values. Computer circuitry performs various simple bit manipulations such as examining the value of a bit, setting the value of a bit, and reversing a bit (from 1 to 0 or from 0 to 1).

It is cumbersome for programmers to work with data in the low-level form of bits. Programmers prefer to work with data in forms such as decimal digits (0, 1, 2, ..., 9), letters (a, ..., z), and special symbols ($, @, and so on).

Digits, letters, and special symbols are referred to as characters. The set of all characters used to write programs and represent data items on a particular computer is called that computer’s character set. Since computers can process only 1s and 0s, every character in a computer’s character set is represented as a sequence of 1s and 0s (called a byte). Bytes are most commonly composed of eight bits. Programmers create programs and data items with characters; computers manipulate and process these characters as patterns of bits.
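
To make the idea of a character as a pattern of bits concrete, here is a minimal sketch. The function name bitPattern is hypothetical, and the sketch assumes eight-bit bytes.

```cpp
#include <bitset>
#include <string>

// Return the bit pattern of one character as a string of 0s and 1s,
// most significant bit first. Assumes a byte holds eight bits.
std::string bitPattern( char c )
{
   return std::bitset< 8 >( static_cast< unsigned char >( c ) ).to_string();
}
```

On a machine that uses the ASCII character set, bitPattern( 'A' ) yields "01000001"; other character sets may assign a different pattern to the same character.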

Just as characters are composed of bits, fields are composed of characters (or bytes). A field is a group of characters that conveys meaning.

Data items processed by computers form a data hierarchy in which data items become larger and more complex in structure as we progress from bits, to characters (bytes), to fields, and so on.

A record (in C++, a struct or a class) is composed of several fields (called members).

A record is a group of related fields. A file is a group of related records.

To facilitate the retrieval of specific records from a file, at least one field in each record is chosen as a record key. A record key identifies a record as belonging to a particular person or entity and is unique among all the records in the file.
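
The relationship between fields, records, and the record key can be sketched as follows. The struct ClientRecord, its members, and the function findByKey are hypothetical names chosen for illustration; a vector stands in for the file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type: three fields (members).
// The account member plays the role of the record key.
struct ClientRecord {
   int account;         // record key, unique for each client
   std::string name;
   double balance;
};

// Search a group of related records for the one whose key matches.
// Returns the position of the record, or -1 if no record has that key.
int findByKey( const std::vector< ClientRecord > &records, int key )
{
   for ( std::size_t i = 0; i < records.size(); ++i )
      if ( records[ i ].account == key )
         return static_cast< int >( i );

   return -1;
}
```

Because the key is unique, at most one record can match, so the search may stop at the first hit.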

There are many ways of organizing records in a file. The most common type of organization is called a sequential file, in which records are typically stored in order by the record key field.

A group of related files is sometimes called a database. A collection of programs designed to create and manage databases is called a database management system (DBMS).

FILES AND STREAMS-C++ views each file simply as a sequence of bytes. Each file ends either with an end-of-file marker or at a specific byte number recorded in a system-maintained, administrative data structure. When a file is opened, an object is created and a stream is associated with the object. The cin, cout, cerr, and clog streams provide communication channels between a program and a particular file or device.

cin enables a program to input data from the keyboard, cout enables a program to output data to the screen, and cerr and clog enable a program to output error messages to the screen.

To perform file processing in C++, the header files <iostream.h> and <fstream.h> must be included (standard C++ names these headers <iostream> and <fstream>). The header <fstream.h> includes the definitions for the stream classes ifstream (for input from a file), ofstream (for output to a file), and fstream (for input to and output from a file). Files are opened by creating objects of these stream classes. These stream classes are derived from (inherit the functionality of) classes istream, ostream, and iostream, respectively. The member functions, operators, and manipulators of those classes can all be applied to file streams as well.
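
Because ofstream is derived from ostream, code written against ostream also works with a file. The following is a minimal sketch using the standard headers; the function names writeLine and roundTrip are hypothetical, as is whatever filename the caller supplies.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Writes through an ostream reference, so cout, cerr, and any
// ofstream object are all acceptable arguments.
void writeLine( std::ostream &output )
{
   output << "account " << 100 << '\n';
}

// Write the line to a file, then read it back through an ifstream.
std::string roundTrip( const std::string &filename )
{
   {
      std::ofstream outFile( filename.c_str() );   // an ofstream is an ostream
      writeLine( outFile );
   }  // outFile goes out of scope here and its destructor closes the file

   std::ifstream inFile( filename.c_str() );
   std::string line;
   std::getline( inFile, line );                   // istream facilities apply to ifstream
   return line;
}
```

Calling writeLine( std::cout ) sends the same text to the screen without any change to the function.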

CREATING A SEQUENTIAL ACCESS FILE-C++ imposes no structure on a file. Notions like "record" do not exist in C++ files. The programmer must structure files to meet the requirements of applications. Files are opened by creating objects of stream classes ifstream, ofstream, or fstream. When a file is to be opened for output, an ofstream object is created. Two arguments are passed to the object’s constructor: the filename and the file open mode. For an ofstream object, the file open mode can be either ios::out to output data to a file or ios::app to append data to the end of a file (without modifying data already in the file). Existing files opened with mode ios::out are truncated, that is, all data in the file is discarded. If the specified file does not yet exist, then a file is created with that filename. The declaration

ofstream outClientFile( "clients.dat", ios::out );

creates an ofstream object named outClientFile associated with the file clients.dat that is opened for output. The arguments "clients.dat" and ios::out are passed to the ofstream constructor, which opens the file. This establishes a "line of communication" with the file. By default, ofstream objects are opened for output, so the statement

ofstream outClientFile( "clients.dat" );

could have been used to open clients.dat for output. The FILE OPEN MODES list in this section gives the available file open modes.

An ofstream object can be created without opening a specific file—a file can be attached to the object later. For example, the declaration

ofstream outClientFile;

creates an ofstream object named outClientFile. The ofstream member function open opens a file and attaches it to an existing ofstream object as follows:

outClientFile.open( "clients.dat", ios::out );

FILE OPEN MODES

ios::app        Write all output to the end of the file.
ios::ate        Open a file for output and move to the end of the file (normally used to append data to a file). Data can be written anywhere in the file.
ios::in         Open a file for input.
ios::out        Open a file for output.
ios::trunc      Discard the file’s contents if it exists (this is also the default action for ios::out).
ios::nocreate   If the file does not exist, the open operation fails (a mode of the pre-standard <fstream.h> library; not in the C++98 standard).
ios::noreplace  If the file exists, the open operation fails (a mode of the pre-standard <fstream.h> library; not in the C++98 standard).
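
The difference between ios::out (truncate) and ios::app (append) can be checked with a short sketch. The helper names writeText and readAll are hypothetical; the code uses the standard <fstream> header.

```cpp
#include <fstream>
#include <string>

// Write text to a file using the given open mode.
void writeText( const std::string &filename, const std::string &text,
                std::ios::openmode mode )
{
   std::ofstream outFile( filename.c_str(), mode );
   outFile << text;
}  // the ofstream destructor closes the file

// Return the whole contents of a text file, one '\n' per line read.
std::string readAll( const std::string &filename )
{
   std::ifstream inFile( filename.c_str() );
   std::string contents;
   std::string line;

   while ( std::getline( inFile, line ) )
      contents += line + '\n';

   return contents;
}
```

Writing "first\n" with std::ios::out and then "second\n" with std::ios::app leaves both lines in the file; a later write with std::ios::out discards them and keeps only the new text.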

After creating an ofstream object and attempting to open it, the program tests whether the open operation was successful. The condition in the if structure

if ( !outClientFile ) {
   cerr << "File could not be opened" << endl;
   exit( 1 );
}

uses the overloaded ios member function operator! to determine whether the open operation succeeded. The condition returns a nonzero (true) value if either the failbit or the badbit is set for the stream on the open operation. Some possible errors are attempting to open a nonexistent file for reading, attempting to open a file for reading without permission, and opening a file for writing when no disk space is available.

When the condition indicates that the open attempt was unsuccessful, the error message "File could not be opened" is output, and function exit is called to end the program. The argument to exit is returned to the environment from which the program was invoked. Argument 0 indicates that the program terminated normally; any other value indicates that the program terminated due to an error. The value returned by exit is used by the calling environment to respond appropriately to the error.

Another overloaded ios member function, operator void*, converts the stream to a pointer so it can be tested as 0 (the null pointer) or nonzero (any other pointer value). If the failbit or the badbit has been set for the stream, 0 (false) is returned. The condition in the following while header automatically invokes the operator void* member function.

while ( cin >> account >> name >> balance )

The condition will remain true as long as neither the failbit nor the badbit has been set for cin. Entering the end-of-file indicator sets the failbit for cin. The operator void* function can be used to test an input object for end-of-file instead of explicitly calling the eof member function on the input object.
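
The way a stream used as a loop condition stops on either end-of-file or bad data can be sketched as below. The function names countRecords and countRecordsInText are hypothetical, and a string stream stands in for the keyboard. In standard C++ since C++11 the conversion is an explicit operator bool rather than operator void*, but the loop reads the same.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Count complete records (account, name, balance) until an extraction fails.
int countRecords( std::istream &input )
{
   int account;
   std::string name;
   double balance;
   int count = 0;

   // The stream itself is the condition; it tests false once
   // the failbit or the badbit has been set.
   while ( input >> account >> name >> balance )
      ++count;

   return count;
}

// Convenience wrapper: treat a string as the input source.
int countRecordsInText( const std::string &text )
{
   std::istringstream input( text );
   return countRecords( input );
}
```

With the text "100 Jones 24.98\nxyz Doe 1.0\n" the loop counts one record and then stops, because "xyz" cannot be read as an int and the failed extraction sets the failbit.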

If the file is opened successfully, the program begins processing data. The following statement prompts the user to enter the various fields for each record, or to enter end-of-file when data entry is complete:

cout << "Enter the account, name, and balance.\n"
     << "Enter EOF to end input.\n? ";

The while header

while ( cin >> account >> name >> balance )

inputs each set of data and determines whether end-of-file has been entered. When end-of-file or bad data is entered, the condition evaluates to 0 (normally the stream extraction returns cin, which tests as true) and the while structure terminates. The user enters end-of-file to inform the program that there is no more data to be processed. The end-of-file indicator is set when the end-of-file key combination is entered by the user. The while structure continues looping as long as the end-of-file indicator has not been entered. The statement

outClientFile << account << ' ' << name
              << ' ' << balance << '\n';

writes a set of data to the file "clients.dat" using the stream-insertion operator << and the outClientFile object associated with the file at the beginning of the program. The data may be retrieved by a program designed to read the file.

Once the end-of-file indicator is entered, main terminates. This causes the outClientFile object to be destroyed thus invoking its destructor function which closes the file clients.dat. An ofstream object can explicitly be closed by the programmer using member function close as follows:

outClientFile.close();
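
The steps of this section can be put together in one sketch. The function name createClientFile is hypothetical. To keep the function reusable it reads from any istream (pass std::cin for keyboard input) and returns -1 on an open failure, rather than calling exit as the text describes.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Copy records (account, name, balance) from input to a sequential file.
// Returns the number of records written, or -1 if the file cannot be opened.
int createClientFile( std::istream &input, const std::string &filename )
{
   std::ofstream outClientFile( filename.c_str(), std::ios::out );

   if ( !outClientFile ) {
      std::cerr << "File could not be opened" << std::endl;
      return -1;
   }

   int account;
   std::string name;
   double balance;
   int count = 0;

   // Loop until end-of-file or bad data sets the failbit on input.
   while ( input >> account >> name >> balance ) {
      outClientFile << account << ' ' << name
                    << ' ' << balance << '\n';
      ++count;
   }

   return count;   // outClientFile's destructor closes the file here
}
```

A call such as createClientFile( std::cin, "clients.dat" ) corresponds to the interactive program described above.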

READING DATA FROM A SEQUENTIAL ACCESS FILE-Data is stored in files so that it may be retrieved for processing when needed. The program discussed in this section reads records from the file "clients.dat" created in the previous section and prints the contents of each record. Files are opened for input by creating an ifstream class object. Two arguments are passed to the object: the filename and the file open mode. The declaration

ifstream inClientFile( "clients.dat", ios::in );

creates an ifstream object called inClientFile and associates with it the file clients.dat that is to be opened for input. The arguments in parentheses are passed to the ifstream constructor function, which opens the file and establishes a "line of communication" with the file.

Objects of class ifstream are opened for input by default, so the statement

ifstream inClientFile( "clients.dat" );

could have been used to open clients.dat for input. Just as with an ofstream object, an ifstream object can be created without opening a specific file, and a file can be attached to it later.

The program uses the condition !inClientFile to determine whether the file was opened successfully before attempting to retrieve data from the file. The line

while ( inClientFile >> account >> name >> balance )

reads a set of data from the file. After the preceding line is executed the first time, account has the value 100, name has the value "Jones", and balance has the value 24.98. Each time the line is executed, another record is read from the file into the variables account, name, and balance. The records are displayed using function outputLine, which uses parameterized stream manipulators to format the data for display. When the end of the file has been reached, the input sequence in the while structure returns 0 (normally the stream inClientFile is returned), the file is closed by the ifstream destructor function, and the program terminates.
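
A minimal sketch of the reading loop and of an outputLine function follows. The column widths (10, 13, 7) and the function name listClientFile are assumptions made for illustration, not values taken from the original program. The listing is returned as a string so the result is easy to inspect; pass std::cout to outputLine to print directly.

```cpp
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Format one record in columns using stream manipulators.
void outputLine( std::ostream &output, int account,
                 const std::string &name, double balance )
{
   output << std::left << std::setw( 10 ) << account
          << std::setw( 13 ) << name
          << std::right << std::setw( 7 )
          << std::fixed << std::setprecision( 2 ) << balance << '\n';
}

// Read every record of a sequential file and return the formatted listing.
std::string listClientFile( const std::string &filename )
{
   std::ifstream inClientFile( filename.c_str(), std::ios::in );

   if ( !inClientFile )
      return "File could not be opened\n";

   int account;
   std::string name;
   double balance;
   std::ostringstream listing;

   while ( inClientFile >> account >> name >> balance )
      outputLine( listing, account, name, balance );

   return listing.str();   // the ifstream destructor closes the file
}
```

The setw manipulator applies only to the next value output, which is why it is repeated before each field, while left, right, fixed, and setprecision stay in effect until changed.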

To retrieve data sequentially from a file, programs normally start reading from the beginning of the file, and read all the data consecutively until the desired data are found. It may be necessary to process the file sequentially several times (from the beginning of the file) during the execution of a program. Both the istream class and the ostream class provide member functions for repositioning the file position pointer (the byte number of the next byte in the file to be read or written). These member functions are seekg ("seek get") for the istream class and seekp ("seek put") for the ostream class. Each istream object has a "get pointer" that indicates the byte number in the file from which the next input is to occur, and each ostream object has a "put pointer" that indicates the byte number in the file at which the next output is to be placed. The statement:

inClientFile.seekg( 0 );

repositions the file position pointer to the beginning of the file (location 0) attached to inClientFile. The argument to seekg is normally a long integer. A second argument can be specified to indicate the seek direction. The seek direction can be ios::beg (the default) for positioning relative to the beginning of a stream, ios::cur for positioning relative to the current position in a stream, or ios::end for positioning relative to the end of a stream. The file position pointer is an integer value that specifies the location in the file as a number of bytes from the starting location of the file (this is sometimes referred to as the offset from the beginning of the file). Some examples of positioning the "get" file position pointer are:

// position to the nth byte of fileObject
// assume ios::beg
fileObject.seekg( n );

// position n bytes forward in fileObject
fileObject.seekg( n, ios::cur );

// position y bytes back from end of fileObject
// (an offset relative to ios::end must be negative to move backward)
fileObject.seekg( -y, ios::end );

// position at end of fileObject
fileObject.seekg( 0, ios::end );

The same operations can be performed with ostream member function seekp. Member functions tellg and tellp are provided to return the current locations of the "get" and "put" pointers, respectively. The following statement assigns the "get" file position pointer value to variable location of type long.

location = fileObject.tellg();
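
As a sketch of seekg and tellg working together, the functions below find a file's size and read its last few bytes. The names fileSize and lastBytes are hypothetical. The std::ios::binary flag is a standard open mode not covered in these notes; it is used here on the assumption that byte counts should not be affected by newline translation.

```cpp
#include <fstream>
#include <string>

// Return the size of a file in bytes, or -1 if it cannot be opened.
long fileSize( const std::string &filename )
{
   std::ifstream fileObject( filename.c_str(), std::ios::in | std::ios::binary );

   if ( !fileObject )
      return -1;

   fileObject.seekg( 0, std::ios::end );          // position at end of file
   std::streamoff location = fileObject.tellg();  // "get" pointer = byte count
   return static_cast< long >( location );
}

// Return the last n bytes of a file (assumes the file holds at least n bytes).
std::string lastBytes( const std::string &filename, int n )
{
   std::ifstream fileObject( filename.c_str(), std::ios::in | std::ios::binary );

   fileObject.seekg( -n, std::ios::end );         // negative offset moves back from the end
   std::string tail( n, '\0' );
   fileObject.read( &tail[ 0 ], n );
   return tail;
}
```

If a previous read has already hit end-of-file, calling clear() on the stream before seeking is a common precaution, since a stream with the failbit set ignores further input requests.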

UPDATING SEQUENTIAL ACCESS FILES-Data that is formatted and written to a sequential access file cannot be modified without the risk of destroying other data in the file. For example, if the name "White" needed to be changed to "Worthington", the new record would contain six more characters than the original record. The characters beyond the second "o" in "Worthington" would overwrite the beginning of the next sequential record in the file. The problem here is that, in the formatted input/output model using the insertion operator << and the extraction operator >>, fields (and hence records) can vary in size. The formatted input/output model is not usually used to update records in place.

Such updating can be done, but it is awkward. This requires processing every record in the file to update one record.
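
One way to carry out such an update, sketched here under the assumption that a second file may be created, is to copy every record to a new file and change the one record on the way through. The function name renameClient and both filenames are hypothetical.

```cpp
#include <fstream>
#include <string>

// Copy every record from oldName to newName, giving the record whose account
// matches targetAccount a new name field.
// Returns the number of records changed, or -1 if either file cannot be opened.
int renameClient( const std::string &oldName, const std::string &newName,
                  int targetAccount, const std::string &replacement )
{
   std::ifstream inFile( oldName.c_str() );
   std::ofstream outFile( newName.c_str() );

   if ( !inFile || !outFile )
      return -1;

   int account;
   std::string name;
   double balance;
   int changed = 0;

   // Every record is read and rewritten, even though only one changes.
   while ( inFile >> account >> name >> balance ) {
      if ( account == targetAccount ) {
         name = replacement;
         ++changed;
      }

      outFile << account << ' ' << name << ' ' << balance << '\n';
   }

   return changed;
}
```

Because each record is rewritten in full, the longer name simply makes its own line longer and the following records are not overwritten.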

RANDOM ACCESS FILES-Sequential access files are inappropriate for so-called "instant-access" applications in which a particular record of information must be located immediately. Some popular instant-access applications are airline reservation systems, banking systems, point-of-sale systems, automated teller machines, and other kinds of transaction processing systems that require rapid access to specific data. This kind of instant access is possible with random access files. Individual records of a random access file can be accessed directly without searching through other records.

C++ does not impose structure on a file, so an application that wants to use random access files must create them. A variety of techniques can be used to create random access files. The simplest is to require that all records in a file be of the same fixed length. Using fixed-length records makes it easy for a program to calculate (as a function of the record size and the record key) the exact location of any record relative to the beginning of the file.
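
The location calculation can be sketched in one line. The function name recordOffset is hypothetical, and the sketch assumes that keys start at 1 and that the record with key k occupies slot k - 1; other layouts are possible.

```cpp
#include <cstddef>

// Byte offset, from the beginning of the file, of the record with the given key.
// Assumes fixed-length records and keys numbered from 1.
long recordOffset( int key, std::size_t recordSize )
{
   return static_cast< long >( key - 1 ) * static_cast< long >( recordSize );
}
```

With 40-byte records, for example, the record with key 5 starts at byte 160, so it can be reached with a single seek.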

Data can be inserted in a random access file without destroying other data in the file. Data stored previously also can be updated or deleted without rewriting the entire file.

CREATING A RANDOM ACCESS FILE-The ostream member function write outputs a fixed number of bytes beginning at a specific location in memory to the specified stream. When the stream is associated with a file, the data is written beginning at the location in the file specified by the "put" file position pointer. The istream member function read inputs a fixed number of bytes from the specified stream to an area in memory beginning at a specified address. If the stream is associated with a file, the bytes are input beginning at the location in the file specified by the "get" file position pointer. Now, when writing an integer number to a file, instead of using

outFile << number;

which could print as few as 1 digit or as many as 11 characters (10 digits plus a sign, each of which requires 1 byte of storage) for a 4-byte integer, we can use

outFile.write( reinterpret_cast< const char * >( &number ),
               sizeof( number ) );

which always writes 4 bytes (on a machine with 4-byte integers). The write function expects a first argument of type const char *, hence we used the reinterpret_cast< const char * > cast operator to convert the address of number to a const char * pointer. The second argument of write is an integer specifying the number of bytes to be written (type size_t in this library; std::streamsize in standard C++).

Random access file processing programs rarely write a single field to a file. Normally, they write one struct or class object at a time.
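
Writing one struct at a time with write and reading it back with read might look like the sketch below. The struct ClientData, its member sizes, and the helper names are hypothetical. The technique is only safe for fixed-size structs made of plain data members (no std::string or pointers), and the resulting file is readable only by programs built with the same struct layout.

```cpp
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical fixed-length record made of plain data members only.
struct ClientData {
   int account;
   char name[ 16 ];
   double balance;
};

// Build a record; names longer than 15 characters are cut off.
ClientData makeClient( int account, const std::string &name, double balance )
{
   ClientData client = ClientData();   // all members start at zero
   client.account = account;
   std::strncpy( client.name, name.c_str(), sizeof( client.name ) - 1 );
   client.balance = balance;
   return client;
}

// Write the bytes of one record to a new file.
bool writeRecord( const std::string &filename, const ClientData &client )
{
   std::ofstream outFile( filename.c_str(), std::ios::out | std::ios::binary );

   if ( !outFile )
      return false;

   outFile.write( reinterpret_cast< const char * >( &client ), sizeof( ClientData ) );
   return outFile.good();
}

// Read the bytes of one record from the start of a file.
bool readRecord( const std::string &filename, ClientData &client )
{
   std::ifstream inFile( filename.c_str(), std::ios::in | std::ios::binary );

   if ( !inFile )
      return false;

   inFile.read( reinterpret_cast< char * >( &client ), sizeof( ClientData ) );
   return !inFile.fail();   // fails if fewer than sizeof( ClientData ) bytes were available
}
```

Every record written this way occupies exactly sizeof( ClientData ) bytes, which is what makes the fixed-length offset calculation possible.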

WRITING DATA RANDOMLY TO A RANDOM ACCESS FILE-Function seekp sets the "put" file position pointer to a specific position in the file, then write outputs the data. When the file is opened with ios::ate, the "put" file position pointer is set to the end of the file initially, but data can be written anywhere in the file.
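
A minimal sketch of seekp followed by write is given below. The struct ClientData, the convention that account 0 marks an empty slot, and all function names are assumptions for illustration. The sketch opens the file with std::ios::in | std::ios::out because, in standard C++, opening an existing file for output alone truncates it.

```cpp
#include <fstream>
#include <string>

// Hypothetical fixed-length record; account 0 marks an empty slot.
struct ClientData {
   int account;
   char name[ 16 ];
   double balance;
};

// Byte offset of the slot for a given account number (accounts start at 1).
std::streamoff slotOffset( int account )
{
   return static_cast< std::streamoff >( account - 1 ) *
          static_cast< std::streamoff >( sizeof( ClientData ) );
}

// Create a file holding the given number of empty records.
bool createBlankFile( const std::string &filename, int slots )
{
   std::ofstream outFile( filename.c_str(), std::ios::out | std::ios::binary );
   ClientData blank = ClientData();   // all members zero

   for ( int i = 0; i < slots && outFile; ++i )
      outFile.write( reinterpret_cast< const char * >( &blank ), sizeof( ClientData ) );

   return outFile.good();
}

// Write one record directly into its slot, leaving the other slots untouched.
bool storeClient( const std::string &filename, const ClientData &client )
{
   std::fstream file( filename.c_str(),
                      std::ios::in | std::ios::out | std::ios::binary );

   if ( !file )
      return false;

   file.seekp( slotOffset( client.account ) );   // set the "put" pointer
   file.write( reinterpret_cast< const char * >( &client ), sizeof( ClientData ) );
   return file.good();
}

// Read the record stored in the slot for the given account number.
bool loadClient( const std::string &filename, int account, ClientData &client )
{
   std::ifstream inFile( filename.c_str(), std::ios::in | std::ios::binary );

   if ( !inFile )
      return false;

   inFile.seekg( slotOffset( account ) );        // set the "get" pointer
   inFile.read( reinterpret_cast< char * >( &client ), sizeof( ClientData ) );
   return !inFile.fail();
}
```

Storing a record for account 3 changes only the third slot; the records before and after it keep their contents.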

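READING DATA SEQUENTIALLY FROM A RANDOM ACCESS FILE-When an output function takes an ostream reference as a parameter, the same function can be used to write to the screen or to a file without writing separate functions. Because each record is stored at a position calculated from its record key, reading the file from beginning to end yields the records in key order, so sorting with direct access techniques is fast. The speed is achieved by making the file large enough to hold every possible record that might be created. As a result, the file could be sparsely occupied most of the time, which is a waste of storage.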
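
Reading such a file from start to finish and skipping the unused slots can be sketched as follows. The struct ClientData, the use of account 0 to mark an empty slot, and the function name listAccounts are assumptions for illustration.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical fixed-length record; account 0 marks an empty slot.
struct ClientData {
   int account;
   char name[ 16 ];
   double balance;
};

// Read every slot in order and collect the account numbers of occupied slots.
std::vector< int > listAccounts( const std::string &filename )
{
   std::ifstream inFile( filename.c_str(), std::ios::in | std::ios::binary );
   std::vector< int > accounts;
   ClientData client;

   // read returns the stream, which tests false once a full record
   // can no longer be read (end of file).
   while ( inFile.read( reinterpret_cast< char * >( &client ), sizeof( ClientData ) ) )
      if ( client.account != 0 )
         accounts.push_back( client.account );

   return accounts;
}
```

If each record was stored in the slot given by its account number, the accounts come back in ascending order with no sorting step, at the cost of reading through the empty slots as well.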

INPUT/OUTPUT OF OBJECTS-We accomplished object input by overloading the stream-extraction operator >> for the appropriate istream classes. We accomplished object output by overloading the stream-insertion operator << for the appropriate ostream classes. In both cases only an object’s data members were input or output and, in each case, in a form meaningful for objects of that particular abstract data type. An object’s member functions are available internally in the computer and are combined with the data values as these data are input via the overloaded stream-extraction operator.

When object data members are output to a disk file, in a sense we lose the object’s type information. We only have data, not type information, on a disk. If the program that is going to read this data knows what object type it corresponds to, then the data is simply read into objects of that type.

How can we distinguish objects of different types in the same file? Have each overloaded output operator output a type code preceding each collection of data members that represents one object. Then object input would always begin by reading the type-code field and using a switch statement to invoke the proper overloaded function.
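
The type-code idea can be sketched as follows. The types Point and Circle, the code values 1 and 2, and the function describeAll are all hypothetical, and a text format is assumed for simplicity.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical type codes written ahead of each object's data members.
enum TypeCode { POINT_CODE = 1, CIRCLE_CODE = 2 };

struct Point  { int x, y; };
struct Circle { int x, y; double radius; };

// Output: the type code first, then the data members.
std::ostream &operator<<( std::ostream &output, const Point &p )
{
   return output << POINT_CODE << ' ' << p.x << ' ' << p.y << '\n';
}

std::ostream &operator<<( std::ostream &output, const Circle &c )
{
   return output << CIRCLE_CODE << ' ' << c.x << ' ' << c.y << ' ' << c.radius << '\n';
}

// Input: the caller has already read the type code.
std::istream &operator>>( std::istream &input, Point &p )
{
   return input >> p.x >> p.y;
}

std::istream &operator>>( std::istream &input, Circle &c )
{
   return input >> c.x >> c.y >> c.radius;
}

// Read every object in the stream and return a short description of each.
std::string describeAll( std::istream &input )
{
   std::ostringstream summary;
   int code;

   while ( input >> code ) {          // read the type-code field first
      switch ( code ) {               // then pick the matching input operator
         case POINT_CODE: {
            Point p;
            if ( input >> p )
               summary << "Point(" << p.x << ',' << p.y << ") ";
            break;
         }
         case CIRCLE_CODE: {
            Circle c;
            if ( input >> c )
               summary << "Circle(" << c.x << ',' << c.y << ',' << c.radius << ") ";
            break;
         }
         default:
            input.setstate( std::ios::failbit );   // unknown type code: stop reading
            break;
      }
   }

   return summary.str();
}
```

The same functions work with an ofstream and an ifstream in place of the string stream, since both are derived from the stream classes used in the parameter lists.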

SUMMARY-

COMMON PROGRAMMING ERROR-Opening an existing file for output (ios::out) when, in fact, the user wants to preserve the file; the contents of the file are discarded without warning.

Using an incorrect ofstream object to refer to a file.

Not opening a file before attempting to reference it in a program.

GOOD PROGRAMMING PRACTICE-Open a file for input only (using ios::in) if the contents of the file should not be modified. This prevents unintentional modification of the file’s contents. This is an example of the principle of least privilege.

PERFORMANCE TIP-Explicitly close each file as soon as it is known that the program will not reference the file again. This can reduce resource usage in a program that will continue executing after it no longer needs a particular file. This practice also improves program clarity.

SELF-REVIEW-EXERCISES-

  1. Ultimately, all data items processed by a computer are reduced to combinations of 1s and 0s.
  2. The smallest data item a computer can process is called a bit.
  3. A file is a group of related records.
  4. Digits, letters, and special symbols are referred to as characters.
  5. A group of related files is called a database.
  6. Member function close of the file stream classes fstream, ifstream, and ofstream closes a file.
  7. The istream member function get reads a character from the specified stream.
  8. The istream member functions get and getline read a line from the specified stream.
  9. Member function open of the file stream classes fstream, ifstream, and ofstream opens a file.
  10. The istream member function read is normally used when reading data from a file in random access applications.
  11. Member functions seekg and seekp of the istream and ostream classes set the appropriate position pointer to a specific location in an input or output stream, respectively.
  1. Function read can be used to read from any input stream object derived from istream.
  2. The four streams cin, cout, cerr, and clog are created automatically for the programmer. The <iostream.h> header file must be included in a file to use them. This header includes declarations of each of these stream objects.
  3. The files will be closed when destructors for ifstream, ofstream, or fstream objects are executed when the stream objects go out of scope or before program execution terminates, but it is a good programming practice to close all files explicitly with close once they are no longer needed.
  4. Member function seekp or seekg can be used to reposition the put or get file position pointer to the beginning of the file.
  5. The ostream member function write can write to standard output stream cout.
  6. In most cases, sequential file records are not of uniform length. Therefore, it is possible that updating a record will cause other data to be overwritten.
  7. It is not necessary to search through all the records in a random access file to find a specific record.
  8. Records in a random access file are normally of uniform length.
  9. It is possible to seek from the beginning of the file, from the end of the file, and from the current position in the file.
  1. Write a statement that opens file "oldmast.dat" for input; use ifstream object inOldMaster.
     ifstream inOldMaster( "oldmast.dat", ios::in );

  2. Write a statement that opens file "trans.dat" for input; use ifstream object inTransaction.
     ifstream inTransaction( "trans.dat", ios::in );

  3. Write a statement that opens file "newmast.dat" for output (and creation); use ofstream object outNewMaster.
     ofstream outNewMaster( "newmast.dat", ios::out );

4