9.10. Input and output
9.10.1. Introduction
One of the reasons that many programming languages have failed to
become widely used for ‘real programming’ is their poor support for
I/O, a subject which has never seemed to excite language designers.
C has avoided this problem, oddly enough, by having no I/O at all!
The C language approach has always been to do I/O using library functions,
which ensures that system designers can provide tailored I/O instead of
being forced to change the language itself.
As C has evolved, a library package known as the ‘Standard I/O
Library’, or stdio, has evolved with it and has proved to be both
flexible and portable. This package has now become part of the
Standard.
The old stdio package relied heavily on the UNIX model of file access, in
particular the assumption that there is no distinction between unstructured
binary files and files containing readable text. Many operating systems do
maintain a distinction between the two, and to ensure that C programs can
be written portably to run on both types of file model, the stdio package
has been modified. There are changes in this area which affect many
existing programs, although strenuous efforts were made to limit the
amount of damage.
Old C programs should still be able to work unmodified in a
UNIX environment.
9.10.2. The I/O model
The I/O model does not distinguish between the types of physical devices
supporting the I/O. Each source or sink of data (file) is treated in the
same way, and is viewed as a stream of bytes. Since the
smallest object that can be represented in C is the character, access to
a file is permitted at any character boundary. Any number of characters can
be read or written from a movable point, known as the file position
indicator. The characters will be read, or written, in sequence from
this point, and the position indicator moved accordingly. The position
indicator is initially set to the beginning of a file when it is opened,
but can also be moved by means of positioning requests. (Where random
access is not possible, the file position indicator is ignored.) Opening
a file in append mode has an implementation defined effect on the stream's
file position indicator.
The overall effect is to provide sequential reads or writes unless the
stream was opened in append mode, or the file position indicator is
explicitly moved.
There are two types of file, text files and binary
files, which, within a program, are manipulated as text
streams and binary streams once they have been opened for
I/O. The stdio package does not permit operations on the contents of files
‘directly’, but only by viewing them as streams.
9.10.2.1. Text streams
The Standard specifies what is meant by the term text stream,
which essentially considers a file to contain lines of text. A line is
a sequence of zero or more characters terminated by a newline character.
It is quite possible that the actual representation of lines in the
external environment is different from this and there may be
transformations of the data stream on the way in and out of the program;
a common requirement is to translate the ‘\n’
line-terminator into the sequence ‘\r\n’ on output, and
do the reverse on input. Other translations may also be necessary.
Data read in from a text stream is guaranteed to compare equal to the
data that was earlier written out to the file only if three conditions hold:
the data consists only of complete lines of printable characters and the
control characters horizontal-tab and newline; no newline character is
immediately preceded by space characters; and the last character is a newline.
It is guaranteed that, if the last character written to a text file is
a newline, it will read back as the same.
It is implementation defined whether the last line written to a text
file must terminate with a newline character; this is because on some
implementations text files and binary files are the same.
Some implementations may strip the leading space from lines consisting
only of a space followed by a newline, or strip trailing spaces at the end
of a line!
An implementation must support text files with lines containing at least
254 characters, including the terminating newline.
Opening a text stream in update mode may result in a binary stream in
some implementations.
Writing on a text stream may cause some implementations to truncate the
file at that point—any data beyond the last byte of the current write
being discarded.
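The conditions for a faithful round trip can be illustrated with a short sketch. The function name text_round_trip and the file name passed to it are hypothetical; the text written uses only printable characters, a tab and newlines, has no space before a newline, and ends with a newline, so it should read back unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Write a small text file, read it back through a text stream and
 * compare. Returns 1 if the data compares equal, 0 if it differs,
 * -1 if the file could not be written or read. */
int text_round_trip(const char *path)
{
    const char *text = "alpha\tbeta\nlast line\n";
    char back[64];
    size_t n;
    FILE *fp = fopen(path, "w");        /* text stream, write mode */

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    if (fclose(fp) == EOF) {
        remove(path);
        return -1;
    }

    fp = fopen(path, "r");              /* text stream, read mode */
    if (fp == NULL) {
        remove(path);
        return -1;
    }
    n = fread(back, 1, sizeof back - 1, fp);
    back[n] = '\0';
    fclose(fp);
    remove(path);                       /* tidy up the scratch file */
    return strcmp(text, back) == 0;
}
```

A call such as text_round_trip("scratch.txt") would be expected to return 1; data that breaks the conditions above (trailing spaces, for instance) carries no such guarantee.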
9.10.2.2. Binary streams
A binary stream is a sequence of characters that can be used to record
a program's internal data, such as the contents of structures or arrays in
binary form. Data read in from a binary stream will always compare equal
to data written out earlier to the same stream, under the same
implementation. In some circumstances, an implementation-defined number
of NUL characters may be appended to a binary stream.
The contents of binary files are exceedingly machine specific, and not,
in general, portable.
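As a rough illustration of recording internal data in binary form, the following sketch writes an array of structures to a temporary binary stream and reads it back. The structure reading and the function binary_round_trip are invented for the example; fwrite, fread, rewind and tmpfile are the standard library functions.

```c
#include <stdio.h>
#include <string.h>

struct reading {                /* hypothetical record type */
    int id;
    double value;
};

/* Write n records (at most 16) to a temporary binary stream, then read
 * them back. Returns 1 if the bytes read compare equal to those
 * written, 0 if they differ, -1 on any I/O failure. */
int binary_round_trip(const struct reading *in, size_t n)
{
    struct reading out[16];
    FILE *fp;
    int ok;

    if (n > 16)
        return -1;
    fp = tmpfile();             /* binary stream opened for update */
    if (fp == NULL)
        return -1;

    ok = fwrite(in, sizeof in[0], n, fp) == n;
    rewind(fp);                 /* reposition before switching to reading */
    ok = ok && fread(out, sizeof out[0], n, fp) == n;
    fclose(fp);                 /* the temporary file is removed here */

    if (!ok)
        return -1;
    return memcmp(in, out, n * sizeof in[0]) == 0;
}
```

The file produced this way is only meaningful to the same implementation: the size, padding and byte order of struct reading may all differ elsewhere.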
9.10.2.3. Other streams
Other stream types may exist, but are implementation defined.
9.10.3. The stdio.h header file
To provide support for streams of the various kinds, a number of
functions and macros exist. The <stdio.h> header file
contains the various declarations necessary for the functions, together
with the following macro and type declarations:
FILE
- The type of an object used to contain stream control
information. Users of stdio never need to know the
contents of these objects, but simply manipulate pointers
to them. It is not safe to copy these objects within the
program; sometimes their addresses may be ‘magic’.
fpos_t
- A type of object that can be used to record unique values
of a stream's file position indicator.
_IOFBF _IOLBF _IONBF
- Values used to control the buffering of a stream in
conjunction with the setvbuf function.
BUFSIZ
- The size of the buffer used by the setbuf function. An
integral constant expression whose value is at least 256.
EOF
- A negative integral constant expression, indicating the
end-of-file condition on a stream, i.e. that there is no
more input.
FILENAME_MAX
- The maximum length which a filename can have, if there is
a limit, or otherwise the recommended size of an array
intended to hold a file name.
FOPEN_MAX
- The minimum number of files that the implementation
guarantees may be held open concurrently; at least eight
are guaranteed. Note that three predefined streams exist
and may need to be closed if a program needs to open more
than five files explicitly.
L_tmpnam
- The maximum length of the string generated by tmpnam; an
integral constant expression.
SEEK_CUR SEEK_END SEEK_SET
- Integral constant expressions used to control the actions
of fseek.
TMP_MAX
- The minimum number of unique filenames generated by
tmpnam; an integral constant expression with a value of
at least 25.
stdin stdout stderr
- Predefined objects of type (FILE *) referring to the
standard input, output and error streams respectively.
These streams are automatically open when a program
starts execution.
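The values of these macros vary between implementations, so it can be instructive to print them. The sketch below does so and checks the minimum values quoted above; the function name report_stdio_limits is made up for the example.

```c
#include <stdio.h>

/* Print some of the <stdio.h> limits for this implementation.
 * Returns 1 if they meet the minimum values described in the text. */
int report_stdio_limits(void)
{
    printf("BUFSIZ       = %lu\n", (unsigned long)BUFSIZ);
    printf("FOPEN_MAX    = %ld\n", (long)FOPEN_MAX);
    printf("FILENAME_MAX = %ld\n", (long)FILENAME_MAX);
    printf("L_tmpnam     = %ld\n", (long)L_tmpnam);
    printf("TMP_MAX      = %ld\n", (long)TMP_MAX);
    printf("EOF          = %d\n", EOF);

    return BUFSIZ >= 256 && FOPEN_MAX >= 8 && TMP_MAX >= 25 && EOF < 0;
}
```

The numbers printed are specific to the system on which the program runs; only the lower bounds are fixed by the Standard.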
9.10.4. Opening, closing and buffering of streams
9.10.4.1. Opening
A stream is connected to a file by means of the fopen,
freopen or tmpfile functions. These functions
will, if successful, return a pointer to a FILE object.
Three streams are available without any special action; they are
normally all connected to the physical device associated with the
executing program: usually your terminal. They are referred to by the
names stdin, the standard input,
stdout, the standard output, and
stderr, the standard error stream. Normal
keyboard input is from stdin, normal terminal output is to
stdout, and error messages are directed to
stderr. The separation of error messages from normal output
messages allows the stdout stream to be connected to something other than
the terminal device, and still to have error messages appear on the screen
in front of you, rather than being redirected along with the normal output.
These streams are only fully buffered if they do not refer to interactive devices.
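The division of labour between stdout and stderr might look like this in practice. The function report_ratio is purely illustrative: results go to stdout, complaints go to stderr, so redirecting the output of the program does not hide the error message.

```c
#include <stdio.h>

/* Print num/den on the standard output. If den is zero, print a
 * diagnostic on the standard error stream instead and return -1. */
int report_ratio(int num, int den)
{
    if (den == 0) {
        fprintf(stderr, "report_ratio: division by zero\n");
        return -1;
    }
    printf("%d\n", num / den);
    return 0;
}
```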
As mentioned earlier, the file position indicator may or may not be
movable, depending on the underlying device. It is not possible, for
example, to move the file position indicator on stdin if that is connected
to a terminal, as it usually is.
All non-temporary files must have a filename, which is
a string. The rules for what constitutes valid filenames are
implementation defined. Whether a file can be simultaneously open multiple
times is also implementation defined. Opening a new file may involve
creating the file. Creating an existing file causes its previous contents
to be discarded.
9.10.4.2. Closing
Files are closed by explicitly calling fclose, by calling
exit, or by returning from main. Any buffered
data is flushed. If a program stops for some other reason, the status of
files which it had open is undefined.
9.10.4.3. Buffering
There are three types of buffering:
- Unbuffered
- Minimum internal storage is used by stdio in an attempt
to send or receive data as soon as possible.
- Line buffered
- Characters are processed on a line-by-line basis. This
is commonly used in interactive environments, and
internal buffers are flushed only when full or when a
newline is processed.
- Fully buffered
- Internal buffers are only flushed when full.
The buffering associated with a stream can always be flushed by using
fflush explicitly. Support for the various types of
buffering is implementation defined, and can be controlled within these
limits using setbuf and setvbuf.
9.10.5. Direct file manipulation
A number of functions exist to operate on files directly.
#include <stdio.h>
int remove(const char *filename);
int rename(const char *old, const char *new);
char *tmpnam(char *s);
FILE *tmpfile(void);
remove
- Causes a file to be removed. Subsequent attempts to open
the file will fail, unless it is first created again. If
the file is already open, the operation of remove is
implementation defined. The return value is zero for
success, any other value for failure.
rename
- Changes the name of the file identified by old to new.
Subsequent attempts to open the original name will fail,
unless another file is created with the old name. As
with remove, rename returns zero for a
successful operation, any other value indicating a failure.
If a file with the new name exists prior to calling
rename, the behaviour is implementation defined.
If rename fails for any reason, the original file is
unaffected.
tmpnam
- Generates a string that may be used as a filename and is
guaranteed to be different from any existing filename.
It may be called repeatedly, each time generating a new
name. The constant TMP_MAX is used to specify how many
times tmpnam may be called before it can no longer find a
unique name. TMP_MAX will be at least 25. If tmpnam is
called more than this number of times, its behaviour is
implementation defined, but many implementations offer
no practical limit.
If the argument s is set to NULL, then tmpnam uses an
internal buffer to build the name, and returns a pointer
to that. Subsequent calls may alter the same internal
buffer. The argument may instead point to an array of at
least L_tmpnam characters, in which case the name will be
filled into the supplied buffer. Such a filename may
then be created, and used as a temporary file. Since the
name is generated by the function, it is unlikely to be
very useful in any other context. Temporary files of this
nature are not removed, except by direct calls to the
remove function. They are most often used to pass
temporary data between two separate programs.
tmpfile
- Creates a temporary binary file, opened for update, and
returns a pointer to the stream of that file. The file
will be removed when the stream is closed. If no file
could be opened, tmpfile returns a null pointer.
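A small sketch shows rename and remove working together. The function rename_then_remove and the file names given to it are hypothetical; the caller is assumed to pass names that are valid on the host system and that do not refer to files worth keeping.

```c
#include <stdio.h>

/* Create a file called old_name, rename it to new_name, check that the
 * old name can no longer be opened, then remove the file.
 * Returns 0 if every step succeeds, -1 otherwise. */
int rename_then_remove(const char *old_name, const char *new_name)
{
    FILE *fp = fopen(old_name, "w");

    if (fp == NULL)
        return -1;
    fputs("scratch data\n", fp);
    if (fclose(fp) == EOF) {
        remove(old_name);
        return -1;
    }

    if (rename(old_name, new_name) != 0) {
        remove(old_name);       /* rename failed: original is unaffected */
        return -1;
    }

    fp = fopen(old_name, "r");  /* should now fail */
    if (fp != NULL) {
        fclose(fp);
        remove(old_name);
        remove(new_name);
        return -1;
    }
    return remove(new_name) == 0 ? 0 : -1;
}
```

Both functions are tested against zero rather than against a particular failure value, since any non-zero value indicates failure.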
9.10.6. Opening named files
Named files are opened by a call to the fopen function,
whose declaration is this:
#include <stdio.h>
FILE *fopen(const char *pathname, const char *mode);
The pathname argument is the name of the file to open, such
as that returned from tmpnam, or some program-specific
filename.
Files can be opened in a variety of modes, such as
read mode for reading data, write mode for writing
data, and so on.
Note that if you open a file only for writing (the "w" modes), fopen
will create the file if it does not already exist, or
truncate it to zero length (losing its previous contents) if it does
exist.
The Standard list of modes is shown in Table 9.3, although
implementations may permit extra modes by appending extra characters at
the end of the modes.
Mode  | Type of file | Read | Write | Create | Truncate |
"r"   | text         | yes  | no    | no     | no       |
"rb"  | binary       | yes  | no    | no     | no       |
"r+"  | text         | yes  | yes   | no     | no       |
"r+b" | binary       | yes  | yes   | no     | no       |
"rb+" | binary       | yes  | yes   | no     | no       |
"w"   | text         | no   | yes   | yes    | yes      |
"wb"  | binary       | no   | yes   | yes    | yes      |
"w+"  | text         | yes  | yes   | yes    | yes      |
"w+b" | binary       | yes  | yes   | yes    | yes      |
"wb+" | binary       | yes  | yes   | yes    | yes      |
"a"   | text         | no   | yes   | yes    | no       |
"ab"  | binary       | no   | yes   | yes    | no       |
"a+"  | text         | yes  | yes   | yes    | no       |
"a+b" | binary       | yes  | yes   | yes    | no       |
"ab+" | binary       | yes  | yes   | yes    | no       |
Table 9.3. File opening modes
Beware that some implementations of binary files may pad the last record
with NUL characters, so opening them with modes
ab, ab+ or a+b could position the
file position indicator beyond the last data written.
If a file is opened in append mode, all writes will occur at the
end of the file, regardless of attempts to move the file position indicator
with fseek. The initial position of the file position
indicator will be implementation defined.
Attempts to open a file in read mode, indicated by an 'r' as
the first character in the mode string, will fail if the file does not
already exist or can't be read.
Files opened for update (‘+’ as the second or
third character of mode) may be both read and written, but a read may not
immediately follow a write, or a write follow a read, without an
intervening call to one (or more) of fflush,
fseek, fsetpos or rewind. The
only exception is that a write may immediately follow a read if
EOF was read.
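The rule about switching between reading and writing is easy to forget, so a sketch may help. The function update_in_place and the file name given to it are hypothetical. A binary update mode is used here because, as noted earlier, writing to a text stream may truncate the file on some implementations.

```c
#include <stdio.h>

/* Create a file containing "hello\n", read the line back, overwrite its
 * first character with 'J' and read it again into line.
 * Returns 0 on success (line then holds "Jello\n"), -1 on failure.
 * size should be at least 7. */
int update_in_place(const char *path, char *line, int size)
{
    FILE *fp = fopen(path, "w+b");      /* binary stream, update mode */
    int ok = 0;

    if (fp == NULL)
        return -1;
    if (fputs("hello\n", fp) != EOF) {
        rewind(fp);                     /* a write may not be followed
                                           directly by a read */
        if (fgets(line, size, fp) != NULL
            && fseek(fp, 0L, SEEK_SET) == 0   /* nor a read by a write */
            && fputc('J', fp) != EOF) {
            rewind(fp);                 /* back to reading again */
            ok = fgets(line, size, fp) != NULL;
        }
    }
    fclose(fp);
    remove(path);
    return ok ? 0 : -1;
}
```

Leaving out either positioning call gives undefined behaviour, even though many implementations will appear to work.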
It may also be possible in some implementations to omit the
b in the binary modes, using the same modes for text and
binary files.
Streams opened by fopen are fully buffered only if they are not
connected to an interactive device; this ensures that prompts and
responses are handled properly.
If fopen fails to open a file, it returns a null pointer;
otherwise, it returns a pointer to the object controlling the stream.
The stdin, stdout and stderr
objects are not necessarily modifiable and it may not be possible to use
the value returned from fopen for assignment to one of
them. For this reason, freopen is provided.
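The difference between the write and append modes can be seen in a short sketch. The function write_append_count and its file name argument are hypothetical; every call of fopen is checked for a null pointer before the stream is used.

```c
#include <stdio.h>

/* Write one line with mode "w", add a second with mode "a", then count
 * the lines by reading the file with mode "r".
 * Returns the number of lines found, or -1 on failure. */
int write_append_count(const char *path)
{
    FILE *fp;
    int c, lines = 0;

    fp = fopen(path, "w");      /* creates the file or truncates it */
    if (fp == NULL)
        return -1;
    fputs("first line\n", fp);
    if (fclose(fp) == EOF)
        return -1;

    fp = fopen(path, "a");      /* existing contents are preserved */
    if (fp == NULL)
        return -1;
    fputs("second line\n", fp);
    if (fclose(fp) == EOF)
        return -1;

    fp = fopen(path, "r");      /* fails if the file does not exist */
    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;
    fclose(fp);
    remove(path);
    return lines;
}
```

Had the second fopen used "w" instead of "a", the first line would have been discarded and the count would be one.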
9.10.7. Freopen
The freopen function is used to take an existing stream
pointer and associate it with another named file:
#include <stdio.h>
FILE *freopen(const char *pathname,
const char *mode, FILE *stream);
The mode argument is the same as for fopen.
The stream is closed first, and any errors from the close
are ignored. On error, NULL is returned, otherwise the new
value for stream is returned.
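The most common use of freopen is to reattach one of the standard streams, with a call along the lines of freopen("log.txt", "w", stdout), where the file name is only an example. The sketch below uses an ordinary stream instead, so that it can be tried without losing terminal output; the function redirect_stream and its arguments are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Open the file first for writing, then use freopen to switch the same
 * stream to the file second, and finally to reopen second for reading.
 * Returns 0 if the text written to second is read back, -1 otherwise. */
int redirect_stream(const char *first, const char *second)
{
    char line[32];
    int ok;
    FILE *fp = fopen(first, "w");

    if (fp == NULL)
        return -1;
    fputs("to first\n", fp);

    fp = freopen(second, "w", fp);      /* closes first, opens second */
    if (fp == NULL) {                   /* the old stream is closed anyway */
        remove(first);
        return -1;
    }
    fputs("to second\n", fp);

    fp = freopen(second, "r", fp);      /* same stream, new mode */
    if (fp == NULL) {
        remove(first);
        remove(second);
        return -1;
    }
    ok = fgets(line, sizeof line, fp) != NULL
         && strcmp(line, "to second\n") == 0;
    fclose(fp);
    remove(first);
    remove(second);
    return ok ? 0 : -1;
}
```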
9.10.8. Closing files
An open file is closed using fclose.
#include <stdio.h>
int fclose(FILE *stream);
Any unwritten data buffered for stream is flushed out and
any unread data is thrown away. If a buffer had been
automatically allocated for the stream, it is freed. The
file is then closed.
Zero is returned on success, EOF if any error occurs.
9.10.9. Setbuf, setvbuf
These two functions are used to change the buffering strategy for an open
stream:
#include <stdio.h>
int setvbuf(FILE *stream, char *buf,
int type, size_t size);
void setbuf(FILE *stream, char *buf);
They must be used before the file is either read from or
written to. The type argument defines how the
stream will be buffered (see Table 9.4).
Value  | Effect |
_IONBF | Do not buffer I/O |
_IOFBF | Fully buffer I/O |
_IOLBF | Line buffer: flush buffer when full, when newline is written or when a read is requested. |
Table 9.4. Type of buffering
The buf argument can be a null pointer, in which case an
array is automatically allocated to hold the buffered data. Otherwise,
the user can provide a buffer, but should ensure that its lifetime is at
least as long as that of the stream: a common mistake is to
use automatic storage allocated inside a compound statement; in correct
usage it is usual to obtain the storage from malloc instead.
The size of the buffer is specified by the size
argument.
A call of setbuf is exactly the same as a call of
setvbuf with _IOFBF for the type
argument, and BUFSIZ for the size argument. If
buf is a null pointer, the value _IONBF is
used for type instead.
No value is returned by setbuf; setvbuf
returns zero on success, non-zero if invalid values are provided for
type or size, or the request cannot be complied
with.
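One way of satisfying the lifetime requirement without malloc is to give the buffer static storage duration, as in this sketch. The names demo_buf and write_fully_buffered are invented for the example; note that setvbuf is called immediately after fopen, before any other operation on the stream.

```c
#include <stdio.h>

static char demo_buf[BUFSIZ];   /* static storage: outlives the stream */

/* Open path for writing, request full buffering with a caller-supplied
 * buffer, write one line and close. Returns 0 on success, -1 on failure. */
int write_fully_buffered(const char *path)
{
    FILE *fp = fopen(path, "w");
    int status;

    if (fp == NULL)
        return -1;
    /* Must precede any read or write on the stream. */
    if (setvbuf(fp, demo_buf, _IOFBF, sizeof demo_buf) != 0) {
        fclose(fp);
        remove(path);
        return -1;
    }
    fputs("buffered output\n", fp);     /* probably held in demo_buf */
    status = fclose(fp);                /* flushes the buffer */
    remove(path);
    return status == EOF ? -1 : 0;
}
```

A single static buffer like this can serve only one stream at a time; a program with several buffered streams would need one buffer for each.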
9.10.10. Fflush
#include <stdio.h>
int fflush(FILE *stream);
If stream refers to a file opened for output or update, any
unwritten data is ‘written’ out. Exactly what that means is
a function of the host environment, and C cannot guarantee, for example,
that data immediately reaches the surface of a disk which might be
supporting the file. If the stream is associated with a file opened
for input or update, any preceding ungetc operation is
forgotten.
The most recent operation on the stream must have been an output
operation; if not, the behaviour is undefined.
A call of fflush with an argument of zero flushes every
output or update stream. Care is taken to avoid those streams that
have not had an output as their last operation, thus avoiding the undefined
behaviour mentioned above.
EOF is returned if an error occurs, otherwise zero.
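A typical reason to call fflush is to make sure that a prompt with no trailing newline appears before the program waits for input. The following is a minimal sketch; show_prompt is a hypothetical name.

```c
#include <stdio.h>

/* Write text to the standard output without a newline and flush it, so
 * that it is passed to the host environment straight away.
 * Returns zero on success, EOF if either step fails. */
int show_prompt(const char *text)
{
    if (fputs(text, stdout) == EOF)
        return EOF;
    return fflush(stdout);
}
```

It might be used as show_prompt("Name: ") immediately before reading a reply from stdin.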