7.2. How the preprocessor works
Although the preprocessor (Figure 7.1) is probably going
to be implemented as an integral part of a Standard C compiler, it can
equally well be thought of as a separate program which transforms C source
code containing preprocessor directives into source code with the
directives removed.
It's important to remember that the preprocessor does not work to the
same rules as the rest of C. It works on a line-by-line basis, so the end
of a line means something special to it: a directive ends where its line
ends. The rest of C thinks that end-of-line is little different from a
space or tab character.
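The difference in how line ends are treated can be seen in a small sketch. The names SUM_OF_THREE, sum_demo and check_sum_demo are invented for this illustration, and the backslash-newline used to continue the directive is an assumption in the sense that it is not described in the text above.

```c
#include <assert.h>

/* A directive normally ends at the end of its line. Backslash-newline
   joins the next physical line on, so this is one logical line. */
#define SUM_OF_THREE(a, b, c) \
    ((a) + (b) + (c))

/* The rest of C treats a newline like a space or tab, so this
   function can be split across lines almost anywhere between tokens. */
int
sum_demo(void)
{
    int
        total
        =
        SUM_OF_THREE(1, 2, 3);

    return total;
}

/* Hypothetical self-check: 1 + 2 + 3 is 6. */
void check_sum_demo(void)
{
    assert(sum_demo() == 6);
}
```

Removing the backslash from the #define line would end the directive there, leaving the parenthesized expression on the following line as ordinary (and invalid) C text.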
The preprocessor doesn't know about the scope rules of C. Preprocessor
directives like #define take effect as soon as they are seen
and remain in effect until they are removed with #undef or the end of the file being compiled is reached; the
program's block structure is irrelevant. This is one of the reasons why
it's a good idea to make sparing use of these directives. The less you
have in your program that doesn't obey the ‘normal’ scope rules,
the less likely you are to make mistakes. This is mainly what gives rise
to our comments about the poor level of integration between the
preprocessor and the rest of C.
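A minimal sketch of how a definition ignores block structure follows. The macro LIMIT and the functions first, second, third and check_scope_demo are hypothetical names chosen for the example; #undef is used here only to show the point at which the definition stops applying.

```c
#include <assert.h>

int first(void)
{
    /* Defined inside a block, but the preprocessor takes no notice
       of the braces around it. */
#define LIMIT 20
    return LIMIT;
}

int second(void)
{
    /* LIMIT is still a macro here, although first() has ended. */
    return LIMIT;
}

#undef LIMIT    /* from here on, LIMIT is no longer a macro */

int third(void)
{
    int LIMIT = 5;  /* now an ordinary local variable, obeying C scope */
    return LIMIT;
}

/* Hypothetical self-check of the three results. */
void check_scope_demo(void)
{
    assert(first() == 20);
    assert(second() == 20);
    assert(third() == 5);
}
```

Had the #undef been left out, the declaration in third would have been rewritten to `int 20 = 5;` before the compiler proper saw it, which is the kind of mistake that sparing use of these directives helps to avoid.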
The Standard gives some complicated rules for the syntax of the
preprocessor, especially with respect to tokens. To understand
the operation of the preprocessor you need to know a little about them.
The text that is being processed is not considered to be a uniform stream
of characters, but is separated into tokens then processed piecemeal.
For a full definition of the process, it is best to refer to the
Standard, but an informal description follows. Each of the terms used to
head the list below is used later in descriptions of the rules.
- header-name
  ‘<’ almost any character ‘>’
- preprocessing-token
  - a header-name as above, but only when the subject of #include,
  - or an identifier, which is any C identifier or keyword,
  - or a constant, which is any integral or floating constant,
  - or a string-literal, which is a normal C string,
  - or an operator, which is one of the C operators,
  - or one of [ ] ( ) { } * , : = ; ... # (punctuators),
  - or any non-white-space character not covered by the list above.
The ‘almost any character’ above means any character
except ‘>’ or newline.
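One practical consequence of working on tokens rather than raw characters can be sketched as follows. The macro N and the functions token_demo and check_token_demo are invented for the illustration; the sketch assumes only the informal token categories listed above.

```c
#include <assert.h>
#include <string.h>   /* <string.h> is a header-name: it follows #include */

#define N 10

int token_demo(void)
{
    int NN = 3;            /* NN is one identifier token, not N twice,
                              so it is not replaced */
    const char *s = "N";   /* the string-literal is a single token,
                              so the N inside it is left alone */
    int lt = (1 < 2);      /* away from #include, < is an operator */

    /* Only the stand-alone identifier N below is replaced by 10. */
    return N + NN + (int)strlen(s) + lt;   /* 10 + 3 + 1 + 1 */
}

/* Hypothetical self-check of the arithmetic above. */
void check_token_demo(void)
{
    assert(token_demo() == 15);
}
```

If the text were treated as a uniform stream of characters, a naive replacement of N by 10 would have damaged both the identifier NN and the string; separating the text into tokens first is what prevents this.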