2.8. Expressions and arithmetic
Expressions in C can get rather complicated because of the number of
different types and operators that can be mixed together. This section
explains what happens, but can get deep at times. You may need to re-read
it once or twice to make sure that you have understood all of the
points.
First, a bit of terminology. Expressions in C are built from
combinations of operators and operands, so for
example in this expression
x = a+b*(-c)
we have the operators =, +, * and -. The operands are the
variables x, a, b and c. You will also have noticed that
parentheses can be used for grouping sub-expressions such as
the -c. Most of
C's unusually rich set of operators are either binary operators,
which take two operands, or unary operators, which take only
one. In the example, the - was being used as a unary
operator, and is performing a different task from the binary subtraction
operator which uses the same - symbol. It may seem like
hair-splitting to argue that they are different operators when the job that
they do seems conceptually the same, or at least similar. It's worth doing
though, because, as you will find later, some of the operators have both a
binary and a unary form where the two meanings bear no relation to each
other; a good example would be the binary multiplication
operator * , which in its unary form means indirection via
a pointer variable!
A peculiarity of C is that operators may appear consecutively in
expressions without the need for parentheses to separate them. The previous
example could have been written as
x = a+b*-c;
and still have been a valid expression. Because of the number of
operators that C has, and because of the strange way that assignment works,
the precedence of the operators (and their
associativity) is of much greater importance to the
C programmer than in most other languages. It will be discussed fully
after the introduction of the important arithmetic operators.
Before that, we must investigate the type conversions that may occur.
2.8.1. Conversions
C allows types to be mixed in expressions, and permits operations that
result in type conversions happening implicitly. This section describes
the way that the conversions must occur. Old C programmers should read
this carefully, because the rules have changed — in particular, the
promotion of float to double , the promotions of
short integral types and the introduction of value preserving
rules are genuinely different in Standard C.
Although it isn't directly relevant at the moment, we must note that the
integral and the floating types are jointly known as arithmetic
types and that C also supports other types (notably pointer types).
The rules that we discuss here are appropriate only in expressions that
have arithmetic types throughout - additional rules come into play when
expressions mix pointer types with arithmetic types and these are
discussed much later.
There are various types of conversion in arithmetic expressions:
- The integral promotions
- Conversions between integral types
- Conversions between floating types
- Conversions between floating and integral types
Conversions between floating (real) types were discussed in Section 2.8; what we do next is to specify how the other conversions are
to be performed, then look at when they are required. You will
need to learn them by heart if you ever intend to program seriously
in C.
The Standard has, among some controversy, introduced what are known as
value preserving rules, where a knowledge of the target
computer is required to work out what the type of an expression will be.
Previously, whenever an unsigned type occurred in an expression, you knew
that the result had to be unsigned too. Now, the result will
only be unsigned if the conversions demand it; in many cases
the result will be an ordinary signed type.
The reason for the change was to reduce some of the surprises possible
when you mix signed and unsigned quantities together; it isn't always
obvious when this has happened and the intention is to produce the
‘more commonly required’ result.
2.8.1.1. Integral promotions
No arithmetic is done by C at a precision shorter than
int , so these conversions are implied almost whenever you
use one of the objects listed below in an expression. The conversion is
defined as follows:
- Whenever a short or a char (or a bitfield or enumeration type,
  which we haven't met yet) has the integral promotions applied:
  - if an int can hold all of the values of the original type, then
    the value is converted to int;
  - otherwise, the conversion will be to unsigned int.
This preserves both the value and the sign of the original type. Note
that whether a plain char is treated as signed or unsigned
is implementation dependent.
These promotions are applied very often—they are applied as part
of the usual arithmetic conversions, and to the operands of
the shift, unary + , - , and ~
operators. They are also applied when the expression in question is an
argument to a function but no type information has been provided as part
of a function prototype, as explained in Chapter 4.
2.8.1.2. Signed and unsigned integers
A lot of conversions between different types of integers are caused by
mixing the various flavours of integers in expressions. Whenever these
happen, the integral promotions will already have been done. For all of
them, if the new type can hold all of the values of the old type, then
the value remains unchanged.
When converting from a signed integer to an unsigned integer whose
length is equal to or longer than the original type, then if the signed
value was nonnegative, its value is unchanged. If the value was negative,
then it is converted to the signed form of the longer type and then made
unsigned by conceptually adding it to one greater than the maximum that
can be held in the unsigned type. In a two's complement system, this
preserves the original bit-pattern for positive numbers and guarantees
‘sign-extension’ of negative numbers.
Whenever an integer is converted into a shorter unsigned type, there
can be no ‘overflow’, so the result is defined to be ‘the
non-negative remainder on division by the number one greater than the
largest unsigned number that can be represented in the shorter type’.
That simply means that in a two's complement environment the low-order
bits are copied into the destination and the high-order ones
discarded.
Converting an integer to a shorter signed type runs into trouble if
there is not enough room to hold the value. In that case, the result is
implementation defined (although most old-timers would expect that simply
the low-order bit pattern is copied).
That last item could be a bit worrying if you remember the integral
promotions, because you might interpret it as follows—if I assign
a char to another char , then the one on the
right is first promoted to one of the kinds of int ; could
doing the assignment result in converting (say) an int to a
char and provoking the ‘implementation defined’ clause? The answer
is no, because assignment is specified not to involve the integral
promotions, so you are safe.
2.8.1.3. Floating and integral
Converting a floating to an integral type simply throws away any
fractional part. If the integral type can't hold the value that is left,
then the behaviour is undefined—this is a sort of overflow.
As has already been said, going up the scale from float to
double to long double , there is no problem with
conversions—each higher one in the list can hold all the values of
the lower ones, so the conversion occurs with no loss of information.
Converting in the opposite direction, if the value is outside the range
that can be held, the behaviour is undefined. If the value is in
range, but can't be held exactly, then the result is one of the two
nearest values that can be held, chosen in a way that the
implementation defines. This means that there will be a loss of
precision.
2.8.1.4. The usual arithmetic conversions
A lot of expressions involve the use of subexpressions of mixed types
together with operators such as + , * and
so on. If the operands in an expression have different types, then there
will have to be a conversion applied so that a common resulting type can
be established; these are the conversions:
- If either operand is a long double, then the other one is converted
  to long double and that is the type of the result.
- Otherwise, if either operand is a double, then the other one is
  converted to double, and that is the type of the result.
- Otherwise, if either operand is a float, then the other one is
  converted to float, and that is the type of the result.
- Otherwise the integral promotions are applied to both operands and
  the following conversions are applied:
  - If either operand is an unsigned long int, then the other one is
    converted to unsigned long int, and that is the type of the result.
  - Otherwise, if one operand is a long int and the other is an
    unsigned int, then the result is long int if a long int can hold
    every unsigned int value; if it cannot, both operands are
    converted to unsigned long int.
  - Otherwise, if either operand is a long int, then the other one is
    converted to long int, and that is the type of the result.
  - Otherwise, if either operand is an unsigned int, then the other
    one is converted to unsigned int, and that is the type of the
    result.
  - Otherwise, both operands must be of type int, so that is the type
    of the result.
The Standard contains a strange sentence: ‘The values of floating
operands and of the results of floating expressions may be represented in
greater precision and range than that required by the type; the types are
not changed thereby’. This is in fact to allow the Old C
treatment of floats . In Old C, float
variables were automatically promoted to double , the way
that the integral promotions promote char to
int . So, an expression involving purely float
variables may be done as if they were double , but the type
of the result must appear to be float . The only effect is
likely to be on performance and is not particularly important to most
users.
Whether or not conversions need to be applied, and if so which ones, is
discussed at the point where each operator is introduced.
In general, the type conversions and type mixing rules don't cause a
lot of trouble, but there is one pitfall to watch out for. Mixing signed
and unsigned quantities is fine until the signed number is negative; then
its value can't be represented in an unsigned variable and something has
to happen. The Standard says that to convert a negative number to
unsigned, one more than the largest number that can be held in the
unsigned type is added to the negative number; that is the result. Because
there can be no overflow in an unsigned type, the result always has a
defined value. Taking a 16-bit int for an example, the
unsigned version has a range of 0–65535. Converting a signed value
of -7 to this type involves adding 65536, resulting
in 65529. What is happening is that the Standard is enshrining
previous practice, where the bit pattern in the signed number is simply
assigned to the unsigned number; the description in the Standard is
exactly what would happen if you did perform the bit pattern assignment
on a two's complement computer. The one's complement implementations are
going to have to do some real work to get the same result.
Putting it plainly, a small magnitude negative number will result in a
large positive number when converted to unsigned. If you don't like it,
suggest a better solution—it is plainly a mistake to try to assign
a negative number to an unsigned variable, so it's your own fault.
Well, it's easy to say ‘don't do it’, but it can happen by
accident and the results can be very surprising. Look at this
example.
#include <stdio.h>
#include <stdlib.h>

int main(void){
        int i;
        unsigned int stop_val;

        stop_val = 0;
        i = -10;
        while(i <= stop_val){
                printf("%d\n", i);
                i = i + 1;
        }
        exit(EXIT_SUCCESS);
}
Example 2.7
You might expect that to print out the list of values
from -10 to 0 , but it won't. The
problem is in the comparison. The variable i , with a
value of -10 , is being compared against an
unsigned 0 . By the rules of arithmetic (check them) we
must convert both types to unsigned int first, then make the
comparison. The -10 becomes at
least 65526 (see <limits.h> ) when
it's converted, and is plainly somewhat larger than 0 ,
so the loop is never executed. The moral is to steer clear of unsigned
numbers unless you really have to use them, and to be perpetually on
guard when they are mixed with signed numbers.
2.8.1.5. Wide characters
The Standard, as we've already said, now makes allowances for extended
character sets. You can either use the shift-in shift-out encoding method
which allows the multibyte characters to be stored in ordinary C strings
(which are really arrays of chars , as we explore later), or
you can use a representation that uses more than one byte of storage per
character for every character. The use of shift sequences only works if
you process the characters in strict order; it is next to useless if you
want to create an array of characters and access them in non-sequential
order, since the actual index of each char in the array and
the logical index of each of the encoded characters are not easily
determined. Here's the illustration we used before, annotated with the
actual and the logical array indexes:
0  1  2  3     4  5  6  7     8  9    (actual array index)
a  b  c  <SI>  a  b  g  <SO>  x  y
0  1  2        3  4  5        6  7    (logical index)
We're still in trouble even if we do manage to use the index
of 5 to access the ‘correct’ array entry, since
the value retrieved is indistinguishable from the value that encodes the
letter ‘g ’ anyhow. Clearly, a better approach for
this sort of thing is to come up with a distinct value for all of the
characters in the character set we are using, which may involve more bits
than will fit into a char, and to be able to store each one as a separate
item without the use of shifts or other position-dependent techniques.
That is what the wchar_t type is for.
Although it is always a synonym for one of the other integral types,
wchar_t (whose definition is found in
<stddef.h> ) is defined to be the
implementation-dependent type that should be used to hold extended
characters when you need an array of them. The Standard makes the
following guarantees about the values in a wide character:
- A wchar_t can hold distinct values for each member of the largest
  character set supported by the implementation.
- The null character has the value of zero.
- Each member of the basic character set (see Section 2.2.1) is encoded
  in a wchar_t with the same value as it has in a char.
There is further support for this method of encoding characters.
Strings, which we have already seen, are implemented as arrays of
char , even though they look like this:
"a string"
To get strings whose type is wchar_t , simply prefix a
string with the letter L . For example:
L"a string"
In the two examples, it is very important to understand the
differences. Strings are implemented as arrays and although it might look
odd, it is entirely permissible to use array indexing on them:
"a string"[4]
L"a string"[4]
are both valid expressions. The first results in an expression whose
type is char and whose value is the internal representation
of the letter ‘r ’ (remember arrays index from
zero, not one). The second has the type wchar_t and also has
the value of the internal representation of the
letter ‘r ’.
It gets more interesting if we are using extended characters. If we use
the notation <a> , <b> , and so
on to indicate ‘additional’ characters beyond the normal character
set which are encoded using some form of shift technique, then these
examples show the problems.
"abc<a><b>"[3]
L"abc<a><b>"[3]
The second one is easiest: it has a type of wchar_t and
the appropriate internal encoding for
whatever <a> is supposed to be—say the
Greek letter alpha. The first one is unpredictable. Its type is
unquestionably char , but its value is probably the value of
the ‘shift-in’ marker.
As with strings, there are also wide character constants.
'a'
has type char and the value of the encoding for the
letter ‘a ’.
L'a'
is a constant of type wchar_t . If you use a multibyte
character in the first one, then you have the same sort of thing as if
you had written
'xy'
—multiple characters in a character constant (actually, this is
valid but means something funny). A single multibyte character in the
second example will simply be converted into the appropriate
wchar_t value.
If you don't understand all the wide character stuff, then all we can
say is that we've done our best to explain it. Come back and read it
again later, when it might suddenly click. In practice it does manage to
address the support of extended character sets in C and once you're
used to it, it makes a lot of sense.
Exercise 2.15. Assuming that chars , ints and
longs are respectively 8, 16 and 32 bits
long, and that char defaults to unsigned char
on a given system, what is the resulting type of expressions involving
the following combinations of variables, after the usual arithmetic
conversions have been applied?
- Simply signed char.
- Simply unsigned char.
- int, unsigned int.
- unsigned int, long.
- int, unsigned long.
- char, long.
- char, float.
- float, float.
- float, long double.
2.8.1.6. Casts
From time to time you will find that an expression turns out not to
have the type that you wanted it to have and you would like to force it
to have a different type. That is what casts are for. By
putting a type name in parentheses, for example
(int)
you create a unary operator known as a cast. A cast turns
the value of the expression on its right into the indicated type. If, for
example, you were dividing two integers a/b then the
expression would use integer division and discard any remainder. To force
the fractional part to be retained, you could either use some
intermediate float variables, or a cast. This example does it both
ways.
#include <stdio.h>
#include <stdlib.h>

/*
 * Illustrates casts.
 * For each of the numbers between 2 and 20,
 * print the percentage difference between it and the one
 * before
 */
int main(void){
        int curr_val;
        float temp, pcnt_diff;

        curr_val = 2;
        while(curr_val <= 20){
                /*
                 * % difference is
                 * 1/(curr_val)*100
                 */
                temp = curr_val;
                pcnt_diff = 100/temp;
                printf("Percent difference at %d is %f\n",
                        curr_val, pcnt_diff);
                /*
                 * Or, using a cast:
                 */
                pcnt_diff = 100/(float)curr_val;
                printf("Percent difference at %d is %f\n",
                        curr_val, pcnt_diff);
                curr_val = curr_val + 1;
        }
        exit(EXIT_SUCCESS);
}
Example 2.8
The easiest way to remember how to write a cast is to write down
exactly what you would use to declare a variable of the type that you
want. Put parentheses around the entire declaration, then delete the
variable name; that gives you the cast. Table 2.6 shows a
few simple examples—some of the types shown will be new to you,
but it's the complicated ones that illustrate best how casts are written.
Ignore the ones that you don't understand yet, because you will be able
to use the table as a reference later.
Declaration  | Cast         | Type
int x;       | (int)        | int
float f;     | (float)      | float
char x[30];  | (char [30])  | array of char
int *ip;     | (int *)      | pointer to int
int (*f)();  | (int (*)())  | pointer to function returning int
Table 2.6. Casts
2.8.2. Operators
2.8.2.1. The multiplicative operators
Or, put another way, multiplication * ,
division / and the remainder
operator %. Multiplication and division do what is
expected of them for both real and integral types, with integral division
producing a truncated result. When neither operand is negative the
truncation is towards zero; the negative cases are covered below. The
remainder operator is only defined to work with integral types, because
the division of real numbers supposedly doesn't produce a remainder.
If the division is not exact and neither operand is negative, the
result of / is positive and rounded toward zero—to
get the remainder, use % . For example,
9/2 == 4
9%2 == 1
If either operand is negative, the result of / may be
the nearest integer to the true result on either side, and the sign of
the result of % may be positive or negative. Both of
these features are implementation defined.
It is always true that the following expression is equal to zero:
(a/b)*b + a%b - a
unless b is zero.
The usual arithmetic conversions are applied to both of the
operands.
2.8.2.2. Additive operators
Addition + and subtraction - also
follow the rules that you expect. The binary operators and the unary
operators both have the same symbols, but rather different meanings. For
example, the expressions a+b and a-b
both use a binary operator (the +
or - operators), and result in addition or subtraction.
The unary operators with the same symbols would be
written +b or -b .
The unary minus has an obvious function—it takes the negative
value of its operand; what does the unary plus do? In fact the answer is
almost nothing. The unary plus is a new addition to the language, which
balances the presence of the unary minus, but doesn't have any effect on
the value of the expression. Very few Old C users even noticed that
it was missing.
The usual arithmetic conversions are applied to both of the operands of
the binary forms of the operators. Only the integral promotions are
performed on the operands of the unary forms of the operators.
2.8.2.3. The bitwise operators
One of the great strengths of C is the way that it allows systems
programmers to do what had, before the advent of C, always been
regarded as the province of the assembly code programmer. That sort of
code was by definition highly non-portable. As C demonstrates, there
isn't any magic about that sort of thing, and into the bargain it turns
out to be surprisingly portable. What is it? It's what is often referred
to as ‘bit-twiddling’—the manipulation of individual bits in
integer variables. None of the bitwise operators may be used on real
operands because they aren't considered to have individual or accessible
bits.
There are six bitwise operators, listed in Table 2.7,
which also shows the arithmetic conversions that are applied.
Operator | Effect           | Conversions
&        | bitwise AND      | usual arithmetic conversions
|        | bitwise OR       | usual arithmetic conversions
^        | bitwise XOR      | usual arithmetic conversions
<<       | left shift       | integral promotions
>>       | right shift      | integral promotions
~        | one's complement | integral promotions
Table 2.7. Bitwise operators
Only the last, the one's complement, is a unary operator. It inverts
the state of every bit in its operand and has the same effect as the
unary minus on a one's complement computer. Most modern computers work
with two's complement, so it isn't a waste of time having it there.
Illustrating the use of these operators is easier if we can use
hexadecimal notation rather than decimal, so now is the time to see
hexadecimal constants. Any number written with 0x at
its beginning is interpreted as hexadecimal; both 15
and 0xf (or 0XF ) mean the same thing.
Try running this or, better still, try to predict what it does first and
then try running it.
#include <stdio.h>
#include <stdlib.h>

int main(void){
        /* unsigned, so that %x is given the type it expects and the
         * left shift is well defined for every value of x */
        unsigned int x,y;

        x = 0; y = ~0u;
        while(x != y){
                printf("%x & %x = %x\n", x, 0xff, x&0xff);
                printf("%x | %x = %x\n", x, 0x10f, x|0x10f);
                printf("%x ^ %x = %x\n", x, 0xf00f, x^0xf00f);
                printf("%x >> 2 = %x\n", x, x >> 2);
                printf("%x << 2 = %x\n", x, x << 2);
                x = (x << 1) | 1;
        }
        exit(EXIT_SUCCESS);
}
Example 2.9
The way that the loop works in that example is the first thing to
study. The controlling variable is x , which is
initialized to zero. Every time round the loop it is compared
against y , which has been set to a word-length
independent pattern of all 1 s by taking the one's
complement of zero. At the bottom of the loop, x is
shifted left once and has 1 ORed into it, giving rise to a sequence
that starts 0 , 1 , 11 ,
111 , … in binary.
For each of the AND, OR, and XOR (exclusive OR) operators,
x is operated on by the operator and some other
interesting operand, then the result printed.
The left and right shift operators are in there too, giving a result
which has the type and value of their left-hand operand shifted in the
required direction a number of places specified by their right-hand
operand; the type of both of the operands must be integral. Bits shifted
off either end of the left operand simply disappear. Shifting by a
negative amount, or by at least as many bits as there are in the
promoted left operand, gives undefined behaviour.
Shifting left guarantees to shift zeros into the low-order bits.
Right shift is fussier. Your implementation is allowed to choose
whether, when shifting signed operands, it performs a logical or
arithmetic right shift. This means that a logical shift shifts zeros into
the most significant bit positions; an arithmetic shift copies the
current contents of the most significant bit back into itself. The
position is clearer if an unsigned operand is right shifted, because
there is no choice: it must be a logical shift. For that reason, whenever
right shift is being used, you would expect to find that the thing being
shifted had been declared to be unsigned, or cast to unsigned for the
shift, as in the example:
int i,j;
i = (unsigned)j >> 4;
The second (right-hand) operand of a shift operator does not have to be
a constant; any integral expression is legal. Importantly, the rules
involving mixed types of operands do not apply to the shift operators.
The result of the shift has the same type as the thing that got shifted
(after the integral promotions), and depends on nothing else.
Now something different; one of those little tricks that
C programmers find helps to write better programs. If for any reason
you want to form a value that has 1 s in all but its
least significant so-many bits, which are to have some other pattern in
them, you don't have to know the word length of the machine. For example,
to set the low order bits of an int
to 0x0f0 and all the other bits to 1 ,
this is the way to do it:
int some_variable;
some_variable = ~0xf0f;
The one's complement of the desired low-order bit pattern has been
one's complemented. That gives exactly the required result and is
completely independent of word length; it is a very common sight in
C code.
There isn't a lot more to say about the bit-twiddling operators, and
our experience of teaching C has been that most people find them
easy to learn. Let's move on.
2.8.2.4. The assignment operators
No, that isn't a mistake, ‘operators’ was meant to be plural.
C has several assignment operators, even though we have only seen
the plain = so far. An interesting thing about them is
that they are all like the other binary operators; they take two operands
and produce a result, the result being usable as part of an expression.
In this statement
x = 4;
the value 4 is assigned to x . The
result has the type of x and the value that was
assigned. It can be used like this
a = (x = 4);
where a will now have the value 4
assigned to it, after x has been assigned to. All of
the simpler assignments that we have seen until now (except for one
example) have simply discarded the resulting value of the assignment,
even though it is produced.
It's because assignment has a result that an expression like
a = b = c = d;
works. The value of d is assigned
to c , the result of that is assigned
to b and so on. It makes use of the fact that
expressions involving only assignment operators are evaluated from right
to left, but is otherwise like any other expression. (The rules
explaining what groups right to left and vice versa are given in
Table 2.9.)
If you look back to the section describing ‘conversions’, there
is a description of what happens if you convert longer types to shorter
types: that is what happens when the left-hand operand of an assignment
is shorter than the right-hand one. No conversions are applied to the
right-hand operand of the simple assignment operator.
The remaining assignment operators are the compound assignment
operators. They allow a useful shorthand, where an assignment containing
the same left- and right-hand sides can be compressed; for example
x = x + 1;
can be written as
x += 1;
using one of the compound assignment operators. The result is the same
in each case. It is a useful thing to do when the left-hand side of the
operator is a complicated expression, not just a variable; such things
occur when you start to use arrays and pointers. Most experienced C
programmers tend to use the form given in the second example because
somehow it ‘feels better’, a sentiment that no beginner has ever
been known to agree with. Table 2.8 lists the compound
assignment operators; you will see them used a lot from now on.
*=   /=   %=   +=   -=
&=   |=   ^=   >>=  <<=
Table 2.8. Compound assignment operators
In each case, arithmetic conversions are applied as if the expression
had been written out in full, for example as if a+=b
had been written a=a+b .
Reiterating: the result of an assignment operator has both the value
and the type of the object that was assigned to.
2.8.2.5. Increment and decrement operators
It is so common to simply add or subtract 1 in an expression that C has
two special unary operators to do the job. The increment
operator ++ adds 1 , the
decrement -- subtracts 1 . They are
used like this:
x++;
++x;
x--;
--x;
where the operator can come either before or after its operand. In the
cases shown it doesn't matter where the operator comes, but in more
complicated cases the difference has a definite meaning and must be used
properly.
Here is the difference being used.
#include <stdio.h>
#include <stdlib.h>

int main(void){
        int a,b;

        a = b = 5;
        printf("%d\n", ++a+5);
        printf("%d\n", a);
        printf("%d\n", b++ +5);
        printf("%d\n", b);
        exit(EXIT_SUCCESS);
}
Example 2.10
The results printed were
11
6
10
6
The difference is caused by the different positions of the operators.
If the inc/decrement operator appears in front of the variable, then its
value is changed by one and the new value is used in the
expression. If the operator comes after the variable, then the
old value is used in the expression and the variable's value is
changed afterwards.
C programmers never add or subtract one with statements like this
x += 1;
they invariably use one of
x++; /* or */ ++x;
as a matter of course. A warning is in order though: it is not safe to
use a variable more than once in an expression if it has one of these
operators attached to it. There is no guarantee of when, within an
expression, the affected variable will actually change value. The
compiler might choose to ‘save up’ all of the changes and apply
them at once, so an expression like this
y = x++ + --x;
does not guarantee to assign twice the original value
of x to y . It might be evaluated as
if it expanded to this instead:
y = x + (x-1);
because the compiler notices that the overall effect on the value
of x is zero.
The arithmetic is done exactly as if the full addition expression had
been used, for example x=x+1 , and the usual arithmetic
conversions apply.
Exercise 2.16. Given the following variable definitions
int i1, i2;
float f1, f2;
- How would you find the remainder when i1 is divided by i2?
- How would you find the remainder when i1 is divided by the value
  of f1, treating f1 as an integer?
- What can you predict about the sign of the remainders calculated in
  the previous two questions?
- What meanings can the - operator have?
- How would you turn off all but the low-order four bits in i1?
- How would you turn on all the low-order four bits in i1?
- How would you turn off only the low-order four bits in i1?
- How would you put into i1 the low-order 8 bits in i2, but swapping
  the significance of the lowest four with the next four?
- What is wrong with the following expression?
  f2 = ++f1 + ++f1;
2.8.3. Precedence and grouping
After looking at the operators we have to consider the way that they
work together. For things like addition it may not seem important; it
hardly matters whether
a + b + c
is done as
(a + b) + c
or
a + (b + c)
does it? Well, yes in fact it does. If a+b would
overflow and c held a value very close
to -b , then the second grouping might give the correct
answer where the first would cause undefined behaviour. The problem is
much more obvious with integer division:
a/b/c
gives very different results when grouped as
a/(b/c)
or
(a/b)/c
If you don't believe that, try it with a=10, b=2, c=3. The first
gives 10/(2/3); 2/3 in integer division gives 0, so we get 10/0, which
is a division by zero and has undefined behaviour. The second grouping
gives (10/2), obviously 5, which divided by 3 gives 1.
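The arithmetic can be checked with a short sketch. The two function names are hypothetical; the second deliberately stops at the inner quotient b/c, because going on to divide by it would mean dividing by zero for the values used above.

```c
/* Hypothetical helper: a/b/c written without parentheses.
 * Division groups left to right, so this is (a / b) / c. */
int divide_left_to_right(int a, int b, int c)
{
    return a / b / c;
}

/* Hypothetical helper: only the inner quotient of a/(b/c).
 * The outer division is left out on purpose, since this
 * quotient can be zero. */
int inner_quotient(int b, int c)
{
    return b / c;
}
```

With a=10, b=2 and c=3, divide_left_to_right gives 1 and inner_quotient gives 0.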
The grouping of operators like that is known as
associativity. The other question is one of
precedence, where some operators have a higher priority than
others and force evaluation of sub-expressions involving them to be
performed before those with lower precedence operators. This is almost
universal practice in high-level languages, so we ‘know’ that
a + b * c + d
groups as
a + (b * c) + d
indicating that multiplication has higher precedence than addition.
The large set of operators in C gives rise to 15 levels of
precedence! Only very boring people bother to remember them all. The
complete list is given in Table 2.9, which indicates both
precedence and associativity. Not all of the operators have been mentioned
yet. Beware of the use of the same symbol for both unary and binary
operators: the table indicates which are which.
Operator | Direction | Notes
() [] -> . | left to right | 1
! ~ ++ -- - + (cast) * & sizeof | right to left | all unary
* / % | left to right | binary
+ - | left to right | binary
<< >> | left to right | binary
< <= > >= | left to right | binary
== != | left to right | binary
& | left to right | binary
^ | left to right | binary
| | left to right | binary
&& | left to right | binary
|| | left to right | binary
?: | right to left | 2
= += and all combined assignment | right to left | binary
, | left to right | binary
1. Parentheses are for expression grouping, not function call.
2. This is unusual. See Section 3.4.1.
Table 2.9. Operator precedence and associativity
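One consequence of the table that catches people out is that == sits above binary &. The sketch below spells out both groupings with explicit parentheses; the function names are hypothetical, and unsigned is used only to keep the bit pattern unambiguous.

```c
/* Hypothetical: how  x & 1 == 0  groups when written without
 * parentheses. == has higher precedence than &, so the comparison
 * 1 == 0 is done first and gives 0; x & 0 is then always 0. */
unsigned low_bit_test_as_grouped(unsigned x)
{
    return x & (1 == 0);
}

/* Hypothetical: what was probably intended, a test for an even
 * number. The parentheses force the & to be done first. */
int is_even(unsigned x)
{
    return (x & 1) == 0;
}
```

The first function returns 0 for every argument; the second returns 1 for 4 and 0 for 3.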
The question is, what can you do with that information, now that it's
there? Obviously it's important to be able to work out both how to write
expressions that evaluate in the proper order, and also how to read other
people's. The technique is this: first, identify the unary operators and
the operands that they refer to. This isn't such a difficult task but it
takes some practice, especially when you discover that operators such as
unary * can be applied an arbitrary number of times to
their operands; this expression
a*****b
means a multiplied by something, where the
something is an expression involving b and several
unary * operators.
It's not too difficult to work out which are the unary operators; here
are the rules.
1. ++ and -- are always unary operators.
2. The operator immediately to the right of an operand is a binary
operator unless (1) applies, when the operator to its right is
binary.
3. All operators to the left of an operand are unary unless
(2) applies.
Because the unary operators have very high precedence, you can work out
what they do before worrying about the other operators. One thing to watch
out for is the way that ++ and -- can
be before or after their operands; the expression
a + -b++ + c
has two unary operators applied to b . The unary
operators all associate right to left, so although the -
comes first when you read the expression, it really parenthesizes (for
clarity) like this:
a + -(b++) + c
The case is a little clearer if the prefix, rather than the postfix,
form of the increment/decrement operators is being used. Again the order
is right to left, but at least the operators come all in a row.
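A small sketch may help to compare the two forms. The function names are hypothetical; each works on its own copy of b, so b appears only once in each expression and there is no problem with the order in which it changes.

```c
/* Hypothetical: the postfix form from the text.
 * Groups as a + (-(b++)) + c, so the old value of b is negated. */
int with_postfix(int a, int b, int c)
{
    return a + -b++ + c;
}

/* Hypothetical: the same thing with the grouping made explicit. */
int with_postfix_parenthesized(int a, int b, int c)
{
    return a + (-(b++)) + c;
}

/* Hypothetical: the prefix form, where the operators come in a row.
 * Groups as a + (-(++b)) + c, so the new value of b is negated. */
int with_prefix(int a, int b, int c)
{
    return a + -++b + c;
}
```

With a=1, b=2 and c=3 the two postfix versions both give 2, and the prefix version gives 1.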
After sorting out what to do with the unary operators, it's easy to read
the expression from left to right. Every time you see a binary operator,
remember it. Look to the right: if the next binary operator is of a lower
precedence, then the operator you just remembered is part of a
subexpression to evaluate before anything else is seen. If the next
operator is of the same precedence, keep repeating the procedure as long
as equal precedence operators are seen. When you eventually find a lower
precedence operator, evaluate the subexpression on the left according to
the associativity rules. If a higher precedence operator is found on the
right, forget the previous stuff: the operand to the left of the higher
precedence operator is part of a subexpression separate from anything on
the left so far. It belongs to the new operator instead.
If that lot isn't clear don't worry. A lot of C programmers have
trouble with this area and eventually learn to parenthesize these
expressions ‘by eye’, without ever using formal rules.
What does matter is what happens when you have fully
parenthesized these expressions. Remember the ‘usual arithmetic
conversions’? They explained how you could predict the type of an
expression from the operands involved. Now, even if you mix all sorts of
types in a complicated expression, the types of the subexpressions are
determined only from the types of the operands in the subexpression.
Look at this.
#include <stdio.h>
#include <stdlib.h>
int main(void){
      int i,j;
      float f;
      i = 5; j = 2;
      f = 3.0;
      f = f + j / i;   /* j / i has int operands on both sides */
      printf("value of f is %f\n", f);
      exit(EXIT_SUCCESS);
}
Example 2.11
The value printed is 3.000000, not 3.400000, which might surprise some,
who thought that because a float was involved the whole statement
involving the division would be done in that real type.
Of course, the division operator had only int types on either side, so
the arithmetic was done as integer division and resulted in zero. The
addition had a float and an int on either side,
so the conversions meant that the int was converted to
float for the arithmetic, and that was the correct type for
the assignment, so there were no further conversions.
The previous section on casts showed one way of changing the type of an
expression from its natural one to the one that you want. Be careful
though:
(float)(j/i)
would still use integer division, then convert the result to
float. To keep the fractional part of the quotient, you should use
(float)j/i
which would force real division to be used.
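The difference between the two casts can be sketched as a pair of functions. The names are hypothetical; the expressions are the two shown above.

```c
/* Hypothetical: the cast applies to the result of the division,
 * so the division itself is still done with int operands. */
float cast_after_division(int j, int i)
{
    return (float)(j / i);
}

/* Hypothetical: the cast applies to j alone. The usual arithmetic
 * conversions then convert i to float as well, so the division
 * is real division. */
float cast_before_division(int j, int i)
{
    return (float)j / i;
}
```

With j=2 and i=5, the first gives zero and the second gives a value close to 0.4.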
2.8.4. Parentheses
C allows you to override the normal effects of precedence and
associativity by the use of parentheses as the examples have illustrated.
In Old C, the parentheses had no further meaning, and in particular
did not guarantee anything about the order of evaluation in
expressions like these:
int a, b, c;
a+b+c;
(a+b)+c;
a+(b+c);
You used to need explicit temporary variables to get a particular
order of evaluation. That matters if you know that an expression risks
overflow when evaluated in one order but not in another: by forcing
the evaluation into the safe order you can avoid the overflow.
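The temporary-variable technique can be sketched as follows. The function names are hypothetical, and the values are chosen so that a+b would overflow an int while b+c would not.

```c
#include <limits.h>

/* Hypothetical: forces b + c to be evaluated first by giving it
 * a statement of its own. */
int add_with_temporary(int a, int b, int c)
{
    int t;

    t = b + c;      /* complete before the next statement starts */
    return a + t;
}

/* Hypothetical: a + b alone would overflow here, but b + c is zero,
 * so adding in the forced order stays within range. */
int demo_near_limit(void)
{
    return add_with_temporary(INT_MAX, 1, -1);
}
```

The call in demo_near_limit gives INT_MAX, with no overflow at any stage.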
Standard C says that evaluation must be done in the order
indicated by the precedence and grouping of the expression, unless the
compiler can tell that the result will not be affected by any regrouping
it might do for optimization reasons.
So, the expression a = 10+a+b+5; cannot be rewritten
by the compiler as a = 15+a+b; unless it can be
guaranteed that the resulting value of a will be the same for all
combinations of initial values of a
and b . That would be true if the variables were both
unsigned integral types, or if they were signed integral types but in that
particular implementation overflow did not cause a run-time exception and
overflow was reversible.
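For the unsigned case, a short sketch shows why the regrouping is harmless: unsigned arithmetic wraps around instead of overflowing, so both forms give the same answer even when an intermediate sum goes past the largest value. The function names are hypothetical.

```c
#include <limits.h>

/* Hypothetical: the expression as written in the text. */
unsigned as_written(unsigned a, unsigned b)
{
    return 10 + a + b + 5;
}

/* Hypothetical: the regrouped form a compiler might prefer. */
unsigned regrouped(unsigned a, unsigned b)
{
    return 15 + a + b;
}

/* Hypothetical check with a value of a close to the largest
 * unsigned value, so that the intermediate sums wrap around. */
int same_near_limit(void)
{
    unsigned a = UINT_MAX - 3u;
    unsigned b = 7u;

    return as_written(a, b) == regrouped(a, b);
}
```

Here same_near_limit gives 1: both forms wrap round to 18.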
2.8.5. Side Effects
To repeat and expand the warning given for the increment operators: it
is unsafe to use the same variable more than once in an expression if
evaluating the expression changes the variable and the new value could
affect the result of the expression. This is because the change(s) may be
‘saved up’ and only applied at the end of the statement.
So f = f+1; is safe even though f
appears twice in a value-changing expression, f++; is
also safe, but f = f++; is unsafe.
The problem can be caused by using an assignment, use of the increment
or decrement operators, or by calling a function that changes the value of
an external variable that is also used in the expression. These are
generally known as ‘side effects’. C makes almost no promise
that side effects will occur in a predictable order within a single
expression. (The discussion of ‘sequence points’ in Chapter 8 will be of interest if you care about this.)
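The function-call case is the easiest one to overlook, so here is a minimal sketch of it. The names counter, bump and sum_in_fixed_order are hypothetical. Writing counter + bump() in a single expression would be unsafe, because C does not say whether counter is read before or after the call; splitting the work into separate statements fixes the order.

```c
int counter = 0;    /* hypothetical external variable */

/* Hypothetical function with a side effect: it changes counter. */
int bump(void)
{
    counter = counter + 1;
    return counter;
}

/* Adds the value of counter before the call to the value
 * returned by bump(), in an order that is fixed by the statements. */
int sum_in_fixed_order(void)
{
    int before;
    int after;

    before = counter;   /* read first */
    after = bump();     /* then change */
    return before + after;
}
```

Starting with counter at zero, the first call of sum_in_fixed_order gives 1 and leaves counter at 1.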