2.6. Real types
It's easier to deal with the real types first because there's less to say
about them and they don't get as complicated as the integer types. The
Standard breaks new ground by laying down some basic guarantees on the
precision and range of the real numbers; these are found in the header file
<float.h>, which is discussed in detail in Chapter 9.
For some users this is extremely important information, but it is of a
highly technical nature and is likely only to be fully understood by
numerical analysts.
The varieties of real numbers are these:
float
double
long double
Each of the types gives access to a particular way of representing real
numbers in the target computer. If it only has one way of doing things,
they might all turn out to be the same; if it has more than three, then C
has no way of specifying the extra ones. The type float is
intended to be the small, fast representation corresponding to what FORTRAN
would call REAL. You would use double for extra
precision, and long double for even more.
The main point of interest is that, in the increasing ‘lengths’ of
float, double and long double, each
type must give at least the same range and precision as the previous type.
For example, taking the value in a double and putting it into
a long double must result in the same value.
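One way to see these guarantees on a particular implementation is to compare the limits published in <float.h>. The sketch below is only illustrative: the function names are invented for this example, and the values compared are whatever your own implementation reports.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if each real type offers at least the
   range and decimal precision of the one before it, 0 otherwise. */
int real_types_are_ordered(void)
{
    int precision_ok = (FLT_DIG <= DBL_DIG) && (DBL_DIG <= LDBL_DIG);
    int range_ok = (FLT_MAX <= DBL_MAX) && (DBL_MAX <= LDBL_MAX);
    return precision_ok && range_ok;
}

/* Hypothetical helper: returns 1 if a double survives being stored in a
   long double unchanged. */
int widening_preserves_value(double d)
{
    long double wide = d;   /* widening conversion: the value is preserved */
    return wide == d;       /* d is converted to long double to compare */
}
```

On a conforming implementation both functions should return 1 for any ordinary numeric argument.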
There is no requirement for the three types of ‘real’ variables to
differ in their properties, so if a machine only has one type of real
arithmetic, all of C's three types could be implemented in the same way.
None the less, the three types would be considered to be different from the
point of view of type checking; it would be ‘as if’ they really were
different. That helps when you move the program to a system where the three
types really are different—there won't suddenly be a set of
warnings coming out of your compiler about type mismatches that you didn't
get on the first system.
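If you have a compiler that supports the later C11 revision of the Standard (an assumption; the feature used here is not part of the original Standard), the distinctness of the three types can be checked directly. A generic selection is only allowed to list types that are different from each other, so the sketch below would be rejected if float, double and long double were treated as the same type. The macro and function names are invented for this example.

```c
/* Assumes a C11 compiler: _Generic does not exist in earlier standards. */
#define REAL_TYPE_NAME(x) _Generic((x), \
        float: "float", \
        double: "double", \
        long double: "long double")

/* The suffixes f and L give the constants type float and long double;
   an unsuffixed real constant has type double. */
const char *name_of_float(void)       { return REAL_TYPE_NAME(1.0f); }
const char *name_of_double(void)      { return REAL_TYPE_NAME(1.0); }
const char *name_of_long_double(void) { return REAL_TYPE_NAME(1.0L); }
```

Each function returns the name of the type of its constant, even on a machine where all three types share one representation.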
In contrast to more ‘strongly typed’ languages, C permits
expressions to mix all of the scalar types: the various flavours of
integers, the real numbers and also the pointer types. When an expression
contains a mixture of arithmetic (integer and real) types there are
implicit conversions invoked which can be used to work out what the overall
type of the result will be. These rules are quite important and are known
as the usual arithmetic conversions; it will be worth committing
them to memory later. The full set of rules is described in Section 2.8;
for the moment, we will investigate only the ones that involve
mixing float, double and long double
to see if they make sense.
The only time that the conversions are needed is when two different types
are mixed in an expression, as in the example below:
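int f(void){
    float f_var;
    double d_var;
    long double l_d_var;

    f_var = 1; d_var = 1; l_d_var = 1;  /* int constant converted to each real type */
    d_var = d_var + f_var;              /* f_var converted to double, then added */
    l_d_var = d_var + f_var;            /* double sum converted to long double */
    return(l_d_var);                    /* long double converted to int */
}
Example 2.1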
There are a lot of forced conversions in that example. Getting the
easiest of them out of the way first, let's look at the assignments of the
constant value 1 to each of the variables. As the section
on constants will point out, that 1 has type int,
i.e. it is an integer, not a real constant. The assignment converts the
integer value to the appropriate real type, which is easy to cope with.
The interesting conversions come next. The first of them is on the
line
d_var = d_var + f_var;
What is the type of the expression involving the + operator?
The answer is easy when you know the rules. Whenever two different real
types are involved in an expression, the lower precision type is first
implicitly converted to the higher precision type and then the arithmetic
is performed at that precision. The example involves both a
double and a float, so the value of
f_var is converted to type double and is then
added to the value of the double d_var. The result of the
expression is naturally of type double too, so it is clearly
of the correct type to assign to d_var.
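The type of a mixed expression can be probed with sizeof, which reports the size of the type an expression would have without evaluating it. The following is a rough sketch with invented function names. Equal sizes do not prove that two types are the same, so treat the result as an indication only, and only on systems where the real types differ in size.

```c
#include <stddef.h>

/* Hypothetical helper: size of the type of (double + float). */
size_t size_of_double_plus_float(void)
{
    float f_var = 1;
    double d_var = 1;
    return sizeof(d_var + f_var);     /* the sum has type double */
}

/* Hypothetical helper: size of the type of (long double + float). */
size_t size_of_long_double_plus_float(void)
{
    float f_var = 1;
    long double l_d_var = 1;
    return sizeof(l_d_var + f_var);   /* the sum has type long double */
}
```

The first function returns sizeof(double) and the second sizeof(long double), because in each case the float operand is converted to the wider type before the addition.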
The second of the additions is a little bit more complicated, but still
perfectly O.K. Again, the value of f_var is converted and the
arithmetic performed with the precision of double, forming the
sum of the two variables. Now there's a problem. The result (the sum) is
double, but the assignment is to a long double.
Once again the obvious procedure is to convert the lower precision value to
the higher one, which is done, and then make the assignment.
So we've taken the easy ones. The difficult thing to see is what to do
when forced to assign a higher precision result to a lower precision
destination. In those cases it may be necessary to lose precision, in a way
specified by the implementation. Basically, the implementation must specify
whether and in what way it rounds or truncates. Even worse, the destination
may be unable to hold the value at all. The Standard says that in these
cases loss of precision may occur; if the destination is unable to hold the
necessary value—say by attempting to add the largest representable
number to itself—then the behaviour is undefined, your program is
faulty and you can make no predictions whatsoever about any subsequent
behaviour.
It is no mistake to re-emphasize that last statement. What the Standard
means by undefined behaviour is exactly what it says. Once a
program's behaviour has entered the undefined region, absolutely anything
can happen. The program might be stopped by the operating system with an
appropriate message, or just as likely nothing observable would happen and
the program be allowed to continue with an erroneous value stored in the
variable in question. It is your responsibility to prevent your program
from exhibiting undefined behaviour. Beware!
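One defensive habit is to check that a value is in range before assigning it to a narrower type. The sketch below shows the idea for double to float. The helper names are invented for this example, and it is only one possible approach, not something the Standard prescribes.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if x lies within the finite range of float. */
int fits_in_float(double x)
{
    return x >= -FLT_MAX && x <= FLT_MAX;
}

/* Hypothetical helper: stores x in *out and returns 1 if the conversion is
   in range; otherwise leaves *out alone and returns 0. */
int narrow_to_float(double x, float *out)
{
    if (!fits_in_float(x))
        return 0;
    *out = (float)x;    /* in range: may round, but the behaviour is defined */
    return 1;
}

/* Hypothetical helper: returns 1 if converting x to float and back changes
   the value. Out-of-range values are never converted and report 0. */
int loses_precision(double x)
{
    float narrowed;

    if (!narrow_to_float(x, &narrowed))
        return 0;
    return (double)narrowed != x;
}
```

A value such as 0.5 normally survives the narrowing unchanged, while 0.1 usually does not, because float keeps fewer digits of it than double does on most systems.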
Summary of real arithmetic
- Arithmetic with any two real types is done at the highest precision of
the members involved.
- Assignment involves loss of precision if the receiving type has a lower
precision than the value being assigned to it.
- Further conversions are often implied when expressions mix other types,
but they have not been described yet.
2.6.1. Printing real numbers
The usual output function, printf, can be used to format
real numbers and print them. There are a number of ways to format these
numbers, but we'll stick to just one for now. Table 2.4 below
shows the appropriate format description for each of the real types.
Type        | Format
float       | %f
double      | %f
long double | %Lf
Table 2.4. Format codes for real numbers
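The reason float and double can share %f is that a float argument is converted to double when it is passed to a function like printf, which takes a variable number of arguments. A long double is passed as it is, so it needs its own length modifier, an upper-case L. The sketch below writes into a buffer so that the result can be inspected. The helper names are invented for this example, and snprintf assumes a C99 or later library.

```c
#include <stdio.h>

/* Hypothetical helper: formats a float with %f. */
int format_float(char *buf, size_t size, float value)
{
    /* value is promoted to double in the call, so %f is correct */
    return snprintf(buf, size, "%f", value);
}

/* Hypothetical helper: formats a double with %f. */
int format_double(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%f", value);
}

/* Hypothetical helper: formats a long double with %Lf. */
int format_long_double(char *buf, size_t size, long double value)
{
    return snprintf(buf, size, "%Lf", value);   /* note the upper-case L */
}
```

With no precision given, %f prints six digits after the decimal point, so 2.5 comes out as 2.500000 for all three types.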
Here's an example to try:
#include <stdio.h>
#include <stdlib.h>

#define BOILING 212     /* degrees Fahrenheit */

int main(void){
    float f_var; double d_var; long double l_d_var;
    int i;

    i = 0;
    printf("Fahrenheit to Centigrade\n");
    while(i <= BOILING){
        l_d_var = 5*(i-32);     /* int result converted to long double */
        l_d_var = l_d_var/9;    /* division done in long double */
        d_var = l_d_var;        /* may lose precision */
        f_var = l_d_var;        /* may lose more precision */
        printf("%d %f %f %Lf\n", i,
            f_var, d_var, l_d_var);
        i = i+1;
    }
    exit(EXIT_SUCCESS);
}
Example 2.2
Try that example on your own computer to see what results you get.
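It is worth noticing why the example divides by 9 in a separate step. The expression 5*(i-32) involves only integers, so it is worked out in int arithmetic and only converted when it is assigned. The division is then done on a long double, so the fractional part is kept. The sketch below contrasts that with doing the whole calculation in int arithmetic. The function names are invented for this example.

```c
/* Hypothetical helper: converts whole degrees Fahrenheit to Centigrade
   in two steps, as the example does. */
long double centigrade_real(int fahrenheit)
{
    long double result = 5 * (fahrenheit - 32);   /* int arithmetic, then converted */
    result = result / 9;                          /* 9 is converted to long double */
    return result;
}

/* For contrast: the whole expression in int arithmetic, so the division
   discards the fractional part. */
int centigrade_int(int fahrenheit)
{
    return 5 * (fahrenheit - 32) / 9;
}
```

For 100 degrees Fahrenheit the first function gives a little under 37.8, while the second gives 37.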
Exercise 2.10. Which type of variable can hold the largest range of
values?
Exercise 2.11. Which type of variable can store values to the
greatest precision?
Exercise 2.12. Are there any problems possible when assigning a
float or double to a double or
long double?
Exercise 2.13. What could go wrong when assigning, say, a long
double to a double?
Exercise 2.14. What predictions can you make about a program showing
‘undefined behaviour’?