When and why it is safe to cast floor/ceil result to integer
Happy new year, ladies and gents! I am back, and the following couple of posts are going to be programming-related.
The C/C++ standard library declares floor function such that it returns a floating-point number:
double floor ( double x ); float floor ( float x ); // C++ only
I used to be tortured by two questions:
- Why would the floor function return anything other than integer?
- What is the most correct way to cast the output to integer?
At last I figured out the answer to both questions. Note that the rest of the post applies to the ceil function as well.
For the second point, I knew the common idiom was just type-casting the output:
double a;
int intRes = int(floor(a)); // use static_cast if you don't like brevity
If you’ve heard anything about peculiarities of floating-point arithmetic, you might start worrying that floor might return not exact integer value but $\lfloor a \rfloor - \epsilon$, so that type-casting is incorrect (assume $a$ is positive). However, this is not the case. Consider the following statement.
If $a$ is not integer and is representable as IEEE-754 floating-point number, than both $\lfloor a \rfloor$ and $\lceil a \rceil$ are representable within that domain. This means that for any float/double value floor can return a number that is integer and fits in float/double format.
The proof is easy but requires understanding of representation of floating-point numbers. Suppose $a = c \times b^q, c \in [0.1, 1)$ has $k$ significant digits, i.e. $k$-th digit after the point in $c$ is not null, but all the further ones are. Since it is representable, $k$ is less then the maximum width of significand. Since $a$ is not integer, both $\lfloor a \rfloor$ and $\lceil a \rceil$ have significands with the number of significant digits less then $k$. Rounding however might increase the order of magnitude but only by one. So, the rounded number’s significand fits in $k$ digits, Q.E.D.
However, this does not mean one can always type-cast the output safely, since, for example, not every integer stored in double can be represented as 4-byte int (and this is the answer to the question #1). Double precision numbers have 52 digit significands, so any integer number up to $2^{52}$ can be stored exactly. For the int type, it is only $2^{31}$. So if there is possibility of overflow, check before the cast or use the int64 type.
On the other hand, a lot of int’s cannot be represented as floats (also true for int64 and double). Consider the following code:
std::cout << std::showbase << std::hex
<< int(float(1 << 23)) << " " << int(float(1 << 24)) << std::endl
<< int(float(1 << 23) + 1) << " " << int(float(1 << 24) + 1) << std::endl;The output is:
0x800000 0x1000000 0x800001 0x1000000
$2^{24}+1$ cannot be represented as float exactly (it is represented as $2^{24}$), so one should not use the float version of floor for numbers greater than few millions.
There are classes of numbers that are guaranteed to be represented exactly in floating point format. See also the comprehensive tutorial on floating-point arithmetic by David Goldberg.