The question is whether it is a quirk of gcc or of the Intel architecture. I find that it still happens with tcc and tendracc (haven't tried Intel's compiler yet), so I suspect it's the architecture... perhaps the Pentium doesn't subtract correctly.
The explanation can be found by looking at "man gcc" under the section explaining the option "-ffloat-store". In case you don't feel like reading that here is my explanation.
In their wisdom the designers of the Pentium have created a floating point register that stores a number at greater precision than a double. Calculations using floats first land in this register. However, rounding off occurs when this register is copied to an actual memory location for a double.
Let us call this register "r". Let "q" be the memory location for a double variable. We are going to compute
q = a*b;
q0 = a*b - q;
(where a and b are some constants of which at least one is a float).
-
a and b are moved into registers.
-
Their floating point product is computed and stored in "r".
-
To perform the next computation the register needs to be used again so we move the value of "r" into "q" --- and this performs a round-off!
-
We repeat steps 1 and 2 to compute a*b in "r".
-
We move the value of "q" into some other register.
-
We compute the difference between the values of "r" and this register. This value is sometimes non-zero depending on the round-off that occured.
Now, why did the Pentium designers do this? And why have the authors of "gcc" adopted the use of this register "r"?
Since a number of floating point computations are iterative this "feature" could improve the accuracy of the result---it certainly cannot reduce the accuracy of the result.
And what if I think it is a "bug"?
You can turn off this "feature" by storing all intermediate values in memory (or perhaps moving them to a "normal" register). The "gcc" compiler offers the option "-ffloat-store" to do this.