The question is whether it is a quirk of gcc or of the Intel architecture. I find that it still happens with tcc and tendracc (haven't tried Intel's compiler yet), so I suspect it's the architecture... perhaps the Pentium doesn't subtract correctly.
The explanation can be found by looking at "man gcc" under the section explaining the option "ffloatstore". In case you don't feel like reading that here is my explanation.
In their wisdom the designers of the Pentium have created a floating point register that stores a number at greater precision than a double. Calculations using floats first land in this register. However, rounding off occurs when this register is copied to an actual memory location for a double.
Let us call this register "r". Let "q" be the memory location for a double variable. We are going to compute
q = a*b;
q0 = a*b  q;
(where a and b are some constants of which at least one is a float).

a and b are moved into registers.

Their floating point product is computed and stored in "r".

To perform the next computation the register needs to be used again so we move the value of "r" into "q"  and this performs a roundoff!

We repeat steps 1 and 2 to compute a*b in "r".

We move the value of "q" into some other register.

We compute the difference between the values of "r" and this register. This value is sometimes nonzero depending on the roundoff that occured.
Now, why did the Pentium designers do this? And why have the authors of "gcc" adopted the use of this register "r"?
Since a number of floating point computations are iterative this "feature" could improve the accuracy of the resultit certainly cannot reduce the accuracy of the result.
And what if I think it is a "bug"?
You can turn off this "feature" by storing all intermediate values in memory (or perhaps moving them to a "normal" register). The "gcc" compiler offers the option "ffloatstore" to do this.