Personal tools
You are here: Home CDAT Tips and Tricks CDAT Gotchas Single Precision Arithmetic
Document Actions

Single Precision Arithmetic

by Bob Drach last modified 2007-05-01 15:08

 


Using single-precision floating point arrays has the advantage of saving considerable space for very large arrays, but there are a few pitfalls to be aware of. Consider the following:

>>> import Numeric
>>> x = Numeric.ones(20000000., Numeric.Float32)
>>> Numeric.average(x)
0.83886079999999996
>>>

This is a huge bug, right???

Surprising, yes, but a bug, no. In fact, this is the result that should be expected for 32-bit floats, which have a 23-bit mantissa and 8-bit exponent. The largest integer that can be represented exactly as a 32-bit float is 2**24 = 16777216.0 . The Numeric average function accumulates the sum in a single-precision float, which eventually reaches the value 16777216.0. At that point, the next addition has the result:

16777216.0 + 1.0 = 16777216.0

hence the non-intuitive result.

The moral of the story is: for very large arrays do the calculations in double precision, especially where sums and averages are taken.



Powered by Plone