chrisphan.com: Float Explorer: A tool to understand floating-point numbers

I made another thing: Float Explorer!

Screenshot of a web app. The app is titled "Float Explorer". The number 44.0457 is being examined. The app shows the hexadecimal, decimal, and binary representation of this number as a double-precision floating-point number, explaining the sign, exponent, and fraction parts of the representation. — Float Explorer shows you how a floating-point number is represented in the computer's memory.

What does it do?

This is a simple web app that shows you how a floating-point number is stored in your computer's memory. For example, consider the number 44.0457.¹ In your computer, this number is represented as a double-precision (64-bit) floating-point number as \[ (-1)^{0} \times \left( 1 + \frac{1695281563481774}{2^{52}} \right) \times 2^{1028 - 1023}.\] The three numbers 0, 1028, and 1695281563481774 are stored in binary as 0b0, 0b10000000100, and 0b0110000001011101100101111111011000101011011010101110, respectively. Hence, the entire number is represented in binary as 0b0100000001000110000001011101100101111111011000101011011010101110, or in hexadecimal as 0x404605d97f62b6ae. (The convention is to prefix hexadecimal numbers with 0x and binary numbers with 0b.)

It also has a calculator feature, so you can look at how floating-point arithmetic handles calculations such as 0.4 × 3.

Double-precision floating-point numbers

Every double-precision floating-point number (other than ±0, ±∞, or NaN)² is represented as \[ (-1)^{s} \times \left( 1 + \frac{f}{2^{52}} \right) \times 2^{e - 1023} \hspace 10em \text{(1)}\] where the sign part s, the exponent part e, and the fraction part f are nonnegative integers that are stored in one, 11, and 52 bits, respectively (for a total of 64 bits). Any expression written this way can also be written in the form a × 2^b, where a is an odd integer and b is an integer. Float Explorer will give you this form as well; for example, 44.0457 ≈ 3099440595426135 × 2^-46.

It's like scientific notation

Here's another way to think about the form given in expression (1): The number 44.0457 can be written in scientific notation as 4.40457 × 10¹. In general, we write a number³ in scientific notation as (-1)^s × d₀.d₁d₂…d_n × 10^p, where each digit d_j has 0 ≤ d_j < 10 and d₀ ≠ 0. We understand the decimal notation d₀.d₁d₂…d_n to be a notation for \[ \sum_{j = 0}^n \frac{d_j}{10^j}.\]

Likewise, a number⁴ could be written in a binary form (-1)^s × 0bb₀.b₁b₂…b_n × 2^p, where each digit b_j has 0 ≤ b_j < 2 (in other words, b_j = 0 or b_j = 1) and b₀ ≠ 0. This second condition means that b₀ = 1 always. We understand the binary notation 0b1.b₁b₂…b_n to be shorthand for \[ 1 + \sum_{j = 1}^n \frac{b_j}{2^j}.\]

Suppose that we have n = 52 (in other words, there are 52 digits right of the binary point in the binary expansion of our number). Then, \[\begin{aligned} (-1)^s \times \text{0b}1.{b_1}{b_2}{b_3}\dots b_{52} \times 2^p &= (-1)^s \times \left( 1 + \sum_{j = 1}^{52} \frac{b_j}{2^j} \right) \times 2^p \\ &= (-1)^s \times \left( 1 + \frac{\sum_{j = 1}^{52} b_j\times 2^{52-j}}{2^{52}} \right) \times 2^{(p + 1023) - 1023}\\ &= (-1)^s \times \left( 1 + \frac{f}{2^{52}} \right) \times 2^{e - 1023}, \end{aligned}\] where \[f = \sum_{j = 1}^{52} b_j \times 2^{52 - j}\] and e = p + 1023, the form in expression (1).

Sadly, not every real number, or even rational number, can written exactly in this form. For most numbers, the best we can do is approximate. It's because of these approximations that we end up with rounding errors.

An example of rounding error

Suppose we open up a Python REPL and do the following:

Python (REPL)
a = 0.4
a

0.4

3 * a

1.2000000000000002

Shouldn't this be 1.2? Let's keep going:

Python (REPL)
3 * a - 1.2

2.220446049250313e-16

import math
math.log2(3 * a - 1.2) # log base 2

-52.0

It appears that 1.2 and the computer's calculation of 3 × 0.4 are exactly 2^-52 apart.

How is 0.4 represented?

Plugging 0.4 into Float Explorer, you see that the fraction part will be represented by 1001 repeated 12 times, followed by 1010. In hexadecimal⁵, the fractional part is 0x999999999999a.

Screenshot of a web app. The app is titled "Float Explorer". The number 0.4 is being examined. The app shows the hexadecimal, decimal, and binary representation of this number as a double-precision floating-point number, explaining the sign, exponent, and fraction parts of the representation. — The number 0.4 is represented with a sign of 0 (positive), a fraction of 0x999999999999a (that's twelve nines followed by a) and an exponent of 1021.

To understand what's going on better, consider this line of reasoning:

Note 1/3 = 0.3333.

Therefore, 1/3 × 3 = 0.9999.

Of course, 1/3 is not actually equal to 0.3333, but rather 0.3333…. There is a small rounding error that makes the end result off by 10^-4.

Why is 1/3 = 0.3333…? If you ask a mathematician (such as myself), the answer you might get is that \[\begin{aligned} 0.3333\dots &= \sum_{j = 1}^{\infty} \frac{3}{10^j} \\ &= \frac{3}{10} \sum_{j = 0}^{\infty} \left(\frac{1}{10}\right)^j \\ &= \frac{(3/10)}{1 - 1/10} \\ &= \frac{(3/10)}{(9/10)} \\ &= \frac{3}{9} = \frac{1}{3}. \end{aligned}\] People usually first encounter the geometric series in calculus (assuming they learn it at all). However, in elementary school you might have done this calculation:

Long division of 1 by 3. Each time 3 times 3 equals 9 is subtracted, we end up with a remainder of 1, which becomes 10 after dropping another zero. The result is 0.333 repeating.

You recognize that you are in a loop, and that 3s will be added as digits ad infinitum.

Let's use long division to calculate a hexadecimal expansion of 2/5.

Long division of 2.000 by 5, with no steps completed so far.

You may be tempted to start with 0.4, but we are expanding in hexadecimal, not decimal. The hexadecimal value 0x20 is equal to 32, and 5 × 6 + 2 = 32. In hexadecimal, the decimal calculation 32 - 30 = 2 is written 0x20 - 0x1e = 0x02.

Long division of 2.000 by 5 (in hexadecimal). So far, the computation has
been carried out to 0.6. The (hex) value 1e is subtracted from 20 to get 2.

Much like in the case of the decimal expansion of 1/3, you now recognize that we are in a loop, where additional digits of the hexadecimal expansion will be 6.

Long division of 2.000 by 5 (in hexadecimal). So far, the computation has been carried out to 0.666. Multiple times, the (hex) value 1e is subtracted from 20 to get 2, after which another zero is dropped.

So, the hexadecimal expansion of 2/5 is 0.666…. We can also verify this using a geometric series:

\[\begin{aligned} \text{0x}0.666\dots &= \sum_{j = 1}^\infty \frac{6}{16^k} \\ &= \frac{6}{16} \sum_{j = 0}^\infty \left(\frac{1}{16}\right)^k \\ &= \frac{(3/8)}{1 - 1/16} \\ &= \frac{(3/8)}{(15/16)} \\ &= \frac{2}{5}. \end{aligned} \]

Now, it's straightforward convert between hexadecimal and binary, since each hexadecimal digit corresponds exactly to four binary digits. Specifically 0x6 is equal to 0b0110. Therefore, \[\begin{aligned} \frac{2}{5} & = \text{0x}0.666\dots \\ &= \text{0b}0.0110011001100110\dots \\ &= \text{0b}1.10011001100110011\dots \times 2^{-2} \\ &= (-1)^{0} \times \left(1 + \text{0b}0.10011001\dots \right) \times 2^{-2} \\ &= (-1)^{0} \times \left(1 + \text{0x}0.999\dots \right) \times 2^{-2} \\ &= (-1)^{0} \times \left(1 + \frac{\text{0x}0.999\dots \times 2^{52}}{2^{52}}\right) \times 2^{-2} \\ &= (-1)^{0} \left(1 + \frac{\text{0x}9999999999999.9999\dots}{2^{52}}\right) \times 2^{1021-1023}, \end{aligned}\] since 0b1001 = 9 = 0x9. Therefore, to get something in the form \[(-1)^{s} \times \left(1 + \frac{f}{2^{52}} \right) \times 2^{e-1023}\] we round 0x9999999999999.999… (that's 13 nines in front of the hexadecimal point) to 0x999999999999a, getting \[0.4 = (-1)^{0} \times \left(1 + \frac{\text{0x}999999999999a}{2^{52}} \right) \times 2^{1021 - 1023}.\]

Multiplication rounding errors

Essentially, 0.4 is represented as \[ 0.4 \approx (-1)^0 \times \text{0b}1.1001100110011001100110011001100110011001100110011010 \times 2^{-2}.\] (That's 1001 repeated 12 times followed by 1010.) Likewise, 3 is represented as \[3 = (-1)^0 \times \text{0b}1.1000000000000000000000000000000000000000000000000000 \times 2^1. \] Therefore, to calculate 0.4 × 3, the computer calculates \[\hat q = \text{0b}1.1001100110011001100110011001100110011001100110011010 \times \text{0b}1.1,\] and then returns p = (-1)^{0 + 0} × q × 2^1-2, where q is q̂ rounded to 52 bits right of the binary point. Here is the calculation in binary of q̂ and q:

Multiplication of the binary approximation to 53 bits of 1.6 by the binary expansion of 1.5. The binary expansion of 1.6 is 1 point 1001 repeated 12 times, followed by 1010. The binary expansion of 1.5 is 1.1. The result is 10 point 0110 repeated 12 times followed by 1110. Then this is rounded to 53 bits as 10 point 0110 repeated 12 times followed by 100.

Hence, the approximation p for 0.4 × 3 is \[\begin{aligned} p &= (-1)^{0 + 0} \times q \times 2^{1-2} \\ &= (-1)^0 \times q \times 2^{-1} \\ &= (-1)^0 \times \text{0b}10.011001100110011001100110011001100110011001100110100 \times 2^{-1} \\ &= (-1)^0 \times \text{0b}1.0011001100110011001100110011001100110011001100110100 \times 2^{0} \\ &= (-1)^0 \times \left(1 + \frac{\text{0b}11001100110011001100110011001100110011001100110100}{2^{52}} \right) \times 2^{0} \\ &= (-1)^0 \times \left(1 + \frac{\text{0x}3333333333334}{2^{52}}\right) \times 2^{1023 - 1023} \end{aligned} \]

Put Float Explorer into Normal mode, then enter the following keystrokes: 0 . 4 × 3 =. The result will be 1.2000000000000002, with a sign of s = 0, exponent of e = 1023, and fraction of f = 0x3333333333334, which matches what we did by hand above.

Screenshot of a web app. The app is titled "Float Explorer". A calculator keypad is shown. The result on the screen is 1.2000000000000002. The app shows the hexadecimal, decimal, and binary representation of this number as a double-precision floating-point number, explaining the sign, exponent, and fraction parts of the representation. — Using the Normal calculator mode of Float Explorer, we calculate 0.4 × 3 ≈ 1.2000000000000002.

On the other hand, if we use Float Explorer to examine the correct value of 0.4 × 3 = 1.2, we get with a sign of s = 0, exponent of e = 1023, and fraction of f = 0x3333333333333. Note that the difference in last digit between the fraction parts.

Screenshot of a web app. The app is titled "Float Explorer". The app shows the hexadecimal, decimal, and binary representation of the number 1.2 as a double-precision floating-point number, explaining the sign, exponent, and fraction parts of the representation. — Float Explorer shows us that the fraction parts of 1.2 and 1.2000000000000002 are off by 1 (in other words, they differ by 2^{- 52}).

My hope

My hope is that this will be helpful to someone trying to make sense of floating-point numbers and the weird rounding errors that can occur with floating-point arithmetic.

The University of Oregon's Fenton Hall (in Eugene, Oregon) and the house where I lived from summer 2013 until summer 2024 (in Winona, Minnesota) are both at a latitude of 44.0457° N, despite being nearly 2500 km apart. ↩
IEEE floating-point formats make a distinction between negative and positive zero. For example, under floating-point arithmetic, \(1 / \infty = 0\) while \(1 / (-\infty) = -0\). ↩
Specifically, to be able to express a real number r exactly in (decimal) scientific notation, we must have r = p/q for some relatively prime integers p and q where q > 0 and q divides some power of 10. All other real numbers will be approximated to n decimal places. ↩
A similar caveat applies here as in footnote 3, except the denominator must divide (and hence be) a power of 2 instead of 10. ↩
Numbers in hexadecimal, or base 16, are typically written with the familiar digits 0–9 as well as the letters a–f. To distinguish from decimal numbers, the prefix 0x is often added to hexadecimal numbers. Counting in hexadecimal, the first 33 nonnegative integers are 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, and 0x20. ↩