1.2 Floating-Point Reality vs. Textbook Math

In textbooks, numbers behave perfectly. They carry infinite precision, obey algebraic rules exactly, and never “forget” information. A real computer, however, works with a very different kind of number: one that is approximate, lossy, and shaped by the hardware it runs on.

This gap — between the clean world of mathematical numbers and the messy world of floating-point numbers — is one of the biggest reasons AI systems fail in ways that seem mysterious.

The Illusion of Real Numbers

Mathematics assumes that real numbers have infinite precision. Computers don’t. They store numbers in a format called floating-point, usually following the IEEE 754 standard.

Floating-point numbers are like scientific notation with limits: only a fixed number of digits can be stored. Everything else is rounded — sometimes gently, sometimes catastrophically.

For example, a real number like 0.1 seems simple. But in binary floating-point? It cannot be represented exactly. A computer stores something close to 0.1, not 0.1 itself.

That means a comparison like 0.1 + 0.2 == 0.3 evaluates to False. Not because Python or C# is wrong, but because an exact 0.1 never existed inside the machine.
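
You can check this directly in Python (a minimal sketch; any IEEE 754 double-precision language behaves the same way):

import math

# 0.1 and 0.2 are stored as the nearest representable doubles,
# so their sum is not the double nearest to 0.3.
print(0.1 + 0.2)                     # 0.30000000000000004
print(0.1 + 0.2 == 0.3)              # False

# The usual workaround is a tolerance-based comparison.
print(math.isclose(0.1 + 0.2, 0.3))  # True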

The Machine Epsilon: The Smallest Difference You Can See

One way to understand floating-point limits is through machine epsilon (ε): the gap between 1.0 and the next larger number the format can represent.

IEEE 754 double precision has ε ≈ 2.22 × 10⁻¹⁶. An increment much smaller than that simply “vanishes” when added to 1.0: the machine treats the two numbers as identical because it cannot represent the gap between them.
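
In Python you can observe this limit directly (a minimal sketch; sys.float_info.eps is the standard library’s name for double-precision ε):

import sys

eps = sys.float_info.eps       # machine epsilon for double precision
print(eps)                     # 2.220446049250313e-16

# An increment far below epsilon disappears when added to 1.0 ...
print(1.0 + 1e-17 == 1.0)      # True
# ... while epsilon itself is still visible.
print(1.0 + eps == 1.0)        # False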

When two values differ by far less than epsilon relative to their magnitude, adding them, subtracting them, or comparing them becomes unreliable. This is how subtle bugs creep into AI systems: not because the concept is wrong, but because the computation is blind to tiny distinctions.

Rounding Is Everywhere

Every floating-point operation introduces rounding, and repeated operations — especially in matrix multiplication, iterative solvers, or gradient-based optimization — can accumulate those errors.
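
Even a trivial loop shows the drift (a minimal sketch in plain Python):

# Ten additions of 0.1, each rounded to the nearest double,
# do not add up to exactly 1.0.
total = 0.0
for _ in range(10):
    total += 0.1
print(total)           # 0.9999999999999999
print(total == 1.0)    # False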

A textbook might show:

a = (x + y) + z
b = x + (y + z)

Mathematically, a = b. In floating-point arithmetic, they can be different. This is known as non-associativity.
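
With plain Python floats, for example (a minimal sketch):

x, y, z = 0.1, 0.2, 0.3

a = (x + y) + z
b = x + (y + z)

print(a)         # 0.6000000000000001
print(b)         # 0.6
print(a == b)    # False: the grouping changes the rounding, so the results differ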

Even more surprisingly:

  • Subtraction of nearly equal numbers can destroy precision.
  • Addition of wildly different magnitudes can erase small terms.
  • Repeated multiplication amplifies tiny mistakes into large ones.

All of this is normal in floating-point. None of it appears in textbook algebra.
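
Here is a minimal sketch of all three effects in plain Python (the specific constants are arbitrary illustrations):

import math
from fractions import Fraction

# 1. Cancellation: subtracting nearly equal numbers loses digits.
x = 1e8
naive  = math.sqrt(x + 1) - math.sqrt(x)           # about half the significant digits are lost
stable = 1.0 / (math.sqrt(x + 1) + math.sqrt(x))   # algebraically identical, numerically safe
print(naive, stable)

# 2. Absorption: a small term added to a huge one simply disappears.
print(1e16 + 1.0 == 1e16)          # True: 1.0 is below the spacing of doubles near 1e16

# 3. Accumulation: repeated multiplication lets per-step rounding build up.
f = 1.1
p_float, p_exact = 1.0, Fraction(1)
for _ in range(100):
    p_float *= f                   # rounds after every step
    p_exact *= Fraction(f)         # exact rational arithmetic with the same factor
print(p_float - float(p_exact))    # accumulated drift of the rounded product from the exact one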

The Real Danger: Believing Math Works the Same in Code

Engineers who only know textbook math assume:

  • "If an algorithm is correct mathematically, it will work in code."
  • "Precision errors are rare edge cases."
  • "Libraries like NumPy or PyTorch will handle the rest."

But in real systems, the opposite is true:

  • Many failures come from mathematical algorithms that behave badly in floating-point.
  • Errors are not rare — they accumulate constantly.
  • NumPy and LAPACK are powerful, but they cannot fix a fundamentally unstable formulation (see the sketch just below).
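
A classic illustration of that last point (a sketch, assuming NumPy with float64 data and an arbitrary seed): the textbook one-pass variance E[x²] − (E[x])² is mathematically exact, but when the mean is huge compared to the spread, the two terms nearly cancel and almost every significant digit is lost. Wrapping the same formula in library calls cannot recover those digits; the cure is a different formulation, such as subtracting the mean before squaring.

import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for reproducibility
x = 1e8 + rng.standard_normal(100_000)    # large mean, true variance close to 1

# Textbook one-pass formula: algebraically correct, numerically unstable.
naive = np.mean(x**2) - np.mean(x)**2     # cancellation: typically far off, can even be negative

# Stable formulation: remove the mean first, then square.
two_pass = np.mean((x - np.mean(x))**2)

print(naive)       # typically nowhere near 1
print(two_pass)    # close to 1
print(np.var(x))   # NumPy's built-in variance agrees with the stable version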

The computer is not doing “real math.” It is doing something that only resembles math — a constrained version embedded in hardware.

Why This Matters for AI and Numerical Software

Modern AI workloads involve massive sequences of floating-point operations: matrix multiplications, decompositions, optimizations, convolutions, and iterative refinement. That means rounding, cancellation, and overflow happen constantly.

Most of the bizarre failures engineers see, such as exploding gradients, NaN values, unstable solvers, and inconsistent predictions, can be traced not to “AI being unpredictable” but to numerical instability.
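
A small NumPy sketch shows how quickly this happens: a naive softmax on large logits overflows to inf and then yields NaN, while the standard max-subtraction trick computes the same mathematical function safely. (The logit values here are arbitrary illustrations.)

import numpy as np

logits = np.array([1000.0, 1000.5, 999.0])    # arbitrary large scores

# Naive softmax: exp(1000) overflows to inf, and inf / inf is NaN.
naive = np.exp(logits) / np.sum(np.exp(logits))

# Stable softmax: subtracting the max is mathematically a no-op,
# but it keeps every exponent in a representable range.
shifted = logits - np.max(logits)
stable = np.exp(shifted) / np.sum(np.exp(shifted))

print(naive)     # [nan nan nan], plus overflow warnings
print(stable)    # finite probabilities that sum to 1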

To build reliable systems, you need to know:

  • how floating-point numbers really behave,
  • why naive formulas break down, and
  • which algorithms are stable enough for real workloads.

Transition to 1.3

Understanding floating-point arithmetic gives us the first piece of the puzzle. But numerical errors alone don’t explain why some problems explode while others remain stable. To understand that, we need to explore two deeper concepts — stability and conditioning — and see why not knowing them can silently destroy even well-designed AI systems.

Let’s continue to 1.3 Stability, Conditioning, and the Cost of Not Knowing Them.

2025-09-04

Shohei Shimoda

Here I have organized and written up what I have learned and know.