1.1 What Breaks Real AI Systems

Most AI failures do not come from “AI problems.” They come from numerical problems.

When we look at cutting-edge models—LLMs, diffusion models, optimization pipelines—it’s easy to imagine that failures happen because the algorithms are too complex, or the architecture is wrong, or the data is insufficient.

In reality, many failures start much deeper. They begin with the smallest units of computation: numbers, matrices, and the operations we perform on them.

The Illusion of Perfect Math

On paper, every equation behaves beautifully. Well-posed systems of linear equations have clean solutions. Matrix factorizations always succeed. Gradient descent moves you steadily toward the minimum.

Inside a machine, none of that is guaranteed.

Computers do not use real numbers. They use floating-point numbers, an approximation of real values stored in binary form. This means:

  • Some numbers cannot be represented exactly.
  • Rounding happens constantly.
  • Small errors compound over time.

A model that looks perfect on paper can collapse in practice because the numbers inside it slowly drift off the path.
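You can watch this drift happen in a few lines of Python. The sketch below is illustrative rather than canonical: it uses NumPy, and float32 instead of float64 simply to make the compounding visible quickly.

    import numpy as np

    # 0.1 has no exact binary representation, so rounding starts immediately.
    print(f"{0.1:.20f}")     # 0.10000000000000000555
    print(0.1 + 0.2 == 0.3)  # False

    # Small errors compound. Accumulating 0.1 ten million times in float32
    # drifts far from the exact answer of 1,000,000.
    total = np.float32(0.0)
    for _ in range(10_000_000):
        total += np.float32(0.1)
    print(total)             # roughly 1,087,937 -- off by almost 9%

No single operation here is a bug; each addition is individually as accurate as float32 allows. The error is purely the accumulation of rounding.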

The Hidden Fragility of AI Pipelines

To understand what breaks real AI systems, it helps to look at the patterns that appear again and again across organizations, languages, and problem domains.

1. Ill-conditioned problems

A problem is ill-conditioned when a tiny change in input causes a huge change in output. In floating-point arithmetic, tiny changes happen constantly—so ill-conditioning turns microscopic noise into catastrophic error.

Common triggers include:

  • Nearly dependent features
  • Correlated embeddings
  • Extremely small or large values mixed together

An ill-conditioned matrix doesn’t “almost” work. It actively destroys stability.
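A deliberately contrived NumPy sketch makes this concrete. The 2x2 matrix below is nearly singular; perturbing the right-hand side in the tenth decimal place, which is noise at the level of routine rounding, changes the solution completely:

    import numpy as np

    # Two almost-parallel rows: the system is nearly singular.
    A = np.array([[1.0, 1.0],
                  [1.0, 1.0 + 1e-10]])
    b = np.array([2.0, 2.0])

    print(np.linalg.cond(A))  # on the order of 1e10

    x1 = np.linalg.solve(A, b)
    x2 = np.linalg.solve(A, b + np.array([0.0, 1e-10]))

    print(x1)  # approximately [2., 0.]
    print(x2)  # approximately [1., 1.] -- a completely different answer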

2. Naturally unstable algorithms

Some algorithms amplify numerical noise by design. A few famous examples:

  • Naive Gaussian elimination (without pivoting)
  • Gram–Schmidt orthogonalization (classic version)
  • Normal equations for least squares

Engineers often implement these because they look simple in a textbook—only to discover that the implementation behaves nothing like the theory.
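The gap is easy to demonstrate. Here is a minimal sketch comparing the textbook classical Gram–Schmidt against NumPy's Householder-based np.linalg.qr, run on a 10x10 Hilbert matrix, a standard ill-conditioned test case:

    import numpy as np

    def classical_gram_schmidt(A):
        # Textbook version: project each column against components computed
        # from the *original* column, which lets rounding errors accumulate.
        m, n = A.shape
        Q = np.zeros((m, n))
        for j in range(n):
            v = A[:, j].copy()
            for i in range(j):
                v -= (Q[:, i] @ A[:, j]) * Q[:, i]
            Q[:, j] = v / np.linalg.norm(v)
        return Q

    # 10x10 Hilbert matrix: famously ill-conditioned (cond ~ 1e13).
    n = 10
    H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

    Q_cgs = classical_gram_schmidt(H)
    Q_hh, _ = np.linalg.qr(H)  # LAPACK's Householder-based QR

    # Departure from orthogonality: stable methods keep this near machine eps.
    print(np.linalg.norm(Q_cgs.T @ Q_cgs - np.eye(n)))  # large -- orthogonality lost
    print(np.linalg.norm(Q_hh.T @ Q_hh - np.eye(n)))    # ~1e-15

Same mathematics, same input, wildly different behavior. The only difference is the order in which rounding errors are allowed to interact.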

3. Loss of significance

This happens when subtracting two nearly identical numbers causes the meaningful digits to cancel out, leaving only numerical noise. It’s subtle and almost impossible to detect without understanding the underlying arithmetic.

Loss of significance is the silent killer of simulations, ML training loops, and financial models.
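A classic illustration is computing 1 - cos(x) for small x. The naive form subtracts two nearly equal numbers; the mathematically identical half-angle form 2*sin^2(x/2) avoids the subtraction entirely:

    import math

    x = 1e-8
    naive = 1.0 - math.cos(x)              # cos(x) rounds to exactly 1.0
    stable = 2.0 * math.sin(x / 2.0) ** 2  # same value, no cancellation

    print(naive)   # 0.0 -- every significant digit cancelled away
    print(stable)  # 5e-17, the correct answer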

4. Overflow and underflow

When values become too large, they overflow to infinity. When they become too small, they underflow to zero.

Softmax instability? Losses that suddenly turn into inf or NaN? Exploding and vanishing gradients? All of them trace back to this fundamental limitation.
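The standard softmax fix is a one-line illustration of the problem. Below is a plain-NumPy sketch of both versions; production frameworks rely on the same max-subtraction trick internally:

    import numpy as np

    def softmax_naive(z):
        e = np.exp(z)  # exp(1000) overflows to inf
        return e / e.sum()

    def softmax_stable(z):
        # Subtracting the max changes nothing mathematically, but keeps
        # every exponent <= 0, so nothing can overflow.
        e = np.exp(z - z.max())
        return e / e.sum()

    logits = np.array([1000.0, 1001.0, 1002.0])
    print(softmax_naive(logits))   # [nan nan nan] (plus overflow warnings)
    print(softmax_stable(logits))  # approximately [0.09 0.24 0.67]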

5. Poorly chosen decompositions

Using LU decomposition on a problem that wants QR. Using QR on a problem that wants the SVD. Using normal equations when the problem requires a more stable method.

Choosing the wrong solver is like using a flathead screwdriver on a Phillips screw—technically possible, but only if nothing goes wrong.
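The least-squares case is the canonical example. In the synthetic NumPy sketch below, two columns are nearly identical; forming the normal equations squares the condition number (cond(A^T A) = cond(A)^2), while an SVD-based solver works on the original matrix:

    import numpy as np

    rng = np.random.default_rng(0)

    # Design matrix with two near-duplicate columns: cond(A) is ~1e7,
    # so cond(A^T A) is ~1e14 -- close to the limit of float64.
    t = rng.standard_normal(100)
    A = np.column_stack([t, t + 1e-7 * rng.standard_normal(100), np.ones_like(t)])
    x_true = np.array([1.0, 2.0, 3.0])
    b = A @ x_true

    # Route 1: normal equations -- solve (A^T A) x = A^T b.
    x_normal = np.linalg.solve(A.T @ A, A.T @ b)

    # Route 2: SVD-based least squares on A itself.
    x_svd, *_ = np.linalg.lstsq(A, b, rcond=None)

    print(np.linalg.norm(x_normal - x_true))  # typically orders of magnitude worse
    print(np.linalg.norm(x_svd - x_true))     # small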

6. Scaling issues

When values vary across many orders of magnitude, floating-point precision is spent on the largest magnitudes while the smallest ones drown in rounding error. This is why ML pipelines include:

  • normalization
  • standardization
  • whitening
  • log transforms

Scaling is not just a preprocessing trick. It is a numerical survival mechanism.
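A quick sketch shows why, using made-up feature scales: compare the condition number of a raw data matrix against its standardized version.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two features on wildly different (hypothetical) scales.
    raw = np.column_stack([
        rng.standard_normal(1000) * 1e9,   # huge values
        rng.standard_normal(1000) * 1e-6,  # tiny values
    ])

    # Standardize: zero mean, unit variance per column.
    scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

    print(np.linalg.cond(raw))     # ~1e15: numerically almost unusable
    print(np.linalg.cond(scaled))  # ~1: well-conditioned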

Why These Problems Matter

What makes numerical issues especially dangerous is that they masquerade as something else. Models appear to fail for mysterious reasons:

  • “Training diverged.”
  • “Loss suddenly spiked.”
  • “The model is unstable.”
  • “Gradients exploded out of nowhere.”
  • “This algorithm works on paper but breaks in production.”

But underneath, the failures often stem from:

  • rounding error
  • cancellation
  • ill-conditioning
  • inappropriate algorithms
  • poor scaling

Modern AI systems are built on top of enormous matrix operations. If those operations become unstable, everything above them—attention layers, embeddings, optimizers, inference pipelines—starts to wobble.

A Simple Rule for Real-World Systems

The more sophisticated the AI system becomes, the more its stability depends on the quality of its numerical foundations.

This leads us to the heart of the issue:

Textbook mathematics assumes perfect numbers. Computers do not have perfect numbers.

Everything you think you “know” about linear algebra changes the moment you step into floating-point arithmetic.

To understand how AI systems truly behave, we must first understand how floating-point numbers really work.

That is the topic of the next section.

In 1.2, we'll peel back the abstraction and look at the computational model itself: how numbers are stored, how rounding works, and why the smallest implementation detail can completely change an algorithm's behavior. That gap between theory and practice is exactly why numerical linear algebra is not optional knowledge for modern engineers.

2025-09-03

Shohei Shimoda

Here I have organized and written down what I have learned and know.