3.1 Norms and Why They Matter
When most people encounter norms for the first time, they see them as nothing more than ways to measure the “length” of a vector or the “size” of a matrix. Simple enough. A tool, not an idea. But in numerical computation, a norm is not merely a ruler. It is a worldview. It determines what we consider big, what we consider small, what we consider dangerous, and what we consider safe.
The choice of norm shapes every downstream decision—algorithm selection, stability analysis, error estimates, and even how we interpret the results of machine learning models. Understanding norms is not optional; it is foundational.
The Illusion of a Single “Length”
If you ask someone to compute the length of a vector in 2D—for example, (3, 4)—most people immediately reach for the Euclidean norm:
‖(3,4)‖₂ = √(3² + 4²) = 5
That feels natural because we've internalized the geometric picture of distance. But what if we change the norm? The same vector behaves completely differently:
‖(3,4)‖₁ = |3| + |4| = 7
‖(3,4)‖∞ = max(|3|, |4|) = 4
Three different norms. Three different answers. Three different ideas of “size.” So which one is correct?
In numerical computation, the answer is simple and surprising: None of them. And all of them.
Different norms illuminate different aspects of a problem. They reshape the geometry of your data, distort directions differently, emphasize some signals while hiding others. The right norm is not the one that matches your intuition—it’s the one that reveals the behavior of the system.
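In code, all three are one-liners. A minimal NumPy sketch of the example above:

```python
import numpy as np

v = np.array([3.0, 4.0])

print(np.linalg.norm(v, 2))       # Euclidean (L2) norm: 5.0
print(np.linalg.norm(v, 1))       # L1 norm: 7.0
print(np.linalg.norm(v, np.inf))  # L-infinity norm: 4.0
```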
Norms Change the Geometry of Everything
One of the most striking examples is how norms reshape the concept of a “unit circle.” Below is what the unit circle looks like under three common norms:
L2 norm: ○    L1 norm: ◇    L∞ norm: □
In L2, the unit circle is the familiar round shape. In L1, it's a diamond. In L∞, it becomes a square.
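If you want to draw these shapes yourself, here is a small sketch, assuming NumPy and matplotlib are available. It rescales points along each direction so that every plotted point has norm exactly 1 under the chosen norm:

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0, 2 * np.pi, 400)
rays = np.stack([np.cos(theta), np.sin(theta)])  # directions around the origin

for p, label in [(2, "L2: circle"), (1, "L1: diamond"), (np.inf, "L∞: square")]:
    norms = np.linalg.norm(rays, ord=p, axis=0)  # p-norm of each direction
    unit = rays / norms                          # rescale so each point has norm 1
    plt.plot(unit[0], unit[1], label=label)

plt.gca().set_aspect("equal")
plt.legend()
plt.show()
```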
This geometry is not just an abstract curiosity. It directly affects:
- how optimization algorithms search their space,
- how regularization shapes model parameters,
- how rounding errors propagate,
- how perturbations grow or shrink inside algorithms.
Even deep learning—often seen as “beyond linear algebra”—is intensely dependent on norms, though many practitioners do not realize it. Gradient descent behaves differently depending on norm structure. Loss surfaces appear smooth or sharp depending on the norm used to measure curvature. Even model initialization strategies implicitly encode norm choices.
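One concrete instance of this dependence: the steepest descent direction is itself defined relative to a norm, and changing the norm changes the update. A small sketch, using a made-up gradient vector purely for illustration:

```python
import numpy as np

g = np.array([3.0, -0.5, 1.0])  # made-up gradient, for illustration only

# Steepest descent direction: the d with ||d|| <= 1 that minimizes g . d.
d_l2 = -g / np.linalg.norm(g, 2)   # L2: the familiar normalized gradient

d_linf = -np.sign(g)               # L-infinity: move every coordinate at full speed

d_l1 = np.zeros_like(g)            # L1: spend the whole step on one coordinate
i = np.argmax(np.abs(g))
d_l1[i] = -np.sign(g[i])

print(d_l2, d_linf, d_l1)
```

The L∞ direction is the sign-of-gradient update familiar from adversarial-example methods; the L1 direction moves along a single coordinate, as in coordinate descent. Same gradient, three different steps.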
Matrix Norms: Where Things Get Serious
With vectors, norms measure size. With matrices, norms measure amplification. A matrix norm tells you how much the matrix can stretch a vector.
This is the key idea:
A matrix norm answers: “How badly can this transformation distort the world?”
For a matrix A, the induced norm is:
‖A‖ = max over non-zero x of (‖Ax‖ / ‖x‖)
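NumPy exposes the common induced norms directly, and the definition can also be checked by brute force. The matrix below is an arbitrary example, chosen for its strong shear:

```python
import numpy as np

A = np.array([[1.0, 100.0],
              [0.0,   1.0]])  # arbitrary example: a strong shear

print(np.linalg.norm(A, 1))       # induced 1-norm: max absolute column sum
print(np.linalg.norm(A, 2))       # induced 2-norm: largest singular value
print(np.linalg.norm(A, np.inf))  # induced inf-norm: max absolute row sum

# Brute-force check of the definition: max of ||Ax|| / ||x|| over random x.
rng = np.random.default_rng(0)
xs = rng.standard_normal((2, 100_000))
ratios = np.linalg.norm(A @ xs, axis=0) / np.linalg.norm(xs, axis=0)
print(ratios.max())  # approaches the induced 2-norm from below
```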
That ratio—how much a matrix can expand a vector—is a quantitative measure of risk. It predicts:
- how much rounding noise gets amplified,
- whether solving Ax = b will be stable,
- whether an iterative solver will converge or explode,
- how errors evolve inside long computational chains.
Norms are not just measurements—they are predictors of failure.
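Here is a sketch of that failure prediction in action, using a made-up, nearly singular matrix: a perturbation of size 10⁻⁸ in b produces an enormous relative change in the solution of Ax = b.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10]])  # made-up, nearly singular matrix
b = np.array([2.0, 2.0])

x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b + np.array([0.0, 1e-8]))  # nudge b slightly

# b changed by a few parts in 10^9, yet the solution moves enormously:
print(np.linalg.norm(x_pert - x) / np.linalg.norm(x))
```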
Norms Define the Shape of Errors
In practical systems, we often encounter a situation like this:
Ax ≈ b (but not exactly)
Why the “approximate” sign? Because rounding, truncation, hardware limits, and numerical instability all introduce deviations. To understand how serious these deviations are, we need a norm. But again:
The norm you choose changes the severity of the error.
Consider two errors of the same size under one norm. Measured under a different norm, one may be negligible while the other is catastrophic.
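A sketch of this effect, with two made-up error vectors in 100 dimensions: under the ∞-norm they are identical in size, while under the 1-norm they differ by a factor of 100.

```python
import numpy as np

n = 100
e1 = np.zeros(n)
e1[0] = 0.01            # error concentrated in a single entry
e2 = np.full(n, 0.01)   # error of the same per-entry size, spread everywhere

for p in [np.inf, 2, 1]:
    print(p, np.linalg.norm(e1, p), np.linalg.norm(e2, p))
# inf-norm: 0.01 vs 0.01 (identical); 1-norm: 0.01 vs 1.0 (a factor of 100)
```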
This is why numerical libraries (NumPy, LAPACK, PyTorch, TensorFlow) carefully specify which norms they use when reporting convergence or stability. If you misunderstand the norm behind a reported error, you misunderstand the entire meaning of the result.
Norms Predict Ill-Conditioning
Later in this chapter, we examine conditioning: how sensitive a problem is to small perturbations. But conditioning itself is defined using norms. Change the norm, and you change the conditioning.
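For an invertible matrix, the condition number is κ(A) = ‖A‖ · ‖A⁻¹‖, so it inherits whatever norm you pick. NumPy's np.linalg.cond accepts the norm as an argument; the matrix below is an arbitrary example:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0],
              [5.0, 0.0, 6.0]])  # arbitrary example matrix

for p in [1, 2, np.inf]:
    print(p, np.linalg.cond(A, p))  # same matrix, three condition numbers
```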
This single relationship explains half of all numerical mysteries:
- Why some matrices behave well in one analysis and poorly in another,
- Why gradient descent succeeds on one formulation but fails on another,
- Why PCA and SVD behave differently depending on preprocessing,
- Why certain systems magnify noise unpredictably.
Norms determine how sensitive your system appears, which often determines how sensitive your system actually is in practice.
The Norm as a Narrative
Every large-scale AI or simulation pipeline has a story. Layers of transforms reshape signals, multiply errors, rotate coordinate systems, diagonalize structures, compress dimensions, and propagate uncertainty.
Norms tell that story clearly. They show you how the system thinks about “size.” They reveal how errors move. They expose hidden relationships between variables.
And they give engineers a vocabulary for diagnosing failure modes before they happen.
Understanding norms means seeing computation not as a collection of formulas, but as a geometric engine that reshapes information at every step.
Taking the Next Step
Once norms are understood, a natural question emerges:
Now that we know how to measure size, how do we measure error?
Because not all errors matter equally. Some grow. Some cancel. Some are harmless. Some are the beginning of catastrophic failure.
To understand this, we move next into a central topic of numerical computation: how to measure errors correctly.
Next: 3.2 Measuring Errors