7.1 Gram–Schmidt and Modified GS
If you ask someone to “explain QR decomposition from intuition,” the Gram–Schmidt process is almost always where they begin. It mirrors the geometry we instinctively imagine when we hear the word orthogonal — taking a set of messy, overlapping vectors and polishing them into a clean, perpendicular basis.
But although Gram–Schmidt is easy to visualize and elegant on paper, it contains a subtle flaw: it is numerically fragile. In floating-point arithmetic, the way we implement Gram–Schmidt matters just as much as what it computes.
To understand this, let's start with the classical algorithm — the one every textbook introduces first — and then unravel how real computers distort its behavior.
Classical Gram–Schmidt: the geometric picture
Suppose we are given vectors a₁, a₂, …, aₙ. We want to produce an orthonormal set q₁, q₂, …, qₙ such that:
span(a₁, …, aₙ) = span(q₁, …, qₙ)
The idea is beautifully simple:
- Take the first vector a₁ and normalize it.
- Remove from a₂ its projection onto q₁.
- Normalize what remains to get q₂.
- For each new vector aₖ, subtract projections onto all previous qᵢ.
Symbolically:
v₁ = a₁
q₁ = v₁ / ||v₁||
v₂ = a₂ − (q₁ᵀ a₂) q₁
q₂ = v₂ / ||v₂||
v₃ = a₃ − (q₁ᵀ a₃) q₁ − (q₂ᵀ a₃) q₂
q₃ = v₃ / ||v₃||
...
In geometric terms, each vₖ is simply the “new direction” left over after stripping away all previous directions. What could possibly go wrong?
The problem is: computers don’t do geometry — they do floating-point arithmetic. And this makes Gram–Schmidt far more delicate than it appears.
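Before looking at where this goes wrong, here is a minimal NumPy sketch of the classical procedure. The function name and interface are illustrative choices for this article, not a standard library routine:

```python
import numpy as np

def classical_gram_schmidt(A):
    """Classical Gram-Schmidt on the columns of A; returns Q, R with A ≈ Q R."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for i in range(k):
            # Every coefficient is computed from the ORIGINAL column a_k.
            R[i, k] = Q[:, i] @ A[:, k]
            v -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(v)   # assumes the columns are linearly independent
        Q[:, k] = v / R[k, k]
    return Q, R
```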
Where Classical Gram–Schmidt breaks
The Gram–Schmidt algorithm contains repeated subtraction between nearly parallel vectors. In floating-point arithmetic, subtraction of nearly equal numbers is one of the most dangerous operations. It results in catastrophic cancellation — the dramatic loss of significant digits.
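A one-line experiment shows the effect at the level of individual numbers. Both operands below carry about sixteen significant digits, but their difference keeps only a few of them (output as expected in IEEE double precision):

```python
>>> (1.0 + 1e-13) - 1.0   # the exact answer would be 1e-13
9.992007221626409e-14     # only the first few digits survive the subtraction
```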
Imagine two vectors that are almost linearly dependent:
a₂ ≈ α a₁
The projection (q₁ᵀ a₂) q₁ is then nearly equal to a₂ itself, so the residual v₂ = a₂ − (q₁ᵀ a₂) q₁ is tiny. Computing “large minus almost-the-same-large” leaves us with a result whose most important digits have been lost.
In practice, this means:
- q₂ is not very orthogonal to q₁.
- The errors compound at each step.
- By the time we reach qₖ, orthogonality may be severely degraded.
This is not a small problem — it is a fundamental numerical limitation. For matrices that are nearly rank-deficient or have nearly redundant columns, classical Gram–Schmidt can produce orthonormal vectors that are, in fact, visibly not orthonormal.
This is why classical Gram–Schmidt is mathematically correct but numerically incorrect.
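A classic test case makes this visible. The Läuchli-style matrix below has three columns that differ only at the 1e-8 level; feeding it to the classical_gram_schmidt sketch from above, the computed Q is nowhere near orthonormal. The number in the comment is what one should expect in double precision, a sketch rather than a guaranteed output:

```python
import numpy as np

eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

Q, R = classical_gram_schmidt(A)               # the sketch shown earlier
print(np.linalg.norm(Q.T @ Q - np.eye(3)))     # ≈ 0.7: q₂ and q₃ end up about 60° apart, not 90°
```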
Modified Gram–Schmidt: the simple but powerful fix
Modified Gram–Schmidt (MGS) is not a different algorithm mathematically. It is the same projection-subtraction idea — but implemented in a way that significantly reduces error.
The key difference is the order of operations. Instead of subtracting all projections at once, MGS subtracts each projection step-by-step from the working vector.
Classical version:
vₖ = aₖ − Σ (qᵢᵀ aₖ) qᵢ
Modified version:
vₖ = aₖ
for i = 1..k-1:
rᵢₖ = qᵢᵀ vₖ
vₖ = vₖ − rᵢₖ qᵢ
qₖ = vₖ / ||vₖ||
Why does this matter? Because each coefficient rᵢₖ is computed from the current working vector rather than from the original aₖ, the error left behind by one subtraction is itself largely projected out by the subtractions that follow, so catastrophic cancellation does not snowball in the same way. MGS is not perfectly stable, but it is significantly more robust and is used in many practical numerical algorithms.
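Here is the same sketch in the modified ordering; the only change is that each coefficient is taken from the current working vector v instead of the original column (again an illustrative function name, not a library routine):

```python
import numpy as np

def modified_gram_schmidt(A):
    """Modified Gram-Schmidt on the columns of A; returns Q, R with A ≈ Q R."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for i in range(k):
            # Coefficient from the CURRENT working vector, not the original a_k,
            # so errors left by earlier subtractions are also projected out.
            R[i, k] = Q[:, i] @ v
            v -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(v)   # assumes the columns are linearly independent
        Q[:, k] = v / R[k, k]
    return Q, R
```

On the Läuchli-style example above, this version brings ||QᵀQ − I|| from roughly 0.7 down to roughly 1e-8 in double precision: not perfect, but orders of magnitude better than the classical ordering.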
In fact, in real applications:
- MGS is often “good enough” for moderately conditioned problems.
- It is dramatically better than classical Gram–Schmidt.
- It is still not as stable as the industry standard, Householder reflections.
Gram–Schmidt in machine learning and statistics
You may wonder: “Do I ever need to hand-code Gram–Schmidt?” Usually, no. But you need to understand it because:
- Many ML/AI algorithms implicitly depend on orthogonalization.
- Noise in datasets often leads to near-linear dependence.
- Regression and least-squares directly depend on orthogonality.
- PCA begins with orthogonalization before SVD enters the picture.
- Krylov methods (iterative solvers) use variants of Gram–Schmidt every iteration.
Understanding why classical GS fails — and why MGS helps — builds the intuition needed for deeper algorithms. It also teaches a core lesson of numerical computation:
A mathematically correct algorithm is not necessarily a numerically correct algorithm.
The strength of Gram–Schmidt lies in its intuitive geometry. Its weakness lies in floating-point arithmetic. Modified Gram–Schmidt patches that weakness, but only up to a point.
To go further — to reach true numerical stability — we need a different idea entirely: reflections.
Where we go next
Both classical and modified Gram–Schmidt are valuable for understanding orthogonality, but modern numerical libraries almost never use them as the backbone of QR decomposition. Instead, they rely on a method that behaves predictably even for extremely ill-conditioned matrices: Householder reflections.
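In NumPy, for instance, numpy.linalg.qr goes through LAPACK routines built on Householder reflections, and on the same nearly dependent example the returned factor is orthonormal essentially to machine precision:

```python
import numpy as np

eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

Q, R = np.linalg.qr(A)                         # Householder-based QR via LAPACK
print(np.linalg.norm(Q.T @ Q - np.eye(3)))     # ~1e-16: orthogonal to machine precision
```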
Let’s now explore Householder transformations — the method that turns orthogonalization into one of the most stable operations in numerical computing.