Chapter 7 — QR Decomposition

By now, we've walked through LU, explored the elegance and efficiency of Cholesky, and seen how positive-definite structure can turn once-expensive operations into fast, stable computations. But the real world is rarely so cooperative. Most matrices we meet inside real systems aren’t symmetric, aren’t positive definite, and aren’t even close to well-behaved.

In machine learning, statistics, signal processing, and numerical optimization, we frequently deal with rectangular matrices, tall matrices, ill-conditioned matrices, or data built from noisy measurements. These are the kinds of matrices that LU cannot factor cleanly without breaking down, and that lack the symmetric, positive-definite structure Cholesky requires.

This is where QR decomposition becomes one of the most important tools in numerical linear algebra.

QR does something powerful and conceptually beautiful: it replaces an arbitrary matrix with two matrices that are fundamentally “nice”—a matrix with perfectly orthonormal columns, and an upper triangular matrix that preserves the original matrix’s essential structure.

A = QR, where Q has orthonormal columns (QᵀQ = I) and R is upper triangular.
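To make this concrete, here is a minimal sketch using NumPy (the matrix below is just a random stand-in for real data):

    import numpy as np

    # A tall, non-symmetric matrix -- exactly the kind LU and Cholesky struggle with.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 3))

    # Thin (reduced) QR: Q is 6x3 with orthonormal columns, R is 3x3 upper triangular.
    Q, R = np.linalg.qr(A)

    print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns of Q are orthonormal
    print(np.allclose(np.triu(R), R))        # True: R is upper triangular
    print(np.allclose(Q @ R, A))             # True: the factorization reproduces A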

This single transformation unlocks a surprising amount of practical capability:

  • Stable least-squares solutions
  • Orthonormal basis extraction
  • Reliable rank estimation
  • Numerical robustness for ill-conditioned systems
  • Core operations in machine learning pipelines
  • Building blocks for eigenvalue algorithms

What makes QR exceptional isn’t just what it computes, but how it behaves under finite-precision arithmetic. Orthogonal matrices have a special property: they preserve lengths and angles, so they do not amplify error. If LU sometimes feels fragile and Cholesky feels specialized, QR feels like the trustworthy friend who always behaves predictably—even when the data doesn’t.
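That claim is easy to check numerically (a quick sketch; the matrix is an arbitrary example):

    import numpy as np

    rng = np.random.default_rng(1)
    Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a square orthogonal matrix
    x = rng.standard_normal(5)

    print(np.linalg.norm(Q @ x), np.linalg.norm(x))    # equal: Q preserves lengths
    print(np.linalg.cond(Q))                           # ~1: the best possible condition number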


Why Orthogonality Matters

To understand QR’s importance, imagine two worlds:

  • World A: Using LU on a nearly singular matrix.
    Round-off error explodes. Small perturbations produce wild results. Pivoting helps, but only so far.
  • World B: Using QR on the same matrix.
    The orthogonal transformation preserves the geometry. The numerical behavior remains stable. The algorithm “softens” the instability.

In World B, you aren’t just solving a problem—you’re solving it in a way that protects the integrity of the computation. This is why orthogonal transformations are considered “gold standard” operations in numerical analysis.

And unlike LU, which needs its rows reshuffled by pivoting to stay stable, QR gracefully handles matrices whose columns are nearly linearly dependent—something that happens constantly in applications like:

  • linear regression
  • signal reconstruction
  • high-dimensional feature engineering
  • kernel methods
  • inverse problems

Three Ways to Build Q

The power of QR decomposition comes not only from the result but also from the different ways we can construct it—each carrying its own numerical personality.

In this chapter, we will explore three foundational methods:

  1. Gram–Schmidt — the intuitive method that mirrors the geometric idea of orthogonalizing vectors.
  2. Modified Gram–Schmidt — a reordering of the same computation that substantially improves numerical stability.
  3. Householder reflections — the industry-standard method used in LAPACK, NumPy, MATLAB, and most numerical libraries because of its exceptional numerical stability.

Understanding all three allows you to appreciate not just the mathematical structure, but how stability emerges from geometry—and why some algorithms succeed while others fail.
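As a preview, here is a minimal sketch of the second method, modified Gram–Schmidt, in plain NumPy (didactic only; np.linalg.qr relies on Householder reflections through LAPACK, and all three methods are developed in detail later in the chapter):

    import numpy as np

    def modified_gram_schmidt(A):
        # Thin QR of an m x n matrix (m >= n) via modified Gram-Schmidt.
        A = np.array(A, dtype=float)
        m, n = A.shape
        Q = A.copy()
        R = np.zeros((n, n))
        for k in range(n):
            R[k, k] = np.linalg.norm(Q[:, k])
            Q[:, k] /= R[k, k]                  # normalize the k-th column
            for j in range(k + 1, n):
                R[k, j] = Q[:, k] @ Q[:, j]     # component of column j along q_k ...
                Q[:, j] -= R[k, j] * Q[:, k]    # ... removed immediately (the "modified" part)
        return Q, R

    # Quick check on a small random matrix.
    A = np.random.default_rng(2).standard_normal((6, 3))
    Q, R = modified_gram_schmidt(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True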


Least Squares: Where QR Becomes Indispensable

The single most important real-world application of QR decomposition is solving least-squares problems:

minimize ||Ax − b||₂

Least squares is everywhere:

  • regression models
  • curve fitting
  • parameter estimation
  • machine learning pipelines
  • compressed sensing
  • signal denoising

While the normal equations (AᵀA)x = Aᵀb provide the simplest theoretical solution, they are notoriously unstable: forming AᵀA squares the condition number, κ(AᵀA) = κ(A)², so even moderate ill-conditioning in A can blow up numerical errors and render the computed solution meaningless.

QR solves least squares without ever forming AᵀA. This avoids squaring the condition number and preserves far more numerical precision.

QR is often preferred over LU-based approaches and the normal equations precisely because it is one of the most numerically stable ways to solve least-squares problems in practice.
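A small numerical sketch of the difference (the data and the 1e-6 scaling factors are arbitrary, chosen only to make A ill-conditioned):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((200, 5)) @ np.diag([1, 1, 1, 1e-6, 1e-6])  # ill-conditioned columns
    x_true = np.ones(5)
    b = A @ x_true

    # Route 1: normal equations -- forms A^T A and squares the conditioning.
    x_normal = np.linalg.solve(A.T @ A, A.T @ b)

    # Route 2: QR -- solve R x = Q^T b, never forming A^T A.
    Q, R = np.linalg.qr(A)
    x_qr = np.linalg.solve(R, Q.T @ b)

    print(np.linalg.cond(A), np.linalg.cond(A.T @ A))   # the second is roughly the square
    print(np.linalg.norm(x_normal - x_true))            # noticeably worse ...
    print(np.linalg.norm(x_qr - x_true))                # ... than the QR route
    # (In practice, np.linalg.lstsq hands the problem to LAPACK for you.)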


Why This Chapter Matters

QR sits at a unique intersection of theory and practice. It is mathematically deep, yet computationally robust. It is foundational to numerical analysis, yet deeply embedded in machine learning, statistics, optimization, and kernel methods. It is the gateway to advanced algorithms such as:

  • Eigenvalue solvers (QR algorithm)
  • SVD computations
  • Rank-revealing factorizations
  • Conditioning analysis
  • Orthogonal basis construction

It is difficult to build reliable numerical systems without mastering QR.
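As a small taste of the first item on that list, even the unshifted QR iteration already recovers the eigenvalues of a symmetric matrix (a toy sketch; practical solvers add Hessenberg reduction, shifts, and deflation):

    import numpy as np

    def qr_eigenvalues(A, iters=200):
        # Unshifted QR iteration: factor, recombine, repeat.
        A_k = np.array(A, dtype=float)
        for _ in range(iters):
            Q, R = np.linalg.qr(A_k)
            A_k = R @ Q                 # similar to A; drifts toward triangular form
        return np.sort(np.diag(A_k))

    S = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    print(qr_eigenvalues(S))
    print(np.sort(np.linalg.eigvalsh(S)))   # reference eigenvalues agree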


Where We Go Next

All orthogonal factorizations begin with one fundamental operation: turning arbitrary vectors into orthonormal ones.

The most intuitive way to do that is Gram–Schmidt—a process beloved for its geometric clarity and notorious for its numerical fragility.

To understand QR, to understand orthogonality, and to understand why stability matters so much, we must start here.

Let’s begin with Gram–Schmidt—and why the “obvious” algorithm is rarely the one we should use.

2025-10-01

Shohei Shimoda

Here I have organized and written down what I have learned and know.