Chapter 6 — Cholesky Decomposition
Some chapters in numerical linear algebra feel like a rite of passage. LU decomposition, Gaussian elimination, QR factorization — these are the classical tools you expect to learn, because every textbook says so, every course covers them, and every engineer eventually bumps into them. They form the skeleton of numerical computation, the reliable backbone of many systems.
But then there are the quieter techniques — the ones that don’t always appear on the first page of a syllabus, but quietly power some of the most important computations in modern engineering. Among these, one method stands out not because it is flashy or complicated, but because it is precisely the opposite: simple, elegant, unusually stable, and almost unreasonably efficient.
This method is Cholesky decomposition.
If LU decomposition is the all-purpose wrench, Cholesky is the finely honed instrument that only appears when the problem has a particular kind of structure — and when it does, it changes everything about how we compute.
Where This Story Begins
Engineers usually meet Cholesky in one of two ways. The first is through a brief mention in a linear algebra course — a footnote near the end of a chapter, often treated as a special case: “if a matrix is symmetric positive definite, you can factor it more efficiently.” It feels like a technical curiosity, something you acknowledge and move past.
The second way — the more memorable one — is the moment you discover that a supposedly “simple” problem in machine learning or statistics refuses to run because your matrix is too big, too unstable, or too slow to invert. Then you search for alternatives and suddenly notice that every high-performance implementation uses Cholesky.
That moment is often the beginning of a shift in thinking:
- Why does this single factorization appear in so many algorithms?
- Why is it always faster than LU or QR?
- What makes it so numerically stable?
- Why does machine learning rely on it so heavily?
And, perhaps most importantly:
What makes symmetric positive definite (SPD) matrices special?
The Hidden Structure in Real Data
One of the most liberating realizations in numerical linear algebra is that real-world matrices rarely behave like arbitrary matrices from a textbook. They are structured. They are shaped by geometry, probability, optimization, and physical constraints. They encode correlations, interactions, energies, and distances.
In machine learning, covariance matrices are SPD.
In optimization, Hessians (or their approximations) are SPD.
In Gaussian processes, kernel matrices are SPD.
In Kalman filters, noise covariance matrices are SPD.
In mechanics, stiffness matrices are SPD.
In statistics, normal equations often produce SPD matrices.
Once you begin seeing this pattern, you realize:
Modern computation runs on SPD matrices.
And if SPD matrices dominate real workloads, then we should use the factorization designed specifically for them — Cholesky.
Why Cholesky Feels Almost Magical
Cholesky decomposition writes a symmetric positive definite matrix A as:
A = L Lᵀ

where L is a lower triangular matrix and Lᵀ is its transpose. That’s it. No pivoting, no conditionals, no need to maintain both an L and a U. Symmetry and positive definiteness guarantee that such a factorization exists and is unique once the diagonal entries of L are required to be positive.
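Here is a minimal sketch of that identity in NumPy (the matrix A built below is illustrative; np.linalg.cholesky returns the lower-triangular factor):

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative SPD matrix: B @ B.T plus a positive multiple of I is SPD.
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)

# NumPy computes the lower-triangular factor L with A = L @ L.T.
L = np.linalg.cholesky(A)

print(np.allclose(L, np.tril(L)))   # True: L is lower triangular
print(np.allclose(A, L @ L.T))      # True: the factorization reconstructs A
```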
This simplicity unlocks three superpowers:
- Speed — Cholesky requires roughly half the work of LU decomposition.
- Stability — for an SPD matrix the factorization is backward stable without any pivoting, so it avoids the element growth that can amplify rounding errors in LU.
- Memory efficiency — only one triangular matrix is stored, reducing memory use by half.
On modern hardware, these advantages compound. Cholesky is one of the most cache-friendly operations in scientific computing. It is deeply optimized in BLAS and LAPACK. It is the default solver for many probabilistic models.
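Here is a rough timing sketch of that speed advantage, using SciPy's cho_factor/cho_solve against lu_factor/lu_solve on the same SPD system (the matrix size and seed are arbitrary, and absolute times depend on your BLAS build):

```python
import time

import numpy as np
from scipy.linalg import cho_factor, cho_solve, lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 2000
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # illustrative SPD matrix
b = rng.standard_normal(n)

t0 = time.perf_counter()
c, low = cho_factor(A)             # one triangular factor
x_chol = cho_solve((c, low), b)
t1 = time.perf_counter()

lu, piv = lu_factor(A)             # L, U, and pivot information
x_lu = lu_solve((lu, piv), b)
t2 = time.perf_counter()

print(f"Cholesky: {t1 - t0:.3f} s")
print(f"LU:       {t2 - t1:.3f} s")
print("solutions agree:", np.allclose(x_chol, x_lu))
```

On a typical BLAS build the Cholesky path takes roughly half the time of the LU path for large n, which matches the flop counts (about n³/3 versus 2n³/3), though the exact ratio varies by implementation.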
This is why, once you enter fields like Bayesian inference, robotics, physics simulation, finance, or kernel methods, Cholesky is everywhere. You don’t see it in the headlines, but you see it behind almost every stable, efficient system.
From Theory to Practice
This chapter is dedicated to revealing the inner workings, applications, and subtleties of Cholesky decomposition. Our journey includes:
- 6.1 SPD matrices and why they matter — understanding the geometry and consequences of symmetry and positive definiteness.
- 6.2 Memory advantages — how storing only one triangular matrix transforms performance.
- 6.3 Applications in ML, statistics, kernels — the real-life algorithms that depend on Cholesky for both correctness and speed.
Along the way, we will look at:
- Geometric interpretations of SPD matrices.
- Why positive definiteness guarantees stability.
- How covariance matrices rely on Cholesky for sampling, likelihoods, and inference (previewed in the short sketch after this list).
- How kernel machines avoid explicit matrix inversion by using triangular solves.
- Why optimization libraries treat Cholesky as the “gold standard” factorization.
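As a preview of the covariance and triangular-solve items above, here is a minimal NumPy/SciPy sketch (Sigma and mu below are illustrative) showing how a single Cholesky factor yields samples, the log-determinant needed for Gaussian likelihoods, and solves that never form an explicit inverse:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)

# An illustrative covariance matrix, SPD by construction.
B = rng.standard_normal((3, 3))
Sigma = B @ B.T + np.eye(3)
mu = np.array([1.0, -2.0, 0.5])

# Sampling: if z ~ N(0, I) and Sigma = L Lᵀ, then mu + L z ~ N(mu, Sigma).
L = np.linalg.cholesky(Sigma)
z = rng.standard_normal((3, 100_000))
samples = mu[:, None] + L @ z
print(np.abs(np.cov(samples) - Sigma).max())   # small; shrinks as the sample count grows

# Log-determinant for Gaussian log-likelihoods: log det(Sigma) = 2 * sum(log diag(L)).
logdet = 2.0 * np.sum(np.log(np.diag(L)))
print(np.isclose(logdet, np.linalg.slogdet(Sigma)[1]))  # True

# Solving Sigma x = b with two triangular solves instead of forming Sigma⁻¹.
b = rng.standard_normal(3)
c, low = cho_factor(Sigma)
x = cho_solve((c, low), b)
print(np.allclose(Sigma @ x, b))  # True
```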
By the end of this chapter, you will understand not only how Cholesky works, but also why software engineers, data scientists, and simulation engineers trust it implicitly.
Where We Go Next
To unlock the power of Cholesky, we must first understand the foundation that makes it possible: symmetric positive definite matrices. These matrices are not merely a special case — they are the mathematical expression of stability, energy, and curvature. They tell us when a system behaves predictably, when optimization converges, and when probabilities are well-defined.
Let’s begin with 6.1 SPD matrices and why they matter.