7.3 Least Squares Problems

Least squares problems appear everywhere in applied mathematics, data science, and machine learning. Any time we try to fit a model, estimate parameters, denoise signals, or approximate relationships, we are (knowingly or unknowingly) solving a least squares problem.

Formally, given an m × n matrix A (typically with more rows than columns, so the system is overdetermined) and a vector b, we wish to find the x that minimizes:

‖Ax − b‖₂

This is the geometric version of the problem. Algebraically, it expands to minimizing the sum of squared residuals. Practically, it means: find the best possible solution when the system has no exact solution.
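
As a minimal illustration (the numbers here are made up), NumPy's built-in routine solves exactly this minimization:

    import numpy as np

    # A hypothetical overdetermined system: 5 equations, 2 unknowns.
    A = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0],
                  [1.0, 5.0]])
    b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

    # np.linalg.lstsq minimizes ‖Ax − b‖₂ and also reports the squared residual
    # norm, the numerical rank of A, and the singular values of A.
    x_hat, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)
    print(x_hat)  # best-fit intercept and slope for the line y ≈ x₀ + x₁·t

No x satisfies all five equations exactly; lstsq returns the x̂ with the smallest residual norm.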

Least squares is the mathematical backbone of:

  • Linear regression
  • Signal reconstruction
  • Time-series smoothing
  • Computer vision calibration
  • Machine learning (as an inner subproblem)
  • Overdetermined physical measurement systems

Because of its universality, the way we solve least squares matters enormously. A method that looks good enough on paper can become dangerously unstable in floating-point arithmetic.


The geometry of least squares

Least squares has an incredibly clean geometric interpretation:

Solve Ax ≈ b by projecting b onto the column space of A.

Imagine the columns of A as vectors spanning a subspace. The vector b lives somewhere in the ambient space. The goal is to find the point in the column space that is closest to b. That point is A x̂, where x̂ is the least squares solution.

This geometric viewpoint immediately reveals something important:

Solving least squares is fundamentally about orthogonality.

If we can build an orthonormal basis for the column space of A, everything becomes easy and numerically stable.
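
A quick numerical check of this picture, on an arbitrary random example: the residual b − A x̂ should be orthogonal to every column of A.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 3))   # arbitrary full-rank example
    b = rng.standard_normal(6)

    x_hat = np.linalg.lstsq(A, b, rcond=None)[0]
    residual = b - A @ x_hat

    # The residual is orthogonal to the column space: Aᵀ(b − A x̂) ≈ 0.
    print(np.allclose(A.T @ residual, 0.0, atol=1e-12))  # True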


The normal equations — and their pitfalls

The most “textbook” approach to solving least squares is using the normal equations:

Aᵀ A x = Aᵀ b

This looks elegant, but it hides a dangerous problem.

The matrix Aᵀ A squares the condition number of A.

That means:

  • Ill-conditioned systems become catastrophically ill-conditioned.
  • Small errors in A or b become amplified.
  • The computed solution can lose accuracy even for moderately ill-conditioned datasets.

Moreover, forming Aᵀ A explicitly often introduces unnecessary rounding errors. Floating-point summation is not exact, especially for large-scale data.
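
The squaring effect is easy to observe; the Vandermonde matrix below is just one convenient, made-up source of ill-conditioning:

    import numpy as np

    # A Vandermonde-style design matrix (polynomial fitting) is a classic
    # example of an ill-conditioned least squares problem.
    t = np.linspace(0.0, 1.0, 20)
    A = np.vander(t, 6)                    # 20 × 6

    kappa_A = np.linalg.cond(A)
    kappa_AtA = np.linalg.cond(A.T @ A)
    print(kappa_A, kappa_AtA)              # kappa_AtA ≈ kappa_A ** 2

Once κ(A)² approaches 1/ε ≈ 10¹⁶ in double precision, the normal equations can lose essentially all accurate digits, even when A itself is still well within reach of a stable solver.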

So while the normal equations are mathematically valid, in practice they often lead to:

  • Poor accuracy
  • Unstable estimates
  • Incorrect solutions under floating-point arithmetic

This is why numerical linear algebra textbooks, libraries, and ML frameworks strongly recommend against using them except for small, well-behaved systems.


QR decomposition to the rescue

If we decompose A = QR where Q is orthogonal and R is upper triangular, the least squares problem becomes:

‖QRx − b‖₂

Because Q preserves norms, this simplifies neatly to:

‖Rx − Qᵀ b‖₂

Now R is triangular, so solving the least squares system becomes:

R x = Qᵀ b

This is stable, efficient, and avoids forming Aᵀ A. This method is the foundation of robust least-squares solvers everywhere.
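
A minimal sketch of this pipeline, assuming A has full column rank and using NumPy's reduced (thin) QR (the helper name lstsq_qr is just for illustration):

    import numpy as np
    from scipy.linalg import solve_triangular

    def lstsq_qr(A, b):
        """Solve min ‖Ax − b‖₂ via a thin QR factorization (full-rank A assumed)."""
        Q, R = np.linalg.qr(A)               # reduced QR: Q is m×n, R is n×n
        return solve_triangular(R, Q.T @ b)  # back substitution on R x = Qᵀ b

    rng = np.random.default_rng(1)
    A = rng.standard_normal((100, 4))
    b = rng.standard_normal(100)

    print(np.allclose(lstsq_qr(A, b), np.linalg.lstsq(A, b, rcond=None)[0]))  # True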

Two crucial advantages:

  • Orthogonality preserves numerical accuracy
  • The triangular system is easy to solve and well-behaved

Because QR (especially with column pivoting) handles nearly dependent or rank-deficient columns far more gracefully than the normal equations, it serves as the standard approach in ML, statistics, and scientific computing.
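
For example, SciPy's column-pivoted QR exposes near-dependence through the diagonal of R; the synthetic data below is only for illustration:

    import numpy as np
    from scipy.linalg import qr

    rng = np.random.default_rng(2)
    A = rng.standard_normal((50, 4))
    # Append a column that is almost a copy of the first one.
    A = np.column_stack([A, A[:, 0] + 1e-10 * rng.standard_normal(50)])

    # With column pivoting, the magnitudes on the diagonal of R are non-increasing,
    # so a tiny trailing entry flags the near-dependence.
    Q, R, piv = qr(A, pivoting=True, mode='economic')
    print(np.abs(np.diag(R)))  # the last entry is tiny, suggesting numerical rank 4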


SVD: the ultimate least-squares solver

Although QR is excellent, the most robust solver for least squares is the Singular Value Decomposition (SVD).

SVD directly reveals:

  • Rank deficiencies
  • Near-linear dependence
  • Conditioning and numerical sensitivity

Writing A = U Σ Vᵀ, it computes the pseudoinverse A⁺ = V Σ⁺ Uᵀ in a stable way, truncating negligible singular values, and yields:

x = A⁺ b

This is the gold standard solver, especially for ill-conditioned systems. The trade-off is cost: SVD is slower than QR.
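
A sketch of the SVD route, with the small-singular-value cutoff that makes it robust; the helper and its tolerance are illustrative choices, not a library API:

    import numpy as np

    def lstsq_svd(A, b, rcond=None):
        """Minimum-norm least squares solution x = A⁺ b via the SVD."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        if rcond is None:
            rcond = max(A.shape) * np.finfo(A.dtype).eps
        keep = s > rcond * s[0]                 # truncate negligible singular values
        return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])

    rng = np.random.default_rng(3)
    A = rng.standard_normal((30, 5))
    b = rng.standard_normal(30)
    print(np.allclose(lstsq_svd(A, b), np.linalg.pinv(A) @ b))  # True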

Thus, numerical libraries follow a rule of thumb:

  • QR for most problems
  • SVD when accuracy outweighs performance


Least squares in the real world

Least squares lives inside countless real systems:

  • Fitting linear models
  • Kalman filters
  • Camera calibration
  • Gaussian processes
  • Matrix factorization in recommender systems
  • Gradient descent initialization

In machine learning, least squares even hides in the background of nonlinear models: linearized steps in optimizers, trust-region methods, and small subproblems in training often reduce to solving a least squares system.
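
As a toy example of such an inner subproblem (model and data invented for illustration), each Gauss–Newton iteration for a nonlinear fit solves a small linear least squares system:

    import numpy as np

    # Toy nonlinear model y ≈ exp(a·t); fit the single parameter a.
    t = np.linspace(0.0, 1.0, 20)
    y = np.exp(0.7 * t)

    def residual(a):
        return np.exp(a * t) - y

    def jacobian(a):
        return (t * np.exp(a * t)).reshape(-1, 1)   # ∂residual/∂a as a 20×1 matrix

    a = 0.0
    for _ in range(10):
        # Each Gauss–Newton step minimizes ‖J·d + r‖₂, a linear least squares problem.
        d = np.linalg.lstsq(jacobian(a), -residual(a), rcond=None)[0]
        a += d[0]
    print(a)  # ≈ 0.7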

Thus, accuracy in least squares does not just improve a single computation — it strengthens entire pipelines.


Where QR enters the picture

Whether through classical Gram–Schmidt (rarely used in practice), modified Gram–Schmidt (better), or Householder reflections (the usual choice), QR decomposition serves as the workhorse for stable least-squares solutions.

Which brings us to the key question:

Why do modern numerical libraries overwhelmingly choose QR as the default solver for least squares?

To answer that, we now step into a comparison that reveals the strengths — and limits — of QR.

2025-10-04

Shohei Shimoda

I have organized and written up here what I have learned and know.