Lecture Notes

18. Least Squares

Part of the Series on Linear Algebra.

By Akshay Agrawal. Last updated Dec. 20, 2018.

Previous entry: Projections; Next entry: Principal Component Analysis

This section treats the problem of least squares. We are given a matrix $A \in \mathbf{R}^{m \times n}$ and a vector $b \in \mathbf{R}^m$, and our goal is to find a vector $x \in \mathbf{R}^n$ that minimizes $\|Ax - b\|_2$. This problem has “squares” in its name because it is often posed as the equivalent problem of minimizing $\|Ax - b\|_2^2$.

The case when $A$ has a trivial nullspace

In our study of orthogonal projections, we saw that if $m \geq n$ and $A$ is full rank, then this problem has a unique solution $x^\star = A^\dagger b$, where $A^\dagger = (A^TA)^{-1}A^T$ is the pseudoinverse of $A$. The vector in $\mathcal{R}(A)$ that achieves the minimum distance to $b$ is $\hat{Q}\hat{Q}^Tb$, where $\hat{Q}$ is the familiar matrix with orthonormal columns from $A$'s full QR factorization.
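
As a quick numerical sanity check (a sketch, not part of the original notes; the matrix $A$ and vector $b$ below are made up), the formula $(A^TA)^{-1}A^Tb$ should agree with a library least squares solver, and $\hat{Q}\hat{Q}^Tb$ should coincide with $Ax^\star$:

```python
import numpy as np

# Hypothetical tall, full-rank problem data.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

# Pseudoinverse solution x* = (A^T A)^{-1} A^T b.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# A library least squares solver should give the same answer.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_star, x_lstsq)

# The closest point to b in range(A) is Q Q^T b, with Q the matrix of
# orthonormal columns from the QR factorization of A; it equals A x*.
Q, _ = np.linalg.qr(A)
assert np.allclose(Q @ Q.T @ b, A @ x_star)
```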

Solution set of the general least squares problem

Regardless of whether $A$ is full rank, and regardless of whether $m \geq n$, the vector in $\mathcal{R}(A)$ achieving minimum distance to $b$ is the orthogonal projection of $b$ onto $\mathcal{R}(A)$. This means that if $Q$ is a matrix with orthonormal columns spanning the range of $A$, then $QQ^Tb$ minimizes $\|z - b\|_2$ over all $z \in \mathcal{R}(A)$; this was proved in the previous section. When $A$ has a nontrivial nullspace, however, the least squares problem has infinitely many solutions.

Consider the general least squares problem, with $A \in \mathbf{R}^{m \times n}$ and $b \in \mathbf{R}^m$ the problem data. Let $Q$ be a matrix with orthonormal columns spanning the range of $A$; such a matrix can be obtained by applying the modified Gram-Schmidt algorithm to the columns of $A$. Let $x_0$ be a vector in $\mathbf{R}^n$ such that $Ax_0 = QQ^Tb$ (such a vector must exist, since $QQ^T$ is the orthogonal projection onto $\mathcal{R}(A)$, which means $QQ^Tb \in \mathcal{R}(A)$). Then the solution set of the least squares problem is

$$X = \{x_0 + v \mid v \in \mathcal{N}(A)\}.$$

The set $X$ always has at least one member, namely $x_0$. If $A$ has a nontrivial nullspace (i.e., if there is a vector other than $0$ in the nullspace of $A$), then $X$ has infinitely many members.
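
The following sketch (not from the original notes; the rank-deficient matrix below is made up for illustration) exhibits this structure numerically: a particular solution $x_0$, an orthonormal basis for $\mathcal{R}(A)$, and the fact that adding any nullspace vector to $x_0$ leaves the residual unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
# A hypothetical rank-deficient A: its third column repeats the first,
# so the nullspace is nontrivial.
A = rng.standard_normal((6, 2))
A = np.hstack([A, A[:, :1]])          # shape (6, 3), rank 2
b = rng.standard_normal(6)

# Orthonormal bases for range(A) and null(A), here obtained from the SVD.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))            # numerical rank
Q = U[:, :r]                          # columns span range(A)
N = Vt[r:].T                          # columns span null(A)

# A particular solution x0, which satisfies A x0 = Q Q^T b.
x0, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(A @ x0, Q @ Q.T @ b)

# x0 + v is also a solution for any v in null(A): same residual.
v = N @ rng.standard_normal(N.shape[1])
assert np.isclose(np.linalg.norm(A @ (x0 + v) - b),
                  np.linalg.norm(A @ x0 - b))
```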

Residual vector

We can also characterize the residual vector $r = Ax^\star - b$, where $x^\star$ is any particular solution to the least squares problem. Let $Q$ be a matrix whose columns are an orthonormal basis for $\mathcal{R}(A)$ and $\tilde{Q}$ a matrix whose columns are an orthonormal basis for $\mathcal{R}(A)^\perp$. The residual vector is

$$r = Ax^\star - b = QQ^Tb - b = -(I - QQ^T)b = -\tilde{Q}\tilde{Q}^Tb,$$

where the last equality comes from the fact that $I - QQ^T$ is the complementary projection of $QQ^T$, i.e., it is the projection onto $\mathcal{R}(A)^\perp$. Notice that we say the residual vector, because the residual vector is unique; while the least squares problem might have infinitely many solutions, there is a unique vector in $\mathcal{R}(A)$ that is closest to $b$. The optimal squared residual is therefore $\|r\|_2^2 = \|\tilde{Q}\tilde{Q}^Tb\|_2^2 = \|\tilde{Q}^Tb\|_2^2$ (the second equality holds because $\tilde{Q}$ preserves norms). Although $\tilde{Q}$ has orthonormal columns, it is not an orthogonal matrix (because it is not square), so we cannot further simplify the righthand side.
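
A small numerical check of this characterization (a sketch with made-up data, not from the original notes; the trailing columns of a full QR factorization serve as $\tilde{Q}$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))       # hypothetical full-rank, tall A
b = rng.standard_normal(6)

# Full QR: the first 3 columns of Q_full span range(A), the last 3 span
# its orthogonal complement.
Q_full, _ = np.linalg.qr(A, mode="complete")
Q, Q_tilde = Q_full[:, :3], Q_full[:, 3:]

x, *_ = np.linalg.lstsq(A, b, rcond=None)
r = A @ x - b

# r = -(I - Q Q^T) b = -Q~ Q~^T b, and ||r|| = ||Q~^T b||.
assert np.allclose(r, -(b - Q @ Q.T @ b))
assert np.allclose(r, -Q_tilde @ Q_tilde.T @ b)
assert np.isclose(np.linalg.norm(r), np.linalg.norm(Q_tilde.T @ b))
```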

Best linear unbiased estimator

Suppose we want to estimate a vector $x \in \mathbf{R}^n$. We cannot measure $x$ directly; instead, we observe a vector $y \in \mathbf{R}^m$, which we know to be a function of $x$: $y = Ax + v$. The matrix $A \in \mathbf{R}^{m \times n}$, with $n < m$ and full rank, represents a sensor for $x$, and the vector $v$ is a random variable representing measurement noise with mean $0$ and variance $\sigma^2 I$. We seek an unbiased estimate of $x$ that is a linear function of $y$.

One such estimate is $\hat{x} = A^\dagger y$, since $\mathbf{E}[A^\dagger y] = A^\dagger(Ax + \mathbf{E}[v]) = A^\dagger Ax = x$. It turns out that $A^\dagger y$ has lower variance than any other linear unbiased estimator of $x$; in this sense, $A^\dagger y$ is the best linear unbiased estimator of $x$. To see this, let $By$ be some other linear unbiased estimator of $x$, and note that $BA = I$ (unbiasedness requires $\mathbf{E}[By] = BAx = x$ for every $x$). We will show that $BB^T - A^\dagger(A^\dagger)^T$ is positive semidefinite.

The variance of $By$ is

$$\mathbf{E}\left[(By - x)(By - x)^T\right] = \mathbf{E}\left[Bvv^TB^T\right] = \sigma^2 BB^T.$$

Because $AA^\dagger$ is an orthogonal projection, $\|AA^\dagger z\|_2 \leq \|z\|_2$ for all $z$, and $AA^\dagger B^T = A(A^TA)^{-1}(BA)^T = (A^\dagger)^T$. Hence

$$\|(A^\dagger)^Tz\|_2 = \|AA^\dagger B^Tz\|_2 \leq \|B^Tz\|_2 \quad \text{for all } z.$$

This implies that $BB^T - A^\dagger(A^\dagger)^T$ is positive semidefinite.
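
The claim can be spot-checked empirically. In the sketch below (not from the original notes), the sensor matrix, the noise level, and the competing unbiased estimator $B = A^\dagger + D(I - AA^\dagger)$ are all made up for illustration; both estimators come out unbiased, but $A^\dagger y$ has the smaller covariance (compared here via its trace).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 3, 8, 0.5
A = rng.standard_normal((m, n))       # hypothetical full-rank sensor matrix
x = rng.standard_normal(n)            # the vector we want to estimate

A_dagger = np.linalg.solve(A.T @ A, A.T)      # (A^T A)^{-1} A^T

# Another linear unbiased estimator: B = A^dagger + D (I - A A^dagger)
# satisfies B A = I for any choice of D.
D = 0.3 * rng.standard_normal((n, m))
B = A_dagger + D @ (np.eye(m) - A @ A_dagger)
assert np.allclose(B @ A, np.eye(n))

# Monte Carlo: both estimators are unbiased, but A^dagger y has the
# smaller covariance.
trials = 20000
V = sigma * rng.standard_normal((trials, m))          # noise samples
Y = (A @ x) + V                                       # one measurement per row
est_dagger = Y @ A_dagger.T
est_B = Y @ B.T
print(np.abs(est_dagger.mean(axis=0) - x).max())      # small: unbiased
print(np.abs(est_B.mean(axis=0) - x).max())           # small: unbiased
print(np.trace(np.cov(est_dagger.T)),                 # first trace is
      np.trace(np.cov(est_B.T)))                      # smaller than second
```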

Least-norm solution via the Moore-Penrose inverse

The vector $x^\star = A^+b$ gives the least-norm solution of the general least squares problem (in which $A$ may be singular), where $A^+$ is the Moore-Penrose inverse (see the notes on the SVD). To see why this is the case, let $A = U\Sigma V^T$ be an SVD of $A$, so that $U$ and $V$ are orthogonal and $A^+ = V\Sigma^+U^T$ (the matrix $\Sigma^+$ is obtained by transposing $\Sigma$ and inverting the non-zero entries). If $A$ is diagonal, then it is clear that the least-norm minimizer of $\|Ax - b\|_2$ is $A^+b$. Otherwise, note that

$$\|Ax - b\|_2 = \|U\Sigma V^Tx - b\|_2 = \|\Sigma V^Tx - U^Tb\|_2,$$

since $U$ is an isometry. Let $z = V^Tx$. Note that $\|z\|_2 = \|x\|_2$ and that $V^T$ is invertible, so $x$ is the least-norm minimizer of $\|Ax - b\|_2$ if and only if $z$ is the least-norm minimizer of $\|\Sigma z - U^Tb\|_2$. Since $\Sigma$ is diagonal, by our previous argument we conclude that the least-norm minimizer of $\|\Sigma z - U^Tb\|_2$ is $z^\star = \Sigma^+U^Tb$. Therefore, the least-norm minimizer of $\|Ax - b\|_2$ is $x^\star = Vz^\star = V\Sigma^+U^Tb = A^+b$.
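
A sketch of this fact with made-up, rank-deficient data (not from the original notes): the SVD-based formula $V\Sigma^+U^Tb$ matches NumPy's pinv, and every other solution has the same residual but no smaller a norm.

```python
import numpy as np

rng = np.random.default_rng(0)
# A hypothetical rank-deficient A (its third column repeats the first).
A = rng.standard_normal((6, 2))
A = np.hstack([A, A[:, :1]])
b = rng.standard_normal(6)

# Moore-Penrose inverse via the SVD: A^+ = V Sigma^+ U^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))            # numerical rank
s_plus = np.array([1.0 / si if si > 1e-10 else 0.0 for si in s])
A_plus = Vt.T @ np.diag(s_plus) @ U.T
assert np.allclose(A_plus, np.linalg.pinv(A, rcond=1e-10))

x_star = A_plus @ b

# Any other solution x_star + v (with v in null(A)) has the same residual
# but cannot have a smaller norm.
N = Vt[r:].T                          # columns span null(A)
v = N @ rng.standard_normal(N.shape[1])
assert np.isclose(np.linalg.norm(A @ (x_star + v) - b),
                  np.linalg.norm(A @ x_star - b))
assert np.linalg.norm(x_star) <= np.linalg.norm(x_star + v) + 1e-12
```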
