The least squares problem asks for a vector at which the value of a given linear map is as close as possible to another given vector. More specifically, let \(A\) be an \(m \times n\) real matrix and \(b\) a vector in \(\mathbb{R}^m\). The least squares problem is to find a vector \(x\) in \(\mathbb{R}^n\) such that \(\Vert Ax - b\Vert\) is minimized. The least squares problem is useful when the equation \(Ax = b\) has no solution, i.e., when its corresponding system of linear equations is inconsistent. The interpretation is as follows:

Given that there is no vector \(x\) such that \(\Vert Ax - b\Vert\) is zero, how can we find a vector \(x\) that minimizes the value of \(\Vert Ax - b\Vert\)? In the following, we will see that we can solve the least squares problem with the help of the QR matrix factorization.

#### Definition of the least squares problem

Let \(A\) be an \(m \times n\) real matrix and \(x\) a vector in \(\mathbb{R}^n\). It is clear that \(Ax\) is a vector in the column space \(\hbox{Col}(A)\) of the matrix \(A\). Let \(b\) be a vector in \(\mathbb{R}^m\). We are interested in finding a vector \(\hat{x}\) such that \(A\hat{x}\) is nearest to \(b\). In other words:

**The least squares problem**. Let \(A\) be an \(m\times n\) matrix and \(b \in \mathbb{R}^m\). Then, a solution to the least squares problem is a vector \(\hat{x} \in \mathbb{R}^n\) such that $$\Vert A\hat{x} - b \Vert \leq \Vert Ax - b \Vert,$$ for all \(x \in \mathbb{R}^n\).

**Remark**. Some references refer to the value \(\Vert A\hat{x} - b \Vert\) as the **least squares error**.
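Before deriving anything by hand, it may help to see the problem solved numerically. The following is a minimal sketch using NumPy; the particular matrix and vector are illustrative choices, not taken from the text, and the system they form is inconsistent on purpose.

```python
import numpy as np

# An inconsistent system: Ax = b has no exact solution,
# so we look for x_hat minimizing ||Ax - b||.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# np.linalg.lstsq returns a least squares solution directly.
x_hat, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

# The least squares error is ||A x_hat - b||.
error = np.linalg.norm(A @ x_hat - b)
```

For this particular system, `x_hat` works out to \((2/3,\ 1/2)\), and `error` is nonzero, confirming that no exact solution exists.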

#### Normal equation of the least squares problem

Let \(A\) be an \(m\times n\) matrix and \(b \in \mathbb{R}^m\). By the best approximation theorem, discussed in the post on orthogonal projection, a point in the subspace \(\hbox{Col}(A)\) is the nearest to \(b\) if and only if it is the projection of \(b\) onto \(\hbox{Col}(A)\); let’s call that point \(\hat{b}\).

Since \(\hat{b}\) is an element of the column space of \(A\), there is a vector \(\hat{x} \in \mathbb{R}^n\) such that \(A\hat{x} = \hat{b}\). Consequently, \(\hat{x}\) is a solution to the least squares problem of \(Ax = b\) if and only if \(\hat{x}\) is a solution to the equation \(A\hat{x} = \hat{b}\).

Now, let \(A\hat{x} = \hat{b}\). By the orthogonal decomposition theorem, discussed in the post on orthogonal projection, the vector \(b - \hat{b}\) is orthogonal to the subspace \(\hbox{Col}(A)\). This implies that \(b - A\hat{x}\) is orthogonal to each column of the matrix \(A\). This means that $$C_i \cdot (b - A\hat{x}) = 0,$$ for each column \(C_i\) of the matrix \(A\).

This is equivalent to saying that $$C^T_i (b - A\hat{x}) = 0.$$ Since each \(C^T_i\) is a row of the matrix \(A^T\), we obtain $$A^T (b - A\hat{x}) = 0,$$ which implies the following essential equation: $$(A^T A) \hat{x} = A^T b.$$

**Remark**. Some references refer to the equation \((A^T A) \hat{x} = A^T b\) as the **normal equation** of \(Ax = b\).
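When \(A^T A\) is invertible (a condition discussed later in this post), the normal equation can be solved directly with a linear solver. A minimal NumPy sketch, with an illustrative matrix and vector not taken from the text:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Form and solve the normal equation (A^T A) x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
```

The result agrees with what `np.linalg.lstsq(A, b)` would return, since both solve the same least squares problem.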

#### Solutions of the least squares problem

Conversely, let \(\hat{x}\) satisfy the normal equation $$(A^T A) \hat{x} = A^T b.$$ Then \(\hat{x}\) satisfies the following equation: $$A^T (b - A\hat{x}) = 0.$$ This means that the vector \(b - A\hat{x}\) is orthogonal to the column space of the matrix \(A\), i.e., \(\hbox{Col}(A)\). So, $$b = A\hat{x} + (b - A\hat{x})$$ is a decomposition of the vector \(b\) into the sum of the vector \(A\hat{x}\) in \(\hbox{Col}(A)\) and the vector \(b - A\hat{x}\) orthogonal to \(\hbox{Col}(A)\). By the orthogonal decomposition theorem, \(A\hat{x}\) is the orthogonal projection of \(b\) onto \(\hbox{Col}(A)\), i.e., \(\hat{b} = A\hat{x}\) is the nearest vector to \(b\), and so \(\hat{x}\) is a least squares solution.

Hence, we have already proved the following result:

**Result**. A vector \(\hat{x} \in \mathbb{R}^n\) is a least squares solution to \(Ax = b\) if and only if \(\hat{x}\) satisfies the normal equation \((A^T A) \hat{x} = A^T b\).

#### An example of the least squares problem

**Example**. Solve the least squares problem for \(Ax = b\) if $$A = \begin{pmatrix} 2 & 0 \\ -1 & 1 \\ 0 & 2 \end{pmatrix}$$ and $$ b = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}.$$ An easy calculation shows that $$A^T A = \begin{pmatrix} 5 & -1 \\ -1 & 5 \end{pmatrix}$$ and $$A^T b = \begin{pmatrix} 2 \\ -2 \end{pmatrix}.$$ In this case, the only solution to the normal equation $$(A^T A) x = A^T b$$ is $$\hat{x} = \begin{pmatrix} 1/3 \\ -1/3 \end{pmatrix},$$ because the matrix \(A^T A\) is invertible. However, in many cases, the solution is not unique. This is something that we will clarify in the next section.
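The example above can be checked numerically. A short sketch reproducing the computation in NumPy:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [-1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

AtA = A.T @ A   # [[5, -1], [-1, 5]]
Atb = A.T @ b   # [2, -2]

# Solve the normal equation (A^T A) x_hat = A^T b.
x_hat = np.linalg.solve(AtA, Atb)   # [1/3, -1/3]
```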

#### When does the least squares problem have a unique solution?

One of the most essential results for the least squares problem can be summed up as follows:

**Result**. Let \(A\) be an \(m \times n\) real matrix. Then, the following statements are equivalent:

- There is a unique least squares solution to \(Ax = b\) for each \(b\in \mathbb{R}^m\).
- The columns of \(A\) are linearly independent vectors.
- The matrix \(A^TA\) is invertible.

If one of the above statements holds, then the unique least squares solution to \(Ax=b\) is $$\hat{x} = (A^TA)^{-1} A^T b.$$
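The closed-form formula can be evaluated directly, reusing the matrix from the example above. Note that forming the explicit inverse is for illustration only; in numerical practice a solver or the QR route below is preferred.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [-1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

# The columns of A are linearly independent (rank 2),
# so A^T A is invertible and the solution is unique.
assert np.linalg.matrix_rank(A) == A.shape[1]

# Unique least squares solution: (A^T A)^{-1} A^T b.
x_hat = np.linalg.inv(A.T @ A) @ (A.T @ b)
```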

The benefit of the above result is that if the columns of a matrix \(A\) are linearly independent, we can use QR factorization to find the unique least squares solution to the equation \(Ax = b\) for each possible column vector \(b\). More precisely:

#### Finding a solution to the least squares problem with the help of QR factorization

**Result**. If \(A\) is an \(m \times n\) matrix with linearly independent columns and \(A = QR\) is a QR factorization of \(A\), as explained in the post on the QR matrix factorization, then the equation \(Ax = b\) has the unique least squares solution $$\hat{x} = R^{-1} Q^T b.$$

The proof goes as follows:

If we set \(\hat{x} = R^{-1} Q^T b\), we have $$A\hat{x} = QR\hat{x} = QR R^{-1} Q^T b = QQ^T b.$$ By the main result related to the QR matrix factorization, the columns of \(Q\) form an orthonormal basis for the column space \(\hbox{Col}(A)\) of the matrix \(A\). So, by the orthogonal projection formula explained in the post on the orthogonal projection, \(QQ^T b\) is the orthogonal projection of \(b\) onto \(\hbox{Col}(A)\), i.e., \(\hat{b} = QQ^T b\). Consequently, \(A\hat{x} = \hat{b}\), which means that \(\hat{x}\) is a least squares solution to the equation \(Ax = b\). The uniqueness of \(\hat{x}\) follows from the fact that the columns of \(A\) are linearly independent. This completes the proof.
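The QR route can be sketched in NumPy as follows, again reusing the matrix from the worked example. In practice one solves \(R\hat{x} = Q^T b\) with a triangular solver rather than inverting \(R\).

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [-1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

# Reduced QR factorization: Q (m x n) has orthonormal columns,
# R (n x n) is upper triangular and invertible here.
Q, R = np.linalg.qr(A)

# Solve R x_hat = Q^T b; equivalent to x_hat = R^{-1} Q^T b.
x_hat = np.linalg.solve(R, Q.T @ b)
```

The result matches the solution \((1/3,\ -1/3)\) obtained from the normal equation, as the proof above predicts.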