Lecture Notes

12. Symmetric Matrices

Part of the Series on Linear Algebra.

By Akshay Agrawal. Last updated Dec. 20, 2018.

Previous entry: Eigenvectors; Next entry: Matrix Norms

A square matrix $A$ is called symmetric or self-adjoint if the entries above its diagonal are equal to the entries below its diagonal, i.e., if $A = A^T$. In this section, we will present important properties related to the spectrum of self-adjoint operators, without proof.
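
As a quick check in code (a minimal numpy sketch; the matrix here is arbitrary, chosen only for illustration):

```python
import numpy as np

# An arbitrary self-adjoint matrix: entries mirror across the diagonal.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# A matrix is symmetric exactly when it equals its transpose.
assert np.allclose(A, A.T)
```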

The spectral theorem

  1. The eigenvalues of self-adjoint operators are real.
  2. An operator $A$ is self-adjoint if and only if $\mathbb{R}^n$ has an orthonormal basis consisting of eigenvectors of $A$.

The second result is known as the real spectral theorem, and it is one of the most important results in linear algebra. A main goal of linear algebra is to find conditions under which linear operators have simple matrices; the real spectral theorem says every self-adjoint matrix is diagonalizable, and diagonal matrices are as simple as it gets.

Let $A \in \mathbb{R}^{n \times n}$ be self-adjoint. By the real spectral theorem, we can construct an orthogonal matrix $Q$ whose columns $q_1, \ldots, q_n$ are orthonormal eigenvectors of $A$. Let $\Lambda$ have on its diagonal the eigenvalues $\lambda_1, \ldots, \lambda_n$ corresponding to $q_1, \ldots, q_n$. Then

$$AQ = Q\Lambda,$$

and

$$A = Q \Lambda Q^T.$$
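
The decomposition is easy to verify numerically. The sketch below uses numpy's `eigh` routine, which is specialized to self-adjoint matrices; the example matrix is arbitrary:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # self-adjoint

# eigh returns real eigenvalues (in ascending order) and a matrix Q
# whose columns are corresponding orthonormal eigenvectors.
eigvals, Q = np.linalg.eigh(A)
Lam = np.diag(eigvals)

assert np.allclose(Q @ Q.T, np.eye(2))   # Q is orthogonal
assert np.allclose(A @ Q, Q @ Lam)       # AQ = Q Lambda
assert np.allclose(A, Q @ Lam @ Q.T)     # A = Q Lambda Q^T
```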

Quadratic forms

A quadratic form is a function $f : \mathbb{R}^n \to \mathbb{R}$ of the form

$$f(x) = x^T A x = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j.$$

In other words, a quadratic form is a polynomial in which each term is of degree two and each coefficient is (for our purposes) real.

We typically assume that $A$ is self-adjoint, since

$$x^T A x = x^T \left( \frac{A + A^T}{2} \right) x.$$
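
This identity is easy to confirm numerically (a small sketch with randomly generated data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # not necessarily self-adjoint
x = rng.standard_normal(3)

# Only the self-adjoint part of A contributes to the quadratic form.
sym = (A + A.T) / 2
assert np.isclose(x @ A @ x, x @ sym @ x)
```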

Inequalities. Let $A$ be self-adjoint with eigenvalue decomposition $A = Q \Lambda Q^T$ and eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then

$$\lambda_n x^T x \leq x^T A x \leq \lambda_1 x^T x.$$

We will show the lower bound; the upper bound follows in a similar fashion. Writing $y = Q^T x$ (so that $y^T y = x^T x$, since $Q$ is orthogonal),

$$x^T A x = y^T \Lambda y = \sum_{i=1}^{n} \lambda_i y_i^2 \geq \lambda_n \sum_{i=1}^{n} y_i^2 = \lambda_n x^T x.$$

The lower bound is achieved by $x = q_n$, and the upper bound by $x = q_1$.
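
The sketch below checks the bounds on random data. Note that numpy's `eigh` returns eigenvalues in ascending order, so $\lambda_n$ comes first:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                 # a random self-adjoint matrix
eigvals, Q = np.linalg.eigh(A)    # ascending: eigvals[0] is the smallest

x = rng.standard_normal(4)
quad, norm2 = x @ A @ x, x @ x

# lambda_min x^T x <= x^T A x <= lambda_max x^T x
assert eigvals[0] * norm2 <= quad <= eigvals[-1] * norm2

# The bounds are attained at the corresponding eigenvectors.
assert np.isclose(Q[:, 0] @ A @ Q[:, 0], eigvals[0])
assert np.isclose(Q[:, -1] @ A @ Q[:, -1], eigvals[-1])
```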

Positive semidefinite matrices

A self-adjoint matrix $A$ is positive semidefinite if its quadratic form is nonnegative, i.e., if $x^T A x \geq 0$ for all $x$. This is denoted $A \succeq 0$. By the inequalities in the previous subsection, a matrix is positive semidefinite if and only if its eigenvalues are all nonnegative.

A matrix $A$ is said to be positive definite if its quadratic form is positive for all nonzero $x$; this is denoted by $A \succ 0$. Negative semidefinite and negative definite matrices are defined analogously.
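
In code, the eigenvalue characterization gives a simple (if not the most efficient) semidefiniteness test; `is_psd` is a helper written here for illustration, and the tolerance guards against round-off:

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Test positive semidefiniteness of a self-adjoint matrix
    by checking that its eigenvalues are (nearly) nonnegative."""
    return np.all(np.linalg.eigvalsh(A) >= -tol)

assert is_psd(np.array([[2.0, 1.0], [1.0, 2.0]]))      # eigenvalues 1, 3
assert not is_psd(np.array([[0.0, 1.0], [1.0, 0.0]]))  # eigenvalues -1, 1
```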

Square root. A matrix $B$ is called a square root of a matrix $A$ if $B^2 = A$. Every positive semidefinite matrix $A = Q \Lambda Q^T$ has a square root $A^{1/2} = Q \Lambda^{1/2} Q^T$, where $\Lambda^{1/2}$ is the diagonal matrix obtained by taking the square root of each entry of $\Lambda$. Evidently, $A^{1/2}$ is also positive semidefinite; moreover, it is the unique positive semidefinite square root of $A$. Of course, every positive definite matrix also has a unique positive definite square root.
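
Following the recipe above (a sketch; the example matrix is arbitrary but positive definite):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Assemble the positive semidefinite square root from the
# eigendecomposition: A^{1/2} = Q Lambda^{1/2} Q^T.
eigvals, Q = np.linalg.eigh(A)
sqrt_A = Q @ np.diag(np.sqrt(eigvals)) @ Q.T

assert np.allclose(sqrt_A @ sqrt_A, A)
```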

Partial order. We can define a partial order on semidefinite matrices; see the notes on the Loewner order.

Gram matrix. Every Gram matrix $A^T A$ is symmetric and positive semidefinite, since $x^T A^T A x = \|Ax\|_2^2 \geq 0$ for all $x$. By relabeling $A$ as $A^T$, it follows that $A A^T$ is also symmetric and positive semidefinite.
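
Numerically, the identity $x^T A^T A x = \|Ax\|_2^2$ is easy to spot-check (random data, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)

# The quadratic form of the Gram matrix A^T A is a squared norm,
# hence nonnegative.
assert np.isclose(x @ (A.T @ A) @ x, np.linalg.norm(A @ x) ** 2)
```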

Covariance matrix. Let $X \in \mathbb{R}^{m \times n}$ be a data matrix recording $m$ data points, with each point represented as a list of $n$ measurements; i.e., each row is a data point (or example, record, or experiment) with $n$ variables, and there are $m$ points total. Said another way, each row of $X$ is an observation of an $n$-dimensional random vector. Let $\bar{x} \in \mathbb{R}^n$ be the sample mean of the data matrix, i.e., the average of its rows. The sample covariance between the $i$th and $j$th variables is

$$\Sigma_{ij} = \frac{1}{m} \sum_{k=1}^{m} (X_{ki} - \bar{x}_i)(X_{kj} - \bar{x}_j).$$

The sample covariance matrix is the $n \times n$ matrix $\Sigma$ such that $\Sigma_{ij}$ is the covariance between variables $i$ and $j$, that is,

$$\Sigma = \frac{1}{m} (X - \mathbf{1}\bar{x}^T)^T (X - \mathbf{1}\bar{x}^T),$$

where $\mathbf{1}$ denotes the all-ones vector in $\mathbb{R}^m$.

Notice that if the data points in $X$ were arranged as columns instead of rows (i.e., if $X$ were transposed), the sample mean would be the average of its columns and the sample covariance matrix would be

$$\Sigma = \frac{1}{m} (X^T - \bar{x}\mathbf{1}^T)(X^T - \bar{x}\mathbf{1}^T)^T.$$

The sample covariance matrix is positive semidefinite, since it is a Gram matrix scaled by the nonnegative factor $1/m$.
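
A sketch of the computation in numpy (note the $1/m$ normalization used in these notes; numpy's built-in `np.cov` defaults to $1/(m-1)$):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
X = rng.standard_normal((m, n))   # m data points as rows, n variables

x_bar = X.mean(axis=0)            # sample mean: average of the rows
Xc = X - x_bar                    # center the data
Sigma = (Xc.T @ Xc) / m           # sample covariance matrix

# Symmetric and positive semidefinite, as a scaled Gram matrix.
assert np.allclose(Sigma, Sigma.T)
assert np.all(np.linalg.eigvalsh(Sigma) >= -1e-12)
```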

Decorrelation and whitening

A common pre-processing step in machine learning is decorrelation, i.e., linearly transforming a data matrix to make its covariance diagonal. Let $X$ be an $m \times n$ data matrix with mean $\bar{x} = 0$; its covariance matrix is $\Sigma = \frac{1}{m} X^T X$. By the spectral theorem, there exists an orthogonal matrix $Q$ and a diagonal matrix $\Lambda$ such that

$$\Sigma = Q \Lambda Q^T.$$

Multiplying $X$ on the right by $Q$ decorrelates the data: the covariance of $XQ$ is

$$\frac{1}{m} (XQ)^T (XQ) = Q^T \Sigma Q = \Lambda,$$

which is diagonal. If we additionally multiply on the right by $\Lambda^{-1/2}$ and use $X Q \Lambda^{-1/2}$ as our new data matrix, then the covariance matrix of the data becomes the identity (this assumes $\Sigma$ is positive definite, so that $\Lambda^{-1/2}$ exists). Any linear transformation that transforms the covariance into the identity is called a whitening transformation, and applying such a transformation to the original data is referred to as whitening the data.
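
Putting the pieces together (a sketch on synthetic data; the random mixing matrix just makes the raw covariance non-diagonal):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 3
X = rng.standard_normal((m, n)) @ rng.standard_normal((n, n))
X = X - X.mean(axis=0)                   # center, so the mean is zero

Sigma = (X.T @ X) / m
eigvals, Q = np.linalg.eigh(Sigma)

# Decorrelate: the covariance of XQ is the diagonal matrix Lambda.
D = X @ Q
assert np.allclose((D.T @ D) / m, np.diag(eigvals))

# Whiten: the covariance of X Q Lambda^{-1/2} is the identity
# (assumes Sigma is positive definite, as noted above).
W = D @ np.diag(1.0 / np.sqrt(eigvals))
assert np.allclose((W.T @ W) / m, np.eye(n))
```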
