Lecture Notes

12. Symmetric Matrices

Part of the Series on Linear Algebra.

By Akshay Agrawal. Last updated Dec. 20, 2018.

Previous entry: Eigenvectors; Next entry: Matrix Norms

A square matrix $A$ is called symmetric or self-adjoint if the entries above its diagonal are equal to the entries below its diagonal, i.e., if $A = A^T$. In this section, we will present important properties related to the spectrum of self-adjoint operators, without proof.
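
As a quick check in code (a minimal numpy sketch; the matrix here is arbitrary, chosen only for illustration):

```python
import numpy as np

# An arbitrary self-adjoint matrix: entries mirror across the diagonal.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# A matrix is symmetric exactly when it equals its transpose.
assert np.allclose(A, A.T)
```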

The spectral theorem

  1. The eigenvalues of self-adjoint operators are real.
  2. An operator $A$ is self-adjoint if and only if $\mathbb{R}^n$ has an orthonormal basis consisting of eigenvectors of $A$.

The second result is known as the real spectral theorem, and it is one of the most important results in linear algebra. A main goal of linear algebra is to find conditions under which linear operators have simple matrices; the real spectral theorem says every self-adjoint matrix is diagonalizable, and diagonal matrices are as simple as it gets.

Let $A \in \mathbb{R}^{n \times n}$ be self-adjoint. By the real spectral theorem, we can construct an orthogonal matrix $Q$ whose columns $q_1, \ldots, q_n$ are orthonormal eigenvectors of $A$. Let $\Lambda$ have on its diagonal the eigenvalues $\lambda_1, \ldots, \lambda_n$ corresponding to $q_1, \ldots, q_n$. Then

$$AQ = Q\Lambda,$$

and

$$A = Q \Lambda Q^T.$$
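
The decomposition is easy to verify numerically. The sketch below uses numpy's `eigh` routine, which is specialized to self-adjoint matrices; the example matrix is arbitrary:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # self-adjoint

# eigh returns real eigenvalues (in ascending order) and a matrix Q
# whose columns are corresponding orthonormal eigenvectors.
eigvals, Q = np.linalg.eigh(A)
Lam = np.diag(eigvals)

assert np.allclose(Q @ Q.T, np.eye(2))   # Q is orthogonal
assert np.allclose(A @ Q, Q @ Lam)       # AQ = Q Lambda
assert np.allclose(A, Q @ Lam @ Q.T)     # A = Q Lambda Q^T
```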

Quadratic forms

A quadratic form is a function $f : \mathbb{R}^n \to \mathbb{R}$ of the form

$$f(x) = x^T A x = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j.$$

In other words, a quadratic form is a polynomial in which each term is of degree two and each coefficient is (for our purposes) real.

We typically assume that $A$ is self-adjoint, since

$$x^T A x = x^T \left( \frac{A + A^T}{2} \right) x.$$
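
This identity is easy to confirm numerically (a small sketch with randomly generated data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # not necessarily self-adjoint
x = rng.standard_normal(3)

# Only the self-adjoint part of A contributes to the quadratic form.
sym = (A + A.T) / 2
assert np.isclose(x @ A @ x, x @ sym @ x)
```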

Inequalities. Let $A$ be self-adjoint with eigenvalue decomposition $A = Q \Lambda Q^T$ and eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then

$$\lambda_n x^T x \leq x^T A x \leq \lambda_1 x^T x.$$

We will show the lower bound; the upper bound follows in a similar fashion. Writing $y = Q^T x$ (so that $y^T y = x^T x$, since $Q$ is orthogonal),

$$x^T A x = y^T \Lambda y = \sum_{i=1}^{n} \lambda_i y_i^2 \geq \lambda_n \sum_{i=1}^{n} y_i^2 = \lambda_n x^T x.$$

The lower bound is achieved by $x = q_n$, and the upper bound by $x = q_1$.
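
The sketch below checks the bounds on random data. Note that numpy's `eigh` returns eigenvalues in ascending order, so $\lambda_n$ comes first:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                 # a random self-adjoint matrix
eigvals, Q = np.linalg.eigh(A)    # ascending: eigvals[0] is the smallest

x = rng.standard_normal(4)
quad, norm2 = x @ A @ x, x @ x

# lambda_min x^T x <= x^T A x <= lambda_max x^T x
assert eigvals[0] * norm2 <= quad <= eigvals[-1] * norm2

# The bounds are attained at the corresponding eigenvectors.
assert np.isclose(Q[:, 0] @ A @ Q[:, 0], eigvals[0])
assert np.isclose(Q[:, -1] @ A @ Q[:, -1], eigvals[-1])
```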

Positive semidefinite matrices

A self-adjoint matrix $A$ is positive semidefinite if its quadratic form is nonnegative, i.e., if $x^T A x \geq 0$ for all $x$. This is denoted $A \succeq 0$. By the inequalities in the previous subsection, a matrix is positive semidefinite if and only if its eigenvalues are all nonnegative.

A matrix $A$ is said to be positive definite if its quadratic form is positive for all nonzero $x$; this is denoted by $A \succ 0$. Negative semidefinite and negative definite matrices are defined analogously.
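
In code, the eigenvalue characterization gives a simple (if not the most efficient) semidefiniteness test; `is_psd` is a helper written here for illustration, and the tolerance guards against round-off:

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Test positive semidefiniteness of a self-adjoint matrix
    by checking that its eigenvalues are (nearly) nonnegative."""
    return np.all(np.linalg.eigvalsh(A) >= -tol)

assert is_psd(np.array([[2.0, 1.0], [1.0, 2.0]]))      # eigenvalues 1, 3
assert not is_psd(np.array([[0.0, 1.0], [1.0, 0.0]]))  # eigenvalues -1, 1
```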

Square root. A matrix $B$ is called a square root of a matrix $A$ if $B^2 = A$. Every positive semidefinite matrix $A = Q \Lambda Q^T$ has a square root $A^{1/2} = Q \Lambda^{1/2} Q^T$, where $\Lambda^{1/2}$ is the diagonal matrix obtained by taking the square root of each entry of $\Lambda$. Evidently, $A^{1/2}$ is also positive semidefinite; moreover, it is the unique positive semidefinite square root of $A$. Of course, every positive definite matrix also has a unique positive definite square root.
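
Following the recipe above (a sketch; the example matrix is arbitrary but positive definite):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Assemble the positive semidefinite square root from the
# eigendecomposition: A^{1/2} = Q Lambda^{1/2} Q^T.
eigvals, Q = np.linalg.eigh(A)
sqrt_A = Q @ np.diag(np.sqrt(eigvals)) @ Q.T

assert np.allclose(sqrt_A @ sqrt_A, A)
```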

Partial order. We can define a partial order on semidefinite matrices; see the notes on the Loewner order.

Gram matrix. Every Gram matrix $A^T A$ is symmetric and positive semidefinite, since $x^T A^T A x = \|Ax\|_2^2 \geq 0$ for all $x$. By relabeling $A$ as $A^T$, it follows that $A A^T$ is also symmetric and positive semidefinite.
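
Numerically, the identity $x^T A^T A x = \|Ax\|_2^2$ is easy to spot-check (random data, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)

# The quadratic form of the Gram matrix A^T A is a squared norm,
# hence nonnegative.
assert np.isclose(x @ (A.T @ A) @ x, np.linalg.norm(A @ x) ** 2)
```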

Covariance matrix. Let $X \in \mathbb{R}^{m \times n}$ be a data matrix recording $m$ data points, with each point represented as a list of $n$ measurements; i.e., each row is a data point (or example, record, or experiment) with $n$ variables, and there are $m$ points total. Said another way, each row of $X$ is an observation of an $n$-dimensional random vector. Let $\bar{x} \in \mathbb{R}^n$ be the sample mean of the data matrix, i.e., the average of its rows. The sample covariance between the $i$th and $j$th variables is

$$\Sigma_{ij} = \frac{1}{m} \sum_{k=1}^{m} (X_{ki} - \bar{x}_i)(X_{kj} - \bar{x}_j).$$

The sample covariance matrix is the $n \times n$ matrix $\Sigma$ such that $\Sigma_{ij}$ is the covariance between variables $i$ and $j$, that is,

$$\Sigma = \frac{1}{m} (X - \mathbf{1}\bar{x}^T)^T (X - \mathbf{1}\bar{x}^T),$$

where $\mathbf{1}$ denotes the all-ones vector in $\mathbb{R}^m$.

Notice that if the data points in $X$ were arranged as columns instead of rows (i.e., if $X$ were transposed), the sample mean would be the average of its columns and the sample covariance matrix would be

$$\Sigma = \frac{1}{m} (X^T - \bar{x}\mathbf{1}^T)(X^T - \bar{x}\mathbf{1}^T)^T.$$

The sample covariance matrix is positive semidefinite, since it is a Gram matrix scaled by the nonnegative factor $1/m$.
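
A sketch of the computation in numpy (note the $1/m$ normalization used in these notes; numpy's built-in `np.cov` defaults to $1/(m-1)$):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
X = rng.standard_normal((m, n))   # m data points as rows, n variables

x_bar = X.mean(axis=0)            # sample mean: average of the rows
Xc = X - x_bar                    # center the data
Sigma = (Xc.T @ Xc) / m           # sample covariance matrix

# Symmetric and positive semidefinite, as a scaled Gram matrix.
assert np.allclose(Sigma, Sigma.T)
assert np.all(np.linalg.eigvalsh(Sigma) >= -1e-12)
```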

Decorrelation and whitening

A common pre-processing step in machine learning is decorrelation, i.e., linearly transforming a data matrix to make its covariance diagonal. Let $X$ be an $m \times n$ data matrix with mean $\bar{x} = 0$; its covariance matrix is $\Sigma = \frac{1}{m} X^T X$. By the spectral theorem, there exists an orthogonal matrix $Q$ and a diagonal matrix $\Lambda$ such that

$$\Sigma = Q \Lambda Q^T.$$

Multiplying $X$ on the right by $Q$ decorrelates the data: the covariance of $XQ$ is

$$\frac{1}{m} (XQ)^T (XQ) = Q^T \Sigma Q = \Lambda,$$

which is diagonal. If we additionally multiply on the right by $\Lambda^{-1/2}$ and use $X Q \Lambda^{-1/2}$ as our new data matrix, then the covariance matrix of the data becomes the identity (this assumes $\Sigma$ is positive definite, so that $\Lambda^{-1/2}$ exists). Any linear transformation that transforms the covariance into the identity is called a whitening transformation, and applying such a transformation to the original data is referred to as whitening the data.
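
Putting the pieces together (a sketch on synthetic data; the random mixing matrix just makes the raw covariance non-diagonal):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 3
X = rng.standard_normal((m, n)) @ rng.standard_normal((n, n))
X = X - X.mean(axis=0)                   # center, so the mean is zero

Sigma = (X.T @ X) / m
eigvals, Q = np.linalg.eigh(Sigma)

# Decorrelate: the covariance of XQ is the diagonal matrix Lambda.
D = X @ Q
assert np.allclose((D.T @ D) / m, np.diag(eigvals))

# Whiten: the covariance of X Q Lambda^{-1/2} is the identity
# (assumes Sigma is positive definite, as noted above).
W = D @ np.diag(1.0 / np.sqrt(eigvals))
assert np.allclose((W.T @ W) / m, np.eye(n))
```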
