While the element-wise definition ($cov(x_i, x_j)$) is useful for understanding the relationship between two specific variables, we rarely calculate the matrix element-by-element in practice. Instead, we use vector algebra to compute the entire matrix at once.
Given a set of $m$ data points (samples) $\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \dots, \mathbf{x}^{(m)}\}$, where each $\mathbf{x}^{(i)} \in \mathbb{R}^n$, and the empirical mean vector:$$\mathbf{\bar{x}} = \frac{1}{m} \sum_{i=1}^{m} \mathbf{x}^{(i)}$$The Empirical Covariance Matrix $\mathbf{C}$ is defined as the average of the "spread" of the data around the mean:$$\mathbf{C} = \frac{1}{m-1} \sum_{i=1}^{m} (\mathbf{x}^{(i)} - \mathbf{\bar{x}})(\mathbf{x}^{(i)} - \mathbf{\bar{x}})^T$$
To see why this matrix form matches the element-wise definition, consider the $(j, k)$-th entry of the sum above. Let $\mathbf{v}^{(i)} = \mathbf{x}^{(i)} - \mathbf{\bar{x}}$ denote the $i$-th centered sample.
The product of a column vector and a row vector—the outer product—results in a matrix:$$\mathbf{v}^{(i)} (\mathbf{v}^{(i)})^T = \begin{pmatrix} v_1^{(i)} \\ \vdots \\ v_n^{(i)} \end{pmatrix} \begin{pmatrix} v_1^{(i)} & \dots & v_n^{(i)} \end{pmatrix} = \begin{pmatrix} v_1^{(i)}v_1^{(i)} & \dots & v_1^{(i)}v_n^{(i)} \\ \vdots & \ddots & \vdots \\ v_n^{(i)}v_1^{(i)} & \dots & v_n^{(i)}v_n^{(i)} \end{pmatrix}$$
Comparing this to the element-wise definition: the $(j, k)$-th entry of $\mathbf{v}^{(i)} (\mathbf{v}^{(i)})^T$ is $v_j^{(i)} v_k^{(i)} = (x_j^{(i)} - \bar{x}_j)(x_k^{(i)} - \bar{x}_k)$, so averaging these entries over all $m$ samples recovers exactly $cov(x_j, x_k)$. Each outer product $(\mathbf{x}^{(i)} - \mathbf{\bar{x}})(\mathbf{x}^{(i)} - \mathbf{\bar{x}})^T$ therefore represents the contribution of a single sample to the covariance matrix.
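The outer-product sum can be computed directly in NumPy (a minimal sketch; the data here is random and purely illustrative) and checked against `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3                      # m samples, n variables (illustrative sizes)
X = rng.normal(size=(m, n))       # each row is one sample x^(i)

x_bar = X.mean(axis=0)            # empirical mean vector

# Accumulate the outer products (x^(i) - x_bar)(x^(i) - x_bar)^T
C = np.zeros((n, n))
for i in range(m):
    v = X[i] - x_bar              # centered sample v^(i)
    C += np.outer(v, v)           # rank-1 contribution of sample i
C /= m - 1

# np.cov treats rows as variables by default, so pass rowvar=False
assert np.allclose(C, np.cov(X, rowvar=False))
```

The explicit loop mirrors the sum in the definition; the data-matrix forms below replace it with a single matrix product.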
If we organize our $m$ measurements (centered by subtracting the mean $\bar{\mathbf{x}}$) as rows in an $m \times n$ matrix $\tilde{\mathbf{X}}$:
$$\tilde{\mathbf{X}} = \begin{pmatrix} — & (\mathbf{x}^{(1)} - \bar{\mathbf{x}})^T & — \\ — & (\mathbf{x}^{(2)} - \bar{\mathbf{x}})^T & — \\ & \vdots & \\ — & (\mathbf{x}^{(m)} - \bar{\mathbf{x}})^T & — \end{pmatrix}$$
The covariance matrix $\mathbf{C}$ (which must be $n \times n$) is then:
$$\mathbf{C} = \frac{1}{m-1} \tilde{\mathbf{X}}^T \tilde{\mathbf{X}}$$
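With samples as rows, the formula above is one centering step and one matrix product (a sketch with random illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 100, 4
X = rng.normal(size=(m, n))          # samples as rows: m x n

X_tilde = X - X.mean(axis=0)         # subtract the mean of each variable (column)
C = X_tilde.T @ X_tilde / (m - 1)    # n x n covariance matrix

assert np.allclose(C, np.cov(X, rowvar=False))
```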
Alternatively, we can organize each centered data vector as a column of an $n \times m$ matrix $\hat{\mathbf{X}}$; the covariance matrix is then:
$$\mathbf{C} = \frac{1}{m-1} \hat{\mathbf{X}} \hat{\mathbf{X}}^T$$
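The column-wise layout only swaps the transpose (again a sketch with random data); note that it matches NumPy's default convention, where `np.cov` treats each row as a variable:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 100
X_hat = rng.normal(size=(n, m))                     # samples as columns: n x m
X_hat = X_hat - X_hat.mean(axis=1, keepdims=True)   # center each variable (row)

C = X_hat @ X_hat.T / (m - 1)                       # n x n covariance matrix

# np.cov's default (rowvar=True) assumes exactly this layout
assert np.allclose(C, np.cov(X_hat))
```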
The empirical covariance can be expressed at different levels of abstraction. Each form is mathematically equivalent but serves a different purpose in implementation and theory.
| Level | Formula | Best used for... |
|---|---|---|
| Scalar | $cov(x,y) = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})$ | Understanding the statistical relationship between two specific variables. |
| Outer Product | $\mathbf{C} = \frac{1}{m-1} \sum_{i=1}^{m} \mathbf{v}^{(i)}(\mathbf{v}^{(i)})^T$ | Theoretical derivations, Eigen-analysis, and Rank-1/Rank-$\mu$ updates in CMA-ES. |
| Data Matrix | $\mathbf{C} = \frac{1}{m-1} \tilde{\mathbf{X}}^T\tilde{\mathbf{X}}$ | High-performance computation and software implementations (assuming samples as rows). |
Note on Notation: In the Data Matrix form above, $\tilde{\mathbf{X}}$ is an $m \times n$ matrix where each row is a centered observation vector $\mathbf{v}^{(i)} = (\mathbf{x}^{(i)} - \bar{\mathbf{x}})$.
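As a consistency check across the levels in the table, the scalar formula for two variables can be compared against the corresponding entry of the data-matrix form (random illustrative data; variable names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 200
X = rng.normal(size=(m, 2))          # two variables: x = X[:, 0], y = X[:, 1]

# Scalar form: cov(x, y)
x, y = X[:, 0], X[:, 1]
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (m - 1)

# Data-matrix form: C = X_tilde^T X_tilde / (m - 1)
X_tilde = X - X.mean(axis=0)
C = X_tilde.T @ X_tilde / (m - 1)

# The off-diagonal entry of C is exactly the scalar covariance
assert np.isclose(cov_xy, C[0, 1])
```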