Covariance Matrix and Multivariate Normal Distribution¶

Variance and Covariance: Measuring Uncertainty and Relationships¶

Variance: The 1D Spread¶

Variance measures how much a single random variable deviates from its own mean. It is the average of the squared distances from the mean:

$$Var(X) = \sigma_X^2 = \mathbb E\left[\left(X - \mathbb E\left[ X\right]\right)^2 \right]$$

Low Variance: The values are tightly clustered around the mean. We are quite "sure."
High Variance: The values are spread out. We are "unsure."

Covariance: The Shared Relationship¶

The covariance between two random variables $X$ and $Y$ describes how they change together. It is the expected value of the product of their deviations:

$$Cov(X,Y) = \sigma_{XY} = \mathbb E\left[\left(X - \mathbb E\left[ X\right]\right) \left(Y - \mathbb E\left[ Y\right]\right) \right]$$

What does the number tell us?

The sign of the covariance tells us the "direction" of the relationship:

Positive Covariance ($\sigma_{XY} > 0$): When $X$ is above its mean, $Y$ tends to be above its mean too.
- Example: If our velocity is higher than expected, our position will likely be further ahead than expected.
Negative Covariance ($\sigma_{XY} < 0$): When $X$ is above its mean, $Y$ tends to be below its mean.
- Example: In a braking car, as time increases, the velocity decreases.
Zero Covariance ($\sigma_{XY} = 0$): There is no linear relationship. Knowing that $X$ is above its mean gives no linear indication of whether $Y$ is above or below its mean (a nonlinear dependence can still exist).
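The three cases can be reproduced with simulated data. The following is a minimal sketch, assuming NumPy is available; the helper `covariance`, the seed, and the noise level are illustrative choices and not part of the notebook's own code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary fixed seed for reproducibility
n_samples = 10_000

def covariance(x, y):
    """Approximate E[(X - E[X]) (Y - E[Y])] by averaging over the samples."""
    return np.mean((x - x.mean()) * (y - y.mean()))

x = rng.normal(size=n_samples)
noise = 0.5 * rng.normal(size=n_samples)

y_pos = x + noise                    # tends to move with x
y_neg = -x + noise                   # tends to move against x
y_ind = rng.normal(size=n_samples)   # generated independently of x

cov_pos = covariance(x, y_pos)       # positive
cov_neg = covariance(x, y_neg)       # negative
cov_ind = covariance(x, y_ind)       # close to zero, but not exactly zero
var_x = covariance(x, x)             # variance is the covariance of x with itself

print(cov_pos, cov_neg, cov_ind, var_x)
```

With a finite number of samples the "zero covariance" case only comes out approximately zero, which is why the estimate is compared against a small tolerance rather than against exactly 0.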

Why this is important for the Kalman Filter

In the Kalman Filter, we don't just want to know "Where is the car?" and "How fast is it?" independently. We want to know how an error in one propagates into the other.

Recap: Empirical Covariance: How We Calculate it from Data¶

If we have a set of $m$ measurements, we calculate the empirical covariance using the following formula:

$$cov(x,y) = \frac{\sum_{i=1}^{m} (x^{(i)} - \bar x)(y^{(i)} - \bar y )}{m-1}$$

Where:

$x^{(i)}, y^{(i)}$: The $i$-th individual data point.
$\bar x, \bar y$: The sample means (averages).
$m-1$: The degrees of freedom (Bessel's correction), used to provide an unbiased estimate.
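As a small worked example, the formula can be evaluated by hand and compared with NumPy. This is a sketch with made-up data; the function name `empirical_cov` is hypothetical. `np.cov` also divides by $m-1$ by default, so both results should agree.

```python
import numpy as np

# Made-up measurements, m = 5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def empirical_cov(x, y):
    """Sample covariance with Bessel's correction (division by m - 1)."""
    m = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (m - 1)

cov_xy = empirical_cov(x, y)
# Deviation products: 4 + 0 + 0 + 0 + 2 = 6, divided by m - 1 = 4
print(cov_xy)                # 1.5
print(np.cov(x, y)[0, 1])    # 1.5, the off-diagonal entry of NumPy's 2x2 result
```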

In [1]:

from ipynb.fs.defs.plot_multivariate_Gaussian_distribution import plot_cov_matricies, plot_all_mvd
plot_cov_matricies()

[Figure: three scatter plots showing positive covariance (left, blue), negative covariance (middle, red), and zero covariance (right, green)]

Explanation:

The Center Point: In all plots, the center of the cloud is at $(0,0)$. This is because we set the means $\mathbb{E}[X]=0$ and $\mathbb{E}[Y]=0$.
Positive Covariance (Left - Blue): Look at the "quadrants." Most points are in the top-right (both positive) or bottom-left (both negative). The products $X \cdot Y$ in these quadrants are positive, so their average is positive. The cloud is tilted upward.
Negative Covariance (Middle - Red): Most points are in the top-left ($X$ negative, $Y$ positive) or bottom-right ($X$ positive, $Y$ negative). The products $X \cdot Y$ here are negative, so their average is negative. The cloud is tilted downward.
Zero Covariance (Right - Green): The points are spread evenly across all four quadrants. The positive products largely cancel the negative ones, so the average is close to zero. The cloud is axis-aligned (not tilted).

The Correlation Coefficient ($\rho$): Normalizing the "Tilt"¶

While covariance tells us the direction of the relationship, its raw value is hard to interpret because it depends on the units of the variables.

To solve this, we use the Pearson Correlation Coefficient ($\rho$), which normalizes the covariance into a range from -1 to +1.

The formula is:

$$\rho_{xy} = \frac {cov(x, y)}{\sigma_x \sigma_y}$$

Where $\sigma_x$ and $\sigma_y$ are the standard deviations (the square roots of the variances).
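To illustrate the unit dependence, the sketch below rescales one variable from metres to centimetres. The data and the helper `pearson` are made up for illustration. The covariance changes by the scale factor, while $\rho$ should stay the same.

```python
import numpy as np

x_m = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up values, read as metres
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
x_cm = 100.0 * x_m                          # the same quantity in centimetres

def pearson(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    cov_xy = np.cov(x, y)[0, 1]
    return cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

cov_m = np.cov(x_m, y)[0, 1]
cov_cm = np.cov(x_cm, y)[0, 1]    # 100 times larger than cov_m
rho_m = pearson(x_m, y)
rho_cm = pearson(x_cm, y)         # unchanged by the change of units

print(cov_m, cov_cm)
print(rho_m, rho_cm)
```

`np.corrcoef(x_m, y)[0, 1]` computes the same coefficient and can serve as a cross-check.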

Covariance Matrix¶

When dealing with a multi-dimensional random variable, we use a covariance matrix to describe the relationships between the different dimensions (or components) of the variable.

Let's assume that such a vector $\mathbf x$ has three elements, i.e., $n=3$:

$$ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} $$

Then, the covariance matrix $\mathbf{C}$ is an $n \times n$ matrix (in this case, a $3 \times 3$ matrix):

$$ \mathbf{C} = \begin{pmatrix} cov(x_1,x_1) & cov(x_1,x_2) & cov(x_1,x_3) \\ cov(x_2,x_1) & cov(x_2,x_2) & cov(x_2,x_3) \\ cov(x_3,x_1) & cov(x_3,x_2) & cov(x_3,x_3) \end{pmatrix} $$

Key Properties:

The Diagonal (Variances): The elements $cov(x_i, x_i) = var(x_i)$ are simply the variances of each individual component ($\sigma_{x_i}^2$). They tell you how "noisy" each variable is on its own.
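Symmetry ($\mathbf{C} = \mathbf{C}^T$): The matrix is always symmetric. Because $cov(x_i, x_j) = cov(x_j, x_i)$, the "mirror image" across the diagonal is identical.
Positive Semi-Definite: For every constant vector $\mathbf{a}$, $\mathbf{a}^T \mathbf{C} \mathbf{a} \geq 0$. This quantity is the variance of the scalar $\mathbf{a}^T \mathbf{x}$, so the property ensures that variances (and the "volume" of our uncertainty) can never be negative.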
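These properties can be checked numerically on an estimated covariance matrix. The sketch below uses randomly generated 3-component samples; the mixing matrix and the test vector `a` are arbitrary illustrative choices. Positive semi-definiteness is checked through the eigenvalues and through the quadratic form $\mathbf{a}^T \mathbf{C} \mathbf{a}$, which equals the variance of the scalar $\mathbf{a}^T \mathbf{x}$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary fixed seed

# 500 samples of a 3-component vector (rows are samples); mixing is arbitrary
mixing = np.array([[1.0, 0.0, 0.0],
                   [0.8, 0.5, 0.0],
                   [-0.3, 0.2, 1.5]])
samples = rng.normal(size=(500, 3)) @ mixing.T

C = np.cov(samples, rowvar=False)    # 3x3 sample covariance matrix

# Diagonal entries are the per-component variances
diag_is_variance = np.allclose(np.diag(C), samples.var(axis=0, ddof=1))

# Symmetry: C equals its transpose
is_symmetric = np.allclose(C, C.T)

# Positive semi-definite: no eigenvalue is negative (up to rounding)
eigenvalues = np.linalg.eigvalsh(C)
is_psd = bool(np.all(eigenvalues >= -1e-12))

# a^T C a is the variance of the projected scalar a^T x
a = np.array([1.0, -2.0, 0.5])
quad_form = a @ C @ a
var_projection = np.var(samples @ a, ddof=1)

print(diag_is_variance, is_symmetric, is_psd)
print(quad_form, var_projection)
```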

Vectorized Form of the Covariance Matrix¶

The element-wise definition above can be written compactly using the outer product. For a random vector $\mathbf{x}$ with mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{x}]$:

$$\mathbf{C} = \text{Cov}(\mathbf{x}) = \mathbb{E}\!\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right]$$

To see why, expand the outer product explicitly for $n = 3$:

$$(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T = \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \\ x_3 - \mu_3 \end{bmatrix} \begin{bmatrix} x_1 - \mu_1 & x_2 - \mu_2 & x_3 - \mu_3 \end{bmatrix} = \begin{bmatrix} (x_1-\mu_1)^2 & (x_1-\mu_1)(x_2-\mu_2) & (x_1-\mu_1)(x_3-\mu_3) \\ (x_2-\mu_2)(x_1-\mu_1) & (x_2-\mu_2)^2 & (x_2-\mu_2)(x_3-\mu_3) \\ (x_3-\mu_3)(x_1-\mu_1) & (x_3-\mu_3)(x_2-\mu_2) & (x_3-\mu_3)^2 \end{bmatrix}$$

Taking the expectation entry-by-entry:

$$\mathbb{E}\!\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right] = \begin{bmatrix} \mathbb{E}[(x_1-\mu_1)^2] & \mathbb{E}[(x_1-\mu_1)(x_2-\mu_2)] & \mathbb{E}[(x_1-\mu_1)(x_3-\mu_3)] \\ \mathbb{E}[(x_2-\mu_2)(x_1-\mu_1)] & \mathbb{E}[(x_2-\mu_2)^2] & \mathbb{E}[(x_2-\mu_2)(x_3-\mu_3)] \\ \mathbb{E}[(x_3-\mu_3)(x_1-\mu_1)] & \mathbb{E}[(x_3-\mu_3)(x_2-\mu_2)] & \mathbb{E}[(x_3-\mu_3)^2] \end{bmatrix} = \begin{pmatrix} cov(x_1,x_1) & cov(x_1,x_2) & cov(x_1,x_3) \\ cov(x_2,x_1) & cov(x_2,x_2) & cov(x_2,x_3) \\ cov(x_3,x_1) & cov(x_3,x_2) & cov(x_3,x_3) \end{pmatrix} = \mathbf{C}$$

Special case — zero mean ($\boldsymbol{\mu} = \mathbf{0}$):

If the random vector is zero-mean, the formula simplifies to:

$$\mathbf{C} = \mathbb{E}\!\left[\mathbf{x}\mathbf{x}^T\right]$$
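The outer-product form translates directly into code when the expectation is replaced by an average over samples. This is a sketch on made-up data. Because it averages with $1/m$ (the sample analogue of $\mathbb{E}$) rather than $1/(m-1)$, it is compared against `np.cov` with `bias=True`.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary fixed seed

# 1000 samples of a 3-component vector with made-up scales and means
scales = np.array([1.0, 2.0, 0.5])
means = np.array([5.0, -1.0, 0.0])
samples = rng.normal(size=(1000, 3)) * scales + means

mu = samples.mean(axis=0)
centered = samples - mu              # each row is (x - mu)^T for one sample

# Literal reading of the formula: average the outer products (x - mu)(x - mu)^T
C_outer = np.mean([np.outer(d, d) for d in centered], axis=0)

# The same sum written as one matrix product
C_matrix = centered.T @ centered / len(samples)

# NumPy reference with 1/m normalization
C_numpy = np.cov(samples, rowvar=False, bias=True)

print(np.allclose(C_outer, C_matrix), np.allclose(C_matrix, C_numpy))
```

The matrix-product version avoids the Python loop and is usually the preferable form for larger data sets.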

The General Multivariate Normal Distribution¶

The Multivariate Normal Distribution (MVD) is the generalization of the 1-dimensional bell curve to $n$ dimensions. For a state vector $\mathbf{x}$ of size $n$, the probability density function (PDF) is defined as:

$$p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n |\mathbf{C}|}} \exp\left(-\frac{1}{2}(\mathbf{x} - {\boldsymbol{\mu}})^T \mathbf{C}^{-1} (\mathbf{x} - {\boldsymbol{\mu}})\right)$$

Where:

$\mathbf{x} - {\boldsymbol{\mu}}$: The Error Vector. It represents how far a specific point $\mathbf{x}$ is from the mean ${\boldsymbol{\mu}}$.
$\mathbf{C}$: The Covariance Matrix. It describes the spread and correlation of the variables.
$\mathbf{C}^{-1}$: The Inverse Covariance Matrix. In the exponent, this matrix scales the error. If a certain direction has low variance (a small eigenvalue of $\mathbf{C}$), the corresponding eigenvalue of the inverse is large, causing the probability density to drop rapidly even for small errors in that direction.
$|\mathbf{C}|$: The Determinant for Normalization. This represents the total "volume" of the uncertainty.
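The PDF can be evaluated term by term. The following is a minimal sketch, not a production implementation: the function name `mvn_pdf` and the example covariance are assumptions made for illustration. Instead of forming $\mathbf{C}^{-1}$ explicitly, it solves a linear system, which gives the same quadratic form.

```python
import numpy as np

def mvn_pdf(x, mu, C):
    """Density of the multivariate normal distribution at the point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    C = np.asarray(C, dtype=float)
    n = mu.size
    error = x - mu                                    # the error vector
    # (x - mu)^T C^{-1} (x - mu), computed via a linear solve
    quad = error @ np.linalg.solve(C, error)
    norm_const = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(C))
    return np.exp(-0.5 * quad) / norm_const

mu = np.array([0.0, 0.0])
C = np.array([[4.0, 0.0],
              [0.0, 1.0]])          # made-up: high variance in dim 1, low in dim 2

peak = mvn_pdf(mu, mu, C)           # density at the mean: 1 / (2*pi*sqrt(|C|))
p_dim1 = mvn_pdf([1.0, 0.0], mu, C) # error of 1 along the high-variance direction
p_dim2 = mvn_pdf([0.0, 1.0], mu, C) # same error along the low-variance direction

print(peak, p_dim1, p_dim2)
```

The same error of size 1 yields a lower density along the low-variance direction, which is the scaling effect of $\mathbf{C}^{-1}$ described above.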

The 2D Example: The $p-v$ Plane¶

In our 1D point mass model, the state vector is $\mathbf{x} = \begin{bmatrix} p \\ v \end{bmatrix}$. The distribution lives in a 2D plane where the "height" of the mountain at any $(p, v)$ coordinate tells us how likely that specific combination of position and velocity is.

For this case, the covariance matrix is:

$$\mathbf{C} = \begin{bmatrix} \sigma_p^2 & \sigma_{pv} \\ \sigma_{vp} & \sigma_v^2 \end{bmatrix}$$

In [2]:

plot_all_mvd()

Explanation:

Uncorrelated ($\sigma_{pv} = 0$): The ellipse is axis-aligned (standing straight up or lying flat). This means an error in position tells you nothing about the error in velocity. Note that we can be very certain about one and unsure about the other, or unsure about both—the ellipse stays aligned as long as the covariance is zero.
Correlated ($\sigma_{pv} \neq 0$): The ellipse is tilted. For example, if a car has been traveling faster than expected ($v$ is high), it is likely further ahead than expected ($p$ is high). This is a positive correlation ($\sigma_{pv} > 0$), which tilts the ellipse upward to the right.
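The tilt of the ellipse can be read off the eigendecomposition of $\mathbf{C}$: the eigenvectors give the directions of the ellipse axes and the square roots of the eigenvalues give the 1-sigma semi-axis lengths. The sketch below assumes $p$ is plotted on the horizontal axis and $v$ on the vertical axis; the helper `ellipse_axes` and both example matrices are hypothetical.

```python
import numpy as np

def ellipse_axes(C):
    """Return the major-axis angle in degrees, in [0, 180), and the semi-axis lengths."""
    eigenvalues, eigenvectors = np.linalg.eigh(C)    # eigenvalues in ascending order
    major = eigenvectors[:, -1]                      # direction of largest variance
    angle_deg = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    semi_axes = np.sqrt(eigenvalues[::-1])           # major axis first
    return angle_deg, semi_axes

# Uncorrelated: sigma_pv = 0, more uncertainty in p than in v
C_uncorr = np.array([[4.0, 0.0],
                     [0.0, 1.0]])
# Correlated: sigma_pv > 0, equal variances
C_corr = np.array([[2.0, 1.0],
                   [1.0, 2.0]])

angle_uncorr, axes_uncorr = ellipse_axes(C_uncorr)   # major axis lies along the p axis
angle_corr, axes_corr = ellipse_axes(C_corr)         # major axis tilted by 45 degrees

print(angle_uncorr, axes_uncorr)
print(angle_corr, axes_corr)
```

An eigenvector is only defined up to its sign, so the angle is reduced modulo 180 degrees; an angle of 0 and an angle of 180 describe the same axis.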

Transformation of the Covariance Matrix¶

The covariance matrix of a transformed random vector $\mathbf{y} = \mathbf{A}\mathbf{x}$ is $\operatorname{Cov}(\mathbf{y}) = \mathbf{A} \operatorname{Cov}(\mathbf{x}) \mathbf{A}^\top$, which can be demonstrated as follows:

Preliminaries and Definitions¶

Let $\mathbf{x}$ be a random vector with:

Mean: $\mathbb{E}[\mathbf{x}] = \boldsymbol{\mu}_x$
Covariance Matrix: $\operatorname{Cov}(\mathbf{x}) = \mathbb{E}[(\mathbf{x} - \boldsymbol{\mu}_x)(\mathbf{x} - \boldsymbol{\mu}_x)^\top]$

For the transformed vector $\mathbf{y} = \mathbf{A}\mathbf{x}$ (where $\mathbf{A}$ is a constant matrix):

Mean: By the linearity of expectation, $\mathbb{E}[\mathbf{y}] = \mathbb{E}[\mathbf{A}\mathbf{x}] = \mathbf{A} \mathbb{E}[\mathbf{x}] = \mathbf{A}\boldsymbol{\mu}_x = \boldsymbol{\mu}_y$.

Derivation¶

The covariance matrix of $\mathbf{y}$ is defined as:

$$\operatorname{Cov}(\mathbf{y}) = \mathbb{E}[(\mathbf{y} - \boldsymbol{\mu}_y)(\mathbf{y} - \boldsymbol{\mu}_y)^\top]$$

Substitute $\mathbf{y} = \mathbf{A}\mathbf{x}$ and $\boldsymbol{\mu}_y = \mathbf{A}\boldsymbol{\mu}_x$ into the expression:

$$\operatorname{Cov}(\mathbf{y}) = \mathbb{E}[(\mathbf{A}\mathbf{x} - \mathbf{A}\boldsymbol{\mu}_x)(\mathbf{A}\mathbf{x} - \mathbf{A}\boldsymbol{\mu}_x)^\top]$$

Factor out the constant matrix $\mathbf{A}$:

$$\operatorname{Cov}(\mathbf{y}) = \mathbb{E}[\mathbf{A}(\mathbf{x} - \boldsymbol{\mu}_x) (\mathbf{A}(\mathbf{x} - \boldsymbol{\mu}_x))^\top]$$

Apply the property of matrix transposition, $(\mathbf{M}\mathbf{N})^\top = \mathbf{N}^\top \mathbf{M}^\top$:

$$\operatorname{Cov}(\mathbf{y}) = \mathbb{E}[\mathbf{A}(\mathbf{x} - \boldsymbol{\mu}_x) (\mathbf{x} - \boldsymbol{\mu}_x)^\top \mathbf{A}^\top]$$

Since $\mathbf{A}$ and $\mathbf{A}^\top$ are constant matrices, they can be pulled out of the expectation operator $\mathbb{E}[\cdot]$:

$$\operatorname{Cov}(\mathbf{y}) = \mathbf{A} \mathbb{E}[(\mathbf{x} - \boldsymbol{\mu}_x)(\mathbf{x} - \boldsymbol{\mu}_x)^\top] \mathbf{A}^\top$$

By the definition of $\operatorname{Cov}(\mathbf{x})$, the remaining expectation is exactly the covariance of $\mathbf{x}$:

$$\operatorname{Cov}(\mathbf{y}) = \mathbf{A} \operatorname{Cov}(\mathbf{x}) \mathbf{A}^\top$$
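The result can be checked numerically. The sketch below assumes a constant-velocity transition for the state $(p, v)$ with a hypothetical time step `dt`; the matrix `A` and the covariance `C_x` are illustrative choices and are not taken from the notebook. Because the sample covariance is computed with the same outer-product formula, the identity also holds for sample covariances up to rounding.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary fixed seed

dt = 0.1                             # hypothetical time step
A = np.array([[1.0, dt],             # p_new = p + v * dt
              [0.0, 1.0]])           # v_new = v

C_x = np.array([[0.5, 0.0],          # made-up covariance: p and v start uncorrelated
                [0.0, 2.0]])

# Closed-form result of the derivation
C_y = A @ C_x @ A.T
print(C_y)   # [[0.52, 0.2], [0.2, 2.0]] up to rounding: a p-v covariance appears

# Sample-based check: transform every sample and estimate the covariance
x_samples = rng.multivariate_normal(mean=[1.0, 3.0], cov=C_x, size=5000)
y_samples = x_samples @ A.T          # applies y = A x to each row

C_x_hat = np.cov(x_samples, rowvar=False)
C_y_hat = np.cov(y_samples, rowvar=False)

print(np.allclose(C_y_hat, A @ C_x_hat @ A.T))
```

Even though $p$ and $v$ start out uncorrelated in this example, the transformation introduces a positive off-diagonal term, because the new position depends on the velocity.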