Variance measures how much a single random variable deviates from its own mean. It is the average of the squared distances from the mean:
$$Var(X) = \sigma_X^2 = \mathbb E\left[\left(X - \mathbb E\left[ X\right]\right)^2 \right]$$
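As a small worked example, the definition can be evaluated exactly for a fair six-sided die (an illustrative distribution, not one used elsewhere in this notebook). The expectation is a probability-weighted sum over the outcomes.

```python
import numpy as np

# Fair six-sided die: outcomes 1..6, each with probability 1/6 (illustrative example)
outcomes = np.arange(1, 7)
probs = np.full(6, 1 / 6)

mean = np.sum(probs * outcomes)                    # E[X] = 3.5
variance = np.sum(probs * (outcomes - mean) ** 2)  # E[(X - E[X])^2] = 35/12

print(mean, variance)
```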
The covariance between two random variables $X$ and $Y$ measures how they vary together. It is the expected value of the product of their deviations from their respective means:
$$Cov(X,Y) = \sigma_{XY} = \mathbb E\left[\left(X - \mathbb E\left[ X\right]\right) \left(Y - \mathbb E\left[ Y\right]\right) \right]$$
What does the number tell us?
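The sign of the covariance tells us the "direction" of the relationship:
- **Positive:** $X$ and $Y$ tend to be above (or below) their means at the same time.
- **Negative:** when one variable is above its mean, the other tends to be below its mean.
- **Zero:** there is no *linear* relationship between them, which does not by itself imply that they are independent.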
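A minimal sketch of the three cases, using hand-made data (the arrays below are illustrative assumptions). NumPy's `np.cov(a, b)` returns the $2 \times 2$ covariance matrix of `a` and `b`, so the off-diagonal entry `[0, 1]` is their covariance. The last case shows a relationship that is perfectly deterministic yet has zero covariance, because it is not linear.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

y_up = 2 * x + 1        # increases with x -> positive covariance
y_down = -x + 10        # decreases as x increases -> negative covariance
y_flat = (x - 3) ** 2   # symmetric, non-linear dependence -> zero covariance

cov_up = np.cov(x, y_up)[0, 1]
cov_down = np.cov(x, y_down)[0, 1]
cov_flat = np.cov(x, y_flat)[0, 1]

print(cov_up, cov_down, cov_flat)
```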
Why this is important for the Kalman Filter
In the Kalman Filter, we don't just want to know "Where is the car?" and "How fast is it?" independently. We want to know how an error in one propagates into the other.
If we have a set of $m$ measurements, we calculate the empirical covariance using the following formula:
$$cov(x,y) = \frac{\sum_{i=1}^{m} (x^{(i)} - \bar x)(y^{(i)} - \bar y )}{m-1}$$
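Where $x^{(i)}$ and $y^{(i)}$ are the $i$-th measurements of the two variables, and $\bar x$ and $\bar y$ are their sample means. Dividing by $m-1$ instead of $m$ (Bessel's correction) makes the estimate unbiased.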
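The empirical formula translates directly into a few lines of NumPy. The sketch below uses hypothetical measurements and a helper named `sample_cov` (our own name, not part of any library), then compares the result with `np.cov`, which also divides by $m-1$ by default.

```python
import numpy as np

def sample_cov(x, y):
    """Sample covariance of two equally long sequences, with the (m - 1) denominator."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (m - 1)

# Hypothetical paired measurements
x = [2.1, 2.5, 3.6, 4.0]
y = [8.0, 10.0, 12.0, 14.0]

manual = sample_cov(x, y)
reference = np.cov(x, y)[0, 1]   # NumPy default: ddof=1, i.e. divide by m - 1
print(manual, reference)
```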
```python
from ipynb.fs.defs.plot_multivariate_Gaussian_distribution import plot_cov_matricies, plot_all_mvd
plot_cov_matricies()
```
Explanation:
While covariance tells us the direction of the relationship, its raw value is hard to interpret because it depends on the units of the variables.
To solve this, we use the Pearson Correlation Coefficient ($\rho$), which normalizes the covariance into a range from -1 to +1.
The formula is:
$$\rho_{xy} = \frac {cov(x, y)}{\sigma_x \sigma_y}$$
Where $\sigma_x$ and $\sigma_y$ are the standard deviations (the square roots of the variances).
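To see why the normalisation helps, the sketch below (with hypothetical data) expresses the same quantity once in metres and once in centimetres. The covariance grows by a factor of 100 with the change of units, while $\rho$ stays the same. The helper `pearson` is our own illustrative function; `np.corrcoef` is used only as a cross-check.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: cov(x, y) / (sigma_x * sigma_y)."""
    c = np.cov(x, y)                        # 2x2 sample covariance matrix
    return c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])

# Hypothetical data: x measured in metres, and the same values in centimetres
x_m = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_cm = 100 * x_m
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

cov_m = np.cov(x_m, y)[0, 1]
cov_cm = np.cov(x_cm, y)[0, 1]   # 100 times larger: depends on units
rho_m = pearson(x_m, y)
rho_cm = pearson(x_cm, y)        # unchanged: unit-free

print(cov_m, cov_cm, rho_m, rho_cm)
```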
When dealing with a multi-dimensional random variable, we use a covariance matrix to describe the relationships between the different dimensions (or components) of the variable.
Let's assume that such a vector $\mathbf x$ has three elements, i.e., $n=3$:
$$ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} $$
Then, the covariance matrix $\mathbf{C}$ is an $n \times n$ matrix (in this case, a $3 \times 3$ matrix):
$$ \mathbf{C} = \begin{pmatrix} cov(x_1,x_1) & cov(x_1,x_2) & cov(x_1,x_3) \\ cov(x_2,x_1) & cov(x_2,x_2) & cov(x_2,x_3) \\ cov(x_3,x_1) & cov(x_3,x_2) & cov(x_3,x_3) \end{pmatrix} $$
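A quick numerical sketch builds $\mathbf{C}$ for three simulated variables with NumPy. The data-generating choices (a second variable tied to the first, a third independent one) are assumptions made for illustration. `np.cov` treats each row as one variable by default, and the result has the variances on its diagonal and equal mirrored off-diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500
x1 = rng.normal(size=m)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=m)   # strongly related to x1
x3 = rng.normal(size=m)                    # independent of x1 and x2

X = np.vstack([x1, x2, x3])   # shape (3, m): one row per variable
C = np.cov(X)                 # 3x3 sample covariance matrix

print(np.round(C, 3))
```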
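Key Properties:
- **Diagonal:** $cov(x_i, x_i) = \sigma_{x_i}^2$, so the diagonal holds the variances of the individual components.
- **Symmetric:** $cov(x_i, x_j) = cov(x_j, x_i)$, hence $\mathbf{C} = \mathbf{C}^T$.
- **Positive semi-definite:** $\mathbf{a}^T \mathbf{C} \mathbf{a} \ge 0$ for every constant vector $\mathbf{a}$, because $\mathbf{a}^T \mathbf{C} \mathbf{a}$ is the variance of the scalar $\mathbf{a}^T \mathbf{x}$.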
The element-wise definition above can be written compactly using the outer product. For a random vector $\mathbf{x}$ with mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{x}]$:
$$\mathbf{C} = \text{Cov}(\mathbf{x}) = \mathbb{E}\!\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right]$$
To see why, expand the outer product explicitly for $n = 3$:
$$(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T = \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \\ x_3 - \mu_3 \end{bmatrix} \begin{bmatrix} x_1 - \mu_1 & x_2 - \mu_2 & x_3 - \mu_3 \end{bmatrix} = \begin{bmatrix} (x_1-\mu_1)^2 & (x_1-\mu_1)(x_2-\mu_2) & (x_1-\mu_1)(x_3-\mu_3) \\ (x_2-\mu_2)(x_1-\mu_1) & (x_2-\mu_2)^2 & (x_2-\mu_2)(x_3-\mu_3) \\ (x_3-\mu_3)(x_1-\mu_1) & (x_3-\mu_3)(x_2-\mu_2) & (x_3-\mu_3)^2 \end{bmatrix}$$
Taking the expectation entry-by-entry:
$$\mathbb{E}\!\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right] = \begin{bmatrix} \mathbb{E}[(x_1-\mu_1)^2] & \mathbb{E}[(x_1-\mu_1)(x_2-\mu_2)] & \mathbb{E}[(x_1-\mu_1)(x_3-\mu_3)] \\ \mathbb{E}[(x_2-\mu_2)(x_1-\mu_1)] & \mathbb{E}[(x_2-\mu_2)^2] & \mathbb{E}[(x_2-\mu_2)(x_3-\mu_3)] \\ \mathbb{E}[(x_3-\mu_3)(x_1-\mu_1)] & \mathbb{E}[(x_3-\mu_3)(x_2-\mu_2)] & \mathbb{E}[(x_3-\mu_3)^2] \end{bmatrix} = \begin{pmatrix} cov(x_1,x_1) & cov(x_1,x_2) & cov(x_1,x_3) \\ cov(x_2,x_1) & cov(x_2,x_2) & cov(x_2,x_3) \\ cov(x_3,x_1) & cov(x_3,x_2) & cov(x_3,x_3) \end{pmatrix} = \mathbf{C}$$
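The outer-product form can be checked numerically by replacing the expectation with a sample average. The sketch below (the mixing matrix used to generate correlated samples is an arbitrary assumption) accumulates $(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T$ sample by sample, divides by $m-1$, and compares with `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(1000, 3)) @ mixing   # rows are samples, columns are components
mu = X.mean(axis=0)
m = X.shape[0]

# Average of the outer products (x - mu)(x - mu)^T, one sample at a time
C_outer = np.zeros((3, 3))
for x in X:
    d = (x - mu).reshape(-1, 1)   # column vector
    C_outer += d @ d.T
C_outer /= m - 1                  # same (m - 1) normalisation as the sample formula

C_numpy = np.cov(X, rowvar=False)  # rowvar=False: columns are the variables
print(np.round(C_outer, 3))
```

The loop mirrors the formula term by term; in practice the same result is obtained in one step with `D = X - mu` followed by `D.T @ D / (m - 1)`.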
Special case — zero mean ($\boldsymbol{\mu} = \mathbf{0}$):
If the random vector is zero-mean, the formula simplifies to:
$$\mathbf{C} = \mathbb{E}\!\left[\mathbf{x}\mathbf{x}^T\right]$$
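For finite samples the sample mean is only approximately zero, so the sketch below centres simulated data explicitly (an assumption made to satisfy $\boldsymbol{\mu} = \mathbf{0}$). After that, the sum of $\mathbf{x}\mathbf{x}^T$ without any mean subtraction reproduces the general estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))   # rows are samples, columns are components
X = X - X.mean(axis=0)          # force a zero sample mean
m = X.shape[0]

# With mu = 0 no mean subtraction is needed: C = sum_i x_i x_i^T / (m - 1)
C_zero_mean = X.T @ X / (m - 1)
C_general = np.cov(X, rowvar=False)   # subtracts the (now zero) mean internally

print(np.round(C_zero_mean, 4))
```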
The Multivariate Normal Distribution (MVD) is the generalization of the 1-dimensional bell curve to $n$ dimensions. For a state vector $\mathbf{x}$ of size $n$, the probability density function (PDF) is defined as:
$$p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n |\mathbf{C}|}} \exp\left(-\frac{1}{2}(\mathbf{x} - {\boldsymbol{\mu}})^T \mathbf{C}^{-1} (\mathbf{x} - {\boldsymbol{\mu}})\right)$$
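Where:
- $\boldsymbol{\mu}$ is the $n$-dimensional mean vector,
- $\mathbf{C}$ is the $n \times n$ covariance matrix, $|\mathbf{C}|$ is its determinant and $\mathbf{C}^{-1}$ its inverse,
- $n$ is the dimension of $\mathbf{x}$.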
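The density formula can be evaluated directly with NumPy. The function `mvn_pdf` below is our own illustrative sketch, not a library API. It solves $\mathbf{C}\mathbf{z} = \mathbf{x} - \boldsymbol{\mu}$ rather than forming $\mathbf{C}^{-1}$ explicitly, which is the usual numerically safer choice. Two sanity checks follow: a 2D standard normal at its mean gives $1/(2\pi)$, and for $n = 1$ the formula reduces to the familiar 1D bell curve.

```python
import numpy as np

def mvn_pdf(x, mu, C):
    """Multivariate normal density p(x) for mean mu and covariance C."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    C = np.asarray(C, dtype=float)
    n = mu.size
    d = x - mu
    quad = d @ np.linalg.solve(C, d)               # (x - mu)^T C^{-1} (x - mu)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(C))
    return np.exp(-0.5 * quad) / norm

p_center = mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))   # 2D standard normal at its mean
p_1d = mvn_pdf([1.0], [0.0], [[4.0]])                   # 1D case: mean 0, variance 4
print(p_center, p_1d)
```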
In our 1D point mass model, the state vector is $\mathbf{x} = \begin{bmatrix} p \\ v \end{bmatrix}$. The distribution lives in a 2D plane where the "height" of the mountain at any $(p, v)$ coordinate tells us how likely that specific combination of position and velocity is.
For this case, the covariance matrix is:$$\mathbf{C} = \begin{bmatrix} \sigma_p^2 & \sigma_{pv} \\ \sigma_{vp} & \sigma_v^2 \end{bmatrix}$$
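As a sketch of this 2D case, the snippet below assembles $\mathbf{C}$ from hypothetical values for $\sigma_p$, $\sigma_v$ and a correlation $\rho$ (all assumed for illustration), using $\sigma_{pv} = \sigma_{vp} = \rho\,\sigma_p \sigma_v$. It then draws samples with NumPy's `Generator.multivariate_normal` and checks that their empirical covariance recovers $\mathbf{C}$.

```python
import numpy as np

# Hypothetical uncertainties: position p in metres, velocity v in metres per second
sigma_p, sigma_v, rho = 2.0, 0.5, 0.6
sigma_pv = rho * sigma_p * sigma_v   # off-diagonal term, sigma_pv = sigma_vp

mu = np.array([10.0, 1.0])           # mean position and velocity
C = np.array([[sigma_p**2, sigma_pv],
              [sigma_pv,   sigma_v**2]])

rng = np.random.default_rng(42)
samples = rng.multivariate_normal(mu, C, size=20000)   # shape (20000, 2)
C_est = np.cov(samples, rowvar=False)

print(C)
print(np.round(C_est, 3))
```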
```python
plot_all_mvd()
```
Explanation:
The covariance matrix of a transformed random vector $\mathbf{y} = \mathbf{A}\mathbf{x}$ is $\operatorname{Cov}(\mathbf{y}) = \mathbf{A} \operatorname{Cov}(\mathbf{x}) \mathbf{A}^\top$, which can be demonstrated as follows:
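Let $\mathbf{x}$ be a random vector with mean $\boldsymbol{\mu}_x = \mathbb{E}[\mathbf{x}]$ and covariance $\operatorname{Cov}(\mathbf{x}) = \mathbb{E}[(\mathbf{x} - \boldsymbol{\mu}_x)(\mathbf{x} - \boldsymbol{\mu}_x)^\top]$.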
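For the transformed vector $\mathbf{y} = \mathbf{A}\mathbf{x}$ (where $\mathbf{A}$ is a constant matrix), linearity of expectation gives the mean $\boldsymbol{\mu}_y = \mathbb{E}[\mathbf{A}\mathbf{x}] = \mathbf{A}\,\mathbb{E}[\mathbf{x}] = \mathbf{A}\boldsymbol{\mu}_x$.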
The covariance matrix of $\mathbf{y}$ is defined as:
$$\operatorname{Cov}(\mathbf{y}) = \mathbb{E}[(\mathbf{y} - \boldsymbol{\mu}_y)(\mathbf{y} - \boldsymbol{\mu}_y)^\top]$$
Substitute $\mathbf{y} = \mathbf{A}\mathbf{x}$ and $\boldsymbol{\mu}_y = \mathbf{A}\boldsymbol{\mu}_x$ into the expression:
$$\operatorname{Cov}(\mathbf{y}) = \mathbb{E}[(\mathbf{A}\mathbf{x} - \mathbf{A}\boldsymbol{\mu}_x)(\mathbf{A}\mathbf{x} - \mathbf{A}\boldsymbol{\mu}_x)^\top]$$
Factor out the constant matrix $\mathbf{A}$:
$$\operatorname{Cov}(\mathbf{y}) = \mathbb{E}[\mathbf{A}(\mathbf{x} - \boldsymbol{\mu}_x) (\mathbf{A}(\mathbf{x} - \boldsymbol{\mu}_x))^\top]$$
Apply the property of matrix transposition, $(\mathbf{BC})^\top = \mathbf{C}^\top \mathbf{B}^\top$:
$$\operatorname{Cov}(\mathbf{y}) = \mathbb{E}[\mathbf{A}(\mathbf{x} - \boldsymbol{\mu}_x) (\mathbf{x} - \boldsymbol{\mu}_x)^\top \mathbf{A}^\top]$$
Since $\mathbf{A}$ and $\mathbf{A}^\top$ are constant matrices, they can be pulled out of the expectation operator $\mathbb{E}[\cdot]$:
$$\operatorname{Cov}(\mathbf{y}) = \mathbf{A} \mathbb{E}[(\mathbf{x} - \boldsymbol{\mu}_x)(\mathbf{x} - \boldsymbol{\mu}_x)^\top] \mathbf{A}^\top$$
By the definition of $\operatorname{Cov}(\mathbf{x})$, the expectation in the middle is exactly the covariance of $\mathbf{x}$:
$$\operatorname{Cov}(\mathbf{y}) = \mathbf{A} \operatorname{Cov}(\mathbf{x}) \mathbf{A}^\top$$
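The result can be checked numerically. In the sketch below, $\mathbf{A}$ is assumed to be a constant-velocity update ($p' = p + v\,\Delta t$, $v' = v$), chosen to mirror the point-mass model; in the Kalman Filter this role is played by the state transition matrix. Starting from uncorrelated position and velocity, $\mathbf{A}\,\operatorname{Cov}(\mathbf{x})\,\mathbf{A}^\top$ produces a non-zero off-diagonal term: the velocity uncertainty has leaked into the position. A Monte Carlo estimate from transformed samples should agree with the analytical result.

```python
import numpy as np

dt = 1.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])    # assumed constant-velocity transition
C_x = np.array([[1.0, 0.0],
                [0.0, 1.0]])  # uncorrelated position and velocity before the step

C_y = A @ C_x @ A.T           # analytical result: [[2, 1], [1, 1]] for dt = 1

# Monte Carlo check: transform samples of x and estimate the covariance of y
rng = np.random.default_rng(7)
x = rng.multivariate_normal(np.zeros(2), C_x, size=50000)   # rows are samples
y = x @ A.T                                                # y_i = A x_i for every row
C_y_est = np.cov(y, rowvar=False)

print(C_y)
print(np.round(C_y_est, 3))
```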