While our state vector $\mathbf{x}$ contains all the variables we are interested in tracking (position $p$ and velocity $v$), our sensors often measure only a subset of these, or a linear combination of them.
The relationship between the state and the noisy measurement $\mathbf{z}_k$ is defined by the observation equation:
$$\mathbf{z}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{v}_k$$
Where:
The matrix $\mathbf{H}$ dictates how the elements of the internal state vector $\mathbf{x} = \begin{bmatrix} p \\ v \end{bmatrix}$ are combined to produce the measurement $\mathbf{z}$.
In other words, $\mathbf{H}$ acts as a bridge:
If we use a GPS sensor that measures only the position ($p$) and nothing else, our measurement vector is $\mathbf{z}_k = [p_{measured}]$.
The matrix $\mathbf{H}$ must select the position from $\mathbf{x}_k$:
$$\mathbf{z}_k = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} p_k \\ v_k \end{bmatrix}$$
$$\mathbf{H} = \begin{bmatrix} 1 & 0 \end{bmatrix}$$
This matrix multiplication yields: $p_{measured} = 1 \cdot p_k + 0 \cdot v_k$, meaning the measurement is simply the position.
Example B: Measuring Position and Velocity
If we use a sophisticated sensor (e.g., radar, which measures position via time-of-flight and velocity via the Doppler effect) that provides independent measurements of both position ($p$) and velocity ($v$), the measurement vector is $\mathbf{z}_k = \begin{bmatrix} p_{measured} \\ v_{measured} \end{bmatrix}$.
The matrix $\mathbf{H}$ must select both elements:
$$\mathbf{z}_k = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_k \\ v_k \end{bmatrix}$$
$$\mathbf{H} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I} \text{ (Identity Matrix)}$$
The vector $\mathbf{v}_k$ (measurement noise) accounts for the fact that sensors are never perfectly accurate. This noise is modeled as a zero-mean Gaussian distribution, characterized by the Measurement Noise Covariance Matrix ($\mathbf{R}$):
$$\mathbf{R} = \mathbb{E}[\mathbf{v}_k \mathbf{v}_k^T]$$
$\mathbf{R}$ is a symmetric matrix that quantifies the sensor's inherent unreliability. For a single position measurement (Example A), $\mathbf{R}$ is a $1 \times 1$ matrix:
$$\mathbf{R} = \begin{bmatrix} \sigma^2_{\text{GPS}} \end{bmatrix}$$
Just like $\mathbf{P}$ and $\mathbf{Q}$, the diagonal of $\mathbf{R}$ contains the variances of the individual sensor measurements, and the off-diagonal elements represent the correlation between sensor noises (which is usually zero for independent sensors).
The $\mathbf{R}$ matrix is used in the filter's Correction Step.
A smaller $\mathbf{R}$ means the sensor is highly reliable, causing the filter to trust the measurement more than the prediction.
A larger $\mathbf{R}$ means the sensor is noisy, causing the filter to rely more on its internal model prediction.
The Observation Model ($\mathbf{H}$ and $\mathbf{R}$) and the Prediction Model ($\mathbf{F}$ and $\mathbf{Q}$) together form the complete framework for the linear Kalman Filter. The Kalman Filter constantly compares the size of $\mathbf{P}$ (how much we trust our prediction) with the size of $\mathbf{R}$ (how much we trust our sensor). This comparison is what calculates the Kalman Gain ($\mathbf{K}$) (see below).
Now that we have a Predicted State ($\hat{\mathbf{x}}_k^-$) and an Actual Measurement ($\mathbf{z}_k$), the filter must decide how to fuse them. This happens in three logical steps.
First, we calculate the difference between what the sensor actually saw ($\mathbf{z}_k$) and what it expected to see based on our prediction ($\mathbf{H}\hat{\mathbf{x}}_k^-$). This is called the Innovation ($\mathbf{y}_k$):
$$\mathbf{y}_k = \mathbf{z}_k - \mathbf{H} \hat{\mathbf{x}}_k^-$$
If $\mathbf{y}_k$ is zero, our prediction was perfect!
The Kalman Gain is the "Trust Factor." It is a weighting matrix that determines how much of that innovation error we should apply to our state.
$$\mathbf{K}_k = \mathbf{P}_k^- \mathbf{H}^T (\mathbf{H} \mathbf{P}_k^- \mathbf{H}^T + \mathbf{R})^{-1}$$
The Intuition of $\mathbf{K}$:
| Component | Interpretation |
|---|---|
| $\mathbf{P}_k^-$ | The a priori error covariance. Represents the current uncertainty in our state estimate. High uncertainty increases the gain, causing the filter to trust new measurements more. |
| $\mathbf{H}^T$ | The observation geometry. Projects the state-space error into the measurement space dimensions. |
| $\mathbf{H} \mathbf{P}_k^- \mathbf{H}^T$ | The predicted measurement covariance. Translates our state uncertainty into the units/dimensions of the sensor. |
| $\mathbf{R}$ | The measurement noise covariance. Represents the sensor's inherent "unreliability." Higher noise decreases the gain, causing the filter to trust the model prediction more. |
| $(\dots)^{-1}$ | The normalization (Inversion). Solves for the optimal weights that minimize the post-update state variance by blending the predicted and measured distributions. |
In multidimensional space, $\mathbf{K}_k$ does more than just scale the error; it rotates and stretches the innovation vector $\mathbf{y}_k = (\mathbf{z}_k - \mathbf{H}\hat{\mathbf{x}}_k^-)$.
Finally, we update our "Best Guess" and our "Uncertainty":
Update State:
$$\hat{\mathbf{x}}_k^+ = \hat{\mathbf{x}}_k^- + \mathbf{K}_k \mathbf{y}_k$$ (New Guess = Old Guess + Gain $\times$ Error)
Update Uncertainty:
$$\mathbf{P}_k^+ = (\mathbf{I} - \mathbf{K}_k \mathbf{H}) \mathbf{P}_k^-$$ (New Uncertainty = Old Uncertainty - Information Gained)
| Term | Symbol | Intuition |
|---|---|---|
| Innovation | $\mathbf{y}$ | "How much did the sensor surprise us?" |
| Kalman Gain | $\mathbf{K}$ | "How much do we trust that surprise?" |
| A Posteriori State | $\hat{\mathbf{x}}^+$ | Our final, fused "truth" for this time step. |
| A Posteriori Covariance | $\mathbf{P}^+$ | Our updated (smaller) uncertainty after the measurement. |
The Correction Step is the only place in the filter where Uncertainty ($\mathbf{P}$) actually shrinks.
This example simulates an object moving along a line and tracks it with a full Kalman filter. Each time step exercises both halves of the model: the prediction model ($\mathbf{F}, \mathbf{Q}$) and the observation model ($\mathbf{H}, \mathbf{R}$) developed above.
Here the control input is $u_{k-1} = a_{k-1} = 0$ (no commanded acceleration), so the prediction equation simplifies to:
$$\mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} w_{k-1}$$
Expanding into the individual components:
$$\begin{bmatrix} p \\ v \end{bmatrix}_k = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p \\ v \end{bmatrix}_{k-1} + \begin{bmatrix} \frac{1}{2}\Delta t^2 \\ \Delta t \end{bmatrix} w_{k-1}$$
The scalar $w_{k-1}$ is a random acceleration — a small, unpredictable "push" — with standard deviation process_std. This produces the process-noise covariance $\mathbf{Q} = \mathbf{G}\,\mathbf{G}^T\,\sigma_w^2$.
The object is tracked with a position-only sensor (e.g. GPS) — exactly the Example A case from earlier. The measurement equation is:
$$\mathbf{z}_k = \mathbf{H}\,\mathbf{x}_k + v_k, \qquad \mathbf{H} = \begin{bmatrix} 1 & 0 \end{bmatrix}$$
so $\mathbf{H}$ picks out the position and discards the velocity: $z_k = p_k + v_k$. The sensor noise $v_k$ has standard deviation measurement_std, giving the (here $1 \times 1$) measurement-noise covariance:
$$\mathbf{R} = \begin{bmatrix} \sigma^2_{\text{meas}} \end{bmatrix}$$
Note that velocity is never measured directly. The filter can still infer it: the off-diagonal terms of $\mathbf{P}$ — built up by $\mathbf{F}$ coupling position and velocity — let a position correction propagate back into the velocity estimate through the Kalman gain. This is why the velocity error bars in the plot below shrink even with no velocity sensor.
At every time step the code runs the two stages described in this notebook:
import numpy as np
import matplotlib.pyplot as plt
# --- 1. SETUP ---
np.random.seed(420)
dt = 0.4
t = np.arange(0, 20, dt)
n_steps = len(t)
# Kalman/Physics Matrices
F = np.array([[1, dt], [0, 1]])
G = np.array([[0.5*dt**2], [dt]])
H = np.array([[1, 0]])
# Noise Parameters
process_std_true = 0.5 # Standard deviation of the random "push" (w)
process_std = .5 # process noise of Kalman filter
measurement_std = 1.2 # Standard deviation of the sensor noise (v)
Q = G @ G.T * process_std**2
R = np.array([[measurement_std**2]])
# --- 2. GENERATE CONSISTENT GROUND TRUTH ---
# We simulate the "Real World" using the exact physical model
true_x = np.array([[0.0], [2.0]]) # Initial position=0, velocity=2
true_pos, true_vel = [], []
for i in range(n_steps):
# The world moves: x_k = F*x_{k-1} + G*w
w = np.random.normal(0, process_std_true)
true_x = F @ true_x + G * w
true_pos.append(true_x[0,0])
true_vel.append(true_x[1,0])
# Measurements are taken from the consistent truth
measurements = np.array(true_pos) + np.random.normal(0, measurement_std, n_steps)
# --- 3. THE KALMAN FILTER ---
x = np.array([[0], [0]]) # Initial guess (deliberately wrong velocity)
P = np.array([[10, 0], [0, 10]])
preds_pos, ests_pos, stds_pos = [], [], []
preds_vel, ests_vel, stds_vel = [], [], []
for i in range(n_steps):
# PREDICT (A Priori)
x_p = F @ x
P_p = F @ P @ F.T + Q
preds_pos.append(x_p[0,0])
preds_vel.append(x_p[1,0])
# CORRECT (A Posteriori)
z = measurements[i]
K = P_p @ H.T @ np.linalg.inv(H @ P_p @ H.T + R)
x = x_p + K @ (z - H @ x_p)
P = (np.eye(2) - K @ H) @ P_p
ests_pos.append(x[0,0])
ests_vel.append(x[1,0])
stds_pos.append(np.sqrt(P[0,0]))
stds_vel.append(np.sqrt(P[1,1]))
# --- 4. PLOTTING ---
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 12), sharex=True)
# Position Plot
ax1.plot(t, true_pos, 'g-', label='True Position (Consistent Physics)', linewidth=2)
ax1.errorbar(t, measurements, yerr=2*measurement_std, fmt='o', color='red',
ecolor='lightcoral', elinewidth=1, capsize=2, label='Measurements (z) ± 2σ')
ax1.scatter(t, preds_pos, color='orange', marker='x', s=40, label='Predictions (x⁻)')
ax1.errorbar(t, ests_pos, yerr=2*np.array(stds_pos), fmt='o', color='blue',
ecolor='lightblue', elinewidth=2, capsize=3, label='Estimates (x⁺) ± 2σ')
ax1.set_title('Consistent Position Tracking', fontsize=14)
ax1.legend()
# Velocity Plot
ax2.plot(t, true_vel, 'g-', label='True Velocity (Consistent Physics)', linewidth=2)
ax2.scatter(t, preds_vel, color='orange', marker='x', s=40, label='Predictions (x⁻)')
ax2.errorbar(t, ests_vel, yerr=2*np.array(stds_vel), fmt='o', color='blue',
ecolor='lightblue', elinewidth=2, capsize=3, label='Estimates (x⁺) ± 2σ')
ax2.set_title('Consistent Velocity Inference', fontsize=14)
ax2.legend()
plt.tight_layout()
plt.show()