The Observation Model¶

The Observation Equation¶

While our state vector $\mathbf{x}$ contains all the variables we are interested in tracking (position $p$ and velocity $v$), our sensors often measure only a subset of these, or a linear combination of them.

The relationship between the state and the noisy measurement $\mathbf{z}_k$ is defined by the observation equation:

$$\mathbf{z}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{r}_k$$

Where:

$\mathbf{z}_k$: The actual measurement vector received from the sensor at time $k$.
$\mathbf{H}_k$: The Observation Matrix (or Measurement Matrix).
$\mathbf{x}_k$: The true state vector at time $k$.
$\mathbf{r}_k$: The Measurement Noise Vector, representing the sensor's intrinsic error.

Note on notation. Most of the literature writes the measurement noise as $\mathbf{v}_k$. We use $\mathbf{r}_k$ instead, both to avoid clashing with the velocity $v$ in our state vector and because it pairs mnemonically with its covariance $\mathbf{R}$ (introduced below).

The Observation Matrix ($\mathbf{H}$)¶

The matrix $\mathbf{H}$ dictates how the elements of the internal state vector $\mathbf{x} = \begin{bmatrix} p \\ v \end{bmatrix}$ are combined to produce the measurement $\mathbf{z}$.

In other words, $\mathbf{H}$ acts as a bridge:

The State ($\mathbf{x}$) lives in the "ideal world" of our physics model.
The Measurement ($\mathbf{z}$) lives in the "sensor world" (e.g., voltages, pixels, or GPS coordinates).
$\mathbf{H}$ translates from the "ideal world" to the "sensor world."

Example A: Measuring Only Position ($p$)¶

If we use a GPS sensor that measures only the position ($p$) and nothing else, our measurement vector is $\mathbf{z}_k = [p_{measured}]$.

The matrix $\mathbf{H}$ must select the position from $\mathbf{x}_k$:

$$\mathbf{z}_k = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} p_k \\ v_k \end{bmatrix}$$

$$\mathbf{H} = \begin{bmatrix} 1 & 0 \end{bmatrix}$$

This matrix multiplication yields: $p_{measured} = 1 \cdot p_k + 0 \cdot v_k$, meaning the measurement is simply the position.

Example B: Measuring Position and Velocity

If we use a sophisticated sensor (e.g., radar, which measures position via time-of-flight and velocity via the Doppler effect) that provides independent measurements of both position ($p$) and velocity ($v$), the measurement vector is $\mathbf{z}_k = \begin{bmatrix} p_{measured} \\ v_{measured} \end{bmatrix}$.

The matrix $\mathbf{H}$ must select both elements:

$$\mathbf{z}_k = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_k \\ v_k \end{bmatrix}$$

$$\mathbf{H} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I} \text{ (Identity Matrix)}$$

Measurement Noise Covariance Matrix ($\mathbf{R}$)¶

The vector $\mathbf{r}_k$ (measurement noise) accounts for the fact that sensors are never perfectly accurate. This noise is modeled as a zero-mean Gaussian distribution, characterized by the Measurement Noise Covariance Matrix ($\mathbf{R}$):

$$\mathbf{R} = \mathbb{E}[\mathbf{r}_k \mathbf{r}_k^T]$$

$\mathbf{R}$ is a symmetric matrix that quantifies the sensor's inherent unreliability. For a single position measurement (Example A), $\mathbf{R}$ is a $1 \times 1$ matrix:

$$\mathbf{R} = \begin{bmatrix} \sigma^2_{\text{GPS}} \end{bmatrix}$$

Just like $\mathbf{P}$ and $\mathbf{Q}$, the diagonal of $\mathbf{R}$ contains the variances of the individual sensor measurements, and the off-diagonal elements represent the correlation between sensor noises (which is usually zero for independent sensors).

The Role of $\mathbf{R}$ in the Kalman Filter¶

The $\mathbf{R}$ matrix is used in the filter's Correction Step.

A smaller $\mathbf{R}$ means the sensor is highly reliable, causing the filter to trust the measurement more than the prediction.

A larger $\mathbf{R}$ means the sensor is noisy, causing the filter to rely more on its internal model prediction.

The Observation Model ($\mathbf{H}$ and $\mathbf{R}$) and the Prediction Model ($\mathbf{F}$ and $\mathbf{Q}$) together form the complete framework for the linear Kalman Filter. The Kalman Filter constantly compares the size of $\mathbf{P}$ (how much we trust our prediction) with the size of $\mathbf{R}$ (how much we trust our sensor). This comparison is what calculates the Kalman Gain ($\mathbf{K}$) (see below).

The Correction Step¶

Now that we have a Predicted State ($\hat{\mathbf{x}}_k^-$) and an Actual Measurement ($\mathbf{z}_k$), the filter must decide how to fuse them. This happens in four logical steps.

1. The Innovation (Measurement Residual)¶

First, we calculate the difference between what the sensor actually saw ($\mathbf{z}_k$) and what it expected to see based on our prediction ($\mathbf{H}\hat{\mathbf{x}}_k^-$). This is called the Innovation ($\mathbf{y}_k$):

$$\mathbf{y}_k = \mathbf{z}_k - \mathbf{H} \hat{\mathbf{x}}_k^-$$

If $\mathbf{y}_k$ is zero, our prediction was perfect!

2. The Innovation Covariance ($\mathbf{S}_k$)¶

The innovation $\mathbf{y}_k$ is itself a random vector, so before deciding how much to trust it we ask: how large an innovation did we expect? That expected spread is the innovation covariance

$$\mathbf{S}_k = \mathbf{H}\,\mathbf{P}_k^-\,\mathbf{H}^T + \mathbf{R}.$$

It adds the two independent reasons the measurement can differ from the prediction:

$\mathbf{H}\mathbf{P}_k^-\mathbf{H}^T$ — our own prediction uncertainty, projected from state space into measurement space (we are not sure what the sensor should read), plus
$\mathbf{R}$ — the sensor's measurement noise (the reading is corrupted even if the prediction were perfect).

$\mathbf{S}_k$ is exactly the matrix inverted in the Kalman gain, so the gain can be written compactly as

$$\mathbf{K}_k = \mathbf{P}_k^-\mathbf{H}^T\,\mathbf{S}_k^{-1}.$$

Read this way the gain is a trust ratio: the state–measurement cross-covariance $\mathbf{P}_k^-\mathbf{H}^T$ "divided by" the total expected surprise $\mathbf{S}_k$. A large $\mathbf{S}_k$ (shaky prediction or noisy sensor) shrinks the gain; a small $\mathbf{S}_k$ enlarges it.

Why give it a name? Beyond the gain, $\mathbf{S}_k$ is what makes the innovation interpretable. The scalar $\mathbf{y}_k^T\mathbf{S}_k^{-1}\mathbf{y}_k$ — the squared Mahalanobis distance of the innovation, also called the normalized innovation squared (NIS) — measures the surprise in units of its own expected size. It follows a $\chi^2$ distribution when the filter is consistent, which makes it the standard tool for outlier rejection (gate out a measurement whose innovation is implausibly large, e.g. a GPS multipath spike) and for filter health monitoring (innovations persistently too large or too small reveal a mistuned $\mathbf{Q}$ or $\mathbf{R}$).

3. The Kalman Gain ($\mathbf{K}_k$)¶

The Kalman Gain is the "Trust Factor." It is a weighting matrix that determines how much of that innovation error we should apply to our state.

$$\mathbf{K}_k = \mathbf{P}_k^- \mathbf{H}^T (\mathbf{H} \mathbf{P}_k^- \mathbf{H}^T + \mathbf{R})^{-1} = \mathbf{P}_k^- \mathbf{H}^T \mathbf{S}_k^{-1}$$

The Intuition of $\mathbf{K}$:

If all elements of $\mathbf{R}$ are very small (Perfect Sensor): $\mathbf{K}$ grows large and the filter trusts the measurement completely — the updated state is pulled fully toward the observation.
If all elements of $\mathbf{P}^-$ are very small (Perfect Prediction): $\mathbf{K}$ approaches $\mathbf{0}$. The filter ignores the sensor and sticks to its internal model.

Breakdown of Components¶

Component	Interpretation
$\mathbf{P}_k^-$	The a priori error covariance. Represents the current uncertainty in our state estimate. High uncertainty increases the gain, causing the filter to trust new measurements more.
$\mathbf{H}^T$	The observation geometry. Projects the state-space error into the measurement space dimensions.
$\mathbf{H} \mathbf{P}_k^- \mathbf{H}^T$	The predicted measurement covariance. Translates our state uncertainty into the units/dimensions of the sensor.
$\mathbf{R}$	The measurement noise covariance. Represents the sensor's inherent "unreliability." Higher noise decreases the gain, causing the filter to trust the model prediction more.
$(\dots)^{-1}$	The normalization (Inversion). Solves for the optimal weights that minimize the post-update state variance by blending the predicted and measured distributions.

Geometric Interpretation¶

In multidimensional space, $\mathbf{K}_k$ does more than just scale the error; it rotates and stretches the innovation vector $\mathbf{y}_k = (\mathbf{z}_k - \mathbf{H}\hat{\mathbf{x}}_k^-)$.

Coupling: Because $\mathbf{P}_k^-$ contains cross-correlations between state variables (e.g., position vs. velocity), the gain matrix allows a measurement in one dimension to update correlated variables in others.
Optimal Filtering: It is mathematically derived to be the Minimum Mean Square Error (MMSE) estimator. It effectively balances the "trust" between your dynamic model and your sensor suite based on their respective uncertainties.

4. The State and Covariance Update¶

Finally, we update our "Best Guess" and our "Uncertainty":

Update State:

$$\hat{\mathbf{x}}_k^+ = \hat{\mathbf{x}}_k^- + \mathbf{K}_k \mathbf{y}_k$$ (New Guess = Old Guess + Gain $\times$ Error)

Update Uncertainty:

$$\mathbf{P}_k^+ = (\mathbf{I} - \mathbf{K}_k \mathbf{H}) \mathbf{P}_k^-$$ (New Uncertainty = Old Uncertainty - Information Gained)

Term	Symbol	Intuition
Innovation	$\mathbf{y}$	"How much did the sensor surprise us?"
Kalman Gain	$\mathbf{K}$	"How much do we trust that surprise?"
A Posteriori State	$\hat{\mathbf{x}}^+$	Our final, fused "truth" for this time step.
A Posteriori Covariance	$\mathbf{P}^+$	Our updated (smaller) uncertainty after the measurement.

Summary of the Correction Step¶

The Correction Step is the only place in the filter where Uncertainty ($\mathbf{P}$) actually shrinks.

Prediction: We step into the dark; $\mathbf{P}$ grows.
Correction: We turn on the flashlight (sensor); $\mathbf{P}$ shrinks.

Example¶

This example simulates an object moving along a line and tracks it with a full Kalman filter. Each time step exercises both halves of the model: the prediction model ($\mathbf{F}, \mathbf{Q}$) and the observation model ($\mathbf{H}, \mathbf{R}$) developed above.

The Prediction Model¶

Here the control input is $u_{k-1} = a_{k-1} = 0$ (no commanded acceleration), so the prediction equation simplifies to: $$\mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} w_{k-1}$$ Expanding into the individual components: $$\begin{bmatrix} p \\ v \end{bmatrix}_k = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p \\ v \end{bmatrix}_{k-1} + \begin{bmatrix} \frac{1}{2}\Delta t^2 \\ \Delta t \end{bmatrix} w_{k-1}$$ The scalar $w_{k-1}$ is a random acceleration — a small, unpredictable "push" — with standard deviation process_std. This produces the process-noise covariance $\mathbf{Q} = \mathbf{G}\,\mathbf{G}^T\,\sigma_w^2$.

The Observation Model¶

The object is tracked with a position-only sensor (e.g. GPS) — exactly the Example A case from earlier. The measurement equation is: $$\mathbf{z}_k = \mathbf{H}\,\mathbf{x}_k + r_k, \qquad \mathbf{H} = \begin{bmatrix} 1 & 0 \end{bmatrix}$$ so $\mathbf{H}$ picks out the position and discards the velocity: $z_k = p_k + r_k$. The sensor noise $r_k$ has standard deviation measurement_std, giving the (here $1 \times 1$) measurement-noise covariance: $$\mathbf{R} = \begin{bmatrix} \sigma^2_{\text{meas}} \end{bmatrix}$$

Note that velocity is never measured directly. The filter can still infer it: the off-diagonal terms of $\mathbf{P}$ — built up by $\mathbf{F}$ coupling position and velocity — let a position correction propagate back into the velocity estimate through the Kalman gain. This is why the velocity error bars in the plot below shrink even with no velocity sensor.

What Each Step Does¶

At every time step the code runs the two stages described in this notebook:

Predict (a priori): advance the state and grow the uncertainty $$\hat{\mathbf{x}}_k^- = \mathbf{F}\,\hat{\mathbf{x}}_{k-1}^+, \qquad \mathbf{P}_k^- = \mathbf{F}\,\mathbf{P}_{k-1}^+\,\mathbf{F}^T + \mathbf{Q}$$
Correct (a posteriori): fold in the position measurement $z_k$ via $\mathbf{H}$ and $\mathbf{R}$ — compute the innovation $\mathbf{y}_k = z_k - \mathbf{H}\hat{\mathbf{x}}_k^-$ and the gain $\mathbf{K}_k$, then update $\hat{\mathbf{x}}_k^+$ and shrink $\mathbf{P}_k^+$.

In [1]:

import numpy as np
import matplotlib.pyplot as plt

# --- 1. SETUP ---
np.random.seed(420)
dt = 0.4 
t = np.arange(0, 20, dt)
n_steps = len(t)

# Kalman/Physics Matrices
F = np.array([[1, dt], [0, 1]])
G = np.array([[0.5*dt**2], [dt]])
H = np.array([[1, 0]]) 

# Noise Parameters
process_std_true = 0.5    # Standard deviation of the random "push" (w)
process_std = .5 # process noise of Kalman filter
measurement_std = 1.2 # Standard deviation of the sensor noise (r)

Q = G @ G.T * process_std**2
R = np.array([[measurement_std**2]])

# --- 2. GENERATE CONSISTENT GROUND TRUTH ---
# We simulate the "Real World" using the exact physical model
true_x = np.array([[0.0], [2.0]]) # Initial position=0, velocity=2
true_pos, true_vel = [], []

for i in range(n_steps):
    # The world moves: x_k = F*x_{k-1} + G*w
    w = np.random.normal(0, process_std_true)
    true_x = F @ true_x + G * w
    
    true_pos.append(true_x[0,0])
    true_vel.append(true_x[1,0])

# Measurements are taken from the consistent truth
measurements = np.array(true_pos) + np.random.normal(0, measurement_std, n_steps)

# --- 3. THE KALMAN FILTER ---
x = np.array([[0], [0]])  # Initial guess (deliberately wrong velocity)
P = np.array([[10, 0], [0, 10]])

preds_pos, ests_pos, stds_pos = [], [], []
preds_vel, ests_vel, stds_vel = [], [], []

for i in range(n_steps):
    # PREDICT (A Priori)
    x_p = F @ x
    P_p = F @ P @ F.T + Q
    preds_pos.append(x_p[0,0])
    preds_vel.append(x_p[1,0])
    
    # CORRECT (A Posteriori)
    z = measurements[i]
    K = P_p @ H.T @ np.linalg.inv(H @ P_p @ H.T + R)
    x = x_p + K @ (z - H @ x_p)
    P = (np.eye(2) - K @ H) @ P_p
    
    ests_pos.append(x[0,0])
    ests_vel.append(x[1,0])
    stds_pos.append(np.sqrt(P[0,0]))
    stds_vel.append(np.sqrt(P[1,1]))

# --- 4. PLOTTING ---
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 12), sharex=True)

# Position Plot
ax1.plot(t, true_pos, 'g-', label='True Position (Consistent Physics)', linewidth=2)
ax1.errorbar(t, measurements, yerr=2*measurement_std, fmt='o', color='red', 
             ecolor='lightcoral', elinewidth=1, capsize=2, label='Measurements (z) ± 2σ')
ax1.scatter(t, preds_pos, color='orange', marker='x', s=40, label='Predictions (x⁻)')
ax1.errorbar(t, ests_pos, yerr=2*np.array(stds_pos), fmt='o', color='blue', 
             ecolor='lightblue', elinewidth=2, capsize=3, label='Estimates (x⁺) ± 2σ')
ax1.set_title('Consistent Position Tracking', fontsize=14)
ax1.legend()

# Velocity Plot
ax2.plot(t, true_vel, 'g-', label='True Velocity (Consistent Physics)', linewidth=2)
ax2.scatter(t, preds_vel, color='orange', marker='x', s=40, label='Predictions (x⁻)')
ax2.errorbar(t, ests_vel, yerr=2*np.array(stds_vel), fmt='o', color='blue', 
             ecolor='lightblue', elinewidth=2, capsize=3, label='Estimates (x⁺) ± 2σ')
ax2.set_title('Consistent Velocity Inference', fontsize=14)
ax2.legend()

plt.tight_layout()
plt.show()

No description has been provided for this image

Red vs. Blue Error Bars: In the top plot, notice that the red error bars (representing the raw sensor accuracy) are much larger than the blue error bars. This proves that the filter is more confident than the sensor itself because it uses the "wisdom" of the physical model.
The Predictive "X": Notice that the orange prediction (where the physics expected the object to be) and the red measurement (where the sensor saw it) often pull in different directions. The blue estimate lands between them, weighted by the Kalman Gain.
Velocity Uncertainty: In the bottom plot, despite having no sensor for velocity, the error bars for velocity are surprisingly small. This demonstrates that by watching the position consistently, the filter becomes very certain about the underlying speed.
Convergence: Observe the blue error bars at $t=0$ compared to $t=2$. They usually start larger and "shrink" as the filter collects more evidence, representing the learning phase of the system.