While our state vector $\mathbf{x}$ contains all the variables we are interested in tracking (position $p$ and velocity $v$), our sensors often measure only a subset of these, or a linear combination of them.
The relationship between the state and the noisy measurement $\mathbf{z}_k$ is defined by the observation equation:
$$\mathbf{z}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{v}_k$$
Here, $\mathbf{H}_k$ is the observation matrix and $\mathbf{v}_k$ is the measurement noise (defined below).
The matrix $\mathbf{H}$ dictates how the elements of the internal state vector $\mathbf{x} = \begin{bmatrix} p \\ v \end{bmatrix}$ are combined to produce the measurement $\mathbf{z}$.
In other words, $\mathbf{H}$ acts as a bridge between the state space and the measurement space:
Example A: Measuring Only Position ($p$)
If we use a GPS sensor that measures only the position ($p$) and nothing else, our measurement vector is $\mathbf{z}_k = [p_{measured}]$.
The matrix $\mathbf{H}$ must select the position from $\mathbf{x}_k$:
$$\mathbf{z}_k = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} p_k \\ v_k \end{bmatrix}$$
$$\mathbf{H} = \begin{bmatrix} 1 & 0 \end{bmatrix}$$
This matrix multiplication yields: $p_{measured} = 1 \cdot p_k + 0 \cdot v_k$, meaning the measurement is simply the position.
Example B: Measuring Position and Velocity
If we use a sophisticated sensor (e.g., radar or a highly integrated IMU) that provides independent measurements of both position ($p$) and velocity ($v$), the measurement vector is $\mathbf{z}_k = \begin{bmatrix} p_{measured} \\ v_{measured} \end{bmatrix}$.
The matrix $\mathbf{H}$ must select both elements:
$$\mathbf{z}_k = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_k \\ v_k \end{bmatrix}$$
$$\mathbf{H} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I} \text{ (Identity Matrix)}$$
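As a quick sanity check, both observation matrices can be verified in a few lines of NumPy (the state values here are arbitrary, chosen only for illustration):

```python
import numpy as np

x = np.array([[3.0],   # position p
              [1.5]])  # velocity v

# Example A: GPS measures position only
H_a = np.array([[1, 0]])
z_a = H_a @ x          # selects the position alone

# Example B: sensor measures position and velocity
H_b = np.eye(2)        # identity matrix
z_b = H_b @ x          # passes the full state through
```

Multiplying by `H_a` collapses the two-element state into a single position reading, while `H_b` leaves it untouched.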
The vector $\mathbf{v}_k$ (measurement noise) accounts for the fact that sensors are never perfectly accurate. This noise is modeled as a zero-mean Gaussian distribution, characterized by the Measurement Noise Covariance Matrix ($\mathbf{R}$):
$$\mathbf{R} = \mathbb{E}[\mathbf{v}_k \mathbf{v}_k^T]$$
$\mathbf{R}$ is a symmetric matrix that quantifies the sensor's inherent unreliability. For a single position measurement (Example A), $\mathbf{R}$ is a $1 \times 1$ matrix:
$$\mathbf{R} = \begin{bmatrix} \sigma^2_{\text{GPS}} \end{bmatrix}$$
This $\mathbf{R}$ matrix is the formal representation of the single-measurement variance ($\sigma^2_{\text{sensor}}$) discussed in the error calculation section.
Just like $\mathbf{P}$ and $\mathbf{Q}$, the diagonal of $\mathbf{R}$ contains the variances of the individual sensors, and the off-diagonal elements hold the covariances between sensor noises (usually zero for independent sensors).
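For Example B, with independent position and velocity sensors, $\mathbf{R}$ is a diagonal $2 \times 2$ matrix. A minimal sketch (the standard deviations are illustrative, not from the source):

```python
import numpy as np

sigma_pos = 1.2   # position sensor std dev (illustrative)
sigma_vel = 0.4   # velocity sensor std dev (illustrative)

# Independent sensors -> variances on the diagonal, zero covariance elsewhere
R = np.diag([sigma_pos**2, sigma_vel**2])
```

If the two sensor noises were correlated (e.g., two channels of the same instrument), the off-diagonal entries would carry their covariance instead of zero.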
The $\mathbf{R}$ matrix is used in the filter's Correction Step.
A smaller $\mathbf{R}$ means the sensor is highly reliable, causing the filter to trust the measurement more than the prediction.
A larger $\mathbf{R}$ means the sensor is noisy, causing the filter to rely more on its internal model prediction.
The Observation Model ($\mathbf{H}$ and $\mathbf{R}$) and the Prediction Model ($\mathbf{F}$ and $\mathbf{Q}$) together form the complete framework for the linear Kalman Filter. The Kalman Filter constantly compares the size of $\mathbf{P}$ (how much we trust our prediction) with the size of $\mathbf{R}$ (how much we trust our sensor). This comparison is what calculates the Kalman Gain ($K$) (see below).
Now that we have a Predicted State ($\mathbf{x}_k^-$) and an Actual Measurement ($\mathbf{z}_k$), the filter must decide how to fuse them. This happens in three logical steps.
1. The Innovation (Measurement Residual)
First, we calculate the difference between what the sensor actually saw ($\mathbf{z}_k$) and what it expected to see based on our prediction ($\mathbf{H}\mathbf{x}_k^-$). This is called the Innovation ($\mathbf{y}_k$):
$$\mathbf{y}_k = \mathbf{z}_k - \mathbf{H} \mathbf{x}_k^-$$
If $\mathbf{y}_k$ is zero, our prediction was perfect!
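Using the position-only model from Example A, the innovation is simply the measured position minus the predicted position. A small sketch with made-up numbers:

```python
import numpy as np

H = np.array([[1, 0]])
x_pred = np.array([[4.8],   # predicted position
                   [2.0]])  # predicted velocity
z = np.array([[5.1]])       # noisy GPS reading (illustrative)

y = z - H @ x_pred          # innovation: the sensor's "surprise"
```

Here the sensor saw the object 0.3 units ahead of where the model predicted it.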
2. The Kalman Gain ($\mathbf{K}_k$)
The Kalman Gain is the "Trust Factor." It is a weighting matrix that determines how much of that innovation error we should apply to our state.
$$\mathbf{K}_k = \mathbf{P}_k^- \mathbf{H}^T (\mathbf{H} \mathbf{P}_k^- \mathbf{H}^T + \mathbf{R})^{-1}$$
The Intuition of $\mathbf{K}$: if $\mathbf{R}$ is small (a trustworthy sensor), the gain grows and the filter leans heavily on the measurement; if $\mathbf{P}_k^-$ is small (a trustworthy prediction), the gain shrinks toward zero and the innovation is largely ignored.
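In the scalar, position-only case, the gain formula collapses to $K = P^- / (P^- + R)$, which makes the trust trade-off explicit. A sketch with arbitrary numbers:

```python
P_prior = 4.0  # variance of the prediction

# Reliable sensor: small R -> K near 1, correction follows the measurement
K_good = P_prior / (P_prior + 0.5)

# Noisy sensor: large R -> K near 0, correction sticks with the prediction
K_bad = P_prior / (P_prior + 40.0)
```

The gain is always between 0 and 1 in this scalar case: it is literally the fraction of the innovation that gets applied to the state.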
3. The State and Covariance Update
Finally, we update our "Best Guess" and our "Uncertainty":
Update State:
$$\hat{\mathbf{x}}_k^+ = \hat{\mathbf{x}}_k^- + \mathbf{K}_k \mathbf{y}_k$$ (New Guess = Old Guess + Gain $\times$ Error)
Update Uncertainty:
$$\mathbf{P}_k^+ = (\mathbf{I} - \mathbf{K}_k \mathbf{H}) \mathbf{P}_k^-$$ (New Uncertainty = Old Uncertainty - Information Gained)
| Term | Symbol | Intuition |
|---|---|---|
| Innovation | $\mathbf{y}$ | "How much did the sensor surprise us?" |
| Kalman Gain | $\mathbf{K}$ | "How much do we trust that surprise?" |
| A Posteriori State | $\hat{\mathbf{x}}^+$ | Our final, fused "truth" for this time step. |
| A Posteriori Covariance | $\mathbf{P}^+$ | Our updated (smaller) uncertainty after the measurement. |
The Correction Step is the only place in the filter where Uncertainty ($\mathbf{P}$) actually shrinks.
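A one-step sketch makes this asymmetry concrete: prediction inflates $\mathbf{P}$, correction deflates it. The model matches the position/velocity setup above; the noise values and measurement are illustrative only:

```python
import numpy as np

dt = 0.4
F = np.array([[1, dt], [0, 1]])
H = np.array([[1, 0]])
Q = np.eye(2) * 0.01          # illustrative process noise
R = np.array([[1.44]])        # illustrative sensor variance

x = np.array([[0.0], [2.0]])
P = np.eye(2) * 10.0

# PREDICT: uncertainty grows (F P F^T + Q adds spread)
x_p = F @ x
P_p = F @ P @ F.T + Q

# CORRECT: uncertainty shrinks (measurement adds information)
z = np.array([[1.0]])         # a made-up measurement
K = P_p @ H.T @ np.linalg.inv(H @ P_p @ H.T + R)
x = x_p + K @ (z - H @ x_p)
P = (np.eye(2) - K @ H) @ P_p
```

Comparing the traces of the covariance before and after each step shows `P_p` larger than the initial `P`, and the corrected `P` smaller than `P_p`.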
In our first example, the input $u_{k-1} = a_{k-1} = 0$.
With $u_{k-1} = 0$, the prediction equation simplifies to:

$$\mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} w_{k-1}$$

Expanding this into the individual components:

$$\begin{bmatrix} p \\ v \end{bmatrix}_k = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p \\ v \end{bmatrix}_{k-1} + \begin{bmatrix} \frac{1}{2}\Delta t^2 \\ \Delta t \end{bmatrix} w_{k-1}$$
```python
import numpy as np
import matplotlib.pyplot as plt

# --- 1. SETUP ---
np.random.seed(420)
dt = 0.4
t = np.arange(0, 20, dt)
n_steps = len(t)

# Kalman/Physics Matrices
F = np.array([[1, dt], [0, 1]])    # state transition
G = np.array([[0.5*dt**2], [dt]])  # noise gain
H = np.array([[1, 0]])             # observe position only

# Noise Parameters
process_std_true = 0.5   # Standard deviation of the random "push" (w)
process_std = 0.5        # Process noise assumed by the Kalman filter
measurement_std = 1.2    # Standard deviation of the sensor noise (v)
Q = G @ G.T * process_std**2
R = np.array([[measurement_std**2]])

# --- 2. GENERATE CONSISTENT GROUND TRUTH ---
# We simulate the "Real World" using the exact physical model
true_x = np.array([[0.0], [2.0]])  # Initial position=0, velocity=2
true_pos, true_vel = [], []
for i in range(n_steps):
    # The world moves: x_k = F*x_{k-1} + G*w
    w = np.random.normal(0, process_std_true)
    true_x = F @ true_x + G * w
    true_pos.append(true_x[0, 0])
    true_vel.append(true_x[1, 0])

# Measurements are taken from the consistent truth
measurements = np.array(true_pos) + np.random.normal(0, measurement_std, n_steps)

# --- 3. THE KALMAN FILTER ---
x = np.array([[0.0], [0.0]])  # Initial guess (deliberately wrong velocity)
P = np.array([[10.0, 0.0], [0.0, 10.0]])
preds_pos, ests_pos, stds_pos = [], [], []
preds_vel, ests_vel, stds_vel = [], [], []
for i in range(n_steps):
    # PREDICT (A Priori)
    x_p = F @ x
    P_p = F @ P @ F.T + Q
    preds_pos.append(x_p[0, 0])
    preds_vel.append(x_p[1, 0])

    # CORRECT (A Posteriori)
    z = measurements[i]
    K = P_p @ H.T @ np.linalg.inv(H @ P_p @ H.T + R)
    x = x_p + K @ (z - H @ x_p)
    P = (np.eye(2) - K @ H) @ P_p
    ests_pos.append(x[0, 0])
    ests_vel.append(x[1, 0])
    stds_pos.append(np.sqrt(P[0, 0]))
    stds_vel.append(np.sqrt(P[1, 1]))

# --- 4. PLOTTING ---
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 12), sharex=True)

# Position Plot
ax1.plot(t, true_pos, 'g-', label='True Position (Consistent Physics)', linewidth=2)
ax1.errorbar(t, measurements, yerr=2*measurement_std, fmt='o', color='red',
             ecolor='lightcoral', elinewidth=1, capsize=2, label='Measurements (z) ± 2σ')
ax1.scatter(t, preds_pos, color='orange', marker='x', s=40, label='Predictions (x⁻)')
ax1.errorbar(t, ests_pos, yerr=2*np.array(stds_pos), fmt='o', color='blue',
             ecolor='lightblue', elinewidth=2, capsize=3, label='Estimates (x⁺) ± 2σ')
ax1.set_title('Consistent Position Tracking', fontsize=14)
ax1.legend()

# Velocity Plot
ax2.plot(t, true_vel, 'g-', label='True Velocity (Consistent Physics)', linewidth=2)
ax2.scatter(t, preds_vel, color='orange', marker='x', s=40, label='Predictions (x⁻)')
ax2.errorbar(t, ests_vel, yerr=2*np.array(stds_vel), fmt='o', color='blue',
             ecolor='lightblue', elinewidth=2, capsize=3, label='Estimates (x⁺) ± 2σ')
ax2.set_title('Consistent Velocity Inference', fontsize=14)
ax2.legend()
plt.tight_layout()
plt.show()
```