Process Noise: Modeling Model Imperfection

The prediction step we derived ($\mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} \mathbf{u}_{k-1}$) assumes our model is perfect: that the system is truly linear and that the acceleration $a_{k-1}$ is known and perfectly constant over the interval $\Delta t$.

In reality, neither of these is true. Process Noise accounts for unmodeled disturbances and inaccuracies.

The Full Prediction Equation

The general prediction equation including model imperfections is:

$$\mathbf{x}_k = \mathbf{F}_{k-1} \mathbf{x}_{k-1} + \mathbf{G}_{k-1} \mathbf{u}_{k-1} + \mathbf{L}_{k-1} \mathbf{w}_{k-1}$$

Where:

  • $\mathbf{x}_k$: $n \times 1$ State Vector.
  • $\mathbf{F}_{k-1}$: $n \times n$ State Transition Matrix.
  • $\mathbf{G}_{k-1}$: $n \times m$ Control Input Matrix.
  • $\mathbf{u}_{k-1}$: $m \times 1$ Control Input Vector.
  • $\mathbf{L}_{k-1}$: $n \times q$ Noise Gain Matrix.
  • $\mathbf{w}_{k-1}$: $q \times 1$ Process Noise Vector.
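The dimension bookkeeping above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `mat_vec` and `predict_state` are hypothetical, the numbers are made up, and the matrix $\mathbf{F}$ is assumed to be the constant-velocity form $\begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}$ for a position/velocity state.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def predict_state(F, x, G, u, L, w):
    """Return F x + G u + L w after checking that the shapes agree."""
    n, m, q = len(x), len(u), len(w)
    assert len(F) == n and all(len(row) == n for row in F), "F must be n x n"
    assert len(G) == n and all(len(row) == m for row in G), "G must be n x m"
    assert len(L) == n and all(len(row) == q for row in L), "L must be n x q"
    Fx, Gu, Lw = mat_vec(F, x), mat_vec(G, u), mat_vec(L, w)
    return [a + b + c for a, b, c in zip(Fx, Gu, Lw)]


# Illustrative values only: n = 2 (position, velocity), m = q = 1.
dt = 0.5
F = [[1.0, dt], [0.0, 1.0]]      # assumed constant-velocity transition
G = [[0.5 * dt**2], [dt]]        # acceleration -> (position, velocity)
L = G                            # noise enters the same way as the input
x_next = predict_state(F, [0.0, 2.0], G, [1.0], L, [0.2])
print(x_next)
```

Writing the shape checks as assertions makes a mismatch between $n$, $m$, and $q$ fail loudly instead of producing a silently truncated sum.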

What Does Process Noise Represent?

The term $\mathbf{w}_{k-1}$ represents any effect that causes the true state of the system to deviate from our model's prediction, including:

  • Model Error: The true acceleration is never perfectly constant; it jitters, changes slightly, or is affected by unmodeled forces (like wind resistance or friction).
  • Input Error: Errors in measuring or calculating the input (e.g. $a_{k-1}$) itself.
  • Discretization Error: The error introduced by assuming a constant acceleration over a finite time step $\Delta t$.

Application to the 1D Point Mass (Scalar Input and Noise)

For our 1D point mass, we have only one input (acceleration) and one noise source (acceleration jitter). Here, we assume the noise enters the system through the exact same physical "door" as the control command, meaning $\mathbf{L} = \mathbf{G}$.

The 1D Prediction Equation:

$$\mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} (u_{k-1} + w_{k-1})$$

The noise $w$ is sampled once per $\Delta t$, just like the control signal $u_{k-1}$. This reflects a "Discrete-Discrete" model, where we assume the jitter drawn at the start of the interval stays constant until the next sample.

Here, $u_{k-1} = a_{k-1}$ and $w_{k-1}$ are scalars ($1 \times 1$), while $\mathbf{G}$ is the $2 \times 1$ matrix that maps that acceleration into position and velocity.

$$ \begin{bmatrix} p_k \\ v_k \end{bmatrix} = \mathbf{F} \begin{bmatrix} p_{k-1} \\ v_{k-1} \end{bmatrix} + \mathbf{G} (a_{k-1} + w_{k-1}) $$

The acceleration noise is a random variable drawn from a Gaussian distribution with mean $0$ and variance $\sigma_a^2$, written $w_{k-1} \sim \mathcal{N}(0, \sigma_a^2)$.
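A minimal simulation sketch of this noisy step, using only Python's standard `random` module. The function name `noisy_step` and all numeric values are illustrative assumptions, and the kinematics assume the constant-velocity $\mathbf{F}$ together with $\mathbf{G} = [\frac{1}{2}(\Delta t)^2, \Delta t]^T$. Note that `random.gauss` takes the standard deviation $\sigma_a$, not the variance.

```python
import random


def noisy_step(p, v, a, dt, sigma_a, rng):
    """Advance [p, v] by one step with acceleration a plus Gaussian jitter."""
    w = rng.gauss(0.0, sigma_a)      # w ~ N(0, sigma_a^2), drawn once per step
    a_eff = a + w                    # held constant over the whole interval
    p_next = p + v * dt + 0.5 * dt**2 * a_eff
    v_next = v + dt * a_eff
    return p_next, v_next


rng = random.Random(0)               # fixed seed so the run is repeatable
p, v = 0.0, 1.0
for _ in range(5):
    p, v = noisy_step(p, v, a=0.5, dt=0.1, sigma_a=0.2, rng=rng)
print(p, v)

# With sigma_a = 0 the step reduces to the noise-free prediction.
p_clean, v_clean = noisy_step(0.0, 1.0, a=0.5, dt=0.1, sigma_a=0.0, rng=rng)
```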

The Process Noise Covariance Matrix ($\mathbf{Q}$)

Recap of the 1D Case

In the 1D Kalman Filter, the state is a single scalar (e.g., position). The process noise $w \sim \mathcal{N}(0, \sigma_w^2)$ enters the state directly:

$$x_k = x_{k-1} + \Delta x_{k-1} + w_{k-1}$$

Because the noise lives in the same space as the state, it simply adds to the scalar variance:

$$\sigma_k^{2-} = \sigma_{k-1}^{2+} + \sigma_w^2$$

This follows directly from the convolution of two Gaussians derived in the 1D case: the variance of the convolution is the sum of the individual variances. Here the noise enters the state directly without any scaling, so the addition is a simple scalar sum.
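The additive rule can be checked empirically with a small Monte Carlo experiment. This is a sketch with made-up variances, not part of the filter itself: it draws the previous estimation error and the process noise independently and compares the sample variance of their sum with $\sigma_{k-1}^{2+} + \sigma_w^2$.

```python
import random

rng = random.Random(42)              # fixed seed for a repeatable run
sigma_prev2 = 4.0                    # illustrative variance at step k-1
sigma_w2 = 1.0                       # illustrative process noise variance
N = 100_000

# Sum of two independent zero-mean Gaussian draws.
samples = [
    rng.gauss(0.0, sigma_prev2 ** 0.5) + rng.gauss(0.0, sigma_w2 ** 0.5)
    for _ in range(N)
]
mean = sum(samples) / N
empirical = sum((s - mean) ** 2 for s in samples) / N
predicted = sigma_prev2 + sigma_w2

print(f"empirical {empirical:.3f} vs predicted {predicted:.3f}")
```

The two numbers agree only up to sampling error, which shrinks as `N` grows.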

The 2D Problem

The noise is in a different space than the state

Our state is now 2-dimensional $\mathbf{x} = [p, v]^T$, but the noise source is still 1-dimensional: a scalar acceleration jitter $w$. The noise does not enter position and velocity directly — it first passes through kinematics via the matrix $\mathbf{G}$:

$$\mathbf{G} w = \begin{bmatrix} \frac{1}{2}(\Delta t)^2 \\ \Delta t \end{bmatrix} w \quad = \quad \begin{bmatrix} \text{noise on position} \\ \text{noise on velocity} \end{bmatrix}$$

Think of $\mathbf{G}$ as a lever arm: it maps the single scalar jitter $w$ into a 2D noise vector. The physics is the same as for the control input $u$ — an acceleration jitter $w$ adds $\frac{1}{2}(\Delta t)^2 w$ to position and $\Delta t \cdot w$ to velocity.

Recap: Covariance of a Random Vector

For a random vector $\mathbf{x}$ with mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{x}]$, the covariance matrix is:

$$\text{Cov}(\mathbf{x}) = \mathbb{E}\!\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right]$$

If the mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{x}] = \mathbf 0$ this becomes

$$\text{Cov}(\mathbf{x}) = \mathbb{E}\!\left[\mathbf{x}\mathbf{x}^T\right]$$

The Covariance of the 2D Noise Vector

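The 2D noise vector $\mathbf{G}w$ is a random vector with zero mean, because $\mathbb{E}[\mathbf{G}w] = \mathbf{G}\,\mathbb{E}[w] = \mathbf{0}$. Since $\mathbf{G}$ is deterministic, it can be pulled out of the expectation, so the covariance matrix is: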

$$\mathbf{Q} = \mathbb{E}\!\left[(\mathbf{G}w)(\mathbf{G}w)^T\right] = \mathbf{G} \mathbb{E}[w^2] \mathbf{G}^T = \mathbf{G} \sigma_a^2 \mathbf{G}^T$$

Even though $w$ is a scalar, its effect on the state is 2-dimensional. While $w$ is just a "jitter in acceleration," that jitter causes a "jitter in position" and a "jitter in velocity."

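$\mathbf{Q}$ defines:

  • How much position varies (the top-left entry, $Q_{pp}$).
  • How much velocity varies (the bottom-right entry, $Q_{vv}$).
  • How position and velocity vary together (the off-diagonal entries, $Q_{pv} = Q_{vp}$).

Substituting our value of $\mathbf{G}$: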

$$\mathbf{Q} = \begin{bmatrix} \frac{1}{2}(\Delta t)^2 \\ \Delta t \end{bmatrix} \begin{bmatrix} \frac{1}{2}(\Delta t)^2 & \Delta t \end{bmatrix} \sigma_a^2 = \begin{bmatrix} \frac{1}{4}(\Delta t)^4 & \frac{1}{2}(\Delta t)^3 \\ \frac{1}{2}(\Delta t)^3 & (\Delta t)^2 \end{bmatrix} \sigma_a^2$$
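As a quick numerical cross-check, the outer-product form $\mathbf{G} \sigma_a^2 \mathbf{G}^T$ and the expanded matrix can be computed side by side. The function names below are hypothetical and the values of $\Delta t$ and $\sigma_a^2$ are chosen only for illustration.

```python
def process_noise_cov(dt, sigma_a2):
    """Return Q = G * sigma_a^2 * G^T for G = [dt^2 / 2, dt]^T."""
    G = [0.5 * dt**2, dt]
    # Outer product G G^T, scaled by the scalar variance.
    return [[g_i * g_j * sigma_a2 for g_j in G] for g_i in G]


def process_noise_cov_closed_form(dt, sigma_a2):
    """Same matrix, written out entry by entry."""
    return [
        [0.25 * dt**4 * sigma_a2, 0.5 * dt**3 * sigma_a2],
        [0.5 * dt**3 * sigma_a2, dt**2 * sigma_a2],
    ]


Q = process_noise_cov(dt=0.1, sigma_a2=4.0)
Q_ref = process_noise_cov_closed_form(dt=0.1, sigma_a2=4.0)
for row, row_ref in zip(Q, Q_ref):
    print(row, row_ref)
```

Because $\Delta t$ appears with powers 2 to 4, halving the time step shrinks the position variance by a factor of 16 but the velocity variance only by a factor of 4.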

Meaning of the Off-Diagonal Terms

The off-diagonal entry $Q_{pv} = \sigma_a^2 \cdot \frac{1}{2}(\Delta t)^3$ is non-zero because position error and velocity error are not independent — they both come from the same single noise source $w$.

Imagine a sudden unexpected bump (a large $w$). It simultaneously shifts the position by $\frac{1}{2}(\Delta t)^2 w$ and the velocity by $\Delta t \cdot w$ relative to what the model predicted. Both errors are the same $w$ scaled by positive constants, so they always have the same sign and are perfectly correlated:

$$Q_{pv} = \mathbb{E}\!\left[\frac{1}{2}(\Delta t)^2 w \cdot \Delta t \cdot w\right] = \frac{1}{2}(\Delta t)^3 \,\mathbb{E}[w^2] = \frac{1}{2}(\Delta t)^3 \sigma_a^2$$

If we had modeled two independent noise sources — one for position and one for velocity — the off-diagonal would be zero. But here, one physical cause (acceleration jitter) creates correlated errors in both state components simultaneously.
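The "perfectly correlated" claim can be verified directly from the entries of $\mathbf{Q}$. The correlation coefficient between the position noise and the velocity noise is:

$$\rho_{pv} = \frac{Q_{pv}}{\sqrt{Q_{pp}\, Q_{vv}}} = \frac{\frac{1}{2}(\Delta t)^3 \sigma_a^2}{\sqrt{\frac{1}{4}(\Delta t)^4 \sigma_a^2 \cdot (\Delta t)^2 \sigma_a^2}} = \frac{\frac{1}{2}(\Delta t)^3 \sigma_a^2}{\frac{1}{2}(\Delta t)^3 \sigma_a^2} = 1$$

Equivalently, the determinant vanishes:

$$\det \mathbf{Q} = \sigma_a^4 \left( \frac{1}{4}(\Delta t)^4 \cdot (\Delta t)^2 - \frac{1}{2}(\Delta t)^3 \cdot \frac{1}{2}(\Delta t)^3 \right) = 0$$

So $\mathbf{Q}$ has rank 1: all of the injected noise lies along the single direction $\mathbf{G}$ in the position-velocity plane.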

State Covariance Matrix ($\mathbf{P}$)

The vector $\mathbf{x}_k$ is our best guess of the state, but we know it isn't perfect. To handle this, we use the State Covariance Matrix ($\mathbf{P}$).

The matrix $\mathbf{P}$ represents the "error budget" of our estimate. It quantifies how much we trust our current values for position and velocity. For our 1D point mass (which has a 2D state: position and velocity), $\mathbf{P}$ is a $2 \times 2$ matrix:

$$\mathbf{P} = \begin{bmatrix} \sigma_p^2 & \sigma_{pv} \\ \sigma_{vp} & \sigma_v^2 \end{bmatrix}$$

  • The Diagonal ($\sigma_p^2, \sigma_v^2$): These are the variances. They tell us the "spread" of our uncertainty for each variable. A large $\sigma_p^2$ means we are very unsure about the robot's location.
  • The Off-Diagonal ($\sigma_{pv}$): This is the covariance. It tells us how errors in position and velocity are linked. Since $\mathbf{P}$ is symmetric, $\sigma_{vp} = \sigma_{pv}$.

The Prediction Equation

We propagate the State Covariance Matrix from the previous time step ($k-1$) to the current time step ($k$) using:

$$\mathbf{P}_k^- = \mathbf{F} \mathbf{P}_{k-1}^+ \mathbf{F}^T + \mathbf{Q}$$

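  • $\mathbf{P}_{k-1}^+$: The "A Posteriori" uncertainty from the previous step, after its measurement update.
  • $\mathbf{P}_k^-$: The "A Priori" (predicted) uncertainty, before the measurement at step $k$ is used.
  • $\mathbf{Q}$: The uncertainty added by the random jitter during the time step.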

Here, we assume that $\mathbf{Q}$ is a constant matrix. In more advanced systems (like those with changing time steps $\Delta t$), $\mathbf{Q}_k$ may also get a time index $k$, but the underlying "stretch and grow" logic remains identical.

The Projection ($\mathbf{F} \mathbf{P}_{k-1}^+ \mathbf{F}^T$)

This term takes our existing uncertainty and moves it forward in time using our physics model ($\mathbf{F}$).

  • Why the "Sandwich" Product? In 1D, if $x_{new} = f \cdot x$, then the variance scales by the square: $\sigma_{new}^2 = f^2 \sigma^2$. In matrix form, $\mathbf{F} (\cdot) \mathbf{F}^T$ is the multi-dimensional equivalent of "squaring" the transformation. How tensors transform with "Sandwich" Products will be explained in detail later in the course.
  • Geometric Effect: It reshapes and stretches the uncertainty. If you are uncertain about your velocity, as time passes ($\Delta t$), that uncertainty "bleeds" into your position.
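A worked example makes the "bleeding" concrete. Assume the constant-velocity transition matrix $\mathbf{F} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}$ (the form consistent with the kinematics in $\mathbf{G}$; treat it as an assumption if your model differs) and an uncorrelated starting covariance $\mathbf{P} = \begin{bmatrix} \sigma_p^2 & 0 \\ 0 & \sigma_v^2 \end{bmatrix}$. Then:

$$\mathbf{F} \mathbf{P} \mathbf{F}^T = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \sigma_p^2 & 0 \\ 0 & \sigma_v^2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \Delta t & 1 \end{bmatrix} = \begin{bmatrix} \sigma_p^2 + (\Delta t)^2 \sigma_v^2 & \Delta t\, \sigma_v^2 \\ \Delta t\, \sigma_v^2 & \sigma_v^2 \end{bmatrix}$$

The position variance picks up the extra term $(\Delta t)^2 \sigma_v^2$, and a non-zero covariance $\Delta t\, \sigma_v^2$ appears even though the starting errors were uncorrelated. The velocity variance is unchanged by this term.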

The Injection ($\mathbf{Q}$)

The matrix $\mathbf{Q}$ represents the uncertainty growth caused by random noise (the "jitters" and "bumps" discussed in the Process Noise section).

  • The Role: Even if we knew our position perfectly at $k=0$, random disturbances during the interval $\Delta t$ mean we are less certain at $k=1$.
  • Geometric Effect: While the first term reshapes the bubble, $\mathbf{Q}$ increases its overall size (volume).
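The split between "reshape" and "grow" can be illustrated numerically. The sketch below uses hypothetical helper names and illustrative values ($\Delta t = 1$, $\sigma_a^2 = 1$, $\mathbf{P} = \mathbf{I}$), and assumes the constant-velocity $\mathbf{F} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}$. For that particular $\mathbf{F}$, $\det \mathbf{F} = 1$, so the projection alone leaves the determinant of the covariance (a proxy for the area of the uncertainty ellipse) unchanged, while adding $\mathbf{Q}$ increases it.

```python
def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    return [list(col) for col in zip(*A)]


def predict_covariance(F, P, Q):
    """Return F P F^T + Q."""
    FPFt = mat_mul(mat_mul(F, P), transpose(F))
    return [[FPFt[i][j] + Q[i][j] for j in range(len(Q))] for i in range(len(Q))]


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


dt, sigma_a2 = 1.0, 1.0              # illustrative values
F = [[1.0, dt], [0.0, 1.0]]          # assumed constant-velocity transition
Q = [[0.25 * dt**4 * sigma_a2, 0.5 * dt**3 * sigma_a2],
     [0.5 * dt**3 * sigma_a2, dt**2 * sigma_a2]]
P = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]

projected = predict_covariance(F, P, zero)   # projection only
P_pred = predict_covariance(F, P, Q)         # projection plus injection

print("det before:       ", det2(P))
print("det after F P F^T:", det2(projected))
print("det after + Q:    ", det2(P_pred))
```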

Summary: The Result of Prediction

Every time we run the prediction step:

  • State Prediction: Our "best guess" $\hat{\mathbf{x}}_k^-$ moves forward.

    $$\hat{\mathbf{x}}_k^- = \mathbf{F}\hat{\mathbf{x}}_{k-1}^+ + \mathbf{G}u_{k-1}$$

  • Uncertainty Prediction: Our "uncertainty bubble" $\mathbf{P}_k^-$ stretches (due to $\mathbf{F}$) and grows (due to $\mathbf{Q}$).

    $$\mathbf{P}_k^- = \mathbf{F}\mathbf{P}_{k-1}^+\mathbf{F}^T + \mathbf{Q}$$

Without a measurement update to "shrink" this bubble back down, the filter would eventually become so uncertain that the prediction becomes useless.
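To see this growth, the two summary equations can be run in a loop with no measurement update. The following is a minimal sketch, not a full filter: the function name `predict` and all numbers are illustrative, and $\mathbf{F} \mathbf{P} \mathbf{F}^T$ is written out by hand for the assumed constant-velocity $\mathbf{F} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}$.

```python
def predict(x, P, u, dt, sigma_a2):
    """One prediction step for the state [position, velocity].

    Assumes F = [[1, dt], [0, 1]] and G = [dt^2 / 2, dt]^T.
    """
    p, v = x
    (P_pp, P_pv), (_, P_vv) = P          # P is symmetric: P_vp == P_pv

    # State prediction: x = F x + G u
    x_pred = [p + dt * v + 0.5 * dt**2 * u, v + dt * u]

    # Covariance prediction: F P F^T (written out) plus Q = G sigma_a^2 G^T
    new_pp = P_pp + 2.0 * dt * P_pv + dt**2 * P_vv + 0.25 * dt**4 * sigma_a2
    new_pv = P_pv + dt * P_vv + 0.5 * dt**3 * sigma_a2
    new_vv = P_vv + dt**2 * sigma_a2
    return x_pred, [[new_pp, new_pv], [new_pv, new_vv]]


x = [0.0, 1.0]                           # illustrative start: p = 0, v = 1
P = [[0.0, 0.0], [0.0, 0.0]]             # state known perfectly at k = 0
position_variance = []
for k in range(10):
    x, P = predict(x, P, u=0.0, dt=1.0, sigma_a2=1.0)
    position_variance.append(P[0][0])
    print(f"k={k + 1:2d}  var(p)={P[0][0]:8.2f}  var(v)={P[1][1]:5.2f}")
```

Even though the state starts out perfectly known, the position variance rises at every step, and it rises faster over time because the accumulated velocity uncertainty keeps feeding into position.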