The prediction step we derived ($\mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} \mathbf{u}_{k-1}$) assumes our model is perfect: that the system is truly linear and that the acceleration $a_{k-1}$ is known and perfectly constant over the interval $\Delta t$.
In reality, neither of these is true. Process Noise accounts for unmodeled disturbances and inaccuracies.
The general prediction equation including model imperfections is:
$$\mathbf{x}_k = \mathbf{F}_{k-1} \mathbf{x}_{k-1} + \mathbf{G}_{k-1} \mathbf{u}_{k-1} + \mathbf{L}_{k-1} \mathbf{w}_{k-1}$$
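Where $\mathbf{F}_{k-1}$ is the state transition matrix, $\mathbf{G}_{k-1}$ maps the control input $\mathbf{u}_{k-1}$ into the state, $\mathbf{L}_{k-1}$ maps the process noise into the state, and $\mathbf{w}_{k-1}$ is the process noise itself.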
What Does Process Noise Represent?
The term $\mathbf{w}_{k-1}$ represents any effect that causes the true state of the system to deviate from our model's prediction, such as the unmodeled disturbances and model inaccuracies mentioned above.
For our 1D point mass, we have only one input (acceleration) and one noise source (acceleration jitter). Here, we assume the noise enters the system through the exact same physical "door" as the control command, meaning $\mathbf{L} = \mathbf{G}$.
The 1D Prediction Equation:
$$\mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} (u_{k-1} + w_{k-1})$$
The noise $w$ is sampled once per $\Delta t$, just like the control signal $u_{k-1}$. This reflects a "Discrete-Discrete" model, where we assume the jitter drawn at the start of the interval is held constant until the next sample.
Here, $u_{k-1} = a_{k-1}$ and $w_{k-1}$ are scalars ($1 \times 1$), while $\mathbf{G}$ is the $2 \times 1$ matrix that maps that acceleration into position and velocity.
$$ \begin{bmatrix} p_k \\ v_k \end{bmatrix} = \mathbf{F} \begin{bmatrix} p_{k-1} \\ v_{k-1} \end{bmatrix} + \mathbf{G} (a_{k-1} + w_{k-1}) $$
The acceleration noise is a random variable drawn from a zero-mean Gaussian distribution with variance $\sigma_a^2$, written $w_{k-1} \sim \mathcal{N}(0, \sigma_a^2)$.
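As a minimal sketch of this noisy propagation, the snippet below draws one sample of $w$ and pushes it through $\mathbf{G}$ alongside the commanded acceleration. The numeric values of `dt`, `sigma_a`, the initial state, and the commanded acceleration are assumptions chosen for illustration, and `F` is assumed to be the constant-velocity transition matrix that matches the $\mathbf{G}$ used in this section. The helper name `propagate` is hypothetical.

```python
import numpy as np

dt = 0.1        # time step in seconds (assumed value)
sigma_a = 0.5   # std. dev. of the acceleration noise in m/s^2 (assumed value)

# Assumed constant-velocity transition matrix, consistent with G below.
F = np.array([[1.0, dt],
              [0.0, 1.0]])
# G maps a scalar acceleration into [position, velocity] increments.
G = np.array([0.5 * dt**2, dt])

def propagate(x_prev, a_prev, w_prev):
    """True-state propagation: x_k = F x_{k-1} + G (a_{k-1} + w_{k-1})."""
    return F @ x_prev + G * (a_prev + w_prev)

rng = np.random.default_rng(seed=0)
w = rng.normal(0.0, sigma_a)          # one draw of w ~ N(0, sigma_a^2)

x_prev = np.array([0.0, 1.0])         # p = 0 m, v = 1 m/s (assumed)
x_nominal = propagate(x_prev, a_prev=2.0, w_prev=0.0)  # noise-free model
x_noisy = propagate(x_prev, a_prev=2.0, w_prev=w)      # what "really" happens

print("nominal:", x_nominal)
print("deviation caused by w:", x_noisy - x_nominal)
```

The deviation printed on the last line is exactly `G * w`, which is why the noise and the control share the same "door" into the state.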
In the 1D Kalman Filter, the state is a single scalar (e.g., position). The process noise $w \sim \mathcal{N}(0, \sigma_w^2)$ enters the state directly:
$$x_k = x_{k-1} + \Delta x_{k-1} + w_{k-1}$$
Because the noise lives in the same space as the state, it simply adds to the scalar variance:
$$\sigma_k^{2-} = \sigma_{k-1}^{2+} + \sigma_w^2$$
This follows directly from the convolution of two Gaussians derived in the 1D case: the variance of the convolution is the sum of the individual variances. Here the noise enters the state directly without any scaling, so the addition is a simple scalar sum.
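A quick Monte Carlo check can make the scalar rule concrete. The sketch below uses assumed variances (`var_prev`, `var_w`) and omits the deterministic shift $\Delta x_{k-1}$, since a constant offset does not change the variance.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 200_000          # number of Monte Carlo samples (assumed)

var_prev = 4.0       # posterior variance at step k-1 (assumed value)
var_w = 1.0          # process noise variance sigma_w^2 (assumed value)

# Sample the previous state error and the process noise independently.
x_prev = rng.normal(0.0, np.sqrt(var_prev), n)
w = rng.normal(0.0, np.sqrt(var_w), n)

# The noise enters the scalar state directly, with no scaling.
x_pred = x_prev + w

var_pred_empirical = x_pred.var()
var_pred_formula = var_prev + var_w

print("empirical:", var_pred_empirical)
print("formula:  ", var_pred_formula)
```

The empirical variance should land close to the sum of the two input variances, up to sampling error.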
The noise is in a different space than the state
Our state is now 2-dimensional $\mathbf{x} = [p, v]^T$, but the noise source is still 1-dimensional: a scalar acceleration jitter $w$. The noise does not enter position and velocity directly — it first passes through kinematics via the matrix $\mathbf{G}$:
$$\mathbf{G} w = \begin{bmatrix} \frac{1}{2}(\Delta t)^2 \\ \Delta t \end{bmatrix} w \quad = \quad \begin{bmatrix} \text{noise on position} \\ \text{noise on velocity} \end{bmatrix}$$
Think of $\mathbf{G}$ as a lever arm: it maps the single scalar jitter $w$ into a 2D noise vector. The physics is the same as for the control input $u$ — an acceleration jitter $w$ adds $\frac{1}{2}(\Delta t)^2 w$ to position and $\Delta t \cdot w$ to velocity.
For a random vector $\mathbf{x}$ with mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{x}]$, the covariance matrix is:
$$\text{Cov}(\mathbf{x}) = \mathbb{E}\!\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right]$$
If the mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{x}] = \mathbf 0$ this becomes
$$\text{Cov}(\mathbf{x}) = \mathbb{E}\!\left[\mathbf{x}\mathbf{x}^T\right]$$
The 2D noise vector $\mathbf{G}w$ is a random variable with zero mean. Therefore, its covariance matrix is:
$$\mathbf{Q} = \mathbb{E}\!\left[(\mathbf{G}w)(\mathbf{G}w)^T\right] = \mathbf{G} \mathbb{E}[w^2] \mathbf{G}^T = \mathbf{G} \sigma_a^2 \mathbf{G}^T$$
Even though $w$ is a scalar, its effect on the state is 2-dimensional. While $w$ is just a "jitter in acceleration," that jitter causes a "jitter in position" and a "jitter in velocity."
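$\mathbf{Q}$ defines how much uncertainty each prediction step adds to position and to velocity (the diagonal entries) and how those two errors are correlated (the off-diagonal entries).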
Substituting our values of $\mathbf{G}$:
$$\mathbf{Q} = \begin{bmatrix} \frac{1}{2}(\Delta t)^2 \\ \Delta t \end{bmatrix} \begin{bmatrix} \frac{1}{2}(\Delta t)^2 & \Delta t \end{bmatrix} \sigma_a^2 = \begin{bmatrix} \frac{1}{4}(\Delta t)^4 & \frac{1}{2}(\Delta t)^3 \\ \frac{1}{2}(\Delta t)^3 & (\Delta t)^2 \end{bmatrix} \sigma_a^2$$
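The outer product above is easy to verify numerically. The sketch below builds $\mathbf{Q}$ as $\mathbf{G} \sigma_a^2 \mathbf{G}^T$ and compares it with the closed-form matrix; the function name `process_noise_cov` and the values of `dt` and `sigma_a` are assumptions for illustration.

```python
import numpy as np

def process_noise_cov(dt, sigma_a):
    """Q = G sigma_a^2 G^T for scalar acceleration noise (hypothetical helper)."""
    G = np.array([[0.5 * dt**2],
                  [dt]])                 # 2x1 column vector
    return sigma_a**2 * (G @ G.T)        # 2x2 outer product

dt, sigma_a = 0.1, 0.5                   # assumed values
Q = process_noise_cov(dt, sigma_a)

# Closed-form entries from the expanded matrix.
Q_closed = sigma_a**2 * np.array([[dt**4 / 4, dt**3 / 2],
                                  [dt**3 / 2, dt**2]])

print(Q)
print("matches closed form:", np.allclose(Q, Q_closed))
```

Writing `Q` as a function of `dt` also covers the case where the time step changes between predictions: the same helper can simply be called with the current step length.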
The off-diagonal entry $Q_{pv} = \sigma_a^2 \cdot \frac{1}{2}(\Delta t)^3$ is non-zero because position error and velocity error are not independent — they both come from the same single noise source $w$.
Imagine a sudden unexpected bump (a large $w$). It simultaneously pushes the true position away from the prediction by $\frac{1}{2}(\Delta t)^2 w$ and the true velocity by $\Delta t \cdot w$. The two errors always have the same sign and are perfectly correlated:
$$Q_{pv} = \mathbb{E}\!\left[\frac{1}{2}(\Delta t)^2 w \cdot \Delta t \cdot w\right] = \frac{1}{2}(\Delta t)^3 \,\mathbb{E}[w^2] = \frac{1}{2}(\Delta t)^3 \sigma_a^2$$
If we had modeled two independent noise sources — one for position and one for velocity — the off-diagonal would be zero. But here, one physical cause (acceleration jitter) creates correlated errors in both state components simultaneously.
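The "perfectly correlated" claim can be checked by computing the correlation coefficient implied by $\mathbf{Q}$ and by sampling the noise vector $\mathbf{G}w$ directly. This is a sketch with assumed values for `dt`, `sigma_a`, and the sample count; `Q_independent` is a hypothetical alternative that keeps the same diagonal but treats position and velocity noise as independent.

```python
import numpy as np

dt, sigma_a = 0.1, 0.5                   # assumed values
G = np.array([0.5 * dt**2, dt])
Q = sigma_a**2 * np.outer(G, G)

# Correlation coefficient between position noise and velocity noise.
rho = Q[0, 1] / np.sqrt(Q[0, 0] * Q[1, 1])

# Sample w and map every draw through G; each row is one 2D noise vector.
rng = np.random.default_rng(seed=2)
w = rng.normal(0.0, sigma_a, 100_000)
samples = np.outer(w, G)
Q_empirical = np.cov(samples, rowvar=False)

# Two independent noise sources would zero the off-diagonal entries.
Q_independent = np.diag(np.diag(Q))

print("rho:", rho)
print("det(Q):", np.linalg.det(Q))
print("empirical Q:\n", Q_empirical)
```

A correlation of 1 and a determinant of 0 both reflect the same fact: all of the injected noise lies along the single direction $\mathbf{G}$.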
The vector $\hat{\mathbf{x}}_k$ is our best guess of the state, but we know it isn't perfect. To handle this, we use the State Covariance Matrix ($\mathbf{P}$).
The matrix $\mathbf{P}$ represents the "error budget" of our estimate. It quantifies how much we trust our current values for position and velocity. For our 1D point mass (which has a 2D state: position and velocity), $\mathbf{P}$ is a $2 \times 2$ matrix:
$$\mathbf{P} = \begin{bmatrix} \sigma_p^2 & \sigma_{pv} \\ \sigma_{vp} & \sigma_v^2 \end{bmatrix}$$
We propagate the State Covariance Matrix from the previous time step ($k-1$) to the current time step ($k$) using:
$$\mathbf{P}_k^- = \mathbf{F} \mathbf{P}_{k-1}^+ \mathbf{F}^T + \mathbf{Q}$$
Here, we assume that $\mathbf{Q}$ is a constant matrix. In more advanced systems (like those with changing time steps $\Delta t$), $\mathbf{Q}_k$ may also get a time index $k$, but the underlying "stretch and grow" logic remains identical.
The Projection ($\mathbf{F} \mathbf{P}_{k-1}^+ \mathbf{F}^T$)
This term takes our existing uncertainty and moves it forward in time using our physics model ($\mathbf{F}$).
The Injection ($\mathbf{Q}$)
The matrix $\mathbf{Q}$ represents the uncertainty growth caused by random noise (the "jitters" and "bumps" discussed in the Process Noise section).
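To see the two terms separately, the sketch below computes the projection and the injection for one step. The prior covariance `P_prev`, the values of `dt` and `sigma_a`, and the constant-velocity form of `F` are assumptions for illustration.

```python
import numpy as np

dt, sigma_a = 0.1, 0.5                   # assumed values
F = np.array([[1.0, dt],
              [0.0, 1.0]])               # assumed constant-velocity model
G = np.array([0.5 * dt**2, dt])
Q = sigma_a**2 * np.outer(G, G)

# Assumed posterior covariance from step k-1: uncorrelated p and v errors.
P_prev = np.diag([1.0, 0.25])

projected = F @ P_prev @ F.T             # the "stretch": F P F^T
P_pred = projected + Q                   # the "grow": add Q

print("projection:\n", projected)
print("after injection:\n", P_pred)
```

Even though `P_prev` starts diagonal, the projection already introduces off-diagonal terms: velocity uncertainty leaks into position uncertainty through $\Delta t$.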
Every time we run the prediction step:
State Prediction: Our "best guess" $\hat{\mathbf{x}}_k^-$ moves forward.
$$\hat{\mathbf{x}}_k^- = \mathbf{F}\hat{\mathbf{x}}_{k-1}^+ + \mathbf{G}u_{k-1}$$
Uncertainty Prediction: Our "uncertainty bubble" $\mathbf{P}_k^-$ stretches (due to $\mathbf{F}$) and grows (due to $\mathbf{Q}$).
$$\mathbf{P}_k^- = \mathbf{F}\mathbf{P}_{k-1}^+\mathbf{F}^T + \mathbf{Q}$$
Without a measurement update to "shrink" this bubble back down, the filter would eventually become so uncertain that the prediction becomes useless.
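The growth of the uncertainty bubble can be observed by running the prediction step repeatedly with no measurement update. This is a sketch under assumed values (`dt`, `sigma_a`, the initial state and covariance, zero commanded acceleration); the function name `predict` is hypothetical.

```python
import numpy as np

def predict(x_post, P_post, u, F, G, Q):
    """One Kalman prediction step (hypothetical helper)."""
    x_prior = F @ x_post + G * u          # state prediction
    P_prior = F @ P_post @ F.T + Q        # uncertainty prediction
    return x_prior, P_prior

dt, sigma_a = 0.1, 0.5                    # assumed values
F = np.array([[1.0, dt],
              [0.0, 1.0]])                # assumed constant-velocity model
G = np.array([0.5 * dt**2, dt])
Q = sigma_a**2 * np.outer(G, G)

x = np.array([0.0, 1.0])                  # p = 0 m, v = 1 m/s (assumed)
P = np.diag([0.01, 0.01])                 # small initial uncertainty (assumed)

position_variances = [P[0, 0]]
for _ in range(50):                       # 50 predictions, no measurements
    x, P = predict(x, P, 0.0, F, G, Q)
    position_variances.append(P[0, 0])

print("final state:", x)
print("position variance, first vs last:",
      position_variances[0], position_variances[-1])
```

In this run the position variance increases at every step, which is the behavior the measurement update is meant to counteract.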