Extended Kalman Filter¶

Introduction: From Particles to Pose¶

To track a robot, we must maintain linear and angular motion simultaneously.

We define the state vector $\mathbf{x}$ as a single column containing five pieces of information:

$$\mathbf{x} = \begin{bmatrix} p_1 \\ p_2 \\ v_1 \\ v_2 \\ \theta \end{bmatrix} \begin{matrix} \text{(Position X)} \\ \text{(Position Y)} \\ \text{(Velocity X)} \\ \text{(Velocity Y)} \\ \text{(Orientation)} \end{matrix}$$

We also define an input vector $\mathbf{u}$ consisting of the IMU-measurements (Inertial Measurement Unit) that "drive" the change in our state:

linear accelerations ($a_1^B, a_2^B$)
angular velocity ($\omega$).

$$ \mathbf{u} = \begin{bmatrix} a_1^B \\ a_2^B \\ \omega \end{bmatrix} $$

Note: The linear accelerations are measured in the body frame (indicated by the upper index $B$). In contrast, position and velocity are tracked in the world frame (no upper index).

The Breakdown of the Linear Model¶

The standard Kalman Filter (KF) assumes the next state is a linear combination of the previous state, control inputs, and noise:

$$\mathbf{x}_k = \mathbf{F}_{k-1} \mathbf{x}_{k-1} + \mathbf{G}_{k-1} \left( \mathbf{u}_{k-1} + \mathbf{w}_{k-1}\right)$$ Where:

$\mathbf{F}_{k-1}$: State Transition Matrix (how position/velocity evolve).
$\mathbf{G}_{k-1}$: Control Matrix (how the IMU inputs $u$ change the state).
$\mathbf{w}_{k-1}$ is the raw process noise from our sensors

However, this linear assumption breaks down for our robot. When we use body-frame measurements ($a_1^B, a_2^B, \omega$) and rotate them into the world frame ($W$), the equations become non-linear due to their dependence on the heading $\theta$.

Recalling the rotation into the world frame (no upper index):

$a_1 = a_1^B \cos\theta - a_2^B \sin\theta$
$a_2 = a_1^B \sin\theta + a_2^B \cos\theta$

The resulting state transition equations for our system (ignoring noise for a moment) are:

$p_{1,k} = p_{1,k-1} + v_{1,k-1}\Delta t + \frac{1}{2}\Delta t^2 (a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1})$
$p_{2,k} = p_{2,k-1} + v_{2,k-1}\Delta t + \frac{1}{2}\Delta t^2 (a_1^B \sin\theta_{k-1} + a_2^B \cos\theta_{k-1})$
$v_{1,k} = v_{1,k-1} + \Delta t (a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1})$
$v_{2,k} = v_{2,k-1} + \Delta t (a_1^B \sin\theta_{k-1} + a_2^B \cos\theta_{k-1})$
$\theta_{k} = \theta_{k-1} + \omega \Delta t$

Notice:

Trigonometric Coupling: The state variable $\theta$ is "trapped" inside sine and cosine functions.
Cross-Term Interaction: We see terms like $a_1^B \cos\theta$. Here, an input ($a_1^B$) is being multiplied by a function of a state ($\theta$). In a linear system, inputs and states must remain independent additive terms; once they multiply, the geometry of our uncertainty "warps."

This means the transition from $\mathbf{x}_{k-1}$ to $\mathbf{x}_k$ can no longer be expressed as a simple matrix multiplication.

Modeling assumptions. Two approximations are baked into these equations and are worth making explicit:

Constant acceleration over the step: $\mathbf{a}^B$ is treated as constant across $\Delta t$ — this is what produces the constant-acceleration term $\frac{1}{2}\Delta t^2 \mathbf{a}$.
Frozen heading during the step: the rotation uses $\theta_{k-1}$ for the entire interval, even though the true heading slews from $\theta_{k-1}$ to $\theta_k = \theta_{k-1} + \omega\Delta t$. This is a first-order approximation, accurate when $\omega\,\Delta t$ is small.

Both errors shrink as $\Delta t \to 0$, so a fast prediction rate keeps the discretization honest.

Moving to the Extended Kalman Filter (EKF)¶

Since we cannot use a simple matrix multiplication to predict the state, we define a nonlinear transition function $\mathbf{f}$:

$$\mathbf{x}_k = \mathbf{f}(\mathbf{x}_{k-1}, \mathbf{u}_{k-1}, \mathbf{w}_{k-1})$$

where with the dimension of our state space $n$:

$\mathbf f$ is a $n$-vector-valued function.

To keep the "Kalman logic" alive, we must linearize this function at every time step using a first-order Taylor expansion. This is the essence of the EKF.

1. The Prediction Step (Nonlinear) We calculate the new state $\hat{\mathbf{x}}_k^-$ directly using the nonlinear trigonometric equations above. This gives us our "best guess" for the robot's new pose.

2. The Uncertainty Propagation (The Jacobian) To update the error covariance $\mathbf{P}$ (our "certainty"), we cannot use $\mathbf{F}$ or $\mathbf{G}$ because they don't exist in constant form. Instead, we calculate the Jacobian Matrix—the matrix of partial derivatives—which represents the local linear slope of the function $\mathbf{f}$ at the current state.

The State Transition Jacobian (Process Jacobian) $\mathbf{J}_F$¶

The State Transition Jacobian, which we will call $\mathbf{J}_F$, is a matrix of partial derivatives that represents the "linear slope" of the movement $\mathbf f$ at the current state variables. For example, it tells the filter: Given our current heading and acceleration, how much will a $1^\circ$ error in $\theta$ bleed into our position and velocity estimates?

The Jacobian matrix, denoted in general as $\mathbf{J}$, is a matrix of all first-order partial derivatives of a vector function. In our EKF context, we use the symbol $\mathbf{J}_F$ for the Jacobian of the nonlinear prediction function $\mathbf{f}(\mathbf{x}_{k-1}, \mathbf{u}_{k-1})$. This Jacobian tells us how a tiny change in any part of the state ($\mathbf{x}$) affects the next state.

The State Transition Jacobian $\mathbf{J}_F$ ($n \times n$-matrix) is defined as:

$$\mathbf{J}_F = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \dots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}$$

Rows ($n$): Each row represents one output of your $n$-vector-valued function $\mathbf{f}$.
Columns ($n$): Each column represents the derivative with respect to one of the $n$ variables in your state vector $\mathbf{x}$.

Because you are mapping the "change in the state" to the "next state," the input and output counts are identical, making the matrix perfectly square.

Why propagate the uncertainty through the linearization?¶

The mean is easy; the spread is the hard part. The best guess $\hat{\mathbf{x}}$ can be pushed straight through the nonlinear model: $\hat{\mathbf{x}}_k^- = \mathbf{f}(\hat{\mathbf{x}}_{k-1}^+,\mathbf{u}_{k-1})$. The difficulty is the uncertainty — $\mathbf{P}$ describes a whole cloud of plausible states, and we need what the motion does to that cloud.

The filter can only carry a Gaussian (an ellipse). A linear map sends a Gaussian to another Gaussian — it stretches, rotates and shifts the ellipse, which $\mathbf{P}$ still describes exactly. A nonlinear map bends it into a banana-shaped blob that is no longer Gaussian, and the filter has no way to store that shape. So the step that transforms the cloud must be linear.

Covariance has a simple rule only for linear maps. For $\mathbf{y}=\mathbf{A}\mathbf{x}$, $\operatorname{Cov}(\mathbf{y})=\mathbf{A}\,\operatorname{Cov}(\mathbf{x})\,\mathbf{A}^\top$ (in 1D, $\operatorname{var}(ax)=a^2\operatorname{var}(x)$). No equally clean exact rule exists for $\operatorname{Cov}(\mathbf{f}(\mathbf{x}))$.

So we replace $\mathbf{f}$ by its tangent. Near the current estimate $\hat{\mathbf{x}}$, $\mathbf{f}(\mathbf{x}) \approx \underbrace{\mathbf{f}(\hat{\mathbf{x}})}_{\text{shifts the cloud}} + \underbrace{\mathbf{J}_F(\mathbf{x}-\hat{\mathbf{x}})}_{\text{stretches/rotates it}}$. The Jacobian $\mathbf{J}_F$ is this local linear map, which makes the clean rule apply:

$$\mathbf{P}_k^- = \mathbf{J}_F\,\mathbf{P}_{k-1}^+\,\mathbf{J}_F^\top + (\text{process noise}).$$

Why it works (and when it doesn't). Zoom in on any smooth curve and it looks like its tangent line. As long as the uncertainty cloud is small compared to how sharply $\mathbf{f}$ curves, $\mathbf{J}_F$ captures almost all of the reshaping. If the spread is large or $\mathbf{f}$ bends sharply, the linearization becomes inaccurate — the EKF's main weakness. Several filters address this:

The Unscented Kalman Filter and Particle Filters sidestep the explicit Jacobian altogether.
The Error-State (indirect) Kalman Filter instead keeps the linearization accurate by tracking only the small error between the true state and a separately-integrated nominal trajectory: because that error stays near zero, its first-order approximation is essentially exact. The error-state form also naturally handles state spaces that are not flat vector spaces — notably 3D orientations, which live on a rotation manifold ($SO(3)$) rather than in $\mathbb{R}^n$ — and is the approach we use later for the 3D case.

Calculating the Jacobian for your 5-State System¶

Definition of the Jacobian for our 5-state system:

$$\mathbf{J}_F = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial p_1} & \frac{\partial f_1}{\partial p_2} & \dots & \frac{\partial f_1}{\partial \theta} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_5}{\partial p_1} & \frac{\partial f_5}{\partial p_2} & \dots & \frac{\partial f_5}{\partial \theta} \end{bmatrix}$$

Take your first equation: $f_1 = p_{1,k-1} + v_{1,k-1} \Delta t + \frac{1}{2}\Delta t^2 (a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1})$.

Differentiating $f_1$ with respect to $p_{1,k-1}$, we get 1.
Differentiating $f_1$ with respect to $v_{1,k-1}$, we get $\Delta t$.
Differentiating $f_1$ with respect to $\theta_{k-1}$, we use the rules $(\cos\theta)' = -\sin\theta$ and $(\sin\theta)' = \cos\theta$: $$\frac{\partial f_1}{\partial \theta_{k-1}} = \frac{1}{2}\Delta t^2 (-a_1^B \sin\theta_{k-1} - a_2^B \cos\theta_{k-1})$$

This entry in the Jacobian tells the EKF how much the X-position ($p_1$) will change if the angle ($\theta$) is slightly different than we thought.

The Resulting Jacobian Matrix $\mathbf{J}_F$

For our system, the Jacobian — evaluated at the current estimate $\hat{\mathbf{x}}_{k-1}^+$ — looks like this:

$$\mathbf{J}_F = \left.\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}_{k-1}^+,\,\mathbf{u}_{k-1}} = \begin{bmatrix} 1 & 0 & \Delta t & 0 & \frac{1}{2}\Delta t^2 (-a_1^B \sin\theta_{k-1} - a_2^B \cos\theta_{k-1}) \\ 0 & 1 & 0 & \Delta t & \frac{1}{2}\Delta t^2 (a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1}) \\ 0 & 0 & 1 & 0 & \Delta t (-a_1^B \sin\theta_{k-1} - a_2^B \cos\theta_{k-1}) \\ 0 & 0 & 0 & 1 & \Delta t (a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1}) \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

(the entries are the symbolic derivatives from above, with $\theta_{k-1}$ and $a^B$ taken at their current estimated/measured values.)

Interpreting $\mathbf{J}_F$ — how heading error leaks into position and velocity. Apart from the trivial position–velocity link ($\partial p/\partial v = \Delta t$), the only coupling in $\mathbf{J}_F$ lives in the fifth column $\partial \mathbf{f}/\partial\theta$. Its non-zero entries (rows 1–4) say that an error or uncertainty in the heading $\theta$ does not stay confined to $\theta$ — it is converted, through the body acceleration, into errors in velocity ($v_1, v_2$) and position ($p_1, p_2$). Intuitively: if the robot is accelerating and our estimate of $\theta$ is slightly off, we rotate $\mathbf{a}^B$ into the wrong world-frame direction, so the predicted velocity (scaled by $\Delta t$) and position (scaled by $\tfrac{1}{2}\Delta t^2$) inherit that mistake. Two consequences:

The leak is proportional to the acceleration. If $a_1^B = a_2^B = 0$, the entire $\theta$-column (rows 1–4) vanishes and heading uncertainty cannot contaminate $p$ or $v$ during this step. The faster the robot accelerates, the more a wrong heading corrupts the trajectory.
This column is the mechanism behind the "warping" uncertainty cloud. When the covariance is propagated as $\mathbf{P}_k^- = \mathbf{J}_F \mathbf{P}_{k-1}^+ \mathbf{J}_F^\top + \dots$, it is precisely these $\partial/\partial\theta$ terms that transfer variance from $\theta$ into the position and velocity states whenever the heading is uncertain.

Unlike the linear $\mathbf{F}$, which is constant, the Jacobian $\mathbf{J}_F$ depends on:

The current estimated orientation $\theta$ (part of $\mathbf x $).
The current measured accelerations $a_1^B, a_2^B$ (part of $\mathbf u $).

Summary for the Jacobian $\mathbf{J}_F$

The Jacobian $\mathbf{J}_F$ replaces the constant transition matrix $\mathbf{F}$. Unlike $\mathbf{F}$, which is the same every time, $\mathbf{J}_F$ is recalculated at every step using the current estimate $\hat\theta_{k-1}^+$ and the current accelerations $\mathbf{a}_{k-1}^B$.

It serves as the "linear bridge" that allows the Kalman Filter to propagate uncertainty through the nonlinear turns of the robot.

The Control Jacobian $\mathbf{J}_G$¶

While the State Jacobian $\mathbf{J}_F$ tells us how uncertainty already in the state propagates forward, every prediction step also injects new uncertainty — because the IMU drivers $\mathbf{u} = [a_1^B, a_2^B, \omega]^\top$ that we feed into the motion model $\mathbf{f}$ are themselves noisy.

The difficulty is that this noise is described in the wrong space. The IMU gives us a few per-sensor error figures — uncertainty in input space (3 numbers: two accelerometers and a gyro). What we actually need is the resulting uncertainty in state space (5 numbers: $p_1, p_2, v_1, v_2, \theta$). So we need a bridge that carries a wobble in the drivers over into a wobble in the state.

That bridge is the Control Jacobian: $$\mathbf{J}_G = \frac{\partial \mathbf{f}}{\partial \mathbf{u}} = \begin{bmatrix} \frac{\partial f_1}{\partial u_1} & \frac{\partial f_1}{\partial u_2} & \cdots & \frac{\partial f_1}{\partial u_p} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial u_1} & \frac{\partial f_n}{\partial u_2} & \cdots & \frac{\partial f_n}{\partial u_p} \end{bmatrix} $$

It is the linear map that transforms a small perturbation of the drivers into the corresponding perturbation of the state: $$\delta\mathbf{x} \approx \mathbf{J}_G\,\delta\mathbf{u}.$$ A noisy IMU reading $\delta\mathbf{u}$ is rotated and scaled by $\mathbf{J}_G$ into the state error $\delta\mathbf{x}$ it produces over one step. When that perturbation is specifically the random IMU measurement error, it is the process noise $\mathbf{w}$ of the linear model in the callout below ($\delta\mathbf{u} = \mathbf{w}$), a zero-mean random vector whose covariance is the $\mathbf{Q}_u$ introduced next ($\mathbf{Q}_u = \operatorname{Cov}(\mathbf{w})$). So $\mathbf{J}_G$ captures how the driver noise spreads into the state; how large that noise is in the first place is the separate quantity $\mathbf{Q}_u$ — and the two are combined in the covariance-prediction step.

A note on the symbol. We write the subscript $G$ because, in the generic linear model $\mathbf{x}_k = \mathbf{F}_{k-1}\mathbf{x}_{k-1} + \mathbf{G}_{k-1}\left(\mathbf{u}_{k-1} + \mathbf{w}_{k-1}\right)$, this Jacobian is what plays the role of the control matrix $\mathbf{G}$ — just as $\mathbf{J}_F$ takes the role of $\mathbf{F}$ and (below) $\mathbf{J}_H$ takes the role of $\mathbf{H}$. Each $\mathbf{J}_\bullet$ is the EKF Jacobian that, in the linear case, collapses back to the standard matrix named in its subscript. Note that here the input $\mathbf{u}_{k-1}$ and its noise $\mathbf{w}_{k-1}$ enter through the same matrix $\mathbf{G}_{k-1}$; for our robot that process noise is the IMU measurement error on $\mathbf{u}$, so the single Jacobian $\mathbf{J}_G$ maps both the deterministic IMU input and its noise into the state space. (Were the noise to enter through a separate gain $\mathbf{L}$, as in the general $\mathbf{G}\mathbf{u}+\mathbf{L}\mathbf{w}$ form, the identification $\delta\mathbf{u}=\mathbf{w}$ would break and the noise would propagate through $\mathbf{L}$ instead of $\mathbf{J}_G$.)

Dimensions and the Resulting Matrix¶

Since our state $\mathbf{x}$ has $n=5$ variables and our input $\mathbf{u}$ has $p=3$ variables, $\mathbf{J}_G$ is an $n \times p$ matrix ($5 \times 3$).

$$\mathbf{J}_G = \left.\frac{\partial \mathbf{f}}{\partial \mathbf{u}}\right|_{\hat{\mathbf{x}}_{k-1}^+,\,\mathbf{u}_{k-1}} = \begin{bmatrix} \frac{1}{2}\Delta t^2 \cos\theta_{k-1} & -\frac{1}{2}\Delta t^2 \sin\theta_{k-1} & 0 \\ \frac{1}{2}\Delta t^2 \sin\theta_{k-1} & \frac{1}{2}\Delta t^2 \cos\theta_{k-1} & 0 \\ \Delta t \cos\theta_{k-1} & -\Delta t \sin\theta_{k-1} & 0 \\ \Delta t \sin\theta_{k-1} & \Delta t \cos\theta_{k-1} & 0 \\ 0 & 0 & \Delta t \end{bmatrix}$$

Reading one entry. Each entry $\partial f_i/\partial u_j$ answers: how much does input $u_j$ move state $i$ over one step? Take the $v_1$ row and the $a_1^B$ column. From the velocity equation $$v_{1,k} = v_{1,k-1} + \Delta t\,(a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1}),$$ the derivative is $\dfrac{\partial v_1}{\partial a_1^B} = \Delta t\,\cos\theta_{k-1}$.

So if the accelerometer reading $a_1^B$ carries a small error $\delta a_1^B$ (say $0.01\,\mathrm{m/s^2}$), the predicted $v_1$ inherits an error of about $\Delta t\,\cos\theta \cdot \delta a_1^B$. The $\cos\theta$ is just the projection of the body-$x$ axis onto the world-$x$ axis: the error is largest when the robot faces along world-$x$ ($\theta\approx 0$, $\cos\theta = 1$) and vanishes when it faces sideways ($\theta\approx 90^\circ$, $\cos\theta = 0$). Collecting one such sensitivity per (state, input) pair is the matrix $\mathbf{J}_G$ above. On its own, though, $\mathbf{J}_G$ describes only how a driver error spreads — to push the robot's actual sensor noise through it we still have to say how large that noise is. That is the input-noise matrix $\mathbf{Q}_u$, which we define next.

Defining the Input Noise Matrix ($\mathbf{Q}_u$)¶

Where $\mathbf{J}_G$ told us how input noise spreads into the state, the input-noise matrix $\mathbf{Q}_u$ says how large that input noise is. It represents the uncertainty of your "drivers" — the raw measurements from the IMU. We assume these sensors are uncorrelated (e.g., noise in the X-accelerometer doesn't affect the Gyroscope), which allows us to use a diagonal form:

$$\mathbf{Q}_u = \begin{bmatrix} \sigma_{a1}^2 & 0 & 0 \\ 0 & \sigma_{a2}^2 & 0 \\ 0 & 0 & \sigma_\omega^2 \end{bmatrix}$$

With both ingredients now in hand — the spread map $\mathbf{J}_G$ and the input noise $\mathbf{Q}_u$ — the next cell combines them into the process-noise contribution $\mathbf{Q}_k = \mathbf{J}_G\,\mathbf{Q}_u\,\mathbf{J}_G^\top$.

Isotropic noise¶

Be careful about what $\mathbf{J}_G$ does to the input noise: it does not merely rotate it into the world frame, it maps it into the position and velocity states. Two operations are bundled together — a body→world rotation $\mathbf{C}(\theta)$ (the $2\times2$ heading rotation, written $\mathbf{C}$ to avoid clashing with the measurement-noise $\mathbf{R}$), and a kinematic integration that spreads each world-frame acceleration into position (factor $\tfrac12\Delta t^2$) and velocity (factor $\Delta t$).

Writing the $2\times2$ accelerometer block of the input noise as $\mathbf{Q}_a = \mathrm{diag}(\sigma_{a1}^2,\sigma_{a2}^2)$, the linear-state ($p_1,p_2,v_1,v_2$) part of $\mathbf{Q}_k=\mathbf{J}_G\mathbf{Q}_u\mathbf{J}_G^\top$ is

$$ \begin{bmatrix} \tfrac14\Delta t^4\,\mathbf{C}\mathbf{Q}_a\mathbf{C}^\top & \tfrac12\Delta t^3\,\mathbf{C}\mathbf{Q}_a\mathbf{C}^\top \\ \tfrac12\Delta t^3\,\mathbf{C}\mathbf{Q}_a\mathbf{C}^\top & \Delta t^2\,\mathbf{C}\mathbf{Q}_a\mathbf{C}^\top \end{bmatrix}. $$

The $\Delta t$ integration factors are always present; the heading enters only through the rotated block $\mathbf{C}(\theta)\,\mathbf{Q}_a\,\mathbf{C}(\theta)^\top$.

There is one important exception. If the two accelerometer axes are equally noisy ($\sigma_{a1}=\sigma_{a2}=\sigma_a$), then $\mathbf{Q}_a=\sigma_a^2\mathbf{I}$ is isotropic and the rotation leaves it unchanged,

$$\mathbf{C}(\theta)\,(\sigma_a^2 \mathbf{I})\,\mathbf{C}(\theta)^\top = \sigma_a^2\,\mathbf{C}(\theta)\mathbf{C}(\theta)^\top = \sigma_a^2 \mathbf{I}.$$

The heading then disappears from $\mathbf{Q}_k$ entirely — only the constant $\Delta t$ factors remain — whereas anisotropic noise ($\sigma_{a1}\neq\sigma_{a2}$) really is reshaped by $\theta$. This is exactly what lets a practical implementation replace the full, state-dependent $\mathbf{J}_G\mathbf{Q}_u\mathbf{J}_G^\top$ with a cheaper, heading-free discrete white-noise (constant-acceleration) model for $\mathbf{Q}_k$.

The explicit constant $\mathbf{Q}$ (isotropic case). Carrying the isotropic block $\mathbf{C}\mathbf{Q}_a\mathbf{C}^\top=\sigma_a^2\mathbf{I}$ into all five states (order $p_1,p_2,v_1,v_2,\theta$), and adding the gyro contribution $\partial f_5/\partial\omega=\Delta t$ for the heading, the process-noise matrix becomes fully explicit and independent of $\theta$:

$$ \mathbf{Q} = \mathbf{J}_G\,\mathbf{Q}_u\,\mathbf{J}_G^\top = \begin{bmatrix} \tfrac14\Delta t^4\,\sigma_a^2 & 0 & \tfrac12\Delta t^3\,\sigma_a^2 & 0 & 0 \\ 0 & \tfrac14\Delta t^4\,\sigma_a^2 & 0 & \tfrac12\Delta t^3\,\sigma_a^2 & 0 \\ \tfrac12\Delta t^3\,\sigma_a^2 & 0 & \Delta t^2\,\sigma_a^2 & 0 & 0 \\ 0 & \tfrac12\Delta t^3\,\sigma_a^2 & 0 & \Delta t^2\,\sigma_a^2 & 0 \\ 0 & 0 & 0 & 0 & \Delta t^2\,\sigma_\omega^2 \end{bmatrix}. $$

Each position–velocity axis carries the classic discrete white-noise–acceleration block $\begin{bmatrix}\tfrac14\Delta t^4 & \tfrac12\Delta t^3 \\ \tfrac12\Delta t^3 & \Delta t^2\end{bmatrix}\sigma_a^2$; the two axes are decoupled (the off-diagonal $\mathbf{C}\mathbf{Q}_a\mathbf{C}^\top$ terms vanish under isotropy), and the heading gets its own variance $\Delta t^2\,\sigma_\omega^2$ from the gyro. Because it no longer depends on the step, we drop the time index and write it simply $\mathbf{Q}$ — the cheap, heading-free matrix a practical filter can hard-code instead of recomputing $\mathbf{J}_G\mathbf{Q}_u\mathbf{J}_G^\top$ every step.

Input Noise Values¶

Values from Datasheet

Sensor manufacturers usually specify noise in one of two ways:

Standard Deviation ($\sigma$): Often listed as "Noise RMS." You must square this value to get the variance ($\sigma^2$) for the matrix.
Noise Density: Listed in units like $\text{mg}/\sqrt{\text{Hz}}$ (for accelerometers) or $^\circ/\text{s}/\sqrt{\text{Hz}}$ (for gyros). To convert this to a standard deviation, use the formula: $$\sigma = \text{Noise Density} \cdot \sqrt{\text{Sampling Bandwidth}}$$

Accounting for Vibration and "Real-World" Noise

The datasheet provides the "laboratory" noise. In a real robot, motors and surface roughness create significant mechanical vibrations.

The Multiplier Rule: It is common practice to start with the datasheet values and multiply them by a factor of 2 to 10 to account for these environmental factors.
Static Test: A professional way to determine these values is to let the robot sit perfectly still for 10 seconds, record the IMU data, and calculate the variance of the signal: variance = numpy.var(sensor_data_while_still)

The EKF Covariance Prediction Formula¶

Now that we have both Jacobians, we can propagate the uncertainty (the Covariance Matrix $\mathbf{P}$) correctly. The new predicted uncertainty is the sum of "old uncertainty moved forward" and "new noise from the sensors":

$$\mathbf{P}_k^- = \mathbf{J}_F \mathbf{P}_{k-1}^+ \mathbf{J}_F^\top + \mathbf{J}_G \mathbf{Q}_u \mathbf{J}_G^\top$$

Where:

$\mathbf{P}_{k-1}^+$ is your previous (a posteriori) uncertainty.
$\mathbf{Q}_u$ is the raw noise matrix from your IMU datasheet (diagonal matrix of $\sigma^2_{a1}, \sigma^2_{a2}, \sigma^2_{\omega}$).
$\mathbf{J}_G \mathbf{Q}_u \mathbf{J}_G^\top$ is the Process Noise Matrix $\mathbf{Q}_k$ that we now define—the input noise is now projected into the world frame through the state-dependent $\mathbf{J}_G$!

This $\mathbf{Q}_k = \mathbf{J}_G \mathbf{Q}_u \mathbf{J}_G^\top$ is the multi-dimensional generalisation of the 1D result $\mathbf{Q} = \mathbf{G}\,\sigma_a^2\,\mathbf{G}^\top$ derived in the process-noise notebook: the scalar input variance $\sigma_a^2$ becomes the input-noise covariance $\mathbf{Q}_u$, and the kinematic lever arm $\mathbf{G}$ becomes the control Jacobian $\mathbf{J}_G$.

A constant process-noise matrix is often used in practice. Rather than recomputing $\mathbf{J}_G\mathbf{Q}_u\mathbf{J}_G^\top$ from the current heading at every step, many implementations simply hard-code the heading-free, isotropic process noise derived above. Since it no longer changes from step to step, we drop the time index and call it simply $\mathbf{Q}$ (a fixed matrix of $\Delta t$, $\sigma_a^2$ and $\sigma_\omega^2$ terms). This is exact when the accelerometer noise is isotropic, and otherwise only a mild approximation — acceptable because $\mathbf{Q}$ is in any case a hand-tuned design parameter rather than a precisely known quantity. The covariance prediction then simplifies to

$$\mathbf{P}_k^- = \mathbf{J}_F \mathbf{P}_{k-1}^+ \mathbf{J}_F^\top + \mathbf{Q}.$$

Notational Summary¶

Matrix	Name	Role	Calculation
$\mathbf{J}_F$	State Jacobian	Maps how the current state ($\mathbf{x}_{k-1}$) affects the future state.	$\frac{\partial \mathbf{f}}{\partial \mathbf{x}}$
$\mathbf{J}_G$	Control Jacobian	Maps how IMU inputs ($\mathbf{u}_{k-1}$) affect the future state.	$\frac{\partial \mathbf{f}}{\partial \mathbf{u}}$
$\mathbf{Q}_u$	Input Noise	The raw variance of the IMU sensors (from datasheet).	$\text{diag}(\sigma^2_{a1}, \sigma^2_{a2}, \sigma^2_\omega)$
$\mathbf{Q}_k$	Process Noise	The projected uncertainty added during the movement.	$\mathbf{J}_G \mathbf{Q}_u \mathbf{J}_G^\top$

The Measurement Jacobian ($\mathbf{J}_H$)¶

Just as $\mathbf{J}_F$ and $\mathbf{J}_G$ handle the nonlinearity of the robot's motion, the Measurement Jacobian ($\mathbf{J}_H$) handles the relationship between your state and your sensors.

In an EKF, we define a measurement function $\mathbf{h}(\mathbf{x})$ that predicts what the sensor should see based on our current estimate. Because this function is often nonlinear (like the Pythagorean theorem for distance or trigonometry for a compass), the EKF uses a Jacobian to linearize the relationship at the robot's current location.

Mathematically, $\mathbf{J}_H$ is the matrix of partial derivatives of the measurement function with respect to each state variable:

$$\mathbf{J}_H = \frac{\partial \mathbf{h}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial h_1}{\partial x_1} & \frac{\partial h_1}{\partial x_2} & \cdots & \frac{\partial h_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial h_m}{\partial x_1} & \frac{\partial h_m}{\partial x_2} & \cdots & \frac{\partial h_m}{\partial x_n} \end{bmatrix}$$

The role of $\mathbf{J}_H$: bridging state and measurement space¶

Like the other Jacobians, $\mathbf{J}_H$ is a bridge between two spaces — here between the state space (where the filter keeps its estimate and its uncertainty $\mathbf{P}$) and the measurement space (where the sensor lives). The prediction of the reading itself is done by $\mathbf{h}(\cdot)$; $\mathbf{J}_H = \partial\mathbf{h}/\partial\mathbf{x}$ is what carries uncertainty across that boundary, and it does so in both directions during the correction step:

Push uncertainty into measurement space. Before trusting a sensor reading, the filter asks "how uncertain is my prediction of that reading?" It maps the state covariance $\mathbf{P}_k^-$ through $\mathbf{J}_H$ and adds the sensor noise $\mathbf{R}_k$ to form the innovation covariance $$\mathbf{S}_k = \mathbf{J}_H\,\mathbf{P}_k^-\,\mathbf{J}_H^\top + \mathbf{R}_k.$$ This is the same congruence transform $\mathbf{J}\,\mathbf{P}\,\mathbf{J}^\top$ used by $\mathbf{J}_F$ and $\mathbf{J}_G$, but now mapping state $\rightarrow$ measurement.
Pull the correction back into state space. The Kalman gain $$\mathbf{K}_k = \mathbf{P}_k^-\,\mathbf{J}_H^\top\,\mathbf{S}_k^{-1}$$ uses $\mathbf{J}_H^\top$ to turn a mismatch expressed in sensor units — the innovation $\mathbf{y}_k = \mathbf{z}_k - \mathbf{h}(\hat{\mathbf{x}}_k^-)$ — into an adjustment of the full state. In effect $\mathbf{J}_H$ encodes which state directions a sensor actually observes, and therefore which components of the estimate a given reading is allowed to correct (e.g. a position fix leaves $\theta$ untouched unless the state is correlated through $\mathbf{P}$).

So $\mathbf{J}_H$ is the state $\leftrightarrow$ measurement translator that makes the update step possible. The complete set of correction equations is collected later in The EKF Algorithm Cycle; here we focus on how to build $\mathbf{J}_H$ itself.

When is a Measurement Nonlinear?¶

A relationship is nonlinear if the sensor does not "see" the state variables directly, but instead sees a result of a trigonometric or geometric calculation:

Range Sensors (e.g., Ultra-Wideband or Radar): The sensor measures a distance $d$ to a known beacon (here taken at the origin). To predict this, the filter must calculate $\mathbf{h}(\mathbf{x}) = \sqrt{p_1^2 + p_2^2}$. The derivative of this square root is nonlinear.
Bearing-Only Sensors (e.g., Camera tracking a landmark): The sensor measures the angle $\alpha$ to an object. The prediction function $\mathbf{h}(\mathbf{x}) = \arctan2(p_2, p_1)$ is highly nonlinear.

In these cases, $\mathbf{J}_H$ must be recalculated at every step because the "slope" of a circle or an angle changes depending on where the robot is standing.

When to use the Linear Measurement Update¶

If your sensors provide direct observations of your state variables, the relationship is linear. For your 5-state system, this applies to:

GPS/MoCap: Provides $(p_1, p_2)$ directly.
Magnetometer: Provides $\theta$ directly.

In these cases, $\mathbf{J}_H$ is simply a constant "selection matrix" (a matrix of 1s and 0s) that picks the relevant variable out of the state vector. For a sensor measuring only global position, the Jacobian is constant:

$$\mathbf{J}_H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$

Since the derivative of $1 \cdot p_1$ is simply $1$, the Jacobian of a linear measurement is identical to the standard linear matrix $\mathbf{H}$.

Caution — angular measurements live on a circle.¶

For any sensor that observes the heading directly (e.g. the magnetometer, $\mathbf{h}(\mathbf{x})=\theta$), the measurement Jacobian is linear, but the innovation is not. Recall from the 2D-rotations notebook that $\theta$ is periodic. The residual $\mathbf{y}_k = \mathbf{z}_k - \mathbf{h}(\hat{\mathbf{x}}_k^-)$ must be wrapped to $(-\pi, \pi]$ (e.g. atan2(sin(y), cos(y))), otherwise a reading near $+\pi$ versus $-\pi$ produces a spurious $\approx 2\pi$ jump that the filter would misinterpret as a huge error. This wrapping is applied explicitly in the implementation in the 2d-navigation notebook.

Summary¶

We only "extend" the measurement step (calculating a fresh Jacobian at every step) if the sensor output is a complex function of the state. If the sensor measures position, velocity, or angle directly, we stick to the much simpler linear $\mathbf{J}_H$ matrix.

The EKF Algorithm Cycle¶

The EKF follows the same "Predict-Correct" loop as the standard Kalman Filter, but uses the nonlinear function $\mathbf{f}$ for the state and Jacobians for the uncertainty.

1. Prediction (Time Update)¶

We move the robot forward in time based on the IMU inputs.

State Prediction: $$\hat{\mathbf{x}}_k^- = \mathbf{f}(\hat{\mathbf{x}}_{k-1}^+, \mathbf{u}_{k-1})$$ (Use the trigonometric equations to find the new $p_1, p_2, v_1, v_2, \theta$)
Covariance Prediction: $$\mathbf{P}_k^- = \mathbf{J}_F \mathbf{P}_{k-1}^+ \mathbf{J}_F^\top + \mathbf{J}_G \mathbf{Q}_u \mathbf{J}_G^\top$$ respectivly for the constant process noise matrix $$\mathbf{P}_k^- = \mathbf{J}_F \mathbf{P}_{k-1}^+ \mathbf{J}_F^\top + \mathbf{Q}.$$

(This warps the old uncertainty and adds the rotated IMU noise)

2. Correction (Measurement Update)¶

We "pull" the prediction toward the actual sensor reading $z_k$.

Innovation (Residual): $$\mathbf{y}_k = \mathbf{z}_k - \mathbf{h}(\hat{\mathbf{x}}_k^-)$$ (The difference between what the sensor says and what we expected to see. For angular measurements, wrap $\mathbf{y}_k$ to $(-\pi,\pi]$ — see the caution above.)
Innovation Covariance:$$\mathbf{S}_k = \mathbf{J}_H \mathbf{P}_k^- \mathbf{J}_H^\top + \mathbf{R}_k$$
Kalman Gain:$$\mathbf{K}_k = \mathbf{P}_k^- \mathbf{J}_H^\top \mathbf{S}_k^{-1}$$
State Update:$$\hat{\mathbf{x}}_k^+ = \hat{\mathbf{x}}_k^- + \mathbf{K}_k \mathbf{y}_k$$
Covariance Update:$$\mathbf{P}_k^+ = (\mathbf{I} - \mathbf{K}_k \mathbf{J}_H) \mathbf{P}_k^-$$

Numerical stability — the Joseph form.¶

The compact update $\mathbf{P}_k^+ = (\mathbf{I} - \mathbf{K}_k \mathbf{J}_H)\mathbf{P}_k^-$ is correct in exact arithmetic but can lose symmetry or positive-definiteness under floating-point rounding (mirroring the rounding concerns that made $\mathbf{R}^{-1}=\mathbf{R}^\top$ attractive in the rotations notebook). The algebraically equivalent Joseph form

$$\mathbf{P}_k^+ = (\mathbf{I} - \mathbf{K}_k \mathbf{J}_H)\,\mathbf{P}_k^-\,(\mathbf{I} - \mathbf{K}_k \mathbf{J}_H)^\top + \mathbf{K}_k \mathbf{R}_k \mathbf{K}_k^\top$$

is structured as a symmetric "sandwich" plus a positive term, so it keeps $\mathbf{P}$ symmetric positive-definite over long runs. It is the preferred choice in practice and is the form used in the 2d-navigation implementation.

Why the two are equal (for the optimal gain). Write $\mathbf{P}\equiv\mathbf{P}_k^-$, $\mathbf{H}\equiv\mathbf{J}_H$, and $\mathbf{S}=\mathbf{H}\mathbf{P}\mathbf{H}^\top+\mathbf{R}_k$. Expanding the Joseph sandwich and collecting the two $\mathbf{K}(\cdots)\mathbf{K}^\top$ pieces:

$$ (\mathbf{I}-\mathbf{K}\mathbf{H})\,\mathbf{P}\,(\mathbf{I}-\mathbf{K}\mathbf{H})^\top + \mathbf{K}\mathbf{R}_k\mathbf{K}^\top = \mathbf{P} - \mathbf{P}\mathbf{H}^\top\mathbf{K}^\top - \mathbf{K}\mathbf{H}\mathbf{P} + \mathbf{K}\,\underbrace{(\mathbf{H}\mathbf{P}\mathbf{H}^\top+\mathbf{R}_k)}_{=\,\mathbf{S}}\,\mathbf{K}^\top . $$

Now insert the optimal gain $\mathbf{K}=\mathbf{P}\mathbf{H}^\top\mathbf{S}^{-1}$. It satisfies $\mathbf{K}\mathbf{S}=\mathbf{P}\mathbf{H}^\top$, so the last term becomes $\mathbf{K}\mathbf{S}\mathbf{K}^\top=\mathbf{P}\mathbf{H}^\top\mathbf{K}^\top$ and cancels the second term:

$$ = \mathbf{P} - \mathbf{P}\mathbf{H}^\top\mathbf{K}^\top - \mathbf{K}\mathbf{H}\mathbf{P} + \mathbf{P}\mathbf{H}^\top\mathbf{K}^\top = \mathbf{P} - \mathbf{K}\mathbf{H}\mathbf{P} = (\mathbf{I}-\mathbf{K}\mathbf{H})\,\mathbf{P} = \mathbf{P}_k^+ , $$

which is exactly the compact update. The cancellation uses $\mathbf{K}=\mathbf{P}\mathbf{H}^\top\mathbf{S}^{-1}$ only at the final step: the Joseph form stays valid — and symmetric, positive-semidefinite — for any gain $\mathbf{K}$, whereas the compact form is correct only for the optimal one. That is the deeper reason Joseph is the safer choice.

EKF Matrix Summary Table¶

Matrix	Dimensions	Source	Physical Meaning
$\mathbf{\hat x}$	$5 \times 1$	State Vector	Our current "best guess" for ($p_1, p_2, v_1, v_2, \theta$).
$\mathbf{P}$	$5 \times 5$	Covariance	The uncertainty "cloud" around the state estimate.
$\mathbf{u}$	$3 \times 1$	Control Input	Raw IMU data: ($a_1^B, a_2^B, \omega$).
$\mathbf{Q}_u$	$3 \times 3$	Input Noise	Variance of the IMU sensors (from datasheet).
$\mathbf{R}$	$m \times m$	Sensor Noise	Variance of the measurement sensor (GPS, MoCap, etc.).
$\mathbf{J}_F$	$5 \times 5$	State Jacobian	How the current state affects the predicted next state.
$\mathbf{J}_G$	$5 \times 3$	Control Jacobian	How IMU inputs (and their noise) affect the state.
$\mathbf{J}_H$	$m \times 5$	Meas. Jacobian	How the state maps to the sensor's observation space.
$\mathbf{K}$	$5 \times m$	Kalman Gain	The optimal weight given to the measurement correction.