Extended Kalman Filter

Introduction: From Particles to Pose

To track a robot, we must maintain linear and angular motion simultaneously.

We define the state vector $\mathbf{x}$ as a single column containing five pieces of information:

$$\mathbf{x} = \begin{bmatrix} p_1 \\ p_2 \\ v_1 \\ v_2 \\ \theta \end{bmatrix} \begin{matrix} \text{(Position X)} \\ \text{(Position Y)} \\ \text{(Velocity X)} \\ \text{(Velocity Y)} \\ \text{(Orientation)} \end{matrix}$$

We also define an input vector $\mathbf{u}$ consisting of the IMU-measurements (Inertial Measurement Unit) that "drive" the change in our state:

  • linear accelerations ($a_1^B, a_2^B$)
  • angular velocity ($\omega$).

$$ \mathbf{u} = \begin{bmatrix} a_1^B \\ a_2^B \\ \omega \end{bmatrix} $$

Note: The linear accelerations are measured in the body frame (indicated by the upper index $B$). In contrast, position and velocity are tracked in the world frame (no upper index).

The Breakdown of the Linear Model

The standard Kalman Filter (KF) assumes the next state is a linear combination of the previous state, control inputs, and noise:

$$\mathbf{x}_k = \mathbf{F}_{k-1} \mathbf{x}_{k-1} + \mathbf{G}_{k-1} \left( \mathbf{u}_{k-1} + \mathbf{w}_{k-1}\right)$$ Where:

  • $\mathbf{F}_{k-1}$: State Transition Matrix (how position/velocity evolve).
  • $\mathbf{G}_{k-1}$: Control Matrix (how the IMU inputs $u$ change the state).
  • $\mathbf{w}_{k-1}$ is the raw process noise from our sensors

However, this linear assumption breaks down for our robot. When we use body-frame measurements ($a_1^B, a_2^B, \omega$) and rotate them into the world frame ($W$), the equations become non-linear due to their dependence on the heading $\theta$.

Recalling the rotation into the world frame (no upper index):

  • $a_1 = a_1^B \cos\theta - a_2^B \sin\theta$
  • $a_2 = a_1^B \sin\theta + a_2^B \cos\theta$

The resulting state transition equations for our system (ignoring noise for a moment) are:

  • $p_{1,k} = p_{1,k-1} + v_{1,k-1}\Delta t + \frac{1}{2}\Delta t^2 (a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1})$
  • $p_{2,k} = p_{2,k-1} + v_{2,k-1}\Delta t + \frac{1}{2}\Delta t^2 (a_1^B \sin\theta_{k-1} + a_2^B \cos\theta_{k-1})$
  • $v_{1,k} = v_{1,k-1} + \Delta t (a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1})$
  • $v_{2,k} = v_{2,k-1} + \Delta t (a_1^B \sin\theta_{k-1} + a_2^B \cos\theta_{k-1})$
  • $\theta_{k} = \theta_{k-1} + \omega \Delta t$

Notice:

  • Trigonometric Coupling: The state variable $\theta$ is "trapped" inside sine and cosine functions.
  • Cross-Term Interaction: We see terms like $a_1^B \cos\theta$. Here, an input ($a_1^B$) is being multiplied by a function of a state ($\theta$). In a linear system, inputs and states must remain independent additive terms; once they multiply, the geometry of our uncertainty "warps."

This means the transition from $\mathbf{x}_{k-1}$ to $\mathbf{x}_k$ can no longer be expressed as a simple matrix multiplication.

Modeling assumptions. Two approximations are baked into these equations and are worth making explicit:

  • Constant acceleration over the step: $\mathbf{a}^B$ is treated as constant across $\Delta t$ — this is what produces the constant-acceleration term $\frac{1}{2}\Delta t^2 \mathbf{a}$.
  • Frozen heading during the step: the rotation uses $\theta_{k-1}$ for the entire interval, even though the true heading slews from $\theta_{k-1}$ to $\theta_k = \theta_{k-1} + \omega\Delta t$. This is a first-order approximation, accurate when $\omega\,\Delta t$ is small.

Both errors shrink as $\Delta t \to 0$, so a fast prediction rate keeps the discretization honest.

Moving to the Extended Kalman Filter (EKF)

Since we cannot use a simple matrix multiplication to predict the state, we define a nonlinear transition function $\mathbf{f}$:

$$\mathbf{x}_k = \mathbf{f}(\mathbf{x}_{k-1}, \mathbf{u}_{k-1}, \mathbf{w}_{k-1})$$

where with the dimension of our state space $n$:

  • $\mathbf f$ is a $n$-vector-valued function.

To keep the "Kalman logic" alive, we must linearize this function at every time step using a first-order Taylor expansion. This is the essence of the EKF.

1. The Prediction Step (Nonlinear) We calculate the new state $\hat{\mathbf{x}}_k^-$ directly using the nonlinear trigonometric equations above. This gives us our "best guess" for the robot's new pose.

2. The Uncertainty Propagation (The Jacobian) To update the error covariance $P$ (our "certainty"), we cannot use $\mathbf{F}$ or $\mathbf{G}$ because they don't exist in constant form. Instead, we calculate the Jacobian Matrix—the matrix of partial derivatives—which represents the local linear slope of the function $\mathbf{f}$ at the current state.

Why propagate the uncertainty through the linearization?

The mean is easy; the spread is the hard part. The best guess $\hat{\mathbf{x}}$ can be pushed straight through the nonlinear model: $\hat{\mathbf{x}}_k^- = \mathbf{f}(\hat{\mathbf{x}}_{k-1}^+,\mathbf{u}_{k-1})$. The difficulty is the uncertainty — $\mathbf{P}$ describes a whole cloud of plausible states, and we need what the motion does to that cloud.

The filter can only carry a Gaussian (an ellipse). A linear map sends a Gaussian to another Gaussian — it stretches, rotates and shifts the ellipse, which $\mathbf{P}$ still describes exactly. A nonlinear map bends it into a banana-shaped blob that is no longer Gaussian, and the filter has no way to store that shape. So the step that transforms the cloud must be linear.

Covariance has a simple rule only for linear maps. For $\mathbf{y}=\mathbf{A}\mathbf{x}$, $\operatorname{Cov}(\mathbf{y})=\mathbf{A}\,\operatorname{Cov}(\mathbf{x})\,\mathbf{A}^\top$ (in 1D, $\operatorname{var}(ax)=a^2\operatorname{var}(x)$). No equally clean exact rule exists for $\operatorname{Cov}(\mathbf{f}(\mathbf{x}))$.

So we replace $\mathbf{f}$ by its tangent. Near the current estimate $\hat{\mathbf{x}}$, $\mathbf{f}(\mathbf{x}) \approx \underbrace{\mathbf{f}(\hat{\mathbf{x}})}_{\text{shifts the cloud}} + \underbrace{\mathbf{J}_F(\mathbf{x}-\hat{\mathbf{x}})}_{\text{stretches/rotates it}}$. The Jacobian $\mathbf{J}_F$ is this local linear map, which makes the clean rule apply:

$$\mathbf{P}_k^- = \mathbf{J}_F\,\mathbf{P}_{k-1}^+\,\mathbf{J}_F^\top + (\text{process noise}).$$

Why it works (and when it doesn't). Zoom in on any smooth curve and it looks like its tangent line. As long as the uncertainty cloud is small compared to how sharply $\mathbf{f}$ curves, $\mathbf{J}_F$ captures almost all of the reshaping. If the spread is large or $\mathbf{f}$ bends sharply, the linearization becomes inaccurate — the EKF's main weakness. Several filters address this:

  • The Unscented Kalman Filter and Particle Filters sidestep the explicit Jacobian altogether.
  • The Error-State (indirect) Kalman Filter instead keeps the linearization accurate by tracking only the small error between the true state and a separately-integrated nominal trajectory: because that error stays near zero, its first-order approximation is essentially exact. The error-state form also naturally handles state spaces that are not flat vector spaces — notably 3D orientations, which live on a rotation manifold ($SO(3)$) rather than in $\mathbb{R}^n$ — and is the approach we use later for the 3D case.

The State Transition Jacobian (Process Jacobian) $\mathbf{J}_F$

The State Transition Jacobian, which we will call $\mathbf{J}_F$, is a matrix of partial derivatives that represents the "linear slope" of the movement $\mathbf f$ at the current state variables. For example, it tells the filter: Given our current heading and acceleration, how much will a $1^\circ$ error in $\theta$ bleed into our position and velocity estimates?

The Jacobian matrix, denoted in general as $\mathbf{J}$, is a matrix of all first-order partial derivatives of a vector function. In our EKF context, we use the symbol $\mathbf{J}_F$ for the Jacobian of the nonlinear prediction function $f(\mathbf{x}_{k-1}, \mathbf{u}_{k-1})$. This Jacobian tells us how a tiny change in any part of the state ($\mathbf{x}$) affects the next state.

The State Transition Jacobian $\mathbf{J}_F$ ($n \times n$-matrix) is defined as:

$$\mathbf{J}_F = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \dots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}$$

  • Rows ($n$): Each row represents one output of your $n$-vector-valued function $\mathbf{f}$.
  • Columns ($n$): Each column represents the derivative with respect to one of the $n$ variables in your state vector $\mathbf{x}$.

Because you are mapping the "change in the state" to the "next state," the input and output counts are identical, making the matrix perfectly square.

Calculating the Jacobian for your 5-State System

Definition of the Jacobian for our 5-state system:

$$\mathbf{J}_F = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial p_1} & \frac{\partial f_1}{\partial p_2} & \dots & \frac{\partial f_1}{\partial \theta} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_5}{\partial p_1} & \frac{\partial f_5}{\partial p_2} & \dots & \frac{\partial f_5}{\partial \theta} \end{bmatrix}$$

Take your first equation: $f_1 = p_{1,k-1} + v_{1,k-1} \Delta t + \frac{1}{2}\Delta t^2 (a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1})$.

  • Differentiating $f_1$ with respect to $p_{1,k-1}$, we get 1.
  • Differentiating $f_1$ with respect to $v_{1,k-1}$, we get $\Delta t$.
  • Differentiating $f_1$ with respect to $\theta_{k-1}$, we use the rules $(\cos\theta)' = -\sin\theta$ and $(\sin\theta)' = \cos\theta$: $$\frac{\partial f_1}{\partial \theta_{k-1}} = \frac{1}{2}\Delta t^2 (-a_1^B \sin\theta_{k-1} - a_2^B \cos\theta_{k-1})$$

This entry in the Jacobian tells the EKF how much the X-position ($p_1$) will change if the angle ($\theta$) is slightly different than we thought.

The Resulting Jacobian Matrix $\mathbf{J}_F$

For our system, the Jacobian — evaluated at the current estimate $\hat{\mathbf{x}}_{k-1}^+$ — looks like this:

$$\mathbf{J}_F = \left.\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}_{k-1}^+,\,\mathbf{u}_{k-1}} = \begin{bmatrix} 1 & 0 & \Delta t & 0 & \frac{1}{2}\Delta t^2 (-a_1^B \sin\theta_{k-1} - a_2^B \cos\theta_{k-1}) \\ 0 & 1 & 0 & \Delta t & \frac{1}{2}\Delta t^2 (a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1}) \\ 0 & 0 & 1 & 0 & \Delta t (-a_1^B \sin\theta_{k-1} - a_2^B \cos\theta_{k-1}) \\ 0 & 0 & 0 & 1 & \Delta t (a_1^B \cos\theta_{k-1} - a_2^B \sin\theta_{k-1}) \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

(the entries are the symbolic derivatives from above, with $\theta_{k-1}$ and $a^B$ taken at their current estimated/measured values.)

Interpreting $\mathbf{J}_F$ — how heading error leaks into position and velocity. Apart from the trivial position–velocity link ($\partial p/\partial v = \Delta t$), the only coupling in $\mathbf{J}_F$ lives in the fifth column $\partial \mathbf{f}/\partial\theta$. Its non-zero entries (rows 1–4) say that an error or uncertainty in the heading $\theta$ does not stay confined to $\theta$ — it is converted, through the body acceleration, into errors in velocity ($v_1, v_2$) and position ($p_1, p_2$). Intuitively: if the robot is accelerating and our estimate of $\theta$ is slightly off, we rotate $\mathbf{a}^B$ into the wrong world-frame direction, so the predicted velocity (scaled by $\Delta t$) and position (scaled by $\tfrac{1}{2}\Delta t^2$) inherit that mistake. Two consequences:

  • The leak is proportional to the acceleration. If $a_1^B = a_2^B = 0$, the entire $\theta$-column (rows 1–4) vanishes and heading uncertainty cannot contaminate $p$ or $v$ during this step. The faster the robot accelerates, the more a wrong heading corrupts the trajectory.
  • This column is the mechanism behind the "warping" uncertainty cloud. When the covariance is propagated as $\mathbf{P}_k^- = \mathbf{J}_F \mathbf{P}_{k-1}^+ \mathbf{J}_F^\top + \dots$, it is precisely these $\partial/\partial\theta$ terms that transfer variance from $\theta$ into the position and velocity states whenever the heading is uncertain.

Unlike the linear $\mathbf{F}$, which is constant, the Jacobian $\mathbf{J}_F$ depends on:

  • The current estimated orientation $\theta$ (part of $\mathbf x $).
  • The current measured accelerations $a_1^B, a_2^B$ (part of $\mathbf u $).

Summary for the Jacobian $\mathbf{J}_F$

The Jacobian $\mathbf{J}_F$ replaces the constant transition matrix $\mathbf{F}$. Unlike $\mathbf{F}$, which is the same every time, $\mathbf{J}_F$ is recalculated at every step using the current estimate $\hat\theta_{k-1}^+$ and the current accelerations $\mathbf{a}_{k-1}^B$.

It serves as the "linear bridge" that allows the Kalman Filter to propagate uncertainty through the nonlinear turns of the robot.

The Control Jacobian $\mathbf{J}_G$

While the State Jacobian $\mathbf{J}_F$ tells us how uncertainty already in the state propagates forward, every prediction step also injects new uncertainty — because the IMU drivers $\mathbf{u} = [a_1^B, a_2^B, \omega]^\top$ that we feed into the motion model $\mathbf{f}$ are themselves noisy.

The difficulty is that this noise is described in the wrong space. The IMU gives us a few per-sensor error figures — uncertainty in input space (3 numbers: two accelerometers and a gyro). What we actually need is the resulting uncertainty in state space (5 numbers: $p_1, p_2, v_1, v_2, \theta$). So we need a bridge that carries a wobble in the drivers over into a wobble in the state.

That bridge is the Control Jacobian: $$\mathbf{J}_G = \frac{\partial \mathbf{f}}{\partial \mathbf{u}}.$$

It is the linear map that transforms a small perturbation of the drivers into the corresponding perturbation of the state: $$\delta\mathbf{x} \approx \mathbf{J}_G\,\delta\mathbf{u}.$$ A noisy IMU reading $\delta\mathbf{u}$ is rotated and scaled by $\mathbf{J}_G$ into the state error $\delta\mathbf{x}$ it produces over one step. When that perturbation is specifically the random IMU measurement error, it is the process noise $\mathbf{w}$ of the linear model in the callout below ($\delta\mathbf{u} = \mathbf{w}$), a zero-mean random vector whose covariance is the $\mathbf{Q}_u$ introduced next ($\mathbf{Q}_u = \operatorname{Cov}(\mathbf{w})$). So $\mathbf{J}_G$ captures how the driver noise spreads into the state; how large that noise is in the first place is the separate quantity $\mathbf{Q}_u$ — and the two are combined in the covariance-prediction step.

A note on the symbol. We write the subscript $G$ because, in the generic linear model $\mathbf{x}_k = \mathbf{F}_{k-1}\mathbf{x}_{k-1} + \mathbf{G}_{k-1}\left(\mathbf{u}_{k-1} + \mathbf{w}_{k-1}\right)$, this Jacobian is what plays the role of the control matrix $\mathbf{G}$ — just as $\mathbf{J}_F$ takes the role of $\mathbf{F}$ and (below) $\mathbf{J}_H$ takes the role of $\mathbf{H}$. Each $\mathbf{J}_\bullet$ is the EKF Jacobian that, in the linear case, collapses back to the standard matrix named in its subscript. Note that here the input $\mathbf{u}_{k-1}$ and its noise $\mathbf{w}_{k-1}$ enter through the same matrix $\mathbf{G}_{k-1}$; for our robot that process noise is the IMU measurement error on $\mathbf{u}$, so the single Jacobian $\mathbf{J}_G$ maps both the deterministic IMU input and its noise into the state space. (Were the noise to enter through a separate gain $\mathbf{L}$, as in the general $\mathbf{G}\mathbf{u}+\mathbf{L}\mathbf{w}$ form, the identification $\delta\mathbf{u}=\mathbf{w}$ would break and the noise would propagate through $\mathbf{L}$ instead of $\mathbf{J}_G$.)

Dimensions and the Resulting Matrix

Since our state $\mathbf{x}$ has $n=5$ variables and our input $\mathbf{u}$ has $p=3$ variables, $\mathbf{J}_G$ is an $n \times p$ matrix ($5 \times 3$).

$$\mathbf{J}_G = \begin{bmatrix} \frac{1}{2}\Delta t^2 \cos\theta & -\frac{1}{2}\Delta t^2 \sin\theta & 0 \\ \frac{1}{2}\Delta t^2 \sin\theta & \frac{1}{2}\Delta t^2 \cos\theta & 0 \\ \Delta t \cos\theta & -\Delta t \sin\theta & 0 \\ \Delta t \sin\theta & \Delta t \cos\theta & 0 \\ 0 & 0 & \Delta t \end{bmatrix}$$

Reading one entry. Each entry $\partial f_i/\partial u_j$ answers: how much does input $u_j$ move state $i$ over one step? Take the $v_1$ row and the $a_1^B$ column. From the velocity equation $$v_{1,k} = v_{1,k-1} + \Delta t\,(a_1^B \cos\theta - a_2^B \sin\theta),$$ the derivative is $\dfrac{\partial v_1}{\partial a_1^B} = \Delta t\,\cos\theta$.

So if the accelerometer reading $a_1^B$ carries a small error $\delta a_1^B$ (say $0.01\,\mathrm{m/s^2}$), the predicted $v_1$ inherits an error of about $\Delta t\,\cos\theta \cdot \delta a_1^B$. The $\cos\theta$ is just the projection of the body-$x$ axis onto the world-$x$ axis: the error is largest when the robot faces along world-$x$ ($\theta\approx 0$, $\cos\theta = 1$) and vanishes when it faces sideways ($\theta\approx 90^\circ$, $\cos\theta = 0$). Collecting one such sensitivity per (state, input) pair is the matrix $\mathbf{J}_G$ above. On its own, though, $\mathbf{J}_G$ describes only how a driver error spreads — to push the robot's actual sensor noise through it we still have to say how large that noise is. That is the input-noise matrix $\mathbf{Q}_u$, which we define next.

Defining the Input Noise Matrix ($\mathbf{Q}_u$)

Where $\mathbf{J}_G$ told us how input noise spreads into the state, the input-noise matrix $\mathbf{Q}_u$ says how large that input noise is. It represents the uncertainty of your "drivers" — the raw measurements from the IMU. We assume these sensors are uncorrelated (e.g., noise in the X-accelerometer doesn't affect the Gyroscope), which allows us to use a diagonal form:

$$\mathbf{Q}_u = \begin{bmatrix} \sigma_{a1}^2 & 0 & 0 \\ 0 & \sigma_{a2}^2 & 0 \\ 0 & 0 & \sigma_\omega^2 \end{bmatrix}$$

With both ingredients now in hand — the spread map $\mathbf{J}_G$ and the input noise $\mathbf{Q}_u$ — the next cell combines them into the process-noise contribution $\mathbf{Q}_k = \mathbf{J}_G\,\mathbf{Q}_u\,\mathbf{J}_G^\top$.

Aside: isotropic noise. Propagating $\mathbf{Q}_u$ through $\mathbf{J}_G$ rotates the accelerometer noise into the world frame, so in general the world-frame noise depends on the heading $\theta$. There is one important exception: if the two accelerometer axes are equally noisy ($\sigma_{a1}=\sigma_{a2}=\sigma_a$), that block is isotropic and rotation leaves it unchanged, $$\mathbf{R}(\theta)\,(\sigma_a^2 \mathbf{I})\,\mathbf{R}(\theta)^\top = \sigma_a^2\,\mathbf{R}(\theta)\mathbf{R}(\theta)^\top = \sigma_a^2 \mathbf{I}.$$ So for isotropic accelerometer noise the heading drops out of that term — only anisotropic ($\sigma_{a1}\neq\sigma_{a2}$) noise actually gets reshaped by $\theta$. This is exactly what lets a practical implementation replace the full $\mathbf{J}_G\mathbf{Q}_u\mathbf{J}_G^\top$ with a cheaper, rotation-free discrete white-noise model.

Finding Values in the Datasheet

Sensor manufacturers usually specify noise in one of two ways:

  • Standard Deviation ($\sigma$): Often listed as "Noise RMS." You must square this value to get the variance ($\sigma^2$) for the matrix.
  • Noise Density: Listed in units like $\text{mg}/\sqrt{\text{Hz}}$ (for accelerometers) or $^\circ/\text{s}/\sqrt{\text{Hz}}$ (for gyros). To convert this to a standard deviation, use the formula: $$\sigma = \text{Noise Density} \cdot \sqrt{\text{Sampling Bandwidth}}$$

Accounting for Vibration and "Real-World" Noise

The datasheet provides the "laboratory" noise. In a real robot, motors and surface roughness create significant mechanical vibrations.

  • The Multiplier Rule: It is common practice to start with the datasheet values and multiply them by a factor of 2 to 10 to account for these environmental factors.
  • Static Test: A professional way to determine these values is to let the robot sit perfectly still for 10 seconds, record the IMU data, and calculate the variance of the signal: variance = numpy.var(sensor_data_while_still)

The EKF Covariance Prediction Formula

Now that we have both Jacobians, we can propagate the uncertainty (the Covariance Matrix $\mathbf{P}$) correctly. The new predicted uncertainty is the sum of "old uncertainty moved forward" and "new noise from the sensors":

$$\mathbf{P}_k^- = \mathbf{J}_F \mathbf{P}_{k-1}^+ \mathbf{J}_F^\top + \mathbf{J}_G \mathbf{Q}_u \mathbf{J}_G^\top$$

Where:

  • $\mathbf{P}_{k-1}^+$ is your previous (a posteriori) uncertainty.
  • $\mathbf{Q}_u$ is the raw noise matrix from your IMU datasheet (diagonal matrix of $\sigma^2_{a1}, \sigma^2_{a2}, \sigma^2_{\omega}$).
  • $\mathbf{J}_G \mathbf{Q}_u \mathbf{J}_G^\top$ is the Process Noise Matrix $\mathbf{Q}_k$ that we now define—it is now non-linearly projected into the world frame!

This $\mathbf{Q}_k = \mathbf{J}_G \mathbf{Q}_u \mathbf{J}_G^\top$ is the multi-dimensional generalisation of the 1D result $\mathbf{Q} = \mathbf{G}\,\sigma_a^2\,\mathbf{G}^\top$ derived in the process-noise notebook: the scalar input variance $\sigma_a^2$ becomes the input-noise covariance $\mathbf{Q}_u$, and the kinematic lever arm $\mathbf{G}$ becomes the control Jacobian $\mathbf{J}_G$.

Notational Summary

Matrix Name Role Calculation
$\mathbf{J}_F$ State Jacobian Maps how the current state ($\mathbf{x}_{k-1}$) affects the future state. $\frac{\partial \mathbf{f}}{\partial \mathbf{x}}$
$\mathbf{J}_G$ Control Jacobian Maps how IMU inputs ($\mathbf{u}_{k-1}$) affect the future state. $\frac{\partial \mathbf{f}}{\partial \mathbf{u}}$
$\mathbf{Q}_u$ Input Noise The raw variance of the IMU sensors (from datasheet). $\text{diag}(\sigma^2_{a1}, \sigma^2_{a2}, \sigma^2_\omega)$
$\mathbf{Q}_k$ Process Noise The projected uncertainty added during the movement. $\mathbf{J}_G \mathbf{Q}_u \mathbf{J}_G^\top$

The Measurement Jacobian ($\mathbf{J}_H$)

Just as $\mathbf{J}_F$ and $\mathbf{J}_G$ handle the nonlinearity of the robot's motion, the Measurement Jacobian ($\mathbf{J}_H$) handles the relationship between your state and your sensors.

In an EKF, we define a measurement function $\mathbf{h}(\mathbf{x})$ that predicts what the sensor should see based on our current estimate. Because this function is often nonlinear (like the Pythagorean theorem for distance or trigonometry for a compass), the EKF uses a Jacobian to linearize the relationship at the robot's current location.

Mathematically, $\mathbf{J}_H$ is the matrix of partial derivatives of the measurement function with respect to each state variable:$$\mathbf{J}_H = \frac{\partial \mathbf{h}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial h_1}{\partial x_1} & \frac{\partial h_1}{\partial x_2} & \cdots & \frac{\partial h_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial h_m}{\partial x_1} & \frac{\partial h_m}{\partial x_2} & \cdots & \frac{\partial h_m}{\partial x_n} \end{bmatrix}$$

When is a Measurement Nonlinear?

A relationship is nonlinear if the sensor does not "see" the state variables directly, but instead sees a result of a trigonometric or geometric calculation:

  • Range Sensors (e.g., Ultra-Wideband or Radar): The sensor measures a distance $d$ to a known beacon (here taken at the origin). To predict this, the filter must calculate $\mathbf{h}(\mathbf{x}) = \sqrt{p_1^2 + p_2^2}$. The derivative of this square root is nonlinear.
  • Bearing-Only Sensors (e.g., Camera tracking a landmark): The sensor measures the angle $\alpha$ to an object. The prediction function $\mathbf{h}(\mathbf{x}) = \arctan2(p_2, p_1)$ is highly nonlinear.

In these cases, $\mathbf{J}_H$ must be recalculated at every step because the "slope" of a circle or an angle changes depending on where the robot is standing.

When to use the Linear Measurement Update

If your sensors provide direct observations of your state variables, the relationship is linear. For your 5-state system, this applies to:

  • GPS/MoCap: Provides $(p_1, p_2)$ directly.
  • Magnetometer: Provides $\theta$ directly.

In these cases, $\mathbf{J}_H$ is simply a constant "selection matrix" (a matrix of 1s and 0s) that picks the relevant variable out of the state vector. For a sensor measuring only global position, the Jacobian is constant:

$$\mathbf{J}_H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$

Since the derivative of $1 \cdot p_1$ is simply $1$, the Jacobian of a linear measurement is identical to the standard linear matrix $\mathbf{H}$.

Caution — angular measurements live on a circle. For any sensor that observes the heading directly (e.g. the magnetometer, $\mathbf{h}(\mathbf{x})=\theta$), the measurement Jacobian is linear, but the innovation is not. Recall from the 2D-rotations notebook that $\theta$ is periodic. The residual $\mathbf{y}_k = \mathbf{z}_k - \mathbf{h}(\hat{\mathbf{x}}_k^-)$ must be wrapped to $(-\pi, \pi]$ (e.g. atan2(sin(y), cos(y))), otherwise a reading near $+\pi$ versus $-\pi$ produces a spurious $\approx 2\pi$ jump that the filter would misinterpret as a huge error. This wrapping is applied explicitly in the implementation in the 2d-navigation notebook.

Summary

We only "extend" the measurement step (calculating a fresh Jacobian at every step) if the sensor output is a complex function of the state. If the sensor measures position, velocity, or angle directly, we stick to the much simpler linear $\mathbf{J}_H$ matrix.

The EKF Algorithm Cycle

The EKF follows the same "Predict-Correct" loop as the standard Kalman Filter, but uses the nonlinear function $\mathbf{f}$ for the state and Jacobians for the uncertainty.

1. Prediction (Time Update)

We move the robot forward in time based on the IMU inputs.

  • State Prediction: $$\hat{\mathbf{x}}_k^- = \mathbf{f}(\hat{\mathbf{x}}_{k-1}^+, \mathbf{u}_{k-1})$$ (Use the trigonometric equations to find the new $p_1, p_2, v_1, v_2, \theta$)
  • Covariance Prediction: $$\mathbf{P}_k^- = \mathbf{J}_F \mathbf{P}_{k-1}^+ \mathbf{J}_F^\top + \mathbf{J}_G \mathbf{Q}_u \mathbf{J}_G^\top$$ (This warps the old uncertainty and adds the rotated IMU noise)

2. Correction (Measurement Update)

We "pull" the prediction toward the actual sensor reading $z_k$.

  • Innovation (Residual): $$\mathbf{y}_k = \mathbf{z}_k - \mathbf{h}(\hat{\mathbf{x}}_k^-)$$ (The difference between what the sensor says and what we expected to see. For angular measurements, wrap $\mathbf{y}_k$ to $(-\pi,\pi]$ — see the caution above.)
  • Innovation Covariance:$$\mathbf{S}_k = \mathbf{J}_H \mathbf{P}_k^- \mathbf{J}_H^\top + \mathbf{R}_k$$
  • Kalman Gain:$$\mathbf{K}_k = \mathbf{P}_k^- \mathbf{J}_H^\top \mathbf{S}_k^{-1}$$
  • State Update:$$\hat{\mathbf{x}}_k^+ = \hat{\mathbf{x}}_k^- + \mathbf{K}_k \mathbf{y}_k$$
  • Covariance Update:$$\mathbf{P}_k^+ = (\mathbf{I} - \mathbf{K}_k \mathbf{J}_H) \mathbf{P}_k^-$$

Numerical stability — the Joseph form. The compact update $\mathbf{P}_k^+ = (\mathbf{I} - \mathbf{K}_k \mathbf{J}_H)\mathbf{P}_k^-$ is correct in exact arithmetic but can lose symmetry or positive-definiteness under floating-point rounding (mirroring the rounding concerns that made $\mathbf{R}^{-1}=\mathbf{R}^\top$ attractive in the rotations notebook). The algebraically equivalent Joseph form $$\mathbf{P}_k^+ = (\mathbf{I} - \mathbf{K}_k \mathbf{J}_H)\,\mathbf{P}_k^-\,(\mathbf{I} - \mathbf{K}_k \mathbf{J}_H)^\top + \mathbf{K}_k \mathbf{R}_k \mathbf{K}_k^\top$$ is structured as a symmetric "sandwich" plus a positive term, so it keeps $\mathbf{P}$ symmetric positive-definite over long runs. It is the preferred choice in practice and is the form used in the 2d-navigation implementation.

EKF Matrix Summary Table

Matrix Dimensions Source Physical Meaning
$\mathbf{\hat x}$ $5 \times 1$ State Vector Our current "best guess" for ($p_1, p_2, v_1, v_2, \theta$).
$\mathbf{P}$ $5 \times 5$ Covariance The uncertainty "cloud" around the state estimate.
$\mathbf{u}$ $3 \times 1$ Control Input Raw IMU data: ($a_1^B, a_2^B, \omega$).
$\mathbf{Q}_u$ $3 \times 3$ Input Noise Variance of the IMU sensors (from datasheet).
$\mathbf{R}$ $m \times m$ Sensor Noise Variance of the measurement sensor (GPS, MoCap, etc.).
$\mathbf{J}_F$ $5 \times 5$ State Jacobian How the current state affects the predicted next state.
$\mathbf{J}_G$ $5 \times 3$ Control Jacobian How IMU inputs (and their noise) affect the state.
$\mathbf{J}_H$ $m \times 5$ Meas. Jacobian How the state maps to the sensor's observation space.
$\mathbf{K}$ $5 \times m$ Kalman Gain The optimal weight given to the measurement correction.