The Kalman Filter (KF) is not a fundamentally different concept; it is a specific, computationally efficient implementation of the general Bayes Filter.
The reason the Kalman Filter is tractable and widely used is that it imposes two powerful, simplifying constraints on the system:

1. The state transition and measurement models are linear.
2. All noise terms and beliefs are Gaussian.

If these two conditions hold, the 1D system remains entirely defined by just two parameters at every step: the mean ($\mu$) and the variance ($\sigma^2$). This means the entire probability distribution can be tracked simply by updating these two values, replacing the complex calculations (like convolution) required for general discrete distributions.
Everything you've learned—the Markov Assumption, the recursive Prediction/Update cycle, and the use of Bayes' Rule—is the direct probabilistic foundation upon which the 1D-Kalman Filter is built.
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
from ipynb.fs.defs.plot_Kalman_Filter_1D import plot_prediction_results, plot_update_step, plot_step
In the continuous domain, the Kalman Filter (KF) is an instance of the Bayes Filter where the probability distribution is a Probability Density Function (PDF). The critical assumption that simplifies the KF is the Linear Gaussian model.
We use superscripts to indicate whether a value includes the most recent sensor measurement: the minus sign ($\hat{x}_t^-$) marks the predicted estimate *before* the measurement $z_t$ is incorporated, and the plus sign ($\hat{x}_t^+$) marks the corrected estimate *after* it.
Note the hat in the notation, which indicates that these are estimates.
The true state transition (the actual physics) is expressed as a constant linear model:
$$x_t = F x_{t-1} + G (u_{t-1} + w)$$
where $F$ is the state transition constant, $G$ is the control constant, $u_{t-1}$ is the control input, and $w$ is the process noise.

For our example, $F=1$, $G=1$, and $u_{t-1} = \Delta x_{t-1}$:
$$x_t = x_{t-1} + \Delta x_{t-1} + w$$
where $w \sim \mathcal{N}(0, \sigma^2_w)$ is the process noise.

Note: We do not use a hat ($\hat{x}$) in the equations above because $x_t$ represents the true (physical) state. Since we cannot know $x_t$ perfectly, we use the Kalman Filter to calculate $\hat{x}_t$, our best possible guess.
We estimate the robot's position, $x_t$, with a Continuous Random Variable described by a Gaussian distribution $\mathcal{N}(\hat x_t, \sigma^2)$.
The state transition probability density function is:
$$p(x_t \mid x_{t-1}) = \mathcal{N}(x_t \mid x_{t-1} + \Delta x_{t-1}, \sigma^2_w)$$
This equation states that the probability of being at $x_t$, given the starting point $x_{t-1}$ and the planned movement $\Delta x_{t-1}$, is described by a Gaussian centered at the expected new position ($x_{t-1} + \Delta x_{t-1}$), with a variance equal to the motion noise $\sigma^2_w$.
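As a sanity check, the transition model can be simulated by sampling: drawing many realisations of $x_t = x_{t-1} + \Delta x_{t-1} + w$ should reproduce the stated mean and variance. The concrete numbers below are illustrative (they match the prediction demo further down).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative values (the same ones used in the prediction demo below)
x_prev, delta_x, sigma_w = -2.0, 2.5, 1.9

# Draw many realisations of the noisy transition x_t = x_{t-1} + dx_{t-1} + w
samples = x_prev + delta_x + rng.normal(0.0, sigma_w, size=100_000)

print(samples.mean())  # close to x_prev + delta_x = 0.5
print(samples.std())   # close to sigma_w = 1.9
```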
The initial estimated position (the prior) is $p(x_{t-1}^+) = \mathcal{N}(\hat{x}_{t-1}^+, \sigma^2_{t-1})$. The movement, $\Delta x_{t-1}$, shifts the position and adds noise $\mathcal{N}(0, \sigma^2_w)$.
To find the new predicted distribution $p(x_t^-)$, we combine the two Gaussians via convolution (marginalization) using the previous posterior distribution $p^+(x_{t-1})$:
$$p(x_t^-) = \int p(x_t \mid x_{t-1}) \cdot p^+(x_{t-1}) \, dx_{t-1}$$
A Note on Density Notation:
To be precise in our integration, we denote the posterior PDF as $p^+(x_{t-1})$. Note that:$$p^+(x_{t-1}) \equiv p(x_{t-1}^+)$$Both notations indicate the probability of the state $x_{t-1}$ given all information up to the previous update step. The version on the left ($p^+$) is used in the integral to clearly show that $x_{t-1}$ is the variable being "swept" across the state space.
The remarkable simplification provided by the Gaussian assumption is that the convolution of two Gaussians is always a new Gaussian (see e.g. [Thr06][Bro14]).
Given the previous posterior $\mathcal{N}(\hat{x}_{t-1}^+, \sigma^2_{t-1})$ and the process noise $\mathcal{N}(0, \sigma^2_w)$, the prediction (prior) $\mathcal{N}(\hat{x}_t^-, \sigma_t^{2-})$ is defined by:

$$\hat{x}_t^- = \hat{x}_{t-1}^+ + \Delta x_{t-1}, \qquad \sigma_t^{2-} = \sigma^2_{t-1} + \sigma^2_w$$
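This closure property can be verified numerically: convolving two sampled Gaussian PDFs yields, up to discretisation error, the Gaussian with summed means and summed variances. The concrete values below are illustrative.

```python
import numpy as np
import scipy.stats

dx = 0.01
x = np.arange(-20., 20., dx)

# Previous posterior and motion/noise Gaussian (illustrative values)
p_prior = scipy.stats.norm(loc=-2.0, scale=1.5).pdf(x)
p_motion = scipy.stats.norm(loc=2.5, scale=1.9).pdf(x)

# Discrete convolution (approximates the marginalisation integral)
p_conv = np.convolve(p_prior, p_motion, mode='same') * dx

# Closed form: Gaussian with summed means and summed variances
p_analytic = scipy.stats.norm(loc=-2.0 + 2.5,
                              scale=np.sqrt(1.5**2 + 1.9**2)).pdf(x)

print(np.max(np.abs(p_conv - p_analytic)))  # small (discretisation error only)
```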
def predict(x_hat_prev, sigma_prev, delta_x, sigma_w):
    """
    Performs the prediction step of the 1D Kalman Filter.

    Combines the previous posterior state with the motion model.
    """
    # 1. Update mean: sum of previous estimate and deterministic motion
    x_hat_pred = x_hat_prev + delta_x

    # 2. Update variance: sum of previous variance and process noise;
    #    the new standard deviation is the square root of this sum
    sigma_pred = np.sqrt(sigma_prev**2 + sigma_w**2)

    return x_hat_pred, sigma_pred
# --- Simulation Parameters ---
x_space = np.arange(-10., 15., .05)
# Initial state at t-1 (Posterior from previous step)
x_hat_prev = -2.0
sigma_prev = 1.5
# Motion parameters
delta_x = 2.5 # Expected displacement (control input)
sigma_w = 1.9 # Process noise standard deviation
# --- Calculation ---
# 1. Generate PDFs for visualization
prior_pdf = scipy.stats.norm(loc=x_hat_prev, scale=sigma_prev).pdf(x_space)
motion_pdf = scipy.stats.norm(loc=delta_x, scale=sigma_w).pdf(x_space)
# 2. Run Prediction Step
x_hat_pred, sigma_pred = predict(x_hat_prev, sigma_prev, delta_x, sigma_w)
# 3. Generate Predicted PDF
pred_pdf = scipy.stats.norm(loc=x_hat_pred, scale=sigma_pred).pdf(x_space)
# --- Plotting ---
params = {
    'x_hat_prev': x_hat_prev,
    'sigma_prev': sigma_prev,
    'x_hat_pred': x_hat_pred,
    'sigma_pred': sigma_pred,
    'sigma_w': sigma_w
}
plot_prediction_results(x_space, prior_pdf, pred_pdf, motion_pdf, params)
Interpretation
The second plot clearly demonstrates the two effects of the Prediction step: the mean shifts by the commanded displacement $\Delta x$, and the distribution becomes wider because the process noise $\sigma^2_w$ is added to the variance.
In the continuous 1D Kalman Filter, the measurement update step uses Bayes' Rule to fuse our uncertain prior belief (the prediction) with new, uncertain sensor data.
We combine the predicted prior $\mathcal{N}(\hat{x}_t^-, \sigma_t^{2-})$ with the measurement likelihood $\mathcal{N}(z_t \mid x_t, \sigma^2_r)$ to yield the posterior $\mathcal{N}(\hat{x}_t^+, \sigma_t^{2+})$.
Mathematically, while the prediction step involved a convolution (adding the motion noise), the update step involves the product of two Gaussians.
When we multiply the prediction Gaussian and the measurement Gaussian, the result is a new Gaussian that is shifted toward the more certain source. This new distribution is "narrower" (has a smaller variance) than both originals, indicating that our uncertainty has decreased now that we have fused two sources of information.
The relationship between the true state ($x_t$) and the noisy observation ($z_t$) is modeled linearly with additive Gaussian noise.
The general form of the measurement equation is:
$$z_t = H x_t + r$$
where $H$ is the observation constant that maps the state into measurement space, and $r \sim \mathcal{N}(0, \sigma^2_r)$ is the measurement noise.
For the simple 1D case where the sensor directly measures position, the Observation constant $H$ simplifies to $1$:
$$z_t = x_t + r$$
This formulation leads to the Measurement Likelihood Function $p(z_t \mid x_t)$:
$$p(z_t \mid x_t) = \mathcal{N}(z_t \mid x_t, \sigma_r^2)$$
This means that given the robot's true state $x_t$, the probability of receiving a specific measurement $z_t$ is described by a Gaussian centered at $x_t$ with a variance equal to the sensor noise $\sigma_r^2$.
The update is governed by Bayes' Rule, which fuses our prediction with the new measurement:
$$p^+(x_t) = \frac{p(z_t \mid x_t) \cdot p(x_t^-)}{p(z_t)}$$
Crucially, because both the prior and the likelihood are Gaussian—a property known as conjugacy—their product is proportional to a new Gaussian. This means the resulting posterior is guaranteed to remain in the Gaussian family. This allows us to calculate the new mean and variance directly through algebraic formulas, bypassing the need to solve the complex integral for the denominator $p(z_t)$ (the evidence).
If the prior $\mathcal{N}(\hat{x}_t^-, \sigma_t^{2-})$ is multiplied by the likelihood $\mathcal{N}(z_t \mid x_t, \sigma_r^2)$, the resulting Posterior Gaussian $\mathcal{N}(\hat{x}_t^+, \sigma_t^{2+})$ has the following parameters (for the derivation, see e.g., [Bro14]):

$$\hat{x}_t^+ = \frac{\sigma_r^2\,\hat{x}_t^- + \sigma_t^{2-}\,z_t}{\sigma_t^{2-} + \sigma_r^2}, \qquad \frac{1}{\sigma_t^{2+}} = \frac{1}{\sigma_t^{2-}} + \frac{1}{\sigma_r^2}$$
Intuition: The Weighted Average
The update formula for the mean is essentially a weighted average of our prediction and our measurement. Each source is weighted by the certainty (the inverse of the variance) of the other source: the prediction is weighted by $\sigma_r^2$, and the measurement by $\sigma_t^{2-}$.
The posterior variance is always smaller than both the prior and the measurement variance. This reflects the reduction in uncertainty achieved by the fusion of information: even a noisy sensor makes our estimate more certain than the prediction alone.
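A quick numerical sketch of this fusion: multiplying the sampled prior and likelihood PDFs pointwise and renormalising reproduces the closed-form posterior. The numbers are illustrative (they match the update demo below).

```python
import numpy as np
import scipy.stats

dx = 0.001
x = np.arange(-10., 15., dx)

# Predicted prior and measurement likelihood (illustrative values)
x_hat_pred, sigma_pred = 0.5, 2.42
z_t, sigma_r = -1.0, 3.0

prior = scipy.stats.norm(x_hat_pred, sigma_pred).pdf(x)
likelihood = scipy.stats.norm(z_t, sigma_r).pdf(x)

# Pointwise product, renormalised (the division plays the role of the evidence)
product = prior * likelihood
posterior_num = product / (product.sum() * dx)

# Closed-form posterior via the Kalman Gain
k_t = sigma_pred**2 / (sigma_pred**2 + sigma_r**2)
posterior_ana = scipy.stats.norm(x_hat_pred + k_t * (z_t - x_hat_pred),
                                 np.sqrt((1 - k_t) * sigma_pred**2)).pdf(x)

print(np.max(np.abs(posterior_num - posterior_ana)))  # close to zero
```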
The Kalman Gain is a weighted factor that automatically determines the optimal balance between trusting the prediction versus trusting the measurement. It is directly derived by algebraically manipulating the posterior mean equation.
The formula for the posterior mean can be rewritten from a weighted average into a "prediction + correction" form:
$$\hat{x}_t^+ = \hat{x}_t^- + \left(\frac{\sigma_t^{2-}}{\sigma_t^{2-} + \sigma_r^2}\right) (z_t - \hat{x}_t^-)$$
Definition of the Kalman Gain
By defining the Kalman Gain ($K_t$) as the ratio of uncertainties:
$$K_t = \frac{\sigma_t^{2-}}{\sigma_t^{2-} + \sigma_r^2}$$
The posterior mean update takes the standard, intuitive form of the Kalman Filter:
$$\hat{x}_t^+ = \hat{x}_t^- + K_t (z_t - \hat{x}_t^-)$$
Term to Know: The difference $(z_t - \hat{x}_t^-)$ is known as the Innovation or Measurement Residual. It represents the "surprise"—how much the new sensor data differs from what we expected.
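The equivalence of the "prediction + correction" form and the variance-weighted average can be checked directly with a few lines of arithmetic (the numbers are illustrative):

```python
# Illustrative numbers for the predicted state and the measurement
x_hat_pred, sigma_pred = 0.5, 2.42
z_t, sigma_r = -1.0, 3.0

k_t = sigma_pred**2 / (sigma_pred**2 + sigma_r**2)

# "Prediction + correction" form
mean_gain_form = x_hat_pred + k_t * (z_t - x_hat_pred)

# Variance-weighted average form
mean_weighted = (sigma_r**2 * x_hat_pred + sigma_pred**2 * z_t) \
                / (sigma_pred**2 + sigma_r**2)

print(mean_gain_form, mean_weighted)  # identical up to floating point
```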
Interpretation of $K_t$
The Kalman Gain is an adaptable factor that ranges between 0 and 1. It determines how much of the innovation should be used to correct the predicted state: when the prediction is uncertain relative to the sensor ($\sigma_t^{2-} \gg \sigma_r^2$), $K_t$ approaches $1$ and the estimate moves almost entirely to the measurement; when the sensor is noisy ($\sigma_r^2 \gg \sigma_t^{2-}$), $K_t$ approaches $0$ and the prediction is largely kept.
Final Variance Update
Using the Kalman Gain, the update for the variance also simplifies to a standard form:
$$\sigma_t^{2+} = (1 - K_t)\sigma_t^{2-}$$
This formula beautifully illustrates that the new uncertainty ($\sigma_t^{2+}$) is always a fraction of the old uncertainty ($\sigma_t^{2-}$), reflecting our increased confidence after seeing the data.
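A small numerical illustration of how $K_t$ adapts, and of the equivalence between $(1 - K_t)\sigma_t^{2-}$ and the inverse-variance form of the posterior variance (values chosen purely for illustration):

```python
sigma_pred = 2.0  # illustrative predicted standard deviation

for sigma_r in [0.1, 2.0, 20.0]:
    k_t = sigma_pred**2 / (sigma_pred**2 + sigma_r**2)
    var_post = (1 - k_t) * sigma_pred**2
    # Equivalent inverse-variance form of the posterior variance
    var_post_alt = 1.0 / (1.0 / sigma_pred**2 + 1.0 / sigma_r**2)
    print(f"sigma_r={sigma_r:5.1f}  K_t={k_t:.4f}  "
          f"var={var_post:.4f}  (alt form: {var_post_alt:.4f})")
```

A precise sensor ($\sigma_r = 0.1$) yields a gain near 1; a very noisy sensor ($\sigma_r = 20$) yields a gain near 0.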
The Measurement Update Function
The `update` function implements the core algebraic rules derived from the product of two Gaussians (Bayes' Rule) to fuse the predicted state (prior) with the observation (likelihood).
def update(x_hat_pred, sigma_pred, z_t, sigma_r):
    """
    Performs the measurement update (correction) step of the 1D Kalman Filter.

    Fuses the predicted state (Prior) with the sensor measurement (Likelihood).
    """
    # 1. Calculate the Kalman Gain (K_t)
    # This is the weight that balances trust between prediction and measurement.
    k_t = sigma_pred**2 / (sigma_pred**2 + sigma_r**2)

    # 2. Update mean (posterior mean \hat{x}_t^+)
    # Algebraically equivalent to the variance-weighted average:
    # x_hat_plus = (sigma_r**2 * x_hat_pred + sigma_pred**2 * z_t) / (sigma_pred**2 + sigma_r**2)
    x_hat_plus = x_hat_pred + k_t * (z_t - x_hat_pred)

    # 3. Update standard deviation (posterior variance \sigma_t^{2+})
    # Algebraically equivalent to the reciprocal sum of inverse variances:
    # sigma_plus = np.sqrt(1. / (1./sigma_pred**2 + 1./sigma_r**2))
    sigma_plus = np.sqrt((1 - k_t) * sigma_pred**2)

    return x_hat_plus, sigma_plus
Demonstration of the Update Step
The following code segment visually demonstrates the fusion process, showing how the Prior and the Likelihood combine to create a sharper, more certain Posterior.
Parameters for the Simulation:
# --- Simulation Parameters & Setup ---
# (Using the results from the previous prediction step)
x_hat_pred = 0.5    # Predicted mean (\hat{x}_t^-)
sigma_pred = 2.42   # Predicted standard deviation (\sigma_t^-)

# Measurement (sensor data)
z_t = -1.0          # The sensor reading
sigma_r = 3.0       # Sensor noise standard deviation
# --- Calculation ---
# 1. Run the update function
x_hat_plus, sigma_plus = update(x_hat_pred, sigma_pred, z_t, sigma_r)
# 2. Generate PDFs for plotting
pred_pdf = scipy.stats.norm(loc=x_hat_pred, scale=sigma_pred).pdf(x_space)
likelihood_pdf = scipy.stats.norm(loc=z_t, scale=sigma_r).pdf(x_space)
post_pdf = scipy.stats.norm(loc=x_hat_plus, scale=sigma_plus).pdf(x_space)
# --- Plotting ---
params = {
    'x_hat_pred': x_hat_pred, 'sigma_pred': sigma_pred,
    'z_t': z_t, 'sigma_r': sigma_r,
    'x_hat_plus': x_hat_plus, 'sigma_plus': sigma_plus
}
plot_update_step(x_space, pred_pdf, likelihood_pdf, post_pdf, params)
Interpretation:
The resulting Posterior (blue line) is narrower than both the Prior and the Likelihood, confirming that the filter has reduced the overall uncertainty. Its mean is optimally located between the Prior and the Likelihood, weighted by the respective variances (Kalman Gain).
This final code block demonstrates the full, recursive operation of the 1D Kalman Filter over multiple time steps ($t=1$ to $t=5$).
Initial Setup and Cycle
# --- Simulation Parameters ---
sigma_r = 1.6 # Observation noise (sigma_z)
sigma_w = 0.9 # Process noise (sigma_w)
z_observations = np.array([-2, -1.5, -0.4, 1.2, 2.1])
delta_x_velocity = np.array([1, 1.1, 1.2, 1.2, 1.2])
x_space = np.arange(-6., 10., 0.05)
x_hat = 0. # Initial position estimate
sigma = 6. # Initial uncertainty (high)
# --- Main Simulation Loop ---
fig, axes = plt.subplots(6, 1, figsize=(10, 14), sharex=True)
# 1. Initial State at t=0
x_pdf_initial = scipy.stats.norm(loc=x_hat, scale=sigma).pdf(x_space)
axes[0].plot(x_space, x_pdf_initial, 'b-', label=r"Initial Position ($\hat{x}_0^+$)")
axes[0].set_ylim(0., 0.6)
axes[0].legend()
axes[0].set_title("Kalman Filter Recursive Estimation")
# 2. Iterative Predict and Update
for i, (z_t, delta_x) in enumerate(zip(z_observations, delta_x_velocity)):
    t = i + 1

    # PREDICT STEP
    x_hat_pred, sigma_pred = predict(x_hat, sigma, delta_x, sigma_w)
    x_pdf_pred = scipy.stats.norm(loc=x_hat_pred, scale=sigma_pred).pdf(x_space)

    # MEASUREMENT MODEL (for visualization)
    likelihood_pdf = scipy.stats.norm(loc=z_t, scale=sigma_r).pdf(x_space)

    # UPDATE STEP
    x_hat, sigma = update(x_hat_pred, sigma_pred, z_t, sigma_r)
    x_pdf_post = scipy.stats.norm(loc=x_hat, scale=sigma).pdf(x_space)

    # PLOTTING
    plot_step(axes[t], x_space, x_pdf_pred, likelihood_pdf, x_pdf_post, t)
plt.xlabel("Position (x)")
plt.tight_layout()
plt.show()
Final Interpretation of the Results
The sequence of plots shows: the initially broad belief ($\sigma = 6$) collapses sharply after the first measurement; in each subsequent step the Prediction widens the belief slightly (process noise is added) and the Update narrows it again; and the posterior mean tracks the measurements while smoothing out their noise.
This recursive cycle—Prediction followed by Measurement Update—is the complete implementation of the Bayes Filter and the definition of the Kalman Filter in its simplest 1D form.
This chapter introduced the Kalman Filter (KF) as an efficient, specialized form of the Bayes Filter for continuous state estimation. The KF provides an optimal, recursive solution when two conditions hold: the motion and measurement models are linear, and all noise terms (and hence all beliefs) are Gaussian.
| Component | Symbol | Gaussian Parameter | Description |
|---|---|---|---|
| Initial State | $\hat{x}_{t-1}^+$ | $\mathcal{N}(\hat{x}_{t-1}^+, \sigma^2_{t-1})$ | The final estimate from the previous time step. |
| Control Input | $u_{t-1} = \Delta x_{t-1}$ | - | The intended displacement (deterministic part). |
| Process Noise | $w$ | $\mathcal{N}(0, \sigma^2_w)$ | Uncertainty from movement (e.g., wheel slip). |
| Prediction (Prior) | $p(x_t^-)$ | $\mathcal{N}(\hat{x}_t^-, \sigma_t^{2-})$ | The "Dead Reckoning" estimate before sensor data. |
| Measurement | $z_t$ | - | The actual sensor reading received at time $t$. |
| Measurement Noise | $r$ | $\mathcal{N}(0, \sigma^2_r)$ | The inherent uncertainty of the sensor. |
| Likelihood | $p(z_t \mid x_t)$ | $\mathcal{N}(z_t \mid x_t, \sigma^2_r)$ | Probability of reading $z_t$ given true state $x_t$. |
| Kalman Gain | $K_t$ | Scalar $[0, 1]$ | The weight assigned to the measurement residual. |
| Update (Posterior) | $p^+(x_t)$ | $\mathcal{N}(\hat{x}_t^+, \sigma_t^{2+})$ | The optimal fusion of prediction and measurement. |
The Recursive Cycle
The KF operates in a continuous, two-step cycle, tracking the entire belief using only the mean ($\hat x$) and variance ($\sigma^2$).
| Step | Purpose | Algebraic Rule | Effect on Uncertainty ($\sigma^2$) |
|---|---|---|---|
| 1. Prediction | Updates state based on motion. | $\hat{x}_t^- = \hat{x}_{t-1}^+ + \Delta x_{t-1}$, $\sigma_t^{2-} = \sigma_{t-1}^{2+} + \sigma_w^2$ | Increases ($\sigma^2_w$ is added). |
| 2. Update | Corrects prediction using sensor ($z_t$). | $\hat{x}_t^+ = \hat{x}_t^- + K_t (z_t - \hat{x}_t^-)$, $\sigma_t^{2+} = (1 - K_t)\sigma_t^{2-}$ | Decreases (information is fused). |
The Kalman Gain ($K_t$) is the weighting factor that optimally balances the trust between the uncertain prediction and the uncertain measurement. The resulting Posterior state ($\hat{x}_t^+, \sigma_t^{2+}$) becomes the starting Prior for the next cycle.
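To close, the full recursive cycle can be run in a few lines of text-only code (re-stating the `predict` and `update` steps from this chapter, with the same parameters as the simulation above) to watch the variance shrink and settle:

```python
import numpy as np

def predict(x_hat, sigma, delta_x, sigma_w):
    # Prediction: shift the mean, add the process noise to the variance
    return x_hat + delta_x, np.sqrt(sigma**2 + sigma_w**2)

def update(x_hat, sigma, z_t, sigma_r):
    # Correction: fuse prediction and measurement via the Kalman Gain
    k_t = sigma**2 / (sigma**2 + sigma_r**2)
    return x_hat + k_t * (z_t - x_hat), np.sqrt((1 - k_t) * sigma**2)

sigma_r, sigma_w = 1.6, 0.9                # noise levels from the simulation
z_obs = [-2, -1.5, -0.4, 1.2, 2.1]         # measurements
deltas = [1, 1.1, 1.2, 1.2, 1.2]           # commanded displacements

x_hat, sigma = 0.0, 6.0                    # broad initial belief
for t, (z, d) in enumerate(zip(z_obs, deltas), start=1):
    x_hat, sigma = update(*predict(x_hat, sigma, d, sigma_w), z, sigma_r)
    print(f"t={t}: x_hat={x_hat:+.2f}, sigma={sigma:.2f}")
```

The printed standard deviation drops steeply after the first update and then settles near a steady-state value, since each cycle adds $\sigma_w^2$ in prediction and removes a fraction $K_t$ in the update.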