%matplotlib inline
import numpy as np
from ipynb.fs.defs.plot_Bayes_Filter import plot_markov_chain, plot_HMM, plot_locations, plot_updates
The challenge in fields like robotics and control is to continuously determine the true state of a system (e.g., the exact location of a robot, $X_t$) despite two major sources of uncertainty: noise in the system's own motion (captured by the Motion Model) and noise in its sensor readings (captured by the Measurement Model).
The Bayes Filter is the general probabilistic algorithm designed to solve this challenge. It provides the mathematical framework for recursively fusing uncertain information from the Motion Model and the Measurement Model to produce the best possible estimate of the system's state at any given time.
The Recursive Cycle
The Bayes Filter operates in a continuous, two-step cycle: a Prediction step, which propagates the belief forward through the motion model, and an Update step, which corrects that belief with the latest measurement.
A Markov Chain is a mathematical system that experiences transitions from one state to another on a state space, with the property that the probability of the next state depends only on the current state and not on the sequence of events that preceded it. This "memoryless" property is its defining characteristic.
Imagine a sequence of states, $X_1, X_2, X_3, \dots$, where $X_t$ represents the state of the system at time $t$. The chain describes how the system moves from one state to the next over time.
The assumption states that the future state of the process is conditionally independent of the past states, given the present state.
Mathematically, this is expressed as:
$$ P(X_t \mid X_1, \dots ,X_{t-1}) = P(X_t \mid X_{t-1}) $$
This equation means that to predict the state at time $t$, we only need to know the state at time $t-1$. Knowing $X_{t-2}, X_{t-3}$, and so on provides no additional information, because the influence of those earlier states is entirely captured by $X_{t-1}$.
A second equation formalizes the statistical independence that results from the Markov Assumption:
$$\forall t \in \mathbb N^{+}, \forall t' \in \mathbb N^{+}: (t-t' \geq 2) \Rightarrow X_t \perp X_{t'} \mid X_{t-1}$$
This means that any state $X_t$ is statistically independent ($\perp$) of any earlier state $X_{t'}$ (where $t'$ is at least two time steps before $t$, i.e., $t-t' \geq 2$), once you are given the state $X_{t-1}$ that lies between them.
For example, $X_3$ is independent of $X_1$, given $X_2$. The history $X_1$ only influences $X_2$, and it is only $X_2$ that influences $X_3$.
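This conditional independence can be checked numerically. The sketch below builds a hypothetical two-state chain (the transition matrix `T` and initial distribution `pi0` are made up for illustration) and verifies that $P(X_3 \mid X_1, X_2)$ does not depend on $X_1$:

```python
import numpy as np

# Hypothetical 2-state chain: T[j, i] = P(X_t = i | X_{t-1} = j)
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])
pi0 = np.array([0.5, 0.5])  # assumed initial distribution P(X_1)

# Joint distribution P(X_1, X_2, X_3) via the chain-rule factorization
joint = np.einsum('a,ab,bc->abc', pi0, T, T)

# Condition on (X_1, X_2) by normalizing over the X_3 axis;
# the result should be identical for both values of X_1
cond = joint / joint.sum(axis=2, keepdims=True)
print(np.allclose(cond[0], cond[1]))  # X_3 independent of X_1 given X_2
```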
This property simplifies the joint probability distribution of the entire sequence of states. Using the chain rule of probability and applying the Markov Assumption:
$$P(X_1, \dots ,X_T) = P(X_1) \cdot P(X_2 \mid X_1) \cdot P(X_3 \mid X_1, X_2) \cdots P(X_T \mid X_1, \dots ,X_{T-1})$$
Becomes:
$$P(X_1, \dots ,X_T) = P(X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})$$
This simplification is what makes Markov Chains, and extensions like the Hidden Markov Model (HMM) and Kalman Filter, computationally tractable and widely applicable in sequence modeling.
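As a quick illustration of the factorized form, the sketch below scores one particular state sequence under a hypothetical two-state chain (the transition matrix `T` and initial distribution `p_init` are invented for the example):

```python
import numpy as np

# Hypothetical 2-state chain: T[j, i] = P(X_t = i | X_{t-1} = j)
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])
p_init = np.array([0.8, 0.2])  # assumed P(X_1)

def sequence_probability(states):
    """P(X_1=states[0], ..., X_T=states[-1]) via the Markov factorization."""
    p = p_init[states[0]]
    for prev, curr in zip(states[:-1], states[1:]):
        p *= T[prev, curr]
    return p

p_seq = sequence_probability([0, 0, 1, 1])  # 0.8 * 0.9 * 0.1 * 0.7
```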
plot_markov_chain()
In this example, we consider the problem of tracking a robot's location on a discrete grid, a classic application of a Markov Chain.
The motion equation describes how the probability distribution over the robot's location evolves over one time step, based on the Markov Assumption ($P(X_t \mid X_1, \dots, X_{t-1}) = P(X_t \mid X_{t-1})$).
To find the probability that the robot is at cell $i$ at time $t$, $P(X_t=i)$, we must sum up the probabilities of all possible previous locations $j$ at $t-1$ that could have led to state $i$ at time $t$. This is done using the law of total probability:
$$P(X_t=i) = \sum_{j} P(X_{t-1}=j) \cdot P(X_t=i \mid X_{t-1}=j) $$
This process, where a probability distribution is propagated forward by summing over all possibilities from the previous step, is the prediction step of Bayesian filtering (also called temporal prediction).
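The same total-probability sum can be written as a vector-matrix product. The sketch below uses a small, made-up 4-cell transition matrix `A`, where `A[j, i]` $= P(X_t = i \mid X_{t-1} = j)$:

```python
import numpy as np

# Illustrative 4-cell grid; each row of A sums to 1
A = np.array([[0.7, 0.3, 0.0, 0.0],
              [0.1, 0.7, 0.2, 0.0],
              [0.0, 0.1, 0.7, 0.2],
              [0.0, 0.0, 0.3, 0.7]])
belief = np.array([0.0, 1.0, 0.0, 0.0])  # robot known to be in cell 1

# P(X_t = i) = sum_j P(X_{t-1} = j) * P(X_t = i | X_{t-1} = j)
predicted = belief @ A
```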
In many localization problems, especially on a uniform grid, the transition probabilities exhibit translation invariance (or stationarity). This means the probability of moving by a certain amount is independent of the absolute starting position.
Example of Translation Invariance
The probability of moving one cell to the left is the same whether you start at cell 3 and move to 2, or start at cell 2 and move to 1.
$$P(X_t=2 \mid X_{t-1}=3) = P(X_t=1 \mid X_{t-1}=2) = P(\text{move left by } 1) $$
If we define a function $g$ for the relative motion probability:
$$g(k) \equiv P(X_t - X_{t-1} = k)$$ where $k = i - j$ is the displacement.
Then the transition probability can be written in terms of this relative difference:
$$P(X_t=i \mid X_{t-1}=j) = g(i-j)$$
Substituting this into the motion equation, and switching to a functional notation where $f_t(i) = P(X_t=i)$: $$f_t(i) = \sum_{j} f_{t-1}(j) \cdot g(i-j)$$
This equation is exactly the definition of a convolution $(f_{t-1} * g)(i)$:$$(f_{t-1} * g)(i) = \sum_{j} f_{t-1}(j) \cdot g(i-j)$$
Conclusion: The operation of predicting the robot's new location probability by incorporating the motion model is mathematically equivalent to convolving the previous location probability distribution ($f_{t-1}$) with the motion uncertainty distribution ($g$).
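This equivalence is easy to verify numerically: building the banded transition matrix `A[j, i] = g(i - j)` explicitly and multiplying gives the same result as `np.convolve`. The kernel and grid size here mirror the code example that follows; with `'same'`-mode zero padding, both forms let probability mass fall off the grid edges:

```python
import numpy as np

g = np.array([0.1, 0.7, 0.2])  # [g(-1), g(0), g(+1)]: move left, stay, move right
n = 10
f = np.zeros(n)
f[4] = 1.0                     # robot known to be in cell 4

# Banded transition matrix A[j, i] = g(i - j), zero outside the band
A = np.zeros((n, n))
for j in range(n):
    for k, gk in zip((-1, 0, 1), g):
        if 0 <= j + k < n:
            A[j, j + k] = gk

pred_matrix = f @ A                         # law of total probability
pred_conv = np.convolve(f, g, mode='same')  # convolution form
```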
The "Blurry" Effect
So if we know the robot's location at time $t=1$ and observe no variables afterwards (no measurements), our knowledge of the robot's location becomes more "blurry" over time.
Convolution, in this context, always spreads the probability mass: each step redistributes a cell's probability among its neighbors according to the kernel $g$, so sharp peaks flatten and the distribution widens.
Without any observations (measurements) to correct the belief, the uncertainty (or entropy) of the robot's location continuously increases because the cumulative error from the motion model grows. This highlights the crucial need for the measurement update step in the full Kalman Filter/Bayesian Filter framework.
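One way to see this quantitatively is to track the Shannon entropy of the belief as it is repeatedly convolved. The `entropy` helper below is defined ad hoc for this sketch; the kernel matches the example that follows:

```python
import numpy as np

tp = np.array([0.1, 0.7, 0.2])  # motion kernel: move left, stay, move right

def entropy(dist):
    """Shannon entropy in bits, ignoring zero-probability cells."""
    nz = dist[dist > 0]
    return -(nz * np.log2(nz)).sum()

p = np.zeros(10)
p[4] = 1.0                      # perfectly known start: entropy 0
entropies = [entropy(p)]
for _ in range(4):
    p = np.convolve(p, tp, mode='same')
    entropies.append(entropy(p))
# entropies grows step after step: uncertainty only increases
```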
This example illustrates how the initial probability distribution of the robot's location, $P(X_t)$, spreads out over time due to the uncertainty in the motion model.
# 1. Initialize Probability Matrix (5 time steps x 10 cells)
p = np.zeros([5, 10])
# 2. Set Initial State at t=0
p[0, 4] = 1 # The robot is located at time t=0 in cell with grid index 4
# 3. Define Transition Probabilities (tp) as the motion kernel g(k)
# tp = [g(-1), g(0), g(+1)] = [Move Left (0.1), Stay (0.7), Move Right (0.2)]
# i.e. tp[0] = P(X_t = i | X_{t-1} = i+1), tp[2] = P(X_t = i | X_{t-1} = i-1)
tp = np.array([0.1, 0.7, 0.2]) # convolution kernel
# Set NumPy print options for clean output
np.set_printoptions(precision=3, suppress=True)
# 4. Perform Prediction (Convolution) over 4 time steps
for t in range(4):
    # Convolve the current distribution p[t] with the motion kernel tp.
    # 'same' mode keeps the output length at 10 (mass leaving the grid is lost).
    p[t+1] = np.convolve(p[t], tp, mode='same')
print("Calculated Probability Distributions P(X_t=i) over 5 timesteps:")
print(p)
# 5. Plotting the Results
plot_locations(p)
Note: This procedure also works effectively even if the distribution is multimodal.
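For instance, a bimodal belief (robot equally likely to be in cell 2 or cell 7; the values are invented for the sketch) convolves just as cleanly, producing two spreading peaks:

```python
import numpy as np

tp = np.array([0.1, 0.7, 0.2])         # same motion kernel as above
p0 = np.zeros(10)
p0[2] = 0.5                            # robot in cell 2 ...
p0[7] = 0.5                            # ... or cell 7, with equal probability
p1 = np.convolve(p0, tp, mode='same')  # one prediction step: both peaks spread
```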
A Hidden Markov Model (HMM) is an extension of a Markov Chain where the states ($X_t$) are not directly observable; they are hidden. Instead, we receive noisy observations or measurements ($Z_t$) that depend on the true hidden state. The relationship between these variables is defined by two crucial sets of conditional independence assumptions.
plot_HMM()
The Two Core Conditional Independence Assumptions
The graph structure ($\mathcal{G}$) of the HMM is encoded by these two independence assumptions:
1. The Observation/Measurement Model (Sensor Independence)
This assumption defines the relationship between the hidden state ($X_t$) and the measurement ($Z_t$).
$$P(Z_t \mid Z_1, \dots, Z_{t-1}, X_1, \dots, X_{t}) = P(Z_t \mid X_t)$$
Explanation:
In the robot example: The reading from your sensor ($Z_t$) only depends on where the robot truly is right now ($X_t$). It doesn't matter how the robot got there ($X_1, \dots, X_{t-1}$) or what measurements were taken previously ($Z_1, \dots, Z_{t-1}$).
2. The Transition/Motion Model (Markov Assumption)
This assumption defines the evolution of the hidden state itself, retaining the fundamental property of the Markov Chain.
$$P(X_t \mid Z_1, \dots, Z_{t-1}, X_1, \dots, X_{t-1}) = P(X_t \mid X_{t-1})$$
Explanation:
In the robot example: The robot's motion is purely autonomous. Where it moves next ($X_t$) depends only on where it was one step ago ($X_{t-1}$), and is not influenced by any past sensor readings ($Z_1, \dots, Z_{t-1}$). The motion occurs independently of the observations we make.
These two assumptions collectively decouple the full problem into the manageable, recursive Prediction (motion) and Update (measurement) steps used in the Bayesian Filter and, consequently, the Kalman Filter.
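Putting the two steps together gives the full recursive filter. The sketch below runs a discrete Bayes filter on the 10-cell grid, reusing the motion kernel from the earlier example; the sensor likelihood (repeatedly pointing at cell 5) is invented for illustration:

```python
import numpy as np

tp = np.array([0.1, 0.7, 0.2])  # motion model g: move left, stay, move right

def bayes_filter_step(belief, likelihood):
    """One predict + update cycle of the discrete Bayes filter."""
    prior = np.convolve(belief, tp, mode='same')  # predict (motion)
    posterior = likelihood * prior                # update (measurement)
    return posterior / posterior.sum()            # normalize by P(Z_t)

belief = np.full(10, 0.1)       # start with no location knowledge

# Hypothetical likelihood P(Z_t | X_t = i): the sensor suggests cell 5
likelihood = np.array([0.02] * 4 + [0.1, 0.7, 0.1] + [0.02] * 3)
for _ in range(3):
    belief = bayes_filter_step(belief, likelihood)
# belief is now sharply concentrated around cell 5
```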
The Prediction Step (motion/convolution) calculates the prior belief, $P(X_t)$, but it always increases uncertainty. To correct this, we use a sensor measurement, $Z_t$, and Bayes' Rule to compute the updated belief, $P(X_t \mid Z_t)$, called the Posterior.
The core update equation is: $$P(X_t=i \mid Z_t) = \frac{P(Z_t \mid X_t=i) \cdot P(X_{t}=i)}{P(Z_t)}$$
Components of the Update:
| Term | Role in the Equation | Interpretation |
|---|---|---|
| $P(X_t=i)$ | Prior (from Prediction) | The predicted probability distribution before seeing the measurement. |
| $P(Z_t \mid X_t=i)$ | Likelihood (Measurement Model) | The sensor's probability: How likely is it to get measurement $Z_t$ if the robot is truly at cell $i$? |
| $P(Z_t)$ | Normalizer/Evidence | A scaling factor ensuring the Posterior sums to 1. It is calculated by marginalizing the joint probability over all states $i$: $$P(Z_t) = \sum_{i} P(Z_t \mid X_t=i) P(X_{t}=i)$$ |
| $P(X_t=i \mid Z_t)$ | Posterior | The final, updated, and corrected probability distribution after incorporating the observation. |
Code Demonstration: Fusing Prior and Measurement
This code takes the prior distribution $P(X_4)$ (the result of the motion prediction at $t=4$ from the earlier example), introduces a new sensor likelihood, and calculates the posterior distribution.
# --- 1. Define the Prior (P(X_t)) ---
# Assuming p is the array from the prediction code above, p[4] holds the
# predicted distribution P(X_4) at t=4 from the earlier convolution example.
# Here we hard-code representative values so this cell runs on its own:
prior = np.array([0.003, 0.021, 0.088, 0.231, 0.315, 0.231, 0.088, 0.021, 0.003, 0.000])
# --- 2. Define the Likelihood (P(Z_t | X_t)) ---
# This is the probability of the measurement Z_t, given the robot is at cell i.
# The measurement indicates a value centered around cell 5.
p_measurement = np.array([0., 0., 0., 0.05, 0.18, 0.54, 0.18, 0.05, 0., 0.])
# --- 3. Compute the Posterior (P(X_t | Z_t)) ---
# STEP 1: Numerator (un-normalized posterior) = Likelihood * Prior
unnormalized_posterior = p_measurement * prior
# STEP 2: Normalizer P(Z_t) = sum(unnormalized_posterior)
normalizer = unnormalized_posterior.sum()
# STEP 3: Normalize to get the Posterior
posterior = unnormalized_posterior / normalizer
# print("Normalizer P(Z_t):", normalizer)
# print("Posterior P(X_t | Z_t):\n", posterior)
# --- 4. Plotting the Results ---
plot_updates(prior, p_measurement, posterior)
Interpretation
The crucial effect is that the Posterior distribution is narrower than the Prior. The observation $Z_t$ essentially "pulls" the predicted belief towards the location indicated by the sensor (cell 5 in the likelihood) and reduces the overall uncertainty. This is the core function of the measurement update in any state estimation filter.
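This narrowing can be made concrete by comparing spreads. The sketch below recomputes the posterior from the arrays above and compares the standard deviations of prior and posterior (the `std` helper is ad hoc for this example):

```python
import numpy as np

cells = np.arange(10)
prior = np.array([0.003, 0.021, 0.088, 0.231, 0.315,
                  0.231, 0.088, 0.021, 0.003, 0.000])
likelihood = np.array([0., 0., 0., 0.05, 0.18, 0.54, 0.18, 0.05, 0., 0.])
posterior = likelihood * prior
posterior /= posterior.sum()

def std(dist):
    """Standard deviation of cell index under a (possibly unnormalized) dist."""
    mean = (cells * dist).sum() / dist.sum()
    return np.sqrt((dist * (cells - mean) ** 2).sum() / dist.sum())

# The posterior is tighter than the prior: the measurement reduced uncertainty
print(std(prior), std(posterior))
```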