Eigen-Decomposition on the Example of a Covariance Matrix
Synthetic Data Generation
We start by generating a 2D Gaussian distribution with a known mean $\boldsymbol \mu$ and covariance $\boldsymbol \Sigma$. This simulates noisy sensor measurements.
```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(41)

# True parameters
od = 0.8  # off-diagonal covariance entry
Sigma = np.array([[2, od], [od, 0.6]])
mu = np.array([2., 1.])
nb_data = 40

# Sample the data
X = np.random.multivariate_normal(mu, Sigma, nb_data)

# Visualization
plt.figure(figsize=(5, 5))
plt.scatter(X[:, 0], X[:, 1], c='b', marker='x', label="Data")
plt.xlabel('$x_1$')
plt.ylabel('$x_2$')
plt.title("Sampled 2D Data")
plt.legend()
plt.show()
```
In reality, we don't know the true $\boldsymbol \mu$ or $\boldsymbol \Sigma$. We must estimate them from the $n$ samples, component by component.
The Empirical Mean ($\bar{x}$):$$\bar{x}_i = \frac{1}{n} \sum_{k=1}^n x_i^{(k)}$$
The Empirical Covariance:
To get an unbiased estimator, we use $n-1$ in the denominator. This is called Bessel's Correction; it compensates for the fact that we are measuring spread around the empirical mean $\bar{x}$ rather than the true population mean $\mu$, which slightly underestimates the true variance.
The covariance between dimension $i$ and dimension $j$ is:$$\text{cov}(x_i, x_j) = \frac{1}{n-1} \sum_{k=1}^n (x_i^{(k)} - \bar{x}_i)(x_j^{(k)} - \bar{x}_j)$$
To compute this efficiently for all dimensions at once, we center the data matrix $X$ by subtracting the mean (using NumPy broadcasting):
$$\tilde{\mathbf{X}} = \mathbf{X} - \mathbf{1}\bar{\mathbf{x}}^T$$
where $\mathbf{1} \in \mathbb{R}^n$ is a column vector of ones, so that each row of $\tilde{\mathbf{X}}$ is one centered sample.
Then, the empirical covariance matrix $\mathbf{C}$ is found via the matrix product:
$$\mathbf{C} = \frac{1}{n-1} \tilde{\mathbf{X}}^T \tilde{\mathbf{X}}$$
$$ \mathbf{C} = \frac{1}{n-1} \tilde{\mathbf{X}}^T \cdot \tilde{\mathbf{X}} = \frac{1}{n-1} \begin{pmatrix} \tilde x_1^{(1)}& \tilde x_1^{(2)} & \tilde x_1^{(3)} & \dots \\ \tilde x_2^{(1)}& \tilde x_2^{(2)} & \tilde x_2^{(3)} & \dots\\ \dots & \dots & \dots & \dots \\ \end{pmatrix} \cdot \begin{pmatrix} \tilde x_1^{(1)}& \tilde x_2^{(1)} & \dots \\ \tilde x_1^{(2)}& \tilde x_2^{(2)} & \dots \\ \tilde x_1^{(3)}& \tilde x_2^{(3)} & \dots \\ \dots & \dots & \dots \\ \end{pmatrix} = \frac{1}{n-1} \begin{pmatrix} \sum_k \tilde x_1^{(k)} \tilde x_1^{(k)} & \sum_k \tilde x_1^{(k)} \tilde x_2^{(k)} & \dots \\ \sum_k \tilde x_2^{(k)} \tilde x_1^{(k)} & \sum_k \tilde x_2^{(k)} \tilde x_2^{(k)} & \dots \\ \dots & \dots & \dots \\ \end{pmatrix} $$
```python
n = X.shape[0]
mean_empirical = X.mean(axis=0)
X_centered = X - mean_empirical

# Manual computation
C = (1 / (n - 1)) * X_centered.T @ X_centered

# NumPy built-in (should match):
# cov_mat_np = np.cov(X, rowvar=False)

print(f"Empirical Covariance Matrix:\n{C}")
```
Definition: A matrix $\mathbf{C}$ is symmetric if it is equal to its transpose ($\mathbf{C} = \mathbf{C}^T$).
In the world of Covariance Matrices, symmetry is guaranteed because the covariance between random variable $\mathbf{X}_1$ and $\mathbf{X}_2$ is identical to the covariance between $\mathbf{X}_2$ and $\mathbf{X}_1$.
Definition: A square matrix $\mathbf{C} \in \mathbb{R}^{n \times n}$ is positive-definite if for all $\mathbf v \in \mathbb{R}^{n} $ and $\mathbf v \neq 0$ holds $$ \mathbf{v}^T \mathbf{C} \mathbf{v} > 0 $$
Let $\mathbf{y} = \mathbf{x} - \boldsymbol \mu $ be a random vector (centered at zero) drawn from our data distribution. The covariance is defined as the expected value:$$\mathbf{C} = \mathbb{E}[\mathbf{y}\mathbf{y}^T]$$
Multiplying from the left by an arbitrary vector $\mathbf{v}^T$ and from the right by $\mathbf{v}$:
$$\mathbf{v}^T \mathbf{C} \mathbf{v} = \mathbf{v}^T \mathbb{E}[\mathbf{y}\mathbf{y}^T] \mathbf{v}$$
Since $\mathbf{v}$ is a constant vector, we can move it inside the expectation: $$\mathbb{E}[\mathbf{v}^T \mathbf{y}\mathbf{y}^T \mathbf{v}]$$
Notice that $\mathbf{v}^T \mathbf{y}$ is a scalar (let's call it $a$). Then $\mathbf{y}^T \mathbf{v}$ is also $a$. So the equation becomes:
$$\mathbb{E}[a \cdot a] = \mathbb{E}[a^2]$$
Because $a^2$ is the square of a number, it is always $\ge 0$. Therefore, its average (Expectation) is also $\ge 0$.
The expression $\mathbf{v}^T \mathbf{C} \mathbf{v}$ represents the variance of the data projected onto the direction $\mathbf{v}$. Since variance cannot be negative, the matrix must be positive semi-definite. If the data has any spread at all in that direction, the variance is $> 0$, making it positive definite.
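As a quick numerical sanity check, we can feed many random directions $\mathbf{v}$ through the quadratic form. A minimal sketch, reusing the true `Sigma` values from above as a stand-in for `C`:

```python
import numpy as np

rng = np.random.default_rng(0)
# Any valid covariance matrix works here; we reuse the true Sigma as a stand-in
C = np.array([[2.0, 0.8], [0.8, 0.6]])

# v^T C v is the variance of the data projected onto direction v
for _ in range(1000):
    v = rng.normal(size=2)
    assert v @ C @ v > 0  # strictly positive: C is positive-definite

print("All quadratic forms were positive.")
```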
For a square matrix $\mathbf{C}$, a non-zero vector $\mathbf{q}$ is an eigenvector if the transformation of $\mathbf{q}$ by $\mathbf{C}$ results only in a change of scale, not direction:
$$\mathbf{C}\mathbf{q} = \lambda\mathbf{q}$$
Remember: the eigenvectors $\mathbf{q}$ are not rotated by $\mathbf{C}$; they are only scaled by $\lambda$.
$$\mathbf{C}\mathbf{q} = \lambda \mathbf{q} \implies \mathbf{q}^T \mathbf{C} \mathbf{q} = \lambda (\mathbf{q}^T \mathbf{q})$$ Since $\mathbf{q}^T \mathbf{C} \mathbf{q} > 0$ and $\mathbf{q}^T \mathbf{q} > 0$, it follows that $\lambda > 0$.
To find $\lambda$, we rearrange the equation:$$(\mathbf{C} - \lambda \mathbb{I}) \mathbf{q} = \mathbf{0}$$
This equation has a non-zero solution for $\mathbf{q}$ only if the matrix $(\mathbf{C} - \lambda \mathbb{I})$ is singular, meaning its determinant is zero:
$$\det(\mathbf{C} - \lambda \mathbb{I}) = 0$$
Mathematical Example:
For a matrix $\mathbf{C} = \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}$:
$$\det \begin{pmatrix} 3-\lambda & 1 \\ 0 & 2-\lambda \end{pmatrix} = (3-\lambda)(2-\lambda) - 0 = 0$$
Results: $\lambda_1 = 3, \lambda_2 = 2$.
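We can verify this with NumPy. Note that this example matrix is not symmetric, so we use `np.linalg.eig` rather than `eigh`:

```python
import numpy as np

C_example = np.array([[3., 1.], [0., 2.]])
eigvals, eigvecs = np.linalg.eig(C_example)

print(sorted(eigvals))  # expect eigenvalues 2 and 3
```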
What happens if we repeatedly apply the covariance matrix $\mathbf{C}$ to random vector (not an eigenvector) and rescale it to length 1?
```python
# 1. Define the transformation: apply C, then optionally rescale to unit length
def transform_and_rescale(C_matrix, y, rescale=True):
    y = C_matrix.dot(y)
    if rescale:
        y = y / np.linalg.norm(y)
    return y

# 2. Setup context (assuming C is our empirical covariance matrix from before)
# If you don't have C defined, uncomment the next two lines:
# Sigma = np.array([[2, 0.8], [0.8, 0.6]])
# C = Sigma

# 3. Initialize the vector
y = np.array([1., -1])  # starting with a slight offset from the axis
y = y / np.linalg.norm(y)

# 4. Plotting
fig, ax = plt.subplots(figsize=(5, 5))
ax.set_xlim(-1.2, 1.2)
ax.set_ylim(-1.2, 1.2)
ax.axhline(0, color='grey', lw=1)
ax.axvline(0, color='grey', lw=1)

# Plot the starting vector in red
ax.quiver(0, 0, y[0], y[1], color="red", scale=1, scale_units='xy',
          angles='xy', label="Start", zorder=3)

# 5. Repeatedly transform and plot the "path" toward the eigenvector
for i in range(10):
    y = transform_and_rescale(C, y)
    alpha_val = (i + 1) / 10  # make later iterations darker
    ax.quiver(0, 0, y[0], y[1], alpha=alpha_val, color="black",
              scale=1, scale_units='xy', angles='xy')

plt.title("Convergence toward the Dominant Eigenvector")
plt.legend()
plt.grid(True, linestyle=':', alpha=0.6)
plt.show()
```
The result:
No matter where you start (unless you are perfectly aligned with a smaller eigenvector), repeated transformation by $C$ will "pull" the vector toward the eigenvector with the largest eigenvalue.
In the context of our data, this direction is the axis of maximum variance.
| Term | Symbol | Meaning in Data Science |
|---|---|---|
| Covariance Matrix | $\mathbf{C}$ | Describes the "shape" and spread of the data cloud. |
| Eigenvector | $\bf{q}$ | The "Principal Axes" (directions of maximum/minimum spread). |
| Eigenvalue | $\lambda_i$ | The amount of variance captured along a specific eigenvector. |
| Trace($C$) | $\sum_i \lambda_i$ | The total variance in the dataset (sum of the diagonal). |
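The last row of the table — total variance equals the sum of the eigenvalues — is easy to check numerically. A sketch, again using the true `Sigma` values as a stand-in for `C`:

```python
import numpy as np

C = np.array([[2.0, 0.8], [0.8, 0.6]])
eigvals = np.linalg.eigvalsh(C)  # eigenvalues of a symmetric matrix

# Total variance: sum of the diagonal equals sum of the eigenvalues
assert np.isclose(np.trace(C), eigvals.sum())
print(f"trace = {np.trace(C):.2f}, sum of eigenvalues = {eigvals.sum():.2f}")
```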
Premise: Let $\mathbf{C} \in \mathbb{R}^{n \times n}$ be a symmetric matrix ($\mathbf{C} = \mathbf{C}^T$). Let $\mathbf{q}_1, \mathbf{q}_2$ be eigenvectors with corresponding distinct eigenvalues $\lambda_1, \lambda_2$ (where $\lambda_1 \neq \lambda_2$).
Starting from the two eigen-equations $\mathbf{C}\mathbf{q}_1 = \lambda_1 \mathbf{q}_1$ and $\mathbf{C}\mathbf{q}_2 = \lambda_2 \mathbf{q}_2$.
From the first eigen-equation:
$$(\lambda_1 \mathbf{q}_1)^T \cdot \mathbf{q}_2 = (\mathbf{C}\mathbf{q}_1)^T \cdot \mathbf{q}_2$$ By the property of transposes $(AB)^T = B^T A^T$: $$\lambda_1 \mathbf{q}_1^T \mathbf{q}_2 = \mathbf{q}_1^T \mathbf{C}^T \mathbf{q}_2$$
Apply the symmetry property. Since $\mathbf{C} = \mathbf{C}^T$: $$\lambda_1 \mathbf{q}_1^T \mathbf{q}_2 = \mathbf{q}_1^T \mathbf{C} \mathbf{q}_2$$
From the second eigen equation: $$\mathbf{q}_1^T \cdot (\lambda_2 \mathbf{q}_2) = \mathbf{q}_1^T \cdot (\mathbf{C} \mathbf{q}_2) $$ $$\lambda_2 \mathbf{q}_1^T \mathbf{q}_2 = \mathbf{q}_1^T \mathbf{C} \mathbf{q}_2$$
Subtracting the second result from the first: $$(\lambda_1 - \lambda_2) (\mathbf{q}_1^T \mathbf{q}_2) = 0$$
Conclusion: Since we assumed the eigenvalues are distinct ($\lambda_1 - \lambda_2 \neq 0$), the scalar product must be zero:$$\mathbf{q}_1^T \mathbf{q}_2 = 0$$This proves that the eigenvectors $\mathbf{q}_1$ and $\mathbf{q}_2$ are orthogonal.
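This orthogonality can be confirmed numerically; `np.linalg.eigh` returns the eigenvectors of a symmetric matrix as the columns of an orthonormal matrix:

```python
import numpy as np

C = np.array([[2.0, 0.8], [0.8, 0.6]])
eigvals, Q = np.linalg.eigh(C)  # columns of Q are the eigenvectors

# q1^T q2 should be (numerically) zero
assert abs(Q[:, 0] @ Q[:, 1]) < 1e-12
# and Q^T Q should be the identity
assert np.allclose(Q.T @ Q, np.eye(2))
print("Eigenvectors are orthonormal.")
```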
Eigen-decomposition is the process of "deconstructing" a square matrix into a set of characteristic directions and scales. In the context of data science and robotics, it reveals the hidden geometric structure of a linear transformation or a covariance matrix.
1. The Starting Point: The Vector Level
By definition, for an individual eigenvector $\mathbf{q}_i$ and its eigenvalue $\lambda_i$, we have:
$$\mathbf{C}\mathbf{q}_i = \lambda_i \mathbf{q}_i$$
This tells us that $\mathbf{C}$ acts on $\mathbf{q}_i$ just like a simple scalar multiplication (no rotation).
2. Scaling to the Matrix Level
If our matrix $\mathbf{C}$ is $n \times n$ and has $n$ linearly independent eigenvectors, we can "stack" these individual equations side-by-side.
Let $\mathbf{Q}$ be the matrix where each column is one of these eigenvectors:
$$\mathbf{Q} = \begin{bmatrix} | & | & & | \\ \mathbf{q}_1 & \mathbf{q}_2 & \dots & \mathbf{q}_n \\ | & | & & | \end{bmatrix}$$
If we multiply $\mathbf{C}$ by this entire matrix $\mathbf{Q}$, the rules of matrix multiplication tell us:
$$\mathbf{C}\mathbf{Q} = \begin{bmatrix} | & | & & | \\ \mathbf{C}\mathbf{q}_1 & \mathbf{C}\mathbf{q}_2 & \dots & \mathbf{C}\mathbf{q}_n \\ | & | & & | \end{bmatrix}$$
Using our definition from Step 1, we can replace each $\mathbf{C}\mathbf{q}_i$ with $\lambda_i\mathbf{q}_i$:
$$\mathbf{C}\mathbf{Q} = \begin{bmatrix} | & | & & | \\ \lambda_1\mathbf{q}_1 & \lambda_2\mathbf{q}_2 & \dots & \lambda_n\mathbf{q}_n \\ | & | & & | \end{bmatrix}$$
3. Factoring out the Eigenvalues
Notice that the right side can be rewritten as the product of $\mathbf{Q}$ and a diagonal matrix $\mathbf{\Lambda}$:
$$\mathbf{C}\mathbf{Q} = \mathbf{Q}\mathbf{\Lambda}$$
Where $\mathbf{\Lambda} = \text{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$.
4. Completing the Factorization
Since we assumed the eigenvectors are linearly independent, the matrix $\mathbf{Q}$ is guaranteed to be invertible. We can therefore multiply both sides by $\mathbf{Q}^{-1}$ from the right:
$$\mathbf{C}\mathbf{Q}\mathbf{Q}^{-1} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^{-1}$$$$\mathbf{C} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^{-1}$$
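A short sketch verifying the factorization on the (non-symmetric) example matrix from earlier, where $\mathbf{Q}^{-1}$ really is needed:

```python
import numpy as np

C = np.array([[3., 1.], [0., 2.]])  # example matrix from above
eigvals, Q = np.linalg.eig(C)       # columns of Q = eigenvectors
Lam = np.diag(eigvals)

# Reassemble: C should equal Q Lambda Q^{-1}
C_rebuilt = Q @ Lam @ np.linalg.inv(Q)
assert np.allclose(C, C_rebuilt)
print("C = Q Lambda Q^{-1} holds numerically.")
```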
Key Requirements (The "Catch")
Not every matrix can be factorized this way. For this math to work, two conditions must be met: the matrix must be square, and it must have $n$ linearly independent eigenvectors (such a matrix is called diagonalizable).
If a matrix $\mathbf{C} \in \mathbb{R}^{n \times n}$ has $n$ linearly independent eigenvectors, we can group them into a matrix $\mathbf{Q}$ and the eigenvalues into a diagonal matrix $\mathbf{\Lambda}$ (Lambda):
$$\mathbf{C} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^{-1}$$
The Connection to the Sandwich Product:
If $\mathbf{C}$ is a symmetric matrix (like a covariance matrix), its eigenvectors are orthogonal (see above), meaning $\mathbf{Q}^{-1} = \mathbf{Q}^T$. This reveals that:
$$\mathbf{C} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T$$
Notice the structure!
This is the exact same sandwich product we used for rotations. This tells us that any covariance matrix can be viewed as a simple diagonal matrix $\mathbf{\Lambda}$ (where variances are independent) that has been actively rotated into its current position by the eigenvector matrix $\mathbf{Q}$.
Summary Table:
| Concept | Geometric Meaning | Role in PCA |
|---|---|---|
| Eigenvector | An axis that is not rotated by the matrix. | A Principal Component (a new feature axis). |
| Eigenvalue | The stretch factor along an eigenvector. | The "importance" or variance of that axis. |
| Diagonalization | Rotating the space to align with eigenvectors. | Decoupling features to make them uncorrelated. |
Why Factorize? The Change of Basis Perspective
To understand why we write $\mathbf{C} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^{-1}$, think of it as a three-step pipeline for processing data or robot states. Instead of applying a complex transformation $\mathbf{C}$ directly, we "simplify" the space first.
Computational Advantage: In Applied CS, factorization is used for efficiency. If you need to calculate the state of a system 100 steps into the future, calculating $\mathbf{C}^{100}$ is computationally expensive and prone to rounding errors. However, using the factorized form:
$$\mathbf{C}^{100} = (\mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^{-1})^{100} = \mathbf{Q}\mathbf{\Lambda}^{100}\mathbf{Q}^{-1}$$
Since $\mathbf{\Lambda}$ is diagonal, $\mathbf{\Lambda}^{100}$ is just each diagonal element $\lambda_i$ raised to the power of 100. This turns a massive matrix-matrix multiplication problem into a simple element-wise power problem.
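A sketch of this trick, comparing the eigen-decomposition route against NumPy's `matrix_power` (which performs repeated matrix multiplication):

```python
import numpy as np

C = np.array([[2.0, 0.8], [0.8, 0.6]])
eigvals, Q = np.linalg.eigh(C)  # symmetric, so Q^{-1} = Q^T

# Element-wise power on the eigenvalues instead of 100 matrix products
C_100 = Q @ np.diag(eigvals**100) @ Q.T

# Compare against repeated matrix multiplication
assert np.allclose(C_100, np.linalg.matrix_power(C, 100))
print("Eigen-decomposition power matches matrix_power.")
```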
For a symmetric positive-definite covariance matrix $\mathbf{C} = \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^T$, the inverse $\mathbf{C}^{-1}$ is calculated by applying the inverse property of matrix products $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$:
$$\mathbf{C}^{-1} = \left( \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T \right)^{-1} = (\mathbf{Q}^T)^{-1} \mathbf{\Lambda}^{-1} \mathbf{Q}^{-1} = \mathbf{Q}\mathbf{\Lambda}^{-1}\mathbf{Q}^T$$
where $\mathbf{\Lambda}^{-1} = \text{diag}(1/\lambda_1, \dots, 1/\lambda_n)$; this exists because every eigenvalue of a positive-definite matrix is strictly positive.
The same decomposition, with normalized eigenvectors $\mathbf{q}_i^T \mathbf{q}_j = \delta_{ij}$ in $\mathbf{Q}$, also defines the square root of $\mathbf{C}$:
$$\mathbf{C}^{\frac{1}{2}} = \mathbf{Q} \mathbf{\Lambda}^{\frac{1}{2}} \mathbf{Q}^T$$
where $\mathbf{\Lambda}^{\frac{1}{2}}$ is the diagonal matrix containing the square roots of the (positive) eigenvalues $\sqrt{\lambda_i}$.
This definition is consistent because multiplying the matrix by itself retrieves the original covariance matrix $\mathbf{C}$:
$$\mathbf{C} = \mathbf{C}^{\frac{1}{2}} \mathbf{C}^{\frac{1}{2}} = \mathbf{Q} \mathbf{\Lambda}^{\frac{1}{2}} \mathbf{Q}^T \mathbf{Q} \mathbf{\Lambda}^{\frac{1}{2}} \mathbf{Q}^T = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T$$
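Both the inverse formula and the square-root formula can be verified in a few lines (a sketch, using the same stand-in covariance matrix):

```python
import numpy as np

C = np.array([[2.0, 0.8], [0.8, 0.6]])
eigvals, Q = np.linalg.eigh(C)  # eigenvalues are positive (C is PD)

C_sqrt = Q @ np.diag(np.sqrt(eigvals)) @ Q.T
C_inv  = Q @ np.diag(1.0 / eigvals) @ Q.T

# The square root squares back to C; the inverse matches np.linalg.inv
assert np.allclose(C_sqrt @ C_sqrt, C)
assert np.allclose(C_inv, np.linalg.inv(C))
print("Square root and inverse formulas check out.")
```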
Beyond data science, eigen-decomposition is the primary tool used to determine if a system—such as a self-balancing robot or an autonomous vehicle—will remain under control or spiral into chaos.
The Mathematical Intuition
Imagine a robot’s state (position and velocity) is described by a vector $\mathbf{x}$. The way this state changes over time is often modeled by a system of linear equations:
$$\mathbf{x}_{t+1} = \mathbf{Fx}_t$$
If we apply this repeatedly, the state at any time $n$ is $\mathbf{x}_n = \mathbf{F}^n \mathbf{x}_0$. This is where eigen-decomposition ($\mathbf{F} = \mathbf{VDV}^{-1}$) becomes powerful. It allows us to view the system not as a tangled mess of variables, but as a set of independent "modes":$$\mathbf{x}_n = \mathbf{V} \mathbf{D}^n \mathbf{V}^{-1} \mathbf{x}_0$$
Why the Eigenvalues ($\lambda$) Dictate Reality
The diagonal matrix $\mathbf{D}^n$ contains the eigenvalues raised to the power of $n$ ($\lambda_1^n, \lambda_2^n, \dots$). This reveals the "long-term fate" of the robot: if all $|\lambda_i| < 1$, every mode decays and the system settles (stable); if any $|\lambda_i| > 1$, that mode grows exponentially (unstable); $|\lambda_i| = 1$ marks the boundary between the two (marginally stable).
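A small simulation sketch with two hypothetical system matrices (illustrative numbers, not taken from a real robot model): one with all $|\lambda| < 1$, one with a single $|\lambda| > 1$:

```python
import numpy as np

x0 = np.array([1.0, 1.0])

# Hypothetical system matrices (illustrative only)
F_stable   = np.array([[0.9, 0.1], [0.0, 0.8]])   # |lambda| < 1: modes decay
F_unstable = np.array([[1.05, 0.1], [0.0, 0.8]])  # one |lambda| > 1: blow-up

def state_after(F, n, x0):
    # x_n = V D^n V^{-1} x_0, computed via the eigen-decomposition of F
    D, V = np.linalg.eig(F)
    return (V @ np.diag(D**n) @ np.linalg.inv(V) @ x0).real

print(np.linalg.norm(state_after(F_stable, 100, x0)))    # shrinks toward 0
print(np.linalg.norm(state_after(F_unstable, 100, x0)))  # grows large
```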
Example: The "Segway" Problem
Consider a self-balancing robot. The matrix $\mathbf{F}$ captures the physics of gravity. Previously, we used $\mathbf{x}_k = \mathbf{F}\mathbf{x}_{k-1} + \mathbf{G}\mathbf{u}_{k-1}$ for passive prediction. However, if $\mathbf{F}$ has an eigenvalue $\lambda > 1$ (e.g., $\lambda = 1.05$), our model simply predicts a crash as errors grow by 5% every step.
To prevent this, the computer scientist implements a feedback loop by defining the control input as $\mathbf{u}_k = -\mathbf{K}\mathbf{x}_k$.
Here, $\mathbf{K}$ is the Gain Matrix—the software logic that determines how strongly the robot reacts to its current state. By substituting this back into our prediction model, we "rewrite" the physics:
$$\mathbf{x}_k = (\mathbf{F} - \mathbf{GK})\mathbf{x}_{k-1}$$
By choosing a $\mathbf{K}$ that shifts all eigenvalues of this new matrix $(\mathbf{F} - \mathbf{GK})$ into the unit circle ($|\lambda| < 1$), we turn an unstable "falling" system into a stable "self-correcting" one.
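A minimal sketch of this idea; the matrices and the gain $\mathbf{K}$ below are hand-picked illustrative numbers (a real design would use pole placement or LQR):

```python
import numpy as np

# Hypothetical open-loop dynamics and input matrix (illustrative only)
F = np.array([[1.05, 0.1], [0.0, 0.8]])
G = np.array([[0.0], [1.0]])

print(np.abs(np.linalg.eigvals(F)).max())  # > 1: open loop is unstable

# Hand-picked gain that places both closed-loop eigenvalues at 0.5
K = np.array([[3.025, 0.85]])

F_closed = F - G @ K
print(np.abs(np.linalg.eigvals(F_closed)).max())  # < 1: closed loop is stable
```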
Key Takeaway: In AI and Robotics, eigen-decomposition isn't just about "summarizing" data (like PCA); it is about designing the future. It allows us to verify if our software's logic ($\mathbf{K}$) will result in a smooth trajectory or a catastrophic crash.