In PCA, we treat the Covariance Matrix $\mathbf{C}$ as a Rank 2 tensor that describes the physical "shape" of our data cloud. By applying eigen-decomposition, we perform a passive rotation to find a new coordinate system where the data is most simply described.
When we decompose $\mathbf{C} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T$, we are essentially "un-sandwiching" the matrix to find its core properties: $\mathbf{Q}$ holds the orthonormal eigenvectors (the rotation to the data's natural axes), and $\mathbf{\Lambda}$ holds the eigenvalues (the variance along each of those axes). The following script visualizes both:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def draw_confidence_ellipse(mu, sigma, ax, n_std=2.0, facecolor='none', **kwargs):
    """
    Utility to create a plot of the covariance confidence ellipse.
    """
    # 1. Compute the eigen-decomposition of the covariance matrix
    vals, vecs = np.linalg.eigh(sigma)
    order = vals.argsort()[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # 2. Angle of the major axis, taken from the first eigenvector
    theta = np.degrees(np.arctan2(*vecs[:, 0][::-1]))
    # 3. Width and height span n_std standard deviations (sqrt of eigenvalues)
    width, height = 2 * n_std * np.sqrt(vals)
    # 4. Create and attach the ellipse
    ell = Ellipse(xy=mu, width=width, height=height, angle=theta,
                  facecolor=facecolor, **kwargs)
    return ax.add_patch(ell)
# --- Parameters & Data ---
np.random.seed(42)
od = 0.8
Sigma = np.array([[2, od], [od, 0.6]])
mu = np.array([2., 1.])
nb_data = 100
X = np.random.multivariate_normal(mu, Sigma, nb_data)
# --- Statistics ---
mean_emp = X.mean(axis=0)
cov_mat = np.cov(X, rowvar=False)
# Use eigh (cov_mat is symmetric) and sort in descending order so that
# eig_vals[0] / eig_vecs[:, 0] correspond to the first principal component
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)
order = eig_vals.argsort()[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
# --- Visualization ---
fig, ax = plt.subplots(figsize=(7, 7))
# 1. Plot the sampled data
ax.scatter(X[:, 0], X[:, 1], c='b', marker='x', alpha=0.6, label="Sampled Data")
# 2. Plot the TRUE underlying distribution (The Ellipse)
# We draw the 2-sigma ellipse (approx 95% of data)
draw_confidence_ellipse(mu, Sigma, ax, n_std=2.0, edgecolor='black',
                        linestyle='--', label=r'True Dist. (2$\sigma$)')
# 3. Plot the Eigenvectors (Empirical)
colors = ['r', 'g']
for i in range(2):
    std_dev = np.sqrt(eig_vals[i])
    v = eig_vecs[:, i] * std_dev
    ax.quiver(mean_emp[0], mean_emp[1], v[0], v[1],
              color=colors[i], angles='xy', scale_units='xy', scale=1,
              label=fr'PC {i+1} ($\sigma$)')
# Formatting
ax.axhline(0, color='grey', lw=0.5)
ax.axvline(0, color='grey', lw=0.5)
ax.set_aspect('equal')
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.set_title('PCA and Underlying Gaussian Distribution')
ax.legend()
plt.grid(True, alpha=0.3)
plt.show()
The visualization summarizes the Eigen-decomposition of a Covariance Matrix.
The Takeaway
The plot shows that PCA has found a new coordinate system. Instead of using $x_1$ and $x_2$, we can describe the data more efficiently using the red and green axes. In dimensionality reduction, we would keep the red axis ($PC_1$) and discard the green one ($PC_2$) because it contains less variance.
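As a quick numerical sanity check (a sketch reusing cov_mat, eig_vals, and eig_vecs from the script above), we can confirm the "un-sandwiching" identity $\mathbf{C} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T$ directly:
# Q = eigenvector matrix, Lambda = diagonal matrix of eigenvalues
Q, Lam = eig_vecs, np.diag(eig_vals)
assert np.allclose(cov_mat, Q @ Lam @ Q.T)   # re-sandwiching recovers C
assert np.allclose(Q.T @ Q, np.eye(2))       # Q is orthogonal: a pure rotation/reflection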
To project the data onto the first Principal Component, we mathematically "collapse" every 2D point onto the line defined by the first eigenvector.
The Projection Step
If $\mathbf{v}_1$ is our first eigenvector (the red arrow), the 1D projection $z^{(k)}$ of a centered data point $\tilde{\mathbf{x}}^{(k)}$ is calculated via the dot product:

$$z^{(k)} = \tilde{\mathbf{x}}^{(k)} \cdot \mathbf{v}_1$$

This gives us a single coordinate representing the point's position along the axis of maximum variance.
# 1. Center the data (X is the original 100x2 array from above)
mean_emp = X.mean(axis=0)
X_centered = X - mean_emp
# 2. Get the first eigenvector (the direction of the red arrow)
# After the descending sort above, column 0 corresponds to the largest eigenvalue
v1 = eig_vecs[:, 0]
# 3. Project the 2D data onto the 1D line
# This results in 100 scalar values
z1 = X_centered @ v1
# 4. Reconstruct for visualization
# This maps the 1D values back into 2D space along the PC1 axis
X_reconstructed = np.outer(z1, v1) + mean_emp
# --- Visualization ---
plt.figure(figsize=(6, 6))
plt.scatter(X[:, 0], X[:, 1], c='b', marker='x', alpha=0.3, label="Original 2D Data")
plt.scatter(X_reconstructed[:, 0], X_reconstructed[:, 1], c='r', label="Projected 1D Data")
# Draw the axis line
plt.plot(X_reconstructed[:, 0], X_reconstructed[:, 1], color='r', alpha=0.2)
plt.gca().set_aspect('equal')
plt.legend()
plt.title("Projection onto the 1st Principal Component")
plt.show()
Why this matters
What is happening mathematically?
We are performing a change of basis: multiplying the centered data by the matrix of eigenvectors re-expresses every point in coordinates aligned with the data's natural axes.
The "magic" of PCA happens when we decide to ignore the eigenvectors with small eigenvalues.
If we have a 100-dimensional dataset, but the first 3 eigenvalues capture 95% of the total variance, we can "project" our data onto those 3 eigenvectors and discard the rest. We effectively simplify our "world" from 100 dimensions down to 3, while keeping almost all the important structural information.
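As a minimal sketch of that idea, using made-up high-dimensional data (the variable names and dimensions here are illustrative, not from the example above): plant a 3-dimensional signal inside a 100-dimensional space, then recover it by keeping the top 3 eigenvectors.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: a rank-3 signal embedded in 100 dimensions, plus faint noise
n, d, k = 500, 100, 3
signal = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) * 3.0
X_hd = signal + rng.normal(scale=0.1, size=(n, d))

Xc = X_hd - X_hd.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = vals.argsort()[::-1]                  # descending variance
vals, vecs = vals[order], vecs[:, order]

Z = Xc @ vecs[:, :k]                          # 500 x 3 reduced data
print(f"Variance kept by top {k} PCs: {vals[:k].sum() / vals.sum():.1%}")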
A Scree Plot is a diagnostic tool used to visualize the importance of each Principal Component. It plots the eigenvalues (variance) against the component number.
In a tensor context, the scree plot tells us which "dimensions" of the Rank 2 tensor are physically significant and which are just noise. We look for the "elbow" of the plot—the point where the variance drops off sharply.
Python Code: Generating a Scree Plot
This script calculates PCA on a synthetic dataset and visualizes how much "information" each eigenvalue captures.
import numpy as np
import matplotlib.pyplot as plt
# 1. Create synthetic data with 5 dimensions
# (Only 2 dimensions will have significant variance)
np.random.seed(42)
n_samples = 100
data = np.dot(np.random.randn(n_samples, 2), [[2, 0.5], [0.5, 1]]) # Main signal
noise = np.random.normal(0, 0.2, (n_samples, 3)) # Noise
X = np.hstack([data, noise])
# 2. PCA Steps
X_centered = X - np.mean(X, axis=0)
cov_matrix = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
# Sort eigenvalues (and matching eigenvectors) in descending order
eigenvalues = eigenvalues[::-1]
eigenvectors = eigenvectors[:, ::-1]
total_variance = np.sum(eigenvalues)
explained_variance_ratio = eigenvalues / total_variance
# 3. Plotting the Scree Plot
plt.figure(figsize=(8, 5))
components = np.arange(1, len(eigenvalues) + 1)
# Bar plot for individual variance
plt.bar(components, explained_variance_ratio, alpha=0.7, color='blue', label='Individual Variance')
# Step plot for cumulative variance
plt.step(components, np.cumsum(explained_variance_ratio), where='mid', color='red', label='Cumulative Variance')
plt.xlabel('Principal Component Index')
plt.ylabel('Proportion of Explained Variance')
plt.title('Scree Plot: Finding the "Elbow"')
plt.xticks(components)
plt.legend()
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
The Bars: Show how much of the "Total Variance" (the Trace) is captured by each eigenvector.
The Line: Shows the cumulative total. If the first two components reach 90%, it means you can describe the 5D dataset using only 2 dimensions with minimal information loss.
The "Elbow": The point where the curve flattens out. In a tensor interpretation, this is where we distinguish the Signal (the primary axes of the tensor) from the Noise (the tiny, insignificant fluctuations).
Dimensionality reduction is the ultimate goal of PCA. From a tensor perspective, we are identifying which dimensions of the space are physically meaningful (the signal) and which are just background fluctuations (the noise).
By looking at the Scree Plot, we can decide to keep only the top $k$ eigenvectors. We then form a projection matrix $\mathbf{V}$ using these $k$ vectors. When we project our data:

$$\mathbf{X}_{reduced} = \mathbf{X}_{centered} \mathbf{V}$$

we are performing a passive rotation into a new coordinate system and then simply "ignoring" the axes that don't carry enough variance.
The Geometry of Truncation
When we perform PCA, we aren't just "deleting" columns of data; we are choosing a new subspace that best represents the "energy" or "mass" of our data cloud.
Constructing the Projection Matrix ($\mathbf{V}$)
After sorting our eigenvalues ($\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$), we select the first $k$ eigenvectors to form our projection matrix $\mathbf{V}$. This matrix is a $d \times k$ operator.
The Projection Equation: $\mathbf{X}_{reduced} = \mathbf{X}_{centered} \mathbf{V}$
This multiplication is a Passive Transformation. For every data point (row) in $\mathbf{X}_{centered}$, the dot product with each of the $k$ eigenvectors yields that point's coordinate along the corresponding natural axis.
By doing this, we are effectively looking at the data from the perspective of the "best" $k$ axes and ignoring the remaining $d-k$ axes.
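Continuing from the scree-plot script (where the eigenvectors are now sorted in descending order), a minimal sketch of constructing $\mathbf{V}$ and projecting:
# Top-k eigenvectors become the columns of the d x k projection matrix V
k = 2
V = eigenvectors[:, :k]
X_reduced = X_centered @ V              # passive rotation, then dropping d-k axes
print(X.shape, "->", X_reduced.shape)   # (100, 5) -> (100, 2)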
Why it is "Truncating the Tensor"
In its full form, the covariance matrix (our Rank 2 tensor) is:

$$\mathbf{C} = \sum_{i=1}^{d} \lambda_i \mathbf{v}_i \mathbf{v}_i^T$$

When we reduce dimensions, we are approximating the tensor by cutting off the end of this sum:

$$\mathbf{C}_{approx} \approx \sum_{i=1}^{k} \lambda_i \mathbf{v}_i \mathbf{v}_i^T$$

We are claiming that the "physical relationship" described by the tensor is almost entirely contained within those first $k$ directions. The remaining variance (the sum of the ignored eigenvalues) is treated as isotropic noise: random jitter that doesn't follow a meaningful pattern.
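A quick illustration of the truncation (a sketch reusing the sorted eig_vals and eig_vecs from the 2D example, with $k = 1$):
# Rebuild C as a sum of rank-1 outer products, then cut the sum at k = 1
C_full = sum(eig_vals[i] * np.outer(eig_vecs[:, i], eig_vecs[:, i]) for i in range(2))
C_approx = eig_vals[0] * np.outer(eig_vecs[:, 0], eig_vecs[:, 0])
assert np.allclose(C_full, cov_mat)                               # full sum recovers C exactly
print("Frobenius error:", np.linalg.norm(cov_mat - C_approx))     # equals eig_vals[1]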
Summary of Information Loss
The difference between $\mathbf{X}_{centered}$ and its reconstruction $\mathbf{X}_{reconstructed} = \mathbf{X}_{reduced} \mathbf{V}^T$ is the Reconstruction Error.
If the eigenvalues we discarded are near zero, the error is negligible. In tensor terms, we have successfully removed the "dimensions of least resistance," keeping only the axes where the data actually lives.
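A sketch of measuring that error, continuing from the 2D projection code above (z1, v1, and X_centered are already defined); the mean squared residual should closely match the discarded eigenvalue, up to the $1/n$ vs $1/(n-1)$ convention:
residual = X_centered - np.outer(z1, v1)        # what the rank-1 model misses
mse = np.mean(np.sum(residual**2, axis=1))      # mean squared reconstruction error
print(mse, "vs discarded eigenvalue:", eig_vals[1])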
Measuring Information Retention
In the tensor perspective, the Trace of the covariance matrix represents the "Total Energy" or "Total Variance" of the system. Since the Trace is invariant under rotation, the sum of our eigenvalues must equal the total variance of the original data. When we reduce the dimensions from $d$ to $k$, we can calculate the percentage of information kept using this formula:

$$\text{Explained Variance} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{d} \lambda_j}$$
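A two-line check of that invariance, reusing cov_mat and eig_vals from the 2D example:
assert np.isclose(np.trace(cov_mat), eig_vals.sum())    # total variance is basis-independent
print("Explained variance of PC1:", eig_vals[0] / eig_vals.sum())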
| Step in PCA | Tensor Interpretation |
|---|---|
| Calculate $\mathbf{C}$ | Defining the Rank 2 "relationship" between all variables. |
| Find Eigenvectors | Finding the "Natural Axes" (characteristic directions) of the data cloud. |
| Sort Eigenvalues | Ranking axes by how much "information" (variance) they carry. |
| Project Data | A Change of Basis (Passive Rotation) to the new coordinate system. |
| Dimension Reduction | Truncating the tensor to its most significant physical dimensions. |