
Applications of PCA in Image Processing¶

In [ ]:
import numpy as np
import pickle
import os
import urllib.request
import tarfile
%matplotlib inline

from matplotlib import pyplot as plt

from sklearn.decomposition import PCA

Data¶

The following Python code downloads the image data and brings it into the required shape.

In [ ]:
# --- 1. Configuration and Paths ---
url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
# We define a local folder to store the data
data_dir = os.path.expanduser("~/tmp/cifar_data")  # expanduser resolves '~' to the home directory
tar_filename = "cifar-10-python.tar.gz"
tar_path = os.path.join(data_dir, tar_filename)
# This is the folder created after extraction
batch_folder = os.path.join(data_dir, "cifar-10-batches-py")

# --- 2. Automated Download ---
if not os.path.exists(data_dir):
    os.makedirs(data_dir)

if not os.path.exists(tar_path):
    print(f"Downloading CIFAR-10 from {url}...")
    # This is the Python equivalent of 'wget'
    urllib.request.urlretrieve(url, tar_path)
    print("Download finished.")
else:
    print("Archive already exists, skipping download.")

# --- 3. Automated Extraction ---
if not os.path.exists(batch_folder):
    print("Extracting archive...")
    # This is the Python equivalent of 'tar xf'
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(path=data_dir)
    print("Extraction finished.")
else:
    print("Data already extracted.")

# --- 4. Loading the Data ---
def load_cifar_batch(file_path):
    with open(file_path, 'rb') as f:
        # encoding='bytes' is required for CIFAR-10 pickle files
        data_dict = pickle.load(f, encoding='bytes')
    return data_dict

# Load the first batch
batch_1_path = os.path.join(batch_folder, "data_batch_1")
cifar_10 = load_cifar_batch(batch_1_path)

X_original = cifar_10[b'data'] # Shape: (10000, 3072)

# --- 5. Pre-processing for PCA ---
# Reshape to (N, Channels, Height, Width) -> (10000, 3, 32, 32)
X_reshaped = X_original.reshape((-1, 3, 32, 32))
# Transpose to (N, Height, Width, Channels) for visualization -> (10000, 32, 32, 3)
X_images = X_reshaped.transpose(0, 2, 3, 1)
# Back to design matrix for PCA (N, 3072)
X_data = X_images.reshape(-1, 3072)
print(f"Successfully loaded data matrix with shape: {X_data.shape}")

def show(i, autoscale = False, title="Sample Image"):
    plt.figure(figsize=(1.5, 1.5))
    image = i.reshape((32, 32, 3))
    if autoscale:
        # Min-Max Scaling: (value - min) / (max - min)
        # This maps the tensor values to the [0, 1] range required by imshow
        m, M = image.min(), image.max()
        # Adding a small epsilon prevents division by zero if the image is a solid color
        plt.imshow((image - m) / (M - m + 1e-8))
    else:
        plt.imshow(image)
    plt.title(title)
    plt.axis('off')
    plt.show()
    
    
show(X_data[12])

Geometric transformations¶

Before we apply the PCA algorithm, we need to transform the images. In the context of machine learning, this is often called mean subtraction and length normalization.

1. Centering: The Translation ($\mathbf{X} - \mu$)¶

Mathematically, this is a vector translation.

  • What is mean(axis=0)? It is the "Average Image" of your entire dataset. It represents the shared characteristics of all CIFAR-10 images (the common background colors, average lighting, etc.).
  • The Transformation: By subtracting this average, you shift the entire data cloud so that its center (the centroid) is at the origin of the coordinate system.
  • Why for PCA? PCA finds directions of maximum variance relative to the origin. If you don't center the data, the first "Principal Component" will just point from $(0,0, \dots, 0)$ to the middle of your data blob. Centering forces PCA to ignore the "average" and focus on the differences between images.
In [ ]:
# This shifts the entire data cloud so that its centroid (the average image) sits at the origin.
X_data = X_data - X_data.mean(axis=0)

2. Unit Norm Scaling: The Projection to the Hypersphere¶

This is a scaling operation that acts on each image vector individually.

  • The Math: For each image vector $\mathbf{x}_i$, it calculates the Euclidean norm (length): $||\mathbf{x}_i|| = \sqrt{\sum x_{ij}^2}$. It then divides every pixel ($x_{ij}$) in that image by that length.
  • The Geometry: It pushes or pulls every image point until it sits exactly on the surface of a unit hypersphere (a sphere with a radius of $1$).
  • The Effect on Images: This removes "Global Intensity."
    • A very bright image of a car and a very dark image of the same car become nearly identical vectors.
    • The "Magnitude" (brightness) is discarded, and only the "Direction" (the relative pattern of pixels) is kept.
In [ ]:
# Unit Norm scaling: ensure that every image vector has a length (magnitude) of exactly 1.
X_data = X_data / np.sqrt((X_data ** 2).sum(axis=1))[:,None]
X_data.dtype, X_data.shape

Summary¶

After these two steps, X_data has a specific structure:

  • The Average is (Approximately) Zero: Centering moved the centroid of the data cloud to the origin; the per-image scaling in step 2 perturbs the column means only slightly, so summing all pixel values across the whole dataset now gives a result close to $0$.
  • The Magnitude is Constant: Every single image in the batch now has a total "energy" (Euclidean length) of exactly $1.0$.
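
Both properties can be checked on synthetic data (a minimal sketch, independent of the CIFAR pipeline; the assertions mirror the two bullet points):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))  # toy stand-in for the (10000, 3072) design matrix

# 1. Centering: subtract the per-feature mean (the "average image")
X = X - X.mean(axis=0)
assert abs(X.sum()) < 1e-9      # the grand sum is now (numerically) zero

# 2. Unit-norm scaling: divide each row by its Euclidean length
X = X / np.linalg.norm(X, axis=1, keepdims=True)
assert np.allclose(np.linalg.norm(X, axis=1), 1.0)  # every row sits on the unit hypersphere
```

Note that the per-row scaling in step 2 slightly perturbs the column means, which is why centering is done first.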

Application: Image Compression via Dimensionality Reduction¶

Image compression is a direct application of Truncating the Tensor. Instead of keeping all $3072$ pixels ($32 \times 32 \times 3$), we keep only the $k$ most significant eigenvectors (the ones with the largest eigenvalues).

The Process:

  1. Channel Separation: We treat the Red, Green, and Blue channels as separate sets of data.
  2. Eigen-decomposition: We find the eigenvectors of the pixel-covariance matrix for each channel.
  3. Truncation: We keep n_components instead of $1024$ ($32^2$).
  4. Change of Basis: We project the pixels onto this smaller subspace.
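
The four steps can be sketched by hand with NumPy (synthetic data standing in for one CIFAR channel; the shapes are scaled down for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy "channel": 200 images, 64 pixels each (stand-in for the 1024-pixel CIFAR channels)
X = rng.normal(size=(200, 64))
X = X - X.mean(axis=0)

k = 8  # number of components to keep

# 2. Eigen-decomposition of the pixel-covariance matrix
C = X.T @ X / (X.shape[0] - 1)          # (64, 64)
eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns eigenvalues in ascending order
# 3. Truncation: keep the k eigenvectors with the largest eigenvalues
Q = eigvecs[:, ::-1][:, :k]             # (64, k)
# 4. Change of basis: project the pixels onto the smaller subspace
Z = X @ Q                               # (200, k)

assert Z.shape == (200, 8)
```

The variance of each score column `Z[:, j]` equals the corresponding eigenvalue, which is exactly the ordering PCA exploits.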
In [ ]:
X = X_data.copy()
X = X.reshape(-1, 32*32, 3)

Encoder¶

In [ ]:
n_components = 1000  # number of components computed by the encoder; the decoder may use at most this many
In [ ]:
# do a PCA compression for a given channel
def compress_channel(channel, n_components=n_components):
    
    # Initialize the PCA model to find the 'n_components' most important directions (eigenvectors)
    pca = PCA(n_components=n_components)
    
    # Calculate the covariance matrix and its eigen-decomposition
    pca.fit(channel)
    
    # Project the data into the reduced subspace by dotting it with the
    # eigenvectors (written out explicitly here for educational clarity).
    # Note: pca.transform(channel) is equivalent up to the mean-centering
    # that sklearn applies internally.
    compressed_channel = np.dot(channel, pca.components_.T)
    
    # Alternatively, both steps can be done with fit_transform:
    # 1. Fit: calculate the covariance matrix and its eigen-decomposition.
    # 2. Transform: project the pixels onto the new principal component axes.
    # compressed_channel = pca.fit_transform(channel)
    
    return compressed_channel, pca 

1. The Design Matrix ($\mathbf{X}$):

The channel input is a 2d-array $\mathbf{X} \in \mathbb{R}^{N \times D}$.

  • $N$: Number of images in the batch.
  • $D$: Number of pixels per channel ($32 \times 32 = 1024$).

2. The Covariance and Eigen-decomposition

The call pca.fit() calculates the Covariance Matrix $\mathbf{C}$:

$$\mathbf{C} = \frac{1}{N-1} \mathbf{X}^T \mathbf{X}$$

It then solves the Eigenvalue Problem:

$$\mathbf{C}\mathbf{q} = \lambda \mathbf{q}$$

This produces $D$ eigenvectors ($\mathbf{q}$). The eigenvectors with the $k$ largest eigenvalues ($\lambda$) are the principal components. These are stored in pca.components_.
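
This correspondence can be verified directly on synthetic data (a sketch, not part of the notebook's pipeline): the rows of pca.components_ match the leading eigenvectors of $\mathbf{C}$ up to sign, and explained_variance_ holds the corresponding eigenvalues.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))
X = X - X.mean(axis=0)

C = X.T @ X / (X.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(C)   # ascending order
top = eigvecs[:, ::-1][:, :5]          # the 5 largest-eigenvalue eigenvectors

pca = PCA(n_components=5).fit(X)

# Each PCA component equals an eigenvector of C up to sign,
# so the absolute dot product of the unit vectors is 1
for j in range(5):
    assert np.isclose(abs(pca.components_[j] @ top[:, j]), 1.0)
# The explained variances are the corresponding eigenvalues
assert np.allclose(pca.explained_variance_, eigvals[::-1][:5])
```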

3. The "Compression" (Projection)

In the code, we use np.dot(channel, pca.components_.T).

Let's look at the shapes to see the math:

  • channel is $\mathbf{X}$ of shape $(N, D)$.
  • pca.components_.T is a matrix $\mathbf{Q}$ of shape $(D, k)$, where $k$ is n_components. The column vectors are the $k$ principal components.

The multiplication $\mathbf{X}\mathbf{Q}$ projects the images onto the principal components and yields the score matrix $\mathbf{Z}$: $$\mathbf{Z} = \mathbf{X} \mathbf{Q}$$

  • Result Shape: $(N, k)$.
  • Meaning: You have moved your images from a 1024-pixel basis to a $k$-component basis. This is a Passive Rotation into a coordinate system where the axes are ordered by their importance (variance).

PCA is simply a Linear Map: $$\text{Pixels} \xrightarrow{\text{Matrix } \mathbf{Q}} \text{Principal Components}$$

4. The Cumulative Explained Variance (see the plot below)

Mathematically, the total variance is the Trace (sum of diagonal elements) of the Covariance matrix:

$$\text{Total Variance} = \sum_{i=1}^{D} \lambda_i$$

The explained_variance_ratio_ of component $j$ is:

$$\frac{\lambda_j}{\sum_{i=1}^{D} \lambda_i}$$
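
Both identities can be checked against sklearn (a quick sketch with random data):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 30))

pca = PCA().fit(X)   # n_components=None keeps all D components

# explained_variance_ratio_ is each eigenvalue divided by the total variance
total = pca.explained_variance_.sum()
assert np.allclose(pca.explained_variance_ratio_,
                   pca.explained_variance_ / total)
# With all components kept, the ratios sum to 1
assert np.isclose(pca.explained_variance_ratio_.sum(), 1.0)
```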

In [ ]:
pca_info = []
# loop over all three channels
for c in range(3):
    pca_info.append(compress_channel(X[:,:,c]))
In [ ]:
plt.figure(figsize=(8, 5))
colors = ['red', 'green', 'blue']

for i, (_, pca) in enumerate(pca_info):
    cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
    plt.plot(cumulative_variance, color=colors[i], label=f'Channel {i}')

# Draw a line at 99% to see where the curves cross it
plt.axhline(y=0.99, color='gray', linestyle='--', alpha=0.3)

plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.title('How much "Image Energy" are we keeping?')
plt.legend()
plt.show()

We choose the number of components used for the reconstruction of the images in the decoder:

In [ ]:
k_prime = nb_components_for_reconstruction = 400

Compression ratio and saved space¶

In [ ]:
# 1. Total original bytes (raw float64 pixels)
X_bytes = X.nbytes 

# 2. Compute compressed bytes based on the truncated shapes
# N = number of images, D = original dimensions per channel (1024)
N, D = X.shape[0], 1024 

# We store N images, but each has only k_prime components instead of D pixels.
# We multiply by 8 because PCA outputs are float64 (8 bytes per element),
# and by 3 for the three channels.
coords_bytes = N * k_prime * 8 * 3

# We also store the "dictionary" (eigenvectors):
# each eigenvector is D pixels long, and we keep k_prime of them per channel.
eigenvector_bytes = k_prime * D * 8 * 3
compressed_bytes = coords_bytes + eigenvector_bytes

# Calculations
space_saved = 1.0 - (compressed_bytes / X_bytes)
compression_ratio = X_bytes / compressed_bytes

print(f"Space Saved: {space_saved:.2%}")
print(f"Compression Ratio: {compression_ratio:.2f}x")
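
The same arithmetic generalizes to a small helper function (a sketch; it assumes float64 storage, i.e. 8 bytes per value, for both the raw pixels and the compressed representation, matching the cell above):

```python
def pca_compression_ratio(N, D, k, bytes_per_value=8):
    """Ratio of raw storage to PCA storage, for 3 channels."""
    raw = N * D * 3 * bytes_per_value       # original pixels
    coords = N * k * 3 * bytes_per_value    # k scores per image per channel
    basis = k * D * 3 * bytes_per_value     # k eigenvectors per channel
    return raw / (coords + basis)

# The notebook's numbers: 10000 images, 1024 pixels per channel, k' = 400
ratio = pca_compression_ratio(10000, 1024, 400)  # ≈ 2.32
```

Because the eigenvector "dictionary" ($k \cdot D$ values per channel) must be stored alongside the scores, the ratio is smaller than the naive $D / k$.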

Decoder: Truncated Reconstruction¶

The reconstruction process is a Change of Basis that maps our low-dimensional coordinates back into the original pixel space. By choosing a specific $k'$, we determine the "rank" of our approximation.

In [ ]:
# reconstruct the data using nb_components
def reconstruct(compressed_channel, pca, nb_components):
    return np.dot(compressed_channel[:,:nb_components], pca.components_[:nb_components])

Mathematical Components¶

In the function reconstruct(compressed_channel, pca, nb_components), we define the following matrices:

  • $\mathbf{Z}'$ (compressed_channel[:, :k']): The Truncated Score Matrix. This contains the first $k'$ columns of the projected data.
    • Shape: $(N, k')$
    • Meaning: These are the weights (coordinates) telling us "how much" of each principal pattern is present in each image.
  • $\mathbf{Q}'^T$ (pca.components_[:k']): The Truncated Basis Matrix. This contains the first $k'$ eigenvectors (Principal Components).
    • Shape: $(k', D)$
    • Meaning: Each row is a "Basis Image" (a 1024-pixel pattern) representing a fundamental visual feature discovered by the PCA.

The Reconstruction Operation¶

The approximation $\mathbf{\hat{X}}$ is calculated via the dot product: $$\mathbf{\hat{X}} = \mathbf{Z}' \mathbf{Q}'^T$$

  • Result Shape: $(N, 1024)$. We have successfully moved from a small $k'$-dimensional latent space back into the large $D$-dimensional pixel space.
  • Constraint: $k' \le k$. The number of components used for reconstruction ($k'$) cannot exceed the number of components originally computed during the encoding/fit phase ($k$).

Visual Interpretation: Linear Combinations¶

Mathematically, for a single image, this reconstruction is a weighted sum of patterns:$$\text{Image}_i \approx z_{i,1}\mathbf{q}_1 + z_{i,2}\mathbf{q}_2 + \dots + z_{i,k'}\mathbf{q}_{k'}$$By increasing $k'$, you add more "building blocks" (eigenvectors), which adds finer detail to the image. If $k'$ is low, the reconstruction only captures the global structure (low-frequency information). As $k'$ approaches $D$, the reconstruction becomes a perfect replica of the original.
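
This behavior of $k'$ can be checked on synthetic data (a minimal sketch, not the CIFAR pipeline): the reconstruction error shrinks monotonically as components are added and vanishes when $k' = D$.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 50))
X = X - X.mean(axis=0)

pca = PCA(n_components=50).fit(X)   # keep all D = 50 components
Z = X @ pca.components_.T           # full score matrix, shape (200, 50)

def recon_error(k):
    # Truncated reconstruction: X_hat = Z' Q'^T
    X_hat = Z[:, :k] @ pca.components_[:k]
    return np.linalg.norm(X - X_hat)

errors = [recon_error(k) for k in (5, 20, 50)]
assert errors[0] > errors[1] > errors[2]   # more components, less error
assert errors[2] < 1e-8                    # k' = D reproduces X exactly
```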


In [ ]:
reconstructed = []
for compressed, pca in pca_info:
    reconstructed.append(reconstruct(compressed, pca, k_prime))
hat_X = np.concatenate(reconstructed, axis=1)
hat_X = hat_X.reshape(-1,3,1024).transpose(0,2,1)
hat_X[4].shape
In [ ]:
X.shape
In [ ]:
i=12
show(X[i], True, "original image")
show(hat_X[i], True, "reconstructed image")

Image whitening¶

Beyond simple dimensionality reduction, PCA can be used for Image Whitening to decorrelate pixels and standardize feature variance. By setting whiten=True in the PCA constructor, each principal component is scaled by the inverse square root of its eigenvalue, ensuring that every transformed feature has a variance of exactly 1.0. This process effectively removes redundant spatial information and acts like a high-pass filter, highlighting structural edges while suppressing global lighting gradients.

For machine learning models like CNNs, this "leveled" input space can help optimizers converge faster by preventing high-variance features from dominating the learning process. However, because standard PCA whitening discards the natural spatial locality of pixels, the resulting images appear as abstract, "ghostly" versions of the original. In modern deep learning, this global preprocessing step is frequently substituted by simpler channel-wise normalization or internal Batch Normalization layers.
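
A minimal sketch (synthetic data, not the CIFAR batch) confirming the unit-variance and decorrelation claims:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
# Toy data with unequal variances and a correlated feature pair
X = rng.normal(size=(1000, 10)) * np.arange(1, 11)
X[:, 0] += X[:, 1]

whitened = PCA(whiten=True).fit_transform(X)

# Every transformed feature now has a variance of exactly 1 ...
assert np.allclose(whitened.std(axis=0, ddof=1), 1.0)
# ... and the features are decorrelated (covariance is the identity)
cov = np.cov(whitened, rowvar=False)
assert np.allclose(cov, np.eye(cov.shape[0]))
```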
