
Tensors and Transformations

In the fields of Artificial Intelligence and Robotics, the term tensor is used in two distinct ways:

  1. Data Structures: In most AI frameworks (like PyTorch or TensorFlow), a tensor is simply a synonym for a multidimensional array. It is a generic container for numbers arranged in a grid with any number of axes.
  2. The Mathematical Perspective (Geometric Objects): In physics and classical engineering, a tensor is more than just a grid of numbers; it is a mathematical object that obeys specific transformation laws. A "true" tensor maintains a consistent relationship between components even when the underlying coordinate system is rotated or scaled (a linear transformation).
Context            | Definition              | Key Characteristic
Deep Learning      | Multidimensional array  | Focuses on efficient storage and computation.
Physics / Robotics | Geometric object        | Focuses on invariance under coordinate transformations.

Tensors: The Data Structure

In modern AI, a Tensor is a generalized container for data. We categorize tensors by their rank, which describes the number of dimensions (or indices) required to access an element.

  • Scalar (Rank 0): A single number representing magnitude only (e.g., temperature, mass).
    • Notation: $s \in \mathbb{R}$
  • Vector (Rank 1): An ordered list of numbers representing magnitude and direction (e.g., a robot’s velocity in 3D space).
    • Notation: $\mathbf{v} \in \mathbb{R}^n$
  • Matrix (Rank 2): A rectangular array of numbers (e.g., a covariance matrix).
    • Notation: $\mathbf{A} \in \mathbb{R}^{m \times n}$
  • Rank 3-Tensor: For example, a color image is a Rank 3 tensor with dimensions $(\text{Height} \times \text{Width} \times \text{Channels})$.
    • Notation: $\boldsymbol{\mathcal{I}} \in \mathbb{R}^{h \times w \times c}$
  • $\dots$
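The ranks listed above can be sketched with NumPy arrays (a minimal illustration; the concrete values and shapes are made up, and the same idea applies to PyTorch or TensorFlow tensors):

```python
import numpy as np

s = np.array(21.5)                      # Rank 0: scalar (e.g., a temperature)
v = np.array([0.3, -0.1, 0.8])          # Rank 1: vector (e.g., a 3D velocity)
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])              # Rank 2: matrix (e.g., a covariance matrix)
I = np.zeros((64, 48, 3))               # Rank 3: color image (Height x Width x Channels)

# The rank is the number of indices needed to access one element:
for t in (s, v, A, I):
    print(t.ndim, t.shape)
```

Note that `ndim` (the number of axes) is what this text calls the rank; it is unrelated to the matrix rank from linear algebra.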

The Transformation Perspective of Tensors

Imagine a robot arm physically rotating in a room. If the robot rotates its arm, the velocity vector of the gripper physically changes its direction relative to the floor. An active transformation describes this physical movement within a fixed coordinate system.

A "true" tensor is not just a collection of numbers, but a physical object that "knows" how to move. Its components transform according to specific linear rules so that the underlying physical relationship remains consistent after the movement.

Vectors (Rank 1 Tensors)

In an active rotation, we physically rotate the vector within a fixed coordinate system. If $\mathbf{R}$ is our rotation matrix, the new vector $\mathbf{x}'$ is given by

$$\mathbf{x}' = \mathbf{R}\mathbf{x}$$

In this active view, the vector "moves" within the space. The numbers change because the arrow is now pointing in a new direction.

Example: The 2D Rotation Matrix

A rotation matrix $\mathbf{R}$ in 2D rotates a vector (a rank 1 tensor) about the origin by an angle $\theta$ without changing its length.

In 2D, the rotation matrix is defined as:

$$\mathbf{R}_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

If you multiply a vector (rank 1 tensor) $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ by this matrix, the new coordinates $\mathbf{x}'$ will be:

$$\mathbf{x}' = \mathbf{R}_\theta \mathbf{x} = \begin{bmatrix} x_1\cos\theta - x_2\sin\theta \\ x_1\sin\theta + x_2\cos\theta \end{bmatrix}$$

In [1]:
import numpy as np
import matplotlib.pyplot as plt

def plot_vectors(vectors, colors, title):
    plt.figure(figsize=(5, 5))
    plt.axhline(0, color='black', lw=1)
    plt.axvline(0, color='black', lw=1)
    for i, v in enumerate(vectors):
        plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color=colors[i], label=f'v{i}')
    plt.xlim(-1.5, 1.5)
    plt.ylim(-1.5, 1.5)
    plt.grid(True, alpha=0.3)
    plt.title(title)
    plt.legend()
    plt.show()

# --- 1. Rotation Example ---
theta = np.radians(45) # 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v_original = np.array([1, 0])
v_rotated = R @ v_original

print(f"Original: {v_original} -> Rotated: {v_rotated.round(2)}")
plot_vectors([v_original, v_rotated], ['blue', 'red'], "Rotation by 45°")
Original: [1 0] -> Rotated: [0.71 0.71]

Matrices (Rank 2 Tensors) as Linear Maps

A Rank 2 tensor (like the Inertia Tensor or a Stress Tensor) represents a relationship between two vectors.

Matrices as Linear Maps (Rank 2 Tensors): A Rank 2 tensor is an operator that maps an input vector $\mathbf{x}$ to an output vector $\mathbf{y}$. Think of it as a physical rule: "If the input force is $\mathbf{x}$, the resulting displacement is $\mathbf{y}$." Mathematically:

$$\mathbf{y} = \mathbf{A}\mathbf{x}$$

Active Rotation of the System: Now imagine we physically rotate the entire setup (the input, the output, and the mechanism itself) by a rotation matrix $\mathbf{R}$. The rule for an active transformation of Rank 1 tensors (vectors) is that they physically move to new positions:

$$\mathbf{x}' = \mathbf{R}\mathbf{x} \quad \text{and} \quad \mathbf{y}' = \mathbf{R}\mathbf{y}$$

We need to find the transformed matrix $\mathbf{A}'$ that describes how the rotated mechanism works. In this rotated state, the physical relationship must still hold: the rotated input $\mathbf{x}'$ must produce the rotated output $\mathbf{y}'$.

$$\mathbf{y}' = \mathbf{A}'\mathbf{x}'$$

To derive $\mathbf{A}'$, we substitute our active transformation rules into the original physical law:

  1. Start with the original rule: $\mathbf{y} = \mathbf{A}\mathbf{x}$
  2. Rotate the output vector: Multiply both sides by $\mathbf{R}$ to see where the new output points: $\mathbf{R}\mathbf{y} = \mathbf{R}(\mathbf{A}\mathbf{x})$
  3. Account for the rotated input: We know $\mathbf{x}' = \mathbf{R}\mathbf{x}$, which implies the original input was $\mathbf{x} = \mathbf{R}^{-1}\mathbf{x}'$. Substituting this in: $$\mathbf{R}\mathbf{y} = \mathbf{R}\mathbf{A}(\mathbf{R}^{-1}\mathbf{x}')$$
  4. Group the terms: $$\underbrace{\mathbf{R}\mathbf{y}}_{\mathbf{y}'} = (\mathbf{R}\mathbf{A}\mathbf{R}^{-1})\,\mathbf{x}'$$

By comparing this to our goal $\mathbf{y}' = \mathbf{A}'\mathbf{x}'$, the active transformation law for a Rank 2 tensor is revealed:

$$\mathbf{A}' = \mathbf{R}\mathbf{A}\mathbf{R}^{-1}$$

Note: For rotation matrices, the inverse is the transpose ($\mathbf{R}^{-1} = \mathbf{R}^T$), so we usually write:

$$\mathbf{A}' = \mathbf{R}\mathbf{A}\mathbf{R}^T$$

This "sandwich" product ensures that the matrix is rotated "on both sides"—once to handle the rotated input and once to produce the rotated output.
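The derivation above can be checked numerically. This is a minimal sketch: the matrix $\mathbf{A}$ and the input vector are arbitrary example values, and the rotation is 45°.

```python
import numpy as np

theta = np.radians(45)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])          # some linear map y = A x (arbitrary example)
x = np.array([1.0, 2.0])
y = A @ x                           # the original physical relationship

A_prime = R @ A @ R.T               # the "sandwich" transformation of the operator
x_prime = R @ x                     # actively rotated input
y_prime = R @ y                     # actively rotated output

# The rotated operator applied to the rotated input must give the rotated output:
print(np.allclose(A_prime @ x_prime, y_prime))
```

If the sandwich law were wrong (e.g., only a one-sided product $\mathbf{R}\mathbf{A}$), the check would fail for a generic $\mathbf{A}$.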

Example for Rank 2 Tensor: Covariance Matrix

To explain why the covariance matrix is a Rank 2 tensor, we can look at it through the lens of variance.

Matrices as Quadratic Forms (Rank 2 Tensors)

A Rank 1 tensor (a vector $\mathbf{v}$) represents a single direction. A Rank 2 tensor (a matrix $\mathbf{C}$) represents a relationship that requires two directions to produce a scalar result. In the case of a covariance matrix, it tells us the variance of the data along any chosen direction $\mathbf{v}$. To calculate this variance, the matrix $\mathbf{C}$ must interact with the direction vector from both sides:

$$\text{Variance}_{\mathbf{v}} = \mathbf{v}^T \mathbf{C} \mathbf{v}$$

Because $\mathbf{C}$ interacts with the vector $\mathbf{v}$ twice—once as a "row" input on the left and once as a "column" input on the right—it is a Rank 2 tensor.

To understand why we "sandwich" the covariance matrix between two vectors, we need to look at how we measure the spread of data along a specific direction.

1. Projecting a Single Data Point

Imagine you have a centered data point $\mathbf{x}_i$ (a 2D vector) and you want to know its position along a specific direction defined by a unit vector $\mathbf{v}$. The scalar projection of $\mathbf{x}_i$ onto $\mathbf{v}$ is the dot product:

$$\text{projection} = \mathbf{v} \cdot \mathbf{x}_i = \mathbf{v}^T \mathbf{x}_i$$

2. Calculating the Variance

Variance is the average of the squared distances from the mean. Since our data is centered (mean is zero), the variance along direction $\mathbf{v}$ is the average of the squared projections:

$$\text{Variance}_{\mathbf{v}} = \frac{1}{n} \sum_i (\mathbf{v}^T \mathbf{x}_i)^2$$

Using the property $(ab)^2 = (ab)(ab)$, and knowing that a scalar is equal to its own transpose $(\mathbf{v}^T \mathbf{x}_i) = (\mathbf{x}_i^T \mathbf{v})$, we can rewrite the squared term:

$$(\mathbf{v}^T \mathbf{x}_i)^2 = (\mathbf{v}^T \mathbf{x}_i)(\mathbf{x}_i^T \mathbf{v})$$

3. The Covariance Matrix Emerges

Now, substitute this back into our summation:

$$\text{Variance}_{\mathbf{v}} = \frac{1}{n} \sum_i \mathbf{v}^T (\mathbf{x}_i \mathbf{x}_i^T) \mathbf{v}$$

Since $\mathbf{v}$ is a constant direction and doesn't depend on the individual data points $i$, we can pull it outside the summation:

$$\text{Variance}_{\mathbf{v}} = \mathbf{v}^T \left( \frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^T \right) \mathbf{v}$$

The term inside the parentheses is the definition of the Covariance Matrix $\mathbf{C}$. Thus:

$$\text{Variance}_{\mathbf{v}} = \mathbf{v}^T \mathbf{C} \mathbf{v}$$

Why this explains the "Rank 2" Nature

This formula is a Quadratic Form. It reveals that the covariance matrix is a machine with two "slots" for vectors:

  1. The Right Slot ($\mathbf{C}\mathbf{v}$): This operation transforms the direction $\mathbf{v}$ based on the data's spread, creating a new vector that represents the "weighted" direction.
  2. The Left Slot ($\mathbf{v}^T [\dots]$): This dots that weighted direction back onto our original direction to get a single scalar number (the variance).

Because the matrix $\mathbf{C}$ acts on a vector to produce a result that is then compared to another vector, it is fundamentally a Rank 2 tensor.
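The derivation above can be verified on random data. This is a sketch with made-up values: the mixing matrix, the random seed, and the direction angle are arbitrary assumptions for illustration.

```python
import numpy as np

# Generate correlated 2D data (arbitrary mixing matrix, fixed seed).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) @ np.array([[1.5, 0.0],
                                           [0.6, 0.4]])
X = X - X.mean(axis=0)              # center the data (mean is now zero)

C = (X.T @ X) / len(X)              # covariance: (1/n) sum_i x_i x_i^T

v = np.array([np.cos(0.3), np.sin(0.3)])   # an arbitrary unit direction
proj = X @ v                                # scalar projections v^T x_i

# Variance of the projections equals the quadratic form v^T C v:
print(np.isclose(proj.var(), v @ C @ v))
```

Note that `proj.var()` uses the $1/n$ convention, matching the $\frac{1}{n}\sum_i$ in the derivation above.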

Rotation of the Covariance Matrix

When we apply an active rotation to our data points using a matrix $\mathbf{R}$, the "cloud" of data physically rotates in space. To describe this rotated cloud correctly, we must update the covariance matrix using the sandwich product:

$$\mathbf{C}' = \mathbf{R} \mathbf{C} \mathbf{R}^T$$

This transformation ensures that the physical properties of the data—such as the Trace (the total variance) and the relative spread—remain unchanged, even though the individual numbers within the matrix have shifted to reflect the new orientation.

In [2]:
# 1. Define a simple covariance matrix (Higher variance in X than Y)
C = np.array([[2.0, 0.0], 
              [0.0, 0.5]])

# 2. Define a 45-degree Rotation Matrix (R)
# In 2D: [[cos(t), -sin(t)], [sin(t), cos(t)]]
theta = np.radians(45)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s], 
              [s,  c]])

# 3. Apply the Rank 2 Transformation (The Sandwich)
C_prime = R @ C @ R.T

# 4. Results
print("Original Matrix (Variance aligned with X and Y):")
print(C)
print("\nRotated Matrix (Variance now spread across both axes):")
print(np.round(C_prime, 2))

# 5. The Invariant: Total Variance
print(f"\nOriginal Total Variance (Trace): {np.trace(C)}")
print(f"Rotated Total Variance (Trace):  {np.trace(C_prime)}")
Original Matrix (Variance aligned with X and Y):
[[2.  0. ]
 [0.  0.5]]

Rotated Matrix (Variance now spread across both axes):
[[1.25 0.75]
 [0.75 1.25]]

Original Total Variance (Trace): 2.5
Rotated Total Variance (Trace):  2.5

Why does this happen?

  • Original $\mathbf{C}$: The data is stretched only along the X and Y axes. The off-diagonal is $0$, meaning $x$ and $y$ are uncorrelated.
  • Rotated $\mathbf{C}'$: After a $45^\circ$ rotation, the "stretch" is now diagonal. Because the stretch is no longer aligned with the grid, the matrix now shows a covariance (non-zero off-diagonal elements).
  • The Invariant: Notice that the Trace (the sum of the diagonal) is exactly $2.5$ in both cases. The total amount of "shaking" in the data hasn't changed; we just changed the angle from which we are watching it.
In [3]:
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def draw_ellipse(mu, sigma, ax, n_std=1.0, edgecolor='black', label=None):
    # Eigen-decomposition to find the orientation and scale for the ellipse
    vals, vecs = np.linalg.eigh(sigma)
    order = vals.argsort()[::-1]
    vals, vecs = vals[order], vecs[:, order]
    theta = np.degrees(np.arctan2(*vecs[:, 0][::-1]))
    
    # The radius of the ellipse represents the standard deviation (sqrt of variance)
    width, height = 2 * n_std * np.sqrt(vals)
    ell = Ellipse(xy=mu, width=width, height=height, angle=theta,
                  edgecolor=edgecolor, facecolor='none', label=label, lw=2)
    return ax.add_patch(ell)


mu = np.array([0, 0])

# --- Vectors representing the variance axes ---
# Original axes lengths set to the std values, i.e. sqrt of (2.0 and 0.5)
v1_blue, v2_blue = np.array([np.sqrt(2.0), 0.0]), np.array([0.0, np.sqrt(0.5)])

# Rotated axes: transformed as Rank 1 tensors (Vectors)
v1_red, v2_red = R @ v1_blue, R @ v2_blue

# --- Plotting ---
fig, ax = plt.subplots(figsize=(7, 7))

# 1. Draw the Probability Distributions (Ellipses)
draw_ellipse(mu, C, ax, edgecolor='blue', label=r'Original Covariance $\mathbf{C}$')
draw_ellipse(mu, C_prime, ax, edgecolor='red', label=r'Rotated Covariance $\mathbf{C}^{\prime}$')

# 2. Draw the Basis Vectors (Scaled to Variance)
# Original Basis (Blue)
ax.quiver(0, 0, v1_blue[0], v1_blue[1], color='blue', scale=1, 
          scale_units='xy', angles='xy', alpha=0.5, label='Orig. Std. Axes')
ax.quiver(0, 0, v2_blue[0], v2_blue[1], color='blue', scale=1, 
          scale_units='xy', angles='xy', alpha=0.5)

# Rotated Basis (Red)
ax.quiver(0, 0, v1_red[0], v1_red[1], color='red', scale=1, 
          scale_units='xy', angles='xy', alpha=0.8, label='Rotated Std. Axes')
ax.quiver(0, 0, v2_red[0], v2_red[1], color='red', scale=1, 
          scale_units='xy', angles='xy', alpha=0.8)

# Formatting
ax.set_xlim(-2.5, 2.5)
ax.set_ylim(-2.5, 2.5)
ax.set_aspect('equal')
ax.axhline(0, color='black', lw=1, alpha=0.2)
ax.axvline(0, color='black', lw=1, alpha=0.2)
ax.set_title(r"Rank 2 Tensor: $\mathbf{C}' = \mathbf{R}\mathbf{C}\mathbf{R}^T$ (45° Rotation)")
ax.legend()
ax.grid(True, linestyle=':', alpha=0.5)
plt.show()

Invariance: The Core of Tensor Theory

The most important takeaway from the transformation perspective is Invariance.

While the individual components (the numbers in the array) change depending on the basis you choose, the underlying physical properties do not. For a Rank 2 tensor, properties like the Determinant and the Trace (the sum of diagonal elements) remain constant regardless of the transformation.
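Both invariants can be checked directly. A minimal sketch, assuming an arbitrary symmetric example matrix and a 30° rotation:

```python
import numpy as np

theta = np.radians(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

A = np.array([[2.0, 0.7],
              [0.7, 0.5]])          # arbitrary Rank 2 tensor

A_prime = R @ A @ R.T               # actively rotated tensor

# Trace and determinant survive the rotation unchanged:
print(np.isclose(np.trace(A), np.trace(A_prime)))
print(np.isclose(np.linalg.det(A), np.linalg.det(A_prime)))
```

This works because the trace is invariant under cyclic permutation ($\operatorname{tr}(\mathbf{R}\mathbf{A}\mathbf{R}^T) = \operatorname{tr}(\mathbf{A}\mathbf{R}^T\mathbf{R}) = \operatorname{tr}(\mathbf{A})$) and the determinant is multiplicative with $\det(\mathbf{R}) = 1$.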

Key Insight: In AI, we often ignore this and treat tensors just as "heaps of data." In Robotics, we must respect these laws because our data (force, torque, inertia) exists in a physical 3D world where the choice of coordinate system is arbitrary.

Geometric Intuition

  • Rank 0: A point. No matter how you turn, the point is the same.
  • Rank 1: An arrow. If you turn the world, the arrow's coordinates change to keep pointing the same way.
  • Rank 2: A transformation (like a squish or a stretch). If you turn the world, you have to turn the "squish direction" and the "stretch direction" simultaneously.

Passive Rotations: Changing the Observer

Up to this point, we have focused on active rotations, where we physically moved the data cloud or the robot arm within a fixed room. However, in many scientific and engineering contexts, we use passive rotations. In a passive rotation, the physical object (the data cloud) stays exactly where it is, but the coordinate system (the observer) rotates.

Imagine you are looking at a distribution of data and you simply tilt your head $45^\circ$ to the right. The data hasn't moved, but the numbers you use to describe the position of each point will change. While an active rotation by an angle $\theta$ moves the object "forward" relative to the axes, a passive rotation of the axes by $\theta$ makes the object appear to move "backward" (by $-\theta$) relative to the new grid.

The Mathematical Shift

Because the passive rotation moves the "ruler" rather than the "object," the transformation laws look slightly different:

  1. Vectors (Rank 1): To find the new coordinates $\mathbf{x}'$ of a stationary vector after the axes rotate by $\mathbf{R}$, we apply the inverse rotation: $$\mathbf{x}' = \mathbf{R}^{-1}\mathbf{x} = \mathbf{R}^T\mathbf{x}$$ This is called a contravariant transformation—the coordinates move in the opposite (contrary) direction of the axes.
  2. Matrices (Rank 2): Even in a passive rotation, the "sandwich" remains, but the roles are swapped. If $\mathbf{A}$ is a matrix in the old system, its description in the new system $\mathbf{A}'$ is: $$\mathbf{A}' = \mathbf{R}^T \mathbf{A} \mathbf{R}$$ (Note how this is the reverse of the active $\mathbf{R}\mathbf{A}\mathbf{R}^T$.)
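The active/passive duality described above can be sketched numerically: rotating the axes by $\theta$ yields the same coordinates as actively rotating the object by $-\theta$. The angle and example matrix are arbitrary choices for illustration.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix for angle theta (radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = np.radians(45)
x = np.array([1.0, 0.0])

# Rank 1: passive rotation of the axes == active rotation by -theta
x_passive = rot(theta).T @ x        # coordinates of the fixed vector in the new frame
x_active  = rot(-theta) @ x         # vector actively rotated "backward"
print(np.allclose(x_passive, x_active))

# Rank 2: the passive sandwich R^T A R is the active sandwich with angle -theta
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
A_passive = rot(theta).T @ A @ rot(theta)
A_active  = rot(-theta) @ A @ rot(-theta).T
print(np.allclose(A_passive, A_active))
```

This makes the "backward" intuition from the previous section concrete: $\mathbf{R}_\theta^T = \mathbf{R}_{-\theta}$, so the passive laws are exactly the active laws with the angle negated.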