Linear Algebra

Eigenvectors and Eigenvalues

The directions a map only stretches, never turns

01 · First principles: Looking for the map's own axes

A matrix generally does something messy to a vector: rotates it a bit, stretches it a bit, by amounts that depend on where the vector points. The natural question: are there privileged directions where the mess disappears — where the map acts as a pure stretch, no turning at all? A vector v on such a direction satisfies

A v = λ v
output is the input, rescaled by λ — same line, new length

A nonzero vector v satisfying this is an eigenvector ("own vector") of A, and λ is its eigenvalue: |λ| > 1 stretches, |λ| < 1 shrinks, λ < 0 flips along the line, and λ = 0 crushes it (which is singularity, caught in the act). These directions are the map's own axes: when A has n linearly independent eigenvectors, they form the coordinate system in which a tangled transformation becomes n independent one-dimensional rescalings.
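The defining equation can be checked numerically. The sketch below assumes NumPy and an illustrative 2×2 matrix that is not taken from the text; `is_eigenpair` is a hypothetical helper name introduced only for this example.

```python
import numpy as np

# Illustrative matrix (an assumption for this sketch, not from the text).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

def is_eigenpair(A, v, lam):
    """Return True if A @ v equals lam * v (up to rounding) and v is nonzero."""
    return bool(np.linalg.norm(v) > 0 and np.allclose(A @ v, lam * v))

# np.linalg.eig returns the eigenvalues and a matrix whose COLUMNS are eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)
checks = [is_eigenpair(A, eigenvectors[:, i], eigenvalues[i])
          for i in range(len(eigenvalues))]

# A generic vector w is turned off its line: A @ w is not parallel to w.
w = np.array([1.0, 0.0])
Aw = A @ w
cross = w[0] * Aw[1] - w[1] * Aw[0]  # zero only if Aw is parallel to w
```

Here `checks` records whether each returned pair satisfies A v = λ v, and a nonzero `cross` shows that `w` is not an eigenvector of this particular matrix.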

[Figure: an eigenline through v, where Av = λv stays on its line; a generic vector w, where Aw is turned off its line.]

The eigenvector v is rescaled along its own line; a generic vector w is knocked onto a different line. Eigenvectors are the directions the map respects.

02 · Why they matter: Repeated application: eigenvalues are dynamics

The payoff arrives the moment a matrix is applied more than once: recurrences, iterations, layers. Assuming A has n linearly independent eigenvectors v₁, …, vₙ, expand the starting vector in that eigenbasis, x = c₁v₁ + ⋯ + cₙvₙ, and apply A k times. Each component just gets rescaled k times over:

Aᵏx = c₁λ₁ᵏv₁ + c₂λ₂ᵏv₂ + ⋯ + cₙλₙᵏvₙ
long-run behaviour = a race between the |λᵢ|ᵏ
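As a sanity check on this expansion, the sketch below compares Aᵏx computed by repeated multiplication with the eigenbasis formula. It assumes NumPy; the matrix, the vector and k are illustrative choices, picked so that the eigenvalues are real.

```python
import numpy as np

# Illustrative values (assumptions for this sketch, not from the text).
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
x = np.array([1.0, 0.0])
k = 10

lam, V = np.linalg.eig(A)   # columns of V are the eigenvectors v_i
c = np.linalg.solve(V, x)   # coefficients c_i such that V @ c = x

direct = np.linalg.matrix_power(A, k) @ x   # A applied k times to x
via_eigen = V @ (c * lam**k)                # sum_i c_i * lam_i**k * v_i

agree = bool(np.allclose(direct, via_eigen))
```

Solving `V @ c = x` is what "expand in the eigenbasis" means in practice; it only works when the eigenvectors are linearly independent, so that `V` is invertible.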

Everything about the long run is decided by the eigenvalue magnitudes. If any |λᵢ| > 1 and x has a nonzero component cᵢ along it, that component explodes; if all |λᵢ| < 1, everything decays to zero; the boundary |λ| = 1 is the knife-edge of stability. And whichever |λ| is largest eventually wins the race: provided one eigenvalue strictly dominates in magnitude and x has a nonzero component along its eigenvector, Aᵏx lines up with that top eigenvector after enough iterations, wherever else x started. That observation, used deliberately, is power iteration: multiply, normalise, repeat, and the dominant eigenvector emerges (this is the skeleton of PageRank, and of how libraries estimate spectral norms).
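A minimal sketch of power iteration, assuming NumPy, a matrix with a single dominant eigenvalue, and a random start (which almost surely has a nonzero component along the top eigenvector). The function name, the fixed iteration count and the test matrix are illustrative choices, not a library API.

```python
import numpy as np

def power_iteration(A, num_iters=200, seed=0):
    """Estimate the dominant eigenpair of A by multiply, normalise, repeat."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        x = A @ x                 # multiply
        x /= np.linalg.norm(x)    # normalise so the iterate neither explodes nor vanishes
    lam = x @ A @ x               # Rayleigh quotient of the unit vector x
    return lam, x

# Illustrative symmetric matrix (an assumption for this sketch).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, v = power_iteration(A)
```

A production version would typically stop on a convergence test rather than a fixed iteration count; convergence slows as the ratio of the second-largest to the largest |λ| approaches 1.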

03 · The clean case: Symmetric matrices: rotate, stretch, rotate back

General matrices can be unpleasant: complex eigenvalues (rotations have no real fixed direction), missing eigenvectors, skewed eigenbases. The spectral theorem says all of that vanishes for real symmetric matrices (A = Aᵀ): the eigenvalues are real and the eigenvectors can be chosen orthonormal. Packing them into the columns of Q:

A = Q Λ Qᵀ  =  rotate to eigenaxes · stretch each axis by λᵢ · rotate back

Every symmetric matrix is a stretch in disguise — read right to left: Qᵀ rotates space so the eigenaxes align with the coordinate axes, Λ stretches each axis independently, Q rotates back. No shear, no tangling. This matters because the symmetric case is the one ML lives in: covariance matrices, Gram matrices, Hessians are all symmetric (and the first two are positive semi-definite — spectral theorem plus λᵢ ≥ 0).
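The factorisation can be inspected directly. The sketch below assumes NumPy and uses `np.linalg.eigh`, the routine intended for symmetric (Hermitian) input; the matrix and the test vector are illustrative.

```python
import numpy as np

# Illustrative symmetric matrix and vector (assumptions for this sketch).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 2.0])

# eigh returns real eigenvalues in ascending order and orthonormal eigenvectors as columns of Q.
lam, Q = np.linalg.eigh(A)

reconstructed = Q @ np.diag(lam) @ Q.T   # Q Λ Qᵀ should reproduce A

# Read the factorisation right to left on a vector:
coords = Q.T @ x           # express x along the eigenaxes
stretched = lam * coords   # stretch each axis independently by its eigenvalue
result = Q @ stretched     # return to the original coordinates
```

Up to rounding, `result` matches `A @ x`, and `Q.T @ Q` is the identity, which is what "orthonormal eigenvectors" means in matrix form.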

04 · Why ML cares: Three load-bearing appearances

  1. PCA. The covariance matrix Σ is symmetric PSD; its top eigenvectors are the orthogonal directions of maximal variance, and the eigenvalues are the variances along them. PCA is nothing but "rotate to Σ's eigenaxes, keep the loudest ones".
  2. Curvature. The Hessian of the loss is symmetric; its eigenvalues are the curvatures along its eigendirections — positive λ means bowl, negative means downhill escape route, the mix classifies saddle points.
  3. Gradient descent convergence. On a quadratic with a positive-definite Hessian, GD's error contracts each step by a factor |1 − ηλᵢ| along each eigendirection. The step size is hostage to λ_max (η > 2/λ_max diverges) while progress along the shallowest direction crawls at rate ηλ_min, so the iteration count is governed by the ratio κ = λ_max/λ_min. Ill-conditioned bowls make GD zig-zag; this single ratio is why preconditioning and normalisation exist.
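The PCA item in the list above can be sketched in a few lines. This assumes NumPy and synthetic 2-D data (a stretched Gaussian cloud rotated by 30°), generated only for illustration; real pipelines often use an SVD of the centred data instead of forming Σ explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption): standard deviation 3 along one axis, 0.5 along the other,
# then rotated by theta so the loud direction is not a coordinate axis.
n = 2000
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = (rng.standard_normal((n, 2)) * np.array([3.0, 0.5])) @ R.T

Xc = X - X.mean(axis=0)          # centre the data
Sigma = (Xc.T @ Xc) / (n - 1)    # sample covariance matrix (symmetric PSD)

lam, Q = np.linalg.eigh(Sigma)   # ascending eigenvalues
order = np.argsort(lam)[::-1]    # reorder so the loudest direction comes first
lam, Q = lam[order], Q[:, order]

k = 1
scores = Xc @ Q[:, :k]                   # coordinates along the top-k eigenaxes
explained = lam[:k].sum() / lam.sum()    # fraction of total variance kept
```

The sample variance of `scores` along the first axis equals `lam[0]`, which is the sense in which the eigenvalues are the variances along the eigendirections.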

Recurring pattern: whenever a symmetric matrix encodes "how much, in which direction" (variance, curvature, stiffness), its eigenvectors name the directions and its eigenvalues rank them. Diagonalise first, think second.
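The gradient-descent claim can be checked the same way: diagonalise, then predict each eigendirection separately. The sketch below assumes NumPy and the quadratic f(x) = ½ xᵀHx with a hypothetical Hessian built to have eigenvalues 1 and 10; `gd_quadratic`, the step sizes and the step count are illustrative choices.

```python
import numpy as np

def gd_quadratic(H, x0, eta, steps):
    """Plain gradient descent on f(x) = 0.5 * x^T H x, whose gradient is H x."""
    x = x0.copy()
    for _ in range(steps):
        x = x - eta * (H @ x)
    return x

# Hypothetical Hessian with eigenvalues 1 and 10 along axes rotated by 30 degrees.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
H = R @ np.diag([1.0, 10.0]) @ R.T

lam, Q = np.linalg.eigh(H)
kappa = lam.max() / lam.min()   # condition number, about 10 here

x0 = np.array([1.0, 1.0])
steps = 50
eta = 1.5 / lam.max()           # inside the stable range eta < 2 / lam_max

x_final = gd_quadratic(H, x0, eta, steps)
# Prediction: each eigen-coordinate is multiplied by (1 - eta * lam_i) at every step.
predicted = Q @ ((1.0 - eta * lam) ** steps * (Q.T @ x0))

# Just past the stability limit, the steep direction grows instead of shrinking.
x_diverged = gd_quadratic(H, x0, 2.1 / lam.max(), steps)
```

With these values the steep direction contracts by |1 − 1.5| = 0.5 per step and the shallow one by only 0.85, which is the slow crawl the condition number κ measures.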
Mental Model