Linear Algebra

Singular Matrices

The map loses a dimension, and there is no way back

01 · First principles
What does "singular" mean physically?

A square matrix is singular when the transformation it performs loses a dimension: it takes n-dimensional space and flattens it into something thinner — a plane into a line, a volume into a sheet. Flattening is irreversible. Once two distinct inputs have been pressed onto the same output, no map can tell them apart again; the information is not hidden, it is gone.

That is the whole concept. Every algebraic test for singularity — determinant, rank, eigenvalues — is just a different instrument for detecting the same physical event: somewhere, a direction of space got crushed flat.
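Here is a minimal NumPy sketch of that irreversibility. The matrix `A` is an arbitrary illustrative choice whose second column is twice its first; any such matrix would do. Two different inputs land on the same output, and their difference is a direction that `A` sends to zero.

```python
import numpy as np

# Illustrative singular matrix: column 2 is twice column 1, so the plane collapses onto a line.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

x1 = np.array([1.0, 0.0])
x2 = np.array([3.0, -1.0])   # a different input...

y1 = A @ x1
y2 = A @ x2                  # ...that lands on the same output

print(y1, y2)                # both are [1. 2.]
print(A @ (x1 - x2))         # the difference (-2, 1) is crushed to [0. 0.]
```

Given only `y1`, nothing distinguishes `x1` from `x2`, or from `x1` plus any multiple of `(-2, 1)`. That is the "no way back" of the title in concrete form.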

[Figure: the image of the unit square at det = 0.8, det = 0.3 (thinning) and det = 0 (singular: area gone, no undo).]

The image of the unit square as the matrix slides toward singularity: the parallelogram thins, then degenerates to a segment. A dimension has left the output.
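The caption's claim, that the parallelogram's area tracks the determinant all the way down to zero, can be checked numerically. The figure does not say which matrices it draws, so this sketch assumes a hypothetical family `family(d) = [[1, 1 - d], [1, 1]]` with determinant exactly `d`. The hypothetical helper `image_area` maps the unit square's corners through the matrix and measures the result with the shoelace formula.

```python
import numpy as np

def family(d):
    """Hypothetical matrix family with det = d; columns (1, 1) and (1 - d, 1) merge as d -> 0."""
    return np.array([[1.0, 1.0 - d],
                     [1.0, 1.0]])

def image_area(A):
    """Area of the image of the unit square under A, via the shoelace formula."""
    corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
    p = corners @ A.T                      # row i is A @ corners[i]
    x, y = p[:, 0], p[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(np.roll(x, -1), y))

for d in (0.8, 0.3, 0.0):
    A = family(d)
    print(f"d={d}: det={np.linalg.det(A):.3f}, image area={image_area(A):.3f}")
```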

02 · One fact, five costumes
The equivalence list

For a square n×n matrix A, the following are all equivalent — not five related facts but one fact, "A crushes a direction", reported by five different instruments.

| Costume | Statement | What the instrument measures |
|---|---|---|
| det A = 0 | volume scale factor is zero | the unit cube is flattened (determinant) |
| rank < n | columns do not span ℝⁿ | outputs live in a thinner subspace (rank & span) |
| null space ≠ {0} | some x ≠ 0 has Ax = 0 | a whole direction is sent to the origin (null space) |
| A⁻¹ does not exist | no map undoes A | two inputs share an output (inverse) |
| λ = 0 is an eigenvalue | Av = 0·v for some v ≠ 0 | the crushed direction, named (eigenvectors) |

The translation exercises are short and worth doing once. Zero eigenvalue ⇔ nontrivial null space: an eigenvector for λ = 0 satisfies Av = 0 with v ≠ 0, so it is a null vector by definition. det = 0 ⇔ rank-deficient: |det A| is the volume of the parallelepiped spanned by the columns, and dependent columns span a flat one of volume zero. Each arrow takes one sentence; the wisdom is in refusing to treat the five as separate things to memorise.
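The five instruments can be run side by side. This NumPy sketch is illustrative. `five_instruments` is a hypothetical helper, and the absolute tolerance `tol` is an assumption: exact zero tests are fragile in floating point, which is the subject of the next section.

```python
import numpy as np

def five_instruments(A, tol=1e-10):
    """Hypothetical helper: report each singularity test for a square matrix A.

    `tol` is an illustrative absolute threshold for treating a float as zero.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]

    # Candidate null vector: right singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    v = vt[-1]

    # Inverse: NumPy raises LinAlgError when the LU factorisation finds an exactly zero pivot.
    try:
        np.linalg.inv(A)
        inverse_exists = True
    except np.linalg.LinAlgError:
        inverse_exists = False

    return {
        "det A = 0": abs(np.linalg.det(A)) < tol,
        "rank < n": np.linalg.matrix_rank(A, tol=tol) < n,
        "null space != {0}": np.linalg.norm(A @ v) < tol,
        "no inverse": not inverse_exists,
        "eigenvalue 0": np.min(np.abs(np.linalg.eigvals(A))) < tol,
    }

print(five_instruments([[1, 2], [2, 4]]))   # singular: every test reports True
print(five_instruments([[2, 1], [1, 3]]))   # invertible: every test reports False
```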

03 · How it breaks in practice
Exactly singular is rare; nearly singular is everywhere

Here is the twist that matters for practice. With floating-point data, a matrix is almost never exactly singular: nudge an entry in its last bit and det is typically nonzero again. The practical enemy is the nearly singular matrix: full rank on paper, but with some direction squashed almost flat (smallest singular value σ_min ≈ 0). All five costumes then read "technically fine", while the condition number κ = σ_max/σ_min explodes and a solve can amplify relative errors in the data by up to a factor of κ (the mechanics are covered under matrix inverse).
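Here is a sketch of the near-singular trap, built by hand for illustration. Take the exactly singular matrix [[1, 2], [2, 4]] and nudge one entry by `eps = 1e-10`. NumPy then reports a nonzero determinant and full rank, yet a perturbation of 1e-8 in `b` moves the solution by hundreds. The printed amplification is the relative change in `x` divided by the relative change in `b`, which the theory bounds by κ.

```python
import numpy as np

eps = 1e-10
# Exactly singular matrix with one entry nudged: full rank on paper, nearly flat in practice.
A = np.array([[1.0, 2.0],
              [2.0, 4.0 + eps]])

print("det   :", np.linalg.det(A))            # tiny but not zero
print("rank  :", np.linalg.matrix_rank(A))    # full rank under NumPy's default tolerance
kappa = np.linalg.cond(A)                     # 2-norm condition number sigma_max / sigma_min
print("kappa :", kappa)

# Solve A x = b, then perturb b slightly and solve again.
b = np.array([3.0, 6.0 + eps])                # exact solution is x = (1, 1)
db = np.array([0.0, 1e-8])
x = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x

# Relative change in the answer divided by relative change in the data; at most kappa.
amplification = (np.linalg.norm(dx) / np.linalg.norm(x)) / (np.linalg.norm(db) / np.linalg.norm(b))
print("amplification:", amplification)
```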

Where do near-singular matrices come from in ML? From redundancy. Two nearly duplicate features make two columns of X nearly dependent, so XᵀX has a near-zero eigenvalue. More data dimensions than effective degrees of freedom, correlated parameters, nearly collinear basis functions: every form of "the data does not really determine this direction" shows up as σ_min → 0.
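Here is a quick experiment with synthetic data: seeded Gaussian features, invented purely for illustration. It compares XᵀX for a design whose second column nearly duplicates the first against one whose second column is unrelated. `eig_ratio` is a hypothetical helper returning the smallest over the largest eigenvalue of XᵀX, i.e. 1/κ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2_dup = x1 + 1e-4 * rng.normal(size=n)   # near-duplicate of x1 (illustrative noise scale)
x2_ind = rng.normal(size=n)               # an unrelated feature, for comparison

def eig_ratio(X):
    """Hypothetical helper: smallest / largest eigenvalue of the symmetric matrix X^T X."""
    evals = np.linalg.eigvalsh(X.T @ X)   # returned in ascending order
    return evals[0] / evals[-1]

X_dup = np.column_stack([x1, x2_dup])
X_ind = np.column_stack([x1, x2_ind])
print("near-duplicate columns:", eig_ratio(X_dup))
print("independent columns:   ", eig_ratio(X_ind))
```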

Reading a warning sign: when a solver returns huge, wildly oscillating coefficients that fit the data perfectly, you are watching 1/σ_min at work — the model is dividing by a near-collapse.
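Here is the warning sign in miniature, with numbers chosen by hand for illustration. There are two samples, two features that differ only by `delta = 1e-6`, and targets carrying a 1e-3 wobble. The exact fit pays for that wobble with coefficients near -998 and 1000.

```python
import numpy as np

delta = 1e-6   # how far the second feature is from duplicating the first (illustrative)
X = np.array([[1.0, 1.0],
              [1.0, 1.0 + delta]])
y = np.array([2.0, 2.0 + 1e-3])   # targets carrying a 1e-3 wobble, e.g. measurement noise

beta = np.linalg.solve(X, y)      # exact fit: two equations, two unknowns
print("coefficients:", beta)      # roughly (-998, 1000): huge and of opposite sign
print("residual:    ", X @ beta - y)
```

The coefficients are huge and cancel against each other. The second one is simply the 1e-3 wobble divided by `delta`.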

04 · The fix
Why we add λI

The standard remedy is regularisation: replace the offending matrix by A + λI with λ > 0, where A is typically symmetric positive semidefinite, such as XᵀX or a Hessian at a minimum. The effect is surgical. A + λI has the same eigenvectors as A, and every eigenvalue is lifted by exactly λ:

Av = μv  ⟹  (A + λI)v = (μ + λ)v
every eigenvalue raised by λ; zero becomes λ; κ drops to (μ_max+λ)/(μ_min+λ)
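The shift is easy to confirm numerically. This minimal sketch assumes an arbitrary, exactly singular symmetric PSD matrix with eigenvalues 0 and 2 and an illustrative `lam = 0.1`. Each eigenvector of `A` remains an eigenvector of `A + lam*I`, with its eigenvalue raised by `lam`. The condition number drops from infinite to (2 + 0.1)/(0 + 0.1) = 21.

```python
import numpy as np

# Illustrative symmetric positive semidefinite A with eigenvalues 0 and 2 (exactly singular).
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
lam = 0.1                             # regularisation strength, chosen arbitrarily
A_reg = A + lam * np.eye(2)

mu, V = np.linalg.eigh(A)             # eigenvalues (ascending) and orthonormal eigenvectors of A
for m, v in zip(mu, V.T):             # columns of V are the eigenvectors
    # Same eigenvector, eigenvalue shifted by exactly lam.
    print(np.allclose(A_reg @ v, (m + lam) * v))

print("eigenvalues of A      :", mu)
print("eigenvalues of A + λI :", np.linalg.eigvalsh(A_reg))
print("cond(A + λI)          :", np.linalg.cond(A_reg))   # (2 + 0.1) / (0 + 0.1) = 21, up to rounding
```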

The crushed directions, the ones the data says nothing about, get a floor of λ instead of zero, so the solve stops dividing by nothing there. The healthy directions, whose eigenvalues are much larger than λ, barely notice. The price is a small bias: we answer a slightly different question in exchange for a stable answer. This one move recurs across numerical computing and ML under different names.

The named connection: ridge regression is the canonical ML answer to near-singularity — pay λ worth of bias to buy back the dimensions the data almost lost.
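Here is a minimal ridge sketch. It uses hand-picked toy data in which the second feature nearly duplicates the first (`delta = 1e-6`), targets with a 1e-3 wobble, and an illustrative `lam = 1e-3`. The ridge estimate solves (XᵀX + λI)β = Xᵀy. It gives up the exact fit, leaving a residual of roughly 1e-3, and returns coefficients near (1, 1) instead of near (-998, 1000).

```python
import numpy as np

# Illustrative data: the second feature nearly duplicates the first.
delta = 1e-6
X = np.array([[1.0, 1.0],
              [1.0, 1.0 + delta]])
y = np.array([2.0, 2.0 + 1e-3])
lam = 1e-3                                 # ridge strength; an arbitrary illustrative choice

# Unregularised fit: X is square and (barely) invertible, so this fits the data exactly.
beta_ols = np.linalg.solve(X, y)

# Ridge: solve (X^T X + lam I) beta = X^T y; the near-zero eigenvalue gets a floor of lam.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print("OLS  :", beta_ols, "residual", np.linalg.norm(X @ beta_ols - y))
print("ridge:", beta_ridge, "residual", np.linalg.norm(X @ beta_ridge - y))
```

The ridge residual is the bias being paid. The model declines to chase the 1e-3 wobble rather than divide it by `delta`.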
Mental Model