The map loses a dimension, and there is no way back
A square matrix is singular when the transformation it performs loses a dimension: it takes n-dimensional space and flattens it into something thinner — a plane into a line, a volume into a sheet. Flattening is irreversible. Once two distinct inputs have been pressed onto the same output, no map can tell them apart again; the information is not hidden, it is gone.
That is the whole concept. Every algebraic test for singularity — determinant, rank, eigenvalues — is just a different instrument for detecting the same physical event: somewhere, a direction of space got crushed flat.
The image of the unit square as the matrix slides toward singularity: the parallelogram thins, then degenerates to a segment. A dimension has left the output.
For a square n×n matrix A, the following are all equivalent — not five related facts but one fact, "A crushes a direction", reported by five different instruments.
| Costume | Statement | What the instrument measures |
|---|---|---|
| det A = 0 | volume scale factor is zero | the unit cube is flattened — determinant |
| rank < n | columns do not span ℝⁿ | outputs live in a thinner subspace — rank & span |
| null space ≠ {0} | some x ≠ 0 has Ax = 0 | a whole direction is sent to the origin — null space |
| A⁻¹ does not exist | no map undoes A | two inputs share an output — inverse |
| λ = 0 is an eigenvalue | Av = 0·v for some v ≠ 0 | the crushed direction, named — eigenvectors |
The translation exercises are short and worth doing once. Zero eigenvalue ⇔ nontrivial null space: the eigenvector for λ = 0 is a null vector, by definition. det = 0 ⇔ rank-deficient: |det A| is the volume of the parallelepiped spanned by the columns, and dependent columns span a flat one with zero volume. Each arrow is one sentence; the wisdom is in refusing to treat the five as separate things to memorise.
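The five readings can be checked side by side on a small example. The sketch below assumes NumPy and uses a hypothetical 2×2 matrix whose second column is exactly twice the first; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical example: column 2 = 2 * column 1, so one direction is crushed.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
n = A.shape[0]

det = np.linalg.det(A)               # volume scale factor
rank = np.linalg.matrix_rank(A)      # dimension of the space of outputs
eigenvalues = np.linalg.eigvals(A)   # one of them should be (numerically) zero

# The right singular vector belonging to the smallest singular value
# spans the null space when that singular value is zero.
_, singular_values, Vt = np.linalg.svd(A)
null_vector = Vt[-1]

# inv raises LinAlgError when the factorisation hits an exact zero pivot;
# a merely near-singular matrix may not trigger it.
try:
    np.linalg.inv(A)
    invertible = True
except np.linalg.LinAlgError:
    invertible = False

# Two distinct inputs that differ by a null vector share an output.
x1 = np.array([1.0, 1.0])
x2 = x1 + null_vector

print("det            :", det)
print("rank           :", rank, "of", n)
print("eigenvalues    :", eigenvalues)
print("A @ null_vector:", A @ null_vector)
print("invertible     :", invertible)
print("shared output  :", np.allclose(A @ x1, A @ x2))
```

Each printed line corresponds to one row of the table: all of them report the same crushed direction.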
Here is the twist that matters for practice. With floating-point data, a matrix is almost never exactly singular: perturb any entry in the last bit and, generically, det ≠ 0 again. The practical enemy is the nearly singular matrix: full rank on paper, but with some direction squashed almost flat (smallest singular value σ_min ≈ 0). All five costumes then read "technically fine", while the condition number κ = σ_max/σ_min explodes and a solve can amplify relative noise by a factor of up to κ (the mechanics are in matrix inverse).
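A minimal sketch of this failure mode, assuming NumPy: a hypothetical matrix that sits `eps` away from an exactly singular one passes the determinant and rank tests, yet a small perturbation of the right-hand side is magnified enormously in the solution. The values of `eps` and `noise` are arbitrary choices for illustration.

```python
import numpy as np

eps = 1e-10  # hypothetical distance from exact singularity
A = np.array([[1.0, 2.0],
              [2.0, 4.0 + eps]])

det = np.linalg.det(A)              # tiny, but not zero
rank = np.linalg.matrix_rank(A)     # still reported as full rank
sigma = np.linalg.svd(A, compute_uv=False)
kappa = sigma[0] / sigma[-1]        # condition number sigma_max / sigma_min

# Solve once with a clean right-hand side and once with a small perturbation.
x_true = np.array([1.0, 1.0])
b = A @ x_true
noise = 1e-8 * np.array([1.0, -1.0])   # hypothetical measurement noise
x_noisy = np.linalg.solve(A, b + noise)

relative_input_error = np.linalg.norm(noise) / np.linalg.norm(b)
relative_output_error = np.linalg.norm(x_noisy - x_true) / np.linalg.norm(x_true)
amplification = relative_output_error / relative_input_error

print("det          :", det)
print("rank         :", rank)
print("kappa        :", kappa)
print("amplification:", amplification)
```

The amplification depends on how much of the noise points along the squashed direction, so κ is a worst case rather than a guaranteed factor.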
Where do near-singular matrices come from in ML? From redundancy. Two nearly-duplicate features make two columns of X nearly dependent, so XᵀX has a near-zero eigenvalue. More data dimensions than effective degrees of freedom, correlated parameters, nearly-collinear basis functions — every form of "the data does not really determine this direction" shows up as σ_min → 0.
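The duplicate-feature case is easy to reproduce. The sketch below, assuming NumPy, builds a hypothetical design matrix `X` with three columns, the second being a noisy copy of the first, and inspects the spectrum of XᵀX. Sample size, noise level, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

f1 = rng.standard_normal(n_samples)
f2 = f1 + 1e-4 * rng.standard_normal(n_samples)  # near-duplicate of f1
f3 = rng.standard_normal(n_samples)              # unrelated feature
X = np.column_stack([f1, f2, f3])

gram = X.T @ X
# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(gram)
condition_number = eigvals[-1] / eigvals[0]
crushed_direction = eigvecs[:, 0]   # eigenvector of the smallest eigenvalue

print("eigenvalues of X^T X:", eigvals)
print("condition number    :", condition_number)
print("crushed direction   :", crushed_direction)
```

The eigenvector of the smallest eigenvalue should come out close to (1, −1, 0)/√2 up to sign: the data pins down f1 + f2 well but says almost nothing about f1 − f2.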
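The universal remedy is regularisation: replace the offending matrix by A + λI with a small λ > 0 (for symmetric A, typically XᵀX or a Hessian). Here λ is the regularisation strength, not the eigenvalue symbol of the table above. The effect is surgical. A + λI has the same eigenvectors as A, and every eigenvalue is lifted by exactly λ:
A vᵢ = μᵢ vᵢ ⟹ (A + λI) vᵢ = (μᵢ + λ) vᵢ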
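The eigenvalue shift can be verified numerically. This is a sketch assuming NumPy, with a hypothetical singular symmetric matrix and an arbitrary regularisation strength `lam`.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # symmetric and singular: eigenvalues 0 and 5
lam = 0.1                    # hypothetical regularisation strength

mu, V = np.linalg.eigh(A)
mu_reg, V_reg = np.linalg.eigh(A + lam * np.eye(2))

# Eigenvectors are only defined up to sign, so compare absolute overlaps.
overlap = np.abs(V.T @ V_reg)
condition_after = mu_reg[-1] / mu_reg[0]

print("eigenvalues of A        :", mu)
print("eigenvalues of A + lam*I:", mu_reg)
print("eigenvector overlap     :", overlap)
print("condition number after  :", condition_after)
```

Every eigenvalue moves up by `lam` while the eigenvector overlap stays at the identity, so the condition number drops from unbounded to roughly (5 + lam)/lam in this example.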
The crushed directions — the ones the data says nothing about — get a floor of λ instead of zero, so the solve stops dividing by nothing there; the healthy directions barely notice. The price is a small bias: we are answering a slightly different question in exchange for the answer being stable. This one move, under different names: