Every point you can reach, and the true information content of a map
Give yourself a set of vectors and two moves: scale any of them, and add results together. The span is everything reachable with those moves — every linear combination c₁v₁ + … + ckvk. One nonzero vector spans a line; two non-parallel vectors span a plane; and adding a vector that is already a combination of the others (see linear independence) extends the span by nothing at all.
Spans are always flats through the origin: lines, planes, hyperplanes. The question span answers is reachability. The question it cannot answer alone is how big the reachable set is — for that we count, and the count is rank.
View a matrix A as a machine: x goes in, Ax comes out. Since Ax is a combination of A's columns weighted by the entries of x, the set of all possible outputs — the image — is exactly the span of the columns. The rank is the dimension of that image:
Rank is the honest size of the map, as opposed to its advertised size. A 1000 × 1000 matrix of rank 3 looks like a million numbers, but as a transformation it flattens all of ℝ¹⁰⁰⁰ onto a 3-dimensional flat: only three numbers' worth of output direction survive. Geometrically, rank is the dimension the map preserves; everything else gets crushed.
Where do the missing dimensions go? Nowhere mysterious — they are crushed to zero. For A with n input dimensions:
Read it as a conservation law: every input dimension either makes it through to the image or dies in the null space. A 5-dimensional input mapped with rank 3 must be killing exactly a 2-dimensional family of inputs. This single identity settles most existence and uniqueness questions about Ax = b before any computation; the companion note works it from the null-space side.
| Situation (A is m × n) | Means | Consequence |
|---|---|---|
| rank = n (full column rank) | No input information lost | Ax = b has at most one solution; least squares is well-posed |
| rank = m (full row rank) | Every output reachable | Ax = b solvable for every b |
| rank = m = n (square, full) | A bijection of space | Invertible, det ≠ 0 |
| rank < min(m, n) | Map collapses dimensions | Singular behaviour: lost information, no clean undo |
Numerically, rank is read off the singular values: count how many exceed a tolerance. "Rank 50 but forty of the singular values are tiny" behaves like rank 10 plus noise — this effective rank is what matters in practice.
Real-world matrices are rarely full rank in spirit: their singular values decay fast, so a low-rank approximation keeps almost everything while paying almost nothing. ML exploits this constantly.