One number: how much the map scales volume, and whether it flips space
The cofactor-expansion formula is where intuition goes to die, so start elsewhere. A matrix A moves every region of space somewhere. Here is a remarkable fact: every region, whatever its shape, has its volume multiplied by the same constant factor. A map that doubles the unit square's area doubles every area. That universal factor is the determinant, up to a sign that records orientation.
Why is the factor universal? Because linearity has no favourites: any region can be tiled by tiny cubes, A sends every tiny cube to the same-shaped tiny parallelepiped, so every tile is scaled identically, and the sum inherits the factor. The unit square is just the convenient test particle.
Apply a 2×2 matrix to the unit square. The corners (1,0) and (0,1) land on the columns of A, so the square becomes the parallelogram spanned by the columns — and det A is its signed area.
e₁ and e₂ land on the columns of A; the square they spanned becomes the parallelogram the columns span. Its signed area is the determinant.
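A minimal sketch of the picture above, in plain Python. The matrix `A`, the test shapes, and the helper names (`det2`, `apply`, `signed_area`, `area_ratio`) are illustrative choices, not anything fixed by the text. It checks that the unit square and an arbitrary triangle are scaled by the same factor, and that this factor matches the 2×2 determinant.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def apply(m, p):
    """Apply the 2x2 matrix m to the point p = (x, y)."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)


def signed_area(polygon):
    """Shoelace formula: positive when vertices run counterclockwise."""
    total = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2


def area_ratio(m, polygon):
    """Signed area of the image divided by signed area of the original."""
    image = [apply(m, p) for p in polygon]
    return signed_area(image) / signed_area(polygon)


A = [[2, 1], [1, 3]]        # example matrix (assumption), columns (2,1) and (1,3)
FLIP = [[0, 1], [1, 0]]     # swaps the axes: a reflection

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (4, 1), (1, 5)]

print(det2(A), area_ratio(A, unit_square), area_ratio(A, triangle))
print(det2(FLIP), area_ratio(FLIP, unit_square))
```

The shoelace formula keeps track of vertex order, so the reflection reports a negative ratio: the image polygon is traversed clockwise.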
The picture immediately explains det = 0: if the columns are linearly dependent, the parallelogram degenerates to a flat segment with zero area. The map collapses a dimension, and everything about singular matrices follows from that. It also explains the determinant in the denominator of the 2×2 inverse: to undo the map you divide volumes back out, and you cannot divide by zero.
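The role of the determinant in the 2×2 inverse can be made concrete with the standard adjugate formula. This is a small sketch with exact rational arithmetic; the function name `inverse_2x2` and the example matrices are assumptions made for illustration, and the inputs are assumed to be integers.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inverse_2x2(m):
    """Inverse of an integer 2x2 matrix: swap the diagonal, negate the
    off-diagonal, divide everything by the determinant."""
    (a, b), (c, d) = m
    det_m = a * d - b * c
    if det_m == 0:
        # Zero area: a dimension was collapsed, nothing to divide back out.
        raise ZeroDivisionError("singular matrix: det = 0, no inverse")
    return [
        [Fraction(d, det_m), Fraction(-b, det_m)],
        [Fraction(-c, det_m), Fraction(a, det_m)],
    ]


A = [[2, 1], [1, 3]]              # det = 5
A_inv = inverse_2x2(A)
print(A_inv, det2(A_inv))         # the inverse scales area by 1/det(A)

singular = [[1, 2], [2, 4]]       # second column is twice the first
try:
    inverse_2x2(singular)
except ZeroDivisionError as err:
    print(err)
```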
The sign is orientation. det < 0 means the map flips space's handedness — a reflection is involved; the parallelogram came out "face down". |det| carries the volume, the sign carries the flip.
Composition multiplies. Apply B then A and volumes scale by both factors in turn, so the algebra is forced by the geometry:
det(AB) = det(A) · det(B)
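The multiplicative rule det(AB) = det(A) · det(B) is easy to check by hand on small integer matrices. The sketch below uses a recursive cofactor expansion purely because it is short and exact for tiny inputs; the matrices `A` and `B` and the helper names are arbitrary illustrative choices.

```python
def det(m):
    """Determinant by cofactor expansion along the first row.
    Exponential cost, so only sensible for very small matrices."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


A = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]
B = [[1, 2, 0], [0, 1, 0], [3, 0, 2]]

print(det(A), det(B), det(matmul(A, B)))
```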
Eigenvalues factor it. Along each eigendirection the map is a pure stretch by λᵢ, and a box aligned with those directions has its volume scaled by all the stretches at once:
det A = λ₁ · λ₂ ⋯ λₙ
One zero eigenvalue zeroes the product — the algebraic echo of one crushed direction killing the whole volume. A triangular matrix shows the same logic nakedly: its eigenvalues sit on the diagonal, so its determinant is the product of diagonal entries. Hold this fact; it is about to pay for itself.
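One way to see the eigenvalue product without calling an eigensolver is to build a matrix from eigenvalues chosen in advance. The sketch below assumes a hand-picked eigenvector matrix `P` with determinant 1 (so its inverse has integer entries); `from_eigenvalues`, `P`, and the triangular matrix `T` are all illustrative.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


P = [[1, 1], [1, 2]]          # columns are the eigenvectors (1,1) and (1,2)
P_INV = [[2, -1], [-1, 1]]    # exact inverse of P, since det P = 1


def from_eigenvalues(l1, l2):
    """Build A = P D P^-1, which stretches by l1 along (1,1) and l2 along (1,2)."""
    D = [[l1, 0], [0, l2]]
    return matmul(matmul(P, D), P_INV)


A = from_eigenvalues(2, 3)
A_crushed = from_eigenvalues(0, 3)        # one direction flattened
T = [[2, 7, 1], [0, 3, 5], [0, 0, 4]]     # upper triangular

print(A, det(A))              # determinant equals 2 * 3
print(A_crushed, det(A_crushed))
print(det(T), 2 * 3 * 4)      # product of the diagonal entries
```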
The named connection: change of variables. Push a density through an invertible map f and probability mass is conserved, but the volume it occupies changes by the local volume factor of f, which is the absolute value of the determinant of its Jacobian. Densities must compensate:
p_Y(y) = p_X(x) / |det J_f(x)|, where y = f(x)
A normalizing flow is a neural network trained directly on this identity: an invertible f pulling data back to a Gaussian, with the log-determinant term as the price of honesty in the likelihood. (Diffusion's probability-flow ODE computes the same correction continuously, as a running trace.)
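To see why the log-determinant term is needed, consider the simplest possible flow: a fixed affine map f(x) = Wx + b pulling 2D data back to a standard Gaussian. The weights `W`, the offset `b`, and the function names below are hypothetical, and nothing is trained. The sketch integrates the model density over a grid, once with the correction and once without; only the corrected density should have total mass close to 1.

```python
import math

W = [[2.0, 0.5], [0.0, 1.5]]    # hypothetical flow weights, det W = 3
b = [0.3, -0.2]                 # hypothetical offset


def f(x):
    """The flow: maps a data point x to a latent point z = W x + b."""
    return [
        W[0][0] * x[0] + W[0][1] * x[1] + b[0],
        W[1][0] * x[0] + W[1][1] * x[1] + b[1],
    ]


def log_standard_normal(z):
    """Log density of a 2D standard Gaussian."""
    return -0.5 * (z[0] ** 2 + z[1] ** 2) - math.log(2 * math.pi)


# For an affine map the Jacobian is W everywhere.
LOG_ABS_DET = math.log(abs(W[0][0] * W[1][1] - W[0][1] * W[1][0]))


def log_density(x, correct=True):
    """log p_X(x) = log p_Z(f(x)) + log|det J_f(x)|; the last term is optional here."""
    lp = log_standard_normal(f(x))
    return lp + LOG_ABS_DET if correct else lp


def total_mass(correct, lo=-6.0, hi=6.0, steps=240):
    """Midpoint-rule integral of exp(log_density) over the square [lo, hi]^2."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        for j in range(steps):
            x = [lo + (i + 0.5) * h, lo + (j + 0.5) * h]
            total += math.exp(log_density(x, correct))
    return total * h * h


print(total_mass(correct=True))     # close to 1
print(total_mass(correct=False))    # close to 1 / |det W|
```

Dropping the correction leaves a function that integrates to 1/|det W| rather than 1, so a likelihood computed from it would not be a likelihood at all.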
The catch is cost. A general n×n determinant costs O(n³), and worse, it is needed per data point per training step, with gradients. So flow architectures are an exercise in determinant-dodging: design layers whose Jacobian is triangular by construction (coupling layers, autoregressive flows), so that log|det J| is just the sum of log|Jᵢᵢ| over the diagonal, an O(n) quantity that is read off rather than computed. The entire architecture of RealNVP-style models is the shape of this one accounting trick.
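A rough sketch of the trick for a single affine coupling layer on a 4-dimensional input. The functions `s` and `t` below are hypothetical stand-ins for the scale and shift networks, written as fixed formulas so the example stays self-contained. The layer returns its log-determinant as a plain sum, and the check compares that sum against a brute-force determinant of a finite-difference Jacobian.

```python
import math


def s(xa):
    """Hypothetical stand-in for the log-scale network."""
    return [math.tanh(0.5 * xa[0] - 0.3 * xa[1]), math.tanh(0.2 * xa[0] + 0.4 * xa[1])]


def t(xa):
    """Hypothetical stand-in for the shift network."""
    return [xa[0] * xa[1], math.sin(xa[0])]


def coupling_forward(x):
    """Affine coupling: the first half passes through unchanged, the second
    half is scaled and shifted by functions of the first half.
    Returns (y, log|det J|), where the log-det is just a sum."""
    xa, xb = x[:2], x[2:]
    log_scale, shift = s(xa), t(xa)
    yb = [xb[i] * math.exp(log_scale[i]) + shift[i] for i in range(2)]
    return xa + yb, sum(log_scale)


def numerical_jacobian(fn, x, eps=1e-6):
    """Central-difference Jacobian, J[i][j] = d y_i / d x_j."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        yp, ym = fn(xp), fn(xm)
        cols.append([(yp[i] - ym[i]) / (2 * eps) for i in range(n)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def det(m):
    """Brute-force determinant by cofactor expansion (tiny matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


x = [0.7, -1.2, 0.4, 2.0]
y, cheap_logdet = coupling_forward(x)
J = numerical_jacobian(lambda v: coupling_forward(v)[0], x)
print(cheap_logdet, math.log(abs(det(J))))
```

The Jacobian here is lower block-triangular: an identity block for the untouched half, and a diagonal block of exp(s) for the transformed half. The dependence on `t` and on the derivatives of `s` lives entirely below the diagonal, which is why it never enters the determinant.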