Linear Algebra

Determinant

One number: how much the map scales volume, and whether it flips space

01 · First principles

Forget the formula; ask what one number could summarise a map

The cofactor-expansion formula is where intuition goes to die, so start elsewhere. A matrix A moves every region of space somewhere. Here is a remarkable fact: every region, whatever its shape, has its volume multiplied by the same constant factor. A map that doubles the unit square's area doubles every area. That universal factor is the size of the determinant, |det A|; the determinant itself also carries a sign, which records orientation.

vol( A(S) ) = |det A| · vol(S) for every region S

one scalar summarises the map's effect on all volumes

Why is the factor universal? Because linearity has no favourites. Tile any region with tiny identical cubes: A sends every one of them to a translated copy of the same tiny parallelepiped, so every tile's volume is scaled by the same factor, and so is their sum (in the limit of ever-finer tiles, for curved regions). The unit square is just the convenient test particle.
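A quick numerical check of the universal-factor claim, as a minimal sketch: it assumes regions are simple polygons (areas from the shoelace formula) and uses one arbitrary 2×2 matrix. Three differently shaped regions should all have their areas scaled by the same number.

```python
# Sketch: a linear map scales every region's area by the same factor.
# Assumption: regions are simple polygons given as vertex lists, areas from the shoelace formula.

def signed_area(poly):
    """Shoelace formula: positive for counterclockwise vertex order, negative for clockwise."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0

def transform(A, poly):
    """Map each vertex (x, y) to A @ (x, y), with A given as [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return [(a * x + b * y, c * x + d * y) for x, y in poly]

A = [[2.0, 1.0], [0.5, 1.5]]   # an arbitrary example map
shapes = {
    "unit square": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "triangle": [(0, 0), (3, 0), (1, 2)],
    "L-shape": [(0, 0), (2, 0), (2, 1), (1, 1), (1, 3), (0, 3)],
}
ratios = {name: signed_area(transform(A, p)) / signed_area(p) for name, p in shapes.items()}
for name, r in ratios.items():
    print(f"{name}: area scaled by {r}")   # the same factor for every shape
```

Every ratio comes out as 2.5, which is ad − bc = 2 · 1.5 − 1 · 0.5 for this A.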

02 · The picture

The unit square becomes the column parallelogram

Apply a 2×2 matrix to the unit square. The corners (1,0) and (0,1) land on the columns of A, so the square becomes the parallelogram spanned by the columns — and det A is its signed area.

e₁ and e₂ land on the columns of A; the square they spanned becomes the parallelogram the columns span. Its signed area is the determinant.
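The column picture can be checked directly. The sketch below (plain Python, matrices as nested lists, an illustrative choice) computes the signed area of the parallelogram spanned by A's columns as a 2D cross product, which is the familiar ad − bc:

```python
def column_parallelogram_area(A):
    """Signed area of the parallelogram spanned by the columns of A = [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    col1 = (a, c)   # where e1 = (1, 0) lands
    col2 = (b, d)   # where e2 = (0, 1) lands
    # 2D cross product of the two columns: a*d - c*b
    return col1[0] * col2[1] - col1[1] * col2[0]

examples = {
    "shear": [[1, 1], [0, 1]],        # slides the square sideways: area kept
    "stretch": [[3, 0], [0, 2]],      # scales the axes: area x6
    "reflection": [[0, 1], [1, 0]],   # swaps the axes: area kept, orientation flipped
    "dependent": [[1, 2], [2, 4]],    # second column = 2 * first: the parallelogram is flat
}
for name, A in examples.items():
    print(f"{name}: signed area {column_parallelogram_area(A)}")
```

The reflection comes out at −1: its area is preserved but its orientation flips. The dependent pair gives 0.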

The picture immediately explains det = 0: if the columns are linearly dependent, the parallelogram flattens into a segment of area zero. The map collapses a dimension, and everything about singular matrices follows, from the missing inverse to the nonzero vectors sent to zero. It also explains why the determinant sits in the denominator of the 2×2 inverse: to undo the map you divide volumes back out, and you cannot divide by zero.
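To see the determinant in the denominator, here is a sketch of the 2×2 inverse formula (swap the diagonal, negate the off-diagonal, divide by det A). The helper names `inverse_2x2` and `matmul_2x2` are illustrative, and the exact `det == 0` test assumes exact inputs; floating-point code would compare against a tolerance instead.

```python
def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]]: swap the diagonal, negate the off-diagonal, divide by det."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        # dependent columns: the map flattens area to zero, so there is nothing to divide back out
        raise ValueError("singular matrix: determinant is zero")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul_2x2(A, B):
    """Plain 2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A = [[2.0, 1.0], [0.5, 1.5]]            # det = 2.5
print(matmul_2x2(A, inverse_2x2(A)))    # the identity, up to rounding
try:
    inverse_2x2([[1, 2], [2, 4]])       # columns dependent, det = 0
except ValueError as err:
    print(err)
```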

03 · The algebra

Sign, products, and eigenvalues

The sign is orientation. det < 0 means the map flips space's handedness — a reflection is involved; the parallelogram came out "face down". |det| carries the volume, the sign carries the flip.

Composition multiplies. Apply B then A and volumes scale by both factors in turn, so the algebra is forced by the geometry:

det(AB) = det(A) · det(B), det(A⁻¹) = 1 / det(A)
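The product rule can be spot-checked exactly on small integer matrices. This sketch sticks to 2×2 matrices so the determinant is just ad − bc, and uses `fractions.Fraction` so that det(A⁻¹) = 1 / det(A) is tested without rounding error; the random ranges and the seed are arbitrary choices.

```python
import random
from fractions import Fraction

def det2(M):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    return a * d - b * c

def matmul(A, B):
    """Plain 2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(M):
    """Exact 2x2 inverse using rational arithmetic; M must be invertible."""
    (a, b), (c, d) = M
    D = Fraction(det2(M))
    return [[d / D, -b / D], [-c / D, a / D]]

random.seed(0)
for _ in range(1000):
    A = [[random.randint(-5, 5) for _ in range(2)] for _ in range(2)]
    B = [[random.randint(-5, 5) for _ in range(2)] for _ in range(2)]
    assert det2(matmul(A, B)) == det2(A) * det2(B)          # volume factors compose
    if det2(A) != 0:
        assert det2(inv2(A)) == 1 / Fraction(det2(A))       # undoing A divides its factor back out
print("product rule held on 1000 random integer pairs")
```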

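Eigenvalues factor it. When A has a full set of eigenvectors, the map is a pure stretch by λᵢ along each eigendirection, so the parallelepiped spanned by those eigenvectors has its volume scaled by all the stretches at once. The identity holds for every square matrix, counting eigenvalues with multiplicity and including complex ones: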

det A = λ₁ λ₂ ⋯ λₙ

total volume change = product of per-direction stretches
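For the 2×2 case the eigenvalue identity can be derived in two lines from the characteristic polynomial, a tool the picture does not need but which makes the identity airtight: the constant term of det(A − λI) is det A, and by Vieta's formulas it is also the product of the roots.

```latex
\begin{aligned}
\det(A - \lambda I)
  &= \det\begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix}
   = \lambda^2 - (a + d)\,\lambda + (ad - bc) \\
  &= (\lambda - \lambda_1)(\lambda - \lambda_2)
   = \lambda^2 - (\lambda_1 + \lambda_2)\,\lambda + \lambda_1 \lambda_2 \\
\Longrightarrow\quad \lambda_1 \lambda_2 &= ad - bc = \det A
\end{aligned}
```

For example, [[2, 1], [1, 2]] stretches by 3 along (1, 1) and by 1 along (1, −1), and 3 · 1 = 3 = 4 − 1. The 90° rotation [[0, −1], [1, 0]] has no real eigendirection, yet its eigenvalues ±i still multiply to 1 = det.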

One zero eigenvalue zeroes the product — the algebraic echo of one crushed direction killing the whole volume. A triangular matrix shows the same logic nakedly: its eigenvalues sit on the diagonal, so its determinant is the product of diagonal entries. Hold this fact; it is about to pay for itself.
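The triangular shortcut is also how determinants are computed in practice: row-reduce to upper-triangular form, then multiply the diagonal. Adding a multiple of one row to another is a shear, which preserves volume, and each row swap flips the sign. The sketch below is plain Gaussian elimination with partial pivoting, O(n³); the function name is illustrative, and library routines typically do the same work through an LU factorisation.

```python
def det_by_elimination(M):
    """Reduce M to upper-triangular form with row operations, then multiply the diagonal.
    Row additions (shears) leave det unchanged; each row swap flips its sign."""
    A = [list(map(float, row)) for row in M]   # work on a float copy
    n = len(A)
    sign = 1.0
    for col in range(n):
        # partial pivoting: bring the largest remaining entry in this column onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0.0:
            return 0.0                         # no pivot available: the matrix is singular
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
    result = sign
    for i in range(n):
        result *= A[i][i]                      # triangular: det = product of the diagonal
    return result

T = [[2, 7, -1], [0, 3, 5], [0, 0, 0.5]]       # already triangular: 2 * 3 * 0.5 = 3
M = [[0, 2, 1], [1, 1, 0], [3, 0, 2]]          # needs row swaps along the way
print(det_by_elimination(T), det_by_elimination(M))
```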

04 · Why ML cares

Normalizing flows live on log|det J|

The named connection: change of variables. Push a density through an invertible map f and probability mass is conserved, but the volume it occupies changes by the local volume factor of f, which is |det J_f(x)|, the absolute value of its Jacobian determinant. Densities must compensate:

log p_x(x) = log p_z(f(x)) + log |det J_f(x)|

where the generative map f⁻¹ compresses volume, density piles up
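A one-dimensional sketch of the change-of-variables identity, where the Jacobian determinant is just the derivative f′(x). It assumes the simplest possible flow, an affine map f(x) = (x − μ)/σ that sends N(μ, σ²) data to a standard normal; μ and σ are arbitrary example values. The density predicted by the formula should match the closed-form Gaussian density.

```python
import math

def std_normal_logpdf(z):
    """log density of N(0, 1), the base distribution p_z."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def normal_logpdf(x, mu, sigma):
    """Closed-form log density of N(mu, sigma^2), used only as a reference."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

mu, sigma = 3.0, 0.5   # example flow parameters

def f(x):
    """The 'flow': pulls data x back to the latent z."""
    return (x - mu) / sigma

log_abs_det_J = math.log(abs(1.0 / sigma))   # f'(x) = 1/sigma everywhere

for x in [2.0, 3.0, 4.25]:
    via_flow = std_normal_logpdf(f(x)) + log_abs_det_J   # log p_z(f(x)) + log|det J_f(x)|
    direct = normal_logpdf(x, mu, sigma)
    print(x, via_flow, direct)                           # the two values agree
```

With σ = 0.5 the generative direction z ↦ μ + σz halves lengths, so the data density is twice as tall as the latent one, and log|f′| = log 2 is exactly that correction.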

A normalizing flow is a neural network trained directly on this identity: an invertible f pulling data back to a Gaussian, with the log-determinant term as the price of honesty in the likelihood. (Diffusion's probability-flow ODE computes the same correction continuously, as a running trace.)

The catch is cost. A general n×n determinant costs O(n³), and worse, it is needed per data point per training step, with gradients. So flow architectures are an exercise in determinant-dodging: design layers whose Jacobian is triangular by construction (coupling layers, autoregressive flows), so log|det J| is just the sum of log |diagonal entries|, O(n), read off rather than computed. The entire architecture of RealNVP-style models is shaped by this one accounting trick.
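A sketch of why coupling layers make the log-determinant cheap. It assumes a RealNVP-style affine coupling on a 4-vector: the first half passes through unchanged, and the second half is scaled by exp(s(x₁)) and shifted by t(x₁). The functions `s` and `t` here are hypothetical toy stand-ins for the neural networks a real flow would learn. A finite-difference Jacobian is computed only to confirm that the matrix really is lower-triangular and that its diagonal gives the same log|det J| as the cheap sum.

```python
import math

# Hypothetical stand-ins for the learned scale and shift networks s(.) and t(.).
# Any functions of x1 alone would do; that restriction is what keeps the Jacobian triangular.
def s(x1):
    return [0.5 * math.tanh(x1[0] + x1[1]), 0.3 * x1[0]]

def t(x1):
    return [x1[1] ** 2, math.sin(x1[0])]

def coupling_forward(x):
    """Affine coupling on a 4-vector: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1)."""
    x1, x2 = x[:2], x[2:]
    scale, shift = s(x1), t(x1)
    y2 = [x2[i] * math.exp(scale[i]) + shift[i] for i in range(2)]
    # Jacobian diagonal is [1, 1, exp(s_1), exp(s_2)], so log|det J| = s_1 + s_2: O(n), no determinant.
    return x1 + y2, sum(scale)

def numerical_jacobian(fn, x, eps=1e-6):
    """Central-difference Jacobian, used only to check the cheap log-det against the full matrix."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        yp, _ = fn(xp)
        ym, _ = fn(xm)
        for i in range(n):
            J[i][j] = (yp[i] - ym[i]) / (2 * eps)
    return J

x = [0.4, -1.2, 2.0, 0.7]
y, log_det = coupling_forward(x)
J = numerical_jacobian(coupling_forward, x)
upper = max(abs(J[i][j]) for i in range(4) for j in range(i + 1, 4))   # entries above the diagonal
diag_log_det = sum(math.log(abs(J[i][i])) for i in range(4))
print(log_det, diag_log_det, upper)   # first two agree; upper is 0, so J is lower-triangular
```

The layer never forms or factorises J; the dense block below the diagonal can be arbitrarily complicated without affecting the cost of log|det J|.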

Elsewhere in ML: log det Σ in the Gaussian log-likelihood (computed via Cholesky, see PSD matrices), volume terms in variational bounds, and determinantal point processes for diverse sampling. Wherever volume meets probability, a log-determinant is the exchange rate.
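For the Gaussian case, a sketch of log det Σ via Cholesky: Σ = LLᵀ with L lower-triangular, so det Σ = (∏ Lᵢᵢ)², and the triangular-diagonal trick applies again. This is a plain, unpivoted Cholesky written out for illustration; the example Σ is made up, and its determinant works out to 3.48.

```python
import math

def cholesky(S):
    """Lower-triangular L with S = L L^T, for a symmetric positive-definite S (no pivoting)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return L

def logdet_spd(S):
    # det S = det(L) * det(L^T) = (product of L's diagonal)^2, so log det S = 2 * sum(log L_ii).
    # Summing logs also avoids the overflow or underflow a raw product of many entries can hit.
    L = cholesky(S)
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(S)))

Sigma = [[4.0, 2.0, 0.6], [2.0, 2.0, 0.5], [0.6, 0.5, 1.0]]
print(logdet_spd(Sigma))   # log(3.48)
```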

Mental Model

det A is the universal volume scale factor: every region's volume, multiplied by the same number.
The unit square lands on the column parallelogram; dependent columns flatten it, so det = 0 means collapse.
Sign = orientation flip; det(AB) = det A · det B because volume factors compose; det = product of eigenvalues.
Change of variables: densities trade off against volume via log|det J| — the heart of normalizing flows.
Flows never compute general determinants at scale; their architectures make the Jacobian triangular so log|det| is a diagonal sum.