General ML

Curse of Dimensionality

High-dimensional geometry breaks the intuitions your methods rely on

01 · First principlesYour geometric intuition is three-dimensional

Every spatial instinct you own was trained in d = 3: volumes feel even, "nearby" feels meaningful, a sphere mostly fills its bounding box. Many ML methods quietly assume these instincts scale — that distance measures similarity, that a neighbourhood contains neighbours, that sampling a space covers it. In high dimensions, every one of those assumptions fails, and it fails for plain geometric reasons rather than statistical ones. We can compute exactly how.

02 · Volume flees the centreCorners and shells take everything

Inscribe the largest possible sphere in a unit cube. In 2D the disc covers most of the square; the corners are slivers. Now raise d — the cube keeps 2d corners, and the sphere's share of the volume collapses:

Vsphere / Vcube = πd/2 / ( 2d · Γ(d/2 + 1) )  ⟶  0
Dimension dSphere / cube volume ratioReading
20.785The disc is most of the square.
30.524About half. Intuition still roughly holds.
50.164Corners now dominate.
100.0025The "centre" of the cube is essentially empty.
202.5 × 10⁻⁸Effectively all volume sits in the 2²⁰ ≈ 10⁶ corners.

A matching fact about spheres themselves: nearly all of a high-d ball's volume lies in a thin shell at the surface (the fraction within radius 0.9 is 0.9d, which is 0.000027 at d = 100). Uniform high-dimensional mass lives in corners and shells; "the middle" — where your intuition keeps the data — holds nothing.

03 · Distance stops discriminatingThe nearest neighbour is barely near

A point's distance to a random point is a sum of d independent per-coordinate contributions, so its mean grows like √d while its standard deviation stays put — the relative spread shrinks like 1/√d. All pairwise distances pile into a narrowing band:

( dmax − dmin ) / dmin  ⟶  0   — the nearest neighbour is scarcely nearer than the farthest
PAIRWISE DISTANCE (NORMALISED BY MEAN) → DENSITY d = 2 · BROAD — "NEAR" MEANS SOMETHING d = 1000 · A SPIKE — ALL PAIRS EQUIDISTANT

Distributions of pairwise distances. As d grows the histogram collapses to a spike: distance carries almost no contrast.

This is the operational curse. Nearest-neighbour methods, kernels, and clustering all rank points by distance; when every distance is the same number plus noise, the ranking is noise. The method still runs and still returns answers — it has simply stopped measuring anything.

04 · The sample-cost statementCoverage is exponential in d

Suppose density estimation needs roughly 10 samples per axis-direction of resolution. Then covering d dimensions at that resolution needs about 10d samples: a hundred points handle d = 2, and d = 100 needs more samples than there are atoms in the universe. Equivalently, to capture 10% of a unit cube's volume with a sub-cube you must take 0.11/d of each edge — at d = 100 that is 98% of every axis. A "local" neighbourhood containing any fixed fraction of the data is not local at all.

So the honest conclusion would be: learning anything nonparametric in d = 10⁴ (a small image) is hopeless. Yet vision models work. Something about the premise must be false.

05 · The reprieveReal data does not fill the space

The false premise is uniformity. Natural data occupies a vanishing, structured sliver of its ambient space: random pixel arrays are never photographs, random character strings are never sentences. The manifold hypothesis says real data concentrates near a low-dimensional surface embedded in the high-dimensional container — faces in pixel space vary along perhaps a few dozen coordinates (pose, lighting, identity), not 10⁶. The curse applies to the ambient dimension; learning only has to pay for the intrinsic one.

Victims · trust raw ambient distance
KNN on pixels, RBF kernels with naive Euclidean metrics, grid search over raw coordinates, density estimation in the original space. All inherit the spike in section 03.
Survivors · learn the manifold's coordinates
Deep networks learn representations in which distance is meaningful again — KNN on good embeddings works fine. Dimensionality reduction is the explicit version of the same move.

One inversion worth noticing on the way out: the same concentration that ruins distances also makes high-dimensional randomness well behaved — random vectors are nearly orthogonal, random projections nearly preserve geometry (Johnson–Lindenstrauss), and sums concentrate hard. The curse and the blessings of dimensionality are the same theorems, read in opposite moods.

Mental Model