High-dimensional geometry breaks the intuitions your methods rely on
Every spatial instinct you own was trained in d = 3: volumes feel even, "nearby" feels meaningful, a sphere mostly fills its bounding box. Many ML methods quietly assume these instincts scale — that distance measures similarity, that a neighbourhood contains neighbours, that sampling a space covers it. In high dimensions, every one of those assumptions fails, and it fails for plain geometric reasons rather than statistical ones. We can compute exactly how.
Inscribe the largest possible sphere in a unit cube. In 2D the disc covers most of the square; the corners are slivers. Now raise d — the cube keeps 2d corners, and the sphere's share of the volume collapses:
| Dimension d | Sphere / cube volume ratio | Reading |
|---|---|---|
| 2 | 0.785 | The disc is most of the square. |
| 3 | 0.524 | About half. Intuition still roughly holds. |
| 5 | 0.164 | Corners now dominate. |
| 10 | 0.0025 | The "centre" of the cube is essentially empty. |
| 20 | 2.5 × 10⁻⁸ | Effectively all volume sits in the 2²⁰ ≈ 10⁶ corners. |
A matching fact about spheres themselves: nearly all of a high-d ball's volume lies in a thin shell at the surface (the fraction within radius 0.9 is 0.9d, which is 0.000027 at d = 100). Uniform high-dimensional mass lives in corners and shells; "the middle" — where your intuition keeps the data — holds nothing.
A point's distance to a random point is a sum of d independent per-coordinate contributions, so its mean grows like √d while its standard deviation stays put — the relative spread shrinks like 1/√d. All pairwise distances pile into a narrowing band:
Distributions of pairwise distances. As d grows the histogram collapses to a spike: distance carries almost no contrast.
This is the operational curse. Nearest-neighbour methods, kernels, and clustering all rank points by distance; when every distance is the same number plus noise, the ranking is noise. The method still runs and still returns answers — it has simply stopped measuring anything.
Suppose density estimation needs roughly 10 samples per axis-direction of resolution. Then covering d dimensions at that resolution needs about 10d samples: a hundred points handle d = 2, and d = 100 needs more samples than there are atoms in the universe. Equivalently, to capture 10% of a unit cube's volume with a sub-cube you must take 0.11/d of each edge — at d = 100 that is 98% of every axis. A "local" neighbourhood containing any fixed fraction of the data is not local at all.
The false premise is uniformity. Natural data occupies a vanishing, structured sliver of its ambient space: random pixel arrays are never photographs, random character strings are never sentences. The manifold hypothesis says real data concentrates near a low-dimensional surface embedded in the high-dimensional container — faces in pixel space vary along perhaps a few dozen coordinates (pose, lighting, identity), not 10⁶. The curse applies to the ambient dimension; learning only has to pay for the intrinsic one.
One inversion worth noticing on the way out: the same concentration that ruins distances also makes high-dimensional randomness well behaved — random vectors are nearly orthogonal, random projections nearly preserve geometry (Johnson–Lindenstrauss), and sums concentrate hard. The curse and the blessings of dimensionality are the same theorems, read in opposite moods.