No training, all inference — the model is the data
Every learner exploits some assumed structure in the data. k-NN exploits the weakest one imaginable: smoothness — nearby inputs tend to share labels. If you believe only that and nothing else, the honest algorithm writes itself: store every training point, and to predict at x, find the k stored points nearest to x and let them vote (or average, for regression).
There is no training phase at all. No parameters are fit, no loss is minimised; the "model" is the training set itself plus a distance function. All the cost moves to inference, where a naive query scans all n points. This inversion (zero work up front, full work per query) is the opposite of a parametric model like an SVM, which spends effort once to compress the data into a few parameters.
k-NN puts the bias–variance tradeoff on a single integer, which makes it the cleanest teaching example in the field.
Sliding k from 1 to n smooths the boundary continuously between these extremes. Larger k averages over more neighbours, suppressing noise (variance down) while blurring genuine fine structure (bias up). Choose k by cross-validation; odd k avoids ties in binary problems.
Same data, two values of k. At k = 1 a single mislabelled red point earns its own enclave; at k = 15 its neighbours outvote it.
People say k-NN "has no model." It does — the model is the distance function, and it is supplied by you, untrained. Two consequences follow immediately:
The smoothness bet requires that "nearest" neighbours are actually near. In high dimensions they are not: volume concentrates so that all pairwise distances become nearly equal, and the ratio of nearest to farthest neighbour distance drifts toward 1. To capture even a small fixed fraction of the data within a neighbourhood, the neighbourhood's radius must grow until "local" is meaningless. The voting neighbours are then no closer to the query than random points, and k-NN degrades to guessing the prior.
This is the curse of dimensionality in its purest form, and k-NN is its most direct victim because k-NN has nothing else to fall back on — no learned weights, no margins, no feature selection.
The fix was not to repair k-NN but to repair the space it operates in. Deep networks learn embeddings in which semantic similarity is geometric proximity — and in such a space, nearest-neighbour lookup is exactly the right algorithm again. Add approximate nearest-neighbour indexes (HNSW, IVF, product quantisation) to cut query cost from O(n) to roughly O(log n), and the lazy learner becomes infrastructure: