Two failures, one gap, one instrument
A model can fail on unseen data from opposite directions. It can memorise: reproduce the training set, private noise and all, having learned the sample rather than the structure (overfitting). Or it can fail to learn at all: lack the capacity or the training to capture structure that is plainly in the data (underfitting). These are the operational faces of the bias–variance tradeoff: underfitting is what high bias looks like in a training log, and overfitting is what high variance looks like. The bias–variance note explains where the error comes from; this one is about detecting which failure you have, because the remedies point in opposite directions and applying the wrong one makes things worse.
The diagnosis requires exactly two numbers: training error and validation error. Their absolute levels and the gap between them place you in one of four quadrants, where "high" and "low" are judged against the error you would accept for the task.
| Train error | Validation error | Diagnosis | Read it as |
|---|---|---|---|
| high | high (small gap) | underfitting | cannot even fit the data it has seen — capacity or optimisation problem |
| low | high (large gap) | overfitting | fits seen data, fails unseen — memorising sample noise |
| high | even higher | both | too little capacity and unstable fit; common with bad features or bugs |
| low | low (small gap) | healthy | learning structure that transfers — stop fiddling |
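The table can be read as a small decision rule. The sketch below is one way to write it down, not a standard routine: the function name `diagnose` and the thresholds `high` and `gap` are hypothetical and must be chosen per task (for example, relative to a baseline or to the noise floor you expect).

```python
def diagnose(train_err, val_err, high=0.15, gap=0.05):
    """Place a (train, validation) error pair in one of the four quadrants.

    `high` is the assumed level above which training error counts as high;
    `gap` is the assumed validation-minus-train difference that counts as large.
    Both are illustrative defaults, not recommended values.
    """
    train_high = train_err > high
    large_gap = (val_err - train_err) > gap
    if train_high and large_gap:
        return "both"          # high train error, validation even higher
    if train_high:
        return "underfitting"  # high train error, small gap
    if large_gap:
        return "overfitting"   # low train error, large gap
    return "healthy"           # low train error, small gap


for pair in [(0.30, 0.32), (0.02, 0.25), (0.30, 0.45), (0.03, 0.05)]:
    print(pair, diagnose(*pair))
```

With the default thresholds, the four example pairs land in the underfitting, overfitting, both, and healthy rows respectively.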
A single (train, val) pair is a snapshot; learning curves — error versus training set size — show the trend, and the trend tells you what more data would buy. The two failures produce unmistakably different pictures:
Left: curves converge at a high plateau, so more data cannot help; the model is the ceiling. Right: a persistent gap with the validation curve still descending, so more data is exactly what would help.
This is the cheapest expensive-question-answerer in ML: before paying for data collection, plot the curves on subsamples. If train and validation have already converged (left panel), the money is wasted; the model family is the bottleneck. If a large gap persists and validation error is still falling with size (right panel), data is precisely the purchase to make.
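A minimal sketch of computing learning curves on nested subsamples, using only the standard library. Everything here is an illustrative assumption: the synthetic data (a noisy sine), the two toy models (a straight line as the too-rigid model, 1-nearest-neighbour as the memoriser), and the helper names `fit_line`, `fit_1nn`, and `learning_curve`. In practice you would substitute your own model and data.

```python
import math
import random


def mse(model, xs, ys):
    """Mean squared error of `model` on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys):
    """Least-squares straight line: too rigid for curved data (high bias)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_1nn(xs, ys):
    """1-nearest-neighbour: reproduces every training point (high variance)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def learning_curve(fit, train, val, sizes):
    """Return (size, train_mse, val_mse) for nested subsamples of `train`."""
    val_xs, val_ys = zip(*val)
    rows = []
    for n in sizes:
        xs, ys = zip(*train[:n])
        model = fit(xs, ys)
        rows.append((n, mse(model, xs, ys), mse(model, val_xs, val_ys)))
    return rows


rng = random.Random(0)  # fixed seed so the run is repeatable


def sample(n):
    """Draw n points from the assumed toy process y = sin(x) + noise."""
    pts = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        pts.append((x, math.sin(x) + rng.gauss(0, 0.3)))
    return pts


train, val = sample(400), sample(200)
sizes = [25, 50, 100, 200, 400]
curves = {
    "line": learning_curve(fit_line, train, val, sizes),
    "1-NN": learning_curve(fit_1nn, train, val, sizes),
}
for name, rows in curves.items():
    print(name)
    for n, tr, va in rows:
        print(f"  n={n:4d}  train={tr:.3f}  val={va:.3f}  gap={va - tr:.3f}")
```

The straight line should show train and validation errors sitting close together at a high level, while 1-NN shows zero training error and a gap that does not close. Note that in this toy setup the 1-NN validation error is eventually limited by the label noise, so it illustrates the persistent gap rather than guaranteeing that more data keeps helping.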
Two modern footnotes. First, training error near zero is no longer automatically alarming: large overparameterised networks routinely interpolate their training data and still generalise (double descent; the implicit regularisation of SGD is doing unbilled work). The gap remains meaningful, but "train error ≈ 0" alone is not a diagnosis. Second, the entire framework rests on the validation set being honest — drawn from the deployment distribution and untouched by training. Leakage produces a small gap and a confident "healthy" verdict for a model that will fail in production; the hygiene rules live in cross-validation.
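The crudest form of leakage, validation rows that also appear verbatim in the training set, can be checked mechanically. The sketch below assumes rows are sequences of hashable values; the name `overlap_fraction` is hypothetical. A result of zero does not prove the validation set is honest (near-duplicates, temporal leakage, and preprocessing fitted on all the data pass straight through), but a non-zero result is a clear warning.

```python
def overlap_fraction(train_rows, val_rows):
    """Fraction of validation rows that also appear verbatim in training."""
    if not val_rows:
        return 0.0
    seen = {tuple(row) for row in train_rows}
    hits = sum(1 for row in val_rows if tuple(row) in seen)
    return hits / len(val_rows)


# Toy rows for illustration only: one of the two validation rows is a copy.
train_rows = [(1.0, 2.0, "a"), (3.0, 4.0, "b"), (5.0, 6.0, "c")]
val_rows = [(3.0, 4.0, "b"), (7.0, 8.0, "d")]
print(overlap_fraction(train_rows, val_rows))  # 0.5
```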