The Geometry of Generalization

Four forces stand between a model and the abyss of memorization. Each panel below is a live simulation: nothing is drawn for decoration, and every curve is computed from the actual equations. Drag the dials. Watch truth fight noise.

Force I

Regularization · L1 & L2

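A degree-9 polynomial is fit to noisy samples of a true sine wave by ridge-regularized (L2) least squares, solved in closed form via the regularized normal equations w = (XᵀX + λI)⁻¹ Xᵀy. As you raise λ, the penalty shrinks large weights and the wildly oscillating curve relaxes toward the truth; push λ too far and the fit flattens into underfitting. The bars show each weight shrinking. The closed form applies to the L2 penalty only: an L1 penalty λ‖w‖₁ has no such formula in general and tends to drive some weights exactly to zero.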

Legend: true f(x) (sine) · noisy training points · fitted polynomial
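
The closed-form ridge solve described in this panel can be sketched in a few lines of NumPy. This is a minimal illustration, not the page's own code: the target sin(πx), the noise level, the sample count, and the helper name `ridge_polyfit` are all assumptions, and the penalty is applied to every weight including the constant term.

```python
import numpy as np

def ridge_polyfit(x, y, degree, lam):
    """Solve (X^T X + lam * I) w = X^T y for polynomial features (hypothetical helper)."""
    # Columns are 1, x, x^2, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    A = X.T @ X + lam * np.eye(degree + 1)
    # Solving the linear system is more stable than forming the inverse explicitly.
    return np.linalg.solve(A, X.T @ y)

# Assumed toy data: 20 noisy samples of sin(pi * x) on [-1, 1].
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(x.size)

for lam in (1e-6, 1e-2, 1.0):
    w = ridge_polyfit(x, y, degree=9, lam=lam)
    print(f"lambda={lam:g}  weight norm={np.linalg.norm(w):.3f}")
```

The printed weight norm falls as λ grows, which is the same shrinkage the bars display.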
Force II

k-Fold Cross-Validation

The dataset is split into k folds. In each round, one fold serves as the validation set (cyan) while the remaining k − 1 folds are used for training (rose). A polynomial of the chosen degree is fit on the training folds and scored on the held-out fold. Averaging the k validation errors gives an estimate of generalization error that never scores a point the model was trained on; watch it bottom out at the right complexity.
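
The fold rotation above can be sketched as follows. The data, the default k = 5, the shuffling seed, and the function name `kfold_mse` are illustrative assumptions; the fit here is plain (unregularized) least squares via NumPy's polynomial module.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def kfold_mse(x, y, degree, k=5, seed=0):
    """Average held-out mean squared error over k folds (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.size), k)  # shuffled index folds
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = P.polyfit(x[train], y[train], degree)  # fit on the k - 1 training folds
        pred = P.polyval(x[val], coeffs)                # score on the held-out fold
        errors.append(np.mean((pred - y[val]) ** 2))
    return float(np.mean(errors))

# Assumed toy data: 40 noisy samples of sin(pi * x) on [-1, 1].
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(x.size)

scores = {d: kfold_mse(x, y, d) for d in range(1, 10)}
for d, s in scores.items():
    print(f"degree={d}  cv_mse={s:.4f}")
```

Picking the degree with the lowest averaged score is the selection rule the panel animates.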

Force III

Model Simplification · Capacity vs. Truth

Same noisy data, one dial: polynomial degree = capacity. Low degree underfits: the model is too rigid and bias dominates. High degree overfits: it memorizes noise, oscillates wildly between points, and variance dominates. The live decomposition into bias², variance, and their total shows the U-shaped curve of expected generalization error, with the sweet spot at its minimum.
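
One way to estimate such a decomposition is Monte Carlo: refit the polynomial on many independently noised training sets and measure how the predictions behave at fixed test points. The sketch below assumes a known true function sin(πx), Gaussian noise, and a hypothetical helper `bias_variance`; its total also includes the irreducible noise variance, which may differ from what the panel plots.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def bias_variance(degree, n_train=20, noise=0.2, trials=200, seed=0):
    """Monte Carlo estimate of (bias^2, variance, total) for a polynomial fit."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(-1.0, 1.0, n_train)
    x_test = np.linspace(-0.9, 0.9, 50)
    f_test = np.sin(np.pi * x_test)  # assumed true function
    preds = np.empty((trials, x_test.size))
    for t in range(trials):
        # Fresh noise on the same inputs for every trial.
        y = np.sin(np.pi * x_train) + noise * rng.standard_normal(n_train)
        preds[t] = P.polyval(x_test, P.polyfit(x_train, y, degree))
    bias2 = np.mean((preds.mean(axis=0) - f_test) ** 2)  # squared error of the average fit
    variance = np.mean(preds.var(axis=0))                # spread of fits across trials
    total = bias2 + variance + noise ** 2                # plus irreducible noise
    return bias2, variance, total

for d in (1, 3, 9):
    b2, var, tot = bias_variance(d)
    print(f"degree={d}  bias^2={b2:.4f}  variance={var:.4f}  total={tot:.4f}")
```

Bias² shrinks and variance grows as the degree rises, so their sum traces the U shape.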

Force IV

Early Stopping · The Moment of Divergence

A high-capacity model is trained by actual gradient descent. Training loss (rose) keeps falling for as long as training runs; validation loss (cyan) falls, then turns upward as the model begins memorizing. Early stopping keeps the weights from the validation minimum, marked by the gold star. Press Train and watch the two curves split.
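
A minimal sketch of this procedure, assuming a degree-9 polynomial model trained by full-batch gradient descent on mean squared error: the function name `train_with_early_stopping`, the learning rate, the patience window, and the toy data are all assumptions made for illustration. Whether the validation curve actually turns upward depends on the data and the step budget.

```python
import numpy as np

def train_with_early_stopping(X_tr, y_tr, X_va, y_va, lr=0.05, steps=5000, patience=200):
    """Gradient descent on MSE that remembers the weights at the validation minimum."""
    w = np.zeros(X_tr.shape[1])
    best_w, best_val, best_step = w.copy(), np.inf, 0
    history = []
    for step in range(steps):
        grad = 2.0 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
        w -= lr * grad
        train_loss = np.mean((X_tr @ w - y_tr) ** 2)
        val_loss = np.mean((X_va @ w - y_va) ** 2)
        history.append((train_loss, val_loss))
        if val_loss < best_val:
            # New validation minimum: snapshot the weights.
            best_w, best_val, best_step = w.copy(), val_loss, step
        elif step - best_step >= patience:
            break  # no improvement for `patience` steps
    return best_w, best_step, np.array(history)

rng = np.random.default_rng(0)

def make_split(n):
    # Assumed toy data: noisy sin(pi * x) with degree-9 polynomial features.
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(n)
    return np.vander(x, 10, increasing=True), y

X_tr, y_tr = make_split(15)
X_va, y_va = make_split(50)
best_w, best_step, history = train_with_early_stopping(X_tr, y_tr, X_va, y_va)
print(f"stopped after {len(history)} steps; best validation loss at step {best_step}")
```

Returning `best_w` rather than the final weights is the design choice that matters: the snapshot, not the last iterate, is what early stopping keeps.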