
Inside the labs teaching machines to unlearn — and why deletion turns out to be the hardest problem in AI.

Photograph: tek54
Every model remembers more than it should. A name pulled from a scraped forum, a face from a dataset nobody audited, a sentence a person asked to be forgotten years ago — all of it folded into billions of weights, impossible to point to and, for a long time, impossible to remove.
That last assumption is the one a small group of researchers has spent the past eighteen months trying to break. They call the field machine unlearning, and the premise is deceptively simple: take a trained model and surgically excise the influence of specific data, as if the model had never seen it.
Learning is addition. Forgetting is surgery. — An unlearning researcher
A neural network does not store facts the way a database does. There is no row to delete. Information is smeared across the whole system, entangled with everything else it knows. Pull one thread and the rest can unravel.
What follows is a map of that surgery — the techniques that work, the ones that quietly do not, and the regulators now betting that the difference can be measured.