
The benchmark wars are quietly breaking science
When everyone optimises for the same test, the test stops measuring anything at all.
Models, agents, and the labs racing to build them — and the limits nobody can engineer away.

Two autonomous assistants, one calendar, and a quiet lesson about handing over the keys.

Inside the labs teaching machines to unlearn — and why deletion turns out to be the hardest problem in AI.