
The agent that booked my whole week — and the one that quit
Two autonomous assistants, one calendar, and a quiet lesson about handing over the keys.
When everyone optimises for the same test, the test stops measuring anything at all.

Photograph: tek54
A leaderboard is a powerful thing. It can also be a trap.
The numbers keep going up. Whether anything is actually improving is a harder question than the charts suggest.

Inside the labs teaching machines to unlearn — and why deletion turns out to be the hardest problem in AI.