Twitter/X

2026-06-11 · 16:25 UTC · @lillysharples

Anyways, this is how we should benchmark new models. AI's supposed to replace our jobs? Great, let's give it a real one and see how close it's getting!

We keep grading models on the equivalent of a standardized test, and like any standardized test, it mostly measures who studied the most for it.