Twitter/X

@lillysharples

Anyways, this is how we should benchmark new models. AI's supposed to replace our jobs? Great, let's give it a real one and see how close it's getting!

We keep grading models on the equivalent of a standardized test, and like any standardized test, it mostly measures who studied the most for it.