Twitter/X

@fchollet: We desperately need standardized benchmarks for agentic capabilities instead of panic-reacting to pr...

We desperately need standardized benchmarks for agentic capabilities instead of panic-reacting to prompt-engineering parlor tricks. Until we can objectively and transparently measure what these systems actually do, the entire ecosystem will remain vulnerable to unpredictable and arbitrary government overreach.