Twitter/X


2026-06-21 · 13:01 UTC · @mattpocockuk

An unbelievably simple way of evaling a new model/harness for your org:

1. Instead of running one implementer agent AFK, run two
2. Get an agent to pick the best output, or use human review to pick the best output
3. Tally the results at the end of the week
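
A rough sketch of the weekly tally, assuming each task gets logged with whichever output the reviewer picked. The `Comparison` record, the `tally` helper, the labels "baseline" / "candidate" / "tie", and the sample tasks are all hypothetical placeholders, not something from the original post:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One task run by both implementer setups, plus the reviewer's pick."""
    task_id: str
    winner: str  # hypothetical labels: "baseline", "candidate", or "tie"


def tally(comparisons):
    """Return (win counts per label, candidate win rate with ties excluded)."""
    counts = Counter(c.winner for c in comparisons)
    decided = counts["baseline"] + counts["candidate"]
    # No decided comparisons means there is no meaningful win rate yet.
    win_rate = counts["candidate"] / decided if decided else None
    return counts, win_rate


# Placeholder data for illustration only, not real results.
week = [
    Comparison("task-1", "candidate"),
    Comparison("task-2", "baseline"),
    Comparison("task-3", "candidate"),
    Comparison("task-4", "tie"),
]

counts, win_rate = tally(week)
print(dict(counts), win_rate)
```

Ties are left out of the win rate here so that a week of mostly indistinguishable outputs doesn't read as a loss for either side; that is a design choice, not a rule.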