Twitter/X

@mattpocockuk:

An unbelievably simple way of evaling a new model/harness for your org:

  1. Instead of running one implementer agent AFK, run two
  2. Have an agent pick the better of the two outputs, or pick it through human review
  3. Tally the results at the end of the week
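
The tally in step 3 could be as small as the sketch below. It is only an illustration: the labels `"new"` and `"baseline"`, the `week` data, and the helper names are all made up for the example, and the sign test is an optional extra that the original post does not mention. It is there as a rough check on whether a lead in the tally could just be noise.

```python
from collections import Counter
from math import comb


def tally(picks):
    """Count how often each candidate was picked as the best output."""
    return Counter(picks)


def win_rate(counts, candidate):
    """Fraction of all comparisons won by `candidate` (0.0 if there are none)."""
    total = sum(counts.values())
    return counts[candidate] / total if total else 0.0


def sign_test_p(wins_a, wins_b):
    """Two-sided exact sign test: p-value under the null that each side wins 50% of the time."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = max(wins_a, wins_b)
    # Probability of a result at least this lopsided in one direction, then doubled.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical week: one entry per task, naming whichever agent's output was picked.
week = ["new", "new", "baseline", "new", "new",
        "baseline", "new", "new", "new", "new"]

counts = tally(week)
print(counts)
print("new win rate:", win_rate(counts, "new"))
print("sign test p-value:", sign_test_p(counts["new"], counts["baseline"]))
```

With only a handful of tasks per week, even a lopsided tally can be weak evidence, so it may be worth letting the counts accumulate over several weeks before switching.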