Claire is extremely fun to talk to
claire vo 🖤 (@clairevo)
"What you do is you prioritize the top few benchmarks and then you probably bullshit the rest."
We talk a lot about how AI is making coding easier for non-technical folks, but don't hear much about how the most elite engineers are delegating their most technically complex work.
@ankrgyl (CEO @braintrust) and his team have to deliver huge volumes of data at lightning-fast speeds, which means they're always looking for ways to optimize their DB stack.
The problem? Technical hypotheses are expensive and slow to test when your human team has to review everything manually, so you look at 1-2 things and then move on with your best guess.
With AI, they're able to bring much more rigor to their most technical work.
In this episode, he shows us:
- how a coding agent can run a full matrix of benchmarks against every open-source column store, something no engineer had time to do manually before
- how to build evals the same way you'd write a PRD
- how he uses his best engineer's "vibe check" as the gold standard, and scales it across the product as scoring functions
This is an episode by engineers for engineers, where Ankur and I geek out on CEOs who code, brute force coding agents, and why you should be using AI to solve your hardest engineering problems.
Watch now: piped.video/watch?v=QE_1hRLs…
— https://nitter.net/clairevo/status/2066596922190926149#m