(None of these things was designed to be a good eval; it's all stuff I wanted help with and organically figured out LLMs couldn’t help me with in the past.)