TWITTER_ARTICLE

On 2026-02-15, DimitrisPapail described an experiment in which Claude Code…

Brief

DimitrisPapail frames Claude Code as a new kind of research accelerator after previously reacting with dread to autonomous coding agents. In this experiment, he takes a modest, well-scoped idea inspired by an Anthropic generation bug—where the most probable token was sometimes dropped—and tries to move from concept to initial results using AI agents end to end. His workflow starts with a 5-10 minute voice memo, transcribed by ChatGPT and converted by Claude Opus 4.6 into a prompt for Claude Code. From there, the agent handles practical research operations: SSH into Lambda GPU machines, GitHub pushes, local result retrieval, multi-GPU job monitoring, queue management, and ETA tracking. The key observation is not just automation of engineering but compression of the exploratory phase of research: what once required weeks of personal effort or delegation to a student can now be reduced to a few daily check-ins plus GPU time, even as the author warns that this same capability may amplify low-quality paper output.

Why it matters

On 2026-02-15, DimitrisPapail described an experiment in which Claude Code handled nearly the entire research workflow for a small ML idea, including infrastructure setup, experiment execution, and drafting a first report, with no other humans involved.

Key details

  • The project began with a 5-10 minute spoken problem description recorded during a walk, transcribed using ChatGPT voice-to-text, then turned into a Claude Opus 4.6 prompt that was passed into Claude Code to orchestrate the work.
  • Claude Code was reportedly managing SSH access to Lambda GPU instances, pushing code to GitHub, pulling results locally, monitoring and queueing jobs across multiple GPUs, and estimating ETAs while providing intermediate updates on demand.
  • The author estimates the idea would have taken weeks of personal effort or a few days for a junior student, but with the agent loop in place it required only a couple hours of check-ins per day and could reach human-readable results within a few days, with GPUs rather than engineering as the main bottleneck.
  • The central claim is that coding agents sharply shrink the gap between a research question and a first empirical answer, which could accelerate exploratory research but also increase the volume of low-quality 'arxiv-slop.'
Source evidence

title: @DimitrisPapail: A few days ago I wrote about waking up grumpy after watching Claude Code and Cod...
author: DimitrisPapail
contenttype: twitterarticle
published: 2026-02-15T17:01:42+00:00
source_url: https://x.com/DimitrisPapail/status/2023080289828831349

word_count: 672

A few days ago I wrote about waking up grumpy after watching Claude Code and Codex spend a full day

A few days ago I wrote about waking up grumpy after watching Claude Code and Codex spend a full day (and night) iterating on a silly GSM8K idea about solving math problems with purely symbolic methods.

That post was mostly about the dread I felt with the revelation of the automation afforded by the new gen of coding agents. This post is about leaning into it.

I’m attempting to run a simple experiment on myself.

I started from a simple and well-scoped idea, the type of problem I’d normally give to a junior student to get their feet wet, and I am trying to go from concept to semi concrete results using Claude Code for all experiments, infra setup, and first report.

The idea I’m specifically interested in came from a known Anthropic infrastructure bug where the most probable token was occasionally dropped during generation, and it seemed to affect model behavior in interesting ways. A question crept into my mind that is loosely inspired by this, but the details don’t matter for now. I’ll share more once the results are in.

So, I went on a long walk, recorded a 5-10 minute voice message explaining the problem, I used ChatGPT’s voice-to-text to transcribe it (which is, by the way, superior to anything else out there), and pasted the transcript into Claude Opus 4.6. I told Claude that I want to run this end-to-end with Claude Code, and no other humans involved, and it produced a prompt. I pasted that prompt into Claude Code, and from there the AI agent basically took over.

Claude Code is, as I am writing this, handling ssh into lambda GPU instances, pushes code to github, pulls results locally, babysits and monitors jobs across multiple GPUs, queuing jobs I request, estimating ETAs, etc.

The agent keeps me posted, and I can bug it anytime for intermediate results. It is kind of incredible.

To be clear, I care about this idea, but it is a side thing that I don’t have time to commit hours on it. It would likely take me, optimistically, weeks. A student could probably do it in a few days but they’re all busy with their own projects. My MSR colleagues are working on much more important things. But I still want to know the answer to it!!

So now it’s basically me and Claude on this one. Which either feels exciting or dystopian depending on the time of day, and how much coffee I’ve had. But mostly exciting.

So far, it’s been a couple hours of my own work a day, checking in, redirecting, and I think I can get from idea to human-readable output (I don’t quite want to call it paper) in a few days. The only bottlenecks are me finding the time to check in with Claude and the GPUs finishing their runs. Definitely not the engineering.

What does this mean for research in general, but also what does it mean for the way I approach it?

This will obviously increase throughput of arxiv-slop, and that part is a little scary and bad.

But the intellectually interesting part for me is something else. I now have something close to a magic box where I throw in a question and a first answer comes back basically for free, in terms of human effort. Before this, the way I'd explore a new idea is to either clumsily put something together myself or ask a student to run something short for signal, and if it's there, we’d go deeper. That quick signal step, i.e., finding out if a question has any meat to it, is what I can now do without taking up anyone else's time. It’s now between just me, Claude Code, and a few days of GPU time.

I don’t know what this means for how we do research long term. I don’t think anyone does yet. But the distance between a question and a first answer just got very small.

See you in a few days with what actually happened.


Posted: 2026-02-15T17:01:42.000Z

Engagement: 531 likes, 41 retweets, 13 replies