substack.com

Dario Amodei — "We are near the end of the exponential"

Brief

Dario Amodei’s core claim is that AI progress is still following a fairly simple scaling story, but that the world has not internalized how close that trajectory may be to transformative outcomes. He frames today’s systems as the continuation of a thesis he has held since 2017: intelligence emerges less from bespoke clever tricks than from pouring compute through sufficiently broad data distributions under objectives that scale cleanly. In his telling, the old pretraining scaling laws have not broken; instead, reinforcement learning now appears to be extending the same pattern. He points to public reports of log-linear performance improvements with more RL training on verifiable tasks such as AIME-style math and says Anthropic is seeing similar behavior across a wider range of domains. That leads him to treat the current pretraining-plus-RL stack not as a dead end but as evidence that the main recipe is still working.

Where Amodei departs from many skeptics is in his willingness to map those scaling curves directly onto very aggressive capability forecasts. He says he is around 90% confident that within 10 years we will have what he calls “a country of geniuses in a data center,” and his personal hunch is much shorter—roughly 1-3 years. He is most confident on tasks with verifiable feedback, especially software engineering, where he says end-to-end coding is effectively guaranteed inside a decade and likely 1-2 years away. He is somewhat less certain on domains where verification is weaker—scientific discovery, Mars mission planning, novel writing—but he argues that generalization from verifiable to less verifiable domains is already visible. On the contentious issue of continual learning, he takes a surprisingly minimalist stance: these systems may not need human-like lifetime learning to become economically dominant. Broad pretraining, RL generalization, and long-context in-context learning may already cover most of the gap, with continual learning potentially arriving as an extra capability rather than a prerequisite.

A major theme of the interview is the distinction between capability growth and economic diffusion. Amodei repeatedly argues that the technology curve and the adoption curve are both exponential, but not identical. He dismisses the idea that diffusion nullifies AI progress, yet insists deployment will still be bottlenecked by enterprise procurement, compliance, security review, organizational change management, and the physical pace of closing loops in real workflows. As evidence that adoption is already very fast, he offers Anthropic’s revenue numbers: roughly $100 million in 2023, $1 billion in 2024, and $9-10 billion in 2025, with another few billion allegedly added in January 2026 alone. Even so, he says that is not the same as instant absorption of AGI-scale capability into GDP. The same logic underlies his defense of Anthropic’s compute posture. If model capability reaches “country of geniuses” status in 2026 or 2027 but monetization lags by 1-5 years, overcommitting to trillion-dollar annual compute purchases could bankrupt a lab that is directionally right but off by a single year.
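As a rough check on the revenue figures quoted above, the year-over-year multiples can be computed directly. This is a minimal sketch: the dollar amounts are the approximate figures cited in this brief, while the dictionary and function names are illustrative assumptions, not anything from the interview.

```python
# Approximate revenue figures quoted in the brief, in billions of USD.
# The 2025 figure is given as a range ($9-10 billion), so both ends are kept.
revenue_low = {2023: 0.1, 2024: 1.0, 2025: 9.0}
revenue_high = {2023: 0.1, 2024: 1.0, 2025: 10.0}


def yearly_multiples(revenue):
    """Return {year: revenue[year] / revenue[previous year]} for consecutive years."""
    years = sorted(revenue)
    return {
        year: revenue[year] / revenue[prev]
        for prev, year in zip(years, years[1:])
    }


low = yearly_multiples(revenue_low)
high = yearly_multiples(revenue_high)
for year in sorted(low):
    print(f"{year}: {low[year]:.0f}x to {high[year]:.0f}x over the prior year")
```

On these numbers, revenue grew roughly tenfold in each of the two years, which is the pattern Amodei cites as evidence of fast diffusion.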

That leads into his economic model of frontier labs, which he portrays less as speculative money furnaces than as businesses whose losses are largely a timing artifact of compute scale-up. His stylized picture is that individual model generations can already have strong positive gross margins on inference, but firms remain unprofitable because every profitable model finances a much larger next model. In a more mature equilibrium, he expects a small-number-of-firms market analogous to cloud infrastructure: very high barriers to entry, differentiated products, positive but not monopolistic margins, and a meaningful though not dominant share of compute continuously devoted to R&D. He pushes back on the view that profitable AI labs must be underinvesting, arguing that scaling returns are approximately log-linear, so there is a rational interior optimum rather than a need to push 95-100% of resources into training. Notably, he also quantifies the physical side of the buildout: industry AI power demand at 10-15 GW in 2026, rising 3x yearly toward ~300 GW by 2029, implying multi-trillion-dollar annual compute expenditures if that trajectory persists.
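The power trajectory in the last sentence is plain compounding, and a short sketch makes the arithmetic explicit. The 10-15 GW starting range and the 3x yearly growth come from the brief; the function name is a hypothetical helper, and no cost per gigawatt is assumed because the brief does not give one.

```python
def project_power_gw(start_gw, growth_per_year, years):
    """Compound start_gw by growth_per_year for the given number of years."""
    return start_gw * growth_per_year ** years


# 2026 range quoted in the brief: 10-15 GW, tripling each year through 2029.
for start in (10, 15):
    path = [project_power_gw(start, 3, n) for n in range(4)]  # 2026, 2027, 2028, 2029
    print(f"start {start} GW -> {path}")
```

Three years of tripling multiplies the starting figure by 27, so the quoted range lands between 270 and 405 GW in 2029. The "~300 GW" figure sits toward the low end of that range.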

On policy and geopolitics, Amodei is simultaneously pro-build and hawkish. He rejects a blanket 10-year moratorium on state AI laws absent federal replacement, arguing that the world may face meaningful AI-enabled bioterrorism and autonomy risks well before then. His preferred sequence is transparency first, then targeted regulation such as mandatory biological-risk classifiers if threat evidence hardens. He also worries that AI may create offense-dominant equilibria in which one actor can do catastrophic harm, making traditional balance-of-power assumptions unreliable. That concern carries into his China stance: he does not think both the US and China should simply race to build symmetric “countries of geniuses,” because advanced AI could stabilize authoritarian control internally and destabilize deterrence externally. At the same time, he distinguishes restricting frontier compute and chips from restricting downstream benefits, suggesting AI-enabled drugs and development should spread widely, especially to the developing world. The broader implication is that Amodei sees the bottleneck after AGI less in invention than in governance, distribution, and institutional adaptation—and believes those questions are arriving on a timeline measured in years, not decades.

Why it matters

In a 142-minute interview published on 2026-02-13, Anthropic CEO Dario Amodei said he is roughly 90% confident that within 10 years there will be “a country of geniuses in a data center,” and said his weaker near-term hunch is 1-3 years, with coding likely reaching end-to-end capability in 1-2 years for verifiable tasks.

Key details

  • Amodei argued that the same “Big Blob of Compute” thesis he held in 2017 still explains progress: the main drivers are raw compute, quantity of data, data quality and breadth, training duration, scalable objective functions, and training-stability techniques such as normalization and conditioning; he said RL scaling is now showing the same log-linear behavior previously seen in pretraining.
  • He said Anthropic has seen revenue compound at an extraordinary pace—roughly $0 to $100 million in 2023, $100 million to $1 billion in 2024, and $1 billion to $9-10 billion in 2025—with “another few billion” added in January 2026 alone, which he presented as evidence that diffusion is fast even if not instantaneous.
  • On coding productivity, Amodei distinguished between weak and strong milestones: he said his earlier prediction that AI would write 90% of code within 3-6 months has already happened in some places, including Anthropic, but emphasized that this is far short of 90-100% of end-to-end software engineering tasks such as compiling, testing, environment setup, writing design docs, and setting technical direction.
  • Amodei said Anthropic engineers are already using Claude heavily enough that some “don’t write any code,” and estimated current coding models may provide about a 15-20% total-factor speedup, up from roughly 5% six months earlier, implying a gradual “soft takeoff” rather than an abrupt recursive-intelligence explosion.
  • He argued continual learning may not be necessary to reach economically transformative systems because broad pretraining, RL generalization, and long-context in-context learning may suffice; however, he also said Anthropic is actively working on continual learning and longer contexts, characterizing very long context as mostly an inference and engineering problem around serving KV cache rather than a fundamental research blocker.

Cleaned source text

title: Dario Amodei — "We are near the end of the exponential"

author: Dwarkesh Patel

content_type: newsletter

publication: substack.com

published: 2026-02-13T17:21:25+00:00

source_url: gmail://19c580608515528c

word_count: 22415

Watch now (142 mins) | "That's why I'm sending this message of urgency"

Dwarkesh Patel

Feb 13

Dario Amodei thinks we are just a few years away from “a country of geniuses in a data center”. In this episode, we discuss what to make of the scaling hypothesis in the current RL regime, how AI will diffuse throughout the economy, whether Anthropic is underinvesting in compute given their timelines, how frontier labs will ever make money, whether regulation will destroy the boons of this technology, US-China competition, and much more.

Watch on YouTube; listen on Apple Podcasts or Spotify.

Sponsors

Labelbox can get you the RL tasks and environments you need. Their massive network of subject-matter experts ensures realism across domains, and their in-house tooling lets them continuously tweak task difficulty to optimize learning. Reach out at labelbox.com/dwarkesh

Jane Street sent me another puzzle… this time, they’ve trained backdoors into 3 different language models — they want you to find the triggers. Jane Street isn’t even sure this is possible, but they’ve set aside $50,000 for the best attempts and write-ups. They’re accepting submissions until April 1st at janestreet.com/dwarkesh

Mercury’s personal accounts make it easy to share finances with a partner, a roommate… or OpenClaw. Last week, I wanted to try OpenClaw for myself, so I used Mercury to spin up a virtual debit card with a small spend limit, and then I let my agent loose. No matter your use case, apply at mercury.com/personal-banking

Timestamps

(00:00:00) - What exactly are we scaling?

(00:12:36) - Is diffusion cope?

(00:29:42) - Is continual learning necessary?

(00:46:20) - If AGI is imminent, why not buy more compute?

(00:58:49) - How will AI labs actually make profit?

(01:31:19) - Will regulations destroy the boons of AGI?

(01:47:41) - Why can’t China and America both have a country of geniuses in a datacenter?

Transcript

00:00:00 - What exactly are we scaling?

We talked three years ago. In your view, what has been the biggest update over the last three years? What has been the biggest difference between what it felt like then versus now?

Dario Amodei

Broadly speaking, the exponential of the underlying technology has gone about as I expected it to go. There’s plus or minus a year or two here and there. I don’t know that I would’ve predicted the specific direction of code.

But when I look at the exponential, it is roughly what I expected in terms of the march of the models from smart high school student to smart college student to beginning to do PhD and professional stuff, and in the case of code reaching beyond that. The frontier is a little bit uneven, but it’s roughly what I expected.

What has been the most surprising thing is the lack of public recognition of how close we are to the end of the exponential. To me, it is absolutely wild that you have people — within the bubble and outside the bubble — talking about the same tired, old hot-button political issues, when we are near the end of the exponential.

I want to understand what that exponential looks like right now. The first question I asked you when we recorded three years ago was, “what’s up with scaling and why does it work?” I have a similar question now, but it feels more complicated. At least from the public’s point of view, three years ago there were well-known public trends across many orders of magnitude of compute where you could see how the loss improves.

Now we have RL scaling and there’s no publicly known scaling law for it. It’s not even clear what the story is. Is this supposed to be teaching the model skills? Is it supposed to be teaching meta-learning? What is the scaling hypothesis at this point?

I actually have the same hypothesis I had even all the way back in 2017. I think I talked about it last time, but I wrote a doc called “The Big Blob of Compute Hypothesis”. It wasn’t about the scaling of language models in particular. When I wrote it GPT-1 had just come out.

That was one among many things. Back in those days there was robotics. People tried to work on reasoning as a separate thing from language models, and there was scaling of the kind of RL that happened in AlphaGo and in Dota at OpenAI. People remember StarCraft at DeepMind, AlphaStar.

It was written as a more general document. Rich Sutton put out “The Bitter Lesson” a couple years later. The hypothesis is basically the same. What it says is that all the cleverness, all the techniques, all the “we need a new method to do something”, that doesn’t matter very much. There are only a few things that matter. I think I listed seven of them.

One is how much raw compute you have. The second is the quantity of data. The third is the quality and distribution of data. It needs to be a broad distribution. The fourth is how long you train for. The fifth is that you need an objective function that can scale to the moon. The pre-training objective function is one such objective function. Another is the RL objective function that says you have a goal, you’re going to go out and reach the goal.

Within that, there’s objective rewards like you see in math and coding, and there’s more subjective rewards like you see in RLHF or higher-order versions of that. Then the sixth and seventh were things around normalization or conditioning, just getting the numerical stability so that the big blob of compute flows in this laminar way instead of running into problems.

That was the hypothesis, and it’s a hypothesis I still hold. I don’t think I’ve seen very much that is not in line with it. The pre-training scaling laws were one example of what we see there. Those have continued going. Now it’s been widely reported, we feel good about pre-training. It’s continuing to give us gains.

What has changed is that now we’re also seeing the same thing for RL. We’re seeing a pre-training phase and then an RL phase on top of that. With RL, it’s actually just the same. Even other companies have published things in some of their releases that say, “We train the model on math contests — AIME or other things — and how well the model does is log-linear in how long we’ve trained it.”

We see that as well, and it’s not just math contests. It’s a wide variety of RL tasks. We’re seeing the same scaling in RL that we saw for pre-training.
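To make "log-linear" concrete: the claim is that the score rises by a roughly constant amount each time the amount of training is multiplied by a constant factor. The sketch below fits score = a + b * log10(training) by ordinary least squares. The data points are synthetic and invented purely for illustration, and the function name is hypothetical; none of it comes from Anthropic or any published result.

```python
import math


def fit_log_linear(training, score):
    """Least-squares fit of score = a + b * log10(training); returns (a, b)."""
    xs = [math.log10(t) for t in training]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(score) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, score))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic, made-up data: training amount in arbitrary units, score in points.
# Each 10x of training adds 12 points by construction.
training = [1, 10, 100, 1000]
score = [20.0, 32.0, 44.0, 56.0]

a, b = fit_log_linear(training, score)
print(f"gain per 10x of training: {b:.1f} points")
```

A straight line on a log-scaled x-axis means each additional fixed gain costs ten times more training than the last, which is why the curve is described as log-linear rather than linear.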

You mentioned Rich Sutton and “The Bitter Lesson”. I interviewed him last year, and he’s actually very non-LLM-pilled. I don’t know if this is his perspective, but one way to paraphrase his objection is: Something which possesses the true core of human learning would not require all these billions of dollars of data and compute and these bespoke environments, to learn how to use Excel, how to use PowerPoint, how to navigate a web browser. The fact that we have to build in these skills using these RL environments hints that we are actually lacking a core human learning algorithm. So we’re scaling the wrong thing.

That does raise the question. Why are we doing all this RL scaling if we think there’s something that’s going to be human-like in its ability to learn on the fly?

I think this puts together several things that should be thought of differently. There is a genuine puzzle here, but it may not matter. In fact, I would guess it probably doesn’t matter. There is an interesting thing. Let me take the RL out of it for a second, because I actually think it’s a red herring to say that RL is any different from pre-training in this matter.

If we look at pre-training scaling, it was very interesting back in 2017 when Alec Radford was doing GPT-1. The models before GPT-1 were trained on datasets that didn’t represent a wide distribution of text. You had very standard language modeling benchmarks. GPT-1 itself was trained on a bunch of fanfiction, I think actually.

It was literary text, which is a very small fraction of the text you can get. In those days it was like a billion words or something, so small datasets representing a pretty narrow distribution of what you can see in the world. It didn’t generalize well. If you did better on some fanfiction corpus, it wouldn’t generalize that well to other tasks.

We had all these measures of how well it did at predicting all these other kinds of texts. It was only when you trained over all the tasks on the internet (when you did a general internet scrape from something like Common Crawl or scraping links in Reddit, which is what we did for GPT-2) that you started to get generalization.

I think we’re seeing the same thing on RL. We’re starting first with simple RL tasks like training on math competitions, then moving to broader training that involves things like code. Now we’re moving to many other tasks. I think then we’re going to increasingly get generalization. So that kind of takes out the RL vs. pre-training side of it.

But there is a puzzle either way, which is that in pre-training we use trillions of tokens. Humans don’t see trillions of words. So there is an actual sample efficiency difference here. There is actually something different here. The models start from scratch and they need much more training. But we also see that once they’re trained, if we give them a long context length of a million — the only thing blocking long context is inference — they’re very good at learning and adapting within that context.

So I don’t know the full answer to this. I think there’s something going on where pre-training is not like the process of humans learning, but it’s somewhere between the process of humans learning and the process of human evolution. We get many of our priors from evolution. Our brain isn’t just a blank slate. Whole books have been written about this.

The language models are much more like blank slates. They literally start as random weights, whereas the human brain starts with all these regions connected to all these inputs and outputs. Maybe we should think of pre-training — and for that matter, RL as well — as something that exists in the middle space between human evolution and human on-the-spot learning. And we should think of the in-context learning that the models do as something between long-term human learning and short-term human learning.

So there’s this hierarchy. There’s evolution, there’s long-term learning, there’s short-term learning, and there’s just human reaction. The LLM phases exist along this spectrum, but not necessarily at exactly the same points. There’s no analog to some of the human modes of learning; the LLMs are falling in between the points. Does that make sense?

Yes, although some things are still a bit confusing. For example, if the analogy is that this is like evolution so it’s fine that it’s not sample efficient, then if we’re going to get a super sample-efficient agent from in-context learning, why are we bothering to build all these RL environments?

There are companies whose work seems to be teaching models how to use this API, how to use Slack, how to use whatever. It’s confusing to me why there’s so much emphasis on that if the kind of agent that can just learn on the fly is emerging or has already emerged.

I can’t speak for the emphasis of anyone else. I can only talk about how we think about it. The goal is not to teach the model every possible skill within RL, just as we don’t do that within pre-training. Within pre-training, we’re not trying to expose the model to every possible way that words could be put together. Rather, the model trains on a lot of things and then reaches generalization across pre-training.

That was the transition from GPT-1 to GPT-2 that I saw up close. The model reaches a point. I had these moments where I was like, “Oh yeah, you just give the model a list of numbers — this is the cost of the house, this is the square feet of the house — and the model completes the pattern and does linear regression.” Not great, but it does it, and it’s never seen that exact thing before.

So to the extent that we are building these RL environments, the goal is very similar to what was done five or ten years ago with pre-training. We’re trying to get a whole bunch of data, not because we want to cover a specific document or a specific skill, but because we want to generalize.

00:12:36 - Is diffusion cope?

I think the framework you’re laying down obviously makes sense. We’re making progress toward AGI. Nobody at this point disagrees we’re going to achieve AGI this century. The crux is you say we’re hitting the end of the exponential. Somebody else looks at this and says, “We’ve been making progress since 2012, and by 2035 we’ll have a human-like agent.”

Obviously we’re seeing in these models the kinds of things that evolution did, or that learning within a human lifetime does. I want to understand what you’re seeing that makes you think it’s one year away and not ten years away.

There are two claims you could make here, one stronger and one weaker. Starting with the weaker claim, when I first saw the scaling back in 2019, I wasn’t sure. This was a 50/50 thing. I thought I saw something. My claim was that this was much more likely than anyone thinks. Maybe there’s a 50% chance this happens.

On the basic hypothesis of, as you put it, within ten years we’ll get to what I call a “country of geniuses in a data center”, I’m at 90% on that. It’s hard to go much higher than 90% because the world is so unpredictable. Maybe the irreducible uncertainty puts us at 95%, where you get to things like multiple companies having internal turmoil, Taiwan gets invaded, all the fabs get blown up by missiles.

Now you’ve jinxed us, Dario.

You could construct a 5% world where things get delayed for ten years. There’s another 5% which is that I’m very confident on tasks that can be verified. With coding, except for that irreducible uncertainty, I think we’ll be there in one or two years. There’s no way we will not be there in ten years in terms of being able to do end-to-end coding.

My one little bit of fundamental uncertainty, even on long timescales, is about tasks that aren’t verifiable: planning a mission to Mars; doing some fundamental scientific discovery like CRISPR; writing a novel. It’s hard to verify those tasks. I am almost certain we have a reliable path to get there, but if there’s a little bit of uncertainty it’s there. On the ten-year timeline I’m at 90%, which is about as certain as you can be. I think it’s crazy to say that this won’t happen by 2035. In some sane world, it would be outside the mainstream.

But the emphasis on verification hints to me a lack of belief that these models are generalized. If you think about humans, we’re both good at things for which we get verifiable reward and things for which we don’t.

No, this is why I’m almost sure. We already see substantial generalization from things that verify to things that don’t. We’re already seeing that.

But it seems like you were emphasizing this as a spectrum that will split apart the domains in which we see more progress. That doesn’t seem like how humans get better.

The world in which we don’t get there is the world in which we do all the verifiable things. Many of them generalize, but we don’t fully get there. We don’t fully color in the other side of the box. It’s not a binary thing.

Even if generalization is weak and you can only do verifiable domains, it’s not clear to me you could automate software engineering in such a world. You are “a software engineer” in some sense, but part of being a software engineer for you involves writing long memos about your grand vision.

I don’t think that’s part of the job of SWE. That’s part of the job of the company, not SWE specifically. But SWE does involve design documents and other things like that. The models are already pretty good at writing comments. Again, I’m making much weaker claims here than I believe, to distinguish between two things. We’re already almost there for software engineering.

By what metric? There’s one metric which is how many lines of code are written by AI. If you consider other productivity improvements in the history of software engineering, compilers write all the lines of software. There’s a difference between how many lines are written and how big the productivity improvement is. “We’re almost there” meaning… How big is the productivity improvement, not just how many lines are written by AI?

I actually agree with you on this. I’ve made a series of predictions on code and software engineering. I think people have repeatedly misunderstood them. Let me lay out the spectrum.

About eight or nine months ago, I said the AI model will be writing 90% of the lines of code in three to six months. That happened, at least at some places. It happened at Anthropic, happened with many people downstream using our models. But that’s actually a very weak criterion. People thought I was saying that we won’t need 90% of the software engineers. Those things are worlds apart. The spectrum is: 90% of code is written by the model, 100% of code is written by the model. That’s a big difference in productivity.
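One way to see why the gap between 90% and 100% matters so much is Amdahl's law, which bounds the overall speedup when only part of a job gets faster. Amdahl's law is a framing brought in here for illustration, not something Amodei invokes, and the fractions below are hypothetical inputs rather than measurements.

```python
def overall_speedup(fraction_automated, speedup_on_that_fraction):
    """Amdahl's law: total speedup when only part of the work gets faster."""
    remaining = 1.0 - fraction_automated
    return 1.0 / (remaining + fraction_automated / speedup_on_that_fraction)


# Hypothetical inputs: the automated share of the work becomes 1000x faster.
for fraction in (0.5, 0.9, 0.99):
    print(f"{fraction:.0%} automated -> {overall_speedup(fraction, 1000):.1f}x overall")
```

Under these assumptions, automating 90% of the work caps the overall gain just below 10x no matter how fast the model is, because the remaining 10% still runs at human speed. Only as the automated share approaches 100% does the cap go away.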