TWITTER_ARTICLE

Claude bills by tokens, not message count—token growth per conversation follows S…

Brief

Author 0x_kaize (posted 2026-03-29, 10,819 likes / 1,618 retweets / 213 replies) argues that Claude’s perceived strictness comes from token accounting rather than message count and lays out ten tactical habits to cut token spend. Key technical points: token cost scales quadratically with message count (S × N(N+1)/2) with concrete examples at ~500 tokens/exchange; edit original prompts instead of sending follow-ups; start new chats every 15–20 messages or paste a summary into a fresh chat; batch multiple tasks into one message; upload recurring files to Projects so they’re cached; and save Memory/User Preferences to avoid repeating setup. The post also recommends Haiku for low-cost tasks (claiming 50–70% budget savings versus Sonnet/Opus), disabling unused tools, spreading work across the rolling 5-hour window, avoiding peak hours (change effective Mar 26, 2026), and enabling overage for paid plans as a safety net.

Why it matters

Claude bills by tokens, not message count. Cumulative token use per conversation follows S × N(N+1) / 2 (S = avg tokens/exchange, N = message count). At ~500 tokens/exchange this yields ~7,500 tokens for 5 messages, ~27,500 for 10, ~105,000 for 20, and ~232,500 for 30 (under this formula, message 30 ≈ 30× the cost of message 1).

Key details

  • Practical habits to cut token spend: edit your original prompt and regenerate (replaces history), start a new chat every 15–20 messages (or ask Claude to summarize + paste into new chat), batch multiple questions into one message, upload recurring PDFs to Projects (cached, not re-tokenized), and save Memory/User Preferences to avoid repeating setup messages.
  • Model and feature choices matter: use Haiku for simple tasks (author reports it can free up ~50–70% of budget vs Sonnet/Opus), turn off Search/Tools and Advanced Thinking when not needed, and pick model tiers by task (Haiku=low, Sonnet=medium, Opus=high).
  • Usage management: Claude uses a rolling 5-hour window; starting Mar 26, 2026 peak weekday hours (5:00–11:00 PT / 8:00–14:00 ET) deplete session limits faster. Pro, Max 5x and Max 20x subscribers can enable “Overage” (pay-as-you-go API billing) with a monthly spending cap as a safety net.

Cleaned source text

title: @0x_kaize: Most people blame Claude for strict limits. I blamed Claude too.

Recently I rea...

author: 0x_kaize

content_type: twitter_article

published: 2026-03-29T16:03:52+00:00

source_url: https://x.com/0x_kaize/status/2038286026284667239

word_count: 1096

Most people blame Claude for strict limits. I blamed Claude too.

Recently I realized that Claude doesn't count the number of messages. It counts tokens. All you need to do is use tokens wisely, but not everyone knows how, and many people end up wasting a lot of tokens and money as a result.

I got really into this and put together a list of the best habits that will save you a ton of tokens.

1. Edit your prompt. Don't send a follow-up

When Claude doesn't get your thoughts right, you might feel tempted to send:

1/ “No, I meant [your message]”

2/ “Ugh, that's not what I wanted [your message]”

and so on

Don't do that!

Every subsequent message is added to the conversation history. Claude re-reads ALL of it every turn - burning tokens on context that didn't even help.

Token cost per message = all previous messages + your new one.

> Total = S × N(N+1) / 2

> (S = avg tokens per exchange, N = message count)

At ~500 tokens per exchange:

5 messages: 7.5K tokens

10 messages: 27.5K tokens

20 messages: 105K tokens

30 messages: 232K tokens

Message 30 costs about 30x as much as message 1
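The numbers above can be reproduced with a few lines of Python. This is only a sketch of the simplified model in the formula: it assumes every exchange is exactly S tokens, which real chats will not be, and the function names are made up for illustration.

```python
def turn_cost(avg_tokens: int, turn: int) -> int:
    """Tokens processed on one turn: all earlier exchanges plus the new one."""
    return avg_tokens * turn


def total_tokens(avg_tokens: int, messages: int) -> int:
    """Cumulative tokens after `messages` turns: S * N(N+1) / 2."""
    return avg_tokens * messages * (messages + 1) // 2


# Assumed average of 500 tokens per exchange, as in the list above.
for n in (5, 10, 20, 30):
    print(f"{n:>2} messages: {total_tokens(500, n):,} tokens")
```

The total is just the sum of the per-turn costs, which is why it grows with the square of the message count rather than in a straight line.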

Instead: click Edit on your original message → fix it → regenerate. The old exchange gets replaced, not stacked.

Fix the prompt, don't feed the history.

2. Start a fresh chat every 15–20 messages

In the previous section, I showed how token costs grow with every message.

Ideally, you should start a new chat every 15–20 messages.

Now imagine a chat with 100+ messages. At ~500 tokens per exchange, that's over 2.5 million tokens burned - most of it just re-reading old history.

One developer tracked his usage and found that 98.5% of tokens were spent on re-reading the history. Only 1.5% went toward actually outputting the result.

Aniket Parihar's post on LinkedIn.

When a chat gets long → ask Claude to summarize everything → copy it → new chat → paste as first message.
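To get a feel for what restarting saves, here is a rough Python sketch that compares one long chat with the same number of messages split across fresh chats. It reuses the simplified S × N(N+1) / 2 model from section 1. The `summary_tokens` parameter is a hypothetical size for the pasted summary, and the sketch assumes that summary gets re-read on every turn of the new chat.

```python
def total_tokens(avg_tokens: int, messages: int) -> int:
    """Cumulative tokens for one chat under the S * N(N+1) / 2 model."""
    return avg_tokens * messages * (messages + 1) // 2


def chunked_total(avg_tokens: int, messages: int, chunk: int,
                  summary_tokens: int = 0) -> int:
    """Total tokens when a new chat is started every `chunk` messages.

    Every chat after the first starts with a pasted summary
    (hypothetical size) that is re-read on each turn of that chat.
    """
    total = 0
    remaining = messages
    first_chat = True
    while remaining > 0:
        n = min(chunk, remaining)
        total += total_tokens(avg_tokens, n)
        if not first_chat:
            total += summary_tokens * n  # summary re-read on each of n turns
        first_chat = False
        remaining -= n
    return total


print(total_tokens(500, 100))                            # 2525000
print(chunked_total(500, 100, 20))                       # 525000
print(chunked_total(500, 100, 20, summary_tokens=1000))  # 605000
```

Even with a 1,000-token summary carried into each new chat, the split version comes out at roughly a quarter of the single long chat under these assumptions.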

3. Batch your questions into one message

Many people believe that splitting questions into separate messages leads to better results. Almost always, the opposite is true.

Three separate prompts = three context loads.

One prompt with three tasks = one context load.

You save tokens twice: fewer context reloads, and you stay further from hitting your limit.

Instead of: “Summarize this article”

“Now list the main points”

“Now suggest a headline”

Write: “Summarize this article, list the main points, and suggest a headline.”

Bonus: the answers often turn out better because Claude immediately sees the full picture.

Three questions. One Prompt. Always!
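The difference between three context loads and one can be sketched in Python. All token counts here are made-up illustration values (a 2,000-token article, 20-token questions, 300-token answers), and the model is the same simplified one as before: each turn re-reads everything already in the chat. Output tokens are left out because they are roughly the same either way.

```python
def separate_prompts(doc: int, question: int, answer: int, n: int) -> int:
    """Input tokens read when n questions are sent as n separate turns."""
    total = 0
    history = doc
    for _ in range(n):
        history += question  # the new question joins the context
        total += history     # the whole context is read on this turn
        history += answer    # the reply is carried into later turns
    return total


def batched_prompt(doc: int, question: int, n: int) -> int:
    """Input tokens read when all n questions go in one message."""
    return doc + question * n


print(separate_prompts(doc=2000, question=20, answer=300, n=3))  # 7020
print(batched_prompt(doc=2000, question=20, n=3))                # 2060
```

The longer the attached document, the bigger the gap, since the separate-prompt version pays for it once per question.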

4. Upload recurring files to Projects

If you upload the same PDF to multiple chats, Claude re-tokenizes that document every single time.

Use the Projects feature instead.

Upload your file once → it gets cached. Every new conversation inside that project references it without burning tokens again.

Cached project content doesn't eat into your usage when you access it repeatedly.

If you work with contracts, briefs, style guides, or any long docs - this alone could cut your token spend dramatically.

5. Set up Memory & User Preferences

Every new chat without saved context wastes 3–5 messages on setup: “I’m a marketer, I write in a casual style, I prefer short paragraphs…”

You've probably seen people start every prompt with "Act as a..." - that's tokens burned on repeat. Claude can remember this permanently.

Go to “Settings” → “Memory and User Settings.” Save your role, communication style, and settings once. Claude will automatically apply them to every new chat.

6. Turn off features you're not actively using

Web search, connectors, and “Explore” mode - all of these add tokens to every response, even if you don’t need them.

Writing your own content?

Turn off the “Search and Tools” feature.

The “Advanced Thinking” feature also consumes tokens. Keep it turned off by default. Only turn it on if your first attempt was unsatisfactory.

Rule: if you didn’t turn this feature on intentionally, turn it off.

7. Use Haiku for simple tasks

Grammar checking, brainstorming, formatting, quick translations, short answers - Haiku handles all of this at a much lower cost than Sonnet or Opus.

Choosing the right model is the most important decision you make every day.

Haiku for drafts and simple tasks → frees up 50–70% of your budget for tasks that truly require powerful models.

Mental model: Haiku → quick tasks, low cost

Sonnet → real work, medium cost

Opus → deep thinking, high cost

You don’t need powerful models for simple tasks!
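If you want to turn the mental model into a habit, it can be written down as a tiny lookup. This is only a sketch: the task labels are hypothetical, the returned strings are tier names rather than real API model identifiers, and where "simple" ends is a judgment call.

```python
# Task labels are illustrative, taken from the examples listed above.
SIMPLE_TASKS = {
    "grammar_check",
    "brainstorming",
    "formatting",
    "quick_translation",
    "short_answer",
}


def pick_tier(task: str, deep_thinking: bool = False) -> str:
    """Return a model tier name following the Haiku / Sonnet / Opus split."""
    if deep_thinking:
        return "opus"    # deep thinking, high cost
    if task in SIMPLE_TASKS:
        return "haiku"   # quick tasks, low cost
    return "sonnet"      # real work, medium cost


print(pick_tier("grammar_check"))                      # haiku
print(pick_tier("write_report"))                       # sonnet
print(pick_tier("architecture_review", deep_thinking=True))  # opus
```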

8. Spread your work across the day

Claude uses a rolling 5-hour window. It does not reset at midnight; instead, older usage gradually drops out of the window. Messages sent at 9 a.m. will no longer count by 2 p.m.

If you use up your entire limit during a single morning session, most of the capacity you could have used that day goes unused.

Divide your day into 2–3 sessions: morning, afternoon, and evening. By the time you return, your previous usage is no longer counted, and you have a new limit.
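The rolling-window idea can be illustrated with a small Python sketch. It only models the behavior as described here (usage stops counting once it is 5 hours old). How Anthropic actually accounts for usage may differ, and the class and method names are invented for this example.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingUsage:
    """Tracks token usage inside a rolling time window (5 hours by default)."""

    def __init__(self, window: timedelta = timedelta(hours=5)):
        self.window = window
        self.events = deque()  # (timestamp, tokens), oldest first

    def record(self, when: datetime, tokens: int) -> None:
        """Log usage; calls are assumed to arrive in chronological order."""
        self.events.append((when, tokens))

    def used(self, now: datetime) -> int:
        """Tokens still counted at `now`, after dropping expired entries."""
        cutoff = now - self.window
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)


usage = RollingUsage()
usage.record(datetime(2026, 3, 30, 9, 0), 1000)   # morning session
usage.record(datetime(2026, 3, 30, 12, 0), 500)   # midday session
print(usage.used(datetime(2026, 3, 30, 13, 0)))   # 1500
print(usage.used(datetime(2026, 3, 30, 14, 0)))   # 500, the 9 a.m. usage has expired
```

Spacing sessions out means each one starts with more of the earlier usage already expired.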

9. Work during off-peak hours

Starting March 26, 2026, your 5-hour session limit is used up more quickly during peak hours:

> 5:00 AM to 11:00 AM Pacific Time / 8:00 AM to 2:00 PM Eastern Time on weekdays.

Same query, same chat - but during peak hours, it impacts your limit more.
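A quick way to check whether a given moment falls in that peak window is sketched below in Python. The function expects a datetime that is already in Pacific Time; converting from your own timezone is left to the caller. Treating 11:00 AM itself as off-peak is an assumption, since the post does not say which side of the boundary it falls on.

```python
from datetime import datetime, time

PEAK_START = time(5, 0)   # 5:00 AM Pacific
PEAK_END = time(11, 0)    # 11:00 AM Pacific (assumed exclusive)


def is_peak(pacific_local: datetime) -> bool:
    """True if the Pacific-local datetime is a weekday between 5 and 11 AM."""
    is_weekday = pacific_local.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and PEAK_START <= pacific_local.time() < PEAK_END


print(is_peak(datetime(2026, 3, 30, 9, 0)))   # True  (Monday morning)
print(is_peak(datetime(2026, 3, 30, 13, 0)))  # False (Monday afternoon)
print(is_peak(datetime(2026, 3, 29, 9, 0)))   # False (Sunday)
```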