title: @realmcore_: " You missed the latest agent browser skill on X. You don’t have 10 agent loops ...
author: realmcore_
contenttype: twitterarticle
published: 2026-02-03T19:05:56+00:00
sourceurl: https://x.com/realmcore/status/2018762897971990830
word_count: 1202
" You missed the latest agent browser skill on X. You don’t have 10 agent loops running 24/7. Oh, yo
" You missed the latest agent browser skill on X. You don’t have 10 agent loops running 24/7. Oh, you’re not using clawdbot? Maybe you could if you have 5 more Max/Ultra accounts. Maybe you could if you had just one more hour to find the right hooks and skills you should be using. Last time you said you’d try to keep up.
Oh, you’re still using MCP?
It's over."
Yeah, you lost. Opus 5 just dropped. The latest 3rd party harnesses rip 50 subagents at a time and connect to proprietary stacks.
And you have a lousy meat computer in your head that is now outclassed by an intelligence that the average person tells you is deleting a bottle of water from the planet for every message you send it.
In a space where the de facto way of doing things is changing so rapidly, you might never keep up.
But to make sure you’re not left behind, you can start by recognizing a few things about how software engineering is changing.
First, many FAANG engineers during the Big Tech era of the past decade were really not doing engineering. They were coding. And there’s a huge difference.
In this context, armies of junior engineers (1-3 years of experience) would take PRD specs from their PMs, convert them into tickets before sprint planning, and then code them.
Unfortunately for those juniors, we now have agents. And agents love specs. Agents eat your specs for breakfast.
In a world where the core junior engineer activity is done by an agent (which is a separate problem), how do you as an engineer handle your work and stay productive?
The answer is obvious, but I’ll say it anyways: you need to be managing agents and making core architectural/system decisions.
Implementation is no longer in the job description.
Now your job is pretty simple: designing and building systems for agents to work in while producing high quality code.
By the way, the system you’re building is bigger than just the codebase.
It also includes ALL of the tooling around it and how it is made available to an agent.
So the job is now, “how do I design a good system that yields high quality code in this particular codebase?”
It’s more like a factory, where better automation increases the volume of high quality code that the system yields.
So, how do you design your factory?
Here comes everyone’s favorite answer:
It depends.
The goal behind each individual factory we set up (codebase, stack, agent skills/hooks/rules/etc.) will be different, which means that how we design the factory must be different too.
Here are two examples.
Let’s assume you are building a random proof-of-concept app. For this, you are doing strictly greenfield development. You need to focus on implementing end-to-end testing tools (type checking, QA, etc.).
If you chose a stack that LLMs have a lot of training data on, you’re in luck.
If you didn’t? Too bad.
The idiomatic code they will write will be, on average, fine for a demo. You likely don’t even need to do a code review, because it’s a low-stakes project with a high likelihood of success.
Now, let’s instead assume you’re deep in the backend mines at a large legacy company.
You’re working with a 500,000 LOC Java codebase that runs on a patched legacy version of Java. The LLMs aren’t trained for this, the codebase is massive, and there certainly are implications if things get broken (which means the code review process will be extremely slow).
This is the opposite case, where the models have a predisposition to do exactly what you do not want them to do.
But there is still a way to set up an agent such that it generates good code some of the time. In a case like this, where complexity and risk are high, humans will still need to write and review quite a bit of code (although this will decrease over time). You should document your codebase in a rules file. You should build skills for the internal APIs that would be useful. Set up access credentials for your agents. Provide skills for your test pipelines. Add MCPs for your system integrations. It's a lot of stuff.
Luckily, being able to define systems means we can measure them, which means we can improve them.
I am defining this industrialization as software process engineering. You are engineering the processes and plumbing through which your coding factory will produce increasingly high quality code over time.
So the first step is knowing the blocks we build the factory with. These are simple: Agents, Hooks, Skills, Rules, etc.
The next step in engineering the performance of your agent system is having something to measure and a tool for reviewing how that system performs.
The purpose of this article is to help you understand if you are moving in the right, or wrong, direction, and how to make sure you are on track while building your factory.
How do we measure the output here, though? Measuring code quality directly is quite difficult, and even AI-based scoring systems are not great. Eventually better systems for measuring quality will emerge, but, like a factory, we will still have quality control.
However, there is something else about the output that we can measure:
Quantity.
This means the number of tokens coming out of your system on a day to day basis.
This is your system’s token output, and as the autonomy of these models increases, the question will continue to shift away from whether agents can do a task and towards how many tasks you can simultaneously have them do.
And so to measure your factory, first measure your tokens.
I’ve talked with a lot of engineers and founders and many of them feel like they aren’t spending enough tokens, because they cannot figure out how.
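As a rough sketch of what measuring your tokens could look like, the Python below sums output tokens per day. The `Session` record and its fields are assumptions made up for illustration; your harness will log sessions in its own format, so treat this as a starting point and not a real log schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Session:
    """One agent session. Hypothetical fields, not a real log schema."""
    day: date
    output_tokens: int


def daily_token_output(sessions):
    """Sum output tokens per calendar day, returned in date order."""
    totals = defaultdict(int)
    for s in sessions:
        totals[s.day] += s.output_tokens
    return dict(sorted(totals.items()))


# Made-up sample data, purely to show the shape of the calculation.
sessions = [
    Session(date(2026, 2, 2), 40_000),
    Session(date(2026, 2, 1), 12_000),
    Session(date(2026, 2, 1), 8_500),
]

totals = daily_token_output(sessions)
for day, total in totals.items():
    print(day.isoformat(), total)
```

Tracking this number day over day is the point: a flat or falling line suggests the factory is not taking on more work, whatever the cause.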
Well, the first step to figuring out how to improve your factory is reviewing its performance, and giving it upgrades where it needs them.
To do this well, you’ll need to be reviewing your past sessions. Particularly sessions where the agents/models failed to solve the task.
Models are so smart at this point that the likely issue was not model intelligence but rather the tools you provided it and the context you gave it access to.
While reviewing a session you should be thinking “How many tokens were spent on this task?” and comparing it with “How complex was this task?”
If something feels off, it probably was, and using tools to understand your sessions will help you figure out the key missing context. Once you identify what was missing, you can then go update your rules, download new skills, install new CLI tools, etc.
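One way to turn that gut check into a number is tokens spent per unit of task complexity. The sketch below is a minimal example under stated assumptions: the `SessionReview` record, the 1-5 complexity score (which you would assign yourself during review), and the 2x-median threshold are all hypothetical choices, not something any particular tool provides.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class SessionReview:
    """A reviewed session. All fields are hypothetical."""
    task: str
    tokens_spent: int
    complexity: int  # assumed 1-5 score you assign by hand during review
    solved: bool


def tokens_per_complexity(s):
    """Tokens spent per point of estimated complexity."""
    return s.tokens_spent / s.complexity


def flag_for_review(sessions, factor=2.0):
    """Return sessions that failed, or cost more than `factor` times the
    median tokens-per-complexity, most expensive first."""
    baseline = median(tokens_per_complexity(s) for s in sessions)
    flagged = [
        s for s in sessions
        if not s.solved or tokens_per_complexity(s) > factor * baseline
    ]
    return sorted(flagged, key=tokens_per_complexity, reverse=True)


# Made-up sample data for illustration only.
reviews = [
    SessionReview("rename config flag", 6_000, 1, True),
    SessionReview("add pagination endpoint", 30_000, 3, True),
    SessionReview("fix flaky integration test", 90_000, 2, False),
    SessionReview("migrate auth module", 50_000, 5, True),
    SessionReview("update README badge", 40_000, 1, True),
]

flagged = flag_for_review(reviews)
for s in flagged:
    print(s.task, round(tokens_per_complexity(s)))
```

The flagged sessions are where to look first for missing context: a failed task, or a trivial task that burned far more tokens than its peers.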
And to keep up with everyone else, you can just do this, ad infinitum, until your system is smoothly attacking every ticket it is given.
Because this is such a common concern, we built a simple app in Rust that allows you to understand your token usage and session activity.
Check it out at: init.randomlabs.ai