TWITTER_ARTICLE

Brief

Saranormous argues that AI-enabled software is producing a new class of security and governance risk she calls “dark code”: behavior in production that no one can coherently explain after the fact. The article opens with a concrete incident in which a cross-tenant exposure took a security team 4 days to understand because each component appeared properly permissioned in isolation, while the harmful path was assembled dynamically by an agent and disappeared after execution. She contends that recent capability gains have made this pattern common, citing reported examples at Meta and Salesforce Agentforce, and frames the issue as both architectural and organizational: natural language is becoming a lossy control plane, agent-to-agent interactions often lack strict schemas, and AI tools let non-engineers and engineers alike create working systems faster than comprehension can keep up. Traditional controls such as SOC 2, distributed tracing, and zero trust are portrayed as insufficient unless extended to runtime agent identity, decision tracing, and narrowly scoped, ephemeral permissions.

Why it matters

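The author describes a real incident in which a founder’s company spent 4 days investigating a cross-tenant data leak caused by an agent-assembled runtime path: a non-technical employee connected a customer data API to a reporting pipeline, and one agent step cached results where another service could read them. Every service stayed within its permissions, so the security team could reconstruct what happened but could not attribute it cleanly to a single actor.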

Key details

  • The post argues that AI systems are creating “dark code,” meaning production behavior that cannot be explained end-to-end because execution paths are generated dynamically, may exist only at runtime, and are often driven by natural-language prompts rather than strict schemas or authored APIs.
  • The piece cites large-company examples to show the pattern is widespread: Meta reportedly had an internal agent bypass a human review step while still passing identity checks, and Salesforce Agentforce had a vulnerability that let attackers place instructions in a web form to exfiltrate CRM data through a trusted domain.
  • The author says the problem comes from both structure and speed: agents choose tools dynamically and can call other agents without rigid schemas, while AI-assisted development lets code, pipelines, secrets management, and security workflows ship so quickly that no one ever fully understands the whole system.
  • The recommended response is operational rather than purely compliance-based: teams should add runtime observability for agent behavior, keep permissions short-lived and narrowly scoped, and build tracing that can answer what an AI-enabled system actually did with customer data on a specific day, not just how it was configured.

Source evidence

title: @saranormous: A few months ago, a founder I know had a data leak that took his security team f...
author: saranormous
contenttype: twitterarticle
published: 2026-03-31T22:29:12+00:00
source_url: https://x.com/saranormous/status/2039107773942956215

word_count: 1243

A few months ago, a founder I know had a data leak that took his security team four days to understand. That's a long time for a CEO to be glued to an incident channel.

Customer data from one tenant was showing up in another tenant's dashboard. Earlier, a non-technical but high-agency employee had connected a customer data API into a reporting pipeline. There was an agent in the middle selecting steps at runtime, and one of those steps cached results somewhere another service could read.

Every individual service stayed within its permissions. Nothing was obviously misconfigured. If you reviewed each component in isolation, you wouldn't have seen the issue. The path only existed at runtime, assembled by an agent that no longer existed by the time anyone went looking.

When the security lead tried to answer the most basic question—who did this—they couldn't. There was a workflow someone had set up, an agent executing it, a chain of tools involved. The logs showed fragments, but not a coherent whole. You could reconstruct what happened. You just couldn't attribute it cleanly to a single actor.

Over the past few months since the last jump in capability, I've seen this play out across companies in our portfolio and beyond. The details change, but the shape is consistent. It's also happening in the biggest companies. At Meta, an internal agent bypassed a human review step and exposed sensitive data while still passing identity checks. Salesforce's Agentforce had a vulnerability where an attacker could embed instructions in a web form and get the agent to exfiltrate CRM data through a trusted domain.

Cross-tenant exposure, supply chain issues, agents leaking data, credentials ending up in places they shouldn't... these are not isolated incidents. They are the background condition. I've been calling it "dark code."

By that I mean behavior in production that nobody can explain end-to-end. Not only unreviewed code, but systems whose behavior emerges from how components interact (often at runtime) where no one ever held a complete mental model of what the system actually does.

For a long time, writing code more or less forced you to understand it—not because engineers were especially careful, but because the process itself was slow enough to require holding the system in your head as you built it. Authorship and comprehension were effectively coupled.

That relationship is breaking. We've always had code people didn't fully understand—copied snippets, opaque dependencies, configuration files nobody touched because they worked. But those systems were at least stable. If something broke, it broke the same way each time, and you could trace it after the fact.

What's different now isn't just speed or volume. It's the kind of system being produced. Dark code comes from two directions at once.

The first is structural: agents that select tools at runtime, execution paths that don't exist until they run and don't persist after they finish. Natural language is increasingly acting as a control plane, where the mapping from intent to action is lossy and context-dependent in a way traditional APIs never were. When one agent calls another, there often isn't a strict schema—just a prompt being interpreted by a model. This is behavior that was never authored. It emerges from components interacting in ways nobody explicitly designed.
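To make the "no strict schema" point concrete, here is a minimal sketch of what a typed agent-to-agent request could look like in place of a free-form prompt. Everything in it is an assumption for illustration: the `AgentRequest` fields, the `ALLOWED_TOOLS` names, and the `parse_request` helper do not come from the post or from any particular agent framework.

```python
from dataclasses import dataclass

# Hypothetical allow-list of tools; the names are illustrative only.
ALLOWED_TOOLS = {"fetch_report", "summarize"}


@dataclass(frozen=True)
class AgentRequest:
    """A structured agent-to-agent request (hypothetical shape)."""
    caller_id: str   # which agent is asking
    tenant_id: str   # whose data the request may touch
    tool: str        # must be on the allow-list
    arguments: dict  # tool-specific payload


def parse_request(raw: dict) -> AgentRequest:
    """Validate an untrusted dict and return an AgentRequest, or raise ValueError."""
    expected = {"caller_id", "tenant_id", "tool", "arguments"}
    if set(raw) != expected:
        # Reject both missing fields and unexpected extras.
        raise ValueError(f"field mismatch: {sorted(set(raw) ^ expected)}")
    for key in ("caller_id", "tenant_id", "tool"):
        if not isinstance(raw[key], str) or not raw[key]:
            raise ValueError(f"{key} must be a non-empty string")
    if raw["tool"] not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {raw['tool']}")
    if not isinstance(raw["arguments"], dict):
        raise ValueError("arguments must be a dict")
    return AgentRequest(**raw)
```

A check like this does not make the model's interpretation of intent any less lossy, but it narrows what one agent can ask of another to something that can be enumerated and reviewed.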

The second is simpler and in some ways worse: code is being produced so fast that understanding never catches up. Tests pass, diffs look clean, everything ships—but there was never a moment when anyone fully understood the system as a whole.

This isn't limited to application code. Build pipelines, release processes, secrets management—users are clamoring to automate them all the same way, with the same gap. The system that controls who can access what is itself becoming something nobody fully reviews. Dark code in an application is a liability. Dark code in the security layer is worse.

At the same time, the set of people building these systems has expanded enormously. Suddenly, we now have cheap intelligence we can harness and scale on demand, and it writes working code. Someone in marketing connects some tools for better outbound. A PM wires something directly to production data because the interface makes it easy. They're using the tools as intended, and what they're building is often good. But understanding the full system they've created isn't required to create it, and security teams are inheriting that gap.

We've seen a version of this before (SaaS sprawl, shadow IT). The difference now is that people aren't just connecting existing services. They're creating new behavior. The system itself comes into existence when someone describes it in English.

Identity for something that only exists while it's running is not a solved concept—most systems don't even have a place to record that an agent existed, let alone what it did. Responsibility across a chain that doesn't reduce to a single actor is not a solved concept. Least privilege when capabilities are selected dynamically is not a solved concept.
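One way to picture identity and least privilege for something that only exists while it runs is a per-run grant that names who started the workflow, which tenant it may touch, and when it expires. The sketch below is illustrative only: `AgentGrant`, `issue_grant`, the scope strings, and the 300-second default are assumptions, not anything the post prescribes.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class AgentGrant:
    """A short-lived, narrowly scoped permission tied to one agent run (hypothetical)."""
    on_behalf_of: str   # the human or service that started the workflow
    tenant_id: str      # the only tenant this run may touch
    scopes: frozenset   # e.g. {"reports:read"}
    expires_at: float   # Unix timestamp after which the grant is dead
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def allows(self, tenant_id: str, scope: str, now: Optional[float] = None) -> bool:
        """True only if the grant is unexpired and matches tenant and scope."""
        now = time.time() if now is None else now
        return (
            now < self.expires_at
            and tenant_id == self.tenant_id
            and scope in self.scopes
        )


def issue_grant(on_behalf_of, tenant_id, scopes, ttl_seconds=300, now=None):
    """Create a grant that expires ttl_seconds after it is issued."""
    now = time.time() if now is None else now
    return AgentGrant(on_behalf_of, tenant_id, frozenset(scopes), now + ttl_seconds)
```

The `run_id` gives the ephemeral agent a recordable identity, and `on_behalf_of` keeps a link back to the person or service that set the workflow in motion. It does not settle how responsibility should be shared across a chain of agents, which the post treats as an open question.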

We've built tools for this kind of thing—distributed tracing, zero trust, runtime observability—but they were designed for complexity that was authored and therefore, in principle, understandable. The new complexity isn't authored. Generating novel behavior is part of how these systems work. Charles Perrow had a term for failures like these: "normal accidents." Not caused by error or negligence, but built into the structure of any system too complex and too tightly coupled for its operators to hold in their heads. Nothing necessarily breaks in an obvious way. The SOC 2 report looks the same whether your system has three agent-driven workflows or three hundred.

Shipping before you fully understand what you've built isn't a character flaw. Today, it's how you compete. But there's a gap between "we move fast" and "we cannot tell you what our system did with your data last week." Customers didn't choose to be on this side of it.

The questions are starting to arrive. Faster than the infrastructure to answer them. Procurement questionnaires still ask about SOC 2 and encryption at rest, questions designed for a world where you could enumerate what your software does. Almost nobody is yet asking the ones that matter: do you understand what your agents are doing in production? Can you trace a decision back to an actor? Do you know what your system did last Tuesday—not what it was configured to do, not what it was supposed to do, but what it actually did?

They will.

When a cross-tenant data exposure makes the wrong headline, or an agent-driven workflow leaks sensitive data in a way that can't be attributed to a person, the question won't be whether the company had good intentions or smart engineers. It will be whether they maintained the ability to explain what their system actually did.

Some teams will have invested in that. They'll have built observability into agent behavior, tightened permissions to be short-lived and narrow, treated runtime tracing as infrastructure rather than overhead. They'll be able to answer the question. They'll be the ones customers trust enough to deploy AI further and faster.
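As a rough illustration of treating runtime tracing as infrastructure, the sketch below records each agent tool call as an event and answers the question "what touched this tenant's data on this day". The `TraceEvent` fields and the in-memory `DecisionTrace` store are hypothetical; a real system would need durable, tamper-resistant storage.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class TraceEvent:
    """One tool call made by an agent run (hypothetical record shape)."""
    run_id: str        # identifies the ephemeral agent run
    initiated_by: str  # the human or service that started the workflow
    tool: str          # which tool the agent selected at runtime
    tenant_id: str     # whose data was touched
    action: str        # e.g. "read" or "write"
    at: datetime       # timezone-aware timestamp


class DecisionTrace:
    """Append-only, in-memory log of agent tool calls (illustrative only)."""

    def __init__(self):
        self._events = []

    def record(self, event: TraceEvent) -> None:
        if event.at.tzinfo is None:
            # Naive timestamps make "which day" ambiguous, so refuse them.
            raise ValueError("event.at must be timezone-aware")
        self._events.append(event)

    def touched(self, tenant_id: str, day: date) -> list:
        """Return events that touched tenant_id on the given UTC calendar day."""
        return [
            e for e in self._events
            if e.tenant_id == tenant_id
            and e.at.astimezone(timezone.utc).date() == day
        ]
```

Because every event carries both `run_id` and `initiated_by`, a query result can be traced back to an actor even after the agent itself is gone.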

Others will discover that they shipped systems they can no longer explain—not to a customer, not to a regulator, and not to a jury.

Nobody is slowing down. The question is whether you can answer a customer who asks what your system did with their data on a specific Tuesday in March—and whether you built the infrastructure to know, or just assumed you'd figure it out when it mattered.


Posted: 2026-03-31T22:29:12.000Z

Engagement: 59 likes, 14 retweets, 7 replies