I Tested an AI Agent on My Live Systems. Here Is the Blast Radius Assessment Every Engineer Is Skipping.

An AI agent just went viral. The creator got hired by OpenAI within days of the announcement. Instagram reels with seven-figure view counts are telling engineers to connect the agent to their inbox, their CRM, their project management tools, and step back. The premise: autonomous execution, zero oversight, time reclaimed.

I set up an isolated Mac environment and ran the agent through every task I could construct.

The software worked. The concept did not survive contact with a real operational context.

The agent can do the work. That was never in question. The question is what happens when it does the work wrong — and whether you find out before your client does.

This is not a review of OpenClaw, the agent in question. It is an architectural analysis of the deployment decision, and of why the question most engineers are asking is the wrong one.

The Wrong Question

The wrong question is: what can this agent do?

The demo answers that question cleanly. Summarize emails. Update CRM records. Draft replies. Execute Notion updates. The agent handles each of these in a controlled environment with clean data, clear intent, and no ambiguity.

Production environments are not controlled. They have relationship context the model cannot access. They have implicit rules the team maintains without documenting. They have communication threads where tone, timing, and word choice carry business consequences that no training set has fully encoded.

The right question is: what can this agent break, and what is the recovery time?

That is an architectural question. And most engineers buying Mac Minis this weekend are not asking it.

Blast Radius: The Metric That Determines Safe Deployment

Blast radius is a concept borrowed from system reliability engineering. In the context of agent deployment, it measures the maximum damage an agent can produce given its current permission set — and whether that damage is reversible within an acceptable recovery window.

Every permission you grant an agent is a ceiling on its potential blast radius. Read access to an email thread has a low ceiling. Write access to the primary inbox has a ceiling that includes unauthorized client commitments, archived active deals, and reputational exposure that no rollback command will repair.

Before any agent touches a live system, every permission needs a blast radius assessment:

What is the worst action this agent can take with this permission? Is that action reversible? Can it be reversed in under ten minutes?

If the answer to either of the last two questions is no, the permission does not belong in a production deployment.
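The three questions above can be written down as a small checklist in code. What follows is a minimal sketch, not a definitive tool: the `PermissionAssessment` type, the permission names, and the recovery estimates are all hypothetical placeholders, and only the ten-minute window comes from the text.

```python
from dataclasses import dataclass

RECOVERY_WINDOW_MINUTES = 10  # the "under ten minutes" threshold from the text


@dataclass(frozen=True)
class PermissionAssessment:
    permission: str          # hypothetical permission name, e.g. "email:read"
    worst_action: str        # the worst thing the agent could do with it
    reversible: bool         # can the damage be undone at all?
    recovery_minutes: float  # estimated time to undo it (ignored if irreversible)


def allowed_in_production(a: PermissionAssessment) -> bool:
    """A permission passes only if its worst case is reversible within the window."""
    return a.reversible and a.recovery_minutes < RECOVERY_WINDOW_MINUTES


# Illustrative values only; real estimates come from your own systems.
assessments = [
    PermissionAssessment("email:read", "summarizes a thread incorrectly", True, 1),
    PermissionAssessment("email:send", "commits to terms with a client", False, 0),
    PermissionAssessment("crm:write", "overwrites deal notes", True, 45),
]

for a in assessments:
    verdict = "allow" if allowed_in_production(a) else "deny"
    print(f"{a.permission:<12} {verdict:<6} worst case: {a.worst_action}")
```

The value of writing it down is that every permission gets an explicit worst case and a recovery estimate before a key is issued, rather than after the first incident.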

Why Slack and Email Are the Highest-Risk Starting Points

The hype cycle defaults to email and messaging platforms as the first integration targets. This is architecturally backwards.

Slack

Slack carries organizational hierarchy in every channel. The agent reads a ticket thread. It does not know that pinging a VP in a public channel about a P3 issue is a career event, not a notification. It does not understand that some threads are politically loaded — and that a confident summary posted to leadership changes the conversation in ways no rollback can fix.

A misclassified priority escalated autonomously in front of the wrong audience is not an automation failure. It is a political incident with your name on it.

Email

Email carries relationship context that spans months or years. The agent reads a thread. It does not know this client is in a delicate negotiation. It does not know that the last message you sent was intentionally vague to hold the conversation open. It responds with precision based on the literal text and misses the intent entirely.

An autonomous send from your address, to a client, with the wrong interpretation of an open negotiation, is not an automation failure. It is a business incident. It may be a legal one.

These systems should be the last to receive agent write access, not the first.

The Deployment Framework

1. The Empty Mandate Test

Before any agent is deployed, it must pass the Empty Mandate Test. You define the task in one sentence. Not a category. Not a workflow. One specific, repeatable task.

✅  "Summarize tier-1 support tickets into a daily Slack brief."

That gets a deployment date.

❌  "Manage my communications."

That does not.

If you cannot write it in one sentence, the agent has no job description and should not touch your systems.

2. The Staged Promotion Model

Safe agent deployment follows a structured promotion model — not a single-step connection.

Stage 1 → Read-Only Observation. The agent has access to the system but cannot write, send, or modify. It observes, summarizes, and surfaces patterns. You study its output over a minimum of thirty real interactions. You document where it hallucinates, where it misclassifies, and where its limited context produces confident but incorrect summaries.

Stage 2 → Draft-Only Access. The agent generates outputs that require human review and explicit approval before any action occurs. Every email draft is reviewed. Every CRM note is read before it is saved. The agent proposes. The Architect approves. This is not friction — it is validation data.

Stage 3 → Supervised Execution. After validated behavior in Stage 2, the agent is granted scoped write access for a single, defined, low-blast-radius task. One task. One system. Behavior is logged and reviewed weekly.

Stage 4 → Earned Autonomy. Autonomy is a reward for demonstrated reliability, not a feature you enable at installation. It is extended incrementally, with a revocation plan active at every stage.

Most agents never reach Stage 4 in any high-stakes context. That is the correct outcome. Controlled leverage at Stage 2 or 3 is more valuable than uncontrolled autonomy at Stage 4.
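The four stages can be enforced by a gate that sits between the agent and the system it touches. This is a rough sketch under stated assumptions: `AgentGate`, the action names (`"read"`, `"draft"`, `"write"`), and the task label are hypothetical, and reusing the thirty-interaction minimum for every promotion is my simplification (the text requires it only for Stage 1). Stage 4 is modeled as unrestricted, which is coarser than the incremental extension described above.

```python
from enum import IntEnum

MIN_REVIEWED_INTERACTIONS = 30  # "a minimum of thirty real interactions"


class Stage(IntEnum):
    READ_ONLY = 1
    DRAFT_ONLY = 2
    SUPERVISED = 3
    EARNED_AUTONOMY = 4


class AgentGate:
    """Decides whether an agent action may run at the current stage."""

    def __init__(self, scoped_task=None):
        self.stage = Stage.READ_ONLY
        self.reviewed = 0               # interactions a human reviewed at this stage
        self.scoped_task = scoped_task  # the single task writable at Stage 3

    def record_review(self):
        self.reviewed += 1

    def promote(self):
        """Move up one stage, only after enough reviewed interactions."""
        if self.reviewed < MIN_REVIEWED_INTERACTIONS:
            raise PermissionError("not enough reviewed interactions to promote")
        if self.stage < Stage.EARNED_AUTONOMY:
            self.stage = Stage(self.stage + 1)
            self.reviewed = 0

    def revoke(self):
        """Revocation plan: drop straight back to read-only."""
        self.stage = Stage.READ_ONLY
        self.reviewed = 0

    def authorize(self, action, task=None, human_approved=False):
        if action == "read":
            return True
        if self.stage == Stage.READ_ONLY:
            return False
        if action == "draft" or human_approved:
            return True  # drafts are harmless; approved actions were reviewed
        if self.stage == Stage.DRAFT_ONLY:
            return False
        if self.stage == Stage.SUPERVISED:
            return task is not None and task == self.scoped_task
        return True  # EARNED_AUTONOMY


gate = AgentGate(scoped_task="daily-ticket-brief")
print(gate.authorize("write", task="daily-ticket-brief"))  # False at Stage 1
```

The design choice worth noting is that `revoke()` exists from the first stage: the way back down is built before the way up is used.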

The Notion Problem Nobody Is Talking About

Notion sits at the center of most knowledge-intensive workflows. It holds architecture decisions, project context, client notes, and institutional memory accumulated over months or years.

An agent with write access to a live Notion workspace is operating on the organizational brain.

It will reorganize. It will deduplicate. It will decide that certain pages are redundant and consolidate them. It will do this confidently, based on pattern matching, with no understanding of why a page exists in its current form or who depends on its current structure.

The damage is not immediately visible. It surfaces three weeks later when someone opens a page that has been rewritten — and realizes the original context is gone.

Notion integrations require the most conservative blast radius assessment of any system in a typical operational stack. Read-only access to specific databases is the correct starting point. Unrestricted write access is not on the deployment roadmap until behavior is validated in isolation for an extended period.
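"Read-only access to specific databases" amounts to a default-deny allowlist. The sketch below shows that shape in isolation. It does not call the Notion API; `WorkspacePolicy`, the operation names, and the database identifiers are hypothetical, and a real deployment would also restrict the integration's permissions on the Notion side.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkspacePolicy:
    """Default-deny policy: reads only, and only on allowlisted databases."""
    readable_databases: frozenset = field(default_factory=frozenset)

    def check(self, operation: str, database_id: str) -> bool:
        if operation != "read":
            return False  # no create, update, archive, or delete
        return database_id in self.readable_databases


# Placeholder identifiers, not real Notion database IDs.
policy = WorkspacePolicy(
    readable_databases=frozenset({"support-tickets", "release-notes"})
)

print(policy.check("read", "support-tickets"))    # True: allowlisted read
print(policy.check("update", "support-tickets"))  # False: writes are never allowed
print(policy.check("read", "client-notes"))       # False: not on the allowlist
```

Anything not named is denied, so a new database added to the workspace stays invisible to the agent until someone decides otherwise.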

The Architectural Verdict

Agents are a real capability. The use case is legitimate. The efficiency gains at appropriate scope are measurable.

The deployment philosophy being sold in the current hype cycle is not legitimate. It skips the blast radius assessment entirely. It treats autonomy as a feature to enable, not a property to earn.

The Field CTO function is not to chase the demo. It is to map the worst case before the first API key is issued.

Read-only first. Draft-only second. Scoped writes after validated behavior. Earned autonomy last — if at all — in high-stakes systems.

That sequence is not caution. It is architecture.

The engineers who skip it will have the recovery story within six months. The Architects who follow it will have the deployment they can stand behind.

Vladimir Mikhalev

Field CTO · Docker Captain · IBM Champion · AWS Community Builder
