Claude Managed Agents, Explained: Meet Claude's Agent Hub

Most agent demos feel magical until the browser closes.

You can give an AI a goal, a few tools, and a pile of context, then watch it do something that looks suspiciously like work. The harder part starts when you want that agent to run for real people: keeping the loop alive, storing the work history, streaming progress, managing credentials, recovering from failures, and letting a human step in before the agent does anything risky.

That’s the jump Claude Managed Agents is trying to make. Anthropic is packaging the invisible infrastructure around an agent so developers can ship something closer to a durable AI worker than a clever demo script.

Below, we break down Anthropic's own advice for how to work with Managed Agents so you can learn how to apply these tools for your own needs.

First up, the TL;DR
Key moments from the video, time-stamped
Key moments from the video, time-stamped
What managed agents are
Why managed agents exist
The mental model: brain, hands, session, events
The key Anthropic terms, translated
How Claude Managed Agents work
The workflow: from prompt to managed session
Example 1: the incident-investigator agent
Example 2: a production-ready deal-analysis agent
The advanced pieces
When to use managed agents
What to watch out for
The security debate is moving from prompts to infrastructure
Putting this all together: a step-by-step guide
What this means
Resources to check out next
Our take

First up, the TL;DR

AI agents are easy to describe and annoying to run.

The beginner version is simple: an agent needs a goal, context, tools, actions, guardrails, triggers, and a human approval step for anything high-stakes. The production version is where things get messy. Someone has to host the agent loop, run the sandbox, manage tool calls, store session history, stream events, protect credentials, and debug jobs that may run for minutes or hours.

Claude Managed Agents is Anthropic’s attempt to turn that pile of scaffolding into a managed platform.

Here's how it works:

Anthropic’s first Managed Agents walkthrough shows how to build an incident-response agent that checks logs, metrics, deployments, and diffs. (2:11)
The second workshop, Build a production-ready agent with Claude Managed Agents, shows the advanced version: a deal-analysis agent using Linear MCP, memory stores, outcomes, credential vaults, and subagents. (1:38)
The agent model has four core pieces: Agents, Environments, Sessions, and Events.
The “brain” runs server-side, while the “hands” execute tools in a sandbox or customer-controlled environment. (7:22)
Sessions act like durable work records, so the agent can stream progress, persist after refresh, and resume work later. (25:59)
Outcomes let a team define the desired result as a rubric, so Claude can iterate and check its work instead of merely calling tools in sequence. (6:29)
Subagents let the main agent split a complex job across specialists, each with its own context window and activity thread. (11:26)

Why this matters: Managed Agents is basically how you make agents available to your customers. Yes, you might be able to spin up your own agents and it might be worth it for you to do that. But if you're shipping agents to other customers to use, Managed Agents is the system you want.

See, useful agents need more than smart reasoning. They need a workspace, a memory, a tool belt, an activity log, a credential boundary, and a way for humans to approve risky moves. In Anthropic’s demo, the agent stops at recommending a fix. With codebase access, the same pattern could eventually suggest a patch, open a PR, and move from diagnosis to remediation.

Below we break down all the key moments across both of Anthropic's Managed Agent presentations, and then dive deeper into the insights in more depth.

Key moments from the video, time-stamped

(0:21) Isabella He opens the session as a hands-on workshop for “shipping your first manage agent.”
(0:27) Isabella identifies herself as a member of technical staff on Anthropic’s applied AI team, which works across products, research, and customers.
(0:51) The session goal is to build on Managed Agents, understand the harness under the hood, and ship an incident-response management agent.
(1:02) The agenda starts with a refresher on Claude Managed Agents, then moves into a hands-on workshop, then closes with “beyond the basics.”
(1:48) Isabella points viewers toward a later session on “dreaming,” described as a feature for self-improving agents and memory built into the harness.
(2:11) The history starts with the 2023 Messages API, which gave developers raw model access with tokens in and tokens out.
(2:38) Raw model access meant developers had to build their own primitives, including context management, the agent loop, and compaction.
(3:17) As models became more capable, agent infrastructure became harder because agents could take more complex actions in real environments.
(3:42) The Agent SDK let developers call Claude Code programmatically, but developers still had to manage hosting, scaling, and container safety.
(3:55) Managed Agents is framed as the next step: Anthropic manages scaling, sandboxing, observability, and tool runtime.
(4:33) Isabella says teams have seen “10 to 15 times faster” production builds by using the Managed Agents harness.
(4:45) The argument for Managed Agents is that harnesses should evolve alongside agents.
(4:51) Isabella gives “context anxiety” as an example, where a model wrapped up tasks early despite remaining context window.
(5:27) The takeaway: maintaining harness behavior across model releases is a lot of work, so Anthropic wants to absorb primitives like compaction, caching, and context-anxiety mitigations.
(5:55) Managed Agents is organized around three primary resources: Agents, Environments, and Sessions.
(6:00) The Agent endpoint defines the persona and capabilities: model, system prompt, MCP servers, skills, and other components.
(6:13) The Environment is described as the agent’s “hands,” the space where it acts inside a container.
(6:30) A Session ties together an Agent and an Environment, streaming events back to the user while the agent acts.
(6:49) The agent loop runs server-side, which abstracts hosting, scaling, durability, and reliability concerns.
(7:22) A key design decision separates the agent loop from tool execution, decoupling the agent’s “brain” from its “hands.”
(8:11) Credentials and security motivate the separation: sandboxing can prevent the agent from accessing credentials without encryption.
(8:48) Isabella says Anthropic saw more than a 90% reduction in P95 time to first token (8:48) after decoupling the agent loop from tool execution.
(10:49) The workshop app simulates an incident-response scenario in Streamlit.
(11:04) The demo use case is the 3 a.m. on-call incident, where a developer normally investigates metrics, logs, and deployments.
(12:18) The build starts by defining the Agent, including the model, system prompt, and tools.
(12:55) The SRE agent prompt is intentionally simple: debug incidents with access to metrics, recent deployments, diffs, and logs.
(14:01) Isabella says Anthropic had just released bring-your-own containers and compute for Managed Agents, allowing tool execution in a developer’s own infrastructure.
(14:47) Networking is described as an allow list, so developers can restrict the websites and URLs the agent can access.
(14:58) MCP tunnels are presented as a way to run MCP servers inside private environments rather than on the public network.
(15:46) The demo uses the Files API to upload metrics and logs so the agent can inspect them with code.
(16:05) Context engineering is framed as a large share of developer work: deciding what data to give the agent and how it should process that data.
(16:38) A Session binds the Agent, Environment, and mounted logs so the agent can act and interact with a user.
(17:22) Managed Agents works in units of events, including user messages, tool calls, and agent responses, rather than simple request-response tokens.
(17:34) Events can be logged for observability and streamed to users as the agent works.
(18:17) The agent can respond before local tools are connected, but it cannot call get-metrics-style tools until the developer wires them up.
(18:49) Once local tools are connected, the agent can call metrics, deployments, and diffs to debug the incident.
(19:01) The demo adds the ability to delete sessions.
(19:16) Session deletion is framed as a security control because data can be proactively removed from cloud or infrastructure logs.
(20:03) The agent begins using tools like sandboxing, bash, and recent deployments to investigate the incident.
(21:17) The incident is a P99 latency spike that is ten times above baseline.
(21:35) Isabella says real SRE agents need careful choices about MCP servers and skills.
(21:52) Runbook skills are highlighted as useful because they let agents learn from past incident responses and postmortems.
(23:21) The local demo data could be swapped for production systems such as Datadog through the same tool interface.
(24:22) Tool calls and session logs persist in the cloud and appear in the observability console.
(24:33) The demo agent identifies database pool exhaustion caused by a commit that refactored the order summary builder.
(25:05) A more advanced incident agent could use Claude Code to suggest fixes, open PRs, and move from diagnosis to repair.
(25:29) The human developer becomes oversight while the agent handles more of the incident workflow.
(25:59) Session persistence means conversations remain available after a hard refresh without the developer wiring up a database.
(26:42) Sessions have states such as idle and running, which support durability and resumability.
(27:18) The recap names the primitive sequence: define the Agent, give it an Environment, add data/context, create Sessions, stream events, implement local tools, and delete sessions.
(28:35) Sessions “speak in events,” with events appended to logs rather than handled as one-off request-response interactions.
(29:24) Local tools in the workshop are JSON-backed, but the pattern generalizes to production tool clients.
(29:49) Managed Agents separates local tool execution from the cloud-hosted agent loop.
(30:31) Event streaming serves both front-end user experience and observability.
(31:07) Session states include idle, running, rescheduling, and terminated.
(31:26) Webhooks can trigger a session resume or kick off action based on external events.
(31:59) The agent loop’s conversation state persists in the cloud, removing the need to build storage for the prototype.
(32:35) Managed Agents handles production primitives like compaction, caching, and tool calling for the basic build.
(33:04) Advanced features include skills, subagents, memory, and outcomes.
(33:29) Subagents let an orchestrator spin up other agents with their own context windows, improving parallelization and context management.
(33:58) Dreaming is described as Claude reviewing its own memory logs to decide what to keep, merge, or discard.
(34:26) Outcomes let developers define a rubric for the desired result so the agent can work toward a target, rather than merely execute tool calls.
(34:57) Vaults address credential management by storing credentials separately and encrypting access between endpoints and the agent.
(35:43) Other production features include webhooks, fine-grained permission policies, and MCP server controls.
(36:00) The console agent builder provides a developer console for observability dashboards and defining Managed Agents.

Key moments from the video, time-stamped

(0:23) The speaker opens by noting that “Claude Managed Agents” had been mentioned repeatedly that day, but many attendees still did not know what it was.
(1:03) The session goal is for attendees to leave able to start building on Claude Managed Agents immediately.
(1:38) Claude Managed Agents is described as a set of API endpoints that can be used with any Anthropic API key.
(1:47) The platform provides production-ready agent primitives that developers can pick and choose from, then build their own product experience on top.
(2:03) Anthropic handles computer access, credential vaults, MCP authentication, the tool-calling harness, retries, and error recovery.
(2:28) Managed Agents includes memory, context management, and multi-agent primitives that help builders benefit as newer model families improve.
(2:44) The developer console includes observability views for live-debugging what agents are doing.
(3:00) An Agent is framed as a reusable template: system prompt, Skills, tools, and MCP servers.
(3:15) Different agents can get different tool access; one may have bash and web search, while another may be blocked from the web to reduce prompt-injection risk.
(3:33) Permission controls can be defined per tool, so low-risk tools can auto-execute while bash or database MCP calls require explicit user approval.
(3:57) Environments define sandbox behavior, including network access and pre-installed packages from npm or pip.
(4:13) Self-hosted environments and sandboxes let teams bring Cloudflare, Modal, Vercel, or their own sandboxing fleet instead of only using Anthropic infrastructure.
(4:39) A Session is compared to starting a new conversation in Claude.ai or entering Claude Code from the terminal.
(5:06) Sessions can include GitHub repositories or files that Claude should access, with those resources provisioned into the container.
(5:21) Sessions are ongoing event streams where the client submits messages and Anthropic streams back what Claude is doing.
(5:47) User events can include user messages, images, documents, text, interrupts, custom tool results, confirmations, and outcomes.
(6:29) Outcomes let developers pass a spec or rubric to Claude, then have the agent iterate and check its work against that rubric.
(7:04) Agent events include Claude messages, context compaction, MCP or default tool execution, and multi-agent coordination events.
(7:39) Session events expose lifecycle changes such as retries, errors, idle states, termination, and outcome-processing state.
(8:09) Span events tell the product when long-running work starts and ends, so users know the system is not stuck.
(8:37) Self-hosted sandboxes are positioned for teams that want private data to stay inside their VPC or private perimeter.
(9:19) MCP tunnels let teams connect Claude to private MCP servers without exposing those servers over the public internet.
(10:03) The workshop uses a public repo with starter and solution versions so developers can inspect the end-to-end pattern.
(10:41) The production demo is a mock deal-analysis product for evaluating companies using Linear data and private data sources.
(11:22) The demo uses a multi-agent feature launched two weeks earlier to coordinate multiple agents in one workflow.
(11:31) Each subagent can have its own persona and way of doing work.
(11:44) A financial-analysis subagent can have tools or MCP servers that make it better at that specialized task.
(12:53) The starter app intentionally has missing implementation points so the workshop can walk through the actual Managed Agents APIs.
(14:04) Listing sessions is a straightforward Anthropic SDK call to the sessions list endpoint.
(15:24) The more interesting chat view has two pieces: sending session events and streaming events from the server back to the client.
(15:51) Claude Code ships with the Claude API Skill, which makes Claude better at using Anthropic APIs, including Managed Agents APIs.
(17:39) The docs expose endpoints for creating, updating, and fetching Agents, Environments, and Sessions.
(18:01) Session sub-endpoints can list events, stream events, and list uploaded resources such as files and GitHub repositories.
(18:32) The API includes endpoints for credential vaults and memory stores.
(18:37) Vaults let builders provision MCP authentication tokens once and store them securely with Anthropic.
(18:47) When a session includes a vault, Anthropic can inject authentication tokens for MCP servers without Claude seeing those tokens.
(19:09) Memory stores let Claude read and write memories across sessions so later sessions can improve from earlier ones.
(20:04) The Claude console can monitor Managed Agent sessions and provide observability over what each session is doing.
(20:17) The quickstart offers a guided chat with Claude to help developers build agents and sessions.
(20:44) Agents are versioned, so builders can revert if a system prompt or tool list update goes wrong.
(21:38) The demo creates a new session using the Linear MCP and memory stores.
(22:06) The outcome prompt asks Claude to evaluate three companies, use scattered Linear and file data, criticize its own work, and decide whether the findings satisfy the rubric.
(22:46) Sending an outcome definition event kicks off tool use, delegation to subagents, and file reading.
(23:13) The console lets builders click into a running session and watch the model process work live.
(23:25) The demo shows four spawned agents, each with its own context window, reporting back to a coordinator.
(23:53) The console exposes individual event inputs and outputs, including web search calls.
(24:12) Tool-call timing can be inspected to debug slow or inefficient tools.
(24:31) In response to a plugins question, the speaker says agent definitions already operate in a plugin-like way, with more extensibility planned.
(25:16) Memory stores can be inspected, edited, generated, or corrected when Claude records something wrong.
(25:41) Outcome evaluation gives the product a way to know whether Claude has reached a conclusion that satisfies the requested criteria.
(26:09) Building the same system manually would require an agent loop, remote hosting, context management, state recovery, Skills, MCP, durable storage, sandboxing, and secure user auth.
(26:54) Managed Agents gives builders many of those production pieces out of the box, even if a demo uses only some of them.

Now, let's dive into that all more in depth.

What managed agents are

A managed agent is a production-ready agent setup where the platform handles the machinery around the model.

In the basic API version of AI, your app sends a prompt and gets a response. Anthropic calls that direct model access the Messages API. It works well when you want a model to answer, classify, summarize, or generate something in one request. It is stateless by default, so your application usually has to send the relevant conversation history each time if it wants continuity.

An agent needs more than that. It has to keep track of the task, call tools, receive tool results, decide what to do next, handle errors, remember what happened, and sometimes run for minutes or hours. Anthropic’s Managed Agents overview describes the product as a pre-built, configurable harness that runs Claude in managed infrastructure, with support for reading files, running commands, browsing the web, executing code, prompt caching, compaction, and other performance optimizations.

Anthropic explains the progression in three steps. The Messages API gave developers raw model access: tokens in, tokens out (2:11). The Agent SDK gave developers a harness for calling Claude Code-style agents, while developers still managed hosting and scaling (3:23). Claude Managed Agents moved the harness into managed infrastructure with sandboxing, tool runtime, observability, and context management (3:55).

That makes Managed Agents the premium platform-layer answer. You can still build cheaper or simpler workflows with lighter automation tools. Anthropic is making the case that long-running, tool-heavy, stateful agent work needs a stronger runtime than a prompt wrapper.

Why managed agents exist

Real agents need scaffolding. In this context, scaffolding means the support structure around the model: instructions, guardrails, permissions, context, memory, tools, and the runtime that lets the agent keep working. Anthropic’s engineering team calls this structure a harness: the loop that calls Claude and routes Claude’s tool calls to the relevant infrastructure.

The harness matters because agent behavior changes as models improve. Anthropic gives one revealing example in the first video: Claude Sonnet 4.5 sometimes showed “context anxiety,” where it wrapped up tasks early despite remaining context window. Anthropic added harness-level mitigations, then found the behavior disappeared with Claude Opus 4.5. The workaround became dead weight (4:51).

The infrastructure lesson is simple: if you hand-build an agent harness around today’s model quirks, you may have to rebuild it when the next model gets smarter. Anthropic’s engineering post says the point is to use interfaces that can outlast any particular harness implementation.

That is the same reason business workflows break every time a model update changes the model’s habits. The workflow was secretly built around a temporary behavior. Managed Agents is Anthropic’s attempt to make the outer system more durable than any one model generation.

The mental model: brain, hands, session, events

The easiest way to understand Managed Agents is to split the system into four parts. Anthropic’s docs use four core concepts: Agents, Environments, Sessions, and Events.

The brain is Claude plus the agent harness. It reads the task, decides which tools to call, processes results, and decides what should happen next. The agent loop runs server-side, which is why the session can keep going after a browser refresh or laptop close (6:55).

The hands are the tools and execution environments. Anthropic’s docs describe built-in tools such as bash, file read/write/edit, glob, grep, web search, web fetch, and MCP servers. In the workshop language, these are how the agent touches the outside world.

The session is the durable work record. The official docs define a session as an agent instance within an environment that maintains conversation history. Anthropic’s engineering post goes deeper: the session is an append-only log of everything that happened, separate from Claude’s current context window.

The events are the messages and state changes moving through that session. A user message is an event. A tool call is an event. A tool result is an event. A status update is an event. The event stream is what lets the product show progress and lets developers audit the work after the fact.

The simple picture: the brain thinks, the hands act, the session remembers, and events show what happened.

The key Anthropic terms, translated

Managed Agents has a lot of vocabulary because the product is packaging the pieces developers used to build by hand. Here are the terms worth getting straight before the rest of the article.

Agent: the reusable, versioned worker configuration. It bundles the model, system prompt, tools, MCP servers, Skills, and optional multiagent roster that shape Claude’s behavior during a session. Anthropic’s docs say all Claude 4.5-family and later models are supported for Managed Agents.
Environment: the sandbox configuration where the agent runs. Multiple sessions can share one environment definition, but each session gets its own isolated fresh Linux container. Environments can specify packages and networking rules.
Cloud sandbox: Anthropic’s managed Linux container for tool execution. It is the place where files can be read and written, code can run, and packages can be pre-installed before the agent starts.
Self-hosted sandbox: the version of the execution environment that runs on infrastructure you control. This matters for compliance, data residency, private networks, and systems that should stay inside your own perimeter.
Session: one running instance of an Agent inside an Environment for a specific task. It references an agent ID and an environment ID, maintains history across interactions, and starts work when your app sends a user event.
Session statuses: the lifecycle states Anthropic exposes. A session can be idle, running, rescheduling after a transient error, or terminated after an unrecoverable error.
Events: the event-based communication layer. User events and system events are sent into the agent. Session, span, and agent events come back out for observability, progress, tool calls, status changes, and confirmations.
Server-sent events: the streaming mechanism used to send progress back to the application as work happens, rather than waiting for a final answer.
Tools: capabilities the agent can use inside a session. The built-in agent toolset includes bash, file operations, glob, grep, web search, and web fetch. Custom tools can be executed by your application and returned to Claude as tool results.
MCP connector: the bridge from Managed Agents to Model Context Protocol servers. MCP lets an agent reach external tools and data sources through a standard interface, while session-level auth can be supplied through vaults.
Remote MCP servers: external services that expose tools and data over MCP, such as GitHub, Linear, Figma, Datadog-style systems, or your own internal services.
Permission policies: rules that decide whether server-executed tools run automatically or wait for approval. The built-in agent toolset defaults to always_allow if you omit a policy, while MCP toolsets default to always_ask.
Agent Skills: reusable, filesystem-based expertise that loads on demand. A Skill can hold workflows, best practices, and domain context, turning a general agent into a specialist without stuffing every instruction into the system prompt.
Files: session resources you can upload through the Files API and mount inside the sandbox. Up to 100 files are supported per session.
GitHub access: Anthropic’s pattern for mounting a repository into a session sandbox and using GitHub MCP to create pull requests. GitHub repos are cached so future sessions that use the same repo can start faster.
Memory stores: workspace-scoped collections of text documents optimized for Claude. When attached to a session, they mount under /mnt/memory/, can be read and written with file tools, and create immutable memory versions for audit and recovery.
Dreams: a research-preview memory-curation job. A dream reads a memory store plus past sessions, then produces a new reorganized memory store with duplicates merged, stale entries replaced, and new insights surfaced.
Outcomes: a way to tell the agent what “done” looks like. You define a desired result and rubric; the harness provisions a grader in a separate context window, the agent iterates, and outcome events show whether the work is satisfied.
Vaults: per-session credential containers for third-party services. Vaults let you register credentials once, reference them by ID, and avoid exposing raw tokens to Claude or the sandbox.
Webhooks: notifications for major state changes. The docs distinguish them from the live SSE stream: webhooks tell your app that something important happened so you can fetch the current object.
Multiagent sessions: a coordinator pattern where one agent delegates work to other agents. Each agent gets its own isolated context thread, while sharing the same sandbox, filesystem, and vault credentials.
Scheduled deployments: a way to run managed-agent work on a cron schedule, turning agents into recurring workers rather than only user-triggered sessions.

How Claude Managed Agents work

Claude Managed Agents is built around those four objects.

The Agent is the definition of the worker. It includes the model, system prompt, tools, MCP servers, Skills, and optional multiagent delegation settings. In the first workshop demo, the agent is an SRE agent. Its job is to investigate incidents. It gets a simple system prompt and tools for metrics, recent deployments, diffs, and logs (12:55).

The Environment defines where the agent does work. Anthropic’s docs describe this as either an Anthropic-managed cloud sandbox or a self-hosted sandbox on your own infrastructure. The environment can define packages and networking. For production, Anthropic recommends limited networking with an explicit allowed-hosts list and minimum necessary access.

The Session connects a specific Agent to a specific Environment for a specific task. In the incident demo, the session binds the SRE agent, the environment, and the uploaded log file (16:38). Once the session starts, Claude can inspect logs, call tools, stream results, and keep the investigation alive.

The Events are how the application and the agent communicate. Your app sends user events. The agent emits response events, tool-call events, status events, span events, and session events. The stream can be shown to a user, logged for observability, or used to trigger follow-up behavior.

The more advanced production-ready demo breaks events into user events, agent events, session events, and span events. That sounds technical, but the purpose is practical: you want to know what the agent is doing, when something starts, when something ends, when it needs approval, and when it hits an error.

The workflow: from prompt to managed session

The production workflow looks like this:

Create the agent: choose the model, write the system prompt, define tools, attach MCP servers, add Skills, set permission policies, and version the configuration.
Create the environment: choose a managed cloud container or a self-hosted sandbox, configure packages, and set network access.
Start a session: connect the agent to the environment and attach any resources the task needs, such as files, repositories, memory stores, or vault IDs.
Send events: send the user’s request, tool results, confirmations, system updates, interruptions, or outcome definitions as events.
Stream responses: show the agent’s work as it happens, including messages, tool calls, status updates, confirmations, and outcome evaluations.
Monitor and steer: use the console, event logs, permission prompts, and interrupts to debug, redirect, or pause the agent.
Persist, resume, archive, or delete: keep sessions alive for long work, resume later, archive for history, or delete sessions and uploaded files when you no longer need them.

That is the production version of the Agents 101 formula:

Goal + Context + Tools + Actions + Guardrails + Human approval = a managed agent session you can run, monitor, and resume.

The key shift is that you are no longer thinking only in prompts. You are designing a small operating system for a piece of work.

Example 1: the incident-investigator agent

The workshop’s example is an incident-response agent (10:49). The scenario is the classic developer nightmare: something breaks, P99 latency jumps to roughly ten times baseline, and someone needs to figure out why (21:17).

The agent gets a system prompt that tells it to act like an SRE agent, logs and metrics as context, tools for recent deployments and diffs, a session that streams every tool call and response, and a front-end view where the user can watch the investigation unfold. Anthropic also uploads logs as files, which is a small example of the larger point: the agent is only useful if the right context enters the session.

By the end of the run, the agent identifies a likely database pool exhaustion problem, connects it to a recent code change, rules out other causes, and recommends actions (24:33). The demo stops at recommendations. Anthropic points out that with codebase access, the same pattern could continue into remediation: suggest a fix, open a PR, and move from diagnosis to repair (25:05).

The beginner mapping is useful. The trigger is the incident. The goal is the investigation. The context is logs and metrics. The tools are deployment and diff lookups. The guardrail is that the demo stops before production changes. The human remains the reviewer before any fix lands.

Example 2: a production-ready deal-analysis agent

The second Anthropic demo, Build a production-ready agent with Claude Managed Agents, shows the same architecture in a more advanced workflow. The setup is a mock M&A-style analysis agent that can use files, Linear data, memory stores, outcomes, vaults, console observability, and multiple specialist agents.

The useful idea is delegation. Instead of one agent stuffing every detail into one context window, a coordinator can spin up specialized agents. One subagent might focus on macro trends. Another might focus on financial analysis. Another might inspect operational risks. Anthropic’s multiagent sessions docs describe this as a way for agents to act in parallel with isolated context threads while sharing the same sandbox, filesystem, and vault credentials.

That makes multiagent orchestration easier to understand. It is a project lead assigning smaller jobs to specialists, then turning the work into one recommendation.

The advanced pieces

Once the basic loop works, Anthropic’s roadmap adds more production controls.

Subagents: let a main agent delegate pieces of work to other agents. This helps with parallelization and context management (33:29). The docs add a practical constraint: the coordinator can delegate one level deep, and a maximum of 25 concurrent threads are supported.
Memory stores: let agents carry useful information across sessions, including user preferences, project conventions, prior mistakes, and domain context. The docs add useful limits: each memory is capped at 100 KB, each store can hold up to 2,000 memories, and a maximum of 8 stores can attach to a session.
Dreams: are Anthropic’s research-preview way for Claude to clean up accumulated memory by reading past sessions and producing a new curated memory store. The input store is never modified, so teams can review the output before switching over.
Outcomes: let you define a desired result and rubric. Anthropic’s docs say the harness provisions a separate grader context so the grader is not influenced by the main agent’s implementation choices.
Vaults: store credentials outside the sandbox and outside Claude’s context window. Vaults can hold MCP credentials and environment-variable credentials, and actual credential values are write-only fields that are never returned in API responses.
Webhooks: let external systems notify your app about major state changes. The docs say webhook payloads include the event type and ID, then your app fetches the full object directly.
Permission policies: let you choose whether tool calls execute automatically or wait for approval. This is the production version of “human in the loop.”
Scheduled deployments: let an agent start sessions on a recurring cron schedule, with optional files, GitHub repositories, memory stores, and vaults attached to the run.

These features turn the agent from a clever tool user into a controllable workflow runner. The workflow has memory, permissions, outcome criteria, and a recoverable record of what happened.

When to use managed agents

Managed Agents make the most sense when the work is long-running, stateful, tool-heavy, or needs production-grade control. Anthropic’s overview says they are best for tasks that run for minutes or hours, require multiple tool calls, need secure sandboxes, need self-hosted execution, or benefit from stateful sessions with persistent filesystems and conversation history.

Good fits include incident investigation across logs, metrics, deployments, and code; codebase work that needs files, shell commands, repositories, PRs, and review; research workflows that require multiple tools and checkpoints; business-operations workflows that need private systems and next-step drafting; project-management agents that monitor tasks and blockers; customer-support agents that draft approved responses; and creative-review agents that compare outputs against a style rubric.

A simpler automation tool is usually better when the job is short, deterministic, and mostly app-to-app routing. Make, Zapier, n8n, and Cloudflare Workers still shine when the workflow is “when X happens, do Y.” Reach for managed agents when the workflow is closer to: “when X happens, investigate, decide which tools matter, keep a record, ask for approval where needed, and continue until the outcome is good enough.”

What to watch out for

It is still developer infrastructure. Managed Agents make production agents easier to build, but they are still API-driven. A nontechnical operator can understand the model, but a developer usually needs to wire it into a product, connect systems, configure permissions, and manage the surrounding workflow.

Beta status matters. Anthropic’s docs say all Managed Agents API requests require the managed-agents-2026-04-01 beta header. MCP tunnels and Dreams are in a more limited research preview. That does not mean you should avoid them; it means teams should expect behavior to keep changing.

Data retention matters. Anthropic’s Managed Agents overview says sessions are stateful by design and store conversation history, sandbox state, and outputs server-side. The same docs say Managed Agents is currently outside Zero Data Retention and HIPAA Business Associate Agreement coverage, though users can delete sessions and separately delete uploaded files through the API. That matters for teams handling sensitive data.

Session checkpoints have a clock. Anthropic’s event-stream docs say session history persists until deletion, but sandbox checkpoints are only preserved for 30 days after the session’s last activity. If a workflow needs full sandbox state beyond that window, your app needs to send periodic user messages before checkpoint expiry.

Permissions should start narrow. Give the agent read access before write access. Require confirmation for sends, deletes, payments, production changes, database writes, and anything customer-facing. Anthropic’s permission policies docs say server-executed tools can either run automatically or wait for approval, and MCP toolsets default to approval first.

Memory can carry mistakes forward. Anthropic’s memory docs warn that read-write stores exposed to untrusted input can be poisoned by prompt injection. Use read-only memory for shared reference material, keep stores focused, and review or redact memory versions when sensitive content leaks into history.

Observability is part of the user experience. With agents, people need to watch the work, inspect the tool calls, see errors, understand timing, and review the result. The console view is useful because it shows tool calls, event streams, subagent activity, errors, and timing. In production, this is how users build trust.

The security debate is moving from prompts to infrastructure

Anthropic’s design makes a clear bet: agent safety improves when tool execution, credentials, logs, and state become explicit infrastructure objects.

That bet matches the direction of the broader agent ecosystem. MCP gives AI applications a standard way to reach external systems, and Anthropic’s Managed Agents docs describe MCP servers as the way agents connect to standardized third-party tools and data sources. The Managed Agents MCP connector separates reusable server declarations from session-level auth, which keeps secrets out of agent definitions.

The counterpoint is equally important. Tool access expands the attack surface. Recent security research on MCP-style systems points to risks such as tool poisoning, prompt injection, privilege escalation, supply-chain attacks, missing capability attestation, and implicit trust across connected servers. The infrastructure answer is stronger identity, tighter permissions, vault-backed credentials, event logs, network restrictions, and narrow default access.

The lesson for builders is practical. The prompts matter. The model matters. The tool registry, credential boundary, audit trail, network allow list, session-deletion policy, memory access mode, and approval flow matter just as much.

Putting this all together: a step-by-step guide

Here is the practical build sequence to add at the bottom of the article. It turns the whole explainer into a usable checklist for readers who want to go from “I understand Managed Agents” to “I know what I would build first.”

Step 1: Choose a real workflow

Start with a workflow that has a clear business goal and a human review point. Good first candidates include incident triage, code review, research brief generation, customer-support drafting, status reporting, procurement review, creative QA, or deal analysis.

Goal: what outcome should the agent produce?
Trigger: what starts the work, a user message, webhook, schedule, ticket, incident, or file upload?
Human review: where should the agent pause before doing anything risky?

Step 2: Decide whether Managed Agents is the right tool

Use Claude Managed Agents when the job is long-running, stateful, tool-heavy, or needs durable progress. Use simpler automation when the job is deterministic, short, and mostly app-to-app routing.

Step 3: Define the Agent

Create the reusable Agent. Give it a clear system prompt, the right Claude model, a small set of tools, relevant MCP servers, and any Skills it needs. Treat the agent like a worker role, not a one-off prompt.

Use the system prompt for the role, norms, and constraints.
Use tools for what the agent can do.
Use Skills for reusable procedures and domain expertise.
Use multiagent settings only when you truly need specialization or parallel work.

Step 4: Create the Environment

Define the Environment where work runs. Choose Anthropic’s cloud sandbox for a fast start, or a self-hosted sandbox when data residency, compliance, or private infrastructure matters.

Pre-install the packages the agent needs.
Use limited networking and an explicit allowed-hosts list in production.
Use self-hosted sandboxes when private data should stay inside your perimeter.

Step 5: Attach context

Attach the resources the agent needs at session start. That can include files, GitHub repositories, memory stores, and vault IDs. This is the context-engineering step that determines whether the agent can actually do the job.

Use files for logs, PDFs, CSVs, specs, and artifacts.
Use GitHub resources when the agent needs to inspect or edit code.
Use memory stores for user preferences, project conventions, prior mistakes, and persistent domain context.
Use vaults for credentials, especially per-user MCP auth.

Step 6: Start a Session

Create a Session by connecting the Agent to the Environment. Sessions follow a two-step lifecycle: create the session to provision the sandbox, then send a user event to start work.

Use the latest agent version for normal runs.
Pin a specific agent version when you need controlled rollout or reproducibility.
Give the session a title and metadata so humans can find it later.

Step 7: Send and stream Events

Use the event stream as the product’s live work feed. Send user.message events to start or steer the job. Stream agent, session, and span events back to the UI so users can see what is happening.

Show status changes, tool calls, and long-running span events.
Use interrupts when the agent goes off track.
Track usage after the session goes idle to monitor costs.

Step 8: Add permissions before adding power

Use permission policies to decide which actions can auto-execute and which require approval. Start with read-only and low-risk tools. Add write access slowly.

Let safe reads auto-execute.
Require confirmation for bash, database writes, sends, deletes, PR creation, production changes, and customer-facing actions.
Remember that MCP toolsets default to approval first, which is the safer starting point.

Step 9: Define what “done” means

For work that needs quality control, add Outcomes. Write a rubric that tells the agent what the final artifact must satisfy. The harness can then run a separate grader, feed the feedback back to the agent, and iterate until the criteria are met or the process stops.

Use outcomes for decision briefs, research reports, code reviews, QA passes, and financial analysis.
Keep rubrics concrete: required sections, evidence standards, error checks, and approval conditions.
Use outcome events to show progress and explain why the result passed or failed.

Step 10: Add memory carefully

Use memory stores when the agent should carry preferences, conventions, and learned corrections across sessions. Keep memory scoped, reviewable, and permissioned.

Use read-only memory for shared standards and reference material.
Use read-write memory only when the agent truly needs to learn.
Keep stores focused by user, team, project, or domain.
Use Dreams to consolidate memory when stores get messy.

Step 11: Connect external systems through MCP and vaults

Use the MCP connector for external tools and data sources. Declare the MCP server on the Agent, then supply per-session credentials through Vaults.

Keep secrets out of reusable agent definitions.
Reference vault IDs when creating sessions.
Use separate vaults per user when the agent acts on a user’s behalf.
Use MCP tunnels for private MCP servers when public exposure is the wrong security posture.

Step 12: Monitor, resume, archive, or delete

Use session operations, event streams, and webhooks to manage the agent after launch.

Resume idle sessions by sending new user messages.
Archive sessions you want to keep for history.
Delete sessions and uploaded files when data should be removed.
Use webhooks for major state changes, then fetch the full object through the API.
Use scheduled deployments when the workflow should run on a recurring cadence.

The short version of the build plan

Pick a workflow with a clear goal and review step.
Create a versioned Agent.
Create an Environment with minimal required network access.
Attach files, repos, memory stores, and vaults only as needed.
Start a Session.
Send user events and stream results.
Use permissions for risky tools.
Use outcomes when quality matters.
Monitor the session through events, webhooks, and console views.
Delete or archive the data when the work is done.

What this means

The big shift is that agents are becoming durable systems.

Managed Agents package the invisible work around the model: the loop, sandbox, session log, memory, tools, credential boundary, event stream, permission layer, and observability console. For beginners, the important lesson is conceptual: an agent is only as useful as the system around it. For builders, the practical lesson is that the harness is becoming a platform layer.

This pattern is spreading. OpenAI’s Workspace Agents push a similar idea into ChatGPT: teams create shared agents that run in the cloud, gather context from business systems, ask for approval, and execute work from inside the workspace. The shape is different, but the direction rhymes. Agents are moving from personal prompts to shared, governable systems.

The open question is where teams draw the line between agent autonomy and human review. Diagnosis feels easier to trust than remediation. Recommended actions feel safer than merged code. A pull request feels safer than a direct production change. The companies that get value first will probably be the ones that turn those boundaries into product design rather than vague policy.

Resources to check out next

Claude Managed Agents overview: Anthropic’s official docs for core concepts, supported tools, beta access, data-retention caveats, and stateful session behavior.
Agent setup: the docs page for defining a reusable, versioned agent configuration with model, system prompt, tools, MCP servers, Skills, and multiagent settings.
Cloud environment setup: the docs page for configuring cloud sandboxes, packages, and network access.
Start a session: the docs page for connecting an Agent to an Environment and beginning work through events.
Session event stream: the docs page for sending user events, streaming agent events, handling custom tools, and processing confirmations.
MCP connector: the docs page for connecting agents to external tools and data sources.
Vaults: the docs page for registering per-user credentials and referencing them at session creation.
Memory stores: the docs page for persistent memory, access modes, audit history, limits, and prompt-injection risks.
Outcomes: the docs page for defining rubrics and letting the agent iterate against them.
Scaling Managed Agents: Anthropic’s engineering explanation of why it separated the brain, hands, and session log.
Ship your first Managed Agent: the hands-on incident-investigator workshop.
Build a production-ready agent with Claude Managed Agents: the more advanced demo with sessions, subagents, memory, outcomes, vaults, and console observability.

Our take

Managed Agents is Anthropic’s argument that the next agent bottleneck is operational. Models can already plan, inspect files, call tools, and write code well enough to be useful. The scarier, less glamorous work is keeping the agent alive, bounded, observable, and connected to the right context.

That makes the incident-response demo a smart choice. On-call engineering compresses the whole agent problem into one workflow: high stakes, time pressure, scattered context, tool-heavy investigation, real credentials, and a human who needs to trust the answer fast.

Claude Managed Agents will probably be judged less by whether the demo works and more by whether the boring parts stay boring in production. Sessions need to persist (25:59). Events need to stream. Tools need to run where the customer wants them. Credentials need hard boundaries. Logs need to tell the story after something goes wrong.

If that layer works, the agent becomes less like a chat window and more like a teammate with a workbench, a notebook, and a supervisor. The future of agents may come down to how well companies design the workbench.

Anthropic's Managed Agents, Explained (Meet Claude's Agent Platform)

First up, the TL;DR

Key moments from the video, time-stamped

Key moments from the video, time-stamped

What managed agents are

Why managed agents exist

The mental model: brain, hands, session, events

The key Anthropic terms, translated

How Claude Managed Agents work

The workflow: from prompt to managed session

Example 1: the incident-investigator agent

Example 2: a production-ready deal-analysis agent

The advanced pieces

When to use managed agents

What to watch out for

The security debate is moving from prompts to infrastructure

Putting this all together: a step-by-step guide

Step 1: Choose a real workflow

Step 2: Decide whether Managed Agents is the right tool

Step 3: Define the Agent

Step 4: Create the Environment

Step 5: Attach context

Step 6: Start a Session

Step 7: Send and stream Events

Step 8: Add permissions before adding power

Step 9: Define what “done” means

Step 10: Add memory carefully

Step 11: Connect external systems through MCP and vaults

Step 12: Monitor, resume, archive, or delete

The short version of the build plan

What this means

Resources to check out next

Our take

Grant Harvey

Company

Categories