When Subagents Turn Agent Design Into an Operating Model Decision

The Configuration Question

A team starts with one coding agent and one long prompt. It works well enough for simple tasks, but the session grows, tool calls pile up, and each new request carries the weight of everything that came before. Then the team splits the work. One agent investigates the codebase, another handles repetitive edits, a third runs a narrow review. The question stops being what prompt to write and becomes how many agents to run, what each is allowed to do, and how their work is coordinated.

As of April 2026, that question has a product-level answer.

The Short Version

On April 15, 2026, Google introduced subagents in Gemini CLI (v0.38.1). Google describes them as specialized agents that operate alongside the primary session with their own context windows, system instructions, tools, and MCP server access, then return a consolidated result to the main agent.

The update changes agent structure from an implementation detail into a configurable operating choice. Once work is split across isolated specialists, teams are no longer managing a single model session. They are managing delegation, coordination, tool boundaries, and concurrency.

What Led Here

Google’s subagents release followed engineering guidance it had published one day earlier. In its Agent Bake-Off post, Google argued that production-ready agents should move away from one large agent handling intent extraction, retrieval, and reasoning all at once, and instead decompose work into specialized subagents managed by a supervisor. Google framed the pattern as a way to reduce hallucinations, lower latency, and make systems easier to maintain.

The Gemini CLI update operationalized that advice in a shipping product.

Under the Hood

A subagent in Gemini CLI is exposed to the main agent as a tool. When the main agent calls it, the task is delegated. The subagent runs in its own context loop and returns a single consolidated response. The intermediate steps (potentially dozens of tool calls, file reads, or test runs) never enter the main agent’s context.
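The isolation model can be pictured with a small Python sketch. It is not Gemini CLI code: the Subagent and Orchestrator classes, the step callback, and fake_investigator are hypothetical stand-ins. What it shows is the shape described above, where the subagent keeps every intermediate step in its own history and the orchestrator's context only gains the call and the final result.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Subagent:
    """Hypothetical subagent: runs its own loop with a private history."""
    name: str
    step: Callable[[str, list[str]], tuple[str, bool]]  # returns (output, done)
    max_turns: int = 30
    history: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.history = [task]  # each delegation starts from a fresh private context
        for _ in range(self.max_turns):
            output, done = self.step(task, self.history)
            self.history.append(output)  # intermediate steps stay inside the subagent
            if done:
                return output  # only the consolidated result leaves the loop
        return f"{self.name}: stopped after {self.max_turns} turns"


@dataclass
class Orchestrator:
    """Hypothetical main agent that sees subagents as callable tools."""
    tools: dict[str, Subagent]
    context: list[str] = field(default_factory=list)

    def delegate(self, agent_name: str, task: str) -> str:
        result = self.tools[agent_name].run(task)
        # The main context records the call and its result, not the execution trace.
        self.context.append(f"call {agent_name}: {task}")
        self.context.append(f"result: {result}")
        return result


def fake_investigator(task: str, history: list[str]) -> tuple[str, bool]:
    # Pretend to read four files, then report once.
    turn = len(history)
    if turn < 5:
        return f"read file {turn}", False
    return "summary: 4 files inspected", True


orchestrator = Orchestrator(tools={"investigator": Subagent("investigator", fake_investigator)})
result = orchestrator.delegate("investigator", "map the auth module")
```

After the call, the subagent's history holds the task plus five outputs, while the orchestrator's context holds only two entries.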

This is the core isolation model. Each subagent gets its own context window, system prompt, and conversation history. The orchestrator sees results, not execution traces. That keeps the main session lean and prevents intermediate output from one task degrading the next.

Tool access is scoped through YAML frontmatter in the Markdown definition file. Subagents can receive a restricted tool list, wildcard patterns (mcp_* for all MCP tools, mcp_server_* for a specific server), or inline MCP servers isolated to that agent. If tools are not specified, the subagent inherits everything from the parent session. Tool isolation is opt-in, not default.
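The scoping rules can be expressed as a small resolution function. This is a sketch of the semantics as described, not Gemini CLI's implementation: resolve_tools is hypothetical, the MCP tool names assume an mcp_<server>_<tool> naming scheme that matches the wildcard examples above, and inline MCP servers (which add tools the parent does not have) are not modeled.

```python
from fnmatch import fnmatchcase
from typing import Optional


def resolve_tools(parent_tools: list[str], requested: Optional[list[str]]) -> list[str]:
    """Hypothetical resolution of a subagent's `tools` field against the parent session."""
    if requested is None:
        # No `tools` field at all: the subagent inherits the parent's full tool set.
        return sorted(parent_tools)
    granted: set[str] = set()
    for pattern in requested:
        # Exact names and wildcard patterns such as "mcp_*" are matched the same way.
        granted.update(t for t in parent_tools if fnmatchcase(t, pattern))
    return sorted(granted)


# Illustrative tool names only.
parent = ["read_file", "write_file", "run_shell_command",
          "mcp_github_create_issue", "mcp_jira_search"]
scoped = resolve_tools(parent, ["read_file", "mcp_github_*"])
```

The asymmetry is the point: an absent field grants everything, while an explicit empty list grants nothing.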

Custom instructions live in the Markdown body, which becomes the subagent’s system prompt. Configuration fields include name, description, tools, model, temperature, max_turns (default 30), and timeout_mins (default 10). Definitions can be committed to a repository at project level or stored globally at user level. Each subagent becomes a versionable, shareable specialist role.
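To make the file layout concrete, here is a sketch that reads a definition with the fields listed above. It is deliberately simplified and is not the loader Gemini CLI uses: it understands only `key: value` lines and `- item` lists, so a real tool should use a proper YAML library. The example agent name, description, and prompt are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubagentDefinition:
    name: str
    description: str
    system_prompt: str                   # the Markdown body
    tools: Optional[list[str]] = None    # None means "field absent": inherit everything
    model: Optional[str] = None
    temperature: Optional[float] = None
    max_turns: int = 30                  # documented default
    timeout_mins: int = 10               # documented default


def parse_definition(text: str) -> SubagentDefinition:
    """Simplified frontmatter reader (assumption: no quoting, nesting, or multiline values)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated frontmatter")
    fields: dict = {}
    current_list = None
    for raw in lines[1:end]:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and current_list is not None:
            fields[current_list].append(line[2:].strip())  # list item under the last key
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            fields[key], current_list = value, None
        else:
            fields[key], current_list = [], key  # a bare key opens a list
    return SubagentDefinition(
        name=fields["name"],
        description=fields.get("description", ""),
        system_prompt="\n".join(lines[end + 1:]).strip(),
        tools=fields.get("tools"),
        model=fields.get("model"),
        temperature=float(fields["temperature"]) if "temperature" in fields else None,
        max_turns=int(fields.get("max_turns", 30)),
        timeout_mins=int(fields.get("timeout_mins", 10)),
    )


EXAMPLE = """---
name: test-reviewer
description: Reviews test changes and reports gaps.
tools:
  - read_file
  - mcp_github_*
max_turns: 12
---
You are a focused reviewer. Report findings; do not edit files.
"""

definition = parse_definition(EXAMPLE)
```

Because the whole role lives in one text file, it can be reviewed and versioned like any other code in the repository.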

Delegation happens automatically (the main agent routes based on the subagent’s description) or explicitly (via @agent_name syntax). Subagents cannot call other subagents, which prevents recursion. Remote subagents communicate through the Agent-to-Agent protocol, meaning a specialist can run on another machine or in another environment.
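The two delegation paths and the recursion rule can be sketched as a routing function. In Gemini CLI the main model decides automatic routing from each description; the word-overlap score below is only a crude, deterministic stand-in for that judgment, and route and its parameters are hypothetical.

```python
import re
from typing import Optional


def route(request: str, agents: dict[str, str], caller: Optional[str] = None) -> Optional[str]:
    """Pick a subagent for a request. `agents` maps name -> description (illustrative)."""
    # Recursion guard: a subagent may not delegate to another subagent.
    if caller is not None and caller in agents:
        raise PermissionError(f"{caller} cannot call other subagents")
    if not agents:
        return None
    # Explicit delegation: an @agent_name mention of a known agent wins.
    for mention in re.findall(r"@([\w-]+)", request):
        if mention in agents:
            return mention
    # Automatic delegation: crude word overlap with each description (stand-in for the model).
    words = set(re.findall(r"\w+", request.lower()))
    scores = {name: len(words & set(re.findall(r"\w+", desc.lower())))
              for name, desc in agents.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None: the main agent handles it itself


agents = {
    "codebase_investigator": "Investigates the codebase and maps dependencies",
    "test_reviewer": "Reviews test coverage and reports gaps",
}
```

The guard reflects the documented rule that subagents cannot call other subagents, which keeps delegation one level deep.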

Parallel execution is supported. Google explicitly warns that parallel subagents performing heavy code edits “can lead to conflicts and agents overwriting one another” and that parallel execution “will lead to usage limits being hit faster.” The GitHub issue tracker for the feature states that v1 “does not solve more complex concerns like agents having conflicts.”
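Since v1 leaves conflicts unsolved, one mitigation a team could apply before fanning work out is to batch tasks so that no two parallel tasks write the same file. The sketch below is a greedy first-fit grouping; plan_batches is hypothetical, and it assumes each task's write set is known up front, which real edits often do not guarantee.

```python
def plan_batches(tasks: list[tuple[str, set[str]]]) -> list[list[str]]:
    """Group (agent_name, paths_written) tasks into batches with disjoint write sets.

    Tasks inside one batch could run in parallel; batches run one after another.
    """
    batches: list[list[tuple[str, set[str]]]] = []
    for name, writes in tasks:
        for batch in batches:
            # First batch where this task touches none of the same files.
            if all(writes.isdisjoint(other) for _, other in batch):
                batch.append((name, writes))
                break
        else:
            batches.append([(name, writes)])  # conflicts everywhere: start a new batch
    return [[name for name, _ in batch] for batch in batches]


tasks = [
    ("a", {"src/auth.py"}),
    ("b", {"src/db.py"}),
    ("c", {"src/auth.py", "tests/test_auth.py"}),
    ("d", {"docs/readme.md"}),
]
plan = plan_batches(tasks)
```

Here task "c" is pushed to a second batch because it edits the same file as "a"; the other three can share the first batch.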

On the security side, Gemini CLI v0.36.0 introduced native macOS Seatbelt and Windows sandboxing for subagent security. Six built-in Seatbelt profiles control write access, network access, and read scope at different restriction levels. Different subagents within the same session can operate under different security profiles. JIT context injection delivers context dynamically at invocation rather than carrying it as static state.

Why This Matters Now

The significance is not that multi-agent patterns exist as a concept. What changed is that Google moved the pattern into a shipping product with explicit configuration, scoped tools, isolated context, parallel execution, and documented operational warnings.

That changes the practical unit of deployment. A team adopting subagents is no longer tuning one assistant. It is defining a topology of roles, permissions, and execution paths. Which tasks deserve a separate agent? What tool access should each have? When is parallelism worth the coordination overhead? What should the orchestrator retain versus summarize? These are design decisions, and they now have a concrete configuration surface.

Google’s Bake-Off guidance frames the motivation directly: prompting a single large agent to handle everything at once is “a fast track to hallucinations and latency spikes.” Decomposition into specialists with deterministic execution where needed is the engineering response. The subagents feature is the product implementation of that argument.

What This Changes For Operations

Agent topology becomes something teams must actively govern.

Permissions are no longer global. Each subagent can have its own tool access, MCP connections, and security profile. That is a real improvement over a single agent with access to everything, but only if isolation is explicitly configured. Omitting the tools field from a subagent definition causes it to inherit the parent’s full tool set. The secure path requires deliberate configuration.
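Because inheritance is the default, a simple pre-merge check can catch definitions that silently receive everything. This is a hypothetical lint over already-parsed frontmatter: audit_definitions is not part of Gemini CLI, and which patterns count as too broad is a team policy choice, shown here with the mcp_* wildcard as the only example.

```python
def audit_definitions(definitions: dict[str, dict],
                      broad_patterns: tuple[str, ...] = ("mcp_*",)) -> list[tuple[str, str]]:
    """Flag subagent definitions whose tool access is broader than it looks."""
    findings = []
    for name, fields in sorted(definitions.items()):
        tools = fields.get("tools")
        if tools is None:
            # Omitted field: the subagent inherits the parent's full tool set.
            findings.append((name, "no tools field: inherits the parent's full tool set"))
            continue
        for pattern in tools:
            if pattern in broad_patterns:
                findings.append((name, f"broad wildcard grant: {pattern}"))
    return findings


definitions = {
    "reviewer": {"tools": ["read_file"]},
    "editor": {},
    "integrator": {"tools": ["read_file", "mcp_*"]},
}
findings = audit_definitions(definitions)
```

Run in CI, a check like this turns "the secure path requires deliberate configuration" into a failing build rather than a convention.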

Cost visibility is partially addressed. Gemini CLI’s /stats command now distinguishes requests by role (main agent, subagent, utility). Per-subagent bounds (maxTurns, maxExecutionTime) provide individual limits. But there is no aggregate cost ceiling across all subagents in a session. Parallel execution multiplies token consumption without a documented mechanism to cap total spend.
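Teams that drive agents through their own tooling could enforce the missing ceiling themselves. The sketch below shows a shared, thread-safe token budget consulted before each call by every parallel worker. SessionBudget and run_subagent are hypothetical; reserving a fixed token count per call is an assumption, since real usage is typically known only after the response arrives.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SessionBudget:
    """Hypothetical aggregate token ceiling shared by all subagents in one session."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0
        self._lock = threading.Lock()

    def charge(self, tokens: int) -> bool:
        # Reserve atomically so parallel workers cannot overshoot the ceiling together.
        with self._lock:
            if self.used + tokens > self.max_tokens:
                return False
            self.used += tokens
            return True


def run_subagent(budget: SessionBudget, name: str, calls: int, tokens_per_call: int):
    completed = 0
    for _ in range(calls):
        if not budget.charge(tokens_per_call):
            break  # session budget exhausted: stop instead of spending more
        completed += 1  # a real wrapper would issue the model request here
    return name, completed


budget = SessionBudget(max_tokens=10_000)
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(run_subagent, budget, f"agent{i}", 10, 500) for i in range(3)]
    results = [f.result() for f in futures]
```

Three workers ask for 15,000 tokens in total, but only 20 calls of 500 tokens fit under the 10,000-token ceiling, however the threads interleave.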

Observability is the most notable gap. The orchestrator receives summaries, not traces. The full execution history of a subagent’s work lives inside that subagent’s context loop, not in the main session. Gemini CLI does not ship a dedicated observability framework for subagent execution chains. For teams running multiple subagents in parallel, understanding what happened across the full delegation requires inspection that the product does not yet surface natively.
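One way to close part of this gap is a side channel: each delegation gets a run ID, subagent steps are written to an append-only log under that ID, and the summary returned to the orchestrator carries the same ID so the two can be joined later. The TraceLog class below is a hypothetical sketch of that pattern, not a Gemini CLI feature, and it assumes a team controls the code that records each step.

```python
import json
import threading
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEvent:
    run_id: str
    agent: str
    kind: str    # e.g. "tool_call" or "result"
    detail: str
    ts: float = field(default_factory=time.time)


class TraceLog:
    """Hypothetical store for the subagent steps the orchestrator never sees."""

    def __init__(self):
        self.events: list[TraceEvent] = []
        self._lock = threading.Lock()  # parallel subagents may record concurrently

    def start_run(self) -> str:
        return uuid.uuid4().hex

    def record(self, run_id: str, agent: str, kind: str, detail: str) -> None:
        with self._lock:
            self.events.append(TraceEvent(run_id, agent, kind, detail))

    def for_run(self, run_id: str) -> list[TraceEvent]:
        return [e for e in self.events if e.run_id == run_id]

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(e)) for e in self.events)


log = TraceLog()
r1 = log.start_run()
log.record(r1, "investigator", "tool_call", "read_file src/auth.py")
log.record(r1, "investigator", "result", "summary: auth depends on db")
r2 = log.start_run()
log.record(r2, "reviewer", "tool_call", "read_file tests/test_auth.py")
```

JSON Lines output keeps the log easy to ship to whatever trace store a platform team already runs.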

There is also no built-in structure for subagent deployment governance. Which subagents are authorized for which environments? What read/write permissions does each hold? Who approves changes to a subagent’s tool access? Gemini CLI’s documentation describes policy controls, execution ceilings, and recursion protection, but not a broader framework for approval or audit across subagent deployments.
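Until such a framework exists, the approval question can be partly answered in code review. The hypothetical function below classifies a change to a subagent's tools field so that escalations can be routed to a required approver. It compares patterns as strings, which is conservative: replacing mcp_* with a narrower server pattern is still flagged as an escalation.

```python
from typing import Optional


def tool_access_change(old_tools: Optional[list[str]], new_tools: Optional[list[str]]) -> str:
    """Classify a change to a `tools` field. None means the field is absent (full inheritance)."""
    if old_tools is None and new_tools is None:
        return "unchanged"
    if new_tools is None:
        return "escalation: now inherits the parent's full tool set"
    if old_tools is None:
        return "restriction: now limited to an explicit list"
    added = sorted(set(new_tools) - set(old_tools))
    removed = sorted(set(old_tools) - set(new_tools))
    if added:
        # Any new grant needs approval, even if other grants were removed.
        return "escalation: adds " + ", ".join(added)
    if removed:
        return "restriction: removes " + ", ".join(removed)
    return "unchanged"
```

A review bot could, for example, require a security owner's sign-off whenever the result starts with "escalation".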

What Remains Uncertain

The clearest documented risk is concurrency conflict. Google’s warning is specific: parallel subagents doing heavy code edits can overwrite one another, and v1 does not include conflict resolution. The benefit of parallelism depends on the type of work being delegated.

Some early adopters report a failure mode involving reasoning loops, where the orchestrator and a subagent disagree on task completion and the delegation repeats. Per-subagent bounds mitigate this partially, but it is worth monitoring as adoption scales.
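Per-subagent bounds cap the turns inside one delegation, but they do not stop the orchestrator from delegating the same task again. A guard keyed on the agent and a normalized task string is one possible check, sketched below. DelegationGuard is hypothetical, and exact-string matching will miss retries that the orchestrator rephrases.

```python
from collections import Counter


class DelegationGuard:
    """Hypothetical cap on how often the same task is re-delegated to the same subagent."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.counts: Counter = Counter()

    def allow(self, agent: str, task: str) -> bool:
        # Normalize case and whitespace so trivial variations count as the same task.
        key = (agent, " ".join(task.lower().split()))
        self.counts[key] += 1
        return self.counts[key] <= self.max_repeats


guard = DelegationGuard(max_repeats=2)
first = guard.allow("reviewer", "Check tests")
second = guard.allow("reviewer", "check  tests")
third = guard.allow("reviewer", "CHECK tests")
```

When the guard refuses, the orchestrator can surface the disagreement to the user instead of looping.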

The broader question is whether the shift from single-agent to multi-agent changes what kind of team manages the system. A single agent is a tool that a developer configures. A team of subagents with parallel execution, scoped permissions, remote delegation, and per-agent security profiles is closer to a distributed system. The skills required to govern it (cost modeling, conflict analysis, permission management, trace inspection) overlap more with platform engineering than with prompt engineering.

Google is shipping the mechanism. How organizations wrap governance around it is a separate problem, and one the product does not yet solve.
