The primitives

21 min read

What is a coding-agent primitive?¶

Open the source code or documentation of most production-grade coding agents - Codex CLI in Rust, opencode in TypeScript, the public-source parts of Claude Code, the agents shipped by half a dozen smaller vendors - and you see the same architecture emerging: a small set of primitives wrapped by a harness. The implementations differ. The anatomy converges. Different names sometimes, different file layouts always, but the same conceptual building blocks. Most are local capabilities of the agent. One - subagents - is the composition mechanism that makes the agent recursive: it can spawn constrained instances of itself.

Context window. Tools. Permissions / Sandbox. Skills. Plugins. MCP. Memory. Subagents.

Subagents are recent in the public vocabulary, not because the idea is new but because they went universal across the major agents in a tight window. Claude Code shipped the Task tool, then layered Agent Teams on top of it for higher-level coordination. As of early 2026, Codex CLI exposed subagents as a first-class workflow and allowed multiple subagents to run in parallel. Cursor 2.0 introduced its own subagent system. Cline shipped subagents natively. Within roughly a year, dispatching a constrained child instance of the agent went from "advanced workflow" to "a primitive the harness exposes by default." That is the test I use for primitive status, and subagents pass it.

That is the anatomy. Every interesting question about a coding agent - what it can do, what it cannot do, how to control it, what to compare it to - reduces to one or more of these primitives. When a new agent arrives, your first question is: how does this one handle each primitive? When you are deciding whether to let an agent touch a particular codebase, your second question is: which primitive is the relevant control point for this risk? When you are buying tooling, your third question is: which primitive does this tooling improve, and at what cost?

THE HARNESS

◉

context window

⚙

tools

◫

permissions / sandbox

decision layer OS enforcement

✦

skills

▣

plugins

↔

MCP

▤

memory

manually defined auto-memory system

⟲

subagents

the agent, recursively

the agent loop binds them together;
subagents spawn constrained child instances of the agent itself

Figure: The primitives and the harness that runs them. Permissions / Sandbox sits in slot 3 as a primitive whose two halves - the agent-level decision layer and OS-level enforcement - converge on presence but diverge on posture across vendors. Memory is the other primitive whose second half is still mid-convergence. Subagents sit below the line because they are the recursive primitive: each subagent is itself an instance of the others.

The context window is what the agent knows right now. It is bounded - every model has a maximum number of tokens it can hold in active attention. Two hundred thousand on a mid-range model. One million on a top-tier model. Those numbers are growing every quarter; by the time you read this they will be larger. But the bound exists, and the bound matters, because the context window is the workspace inside which the agent makes decisions.

What goes in the context window? The system prompt that defines the agent's role and constraints. The current conversation history with the user. The tool calls the agent has made and the results those calls returned. Any files the agent has read or chunks of files it has loaded. Any instructions injected by the harness (we will get to the harness in a moment). That is roughly what fills the window.

What does not go in the context window by default? The rest of your codebase. The git history. The Jira tickets. The Confluence wiki. The Slack channel where the team discusses architecture. All of that is potentially relevant, all of that lives somewhere, none of that is automatically in the agent's awareness. The agent has to ask for it - through tools, through plugins, through MCP. Which means the agent has to know it exists, or be told, or be configured to look.

This is the first thing that surprises teams new to agentic coding. The agent is brilliant at the things it can see, and oblivious to everything else. Most "the agent made an obviously wrong decision" failures trace back to "the agent did not have the context required to make a correct decision." The agent did not know about the new authentication library because nobody told it. It defaulted to the wrong test framework because nobody put the team's preference in the configuration. The agent made the best decision it could with the context it had, and that decision was wrong because the context was incomplete.

Context window management is therefore one of the central engineering disciplines of agentic coding. You are constantly making decisions about what to load, what to summarize, what to drop, what to ask for at the right moment. Bigger context windows help - a million tokens of context is genuinely more forgiving than two hundred thousand - but bigger windows do not eliminate the constraint. They raise the ceiling. The craft of working inside that bound - what to load, what to reference, when to start clean - is taught in Chapter 5 as context hygiene.

Tools are the actions the agent can take. Reading a file. Writing a file. Editing a file in place. Running a shell command. Searching for text across the codebase. Listing the contents of a directory.

Most production-grade coding agents converge on roughly the same core tool set. Read, Write, Edit, Bash, Glob, Grep. Sometimes a few more - running a Python snippet, fetching a URL, parsing a structured document. These are the verbs. Without them, the agent could think but could not act.

Tools are conceptually simple and operationally important. Each tool call is a decision point. Each tool call is also an audit point - production-grade agents should record the tool calls they made, in order, with arguments, so you can replay and inspect the agent's behavior after the fact. If you have ever had to debug a multi-step agent action that went wrong, you will appreciate the audit trail. The tool call log is the equivalent of the SQL query log in a database problem - without it, you are guessing.

Tool calls are also where governance lives. We will spend an entire chapter on this, but the preview: when you tell an agent "do not delete files in this directory," what you are doing is intercepting the Bash tool call before it executes the rm command. The tool call is the bottleneck. Constrain the tool calls and you constrain the agent. Let the tool calls run unconstrained and you have an agent that can do anything.

Permissions / Sandbox¶

Permissions / Sandbox is the primitive that PocketOS lacked. It has two halves, and they are not the same control written two ways - they are two different controls that the convergent agents ship together because each one catches what the other misses.

The agent-level decision layer is what the agent itself consults before every tool call. Allow / Ask / Deny rules, plus the newer auto-mode classifier that handles routine decisions silently and surfaces the rest for the operator. Every major coding agent ships this: Claude Code, Codex CLI, opencode, Cursor, Gemini CLI. Rule syntax differs by vendor; the architectural role does not. This is the layer most teams reach for first. This is also the layer prompt injection can defeat, because prompt injection works by manipulating the agent's reasoning, and the agent's reasoning is what consults the rules.

The OS-level enforcement runs underneath. The kernel itself refuses syscalls the agent was not authorized to make: Seatbelt on macOS, bubblewrap with Landlock and seccomp on Linux, restricted tokens or WSL2-backed isolation on Windows. The agent cannot reason its way past this layer because the kernel is not listening to the agent's reasoning - it is listening to system calls. Either the syscall is permitted or it is not.

Convergence on this half is real but uneven. Codex CLI enforces OS-level sandbox by default on Linux and macOS. Cursor added kernel-backed sandbox controls in its 2.x line. Gemini CLI ships sandbox profiles per platform. Claude Code is opt-in - the sandbox is available, but most installations skip it. opencode is the partial exception: it ships only the decision half, leaving OS isolation to whatever Docker or microVM the operator configures around it. The convergence is on presence of both halves as a configurable bundle, not on posture - exactly the asymmetry the Memory primitive has on its second half. Treat that as the honest reading.

Said plainly: the agent-level layer is bypassable by prompt injection. The OS-level layer is not. The two ship together because neither is sufficient alone. The convergent pairing is the primitive.

Chapter 3 walks the configuration surfaces of this primitive - how each major agent exposes its allow/ask/deny rules, where the OS sandbox is opted-into or opted-out-of, and how the chapter's five governance layers map onto the primitive's two halves.

Skills are packaged instructions that the agent loads when relevant. The team's preferred way of writing a Spring Boot service. The conventions for React component testing. The pattern for adding a new column to a multi-tenant database table. Each of these is a chunk of markdown - usually a few hundred words to a few thousand - that the agent reads at the moment it needs the relevant expertise.

The implementation of skills varies between agents in file names and loading semantics, but the underlying primitive is now shared across the major agents. The always-loaded primitive has converged on two filenames: the vendor-neutral AGENTS.md, supported by Codex CLI, Cursor, GitHub Copilot, Gemini CLI, Aider, and the wider ecosystem; and CLAUDE.md, which Claude Code reads natively. The two are interoperable - Claude Code can import AGENTS.md into CLAUDE.md so the team's content lives in one place across vendors. Both load at session start, both serve the same role. The on-demand primitive has converged too: individual markdown files, dispatched on detection, kept out of context until a task matches the skill's trigger (Claude Code calls them Skills; Codex CLI ships SKILL.md files with YAML frontmatter and progressive disclosure). The Spring Boot code review skill loads when reviewing Spring code; it does not pollute the context when the agent picks up a schema migration task. The always-loaded pattern (AGENTS.md, CLAUDE.md) is older. The dispatch-on-detection pattern (Claude Code Skills, Codex Skills) is newer and scales better as the team's catalog of skills grows.

Both patterns work. The dispatch-on-detection pattern is more efficient at scale - you can have fifty skills for fifty different kinds of work without filling the context window with forty-nine irrelevant ones at any given moment. The always-loaded pattern is simpler and more predictable. Choose based on the kinds of tasks your team runs and the kinds of context-overflow problems you hit.

Skills are how you encode team-specific expertise into a form the agent can use. They are durable - committed to git, reviewed in pull request, signed by the author. They are the closest thing in the agent's anatomy to "the senior engineer's tribal knowledge, but written down."

Plugins are bundles. A plugin packages skills together with tools together with hooks together with commands, all installable through a single command. hookify is a plugin. Security-guidance is a plugin. Superpowers is a plugin. The plugin is the unit of distribution.

Why plugins matter operationally: they make it possible to share expertise across teams without each team rebuilding the same scaffolding. If one team builds a great PR review workflow as a plugin, another team can install it with a single command. The plugin handles its own version management, its own dependencies, its own activation. The receiving team does not have to integrate it manually.

Why plugins matter strategically: they create a marketplace. As of May 2026, the plugin marketplace had become a real distribution channel, with official and community plugins beginning to form an ecosystem. The exact count matters less than the shift: plugins are now a supply-chain surface. The marketplace is the equivalent of npm for agentic coding - and like npm, it brings both the upside of rapid reuse and the downside of supply chain risk. You need to vet plugins the way you vet dependencies. Here is the checklist I use.

Maintainer. Who owns the plugin. Is the repo active in the last ninety days. Is there more than one maintainer or is it a bus factor of one. Is the author identifiable - a real GitHub history, a real employer, a track record - or a freshly minted account with one repo. A plugin maintained by Anthropic, by a known vendor, or by an engineer whose other work you can verify is in a different risk class from a plugin uploaded last week by a name nobody recognizes.

Permissions surface. What tools does the plugin install. What hooks does it register. What file paths does it touch. What network calls does it make. Treat broad permission requests the same way you would treat an npm package that quietly asks for filesystem and network access in its install script: as a red flag that needs a reason. A PR-review plugin does not need write access outside .git/. A documentation plugin does not need network calls to a third-party host. If the permissions exceed the obvious scope of the plugin, ask why before installing.

Code review. Read the source on first install. The plugins worth installing are small enough to read in twenty minutes; Superpowers and hookify both are. The plugins that are too large to read are usually doing too much. If you cannot understand what the plugin does from its source in a reasonable sitting, that is a signal to either find a smaller one or invest a deeper review before adopting it.

Update discipline. Pin the version. Do not auto-update. Treat a plugin update the same way you treat a dependency update - read the diff, run the test suite, ship the bump as a reviewed change, not a silent one.

Blast radius. A plugin that operates inside the agent's sandbox is bounded by the sandbox. A plugin that registers a hook running on the developer's machine outside the sandbox is unbounded. Prefer the bounded kind. When you must install the unbounded kind, raise the bar on everything above.

The marketplace is a real distribution channel. The discipline is the discipline you already use for dependencies. The cost of vetting once per plugin is small compared to the cost of one supply-chain incident.

What is MCP?¶

MCP stands for Model Context Protocol. It is a specification for how agents talk to external systems. Your Jira. Your Confluence. Your Postgres. Your GitHub. Your internal data warehouse. Anything that lives outside the agent's local environment but that the agent needs to query or update.

Before MCP, every agent had a custom integration story. Claude integrated with Jira one way; some other agent integrated with Jira a different way; if you switched agents, you redid all the integrations. MCP changed that. The same MCP server that talks to your Jira works with any agent that supports MCP - and most serious coding agents now support MCP.

This matters for enterprise procurement. An MCP integration is portable. The investment you make in setting up an MCP server for your internal systems is not lost when the agent landscape shifts. The next agent your team adopts will be able to use the same MCP servers you already configured. MCP is the closest thing the agentic AI ecosystem has to "open standard that survives vendor changes."

Memory¶

Memory is the most recent primitive to go universal. Eighteen months ago it was implicit: the agent loaded a prompt, did some work, and the next session started clean. Today Memory has two halves - one fully converged across the major agents, one led by Claude Code with the others on the path.

Manually defined memory is the layer the team writes. The convergence is real: Codex CLI, Cursor, GitHub Copilot, Gemini CLI, Aider, and the wider ecosystem all read AGENTS.md from the repository root at session start. Claude Code reads CLAUDE.md, which can import AGENTS.md to share the same content with other agents. The file is committed to source control, reviewed in pull requests, owned by the team. It is the place forbidden patterns, mistake-journal entries, build commands, and domain glossaries live. Chapter 6 covers what goes in this file in detail and why it matters.

The auto-memory system is what the agent writes for itself. Claude Code is the early-mover; other agents are converging on similar mechanisms but had not shipped equivalents at publication. It has two visible surfaces: Auto Memory is the layer where Claude saves learned patterns across sessions - build commands it figured out, debugging insights it confirmed, code-style preferences it inferred - without the user explicitly writing them down. Auto Dream is the background-consolidation layer Anthropic unveiled at Code with Claude SF on 2026-05-06: a scheduled process that reviews recent sessions and the memory store, identifies recurring mistakes and convergent workflows, and writes consolidated notes back into long-term memory. The agent gets better at your codebase between runs.

A note on what is not memory in this taxonomy: session memory (the conversation history plus tool results inside a single session) is just the context window. It is memory in the everyday sense but not a separate primitive - it is the primitive named first.

Manually defined memory passes the convergence test today. The auto-memory system is on the path - Claude Code is first; others are following. This manual treats them as one primitive because the structural role is identical, with the caveat that the second half is an early-mover signal, not yet a convergence.

Subagents are constrained child instances of the agent itself.

The orchestrator agent spawns a subagent, hands it a bounded task with a scoped prompt, and lets it run in its own isolated context with its own scoped tool access. The subagent does the work. The subagent returns a result. The orchestrator collects.

What makes subagents structurally distinct from the other primitives is that they are recursive. A subagent is another instance of the primitives - it has its own context window, its own tools, its own permissions / sandbox, its own skills, plugins, MCP, memory - bounded to a smaller task and isolated from the orchestrator's context. The orchestrator does not see what the subagent saw. It sees only what the subagent returns. The subagent does not pollute the orchestrator's context with intermediate work. The orchestrator does not pollute the subagent's context with unrelated history.

The tight-window convergence that opened this chapter - every major agent shipping subagents within roughly a year - is not an accident. Subagents solve two problems no other primitive solves: parallel work bounded by independence rather than by coordination, and context isolation bounded by task scope rather than by session history.

The primary uses in serious work: parallel execution of a multi-task plan (the Execute phase of the loop in Chapter 5), structured review (dispatch one subagent to check spec compliance, another to check code quality), and architecture analysis at scale (one subagent per file or per module, returning structured summaries the orchestrator assembles - the workflow in Chapter 7).

The primary cost: tokens. Each subagent runs its own model and tool work, so an eight-subagent dispatch consumes roughly eight times the tokens of a single agent run. Use them where parallelism or isolation matters. Do not use them as a default for work a single agent could handle linearly.

As of mid-2026 this composition is being productized. Claude Code's /workflows has the agent write a short script that dispatches subagents deterministically - phases run in order, work fans out in parallel or streams through a pipeline, each handoff returns a schema-validated structure instead of free text, and a verification step can gate the result before it returns. A single workflow can fan out across hundreds of children, holds their intermediate work in script variables rather than the orchestrator's context window, and is resumable and budget-bounded. This is not a new primitive. It is harness convenience over the subagent primitive you already have: the determinism - control flow in code, handoffs validated by schema - is the only real difference from the model-driven dispatch a plugin like Superpowers already offers, where the orchestrator decides at runtime how many children to spawn and reads their prose back. Reach for the scripted version when the fan-out is large, repeatable, or worth verifying mechanically; reach for model-driven dispatch when you do not know the shape of the work until the agent is in it. What gets composed is the same either way.

We will return to subagents in Chapter 5 (Execute) and Chapter 7 (architecture review at scale). The point in this chapter is that they are not a technique laid on top of the architecture. They are the architecture's composition primitive.

One more piece organizes them all. The vendors call it the harness. The harness is the runtime around the model - the part that turns the raw model into something useful for coding.

When you ask an agent to do work, the harness is the code that takes your request, formats it for the model, manages the context window, dispatches the tool calls the model wants to make, captures the results, feeds them back to the model, decides when the model is done, and returns the final output to you. The primitives all live inside the harness. The harness is the architecture; the primitives are the components.

Why this distinction matters: when you compare agents, the temptation is to compare models. "Is Claude Code better than Codex or Cursor for the workflow my team runs?" That is the right question. "Which model has the higher benchmark score this quarter?" is the wrong one. The model determines the ceiling. The harness determines whether you reach it. Compare harnesses, not models.

Said plainly: the harness is the trim around the agent loop. The agent loop is trivial. It is roughly the shape of an HTTP request handler in a web framework - receive prompt, run model, dispatch tools, return response, repeat. The middleware around that loop is where the real work lives. The middleware is the harness, is the primitives, is what you are buying when you adopt an agent.

A note on vocabulary. The primitives named here are what the agent uses to know, act, gate, extend, integrate, remember, and delegate. The test for primitiveness is convergence: a mechanism is a primitive when every major coding agent ships it as a distinct, configurable bundle, even when the implementations differ substantively. Permissions / Sandbox passes that test on the decision-layer half across all the major agents; the OS-enforcement half is presence-converged but posture-divergent, with the vendor postures catalogued in its section above. The Memory primitive has the same shape on its second half. Telemetry has not yet crossed the convergence line and remains a control layer around the primitives. When the next mechanism converges - observability event-push is the candidate to watch - the list will grow again. This chapter is the first convergence catalogue; Chapter 3 is the second.

Context window. Tools. Permissions / Sandbox. Skills. Plugins. MCP. Memory. Subagents. Plus the harness as the runtime that organizes them. That is the list today. The set is open; expect it to grow. The next primitive will join the way Memory just did: when the convergence appears, not before.

When the next coding agent appears in the marketplace next quarter, the evaluation rubric is right there. How big is the context window and how does the agent manage it under pressure? What tools are available and how are they constrained? What permission model does it ship - allow/ask/deny rules, auto-mode classifier - and what OS sandbox does it default to? How are skills implemented - always-loaded, or dispatched on detection? Is there a plugin marketplace and is it growing? Does it speak MCP, and how good is the MCP integration? Does it read a team-shared memory file at session start? Does it maintain any agent-written learned memory across sessions? How does it expose subagents - and is parallel dispatch a first-class operation or an afterthought?

Nine questions today, across eight primitives - Memory earns two; more tomorrow. They tell you almost everything you need to know to compare the new agent to the one you are using today.

Next chapter: what happens when you point one agent at the source of another. The anatomy I just described becomes very real, very fast.

Key	Action
`?`	Show this help
`Esc`	Close overlays and menus
`⌘ K` or `Ctrl K`	Open search
`/`	Open search (secondary)
`←` `→`	Previous / next chapter
`g` `g`	Jump to top
`G`	Jump to bottom
`T`	Toggle theme
`-` `+`	Decrease / increase font size