AGENTS.md as team infrastructure
Names and conventions. The vendor-neutral standard for the team instruction file is AGENTS.md, with native support across Codex CLI, Cursor, GitHub Copilot, Gemini CLI, Aider, Zed, Windsurf, and others. The Claude Code-specific variant is CLAUDE.md. The format is markdown either way; the loading semantics are equivalent. If you came to this chapter from the Claude Code ecosystem, read AGENTS.md as "the file your agent reads at session start" - the discipline this chapter teaches is identical regardless of the filename. Where this manual discusses Claude Code-specific behavior, I use CLAUDE.md; otherwise I use the vendor-neutral name.
AGENTS.md is the manually defined layer of the Memory primitive named in Chapter 1. It is the team-shareable surface: the layer the team authors, reviews, and owns in source control. The auto-memory system (Auto Memory, Auto Dream) is per-developer and largely automatic. This chapter focuses on the manually owned layer, because that is where team-level discipline lives. What follows covers the six things that go in AGENTS.md, the 200-line budget rule, and the failure modes you see in practice.
A team I worked with last year had been using a coding agent for six months when a senior engineer brought me a complaint. "Every time we generate an API endpoint, I have to fix the validation. Always the same fix. The agent puts the validation in the wrong layer."
I asked the obvious question. "Why isn't the rule written down?"
Long pause. "We never wrote it down. We just keep fixing it."
So I walked through the count. In the past month, the agent had generated twelve new endpoints. The same senior engineer had moved validation from the controller layer to the service layer in eleven of those twelve. He had spent maybe fifteen minutes per fix, including the PR review back-and-forth. That is close to three hours of his month, every month, on the same correction. The team had reabsorbed the cost into "normal review work" and stopped noticing.
We wrote one line in AGENTS.md. "Bean Validation annotations on DTOs at the controller boundary; service-layer methods trust their inputs." We added it to the team's pull request template as a reviewer prompt. We tagged the existing endpoints that were already correct as reference examples.
The next twelve endpoints: zero validation-layer corrections. Roughly three hours per month of senior-engineer time, eliminated by one line of configuration. The rule did not make the agent perfect; it eliminated that specific repeated correction.
AGENTS.md is what turns "the agent keeps making this mistake" into "the agent stopped making this mistake." It is team infrastructure in the same way build scripts are team infrastructure - committed code that the team maintains because the team relies on it.
There is a failure mode in AGENTS.md that is the inverse of the success story above. A banking team I worked with in early 2026 had an AGENTS.md that had grown over six months to roughly nine hundred lines. Every PR that exposed a new edge case spawned a new rule. Every team retro added a new convention. The file was comprehensive, well-organized, and rigorously maintained. It also stopped working. The agent started ignoring specific rules the team cared about most, including the very rules added in the most recent sprint. The cause is that instruction-following quality degrades across the board as instruction count grows, and nothing in a nine-hundred-line file tells the agent which rules matter most. The team's most important rules were drowning in their own thoroughness.
The fix was unglamorous. We cut AGENTS.md from nine hundred lines to ninety. We moved the project-specific conventions that did not need to be loaded on every session into skills that dispatched on detection. We moved the documentation-of-past-mistakes into the architecture document, where it lived as prose rather than as instructions. The compliance ratio on the rules that remained jumped within a week. The pattern I now teach is this. If your AGENTS.md has grown past two hundred lines, the file is not getting more useful; it is getting more ignored. Cut it, move material into skills, and treat the surviving rules as the load-bearing ones.
AGENTS.md is a markdown file that lives in the root of your repository. The coding agent reads it at session start, before any user prompt. It is the document that turns "what the agent thinks is reasonable" into "what your team has agreed is reasonable." Without it, every developer's agent session has a different opinion about how to write code for your codebase. With it, the opinion is the team's - committed in git, signed by the author, reviewable in pull request.
AGENTS.md is the most important new piece of infrastructure for sustained agentic delivery. A team without an AGENTS.md is doing agentic coding the way an early-stage startup does deployments - by hand, by tribal knowledge, by the institutional memory of one senior engineer who happens to be in the room. A team with a well-maintained AGENTS.md is doing agentic coding the way mature engineering organizations do deployments - automated, repeatable, owned by the team, surviving turnover.
Six things go in AGENTS.md. I will walk through each one with concrete examples from banking codebases, since that is where I do most of my work.
One: forbidden patterns. Things the agent must never do. Each is one line. Each has a one-line reason.
Never construct SQL by string concatenation. Always use bound parameters. (Reason: SQL injection on customer queries is a regulator-visible incident.)
Never log PII fields, including the obvious ones (account number, SSN) and the less-obvious composites (full name plus DOB). (Reason: GDPR Article 5(1)(c) data minimization.)
Never roll your own cryptography. Use the team's approved crypto wrapper at com.bank.crypto.SecureCrypto. (Reason: AES-CBC with hardcoded IV shipped to production in 2024; we are not doing that again.)
Never modify the database migration history. Migrations are append-only. (Reason: rollback of a modified migration corrupts the schema in unrecoverable ways.)
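Here is a minimal JDBC sketch of what the first forbidden pattern asks for instead. The `accounts` table, its `owner_id` and `account_id` columns, and the `AccountQueries` class are hypothetical. The point is that the SQL text is fixed and the customer-supplied value only ever reaches the database as a bound parameter.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical query class illustrating the bound-parameter rule.
final class AccountQueries {
    // The SQL text is a compile-time constant; user input never becomes part of it.
    static final String FIND_BY_OWNER =
            "SELECT account_id FROM accounts WHERE owner_id = ?";

    // Forbidden version, shown only as a comment:
    //   "SELECT account_id FROM accounts WHERE owner_id = '" + ownerId + "'"

    static List<String> findAccountIds(Connection conn, String ownerId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_OWNER)) {
            // Bind the first (and only) placeholder; the driver treats the value as data,
            // so quotes inside ownerId cannot change the shape of the query.
            ps.setString(1, ownerId);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> ids = new ArrayList<>();
                while (rs.next()) {
                    ids.add(rs.getString("account_id"));
                }
                return ids;
            }
        }
    }

    private AccountQueries() {}
}
```

The same discipline applies through Spring's `JdbcTemplate` (`?` placeholders plus arguments) or JPA named parameters. What the rule forbids is any path where input is spliced into the query string.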
Each forbidden pattern is a wall the agent should not cross. If the agent thinks a rule is wrong in a specific case, it should surface the disagreement in its response, and the team will either explain the exception or update the rule.
Two: mistake journal. A log of failures the team has actually seen, with the rule that prevents recurrence. Example:
2026-03-03: shipped a bug where the agent generated a JPQL query that bypassed the multi-tenant filter. Root cause: prompt did not mention multi-tenancy; agent assumed single-tenant. Fix: queries now extend the MultiTenantQueryBuilder base class, which enforces tenant filtering. Rule added.
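The journal entry names a `MultiTenantQueryBuilder` base class without showing it. One plausible shape, sketched here with hypothetical internals, is a base class that puts the tenant predicate into every query it emits. A subclass can add filters but cannot leave the tenant filter out. The JPQL text, the `Account` entity, and the `AccountQuery` subclass are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the base class named in the journal entry.
abstract class MultiTenantQueryBuilder {
    private final String tenantId;
    private final List<String> predicates = new ArrayList<>();
    private final Map<String, Object> params = new LinkedHashMap<>();

    protected MultiTenantQueryBuilder(String tenantId) {
        // No tenant, no query: construction fails before any JPQL exists.
        if (tenantId == null || tenantId.isBlank()) {
            throw new IllegalArgumentException("tenantId is required");
        }
        this.tenantId = tenantId;
    }

    // The JPQL entity name, supplied by each concrete query.
    protected abstract String entityName();

    // Subclasses add filters only through here; each one is ANDed with the tenant predicate.
    protected final void addFilter(String predicate, String paramName, Object value) {
        if ("tenantId".equals(paramName)) {
            throw new IllegalArgumentException("tenantId is reserved");
        }
        predicates.add(predicate);
        params.put(paramName, value);
    }

    final String jpql() {
        StringBuilder sb = new StringBuilder("SELECT e FROM ")
                .append(entityName())
                .append(" e WHERE e.tenantId = :tenantId");
        // Parenthesize each filter so an OR inside it cannot escape the tenant condition.
        for (String p : predicates) {
            sb.append(" AND (").append(p).append(')');
        }
        return sb.toString();
    }

    final Map<String, Object> parameters() {
        Map<String, Object> all = new LinkedHashMap<>();
        all.put("tenantId", tenantId);
        all.putAll(params);
        return all;
    }
}

final class AccountQuery extends MultiTenantQueryBuilder {
    AccountQuery(String tenantId) {
        super(tenantId);
    }

    @Override
    protected String entityName() {
        return "Account";
    }

    AccountQuery withStatus(String status) {
        addFilter("e.status = :status", "status", status);
        return this;
    }
}
```

To run a query, a caller would pass `jpql()` to `EntityManager.createQuery` and bind each entry of `parameters()` with `setParameter`. That wiring is omitted here.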
The mistake journal grows over time. It also gets pruned - entries that have been structurally resolved (the underlying issue is no longer possible) are removed. The journal is documentation that earns its keep through prevention, not through volume. Every entry should be a rule that has actually prevented a recurrence at least once.
Three: Spring Boot conventions specific to the team. (Or React, or whatever your stack is. Spring Boot is my example.)
Constructor injection only, not field injection. (Easier to test.)
@Transactional only on service methods that mutate state, not on read methods. (Avoids read-only transactions holding unnecessary locks.)
Repositories extend BaseRepository<Entity> for multi-tenant filtering. (See mistake journal entry 2026-03-03.)
DTOs at controller boundary use Bean Validation annotations. Internal services trust their inputs.
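The stated reason for the first convention, testability, is easy to see in plain Java. In this sketch, `BalanceService` and `AccountRepository` are hypothetical names. The dependency arrives through the constructor and is stored in a final field, so a test can pass in a lambda fake without starting a Spring context. Spring can inject through a class's single constructor without an `@Autowired` annotation.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical repository port; in Spring this would typically be a repository bean.
interface AccountRepository {
    Optional<Long> balanceOf(String accountId);
}

final class BalanceService {
    // final and constructor-assigned: the dependency is visible in the signature
    // and can never be left unset, unlike a field-injected one.
    private final AccountRepository repository;

    BalanceService(AccountRepository repository) {
        this.repository = Objects.requireNonNull(repository, "repository");
    }

    long balanceOrZero(String accountId) {
        return repository.balanceOf(accountId).orElse(0L);
    }
}
```

With field injection, the same test would need reflection or a container to populate the field; here it is one constructor call.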
Each convention is one line. The agent reads them and applies them by default. New code matches the conventions. Old code that doesn't match is gradually brought into alignment as the agent touches it.
Four: build and test commands. The exact incantations the team uses.
Build: mvn clean verify
Run all tests: mvn test
Run tests for a single class: mvn test -Dtest=ClassName
Run security scan: mvn dependency-check:check
Run linting: mvn spotless:check
This sounds trivial. It is not. Without this section, the agent guesses commands. The guesses are usually close but occasionally wrong, which causes confusing failures. With this section, the agent uses the exact commands the team uses, no guessing.
Five: where to find things. The repository's structural conventions.
Services live in src/main/java/com/bank/service/
Repositories in src/main/java/com/bank/repository/
DTOs in src/main/java/com/bank/dto/
Tests parallel main, in src/test/java/com/bank/
Database migrations in src/main/resources/db/migration/ (Flyway)
Configuration in src/main/resources/application.yml
The agent reads this and knows where to put new files. Without this, the agent uses its best guess based on the existing structure, which is usually right but occasionally wrong in ways that violate team conventions.
Six: domain glossary. Terms specific to your business.
"Customer" refers to an end user of the bank, not a corporate client. Corporate clients are "Counterparties".
"Transfer" includes both intra-bank and inter-bank movements. "Wire" is specifically inter-bank.
"Holds" are short-term reservations of funds, distinct from "blocks", which are long-term legal restrictions.
The glossary disambiguates terms the agent might otherwise interpret in their general-purpose meaning. In a banking context, "transfer" means something specific. In the agent's pretraining, "transfer" means a lot of things. The glossary anchors the agent to your meaning.
The AGENTS.md should be under two hundred lines. This is a hard constraint, not a guideline.
Two hundred is the budget because AGENTS.md is loaded into the agent's context at every session start. Every line costs context that the agent could be using for the actual task. Two hundred lines fits comfortably without crowding out reasoning capacity. If your AGENTS.md is past two hundred lines, it is doing too much. The two failure modes:
Failure mode A: too many rules. Your team has accumulated rules over time and never deprecated the ones that no longer apply. Audit. Remove rules that have not been triggered in six months. Move rarely-applicable rules into skills that load on detection rather than always.
Failure mode B: too verbose. Each rule is a paragraph instead of a line. Tighten. The agent does not need three sentences of justification for each rule; it needs the rule. Justifications belong in comments in the AGENTS.md itself, or in linked documentation.
The two-hundred-line cap forces opinion. The opinion is the value.
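Because the cap is a hard constraint, the build can check it instead of relying on memory. The sketch below assumes a CI step that runs this class against the repository's AGENTS.md. The class name and message are hypothetical, and the check counts every line, blank ones included. Adjust it if your team counts differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical CI check for the AGENTS.md line budget.
final class AgentsMdBudget {
    static final int MAX_LINES = 200;

    // Returns an error message when the file is over budget, empty otherwise.
    static Optional<String> check(List<String> lines, int maxLines) {
        if (lines.size() <= maxLines) {
            return Optional.empty();
        }
        return Optional.of("AGENTS.md has " + lines.size() + " lines; the budget is "
                + maxLines + ". Cut rules or move them into skills.");
    }

    // Intended for CI: exits non-zero when the file is over budget.
    public static void main(String[] args) throws IOException {
        Path path = Path.of(args.length > 0 ? args[0] : "AGENTS.md");
        Optional<String> error = check(Files.readAllLines(path), MAX_LINES);
        if (error.isPresent()) {
            System.err.println(error.get());
            System.exit(1);
        }
        System.out.println("AGENTS.md within budget");
    }
}
```

A failing build turns the budget into a review conversation at the moment someone adds line 201, rather than months later.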
AGENTS.md is committed to git. Reviewed in pull request. Signed by the author. Changes to AGENTS.md go through the same review process as code changes, because AGENTS.md is code, in the sense that the agent executes against it.
When a developer adds a new rule, the pull request has at least one reviewer. The reviewer asks: "What failure does this prevent? When did we last see it?" If the answer is "we have not seen it, but I think we might," the rule does not land. The reviewer pushes back. Speculative rules accumulate into bloat; only failure-mode-driven rules earn their place.
I have seen teams maintain AGENTS.md files for over a year with sustained quality. The pattern is the same in every case: one champion owns it, the champion rotates quarterly, every change is reviewed, the under-two-hundred-lines limit is enforced, and the mistake journal grows then shrinks then grows then shrinks as the codebase matures.
This is engineering infrastructure. Treat it that way.
When the agent confidently lies.
A subsection because every practitioner deals with this weekly.
The agent sometimes references files that do not exist. A function signature that the library does not actually expose. A configuration option that was deprecated three versions ago. The output looks plausible. It is wrong.
This is called hallucination, which is the polite term. The blunter description is that the agent is making things up to be helpful. The agent does not know it is making things up. It is generating tokens that pattern-match against similar code it has seen, and the pattern happens not to match your specific reality.
Three tactics that reduce the cost.
One: force the agent to read before it cites. If the agent is about to reference a file, it should have read that file in this session, recently enough that the contents are still in its context. AGENTS.md can require this: "Before referencing any function, read the file that defines it in this session. Citations without preceding reads are treated as drafts to verify."
Two: cross-check references against grep. After the agent produces code that references some function, run grep on the codebase for the function name. If grep returns no hits, either the function does not exist or it lives somewhere outside the source tree, such as a library jar the agent cannot see. Either way, verify before trusting it. The cross-check is mechanical and catches the most common hallucinations in seconds.
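The grep check is usually a one-liner at the shell (for example `grep -rw computeImpact src/`). For teams that would rather keep it in the build, here is a Java sketch of the same idea. Given the identifiers an agent's code references, it reports the ones that appear nowhere in the project's Java sources as whole words. `ReferenceCheck` and its methods are hypothetical. Like grep, it cannot see inside binary dependencies, so a miss means "verify", not "definitely fabricated".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical whole-word reference check, the in-build equivalent of `grep -rw`.
final class ReferenceCheck {
    // Returns the identifiers with no whole-word match in any source file.
    static List<String> unresolved(Collection<String> identifiers, Map<String, String> sourcesByPath) {
        List<String> missing = new ArrayList<>();
        for (String id : identifiers) {
            Pattern word = Pattern.compile("\\b" + Pattern.quote(id) + "\\b");
            boolean found = sourcesByPath.values().stream()
                    .anyMatch(src -> word.matcher(src).find());
            if (!found) {
                missing.add(id);
            }
        }
        return missing;
    }

    // Loads every .java file under root, keyed by its path relative to root.
    static Map<String, String> loadJavaSources(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> files = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .collect(Collectors.toList());
            Map<String, String> sources = new TreeMap<>();
            for (Path file : files) {
                sources.put(root.relativize(file).toString(), Files.readString(file));
            }
            return sources;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private ReferenceCheck() {}
}
```

Typical use would be `ReferenceCheck.unresolved(cited, ReferenceCheck.loadJavaSources(Path.of("src")))`. Extracting the cited identifiers from the agent's output is left to whatever review tooling you already have.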
Three: structured citation formats with verification hooks. If the agent cites a file and line number ("see ImpactCalculator.java:142"), a hookify rule can verify that the line exists before the agent moves on. The hook is a few lines of bash. It catches every citation that points at a file or line that does not exist, though it cannot confirm that a real line says what the agent claims.
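The chapter describes this hook as a few lines of bash. Here is the same check sketched in Java, to keep one language across the chapter's examples. It pulls `File.java:123` citations out of the agent's text and reports any whose file is missing or shorter than the cited line. `CitationCheck` and its citation regex are hypothetical, and registering the check with your agent's hook system is not shown. The disk lookup assumes citations use repository-relative paths; a bare file name would need an extra search step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical citation verifier: flags file:line references that cannot exist.
final class CitationCheck {
    // Matches citations such as "src/main/java/Foo.java:142".
    private static final Pattern CITATION = Pattern.compile("([\\w./-]+\\.java):(\\d{1,9})");

    // Returns each citation whose file is unknown or has fewer lines than cited.
    static List<String> invalidCitations(String agentOutput, Function<String, OptionalInt> lineCountOf) {
        List<String> invalid = new ArrayList<>();
        Matcher m = CITATION.matcher(agentOutput);
        while (m.find()) {
            String file = m.group(1);
            int line = Integer.parseInt(m.group(2));
            OptionalInt count = lineCountOf.apply(file);
            if (count.isEmpty() || line < 1 || line > count.getAsInt()) {
                invalid.add(m.group());
            }
        }
        return invalid;
    }

    // Line counts read from disk, resolving cited paths against the repository root.
    static Function<String, OptionalInt> fromDisk(Path repoRoot) {
        return file -> {
            Path p = repoRoot.resolve(file);
            if (!Files.isRegularFile(p)) {
                return OptionalInt.empty();
            }
            try {
                return OptionalInt.of(Files.readAllLines(p).size());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    private CitationCheck() {}
}
```

The lookup is passed in as a function so the parsing logic can be tested without touching the filesystem.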
The general pattern: trust nothing the agent has not just demonstrated it knows. The architecture review workflow from the next chapter is one form of this discipline applied at the codebase level. The cross-check, the forced-read, the citation hook are forms of it applied at the file and function level.
Hallucination is the agent's most-publicized failure mode and the one most over-corrected for. You do not need to verify everything the agent does. You need to verify the specific things the agent makes claims about that would compound if wrong. File and function citations are first on that list.
A concrete comparison: bad AGENTS.md vs good AGENTS.md.
The difference between an AGENTS.md that helps and one that does not is usually visible in a single rule. Take a rule about validation, which most teams will need.
Bad:
Always follow our coding standards. Be careful with validation. Use the right architecture. Do not make risky changes.
Good:
DTOs own controller-boundary validation via Bean Validation annotations. Services trust validated DTOs and do not re-validate. Never add validation annotations inside service methods. Examples: UserCreateController, AccountUpdateController.
The bad version sounds responsible. It tells the agent to be careful, follow standards, use the right architecture. It is also useless. "Careful" is not a constraint the agent can check. "Right architecture" depends on context the rule does not provide. The agent will read this rule, produce code that violates it, and the team will conclude AGENTS.md does not work.
The good version is enforceable. It names the layer (DTOs at the controller boundary), the mechanism (Bean Validation annotations), the rule (services trust, do not re-validate), and two concrete examples the agent can pattern-match against. A rule like this catches mistakes in review. The bad version does not.
The pattern: rules that name the layer, the mechanism, and at least one example are rules the agent can apply. Rules that gesture at principles are rules the agent will ignore.
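The good rule is concrete enough to sketch in code. The version below uses only the standard library so it runs without a framework. A hypothetical `@Required` annotation and `BoundaryValidator` stand in for Bean Validation. In a Spring codebase, `jakarta.validation` annotations such as `@NotBlank` plus `@Valid` on the controller's request-body parameter play that role. The controller and DTO names borrow the rule's examples but are illustrative. What carries over is the layering: the controller validates once at the boundary, and the service trusts what it receives. Records and the `RECORD_COMPONENT` target assume Java 16 or later.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Bean Validation constraint such as @NotBlank.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.RECORD_COMPONENT)
@interface Required {}

// DTO: the validation rules live on the boundary type itself.
record UserCreateRequest(@Required String email, @Required String displayName) {}

final class BoundaryValidator {
    // Checks every @Required component, in declaration order.
    static List<String> violations(Record dto) {
        List<String> out = new ArrayList<>();
        for (RecordComponent c : dto.getClass().getRecordComponents()) {
            if (!c.isAnnotationPresent(Required.class)) {
                continue;
            }
            Object value;
            try {
                value = c.getAccessor().invoke(dto);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
            if (value == null || value.toString().isBlank()) {
                out.add(c.getName() + " must not be blank");
            }
        }
        return out;
    }

    private BoundaryValidator() {}
}

final class UserService {
    // Trusts its input: no re-validation here, per the rule.
    String create(UserCreateRequest req) {
        return "created " + req.email();
    }
}

final class UserCreateController {
    private final UserService service;

    UserCreateController(UserService service) {
        this.service = service;
    }

    // Boundary: validate once, reject with a 400-style result, then hand off.
    String handle(UserCreateRequest req) {
        List<String> errors = BoundaryValidator.violations(req);
        if (!errors.isEmpty()) {
            return "400 " + String.join("; ", errors);
        }
        return service.create(req);
    }
}
```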