Ship It With AI Mihai Cvasnievschi

Patterns for brownfield codebases

The traffic light tells you which codebases are ready for the agent. This chapter is about the operational patterns that make agentic work effective on legacy codebases - yellow projects in particular, where the practice matters most.

I will walk through eight patterns. The first five are the default operating patterns I install on most brownfield teams: worktrees, champions, hookify rules, the PR review toolkit, and governance for AI-selling companies. The final three are maturity patterns for teams past the first few months: mistake-journal review, demo-day backstop, and failure watchlist.

Each one is something I have watched make the difference between an agent that contributes and an agent that frustrates.


Pattern one: worktrees.

Each agent session runs in its own git worktree. A worktree is a separate checked-out copy of the repository on a separate branch. The agent can experiment, fail, retry, refactor on its worktree without touching anyone else's working copy. When the work is good, the branch goes through normal PR review. When the work is bad, you delete the worktree and start over.

Cost of a failed attempt with worktrees: near zero. The agent's bad attempt never touches your working copy or anyone else's.

Cost of a failed attempt without worktrees: the agent collides with your unfinished local work, breaks something subtle, and you spend an hour figuring out what changed.

Worktrees are the single most under-appreciated git feature for agentic work. Every developer on an agentic-coding team should have a git worktree add command in their muscle memory. Use them.
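
A minimal sketch of the per-session workflow, assuming you want to script it rather than type it. The git subcommands (git worktree add -b, git worktree remove --force, git branch -D) are real; the function names, the agent/<session> branch naming, and the worktrees directory layout are my own assumptions for illustration.

```python
import subprocess
from pathlib import Path


def worktree_add_command(worktrees_dir: Path, session: str) -> list[str]:
    """Build the `git worktree add` command for one agent session."""
    branch = f"agent/{session}"      # hypothetical branch naming scheme
    path = worktrees_dir / session   # one directory per session (assumed layout)
    return ["git", "worktree", "add", "-b", branch, str(path)]


def worktree_remove_commands(worktrees_dir: Path, session: str) -> list[list[str]]:
    """Build the commands that discard a bad attempt: worktree first, then branch."""
    path = worktrees_dir / session
    return [
        ["git", "worktree", "remove", "--force", str(path)],
        ["git", "branch", "-D", f"agent/{session}"],
    ]


def run_in_repo(repo: Path, command: list[str]) -> None:
    """Run one git command from inside the main repository, raising on failure."""
    subprocess.run(command, cwd=repo, check=True)


# Example (not executed here): give the session "fix-login" its own worktree.
# run_in_repo(Path("."), worktree_add_command(Path("../worktrees"), "fix-login"))
#
# If the attempt is bad, throw it away:
# for command in worktree_remove_commands(Path("../worktrees"), "fix-login"):
#     run_in_repo(Path("."), command)
```

Keeping the command construction separate from the subprocess call is a deliberate choice in this sketch: you can print or log exactly what will run before anything touches the repository.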


Pattern two: champions.

One person on the team owns the AGENTS.md and the mistake journal for a given repository. The champion does the weekly maintenance: read what other developers added to the mistake journal, refactor rules that have accumulated, deprecate rules that no longer apply, update the conventions when the team's practice changes.

The champion rotates quarterly. The first champion has the highest cost - they set up the patterns. Subsequent champions have a lower cost - they maintain. Rotation prevents a single point of failure in the tribal knowledge of "how we use agents here." It also distributes the practice: every senior on the team eventually takes a turn, and every senior internalizes the maintenance.

The champion is not the only person who edits AGENTS.md. Everyone edits it, through pull requests, when they encounter a new pattern or a new mistake. The champion's job is curation, not authorship. The distinction matters: if the champion is the only author, the file becomes one person's opinion of how to do agentic work, and the team's actual practice diverges. If everyone authors and the champion curates, the file represents the team's collective experience, kept disciplined by a single owner at any one time.


Pattern three: hookify rules.

hookify (or your agent's equivalent plugin) lets you write hooks that fire before tool execution. The hook reads the proposed action, evaluates it against custom rules, and either allows it, prompts the user, or blocks it.

The use case for hookify in brownfield work is specific. You have areas of the codebase that are dangerous to modify without senior review - cryptography modules, payment processing, data migration scripts, anything regulatory. You write a hookify rule that blocks the agent from modifying files in those directories, or requires explicit user confirmation when it tries. The rule lives in the repository, committed to git, applied automatically every session.

hookify rules complement AGENTS.md. AGENTS.md tells the agent the team's conventions and forbidden patterns; the agent reads them and applies them by default. hookify enforces the rules structurally; if the agent tries to violate them anyway (because LLMs sometimes do), the hook catches it. AGENTS.md is the polite request. hookify is the firm boundary.

For yellow projects, I recommend establishing hookify rules for at least the regulatory-sensitive areas and the historically-broken modules. Five to ten rules is usually enough. Each rule is one line of configuration plus a one-line justification.
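
To make the allow, prompt, or block decision concrete, here is a minimal sketch of the logic such a hook performs. This is not hookify's actual rule syntax, which I am not reproducing here. The Rule class, the evaluate function, and the example paths are all hypothetical; the only thing taken from the text is the shape: a pattern, a decision, and a one-line justification per rule.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

ALLOW, PROMPT, BLOCK = "allow", "prompt", "block"


@dataclass(frozen=True)
class Rule:
    pattern: str        # glob matched against the repository-relative path
    decision: str       # PROMPT or BLOCK
    justification: str  # the one-line reason shown to the developer


# Hypothetical rules for an imaginary repository layout.
RULES = [
    Rule("src/crypto/*", BLOCK, "Cryptography changes need senior review."),
    Rule("src/payments/*", BLOCK, "Payment processing is regulatory-sensitive."),
    Rule("migrations/*", PROMPT, "Data migrations are hard to reverse."),
]


def evaluate(path: str, rules: list[Rule] = RULES) -> tuple[str, str]:
    """Return (decision, reason) for a proposed modification to `path`.

    The first matching rule wins. With fnmatchcase, `*` also matches `/`,
    so each pattern covers the whole directory subtree.
    """
    for rule in rules:
        if fnmatchcase(path, rule.pattern):
            return rule.decision, rule.justification
    return ALLOW, "No rule matched."
```

In this sketch, evaluate("src/crypto/aes/gcm.py") is blocked, a file under migrations/ triggers a prompt, and everything else is allowed. A real hook would read the proposed tool call from the agent and act on the decision; that wiring depends on your agent and is left out.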


Pattern four: PR review toolkit.

Before a senior engineer reviews an agent-produced pull request, a set of review agents go through the diff first. Silent failure hunter looks for swallowed exceptions, unbounded retries, missing null checks. PR test analyzer identifies which new methods have weak test coverage and recommends specific tests to add. Security scanner checks for the standard vulnerability categories. Documentation reviewer flags missing or stale documentation.

The review agents do not replace human review. They run before it, surfacing the kinds of issues that are mechanical to detect. The human reviewer then focuses on the things only humans can evaluate: business correctness, architectural fit, judgment calls.
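
To show what "mechanical to detect" means, here is a deterministic stand-in for one check a silent failure hunter might perform: flagging exception handlers that swallow the error. The review agents described above are agents, not this script. The function names are hypothetical, and the sketch only handles Python source, using the standard library ast module.

```python
import ast


def _is_noop(stmt: ast.stmt) -> bool:
    """True for a statement that does nothing: `pass` or a bare `...`."""
    if isinstance(stmt, ast.Pass):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def find_swallowed_exceptions(source: str) -> list[int]:
    """Return the line numbers of `except` clauses whose body does nothing."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and all(_is_noop(s) for s in node.body):
            findings.append(node.lineno)
    return sorted(findings)
```

A check like this cannot tell whether swallowing the exception was intentional. That is the point of the split: the tool surfaces the line, and the human decides whether it matters.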

The leverage is in the time saved on the mechanical findings. A senior engineer with fifteen minutes for a review, working alone, spends it catching the obvious bugs. When the review agents have already caught those bugs, the mechanical pass shrinks to about five minutes, and the other ten go to the architectural judgment that the agents cannot make.

Set up the review agents. Wire them into the pull request flow. The agents do the mechanical work; the humans do the human work.


Pattern five: governance for AI-selling companies.

If your company sells AI capabilities to its own clients - not just consuming AI internally, but reselling it as part of your product - then your governance pattern differs from that of a pure consumer of AI. Your demos, your sales calls, and your client engagements are all situations where your team's discipline is on display. The client is evaluating whether you know how to do AI responsibly, not just whether the AI works.

This calls for a few additional patterns on top of the general ones.

First, the demos are not allowed to skip the rigor. If you are showing a client how you use an agent to ship code, you show them the research phase, the plan phase, the review phase, the verify phase. You do not just show them the agent generating code and shipping it, because that is the demo that creates client expectations you will not meet in production.

Second, your AGENTS.md is a sales asset. Clients will want to see what your team has codified. A well-maintained AGENTS.md that runs to a hundred lines, with a mistake journal that shows real lessons learned, is more credible than a thousand-line AGENTS.md that reads like it was written by a consultant. Show the discipline, not the volume.

Third, the kill signal framework is something you teach clients. The framework is more valuable to a client than any specific recommendation you would make, because it lets the client evaluate their own codebases without depending on you. Giving the framework away strengthens the trust relationship. Teams that hoard frameworks lose to teams that share them.

The patterns above apply to any team. They apply with extra force to teams whose customers are watching - companies whose engineering quality is a visible product surface, not an internal cost center. Governance maturity is part of those companies' offering, and the discipline this manual describes is what makes the maturity defensible.


Five patterns. Worktrees. Champions. hookify rules. PR review toolkit. Governance for AI-selling companies. None of them require inventing a new process. All of them slot into how engineering teams already ship code, with the small additions that agentic work requires.

Three more patterns, briefly, because they appear in the well-functioning teams I have worked with even though they are less often discussed.


Pattern six: the mistake-journal review.

The mistake journal in AGENTS.md is alive. Every entry is a real failure the team experienced. But the journal grows over time, and not every entry stays load-bearing forever. Some failures are structurally resolved - the underlying cause has been refactored out, the dependency has been replaced, the convention has been internalized to the point where nobody would make the mistake again. Those entries can be retired without losing safety, and retiring them keeps the journal lean.

The champion runs a quarterly review of the journal. For each entry, the question is: has anyone been saved by this rule in the last three months? If yes, keep it. If no, but the rule is still applicable to the codebase, keep it (rules that prevent rare failures are still valuable even if the failure has not recurred recently). If no, and the rule no longer applies because the underlying problem is gone, retire it with a note in the commit message explaining why.
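
The three-way question above reduces to a small decision function. This is a sketch of the rule as stated, with hypothetical names; the two inputs are judgments the champion makes, not things the code can determine.

```python
from enum import Enum


class Verdict(Enum):
    KEEP = "keep"
    RETIRE = "retire"


def review_entry(saved_someone_recently: bool, still_applicable: bool) -> Verdict:
    """Apply the quarterly review question to one mistake-journal entry."""
    if saved_someone_recently:
        # The rule prevented a mistake in the last three months.
        return Verdict.KEEP
    if still_applicable:
        # Rare failures are still worth guarding against.
        return Verdict.KEEP
    # The underlying problem is gone: retire, and say why in the commit message.
    return Verdict.RETIRE
```

Only one combination retires an entry: nobody was saved by it recently and the problem it guards against no longer exists.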

The habit keeps the journal from becoming a graveyard. A graveyard of rules is almost as useless as no rules at all, because the team stops trusting any individual rule when there are too many of them, and the agent's context window gets crowded with obsolete instructions.


Pattern seven: the demo-day backstop.

When the team is preparing a demo of agentic work - for leadership, for clients, for an internal showcase - there is a temptation to do the demo live, with the agent doing real work in front of the audience. Sometimes this works. Sometimes the agent has a bad day, the network hiccups, the model decides to be unusually verbose. Live demos of probabilistic systems carry real failure risk.

The pattern I recommend: prepare a backup recording of the same demo, done successfully ahead of time. If the live demo runs into trouble, pivot to the recording within thirty seconds. The audience does not need to know the difference. The lesson lands either way.

The backstop is not cheating. The backstop is professional execution. Every senior speaker in any field has a backup plan for the moment the live element fails. The agentic equivalent is a recorded version of the same work, kept ready.


Pattern eight: the failure watchlist.

When the team has been doing agentic work for a few months, you start to notice failure modes that recur. Specific kinds of mistakes the agent makes that you have to correct repeatedly. Specific situations where the workflow breaks down. Specific user behaviors that lead to predictable problems.

The pattern is to maintain a failure watchlist - a document that catalogs these recurring failure modes, the conditions under which they occur, and the team's standard response when they happen. The list grows. The team reviews it together every month or so. New entries get added; old entries that have been structurally fixed get retired.

The watchlist is to operations what the mistake journal is to development. The mistake journal prevents code mistakes; the watchlist prevents process mistakes. Both are committed to the repository. Both are reviewed regularly. Both are how the team's accumulated experience becomes infrastructure.


Eight patterns total. They will not all apply to every team. The first five - worktrees, champions, hookify rules, PR review toolkit, governance for AI-selling companies - apply broadly. The remaining three - mistake-journal review, demo-day backstop, failure watchlist - are for teams that are past the first few months and ready to professionalize their practice.

Next chapter: the adoption framework - how a team that has read this manual starts. Three roles, ninety days, specific commitments.