Appendix B. Templates

Eight copy-paste templates referenced throughout the manual. All are starting points; customize for your team.

B.1 Architecture review prompt

Analyze the architecture of this codebase. Produce a structured architecture review document covering:

1. Purpose. What does this service do? Who uses it? What business problem does it solve?
2. Top-level structure. Major modules, packages, or folders. One paragraph per major component.
3. Data model. Primary entities, relationships, persistence. Cite specific files and line numbers.
4. Request flows. For the three most important external entry points, trace from entry to persistence. Cite files and lines at each step.
5. Cross-cutting concerns. Authentication, authorization, logging, error handling, configuration. Where do they live?
6. Dependencies. External services, databases, message brokers, third-party APIs.
7. Test posture. Test structure, coverage, gaps.
8. Build and deployment. Cite the configuration files.
9. Risks and unknowns. Fragile code, inconsistent conventions, deprecated dependencies, unresolved patterns.

Cite specific files and line numbers throughout. Where the codebase is ambiguous, say so explicitly. Where you encounter patterns the team should formalize, suggest the convention.

B.2 AGENTS.md skeleton

This template works as either AGENTS.md (vendor-neutral standard) or CLAUDE.md (Claude Code variant). The filename varies by agent; the markdown format does not.

# AGENTS.md

## Forbidden patterns
- Never construct SQL by string concatenation. Use bound parameters. (Reason: SQL injection.)
- Never log PII fields. (Reason: data minimization compliance.)
- Never roll your own cryptography. Use the team's approved crypto wrapper. (Reason: AES-CBC with hardcoded IV shipped to production in 2024; we are not doing that again.)
- Never modify migration history. Migrations are append-only.

## Mistake journal
- 2026-03-03: agent generated JPQL query that bypassed the multi-tenant filter. Fix: queries extend MultiTenantQueryBuilder base class which enforces tenant filtering. Rule added.

## Conventions
- Constructor injection only, not field injection.
- @Transactional only on service methods that mutate state.
- Repositories extend BaseRepository<Entity>.
- DTOs at controller boundary use Bean Validation.

## Build and test
- Build: mvn clean verify
- Run tests: mvn test
- Run linting: mvn spotless:check
- Run security scan: mvn dependency-check:check

## Where things live
- Services: src/main/java/com/team/service/
- Repositories: src/main/java/com/team/repository/
- DTOs: src/main/java/com/team/dto/
- Tests: src/test/java/com/team/ (parallel package structure)
- Migrations: src/main/resources/db/migration/ (Flyway)

## Domain glossary
- "Customer" = end user. "Counterparty" = corporate client.
- "Transfer" = intra-bank or inter-bank. "Wire" = inter-bank only.
- "Hold" = short-term reservation. "Block" = long-term legal restriction.
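
The first forbidden pattern in the skeleton (bound parameters instead of string concatenation) is easier to enforce when the file links to a concrete example. The following is a minimal sketch in Python using the standard-library sqlite3 module. The `accounts` table and the `find_accounts_by_owner` and `demo` functions are hypothetical and exist only for illustration; a Java team would express the same idea with its own data-access layer.

```python
import sqlite3


def find_accounts_by_owner(conn: sqlite3.Connection, owner: str) -> list[tuple]:
    # The "?" placeholder is bound by the driver, so the owner value is
    # never spliced into the SQL text.
    cursor = conn.execute(
        "SELECT id, owner FROM accounts WHERE owner = ?",
        (owner,),
    )
    return cursor.fetchall()


def demo() -> list[tuple]:
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
    conn.executemany(
        "INSERT INTO accounts (owner) VALUES (?)",
        [("alice",), ("bob",)],
    )
    # A classic injection payload is compared as a literal string,
    # so it matches no rows instead of matching every row.
    return find_accounts_by_owner(conn, "alice' OR '1'='1")
```

With string concatenation the same payload would turn the WHERE clause into a condition that is always true; with a bound parameter it is just an owner name that does not exist.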

B.3 Six-phase loop checklist (one-pager)

RESEARCH
- Agent produces research note (2-4 pages)
- Note names: files to touch, conventions to follow, risks, open questions
- Human review: does this match my mental model of the work?

PLAN
- Agent produces file-level plan, each task 2-5 minutes
- Plan names test changes for any code change
- Plan states what "done" means for the whole change, not just per task
- Human review: any task too vague, too large, wrongly ordered? Push back. Approve.

EXECUTE
- Agent dispatches subagents per task in isolated context
- Each subagent: read, implement, verify, report
- Orchestrator integrates results
- If task fails: orchestrator decides retry / route-around / escalate

REVIEW (two reviewers, in sequence)
- Spec compliance reviewer: does implementation match the spec?
- Code quality reviewer: is this good code, by team standards?

VERIFY
- New tests run. Existing tests run (as part of execute).
- For UI: Playwright with accessibility tree, not pixels.
- No "done" without test evidence.

SHIP
- Structured commit message + push + PR with structured description
- Reviewers tagged per CODEOWNERS
- Linked Jira ticket updated; Slack notified
- Pull request goes through normal team review

B.4 Kill signal scoring worksheet

For each codebase, score each signal: 0 (signal absent) / 0.5 (partial) / 1 (signal present).

Signal 1 - No tests
- 0: > 70% line coverage AND tests are run on every commit
- 0.5: 30-70% coverage OR tests exist but are not routinely run
- 1: < 30% coverage OR no automated test suite

Signal 2 - No documentation
- 0: current architecture doc + in-code comments + decision records
- 0.5: partial documentation, possibly stale
- 1: no architectural overview; only the original author knows

Signal 3 - Tight coupling
- 0: clear module boundaries; modules can be changed in isolation
- 0.5: some coupling; experienced devs can navigate but new hires struggle
- 1: hairball; edit one file, three others break

Signal 4 - Scattered business rules
- 0: single source of truth for each business rule
- 0.5: some duplication, documented
- 1: same rule expressed in 3+ places, often inconsistent

Signal 5 - Regulatory constraints
- 0: standard controls; existing audit machinery handles changes
- 0.5: regulated but team has the workflow in place
- 1: heavy controls + team has no integrated workflow; sign-off matrices missing

Signal 6 - Team cannot evaluate output
- 0: team has senior expertise for every domain in the codebase
- 0.5: senior expertise exists but is fragile (one person, may be unavailable)
- 1: team cannot reliably evaluate agent output in some domain

Signal 7 - Model-context fit
- 0: codebase is in a popular language/framework with substantial public footprint
- 0.5: niche but documented enough that the agent has some context
- 1: proprietary DSL, internal framework, or rare language with no public corpus

Signal 8 - Velocity-of-change
- 0: framework and dependencies are stable; no major migrations in flight
- 0.5: minor version churn ongoing but the team is in control
- 1: major migration mid-flight; codebase straddles old and new versions

Sum the eight signal scores, then round the total to the nearest integer.

Traffic light:
- 0-1: GREEN (agent-led work appropriate)
- 2-3: YELLOW (human-led with agent support)
- 4+: RED (fix codebase first)

Signal 6 carries extra weight: any codebase scoring 1 on signal 6 is RED for the affected work, regardless of other signals. Keep the agent away from that work until the capability gap is closed.
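
The worksheet arithmetic can be sketched in a few lines of Python. This assumes the eight scores are summed before rounding and that a total ending in .5 rounds up; the worksheet does not specify tie-breaking, so that choice is an assumption (it errs toward the more cautious color). The signal keys and the `traffic_light` function are hypothetical names.

```python
import math

# Hypothetical keys, one per worksheet signal, in the order Signals 1-8.
SIGNALS = (
    "no_tests",
    "no_documentation",
    "tight_coupling",
    "scattered_business_rules",
    "regulatory_constraints",
    "team_cannot_evaluate_output",
    "model_context_fit",
    "velocity_of_change",
)
ALLOWED = (0, 0.5, 1)


def traffic_light(scores: dict[str, float]) -> str:
    if set(scores) != set(SIGNALS):
        raise ValueError("score every one of the eight signals exactly once")
    if any(value not in ALLOWED for value in scores.values()):
        raise ValueError("each score must be 0, 0.5, or 1")

    # Signal 6 override: a score of 1 is RED regardless of the total.
    # The worksheet scopes this to the affected work, not the whole codebase.
    if scores["team_cannot_evaluate_output"] == 1:
        return "RED"

    # Assumption: ties round up (1.5 -> 2), which is the cautious direction.
    total = math.floor(sum(scores.values()) + 0.5)
    if total <= 1:
        return "GREEN"
    if total <= 3:
        return "YELLOW"
    return "RED"
```

For example, a codebase scoring 0.5 on three signals and 0 on the rest totals 1.5, rounds to 2 under this assumption, and lands in YELLOW.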

B.5 90-day adoption calendar (one-pager)

CHAMPION (engineer with curiosity)

Month 1
- Week 1: install agent. Architecture review on familiar codebase. Commit artifact.
- Week 2: draft team's first AGENTS.md, < 50 lines.
- Weeks 3-4: run six-phase loop on three small features.

Month 2: run six-phase loop on three medium features. Pull other engineers into individual phases.

Month 3: hand off champion role to a successor. AGENTS.md is now team-owned, not champion-owned.

---

LEAD (decides which projects get the agent)

Week 1: classify top 5 projects against 8 kill signals. Write classifications. Share with team.

Weeks 2-4: for one project at each color, document what would have to change to move it.

Month 2: track metrics. Cycle time, defect rate, reviewer time.

Month 3: present results. Honest data. Recommend adjustments.

---

MANAGER (owns budget, procurement, hiring)

Months 1-2: protect the team. No per-engineer productivity metrics, no ROI
projections, no vendor benchmarks yet. The practice is being built.

Days 61-67: take the Champion/Lead handoff. Verify the dashboard metrics are
real (agent-touched PR share, cycle time vs baseline, defect rate vs baseline).

Days 68-74: close the seat-vs-enterprise procurement decision with the
Appendix A rubric. Match tier to usage, not uniformity.

Days 75-81: governance one-pager (hooks, sandbox, secrets, telemetry).
Champion drafts; manager edits and signs.

Days 82-90: leadership update - two sentences and one number. Defend the
operational metric; leave revenue to whoever owns revenue. Hand off to
quarterly cadence.

---

GRASSROOTS TRACK (for teams without all three roles)

Month 1: champion uses agent for personal productivity only. No announcement.
Month 2: champion writes up results. Shares in team meeting.
Month 3: peer asks how. Champion teaches. Two-engineer demo invites lead.
Months 4-6: lead recruits manager using two-engineer evidence base.

Twice as long as the ideal arc; works in companies that are not yet ready for the ideal one.

B.6 Outer-loop contract (one-pager)

Before any unattended run, fill every line. A blank line means the loop is not ready.

WORK
- Queue: ______ (file or issue list in the repo; one small, similar, reversible unit per item)
- Eligibility: GREEN codebase (B.4 score 0-1) / every unit machine-verifiable / every unit revertible

STOP CONDITION (machine-evaluable; the loop halts itself)
- Done when: ______ (queue empty / suite green / N units ready for review)
- Budget: max ______ tokens or $______ or ______ iterations or ______ hours - whichever hits first
- Failure rule: same unit fails twice -> skip it and flag; three skipped units -> halt

EACH ITERATION
- Fresh context; state read from queue + journal in the repo, not from session history
- One unit per iteration; finish it or journal why not - nothing left half-applied
- Gate: ______ (tests / lint / typecheck / build) runs outside the agent's reach (CI or hook)
- Deny rules: agent cannot edit test config, lint config, CI workflow, hook rules, or the journal's done-markers

ISOLATION
- Own worktree and branch; never the default branch
- Sandbox on; no production credentials in the environment; network off or allowlisted

MORNING REVIEW (the human floor)
- Each PR reviewed for business correctness and architectural fit - not checkmark-glancing
- Kill check before relaunch: oscillating diffs? budget spent but queue not shorter?
  same failure a third time? gate touched?
  Any yes -> do not relaunch. Read the journal, fix the cause first.
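
The stop condition and failure rule above are meant to be machine-evaluable, so they can be sketched as a small control loop. This is a minimal illustration in Python, not a reference implementation: `LoopState`, `run_loop`, and the `run_unit` callback are hypothetical names, only the iteration budget is modeled (tokens, dollars, and hours would need their own counters), and `run_unit` stands in for "run one unit in a fresh context and report whether the external gate passed".

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    queue: list[str]
    max_iterations: int
    failures: dict[str, int] = field(default_factory=dict)
    skipped: list[str] = field(default_factory=list)
    done: list[str] = field(default_factory=list)
    iterations: int = 0


def run_loop(state: LoopState, run_unit: Callable[[str], bool]) -> str:
    while state.queue:
        # Budget check comes first so the loop halts itself.
        if state.iterations >= state.max_iterations:
            return "halt: budget exhausted"

        unit = state.queue[0]
        state.iterations += 1

        if run_unit(unit):
            state.queue.pop(0)
            state.done.append(unit)
            continue

        # Failure rule: same unit fails twice -> skip it and flag.
        state.failures[unit] = state.failures.get(unit, 0) + 1
        if state.failures[unit] >= 2:
            state.queue.pop(0)
            state.skipped.append(unit)
            # Three skipped units -> halt.
            if len(state.skipped) >= 3:
                return "halt: three units skipped"

    return "done: queue empty"
```

In a real run the state would be persisted to the queue and journal files in the repo between iterations rather than held in memory, so that each iteration can start from a fresh context.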

B.7 Context hygiene one-pager

LOAD
- Task-relevant context only; everything loaded competes with reasoning
- Pointers over payloads: link the architecture doc, name the files, do not paste them
- AGENTS.md under 200 lines - the always-loaded layer is the most expensive space

SESSION
- One unit of work per session
- Start clean per unit; do not carry a finished task's history into the next
- Durable state lives in files - research note, plan, journal - committed to the repo

WATCH FOR (contamination signs)
- Agent re-answers a question already settled this session
- Agent cites a stale version of a file it edited earlier
- Agent forgets a constraint it honored earlier
- Edit quality degrades late in a long session

WHEN CONTAMINATED
- Do not argue with the session - you cannot debate a window back into coherence
- Commit the durable state -> end the session -> start fresh
- The fresh session reads the progress back from the repo, without the noise

COMPACTION
- A handoff, not a continuation - summarizing drops detail
- Treat it like handing the work to a new engineer
- Anything that matters and is not in a file by then is gone

SUBAGENTS
- Isolate each task in its own context; one task's confusion never reaches the next
- Read handoff summaries with the skepticism you would give a junior's standup

B.8 Agent-diff read order (one-pager)

BEFORE THE CODE
- Diff-stat against the plan first - read the shape before a line of code
- Does the change touch what the ask named, and only that?
- Files the plan never named are the first flag - an over-scoped diff decided something for you

TESTS FIRST
- Read what they assert, not whether they pass - green is the signal you already have
- The assertion has to come from the intent, not the implementation
- A test written from the code will agree with the code

BOUNDARIES
- Check the rules the team wrote down - AGENTS.md forbidden patterns, layer conventions
- The agent violates a convention confidently and in fluent style
- A boundary crossing looks clean on the page - fluent style hides it from style-reading

NEW NAMES
- Grep every API, function, or config key the diff introduces that you do not recognize
- No hits in the codebase or the dependencies -> the name may not exist
- The invented call reads as plausibly as the real one

THEN LINE BY LINE
- The mechanical layer the review agents already swept - spec compliance, code quality
- Do not repeat their pass - spend the minutes they bought you
- The minutes go where the agents cannot: business correctness and architectural fit

CALIBRATION
- Fluency is not correctness
- An agent bug looks like the code a good engineer would write for a slightly different task
- The human read gets shorter and sharper as the tooling improves - never skipped
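
The NEW NAMES step can be partly mechanized. The sketch below, in Python, collects identifiers from the added lines of a unified diff and reports those that appear nowhere in the text you supply as known sources (for example, the pre-change codebase and vendored dependency code). `added_identifiers` and `unknown_names` are hypothetical helpers, and the identifier pattern is a rough approximation that ignores language-specific syntax.

```python
import re

# Rough identifier pattern; real languages have more nuanced rules.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def added_identifiers(diff_text: str) -> set[str]:
    names: set[str] = set()
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" is the file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            names.update(IDENTIFIER.findall(line[1:]))
    return names


def unknown_names(diff_text: str, known_sources: list[str]) -> set[str]:
    known: set[str] = set()
    for text in known_sources:
        known.update(IDENTIFIER.findall(text))
    # Names the diff introduces that appear in none of the known sources.
    return added_identifiers(diff_text) - known
```

A name in the result is not necessarily invented: functions the diff itself defines will also show up. The output is a list of names to look at by hand, not a verdict.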
