Verification round prompt

Invoked from prompts/02-phase-close.md (or standalone if the human wants a mid-phase friction probe). Produces a list of surfaced friction items for the reviewer to classify. Mandatory at phase close per MR-10.

What this is

A verification round is a short, structured probe of the phase’s deliverables, performed after acceptance gates pass but before the phase is declared closed. The point is to surface the friction the phase produced that the acceptance gates didn’t catch — UX rough edges, doc/code drift, half-built abstractions, deferred conversations.

The output is a list. Each item gets classified by the reviewer (closed / in-budget / deferred). No friction is fixed during the round itself.

Prompt to paste (or read inline from phase-close)

You are running a verification round for Phase <<N>>. You are operating as the auditor subagent (see AGENTS.xml). The reviewer will classify your findings; your job is to produce a faithful list.

Step 1 — Choose the probe shape

Pick the probes appropriate to this phase. Use as many as the phase warrants; do not skip a probe “because it’s already known to be fine”.

A — Visitor probe (full-stack web phases especially)

Pretend you are a paying visitor opening the deployed app for the first time. For each top-level surface (route, view, API endpoint), record:

First-impression verdict (one sentence).
Concrete defects (broken renders, 500s, 404s, console errors, layout breaks at common viewports).
A11y gaps (keyboard nav, contrast, semantic markup, screen-reader labels).
Responsive gaps (mobile / tablet / desktop breakpoints).
Cross-view consistency issues (does this view feel like it belongs to the same product?).
Performance smell (visible lag, oversized bundles, blocking requests).

B — Operator probe (backend / infra phases)

Pretend you are an operator inheriting the system at this phase boundary. For each operator-facing surface (deploy, migrate, observe, debug, recover):

Can you do it from the runbooks alone? If a runbook is missing, that’s a finding.
Are the failure modes documented? If not, that’s a finding.
Is the diagnostic shortcut (à la “if X happens, see Y”) in place for the failure modes you can predict?
If a deploy went wrong tonight, what’s missing for safe rollback?

C — Spec-vs-build probe

Walk the spec docs (docs/spec/) and verify each requirement nominally covered by this phase actually has:

An implementation (file path).
A test (file path).
A traceability entry (docs/verification/vtm.md or equivalent).

Gaps are findings.
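The check above can be sketched as a small coverage pass. This is a minimal illustration, not part of the project tooling: the `Requirement` shape, the `coverage_gaps` helper, the requirement IDs, and the file paths are all hypothetical, and it assumes you have already collected the evidence for each requirement by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirement:
    """One spec requirement and the evidence found for it (hypothetical shape)."""
    req_id: str
    implementation: Optional[str] = None  # file path, or None if not found
    test: Optional[str] = None            # file path, or None if not found
    traceability: Optional[str] = None    # e.g. a row reference in the VTM


def coverage_gaps(requirements):
    """Return (req_id, missing_fields) for every requirement lacking evidence."""
    gaps = []
    for req in requirements:
        missing = [
            name
            for name in ("implementation", "test", "traceability")
            if not getattr(req, name)
        ]
        if missing:
            gaps.append((req.req_id, missing))
    return gaps


# Invented example data: R-2 has code but no test and no traceability entry.
reqs = [
    Requirement("R-1", "src/a.py", "tests/test_a.py", "vtm.md#R-1"),
    Requirement("R-2", "src/b.py"),
]
print(coverage_gaps(reqs))  # [('R-2', ['test', 'traceability'])]
```

Each tuple in the result would become one finding under probe C.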

D — Invariant probe

For each invariant I-N declared in CLAUDE.md:

Does the phase’s code preserve it? Cite a test or a code reference.
If the invariant is “should never happen” (e.g. “Scope writes only under .kahn/archive/”), is there a guard that fails closed if it would be violated?

Invariant violations are blocking findings — they go straight back to the implementer, not into the friction budget.
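For reference, a fail-closed guard for the example invariant might look roughly like the sketch below. It is an assumption about shape only: `guarded_target` and `InvariantViolation` are hypothetical names, and the real guard in the codebase (if any) may differ. What to look for is that the default outcome is refusal, and that a path has to prove it lies under the allowed root.

```python
from pathlib import Path

# Root taken from the example invariant in the text above.
ALLOWED_ROOT = Path(".kahn/archive")


class InvariantViolation(Exception):
    """Raised when a write target falls outside the allowed root."""


def guarded_target(path, root=ALLOWED_ROOT):
    """Return the resolved path if it lies under root; otherwise raise.

    Resolving first collapses '..' segments, so a path that merely
    starts with the root's name cannot escape it.
    """
    resolved_root = Path(root).resolve()
    resolved = Path(path).resolve()
    try:
        resolved.relative_to(resolved_root)
    except ValueError:
        raise InvariantViolation(f"refusing write outside {root}: {path}")
    return resolved


print(guarded_target(".kahn/archive/run-1/out.txt"))
try:
    guarded_target(".kahn/archive/../../etc/passwd")
except InvariantViolation as exc:
    print("blocked:", exc)
```

If the code you are auditing checks the path with a plain string prefix test, or logs and continues on failure, treat that as a finding.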

E — Drift probe

Read docs/north-star/intent.md § drift_watches. For each named anti-shape:

Did the phase pull toward it? Cite the specific code or doc move that did.
Is there a counter-anchor in place? If yes, was it exercised during the phase?

Drift findings are usually in-budget or deferred, rarely closed.

F — Decision-doc audit

Walk docs/decisions/ for entries created during this phase:

Are any superseded but not marked? (MR-7 violation — block.)
Are any open questions still open? (Friction item.)
Are dates ISO-formatted? (MR-4.)
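The date check is easy to mechanise. The sketch below assumes MR-4 means the ISO 8601 calendar form YYYY-MM-DD and that you have already extracted each entry's date string; the helper names and the decision-doc file names are invented for illustration.

```python
import re
from datetime import datetime

# Strict YYYY-MM-DD shape; strptime then rejects impossible dates.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(text):
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def non_iso_dates(entries):
    """entries maps a decision-doc name to its date string as written."""
    return sorted(name for name, date in entries.items() if not is_iso_date(date))


# Invented example data.
entries = {
    "0007-cache.md": "2024-03-01",
    "0008-auth.md": "1 March 2024",
    "0009-queue.md": "2024-02-30",
}
print(non_iso_dates(entries))  # ['0008-auth.md', '0009-queue.md']
```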

Step 2 — Produce the findings list

For each finding, output:

### F<<N>> — <<short title>>

- **Probe:** A | B | C | D | E | F
- **Surface:** <<file path / route / module / doc>>
- **Observation:** <<concrete description; one paragraph max>>
- **Severity:** trivial | minor | substantive | invariant-violation
- **Reproduction:** <<how to surface this again, ideally a command or a click path>>
- **Suggested category:** closed | in-budget | deferred
- **If deferred, suggested milestone parent:** M<<N+1>>.<<x>>

Findings with severity: invariant-violation are blocking: they short-circuit the round and route to the implementer. Surface them at the top of your output.
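If it helps to hold findings as data before writing them out, the template above maps onto a small record type. This is a sketch under assumptions: the `Finding` class, `render`, and `ordered` are hypothetical helpers, and the two example findings are invented. The heading line in `render` copies the template's punctuation as-is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """One finding, with fields mirroring the template above."""
    number: int
    title: str
    probe: str          # A | B | C | D | E | F
    surface: str
    observation: str
    severity: str       # trivial | minor | substantive | invariant-violation
    reproduction: str
    category: str       # closed | in-budget | deferred
    milestone_parent: Optional[str] = None


def render(finding):
    """Render one finding in the template's markdown layout."""
    lines = [
        f"### F{finding.number} — {finding.title}",
        "",
        f"- **Probe:** {finding.probe}",
        f"- **Surface:** {finding.surface}",
        f"- **Observation:** {finding.observation}",
        f"- **Severity:** {finding.severity}",
        f"- **Reproduction:** {finding.reproduction}",
        f"- **Suggested category:** {finding.category}",
    ]
    if finding.category == "deferred":
        lines.append(
            f"- **If deferred, suggested milestone parent:** {finding.milestone_parent}"
        )
    return "\n".join(lines)


def ordered(findings):
    """Blocking findings first; sorted() is stable, so other order is kept."""
    return sorted(findings, key=lambda f: f.severity != "invariant-violation")


# Invented example findings.
examples = [
    Finding(1, "Rollback runbook missing", "B", "docs/runbooks/",
            "No rollback procedure is documented.", "substantive",
            "Search the runbooks for a rollback section.", "deferred", "M7.1"),
    Finding(2, "Write outside archive", "D", "src/scope.py",
            "A write path is not guarded.", "invariant-violation",
            "Run the scope writer with a relative path.", "deferred", "M7.1"),
]
print("\n\n".join(render(f) for f in ordered(examples)))
```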

Step 3 — Sanity check before handing back

No finding has category: closed and severity: substantive simultaneously. Substantive things must defer; closing them silently is the failure mode MR-2 prevents.
Every deferred finding names a suggested milestone parent (M<<N+1>>.<<x>>) under which it can take an M<<N+1>>.x.y slot. Ungrounded defers are how phase N+1 ends up undecidable.
No invariant-violations slipped into the in-budget bucket. Those bypass the budget by design.
The findings list is the round’s only output. Do not modify code or docs during the round.
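The first three checks above are mechanical and can be expressed as a short validation pass. This is a minimal sketch, assuming each finding is held as a dict with `severity`, `category`, and `milestone_parent` keys (hypothetical names chosen to match the template). The fourth check, that nothing was modified during the round, is not something this code can verify.

```python
def sanity_errors(findings):
    """Return one message per rule violation; an empty list means clean."""
    errors = []
    for i, f in enumerate(findings, start=1):
        label = f"F{i}"
        if f["category"] == "closed" and f["severity"] == "substantive":
            errors.append(f"{label}: substantive finding marked closed")
        if f["category"] == "deferred" and not f.get("milestone_parent"):
            errors.append(f"{label}: deferred without a milestone slot")
        if f["severity"] == "invariant-violation" and f["category"] == "in-budget":
            errors.append(f"{label}: invariant violation in the in-budget bucket")
    return errors


# Invented example data: the first three entries each break one rule.
findings = [
    {"severity": "substantive", "category": "closed"},
    {"severity": "minor", "category": "deferred", "milestone_parent": None},
    {"severity": "invariant-violation", "category": "in-budget"},
    {"severity": "minor", "category": "deferred", "milestone_parent": "M7.2"},
]
for message in sanity_errors(findings):
    print(message)
```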

Step 4 — Hand back to the reviewer

End your turn with:

Any blocking (invariant-violation) findings, flagged at the top.
Total findings count.
Counts by severity and suggested category.
The findings list.

The reviewer will pick this up and walk the categorisation and decision-doc steps.
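The closing tallies can be computed rather than counted by hand. The sketch below assumes findings are dicts with `id`, `severity`, and `category` keys; those key names and the `summarize` helper are illustrative, not part of any existing tooling.

```python
from collections import Counter


def summarize(findings):
    """Build the hand-back summary: totals, counts, and blocking IDs."""
    return {
        "total": len(findings),
        "by_severity": dict(Counter(f["severity"] for f in findings)),
        "by_category": dict(Counter(f["category"] for f in findings)),
        "blocking": [
            f["id"] for f in findings if f["severity"] == "invariant-violation"
        ],
    }


# Invented example data.
findings = [
    {"id": "F1", "severity": "invariant-violation", "category": "deferred"},
    {"id": "F2", "severity": "minor", "category": "in-budget"},
    {"id": "F3", "severity": "minor", "category": "deferred"},
]
print(summarize(findings))
```

A non-empty `blocking` list is the signal to put those findings first in the hand-back.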

Notes for the human

Verification rounds feel slow the first few times. They’re not slow; they’re substituting for the much slower drift you’d otherwise pay for two phases later. The KAHN project’s M6.5 round is the canonical reference — it surfaced “M6.3.1 live-dot redesign” and “M6.2.2 cross-view consistency” as Phase 7.2.1 / 7.2.2 carry-overs, and the Phase 7 plan worked because of it.
A round with zero findings is not a sign of perfection; it’s a sign you didn’t probe deeply enough. Push back.
A round with >10 substantive findings is a sign the phase decomposition was wrong, not that the phase failed. Bring the deferred count to me; we may need to re-scope phase N+1 before opening it.
If a finding is genuinely an invariant violation, do not classify it as friction. Block. Route to implementer. The friction budget is for things the phase chose to leave; invariant violations are things the phase failed to do.