Worked failure modes
Each scenario below shows the verb invocation, the structured failure output, and the fleet’s recovery path. These are the cases CI fixtures should exercise once a sandbox repo is wired up.
1. Schema validation rejects malformed input
Cause: the fleet sent a slug containing uppercase characters and an underscore.

    echo '{"title":"x","slug":"BAD_SLUG","context":"x","decision":"x","alternatives_considered":[{"name":"a","rejection_reason":"b"},{"name":"c","rejection_reason":"d"}]}' | \
      petrova open_decision kahn-hq

Output:

    open_decision → failed c4d5e6f7...
      ✗ FIELD_PATTERN must match pattern "^[a-z0-9]+(-[a-z0-9]+){0,5}$" (/params/slug)

Recovery: the fleet rewrites the slug as lowercase kebab-case and retries. No side effects occurred: the verb aborted before any GitHub call.
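A fleet can avoid this round trip by checking the slug locally before invoking the verb. The sketch below reuses the pattern quoted in the error output; `is_valid_slug` and `to_slug` are hypothetical helper names, not part of the petrova CLI.

```python
import re

# Pattern copied verbatim from the FIELD_PATTERN error message.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+){0,5}$")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug matches the pattern the verb reports."""
    return SLUG_PATTERN.fullmatch(slug) is not None


def to_slug(text: str) -> str:
    """Hypothetical repair helper: lowercase the text and turn every run of
    characters outside [a-z0-9] into a single hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
```

`to_slug` can still produce more than six hyphen-separated segments or an empty string, so its result should be passed through `is_valid_slug` before the retry.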
2. Repo not in registry
Cause: the fleet is trying to act on a repo that has not been onboarded yet.

    petrova diagnose unknown-project

Output:

    diagnose → failed ...
      ✗ REPO_NOT_IN_REGISTRY 'unknown-project' not found. Known: petrova-hq, kahn-hq

Recovery: Do not retry. Surface to a human. The onboarding flow (TASKSET 8’s petrova-onboard) opens a registry-update PR; until that merges, the repo is invisible to the verb surface.
3. Privileged path
Cause: the fleet’s diagnostic concluded that a workflow file needs editing.

    echo '{"title":"ci tweak","rationale":"x","files":[{"path":".github/workflows/test.yml","operation":"modify","contents":"..."}],"grounding":[{"kind":"finding","ref":"docs/findings/x.md"}]}' | \
      petrova request_review kahn-hq --actor fleet:kahn-implementer

Output:

    request_review → failed ...
      ✗ NO_PRIVILEGED_PATHS path '.github/workflows/test.yml' is privileged; verb refuses to edit it

Recovery: Do not retry. Surface to a human; CI workflow edits require human review specifically because they affect every future PR’s gating. The fleet records the recommendation in a finding doc and lets the human take it from there.
4. Idempotency match (re-run of same input)
Cause: the fleet retried after a crash, with the same verb and the same inputs.

    petrova propose_fix kahn-hq --input /tmp/fix.json --apply
    # (first run earlier today already opened PR #1234)

Output:

    propose_fix → skipped_idempotent 4f8c3e21d97a
      upholds: MR-7, MR-12
      PR #1234 petrova/propose-fix/4f8c3e21
      https://github.com/kahn-hq/kahn/pull/1234

Recovery: No action needed. The fleet treats this as success: the PR exists and branch protection takes over from here. Logging an event for audit purposes is fine, but downstream pipelines should not fire a second time.
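The "success, but do not double-fire" rule can be made explicit in the fleet's result handling. This is a minimal sketch: the status strings `applied`, `skipped_idempotent` and `failed` come from the outputs in this document, while `handle_outcome` and its return shape are assumptions made for illustration.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    success: bool          # should the fleet consider the verb call done?
    fire_downstream: bool  # should downstream pipelines be triggered?


def handle_outcome(status: str) -> Outcome:
    """Map a verb status to fleet behaviour (hypothetical helper)."""
    if status == "applied":
        # First successful run: the PR was just opened.
        return Outcome(success=True, fire_downstream=True)
    if status == "skipped_idempotent":
        # The PR already exists from an earlier run: success, but anything
        # downstream already fired the first time.
        return Outcome(success=True, fire_downstream=False)
    # "failed" and any unknown status: treat as not done, fire nothing.
    return Outcome(success=False, fire_downstream=False)
```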
5. Stale diagnosis
Cause: the fleet is trying to propose_fix against a diagnosis older than 24h (e.g. a run resumed after the weekend).

    petrova propose_fix kahn-hq --input /tmp/fix.json

Output:

    propose_fix → failed ...
      ✗ DIAGNOSIS_EXISTS diagnosis older than 24h; re-run 'petrova diagnose kahn-hq'

Recovery: Re-run petrova diagnose kahn-hq, capture the new diagnosis_id, update /tmp/fix.json, and retry.
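A fleet that resumes long-running work can check the diagnosis age itself before calling propose_fix. The sketch below assumes the fleet stores its own timezone-aware timestamp when it runs the diagnosis; the 24h limit comes from the error message, while `is_stale` and the way the timestamp is obtained are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Limit taken from the error text: "diagnosis older than 24h".
MAX_DIAGNOSIS_AGE = timedelta(hours=24)


def is_stale(diagnosed_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the diagnosis is older than the 24h limit.

    Both datetimes must be timezone-aware. `now` defaults to the current
    UTC time and is a parameter only so the check is easy to test.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - diagnosed_at > MAX_DIAGNOSIS_AGE
```

When `is_stale` returns True, the fleet re-runs the diagnosis first instead of spending a verb call on a guaranteed failure.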
6. Strict profile blocks auto-merge
Cause: the fleet attempted request_merge_when_green on a strict-profile repo (e.g. petrova-hq itself).

    echo '{"title":"x","rationale":"x","files":[{"path":"NOTES.md","operation":"modify","contents":"x"}],"merge_method":"squash","grounding":[{"kind":"finding","ref":"x"}]}' | \
      petrova request_merge_when_green petrova-hq --actor fleet:demo

Output:

    request_merge_when_green → failed ...
      ✗ PROFILE_PERMITS_AUTOMERGE repo 'petrova-hq' has profile 'strict'; auto-merge is forbidden — use request_review instead

Recovery: Switch to request_review. The input shape is the same minus merge_method. A human reviews and merges manually.
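Because the request_review input is the request_merge_when_green input without merge_method, the fallback payload can be derived mechanically. A minimal sketch, with `to_review_input` as a hypothetical helper name:

```python
import json


def to_review_input(merge_input: dict) -> dict:
    """Return a copy of a request_merge_when_green payload with
    merge_method removed, leaving the original dict untouched."""
    return {key: value for key, value in merge_input.items() if key != "merge_method"}


# Payload taken from the invocation in this scenario.
merge_payload = json.loads(
    '{"title":"x","rationale":"x","files":[{"path":"NOTES.md",'
    '"operation":"modify","contents":"x"}],"merge_method":"squash",'
    '"grounding":[{"kind":"finding","ref":"x"}]}'
)
review_payload = to_review_input(merge_payload)
```

The resulting `review_payload` can be serialized with `json.dumps` and piped to petrova request_review in the same way as the original invocation.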
7. Fleet not allowed
Cause: a new fleet is trying to act on a repo before its ID was added to fleets_allowed.

    echo '{"title":"x","rationale":"x","files":[{"path":"NOTES.md","operation":"modify","contents":"x"}],"grounding":[{"kind":"finding","ref":"x"}]}' | \
      petrova request_review kahn-hq --actor fleet:new-experimental-fleet

Output:

    request_review → failed ...
      ✗ FLEETS_ALLOWED fleet 'new-experimental-fleet' not in registry.yaml fleets_allowed for kahn-hq

Recovery: Do not retry until the registry is updated. The registry update is itself a request_review verb call against petrova-hq, which opens a PR adding the fleet ID. Once that PR is merged, retry.
8. Auth missing on apply
Cause: the fleet tried --apply without PETROVA_GITHUB_TOKEN set.

    unset PETROVA_GITHUB_TOKEN
    petrova open_decision kahn-hq --input /tmp/decision.json --apply

Output:

    error: AUTH_MISSING — set PETROVA_GITHUB_TOKEN before --apply

(The CLI’s apply guard rejects before any verb code runs.)

Recovery: Set the credentials in the environment and retry. If running in CI, this means the fleet’s deployment is missing a secret: surface it to the operator immediately and do not silently swallow the error.
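A fleet can mirror the apply guard at startup so that a missing secret is reported once, before any work is queued. The sketch below only checks for the PETROVA_GITHUB_TOKEN variable named in the error message; `missing_apply_credentials` is a hypothetical helper and does not validate the token itself.

```python
import os
from typing import Mapping, Optional


def missing_apply_credentials(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if PETROVA_GITHUB_TOKEN is unset or empty.

    `env` defaults to the process environment; it is a parameter only so
    the check can be exercised without mutating os.environ.
    """
    if env is None:
        env = os.environ
    return not env.get("PETROVA_GITHUB_TOKEN")
```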
9. CI red on the resulting PR
This is not a verb failure: the verb returned applied. CI runs asynchronously after the PR opens and may go red.

Detection: poll petrova diagnose <repo> --scope ci or watch the PR status checks. The fleet’s response options:

- Iterate: compose a follow-up propose_fix whose proposed_changes fix the failure. New idempotency key (different inputs), new PR.
- Surface: if the failure is structural or beyond the fleet’s pattern set, leave the PR open with a comment naming the failure and let a human take over.
- Close: if the change was wrong, close the PR via gh pr close (the fleet does NOT use a verb for this; closing without merging is a human-or-fleet GitHub-level action, not a verb event).
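The three options can be encoded as a small decision function. This sketch assumes the fleet has already classified the failure into two booleans; the function name, the inputs, and the order in which they are checked are all illustrative assumptions rather than documented behaviour.

```python
def choose_ci_response(change_was_wrong: bool, failure_in_pattern_set: bool) -> str:
    """Pick one of the three documented responses to a red CI run.

    change_was_wrong: the fleet concluded the change itself should not land.
    failure_in_pattern_set: the failure matches something the fleet knows
        how to fix with a follow-up propose_fix.
    """
    if change_was_wrong:
        return "close"     # close the PR via gh pr close, no verb involved
    if failure_in_pattern_set:
        return "iterate"   # follow-up propose_fix, new inputs, new PR
    return "surface"       # leave the PR open with a comment for a human
```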
10. Concurrent edit invalidates SHA
Cause: between dry-run and apply, someone (or another fleet) merged a change to the same file. The verb’s stored base SHA is stale.

Output during apply:

    ... → failed ...
      ✗ FILE_ALREADY_EXISTS (or 422 Unprocessable Entity from GitHub)

Recovery: Re-run the verb. The emitter fetches a fresh SHA on each apply attempt. If the failure reflects a real semantic conflict (both edits target the same lines), the fleet should re-diagnose and re-compose with awareness of the new content.
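Since each apply attempt fetches a fresh SHA, a bounded retry is usually enough for the non-semantic case. The sketch below assumes the fleet wraps the verb call in a callable returning a dict with `status` and `code` keys; that result shape, the `apply_with_retry` name, and the limit of three attempts are hypothetical.

```python
from typing import Callable, Dict

# Failure code shown in the output above for a stale base SHA.
RETRYABLE_CODES = {"FILE_ALREADY_EXISTS"}


def apply_with_retry(run_verb: Callable[[], Dict[str, str]], max_attempts: int = 3) -> Dict[str, str]:
    """Re-run the verb while it fails with a stale-SHA code.

    Returns the first result that is not a retryable failure, or the last
    failure once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result: Dict[str, str] = {}
    for _ in range(max_attempts):
        result = run_verb()
        retryable = result.get("status") == "failed" and result.get("code") in RETRYABLE_CODES
        if not retryable:
            return result
    return result
```

If the last result is still a retryable failure, the conflict is probably semantic, and the fleet should fall back to re-diagnosing and re-composing the change.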