Worked failure modes

Each scenario below shows the verb invocation, the structured failure output, and the fleet’s recovery path. These are the cases CI fixtures should exercise once a sandbox repo is wired up.

1. Schema validation rejects malformed input

Cause: the fleet sent a slug containing uppercase characters and an underscore.

echo '{"title":"x","slug":"BAD_SLUG","context":"x","decision":"x","alternatives_considered":[{"name":"a","rejection_reason":"b"},{"name":"c","rejection_reason":"d"}]}' | \
  petrova open_decision kahn-hq

Output:

open_decision → failed  c4d5e6f7...
  ✗ FIELD_PATTERN must match pattern "^[a-z0-9]+(-[a-z0-9]+){0,5}$" (/params/slug)

Recovery: The fleet rewrites the slug in lowercase kebab-case (here, bad-slug) and retries. No side effects occurred: the verb aborted before any GitHub call.
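A fleet can avoid this round trip by normalizing slugs before calling the verb. The sketch below is a hypothetical fleet-side helper (normalize_slug and SLUG_RE are not part of petrova). It reuses the pattern from the FIELD_PATTERN message above and still validates the result, because normalization can produce something the pattern rejects, such as more than six segments.

```bash
#!/usr/bin/env bash
# Hypothetical fleet-side helper; not part of the petrova CLI.
# Pattern copied from the FIELD_PATTERN error: 1 to 6 lowercase alphanumeric segments.
SLUG_RE='^[a-z0-9]+(-[a-z0-9]+){0,5}$'

# Lowercase the input, collapse every run of other characters into "-",
# trim leading/trailing "-", then validate the result against SLUG_RE.
normalize_slug() {
  local slug
  slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' |
    sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  if [[ $slug =~ $SLUG_RE ]]; then
    printf '%s\n' "$slug"
  else
    echo "cannot normalize '$1' into a valid slug" >&2
    return 1
  fi
}

normalize_slug BAD_SLUG   # prints: bad-slug
```

Normalizing mechanically can change meaning, so a fleet may prefer to regenerate the slug from the title instead; the validation step is the part worth keeping either way.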

2. Repo not in registry

Cause: the fleet tried to act on a repo that has not been onboarded yet.

petrova diagnose unknown-project

Output:

diagnose → failed  ...
  ✗ REPO_NOT_IN_REGISTRY 'unknown-project' not found. Known: petrova-hq, kahn-hq

Recovery: Do not retry; surface to a human. The onboarding flow (TASKSET 8’s petrova-onboard) opens a registry-update PR, and until that PR merges the repo is invisible to the verb surface.
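Choosing between retry and surface means pulling the failure code out of the output. A minimal sketch, assuming the `  ✗ CODE message` line format shown in these examples stays stable; it is human-readable output, so a machine-readable mode, if the CLI offers one, would be sturdier. The helper names are hypothetical, and the terminal set lists only the codes this page marks "Do not retry".

```bash
#!/usr/bin/env bash
# Hypothetical helpers; they parse petrova's human-readable output.

# Print the first failure code from output such as:
#   "  ✗ REPO_NOT_IN_REGISTRY 'unknown-project' not found. ..."
failure_code() {
  sed -n 's/^[[:space:]]*✗[[:space:]]*\([[:upper:]_]*\).*/\1/p' | head -n 1
}

# Succeed for the codes this page marks "Do not retry" (surface to a human).
is_terminal() {
  case "$1" in
    REPO_NOT_IN_REGISTRY | NO_PRIVILEGED_PATHS | FLEETS_ALLOWED) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage would look like `code=$(petrova diagnose unknown-project 2>&1 | failure_code)` followed by `is_terminal "$code"`; stderr is merged because this page does not say which stream carries the failure lines.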

3. Privileged path

Cause: the fleet’s diagnosis concluded that a workflow file needs editing.

echo '{"title":"ci tweak","rationale":"x","files":[{"path":".github/workflows/test.yml","operation":"modify","contents":"..."}],"grounding":[{"kind":"finding","ref":"docs/findings/x.md"}]}' | \
  petrova request_review kahn-hq --actor fleet:kahn-implementer

Output:

request_review → failed  ...
  ✗ NO_PRIVILEGED_PATHS path '.github/workflows/test.yml' is privileged; verb refuses to edit it

Recovery: Do not retry; surface to a human. CI workflow edits require human review because they change the gating of every future PR. The fleet records its recommendation in a finding doc and hands off to the human from there.
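The fleet can also refuse privileged paths itself before composing a request. This is a hypothetical pre-flight check; the prefix list is an assumption, since this page only confirms that .github/workflows/ paths are privileged, and petrova's own NO_PRIVILEGED_PATHS check remains authoritative.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check. The prefix list is an assumption: this page
# only confirms that .github/workflows/ paths are privileged.
PRIVILEGED_PREFIXES=(".github/workflows/")

# Succeed if the path starts with any privileged prefix.
is_privileged_path() {
  local path=$1 prefix
  for prefix in "${PRIVILEGED_PREFIXES[@]}"; do
    [[ $path == "$prefix"* ]] && return 0
  done
  return 1
}
```

When the check fires, the fleet can go straight to writing the finding doc instead of spending a verb call on a guaranteed refusal.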

4. Idempotency match (re-run of same input)

Cause: the fleet retried after a crash with the same verb and the same inputs.

petrova propose_fix kahn-hq --input /tmp/fix.json --apply
# (first run earlier today already opened PR #1234)

Output:

propose_fix → skipped_idempotent  4f8c3e21d97a
  upholds: MR-7, MR-12
  PR #1234 petrova/propose-fix/4f8c3e21 https://github.com/kahn-hq/kahn/pull/1234

Recovery: No action needed. The fleet treats this as success: the PR exists and branch protection takes over from here. Logging an event for audit purposes is fine, but downstream pipelines should not fire a second time.
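Treating skipped_idempotent as success means the fleet's wrapper should key off the status word on the first output line rather than on whether a new PR appeared. A sketch with hypothetical helper names, assuming the `verb → status  key` first-line format shown here:

```bash
#!/usr/bin/env bash
# Hypothetical helpers built on the first output line, e.g.
#   "propose_fix → skipped_idempotent  4f8c3e21d97a"

# Print the status word from line 1 of the verb output.
verb_status() {
  sed -n '1s/^[^ ]* → \([a-z_]*\).*/\1/p'
}

# "applied" and "skipped_idempotent" both mean the PR exists: success.
is_success_status() {
  case "$1" in
    applied | skipped_idempotent) return 0 ;;
    *) return 1 ;;
  esac
}
```

Downstream triggers can then fire only on `applied`, while audit logging runs for both statuses.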

5. Stale diagnosis

Cause: the fleet ran propose_fix against a diagnosis older than 24h (e.g. after resuming from a weekend pause).

petrova propose_fix kahn-hq --input /tmp/fix.json

Output:

propose_fix → failed  ...
  ✗ DIAGNOSIS_EXISTS diagnosis older than 24h; re-run 'petrova diagnose kahn-hq'

Recovery: Re-run petrova diagnose kahn-hq, capture the new diagnosis_id, update /tmp/fix.json with it, and retry.
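To avoid spending a call on a diagnosis it already knows is stale, a fleet could check the age itself before propose_fix. This sketch assumes the fleet records the diagnosis creation time as a Unix epoch (how it stores that is hypothetical) and mirrors the 24h limit from the error message; the verb's own check still decides.

```bash
#!/usr/bin/env bash
# Mirrors the 24h limit named in the DIAGNOSIS_EXISTS message.
MAX_DIAGNOSIS_AGE=$((24 * 60 * 60))

# Usage: diagnosis_is_stale CREATED_EPOCH [NOW_EPOCH]
# Succeeds when the diagnosis is strictly older than MAX_DIAGNOSIS_AGE seconds.
# Where CREATED_EPOCH comes from is a hypothetical fleet-side record.
diagnosis_is_stale() {
  local created=$1 now=${2:-$(date +%s)}
  (( now - created > MAX_DIAGNOSIS_AGE ))
}
```

A resumed fleet would call `diagnosis_is_stale "$created_at"` first and, if it succeeds, re-run petrova diagnose kahn-hq before touching /tmp/fix.json.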

6. Strict profile blocks auto-merge

Cause: the fleet attempted request_merge_when_green on a repo with the strict profile (e.g. petrova-hq itself).

echo '{"title":"x","rationale":"x","files":[{"path":"NOTES.md","operation":"modify","contents":"x"}],"merge_method":"squash","grounding":[{"kind":"finding","ref":"x"}]}' | \
  petrova request_merge_when_green petrova-hq --actor fleet:demo

Output:

request_merge_when_green → failed  ...
  ✗ PROFILE_PERMITS_AUTOMERGE repo 'petrova-hq' has profile 'strict'; auto-merge is forbidden — use request_review instead

Recovery: Switch to request_review, using the same input minus merge_method. A human reviews and merges manually.
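The input rewrite for the switch is mechanical. For the compact, single-line JSON used in these examples a sed sketch is enough; for anything else, `jq 'del(.merge_method)'` is the safer tool. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop the merge_method field from compact one-line JSON
# so a request_merge_when_green input can be reused for request_review.
# The first expression handles a non-leading field, the second a leading one.
strip_merge_method() {
  sed -E 's/,"merge_method":"[^"]*"//; s/"merge_method":"[^"]*",?//'
}
```

The rewritten input then pipes into the same invocation shape as scenario 3, e.g. `... | strip_merge_method | petrova request_review petrova-hq --actor fleet:demo`.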

7. Fleet not allowed

Cause: a new fleet tried to act on a repo before its ID was added to fleets_allowed.

echo '{"title":"x","rationale":"x","files":[{"path":"NOTES.md","operation":"modify","contents":"x"}],"grounding":[{"kind":"finding","ref":"x"}]}' | \
  petrova request_review kahn-hq --actor fleet:new-experimental-fleet

Output:

request_review → failed  ...
  ✗ FLEETS_ALLOWED fleet 'new-experimental-fleet' not in registry.yaml fleets_allowed for kahn-hq

Recovery: Do not retry until the registry is updated. The registry update is itself a request_review call against petrova-hq that opens a PR adding the fleet ID. Once that PR merges, retry.
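The actor is passed as fleet:<id>, while the error reports the bare ID. A hypothetical local pre-check strips the prefix and compares against an allowlist; here the allowlist is passed as arguments, whereas the real source of truth is fleets_allowed in registry.yaml, whose exact layout this page does not show.

```bash
#!/usr/bin/env bash
# Usage: fleet_allowed ACTOR ALLOWED_ID...
# Hypothetical local pre-check. ACTOR uses the --actor form "fleet:<id>";
# the allowlist arguments stand in for fleets_allowed in registry.yaml.
fleet_allowed() {
  local id=${1#fleet:} allowed
  shift
  for allowed in "$@"; do
    [[ $id == "$allowed" ]] && return 0
  done
  return 1
}
```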

8. Auth missing on apply

Cause: the fleet ran with --apply but without PETROVA_GITHUB_TOKEN set.

unset PETROVA_GITHUB_TOKEN
petrova open_decision kahn-hq --input /tmp/decision.json --apply

Output:

error: AUTH_MISSING — set PETROVA_GITHUB_TOKEN before --apply

(The CLI’s apply guard rejects before any verb code runs.)

Recovery: Set the credentials in the environment and retry. If this happens in CI, the fleet’s deployment is missing a secret: surface it to an operator immediately rather than swallowing the error silently.
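In a fleet wrapper, the same guard can run before the CLI is invoked, so a missing secret fails loudly at the top of the job. A minimal sketch; require_token is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper guard: fail loudly if the token is unset or empty,
# instead of letting the job continue without applying anything.
require_token() {
  if [[ -z ${PETROVA_GITHUB_TOKEN:-} ]]; then
    echo "error: PETROVA_GITHUB_TOKEN is not set; refusing to run --apply" >&2
    return 1
  fi
}
```

A job script would call `require_token || exit 1` immediately before `petrova open_decision kahn-hq --input /tmp/decision.json --apply`, so the non-zero exit reaches whatever alerting watches the job.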

9. CI red on the resulting PR

Not a verb failure: the verb returned applied, but CI runs asynchronously after the PR opens and may go red.

Detection: poll petrova diagnose <repo> --scope ci, or watch the PR’s status checks. The fleet has three response options:

Iterate: compose a follow-up propose_fix whose proposed_changes fix the failure. Because the inputs differ, it gets a new idempotency key and opens a new PR.
Surface: if the failure is structural or beyond the fleet’s pattern set, leave the PR open with a comment naming the failure and let a human take over.
Close: if the change was wrong, close the PR with gh pr close. The fleet does NOT use a verb for this: closing without merging is a GitHub-level action taken by a human or the fleet, not a verb event.
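Polling for CI status is a bounded loop. This sketch is generic: the check command is whatever the fleet uses (for example a wrapper around petrova diagnose <repo> --scope ci), and the contract that it exits 0 once checks have settled (green or red) and non-zero while they are still pending is an assumption, not documented petrova behavior.

```bash
#!/usr/bin/env bash
# Usage: poll_until MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it exits 0 or MAX_ATTEMPTS is reached, sleeping between
# attempts. Returns 0 if CMD succeeded, 1 if attempts ran out.
poll_until() {
  local max=$1 delay=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= max; attempt++ )); do
    "$@" && return 0
    # Skip the sleep after the final attempt.
    (( attempt < max )) && sleep "$delay"
  done
  return 1
}
```

Once the loop returns 0, the fleet inspects the settled result and picks Iterate, Surface, or Close; running out of attempts is itself a reason to surface.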

10. Concurrent edit invalidates SHA

Cause: between dry-run and apply, someone (or another fleet) merged a change to the same file, so the verb’s stored base SHA is stale.

Output during apply:

... → failed  ...
  ✗ FILE_ALREADY_EXISTS  (or 422 Unprocessable Entity from GitHub)

Recovery: Re-run the verb; the emitter fetches a fresh SHA on each apply attempt. If the conflict is semantic (both edits target the same lines), the fleet should re-diagnose and re-compose the change against the new content.
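Because the emitter fetches a fresh SHA per apply attempt, a bounded re-run handles the non-semantic case. A sketch with a hypothetical helper name; it matches the two conflict signatures shown above, and once attempts run out the fleet falls back to re-diagnosing.

```bash
#!/usr/bin/env bash
# Usage: retry_on_conflict MAX_ATTEMPTS CMD [ARGS...]
# Runs CMD up to MAX_ATTEMPTS times while its output shows a stale-SHA conflict,
# then prints the last output. Returns 0 once an attempt shows no conflict
# (which may still be some other failure) and 1 if every attempt conflicted.
retry_on_conflict() {
  local max=$1 attempt out
  shift
  for (( attempt = 1; attempt <= max; attempt++ )); do
    out=$("$@" 2>&1)
    if ! grep -qE 'FILE_ALREADY_EXISTS|422 Unprocessable' <<<"$out"; then
      printf '%s\n' "$out"
      return 0
    fi
  done
  printf '%s\n' "$out"
  return 1
}
```

Wrapped around `petrova propose_fix kahn-hq --input /tmp/fix.json --apply`, a return of 1 is the signal to re-diagnose rather than keep retrying.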