Files
claw/docs/superpowers/plans/2026-04-19-102-full-sweep-dry-run-triage-plan.md

241 lines
6.5 KiB
Markdown

# 102 Full Sweep Dry-Run Triage Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: `docs/superpowers/specs/2026-04-19-102-full-sweep-dry-run-triage-design.md`
## Plan Intent
Turn the `62` non-pass records from the full sweep into concrete triage buckets while staying measurement-only.
The plan must not fix generator failures. It only explains them.
## Fixed Inputs
1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
3. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
4. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
## Fixed Outputs
1. triage result: `tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json`
2. triage report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md`
## Non-Negotiable Scope Guardrails
1. do not edit analyzer implementation
2. do not edit generator implementation
3. do not update `scene_execution_board_2026-04-18.json`
4. do not promote any scene
5. do not add new family baselines
6. do not start implementation correction during triage
7. do not expand beyond the fixed `102` scene set
## Workstreams
1. `WS1` Timeout Triage
2. `WS2` Misclassification Triage
3. `WS3` No-Report Failure Triage
4. `WS4` Publish Triage Result
## Phase 0: Freeze Triage Boundary
### Objective
Make the triage a classification exercise only.
### Tasks
1. read the upstream dry-run result
2. verify the upstream result has `102` scenes
3. verify non-pass buckets are:
- `31` timeout records
- `5` misclassified records
- `25` no-report records
- `1` bootstrap-target record
4. freeze the triage order:
- timeout first
- misclassification second
- no-report third
### Deliverables
1. frozen triage input statement
2. frozen non-pass bucket counts
3. frozen triage order
### Acceptance Criteria
1. triage input count is stable
2. no code is changed
3. no board status is updated
## Phase 1: Timeout Triage
### Objective
Split the `31` timeout records into second-level reasons.
### Tasks
1. select records where `dryRunStatus = source-unreadable`
2. verify reason is `generator timeout after 30s`
3. collect source directory metadata:
- source directory exists
- file count
- total source bytes
- largest file path
- largest file bytes
4. collect dry-run artifact metadata:
- generated skill directory exists
- references directory exists
- generation report exists
5. preserve board context:
- current group
- current status
- current source asset
- real sample record id
6. optionally run one diagnostic longer-timeout attempt for classification only
7. assign one timeout label:
- `timeout-known-family-sample`
- `timeout-unvalidated-source`
- `timeout-large-source`
- `timeout-command-hang`
- `timeout-generator-slow-but-progressing`
- `timeout-undetermined`
### Deliverables
1. `timeoutTriage[]` records in the triage JSON
2. timeout label summary
3. timeout size/source metadata summary
### Acceptance Criteria
1. all `31` timeout records have a second-level label
2. no timeout is treated as unsupported family by default
3. no long-timeout rerun result promotes a scene
## Phase 2: Misclassification Triage
### Objective
Explain the `5` board-vs-archetype conflicts.
### Tasks
1. select records where `dryRunStatus = misclassified`
2. preserve:
- board expected group
- expected archetype
- inferred archetype
- current source asset
- real sample layer status
3. inspect existing dry-run report path when present
4. collect route-conflict evidence:
- whether host bridge evidence dominates
- whether G3 or G1-E evidence is still present
- whether current board expectation came from baseline or expansion
5. assign one routing triage label:
- `route-overprefer-host-bridge`
- `board-expectation-stale`
- `mixed-workflow-host-bridge-valid`
- `scene-family-split-needed`
- `misclassification-undetermined`
### Deliverables
1. `misclassificationTriage[]` records in the triage JSON
2. routing conflict summary
3. high-priority routing risk list
### Acceptance Criteria
1. all `5` misclassified records have a routing label
2. no routing code is changed
3. the report identifies whether implementation correction is justified later
## Phase 3: No-Report Failure Triage
### Objective
Split the `25` generic no-report failures into concrete failure stages.
### Tasks
1. select records where:
- `dryRunStatus = fail-closed-known`
- `reason = generator failed without generation report`
2. collect command artifacts:
- exit code
- stdout tail
- stderr tail
3. inspect output artifacts:
- skill directory exists
- references directory exists
- any report file exists
4. infer one failure stage:
- `source-scan`
- `analyzer`
- `ir-assembly`
- `readiness-before-report`
- `compiler-package-write`
- `panic-or-process-error`
- `unknown-no-report`
5. keep `bootstrap_target` failure separate
### Deliverables
1. `noReportFailureTriage[]` records in the triage JSON
2. `bootstrapTargetFailures[]` records in the triage JSON
3. failure-stage summary
### Acceptance Criteria
1. all `25` no-report failures have an inferred failure stage
2. the `bootstrap_target` case is not hidden in the no-report bucket
3. every non-pass record remains explainable without implementation changes
## Phase 4: Publish Triage Result
### Objective
Publish a bounded triage result and stop.
### Tasks
1. write `full_sweep_dry_run_triage_2026-04-19.json`
2. write `2026-04-19-102-full-sweep-dry-run-triage-report.md`
3. include:
- timeout triage summary
- misclassification triage summary
- no-report triage summary
- recommended next blocker
4. explicitly state that the triage does not promote scenes or start fixes
### Deliverables
1. triage JSON
2. triage report
### Acceptance Criteria
1. all `62` non-pass records are covered
2. every non-pass record has a second-level explanation
3. the report identifies the next blocker without implementing it
4. no generator/analyzer file is modified
5. `scene_execution_board_2026-04-18.json` is not modified
## Completion Criteria
This plan is complete when:
1. `31` timeout records have timeout labels
2. `5` misclassified records have routing labels
3. `25` no-report failures have failure stages
4. `1` bootstrap-target failure is separately tracked
5. the triage JSON and report are published
6. execution stops without implementation work