# 102 Full Sweep Dry-Run Triage Design
> Date: 2026-04-19
> Status: Draft
> Upstream Result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
> Upstream Report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`
## Design Intent
Split the non-pass buckets from the `102` scene full sweep into concrete, actionable triage categories without changing generator behavior or promoting scene status.
The design answers:
`why did 62 scenes not become dry-run auto-pass, and which blocker should be handled first?`
## Starting Point
The upstream dry-run produced:
| Status | Count |
| --- | ---: |
| `auto-pass` | 40 |
| `fail-closed-known` | 26 |
| `misclassified` | 5 |
| `source-unreadable` | 31 |
| `missing-source` | 0 |
| `unsupported-family` | 0 |
| Total | 102 |
The triage scope is only the `62` non-pass records (`26` fail-closed-known + `5` misclassified + `31` source-unreadable).
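
As a quick arithmetic check, the bucket counts from the table can be summed to confirm the total and the non-pass scope. This is only a sketch over the numbers quoted above; it does not read the upstream JSON.

```python
# Bucket counts copied from the status table above.
counts = {
    "auto-pass": 40,
    "fail-closed-known": 26,
    "misclassified": 5,
    "source-unreadable": 31,
    "missing-source": 0,
    "unsupported-family": 0,
}

total = sum(counts.values())
non_pass = total - counts["auto-pass"]

print(total, non_pass)  # 102 62
```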
## Scope Guardrails
1. do not edit `src/generated_scene/analyzer.rs`
2. do not edit `src/generated_scene/generator.rs`
3. do not change scene generation logic
4. do not update `scene_execution_board_2026-04-18.json`
5. do not promote scenes from this triage
6. do not add family baselines
7. do not create implementation plans from a single failure
8. do not rerun outside the fixed `102` scene set
## Fixed Inputs
1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
3. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
4. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
## Fixed Outputs
1. triage result: `tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json`
2. triage report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md`
## Triage Order
The order is fixed:
1. timeout triage
2. misclassification triage
3. no-report failure triage
This order is deliberate:
1. timeouts are the largest bucket and include already-mapped `G2` scenes
2. misclassification has the cleanest routing-quality signal
3. no-report failures are too broad until the higher-signal buckets are separated
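
The fixed order above could be applied to dry-run records roughly as follows. This is a minimal sketch: `dryRunStatus` and the reason string come from this design, while the `reason` field name and the helper names are assumptions for illustration.

```python
# Fixed triage order from this design.
TRIAGE_ORDER = ["timeout", "misclassification", "no-report-failure"]

NO_REPORT_REASON = "generator failed without generation report"


def triage_bucket(record):
    """Map one dry-run record to a triage bucket, or None if out of scope.

    Assumes the failure reason is stored under a "reason" key (hypothetical).
    """
    status = record.get("dryRunStatus")
    if status == "source-unreadable":
        return "timeout"
    if status == "misclassified":
        return "misclassification"
    if status == "fail-closed-known" and record.get("reason") == NO_REPORT_REASON:
        return "no-report-failure"
    # auto-pass records and the bootstrap_target failure are not routed here.
    return None


def order_for_triage(records):
    """Return (bucket, record) pairs sorted by the fixed triage order."""
    routed = [(triage_bucket(r), r) for r in records]
    routed = [(bucket, r) for bucket, r in routed if bucket is not None]
    # sorted() is stable, so records keep their input order within a bucket.
    return sorted(routed, key=lambda pair: TRIAGE_ORDER.index(pair[0]))
```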
## Timeout Triage Model
Input bucket:
`dryRunStatus = source-unreadable`
Current count:
`31`
Current reason:
`generator timeout after 30s`
Target second-level labels:
1. `timeout-known-family-sample`
2. `timeout-unvalidated-source`
3. `timeout-large-source`
4. `timeout-command-hang`
5. `timeout-generator-slow-but-progressing`
6. `timeout-undetermined`
Minimum evidence per timeout record:
1. source directory exists
2. file count
3. total source bytes
4. current group
5. current board status
6. real sample record id if present
7. whether a partial skill directory exists
8. whether a partial generation report exists
Diagnostic reruns are allowed only for classification. A rerun that succeeds with a longer timeout does not promote the scene.
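
The filesystem-derived evidence items (1, 2, 3, 7, 8) could be gathered with a helper like the one below. It is a sketch only: the function name, the output keys, and the way the partial skill directory and report path are passed in are assumptions, since this design does not fix the layout under the dry-run output root.

```python
from pathlib import Path


def collect_timeout_evidence(source_dir, skill_dir, report_path):
    """Collect filesystem evidence for one timeout record.

    All three paths are supplied by the caller; their layout is an assumption.
    """
    source = Path(source_dir)
    # Walk the source tree only if the directory actually exists.
    files = [p for p in source.rglob("*") if p.is_file()] if source.is_dir() else []
    return {
        "sourceDirExists": source.is_dir(),
        "fileCount": len(files),
        "totalSourceBytes": sum(p.stat().st_size for p in files),
        "partialSkillDirExists": Path(skill_dir).is_dir(),
        "partialReportExists": Path(report_path).is_file(),
    }
```

The remaining items (current group, board status, real sample record id) would come from the execution board and the dry-run result rather than from the filesystem.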
## Misclassification Triage Model
Input bucket:
`dryRunStatus = misclassified`
Current count:
`5`
Current shape:
1. `G3 -> host_bridge_workflow`: `3`
2. `G1-E -> host_bridge_workflow`: `2`
Target second-level labels:
1. `route-overprefer-host-bridge`
2. `board-expectation-stale`
3. `mixed-workflow-host-bridge-valid`
4. `scene-family-split-needed`
5. `misclassification-undetermined`
Minimum evidence per misclassification record:
1. board expected group
2. expected archetype
3. dry-run inferred archetype
4. current source asset
5. real sample layer status
6. generated report path
7. failed or conflicting signal summary
This phase does not correct routing logic.
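
The "current shape" breakdown above can be reproduced by grouping misclassified records by expected group and inferred archetype. A small sketch, assuming each record exposes `expectedGroup` and `inferredArchetype` keys (hypothetical names, not confirmed by the dry-run schema):

```python
from collections import Counter


def misclassification_shape(records):
    """Count records per 'expected group -> inferred archetype' pair."""
    return Counter(
        f'{r["expectedGroup"]} -> {r["inferredArchetype"]}' for r in records
    )


# Illustrative records matching the counts quoted in this section.
sample = (
    [{"expectedGroup": "G3", "inferredArchetype": "host_bridge_workflow"}] * 3
    + [{"expectedGroup": "G1-E", "inferredArchetype": "host_bridge_workflow"}] * 2
)
shape = misclassification_shape(sample)
```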
## No-Report Failure Triage Model
Input bucket:
`dryRunStatus = fail-closed-known` and reason is `generator failed without generation report`
Current count:
`25`
Target failure stages:
1. `source-scan`
2. `analyzer`
3. `ir-assembly`
4. `readiness-before-report`
5. `compiler-package-write`
6. `panic-or-process-error`
7. `unknown-no-report`
The one `bootstrap_target` failure, which accounts for the remaining `fail-closed-known` record (`26 - 25 = 1`), stays separately tracked and is not merged into the no-report failures.
Minimum evidence per no-report record:
1. exit code if available
2. stdout tail
3. stderr tail
4. partial skill directory exists
5. partial references directory exists
6. generated report exists
7. inferred failure stage
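
One possible shape for a no-report evidence record is sketched below. The stage names are taken from the list above; the function names, output keys, the 20-line tail length, and the assumption that the references directory sits at `references/` inside the skill directory are all illustrative. The stage defaults to `unknown-no-report` until evidence supports something more specific.

```python
from pathlib import Path

FAILURE_STAGES = {
    "source-scan",
    "analyzer",
    "ir-assembly",
    "readiness-before-report",
    "compiler-package-write",
    "panic-or-process-error",
    "unknown-no-report",
}


def tail(text, max_lines=20):
    """Return the last max_lines lines of text (empty if max_lines <= 0)."""
    if max_lines <= 0:
        return ""
    return "\n".join(text.splitlines()[-max_lines:])


def no_report_evidence(exit_code, stdout, stderr, skill_dir, report_path,
                       stage="unknown-no-report"):
    """Build the minimum evidence record for one no-report failure."""
    if stage not in FAILURE_STAGES:
        raise ValueError(f"unknown failure stage: {stage}")
    skill = Path(skill_dir)
    return {
        "exitCode": exit_code,  # may be None if not available
        "stdoutTail": tail(stdout),
        "stderrTail": tail(stderr),
        "partialSkillDirExists": skill.is_dir(),
        # Assumed location of the references directory.
        "partialReferencesDirExists": (skill / "references").is_dir(),
        "generatedReportExists": Path(report_path).is_file(),
        "inferredFailureStage": stage,
    }
```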
## Result Schema
Top-level fields:
```json
{
  "triageDate": "2026-04-19",
  "scope": "102-full-sweep-dry-run-triage",
  "sourceDryRun": "tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json",
  "summary": {},
  "timeoutTriage": [],
  "misclassificationTriage": [],
  "noReportFailureTriage": [],
  "bootstrapTargetFailures": [],
  "recommendations": []
}
```
Each triage record keeps the original dry-run scene id and scene name.
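
A triage result with these top-level fields could be assembled as sketched below. The top-level keys mirror the schema above; the per-record keys `sceneId` and `sceneName` and both helper names are assumptions, since this design only states that the scene id and scene name are kept.

```python
import json


def new_triage_result():
    """Return an empty triage result with the top-level fields of this design."""
    return {
        "triageDate": "2026-04-19",
        "scope": "102-full-sweep-dry-run-triage",
        "sourceDryRun": "tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json",
        "summary": {},
        "timeoutTriage": [],
        "misclassificationTriage": [],
        "noReportFailureTriage": [],
        "bootstrapTargetFailures": [],
        "recommendations": [],
    }


def make_record(dry_run_record, **triage_fields):
    """Carry the original scene id and name over, then add triage fields.

    "sceneId" and "sceneName" are hypothetical key names.
    """
    record = {
        "sceneId": dry_run_record["sceneId"],
        "sceneName": dry_run_record["sceneName"],
    }
    record.update(triage_fields)
    return record


result = new_triage_result()
result["timeoutTriage"].append(
    make_record({"sceneId": "example-id", "sceneName": "example scene"},
                label="timeout-undetermined")
)
serialized = json.dumps(result, ensure_ascii=False, indent=2)
```

Passing `ensure_ascii=False` keeps any non-ASCII scene names readable in the written JSON.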
## Completion Criteria
This triage is complete when:
1. all `31` timeout records have a second-level timeout label
2. all `5` misclassified records have a routing triage label
3. all `25` no-report failures have an inferred failure stage
4. the `bootstrap_target` case remains separately visible
5. no scene status is promoted
6. no generator or analyzer logic is changed
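
Criteria 1 through 4 are mechanical and could be checked against the triage JSON with something like the sketch below. The label sets and expected counts come from this design; the per-record keys `label` and `inferredFailureStage`, and the function name, are assumptions. Criteria 5 and 6 concern the execution board and the source tree, so they are not covered here.

```python
TIMEOUT_LABELS = {
    "timeout-known-family-sample", "timeout-unvalidated-source",
    "timeout-large-source", "timeout-command-hang",
    "timeout-generator-slow-but-progressing", "timeout-undetermined",
}
ROUTING_LABELS = {
    "route-overprefer-host-bridge", "board-expectation-stale",
    "mixed-workflow-host-bridge-valid", "scene-family-split-needed",
    "misclassification-undetermined",
}
FAILURE_STAGES = {
    "source-scan", "analyzer", "ir-assembly", "readiness-before-report",
    "compiler-package-write", "panic-or-process-error", "unknown-no-report",
}

# (section, expected record count, label field, allowed values)
CHECKS = [
    ("timeoutTriage", 31, "label", TIMEOUT_LABELS),
    ("misclassificationTriage", 5, "label", ROUTING_LABELS),
    ("noReportFailureTriage", 25, "inferredFailureStage", FAILURE_STAGES),
]


def check_completion(triage):
    """Return a list of problems; an empty list means criteria 1-4 hold."""
    problems = []
    for section, expected, field, allowed in CHECKS:
        records = triage.get(section, [])
        if len(records) != expected:
            problems.append(f"{section}: expected {expected} records, got {len(records)}")
        for record in records:
            if record.get(field) not in allowed:
                problems.append(f"{section}: invalid {field} {record.get(field)!r}")
    # The bootstrap_target case must stay visible as its own entry.
    if len(triage.get("bootstrapTargetFailures", [])) != 1:
        problems.append("bootstrapTargetFailures: expected exactly 1 record")
    return problems
```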
## Stop Rule
Stop after publishing the triage JSON and report.
Do not start implementation fixes from this triage unless a new, bounded implementation plan is explicitly created later.