241 lines
6.5 KiB
Markdown
241 lines
6.5 KiB
Markdown
# 102 Full Sweep Dry-Run Triage Plan
|
|
|
|
> Date: 2026-04-19
|
|
> Status: Draft
|
|
> Upstream Spec: `docs/superpowers/specs/2026-04-19-102-full-sweep-dry-run-triage-design.md`
|
|
|
|
## Plan Intent
|
|
|
|
Turn the `62` non-pass records from the full sweep into concrete triage buckets while staying measurement-only.
|
|
|
|
The plan must not fix generator failures. It only explains them.
|
|
|
|
## Fixed Inputs
|
|
|
|
1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
|
|
2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
|
|
3. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
|
|
4. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
|
|
|
|
## Fixed Outputs
|
|
|
|
1. triage result: `tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json`
|
|
2. triage report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md`
|
|
|
|
## Non-Negotiable Scope Guardrails
|
|
|
|
1. do not edit analyzer implementation
|
|
2. do not edit generator implementation
|
|
3. do not update `scene_execution_board_2026-04-18.json`
|
|
4. do not promote any scene
|
|
5. do not add new family baselines
|
|
6. do not start implementation correction during triage
|
|
7. do not expand beyond the fixed `102` scene set
|
|
|
|
## Workstreams
|
|
|
|
1. `WS1` Timeout Triage
|
|
2. `WS2` Misclassification Triage
|
|
3. `WS3` No-Report Failure Triage
|
|
4. `WS4` Publish Triage Result
|
|
|
|
## Phase 0: Freeze Triage Boundary
|
|
|
|
### Objective
|
|
|
|
Make the triage a classification exercise only.
|
|
|
|
### Tasks
|
|
|
|
1. read the upstream dry-run result
|
|
2. verify the upstream result has `102` scenes
|
|
3. verify non-pass buckets are:
|
|
- `31` timeout records
|
|
- `5` misclassified records
|
|
- `25` no-report records
|
|
- `1` bootstrap-target record
|
|
4. freeze the triage order:
|
|
- timeout first
|
|
- misclassification second
|
|
- no-report third
|
|
|
|
### Deliverables
|
|
|
|
1. frozen triage input statement
|
|
2. frozen non-pass bucket counts
|
|
3. frozen triage order
|
|
|
|
### Acceptance Criteria
|
|
|
|
1. triage input count is stable
|
|
2. no code is changed
|
|
3. no board status is updated
|
|
|
|
## Phase 1: Timeout Triage
|
|
|
|
### Objective
|
|
|
|
Split the `31` timeout records into second-level reasons.
|
|
|
|
### Tasks
|
|
|
|
1. select records where `dryRunStatus = source-unreadable`
|
|
2. verify reason is `generator timeout after 30s`
|
|
3. collect source directory metadata:
|
|
- source directory exists
|
|
- file count
|
|
- total source bytes
|
|
- largest file path
|
|
- largest file bytes
|
|
4. collect dry-run artifact metadata:
|
|
- generated skill directory exists
|
|
- references directory exists
|
|
- generation report exists
|
|
5. preserve board context:
|
|
- current group
|
|
- current status
|
|
- current source asset
|
|
- real sample record id
|
|
6. optionally run one diagnostic longer-timeout attempt for classification only
|
|
7. assign one timeout label:
|
|
- `timeout-known-family-sample`
|
|
- `timeout-unvalidated-source`
|
|
- `timeout-large-source`
|
|
- `timeout-command-hang`
|
|
- `timeout-generator-slow-but-progressing`
|
|
- `timeout-undetermined`
|
|
|
|
### Deliverables
|
|
|
|
1. `timeoutTriage[]` records in the triage JSON
|
|
2. timeout label summary
|
|
3. timeout size/source metadata summary
|
|
|
|
### Acceptance Criteria
|
|
|
|
1. all `31` timeout records have a second-level label
|
|
2. no timeout is treated as unsupported family by default
|
|
3. no long-timeout rerun result promotes a scene
|
|
|
|
## Phase 2: Misclassification Triage
|
|
|
|
### Objective
|
|
|
|
Explain the `5` board-vs-archetype conflicts.
|
|
|
|
### Tasks
|
|
|
|
1. select records where `dryRunStatus = misclassified`
|
|
2. preserve:
|
|
- board expected group
|
|
- expected archetype
|
|
- inferred archetype
|
|
- current source asset
|
|
- real sample layer status
|
|
3. inspect existing dry-run report path when present
|
|
4. collect route-conflict evidence:
|
|
- whether host bridge evidence dominates
|
|
- whether G3 or G1-E evidence is still present
|
|
- whether current board expectation came from baseline or expansion
|
|
5. assign one routing triage label:
|
|
- `route-overprefer-host-bridge`
|
|
- `board-expectation-stale`
|
|
- `mixed-workflow-host-bridge-valid`
|
|
- `scene-family-split-needed`
|
|
- `misclassification-undetermined`
|
|
|
|
### Deliverables
|
|
|
|
1. `misclassificationTriage[]` records in the triage JSON
|
|
2. routing conflict summary
|
|
3. high-priority routing risk list
|
|
|
|
### Acceptance Criteria
|
|
|
|
1. all `5` misclassified records have a routing label
|
|
2. no routing code is changed
|
|
3. the report identifies whether implementation correction is justified later
|
|
|
|
## Phase 3: No-Report Failure Triage
|
|
|
|
### Objective
|
|
|
|
Split the `25` generic no-report failures into concrete failure stages.
|
|
|
|
### Tasks
|
|
|
|
1. select records where:
|
|
- `dryRunStatus = fail-closed-known`
|
|
- `reason = generator failed without generation report`
|
|
2. collect command artifacts:
|
|
- exit code
|
|
- stdout tail
|
|
- stderr tail
|
|
3. inspect output artifacts:
|
|
- skill directory exists
|
|
- references directory exists
|
|
- any report file exists
|
|
4. infer one failure stage:
|
|
- `source-scan`
|
|
- `analyzer`
|
|
- `ir-assembly`
|
|
- `readiness-before-report`
|
|
- `compiler-package-write`
|
|
- `panic-or-process-error`
|
|
- `unknown-no-report`
|
|
5. keep `bootstrap_target` failure separate
|
|
|
|
### Deliverables
|
|
|
|
1. `noReportFailureTriage[]` records in the triage JSON
|
|
2. `bootstrapTargetFailures[]` records in the triage JSON
|
|
3. failure-stage summary
|
|
|
|
### Acceptance Criteria
|
|
|
|
1. all `25` no-report failures have an inferred failure stage
|
|
2. the `bootstrap_target` case is not hidden in the no-report bucket
|
|
3. every non-pass record remains explainable without implementation changes
|
|
|
|
## Phase 4: Publish Triage Result
|
|
|
|
### Objective
|
|
|
|
Publish a bounded triage result and stop.
|
|
|
|
### Tasks
|
|
|
|
1. write `full_sweep_dry_run_triage_2026-04-19.json`
|
|
2. write `2026-04-19-102-full-sweep-dry-run-triage-report.md`
|
|
3. include:
|
|
- timeout triage summary
|
|
- misclassification triage summary
|
|
- no-report triage summary
|
|
- recommended next blocker
|
|
4. explicitly state that the triage does not promote scenes or start fixes
|
|
|
|
### Deliverables
|
|
|
|
1. triage JSON
|
|
2. triage report
|
|
|
|
### Acceptance Criteria
|
|
|
|
1. all `62` non-pass records are covered
|
|
2. every non-pass record has a second-level explanation
|
|
3. the report identifies the next blocker without implementing it
|
|
4. no generator/analyzer file is modified
|
|
5. `scene_execution_board_2026-04-18.json` is not modified
|
|
|
|
## Completion Criteria
|
|
|
|
This plan is complete when:
|
|
|
|
1. `31` timeout records have timeout labels
|
|
2. `5` misclassified records have routing labels
|
|
3. `25` no-report failures have failure stages
|
|
4. `1` bootstrap-target failure is separately tracked
|
|
5. the triage JSON and report are published
|
|
6. execution stops without implementation work
|
|
|