# 102 Full Sweep Dry-Run Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-102-full-sweep-dry-run-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-102-full-sweep-dry-run-design.md)
## Plan Intent
Run one bounded, read-only full sweep over the `102`-scene ledger to measure the actual generic `scene -> skill` coverage.
The plan answers:
`how many of the 102 scenes can the current generic analyzer/generator handle today?`
## Scope Guardrails
1. do not change analyzer logic
2. do not change generator logic
3. do not promote scenes into `scene_execution_board_2026-04-18.json`
4. do not add new family baselines
5. do not create new family implementation plans
6. do not fix failures during this dry-run
7. do not run outside the fixed `102` scene set
## Fixed Inputs
1. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
3. generator command: `cargo run --bin sg_scene_generate`
## Fixed Outputs
1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
3. report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`
## Workstreams
1. `WS1` Build Scene Inventory
2. `WS2` Run Analyzer/Generator Dry-Run
3. `WS3` Classify Results
4. `WS4` Publish Coverage Report
## Phase 0: Freeze Dry-Run Boundary
### Objective
Make the dry-run a measurement exercise only.
### Tasks
1. freeze the execution board input
2. freeze the local scene root
3. freeze the dry-run output paths
4. explicitly mark the run as read-only with respect to generator behavior and board status
### Deliverables
1. fixed input statement
2. fixed output statement
3. dry-run no-promotion statement
### Acceptance Criteria
1. no analyzer/generator implementation file is edited for this dry-run
2. `scene_execution_board_2026-04-18.json` is not modified by dry-run results
3. failures are recorded, not fixed
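
One way to check criterion 2 mechanically is to record a content digest of the execution board before the sweep and compare it afterwards. The following is a minimal sketch under that assumption; the helper names `file_digest` and `assert_unchanged` are hypothetical and not part of the existing tooling.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_unchanged(path: Path, digest_before: str) -> None:
    """Raise if the file content no longer matches the digest recorded earlier."""
    if file_digest(path) != digest_before:
        raise RuntimeError(f"{path} was modified during the dry-run")
```

In use, `file_digest` would be called on `scene_execution_board_2026-04-18.json` before Phase 1 and `assert_unchanged` after Phase 4. The same check could be applied to the analyzer/generator source files if a stricter guard is wanted.
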
## Phase 1: Build Scene Inventory
### Objective
Construct a deterministic inventory of all `102` scene names and expected source directories.
### Tasks
1. read `scene_execution_board_2026-04-18.json`
2. extract all scene entries
3. map each scene name to `D:/desk/智能体资料/全量业务场景/一平台场景/<sceneName>`
4. check whether each source directory exists
5. assign initial inventory status:
   - `source-present`
   - `missing-source`
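
Tasks 3 to 5 above can be sketched as a small pure function. This is only an illustration: it assumes the scene names have already been extracted from the board (the board's JSON layout is not described in this plan), and the record field names `sceneName`, `sourcePath`, and `status` are hypothetical.

```python
from pathlib import Path


def build_inventory(scene_names: list[str], scene_root: Path) -> list[dict]:
    """Map each scene name to its expected source directory and record whether it exists."""
    inventory = []
    # Sorting is one simple way to keep the inventory order deterministic.
    for name in sorted(scene_names):
        source = scene_root / name
        status = "source-present" if source.is_dir() else "missing-source"
        inventory.append(
            {"sceneName": name, "sourcePath": source.as_posix(), "status": status}
        )
    return inventory
```

A missing directory only changes the `status` field; it never raises, which matches the requirement that a missing source must not stop the sweep.
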
### Deliverables
1. inventory section inside `full_sweep_dry_run_2026-04-19.json`
2. missing-source list
### Acceptance Criteria
1. inventory count equals `102`
2. every scene has a source path
3. missing source does not stop the sweep
## Phase 2: Run Analyzer/Generator Dry-Run
### Objective
Attempt current generic generation for every source-present scene without fixing failures.
### Tasks
1. generate a stable safe scene id for each scene
2. invoke `sg_scene_generate` for each source-present scene
3. write outputs under `examples/full_sweep_dry_run_2026-04-19`
4. for successful generation, read `references/generation-report.json`
5. for failed generation, capture stderr/stdout and exit code
6. continue until all `102` scenes are processed
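
The tasks above could be driven by two helpers like the ones below. This is a hedged sketch, not the project's implementation: the id scheme (slug plus short hash) is an assumption, and because this plan does not list the arguments that `sg_scene_generate` accepts, the full command line is passed in by the caller rather than guessed here.

```python
import hashlib
import re
import subprocess


def safe_scene_id(scene_name: str) -> str:
    """Derive a stable, filesystem-safe id from a scene name (hypothetical scheme)."""
    # Keep ASCII letters/digits as a readable slug; everything else becomes "-".
    slug = re.sub(r"[^a-z0-9]+", "-", scene_name.lower()).strip("-")
    # A short content hash keeps ids distinct even when the slug is empty or collides.
    digest = hashlib.sha256(scene_name.encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{digest}" if slug else f"scene-{digest}"


def run_generator(command: list[str], timeout_s: float = 600.0) -> dict:
    """Run one generator invocation and capture its outcome without raising."""
    try:
        proc = subprocess.run(
            command,
            capture_output=True,
            text=True,
            encoding="utf-8",
            errors="replace",
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Launch failures and timeouts are recorded, not propagated.
        return {"exitCode": None, "stdout": "", "stderr": str(exc)}
    return {"exitCode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}
```

Because `run_generator` always returns a record, a loop over the inventory can continue past any single failure, which is what acceptance criterion 2 of this phase requires.
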
### Deliverables
1. per-scene dry-run execution record
2. generated output root for successful scenes
3. captured error messages for failed scenes
### Acceptance Criteria
1. every source-present scene has a generator result
2. no failure aborts the full sweep
3. generator results are isolated under the dry-run output root
## Phase 3: Classify Results
### Objective
Turn raw dry-run output into actionable coverage categories.
### Tasks
1. classify scenes generated with `A/B` readiness and no blocker as `auto-pass`
2. classify scenes that the generator blocks for a known gate/contract reason as `fail-closed-known`
3. classify obvious family mismatch as `misclassified`
4. classify evidence outside current families as `unsupported-family`
5. classify absent directories as `missing-source`
6. classify read/analyze failures as `source-unreadable`
7. compute top blockers by frequency
8. compute counts by inferred archetype
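
The classification rules above might be encoded as a single function that returns exactly one status per scene. The sketch below is illustrative only: the record fields (`sourcePresent`, `readable`, `familySupported`, `familyMismatch`, `blocker`, `readiness`) are hypothetical, the precedence between rules is an assumption, and `A/B` is read here as readiness level `A` or `B`.

```python
from collections import Counter


def classify(record: dict) -> str:
    """Return one final dry-run status for a scene record (field names are hypothetical)."""
    if not record.get("sourcePresent", False):
        return "missing-source"
    if not record.get("readable", True):
        return "source-unreadable"
    if not record.get("familySupported", True):
        return "unsupported-family"
    if record.get("familyMismatch", False):
        return "misclassified"
    if record.get("blocker"):
        return "fail-closed-known"
    if record.get("readiness") in ("A", "B"):
        return "auto-pass"
    # Surface records that fit no category instead of silently mislabeling them.
    raise ValueError(f"unclassifiable record: {record!r}")


def top_blockers(records: list[dict], limit: int = 5) -> list[tuple[str, int]]:
    """Count blocker reasons across records, most frequent first."""
    counts = Counter(r["blocker"] for r in records if r.get("blocker"))
    return counts.most_common(limit)
```

Raising on an unclassifiable record is a deliberate choice in this sketch: it makes gaps in the category list visible during the sweep rather than hiding them in the summary counts.
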
### Deliverables
1. final dry-run status per scene
2. summary counts
3. by-archetype counts
4. top-blocker list
### Acceptance Criteria
1. every scene has exactly one final status
2. total classified count equals `102`
3. every non-pass scene has a reason
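
These three criteria lend themselves to an automated check before the report is written. The following is a minimal sketch, assuming each result is a dict with hypothetical `sceneName`, `status`, and `reason` fields; it returns a list of violations instead of raising so that all problems are reported at once.

```python
from collections import Counter

ALLOWED_STATUSES = {
    "auto-pass",
    "fail-closed-known",
    "misclassified",
    "unsupported-family",
    "missing-source",
    "source-unreadable",
}


def validate_results(results: list[dict], expected_total: int = 102) -> list[str]:
    """Return acceptance-criteria violations; an empty list means all checks pass."""
    problems = []
    if len(results) != expected_total:
        problems.append(f"expected {expected_total} scenes, found {len(results)}")
    # A scene appearing twice would mean it has more than one final status.
    for name, count in Counter(r.get("sceneName") for r in results).items():
        if count > 1:
            problems.append(f"{name}: {count} records")
    for r in results:
        status = r.get("status")
        if status not in ALLOWED_STATUSES:
            problems.append(f"{r.get('sceneName')}: invalid status {status!r}")
        elif status != "auto-pass" and not r.get("reason"):
            problems.append(f"{r.get('sceneName')}: non-pass status without a reason")
    return problems
```
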
## Phase 4: Publish Report
### Objective
Answer the coverage question without changing project state.
### Tasks
1. write `full_sweep_dry_run_2026-04-19.json`
2. write `2026-04-19-102-full-sweep-dry-run-report.md`
3. report these four headline numbers:
   - `real-sample executed pass`
   - `code-backed ledger coverage`
   - `dry-run auto-pass`
   - `dry-run actionable coverage`
4. list the next recommended blocker to address, but do not start implementation
### Deliverables
1. dry-run JSON
2. dry-run report
### Acceptance Criteria
1. report states the actual generic coverage over the `102` scenes
2. report separates proven coverage from predicted/dry-run coverage
3. report does not promote scene status
## Completion Criteria
This plan is complete when:
1. all `102` scenes are included in the dry-run result
2. the dry-run result has stable summary counts
3. the report explains the gap between `5/102`, `23/102`, and dry-run coverage
4. no generator logic or execution board status is modified
## Non-Negotiable Stop Rule
After this dry-run starts:
1. do not fix generator failures inside the sweep
2. do not create new family implementation plans from a single failure
3. do not update the execution board automatically
4. stop after publishing the dry-run result and report