# 102 Full Sweep Dry-Run Plan

> Date: 2026-04-19

> Status: Draft

> Upstream Spec: [2026-04-19-102-full-sweep-dry-run-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-102-full-sweep-dry-run-design.md)

## Plan Intent

Run one bounded, read-only full sweep over the `102`-scene ledger to measure actual generic `scene -> skill` coverage.

The plan answers one question:

`how many of the 102 scenes can the current generic analyzer/generator handle today?`

## Scope Guardrails

1. do not change analyzer logic
2. do not change generator logic
3. do not promote scenes into `scene_execution_board_2026-04-18.json`
4. do not add new family baselines
5. do not create new family implementation plans
6. do not fix failures during this dry-run
7. do not run outside the fixed `102` scene set

## Fixed Inputs

1. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
3. generator command: `cargo run --bin sg_scene_generate`

## Fixed Outputs

1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
3. report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`
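
A minimal Python sketch of how a harness might hold these fixed inputs and outputs in one immutable object, so later phases read paths from a single place instead of re-typing them. The plan does not prescribe a harness language; `DryRunPaths` and `scene_dir` are hypothetical names, and the `<scene root>/<sceneName>` convention is the one Phase 1 uses.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DryRunPaths:
    """Hypothetical holder for the dry-run's fixed inputs and outputs."""

    # Fixed inputs
    board: str = "tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json"
    scene_root: str = "D:/desk/智能体资料/全量业务场景/一平台场景"
    generator_cmd: tuple[str, ...] = ("cargo", "run", "--bin", "sg_scene_generate")
    # Fixed outputs
    result_json: str = "tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json"
    output_root: str = "examples/full_sweep_dry_run_2026-04-19"
    report_md: str = "docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md"

    def scene_dir(self, scene_name: str) -> str:
        # PurePosixPath keeps the forward slashes used in the plan on every OS.
        return str(PurePosixPath(self.scene_root) / scene_name)
```

Because the dataclass is frozen, assigning to any field after construction raises `dataclasses.FrozenInstanceError`, which is one cheap way to enforce the Phase 0 freeze inside the harness itself.
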

## Workstreams

1. `WS1` Build Scene Inventory
2. `WS2` Run Analyzer/Generator Dry-Run
3. `WS3` Classify Results
4. `WS4` Publish Coverage Report

## Phase 0: Freeze Dry-Run Boundary

### Objective

Make the dry-run a measurement exercise only.

### Tasks

1. freeze the execution board input
2. freeze the local scene root
3. freeze the dry-run output paths
4. explicitly mark the run as read-only with respect to generator behavior and board status

### Deliverables

1. fixed input statement
2. fixed output statement
3. dry-run no-promotion statement

### Acceptance Criteria

1. no analyzer/generator implementation file is edited for this dry-run
2. `scene_execution_board_2026-04-18.json` is not modified by dry-run results
3. failures are recorded, not fixed
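
Criterion 2 can be checked mechanically rather than by inspection: fingerprint the board before the sweep and compare afterwards. The sketch below assumes a byte-for-byte SHA-256 comparison is the desired strictness (any reformatting of the file would count as a modification); `fingerprint` and `assert_unchanged` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 hex digest of the file's exact bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_unchanged(path: Path, snapshot: str) -> None:
    """Fail loudly if the file differs from the snapshot taken before the sweep."""
    current = fingerprint(path)
    if current != snapshot:
        raise RuntimeError(
            f"{path} changed during the dry-run ({snapshot[:12]} -> {current[:12]})"
        )
```

Typical use is to call `fingerprint(board)` once before Phase 1 and `assert_unchanged(board, snapshot)` after Phase 4.
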

## Phase 1: Build Scene Inventory

### Objective

Construct a deterministic inventory of all `102` scene names and their expected source directories.

### Tasks

1. read `scene_execution_board_2026-04-18.json`
2. extract all scene entries
3. map each scene name to `D:/desk/智能体资料/全量业务场景/一平台场景/<sceneName>`
4. check whether each source directory exists
5. assign an initial inventory status:
   - `source-present`
   - `missing-source`
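
The inventory tasks above can be sketched as a single pass over the board. This is an illustration in Python, not the project's tooling, and the board schema is an assumption: it treats the board as a JSON object whose `scenes` array holds entries with a `sceneName` field, and exposes both keys as parameters so they can be pointed at the real schema. A missing directory is recorded as `missing-source` rather than raised, which is what keeps the sweep going.

```python
import json
from pathlib import Path


def build_inventory(board_path: Path, scene_root: Path,
                    list_key: str = "scenes", name_key: str = "sceneName") -> list[dict]:
    """One inventory row per board entry, in board order (hence deterministic).

    ASSUMPTION: `list_key` and `name_key` describe the real board schema.
    """
    board = json.loads(board_path.read_text(encoding="utf-8"))
    inventory = []
    for entry in board[list_key]:
        name = entry[name_key]
        source = scene_root / name
        inventory.append({
            "sceneName": name,
            "sourcePath": source.as_posix(),
            # Record, never raise: a missing source must not stop the sweep.
            "inventoryStatus": "source-present" if source.is_dir() else "missing-source",
        })
    return inventory
```

With the real board, acceptance criterion 1 reduces to `assert len(inventory) == 102`.
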

### Deliverables

1. inventory section inside `full_sweep_dry_run_2026-04-19.json`
2. missing-source list

### Acceptance Criteria

1. inventory count equals `102`
2. every scene has a source path
3. a missing source does not stop the sweep

## Phase 2: Run Analyzer/Generator Dry-Run

### Objective

Attempt current generic generation for every source-present scene without fixing failures.

### Tasks

1. generate a stable, safe scene id for each scene
2. invoke `sg_scene_generate` for each source-present scene
3. write outputs under `examples/full_sweep_dry_run_2026-04-19`
4. for successful generation, read `references/generation-report.json`
5. for failed generation, capture stdout, stderr, and the exit code
6. continue until all `102` scenes are processed
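
Two of these tasks benefit from a concrete shape: the safe scene id and failure capture. The Python sketch below shows one hypothetical id scheme (a zero-padded ordinal plus the first 8 hex digits of the SHA-1 of the UTF-8 scene name, giving ASCII-only directory names for the Chinese scene names) and a runner that turns every outcome, including a missing binary or a timeout, into a record instead of an exception. This plan does not list the arguments `sg_scene_generate` accepts, so `run_one` takes the complete command; the caller is expected to append whatever the generator actually accepts after `cargo run --bin sg_scene_generate --`.

```python
import hashlib
import subprocess


def safe_scene_id(index: int, scene_name: str) -> str:
    """Hypothetical scheme: ordinal plus a short hash of the UTF-8 scene name."""
    digest = hashlib.sha1(scene_name.encode("utf-8")).hexdigest()
    return f"scene-{index:03d}-{digest[:8]}"


def run_one(cmd: list[str], timeout_s: float = 600.0) -> dict:
    """Run one generator invocation and record the outcome instead of raising.

    The 600 s timeout is an arbitrary placeholder, not a value from the plan.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              encoding="utf-8", errors="replace", timeout=timeout_s)
        return {"exitCode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Launch failures and timeouts become results, not sweep aborts.
        return {"exitCode": None, "stdout": "", "stderr": repr(exc)}
```

The id stays stable across reruns only as long as the board order does; dropping the ordinal makes it order-independent at the cost of readability.
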

### Deliverables

1. per-scene dry-run execution record
2. generated output root for successful scenes
3. captured error messages for failed scenes

### Acceptance Criteria

1. every source-present scene has a generator result
2. no failure aborts the full sweep
3. generator results are isolated under the dry-run output root
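
A minimal sketch of a sweep loop that satisfies all three criteria, assuming each inventory row already carries a hypothetical `sceneId` field (the safe id from task 1) and an `inventoryStatus` of `source-present` or `missing-source`. The `generate` callable is injected, so the loop can be exercised without the real generator; any exception it raises is recorded against that scene, and an output path that would resolve outside the dry-run root is refused rather than written.

```python
from pathlib import Path
from typing import Callable


def is_isolated(output_dir: Path, output_root: Path) -> bool:
    """True when output_dir resolves to a location strictly inside output_root."""
    return output_root.resolve() in output_dir.resolve().parents


def sweep(inventory: list[dict], output_root: Path,
          generate: Callable[[dict, Path], dict]) -> list[dict]:
    """Attempt every source-present scene; no single failure stops the loop."""
    results = []
    for row in inventory:
        if row["inventoryStatus"] != "source-present":
            # Missing sources are carried through with no generator result.
            results.append({**row, "generator": None})
            continue
        out_dir = output_root / row["sceneId"]
        if not is_isolated(out_dir, output_root):
            # Refuse to write outside the dry-run root, but record it and move on.
            outcome = {"exitCode": None, "stdout": "",
                       "stderr": "output directory escapes the dry-run root"}
        else:
            try:
                outcome = generate(row, out_dir)
            except Exception as exc:
                # A crash becomes this scene's result instead of aborting the sweep.
                outcome = {"exitCode": None, "stdout": "", "stderr": repr(exc)}
        results.append({**row, "outputDir": out_dir.as_posix(), "generator": outcome})
    return results
```
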

## Phase 3: Classify Results

### Objective

Turn raw dry-run output into actionable coverage categories.

### Tasks

1. classify scenes whose generation shows `A/B` readiness with no blocker as `auto-pass`
2. classify scenes the generator blocks for a known gate/contract reason as `fail-closed-known`
3. classify scenes with an obvious family mismatch as `misclassified`
4. classify scenes whose evidence falls outside the current families as `unsupported-family`
5. classify scenes with absent source directories as `missing-source`
6. classify scenes that fail during read/analyze as `source-unreadable`
7. compute top blockers by frequency
8. compute counts by inferred archetype
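
The classification rules could be encoded roughly as below. This is a sketch built on several assumptions that should be checked against the real `generation-report.json`: the report fields `readiness`, `blockers`, and `archetype` are assumed names, and readiness is assumed to be a single grade such as `"A"` or `"B"`. Because `misclassified` and `unsupported-family` depend on judgment ("obvious" mismatch), the sketch takes them from an optional reviewer label (`review`, `reviewNote`) instead of guessing them. A source-present scene with no report is treated as `source-unreadable`, and any scene that produced a report but is not `auto-pass` is treated as `fail-closed-known`, with its blockers (or its readiness grade) as the reason.

```python
from collections import Counter
from typing import Optional

READY_GRADES = {"A", "B"}  # assumption: readiness is reported as a single grade letter


def classify(rec: dict) -> tuple[str, Optional[str]]:
    """Return (final status, reason); the reason is None only for auto-pass."""
    if rec["inventoryStatus"] == "missing-source":
        return "missing-source", "source directory not found"
    # Judgment-based statuses come from a human reviewer, not from heuristics.
    if rec.get("review") in ("misclassified", "unsupported-family"):
        return rec["review"], rec.get("reviewNote", "reviewer label")
    report = rec.get("report")
    if report is None:
        # Source present but no report: treated as a read/analyze failure.
        return "source-unreadable", "no generation report was produced"
    blockers = report.get("blockers", [])
    if report.get("readiness") in READY_GRADES and not blockers:
        return "auto-pass", None
    reason = "; ".join(blockers) or f"readiness {report.get('readiness')}"
    return "fail-closed-known", reason


def summarize(records: list[dict], top_n: int = 5) -> dict:
    """Summary counts, by-archetype counts, and the most frequent blockers."""
    statuses, archetypes, blockers = Counter(), Counter(), Counter()
    for rec in records:
        status, _ = classify(rec)
        statuses[status] += 1
        report = rec.get("report") or {}
        archetypes[report.get("archetype", "unknown")] += 1
        if status == "fail-closed-known":
            blockers.update(report.get("blockers", []))
    return {
        "summary": dict(sorted(statuses.items())),
        "byArchetype": dict(sorted(archetypes.items())),
        "topBlockers": blockers.most_common(top_n),
    }
```

`Counter.most_common` orders equal counts by first occurrence, so the top-blocker list is stable as long as scenes are processed in board order.
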

### Deliverables

1. final dry-run status per scene
2. summary counts
3. by-archetype counts
4. top-blocker list

### Acceptance Criteria

1. every scene has exactly one final status
2. total classified count equals `102`
3. every non-pass scene has a reason
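
These criteria are cheap to check automatically before the report is written. A sketch, assuming the final results are a list of rows with hypothetical `sceneName`, `status`, and `reason` fields; it returns every violation it finds instead of stopping at the first, so a single run lists all problems.

```python
from collections import Counter

FINAL_STATUSES = {
    "auto-pass", "fail-closed-known", "misclassified",
    "unsupported-family", "missing-source", "source-unreadable",
}


def validate(rows: list[dict], expected_total: int = 102) -> list[str]:
    """Return every acceptance-criteria violation; an empty list means all pass."""
    problems = []
    # Criterion 1: a scene name appearing twice would carry two final statuses.
    names = Counter(row["sceneName"] for row in rows)
    dupes = sorted(name for name, n in names.items() if n > 1)
    if dupes:
        problems.append(f"scenes with more than one final status: {dupes}")
    # Criterion 2: the number of distinct classified scenes must match the ledger.
    if len(names) != expected_total:
        problems.append(f"classified {len(names)} distinct scenes, expected {expected_total}")
    for row in rows:
        status = row.get("status")
        # A final status must be one of the six Phase 3 categories.
        if status not in FINAL_STATUSES:
            problems.append(f"{row['sceneName']}: unknown status {status!r}")
        # Criterion 3: every non-pass scene needs a non-empty reason.
        elif status != "auto-pass" and not row.get("reason"):
            problems.append(f"{row['sceneName']}: {status} has no reason")
    return problems
```
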

## Phase 4: Publish Report

### Objective

Answer the coverage question without changing project state.

### Tasks

1. write `full_sweep_dry_run_2026-04-19.json`
2. write `2026-04-19-102-full-sweep-dry-run-report.md`
3. report these four headline numbers:
   - `real-sample executed pass`
   - `code-backed ledger coverage`
   - `dry-run auto-pass`
   - `dry-run actionable coverage`
4. list the next recommended blocker, but do not start implementation
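
Two mechanical details of this phase can be pinned down in code: writing the JSON so that reruns produce identical bytes (any change in the summary counts then shows up as a clean diff), and rendering the headline numbers in one consistent `count/102` format. In the Python sketch below, `format_ratio`, `headline`, and `write_result` are hypothetical names. The plan does not define how `dry-run actionable coverage` is computed, so `headline` takes all four counts as inputs rather than deriving any of them.

```python
import json
from pathlib import Path


def format_ratio(count: int, total: int = 102) -> str:
    """Render a headline number as 'count/total (pct%)'."""
    return f"{count}/{total} ({100 * count / total:.1f}%)"


def headline(real_sample_pass: int, code_backed: int, auto_pass: int,
             actionable: int, total: int = 102) -> dict[str, str]:
    """The four headline numbers, in the order the plan lists them.

    All four counts are inputs; this sketch does not define how any is computed.
    """
    return {
        "real-sample executed pass": format_ratio(real_sample_pass, total),
        "code-backed ledger coverage": format_ratio(code_backed, total),
        "dry-run auto-pass": format_ratio(auto_pass, total),
        "dry-run actionable coverage": format_ratio(actionable, total),
    }


def write_result(path: Path, payload: dict) -> None:
    """Write the dry-run JSON deterministically so reruns diff cleanly."""
    text = json.dumps(payload, ensure_ascii=False, indent=2, sort_keys=True)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text + "\n", encoding="utf-8")
```

`ensure_ascii=False` keeps the Chinese scene names readable in the JSON, and `sort_keys=True` makes key order independent of how the payload was built.
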

### Deliverables

1. dry-run JSON
2. dry-run report

### Acceptance Criteria

1. the report can answer the actual generic coverage question over all `102` scenes
2. the report separates proven coverage from predicted/dry-run coverage
3. the report does not promote scene status

## Completion Criteria

This plan is complete when:

1. all `102` scenes are included in the dry-run result
2. the dry-run result has stable summary counts
3. the report explains the gap between `5/102`, `23/102`, and dry-run coverage
4. no generator logic or execution board status is modified

## Non-Negotiable Stop Rule

After this dry-run starts:

1. do not fix generator failures inside the sweep
2. do not create new family implementation plans from a single failure
3. do not update the execution board automatically
4. stop after publishing the dry-run result and report