feat: add generated scene skill platform hardening
197
docs/superpowers/plans/2026-04-19-102-full-sweep-dry-run-plan.md
Normal file
@@ -0,0 +1,197 @@
# 102 Full Sweep Dry-Run Plan

> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-102-full-sweep-dry-run-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-102-full-sweep-dry-run-design.md)

## Plan Intent

Run one bounded, read-only full sweep over the `102` scene ledger to measure actual generic `scene -> skill` coverage.

The plan answers:

`how many of the 102 scenes can the current generic analyzer/generator handle today?`

## Scope Guardrails

1. do not change analyzer logic
2. do not change generator logic
3. do not promote scenes into `scene_execution_board_2026-04-18.json`
4. do not add new family baselines
5. do not create new family implementation plans
6. do not fix failures during this dry-run
7. do not run outside the fixed `102` scene set

## Fixed Inputs

1. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
3. generator command: `cargo run --bin sg_scene_generate`

## Fixed Outputs

1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
3. report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`

## Workstreams

1. `WS1` Build Scene Inventory
2. `WS2` Run Analyzer/Generator Dry-Run
3. `WS3` Classify Results
4. `WS4` Publish Coverage Report

## Phase 0: Freeze Dry-Run Boundary

### Objective

Make the dry-run a measurement exercise only.

### Tasks

1. freeze the execution board input
2. freeze the local scene root
3. freeze the dry-run output paths
4. explicitly mark the run as read-only with respect to generator behavior and board status

### Deliverables

1. fixed input statement
2. fixed output statement
3. dry-run no-promotion statement

### Acceptance Criteria

1. no analyzer/generator implementation file is edited for this dry-run
2. `scene_execution_board_2026-04-18.json` is not modified by dry-run results
3. failures are recorded, not fixed
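
One way to check the second criterion mechanically is to fingerprint the execution board before the sweep and compare afterwards. The following is a minimal sketch, not part of the plan itself; the helper names `file_digest` and `assert_unchanged` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_unchanged(path: Path, digest_before: str) -> None:
    """Raise if the file no longer matches the digest taken before the sweep."""
    if file_digest(path) != digest_before:
        raise RuntimeError(f"{path} was modified during the dry-run")
```

A caller would record `file_digest(board_path)` before Phase 1 and call `assert_unchanged(board_path, digest)` after Phase 4.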

## Phase 1: Build Scene Inventory

### Objective

Construct a deterministic inventory of all `102` scene names and expected source directories.

### Tasks

1. read `scene_execution_board_2026-04-18.json`
2. extract all scene entries
3. map each scene name to `D:/desk/智能体资料/全量业务场景/一平台场景/<sceneName>`
4. check whether each source directory exists
5. assign initial inventory status:
   - `source-present`
   - `missing-source`

### Deliverables

1. inventory section inside `full_sweep_dry_run_2026-04-19.json`
2. missing-source list

### Acceptance Criteria

1. inventory count equals `102`
2. every scene has a source path
3. missing source does not stop the sweep
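
The inventory tasks can be sketched roughly as follows. This assumes the scene names have already been extracted from the board (its JSON schema is not described here, so that step is left out), and the record keys `sceneName`, `sourcePath`, and `inventoryStatus` are hypothetical. Sorting the names is one way to keep the inventory deterministic.

```python
import os
from typing import Callable, Dict, Iterable, List

# Scene root taken from the Fixed Inputs section.
SCENE_ROOT = "D:/desk/智能体资料/全量业务场景/一平台场景"


def build_inventory(
    scene_names: Iterable[str],
    scene_root: str = SCENE_ROOT,
    dir_exists: Callable[[str], bool] = os.path.isdir,
) -> List[Dict[str, str]]:
    """Map every scene name to its expected source directory and a status."""
    inventory = []
    for name in sorted(scene_names):  # sorted for a stable, repeatable order
        source_path = f"{scene_root}/{name}"
        # A missing directory is recorded, never raised, so the sweep continues.
        status = "source-present" if dir_exists(source_path) else "missing-source"
        inventory.append(
            {"sceneName": name, "sourcePath": source_path, "inventoryStatus": status}
        )
    return inventory
```

The `dir_exists` parameter is injectable so the mapping can be exercised without access to the local scene root.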

## Phase 2: Run Analyzer/Generator Dry-Run

### Objective

Attempt current generic generation for every source-present scene without fixing failures.

### Tasks

1. generate a stable safe scene id for each scene
2. invoke `sg_scene_generate` for each source-present scene
3. write outputs under `examples/full_sweep_dry_run_2026-04-19`
4. for successful generation, read `references/generation-report.json`
5. for failed generation, capture stderr/stdout and exit code
6. continue until all `102` scenes are processed
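
A possible shape for the sweep loop is sketched below. Two things are assumptions rather than facts from this plan: the scene id is derived by hashing the scene name (the plan only requires it to be stable and safe), and the generator's argument list is supplied by a caller-provided `build_command` function, because the flags accepted by `sg_scene_generate` are not listed here. The record keys are hypothetical as well.

```python
import hashlib
import subprocess
from typing import Callable, Dict, List


def safe_scene_id(scene_name: str) -> str:
    """Derive an ASCII-only id that is identical across runs for the same name."""
    digest = hashlib.sha256(scene_name.encode("utf-8")).hexdigest()
    return f"scene-{digest[:12]}"


def run_sweep(
    inventory: List[Dict[str, str]],
    build_command: Callable[[Dict[str, str], str], List[str]],
) -> List[dict]:
    """Run the generator once per source-present scene and never abort early."""
    records = []
    for entry in inventory:
        scene_id = safe_scene_id(entry["sceneName"])
        record = {"sceneName": entry["sceneName"], "sceneId": scene_id}
        if entry["inventoryStatus"] != "source-present":
            record["generatorRan"] = False  # nothing to run for missing sources
            records.append(record)
            continue
        try:
            proc = subprocess.run(
                build_command(entry, scene_id),
                capture_output=True,
                text=True,
                errors="replace",  # keep going even if output is not valid text
            )
            record.update(
                generatorRan=True,
                exitCode=proc.returncode,
                stdout=proc.stdout,
                stderr=proc.stderr,
            )
        except OSError as exc:  # e.g. the executable could not be started
            record.update(generatorRan=True, exitCode=None, stdout="", stderr=str(exc))
        records.append(record)
    return records
```

Because a non-zero exit code is stored rather than raised, a failing scene cannot stop the remaining scenes from being processed.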

### Deliverables

1. per-scene dry-run execution record
2. generated output root for successful scenes
3. captured error messages for failed scenes

### Acceptance Criteria

1. every source-present scene has a generator result
2. no failure aborts the full sweep
3. generator results are isolated under the dry-run output root

## Phase 3: Classify Results

### Objective

Turn raw dry-run output into actionable coverage categories.

### Tasks

1. classify scenes generated at `A/B` readiness with no blocker as `auto-pass`
2. classify scenes the generator blocks for a known gate/contract reason as `fail-closed-known`
3. classify scenes with an obvious family mismatch as `misclassified`
4. classify scenes whose evidence falls outside current families as `unsupported-family`
5. classify scenes with absent source directories as `missing-source`
6. classify scenes that fail to read or analyze as `source-unreadable`
7. compute top blockers by frequency
8. compute counts by inferred archetype
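
The classification rules above could be expressed along these lines. This is only a sketch: the record keys (`sourcePresent`, `readFailed`, `familyMismatch`, `outsideKnownFamilies`, `readiness`, `blockers`), the reading of `A/B` as two readiness levels, and the order in which the rules are checked are all assumptions. A record that matches no rule raises an error instead of being binned silently.

```python
from collections import Counter
from typing import List, Tuple


def classify(record: dict) -> str:
    """Return exactly one final dry-run status for a scene record."""
    if not record.get("sourcePresent", False):
        return "missing-source"
    if record.get("readFailed", False):
        return "source-unreadable"
    if record.get("familyMismatch", False):
        return "misclassified"
    if record.get("outsideKnownFamilies", False):
        return "unsupported-family"
    blockers = record.get("blockers") or []
    if record.get("readiness") in ("A", "B") and not blockers:
        return "auto-pass"
    if blockers:
        return "fail-closed-known"
    raise ValueError(f"no classification rule matches: {record.get('sceneName')}")


def top_blockers(records: List[dict], limit: int) -> List[Tuple[str, int]]:
    """Count blocker reasons across all records, most frequent first."""
    counts = Counter()
    for record in records:
        counts.update(record.get("blockers") or [])
    return counts.most_common(limit)
```

The same `Counter` pattern would apply to the by-archetype counts, keyed on whatever archetype field the generation report exposes.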

### Deliverables

1. final dry-run status per scene
2. summary counts
3. by-archetype counts
4. top-blocker list

### Acceptance Criteria

1. every scene has exactly one final status
2. total classified count equals `102`
3. every non-pass scene has a reason
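
These three criteria lend themselves to an automated check. The sketch below assumes each classified result is a dictionary with hypothetical keys `sceneName`, `status`, and `reason`; it returns a list of problems rather than raising, so every violation is reported in one pass.

```python
from typing import List

ALLOWED_STATUSES = {
    "auto-pass",
    "fail-closed-known",
    "misclassified",
    "unsupported-family",
    "missing-source",
    "source-unreadable",
}


def check_classification(results: List[dict], expected_total: int = 102) -> List[str]:
    """Return human-readable violations of the Phase 3 acceptance criteria."""
    problems = []
    if len(results) != expected_total:
        problems.append(f"expected {expected_total} results, found {len(results)}")
    seen = set()
    for result in results:
        name = result.get("sceneName")
        if name in seen:
            problems.append(f"{name}: appears more than once")
        seen.add(name)
        status = result.get("status")
        if status not in ALLOWED_STATUSES:
            problems.append(f"{name}: unknown or missing status {status!r}")
        elif status != "auto-pass" and not result.get("reason"):
            problems.append(f"{name}: non-pass status without a reason")
    return problems
```

An empty return value means all three criteria hold for the given results.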

## Phase 4: Publish Report

### Objective

Answer the coverage question without changing project state.

### Tasks

1. write `full_sweep_dry_run_2026-04-19.json`
2. write `2026-04-19-102-full-sweep-dry-run-report.md`
3. report these four headline numbers:
   - `real-sample executed pass`
   - `code-backed ledger coverage`
   - `dry-run auto-pass`
   - `dry-run actionable coverage`
4. list next recommended blocker, but do not start implementation
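
Writing the result file deterministically helps keep the summary counts stable between runs. The sketch below assumes a hypothetical layout with a `summary` object and a `scenes` array, and result records keyed by `sceneName` and `status`. It does not compute the four headline numbers, since their formulas are not defined in this plan.

```python
import json
from collections import Counter
from pathlib import Path
from typing import List


def summarize(results: List[dict]) -> dict:
    """Count results per final status."""
    counts = Counter(result["status"] for result in results)
    return {"total": len(results), "byStatus": dict(counts)}


def write_result(path: Path, results: List[dict]) -> str:
    """Serialize results so identical input always yields identical text."""
    payload = {
        "summary": summarize(results),
        "scenes": sorted(results, key=lambda result: result["sceneName"]),
    }
    # sort_keys gives a stable key order; ensure_ascii=False keeps
    # non-ASCII scene names readable in the written file.
    text = json.dumps(payload, ensure_ascii=False, indent=2, sort_keys=True) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text
```

Sorting both the scene list and the object keys means two sweeps with the same outcomes produce the same serialized text, which makes diffs between runs meaningful.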

### Deliverables

1. dry-run JSON
2. dry-run report

### Acceptance Criteria

1. report can answer actual generic coverage over `102` scenes
2. report separates proven coverage from predicted/dry-run coverage
3. report does not promote scene status

## Completion Criteria

This plan is complete when:

1. all `102` scenes are included in the dry-run result
2. the dry-run result has stable summary counts
3. the report explains the gap between `5/102`, `23/102`, and dry-run coverage
4. no generator logic or execution board status is modified

## Non-Negotiable Stop Rule

After this dry-run starts:

1. do not fix generator failures inside the sweep
2. do not create new family implementation plans from a single failure
3. do not update the execution board automatically
4. stop after publishing the dry-run result and report