feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00
parent 118fc77935
commit 956f0c2b68
439 changed files with 61974 additions and 3645 deletions
--- a/docs/superpowers/plans/2026-04-19-102-full-sweep-dry-run-plan.md
+++ b/docs/superpowers/plans/2026-04-19-102-full-sweep-dry-run-plan.md
@@ -0,0 +1,197 @@
+# 102 Full Sweep Dry-Run Plan
+
+> Date: 2026-04-19
+> Status: Draft
+> Upstream Spec: [2026-04-19-102-full-sweep-dry-run-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-102-full-sweep-dry-run-design.md)
+
+## Plan Intent
+
+Run one bounded, read-only full sweep over the `102`-scene ledger to measure actual generic `scene -> skill` coverage.
+
+The plan answers:
+
+`how many of the 102 scenes can the current generic analyzer/generator handle today?`
+
+## Scope Guardrails
+
+1. do not change analyzer logic
+2. do not change generator logic
+3. do not promote scenes into `scene_execution_board_2026-04-18.json`
+4. do not add new family baselines
+5. do not create new family implementation plans
+6. do not fix failures during this dry-run
+7. do not run outside the fixed `102` scene set
+
+## Fixed Inputs
+
+1. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
+2. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
+3. generator command: `cargo run --bin sg_scene_generate`
+
+## Fixed Outputs
+
+1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
+2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
+3. report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`
+
+## Workstreams
+
+1. `WS1` Build Scene Inventory
+2. `WS2` Run Analyzer/Generator Dry-Run
+3. `WS3` Classify Results
+4. `WS4` Publish Coverage Report
+
+## Phase 0: Freeze Dry-Run Boundary
+
+### Objective
+
+Make the dry-run a measurement exercise only.
+
+### Tasks
+
+1. freeze the execution board input
+2. freeze the local scene root
+3. freeze the dry-run output paths
+4. explicitly mark the run as read-only with respect to generator behavior and board status
+
+### Deliverables
+
+1. fixed input statement
+2. fixed output statement
+3. dry-run no-promotion statement
+
+### Acceptance Criteria
+
+1. no analyzer/generator implementation file is edited for this dry-run
+2. `scene_execution_board_2026-04-18.json` is not modified by dry-run results
+3. failures are recorded, not fixed
+
+## Phase 1: Build Scene Inventory
+
+### Objective
+
+Construct a deterministic inventory of all `102` scene names and expected source directories.
+
+### Tasks
+
+1. read `scene_execution_board_2026-04-18.json`
+2. extract all scene entries
+3. map each scene name to `D:/desk/智能体资料/全量业务场景/一平台场景/<sceneName>`
+4. check whether each source directory exists
+5. assign initial inventory status:
+   - `source-present`
+   - `missing-source`
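
The inventory tasks above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `InventoryStatus`, `InventoryEntry`, and `build_inventory` are hypothetical names, and the scene names are assumed to have already been extracted from the execution board, since the board's JSON schema is not shown in this plan.

```rust
use std::path::{Path, PathBuf};

/// Initial inventory status, mirroring the two values named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InventoryStatus {
    SourcePresent,
    MissingSource,
}

/// One inventory row (hypothetical shape; the real JSON layout may differ).
#[derive(Debug)]
struct InventoryEntry {
    scene_name: String,
    source_path: PathBuf,
    status: InventoryStatus,
}

/// Map every scene name to `<scene_root>/<sceneName>` and record whether
/// that directory exists. A missing directory is recorded, never an error,
/// so it cannot stop the sweep.
fn build_inventory(scene_root: &Path, scene_names: &[String]) -> Vec<InventoryEntry> {
    scene_names
        .iter()
        .map(|name| {
            let source_path = scene_root.join(name);
            let status = if source_path.is_dir() {
                InventoryStatus::SourcePresent
            } else {
                InventoryStatus::MissingSource
            };
            InventoryEntry {
                scene_name: name.clone(),
                source_path,
                status,
            }
        })
        .collect()
}
```

Because every input name yields exactly one entry, the inventory count equals the number of scene names passed in, which is what the `102` acceptance check relies on.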
+
+### Deliverables
+
+1. inventory section inside `full_sweep_dry_run_2026-04-19.json`
+2. missing-source list
+
+### Acceptance Criteria
+
+1. inventory count equals `102`
+2. every scene has a source path
+3. missing source does not stop the sweep
+
+## Phase 2: Run Analyzer/Generator Dry-Run
+
+### Objective
+
+Attempt current generic generation for every source-present scene without fixing failures.
+
+### Tasks
+
+1. generate a stable safe scene id for each scene
+2. invoke `sg_scene_generate` for each source-present scene
+3. write outputs under `examples/full_sweep_dry_run_2026-04-19`
+4. for successful generation, read `references/generation-report.json`
+5. for failed generation, capture stderr/stdout and exit code
+6. continue until all `102` scenes are processed
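
A hedged sketch of the "stable safe scene id" and "capture and continue" steps above. Everything here is an assumption for illustration: the id scheme (ledger index plus an inline FNV-1a hash), the `ExecutionRecord` shape, and the `run_one` helper are hypothetical, and the plan does not specify which arguments `sg_scene_generate` accepts, so the caller is left to build the `Command`.

```rust
use std::process::Command;

/// FNV-1a 64-bit hash, written inline so the id stays stable across
/// Rust releases (the std `DefaultHasher` makes no such promise).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical id scheme: ASCII-only, derived from the ledger index and
/// the scene name, so non-ASCII scene names never reach output paths.
fn safe_scene_id(index: usize, scene_name: &str) -> String {
    format!("scene-{:03}-{:016x}", index, fnv1a64(scene_name.as_bytes()))
}

/// Per-scene execution record (hypothetical shape).
#[derive(Debug)]
struct ExecutionRecord {
    scene_id: String,
    exit_code: Option<i32>,
    stdout: String,
    stderr: String,
}

/// Run one generator invocation and always return a record. Spawn errors
/// and non-zero exits are captured rather than propagated, so a single
/// failure cannot abort the sweep.
fn run_one(scene_id: String, mut command: Command) -> ExecutionRecord {
    match command.output() {
        Ok(out) => ExecutionRecord {
            scene_id,
            exit_code: out.status.code(),
            stdout: String::from_utf8_lossy(&out.stdout).into_owned(),
            stderr: String::from_utf8_lossy(&out.stderr).into_owned(),
        },
        Err(err) => ExecutionRecord {
            scene_id,
            exit_code: None,
            stdout: String::new(),
            stderr: format!("failed to spawn generator: {err}"),
        },
    }
}
```

A sweep would loop over the inventory, build a `Command` for `cargo run --bin sg_scene_generate` with whatever arguments the generator actually expects, and push each returned record into the result list without stopping on any failure.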
+
+### Deliverables
+
+1. per-scene dry-run execution record
+2. generated output root for successful scenes
+3. captured error messages for failed scenes
+
+### Acceptance Criteria
+
+1. every source-present scene has a generator result
+2. no failure aborts the full sweep
+3. generator results are isolated under the dry-run output root
+
+## Phase 3: Classify Results
+
+### Objective
+
+Turn raw dry-run output into actionable coverage categories.
+
+### Tasks
+
+1. classify scenes that generate with `A/B` readiness and no blocker as `auto-pass`
+2. classify scenes that the generator blocks for a known gate/contract reason as `fail-closed-known`
+3. classify scenes with an obvious family mismatch as `misclassified`
+4. classify scenes whose evidence falls outside the current families as `unsupported-family`
+5. classify scenes with absent source directories as `missing-source`
+6. classify scenes that fail during read/analyze as `source-unreadable`
+7. compute top blockers by frequency
+8. compute counts by inferred archetype
+
+### Deliverables
+
+1. final dry-run status per scene
+2. summary counts
+3. by-archetype counts
+4. top-blocker list
+
+### Acceptance Criteria
+
+1. every scene has exactly one final status
+2. total classified count equals `102`
+3. every non-pass scene has a reason
+
+## Phase 4: Publish Report
+
+### Objective
+
+Answer the coverage question without changing project state.
+
+### Tasks
+
+1. write `full_sweep_dry_run_2026-04-19.json`
+2. write `2026-04-19-102-full-sweep-dry-run-report.md`
+3. report these four headline numbers:
+   - `real-sample executed pass`
+   - `code-backed ledger coverage`
+   - `dry-run auto-pass`
+   - `dry-run actionable coverage`
+4. list the next recommended blocker to address, but do not start implementation
+
+### Deliverables
+
+1. dry-run JSON
+2. dry-run report
+
+### Acceptance Criteria
+
+1. report states the actual generic coverage over the `102` scenes
+2. report separates proven coverage from predicted/dry-run coverage
+3. report does not promote scene status
+
+## Completion Criteria
+
+This plan is complete when:
+
+1. all `102` scenes are included in the dry-run result
+2. the dry-run result has stable summary counts
+3. the report explains the gap between `5/102`, `23/102`, and dry-run coverage
+4. no generator logic or execution board status is modified
+
+## Non-Negotiable Stop Rule
+
+After this dry-run starts:
+
+1. do not fix generator failures inside the sweep
+2. do not create new family implementation plans from a single failure
+3. do not update the execution board automatically
+4. stop after publishing the dry-run result and report