102 Full Sweep Dry-Run Plan
Date: 2026-04-19
Status: Draft
Upstream Spec: 2026-04-19-102-full-sweep-dry-run-design.md
Plan Intent
Run one bounded, read-only full sweep over the 102-scene ledger to measure the actual generic scene -> skill coverage.
The plan answers:
how many of the 102 scenes can the current generic analyzer/generator handle today?
Scope Guardrails
- do not change analyzer logic
- do not change generator logic
- do not promote scenes into scene_execution_board_2026-04-18.json
- do not add new family baselines
- do not create new family implementation plans
- do not fix failures during this dry-run
- do not run outside the fixed 102-scene set
Fixed Inputs
- execution board: tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json
- scene root: D:/desk/智能体资料/全量业务场景/一平台场景
- generator command: cargo run --bin sg_scene_generate
Fixed Outputs
- dry-run result: tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json
- dry-run output root: examples/full_sweep_dry_run_2026-04-19
- report: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md
Workstreams
- WS1: Build Scene Inventory
- WS2: Run Analyzer/Generator Dry-Run
- WS3: Classify Results
- WS4: Publish Coverage Report
Phase 0: Freeze Dry-Run Boundary
Objective
Make the dry-run a measurement exercise only.
Tasks
- freeze the execution board input
- freeze the local scene root
- freeze the dry-run output paths
- explicitly mark the run as read-only with respect to generator behavior and board status
Deliverables
- fixed input statement
- fixed output statement
- dry-run no-promotion statement
Acceptance Criteria
- no analyzer/generator implementation file is edited for this dry-run
- scene_execution_board_2026-04-18.json is not modified by dry-run results
- failures are recorded, not fixed
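One way to check the "board is not modified" criterion is to fingerprint the execution board before the sweep and compare afterwards. The following is a minimal sketch under that assumption; `fingerprint` and `assert_unchanged` are hypothetical helper names, not part of the existing tooling.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_unchanged(path: Path, before: str) -> None:
    """Raise if the file no longer matches the digest taken before the sweep."""
    if fingerprint(path) != before:
        raise RuntimeError(f"{path} changed during the dry-run")
```

In use, the digest would be taken once in Phase 0 and `assert_unchanged` called again after Phase 4, so that any accidental write to the board fails loudly instead of going unnoticed.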
Phase 1: Build Scene Inventory
Objective
Construct a deterministic inventory of all 102 scene names and expected source directories.
Tasks
- read scene_execution_board_2026-04-18.json
- extract all scene entries
- map each scene name to D:/desk/智能体资料/全量业务场景/一平台场景/<sceneName>
- check whether each source directory exists
- assign initial inventory status:
  - source-present
  - missing-source
Deliverables
- inventory section inside full_sweep_dry_run_2026-04-19.json
- missing-source list
Acceptance Criteria
- inventory count equals 102
- every scene has a source path
- missing source does not stop the sweep
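The inventory step could be sketched roughly as follows. The sketch takes the scene names as a plain list, because the board's JSON schema is not described here; the record keys (`sceneName`, `sourcePath`, `status`) and the function name `build_inventory` are illustrative assumptions.

```python
from pathlib import Path


def build_inventory(scene_names, scene_root):
    """Map every scene name to <scene_root>/<sceneName> and record presence."""
    root = Path(scene_root)
    inventory = []
    for name in sorted(scene_names):  # sorted so repeated runs give the same order
        source = root / name
        status = "source-present" if source.is_dir() else "missing-source"
        inventory.append(
            {"sceneName": name, "sourcePath": source.as_posix(), "status": status}
        )
    return inventory
```

A missing directory only changes the `status` field, so the sweep keeps going and every scene still gets a source path.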
Phase 2: Run Analyzer/Generator Dry-Run
Objective
Attempt current generic generation for every source-present scene without fixing failures.
Tasks
- generate a stable safe scene id for each scene
- invoke sg_scene_generate for each source-present scene
- write outputs under examples/full_sweep_dry_run_2026-04-19
- for successful generation, read references/generation-report.json
- for failed generation, capture stderr/stdout and exit code
- continue until all 102 scenes are processed
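The plan does not say how the "stable safe scene id" is derived. One possible scheme, shown here purely as an assumption, hashes the scene name so that the id is ASCII-only, safe as a directory name, and identical on every run.

```python
import hashlib


def safe_scene_id(scene_name: str) -> str:
    """Derive a deterministic, ASCII-only id from a scene name (hypothetical scheme)."""
    digest = hashlib.sha1(scene_name.encode("utf-8")).hexdigest()
    return f"scene-{digest[:12]}"
```

Hashing avoids transliterating non-ASCII scene names, at the cost of an id that is not human-readable, so the dry-run result would need to keep the original name next to it.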
Deliverables
- per-scene dry-run execution record
- generated output root for successful scenes
- captured error messages for failed scenes
Acceptance Criteria
- every source-present scene has a generator result
- no failure aborts the full sweep
- generator results are isolated under the dry-run output root
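The "no failure aborts the full sweep" criterion can be illustrated with a small runner sketch. The command-line arguments of sg_scene_generate are not specified here, so the sketch does not guess them: the caller passes a hypothetical `build_command` function that returns the full argv for one scene. The record keys and the timeout are also assumptions.

```python
import subprocess


def run_scene(command, timeout_s=600):
    """Run one generator command and return a record; never raises on failure."""
    try:
        proc = subprocess.run(
            command,
            capture_output=True,
            text=True,
            errors="replace",  # tolerate undecodable bytes in generator output
            timeout=timeout_s,
        )
        return {"exitCode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The process could not start or ran too long; record it and move on.
        return {"exitCode": None, "stdout": "", "stderr": str(exc)}


def run_sweep(inventory, build_command):
    """Attempt generation for every source-present scene; skip the rest."""
    records = []
    for item in inventory:
        if item["status"] != "source-present":
            records.append({**item, "generator": None})
            continue
        records.append({**item, "generator": run_scene(build_command(item))})
    return records
```

Because `run_scene` turns every failure into data, a crash in one scene cannot stop the loop, and the exit code plus captured output stay available for classification.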
Phase 3: Classify Results
Objective
Turn raw dry-run output into actionable coverage categories.
Tasks
- classify generated A/B readiness with no blocker as auto-pass
- classify generator blocking with known gate/contract reason as fail-closed-known
- classify obvious family mismatch as misclassified
- classify evidence outside current families as unsupported-family
- classify absent directories as missing-source
- classify read/analyze failures as source-unreadable
source-unreadable - compute top blockers by frequency
- compute counts by inferred archetype
Deliverables
- final dry-run status per scene
- summary counts
- by-archetype counts
- top-blocker list
Acceptance Criteria
- every scene has exactly one final status
- total classified count equals 102
- every non-pass scene has a reason
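The acceptance checks and summary counts for this phase might be computed along these lines. This is a sketch only: it assumes each classified record carries hypothetical `sceneName`, `status`, `reason`, and `archetype` keys, and it validates rather than decides the status, since the decision rules depend on generator output not described here.

```python
from collections import Counter

STATUSES = {
    "auto-pass",
    "fail-closed-known",
    "misclassified",
    "unsupported-family",
    "missing-source",
    "source-unreadable",
}


def summarize(records, expected_total=102, top_n=5):
    """Validate the classified records and compute the summary counts."""
    if len(records) != expected_total:
        raise ValueError(f"expected {expected_total} records, got {len(records)}")
    names = [r["sceneName"] for r in records]
    if len(set(names)) != len(names):
        raise ValueError("a scene appears more than once")
    for r in records:
        if r["status"] not in STATUSES:
            raise ValueError(f"unknown status for {r['sceneName']}: {r['status']}")
        if r["status"] != "auto-pass" and not r.get("reason"):
            raise ValueError(f"non-pass scene without a reason: {r['sceneName']}")
    blockers = Counter(r["reason"] for r in records if r["status"] != "auto-pass")
    return {
        "byStatus": dict(Counter(r["status"] for r in records)),
        "byArchetype": dict(Counter(r.get("archetype", "unknown") for r in records)),
        "topBlockers": blockers.most_common(top_n),
    }
```

Raising on a bad record, instead of silently dropping it, keeps the total pinned to the fixed scene count.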
Phase 4: Publish Report
Objective
Answer the coverage question without changing project state.
Tasks
- write full_sweep_dry_run_2026-04-19.json
- write 2026-04-19-102-full-sweep-dry-run-report.md
- report these four headline numbers:
  - real-sample executed pass
  - code-backed ledger coverage
  - dry-run auto-pass
  - dry-run actionable coverage
- list next recommended blocker, but do not start implementation
Deliverables
- dry-run JSON
- dry-run report
Acceptance Criteria
- report can answer actual generic coverage over 102 scenes
- report separates proven coverage from predicted/dry-run coverage
- report does not promote scene status
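To keep proven and predicted numbers visibly separate in the report, the headline section could be rendered by a helper like the one below. This is a sketch: the function names are hypothetical, the counts are supplied by the caller, and treating the two dry-run numbers as "predicted" and the other two as "proven" is an assumed reading of the headline list.

```python
def headline(label, count, total=102):
    """Format one headline number as 'label: count/total (percent)'."""
    return f"{label}: {count}/{total} ({count / total:.1%})"


def render_headlines(proven, predicted, total=102):
    """Render two groups of headline numbers so they cannot be mixed up."""
    lines = ["Proven coverage"]
    lines += [f"- {headline(label, n, total)}" for label, n in proven.items()]
    lines.append("Predicted (dry-run) coverage")
    lines += [f"- {headline(label, n, total)}" for label, n in predicted.items()]
    return "\n".join(lines)
```

Printing each number as a fraction of the fixed total alongside the percentage makes it straightforward to compare figures measured in different ways.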
Completion Criteria
This plan is complete when:
- all 102 scenes are included in the dry-run result
- the dry-run result has stable summary counts
- the report explains the gap between 5/102, 23/102, and dry-run coverage
- no generator logic or execution board status is modified
Non-Negotiable Stop Rule
After this dry-run starts:
- do not fix generator failures inside the sweep
- do not create new family implementation plans from a single failure
- do not update the execution board automatically
- stop after publishing the dry-run result and report