# 102 Full Sweep Dry-Run Design

> Date: 2026-04-19
> Status: Draft
> Upstream Context: completed `scene-skill 60-to-90` roadmap and post-roadmap real-sample closures

## 1. Intent

This design defines a bounded, read-only dry-run over the full `102` scene ledger.

The target is:

`measure current generic scene-to-skill coverage without changing generator behavior or promoting scene status`

## 2. Problem Statement

The current project has three different coverage numbers:

1. real-sample executed pass: `5 / 102`
2. code-backed ledger coverage: `23 / 102`
3. repo-local family regression pass count: `24 / 24`

These numbers are all valid, but none answers the direct question:

`how many of the 102 scenes can the current generic analyzer/generator handle if we run them all now?`

This dry-run answers that question.

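To make the gap between the three numbers concrete, the short sketch below converts each count into a percentage. The counts are taken from the list above; the dictionary labels and the `percent` helper are illustrative names, not part of the project.

```python
# Counts copied from the Problem Statement; labels are illustrative only.
coverage = {
    "real-sample executed pass": (5, 102),
    "code-backed ledger coverage": (23, 102),
    "repo-local family regression pass": (24, 24),
}


def percent(passed: int, total: int) -> float:
    """Return passed/total as a percentage rounded to one decimal place."""
    return round(passed / total * 100, 1)


for label, (passed, total) in coverage.items():
    print(f"{label}: {passed}/{total} = {percent(passed, total)}%")
```

Note that the third ratio has a different denominator (`24` family regressions rather than `102` scenes), which is one reason none of the three can stand in for full-ledger coverage.
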
## 3. Scope Boundary

This design is limited to measurement.

It may include:

1. reading the current `102` execution board
2. resolving local source directories under the fixed real-scene root
3. running analyzer/generator dry-runs against available sources
4. collecting success, fail-closed, missing-source, and unsupported results
5. publishing a standalone dry-run JSON and report

It must not include:

1. changing analyzer logic
2. changing generator logic
3. changing existing family baselines
4. changing `scene_execution_board_2026-04-18.json`
5. promoting scenes from dry-run results
6. creating new family plans
7. running any scene outside the fixed `102` ledger set

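Step 2 of the allowed work (resolving local source directories) could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the `resolve_source` function, the `scene_dir_name` parameter, and the intermediate `"available"` label are hypothetical, and the rule that a non-existent directory means `missing-source` while a directory that cannot be listed means `source-unreadable` is an interpretation, not something this design specifies.

```python
from pathlib import Path


def resolve_source(scene_root: Path, scene_dir_name: str) -> tuple[str, Path]:
    """Classify one scene's source directory using read-only probes.

    Hypothetical helper: the mapping from filesystem state to status
    is an assumption made for illustration.
    """
    candidate = scene_root / scene_dir_name
    if not candidate.is_dir():
        return "missing-source", candidate
    try:
        # Listing the first entry only reads the directory; nothing is written.
        next(candidate.iterdir(), None)
    except OSError:
        return "source-unreadable", candidate
    # "available" is an intermediate label, not one of the final statuses.
    return "available", candidate
```

Only scenes that come back as available would then be handed to the analyzer/generator dry-run in step 3.
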
## 4. Fixed Inputs

### Execution Board

`tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`

### Scene Root

`D:/desk/智能体资料/全量业务场景/一平台场景`

### Generator

`cargo run --bin sg_scene_generate`

## 5. Fixed Outputs

### Dry-Run Result JSON

`tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`

### Dry-Run Output Root

`examples/full_sweep_dry_run_2026-04-19`

### Report

`docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`

## 6. Classification Model

Each scene must receive exactly one final dry-run status:

1. `auto-pass`
2. `fail-closed-known`
3. `misclassified`
4. `unsupported-family`
5. `missing-source`
6. `source-unreadable`

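One way to enforce the "exactly one final status" rule is a closed enumeration plus a check over the ledger. The six string values below come from the list above; the `DryRunStatus` and `validate_results` names, and the idea of keying results by a scene ID string, are assumptions made for this sketch.

```python
from enum import Enum


class DryRunStatus(Enum):
    """The six final statuses listed in the Classification Model."""

    AUTO_PASS = "auto-pass"
    FAIL_CLOSED_KNOWN = "fail-closed-known"
    MISCLASSIFIED = "misclassified"
    UNSUPPORTED_FAMILY = "unsupported-family"
    MISSING_SOURCE = "missing-source"
    SOURCE_UNREADABLE = "source-unreadable"


def validate_results(
    scene_ids: list[str], results: dict[str, str]
) -> dict[str, DryRunStatus]:
    """Check that every ledger scene has exactly one known status.

    Hypothetical helper: a dict keyed by scene ID allows at most one
    status per scene; the checks below cover missing and extra scenes.
    """
    expected = set(scene_ids)
    missing = [s for s in scene_ids if s not in results]
    extra = [s for s in results if s not in expected]
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    # DryRunStatus(value) raises ValueError for an unknown status string.
    return {s: DryRunStatus(results[s]) for s in scene_ids}
```

A result set that uses a status outside the six values, or that skips a ledger scene, would be rejected before any metric is computed.
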
## 7. Coverage Metrics

The dry-run must report at least these numbers:

1. `realSampleExecutedPass`
2. `codeBackedLedgerCoverage`
3. `dryRunAutoPass`
4. `dryRunActionableCoverage`
5. `missingSource`
6. `sourceUnreadable`
7. `unsupportedFamily`

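Four of these metrics map directly onto status counts, which the sketch below tallies. It assumes the statuses arrive as a flat list of strings, and the `summarize` function and `total` key are hypothetical. `realSampleExecutedPass` and `codeBackedLedgerCoverage` are existing numbers rather than dry-run outputs, and `dryRunActionableCoverage` is left out because this design does not state how it is derived.

```python
from collections import Counter


def summarize(statuses: list[str]) -> dict[str, int]:
    """Tally the metrics that correspond one-to-one to a final status."""
    counts = Counter(statuses)
    return {
        "total": len(statuses),  # hypothetical convenience field
        "dryRunAutoPass": counts["auto-pass"],
        "missingSource": counts["missing-source"],
        "sourceUnreadable": counts["source-unreadable"],
        "unsupportedFamily": counts["unsupported-family"],
    }
```

For a full sweep, `total` would be expected to equal `102`, which gives a cheap sanity check on the result file.
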
## 8. Non-Negotiable Stop Rules

1. If a scene fails, record the failure and continue.
2. If many scenes fail with the same blocker, record the blocker and do not fix it in this dry-run.
3. If dry-run discovers a likely bug, write it as a follow-up recommendation only.
4. Do not update the execution board from dry-run output.

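Rules 1 and 2 amount to a sweep loop that never aborts and that groups identical failures. The following is a minimal sketch, assuming a hypothetical `run_one` callable that returns a status string or raises; the `sweep` name and the `"error: ..."` raw outcome format are illustrative, and mapping a raw error onto one of the six final statuses is left out because the design does not define it here.

```python
from collections import Counter
from typing import Callable


def sweep(
    scene_ids: list[str], run_one: Callable[[str], str]
) -> tuple[dict[str, str], Counter]:
    """Run every scene, recording failures instead of stopping or fixing them."""
    outcomes: dict[str, str] = {}
    blockers: Counter = Counter()
    for scene_id in scene_ids:
        try:
            outcomes[scene_id] = run_one(scene_id)
        except Exception as exc:
            # Rule 1: record the failure and continue with the next scene.
            blocker = f"{type(exc).__name__}: {exc}"
            outcomes[scene_id] = f"error: {blocker}"
            # Rule 2: tally shared blockers; no fix is attempted here.
            blockers[blocker] += 1
    return outcomes, blockers
```

The `blockers` tally can feed the follow-up recommendations required by rule 3, for example by listing the most frequent blocker first in the report.
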
## 9. Exit Condition

This design is complete when the project has a single bounded plan that:

1. defines the dry-run tool/task
2. defines the dry-run output schema
3. preserves read-only behavior against generator logic and board status
4. produces a report that answers actual generic coverage over `102` scenes

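The read-only requirement on the execution board could be verified mechanically by hashing the file before and after the sweep. This is a sketch of one possible guard, not part of the design: `file_digest` and `assert_unchanged` are hypothetical helper names, and SHA-256 is simply a convenient choice of digest.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_unchanged(path: Path, digest_before: str) -> None:
    """Raise if the file no longer matches the digest taken before the run."""
    if file_digest(path) != digest_before:
        raise RuntimeError(f"{path} was modified during the dry-run")
```

In use, the digest of `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json` would be taken once at startup and checked again just before the report is written.
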