claw/docs/superpowers/specs/2026-04-19-102-full-sweep-dry-run-design.md

102 Full Sweep Dry-Run Design

Date: 2026-04-19

Status: Draft

Upstream Context: completed scene-skill 60-to-90 roadmap and post-roadmap real-sample closures

1. Intent

This design defines a bounded, read-only dry-run over the full 102-scene ledger.

The target is:

measure current generic scene-to-skill coverage without changing generator behavior or promoting scene status

2. Problem Statement

The current project has three different coverage numbers:

  1. real-sample executed pass: 5 / 102
  2. code-backed ledger coverage: 23 / 102
  3. repo-local family regression pass count: 24 / 24

These numbers are all valid, but none answers the direct question:

how many of the 102 scenes can the current generic analyzer/generator handle if we run them all now?

This dry-run answers that question.
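The three numbers above use different denominators, so converting them to percentages makes the gap easier to see. A minimal sketch of that arithmetic; the helper name `coverage_percent` is hypothetical and not part of the project:

```rust
/// Coverage as a percentage, rounded to one decimal place.
/// Returns 0.0 for an empty denominator instead of dividing by zero.
fn coverage_percent(passed: u32, total: u32) -> f64 {
    if total == 0 {
        return 0.0;
    }
    let pct = f64::from(passed) * 100.0 / f64::from(total);
    (pct * 10.0).round() / 10.0
}
```

With the figures listed in this section, `coverage_percent(5, 102)` gives 4.9, `coverage_percent(23, 102)` gives 22.5, and `coverage_percent(24, 24)` gives 100.0. The last value is measured over 24 family regressions rather than the 102 scenes, which is why it cannot answer the question above.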

3. Scope Boundary

This design is limited to measurement.

It may include:

  1. reading the current 102-scene execution board
  2. resolving local source directories under the fixed real-scene root
  3. running analyzer/generator dry-runs against available sources
  4. collecting success, fail-closed, missing-source, and unsupported results
  5. publishing a standalone dry-run JSON and report
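Step 2 of the list above, resolving a local source directory, could be sketched as follows. This is an illustration only: the `SourceLookup` type, the `resolve_source` function, and the assumption that each scene maps to one directory name under the scene root are all hypothetical, since the design does not specify how the board names source directories.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Result of looking up one scene's source directory (hypothetical type).
#[derive(Debug, PartialEq)]
enum SourceLookup {
    /// The directory exists and can be listed.
    Found(PathBuf),
    /// No directory with that name exists under the root.
    Missing,
    /// The directory exists but listing it failed.
    Unreadable,
}

/// Look up `scene_dir` under `root`. Only reads from disk, never writes.
fn resolve_source(root: &Path, scene_dir: &str) -> SourceLookup {
    let candidate = root.join(scene_dir);
    if !candidate.is_dir() {
        return SourceLookup::Missing;
    }
    match fs::read_dir(&candidate) {
        Ok(_) => SourceLookup::Found(candidate),
        Err(_) => SourceLookup::Unreadable,
    }
}
```

Keeping missing and unreadable directories as separate outcomes lets a scene be reported without ever invoking the analyzer or generator on it.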

It must not include:

  1. changing analyzer logic
  2. changing generator logic
  3. changing existing family baselines
  4. changing scene_execution_board_2026-04-18.json
  5. promoting scenes from dry-run results
  6. creating new family plans
  7. running any scene outside the fixed 102-scene ledger
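One way to check the "must not change the execution board" constraint is to compare the board file's bytes before and after the sweep. The sketch below assumes the whole sweep can be passed in as a closure; the name `run_with_board_guard` is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Run `work`, then report whether the file at `board` still has
/// exactly the same bytes it had before `work` started.
fn run_with_board_guard<F: FnOnce()>(board: &Path, work: F) -> io::Result<bool> {
    let before = fs::read(board)?;
    work();
    let after = fs::read(board)?;
    Ok(before == after)
}
```

A `false` result would indicate that something in the dry-run wrote to the board, which this design forbids. The check covers only the board file; it does not detect changes to analyzer logic, generator logic, or family baselines.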

4. Fixed Inputs

Execution Board

tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json

Scene Root

D:/desk/智能体资料/全量业务场景/一平台场景

Generator

cargo run --bin sg_scene_generate

5. Fixed Outputs

Dry-Run Result JSON

tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json

Dry-Run Output Root

examples/full_sweep_dry_run_2026-04-19

Report

docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md

6. Classification Model

Each scene must receive exactly one final dry-run status:

  1. auto-pass
  2. fail-closed-known
  3. misclassified
  4. unsupported-family
  5. missing-source
  6. source-unreadable
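The "exactly one final status" rule maps naturally onto a closed enum. A minimal sketch, assuming the status strings above are used verbatim in the result JSON; the type and method names are hypothetical:

```rust
/// The six final dry-run statuses listed in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum DryRunStatus {
    AutoPass,
    FailClosedKnown,
    Misclassified,
    UnsupportedFamily,
    MissingSource,
    SourceUnreadable,
}

impl DryRunStatus {
    /// Every status, in the order the design lists them.
    const ALL: [DryRunStatus; 6] = [
        DryRunStatus::AutoPass,
        DryRunStatus::FailClosedKnown,
        DryRunStatus::Misclassified,
        DryRunStatus::UnsupportedFamily,
        DryRunStatus::MissingSource,
        DryRunStatus::SourceUnreadable,
    ];

    /// The string form used in this design.
    fn as_str(self) -> &'static str {
        match self {
            DryRunStatus::AutoPass => "auto-pass",
            DryRunStatus::FailClosedKnown => "fail-closed-known",
            DryRunStatus::Misclassified => "misclassified",
            DryRunStatus::UnsupportedFamily => "unsupported-family",
            DryRunStatus::MissingSource => "missing-source",
            DryRunStatus::SourceUnreadable => "source-unreadable",
        }
    }

    /// Parse a status string; anything outside the six values is rejected.
    fn parse(s: &str) -> Option<DryRunStatus> {
        Self::ALL.iter().copied().find(|status| status.as_str() == s)
    }
}
```

Because a scene record holds a single enum value, it cannot carry two statuses at once, and a string outside the list fails to parse instead of slipping into the result.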

7. Coverage Metrics

The dry-run must report at least these numbers:

  1. realSampleExecutedPass
  2. codeBackedLedgerCoverage
  3. dryRunAutoPass
  4. dryRunActionableCoverage
  5. missingSource
  6. sourceUnreadable
  7. unsupportedFamily
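Four of these metrics can be counted directly from the per-scene statuses. The sketch below is an assumption about how that tally might look; `DryRunCounts` and `tally` are hypothetical names. It leaves out `realSampleExecutedPass` and `codeBackedLedgerCoverage`, which the Problem Statement reports as existing project numbers rather than dry-run outcomes, and `dryRunActionableCoverage`, whose formula this design does not define.

```rust
/// Counts derived from per-scene status strings (hypothetical shape).
#[derive(Debug, Default, PartialEq)]
struct DryRunCounts {
    dry_run_auto_pass: usize,
    missing_source: usize,
    source_unreadable: usize,
    unsupported_family: usize,
    /// Statuses without a dedicated metric here,
    /// such as fail-closed-known and misclassified.
    other: usize,
}

/// Tally one status string per scene into the counts above.
fn tally(statuses: &[&str]) -> DryRunCounts {
    let mut counts = DryRunCounts::default();
    for status in statuses {
        match *status {
            "auto-pass" => counts.dry_run_auto_pass += 1,
            "missing-source" => counts.missing_source += 1,
            "source-unreadable" => counts.source_unreadable += 1,
            "unsupported-family" => counts.unsupported_family += 1,
            _ => counts.other += 1,
        }
    }
    counts
}
```

For a full sweep, the five fields should sum to 102, which gives a cheap sanity check that every scene was classified exactly once.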

8. Non-Negotiable Stop Rules

  1. If a scene fails, record the failure and continue.
  2. If many scenes fail with the same blocker, record the blocker and do not fix it in this dry-run.
  3. If the dry-run discovers a likely bug, write it up as a follow-up recommendation only.
  4. Do not update the execution board from dry-run output.
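Rules 1 and 2 amount to a loop that never aborts on a failing scene, plus a grouping of failures by blocker. A minimal sketch under stated assumptions: `SceneRecord`, `sweep`, and `blockers` are hypothetical names, and the per-scene runner is passed in as a closure because the design does not describe how the generator is invoked for each scene.

```rust
use std::collections::BTreeMap;

/// One recorded outcome per scene (hypothetical shape).
#[derive(Debug, PartialEq)]
struct SceneRecord {
    scene_id: String,
    /// `None` on success, otherwise the failure message.
    error: Option<String>,
}

/// Run every scene. A failure is recorded and the sweep moves on.
fn sweep<F>(scene_ids: &[&str], mut run_one: F) -> Vec<SceneRecord>
where
    F: FnMut(&str) -> Result<(), String>,
{
    scene_ids
        .iter()
        .map(|&id| SceneRecord {
            scene_id: id.to_string(),
            error: run_one(id).err(),
        })
        .collect()
}

/// Count how many scenes share each failure message,
/// so a common blocker is reported once with its scene count.
fn blockers(records: &[SceneRecord]) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for record in records {
        if let Some(message) = &record.error {
            *counts.entry(message.as_str()).or_insert(0) += 1;
        }
    }
    counts
}
```

The sketch only records and counts. It attempts no retry or fix and never touches the execution board, in line with rules 2 through 4.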

9. Exit Condition

This design is complete when the project has a single bounded plan that:

  1. defines the dry-run tool/task
  2. defines the dry-run output schema
  3. keeps generator logic and board status read-only
  4. produces a report that states the actual generic coverage across all 102 scenes