claw/docs/superpowers/plans/2026-04-19-102-full-sweep-dry-run-plan.md

# 102 Full Sweep Dry-Run Plan

> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-102-full-sweep-dry-run-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-102-full-sweep-dry-run-design.md)

## Plan Intent

Run one bounded, read-only full sweep over the `102` scene ledger to measure actual generic `scene -> skill` coverage.

The plan answers:

`how many of the 102 scenes can the current generic analyzer/generator handle today?`

## Scope Guardrails

1. do not change analyzer logic
2. do not change generator logic
3. do not promote scenes into `scene_execution_board_2026-04-18.json`
4. do not add new family baselines
5. do not create new family implementation plans
6. do not fix failures during this dry-run
7. do not run outside the fixed `102` scene set

## Fixed Inputs

1. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
3. generator command: `cargo run --bin sg_scene_generate`

## Fixed Outputs

1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
3. report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`

## Workstreams

1. `WS1` Build Scene Inventory
2. `WS2` Run Analyzer/Generator Dry-Run
3. `WS3` Classify Results
4. `WS4` Publish Coverage Report

## Phase 0: Freeze Dry-Run Boundary

### Objective

Make the dry-run a measurement exercise only.

### Tasks

1. freeze the execution board input
2. freeze the local scene root
3. freeze the dry-run output paths
4. explicitly mark the run as read-only with respect to generator behavior and board status

### Deliverables

1. fixed input statement
2. fixed output statement
3. dry-run no-promotion statement

### Acceptance Criteria

1. no analyzer/generator implementation file is edited for this dry-run
2. `scene_execution_board_2026-04-18.json` is not modified by dry-run results
3. failures are recorded, not fixed

## Phase 1: Build Scene Inventory

### Objective

Construct a deterministic inventory of all `102` scene names and expected source directories.

### Tasks

1. read `scene_execution_board_2026-04-18.json`
2. extract all scene entries
3. map each scene name to `D:/desk/智能体资料/全量业务场景/一平台场景/<sceneName>`
4. check whether each source directory exists
5. assign initial inventory status:
   - `source-present`
   - `missing-source`
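
The mapping and existence check in tasks 3 to 5 can be sketched as follows. This is a minimal illustration, not the project's implementation: `InventoryStatus`, `InventoryEntry`, and `build_inventory` are hypothetical names, and the scene names are assumed to have already been extracted from the execution board.

```rust
use std::path::{Path, PathBuf};

/// Initial inventory status, mirroring the two values named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InventoryStatus {
    SourcePresent,
    MissingSource,
}

impl InventoryStatus {
    /// Label as it would appear in the dry-run JSON.
    fn label(self) -> &'static str {
        match self {
            InventoryStatus::SourcePresent => "source-present",
            InventoryStatus::MissingSource => "missing-source",
        }
    }
}

/// One inventory row (hypothetical shape): scene name, expected path, status.
#[derive(Debug, Clone, PartialEq, Eq)]
struct InventoryEntry {
    scene_name: String,
    source_path: PathBuf,
    status: InventoryStatus,
}

/// Maps every scene name to `<scene_root>/<sceneName>` and records whether
/// that directory exists. A missing directory is recorded, never treated as
/// an error, so it cannot stop the sweep.
fn build_inventory(scene_root: &Path, scene_names: &[String]) -> Vec<InventoryEntry> {
    scene_names
        .iter()
        .map(|name| {
            let source_path = scene_root.join(name);
            let status = if source_path.is_dir() {
                InventoryStatus::SourcePresent
            } else {
                InventoryStatus::MissingSource
            };
            InventoryEntry {
                scene_name: name.clone(),
                source_path,
                status,
            }
        })
        .collect()
}
```

Because the output preserves the input order and has one entry per name, the inventory count equals the number of scene names read from the board.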

### Deliverables

1. inventory section inside `full_sweep_dry_run_2026-04-19.json`
2. missing-source list

### Acceptance Criteria

1. inventory count equals `102`
2. every scene has an expected source path recorded, whether or not the directory exists
3. missing source does not stop the sweep

## Phase 2: Run Analyzer/Generator Dry-Run

### Objective

Attempt current generic generation for every source-present scene without fixing failures.

### Tasks

1. generate a stable safe scene id for each scene
2. invoke `sg_scene_generate` for each source-present scene
3. write outputs under `examples/full_sweep_dry_run_2026-04-19`
4. for successful generation, read `references/generation-report.json`
5. for failed generation, capture stderr/stdout and exit code
6. continue past individual failures until all `102` scenes are processed
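
Task 1 does not say how the stable safe scene id is derived. One possible scheme, shown only as an assumption, combines the scene's position in the inventory with an FNV-1a hash of its name, so the id stays ASCII-only even when the scene name is not. `fnv1a64` and `safe_scene_id` are hypothetical helpers.

```rust
/// 64-bit FNV-1a hash. Used here because the algorithm is fully specified and
/// therefore stable across runs and toolchains; the algorithm behind `std`'s
/// `DefaultHasher` is unspecified and may change between releases.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV 64-bit prime
    }
    hash
}

/// Hypothetical id scheme: `scene-<index>-<hash>`. The result contains only
/// ASCII letters, digits, and hyphens, so it is safe as a directory name.
fn safe_scene_id(index: usize, scene_name: &str) -> String {
    format!("scene-{:03}-{:016x}", index, fnv1a64(scene_name.as_bytes()))
}
```

The same name and index always give the same id, which keeps output directories under the dry-run output root comparable between runs.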

### Deliverables

1. per-scene dry-run execution record
2. generated output root for successful scenes
3. captured error messages for failed scenes

### Acceptance Criteria

1. every source-present scene has a generator result
2. no failure aborts the full sweep
3. generator results are isolated under the dry-run output root
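
The "no failure aborts the full sweep" criterion can be illustrated with a loop that always produces a record, whatever the process does. This is a sketch under assumptions: `RunRecord`, `run_and_capture`, and `run_sweep` are hypothetical names, and the caller is expected to build each `Command` itself, since this plan does not list the generator's arguments.

```rust
use std::process::Command;

/// Per-scene execution record (hypothetical shape). `exit_code` is `None`
/// when the process could not be started or was terminated by a signal.
#[derive(Debug)]
struct RunRecord {
    scene_id: String,
    exit_code: Option<i32>,
    stdout: String,
    stderr: String,
}

/// Runs one prepared command and always returns a record, so a single
/// failure cannot abort the sweep.
fn run_and_capture(scene_id: &str, command: &mut Command) -> RunRecord {
    match command.output() {
        Ok(output) => RunRecord {
            scene_id: scene_id.to_string(),
            exit_code: output.status.code(),
            stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
            stderr: String::from_utf8_lossy(&output.stderr).into_owned(),
        },
        // Spawn failures (for example, binary not found) are recorded as text.
        Err(err) => RunRecord {
            scene_id: scene_id.to_string(),
            exit_code: None,
            stdout: String::new(),
            stderr: format!("failed to start process: {}", err),
        },
    }
}

/// Processes every job in order; failures are recorded and the loop continues.
fn run_sweep(jobs: Vec<(String, Command)>) -> Vec<RunRecord> {
    jobs.into_iter()
        .map(|(scene_id, mut command)| run_and_capture(&scene_id, &mut command))
        .collect()
}
```

Each job would pair a scene id with a `Command` for `cargo run --bin sg_scene_generate`, with whatever arguments the generator actually accepts.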

## Phase 3: Classify Results

### Objective

Turn raw dry-run output into actionable coverage categories.

### Tasks

1. classify scenes generated with `A/B` readiness and no blocker as `auto-pass`
2. classify scenes that the generator blocks for a known gate/contract reason as `fail-closed-known`
3. classify scenes with an obvious family mismatch as `misclassified`
4. classify scenes whose evidence falls outside the current families as `unsupported-family`
5. classify scenes whose source directory is absent as `missing-source`
6. classify scenes that fail to read or analyze as `source-unreadable`
7. compute top blockers by frequency
8. compute counts by inferred archetype
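
The frequency counts in tasks 7 and 8 can be sketched with one generic counter, usable for both blocker reasons and inferred archetypes. `count_by_label` and `top_n` are hypothetical helpers, and the tie-breaking rule (alphabetical) is an assumption added here to keep the ranking deterministic.

```rust
use std::collections::BTreeMap;

/// Counts how often each label occurs. `BTreeMap` keeps keys sorted, which
/// makes the output order deterministic.
fn count_by_label<'a>(labels: impl IntoIterator<Item = &'a str>) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for label in labels {
        *counts.entry(label).or_insert(0) += 1;
    }
    counts
}

/// Returns the `n` most frequent labels, highest count first. Ties are broken
/// alphabetically so repeated runs produce the same order.
fn top_n<'a>(counts: &BTreeMap<&'a str, usize>, n: usize) -> Vec<(&'a str, usize)> {
    let mut ranked: Vec<(&'a str, usize)> = counts.iter().map(|(k, v)| (*k, *v)).collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    ranked.truncate(n);
    ranked
}
```

Feeding the blocker reason of every non-pass scene into `count_by_label` and then calling `top_n` gives the top-blocker list; feeding the inferred archetype of every scene gives the by-archetype counts.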

### Deliverables

1. final dry-run status per scene
2. summary counts
3. by-archetype counts
4. top-blocker list

### Acceptance Criteria

1. every scene has exactly one final status
2. total classified count equals `102`
3. every non-pass scene has a reason
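
These three criteria are mechanical enough to check in code before the report is written. The sketch below is an assumption about how such a check might look: `FinalStatus`, `Classified`, and `check_classification` are hypothetical names, and the record shape is not taken from the actual dry-run JSON.

```rust
use std::collections::HashSet;

/// The six final statuses named in the classification tasks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FinalStatus {
    AutoPass,
    FailClosedKnown,
    Misclassified,
    UnsupportedFamily,
    MissingSource,
    SourceUnreadable,
}

/// One classified scene (hypothetical shape). A single `status` field per
/// record means a scene cannot carry two final statuses at once.
struct Classified {
    scene_name: String,
    status: FinalStatus,
    reason: Option<String>,
}

const EXPECTED_SCENES: usize = 102;

/// Checks the Phase 3 acceptance criteria and reports the first violation.
fn check_classification(records: &[Classified]) -> Result<(), String> {
    if records.len() != EXPECTED_SCENES {
        return Err(format!(
            "expected {} classified scenes, found {}",
            EXPECTED_SCENES,
            records.len()
        ));
    }
    let mut seen = HashSet::new();
    for record in records {
        // A scene listed twice would effectively have two final statuses.
        if !seen.insert(record.scene_name.as_str()) {
            return Err(format!("scene classified twice: {}", record.scene_name));
        }
        let has_reason = record
            .reason
            .as_deref()
            .map_or(false, |r| !r.trim().is_empty());
        if record.status != FinalStatus::AutoPass && !has_reason {
            return Err(format!("non-pass scene without reason: {}", record.scene_name));
        }
    }
    Ok(())
}
```

Returning the first violation as an error string keeps the check read-only: it reports a problem in the dry-run result without attempting to repair it.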

## Phase 4: Publish Coverage Report

### Objective

Answer the coverage question without changing project state.

### Tasks

1. write `full_sweep_dry_run_2026-04-19.json`
2. write `2026-04-19-102-full-sweep-dry-run-report.md`
3. report these four headline numbers:
   - `real-sample executed pass`
   - `code-backed ledger coverage`
   - `dry-run auto-pass`
   - `dry-run actionable coverage`
4. list the next recommended blocker to address, but do not start implementation

### Deliverables

1. dry-run JSON
2. dry-run report

### Acceptance Criteria

1. report states the actual generic coverage over the `102` scenes
2. report separates proven coverage from predicted/dry-run coverage
3. report does not promote scene status

## Completion Criteria

This plan is complete when:

1. all `102` scenes are included in the dry-run result
2. the dry-run result has stable summary counts
3. the report explains the gap between `5/102`, `23/102`, and dry-run coverage
4. no generator logic or execution board status is modified

## Non-Negotiable Stop Rule

After this dry-run starts:

1. do not fix generator failures inside the sweep
2. do not create new family implementation plans from a single failure
3. do not update the execution board automatically
4. stop after publishing the dry-run result and report