102 Full Sweep Dry-Run Plan

Date: 2026-04-19
Status: Draft
Upstream Spec: 2026-04-19-102-full-sweep-dry-run-design.md

Plan Intent

Run one bounded, read-only full sweep over the 102-scene ledger to measure actual generic scene -> skill coverage.

The plan answers one question:

How many of the 102 scenes can the current generic analyzer/generator handle today?

Scope Guardrails

  1. do not change analyzer logic
  2. do not change generator logic
  3. do not promote scenes into scene_execution_board_2026-04-18.json
  4. do not add new family baselines
  5. do not create new family implementation plans
  6. do not fix failures during this dry-run
  7. do not run outside the fixed 102-scene set

Fixed Inputs

  1. execution board: tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json
  2. scene root: D:/desk/智能体资料/全量业务场景/一平台场景
  3. generator command: cargo run --bin sg_scene_generate

Fixed Outputs

  1. dry-run result: tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json
  2. dry-run output root: examples/full_sweep_dry_run_2026-04-19
  3. report: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md

Workstreams

  1. WS1 Build Scene Inventory
  2. WS2 Run Analyzer/Generator Dry-Run
  3. WS3 Classify Results
  4. WS4 Publish Coverage Report

Phase 0: Freeze Dry-Run Boundary

Objective

Make the dry-run a measurement exercise only.

Tasks

  1. freeze the execution board input
  2. freeze the local scene root
  3. freeze the dry-run output paths
  4. explicitly mark the run as read-only with respect to generator behavior and board status

Deliverables

  1. fixed input statement
  2. fixed output statement
  3. dry-run no-promotion statement

Acceptance Criteria

  1. no analyzer/generator implementation file is edited for this dry-run
  2. scene_execution_board_2026-04-18.json is not modified by dry-run results
  3. failures are recorded, not fixed
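
One way to check the second criterion mechanically is to snapshot the board file before the sweep and compare it afterwards. The following is a minimal sketch using only the Rust standard library; `FrozenInput`, `capture`, and `is_unchanged` are hypothetical names, not part of the existing codebase.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Byte-for-byte snapshot of an input file, taken before the sweep starts.
/// (Hypothetical helper for illustration.)
struct FrozenInput {
    path: PathBuf,
    bytes: Vec<u8>,
}

impl FrozenInput {
    /// Read the file once and keep its contents in memory.
    fn capture(path: &Path) -> io::Result<Self> {
        Ok(Self {
            path: path.to_path_buf(),
            bytes: fs::read(path)?,
        })
    }

    /// True when the file on disk still matches the snapshot.
    fn is_unchanged(&self) -> io::Result<bool> {
        Ok(fs::read(&self.path)? == self.bytes)
    }
}
```

Capturing `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json` at the start and calling `is_unchanged` at the end would turn the criterion into a pass/fail check. Holding the whole file in memory is assumed to be acceptable for a board of this size.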

Phase 1: Build Scene Inventory

Objective

Construct a deterministic inventory of all 102 scene names and expected source directories.

Tasks

  1. read scene_execution_board_2026-04-18.json
  2. extract all scene entries
  3. map each scene name to D:/desk/智能体资料/全量业务场景/一平台场景/<sceneName>
  4. check whether each source directory exists
  5. assign initial inventory status:
    • source-present
    • missing-source
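
Tasks 3 to 5 can be sketched as follows. This assumes the scene names have already been extracted from the board (the board's JSON schema is not shown in this plan, so parsing is left out); `InventoryStatus`, `InventoryEntry`, and `build_inventory` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Initial inventory status, mirroring the two values named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InventoryStatus {
    SourcePresent,
    MissingSource,
}

#[derive(Debug)]
struct InventoryEntry {
    scene_name: String,
    source_path: PathBuf,
    status: InventoryStatus,
}

/// Map each scene name to `<scene_root>/<sceneName>` and record whether the
/// directory exists. A missing directory is recorded, never treated as an
/// error, so it cannot stop the sweep.
fn build_inventory(scene_root: &Path, scene_names: &[String]) -> Vec<InventoryEntry> {
    scene_names
        .iter()
        .map(|name| {
            let source_path = scene_root.join(name);
            let status = if source_path.is_dir() {
                InventoryStatus::SourcePresent
            } else {
                InventoryStatus::MissingSource
            };
            InventoryEntry {
                scene_name: name.clone(),
                source_path,
                status,
            }
        })
        .collect()
}
```

Because the output keeps the input order and always has one entry per name, the inventory is deterministic as long as the board is read in a fixed order.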

Deliverables

  1. inventory section inside full_sweep_dry_run_2026-04-19.json
  2. missing-source list

Acceptance Criteria

  1. inventory count equals 102
  2. every scene has a source path
  3. missing source does not stop the sweep

Phase 2: Run Analyzer/Generator Dry-Run

Objective

Attempt current generic generation for every source-present scene without fixing failures.

Tasks

  1. generate a stable safe scene id for each scene
  2. invoke sg_scene_generate for each source-present scene
  3. write outputs under examples/full_sweep_dry_run_2026-04-19
  4. for successful generation, read references/generation-report.json
  5. for failed generation, capture stderr/stdout and exit code
  6. continue until all 102 scenes are processed
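
The plan does not define what a "stable safe scene id" looks like. One possible scheme, shown here purely as an assumption, combines the scene's inventory index with a fixed hash of its name, which yields an ASCII-only id that is safe to use as a directory name even when the scene name is not ASCII. `safe_scene_id` and `fnv1a64` are hypothetical helpers.

```rust
/// 64-bit FNV-1a hash. Used instead of std's `DefaultHasher`, whose algorithm
/// is unspecified and may change between Rust releases, which would make the
/// ids unstable across toolchain upgrades.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Build an ASCII-only id of the form `scene-<index>-<hash>`.
/// The format itself is an assumption, not something the plan prescribes.
fn safe_scene_id(index: usize, scene_name: &str) -> String {
    format!("scene-{:03}-{:016x}", index, fnv1a64(scene_name.as_bytes()))
}
```

The index keeps ids unique even in the unlikely event of a hash collision, and the hash ties the id to the scene name so that a reordered board is easy to spot.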

Deliverables

  1. per-scene dry-run execution record
  2. generated output root for successful scenes
  3. captured error messages for failed scenes

Acceptance Criteria

  1. every source-present scene has a generator result
  2. no failure aborts the full sweep
  3. generator results are isolated under the dry-run output root
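
The "no failure aborts the full sweep" criterion suggests a runner that converts every outcome, including a failure to start the process, into a record. A minimal sketch, with `RunRecord` and `run_and_capture` as hypothetical names:

```rust
use std::process::Command;

/// Everything captured from one generator invocation.
#[derive(Debug)]
struct RunRecord {
    /// Exit code; None if the process never started or was ended by a signal.
    exit_code: Option<i32>,
    stdout: String,
    stderr: String,
    /// Set only when the process could not be spawned at all.
    spawn_error: Option<String>,
}

impl RunRecord {
    fn succeeded(&self) -> bool {
        self.exit_code == Some(0)
    }
}

/// Run one prepared command to completion and capture its output.
/// Never panics and never returns an error, so one failing scene
/// cannot abort the loop that calls it.
fn run_and_capture(cmd: &mut Command) -> RunRecord {
    match cmd.output() {
        Ok(out) => RunRecord {
            exit_code: out.status.code(),
            stdout: String::from_utf8_lossy(&out.stdout).into_owned(),
            stderr: String::from_utf8_lossy(&out.stderr).into_owned(),
            spawn_error: None,
        },
        Err(e) => RunRecord {
            exit_code: None,
            stdout: String::new(),
            stderr: String::new(),
            spawn_error: Some(e.to_string()),
        },
    }
}
```

The caller would build the command from `cargo run --bin sg_scene_generate` plus whatever per-scene arguments the generator accepts. Those arguments are not specified in this plan, so the sketch deliberately leaves command construction to the caller.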

Phase 3: Classify Results

Objective

Turn raw dry-run output into actionable coverage categories.

Tasks

  1. classify scenes generated with A/B readiness and no blocker as auto-pass
  2. classify scenes the generator blocks for a known gate/contract reason as fail-closed-known
  3. classify scenes with an obvious family mismatch as misclassified
  4. classify scenes whose evidence falls outside the current families as unsupported-family
  5. classify scenes whose source directory is absent as missing-source
  6. classify scenes that fail to read or analyze as source-unreadable
  7. compute top blockers by frequency
  8. compute counts by inferred archetype
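
The six categories above can be expressed as a single decision function. In the sketch below, the `SceneSignals` fields and the order in which the rules are checked are assumptions made for illustration; the plan names the categories but does not define their precedence or how each signal is derived.

```rust
/// Final dry-run status; labels match the category names used in this plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum DryRunStatus {
    AutoPass,
    FailClosedKnown,
    Misclassified,
    UnsupportedFamily,
    MissingSource,
    SourceUnreadable,
}

impl DryRunStatus {
    fn label(self) -> &'static str {
        match self {
            DryRunStatus::AutoPass => "auto-pass",
            DryRunStatus::FailClosedKnown => "fail-closed-known",
            DryRunStatus::Misclassified => "misclassified",
            DryRunStatus::UnsupportedFamily => "unsupported-family",
            DryRunStatus::MissingSource => "missing-source",
            DryRunStatus::SourceUnreadable => "source-unreadable",
        }
    }
}

/// Hypothetical per-scene signals gathered from the inventory and generator run.
#[derive(Debug, Clone)]
struct SceneSignals {
    source_present: bool,
    source_readable: bool,
    family_supported: bool,
    family_matches: bool,
    known_blocker: Option<String>,
    ab_ready: bool,
}

/// Return exactly one status per scene, checking source problems first.
/// Returns None when no rule applies, so the scene can be reviewed by hand
/// instead of being silently counted as a pass.
fn classify(s: &SceneSignals) -> Option<DryRunStatus> {
    if !s.source_present {
        return Some(DryRunStatus::MissingSource);
    }
    if !s.source_readable {
        return Some(DryRunStatus::SourceUnreadable);
    }
    if !s.family_supported {
        return Some(DryRunStatus::UnsupportedFamily);
    }
    if !s.family_matches {
        return Some(DryRunStatus::Misclassified);
    }
    if s.known_blocker.is_some() {
        return Some(DryRunStatus::FailClosedKnown);
    }
    if s.ab_ready {
        return Some(DryRunStatus::AutoPass);
    }
    None
}
```

Using early returns guarantees that a scene never receives two statuses, which is what the "exactly one final status" criterion requires.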

Deliverables

  1. final dry-run status per scene
  2. summary counts
  3. by-archetype counts
  4. top-blocker list

Acceptance Criteria

  1. every scene has exactly one final status
  2. total classified count equals 102
  3. every non-pass scene has a reason
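
These three criteria, together with the top-blocker count from the task list, can be checked in code once the per-scene results exist. The sketch below assumes a simple in-memory record; `SceneResult`, `acceptance_violations`, and `top_blockers` are hypothetical names, and statuses are plain strings to keep the example self-contained.

```rust
use std::collections::{HashMap, HashSet};

/// One classified scene. (Hypothetical shape for illustration.)
#[derive(Debug, Clone)]
struct SceneResult {
    scene_name: String,
    status: String,         // e.g. "auto-pass", "fail-closed-known"
    reason: Option<String>, // blocker or failure reason for non-pass scenes
}

/// Return a human-readable message for every violated acceptance criterion.
/// An empty vector means all three criteria hold.
fn acceptance_violations(results: &[SceneResult], expected_total: usize) -> Vec<String> {
    let mut violations = Vec::new();
    if results.len() != expected_total {
        violations.push(format!(
            "classified count is {}, expected {}",
            results.len(),
            expected_total
        ));
    }
    let mut seen = HashSet::new();
    for r in results {
        if !seen.insert(r.scene_name.as_str()) {
            violations.push(format!("scene '{}' has more than one final status", r.scene_name));
        }
        let has_reason = r.reason.as_deref().map_or(false, |s| !s.trim().is_empty());
        if r.status != "auto-pass" && !has_reason {
            violations.push(format!("non-pass scene '{}' has no reason", r.scene_name));
        }
    }
    violations
}

/// Count reasons across non-pass scenes and return the most frequent ones.
/// Ties are broken alphabetically so the output is stable between runs.
fn top_blockers(results: &[SceneResult], limit: usize) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for r in results.iter().filter(|r| r.status != "auto-pass") {
        if let Some(reason) = r.reason.as_deref() {
            *counts.entry(reason).or_insert(0) += 1;
        }
    }
    let mut ranked: Vec<(String, usize)> =
        counts.into_iter().map(|(k, v)| (k.to_string(), v)).collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(limit);
    ranked
}
```

Calling `acceptance_violations(&results, 102)` before writing the summary would surface a miscount or a missing reason as an explicit message rather than as a silently wrong total.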

Phase 4: Publish Report

Objective

Answer the coverage question without changing project state.

Tasks

  1. write full_sweep_dry_run_2026-04-19.json
  2. write 2026-04-19-102-full-sweep-dry-run-report.md
  3. report these four headline numbers:
    • real-sample executed pass
    • code-backed ledger coverage
    • dry-run auto-pass
    • dry-run actionable coverage
  4. list the next recommended blocker to address, but do not start implementation

Deliverables

  1. dry-run JSON
  2. dry-run report

Acceptance Criteria

  1. report can answer actual generic coverage over 102 scenes
  2. report separates proven coverage from predicted/dry-run coverage
  3. report does not promote scene status

Completion Criteria

This plan is complete when:

  1. all 102 scenes are included in the dry-run result
  2. the dry-run result has stable summary counts
  3. the report explains the gap between 5/102, 23/102, and dry-run coverage
  4. no generator logic or execution board status is modified

Non-Negotiable Stop Rule

After this dry-run starts:

  1. do not fix generator failures inside the sweep
  2. do not create new family implementation plans from a single failure
  3. do not update the execution board automatically
  4. stop after publishing the dry-run result and report