102 Full Sweep Dry-Run Report

Date: 2026-04-19
Plan: docs/superpowers/plans/2026-04-19-102-full-sweep-dry-run-plan.md
Result: tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json
Output Root: examples/full_sweep_dry_run_2026-04-19

Scope

This run measured current generic scene -> skill coverage over the fixed 102 scene execution board.

It was a measurement-only dry-run:

  1. no analyzer logic was changed
  2. no generator logic was changed
  3. scene_execution_board_2026-04-18.json was not updated
  4. no scene was promoted from this result
  5. failures were recorded, not fixed

Headline Numbers

| Metric | Count |
| --- | --- |
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |

Dry-run actionable coverage is the sum of auto-pass and fail-closed-known scenes: 40 + 26 = 66.

Dry-Run Summary

| Dry-run status | Count |
| --- | --- |
| auto-pass | 40 |
| fail-closed-known | 26 |
| misclassified | 5 |
| unsupported-family | 0 |
| missing-source | 0 |
| source-unreadable | 31 |
| Total | 102 |
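
As a quick cross-check of the summary, the sketch below recomputes the board total and the actionable coverage from the status counts listed in this section. The variable and function names are illustrative only and are not taken from the repository.

```python
# Status counts copied from the Dry-Run Summary in this report.
STATUS_COUNTS = {
    "auto-pass": 40,
    "fail-closed-known": 26,
    "misclassified": 5,
    "unsupported-family": 0,
    "missing-source": 0,
    "source-unreadable": 31,
}
BOARD_SIZE = 102  # fixed scene execution board size


def actionable_coverage(counts):
    """Return auto-pass + fail-closed-known, as defined in this report."""
    return counts["auto-pass"] + counts["fail-closed-known"]


total = sum(STATUS_COUNTS.values())
actionable = actionable_coverage(STATUS_COUNTS)
print(f"total={total} actionable={actionable}/{BOARD_SIZE}")
# prints: total=102 actionable=66/102
```

Every status bucket is counted exactly once, so the total matching the board size confirms that no scene was dropped or double-counted in the summary.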

Archetype Distribution

| Inferred archetype | Count |
| --- | --- |
| host_bridge_workflow | 31 |
| paginated_enrichment | 8 |
| multi_mode_request | 3 |
| multi_endpoint_inventory | 2 |
| page_state_eval | 2 |
| none | 56 |

The "none" bucket covers scenes with no inferred archetype. It includes generator failures and timeout cases that did not produce a generation-report.json.

Auto-Pass Shape

The 40 auto-pass scenes are distributed as:

| Inferred archetype | Auto-pass count |
| --- | --- |
| host_bridge_workflow | 26 |
| paginated_enrichment | 8 |
| multi_mode_request | 3 |
| multi_endpoint_inventory | 2 |
| page_state_eval | 1 |

This means the current generic generator is no longer limited to the 23 code-backed ledger scenes. The conservative ledger coverage is lower because it only counts scenes already mapped into formal baseline or boundary assets.

Non-Pass Buckets

Source-Unreadable

31 scenes timed out during this bounded dry-run.

All timeout records use:

generator timeout after 30s

These timeouts should not be interpreted as evidence of an unsupported family. They are dry-run execution-limit failures and need separate timeout/performance triage before any capability conclusions are drawn.
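
For context on what a bounded run implies, the following is a minimal sketch of a wall-clock-limited invocation that yields the timeout record quoted above. It is not the dry-run harness itself: the report does not name the generator command, so `command` is a caller-supplied placeholder, and `run_bounded` and the returned dictionary shape are assumptions made for illustration.

```python
import subprocess
import sys

TIMEOUT_SECONDS = 30  # matches the "generator timeout after 30s" records


def run_bounded(command, timeout=TIMEOUT_SECONDS):
    """Run one command under a wall-clock limit and describe the outcome.

    `command` is whatever argv list launches the generator for a scene;
    this report does not name it, so callers must supply it.
    """
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # Recorded under the source-unreadable status in this dry-run.
        return {
            "status": "source-unreadable",
            "reason": f"generator timeout after {timeout}s",
        }
    # Classifying a finished run is out of scope here; return the raw facts.
    return {"status": None, "returncode": completed.returncode}


if __name__ == "__main__":
    # Stand-in command: a Python child that sleeps longer than the limit.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_bounded(slow, timeout=1))
```

A limit like this cannot tell a slow but healthy run from a hung one, which is why the timeout records say nothing about family support.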

Fail-Closed-Known

26 scenes failed without an auto-pass result but were recorded with a known dry-run failure category.

Top reasons:

| Reason | Count |
| --- | --- |
| generator failed without generation report | 25 |
| bootstrap_target | 1 |

The "generator failed without generation report" bucket is actionable but too broad to drive implementation work. It should be split in a later bounded triage pass before any fixes are attempted.

Misclassified

5 scenes produced a package, but the inferred archetype conflicted with the current board group:

| Scene | Current group | Inferred archetype |
| --- | --- | --- |
| 95598报修工单日管控 | G3 | host_bridge_workflow |
| 95598重要服务事项报备统计表 | G3 | host_bridge_workflow |
| 用电报装信息统计列表 | G1-E | host_bridge_workflow |
| 配网支撑月报(95598抢修统计报表) | G3 | host_bridge_workflow |
| 高低压新增报装容量月度统计表 | G1-E | host_bridge_workflow |

This is the clearest blocker category from the dry-run: it indicates that current generic routing can over-prefer host_bridge_workflow on some scenes that already have board-level family expectations.
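
A board-vs-archetype check of this kind can be sketched as below. The report does not state which archetypes each board group expects, so the `EXPECTED_BY_GROUP` values are deliberately fake placeholders, and the record field names (`scene`, `group`, `inferred_archetype`) are assumptions rather than the actual result schema.

```python
# HYPOTHETICAL expectations: the real group-to-archetype mapping is not
# given in this report. The set members below are placeholders only.
EXPECTED_BY_GROUP = {
    "G3": {"placeholder_archetype_for_G3"},
    "G1-E": {"placeholder_archetype_for_G1_E"},
}


def find_mismatches(records, expected_by_group):
    """Return records whose inferred archetype conflicts with the board group."""
    mismatches = []
    for record in records:
        expected = expected_by_group.get(record["group"])
        if expected is None:
            continue  # no board-level expectation known for this group
        if record["inferred_archetype"] not in expected:
            mismatches.append(record)
    return mismatches


# One scene from the misclassified list above, in an assumed record shape.
records = [
    {
        "scene": "用电报装信息统计列表",
        "group": "G1-E",
        "inferred_archetype": "host_bridge_workflow",
    },
]
for record in find_mismatches(records, EXPECTED_BY_GROUP):
    print(record["scene"], record["group"], record["inferred_archetype"])
```

Scenes in groups without a recorded expectation are skipped rather than flagged, so the check only reports conflicts it can actually justify.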

Interpretation

The four coverage numbers answer different questions:

  1. 5 / 102 is the strict real-sample pass count.
  2. 23 / 102 is the formal code-backed ledger coverage.
  3. 40 / 102 is the current generic dry-run auto-pass count.
  4. 66 / 102 is the current generic actionable coverage count.

The key result is that the generic generator currently auto-passes more scenes than the formal ledger coverage shows, but the result is not clean enough to promote automatically because:

  1. 31 scenes hit bounded dry-run timeouts.
  2. 5 scenes show board-vs-archetype mismatch.
  3. 26 scenes need more specific failure extraction before implementation work.

Do not start implementation from this report directly.

The next bounded step should be a dry-run triage pass, in this priority order:

  1. split the 31 timeout cases into true timeout, oversized source, and command-level hang
  2. inspect the 5 misclassified cases as the first routing-quality sample
  3. refine the 25 generic no-report failures into concrete failure categories
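
The priority order above could be turned into a triage queue roughly as follows. This is a sketch under assumptions: the `status` and `reason` field names and the sample scene ids are hypothetical, and only the status and reason strings come from this report.

```python
# Priority order from this report: timeouts first, then misclassified
# scenes, then the generic no-report failures. A reason of None means
# "any reason" for that status.
TRIAGE_PRIORITY = [
    ("source-unreadable", None),
    ("misclassified", None),
    ("fail-closed-known", "generator failed without generation report"),
]


def build_triage_queue(records):
    """Return the records that need triage, ordered by report priority."""
    queue = []
    for status, reason in TRIAGE_PRIORITY:
        for record in records:
            if record["status"] != status:
                continue
            if reason is not None and record.get("reason") != reason:
                continue
            queue.append(record)
    return queue


# Hypothetical sample records; the scene ids are made up for illustration.
sample = [
    {"scene": "scene-a", "status": "auto-pass"},
    {"scene": "scene-b", "status": "fail-closed-known",
     "reason": "generator failed without generation report"},
    {"scene": "scene-c", "status": "misclassified"},
    {"scene": "scene-d", "status": "source-unreadable",
     "reason": "generator timeout after 30s"},
    {"scene": "scene-e", "status": "fail-closed-known",
     "reason": "bootstrap_target"},
]
print([record["scene"] for record in build_triage_queue(sample)])
# prints: ['scene-d', 'scene-c', 'scene-b']
```

Auto-pass scenes and the single bootstrap_target failure are left out of the queue because the three priority items do not cover them.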

This report does not update the execution board and does not promote any scene.