102 Full Sweep Dry-Run Report
- Date: 2026-04-19
- Plan: `docs/superpowers/plans/2026-04-19-102-full-sweep-dry-run-plan.md`
- Result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
- Output Root: `examples/full_sweep_dry_run_2026-04-19`
Scope
This run measured current generic scene -> skill coverage over the fixed 102 scene execution board.
It was a measurement-only dry-run:
- no analyzer logic was changed
- no generator logic was changed
- `scene_execution_board_2026-04-18.json` was not updated
- no scene was promoted from this result
- failures were recorded, not fixed
Headline Numbers
| Metric | Count |
|---|---|
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |
Dry-run actionable coverage is the sum of `auto-pass` and `fail-closed-known` (40 + 26 = 66).
Dry-Run Summary
| Dry-run status | Count |
|---|---|
| `auto-pass` | 40 |
| `fail-closed-known` | 26 |
| `misclassified` | 5 |
| `unsupported-family` | 0 |
| `missing-source` | 0 |
| `source-unreadable` | 31 |
| Total | 102 |
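The headline coverage figures can be re-derived from the status counts in this table. The following is a minimal sketch of that arithmetic, not the project's own tooling; the names `status_counts` and `actionable_coverage` are illustrative, and the counts are copied by hand from the report rather than read from the result JSON.

```python
from collections import Counter

TOTAL_SCENES = 102

# Status counts copied from the Dry-Run Summary table.
status_counts = Counter({
    "auto-pass": 40,
    "fail-closed-known": 26,
    "misclassified": 5,
    "unsupported-family": 0,
    "missing-source": 0,
    "source-unreadable": 31,
})


def actionable_coverage(counts):
    """Actionable coverage = auto-pass + fail-closed-known."""
    return counts["auto-pass"] + counts["fail-closed-known"]


# Every scene on the board should land in exactly one status bucket.
assert sum(status_counts.values()) == TOTAL_SCENES

auto_pass = status_counts["auto-pass"]
actionable = actionable_coverage(status_counts)
print(f"auto-pass:  {auto_pass} / {TOTAL_SCENES} ({auto_pass / TOTAL_SCENES:.1%})")
print(f"actionable: {actionable} / {TOTAL_SCENES} ({actionable / TOTAL_SCENES:.1%})")
```

Run as written, this reports 40 / 102 (39.2%) for auto-pass and 66 / 102 (64.7%) for actionable coverage.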
Archetype Distribution
| Inferred archetype | Count |
|---|---|
| `host_bridge_workflow` | 31 |
| `paginated_enrichment` | 8 |
| `multi_mode_request` | 3 |
| `multi_endpoint_inventory` | 2 |
| `page_state_eval` | 2 |
| `none` | 56 |
The `none` bucket includes generator failures and timeout cases that did not produce a `generation-report.json`.
Auto-Pass Shape
The 40 auto-pass scenes are distributed as:
| Inferred archetype | Auto-pass count |
|---|---|
| `host_bridge_workflow` | 26 |
| `paginated_enrichment` | 8 |
| `multi_mode_request` | 3 |
| `multi_endpoint_inventory` | 2 |
| `page_state_eval` | 1 |
This means the current generic generator is no longer limited to the 23 code-backed ledger scenes. The conservative ledger coverage is lower because it only counts scenes already mapped into formal baseline or boundary assets.
Non-Pass Buckets
Source-Unreadable
31 scenes timed out during this bounded dry-run.
All timeout records use the reason `generator timeout after 30s`.
These should not be interpreted as evidence of an unsupported family. They are dry-run execution-limit failures and need separate timeout and performance triage before any capability conclusions are drawn.
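For illustration, a bounded run that yields this kind of record might look like the sketch below. It assumes the generator is invoked as an external command; the function name `run_bounded`, the `"completed"` and `"failed"` labels, and the exit-code handling are hypothetical and are not taken from the actual dry-run harness. Only the 30-second limit, the `source-unreadable` status, and the reason string come from this report.

```python
import subprocess
import sys

TIMEOUT_SECONDS = 30  # matches the "generator timeout after 30s" records


def run_bounded(cmd, timeout=TIMEOUT_SECONDS):
    """Run one generator command under a time limit.

    Returns a (status, reason) pair. The command line itself is an
    assumption; the report does not say how the generator is invoked.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # An execution-limit failure: it says nothing about family support.
        return "source-unreadable", f"generator timeout after {timeout}s"
    if proc.returncode != 0:
        # Hypothetical label; the real harness may categorize this differently.
        return "failed", f"generator exited with code {proc.returncode}"
    return "completed", ""


if __name__ == "__main__":
    # Stand-in command that sleeps past a 1-second limit.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_bounded(slow, timeout=1))
```

Keeping the timeout as its own status, separate from ordinary failures, is what lets a later triage pass revisit these scenes without mixing them into capability conclusions.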
Fail-Closed-Known
26 scenes failed without an auto-pass result but were recorded with a known dry-run failure category.
Top reasons:
| Reason | Count |
|---|---|
| `generator failed without generation report` | 25 |
| `bootstrap_target` | 1 |
The `generator failed without generation report` bucket is actionable but too broad for implementation work. It should be split in a later bounded triage pass before any fixes are attempted.
Misclassified
5 scenes produced a package, but the inferred archetype conflicted with the current board group:
| Scene | Current group | Inferred archetype |
|---|---|---|
| 95598报修工单日管控 | G3 | `host_bridge_workflow` |
| 95598重要服务事项报备统计表 | G3 | `host_bridge_workflow` |
| 用电报装信息统计列表 | G1-E | `host_bridge_workflow` |
| 配网支撑月报(95598抢修统计报表) | G3 | `host_bridge_workflow` |
| 高低压新增报装容量月度统计表 | G1-E | `host_bridge_workflow` |
This is the clearest blocker category from the dry-run because it indicates current generic routing can over-prefer host_bridge_workflow on some scenes that already have board-level family expectations.
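The mismatch check behind this bucket can be sketched as a comparison between each scene's board group and its inferred archetype. This report does not state which archetypes each board group expects, so the mapping below uses placeholder values that are purely hypothetical; the function name `find_misclassified` and the record fields are also illustrative.

```python
def find_misclassified(records, expected_by_group):
    """Return records whose inferred archetype is not allowed for their board group."""
    mismatches = []
    for rec in records:
        allowed = expected_by_group.get(rec["group"])
        if allowed is None:
            continue  # no board-level expectation recorded for this group
        if rec["inferred"] not in allowed:
            mismatches.append(rec)
    return mismatches


# Two rows taken from the table above.
records = [
    {"scene": "95598报修工单日管控", "group": "G3", "inferred": "host_bridge_workflow"},
    {"scene": "用电报装信息统计列表", "group": "G1-E", "inferred": "host_bridge_workflow"},
]

# Hypothetical placeholders: the real group-to-archetype expectations
# live on the execution board and are not given in this report.
expected_by_group = {
    "G3": {"placeholder_g3_archetype"},
    "G1-E": {"placeholder_g1e_archetype"},
}

for rec in find_misclassified(records, expected_by_group):
    print(rec["scene"], rec["group"], rec["inferred"])
```

Groups without a recorded expectation are skipped rather than flagged, so the check only reports conflicts against expectations the board actually holds.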
Interpretation
The four coverage numbers answer different questions:
- `5 / 102` is the strict real-sample pass count.
- `23 / 102` is the formal code-backed ledger coverage.
- `40 / 102` is the current generic dry-run auto-pass count.
- `66 / 102` is the current generic actionable coverage count.
The key result is that the generic generator currently auto-passes more scenes than the formal ledger coverage shows, but the result is not clean enough to promote automatically because:
- `31` scenes hit bounded dry-run timeouts.
- `5` scenes show a board-vs-archetype mismatch.
- `26` scenes need more specific failure extraction before implementation work.
Recommended Next Blocker
Do not start implementation from this report directly.
The next bounded step should be a dry-run triage pass, with priority:
- split the `31` timeout cases into true timeout, oversized source, and command-level hang
- inspect the `5` misclassified cases as the first routing-quality sample
- refine the `25` generic no-report failures into concrete failure categories
This report does not update the execution board and does not promote any scene.