# 102 Full Sweep Dry-Run Report
> Date: 2026-04-19
>
> Plan: `docs/superpowers/plans/2026-04-19-102-full-sweep-dry-run-plan.md`
>
> Result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
>
> Output Root: `examples/full_sweep_dry_run_2026-04-19`

## Scope
This run measured the current generic `scene -> skill` coverage over the fixed `102`-scene execution board.
It was a measurement-only dry-run:
1. no analyzer logic was changed
2. no generator logic was changed
3. `scene_execution_board_2026-04-18.json` was not updated
4. no scene was promoted from this result
5. failures were recorded, not fixed
## Headline Numbers
| Metric | Count |
| --- | ---: |
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |

Dry-run actionable coverage is `auto-pass + fail-closed-known`, here 40 + 26 = 66.
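
As a minimal sketch, assuming each scene's result can be reduced to one status string (the real schema of `full_sweep_dry_run_2026-04-19.json` is not shown in this report), the headline dry-run counts follow from a plain tally. `summarize` and `STATUSES` are illustrative names, not part of the codebase.

```python
from collections import Counter

# Dry-run statuses as listed in the Dry-Run Summary table.
STATUSES = (
    "auto-pass",
    "fail-closed-known",
    "misclassified",
    "unsupported-family",
    "missing-source",
    "source-unreadable",
)


def summarize(statuses: list[str]) -> dict[str, int]:
    """Tally per-scene statuses and derive actionable coverage."""
    counts = Counter(statuses)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown statuses: {sorted(unknown)}")
    summary = {status: counts[status] for status in STATUSES}
    summary["total"] = len(statuses)
    # Actionable coverage = auto-pass + fail-closed-known.
    summary["actionable"] = summary["auto-pass"] + summary["fail-closed-known"]
    return summary


# Rebuild a status list from the counts reported in this run.
reported = {"auto-pass": 40, "fail-closed-known": 26, "misclassified": 5, "source-unreadable": 31}
statuses = [status for status, n in reported.items() for _ in range(n)]
summary = summarize(statuses)
print(f"{summary['actionable']} / {summary['total']}")  # 66 / 102
```

Rejecting unknown statuses keeps a typo in a status label from silently dropping a scene out of the total.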
## Dry-Run Summary
| Dry-run status | Count |
| --- | ---: |
| `auto-pass` | 40 |
| `fail-closed-known` | 26 |
| `misclassified` | 5 |
| `unsupported-family` | 0 |
| `missing-source` | 0 |
| `source-unreadable` | 31 |
| Total | 102 |
## Archetype Distribution
| Inferred archetype | Count |
| --- | ---: |
| `host_bridge_workflow` | 31 |
| `paginated_enrichment` | 8 |
| `multi_mode_request` | 3 |
| `multi_endpoint_inventory` | 2 |
| `page_state_eval` | 2 |
| `none` | 56 |

The `none` bucket holds the scenes that did not produce a `generation-report.json`: the 31 timeout cases plus the 25 generator failures without a report (31 + 25 = 56).
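
The `none` bucket can be reproduced by treating a missing report as `none`. This sketch assumes one directory per scene under the output root and a hypothetical `inferred_archetype` key inside `generation-report.json`; neither layout detail is confirmed by this report.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def infer_archetype(scene_dir: Path) -> str:
    """Return the archetype recorded in generation-report.json, or 'none'.

    'inferred_archetype' is a hypothetical key; the real report schema may differ.
    """
    report = scene_dir / "generation-report.json"
    if not report.is_file():
        # Timeouts and no-report generator failures land here.
        return "none"
    data = json.loads(report.read_text(encoding="utf-8"))
    return data.get("inferred_archetype") or "none"


def archetype_distribution(output_root: Path) -> Counter:
    """Tally archetypes over every scene directory under output_root."""
    return Counter(infer_archetype(d) for d in sorted(output_root.iterdir()) if d.is_dir())


# Demo on a throwaway directory: one scene with a report, one without.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scene_a").mkdir()
    (root / "scene_a" / "generation-report.json").write_text(
        json.dumps({"inferred_archetype": "paginated_enrichment"}), encoding="utf-8"
    )
    (root / "scene_b").mkdir()  # no report, e.g. a timeout
    dist = archetype_distribution(root)
print(dist)
```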
## Auto-Pass Shape
The `40` auto-pass scenes are distributed as follows:

| Inferred archetype | Auto-pass count |
| --- | ---: |
| `host_bridge_workflow` | 26 |
| `paginated_enrichment` | 8 |
| `multi_mode_request` | 3 |
| `multi_endpoint_inventory` | 2 |
| `page_state_eval` | 1 |

Because 40 exceeds 23, at least 17 auto-pass scenes fall outside the `23` code-backed ledger scenes, so the current generic generator is no longer limited to the ledger. The conservative ledger coverage is lower because it counts only scenes already mapped into formal baseline or boundary assets.
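
An archetype's total in the distribution table is spread across dry-run statuses. In this run, for example, the 31 `host_bridge_workflow` scenes are the 26 auto-passes plus the 5 misclassified scenes. A cross-tab over `(status, archetype)` pairs makes that split explicit; the pairs below are a small hypothetical subset, not the real result file.

```python
from collections import Counter

# Hypothetical (status, archetype) pairs, a small illustrative subset.
records = [
    ("auto-pass", "host_bridge_workflow"),
    ("auto-pass", "host_bridge_workflow"),
    ("misclassified", "host_bridge_workflow"),
    ("auto-pass", "page_state_eval"),
    ("source-unreadable", "none"),
]

# Full cross-tab: one cell per (status, archetype) combination.
crosstab = Counter(records)

# Auto-pass shape: keep only auto-pass rows, keyed by archetype.
auto_pass_shape = Counter(arch for status, arch in records if status == "auto-pass")

# An archetype's total is the sum of its cells across all statuses.
hbw_total = sum(n for (status, arch), n in crosstab.items() if arch == "host_bridge_workflow")
print(auto_pass_shape["host_bridge_workflow"], hbw_total)  # 2 3
```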
## Non-Pass Buckets
### Source-Unreadable
`31` scenes timed out during this bounded dry-run and were recorded under `source-unreadable`. All timeout records carry the same message, `generator timeout after 30s`.

These timeouts should not be read as unsupported-family evidence. They are execution-limit failures of the dry-run and need separate timeout/performance triage before any capability conclusions are drawn.
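
A bounded run of this kind can be sketched with Python's `subprocess.run` and its `timeout` argument. The generator command is a hypothetical placeholder; only the 30-second limit, the `source-unreadable` status, and the message text come from this report.

```python
import subprocess
import sys

TIMEOUT_S = 30  # the bound used in this dry-run


def run_bounded(cmd: list[str], timeout_s: float = TIMEOUT_S) -> dict:
    """Run one generator command under a wall-clock limit.

    subprocess.run kills and reaps the child when the timeout expires.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Execution-limit failure: recorded, but not evidence about family support.
        return {
            "status": "source-unreadable",
            "timed_out": True,
            "reason": f"generator timeout after {timeout_s:g}s",
        }
    return {"timed_out": False, "returncode": proc.returncode, "stderr": proc.stderr}


# Usage with a stand-in "generator" that sleeps past a short limit.
result = run_bounded([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5)
print(result["reason"])  # generator timeout after 0.5s
```

Keeping the timeout flag separate from the status label makes it easy to pull every timeout into the dedicated triage pass later.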
### Fail-Closed-Known
`26` scenes did not auto-pass but were recorded under a known dry-run failure category.

Failure reasons:

| Reason | Count |
| --- | ---: |
| `generator failed without generation report` | 25 |
| `bootstrap_target` | 1 |

The `generator failed without generation report` bucket (25 of the 26) is actionable but too broad for implementation work. It should be split in a later bounded triage pass before any fixes are attempted.
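
One way to split the broad bucket is an ordered list of pattern rules over the generator's stderr, with unmatched failures keeping the broad label so nothing is silently reclassified. The sub-category names and patterns below are hypothetical placeholders, not findings from this run.

```python
import re
from collections import Counter

# Hypothetical sub-categories and stderr patterns, checked in order.
RULES = [
    ("import-error", re.compile(r"ModuleNotFoundError|ImportError")),
    ("parse-error", re.compile(r"SyntaxError|JSONDecodeError")),
    ("missing-file", re.compile(r"FileNotFoundError|No such file")),
]
BROAD = "generator failed without generation report"


def split_failure(stderr: str) -> str:
    """Return the first matching sub-category, or keep the broad label."""
    for label, pattern in RULES:
        if pattern.search(stderr):
            return label
    return BROAD


def split_bucket(stderrs: list[str]) -> Counter:
    """Re-tally a whole bucket so the remaining broad cases stay visible."""
    return Counter(split_failure(s) for s in stderrs)


print(split_bucket(["FileNotFoundError: scene.json", "ModuleNotFoundError: x", "killed"]))
```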
### Misclassified
`5` scenes produced a package, but the inferred archetype conflicted with the current board group:

| Scene | Current group | Inferred archetype |
| --- | --- | --- |
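| `95598报修工单日管控` (95598 repair-request work orders, daily control) | `G3` | `host_bridge_workflow` |
| `95598重要服务事项报备统计表` (95598 important service matters filing statistics) | `G3` | `host_bridge_workflow` |
| `用电报装信息统计列表` (electricity connection application statistics list) | `G1-E` | `host_bridge_workflow` |
| `配网支撑月报(95598抢修统计报表)` (distribution network support monthly report, 95598 emergency repair statistics) | `G3` | `host_bridge_workflow` |
| `高低压新增报装容量月度统计表` (monthly statistics of newly applied high/low-voltage connection capacity) | `G1-E` | `host_bridge_workflow` |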

This is the clearest blocker category in the dry-run: it indicates that the current generic routing can over-prefer `host_bridge_workflow` on scenes that already have board-level family expectations.
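
The misclassification check itself is a set-membership test. In this sketch the board-group expectations are passed in as a mapping; the placeholder archetype names stand in for the real expectations in `scene_execution_board_2026-04-18.json`, which this report does not reproduce. The rows are the five scenes listed in the Misclassified table.

```python
from collections import Counter

# Placeholder expectations: the real per-group families live on the execution board.
EXPECTED_BY_GROUP = {
    "G3": {"expected_g3_archetype"},
    "G1-E": {"expected_g1e_archetype"},
}


def is_misclassified(group: str, inferred: str, expected_by_group: dict[str, set[str]]) -> bool:
    """A produced package whose archetype is not among the group's expected archetypes."""
    if inferred == "none":
        return False  # no package: timeout or no-report failure, a different bucket
    expected = expected_by_group.get(group)
    if expected is None:
        return False  # assumption: groups without stated expectations are not judged
    return inferred not in expected


# The five Misclassified rows: (scene, board group, inferred archetype).
rows = [
    ("95598报修工单日管控", "G3", "host_bridge_workflow"),
    ("95598重要服务事项报备统计表", "G3", "host_bridge_workflow"),
    ("用电报装信息统计列表", "G1-E", "host_bridge_workflow"),
    ("配网支撑月报(95598抢修统计报表)", "G3", "host_bridge_workflow"),
    ("高低压新增报装容量月度统计表", "G1-E", "host_bridge_workflow"),
]

# Count misclassifications per inferred archetype to expose an over-preferred route.
skew = Counter(arch for _, group, arch in rows if is_misclassified(group, arch, EXPECTED_BY_GROUP))
print(skew)  # Counter({'host_bridge_workflow': 5})
```

Grouping the mismatches by inferred archetype rather than by scene is what surfaces the routing skew: all five conflicts point at the same archetype.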
## Interpretation
The four coverage numbers answer different questions:
1. `5 / 102` is the strict real-sample pass count.
2. `23 / 102` is the formal code-backed ledger coverage.
3. `40 / 102` is the current generic dry-run auto-pass count.
4. `66 / 102` is the current generic actionable coverage count.

The key result is that the generic generator currently auto-passes more scenes (40) than the formal ledger covers (23), but the result is not clean enough to promote automatically because:
1. `31` scenes hit bounded dry-run timeouts.
2. `5` scenes show board-vs-archetype mismatch.
3. `26` scenes need more specific failure extraction before implementation work.
## Recommended Next Blocker
Do not start implementation from this report directly.
The next bounded step should be a dry-run triage pass, with priority:
1. split the `31` timeout cases into true timeout, oversized source, and command-level hang
2. inspect the `5` misclassified cases as the first routing-quality sample
3. refine the `25` generic no-report failures into concrete failure categories
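
Step 1 could start from a simple rule-based split. This sketch assumes two hypothetical per-record measurements (source size, and generator output captured before the kill) and an arbitrary 5 MiB threshold; these fields and the threshold are assumptions, not fields known to exist in the result file.

```python
from dataclasses import dataclass

# Hypothetical threshold; it should be tuned against the real source-size distribution.
OVERSIZED_BYTES = 5 * 1024 * 1024


@dataclass
class TimeoutRecord:
    scene: str
    source_bytes: int  # hypothetical: size of the scene's source input
    output_bytes: int  # hypothetical: generator output captured before the kill


def triage_timeout(rec: TimeoutRecord) -> str:
    """Assign one timeout to a triage bucket."""
    if rec.source_bytes >= OVERSIZED_BYTES:
        return "oversized source"
    if rec.output_bytes == 0:
        # No observable progress before the limit: likely stuck at the command level.
        return "command-level hang"
    # Normal-size source with some progress: the work genuinely needs more time.
    return "true timeout"
```

Checking source size first means a large source that also produced no output is counted as oversized rather than hung; the opposite ordering is equally defensible and should be settled during the triage pass.
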

This report does not update the execution board and does not promote any scene.