# 102 Full Sweep Dry-Run Report

> Date: 2026-04-19

> Plan: `docs/superpowers/plans/2026-04-19-102-full-sweep-dry-run-plan.md`

> Result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`

> Output Root: `examples/full_sweep_dry_run_2026-04-19`

## Scope

This run measured the current generic `scene -> skill` coverage over the fixed `102`-scene execution board.

It was a measurement-only dry-run:

1. no analyzer logic was changed
2. no generator logic was changed
3. `scene_execution_board_2026-04-18.json` was not updated
4. no scene was promoted from this result
5. failures were recorded, not fixed

## Headline Numbers

| Metric | Count |
| --- | ---: |
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |

Dry-run actionable coverage is defined as `auto-pass + fail-closed-known`, i.e. `40 + 26 = 66`.

## Dry-Run Summary

| Dry-run status | Count |
| --- | ---: |
| `auto-pass` | 40 |
| `fail-closed-known` | 26 |
| `misclassified` | 5 |
| `unsupported-family` | 0 |
| `missing-source` | 0 |
| `source-unreadable` | 31 |
| Total | 102 |

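The summary table can be checked mechanically. The Python sketch below is a hypothetical helper and is not part of the generator or the result-JSON tooling. It copies the status counts from the table, confirms that they add up to the fixed `102`-scene board, and recomputes dry-run actionable coverage as `auto-pass + fail-closed-known`.

```python
# Status counts copied from the Dry-Run Summary table.
# Helper names are illustrative only, not part of the generator.
STATUS_COUNTS: dict[str, int] = {
    "auto-pass": 40,
    "fail-closed-known": 26,
    "misclassified": 5,
    "unsupported-family": 0,
    "missing-source": 0,
    "source-unreadable": 31,
}
BOARD_SIZE = 102  # size of the fixed scene execution board


def check_partition(counts: dict[str, int], board_size: int) -> None:
    """Raise if the bucket counts do not add up to the board size."""
    total = sum(counts.values())
    if total != board_size:
        raise ValueError(f"status buckets sum to {total}, expected {board_size}")


def actionable_coverage(counts: dict[str, int]) -> int:
    """Actionable coverage as defined in this report: auto-pass + fail-closed-known."""
    return counts["auto-pass"] + counts["fail-closed-known"]


check_partition(STATUS_COUNTS, BOARD_SIZE)
print(f"actionable coverage: {actionable_coverage(STATUS_COUNTS)} / {BOARD_SIZE}")
# prints "actionable coverage: 66 / 102"
```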
## Archetype Distribution

| Inferred archetype | Count |
| --- | ---: |
| `host_bridge_workflow` | 31 |
| `paginated_enrichment` | 8 |
| `multi_mode_request` | 3 |
| `multi_endpoint_inventory` | 2 |
| `page_state_eval` | 2 |
| `none` | 56 |

The `none` bucket includes generator failures and timeout cases that did not produce a `generation-report.json`.

## Auto-Pass Shape

The `40` auto-pass scenes are distributed as follows:

| Inferred archetype | Auto-pass count |
| --- | ---: |
| `host_bridge_workflow` | 26 |
| `paginated_enrichment` | 8 |
| `multi_mode_request` | 3 |
| `multi_endpoint_inventory` | 2 |
| `page_state_eval` | 1 |

This means the current generic generator is no longer limited to the `23` code-backed ledger scenes. Ledger coverage is lower because it conservatively counts only scenes already mapped into formal baseline or boundary assets.

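The two archetype tables can also be reconciled against the non-pass sections. The minimal Python sketch below uses only counts copied from this report. Subtracting auto-pass counts from archetype totals leaves `5` `host_bridge_workflow` scenes, which matches the `5` misclassified scenes, plus `1` `page_state_eval` scene. The `56`-scene `none` bucket also equals the `31` timeouts plus the `25` no-report failures. This reading is an arithmetic inference, not something the run output states, and that includes treating the leftover `page_state_eval` scene as the single `bootstrap_target` failure.

```python
# Counts copied from the Archetype Distribution and Auto-Pass Shape tables.
ARCHETYPE_TOTALS = {
    "host_bridge_workflow": 31,
    "paginated_enrichment": 8,
    "multi_mode_request": 3,
    "multi_endpoint_inventory": 2,
    "page_state_eval": 2,
    "none": 56,
}
AUTO_PASS_BY_ARCHETYPE = {
    "host_bridge_workflow": 26,
    "paginated_enrichment": 8,
    "multi_mode_request": 3,
    "multi_endpoint_inventory": 2,
    "page_state_eval": 1,
}

# Counts copied from the non-pass sections of this report.
TIMEOUTS = 31            # source-unreadable: generator timeout after 30s
NO_REPORT_FAILURES = 25  # fail-closed-known: generator failed without generation report


def non_auto_pass_by_archetype(totals: dict[str, int], auto_pass: dict[str, int]) -> dict[str, int]:
    """Per archetype, count scenes that received an inferred archetype but did not auto-pass."""
    remainder = {}
    for name, total in totals.items():
        if name == "none":
            continue  # no generation report, so no archetype to compare
        left = total - auto_pass.get(name, 0)
        if left:
            remainder[name] = left
    return remainder


assert sum(ARCHETYPE_TOTALS.values()) == 102
assert sum(AUTO_PASS_BY_ARCHETYPE.values()) == 40
# Inference: "none" = timeouts + no-report failures, since neither leaves a generation report.
assert ARCHETYPE_TOTALS["none"] == TIMEOUTS + NO_REPORT_FAILURES
print(non_auto_pass_by_archetype(ARCHETYPE_TOTALS, AUTO_PASS_BY_ARCHETYPE))
# prints {'host_bridge_workflow': 5, 'page_state_eval': 1}
```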
## Non-Pass Buckets

### Source-Unreadable

`31` scenes timed out during this bounded dry-run. All timeout records carry the same reason string:

`generator timeout after 30s`

These timeouts should not be read as evidence of unsupported families. They are execution-limit failures of the dry-run itself and need separate timeout/performance triage before any capability conclusions are drawn.

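A bounded per-scene run of this kind can be sketched in Python with `subprocess.run(..., timeout=...)`. The `30` s budget and the reason string come from this report. The command, the function name `run_generator_bounded`, and the record fields are hypothetical. The sketch can only record that the budget ran out, which is why such records say nothing about family support.

```python
import subprocess
import sys


def run_generator_bounded(cmd: list[str], timeout_s: float = 30.0) -> dict:
    """Run one (hypothetical) generator command under a wall-clock budget.

    Returns a minimal record; the field names are illustrative only and
    do not reflect the schema of the dry-run result JSON.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising; record it and move on.
        return {"status": "source-unreadable",
                "reason": f"generator timeout after {timeout_s:g}s"}
    # Non-timeout outcomes are left for later classification.
    return {"status": None, "returncode": proc.returncode}


# Usage: a stand-in command that sleeps longer than a deliberately short budget.
slow_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_generator_bounded(slow_cmd, timeout_s=0.5)["reason"])
# prints "generator timeout after 0.5s"
```

With the default budget, the `:g` format renders `30.0` as `30`, so the reason string reads `generator timeout after 30s`.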
### Fail-Closed-Known

`26` scenes did not auto-pass but were recorded under a known dry-run failure category.

Top reasons:

| Reason | Count |
| --- | ---: |
| `generator failed without generation report` | 25 |
| `bootstrap_target` | 1 |

The `generator failed without generation report` bucket is actionable but too broad to drive implementation work. It should be split in a later bounded triage pass before any fixes are attempted.

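A reason table like the one above can be derived from per-scene records with `collections.Counter`. The record shape and scene IDs below are hypothetical placeholders, generated only to reproduce the `25`/`1` split. When one reason covers nearly the whole bucket, that is a sign the category is too coarse to guide fixes.

```python
from collections import Counter


def top_reasons(records: list[dict], status: str = "fail-closed-known") -> list[tuple[str, int]]:
    """Count failure reasons for one dry-run status, most common first."""
    return Counter(r["reason"] for r in records if r["status"] == status).most_common()


# Hypothetical records, shaped only to match the counts in this report.
records = (
    [{"scene": f"scene-{i:03d}", "status": "fail-closed-known",
      "reason": "generator failed without generation report"} for i in range(25)]
    + [{"scene": "scene-025", "status": "fail-closed-known", "reason": "bootstrap_target"}]
    + [{"scene": f"scene-{i:03d}", "status": "auto-pass", "reason": ""} for i in range(26, 66)]
)

reasons = top_reasons(records)
top_reason, top_count = reasons[0]
share = top_count / sum(count for _, count in reasons)
print(reasons)
print(f"top reason covers {share:.0%} of the bucket")
# prints [('generator failed without generation report', 25), ('bootstrap_target', 1)]
# then "top reason covers 96% of the bucket"
```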
### Misclassified

`5` scenes produced a package, but the inferred archetype conflicted with the current board group:

| Scene | English gloss | Current group | Inferred archetype |
| --- | --- | --- | --- |
| `95598报修工单日管控` | 95598 repair work order daily control | `G3` | `host_bridge_workflow` |
| `95598重要服务事项报备统计表` | 95598 important service matter filing statistics | `G3` | `host_bridge_workflow` |
| `用电报装信息统计列表` | Electricity service application statistics list | `G1-E` | `host_bridge_workflow` |
| `配网支撑月报(95598抢修统计报表)` | Distribution network support monthly report (95598 emergency repair statistics) | `G3` | `host_bridge_workflow` |
| `高低压新增报装容量月度统计表` | Monthly statistics of new high/low-voltage connection capacity | `G1-E` | `host_bridge_workflow` |

This is the clearest blocker category from the dry-run. It indicates that the current generic routing can over-prefer `host_bridge_workflow` on some scenes that already have board-level family expectations.

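The misclassification check is a set-membership test: a scene is flagged when its inferred archetype is not among the archetypes its board group expects. The report does not say which archetypes `G3` or `G1-E` expect, so `EXPECTED_BY_GROUP` in this Python sketch holds clearly hypothetical placeholder values. Only the five scene rows come from the table above.

```python
from collections import Counter

# Rows copied from the Misclassified table: (scene, current group, inferred archetype).
ROWS = [
    ("95598报修工单日管控", "G3", "host_bridge_workflow"),
    ("95598重要服务事项报备统计表", "G3", "host_bridge_workflow"),
    ("用电报装信息统计列表", "G1-E", "host_bridge_workflow"),
    ("配网支撑月报(95598抢修统计报表)", "G3", "host_bridge_workflow"),
    ("高低压新增报装容量月度统计表", "G1-E", "host_bridge_workflow"),
]

# HYPOTHETICAL placeholders: the real group-to-archetype expectations live on the board.
EXPECTED_BY_GROUP: dict[str, set[str]] = {
    "G3": {"<expected archetype for G3>"},
    "G1-E": {"<expected archetype for G1-E>"},
}


def find_misclassified(rows, expected_by_group):
    """Return rows whose inferred archetype falls outside the group's expected set.

    Groups with no expectation on record are skipped rather than flagged.
    """
    flagged = []
    for scene, group, inferred in rows:
        expected = expected_by_group.get(group)
        if expected is not None and inferred not in expected:
            flagged.append((scene, group, inferred))
    return flagged


flagged = find_misclassified(ROWS, EXPECTED_BY_GROUP)
print(len(flagged), Counter(inferred for _, _, inferred in flagged))
# prints 5 Counter({'host_bridge_workflow': 5})
```

Skipping groups that have no recorded expectation is a conservative choice. A missing board entry then does not turn into a false routing alarm.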
## Interpretation

The four coverage numbers answer different questions:

1. `5 / 102` is the strict real-sample pass count.
2. `23 / 102` is the formal code-backed ledger coverage.
3. `40 / 102` is the current generic dry-run auto-pass count.
4. `66 / 102` is the current generic actionable coverage count.

The key result is that the generic generator currently auto-passes more scenes than formal ledger coverage shows. However, the result is not clean enough to promote any scene automatically, because:

1. `31` scenes hit bounded dry-run timeouts.
2. `5` scenes show a board-vs-archetype mismatch.
3. `26` scenes need more specific failure extraction before implementation work.

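The four numbers are easier to compare as shares of the board. In this small Python sketch the labels and counts are copied from this report, and the helper name `coverage_lines` is hypothetical.

```python
BOARD_SIZE = 102

# Coverage counts copied from the Interpretation list above.
COVERAGE = {
    "strict real-sample pass": 5,
    "code-backed ledger coverage": 23,
    "dry-run auto-pass": 40,
    "dry-run actionable coverage": 66,
}


def coverage_lines(coverage: dict[str, int], board_size: int) -> list[str]:
    """Format each count as 'label: n / board (pct)' with one decimal place."""
    return [f"{label}: {n} / {board_size} ({n / board_size:.1%})"
            for label, n in coverage.items()]


for line in coverage_lines(COVERAGE, BOARD_SIZE):
    print(line)
# strict real-sample pass: 5 / 102 (4.9%)
# code-backed ledger coverage: 23 / 102 (22.5%)
# dry-run auto-pass: 40 / 102 (39.2%)
# dry-run actionable coverage: 66 / 102 (64.7%)
```

By count, auto-pass exceeds ledger coverage by `40 - 23 = 17` scenes. The counts alone do not show whether every ledger scene is also among the auto-passes.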
## Recommended Next Blocker

Do not start implementation directly from this report.

The next bounded step should be a dry-run triage pass, in this priority order:

1. split the `31` timeout cases into true timeouts, oversized sources, and command-level hangs
2. inspect the `5` misclassified cases as the first routing-quality sample
3. refine the `25` generic no-report failures into concrete failure categories

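The first triage item could begin with a simple rule-based split. Everything in this Python sketch is hypothetical: the `TimeoutCase` fields, the signals (source size, and whether the generator wrote any output before it was killed), and the `5_000_000`-byte threshold are assumptions for illustration, not measured properties of this run. A confirming re-run with a longer budget would still be needed to tell a slow but progressing generator from a true hang.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TimeoutCase:
    scene: str
    source_bytes: int      # size of the scene source fed to the generator (assumed available)
    produced_output: bool  # whether any stdout/stderr appeared before the kill (assumed captured)


OVERSIZED_BYTES = 5_000_000  # hypothetical threshold, not derived from this run


def triage_timeout(case: TimeoutCase, oversized_bytes: int = OVERSIZED_BYTES) -> str:
    """Assign one timed-out scene to a triage bucket using hypothetical signals."""
    if case.source_bytes >= oversized_bytes:
        return "oversized source"
    if not case.produced_output:
        # No output at all before the kill looks more like a stuck command than slow work.
        return "command-level hang"
    return "true timeout"


# Hypothetical cases, one per bucket.
cases = [
    TimeoutCase("scene-a", source_bytes=12_000_000, produced_output=True),
    TimeoutCase("scene-b", source_bytes=40_000, produced_output=False),
    TimeoutCase("scene-c", source_bytes=40_000, produced_output=True),
]
print(Counter(triage_timeout(c) for c in cases))
```

The size check runs first, so a large source that also printed nothing is filed as oversized. Size is the cheaper hypothesis to confirm, for example by re-running on a trimmed source.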
This report does not update the execution board and does not promote any scene.