# 102 Full Sweep Improvement Roadmap Design

> Date: 2026-04-19
> Status: Draft
> Upstream Dry-Run: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`
> Upstream Triage: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md`

## Design Intent

Use the full `102` scene dry-run and triage results to define a single improvement roadmap for generic `scene -> skill` coverage.

This roadmap is the post-triage equivalent of the earlier `60-to-90` roadmap. It is not a single bugfix plan. It is the governing design for turning measured dry-run blockers into bounded implementation tracks.

The design answers:

`how do we move from 40/102 dry-run auto-pass and 66/102 actionable coverage toward a higher verified generic conversion rate without drifting into unbounded fixes?`

## Current Baseline

The current measured state is:

| Metric | Count |
| --- | ---: |
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |

The non-pass triage state is:

| Bucket | Count | Triage conclusion |
| --- | ---: | --- |
| Timeout | 31 | `19 timeout-unvalidated-source`, `8 timeout-large-source`, `4 timeout-known-family-sample` |
| Misclassified | 5 | all `route-overprefer-host-bridge` |
| No-report failure | 25 | all `readiness-before-report` |
| Bootstrap target | 1 | separate `bootstrap_target` |

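The two tables are arithmetically linked: the four triage buckets should account for every scene that did not auto-pass, and the three timeout labels should sum to the timeout bucket. A minimal sketch of that consistency check follows. The counts are copied from the tables; the constant and function names are illustrative and not part of the existing tooling.

```python
# Counts copied from the baseline and triage tables.
# All names below are illustrative, not existing tooling.
TOTAL_SCENES = 102
DRY_RUN_AUTO_PASS = 40

TRIAGE_BUCKETS = {
    "timeout": 31,
    "misclassified": 5,
    "no_report_failure": 25,
    "bootstrap_target": 1,
}

TIMEOUT_LABELS = {
    "timeout-unvalidated-source": 19,
    "timeout-large-source": 8,
    "timeout-known-family-sample": 4,
}


def check_baseline() -> int:
    """Return the non-pass scene count after verifying the tables agree."""
    non_pass = TOTAL_SCENES - DRY_RUN_AUTO_PASS
    if sum(TRIAGE_BUCKETS.values()) != non_pass:
        raise ValueError("triage buckets do not cover all non-pass scenes")
    if sum(TIMEOUT_LABELS.values()) != TRIAGE_BUCKETS["timeout"]:
        raise ValueError("timeout labels do not sum to the timeout bucket")
    return non_pass


if __name__ == "__main__":
    print(check_baseline())  # 62
```

Running the same check against a follow-up sweep would catch a scene that silently dropped out of every bucket.
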
## Problem Statement

The generic generator already auto-passes more scenes than the formal ledger coverage shows, but the result is not trustworthy enough to promote automatically because:

1. known-family scenes still appear in the timeout bucket
2. `host_bridge_workflow` can over-absorb scenes expected to remain `G3` or `G1-E`
3. many fail-closed cases terminate before a structured generation report exists
4. timeout and no-report failures hide actionable blocker details

## Roadmap Goal

Improve the measurable generic conversion pipeline, not by adding new families first, but by reducing ambiguity in the current failure surface.

The roadmap has four goals:

1. make known-family timeout results explainable and repeatable
2. correct or formally adjudicate host-bridge routing over-preference
3. convert pre-report failures into structured fail-closed results
4. rerun a bounded `102` sweep to measure coverage delta

## Scope Guardrails

1. do not add new scene families in this roadmap
2. do not promote scenes directly from diagnostic runs
3. do not update `scene_execution_board_2026-04-18.json` until a later explicit status-sync plan
4. do not use one failure as justification for an unbounded rewrite
5. do not reopen completed `G1-E / G2 / G3 / G6 / G7` real-sample pass records unless they are part of a fixed regression check
6. do not start `G4 / G5`
7. do not implement login recovery, full host runtime, or attachment pipeline work in this roadmap

## Workstreams

1. `WS1` Timeout and Source-Scale Diagnostics
2. `WS2` Host-Bridge Routing Boundary Correction
3. `WS3` Structured Fail-Closed Reporting
4. `WS4` Coverage Delta Sweep and Decision Board

## Track A: Known-Family Timeout Diagnostics

### Intent

Separate known-family timeout behavior from generic unvalidated-source timeout behavior.

### Input

The `4` records labeled:

`timeout-known-family-sample`

### Expected Output

Each known-family timeout gets one of:

1. `known-family-rerun-pass`
2. `known-family-source-scale-timeout`
3. `known-family-generator-hotspot`
4. `known-family-contract-blocked-after-long-run`
5. `known-family-timeout-unresolved`

### Design Constraint

A longer rerun success does not promote a scene. It only changes diagnostic classification.

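The Track A constraint can be made concrete by keeping the diagnostic label and the promotion state as separate fields, so that reclassification has no way to touch promotion. The sketch below assumes a hypothetical `TimeoutRecord` shape and a hypothetical `reclassify` helper; only the label strings come from this design.

```python
from dataclasses import dataclass, replace
from enum import Enum


class KnownFamilyTimeoutLabel(str, Enum):
    """The five Track A outcome labels."""

    RERUN_PASS = "known-family-rerun-pass"
    SOURCE_SCALE_TIMEOUT = "known-family-source-scale-timeout"
    GENERATOR_HOTSPOT = "known-family-generator-hotspot"
    CONTRACT_BLOCKED = "known-family-contract-blocked-after-long-run"
    UNRESOLVED = "known-family-timeout-unresolved"


@dataclass(frozen=True)
class TimeoutRecord:
    # Hypothetical record shape; the real triage report may differ.
    scene_id: str
    label: str = "timeout-known-family-sample"
    promoted: bool = False


def reclassify(record: TimeoutRecord, label: KnownFamilyTimeoutLabel) -> TimeoutRecord:
    """Change only the diagnostic label; promotion state is never modified."""
    if record.label != "timeout-known-family-sample":
        raise ValueError(f"{record.scene_id} is not a Track A input record")
    return replace(record, label=label.value)


if __name__ == "__main__":
    rerun = reclassify(TimeoutRecord("scene-001"), KnownFamilyTimeoutLabel.RERUN_PASS)
    print(rerun.label, rerun.promoted)
```

Even when the rerun passes, `promoted` stays `False`; promotion would have to happen through a separate, explicit step outside this track.
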
## Track B: Timeout Source-Scale Policy

### Intent

Create a bounded input filtering and scan-budget policy for large source directories without changing family semantics.

### Input

The timeout labels:

1. `timeout-large-source`
2. `timeout-unvalidated-source`

### Expected Output

1. source file selection policy
2. large vendor/library ignore list policy
3. scan-budget decision table
4. timeout reporting shape

### Design Constraint

This track may improve scan boundaries, but it must not change archetype semantics.

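One way to picture the Track B outputs is a planning step that filters ignored directories and applies a file budget before any archetype logic runs. The sketch below is an assumption-laden illustration: the ignore list, the budget value, and the `ScanPlan` shape are all hypothetical placeholders, not decided policy.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable

# Hypothetical defaults; the real ignore list and budget are Track B outputs.
IGNORED_DIR_NAMES = frozenset({"node_modules", "vendor", "dist"})
DEFAULT_MAX_FILES = 200


@dataclass(frozen=True)
class ScanPlan:
    selected: tuple[str, ...]   # files that will be scanned
    skipped_ignored: int        # files dropped by the ignore list
    budget_exceeded: bool       # True if eligible files were left unscanned


def plan_scan(
    paths: Iterable[str],
    max_files: int = DEFAULT_MAX_FILES,
    ignored: frozenset = IGNORED_DIR_NAMES,
) -> ScanPlan:
    """Select source files deterministically under an ignore list and a budget."""
    selected = []
    skipped = 0
    exceeded = False
    for raw in sorted(paths):  # sorted so reruns pick the same files
        parent_dirs = PurePosixPath(raw).parts[:-1]
        if any(part in ignored for part in parent_dirs):
            skipped += 1
            continue
        if len(selected) >= max_files:
            exceeded = True
            continue
        selected.append(raw)
    return ScanPlan(tuple(selected), skipped, exceeded)


if __name__ == "__main__":
    demo = ["src/a.js", "node_modules/x/index.js", "src/b.js", "vendor/lib.js"]
    print(plan_scan(demo, max_files=1))
```

Because the plan only decides which files are read, it stays inside the design constraint: nothing here inspects or alters archetype semantics. The `budget_exceeded` flag is the kind of field a timeout reporting shape could carry.
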
## Track C: Host-Bridge Route Over-Preference Correction

### Intent

Prevent `host_bridge_workflow` from absorbing scenes that should remain `G3` or `G1-E` when business-chain evidence is stronger.

### Input

The `5` records labeled:

`route-overprefer-host-bridge`

### Expected Output

Each misclassification gets one of:

1. `route-corrected-to-g3`
2. `route-corrected-to-g1e`
3. `board-expectation-reclassified`
4. `valid-host-bridge-workflow`
5. `route-conflict-unresolved`

### Design Constraint

This track must preserve the already-passed `G6` real sample and must not degrade `G3` or `G1-E` canonical tests.

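The Track C intent implies a comparison between two kinds of evidence. The sketch below is one possible reading, not the generator's actual routing logic: it assumes evidence can be reduced to two hypothetical integer scores, auto-corrects only when business-chain evidence is strictly stronger, and leaves every other case unresolved for manual adjudication (where a reviewer would choose between `valid-host-bridge-workflow` and `board-expectation-reclassified`).

```python
from enum import Enum


class RouteAdjudication(str, Enum):
    """The five Track C outcome labels."""

    CORRECTED_TO_G3 = "route-corrected-to-g3"
    CORRECTED_TO_G1E = "route-corrected-to-g1e"
    BOARD_RECLASSIFIED = "board-expectation-reclassified"
    VALID_HOST_BRIDGE = "valid-host-bridge-workflow"
    UNRESOLVED = "route-conflict-unresolved"


# Board expectations that have an automatic correction label.
_CORRECTIONS = {
    "G3": RouteAdjudication.CORRECTED_TO_G3,
    "G1-E": RouteAdjudication.CORRECTED_TO_G1E,
}


def adjudicate(
    board_expectation: str,
    host_bridge_score: int,      # hypothetical evidence score
    business_chain_score: int,   # hypothetical evidence score
) -> RouteAdjudication:
    """Auto-correct only when business-chain evidence is strictly stronger."""
    correction = _CORRECTIONS.get(board_expectation)
    if correction is not None and business_chain_score > host_bridge_score:
        return correction
    # Ties and host-bridge-dominant cases need a human decision.
    return RouteAdjudication.UNRESOLVED


if __name__ == "__main__":
    print(adjudicate("G3", host_bridge_score=2, business_chain_score=5).value)
```

Defaulting to `route-conflict-unresolved` keeps the automated step conservative, which fits the constraint that existing `G6`, `G3`, and `G1-E` results must not be disturbed.
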
## Track D: Readiness-Before-Report Structured Fail-Closed

### Intent

Convert `generator failed without generation report` into structured, machine-readable fail-closed results.

### Input

The `25` records labeled:

`readiness-before-report`

### Expected Output

Each case produces a generation report or equivalent dry-run failure record with:

1. inferred archetype
2. blocker stage
3. missing contract pieces
4. failed gate name
5. actionable reason

### Design Constraint

This track should not make failing scenes pass. It should make failures explainable.

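The five required fields map naturally onto a small record type. The sketch below shows one hedged way to turn a pre-report failure into a structured result. `ReadinessError`, `FailClosedReport`, and `generate_with_report` are hypothetical names introduced for illustration; the generator's real exception types and report schema are not specified in this design.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional


class ReadinessError(Exception):
    """Hypothetical error raised when a readiness gate fails before reporting."""

    def __init__(self, stage: str, gate: str, missing: List[str], reason: str,
                 archetype: Optional[str] = None) -> None:
        super().__init__(reason)
        self.stage = stage
        self.gate = gate
        self.missing = missing
        self.reason = reason
        self.archetype = archetype


@dataclass
class FailClosedReport:
    scene_id: str
    inferred_archetype: Optional[str]   # 1. inferred archetype
    blocker_stage: str                  # 2. blocker stage
    missing_contract_pieces: List[str]  # 3. missing contract pieces
    failed_gate: str                    # 4. failed gate name
    actionable_reason: str              # 5. actionable reason
    status: str = "fail-closed"         # the scene still fails


def generate_with_report(scene_id: str, generate: Callable[[str], dict]) -> dict:
    """Run a generator; on a readiness failure, return a structured record."""
    try:
        return generate(scene_id)
    except ReadinessError as err:
        report = FailClosedReport(
            scene_id=scene_id,
            inferred_archetype=err.archetype,
            blocker_stage=err.stage,
            missing_contract_pieces=list(err.missing),
            failed_gate=err.gate,
            actionable_reason=err.reason,
        )
        return asdict(report)


if __name__ == "__main__":
    def failing_generator(scene_id: str) -> dict:
        raise ReadinessError("readiness", "contract-complete",
                             ["output_schema"], "output schema not declared")

    print(json.dumps(generate_with_report("scene-001", failing_generator), indent=2))
```

The wrapper never converts a failure into a pass: `status` remains `fail-closed`, and only the explanation is added.
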
## Track E: Bootstrap Target Isolation

### Intent

Keep the single `bootstrap_target` failure separate so it does not pollute the no-report or route-correction work.

### Input

The `1` bootstrap target failure:

`用户停电频次分析监测` (User Power Outage Frequency Analysis and Monitoring)

### Expected Output

1. isolated bootstrap failure note
2. decision whether it belongs to later bootstrap normalization work

### Design Constraint

No bootstrap auto-recovery or login work is included in this roadmap.

## Track F: Coverage Delta Sweep

### Intent

After bounded improvements, rerun a comparable `102` sweep and compare against the baseline.

### Input

1. baseline dry-run result
2. updated generator after approved tracks
3. same `102` scene board

### Expected Output

1. new dry-run result
2. coverage delta report
3. category movement table
4. decision board for remaining blockers

### Design Constraint

The rerun must be comparable to the baseline. It cannot silently change the scene set.

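The comparability constraint and the category movement table can both be expressed over a simple mapping of scene id to category. The sketch below assumes each sweep result is available as a `dict` of that shape and that the passing category is named `auto-pass`; both are illustrative assumptions, not the actual report format.

```python
from collections import Counter
from typing import Dict, Tuple


def require_same_scene_set(baseline: Dict[str, str], rerun: Dict[str, str]) -> None:
    """Refuse to compare sweeps whose scene sets differ."""
    if baseline.keys() != rerun.keys():
        changed = sorted(baseline.keys() ^ rerun.keys())
        raise ValueError(f"scene set changed between sweeps: {changed}")


def category_movement(baseline: Dict[str, str],
                      rerun: Dict[str, str]) -> Counter:
    """Count scenes per (baseline category, rerun category) pair."""
    require_same_scene_set(baseline, rerun)
    pairs: Counter = Counter()
    for scene_id, old_category in baseline.items():
        key: Tuple[str, str] = (old_category, rerun[scene_id])
        pairs[key] += 1
    return pairs


def pass_delta(baseline: Dict[str, str], rerun: Dict[str, str],
               pass_label: str = "auto-pass") -> int:
    """Change in the number of passing scenes (rerun minus baseline)."""
    require_same_scene_set(baseline, rerun)
    before = sum(1 for c in baseline.values() if c == pass_label)
    after = sum(1 for c in rerun.values() if c == pass_label)
    return after - before


if __name__ == "__main__":
    old = {"a": "timeout", "b": "auto-pass", "c": "no-report"}
    new = {"a": "auto-pass", "b": "auto-pass", "c": "fail-closed"}
    for (src, dst), count in sorted(category_movement(old, new).items()):
        print(f"{src} -> {dst}: {count}")
    print("pass delta:", pass_delta(old, new))
```

Raising on a changed scene set, rather than comparing the intersection, makes a silent scene-set change impossible to miss.
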
## Success Criteria

This roadmap succeeds when:

1. all known-family timeouts are separated from unvalidated timeout noise
2. all five host-bridge over-preference cases are adjudicated
3. no-report failures become structured fail-closed outputs
4. a follow-up full sweep shows measurable improvement or a clearly explained plateau
5. no new family is introduced to mask existing failure categories

## Out of Scope

1. new `G4/G5` implementation
2. full login recovery
3. browser host runtime transport implementation
4. local document attachment pipeline
5. automatic scene promotion into the execution board
6. full manual validation of all `102` generated skills