102 Full Sweep Improvement Roadmap Design
Date: 2026-04-19
Status: Draft
Upstream Dry-Run: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md
Upstream Triage: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md
Design Intent
Use the full 102-scene dry-run and triage results to define a single improvement roadmap for generic scene -> skill coverage.
This roadmap is the post-triage equivalent of the earlier 60-to-90 roadmap. It is not a single bugfix plan. It is the governing design for turning measured dry-run blockers into bounded implementation tracks.
The design answers:
How do we move from 40/102 dry-run auto-pass and 66/102 actionable coverage toward a higher verified generic conversion rate without drifting into unbounded fixes?
Current Baseline
The current measured state is:
| Metric | Count |
|---|---|
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |
The non-pass triage state is:
| Bucket | Count | Triage conclusion |
|---|---|---|
| Timeout | 31 | 19 timeout-unvalidated-source, 8 timeout-large-source, 4 timeout-known-family-sample |
| Misclassified | 5 | all route-overprefer-host-bridge |
| No-report failure | 25 | all readiness-before-report |
| Bootstrap target | 1 | separate bootstrap_target |
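
The two tables can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable and key names are illustrative and are not taken from the generator codebase.

```python
# Counts copied from the baseline and triage tables above.
TOTAL_SCENES = 102
DRY_RUN_AUTO_PASS = 40

# Non-pass triage buckets (key names are illustrative).
triage_buckets = {
    "timeout": 31,
    "misclassified": 5,
    "no_report_failure": 25,
    "bootstrap_target": 1,
}

# Timeout sub-labels from the "Triage conclusion" column.
timeout_labels = {
    "timeout-unvalidated-source": 19,
    "timeout-large-source": 8,
    "timeout-known-family-sample": 4,
}

non_pass = sum(triage_buckets.values())

# The bucket counts plus the auto-passes add up to the full scene set.
assert DRY_RUN_AUTO_PASS + non_pass == TOTAL_SCENES
# The timeout sub-labels account for the whole timeout bucket.
assert sum(timeout_labels.values()) == triage_buckets["timeout"]

auto_pass_rate = DRY_RUN_AUTO_PASS / TOTAL_SCENES
print(f"non-pass scenes: {non_pass}")
print(f"auto-pass rate: {auto_pass_rate:.1%}")
```

Keeping a check like this next to the baseline makes it harder for a later sweep to report counts that no longer sum to the same scene set.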
Problem Statement
The generic generator already auto-passes more scenes than the formal ledger coverage shows, but the result is not trustworthy enough to promote automatically because:
- known-family scenes still appear in the timeout bucket
- host_bridge_workflow can over-absorb scenes expected to remain G3 or G1-E
- many fail-closed cases terminate before a structured generation report exists
- timeout and no-report failures hide actionable blocker details
Roadmap Goal
Improve the measurable generic conversion pipeline, not by adding new families first, but by reducing ambiguity in the current failure surface.
The roadmap has four goals:
- make known-family timeout results explainable and repeatable
- correct or formally adjudicate host-bridge routing over-preference
- convert pre-report failures into structured fail-closed results
- rerun a bounded 102 sweep to measure coverage delta
Scope Guardrails
- do not add new scene families in this roadmap
- do not promote scenes directly from diagnostic runs
- do not update scene_execution_board_2026-04-18.json until a later explicit status-sync plan
- do not use one failure as justification for an unbounded rewrite
- do not reopen completed G1-E / G2 / G3 / G6 / G7 real-sample pass records unless they are part of a fixed regression check
- do not start G4 / G5
- do not implement login recovery, full host runtime, or attachment pipeline work in this roadmap
Workstreams
- WS1: Timeout and Source-Scale Diagnostics
- WS2: Host-Bridge Routing Boundary Correction
- WS3: Structured Fail-Closed Reporting
- WS4: Coverage Delta Sweep and Decision Board
Track A: Known-Family Timeout Diagnostics
Intent
Separate known-family timeout behavior from generic unvalidated-source timeout behavior.
Input
The 4 records labeled:
timeout-known-family-sample
Expected Output
Each known-family timeout gets one of:
- known-family-rerun-pass
- known-family-source-scale-timeout
- known-family-generator-hotspot
- known-family-contract-blocked-after-long-run
- known-family-timeout-unresolved
Design Constraint
A longer rerun success does not promote a scene. It only changes diagnostic classification.
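One way to keep this constraint enforceable is to model the diagnostic labels as a closed set and make promotion impossible at the record level. The following is a minimal sketch under that assumption; `KnownFamilyTimeoutLabel`, `TimeoutDiagnostic`, and the `scene_id` field are hypothetical names, and only the label strings come from this design.

```python
from dataclasses import dataclass
from enum import Enum


class KnownFamilyTimeoutLabel(Enum):
    """The five diagnostic outcomes listed for Track A."""
    RERUN_PASS = "known-family-rerun-pass"
    SOURCE_SCALE_TIMEOUT = "known-family-source-scale-timeout"
    GENERATOR_HOTSPOT = "known-family-generator-hotspot"
    CONTRACT_BLOCKED = "known-family-contract-blocked-after-long-run"
    UNRESOLVED = "known-family-timeout-unresolved"


@dataclass(frozen=True)
class TimeoutDiagnostic:
    scene_id: str  # hypothetical identifier for a scene on the board
    label: KnownFamilyTimeoutLabel
    promoted: bool = False

    def __post_init__(self):
        # A diagnostic rerun only reclassifies; it never promotes a scene.
        if self.promoted:
            raise ValueError("diagnostic reruns never promote a scene")


diag = TimeoutDiagnostic("scene-001", KnownFamilyTimeoutLabel.RERUN_PASS)
print(diag.label.value, diag.promoted)
```

Even a `known-family-rerun-pass` record stays unpromoted here; promotion would have to go through a separate, explicit path outside this roadmap.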
Track B: Timeout Source-Scale Policy
Intent
Create a bounded input filtering and scan-budget policy for large source directories without changing family semantics.
Input
The timeout labels:
- timeout-large-source
- timeout-unvalidated-source
Expected Output
- source file selection policy
- large vendor/library ignore list policy
- scan-budget decision table
- timeout reporting shape
Design Constraint
This track is allowed to improve scan boundaries, but not allowed to change archetype semantics.
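As an illustration of what a bounded scan could look like, the sketch below prunes ignored directories and stops at a file budget, reporting whether the budget was hit. The ignore list, the default budget of 500, and the names `select_source_files` and `ScanResult` are all assumptions made for this example; the actual policy is an output of this track.

```python
import os
import tempfile
from dataclasses import dataclass, field

# Hypothetical defaults; the real ignore list and budget are decided by Track B.
DEFAULT_IGNORED_DIRS = frozenset({"node_modules", "vendor", "dist", ".git"})


@dataclass
class ScanResult:
    selected: list = field(default_factory=list)
    budget_exhausted: bool = False


def select_source_files(root, max_files=500, ignored_dirs=DEFAULT_IGNORED_DIRS):
    """Walk root, skip ignored directories, and stop once max_files is exceeded."""
    result = ScanResult()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into ignored directories.
        dirnames[:] = sorted(d for d in dirnames if d not in ignored_dirs)
        for name in sorted(filenames):
            if len(result.selected) >= max_files:
                # More files remain than the budget allows; report it explicitly.
                result.budget_exhausted = True
                return result
            result.selected.append(os.path.join(dirpath, name))
    return result


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for parts in (("node_modules", "lib", "big.js"), ("src", "a.js"),
                  ("src", "b.js"), ("src", "c.js")):
        path = os.path.join(root, *parts)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write("")
    full = select_source_files(root)
    capped = select_source_files(root, max_files=2)

full_names = [os.path.basename(p) for p in full.selected]
capped_names = [os.path.basename(p) for p in capped.selected]
print(full_names, full.budget_exhausted)
print(capped_names, capped.budget_exhausted)
```

The scan only decides which files are read. It does not touch archetype inference, which keeps it within the constraint above.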
Track C: Host-Bridge Route Over-Preference Correction
Intent
Prevent host_bridge_workflow from absorbing scenes that should remain G3 or G1-E when business-chain evidence is stronger.
Input
The 5 records labeled:
route-overprefer-host-bridge
Expected Output
Each misclassification gets one of:
- route-corrected-to-g3
- route-corrected-to-g1e
- board-expectation-reclassified
- valid-host-bridge-workflow
- route-conflict-unresolved
Design Constraint
This track must preserve the already-passed G6 real sample and must not degrade G3 or G1-E canonical tests.
Track D: Readiness-Before-Report Structured Fail-Closed
Intent
Convert "generator failed without generation report" outcomes into structured, machine-readable fail-closed results.
Input
The 25 records labeled:
readiness-before-report
Expected Output
Each case produces a generation report or equivalent dry-run failure record with:
- inferred archetype
- blocker stage
- missing contract pieces
- failed gate name
- actionable reason
Design Constraint
This track should not make failing scenes pass. It should make failures explainable.
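A possible shape for such a record is sketched below. The five report fields mirror the list above; the class name `DryRunFailureRecord`, the `scene_id` and `status` fields, and every example value are hypothetical placeholders rather than the generator's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class DryRunFailureRecord:
    scene_id: str                       # hypothetical identifier
    inferred_archetype: Optional[str]   # None if inference itself did not complete
    blocker_stage: str
    missing_contract_pieces: List[str]
    failed_gate_name: str
    actionable_reason: str
    status: str = "fail-closed"         # the record never represents a pass


def to_report_json(record):
    """Serialize the record so downstream triage can parse it."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


# Example values are placeholders, not real triage data.
record = DryRunFailureRecord(
    scene_id="scene-001",
    inferred_archetype="G3",
    blocker_stage="readiness",
    missing_contract_pieces=["example-contract-piece"],
    failed_gate_name="example-gate",
    actionable_reason="example: contract piece missing before report stage",
)
report = json.loads(to_report_json(record))
print(sorted(report))
```

Because `status` defaults to `fail-closed` and there is no pass variant, emitting this record explains a failure without changing its outcome.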
Track E: Bootstrap Target Isolation
Intent
Keep the single bootstrap_target failure separate so it does not pollute the no-report or route-correction work.
Input
The 1 bootstrap target failure:
用户停电频次分析监测
Expected Output
- isolated bootstrap failure note
- decision whether it belongs to later bootstrap normalization work
Design Constraint
No bootstrap auto-recovery or login work is included in this roadmap.
Track F: Coverage Delta Sweep
Intent
After bounded improvements, rerun a comparable 102 sweep and compare against the baseline.
Input
- baseline dry-run result
- updated generator after approved tracks
- same 102-scene board
Expected Output
- new dry-run result
- coverage delta report
- category movement table
- decision board for remaining blockers
Design Constraint
The rerun must be comparable to the baseline. It cannot silently change the scene set.
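The comparability rule and the category movement table can be sketched as a single function that refuses to compare mismatched scene sets. The function name `coverage_delta`, the scene ids, and the category strings used in the demonstration are assumptions for illustration only.

```python
from collections import Counter


def coverage_delta(baseline, rerun):
    """Compare two sweeps given as {scene_id: category} mappings."""
    if set(baseline) != set(rerun):
        # Fail loudly instead of silently comparing different scene sets.
        changed = sorted(set(baseline) ^ set(rerun))
        raise ValueError(f"scene set changed, sweeps not comparable: {changed}")
    # Count only scenes whose category actually moved.
    movement = Counter(
        (baseline[scene], rerun[scene])
        for scene in baseline
        if baseline[scene] != rerun[scene]
    )
    return {
        "baseline_counts": Counter(baseline.values()),
        "rerun_counts": Counter(rerun.values()),
        "movement": movement,
    }


# Toy data with hypothetical scene ids and category names.
baseline = {"s1": "auto-pass", "s2": "timeout", "s3": "no-report"}
rerun = {"s1": "auto-pass", "s2": "auto-pass", "s3": "structured-fail-closed"}

delta = coverage_delta(baseline, rerun)
for (before, after), count in sorted(delta["movement"].items()):
    print(f"{before} -> {after}: {count}")
```

The `movement` counter is the raw material for the category movement table, and the per-sweep counts give the coverage delta directly.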
Success Criteria
This roadmap succeeds when:
- all known-family timeouts are separated from unvalidated timeout noise
- all five host-bridge over-preference cases are adjudicated
- no-report failures become structured fail-closed outputs
- a follow-up full sweep shows measurable improvement or a clearly explained plateau
- no new family is introduced to mask existing failure categories
Out of Scope
- new G4/G5 implementation
- full login recovery
- browser host runtime transport implementation
- local document attachment pipeline
- automatic scene promotion into the execution board
- full manual validation of all 102 generated skills