102 Full Sweep Improvement Roadmap Plan
Date: 2026-04-19
Status: Draft
Upstream Spec: docs/superpowers/specs/2026-04-19-102-full-sweep-improvement-roadmap-design.md
Upstream Dry-Run Result: tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json
Upstream Triage Result: tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json
Plan Intent
Turn the 102-scene dry-run and triage findings into a governed improvement roadmap.
This plan is intentionally broad, like the earlier 60-to-90 roadmap. It coordinates multiple bounded implementation tracks instead of starting isolated fixes from individual failures.
Baseline
Current measured baseline:
| Metric | Count |
|---|---|
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |
Current triage baseline:
| Bucket | Count | Triage conclusion |
|---|---|---|
| Timeout | 31 | 19 timeout-unvalidated-source, 8 timeout-large-source, 4 timeout-known-family-sample |
| Misclassified | 5 | all route-overprefer-host-bridge |
| No-report failure | 25 | all readiness-before-report |
| Bootstrap target | 1 | separate bootstrap_target |
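The two tables can be cross-checked before the baseline is frozen. The sketch below redoes the arithmetic in Python; the variable names are illustrative and not taken from the dry-run tooling. It assumes the four triage buckets cover exactly the scenes that did not auto-pass, which the counts support but the tables do not state outright.

```python
TOTAL_SCENES = 102
DRY_RUN_AUTO_PASS = 40

# Timeout sub-buckets from the triage table.
timeout_breakdown = {
    "timeout-unvalidated-source": 19,
    "timeout-large-source": 8,
    "timeout-known-family-sample": 4,
}

# Top-level triage buckets from the triage table.
triage_buckets = {
    "timeout": 31,
    "misclassified": 5,
    "no_report_failure": 25,
    "bootstrap_target": 1,
}

timeout_total = sum(timeout_breakdown.values())
triaged_total = sum(triage_buckets.values())
not_auto_passed = TOTAL_SCENES - DRY_RUN_AUTO_PASS

# The sub-buckets should add up to the Timeout row, and the buckets
# should add up to the scenes that did not auto-pass.
print(timeout_total, triaged_total, not_auto_passed)  # 31 62 62
```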
Scope Guardrails
- do not add new scene families
- do not update scene_execution_board_2026-04-18.json inside this roadmap
- do not promote scenes directly from diagnostic or dry-run results
- do not reopen completed real-sample passes except as regression checks
- do not start G4/G5
- do not implement full login recovery
- do not implement full host runtime transport
- do not implement local document attachment runtime
- do not create unbounded micro-plans from a single failure
Workstreams
- WS1: Timeout Diagnostics and Scan Budget
- WS2: Routing Boundary Correction
- WS3: Structured Fail-Closed Reporting
- WS4: Follow-Up Sweep and Coverage Delta
Phase 0: Freeze Improvement Baseline
Objective
Freeze the dry-run and triage outputs as the only accepted inputs to this roadmap.
Tasks
- freeze full_sweep_dry_run_2026-04-19.json
- freeze full_sweep_dry_run_triage_2026-04-19.json
- freeze the four headline metrics:
  - 5/102 real-sample pass
  - 23/102 code-backed ledger coverage
  - 40/102 dry-run auto-pass
  - 66/102 dry-run actionable coverage
- freeze the problem buckets:
  - 4 known-family timeouts
  - 8 large-source timeouts
  - 19 unvalidated-source timeouts
  - 5 host-bridge over-preference cases
  - 25 readiness-before-report failures
  - 1 bootstrap-target failure
Deliverables
- baseline statement
- frozen blocker inventory
- roadmap entry criteria
Acceptance Criteria
- no additional scene is added to scope
- no implementation starts before the baseline is frozen
- dry-run and triage assets are treated as immutable inputs
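One way to make "immutable inputs" checkable is to record a content digest for each frozen file and compare it later. This is a minimal sketch under that assumption; the plan does not prescribe a freezing mechanism, and the function names `freeze_inputs` and `changed_inputs` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def freeze_inputs(paths: list[Path]) -> dict[str, str]:
    """Build a manifest mapping each input path to its digest at freeze time."""
    return {str(p): sha256_of(p) for p in paths}


def changed_inputs(manifest: dict[str, str]) -> list[str]:
    """List inputs whose current digest differs from the frozen digest."""
    return [p for p, digest in manifest.items() if sha256_of(Path(p)) != digest]
```

A later phase could call `changed_inputs` on the stored manifest and refuse to proceed if the result is non-empty.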
Phase 1: Known-Family Timeout Diagnostics
Objective
Resolve the highest-priority ambiguity: known-family scenes that timed out in the full sweep.
Tasks
- select only records labeled timeout-known-family-sample
- capture source scale metrics and previous family context
- run bounded diagnostic attempts if needed
- classify each record as:
  - known-family-rerun-pass
  - known-family-source-scale-timeout
  - known-family-generator-hotspot
  - known-family-contract-blocked-after-long-run
  - known-family-timeout-unresolved
- publish diagnostic result
Deliverables
- known-family timeout diagnostic JSON
- known-family timeout diagnostic report
Acceptance Criteria
- all 4 known-family timeout records are classified
- no scene is promoted from diagnostic success
- no generator logic is changed in the diagnostic step
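The first acceptance criterion can be checked mechanically once the diagnostic JSON exists. The sketch below assumes each record is a dict with `scene_id` and `classification` keys; those key names are hypothetical, since the diagnostic schema is not defined in this plan.

```python
# The five Phase 1 outcome labels, copied from the task list.
KNOWN_FAMILY_OUTCOMES = frozenset({
    "known-family-rerun-pass",
    "known-family-source-scale-timeout",
    "known-family-generator-hotspot",
    "known-family-contract-blocked-after-long-run",
    "known-family-timeout-unresolved",
})


def unclassified(records: list[dict]) -> list[str]:
    """Return scene ids whose classification is missing or not a Phase 1 label."""
    return [
        r["scene_id"]
        for r in records
        if r.get("classification") not in KNOWN_FAMILY_OUTCOMES
    ]


def phase1_complete(records: list[dict], expected_count: int = 4) -> bool:
    """True when exactly the expected number of records exist and all are labeled."""
    return len(records) == expected_count and not unclassified(records)
```

Note that known-family-timeout-unresolved counts as a classification here: the criterion asks that every record be classified, not that every timeout be explained.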
Phase 2: Source-Scale and Scan-Budget Improvement
Objective
Reduce timeout noise caused by oversized source directories and obvious vendor/library files.
Tasks
- analyze timeout-large-source and timeout-unvalidated-source
- define source scan budget policy
- define vendor/library ignore policy
- implement only bounded source scanning or timeout reporting changes
- verify no canonical or real-sample regression is introduced
Deliverables
- source scan budget policy
- bounded scan implementation if approved by Phase 1 evidence
- timeout reporting regression tests
Acceptance Criteria
- large source directories no longer dominate the full sweep by accidental vendor-file scanning
- known-family samples are not made worse
- archetype semantics are unchanged
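A scan budget combined with a vendor/library ignore policy might look like the sketch below. The ignore list and the `max_files` budget are placeholders, because the actual policies are deliverables of this phase; `bounded_scan` is a hypothetical name, not an existing function in the generator.

```python
import os
from pathlib import Path

# Placeholder ignore list; the real vendor/library policy is still to be defined.
IGNORED_DIR_NAMES = {"node_modules", "vendor", "dist", ".git"}


def bounded_scan(root: Path, max_files: int) -> tuple[list[Path], bool]:
    """Collect at most max_files files under root, skipping ignored directories.

    Returns (files, budget_exhausted). budget_exhausted is True only when
    at least one further file existed beyond the budget.
    """
    files: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_DIR_NAMES)
        for name in sorted(filenames):
            if len(files) >= max_files:
                return files, True
            files.append(Path(dirpath) / name)
    return files, False
```

Returning the exhaustion flag, rather than silently truncating, lets the timeout report distinguish "scan stopped at budget" from "scan finished", which is the kind of reporting change this phase allows.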
Phase 3: Host-Bridge Route Over-Preference Correction
Objective
Correct or formally adjudicate the five cases where host_bridge_workflow over-absorbed G3 or G1-E expected scenes.
Tasks
- select the 5 route-overprefer-host-bridge records
- compare business-chain evidence against host-bridge evidence
- define routing precedence rules for:
  - G3 vs G6
  - G1-E vs G6
- implement bounded routing correction only if evidence supports it
- preserve regressions for:
  - G3 real-sample pass
  - G1-E real-sample pass
  - G6 real-sample pass
- classify each case as:
  - route-corrected-to-g3
  - route-corrected-to-g1e
  - board-expectation-reclassified
  - valid-host-bridge-workflow
  - route-conflict-unresolved
Deliverables
- route over-preference correction report
- routing regression tests
- updated dry-run classification for the five fixed records
Acceptance Criteria
- all 5 route conflicts are adjudicated
- host_bridge_workflow no longer wins solely because host evidence exists
- existing G6 pass remains stable
- no broad routing rewrite is introduced
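To make the second criterion concrete, here is one candidate precedence rule, written as a sketch and not as the decided policy. It assumes host_bridge_workflow corresponds to G6 (as the G3 vs G6 pairing suggests) and reduces evidence to three hypothetical boolean flags; real evidence is likely richer, and the actual rule is what this phase must settle from the five records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteEvidence:
    """Hypothetical evidence flags for one scene."""
    host_bridge: bool  # host-bridge evidence exists
    g3_chain: bool     # business-chain evidence pointing at G3
    g1e_chain: bool    # business-chain evidence pointing at G1-E


def choose_family(ev: RouteEvidence) -> Optional[str]:
    """Candidate rule: business-chain evidence outranks host-bridge evidence.

    Returns None when the evidence is ambiguous or absent, so the case
    stays unresolved instead of being routed by default.
    """
    if ev.g3_chain and ev.g1e_chain:
        return None
    if ev.g3_chain:
        return "G3"
    if ev.g1e_chain:
        return "G1-E"
    if ev.host_bridge:
        return "G6"  # host bridge wins only when nothing outranks it
    return None
```

Under this rule a scene with only host-bridge evidence still routes to G6, which is the behaviour the existing G6 pass needs to keep.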
Phase 4: Structured Fail-Closed Reporting
Objective
Convert readiness-before-report failures into structured failure reports instead of process-level no-report failures.
Tasks
- select the 25 readiness-before-report records
- identify where generation exits before report emission
- define a minimal failure-report schema for pre-package fail-closed
- emit structured failure records with:
  - inferred archetype
  - failed gate
  - blocker reason
  - missing contract pieces
  - stderr summary if any
- keep scenes failing unless their contracts are actually complete
Deliverables
- pre-report fail-closed schema
- implementation of structured failure report emission
- regression covering at least one paginated_enrichment, one local_doc_pipeline, one multi_mode_request, and one single_request_enrichment pre-report failure
Acceptance Criteria
- no-report failures are reduced or eliminated as a category
- failing scenes still fail closed
- failure reasons become machine-readable
- auto-pass count is not inflated by looser gates
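A minimal failure-report record carrying the five fields listed in the tasks could be sketched as follows. The class name, the `scene_id` and `status` fields, and the JSON layout are assumptions for illustration; the real schema is a deliverable of this phase.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PreReportFailure:
    scene_id: str                      # hypothetical identifier field
    inferred_archetype: Optional[str]  # None when no archetype could be inferred
    failed_gate: str
    blocker_reason: str
    missing_contract_pieces: list[str] = field(default_factory=list)
    stderr_summary: Optional[str] = None  # "if any"
    status: str = "fail-closed"           # the scene still fails


def to_json(record: PreReportFailure) -> str:
    """Serialize a failure record to machine-readable JSON."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)
```

The record only describes the failure. It never flips a scene to passing, so emitting it cannot inflate the auto-pass count.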
Phase 5: Bootstrap Target Isolation
Objective
Keep the single bootstrap_target failure isolated and decide whether it belongs to later bootstrap normalization work.
Tasks
- preserve 用户停电频次分析监测 (User Power Outage Frequency Analysis and Monitoring) as a separate bootstrap failure
- inspect whether the failure is caused by missing target URL, domain mismatch, or unsupported bootstrap pattern
- produce a bootstrap isolation note
- do not implement login or bootstrap auto-recovery
Deliverables
- bootstrap target isolation note
- decision whether the case enters a later bootstrap-normalization roadmap
Acceptance Criteria
- the bootstrap case does not pollute readiness-before-report work
- no login recovery implementation is started
Phase 6: Follow-Up Full Sweep and Coverage Delta
Objective
Measure whether the bounded improvements improved generic coverage.
Tasks
- rerun the fixed 102-scene full sweep with the same scene set
- produce a new dry-run result
- compare against the baseline:
  - auto-pass delta
  - actionable coverage delta
  - timeout delta
  - misclassification delta
  - no-report delta
- publish coverage delta report
- decide whether to move to execution-board status sync or another bounded improvement cycle
Deliverables
- follow-up full sweep JSON
- coverage delta report
- remaining blocker decision board
Acceptance Criteria
- scene set remains exactly 102
- baseline and follow-up are comparable
- improvements are quantified, not assumed
- no execution board status is changed automatically
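The comparison can be sketched as a comparability check followed by per-metric subtraction. The baseline values below come from the Baseline section; the metric key names and both function names are hypothetical, and no follow-up numbers are assumed.

```python
# Baseline counts taken from the Baseline section of this plan.
BASELINE = {
    "auto_pass": 40,
    "actionable": 66,
    "timeout": 31,
    "misclassified": 5,
    "no_report": 25,
}


def check_comparable(baseline_ids: set[str], follow_up_ids: set[str],
                     expected: int = 102) -> None:
    """Raise if the two sweeps did not cover the same fixed scene set."""
    if len(baseline_ids) != expected:
        raise ValueError(f"baseline has {len(baseline_ids)} scenes, expected {expected}")
    if baseline_ids != follow_up_ids:
        raise ValueError("follow-up sweep used a different scene set")


def coverage_delta(baseline: dict[str, int], follow_up: dict[str, int]) -> dict[str, int]:
    """Return follow_up minus baseline for every baseline metric."""
    missing = baseline.keys() - follow_up.keys()
    if missing:
        raise ValueError(f"follow-up is missing metrics: {sorted(missing)}")
    return {name: follow_up[name] - baseline[name] for name in baseline}
```

For auto_pass and actionable a positive delta is an improvement; for timeout, misclassified, and no_report the improvement is a negative delta.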
Milestone Order
The order is fixed:
- Phase 0: freeze baseline
- Phase 1: known-family timeout diagnostics
- Phase 2: source-scale and scan-budget improvement
- Phase 3: host-bridge route over-preference correction
- Phase 4: structured fail-closed reporting
- Phase 5: bootstrap target isolation
- Phase 6: follow-up full sweep and coverage delta
Do not start Phase 3 before Phase 1 is completed. Known-family timeout ambiguity affects the interpretation of current coverage.
Do not start Phase 6 before Phases 2-5 have either completed or been explicitly deferred with reasons.
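The two explicit gates can be written down as a small check. This sketch encodes only those two rules, not the full fixed ordering, and it assumes a hypothetical status map with the values "pending", "completed", and "deferred"; the reasons required for a deferral would have to be recorded separately.

```python
def can_start(phase: int, status: dict[int, str]) -> bool:
    """Apply the two explicit ordering gates to a phase-status map."""
    if phase == 3:
        # Phase 3 waits for Phase 1 to be completed.
        return status.get(1) == "completed"
    if phase == 6:
        # Phase 6 waits for Phases 2-5 to be completed or explicitly deferred.
        return all(status.get(p) in {"completed", "deferred"} for p in range(2, 6))
    return True
```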
Completion Criteria
This roadmap is complete when:
- known-family timeouts are no longer mixed with generic timeout noise
- host-bridge over-preference cases are adjudicated
- readiness-before-report failures become structured fail-closed records
- the bootstrap target case is isolated
- a follow-up full sweep quantifies coverage delta
- no new family is introduced as a shortcut around current blockers
Out of Plan
- new family implementation
- G4/G5 implementation
- browser host runtime transport
- login recovery
- attachment/local document runtime
- automatic execution board promotion