# 102 Full Sweep Improvement Roadmap Plan > Date: 2026-04-19 > Status: Draft > Upstream Spec: `docs/superpowers/specs/2026-04-19-102-full-sweep-improvement-roadmap-design.md` > Upstream Dry-Run Result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json` > Upstream Triage Result: `tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json` ## Plan Intent Turn the `102` scene dry-run and triage findings into a governed improvement roadmap. This plan is intentionally broad like the earlier `60-to-90` roadmap. It coordinates multiple bounded implementation tracks instead of starting isolated fixes from individual failures. ## Baseline Current measured baseline: | Metric | Count | | --- | ---: | | Real-sample executed pass | 5 / 102 | | Code-backed ledger coverage | 23 / 102 | | Dry-run auto-pass | 40 / 102 | | Dry-run actionable coverage | 66 / 102 | Current triage baseline: | Bucket | Count | Triage conclusion | | --- | ---: | --- | | Timeout | 31 | `19 timeout-unvalidated-source`, `8 timeout-large-source`, `4 timeout-known-family-sample` | | Misclassified | 5 | all `route-overprefer-host-bridge` | | No-report failure | 25 | all `readiness-before-report` | | Bootstrap target | 1 | separate `bootstrap_target` | ## Scope Guardrails 1. do not add new scene families 2. do not update `scene_execution_board_2026-04-18.json` inside this roadmap 3. do not promote scenes directly from diagnostic or dry-run results 4. do not reopen completed real-sample passes except as regression checks 5. do not start `G4/G5` 6. do not implement full login recovery 7. do not implement full host runtime transport 8. do not implement local document attachment runtime 9. do not create unbounded micro-plans from a single failure ## Workstreams 1. `WS1` Timeout Diagnostics and Scan Budget 2. `WS2` Routing Boundary Correction 3. `WS3` Structured Fail-Closed Reporting 4. `WS4` Follow-Up Sweep and Coverage Delta ## Phase 0: Freeze Improvement Baseline ### Objective Freeze the dry-run and triage outputs as the only accepted inputs to this roadmap. ### Tasks 1. freeze `full_sweep_dry_run_2026-04-19.json` 2. freeze `full_sweep_dry_run_triage_2026-04-19.json` 3. freeze the four headline metrics: - `5/102` real-sample pass - `23/102` code-backed ledger coverage - `40/102` dry-run auto-pass - `66/102` dry-run actionable coverage 4. freeze the problem buckets: - `4` known-family timeouts - `8` large-source timeouts - `19` unvalidated-source timeouts - `5` host-bridge over-preference cases - `25` readiness-before-report failures - `1` bootstrap-target failure ### Deliverables 1. baseline statement 2. frozen blocker inventory 3. roadmap entry criteria ### Acceptance Criteria 1. no additional scene is added to scope 2. no implementation starts before the baseline is frozen 3. dry-run and triage assets are treated as immutable inputs ## Phase 1: Known-Family Timeout Diagnostics ### Objective Resolve the highest-priority ambiguity: known-family scenes that timed out in the full sweep. ### Tasks 1. select only records labeled `timeout-known-family-sample` 2. capture source scale metrics and previous family context 3. run bounded diagnostic attempts if needed 4. classify each record as: - `known-family-rerun-pass` - `known-family-source-scale-timeout` - `known-family-generator-hotspot` - `known-family-contract-blocked-after-long-run` - `known-family-timeout-unresolved` 5. publish diagnostic result ### Deliverables 1. known-family timeout diagnostic JSON 2. known-family timeout diagnostic report ### Acceptance Criteria 1. all `4` known-family timeout records are classified 2. no scene is promoted from diagnostic success 3. no generator logic is changed in the diagnostic step ## Phase 2: Source-Scale and Scan-Budget Improvement ### Objective Reduce timeout noise caused by oversized source directories and obvious vendor/library files. ### Tasks 1. analyze `timeout-large-source` and `timeout-unvalidated-source` 2. define source scan budget policy 3. define vendor/library ignore policy 4. implement only bounded source scanning or timeout reporting changes 5. verify no canonical or real-sample regression is introduced ### Deliverables 1. source scan budget policy 2. bounded scan implementation if approved by Phase 1 evidence 3. timeout reporting regression tests ### Acceptance Criteria 1. large source directories no longer dominate the full sweep by accidental vendor-file scanning 2. known-family samples are not made worse 3. archetype semantics are unchanged ## Phase 3: Host-Bridge Route Over-Preference Correction ### Objective Correct or formally adjudicate the five cases where `host_bridge_workflow` over-absorbed `G3` or `G1-E` expected scenes. ### Tasks 1. select the `5` `route-overprefer-host-bridge` records 2. compare business-chain evidence against host-bridge evidence 3. define routing precedence rules for: - `G3` vs `G6` - `G1-E` vs `G6` 4. implement bounded routing correction only if evidence supports it 5. preserve regressions for: - `G3` real-sample pass - `G1-E` real-sample pass - `G6` real-sample pass 6. classify each case as: - `route-corrected-to-g3` - `route-corrected-to-g1e` - `board-expectation-reclassified` - `valid-host-bridge-workflow` - `route-conflict-unresolved` ### Deliverables 1. route over-preference correction report 2. routing regression tests 3. updated dry-run classification for the five fixed records ### Acceptance Criteria 1. all `5` route conflicts are adjudicated 2. `host_bridge_workflow` no longer wins solely because host evidence exists 3. existing `G6` pass remains stable 4. no broad routing rewrite is introduced ## Phase 4: Structured Fail-Closed Reporting ### Objective Convert `readiness-before-report` failures into structured failure reports instead of process-level no-report failures. ### Tasks 1. select the `25` `readiness-before-report` records 2. identify where generation exits before report emission 3. define a minimal failure-report schema for pre-package fail-closed 4. emit structured failure records with: - inferred archetype - failed gate - blocker reason - missing contract pieces - stderr summary if any 5. keep scenes failing unless their contracts are actually complete ### Deliverables 1. pre-report fail-closed schema 2. implementation of structured failure report emission 3. regression covering at least one `paginated_enrichment`, one `local_doc_pipeline`, one `multi_mode_request`, and one `single_request_enrichment` pre-report failure ### Acceptance Criteria 1. no-report failures are reduced or eliminated as a category 2. failing scenes still fail closed 3. failure reasons become machine-readable 4. auto-pass count is not inflated by looser gates ## Phase 5: Bootstrap Target Isolation ### Objective Keep the single `bootstrap_target` failure isolated and decide whether it belongs to later bootstrap normalization work. ### Tasks 1. preserve `用户停电频次分析监测` as a separate bootstrap failure 2. inspect whether the failure is caused by missing target URL, domain mismatch, or unsupported bootstrap pattern 3. produce a bootstrap isolation note 4. do not implement login or bootstrap auto-recovery ### Deliverables 1. bootstrap target isolation note 2. decision whether the case enters a later bootstrap-normalization roadmap ### Acceptance Criteria 1. the bootstrap case does not pollute readiness-before-report work 2. no login recovery implementation is started ## Phase 6: Follow-Up Full Sweep and Coverage Delta ### Objective Measure whether the bounded improvements improved generic coverage. ### Tasks 1. rerun the fixed `102` scene full sweep with the same scene set 2. produce a new dry-run result 3. compare against the baseline: - auto-pass delta - actionable coverage delta - timeout delta - misclassification delta - no-report delta 4. publish coverage delta report 5. decide whether to move to execution-board status sync or another bounded improvement cycle ### Deliverables 1. follow-up full sweep JSON 2. coverage delta report 3. remaining blocker decision board ### Acceptance Criteria 1. scene set remains exactly `102` 2. baseline and follow-up are comparable 3. improvements are quantified, not assumed 4. no execution board status is changed automatically ## Milestone Order The order is fixed: 1. Phase 0: freeze baseline 2. Phase 1: known-family timeout diagnostics 3. Phase 2: source-scale and scan-budget improvement 4. Phase 3: host-bridge route over-preference correction 5. Phase 4: structured fail-closed reporting 6. Phase 5: bootstrap target isolation 7. Phase 6: follow-up full sweep and coverage delta Do not start Phase 3 before Phase 1 is completed. Known-family timeout ambiguity affects the interpretation of current coverage. Do not start Phase 6 before Phases 2-5 have either completed or been explicitly deferred with reasons. ## Completion Criteria This roadmap is complete when: 1. known-family timeouts are no longer mixed with generic timeout noise 2. host-bridge over-preference cases are adjudicated 3. readiness-before-report failures become structured fail-closed records 4. the bootstrap target case is isolated 5. a follow-up full sweep quantifies coverage delta 6. no new family is introduced as a shortcut around current blockers ## Out of Plan 1. new family implementation 2. `G4/G5` implementation 3. browser host runtime transport 4. login recovery 5. attachment/local document runtime 6. automatic execution board promotion