claw/docs/superpowers/plans/2026-04-19-102-full-sweep-improvement-roadmap-plan.md

# 102 Full Sweep Improvement Roadmap Plan

> Date: 2026-04-19
> Status: Draft
> Upstream Spec: `docs/superpowers/specs/2026-04-19-102-full-sweep-improvement-roadmap-design.md`
> Upstream Dry-Run Result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
> Upstream Triage Result: `tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json`

## Plan Intent

Turn the `102` scene dry-run and triage findings into a governed improvement roadmap.

This plan is intentionally broad like the earlier `60-to-90` roadmap. It coordinates multiple bounded implementation tracks instead of starting isolated fixes from individual failures.

## Baseline

Current measured baseline:

| Metric | Count |
| --- | ---: |
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |

Current triage baseline:

| Bucket | Count | Triage conclusion |
| --- | ---: | --- |
| Timeout | 31 | `19 timeout-unvalidated-source`, `8 timeout-large-source`, `4 timeout-known-family-sample` |
| Misclassified | 5 | all `route-overprefer-host-bridge` |
| No-report failure | 25 | all `readiness-before-report` |
| Bootstrap target | 1 | separate `bootstrap_target` |

## Scope Guardrails

1. do not add new scene families
2. do not update `scene_execution_board_2026-04-18.json` inside this roadmap
3. do not promote scenes directly from diagnostic or dry-run results
4. do not reopen completed real-sample passes except as regression checks
5. do not start `G4/G5`
6. do not implement full login recovery
7. do not implement full host runtime transport
8. do not implement local document attachment runtime
9. do not create unbounded micro-plans from a single failure

## Workstreams

1. `WS1` Timeout Diagnostics and Scan Budget
2. `WS2` Routing Boundary Correction
3. `WS3` Structured Fail-Closed Reporting
4. `WS4` Follow-Up Sweep and Coverage Delta

## Phase 0: Freeze Improvement Baseline

### Objective

Freeze the dry-run and triage outputs as the only accepted inputs to this roadmap.

### Tasks

1. freeze `full_sweep_dry_run_2026-04-19.json`
2. freeze `full_sweep_dry_run_triage_2026-04-19.json`
3. freeze the four headline metrics:
   - `5/102` real-sample pass
   - `23/102` code-backed ledger coverage
   - `40/102` dry-run auto-pass
   - `66/102` dry-run actionable coverage
4. freeze the problem buckets:
   - `4` known-family timeouts
   - `8` large-source timeouts
   - `19` unvalidated-source timeouts
   - `5` host-bridge over-preference cases
   - `25` readiness-before-report failures
   - `1` bootstrap-target failure

### Deliverables

1. baseline statement
2. frozen blocker inventory
3. roadmap entry criteria

### Acceptance Criteria

1. no additional scene is added to scope
2. no implementation starts before the baseline is frozen
3. dry-run and triage assets are treated as immutable inputs

## Phase 1: Known-Family Timeout Diagnostics

### Objective

Resolve the highest-priority ambiguity: known-family scenes that timed out in the full sweep.

### Tasks

1. select only records labeled `timeout-known-family-sample`
2. capture source scale metrics and previous family context
3. run bounded diagnostic attempts if needed
4. classify each record as:
   - `known-family-rerun-pass`
   - `known-family-source-scale-timeout`
   - `known-family-generator-hotspot`
   - `known-family-contract-blocked-after-long-run`
   - `known-family-timeout-unresolved`
5. publish diagnostic result

### Deliverables

1. known-family timeout diagnostic JSON
2. known-family timeout diagnostic report

### Acceptance Criteria

1. all `4` known-family timeout records are classified
2. no scene is promoted from diagnostic success
3. no generator logic is changed in the diagnostic step

## Phase 2: Source-Scale and Scan-Budget Improvement

### Objective

Reduce timeout noise caused by oversized source directories and obvious vendor/library files.

### Tasks

1. analyze `timeout-large-source` and `timeout-unvalidated-source`
2. define source scan budget policy
3. define vendor/library ignore policy
4. implement only bounded source scanning or timeout reporting changes
5. verify no canonical or real-sample regression is introduced

### Deliverables

1. source scan budget policy
2. bounded scan implementation if approved by Phase 1 evidence
3. timeout reporting regression tests

### Acceptance Criteria

1. large source directories no longer dominate the full sweep by accidental vendor-file scanning
2. known-family samples are not made worse
3. archetype semantics are unchanged

## Phase 3: Host-Bridge Route Over-Preference Correction

### Objective

Correct or formally adjudicate the five cases where `host_bridge_workflow` over-absorbed `G3` or `G1-E` expected scenes.

### Tasks

1. select the `5` `route-overprefer-host-bridge` records
2. compare business-chain evidence against host-bridge evidence
3. define routing precedence rules for:
   - `G3` vs `G6`
   - `G1-E` vs `G6`
4. implement bounded routing correction only if evidence supports it
5. preserve regressions for:
   - `G3` real-sample pass
   - `G1-E` real-sample pass
   - `G6` real-sample pass
6. classify each case as:
   - `route-corrected-to-g3`
   - `route-corrected-to-g1e`
   - `board-expectation-reclassified`
   - `valid-host-bridge-workflow`
   - `route-conflict-unresolved`

### Deliverables

1. route over-preference correction report
2. routing regression tests
3. updated dry-run classification for the five fixed records

### Acceptance Criteria

1. all `5` route conflicts are adjudicated
2. `host_bridge_workflow` no longer wins solely because host evidence exists
3. existing `G6` pass remains stable
4. no broad routing rewrite is introduced

## Phase 4: Structured Fail-Closed Reporting

### Objective

Convert `readiness-before-report` failures into structured failure reports instead of process-level no-report failures.

### Tasks

1. select the `25` `readiness-before-report` records
2. identify where generation exits before report emission
3. define a minimal failure-report schema for pre-package fail-closed
4. emit structured failure records with:
   - inferred archetype
   - failed gate
   - blocker reason
   - missing contract pieces
   - stderr summary if any
5. keep scenes failing unless their contracts are actually complete

### Deliverables

1. pre-report fail-closed schema
2. implementation of structured failure report emission
3. regression covering at least one `paginated_enrichment`, one `local_doc_pipeline`, one `multi_mode_request`, and one `single_request_enrichment` pre-report failure

### Acceptance Criteria

1. no-report failures are reduced or eliminated as a category
2. failing scenes still fail closed
3. failure reasons become machine-readable
4. auto-pass count is not inflated by looser gates

## Phase 5: Bootstrap Target Isolation

### Objective

Keep the single `bootstrap_target` failure isolated and decide whether it belongs to later bootstrap normalization work.

### Tasks

1. preserve `用户停电频次分析监测` as a separate bootstrap failure
2. inspect whether the failure is caused by missing target URL, domain mismatch, or unsupported bootstrap pattern
3. produce a bootstrap isolation note
4. do not implement login or bootstrap auto-recovery

### Deliverables

1. bootstrap target isolation note
2. decision whether the case enters a later bootstrap-normalization roadmap

### Acceptance Criteria

1. the bootstrap case does not pollute readiness-before-report work
2. no login recovery implementation is started

## Phase 6: Follow-Up Full Sweep and Coverage Delta

### Objective

Measure whether the bounded improvements improved generic coverage.

### Tasks

1. rerun the fixed `102` scene full sweep with the same scene set
2. produce a new dry-run result
3. compare against the baseline:
   - auto-pass delta
   - actionable coverage delta
   - timeout delta
   - misclassification delta
   - no-report delta
4. publish coverage delta report
5. decide whether to move to execution-board status sync or another bounded improvement cycle

### Deliverables

1. follow-up full sweep JSON
2. coverage delta report
3. remaining blocker decision board

### Acceptance Criteria

1. scene set remains exactly `102`
2. baseline and follow-up are comparable
3. improvements are quantified, not assumed
4. no execution board status is changed automatically

## Milestone Order

The order is fixed:

1. Phase 0: freeze baseline
2. Phase 1: known-family timeout diagnostics
3. Phase 2: source-scale and scan-budget improvement
4. Phase 3: host-bridge route over-preference correction
5. Phase 4: structured fail-closed reporting
6. Phase 5: bootstrap target isolation
7. Phase 6: follow-up full sweep and coverage delta

Do not start Phase 3 before Phase 1 is completed. Known-family timeout ambiguity affects the interpretation of current coverage.

Do not start Phase 6 before Phases 2-5 have either completed or been explicitly deferred with reasons.

## Completion Criteria

This roadmap is complete when:

1. known-family timeouts are no longer mixed with generic timeout noise
2. host-bridge over-preference cases are adjudicated
3. readiness-before-report failures become structured fail-closed records
4. the bootstrap target case is isolated
5. a follow-up full sweep quantifies coverage delta
6. no new family is introduced as a shortcut around current blockers

## Out of Plan

1. new family implementation
2. `G4/G5` implementation
3. browser host runtime transport
4. login recovery
5. attachment/local document runtime
6. automatic execution board promotion