claw/docs/superpowers/specs/2026-04-19-102-full-sweep-improvement-roadmap-design.md

# 102 Full Sweep Improvement Roadmap Design

> Date: 2026-04-19
> Status: Draft
> Upstream Dry-Run: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`
> Upstream Triage: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md`

## Design Intent

Use the full `102` scene dry-run and triage results to define a single improvement roadmap for generic `scene -> skill` coverage.

This roadmap is the post-triage equivalent of the earlier `60-to-90` roadmap. It is not a single bugfix plan. It is the governing design for turning measured dry-run blockers into bounded implementation tracks.

The design answers:

`how do we move from 40/102 dry-run auto-pass and 66/102 actionable coverage toward a higher verified generic conversion rate without drifting into unbounded fixes?`

## Current Baseline

The current measured state is:

| Metric | Count |
| --- | ---: |
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |

The non-pass triage state is:

| Bucket | Count | Triage conclusion |
| --- | ---: | --- |
| Timeout | 31 | `19 timeout-unvalidated-source`, `8 timeout-large-source`, `4 timeout-known-family-sample` |
| Misclassified | 5 | all `route-overprefer-host-bridge` |
| No-report failure | 25 | all `readiness-before-report` |
| Bootstrap target | 1 | separate `bootstrap_target` |
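
The two tables can be cross-checked with a few lines of arithmetic. The sketch below assumes that the four non-pass buckets plus the dry-run auto-pass count partition the `102` scenes; the numbers are consistent with that reading, but the tables do not state it explicitly. The constant and function names are illustrative only.

```python
# Counts copied from the baseline and triage tables.
TOTAL_SCENES = 102
AUTO_PASS = 40

NON_PASS_BUCKETS = {
    "timeout": 31,
    "misclassified": 5,
    "no_report_failure": 25,
    "bootstrap_target": 1,
}

TIMEOUT_LABELS = {
    "timeout-unvalidated-source": 19,
    "timeout-large-source": 8,
    "timeout-known-family-sample": 4,
}


def check_baseline() -> None:
    """Raise AssertionError if the triage counts do not reconcile."""
    non_pass = sum(NON_PASS_BUCKETS.values())
    # Assumption: auto-pass and the non-pass buckets are disjoint and exhaustive.
    assert AUTO_PASS + non_pass == TOTAL_SCENES, (AUTO_PASS, non_pass)
    # The timeout sub-labels must add up to the timeout bucket.
    assert sum(TIMEOUT_LABELS.values()) == NON_PASS_BUCKETS["timeout"]
```

Calling `check_baseline()` returns silently when the counts reconcile, which makes it usable as a guard before any later sweep is compared against this baseline.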

## Problem Statement

The generic generator already auto-passes more scenes in dry-run (`40 / 102`) than the code-backed ledger covers (`23 / 102`), but the dry-run result is not trustworthy enough to promote automatically because:

1. known-family scenes still appear in the timeout bucket
2. `host_bridge_workflow` can over-absorb scenes expected to remain `G3` or `G1-E`
3. many fail-closed cases terminate before a structured generation report exists
4. timeout and no-report failures hide actionable blocker details

## Roadmap Goal

Improve the measured generic conversion rate by first reducing ambiguity in the current failure surface, not by adding new families.

The roadmap has four goals:

1. make known-family timeout results explainable and repeatable
2. correct or formally adjudicate host-bridge routing over-preference
3. convert pre-report failures into structured fail-closed results
4. rerun a bounded `102` sweep to measure coverage delta

## Scope Guardrails

1. do not add new scene families in this roadmap
2. do not promote scenes directly from diagnostic runs
3. do not update `scene_execution_board_2026-04-18.json` until a later explicit status-sync plan
4. do not use one failure as justification for an unbounded rewrite
5. do not reopen completed `G1-E / G2 / G3 / G6 / G7` real-sample pass records unless they are part of a fixed regression check
6. do not start `G4 / G5`
7. do not implement login recovery, full host runtime, or attachment pipeline work in this roadmap

## Workstreams

1. `WS1` Timeout and Source-Scale Diagnostics
2. `WS2` Host-Bridge Routing Boundary Correction
3. `WS3` Structured Fail-Closed Reporting
4. `WS4` Coverage Delta Sweep and Decision Board

## Track A: Known-Family Timeout Diagnostics

### Intent

Separate known-family timeout behavior from generic unvalidated-source timeout behavior.

### Input

The `4` records labeled:

`timeout-known-family-sample`

### Expected Output

Each known-family timeout gets one of:

1. `known-family-rerun-pass`
2. `known-family-source-scale-timeout`
3. `known-family-generator-hotspot`
4. `known-family-contract-blocked-after-long-run`
5. `known-family-timeout-unresolved`

### Design Constraint

A longer rerun success does not promote a scene. It only changes diagnostic classification.
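
One way to keep the constraint visible in code is to model the five outcomes as a closed enum and make the update function touch only the diagnostic label. This is a minimal sketch: the record shape (`scene_id`, `diagnostic_label`, `promoted`) and the function name are hypothetical, and only the label strings come from this design.

```python
from enum import Enum


class KnownFamilyTimeoutOutcome(str, Enum):
    """The five diagnostic outcomes listed for Track A."""
    RERUN_PASS = "known-family-rerun-pass"
    SOURCE_SCALE_TIMEOUT = "known-family-source-scale-timeout"
    GENERATOR_HOTSPOT = "known-family-generator-hotspot"
    CONTRACT_BLOCKED = "known-family-contract-blocked-after-long-run"
    UNRESOLVED = "known-family-timeout-unresolved"


def apply_diagnostic(record: dict, outcome: KnownFamilyTimeoutOutcome) -> dict:
    """Return a copy of the record with only the diagnostic label changed."""
    updated = dict(record)
    updated["diagnostic_label"] = outcome.value
    # Deliberately no change to "promoted": a rerun pass is not a promotion.
    return updated
```

For example, applying `KnownFamilyTimeoutOutcome.RERUN_PASS` to a record with `"promoted": False` yields a new record whose label is `known-family-rerun-pass` and whose `promoted` flag is still `False`.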

## Track B: Timeout Source-Scale Policy

### Intent

Create a bounded input filtering and scan-budget policy for large source directories without changing family semantics.

### Input

The timeout labels:

1. `timeout-large-source`
2. `timeout-unvalidated-source`

### Expected Output

1. source file selection policy
2. large vendor/library ignore list policy
3. scan-budget decision table
4. timeout reporting shape

### Design Constraint

This track may tighten scan boundaries, but it must not change archetype semantics.
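
A scan-budget policy of this kind could look roughly like the sketch below. The ignore list, the budget values, and the function name are all placeholders chosen for illustration; the actual policy is an output of this track, not something defined here. The sketch only filters which files are read and records why the rest were skipped, so it leaves classification logic untouched.

```python
from pathlib import PurePosixPath

# Hypothetical defaults; the real ignore list and budgets are outputs of this track.
IGNORED_DIR_NAMES = frozenset({"node_modules", "vendor", "dist"})
MAX_FILES = 200
MAX_TOTAL_BYTES = 2_000_000


def select_sources(files, ignored=IGNORED_DIR_NAMES,
                   max_files=MAX_FILES, max_bytes=MAX_TOTAL_BYTES):
    """Pick files within budget and report what was skipped and why.

    `files` is an iterable of (posix_path, size_in_bytes) pairs.
    """
    selected, skipped_ignored, skipped_budget = [], [], []
    used_bytes = 0
    for path, size in sorted(files):  # sorted for a repeatable selection
        directories = PurePosixPath(path).parts[:-1]
        if any(part in ignored for part in directories):
            skipped_ignored.append(path)
        elif len(selected) >= max_files or used_bytes + size > max_bytes:
            skipped_budget.append(path)
        else:
            selected.append(path)
            used_bytes += size
    return {
        "selected": selected,
        "skipped_ignored": skipped_ignored,
        "skipped_budget": skipped_budget,
        "budget_exhausted": bool(skipped_budget),
    }
```

Returning the skipped lists alongside the selection gives the timeout report something concrete to cite when a scene is cut off by budget rather than by a generator fault.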

## Track C: Host-Bridge Route Over-Preference Correction

### Intent

Prevent `host_bridge_workflow` from absorbing scenes that should remain `G3` or `G1-E` when business-chain evidence is stronger.

### Input

The `5` records labeled:

`route-overprefer-host-bridge`

### Expected Output

Each misclassification gets one of:

1. `route-corrected-to-g3`
2. `route-corrected-to-g1e`
3. `board-expectation-reclassified`
4. `valid-host-bridge-workflow`
5. `route-conflict-unresolved`

### Design Constraint

This track must preserve the already-passed `G6` real sample and must not degrade `G3` or `G1-E` canonical tests.
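
Because every misclassified record must end with exactly one of the five outcomes, adjudication completeness is easy to check mechanically. The following sketch assumes a simple mapping from scene identifier to outcome label; that mapping and the function name are hypothetical, while the label strings are the ones listed for this track.

```python
from typing import List, Mapping, Optional

# The five allowed adjudication outcomes for Track C.
ROUTE_OUTCOMES = frozenset({
    "route-corrected-to-g3",
    "route-corrected-to-g1e",
    "board-expectation-reclassified",
    "valid-host-bridge-workflow",
    "route-conflict-unresolved",
})


def unadjudicated(records: Mapping[str, Optional[str]]) -> List[str]:
    """Return scene ids whose outcome is missing or not an allowed label."""
    return sorted(
        scene for scene, outcome in records.items()
        if outcome not in ROUTE_OUTCOMES
    )
```

An empty result means every record carries an allowed label. Note that `route-conflict-unresolved` counts as an allowed label here because it appears in the outcome list; whether it should count as "adjudicated" for the success criteria is a decision this sketch does not make.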

## Track D: Readiness-Before-Report Structured Fail-Closed

### Intent

Convert `generator failed without generation report` outcomes into structured, machine-readable fail-closed results.

### Input

The `25` records labeled:

`readiness-before-report`

### Expected Output

Each case produces a generation report or equivalent dry-run failure record with:

1. inferred archetype
2. blocker stage
3. missing contract pieces
4. failed gate name
5. actionable reason

### Design Constraint

This track should not make failing scenes pass. It should make failures explainable.
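
The five required fields map naturally onto a small record type. The sketch below is one possible shape, not the generator's actual report schema: the class name, the `scene_id` and `status` fields, and the JSON serialization are assumptions added for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class FailClosedRecord:
    """Hypothetical dry-run failure record carrying the five required fields."""
    scene_id: str                        # assumed identifier, not named in the design
    inferred_archetype: Optional[str]    # None when inference itself did not finish
    blocker_stage: str
    missing_contract_pieces: List[str]
    failed_gate: str
    actionable_reason: str
    status: str = "fail-closed"          # the record never reports a pass

    def to_json(self) -> str:
        """Serialize to JSON, keeping non-ASCII scene names readable."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)
```

Fixing `status` to `fail-closed` by default reflects the constraint above: the record explains why a scene failed and offers no path for turning that failure into a pass.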

## Track E: Bootstrap Target Isolation

### Intent

Keep the single `bootstrap_target` failure separate so it does not pollute the no-report or route-correction work.

### Input

The `1` bootstrap target failure:

`用户停电频次分析监测` (user power-outage frequency analysis and monitoring)

### Expected Output

1. isolated bootstrap failure note
2. decision whether it belongs to later bootstrap normalization work

### Design Constraint

No bootstrap auto-recovery or login work is included in this roadmap.

## Track F: Coverage Delta Sweep

### Intent

After bounded improvements, rerun a comparable `102` sweep and compare against the baseline.

### Input

1. baseline dry-run result
2. updated generator after approved tracks
3. same `102` scene board

### Expected Output

1. new dry-run result
2. coverage delta report
3. category movement table
4. decision board for remaining blockers

### Design Constraint

The rerun must be comparable to the baseline. It cannot silently change the scene set.
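
The comparability rule can be enforced before any delta is computed. The sketch below assumes each sweep is available as a mapping from scene identifier to result category; that representation, the category names in the usage note, and the function name are illustrative assumptions.

```python
from collections import Counter


def coverage_delta(baseline: dict, rerun: dict) -> dict:
    """Compare two sweeps given as {scene_id: category} mappings."""
    # Refuse to compare if the scene set changed in either direction.
    if baseline.keys() != rerun.keys():
        missing = sorted(baseline.keys() - rerun.keys())
        added = sorted(rerun.keys() - baseline.keys())
        raise ValueError(f"scene set changed: missing={missing}, added={added}")
    # Category movement: how many scenes moved from one category to another.
    movement = Counter(
        (baseline[s], rerun[s]) for s in baseline if baseline[s] != rerun[s]
    )
    before, after = Counter(baseline.values()), Counter(rerun.values())
    # Net change per category across the whole board.
    delta = {c: after[c] - before[c] for c in sorted(set(before) | set(after))}
    return {"delta": delta, "movement": dict(movement)}
```

Given a baseline of `{"s1": "auto-pass", "s2": "timeout"}` and a rerun of `{"s1": "auto-pass", "s2": "auto-pass"}`, the `delta` is `{"auto-pass": 1, "timeout": -1}` and the `movement` table records one scene moving from `timeout` to `auto-pass`. A rerun that drops or adds a scene raises `ValueError` instead of producing a misleading delta.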

## Success Criteria

This roadmap succeeds when:

1. all known-family timeouts are separated from unvalidated timeout noise
2. all five host-bridge over-preference cases are adjudicated
3. no-report failures become structured fail-closed outputs
4. a follow-up full sweep shows measurable improvement or a clearly explained plateau
5. no new family is introduced to mask existing failure categories

## Out of Scope

1. new `G4/G5` implementation
2. full login recovery
3. browser host runtime transport implementation
4. local document attachment pipeline
5. automatic scene promotion into the execution board
6. full manual validation of all `102` generated skills