# 102 Full Sweep Improvement Roadmap Plan

> Date: 2026-04-19
> Status: Draft
> Upstream Spec: `docs/superpowers/specs/2026-04-19-102-full-sweep-improvement-roadmap-design.md`
> Upstream Dry-Run Result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
> Upstream Triage Result: `tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json`

## Plan Intent

Turn the `102`-scene dry-run and triage findings into a governed improvement roadmap.

This plan is intentionally broad, like the earlier `60-to-90` roadmap. It coordinates multiple bounded implementation tracks instead of starting isolated fixes from individual failures.

## Baseline

Current measured baseline:

| Metric | Count |
| --- | ---: |
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |

Current triage baseline:

| Bucket | Count | Triage conclusion |
| --- | ---: | --- |
| Timeout | 31 | 19 `timeout-unvalidated-source`, 8 `timeout-large-source`, 4 `timeout-known-family-sample` |
| Misclassified | 5 | all `route-overprefer-host-bridge` |
| No-report failure | 25 | all `readiness-before-report` |
| Bootstrap target | 1 | separate `bootstrap_target` |

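The two baseline tables can be cross-checked with simple arithmetic: the three timeout labels should sum to the timeout bucket, and the four buckets together with the dry-run auto-pass count add up to `102`. That suggests the buckets cover exactly the scenes that did not auto-pass, although the plan does not state this explicitly. A minimal Python sketch, with counts copied from the tables and all variable names chosen for illustration:

```python
# Counts copied from the baseline tables; names are illustrative only.
TOTAL_SCENES = 102
DRY_RUN_AUTO_PASS = 40

timeout_labels = {
    "timeout-unvalidated-source": 19,
    "timeout-large-source": 8,
    "timeout-known-family-sample": 4,
}
buckets = {
    "timeout": 31,
    "misclassified": 5,
    "no_report_failure": 25,
    "bootstrap_target": 1,
}

timeout_total = sum(timeout_labels.values())
failure_total = sum(buckets.values())

# The timeout sub-labels must account for the whole timeout bucket.
assert timeout_total == buckets["timeout"]
# Auto-pass scenes plus bucketed scenes should cover the full scene set.
assert DRY_RUN_AUTO_PASS + failure_total == TOTAL_SCENES
```

Running the same check against the follow-up sweep would flag any scene that silently dropped out of every bucket.
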
## Scope Guardrails

1. do not add new scene families
2. do not update `scene_execution_board_2026-04-18.json` inside this roadmap
3. do not promote scenes directly from diagnostic or dry-run results
4. do not reopen completed real-sample passes except as regression checks
5. do not start `G4/G5`
6. do not implement full login recovery
7. do not implement full host runtime transport
8. do not implement local document attachment runtime
9. do not create unbounded micro-plans from a single failure

## Workstreams

1. `WS1` Timeout Diagnostics and Scan Budget
2. `WS2` Routing Boundary Correction
3. `WS3` Structured Fail-Closed Reporting
4. `WS4` Follow-Up Sweep and Coverage Delta

## Phase 0: Freeze Improvement Baseline

### Objective

Freeze the dry-run and triage outputs as the only accepted inputs to this roadmap.

### Tasks

1. freeze `full_sweep_dry_run_2026-04-19.json`
2. freeze `full_sweep_dry_run_triage_2026-04-19.json`
3. freeze the four headline metrics:
   - `5/102` real-sample pass
   - `23/102` code-backed ledger coverage
   - `40/102` dry-run auto-pass
   - `66/102` dry-run actionable coverage
4. freeze the problem buckets:
   - `4` known-family timeouts
   - `8` large-source timeouts
   - `19` unvalidated-source timeouts
   - `5` host-bridge over-preference cases
   - `25` readiness-before-report failures
   - `1` bootstrap-target failure

### Deliverables

1. baseline statement
2. frozen blocker inventory
3. roadmap entry criteria

### Acceptance Criteria

1. no additional scene is added to scope
2. no implementation starts before the baseline is frozen
3. dry-run and triage assets are treated as immutable inputs

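One way to make "immutable inputs" checkable is to record a content digest for each frozen file and compare it before later phases read the file. The sketch below is an assumption about how that could be done, not part of the plan: the manifest shape and the function names `freeze` and `changed_inputs` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def freeze(paths: list[Path]) -> dict[str, str]:
    """Record one digest per input file (hypothetical manifest shape)."""
    return {str(p): sha256_of(p) for p in paths}


def changed_inputs(manifest: dict[str, str]) -> list[str]:
    """List frozen files whose current digest no longer matches."""
    return [
        name
        for name, digest in manifest.items()
        if sha256_of(Path(name)) != digest
    ]
```

Calling `freeze` once on the dry-run and triage JSON files and asserting that `changed_inputs` returns an empty list at the start of each later phase would turn the third acceptance criterion into a mechanical check.
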
## Phase 1: Known-Family Timeout Diagnostics

### Objective

Resolve the highest-priority ambiguity: known-family scenes that timed out in the full sweep.

### Tasks

1. select only records labeled `timeout-known-family-sample`
2. capture source scale metrics and previous family context
3. run bounded diagnostic attempts if needed
4. classify each record as:
   - `known-family-rerun-pass`
   - `known-family-source-scale-timeout`
   - `known-family-generator-hotspot`
   - `known-family-contract-blocked-after-long-run`
   - `known-family-timeout-unresolved`
5. publish diagnostic result

### Deliverables

1. known-family timeout diagnostic JSON
2. known-family timeout diagnostic report

### Acceptance Criteria

1. all `4` known-family timeout records are classified
2. no scene is promoted from diagnostic success
3. no generator logic is changed in the diagnostic step

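The selection and classification tasks above can be sketched as a filter plus a closed set of outcomes. The outcome strings come from this plan; the record keys `triage_label` and `scene_id` are assumed field names, since the triage JSON schema is not shown here.

```python
from enum import Enum


class KnownFamilyOutcome(str, Enum):
    """The five classifications listed in the Phase 1 tasks."""
    RERUN_PASS = "known-family-rerun-pass"
    SOURCE_SCALE_TIMEOUT = "known-family-source-scale-timeout"
    GENERATOR_HOTSPOT = "known-family-generator-hotspot"
    CONTRACT_BLOCKED = "known-family-contract-blocked-after-long-run"
    UNRESOLVED = "known-family-timeout-unresolved"


def select_known_family(records):
    """Keep only records labeled timeout-known-family-sample.

    "triage_label" is an assumed key name.
    """
    return [
        r for r in records
        if r.get("triage_label") == "timeout-known-family-sample"
    ]


def unclassified(records, outcomes):
    """Return scene ids that lack a valid outcome.

    "scene_id" is an assumed key; outcomes maps scene id -> outcome string.
    """
    valid = {o.value for o in KnownFamilyOutcome}
    return [
        r["scene_id"] for r in records
        if outcomes.get(r["scene_id"]) not in valid
    ]
```

The first acceptance criterion then reduces to `unclassified(...)` returning an empty list for the four selected records.
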
## Phase 2: Source-Scale and Scan-Budget Improvement

### Objective

Reduce timeout noise caused by oversized source directories and obvious vendor/library files.

### Tasks

1. analyze `timeout-large-source` and `timeout-unvalidated-source`
2. define source scan budget policy
3. define vendor/library ignore policy
4. implement only bounded source scanning or timeout reporting changes
5. verify no canonical or real-sample regression is introduced

### Deliverables

1. source scan budget policy
2. bounded scan implementation if approved by Phase 1 evidence
3. timeout reporting regression tests

### Acceptance Criteria

1. large source directories no longer dominate the full sweep by accidental vendor-file scanning
2. known-family samples are not made worse
3. archetype semantics are unchanged

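To illustrate what a scan budget combined with a vendor ignore policy might look like, here is a small Python sketch. The ignore list, the suffixes, and the limits are placeholders chosen for the example; the actual policy is a deliverable of this phase and may differ.

```python
import os
from dataclasses import dataclass

# Assumed ignore list and suffixes; the real policy is a Phase 2 deliverable.
VENDOR_DIRS = {"node_modules", "vendor", "third_party"}
VENDOR_SUFFIXES = (".min.js", ".min.css")


@dataclass
class ScanResult:
    files: list
    budget_exhausted: bool


def bounded_scan(root, max_files=500, max_bytes=5_000_000):
    """Walk root, skipping vendor content, until a file or byte budget is hit."""
    files, total = [], 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune vendor directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in VENDOR_DIRS)
        for name in sorted(filenames):
            if name.endswith(VENDOR_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if len(files) >= max_files or total + size > max_bytes:
                # Report the truncation instead of silently timing out.
                return ScanResult(files, True)
            files.append(path)
            total += size
    return ScanResult(files, False)
```

Returning `budget_exhausted` explicitly, rather than raising or hanging, lets a timeout report distinguish "source too large" from "generator too slow".
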
## Phase 3: Host-Bridge Route Over-Preference Correction

### Objective

Correct or formally adjudicate the five cases where `host_bridge_workflow` over-absorbed `G3` or `G1-E` expected scenes.

### Tasks

1. select the `5` `route-overprefer-host-bridge` records
2. compare business-chain evidence against host-bridge evidence
3. define routing precedence rules for:
   - `G3` vs `G6`
   - `G1-E` vs `G6`
4. implement bounded routing correction only if evidence supports it
5. preserve regressions for:
   - `G3` real-sample pass
   - `G1-E` real-sample pass
   - `G6` real-sample pass
6. classify each case as:
   - `route-corrected-to-g3`
   - `route-corrected-to-g1e`
   - `board-expectation-reclassified`
   - `valid-host-bridge-workflow`
   - `route-conflict-unresolved`

### Deliverables

1. route over-preference correction report
2. routing regression tests
3. updated dry-run classification for the five fixed records

### Acceptance Criteria

1. all `5` route conflicts are adjudicated
2. `host_bridge_workflow` no longer wins solely because host evidence exists
3. existing `G6` pass remains stable
4. no broad routing rewrite is introduced

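The second acceptance criterion can be pictured with a toy precedence function. This is only one candidate rule, not the adjudicated outcome of this phase: it assumes `G6` corresponds to `host_bridge_workflow`, and the inputs `business_chain` and `has_host_evidence` are hypothetical simplifications of whatever evidence the router actually collects.

```python
from typing import Optional

HOST_BRIDGE = "G6"  # assumed to correspond to host_bridge_workflow


def choose_route(business_chain: Optional[str],
                 has_host_evidence: bool) -> Optional[str]:
    """Return the preferred group under one candidate precedence rule.

    business_chain is "G3", "G1-E", or None when no business-chain
    evidence was found (hypothetical inputs).
    """
    if business_chain in ("G3", "G1-E"):
        # Business-chain evidence outranks host evidence in this sketch.
        return business_chain
    if has_host_evidence:
        # Host bridge is chosen only when nothing more specific matched.
        return HOST_BRIDGE
    return None
```

Under this rule a scene with both kinds of evidence routes to `G3` or `G1-E`, while a scene with host evidence alone still routes to `G6`, which keeps the existing `G6` pass stable.
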
## Phase 4: Structured Fail-Closed Reporting

### Objective

Convert `readiness-before-report` failures into structured failure reports instead of process-level no-report failures.

### Tasks

1. select the `25` `readiness-before-report` records
2. identify where generation exits before report emission
3. define a minimal failure-report schema for pre-package fail-closed
4. emit structured failure records with:
   - inferred archetype
   - failed gate
   - blocker reason
   - missing contract pieces
   - stderr summary if any
5. keep scenes failing unless their contracts are actually complete

### Deliverables

1. pre-report fail-closed schema
2. implementation of structured failure report emission
3. regression covering at least one `paginated_enrichment`, one `local_doc_pipeline`, one `multi_mode_request`, and one `single_request_enrichment` pre-report failure

### Acceptance Criteria

1. no-report failures are reduced or eliminated as a category
2. failing scenes still fail closed
3. failure reasons become machine-readable
4. auto-pass count is not inflated by looser gates

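The five fields listed in task 4 map naturally onto a small record type. The sketch below is a possible shape for the minimal schema, not the schema itself: the class name, the `scene` and `status` fields, and the JSON key names are assumptions added for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PreReportFailure:
    """Hypothetical pre-report fail-closed record."""
    scene: str                         # assumed identifier field
    inferred_archetype: Optional[str]  # None when inference never ran
    failed_gate: str
    blocker_reason: str
    missing_contract_pieces: List[str] = field(default_factory=list)
    stderr_summary: Optional[str] = None
    # Assumed field: the record describes a failure and never a pass.
    status: str = "fail-closed"

    def to_json(self) -> str:
        """Serialize to machine-readable JSON, keeping non-ASCII scene names."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)
```

Emitting such a record at the point where generation currently exits would replace a process-level no-report failure with a machine-readable one, without changing whether the scene passes.
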
## Phase 5: Bootstrap Target Isolation

### Objective

Keep the single `bootstrap_target` failure isolated and decide whether it belongs to later bootstrap normalization work.

### Tasks

1. preserve `用户停电频次分析监测` as a separate bootstrap failure
2. inspect whether the failure is caused by missing target URL, domain mismatch, or unsupported bootstrap pattern
3. produce a bootstrap isolation note
4. do not implement login or bootstrap auto-recovery

### Deliverables

1. bootstrap target isolation note
2. decision whether the case enters a later bootstrap-normalization roadmap

### Acceptance Criteria

1. the bootstrap case does not pollute readiness-before-report work
2. no login recovery implementation is started

## Phase 6: Follow-Up Full Sweep and Coverage Delta

### Objective

Measure whether the bounded improvements improved generic coverage.

### Tasks

1. rerun the fixed `102` scene full sweep with the same scene set
2. produce a new dry-run result
3. compare against the baseline:
   - auto-pass delta
   - actionable coverage delta
   - timeout delta
   - misclassification delta
   - no-report delta
4. publish coverage delta report
5. decide whether to move to execution-board status sync or another bounded improvement cycle

### Deliverables

1. follow-up full sweep JSON
2. coverage delta report
3. remaining blocker decision board

### Acceptance Criteria

1. scene set remains exactly `102`
2. baseline and follow-up are comparable
3. improvements are quantified, not assumed
4. no execution board status is changed automatically

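The comparison in task 3 is a per-metric subtraction, guarded by a check that both sweeps cover the same scenes and report the same metrics. A minimal sketch, with baseline values taken from this plan and metric key names chosen for illustration:

```python
# Baseline counts from this plan; the key names are illustrative.
BASELINE = {
    "auto_pass": 40,
    "actionable": 66,
    "timeout": 31,
    "misclassified": 5,
    "no_report": 25,
}


def same_scene_set(baseline_ids, follow_up_ids) -> bool:
    """True when both sweeps cover exactly the same scenes."""
    return set(baseline_ids) == set(follow_up_ids)


def coverage_delta(baseline, follow_up):
    """Return follow-up minus baseline for every metric."""
    if set(baseline) != set(follow_up):
        raise ValueError("baseline and follow-up must report the same metrics")
    return {name: follow_up[name] - baseline[name] for name in baseline}
```

A positive delta is an improvement for `auto_pass` and `actionable`, and a negative delta is an improvement for the three failure buckets, so the report should state the direction per metric rather than a single sign.
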
## Milestone Order

The order is fixed:

1. Phase 0: freeze baseline
2. Phase 1: known-family timeout diagnostics
3. Phase 2: source-scale and scan-budget improvement
4. Phase 3: host-bridge route over-preference correction
5. Phase 4: structured fail-closed reporting
6. Phase 5: bootstrap target isolation
7. Phase 6: follow-up full sweep and coverage delta

Do not start Phase 3 before Phase 1 is completed. Known-family timeout ambiguity affects the interpretation of current coverage.

Do not start Phase 6 before Phases 2-5 have each been either completed or explicitly deferred with reasons.

## Completion Criteria

This roadmap is complete when:

1. known-family timeouts are no longer mixed with generic timeout noise
2. host-bridge over-preference cases are adjudicated
3. readiness-before-report failures become structured fail-closed records
4. the bootstrap target case is isolated
5. a follow-up full sweep quantifies coverage delta
6. no new family is introduced as a shortcut around current blockers

## Out of Plan

1. new family implementation
2. `G4/G5` implementation
3. browser host runtime transport
4. login recovery
5. attachment/local document runtime
6. automatic execution board promotion