Files
claw/docs/superpowers/plans/2026-04-19-102-full-sweep-improvement-roadmap-plan.md

9.4 KiB

102 Full Sweep Improvement Roadmap Plan

Date: 2026-04-19 Status: Draft Upstream Spec: docs/superpowers/specs/2026-04-19-102-full-sweep-improvement-roadmap-design.md Upstream Dry-Run Result: tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json Upstream Triage Result: tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json

Plan Intent

Turn the 102 scene dry-run and triage findings into a governed improvement roadmap.

This plan is intentionally broad like the earlier 60-to-90 roadmap. It coordinates multiple bounded implementation tracks instead of starting isolated fixes from individual failures.

Baseline

Current measured baseline:

Metric Count
Real-sample executed pass 5 / 102
Code-backed ledger coverage 23 / 102
Dry-run auto-pass 40 / 102
Dry-run actionable coverage 66 / 102

Current triage baseline:

Bucket Count Triage conclusion
Timeout 31 19 timeout-unvalidated-source, 8 timeout-large-source, 4 timeout-known-family-sample
Misclassified 5 all route-overprefer-host-bridge
No-report failure 25 all readiness-before-report
Bootstrap target 1 separate bootstrap_target

Scope Guardrails

  1. do not add new scene families
  2. do not update scene_execution_board_2026-04-18.json inside this roadmap
  3. do not promote scenes directly from diagnostic or dry-run results
  4. do not reopen completed real-sample passes except as regression checks
  5. do not start G4/G5
  6. do not implement full login recovery
  7. do not implement full host runtime transport
  8. do not implement local document attachment runtime
  9. do not create unbounded micro-plans from a single failure

Workstreams

  1. WS1 Timeout Diagnostics and Scan Budget
  2. WS2 Routing Boundary Correction
  3. WS3 Structured Fail-Closed Reporting
  4. WS4 Follow-Up Sweep and Coverage Delta

Phase 0: Freeze Improvement Baseline

Objective

Freeze the dry-run and triage outputs as the only accepted inputs to this roadmap.

Tasks

  1. freeze full_sweep_dry_run_2026-04-19.json
  2. freeze full_sweep_dry_run_triage_2026-04-19.json
  3. freeze the four headline metrics:
    • 5/102 real-sample pass
    • 23/102 code-backed ledger coverage
    • 40/102 dry-run auto-pass
    • 66/102 dry-run actionable coverage
  4. freeze the problem buckets:
    • 4 known-family timeouts
    • 8 large-source timeouts
    • 19 unvalidated-source timeouts
    • 5 host-bridge over-preference cases
    • 25 readiness-before-report failures
    • 1 bootstrap-target failure

Deliverables

  1. baseline statement
  2. frozen blocker inventory
  3. roadmap entry criteria

Acceptance Criteria

  1. no additional scene is added to scope
  2. no implementation starts before the baseline is frozen
  3. dry-run and triage assets are treated as immutable inputs

Phase 1: Known-Family Timeout Diagnostics

Objective

Resolve the highest-priority ambiguity: known-family scenes that timed out in the full sweep.

Tasks

  1. select only records labeled timeout-known-family-sample
  2. capture source scale metrics and previous family context
  3. run bounded diagnostic attempts if needed
  4. classify each record as:
    • known-family-rerun-pass
    • known-family-source-scale-timeout
    • known-family-generator-hotspot
    • known-family-contract-blocked-after-long-run
    • known-family-timeout-unresolved
  5. publish diagnostic result

Deliverables

  1. known-family timeout diagnostic JSON
  2. known-family timeout diagnostic report

Acceptance Criteria

  1. all 4 known-family timeout records are classified
  2. no scene is promoted from diagnostic success
  3. no generator logic is changed in the diagnostic step

Phase 2: Source-Scale and Scan-Budget Improvement

Objective

Reduce timeout noise caused by oversized source directories and obvious vendor/library files.

Tasks

  1. analyze timeout-large-source and timeout-unvalidated-source
  2. define source scan budget policy
  3. define vendor/library ignore policy
  4. implement only bounded source scanning or timeout reporting changes
  5. verify no canonical or real-sample regression is introduced

Deliverables

  1. source scan budget policy
  2. bounded scan implementation if approved by Phase 1 evidence
  3. timeout reporting regression tests

Acceptance Criteria

  1. large source directories no longer dominate the full sweep by accidental vendor-file scanning
  2. known-family samples are not made worse
  3. archetype semantics are unchanged

Phase 3: Host-Bridge Route Over-Preference Correction

Objective

Correct or formally adjudicate the five cases where host_bridge_workflow over-absorbed G3 or G1-E expected scenes.

Tasks

  1. select the 5 route-overprefer-host-bridge records
  2. compare business-chain evidence against host-bridge evidence
  3. define routing precedence rules for:
    • G3 vs G6
    • G1-E vs G6
  4. implement bounded routing correction only if evidence supports it
  5. preserve regressions for:
    • G3 real-sample pass
    • G1-E real-sample pass
    • G6 real-sample pass
  6. classify each case as:
    • route-corrected-to-g3
    • route-corrected-to-g1e
    • board-expectation-reclassified
    • valid-host-bridge-workflow
    • route-conflict-unresolved

Deliverables

  1. route over-preference correction report
  2. routing regression tests
  3. updated dry-run classification for the five fixed records

Acceptance Criteria

  1. all 5 route conflicts are adjudicated
  2. host_bridge_workflow no longer wins solely because host evidence exists
  3. existing G6 pass remains stable
  4. no broad routing rewrite is introduced

Phase 4: Structured Fail-Closed Reporting

Objective

Convert readiness-before-report failures into structured failure reports instead of process-level no-report failures.

Tasks

  1. select the 25 readiness-before-report records
  2. identify where generation exits before report emission
  3. define a minimal failure-report schema for pre-package fail-closed
  4. emit structured failure records with:
    • inferred archetype
    • failed gate
    • blocker reason
    • missing contract pieces
    • stderr summary if any
  5. keep scenes failing unless their contracts are actually complete

Deliverables

  1. pre-report fail-closed schema
  2. implementation of structured failure report emission
  3. regression covering at least one paginated_enrichment, one local_doc_pipeline, one multi_mode_request, and one single_request_enrichment pre-report failure

Acceptance Criteria

  1. no-report failures are reduced or eliminated as a category
  2. failing scenes still fail closed
  3. failure reasons become machine-readable
  4. auto-pass count is not inflated by looser gates

Phase 5: Bootstrap Target Isolation

Objective

Keep the single bootstrap_target failure isolated and decide whether it belongs to later bootstrap normalization work.

Tasks

  1. preserve 用户停电频次分析监测 as a separate bootstrap failure
  2. inspect whether the failure is caused by missing target URL, domain mismatch, or unsupported bootstrap pattern
  3. produce a bootstrap isolation note
  4. do not implement login or bootstrap auto-recovery

Deliverables

  1. bootstrap target isolation note
  2. decision whether the case enters a later bootstrap-normalization roadmap

Acceptance Criteria

  1. the bootstrap case does not pollute readiness-before-report work
  2. no login recovery implementation is started

Phase 6: Follow-Up Full Sweep and Coverage Delta

Objective

Measure whether the bounded improvements improved generic coverage.

Tasks

  1. rerun the fixed 102 scene full sweep with the same scene set
  2. produce a new dry-run result
  3. compare against the baseline:
    • auto-pass delta
    • actionable coverage delta
    • timeout delta
    • misclassification delta
    • no-report delta
  4. publish coverage delta report
  5. decide whether to move to execution-board status sync or another bounded improvement cycle

Deliverables

  1. follow-up full sweep JSON
  2. coverage delta report
  3. remaining blocker decision board

Acceptance Criteria

  1. scene set remains exactly 102
  2. baseline and follow-up are comparable
  3. improvements are quantified, not assumed
  4. no execution board status is changed automatically

Milestone Order

The order is fixed:

  1. Phase 0: freeze baseline
  2. Phase 1: known-family timeout diagnostics
  3. Phase 2: source-scale and scan-budget improvement
  4. Phase 3: host-bridge route over-preference correction
  5. Phase 4: structured fail-closed reporting
  6. Phase 5: bootstrap target isolation
  7. Phase 6: follow-up full sweep and coverage delta

Do not start Phase 3 before Phase 1 is completed. Known-family timeout ambiguity affects the interpretation of current coverage.

Do not start Phase 6 before Phases 2-5 have either completed or been explicitly deferred with reasons.

Completion Criteria

This roadmap is complete when:

  1. known-family timeouts are no longer mixed with generic timeout noise
  2. host-bridge over-preference cases are adjudicated
  3. readiness-before-report failures become structured fail-closed records
  4. the bootstrap target case is isolated
  5. a follow-up full sweep quantifies coverage delta
  6. no new family is introduced as a shortcut around current blockers

Out of Plan

  1. new family implementation
  2. G4/G5 implementation
  3. browser host runtime transport
  4. login recovery
  5. attachment/local document runtime
  6. automatic execution board promotion