木炎 956f0c2b68 feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00

102 Full Sweep Improvement Roadmap Design

Date: 2026-04-19
Status: Draft
Upstream Dry-Run: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md
Upstream Triage: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md

Design Intent

Use the full 102 scene dry-run and triage results to define a single improvement roadmap for generic scene -> skill coverage.

This roadmap is the post-triage equivalent of the earlier 60-to-90 roadmap. It is not a single bugfix plan. It is the governing design for turning measured dry-run blockers into bounded implementation tracks.

The design answers:

How do we move from 40/102 dry-run auto-pass and 66/102 actionable coverage toward a higher verified generic conversion rate without drifting into unbounded fixes?

Current Baseline

The current measured state is:

Metric	Count
Real-sample executed pass	5 / 102
Code-backed ledger coverage	23 / 102
Dry-run auto-pass	40 / 102
Dry-run actionable coverage	66 / 102

The non-pass triage state is:

Bucket	Count	Triage conclusion
Timeout	31	19 `timeout-unvalidated-source`, 8 `timeout-large-source`, 4 `timeout-known-family-sample`
Misclassified	5	all `route-overprefer-host-bridge`
No-report failure	25	all `readiness-before-report`
Bootstrap target	1	separate `bootstrap_target`
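
As a quick consistency check, the four non-pass buckets should account for every scene that did not auto-pass, and the three timeout labels should add up to the timeout bucket. The sketch below uses only the counts from the two tables above; the variable names are illustrative and not taken from the sweep tooling.

```python
# Counts copied from the baseline and triage tables above.
TOTAL_SCENES = 102
DRY_RUN_AUTO_PASS = 40

non_pass_buckets = {
    "timeout": 31,
    "misclassified": 5,
    "no_report_failure": 25,
    "bootstrap_target": 1,
}

# Timeout sub-labels from the triage conclusion column.
timeout_labels = {
    "timeout-unvalidated-source": 19,
    "timeout-large-source": 8,
    "timeout-known-family-sample": 4,
}

non_pass_total = sum(non_pass_buckets.values())

# Every scene is either a dry-run auto-pass or sits in exactly one bucket.
assert non_pass_total == TOTAL_SCENES - DRY_RUN_AUTO_PASS
# The timeout sub-labels partition the timeout bucket.
assert sum(timeout_labels.values()) == non_pass_buckets["timeout"]
```

Both assertions hold for the baseline numbers: 31 + 5 + 25 + 1 = 62 = 102 - 40, and 19 + 8 + 4 = 31.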

Problem Statement

The generic generator already auto-passes more scenes than the formal ledger coverage shows, but the result is not trustworthy enough to promote automatically because:

known-family scenes still appear in the timeout bucket
host_bridge_workflow can over-absorb scenes expected to remain G3 or G1-E
many fail-closed cases terminate before a structured generation report exists
timeout and no-report failures hide actionable blocker details

Roadmap Goal

Improve the measurable generic conversion pipeline, not by adding new families first, but by reducing ambiguity in the current failure surface.

The roadmap has four goals:

make known-family timeout results explainable and repeatable
correct or formally adjudicate host-bridge routing over-preference
convert pre-report failures into structured fail-closed results
rerun a bounded 102 sweep to measure coverage delta

Scope Guardrails

do not add new scene families in this roadmap
do not promote scenes directly from diagnostic runs
do not update scene_execution_board_2026-04-18.json until a later explicit status-sync plan
do not use one failure as justification for an unbounded rewrite
do not reopen completed G1-E / G2 / G3 / G6 / G7 real-sample pass records unless they are part of a fixed regression check
do not start G4 / G5
do not implement login recovery, full host runtime, or attachment pipeline work in this roadmap

Workstreams

WS1 Timeout and Source-Scale Diagnostics
WS2 Host-Bridge Routing Boundary Correction
WS3 Structured Fail-Closed Reporting
WS4 Coverage Delta Sweep and Decision Board

Track A: Known-Family Timeout Diagnostics

Intent

Separate known-family timeout behavior from generic unvalidated-source timeout behavior.

Input

The 4 records labeled:

timeout-known-family-sample

Expected Output

Each known-family timeout gets one of:

known-family-rerun-pass
known-family-source-scale-timeout
known-family-generator-hotspot
known-family-contract-blocked-after-long-run
known-family-timeout-unresolved

Design Constraint

A longer rerun success does not promote a scene. It only changes diagnostic classification.
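
One way to keep this constraint visible in tooling is to model the five outcomes as a closed set and make promotion impossible to express. This is a minimal sketch, not the project's implementation: the class names, the `scene_id` field, and the `promotes_scene` property are assumptions introduced for illustration; only the outcome label strings come from this design.

```python
from dataclasses import dataclass
from enum import Enum


class KnownFamilyTimeoutOutcome(Enum):
    """The five diagnostic labels listed under Expected Output."""
    RERUN_PASS = "known-family-rerun-pass"
    SOURCE_SCALE_TIMEOUT = "known-family-source-scale-timeout"
    GENERATOR_HOTSPOT = "known-family-generator-hotspot"
    CONTRACT_BLOCKED = "known-family-contract-blocked-after-long-run"
    UNRESOLVED = "known-family-timeout-unresolved"


@dataclass(frozen=True)
class TimeoutDiagnosis:
    scene_id: str  # hypothetical identifier for one of the 4 records
    outcome: KnownFamilyTimeoutOutcome

    @property
    def promotes_scene(self) -> bool:
        # Design constraint: no outcome, including a rerun pass,
        # promotes a scene. It only changes the diagnostic label.
        return False


# Usage with a made-up scene id.
diagnosis = TimeoutDiagnosis("example-scene", KnownFamilyTimeoutOutcome.RERUN_PASS)
print(diagnosis.outcome.value, diagnosis.promotes_scene)
```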

Track B: Timeout Source-Scale Policy

Intent

Create a bounded input filtering and scan-budget policy for large source directories without changing family semantics.

Input

The timeout labels:

timeout-large-source
timeout-unvalidated-source

Expected Output

source file selection policy
large vendor/library ignore list policy
scan-budget decision table
timeout reporting shape

Design Constraint

This track may tighten scan boundaries, but it must not change archetype semantics.
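
The source file selection policy, ignore list, and scan budget could fit together roughly as follows. This is a sketch under stated assumptions: the directory names in `IGNORED_DIR_NAMES`, the `ScanBudget` fields, and the function name are all hypothetical, since the actual policy is an output of this track rather than something defined here.

```python
import os
from dataclasses import dataclass

# Hypothetical ignore list; the real list is an expected output of Track B.
IGNORED_DIR_NAMES = frozenset({"node_modules", "vendor", "dist", ".git"})


@dataclass(frozen=True)
class ScanBudget:
    max_files: int
    max_total_bytes: int


def select_source_files(root, budget):
    """Walk root, skip ignored directories, stop once the budget is spent.

    Returns (selected_paths, budget_exhausted) so that a caller can report
    a bounded-scan result instead of running into a timeout.
    """
    selected = []
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_DIR_NAMES)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            over_files = len(selected) >= budget.max_files
            over_bytes = total_bytes + size > budget.max_total_bytes
            if over_files or over_bytes:
                return selected, True
            selected.append(path)
            total_bytes += size
    return selected, False
```

The function only decides which files are read. It does not inspect file contents, so it stays on the scan-boundary side of the design constraint and leaves archetype inference untouched.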

Track C: Host-Bridge Route Over-Preference Correction

Intent

Prevent host_bridge_workflow from absorbing scenes that should remain G3 or G1-E when business-chain evidence is stronger.

Input

The 5 records labeled:

route-overprefer-host-bridge

Expected Output

Each misclassification gets one of:

route-corrected-to-g3
route-corrected-to-g1e
board-expectation-reclassified
valid-host-bridge-workflow
route-conflict-unresolved

Design Constraint

This track must preserve the already-passed G6 real sample and must not degrade G3 or G1-E canonical tests.

Track D: Readiness-Before-Report Structured Fail-Closed

Intent

Convert "generator failed without generation report" outcomes into structured, machine-readable fail-closed results.

Input

The 25 records labeled:

readiness-before-report

Expected Output

Each case produces a generation report or equivalent dry-run failure record with:

inferred archetype
blocker stage
missing contract pieces
failed gate name
actionable reason

Design Constraint

This track should not make failing scenes pass. It should make failures explainable.
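
A failure record carrying the five fields above might look like the sketch below. The field names mirror the Expected Output list, but the class name, the `scene_name` and `status` fields, the JSON shape, and the example values are assumptions for illustration, not the generator's actual report schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class FailClosedRecord:
    scene_name: str            # hypothetical key for the scene
    inferred_archetype: str
    blocker_stage: str
    failed_gate_name: str
    actionable_reason: str
    missing_contract_pieces: list = field(default_factory=list)
    # The record describes a failure. It never turns a failing scene into a pass.
    status: str = "fail-closed"


def to_report_json(record):
    """Serialize the record as stable, machine-readable JSON."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


# Usage with made-up values.
example = FailClosedRecord(
    scene_name="example-scene",
    inferred_archetype="host_bridge_workflow",
    blocker_stage="readiness",
    failed_gate_name="example-gate",
    actionable_reason="contract piece missing before report generation",
    missing_contract_pieces=["example-contract-piece"],
)
print(to_report_json(example))
```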

Track E: Bootstrap Target Isolation

Intent

Keep the single bootstrap_target failure separate so it does not pollute the no-report or route-correction work.

Input

The 1 bootstrap target failure:

用户停电频次分析监测 (user power-outage frequency analysis and monitoring)

Expected Output

isolated bootstrap failure note
decision whether it belongs to later bootstrap normalization work

Design Constraint

No bootstrap auto-recovery or login work is included in this roadmap.

Track F: Coverage Delta Sweep

Intent

After bounded improvements, rerun a comparable 102 sweep and compare against the baseline.

Input

baseline dry-run result
updated generator after approved tracks
same 102 scene board

Expected Output

new dry-run result
coverage delta report
category movement table
decision board for remaining blockers

Design Constraint

The rerun must be comparable to the baseline. It cannot silently change the scene set.
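
The comparability rule and the category movement table can be sketched as a single comparison step. The function name and the `{scene_name: category}` input shape are assumptions; the real sweep output format is not specified in this design.

```python
from collections import Counter


def coverage_delta(baseline, rerun):
    """Compare two sweeps given as {scene_name: category} mappings.

    Returns (per_category_delta, movement), where movement counts
    (old_category, new_category) pairs for scenes that changed category.
    """
    if set(baseline) != set(rerun):
        changed = sorted(set(baseline) ^ set(rerun))
        # Refuse to compare rather than silently accept a different scene set.
        raise ValueError(f"scene set changed, sweeps are not comparable: {changed}")

    movement = Counter(
        (baseline[name], rerun[name])
        for name in baseline
        if baseline[name] != rerun[name]
    )
    before = Counter(baseline.values())
    after = Counter(rerun.values())
    categories = sorted(set(before) | set(after))
    delta = {cat: after[cat] - before[cat] for cat in categories}
    return delta, dict(movement)
```

Raising on a changed scene set makes an incomparable rerun fail loudly instead of producing a delta report that looks valid.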

Success Criteria

This roadmap succeeds when:

all known-family timeouts are separated from unvalidated timeout noise
all five host-bridge over-preference cases are adjudicated
no-report failures become structured fail-closed outputs
a follow-up full sweep shows measurable improvement or a clearly explained plateau
no new family is introduced to mask existing failure categories

Out of Scope

new G4/G5 implementation
full login recovery
browser host runtime transport implementation
local document attachment pipeline
automatic scene promotion into the execution board
full manual validation of all 102 generated skills
