木炎 956f0c2b68 feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00

Scene Skill 102 Full Coverage Framework Design

Date: 2026-04-19
Status: Draft
Upstream Roadmap: docs/superpowers/plans/2026-04-17-scene-skill-60-to-90-roadmap-plan.md
Upstream Reconciliation: tests/fixtures/generated_scene/full_sweep_status_reconciliation_2026-04-19.json
Upstream Follow-up: tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json
Upstream Timeout Hygiene: tests/fixtures/generated_scene/timeout_rerun_hygiene_integration_2026-04-19.json

Intent

Provide the single post-roadmap framework design for driving the current sgClaw scene-to-skill pipeline from partial 102 scene coverage to full bounded 102 scene coverage.

This design is intentionally broader than the bounded micro-plans used so far. It defines:

the current actual state of the 102 scene set
what is still missing before 100% coverage can be claimed
the layered framework that all future changes must fit into
the fixed route order for future implementation work
the stop rules that prevent the project from drifting into unbounded plan recursion

This design is meant to become the single parent framework for later bounded plans.

Current State

Raw Current State

From the latest integrated assets:

Status	Count
`auto-pass`	48
`fail-closed-known`	47
`adjudicated-valid-host-bridge`	4
raw `source-unreadable`	3
Total	102

Timeout Hygiene Overlay

The timeout hygiene layer shows that the 3 raw `source-unreadable` records, all of which are timeouts, are not all genuinely unreadable:

Hygiene-aware timeout interpretation	Count
`timeout-as-pass-candidate`	2
`timeout-as-fail-closed-candidate`	1
`timeout-still-unreadable`	0
`timeout-rerun-error`	0
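The relationship between the raw count and the overlay can be checked mechanically. The following is a minimal sketch, not the project's reporting code: `TimeoutOverlay`, its field names, and `current_overlay` are hypothetical, and only the counts come from the tables above. Treating rerun errors as unresolved is an assumption of this sketch.

```rust
/// Hypothetical in-memory form of the timeout hygiene overlay.
/// Field names are illustrative; they mirror the four rows of the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeoutOverlay {
    pub pass_candidate: u32,
    pub fail_closed_candidate: u32,
    pub still_unreadable: u32,
    pub rerun_error: u32,
}

impl TimeoutOverlay {
    /// Number of raw timeout records the overlay accounts for.
    pub fn total(&self) -> u32 {
        self.pass_candidate + self.fail_closed_candidate + self.still_unreadable + self.rerun_error
    }

    /// Records that the rerun did not explain.
    /// Assumption: rerun errors are counted as unresolved alongside still-unreadable records.
    pub fn unresolved(&self) -> u32 {
        self.still_unreadable + self.rerun_error
    }

    /// The overlay is consistent only if it reinterprets every raw record exactly once.
    pub fn is_consistent_with(&self, raw_source_unreadable: u32) -> bool {
        self.total() == raw_source_unreadable
    }
}

/// Overlay counts copied from the table above.
pub fn current_overlay() -> TimeoutOverlay {
    TimeoutOverlay {
        pass_candidate: 2,
        fail_closed_candidate: 1,
        still_unreadable: 0,
        rerun_error: 0,
    }
}
```

With the current numbers, `current_overlay().is_consistent_with(3)` holds and `unresolved()` is zero, which is the condition the later sections rely on.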

Interpretation

This means the framework has already reached these milestones:

there are no unsupported-family scenes in the current 102 sweep
there are no unresolved route conflicts left in the current 102 sweep
the remaining gap is no longer “framework cannot classify this scene”
the remaining gap is “contract does not close” or “timeout budget/hygiene distorts the raw reading”

What Is Still Missing Before 100% Coverage

100% coverage does not mean all 102 scenes must become direct auto-pass.

For this framework, 100% bounded coverage means:

every scene is classified into a supported framework path
every non-pass result is one of:
- structured fail-closed with a named blocker
- valid host-bridge workflow adjudication
- hygiene-aware timeout interpretation
there are no unresolved buckets like:
- unsupported family
- unresolved route conflict
- opaque no-report failure
- unexplained timeout
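The definition above can be read as a per-scene predicate. The sketch below is one possible encoding under stated assumptions: the `SceneOutcome` enum and its variant names are hypothetical paraphrases of the buckets in the list, not types taken from the sgClaw code base.

```rust
/// Hypothetical per-scene outcome. Variant names paraphrase the buckets listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SceneOutcome {
    AutoPass,
    /// Structured fail-closed; carries the named blocker.
    FailClosed { blocker: String },
    /// Valid host-bridge workflow adjudication.
    HostBridgeAdjudicated,
    /// Timeout that the hygiene layer has reinterpreted.
    TimeoutInterpreted,
    // The four unresolved buckets that bounded coverage does not allow.
    UnsupportedFamily,
    UnresolvedRouteConflict,
    OpaqueNoReportFailure,
    UnexplainedTimeout,
}

impl SceneOutcome {
    /// True when this outcome counts toward bounded coverage.
    pub fn is_bounded_covered(&self) -> bool {
        match self {
            SceneOutcome::AutoPass
            | SceneOutcome::HostBridgeAdjudicated
            | SceneOutcome::TimeoutInterpreted => true,
            // A fail-closed record only counts when the blocker is actually named.
            SceneOutcome::FailClosed { blocker } => !blocker.trim().is_empty(),
            SceneOutcome::UnsupportedFamily
            | SceneOutcome::UnresolvedRouteConflict
            | SceneOutcome::OpaqueNoReportFailure
            | SceneOutcome::UnexplainedTimeout => false,
        }
    }
}

/// 100% bounded coverage holds when every scene in the sweep is covered.
pub fn all_bounded_covered(outcomes: &[SceneOutcome]) -> bool {
    outcomes.iter().all(SceneOutcome::is_bounded_covered)
}
```

Requiring a non-empty blocker string is a design choice of this sketch. It keeps a fail-closed record without a blocker from silently counting as covered.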

Under that definition, four gaps remain:

Missing Gap A: Structured Contract Closure

There are still 47 structured fail-closed records.

Current distribution:

Archetype	Count
`paginated_enrichment`	34
`local_doc_pipeline`	5
`multi_mode_request`	4
`single_request_enrichment`	2
`host_bridge_workflow`	1
`page_state_eval`	1

This is the largest remaining implementation gap.
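As a quick arithmetic check on the distribution, the sketch below sums the buckets and computes each archetype's share. The constant and function names are hypothetical; the counts are copied from the table. `paginated_enrichment` accounts for 34 of 47 records, roughly 72%.

```rust
/// Fail-closed counts per archetype, copied from the table above.
pub const FAIL_CLOSED_BY_ARCHETYPE: [(&str, u32); 6] = [
    ("paginated_enrichment", 34),
    ("local_doc_pipeline", 5),
    ("multi_mode_request", 4),
    ("single_request_enrichment", 2),
    ("host_bridge_workflow", 1),
    ("page_state_eval", 1),
];

/// Sum of all buckets; should equal the 47 structured fail-closed records.
pub fn fail_closed_total() -> u32 {
    FAIL_CLOSED_BY_ARCHETYPE.iter().map(|(_, n)| n).sum()
}

/// The archetype holding the most fail-closed records.
pub fn largest_bucket() -> Option<(&'static str, u32)> {
    FAIL_CLOSED_BY_ARCHETYPE.iter().copied().max_by_key(|(_, n)| *n)
}

/// Share of the fail-closed set held by one archetype, in percent.
/// Returns `None` for an archetype that is not in the table.
pub fn share_percent(archetype: &str) -> Option<f64> {
    let total = f64::from(fail_closed_total());
    FAIL_CLOSED_BY_ARCHETYPE
        .iter()
        .find(|(name, _)| *name == archetype)
        .map(|(_, n)| f64::from(*n) * 100.0 / total)
}
```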

Missing Gap B: Timeout Hygiene Integration into Main Reporting

The timeout hygiene layer now exists, but it is still a reporting-side overlay. It has not yet been folded into the primary current-state narrative used by later roadmap decisions.

Missing Gap C: Current-State Overlay vs Execution Board

The project intentionally did not update scene_execution_board_2026-04-18.json during these bounded plans. That is correct, but it means the official board is still behind the latest integrated view.

Missing Gap D: Promotion Policy

The project still lacks a single parent rule that says when a structured fail-closed scene may be promoted from:

fail-closed
fail-closed with stronger evidence
bounded rerun pass candidate

into a stronger scene-level coverage status.

Framework Layers

All future work must land in exactly one of these layers.

Layer A: Source Scan and Budget Layer

Purpose:

source directory size handling
file filtering
timeout budget policy
rerun hygiene

Owned concerns:

source scan volume
timeout policy
rerun interpretation

Must not own:

archetype routing
contract closure logic
scene promotion

Primary code area:

src/generated_scene/analyzer.rs
reporting JSON and sweep scripts

Layer B: Archetype Routing Layer

Purpose:

decide the correct framework path:
- single_request_table
- single_request_enrichment
- multi_mode_request
- paginated_enrichment
- host_bridge_workflow
- multi_endpoint_inventory
- local_doc_pipeline

Owned concerns:

route precedence
mixed-evidence routing boundaries
route adjudication support

Must not own:

timeout policy
contract synthesis beyond routing evidence
board reconciliation

Primary code area:

src/generated_scene/analyzer.rs

Layer C: Contract Recovery Layer

Purpose:

Recover the minimum business contract fields needed by each supported archetype.

Owned concerns:

request contract recovery
response contract recovery
pagination plan recovery
enrichment request recovery
join key recovery
export plan recovery
mode matrix recovery

Must not own:

timeout policy
execution board updates
status promotion

Primary code area:

src/generated_scene/generator.rs
src/generated_scene/ir.rs

Layer D: Structured Fail-Closed and Reporting Layer

Purpose:

Make every incomplete scene fail in an explainable and structured way.

Owned concerns:

readiness-before-report classification
blocker naming
contractSnapshot
generation-report completeness

Must not own:

route preference
source scan budget
promotion policy

Primary code area:

src/generated_scene/generator.rs
reporting assets under tests/fixtures/generated_scene/

Layer E: Sweep, Reconciliation, and Coverage Layer

Purpose:

Measure the whole 102 scene set, reconcile multiple interpretation layers, and report trustworthy coverage.

Owned concerns:

full sweep outputs
route adjudication overlay
timeout hygiene overlay
integrated coverage reporting
board reconciliation planning

Must not own:

analyzer implementation changes
generator implementation changes

Primary assets:

tests/fixtures/generated_scene/*full_sweep*
tests/fixtures/generated_scene/*reconciliation*
tests/fixtures/generated_scene/*timeout*hygiene*
docs/superpowers/reports/*coverage*
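The "owned concerns" and "must not own" lists amount to an ownership map from concern to layer. The sketch below illustrates that idea with a small subset of the concerns named above. The `Layer` and `Concern` enums and both functions are hypothetical and do not exist in the code base as far as this document states.

```rust
/// The five framework layers (A to E) from this design.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    SourceScanAndBudget, // Layer A
    ArchetypeRouting,    // Layer B
    ContractRecovery,    // Layer C
    FailClosedReporting, // Layer D
    SweepAndCoverage,    // Layer E
}

/// An illustrative subset of the owned concerns listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Concern {
    SourceScanVolume,
    TimeoutPolicy,
    RoutePrecedence,
    JoinKeyRecovery,
    BlockerNaming,
    TimeoutHygieneOverlay,
    BoardReconciliationPlanning,
}

/// Each concern has exactly one owning layer.
pub fn owner_of(concern: Concern) -> Layer {
    match concern {
        Concern::SourceScanVolume | Concern::TimeoutPolicy => Layer::SourceScanAndBudget,
        Concern::RoutePrecedence => Layer::ArchetypeRouting,
        Concern::JoinKeyRecovery => Layer::ContractRecovery,
        Concern::BlockerNaming => Layer::FailClosedReporting,
        Concern::TimeoutHygieneOverlay | Concern::BoardReconciliationPlanning => {
            Layer::SweepAndCoverage
        }
    }
}

/// A change filed under `layer` may touch `concern` only if that layer owns it.
pub fn may_touch(layer: Layer, concern: Concern) -> bool {
    owner_of(concern) == layer
}
```

For example, `may_touch(Layer::ArchetypeRouting, Concern::TimeoutPolicy)` is false, matching the Layer B rule that routing must not own timeout policy.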

Coverage Definitions

This framework uses four explicit coverage concepts.

Coverage 1: Direct Pass Coverage

Scenes with direct auto-pass.

Current count:

48 / 102

Coverage 2: Framework-Resolved Coverage

Scenes in one of:

auto-pass
adjudicated-valid-host-bridge
structured fail-closed-known
hygiene-aware timeout interpretation

This is the best measure of whether the framework has “caught” the scene set.
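To make the difference between Coverage 1 and Coverage 2 concrete, the sketch below computes both from the counts in the Current State section. It assumes the timeout hygiene overlay is accepted as the reading for the 3 raw timeout records, which Missing Gap B notes is not yet the default in main reporting. `SweepCounts` and the function names are hypothetical.

```rust
/// Hypothetical summary of one full sweep over the scene set.
#[derive(Debug, Clone, Copy)]
pub struct SweepCounts {
    pub auto_pass: u32,
    pub fail_closed_known: u32,
    pub host_bridge_adjudicated: u32,
    /// Timeouts that the hygiene overlay has reinterpreted.
    pub timeout_interpreted: u32,
    /// Timeouts that remain unexplained after the rerun.
    pub timeout_unresolved: u32,
}

impl SweepCounts {
    pub fn total(&self) -> u32 {
        self.auto_pass
            + self.fail_closed_known
            + self.host_bridge_adjudicated
            + self.timeout_interpreted
            + self.timeout_unresolved
    }

    /// Coverage 1: scenes with direct auto-pass.
    pub fn direct_pass(&self) -> u32 {
        self.auto_pass
    }

    /// Coverage 2: every scene except those still unexplained.
    pub fn framework_resolved(&self) -> u32 {
        self.total() - self.timeout_unresolved
    }
}

/// Percentage helper; returns 0.0 for an empty sweep instead of dividing by zero.
pub fn percent(part: u32, whole: u32) -> f64 {
    if whole == 0 {
        0.0
    } else {
        f64::from(part) * 100.0 / f64::from(whole)
    }
}

/// Counts from the Current State tables, with the overlay applied (assumption).
pub fn current_counts() -> SweepCounts {
    SweepCounts {
        auto_pass: 48,
        fail_closed_known: 47,
        host_bridge_adjudicated: 4,
        timeout_interpreted: 3,
        timeout_unresolved: 0,
    }
}
```

Under that assumption, direct pass coverage is 48 of 102 (about 47%), while framework-resolved coverage reaches 102 of 102.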

Coverage 3: Promotion Coverage

Scenes already represented as promoted or boundary family assets in current project assets.

This is lower than framework-resolved coverage because promotion is intentionally conservative.

Coverage 4: Real-Sample Execution Coverage

Scenes that have actual selected and executed real-sample validation records.

This is the strictest coverage metric.

Fixed Route Order for Future Work

Future work must follow this order.

Route 1: Finish Layer E Hygiene Integration

Goal:

Make sweep and reconciliation reporting hygiene-aware by default.

This route is nearly finished and should be closed first.

Route 2: `G3 / paginated_enrichment` Contract Closure

Goal:

Work down the largest remaining structured fail-closed bucket.

Why second:

largest bucket by count
most important for closing the remaining 102 gap
already split into repeated missing-contract patterns

Expected sub-order:

enrichment_request_missing
export_plan_missing
then any remaining join_key or runtime-scope style gaps

Route 3: `G2 / multi_mode_request` Small-Bucket Closure

Goal:

Close the remaining 4 multi-mode structured fail-closed records.

Why third:

clear archetype
relatively small bucket
the mainline family already has a real-sample pass anchor

Route 4: `G1-E / single_request_enrichment` Small-Bucket Closure

Goal:

Close the remaining 2 G1-E structured fail-closed records.

Why fourth:

smallest mainline bucket
framework anchor already exists
lower leverage than G3 and G2

Route 5: Decide on `local_doc_pipeline` and `host_bridge_workflow`

Goal:

Handle the remaining boundary-family fail-closed records only after the mainline buckets are reduced.

This route must not start before Routes 2–4 have completed or been explicitly deferred.

Route 6: Reconciliation and Board Promotion Policy

Goal:

Define how stronger framework-resolved statuses can update the execution board without over-promoting scenes.

This must be done only after contract-closure routes have produced stable deltas.
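The fixed order can be expressed as a simple gate. The sketch below uses a deliberately strict reading: a route may start only once every earlier route is completed or explicitly deferred. The document states that rule only for Route 5 relative to Routes 2–4, so applying it uniformly, and using "completed or deferred" as a stand-in for "stable deltas" before Route 6, are assumptions of this sketch. `RouteState` and `may_start` are hypothetical names.

```rust
/// Hypothetical progress state for one route.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RouteState {
    NotStarted,
    InProgress,
    Completed,
    Deferred,
}

/// Returns true when `route` (1-based, 1..=6) may start.
/// `states[i]` holds the state of Route `i + 1`.
/// Assumption: every earlier route must be completed or explicitly deferred.
pub fn may_start(route: usize, states: &[RouteState; 6]) -> bool {
    if route == 0 || route > states.len() {
        return false;
    }
    states[..route - 1]
        .iter()
        .all(|s| matches!(s, RouteState::Completed | RouteState::Deferred))
}
```

With all routes `NotStarted`, only Route 1 passes the gate. Route 5 passes once Routes 1 to 4 are each `Completed` or `Deferred`.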

What Future Plans Must Contain

Every later bounded implementation plan must explicitly declare:

which framework layer it belongs to
which route from this design it belongs to
which code modules it is allowed to touch
which code modules it must not touch
how it protects current real-sample and canonical passes
what exact delta it expects to produce in the 102 scene state

If a future plan cannot answer those six items, it is out of framework and should not start.
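The six required declarations lend themselves to a checklist type. The following is a minimal sketch of such a check. `PlanDeclaration` and its fields are hypothetical, and representing each answer as free text is an assumption made for brevity.

```rust
/// Hypothetical declaration block that a bounded plan would fill in.
#[derive(Debug, Default, Clone)]
pub struct PlanDeclaration {
    pub layer: Option<String>,
    pub route: Option<u8>,
    pub allowed_modules: Vec<String>,
    pub forbidden_modules: Vec<String>,
    pub pass_protection: Option<String>,
    pub expected_delta: Option<String>,
}

impl PlanDeclaration {
    /// Names of the required items that the plan has not declared.
    pub fn missing_items(&self) -> Vec<&'static str> {
        let mut missing = Vec::new();
        if self.layer.is_none() {
            missing.push("framework layer");
        }
        if self.route.is_none() {
            missing.push("route");
        }
        if self.allowed_modules.is_empty() {
            missing.push("modules allowed to touch");
        }
        if self.forbidden_modules.is_empty() {
            missing.push("modules that must not be touched");
        }
        if self.pass_protection.is_none() {
            missing.push("protection of current passes");
        }
        if self.expected_delta.is_none() {
            missing.push("expected delta in the 102 scene state");
        }
        missing
    }

    /// A plan is in framework only when all six items are declared.
    pub fn is_in_framework(&self) -> bool {
        self.missing_items().is_empty()
    }
}
```

Returning the list of missing items, instead of a bare boolean, lets a reviewer see which of the six answers is absent before rejecting the plan.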

Stop Rules

The framework forbids:

starting a new micro-plan that only restates a narrower semantic problem under a new name without advancing any route toward completion
treating timeout rerun success as promotion
updating execution board state inside a diagnostic plan
opening G4/G5 before the current structured fail-closed mainline is reduced
using prompt-only tuning as a substitute for contract recovery

What 100% Looks Like

This framework considers 100% bounded coverage achieved when:

unsupported-family = 0
missing-source = 0
misclassified-unresolved = 0
timeout-still-unreadable = 0
every remaining non-pass scene is structured and attributable to a supported framework path
execution board and reconciliation reporting can express the current scene state without ambiguity

This is different from 100% auto-pass.

100% auto-pass is not the immediate target.

100% bounded framework coverage is the immediate target.
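The four numeric conditions above can be checked from sweep counters. The sketch below covers only those four. The two qualitative conditions (structured, attributable non-pass scenes and an unambiguous board) are outside its scope. `UnresolvedCounters` is a hypothetical type whose field names follow the counter names in the list.

```rust
/// Hypothetical counters for the buckets that must reach zero.
#[derive(Debug, Default, Clone, Copy)]
pub struct UnresolvedCounters {
    pub unsupported_family: u32,
    pub missing_source: u32,
    pub misclassified_unresolved: u32,
    pub timeout_still_unreadable: u32,
}

impl UnresolvedCounters {
    /// Buckets that are still non-zero, with their counts.
    pub fn blocking(&self) -> Vec<(&'static str, u32)> {
        [
            ("unsupported-family", self.unsupported_family),
            ("missing-source", self.missing_source),
            ("misclassified-unresolved", self.misclassified_unresolved),
            ("timeout-still-unreadable", self.timeout_still_unreadable),
        ]
        .into_iter()
        .filter(|(_, n)| *n > 0)
        .collect()
    }

    /// Numeric part of the 100% bounded coverage definition.
    pub fn numeric_conditions_met(&self) -> bool {
        self.blocking().is_empty()
    }
}
```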
