木炎 956f0c2b68 feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00

Structured Fail-Closed Improvement Roadmap Plan

Date: 2026-04-19
Status: Draft
Upstream Spec: docs/superpowers/specs/2026-04-19-structured-fail-closed-improvement-roadmap-design.md
Upstream Reconciliation: tests/fixtures/generated_scene/full_sweep_status_reconciliation_2026-04-19.json

Plan Intent

Coordinate the next improvement cycle for the 48 structured fail-closed records from the reconciled 102-scene sweep.

This is a roadmap-level plan. It intentionally starts with inventory and gap taxonomy before any implementation correction.

Baseline

Current reconciled status across the 102 scenes:

Status	Count
`auto-pass`	48
`fail-closed-known`	48
`adjudicated-valid-host-bridge`	4
`source-unreadable`	2

Fail-closed distribution:

Inferred archetype	Count
`paginated_enrichment`	35
`local_doc_pipeline`	5
`multi_mode_request`	4
`single_request_enrichment`	2
`host_bridge_workflow`	1
`page_state_eval`	1
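
As a quick arithmetic check on the two tables above, the sketch below confirms that the status counts sum to 102 and that the archetype distribution sums to the 48 fail-closed records. The dictionary names are illustrative only; the counts are copied from the tables.

```python
# Counts copied from the two baseline tables above.
status_counts = {
    "auto-pass": 48,
    "fail-closed-known": 48,
    "adjudicated-valid-host-bridge": 4,
    "source-unreadable": 2,
}
fail_closed_by_archetype = {
    "paginated_enrichment": 35,
    "local_doc_pipeline": 5,
    "multi_mode_request": 4,
    "single_request_enrichment": 2,
    "host_bridge_workflow": 1,
    "page_state_eval": 1,
}

total_scenes = sum(status_counts.values())
total_fail_closed = sum(fail_closed_by_archetype.values())
assert total_scenes == 102
assert total_fail_closed == status_counts["fail-closed-known"]

# Share of the fail-closed set held by the largest bucket.
g3_share = fail_closed_by_archetype["paginated_enrichment"] / total_fail_closed
print(f"{total_scenes} scenes, {total_fail_closed} fail-closed, G3 share {g3_share:.1%}")
```

The paginated_enrichment bucket holds roughly 73% of the fail-closed set, which is why it gets its own phase ahead of the smaller buckets.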

Scope Guardrails

do not add new scene families
do not start G4/G5
do not implement login recovery
do not implement full host runtime transport
do not implement local document attachment runtime
do not update scene_execution_board_2026-04-18.json
do not promote scenes directly from dry-run or follow-up results
do not reopen adjudicated-valid-host-bridge records
do not handle the 2 timeout records (the source-unreadable rows in the baseline) in this roadmap
do not loosen readiness gates to increase pass count

Workstreams

WS1 Fail-Closed Inventory and Gap Taxonomy
WS2 G3 Paginated Enrichment Recovery
WS3 Small-Bucket Recovery
WS4 Bootstrap Isolation
WS5 Follow-Up Sweep and Reporting

Phase 0: Freeze Structured Fail-Closed Baseline

Objective

Freeze the 48 fail-closed records as the only implementation-analysis input.

Tasks

read full_sweep_status_reconciliation_2026-04-19.json
verify total scene count is 102
verify fail-closed-known = 48
verify adjudicated-valid-host-bridge = 4
verify source-unreadable = 2
extract only records with reconciledStatus = fail-closed-known

Deliverables

frozen fail-closed input list
baseline validation summary

Acceptance Criteria

exactly 48 records enter this roadmap
route-adjudicated records are excluded
timeout records are excluded
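
The Phase 0 checks could be scripted roughly as follows. This is a minimal sketch, not the project's tooling: the field name reconciledStatus comes from the task list above, but the top-level "records" key and the function names freeze_fail_closed and load_records are assumptions made for illustration.

```python
import json
from collections import Counter
from pathlib import Path

EXPECTED_TOTAL = 102
EXPECTED_COUNTS = {
    "fail-closed-known": 48,
    "adjudicated-valid-host-bridge": 4,
    "source-unreadable": 2,
}


def freeze_fail_closed(records):
    """Validate the baseline counts, then return only the fail-closed records."""
    if len(records) != EXPECTED_TOTAL:
        raise ValueError(f"expected {EXPECTED_TOTAL} scenes, got {len(records)}")
    counts = Counter(r["reconciledStatus"] for r in records)
    for status, expected in EXPECTED_COUNTS.items():
        if counts[status] != expected:
            raise ValueError(f"{status}: expected {expected}, got {counts[status]}")
    return [r for r in records if r["reconciledStatus"] == "fail-closed-known"]


def load_records(path):
    # ASSUMPTION: the reconciliation file holds a top-level "records" array.
    # Adjust this accessor to the real file layout.
    return json.loads(Path(path).read_text(encoding="utf-8"))["records"]
```

Raising on any count mismatch, rather than warning, keeps the freeze itself fail-closed: a drifted baseline stops the roadmap before any record enters Phase 1.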

Phase 1: Build Fail-Closed Inventory and Gap Taxonomy

Objective

Split the 48 records into actionable missing-contract buckets.

Tasks

inspect each fail-closed record
assign exactly one primary missing-contract label:
- main_request_missing
- pagination_plan_missing
- enrichment_request_missing
- join_key_missing
- export_plan_missing
- mode_matrix_missing
- mode_request_contract_missing
- single_request_enrichment_contract_missing
- host_bridge_contract_missing
- local_doc_contract_missing
- bootstrap_target_unresolved
- mixed_or_ambiguous_contract_gap
attach secondary labels when useful
group by inferred archetype and primary label
identify top repeated recoverable patterns

Deliverables

tests/fixtures/generated_scene/structured_fail_closed_inventory_2026-04-19.json
docs/superpowers/reports/2026-04-19-structured-fail-closed-inventory-report.md

Acceptance Criteria

all 48 records have exactly one primary label
the 35 paginated_enrichment records are explicitly split
no implementation is performed in this phase
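
One way to enforce the one-primary-label rule and produce the archetype grouping is sketched below. The label vocabulary is taken from the task list above; the entry field names (primaryLabel, secondaryLabels, inferredArchetype) and the function name validate_inventory are hypothetical, since the plan does not fix the inventory schema.

```python
from collections import Counter

# Primary label vocabulary, copied from the Phase 1 task list.
PRIMARY_LABELS = frozenset({
    "main_request_missing",
    "pagination_plan_missing",
    "enrichment_request_missing",
    "join_key_missing",
    "export_plan_missing",
    "mode_matrix_missing",
    "mode_request_contract_missing",
    "single_request_enrichment_contract_missing",
    "host_bridge_contract_missing",
    "local_doc_contract_missing",
    "bootstrap_target_unresolved",
    "mixed_or_ambiguous_contract_gap",
})


def validate_inventory(entries):
    """Check each entry's primary label and count (archetype, label) groups."""
    groups = Counter()
    for entry in entries:
        primary = entry.get("primaryLabel")  # hypothetical field name
        if primary not in PRIMARY_LABELS:
            raise ValueError(f"unknown or missing primary label: {primary!r}")
        # A secondary label that repeats the primary adds no information.
        if primary in entry.get("secondaryLabels", []):
            raise ValueError(f"primary label repeated as secondary: {primary}")
        groups[(entry["inferredArchetype"], primary)] += 1
    return groups
```

Calling most_common() on the returned Counter lists the most repeated (archetype, label) pairs first, which is one way to surface the top repeated patterns.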

Phase 2: G3 Paginated Enrichment Recovery Slice

Objective

Improve the largest bucket only when Phase 1 identifies repeated recoverable G3 patterns.

Tasks

select only paginated_enrichment records from the inventory
prioritize repeated primary labels in this order:
- main_request_missing
- pagination_plan_missing
- enrichment_request_missing
- join_key_missing
- export_plan_missing
define bounded recovery rules for the top repeated pattern
implement only traceable evidence recovery
add regression tests for the recovered pattern
preserve canonical G3 and real-sample G3 pass

Deliverables

G3 recovery implementation if evidence supports it
regression tests for the recovered pattern
G3 recovery report

Acceptance Criteria

no scene-name hardcoding
no gate relaxation
recovered fields are traceable to source evidence
existing G3 canonical and real-sample tests pass
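
The selection step in this phase can be read as: walk the priority order and take the first label that repeats among the paginated_enrichment records. The sketch below implements that reading. The threshold of two occurrences for "repeated", the entry field names, and the function name pick_g3_target are assumptions, not part of the plan.

```python
from collections import Counter

# Priority order copied from the Phase 2 task list.
G3_PRIORITY = [
    "main_request_missing",
    "pagination_plan_missing",
    "enrichment_request_missing",
    "join_key_missing",
    "export_plan_missing",
]


def pick_g3_target(entries, min_repeats=2):
    """Return the first priority label that repeats among G3 entries, else None."""
    counts = Counter(
        e["primaryLabel"]  # hypothetical field name
        for e in entries
        if e["inferredArchetype"] == "paginated_enrichment"
    )
    for label in G3_PRIORITY:
        if counts[label] >= min_repeats:
            return label
    # No repeated pattern: the slice should be explicitly deferred.
    return None
```

Under this reading the priority order outranks raw frequency, so a label that appears three times and sits earlier in the list wins over a later label that appears five times. A None result corresponds to deferring the G3 slice with reasons.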

Phase 3: Small-Bucket Recovery Slice

Objective

Handle smaller buckets only after the G3 slice is complete or explicitly deferred.

Tasks

inspect local_doc_pipeline = 5
inspect multi_mode_request = 4
inspect single_request_enrichment = 2
inspect host_bridge_workflow = 1
choose at most one bounded non-G3 recovery slice
preserve existing real-sample passes for G1-E, G2, G6, G7

Deliverables

small-bucket recovery decision report
optional bounded implementation and tests

Acceptance Criteria

only one small-bucket slice is implemented in this roadmap
no G8 attachment/local document runtime is started
no full host runtime transport is started

Phase 4: Bootstrap Target Isolation

Objective

Keep the single page_state_eval + bootstrap_target record separate from the G3 and small-bucket recovery work.

Tasks

identify the bootstrap target record
preserve it as a separate future input
do not implement login recovery
produce bootstrap isolation note

Deliverables

bootstrap isolation note

Acceptance Criteria

bootstrap target does not pollute G3 or small-bucket recovery
no login or bootstrap auto-recovery is implemented

Phase 5: Follow-Up Sweep and Coverage Delta

Objective

Measure the impact of bounded recovery work.

Tasks

rerun the fixed 102-scene sweep
produce a new follow-up result
compare against the reconciled baseline:
- auto-pass delta
- fail-closed-known delta
- actionable coverage delta
- timeout count
- adjudicated host-bridge count
publish coverage delta report

Deliverables

tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json
docs/superpowers/reports/2026-04-19-structured-fail-closed-improvement-coverage-delta-report.md
docs/superpowers/reports/2026-04-19-structured-fail-closed-improvement-roadmap-closure-report.md

Acceptance Criteria

scene set remains exactly 102
improvements are measured, not assumed
execution board remains unchanged
fail-closed count only drops when contracts close or become more specifically isolated
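
The per-status deltas can be computed mechanically once both sweeps are reduced to a scene-to-status mapping. The following is a minimal sketch under that assumption; the function name coverage_delta and the mapping shape are illustrative. "Actionable coverage delta" is left out because this plan does not define how it is calculated.

```python
from collections import Counter


def coverage_delta(baseline, followup):
    """Return the per-status count change between two sweeps.

    Both arguments map a scene identifier to its status string.
    """
    # The scene set must stay fixed, so refuse to compare differing sets.
    if set(baseline) != set(followup):
        raise ValueError("scene set changed between sweeps")
    before = Counter(baseline.values())
    after = Counter(followup.values())
    statuses = sorted(set(before) | set(after))
    return {status: after[status] - before[status] for status in statuses}
```

For example, if one scene moves from fail-closed-known to auto-pass and nothing else changes, the result is +1 for auto-pass and -1 for fail-closed-known. Because the scene set is fixed, the deltas always sum to zero.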

Milestone Order

The order is fixed:

Phase 0: freeze fail-closed baseline
Phase 1: build inventory and taxonomy
Phase 2: G3 recovery slice
Phase 3: small-bucket recovery slice
Phase 4: bootstrap target isolation
Phase 5: follow-up sweep and delta

Do not start implementation before Phase 1 is complete.

Do not start small-bucket recovery before the G3 slice is completed or explicitly deferred with reasons.

Completion Criteria

This roadmap is complete when:

all 48 structured fail-closed records are inventoried and labeled
the 35 G3 records are split into actionable contract-gap groups
at least the highest-value repeated recoverable pattern is either implemented or explicitly deferred
small buckets are inspected and at most one bounded slice is implemented
the bootstrap target remains isolated
a follow-up sweep quantifies coverage delta
no new family is introduced

Stop Statement

Stop after the follow-up sweep, delta report, and closure report.

Do not automatically update the execution board or start another roadmap inside this plan.
