木炎 956f0c2b68 feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00

Scene Skill 102 Static, Mock, And Pseudo-Production Validation Plan

Date: 2026-04-20
Status: Draft
Upstream Design: docs/superpowers/specs/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-design.md
Parent Stage: validation after final materialization and deterministic invocation readiness

Plan Intent

Define the next validation stage for the fully materialized 102 scene skill set.

This plan validates package health, deterministic dispatch readiness, mock runtime feasibility, and pseudo-production readiness. It does not perform real production execution.

Fixed Inputs

examples/scene_skill_102_final_materialization_2026-04-19/skills
examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md
examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json
tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json
tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json
tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json

Planned Outputs

tests/fixtures/generated_scene/scene_skill_102_static_validation_2026-04-20.json
tests/fixtures/generated_scene/scene_skill_102_dispatch_dry_run_validation_2026-04-20.json
tests/fixtures/generated_scene/scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json
tests/fixtures/generated_scene/scene_skill_102_pseudoprod_readiness_2026-04-20.json
docs/superpowers/reports/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-report.md

Scope Guardrails

Allowed:

read final materialized skill packages
parse SKILL.toml, scene.toml, and generation reports
run deterministic dispatch dry-run without executing selected skills
build mock runtime validation matrix
publish validation JSON and report assets

Forbidden:

do not modify src/generated_scene/analyzer.rs
do not modify src/generated_scene/generator.rs
do not modify generated skill scripts during this validation plan
do not rematerialize the 102 skills
do not update tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json
do not start real browser or production environment execution
do not require production credentials, SSO, VPN, or real system access

Workstreams

WS1 Static package validation
WS2 Deterministic invocation dry-run validation
WS3 Mock runtime validation matrix
WS4 Pseudo-production readiness checklist
WS5 Validation report and next-stage decision

Phase 0: Freeze Validation Baseline

Objective

Freeze the 102 final skill set as the input to validation.

Tasks

Confirm final materialization count is 102 / 102.
Confirm materialization failure count is 0.
Confirm deterministic invocation readiness is 102 / 102.
Confirm this plan does not rematerialize skills.
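
The three count checks above can be sketched as a single gate. This is a minimal illustration only: the `Baseline` fields and the `baseline_problems` helper are hypothetical names, and how the counts are read out of the manifest, failures, and readiness JSON files is left open because their schemas are not stated here.

```python
from dataclasses import dataclass

EXPECTED_SCENES = 102  # size of the frozen skill set


@dataclass(frozen=True)
class Baseline:
    # Hypothetical container; the counts would come from the fixed input assets.
    materialized: int
    materialization_failures: int
    dispatch_ready: int


def baseline_problems(b: Baseline, expected: int = EXPECTED_SCENES) -> list[str]:
    """Return a named reason for every baseline condition that does not hold."""
    problems = []
    if b.materialized != expected:
        problems.append(f"materialized {b.materialized} / {expected}")
    if b.materialization_failures != 0:
        problems.append(f"materialization failures: {b.materialization_failures}")
    if b.dispatch_ready != expected:
        problems.append(f"dispatch ready {b.dispatch_ready} / {expected}")
    return problems
```

An empty list means the baseline can be frozen; any entry should stop the plan before Phase 1.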

Deliverables

validation baseline section in final report

Acceptance Criteria

validation begins from the final materialization root
no source scene directories are rescanned
no generated scene logic is changed

Phase 1: Static Package Validation

Objective

Validate that all 102 skill packages are structurally complete and parseable.

Tasks

For each skill:

check SKILL.toml
check SKILL.md
check scene.toml
check references/generation-report.json
check at least one script under scripts/
parse TOML and JSON files
compare sceneId, display name, archetype, readiness, suffix, and keyword fields against index and manifest assets

Deliverables

scene_skill_102_static_validation_2026-04-20.json

Acceptance Criteria

every scene has exactly one static validation record
every static failure has a named reason
total records equal 102

Phase 2: Deterministic Invocation Dry-Run Validation

Objective

Validate deterministic dispatch selection for the U+3002 x3 suffix (three consecutive ideographic full stops, 。。。) without executing selected skills.

Tasks

For each skill:

construct one canonical utterance from the scene display name plus the U+3002 x3 suffix
optionally construct one keyword-based utterance when safe
dry-run deterministic selection against the skill registry
record selected skill id, ambiguity count, and no-match status
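
The dry-run steps above might look like the following sketch. The matching rule is an assumption: it treats the registry as a hypothetical mapping from skill id to display name and selects on an exact display-name match, which may differ from the real dispatcher. Nothing is executed; the function only reports what would be selected.

```python
SUFFIX = "\u3002" * 3  # U+3002 x3: three ideographic full stops


def canonical_utterance(display_name: str) -> str:
    """Scene display name followed by the deterministic suffix."""
    return display_name + SUFFIX


def dry_run_select(utterance: str, registry: dict[str, str]) -> dict:
    """Report the selection outcome without invoking any skill.

    `registry` maps skill id -> display name (hypothetical shape).
    """
    if not utterance.endswith(SUFFIX):
        return {"selected": None, "ambiguityCount": 0, "noMatch": True}
    name = utterance[: -len(SUFFIX)]
    matches = sorted(sid for sid, display in registry.items() if display == name)
    return {
        "selected": matches[0] if len(matches) == 1 else None,
        "ambiguityCount": len(matches) if len(matches) > 1 else 0,
        "noMatch": not matches,
    }
```

Keeping ambiguity and no-match as separate fields makes both outcomes explicit in the dry-run asset instead of collapsing them into a single failure flag.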

Deliverables

scene_skill_102_dispatch_dry_run_validation_2026-04-20.json

Acceptance Criteria

every complete skill has a dispatch dry-run result
ambiguous and no-match outcomes are explicit
no selected skill is executed

Phase 3: Mock Runtime Validation Matrix

Objective

Define and, where safe, prepare mock runtime validation by archetype.

Tasks

group 102 skills by workflow archetype
identify one to three representatives per archetype
define mock dependencies required by each archetype:
- fake fetch
- fake browser DOM
- fake host bridge
- fake local-doc service
- fake artifact writer
classify each skill as:
- mock-covered-by-representative
- mock-needs-harness
- mock-not-safe-yet
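
One way to assemble the matrix is sketched below. The inputs are assumptions: `harness_ready` is a hypothetical set of archetypes that already have a mock harness, `unsafe` is a hypothetical set of scene ids flagged as not yet safe to mock, and picking representatives by sorted scene id is an arbitrary placeholder for a real selection rule.

```python
from collections import defaultdict

MAX_REPRESENTATIVES = 3  # the plan asks for one to three per archetype


def build_mock_matrix(skills: list[dict], harness_ready: set[str], unsafe: set[str]) -> list[dict]:
    """Assign every scene exactly one mock-runtime coverage status."""
    by_archetype = defaultdict(list)
    for skill in skills:
        by_archetype[skill["archetype"]].append(skill["sceneId"])

    rows = []
    for archetype in sorted(by_archetype):
        scene_ids = sorted(by_archetype[archetype])
        representatives = scene_ids[:MAX_REPRESENTATIVES]
        for scene_id in scene_ids:
            if scene_id in unsafe:
                status = "mock-not-safe-yet"
            elif archetype in harness_ready:
                status = "mock-covered-by-representative"
            else:
                status = "mock-needs-harness"
            rows.append({
                "sceneId": scene_id,
                "archetype": archetype,
                "representatives": representatives,
                "mockRuntimeCoverageStatus": status,
            })
    return rows
```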

Deliverables

scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json

Acceptance Criteria

every scene is assigned a mock-runtime coverage status
every archetype has a named harness requirement
this phase does not require real network or browser credentials

Phase 4: Pseudo-Production Readiness Checklist

Objective

Define what evidence is required before real-environment validation can start.

Tasks

For each scene or archetype:

record required runtime dependencies
record expected artifact type
record whether host bridge, browser, localhost service, or document pipeline is required
define required execution evidence:
- console logs
- network logs
- screenshots
- exported files
- generated artifact metadata
define failure taxonomy:
- login-blocked
- network-blocked
- host-bridge-blocked
- data-mismatch
- artifact-mismatch
- environment-unavailable
- runtime-error
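
The failure taxonomy can be kept closed by encoding it as an enumeration, so that any blocker outside the seven named categories is rejected when a record is built. The sketch below assumes hypothetical names (`Blocker`, `ReadinessRecord`, `parse_blocker`) and hypothetical string keys for the evidence kinds.

```python
from dataclasses import dataclass, field
from enum import Enum


class Blocker(str, Enum):
    LOGIN_BLOCKED = "login-blocked"
    NETWORK_BLOCKED = "network-blocked"
    HOST_BRIDGE_BLOCKED = "host-bridge-blocked"
    DATA_MISMATCH = "data-mismatch"
    ARTIFACT_MISMATCH = "artifact-mismatch"
    ENVIRONMENT_UNAVAILABLE = "environment-unavailable"
    RUNTIME_ERROR = "runtime-error"


# Evidence kinds from the list above; the key spellings are assumptions.
EVIDENCE_KINDS = frozenset({
    "console-logs", "network-logs", "screenshots",
    "exported-files", "artifact-metadata",
})


@dataclass
class ReadinessRecord:
    scene_id: str
    runtime_dependencies: list[str]
    expected_artifact: str
    required_evidence: set[str]
    blockers: list[Blocker] = field(default_factory=list)

    def __post_init__(self) -> None:
        unknown = self.required_evidence - EVIDENCE_KINDS
        if unknown:
            raise ValueError(f"unknown evidence kinds: {sorted(unknown)}")


def parse_blocker(raw: str) -> Blocker:
    """Raise ValueError for any category outside the taxonomy."""
    return Blocker(raw)
```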

Deliverables

scene_skill_102_pseudoprod_readiness_2026-04-20.json

Acceptance Criteria

every scene has a pseudo-production readiness record
every real-environment blocker has a named category
no production credentials are required by this phase

Phase 5: Publish Validation Report

Objective

Publish one report that separates static readiness, dispatch readiness, mock readiness, and pseudo-production readiness.

Tasks

summarize static validation results
summarize dispatch dry-run results
summarize mock runtime coverage matrix
summarize pseudo-production readiness categories
recommend whether to start real-environment validation and at what batch size

Deliverables

docs/superpowers/reports/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-report.md

Acceptance Criteria

report explains that 102 / 102 materialization is not the same as 102 / 102 production execution
report lists remaining validation blockers, if any
report does not promote any scene to real executed-pass

Expected Status Outputs

This plan should produce these independent status counts:

staticValidationStatus
dispatchDryRunStatus
mockRuntimeCoverageStatus
pseudoProductionReadinessStatus
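
A small sketch of how each status count could be derived independently from its own asset. It assumes each record stores its status under the field name listed above; the status values themselves (such as `pass` and `fail`) are hypothetical.

```python
from collections import Counter


def status_counts(records: list[dict], status_field: str) -> dict[str, int]:
    """Count records per status value for one validation asset."""
    return dict(Counter(record[status_field] for record in records))
```

For example, calling `status_counts(static_records, "staticValidationStatus")` on the static asset gives a count per status value, and the same call with a different field name applies to the other three assets, so no count depends on another.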

Completion Criteria

This plan is complete when:

all planned validation assets are published
all 102 scenes have static validation records
all 102 scenes have dispatch dry-run records
all 102 scenes have mock runtime matrix records
all 102 scenes have pseudo-production readiness records
the validation report is published
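
The per-asset coverage conditions above can be checked mechanically. This sketch assumes every record carries a `sceneId` key and that the expected id set comes from the final materialization index; `coverage_gaps` is a hypothetical helper name.

```python
from collections import Counter


def coverage_gaps(expected_ids: set[str], assets: dict[str, list[dict]]) -> dict[str, dict]:
    """Report missing and duplicated scene ids per validation asset."""
    gaps = {}
    for name, records in assets.items():
        seen = Counter(record["sceneId"] for record in records)
        missing = sorted(expected_ids - set(seen))
        duplicated = sorted(sid for sid, count in seen.items() if count > 1)
        if missing or duplicated:
            gaps[name] = {"missing": missing, "duplicated": duplicated}
    return gaps
```

An empty result across all four assets, together with the published report, corresponds to the completion conditions listed above.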

Stop Statement

Stop after publishing the static, dispatch, mock-runtime matrix, and pseudo-production readiness assets and the validation report.

Do not execute real production validation under this plan.
