木炎 956f0c2b68 feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00

Scene Skill 102 Static, Mock, And Pseudo-Production Validation Design

Date: 2026-04-20
Status: Draft
Upstream Framework: docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md
Upstream Materialization: docs/superpowers/plans/2026-04-19-scene-skill-102-final-materialization-plan.md
Upstream Invocation Readiness: docs/superpowers/plans/2026-04-20-scene-skill-102-deterministic-invocation-readiness-plan.md

Intent

Define the validation stage after the 102 scene set has reached:

102 / 102 final materialized skill packages
102 / 102 deterministic invocation readiness using the deterministic suffix of three consecutive U+3002 characters (。。。)
0 materialization failures
0 deterministic dispatch ambiguities

This design does not extend the framework coverage work. It starts the next stage: proving that the materialized skill assets are structurally healthy, dispatchable, mock-runnable, and ready for a later real-environment validation campaign.

Current Baseline

Fixed inputs:

examples/scene_skill_102_final_materialization_2026-04-19/skills
examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md
examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json
tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json
tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json
tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json

Current state:

Layer	Count
materialized skill packages	102 / 102
deterministic invocation ready	102 / 102
known materialization failures	0
deterministic ambiguities	0

Validation Layers

This validation stage has four layers. Each layer answers a different question.

Layer 1: Static Package Validation

Question:

Can every materialized skill package be parsed, indexed, and inspected without executing browser or business runtime code?

Checks:

required files exist:
- SKILL.toml
- SKILL.md
- scene.toml
- references/generation-report.json
- at least one runtime script
TOML files parse successfully
JSON reports parse successfully
scene.toml references the expected sceneId, tool, suffix, and keyword fields
SKILL.toml contains stable machine name and human-readable display metadata
generated scripts are non-empty and referenced consistently

Output status:

static-validated or static-invalid

Layer 2: Deterministic Invocation Dry-Run

Question:

Can sgClaw select the correct skill for deterministic user input ending in the U+3002 x3 suffix without using an LLM?

Checks:

full scene name plus the U+3002 x3 suffix resolves to the expected skill
index sample utterance resolves to the expected skill
duplicate or ambiguous keyword matches are reported
scenes with parameter hints are flagged for later parameter validation

This layer must not execute the selected skill. It only validates registry and dispatch behavior.

Output status:

dispatch-dry-run-pass, dispatch-ambiguous, or dispatch-no-match
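
A dry-run resolver for this layer might look like the sketch below. The matching rule (exact scene name first, then keyword substring) is an assumption made for illustration; the design does not describe sgClaw's real dispatch algorithm. `SceneEntry`, `dry_run_dispatch`, and the sample scene names are hypothetical.

```python
from dataclasses import dataclass

SUFFIX = "\u3002" * 3  # three consecutive U+3002 characters


@dataclass(frozen=True)
class SceneEntry:
    skill_id: str
    scene_name: str
    keywords: tuple[str, ...]


def dry_run_dispatch(utterance: str, registry: list[SceneEntry]) -> tuple[str, list[str]]:
    """Resolve an utterance to skill ids without executing any skill."""
    if not utterance.endswith(SUFFIX):
        return "dispatch-no-match", []
    text = utterance[: -len(SUFFIX)].strip()
    # Assumed rule: an exact scene-name match wins over keyword matches.
    exact = [e.skill_id for e in registry if e.scene_name == text]
    if len(exact) == 1:
        return "dispatch-dry-run-pass", exact
    candidates = exact or [
        e.skill_id for e in registry if any(k in text for k in e.keywords)
    ]
    if not candidates:
        return "dispatch-no-match", []
    if len(candidates) == 1:
        return "dispatch-dry-run-pass", candidates
    return "dispatch-ambiguous", sorted(candidates)


# Hypothetical registry entries, for illustration only.
registry = [
    SceneEntry("skill-a", "Monthly sales report", ("sales report",)),
    SceneEntry("skill-b", "Weekly sales report", ("sales report", "weekly")),
]

full_name = dry_run_dispatch("Monthly sales report" + SUFFIX, registry)
ambiguous = dry_run_dispatch("sales report" + SUFFIX, registry)
unknown = dry_run_dispatch("inventory" + SUFFIX, registry)
no_suffix = dry_run_dispatch("Monthly sales report", registry)
```

Returning the candidate list alongside the status keeps ambiguous matches reportable, which is what the duplicate-keyword check needs.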

Layer 3: Mock Runtime Validation

Question:

Can representative generated scripts execute their control flow against mocked browser, fetch, host-bridge, and local-doc dependencies?

This layer is not full production validation. It only proves that generated scripts can run through their main control path with controlled fake responses.

Checks:

generated script module loads
entry function is callable
mock request paths are invoked in expected order
empty data and basic error data do not crash the script
artifact metadata path is produced when the archetype declares exports

Scope:

This layer should begin with archetype representatives, then expand only if the representative harness is stable.

Output status:

mock-runtime-pass, mock-runtime-fail, or mock-runtime-not-covered
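
The shape of a mock harness for these checks can be illustrated as follows. This is only a sketch of the idea in Python: the real generated scripts may be written in another language, and `run_scene`, `MockFetch`, the request paths, and the artifact fields are hypothetical stand-ins rather than anything the generator emits.

```python
class MockFetch:
    """Records every requested path and returns canned responses."""

    def __init__(self, responses: dict[str, dict]):
        self.responses = dict(responses)
        self.calls: list[str] = []

    def __call__(self, path: str) -> dict:
        self.calls.append(path)
        return self.responses.get(path, {"error": "not mocked"})


def run_scene(fetch) -> dict:
    """Hypothetical stand-in for a generated script's entry function."""
    listing = fetch("/api/list")
    rows = listing.get("rows", [])  # tolerates empty and error payloads
    details = [fetch(f"/api/detail/{row['id']}") for row in rows]
    return {"rowCount": len(rows), "artifact": {"type": "xlsx", "rows": len(details)}}


def mock_runtime_check(entry, responses, expected_calls) -> tuple[str, str]:
    fetch = MockFetch(responses)
    try:
        result = entry(fetch)
    except Exception as exc:  # any crash counts as a mock-runtime failure
        return "mock-runtime-fail", f"crashed: {exc!r}"
    if fetch.calls != expected_calls:
        return "mock-runtime-fail", f"unexpected call order: {fetch.calls}"
    if "artifact" not in result:
        return "mock-runtime-fail", "no artifact metadata produced"
    return "mock-runtime-pass", ""


happy = mock_runtime_check(
    run_scene,
    {"/api/list": {"rows": [{"id": 1}, {"id": 2}]}, "/api/detail/1": {}, "/api/detail/2": {}},
    ["/api/list", "/api/detail/1", "/api/detail/2"],
)
empty = mock_runtime_check(run_scene, {"/api/list": {"rows": []}}, ["/api/list"])
error = mock_runtime_check(run_scene, {"/api/list": {"error": "boom"}}, ["/api/list"])
wrong_order = mock_runtime_check(run_scene, {"/api/list": {"rows": []}}, ["/api/detail/1"])
```

Scenes whose archetype has no representative in the harness would be reported as mock-runtime-not-covered rather than run through this check.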

Layer 4: Pseudo-Production Validation Plan

Question:

What must be true before moving from mock validation into real environment validation?

This layer defines the pre-production checklist and evidence bundle. It does not require production credentials or real system access.

Checklist:

environment variable and runtime dependency inventory
browser or host-bridge dependency declaration
expected artifact type per skill
required screenshots, logs, HAR files, console logs, or generated artifacts for later real execution
pass/fail taxonomy for real-environment results

Output status:

pseudo-prod-ready, pseudo-prod-blocked, or real-env-required
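
One way to derive the Layer 4 status from the checklist is sketched below. The item keys, the `ReadinessRecord` type, and in particular the rule for `real-env-required` (checklist complete, but remaining evidence can only be gathered in a real environment) are assumptions; the design lists the three statuses without defining how they are assigned.

```python
from dataclasses import dataclass, field

# Hypothetical keys, one per checklist line in this section.
CHECKLIST_ITEMS = (
    "dependency_inventory",
    "bridge_dependency_declared",
    "expected_artifact_type",
    "evidence_requirements",
    "result_taxonomy",
)


@dataclass
class ReadinessRecord:
    scene_id: str
    completed: set[str] = field(default_factory=set)
    # Assumption: flags scenes whose remaining evidence needs a real environment.
    needs_real_env_evidence: bool = False


def readiness_status(record: ReadinessRecord) -> tuple[str, list[str]]:
    """Return (status, missing checklist items)."""
    missing = [item for item in CHECKLIST_ITEMS if item not in record.completed]
    if missing:
        return "pseudo-prod-blocked", missing
    if record.needs_real_env_evidence:
        return "real-env-required", []
    return "pseudo-prod-ready", []


blocked = readiness_status(ReadinessRecord("scene-001", {"dependency_inventory"}))
ready = readiness_status(ReadinessRecord("scene-002", set(CHECKLIST_ITEMS)))
real_env = readiness_status(ReadinessRecord("scene-003", set(CHECKLIST_ITEMS), True))
```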

Non-Goals

This design does not:

modify src/generated_scene/analyzer.rs
modify src/generated_scene/generator.rs
rematerialize the 102 skill packages
update scene_execution_board_2026-04-18.json
start browser-integrated production execution
require live credentials, VPN, SSO, or production network access
claim 102 / 102 real-sample executed-pass

Validation Status Model

Each scene should eventually have independent statuses:

materializationStatus
deterministicDispatchStatus
staticValidationStatus
mockRuntimeStatus
pseudoProductionReadinessStatus
realEnvironmentExecutionStatus

This prevents the project from confusing generated skill availability with production correctness.
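
As an illustration of keeping the six statuses independent, a per-scene record could be modeled as follows. The field names come from the list above; the `not-run` default, the sample values, and the `is_real_env_proven` helper are assumptions for the sketch.

```python
from dataclasses import asdict, dataclass

NOT_RUN = "not-run"  # assumption: the design does not name a default value


@dataclass
class SceneValidationStatus:
    sceneId: str
    materializationStatus: str = NOT_RUN
    deterministicDispatchStatus: str = NOT_RUN
    staticValidationStatus: str = NOT_RUN
    mockRuntimeStatus: str = NOT_RUN
    pseudoProductionReadinessStatus: str = NOT_RUN
    realEnvironmentExecutionStatus: str = NOT_RUN


def is_real_env_proven(status: SceneValidationStatus) -> bool:
    # Only the real-environment field can establish production correctness.
    return status.realEnvironmentExecutionStatus == "executed-pass"


# A scene that is materialized and dispatchable but not yet validated further.
status = SceneValidationStatus(
    sceneId="scene-001",
    materializationStatus="materialized",
    deterministicDispatchStatus="dispatch-dry-run-pass",
)
record = asdict(status)  # plain dict, ready to serialize into a result JSON
```

Because each field is set by its own validation layer, a scene can be fully materialized and dispatchable while every later status still reads `not-run`.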

Expected Deliverables

The implementation plan should produce:

static validation result JSON
deterministic dry-run validation JSON
mock runtime readiness matrix
pseudo-production checklist
validation report
next-stage decision on whether to start real environment validation

Stop Rules

Stop after publishing validation readiness assets and reports.

Do not proceed into real production execution under this plan.

Do not modify generated framework logic under this plan.
