Scene Skill 102 Static, Mock, And Pseudo-Production Validation Design
- Date: 2026-04-20
- Status: Draft
- Upstream Framework: docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md
- Upstream Materialization: docs/superpowers/plans/2026-04-19-scene-skill-102-final-materialization-plan.md
- Upstream Invocation Readiness: docs/superpowers/plans/2026-04-20-scene-skill-102-deterministic-invocation-readiness-plan.md
Intent
Define the validation stage after the 102 scene set has reached:
- 102 / 102 final materialized skill packages
- 102 / 102 deterministic invocation readiness using the U+3002 x3 deterministic suffix
- 0 materialization failures
- 0 deterministic dispatch ambiguities
This design does not extend the framework coverage work. It starts the next stage: proving that the materialized skill assets are structurally healthy, dispatchable, mock-runnable, and ready for a later real-environment validation campaign.
Current Baseline
Fixed inputs:
- examples/scene_skill_102_final_materialization_2026-04-19/skills
- examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md
- examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json
- tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json
- tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json
- tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json
Current state:
| Layer | Count |
|---|---|
| materialized skill packages | 102 / 102 |
| deterministic invocation ready | 102 / 102 |
| known materialization failures | 0 |
| deterministic ambiguities | 0 |
Validation Layers
This validation stage has four layers. Each layer answers a different question.
Layer 1: Static Package Validation
Question:
Can every materialized skill package be parsed, indexed, and inspected without executing browser or business runtime code?
Checks:
- required files exist:
  - SKILL.toml
  - SKILL.md
  - scene.toml
  - references/generation-report.json
  - at least one runtime script
- TOML files parse successfully
- JSON reports parse successfully
- scene.toml references the expected sceneId, tool, suffix, and keyword fields
- SKILL.toml contains stable machine name and human-readable display metadata
- generated scripts are non-empty and referenced consistently
Output status:
static-validated or static-invalid
Layer 2: Deterministic Invocation Dry-Run
Question:
Can sgClaw select the correct skill for deterministic user input ending in the U+3002 x3 suffix without using an LLM?
Checks:
- full scene name plus the U+3002 x3 suffix resolves to the expected skill
- index sample utterance resolves to the expected skill
- duplicate or ambiguous keyword matches are reported
- scenes with parameter hints are flagged for later parameter validation
This layer must not execute the selected skill. It only validates registry and dispatch behavior.
Output status:
dispatch-dry-run-pass, dispatch-ambiguous, or dispatch-no-match
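A dry-run resolver for this layer might look like the sketch below. It is illustrative only: sgClaw's real matching rules are not described here, so the substring keyword match, the `IndexEntry` shape, and the function name `dry_run_dispatch` are all assumptions, as is reading "U+3002 x3" as three consecutive U+3002 characters. The function only looks up candidates and never executes a skill.

```python
from dataclasses import dataclass

# Assumption: "U+3002 x3" means three consecutive U+3002 characters.
SUFFIX = "\u3002" * 3


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index record: one scene and its dispatch keywords."""
    scene_id: str
    keywords: tuple[str, ...]


def dry_run_dispatch(utterance: str, index: list[IndexEntry]) -> tuple[str, list[str]]:
    """Return (status, matching scene ids) without running any skill."""
    if not utterance.endswith(SUFFIX):
        # Input without the deterministic suffix is out of scope for this layer.
        return "dispatch-no-match", []

    text = utterance[: -len(SUFFIX)].strip()
    # Assumption: a scene matches when any non-empty keyword occurs in the text.
    matches = [
        entry.scene_id
        for entry in index
        if any(keyword and keyword in text for keyword in entry.keywords)
    ]

    if len(matches) == 1:
        return "dispatch-dry-run-pass", matches
    if matches:
        return "dispatch-ambiguous", matches
    return "dispatch-no-match", []
```

The caller is expected to compare the single returned scene id against the expected skill for each test utterance; a pass status with the wrong id should still be recorded as a failure in the dry-run JSON.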
Layer 3: Mock Runtime Validation
Question:
Can representative generated scripts execute their control flow against mocked browser, fetch, host-bridge, and local-doc dependencies?
This layer is not full production validation. It only proves that generated scripts can run through their main control path with controlled fake responses.
Checks:
- generated script module loads
- entry function is callable
- mock request paths are invoked in expected order
- empty data and basic error data do not crash the script
- artifact metadata path is produced when the archetype declares exports
Scope:
This layer should begin with archetype representatives, then expand only if the representative harness is stable.
Output status:
mock-runtime-pass, mock-runtime-fail, or mock-runtime-not-covered
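The idea of checking request order and empty-data handling against fakes can be sketched as follows. This is a sketch under stated assumptions: a Python callable stands in for a generated script's entry function (the real scripts may be in another language and need a different harness), and `FakeFetch`, `run_mock_case`, `example_entry`, and the `/api/...` paths are hypothetical names used only for illustration.

```python
from typing import Any, Callable


class FakeFetch:
    """Records requested paths in order and returns canned responses."""

    def __init__(self, responses: dict[str, Any]):
        self.responses = responses
        self.calls: list[str] = []

    def __call__(self, path: str) -> Any:
        self.calls.append(path)
        # Paths without a canned response yield empty data.
        return self.responses.get(path, [])


def run_mock_case(
    entry: Callable[[Callable[[str], Any]], Any],
    responses: dict[str, Any],
    expected_order: list[str],
) -> str:
    """Run one entry function against the fake and classify the outcome."""
    fetch = FakeFetch(responses)
    try:
        entry(fetch)
    except Exception:
        # Any crash on controlled fake data counts as a failure.
        return "mock-runtime-fail"
    return "mock-runtime-pass" if fetch.calls == expected_order else "mock-runtime-fail"


def example_entry(fetch: Callable[[str], Any]) -> list[Any]:
    """Hypothetical stand-in for a generated script: list, then fetch details."""
    items = fetch("/api/list")
    return [fetch(f"/api/detail/{item}") for item in items]
```

For example, `run_mock_case(example_entry, {"/api/list": [1, 2]}, ["/api/list", "/api/detail/1", "/api/detail/2"])` exercises the main control path, while passing `{}` as the responses exercises the empty-data path. Scenes with no harness at all would be reported as mock-runtime-not-covered by the caller rather than by this function.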
Layer 4: Pseudo-Production Validation Plan
Question:
What must be true before moving from mock validation into real environment validation?
This layer defines the pre-production checklist and evidence bundle. It does not require production credentials or real system access.
Checklist:
- environment variable and runtime dependency inventory
- browser or host-bridge dependency declaration
- expected artifact type per skill
- required screenshots, logs, HAR files, console logs, or generated artifacts for later real execution
- pass/fail taxonomy for real-environment results
Output status:
pseudo-prod-ready, pseudo-prod-blocked, or real-env-required
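One way to record the checklist per skill is a small structure in which an undeclared item is distinguishable from an empty one. The sketch below is an assumption-heavy illustration: the field names are invented for this example, and the rule "any undeclared item means blocked" is only one possible reading. It does not assign real-env-required, because the conditions for that status are not defined in this design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadinessChecklist:
    """Hypothetical per-skill checklist; None means 'not declared yet'."""
    runtime_inventory: Optional[list[str]] = None     # env vars and runtime dependencies
    bridge_dependency: Optional[str] = None           # browser or host-bridge declaration
    expected_artifact_type: Optional[str] = None
    required_evidence: Optional[list[str]] = None     # screenshots, logs, HAR files, ...
    result_taxonomy: Optional[list[str]] = None       # pass/fail categories for real runs


def missing_items(checklist: ReadinessChecklist) -> list[str]:
    """List checklist items that have not been declared, in field order."""
    return [name for name, value in vars(checklist).items() if value is None]


def readiness_status(checklist: ReadinessChecklist) -> str:
    # Assumption: a skill is blocked as long as any item is undeclared.
    return "pseudo-prod-blocked" if missing_items(checklist) else "pseudo-prod-ready"
```

An empty list is a valid declaration here (for example, a skill with no environment variables), which is why the check tests for `None` rather than for falsy values.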
Non-Goals
This design does not:
- modify src/generated_scene/analyzer.rs
- modify src/generated_scene/generator.rs
- rematerialize the 102 skill packages
- update scene_execution_board_2026-04-18.json
- start browser-integrated production execution
- require live credentials, VPN, SSO, or production network access
- claim 102 / 102 real-sample executed-pass
Validation Status Model
Each scene should eventually have independent statuses:
- materializationStatus
- deterministicDispatchStatus
- staticValidationStatus
- mockRuntimeStatus
- pseudoProductionReadinessStatus
- realEnvironmentExecutionStatus
This prevents the project from confusing generated skill availability with production correctness.
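The independence of the six statuses can be made concrete with a per-scene record and a per-layer summary. This is a minimal sketch: the snake_case field names are Python-style stand-ins for the camelCase status names listed above, and the `not-run` placeholder and the `materialized` value used in the usage note are assumptions, since this design does not define values for every layer.

```python
from collections import Counter
from dataclasses import dataclass, fields

# Assumption: placeholder value for a layer that has not been evaluated yet.
NOT_RUN = "not-run"


@dataclass
class SceneStatus:
    """One record per scene; each layer is tracked on its own."""
    scene_id: str
    materialization: str = NOT_RUN
    deterministic_dispatch: str = NOT_RUN
    static_validation: str = NOT_RUN
    mock_runtime: str = NOT_RUN
    pseudo_production_readiness: str = NOT_RUN
    real_environment_execution: str = NOT_RUN


def summarize(records: list[SceneStatus]) -> dict[str, Counter]:
    """Count status values per layer, never merging layers into one score."""
    layers = [f.name for f in fields(SceneStatus) if f.name != "scene_id"]
    return {layer: Counter(getattr(r, layer) for r in records) for layer in layers}
```

With this shape, a scene recorded as `SceneStatus("s1", materialization="materialized", static_validation="static-validated")` still reports `not-run` for real-environment execution, so a full materialization count cannot be mistaken for an executed-pass count.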
Expected Deliverables
The implementation plan should produce:
- static validation result JSON
- deterministic dry-run validation JSON
- mock runtime readiness matrix
- pseudo-production checklist
- validation report
- next-stage decision on whether to start real environment validation
Stop Rules
Stop after publishing validation readiness assets and reports.
Do not proceed into real production execution under this plan.
Do not modify generated framework logic under this plan.