Scene Skill 102 Static, Mock, And Pseudo-Production Validation Design

Date: 2026-04-20
Status: Draft
Upstream Framework: docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md
Upstream Materialization: docs/superpowers/plans/2026-04-19-scene-skill-102-final-materialization-plan.md
Upstream Invocation Readiness: docs/superpowers/plans/2026-04-20-scene-skill-102-deterministic-invocation-readiness-plan.md

Intent

Define the validation stage that begins once the 102-scene set has reached:

  1. 102 / 102 final materialized skill packages
  2. 102 / 102 deterministic invocation readiness using the deterministic suffix 。。。 (U+3002 repeated three times)
  3. 0 materialization failures
  4. 0 deterministic dispatch ambiguities

This design does not extend the framework coverage work. It starts the next stage: proving that the materialized skill assets are structurally healthy, dispatchable, mock-runnable, and ready for a later real-environment validation campaign.

Current Baseline

Fixed inputs:

  1. examples/scene_skill_102_final_materialization_2026-04-19/skills
  2. examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md
  3. examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json
  4. tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json
  5. tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json
  6. tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json

Current state:

| Layer | Count |
| --- | --- |
| materialized skill packages | 102 / 102 |
| deterministic invocation ready | 102 / 102 |
| known materialization failures | 0 |
| deterministic ambiguities | 0 |

Validation Layers

This validation stage has four layers. Each layer answers a different question.

Layer 1: Static Package Validation

Question:

Can every materialized skill package be parsed, indexed, and inspected without executing browser or business runtime code?

Checks:

  1. required files exist:
    • SKILL.toml
    • SKILL.md
    • scene.toml
    • references/generation-report.json
    • at least one runtime script
  2. TOML files parse successfully
  3. JSON reports parse successfully
  4. scene.toml references the expected sceneId, tool, suffix, and keyword fields
  5. SKILL.toml contains a stable machine name and human-readable display metadata
  6. generated scripts are non-empty and referenced consistently

Output status:

static-validated or static-invalid

Layer 2: Deterministic Invocation Dry-Run

Question:

Can sgClaw select the correct skill for deterministic user input ending in the U+3002 x3 suffix without using an LLM?

Checks:

  1. full scene name plus the U+3002 x3 suffix resolves to the expected skill
  2. index sample utterance resolves to the expected skill
  3. duplicate or ambiguous keyword matches are reported
  4. scenes with parameter hints are flagged for later parameter validation

This layer must not execute the selected skill. It only validates registry and dispatch behavior.

Output status:

dispatch-dry-run-pass, dispatch-ambiguous, or dispatch-no-match
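
A dry run of this kind might look like the sketch below. It is a stand-in for illustration only and does not reproduce sgClaw's actual registry or matching rules: the `registry` shape (skill name mapped to a list of keywords), the exact-match comparison, and the function name `dry_run_dispatch` are assumptions. The function only looks up a skill and never executes it.

```python
# The deterministic suffix: U+3002 repeated three times.
SUFFIX = "\u3002" * 3


def dry_run_dispatch(utterance: str, registry: dict[str, list[str]]) -> tuple[str, list[str]]:
    """Resolve an utterance to skill names without running any skill.

    registry maps a skill name to its keywords (hypothetical shape).
    Matching is exact string equality, which is an assumption.
    """
    if not utterance.endswith(SUFFIX):
        # Input without the suffix is not deterministic input at all.
        return "dispatch-no-match", []

    text = utterance[: -len(SUFFIX)].strip()
    matches = sorted(
        name for name, keywords in registry.items() if text in keywords
    )

    if not matches:
        return "dispatch-no-match", []
    if len(matches) > 1:
        # Report every colliding skill so the keyword overlap can be fixed.
        return "dispatch-ambiguous", matches
    return "dispatch-dry-run-pass", matches
```

A caller would still compare the single returned name against the expected skill for the scene, since a unique match is not necessarily the correct one.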

Layer 3: Mock Runtime Validation

Question:

Can representative generated scripts execute their control flow against mocked browser, fetch, host-bridge, and local-doc dependencies?

This layer is not full production validation. It only proves that generated scripts can run through their main control path with controlled fake responses.

Checks:

  1. generated script module loads
  2. entry function is callable
  3. mock request paths are invoked in expected order
  4. empty data and basic error data do not crash the script
  5. artifact metadata path is produced when the archetype declares exports

Scope:

This layer should begin with archetype representatives, then expand only if the representative harness is stable.

Output status:

mock-runtime-pass, mock-runtime-fail, or mock-runtime-not-covered
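
The recording pattern behind checks 3 and 4 can be illustrated as below. The language and entry-point convention of the generated scripts are not stated in this design, so the sketch uses a plain Python callable as a stand-in for a script's entry function; `RecordingFetch` and `run_mock_case` are hypothetical names.

```python
from typing import Any, Callable


class RecordingFetch:
    """Fake fetch dependency that returns canned data and logs each call."""

    def __init__(self, responses: dict[str, Any]) -> None:
        self.responses = responses
        self.calls: list[str] = []

    def __call__(self, path: str) -> Any:
        self.calls.append(path)
        # Unknown paths yield empty data rather than raising.
        return self.responses.get(path, {})


def run_mock_case(
    entry: Callable[[RecordingFetch], Any],
    responses: dict[str, Any],
    expected_order: list[str],
) -> str:
    """Run one entry function against the fake and classify the outcome."""
    fetch = RecordingFetch(responses)
    try:
        entry(fetch)
    except Exception:
        # Any crash on controlled fake data counts as a failure.
        return "mock-runtime-fail"
    if fetch.calls != expected_order:
        return "mock-runtime-fail"
    return "mock-runtime-pass"
```

Running the same entry function once with populated responses and once with empty ones covers the "empty data does not crash" check with no extra harness code.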

Layer 4: Pseudo-Production Validation Plan

Question:

What must be true before moving from mock validation into real environment validation?

This layer defines the pre-production checklist and evidence bundle. It does not require production credentials or real system access.

Checklist:

  1. environment variable and runtime dependency inventory
  2. browser or host-bridge dependency declaration
  3. expected artifact type per skill
  4. required screenshots, logs, HAR files, console logs, or generated artifacts for later real execution
  5. pass/fail taxonomy for real-environment results

Output status:

pseudo-prod-ready, pseudo-prod-blocked, or real-env-required

Non-Goals

This design does not:

  1. modify src/generated_scene/analyzer.rs
  2. modify src/generated_scene/generator.rs
  3. rematerialize the 102 skill packages
  4. update scene_execution_board_2026-04-18.json
  5. start browser-integrated production execution
  6. require live credentials, VPN, SSO, or production network access
  7. claim 102 / 102 real-sample executed-pass

Validation Status Model

Each scene should eventually have independent statuses:

  1. materializationStatus
  2. deterministicDispatchStatus
  3. staticValidationStatus
  4. mockRuntimeStatus
  5. pseudoProductionReadinessStatus
  6. realEnvironmentExecutionStatus

This prevents the project from confusing generated skill availability with production correctness.
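
One possible shape for a per-scene record is sketched below. The six status field names come from the list above; the `sceneId` key, the `"unknown"` placeholder value, and the `pending_layers` helper are assumptions added for illustration.

```python
from dataclasses import asdict, dataclass, fields

UNKNOWN = "unknown"  # hypothetical placeholder for a layer not yet evaluated


@dataclass
class SceneValidationStatus:
    # Field names are camelCase so they serialize to the names used in this design.
    sceneId: str
    materializationStatus: str = UNKNOWN
    deterministicDispatchStatus: str = UNKNOWN
    staticValidationStatus: str = UNKNOWN
    mockRuntimeStatus: str = UNKNOWN
    pseudoProductionReadinessStatus: str = UNKNOWN
    realEnvironmentExecutionStatus: str = UNKNOWN

    def pending_layers(self) -> list[str]:
        """Names of status fields that have not been evaluated yet."""
        return [
            f.name
            for f in fields(self)
            if f.name != "sceneId" and getattr(self, f.name) == UNKNOWN
        ]

    def to_record(self) -> dict[str, str]:
        """Plain dict suitable for writing into a result JSON file."""
        return asdict(self)
```

Because each layer has its own field, a scene that is materialized and dispatchable still reports its real-environment status as unevaluated until that layer has actually run.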

Expected Deliverables

The implementation plan should produce:

  1. static validation result JSON
  2. deterministic dry-run validation JSON
  3. mock runtime readiness matrix
  4. pseudo-production checklist
  5. validation report
  6. next-stage decision on whether to start real environment validation

Stop Rules

Stop after publishing validation readiness assets and reports.

Do not proceed into real production execution under this plan.

Do not modify generated framework logic under this plan.