# Scene Skill 102 Static, Mock, And Pseudo-Production Validation Design
> Date: 2026-04-20
> Status: Draft
> Upstream Framework: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Upstream Materialization: `docs/superpowers/plans/2026-04-19-scene-skill-102-final-materialization-plan.md`
> Upstream Invocation Readiness: `docs/superpowers/plans/2026-04-20-scene-skill-102-deterministic-invocation-readiness-plan.md`
## Intent
Define the validation stage after the `102` scene set has reached:
1. `102 / 102` final materialized skill packages
2. `102 / 102` deterministic invocation readiness using the `U+3002 x3` deterministic suffix (three consecutive U+3002 ideographic full stops, `。。。`)
3. `0` materialization failures
4. `0` deterministic dispatch ambiguities
This design does not extend the framework coverage work. It starts the next stage: proving that the materialized skill assets are structurally healthy, dispatchable, mock-runnable, and ready for a later real-environment validation campaign.
## Current Baseline
Fixed inputs:
1. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
2. `examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md`
3. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
4. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
5. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json`
6. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
Current state:
| Layer | Count |
| --- | ---: |
| materialized skill packages | 102 / 102 |
| deterministic invocation ready | 102 / 102 |
| known materialization failures | 0 |
| deterministic ambiguities | 0 |
## Validation Layers
This validation stage has four layers. Each layer answers a different question.
### Layer 1: Static Package Validation
Question:
Can every materialized skill package be parsed, indexed, and inspected without executing browser or business runtime code?
Checks:
1. required files exist:
   - `SKILL.toml`
   - `SKILL.md`
   - `scene.toml`
   - `references/generation-report.json`
   - at least one runtime script
2. TOML files parse successfully
3. JSON reports parse successfully
4. `scene.toml` references the expected `sceneId`, tool, suffix, and keyword fields
5. `SKILL.toml` contains a stable machine name and human-readable display metadata
6. generated scripts are non-empty and referenced consistently
Output status:
`static-validated` or `static-invalid`
### Layer 2: Deterministic Invocation Dry-Run
Question:
Can sgClaw select the correct skill for deterministic user input ending in the `U+3002 x3` suffix without using an LLM?
Checks:
1. full scene name plus the `U+3002 x3` suffix resolves to the expected skill
2. index sample utterance resolves to the expected skill
3. duplicate or ambiguous keyword matches are reported
4. scenes with parameter hints are flagged for later parameter validation
This layer must not execute the selected skill. It only validates registry and dispatch behavior.
Output status:
`dispatch-dry-run-pass`, `dispatch-ambiguous`, or `dispatch-no-match`
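
A dry-run resolver for this layer might look like the sketch below. It is not sgClaw's dispatcher: the registry shape (skill id mapped to trigger keywords), the substring matching rule, and the `dry_run_dispatch` name are all assumptions made for illustration. The only detail taken from this design is the suffix and the three output statuses.

```python
SUFFIX = "\u3002" * 3  # the `U+3002 x3` deterministic suffix


def dry_run_dispatch(
    utterance: str, registry: dict[str, list[str]]
) -> tuple[str, list[str]]:
    """Resolve an utterance to candidate skill ids; never executes a skill.

    `registry` is a hypothetical mapping of skill id -> trigger keywords.
    """
    if not utterance.endswith(SUFFIX):
        # Without the suffix the input is not a deterministic invocation.
        return "dispatch-no-match", []

    text = utterance[: -len(SUFFIX)].strip()
    # Assumption: a skill matches when any non-empty keyword occurs in the text.
    matches = sorted(
        skill_id
        for skill_id, keywords in registry.items()
        if any(keyword and keyword in text for keyword in keywords)
    )

    if not matches:
        return "dispatch-no-match", []
    if len(matches) > 1:
        return "dispatch-ambiguous", matches
    return "dispatch-dry-run-pass", matches
```

A caller would still compare the single returned skill id against the expected one from the index, since a unique match is not necessarily the correct match.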
### Layer 3: Mock Runtime Validation
Question:
Can representative generated scripts execute their control flow against mocked browser, fetch, host-bridge, and local-doc dependencies?
This layer is not full production validation. It only proves that generated scripts can run through their main control path with controlled fake responses.
Checks:
1. generated script module loads
2. entry function is callable
3. mock request paths are invoked in expected order
4. empty data and basic error data do not crash the script
5. artifact metadata path is produced when the archetype declares exports
Scope:
This layer should begin with archetype representatives, then expand only if the representative harness is stable.
Output status:
`mock-runtime-pass`, `mock-runtime-fail`, or `mock-runtime-not-covered`
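
The harness idea can be illustrated with a small recording fake. This sketch assumes the entry function receives its `fetch` dependency as an argument, which this design does not state, and it is written in Python only to show the shape of the check; the generated scripts themselves may be in another language. `RecordingFetch`, `run_mock_case`, `example_entry`, and the request paths are hypothetical names.

```python
from typing import Any, Callable


class RecordingFetch:
    """Fake fetch: returns canned responses and records the request order."""

    def __init__(self, responses: dict[str, Any]):
        self.responses = responses
        self.calls: list[str] = []

    def __call__(self, path: str) -> Any:
        self.calls.append(path)
        # Unknown paths yield empty data, which exercises the empty-data case.
        return self.responses.get(path, [])


def run_mock_case(
    entry: Callable[[Callable[[str], Any]], Any],
    responses: dict[str, Any],
    expected_calls: list[str],
) -> str:
    """Run one entry function against mocked responses and classify the result."""
    fetch = RecordingFetch(responses)
    try:
        entry(fetch)
    except Exception:
        return "mock-runtime-fail"  # the script crashed on the mocked data
    if fetch.calls != expected_calls:
        return "mock-runtime-fail"  # requests were issued in an unexpected order
    return "mock-runtime-pass"


def example_entry(fetch: Callable[[str], Any]) -> dict[str, Any]:
    """Hypothetical stand-in for a generated script's entry function."""
    rows = fetch("/api/list")
    details = [fetch(f"/api/detail/{row['id']}") for row in rows]
    return {"rowCount": len(rows), "details": details}
```

Scenes whose archetype has no representative in the harness would be reported as `mock-runtime-not-covered` by the caller rather than by `run_mock_case`.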
### Layer 4: Pseudo-Production Validation Plan
Question:
What must be true before moving from mock validation into real environment validation?
This layer defines the pre-production checklist and evidence bundle. It does not require production credentials or real system access.
Checklist:
1. environment variable and runtime dependency inventory
2. browser or host-bridge dependency declaration
3. expected artifact type per skill
4. required screenshots, logs, HAR files, console logs, or generated artifacts for later real execution
5. pass/fail taxonomy for real-environment results
Output status:
`pseudo-prod-ready`, `pseudo-prod-blocked`, or `real-env-required`
## Non-Goals
This design does not:
1. modify `src/generated_scene/analyzer.rs`
2. modify `src/generated_scene/generator.rs`
3. rematerialize the `102` skill packages
4. update `scene_execution_board_2026-04-18.json`
5. start browser-integrated production execution
6. require live credentials, VPN, SSO, or production network access
7. claim `102 / 102` real-sample executed-pass
## Validation Status Model
Each scene should eventually have independent statuses:
1. `materializationStatus`
2. `deterministicDispatchStatus`
3. `staticValidationStatus`
4. `mockRuntimeStatus`
5. `pseudoProductionReadinessStatus`
6. `realEnvironmentExecutionStatus`
This prevents the project from confusing generated skill availability with production correctness.
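
One way to keep the six statuses independent is a flat record per scene, sketched below. The field names follow the list above; the `not-evaluated` default, the `SceneValidationRecord` and `tally` names, and the example status values used for materialization are assumptions, not part of this design.

```python
from collections import Counter
from dataclasses import dataclass

STATUS_FIELDS = (
    "materializationStatus",
    "deterministicDispatchStatus",
    "staticValidationStatus",
    "mockRuntimeStatus",
    "pseudoProductionReadinessStatus",
    "realEnvironmentExecutionStatus",
)

NOT_EVALUATED = "not-evaluated"  # assumed placeholder for layers not yet run


@dataclass
class SceneValidationRecord:
    """Per-scene statuses; no field is derived from another."""

    sceneId: str
    materializationStatus: str = NOT_EVALUATED
    deterministicDispatchStatus: str = NOT_EVALUATED
    staticValidationStatus: str = NOT_EVALUATED
    mockRuntimeStatus: str = NOT_EVALUATED
    pseudoProductionReadinessStatus: str = NOT_EVALUATED
    realEnvironmentExecutionStatus: str = NOT_EVALUATED


def tally(records: list[SceneValidationRecord]) -> dict[str, Counter]:
    """Count status values per layer so each layer is reported separately."""
    return {
        name: Counter(getattr(record, name) for record in records)
        for name in STATUS_FIELDS
    }
```

Because every layer has its own counter, a report built from `tally` cannot present a materialized package as executed in a real environment unless that field was set explicitly.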
## Expected Deliverables
The implementation plan should produce:
1. static validation result JSON
2. deterministic dry-run validation JSON
3. mock runtime readiness matrix
4. pseudo-production checklist
5. validation report
6. next-stage decision on whether to start real environment validation
## Stop Rules
Stop after publishing validation readiness assets and reports.
Do not proceed into real production execution under this plan.
Do not modify generated framework logic under this plan.