# Scene Skill 102 Static, Mock, And Pseudo-Production Validation Design

> Date: 2026-04-20

> Status: Draft

> Upstream Framework: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`

> Upstream Materialization: `docs/superpowers/plans/2026-04-19-scene-skill-102-final-materialization-plan.md`

> Upstream Invocation Readiness: `docs/superpowers/plans/2026-04-20-scene-skill-102-deterministic-invocation-readiness-plan.md`

## Intent

Define the validation stage that follows once the `102` scene set has reached:

1. `102 / 102` final materialized skill packages
2. `102 / 102` deterministic invocation readiness using the deterministic `U+3002 x3` suffix (three ideographic full stops, `。。。`)
3. `0` materialization failures
4. `0` deterministic dispatch ambiguities

This design does not extend the framework coverage work. It starts the next stage: proving that the materialized skill assets are structurally healthy, dispatchable, mock-runnable, and ready for a later real-environment validation campaign.

## Current Baseline

Fixed inputs:

1. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
2. `examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md`
3. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
4. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
5. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json`
6. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`

Current state:

| Metric | Count |
| --- | ---: |
| materialized skill packages | 102 / 102 |
| deterministic invocation ready | 102 / 102 |
| known materialization failures | 0 |
| deterministic ambiguities | 0 |

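The entry conditions above can be checked mechanically before any validation layer runs. The following is a minimal Python sketch under stated assumptions: the four counts are taken as already extracted from the fixed input fixtures (their JSON field names are not given in this design), and `Baseline` and `baseline_blockers` are hypothetical names.

```python
from dataclasses import dataclass

TOTAL_SCENES = 102


@dataclass(frozen=True)
class Baseline:
    """Hypothetical summary of the fixed inputs; field names are illustrative."""
    materialized: int
    deterministic_ready: int
    materialization_failures: int
    deterministic_ambiguities: int


def baseline_blockers(b: Baseline, total: int = TOTAL_SCENES) -> list[str]:
    """Return the reasons this validation stage must not start; empty means the gate is open."""
    blockers = []
    if b.materialized != total:
        blockers.append(f"materialized {b.materialized} / {total}")
    if b.deterministic_ready != total:
        blockers.append(f"deterministic ready {b.deterministic_ready} / {total}")
    if b.materialization_failures:
        blockers.append(f"{b.materialization_failures} materialization failures")
    if b.deterministic_ambiguities:
        blockers.append(f"{b.deterministic_ambiguities} deterministic ambiguities")
    return blockers
```

With the current baseline, `baseline_blockers(Baseline(102, 102, 0, 0))` returns an empty list, so the gate is open.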
## Validation Layers

This validation stage has four layers. Each layer answers a different question.

### Layer 1: Static Package Validation

Question:

Can every materialized skill package be parsed, indexed, and inspected without executing browser or business runtime code?

Checks:

1. required files exist:
   - `SKILL.toml`
   - `SKILL.md`
   - `scene.toml`
   - `references/generation-report.json`
   - at least one runtime script
2. TOML files parse successfully
3. JSON reports parse successfully
4. `scene.toml` contains the expected `sceneId`, tool, suffix, and keyword fields
5. `SKILL.toml` contains a stable machine name and human-readable display metadata
6. generated scripts are non-empty and referenced consistently

Output status:

`static-validated` or `static-invalid`
### Layer 2: Deterministic Invocation Dry-Run

Question:

Can sgClaw select the correct skill for deterministic user input ending in the `U+3002 x3` suffix without using an LLM?

Checks:

1. the full scene name plus the `U+3002 x3` suffix resolves to the expected skill
2. the sample utterance from the index resolves to the expected skill
3. duplicate or ambiguous keyword matches are reported
4. scenes with parameter hints are flagged for later parameter validation

This layer must not execute the selected skill. It only validates registry and dispatch behavior.

Output status:

`dispatch-dry-run-pass`, `dispatch-ambiguous`, or `dispatch-no-match`
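A dry run only needs the registry and a string-matching rule, so it can be expressed without loading or executing any skill. In the Python sketch below, the registry shape (skill name mapped to a keyword list) and the matching rule (exact keyword match first, substring match as a fallback) are assumptions; sgClaw's actual dispatch rule may differ.

```python
SUFFIX = "\u3002" * 3  # U+3002 x3: three ideographic full stops


def matching_skills(utterance: str, registry: dict[str, list[str]]) -> list[str]:
    """Return the skills an utterance would dispatch to; nothing is executed."""
    if not utterance.endswith(SUFFIX):
        return []  # the deterministic path only applies to suffixed input
    text = utterance[: -len(SUFFIX)].strip()
    # Assumed rule: an exact keyword match wins; otherwise fall back to substring matches.
    exact = [skill for skill, keywords in registry.items() if text in keywords]
    if exact:
        return sorted(exact)
    return sorted(s for s, kws in registry.items() if any(kw in text for kw in kws))


def dry_run_status(utterance: str, expected: str, registry: dict[str, list[str]]) -> str:
    """Classify one utterance into the Layer 2 output statuses."""
    matches = matching_skills(utterance, registry)
    if len(matches) > 1:
        return "dispatch-ambiguous"
    if matches == [expected]:
        return "dispatch-dry-run-pass"
    # Zero matches, or a single match that is not the expected skill.
    return "dispatch-no-match"
```

A single match on the wrong skill is reported as `dispatch-no-match` here, since the expected skill did not resolve; a real report may want to name the skill that matched instead. Parameter-hint flagging (check 4) is left out.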
### Layer 3: Mock Runtime Validation

Question:

Can representative generated scripts execute their control flow against mocked browser, fetch, host-bridge, and local-doc dependencies?

This layer is not full production validation. It only proves that generated scripts can run through their main control path with controlled fake responses.

Checks:

1. the generated script module loads
2. the entry function is callable
3. mock request paths are invoked in the expected order
4. empty data and basic error data do not crash the script
5. an artifact metadata path is produced when the archetype declares exports

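The harness idea behind Layer 3 is to inject fake dependencies, record how the script uses them, and judge the run against the checks above. The generated scripts themselves are presumably JavaScript running against browser and host-bridge APIs; this Python sketch only models the pattern. It assumes a script exposes an entry function that receives an injected `fetch`-like dependency, treats an already-loaded callable as satisfying check 1, and uses a hypothetical `artifactPath` key for export metadata.

```python
from typing import Any, Callable


class MockFetch:
    """Fake fetch dependency: serves canned payloads and records the call order."""

    def __init__(self, responses: dict[str, Any]):
        self.responses = responses  # path -> payload, or an Exception instance to raise
        self.calls: list[str] = []

    def __call__(self, path: str) -> Any:
        self.calls.append(path)
        payload = self.responses.get(path, [])  # unknown paths return empty data
        if isinstance(payload, Exception):
            raise payload
        return payload


def run_with_mocks(
    entry: Callable[[MockFetch], Any],
    responses: dict[str, Any],
    expected_order: list[str],
    declares_exports: bool,
) -> tuple[str, str]:
    """Run one entry function against mocks and map the outcome to a Layer 3 status."""
    if not callable(entry):
        return "mock-runtime-fail", "entry is not callable"
    fetch = MockFetch(responses)
    try:
        result = entry(fetch)
    except Exception as err:  # any uncaught error counts as a crash
        return "mock-runtime-fail", f"crashed: {err!r}"
    if fetch.calls != expected_order:
        return "mock-runtime-fail", f"call order {fetch.calls} != {expected_order}"
    if declares_exports and not (isinstance(result, dict) and result.get("artifactPath")):
        return "mock-runtime-fail", "no artifact metadata path"
    return "mock-runtime-pass", ""


def example_entry(fetch):
    """Hypothetical stand-in for a generated script's entry function."""
    rows = fetch("/api/list")
    try:
        fetch("/api/detail")
    except ConnectionError:
        pass  # tolerate a basic error response
    return {"rowCount": len(rows), "artifactPath": "out/report.xlsx"}
```

Running `example_entry` with `{}` as the responses exercises the empty-data case, and mapping `/api/detail` to a `ConnectionError` exercises the basic-error case; both should still pass.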
Scope:

This layer should begin with archetype representatives and expand to further scenes only once the representative harness is stable.

Output status:

`mock-runtime-pass`, `mock-runtime-fail`, or `mock-runtime-not-covered`
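Starting from archetype representatives can be made deterministic by picking one scene per archetype. In this Python sketch the archetype labels are hypothetical inputs, and choosing the lowest scene ID is an assumed tie-break; every other scene starts as `mock-runtime-not-covered` until the representative harness is stable enough to widen coverage.

```python
def select_representatives(scene_archetypes: dict[str, str]) -> dict[str, str]:
    """Map each archetype to one representative scene (the lowest scene ID, for stability)."""
    representatives: dict[str, str] = {}
    for scene_id in sorted(scene_archetypes):
        representatives.setdefault(scene_archetypes[scene_id], scene_id)
    return representatives


def initial_mock_statuses(scene_archetypes: dict[str, str]) -> dict[str, str]:
    """Scenes outside the representative set start as not covered."""
    chosen = set(select_representatives(scene_archetypes).values())
    return {
        scene_id: "mock-runtime-not-covered"
        for scene_id in scene_archetypes
        if scene_id not in chosen
    }
```

The representatives then go through the mock harness and receive `mock-runtime-pass` or `mock-runtime-fail`.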
### Layer 4: Pseudo-Production Validation Plan

Question:

What must be true before moving from mock validation into real-environment validation?

This layer defines the pre-production checklist and evidence bundle. It does not require production credentials or real system access.

Checklist:

1. environment variable and runtime dependency inventory
2. browser or host-bridge dependency declaration
3. expected artifact type per skill
4. required screenshots, logs, HAR files, console logs, or generated artifacts for later real execution
5. pass/fail taxonomy for real-environment results

Output status:

`pseudo-prod-ready`, `pseudo-prod-blocked`, or `real-env-required`
## Non-Goals

This design does not:

1. modify `src/generated_scene/analyzer.rs`
2. modify `src/generated_scene/generator.rs`
3. rematerialize the `102` skill packages
4. update `scene_execution_board_2026-04-18.json`
5. start browser-integrated production execution
6. require live credentials, VPN, SSO, or production network access
7. claim `102 / 102` real-sample executed-pass

## Validation Status Model

Each scene should eventually carry six independent statuses:

1. `materializationStatus`
2. `deterministicDispatchStatus`
3. `staticValidationStatus`
4. `mockRuntimeStatus`
5. `pseudoProductionReadinessStatus`
6. `realEnvironmentExecutionStatus`

Keeping these statuses separate prevents the project from confusing generated skill availability with production correctness.

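Keeping the statuses as separate fields means no single field can be read as a production verdict. In this Python sketch, the camelCase fields mirror the status names listed above so the record serializes directly to JSON; the default values and the `real-env-pass` value are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SceneValidationRecord:
    """One scene's independent statuses; camelCase keys mirror the status names above."""
    sceneId: str
    materializationStatus: str = "unknown"
    deterministicDispatchStatus: str = "unknown"
    staticValidationStatus: str = "unknown"
    mockRuntimeStatus: str = "unknown"
    pseudoProductionReadinessStatus: str = "unknown"
    realEnvironmentExecutionStatus: str = "not-started"  # hypothetical initial value


def is_production_verified(record: SceneValidationRecord) -> bool:
    # Only a real-environment result can establish production correctness;
    # passing the earlier layers does not imply it. "real-env-pass" is hypothetical.
    return record.realEnvironmentExecutionStatus == "real-env-pass"


def to_json(record: SceneValidationRecord) -> str:
    """Serialize one record, keeping non-ASCII text readable."""
    return json.dumps(asdict(record), ensure_ascii=False)
```

A scene that has passed every earlier layer still makes `is_production_verified` return `False` until a real-environment result is recorded.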
## Expected Deliverables

The implementation plan should produce:

1. static validation result JSON
2. deterministic dry-run validation JSON
3. mock runtime readiness matrix
4. pseudo-production checklist
5. validation report
6. a next-stage decision on whether to start real-environment validation

## Stop Rules

Stop after publishing validation readiness assets and reports.

Do not proceed into real production execution under this plan.

Do not modify generated framework logic under this plan.