# Scene Skill 102 Static, Mock, And Pseudo-Production Validation Plan
> Date: 2026-04-20
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-design.md`
> Parent Stage: validation after final materialization and deterministic invocation readiness
## Plan Intent
Define the next validation stage for the fully materialized `102` scene skill set.
This plan validates package health, deterministic dispatch readiness, mock runtime feasibility, and pseudo-production readiness. It does not perform real production execution.
## Fixed Inputs
1. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
2. `examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md`
3. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
4. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
5. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json`
6. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
## Planned Outputs
1. `tests/fixtures/generated_scene/scene_skill_102_static_validation_2026-04-20.json`
2. `tests/fixtures/generated_scene/scene_skill_102_dispatch_dry_run_validation_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_readiness_2026-04-20.json`
5. `docs/superpowers/reports/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-report.md`
## Scope Guardrails
Allowed:
1. read final materialized skill packages
2. parse `SKILL.toml`, `scene.toml`, and generation reports
3. run deterministic dispatch dry-run without executing selected skills
4. build mock runtime validation matrix
5. publish validation JSON and report assets
Forbidden:
1. do not modify `src/generated_scene/analyzer.rs`
2. do not modify `src/generated_scene/generator.rs`
3. do not modify generated skill scripts during this validation plan
4. do not rematerialize the `102` skills
5. do not update `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
6. do not start real browser or production environment execution
7. do not require production credentials, SSO, VPN, or real system access
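
The file-level guardrails above could be enforced with a small write guard in whatever tooling publishes the validation assets. This is a minimal sketch, not part of the plan: `is_write_allowed` is a hypothetical helper, and treating the entire final materialization root as read-only is an assumption drawn from the "do not modify generated skill scripts" and "do not rematerialize" items.

```python
from pathlib import PurePosixPath

# Files this plan explicitly forbids modifying or updating.
FORBIDDEN_FILES = {
    "src/generated_scene/analyzer.rs",
    "src/generated_scene/generator.rs",
    "tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json",
}

# Assumption: the whole materialization root is treated as read-only.
READ_ONLY_ROOT = PurePosixPath(
    "examples/scene_skill_102_final_materialization_2026-04-19"
)


def is_write_allowed(repo_relative_path: str) -> bool:
    """Return True if a validation step may write to this repo-relative path."""
    path = PurePosixPath(repo_relative_path)
    if str(path) in FORBIDDEN_FILES:
        return False
    # Anything under the materialized skill set must stay untouched.
    return not path.is_relative_to(READ_ONLY_ROOT)
```

A publishing step could call this guard before every write and abort on the first `False`.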
## Workstreams
1. `WS1` Static package validation
2. `WS2` Deterministic invocation dry-run validation
3. `WS3` Mock runtime validation matrix
4. `WS4` Pseudo-production readiness checklist
5. `WS5` Validation report and next-stage decision
## Phase 0: Freeze Validation Baseline
### Objective
Freeze the `102` final skill set as the input to validation.
### Tasks
1. Confirm final materialization count is `102 / 102`.
2. Confirm materialization failure count is `0`.
3. Confirm deterministic invocation readiness is `102 / 102`.
4. Confirm this plan does not rematerialize skills.
### Deliverables
1. validation baseline section in final report
### Acceptance Criteria
1. validation begins from the final materialization root
2. no source scene directories are rescanned
3. no generated scene logic is changed
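
The baseline confirmations in this phase could be expressed as a single gate that runs before any validation work. The sketch below is illustrative only: the keys `materialized`, `total`, `failures`, and `ready` are hypothetical, because this plan does not specify the schema of the manifest, failure, or readiness fixtures.

```python
EXPECTED_TOTAL = 102  # scene count stated in this plan


def check_baseline(manifest: dict, failures: dict, readiness: dict) -> list[str]:
    """Return named baseline problems; an empty list means the baseline holds."""
    problems = []
    # Hypothetical keys: adapt to the real fixture schema.
    if (manifest.get("materialized") != EXPECTED_TOTAL
            or manifest.get("total") != EXPECTED_TOTAL):
        problems.append("materialization-count-mismatch")
    if len(failures.get("failures", [])) != 0:
        problems.append("materialization-failures-present")
    if readiness.get("ready") != EXPECTED_TOTAL:
        problems.append("invocation-readiness-mismatch")
    return problems
```

Returning named problems rather than a boolean keeps the baseline section of the final report explicit about which confirmation failed.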
## Phase 1: Static Package Validation
### Objective
Validate that all `102` skill packages are structurally complete and parseable.
### Tasks
For each skill:
1. check `SKILL.toml`
2. check `SKILL.md`
3. check `scene.toml`
4. check `references/generation-report.json`
5. check at least one script under `scripts/`
6. parse TOML and JSON files
7. compare `sceneId`, display name, archetype, readiness, suffix, and keyword fields against index and manifest assets
### Deliverables
1. `scene_skill_102_static_validation_2026-04-20.json`
### Acceptance Criteria
1. every scene has exactly one static validation record
2. every static failure has a named reason
3. total records equal `102`
## Phase 2: Deterministic Invocation Dry-Run Validation
### Objective
Validate deterministic dispatch selection on the `。。。` suffix (U+3002, the ideographic full stop, repeated three times) without executing selected skills.
### Tasks
For each skill:
1. construct one canonical utterance from the scene display name followed by the `。。。` suffix (U+3002 repeated three times)
2. optionally construct one keyword-based utterance when safe
3. dry-run deterministic selection against the skill registry
4. record selected skill id, ambiguity count, and no-match status
### Deliverables
1. `scene_skill_102_dispatch_dry_run_validation_2026-04-20.json`
### Acceptance Criteria
1. every complete skill has a dispatch dry-run result
2. ambiguous and no-match outcomes are explicit
3. no selected skill is executed
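
A dry-run selector for this phase could be sketched as below. The matching rule (exact display-name match after stripping the suffix), the registry shape (skill id mapped to display name), and the definition of `ambiguityCount` as the number of candidates beyond the first are all assumptions made for illustration; the real dispatcher may select differently. Nothing in the sketch executes a skill.

```python
SUFFIX = "\u3002" * 3  # U+3002 repeated three times


def dry_run_dispatch(utterance: str, registry: dict[str, str]) -> dict:
    """Select a skill by display name without executing it.

    `registry` maps skill id -> display name (hypothetical shape).
    """
    record = {
        "utterance": utterance,
        "selectedSkillId": None,
        "ambiguityCount": 0,
        "noMatch": True,
    }
    if not utterance.endswith(SUFFIX):
        return record  # no deterministic suffix, so nothing is selected
    name = utterance[: -len(SUFFIX)].strip()
    candidates = sorted(sid for sid, display in registry.items() if display == name)
    record["noMatch"] = not candidates
    # Assumption: ambiguity counts the extra candidates beyond the first.
    record["ambiguityCount"] = max(len(candidates) - 1, 0)
    if len(candidates) == 1:
        record["selectedSkillId"] = candidates[0]
    return record
```

Because ambiguous and no-match outcomes are fields on every record, they stay explicit in the published JSON rather than being inferred from a missing selection.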
## Phase 3: Mock Runtime Validation Matrix
### Objective
Define and, where safe, prepare mock runtime validation by archetype.
### Tasks
1. group `102` skills by workflow archetype
2. identify one to three representatives per archetype
3. define mock dependencies required by each archetype:
   - fake fetch
   - fake browser DOM
   - fake host bridge
   - fake local-doc service
   - fake artifact writer
4. classify each skill as:
   - `mock-covered-by-representative`
   - `mock-needs-harness`
   - `mock-not-safe-yet`
### Deliverables
1. `scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json`
### Acceptance Criteria
1. every scene is assigned a mock-runtime coverage status
2. every archetype has a named harness requirement
3. this phase does not require real network or browser credentials
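
The grouping and classification tasks in this phase could be assembled into a matrix as sketched below. The inputs `safe_archetypes` and `harness_ready` are hypothetical: the plan does not say how safety or harness availability is decided, so the sketch takes them as given sets. Picking representatives by sorted scene id is likewise an arbitrary placeholder for a real selection rule.

```python
from collections import defaultdict

MAX_REPRESENTATIVES = 3  # the plan asks for one to three per archetype


def build_mock_matrix(skills: list[dict], safe_archetypes: set[str],
                      harness_ready: set[str]) -> list[dict]:
    """Assign every scene a mock-runtime coverage status by archetype."""
    by_archetype = defaultdict(list)
    for skill in skills:
        by_archetype[skill["archetype"]].append(skill["sceneId"])

    matrix = []
    for archetype, scene_ids in sorted(by_archetype.items()):
        ordered = sorted(scene_ids)
        representatives = set(ordered[:MAX_REPRESENTATIVES])
        if archetype not in safe_archetypes:
            status = "mock-not-safe-yet"
        elif archetype in harness_ready:
            status = "mock-covered-by-representative"
        else:
            status = "mock-needs-harness"
        for scene_id in ordered:
            matrix.append({
                "sceneId": scene_id,
                "archetype": archetype,
                "isRepresentative": scene_id in representatives,
                "mockRuntimeCoverageStatus": status,
            })
    return matrix
```

Every input scene appears exactly once in the output, which matches the criterion that each scene is assigned a coverage status.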
## Phase 4: Pseudo-Production Readiness Checklist
### Objective
Define what evidence is required before real-environment validation can start.
### Tasks
For each scene or archetype:
1. record required runtime dependencies
2. record expected artifact type
3. record whether host bridge, browser, localhost service, or document pipeline is required
4. define required execution evidence:
   - console logs
   - network logs
   - screenshots
   - exported files
   - generated artifact metadata
5. define failure taxonomy:
   - `login-blocked`
   - `network-blocked`
   - `host-bridge-blocked`
   - `data-mismatch`
   - `artifact-mismatch`
   - `environment-unavailable`
   - `runtime-error`
### Deliverables
1. `scene_skill_102_pseudoprod_readiness_2026-04-20.json`
### Acceptance Criteria
1. every scene has a pseudo-production readiness record
2. every real-environment blocker has a named category
3. no production credentials are required by this phase
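
A readiness record for this phase could be modeled so that unnamed blocker categories are rejected at construction time. The sketch below is one possible shape, not the published schema: the `ReadinessRecord` class, its field names, and the kebab-case evidence identifiers are assumptions, while the failure taxonomy values are taken from the list above.

```python
from dataclasses import dataclass, field

# Failure categories as listed in this plan.
FAILURE_TAXONOMY = frozenset({
    "login-blocked", "network-blocked", "host-bridge-blocked", "data-mismatch",
    "artifact-mismatch", "environment-unavailable", "runtime-error",
})
# Hypothetical identifiers for the evidence types named in this plan.
EVIDENCE_TYPES = frozenset({
    "console-logs", "network-logs", "screenshots", "exported-files",
    "generated-artifact-metadata",
})


@dataclass
class ReadinessRecord:
    scene_id: str
    runtime_dependencies: list[str]
    expected_artifact_type: str
    required_evidence: list[str]
    blockers: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any blocker or evidence type without a named category.
        unknown = (set(self.blockers) - FAILURE_TAXONOMY) | (
            set(self.required_evidence) - EVIDENCE_TYPES
        )
        if unknown:
            raise ValueError(f"unnamed categories: {sorted(unknown)}")
```

Validating at construction means a record with an ad hoc blocker label cannot reach the readiness JSON.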
## Phase 5: Publish Validation Report
### Objective
Publish one report that separates static readiness, dispatch readiness, mock readiness, and pseudo-production readiness.
### Tasks
1. summarize static validation results
2. summarize dispatch dry-run results
3. summarize mock runtime coverage matrix
4. summarize pseudo-production readiness categories
5. recommend whether to start real-environment validation and at what batch size
### Deliverables
1. `docs/superpowers/reports/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-report.md`
### Acceptance Criteria
1. report explains that `102 / 102` materialization is not the same as `102 / 102` production execution
2. report lists remaining validation blockers, if any
3. report does not promote any scene to real executed-pass
## Expected Status Outputs
This plan should produce these independent status counts:
1. `staticValidationStatus`
2. `dispatchDryRunStatus`
3. `mockRuntimeCoverageStatus`
4. `pseudoProductionReadinessStatus`
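
Each of these counts could be derived independently from its own asset. As a minimal sketch, assuming each asset is a list of records that carries the status under the field named above (the record shape is hypothetical):

```python
from collections import Counter


def summarize_status(records: list[dict], status_field: str) -> dict[str, int]:
    """Count records per status value; records lacking the field count as 'missing'."""
    return dict(Counter(r.get(status_field, "missing") for r in records))
```

Computing each count from a separate asset keeps, for example, a static pass from being read as a dispatch or mock-runtime pass.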
## Completion Criteria
This plan is complete when:
1. all planned validation assets are published
2. all `102` scenes have static validation records
3. all `102` scenes have dispatch dry-run records
4. all `102` scenes have mock runtime matrix records
5. all `102` scenes have pseudo-production readiness records
6. the validation report is published
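
The per-asset coverage criteria above could be checked mechanically before declaring the plan complete. This sketch assumes every record carries a `sceneId` key and that the expected set of scene ids is loaded from the index; both are assumptions about fixture shape.

```python
def find_coverage_gaps(expected_scene_ids: set[str],
                       assets: dict[str, list[dict]]) -> dict[str, dict]:
    """Report missing and duplicated scene ids per validation asset."""
    gaps = {}
    for asset_name, records in assets.items():
        ids = [r["sceneId"] for r in records]
        missing = sorted(expected_scene_ids - set(ids))
        duplicated = sorted({sid for sid in ids if ids.count(sid) > 1})
        if missing or duplicated:
            gaps[asset_name] = {"missing": missing, "duplicated": duplicated}
    return gaps
```

An empty result across all four assets, together with a published report, would correspond to the completion criteria listed here.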
## Stop Statement
Stop after publishing the static validation, dispatch dry-run, mock runtime matrix, and pseudo-production readiness assets, together with the validation report.
Do not execute real production validation under this plan.