# Scene Skill 102 Mock Runtime Harness Implementation Design
> Date: 2026-04-20
> Status: Draft
> Upstream Validation: `docs/superpowers/plans/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-plan.md`
> Input Matrix: `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json`
## Intent
Define a bounded implementation stage for mock runtime harnesses after the `102` materialized skill set has passed static package validation and deterministic dispatch dry-run.
This design is not production validation. It exists to prove that generated skill scripts can be loaded and exercised against controlled fake dependencies before any real browser, host bridge, or production system is touched.
## Current Baseline
From `scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json`:
| Archetype | Count | Representative scenes |
| --- | ---: | --- |
| `paginated_enrichment` | 51 | `sweep-001-scene`, `sweep-002-scene`, `sweep-003-scene` |
| `host_bridge_workflow` | 26 | `sweep-007-scene`, `sweep-009-scene`, `sweep-010-scene` |
| `multi_mode_request` | 10 | `sweep-020-scene`, `sweep-023-scene`, `sweep-030-scene` |
| `local_doc_pipeline` | 6 | `sweep-012-scene`, `sweep-017-scene`, `sweep-019-scene` |
| `single_request_enrichment` | 5 | `sweep-013-scene`, `sweep-016-scene`, `sweep-068-scene` |
| `multi_endpoint_inventory` | 2 | `sweep-084-scene`, `sweep-085-scene` |
| `page_state_eval` | 2 | `sweep-066-scene`, `sweep-094-scene` |
Current matrix status:
| Status | Count |
| --- | ---: |
| `mock-covered-by-representative` | 19 |
| `mock-needs-harness` | 83 |
Important interpretation:
`mock-covered-by-representative` currently reflects representative selection only. It does not mean that any script has been executed in a mock runtime.
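
The two baseline tables describe the same scene set from different angles, so their totals should agree. The following is a minimal sketch of that cross-check, with the counts copied from the tables above; the names `archetypeRows`, `statusRows`, and `baselineIsConsistent` are hypothetical and do not come from the matrix file.

```typescript
// Counts copied from the archetype table in this section.
const archetypeRows: Array<{ archetype: string; count: number; representatives: number }> = [
  { archetype: "paginated_enrichment", count: 51, representatives: 3 },
  { archetype: "host_bridge_workflow", count: 26, representatives: 3 },
  { archetype: "multi_mode_request", count: 10, representatives: 3 },
  { archetype: "local_doc_pipeline", count: 6, representatives: 3 },
  { archetype: "single_request_enrichment", count: 5, representatives: 3 },
  { archetype: "multi_endpoint_inventory", count: 2, representatives: 2 },
  { archetype: "page_state_eval", count: 2, representatives: 2 },
];

// Counts copied from the matrix status table in this section.
const statusRows: Array<{ status: string; count: number }> = [
  { status: "mock-covered-by-representative", count: 19 },
  { status: "mock-needs-harness", count: 83 },
];

// Total number of scenes across all archetypes.
function totalScenes(): number {
  return archetypeRows.reduce((sum, row) => sum + row.count, 0);
}

// Total number of representative scenes listed in the archetype table.
function totalRepresentatives(): number {
  return archetypeRows.reduce((sum, row) => sum + row.representatives, 0);
}

// Total number of scenes across all matrix statuses.
function totalStatuses(): number {
  return statusRows.reduce((sum, row) => sum + row.count, 0);
}

// Both tables must account for the same expected number of scenes.
function baselineIsConsistent(expected: number): boolean {
  return totalScenes() === expected && totalStatuses() === expected;
}
```

With the numbers above, both totals come to 102. The table lists 19 representative scenes, the same as the `mock-covered-by-representative` count, which suggests (the source does not state this) that the status is currently assigned to exactly the representative scenes.
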
## Harness Layers
### Layer 1: Script Load Harness
Purpose:
1. load generated browser scripts in a controlled JavaScript runtime
2. verify the entry module does not fail during parse/load
3. verify referenced helper files are present
Output status:
`script-load-pass` or `script-load-fail`
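
As a rough illustration of the Layer 1 checks, the sketch below parse-checks a script source and confirms that its referenced helper files are present. It assumes the generated scripts are classic scripts (not ES modules) and uses the standard `Function` constructor as a parse-only stand-in for a controlled runtime. The function name `checkScriptLoad` and its inputs are hypothetical; a real harness would read them from the generated skill package.

```typescript
type ScriptLoadStatus = "script-load-pass" | "script-load-fail";

interface ScriptLoadResult {
  status: ScriptLoadStatus;
  missingHelpers: string[];
  error?: string;
}

// Hypothetical inputs: `source` is the entry script text, `helperRefs` are the
// helper file names it references, `availableFiles` are the files in the package.
function checkScriptLoad(
  source: string,
  helperRefs: string[],
  availableFiles: string[],
): ScriptLoadResult {
  // Helper files that are referenced but not present in the package.
  const missingHelpers = helperRefs.filter((ref) => availableFiles.indexOf(ref) === -1);

  try {
    // The Function constructor parses the text as a function body without
    // running it, so syntax errors surface here with no script side effects.
    new Function(source);
  } catch (err) {
    return { status: "script-load-fail", missingHelpers, error: String(err) };
  }

  if (missingHelpers.length > 0) {
    return { status: "script-load-fail", missingHelpers };
  }
  return { status: "script-load-pass", missingHelpers };
}
```

This only approximates a load check. It does not execute top-level code, it rejects `import`/`export` syntax, and it accepts a top-level `return` that a real script would not. A fuller harness would need an isolated runtime context to cover those cases.
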
### Layer 2: Fake Dependency Harness
Purpose:
Provide controlled fake implementations for:
1. `fetch`
2. browser DOM
3. host bridge action/callback
4. local document service
5. artifact writer
Output status:
`mock-dependency-ready` or `mock-dependency-missing`
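
To make the fake dependency idea concrete, here is a minimal sketch of a fake `fetch` that serves canned responses and records every call. The response shape is a small subset of the Fetch API, not the real `Response` type. The names `createFakeFetch`, `FakeRoute`, and `unmatched`, and the rule that any unmatched URL maps to `mock-dependency-missing`, are assumptions made for illustration.

```typescript
// Deliberately small subset of a fetch response; not the real Response type.
interface FakeResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// One canned response, keyed by exact URL (hypothetical shape).
interface FakeRoute {
  url: string;
  status: number;
  body: unknown;
}

type DependencyStatus = "mock-dependency-ready" | "mock-dependency-missing";

interface FakeFetchHarness {
  fetch(url: string): Promise<FakeResponse>;
  calls: string[];
  unmatched: string[];
  status(): DependencyStatus;
}

function createFakeFetch(routes: FakeRoute[]): FakeFetchHarness {
  const calls: string[] = [];
  const unmatched: string[] = [];
  return {
    calls,
    unmatched,
    fetch(url: string): Promise<FakeResponse> {
      calls.push(url); // record every attempted request
      const route = routes.filter((r) => r.url === url)[0];
      if (!route) {
        // No canned response: record the gap and answer with a non-ok status
        // so nothing can fall through to a real network call.
        unmatched.push(url);
        return Promise.resolve({ ok: false, status: 501, json: () => Promise.resolve(null) });
      }
      const body = route.body;
      return Promise.resolve({
        ok: route.status >= 200 && route.status < 300,
        status: route.status,
        json: () => Promise.resolve(body),
      });
    },
    // Assumed rule: any request without a fake counts as a missing dependency.
    status(): DependencyStatus {
      return unmatched.length === 0 ? "mock-dependency-ready" : "mock-dependency-missing";
    },
  };
}
```

The same record-and-replay pattern could plausibly extend to the other four dependencies (DOM, host bridge, local document service, artifact writer), with each fake exposing its recorded calls so the Layer 3 checks can inspect them.
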
### Layer 3: Representative Flow Harness
Purpose:
Run representative scene scripts far enough to prove control-flow integrity.
Checks:
1. expected request or host action is attempted
2. controlled empty-data response is handled
3. controlled non-empty response is normalized
4. artifact metadata is produced when declared
5. error response is reported through the structured failure path and does not crash the script
Output status:
`mock-runtime-pass`, `mock-runtime-fail`, or `mock-runtime-partial`
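
The five checks need some rule for collapsing into one of the three output statuses. This design does not define that rule, so the sketch below is only one possible reading: any failed check yields `mock-runtime-fail`, any check that was not run yields `mock-runtime-partial`, and checks that do not apply (for example, artifact metadata when none is declared) are ignored. The type and field names are hypothetical.

```typescript
// Outcome of a single check. "not-applicable" covers cases such as the
// artifact metadata check for a scene that declares no artifact.
type CheckOutcome = "pass" | "fail" | "skipped" | "not-applicable";

type FlowStatus = "mock-runtime-pass" | "mock-runtime-fail" | "mock-runtime-partial";

// One field per check in the list above (field names are hypothetical).
interface FlowChecks {
  requestOrHostActionAttempted: CheckOutcome;
  emptyResponseHandled: CheckOutcome;
  nonEmptyResponseNormalized: CheckOutcome;
  artifactMetadataProduced: CheckOutcome;
  errorStaysInFailurePath: CheckOutcome;
}

// Assumed aggregation rule: fail beats partial, partial beats pass.
function classifyFlow(checks: FlowChecks): FlowStatus {
  const outcomes: CheckOutcome[] = [
    checks.requestOrHostActionAttempted,
    checks.emptyResponseHandled,
    checks.nonEmptyResponseNormalized,
    checks.artifactMetadataProduced,
    checks.errorStaysInFailurePath,
  ];
  if (outcomes.indexOf("fail") !== -1) return "mock-runtime-fail";
  if (outcomes.indexOf("skipped") !== -1) return "mock-runtime-partial";
  return "mock-runtime-pass";
}
```

Whatever rule is finally adopted, recording the per-check outcomes alongside the aggregate status would keep a `mock-runtime-partial` result traceable to the specific check that was not exercised.
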
### Layer 4: Matrix Propagation
Purpose:
Propagate representative results to same-archetype scenes without claiming direct execution for every scene.
Output statuses:
1. `mock-runtime-representative-pass`
2. `mock-runtime-covered-by-representative`
3. `mock-runtime-needs-direct-run`
4. `mock-runtime-fail`
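
One way the propagation step could work is sketched below. It assumes a rule this design does not spell out: a representative takes its own direct result, and a non-representative scene is marked `mock-runtime-covered-by-representative` only when every representative of its archetype passed, otherwise `mock-runtime-needs-direct-run`. The interfaces and the `propagate` function are hypothetical.

```typescript
type PropagatedStatus =
  | "mock-runtime-representative-pass"
  | "mock-runtime-covered-by-representative"
  | "mock-runtime-needs-direct-run"
  | "mock-runtime-fail";

interface SceneRow {
  sceneId: string;
  archetype: string;
}

// Direct Layer 3 result for a representative scene (hypothetical shape).
interface RepresentativeResult {
  sceneId: string;
  archetype: string;
  passed: boolean;
}

interface PropagatedRow {
  sceneId: string;
  status: PropagatedStatus;
}

function propagate(scenes: SceneRow[], results: RepresentativeResult[]): PropagatedRow[] {
  return scenes.map((scene): PropagatedRow => {
    // A scene that was run directly keeps its own result.
    const direct = results.filter((r) => r.sceneId === scene.sceneId)[0];
    if (direct) {
      return {
        sceneId: scene.sceneId,
        status: direct.passed ? "mock-runtime-representative-pass" : "mock-runtime-fail",
      };
    }
    // Otherwise inherit coverage only if all same-archetype representatives passed.
    const reps = results.filter((r) => r.archetype === scene.archetype);
    const allPassed = reps.length > 0 && reps.every((r) => r.passed);
    return {
      sceneId: scene.sceneId,
      status: allPassed ? "mock-runtime-covered-by-representative" : "mock-runtime-needs-direct-run",
    };
  });
}
```

Under this rule a failing representative never marks its siblings as failed. It only withdraws the coverage claim, which keeps the matrix from asserting anything about scenes that were not executed.
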
## Route Order
The route order is fixed:
1. `Route 1`: `paginated_enrichment` mock harness
2. `Route 2`: `multi_mode_request` and `single_request_enrichment` mock harnesses
3. `Route 3`: `multi_endpoint_inventory` and `page_state_eval` mock harnesses
4. `Route 4`: `local_doc_pipeline` and `host_bridge_workflow` mock harnesses
5. `Route 5`: publish integrated mock runtime validation report
Rationale:
1. `paginated_enrichment` is the largest bucket and should validate the most reused generated flow first.
2. `multi_mode_request` and `single_request_enrichment` are mainline API flows and can share fake fetch infrastructure.
3. `multi_endpoint_inventory` and `page_state_eval` are small buckets and should be validated after the mainline fetch harness exists.
4. `local_doc_pipeline` and `host_bridge_workflow` require more specialized fakes and must not drive the first harness implementation.
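
The route plan can also be written down as data, which makes the scene count behind each route explicit. The sketch below combines the route order above with the archetype counts from the baseline table; `ROUTE_PLAN`, `scenesInRoute`, and `routeForArchetype` are hypothetical names. `Route 5` is omitted because it publishes the report and covers no archetype.

```typescript
interface RouteEntry {
  route: number;
  archetypes: Array<{ name: string; scenes: number }>;
}

// Route order from this section; scene counts from the baseline archetype table.
const ROUTE_PLAN: RouteEntry[] = [
  { route: 1, archetypes: [{ name: "paginated_enrichment", scenes: 51 }] },
  {
    route: 2,
    archetypes: [
      { name: "multi_mode_request", scenes: 10 },
      { name: "single_request_enrichment", scenes: 5 },
    ],
  },
  {
    route: 3,
    archetypes: [
      { name: "multi_endpoint_inventory", scenes: 2 },
      { name: "page_state_eval", scenes: 2 },
    ],
  },
  {
    route: 4,
    archetypes: [
      { name: "local_doc_pipeline", scenes: 6 },
      { name: "host_bridge_workflow", scenes: 26 },
    ],
  },
];

// Number of scenes covered by a route; 0 for an unknown route.
function scenesInRoute(route: number): number {
  const entry = ROUTE_PLAN.filter((r) => r.route === route)[0];
  return entry ? entry.archetypes.reduce((sum, a) => sum + a.scenes, 0) : 0;
}

// Route that handles an archetype, or undefined if the archetype is unknown.
function routeForArchetype(name: string): number | undefined {
  const entry = ROUTE_PLAN.filter((r) => r.archetypes.some((a) => a.name === name))[0];
  return entry ? entry.route : undefined;
}
```

By these counts the routes cover 51, 15, 4, and 32 scenes respectively. Routes 1 and 2 together reach 66 of the 102 scenes with fetch-based fakes before any specialized fake is needed.
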
## Scope Guardrails
Allowed:
1. add mock validation harness files
2. add mock validation tests
3. read generated final skill packages
4. execute generated scripts only inside a mock runtime
5. publish mock runtime validation result assets and reports
Forbidden:
1. modifying generated skill scripts under `examples/scene_skill_102_final_materialization_2026-04-19/skills`
2. modifying `src/generated_scene/analyzer.rs`
3. modifying `src/generated_scene/generator.rs`
4. rematerializing the `102` skills
5. updating `scene_execution_board_2026-04-18.json`
6. starting real browser execution
7. connecting to real business systems
8. requiring production credentials, VPN, SSO, or internal network access
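
The path-based items in the forbidden list lend themselves to an automated check over a change set. The sketch below flags any changed path that touches a protected location. It assumes repository-relative paths with forward slashes, and it matches `scene_execution_board_2026-04-18.json` by file name because this design does not give its directory. The function `forbiddenChanges` is hypothetical, and the runtime guardrails (real browser, real systems, credentials) are outside what a path check can cover.

```typescript
// Directory whose contents must not be modified.
const FORBIDDEN_PREFIXES = ["examples/scene_skill_102_final_materialization_2026-04-19/skills/"];

// Exact files that must not be modified.
const FORBIDDEN_FILES = ["src/generated_scene/analyzer.rs", "src/generated_scene/generator.rs"];

// Matched by file name only, since the directory is not stated in this design.
const FORBIDDEN_BASENAMES = ["scene_execution_board_2026-04-18.json"];

// Last path segment of a forward-slash path.
function baseName(path: string): string {
  return path.slice(path.lastIndexOf("/") + 1);
}

// Returns the subset of changed paths that violate the path-based guardrails.
function forbiddenChanges(changedPaths: string[]): string[] {
  return changedPaths.filter(
    (path) =>
      FORBIDDEN_PREFIXES.some((prefix) => path.indexOf(prefix) === 0) ||
      FORBIDDEN_FILES.indexOf(path) !== -1 ||
      FORBIDDEN_BASENAMES.indexOf(baseName(path)) !== -1,
  );
}
```

An empty result means only that no protected path was touched. It says nothing about whether the harness stayed inside the mock runtime.
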
## Expected Assets
1. `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_harness_results_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-scene-skill-102-mock-runtime-harness-report.md`
## Stop Rules
Stop after representative mock runtime results and the integrated report are published.
Do not continue into production validation under this plan.
Do not claim `102 / 102` real runtime pass from mock results.