# Generated Scene Source-First Runtime Semantics Ledger Design
> Date: 2026-04-20
> Status: Draft
> Parent roadmap:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-plan.md`
> Upstream scan:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-evidence-cross-scan-plan.md`
## Intent
Define the second bounded child step of the source-first runtime semantics hardening roadmap:
`merge source-side evidence with generated-skill evidence into a full 102-scene runtime-semantics ledger`
This design is still analysis-only. It does not modify `src/`, generated skills, validation assets, or execution-board state.
## Objective
For every scene in the current 102-scene set:
1. merge source-side evidence from the completed cross-scan
2. compare that evidence against current generated skill manifests and references
3. assign one or more canonical runtime-semantics gap classes
4. assign a bounded `riskLevel`
5. distinguish:
   - reusable generator-level rule gap
   - runtime-only residual
6. publish a source-first runtime-semantics ledger that becomes the only valid input for later hardening-route design
## Fixed Gap Taxonomy
The ledger must continue using the five gap classes already anchored by `sweep-030-scene`:
1. `invocation_alias_gap`
2. `dictionary_recovery_gap`
3. `parameter_default_semantics_gap`
4. `resolver_to_request_mapping_gap`
5. `runtime_url_semantics_gap`
No additional gap class should be invented inside this ledger stage unless the evidence is clearly outside these five and cannot be expressed as a subtype.
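
A minimal sketch of how a ledger builder could enforce the fixed taxonomy. The five string values come from the list above; the `GapClass` enum and the `parse_gaps` helper are hypothetical names introduced only for illustration.

```python
from enum import Enum


class GapClass(str, Enum):
    """The five fixed gap classes named in this design."""

    INVOCATION_ALIAS = "invocation_alias_gap"
    DICTIONARY_RECOVERY = "dictionary_recovery_gap"
    PARAMETER_DEFAULT_SEMANTICS = "parameter_default_semantics_gap"
    RESOLVER_TO_REQUEST_MAPPING = "resolver_to_request_mapping_gap"
    RUNTIME_URL_SEMANTICS = "runtime_url_semantics_gap"


def parse_gaps(raw_gaps):
    """Convert raw strings to GapClass members, rejecting unknown classes."""
    parsed = []
    for raw in raw_gaps:
        try:
            parsed.append(GapClass(raw))
        except ValueError:
            # Hypothetical policy: fail loudly instead of inventing a sixth class.
            raise ValueError(f"unknown gap class: {raw!r}") from None
    return parsed
```

Rejecting unknown values at parse time is one way to keep a new class from slipping into the ledger unnoticed. Evidence that truly falls outside the five would still need a deliberate taxonomy change.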
## Scope
In scope:
1. the completed source cross-scan asset
2. the current final generated skills under `examples/scene_skill_102_final_materialization_2026-04-19/skills`
3. current deterministic invocation readiness assets
4. current natural-language parameter readiness assets
5. current parameter dictionary normalization assets
6. source-to-generated comparison for all 102 scenes
7. JSON ledger + human-readable report
Out of scope:
1. any change in `src/`
2. any skill manifest or script edit
3. any rematerialization
4. any validation rerun
5. any inner-network execution
## Required Comparisons
The ledger stage must compare source evidence with generated output along these axes.
### 1. Invocation alias comparison
Check whether source-side operator wording, labels, route names, or titles imply broader natural-language coverage than the current generated `include_keywords`.
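
The alias check can be sketched as a coverage test. This assumes a source term counts as covered when any generated keyword appears inside it after case-folding. That matching rule and the `missing_aliases` name are assumptions, not the generator's actual matching logic.

```python
def missing_aliases(source_terms, include_keywords):
    """Return source-side terms that no generated keyword covers.

    Assumed rule: a term is covered when some keyword is a
    case-insensitive substring of it.
    """
    # Drop empty keywords, since "" is a substring of every string.
    keywords = [k.strip().casefold() for k in include_keywords if k.strip()]
    missing = []
    for term in source_terms:
        normalized = term.strip().casefold()
        if not normalized:
            continue
        if not any(keyword in normalized for keyword in keywords):
            missing.append(term)
    return missing
```

A non-empty result would be a candidate signal for `invocation_alias_gap`, subject to manual review of the uncovered terms.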
### 2. Dictionary comparison
Check whether source-side dictionaries, trees, or option arrays imply a richer entity dictionary than the generated `references/*dictionary*.json` assets currently expose.
### 3. Parameter default semantics comparison
Check whether source-side date / period / mode initialization implies a default-value policy that the generated manifest or resolver metadata does not currently preserve.
### 4. Resolver-to-request mapping comparison
Check whether source-side request field names differ from generated resolver output names and whether the generated skill currently encodes an explicit mapping.
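
One possible shape for this check, assuming request fields and resolver outputs are available as plain name lists and any explicit mapping is a dict from resolver output name to request field name. The function name and the input shapes are illustrative assumptions.

```python
def unmapped_request_fields(request_fields, resolver_outputs, explicit_mapping=None):
    """Return request fields that no resolver output reaches.

    A request field is reachable when a resolver output has the same
    name, or when the explicit mapping sends a resolver output to it.
    """
    explicit_mapping = explicit_mapping or {}
    outputs = set(resolver_outputs)
    reachable = set(outputs)
    for output_name, request_name in explicit_mapping.items():
        # Ignore mapping entries whose resolver output does not exist.
        if output_name in outputs:
            reachable.add(request_name)
    return sorted(name for name in set(request_fields) if name not in reachable)
```

For example, with hypothetical names, `unmapped_request_fields(["orgCode"], ["org"])` reports `orgCode` as unreachable, while passing `{"org": "orgCode"}` as the explicit mapping clears it.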
### 5. Runtime URL comparison
Check whether source-side evidence implies multiple URL roles:
1. app entry URL
2. module route URL
3. API endpoint URL
4. runtime browser context URL
Also check whether the generated skill currently collapses those roles into a single ambiguous target.
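
The four URL roles can be modelled explicitly so that a collapse is detectable. The sketch below is an assumption about representation only: `UrlRoles`, its field names, and `collapses_url_roles` are hypothetical, and the URLs in the usage note are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UrlRoles:
    """URLs recovered from source evidence, one slot per role."""

    app_entry: Optional[str] = None
    module_route: Optional[str] = None
    api_endpoint: Optional[str] = None
    runtime_context: Optional[str] = None

    def distinct_urls(self):
        """Return the set of non-empty URLs across all four roles."""
        candidates = (
            self.app_entry,
            self.module_route,
            self.api_endpoint,
            self.runtime_context,
        )
        return {url for url in candidates if url}


def collapses_url_roles(source_roles, generated_targets):
    """True when the source shows several distinct URLs but the
    generated skill exposes at most one target."""
    return len(source_roles.distinct_urls()) > 1 and len(set(generated_targets)) <= 1
```

As a usage illustration, a source with `app_entry="https://example.com/app"` and `api_endpoint="https://example.com/api/query"` compared against a single generated target would return `True`, which would be a candidate `runtime_url_semantics_gap`.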
## Ledger Schema
Each scene record in the runtime-semantics ledger should include:
1. `sceneId`
2. `sceneName`
3. `sourceDir`
4. `archetype`
5. `readiness`
6. `riskLevel`
7. `gaps`
8. `generatorLevelGap`
9. `runtimeOnlyResidual`
10. `recommendedFixRoutes`
11. `sourceEvidenceSummary`
12. `generatedEvidenceSummary`
13. `comparisonNotes`
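
A sketch of the record as a typed structure. The thirteen field names come from the list above. The Python types are assumptions, since this design does not fix them, and `missing_fields` is a hypothetical helper.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class SceneLedgerRecord:
    """One scene entry in the runtime-semantics ledger (types assumed)."""

    sceneId: str
    sceneName: str
    sourceDir: str
    archetype: str
    readiness: str
    riskLevel: str  # "high", "medium", or "low"
    gaps: List[str]  # gap class names from the fixed taxonomy
    generatorLevelGap: bool
    runtimeOnlyResidual: bool
    recommendedFixRoutes: List[str]
    sourceEvidenceSummary: str
    generatedEvidenceSummary: str
    comparisonNotes: str


def missing_fields(raw_record):
    """List schema fields absent from a raw dict, in schema order."""
    return [f.name for f in fields(SceneLedgerRecord) if f.name not in raw_record]
```

Running `missing_fields` over each raw JSON object before constructing the dataclass gives a simple completeness check for all 102 records.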
## Risk-Level Rules
The ledger should use bounded, reproducible risk levels:
### `high`
Use when the scene has strong source evidence for one or more gap classes and the current generated skill visibly lacks equivalent semantics.
### `medium`
Use when the scene has source evidence for one or more gap classes, but current generated output appears partially aligned or the mismatch is plausible rather than explicit.
### `low`
Use when source evidence exists but generated output already appears materially aligned, or when the residual is likely runtime-only rather than generator-level.
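
To keep the levels reproducible, the rules above could be encoded as a small decision function. The input labels (`"strong"`/`"weak"` evidence, `"missing"`/`"partial"`/`"aligned"` alignment) and the order in which the rules are checked are assumptions made for this sketch. The design itself only states the three outcomes.

```python
EVIDENCE_LEVELS = {"strong", "weak"}
ALIGNMENT_LEVELS = {"missing", "partial", "aligned"}


def assign_risk_level(evidence, alignment, runtime_only_likely=False):
    """Map assumed evidence and alignment labels to a riskLevel."""
    if evidence not in EVIDENCE_LEVELS:
        raise ValueError(f"unknown evidence label: {evidence!r}")
    if alignment not in ALIGNMENT_LEVELS:
        raise ValueError(f"unknown alignment label: {alignment!r}")
    # high: strong source evidence and the generated skill visibly lacks it.
    if evidence == "strong" and alignment == "missing":
        return "high"
    # low: output already materially aligned, or residual likely runtime-only.
    if alignment == "aligned" or runtime_only_likely:
        return "low"
    # medium: partial alignment, or a plausible but not explicit mismatch.
    return "medium"
```

Checking the `high` rule first is a conservative choice: a scene with strong, visibly unmet source evidence stays `high` even if part of its residual is runtime-only.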
## Generator-Level vs Runtime-Only
The ledger must classify whether a scene's residuals should later drive generator hardening or should remain runtime-only.
### `generatorLevelGap = true`
Use when source evidence proves the generated skill is missing semantics that should be recoverable during generation.
### `runtimeOnlyResidual = true`
Use when the remaining risk is primarily:
1. login / session
2. host runtime behavior
3. local-doc / host-bridge environment
4. inner-network-only execution context
and not a generation-semantic omission.
These two flags are not mutually exclusive. When a scene sets both, the ledger must explain why.
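
A small consistency check for the two flags might look like this. It assumes the explanation lives in `comparisonNotes`, which this design does not specify, and `flag_issues` is a hypothetical helper name.

```python
def flag_issues(record):
    """Return a list of problems with the two classification flags.

    Assumption: when both flags are set, the explanation is expected
    in the record's comparisonNotes field.
    """
    issues = []
    both_set = bool(record.get("generatorLevelGap")) and bool(record.get("runtimeOnlyResidual"))
    notes = str(record.get("comparisonNotes") or "").strip()
    if both_set and not notes:
        issues.append("both flags set without an explanation in comparisonNotes")
    return issues
```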
## Inputs
Primary inputs:
1. `tests/fixtures/generated_scene/generated_scene_source_evidence_cross_scan_2026-04-20.json`
2. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
3. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
5. `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`
Anchor runtime findings:
1. the confirmed `sweep-030-scene` inner-network findings:
   - alias mismatch
   - starter-subset org dictionary
   - page-semantic default period behavior
   - request-field mismatch
   - runtime context URL ambiguity
## Output Artifacts
### JSON
- `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
### Report
- `docs/superpowers/reports/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-report.md`
The report must answer:
1. how many scenes are `high`, `medium`, and `low`
2. how many scenes carry each gap class
3. how many scenes appear to require generator-level fixes
4. how many scenes look runtime-only
5. which route clusters are likely to yield the highest reuse
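
The five report questions reduce to counts over the ledger records. The sketch below assumes each record is a dict using the schema field names, that "runtime-only" means `runtimeOnlyResidual` is true while `generatorLevelGap` is false, and that route reuse can be approximated by how often a route appears in `recommendedFixRoutes`. All three are assumptions, and `summarize_ledger` is a hypothetical name.

```python
from collections import Counter


def summarize_ledger(records):
    """Aggregate the counts the report is required to answer."""
    risk_counts = Counter(r["riskLevel"] for r in records)
    gap_counts = Counter(gap for r in records for gap in r["gaps"])
    generator_level = sum(1 for r in records if r["generatorLevelGap"])
    # Assumed reading of "runtime-only": residual flag set, no generator gap.
    runtime_only = sum(
        1 for r in records if r["runtimeOnlyResidual"] and not r["generatorLevelGap"]
    )
    route_counts = Counter(route for r in records for route in r["recommendedFixRoutes"])
    return {
        "riskLevels": {level: risk_counts.get(level, 0) for level in ("high", "medium", "low")},
        "gapClasses": dict(gap_counts),
        "generatorLevelScenes": generator_level,
        "runtimeOnlyScenes": runtime_only,
        # Most frequently recommended routes first, as a rough reuse ranking.
        "routesByReuse": route_counts.most_common(),
    }
```

Frequency is only a proxy for reuse. The report may still need a qualitative note on which route clusters share an underlying generator rule.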
## Acceptance Criteria
This design is complete when:
1. it defines a full-scene ledger stage rather than scene-by-scene notes
2. it binds the ledger to the fixed five-gap taxonomy
3. it defines how source evidence and generated evidence are compared
4. it defines `riskLevel`, `generatorLevelGap`, and `runtimeOnlyResidual`
5. it remains analysis-only
## Stop Statement
Stop after publishing this ledger design and its child plan.
Do not execute the ledger build inside this design.