claw/docs/superpowers/specs/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-design.md

# Generated Scene Source-First Runtime Semantics Ledger Design

> Date: 2026-04-20
> Status: Draft
> Parent roadmap:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-plan.md`
> Upstream scan:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-evidence-cross-scan-plan.md`

## Intent

Define the second bounded child step of the source-first runtime semantics hardening roadmap:

`merge source-side evidence with generated-skill evidence into a full 102-scene runtime-semantics ledger`

This design is still analysis-only. It does not modify `src/`, generated skills, validation assets, or execution-board state.

## Objective

For every scene in the current 102-scene set:

1. merge source-side evidence from the completed cross-scan
2. compare that evidence against current generated skill manifests and references
3. assign one or more canonical runtime-semantics gap classes
4. assign a bounded `riskLevel`
5. distinguish between:
   - a reusable generator-level rule gap
   - a runtime-only residual
6. publish a source-first runtime-semantics ledger that becomes the only valid input for later hardening-route design
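
Because the objective applies to every scene in the set, a published ledger can be checked for completeness. The following is a minimal sketch, assuming the ledger JSON holds a list of records keyed by `sceneId`; the function name `coverage_issues` is hypothetical and not part of any existing tooling.

```python
EXPECTED_SCENE_COUNT = 102  # size of the current scene set named in this design


def coverage_issues(records, expected=EXPECTED_SCENE_COUNT):
    """Return human-readable problems with scene coverage (empty list if none)."""
    ids = [record["sceneId"] for record in records]
    unique_ids = set(ids)
    issues = []
    if len(unique_ids) != len(ids):
        issues.append("duplicate sceneId values")
    if len(unique_ids) != expected:
        issues.append(f"expected {expected} scenes, found {len(unique_ids)}")
    return issues
```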

## Fixed Gap Taxonomy

The ledger must continue using the five gap classes already anchored by `sweep-030-scene`:

1. `invocation_alias_gap`
2. `dictionary_recovery_gap`
3. `parameter_default_semantics_gap`
4. `resolver_to_request_mapping_gap`
5. `runtime_url_semantics_gap`

No additional gap class should be invented inside this ledger stage unless the evidence is clearly outside these five and cannot be expressed as a subtype.
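
One way to keep the taxonomy fixed in tooling is a closed enumeration that rejects unknown names. This is a sketch only: the `GapClass` and `parse_gaps` names are assumptions, while the five string values come from the list above.

```python
from enum import Enum


class GapClass(str, Enum):
    """The five canonical gap classes fixed by this design."""

    INVOCATION_ALIAS = "invocation_alias_gap"
    DICTIONARY_RECOVERY = "dictionary_recovery_gap"
    PARAMETER_DEFAULT_SEMANTICS = "parameter_default_semantics_gap"
    RESOLVER_TO_REQUEST_MAPPING = "resolver_to_request_mapping_gap"
    RUNTIME_URL_SEMANTICS = "runtime_url_semantics_gap"


def parse_gaps(raw_gaps):
    """Convert raw strings to GapClass members; unknown names raise ValueError."""
    return [GapClass(name) for name in raw_gaps]
```

Evidence that does not fit any of the five classes would surface as a `ValueError` here, which forces an explicit decision instead of a silently invented class.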

## Scope

In scope:

1. the completed source cross-scan asset
2. the current final generated skills under `examples/scene_skill_102_final_materialization_2026-04-19/skills`
3. current deterministic invocation readiness assets
4. current natural-language parameter readiness assets
5. current parameter dictionary normalization assets
6. source-to-generated comparison for all 102 scenes
7. JSON ledger + human-readable report

Out of scope:

1. any change in `src/`
2. any skill manifest or script edit
3. any rematerialization
4. any validation rerun
5. any inner-network execution

## Required Comparisons

The ledger stage must compare source evidence with generated output along these axes.

### 1. Invocation alias comparison

Check whether source-side operator wording, labels, route names, or titles imply broader natural-language coverage than the current generated `include_keywords`.
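
As an illustration, the alias check can be read as a set difference between source-side terms and the generated `include_keywords`. The sketch below assumes exact matching after trimming and lowercasing; a real comparison might need substring or fuzzy matching, and the function name is hypothetical.

```python
def missing_aliases(source_terms, include_keywords):
    """Return normalized source-side terms that no generated keyword covers."""
    keywords = {keyword.strip().lower() for keyword in include_keywords}
    terms = {term.strip().lower() for term in source_terms}
    return sorted(term for term in terms if term and term not in keywords)
```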

### 2. Dictionary comparison

Check whether source-side dictionaries, trees, or option arrays imply a richer entity dictionary than the generated `references/*dictionary*.json` assets currently expose.
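
A rough coverage measure can make "richer than the generated dictionary" concrete. This sketch assumes both sides can be flattened into lists of entity labels; the actual structure of the `references/*dictionary*.json` assets is not specified here, so the flattening step is left out.

```python
def dictionary_coverage(source_entries, generated_entries):
    """Report which source entries are absent from the generated dictionary."""
    source = set(source_entries)
    generated = set(generated_entries)
    missing = sorted(source - generated)
    # With no source entries there is nothing to recover, so coverage is full.
    coverage = 1.0 if not source else len(source & generated) / len(source)
    return {"missing": missing, "coverage": coverage}
```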

### 3. Parameter default semantics comparison

Check whether source-side date / period / mode initialization implies a default-value policy that the generated manifest or resolver metadata does not currently preserve.
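
The default-semantics check can be sketched as a per-parameter diff. The policy labels used below (for example `previous_month`) are placeholders invented for illustration, as is the assumption that both sides reduce to a flat parameter-to-policy mapping.

```python
def missing_default_policies(source_defaults, generated_defaults):
    """Return parameters whose source-side default policy is absent or different."""
    return {
        param: {"source": policy, "generated": generated_defaults.get(param)}
        for param, policy in source_defaults.items()
        if generated_defaults.get(param) != policy
    }
```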

### 4. Resolver-to-request mapping comparison

Check whether source-side request field names differ from generated resolver output names and whether the generated skill currently encodes an explicit mapping.
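
The mapping check reduces to asking which request fields have no resolver output behind them, either by identical name or through an explicit mapping. The sketch below assumes a mapping shaped as resolver output name to request field name; the field names in the usage note are hypothetical.

```python
def unmapped_request_fields(request_fields, resolver_outputs, explicit_mapping=None):
    """Return request fields that no resolver output reaches."""
    explicit_mapping = explicit_mapping or {}
    outputs = set(resolver_outputs)
    covered = set(outputs)  # identical names count as covered
    covered.update(
        target for name, target in explicit_mapping.items() if name in outputs
    )
    return sorted(set(request_fields) - covered)
```

For example, request fields `orgNo` and `period` against resolver outputs `org_code` and `period` leave `orgNo` unmapped until the mapping `{"org_code": "orgNo"}` is supplied.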

### 5. Runtime URL comparison

Check whether source-side evidence implies multiple URL roles:

1. app entry URL
2. module route URL
3. API endpoint URL
4. runtime browser context URL

and whether the generated skill currently collapses those roles into a single ambiguous target.
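
A collapse can be detected by counting distinct URLs per side. This is a sketch under assumptions: the role keys are snake_case renderings of the four roles above, and the generated skill's targets are taken to be available as a plain list of URLs.

```python
URL_ROLES = ("app_entry", "module_route", "api_endpoint", "runtime_browser_context")


def url_roles_collapsed(source_urls_by_role, generated_targets):
    """True when the source shows several distinct URLs but the skill keeps at most one."""
    distinct_source = {
        url for role, url in source_urls_by_role.items() if role in URL_ROLES and url
    }
    return len(distinct_source) > 1 and len(set(generated_targets)) <= 1
```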

## Ledger Schema

Each scene record in the runtime-semantics ledger should include:

1. `sceneId`
2. `sceneName`
3. `sourceDir`
4. `archetype`
5. `readiness`
6. `riskLevel`
7. `gaps`
8. `generatorLevelGap`
9. `runtimeOnlyResidual`
10. `recommendedFixRoutes`
11. `sourceEvidenceSummary`
12. `generatedEvidenceSummary`
13. `comparisonNotes`
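
The record shape can be sketched as a typed dictionary. The field names are the thirteen listed above; the field types are assumptions, since this design does not fix them.

```python
from typing import List, Literal, TypedDict


class SceneLedgerRecord(TypedDict):
    sceneId: str
    sceneName: str
    sourceDir: str
    archetype: str
    readiness: str
    riskLevel: Literal["high", "medium", "low"]
    gaps: List[str]  # values drawn from the five fixed gap classes
    generatorLevelGap: bool
    runtimeOnlyResidual: bool
    recommendedFixRoutes: List[str]
    sourceEvidenceSummary: str
    generatedEvidenceSummary: str
    comparisonNotes: str


REQUIRED_FIELDS = tuple(SceneLedgerRecord.__annotations__)


def missing_fields(record):
    """Return required field names absent from a raw record, in schema order."""
    return [name for name in REQUIRED_FIELDS if name not in record]
```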

## Risk-Level Rules

The ledger should use bounded, reproducible risk levels:

### `high`

Use when the scene has strong source evidence for one or more gap classes and the current generated skill visibly lacks equivalent semantics.

### `medium`

Use when the scene has source evidence for one or more gap classes, but current generated output appears partially aligned or the mismatch is plausible rather than explicit.

### `low`

Use when source evidence exists but generated output already appears materially aligned, or when the residual is likely runtime-only rather than generator-level.
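
To keep the three levels reproducible, the rules can be expressed as a small decision function. This is one possible reading, not a fixed rule set: the input labels (`strong`, `plausible`, `missing`, `partial`, `aligned`) are hypothetical, and giving `low` precedence over `high` is an assumption.

```python
def assign_risk_level(source_strength, generated_alignment, runtime_only_likely=False):
    """Map evidence labels to a risk level.

    source_strength: "strong" or "plausible"
    generated_alignment: "missing", "partial", or "aligned"
    """
    # Assumption: alignment or a runtime-only residual takes precedence.
    if generated_alignment == "aligned" or runtime_only_likely:
        return "low"
    if source_strength == "strong" and generated_alignment == "missing":
        return "high"
    return "medium"
```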

## Generator-Level vs Runtime-Only

The ledger must classify whether a scene's residuals should later drive generator hardening or should remain runtime-only.

### `generatorLevelGap = true`

Use when source evidence proves the generated skill is missing semantics that should be recoverable during generation.

### `runtimeOnlyResidual = true`

Use when the remaining risk is primarily:

1. login / session
2. host runtime behavior
3. local-doc / host-bridge environment
4. inner-network-only execution context

and not a generation-semantic omission.

These two flags are not mutually exclusive. When a scene sets both, its ledger record must explain why both apply.
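
A record that sets both flags can be checked for an accompanying explanation. The sketch below assumes the explanation is written into `comparisonNotes`; this design does not say which field should carry it.

```python
def flag_issues(record):
    """Return problems with the generator-level / runtime-only flags on one record."""
    issues = []
    both = record.get("generatorLevelGap") and record.get("runtimeOnlyResidual")
    # Assumption: the explanation for overlapping flags lives in comparisonNotes.
    if both and not record.get("comparisonNotes", "").strip():
        issues.append("both flags set without an explanation")
    return issues
```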

## Inputs

Primary inputs:

1. `tests/fixtures/generated_scene/generated_scene_source_evidence_cross_scan_2026-04-20.json`
2. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
3. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
5. `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`

Anchor runtime findings:

1. the confirmed `sweep-030-scene` inner-network findings:
   - alias mismatch
   - starter-subset org dictionary
   - page-semantic default period behavior
   - request-field mismatch
   - runtime context URL ambiguity

## Output Artifacts

### JSON

- `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`

### Report

- `docs/superpowers/reports/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-report.md`

The report must answer:

1. how many scenes are rated `high`, `medium`, and `low`
2. how many scenes carry each gap class
3. how many scenes appear to require generator-level fixes
4. how many scenes look runtime-only
5. which route clusters are likely to yield the highest reuse
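
The five report questions map onto simple counts over the ledger records. The following is a sketch under assumptions: "runtime-only" is read as `runtimeOnlyResidual` set without `generatorLevelGap`, and route reuse is approximated by how often each `recommendedFixRoutes` entry appears.

```python
from collections import Counter


def summarize(records):
    """Aggregate ledger records into the counts the report must answer."""
    risk = Counter(record["riskLevel"] for record in records)
    gaps = Counter(gap for record in records for gap in record["gaps"])
    routes = Counter(
        route for record in records for route in record["recommendedFixRoutes"]
    )
    return {
        "riskLevels": {level: risk.get(level, 0) for level in ("high", "medium", "low")},
        "gapClasses": dict(gaps),
        "generatorLevel": sum(1 for r in records if r["generatorLevelGap"]),
        # Assumption: runtime-only means no generator-level gap is also flagged.
        "runtimeOnly": sum(
            1 for r in records
            if r["runtimeOnlyResidual"] and not r["generatorLevelGap"]
        ),
        "fixRoutes": routes.most_common(),  # most frequently recommended first
    }
```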

## Acceptance Criteria

This design is complete when:

1. it defines a full-scene ledger stage rather than scene-by-scene notes
2. it binds the ledger to the fixed five-gap taxonomy
3. it defines how source evidence and generated evidence are compared
4. it defines `riskLevel`, `generatorLevelGap`, and `runtimeOnlyResidual`
5. it remains analysis-only

## Stop Statement

Stop after publishing this ledger design and its child plan.

Do not execute the ledger build inside this design.