claw/docs/superpowers/specs/2026-04-20-generated-scene-runtime-semantics-gap-analysis-design.md

# Generated Scene Runtime Semantics Gap Analysis Design

> Status: Superseded by `docs/superpowers/specs/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-design.md`

## Objective

Produce a bounded, implementation-free analysis of runtime semantics gaps across the final 102 generated scene skills, using `sweep-030-scene` as the anchor case that exposed five concrete gap classes during inner-network validation.

This design does **not** modify the analyzer, generator, runtime, skill manifests, or execution assets. It only defines how to analyze and classify the gaps that remain between:

- `generated_scene` framework-level success
- real inner-network invocation / execution equivalence

## Anchor Case

The anchor case is:

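- `sweep-030-scene / 台区线损大数据-月_周累计线损率统计分析` (scene title, roughly: "Transformer-district line-loss big data: monthly / weekly cumulative line-loss rate statistical analysis")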

Inner-network debugging exposed the following gap classes:

1. `invocation_alias_gap`
2. `dictionary_recovery_gap`
3. `parameter_default_semantics_gap`
4. `resolver_to_request_mapping_gap`
5. `runtime_url_semantics_gap`

The analysis generalizes these five classes across the full 102-scene final materialization set.

## Scope

In scope:

- Analyze the final 102 generated skills under:
  - `examples/scene_skill_102_final_materialization_2026-04-19/skills`
- Inspect:
  - `scene.toml`
  - `SKILL.toml`
  - `references/generation-report.json`
  - `references/org-dictionary.json` where present
  - generated browser scripts where needed for request mapping evidence
- Compare generated assets against source-scene evidence when required to validate dictionary and runtime-url semantics
- Produce a 102-scene gap inventory and summary report

Out of scope:

- Any code change in `src/`
- Any edit to generated skill packages
- Any update to execution board / official board
- Any new pseudo-production execution
- Any new inner-network fix for a specific scene

## Problem Statement

The repository has already reached:

- `102 / 102` framework auto-pass
- `102 / 102` final materialized skills
- deterministic invocation readiness

But `sweep-030-scene` demonstrated that generated skills can still diverge from real runtime semantics in ways not captured by framework-level closure:

- user phrasing differs from the canonical scene name
- the source scene contains complete org dictionaries that are not fully recovered into the generated skill
- the source page defaults dates / periods, while the generated invocation initially required explicit period values
- resolver outputs and request field names do not align 1:1
- runtime context URL semantics differ from module-route URL semantics

Therefore the next bounded step is analysis, not implementation.

## Gap Taxonomy

Each scene may be tagged with zero or more of the following gap classes:

### 1. `invocation_alias_gap`

Definition:

- Natural operator phrasing is likely not covered by current deterministic `include_keywords`

Indicators:

- Deterministic keywords only contain canonical scene title
- Scene title includes punctuation / separators / compound mode phrases
- Existing reports already required alias normalization
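
The first two indicators can be checked mechanically. The following is a minimal sketch, assuming the scene title and the `include_keywords` list have already been read from the skill package; the function name, the indicator labels, and the separator set are hypothetical and not taken from the repository.

```python
import re

# Assumed set of punctuation / separators that split compound scene titles.
_SEPARATORS = re.compile(r"[-_/\\|,，、()（）\s]+")


def alias_gap_indicators(scene_title: str, include_keywords: list[str]) -> list[str]:
    """Return hypothetical indicator labels suggesting an invocation_alias_gap."""
    title = scene_title.strip()
    keywords = {k.strip() for k in include_keywords if k.strip()}
    indicators = []
    # Keywords that add nothing beyond the canonical title (or are empty)
    # give natural operator phrasing nothing to match against.
    if keywords <= {title}:
        indicators.append("keywords_only_canonical_title")
    # Separators in the title hint at compound mode phrases.
    if _SEPARATORS.search(title):
        indicators.append("title_has_separators")
    return indicators
```

The third indicator (reports that already required alias normalization) depends on report contents and is left out of this sketch.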

### 2. `dictionary_recovery_gap`

Definition:

- Source scene contains embedded dictionaries / trees / option arrays, but generated skill only carries a starter subset or no dictionary at all

Indicators:

- Source contains files like `city.js`, `dict.js`, `enum.js`, `options.js`
- Source JS includes tree/option structures with labels/codes/children
- Generated `references/org-dictionary.json` is empty or much smaller than source evidence
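
One way to quantify "much smaller than source evidence" is to compare node counts. This is a sketch only: it assumes both the source dictionary and the generated one have been parsed into nested lists of objects with an optional `children` list, which may not match the actual layout of `references/org-dictionary.json`. The function names are hypothetical.

```python
def count_nodes(tree) -> int:
    """Count entries in a nested option tree, following 'children' lists."""
    if isinstance(tree, list):
        return sum(count_nodes(item) for item in tree)
    if isinstance(tree, dict):
        # Each object counts as one entry plus whatever it nests.
        return 1 + count_nodes(tree.get("children"))
    return 0  # None, strings, numbers: not a dictionary entry


def dictionary_recovery_ratio(source_tree, generated_tree) -> float:
    """Share of source entries present (by count) in the generated dictionary."""
    source_total = count_nodes(source_tree)
    if source_total == 0:
        return 1.0  # nothing to recover, so no gap by this measure
    return count_nodes(generated_tree) / source_total
```

A low ratio only flags a candidate; the count does not prove which entries are missing, so flagged scenes still need source-vs-generated inspection.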

### 3. `parameter_default_semantics_gap`

Definition:

- Source page applies default values (date, period, mode, range, org) when user omits them, but generated skill currently treats them as required or unresolved

Indicators:

- Source contains `moment()` / date defaulting / initial query payloads
- Generated parameter readiness previously required explicit user input
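
A source-side scan for defaulting hints might look like the sketch below. The `moment()` pattern comes from the indicator above; the `new Date(...)` pattern is an added assumption about what "date defaulting" looks like in source pages, and the hint names are hypothetical.

```python
import re

# Hint name -> pattern. Only `moment(` is named in this design; the second
# pattern is an assumed proxy for other date-defaulting code.
DEFAULT_HINTS = {
    "moment_call": re.compile(r"\bmoment\s*\("),
    "new_date": re.compile(r"\bnew\s+Date\s*\("),
}


def find_default_hints(js_source: str) -> list[str]:
    """Return the names of defaulting hints found in a source JS string."""
    return [name for name, pattern in DEFAULT_HINTS.items() if pattern.search(js_source)]
```

A hit means the page may supply a default, not that it does; the match should be recorded as evidence and confirmed against the initial query payload.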

### 4. `resolver_to_request_mapping_gap`

Definition:

- Resolved semantic parameters do not align directly with actual request field names or payload layout used by the source page

Indicators:

- Resolver outputs `org_code` while request uses `orgno`, or analogous mismatches
- Generated request template uses placeholders not directly populated by resolver outputs
- Source request payload structure differs from generated request mapping
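
The placeholder indicator can be sketched as a set difference between template placeholders and resolver output names. The `{{name}}` placeholder syntax is an assumption made for illustration; the generated request templates may use a different convention.

```python
import re

# Assumed placeholder syntax: {{name}}. Adjust to the real template format.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def unmapped_placeholders(request_template: str, resolver_outputs: set[str]) -> set[str]:
    """Return placeholders in the template that no resolver output fills directly."""
    return set(PLACEHOLDER.findall(request_template)) - resolver_outputs
```

For example, a template containing `{{orgno}}` and `{{period}}` checked against resolver outputs `org_code` and `period` leaves `orgno` unmapped, which is the kind of mismatch described above.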

### 5. `runtime_url_semantics_gap`

Definition:

- Generated skill does not clearly distinguish between app-entry URL, module-route URL, and API endpoint URL for runtime binding

Indicators:

- `scene.toml` only stores one `bootstrap.target_url`
- Inner-network execution shows app-entry URL succeeds while module-route URL fails, or vice versa
- Generation report contains both an app entry and a deeper route candidate
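
The third indicator requires deciding whether a route candidate is "deeper" than the app entry. The sketch below is one possible heuristic using only the standard library; it treats a longer path under the same origin, or a hash route on the same path, as deeper. The function name and the heuristic itself are assumptions, not behavior defined by the generator.

```python
from urllib.parse import urlsplit


def is_deeper_route(app_entry_url: str, candidate_url: str) -> bool:
    """Heuristically decide whether candidate_url is a module route under app_entry_url."""
    entry, cand = urlsplit(app_entry_url), urlsplit(candidate_url)
    # Different origin: not a route inside the same app.
    if (entry.scheme, entry.netloc) != (cand.scheme, cand.netloc):
        return False
    entry_path = entry.path.rstrip("/")
    cand_path = cand.path.rstrip("/")
    # Case 1: the candidate path extends the entry path.
    deeper_path = cand_path != entry_path and cand_path.startswith(entry_path + "/")
    # Case 2: same path, but the candidate carries a hash route such as #/module/page.
    hash_route = (
        cand_path == entry_path
        and cand.fragment.startswith("/")
        and cand.fragment != entry.fragment
    )
    return deeper_path or hash_route
```

When this returns true for a pair found in the generation report while `scene.toml` stores only one `bootstrap.target_url`, the scene is a candidate for this gap class.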

## Inputs

Primary inputs:

- `examples/scene_skill_102_final_materialization_2026-04-19/skills`
- `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`

Anchor-case source evidence:

- `D:/desk/智能体资料/全量业务场景/一平台场景/台区线损大数据-月_周累计线损率统计分析`

## Output Artifacts

### 1. JSON inventory

- `tests/fixtures/generated_scene/generated_scene_runtime_semantics_gap_analysis_2026-04-20.json`

Required structure:

- top-level summary counts by gap class
- per-scene records
- per-risk-bucket grouping

Each scene record should include:

- `sceneId`
- `sceneName`
- `archetype`
- `riskLevel`
- `gaps`
- `evidence`
- `recommendedFixRoutes`
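
The record fields and the required top-level structure could be modeled as in the sketch below. The field names come from the list above; the field types, the top-level key names (`summary`, `scenes`, `riskBuckets`), and the shape of `evidence` are assumptions, since this design does not fix them.

```python
import json
from dataclasses import asdict, dataclass, field

GAP_CLASSES = (
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
)
RISK_LEVELS = ("high", "medium", "low")


@dataclass
class SceneRecord:
    sceneId: str
    sceneName: str
    archetype: str
    riskLevel: str
    gaps: list[str] = field(default_factory=list)
    # Assumed shape: gap class -> list of evidence notes or file references.
    evidence: dict[str, list[str]] = field(default_factory=dict)
    recommendedFixRoutes: list[str] = field(default_factory=list)


def build_inventory(records: list[SceneRecord]) -> dict:
    """Assemble summary counts, per-scene records, and risk-bucket grouping."""
    summary = {g: sum(g in r.gaps for r in records) for g in GAP_CLASSES}
    buckets = {
        level: [r.sceneId for r in records if r.riskLevel == level]
        for level in RISK_LEVELS
    }
    return {
        "summary": summary,
        "scenes": [asdict(r) for r in records],
        "riskBuckets": buckets,
    }


def dump_inventory(records: list[SceneRecord]) -> str:
    # ensure_ascii=False keeps Chinese scene names readable in the JSON file.
    return json.dumps(build_inventory(records), ensure_ascii=False, indent=2)
```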

### 2. Human-readable report

- `docs/superpowers/reports/2026-04-20-generated-scene-runtime-semantics-gap-analysis-report.md`

The report must answer:

1. How many scenes likely have each gap class
2. Which families / archetypes are most affected
3. Which gaps are generator-level
4. Which gaps are runtime-only and should not be pushed back into generation
5. Which next implementation routes should be prioritized

## Risk Buckets

Scenes should be grouped into:

- `high`: multi-parameter or runtime-sensitive scenes where inner-network invocation is likely to diverge without further hardening
- `medium`: scenes with likely alias / dictionary / default-semantics issues but lower execution sensitivity
- `low`: scenes with no immediate evidence of these five gap classes
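
A bucket assignment rule could be sketched as follows. The bucket definitions above are qualitative, so this is only one possible reading: which gap classes count as "runtime-sensitive" and the use of a parameter count to stand in for "multi-parameter" are both assumptions introduced here.

```python
# Assumption: these two classes are treated as runtime-sensitive for bucketing.
RUNTIME_SENSITIVE = {
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
}


def assign_risk(gaps: set[str], parameter_count: int) -> str:
    """Map a scene's gap tags and parameter count to a risk bucket."""
    if not gaps:
        return "low"  # no immediate evidence of any of the five gap classes
    if gaps & RUNTIME_SENSITIVE or parameter_count > 1:
        return "high"  # runtime-sensitive or multi-parameter
    return "medium"  # alias / dictionary / default-semantics issues only
```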

## Acceptance Criteria

This analysis is complete when:

1. All 102 final materialized scenes have a runtime-semantics record
2. `sweep-030-scene` is explicitly analyzed under all applicable gap classes
3. Summary counts exist for all five gap classes
4. The `dictionary_recovery_gap` classification is supported by direct source-vs-generated evidence for the anchor case
5. The report recommends next implementation routes without changing code
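
Criteria 1 through 4 lend themselves to a mechanical check over the JSON inventory; criterion 5 concerns the report and needs human review. The sketch below assumes the inventory uses top-level `summary` and `scenes` keys and that `evidence` is keyed by gap class, none of which is fixed by this design.

```python
GAP_CLASSES = (
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
)
ANCHOR_SCENE_ID = "sweep-030-scene"


def check_acceptance(inventory: dict, expected_scene_count: int = 102) -> list[str]:
    """Return a list of failed criteria; an empty list means criteria 1-4 hold."""
    failures = []
    by_id = {s.get("sceneId"): s for s in inventory.get("scenes", [])}
    # Criterion 1: one record per final materialized scene.
    if len(by_id) != expected_scene_count:
        failures.append(f"expected {expected_scene_count} scene records, found {len(by_id)}")
    # Criteria 2 and 4: the anchor is present and carries dictionary evidence.
    anchor = by_id.get(ANCHOR_SCENE_ID)
    if anchor is None:
        failures.append("anchor scene record is missing")
    elif not anchor.get("evidence", {}).get("dictionary_recovery_gap"):
        failures.append("anchor scene lacks dictionary recovery evidence")
    # Criterion 3: summary counts exist for all five gap classes.
    missing = [g for g in GAP_CLASSES if g not in inventory.get("summary", {})]
    if missing:
        failures.append("summary counts missing for: " + ", ".join(missing))
    return failures
```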

## Stop Statement

Stop after publishing the JSON inventory and report.

Do not open implementation work from this design.