Generated Scene Runtime Semantics Gap Analysis Design

Status: Superseded by docs/superpowers/specs/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-design.md

Objective

Produce a bounded, implementation-free analysis of runtime semantics gaps across the final 102 generated scene skills, using sweep-030-scene as the anchor case that exposed five concrete gap classes during inner-network validation.

This design does not modify the analyzer, generator, runtime, skill manifests, or execution assets. It only defines how to analyze and classify the gaps that remain between:

  • generated_scene framework-level success
  • real inner-network invocation / execution equivalence

Anchor Case

The anchor case is:

  • sweep-030-scene / 台区线损大数据-月_周累计线损率统计分析

Inner-network debugging exposed the following gap classes:

  1. invocation_alias_gap
  2. dictionary_recovery_gap
  3. parameter_default_semantics_gap
  4. resolver_to_request_mapping_gap
  5. runtime_url_semantics_gap

The analysis generalizes these five classes across the full 102-scene final materialization set.

Scope

In scope:

  • Analyze the final 102 generated skills under:
    • examples/scene_skill_102_final_materialization_2026-04-19/skills
  • Inspect:
    • scene.toml
    • SKILL.toml
    • references/generation-report.json
    • references/org-dictionary.json where present
    • generated browser scripts where needed for request mapping evidence
  • Compare generated assets against source-scene evidence when required to validate dictionary and runtime-url semantics
  • Produce a 102-scene gap inventory and summary report

Out of scope:

  • Any code change in src/
  • Any edit to generated skill packages
  • Any update to execution board / official board
  • Any new pseudo-production execution
  • Any new inner-network fix for a specific scene

Problem Statement

The repository has already reached:

  • 102 / 102 framework auto-pass
  • 102 / 102 final materialized skills
  • deterministic invocation readiness

However, sweep-030-scene demonstrated that generated skills can still diverge from real runtime semantics in ways that framework-level closure does not capture:

  • the user's phrasing differs from the canonical scene name
  • the source scene contains complete org dictionaries that are not fully recovered into the generated skill
  • the source page defaults dates / periods, while the generated invocation initially required explicit period values
  • resolver outputs and request field names do not align 1:1
  • runtime context URL semantics differ from module-route URL semantics

Therefore the next bounded step is analysis, not implementation.

Gap Taxonomy

Each scene may be tagged with zero or more of the following gap classes:

1. invocation_alias_gap

Definition:

  • Natural operator phrasing is likely not covered by current deterministic include_keywords

Indicators:

  • Deterministic keywords only contain canonical scene title
  • Scene title includes punctuation / separators / compound mode phrases
  • Existing reports already required alias normalization
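
The first two indicators above can be checked mechanically. The following is a minimal sketch, not the project's actual analyzer: the function name, the indicator labels, and the separator character set are all assumptions chosen for illustration, and the third indicator (prior alias normalization in existing reports) is left out because the report format is not specified here.

```python
import re

# Assumed set of punctuation / separators that operators rarely type verbatim.
SEPARATOR_PATTERN = re.compile(r"[-_/\\·、,，()（）\s]")


def alias_gap_indicators(scene_name, include_keywords):
    """Return the (hypothetical) alias-gap indicator labels that match a scene."""
    indicators = []
    keywords = {k.strip() for k in include_keywords if k.strip()}
    # An empty keyword set is also treated as "no alias beyond the title".
    if keywords <= {scene_name}:
        indicators.append("keywords_only_canonical_title")
    if SEPARATOR_PATTERN.search(scene_name):
        indicators.append("title_has_separators")
    return indicators


anchor_name = "台区线损大数据-月_周累计线损率统计分析"
print(alias_gap_indicators(anchor_name, [anchor_name]))
```

A scene that matches both labels is a candidate for invocation_alias_gap; whether one label alone is sufficient is a judgment the analysis itself has to make.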

2. dictionary_recovery_gap

Definition:

  • Source scene contains embedded dictionaries / trees / option arrays, but generated skill only carries a starter subset or no dictionary at all

Indicators:

  • Source contains files like city.js, dict.js, enum.js, options.js
  • Source JS includes tree/option structures with labels/codes/children
  • Generated references/org-dictionary.json is empty or much smaller than source evidence
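
One way to quantify "much smaller than source evidence" is to compare node counts. The sketch below assumes that the source dictionary has already been extracted from JS into JSON-like data and that both trees nest entries under a `children` key; the function names and the ratio metric are hypothetical, not part of the existing tooling.

```python
def count_nodes(tree):
    """Count dictionary entries in a nested label/code/children structure."""
    if isinstance(tree, list):
        return sum(count_nodes(item) for item in tree)
    if isinstance(tree, dict):
        return 1 + count_nodes(tree.get("children", []))
    return 0


def dictionary_recovery_ratio(source_tree, generated_tree):
    """Fraction of source entries present (by count) in the generated dictionary."""
    source_count = count_nodes(source_tree)
    if source_count == 0:
        return 1.0  # nothing to recover, so no gap by this measure
    return count_nodes(generated_tree) / source_count


# Made-up sample data, for illustration only.
source = [
    {"label": "A", "code": "1", "children": [
        {"label": "A1", "code": "11"},
        {"label": "A2", "code": "12"},
    ]},
    {"label": "B", "code": "2"},
]
generated = [{"label": "A", "code": "1"}]
print(dictionary_recovery_ratio(source, generated))
```

A count-based ratio is deliberately coarse: it does not verify that the recovered codes are the right ones, only that the generated dictionary is not a starter subset.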

3. parameter_default_semantics_gap

Definition:

  • Source page applies default values (date, period, mode, range, org) when the user omits them, but the generated skill currently treats them as required or unresolved

Indicators:

  • Source contains moment() / date defaulting / initial query payloads
  • Generated parameter readiness previously required explicit user input
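
The first indicator can be approximated with a text scan over the source JS. This is a rough sketch under stated assumptions: the hint names and the `new Date(` pattern are illustrative additions, and a regex match only suggests that defaulting exists; it does not prove which parameter is defaulted.

```python
import re

# Hypothetical hint patterns; only moment() is named in the indicators above.
DEFAULT_HINTS = {
    "moment_call": re.compile(r"\bmoment\s*\("),
    "new_date": re.compile(r"\bnew\s+Date\s*\("),
}


def find_default_hints(source_js):
    """Return the names of defaulting hints found in a source JS string."""
    return [name for name, pattern in DEFAULT_HINTS.items() if pattern.search(source_js)]


sample = "var month = moment().subtract(1, 'months').format('YYYY-MM');"
print(find_default_hints(sample))
```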

4. resolver_to_request_mapping_gap

Definition:

  • Resolved semantic parameters do not align directly with actual request field names or payload layout used by the source page

Indicators:

  • Resolver outputs org_code while request uses orgno, or analogous mismatches
  • Generated request template uses placeholders not directly populated by resolver outputs
  • Source request payload structure differs from generated request mapping
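
The placeholder indicator reduces to a set difference between the names a request template expects and the names the resolver emits. The sketch below assumes a `{{name}}` placeholder syntax, which may not match the real generated templates; the function name is likewise hypothetical.

```python
import re

# Assumed placeholder syntax: {{name}}. Adjust to the real template format.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def unmapped_placeholders(request_template, resolver_outputs):
    """Return template placeholders that no resolver output populates directly."""
    return set(PLACEHOLDER.findall(request_template)) - set(resolver_outputs)


template = '{"orgno": "{{orgno}}", "month": "{{period}}"}'
print(unmapped_placeholders(template, {"org_code", "period"}))
```

Here the resolver produces `org_code` while the template expects `orgno`, so `orgno` is reported as unmapped, which mirrors the mismatch named in the first indicator.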

5. runtime_url_semantics_gap

Definition:

  • Generated skill does not clearly distinguish between app-entry URL, module-route URL, and API endpoint URL for runtime binding

Indicators:

  • scene.toml only stores one bootstrap.target_url
  • Inner-network execution shows app-entry URL succeeds while module-route URL fails, or vice versa
  • Generation report contains both an app entry and a deeper route candidate
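
The first and third indicators can be combined into a simple comparison: given the single `bootstrap.target_url` from scene.toml and whatever URL candidates the generation report lists, report the candidates that the scene does not carry. How those candidates are stored in generation-report.json is not specified here, so the sketch takes them as a plain list; the URLs are placeholders.

```python
def extra_url_candidates(target_url, report_candidates):
    """Return report URL candidates that differ from the stored target URL."""
    seen = {target_url}
    extras = []
    for url in report_candidates:
        if url and url not in seen:
            seen.add(url)
            extras.append(url)
    return extras


# Placeholder URLs, for illustration only.
target = "http://example.invalid/app/"
candidates = [
    "http://example.invalid/app/",
    "http://example.invalid/app/#/module/report",
]
print(extra_url_candidates(target, candidates))
```

A non-empty result only marks the scene as a candidate for runtime_url_semantics_gap; deciding which URL is the app entry and which is the module route still requires source or inner-network evidence.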

Inputs

Primary inputs:

  • examples/scene_skill_102_final_materialization_2026-04-19/skills
  • tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json
  • tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json
  • tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json

Anchor-case source evidence:

  • D:/desk/智能体资料/全量业务场景/一平台场景/台区线损大数据-月_周累计线损率统计分析

Output Artifacts

1. JSON inventory

  • tests/fixtures/generated_scene/generated_scene_runtime_semantics_gap_analysis_2026-04-20.json

Required structure:

  • top-level summary counts by gap class
  • per-scene records
  • per-risk-bucket grouping

Each scene record should include:

  • sceneId
  • sceneName
  • archetype
  • riskLevel
  • gaps
  • evidence
  • recommendedFixRoutes
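
As an illustration of the required structure, the sketch below models one scene record and derives the summary counts and risk-bucket grouping from a list of records. The field names come from the list above; the top-level keys (`summary`, `scenes`, `riskBuckets`), the shape of `evidence`, and the sample `archetype` and `riskLevel` values are assumptions, not decided parts of this design.

```python
import json
from dataclasses import asdict, dataclass, field

GAP_CLASSES = (
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
)


@dataclass
class SceneGapRecord:
    sceneId: str
    sceneName: str
    archetype: str
    riskLevel: str  # "high", "medium", or "low"
    gaps: list = field(default_factory=list)
    evidence: dict = field(default_factory=dict)  # assumed shape: gap class -> notes
    recommendedFixRoutes: list = field(default_factory=list)


def build_inventory(records):
    """Assemble summary counts, per-scene records, and risk-bucket grouping."""
    counts = {gap: 0 for gap in GAP_CLASSES}
    buckets = {"high": [], "medium": [], "low": []}
    for record in records:
        for gap in record.gaps:
            counts[gap] += 1
        buckets[record.riskLevel].append(record.sceneId)
    return {
        "summary": counts,
        "scenes": [asdict(record) for record in records],
        "riskBuckets": buckets,
    }


anchor = SceneGapRecord(
    sceneId="sweep-030-scene",
    sceneName="台区线损大数据-月_周累计线损率统计分析",
    archetype="unknown",  # placeholder; the real archetype is not stated here
    riskLevel="high",     # illustrative value only
    gaps=list(GAP_CLASSES),
)
inventory = build_inventory([anchor])
print(json.dumps(inventory["summary"], indent=2))
```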

2. Human-readable report

  • docs/superpowers/reports/2026-04-20-generated-scene-runtime-semantics-gap-analysis-report.md

The report must answer:

  1. How many scenes likely have each gap type?
  2. Which families / archetypes are most affected?
  3. Which gaps are generator-level?
  4. Which gaps are runtime-only and should not be pushed back into generation?
  5. Which next implementation routes should be prioritized?

Risk Buckets

Scenes should be grouped into:

  • high: multi-parameter or runtime-sensitive scenes where inner-network invocation is likely to diverge without further hardening
  • medium: scenes with likely alias / dictionary / default-semantics issues but lower execution sensitivity
  • low: scenes with no immediate evidence of these five gap classes
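
The bucket definitions above could be applied with a rule like the following. This is one possible reading, not a fixed policy: treating "multi-parameter" as more than one parameter, and requiring at least one tagged gap before a scene leaves the low bucket, are both assumptions made for the sketch.

```python
def assign_risk_bucket(gaps, parameter_count, runtime_sensitive):
    """Map a scene's gap tags and sensitivity to a risk bucket (assumed rule)."""
    if not gaps:
        return "low"  # no immediate evidence of the five gap classes
    if runtime_sensitive or parameter_count > 1:
        return "high"  # multi-parameter or runtime-sensitive scene with gaps
    return "medium"  # gaps present, but lower execution sensitivity


print(assign_risk_bucket({"invocation_alias_gap"}, parameter_count=1, runtime_sensitive=False))
```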

Acceptance Criteria

This analysis is complete when:

  1. All 102 final materialized scenes have a runtime-semantics record
  2. sweep-030-scene is explicitly analyzed under all applicable gap classes
  3. Summary counts exist for all five gap classes
  4. The dictionary_recovery_gap finding for the anchor case is supported by direct source-vs-generated evidence
  5. The report recommends next implementation routes without changing code
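
Criteria 1 to 4 lend themselves to a mechanical check on the JSON inventory; criterion 5 concerns the report and needs human review. The sketch below assumes the inventory uses top-level `summary` and `scenes` keys and that `evidence` is keyed by gap class. None of these shapes is fixed by this design, so treat the checker as illustrative.

```python
GAP_CLASSES = (
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
)
ANCHOR_ID = "sweep-030-scene"


def check_acceptance(inventory, expected_scene_count=102):
    """Return a list of unmet acceptance criteria (empty when all checks pass)."""
    problems = []
    scenes = {s.get("sceneId"): s for s in inventory.get("scenes", [])}
    if len(scenes) != expected_scene_count:
        problems.append(f"expected {expected_scene_count} scene records, found {len(scenes)}")
    anchor = scenes.get(ANCHOR_ID)
    if anchor is None:
        problems.append(f"{ANCHOR_ID} has no record")
    elif not anchor.get("evidence", {}).get("dictionary_recovery_gap"):
        problems.append(f"{ANCHOR_ID} lacks dictionary_recovery_gap evidence")
    summary = inventory.get("summary", {})
    for gap in GAP_CLASSES:
        if gap not in summary:
            problems.append(f"missing summary count for {gap}")
    return problems


print(check_acceptance({"summary": {}, "scenes": []}))
```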

Stop Statement

Stop after publishing the JSON inventory and report.

Do not open implementation work from this design.