# Generated Scene Runtime Semantics Gap Analysis Design
> Status: Superseded by `docs/superpowers/specs/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-design.md`
## Objective
Produce a bounded, implementation-free analysis of runtime semantics gaps across the final 102 generated scene skills, using `sweep-030-scene` as the anchor case that exposed five concrete gap classes during inner-network validation.
This design does **not** modify analyzer, generator, runtime, skill manifests, or execution assets. It only defines how to analyze and classify the gaps that remain between:
- `generated_scene` framework-level success
- real inner-network invocation / execution equivalence
## Anchor Case
The anchor case is:
- `sweep-030-scene / 台区线损大数据-月_周累计线损率统计分析`
Inner-network debugging exposed the following gap classes:
1. `invocation_alias_gap`
2. `dictionary_recovery_gap`
3. `parameter_default_semantics_gap`
4. `resolver_to_request_mapping_gap`
5. `runtime_url_semantics_gap`
The analysis generalizes these five classes across the full 102-scene final materialization set.
## Scope
In scope:
- Analyze the final 102 generated skills under:
  - `examples/scene_skill_102_final_materialization_2026-04-19/skills`
- Inspect:
  - `scene.toml`
  - `SKILL.toml`
  - `references/generation-report.json`
  - `references/org-dictionary.json` where present
  - generated browser scripts where needed for request mapping evidence
- Compare generated assets against source-scene evidence when required to validate dictionary and runtime-url semantics
- Produce a 102-scene gap inventory and summary report
Out of scope:
- Any code change in `src/`
- Any edit to generated skill packages
- Any update to execution board / official board
- Any new pseudo-production execution
- Any new inner-network fix for a specific scene
## Problem Statement
The repository has already reached:
- `102 / 102` framework auto-pass
- `102 / 102` final materialized skills
- deterministic invocation readiness
But `sweep-030-scene` demonstrated that generated skills can still diverge from real runtime semantics in ways not captured by framework-level closure:
- user phrasing differs from canonical scene name
- source scene contains complete org dictionaries not fully recovered into the generated skill
- source page defaults dates / periods while generated invocation initially required explicit period values
- resolver outputs and request field names do not align 1:1
- runtime context URL semantics differ from module-route URL semantics
Therefore the next bounded step is analysis, not implementation.
## Gap Taxonomy
Each scene may be tagged with zero or more of the following gap classes:
### 1. `invocation_alias_gap`
Definition:
- Natural operator phrasing is likely not covered by current deterministic `include_keywords`
Indicators:
- Deterministic keywords contain only the canonical scene title
- Scene title includes punctuation / separators / compound mode phrases
- Existing reports already required alias normalization
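
The first two indicators can be checked mechanically. The following is a minimal sketch, assuming the scene title and its `include_keywords` have already been read from the skill package; the function names, the indicator labels, and the separator set `-`, `_`, `/` are illustrative assumptions, not part of the existing analyzer.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and strip punctuation/separators so phrasing variants compare equal."""
    return re.sub(r"[\W_]+", "", text.lower())


def alias_gap_indicators(scene_name: str, include_keywords: list[str]) -> list[str]:
    """Return the (hypothetical) indicator labels that fire for one scene."""
    hits = []
    canonical = normalize(scene_name)
    distinct = {normalize(k) for k in include_keywords} - {""}
    # Indicator: keywords contain nothing beyond the canonical title.
    if distinct <= {canonical}:
        hits.append("keywords_only_canonical_title")
    # Indicator: the title carries separators an operator would likely not type.
    if re.search(r"[-_/]", scene_name):
        hits.append("title_has_separators")
    return hits
```

A scene with an empty keyword list also triggers the first indicator in this sketch, since no operator phrasing is covered at all.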
### 2. `dictionary_recovery_gap`
Definition:
- Source scene contains embedded dictionaries / trees / option arrays, but generated skill only carries a starter subset or no dictionary at all
Indicators:
- Source contains files like `city.js`, `dict.js`, `enum.js`, `options.js`
- Source JS includes tree/option structures with labels/codes/children
- Generated `references/org-dictionary.json` is empty or much smaller than source evidence
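
One way to quantify "much smaller than source evidence" is to compare node counts. The sketch below assumes both dictionaries have been parsed into nested lists of objects with an optional `children` key, as the indicator above suggests; the `0.5` threshold and the function names are assumptions chosen for illustration.

```python
def count_nodes(tree) -> int:
    """Count every dict node in a nested label/code/children structure."""
    if isinstance(tree, list):
        return sum(count_nodes(node) for node in tree)
    if isinstance(tree, dict):
        return 1 + count_nodes(tree.get("children", []))
    return 0


def recovery_ratio(source_tree, generated_tree) -> float:
    """Fraction of source nodes present (by count) in the generated dictionary."""
    source_count = count_nodes(source_tree)
    if source_count == 0:
        return 1.0  # nothing to recover, so no gap by this measure
    return count_nodes(generated_tree) / source_count


def has_dictionary_recovery_gap(source_tree, generated_tree, threshold: float = 0.5) -> bool:
    # threshold is an assumed cut-off, not a value taken from the design
    return recovery_ratio(source_tree, generated_tree) < threshold
```

A count-based ratio only shows that entries are missing, not which ones; per-code comparison would be needed for the direct source-vs-generated evidence required for the anchor case.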
### 3. `parameter_default_semantics_gap`
Definition:
- Source page applies default values (date, period, mode, range, org) when the user omits them, but the generated skill currently treats them as required or unresolved
Indicators:
- Source contains `moment()` / date defaulting / initial query payloads
- Generated parameter readiness previously required explicit user input
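
The first indicator lends itself to a simple text scan of the source scripts. This is a rough sketch, assuming the source JS is available as a string; the `moment()` pattern comes from the indicator above, while the `new Date(` pattern and the hint names are added assumptions about what date defaulting may look like.

```python
import re

# Pattern names are hypothetical labels used only in this sketch.
DEFAULT_PATTERNS = {
    "moment_call": re.compile(r"\bmoment\s*\("),
    "new_date": re.compile(r"\bnew\s+Date\s*\("),
}


def find_default_hints(js_source: str) -> list[str]:
    """Return the sorted names of defaulting patterns found in the source text."""
    return sorted(
        name for name, pattern in DEFAULT_PATTERNS.items() if pattern.search(js_source)
    )
```

A hit is evidence worth recording, not proof of a gap: the call may format a user-supplied date rather than supply a default, so each hit still needs manual confirmation.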
### 4. `resolver_to_request_mapping_gap`
Definition:
- Resolved semantic parameters do not align directly with actual request field names or payload layout used by the source page
Indicators:
- Resolver outputs `org_code` while request uses `orgno`, or analogous mismatches
- Generated request template uses placeholders not directly populated by resolver outputs
- Source request payload structure differs from generated request mapping
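
The `org_code` versus `orgno` mismatch can be expressed as a set difference. The sketch below assumes the resolver output names, the request field names, and any known renames have already been extracted; the function name and the shape of `field_map` are illustrative assumptions.

```python
def unmapped_request_fields(
    resolver_outputs: set[str],
    request_fields: set[str],
    field_map: dict[str, str],
) -> list[str]:
    """Return request fields that no resolver output populates.

    field_map holds known renames from resolver name to request name;
    outputs without an entry are assumed to keep their own name.
    """
    produced = {field_map.get(name, name) for name in resolver_outputs}
    return sorted(request_fields - produced)
```

For example, `unmapped_request_fields({"org_code"}, {"orgno"}, {})` returns `["orgno"]`, while supplying `{"org_code": "orgno"}` as the map returns an empty list.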
### 5. `runtime_url_semantics_gap`
Definition:
- Generated skill does not clearly distinguish between app-entry URL, module-route URL, and API endpoint URL for runtime binding
Indicators:
- `scene.toml` only stores one `bootstrap.target_url`
- Inner-network execution shows app-entry URL succeeds while module-route URL fails, or vice versa
- Generation report contains both an app entry and a deeper route candidate
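
To make the three URL roles explicit during analysis, each scene's evidence can be recorded in a small structure. This is only a sketch: the `RuntimeUrls` type and its field names are hypothetical and do not correspond to any existing field in `scene.toml`, which per the indicator above stores a single `bootstrap.target_url`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimeUrls:
    """The three URL roles named in the definition; None means no evidence found."""

    app_entry: Optional[str] = None
    module_route: Optional[str] = None
    api_endpoint: Optional[str] = None

    def missing_roles(self) -> list[str]:
        """List the roles for which the generated skill carries no URL, in field order."""
        return [name for name, value in vars(self).items() if value is None]
```

A scene whose only known URL is filed under `app_entry` would report `["module_route", "api_endpoint"]` as missing, which is one candidate piece of evidence for this gap class.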
## Inputs
Primary inputs:
- `examples/scene_skill_102_final_materialization_2026-04-19/skills`
- `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`
Anchor-case source evidence:
- `D:/desk/智能体资料/全量业务场景/一平台场景/台区线损大数据-月_周累计线损率统计分析`
## Output Artifacts
### 1. JSON inventory
- `tests/fixtures/generated_scene/generated_scene_runtime_semantics_gap_analysis_2026-04-20.json`
Required structure:
- top-level summary counts by gap class
- per-scene records
- per-risk-bucket grouping
Each scene record should include:
- `sceneId`
- `sceneName`
- `archetype`
- `riskLevel`
- `gaps`
- `evidence`
- `recommendedFixRoutes`
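
The required structure can be sketched as follows. The per-scene field names come from the list above and the gap and risk names from this design; the value types of `evidence` and `recommendedFixRoutes`, and the top-level keys `summary`, `riskBuckets`, and `scenes`, are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field

GAP_CLASSES = (
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
)
RISK_LEVELS = ("high", "medium", "low")


@dataclass
class SceneRecord:
    sceneId: str
    sceneName: str
    archetype: str
    riskLevel: str
    gaps: list[str] = field(default_factory=list)
    # Assumed shape: gap class -> list of evidence notes.
    evidence: dict[str, list[str]] = field(default_factory=dict)
    recommendedFixRoutes: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.riskLevel not in RISK_LEVELS:
            raise ValueError(f"unknown riskLevel: {self.riskLevel}")
        unknown = set(self.gaps) - set(GAP_CLASSES)
        if unknown:
            raise ValueError(f"unknown gap classes: {sorted(unknown)}")


def build_inventory(records: list[SceneRecord]) -> dict:
    """Assemble summary counts, risk-bucket grouping, and per-scene records."""
    return {
        "summary": {g: sum(g in r.gaps for r in records) for g in GAP_CLASSES},
        "riskBuckets": {
            level: [r.sceneId for r in records if r.riskLevel == level]
            for level in RISK_LEVELS
        },
        "scenes": [asdict(r) for r in records],
    }
```

The result is a plain dict, so `json.dumps(build_inventory(records), ensure_ascii=False, indent=2)` would keep Chinese scene names readable in the output file.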
### 2. Human-readable report
- `docs/superpowers/reports/2026-04-20-generated-scene-runtime-semantics-gap-analysis-report.md`
The report must answer:
1. How many scenes likely have each gap type?
2. Which families / archetypes are most affected?
3. Which gaps are generator-level?
4. Which gaps are runtime-only and should not be pushed back into generation?
5. Which next implementation routes should be prioritized?
## Risk Buckets
Scenes should be grouped into:
- `high`: multi-parameter or runtime-sensitive scenes where inner-network invocation is likely to diverge without further hardening
- `medium`: scenes with likely alias / dictionary / default-semantics issues but lower execution sensitivity
- `low`: scenes with no immediate evidence of these five gap classes
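
One possible reading of these buckets as a rule is sketched below. It is an interpretation, not the definitive rule: treating the request-mapping and runtime-URL gaps as "runtime-sensitive", and treating more than one parameter as "multi-parameter", are both assumptions, as are the function and constant names.

```python
# Assumption: these two gap classes are the ones that make a scene runtime-sensitive.
RUNTIME_SENSITIVE_GAPS = {
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
}


def risk_level(gaps: set[str], parameter_count: int) -> str:
    """Map a scene's gap tags and parameter count onto high / medium / low."""
    if not gaps:
        return "low"  # no immediate evidence of any of the five gap classes
    if parameter_count > 1 or gaps & RUNTIME_SENSITIVE_GAPS:
        return "high"
    return "medium"
```

Under this reading, a multi-parameter scene with no tagged gaps still lands in `low`, because the `low` bucket is defined by absence of evidence.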
## Acceptance Criteria
This analysis is complete when:
1. All 102 final materialized scenes have a runtime-semantics record
2. `sweep-030-scene` is explicitly analyzed under all applicable gap classes
3. Summary counts exist for all five gap classes
4. Dictionary recovery gap is supported by direct source-vs-generated evidence for the anchor case
5. The report recommends next implementation routes without changing code
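
Criteria 1 to 3 are mechanical and could be checked against the published inventory. The following is a sketch under the assumption that the inventory JSON uses top-level `scenes` and `summary` keys; those key names and the function name are hypothetical, since the design does not fix them.

```python
GAP_CLASSES = (
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
)


def acceptance_failures(
    inventory: dict,
    expected_scenes: int = 102,
    anchor_id: str = "sweep-030-scene",
) -> list[str]:
    """Return human-readable failures for criteria 1-3; an empty list means they pass."""
    failures = []
    scene_ids = {scene.get("sceneId") for scene in inventory.get("scenes", [])}
    if len(scene_ids) != expected_scenes:
        failures.append(f"expected {expected_scenes} scene records, found {len(scene_ids)}")
    if anchor_id not in scene_ids:
        failures.append(f"anchor case {anchor_id} has no record")
    summary = inventory.get("summary", {})
    missing = [g for g in GAP_CLASSES if g not in summary]
    if missing:
        failures.append("summary lacks counts for: " + ", ".join(missing))
    return failures
```

Criteria 4 and 5 concern the quality of evidence and recommendations, so they remain a matter for manual review of the report.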
## Stop Statement
Stop after publishing the JSON inventory and report.
Do not open implementation work from this design.