Generated Scene Runtime Semantics Gap Analysis Design
Status: Superseded by
docs/superpowers/specs/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-design.md
Objective
Produce a bounded, implementation-free analysis of runtime semantics gaps across the final 102 generated scene skills, using sweep-030-scene as the anchor case that exposed five concrete gap classes during inner-network validation.
This design does not modify analyzer, generator, runtime, skill manifests, or execution assets. It only defines how to analyze and classify the gaps that remain between:
- generated_scene framework-level success
- real inner-network invocation / execution equivalence
Anchor Case
The anchor case is:
sweep-030-scene / 台区线损大数据-月_周累计线损率统计分析
Inner-network debugging exposed the following gap classes:
- invocation_alias_gap
- dictionary_recovery_gap
- parameter_default_semantics_gap
- resolver_to_request_mapping_gap
- runtime_url_semantics_gap
The analysis generalizes these five classes across the full 102-scene final materialization set.
Scope
In scope:
- Analyze the final 102 generated skills under:
examples/scene_skill_102_final_materialization_2026-04-19/skills
- Inspect:
  - scene.toml
  - SKILL.toml
  - references/generation-report.json
  - references/org-dictionary.json where present
  - generated browser scripts where needed for request mapping evidence
- Compare generated assets against source-scene evidence when required to validate dictionary and runtime-url semantics
- Produce a 102-scene gap inventory and summary report
Out of scope:
- Any code change in src/
- Any edit to generated skill packages
- Any update to execution board / official board
- Any new pseudo-production execution
- Any new inner-network fix for a specific scene
Problem Statement
The repository has already reached:
- 102 / 102 framework auto-pass
- 102 / 102 final materialized skills
- deterministic invocation readiness
But sweep-030-scene demonstrated that generated skills can still diverge from real runtime semantics in ways not captured by framework-level closure:
- user phrasing differs from canonical scene name
- source scene contains complete org dictionaries not fully recovered into the generated skill
- source page defaults dates / periods while generated invocation initially required explicit period values
- resolver outputs and request field names do not align 1:1
- runtime context URL semantics differ from module-route URL semantics
Therefore the next bounded step is analysis, not implementation.
Gap Taxonomy
Each scene may be tagged with zero or more of the following gap classes:
1. invocation_alias_gap
Definition:
- Natural operator phrasing is likely not covered by the current deterministic include_keywords
Indicators:
- Deterministic keywords only contain canonical scene title
- Scene title includes punctuation / separators / compound mode phrases
- Existing reports already required alias normalization
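The first two indicators above lend themselves to a mechanical check. The following is a minimal sketch, not the analyzer's actual logic: the function name, the indicator labels, and the separator set are all assumptions made for illustration.

```python
import re

# Assumed separator set: punctuation that commonly splits a compound scene title.
_SEPARATORS = re.compile(r"[-_/\\|()（）]")


def alias_gap_indicators(scene_title: str, include_keywords: list[str]) -> list[str]:
    """Return the (hypothetical) alias-gap indicator labels that fire for a scene."""
    title = scene_title.strip()
    keywords = {kw.strip() for kw in include_keywords if kw.strip()}
    hits = []
    # Indicator: keywords add nothing beyond the canonical title (or are empty).
    if keywords <= {title}:
        hits.append("keywords_only_canonical_title")
    # Indicator: the title itself contains separators or compound-mode punctuation.
    if _SEPARATORS.search(title):
        hits.append("title_has_separators")
    return hits
```

A scene where both labels fire is a strong candidate for the invocation_alias_gap tag; a single hit is weaker evidence and may need manual review.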
2. dictionary_recovery_gap
Definition:
- Source scene contains embedded dictionaries / trees / option arrays, but generated skill only carries a starter subset or no dictionary at all
Indicators:
- Source contains files like city.js, dict.js, enum.js, options.js
- Source JS includes tree/option structures with labels/codes/children
- Generated references/org-dictionary.json is empty or much smaller than source evidence
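One way to quantify "much smaller than source evidence" is to compare node counts between the source tree and the generated dictionary. The sketch below assumes both have already been parsed into lists of dicts with an optional `children` key; that shape, and the function names, are assumptions rather than the documented layout of org-dictionary.json.

```python
def count_nodes(nodes: list[dict]) -> int:
    """Count every node in a tree of dicts, descending through 'children'."""
    total = 0
    for node in nodes:
        total += 1
        # A missing or null 'children' entry is treated as a leaf.
        total += count_nodes(node.get("children") or [])
    return total


def dictionary_recovery_ratio(source_tree: list[dict], generated_tree: list[dict]) -> float:
    """Fraction of source dictionary nodes present (by count) in the generated skill."""
    source_count = count_nodes(source_tree)
    if source_count == 0:
        # Nothing to recover, so there is no gap by this measure.
        return 1.0
    return count_nodes(generated_tree) / source_count
```

A ratio well below 1.0 is evidence for dictionary_recovery_gap. The threshold that counts as "much smaller" is left to the analysis, since the design does not fix one.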
3. parameter_default_semantics_gap
Definition:
- Source page applies default values (date, period, mode, range, org) when user omits them, but generated skill currently treats them as required or unresolved
Indicators:
- Source contains moment() / date defaulting / initial query payloads
- Generated parameter readiness previously required explicit user input
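A coarse text scan can surface the first indicator before any manual reading of the source page. This is a sketch under stated assumptions: the hint names are hypothetical, and treating `new Date(` as a sign of date defaulting is an assumption beyond the `moment()` case named above.

```python
import re

# Hypothetical hint names mapped to patterns that suggest client-side date defaulting.
_DEFAULT_HINTS = {
    "moment_call": re.compile(r"\bmoment\s*\("),
    "native_date": re.compile(r"\bnew\s+Date\s*\("),  # assumption: also signals defaulting
}


def default_semantics_hints(source_js: str) -> list[str]:
    """Return the names of the defaulting hints found in a source script."""
    return [name for name, pattern in _DEFAULT_HINTS.items() if pattern.search(source_js)]
```

A hit only shows that the source page computes dates somewhere. Whether the value is really a default applied to an omitted parameter still needs to be confirmed against the initial query payload.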
4. resolver_to_request_mapping_gap
Definition:
- Resolved semantic parameters do not align directly with actual request field names or payload layout used by the source page
Indicators:
- Resolver outputs org_code while request uses orgno, or analogous mismatches
- Generated request template uses placeholders not directly populated by resolver outputs
- Source request payload structure differs from generated request mapping
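The org_code versus orgno mismatch can be detected by diffing template placeholders against resolver output names. The sketch below assumes a `{{name}}` placeholder syntax, which is illustrative only; the real generated request templates may use a different convention.

```python
import re

# Assumed placeholder syntax: {{identifier}}, with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def unmapped_placeholders(request_template: str, resolver_outputs: set[str]) -> list[str]:
    """List template placeholders that no resolver output populates directly."""
    names = set(_PLACEHOLDER.findall(request_template))
    return sorted(names - resolver_outputs)
```

For a template containing `{{orgno}}` and `{{period}}` with resolver outputs `org_code` and `period`, the function returns `["orgno"]`, which is the kind of evidence a scene record could cite for resolver_to_request_mapping_gap.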
5. runtime_url_semantics_gap
Definition:
- Generated skill does not clearly distinguish between app-entry URL, module-route URL, and API endpoint URL for runtime binding
Indicators:
- scene.toml only stores one bootstrap.target_url
- Generation report contains both an app entry and a deeper route candidate
Inputs
Primary inputs:
- examples/scene_skill_102_final_materialization_2026-04-19/skills
- tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json
- tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json
- tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json
Anchor-case source evidence:
D:/desk/智能体资料/全量业务场景/一平台场景/台区线损大数据-月_周累计线损率统计分析
Output Artifacts
1. JSON inventory
tests/fixtures/generated_scene/generated_scene_runtime_semantics_gap_analysis_2026-04-20.json
Required structure:
- top-level summary counts by gap class
- per-scene records
- per-risk-bucket grouping
Each scene record should include:
- sceneId
- sceneName
- archetype
- riskLevel
- gaps
- evidence
- recommendedFixRoutes
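The required structure and record fields can be sketched as a small data model. The field names come from the list above; the field types, the top-level keys (`summary`, `scenes`, `riskBuckets`), and the helper name are assumptions, since the design does not pin them down.

```python
from collections import Counter
from dataclasses import asdict, dataclass, field

GAP_CLASSES = (
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
)


@dataclass
class SceneGapRecord:
    sceneId: str
    sceneName: str
    archetype: str
    riskLevel: str  # one of "high", "medium", "low"
    gaps: list[str] = field(default_factory=list)
    evidence: dict[str, list[str]] = field(default_factory=dict)  # assumed: gap class -> notes
    recommendedFixRoutes: list[str] = field(default_factory=list)


def build_inventory(records: list[SceneGapRecord]) -> dict:
    """Assemble summary counts, per-scene records, and per-risk-bucket grouping."""
    counts = Counter(gap for record in records for gap in record.gaps)
    buckets: dict[str, list[str]] = {"high": [], "medium": [], "low": []}
    for record in records:
        buckets[record.riskLevel].append(record.sceneId)
    return {
        # Every gap class gets a count, even when it is zero.
        "summary": {gap: counts.get(gap, 0) for gap in GAP_CLASSES},
        "scenes": [asdict(record) for record in records],
        "riskBuckets": buckets,
    }
```

The returned dict is plain data, so it can be written with `json.dump(inventory, fh, ensure_ascii=False, indent=2)` to keep Chinese scene names readable in the fixture.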
2. Human-readable report
docs/superpowers/reports/2026-04-20-generated-scene-runtime-semantics-gap-analysis-report.md
The report must answer:
- How many scenes likely have each gap type
- Which families / archetypes are most affected
- Which gaps are generator-level
- Which gaps are runtime-only and should not be pushed back into generation
- Which next implementation routes should be prioritized
Risk Buckets
Scenes should be grouped into:
- high: multi-parameter or runtime-sensitive scenes where inner-network invocation is likely to diverge without further hardening
- medium: scenes with likely alias / dictionary / default-semantics issues but lower execution sensitivity
- low: scenes with no immediate evidence of these five gap classes
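The bucket definitions above can be turned into a deterministic rule so that every scene is graded the same way. This is one possible reading, not a rule the design prescribes: treating the mapping and URL gap classes as "runtime-sensitive" and more than one parameter as "multi-parameter" are both assumptions.

```python
# Assumption: these two gap classes are the ones that make a scene runtime-sensitive.
RUNTIME_SENSITIVE = frozenset({
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
})


def risk_bucket(gaps: set[str], parameter_count: int) -> str:
    """Map a scene's gap tags and parameter count to high / medium / low."""
    if not gaps:
        # No immediate evidence of any of the five gap classes.
        return "low"
    if parameter_count > 1 or gaps & RUNTIME_SENSITIVE:
        return "high"
    return "medium"
```

Under this reading, a single-parameter scene tagged only with invocation_alias_gap lands in medium, while any scene tagged with runtime_url_semantics_gap lands in high regardless of parameter count.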
Acceptance Criteria
This analysis is complete when:
- All 102 final materialized scenes have a runtime-semantics record
- sweep-030-scene is explicitly analyzed under all applicable gap classes
- Summary counts exist for all five gap classes
- Dictionary recovery gap is supported by direct source-vs-generated evidence for the anchor case
- The report recommends next implementation routes without changing code
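The mechanically checkable criteria (record count, anchor presence, summary coverage) can be verified with a short script before the report is published. The sketch assumes the inventory JSON has top-level `scenes` and `summary` keys, which the design does not name explicitly; the evidence and recommendation criteria still need human review.

```python
EXPECTED_SCENES = 102
ANCHOR_SCENE_ID = "sweep-030-scene"
GAP_CLASSES = (
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
)


def acceptance_failures(inventory: dict) -> list[str]:
    """Return human-readable failures; an empty list means the checks pass."""
    failures = []
    scene_ids = {scene.get("sceneId") for scene in inventory.get("scenes", [])}
    if len(scene_ids) != EXPECTED_SCENES:
        failures.append(f"expected {EXPECTED_SCENES} scene records, found {len(scene_ids)}")
    if ANCHOR_SCENE_ID not in scene_ids:
        failures.append(f"anchor case {ANCHOR_SCENE_ID} has no record")
    summary = inventory.get("summary", {})
    missing = [gap for gap in GAP_CLASSES if gap not in summary]
    if missing:
        failures.append("summary counts missing for: " + ", ".join(missing))
    return failures
```

Loading the published fixture with `json.load` and asserting that `acceptance_failures(inventory) == []` would give a cheap regression guard for the first three criteria.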
Stop Statement
Stop after publishing the JSON inventory and report.
Do not open implementation work from this design.