木炎 956f0c2b68 feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00

Generated Scene Runtime Semantics Gap Analysis Design

Status: Superseded by docs/superpowers/specs/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-design.md

Objective

Produce a bounded, implementation-free analysis of runtime semantics gaps across the final 102 generated scene skills, using sweep-030-scene as the anchor case that exposed five concrete gap classes during inner-network validation.

This design does not modify the analyzer, generator, runtime, skill manifests, or execution assets. It defines only how to analyze and classify the gaps that remain between:

generated_scene framework-level success
real inner-network invocation and execution equivalence

Anchor Case

The anchor case is:

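sweep-030-scene / 台区线损大数据-月_周累计线损率统计分析 (roughly: transformer-area line loss big data, monthly/weekly cumulative line-loss rate statistical analysis)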

Inner-network debugging exposed the following gap classes:

invocation_alias_gap
dictionary_recovery_gap
parameter_default_semantics_gap
resolver_to_request_mapping_gap
runtime_url_semantics_gap

The analysis generalizes these five classes across the full 102-scene final materialization set.

Scope

In scope:

Analyze the final 102 generated skills under:
- examples/scene_skill_102_final_materialization_2026-04-19/skills
Inspect:
- scene.toml
- SKILL.toml
- references/generation-report.json
- references/org-dictionary.json where present
- generated browser scripts, where needed as request-mapping evidence
Compare generated assets against source-scene evidence where needed to validate dictionary and runtime-URL semantics
Produce a 102-scene gap inventory and a summary report
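A minimal read-only sketch of the inspection pass, assuming each generated skill is one directory under the skills root and uses the relative paths listed above. The helper names `collect_scene_assets` and `missing_required` are hypothetical, and treating the directory name as the scene key is an assumption; the sketch records which assets exist and leaves parsing to the per-gap checks.

```python
from pathlib import Path

# Relative paths taken from the Scope list; org-dictionary.json is inspected only where present.
REQUIRED_ASSETS = ("scene.toml", "SKILL.toml", "references/generation-report.json")
OPTIONAL_ASSETS = ("references/org-dictionary.json",)


def collect_scene_assets(skills_root: Path) -> dict[str, dict[str, bool]]:
    """Map each skill directory name to the presence of every inspected asset (read-only)."""
    inventory: dict[str, dict[str, bool]] = {}
    for skill_dir in sorted(p for p in skills_root.iterdir() if p.is_dir()):
        inventory[skill_dir.name] = {
            rel: (skill_dir / rel).is_file() for rel in REQUIRED_ASSETS + OPTIONAL_ASSETS
        }
    return inventory


def missing_required(inventory: dict[str, dict[str, bool]]) -> dict[str, list[str]]:
    """Return required assets that are absent, keyed by skill; complete skills are omitted."""
    report: dict[str, list[str]] = {}
    for name, present in inventory.items():
        missing = [rel for rel in REQUIRED_ASSETS if not present[rel]]
        if missing:
            report[name] = missing
    return report


# Hypothetical invocation against the final materialization set:
# assets = collect_scene_assets(Path("examples/scene_skill_102_final_materialization_2026-04-19/skills"))
# print(len(assets), missing_required(assets))
```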

Out of scope:

Any code change in src/
Any edit to generated skill packages
Any update to the execution board or the official board
Any new pseudo-production execution
Any new inner-network fix for a specific scene

Problem Statement

The repository has already reached:

102 / 102 framework auto-pass
102 / 102 final materialized skills
deterministic invocation readiness

However, sweep-030-scene demonstrated that generated skills can still diverge from real runtime semantics in ways that framework-level closure does not capture:

user phrasing differs from the canonical scene name (`invocation_alias_gap`)
the source scene contains complete org dictionaries that were not fully recovered into the generated skill (`dictionary_recovery_gap`)
the source page defaults dates and periods, while the generated invocation initially required explicit period values (`parameter_default_semantics_gap`)
resolver outputs and request field names do not align 1:1 (`resolver_to_request_mapping_gap`)
runtime-context URL semantics differ from module-route URL semantics (`runtime_url_semantics_gap`)

Therefore the next bounded step is analysis, not implementation.

Gap Taxonomy

Each scene may be tagged with zero or more of the following gap classes:

1. `invocation_alias_gap`

Definition:

Natural operator phrasing is likely not covered by the current deterministic include_keywords

Indicators:

Deterministic keywords contain only the canonical scene title
The scene title includes punctuation, separators, or compound mode phrases (for example 月_周, "month/week" in the anchor title)
Existing reports already required alias normalization for the scene
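The indicators above can be checked mechanically. In this sketch, `alias_gap_indicators` and `title_fragments` are hypothetical helpers, and the separator set is an assumption chosen because the anchor title joins its parts with `-` and `_`. The third indicator comes from existing reports, so it is passed in as a flag rather than derived.

```python
import re

# Assumed separator set; the anchor title uses "-" and "_" between compound mode phrases.
SEPARATORS = re.compile(r"[-_/|、，,·\s]+")


def alias_gap_indicators(
    scene_name: str,
    include_keywords: list[str],
    report_required_alias_normalization: bool = False,
) -> list[str]:
    """Return the invocation_alias_gap indicators that fire for one scene."""
    fired = []
    keywords = {kw.strip() for kw in include_keywords if kw.strip()}
    # Also true when no keywords exist at all, which is at least as weak as title-only.
    if keywords <= {scene_name.strip()}:
        fired.append("keywords_only_canonical_title")
    if SEPARATORS.search(scene_name):
        fired.append("title_has_separators")
    if report_required_alias_normalization:
        fired.append("report_required_alias_normalization")
    return fired


def title_fragments(scene_name: str) -> list[str]:
    """Split a compound title into fragments an operator might type instead of the full name."""
    return [part for part in SEPARATORS.split(scene_name) if part]
```

Fragments are only candidates for alias review; deciding which ones become include_keywords is left to a later implementation step.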

2. `dictionary_recovery_gap`

Definition:

The source scene contains embedded dictionaries, trees, or option arrays, but the generated skill carries only a starter subset or no dictionary at all

Indicators:

The source contains files such as city.js, dict.js, enum.js, or options.js
Source JS includes tree or option structures with labels, codes, and children
The generated references/org-dictionary.json is absent, empty, or much smaller than the source evidence
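A sketch of the size comparison behind the third indicator, assuming the source structures (for example from city.js) and references/org-dictionary.json have already been parsed into nested dicts with an optional `children` list. Extracting those structures from JS is not shown. The helper names and the `min_ratio` threshold of 0.5 for "much smaller" are illustrative assumptions, not values from this design.

```python
def count_entries(nodes) -> int:
    """Count entries in a nested tree of dicts that may carry a "children" list."""
    if isinstance(nodes, dict):
        nodes = [nodes]
    total = 0
    for node in nodes or []:
        total += 1 + count_entries(node.get("children") or [])
    return total


def dictionary_recovery_gap(source_nodes, generated_nodes, min_ratio: float = 0.5) -> bool:
    """Flag a gap when the generated dictionary is absent, empty, or much smaller than the source."""
    source_count = count_entries(source_nodes)
    if source_count == 0:
        return False  # No source dictionary evidence, so no recovery gap can be claimed.
    generated_count = count_entries(generated_nodes)
    return generated_count == 0 or generated_count / source_count < min_ratio
```

Recording both counts in the scene's evidence gives the direct source-vs-generated comparison that the acceptance criteria ask for on the anchor case.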

3. `parameter_default_semantics_gap`

Definition:

The source page applies default values (date, period, mode, range, org) when the user omits them, but the generated skill currently treats them as required or unresolved

Indicators:

The source contains moment() calls, date-defaulting logic, or initial query payloads
Generated parameter readiness previously required explicit user input for these values
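A coarse scan for the first indicator, combined with the readiness check from the second. The regex patterns and helper names are hypothetical; `source_defaulted_params` is assumed to be the list of parameters the analyst found being defaulted in the source code. Initial query payloads are not detected by this sketch and still need manual review.

```python
import re

# Hypothetical patterns suggesting the source page fills date/period values itself.
DEFAULTING_PATTERNS = {
    "moment_call": re.compile(r"\bmoment\s*\("),
    "date_arithmetic": re.compile(r"\bnew\s+Date\s*\(|\.(?:subtract|startOf|endOf)\s*\("),
}


def defaulting_evidence(source_js: str) -> list[str]:
    """Names of the defaulting patterns found in a source script, sorted."""
    return sorted(name for name, rx in DEFAULTING_PATTERNS.items() if rx.search(source_js))


def parameter_default_semantics_gap(
    source_js: str, required_params: list[str], source_defaulted_params: list[str]
) -> list[str]:
    """Parameters the generated skill requires although the source page appears to default them."""
    if not defaulting_evidence(source_js):
        return []
    return sorted(set(required_params) & set(source_defaulted_params))
```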

4. `resolver_to_request_mapping_gap`

Definition:

Resolved semantic parameters do not align directly with the actual request field names or payload layout used by the source page

Indicators:

The resolver outputs org_code while the request uses orgno, or a similar field-name mismatch exists
The generated request template uses placeholders that resolver outputs do not populate directly
The source request payload structure differs from the generated request mapping
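A sketch of the first and third indicators, assuming generated request templates mark placeholders as `{{field_name}}` (an assumption about the template syntax) and that source and generated payloads are available as dicts. The `orgno` / `org_code` pair from the indicator list serves as the test case; helper names are hypothetical.

```python
import re

# Assumed placeholder syntax in generated request templates: {{field_name}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def unpopulated_placeholders(request_template: str, resolver_outputs: dict) -> list[str]:
    """Template placeholders that no resolver output fills by name."""
    return sorted(set(PLACEHOLDER.findall(request_template)) - set(resolver_outputs))


def payload_field_diff(source_payload: dict, generated_payload: dict) -> dict[str, list[str]]:
    """Top-level field names present on one side of the request mapping but not the other."""
    source_keys, generated_keys = set(source_payload), set(generated_payload)
    return {
        "missing_in_generated": sorted(source_keys - generated_keys),
        "extra_in_generated": sorted(generated_keys - source_keys),
    }
```

Only top-level keys are compared; nested payload layout differences would need a recursive diff.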

5. `runtime_url_semantics_gap`

Definition:

The generated skill does not clearly distinguish among the app-entry URL, the module-route URL, and the API endpoint URL for runtime binding

Indicators:

scene.toml stores only a single bootstrap.target_url
Inner-network execution shows that the app-entry URL succeeds while the module-route URL fails, or vice versa
The generation report contains both an app-entry candidate and a deeper route candidate
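A heuristic sketch of the URL-role distinction and the first and third indicators. The rules are assumptions: a path segment `api` marks an API endpoint, a fragment starting with `/` marks a hash-based module route, and anything else is treated as an app entry. The second indicator depends on inner-network execution results and cannot be derived from static assets, so it is not modeled here. The example host `portal.example` is hypothetical.

```python
from urllib.parse import urlsplit


def classify_url(url: str) -> str:
    """Coarse role of a URL: api_endpoint, module_route, or app_entry (heuristic only)."""
    parts = urlsplit(url)
    if "/api/" in f"{parts.path}/":
        return "api_endpoint"
    if parts.fragment.startswith("/"):
        return "module_route"  # Assumes hash-based SPA routing such as "#/some/module".
    return "app_entry"


def runtime_url_semantics_gap(bootstrap_urls: list[str], report_candidates: list[str]) -> bool:
    """Flag scenes that bind one bootstrap URL while the report knows both entry and route roles."""
    roles_seen = {classify_url(u) for u in report_candidates}
    return len(bootstrap_urls) == 1 and {"app_entry", "module_route"} <= roles_seen


app = "http://portal.example/app/"
route = "http://portal.example/app/#/lineLoss/monthly"
print(classify_url(app), classify_url(route), runtime_url_semantics_gap([route], [app, route]))
```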

Inputs

Primary inputs:

examples/scene_skill_102_final_materialization_2026-04-19/skills
tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json
tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json
tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json

Anchor-case source evidence:

D:/desk/智能体资料/全量业务场景/一平台场景/台区线损大数据-月_周累计线损率统计分析

Output Artifacts

1. JSON inventory

tests/fixtures/generated_scene/generated_scene_runtime_semantics_gap_analysis_2026-04-20.json

Required structure:

top-level summary counts by gap class
per-scene records
per-risk-bucket grouping

Each scene record should include:

sceneId
sceneName
archetype
riskLevel
gaps
evidence
recommendedFixRoutes
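One possible shape for the inventory, sketched with Python dataclasses. The record field names mirror the list above; the value types, the validation in `__post_init__`, and the top-level keys `summary`, `scenes`, and `riskBuckets` are assumptions about how the required structure could be laid out.

```python
from collections import Counter
from dataclasses import asdict, dataclass, field

GAP_CLASSES = (
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
)
RISK_LEVELS = ("high", "medium", "low")


@dataclass
class SceneRecord:
    # Field names mirror the required scene-record fields; value types are assumptions.
    sceneId: str
    sceneName: str
    archetype: str
    riskLevel: str
    gaps: list[str] = field(default_factory=list)
    evidence: dict[str, list[str]] = field(default_factory=dict)  # gap class -> evidence notes
    recommendedFixRoutes: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        unknown = set(self.gaps) - set(GAP_CLASSES)
        if unknown:
            raise ValueError(f"unknown gap classes: {sorted(unknown)}")
        if self.riskLevel not in RISK_LEVELS:
            raise ValueError(f"unknown risk level: {self.riskLevel}")


def build_inventory(records: list[SceneRecord]) -> dict:
    """Assemble summary counts by gap class, per-scene records, and per-risk-bucket grouping."""
    counts = Counter(gap for record in records for gap in set(record.gaps))
    return {
        "summary": {gap: counts[gap] for gap in GAP_CLASSES},
        "scenes": [asdict(record) for record in records],
        "riskBuckets": {
            level: [r.sceneId for r in records if r.riskLevel == level] for level in RISK_LEVELS
        },
    }
```

Writing the result with `json.dump(inventory, fh, ensure_ascii=False, indent=2)` keeps Chinese scene names readable in the fixture file.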

2. Human-readable report

docs/superpowers/reports/2026-04-20-generated-scene-runtime-semantics-gap-analysis-report.md

The report must answer:

How many scenes likely have each gap type
Which scene families or archetypes are most affected
Which gaps are generator-level
Which gaps are runtime-only and should not be pushed back into generation
Which implementation routes should be prioritized next

Risk Buckets

Scenes should be grouped into:

high: multi-parameter or runtime-sensitive scenes where inner-network invocation is likely to diverge without further hardening
medium: scenes with likely alias, dictionary, or default-semantics issues but lower execution sensitivity
low: scenes with no immediate evidence of these five gap classes
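The bucket rules can be read as a small decision function. This is one interpretation, stated as an assumption: the resolver-mapping and runtime-URL gaps count as runtime-sensitive, and a scene with more than one parameter is high risk only when it also carries at least one gap. The function name and the `parameter_count` input are hypothetical.

```python
# Interpretation (assumption): resolver-mapping and runtime-URL gaps make a scene runtime-sensitive.
RUNTIME_SENSITIVE_GAPS = frozenset({"resolver_to_request_mapping_gap", "runtime_url_semantics_gap"})


def assign_risk_level(gaps: list[str], parameter_count: int) -> str:
    """Map a scene's gap tags and parameter count to high / medium / low."""
    gap_set = set(gaps)
    if not gap_set:
        return "low"  # No evidence of any of the five gap classes.
    if gap_set & RUNTIME_SENSITIVE_GAPS or parameter_count > 1:
        return "high"  # Runtime-sensitive, or multi-parameter with at least one gap.
    return "medium"  # Alias, dictionary, or default-semantics issues only.
```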

Acceptance Criteria

This analysis is complete when:

All 102 final materialized scenes have a runtime-semantics record
sweep-030-scene is explicitly analyzed under all applicable gap classes
Summary counts exist for all five gap classes
The dictionary_recovery_gap finding for the anchor case is supported by direct source-vs-generated evidence
The report recommends next implementation routes without changing code
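Most of these criteria can be checked against the JSON inventory before the report is published. This sketch assumes an inventory layout with `summary`, `scenes`, and per-gap `evidence` keys (an assumption, not a fixed schema) and requires dictionary evidence on the anchor record; whether the report's recommendations avoid code changes remains a manual review item. The function name `acceptance_failures` is hypothetical.

```python
GAP_CLASSES = (
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
)
ANCHOR_SCENE_ID = "sweep-030-scene"
EXPECTED_SCENE_COUNT = 102


def acceptance_failures(inventory: dict) -> list[str]:
    """Return unmet inventory-side acceptance criteria; an empty list means they all pass."""
    failures = []
    scenes = inventory.get("scenes", [])
    scene_ids = [s.get("sceneId") for s in scenes]
    if len(scene_ids) != EXPECTED_SCENE_COUNT or len(set(scene_ids)) != len(scene_ids):
        failures.append(
            f"expected {EXPECTED_SCENE_COUNT} unique scene records, "
            f"found {len(scene_ids)} ({len(set(scene_ids))} unique)"
        )
    anchor = next((s for s in scenes if s.get("sceneId") == ANCHOR_SCENE_ID), None)
    if anchor is None or not anchor.get("gaps"):
        failures.append(f"{ANCHOR_SCENE_ID} has no gap analysis")
    elif not anchor.get("evidence", {}).get("dictionary_recovery_gap"):
        failures.append(f"{ANCHOR_SCENE_ID} lacks source-vs-generated dictionary evidence")
    missing = [g for g in GAP_CLASSES if g not in inventory.get("summary", {})]
    if missing:
        failures.append(f"summary counts missing for: {', '.join(missing)}")
    return failures
```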

Stop Statement

Stop after publishing the JSON inventory and report.

Do not open implementation work from this design.
