
# Generated Scene Source-First Runtime Semantics Hardening Design

> Date: 2026-04-20
> Status: Draft
> Supersedes:
> - `docs/superpowers/specs/2026-04-20-generated-scene-runtime-semantics-gap-analysis-design.md`
> Upstream Parent:
> - `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Upstream Materialization:
> - `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`

## Intent

Define the next parent roadmap for `generated_scene` after framework closure has already been achieved.

The purpose is no longer to establish:

- whether the `102` scenes can be generated into skills

That has already been proven.

The purpose is now to:

- scan the original `102` source scenes for runtime-semantics evidence
- identify all scenes that can reproduce the same class of divergence exposed by `sweep-030-scene`
- harden analyzer / generator / manifest rules at the rule level rather than scene-by-scene
- regenerate the full `102` skill set from the hardened rules
- rerun validation assets so future inner-network execution does not rediscover the same class of defects one scene at a time

This design deliberately moves from a weak `generated-skill-first` analysis to a stronger `source-first` analysis and regeneration program.

## Why the Previous Analysis Was Not Enough

The superseded analysis-only design focused mainly on the already-generated skill assets.

That is insufficient for the actual project goal, because the goal is not simply to describe gaps that already surfaced in generated skills. The goal is to:

1. proactively find other source scenes with the same latent runtime-semantics risks as `sweep-030-scene`
2. correct the generation rules once
3. regenerate the full 102-scene bundle
4. avoid repeated inner-network rediscovery of the same class of defects

Therefore the correct parent approach must be source-first.

## Anchor Problem Family

`sweep-030-scene / 台区线损大数据-月_周累计线损率统计分析` exposed five reusable gap classes:

1. `invocation_alias_gap`
2. `dictionary_recovery_gap`
3. `parameter_default_semantics_gap`
4. `resolver_to_request_mapping_gap`
5. `runtime_url_semantics_gap`

The roadmapping problem is no longer “fix sweep-030”.

It is:

`find every source scene in the current 102 set that can reproduce one or more of these five gap classes, then harden generation rules and rematerialize the whole set`

## Source-First Principle

For this roadmap, the original source scenes are the primary truth.

Generated skills are secondary, derived artifacts used for comparison.

This means:

1. risk discovery starts from original source-scene files, not from generated output alone
2. generated skills are used to measure what is missing compared with source evidence
3. implementation targets rule-level recovery, not scene-name patching
4. the roadmap is incomplete until the full 102 skills are regenerated from hardened rules

## Scope

In scope:

1. Scan the original 102 source-scene directories under:
   - `D:/desk/智能体资料/全量业务场景/一平台场景`
2. Cross-map each source scene to the current final generated skill
3. Detect source-side evidence for the five runtime-semantics gap classes
4. Produce a full risk ledger for all 102 scenes
5. Define the bounded implementation routes required to harden generation rules
6. Define the required full rematerialization and validation refresh after rule changes

Out of scope:

1. Inner-network execution itself
2. Login / credential handling
3. Host-bridge runtime hardening outside current generated-scene semantics
4. Scene-by-scene ad hoc inner-network patching as the primary method

## Problem Restatement

The repository has already reached:

1. `102 / 102` framework auto-pass
2. `102 / 102` materialized skills
3. deterministic invocation readiness
4. full direct mock pass

But `sweep-030-scene` proved that generated skills can still diverge from original scene runtime semantics in ways that only surface when actually invoked in a browser-attached environment.

The project cannot sustainably close that gap by waiting for each scene to fail in inner-network execution.

The missing capability is:

`source-first runtime semantics extraction and rule hardening`

## Runtime-Semantics Gap Taxonomy

The five anchor gap classes remain the canonical taxonomy.

### 1. `invocation_alias_gap`

The original scene affords natural operator phrasing, but the generated deterministic manifest is too narrow.

### 2. `dictionary_recovery_gap`

The original scene contains embedded dictionaries, trees, or option structures, but the generated skill only restores a starter subset or no dictionary.
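
One way to quantify this gap is to compare the option codes found in a source dictionary against the codes restored in the generated skill. The sketch below assumes both sides have already been reduced to sets of code strings; the function name `dictionary_coverage` and the sample codes are hypothetical.

```python
def dictionary_coverage(source_codes: set[str], generated_codes: set[str]) -> dict:
    """Compare source dictionary codes with the codes a generated skill restored."""
    missing = sorted(source_codes - generated_codes)
    extra = sorted(generated_codes - source_codes)
    if source_codes:
        coverage = len(source_codes & generated_codes) / len(source_codes)
    else:
        # No source dictionary means there is nothing to recover.
        coverage = 1.0
    return {"missing": missing, "extra": extra, "coverage": coverage}


# Hypothetical codes, for illustration only.
report = dictionary_coverage({"A01", "A02", "A03", "A04"}, {"A01", "A02"})
```

A coverage below `1.0` with a non-empty `missing` list would correspond to the "starter subset" case described above.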

### 3. `parameter_default_semantics_gap`

The original page supplies defaults for time / mode / org parameters, but the generated skill treats them as explicitly required inputs.
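
A minimal sketch of what recovered default semantics could look like for a month parameter. The previous-month fallback is an assumption chosen for illustration; the real default has to be read from each source page. The function name `resolve_month` is hypothetical.

```python
from datetime import date
from typing import Optional


def resolve_month(explicit: Optional[str], today: date) -> str:
    """Return 'YYYY-MM'; fall back to the previous calendar month when omitted.

    The previous-month rule is an assumed default, not taken from any source page.
    """
    if explicit:
        return explicit
    if today.month == 1:
        year, month = today.year - 1, 12
    else:
        year, month = today.year, today.month - 1
    return f"{year:04d}-{month:02d}"
```

With a rule like this, the parameter becomes optional at invocation time: an operator request without a month still resolves to a concrete value instead of being rejected.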

### 4. `resolver_to_request_mapping_gap`

The generated resolver output names are not the actual request payload field names used by the original page.
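
The mapping step can be sketched as an explicit rename table between resolver output names and request payload field names. All field names below (`org_code`, `orgNo`, `stat_month`, `month`) are hypothetical; the actual names must come from the source page's request code.

```python
def to_request_payload(resolved: dict, field_map: dict) -> dict:
    """Rename resolver output keys to the field names the source request uses."""
    unmapped = [key for key in resolved if key not in field_map]
    if unmapped:
        # Fail loudly rather than send a field name the backend does not expect.
        raise KeyError(f"no request field mapping for: {unmapped}")
    return {field_map[key]: value for key, value in resolved.items()}


# Hypothetical names on both sides of the mapping.
FIELD_MAP = {"org_code": "orgNo", "stat_month": "month"}
payload = to_request_payload({"org_code": "A01", "stat_month": "2026-03"}, FIELD_MAP)
```

Raising on an unmapped key is a design choice in this sketch: it surfaces a mapping gap during static validation or mock execution rather than in a browser-attached run.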

### 5. `runtime_url_semantics_gap`

The generated skill does not properly separate:

1. app-entry URL
2. module-route URL
3. API endpoint URL
4. runtime browser context URL
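
One possible way to keep the four kinds apart is to give each its own field instead of a single shared `url` value. The class and field names below are assumptions for illustration, and the URLs are placeholders on a reserved example domain.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class RuntimeUrls:
    """Keeps the four URL kinds apart instead of sharing one 'url' field."""
    app_entry: Optional[str] = None
    module_route: Optional[str] = None
    api_endpoint: Optional[str] = None
    browser_context: Optional[str] = None

    def missing_kinds(self) -> list[str]:
        """Names of URL kinds that were not recovered for this scene."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Placeholder URLs, not taken from any source scene.
urls = RuntimeUrls(
    app_entry="https://example.invalid/app/",
    api_endpoint="https://example.invalid/app/api/query",
)
```

Here `urls.missing_kinds()` reports which kinds still need to be recovered from source evidence.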

## New Required Source-Side Scan

The new parent roadmap must explicitly scan the original source scenes for high-signal evidence.

### Evidence families to scan

1. Dictionary files
   - `city.js`
   - `dict.js`
   - `enum.js`
   - `options*.js`
   - tree / option / label-code-value arrays

2. Default-parameter semantics
   - `moment(`
   - `dayjs(`
   - month/week defaulting
   - implicit query payload initialization

3. Request payload semantics
   - `$.ajax`
   - `fetch`
   - `contentType`
   - `data`
   - request body field names

4. Runtime URL semantics
   - app entry URLs
   - module route URLs
   - menu navigation targets
   - bootstrap candidates

5. Invocation alias evidence
   - titles
   - menu labels
   - button text
   - route names
   - report names
   - operator-facing wording
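
A minimal sketch of how the first three evidence families could be detected with file-name globs and text markers. It uses only the markers listed above; URL and alias evidence are omitted because they likely need scene-specific heuristics. The function name, the flag names, and the in-memory sample scene are all hypothetical.

```python
import fnmatch
import re

# File-name globs and code markers taken from the evidence families above.
DICTIONARY_GLOBS = ["city.js", "dict.js", "enum.js", "options*.js"]
MARKERS = {
    "page_defaults": re.compile(r"\b(?:moment|dayjs)\s*\("),
    "request_payload": re.compile(r"\$\.ajax\b|\bfetch\s*\(|\bcontentType\b"),
}


def scan_scene(files: dict[str, str]) -> dict[str, bool]:
    """Scan one scene given as {relative_path: file_text}; return evidence flags."""
    names = [path.rsplit("/", 1)[-1] for path in files]
    found = {
        "dictionaries": any(
            fnmatch.fnmatchcase(name, glob)
            for name in names
            for glob in DICTIONARY_GLOBS
        )
    }
    for family, pattern in MARKERS.items():
        found[family] = any(pattern.search(text) for text in files.values())
    return found


# Tiny in-memory scene, invented for illustration.
flags = scan_scene({
    "js/city.js": "var city = [{label: 'x', value: '1'}];",
    "js/index.js": "var m = moment().format('YYYY-MM'); $.ajax({data: {month: m}});",
})
```

Marker hits of this kind only indicate that evidence may exist; a flagged scene would still need its dictionaries, defaults, and payload fields extracted before any rule change is made.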

### Required output of the scan

For each source scene:

1. whether embedded dictionaries exist
2. whether page defaults exist
3. whether request-field aliasing exists
4. whether multiple URL kinds exist
5. whether natural alias variation is likely
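
The five per-scene answers could be recorded as one ledger row, as in the sketch below. The class name, field names, and the one-to-one mapping from evidence flags to gap classes are assumptions for illustration; the flag values are placeholders, not findings.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SceneEvidence:
    """One ledger row: the five per-scene answers listed above."""
    scene_id: str
    has_embedded_dictionaries: bool
    has_page_defaults: bool
    has_request_field_aliasing: bool
    has_multiple_url_kinds: bool
    alias_variation_likely: bool

    def candidate_gaps(self) -> list[str]:
        """Gap classes this scene could reproduce, judged from source evidence alone."""
        flags = {
            "invocation_alias_gap": self.alias_variation_likely,
            "dictionary_recovery_gap": self.has_embedded_dictionaries,
            "parameter_default_semantics_gap": self.has_page_defaults,
            "resolver_to_request_mapping_gap": self.has_request_field_aliasing,
            "runtime_url_semantics_gap": self.has_multiple_url_kinds,
        }
        return [gap for gap, present in flags.items() if present]


# Placeholder values for a made-up scene id.
row = SceneEvidence("example-scene", True, False, True, False, False)
print(json.dumps({**asdict(row), "candidate_gaps": row.candidate_gaps()}, indent=2))
```

Serializing rows in this shape would let the ledger be diffed between scans and consumed by later routes without re-reading the source scenes.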

## Work Product Hierarchy

The roadmap should produce three layers of output.

### Layer 1: Source-Side Risk Ledger

A full 102-scene ledger that starts from original source evidence.

### Layer 2: Rule-Hardening Route Map

A route map that groups scenes by reusable rule fixes rather than by scene name.
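
Grouping by rule fix amounts to inverting the ledger from "scene to gap classes" into "gap class to scenes". A minimal sketch, assuming the ledger has been reduced to a plain dictionary; the scene ids and gap assignments are invented.

```python
from collections import defaultdict


def group_by_gap(ledger: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert {scene_id: [gap classes]} into {gap class: [scene_ids]}."""
    routes: dict[str, list[str]] = defaultdict(list)
    for scene_id in sorted(ledger):  # sorted for a stable, diffable route map
        for gap in ledger[scene_id]:
            routes[gap].append(scene_id)
    return dict(routes)


# Scene ids and gap assignments below are invented.
routes = group_by_gap({
    "scene-b": ["dictionary_recovery_gap"],
    "scene-a": ["dictionary_recovery_gap", "runtime_url_semantics_gap"],
})
```

Each key in `routes` then names one reusable fix, and its value lists every scene that the fix should be verified against after rematerialization.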

### Layer 3: Rematerialization + Validation Refresh Plan

A controlled plan for regenerating all 102 skills and refreshing validation assets after the rule changes land.

## Core Routes

The source-first roadmap must be split into these fixed routes:

### Route A: Source Cross-Scan and Evidence Ledger

Goal:

Build a full 102-scene source-first runtime-semantics risk inventory.

### Route B: Rule-Level Hardening Design

Goal:

Translate the source-first gaps into rule-level changes for analyzer/generator/manifest output.

Primary targets:

1. alias generation
2. dictionary extraction
3. parameter default recovery
4. resolver-to-request field mapping
5. runtime URL classification

### Route C: Bounded Implementation Slices

Goal:

Implement the rule-level hardening in bounded slices organized by reusable fix route, not by single scene.

### Route D: Full 102 Rematerialization

Goal:

Regenerate all 102 skills after hardening so the new rules actually propagate to the released skill bundle.

### Route E: Validation Refresh

Goal:

Refresh:

1. deterministic invocation readiness
2. parameter readiness
3. static validation
4. direct mock execution
5. offline / pseudo-production handoff assets

## Inputs

Primary source inventory:

- `D:/desk/智能体资料/全量业务场景/一平台场景`

Primary generated comparison inventory:

- `examples/scene_skill_102_final_materialization_2026-04-19/skills`

Supporting assets:

- `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
- `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`

## Deliverables

### 1. Source-first risk ledger

- `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`

### 2. Source-first analysis report

- `docs/superpowers/reports/2026-04-20-generated-scene-source-first-runtime-semantics-report.md`

### 3. Rule-hardening roadmap outputs

These outputs are not implemented in this design, but this design must define the bounded follow-up plans that build on the ledger.

## Acceptance Criteria

This design is successful when:

1. it explicitly requires source-scene cross-scan over the full 102 set
2. it no longer relies on generated-skill-only inspection as the main discovery method
3. it makes full rematerialization a required downstream step
4. it treats `sweep-030-scene` as an anchor case, not a one-off patch
5. it defines a route from source scan to rule hardening to regeneration

## Stop Rule

Stop after publishing the parent design and parent plan.

Do not begin source scanning or implementation inside this design document.