Generated Scene Source-First Runtime Semantics Ledger Design

Date: 2026-04-20
Status: Draft
Parent roadmap: docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-plan.md
Upstream scan: docs/superpowers/plans/2026-04-20-generated-scene-source-evidence-cross-scan-plan.md

Intent

Define the second bounded child step of the source-first runtime semantics hardening roadmap:

merge source-side evidence with generated-skill evidence into a full 102-scene runtime-semantics ledger

This design is still analysis-only. It does not modify src/, generated skills, validation assets, or execution-board state.

Objective

For every scene in the current 102-scene set:

merge source-side evidence from the completed cross-scan
compare that evidence against current generated skill manifests and references
assign one or more canonical runtime-semantics gap classes
assign a bounded riskLevel
distinguish:
- reusable generator-level rule gap
- runtime-only residual
publish a source-first runtime-semantics ledger that becomes the only valid input for later hardening-route design

Fixed Gap Taxonomy

The ledger must continue using the five gap classes already anchored by sweep-030-scene:

invocation_alias_gap
dictionary_recovery_gap
parameter_default_semantics_gap
resolver_to_request_mapping_gap
runtime_url_semantics_gap

No additional gap class should be invented inside this ledger stage unless the evidence is clearly outside these five and cannot be expressed as a subtype.
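
As an illustration of how a ledger builder might enforce the fixed taxonomy, here is a minimal sketch. The `GapClass` enum and the `validate_gaps` helper are hypothetical names, not part of the existing codebase; only the five string values come from this design.

```python
from enum import Enum


class GapClass(str, Enum):
    """The five gap classes fixed by this design (enum name is hypothetical)."""
    INVOCATION_ALIAS = "invocation_alias_gap"
    DICTIONARY_RECOVERY = "dictionary_recovery_gap"
    PARAMETER_DEFAULT_SEMANTICS = "parameter_default_semantics_gap"
    RESOLVER_TO_REQUEST_MAPPING = "resolver_to_request_mapping_gap"
    RUNTIME_URL_SEMANTICS = "runtime_url_semantics_gap"


def validate_gaps(gaps: list[str]) -> list[GapClass]:
    """Reject any gap label that falls outside the five-class taxonomy."""
    allowed = {g.value for g in GapClass}
    unknown = [g for g in gaps if g not in allowed]
    if unknown:
        raise ValueError(f"unknown gap classes: {unknown}")
    return [GapClass(g) for g in gaps]
```

Rejecting unknown labels early keeps ad hoc classes out of the ledger; evidence that does not fit would then be recorded as a subtype in free-text notes rather than as a new class.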

Scope

In scope:

the completed source cross-scan asset
the current final generated skills under examples/scene_skill_102_final_materialization_2026-04-19/skills
current deterministic invocation readiness assets
current natural-language parameter readiness assets
current parameter dictionary normalization assets
source-to-generated comparison for all 102 scenes
JSON ledger + human-readable report

Out of scope:

any change in src/
any skill manifest or script edit
any rematerialization
any validation rerun
any inner-network execution

Required Comparisons

The ledger stage must compare source evidence with generated output along these axes.

1. Invocation alias comparison

Check whether source-side operator wording, labels, route names, or titles imply broader natural-language coverage than the current generated include_keywords.
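
One possible way to mechanize this check is sketched below. The function name and the matching rule (case-insensitive exact match after trimming) are assumptions made for illustration; a real comparison may need fuzzier matching.

```python
def uncovered_source_terms(source_terms: list[str],
                           include_keywords: list[str]) -> list[str]:
    """Return source-side terms that no generated include keyword covers.

    Assumption: a term counts as covered only when it equals a keyword
    after trimming whitespace and lowercasing.
    """
    covered = {k.strip().lower() for k in include_keywords}
    candidates = {t.strip().lower() for t in source_terms}
    return sorted(candidates - covered)
```

A non-empty result would be evidence for `invocation_alias_gap` on that scene.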

2. Dictionary comparison

Check whether source-side dictionaries, trees, or option arrays imply a richer entity dictionary than the generated references/*dictionary*.json assets currently expose.

3. Parameter default semantics comparison

Check whether source-side date / period / mode initialization implies a default-value policy that the generated manifest or resolver metadata does not currently preserve.

4. Resolver-to-request mapping comparison

Check whether source-side request field names differ from generated resolver output names and whether the generated skill currently encodes an explicit mapping.
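
The mapping check could be sketched as follows. The function name, the argument shapes, and the idea that an explicit mapping is a dictionary from resolver output name to request field name are all assumptions for illustration.

```python
def unmapped_request_fields(request_fields: list[str],
                            resolver_outputs: list[str],
                            explicit_mapping: dict[str, str]) -> list[str]:
    """Return source request fields that no resolver output reaches.

    A resolver output reaches a request field either through an explicit
    mapping entry or, when no entry exists, by having the same name.
    """
    reachable = {explicit_mapping.get(name, name) for name in resolver_outputs}
    return sorted(set(request_fields) - reachable)
```

For example, with request fields `orgCode` and `period` and resolver outputs `org_id` and `period`, the helper reports `orgCode` as unmapped until an explicit `org_id` to `orgCode` entry is supplied.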

5. Runtime URL comparison

Check whether source-side evidence implies multiple URL roles:

app entry URL
module route URL
API endpoint URL
runtime browser context URL

and whether the generated skill currently collapses those roles into a single ambiguous target.
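
A rough sketch of the URL-role check follows. The role keys, the function name, and the representation of the generated skill's targets as a flat list are assumptions; the URLs in any call would be supplied from the actual evidence.

```python
# Role keys mirror the four URL roles listed above (key spellings are hypothetical).
URL_ROLES = ("app_entry", "module_route", "api_endpoint", "runtime_browser_context")


def collapses_url_roles(source_urls: dict[str, str],
                        generated_targets: list[str]) -> bool:
    """True when source evidence shows several distinct URLs across roles
    but the generated skill exposes at most one distinct target."""
    source_distinct = {source_urls[role] for role in URL_ROLES if role in source_urls}
    generated_distinct = set(generated_targets)
    return len(source_distinct) > 1 and len(generated_distinct) <= 1
```

A `True` result would be evidence for `runtime_url_semantics_gap`.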

Ledger Schema

Each scene record in the runtime-semantics ledger should include:

sceneId
sceneName
sourceDir
archetype
readiness
riskLevel
gaps
generatorLevelGap
runtimeOnlyResidual
recommendedFixRoutes
sourceEvidenceSummary
generatedEvidenceSummary
comparisonNotes
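
To make the record shape concrete, here is a minimal sketch of one record as a Python dataclass. The field names come from the list above; the field types, the defaults, the class name, and every value in the sample record are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SceneLedgerRecord:
    """One scene entry in the ledger (types and defaults are assumed)."""
    sceneId: str
    sceneName: str
    sourceDir: str
    archetype: str
    readiness: str
    riskLevel: str  # expected to be "high", "medium", or "low"
    gaps: list[str] = field(default_factory=list)
    generatorLevelGap: bool = False
    runtimeOnlyResidual: bool = False
    recommendedFixRoutes: list[str] = field(default_factory=list)
    sourceEvidenceSummary: str = ""
    generatedEvidenceSummary: str = ""
    comparisonNotes: str = ""


# Hypothetical sample values, not real ledger data.
record = SceneLedgerRecord(
    sceneId="example-scene",
    sceneName="Example scene",
    sourceDir="path/to/source",
    archetype="example-archetype",
    readiness="example-readiness",
    riskLevel="medium",
    gaps=["invocation_alias_gap"],
    generatorLevelGap=True,
)
print(json.dumps(asdict(record), indent=2))
```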

Risk-Level Rules

The ledger should use bounded, reproducible risk levels:

`high`

Use when the scene has strong source evidence for one or more gap classes and the current generated skill visibly lacks equivalent semantics.

`medium`

Use when the scene has source evidence for one or more gap classes, but current generated output appears partially aligned or the mismatch is plausible rather than explicit.

`low`

Use when source evidence exists but generated output already appears materially aligned, or when the residual is likely runtime-only rather than generator-level.
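
The three rules above can be sketched as a small decision function. The encoding of evidence strength and alignment as string labels, and the function name, are assumptions; the design itself only states the rules in prose.

```python
def assign_risk_level(evidence: str, alignment: str,
                      runtime_only_likely: bool = False) -> str:
    """Map comparison findings to a bounded risk level.

    evidence:  "strong" or "present"   (assumed labels)
    alignment: "missing", "partial", or "aligned"   (assumed labels)
    """
    if evidence not in ("strong", "present"):
        raise ValueError(f"unexpected evidence label: {evidence}")
    if alignment not in ("missing", "partial", "aligned"):
        raise ValueError(f"unexpected alignment label: {alignment}")
    # low: generated output already aligned, or residual is likely runtime-only
    if alignment == "aligned" or runtime_only_likely:
        return "low"
    # high: strong evidence and the generated skill visibly lacks the semantics
    if evidence == "strong" and alignment == "missing":
        return "high"
    # medium: partial alignment, or a plausible rather than explicit mismatch
    return "medium"
```

Encoding the rules as a pure function is one way to keep the levels reproducible: the same findings always yield the same level.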

Generator-Level vs Runtime-Only

The ledger must classify whether a scene's residuals should later drive generator hardening or should remain runtime-only.

`generatorLevelGap = true`

Use when source evidence proves the generated skill is missing semantics that should be recoverable during generation.

`runtimeOnlyResidual = true`

Use when the remaining risk is primarily:

login / session
host runtime behavior
local-doc / host-bridge environment
inner-network-only execution context

and not a generation-semantic omission.

These two flags are not mutually exclusive. When a scene sets both, the ledger must explain why.
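
A ledger builder could check this requirement mechanically. The sketch below assumes the explanation lives in `comparisonNotes`, which this design does not actually specify; the function name is also hypothetical.

```python
def flag_problems(record: dict) -> list[str]:
    """Report records that set both flags without any explanation.

    Assumption: the explanation is expected in the comparisonNotes field.
    """
    problems = []
    generator_level = record.get("generatorLevelGap", False)
    runtime_only = record.get("runtimeOnlyResidual", False)
    notes = record.get("comparisonNotes", "").strip()
    if generator_level and runtime_only and not notes:
        problems.append("both flags set but comparisonNotes is empty")
    return problems
```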

Inputs

Primary inputs:

tests/fixtures/generated_scene/generated_scene_source_evidence_cross_scan_2026-04-20.json
examples/scene_skill_102_final_materialization_2026-04-19/skills
tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json
tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json
tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json

Anchor runtime findings:

the confirmed sweep-030-scene inner-network findings:
- alias mismatch
- starter-subset org dictionary
- page-semantic default period behavior
- request-field mismatch
- runtime context URL ambiguity

Output Artifacts

JSON

tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json

Report

docs/superpowers/reports/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-report.md

The report must answer:

how many scenes are rated high, medium, and low
how many scenes carry each gap class
how many scenes appear to require generator-level fixes
how many scenes look runtime-only
which route clusters are likely to yield the highest reuse
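
The report questions above reduce to a handful of counts over the ledger records. The sketch below is one way to compute them; the function name and output keys are hypothetical, "runtime-only" is assumed to mean `runtimeOnlyResidual` set without `generatorLevelGap`, and route frequency is used as a rough proxy for reuse potential.

```python
from collections import Counter


def summarize_ledger(records: list[dict]) -> dict:
    """Aggregate ledger records into the counts the report must answer."""
    risk = Counter(r["riskLevel"] for r in records)
    gaps = Counter(g for r in records for g in r["gaps"])
    routes = Counter(route for r in records for route in r["recommendedFixRoutes"])
    return {
        "riskLevels": {level: risk.get(level, 0) for level in ("high", "medium", "low")},
        "gapClasses": dict(gaps),
        "generatorLevel": sum(1 for r in records if r["generatorLevelGap"]),
        # Assumed reading of "runtime-only": runtime residual with no generator-level gap.
        "runtimeOnly": sum(
            1 for r in records
            if r["runtimeOnlyResidual"] and not r["generatorLevelGap"]
        ),
        # Most frequently recommended routes, as a proxy for reuse.
        "topRoutes": routes.most_common(3),
    }
```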

Acceptance Criteria

This design is complete when:

it defines a full-scene ledger stage rather than scene-by-scene notes
it binds the ledger to the fixed five-gap taxonomy
it defines how source evidence and generated evidence are compared
it defines riskLevel, generatorLevelGap, and runtimeOnlyResidual
it remains analysis-only

Stop Statement

Stop after publishing this ledger design and its child plan.

Do not execute the ledger build inside this design.
