木炎 956f0c2b68 feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00

Generated Scene Rectification Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Rectify the generated-scene pipeline so it stops emitting false-positive runnable skills for complex internal scenes, specifically by fixing sceneId degeneration, bootstrap pollution, incomplete workflow reconstruction, and readiness fail-open behavior.

Architecture: Keep the current Scene IR pipeline, but add four hard control chains around it: naming validation, bootstrap evidence stratification, workflow evidence reconstruction, and readiness gating. Generation must fail closed whenever these chains are incomplete.

Tech Stack: Rust, Node.js, HTML/CSS/JavaScript, serde_json, OpenAI-compatible LLM API

Scope Check

This plan implements the design in:

docs/superpowers/specs/2026-04-17-generated-scene-rectification-design.md

This plan builds on the existing generated-scene foundation already described in:

docs/superpowers/specs/2026-04-17-scene-skill-compiler-design.md
docs/superpowers/specs/2026-04-17-llm-driven-skill-generation-design.md
docs/superpowers/specs/2026-04-17-enhanced-llm-extraction-schema-design.md

This plan does not attempt to solve:

login or authentication recovery
Chromium host integration or browser embedding changes
full runtime resolver expansion beyond what this rectification needs
arbitrary historical scene compatibility outside the reference regression cases

File Map

Frontend scene generator

File	Action	Purpose
`frontend/scene-generator/generator-runner.js`	Modify	Implement naming fallback control, URL evidence stratification, workflow evidence cleanup, and pre-generation gate inputs
`frontend/scene-generator/llm-client.js`	Modify	Tighten sceneId semantic constraints and reject low-entropy LLM naming output
`frontend/scene-generator/server.js`	Modify	Aggregate readiness gates, block unsafe generation, and return rectification diagnostics
`frontend/scene-generator/sg_scene_generator.html`	Modify	Show invalid `sceneId`, bootstrap role breakdown, workflow evidence completeness, and generation block reasons

Rust generated-scene pipeline

File	Action	Purpose
`src/generated_scene/analyzer.rs`	Modify	Add endpoint denoising, evidence role typing, and stricter archetype preconditions
`src/generated_scene/ir.rs`	Modify	Extend IR to carry candidate roles, gate states, and workflow evidence completeness
`src/generated_scene/generator.rs`	Modify	Prevent compiler routing when gates fail and surface fail-closed diagnostics

Tests and fixtures

File	Action	Purpose
`tests/scene_generator_test.rs`	Modify	Cover naming, bootstrap, workflow, and readiness regression cases
`tests/scene_generator_html_test.rs`	Modify	Cover HTML/UI risk and blocking output
`tests/fixtures/generated_scene/paginated_enrichment/*`	Modify	Preserve marketing-like reference coverage
`tests/fixtures/generated_scene/multi_mode/*`	Modify	Preserve tq-like multi-mode coverage
Additional fixture files as needed	Create	Add low-entropy naming and localhost-pollution regression inputs

Scope Guardrails

Do not broaden this work into a generic scene-generator redesign.
Do not remove the existing Scene IR structure; extend and constrain it.
Do not let localhost or helper/export endpoints participate in bootstrap selection.
Do not silently coerce invalid sceneId values into accepted ids.
Do not route into paginated_enrichment unless its minimum workflow evidence is complete.
Do not emit a default runnable skill when any rectification gate fails.

Task 1: Rectify Naming Chain

Files:

Modify: frontend/scene-generator/generator-runner.js
Modify: frontend/scene-generator/llm-client.js
Modify: frontend/scene-generator/server.js
Modify: src/generated_scene/ir.rs

Goal: Stop Chinese-source scenes from degrading into low-information ids such as 2-0, and turn sceneId into a validated business identifier instead of a raw slug fallback.

Step 1: Classify sceneId candidate sources

Define explicit candidate tiers for sceneId:

LLM semantic business id
deterministic keyword-derived id
controlled alias/transliteration fallback
invalid fallback candidate

Expected result: the pipeline can explain where the chosen id came from.
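
The tiers above can be sketched as an ordered enum, so that the chosen id always carries its provenance. This is a minimal sketch, not the actual IR: `SceneIdSource`, `SceneIdCandidate`, and `choose_scene_id` are hypothetical names, and treating the listed order as the trust order is an assumption.

```rust
/// Where a `sceneId` candidate came from, declared from most to least trusted.
/// (Hypothetical type; the variant order is assumed to match the tier list.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SceneIdSource {
    LlmSemantic,
    KeywordDerived,
    AliasFallback,
    InvalidFallback,
}

/// A candidate id together with the tier that produced it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SceneIdCandidate {
    pub id: String,
    pub source: SceneIdSource,
}

/// Pick the candidate from the most trusted tier.
/// `InvalidFallback` candidates are kept for diagnostics but never win.
pub fn choose_scene_id(candidates: &[SceneIdCandidate]) -> Option<&SceneIdCandidate> {
    candidates
        .iter()
        .filter(|c| c.source != SceneIdSource::InvalidFallback)
        .min_by_key(|c| c.source)
}
```

Returning the whole candidate rather than just the id string lets the server and UI report the source tier without a second lookup.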

Step 2: Add low-entropy sceneId validation

Implement shared validation rules that reject ids which are:

numeric-only or numeric-dominant
too short to be business-readable
generic placeholders such as scene or report
semantically detached from the extracted sceneName

Expected result: ids like 2-0, 1-0, scene, report are blocked.
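
A minimal sketch of the structural checks, assuming a minimum length of 4 and a "digits at least as numerous as letters" definition of numeric-dominant (both thresholds are illustrative, not taken from the design). `validate_scene_id` and `SceneIdRejection` are hypothetical names. The semantic-detachment rule needs the extracted sceneName and is deliberately left out here.

```rust
/// Why a `sceneId` was rejected (hypothetical type for diagnostics).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SceneIdRejection {
    NumericDominant,
    TooShort,
    GenericPlaceholder,
}

/// Assumed minimum length for a business-readable id.
const MIN_SCENE_ID_LEN: usize = 4;

/// Structural low-entropy checks only; semantic comparison against
/// `sceneName` is out of scope for this sketch.
pub fn validate_scene_id(id: &str) -> Result<(), SceneIdRejection> {
    let normalized = id.trim().to_ascii_lowercase();
    let letters = normalized.chars().filter(|c| c.is_ascii_alphabetic()).count();
    let digits = normalized.chars().filter(|c| c.is_ascii_digit()).count();

    // Numeric-only or numeric-dominant ids such as `2-0`.
    if digits > 0 && digits >= letters {
        return Err(SceneIdRejection::NumericDominant);
    }
    if normalized.chars().count() < MIN_SCENE_ID_LEN {
        return Err(SceneIdRejection::TooShort);
    }
    // Generic placeholders named in the plan.
    if matches!(normalized.as_str(), "scene" | "report") {
        return Err(SceneIdRejection::GenericPlaceholder);
    }
    Ok(())
}
```

Returning a typed rejection instead of a boolean keeps the invalidation reason available for the diagnostics the server and UI are expected to surface.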

Step 3: Fail closed on invalid sceneId

Update generation flow so invalid sceneId produces:

invalid_scene_id gate failure
readiness downgrade
analysis/report output only unless explicitly overridden later by a separate approved flow

Expected result: invalid ids never create a formal generated skill directory by default.

Step 4: Surface naming diagnostics in server/UI

Return and display:

chosen sceneId
candidate source
validation result
invalidation reason if blocked

Step 5: Add regression tests

Cover at least:

Chinese source name that previously degraded to 2-0
valid semantic id chosen over slug fallback
invalid low-entropy id blocked from generation

Step 6: Commit

git add frontend/scene-generator/generator-runner.js frontend/scene-generator/llm-client.js frontend/scene-generator/server.js src/generated_scene/ir.rs tests/scene_generator_test.rs
git commit -m "fix(generator): block degenerate generated scene ids"

Task 2: Rectify Bootstrap Chain

Files:

Modify: frontend/scene-generator/generator-runner.js
Modify: frontend/scene-generator/server.js
Modify: src/generated_scene/analyzer.rs
Modify: src/generated_scene/ir.rs

Goal: Separate business bootstrap candidates from localhost/export/helper URLs so internal-network entry domains resolve correctly.

Step 1: Add URL evidence role stratification

Classify URL candidates into:

business_entry
business_api
gateway_api
export_service
local_helper
static_asset
template_noise

Expected result: every URL candidate is typed before bootstrap selection.

Step 2: Add deterministic localhost and noise rejection

Ensure that:

localhost
127.0.0.1
SurfaceServices
ReportServices
.js / .css assets
template placeholders and format strings

are routed away from bootstrap candidates.

Expected result: helper/export/static/template strings can remain as evidence but can never win bootstrap.
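
The rejection rules above could look roughly like the following. This is a sketch under stated assumptions: `UrlRole` mirrors the role names from Step 1, `non_bootstrap_role` and `host_of` are hypothetical helpers, the mapping of SurfaceServices/ReportServices to `ExportService` is a guess, and the host parser ignores IPv6 literals.

```rust
/// URL evidence roles, mirroring the role names in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UrlRole {
    BusinessEntry,
    BusinessApi,
    GatewayApi,
    ExportService,
    LocalHelper,
    StaticAsset,
    TemplateNoise,
}

/// Extract the host from a URL-like string (no IPv6 support in this sketch).
fn host_of(url: &str) -> &str {
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    let host_port = authority.rsplit('@').next().unwrap_or(authority);
    host_port.split(':').next().unwrap_or(host_port)
}

/// Returns `Some(role)` when the candidate must be kept out of bootstrap
/// selection, or `None` when it may still be typed as a business/gateway URL.
pub fn non_bootstrap_role(url: &str) -> Option<UrlRole> {
    let lower = url.to_ascii_lowercase();
    // Template placeholders and format strings (assumed markers).
    if lower.contains("${") || lower.contains("{0}") || lower.contains("%s") {
        return Some(UrlRole::TemplateNoise);
    }
    // Static assets, judged on the path without query or fragment.
    let path = lower.split(|c: char| c == '?' || c == '#').next().unwrap_or("");
    if path.ends_with(".js") || path.ends_with(".css") {
        return Some(UrlRole::StaticAsset);
    }
    // Assumed mapping: these service names are treated as export services.
    if lower.contains("surfaceservices") || lower.contains("reportservices") {
        return Some(UrlRole::ExportService);
    }
    let host = host_of(&lower);
    if host == "localhost" || host == "127.0.0.1" {
        return Some(UrlRole::LocalHelper);
    }
    None
}
```

The rejected candidate still gets a role, so it can remain in the evidence set for workflow and reporting even though it can never win bootstrap.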

Step 3: Redefine bootstrap resolution order

Bootstrap selection may only consume:

business_entry
business_api
gateway_api

When only helper/noise roles exist, set bootstrap to unresolved and downgrade readiness.
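
One way to express this selection rule, assuming the listed order (business_entry, business_api, gateway_api) is also the priority order, which the plan does not state explicitly. `UrlCandidate`, `Bootstrap`, and `resolve_bootstrap` are hypothetical names.

```rust
/// URL evidence roles, mirroring the role names in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UrlRole {
    BusinessEntry,
    BusinessApi,
    GatewayApi,
    ExportService,
    LocalHelper,
    StaticAsset,
    TemplateNoise,
}

/// A typed URL candidate (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UrlCandidate {
    pub url: String,
    pub role: UrlRole,
}

/// Result of bootstrap resolution.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Bootstrap {
    Resolved(String),
    Unresolved,
}

/// Lower number wins; `None` means the role may never be used for bootstrap.
fn bootstrap_priority(role: UrlRole) -> Option<u8> {
    match role {
        UrlRole::BusinessEntry => Some(0),
        UrlRole::BusinessApi => Some(1),
        UrlRole::GatewayApi => Some(2),
        _ => None,
    }
}

/// Pick the best eligible candidate, or report `Unresolved` when only
/// helper/noise roles exist (fail closed instead of guessing).
pub fn resolve_bootstrap(candidates: &[UrlCandidate]) -> Bootstrap {
    candidates
        .iter()
        .filter_map(|c| bootstrap_priority(c.role).map(|p| (p, c)))
        .min_by_key(|(p, _)| *p)
        .map(|(_, c)| Bootstrap::Resolved(c.url.clone()))
        .unwrap_or(Bootstrap::Unresolved)
}
```

An explicit `Unresolved` variant makes the readiness downgrade a matter of pattern matching rather than checking for an empty string.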

Step 4: Preserve export/helper evidence separately

Retain localhost/export endpoints as downstream evidence for workflow/reporting, but isolate them from expectedDomain and targetUrl.

Step 5: Add regression tests

Cover at least:

marketing-like source choosing yx.gs.sgcc.com.cn over localhost
mixed business + gateway scene preserving business target page
scene with only localhost/noise ending in unresolved bootstrap

Step 6: Commit

git add frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js src/generated_scene/analyzer.rs src/generated_scene/ir.rs tests/scene_generator_test.rs
git commit -m "fix(generator): stratify bootstrap evidence and exclude localhost"

Task 3: Rectify Workflow Chain

Files:

Modify: frontend/scene-generator/generator-runner.js
Modify: frontend/scene-generator/server.js
Modify: src/generated_scene/analyzer.rs
Modify: src/generated_scene/ir.rs
Modify: src/generated_scene/generator.rs

Goal: Reconstruct workflow from request-chain evidence instead of generic field names, so paginated_enrichment is only emitted when its true workflow exists.

Step 1: Split workflow evidence into typed layers

Represent workflow evidence as:

request evidence
pagination evidence
secondary request evidence
post-process evidence

Expected result: archetype decisions operate on structured workflow signals instead of a flat endpoint list.

Step 2: Denoise endpoint and method evidence

Normalize and filter out:

${apiUrl}
template placeholders
exception strings
log text fragments
localhost export endpoints

Expected result: workflow reconstruction only consumes business-relevant requests.
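
A rough sketch of such a denoising pass. The heuristics are assumptions chosen for illustration: whitespace marks a log fragment, the substring "exception" marks an exception string, and a trailing slash is the only normalization. `is_noise_endpoint` and `denoise_endpoints` are hypothetical names.

```rust
/// Heuristic noise check for a raw endpoint string (assumed rules).
pub fn is_noise_endpoint(raw: &str) -> bool {
    let s = raw.trim();
    let lower = s.to_ascii_lowercase();
    s.is_empty()
        // Template placeholders such as `${apiUrl}`.
        || s.contains("${")
        || s.contains("{{")
        // Real endpoints contain no whitespace; log text usually does.
        || s.chars().any(char::is_whitespace)
        // Exception class names and messages.
        || lower.contains("exception")
        // Localhost export/helper endpoints.
        || lower.contains("://localhost")
        || lower.contains("://127.0.0.1")
}

/// Drop noise, strip trailing slashes, and de-duplicate while keeping order.
pub fn denoise_endpoints(raw: &[&str]) -> Vec<String> {
    let mut kept: Vec<String> = Vec::new();
    for item in raw {
        let s = item.trim();
        if is_noise_endpoint(s) {
            continue;
        }
        let normalized = s.trim_end_matches('/').to_string();
        if !kept.contains(&normalized) {
            kept.push(normalized);
        }
    }
    kept
}
```

Order is preserved on purpose, since request-chain reconstruction may depend on the order in which endpoints were observed.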

Step 3: Tighten archetype routing rules

Require paginated_enrichment to have at minimum:

one main list request
one pagination variable set
one secondary request or explicit per-item enrichment function
one post-process action among filter, transform, export

If only part of this exists, preserve it as candidate evidence but do not route into the compiler.
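
The minimum-evidence rule can be sketched as a completeness check over typed workflow evidence. The struct layout below is an assumption, not the real Scene IR; `WorkflowEvidence`, `PostProcess`, and `missing_for_paginated_enrichment` are hypothetical names.

```rust
/// Post-process actions named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PostProcess {
    Filter,
    Transform,
    Export,
}

/// Typed workflow evidence layers (hypothetical shape).
#[derive(Debug, Default, Clone)]
pub struct WorkflowEvidence {
    pub main_list_request: Option<String>,
    pub pagination_vars: Vec<String>,
    pub secondary_requests: Vec<String>,
    pub enrichment_functions: Vec<String>,
    pub post_process: Vec<PostProcess>,
}

/// Names of the pieces still missing for `paginated_enrichment`.
/// An empty result means routing into the compiler is allowed.
pub fn missing_for_paginated_enrichment(ev: &WorkflowEvidence) -> Vec<&'static str> {
    let mut missing = Vec::new();
    if ev.main_list_request.is_none() {
        missing.push("main_list_request");
    }
    if ev.pagination_vars.is_empty() {
        missing.push("pagination");
    }
    // Either a secondary request or an explicit per-item enrichment function.
    if ev.secondary_requests.is_empty() && ev.enrichment_functions.is_empty() {
        missing.push("secondary_request");
    }
    if ev.post_process.is_empty() {
        missing.push("post_process");
    }
    missing
}
```

Returning the list of missing pieces, rather than a boolean, keeps partial evidence usable as candidate evidence and gives the UI something concrete to display.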

Step 4: Narrow multi_mode detection

Allow multi_mode_request only when mode switching materially changes at least one of:

request body
endpoint shape
response path
column definition

Expected result: generic type/tab/mode/status fields alone no longer misclassify marketing-like scenes.
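
A minimal sketch of the divergence test, assuming each detected mode has already been summarized into a comparable profile. `ModeProfile` and `modes_materially_differ` are hypothetical names, and representing shapes as strings is a simplification.

```rust
/// Per-mode summary of the four dimensions the plan lists (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModeProfile {
    pub request_body_shape: String,
    pub endpoint_shape: String,
    pub response_path: String,
    pub column_definition: Vec<String>,
}

/// True only when at least one mode differs from the first mode in at least
/// one dimension. A single mode, or identical modes, never qualifies.
pub fn modes_materially_differ(modes: &[ModeProfile]) -> bool {
    match modes.split_first() {
        Some((first, rest)) => rest.iter().any(|m| m != first),
        None => false,
    }
}
```

With this shape, a generic `type` or `tab` field that does not change any of the four dimensions produces identical profiles and therefore does not trigger `multi_mode_request`.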

Step 5: Block compiler routing on incomplete workflow

Update generator-side routing so incomplete evidence cannot produce a formal paginated_enrichment skill package.

Step 6: Add regression tests

Cover at least:

marketing-like scene must expose paginate + secondary_request + post-process evidence
generic mode fields without real mode divergence must not force multi_mode_request
noisy endpoint lists must still reconstruct the correct business request chain
Step 7: Commit

git add frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js src/generated_scene/analyzer.rs src/generated_scene/ir.rs src/generated_scene/generator.rs tests/scene_generator_test.rs
git commit -m "fix(generator): require complete workflow evidence before archetype routing"

Task 4: Rectify Readiness Chain

Files:

Modify: frontend/scene-generator/server.js
Modify: frontend/scene-generator/sg_scene_generator.html
Modify: src/generated_scene/ir.rs
Modify: src/generated_scene/generator.rs
Modify: tests/scene_generator_html_test.rs

Goal: Turn readiness into a hard gate that distinguishes analysis output from runnable skill output.

Step 1: Add explicit rectification gates

Track at minimum:

scene_id_valid
bootstrap_resolved
workflow_complete_for_archetype
runtime_contract_compatible

Expected result: readiness is derived from named gates rather than a loose score only.
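
The named gates could be carried as a small struct from which readiness is derived. This is a sketch, not the actual IR extension: `RectificationGates` and `Readiness` are hypothetical types, and it assumes an existing score still decides between A and B once every gate passes, while any failed gate forces C as the fail-closed rule of this task requires.

```rust
/// Readiness grades used by the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Readiness {
    A,
    B,
    C,
}

/// The four core rectification gates (field names follow the plan).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RectificationGates {
    pub scene_id_valid: bool,
    pub bootstrap_resolved: bool,
    pub workflow_complete_for_archetype: bool,
    pub runtime_contract_compatible: bool,
}

impl RectificationGates {
    /// Names of the gates that failed, in declaration order.
    pub fn failed(&self) -> Vec<&'static str> {
        let checks = [
            ("scene_id_valid", self.scene_id_valid),
            ("bootstrap_resolved", self.bootstrap_resolved),
            ("workflow_complete_for_archetype", self.workflow_complete_for_archetype),
            ("runtime_contract_compatible", self.runtime_contract_compatible),
        ];
        let mut failed = Vec::new();
        for (name, passed) in checks {
            if !passed {
                failed.push(name);
            }
        }
        failed
    }

    /// Fail closed: keep the scored grade only when every gate passes.
    pub fn readiness(&self, scored: Readiness) -> Readiness {
        if self.failed().is_empty() {
            scored
        } else {
            Readiness::C
        }
    }
}
```

Keeping the failed gate names alongside the grade gives the generation endpoint a direct source for block reasons.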

Step 2: Enforce fail-closed readiness rules

Require:

all core gates pass for readiness A or B
any core gate failure forces readiness C
generation endpoint blocks runnable output on gate failure

Step 3: Separate analysis result from generation result

When gates fail, allow:

analysis preview
evidence report
block reasons

But do not default to:

full skill emission
compiler success messaging

Step 4: Expose readiness breakdown in UI

Display:

gate names
pass/fail state
missing workflow pieces
bootstrap resolution reason
invalid sceneId reason

Step 5: Add regression tests

Cover at least:

invalid sceneId forcing readiness C
unresolved bootstrap forcing readiness C
incomplete paginated workflow forcing readiness C
fully valid reference fixture remaining eligible for generation

Step 6: Commit

git add frontend/scene-generator/server.js frontend/scene-generator/sg_scene_generator.html src/generated_scene/ir.rs src/generated_scene/generator.rs tests/scene_generator_html_test.rs tests/scene_generator_test.rs
git commit -m "fix(generator): enforce readiness fail-closed gating"

Task 5: Reference Regression Verification

Files:

Modify: tests/scene_generator_test.rs
Modify: tests/scene_generator_html_test.rs
Modify/Create: relevant fixtures under tests/fixtures/generated_scene/

Goal: Lock the rectification against the two reference scene families and ensure future changes do not reintroduce the same false positives.

Step 1: Regress marketing-like fixture

Verify the marketing reference path now satisfies:

non-degenerate sceneId
bootstrap rooted in yx.gs.sgcc.com.cn family
workflow includes paginate
workflow includes secondary_request
readiness does not pass if any of the above are missing

Step 2: Regress tq-like fixture

Verify the tq reference path still satisfies:

stable semantic sceneId
valid non-localhost bootstrap
genuine multi_mode_request detection
no downgrade caused by the stricter marketing rectification rules

Step 3: Run verification commands

Run:

cargo check
cargo test --test scene_generator_test -- --nocapture
cargo test --test scene_generator_html_test -- --nocapture
node --check frontend/scene-generator/llm-client.js
node --check frontend/scene-generator/generator-runner.js
node --check frontend/scene-generator/server.js

Expected result: rectification passes both Rust and Node validation plus regression coverage.

Step 4: Record outcomes in generated reports if needed

If the implementation emits readiness or analysis JSON reports, ensure the test fixtures assert the key blocked/passed states directly.

Step 5: Commit

git add tests/scene_generator_test.rs tests/scene_generator_html_test.rs tests/fixtures/generated_scene frontend/scene-generator/llm-client.js frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js src/generated_scene/analyzer.rs src/generated_scene/ir.rs src/generated_scene/generator.rs
git commit -m "test(generator): lock generated scene rectification regressions"

Acceptance Criteria

This plan is complete when all of the following are true:

Chinese-source scene names no longer degrade into low-entropy ids like 2-0.
localhost, 127.0.0.1, export services, and helper URLs no longer compete for bootstrap resolution.
paginated_enrichment routing only occurs when a main list request, pagination, a secondary request (or explicit per-item enrichment function), and post-process evidence are all present.
Incomplete evidence paths fail closed with explicit readiness gate failures instead of generating false-positive runnable skills.
The marketing-like and tq-like reference scenes both remain covered by automated regression tests.

Rollback Strategy

If this rectification causes unacceptable regressions:

Revert the latest rectification task commit only, not unrelated generated-scene work.
Keep the previous Scene IR and compiler structure intact.
Preserve newly added fixtures and tests where possible, then relax only the specific gate or classifier that caused the regression.

Notes For Executors

Implement this plan strictly in order: naming, bootstrap, workflow, readiness, verification.
Do not skip ahead to UI polish before the gating logic is in place.
Do not add speculative resolver or login work under this plan.
Any need for user override or forced draft generation must be handled as a separate follow-up spec, not smuggled into this rectification plan.
