feat: add generated scene skill platform hardening
# Generated Scene Rectification Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Rectify the generated-scene pipeline so it stops emitting false-positive runnable skills for complex internal scenes, specifically by fixing `sceneId` degeneration, bootstrap pollution, incomplete workflow reconstruction, and readiness fail-open behavior.

**Architecture:** Keep the current `Scene IR` pipeline, but add four hard control chains around it: naming validation, bootstrap evidence stratification, workflow evidence reconstruction, and readiness gating. Generation must fail closed whenever these chains are incomplete.

**Tech Stack:** Rust, Node.js, HTML/CSS/JavaScript, serde_json, OpenAI-compatible LLM API

---

## Scope Check

This plan implements the design in:

- `docs/superpowers/specs/2026-04-17-generated-scene-rectification-design.md`

This plan builds on the existing generated-scene foundation already described in:

- `docs/superpowers/specs/2026-04-17-scene-skill-compiler-design.md`
- `docs/superpowers/specs/2026-04-17-llm-driven-skill-generation-design.md`
- `docs/superpowers/specs/2026-04-17-enhanced-llm-extraction-schema-design.md`

This plan does not attempt to solve:

- login or authentication recovery
- Chromium host integration or browser embedding changes
- full runtime resolver expansion beyond what this rectification needs
- arbitrary historical scene compatibility outside the reference regression cases

---

## File Map

### Frontend scene generator

| File | Action | Purpose |
|------|--------|---------|
| `frontend/scene-generator/generator-runner.js` | Modify | Implement naming fallback control, URL evidence stratification, workflow evidence cleanup, and pre-generation gate inputs |
| `frontend/scene-generator/llm-client.js` | Modify | Tighten sceneId semantic constraints and reject low-entropy LLM naming output |
| `frontend/scene-generator/server.js` | Modify | Aggregate readiness gates, block unsafe generation, and return rectification diagnostics |
| `frontend/scene-generator/sg_scene_generator.html` | Modify | Show invalid `sceneId`, bootstrap role breakdown, workflow evidence completeness, and generation block reasons |

### Rust generated-scene pipeline

| File | Action | Purpose |
|------|--------|---------|
| `src/generated_scene/analyzer.rs` | Modify | Add endpoint denoising, evidence role typing, and stricter archetype preconditions |
| `src/generated_scene/ir.rs` | Modify | Extend IR to carry candidate roles, gate states, and workflow evidence completeness |
| `src/generated_scene/generator.rs` | Modify | Prevent compiler routing when gates fail and surface fail-closed diagnostics |

### Tests and fixtures

| File | Action | Purpose |
|------|--------|---------|
| `tests/scene_generator_test.rs` | Modify | Cover naming, bootstrap, workflow, and readiness regression cases |
| `tests/scene_generator_html_test.rs` | Modify | Cover HTML/UI risk and blocking output |
| `tests/fixtures/generated_scene/paginated_enrichment/*` | Modify | Preserve marketing-like reference coverage |
| `tests/fixtures/generated_scene/multi_mode/*` | Modify | Preserve tq-like multi-mode coverage |
| Additional fixture files as needed | Create | Add low-entropy naming and localhost-pollution regression inputs |

---

## Scope Guardrails

- Do not broaden this work into a generic scene-generator redesign.
- Do not remove the existing `Scene IR` structure; extend and constrain it.
- Do not let `localhost` or helper/export endpoints participate in bootstrap selection.
- Do not silently coerce invalid `sceneId` values into accepted ids.
- Do not route into `paginated_enrichment` unless its minimum workflow evidence is complete.
- Do not emit a default runnable skill when any rectification gate fails.

---

### Task 1: Rectify Naming Chain

**Files:**
- Modify: `frontend/scene-generator/generator-runner.js`
- Modify: `frontend/scene-generator/llm-client.js`
- Modify: `frontend/scene-generator/server.js`
- Modify: `src/generated_scene/ir.rs`

**Goal:** Stop Chinese-source scenes from degrading into low-information ids such as `2-0`, and turn `sceneId` into a validated business identifier instead of a raw slug fallback.

- [ ] **Step 1: Classify sceneId candidate sources**

Define explicit candidate tiers for `sceneId`:

1. LLM semantic business id
2. deterministic keyword-derived id
3. controlled alias/transliteration fallback
4. invalid fallback candidate

Expected result: the pipeline can explain where the chosen id came from.
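
One way the tiering could look is sketched below, assuming each candidate is tagged with its source before selection. The tier names, the `{ id, source }` candidate shape, and the function name `chooseSceneIdCandidate` are all hypothetical and not taken from the existing code.

```javascript
// Hypothetical tier names mirroring the four candidate sources listed above,
// ordered from most to least preferred.
const SCENE_ID_TIERS = ["llm_semantic", "keyword_derived", "alias_fallback", "invalid_fallback"];

// Pick the best-ranked candidate and keep its source so the choice is explainable.
// `candidates` is an assumed shape: [{ id: string, source: <tier name> }, ...].
function chooseSceneIdCandidate(candidates) {
  const ranked = candidates
    .filter((c) => c && typeof c.id === "string" && SCENE_ID_TIERS.includes(c.source))
    .sort((a, b) => SCENE_ID_TIERS.indexOf(a.source) - SCENE_ID_TIERS.indexOf(b.source));

  if (ranked.length === 0) {
    return { sceneId: null, source: null, considered: [] };
  }
  return {
    sceneId: ranked[0].id,
    source: ranked[0].source,
    // Record every candidate that was considered, in rank order, for diagnostics.
    considered: ranked.map((c) => `${c.source}:${c.id}`),
  };
}
```

Note that this sketch only ranks candidates. An `invalid_fallback` candidate can still be returned when nothing better exists, so validation has to run on the result before any generation happens.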

- [ ] **Step 2: Add low-entropy sceneId validation**

Implement shared validation rules that reject ids which are:

- numeric-only or numeric-dominant
- too short to be business-readable
- generic placeholders such as `scene` or `report`
- semantically detached from the extracted `sceneName`

Expected result: ids like `2-0`, `1-0`, `scene`, `report` are blocked.
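
The rules above can be sketched as a single validator. This is a minimal illustration, not the shared implementation: the function name, the reason codes, the minimum length of 6, the "at least half digits" threshold, and the keyword-overlap check used for "semantically detached" are all assumptions.

```javascript
// Placeholders named in the plan; a real list would likely be longer.
const GENERIC_SCENE_IDS = new Set(["scene", "report"]);
// Assumed threshold for "business-readable"; the plan does not fix a number.
const MIN_SCENE_ID_LENGTH = 6;

// `expectedKeywords` is an assumed input: ASCII keywords derived from `sceneName`.
function validateSceneId(sceneId, expectedKeywords = []) {
  const id = String(sceneId ?? "").trim().toLowerCase();
  const compact = id.replace(/[^a-z0-9]/g, "");
  const digitCount = (compact.match(/[0-9]/g) || []).length;

  if (compact.length === 0) return { valid: false, reason: "empty" };
  // Treat an id as numeric-dominant when at least half of its characters are digits.
  if (digitCount * 2 >= compact.length) return { valid: false, reason: "numeric_dominant" };
  if (GENERIC_SCENE_IDS.has(id)) return { valid: false, reason: "generic_placeholder" };
  if (compact.length < MIN_SCENE_ID_LENGTH) return { valid: false, reason: "too_short" };
  if (expectedKeywords.length > 0 && !expectedKeywords.some((k) => id.includes(String(k).toLowerCase()))) {
    return { valid: false, reason: "detached_from_scene_name" };
  }
  return { valid: true, reason: null };
}
```

Returning a reason code instead of a bare boolean keeps the invalidation reason available for the diagnostics that the server and UI are expected to surface.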

- [ ] **Step 3: Fail closed on invalid sceneId**

Update generation flow so invalid `sceneId` produces:

- `invalid_scene_id` gate failure
- readiness downgrade
- analysis/report output only unless explicitly overridden later by a separate approved flow

Expected result: invalid ids never create a formal generated skill directory by default.

- [ ] **Step 4: Surface naming diagnostics in server/UI**

Return and display:

- chosen `sceneId`
- candidate source
- validation result
- invalidation reason if blocked

- [ ] **Step 5: Add regression tests**

Cover at least:

- Chinese source name that previously degraded to `2-0`
- valid semantic id chosen over slug fallback
- invalid low-entropy id blocked from generation

- [ ] **Step 6: Commit**

```bash
git add frontend/scene-generator/generator-runner.js frontend/scene-generator/llm-client.js frontend/scene-generator/server.js src/generated_scene/ir.rs tests/scene_generator_test.rs
git commit -m "fix(generator): block degenerate generated scene ids"
```

---

### Task 2: Rectify Bootstrap Chain

**Files:**
- Modify: `frontend/scene-generator/generator-runner.js`
- Modify: `frontend/scene-generator/server.js`
- Modify: `src/generated_scene/analyzer.rs`
- Modify: `src/generated_scene/ir.rs`

**Goal:** Separate business bootstrap candidates from localhost/export/helper URLs so internal-network entry domains resolve correctly.

- [ ] **Step 1: Add URL evidence role stratification**

Classify URL candidates into:

- `business_entry`
- `business_api`
- `gateway_api`
- `export_service`
- `local_helper`
- `static_asset`
- `template_noise`

Expected result: every URL candidate is typed before bootstrap selection.
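
A rough sketch of such a classifier follows, using the standard `URL` parser. The role names come from the list above, but the matching heuristics are assumptions made for illustration: mapping `SurfaceServices`/`ReportServices` paths to `export_service`, treating `/gateway/` and `/api/` path segments as API markers, and treating anything unparseable or non-HTTP as `template_noise`.

```javascript
// Assign exactly one evidence role to a raw URL string.
function classifyUrlRole(raw) {
  const text = String(raw ?? "").trim();
  // Template placeholders and format strings are noise, not real URLs.
  if (text === "" || /\$\{|\{\{|%s|\{\d+\}/.test(text)) return "template_noise";

  let url;
  try {
    url = new URL(text);
  } catch {
    return "template_noise"; // assumption: unparseable strings are treated as noise
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return "template_noise";

  const host = url.hostname.toLowerCase();
  const path = url.pathname;
  if (/SurfaceServices|ReportServices/i.test(path)) return "export_service";
  if (host === "localhost" || host === "127.0.0.1") return "local_helper";
  if (/\.(js|css)$/i.test(path)) return "static_asset";
  if (/\/gateway\//i.test(path)) return "gateway_api"; // hypothetical path convention
  if (/\/api\//i.test(path)) return "business_api"; // hypothetical path convention
  return "business_entry";
}
```

The ordering matters: noise and helper checks run before the business roles, so a helper URL can never fall through to `business_entry` by default.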

- [ ] **Step 2: Add deterministic localhost and noise rejection**

Ensure that:

- `localhost`
- `127.0.0.1`
- `SurfaceServices`
- `ReportServices`
- `.js` / `.css` assets
- template placeholders and format strings

are routed away from bootstrap candidates.

Expected result: helper/export/static/template strings can remain as evidence but can never win bootstrap.

- [ ] **Step 3: Redefine bootstrap resolution order**

Bootstrap selection may only consume:

1. `business_entry`
2. `business_api`
3. `gateway_api`

When only helper/noise roles exist, set bootstrap to unresolved and downgrade readiness.
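
The resolution order above could be expressed as follows. This is a sketch under assumptions: candidates arrive already typed as `{ url, role }` objects with parseable URLs, and the returned field names beyond `expectedDomain` and `targetUrl` are hypothetical.

```javascript
// Only these roles may win bootstrap, in this priority order.
const BOOTSTRAP_ROLE_ORDER = ["business_entry", "business_api", "gateway_api"];

// `candidates` is an assumed shape: [{ url: string, role: string }, ...].
function resolveBootstrap(candidates) {
  for (const role of BOOTSTRAP_ROLE_ORDER) {
    const hit = candidates.find((c) => c.role === role);
    if (hit) {
      return {
        resolved: true,
        targetUrl: hit.url,
        expectedDomain: new URL(hit.url).hostname,
        role,
      };
    }
  }
  // Nothing eligible: stay unresolved instead of falling back to helper/noise URLs.
  return {
    resolved: false,
    targetUrl: null,
    expectedDomain: null,
    role: null,
    excludedRoles: candidates.map((c) => c.role),
  };
}
```

Keeping `excludedRoles` in the unresolved result lets the caller explain why bootstrap failed without promoting any of those URLs.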

- [ ] **Step 4: Preserve export/helper evidence separately**

Retain localhost/export endpoints as downstream evidence for workflow/reporting, but isolate them from `expectedDomain` and `targetUrl`.

- [ ] **Step 5: Add regression tests**

Cover at least:

- marketing-like source choosing `yx.gs.sgcc.com.cn` over `localhost`
- mixed business + gateway scene preserving business target page
- scene with only localhost/noise ending in unresolved bootstrap

- [ ] **Step 6: Commit**

```bash
git add frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js src/generated_scene/analyzer.rs src/generated_scene/ir.rs tests/scene_generator_test.rs
git commit -m "fix(generator): stratify bootstrap evidence and exclude localhost"
```

---

### Task 3: Rectify Workflow Chain

**Files:**
- Modify: `frontend/scene-generator/generator-runner.js`
- Modify: `frontend/scene-generator/server.js`
- Modify: `src/generated_scene/analyzer.rs`
- Modify: `src/generated_scene/ir.rs`
- Modify: `src/generated_scene/generator.rs`

**Goal:** Reconstruct workflow from request-chain evidence instead of generic field names, so `paginated_enrichment` is only emitted when its true workflow exists.

- [ ] **Step 1: Split workflow evidence into typed layers**

Represent workflow evidence as:

- request evidence
- pagination evidence
- secondary request evidence
- post-process evidence

Expected result: archetype decisions operate on structured workflow signals instead of a flat endpoint list.

- [ ] **Step 2: Denoise endpoint and method evidence**

Normalize and filter out:

- `${apiUrl}`
- template placeholders
- exception strings
- log text fragments
- localhost export endpoints

Expected result: workflow reconstruction only consumes business-relevant requests.
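
A minimal denoising pass might look like the sketch below. The heuristics are assumptions chosen for illustration (for example, treating any candidate containing whitespace as log or exception text), and the function name and reason codes are hypothetical. The planned Rust change in `analyzer.rs` may use different rules.

```javascript
// Split raw endpoint strings into business-relevant requests and dropped noise.
function denoiseEndpoints(rawEndpoints) {
  const kept = [];
  const dropped = [];
  for (const raw of rawEndpoints) {
    const text = String(raw).trim();
    let reason = null;
    if (/\$\{[^}]*\}|\{\{[^}]*\}\}/.test(text)) {
      reason = "template_placeholder";
    } else if (/\s/.test(text)) {
      // Assumption: real request paths contain no whitespace, so this is log/exception text.
      reason = "log_or_exception_text";
    } else if (/^https?:\/\/(localhost|127\.0\.0\.1)(:\d+)?(\/|$)/i.test(text)) {
      reason = "localhost_endpoint";
    } else if (!/^(https?:\/\/|\/)/i.test(text)) {
      reason = "not_a_request_path";
    }
    if (reason) dropped.push({ value: text, reason });
    else kept.push(text);
  }
  // Dropped entries are returned rather than discarded so they can remain as evidence.
  return { kept, dropped };
}
```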

- [ ] **Step 3: Tighten archetype routing rules**

Require `paginated_enrichment` to have at minimum:

1. one main list request
2. one pagination variable set
3. one secondary request or explicit per-item enrichment function
4. one post-process action among `filter`, `transform`, `export`

If only part of this exists, preserve it as candidate evidence but do not route into the compiler.
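
The four minimum requirements can be checked in one place, as in this sketch. The `evidence` object shape and its field names (`mainListRequest`, `paginationVars`, `secondaryRequest`, `perItemEnrichmentFn`, `postProcess`) are hypothetical stand-ins for whatever the typed workflow layers end up carrying.

```javascript
const POST_PROCESS_ACTIONS = ["filter", "transform", "export"];

// `evidence` is an assumed shape built from the typed workflow evidence layers.
function checkPaginatedEnrichment(evidence) {
  const missing = [];
  if (!evidence.mainListRequest) missing.push("main_list_request");
  if ((evidence.paginationVars || []).length === 0) missing.push("pagination_variables");
  // Either a secondary request or an explicit per-item enrichment function satisfies rule 3.
  if (!evidence.secondaryRequest && !evidence.perItemEnrichmentFn) missing.push("secondary_request");
  const actions = evidence.postProcess || [];
  if (!actions.some((a) => POST_PROCESS_ACTIONS.includes(a))) missing.push("post_process");

  // Partial evidence is reported through `missing`, never silently accepted.
  return { complete: missing.length === 0, missing };
}
```

The `missing` list doubles as the "missing workflow pieces" detail that a readiness breakdown could display.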

- [ ] **Step 4: Narrow multi_mode detection**

Allow `multi_mode_request` only when mode switching materially changes at least one of:

- request body
- endpoint shape
- response path
- column definition

Expected result: generic `type/tab/mode/status` fields alone no longer misclassify marketing-like scenes.
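
One possible reading of "materially changes" is sketched below: compare the per-mode request descriptions after removing the mode selector field itself, so that a body differing only in `type` or `tab` does not count. The `modes` shape, the field names, and the `selectorField` parameter are assumptions. The comparison uses `JSON.stringify`, so it also assumes consistent key order across modes.

```javascript
const MODE_SENSITIVE_FIELDS = ["requestBody", "endpoint", "responsePath", "columns"];

// Remove the mode selector key so it cannot make two bodies look different by itself.
function stripSelector(body, selectorField) {
  if (body === null || typeof body !== "object" || Array.isArray(body)) return body ?? null;
  const copy = { ...body };
  delete copy[selectorField];
  return copy;
}

// `modes` is an assumed shape: one object per mode value describing its request.
function detectMultiMode(modes, selectorField) {
  if (!Array.isArray(modes) || modes.length < 2) {
    return { multiMode: false, divergentFields: [] };
  }
  const divergentFields = MODE_SENSITIVE_FIELDS.filter((field) => {
    const shapes = modes.map((m) => {
      const value = field === "requestBody" ? stripSelector(m[field], selectorField) : (m[field] ?? null);
      return JSON.stringify(value);
    });
    return new Set(shapes).size > 1;
  });
  return { multiMode: divergentFields.length > 0, divergentFields };
}
```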

- [ ] **Step 5: Block compiler routing on incomplete workflow**

Update generator-side routing so incomplete evidence cannot produce a formal `paginated_enrichment` skill package.

- [ ] **Step 6: Add regression tests**

Cover at least:

- marketing-like scene must expose `paginate` + `secondary_request` + post-process evidence
- generic mode fields without real mode divergence must not force `multi_mode_request`
- noisy endpoint lists must still reconstruct the correct business request chain

- [ ] **Step 7: Commit**

```bash
git add frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js src/generated_scene/analyzer.rs src/generated_scene/ir.rs src/generated_scene/generator.rs tests/scene_generator_test.rs
git commit -m "fix(generator): require complete workflow evidence before archetype routing"
```

---

### Task 4: Rectify Readiness Chain

**Files:**
- Modify: `frontend/scene-generator/server.js`
- Modify: `frontend/scene-generator/sg_scene_generator.html`
- Modify: `src/generated_scene/ir.rs`
- Modify: `src/generated_scene/generator.rs`
- Modify: `tests/scene_generator_html_test.rs`

**Goal:** Turn readiness into a hard gate that distinguishes analysis output from runnable skill output.

- [ ] **Step 1: Add explicit rectification gates**

Track at minimum:

- `scene_id_valid`
- `bootstrap_resolved`
- `workflow_complete_for_archetype`
- `runtime_contract_compatible`

Expected result: readiness is derived from named gates rather than from a loose score alone.

- [ ] **Step 2: Enforce fail-closed readiness rules**

Require:

- all core gates pass for readiness `A` or `B`
- any core gate failure forces readiness `C`
- generation endpoint blocks runnable output on gate failure
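
The fail-closed rule can be sketched as a small pure function. The gate names come from Step 1. The numeric `score` input and the `0.8` split between `A` and `B` are assumptions, since the plan does not say how `A` and `B` are distinguished.

```javascript
const CORE_GATES = [
  "scene_id_valid",
  "bootstrap_resolved",
  "workflow_complete_for_archetype",
  "runtime_contract_compatible",
];

// `gates` maps gate name -> boolean; `score` is an assumed 0..1 quality score.
function deriveReadiness(gates, score) {
  // A gate passes only when it is exactly `true`, so missing or unknown gates fail closed.
  const failedGates = CORE_GATES.filter((name) => gates[name] !== true);
  if (failedGates.length > 0) {
    return { readiness: "C", failedGates, runnable: false };
  }
  // Hypothetical threshold: the score only separates A from B once all gates pass.
  return { readiness: score >= 0.8 ? "A" : "B", failedGates: [], runnable: true };
}
```

The point of the design is that the score can never lift a result out of `C`; it only ranks results that already passed every gate.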

- [ ] **Step 3: Separate analysis result from generation result**

When gates fail, allow:

- analysis preview
- evidence report
- block reasons

But do not default to:

- full skill emission
- compiler success messaging
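
The split between analysis output and generation output might be shaped like this on the server side. It is only a sketch: the response fields, the `status` values, and the `emitSkill` callback (standing in for the real compiler call) are all hypothetical.

```javascript
// `analysis` is an assumed shape: { preview, evidence }.
// `emitSkill` is a hypothetical callback that performs the actual skill emission.
function buildGenerationResponse(analysis, failedGates, emitSkill) {
  if (failedGates.length > 0) {
    // Blocked path: analysis artifacts only, and the compiler is never invoked.
    return {
      status: "blocked",
      analysisPreview: analysis.preview,
      evidenceReport: analysis.evidence,
      blockReasons: [...failedGates],
      skill: null,
    };
  }
  return {
    status: "generated",
    analysisPreview: analysis.preview,
    evidenceReport: analysis.evidence,
    blockReasons: [],
    skill: emitSkill(analysis),
  };
}
```

Because the blocked branch returns before `emitSkill` is reached, a gate failure cannot produce a skill directory or a success message as a side effect.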

- [ ] **Step 4: Expose readiness breakdown in UI**

Display:

- gate names
- pass/fail state
- missing workflow pieces
- bootstrap resolution reason
- invalid sceneId reason

- [ ] **Step 5: Add regression tests**

Cover at least:

- invalid `sceneId` forcing readiness `C`
- unresolved bootstrap forcing readiness `C`
- incomplete paginated workflow forcing readiness `C`
- fully valid reference fixture remaining eligible for generation

- [ ] **Step 6: Commit**

```bash
git add frontend/scene-generator/server.js frontend/scene-generator/sg_scene_generator.html src/generated_scene/ir.rs src/generated_scene/generator.rs tests/scene_generator_html_test.rs tests/scene_generator_test.rs
git commit -m "fix(generator): enforce readiness fail-closed gating"
```

---

### Task 5: Reference Regression Verification

**Files:**
- Modify: `tests/scene_generator_test.rs`
- Modify: `tests/scene_generator_html_test.rs`
- Modify/Create: relevant fixtures under `tests/fixtures/generated_scene/`

**Goal:** Lock the rectification against the two reference scene families and ensure future changes do not reintroduce the same false positives.

- [ ] **Step 1: Regress marketing-like fixture**

Verify the marketing reference path now satisfies:

- non-degenerate `sceneId`
- bootstrap rooted in `yx.gs.sgcc.com.cn` family
- workflow includes `paginate`
- workflow includes `secondary_request`
- readiness does not pass if any of the above are missing

- [ ] **Step 2: Regress tq-like fixture**

Verify the tq reference path still satisfies:

- stable semantic `sceneId`
- valid non-localhost bootstrap
- genuine `multi_mode_request` detection
- no downgrade caused by the stricter marketing rectification rules

- [ ] **Step 3: Run verification commands**

Run:

```bash
cargo check
cargo test --test scene_generator_test -- --nocapture
cargo test --test scene_generator_html_test -- --nocapture
node --check frontend/scene-generator/llm-client.js
node --check frontend/scene-generator/generator-runner.js
node --check frontend/scene-generator/server.js
```

Expected result: rectification passes both Rust and Node validation plus regression coverage.

- [ ] **Step 4: Record outcomes in generated reports if needed**

If the implementation emits readiness or analysis JSON reports, ensure the regression tests assert the key blocked/passed states in those reports directly.

- [ ] **Step 5: Commit**

```bash
git add tests/scene_generator_test.rs tests/scene_generator_html_test.rs tests/fixtures/generated_scene frontend/scene-generator/llm-client.js frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js src/generated_scene/analyzer.rs src/generated_scene/ir.rs src/generated_scene/generator.rs
git commit -m "test(generator): lock generated scene rectification regressions"
```

---

## Acceptance Criteria

This plan is complete when all of the following are true:

1. Chinese-source scene names no longer degrade into low-entropy ids like `2-0`.
2. `localhost`, `127.0.0.1`, export services, and helper URLs no longer compete for bootstrap resolution.
3. `paginated_enrichment` routing only occurs when pagination, secondary request, and post-process evidence are all present.
4. Incomplete evidence paths fail closed with explicit readiness gate failures instead of generating false-positive runnable skills.
5. The marketing-like and tq-like reference scenes both remain covered by automated regression tests.

## Rollback Strategy

If this rectification causes unacceptable regressions:

1. Revert the latest rectification task commit only, not unrelated generated-scene work.
2. Keep the previous `Scene IR` and compiler structure intact.
3. Preserve newly added fixtures and tests where possible, then relax only the specific gate or classifier that caused the regression.

## Notes For Executors

- Implement this plan strictly in order: naming, bootstrap, workflow, readiness, verification.
- Do not skip ahead to UI polish before the gating logic is in place.
- Do not add speculative resolver or login work under this plan.
- Any need for user override or forced draft generation must be handled as a separate follow-up spec, not smuggled into this rectification plan.