feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00
parent 118fc77935
commit 956f0c2b68
439 changed files with 61974 additions and 3645 deletions
--- a/docs/superpowers/plans/2026-04-17-scene-skill-compiler-plan.md
+++ b/docs/superpowers/plans/2026-04-17-scene-skill-compiler-plan.md
@@ -0,0 +1,663 @@
+# Scene Skill Compiler Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Upgrade `sg_scene_generate` from a scene metadata extractor plus template filler into a reusable scene skill compiler that can understand workflow semantics, classify scene archetypes, and generate runnable skills for both `tq-lineloss-report`-style and `marketing-zero-consumer-report`-style internal scenes.
+
+**Architecture:** Introduce a unified `Scene IR`, switch extraction to a hybrid deterministic-plus-LLM pipeline, route generation by `workflowArchetype`, align runtime resolver contracts, and add readiness gates so users can tell whether a generated skill is safe to trial on the internal network.
+
+**Tech Stack:** Rust, Node.js, HTML/CSS/JavaScript, serde_json, OpenAI-compatible LLM API
+
+---
+
+## Scope Check
+
+This plan implements the design in:
+
+- `docs/superpowers/specs/2026-04-17-scene-skill-compiler-design.md`
+
+This plan builds on the existing generator work already described in:
+
+- `docs/superpowers/specs/2026-04-17-llm-driven-skill-generation-design.md`
+- `docs/superpowers/specs/2026-04-17-enhanced-llm-extraction-schema-design.md`
+- `docs/superpowers/specs/2026-04-17-progressive-template-enhancement-design.md`
+- `docs/superpowers/specs/2026-04-16-multi-scene-kind-generator-design.md`
+
+This plan does not attempt to solve:
+
+- full login and authentication reconstruction
+- all historical scene patterns in one pass
+- 100% no-touch generation without human review
+
+---
+
+## File Map
+
+### Core generator pipeline
+
+| File | Action | Purpose |
+|------|--------|---------|
+| `frontend/scene-generator/llm-client.js` | Modify | Replace truncation-only extraction with chunked workflow-aware extraction and `Scene IR` schema output |
+| `frontend/scene-generator/generator-runner.js` | Modify | Add deterministic scene scanning, key-fragment selection, and IR support |
+| `frontend/scene-generator/server.js` | Modify | Expose analysis, preview, readiness, and generation endpoints for `Scene IR` |
+| `frontend/scene-generator/sg_scene_generator.html` | Modify | Show extraction preview, archetype classification, bootstrap, risks, and readiness |
+
+### Rust backend
+
+| File | Action | Purpose |
+|------|--------|---------|
+| `src/generated_scene/analyzer.rs` | Modify | Add deterministic extraction helpers and archetype support |
+| `src/generated_scene/generator.rs` | Modify | Route generation by archetype and compile from `Scene IR` instead of ad hoc fields |
+| `src/generated_scene/ir.rs` | Create | Define unified `Scene IR` structs and serde contracts |
+| `src/bin/sg_scene_generate.rs` | Modify | Accept `Scene IR` JSON or file input and pass it into generator |
+| `src/compat/scene_platform/resolvers.rs` | Modify | Align runtime parameter resolution with generated contracts |
+
+### Tests and fixtures
+
+| File | Action | Purpose |
+|------|--------|---------|
+| `tests/scene_generator_test.rs` | Modify | Cover new analysis, archetype classification, and generation routing |
+| `tests/generated_scene_*` or related fixtures | Modify/Create | Add representative fixtures for single-request, multi-mode, and paginated-enrichment scenes |
+
+---
+
+## Scope Guardrails
+
+- Do not break existing `--scene-id`, `--scene-name`, or `--scene-kind` compatibility.
+- Do not require all scenes to provide complete metadata in HTML meta tags.
+- Do not introduce resolver contracts that require new runtime support unless generation is updated to gate the output the runtime cannot yet execute.
+- Do not assume all report scenes share `org + period` params.
+- Do not silently generate low-confidence skills as if they were runnable.
+
+---
+
+### Task 1: Fix Current Hard Failures Before Compiler Refactor
+
+**Files:**
+- Modify: `frontend/scene-generator/llm-client.js`
+- Modify: `frontend/scene-generator/generator-runner.js`
+- Modify: `frontend/scene-generator/server.js`
+- Modify: `frontend/scene-generator/sg_scene_generator.html`
+- Modify: `src/generated_scene/generator.rs`
+
+**Goal:** Fix the most obvious wrong outputs, which currently make generated skills fail on the internal network, before the full compiler architecture lands.
+
+- [ ] **Step 1: Remove report-scene hardcoded parameter assumptions**
+
+Audit `scene.toml` generation in `src/generated_scene/generator.rs` and remove default injection of generic report params such as:
+
+- fixed `org`
+- fixed `period`
+- default dictionary entity for a specific city
+- generic page title keywords like `["报表", "线损"]` ("report", "line loss")
+
+Expected result: generated params come from extracted scene semantics or are omitted when not confidently known.
+
+- [ ] **Step 2: Rework bootstrap source priority**
+
+Change bootstrap derivation so `expected_domain` and `target_url` are resolved using this order:
+
+1. explicit deep extraction result
+2. deterministic extraction from business entry points
+3. HTML meta tags if trustworthy
+4. fallback empty with warning
+
+Explicitly prevent script-host URLs such as static JS includes from becoming the business domain by mistake.
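+
+A minimal sketch of the priority order above, assuming a hypothetical `BootstrapCandidate` type and a simple file-extension heuristic for spotting script hosts. Neither exists in the current code, and the real check may need host allowlists or a trust test for meta tags instead:
+
+```rust
+/// Where a bootstrap candidate came from (hypothetical type for illustration).
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum BootstrapSource {
+    DeepExtraction,
+    DeterministicEntryPoint,
+    HtmlMeta,
+}
+
+#[derive(Debug, Clone)]
+pub struct BootstrapCandidate {
+    pub source: BootstrapSource,
+    pub url: String,
+}
+
+/// Assumed heuristic: a URL whose path ends in a static-asset extension is a script host.
+fn looks_like_script_host(url: &str) -> bool {
+    let path = url.split(|c: char| c == '?' || c == '#').next().unwrap_or(url);
+    let lower = path.to_ascii_lowercase();
+    [".js", ".css", ".map"].iter().any(|ext| lower.ends_with(*ext))
+}
+
+/// Walks sources in priority order and returns the first usable URL plus any warnings.
+pub fn resolve_target_url(candidates: &[BootstrapCandidate]) -> (Option<String>, Vec<String>) {
+    let order = [
+        BootstrapSource::DeepExtraction,
+        BootstrapSource::DeterministicEntryPoint,
+        BootstrapSource::HtmlMeta,
+    ];
+    let mut warnings = Vec::new();
+    for source in order.iter() {
+        for candidate in candidates.iter().filter(|c| c.source == *source) {
+            if looks_like_script_host(&candidate.url) {
+                warnings.push(format!("skipped script-host URL: {}", candidate.url));
+                continue;
+            }
+            return (Some(candidate.url.clone()), warnings);
+        }
+    }
+    // Nothing usable: leave the field empty and say so.
+    warnings.push("no trustworthy bootstrap source; target_url left empty".to_string());
+    (None, warnings)
+}
+```
+
+Returning warnings alongside the result keeps the "fallback empty with warning" case visible to the preview instead of failing silently.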
+
+- [ ] **Step 3: Replace naive truncation with chunked extraction input**
+
+Update `frontend/scene-generator/llm-client.js` and `frontend/scene-generator/generator-runner.js` so they no longer send only a fixed-length prefix of the source (currently the first `15000/3000` characters). Build the extraction input from these parts instead:
+
+1. directory tree summary
+2. `index.html` chunking
+3. URL-bearing fragments
+4. request-construction fragments
+5. branching logic fragments
+6. export-related fragments
+
+- [ ] **Step 4: Add analysis preview and risk banner in Web UI**
+
+Update `frontend/scene-generator/sg_scene_generator.html` and `frontend/scene-generator/server.js` to preview:
+
+- detected archetype
+- bootstrap
+- key endpoints
+- extracted params
+- workflow steps
+- confidence and risk notes
+
+- [ ] **Step 5: Verify with marketing and tq reference scenes**
+
+Run local analysis against the two reference scenes and confirm:
+
+- `marketing-zero-consumer-report` no longer resolves the wrong domain
+- `tq-lineloss-report` still identifies mode-related structures
+- generated preview no longer shows generic hardcoded report params
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add frontend/scene-generator/llm-client.js frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js frontend/scene-generator/sg_scene_generator.html src/generated_scene/generator.rs
+git commit -m "fix(generator): remove hardcoded report defaults and improve bootstrap extraction"
+```
+
+---
+
+### Task 2: Introduce Unified Scene IR
+
+**Files:**
+- Create: `src/generated_scene/ir.rs`
+- Modify: `src/generated_scene/generator.rs`
+- Modify: `src/bin/sg_scene_generate.rs`
+- Modify: `frontend/scene-generator/server.js`
+- Modify: `frontend/scene-generator/llm-client.js`
+
+**Goal:** Introduce a single intermediate representation that all extraction and compilation stages use.
+
+- [ ] **Step 1: Add Rust `Scene IR` structs**
+
+Create `src/generated_scene/ir.rs` with serde-enabled structs for:
+
+- `SceneIr`
+- `BootstrapIr`
+- `ParamIr`
+- `ModeIr`
+- `WorkflowStepIr`
+- `ArtifactContractIr`
+- `NormalizeRulesIr`
+- `ReadinessIr`
+- `EvidenceIr`
+
+Minimum top-level fields:
+
+```json
+{
+  "sceneId": "",
+  "sceneName": "",
+  "sceneKind": "",
+  "workflowArchetype": "",
+  "bootstrap": {},
+  "params": [],
+  "modes": [],
+  "workflowSteps": [],
+  "requestTemplate": {},
+  "responsePath": "",
+  "normalizeRules": {},
+  "artifactContract": {},
+  "validationHints": {},
+  "evidence": []
+}
+```
+
+- [ ] **Step 2: Wire `Scene IR` into generator entrypoints**
+
+Update `src/bin/sg_scene_generate.rs` to accept either:
+
+- `--scene-info-json` upgraded to the new IR contract, or
+- a new `--scene-ir-json` / `--scene-ir-file` parameter
+
+Keep backward compatibility by translating old scene info into partial IR where needed.
+
+- [ ] **Step 3: Refactor generator to compile from IR**
+
+Update `src/generated_scene/generator.rs` so its internal interfaces no longer depend directly on loosely grouped fields such as `expectedDomain`, `staticParams`, and `columnDefs`. They should compile from the unified `SceneIr` instead.
+
+- [ ] **Step 4: Update Node server to pass IR through generation**
+
+Modify `frontend/scene-generator/server.js` so analyze endpoints return IR-shaped JSON and generate endpoints pass the same structure into Rust without flattening.
+
+- [ ] **Step 5: Verify serde and CLI compatibility**
+
+Run:
+
+```bash
+cargo check
+node --check frontend/scene-generator/server.js
+node --check frontend/scene-generator/llm-client.js
+```
+
+Expected: `cargo check` type-checks the Rust crate and both Node files pass `node --check` syntax validation with the new IR contract.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/generated_scene/ir.rs src/generated_scene/generator.rs src/bin/sg_scene_generate.rs frontend/scene-generator/server.js frontend/scene-generator/llm-client.js
+git commit -m "feat(generator): introduce unified scene ir for analysis and compilation"
+```
+
+---
+
+### Task 3: Build Hybrid Extraction Pipeline
+
+**Files:**
+- Modify: `src/generated_scene/analyzer.rs`
+- Modify: `frontend/scene-generator/generator-runner.js`
+- Modify: `frontend/scene-generator/llm-client.js`
+- Modify: `frontend/scene-generator/server.js`
+
+**Goal:** Split extraction into deterministic signal collection plus LLM semantic completion.
+
+- [ ] **Step 1: Implement deterministic extraction helpers**
+
+Add helper logic in `src/generated_scene/analyzer.rs` or adjacent extraction code to detect:
+
+- URLs and request methods
+- `contentType`
+- request payload builders
+- pagination variables such as `page`, `rows`, `pageSize`
+- branch variables such as `period_mode`, `reportType`
+- entry methods
+- export methods
+- obvious filter expressions such as `charge !== 0`
+
+- [ ] **Step 2: Create key-fragment selection in Node runner**
+
+Update `frontend/scene-generator/generator-runner.js` to extract and package the following for LLM analysis:
+
+- directory summary
+- URL fragments
+- branch fragments
+- request-body fragments
+- response normalization fragments
+- export fragments
+
+- [ ] **Step 3: Redesign LLM prompt for workflow understanding**
+
+Update `frontend/scene-generator/llm-client.js` so the prompt explicitly asks for:
+
+- `workflowArchetype`
+- `bootstrap`
+- `params`
+- `modes`
+- `workflowSteps`
+- `requestTemplate`
+- `responsePath`
+- `normalizeRules`
+- `artifactContract`
+- `confidence`
+- `uncertainties`
+
+- [ ] **Step 4: Merge deterministic and LLM results**
+
+Implement merge logic in `frontend/scene-generator/server.js` or a dedicated helper:
+
+- deterministic extraction wins for hard facts
+- LLM fills missing semantics
+- conflicts are surfaced in preview as warnings
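+
+The merge rules above can be sketched as follows. This is an illustration only: it flattens both results to string key/value maps, which the real IR does not do, and `MergeOutcome` and `merge_extractions` are hypothetical names. The plan places the actual merge in `server.js` or a helper; Rust is used here only to match the backend examples.
+
+```rust
+use std::collections::BTreeMap;
+
+/// Merged fields plus the conflicts that should surface as preview warnings.
+#[derive(Debug, Default)]
+pub struct MergeOutcome {
+    pub merged: BTreeMap<String, String>,
+    pub warnings: Vec<String>,
+}
+
+/// Deterministic values win; LLM values only fill keys the deterministic pass left empty.
+pub fn merge_extractions(
+    deterministic: &BTreeMap<String, String>,
+    llm: &BTreeMap<String, String>,
+) -> MergeOutcome {
+    let mut out = MergeOutcome::default();
+    // Start from the LLM semantics.
+    for (key, value) in llm {
+        out.merged.insert(key.clone(), value.clone());
+    }
+    // Overwrite with deterministic hard facts and record every disagreement.
+    for (key, value) in deterministic {
+        if let Some(previous) = out.merged.insert(key.clone(), value.clone()) {
+            if previous != *value {
+                out.warnings.push(format!(
+                    "conflict on `{}`: kept deterministic `{}`, dropped LLM `{}`",
+                    key, value, previous
+                ));
+            }
+        }
+    }
+    out
+}
+```
+
+Recording the dropped LLM value in the warning lets the preview show both sides of a conflict rather than only the winner.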
+
+- [ ] **Step 5: Verify against reference workflows**
+
+Check that:
+
+- `marketing-zero-consumer-report` emits workflow steps including `paginate`, `secondary_request`, `filter`, and `export`
+- `tq-lineloss-report` emits `modes`, `defaultMode`, and `modeSwitchField`
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/generated_scene/analyzer.rs frontend/scene-generator/generator-runner.js frontend/scene-generator/llm-client.js frontend/scene-generator/server.js
+git commit -m "feat(generator): add hybrid deterministic and llm workflow extraction"
+```
+
+---
+
+### Task 4: Add Workflow Archetype Classification
+
+**Files:**
+- Modify: `src/generated_scene/analyzer.rs`
+- Modify: `src/generated_scene/ir.rs`
+- Modify: `frontend/scene-generator/server.js`
+- Modify: `frontend/scene-generator/sg_scene_generator.html`
+
+**Goal:** Reliably classify scenes so the correct compiler path is chosen.
+
+- [ ] **Step 1: Add archetype enum support**
+
+Define and support these initial archetypes:
+
+- `single_request_table`
+- `multi_mode_request`
+- `paginated_enrichment`
+- `page_state_eval`
+
+- [ ] **Step 2: Implement classification rules**
+
+Classification logic should apply the first rule that matches, in this priority order:
+
+1. `multi_mode_request` when explicit mode-switch branching exists
+2. `paginated_enrichment` when paginated list fetch plus secondary requests are detected
+3. `page_state_eval` when page-state judgment dominates
+4. `single_request_table` as fallback with lower confidence
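+
+A rough sketch of that rule order, assuming the deterministic pass yields boolean signals. `SceneSignals` and its fields are hypothetical, and the confidence numbers are placeholders rather than calibrated values:
+
+```rust
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum WorkflowArchetype {
+    SingleRequestTable,
+    MultiModeRequest,
+    PaginatedEnrichment,
+    PageStateEval,
+}
+
+/// Hypothetical deterministic signals; the real analyzer may track different ones.
+#[derive(Debug, Default, Clone, Copy)]
+pub struct SceneSignals {
+    pub has_mode_switch_branch: bool,
+    pub has_paginated_list: bool,
+    pub has_secondary_requests: bool,
+    pub page_state_dominates: bool,
+}
+
+/// Returns the first matching archetype and a placeholder confidence score.
+pub fn classify(signals: &SceneSignals) -> (WorkflowArchetype, f32) {
+    if signals.has_mode_switch_branch {
+        (WorkflowArchetype::MultiModeRequest, 0.9)
+    } else if signals.has_paginated_list && signals.has_secondary_requests {
+        (WorkflowArchetype::PaginatedEnrichment, 0.9)
+    } else if signals.page_state_dominates {
+        (WorkflowArchetype::PageStateEval, 0.8)
+    } else {
+        // Fallback: nothing specific matched, so report low confidence.
+        (WorkflowArchetype::SingleRequestTable, 0.4)
+    }
+}
+```
+
+Because the checks run in order, a scene that has both mode switching and pagination is classified as `multi_mode_request`, which matches the stated preference.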
+
+- [ ] **Step 3: Expose classification confidence**
+
+Add confidence and evidence fields to the preview payload so UI can show why a scene was classified into an archetype.
+
+- [ ] **Step 4: Add manual override support in UI**
+
+Allow users to override archetype in `frontend/scene-generator/sg_scene_generator.html` before final generation, but preserve the original detected result and confidence.
+
+- [ ] **Step 5: Verify reference classifications**
+
+Expected:
+
+- `marketing-zero-consumer-report` => `paginated_enrichment`
+- `tq-lineloss-report` => `multi_mode_request`
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/generated_scene/analyzer.rs src/generated_scene/ir.rs frontend/scene-generator/server.js frontend/scene-generator/sg_scene_generator.html
+git commit -m "feat(generator): classify scenes by workflow archetype with confidence"
+```
+
+---
+
+### Task 5: Split Generator Into Archetype Compilers
+
+**Files:**
+- Modify: `src/generated_scene/generator.rs`
+- Optionally create: `src/generated_scene/compiler_single_request.rs`
+- Optionally create: `src/generated_scene/compiler_multi_mode.rs`
+- Optionally create: `src/generated_scene/compiler_paginated_enrichment.rs`
+- Optionally create: `src/generated_scene/compiler_page_state.rs`
+
+**Goal:** Replace the single generic report template with explicit compiler paths.
+
+- [ ] **Step 1: Add compiler routing by archetype**
+
+Update `src/generated_scene/generator.rs` so generation dispatches on `workflowArchetype`.
+
+- [ ] **Step 2: Implement `single_request_table` compiler**
+
+Generate:
+
+- minimal `scene.toml`
+- direct request browser script
+- artifact output for simple table/list data
+
+- [ ] **Step 3: Implement `multi_mode_request` compiler**
+
+Generate:
+
+- mode detection
+- mode-specific request builders
+- mode-specific column definitions
+- mode-specific response extraction
+- unified artifact output
+
+Reference target: `tq-lineloss-report`
+
+- [ ] **Step 4: Implement `paginated_enrichment` compiler**
+
+Generate:
+
+- paginated list loop
+- per-item or batched secondary requests
+- aggregation and transform steps
+- business filters
+- final artifact or export output
+
+Reference target: `marketing-zero-consumer-report`
+
+- [ ] **Step 5: Implement `page_state_eval` compiler**
+
+Generate:
+
+- state-check script skeleton
+- light artifact semantics for monitoring or status checks
+
+- [ ] **Step 6: Verify generated outputs by archetype**
+
+Validate that generated scripts no longer:
+
+- define multiple API endpoints but use only the first
+- collapse mode-aware scenes into one request body
+- flatten paginated enrichment scenes into one-step normalization
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add src/generated_scene/generator.rs src/generated_scene/compiler_*.rs
+git commit -m "feat(generator): split scene generation into workflow archetype compilers"
+```
+
+---
+
+### Task 6: Align Runtime Resolver Contracts
+
+**Files:**
+- Modify: `src/compat/scene_platform/resolvers.rs`
+- Modify: `src/generated_scene/generator.rs`
+- Modify: `src/generated_scene/ir.rs`
+- Modify: `tests/scene_generator_test.rs`
+
+**Goal:** Ensure generated parameter contracts are either executable by the runtime or explicitly flagged as unsupported.
+
+- [ ] **Step 1: Audit current resolver coverage**
+
+Document which current contracts are already supported, including:
+
+- `dictionary_entity`
+- `month_week_period`
+- `fixed_enum`
+- `literal_passthrough`
+
+- [ ] **Step 2: Add missing resolver types or gate them**
+
+Choose one of these paths per parameter type:
+
+1. implement new runtime resolver support
+2. downgrade generation to an existing supported resolver
+3. block generation with explicit readiness warning
+
+Recommended additions:
+
+- `mode_enum`
+- `date_range`
+- `org_tree`
+- `page_size`
+- `hidden_static`
+- `derived_param`
+
+- [ ] **Step 3: Reflect runtime compatibility in generated metadata**
+
+Generated output should clearly indicate:
+
+- supported params
+- unresolved params
+- manual-completion requirements
+
+- [ ] **Step 4: Add tests for resolver alignment**
+
+Extend tests to ensure a generated skill cannot claim runnable readiness when its params require unsupported resolver behavior.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/compat/scene_platform/resolvers.rs src/generated_scene/generator.rs src/generated_scene/ir.rs tests/scene_generator_test.rs
+git commit -m "feat(runtime): align generated scene contracts with resolver support"
+```
+
+---
+
+### Task 7: Add Readiness Gates And Generation Report
+
+**Files:**
+- Modify: `frontend/scene-generator/server.js`
+- Modify: `frontend/scene-generator/sg_scene_generator.html`
+- Modify: `src/generated_scene/ir.rs`
+- Modify: `src/generated_scene/generator.rs`
+
+**Goal:** Make generation output self-describing so users know whether a skill is ready for internal-network trial.
+
+- [ ] **Step 1: Add static readiness checks**
+
+Implement checks for:
+
+- entrypoint detection
+- request-chain completeness
+- bootstrap plausibility
+- param/runtime compatibility
+- archetype compiler completeness
+
+- [ ] **Step 2: Add readiness levels**
+
+Define:
+
+- `A` = ready for direct internal-network trial
+- `B` = structurally correct, human review recommended
+- `C` = draft only, manual completion required
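+
+One possible way to map the static checks onto the three levels, shown as a sketch. The `ReadinessChecks` fields mirror the checklist in Step 1, but the grading policy and the `0.8` confidence threshold are assumptions, not decisions made in this plan:
+
+```rust
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum ReadinessLevel {
+    A,
+    B,
+    C,
+}
+
+/// One flag per static readiness check (hypothetical struct).
+#[derive(Debug, Default, Clone, Copy)]
+pub struct ReadinessChecks {
+    pub entrypoint_detected: bool,
+    pub request_chain_complete: bool,
+    pub bootstrap_plausible: bool,
+    pub params_runtime_compatible: bool,
+    pub compiler_complete: bool,
+}
+
+/// Assumed policy: structural gaps force `C`; everything passing with high confidence earns `A`.
+pub fn grade(checks: &ReadinessChecks, confidence: f32) -> ReadinessLevel {
+    let structurally_sound = checks.entrypoint_detected
+        && checks.request_chain_complete
+        && checks.compiler_complete;
+    if !structurally_sound {
+        return ReadinessLevel::C;
+    }
+    if checks.bootstrap_plausible && checks.params_runtime_compatible && confidence >= 0.8 {
+        ReadinessLevel::A
+    } else {
+        // Structure is right, but a human should review before any trial run.
+        ReadinessLevel::B
+    }
+}
+```
+
+Under this policy a skill with unsupported resolver params can never reach `A`, which is consistent with the resolver-alignment tests in Task 6.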
+
+- [ ] **Step 3: Generate human-readable report**
+
+Each analysis or generation result should include:
+
+- archetype
+- confidence
+- key evidence
+- detected risks
+- missing pieces
+- readiness level
+
+- [ ] **Step 4: Display readiness in Web UI**
+
+Show the readiness grade before generation and after generation, with explicit warnings for internal-network execution risk.
+
+- [ ] **Step 5: Verify readiness outcomes**
+
+Expected baseline:
+
+- `tq-lineloss-report` should reach `A` or high-confidence `B`
+- `marketing-zero-consumer-report` should not be labeled runnable unless pagination and secondary-request logic are correctly represented
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add frontend/scene-generator/server.js frontend/scene-generator/sg_scene_generator.html src/generated_scene/ir.rs src/generated_scene/generator.rs
+git commit -m "feat(generator): add readiness grading and generation risk reporting"
+```
+
+---
+
+### Task 8: Add Regression Coverage For Reference Scenes
+
+**Files:**
+- Modify: `tests/scene_generator_test.rs`
+- Create/Modify: scene generator fixtures as needed
+
+**Goal:** Lock in the two reference scenes as ongoing regression cases.
+
+- [ ] **Step 1: Add marketing classification fixture coverage**
+
+Test that the marketing source scene is classified as `paginated_enrichment` and contains evidence for:
+
+- paginated list request
+- secondary request
+- filter rule
+- export step
+
+- [ ] **Step 2: Add tq classification fixture coverage**
+
+Test that the tq source scene is classified as `multi_mode_request` and contains evidence for:
+
+- month mode
+- week mode
+- distinct request templates
+- distinct column definitions
+
+- [ ] **Step 3: Add generation-shape assertions**
+
+Assert that generated outputs differ by archetype and do not collapse to a single generic template shape.
+
+- [ ] **Step 4: Run verification**
+
+```bash
+cargo test --test scene_generator_test -- --nocapture
+```
+
+Expected: both reference cases pass and guard against regression.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add tests/scene_generator_test.rs tests/fixtures
+git commit -m "test(generator): add regression coverage for marketing and tq reference scenes"
+```
+
+---
+
+## Delivery Sequence
+
+Recommended implementation order:
+
+1. Task 1: hard failure fixes
+2. Task 2: `Scene IR`
+3. Task 3: hybrid extraction
+4. Task 4: archetype classification
+5. Task 5: compiler split
+6. Task 6: resolver alignment
+7. Task 7: readiness gates
+8. Task 8: regression coverage
+
+Rationale:
+
+- Task 1 stops current bad outputs early.
+- Tasks 2 to 5 establish the new compiler backbone.
+- Tasks 6 and 7 prevent false claims of runnability.
+- Task 8 locks the new architecture against regression.
+
+---
+
+## Verification Strategy
+
+### Static Verification
+
+- `cargo check`
+- `cargo test --test scene_generator_test -- --nocapture`
+- `node --check frontend/scene-generator/llm-client.js`
+- `node --check frontend/scene-generator/generator-runner.js`
+- `node --check frontend/scene-generator/server.js`
+
+### Functional Verification
+
+For `marketing-zero-consumer-report`:
+
+- detected as `paginated_enrichment`
+- bootstrap resolves to business domain, not static script host
+- generated workflow includes pagination and secondary requests
+- generation is not labeled runnable if those steps are missing
+
+For `tq-lineloss-report`:
+
+- detected as `multi_mode_request`
+- month and week logic remain distinct
+- request templates and column definitions are mode-specific
+
+### UI Verification
+
+Confirm the scene generator UI now shows:
+
+- detected archetype
+- confidence
+- bootstrap
+- key params
+- readiness grade
+- risk notes
+
+---
+
+## Acceptance Criteria
+
+This plan is complete when all of the following are true:
+
+1. `sg_scene_generate` consumes a unified `Scene IR`.
+2. The analysis pipeline can distinguish at least `single_request_table`, `multi_mode_request`, `paginated_enrichment`, and `page_state_eval`.
+3. `tq-lineloss-report` is generated through the multi-mode compiler path.
+4. `marketing-zero-consumer-report` is generated through the paginated-enrichment compiler path.
+5. Generated `scene.toml` no longer injects unrelated default org/period assumptions.
+6. Bootstrap resolution no longer mistakes external script hosts for business target domains.
+7. Runtime resolver compatibility is explicit, not implicit.
+8. Generation results include readiness grading and risk reporting before internal-network trial.
+