# Generated Scene Skill Platform Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add a manifest-driven generated-scene platform that discovers staged report/collection `browser_script` scenes, routes deterministic `。。。` requests through generic registry/resolver logic, migrates `tq-lineloss-report` off one-off Rust branches, and ships a first in-repo generator that outputs registration-ready scene packages with minimal or zero per-scene Rust changes.
**Architecture:** Keep the existing submit branch shape in `src/agent/task_runner.rs`, but replace the line-loss-specific deterministic branch with a thin adapter over a generic scene registry, deterministic dispatcher, generic report-artifact interpreter, and generic XLSX postprocess path. Keep the generator separate from runtime internals by making `scene.toml` plus the lessons-learned TOML the only stable generator/runtime contract; generator code lives in its own module and binary, while runtime code stays under the existing `compat` submit/bootstrap seams.
**Tech Stack:** Rust 2021, `serde`, `serde_json`, `toml`, existing `browser_script` runtime and callback-host/browser-backend seams, `node:test` for staged JS, Cargo integration tests, filesystem-based package generation.
---
## Execution Context
- Branch from the repo's current ws baseline branch, which is `feature/claw-ws` in this checkout today. Do **not** implement on that branch directly; create a new feature branch from its HEAD.
- Do **not** create a worktree unless the user explicitly asks. Branch isolation is required; worktree isolation is not.
- Keep `skillsDir` as the existing single resolved path. The new scene registry must scan inside that one resolved skills root instead of adding array-style scene roots or a second config field.
- For this branch's automated tests and real smokes, use a repo-local `skillsDir` override that points at `examples/generated_scene_platform`. That still preserves the single-root contract because the runtime scans one resolved root whose `skills/` child contains the committed sample package.
- Put the new runtime registration manifest at `<skill-root>/scene.toml`. Keep existing `skill_staging/scenes/*/scene.json` files for legacy staging/UI metadata and do **not** move runtime dispatch policy back into `scene.json`.
- Keep every required deliverable for this plan inside the current `claw-new` repo so the branch can be built, tested, and committed independently. The first committed sample package should live under `examples/generated_scene_platform/skills/`; publishing the same package into any external skills/staging repo is a separate follow-up, not part of this branch.
- V1 scope is locked to `category = "report_collection"`, `kind = "browser_script"`, `artifact.type = "report-artifact"`. Unsupported scene types must fail fast instead of partially working.
- Deterministic invocation remains exact-suffix-only: only raw instructions ending with the exact `。。。` suffix enter the scene dispatcher.
- Never use hidden page defaults for required canonical parameters. Missing org, missing month/week mode, or missing period must prompt and stop.
- Do **not** add a generic login/session subsystem in this plan.
- Preserve current non-platform flows: Zhihu/LLM, configured `directSubmitSkill`, and ordinary browser-attached orchestration must remain behaviorally unchanged unless an explicit regression test says otherwise.
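The exact-suffix rule above can be sketched as a small gate function. This is a minimal sketch, not the runtime's actual API: the name `strip_deterministic_suffix` is hypothetical, and it assumes that "exact" means no trimming, so trailing whitespace after the suffix does not match.

```rust
/// The deterministic suffix named in this plan: three ideographic full stops.
const DETERMINISTIC_SUFFIX: &str = "。。。";

/// Hypothetical helper: returns the instruction body when `raw` ends with the
/// exact suffix, or `None` when the instruction must stay on the ordinary paths.
/// Assumption: no trimming is applied, so "…。。。 " (trailing space) is rejected.
pub fn strip_deterministic_suffix(raw: &str) -> Option<&str> {
    raw.strip_suffix(DETERMINISTIC_SUFFIX)
}
```

Only a `Some(body)` result would enter the scene dispatcher; ASCII `...` and suffix-less instructions such as the Zhihu case fall through untouched.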
## File Map
### Core runtime and contract files
- Create: `src/scene_contract/mod.rs`
  - shared serializable manifest contract used by both runtime and generator
- Create: `src/scene_contract/manifest.rs`
  - `scene.toml` schema types, schema-version validation helpers, artifact/postprocess enums
- Create: `src/compat/scene_platform/mod.rs`
  - exports the registry, dispatch, and resolver units
- Create: `src/compat/scene_platform/registry.rs`
  - scans the single resolved `skillsDir`, loads `<skill-root>/scene.toml`, validates duplicates and runtime compatibility
- Create: `src/compat/scene_platform/dispatch.rs`
  - deterministic candidate scoring, ambiguity fail-closed behavior, canonical param resolution, executable scene plan creation
- Create: `src/compat/scene_platform/resolvers.rs`
  - reusable resolver types for `dictionary_entity`, `month_week_period`, `fixed_enum`, and `literal_passthrough`
- Create: `src/compat/report_artifact.rs`
  - generic report-artifact parsing, status mapping, summary building, and export-readiness helpers
- Create: `src/compat/report_xlsx_export.rs`
  - generic XLSX exporter for any `report-artifact` with `column_defs`/`columns` + `rows`
- Modify: `src/lib.rs`
  - export new shared/runtime/generator modules and any CLI helpers needed by tests
- Modify: `src/compat/mod.rs`
  - export the new scene-platform and report-artifact modules
- Modify: `src/compat/deterministic_submit.rs`
  - keep the public API shape, but make it registry/manifest-driven instead of line-loss-hardcoded
- Modify: `src/compat/direct_skill_runtime.rs`
  - reuse the generic report-artifact interpreter so direct-submit and scene-submit summarize/status-map the same way
- Modify: `src/agent/task_runner.rs`
  - keep branch order, but call the new registry-backed deterministic planner before ordinary orchestration/LLM
- Modify: `src/service/server.rs`
  - keep bootstrap precedence shape, but let deterministic plans source `target_url` / `expected_domain` from scene manifests instead of hardcoded constants
### Generator files
- Create: `src/generated_scene/mod.rs`
  - generator entrypoints shared by tests and CLI
- Create: `src/generated_scene/analyzer.rs`
  - source directory inspection for v1 report/collection `browser_script` scenes
- Create: `src/generated_scene/generator.rs`
  - template rendering and package writing into an output staging root
- Create: `src/generated_scene/lessons.rs`
  - loads and validates `tq-lineloss-lessons-learned.toml` as generation constraints
- Create: `src/bin/sg_scene_generate.rs`
  - CLI entry for `sgClaw`'s in-repo scene generator capability
### In-repo sample package and reference assets
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/scene.toml`
  - first committed manifest-driven sample scene package used by runtime and generator tests in this repo
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/references/org-dictionary.json`
  - external dictionary data for the `dictionary_entity` resolver fixture
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.toml`
  - committed sample browser-script tool contract aligned with the manifest-driven runtime
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.md`
  - committed sample documentation for canonical args, artifact contract, and runtime expectations
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.js`
  - committed sample collection script with generic-platform artifact fields
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js`
  - committed JS contract tests for canonical args and artifact shape
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/references/data-quality.md`
  - committed sample data-quality notes aligned with manifest-driven output rules
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/references/collection-flow.md`
  - committed sample bootstrap/collection-flow notes
- Create: `tests/fixtures/scene_source/tq_lineloss/index.html`
  - hermetic in-repo source fixture for required analyzer/generator smoke coverage
- Create: `tests/fixtures/scene_source/tq_lineloss/js/collect.js`
  - hermetic in-repo source fixture JS for analyzer/generator smoke coverage
### Repo-local runtime discovery path for validation
- Use `examples/generated_scene_platform` as the repo-local `skillsDir` override root during tests and manual smokes.
- The runtime still scans one resolved root only; it just resolves that root to `examples/generated_scene_platform`, whose `skills/` child contains the committed sample package.
- Add or reuse a tiny repo-local config fixture such as `tmp/generated_scene_platform_sgclaw_config.json` or an equivalent test helper so the validation steps all point at the same reproducible `skillsDir`.
- Do not require external staging repos to make the manifest-driven runtime discoverable during this branch.
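The single-root discovery described above can be sketched with the standard library alone. This is an illustrative sketch under assumptions: `discover_scene_manifests` is a hypothetical name, and its argument is the directory that directly contains the skill packages (for the sample layout, the `skills/` child of `examples/generated_scene_platform`).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: lists `<skill-root>/scene.toml` for every immediate
/// child directory of `skills_parent` that has one. Skills without a
/// `scene.toml` are skipped rather than treated as errors.
pub fn discover_scene_manifests(skills_parent: &Path) -> io::Result<Vec<PathBuf>> {
    let mut manifests = Vec::new();
    for entry in fs::read_dir(skills_parent)? {
        let skill_root = entry?.path();
        let manifest = skill_root.join("scene.toml");
        if skill_root.is_dir() && manifest.is_file() {
            manifests.push(manifest);
        }
    }
    // Sort so discovery order does not depend on the filesystem.
    manifests.sort();
    Ok(manifests)
}
```

Because the function takes exactly one directory, it cannot grow into array-style scene roots without a visible signature change.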
### External publish target kept out of scope for this branch
- Do not modify external paths like `D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/...` in this plan.
- If the user later wants the generated sample published into that external staging repo, do it as a separate follow-up after this branch is green.
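### Platform-reference files
- Create: `docs/superpowers/references/tq-lineloss-lessons-learned.md`
- Create: `docs/superpowers/references/tq-lineloss-lessons-learned.toml`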
### Tests and fixtures
- Create: `tests/scene_registry_test.rs`
  - manifest loading, duplicate detection, schema validation, tool compatibility checks
- Create: `tests/report_artifact_postprocess_test.rs`
  - generic report-artifact parsing and XLSX postprocess coverage
- Create: `tests/generated_scene_lessons_test.rs`
  - lessons-TOML shape and required-rule coverage
- Create: `tests/scene_generator_test.rs`
  - analyzer + generator integration coverage using hermetic fixtures
- Create: `tests/fixtures/generated_scene/report_collection/index.html`
  - supported v1 report-scene fixture
- Create: `tests/fixtures/generated_scene/report_collection/js/report.js`
  - supported fixture source hints for analyzer tests
- Create: `tests/fixtures/generated_scene/non_report/index.html`
  - unsupported fixture proving fail-fast behavior
- Modify: `tests/deterministic_submit_test.rs`
  - migrate from hardcoded line-loss expectations to registry-driven deterministic behavior
- Modify: `tests/agent_runtime_test.rs`
  - keep direct-submit behavior intact while sharing generic report-artifact summaries
- Modify: `tests/service_task_flow_test.rs`
  - task-runner/bootstrap regressions for manifest-driven deterministic scenes
- Modify: `tests/service_ws_session_test.rs`
  - callback-host bootstrap target regression for manifest-driven deterministic submit when the browser-ws path is active
### Legacy files to delete only after green verification proves they are unused
- Delete: `src/compat/tq_lineloss/org_units.rs`
- Delete: `src/compat/tq_lineloss/org_resolver.rs`
- Delete: `src/compat/tq_lineloss/period_resolver.rs`
- Delete or reduce to a compatibility shim only if still needed: `src/compat/lineloss_xlsx_export.rs`
---
### Task 1: Create the implementation branch and lock the layout boundaries
**Files:**
- Verify only
- [ ] **Step 1: Switch to the ws baseline branch and create a new platform branch**
Run:
```bash
git switch feature/claw-ws
git switch -c feature/generated-scene-skill-platform
```
Expected: `git status -sb` shows a clean new branch rooted at the current ws baseline, not `feature/claw-ws` itself.
- [ ] **Step 2: Verify the current single-root skills layout before coding**
Run:
```bash
cargo test --test compat_config_test ws_cleanup_resolves_single_configured_skills_dir -- --nocapture
```
Expected: PASS, proving the repo still uses one resolved `skillsDir` path and the platform work must build on that instead of introducing array-style roots.
- [ ] **Step 3: Write down the two non-negotiable layout decisions in the first registry test scaffold**
The very first red test file (`tests/scene_registry_test.rs`) must assume:
```rust
// runtime manifest location:
let manifest_path = skill_root.join("scene.toml");
// legacy scene.json stays outside runtime dispatch ownership:
assert!(skill_root.join("scene.toml").exists());
assert!(!manifest_path.ends_with("skill_staging/scenes/.../scene.json"));
```
This prevents the implementation from drifting back toward `scene.json` routing or multi-root config.
---
### Task 2: Add the shared `scene.toml` contract and registry loader
**Files:**
- Create: `src/scene_contract/mod.rs`
- Create: `src/scene_contract/manifest.rs`
- Create: `src/compat/scene_platform/mod.rs`
- Create: `src/compat/scene_platform/registry.rs`
- Modify: `src/lib.rs`
- Modify: `src/compat/mod.rs`
- Create: `tests/scene_registry_test.rs`
- [ ] **Step 1: Write the failing registry tests first**
Add `tests/scene_registry_test.rs` with focused red cases like:
```rust
#[test]
fn registry_loads_scene_manifest_from_skill_root() {
let skill_root = temp_skill_with_scene_manifest(r#"
[scene]
id = "tq-lineloss-report"
skill = "tq-lineloss-report"
tool = "collect_lineloss"
kind = "browser_script"
version = "0.1.0"
category = "report_collection"
[manifest]
schema_version = "1"
[bootstrap]
expected_domain = "20.76.57.61"
target_url = "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor"
requires_target_page = true
[artifact]
type = "report-artifact"
success_status = ["ok", "partial", "empty"]
failure_status = ["blocked", "error"]
"#);
let registry = load_scene_registry(skill_root.parent().unwrap()).unwrap();
assert_eq!(registry.len(), 1);
assert_eq!(registry[0].manifest.scene.id, "tq-lineloss-report");
}
#[test]
fn registry_rejects_duplicate_scene_ids_with_both_paths_in_error() { /* two skills, same scene.id */ }
#[test]
fn registry_rejects_unknown_manifest_schema_version() { /* schema_version = "999" */ }
#[test]
fn registry_rejects_non_browser_script_scene_tool_in_v1() { /* kind = "shell" should fail */ }
#[test]
fn registry_ignores_skills_without_scene_toml() { /* ordinary skills still load elsewhere */ }
```
- [ ] **Step 2: Run the registry test file and verify it fails**
Run:
```bash
cargo test --test scene_registry_test -- --nocapture
```
Expected: FAIL because `scene.toml` types and registry loading do not exist yet.
- [ ] **Step 3: Implement the serializable manifest contract and the single-root registry loader**
Implement the minimal contract and loader needed to satisfy the tests:
```rust
use std::path::{Path, PathBuf};

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct SceneManifest {
    pub scene: SceneSection,
    pub manifest: ManifestSection,
    pub bootstrap: BootstrapSection,
    // Minimal manifests (such as the first registry test fixture) omit
    // `[deterministic]` and `[[params]]`, so both fall back to defaults.
    // Assumes `DeterministicSection` implements `Default`.
    #[serde(default)]
    pub deterministic: DeterministicSection,
    #[serde(default)]
    pub params: Vec<SceneParam>,
    pub artifact: ArtifactSection,
    pub postprocess: Option<PostprocessSection>,
}

#[derive(Debug, Clone)]
pub struct SceneRegistryEntry {
    pub manifest: SceneManifest,
    pub skill_root: PathBuf,
}

pub fn load_scene_registry(skills_dir: &Path) -> Result<Vec<SceneRegistryEntry>, SceneRegistryError> {
    // iterate immediate skill dirs under the already-resolved single skillsDir
    // look for <skill-root>/scene.toml only
    // parse and validate schema version
    // verify scene.id uniqueness across the loaded root
    // verify manifest.scene.skill matches the containing skill package
    // verify referenced tool exists in SKILL.toml and is browser_script in v1
    todo!("implement the steps listed above")
}
```
Rules to lock now:
- `schema_version = "1"` is the only accepted version in v1
- duplicate `scene.id` is a hard error and must report both manifest paths
- manifest loading must not add a second config key or a hardcoded `skill_staging/scenes` scan
- `scene.toml` is runtime-owned; `scene.json` stays legacy-only
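The schema-version and duplicate-id rules above can be sketched independently of TOML parsing. This is a sketch under stated assumptions: `LoadedManifest`, `RegistryCheckError`, and `check_registry` are hypothetical names standing in for whatever the real `SceneRegistryError` and manifest types end up being.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum RegistryCheckError {
    UnsupportedSchemaVersion { path: PathBuf, found: String },
    DuplicateSceneId { scene_id: String, first: PathBuf, second: PathBuf },
}

/// Minimal stand-in for a parsed manifest: only the fields the checks need.
pub struct LoadedManifest {
    pub scene_id: String,
    pub schema_version: String,
    pub path: PathBuf,
}

/// Rejects any schema version other than "1" and any repeated `scene.id`,
/// reporting both manifest paths for a duplicate.
pub fn check_registry(manifests: &[LoadedManifest]) -> Result<(), RegistryCheckError> {
    let mut seen: HashMap<&str, &PathBuf> = HashMap::new();
    for manifest in manifests {
        if manifest.schema_version != "1" {
            return Err(RegistryCheckError::UnsupportedSchemaVersion {
                path: manifest.path.clone(),
                found: manifest.schema_version.clone(),
            });
        }
        // `insert` returns the previously stored path when the id was already seen.
        if let Some(first) = seen.insert(manifest.scene_id.as_str(), &manifest.path) {
            return Err(RegistryCheckError::DuplicateSceneId {
                scene_id: manifest.scene_id.clone(),
                first: first.clone(),
                second: manifest.path.clone(),
            });
        }
    }
    Ok(())
}
```

Carrying both paths in the error variant keeps the "must report both manifest paths" rule checkable in a test without string matching.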
- [ ] **Step 4: Re-run the registry tests and verify they pass**
Run:
```bash
cargo test --test scene_registry_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 5: Commit the contract and registry slice**
Run:
```bash
git add src/lib.rs src/scene_contract/mod.rs src/scene_contract/manifest.rs src/compat/mod.rs src/compat/scene_platform/mod.rs src/compat/scene_platform/registry.rs tests/scene_registry_test.rs
git commit -m "feat: add scene manifest registry"
```
Expected: one commit that introduces the stable runtime/generator contract and registry loader.
---
### Task 3: Generalize deterministic dispatch and reusable parameter resolvers
**Files:**
- Create: `src/compat/scene_platform/dispatch.rs`
- Create: `src/compat/scene_platform/resolvers.rs`
- Modify: `src/compat/deterministic_submit.rs`
- Modify: `tests/deterministic_submit_test.rs`
- [ ] **Step 1: Replace the line-loss-only deterministic tests with registry-backed red tests**
Extend `tests/deterministic_submit_test.rs` with registry-backed red cases built from temp fixture manifests under a temporary skills root. Do **not** depend on the committed sample package from Task 6 yet; Task 3 must stay hermetic and independently runnable. Add failing cases such as:
```rust
#[test]
fn deterministic_submit_uses_registry_backed_scene_plan() {
let decision = decide_deterministic_submit(
"兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。",
None,
None,
);
match decision {
DeterministicSubmitDecision::Execute(plan) => {
assert_eq!(plan.scene_id, "tq-lineloss-report");
assert_eq!(plan.tool_name, "tq-lineloss-report.collect_lineloss");
assert_eq!(plan.expected_domain, "20.76.57.61");
assert_eq!(plan.target_url, "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor");
}
other => panic!("expected execute plan, got {other:?}"),
}
}
#[test]
fn deterministic_submit_fails_closed_on_scene_ambiguity() { /* two plausible scene.toml entries -> Prompt */ }
#[test]
fn deterministic_submit_prompts_for_missing_period_instead_of_defaulting() {
let decision = decide_deterministic_submit("兰州公司 台区线损大数据 月累计线损率统计分析。。。", None, None);
assert!(matches!(decision, DeterministicSubmitDecision::Prompt { .. }));
}
#[test]
fn deterministic_submit_uses_page_context_to_break_ties_before_keyword_only_match() { /* page_url/title beats keyword overlap */ }
#[test]
fn zhihu_without_suffix_remains_not_deterministic() {
assert!(matches!(
decide_deterministic_submit("打开知乎热榜", Some("https://www.zhihu.com/hot"), Some("知乎热榜")),
DeterministicSubmitDecision::NotDeterministic
));
}
```
Also invert the current default-period expectations. `兰州公司 月累计。。。` and `兰州公司 周累计。。。` must now prompt instead of executing.
- [ ] **Step 2: Run the targeted deterministic tests and verify they fail**
Run:
```bash
cargo test --test deterministic_submit_test -- --nocapture
```
Expected: FAIL because the current implementation is still hardcoded to line-loss constants and still defaults missing month/week periods.
- [ ] **Step 3: Implement reusable resolver types and a registry-backed dispatcher**
Implement the generic deterministic planner in the new scene-platform modules, then make `src/compat/deterministic_submit.rs` a thin adapter over it.
Required implementation shape:
```rust
pub enum ResolverKind {
DictionaryEntity,
MonthWeekPeriod,
FixedEnum,
LiteralPassthrough,
}
pub struct SceneExecutionPlan {
pub scene_id: String,
pub instruction: String,
pub tool_name: String,
pub expected_domain: String,
pub target_url: String,
pub args: Map<String, Value>,
pub success_statuses: Vec<String>,
pub failure_statuses: Vec<String>,
pub postprocess: Option<PostprocessSection>,
}
pub fn plan_deterministic_scene(
raw_instruction: &str,
page_url: Option<&str>,
page_title: Option<&str>,
skills_dir: &Path,
) -> Result<DeterministicSubmitDecision, SceneDispatchError> {
// exact suffix gate
// load registry from the single skillsDir
// score candidate scenes using include/exclude keywords + page context + required-param resolution
// if multiple remain plausible -> fail closed with explicit ambiguity prompt
// resolve params using generic resolver kinds
// build executable SceneExecutionPlan with manifest bootstrap + tool + canonical args
}
```
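The keyword-scoring and fail-closed steps in the comments above could look roughly like the following. It is a simplified sketch, not the planned dispatcher: `SceneCandidate`, `Selection`, and `select_scene` are hypothetical names, scoring counts include-keyword hits only, and the page-context and required-param signals are deliberately left out.

```rust
/// Minimal stand-in for the `[deterministic]` keyword fields of a manifest.
pub struct SceneCandidate {
    pub scene_id: String,
    pub include_keywords: Vec<String>,
    pub exclude_keywords: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum Selection {
    NoMatch,
    Unique(String),
    /// More than one scene tied for the best score: prompt, never guess.
    Ambiguous(Vec<String>),
}

/// Score is the number of include keywords found; any exclude keyword zeroes it.
fn score(body: &str, candidate: &SceneCandidate) -> usize {
    if candidate.exclude_keywords.iter().any(|k| body.contains(k.as_str())) {
        return 0;
    }
    candidate
        .include_keywords
        .iter()
        .filter(|k| body.contains(k.as_str()))
        .count()
}

pub fn select_scene(body: &str, candidates: &[SceneCandidate]) -> Selection {
    let scored: Vec<(usize, &SceneCandidate)> = candidates
        .iter()
        .map(|c| (score(body, c), c))
        .filter(|(s, _)| *s > 0)
        .collect();
    let best = match scored.iter().map(|(s, _)| *s).max() {
        Some(best) => best,
        None => return Selection::NoMatch,
    };
    let mut top: Vec<String> = scored
        .iter()
        .filter(|(s, _)| *s == best)
        .map(|(_, c)| c.scene_id.clone())
        .collect();
    if top.len() == 1 {
        Selection::Unique(top.remove(0))
    } else {
        top.sort();
        Selection::Ambiguous(top)
    }
}
```

In a fuller version, page URL and title would be consulted before declaring a tie, so that `Ambiguous` is only returned when every signal is exhausted.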
Resolver rules to lock now:
- `dictionary_entity` reads external dictionary data such as `references/org-dictionary.json`; no hardcoded org list in Rust after migration
- `month_week_period` returns explicit prompts for missing mode, missing period, contradictory month/week intent, or week-without-year
- `fixed_enum` and `literal_passthrough` exist now so the manifest contract is extensible, even if line-loss is the only v1 user
- if a new scene needs a new resolver **type**, add a reusable resolver, not a scene-specific `if scene_id == ...` branch
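A reduced sketch of the `month_week_period` prompting rules follows. It covers mode detection and the month value only; the week period syntax is not specified in this section, so it is left out rather than guessed. The function names, the English prompt strings, and the whitespace-delimited `YYYY-MM` token format are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum PeriodMode {
    Month,
    Week,
}

/// Resolves the mode from the `月累计` / `周累计` keywords used in this plan.
/// Missing or contradictory intent yields a prompt reason instead of a default.
pub fn resolve_mode(body: &str) -> Result<PeriodMode, &'static str> {
    match (body.contains("月累计"), body.contains("周累计")) {
        (true, false) => Ok(PeriodMode::Month),
        (false, true) => Ok(PeriodMode::Week),
        (true, true) => Err("contradictory month/week intent"),
        (false, false) => Err("missing month/week mode"),
    }
}

/// True when `token` looks like `YYYY-MM` with a month in 01..=12.
fn is_year_month(token: &str) -> bool {
    let bytes = token.as_bytes();
    if bytes.len() != 7 || bytes[4] != b'-' {
        return false;
    }
    let digits_ok = bytes
        .iter()
        .enumerate()
        .all(|(i, b)| i == 4 || b.is_ascii_digit());
    // The slice is only reached when every byte is ASCII, so it cannot panic.
    digits_ok && matches!(token[5..7].parse::<u8>(), Ok(1..=12))
}

/// Finds an explicit month value; never falls back to the current month.
pub fn resolve_month_value(body: &str) -> Result<String, &'static str> {
    body.split_whitespace()
        .find(|token| is_year_month(token))
        .map(str::to_string)
        .ok_or("missing period")
}
```

Each `Err` maps naturally onto a `Prompt` decision, which is how the no-default-period rule stays enforceable by tests.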
- [ ] **Step 4: Re-run the deterministic tests and verify they pass**
Run:
```bash
cargo test --test deterministic_submit_test -- --nocapture
```
Expected: PASS, including the new no-default-period behavior and ambiguity fail-closed coverage.
- [ ] **Step 5: Commit the registry-driven deterministic slice**
Run:
```bash
git add src/compat/deterministic_submit.rs src/compat/scene_platform/dispatch.rs src/compat/scene_platform/resolvers.rs tests/deterministic_submit_test.rs
git commit -m "feat: add registry-driven deterministic scene dispatch"
```
Expected: one commit that removes one-off line-loss decision ownership from the deterministic planner.
---
### Task 4: Add a generic report-artifact interpreter and XLSX postprocess path
**Files:**
- Create: `src/compat/report_artifact.rs`
- Create: `src/compat/report_xlsx_export.rs`
- Modify: `src/compat/direct_skill_runtime.rs`
- Modify: `src/compat/deterministic_submit.rs`
- Create: `tests/report_artifact_postprocess_test.rs`
- Modify: `tests/agent_runtime_test.rs`
- [ ] **Step 1: Write the red tests for generic report-artifact handling**
Add `tests/report_artifact_postprocess_test.rs` and the minimum `tests/agent_runtime_test.rs` extensions needed to prove the platform no longer depends on line-loss-specific Rust export logic:
```rust
#[test]
fn report_artifact_postprocess_exports_xlsx_for_ok_or_partial_scene() {
let artifact = serde_json::json!({
"type": "report-artifact",
"report_name": "tq-lineloss-report",
"status": "partial",
"columns": ["ORG_NAME", "LINE_LOSS_RATE"],
"column_defs": [["ORG_NAME", "供电单位"], ["LINE_LOSS_RATE", "综合线损率(%)"]],
"rows": [{"ORG_NAME": "国网兰州供电公司", "LINE_LOSS_RATE": "1.23"}],
"counts": {"rows": 1},
"partial_reasons": ["report_log_failed"]
});
let outcome = interpret_report_artifact_and_postprocess(&artifact, Some(&report_postprocess_xlsx()), &temp_workspace()).unwrap();
assert!(outcome.success);
assert!(outcome.summary.contains("status=partial"));
assert!(outcome.summary.contains("detail_rows=1"));
assert!(outcome.summary.contains("export_path="));
}
#[test]
fn report_artifact_postprocess_skips_export_for_blocked_or_error_scene() { /* no xlsx path */ }
#[test]
fn direct_submit_and_scene_submit_share_the_same_report_summary_contract() { /* direct_skill_runtime + deterministic path both use same summary builder */ }
```
- [ ] **Step 2: Run the focused report-artifact tests and verify they fail**
Run:
```bash
cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_partial_report_artifact_as_success_with_warning_summary -- --nocapture
```
Expected: FAIL because the generic interpreter/exporter does not exist yet and deterministic line-loss export is still special-cased.
- [ ] **Step 3: Implement the shared parser, summary builder, and generic XLSX exporter**
Implement a reusable path that both deterministic scenes and configured direct-submit skills can call:
```rust
pub struct ParsedReportArtifact {
pub report_name: String,
pub status: String,
pub columns: Vec<String>,
pub column_defs: Vec<(String, String)>,
pub rows: Vec<Map<String, Value>>,
pub counts: ReportCounts,
pub partial_reasons: Vec<String>,
}
pub fn interpret_report_artifact_and_postprocess(
artifact_json: &Value,
postprocess: Option<&PostprocessSection>,
workspace_root: &Path,
) -> Result<DirectSubmitOutcome, PipeError> {
// parse report-artifact generically
// map ok/partial/empty => success=true
// map blocked/error => success=false
// if postprocess.exporter == Some("xlsx_report") and status is exportable, write xlsx under workspace_root/out
// if postprocess.auto_open == Some("excel"), reuse existing open-export helper
}
```
Rules:
- export logic must read `column_defs` when present, else fall back to `columns`
- do not keep line-loss-only column-name assumptions in Rust
- keep direct-submit behavior unchanged for non-artifact string outputs
- keep `blocked` / `error` as failures even if the artifact also contains rows
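The status mapping and header fallback rules can be sketched without any XLSX dependency. This is a sketch under assumptions: the function names are hypothetical, unknown statuses are treated as failures, each `column_defs` pair is read as `(key, display label)` as in the test artifact, and an empty `column_defs` is treated the same as an absent one.

```rust
/// Maps a report-artifact status to success or failure.
/// Assumption: statuses outside the documented five fail closed.
pub fn is_success_status(status: &str) -> bool {
    matches!(status, "ok" | "partial" | "empty")
}

/// Picks export header labels: display labels from `column_defs` when present,
/// otherwise the raw `columns` keys.
pub fn export_headers(column_defs: &[(String, String)], columns: &[String]) -> Vec<String> {
    if column_defs.is_empty() {
        columns.to_vec()
    } else {
        column_defs.iter().map(|(_, label)| label.clone()).collect()
    }
}
```

Nothing here names a line-loss column, which is the property the generic exporter needs to keep.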
- [ ] **Step 4: Re-run the focused tests and verify they pass**
Run:
```bash
cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_partial_report_artifact_as_success_with_warning_summary -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_blocked_report_artifact_as_failure -- --nocapture
```
Expected: PASS.
- [ ] **Step 5: Commit the generic artifact/postprocess slice**
Run:
```bash
git add src/compat/report_artifact.rs src/compat/report_xlsx_export.rs src/compat/direct_skill_runtime.rs src/compat/deterministic_submit.rs tests/report_artifact_postprocess_test.rs tests/agent_runtime_test.rs
git commit -m "refactor: share generic report artifact postprocess"
```
Expected: one commit that removes the need for per-scene Rust export logic.
---
### Task 5: Wire manifest-driven scenes into submit and bootstrap without regressing other flows
**Files:**
- Modify: `src/agent/task_runner.rs`
- Modify: `src/service/server.rs`
- Modify: `tests/service_task_flow_test.rs`
- Modify: `tests/service_ws_session_test.rs`
- Modify: `tests/agent_runtime_test.rs`
- [ ] **Step 1: Add the failing submit/bootstrap regression tests**
Add focused tests that lock branch order and bootstrap behavior:
```rust
#[test]
fn submit_task_routes_suffix_instruction_through_manifest_scene_before_llm() {
// no provider call should happen when deterministic scene planning succeeds or prompts
}
#[test]
fn resolve_submit_bootstrap_target_prefers_manifest_scene_target_for_deterministic_scene() {
let request = SubmitTaskRequest {
instruction: "兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。".to_string(),
conversation_id: None,
messages: vec![],
page_url: None,
page_title: None,
};
let target = resolve_submit_bootstrap_target(&request, workspace_root, &settings);
assert_eq!(target.request_url, "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor");
assert_eq!(target.expected_domain.as_deref(), Some("20.76.57.61"));
}
#[test]
fn zhihu_without_suffix_keeps_existing_non_scene_path() { /* ordinary path unchanged */ }
```
For the browser-ws/callback-host path, add one regression in `tests/service_ws_session_test.rs` proving the first bootstrap/open target comes from `scene.toml` when a deterministic scene plan exists.
- [ ] **Step 2: Run the focused integration tests and verify they fail**
Run:
```bash
cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
```
Expected: FAIL because the submit/bootstrap path still depends on the old deterministic line-loss branch shape.
- [ ] **Step 3: Implement the minimal wiring changes only where the branch already exists**
Implementation targets:
- keep the current submit branch order in `src/agent/task_runner.rs`
- keep `resolve_submit_bootstrap_target(...)` precedence in `src/service/server.rs`
- replace the old hardcoded deterministic plan source with the new manifest-backed planner
- keep configured `directSubmitSkill` and ordinary LLM/browser orchestration behavior untouched
The resulting branch order must still be:
```rust
// 1. registry-backed deterministic scene (exact suffix only)
// 2. ordinary primary orchestration path
// 3. configured directSubmitSkill
// 4. compat LLM/runtime path
```
- [ ] **Step 4: Re-run the focused integration tests and verify they pass**
Run:
```bash
cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
cargo test --test agent_runtime_test -- --nocapture
```
Expected: PASS, with no regression to the ordinary direct-submit or Zhihu paths.
- [ ] **Step 5: Commit the submit/bootstrap integration slice**
Run:
```bash
git add src/agent/task_runner.rs src/service/server.rs tests/service_task_flow_test.rs tests/service_ws_session_test.rs tests/agent_runtime_test.rs
git commit -m "refactor: wire manifest scenes into submit bootstrap"
```
Expected: one commit that changes wiring only at the existing seams.
---
### Task 6: Add the first manifest-driven `tq-lineloss-report` sample package inside this repo
**Files:**
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/scene.toml`
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/references/org-dictionary.json`
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.toml`
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.md`
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.js`
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js`
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/references/data-quality.md`
- Create: `examples/generated_scene_platform/skills/tq-lineloss-report/references/collection-flow.md`
- Modify: `tests/deterministic_submit_test.rs`
- Modify: `tests/scene_registry_test.rs`
- [ ] **Step 1: Add the failing line-loss manifest and runtime-contract checks**
Create the `scene.toml` shape in the in-repo sample package first and lock the migration expectations:
```toml
[scene]
id = "tq-lineloss-report"
skill = "tq-lineloss-report"
tool = "collect_lineloss"
kind = "browser_script"
version = "0.1.0"
category = "report_collection"
[manifest]
schema_version = "1"
[bootstrap]
expected_domain = "20.76.57.61"
target_url = "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor"
page_title_keywords = ["线损"]
requires_target_page = true
[deterministic]
suffix = "。。。"
include_keywords = ["线损", "月累计", "周累计", "统计分析"]
exclude_keywords = ["知乎"]
[[params]]
name = "org"
resolver = "dictionary_entity"
required = true
prompt_missing = "已命中台区线损报表技能,但缺少供电单位。"
prompt_ambiguous = "已命中台区线损报表技能,但供电单位存在歧义,请补充更完整名称。"
[params.resolver_config]
dictionary_ref = "references/org-dictionary.json"
output_label_field = "org_label"
output_code_field = "org_code"
[[params]]
name = "period"
resolver = "month_week_period"
required = true
prompt_missing = "已命中台区线损报表技能,但缺少统计周期。"
prompt_ambiguous = "已命中台区线损报表技能,但统计周期存在歧义,请补充更明确表达。"
[artifact]
type = "report-artifact"
success_status = ["ok", "partial", "empty"]
failure_status = ["blocked", "error"]
[postprocess]
exporter = "xlsx_report"
auto_open = "excel"
```
Also add a red JS assertion in the committed sample package proving the script returns `column_defs` and never re-parses raw natural-language org/period text:
```javascript
test('buildBrowserEntrypointResult keeps canonical args and generic export fields only', async () => {
const artifact = await buildBrowserEntrypointResult({
expected_domain: '20.76.57.61',
org_label: '国网兰州供电公司',
org_code: '62401',
period_mode: 'month',
period_mode_code: '1',
period_value: '2026-03',
period_payload: { fdate: '2026-03' },
instruction: '兰州公司 月累计 2026-03'
}, fakeDeps);
assert.equal(artifact.org.code, '62401');
assert.ok(Array.isArray(artifact.column_defs));
assert.equal(JSON.stringify(artifact).includes('兰州公司 月累计 2026-03'), false);
});
```
- [ ] **Step 2: Run the targeted line-loss tests and verify they fail**
Run:
```bash
cargo test --test deterministic_submit_test -- --nocapture
node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"
```
Expected: FAIL because the runtime is not yet manifest-driven and the committed sample package does not yet expose the final manifest/dictionary/export contract.
- [ ] **Step 3: Implement the sample-scene migration without adding per-scene Rust branches**
Required actions:
- add `scene.toml` under the in-repo sample skill root and use the same layout the generator will emit
- make tests and service-smoke config resolve `skillsDir` to `examples/generated_scene_platform` so the registry can discover the committed sample package without any external repo copy step
- export the current org unit data into `references/org-dictionary.json` and make the resolver read that file instead of a Rust hardcoded list
- update `collect_lineloss.js` so the returned `report-artifact` includes generic-platform fields needed by `report_xlsx_export.rs`
- keep collection logic in JS; do **not** move line-loss business semantics back into Rust
- write `SKILL.toml` / `SKILL.md` / references docs into the sample package to describe canonical args and the manifest-driven contract
- keep any external staging-repo publish step out of scope for this branch; this task only commits the in-repo sample package
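
The dictionary-backed org resolution described in the actions above could look roughly like this. It is a sketch under stated assumptions: `OrgEntry`, `OrgResolution`, and `resolve_org` are hypothetical names, the entry shape (code, label, aliases) is assumed rather than taken from the real `references/org-dictionary.json`, and JSON loading is omitted so the example stays standard-library only.

```rust
/// One dictionary entry, shown already parsed (assumed shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OrgEntry {
    pub code: String,
    pub label: String,
    pub aliases: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum OrgResolution {
    /// Exactly one dictionary entry matched.
    Resolved(OrgEntry),
    /// No entry matched; the caller should prompt instead of defaulting.
    Missing,
    /// Several entries matched; carries their codes for the prompt.
    Ambiguous(Vec<String>),
}

/// Match the instruction text against labels and aliases from the dictionary.
pub fn resolve_org(instruction: &str, dictionary: &[OrgEntry]) -> OrgResolution {
    let hits: Vec<&OrgEntry> = dictionary
        .iter()
        .filter(|e| {
            instruction.contains(e.label.as_str())
                || e.aliases.iter().any(|a| instruction.contains(a.as_str()))
        })
        .collect();
    match hits.len() {
        0 => OrgResolution::Missing,
        1 => OrgResolution::Resolved(hits[0].clone()),
        _ => OrgResolution::Ambiguous(hits.iter().map(|e| e.code.clone()).collect()),
    }
}
```

Because the resolver only sees data from the dictionary file, adding or renaming an org unit becomes a package edit rather than a Rust change.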
- [ ] **Step 4: Re-run the line-loss tests and verify they pass**
Run:
```bash
cargo test --test deterministic_submit_test -- --nocapture
node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"
```
Expected: PASS, including the new missing-period prompt behavior and the new manifest-driven sample-package shape.
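
The missing-period behavior can be illustrated with a small resolver that never falls back to a default. This is only a sketch: `PeriodResolution` and `resolve_month` are hypothetical names, it assumes the `。。。` suffix has already been stripped, and it handles only explicit `YYYY-MM` month tokens (week mode is left out).

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum PeriodResolution {
    /// Exactly one explicit `YYYY-MM` token was found.
    Month(String),
    /// No explicit period; the caller shows `prompt_missing`.
    Missing,
    /// Several different periods; the caller shows `prompt_ambiguous`.
    Ambiguous(Vec<String>),
}

/// True for tokens shaped like `2026-03` with a month in 01..=12.
fn is_year_month(token: &str) -> bool {
    let bytes = token.as_bytes();
    bytes.len() == 7
        && bytes[4] == b'-'
        && bytes[..4].iter().all(|b| b.is_ascii_digit())
        && bytes[5..].iter().all(|b| b.is_ascii_digit())
        && token[5..]
            .parse::<u8>()
            .map(|m| (1..=12).contains(&m))
            .unwrap_or(false)
}

/// Collect distinct explicit month tokens and refuse to guess otherwise.
pub fn resolve_month(instruction: &str) -> PeriodResolution {
    let mut found: Vec<String> = Vec::new();
    for token in instruction.split_whitespace() {
        if is_year_month(token) && !found.iter().any(|f| f.as_str() == token) {
            found.push(token.to_string());
        }
    }
    match found.len() {
        0 => PeriodResolution::Missing,
        1 => PeriodResolution::Month(found.remove(0)),
        _ => PeriodResolution::Ambiguous(found),
    }
}
```

Returning `Missing` as a distinct variant keeps the "prompt instead of defaulting" rule visible in the type rather than hidden in a fallback value.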
- [ ] **Step 5: Commit the line-loss sample migration**
Run:
```bash
git add examples/generated_scene_platform/skills/tq-lineloss-report tests/deterministic_submit_test.rs tests/scene_registry_test.rs
git commit -m "feat: add manifest-driven lineloss sample package"
```
Expected: one commit that adds the first committed manifest-driven sample package and updates runtime expectations around it.
---
### Task 7: Write the required `tq-lineloss` lessons-learned artifacts and load them as generator rules
**Files:**
- Create: `docs/superpowers/references/tq-lineloss-lessons-learned.md`
- Create: `docs/superpowers/references/tq-lineloss-lessons-learned.toml`
- Create: `tests/generated_scene_lessons_test.rs`
- Create: `src/generated_scene/mod.rs`
- Create: `src/generated_scene/lessons.rs`
- Modify: `src/lib.rs`
- [ ] **Step 1: Write the failing lessons-rules test before the docs**
Add `tests/generated_scene_lessons_test.rs` that requires all mandatory structured rule sections to exist. In the same red step, wire the empty `src/generated_scene/mod.rs` and `src/lib.rs` exports needed so this test fails on missing implementation/data, not on missing module visibility:
```rust
#[test]
fn lineloss_lessons_toml_declares_required_generator_rules() {
let lessons = load_generation_lessons("docs/superpowers/references/tq-lineloss-lessons-learned.toml").unwrap();
assert!(lessons.routing.require_exact_suffix);
assert!(lessons.routing.unsupported_scene_fail_closed);
assert!(lessons.canonical_params.require_explicit_period);
assert!(lessons.bootstrap.require_expected_domain);
assert!(lessons.bootstrap.require_target_url);
assert!(lessons.artifact.require_report_artifact);
assert!(lessons.validation.require_pipe_and_ws_checks);
assert!(lessons.validation.require_manual_service_console_smoke);
}
```
- [ ] **Step 2: Run the lessons test and verify it fails**
Run:
```bash
cargo test --test generated_scene_lessons_test -- --nocapture
```
Expected: FAIL because the lessons loader and TOML file do not exist yet.
- [ ] **Step 3: Implement the loader and write both lessons artifacts**
Implement the loader and complete the minimal module wiring (`src/generated_scene/mod.rs`, `src/lib.rs`) in this task so `cargo test --test generated_scene_lessons_test` is buildable before Task 8. Use a TOML shape explicit enough for generator enforcement, for example:
```toml
[routing]
require_exact_suffix = true
unsupported_scene_fail_closed = true
ambiguity_fail_closed = true
[canonical_params]
require_dictionary_entity_for_org = true
require_explicit_period = true
forbid_hidden_page_defaults = true
[bootstrap]
require_expected_domain = true
require_target_url = true
prefer_page_context_when_present = true
[artifact]
require_report_artifact = true
require_column_defs_for_export = true
rust_side_xlsx_export_when_postprocess_xlsx = true
[validation]
require_pipe_and_ws_checks = true
require_manual_service_console_smoke = true
require_callback_host_timeout_notes = true
```
The Markdown companion must explain the why behind those rules: deterministic routing pitfalls, canonical parameter pitfalls, bootstrap target pitfalls, pipe/ws differences, callback-host timeout lessons, and Rust-side export constraints.
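
One way the generator might enforce those rules is to report every mandatory flag that is not enabled, rather than failing on the first one. The sketch below models only the eight flags asserted in Step 1; `LessonFlags` and `missing_rules` are hypothetical names, and the real loader would populate the struct from the TOML file.

```rust
/// Hypothetical flattened view of the lessons TOML after parsing.
#[derive(Debug, Default, Clone)]
pub struct LessonFlags {
    pub require_exact_suffix: bool,
    pub unsupported_scene_fail_closed: bool,
    pub require_explicit_period: bool,
    pub require_expected_domain: bool,
    pub require_target_url: bool,
    pub require_report_artifact: bool,
    pub require_pipe_and_ws_checks: bool,
    pub require_manual_service_console_smoke: bool,
}

impl LessonFlags {
    /// Returns the dotted names of every mandatory rule that is not enabled.
    pub fn missing_rules(&self) -> Vec<&'static str> {
        let checks = [
            ("routing.require_exact_suffix", self.require_exact_suffix),
            ("routing.unsupported_scene_fail_closed", self.unsupported_scene_fail_closed),
            ("canonical_params.require_explicit_period", self.require_explicit_period),
            ("bootstrap.require_expected_domain", self.require_expected_domain),
            ("bootstrap.require_target_url", self.require_target_url),
            ("artifact.require_report_artifact", self.require_report_artifact),
            ("validation.require_pipe_and_ws_checks", self.require_pipe_and_ws_checks),
            (
                "validation.require_manual_service_console_smoke",
                self.require_manual_service_console_smoke,
            ),
        ];
        checks
            .iter()
            .filter(|(_, enabled)| !*enabled)
            .map(|(name, _)| *name)
            .collect()
    }
}
```

An empty result means the lessons file satisfies the modeled rules; a non-empty one gives the generator a complete list to print before refusing to emit a package.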
- [ ] **Step 4: Re-run the lessons tests and verify they pass**
Run:
```bash
cargo test --test generated_scene_lessons_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 5: Commit the lessons artifacts and loader**
Run:
```bash
git add docs/superpowers/references/tq-lineloss-lessons-learned.md docs/superpowers/references/tq-lineloss-lessons-learned.toml src/generated_scene/mod.rs src/generated_scene/lessons.rs src/lib.rs tests/generated_scene_lessons_test.rs
git commit -m "docs: add lineloss generation lessons"
```
Expected: one commit that makes the line-loss lessons machine-consumable and reviewable.
---
### Task 8: Build the v1 source analyzer, package generator, and CLI entry
**Files:**
- Create: `src/generated_scene/analyzer.rs`
- Create: `src/generated_scene/generator.rs`
- Create: `src/bin/sg_scene_generate.rs`
- Modify: `src/generated_scene/mod.rs`
- Modify: `src/lib.rs`
- Create: `tests/scene_generator_test.rs`
- Create: `tests/fixtures/generated_scene/report_collection/index.html`
- Create: `tests/fixtures/generated_scene/report_collection/js/report.js`
- Create: `tests/fixtures/generated_scene/non_report/index.html`
- Create: `tests/fixtures/scene_source/tq_lineloss/index.html`
- Create: `tests/fixtures/scene_source/tq_lineloss/js/collect.js`
- [ ] **Step 1: Add the failing analyzer/generator tests with hermetic fixtures**
Create fixture-backed tests like:
```rust
#[test]
fn analyzer_classifies_supported_report_collection_source() {
let analysis = analyze_scene_source(Path::new("tests/fixtures/generated_scene/report_collection")).unwrap();
assert_eq!(analysis.scene_kind, SceneKind::ReportCollection);
assert_eq!(analysis.tool_kind, ToolKind::BrowserScript);
assert!(analysis.bootstrap.target_url.is_some());
assert!(analysis.collection_entry_script.is_some());
}
#[test]
fn generator_writes_registration_ready_package_with_scene_toml() {
let output_root = tempdir();
generate_scene_package(GenerateSceneRequest {
source_dir: PathBuf::from("tests/fixtures/generated_scene/report_collection"),
scene_id: "sample-report-scene".to_string(),
scene_name: "示例报表场景".to_string(),
output_root: output_root.path().to_path_buf(),
lessons_path: PathBuf::from("docs/superpowers/references/tq-lineloss-lessons-learned.toml"),
}).unwrap();
assert!(output_root.path().join("skills/sample-report-scene/SKILL.toml").exists());
assert!(output_root.path().join("skills/sample-report-scene/scene.toml").exists());
assert!(output_root.path().join("skills/sample-report-scene/scripts/collect_sample_report_scene.js").exists());
assert!(output_root.path().join("skills/sample-report-scene/scripts/collect_sample_report_scene.test.js").exists());
}
#[test]
fn generator_rejects_non_report_source_with_explicit_reason() {
let err = analyze_scene_source(Path::new("tests/fixtures/generated_scene/non_report")).unwrap_err();
assert!(err.to_string().contains("report/collection browser_script only"));
}
```
- [ ] **Step 2: Run the generator tests and verify they fail**
Run:
```bash
cargo test --test scene_generator_test -- --nocapture
```
Expected: FAIL because the analyzer, generator, fixtures, and CLI do not exist yet.
- [ ] **Step 3: Implement the analyzer, generator, CLI, and the source fixtures used by final smoke**
Implementation rules:
- create the generator test fixtures under `tests/fixtures/generated_scene/*`
- create the hermetic source-smoke fixtures under `tests/fixtures/scene_source/tq_lineloss/*` so Task 9 can run without any external scenario directory
- analyzer must refuse unsupported/non-report scenes explicitly instead of generating broken packages
- generator must emit `scene.toml` inside the generated skill root
- generator must use `tq-lineloss-lessons-learned.toml` as a required input so the same hardening rules apply to future scenes
- generator/runtime coupling must stay at the file-contract level only
- CLI should use an explicit hand-written argument parser instead of adding a new heavy dependency
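
The "refuse explicitly" rule for the analyzer can be sketched as a gate that runs before any file is written. The enum names follow the Step 1 tests, but `SourceFacts`, `UnsupportedScene`, and `ensure_supported` are hypothetical, and the `Other` variants are placeholders for whatever the real analyzer distinguishes.

```rust
use std::fmt;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SceneKind {
    ReportCollection,
    Other,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ToolKind {
    BrowserScript,
    Other,
}

/// Explicit refusal carrying a human-readable reason.
#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedScene {
    pub reason: String,
}

impl fmt::Display for UnsupportedScene {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "unsupported scene source: {}", self.reason)
    }
}

impl std::error::Error for UnsupportedScene {}

/// Hypothetical facts an analyzer pass might have gathered from a source dir.
pub struct SourceFacts {
    pub scene_kind: SceneKind,
    pub tool_kind: ToolKind,
    pub target_url: Option<String>,
    pub collection_entry_script: Option<String>,
}

/// Reject anything outside the v1 envelope before generation starts.
pub fn ensure_supported(facts: &SourceFacts) -> Result<(), UnsupportedScene> {
    let refuse = |reason: &str| Err(UnsupportedScene { reason: reason.to_string() });
    if facts.scene_kind != SceneKind::ReportCollection
        || facts.tool_kind != ToolKind::BrowserScript
    {
        return refuse("v1 supports report/collection browser_script only");
    }
    if facts.target_url.is_none() {
        return refuse("no bootstrap target_url could be determined");
    }
    if facts.collection_entry_script.is_none() {
        return refuse("no collection entry script was found");
    }
    Ok(())
}
```

Running the gate first means an unsupported source never leaves a half-written package behind.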
Suggested CLI shape:
```bash
cargo run --bin sg_scene_generate -- \
--source-dir <scenario-dir> \
--scene-id <scene-id> \
--scene-name <display-name> \
--output-root <skill-staging-root> \
--lessons docs/superpowers/references/tq-lineloss-lessons-learned.toml
```
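
A dependency-free parser for that flag set might look like the sketch below. `CliArgs` and `parse_args` are hypothetical names; the five flags are the ones in the suggested shape, every flag is treated as required, and error wording is illustrative only.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

#[derive(Debug, PartialEq, Eq)]
pub struct CliArgs {
    pub source_dir: PathBuf,
    pub scene_id: String,
    pub scene_name: String,
    pub output_root: PathBuf,
    pub lessons: PathBuf,
}

const FLAGS: [&str; 5] = [
    "--source-dir",
    "--scene-id",
    "--scene-name",
    "--output-root",
    "--lessons",
];

/// Parse `--flag value` pairs; `args` excludes the program name.
pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliArgs, String> {
    let mut values: HashMap<String, String> = HashMap::new();
    let mut iter = args.into_iter();
    while let Some(flag) = iter.next() {
        if !FLAGS.contains(&flag.as_str()) {
            return Err(format!("unknown argument: {}", flag));
        }
        let value = iter
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        if values.insert(flag.clone(), value).is_some() {
            return Err(format!("duplicate argument: {}", flag));
        }
    }
    // Every flag is required in this sketch.
    let mut take = |flag: &str| {
        values
            .remove(flag)
            .ok_or_else(|| format!("missing required argument: {}", flag))
    };
    Ok(CliArgs {
        source_dir: PathBuf::from(take("--source-dir")?),
        scene_id: take("--scene-id")?,
        scene_name: take("--scene-name")?,
        output_root: PathBuf::from(take("--output-root")?),
        lessons: PathBuf::from(take("--lessons")?),
    })
}
```

In a binary this would typically be called as `parse_args(std::env::args().skip(1))`, with the error string printed alongside a usage line.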
Expected outputs under `<output-root>`:
- `skills/<scene-id>/SKILL.toml`
- `skills/<scene-id>/SKILL.md`
- `skills/<scene-id>/scene.toml`
- `skills/<scene-id>/references/*.md`
- `skills/<scene-id>/scripts/*.js`
- `skills/<scene-id>/scripts/*.test.js`
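
The expected layout can also be checked mechanically after a generator run. The sketch below is an assumption-heavy illustration: `expected_package_files` and `missing_package_files` are hypothetical helpers, and the `collect_<scene_id with '-' replaced by '_'>` script name is inferred from the Step 1 generator test, not a confirmed naming rule (the hand-written sample package uses `collect_lineloss.js`). The `references/*.md` files are not checked because their names are not fixed here.

```rust
use std::path::{Path, PathBuf};

/// Files a generated package is assumed to contain (naming rule is inferred).
pub fn expected_package_files(output_root: &Path, scene_id: &str) -> Vec<PathBuf> {
    let skill_root = output_root.join("skills").join(scene_id);
    let script_stem = format!("collect_{}", scene_id.replace('-', "_"));
    vec![
        skill_root.join("SKILL.toml"),
        skill_root.join("SKILL.md"),
        skill_root.join("scene.toml"),
        skill_root.join("scripts").join(format!("{}.js", script_stem)),
        skill_root.join("scripts").join(format!("{}.test.js", script_stem)),
    ]
}

/// Returns the expected files that do not exist on disk.
pub fn missing_package_files(output_root: &Path, scene_id: &str) -> Vec<PathBuf> {
    expected_package_files(output_root, scene_id)
        .into_iter()
        .filter(|path| !path.exists())
        .collect()
}
```

A smoke script could fail fast when `missing_package_files` returns a non-empty list, which gives a clearer message than a later registry load error.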
- [ ] **Step 4: Re-run the generator tests and verify they pass**
Run:
```bash
cargo test --test scene_generator_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 5: Commit the generator slice**
Run:
```bash
git add src/lib.rs src/generated_scene/mod.rs src/generated_scene/analyzer.rs src/generated_scene/generator.rs src/bin/sg_scene_generate.rs tests/scene_generator_test.rs tests/fixtures/generated_scene tests/fixtures/scene_source/tq_lineloss
git commit -m "feat: add generated scene package generator"
```
Expected: one commit that adds the in-repo v1 generator capability.
---
### Task 9: Run the final verification sweep, smoke the real runtime, and remove unused one-off scene code
**Files:**
- Delete if unused after green verification: `src/compat/tq_lineloss/org_units.rs`
- Delete if unused after green verification: `src/compat/tq_lineloss/org_resolver.rs`
- Delete if unused after green verification: `src/compat/tq_lineloss/period_resolver.rs`
- Delete or reduce to shim only if unused after green verification: `src/compat/lineloss_xlsx_export.rs`
- Modify: `src/compat/mod.rs`
- Modify: `src/lib.rs`
- [ ] **Step 1: Remove only the legacy one-off files that are provably unused**
Before deleting anything, prove the new path covers the old responsibilities:
```bash
cargo test --test deterministic_submit_test -- --nocapture
cargo test --test scene_registry_test -- --nocapture
cargo test --test report_artifact_postprocess_test -- --nocapture
```
Then delete the old line-loss-only resolver/export files only if `cargo test` passes and a repo-wide reference search (for example with the `Grep` tool) shows they are no longer referenced.
- [ ] **Step 2: Run the full automated verification sweep**
Run:
```bash
node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"
cargo test --test scene_registry_test -- --nocapture
cargo test --test deterministic_submit_test -- --nocapture
cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test generated_scene_lessons_test -- --nocapture
cargo test --test scene_generator_test -- --nocapture
cargo test --test agent_runtime_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
cargo test --test compat_runtime_test -- --nocapture
cargo test --test compat_config_test -- --nocapture
cargo build --bin sgclaw --bin sg_claw --bin sg_scene_generate
```
Expected: PASS.
- [ ] **Step 3: Run the required hermetic generator smoke and keep the real external source smoke optional**
Run the required in-repo smoke first:
```bash
tmp_out="$(mktemp -d)"
cargo run --bin sg_scene_generate -- \
--source-dir tests/fixtures/scene_source/tq_lineloss \
--scene-id tq-lineloss-report \
--scene-name "台区线损月周累计线损率统计分析" \
--output-root "$tmp_out" \
--lessons docs/superpowers/references/tq-lineloss-lessons-learned.toml
```
Expected: generator emits a complete package into `$tmp_out` using only in-repo fixtures.
Optional manual follow-up after the required smoke is green:
- if the external scenario directory is available on the implementer's machine, re-run the same command against the real source tree for additional confidence
- if it is unavailable, do **not** block the branch on that machine-specific path
- [ ] **Step 4: Run the real service-console smoke checks with `sg_claw.exe` semantics in mind**
Manual verification checklist:
- write or reuse a repo-local `sgclaw_config.json` whose `skillsDir` points to `examples/generated_scene_platform`
- rebuild and run `sg_claw`/`sg_claw.exe` with that config so the runtime-scanned skills root is reproducible
- on the real line-loss page, submit `兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。`
- confirm the request bootstraps the manifest `target_url`, uses the manifest `expected_domain`, and returns the line-loss report artifact through the generic scene runtime
- submit `兰州公司 台区线损大数据 月累计线损率统计分析。。。` and confirm the runtime prompts for missing period instead of defaulting
- submit `打开知乎热榜` and confirm the ordinary Zhihu path still behaves as before
- submit `打开知乎热榜。。。` and confirm the deterministic runtime fails closed with the unsupported-scene prompt instead of falling into the Zhihu path
- [ ] **Step 5: Commit the cleanup + verified platform state**
Run:
```bash
git add src/compat/mod.rs src/lib.rs src/compat src/generated_scene src/scene_contract docs/superpowers/references tests examples/generated_scene_platform
git commit -m "feat: add generated scene skill platform"
```
Expected: one final commit after the full automated and manual verification passes.
---
## Verification Checklist
### Registry and manifest contract
```bash
cargo test --test scene_registry_test -- --nocapture
```
Expected:
- `scene.toml` loads from the skill root
- only `schema_version = "1"` passes
- duplicate `scene.id` fails with both manifest paths in the error
- non-`browser_script` or non-`report_collection` v1 scenes are rejected cleanly
- the registry still scans exactly one resolved `skillsDir`
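
The schema-version and duplicate-id expectations above can be pictured as a single pass over already-parsed manifests. This is a sketch, not the registry's real code: `ManifestSummary` and `build_registry` are hypothetical names, and directory scanning and TOML parsing are assumed to have happened earlier.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical subset of a parsed `scene.toml`.
#[derive(Debug, Clone)]
pub struct ManifestSummary {
    pub manifest_path: PathBuf,
    pub schema_version: String,
    pub scene_id: String,
}

/// Index manifests by scene id, rejecting unknown schemas and duplicates.
pub fn build_registry(
    manifests: Vec<ManifestSummary>,
) -> Result<HashMap<String, ManifestSummary>, String> {
    let mut by_id: HashMap<String, ManifestSummary> = HashMap::new();
    for manifest in manifests {
        if manifest.schema_version != "1" {
            return Err(format!(
                "unsupported schema_version {:?} in {}",
                manifest.schema_version,
                manifest.manifest_path.display()
            ));
        }
        if let Some(existing) = by_id.get(&manifest.scene_id) {
            // Name both files so the operator can see which packages collide.
            return Err(format!(
                "duplicate scene.id {:?}: {} and {}",
                manifest.scene_id,
                existing.manifest_path.display(),
                manifest.manifest_path.display()
            ));
        }
        by_id.insert(manifest.scene_id.clone(), manifest);
    }
    Ok(by_id)
}
```

Failing the whole load on a duplicate, rather than letting the later manifest win, keeps routing deterministic regardless of directory iteration order.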
### Deterministic routing contract
```bash
cargo test --test deterministic_submit_test -- --nocapture
```
Expected:
- exact `。。。` suffix only
- no-suffix behavior unchanged
- unsupported suffix-scene requests fail closed
- multi-match ambiguity fails closed
- a missing org, mode, or period triggers a prompt instead of a silent default
- page context may improve scoring but cannot cause silent guessing on unresolved ambiguity
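
The routing expectations above reduce to a four-way decision. The sketch below shows one possible shape: `Route` and `route` are hypothetical names, scenes are modeled as `(scene id, trigger keyword)` pairs purely for illustration, and the exact-suffix check is interpreted as "ends with the three full-width periods, with no trimming", which is an assumption about what "exact" means.

```rust
pub const DETERMINISTIC_SUFFIX: &str = "。。。";

#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// No suffix: leave the request on the ordinary (non-deterministic) path.
    Ordinary,
    /// Exactly one scene matched.
    Dispatch(String),
    /// Suffix present but no scene matched: fail closed.
    UnsupportedScene,
    /// Suffix present and several scenes matched: fail closed.
    AmbiguousScene(Vec<String>),
}

/// `scenes` holds hypothetical (scene id, trigger keyword) pairs.
pub fn route(input: &str, scenes: &[(&str, &str)]) -> Route {
    let body = match input.strip_suffix(DETERMINISTIC_SUFFIX) {
        Some(body) => body,
        None => return Route::Ordinary,
    };
    let mut matched: Vec<String> = scenes
        .iter()
        .filter(|(_, keyword)| body.contains(*keyword))
        .map(|(id, _)| (*id).to_string())
        .collect();
    match matched.len() {
        0 => Route::UnsupportedScene,
        1 => Route::Dispatch(matched.remove(0)),
        _ => Route::AmbiguousScene(matched),
    }
}
```

Note that an ASCII `...` ending is not treated as the suffix here, so such input stays on the ordinary path rather than being guessed into the deterministic one.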
### Generic report-artifact handling
```bash
cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test -- --nocapture
```
Expected:
- `ok` / `partial` / `empty` map to success
- `blocked` / `error` map to failure
- generic XLSX export works from artifact fields, not line-loss-only Rust code
- configured `directSubmitSkill` keeps working on the shared artifact interpreter
### Service submit/bootstrap path
```bash
cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
```
Expected:
- deterministic manifest scenes route before LLM
- bootstrap target resolution uses manifest `target_url` / `expected_domain`
- callback-host/browser-ws paths still receive the correct request URL
- non-deterministic Zhihu and direct-submit flows remain intact
### Generator and lessons
```bash
cargo test --test generated_scene_lessons_test -- --nocapture
cargo test --test scene_generator_test -- --nocapture
cargo build --bin sg_scene_generate
```
Expected:
- lessons TOML contains all required routing/param/bootstrap/artifact/validation rules
- analyzer only accepts v1 report/collection browser-script fixtures
- generator writes a complete package with `scene.toml` and JS test scaffold
- generator/runtime share only the explicit file contract, not hidden Rust internals
### Real runtime smoke
Manual checklist:
- `sg_claw.exe` / service console can still run the line-loss deterministic path
- missing-period deterministic line-loss requests prompt instead of defaulting
- plain Zhihu requests still avoid the scene platform
- suffixed unsupported requests fail closed
- line-loss export still opens through the generic postprocess path when configured
---
## Notes For The Engineer
- The paired approved spec is `docs/superpowers/specs/2026-04-15-generated-scene-skill-platform-design.md`.
- The current repo branch name for the ws baseline is `feature/claw-ws`, even though the design prose says `ws`.
- Do **not** reintroduce the old scene-registry experiment that was explicitly cleaned off the ws branch. This plan deliberately keeps the new runtime under `compat` and a shared serializable contract instead of reviving the deleted scene-only branch structure blindly.
- Keep `scene.toml` inside each skill package root. The separate `skill_staging/scenes/*/scene.json` tree remains legacy metadata only in this plan.
- Keep the generator extractable by holding the boundary at `scene.toml`, generated package layout, and lessons TOML rules. Avoid runtime code that reaches into generator-only internals.
- If a real scenario directory does not fit the v1 report/collection/browser-script envelope, the analyzer/generator must refuse it explicitly instead of emitting a half-valid package.
- Do **not** add a generic login/session platform here. Capture that need in docs if discovered, but keep it out of this implementation slice.