Generated Scene Skill Platform Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add a manifest-driven generated-scene platform that discovers staged report/collection browser_script scenes, routes deterministic requests (raw instructions ending in the exact 。。。 suffix) through generic registry/resolver logic, migrates tq-lineloss-report off one-off Rust branches, and ships a first in-repo generator that outputs registration-ready scene packages with minimal or zero per-scene Rust changes.

Architecture: Keep the existing submit branch shape in src/agent/task_runner.rs, but replace the line-loss-specific deterministic branch with a thin adapter over a generic scene registry, deterministic dispatcher, generic report-artifact interpreter, and generic XLSX postprocess path. Keep the generator separate from runtime internals by making scene.toml plus the lessons-learned TOML the only stable generator/runtime contract; generator code lives in its own module and binary, while runtime code stays under the existing compat submit/bootstrap seams.

Tech Stack: Rust 2021, serde, serde_json, toml, existing browser_script runtime and callback-host/browser-backend seams, node:test for staged JS, Cargo integration tests, filesystem-based package generation.


Execution Context

  • Branch from the repo's current ws baseline branch, which is feature/claw-ws in this checkout today. Do not implement on that branch directly; create a new feature branch from its HEAD.
  • Do not create a worktree unless the user explicitly asks. Branch isolation is required; worktree isolation is not.
  • Keep skillsDir as the existing single resolved path. The new scene registry must scan inside that one resolved skills root instead of adding array-style scene roots or a second config field.
  • For this branch's automated tests and real smokes, use a repo-local skillsDir override that points at examples/generated_scene_platform. That still preserves the single-root contract because the runtime scans one resolved root whose skills/ child contains the committed sample package.
  • Put the new runtime registration manifest at <skill-root>/scene.toml. Keep existing skill_staging/scenes/*/scene.json files for legacy staging/UI metadata and do not move runtime dispatch policy back into scene.json.
  • Keep every required deliverable for this plan inside the current claw-new repo so the branch can be built, tested, and committed independently. The first committed sample package should live under examples/generated_scene_platform/skills/; publishing the same package into any external skills/staging repo is a separate follow-up, not part of this branch.
  • V1 scope is locked to category = "report_collection", kind = "browser_script", artifact.type = "report-artifact". Unsupported scene types must fail fast instead of partially working.
  • Deterministic invocation remains exact-suffix-only: only raw instructions ending with the exact 。。。 suffix enter the scene dispatcher.
  • Never use hidden page defaults for required canonical parameters. Missing org, missing month/week mode, or missing period must prompt and stop.
  • Do not add a generic login/session subsystem in this plan.
  • Preserve current non-platform flows: Zhihu/LLM, configured directSubmitSkill, and ordinary browser-attached orchestration must remain behaviorally unchanged unless an explicit regression test says otherwise.
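The exact-suffix gate and the v1 scope lock above can be sketched as two small checks. This is a minimal illustration, not the planned implementation: the function names `strip_deterministic_suffix` and `check_v1_scope` and the error strings are hypothetical, and only the suffix literal and the three locked values come from this plan.

```rust
/// Suffix literal taken from the plan; everything else here is illustrative.
const DETERMINISTIC_SUFFIX: &str = "。。。";

/// Returns the instruction body when the raw text ends with the exact suffix.
/// No trimming is applied, so whitespace after the suffix does not match.
fn strip_deterministic_suffix(raw_instruction: &str) -> Option<&str> {
    raw_instruction.strip_suffix(DETERMINISTIC_SUFFIX)
}

/// Hypothetical v1 scope check: any value outside the locked triple fails fast.
fn check_v1_scope(category: &str, kind: &str, artifact_type: &str) -> Result<(), String> {
    if category != "report_collection" {
        return Err(format!("unsupported scene category: {category}"));
    }
    if kind != "browser_script" {
        return Err(format!("unsupported scene kind: {kind}"));
    }
    if artifact_type != "report-artifact" {
        return Err(format!("unsupported artifact type: {artifact_type}"));
    }
    Ok(())
}
```

Returning the stripped body rather than a bool lets the caller hand the suffix-free text to keyword matching and parameter resolution without re-checking the gate.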

File Map

Core runtime and contract files

  • Create: src/scene_contract/mod.rs
    • shared serializable manifest contract used by both runtime and generator
  • Create: src/scene_contract/manifest.rs
    • scene.toml schema types, schema-version validation helpers, artifact/postprocess enums
  • Create: src/compat/scene_platform/mod.rs
    • exports the registry, dispatch, and resolver units
  • Create: src/compat/scene_platform/registry.rs
    • scans the single resolved skillsDir, loads <skill-root>/scene.toml, validates duplicates and runtime compatibility
  • Create: src/compat/scene_platform/dispatch.rs
    • deterministic candidate scoring, ambiguity fail-closed behavior, canonical param resolution, executable scene plan creation
  • Create: src/compat/scene_platform/resolvers.rs
    • reusable resolver types for dictionary_entity, month_week_period, fixed_enum, and literal_passthrough
  • Create: src/compat/report_artifact.rs
    • generic report-artifact parsing, status mapping, summary building, and export-readiness helpers
  • Create: src/compat/report_xlsx_export.rs
    • generic XLSX exporter for any report-artifact with column_defs/columns + rows
  • Modify: src/lib.rs
    • export new shared/runtime/generator modules and any CLI helpers needed by tests
  • Modify: src/compat/mod.rs
    • export the new scene-platform and report-artifact modules
  • Modify: src/compat/deterministic_submit.rs
    • keep the public API shape, but make it registry/manifest-driven instead of line-loss-hardcoded
  • Modify: src/compat/direct_skill_runtime.rs
    • reuse the generic report-artifact interpreter so direct-submit and scene-submit summarize/status-map the same way
  • Modify: src/agent/task_runner.rs
    • keep branch order, but call the new registry-backed deterministic planner before ordinary orchestration/LLM
  • Modify: src/service/server.rs
    • keep bootstrap precedence shape, but let deterministic plans source target_url / expected_domain from scene manifests instead of hardcoded constants

Generator files

  • Create: src/generated_scene/mod.rs
    • generator entrypoints shared by tests and CLI
  • Create: src/generated_scene/analyzer.rs
    • source directory inspection for v1 report/collection browser_script scenes
  • Create: src/generated_scene/generator.rs
    • template rendering and package writing into an output staging root
  • Create: src/generated_scene/lessons.rs
    • loads and validates tq-lineloss-lessons-learned.toml as generation constraints
  • Create: src/bin/sg_scene_generate.rs
    • CLI entry for sgClaw's in-repo scene generator capability

In-repo sample package and reference assets

  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/scene.toml
    • first committed manifest-driven sample scene package used by runtime and generator tests in this repo
  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/org-dictionary.json
    • external dictionary data for the dictionary_entity resolver fixture
  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.toml
    • committed sample browser-script tool contract aligned with the manifest-driven runtime
  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.md
    • committed sample documentation for canonical args, artifact contract, and runtime expectations
  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.js
    • committed sample collection script with generic-platform artifact fields
  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js
    • committed JS contract tests for canonical args and artifact shape
  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/data-quality.md
    • committed sample data-quality notes aligned with manifest-driven output rules
  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/collection-flow.md
    • committed sample bootstrap/collection-flow notes
  • Create: tests/fixtures/scene_source/tq_lineloss/index.html
    • hermetic in-repo source fixture for required analyzer/generator smoke coverage
  • Create: tests/fixtures/scene_source/tq_lineloss/js/collect.js
    • hermetic in-repo source fixture JS for analyzer/generator smoke coverage

Repo-local runtime discovery path for validation

  • Use examples/generated_scene_platform as the repo-local skillsDir override root during tests and manual smokes.
  • The runtime still scans one resolved root only; it just resolves that root to examples/generated_scene_platform, whose skills/ child contains the committed sample package.
  • Add or reuse a tiny repo-local config fixture such as tmp/generated_scene_platform_sgclaw_config.json or an equivalent test helper so the validation steps all point at the same reproducible skillsDir.
  • Do not require external staging repos to make the manifest-driven runtime discoverable during this branch.

External publish target kept out of scope for this branch

  • Do not modify external paths like D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/... in this plan.
  • If the user later wants the generated sample published into that external staging repo, do it as a separate follow-up after this branch is green.

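Platform-reference files

  • Create: docs/superpowers/references/tq-lineloss-lessons-learned.md
  • Create: docs/superpowers/references/tq-lineloss-lessons-learned.toml
    • lessons-learned rules that src/generated_scene/lessons.rs loads and validates as generation constraints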

Tests and fixtures

  • Create: tests/scene_registry_test.rs
    • manifest loading, duplicate detection, schema validation, tool compatibility checks
  • Create: tests/report_artifact_postprocess_test.rs
    • generic report-artifact parsing and XLSX postprocess coverage
  • Create: tests/generated_scene_lessons_test.rs
    • lessons-TOML shape and required-rule coverage
  • Create: tests/scene_generator_test.rs
    • analyzer + generator integration coverage using hermetic fixtures
  • Create: tests/fixtures/generated_scene/report_collection/index.html
    • supported v1 report-scene fixture
  • Create: tests/fixtures/generated_scene/report_collection/js/report.js
    • supported fixture source hints for analyzer tests
  • Create: tests/fixtures/generated_scene/non_report/index.html
    • unsupported fixture proving fail-fast behavior
  • Modify: tests/deterministic_submit_test.rs
    • migrate from hardcoded line-loss expectations to registry-driven deterministic behavior
  • Modify: tests/agent_runtime_test.rs
    • keep direct-submit behavior intact while sharing generic report-artifact summaries
  • Modify: tests/service_task_flow_test.rs
    • task-runner/bootstrap regressions for manifest-driven deterministic scenes
  • Modify: tests/service_ws_session_test.rs
    • callback-host bootstrap target regression for manifest-driven deterministic submit when the browser-ws path is active

Legacy files to delete only after green verification proves they are unused

  • Delete: src/compat/tq_lineloss/org_units.rs
  • Delete: src/compat/tq_lineloss/org_resolver.rs
  • Delete: src/compat/tq_lineloss/period_resolver.rs
  • Delete or reduce to a compatibility shim only if still needed: src/compat/lineloss_xlsx_export.rs

Task 1: Create the implementation branch and lock the layout boundaries

Files:

  • Verify only

  • Step 1: Switch to the ws baseline branch and create a new platform branch

Run:

git switch feature/claw-ws
git switch -c feature/generated-scene-skill-platform

Expected: git status -sb shows a clean new branch rooted at the current ws baseline, not feature/claw-ws itself.

  • Step 2: Verify the current single-root skills layout before coding

Run:

cargo test --test compat_config_test ws_cleanup_resolves_single_configured_skills_dir -- --nocapture

Expected: PASS, proving the repo still uses one resolved skillsDir path and the platform work must build on that instead of introducing array-style roots.

  • Step 3: Write down the two non-negotiable layout decisions in the first registry test scaffold

The very first red test file (tests/scene_registry_test.rs) must assume:

// runtime manifest location:
let manifest_path = skill_root.join("scene.toml");

// legacy scene.json stays outside runtime dispatch ownership:
assert!(skill_root.join("scene.toml").exists());
assert!(!manifest_path.ends_with("skill_staging/scenes/.../scene.json"));

This prevents the implementation from drifting back toward scene.json routing or multi-root config.


Task 2: Add the shared scene.toml contract and registry loader

Files:

  • Create: src/scene_contract/mod.rs

  • Create: src/scene_contract/manifest.rs

  • Create: src/compat/scene_platform/mod.rs

  • Create: src/compat/scene_platform/registry.rs

  • Modify: src/lib.rs

  • Modify: src/compat/mod.rs

  • Create: tests/scene_registry_test.rs

  • Step 1: Write the failing registry tests first

Add tests/scene_registry_test.rs with focused red cases like:

#[test]
fn registry_loads_scene_manifest_from_skill_root() {
    let skill_root = temp_skill_with_scene_manifest(r#"
[scene]
id = "tq-lineloss-report"
skill = "tq-lineloss-report"
tool = "collect_lineloss"
kind = "browser_script"
version = "0.1.0"
category = "report_collection"

[manifest]
schema_version = "1"

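[bootstrap]
expected_domain = "20.76.57.61"
target_url = "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor"
requires_target_page = true

[deterministic]
suffix = "。。。"
include_keywords = ["线损", "月累计", "周累计", "统计分析"]
exclude_keywords = ["知乎"]

[[params]]
name = "period"
resolver = "month_week_period"
required = true
prompt_missing = "已命中台区线损报表技能,但缺少统计周期。"
prompt_ambiguous = "已命中台区线损报表技能,但统计周期存在歧义,请补充更明确表达。"

[artifact]
type = "report-artifact"
success_status = ["ok", "partial", "empty"]
failure_status = ["blocked", "error"]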
"#);

    let registry = load_scene_registry(skill_root.parent().unwrap()).unwrap();
    assert_eq!(registry.len(), 1);
    assert_eq!(registry[0].manifest.scene.id, "tq-lineloss-report");
}

#[test]
fn registry_rejects_duplicate_scene_ids_with_both_paths_in_error() { /* two skills, same scene.id */ }

#[test]
fn registry_rejects_unknown_manifest_schema_version() { /* schema_version = "999" */ }

#[test]
fn registry_rejects_non_browser_script_scene_tool_in_v1() { /* kind = "shell" should fail */ }

#[test]
fn registry_ignores_skills_without_scene_toml() { /* ordinary skills still load elsewhere */ }

  • Step 2: Run the registry test file and verify it fails

Run:

cargo test --test scene_registry_test -- --nocapture

Expected: FAIL because scene.toml types and registry loading do not exist yet.

  • Step 3: Implement the serializable manifest contract and the single-root registry loader

Implement the minimal contract and loader needed to satisfy the tests:

#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct SceneManifest {
    pub scene: SceneSection,
    pub manifest: ManifestSection,
    pub bootstrap: BootstrapSection,
    pub deterministic: DeterministicSection,
    pub params: Vec<SceneParam>,
    pub artifact: ArtifactSection,
    pub postprocess: Option<PostprocessSection>,
}

#[derive(Debug, Clone)]
pub struct SceneRegistryEntry {
    pub manifest: SceneManifest,
    pub skill_root: PathBuf,
}

pub fn load_scene_registry(skills_dir: &Path) -> Result<Vec<SceneRegistryEntry>, SceneRegistryError> {
    // iterate immediate skill dirs under the already-resolved single skillsDir
    // look for <skill-root>/scene.toml only
    // parse and validate schema version
    // verify scene.id uniqueness across the loaded root
    // verify manifest.scene.skill matches the containing skill package
    // verify referenced tool exists in SKILL.toml and is browser_script in v1
}
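The first two comment steps in the loader (iterate immediate skill dirs, look for <skill-root>/scene.toml only) can be sketched with the standard library alone. The helper name `discover_scene_manifests` is hypothetical, and the sketch assumes `skills_dir` is already the directory whose immediate children are skill packages; parsing and validation are left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists `<skill-root>/scene.toml` paths for the immediate children of `skills_dir`.
/// Skill directories without a scene.toml are skipped rather than treated as errors.
fn discover_scene_manifests(skills_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut manifests = Vec::new();
    for entry in fs::read_dir(skills_dir)? {
        let skill_root = entry?.path();
        if !skill_root.is_dir() {
            continue;
        }
        let manifest_path = skill_root.join("scene.toml");
        if manifest_path.is_file() {
            manifests.push(manifest_path);
        }
    }
    // read_dir order is platform-dependent; sort so errors and tests are stable.
    manifests.sort();
    Ok(manifests)
}
```

Because the scan never descends below one level and never looks for scene.json, it cannot drift into a skill_staging/scenes scan by accident.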

Rules to lock now:

  • schema_version = "1" is the only accepted version in v1

  • duplicate scene.id is a hard error and must report both manifest paths

  • manifest loading must not add a second config key or a hardcoded skill_staging/scenes scan

  • scene.toml is runtime-owned; scene.json stays legacy-only
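The schema-version and duplicate-id rules above could be enforced roughly as follows. This is a sketch under assumptions: `DuplicateSceneId`, `ensure_unique_scene_ids`, and `is_supported_schema_version` are hypothetical names, and the real `SceneRegistryError` may carry the same information in a different shape.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical error carrying both manifest paths, as the duplicate rule requires.
#[derive(Debug, PartialEq)]
struct DuplicateSceneId {
    scene_id: String,
    first_manifest: PathBuf,
    second_manifest: PathBuf,
}

/// Checks `(scene.id, manifest path)` pairs in load order and stops at the first clash.
fn ensure_unique_scene_ids(loaded: &[(String, PathBuf)]) -> Result<(), DuplicateSceneId> {
    let mut seen: HashMap<&str, &Path> = HashMap::new();
    for (scene_id, manifest_path) in loaded {
        // `insert` returns the previously stored path when the id was already present.
        if let Some(first) = seen.insert(scene_id.as_str(), manifest_path.as_path()) {
            return Err(DuplicateSceneId {
                scene_id: scene_id.clone(),
                first_manifest: first.to_path_buf(),
                second_manifest: manifest_path.clone(),
            });
        }
    }
    Ok(())
}

/// v1 accepts exactly one schema version string.
fn is_supported_schema_version(schema_version: &str) -> bool {
    schema_version == "1"
}
```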

  • Step 4: Re-run the registry tests and verify they pass

Run:

cargo test --test scene_registry_test -- --nocapture

Expected: PASS.

  • Step 5: Commit the contract and registry slice

Run:

git add src/lib.rs src/scene_contract/mod.rs src/scene_contract/manifest.rs src/compat/mod.rs src/compat/scene_platform/mod.rs src/compat/scene_platform/registry.rs tests/scene_registry_test.rs
git commit -m "feat: add scene manifest registry"

Expected: one commit that introduces the stable runtime/generator contract and registry loader.


Task 3: Generalize deterministic dispatch and reusable parameter resolvers

Files:

  • Create: src/compat/scene_platform/dispatch.rs

  • Create: src/compat/scene_platform/resolvers.rs

  • Modify: src/compat/deterministic_submit.rs

  • Modify: tests/deterministic_submit_test.rs

  • Step 1: Replace the line-loss-only deterministic tests with registry-backed red tests

Extend tests/deterministic_submit_test.rs with registry-backed red cases built from temp fixture manifests under a temporary skills root. Do not depend on the committed sample package from Task 6 yet; Task 3 must stay hermetic and independently runnable. Add failing cases such as:

#[test]
fn deterministic_submit_uses_registry_backed_scene_plan() {
    let decision = decide_deterministic_submit(
        "兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。",
        None,
        None,
    );

    match decision {
        DeterministicSubmitDecision::Execute(plan) => {
            assert_eq!(plan.scene_id, "tq-lineloss-report");
            assert_eq!(plan.tool_name, "tq-lineloss-report.collect_lineloss");
            assert_eq!(plan.expected_domain, "20.76.57.61");
            assert_eq!(plan.target_url, "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor");
        }
        other => panic!("expected execute plan, got {other:?}"),
    }
}

#[test]
fn deterministic_submit_fails_closed_on_scene_ambiguity() { /* two plausible scene.toml entries -> Prompt */ }

#[test]
fn deterministic_submit_prompts_for_missing_period_instead_of_defaulting() {
    let decision = decide_deterministic_submit("兰州公司 台区线损大数据 月累计线损率统计分析。。。", None, None);
    assert!(matches!(decision, DeterministicSubmitDecision::Prompt { .. }));
}

#[test]
fn deterministic_submit_uses_page_context_to_break_ties_before_keyword_only_match() { /* page_url/title beats keyword overlap */ }

#[test]
fn zhihu_without_suffix_remains_not_deterministic() {
    assert!(matches!(
        decide_deterministic_submit("打开知乎热榜", Some("https://www.zhihu.com/hot"), Some("知乎热榜")),
        DeterministicSubmitDecision::NotDeterministic
    ));
}

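Also invert the current default-period expectations. The instructions 兰州公司 月累计。。。 ("Lanzhou company, month cumulative") and 兰州公司 周累计。。。 ("Lanzhou company, week cumulative") name a mode but no period, so they must now prompt instead of executing.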

  • Step 2: Run the targeted deterministic tests and verify they fail

Run:

cargo test --test deterministic_submit_test -- --nocapture

Expected: FAIL because the current implementation is still hardcoded to line-loss constants and still defaults missing month/week periods.

  • Step 3: Implement reusable resolver types and a registry-backed dispatcher

Implement the generic deterministic planner in the new scene-platform modules, then make src/compat/deterministic_submit.rs a thin adapter over it.

Required implementation shape:

pub enum ResolverKind {
    DictionaryEntity,
    MonthWeekPeriod,
    FixedEnum,
    LiteralPassthrough,
}

pub struct SceneExecutionPlan {
    pub scene_id: String,
    pub instruction: String,
    pub tool_name: String,
    pub expected_domain: String,
    pub target_url: String,
    pub args: Map<String, Value>,
    pub success_statuses: Vec<String>,
    pub failure_statuses: Vec<String>,
    pub postprocess: Option<PostprocessSection>,
}

pub fn plan_deterministic_scene(
    raw_instruction: &str,
    page_url: Option<&str>,
    page_title: Option<&str>,
    skills_dir: &Path,
) -> Result<DeterministicSubmitDecision, SceneDispatchError> {
    // exact suffix gate
    // load registry from the single skillsDir
    // score candidate scenes using include/exclude keywords + page context + required-param resolution
    // if multiple remain plausible -> fail closed with explicit ambiguity prompt
    // resolve params using generic resolver kinds
    // build executable SceneExecutionPlan with manifest bootstrap + tool + canonical args
}
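One possible reading of the scoring and fail-closed steps is sketched here. The `SceneCandidate` struct, the `(page hits, keyword hits)` score tuple, and the tie rule are all assumptions made for illustration; the plan only fixes that page context outranks keyword-only overlap and that multiple plausible scenes must not execute. Required-param resolution, which the plan also feeds into scoring, is omitted.

```rust
/// Hypothetical slice of a manifest that matters for scoring.
struct SceneCandidate<'a> {
    scene_id: &'a str,
    include_keywords: &'a [&'a str],
    exclude_keywords: &'a [&'a str],
    expected_domain: &'a str,
    page_title_keywords: &'a [&'a str],
}

#[derive(Debug, PartialEq)]
enum Selection<'a> {
    Selected(&'a str),
    Ambiguous(Vec<&'a str>),
    NoMatch,
}

/// Returns (page-context hits, keyword hits). Tuples compare left to right,
/// so any page-context hit outranks keyword overlap alone.
fn score(
    candidate: &SceneCandidate,
    instruction: &str,
    page_url: Option<&str>,
    page_title: Option<&str>,
) -> Option<(usize, usize)> {
    if candidate.exclude_keywords.iter().any(|k| instruction.contains(*k)) {
        return None;
    }
    let keyword_hits = candidate
        .include_keywords
        .iter()
        .filter(|k| instruction.contains(**k))
        .count();
    if keyword_hits == 0 {
        return None;
    }
    let url_hit = page_url.map_or(false, |url| url.contains(candidate.expected_domain));
    let title_hit = page_title.map_or(false, |title| {
        candidate.page_title_keywords.iter().any(|k| title.contains(*k))
    });
    Some((url_hit as usize + title_hit as usize, keyword_hits))
}

/// Picks the single best-scoring scene; a tie at the top fails closed as Ambiguous.
fn select_scene<'a>(
    candidates: &[SceneCandidate<'a>],
    instruction: &str,
    page_url: Option<&str>,
    page_title: Option<&str>,
) -> Selection<'a> {
    let scored: Vec<(&'a str, (usize, usize))> = candidates
        .iter()
        .filter_map(|c| score(c, instruction, page_url, page_title).map(|s| (c.scene_id, s)))
        .collect();
    let best = match scored.iter().map(|(_, s)| *s).max() {
        Some(best) => best,
        None => return Selection::NoMatch,
    };
    let mut top: Vec<&'a str> = scored
        .iter()
        .filter(|(_, s)| *s == best)
        .map(|(id, _)| *id)
        .collect();
    if top.len() == 1 {
        Selection::Selected(top.remove(0))
    } else {
        Selection::Ambiguous(top)
    }
}
```

In this sketch `Ambiguous` would map to a `Prompt` decision listing the tied scene ids, and `NoMatch` would fall through to the non-deterministic paths.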

Resolver rules to lock now:

  • dictionary_entity reads external dictionary data such as references/org-dictionary.json; no hardcoded org list in Rust after migration

  • month_week_period returns explicit prompts for missing mode, missing period, contradictory month/week intent, or week-without-year

  • fixed_enum and literal_passthrough exist now so the manifest contract is extensible, even if line-loss is the only v1 user

  • if a new scene needs a new resolver type, add a reusable resolver, not a scene-specific if scene_id == ... branch
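The month_week_period rule above can be illustrated with a small resolver that prompts instead of defaulting. This is a sketch under assumptions: the mode keywords 月累计 / 周累计 and the `YYYY-MM` example come from this plan, while the type names, the English prompt texts, and the "period token must start with a four-digit year" heuristic are hypothetical. Real prompts would come from `prompt_missing` / `prompt_ambiguous` in scene.toml.

```rust
#[derive(Debug, PartialEq)]
enum PeriodMode {
    Month,
    Week,
}

#[derive(Debug, PartialEq)]
enum PeriodResolution {
    Resolved { mode: PeriodMode, value: String },
    Prompt(&'static str),
}

/// True for tokens shaped like `YYYY-...`, i.e. a period that states its year.
fn has_year_prefix(token: &str) -> bool {
    let bytes = token.as_bytes();
    bytes.len() > 5 && bytes[..4].iter().all(u8::is_ascii_digit) && bytes[4] == b'-'
}

/// Resolves mode and period from the instruction; never falls back to a default.
fn resolve_month_week_period(instruction: &str) -> PeriodResolution {
    let mode = match (instruction.contains("月累计"), instruction.contains("周累计")) {
        (true, true) => return PeriodResolution::Prompt("contradictory month/week intent"),
        (false, false) => return PeriodResolution::Prompt("missing month/week mode"),
        (true, false) => PeriodMode::Month,
        (false, true) => PeriodMode::Week,
    };
    // Strip a trailing 。。。 suffix from each token before looking for the period.
    let period = instruction
        .split_whitespace()
        .map(|token| token.trim_end_matches('。'))
        .find(|token| has_year_prefix(token));
    match period {
        Some(value) => PeriodResolution::Resolved { mode, value: value.to_string() },
        None => PeriodResolution::Prompt("missing period, or period without a year"),
    }
}
```

Requiring a year-prefixed token is one simple way to make a week without a year land in the prompt branch rather than being guessed from the current date.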

  • Step 4: Re-run the deterministic tests and verify they pass

Run:

cargo test --test deterministic_submit_test -- --nocapture

Expected: PASS, including the new no-default-period behavior and ambiguity fail-closed coverage.

  • Step 5: Commit the registry-driven deterministic slice

Run:

git add src/compat/deterministic_submit.rs src/compat/scene_platform/dispatch.rs src/compat/scene_platform/resolvers.rs tests/deterministic_submit_test.rs
git commit -m "feat: add registry-driven deterministic scene dispatch"

Expected: one commit that removes one-off line-loss decision ownership from the deterministic planner.


Task 4: Add a generic report-artifact interpreter and XLSX postprocess path

Files:

  • Create: src/compat/report_artifact.rs

  • Create: src/compat/report_xlsx_export.rs

  • Modify: src/compat/direct_skill_runtime.rs

  • Modify: src/compat/deterministic_submit.rs

  • Create: tests/report_artifact_postprocess_test.rs

  • Modify: tests/agent_runtime_test.rs

  • Step 1: Write the red tests for generic report-artifact handling

Add tests/report_artifact_postprocess_test.rs and the minimum tests/agent_runtime_test.rs extensions needed to prove the platform no longer depends on line-loss-specific Rust export logic:

#[test]
fn report_artifact_postprocess_exports_xlsx_for_ok_or_partial_scene() {
    let artifact = serde_json::json!({
        "type": "report-artifact",
        "report_name": "tq-lineloss-report",
        "status": "partial",
        "columns": ["ORG_NAME", "LINE_LOSS_RATE"],
        "column_defs": [["ORG_NAME", "供电单位"], ["LINE_LOSS_RATE", "综合线损率(%)"]],
        "rows": [{"ORG_NAME": "国网兰州供电公司", "LINE_LOSS_RATE": "1.23"}],
        "counts": {"rows": 1},
        "partial_reasons": ["report_log_failed"]
    });

    let outcome = interpret_report_artifact_and_postprocess(&artifact, report_postprocess_xlsx(), temp_workspace()).unwrap();
    assert!(outcome.success);
    assert!(outcome.summary.contains("status=partial"));
    assert!(outcome.summary.contains("detail_rows=1"));
    assert!(outcome.summary.contains("export_path="));
}

#[test]
fn report_artifact_postprocess_skips_export_for_blocked_or_error_scene() { /* no xlsx path */ }

#[test]
fn direct_submit_and_scene_submit_share_the_same_report_summary_contract() { /* direct_skill_runtime + deterministic path both use same summary builder */ }

  • Step 2: Run the focused report-artifact tests and verify they fail

Run:

cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_partial_report_artifact_as_success_with_warning_summary -- --nocapture

Expected: FAIL because the generic interpreter/exporter does not exist yet and deterministic line-loss export is still special-cased.

  • Step 3: Implement the shared parser, summary builder, and generic XLSX exporter

Implement a reusable path that both deterministic scenes and configured direct-submit skills can call:

pub struct ParsedReportArtifact {
    pub report_name: String,
    pub status: String,
    pub columns: Vec<String>,
    pub column_defs: Vec<(String, String)>,
    pub rows: Vec<Map<String, Value>>,
    pub counts: ReportCounts,
    pub partial_reasons: Vec<String>,
}

pub fn interpret_report_artifact_and_postprocess(
    artifact_json: &Value,
    postprocess: Option<&PostprocessSection>,
    workspace_root: &Path,
) -> Result<DirectSubmitOutcome, PipeError> {
    // parse report-artifact generically
    // map ok/partial/empty => success=true
    // map blocked/error => success=false
    // if postprocess.exporter == Some("xlsx_report") and status is exportable, write xlsx under workspace_root/out
    // if postprocess.auto_open == Some("excel"), reuse existing open-export helper
}

Rules:

  • export logic must read column_defs when present, else fall back to columns

  • do not keep line-loss-only column-name assumptions in Rust

  • keep direct-submit behavior unchanged for non-artifact string outputs

  • keep blocked / error as failures even when the artifact also contains rows
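The rules above reduce to a few pure helpers, sketched here without any XLSX writing. The function names are hypothetical, the summary separator (a single space) is an assumption, and treating only ok / partial as exportable follows the test names in Step 1; whether empty should export is not stated in this plan.

```rust
/// Maps a report-artifact status to success. Unknown statuses return None
/// so the caller can reject them instead of guessing.
fn status_is_success(status: &str) -> Option<bool> {
    match status {
        "ok" | "partial" | "empty" => Some(true),
        "blocked" | "error" => Some(false),
        _ => None,
    }
}

/// Assumption: only ok and partial artifacts are exported.
/// Rows on a blocked or error artifact never make it exportable.
fn is_exportable(status: &str) -> bool {
    matches!(status, "ok" | "partial")
}

/// Header labels for the export: prefer column_defs labels when present,
/// otherwise fall back to the raw column keys.
fn header_labels(column_defs: &[(String, String)], columns: &[String]) -> Vec<String> {
    if column_defs.is_empty() {
        columns.to_vec()
    } else {
        column_defs.iter().map(|(_key, label)| label.clone()).collect()
    }
}

/// Builds a summary containing the fragments the Step 1 test asserts on.
fn build_summary(status: &str, detail_rows: usize, export_path: Option<&str>) -> String {
    let mut summary = format!("status={status} detail_rows={detail_rows}");
    if let Some(path) = export_path {
        summary.push_str(&format!(" export_path={path}"));
    }
    summary
}
```

Keeping these helpers free of scene ids and column names is what lets direct-submit and scene-submit share one summary contract.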

  • Step 4: Re-run the focused tests and verify they pass

Run:

cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_partial_report_artifact_as_success_with_warning_summary -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_blocked_report_artifact_as_failure -- --nocapture

Expected: PASS.

  • Step 5: Commit the generic artifact/postprocess slice

Run:

git add src/compat/report_artifact.rs src/compat/report_xlsx_export.rs src/compat/direct_skill_runtime.rs src/compat/deterministic_submit.rs tests/report_artifact_postprocess_test.rs tests/agent_runtime_test.rs
git commit -m "refactor: share generic report artifact postprocess"

Expected: one commit that removes the need for per-scene Rust export logic.


Task 5: Wire manifest-driven scenes into submit and bootstrap without regressing other flows

Files:

  • Modify: src/agent/task_runner.rs

  • Modify: src/service/server.rs

  • Modify: tests/service_task_flow_test.rs

  • Modify: tests/service_ws_session_test.rs

  • Modify: tests/agent_runtime_test.rs

  • Step 1: Add the failing submit/bootstrap regression tests

Add focused tests that lock branch order and bootstrap behavior:

#[test]
fn submit_task_routes_suffix_instruction_through_manifest_scene_before_llm() {
    // no provider call should happen when deterministic scene planning succeeds or prompts
}

#[test]
fn resolve_submit_bootstrap_target_prefers_manifest_scene_target_for_deterministic_scene() {
    let request = SubmitTaskRequest {
        instruction: "兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。".to_string(),
        conversation_id: None,
        messages: vec![],
        page_url: None,
        page_title: None,
    };
    let target = resolve_submit_bootstrap_target(&request, workspace_root, &settings);
    assert_eq!(target.request_url, "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor");
    assert_eq!(target.expected_domain.as_deref(), Some("20.76.57.61"));
}

#[test]
fn zhihu_without_suffix_keeps_existing_non_scene_path() { /* ordinary path unchanged */ }

For the browser-ws/callback-host path, add one regression in tests/service_ws_session_test.rs proving the first bootstrap/open target comes from scene.toml when a deterministic scene plan exists.

  • Step 2: Run the focused integration tests and verify they fail

Run:

cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture

Expected: FAIL because the submit/bootstrap path still depends on the old deterministic line-loss branch shape.

  • Step 3: Implement the minimal wiring changes only where the branch already exists

Implementation targets:

  • keep the current submit branch order in src/agent/task_runner.rs
  • keep resolve_submit_bootstrap_target(...) precedence in src/service/server.rs
  • replace the old hardcoded deterministic plan source with the new manifest-backed planner
  • keep configured directSubmitSkill and ordinary LLM/browser orchestration behavior untouched

The resulting branch order must still be:

// 1. registry-backed deterministic scene (exact suffix only)
// 2. ordinary primary orchestration path
// 3. configured directSubmitSkill
// 4. compat LLM/runtime path

  • Step 4: Re-run the focused integration tests and verify they pass

Run:

cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
cargo test --test agent_runtime_test -- --nocapture

Expected: PASS, with no regression to the ordinary direct-submit or Zhihu paths.

  • Step 5: Commit the submit/bootstrap integration slice

Run:

git add src/agent/task_runner.rs src/service/server.rs tests/service_task_flow_test.rs tests/service_ws_session_test.rs tests/agent_runtime_test.rs
git commit -m "refactor: wire manifest scenes into submit bootstrap"

Expected: one commit that changes wiring only at the existing seams.


Task 6: Add the first manifest-driven tq-lineloss-report sample package inside this repo

Files:

  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/scene.toml

  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/org-dictionary.json

  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.toml

  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.md

  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.js

  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js

  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/data-quality.md

  • Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/collection-flow.md

  • Modify: tests/deterministic_submit_test.rs

  • Modify: tests/scene_registry_test.rs

  • Step 1: Add the failing line-loss manifest and runtime-contract checks

Create the scene.toml shape in the in-repo sample package first and lock the migration expectations:

[scene]
id = "tq-lineloss-report"
skill = "tq-lineloss-report"
tool = "collect_lineloss"
kind = "browser_script"
version = "0.1.0"
category = "report_collection"

[manifest]
schema_version = "1"

[bootstrap]
expected_domain = "20.76.57.61"
target_url = "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor"
page_title_keywords = ["线损"]
requires_target_page = true

[deterministic]
suffix = "。。。"
include_keywords = ["线损", "月累计", "周累计", "统计分析"]
exclude_keywords = ["知乎"]

[[params]]
name = "org"
resolver = "dictionary_entity"
required = true
prompt_missing = "已命中台区线损报表技能,但缺少供电单位。"
prompt_ambiguous = "已命中台区线损报表技能,但供电单位存在歧义,请补充更完整名称。"

[params.resolver_config]
dictionary_ref = "references/org-dictionary.json"
output_label_field = "org_label"
output_code_field = "org_code"

[[params]]
name = "period"
resolver = "month_week_period"
required = true
prompt_missing = "已命中台区线损报表技能,但缺少统计周期。"
prompt_ambiguous = "已命中台区线损报表技能,但统计周期存在歧义,请补充更明确表达。"

[artifact]
type = "report-artifact"
success_status = ["ok", "partial", "empty"]
failure_status = ["blocked", "error"]

[postprocess]
exporter = "xlsx_report"
auto_open = "excel"

Also add a red JS assertion in the committed sample package proving the script returns column_defs and never re-parses raw natural-language org/period text:

test('buildBrowserEntrypointResult keeps canonical args and generic export fields only', async () => {
  const artifact = await buildBrowserEntrypointResult({
    expected_domain: '20.76.57.61',
    org_label: '国网兰州供电公司',
    org_code: '62401',
    period_mode: 'month',
    period_mode_code: '1',
    period_value: '2026-03',
    period_payload: { fdate: '2026-03' },
    instruction: '兰州公司 月累计 2026-03'
  }, fakeDeps);

  assert.equal(artifact.org.code, '62401');
  assert.ok(Array.isArray(artifact.column_defs));
  assert.equal(JSON.stringify(artifact).includes('兰州公司 月累计 2026-03'), false);
});

  • Step 2: Run the targeted line-loss tests and verify they fail

Run:

cargo test --test deterministic_submit_test -- --nocapture
node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"

Expected: FAIL because the runtime is not yet manifest-driven and the committed sample package does not yet expose the final manifest/dictionary/export contract.

  • Step 3: Implement the sample-scene migration without adding per-scene Rust branches

Required actions:

  • add scene.toml under the in-repo sample skill root and use the same layout the generator will emit

  • make tests and service-smoke config resolve skillsDir to examples/generated_scene_platform so the registry can discover the committed sample package without any external repo copy step

  • export the current org unit data into references/org-dictionary.json and make the resolver read that file instead of a Rust hardcoded list

  • update collect_lineloss.js so the returned report-artifact includes generic-platform fields needed by report_xlsx_export.rs

  • keep collection logic in JS; do not move line-loss business semantics back into Rust

  • write SKILL.toml / SKILL.md / references docs into the sample package to describe canonical args and the manifest-driven contract

  • keep any external staging-repo publish step out of scope for this branch; this task only commits the in-repo sample package
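
As an illustration of the dictionary-backed org resolution described in the actions above, here is a minimal sketch. The OrgEntry field names, the substring-matching rule, and the resolve_org function are assumptions for illustration only; the real references/org-dictionary.json schema and resolver API are not defined in this plan. JSON loading is left out so the sketch stays std-only.

```rust
/// One dictionary entry. Field names are hypothetical, not the real
/// `references/org-dictionary.json` schema.
#[derive(Debug, Clone, PartialEq)]
pub struct OrgEntry {
    pub label: String,
    pub code: String,
    pub aliases: Vec<String>,
}

/// Outcome of resolving an org from free text.
#[derive(Debug, PartialEq)]
pub enum OrgResolution {
    Resolved { label: String, code: String },
    Missing,
    /// Labels of every entry that matched, so the caller can prompt.
    Ambiguous(Vec<String>),
}

/// Resolve an org by substring match against labels and aliases
/// (assumed matching rule). Zero matches and multiple matches are
/// both surfaced instead of guessing.
pub fn resolve_org(instruction: &str, dictionary: &[OrgEntry]) -> OrgResolution {
    let matches: Vec<&OrgEntry> = dictionary
        .iter()
        .filter(|entry| {
            instruction.contains(entry.label.as_str())
                || entry
                    .aliases
                    .iter()
                    .any(|alias| instruction.contains(alias.as_str()))
        })
        .collect();

    match matches.as_slice() {
        [] => OrgResolution::Missing,
        [only] => OrgResolution::Resolved {
            label: only.label.clone(),
            code: only.code.clone(),
        },
        many => OrgResolution::Ambiguous(many.iter().map(|e| e.label.clone()).collect()),
    }
}
```

Missing and Ambiguous would map to the manifest's prompt_missing and prompt_ambiguous strings, which keeps the resolver free of scene-specific text.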

  • Step 4: Re-run the line-loss tests and verify they pass

Run:

cargo test --test deterministic_submit_test -- --nocapture
node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"

Expected: PASS, including the new missing-period prompt behavior and the new manifest-driven sample-package shape.

  • Step 5: Commit the line-loss sample migration

Run:

git add examples/generated_scene_platform/skills/tq-lineloss-report tests/deterministic_submit_test.rs tests/scene_registry_test.rs
git commit -m "feat: add manifest-driven lineloss sample package"

Expected: one commit that adds the first committed manifest-driven sample package and updates runtime expectations around it.


Task 7: Write the required tq-lineloss lessons-learned artifacts and load them as generator rules

Files:

  • Create: docs/superpowers/references/tq-lineloss-lessons-learned.md

  • Create: docs/superpowers/references/tq-lineloss-lessons-learned.toml

  • Create: tests/generated_scene_lessons_test.rs

  • Create: src/generated_scene/mod.rs

  • Create: src/generated_scene/lessons.rs

  • Modify: src/lib.rs

  • Step 1: Write the failing lessons-rules test before the docs

Add tests/generated_scene_lessons_test.rs that requires all mandatory structured rule sections to exist. In the same red step, wire the empty src/generated_scene/mod.rs and src/lib.rs exports needed so this test fails on missing implementation/data, not on missing module visibility:

#[test]
fn lineloss_lessons_toml_declares_required_generator_rules() {
    let lessons = load_generation_lessons("docs/superpowers/references/tq-lineloss-lessons-learned.toml").unwrap();

    assert!(lessons.routing.require_exact_suffix);
    assert!(lessons.routing.unsupported_scene_fail_closed);
    assert!(lessons.canonical_params.require_explicit_period);
    assert!(lessons.bootstrap.require_expected_domain);
    assert!(lessons.bootstrap.require_target_url);
    assert!(lessons.artifact.require_report_artifact);
    assert!(lessons.validation.require_pipe_and_ws_checks);
    assert!(lessons.validation.require_manual_service_console_smoke);
}

  • Step 2: Run the lessons test and verify it fails

Run:

cargo test --test generated_scene_lessons_test -- --nocapture

Expected: FAIL because the lessons loader and TOML file do not exist yet.

  • Step 3: Implement the loader and write both lessons artifacts

Implement the loader and complete the minimal module wiring (src/generated_scene/mod.rs, src/lib.rs) in this task so cargo test --test generated_scene_lessons_test is buildable before Task 8. Use a TOML shape explicit enough for generator enforcement, for example:

[routing]
require_exact_suffix = true
unsupported_scene_fail_closed = true
ambiguity_fail_closed = true

[canonical_params]
require_dictionary_entity_for_org = true
require_explicit_period = true
forbid_hidden_page_defaults = true

[bootstrap]
require_expected_domain = true
require_target_url = true
prefer_page_context_when_present = true

[artifact]
require_report_artifact = true
require_column_defs_for_export = true
rust_side_xlsx_export_when_postprocess_xlsx = true

[validation]
require_pipe_and_ws_checks = true
require_manual_service_console_smoke = true
require_callback_host_timeout_notes = true

The Markdown companion must explain the why behind those rules: deterministic routing pitfalls, canonical parameter pitfalls, bootstrap target pitfalls, pipe/ws differences, callback-host timeout lessons, and Rust-side export constraints.
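
To show how structured rules of this shape could be enforced rather than merely documented, here is a minimal sketch of a rule check over a candidate scene. LessonRules mirrors four keys from the TOML shape above; SceneDraft and check_draft are hypothetical names for illustration, not the planned generator API, and TOML loading is omitted so the sketch stays std-only.

```rust
/// Subset of the lessons rules, mirroring keys from the TOML shape.
#[derive(Debug, Clone, Copy)]
pub struct LessonRules {
    pub require_explicit_period: bool,
    pub require_expected_domain: bool,
    pub require_target_url: bool,
    pub require_column_defs_for_export: bool,
}

/// Hypothetical summary of a scene the generator is about to emit.
#[derive(Debug, Default)]
pub struct SceneDraft {
    pub has_period_param: bool,
    pub expected_domain: Option<String>,
    pub target_url: Option<String>,
    pub exports_xlsx: bool,
    pub emits_column_defs: bool,
}

/// Return the rule keys the draft violates, in a fixed order.
/// An empty result means the draft satisfies every enabled rule.
pub fn check_draft(rules: &LessonRules, draft: &SceneDraft) -> Vec<&'static str> {
    let mut violations = Vec::new();
    if rules.require_explicit_period && !draft.has_period_param {
        violations.push("canonical_params.require_explicit_period");
    }
    if rules.require_expected_domain && draft.expected_domain.is_none() {
        violations.push("bootstrap.require_expected_domain");
    }
    if rules.require_target_url && draft.target_url.is_none() {
        violations.push("bootstrap.require_target_url");
    }
    // Column definitions only matter when the scene exports XLSX.
    if rules.require_column_defs_for_export && draft.exports_xlsx && !draft.emits_column_defs {
        violations.push("artifact.require_column_defs_for_export");
    }
    violations
}
```

Returning rule keys instead of free-form messages lets a failure point straight back at the TOML entry and its explanation in the Markdown companion.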

  • Step 4: Re-run the lessons tests and verify they pass

Run:

cargo test --test generated_scene_lessons_test -- --nocapture

Expected: PASS.

  • Step 5: Commit the lessons artifacts and loader

Run:

git add docs/superpowers/references/tq-lineloss-lessons-learned.md docs/superpowers/references/tq-lineloss-lessons-learned.toml src/generated_scene/mod.rs src/generated_scene/lessons.rs src/lib.rs tests/generated_scene_lessons_test.rs
git commit -m "docs: add lineloss generation lessons"

Expected: one commit that makes the line-loss lessons machine-consumable and reviewable.


Task 8: Build the v1 source analyzer, package generator, and CLI entry

Files:

  • Create: src/generated_scene/analyzer.rs

  • Create: src/generated_scene/generator.rs

  • Create: src/bin/sg_scene_generate.rs

  • Modify: src/generated_scene/mod.rs

  • Modify: src/lib.rs

  • Create: tests/scene_generator_test.rs

  • Create: tests/fixtures/generated_scene/report_collection/index.html

  • Create: tests/fixtures/generated_scene/report_collection/js/report.js

  • Create: tests/fixtures/generated_scene/non_report/index.html

  • Create: tests/fixtures/scene_source/tq_lineloss/index.html

  • Create: tests/fixtures/scene_source/tq_lineloss/js/collect.js

  • Step 1: Add the failing analyzer/generator tests with hermetic fixtures

Create fixture-backed tests like:

#[test]
fn analyzer_classifies_supported_report_collection_source() {
    let analysis = analyze_scene_source(Path::new("tests/fixtures/generated_scene/report_collection")).unwrap();
    assert_eq!(analysis.scene_kind, SceneKind::ReportCollection);
    assert_eq!(analysis.tool_kind, ToolKind::BrowserScript);
    assert!(analysis.bootstrap.target_url.is_some());
    assert!(analysis.collection_entry_script.is_some());
}

#[test]
fn generator_writes_registration_ready_package_with_scene_toml() {
    let output_root = tempdir();
    generate_scene_package(GenerateSceneRequest {
        source_dir: PathBuf::from("tests/fixtures/generated_scene/report_collection"),
        scene_id: "sample-report-scene".to_string(),
        scene_name: "示例报表场景".to_string(),
        output_root: output_root.path().to_path_buf(),
        lessons_path: PathBuf::from("docs/superpowers/references/tq-lineloss-lessons-learned.toml"),
    }).unwrap();

    assert!(output_root.path().join("skills/sample-report-scene/SKILL.toml").exists());
    assert!(output_root.path().join("skills/sample-report-scene/scene.toml").exists());
    assert!(output_root.path().join("skills/sample-report-scene/scripts/collect_sample_report_scene.js").exists());
    assert!(output_root.path().join("skills/sample-report-scene/scripts/collect_sample_report_scene.test.js").exists());
}

#[test]
fn generator_rejects_non_report_source_with_explicit_reason() {
    let err = analyze_scene_source(Path::new("tests/fixtures/generated_scene/non_report")).unwrap_err();
    assert!(err.to_string().contains("report/collection browser_script only"));
}

  • Step 2: Run the generator tests and verify they fail

Run:

cargo test --test scene_generator_test -- --nocapture

Expected: FAIL because the analyzer, generator, fixtures, and CLI do not exist yet.

  • Step 3: Implement the analyzer, generator, CLI, and the source fixtures used by final smoke

Implementation rules:

  • create the generator test fixtures under tests/fixtures/generated_scene/*
  • create the hermetic source-smoke fixtures under tests/fixtures/scene_source/tq_lineloss/* so Task 9 can run without any external scenario directory
  • analyzer must refuse unsupported/non-report scenes explicitly instead of generating broken packages
  • generator must emit scene.toml inside the generated skill root
  • generator must use tq-lineloss-lessons-learned.toml as a required input so the same hardening rules apply to future scenes
  • generator/runtime coupling must stay at the file-contract level only
  • CLI should use an explicit hand-written argument parser; do not add a new heavy dependency

Suggested CLI shape:

cargo run --bin sg_scene_generate -- \
  --source-dir <scenario-dir> \
  --scene-id <scene-id> \
  --scene-name <display-name> \
  --output-root <skill-staging-root> \
  --lessons docs/superpowers/references/tq-lineloss-lessons-learned.toml
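
One way to parse this CLI shape without a new dependency is sketched below. The flag names come from the suggested shape above; CliArgs, parse_args, and the error wording are assumptions for illustration, and the real binary may structure this differently.

```rust
use std::path::PathBuf;

/// Parsed arguments for the suggested CLI shape.
#[derive(Debug, PartialEq)]
pub struct CliArgs {
    pub source_dir: PathBuf,
    pub scene_id: String,
    pub scene_name: String,
    pub output_root: PathBuf,
    pub lessons: PathBuf,
}

/// Parse `--flag value` pairs. Unknown flags, repeated flags, missing
/// values, and missing required flags are all reported as errors.
pub fn parse_args<I>(args: I) -> Result<CliArgs, String>
where
    I: IntoIterator<Item = String>,
{
    let mut source_dir: Option<String> = None;
    let mut scene_id: Option<String> = None;
    let mut scene_name: Option<String> = None;
    let mut output_root: Option<String> = None;
    let mut lessons: Option<String> = None;

    let mut iter = args.into_iter();
    while let Some(flag) = iter.next() {
        // Pick the slot first so an unknown flag is reported as such.
        let slot = match flag.as_str() {
            "--source-dir" => &mut source_dir,
            "--scene-id" => &mut scene_id,
            "--scene-name" => &mut scene_name,
            "--output-root" => &mut output_root,
            "--lessons" => &mut lessons,
            other => return Err(format!("unknown flag {other}")),
        };
        let value = iter
            .next()
            .ok_or_else(|| format!("missing value for {flag}"))?;
        if slot.replace(value).is_some() {
            return Err(format!("flag {flag} given more than once"));
        }
    }

    Ok(CliArgs {
        source_dir: PathBuf::from(required(source_dir, "--source-dir")?),
        scene_id: required(scene_id, "--scene-id")?,
        scene_name: required(scene_name, "--scene-name")?,
        output_root: PathBuf::from(required(output_root, "--output-root")?),
        lessons: PathBuf::from(required(lessons, "--lessons")?),
    })
}

/// Turn a missing flag into a readable error.
fn required(value: Option<String>, flag: &str) -> Result<String, String> {
    value.ok_or_else(|| format!("missing required flag {flag}"))
}
```

A binary entry point would call parse_args(std::env::args().skip(1)) and print the error before exiting non-zero.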

Expected outputs under <output-root>:

  • skills/<scene-id>/SKILL.toml

  • skills/<scene-id>/SKILL.md

  • skills/<scene-id>/scene.toml

  • skills/<scene-id>/references/*.md

  • skills/<scene-id>/scripts/*.js

  • skills/<scene-id>/scripts/*.test.js
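
The fixed-name part of that layout can be expressed as a small path helper, sketched below. The script naming (collect_ plus the scene id with hyphens replaced by underscores) is inferred from the generator test in Step 1; expected_package_files is a hypothetical helper name, and the wildcard references/*.md entries are left out because their names are not specified here.

```rust
use std::path::{Path, PathBuf};

/// List the fixed-name files expected under `<output-root>` for one scene.
/// Assumes the script stem is `collect_<scene id with '-' -> '_'>`.
pub fn expected_package_files(output_root: &Path, scene_id: &str) -> Vec<PathBuf> {
    let skill_root = output_root.join("skills").join(scene_id);
    let script_stem = format!("collect_{}", scene_id.replace('-', "_"));
    vec![
        skill_root.join("SKILL.toml"),
        skill_root.join("SKILL.md"),
        skill_root.join("scene.toml"),
        skill_root.join("scripts").join(format!("{script_stem}.js")),
        skill_root.join("scripts").join(format!("{script_stem}.test.js")),
    ]
}
```

A generator test or smoke script could iterate this list and assert that each path exists, instead of repeating the layout inline.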

  • Step 4: Re-run the generator tests and verify they pass

Run:

cargo test --test scene_generator_test -- --nocapture

Expected: PASS.

  • Step 5: Commit the generator slice

Run:

git add src/lib.rs src/generated_scene/mod.rs src/generated_scene/analyzer.rs src/generated_scene/generator.rs src/bin/sg_scene_generate.rs tests/scene_generator_test.rs tests/fixtures/generated_scene tests/fixtures/scene_source/tq_lineloss
git commit -m "feat: add generated scene package generator"

Expected: one commit that adds the in-repo v1 generator capability.


Task 9: Run the final verification sweep, smoke the real runtime, and remove unused one-off scene code

Files:

  • Delete if unused after green verification: src/compat/tq_lineloss/org_units.rs

  • Delete if unused after green verification: src/compat/tq_lineloss/org_resolver.rs

  • Delete if unused after green verification: src/compat/tq_lineloss/period_resolver.rs

  • Delete or reduce to shim only if unused after green verification: src/compat/lineloss_xlsx_export.rs

  • Modify: src/compat/mod.rs

  • Modify: src/lib.rs

  • Step 1: Remove only the legacy one-off files that are provably unused

Before deleting anything, prove the new path covers the old responsibilities:

cargo test --test deterministic_submit_test -- --nocapture
cargo test --test scene_registry_test -- --nocapture
cargo test --test report_artifact_postprocess_test -- --nocapture

Then delete the old line-loss-only resolver/export files only if cargo test passes and a repo-wide search (for example grep) shows they are no longer referenced.

  • Step 2: Run the full automated verification sweep

Run:

node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"
cargo test --test scene_registry_test -- --nocapture
cargo test --test deterministic_submit_test -- --nocapture
cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test generated_scene_lessons_test -- --nocapture
cargo test --test scene_generator_test -- --nocapture
cargo test --test agent_runtime_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
cargo test --test compat_runtime_test -- --nocapture
cargo test --test compat_config_test -- --nocapture
cargo build --bin sgclaw --bin sg_claw --bin sg_scene_generate

Expected: PASS.

  • Step 3: Run the required hermetic generator smoke and keep the real external source smoke optional

Run the required in-repo smoke first:

tmp_out="$(mktemp -d)"
cargo run --bin sg_scene_generate -- \
  --source-dir tests/fixtures/scene_source/tq_lineloss \
  --scene-id tq-lineloss-report \
  --scene-name "台区线损月周累计线损率统计分析" \
  --output-root "$tmp_out" \
  --lessons docs/superpowers/references/tq-lineloss-lessons-learned.toml

Expected: generator emits a complete package into $tmp_out using only in-repo fixtures.

Optional manual follow-up after the required smoke is green:

  • if the external scenario directory is available on the implementer's machine, re-run the same command against the real source tree for additional confidence

  • if it is unavailable, do not block the branch on that machine-specific path

  • Step 4: Run the real service-console smoke checks with sg_claw.exe semantics in mind

Manual verification checklist:

  • write or reuse a repo-local sgclaw_config.json whose skillsDir points to examples/generated_scene_platform

  • rebuild and run sg_claw/sg_claw.exe with that config so the runtime-scanned skills root is reproducible

  • on the real line-loss page, submit 兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。

  • confirm the request bootstraps the manifest target_url, uses the manifest expected_domain, and returns the line-loss report artifact through the generic scene runtime

  • submit 兰州公司 台区线损大数据 月累计线损率统计分析。。。 and confirm the runtime prompts for missing period instead of defaulting

  • submit 打开知乎热榜 and confirm the ordinary Zhihu path still behaves as before

  • submit 打开知乎热榜。。。 and confirm the deterministic runtime fails closed with the unsupported-scene prompt instead of falling into the Zhihu path

  • Step 5: Commit the cleanup + verified platform state

Run:

git add src/compat/mod.rs src/lib.rs src/compat src/generated_scene src/scene_contract docs/superpowers/references tests examples/generated_scene_platform
git commit -m "feat: add generated scene skill platform"

Expected: one final commit after the full automated and manual verification passes.


Verification Checklist

Registry and manifest contract

cargo test --test scene_registry_test -- --nocapture

Expected:

  • scene.toml loads from the skill root
  • only schema_version = "1" passes
  • duplicate scene.id fails with both manifest paths in the error
  • non-browser_script or non-report_collection v1 scenes are rejected cleanly
  • the registry still scans exactly one resolved skillsDir
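
The schema-version and duplicate-id expectations above can be pictured with the following minimal sketch. ManifestHeader, SceneRegistry, and the error wording are hypothetical; the real registry types live in the runtime and are not defined in this section. Manifest parsing is omitted so the sketch stays std-only.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// The manifest fields this sketch needs (hypothetical shape).
#[derive(Debug, Clone)]
pub struct ManifestHeader {
    pub schema_version: String,
    pub scene_id: String,
    pub manifest_path: PathBuf,
}

/// Maps each registered scene id to the manifest that declared it.
#[derive(Debug, Default)]
pub struct SceneRegistry {
    by_id: HashMap<String, PathBuf>,
}

impl SceneRegistry {
    /// Register one manifest. Only schema_version "1" is accepted, and a
    /// duplicate id fails with both manifest paths in the message.
    pub fn register(&mut self, header: ManifestHeader) -> Result<(), String> {
        if header.schema_version != "1" {
            return Err(format!(
                "unsupported schema_version {:?} in {}",
                header.schema_version,
                header.manifest_path.display()
            ));
        }
        if let Some(existing) = self.by_id.get(&header.scene_id) {
            return Err(format!(
                "duplicate scene id {:?}: {} and {}",
                header.scene_id,
                existing.display(),
                header.manifest_path.display()
            ));
        }
        self.by_id.insert(header.scene_id, header.manifest_path);
        Ok(())
    }

    /// Look up the manifest path registered for a scene id.
    pub fn manifest_path(&self, scene_id: &str) -> Option<&Path> {
        self.by_id.get(scene_id).map(|path| path.as_path())
    }
}
```

Naming both paths in the duplicate error lets the engineer see which two packages collide without rescanning the skills root.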

Deterministic routing contract

cargo test --test deterministic_submit_test -- --nocapture

Expected:

  • only the exact 。。。 suffix triggers deterministic routing
  • no-suffix behavior unchanged
  • unsupported suffix-scene requests fail closed
  • multi-match ambiguity fails closed
  • a missing org, mode, or period triggers a prompt instead of a silent default
  • page context may improve scoring but cannot cause silent guessing on unresolved ambiguity
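
The exact-suffix rule can be sketched as a small pure function. The reading of "exact" used here is an assumption: the text must end with exactly three ideographic full stops, with no fourth one and no trailing whitespace after them, and ASCII dots do not count. strip_deterministic_suffix is a hypothetical name, not the runtime's actual entry point.

```rust
/// The deterministic-routing suffix: three ideographic full stops.
pub const DETERMINISTIC_SUFFIX: &str = "。。。";

/// Return the instruction body when the text ends with the exact suffix,
/// or `None` when the request should stay on the ordinary path.
/// Assumed rule: a fourth `。` or an empty body does not qualify.
pub fn strip_deterministic_suffix(instruction: &str) -> Option<&str> {
    let body = instruction.strip_suffix(DETERMINISTIC_SUFFIX)?;
    if body.ends_with('。') || body.trim().is_empty() {
        return None;
    }
    Some(body.trim_end())
}
```

Keeping the check pure makes the no-suffix path trivially unchanged: a None result means the request never touches the scene dispatcher.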

Generic report-artifact handling

cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test -- --nocapture

Expected:

  • ok / partial / empty map to success
  • blocked / error map to failure
  • generic XLSX export works from artifact fields, not line-loss-only Rust code
  • configured directSubmitSkill keeps working on the shared artifact interpreter
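
The status mapping above can be driven by the manifest's success_status and failure_status lists rather than hardcoded per scene. The sketch below assumes those lists are already loaded as string slices; ArtifactOutcome and classify_status are hypothetical names, and returning None for an unlisted status is an assumed choice that leaves the fail-closed decision to the caller.

```rust
/// Coarse outcome of a report artifact.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArtifactOutcome {
    Success,
    Failure,
}

/// Classify an artifact status against the manifest-declared lists.
/// Returns `None` for a status in neither list so the caller can
/// decide how to fail closed.
pub fn classify_status(
    status: &str,
    success_status: &[&str],
    failure_status: &[&str],
) -> Option<ArtifactOutcome> {
    if success_status.contains(&status) {
        Some(ArtifactOutcome::Success)
    } else if failure_status.contains(&status) {
        Some(ArtifactOutcome::Failure)
    } else {
        None
    }
}
```

With the lists from the sample manifest, "partial" and "empty" count as success, so a report with no rows can still be exported and shown to the user.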

Service submit/bootstrap path

cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture

Expected:

  • deterministic manifest scenes route before LLM
  • bootstrap target resolution uses manifest target_url / expected_domain
  • callback-host/browser-ws paths still receive the correct request URL
  • non-deterministic Zhihu and direct-submit flows remain intact

Generator and lessons

cargo test --test generated_scene_lessons_test -- --nocapture
cargo test --test scene_generator_test -- --nocapture
cargo build --bin sg_scene_generate

Expected:

  • lessons TOML contains all required routing/param/bootstrap/artifact/validation rules
  • analyzer only accepts v1 report/collection browser-script fixtures
  • generator writes a complete package with scene.toml and JS test scaffold
  • generator/runtime share only the explicit file contract, not hidden Rust internals

Real runtime smoke

Manual checklist:

  • sg_claw.exe / service console can still run the line-loss deterministic path
  • missing-period deterministic line-loss requests prompt instead of defaulting
  • plain Zhihu requests still avoid the scene platform
  • suffixed unsupported requests fail closed
  • line-loss export still opens through the generic postprocess path when configured

Notes For The Engineer

  • The paired approved spec is docs/superpowers/specs/2026-04-15-generated-scene-skill-platform-design.md.
  • The current repo branch name for the ws baseline is feature/claw-ws, even though the design prose says ws.
  • Do not reintroduce the old scene-registry experiment that was explicitly cleaned off the ws branch. This plan deliberately keeps the new runtime under compat and a shared serializable contract instead of reviving the deleted scene-only branch structure blindly.
  • Keep scene.toml inside each skill package root. The separate skill_staging/scenes/*/scene.json tree remains legacy metadata only in this plan.
  • Keep the generator extractable by holding the boundary at scene.toml, generated package layout, and lessons TOML rules. Avoid runtime code that reaches into generator-only internals.
  • If a real scenario directory does not fit the v1 report/collection/browser-script envelope, the analyzer/generator must refuse it explicitly instead of emitting a half-valid package.
  • Do not add a generic login/session platform here. Capture that need in docs if discovered, but keep it out of this implementation slice.