木炎 956f0c2b68 feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00

Generated Scene Skill Platform Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add a manifest-driven generated-scene platform that discovers staged report/collection browser_script scenes, routes deterministic 。。。 requests through generic registry/resolver logic, migrates tq-lineloss-report off one-off Rust branches, and ships a first in-repo generator that outputs registration-ready scene packages with minimal or zero per-scene Rust changes.

Architecture: Keep the existing submit branch shape in src/agent/task_runner.rs, but replace the line-loss-specific deterministic branch with a thin adapter over a generic scene registry, deterministic dispatcher, generic report-artifact interpreter, and generic XLSX postprocess path. Keep the generator separate from runtime internals by making scene.toml plus the lessons-learned TOML the only stable generator/runtime contract; generator code lives in its own module and binary, while runtime code stays under the existing compat submit/bootstrap seams.

Tech Stack: Rust 2021, serde, serde_json, toml, existing browser_script runtime and callback-host/browser-backend seams, node:test for staged JS, Cargo integration tests, filesystem-based package generation.

Execution Context

Branch from the repo's current ws baseline branch, which is feature/claw-ws in this checkout today. Do not implement on that branch directly; create a new feature branch from its HEAD.
Do not create a worktree unless the user explicitly asks. Branch isolation is required; worktree isolation is not.
Keep skillsDir as the existing single resolved path. The new scene registry must scan inside that one resolved skills root instead of adding array-style scene roots or a second config field.
For this branch's automated tests and real smokes, use a repo-local skillsDir override that points at examples/generated_scene_platform. That still preserves the single-root contract because the runtime scans one resolved root whose skills/ child contains the committed sample package.
Put the new runtime registration manifest at <skill-root>/scene.toml. Keep existing skill_staging/scenes/*/scene.json files for legacy staging/UI metadata and do not move runtime dispatch policy back into scene.json.
Keep every required deliverable for this plan inside the current claw-new repo so the branch can be built, tested, and committed independently. The first committed sample package should live under examples/generated_scene_platform/skills/; publishing the same package into any external skills/staging repo is a separate follow-up, not part of this branch.
V1 scope is locked to category = "report_collection", kind = "browser_script", artifact.type = "report-artifact". Unsupported scene types must fail fast instead of partially working.
Deterministic invocation remains exact-suffix-only: only raw instructions ending with the exact 。。。 suffix enter the scene dispatcher.
Never use hidden page defaults for required canonical parameters. Missing org, missing month/week mode, or missing period must prompt and stop.
Do not add a generic login/session subsystem in this plan.
Preserve current non-platform flows: Zhihu/LLM, configured directSubmitSkill, and ordinary browser-attached orchestration must remain behaviorally unchanged unless an explicit regression test says otherwise.
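The exact-suffix rule above can be sketched as a single gate function. This is a minimal illustration, not the runtime's actual code: the names `SCENE_SUFFIX` and `strip_scene_suffix` are hypothetical, and the sketch assumes "exact" means no trimming or normalization of the raw instruction.

```rust
/// Deterministic suffix named in the plan: three ideographic full stops (U+3002).
const SCENE_SUFFIX: &str = "。。。";

/// Returns the instruction body with the suffix removed, but only when `raw`
/// ends with the exact suffix. Assumption: no whitespace trimming, so ASCII
/// "..." or a trailing space after the suffix does not enter the dispatcher.
fn strip_scene_suffix(raw: &str) -> Option<&str> {
    raw.strip_suffix(SCENE_SUFFIX)
}
```

Anything that returns `None` here would skip the scene dispatcher entirely and continue down the existing non-platform branches.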

File Map

Core runtime and contract files

Create: src/scene_contract/mod.rs
- shared serializable manifest contract used by both runtime and generator
Create: src/scene_contract/manifest.rs
- scene.toml schema types, schema-version validation helpers, artifact/postprocess enums
Create: src/compat/scene_platform/mod.rs
- exports the registry, dispatch, and resolver units
Create: src/compat/scene_platform/registry.rs
- scans the single resolved skillsDir, loads <skill-root>/scene.toml, validates duplicates and runtime compatibility
Create: src/compat/scene_platform/dispatch.rs
- deterministic candidate scoring, ambiguity fail-closed behavior, canonical param resolution, executable scene plan creation
Create: src/compat/scene_platform/resolvers.rs
- reusable resolver types for dictionary_entity, month_week_period, fixed_enum, and literal_passthrough
Create: src/compat/report_artifact.rs
- generic report-artifact parsing, status mapping, summary building, and export-readiness helpers
Create: src/compat/report_xlsx_export.rs
- generic XLSX exporter for any report-artifact with column_defs/columns + rows
Modify: src/lib.rs
- export new shared/runtime/generator modules and any CLI helpers needed by tests
Modify: src/compat/mod.rs
- export the new scene-platform and report-artifact modules
Modify: src/compat/deterministic_submit.rs
- keep the public API shape, but make it registry/manifest-driven instead of line-loss-hardcoded
Modify: src/compat/direct_skill_runtime.rs
- reuse the generic report-artifact interpreter so direct-submit and scene-submit summarize/status-map the same way
Modify: src/agent/task_runner.rs
- keep branch order, but call the new registry-backed deterministic planner before ordinary orchestration/LLM
Modify: src/service/server.rs
- keep bootstrap precedence shape, but let deterministic plans source target_url / expected_domain from scene manifests instead of hardcoded constants

Generator files

Create: src/generated_scene/mod.rs
- generator entrypoints shared by tests and CLI
Create: src/generated_scene/analyzer.rs
- source directory inspection for v1 report/collection browser_script scenes
Create: src/generated_scene/generator.rs
- template rendering and package writing into an output staging root
Create: src/generated_scene/lessons.rs
- loads and validates tq-lineloss-lessons-learned.toml as generation constraints
Create: src/bin/sg_scene_generate.rs
- CLI entry for sgClaw's in-repo scene generator capability

In-repo sample package and reference assets

Create: examples/generated_scene_platform/skills/tq-lineloss-report/scene.toml
- first committed manifest-driven sample scene package used by runtime and generator tests in this repo
Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/org-dictionary.json
- external dictionary data for the dictionary_entity resolver fixture
Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.toml
- committed sample browser-script tool contract aligned with the manifest-driven runtime
Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.md
- committed sample documentation for canonical args, artifact contract, and runtime expectations
Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.js
- committed sample collection script with generic-platform artifact fields
Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js
- committed JS contract tests for canonical args and artifact shape
Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/data-quality.md
- committed sample data-quality notes aligned with manifest-driven output rules
Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/collection-flow.md
- committed sample bootstrap/collection-flow notes
Create: tests/fixtures/scene_source/tq_lineloss/index.html
- hermetic in-repo source fixture for required analyzer/generator smoke coverage
Create: tests/fixtures/scene_source/tq_lineloss/js/collect.js
- hermetic in-repo source fixture JS for analyzer/generator smoke coverage

Repo-local runtime discovery path for validation

Use examples/generated_scene_platform as the repo-local skillsDir override root during tests and manual smokes.
The runtime still scans one resolved root only; it just resolves that root to examples/generated_scene_platform, whose skills/ child contains the committed sample package.
Add or reuse a tiny repo-local config fixture such as tmp/generated_scene_platform_sgclaw_config.json or an equivalent test helper so the validation steps all point at the same reproducible skillsDir.
Do not require external staging repos to make the manifest-driven runtime discoverable during this branch.
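A minimal sketch of the single-root scan described above, using only the standard library. The function name `discover_scene_manifests` is hypothetical, and the sketch assumes `skills_dir` is the directory that directly contains the skill packages (for the repo-local override, the skills/ child of examples/generated_scene_platform).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists `<skill-root>/scene.toml` for each immediate child directory of one
/// already-resolved skills directory. Skills without a manifest are skipped,
/// and nested directories are not searched.
fn discover_scene_manifests(skills_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut manifests = Vec::new();
    for entry in fs::read_dir(skills_dir)? {
        let skill_root = entry?.path();
        if !skill_root.is_dir() {
            continue;
        }
        let manifest = skill_root.join("scene.toml");
        if manifest.is_file() {
            manifests.push(manifest);
        }
    }
    // read_dir order is platform-dependent; sort for reproducible results.
    manifests.sort();
    Ok(manifests)
}
```

Because the function takes exactly one directory, there is no place for a second root or a hardcoded skill_staging/scenes scan to slip in.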

External publish target kept out of scope for this branch

Do not modify external paths like D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/... in this plan.
If the user later wants the generated sample published into that external staging repo, do it as a separate follow-up after this branch is green.

Platform-reference files

Tests and fixtures

Create: tests/scene_registry_test.rs
- manifest loading, duplicate detection, schema validation, tool compatibility checks
Create: tests/report_artifact_postprocess_test.rs
- generic report-artifact parsing and XLSX postprocess coverage
Create: tests/generated_scene_lessons_test.rs
- lessons-TOML shape and required-rule coverage
Create: tests/scene_generator_test.rs
- analyzer + generator integration coverage using hermetic fixtures
Create: tests/fixtures/generated_scene/report_collection/index.html
- supported v1 report-scene fixture
Create: tests/fixtures/generated_scene/report_collection/js/report.js
- supported fixture source hints for analyzer tests
Create: tests/fixtures/generated_scene/non_report/index.html
- unsupported fixture proving fail-fast behavior
Modify: tests/deterministic_submit_test.rs
- migrate from hardcoded line-loss expectations to registry-driven deterministic behavior
Modify: tests/agent_runtime_test.rs
- keep direct-submit behavior intact while sharing generic report-artifact summaries
Modify: tests/service_task_flow_test.rs
- task-runner/bootstrap regressions for manifest-driven deterministic scenes
Modify: tests/service_ws_session_test.rs
- callback-host bootstrap target regression for manifest-driven deterministic submit when the browser-ws path is active

Legacy files to delete only after green verification proves they are unused

Delete: src/compat/tq_lineloss/org_units.rs
Delete: src/compat/tq_lineloss/org_resolver.rs
Delete: src/compat/tq_lineloss/period_resolver.rs
Delete or reduce to a compatibility shim only if still needed: src/compat/lineloss_xlsx_export.rs

Task 1: Create the implementation branch and lock the layout boundaries

Files:

Verify only

Step 1: Switch to the ws baseline branch and create a new platform branch

Run:

git switch feature/claw-ws
git switch -c feature/generated-scene-skill-platform

Expected: git status -sb shows a clean new branch rooted at the current ws baseline, not feature/claw-ws itself.

Step 2: Verify the current single-root skills layout before coding

Run:

cargo test --test compat_config_test ws_cleanup_resolves_single_configured_skills_dir -- --nocapture

Expected: PASS. This confirms the repo still resolves one skillsDir path, so the platform work must build on that single root instead of introducing array-style roots.

Step 3: Write down the two non-negotiable layout decisions in the first registry test scaffold

The very first red test file (tests/scene_registry_test.rs) must assume:

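// runtime manifest location:
let manifest_path = skill_root.join("scene.toml");
assert!(manifest_path.exists());

// legacy scene.json stays outside runtime dispatch ownership.
// Path::ends_with compares whole path components, so a literal
// "skill_staging/scenes/.../scene.json" pattern would never match;
// check the file name and the legacy directory component instead:
assert_ne!(manifest_path.file_name().and_then(|n| n.to_str()), Some("scene.json"));
assert!(!manifest_path.components().any(|c| c.as_os_str() == "skill_staging"));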

This prevents the implementation from drifting back toward scene.json routing or multi-root config.

Task 2: Add the shared `scene.toml` contract and registry loader

Files:

Create: src/scene_contract/mod.rs
Create: src/scene_contract/manifest.rs
Create: src/compat/scene_platform/mod.rs
Create: src/compat/scene_platform/registry.rs
Modify: src/lib.rs
Modify: src/compat/mod.rs
Create: tests/scene_registry_test.rs

Step 1: Write the failing registry tests first

Add tests/scene_registry_test.rs with focused red cases like:

#[test]
fn registry_loads_scene_manifest_from_skill_root() {
    let skill_root = temp_skill_with_scene_manifest(r#"
[scene]
id = "tq-lineloss-report"
skill = "tq-lineloss-report"
tool = "collect_lineloss"
kind = "browser_script"
version = "0.1.0"
category = "report_collection"

[manifest]
schema_version = "1"

[bootstrap]
expected_domain = "20.76.57.61"
target_url = "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor"
requires_target_page = true

[artifact]
type = "report-artifact"
success_status = ["ok", "partial", "empty"]
failure_status = ["blocked", "error"]
"#);

    let registry = load_scene_registry(skill_root.parent().unwrap()).unwrap();
    assert_eq!(registry.len(), 1);
    assert_eq!(registry[0].manifest.scene.id, "tq-lineloss-report");
}

#[test]
fn registry_rejects_duplicate_scene_ids_with_both_paths_in_error() { /* two skills, same scene.id */ }

#[test]
fn registry_rejects_unknown_manifest_schema_version() { /* schema_version = "999" */ }

#[test]
fn registry_rejects_non_browser_script_scene_tool_in_v1() { /* kind = "shell" should fail */ }

#[test]
fn registry_ignores_skills_without_scene_toml() { /* ordinary skills still load elsewhere */ }
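The tests above call a `temp_skill_with_scene_manifest` helper that the plan does not define. One possible shape, using only the standard library, is sketched below. The directory name is fixed to `tq-lineloss-report` on the assumption that the loader checks `scene.skill` against the containing package name. A real fixture would likely also need a SKILL.toml so the tool-compatibility check can pass.

```rust
use std::fs;
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counter that keeps fixture directories unique within one test process.
static NEXT_FIXTURE_ID: AtomicUsize = AtomicUsize::new(0);

/// Writes `<temp>/<unique>/tq-lineloss-report/scene.toml` and returns the
/// skill root, so `skill_root.parent()` is a skills directory with one skill.
/// Hypothetical helper: the plan names it but leaves its body open.
fn temp_skill_with_scene_manifest(manifest_toml: &str) -> PathBuf {
    let unique = format!(
        "scene_registry_test_{}_{}",
        std::process::id(),
        NEXT_FIXTURE_ID.fetch_add(1, Ordering::Relaxed)
    );
    let skill_root = std::env::temp_dir().join(unique).join("tq-lineloss-report");
    fs::create_dir_all(&skill_root).expect("create temp skill root");
    fs::write(skill_root.join("scene.toml"), manifest_toml).expect("write scene.toml");
    skill_root
}
```

Giving each call its own parent directory keeps the tests hermetic: one fixture's manifest can never show up in another test's registry scan.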

Step 2: Run the registry test file and verify it fails

Run:

cargo test --test scene_registry_test -- --nocapture

Expected: FAIL because scene.toml types and registry loading do not exist yet.

Step 3: Implement the serializable manifest contract and the single-root registry loader

Implement the minimal contract and loader needed to satisfy the tests:

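use serde::{Deserialize, Serialize};
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct SceneManifest {
    pub scene: SceneSection,
    pub manifest: ManifestSection,
    pub bootstrap: BootstrapSection,
    // The Step 1 fixture omits [deterministic] and [[params]], so both need a
    // serde default (DeterministicSection must implement Default); otherwise
    // parsing fails with a missing-field error.
    #[serde(default)]
    pub deterministic: DeterministicSection,
    #[serde(default)]
    pub params: Vec<SceneParam>,
    pub artifact: ArtifactSection,
    pub postprocess: Option<PostprocessSection>,
}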

#[derive(Debug, Clone)]
pub struct SceneRegistryEntry {
    pub manifest: SceneManifest,
    pub skill_root: PathBuf,
}

pub fn load_scene_registry(skills_dir: &Path) -> Result<Vec<SceneRegistryEntry>, SceneRegistryError> {
    // iterate immediate skill dirs under the already-resolved single skillsDir
    // look for <skill-root>/scene.toml only
    // parse and validate schema version
    // verify scene.id uniqueness across the loaded root
    // verify manifest.scene.skill matches the containing skill package
    // verify referenced tool exists in SKILL.toml and is browser_script in v1
}
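The schema-version and uniqueness checks named in the loader skeleton above could look roughly like this. It is a sketch under stated assumptions: `LoadedManifest`, `RegistryCheckError`, and `validate_loaded_manifests` are hypothetical names, and the struct is a trimmed-down stand-in for the plan's `SceneManifest` so the example stays free of external crates.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
enum RegistryCheckError {
    UnsupportedSchemaVersion { manifest: PathBuf, found: String },
    DuplicateSceneId { scene_id: String, first: PathBuf, second: PathBuf },
}

/// Minimal view of a parsed manifest (hypothetical, not the plan's type).
struct LoadedManifest {
    scene_id: String,
    schema_version: String,
    manifest_path: PathBuf,
}

/// Rejects any schema version other than "1" and reports both manifest
/// paths when two manifests declare the same scene id.
fn validate_loaded_manifests(loaded: &[LoadedManifest]) -> Result<(), RegistryCheckError> {
    let mut seen: HashMap<&str, &PathBuf> = HashMap::new();
    for item in loaded {
        if item.schema_version != "1" {
            return Err(RegistryCheckError::UnsupportedSchemaVersion {
                manifest: item.manifest_path.clone(),
                found: item.schema_version.clone(),
            });
        }
        // insert returns the previous value for the key, i.e. the first path.
        if let Some(first) = seen.insert(item.scene_id.as_str(), &item.manifest_path) {
            return Err(RegistryCheckError::DuplicateSceneId {
                scene_id: item.scene_id.clone(),
                first: first.to_path_buf(),
                second: item.manifest_path.clone(),
            });
        }
    }
    Ok(())
}
```

Carrying both paths in the error variant lets the duplicate-id test assert on them directly instead of parsing a message string.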

Rules to lock now:

schema_version = "1" is the only accepted version in v1
duplicate scene.id is a hard error and must report both manifest paths
manifest loading must not add a second config key or a hardcoded skill_staging/scenes scan
scene.toml is runtime-owned; scene.json stays legacy-only

Step 4: Re-run the registry tests and verify they pass

Run:

cargo test --test scene_registry_test -- --nocapture

Expected: PASS.

Step 5: Commit the contract and registry slice

Run:

git add src/lib.rs src/scene_contract/mod.rs src/scene_contract/manifest.rs src/compat/mod.rs src/compat/scene_platform/mod.rs src/compat/scene_platform/registry.rs tests/scene_registry_test.rs
git commit -m "feat: add scene manifest registry"

Expected: one commit that introduces the stable runtime/generator contract and registry loader.

Task 3: Generalize deterministic dispatch and reusable parameter resolvers

Files:

Create: src/compat/scene_platform/dispatch.rs
Create: src/compat/scene_platform/resolvers.rs
Modify: src/compat/deterministic_submit.rs
Modify: tests/deterministic_submit_test.rs

Step 1: Replace the line-loss-only deterministic tests with registry-backed red tests

Extend tests/deterministic_submit_test.rs with registry-backed red cases built from temp fixture manifests under a temporary skills root. Do not depend on the committed sample package from Task 6 yet; Task 3 must stay hermetic and independently runnable. Add failing cases such as:

#[test]
fn deterministic_submit_uses_registry_backed_scene_plan() {
    let decision = decide_deterministic_submit(
        "兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。",
        None,
        None,
    );

    match decision {
        DeterministicSubmitDecision::Execute(plan) => {
            assert_eq!(plan.scene_id, "tq-lineloss-report");
            assert_eq!(plan.tool_name, "tq-lineloss-report.collect_lineloss");
            assert_eq!(plan.expected_domain, "20.76.57.61");
            assert_eq!(plan.target_url, "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor");
        }
        other => panic!("expected execute plan, got {other:?}"),
    }
}

#[test]
fn deterministic_submit_fails_closed_on_scene_ambiguity() { /* two plausible scene.toml entries -> Prompt */ }

#[test]
fn deterministic_submit_prompts_for_missing_period_instead_of_defaulting() {
    let decision = decide_deterministic_submit("兰州公司 台区线损大数据 月累计线损率统计分析。。。", None, None);
    assert!(matches!(decision, DeterministicSubmitDecision::Prompt { .. }));
}

#[test]
fn deterministic_submit_uses_page_context_to_break_ties_before_keyword_only_match() { /* page_url/title beats keyword overlap */ }

#[test]
fn zhihu_without_suffix_remains_not_deterministic() {
    assert!(matches!(
        decide_deterministic_submit("打开知乎热榜", Some("https://www.zhihu.com/hot"), Some("知乎热榜")),
        DeterministicSubmitDecision::NotDeterministic
    ));
}

Also invert the current default-period expectations. 兰州公司月累计。。。 and 兰州公司周累计。。。 must now prompt instead of executing.

Step 2: Run the targeted deterministic tests and verify they fail

Run:

cargo test --test deterministic_submit_test -- --nocapture

Expected: FAIL because the current implementation is still hardcoded to line-loss constants and still defaults missing month/week periods.

Step 3: Implement reusable resolver types and a registry-backed dispatcher

Implement the generic deterministic planner in the new scene-platform modules, then make src/compat/deterministic_submit.rs a thin adapter over it.

Required implementation shape:

pub enum ResolverKind {
    DictionaryEntity,
    MonthWeekPeriod,
    FixedEnum,
    LiteralPassthrough,
}

pub struct SceneExecutionPlan {
    pub scene_id: String,
    pub instruction: String,
    pub tool_name: String,
    pub expected_domain: String,
    pub target_url: String,
    pub args: Map<String, Value>,
    pub success_statuses: Vec<String>,
    pub failure_statuses: Vec<String>,
    pub postprocess: Option<PostprocessSection>,
}

pub fn plan_deterministic_scene(
    raw_instruction: &str,
    page_url: Option<&str>,
    page_title: Option<&str>,
    skills_dir: &Path,
) -> Result<DeterministicSubmitDecision, SceneDispatchError> {
    // exact suffix gate
    // load registry from the single skillsDir
    // score candidate scenes using include/exclude keywords + page context + required-param resolution
    // if multiple remain plausible -> fail closed with explicit ambiguity prompt
    // resolve params using generic resolver kinds
    // build executable SceneExecutionPlan with manifest bootstrap + tool + canonical args
}
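The scoring and fail-closed steps in the skeleton above might be sketched as follows. All names here (`SceneCandidate`, `Selection`, `score`, `select_scene`) are hypothetical. The sketch assumes page context is matched by a naive substring check of `expected_domain` against the page URL, and it leaves out the required-param resolution signal for brevity.

```rust
/// Trimmed-down candidate view; fields mirror the [deterministic] and
/// [bootstrap] manifest keys used in this plan.
struct SceneCandidate<'a> {
    scene_id: &'a str,
    include_keywords: &'a [&'a str],
    exclude_keywords: &'a [&'a str],
    expected_domain: &'a str,
}

#[derive(Debug, PartialEq)]
enum Selection<'a> {
    NoMatch,
    Selected(&'a str),
    Ambiguous(Vec<&'a str>),
}

/// Score is (page-context hit, keyword hits). Tuple ordering makes page
/// context outrank keyword overlap. None means excluded or no keyword match.
fn score(candidate: &SceneCandidate, body: &str, page_url: Option<&str>) -> Option<(u8, usize)> {
    if candidate.exclude_keywords.iter().any(|k| body.contains(*k)) {
        return None;
    }
    let keyword_hits = candidate
        .include_keywords
        .iter()
        .filter(|k| body.contains(**k))
        .count();
    if keyword_hits == 0 {
        return None;
    }
    // Assumption: substring match; a real check would compare the URL host.
    let page_hit = page_url.map_or(false, |url| url.contains(candidate.expected_domain)) as u8;
    Some((page_hit, keyword_hits))
}

/// Selects the single best-scoring scene, or fails closed when several tie.
fn select_scene<'a>(
    candidates: &[SceneCandidate<'a>],
    body: &str,
    page_url: Option<&str>,
) -> Selection<'a> {
    let scored: Vec<(&'a str, (u8, usize))> = candidates
        .iter()
        .filter_map(|c| score(c, body, page_url).map(|s| (c.scene_id, s)))
        .collect();
    let best = match scored.iter().map(|(_, s)| *s).max() {
        Some(best) => best,
        None => return Selection::NoMatch,
    };
    let top: Vec<&'a str> = scored
        .iter()
        .filter(|(_, s)| *s == best)
        .map(|(id, _)| *id)
        .collect();
    if top.len() == 1 {
        Selection::Selected(top[0])
    } else {
        Selection::Ambiguous(top)
    }
}
```

A tie never resolves silently: `Ambiguous` carries every tied scene id so the caller can build an explicit prompt.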

Resolver rules to lock now:

dictionary_entity reads external dictionary data such as references/org-dictionary.json; no hardcoded org list in Rust after migration
month_week_period returns explicit prompts for missing mode, missing period, contradictory month/week intent, or week-without-year
fixed_enum and literal_passthrough exist now so the manifest contract is extensible, even if line-loss is the only v1 user
if a new scene needs a new resolver type, add a reusable resolver, not a scene-specific if scene_id == ... branch
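As an illustration of the month_week_period rules above, here is a minimal resolver sketch that prompts instead of defaulting. The type and function names are hypothetical. The mode keywords 月累计 and 周累计 come from this plan's sample manifest and the `YYYY-MM` month token from its sample instruction, but the week token shape `YYYY-Wnn` is purely an assumption of this sketch.

```rust
#[derive(Debug, PartialEq)]
enum PeriodPrompt {
    MissingMode,
    MissingPeriod,
    ContradictoryMode,
    WeekWithoutYear,
}

#[derive(Debug, PartialEq)]
enum PeriodResolution {
    Resolved { mode: &'static str, value: String },
    Prompt(PeriodPrompt),
}

fn all_digits(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit())
}

/// `YYYY-MM` with a month in 01..=12.
fn is_year_month(token: &str) -> bool {
    match token.split_once('-') {
        Some((year, month)) => {
            all_digits(year, 4) && all_digits(month, 2) && matches!(month.parse::<u8>(), Ok(1..=12))
        }
        None => false,
    }
}

/// Two-digit week number in 01..=53.
fn is_week_number(week: &str) -> bool {
    all_digits(week, 2) && matches!(week.parse::<u8>(), Ok(1..=53))
}

/// Resolves mode and period from the instruction body, never defaulting.
fn resolve_period(body: &str) -> PeriodResolution {
    let wants_month = body.contains("月累计");
    let wants_week = body.contains("周累计");
    match (wants_month, wants_week) {
        (true, true) => return PeriodResolution::Prompt(PeriodPrompt::ContradictoryMode),
        (false, false) => return PeriodResolution::Prompt(PeriodPrompt::MissingMode),
        _ => {}
    }
    let tokens: Vec<&str> = body.split_whitespace().collect();
    if wants_month {
        return match tokens.iter().copied().find(|t| is_year_month(t)) {
            Some(t) => PeriodResolution::Resolved { mode: "month", value: t.to_string() },
            None => PeriodResolution::Prompt(PeriodPrompt::MissingPeriod),
        };
    }
    // Week mode. Assumed token shape: YYYY-Wnn.
    for token in tokens.iter().copied() {
        if let Some((year, week)) = token.split_once("-W") {
            if all_digits(year, 4) && is_week_number(week) {
                return PeriodResolution::Resolved { mode: "week", value: token.to_string() };
            }
        }
    }
    // A bare Wnn token names a week but no year.
    if tokens.iter().any(|t| t.strip_prefix('W').map_or(false, is_week_number)) {
        return PeriodResolution::Prompt(PeriodPrompt::WeekWithoutYear);
    }
    PeriodResolution::Prompt(PeriodPrompt::MissingPeriod)
}
```

Each prompt case is its own enum variant, so the dispatcher can map it to the manifest's `prompt_missing` or `prompt_ambiguous` text without the resolver knowing which scene it is serving.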

Step 4: Re-run the deterministic tests and verify they pass

Run:

cargo test --test deterministic_submit_test -- --nocapture

Expected: PASS, including the new no-default-period behavior and ambiguity fail-closed coverage.

Step 5: Commit the registry-driven deterministic slice

Run:

git add src/compat/deterministic_submit.rs src/compat/scene_platform/dispatch.rs src/compat/scene_platform/resolvers.rs tests/deterministic_submit_test.rs
git commit -m "feat: add registry-driven deterministic scene dispatch"

Expected: one commit that removes one-off line-loss decision ownership from the deterministic planner.

Task 4: Add a generic report-artifact interpreter and XLSX postprocess path

Files:

Create: src/compat/report_artifact.rs
Create: src/compat/report_xlsx_export.rs
Modify: src/compat/direct_skill_runtime.rs
Modify: src/compat/deterministic_submit.rs
Create: tests/report_artifact_postprocess_test.rs
Modify: tests/agent_runtime_test.rs

Step 1: Write the red tests for generic report-artifact handling

Add tests/report_artifact_postprocess_test.rs and the minimum tests/agent_runtime_test.rs extensions needed to prove the platform no longer depends on line-loss-specific Rust export logic:

#[test]
fn report_artifact_postprocess_exports_xlsx_for_ok_or_partial_scene() {
    let artifact = serde_json::json!({
        "type": "report-artifact",
        "report_name": "tq-lineloss-report",
        "status": "partial",
        "columns": ["ORG_NAME", "LINE_LOSS_RATE"],
        "column_defs": [["ORG_NAME", "供电单位"], ["LINE_LOSS_RATE", "综合线损率(%)"]],
        "rows": [{"ORG_NAME": "国网兰州供电公司", "LINE_LOSS_RATE": "1.23"}],
        "counts": {"rows": 1},
        "partial_reasons": ["report_log_failed"]
    });

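    // Match the Step 3 signature (Option<&PostprocessSection>, &Path); assumes the
    // helpers return an owned PostprocessSection and an owned PathBuf.
    let postprocess = report_postprocess_xlsx();
    let workspace = temp_workspace();
    let outcome = interpret_report_artifact_and_postprocess(&artifact, Some(&postprocess), &workspace).unwrap();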
    assert!(outcome.success);
    assert!(outcome.summary.contains("status=partial"));
    assert!(outcome.summary.contains("detail_rows=1"));
    assert!(outcome.summary.contains("export_path="));
}

#[test]
fn report_artifact_postprocess_skips_export_for_blocked_or_error_scene() { /* no xlsx path */ }

#[test]
fn direct_submit_and_scene_submit_share_the_same_report_summary_contract() { /* direct_skill_runtime + deterministic path both use same summary builder */ }

Step 2: Run the focused report-artifact tests and verify they fail

Run:

cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_partial_report_artifact_as_success_with_warning_summary -- --nocapture

Expected: FAIL because the generic interpreter/exporter does not exist yet and deterministic line-loss export is still special-cased.

Step 3: Implement the shared parser, summary builder, and generic XLSX exporter

Implement a reusable path that both deterministic scenes and configured direct-submit skills can call:

pub struct ParsedReportArtifact {
    pub report_name: String,
    pub status: String,
    pub columns: Vec<String>,
    pub column_defs: Vec<(String, String)>,
    pub rows: Vec<Map<String, Value>>,
    pub counts: ReportCounts,
    pub partial_reasons: Vec<String>,
}

pub fn interpret_report_artifact_and_postprocess(
    artifact_json: &Value,
    postprocess: Option<&PostprocessSection>,
    workspace_root: &Path,
) -> Result<DirectSubmitOutcome, PipeError> {
    // parse report-artifact generically
    // map ok/partial/empty => success=true
    // map blocked/error => success=false
    // if postprocess.exporter == Some("xlsx_report") and status is exportable, write xlsx under workspace_root/out
    // if postprocess.auto_open == Some("excel"), reuse existing open-export helper
}
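The status mapping and header selection that the skeleton above describes can be sketched without any XLSX dependency. Function names are hypothetical, and treating an unknown status as `None` (so the caller can fail closed) is an assumption, not something the plan specifies.

```rust
/// Maps an artifact status to success using the manifest's status lists.
/// The failure list is checked first, so blocked / error stay failures no
/// matter what else the artifact contains. Unknown statuses return None.
fn status_is_success(status: &str, success: &[&str], failure: &[&str]) -> Option<bool> {
    if failure.contains(&status) {
        Some(false)
    } else if success.contains(&status) {
        Some(true)
    } else {
        None
    }
}

/// Picks (key, header label) pairs for export: `column_defs` when present,
/// else `columns`, reusing each key as its own label.
fn export_headers(columns: &[String], column_defs: &[(String, String)]) -> Vec<(String, String)> {
    if !column_defs.is_empty() {
        column_defs.to_vec()
    } else {
        columns.iter().map(|key| (key.clone(), key.clone())).collect()
    }
}
```

Neither function mentions a line-loss column name, so the same pair would serve any scene whose artifact follows the report-artifact shape.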

Rules:

export logic must read column_defs when present, else fall back to columns
do not keep line-loss-only column-name assumptions in Rust
keep direct-submit behavior unchanged for non-artifact string outputs
keep blocked / error as failures even when the artifact also contains rows

Step 4: Re-run the focused tests and verify they pass

Run:

cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_partial_report_artifact_as_success_with_warning_summary -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_blocked_report_artifact_as_failure -- --nocapture

Expected: PASS.

Step 5: Commit the generic artifact/postprocess slice

Run:

git add src/compat/report_artifact.rs src/compat/report_xlsx_export.rs src/compat/direct_skill_runtime.rs src/compat/deterministic_submit.rs tests/report_artifact_postprocess_test.rs tests/agent_runtime_test.rs
git commit -m "refactor: share generic report artifact postprocess"

Expected: one commit that removes the need for per-scene Rust export logic.

Task 5: Wire manifest-driven scenes into submit and bootstrap without regressing other flows

Files:

Modify: src/agent/task_runner.rs
Modify: src/service/server.rs
Modify: tests/service_task_flow_test.rs
Modify: tests/service_ws_session_test.rs
Modify: tests/agent_runtime_test.rs

Step 1: Add the failing submit/bootstrap regression tests

Add focused tests that lock branch order and bootstrap behavior:

#[test]
fn submit_task_routes_suffix_instruction_through_manifest_scene_before_llm() {
    // no provider call should happen when deterministic scene planning succeeds or prompts
}

#[test]
fn resolve_submit_bootstrap_target_prefers_manifest_scene_target_for_deterministic_scene() {
    let request = SubmitTaskRequest {
        instruction: "兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。".to_string(),
        conversation_id: None,
        messages: vec![],
        page_url: None,
        page_title: None,
    };
    let target = resolve_submit_bootstrap_target(&request, workspace_root, &settings);
    assert_eq!(target.request_url, "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor");
    assert_eq!(target.expected_domain.as_deref(), Some("20.76.57.61"));
}

#[test]
fn zhihu_without_suffix_keeps_existing_non_scene_path() { /* ordinary path unchanged */ }

For the browser-ws/callback-host path, add one regression in tests/service_ws_session_test.rs proving the first bootstrap/open target comes from scene.toml when a deterministic scene plan exists.

Step 2: Run the focused integration tests and verify they fail

Run:

cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture

Expected: FAIL because the submit/bootstrap path still depends on the old deterministic line-loss branch shape.

Step 3: Implement the minimal wiring changes only where the branch already exists

Implementation targets:

keep the current submit branch order in src/agent/task_runner.rs
keep resolve_submit_bootstrap_target(...) precedence in src/service/server.rs
replace the old hardcoded deterministic plan source with the new manifest-backed planner
keep configured directSubmitSkill and ordinary LLM/browser orchestration behavior untouched

The resulting branch order must still be:

// 1. registry-backed deterministic scene (exact suffix only)
// 2. ordinary primary orchestration path
// 3. configured directSubmitSkill
// 4. compat LLM/runtime path

Step 4: Re-run the focused integration tests and verify they pass

Run:

cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
cargo test --test agent_runtime_test -- --nocapture

Expected: PASS, with no regression to the ordinary direct-submit or Zhihu paths.

Step 5: Commit the submit/bootstrap integration slice

Run:

git add src/agent/task_runner.rs src/service/server.rs tests/service_task_flow_test.rs tests/service_ws_session_test.rs tests/agent_runtime_test.rs
git commit -m "refactor: wire manifest scenes into submit bootstrap"

Expected: one commit that changes wiring only at the existing seams.

Task 6: Add the first manifest-driven `tq-lineloss-report` sample package inside this repo

Files:

Create: examples/generated_scene_platform/skills/tq-lineloss-report/scene.toml
Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/org-dictionary.json
Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.toml
Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.md
Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.js
Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js
Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/data-quality.md
Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/collection-flow.md
Modify: tests/deterministic_submit_test.rs
Modify: tests/scene_registry_test.rs

Step 1: Add the failing line-loss manifest and runtime-contract checks

Create the scene.toml shape in the in-repo sample package first and lock the migration expectations:

[scene]
id = "tq-lineloss-report"
skill = "tq-lineloss-report"
tool = "collect_lineloss"
kind = "browser_script"
version = "0.1.0"
category = "report_collection"

[manifest]
schema_version = "1"

[bootstrap]
expected_domain = "20.76.57.61"
target_url = "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor"
page_title_keywords = ["线损"]
requires_target_page = true

[deterministic]
suffix = "。。。"
include_keywords = ["线损", "月累计", "周累计", "统计分析"]
exclude_keywords = ["知乎"]

[[params]]
name = "org"
resolver = "dictionary_entity"
required = true
prompt_missing = "已命中台区线损报表技能，但缺少供电单位。"
prompt_ambiguous = "已命中台区线损报表技能，但供电单位存在歧义，请补充更完整名称。"

[params.resolver_config]
dictionary_ref = "references/org-dictionary.json"
output_label_field = "org_label"
output_code_field = "org_code"

[[params]]
name = "period"
resolver = "month_week_period"
required = true
prompt_missing = "已命中台区线损报表技能，但缺少统计周期。"
prompt_ambiguous = "已命中台区线损报表技能，但统计周期存在歧义，请补充更明确表达。"

[artifact]
type = "report-artifact"
success_status = ["ok", "partial", "empty"]
failure_status = ["blocked", "error"]

[postprocess]
exporter = "xlsx_report"
auto_open = "excel"
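To make the `org` parameter above concrete, here is a minimal dictionary_entity sketch that works on entries already loaded from references/org-dictionary.json. The plan does not specify that file's schema, so the `DictionaryEntry` shape (label, code, aliases) and every name in this block are assumptions for illustration only.

```rust
#[derive(Debug, PartialEq)]
enum EntityResolution {
    /// Feeds the fields named by output_label_field / output_code_field.
    Resolved { label: String, code: String },
    /// No entry matched; the caller would show `prompt_missing`.
    Missing,
    /// Several entries matched; the caller would show `prompt_ambiguous`.
    Ambiguous(Vec<String>),
}

/// Hypothetical entry shape; the real dictionary schema is not given.
struct DictionaryEntry {
    label: String,
    code: String,
    aliases: Vec<String>,
}

/// Matches an entry when the body contains its label or any alias.
fn resolve_entity(body: &str, dictionary: &[DictionaryEntry]) -> EntityResolution {
    let hits: Vec<&DictionaryEntry> = dictionary
        .iter()
        .filter(|e| {
            body.contains(e.label.as_str()) || e.aliases.iter().any(|a| body.contains(a.as_str()))
        })
        .collect();
    match hits.as_slice() {
        [] => EntityResolution::Missing,
        [one] => EntityResolution::Resolved {
            label: one.label.clone(),
            code: one.code.clone(),
        },
        many => EntityResolution::Ambiguous(many.iter().map(|e| e.label.clone()).collect()),
    }
}
```

Because the entries arrive as data, adding or renaming an org becomes an edit to the dictionary file rather than a Rust change.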

Also add a red JS assertion in the committed sample package proving the script returns column_defs and never re-parses raw natural-language org/period text:

test('buildBrowserEntrypointResult keeps canonical args and generic export fields only', async () => {
  const artifact = await buildBrowserEntrypointResult({
    expected_domain: '20.76.57.61',
    org_label: '国网兰州供电公司',
    org_code: '62401',
    period_mode: 'month',
    period_mode_code: '1',
    period_value: '2026-03',
    period_payload: { fdate: '2026-03' },
    instruction: '兰州公司 月累计 2026-03'
  }, fakeDeps);

  assert.equal(artifact.org.code, '62401');
  assert.ok(Array.isArray(artifact.column_defs));
  assert.equal(JSON.stringify(artifact).includes('兰州公司 月累计 2026-03'), false);
});

Step 2: Run the targeted line-loss tests and verify they fail

Run:

cargo test --test deterministic_submit_test -- --nocapture
node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"

Expected: FAIL because the runtime is not yet manifest-driven and the committed sample package does not yet expose the final manifest/dictionary/export contract.

Step 3: Implement the sample-scene migration without adding per-scene Rust branches

Required actions:

add scene.toml under the in-repo sample skill root and use the same layout the generator will emit
make tests and service-smoke config resolve skillsDir to examples/generated_scene_platform so the registry can discover the committed sample package without any external repo copy step
export the current org unit data into references/org-dictionary.json and make the resolver read that file instead of a Rust hardcoded list
update collect_lineloss.js so the returned report-artifact includes generic-platform fields needed by report_xlsx_export.rs
keep collection logic in JS; do not move line-loss business semantics back into Rust
write SKILL.toml / SKILL.md / references docs into the sample package to describe canonical args and the manifest-driven contract
keep any external staging-repo publish step out of scope for this branch; this task only commits the in-repo sample package

Step 4: Re-run the line-loss tests and verify they pass

Run:

cargo test --test deterministic_submit_test -- --nocapture
node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"

Expected: PASS, including the new missing-period prompt behavior and the new manifest-driven sample-package shape.
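
The missing-period prompt behavior can be pictured with a minimal Rust sketch. It assumes a hypothetical `Resolution` type and `resolve_canonical_params` helper (neither name comes from the repo) and only illustrates the rule that an unresolved canonical parameter produces a prompt rather than a silent default.

```rust
/// Hypothetical result of canonical-parameter resolution (illustrative names only).
#[derive(Debug, PartialEq)]
enum Resolution {
    /// Every canonical parameter was resolved explicitly from the request.
    Ready {
        org_code: String,
        period_mode: String,
        period_value: String,
    },
    /// At least one parameter is missing; the caller should prompt for these.
    NeedsPrompt(Vec<&'static str>),
}

/// Never fills in a default: any `None` ends up in the prompt list.
fn resolve_canonical_params(
    org_code: Option<&str>,
    period_mode: Option<&str>,
    period_value: Option<&str>,
) -> Resolution {
    match (org_code, period_mode, period_value) {
        (Some(org), Some(mode), Some(value)) => Resolution::Ready {
            org_code: org.to_string(),
            period_mode: mode.to_string(),
            period_value: value.to_string(),
        },
        _ => {
            let mut missing = Vec::new();
            if org_code.is_none() {
                missing.push("org");
            }
            if period_mode.is_none() {
                missing.push("period_mode");
            }
            if period_value.is_none() {
                missing.push("period_value");
            }
            Resolution::NeedsPrompt(missing)
        }
    }
}
```

With this shape, a request that names the org and the month mode but no period resolves to `NeedsPrompt(["period_value"])`, which the runtime can turn into the user-facing prompt.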

Step 5: Commit the line-loss sample migration

Run:

git add examples/generated_scene_platform/skills/tq-lineloss-report tests/deterministic_submit_test.rs tests/scene_registry_test.rs
git commit -m "feat: add manifest-driven lineloss sample package"

Expected: one commit that adds the first committed manifest-driven sample package and updates runtime expectations around it.

Task 7: Write the required `tq-lineloss` lessons-learned artifacts and load them as generator rules

Files:

Create: docs/superpowers/references/tq-lineloss-lessons-learned.md
Create: docs/superpowers/references/tq-lineloss-lessons-learned.toml
Create: tests/generated_scene_lessons_test.rs
Create: src/generated_scene/mod.rs
Create: src/generated_scene/lessons.rs
Modify: src/lib.rs

Step 1: Write the failing lessons-rules test before the docs

Add tests/generated_scene_lessons_test.rs that requires all mandatory structured rule sections to exist. In the same red step, wire the empty src/generated_scene/mod.rs and src/lib.rs exports needed so this test fails on missing implementation/data, not on missing module visibility:

#[test]
fn lineloss_lessons_toml_declares_required_generator_rules() {
    let lessons = load_generation_lessons("docs/superpowers/references/tq-lineloss-lessons-learned.toml").unwrap();

    assert!(lessons.routing.require_exact_suffix);
    assert!(lessons.routing.unsupported_scene_fail_closed);
    assert!(lessons.canonical_params.require_explicit_period);
    assert!(lessons.bootstrap.require_expected_domain);
    assert!(lessons.bootstrap.require_target_url);
    assert!(lessons.artifact.require_report_artifact);
    assert!(lessons.validation.require_pipe_and_ws_checks);
    assert!(lessons.validation.require_manual_service_console_smoke);
}

Step 2: Run the lessons test and verify it fails

Run:

cargo test --test generated_scene_lessons_test -- --nocapture

Expected: FAIL because the lessons loader and TOML file do not exist yet.

Step 3: Implement the loader and write both lessons artifacts

Implement the loader and complete the minimal module wiring (src/generated_scene/mod.rs, src/lib.rs) in this task so cargo test --test generated_scene_lessons_test is buildable before Task 8. Use a TOML shape explicit enough for generator enforcement, for example:

[routing]
require_exact_suffix = true
unsupported_scene_fail_closed = true
ambiguity_fail_closed = true

[canonical_params]
require_dictionary_entity_for_org = true
require_explicit_period = true
forbid_hidden_page_defaults = true

[bootstrap]
require_expected_domain = true
require_target_url = true
prefer_page_context_when_present = true

[artifact]
require_report_artifact = true
require_column_defs_for_export = true
rust_side_xlsx_export_when_postprocess_xlsx = true

[validation]
require_pipe_and_ws_checks = true
require_manual_service_console_smoke = true
require_callback_host_timeout_notes = true

The Markdown companion must explain the why behind those rules: deterministic routing pitfalls, canonical parameter pitfalls, bootstrap target pitfalls, pipe/ws differences, callback-host timeout lessons, and Rust-side export constraints.
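
One way the generator could enforce those rules after loading is sketched below. This is a hedged illustration, not the loader itself: the struct fields mirror only the rules asserted by the lessons test, deserialization (for example via serde and toml) is omitted, and `disabled_mandatory_rules` and `all_enabled` are hypothetical helper names.

```rust
// Rule sections mirroring the fields asserted by the lessons test.
struct Routing {
    require_exact_suffix: bool,
    unsupported_scene_fail_closed: bool,
}
struct CanonicalParams {
    require_explicit_period: bool,
}
struct Bootstrap {
    require_expected_domain: bool,
    require_target_url: bool,
}
struct Artifact {
    require_report_artifact: bool,
}
struct Validation {
    require_pipe_and_ws_checks: bool,
    require_manual_service_console_smoke: bool,
}
struct GenerationLessons {
    routing: Routing,
    canonical_params: CanonicalParams,
    bootstrap: Bootstrap,
    artifact: Artifact,
    validation: Validation,
}

/// Hypothetical helper: dotted names of mandatory rules that are switched off.
fn disabled_mandatory_rules(lessons: &GenerationLessons) -> Vec<&'static str> {
    let checks = [
        ("routing.require_exact_suffix", lessons.routing.require_exact_suffix),
        ("routing.unsupported_scene_fail_closed", lessons.routing.unsupported_scene_fail_closed),
        ("canonical_params.require_explicit_period", lessons.canonical_params.require_explicit_period),
        ("bootstrap.require_expected_domain", lessons.bootstrap.require_expected_domain),
        ("bootstrap.require_target_url", lessons.bootstrap.require_target_url),
        ("artifact.require_report_artifact", lessons.artifact.require_report_artifact),
        ("validation.require_pipe_and_ws_checks", lessons.validation.require_pipe_and_ws_checks),
        (
            "validation.require_manual_service_console_smoke",
            lessons.validation.require_manual_service_console_smoke,
        ),
    ];
    checks
        .iter()
        .filter(|(_, enabled)| !*enabled)
        .map(|(name, _)| *name)
        .collect()
}

/// Convenience constructor for the fully hardened rule set shown in the TOML above.
fn all_enabled() -> GenerationLessons {
    GenerationLessons {
        routing: Routing { require_exact_suffix: true, unsupported_scene_fail_closed: true },
        canonical_params: CanonicalParams { require_explicit_period: true },
        bootstrap: Bootstrap { require_expected_domain: true, require_target_url: true },
        artifact: Artifact { require_report_artifact: true },
        validation: Validation {
            require_pipe_and_ws_checks: true,
            require_manual_service_console_smoke: true,
        },
    }
}
```

A generator built this way could refuse to emit a package whenever the returned list is non-empty, which keeps the hardening rules from being weakened silently by an edited lessons file.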

Step 4: Re-run the lessons tests and verify they pass

Run:

cargo test --test generated_scene_lessons_test -- --nocapture

Expected: PASS.

Step 5: Commit the lessons artifacts and loader

Run:

git add docs/superpowers/references/tq-lineloss-lessons-learned.md docs/superpowers/references/tq-lineloss-lessons-learned.toml src/generated_scene/mod.rs src/generated_scene/lessons.rs src/lib.rs tests/generated_scene_lessons_test.rs
git commit -m "docs: add lineloss generation lessons"

Expected: one commit that makes the line-loss lessons machine-consumable and reviewable.

Task 8: Build the v1 source analyzer, package generator, and CLI entry

Files:

Create: src/generated_scene/analyzer.rs
Create: src/generated_scene/generator.rs
Create: src/bin/sg_scene_generate.rs
Modify: src/generated_scene/mod.rs
Modify: src/lib.rs
Create: tests/scene_generator_test.rs
Create: tests/fixtures/generated_scene/report_collection/index.html
Create: tests/fixtures/generated_scene/report_collection/js/report.js
Create: tests/fixtures/generated_scene/non_report/index.html
Create: tests/fixtures/scene_source/tq_lineloss/index.html
Create: tests/fixtures/scene_source/tq_lineloss/js/collect.js

Step 1: Add the failing analyzer/generator tests with hermetic fixtures

Create fixture-backed tests like:

#[test]
fn analyzer_classifies_supported_report_collection_source() {
    let analysis = analyze_scene_source(Path::new("tests/fixtures/generated_scene/report_collection")).unwrap();
    assert_eq!(analysis.scene_kind, SceneKind::ReportCollection);
    assert_eq!(analysis.tool_kind, ToolKind::BrowserScript);
    assert!(analysis.bootstrap.target_url.is_some());
    assert!(analysis.collection_entry_script.is_some());
}

#[test]
fn generator_writes_registration_ready_package_with_scene_toml() {
    let output_root = tempdir();
    generate_scene_package(GenerateSceneRequest {
        source_dir: PathBuf::from("tests/fixtures/generated_scene/report_collection"),
        scene_id: "sample-report-scene".to_string(),
        scene_name: "示例报表场景".to_string(),
        output_root: output_root.path().to_path_buf(),
        lessons_path: PathBuf::from("docs/superpowers/references/tq-lineloss-lessons-learned.toml"),
    }).unwrap();

    assert!(output_root.path().join("skills/sample-report-scene/SKILL.toml").exists());
    assert!(output_root.path().join("skills/sample-report-scene/scene.toml").exists());
    assert!(output_root.path().join("skills/sample-report-scene/scripts/collect_sample_report_scene.js").exists());
    assert!(output_root.path().join("skills/sample-report-scene/scripts/collect_sample_report_scene.test.js").exists());
}

#[test]
fn generator_rejects_non_report_source_with_explicit_reason() {
    let err = analyze_scene_source(Path::new("tests/fixtures/generated_scene/non_report")).unwrap_err();
    assert!(err.to_string().contains("report/collection browser_script only"));
}

Step 2: Run the generator tests and verify they fail

Run:

cargo test --test scene_generator_test -- --nocapture

Expected: FAIL because the analyzer, generator, fixtures, and CLI do not exist yet.

Step 3: Implement the analyzer, generator, CLI, and the source fixtures used by final smoke

Implementation rules:

create the generator test fixtures under tests/fixtures/generated_scene/*
create the hermetic source-smoke fixtures under tests/fixtures/scene_source/tq_lineloss/* so Task 9 can run without any external scenario directory
analyzer must refuse unsupported/non-report scenes explicitly instead of generating broken packages
generator must emit scene.toml inside the generated skill root
generator must use tq-lineloss-lessons-learned.toml as a required input so the same hardening rules apply to future scenes
generator/runtime coupling must stay at the file-contract level only
CLI should use an explicit hand-written argument parser; do not add a new heavy dependency for it

Suggested CLI shape:

cargo run --bin sg_scene_generate -- \
  --source-dir <scenario-dir> \
  --scene-id <scene-id> \
  --scene-name <display-name> \
  --output-root <skill-staging-root> \
  --lessons docs/superpowers/references/tq-lineloss-lessons-learned.toml
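
A dependency-free parser for the five flags above might look like the following sketch. It uses only the standard library; `CliArgs` and `parse_args` are hypothetical names, and the error wording is illustrative rather than the tool's actual output.

```rust
use std::path::PathBuf;

/// Parsed form of the five flags in the suggested CLI shape (hypothetical type).
#[derive(Debug, PartialEq)]
struct CliArgs {
    source_dir: PathBuf,
    scene_id: String,
    scene_name: String,
    output_root: PathBuf,
    lessons: PathBuf,
}

/// Parses `--flag value` pairs; unknown flags and missing values are errors.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliArgs, String> {
    let mut source_dir: Option<String> = None;
    let mut scene_id: Option<String> = None;
    let mut scene_name: Option<String> = None;
    let mut output_root: Option<String> = None;
    let mut lessons: Option<String> = None;

    let mut iter = args.into_iter();
    while let Some(flag) = iter.next() {
        // Pick the destination first so unknown flags are reported as such.
        let slot = match flag.as_str() {
            "--source-dir" => &mut source_dir,
            "--scene-id" => &mut scene_id,
            "--scene-name" => &mut scene_name,
            "--output-root" => &mut output_root,
            "--lessons" => &mut lessons,
            other => return Err(format!("unknown flag: {}", other)),
        };
        let value = iter
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        *slot = Some(value);
    }

    Ok(CliArgs {
        source_dir: PathBuf::from(source_dir.ok_or("missing --source-dir")?),
        scene_id: scene_id.ok_or("missing --scene-id")?,
        scene_name: scene_name.ok_or("missing --scene-name")?,
        output_root: PathBuf::from(output_root.ok_or("missing --output-root")?),
        lessons: PathBuf::from(lessons.ok_or("missing --lessons")?),
    })
}
```

In a binary, this would typically be called as `parse_args(std::env::args().skip(1))`, with the error string printed alongside a usage line before exiting non-zero.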

Expected outputs under <output-root>:

skills/<scene-id>/SKILL.toml
skills/<scene-id>/SKILL.md
skills/<scene-id>/scene.toml
skills/<scene-id>/references/*.md
skills/<scene-id>/scripts/*.js
skills/<scene-id>/scripts/*.test.js
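
The layout above can be expressed as a small path helper, sketched here under assumptions. The `collect_<scene_id>` script stem with hyphens mapped to underscores is inferred from the generator test's `collect_sample_report_scene.js` assertion, and `expected_package_files` is a hypothetical name; the `references/*.md` entries are left out because their file names are not specified.

```rust
use std::path::{Path, PathBuf};

/// Lists the fixed files a generated package is expected to contain.
/// Assumption: script names follow `collect_<scene_id with '-' replaced by '_'>`.
fn expected_package_files(output_root: &Path, scene_id: &str) -> Vec<PathBuf> {
    let skill_root = output_root.join("skills").join(scene_id);
    let script_stem = format!("collect_{}", scene_id.replace('-', "_"));
    vec![
        skill_root.join("SKILL.toml"),
        skill_root.join("SKILL.md"),
        skill_root.join("scene.toml"),
        skill_root.join("scripts").join(format!("{}.js", script_stem)),
        skill_root.join("scripts").join(format!("{}.test.js", script_stem)),
    ]
}
```

A generator test could iterate over this list and assert that each path exists, instead of repeating the joins by hand.
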

Step 4: Re-run the generator tests and verify they pass

Run:

cargo test --test scene_generator_test -- --nocapture

Expected: PASS.

Step 5: Commit the generator slice

Run:

git add src/lib.rs src/generated_scene/mod.rs src/generated_scene/analyzer.rs src/generated_scene/generator.rs src/bin/sg_scene_generate.rs tests/scene_generator_test.rs tests/fixtures/generated_scene tests/fixtures/scene_source/tq_lineloss
git commit -m "feat: add generated scene package generator"

Expected: one commit that adds the in-repo v1 generator capability.

Task 9: Run the final verification sweep, smoke the real runtime, and remove unused one-off scene code

Files:

Delete if unused after green verification: src/compat/tq_lineloss/org_units.rs
Delete if unused after green verification: src/compat/tq_lineloss/org_resolver.rs
Delete if unused after green verification: src/compat/tq_lineloss/period_resolver.rs
Delete or reduce to shim only if unused after green verification: src/compat/lineloss_xlsx_export.rs
Modify: src/compat/mod.rs
Modify: src/lib.rs

Step 1: Remove only the legacy one-off files that are provably unused

Before deleting anything, prove the new path covers the old responsibilities:

cargo test --test deterministic_submit_test -- --nocapture
cargo test --test scene_registry_test -- --nocapture
cargo test --test report_artifact_postprocess_test -- --nocapture

Then delete the old line-loss-only resolver/export files only if cargo test still passes and a repo-wide grep shows they are no longer referenced.

Step 2: Run the full automated verification sweep

Run:

node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"
cargo test --test scene_registry_test -- --nocapture
cargo test --test deterministic_submit_test -- --nocapture
cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test generated_scene_lessons_test -- --nocapture
cargo test --test scene_generator_test -- --nocapture
cargo test --test agent_runtime_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
cargo test --test compat_runtime_test -- --nocapture
cargo test --test compat_config_test -- --nocapture
cargo build --bin sgclaw --bin sg_claw --bin sg_scene_generate

Expected: PASS.

Step 3: Run the required hermetic generator smoke and keep the real external source smoke optional

Run the required in-repo smoke first:

tmp_out="$(mktemp -d)"
cargo run --bin sg_scene_generate -- \
  --source-dir tests/fixtures/scene_source/tq_lineloss \
  --scene-id tq-lineloss-report \
  --scene-name "台区线损月周累计线损率统计分析" \
  --output-root "$tmp_out" \
  --lessons docs/superpowers/references/tq-lineloss-lessons-learned.toml

Expected: generator emits a complete package into $tmp_out using only in-repo fixtures.

Optional manual follow-up after the required smoke is green:

if the external scenario directory is available on the implementer's machine, re-run the same command against the real source tree for additional confidence
if it is unavailable, do not block the branch on that machine-specific path

Step 4: Run the real service-console smoke checks with sg_claw.exe semantics in mind

Manual verification checklist:

write or reuse a repo-local sgclaw_config.json whose skillsDir points to examples/generated_scene_platform
rebuild and run sg_claw/sg_claw.exe with that config so the runtime-scanned skills root is reproducible
on the real line-loss page, submit 兰州公司台区线损大数据月累计线损率统计分析 2026-03。。。
confirm the request bootstraps the manifest target_url, uses the manifest expected_domain, and returns the line-loss report artifact through the generic scene runtime
submit 兰州公司台区线损大数据月累计线损率统计分析。。。 and confirm the runtime prompts for missing period instead of defaulting
submit 打开知乎热榜 and confirm the ordinary Zhihu path still behaves as before
submit 打开知乎热榜。。。 and confirm the deterministic runtime fails closed with the unsupported-scene prompt instead of falling into the Zhihu path

Step 5: Commit the cleanup + verified platform state

Run:

git add src/compat/mod.rs src/lib.rs src/compat src/generated_scene src/scene_contract docs/superpowers/references tests examples/generated_scene_platform
git commit -m "feat: add generated scene skill platform"

Expected: one final commit after the full automated and manual verification passes.

Verification Checklist

Registry and manifest contract

cargo test --test scene_registry_test -- --nocapture

Expected:

scene.toml loads from the skill root
only schema_version = "1" passes
duplicate scene.id fails with both manifest paths in the error
non-browser_script or non-report_collection v1 scenes are rejected cleanly
the registry still scans exactly one resolved skillsDir
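
The schema-version and duplicate-id expectations above can be sketched as follows. This is an illustration under assumptions: `ManifestHeader` stands in for an already-parsed subset of scene.toml, and `RegistryError` and `build_registry` are hypothetical names rather than the repo's actual types.

```rust
use std::collections::HashMap;
use std::fmt;
use std::path::PathBuf;

/// Hypothetical, already-parsed subset of one scene.toml manifest.
struct ManifestHeader {
    schema_version: String,
    scene_id: String,
    manifest_path: PathBuf,
}

#[derive(Debug, PartialEq)]
enum RegistryError {
    UnsupportedSchema { path: PathBuf, found: String },
    DuplicateSceneId { id: String, first: PathBuf, second: PathBuf },
}

impl fmt::Display for RegistryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RegistryError::UnsupportedSchema { path, found } => write!(
                f,
                "{}: unsupported schema_version {:?}, expected \"1\"",
                path.display(),
                found
            ),
            // Both manifest paths are named so the conflict is easy to locate.
            RegistryError::DuplicateSceneId { id, first, second } => write!(
                f,
                "duplicate scene id {:?} in {} and {}",
                id,
                first.display(),
                second.display()
            ),
        }
    }
}

/// Builds an id -> manifest path map, failing on the first violation.
fn build_registry(
    headers: Vec<ManifestHeader>,
) -> Result<HashMap<String, PathBuf>, RegistryError> {
    let mut by_id: HashMap<String, PathBuf> = HashMap::new();
    for header in headers {
        if header.schema_version != "1" {
            return Err(RegistryError::UnsupportedSchema {
                path: header.manifest_path,
                found: header.schema_version,
            });
        }
        if let Some(first) = by_id.get(&header.scene_id) {
            return Err(RegistryError::DuplicateSceneId {
                id: header.scene_id,
                first: first.clone(),
                second: header.manifest_path,
            });
        }
        by_id.insert(header.scene_id, header.manifest_path);
    }
    Ok(by_id)
}
```

Returning a structured error and formatting it in `Display` keeps the test able to assert on both paths without parsing free-form text.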

Deterministic routing contract

cargo test --test deterministic_submit_test -- --nocapture

Expected:

exact 。。。 suffix only
no-suffix behavior unchanged
unsupported suffix-scene requests fail closed
multi-match ambiguity fails closed
a missing org, mode, or period triggers a prompt instead of a silent default
page context may improve scoring but cannot cause silent guessing on unresolved ambiguity
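
The suffix and fail-closed expectations above can be summarized in a small routing sketch. It is illustrative only: `SceneEntry` with all-keywords matching is a hypothetical stand-in for the registry's real scoring, and `Route` and `route` are assumed names.

```rust
/// The deterministic-routing marker: three ideographic full stops.
const DETERMINISTIC_SUFFIX: &str = "。。。";

#[derive(Debug, PartialEq)]
enum Route<'a> {
    /// No exact suffix: the request stays on the ordinary, non-scene path.
    Ordinary,
    /// Exactly one registered scene matched the suffixed request.
    Scene(&'a str),
    /// Suffixed request that matched no registered scene (fail closed).
    UnsupportedScene,
    /// Suffixed request that matched several scenes (fail closed, never guess).
    Ambiguous(Vec<&'a str>),
}

/// Hypothetical scene entry: an id plus keywords that must all appear.
struct SceneEntry<'a> {
    id: &'a str,
    keywords: &'a [&'a str],
}

fn route<'a>(request: &str, scenes: &[SceneEntry<'a>]) -> Route<'a> {
    // Only the exact marker counts; look-alikes such as ASCII "..." do not.
    let body = match request.strip_suffix(DETERMINISTIC_SUFFIX) {
        Some(body) => body,
        None => return Route::Ordinary,
    };
    let matches: Vec<&'a str> = scenes
        .iter()
        .filter(|scene| scene.keywords.iter().all(|kw| body.contains(*kw)))
        .map(|scene| scene.id)
        .collect();
    match matches.len() {
        0 => Route::UnsupportedScene,
        1 => Route::Scene(matches[0]),
        _ => Route::Ambiguous(matches),
    }
}
```

The key design point is that once the suffix is present, every outcome other than a single match is an explicit prompt-producing variant, so a suffixed request can never fall through to the LLM or Zhihu paths.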

Generic report-artifact handling

cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test -- --nocapture

Expected:

ok / partial / empty map to success
blocked / error map to failure
generic XLSX export works from artifact fields, not line-loss-only Rust code
configured directSubmitSkill keeps working on the shared artifact interpreter
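
The status mapping above is small enough to sketch directly. `Outcome` and `interpret_status` are hypothetical names, and treating an unrecognized status as a failure is an assumption made here for fail-closed consistency, not something the plan states.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    Failure,
}

/// Maps a report-artifact status string to a task outcome.
fn interpret_status(status: &str) -> Outcome {
    match status {
        "ok" | "partial" | "empty" => Outcome::Success,
        "blocked" | "error" => Outcome::Failure,
        // Assumption: unknown statuses fail closed rather than pass silently.
        _ => Outcome::Failure,
    }
}
```

Keeping this mapping in one function is what lets both manifest-driven scenes and a configured directSubmitSkill share the same interpreter.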

Service submit/bootstrap path

cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture

Expected:

deterministic manifest scenes route before LLM
bootstrap target resolution uses manifest target_url / expected_domain
callback-host/browser-ws paths still receive the correct request URL
non-deterministic Zhihu and direct-submit flows remain intact

Generator and lessons

cargo test --test generated_scene_lessons_test -- --nocapture
cargo test --test scene_generator_test -- --nocapture
cargo build --bin sg_scene_generate

Expected:

lessons TOML contains all required routing/param/bootstrap/artifact/validation rules
analyzer only accepts v1 report/collection browser-script fixtures
generator writes a complete package with scene.toml and JS test scaffold
generator/runtime share only the explicit file contract, not hidden Rust internals

Real runtime smoke

Manual checklist:

sg_claw.exe / service console can still run the line-loss deterministic path
missing-period deterministic line-loss requests prompt instead of defaulting
plain Zhihu requests still avoid the scene platform
suffixed unsupported requests fail closed
line-loss export still opens through the generic postprocess path when configured

Notes For The Engineer

The paired approved spec is docs/superpowers/specs/2026-04-15-generated-scene-skill-platform-design.md.
The current repo branch name for the ws baseline is feature/claw-ws, even though the design prose says ws.
Do not reintroduce the old scene-registry experiment that was explicitly cleaned off the ws branch. This plan deliberately keeps the new runtime under compat and a shared serializable contract instead of reviving the deleted scene-only branch structure blindly.
Keep scene.toml inside each skill package root. The separate skill_staging/scenes/*/scene.json tree remains legacy metadata only in this plan.
Keep the generator extractable by holding the boundary at scene.toml, generated package layout, and lessons TOML rules. Avoid runtime code that reaches into generator-only internals.
If a real scenario directory does not fit the v1 report/collection/browser-script envelope, the analyzer/generator must refuse it explicitly instead of emitting a half-valid package.
Do not add a generic login/session platform here. Capture that need in docs if discovered, but keep it out of this implementation slice.
