Generated Scene Skill Platform Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Add a manifest-driven generated-scene platform that discovers staged report/collection browser_script scenes, routes deterministic 。。。 requests through generic registry/resolver logic, migrates tq-lineloss-report off one-off Rust branches, and ships a first in-repo generator that outputs registration-ready scene packages with minimal or zero per-scene Rust changes.
Architecture: Keep the existing submit branch shape in src/agent/task_runner.rs, but replace the line-loss-specific deterministic branch with a thin adapter over a generic scene registry, deterministic dispatcher, generic report-artifact interpreter, and generic XLSX postprocess path. Keep the generator separate from runtime internals by making scene.toml plus the lessons-learned TOML the only stable generator/runtime contract; generator code lives in its own module and binary, while runtime code stays under the existing compat submit/bootstrap seams.
Tech Stack: Rust 2021, serde, serde_json, toml, existing browser_script runtime and callback-host/browser-backend seams, node:test for staged JS, Cargo integration tests, filesystem-based package generation.
Execution Context
- Branch from the repo's current ws baseline branch, which is feature/claw-ws in this checkout today. Do not implement on that branch directly; create a new feature branch from its HEAD.
- Do not create a worktree unless the user explicitly asks. Branch isolation is required; worktree isolation is not.
- Keep skillsDir as the existing single resolved path. The new scene registry must scan inside that one resolved skills root instead of adding array-style scene roots or a second config field.
- For this branch's automated tests and real smokes, use a repo-local skillsDir override that points at examples/generated_scene_platform. That still preserves the single-root contract because the runtime scans one resolved root whose skills/ child contains the committed sample package.
- Put the new runtime registration manifest at <skill-root>/scene.toml. Keep existing skill_staging/scenes/*/scene.json files for legacy staging/UI metadata and do not move runtime dispatch policy back into scene.json.
- Keep every required deliverable for this plan inside the current claw-new repo so the branch can be built, tested, and committed independently. The first committed sample package should live under examples/generated_scene_platform/skills/; publishing the same package into any external skills/staging repo is a separate follow-up, not part of this branch.
- V1 scope is locked to category = "report_collection", kind = "browser_script", artifact.type = "report-artifact". Unsupported scene types must fail fast instead of partially working.
- Deterministic invocation remains exact-suffix-only: only raw instructions ending with the exact 。。。 suffix enter the scene dispatcher.
- Never use hidden page defaults for required canonical parameters. Missing org, missing month/week mode, or missing period must prompt and stop.
- Do not add a generic login/session subsystem in this plan.
- Preserve current non-platform flows: Zhihu/LLM, configured directSubmitSkill, and ordinary browser-attached orchestration must remain behaviorally unchanged unless an explicit regression test says otherwise.
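The exact-suffix rule above can be sketched as a single gate function. This is a minimal illustration rather than the repo's implementation; the constant and function names are hypothetical, and it assumes "exact" means no trimming of trailing whitespace.

```rust
/// The deterministic suffix: three ideographic full stops (U+3002).
/// The constant name is an assumption made for this sketch.
const DETERMINISTIC_SUFFIX: &str = "。。。";

/// Returns the instruction body when `raw` ends with the exact suffix,
/// or `None` when the request must stay on the ordinary (non-scene) path.
/// No trimming is applied, so trailing whitespace or ASCII dots do not qualify.
pub fn strip_deterministic_suffix(raw: &str) -> Option<&str> {
    raw.strip_suffix(DETERMINISTIC_SUFFIX)
}
```

Returning the stripped body lets later stages score keywords and resolve parameters without seeing the suffix again.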
File Map
Core runtime and contract files
- Create: src/scene_contract/mod.rs - shared serializable manifest contract used by both runtime and generator
- Create: src/scene_contract/manifest.rs - scene.toml schema types, schema-version validation helpers, artifact/postprocess enums
- Create: src/compat/scene_platform/mod.rs - exports the registry, dispatch, and resolver units
- Create: src/compat/scene_platform/registry.rs - scans the single resolved skillsDir, loads <skill-root>/scene.toml, validates duplicates and runtime compatibility
- Create: src/compat/scene_platform/dispatch.rs - deterministic candidate scoring, ambiguity fail-closed behavior, canonical param resolution, executable scene plan creation
- Create: src/compat/scene_platform/resolvers.rs - reusable resolver types for dictionary_entity, month_week_period, fixed_enum, and literal_passthrough
- Create: src/compat/report_artifact.rs - generic report-artifact parsing, status mapping, summary building, and export-readiness helpers
- Create: src/compat/report_xlsx_export.rs - generic XLSX exporter for any report-artifact with column_defs/columns + rows
- Modify: src/lib.rs - export new shared/runtime/generator modules and any CLI helpers needed by tests
- Modify: src/compat/mod.rs - export the new scene-platform and report-artifact modules
- Modify: src/compat/deterministic_submit.rs - keep the public API shape, but make it registry/manifest-driven instead of line-loss-hardcoded
- Modify: src/compat/direct_skill_runtime.rs - reuse the generic report-artifact interpreter so direct-submit and scene-submit summarize/status-map the same way
- Modify: src/agent/task_runner.rs - keep branch order, but call the new registry-backed deterministic planner before ordinary orchestration/LLM
- Modify: src/service/server.rs - keep bootstrap precedence shape, but let deterministic plans source target_url/expected_domain from scene manifests instead of hardcoded constants
Generator files
- Create: src/generated_scene/mod.rs - generator entrypoints shared by tests and CLI
- Create: src/generated_scene/analyzer.rs - source directory inspection for v1 report/collection browser_script scenes
- Create: src/generated_scene/generator.rs - template rendering and package writing into an output staging root
- Create: src/generated_scene/lessons.rs - loads and validates tq-lineloss-lessons-learned.toml as generation constraints
- Create: src/bin/sg_scene_generate.rs - CLI entry for sgClaw's in-repo scene generator capability
In-repo sample package and reference assets
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/scene.toml - first committed manifest-driven sample scene package used by runtime and generator tests in this repo
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/org-dictionary.json - external dictionary data for the dictionary_entity resolver fixture
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.toml - committed sample browser-script tool contract aligned with the manifest-driven runtime
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.md - committed sample documentation for canonical args, artifact contract, and runtime expectations
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.js - committed sample collection script with generic-platform artifact fields
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js - committed JS contract tests for canonical args and artifact shape
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/data-quality.md - committed sample data-quality notes aligned with manifest-driven output rules
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/collection-flow.md - committed sample bootstrap/collection-flow notes
- Create: tests/fixtures/scene_source/tq_lineloss/index.html - hermetic in-repo source fixture for required analyzer/generator smoke coverage
- Create: tests/fixtures/scene_source/tq_lineloss/js/collect.js - hermetic in-repo source fixture JS for analyzer/generator smoke coverage
Repo-local runtime discovery path for validation
- Use examples/generated_scene_platform as the repo-local skillsDir override root during tests and manual smokes.
- The runtime still scans one resolved root only; it just resolves that root to examples/generated_scene_platform, whose skills/ child contains the committed sample package.
- Add or reuse a tiny repo-local config fixture such as tmp/generated_scene_platform_sgclaw_config.json or an equivalent test helper so the validation steps all point at the same reproducible skillsDir.
- Do not require external staging repos to make the manifest-driven runtime discoverable during this branch.
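The single-root discovery described above can be sketched with the standard library alone. This is an illustrative sketch, not the planned registry code: the function name is hypothetical, and it assumes the caller passes the directory whose immediate children are skill roots (for the sample layout, examples/generated_scene_platform/skills).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects `<skill-root>/scene.toml` paths for every immediate child
/// directory of `skills_dir`. Skill directories without a manifest are
/// skipped, and nothing outside `skills_dir` is consulted.
pub fn find_scene_manifests(skills_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut manifests = Vec::new();
    for entry in fs::read_dir(skills_dir)? {
        let skill_root = entry?.path();
        if !skill_root.is_dir() {
            continue; // stray files in the root are not skill packages
        }
        let manifest = skill_root.join("scene.toml");
        if manifest.is_file() {
            manifests.push(manifest);
        }
    }
    // read_dir order is platform-dependent; sort for reproducible results.
    manifests.sort();
    Ok(manifests)
}
```

Because the scan never leaves the one resolved root, pointing skillsDir at the repo-local example directory is enough to make the committed sample package discoverable in tests.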
External publish target kept out of scope for this branch
- Do not modify external paths like D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/... in this plan.
- If the user later wants the generated sample published into that external staging repo, do it as a separate follow-up after this branch is green.
Platform-reference files
Tests and fixtures
- Create: tests/scene_registry_test.rs - manifest loading, duplicate detection, schema validation, tool compatibility checks
- Create: tests/report_artifact_postprocess_test.rs - generic report-artifact parsing and XLSX postprocess coverage
- Create: tests/generated_scene_lessons_test.rs - lessons-TOML shape and required-rule coverage
- Create: tests/scene_generator_test.rs - analyzer + generator integration coverage using hermetic fixtures
- Create: tests/fixtures/generated_scene/report_collection/index.html - supported v1 report-scene fixture
- Create: tests/fixtures/generated_scene/report_collection/js/report.js - supported fixture source hints for analyzer tests
- Create: tests/fixtures/generated_scene/non_report/index.html - unsupported fixture proving fail-fast behavior
- Modify: tests/deterministic_submit_test.rs - migrate from hardcoded line-loss expectations to registry-driven deterministic behavior
- Modify: tests/agent_runtime_test.rs - keep direct-submit behavior intact while sharing generic report-artifact summaries
- Modify: tests/service_task_flow_test.rs - task-runner/bootstrap regressions for manifest-driven deterministic scenes
- Modify: tests/service_ws_session_test.rs - callback-host bootstrap target regression for manifest-driven deterministic submit when the browser-ws path is active
Legacy files to delete only after green verification proves they are unused
- Delete: src/compat/tq_lineloss/org_units.rs
- Delete: src/compat/tq_lineloss/org_resolver.rs
- Delete: src/compat/tq_lineloss/period_resolver.rs
- Delete or reduce to a compatibility shim only if still needed: src/compat/lineloss_xlsx_export.rs
Task 1: Create the implementation branch and lock the layout boundaries
Files:
- Verify only
- Step 1: Switch to the ws baseline branch and create a new platform branch
Run:
git switch feature/claw-ws
git switch -c feature/generated-scene-skill-platform
Expected: git status -sb shows a clean new branch rooted at the current ws baseline, not feature/claw-ws itself.
- Step 2: Verify the current single-root skills layout before coding
Run:
cargo test --test compat_config_test ws_cleanup_resolves_single_configured_skills_dir -- --nocapture
Expected: PASS, proving the repo still uses one resolved skillsDir path and the platform work must build on that instead of introducing array-style roots.
- Step 3: Write down the two non-negotiable layout decisions in the first registry test scaffold
The very first red test file (tests/scene_registry_test.rs) must assume:
// runtime manifest location:
let manifest_path = skill_root.join("scene.toml");
// legacy scene.json stays outside runtime dispatch ownership:
assert!(skill_root.join("scene.toml").exists());
assert!(!manifest_path.ends_with("skill_staging/scenes/.../scene.json"));
This prevents the implementation from drifting back toward scene.json routing or multi-root config.
Task 2: Add the shared scene.toml contract and registry loader
Files:
- Create: src/scene_contract/mod.rs
- Create: src/scene_contract/manifest.rs
- Create: src/compat/scene_platform/mod.rs
- Create: src/compat/scene_platform/registry.rs
- Modify: src/lib.rs
- Modify: src/compat/mod.rs
- Create: tests/scene_registry_test.rs
- Step 1: Write the failing registry tests first
Add tests/scene_registry_test.rs with focused red cases like:
#[test]
fn registry_loads_scene_manifest_from_skill_root() {
    let skill_root = temp_skill_with_scene_manifest(r#"
[scene]
id = "tq-lineloss-report"
skill = "tq-lineloss-report"
tool = "collect_lineloss"
kind = "browser_script"
version = "0.1.0"
category = "report_collection"
[manifest]
schema_version = "1"
[bootstrap]
expected_domain = "20.76.57.61"
target_url = "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor"
requires_target_page = true
[artifact]
type = "report-artifact"
success_status = ["ok", "partial", "empty"]
failure_status = ["blocked", "error"]
"#);
    let registry = load_scene_registry(skill_root.parent().unwrap()).unwrap();
    assert_eq!(registry.len(), 1);
    assert_eq!(registry[0].manifest.scene.id, "tq-lineloss-report");
}

#[test]
fn registry_rejects_duplicate_scene_ids_with_both_paths_in_error() { /* two skills, same scene.id */ }

#[test]
fn registry_rejects_unknown_manifest_schema_version() { /* schema_version = "999" */ }

#[test]
fn registry_rejects_non_browser_script_scene_tool_in_v1() { /* kind = "shell" should fail */ }

#[test]
fn registry_ignores_skills_without_scene_toml() { /* ordinary skills still load elsewhere */ }
- Step 2: Run the registry test file and verify it fails
Run:
cargo test --test scene_registry_test -- --nocapture
Expected: FAIL because scene.toml types and registry loading do not exist yet.
- Step 3: Implement the serializable manifest contract and the single-root registry loader
Implement the minimal contract and loader needed to satisfy the tests:
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct SceneManifest {
    pub scene: SceneSection,
    pub manifest: ManifestSection,
    pub bootstrap: BootstrapSection,
    // The minimal manifest in registry_loads_scene_manifest_from_skill_root omits
    // [deterministic] and [[params]], so both need serde defaults to parse
    // (DeterministicSection must then implement Default).
    #[serde(default)]
    pub deterministic: DeterministicSection,
    #[serde(default)]
    pub params: Vec<SceneParam>,
    pub artifact: ArtifactSection,
    pub postprocess: Option<PostprocessSection>,
}

#[derive(Debug, Clone)]
pub struct SceneRegistryEntry {
    pub manifest: SceneManifest,
    pub skill_root: PathBuf,
}

pub fn load_scene_registry(skills_dir: &Path) -> Result<Vec<SceneRegistryEntry>, SceneRegistryError> {
    // iterate immediate skill dirs under the already-resolved single skillsDir
    // look for <skill-root>/scene.toml only
    // parse and validate schema version
    // verify scene.id uniqueness across the loaded root
    // verify manifest.scene.skill matches the containing skill package
    // verify referenced tool exists in SKILL.toml and is browser_script in v1
}
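The scene.id uniqueness check named in the loader comments can be sketched in isolation. This is a sketch under stated assumptions: DuplicateSceneId and ensure_unique_scene_ids are hypothetical names, and the plan's real SceneRegistryError may carry this information differently.

```rust
use std::collections::HashMap;
use std::fmt;
use std::path::PathBuf;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub struct DuplicateSceneId {
    pub scene_id: String,
    pub first: PathBuf,
    pub second: PathBuf,
}

impl fmt::Display for DuplicateSceneId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Both manifest paths appear in the message, as the plan requires.
        write!(
            f,
            "duplicate scene.id '{}' in {} and {}",
            self.scene_id,
            self.first.display(),
            self.second.display()
        )
    }
}

/// Walks `(scene_id, manifest_path)` pairs in load order and fails on the
/// first repeated id, naming the earlier and the later manifest.
pub fn ensure_unique_scene_ids(loaded: &[(String, PathBuf)]) -> Result<(), DuplicateSceneId> {
    let mut seen: HashMap<&str, &PathBuf> = HashMap::new();
    for (scene_id, path) in loaded {
        if let Some(first) = seen.insert(scene_id.as_str(), path) {
            return Err(DuplicateSceneId {
                scene_id: scene_id.clone(),
                first: first.clone(),
                second: path.clone(),
            });
        }
    }
    Ok(())
}
```

Keeping the check separate from parsing makes it easy to cover with the duplicate-id red test without touching the filesystem.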
Rules to lock now:
- schema_version = "1" is the only accepted version in v1
- duplicate scene.id is a hard error and must report both manifest paths
- manifest loading must not add a second config key or a hardcoded skill_staging/scenes scan
- scene.toml is runtime-owned; scene.json stays legacy-only
- Step 4: Re-run the registry tests and verify they pass
Run:
cargo test --test scene_registry_test -- --nocapture
Expected: PASS.
- Step 5: Commit the contract and registry slice
Run:
git add src/lib.rs src/scene_contract/mod.rs src/scene_contract/manifest.rs src/compat/mod.rs src/compat/scene_platform/mod.rs src/compat/scene_platform/registry.rs tests/scene_registry_test.rs
git commit -m "feat: add scene manifest registry"
Expected: one commit that introduces the stable runtime/generator contract and registry loader.
Task 3: Generalize deterministic dispatch and reusable parameter resolvers
Files:
- Create: src/compat/scene_platform/dispatch.rs
- Create: src/compat/scene_platform/resolvers.rs
- Modify: src/compat/deterministic_submit.rs
- Modify: tests/deterministic_submit_test.rs
- Step 1: Replace the line-loss-only deterministic tests with registry-backed red tests
Extend tests/deterministic_submit_test.rs with registry-backed red cases built from temp fixture manifests under a temporary skills root. Do not depend on the committed sample package from Task 6 yet; Task 3 must stay hermetic and independently runnable. Add failing cases such as:
#[test]
fn deterministic_submit_uses_registry_backed_scene_plan() {
    let decision = decide_deterministic_submit(
        "兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。",
        None,
        None,
    );
    match decision {
        DeterministicSubmitDecision::Execute(plan) => {
            assert_eq!(plan.scene_id, "tq-lineloss-report");
            assert_eq!(plan.tool_name, "tq-lineloss-report.collect_lineloss");
            assert_eq!(plan.expected_domain, "20.76.57.61");
            assert_eq!(plan.target_url, "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor");
        }
        other => panic!("expected execute plan, got {other:?}"),
    }
}

#[test]
fn deterministic_submit_fails_closed_on_scene_ambiguity() { /* two plausible scene.toml entries -> Prompt */ }

#[test]
fn deterministic_submit_prompts_for_missing_period_instead_of_defaulting() {
    let decision = decide_deterministic_submit("兰州公司 台区线损大数据 月累计线损率统计分析。。。", None, None);
    assert!(matches!(decision, DeterministicSubmitDecision::Prompt { .. }));
}

#[test]
fn deterministic_submit_uses_page_context_to_break_ties_before_keyword_only_match() { /* page_url/title beats keyword overlap */ }

#[test]
fn zhihu_without_suffix_remains_not_deterministic() {
    assert!(matches!(
        decide_deterministic_submit("打开知乎热榜", Some("https://www.zhihu.com/hot"), Some("知乎热榜")),
        DeterministicSubmitDecision::NotDeterministic
    ));
}
Also invert the current default-period expectations. 兰州公司 月累计。。。 and 兰州公司 周累计。。。 must now prompt instead of executing.
- Step 2: Run the targeted deterministic tests and verify they fail
Run:
cargo test --test deterministic_submit_test -- --nocapture
Expected: FAIL because the current implementation is still hardcoded to line-loss constants and still defaults missing month/week periods.
- Step 3: Implement reusable resolver types and a registry-backed dispatcher
Implement the generic deterministic planner in the new scene-platform modules, then make src/compat/deterministic_submit.rs a thin adapter over it.
Required implementation shape:
pub enum ResolverKind {
    DictionaryEntity,
    MonthWeekPeriod,
    FixedEnum,
    LiteralPassthrough,
}

pub struct SceneExecutionPlan {
    pub scene_id: String,
    pub instruction: String,
    pub tool_name: String,
    pub expected_domain: String,
    pub target_url: String,
    pub args: Map<String, Value>,
    pub success_statuses: Vec<String>,
    pub failure_statuses: Vec<String>,
    pub postprocess: Option<PostprocessSection>,
}

pub fn plan_deterministic_scene(
    raw_instruction: &str,
    page_url: Option<&str>,
    page_title: Option<&str>,
    skills_dir: &Path,
) -> Result<DeterministicSubmitDecision, SceneDispatchError> {
    // exact suffix gate
    // load registry from the single skillsDir
    // score candidate scenes using include/exclude keywords + page context + required-param resolution
    // if multiple remain plausible -> fail closed with explicit ambiguity prompt
    // resolve params using generic resolver kinds
    // build executable SceneExecutionPlan with manifest bootstrap + tool + canonical args
}
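The scoring and fail-closed steps in the planner comments could look roughly like the sketch below. Everything here is an assumption for illustration: the SceneCandidate and SceneSelection types, the use of page_title_keywords as the page-context signal, and the weight that lets page context outrank keyword-only overlap. Required-param resolution is left out of the score.

```rust
/// Hypothetical, trimmed-down view of one manifest's matching fields
/// (include/exclude keywords from [deterministic], title keywords from [bootstrap]).
pub struct SceneCandidate {
    pub scene_id: String,
    pub include_keywords: Vec<String>,
    pub exclude_keywords: Vec<String>,
    pub page_title_keywords: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum SceneSelection {
    NoMatch,
    Selected(String),
    /// More than one scene tied for the best score: prompt, never guess.
    Ambiguous(Vec<String>),
}

/// Scores one candidate. `None` means excluded or no include keyword hit.
fn score(candidate: &SceneCandidate, body: &str, page_title: Option<&str>) -> Option<u32> {
    if candidate.exclude_keywords.iter().any(|k| body.contains(k.as_str())) {
        return None;
    }
    let include_hits = candidate
        .include_keywords
        .iter()
        .filter(|k| body.contains(k.as_str()))
        .count() as u32;
    if include_hits == 0 {
        return None;
    }
    let title_match = page_title.map_or(false, |title| {
        candidate.page_title_keywords.iter().any(|k| title.contains(k.as_str()))
    });
    // Assumed weighting: a page-context match outranks any keyword-only overlap.
    Some(include_hits + if title_match { 1000 } else { 0 })
}

/// Picks the single best-scoring scene, or fails closed on a tie.
pub fn select_scene(
    candidates: &[SceneCandidate],
    body: &str,
    page_title: Option<&str>,
) -> SceneSelection {
    let scored: Vec<(&str, u32)> = candidates
        .iter()
        .filter_map(|c| score(c, body, page_title).map(|s| (c.scene_id.as_str(), s)))
        .collect();
    let best = match scored.iter().map(|entry| entry.1).max() {
        Some(best) => best,
        None => return SceneSelection::NoMatch,
    };
    let mut top: Vec<String> = scored
        .iter()
        .filter(|entry| entry.1 == best)
        .map(|entry| entry.0.to_string())
        .collect();
    if top.len() == 1 {
        SceneSelection::Selected(top.remove(0))
    } else {
        top.sort(); // stable ordering for the ambiguity prompt
        SceneSelection::Ambiguous(top)
    }
}
```

A tie always yields Ambiguous rather than the first match, which is the fail-closed behavior the ambiguity test is meant to lock.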
Resolver rules to lock now:
- dictionary_entity reads external dictionary data such as references/org-dictionary.json; no hardcoded org list in Rust after migration
- month_week_period returns explicit prompts for missing mode, missing period, contradictory month/week intent, or week-without-year
- fixed_enum and literal_passthrough exist now so the manifest contract is extensible, even if line-loss is the only v1 user
- if a new scene needs a new resolver type, add a reusable resolver, not a scene-specific if scene_id == ... branch
- Step 4: Re-run the deterministic tests and verify they pass
Run:
cargo test --test deterministic_submit_test -- --nocapture
Expected: PASS, including the new no-default-period behavior and ambiguity fail-closed coverage.
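The no-default-period behavior verified in this step can be illustrated with a small resolver sketch. It is deliberately partial and hypothetical: the type and function names are invented here, the mode keywords are taken from the plan's sample instructions, and only the YYYY-MM month form seen in the tests is parsed. Week values are omitted because the plan does not fix their textual format.

```rust
#[derive(Debug, PartialEq)]
pub enum PeriodMode {
    Month,
    Week,
}

/// Reasons to prompt and stop instead of guessing a default.
#[derive(Debug, PartialEq)]
pub enum PeriodPrompt {
    MissingMode,
    ContradictoryMode,
    MissingPeriod,
}

/// Detects the mode from "月累计" (monthly cumulative) and "周累计" (weekly cumulative).
pub fn resolve_mode(body: &str) -> Result<PeriodMode, PeriodPrompt> {
    match (body.contains("月累计"), body.contains("周累计")) {
        (true, false) => Ok(PeriodMode::Month),
        (false, true) => Ok(PeriodMode::Week),
        (true, true) => Err(PeriodPrompt::ContradictoryMode),
        (false, false) => Err(PeriodPrompt::MissingMode),
    }
}

/// True for an ASCII `YYYY-MM` token whose month is 01..=12.
fn is_year_month(token: &str) -> bool {
    let bytes = token.as_bytes();
    bytes.len() == 7
        && bytes[4] == b'-'
        && bytes[..4].iter().all(u8::is_ascii_digit)
        && bytes[5..].iter().all(u8::is_ascii_digit)
        && matches!(token[5..].parse::<u8>(), Ok(1..=12))
}

/// Returns the first explicit `YYYY-MM` token in the instruction body.
/// There is no fallback to the current month: absence means prompt.
pub fn resolve_month_value(body: &str) -> Result<String, PeriodPrompt> {
    body.split_whitespace()
        .find(|token| is_year_month(token))
        .map(str::to_string)
        .ok_or(PeriodPrompt::MissingPeriod)
}
```

Returning a typed prompt reason rather than an Option keeps the caller from silently substituting a page default, which is the regression the inverted default-period tests guard against.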
- Step 5: Commit the registry-driven deterministic slice
Run:
git add src/compat/deterministic_submit.rs src/compat/scene_platform/dispatch.rs src/compat/scene_platform/resolvers.rs tests/deterministic_submit_test.rs
git commit -m "feat: add registry-driven deterministic scene dispatch"
Expected: one commit that removes one-off line-loss decision ownership from the deterministic planner.
Task 4: Add a generic report-artifact interpreter and XLSX postprocess path
Files:
- Create: src/compat/report_artifact.rs
- Create: src/compat/report_xlsx_export.rs
- Modify: src/compat/direct_skill_runtime.rs
- Modify: src/compat/deterministic_submit.rs
- Create: tests/report_artifact_postprocess_test.rs
- Modify: tests/agent_runtime_test.rs
- Step 1: Write the red tests for generic report-artifact handling
Add tests/report_artifact_postprocess_test.rs and the minimum tests/agent_runtime_test.rs extensions needed to prove the platform no longer depends on line-loss-specific Rust export logic:
#[test]
fn report_artifact_postprocess_exports_xlsx_for_ok_or_partial_scene() {
    let artifact = serde_json::json!({
        "type": "report-artifact",
        "report_name": "tq-lineloss-report",
        "status": "partial",
        "columns": ["ORG_NAME", "LINE_LOSS_RATE"],
        "column_defs": [["ORG_NAME", "供电单位"], ["LINE_LOSS_RATE", "综合线损率(%)"]],
        "rows": [{"ORG_NAME": "国网兰州供电公司", "LINE_LOSS_RATE": "1.23"}],
        "counts": {"rows": 1},
        "partial_reasons": ["report_log_failed"]
    });
    let outcome = interpret_report_artifact_and_postprocess(&artifact, report_postprocess_xlsx(), temp_workspace()).unwrap();
    assert!(outcome.success);
    assert!(outcome.summary.contains("status=partial"));
    assert!(outcome.summary.contains("detail_rows=1"));
    assert!(outcome.summary.contains("export_path="));
}

#[test]
fn report_artifact_postprocess_skips_export_for_blocked_or_error_scene() { /* no xlsx path */ }

#[test]
fn direct_submit_and_scene_submit_share_the_same_report_summary_contract() { /* direct_skill_runtime + deterministic path both use same summary builder */ }
- Step 2: Run the focused report-artifact tests and verify they fail
Run:
cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_partial_report_artifact_as_success_with_warning_summary -- --nocapture
Expected: FAIL because the generic interpreter/exporter does not exist yet and deterministic line-loss export is still special-cased.
- Step 3: Implement the shared parser, summary builder, and generic XLSX exporter
Implement a reusable path that both deterministic scenes and configured direct-submit skills can call:
pub struct ParsedReportArtifact {
    pub report_name: String,
    pub status: String,
    pub columns: Vec<String>,
    pub column_defs: Vec<(String, String)>,
    pub rows: Vec<Map<String, Value>>,
    pub counts: ReportCounts,
    pub partial_reasons: Vec<String>,
}

pub fn interpret_report_artifact_and_postprocess(
    artifact_json: &Value,
    postprocess: Option<&PostprocessSection>,
    workspace_root: &Path,
) -> Result<DirectSubmitOutcome, PipeError> {
    // parse report-artifact generically
    // map ok/partial/empty => success=true
    // map blocked/error => success=false
    // if postprocess.exporter == Some("xlsx_report") and status is exportable, write xlsx under workspace_root/out
    // if postprocess.auto_open == Some("excel"), reuse existing open-export helper
}
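Two of the pieces named above, the status mapping and the column_defs/columns header choice for the exporter, are small enough to sketch on their own. The function names are hypothetical, and treating an unknown status as a failure is an assumption of this sketch rather than something the plan states.

```rust
/// Maps an artifact `status` onto success/failure using the manifest's
/// status lists. A status in neither list counts as failure (assumption),
/// and the failure list wins if a status were ever listed in both.
pub fn is_success_status(status: &str, success: &[&str], failure: &[&str]) -> bool {
    let listed_as_failure = failure.iter().any(|s| *s == status);
    let listed_as_success = success.iter().any(|s| *s == status);
    listed_as_success && !listed_as_failure
}

/// Picks XLSX header cells as `(key, label)` pairs: `column_defs` when
/// present, otherwise each `columns` key doubles as its own label.
pub fn export_headers(
    column_defs: &[(String, String)],
    columns: &[String],
) -> Vec<(String, String)> {
    if !column_defs.is_empty() {
        column_defs.to_vec()
    } else {
        columns.iter().map(|c| (c.clone(), c.clone())).collect()
    }
}
```

Neither function looks at row contents, so a blocked or error artifact stays a failure even when rows are present, and no line-loss column names appear in Rust.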
Rules:
- export logic must read column_defs when present, else fall back to columns
- do not keep line-loss-only column-name assumptions in Rust
- keep direct-submit behavior unchanged for non-artifact string outputs
- keep blocked/error as failures even if rows happen to be present late in the artifact
- Step 4: Re-run the focused tests and verify they pass
Run:
cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_partial_report_artifact_as_success_with_warning_summary -- --nocapture
cargo test --test agent_runtime_test submit_task_treats_blocked_report_artifact_as_failure -- --nocapture
Expected: PASS.
- Step 5: Commit the generic artifact/postprocess slice
Run:
git add src/compat/report_artifact.rs src/compat/report_xlsx_export.rs src/compat/direct_skill_runtime.rs src/compat/deterministic_submit.rs tests/report_artifact_postprocess_test.rs tests/agent_runtime_test.rs
git commit -m "refactor: share generic report artifact postprocess"
Expected: one commit that removes the need for per-scene Rust export logic.
Task 5: Wire manifest-driven scenes into submit and bootstrap without regressing other flows
Files:
- Modify: src/agent/task_runner.rs
- Modify: src/service/server.rs
- Modify: tests/service_task_flow_test.rs
- Modify: tests/service_ws_session_test.rs
- Modify: tests/agent_runtime_test.rs
- Step 1: Add the failing submit/bootstrap regression tests
Add focused tests that lock branch order and bootstrap behavior:
#[test]
fn submit_task_routes_suffix_instruction_through_manifest_scene_before_llm() {
    // no provider call should happen when deterministic scene planning succeeds or prompts
}

#[test]
fn resolve_submit_bootstrap_target_prefers_manifest_scene_target_for_deterministic_scene() {
    let request = SubmitTaskRequest {
        instruction: "兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。".to_string(),
        conversation_id: None,
        messages: vec![],
        page_url: None,
        page_title: None,
    };
    let target = resolve_submit_bootstrap_target(&request, workspace_root, &settings);
    assert_eq!(target.request_url, "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor");
    assert_eq!(target.expected_domain.as_deref(), Some("20.76.57.61"));
}

#[test]
fn zhihu_without_suffix_keeps_existing_non_scene_path() { /* ordinary path unchanged */ }
For the browser-ws/callback-host path, add one regression in tests/service_ws_session_test.rs proving the first bootstrap/open target comes from scene.toml when a deterministic scene plan exists.
- Step 2: Run the focused integration tests and verify they fail
Run:
cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
Expected: FAIL because the submit/bootstrap path still depends on the old deterministic line-loss branch shape.
- Step 3: Implement the minimal wiring changes only where the branch already exists
Implementation targets:
- keep the current submit branch order in src/agent/task_runner.rs
- keep resolve_submit_bootstrap_target(...) precedence in src/service/server.rs
- replace the old hardcoded deterministic plan source with the new manifest-backed planner
- keep configured directSubmitSkill and ordinary LLM/browser orchestration behavior untouched
The resulting branch order must still be:
// 1. registry-backed deterministic scene (exact suffix only)
// 2. ordinary primary orchestration path
// 3. configured directSubmitSkill
// 4. compat LLM/runtime path
- Step 4: Re-run the focused integration tests and verify they pass
Run:
cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
cargo test --test agent_runtime_test -- --nocapture
Expected: PASS, with no regression to the ordinary direct-submit or Zhihu paths.
- Step 5: Commit the submit/bootstrap integration slice
Run:
git add src/agent/task_runner.rs src/service/server.rs tests/service_task_flow_test.rs tests/service_ws_session_test.rs tests/agent_runtime_test.rs
git commit -m "refactor: wire manifest scenes into submit bootstrap"
Expected: one commit that changes wiring only at the existing seams.
Task 6: Add the first manifest-driven tq-lineloss-report sample package inside this repo
Files:
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/scene.toml
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/org-dictionary.json
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.toml
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/SKILL.md
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.js
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/data-quality.md
- Create: examples/generated_scene_platform/skills/tq-lineloss-report/references/collection-flow.md
- Modify: tests/deterministic_submit_test.rs
- Modify: tests/scene_registry_test.rs
- Step 1: Add the failing line-loss manifest and runtime-contract checks
Create the scene.toml shape in the in-repo sample package first and lock the migration expectations:
[scene]
id = "tq-lineloss-report"
skill = "tq-lineloss-report"
tool = "collect_lineloss"
kind = "browser_script"
version = "0.1.0"
category = "report_collection"
[manifest]
schema_version = "1"
[bootstrap]
expected_domain = "20.76.57.61"
target_url = "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor"
page_title_keywords = ["线损"]
requires_target_page = true
[deterministic]
suffix = "。。。"
include_keywords = ["线损", "月累计", "周累计", "统计分析"]
exclude_keywords = ["知乎"]
[[params]]
name = "org"
resolver = "dictionary_entity"
required = true
prompt_missing = "已命中台区线损报表技能,但缺少供电单位。"
prompt_ambiguous = "已命中台区线损报表技能,但供电单位存在歧义,请补充更完整名称。"
[params.resolver_config]
dictionary_ref = "references/org-dictionary.json"
output_label_field = "org_label"
output_code_field = "org_code"
[[params]]
name = "period"
resolver = "month_week_period"
required = true
prompt_missing = "已命中台区线损报表技能,但缺少统计周期。"
prompt_ambiguous = "已命中台区线损报表技能,但统计周期存在歧义,请补充更明确表达。"
[artifact]
type = "report-artifact"
success_status = ["ok", "partial", "empty"]
failure_status = ["blocked", "error"]
[postprocess]
exporter = "xlsx_report"
auto_open = "excel"
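The org parameter's prompt_missing and prompt_ambiguous cases in the manifest above map naturally onto a three-way resolver result. The sketch below is illustrative only: the DictionaryEntry shape (including the aliases field) is a guess, since the plan does not specify the layout of references/org-dictionary.json, and loading the file is left out.

```rust
/// Hypothetical in-memory form of one dictionary record.
#[derive(Debug, Clone, PartialEq)]
pub struct DictionaryEntry {
    pub label: String,
    pub code: String,
    /// Assumed field: short names users may type instead of the full label.
    pub aliases: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum EntityResolution {
    /// Exactly one entry matched; these feed output_label_field / output_code_field.
    Resolved { label: String, code: String },
    /// Nothing matched: surface the manifest's prompt_missing text.
    Missing,
    /// Several entries matched: surface prompt_ambiguous with the candidates.
    Ambiguous(Vec<String>),
}

/// Matches the instruction body against labels and aliases by substring.
pub fn resolve_dictionary_entity(body: &str, dictionary: &[DictionaryEntry]) -> EntityResolution {
    let matches: Vec<&DictionaryEntry> = dictionary
        .iter()
        .filter(|entry| {
            body.contains(entry.label.as_str())
                || entry.aliases.iter().any(|alias| body.contains(alias.as_str()))
        })
        .collect();
    match matches.as_slice() {
        [] => EntityResolution::Missing,
        [only] => EntityResolution::Resolved {
            label: only.label.clone(),
            code: only.code.clone(),
        },
        many => EntityResolution::Ambiguous(many.iter().map(|e| e.label.clone()).collect()),
    }
}
```

Because the entries arrive as data, adding or renaming an org unit only touches the JSON dictionary, not Rust.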
Also add a red JS assertion in the committed sample package proving the script returns column_defs and never re-parses raw natural-language org/period text:
test('buildBrowserEntrypointResult keeps canonical args and generic export fields only', async () => {
  const artifact = await buildBrowserEntrypointResult({
    expected_domain: '20.76.57.61',
    org_label: '国网兰州供电公司',
    org_code: '62401',
    period_mode: 'month',
    period_mode_code: '1',
    period_value: '2026-03',
    period_payload: { fdate: '2026-03' },
    instruction: '兰州公司 月累计 2026-03'
  }, fakeDeps);
  assert.equal(artifact.org.code, '62401');
  assert.ok(Array.isArray(artifact.column_defs));
  assert.equal(JSON.stringify(artifact).includes('兰州公司 月累计 2026-03'), false);
});
- Step 2: Run the targeted line-loss tests and verify they fail
Run:
cargo test --test deterministic_submit_test -- --nocapture
node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"
Expected: FAIL because the runtime is not yet manifest-driven and the committed sample package does not yet expose the final manifest/dictionary/export contract.
- Step 3: Implement the sample-scene migration without adding per-scene Rust branches
Required actions:
- add scene.toml under the in-repo sample skill root and use the same layout the generator will emit
- make tests and service-smoke config resolve skillsDir to examples/generated_scene_platform so the registry can discover the committed sample package without any external repo copy step
- export the current org unit data into references/org-dictionary.json and make the resolver read that file instead of a Rust hardcoded list
- update collect_lineloss.js so the returned report-artifact includes generic-platform fields needed by report_xlsx_export.rs
- keep collection logic in JS; do not move line-loss business semantics back into Rust
- write SKILL.toml / SKILL.md / references docs into the sample package to describe canonical args and the manifest-driven contract
- keep any external staging-repo publish step out of scope for this branch; this task only commits the in-repo sample package
- Step 4: Re-run the line-loss tests and verify they pass
Run:
cargo test --test deterministic_submit_test -- --nocapture
node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"
Expected: PASS, including the new missing-period prompt behavior and the new manifest-driven sample-package shape.
- Step 5: Commit the line-loss sample migration
Run:
git add examples/generated_scene_platform/skills/tq-lineloss-report tests/deterministic_submit_test.rs tests/scene_registry_test.rs
git commit -m "feat: add manifest-driven lineloss sample package"
Expected: one commit that adds the first committed manifest-driven sample package and updates runtime expectations around it.
Task 7: Write the required tq-lineloss lessons-learned artifacts and load them as generator rules
Files:
- Create: docs/superpowers/references/tq-lineloss-lessons-learned.md
- Create: docs/superpowers/references/tq-lineloss-lessons-learned.toml
- Create: tests/generated_scene_lessons_test.rs
- Create: src/generated_scene/mod.rs
- Create: src/generated_scene/lessons.rs
- Modify: src/lib.rs
- Step 1: Write the failing lessons-rules test before the docs
Add tests/generated_scene_lessons_test.rs that requires all mandatory structured rule sections to exist. In the same red step, wire the empty src/generated_scene/mod.rs and src/lib.rs exports needed so this test fails on missing implementation/data, not on missing module visibility:
#[test]
fn lineloss_lessons_toml_declares_required_generator_rules() {
    let lessons = load_generation_lessons("docs/superpowers/references/tq-lineloss-lessons-learned.toml").unwrap();
    assert!(lessons.routing.require_exact_suffix);
    assert!(lessons.routing.unsupported_scene_fail_closed);
    assert!(lessons.canonical_params.require_explicit_period);
    assert!(lessons.bootstrap.require_expected_domain);
    assert!(lessons.bootstrap.require_target_url);
    assert!(lessons.artifact.require_report_artifact);
    assert!(lessons.validation.require_pipe_and_ws_checks);
    assert!(lessons.validation.require_manual_service_console_smoke);
}
- Step 2: Run the lessons test and verify it fails
Run:
cargo test --test generated_scene_lessons_test -- --nocapture
Expected: FAIL because the lessons loader and TOML file do not exist yet.
- Step 3: Implement the loader and write both lessons artifacts
Implement the loader and complete the minimal module wiring (src/generated_scene/mod.rs, src/lib.rs) in this task so cargo test --test generated_scene_lessons_test is buildable before Task 8. Use a TOML shape explicit enough for generator enforcement, for example:
[routing]
require_exact_suffix = true
unsupported_scene_fail_closed = true
ambiguity_fail_closed = true
[canonical_params]
require_dictionary_entity_for_org = true
require_explicit_period = true
forbid_hidden_page_defaults = true
[bootstrap]
require_expected_domain = true
require_target_url = true
prefer_page_context_when_present = true
[artifact]
require_report_artifact = true
require_column_defs_for_export = true
rust_side_xlsx_export_when_postprocess_xlsx = true
[validation]
require_pipe_and_ws_checks = true
require_manual_service_console_smoke = true
require_callback_host_timeout_notes = true
The Markdown companion must explain the why behind those rules: deterministic routing pitfalls, canonical parameter pitfalls, bootstrap target pitfalls, pipe/ws differences, callback-host timeout lessons, and Rust-side export constraints.
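To make the idea of "explicit enough for generator enforcement" concrete, here is a minimal std-only sketch that reads the flat [section] / key = true shape shown above and reports which required rules are missing or disabled. LessonRules, parse, and require are hypothetical names for this example; the actual loader in src/generated_scene/lessons.rs would more likely deserialize into typed structs with serde and the toml crate, as the Tech Stack suggests.

```rust
use std::collections::HashMap;

/// Boolean rules keyed as "section.key"; a hypothetical flat representation.
#[derive(Debug, Default)]
pub struct LessonRules {
    flags: HashMap<String, bool>,
}

impl LessonRules {
    /// Parses only `[section]` headers and `key = true|false` lines.
    /// This is not a general TOML parser.
    pub fn parse(text: &str) -> Result<Self, String> {
        let mut section = String::new();
        let mut flags = HashMap::new();
        for (index, raw) in text.lines().enumerate() {
            let line = raw.trim();
            if line.is_empty() || line.starts_with('#') {
                continue;
            }
            if let Some(name) = line.strip_prefix('[').and_then(|rest| rest.strip_suffix(']')) {
                section = name.trim().to_string();
                continue;
            }
            let (key, value) = line
                .split_once('=')
                .ok_or_else(|| format!("line {}: expected `key = value`", index + 1))?;
            let value = match value.trim() {
                "true" => true,
                "false" => false,
                other => return Err(format!("line {}: unsupported value {}", index + 1, other)),
            };
            flags.insert(format!("{}.{}", section, key.trim()), value);
        }
        Ok(Self { flags })
    }

    /// Fails with the full list of rules that are absent or set to false.
    pub fn require(&self, keys: &[&str]) -> Result<(), String> {
        let missing: Vec<&str> = keys
            .iter()
            .copied()
            .filter(|key| self.flags.get(*key) != Some(&true))
            .collect();
        if missing.is_empty() {
            Ok(())
        } else {
            Err(format!("missing or disabled rules: {}", missing.join(", ")))
        }
    }
}
```

Reporting every missing rule at once, rather than stopping at the first, keeps the generator's refusal message actionable when a lessons file is incomplete.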
- Step 4: Re-run the lessons tests and verify they pass
Run:
cargo test --test generated_scene_lessons_test -- --nocapture
Expected: PASS.
- Step 5: Commit the lessons artifacts and loader
Run:
git add docs/superpowers/references/tq-lineloss-lessons-learned.md docs/superpowers/references/tq-lineloss-lessons-learned.toml src/generated_scene/mod.rs src/generated_scene/lessons.rs src/lib.rs tests/generated_scene_lessons_test.rs
git commit -m "docs: add lineloss generation lessons"
Expected: one commit that makes the line-loss lessons machine-consumable and reviewable.
Task 8: Build the v1 source analyzer, package generator, and CLI entry
Files:
- Create: src/generated_scene/analyzer.rs
- Create: src/generated_scene/generator.rs
- Create: src/bin/sg_scene_generate.rs
- Modify: src/generated_scene/mod.rs
- Modify: src/lib.rs
- Create: tests/scene_generator_test.rs
- Create: tests/fixtures/generated_scene/report_collection/index.html
- Create: tests/fixtures/generated_scene/report_collection/js/report.js
- Create: tests/fixtures/generated_scene/non_report/index.html
- Create: tests/fixtures/scene_source/tq_lineloss/index.html
- Create: tests/fixtures/scene_source/tq_lineloss/js/collect.js
- Step 1: Add the failing analyzer/generator tests with hermetic fixtures
Create fixture-backed tests like:
#[test]
fn analyzer_classifies_supported_report_collection_source() {
    let analysis = analyze_scene_source(Path::new("tests/fixtures/generated_scene/report_collection")).unwrap();
    assert_eq!(analysis.scene_kind, SceneKind::ReportCollection);
    assert_eq!(analysis.tool_kind, ToolKind::BrowserScript);
    assert!(analysis.bootstrap.target_url.is_some());
    assert!(analysis.collection_entry_script.is_some());
}

#[test]
fn generator_writes_registration_ready_package_with_scene_toml() {
    let output_root = tempdir();
    generate_scene_package(GenerateSceneRequest {
        source_dir: PathBuf::from("tests/fixtures/generated_scene/report_collection"),
        scene_id: "sample-report-scene".to_string(),
        scene_name: "示例报表场景".to_string(),
        output_root: output_root.path().to_path_buf(),
        lessons_path: PathBuf::from("docs/superpowers/references/tq-lineloss-lessons-learned.toml"),
    }).unwrap();
    assert!(output_root.path().join("skills/sample-report-scene/SKILL.toml").exists());
    assert!(output_root.path().join("skills/sample-report-scene/scene.toml").exists());
    assert!(output_root.path().join("skills/sample-report-scene/scripts/collect_sample_report_scene.js").exists());
    assert!(output_root.path().join("skills/sample-report-scene/scripts/collect_sample_report_scene.test.js").exists());
}

#[test]
fn generator_rejects_non_report_source_with_explicit_reason() {
    let err = analyze_scene_source(Path::new("tests/fixtures/generated_scene/non_report")).unwrap_err();
    assert!(err.to_string().contains("report/collection browser_script only"));
}
- Step 2: Run the generator tests and verify they fail
Run:
cargo test --test scene_generator_test -- --nocapture
Expected: FAIL because the analyzer, generator, fixtures, and CLI do not exist yet.
- Step 3: Implement the analyzer, generator, CLI, and the source fixtures used by final smoke
Implementation rules:
- create the generator test fixtures under tests/fixtures/generated_scene/*
- create the hermetic source-smoke fixtures under tests/fixtures/scene_source/tq_lineloss/* so Task 9 can run without any external scenario directory
- analyzer must refuse unsupported/non-report scenes explicitly instead of generating broken packages
- generator must emit scene.toml inside the generated skill root
- generator must use tq-lineloss-lessons-learned.toml as a required input so the same hardening rules apply to future scenes
- generator/runtime coupling must stay at the file-contract level only
- CLI should use an explicit parser, no new heavy dependency
Suggested CLI shape:
cargo run --bin sg_scene_generate -- \
--source-dir <scenario-dir> \
--scene-id <scene-id> \
--scene-name <display-name> \
--output-root <skill-staging-root> \
--lessons docs/superpowers/references/tq-lineloss-lessons-learned.toml
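One way the "explicit parser, no new heavy dependency" rule could look for the flags above is sketched here using only the standard library. GenerateArgs, parse_generate_args, and the error wording are hypothetical names chosen for this example, not the plan's required API.

```rust
use std::path::PathBuf;

/// Parsed CLI flags; field names mirror the suggested flags (hypothetical struct).
#[derive(Debug, PartialEq)]
pub struct GenerateArgs {
    pub source_dir: PathBuf,
    pub scene_id: String,
    pub scene_name: String,
    pub output_root: PathBuf,
    pub lessons: PathBuf,
}

/// Turns an absent flag into an explicit error naming the flag.
fn required(value: Option<String>, flag: &str) -> Result<String, String> {
    value.ok_or_else(|| format!("missing required flag: {}", flag))
}

/// Parses `--flag value` pairs; rejects unknown, duplicate, and value-less flags.
pub fn parse_generate_args<I>(args: I) -> Result<GenerateArgs, String>
where
    I: IntoIterator<Item = String>,
{
    let mut source_dir: Option<String> = None;
    let mut scene_id: Option<String> = None;
    let mut scene_name: Option<String> = None;
    let mut output_root: Option<String> = None;
    let mut lessons: Option<String> = None;

    let mut iter = args.into_iter();
    while let Some(flag) = iter.next() {
        let slot: &mut Option<String> = match flag.as_str() {
            "--source-dir" => &mut source_dir,
            "--scene-id" => &mut scene_id,
            "--scene-name" => &mut scene_name,
            "--output-root" => &mut output_root,
            "--lessons" => &mut lessons,
            other => return Err(format!("unknown flag: {}", other)),
        };
        let value = iter
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        if slot.replace(value).is_some() {
            return Err(format!("duplicate flag: {}", flag));
        }
    }

    Ok(GenerateArgs {
        source_dir: PathBuf::from(required(source_dir, "--source-dir")?),
        scene_id: required(scene_id, "--scene-id")?,
        scene_name: required(scene_name, "--scene-name")?,
        output_root: PathBuf::from(required(output_root, "--output-root")?),
        lessons: PathBuf::from(required(lessons, "--lessons")?),
    })
}
```

In a binary, this would typically be called as parse_generate_args(std::env::args().skip(1)), printing the error and exiting non-zero on failure.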
Expected outputs under <output-root>:
- skills/<scene-id>/SKILL.toml
- skills/<scene-id>/SKILL.md
- skills/<scene-id>/scene.toml
- skills/<scene-id>/references/*.md
- skills/<scene-id>/scripts/*.js
- skills/<scene-id>/scripts/*.test.js
- Step 4: Re-run the generator tests and verify they pass
Run:
cargo test --test scene_generator_test -- --nocapture
Expected: PASS.
- Step 5: Commit the generator slice
Run:
git add src/lib.rs src/generated_scene/mod.rs src/generated_scene/analyzer.rs src/generated_scene/generator.rs src/bin/sg_scene_generate.rs tests/scene_generator_test.rs tests/fixtures/generated_scene tests/fixtures/scene_source/tq_lineloss
git commit -m "feat: add generated scene package generator"
Expected: one commit that adds the in-repo v1 generator capability.
Task 9: Run the final verification sweep, smoke the real runtime, and remove unused one-off scene code
Files:
- Delete if unused after green verification: src/compat/tq_lineloss/org_units.rs
- Delete if unused after green verification: src/compat/tq_lineloss/org_resolver.rs
- Delete if unused after green verification: src/compat/tq_lineloss/period_resolver.rs
- Delete or reduce to shim only if unused after green verification: src/compat/lineloss_xlsx_export.rs
- Modify: src/compat/mod.rs
- Modify: src/lib.rs
- Step 1: Remove only the legacy one-off files that are provably unused
Before deleting anything, prove the new path covers the old responsibilities:
cargo test --test deterministic_submit_test -- --nocapture
cargo test --test scene_registry_test -- --nocapture
cargo test --test report_artifact_postprocess_test -- --nocapture
Then delete the old line-loss-only resolver/export files only if cargo test stays green and a repo-wide grep shows they are no longer referenced.
- Step 2: Run the full automated verification sweep
Run:
node "examples/generated_scene_platform/skills/tq-lineloss-report/scripts/collect_lineloss.test.js"
cargo test --test scene_registry_test -- --nocapture
cargo test --test deterministic_submit_test -- --nocapture
cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test generated_scene_lessons_test -- --nocapture
cargo test --test scene_generator_test -- --nocapture
cargo test --test agent_runtime_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
cargo test --test compat_runtime_test -- --nocapture
cargo test --test compat_config_test -- --nocapture
cargo build --bin sgclaw --bin sg_claw --bin sg_scene_generate
Expected: PASS.
- Step 3: Run the required hermetic generator smoke and keep the real external source smoke optional
Run the required in-repo smoke first:
tmp_out="$(mktemp -d)"
cargo run --bin sg_scene_generate -- \
--source-dir tests/fixtures/scene_source/tq_lineloss \
--scene-id tq-lineloss-report \
--scene-name "台区线损月周累计线损率统计分析" \
--output-root "$tmp_out" \
--lessons docs/superpowers/references/tq-lineloss-lessons-learned.toml
Expected: generator emits a complete package into $tmp_out using only in-repo fixtures.
Optional manual follow-up after the required smoke is green:
- if the external scenario directory is available on the implementer's machine, re-run the same command against the real source tree for additional confidence
- if it is unavailable, do not block the branch on that machine-specific path
- Step 4: Run the real service-console smoke checks with sg_claw.exe semantics in mind
Manual verification checklist:
- write or reuse a repo-local sgclaw_config.json whose skillsDir points to examples/generated_scene_platform
- rebuild and run sg_claw / sg_claw.exe with that config so the runtime-scanned skills root is reproducible
- on the real line-loss page, submit 兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。
- confirm the request bootstraps the manifest target_url, uses the manifest expected_domain, and returns the line-loss report artifact through the generic scene runtime
- submit 兰州公司 台区线损大数据 月累计线损率统计分析。。。 and confirm the runtime prompts for missing period instead of defaulting
- submit 打开知乎热榜 and confirm the ordinary Zhihu path still behaves as before
- submit 打开知乎热榜。。。 and confirm the deterministic runtime fails closed with the unsupported-scene prompt instead of falling into the Zhihu path
- Step 5: Commit the cleanup + verified platform state
Run:
git add src/compat/mod.rs src/lib.rs src/compat src/generated_scene src/scene_contract docs/superpowers/references tests examples/generated_scene_platform
git commit -m "feat: add generated scene skill platform"
Expected: one final commit after the full automated and manual verification passes.
Verification Checklist
Registry and manifest contract
cargo test --test scene_registry_test -- --nocapture
Expected:
- scene.toml loads from the skill root
- only schema_version = "1" passes
- duplicate scene.id fails with both manifest paths in the error
- non-browser_script or non-report_collection v1 scenes are rejected cleanly
- the registry still scans exactly one resolved skillsDir
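The registry expectations above can be pictured with a small validation pass over already-parsed manifests. The Manifest struct, its field names, and the choice to return an error for unsupported kinds (rather than skip them) are assumptions for this sketch; parsing scene.toml itself is omitted.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical parsed view of one scene.toml.
#[derive(Debug)]
pub struct Manifest {
    pub path: PathBuf,
    pub schema_version: String,
    pub scene_id: String,
    pub tool_kind: String,
    pub scene_kind: String,
}

/// Builds an id-keyed registry, rejecting unsupported or duplicate manifests.
pub fn build_registry(manifests: Vec<Manifest>) -> Result<HashMap<String, Manifest>, String> {
    let mut by_id: HashMap<String, Manifest> = HashMap::new();
    for manifest in manifests {
        // Only schema_version "1" is accepted in v1.
        if manifest.schema_version != "1" {
            return Err(format!(
                "{}: unsupported schema_version {:?}",
                manifest.path.display(),
                manifest.schema_version
            ));
        }
        // v1 covers browser_script report_collection scenes only.
        if manifest.tool_kind != "browser_script" || manifest.scene_kind != "report_collection" {
            return Err(format!(
                "{}: v1 supports report_collection browser_script scenes only",
                manifest.path.display()
            ));
        }
        // Duplicate ids name both manifest paths so the conflict is easy to locate.
        if let Some(previous) = by_id.get(&manifest.scene_id) {
            return Err(format!(
                "duplicate scene.id {:?}: {} and {}",
                manifest.scene_id,
                previous.path.display(),
                manifest.path.display()
            ));
        }
        by_id.insert(manifest.scene_id.clone(), manifest);
    }
    Ok(by_id)
}
```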
Deterministic routing contract
cargo test --test deterministic_submit_test -- --nocapture
Expected:
- exact 。。。 suffix only
- no-suffix behavior unchanged
- unsupported suffix-scene requests fail closed
- multi-match ambiguity fails closed
- missing org/mode/period prompt instead of defaulting
- page context may improve scoring but cannot cause silent guessing on unresolved ambiguity
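The routing expectations above can be sketched as a single fail-closed decision function. SceneEntry, its keyword matching, and the reading of "exact" as "ends with exactly three 。 characters" are assumptions for illustration; the real dispatcher is expected to score matches from scene.toml rather than use a static keyword table.

```rust
/// Suffix that opts a request into deterministic routing.
pub const DETERMINISTIC_SUFFIX: &str = "。。。";

/// Hypothetical registry entry: a scene id plus trigger keywords.
pub struct SceneEntry {
    pub id: &'static str,
    pub keywords: &'static [&'static str],
}

#[derive(Debug, PartialEq)]
pub enum Route {
    /// No exact suffix: leave the request on the existing non-deterministic path.
    Ordinary,
    /// Exactly one scene matched.
    Scene(&'static str),
    /// Suffix present but no scene matched: fail closed.
    Unsupported,
    /// More than one scene matched: fail closed instead of guessing.
    Ambiguous(Vec<&'static str>),
}

pub fn route(request: &str, scenes: &[SceneEntry]) -> Route {
    // Assumption: a fourth trailing 。 means the suffix is not exact.
    let body = match request.trim_end().strip_suffix(DETERMINISTIC_SUFFIX) {
        Some(body) if !body.ends_with('。') => body,
        _ => return Route::Ordinary,
    };
    let matches: Vec<&'static str> = scenes
        .iter()
        .filter(|scene| scene.keywords.iter().any(|keyword| body.contains(*keyword)))
        .map(|scene| scene.id)
        .collect();
    match matches.len() {
        0 => Route::Unsupported,
        1 => Route::Scene(matches[0]),
        _ => Route::Ambiguous(matches),
    }
}
```

Under these assumptions, 打开知乎热榜 stays on the ordinary path while 打开知乎热榜。。。 yields Unsupported, which mirrors the two Zhihu smoke checks in Task 9.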
Generic report-artifact handling
cargo test --test report_artifact_postprocess_test -- --nocapture
cargo test --test agent_runtime_test -- --nocapture
Expected:
- ok/partial/empty map to success
- blocked/error map to failure
- generic XLSX export works from artifact fields, not line-loss-only Rust code
- configured directSubmitSkill keeps working on the shared artifact interpreter
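The status mapping in the first two bullets is small enough to show directly. SubmitOutcome and outcome_for_status are hypothetical names, and returning None for an unrecognized status is an assumption that leaves the fail-closed decision to the caller.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubmitOutcome {
    Success,
    Failure,
}

/// Maps a report-artifact status string to a submit outcome.
/// Unknown statuses return None so the caller can decide how to fail.
pub fn outcome_for_status(status: &str) -> Option<SubmitOutcome> {
    match status {
        "ok" | "partial" | "empty" => Some(SubmitOutcome::Success),
        "blocked" | "error" => Some(SubmitOutcome::Failure),
        _ => None,
    }
}
```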
Service submit/bootstrap path
cargo test --test service_task_flow_test -- --nocapture
cargo test --test service_ws_session_test callback_host -- --nocapture
Expected:
- deterministic manifest scenes route before LLM
- bootstrap target resolution uses manifest target_url / expected_domain
- callback-host/browser-ws paths still receive the correct request URL
- non-deterministic Zhihu and direct-submit flows remain intact
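As one possible reading of the bootstrap expectation, the sketch below checks that a manifest supplies both target_url and expected_domain and that the URL's host equals the domain. host_of, check_bootstrap, and the exact-host comparison are assumptions for this example; the real runtime may allow subdomains or use a proper URL parser, and this hand-rolled host extraction does not handle IPv6 literals.

```rust
/// Extracts the host from an absolute http(s) URL using only std.
/// Simplified: userinfo and port are dropped; IPv6 literals are not supported.
pub fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Requires both manifest fields and an exact, case-insensitive host match.
pub fn check_bootstrap(target_url: Option<&str>, expected_domain: Option<&str>) -> Result<(), String> {
    let url = target_url.ok_or("manifest is missing bootstrap target_url")?;
    let domain = expected_domain.ok_or("manifest is missing bootstrap expected_domain")?;
    let host = host_of(url)
        .ok_or_else(|| format!("target_url is not an absolute http(s) URL: {}", url))?;
    if host.eq_ignore_ascii_case(domain) {
        Ok(())
    } else {
        Err(format!(
            "target_url host {} does not match expected_domain {}",
            host, domain
        ))
    }
}
```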
Generator and lessons
cargo test --test generated_scene_lessons_test -- --nocapture
cargo test --test scene_generator_test -- --nocapture
cargo build --bin sg_scene_generate
Expected:
- lessons TOML contains all required routing/param/bootstrap/artifact/validation rules
- analyzer only accepts v1 report/collection browser-script fixtures
- generator writes a complete package with scene.toml and JS test scaffold
- generator/runtime share only the explicit file contract, not hidden Rust internals
Real runtime smoke
Manual checklist:
- sg_claw.exe / service console can still run the line-loss deterministic path
- missing-period deterministic line-loss requests prompt instead of defaulting
- plain Zhihu requests still avoid the scene platform
- suffixed unsupported requests fail closed
- line-loss export still opens through the generic postprocess path when configured
Notes For The Engineer
- The paired approved spec is docs/superpowers/specs/2026-04-15-generated-scene-skill-platform-design.md.
- The current repo branch name for the ws baseline is feature/claw-ws, even though the design prose says ws.
- Do not reintroduce the old scene-registry experiment that was explicitly cleaned off the ws branch. This plan deliberately keeps the new runtime under compat and a shared serializable contract instead of reviving the deleted scene-only branch structure blindly.
- Keep scene.toml inside each skill package root. The separate skill_staging/scenes/*/scene.json tree remains legacy metadata only in this plan.
- Keep the generator extractable by holding the boundary at scene.toml, generated package layout, and lessons TOML rules. Avoid runtime code that reaches into generator-only internals.
- If a real scenario directory does not fit the v1 report/collection/browser-script envelope, the analyzer/generator must refuse it explicitly instead of emitting a half-valid package.
- Do not add a generic login/session platform here. Capture that need in docs if discovered, but keep it out of this implementation slice.