# Scene Skill Runtime Routing Design

**Goal:** Add a minimal, extensible scene-routing layer so staged business scenes can be triggered from natural language while still executing through the existing browser-backed skill path.

**Architecture:** Introduce a registry-driven scene contract loader that reads staged `scene.json` metadata, matches user instructions to a scene, and chooses one of two dispatch modes: direct browser execution or agent-mediated browser execution. Both modes must reuse the same browser-backed skill tool path so scene skills continue to execute through browser-internal methods rather than text-only responses or local fake execution.

**Tech Stack:** Rust, serde/JSON scene metadata loading, existing `BrowserScriptSkillTool`, existing compat runtime / runtime engine / workflow executor layers, focused Rust unit tests.

---

## Problem Statement

The codebase already supports two useful but separate ideas:

1. **Zhihu special-case runtime routing**
   - `src/compat/workflow_executor.rs` detects a narrow set of Zhihu tasks and can execute them directly without relying on the model to choose tools.
   - This is stable, but not extensible for a growing set of business scenes.
2. **Browser-backed skills**
   - `src/compat/runtime.rs` loads skills and exposes `browser_script` tools through `BrowserScriptSkillTool`.
   - `src/compat/browser_script_skill_tool.rs` executes those tools by calling the browser backend with `Action::Eval`, so actual execution already happens through browser-internal methods.
   - This is extensible, but tool choice currently depends too heavily on generic agent behavior.

The staged business scenes under `D:\data\ideaSpace\rust\sgClaw\claw\claw\skills\skill_staging` already provide most of the metadata needed to bridge these two ideas. We need a first integration slice that uses scene metadata to improve routing without turning every scene into a hardcoded Zhihu-style exception.
## Design Goals

- Support natural-language triggering for staged scenes.
- Preserve the current browser-backed execution contract: both scene modes must end in browser-internal execution via the existing browser tool path.
- Support both dispatch styles discussed with the user:
  - one scene that can execute without the model
  - one scene that still uses the model for orchestration
- Keep the first slice small, covering only:
  - `fault-details-report`
  - `95598-repair-city-dispatch`
- Keep the design extensible so more scene skills can be added in the same directory later without more ad hoc routing branches.
- Avoid broad refactors or a new generic workflow platform in this slice.

## Non-Goals

- Do not build a scene editor, scene UI, or registry authoring workflow.
- Do not implement a full artifact post-processing platform for all report/monitor types.
- Do not convert every staged scene into a direct Rust executor.
- Do not replace the existing Zhihu-specific runtime path in this slice.

## Source of Truth and Paths

### Staged scene source

The new staged scene source for this work is:

- `D:\data\ideaSpace\rust\sgClaw\claw\claw\skills\skill_staging`

The runtime integration must read scene metadata from this location for the initial slice.

### Existing runtime integration points

- `src/compat/config_adapter.rs`: current skills-dir resolution logic
- `src/compat/runtime.rs`: current skill loading and browser-script tool exposure
- `src/runtime/engine.rs`: runtime instruction building and allowed-tool shaping
- `src/compat/workflow_executor.rs`: existing direct execution routing pattern
- `src/compat/browser_script_skill_tool.rs`: browser-backed execution path for `browser_script` tools

## Scene Contract Model

Introduce a small internal scene contract model derived from `scene.json` and paired runtime policy.
The loader should extract only the fields needed for the first slice:

- `id`
- `name`
- `summary`
- `tags`
- `inputs`
- `outputs`
- `skill.package`
- `skill.tool`
- `skill.artifact_type`

Add a runtime-only dispatch policy associated with each enabled scene inside the same internal registry entry used at runtime:

- `dispatch_mode`
  - `direct_browser`
  - `agent_browser`
- `expected_domain`
  - bare hostname required by the underlying browser-backed skill tool
- optional `aliases`
  - additional deterministic keywords/phrases when `id/name/summary/tags` are not enough for first-slice matching
- optional `default_args`
  - runtime-supplied tool arguments when a scene needs fixed/default values for first execution

This runtime policy may be hardcoded in Rust for the first slice, but it must be represented through one consistent scene-routing abstraction so future scenes can join the same path without rewriting the whole design. The abstraction should be a single registry entry type that combines scene metadata with runtime dispatch policy, rather than a metadata loader plus a separate ad hoc match table.

## Dispatch Modes

### 1. `direct_browser`

This mode is for scenes whose collection flow is deterministic enough to bypass the model once the scene is recognized.

**Initial scene:** `fault-details-report`

**Behavior:**

- Detect scene from natural language.
- Resolve the corresponding browser-backed skill tool.
- Execute it directly through the existing browser-backed skill path.
- Return the collected artifact result without delegating tool choice to the model.

**Important constraint:** This is not a local fake implementation. Even in direct mode, the actual collection must still go through the existing browser-backed execution path, meaning it ultimately uses browser-internal methods through the browser backend.

### 2. `agent_browser`

This mode is for scenes that still benefit from agent orchestration, explanation, or downstream reasoning, but whose business data must still come from browser-backed execution.

**Initial scene:** `95598-repair-city-dispatch`

**Behavior:**

- Detect scene from natural language.
- Inject a strong scene execution contract into the runtime instruction.
- Treat calling the matching browser-backed skill tool first as a policy requirement for the scene.
- In slice one, enforce that policy through scene-specific instruction injection rather than a hard runtime gate.
- Allow generic browser probing only as a fallback after the scene tool fails.
- Keep final explanation/summarization in the agent path, but never let the model invent business data.

## Matching Strategy

Implement a minimal matcher that scores user instructions against enabled scenes using:

- scene `id`
- scene `name`
- scene `summary`
- scene `tags`
- optional runtime aliases for the first slice

The matcher should be intentionally simple and deterministic in this slice. Avoid semantic embedding or fuzzy retrieval infrastructure.

Expected first-slice matches:

- `fault-details-report`
  - phrases like `故障明细`, `故障明细报表`, `导出故障明细`
- `95598-repair-city-dispatch`
  - phrases like `95598抢修市指`, `市指抢修监测`, `95598抢修队列`

If no scene matches, runtime behavior must remain unchanged.

## Runtime Loading Design

### Scene registry loading

Add a small loader that reads enabled scenes from the staged scene directory. For the first slice, it is acceptable to read the concrete scene files directly instead of implementing a full generic registry parser, as long as the resulting module boundary is registry-oriented rather than one-off.
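
A registry-oriented boundary of this kind might look like the sketch below: one entry type that carries both scene metadata and runtime policy, plus a deterministic substring matcher over `id`, `name`, `tags`, and aliases. Everything here is an assumption made for illustration, not the actual implementation: the names `SceneEntry`, `SceneRegistry`, `entry`, and `first_slice_registry` are hypothetical, `example.invalid` is a placeholder for the real `expected_domain`, `name` and `tags` are left empty because they would be read from `scene.json`, `summary` scoring is omitted for brevity, and ties resolve to the later entry. Only the scene ids, tool names, and alias phrases come from this document.

```rust
/// Mirrors the design's `dispatch_mode` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchMode {
    DirectBrowser,
    AgentBrowser,
}

/// One registry entry: `scene.json` metadata plus runtime-only policy.
/// The type and its field layout are illustrative assumptions.
#[derive(Debug, Clone)]
pub struct SceneEntry {
    pub id: String,
    pub name: String,            // would be read from scene.json
    pub tags: Vec<String>,       // would be read from scene.json
    pub tool: String,            // browser-backed tool name
    pub dispatch_mode: DispatchMode,
    pub expected_domain: String, // bare hostname from runtime policy
    pub aliases: Vec<String>,
}

impl SceneEntry {
    /// Counts how many of id/name/tags/aliases occur in the instruction.
    fn score(&self, instruction: &str) -> usize {
        let text = instruction.to_lowercase();
        std::iter::once(&self.id)
            .chain(std::iter::once(&self.name))
            .chain(self.tags.iter())
            .chain(self.aliases.iter())
            .filter(|p| !p.is_empty() && text.contains(p.to_lowercase().as_str()))
            .count()
    }
}

pub struct SceneRegistry {
    entries: Vec<SceneEntry>,
}

impl SceneRegistry {
    pub fn new(entries: Vec<SceneEntry>) -> Self {
        Self { entries }
    }

    /// Returns the highest-scoring scene, or `None` when nothing matches.
    pub fn match_instruction(&self, instruction: &str) -> Option<&SceneEntry> {
        self.entries
            .iter()
            .map(|e| (e.score(instruction), e))
            .filter(|(score, _)| *score > 0)
            .max_by_key(|(score, _)| *score)
            .map(|(_, e)| e)
    }
}

/// Hypothetical helper that hardcodes first-slice policy for one scene.
fn entry(id: &str, tool: &str, mode: DispatchMode, aliases: &[&str]) -> SceneEntry {
    SceneEntry {
        id: id.to_string(),
        name: String::new(),
        tags: Vec::new(),
        tool: tool.to_string(),
        dispatch_mode: mode,
        expected_domain: "example.invalid".to_string(), // placeholder, not a real policy value
        aliases: aliases.iter().map(|a| a.to_string()).collect(),
    }
}

/// Builds the two first-slice entries with the alias phrases listed above.
pub fn first_slice_registry() -> SceneRegistry {
    SceneRegistry::new(vec![
        entry(
            "fault-details-report",
            "fault-details-report.collect_fault_details",
            DispatchMode::DirectBrowser,
            &["故障明细", "故障明细报表", "导出故障明细"],
        ),
        entry(
            "95598-repair-city-dispatch",
            "95598-repair-city-dispatch.collect_repair_orders",
            DispatchMode::AgentBrowser,
            &["95598抢修市指", "市指抢修监测", "95598抢修队列"],
        ),
    ])
}
```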
The loader should:

- resolve the staged scene root
- read the two initial `scene.json` files
- deserialize them into a small internal scene metadata struct
- pair them with dispatch policy in the same in-memory registry entry
- ignore malformed or missing scenes safely
- never fail runtime startup solely because one or both initial scene files are absent

### Skill loading alignment

The corresponding skill packages must still be loaded into runtime skill exposure so the browser-backed tools are available to the runtime.

For this slice, the staged scene source and staged skill packages should be treated as coming from the same external root:

- staged scenes under `.../skill_staging/scenes`
- staged skill packages under `.../skill_staging/skills`

The implementation must make that staged skill package root visible to runtime skill loading. If current `skills_dir` resolution cannot express that directly, the design should extend configuration/path resolution to support a staged external skills root explicitly rather than relying on implicit mirroring.

## Execution Design

### Direct browser path (`fault-details-report`)

Add a direct execution route that is scene-driven rather than Zhihu-specific.

High-level flow:

1. Runtime receives user instruction.
2. Scene matcher recognizes `fault-details-report`.
3. Runtime resolves the browser-backed tool name `fault-details-report.collect_fault_details`.
4. Runtime builds the required tool arguments, including:
   - `expected_domain` from the matched scene's runtime policy
   - any first-slice scene inputs that can be deterministically derived from the current request/context
   - any fixed/default args declared in runtime policy
5. Runtime executes that skill through the existing browser-backed mechanism.
6. Runtime returns normalized tool output as the direct route result.

Input/argument rules for the first slice:

- Direct execution is only allowed when all required tool arguments are available.
- `expected_domain` must always come from runtime scene policy, not from model inference.
- If a required scene/tool input cannot be derived from the user request or current browser context, the direct route must fail clearly instead of fabricating values.
- The first slice may keep direct-mode argument mapping intentionally narrow; unsupported requests should fall back safely rather than guessing.

Return-shape rule for the first slice:

- The direct route should return normalized serialized tool output (for example, the tool payload string or normalized JSON text), not a model-authored prose summary. This keeps direct mode deterministic and makes the browser-backed result explicit.

Implementation note:

The cleanest first slice is to add a small scene direct-execution helper in the compat runtime/workflow area that invokes the already-loaded browser-backed skill tool abstraction rather than duplicating browser request logic.

### Agent browser path (`95598-repair-city-dispatch`)

This path stays inside the agent flow.

High-level flow:

1. Runtime receives user instruction.
2. Scene matcher recognizes `95598-repair-city-dispatch`.
3. `RuntimeEngine::build_instruction` injects a scene execution contract containing:
   - the matched scene name
   - the required tool name `95598-repair-city-dispatch.collect_repair_orders`
   - explicit requirement that this is a browser workflow, not a text-only task
   - explicit requirement that business data must come from the browser-backed scene tool
   - fallback rules for generic browser probing only after tool failure
4. Agent runs and chooses the required tool.
5. Tool executes through the existing browser-backed skill path.
6. Agent may summarize the result, but cannot fabricate data.

Enforcement note for the first slice:

- The `agent_browser` guarantee is primarily an instruction-contract guarantee in slice one.
- If allowed-tool shaping can narrow the exposed tool set for a matched scene without destabilizing existing behavior, that is a valid enhancement, but it is not required for the first slice.
- The minimum guaranteed behavior for slice one is strong scene-specific prompt injection plus preservation of the rule that the model must not invent collected business data.

## Browser Execution Contract

This requirement is non-negotiable for both dispatch modes:

- scene skills must execute like the Zhihu flow in the sense that the final business action is performed through browser-internal methods
- scene skills must not devolve into text-only pseudo execution
- direct mode and agent mode both reuse the existing browser-backed skill execution path

Concretely, the final path for scene skill execution should remain compatible with:

- `BrowserScriptSkillTool`
- browser backend invocation
- browser-side `Eval` / browser action execution semantics

## Error Handling

- **Scene metadata missing or invalid:** skip that scene and continue with normal runtime behavior.
- **Scene matched but skill/tool unavailable:** do not crash; log enough context for diagnosis and fall back safely.
- **Browser surface unavailable:** disable scene browser routing for that turn and fall back to current non-scene behavior.
- **Tool execution fails in `agent_browser` mode:** allow existing fallback prompt behavior to continue, but preserve the rule that the model cannot invent collected data.
- **Tool execution fails in `direct_browser` mode:** return a concise execution failure instead of pretending collection succeeded.
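
The fallback rules above can be sketched as a small per-turn planning function. This is a minimal illustration under stated assumptions, not the runtime's real API: `RoutePlan`, `MatchedScene`, `TurnContext`, `plan_route`, and `direct_result` are hypothetical names, logging is reduced to `eprintln!`, and "fall back safely" is read here as "leave the turn on the current non-scene path".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchMode {
    DirectBrowser,
    AgentBrowser,
}

/// What the runtime should do for this turn (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum RoutePlan {
    /// No scene routing: keep current non-scene behavior.
    Unchanged,
    /// Call the scene tool directly, without the model.
    Direct { tool: String },
    /// Stay in the agent flow with a scene contract injected.
    Agent { tool: String },
}

/// The result of scene matching for one instruction (hypothetical type).
pub struct MatchedScene<'a> {
    pub tool: &'a str,
    pub mode: DispatchMode,
}

/// Per-turn facts the planner needs (hypothetical type).
pub struct TurnContext<'a> {
    pub browser_available: bool,
    pub loaded_tools: &'a [&'a str],
}

/// Applies the fallback rules before any tool is invoked.
pub fn plan_route(scene: Option<MatchedScene<'_>>, ctx: &TurnContext<'_>) -> RoutePlan {
    let scene = match scene {
        Some(scene) => scene,
        None => return RoutePlan::Unchanged, // no match: behavior stays unchanged
    };
    if !ctx.browser_available {
        // Browser surface unavailable: disable scene routing for this turn.
        return RoutePlan::Unchanged;
    }
    if !ctx.loaded_tools.contains(&scene.tool) {
        // Scene matched but tool unavailable: log and fall back, never crash.
        eprintln!("scene tool not loaded, skipping scene route: {}", scene.tool);
        return RoutePlan::Unchanged;
    }
    match scene.mode {
        DispatchMode::DirectBrowser => RoutePlan::Direct { tool: scene.tool.to_string() },
        DispatchMode::AgentBrowser => RoutePlan::Agent { tool: scene.tool.to_string() },
    }
}

/// Direct mode: pass tool output through and surface failures concisely,
/// never a fabricated success.
pub fn direct_result(tool: &str, outcome: Result<String, String>) -> Result<String, String> {
    outcome.map_err(|e| format!("scene tool `{}` failed: {}", tool, e))
}
```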
## Extensibility Rules

This slice should be built so future scene additions only need:

- a new scene metadata file under the staged scene path
- a matching skill package/tool
- a dispatch-mode declaration/policy
- optional aliases if the natural-language names are not sufficiently explicit

Avoid these anti-patterns:

- per-scene `if user said X then do Y` branches scattered across runtime files
- duplicating browser execution code for each scene
- binding future scenes to Zhihu-specific assumptions

## Testing Strategy

### Scene registry tests

- load valid metadata for `fault-details-report`
- load valid metadata for `95598-repair-city-dispatch`
- ignore broken/missing scene files safely

### Matching tests

- instruction variants match `fault-details-report`
- instruction variants match `95598-repair-city-dispatch`
- unrelated instructions do not match

### Instruction-building tests

- `agent_browser` scene injects the required browser-first scene contract
- unmatched instructions do not gain scene-specific constraints
- Zhihu-specific instruction behavior remains unchanged

### Tool exposure tests

- staged skills from the moved path are loaded into runtime
- browser-backed tool names include:
  - `fault-details-report.collect_fault_details`
  - `95598-repair-city-dispatch.collect_repair_orders`

### Direct execution tests

- `fault-details-report` direct route invokes the browser-backed tool path rather than bypassing the browser layer
- direct route returns failure cleanly when tool execution fails

## Recommended First Implementation Slice

1. Add a tiny scene metadata loader and dispatch-mode policy module.
2. Extend runtime path resolution so the moved staged skills/scenes are visible.
3. Add deterministic scene matching for the two initial scenes.
4. Implement `agent_browser` instruction injection for `95598-repair-city-dispatch`.
5. Implement `direct_browser` execution for `fault-details-report` using the browser-backed skill path.
6. Add focused tests for matching, loading, tool exposure, and direct-vs-agent behavior.

## Open Design Constraint Captured From Discussion

The user explicitly requires the following combined behavior:

- support both kinds of scene execution in the same architecture
- one initial scene should be able to execute without the model
- one initial scene should execute through the model
- both must still use browser-internal execution methods like the Zhihu path
- the design must stay extensible because more staged skills may be added under the same path later

This design is built around those exact constraints.
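
As an illustration of how both modes might share one execution path, the sketch below routes a matched scene either to a direct tool call or to an injected instruction contract, with the same tool abstraction behind both. It is only a sketch under assumptions: the `BrowserBackedTool` trait is a hypothetical stand-in for the existing `BrowserScriptSkillTool` (whose real interface is not shown in this document), `EchoTool` is a test double that echoes its arguments instead of calling a browser, and `ScenePolicy`, `SceneOutcome`, `build_args`, `scene_contract`, and `dispatch` are illustrative names. The contract wording is likewise an example, not the final prompt text.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the existing browser-backed tool abstraction.
pub trait BrowserBackedTool {
    fn name(&self) -> &str;
    fn execute(&self, args: &BTreeMap<String, String>) -> Result<String, String>;
}

pub enum DispatchMode {
    DirectBrowser,
    AgentBrowser,
}

/// Runtime-only policy for one scene (illustrative layout).
pub struct ScenePolicy {
    pub scene_name: String,
    pub mode: DispatchMode,
    pub expected_domain: String,
    pub default_args: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SceneOutcome {
    /// Serialized tool output returned as-is (direct mode).
    ToolOutput(String),
    /// Extra instruction text for the agent, which then calls the same tool.
    InjectedContract(String),
}

/// Merges default and derived args; `expected_domain` always comes from policy.
pub fn build_args(
    policy: &ScenePolicy,
    derived: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut args = policy.default_args.clone();
    args.extend(derived.iter().map(|(k, v)| (k.clone(), v.clone())));
    args.insert("expected_domain".to_string(), policy.expected_domain.clone());
    args
}

/// Builds an example scene execution contract for the agent path.
pub fn scene_contract(policy: &ScenePolicy, tool_name: &str) -> String {
    format!(
        "Scene: {}\nThis is a browser workflow, not a text-only task.\n\
         Call `{}` first; business data must come from this tool.\n\
         Use generic browser probing only if the tool fails.\n\
         Do not invent business data.",
        policy.scene_name, tool_name
    )
}

/// Direct mode calls the tool now; agent mode defers the call to the agent.
pub fn dispatch(
    policy: &ScenePolicy,
    tool: &dyn BrowserBackedTool,
    derived: &BTreeMap<String, String>,
) -> Result<SceneOutcome, String> {
    match policy.mode {
        DispatchMode::DirectBrowser => tool
            .execute(&build_args(policy, derived))
            .map(SceneOutcome::ToolOutput),
        DispatchMode::AgentBrowser => Ok(SceneOutcome::InjectedContract(scene_contract(
            policy,
            tool.name(),
        ))),
    }
}

/// Test double: echoes its arguments instead of driving a browser.
pub struct EchoTool {
    pub name: String,
}

impl BrowserBackedTool for EchoTool {
    fn name(&self) -> &str {
        &self.name
    }
    fn execute(&self, args: &BTreeMap<String, String>) -> Result<String, String> {
        Ok(format!("{:?}", args))
    }
}
```

Because `build_args` inserts `expected_domain` last, a value supplied by the request or the model cannot override the one from runtime policy.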