Add registry-driven scene routing and multi-root skill loading so fault-details and 95598 scene skills can be triggered from natural language while still running through the browser-backed runtime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scene Skill Runtime Routing Design
Goal: Add a minimal, extensible scene-routing layer so staged business scenes can be triggered from natural language while still executing through the existing browser-backed skill path.
Architecture: Introduce a registry-driven scene contract loader that reads staged scene.json metadata, matches user instructions to a scene, and chooses one of two dispatch modes: direct browser execution or agent-mediated browser execution. Both modes must reuse the same browser-backed skill tool path so scene skills continue to execute through browser-internal methods rather than text-only responses or local fake execution.
Tech Stack: Rust, serde/JSON scene metadata loading, existing BrowserScriptSkillTool, existing compat runtime / runtime engine / workflow executor layers, focused Rust unit tests.
Problem Statement
The codebase already supports two useful but separate ideas:
- Zhihu special-case runtime routing
  - src/compat/workflow_executor.rs detects a narrow set of Zhihu tasks and can execute them directly without relying on the model to choose tools.
  - This is stable, but not extensible for a growing set of business scenes.
- Browser-backed skills
  - src/compat/runtime.rs loads skills and exposes browser_script tools through BrowserScriptSkillTool.
  - src/compat/browser_script_skill_tool.rs executes those tools by calling the browser backend with Action::Eval, so actual execution already happens through browser-internal methods.
  - This is extensible, but tool choice currently depends too heavily on generic agent behavior.
The staged business scenes under D:\data\ideaSpace\rust\sgClaw\claw\claw\skills\skill_staging already provide most of the metadata needed to bridge these two ideas. We need a first integration slice that uses scene metadata to improve routing without turning every scene into a hardcoded Zhihu-style exception.
Design Goals
- Support natural-language triggering for staged scenes.
- Preserve the current browser-backed execution contract: both scene modes must end in browser-internal execution via the existing browser tool path.
- Support both dispatch styles discussed with the user:
  - one scene that can execute without the model
  - one scene that still uses the model for orchestration
- Keep the first slice small, covering only:
  - fault-details-report
  - 95598-repair-city-dispatch
- Keep the design extensible so more scene skills can be added in the same directory later without more ad hoc routing branches.
- Avoid broad refactors or a new generic workflow platform in this slice.
Non-Goals
- Do not build a scene editor, scene UI, or registry authoring workflow.
- Do not implement a full artifact post-processing platform for all report/monitor types.
- Do not convert every staged scene into a direct Rust executor.
- Do not replace the existing Zhihu-specific runtime path in this slice.
Source of Truth and Paths
Staged scene source
The new staged scene source for this work is:
D:\data\ideaSpace\rust\sgClaw\claw\claw\skills\skill_staging
The runtime integration must read scene metadata from this location for the initial slice.
Existing runtime integration points
- src/compat/config_adapter.rs: current skills-dir resolution logic
- src/compat/runtime.rs: current skill loading and browser-script tool exposure
- src/runtime/engine.rs: runtime instruction building and allowed-tool shaping
- src/compat/workflow_executor.rs: existing direct execution routing pattern
- src/compat/browser_script_skill_tool.rs: browser-backed execution path for browser_script tools
Scene Contract Model
Introduce a small internal scene contract model derived from scene.json and paired runtime policy. The loader should extract only the fields needed for the first slice:
- id
- name
- summary
- tags
- inputs
- outputs
- skill.package
- skill.tool
- skill.artifact_type
Add a runtime-only dispatch policy associated with each enabled scene inside the same internal registry entry used at runtime:
- dispatch_mode
  - direct_browser
  - agent_browser
- expected_domain
  - bare hostname required by the underlying browser-backed skill tool
- optional aliases
  - additional deterministic keywords/phrases when id/name/summary/tags are not enough for first-slice matching
- optional default_args
  - runtime-supplied tool arguments when a scene needs fixed/default values for first execution
This runtime policy may be hardcoded in Rust for the first slice, but it must be represented through one consistent scene-routing abstraction so future scenes can join the same path without rewriting the whole design. The abstraction should be a single registry entry type that combines scene metadata with runtime dispatch policy, rather than a metadata loader plus a separate ad hoc match table.
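A minimal sketch of what such a combined registry entry could look like in Rust follows. The type and field names (SceneMetadata, SceneDispatchPolicy, SceneRegistryEntry), the shape of inputs/outputs, and the rule that the tool name is the package and tool joined by a dot are assumptions made for illustration. The serde derives are left out so the sketch has no external dependencies.

```rust
/// Subset of scene.json needed for the first slice (field shapes are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct SceneMetadata {
    pub id: String,
    pub name: String,
    pub summary: String,
    pub tags: Vec<String>,
    pub inputs: Vec<String>,
    pub outputs: Vec<String>,
    pub skill_package: String,
    pub skill_tool: String,
    pub artifact_type: String,
}

/// The two dispatch styles described in this design.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchMode {
    DirectBrowser,
    AgentBrowser,
}

/// Runtime-only policy; may be hardcoded in Rust for the first slice.
#[derive(Debug, Clone, PartialEq)]
pub struct SceneDispatchPolicy {
    pub dispatch_mode: DispatchMode,
    /// Bare hostname required by the browser-backed skill tool.
    pub expected_domain: String,
    /// Optional extra deterministic keywords for matching.
    pub aliases: Vec<String>,
    /// Optional fixed tool arguments for first execution.
    pub default_args: Vec<(String, String)>,
}

/// One registry entry: scene metadata plus dispatch policy, kept together.
#[derive(Debug, Clone, PartialEq)]
pub struct SceneRegistryEntry {
    pub metadata: SceneMetadata,
    pub policy: SceneDispatchPolicy,
}

impl SceneRegistryEntry {
    /// Assumed naming rule: "<skill.package>.<skill.tool>".
    pub fn tool_name(&self) -> String {
        format!("{}.{}", self.metadata.skill_package, self.metadata.skill_tool)
    }
}
```

Keeping policy and metadata in one entry type means the matcher, the direct route, and the instruction builder can all take the same value, which is the property the paragraph above asks for.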
Dispatch Modes
1. direct_browser
This mode is for scenes whose collection flow is deterministic enough to bypass the model once the scene is recognized.
Initial scene: fault-details-report
Behavior:
- Detect scene from natural language.
- Resolve the corresponding browser-backed skill tool.
- Execute it directly through the existing browser-backed skill path.
- Return the collected artifact result without delegating tool choice to the model.
Important constraint: This is not a local fake implementation. Even in direct mode, the actual collection must still go through the existing browser-backed execution path, meaning it ultimately uses browser-internal methods through the browser backend.
2. agent_browser
This mode is for scenes that still benefit from agent orchestration, explanation, or downstream reasoning, but whose business data must still come from browser-backed execution.
Initial scene: 95598-repair-city-dispatch
Behavior:
- Detect scene from natural language.
- Inject a strong scene execution contract into the runtime instruction.
- Treat calling the matching browser-backed skill tool first as a policy requirement for the scene.
- In slice one, enforce that policy through scene-specific instruction injection rather than a hard runtime gate.
- Allow generic browser probing only as a fallback after the scene tool fails.
- Keep final explanation/summarization in the agent path, but never let the model invent business data.
Matching Strategy
Implement a minimal matcher that scores user instructions against enabled scenes using:
- scene id
- scene name
- scene summary
- scene tags
- optional runtime aliases for the first slice
The matcher should be intentionally simple and deterministic in this slice. Avoid semantic embedding or fuzzy retrieval infrastructure.
Expected first-slice matches:
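- fault-details-report
  - phrases like 故障明细 (fault details), 故障明细报表 (fault details report), 导出故障明细 (export fault details)
- 95598-repair-city-dispatch
  - phrases like 95598抢修市指 (95598 repair city dispatch), 市指抢修监测 (city dispatch repair monitoring), 95598抢修队列 (95598 repair queue)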
If no scene matches, runtime behavior must remain unchanged.
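The deterministic matcher described above could be sketched as a keyword-containment score. SceneKeywords, first_slice_scenes, and match_scene are hypothetical names. The keyword lists reuse the example phrases from this section, and summary matching is omitted for brevity.

```rust
/// Hypothetical trimmed view of a registry entry: only what the matcher needs.
pub struct SceneKeywords {
    pub id: &'static str,
    /// Name, tags, and aliases flattened into one keyword list.
    pub keywords: &'static [&'static str],
}

/// First-slice scenes with the example phrases from the design text.
pub fn first_slice_scenes() -> Vec<SceneKeywords> {
    vec![
        SceneKeywords {
            id: "fault-details-report",
            keywords: &["故障明细", "故障明细报表", "导出故障明细"],
        },
        SceneKeywords {
            id: "95598-repair-city-dispatch",
            keywords: &["95598抢修市指", "市指抢修监测", "95598抢修队列"],
        },
    ]
}

/// Counts how many of the scene's keywords (including its id) occur in the instruction.
fn score(instruction: &str, scene: &SceneKeywords) -> usize {
    let text = instruction.to_lowercase();
    std::iter::once(scene.id)
        .chain(scene.keywords.iter().copied())
        .filter(|kw| !kw.is_empty() && text.contains(&kw.to_lowercase()))
        .count()
}

/// Returns the highest-scoring scene, or None when nothing matches.
/// On a tie the scene registered first wins, which keeps the result deterministic.
pub fn match_scene<'a>(
    instruction: &str,
    scenes: &'a [SceneKeywords],
) -> Option<&'a SceneKeywords> {
    let mut best: Option<(&'a SceneKeywords, usize)> = None;
    for scene in scenes {
        let s = score(instruction, scene);
        if s > 0 && best.map_or(true, |(_, b)| s > b) {
            best = Some((scene, s));
        }
    }
    best.map(|(scene, _)| scene)
}
```

A None result maps directly onto the rule that unmatched instructions leave runtime behavior unchanged.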
Runtime Loading Design
Scene registry loading
Add a small loader that reads enabled scenes from the staged scene directory. For the first slice, it is acceptable to read the concrete scene files directly instead of implementing a full generic registry parser, as long as the resulting module boundary is registry-oriented rather than one-off.
The loader should:
- resolve the staged scene root
- read the two initial scene.json files
- deserialize them into a small internal scene metadata struct
- pair them with dispatch policy in the same in-memory registry entry
- ignore malformed or missing scenes safely
- never fail runtime startup solely because one or both initial scene files are absent
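The tolerant loading behavior in the list above might look like the following sketch. The directory layout (one scene.json per scene-id subdirectory) and the load_scenes name are assumptions. The parser is passed in as a closure so the sketch stays dependency-free; in the real runtime it would presumably wrap serde-based deserialization.

```rust
use std::fs;
use std::path::Path;

/// Loads the listed scenes from `<scene_root>/<id>/scene.json` (assumed layout).
/// Missing files, unreadable files, and parse failures are skipped, so the
/// result can be empty but loading itself never fails.
pub fn load_scenes<T>(
    scene_root: &Path,
    scene_ids: &[&str],
    parse: impl Fn(&str) -> Option<T>,
) -> Vec<T> {
    scene_ids
        .iter()
        .filter_map(|id| {
            let path = scene_root.join(id).join("scene.json");
            // A missing or unreadable file drops this scene only.
            let raw = fs::read_to_string(&path).ok()?;
            // A malformed file drops this scene only.
            parse(&raw)
        })
        .collect()
}
```

Because the function returns a plain Vec, runtime startup can proceed with zero, one, or both initial scenes.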
Skill loading alignment
The corresponding skill packages must still be loaded into runtime skill exposure so the browser-backed tools are available to the runtime.
For this slice, the staged scene source and staged skill packages should be treated as coming from the same external root:
- staged scenes under .../skill_staging/scenes
- staged skill packages under .../skill_staging/skills
The implementation must make that staged skill package root visible to runtime skill loading. If current skills_dir resolution cannot express that directly, the design should extend configuration/path resolution to support a staged external skills root explicitly rather than relying on implicit mirroring.
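One way to express an explicit staged root alongside the existing skills directory is sketched below. The function names and the idea of returning an ordered list of skill roots are assumptions; only the scenes and skills subdirectory names come from the layout described above.

```rust
use std::path::{Path, PathBuf};

/// Directory holding staged scene metadata under the staged root.
pub fn staged_scenes_dir(staged_root: &Path) -> PathBuf {
    staged_root.join("scenes")
}

/// Ordered list of skill roots the runtime should load from: the existing
/// skills_dir first, then the staged skills directory when one is configured.
/// Duplicates are dropped so the same root is not loaded twice.
pub fn skill_roots(skills_dir: &Path, staged_root: Option<&Path>) -> Vec<PathBuf> {
    let mut roots = vec![skills_dir.to_path_buf()];
    if let Some(root) = staged_root {
        let staged_skills = root.join("skills");
        if !roots.contains(&staged_skills) {
            roots.push(staged_skills);
        }
    }
    roots
}
```

With no staged root configured the result is just the existing skills_dir, so current behavior is preserved.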
Execution Design
Direct browser path (fault-details-report)
Add a direct execution route that is scene-driven rather than Zhihu-specific.
High-level flow:
- Runtime receives user instruction.
- Scene matcher recognizes fault-details-report.
- Runtime resolves the browser-backed tool name fault-details-report.collect_fault_details.
- Runtime builds the required tool arguments, including:
  - expected_domain from the matched scene's runtime policy
  - any first-slice scene inputs that can be deterministically derived from the current request/context
  - any fixed/default args declared in runtime policy
- Runtime executes that skill through the existing browser-backed mechanism.
- Runtime returns normalized tool output as the direct route result.
Input/argument rules for the first slice:
- Direct execution is only allowed when all required tool arguments are available.
- expected_domain must always come from runtime scene policy, not from model inference.
- If a required scene/tool input cannot be derived from the user request or current browser context, the direct route must fail clearly instead of fabricating values.
- The first slice may keep direct-mode argument mapping intentionally narrow; unsupported requests should fall back safely rather than guessing.
Return-shape rule for the first slice:
- The direct route should return normalized serialized tool output (for example, the tool payload string or normalized JSON text), not a model-authored prose summary. This keeps direct mode deterministic and makes the browser-backed result explicit.
Implementation note: The cleanest first slice is to add a small scene direct-execution helper in the compat runtime/workflow area that invokes the already-loaded browser-backed skill tool abstraction rather than duplicating browser request logic.
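The argument rules for direct mode can be illustrated with a small builder. Everything here is a sketch under assumptions: build_direct_args, DirectRouteError, and the string-to-string argument map are invented for illustration, and argument names other than expected_domain are placeholders.

```rust
use std::collections::BTreeMap;

/// Why the direct route refused to run.
#[derive(Debug, PartialEq)]
pub enum DirectRouteError {
    MissingArgument(String),
}

/// Merges policy defaults and request-derived values into tool arguments.
/// expected_domain is always taken from scene policy and overwrites any
/// other value supplied under that key. Missing required arguments produce
/// an error instead of a guessed value.
pub fn build_direct_args(
    expected_domain: &str,
    default_args: &[(&str, &str)],
    derived_args: &[(&str, &str)],
    required: &[&str],
) -> Result<BTreeMap<String, String>, DirectRouteError> {
    let mut args = BTreeMap::new();
    for &(key, value) in default_args {
        args.insert(key.to_string(), value.to_string());
    }
    // Request-derived values override policy defaults.
    for &(key, value) in derived_args {
        args.insert(key.to_string(), value.to_string());
    }
    // Policy-supplied domain wins over everything else.
    args.insert("expected_domain".to_string(), expected_domain.to_string());
    for name in required {
        if !args.contains_key(*name) {
            return Err(DirectRouteError::MissingArgument(name.to_string()));
        }
    }
    Ok(args)
}
```

Returning an error for a missing argument gives the caller a clear point at which to fall back safely rather than guess.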
Agent browser path (95598-repair-city-dispatch)
This path stays inside the agent flow.
High-level flow:
- Runtime receives user instruction.
- Scene matcher recognizes 95598-repair-city-dispatch.
- RuntimeEngine::build_instruction injects a scene execution contract containing:
  - the matched scene name
  - the required tool name 95598-repair-city-dispatch.collect_repair_orders
  - explicit requirement that this is a browser workflow, not a text-only task
  - explicit requirement that business data must come from the browser-backed scene tool
  - fallback rules for generic browser probing only after tool failure
- Agent runs and chooses the required tool.
- Tool executes through the existing browser-backed skill path.
- Agent may summarize the result, but cannot fabricate data.
Enforcement note for the first slice:
- The agent_browser guarantee is primarily an instruction-contract guarantee in slice one.
- If allowed-tool shaping can narrow the exposed tool set for a matched scene without destabilizing existing behavior, that is a valid enhancement, but it is not required for the first slice.
- The minimum guaranteed behavior for slice one is strong scene-specific prompt injection plus preservation of the rule that the model must not invent collected business data.
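As an illustration of the instruction-contract approach, the sketch below appends a scene contract to the base instruction only when a scene matched. The helper names and the exact contract wording are assumptions; this is not the existing RuntimeEngine::build_instruction signature.

```rust
/// Builds the scene execution contract text (wording is illustrative only).
pub fn scene_contract(scene_name: &str, tool_name: &str) -> String {
    format!(
        "Scene execution contract for \"{}\":\n\
         - This is a browser workflow, not a text-only task.\n\
         - Call the tool {} first; business data must come from it.\n\
         - Use generic browser probing only if that tool call fails.\n\
         - Never invent collected business data.",
        scene_name, tool_name
    )
}

/// Appends the contract when a scene matched; otherwise returns the base
/// instruction untouched so unmatched requests gain no extra constraints.
pub fn build_instruction(base: &str, contract: Option<&str>) -> String {
    match contract {
        Some(text) => format!("{}\n\n{}", base, text),
        None => base.to_string(),
    }
}
```

Keeping the no-match branch as a pure pass-through makes the "unmatched instructions do not gain scene-specific constraints" test trivial to write.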
Browser Execution Contract
This requirement is non-negotiable for both dispatch modes:
- scene skills must execute like the Zhihu flow in the sense that the final business action is performed through browser-internal methods
- scene skills must not devolve into text-only pseudo execution
- direct mode and agent mode both reuse the existing browser-backed skill execution path
Concretely, the final path for scene skill execution should remain compatible with:
- BrowserScriptSkillTool
- browser backend invocation
- browser-side Eval / browser action execution semantics
Error Handling
- Scene metadata missing or invalid: skip that scene and continue with normal runtime behavior.
- Scene matched but skill/tool unavailable: do not crash; log enough context for diagnosis and fall back safely.
- Browser surface unavailable: disable scene browser routing for that turn and fall back to current non-scene behavior.
- Tool execution fails in agent_browser mode: allow existing fallback prompt behavior to continue, but preserve the rule that the model cannot invent collected data.
- Tool execution fails in direct_browser mode: return a concise execution failure instead of pretending collection succeeded.
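The fallback rules above reduce to a small decision function, sketched here under assumptions: RouteDecision, decide_route, and direct_result are hypothetical names, DispatchMode is repeated so the block stands alone, and logging is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchMode {
    DirectBrowser,
    AgentBrowser,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RouteDecision {
    /// Execute the scene tool directly, without the model.
    DirectTool,
    /// Stay in the agent flow with the scene contract injected.
    AgentWithContract,
    /// Behave exactly as the runtime does today.
    Unchanged,
}

/// Chooses a route for this turn. Any missing prerequisite (no match, tool
/// not loaded, browser surface unavailable) falls back to unchanged behavior.
pub fn decide_route(
    matched: Option<DispatchMode>,
    tool_loaded: bool,
    browser_available: bool,
) -> RouteDecision {
    match matched {
        Some(_) if !tool_loaded || !browser_available => RouteDecision::Unchanged,
        Some(DispatchMode::DirectBrowser) => RouteDecision::DirectTool,
        Some(DispatchMode::AgentBrowser) => RouteDecision::AgentWithContract,
        None => RouteDecision::Unchanged,
    }
}

/// Direct mode reports a concise failure rather than pretending success.
pub fn direct_result(tool_result: Result<String, String>) -> String {
    match tool_result {
        Ok(payload) => payload,
        Err(reason) => format!("scene tool execution failed: {}", reason),
    }
}
```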
Extensibility Rules
This slice should be built so future scene additions only need:
- a new scene metadata file under the staged scene path
- a matching skill package/tool
- a dispatch-mode declaration/policy
- optional aliases if the natural-language names are not sufficiently explicit
Avoid these anti-patterns:
- per-scene "if user said X then do Y" branches scattered across runtime files
- duplicating browser execution code for each scene
- binding future scenes to Zhihu-specific assumptions
Testing Strategy
Scene registry tests
- load valid metadata for fault-details-report
- load valid metadata for 95598-repair-city-dispatch
- ignore broken/missing scene files safely
Matching tests
- instruction variants match fault-details-report
- instruction variants match 95598-repair-city-dispatch
- unrelated instructions do not match
Instruction-building tests
- agent_browser scene injects the required browser-first scene contract
- unmatched instructions do not gain scene-specific constraints
- Zhihu-specific instruction behavior remains unchanged
Tool exposure tests
- staged skills from the moved path are loaded into runtime
- browser-backed tool names include:
  - fault-details-report.collect_fault_details
  - 95598-repair-city-dispatch.collect_repair_orders
Direct execution tests
- fault-details-report direct route invokes the browser-backed tool path rather than bypassing the browser layer
- direct route returns failure cleanly when tool execution fails
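A recording test double is one way to check that the direct route really goes through the tool abstraction. The SceneTool trait, run_direct_scene, and RecordingTool below are hypothetical stand-ins for the real BrowserScriptSkillTool path, which is not reproduced here.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the browser-backed tool abstraction.
pub trait SceneTool {
    fn name(&self) -> &str;
    fn execute(&self, args: &[(String, String)]) -> Result<String, String>;
}

/// Direct route: delegates to the tool and adds the tool name to any error.
pub fn run_direct_scene(
    tool: &dyn SceneTool,
    args: &[(String, String)],
) -> Result<String, String> {
    tool.execute(args)
        .map_err(|e| format!("{} failed: {}", tool.name(), e))
}

/// Test double that records every call and returns a canned response.
pub struct RecordingTool {
    pub calls: RefCell<Vec<Vec<(String, String)>>>,
    pub response: Result<String, String>,
}

impl SceneTool for RecordingTool {
    fn name(&self) -> &str {
        "fault-details-report.collect_fault_details"
    }

    fn execute(&self, args: &[(String, String)]) -> Result<String, String> {
        self.calls.borrow_mut().push(args.to_vec());
        self.response.clone()
    }
}
```

A test can then assert both that exactly one call was recorded with the expected arguments and that a canned failure surfaces as an error rather than a fabricated result.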
Recommended First Implementation Slice
- Add a tiny scene metadata loader and dispatch-mode policy module.
- Extend runtime path resolution so the moved staged skills/scenes are visible.
- Add deterministic scene matching for the two initial scenes.
- Implement agent_browser instruction injection for 95598-repair-city-dispatch.
- Implement direct_browser execution for fault-details-report using the browser-backed skill path.
- Add focused tests for matching, loading, tool exposure, and direct-vs-agent behavior.
Open Design Constraint Captured From Discussion
The user explicitly requires the following combined behavior:
- support both kinds of scene execution in the same architecture
- one initial scene should be able to execute without the model
- one initial scene should execute through the model
- both must still use browser-internal execution methods like the Zhihu path
- the design must stay extensible because more staged skills may be added under the same path later
This design is built around those exact constraints.