Add registry-driven scene routing and multi-root skill loading so fault-details and 95598 scene skills can be triggered from natural language while still running through the browser-backed runtime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Scene Skill Runtime Routing Design

**Goal:** Add a minimal, extensible scene-routing layer so staged business scenes can be triggered from natural language while still executing through the existing browser-backed skill path.

**Architecture:** Introduce a registry-driven scene contract loader that reads staged `scene.json` metadata, matches user instructions to a scene, and chooses one of two dispatch modes: direct browser execution or agent-mediated browser execution. Both modes must reuse the same browser-backed skill tool path so scene skills continue to execute through browser-internal methods rather than text-only responses or local fake execution.

**Tech Stack:** Rust, serde/JSON scene metadata loading, existing `BrowserScriptSkillTool`, existing compat runtime / runtime engine / workflow executor layers, focused Rust unit tests.

---

## Problem Statement

The codebase already supports two useful but separate ideas:

1. **Zhihu special-case runtime routing**
   - `src/compat/workflow_executor.rs` detects a narrow set of Zhihu tasks and can execute them directly without relying on the model to choose tools.
   - This is stable, but it does not extend to a growing set of business scenes.

2. **Browser-backed skills**
   - `src/compat/runtime.rs` loads skills and exposes `browser_script` tools through `BrowserScriptSkillTool`.
   - `src/compat/browser_script_skill_tool.rs` executes those tools by calling the browser backend with `Action::Eval`, so actual execution already happens through browser-internal methods.
   - This is extensible, but tool choice currently depends too heavily on generic agent behavior.

The staged business scenes under `D:\data\ideaSpace\rust\sgClaw\claw\claw\skills\skill_staging` already provide most of the metadata needed to bridge these two ideas. We need a first integration slice that uses scene metadata to improve routing without turning every scene into a hardcoded Zhihu-style exception.

## Design Goals

- Support natural-language triggering for staged scenes.
- Preserve the current browser-backed execution contract: both scene modes must end in browser-internal execution via the existing browser tool path.
- Support both dispatch styles discussed with the user:
  - one scene that can execute without the model
  - one scene that still uses the model for orchestration
- Keep the first slice small, covering only:
  - `fault-details-report`
  - `95598-repair-city-dispatch`
- Keep the design extensible so more scene skills can be added in the same directory later without more ad hoc routing branches.
- Avoid broad refactors or a new generic workflow platform in this slice.

## Non-Goals

- Do not build a scene editor, scene UI, or registry authoring workflow.
- Do not implement a full artifact post-processing platform for all report/monitor types.
- Do not convert every staged scene into a direct Rust executor.
- Do not replace the existing Zhihu-specific runtime path in this slice.

## Source of Truth and Paths

### Staged scene source
The new staged scene source for this work is:

- `D:\data\ideaSpace\rust\sgClaw\claw\claw\skills\skill_staging`

The runtime integration must read scene metadata from this location for the initial slice.

### Existing runtime integration points
- `src/compat/config_adapter.rs`: current skills-dir resolution logic
- `src/compat/runtime.rs`: current skill loading and browser-script tool exposure
- `src/runtime/engine.rs`: runtime instruction building and allowed-tool shaping
- `src/compat/workflow_executor.rs`: existing direct execution routing pattern
- `src/compat/browser_script_skill_tool.rs`: browser-backed execution path for `browser_script` tools

## Scene Contract Model

Introduce a small internal scene contract model derived from `scene.json` and a paired runtime policy. The loader should extract only the fields needed for the first slice:

- `id`
- `name`
- `summary`
- `tags`
- `inputs`
- `outputs`
- `skill.package`
- `skill.tool`
- `skill.artifact_type`

Add a runtime-only dispatch policy to each enabled scene, stored in the same internal registry entry that the runtime uses:

- `dispatch_mode`
  - `direct_browser`
  - `agent_browser`
- `expected_domain`
  - bare hostname required by the underlying browser-backed skill tool
- optional `aliases`
  - additional deterministic keywords/phrases when `id/name/summary/tags` are not enough for first-slice matching
- optional `default_args`
  - runtime-supplied tool arguments when a scene needs fixed/default values for first execution

This runtime policy may be hardcoded in Rust for the first slice, but it must be represented through one consistent scene-routing abstraction so future scenes can join the same path without rewriting the whole design. The abstraction should be a single registry entry type that combines scene metadata with runtime dispatch policy, rather than a metadata loader plus a separate ad hoc match table.

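A minimal sketch of the combined registry entry described above. All type and field names here are illustrative assumptions, not existing code. The sketch is std-only so it stands alone; a real loader would presumably derive `serde::Deserialize` on the metadata struct. `inputs` and `outputs` are left out because this document does not specify their JSON shape, and composing the tool name as `<package>.<tool>` is inferred from the tool names quoted elsewhere in this design.

```rust
/// How a matched scene is dispatched (variant names mirror the two modes above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchMode {
    DirectBrowser,
    AgentBrowser,
}

/// Subset of `scene.json` used by the first slice (hypothetical struct).
/// `inputs` / `outputs` are omitted: their shape is not defined in this document.
#[derive(Debug, Clone, Default)]
pub struct SceneMetadata {
    pub id: String,
    pub name: String,
    pub summary: String,
    pub tags: Vec<String>,
    pub skill_package: String,
    pub skill_tool: String,
    pub artifact_type: String,
}

/// Runtime-only policy; may be hardcoded in Rust for the first slice.
#[derive(Debug, Clone)]
pub struct DispatchPolicy {
    pub dispatch_mode: DispatchMode,
    /// Bare hostname required by the browser-backed skill tool.
    pub expected_domain: String,
    pub aliases: Vec<String>,
    pub default_args: Vec<(String, String)>,
}

/// One registry entry = scene metadata + dispatch policy, kept together.
#[derive(Debug, Clone)]
pub struct SceneRegistryEntry {
    pub metadata: SceneMetadata,
    pub policy: DispatchPolicy,
}

impl SceneRegistryEntry {
    /// Assumed naming rule: `<skill.package>.<skill.tool>`.
    pub fn tool_name(&self) -> String {
        format!("{}.{}", self.metadata.skill_package, self.metadata.skill_tool)
    }
}
```

Keeping policy inside the entry means the matcher, the direct route, and instruction injection can all take a `&SceneRegistryEntry` instead of consulting a separate match table.
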
## Dispatch Modes

### 1. `direct_browser`
This mode is for scenes whose collection flow is deterministic enough to bypass the model once the scene is recognized.

**Initial scene:** `fault-details-report`

**Behavior:**
- Detect scene from natural language.
- Resolve the corresponding browser-backed skill tool.
- Execute it directly through the existing browser-backed skill path.
- Return the collected artifact result without delegating tool choice to the model.

**Important constraint:**
This is not a local fake implementation. Even in direct mode, the actual collection must still go through the existing browser-backed execution path, meaning it ultimately uses browser-internal methods through the browser backend.

### 2. `agent_browser`
This mode is for scenes that still benefit from agent orchestration, explanation, or downstream reasoning, but whose business data must still come from browser-backed execution.

**Initial scene:** `95598-repair-city-dispatch`

**Behavior:**
- Detect scene from natural language.
- Inject a strong scene execution contract into the runtime instruction.
- Treat calling the matching browser-backed skill tool first as a policy requirement for the scene.
- In slice one, enforce that policy through scene-specific instruction injection rather than a hard runtime gate.
- Allow generic browser probing only as a fallback after the scene tool fails.
- Keep final explanation/summarization in the agent path, but never let the model invent business data.

## Matching Strategy

Implement a minimal matcher that scores user instructions against enabled scenes using:

- scene `id`
- scene `name`
- scene `summary`
- scene `tags`
- optional runtime aliases for the first slice

The matcher should be intentionally simple and deterministic in this slice. Avoid semantic embedding or fuzzy retrieval infrastructure.

Expected first-slice matches:

- `fault-details-report`
  - phrases like `故障明细`, `故障明细报表`, `导出故障明细`
- `95598-repair-city-dispatch`
  - phrases like `95598抢修市指`, `市指抢修监测`, `95598抢修队列`

If no scene matches, runtime behavior must remain unchanged.

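One way to keep the matcher deterministic is plain substring counting, sketched below. `SceneKeywords`, `first_slice_scenes`, and `match_scene` are hypothetical names. The real scene names and tags are not given in this document, so they are left empty here and only the phrases listed above are used as aliases. `summary` is skipped in this sketch because a whole summary sentence rarely appears verbatim in an instruction. Returning `None` on a tie is an assumption that favors leaving runtime behavior unchanged over guessing.

```rust
/// Text a scene exposes to the matcher (hypothetical, std-only stand-in).
pub struct SceneKeywords {
    pub id: &'static str,
    pub name: &'static str,
    pub tags: &'static [&'static str],
    pub aliases: &'static [&'static str],
}

/// The two first-slice scenes; name/tags are unknown here and left empty.
pub fn first_slice_scenes() -> Vec<SceneKeywords> {
    vec![
        SceneKeywords {
            id: "fault-details-report",
            name: "",
            tags: &[],
            aliases: &["故障明细", "故障明细报表", "导出故障明细"],
        },
        SceneKeywords {
            id: "95598-repair-city-dispatch",
            name: "",
            tags: &[],
            aliases: &["95598抢修市指", "市指抢修监测", "95598抢修队列"],
        },
    ]
}

/// Score = number of non-empty scene terms that occur in the instruction.
fn score(instruction: &str, scene: &SceneKeywords) -> usize {
    let haystack = instruction.to_lowercase();
    let mut terms: Vec<&str> = vec![scene.id, scene.name];
    terms.extend(scene.tags.iter().copied());
    terms.extend(scene.aliases.iter().copied());
    terms
        .iter()
        .filter(|t| !t.is_empty() && haystack.contains(&t.to_lowercase()))
        .count()
}

/// Best-scoring scene id, or `None` when nothing matches or the top score is tied.
pub fn match_scene<'a>(instruction: &str, scenes: &'a [SceneKeywords]) -> Option<&'a str> {
    let mut best: Option<(&'a str, usize)> = None;
    let mut tied = false;
    for scene in scenes {
        let s = score(instruction, scene);
        let current = best.map(|(_, b)| b).unwrap_or(0);
        if s > current {
            best = Some((scene.id, s));
            tied = false;
        } else if s > 0 && s == current {
            tied = true;
        }
    }
    if tied { None } else { best.map(|(id, _)| id) }
}
```

With these aliases, an instruction such as `请帮我导出故障明细` resolves to `fault-details-report`, while an unrelated instruction yields `None` and falls through to the existing runtime path.
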
## Runtime Loading Design

### Scene registry loading
Add a small loader that reads enabled scenes from the staged scene directory. For the first slice, it is acceptable to read the concrete scene files directly instead of implementing a full generic registry parser, as long as the resulting module boundary is registry-oriented rather than one-off.

The loader should:
- resolve the staged scene root
- read the two initial `scene.json` files
- deserialize them into a small internal scene metadata struct
- pair them with dispatch policy in the same in-memory registry entry
- ignore malformed or missing scenes safely
- never fail runtime startup solely because one or both initial scene files are absent

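The "skip bad scenes, never fail startup" rule can be sketched as a loader that returns a plain `Vec` rather than a `Result`. The function name, the `<scene_root>/<id>/scene.json` layout, and the injected `parse` callback are all assumptions made for illustration; in the real code the callback would presumably wrap `serde_json::from_str` into the scene metadata struct.

```rust
use std::fs;
use std::path::Path;

/// Reads each candidate `scene.json` under `scene_root` and keeps only the
/// scenes that exist and parse. A missing or malformed file is skipped, so
/// the caller never sees an error and startup cannot fail here.
/// Assumed layout: `<scene_root>/<scene_id>/scene.json`.
pub fn load_scenes<T>(
    scene_root: &Path,
    scene_ids: &[&str],
    parse: impl Fn(&str) -> Option<T>,
) -> Vec<T> {
    scene_ids
        .iter()
        .filter_map(|id| {
            let path = scene_root.join(id).join("scene.json");
            // Missing or unreadable file: skip this scene.
            let text = fs::read_to_string(&path).ok()?;
            // Malformed content: `parse` returns `None`, scene is skipped.
            parse(&text)
        })
        .collect()
}
```

A production version would likely log each skipped scene with its path so that a silently missing scene can still be diagnosed.
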
### Skill loading alignment
The corresponding skill packages must still be loaded into the runtime's skill exposure so that the browser-backed tools are available to the runtime.

For this slice, the staged scene source and staged skill packages should be treated as coming from the same external root:
- staged scenes under `.../skill_staging/scenes`
- staged skill packages under `.../skill_staging/skills`

The implementation must make that staged skill package root visible to runtime skill loading. If the current `skills_dir` resolution cannot express that directly, the design should extend configuration/path resolution to support a staged external skills root explicitly rather than relying on implicit mirroring.

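A small sketch of what explicit multi-root resolution could look like. `resolve_skill_roots` is a hypothetical helper, not the existing logic in `src/compat/config_adapter.rs`; it only assumes the `skills` subdirectory named above.

```rust
use std::path::{Path, PathBuf};

/// Returns the skill roots to scan: the primary skills dir first, then the
/// `skills` subdirectory of the staged root when one is configured.
/// Duplicate roots are collapsed so a skill package is not loaded twice.
pub fn resolve_skill_roots(primary: &Path, staged_root: Option<&Path>) -> Vec<PathBuf> {
    let mut roots = vec![primary.to_path_buf()];
    if let Some(staged) = staged_root {
        let staged_skills = staged.join("skills");
        if !roots.contains(&staged_skills) {
            roots.push(staged_skills);
        }
    }
    roots
}
```

Returning an ordered list keeps the existing single-root behavior as the first element, so callers that only knew about one `skills_dir` see no change when no staged root is configured.
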
## Execution Design

### Direct browser path (`fault-details-report`)
Add a direct execution route that is scene-driven rather than Zhihu-specific.

High-level flow:
1. Runtime receives user instruction.
2. Scene matcher recognizes `fault-details-report`.
3. Runtime resolves the browser-backed tool name `fault-details-report.collect_fault_details`.
4. Runtime builds the required tool arguments, including:
   - `expected_domain` from the matched scene's runtime policy
   - any first-slice scene inputs that can be deterministically derived from the current request/context
   - any fixed/default args declared in runtime policy
5. Runtime executes that skill through the existing browser-backed mechanism.
6. Runtime returns normalized tool output as the direct route result.

Input/argument rules for the first slice:
- Direct execution is only allowed when all required tool arguments are available.
- `expected_domain` must always come from runtime scene policy, not from model inference.
- If a required scene/tool input cannot be derived from the user request or current browser context, the direct route must fail clearly instead of fabricating values.
- The first slice may keep direct-mode argument mapping intentionally narrow; unsupported requests should fall back safely rather than guessing.

Return-shape rule for the first slice:
- The direct route should return normalized serialized tool output (for example, the tool payload string or normalized JSON text), not a model-authored prose summary. This keeps direct mode deterministic and makes the browser-backed result explicit.

Implementation note:
The cleanest first slice is to add a small scene direct-execution helper in the compat runtime/workflow area that invokes the already-loaded browser-backed skill tool abstraction rather than duplicating browser request logic.

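The argument rules above can be sketched as a small helper. The `SceneTool` trait is a hypothetical stand-in, since the real `BrowserScriptSkillTool` API is not shown in this document, and `run_direct` and `DirectRouteError` are likewise illustrative names. The sketch shows three things: defaults are applied before derived inputs, `expected_domain` is written last so only policy can set it, and a missing required argument fails clearly instead of being guessed.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the browser-backed skill tool abstraction.
pub trait SceneTool {
    fn required_args(&self) -> &[&str];
    /// Returns the serialized tool payload, or an error message.
    fn execute(&self, args: &BTreeMap<String, String>) -> Result<String, String>;
}

#[derive(Debug, PartialEq)]
pub enum DirectRouteError {
    MissingArgument(String),
    ToolFailed(String),
}

pub fn run_direct(
    tool: &dyn SceneTool,
    expected_domain: &str,
    default_args: &[(&str, &str)],
    derived_args: &[(&str, &str)],
) -> Result<String, DirectRouteError> {
    let mut args: BTreeMap<String, String> = BTreeMap::new();
    // Policy defaults first, then inputs derived from the request/context.
    for (k, v) in default_args {
        args.insert(k.to_string(), v.to_string());
    }
    for (k, v) in derived_args {
        args.insert(k.to_string(), v.to_string());
    }
    // Written last: `expected_domain` always comes from runtime scene policy.
    args.insert("expected_domain".to_string(), expected_domain.to_string());

    // Fail clearly rather than fabricate a missing required value.
    for name in tool.required_args() {
        if !args.contains_key(*name) {
            return Err(DirectRouteError::MissingArgument(name.to_string()));
        }
    }
    // The payload is returned as-is: no model-authored summary in direct mode.
    tool.execute(&args).map_err(DirectRouteError::ToolFailed)
}
```

Because the helper only depends on a trait, a unit test can pass a fake tool and assert that the direct route goes through the tool abstraction rather than around it.
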
### Agent browser path (`95598-repair-city-dispatch`)
This path stays inside the agent flow.

High-level flow:
1. Runtime receives user instruction.
2. Scene matcher recognizes `95598-repair-city-dispatch`.
3. `RuntimeEngine::build_instruction` injects a scene execution contract containing:
   - the matched scene name
   - the required tool name `95598-repair-city-dispatch.collect_repair_orders`
   - explicit requirement that this is a browser workflow, not a text-only task
   - explicit requirement that business data must come from the browser-backed scene tool
   - fallback rules for generic browser probing only after tool failure
4. Agent runs and chooses the required tool.
5. Tool executes through the existing browser-backed skill path.
6. Agent may summarize the result, but cannot fabricate data.

Enforcement note for the first slice:
- The `agent_browser` guarantee is primarily an instruction-contract guarantee in slice one.
- If allowed-tool shaping can narrow the exposed tool set for a matched scene without destabilizing existing behavior, that is a valid enhancement, but it is not required for the first slice.
- The minimum guaranteed behavior for slice one is strong scene-specific prompt injection plus preservation of the rule that the model must not invent collected business data.

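The injected contract from step 3 could be assembled roughly as follows. The function names and the exact wording of the contract are assumptions; the real `RuntimeEngine::build_instruction` signature is not shown in this document, so this is a free-standing sketch of the idea only.

```rust
/// Builds the scene execution contract text for an `agent_browser` scene.
/// The wording is illustrative; only the five required elements come from the design.
pub fn build_scene_contract(scene_name: &str, tool_name: &str) -> String {
    let lines = [
        format!("Matched scene: {}", scene_name),
        format!("Required tool: call `{}` first.", tool_name),
        "This is a browser workflow, not a text-only task.".to_string(),
        "Business data must come from the browser-backed scene tool; never invent it.".to_string(),
        "Use generic browser probing only as a fallback after the scene tool fails.".to_string(),
    ];
    lines.join("\n")
}

/// Appends the contract when a scene matched; otherwise returns the base
/// instruction unchanged, so unmatched requests gain no scene constraints.
pub fn build_instruction(base: &str, matched: Option<(&str, &str)>) -> String {
    match matched {
        Some((scene, tool)) => format!("{}\n\n{}", base, build_scene_contract(scene, tool)),
        None => base.to_string(),
    }
}
```

Keeping the unmatched branch as an exact pass-through makes the "unmatched instructions do not gain scene-specific constraints" test a simple equality check.
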
## Browser Execution Contract

This requirement is non-negotiable for both dispatch modes:

- scene skills must execute like the Zhihu flow in the sense that the final business action is performed through browser-internal methods
- scene skills must not devolve into text-only pseudo execution
- direct mode and agent mode both reuse the existing browser-backed skill execution path

Concretely, the final path for scene skill execution should remain compatible with:
- `BrowserScriptSkillTool`
- browser backend invocation
- browser-side `Eval` / browser action execution semantics

## Error Handling

- **Scene metadata missing or invalid:** skip that scene and continue with normal runtime behavior.
- **Scene matched but skill/tool unavailable:** do not crash; log enough context for diagnosis and fall back safely.
- **Browser surface unavailable:** disable scene browser routing for that turn and fall back to current non-scene behavior.
- **Tool execution fails in `agent_browser` mode:** allow existing fallback prompt behavior to continue, but preserve the rule that the model cannot invent collected data.
- **Tool execution fails in `direct_browser` mode:** return a concise execution failure instead of pretending collection succeeded.

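The five cases above can be captured in one table-like function, which keeps the fallback rules in a single place instead of spreading them across call sites. Both enums and `recovery_for` are hypothetical names introduced only for this sketch.

```rust
/// Failure cases listed in this section (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SceneFailure {
    MetadataInvalid,
    ToolUnavailable,
    BrowserUnavailable,
    AgentToolFailed,
    DirectToolFailed,
}

/// What the runtime does in response (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recovery {
    /// Drop the scene from the registry and continue normally.
    SkipScene,
    /// Log context, then use the current non-scene behavior for this turn.
    FallBackToDefaultRuntime,
    /// Let the existing fallback prompt continue; data must still not be invented.
    ContinueAgentFallback,
    /// Surface a concise execution failure to the caller.
    ReturnFailure,
}

pub fn recovery_for(failure: SceneFailure) -> Recovery {
    match failure {
        SceneFailure::MetadataInvalid => Recovery::SkipScene,
        SceneFailure::ToolUnavailable | SceneFailure::BrowserUnavailable => {
            Recovery::FallBackToDefaultRuntime
        }
        SceneFailure::AgentToolFailed => Recovery::ContinueAgentFallback,
        SceneFailure::DirectToolFailed => Recovery::ReturnFailure,
    }
}
```

An exhaustive `match` means that adding a new failure case later will not compile until its recovery is decided.
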
## Extensibility Rules

This slice should be built so future scene additions only need:
- a new scene metadata file under the staged scene path
- a matching skill package/tool
- a dispatch-mode declaration/policy
- optional aliases if the natural-language names are not sufficiently explicit

Avoid these anti-patterns:
- per-scene `if user said X then do Y` branches scattered across runtime files
- duplicating browser execution code for each scene
- binding future scenes to Zhihu-specific assumptions

## Testing Strategy

### Scene registry tests
- load valid metadata for `fault-details-report`
- load valid metadata for `95598-repair-city-dispatch`
- ignore broken/missing scene files safely

### Matching tests
- instruction variants match `fault-details-report`
- instruction variants match `95598-repair-city-dispatch`
- unrelated instructions do not match

### Instruction-building tests
- `agent_browser` scene injects the required browser-first scene contract
- unmatched instructions do not gain scene-specific constraints
- Zhihu-specific instruction behavior remains unchanged

### Tool exposure tests
- staged skills from the moved path are loaded into runtime
- browser-backed tool names include:
  - `fault-details-report.collect_fault_details`
  - `95598-repair-city-dispatch.collect_repair_orders`

### Direct execution tests
- `fault-details-report` direct route invokes the browser-backed tool path rather than bypassing the browser layer
- direct route returns failure cleanly when tool execution fails

## Recommended First Implementation Slice

1. Add a tiny scene metadata loader and dispatch-mode policy module.
2. Extend runtime path resolution so the moved staged skills/scenes are visible.
3. Add deterministic scene matching for the two initial scenes.
4. Implement `agent_browser` instruction injection for `95598-repair-city-dispatch`.
5. Implement `direct_browser` execution for `fault-details-report` using the browser-backed skill path.
6. Add focused tests for matching, loading, tool exposure, and direct-vs-agent behavior.

## Open Design Constraint Captured From Discussion

The user explicitly requires the following combined behavior:

- support both kinds of scene execution in the same architecture
- one initial scene should be able to execute without the model
- one initial scene should execute through the model
- both must still use browser-internal execution methods like the Zhihu path
- the design must stay extensible because more staged skills may be added under the same path later

This design is built around those exact constraints.