claw/docs/superpowers/specs/2026-04-06-scene-skill-runtime-routing-design.md
木炎 96c3bf1dee feat: route staged scene skills through runtime
Add registry-driven scene routing and multi-root skill loading so fault-details and 95598 scene skills can be triggered from natural language while still running through the browser-backed runtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 16:17:17 +08:00

Scene Skill Runtime Routing Design

Goal: Add a minimal, extensible scene-routing layer so staged business scenes can be triggered from natural language while still executing through the existing browser-backed skill path.

Architecture: Introduce a registry-driven scene contract loader that reads staged scene.json metadata, matches user instructions to a scene, and chooses one of two dispatch modes: direct browser execution or agent-mediated browser execution. Both modes must reuse the same browser-backed skill tool path so scene skills continue to execute through browser-internal methods rather than text-only responses or local fake execution.

Tech Stack: Rust, serde/JSON scene metadata loading, existing BrowserScriptSkillTool, existing compat runtime / runtime engine / workflow executor layers, focused Rust unit tests.


Problem Statement

The codebase already supports two useful but separate ideas:

  1. Zhihu special-case runtime routing

    • src/compat/workflow_executor.rs detects a narrow set of Zhihu tasks and can execute them directly without relying on the model to choose tools.
    • This is stable, but not extensible for a growing set of business scenes.
  2. Browser-backed skills

    • src/compat/runtime.rs loads skills and exposes browser_script tools through BrowserScriptSkillTool.
    • src/compat/browser_script_skill_tool.rs executes those tools by calling the browser backend with Action::Eval, so actual execution already happens through browser-internal methods.
    • This is extensible, but tool choice currently depends too heavily on generic agent behavior.

The staged business scenes under D:\data\ideaSpace\rust\sgClaw\claw\claw\skills\skill_staging already provide most of the metadata needed to bridge these two ideas. We need a first integration slice that uses scene metadata to improve routing without turning every scene into a hardcoded Zhihu-style exception.

Design Goals

  • Support natural-language triggering for staged scenes.
  • Preserve the current browser-backed execution contract: both scene modes must end in browser-internal execution via the existing browser tool path.
  • Support both dispatch styles discussed with the user:
    • one scene that can execute without the model
    • one scene that still uses the model for orchestration
  • Keep the first slice small, covering only:
    • fault-details-report
    • 95598-repair-city-dispatch
  • Keep the design extensible so more scene skills can be added in the same directory later without more ad hoc routing branches.
  • Avoid broad refactors or a new generic workflow platform in this slice.

Non-Goals

  • Do not build a scene editor, scene UI, or registry authoring workflow.
  • Do not implement a full artifact post-processing platform for all report/monitor types.
  • Do not convert every staged scene into a direct Rust executor.
  • Do not replace the existing Zhihu-specific runtime path in this slice.

Source of Truth and Paths

Staged scene source

The new staged scene source for this work is:

  • D:\data\ideaSpace\rust\sgClaw\claw\claw\skills\skill_staging

The runtime integration must read scene metadata from this location for the initial slice.

Existing runtime integration points

  • src/compat/config_adapter.rs — current skills-dir resolution logic
  • src/compat/runtime.rs — current skill loading and browser-script tool exposure
  • src/runtime/engine.rs — runtime instruction building and allowed-tool shaping
  • src/compat/workflow_executor.rs — existing direct execution routing pattern
  • src/compat/browser_script_skill_tool.rs — browser-backed execution path for browser_script tools

Scene Contract Model

Introduce a small internal scene contract model derived from scene.json and a paired runtime policy. The loader should extract only the fields needed for the first slice:

  • id
  • name
  • summary
  • tags
  • inputs
  • outputs
  • skill.package
  • skill.tool
  • skill.artifact_type

Add a runtime-only dispatch policy associated with each enabled scene inside the same internal registry entry used at runtime:

  • dispatch_mode
    • direct_browser
    • agent_browser
  • expected_domain
    • bare hostname required by the underlying browser-backed skill tool
  • optional aliases
    • additional deterministic keywords/phrases when id/name/summary/tags are not enough for first-slice matching
  • optional default_args
    • runtime-supplied tool arguments when a scene needs fixed/default values for first execution

This runtime policy may be hardcoded in Rust for the first slice, but it must be represented through one consistent scene-routing abstraction so future scenes can join the same path without rewriting the whole design. The abstraction should be a single registry entry type that combines scene metadata with runtime dispatch policy, rather than a metadata loader plus a separate ad hoc match table.
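
One way to picture the single registry entry type is the sketch below. All type and field names (SceneRegistryEntry, DispatchPolicy, skill_package, and so on) are hypothetical, and the `<package>.<tool>` naming is an assumption inferred from the tool names used later in this document. The inputs, outputs, and artifact_type fields are left out because their shapes are not specified here.

```rust
/// How a matched scene is dispatched at runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchMode {
    DirectBrowser,
    AgentBrowser,
}

/// Subset of scene.json fields used in the first slice (hypothetical layout).
#[derive(Debug, Clone)]
pub struct SceneMetadata {
    pub id: String,
    pub name: String,
    pub summary: String,
    pub tags: Vec<String>,
    pub skill_package: String,
    pub skill_tool: String,
}

/// Runtime-only policy; may be hardcoded in Rust for the first slice.
#[derive(Debug, Clone)]
pub struct DispatchPolicy {
    pub dispatch_mode: DispatchMode,
    /// Bare hostname required by the browser-backed skill tool.
    pub expected_domain: String,
    pub aliases: Vec<String>,
    pub default_args: Vec<(String, String)>,
}

/// One entry combining scene metadata with its dispatch policy.
#[derive(Debug, Clone)]
pub struct SceneRegistryEntry {
    pub metadata: SceneMetadata,
    pub policy: DispatchPolicy,
}

impl SceneRegistryEntry {
    /// Assumed tool naming: `<package>.<tool>`.
    pub fn tool_name(&self) -> String {
        format!("{}.{}", self.metadata.skill_package, self.metadata.skill_tool)
    }
}
```

Keeping metadata and policy in one value means the matcher, the direct route, and instruction building can all take the same entry, which is what lets later scenes join without a separate match table.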

Dispatch Modes

1. direct_browser

This mode is for scenes whose collection flow is deterministic enough to bypass the model once the scene is recognized.

Initial scene: fault-details-report

Behavior:

  • Detect scene from natural language.
  • Resolve the corresponding browser-backed skill tool.
  • Execute it directly through the existing browser-backed skill path.
  • Return the collected artifact result without delegating tool choice to the model.

Important constraint: This is not a local fake implementation. Even in direct mode, the actual collection must still go through the existing browser-backed execution path, meaning it ultimately uses browser-internal methods through the browser backend.

2. agent_browser

This mode is for scenes that still benefit from agent orchestration, explanation, or downstream reasoning, but whose business data must still come from browser-backed execution.

Initial scene: 95598-repair-city-dispatch

Behavior:

  • Detect scene from natural language.
  • Inject a strong scene execution contract into the runtime instruction.
  • Treat calling the matching browser-backed skill tool first as a policy requirement for the scene.
  • In slice one, enforce that policy through scene-specific instruction injection rather than a hard runtime gate.
  • Allow generic browser probing only as a fallback after the scene tool fails.
  • Keep final explanation/summarization in the agent path, but never let the model invent business data.

Matching Strategy

Implement a minimal matcher that scores user instructions against enabled scenes using:

  • scene id
  • scene name
  • scene summary
  • scene tags
  • optional runtime aliases for the first slice

The matcher should be intentionally simple and deterministic in this slice. Avoid semantic embedding or fuzzy retrieval infrastructure.

Expected first-slice matches:

  • fault-details-report
    • phrases like 故障明细, 故障明细报表, 导出故障明细
  • 95598-repair-city-dispatch
    • phrases like 95598抢修市指, 市指抢修监测, 95598抢修队列

If no scene matches, runtime behavior must remain unchanged.
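
A minimal sketch of such a matcher follows. It assumes each scene's id, name, tags, and aliases have been flattened into one keyword list. The SceneKeywords type and the scoring rule (sum of the byte lengths of matched keywords) are illustrative choices, not something this design prescribes.

```rust
/// Matching view of one scene (hypothetical type).
pub struct SceneKeywords {
    pub id: &'static str,
    /// Id, name, tags, and runtime aliases flattened into one list.
    pub keywords: Vec<&'static str>,
}

/// Returns the id of the best-scoring scene, or None when nothing matches.
/// Score = total byte length of every keyword found in the instruction,
/// so longer, more specific phrases outweigh short ones.
pub fn match_scene<'a>(instruction: &str, scenes: &'a [SceneKeywords]) -> Option<&'a str> {
    let normalized = instruction.to_lowercase();
    let mut best: Option<(&'a str, usize)> = None;
    for scene in scenes {
        let score: usize = scene
            .keywords
            .iter()
            .filter(|k| normalized.contains(&k.to_lowercase()))
            .map(|k| k.len())
            .sum();
        // Strict `>` keeps the earliest scene on ties, so results stay deterministic.
        if score > 0 && best.map_or(true, |(_, s)| score > s) {
            best = Some((scene.id, score));
        }
    }
    best.map(|(id, _)| id)
}
```

With keyword lists built from the phrases above, an instruction such as 请帮我导出故障明细 would resolve to fault-details-report, and an unrelated instruction returns None so the caller can leave runtime behavior untouched.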

Runtime Loading Design

Scene registry loading

Add a small loader that reads enabled scenes from the staged scene directory. For the first slice, it is acceptable to read the concrete scene files directly instead of implementing a full generic registry parser, as long as the resulting module boundary is registry-oriented rather than one-off.

The loader should:

  • resolve the staged scene root
  • read the two initial scene.json files
  • deserialize them into a small internal scene metadata struct
  • pair them with dispatch policy in the same in-memory registry entry
  • ignore malformed or missing scenes safely
  • never fail runtime startup solely because one or both initial scene files are absent
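
The loader behavior listed above could look roughly like this. The `<root>/scenes/<id>/scene.json` layout and the load_scenes name are assumptions. JSON parsing is passed in as a closure so the sketch stays std-only; a real implementation would presumably deserialize with serde.

```rust
use std::fs;
use std::path::Path;

/// Reads `<staged_root>/scenes/<id>/scene.json` (assumed layout) for each id
/// and keeps only scenes that can be both read and parsed.
/// Missing or malformed files are skipped, so this never fails startup.
pub fn load_scenes<T>(
    staged_root: &Path,
    scene_ids: &[&str],
    parse: impl Fn(&str) -> Option<T>,
) -> Vec<T> {
    scene_ids
        .iter()
        .filter_map(|id| {
            let path = staged_root.join("scenes").join(*id).join("scene.json");
            // A missing or unreadable file yields None and is dropped.
            let text = fs::read_to_string(&path).ok()?;
            // A parse failure also yields None and is dropped.
            parse(&text)
        })
        .collect()
}
```

Returning a plain Vec instead of a Result makes "no scenes loaded" an ordinary outcome, and the runtime then behaves as it does today.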

Skill loading alignment

The corresponding skill packages must still be loaded into runtime skill exposure so the browser-backed tools are available to the runtime.

For this slice, the staged scene source and staged skill packages should be treated as coming from the same external root:

  • staged scenes under .../skill_staging/scenes
  • staged skill packages under .../skill_staging/skills

The implementation must make that staged skill package root visible to runtime skill loading. If current skills_dir resolution cannot express that directly, the design should extend configuration/path resolution to support a staged external skills root explicitly rather than relying on implicit mirroring.
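
As a sketch of what explicit multi-root resolution might mean, the function below returns an ordered list of skill roots. The name resolve_skill_roots and the idea of an optional staged root setting are assumptions; the actual shape of config_adapter.rs is not shown in this document.

```rust
use std::path::PathBuf;

/// Collects skill roots in priority order: the existing skills dir first,
/// then the optional staged external root. An identical path is not added twice.
pub fn resolve_skill_roots(
    skills_dir: PathBuf,
    staged_skills_root: Option<PathBuf>,
) -> Vec<PathBuf> {
    let mut roots = vec![skills_dir];
    if let Some(staged) = staged_skills_root {
        if !roots.contains(&staged) {
            roots.push(staged);
        }
    }
    roots
}
```

Skill loading would then iterate over every returned root rather than a single directory, so no mirroring of staged packages into the default skills dir is needed.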

Execution Design

Direct browser path (fault-details-report)

Add a direct execution route that is scene-driven rather than Zhihu-specific.

High-level flow:

  1. Runtime receives user instruction.
  2. Scene matcher recognizes fault-details-report.
  3. Runtime resolves the browser-backed tool name fault-details-report.collect_fault_details.
  4. Runtime builds the required tool arguments, including:
    • expected_domain from the matched scene's runtime policy
    • any first-slice scene inputs that can be deterministically derived from the current request/context
    • any fixed/default args declared in runtime policy
  5. Runtime executes that skill through the existing browser-backed mechanism.
  6. Runtime returns normalized tool output as the direct route result.

Input/argument rules for the first slice:

  • Direct execution is only allowed when all required tool arguments are available.
  • expected_domain must always come from runtime scene policy, not from model inference.
  • If a required scene/tool input cannot be derived from the user request or current browser context, the direct route must fail clearly instead of fabricating values.
  • The first slice may keep direct-mode argument mapping intentionally narrow; unsupported requests should fall back safely rather than guessing.
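
The argument rules above can be sketched as a small builder. The function name and the flat string-to-string argument shape are assumptions, and any argument names other than expected_domain are placeholders.

```rust
use std::collections::BTreeMap;

/// Builds tool arguments for the direct route, or fails clearly when a
/// required argument is missing. Nothing is guessed or fabricated.
pub fn build_direct_args(
    expected_domain: &str,
    default_args: &[(&str, &str)],
    derived_args: &[(&str, &str)],
    required: &[&str],
) -> Result<BTreeMap<String, String>, String> {
    let mut args = BTreeMap::new();
    // Policy defaults first, then values derived from the request/context.
    for &(key, value) in default_args.iter().chain(derived_args.iter()) {
        args.insert(key.to_string(), value.to_string());
    }
    // Inserted last so neither defaults nor derived values can override
    // the domain declared in the runtime scene policy.
    args.insert("expected_domain".to_string(), expected_domain.to_string());
    for name in required {
        if !args.contains_key(*name) {
            return Err(format!("missing required argument: {name}"));
        }
    }
    Ok(args)
}
```

On Err, the caller can either report the failure or fall back to the non-scene path; in both cases the tool is never invoked with invented values.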

Return-shape rule for the first slice:

  • The direct route should return normalized serialized tool output (for example, the tool payload string or normalized JSON text), not a model-authored prose summary. This keeps direct mode deterministic and makes the browser-backed result explicit.

Implementation note: The cleanest first slice is to add a small scene direct-execution helper in the compat runtime/workflow area that invokes the already-loaded browser-backed skill tool abstraction rather than duplicating browser request logic.
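
A sketch of that helper is below. BrowserBackedTool is a hypothetical stand-in trait, since the real BrowserScriptSkillTool interface is not reproduced in this document. The point is only that the direct route looks up an already-loaded tool and calls it, rather than talking to the browser backend itself.

```rust
/// Stand-in for the browser-backed tool abstraction (hypothetical trait).
pub trait BrowserBackedTool {
    fn name(&self) -> &str;
    /// Runs the tool through the browser backend and returns its payload.
    fn execute(&self, args_json: &str) -> Result<String, String>;
}

/// Direct route: find the loaded scene tool by name and execute it.
/// Returns the tool payload as-is, or a concise failure message.
pub fn run_direct_scene(
    tools: &[Box<dyn BrowserBackedTool>],
    tool_name: &str,
    args_json: &str,
) -> Result<String, String> {
    let tool = tools
        .iter()
        .find(|t| t.name() == tool_name)
        .ok_or_else(|| format!("scene tool not loaded: {tool_name}"))?;
    tool.execute(args_json)
        .map_err(|e| format!("scene tool {tool_name} failed: {e}"))
}
```

Because the helper depends only on the trait, a unit test can substitute a fake tool to check that the direct route calls the tool path and reports failures instead of pretending collection succeeded.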

Agent browser path (95598-repair-city-dispatch)

This path stays inside the agent flow.

High-level flow:

  1. Runtime receives user instruction.
  2. Scene matcher recognizes 95598-repair-city-dispatch.
  3. RuntimeEngine::build_instruction injects a scene execution contract containing:
    • the matched scene name
    • the required tool name 95598-repair-city-dispatch.collect_repair_orders
    • explicit requirement that this is a browser workflow, not a text-only task
    • explicit requirement that business data must come from the browser-backed scene tool
    • fallback rules for generic browser probing only after tool failure
  4. Agent runs and chooses the required tool.
  5. Tool executes through the existing browser-backed skill path.
  6. Agent may summarize the result, but cannot fabricate data.

Enforcement note for the first slice:

  • The agent_browser guarantee is primarily an instruction-contract guarantee in slice one.
  • If allowed-tool shaping can narrow the exposed tool set for a matched scene without destabilizing existing behavior, that is a valid enhancement, but it is not required for the first slice.
  • The minimum guaranteed behavior for slice one is strong scene-specific prompt injection plus preservation of the rule that the model must not invent collected business data.
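
The instruction-contract approach might be sketched as follows. This is a free function for illustration only. The real RuntimeEngine::build_instruction signature is not shown here, and the contract wording is an assumption.

```rust
/// Prepends a scene execution contract when a scene matched; otherwise
/// returns the user instruction unchanged.
/// `scene` is `(scene_name, tool_name)` for the matched scene.
pub fn build_instruction(user_instruction: &str, scene: Option<(&str, &str)>) -> String {
    match scene {
        None => user_instruction.to_string(),
        Some((scene_name, tool_name)) => format!(
            "Scene execution contract for \"{scene_name}\":\n\
             - This is a browser workflow, not a text-only task.\n\
             - Call the tool `{tool_name}` first.\n\
             - Business data must come from that tool; never invent it.\n\
             - Use generic browser probing only after the tool has failed.\n\
             \n\
             User instruction: {user_instruction}"
        ),
    }
}
```

Returning the input untouched in the None case keeps unmatched instructions, including the Zhihu path, free of scene-specific constraints.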

Browser Execution Contract

This requirement is non-negotiable for both dispatch modes:

  • scene skills must execute like the Zhihu flow in the sense that the final business action is performed through browser-internal methods
  • scene skills must not devolve into text-only pseudo execution
  • direct mode and agent mode both reuse the existing browser-backed skill execution path

Concretely, the final path for scene skill execution should remain compatible with:

  • BrowserScriptSkillTool
  • browser backend invocation
  • browser-side Eval / browser action execution semantics

Error Handling

  • Scene metadata missing or invalid: skip that scene and continue with normal runtime behavior.
  • Scene matched but skill/tool unavailable: do not crash; log enough context for diagnosis and fall back safely.
  • Browser surface unavailable: disable scene browser routing for that turn and fall back to current non-scene behavior.
  • Tool execution fails in agent_browser mode: allow existing fallback prompt behavior to continue, but preserve the rule that the model cannot invent collected data.
  • Tool execution fails in direct_browser mode: return a concise execution failure instead of pretending collection succeeded.
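
The pre-execution fallback rules above can be condensed into one routing decision. The Route enum and choose_route function are hypothetical names, and DispatchMode is repeated so the snippet stands alone.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DispatchMode {
    DirectBrowser,
    AgentBrowser,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// No scene routing this turn; the runtime behaves as it does today.
    Unchanged,
    /// Execute the scene tool directly, without the model.
    Direct,
    /// Stay in the agent flow with the scene contract injected.
    Agent,
}

/// Decides how to route one turn. Any missing precondition falls back
/// to current non-scene behavior instead of crashing.
pub fn choose_route(
    matched: Option<DispatchMode>,
    tool_loaded: bool,
    browser_available: bool,
) -> Route {
    let mode = match matched {
        Some(mode) => mode,
        None => return Route::Unchanged,
    };
    if !browser_available {
        return Route::Unchanged;
    }
    if !tool_loaded {
        // Log enough context for diagnosis, then fall back safely.
        eprintln!("scene matched ({mode:?}) but its tool is not loaded; falling back");
        return Route::Unchanged;
    }
    match mode {
        DispatchMode::DirectBrowser => Route::Direct,
        DispatchMode::AgentBrowser => Route::Agent,
    }
}
```

Failures that occur during tool execution are handled after this decision, inside the chosen route.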

Extensibility Rules

This slice should be built so future scene additions only need:

  • a new scene metadata file under the staged scene path
  • a matching skill package/tool
  • a dispatch-mode declaration/policy
  • optional aliases if the natural-language names are not sufficiently explicit

Avoid these anti-patterns:

  • per-scene if user said X then do Y branches scattered across runtime files
  • duplicating browser execution code for each scene
  • binding future scenes to Zhihu-specific assumptions

Testing Strategy

Scene registry tests

  • load valid metadata for fault-details-report
  • load valid metadata for 95598-repair-city-dispatch
  • ignore broken/missing scene files safely

Matching tests

  • instruction variants match fault-details-report
  • instruction variants match 95598-repair-city-dispatch
  • unrelated instructions do not match

Instruction-building tests

  • agent_browser scene injects the required browser-first scene contract
  • unmatched instructions do not gain scene-specific constraints
  • Zhihu-specific instruction behavior remains unchanged

Tool exposure tests

  • staged skills from the relocated skill_staging path are loaded into runtime
  • browser-backed tool names include:
    • fault-details-report.collect_fault_details
    • 95598-repair-city-dispatch.collect_repair_orders

Direct execution tests

  • fault-details-report direct route invokes the browser-backed tool path rather than bypassing the browser layer
  • direct route returns failure cleanly when tool execution fails

Implementation Sequence

  1. Add a tiny scene metadata loader and dispatch-mode policy module.
  2. Extend runtime path resolution so the relocated staged skills and scenes are visible.
  3. Add deterministic scene matching for the two initial scenes.
  4. Implement agent_browser instruction injection for 95598-repair-city-dispatch.
  5. Implement direct_browser execution for fault-details-report using the browser-backed skill path.
  6. Add focused tests for matching, loading, tool exposure, and direct-vs-agent behavior.

Open Design Constraint Captured From Discussion

The user explicitly requires the following combined behavior:

  • support both kinds of scene execution in the same architecture
  • one initial scene should be able to execute without the model
  • one initial scene should execute through the model
  • both must still use browser-internal execution methods like the Zhihu path
  • the design must stay extensible because more staged skills may be added under the same path later

This design is built around those exact constraints.