claw/docs/superpowers/plans/2026-04-01-claw-ws-parallel-transport.md
木炎 bdf8e12246 feat: align browser callback runtime and export flows
Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:44:53 +08:00


Claw-WS Parallel Transport Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add a parallel claw-ws transport path that keeps the current pipe mode intact while introducing a long-lived sg_claw local service, an interactive sg_claw_client, and a browser WebSocket backend at ws://127.0.0.1:12345.

Architecture: First extract a transport-agnostic submit-task runner and browser backend abstraction from the current pipe-coupled flow. Keep the existing pipe path as one adapter/backend, then add a fixed-protocol browser WebSocket backend plus a small service/session layer and an interactive CLI client that reuse the same runtime, orchestration, and browser-facing tool adapters.

Tech Stack: Rust 2021, current sgclaw compat runtime, zeroclaw runtime engine, serde/serde_json, existing MacPolicy, and a blocking WebSocket crate for v1 (tungstenite preferred over a broad async rewrite).


Scope Guardrails

  • Keep the current pipe mode entrypoint and behavior working.
  • Do not replace the existing browser pipe path.
  • Add a parallel WebSocket path only.
  • v1 supports one active client session only.
  • Reuse existing tool names and runtime behavior whenever possible.
  • If WS Eval support is incomplete, disable eval-dependent browser-script skill exposure in WS mode rather than shipping partial behavior.
  • Do not broaden v1 with task queues, multi-client support, or admin endpoints.

File Structure

Existing files to reuse

  • Modify: src/lib.rs — current pipe bootstrap and receive loop; keep as the legacy pipe entrypoint.
  • Modify: src/agent/mod.rs — current BrowserMessage::SubmitTask entrypoint and config-loading flow.
  • Modify: src/compat/runtime.rs — compat runtime and tool assembly.
  • Modify: src/compat/orchestration.rs — direct workflow vs compat runtime routing.
  • Modify: src/compat/browser_tool_adapter.rs — exposes browser_action and superrpa_browser.
  • Modify: src/compat/browser_script_skill_tool.rs — browser-script skill execution.
  • Modify: src/compat/workflow_executor.rs — direct browser workflows such as Zhihu flows.
  • Reuse: src/pipe/browser_tool.rs — current browser command executor; retain as the pipe backend implementation.
  • Reuse: src/pipe/protocol.rs — BrowserMessage, AgentMessage, Action, ExecutionSurfaceMetadata.
  • Reuse: src/security/mac_policy.rs — local action/domain guardrails.
  • Modify: src/config/settings.rs — minimal new config surface for WS mode.
  • Optional modify: src/runtime/engine.rs — only if backend capability wiring requires it.

New files to create

  • Create: src/agent/task_runner.rs — shared submit-task execution entrypoint.
  • Create: src/browser/mod.rs — browser backend exports.
  • Create: src/browser/backend.rs — BrowserBackend trait and helpers.
  • Create: src/browser/pipe_backend.rs — wrapper around existing BrowserPipeTool.
  • Create: src/browser/ws_protocol.rs — fixed browser WS request/response codec.
  • Create: src/browser/ws_backend.rs — browser WS backend with blocking invoke semantics.
  • Create: src/service/mod.rs — service exports.
  • Create: src/service/protocol.rs — client/service WS message types.
  • Create: src/service/server.rs — single-session sg_claw server.
  • Create: src/bin/sg_claw.rs — service binary.
  • Create: src/bin/sg_claw_client.rs — interactive CLI client.
  • Create: tests/task_runner_test.rs — shared submit-task runner regressions.
  • Create: tests/browser_backend_capability_test.rs — backend capability/tool exposure tests.
  • Create: tests/browser_ws_protocol_test.rs — browser WS protocol tests.
  • Create: tests/browser_ws_backend_test.rs — browser WS backend tests.
  • Create: tests/service_ws_session_test.rs — single-session server tests.
  • Create: tests/service_task_flow_test.rs — client/service task flow tests.

Task 1: Extract a shared submit-task runner

Files:

  • Create: src/agent/task_runner.rs

  • Modify: src/agent/mod.rs

  • Modify: src/lib.rs

  • Test: tests/task_runner_test.rs

  • Reuse: src/compat/runtime.rs, src/compat/orchestration.rs

  • Step 1: Write a failing runner regression test

Create tests/task_runner_test.rs covering:

  • empty instruction returns the same TaskComplete failure summary

  • missing LLM config still returns the same summary shape

  • the pipe adapter still emits LogEntry before TaskComplete

  • Step 2: Run the targeted regression tests first

Run:

cargo test --test runtime_task_flow_test --test task_runner_test

Expected: task_runner_test fails because the shared runner does not exist yet.

  • Step 3: Define the transport-neutral request model

Create src/agent/task_runner.rs with a request struct that mirrors the current pipe payload:

pub struct SubmitTaskRequest {
    pub instruction: String,
    pub conversation_id: Option<String>,
    pub messages: Vec<ConversationMessage>,
    pub page_url: Option<String>,
    pub page_title: Option<String>,
}

Normalize empty strings to None at the adapter boundary.
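
The normalization rule above could look roughly like the sketch below. RawSubmitTask, none_if_empty, and to_request are hypothetical names introduced for illustration, the request struct is a reduced copy without messages, and treating whitespace-only values as empty is an assumption rather than something the current pipe payload is known to require.

```rust
/// Hypothetical wire-level payload where optional fields arrive as plain
/// strings and "" means "not provided" (an assumption about the pipe format).
pub struct RawSubmitTask {
    pub instruction: String,
    pub conversation_id: String,
    pub page_url: String,
    pub page_title: String,
}

/// Reduced copy of SubmitTaskRequest; `messages` is omitted for brevity.
#[derive(Debug, PartialEq)]
pub struct SubmitTaskRequest {
    pub instruction: String,
    pub conversation_id: Option<String>,
    pub page_url: Option<String>,
    pub page_title: Option<String>,
}

/// Map empty (or whitespace-only, by assumption) strings to `None`.
pub fn none_if_empty(value: String) -> Option<String> {
    if value.trim().is_empty() {
        None
    } else {
        Some(value)
    }
}

/// Adapter-boundary conversion: the shared runner never sees "" in an
/// optional field, whichever transport produced the request.
pub fn to_request(raw: RawSubmitTask) -> SubmitTaskRequest {
    SubmitTaskRequest {
        instruction: raw.instruction,
        conversation_id: none_if_empty(raw.conversation_id),
        page_url: none_if_empty(raw.page_url),
        page_title: none_if_empty(raw.page_title),
    }
}
```

Doing this once per adapter keeps the runner free of transport-specific "empty means missing" checks.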

  • Step 4: Define an event sink abstraction

Add a small trait that can emit the current agent events without depending on a specific transport:

pub trait AgentEventSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError>;
}

The existing pipe transport should implement this first.
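
For tests/task_runner_test.rs, an in-memory sink may be enough to assert event ordering without a real transport. The sketch below is an assumption-laden illustration: AgentMessage and PipeError are simplified stand-ins for the real types (their actual shapes live in the existing crate), and RecordingSink is a hypothetical helper.

```rust
use std::sync::Mutex;

// Simplified stand-ins for the real protocol types (shapes assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum AgentMessage {
    LogEntry { level: String, message: String },
    TaskComplete { success: bool, summary: String },
}

#[derive(Debug)]
pub struct PipeError(pub String);

pub trait AgentEventSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError>;
}

/// Hypothetical test sink that records every event in arrival order.
#[derive(Default)]
pub struct RecordingSink {
    events: Mutex<Vec<AgentMessage>>,
}

impl RecordingSink {
    /// Snapshot of everything sent so far.
    pub fn events(&self) -> Vec<AgentMessage> {
        self.events.lock().unwrap().clone()
    }
}

impl AgentEventSink for RecordingSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError> {
        // `send` takes &self, so interior mutability is needed to record.
        self.events
            .lock()
            .map_err(|_| PipeError("sink lock poisoned".to_string()))?
            .push(message.clone());
        Ok(())
    }
}
```

A test can pass &RecordingSink wherever &dyn AgentEventSink is expected and then check that a LogEntry precedes the final TaskComplete.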

  • Step 5: Move submit-task execution into a shared function

Extract the body currently inside BrowserMessage::SubmitTask handling from src/agent/mod.rs into a shared function such as:

pub fn run_submit_task(
    sink: &dyn AgentEventSink,
    browser_backend: Arc<dyn BrowserBackend>,
    context: &AgentRuntimeContext,
    request: SubmitTaskRequest,
) -> Result<(), PipeError>

This function must still:

  • validate empty instruction

  • load sgclaw settings

  • log runtime/config info

  • choose orchestration vs compat runtime

  • emit AgentMessage::TaskComplete

  • Step 6: Keep pipe mode as a thin adapter

Refactor handle_browser_message_with_context(...) in src/agent/mod.rs so it only:

  • pattern matches BrowserMessage

  • converts SubmitTask into SubmitTaskRequest

  • forwards into run_submit_task(...)

  • Step 7: Re-run the runner regressions

Run:

cargo test --test runtime_task_flow_test --test task_runner_test

Expected: both tests pass and pipe behavior remains unchanged.

  • Step 8: Commit
git add src/agent/mod.rs src/agent/task_runner.rs src/lib.rs tests/task_runner_test.rs
git commit -m "refactor: extract shared submit-task runner"

Task 2: Introduce a browser backend abstraction and wrap the current pipe implementation

Files:

  • Create: src/browser/mod.rs

  • Create: src/browser/backend.rs

  • Create: src/browser/pipe_backend.rs

  • Modify: src/lib.rs

  • Modify: src/compat/browser_tool_adapter.rs

  • Modify: src/compat/browser_script_skill_tool.rs

  • Modify: src/compat/runtime.rs

  • Modify: src/compat/orchestration.rs

  • Modify: src/compat/workflow_executor.rs

  • Test: tests/browser_backend_capability_test.rs

  • Reuse: src/pipe/browser_tool.rs, src/security/mac_policy.rs

  • Step 1: Add a failing backend capability test

Create tests/browser_backend_capability_test.rs to verify:

  • pipe backend still exposes privileged surface metadata

  • pipe backend still supports Eval

  • browser-script tool exposure is disabled when supports_eval() is false

  • Step 2: Run the current browser adapter tests first

Run:

cargo test --test browser_tool_test --test compat_browser_tool_test --test browser_backend_capability_test

Expected: new capability test fails because the backend abstraction does not exist yet.

  • Step 3: Define the shared browser interface

Create src/browser/backend.rs:

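/// Transport-neutral browser command interface shared by the pipe and WS backends.
pub trait BrowserBackend: Send + Sync {
    /// Execute one browser action and block until its result is available.
    fn invoke(
        &self,
        action: Action,
        params: Value,
        expected_domain: &str,
    ) -> Result<CommandOutput, PipeError>;

    /// Describe the execution surface this backend runs on.
    fn surface_metadata(&self) -> ExecutionSurfaceMetadata;

    /// Whether Eval is available; backends without it override this to return false.
    fn supports_eval(&self) -> bool {
        true
    }
}

  • Step 4: Implement the pipe backend as a wrapper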

Create src/browser/pipe_backend.rs that stores the current BrowserPipeTool<T> and forwards invoke(...) and surface_metadata() unchanged.

Pipe mode must continue using:

  • perform_handshake(...)

  • MacPolicy::load_from_path(...)

  • BrowserPipeTool::new(...).with_response_timeout(...)

  • Step 5: Refactor runtime and tool adapters to depend on Arc<dyn BrowserBackend>

Update:

  • src/compat/browser_tool_adapter.rs
  • src/compat/browser_script_skill_tool.rs
  • src/compat/runtime.rs
  • src/compat/orchestration.rs
  • src/compat/workflow_executor.rs

Preserve external tool names:

  • browser_action

  • superrpa_browser

  • Step 6: Add capability gating for eval-dependent script tools

If supports_eval() is false, do not expose browser-script skill tools from build_browser_script_skill_tools(...) in that backend mode.
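
A minimal sketch of the gating rule, assuming a reduced BrowserBackend trait that only models the capability flag. PipeLikeBackend, WsNoEvalBackend, and exposed_tool_names are hypothetical; the real build_browser_script_skill_tools(...) signature is not shown in this plan and is not reproduced here.

```rust
use std::sync::Arc;

/// Reduced stand-in for BrowserBackend: only the capability flag is modelled.
pub trait BrowserBackend: Send + Sync {
    fn supports_eval(&self) -> bool {
        true
    }
}

/// Hypothetical backend that keeps the default (Eval supported).
pub struct PipeLikeBackend;
impl BrowserBackend for PipeLikeBackend {}

/// Hypothetical WS backend whose Eval support is not ready yet.
pub struct WsNoEvalBackend;
impl BrowserBackend for WsNoEvalBackend {
    fn supports_eval(&self) -> bool {
        false
    }
}

/// Hypothetical tool assembly: the stable tool names are always exposed,
/// while script skills are added only when the backend supports Eval.
pub fn exposed_tool_names(
    backend: &Arc<dyn BrowserBackend>,
    script_skills: &[&str],
) -> Vec<String> {
    let mut names = vec![
        "browser_action".to_string(),
        "superrpa_browser".to_string(),
    ];
    if backend.supports_eval() {
        names.extend(script_skills.iter().map(|s| s.to_string()));
    }
    names
}
```

Gating at assembly time means the model never sees a tool it cannot actually run, which is the behavior tests/browser_backend_capability_test.rs is meant to pin down.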

  • Step 7: Re-run browser adapter tests

Run:

cargo test --test browser_tool_test --test compat_browser_tool_test --test browser_backend_capability_test

Expected: all three pass.

  • Step 8: Commit
git add src/browser src/lib.rs src/compat/browser_tool_adapter.rs src/compat/browser_script_skill_tool.rs src/compat/runtime.rs src/compat/orchestration.rs src/compat/workflow_executor.rs tests/browser_backend_capability_test.rs
git commit -m "refactor: abstract browser backend from pipe transport"

Task 3: Implement the fixed browser WebSocket protocol codec in isolation

Files:

  • Create: src/browser/ws_protocol.rs

  • Test: tests/browser_ws_protocol_test.rs

  • Reuse: docs/_tmp_sgbrowser_ws_api_doc.txt

  • Step 1: Write failing protocol codec tests

Create tests/browser_ws_protocol_test.rs covering:

  • exact outbound frame encoding

  • callback payload decoding

  • unknown callback format rejection

  • mapping coverage for every supported v1 action

  • Step 2: Run the protocol tests first

Run:

cargo test --test browser_ws_protocol_test

Expected: fail because the WS protocol codec does not exist yet.

  • Step 3: Encode the exact browser frame shapes

Create src/browser/ws_protocol.rs so it can build exact array-form payloads such as:

[requesturl, "sgBrowserExcuteJsCodeByArea", target_url, js_code, area]

Serialize to the JSON string format required by the browser service.
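
As an illustration of the array-form frame, the sketch below builds the five-element payload as a JSON string using only the standard library so that it runs standalone. The real codec would more likely use serde_json from the stack above. Treating every element, including area, as a JSON string is an assumption; docs/_tmp_sgbrowser_ws_api_doc.txt remains the authority on field types. json_string and encode_exec_js_frame are hypothetical names.

```rust
/// Escape a value as a JSON string literal (std-only stand-in for serde_json).
fn json_string(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('"');
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build the frame [requesturl, "sgBrowserExcuteJsCodeByArea", target_url, js_code, area].
/// All fields are encoded as strings here, which is an assumption.
pub fn encode_exec_js_frame(
    request_url: &str,
    target_url: &str,
    js_code: &str,
    area: &str,
) -> String {
    let fields = [
        request_url,
        "sgBrowserExcuteJsCodeByArea",
        target_url,
        js_code,
        area,
    ];
    let encoded: Vec<String> = fields.iter().map(|f| json_string(f)).collect();
    format!("[{}]", encoded.join(","))
}
```

Keeping the encoder a pure function is what lets the exact-frame tests in tests/browser_ws_protocol_test.rs run without a socket.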

  • Step 4: Define the v1 action mapping table

Support only the actions already needed by current sgclaw flows:

  • Navigate
  • GetText
  • Click
  • Type
  • Eval

Document which browser functions each one maps to and what assumptions they rely on.

  • Step 5: Define callback parsing and correlation rules

Represent callback-bearing operations explicitly, including the callback function naming or request-correlation strategy the backend will depend on.

  • Step 6: Reject unsupported or malformed shapes early

Fail fast for:

  • unsupported actions

  • malformed callback payloads

  • missing request correlation metadata
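
The fail-fast checks above might be sketched as follows. Everything here is hypothetical scaffolding: the Screenshot variant only exists to show an unsupported action, WsProtocolError and RawCallback are invented for illustration, and the real callback shape must come from the browser WS API document rather than from this example.

```rust
/// Stand-in for the existing Action enum; `Screenshot` is a hypothetical
/// variant that represents any action outside the v1 mapping table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Navigate,
    GetText,
    Click,
    Type,
    Eval,
    Screenshot,
}

/// Hypothetical codec error type.
#[derive(Debug, PartialEq)]
pub enum WsProtocolError {
    UnsupportedAction(Action),
    MalformedCallback(String),
    MissingCorrelation,
}

/// Reject anything outside the five v1 actions before a frame is built.
pub fn ensure_supported(action: Action) -> Result<(), WsProtocolError> {
    match action {
        Action::Navigate | Action::GetText | Action::Click | Action::Type | Action::Eval => Ok(()),
        other => Err(WsProtocolError::UnsupportedAction(other)),
    }
}

/// Hypothetical decoded callback: a correlation name plus a raw payload.
pub struct RawCallback<'a> {
    pub callback_name: Option<&'a str>,
    pub payload: &'a str,
}

/// Reject callbacks that cannot be correlated or carry no payload.
pub fn validate_callback(raw: &RawCallback<'_>) -> Result<(), WsProtocolError> {
    let name = raw
        .callback_name
        .filter(|n| !n.is_empty())
        .ok_or(WsProtocolError::MissingCorrelation)?;
    if raw.payload.trim().is_empty() {
        return Err(WsProtocolError::MalformedCallback(format!(
            "empty payload for callback {name}"
        )));
    }
    Ok(())
}
```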

  • Step 7: Re-run the protocol tests

Run:

cargo test --test browser_ws_protocol_test

Expected: pass with no network dependency.

  • Step 8: Commit
git add src/browser/ws_protocol.rs tests/browser_ws_protocol_test.rs
git commit -m "test: codify fixed browser websocket protocol"

Task 4: Build the browser WS backend with synchronous invoke semantics

Files:

  • Create: src/browser/ws_backend.rs

  • Modify: src/browser/mod.rs

  • Test: tests/browser_ws_backend_test.rs

  • Reuse: CommandOutput, PipeError, ExecutionSurfaceMetadata, MacPolicy

  • Step 1: Write failing backend behavior tests

Create tests/browser_ws_backend_test.rs covering:

  • zero return + no callback => success

  • non-zero return => failure

  • zero return + callback => success with normalized CommandOutput

  • callback timeout => timeout error

  • dropped socket => clear failure

  • Step 2: Run backend tests first

Run:

cargo test --test browser_ws_backend_test

Expected: fail because the WS backend does not exist yet.

  • Step 3: Build a long-lived browser connection manager

Implement src/browser/ws_backend.rs to connect to ws://127.0.0.1:12345 and expose blocking invoke(...) calls.

Use a dedicated connection loop plus request/response coordination instead of scattering raw socket calls through the runtime.

  • Step 4: Preserve local guardrails before send

Validate MacPolicy before translating an action into the browser WS protocol, matching current pipe backend behavior.

  • Step 5: Normalize immediate status returns and delayed callbacks

For each invoke(...) call:

  • fail immediately on non-zero return codes

  • succeed immediately for operations with no data callback

  • wait for the matching callback for result-bearing operations

  • convert the final outcome into CommandOutput
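
One possible shape for this normalization, assuming the connection loop forwards callback payloads over a std::sync::mpsc channel. CommandOutput is reduced to two fields, and InvokeError and settle_invoke are hypothetical; the real backend would return PipeError and the existing CommandOutput type.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Reduced stand-in for the real CommandOutput (fields assumed).
#[derive(Debug, PartialEq)]
pub struct CommandOutput {
    pub success: bool,
    pub data: Option<String>,
}

/// Hypothetical error type; the real backend would map these onto PipeError.
#[derive(Debug, PartialEq)]
pub enum InvokeError {
    BrowserRejected(i64),
    CallbackTimeout,
    ConnectionClosed,
}

/// Turn an immediate return code plus an optional delayed callback into
/// one blocking result.
pub fn settle_invoke(
    return_code: i64,
    expects_callback: bool,
    callbacks: &Receiver<String>,
    timeout: Duration,
) -> Result<CommandOutput, InvokeError> {
    // A non-zero return code fails immediately.
    if return_code != 0 {
        return Err(InvokeError::BrowserRejected(return_code));
    }
    // Operations without a data callback succeed immediately.
    if !expects_callback {
        return Ok(CommandOutput { success: true, data: None });
    }
    // Result-bearing operations wait for the callback, bounded by a timeout.
    match callbacks.recv_timeout(timeout) {
        Ok(payload) => Ok(CommandOutput { success: true, data: Some(payload) }),
        Err(RecvTimeoutError::Timeout) => Err(InvokeError::CallbackTimeout),
        Err(RecvTimeoutError::Disconnected) => Err(InvokeError::ConnectionClosed),
    }
}
```

The three error variants line up with the failure cases listed for tests/browser_ws_backend_test.rs: non-zero return, callback timeout, and dropped socket.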

  • Step 6: Keep v1 concurrency intentionally serialized

Allow only one in-flight browser request at a time unless the browser callback protocol is shown to guarantee stable request IDs.
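
A simple way to enforce the one-request-at-a-time rule is a mutex held for the full duration of each invoke. SerializedInvoker is a hypothetical helper shown only as a sketch; the real backend may fold this lock into its connection manager.

```rust
use std::sync::Mutex;

/// Hypothetical guard that serializes browser requests.
pub struct SerializedInvoker {
    in_flight: Mutex<()>,
}

impl SerializedInvoker {
    pub fn new() -> Self {
        Self { in_flight: Mutex::new(()) }
    }

    /// Run `request` while holding the in-flight lock; concurrent callers
    /// block until the current request has finished.
    pub fn invoke<T>(&self, request: impl FnOnce() -> T) -> T {
        // A poisoned lock only means an earlier request panicked; the unit
        // payload carries no state, so it is safe to keep going.
        let _guard = self
            .in_flight
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        request()
    }

    /// Non-blocking variant: returns `None` when a request is already in
    /// flight (or the lock is poisoned).
    pub fn try_invoke<T>(&self, request: impl FnOnce() -> T) -> Option<T> {
        let _guard = self.in_flight.try_lock().ok()?;
        Some(request())
    }
}
```

If the callback protocol later turns out to carry reliable request IDs, this lock is the single place to relax.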

  • Step 7: Re-run backend tests

Run:

cargo test --test browser_ws_backend_test

Expected: pass using mocks/fakes, not the real browser.

  • Step 8: Commit
git add src/browser/mod.rs src/browser/ws_backend.rs tests/browser_ws_backend_test.rs
git commit -m "feat: add browser websocket backend"

Task 5: Add the sg_claw service protocol and single-session server

Files:

  • Create: src/service/mod.rs

  • Create: src/service/protocol.rs

  • Create: src/service/server.rs

  • Create: src/bin/sg_claw.rs

  • Modify: src/lib.rs

  • Modify: Cargo.toml

  • Test: tests/service_ws_session_test.rs

  • Reuse: AgentMessage::LogEntry, AgentMessage::TaskComplete, SubmitTaskRequest, run_submit_task(...)

  • Step 1: Write failing service session tests

Create tests/service_ws_session_test.rs to verify:

  • first client attaches

  • second client gets Busy

  • disconnect resets session state

  • overlapping task submission is rejected clearly

  • Step 2: Run the session tests first

Run:

cargo test --test service_ws_session_test

Expected: fail because the service layer does not exist yet.

  • Step 3: Define a thin client/service WS protocol

In src/service/protocol.rs, reuse existing task/event shapes as much as possible:

ClientMessage::SubmitTask { instruction, conversation_id, messages, page_url, page_title }
ClientMessage::Ping
ServiceMessage::LogEntry { level, message }
ServiceMessage::TaskComplete { success, summary }
ServiceMessage::Busy { message }

  • Step 4: Add the service event sink adapter

Implement AgentEventSink for the service session writer so the shared task runner can stream LogEntry and TaskComplete over the service WebSocket.

  • Step 5: Implement single-active-client session state

Model explicit states such as:

  • Idle
  • ClientAttached
  • TaskRunning

Reject a second client with ServiceMessage::Busy and close the socket. Reject overlapping tasks instead of queueing them.
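
The session rules above can be modelled as a small state machine before any socket code exists. The sketch below uses the three state names from this step; Session, SessionError, and the method names are hypothetical, and sending ServiceMessage::Busy and closing the socket would sit on top of the returned error.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SessionState {
    Idle,
    ClientAttached,
    TaskRunning,
}

/// Hypothetical rejection reasons; the server would translate `Busy` into
/// ServiceMessage::Busy before closing the second client's socket.
#[derive(Debug, PartialEq)]
pub enum SessionError {
    Busy,
    NoClient,
    TaskAlreadyRunning,
}

pub struct Session {
    state: SessionState,
}

impl Session {
    pub fn new() -> Self {
        Self { state: SessionState::Idle }
    }

    pub fn state(&self) -> SessionState {
        self.state
    }

    /// Only an idle service accepts a client.
    pub fn attach(&mut self) -> Result<(), SessionError> {
        match self.state {
            SessionState::Idle => {
                self.state = SessionState::ClientAttached;
                Ok(())
            }
            _ => Err(SessionError::Busy),
        }
    }

    /// Overlapping tasks are rejected rather than queued.
    pub fn start_task(&mut self) -> Result<(), SessionError> {
        match self.state {
            SessionState::ClientAttached => {
                self.state = SessionState::TaskRunning;
                Ok(())
            }
            SessionState::TaskRunning => Err(SessionError::TaskAlreadyRunning),
            SessionState::Idle => Err(SessionError::NoClient),
        }
    }

    /// Return to the attached state once TaskComplete has been sent.
    pub fn finish_task(&mut self) {
        if self.state == SessionState::TaskRunning {
            self.state = SessionState::ClientAttached;
        }
    }

    /// A disconnect always resets the session.
    pub fn disconnect(&mut self) {
        self.state = SessionState::Idle;
    }
}
```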

  • Step 6: Add the service binary

Create src/bin/sg_claw.rs that:

  • loads config
  • creates the browser WS backend
  • listens for local client connections
  • routes SubmitTask into run_submit_task(...)

Keep src/main.rs and the existing sgclaw::run() pipe path unchanged.

  • Step 7: Re-run the session tests

Run:

cargo test --test service_ws_session_test

Expected: pass without the real browser.

  • Step 8: Commit
git add src/service src/bin/sg_claw.rs src/lib.rs Cargo.toml tests/service_ws_session_test.rs
git commit -m "feat: add claw-ws service entrypoint"

Task 6: Add the sg_claw_client interactive CLI

Files:

  • Create: src/bin/sg_claw_client.rs

  • Modify: Cargo.toml

  • Test: tests/service_task_flow_test.rs

  • Reuse: src/service/protocol.rs

  • Step 1: Write failing client/service task flow tests

Create tests/service_task_flow_test.rs to verify:

  • the submit-task request reaches the service

  • log entries stream in order

  • the final summary arrives exactly once

  • disconnect after task completion is handled cleanly

  • Step 2: Run the flow tests first

Run:

cargo test --test service_task_flow_test

Expected: fail because the client does not exist yet.

  • Step 3: Implement a thin interactive client loop

Create src/bin/sg_claw_client.rs that:

  • connects to the local sg_claw service

  • reads a line of user input

  • sends ClientMessage::SubmitTask

  • prints streamed LogEntry events as they arrive

  • ends the turn on TaskComplete
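
The loop described above could be sketched independently of the WebSocket crate by hiding the socket behind a small trait. ServiceConnection and run_client are hypothetical, ServiceMessage is a simplified stand-in with assumed field types, and the printed line formats are purely illustrative.

```rust
use std::io::{self, BufRead, Write};

/// Simplified stand-in for the service-to-client messages (field types assumed).
pub enum ServiceMessage {
    LogEntry { level: String, message: String },
    TaskComplete { success: bool, summary: String },
    Busy { message: String },
}

/// Hypothetical seam over the service WebSocket so the loop can be tested
/// without a real connection.
pub trait ServiceConnection {
    fn submit_task(&mut self, instruction: &str) -> io::Result<()>;
    fn next_event(&mut self) -> io::Result<ServiceMessage>;
}

/// Read one instruction per line, submit it, and print streamed events
/// until the turn ends with TaskComplete.
pub fn run_client<C: ServiceConnection>(
    conn: &mut C,
    input: impl BufRead,
    mut out: impl Write,
) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        let instruction = line.trim();
        if instruction.is_empty() {
            continue; // Skip blank input instead of submitting an empty task.
        }
        conn.submit_task(instruction)?;
        loop {
            match conn.next_event()? {
                ServiceMessage::LogEntry { level, message } => {
                    writeln!(out, "[{level}] {message}")?;
                }
                ServiceMessage::TaskComplete { success, summary } => {
                    writeln!(out, "done (success={success}): {summary}")?;
                    break; // The turn ends; read the next line of input.
                }
                ServiceMessage::Busy { message } => {
                    writeln!(out, "busy: {message}")?;
                    return Ok(()); // The service refused this client.
                }
            }
        }
    }
    Ok(())
}
```

In the real binary, input would be a locked stdin and out a stdout handle, while tests/service_task_flow_test.rs could drive the same function with an in-memory fake.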

  • Step 4: Keep the client intentionally dumb

Do not duplicate runtime logic in the client. Browser access, skills, orchestration, and task execution remain entirely inside the service.

  • Step 5: Re-run the flow tests

Run:

cargo test --test service_task_flow_test

Expected: pass without the real browser.

  • Step 6: Build the new binaries explicitly

Run:

cargo build --bin sg_claw --bin sg_claw_client

Expected: both binaries compile successfully.

  • Step 7: Commit
git add src/bin/sg_claw_client.rs Cargo.toml tests/service_task_flow_test.rs
git commit -m "feat: add interactive claw-ws client"

Task 7: Finish wiring, preserve pipe mode, and verify end-to-end behavior

Files:

  • Modify: Cargo.toml

  • Modify: src/lib.rs

  • Modify: src/config/settings.rs

  • Optional modify: src/runtime/engine.rs

  • Reuse: tests/browser_tool_test.rs, tests/runtime_task_flow_test.rs, tests/compat_runtime_test.rs

  • Step 1: Add only the minimum config surface for v1

Add settings such as:

  • browser_ws_url defaulting to ws://127.0.0.1:12345
  • service_ws_listen_addr defaulting to local loopback

Do not change the meaning of existing browser backend/profile settings just to represent service mode.
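
A minimal sketch of the two new settings with defaults. WsSettings is a hypothetical grouping (the real fields would live in src/config/settings.rs), and the service port 42321 is a placeholder chosen only for illustration, since this plan specifies loopback but no port.

```rust
use std::net::SocketAddr;

/// Hypothetical grouping of the two v1 settings.
#[derive(Debug, Clone, PartialEq)]
pub struct WsSettings {
    pub browser_ws_url: String,
    pub service_ws_listen_addr: String,
}

impl Default for WsSettings {
    fn default() -> Self {
        Self {
            // Browser endpoint named by this plan.
            browser_ws_url: "ws://127.0.0.1:12345".to_string(),
            // Loopback per the plan; the port is a placeholder assumption.
            service_ws_listen_addr: "127.0.0.1:42321".to_string(),
        }
    }
}

impl WsSettings {
    /// True when the service address parses and binds to loopback only.
    pub fn listens_on_loopback(&self) -> bool {
        self.service_ws_listen_addr
            .parse::<SocketAddr>()
            .map(|addr| addr.ip().is_loopback())
            .unwrap_or(false)
    }
}
```

A startup check such as listens_on_loopback() is one way to keep v1 from being exposed beyond the local machine by a misconfigured address.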

  • Step 2: Keep external browser tool naming stable

Verify that the runtime still exposes:

  • superrpa_browser
  • browser_action

under both pipe and WS modes where the backend supports them.

  • Step 3: Re-run the current pipe regression suite

Run:

cargo test --test browser_tool_test --test compat_browser_tool_test --test runtime_task_flow_test --test pipe_handshake_test --test pipe_protocol_test --test task_protocol_test

Expected: all existing pipe-oriented tests still pass unchanged.

  • Step 4: Run the new WS-focused suite

Run:

cargo test --test task_runner_test --test browser_ws_protocol_test --test browser_ws_backend_test --test browser_backend_capability_test --test service_ws_session_test --test service_task_flow_test

Expected: all new tests pass without launching the real browser.

  • Step 5: Run a full Rust test sweep

Run:

cargo test --tests

Expected: all Rust tests pass.

  • Step 6: Build all three binaries

Run:

cargo build --bin sgclaw --bin sg_claw --bin sg_claw_client

Expected: all three binaries compile.

  • Step 7: Perform a manual local smoke test

Manual test:

  1. Start the browser app so ws://127.0.0.1:12345 is available.
  2. Run cargo run --bin sg_claw.
  3. In another terminal, run cargo run --bin sg_claw_client.
  4. Submit a simple browser task such as opening a page or fetching visible text.
  5. Confirm the client prints streaming logs and exactly one final completion summary.
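  6. Confirm the old pipe-mode entry still starts via cargo run --bin sgclaw. Once the package has several binaries, plain cargo run is ambiguous unless default-run is set in Cargo.toml.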

Expected: both modes work side-by-side.

  • Step 8: Commit
git add Cargo.toml src/lib.rs src/config/settings.rs src/runtime/engine.rs
git commit -m "feat: wire parallel claw-ws transport"

Verification Checklist

Fast regression checks

cargo test --test browser_tool_test --test compat_browser_tool_test --test runtime_task_flow_test

Expected: current pipe/browser runtime behavior remains green.

Full Rust test sweep

cargo test --tests

Expected: all Rust tests pass.

Binary build verification

cargo build --bin sgclaw --bin sg_claw --bin sg_claw_client

Expected: all three binaries compile.

Manual end-to-end verification

  • Browser app listening on ws://127.0.0.1:12345
  • cargo run --bin sg_claw
  • cargo run --bin sg_claw_client
  • submit one browser task
  • verify streaming logs, final completion, and single-client lock behavior
  • verify cargo run --bin sgclaw still starts the old pipe bootstrap

Notes for Implementation

  • Keep the current pipe bootstrap in src/lib.rs intact until the shared runner and pipe backend wrapper are both green.
  • Prefer small commits at each task boundary.
  • Keep the new WS path additive and isolated.
  • Do not ship partial browser capabilities under stable tool names.
  • Treat docs/_tmp_sgbrowser_ws_api_doc.txt as the browser WS protocol source of truth while implementing src/browser/ws_protocol.rs.