
木炎 bdf8e12246 feat: align browser callback runtime and export flows

Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-06 21:44:53 +08:00


Claw-WS Parallel Transport Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add a parallel claw-ws transport path that keeps the current pipe mode intact while introducing a long-lived sg_claw local service, an interactive sg_claw_client, and a browser WebSocket backend at ws://127.0.0.1:12345.

Architecture: First extract a transport-agnostic submit-task runner and browser backend abstraction from the current pipe-coupled flow. Keep the existing pipe path as one adapter/backend, then add a fixed-protocol browser WebSocket backend plus a small service/session layer and an interactive CLI client that reuse the same runtime, orchestration, and browser-facing tool adapters.

Tech Stack: Rust 2021, current sgclaw compat runtime, zeroclaw runtime engine, serde/serde_json, existing MacPolicy, and a blocking WebSocket crate for v1 (tungstenite preferred over a broad async rewrite).

Scope Guardrails

Keep the current pipe mode entrypoint and behavior working.
Do not replace the existing browser pipe path.
Add a parallel WebSocket path only.
v1 supports one active client session only.
Reuse existing tool names and runtime behavior whenever possible.
If WS Eval support is incomplete, disable eval-dependent browser-script skill exposure in WS mode rather than shipping partial behavior.
Do not broaden v1 with task queues, multi-client support, or admin endpoints.

File Structure

Existing files to reuse

Modify: src/lib.rs — current pipe bootstrap and receive loop; keep as the legacy pipe entrypoint.
Modify: src/agent/mod.rs — current BrowserMessage::SubmitTask entrypoint and config-loading flow.
Modify: src/compat/runtime.rs — compat runtime and tool assembly.
Modify: src/compat/orchestration.rs — direct workflow vs compat runtime routing.
Modify: src/compat/browser_tool_adapter.rs — exposes browser_action and superrpa_browser.
Modify: src/compat/browser_script_skill_tool.rs — browser-script skill execution.
Modify: src/compat/workflow_executor.rs — direct browser workflows such as Zhihu flows.
Reuse: src/pipe/browser_tool.rs — current browser command executor; retain as the pipe backend implementation.
Reuse: src/pipe/protocol.rs — BrowserMessage, AgentMessage, Action, ExecutionSurfaceMetadata.
Reuse: src/security/mac_policy.rs — local action/domain guardrails.
Modify: src/config/settings.rs — minimal new config surface for WS mode.
Optional modify: src/runtime/engine.rs — only if backend capability wiring requires it.

New files to create

Create: src/agent/task_runner.rs — shared submit-task execution entrypoint.
Create: src/browser/mod.rs — browser backend exports.
Create: src/browser/backend.rs — BrowserBackend trait and helpers.
Create: src/browser/pipe_backend.rs — wrapper around existing BrowserPipeTool.
Create: src/browser/ws_protocol.rs — fixed browser WS request/response codec.
Create: src/browser/ws_backend.rs — browser WS backend with blocking invoke semantics.
Create: src/service/mod.rs — service exports.
Create: src/service/protocol.rs — client/service WS message types.
Create: src/service/server.rs — single-session sg_claw server.
Create: src/bin/sg_claw.rs — service binary.
Create: src/bin/sg_claw_client.rs — interactive CLI client.
Create: tests/task_runner_test.rs — shared submit-task runner regressions.
Create: tests/browser_backend_capability_test.rs — backend capability/tool exposure tests.
Create: tests/browser_ws_protocol_test.rs — browser WS protocol tests.
Create: tests/browser_ws_backend_test.rs — browser WS backend tests.
Create: tests/service_ws_session_test.rs — single-session server tests.
Create: tests/service_task_flow_test.rs — client/service task flow tests.

Task 1: Extract a shared submit-task runner

Files:

Create: src/agent/task_runner.rs
Modify: src/agent/mod.rs
Modify: src/lib.rs
Test: tests/task_runner_test.rs
Reuse: src/compat/runtime.rs, src/compat/orchestration.rs

Step 1: Write a failing runner regression test

Create tests/task_runner_test.rs covering:

empty instruction returns the same TaskComplete failure summary
missing LLM config still returns the same summary shape
the pipe adapter still emits LogEntry before TaskComplete

Step 2: Run the targeted regression tests first

Run:

cargo test --test runtime_task_flow_test --test task_runner_test

Expected: task_runner_test fails because the shared runner does not exist yet.

Step 3: Define the transport-neutral request model

Create src/agent/task_runner.rs with a request struct that mirrors the current pipe payload:

pub struct SubmitTaskRequest {
    pub instruction: String,
    pub conversation_id: Option<String>,
    pub messages: Vec<ConversationMessage>,
    pub page_url: Option<String>,
    pub page_title: Option<String>,
}

Normalize empty strings to None at the adapter boundary.
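The normalization rule above could be sketched as a small helper applied to each optional field when an adapter builds a SubmitTaskRequest. The name normalize_optional is hypothetical, and treating whitespace-only strings as empty is an assumption that goes slightly beyond what the plan states.

```rust
/// Hypothetical adapter-boundary helper: maps blank strings to `None`.
/// Trimming whitespace first is an assumption, not something the plan requires.
pub fn normalize_optional(value: Option<String>) -> Option<String> {
    value
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
}
```

An adapter would call it once per optional field, for example `page_url: normalize_optional(raw_page_url)`, so the shared runner never has to distinguish `Some("")` from `None`.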

Step 4: Define an event sink abstraction

Add a small trait that can emit the current agent events without depending on a specific transport:

pub trait AgentEventSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError>;
}

The existing pipe transport should implement this first.
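For the regression tests, a recording sink is one way to observe event order without a real transport. The sketch below is self-contained, so AgentMessage and PipeError are simplified stand-ins for the real types in src/pipe/protocol.rs; RecordingSink, reject_if_empty, and the summary text are hypothetical.

```rust
use std::sync::Mutex;

// Stand-ins for the real protocol types; field shapes are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentMessage {
    LogEntry { level: String, message: String },
    TaskComplete { success: bool, summary: String },
}

#[derive(Debug)]
pub struct PipeError(pub String);

pub trait AgentEventSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError>;
}

/// Hypothetical test sink: records every event in arrival order.
/// `send` takes `&self`, so interior mutability is needed.
#[derive(Default)]
pub struct RecordingSink {
    events: Mutex<Vec<AgentMessage>>,
}

impl RecordingSink {
    pub fn events(&self) -> Vec<AgentMessage> {
        self.events.lock().unwrap().clone()
    }
}

impl AgentEventSink for RecordingSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError> {
        self.events.lock().unwrap().push(message.clone());
        Ok(())
    }
}

/// Hypothetical runner fragment: emits a log line and a failed TaskComplete
/// for a blank instruction, and reports whether the task was rejected.
pub fn reject_if_empty(sink: &dyn AgentEventSink, instruction: &str) -> Result<bool, PipeError> {
    if !instruction.trim().is_empty() {
        return Ok(false);
    }
    sink.send(&AgentMessage::LogEntry {
        level: "error".to_string(),
        message: "empty instruction".to_string(),
    })?;
    sink.send(&AgentMessage::TaskComplete {
        success: false,
        summary: "instruction is empty".to_string(), // placeholder summary text
    })?;
    Ok(true)
}
```

Because the runner only sees `&dyn AgentEventSink`, the same assertions on event order should hold whether the sink writes to the pipe, a service WebSocket, or an in-memory vector.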

Step 5: Move submit-task execution into a shared function

Extract the body currently inside BrowserMessage::SubmitTask handling from src/agent/mod.rs into a shared function such as:

pub fn run_submit_task(
    sink: &dyn AgentEventSink,
    browser_backend: Arc<dyn BrowserBackend>,
    context: &AgentRuntimeContext,
    request: SubmitTaskRequest,
) -> Result<(), PipeError>

This function must still:

validate empty instruction
load sgclaw settings
log runtime/config info
choose orchestration vs compat runtime
emit AgentMessage::TaskComplete

Step 6: Keep pipe mode as a thin adapter

Refactor handle_browser_message_with_context(...) in src/agent/mod.rs so it only:

pattern matches BrowserMessage
converts SubmitTask into SubmitTaskRequest
forwards into run_submit_task(...)

Step 7: Re-run the runner regressions

Run:

cargo test --test runtime_task_flow_test --test task_runner_test

Expected: both tests pass and pipe behavior remains unchanged.

Step 8: Commit

git add src/agent/mod.rs src/agent/task_runner.rs src/lib.rs tests/task_runner_test.rs
git commit -m "refactor: extract shared submit-task runner"

Task 2: Introduce a browser backend abstraction and wrap the current pipe implementation

Files:

Create: src/browser/mod.rs
Create: src/browser/backend.rs
Create: src/browser/pipe_backend.rs
Modify: src/lib.rs
Modify: src/compat/browser_tool_adapter.rs
Modify: src/compat/browser_script_skill_tool.rs
Modify: src/compat/runtime.rs
Modify: src/compat/orchestration.rs
Modify: src/compat/workflow_executor.rs
Test: tests/browser_backend_capability_test.rs
Reuse: src/pipe/browser_tool.rs, src/security/mac_policy.rs

Step 1: Add a failing backend capability test

Create tests/browser_backend_capability_test.rs to verify:

pipe backend still exposes privileged surface metadata
pipe backend still supports Eval
browser-script tool exposure is disabled when supports_eval() is false

Step 2: Run the current browser adapter tests first

Run:

cargo test --test browser_tool_test --test compat_browser_tool_test --test browser_backend_capability_test

Expected: new capability test fails because the backend abstraction does not exist yet.

Step 3: Define the shared browser interface

Create src/browser/backend.rs:

pub trait BrowserBackend: Send + Sync {
    fn invoke(
        &self,
        action: Action,
        params: Value,
        expected_domain: &str,
    ) -> Result<CommandOutput, PipeError>;

    fn surface_metadata(&self) -> ExecutionSurfaceMetadata;

    fn supports_eval(&self) -> bool {
        true
    }
}

Step 4: Implement the pipe backend as a wrapper

Create src/browser/pipe_backend.rs that stores the current BrowserPipeTool<T> and forwards invoke(...) and surface_metadata() unchanged.

Pipe mode must continue using:

perform_handshake(...)
MacPolicy::load_from_path(...)
BrowserPipeTool::new(...).with_response_timeout(...)

Step 5: Refactor runtime and tool adapters to depend on Arc<dyn BrowserBackend>

Update:

src/compat/browser_tool_adapter.rs
src/compat/browser_script_skill_tool.rs
src/compat/runtime.rs
src/compat/orchestration.rs
src/compat/workflow_executor.rs

Preserve external tool names:

browser_action
superrpa_browser

Step 6: Add capability gating for eval-dependent script tools

If supports_eval() is false, do not expose browser-script skill tools from build_browser_script_skill_tools(...) in that backend mode.
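The gating rule can be illustrated with a reduced version of the trait. This is only a sketch: the trait here models nothing but the capability probe, and PipeLike, WsWithoutEval, and exposed_tool_names are hypothetical names rather than parts of the real tool assembly.

```rust
// Reduced stand-in for the plan's BrowserBackend trait: only the
// capability probe is modelled, with the same default of `true`.
pub trait BrowserBackend {
    fn supports_eval(&self) -> bool {
        true
    }
}

/// Hypothetical backend that keeps the default (Eval supported).
pub struct PipeLike;
impl BrowserBackend for PipeLike {}

/// Hypothetical backend whose Eval support is incomplete.
pub struct WsWithoutEval;
impl BrowserBackend for WsWithoutEval {
    fn supports_eval(&self) -> bool {
        false
    }
}

/// Hypothetical tool assembly: the two stable browser tools are always
/// listed, while script skill tools appear only when Eval is available.
pub fn exposed_tool_names(backend: &dyn BrowserBackend, script_skill_tools: &[&str]) -> Vec<String> {
    let mut names = vec!["browser_action".to_string(), "superrpa_browser".to_string()];
    if backend.supports_eval() {
        names.extend(script_skill_tools.iter().map(|name| name.to_string()));
    }
    names
}
```

Gating at assembly time means an eval-dependent skill is simply absent in WS mode, rather than present and failing at call time.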

Step 7: Re-run browser adapter tests

Run:

cargo test --test browser_tool_test --test compat_browser_tool_test --test browser_backend_capability_test

Expected: all three pass.

Step 8: Commit

git add src/browser src/lib.rs src/compat/browser_tool_adapter.rs src/compat/browser_script_skill_tool.rs src/compat/runtime.rs src/compat/orchestration.rs src/compat/workflow_executor.rs tests/browser_backend_capability_test.rs
git commit -m "refactor: abstract browser backend from pipe transport"

Task 3: Implement the fixed browser WebSocket protocol codec in isolation

Files:

Create: src/browser/ws_protocol.rs
Test: tests/browser_ws_protocol_test.rs
Reuse: docs/_tmp_sgbrowser_ws_api_doc.txt

Step 1: Write failing protocol codec tests

Create tests/browser_ws_protocol_test.rs covering:

exact outbound frame encoding
callback payload decoding
unknown callback format rejection
mapping coverage for every supported v1 action

Step 2: Run the protocol tests first

Run:

cargo test --test browser_ws_protocol_test

Expected: fail because the WS protocol codec does not exist yet.

Step 3: Encode the exact browser frame shapes

Create src/browser/ws_protocol.rs so it can build exact array-form payloads such as:

[requesturl, "sgBrowserExcuteJsCodeByArea", target_url, js_code, area]

Serialize to the JSON string format required by the browser service.
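As a rough illustration of the array-form frame, the sketch below encodes the five positions shown above as a JSON array of strings. It assumes every element is a string, which should be checked against docs/_tmp_sgbrowser_ws_api_doc.txt. It also hand-rolls the escaping only so the example has no dependencies; the real codec would more likely build the array with serde_json, which is already in the stack. The function names are hypothetical.

```rust
/// Encodes `s` as a JSON string literal, quotes included.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical encoder for the `sgBrowserExcuteJsCodeByArea` frame.
/// Assumes all five positions are strings.
pub fn encode_exec_js_by_area(request_url: &str, target_url: &str, js_code: &str, area: &str) -> String {
    let parts = [request_url, "sgBrowserExcuteJsCodeByArea", target_url, js_code, area];
    let encoded: Vec<String> = parts.iter().map(|part| json_string(part)).collect();
    format!("[{}]", encoded.join(","))
}
```

Keeping the function name and argument order in one encoder per browser call makes the "exact outbound frame encoding" tests a plain string comparison.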

Step 4: Define the v1 action mapping table

Support only the actions already needed by current sgclaw flows:

Navigate
GetText
Click
Type
Eval

Document which browser functions each one maps to and what assumptions they rely on.

Step 5: Define callback parsing and correlation rules

Represent callback-bearing operations explicitly, including the callback function naming or request-correlation strategy the backend will depend on.

Step 6: Reject unsupported or malformed shapes early

Fail fast for:

unsupported actions
malformed callback payloads
missing request correlation metadata

Step 7: Re-run the protocol tests

Run:

cargo test --test browser_ws_protocol_test

Expected: pass with no network dependency.

Step 8: Commit

git add src/browser/ws_protocol.rs tests/browser_ws_protocol_test.rs
git commit -m "test: codify fixed browser websocket protocol"

Task 4: Build the browser WS backend with synchronous invoke semantics

Files:

Create: src/browser/ws_backend.rs
Modify: src/browser/mod.rs
Test: tests/browser_ws_backend_test.rs
Reuse: CommandOutput, PipeError, ExecutionSurfaceMetadata, MacPolicy

Step 1: Write failing backend behavior tests

Create tests/browser_ws_backend_test.rs covering:

zero return + no callback => success
non-zero return => failure
zero return + callback => success with normalized CommandOutput
callback timeout => timeout error
dropped socket => clear failure

Step 2: Run backend tests first

Run:

cargo test --test browser_ws_backend_test

Expected: fail because the WS backend does not exist yet.

Step 3: Build a long-lived browser connection manager

Implement src/browser/ws_backend.rs to connect to ws://127.0.0.1:12345 and expose blocking invoke(...) calls.

Use a dedicated connection loop plus request/response coordination instead of scattering raw socket calls through the runtime.

Step 4: Preserve local guardrails before send

Validate MacPolicy before translating an action into the browser WS protocol, matching current pipe backend behavior.

Step 5: Normalize immediate status returns and delayed callbacks

For each invoke(...) call:

fail immediately on non-zero return codes
succeed immediately for operations with no data callback
wait for the matching callback for result-bearing operations
convert the final outcome into CommandOutput
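The four rules above could be sketched with a standard-library channel standing in for the connection loop's callback delivery. InvokeError, Output, and settle are hypothetical names; Output is a placeholder for CommandOutput. With only one request in flight at a time, the next callback on the channel is assumed to be the matching one, so no correlation id is modelled here.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum InvokeError {
    BrowserRejected(i64),
    CallbackTimeout,
    Disconnected,
}

/// Placeholder for the normalized CommandOutput.
#[derive(Debug, PartialEq)]
pub struct Output {
    pub data: Option<String>,
}

/// Turns an immediate return code plus an optional delayed callback
/// into a single blocking result.
pub fn settle(
    return_code: i64,
    expects_callback: bool,
    callbacks: &Receiver<String>,
    timeout: Duration,
) -> Result<Output, InvokeError> {
    // Non-zero return codes fail immediately, before any waiting.
    if return_code != 0 {
        return Err(InvokeError::BrowserRejected(return_code));
    }
    // Operations without a data callback succeed immediately.
    if !expects_callback {
        return Ok(Output { data: None });
    }
    // Result-bearing operations block until the callback or the deadline.
    match callbacks.recv_timeout(timeout) {
        Ok(payload) => Ok(Output { data: Some(payload) }),
        Err(RecvTimeoutError::Timeout) => Err(InvokeError::CallbackTimeout),
        Err(RecvTimeoutError::Disconnected) => Err(InvokeError::Disconnected),
    }
}
```

A dropped sender surfaces as Disconnected rather than a timeout, which lines up with the "dropped socket => clear failure" test case.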
Step 6: Keep v1 concurrency intentionally serialized

Allow only one in-flight browser request at a time unless the browser callback protocol proves a stable request-id guarantee.

Step 7: Re-run backend tests

Run:

cargo test --test browser_ws_backend_test

Expected: pass using mocks/fakes, not the real browser.

Step 8: Commit

git add src/browser/mod.rs src/browser/ws_backend.rs tests/browser_ws_backend_test.rs
git commit -m "feat: add browser websocket backend"

Task 5: Add the `sg_claw` service protocol and single-session server

Files:

Create: src/service/mod.rs
Create: src/service/protocol.rs
Create: src/service/server.rs
Create: src/bin/sg_claw.rs
Modify: src/lib.rs
Modify: Cargo.toml
Test: tests/service_ws_session_test.rs
Reuse: AgentMessage::LogEntry, AgentMessage::TaskComplete, SubmitTaskRequest, run_submit_task(...)

Step 1: Write failing service session tests

Create tests/service_ws_session_test.rs to verify:

first client attaches
second client gets Busy
disconnect resets session state
overlapping task submission is rejected clearly

Step 2: Run the session tests first

Run:

cargo test --test service_ws_session_test

Expected: fail because the service layer does not exist yet.

Step 3: Define a thin client/service WS protocol

In src/service/protocol.rs, reuse existing task/event shapes as much as possible:

ClientMessage::SubmitTask { instruction, conversation_id, messages, page_url, page_title }
ClientMessage::Ping
ServiceMessage::LogEntry { level, message }
ServiceMessage::TaskComplete { success, summary }
ServiceMessage::Busy { message }

Step 4: Add the service event sink adapter

Implement AgentEventSink for the service session writer so the shared task runner can stream LogEntry and TaskComplete over the service WebSocket.

Step 5: Implement single-active-client session state

Model explicit states such as:

Idle
ClientAttached
TaskRunning

Reject a second client with ServiceMessage::Busy and close the socket. Reject overlapping tasks instead of queueing them.
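One possible shape for the session state is a small state machine whose transitions return a rejection instead of queueing. The state names come from the list above; Session, Rejection, the method names, and the message strings are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Idle,
    ClientAttached,
    TaskRunning,
}

/// Hypothetical rejection; the server would map it to ServiceMessage::Busy.
#[derive(Debug, PartialEq, Eq)]
pub enum Rejection {
    Busy(&'static str),
}

#[derive(Debug)]
pub struct Session {
    state: SessionState,
}

impl Session {
    pub fn new() -> Self {
        Self { state: SessionState::Idle }
    }

    pub fn state(&self) -> SessionState {
        self.state
    }

    /// Only an idle service accepts a client.
    pub fn attach(&mut self) -> Result<(), Rejection> {
        match self.state {
            SessionState::Idle => {
                self.state = SessionState::ClientAttached;
                Ok(())
            }
            _ => Err(Rejection::Busy("another client is already attached")),
        }
    }

    /// Overlapping tasks are rejected rather than queued.
    pub fn start_task(&mut self) -> Result<(), Rejection> {
        match self.state {
            SessionState::ClientAttached => {
                self.state = SessionState::TaskRunning;
                Ok(())
            }
            SessionState::TaskRunning => Err(Rejection::Busy("a task is already running")),
            SessionState::Idle => Err(Rejection::Busy("no client attached")),
        }
    }

    /// Returns to the attached state once the running task completes.
    pub fn finish_task(&mut self) {
        if self.state == SessionState::TaskRunning {
            self.state = SessionState::ClientAttached;
        }
    }

    /// A disconnect always resets the session, even mid-task.
    pub fn disconnect(&mut self) {
        self.state = SessionState::Idle;
    }
}
```

Keeping the transitions in one type lets the session tests exercise attach, busy, and disconnect behavior without opening a socket.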

Step 6: Add the service binary

Create src/bin/sg_claw.rs that:

loads config
creates the browser WS backend
listens for local client connections
routes SubmitTask into run_submit_task(...)

Keep src/main.rs and the existing sgclaw::run() pipe path unchanged.

Step 7: Re-run the session tests

Run:

cargo test --test service_ws_session_test

Expected: pass without the real browser.

Step 8: Commit

git add src/service src/bin/sg_claw.rs src/lib.rs Cargo.toml tests/service_ws_session_test.rs
git commit -m "feat: add claw-ws service entrypoint"

Task 6: Add the `sg_claw_client` interactive CLI

Files:

Create: src/bin/sg_claw_client.rs
Modify: Cargo.toml
Test: tests/service_task_flow_test.rs
Reuse: src/service/protocol.rs

Step 1: Write failing client/service task flow tests

Create tests/service_task_flow_test.rs to verify:

the submit-task request reaches the service
log entries stream in order
the final summary arrives exactly once
disconnect after task completion is handled cleanly

Step 2: Run the flow tests first

Run:

cargo test --test service_task_flow_test

Expected: fail because the client does not exist yet.

Step 3: Implement a thin interactive client loop

Create src/bin/sg_claw_client.rs that:

connects to the local sg_claw service
reads a line of user input
sends ClientMessage::SubmitTask
prints streamed LogEntry events as they arrive
ends the turn on TaskComplete
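The per-turn part of that loop could look roughly like the sketch below, which leaves the socket out and consumes already-decoded service events from an iterator. ServiceMessage is redeclared here as a stand-in with assumed String fields; run_turn and the printed line formats are hypothetical.

```rust
use std::io::{self, Write};

// Stand-in for the message type in src/service/protocol.rs; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum ServiceMessage {
    LogEntry { level: String, message: String },
    TaskComplete { success: bool, summary: String },
    Busy { message: String },
}

/// Prints streamed events for one turn and stops at the first TaskComplete.
/// Returns the final (success, summary), or None if the service was busy
/// or the stream ended before completion.
pub fn run_turn<I, W>(events: I, out: &mut W) -> io::Result<Option<(bool, String)>>
where
    I: IntoIterator<Item = ServiceMessage>,
    W: Write,
{
    for event in events {
        match event {
            ServiceMessage::LogEntry { level, message } => {
                writeln!(out, "[{level}] {message}")?;
            }
            ServiceMessage::TaskComplete { success, summary } => {
                writeln!(out, "{summary}")?;
                return Ok(Some((success, summary)));
            }
            ServiceMessage::Busy { message } => {
                writeln!(out, "busy: {message}")?;
                return Ok(None);
            }
        }
    }
    Ok(None)
}
```

Returning at the first TaskComplete is what makes "the final summary arrives exactly once" easy to assert from the client side.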
Step 4: Keep the client intentionally dumb

Do not duplicate runtime logic in the client. Browser access, skills, orchestration, and task execution remain entirely inside the service.

Step 5: Re-run the flow tests

Run:

cargo test --test service_task_flow_test

Expected: pass without the real browser.

Step 6: Build the new binaries explicitly

Run:

cargo build --bin sg_claw --bin sg_claw_client

Expected: both binaries compile successfully.

Step 7: Commit

git add src/bin/sg_claw_client.rs Cargo.toml tests/service_task_flow_test.rs
git commit -m "feat: add interactive claw-ws client"

Task 7: Finish wiring, preserve pipe mode, and verify end-to-end behavior

Files:

Modify: Cargo.toml
Modify: src/lib.rs
Modify: src/config/settings.rs
Optional modify: src/runtime/engine.rs
Reuse: tests/browser_tool_test.rs, tests/runtime_task_flow_test.rs, tests/compat_runtime_test.rs

Step 1: Add only the minimum config surface for v1

Add settings such as:

browser_ws_url defaulting to ws://127.0.0.1:12345
service_ws_listen_addr defaulting to local loopback

Do not change the meaning of existing browser backend/profile settings just to represent service mode.

Step 2: Keep external browser tool naming stable

Verify that the runtime still exposes:

superrpa_browser
browser_action

under both pipe and WS modes where the backend supports them.

Step 3: Re-run the current pipe regression suite

Run:

cargo test --test browser_tool_test --test compat_browser_tool_test --test runtime_task_flow_test --test pipe_handshake_test --test pipe_protocol_test --test task_protocol_test

Expected: all existing pipe-oriented tests still pass unchanged.

Step 4: Run the new WS-focused suite

Run:

cargo test --test task_runner_test --test browser_ws_protocol_test --test browser_ws_backend_test --test browser_backend_capability_test --test service_ws_session_test --test service_task_flow_test

Expected: all new tests pass without launching the real browser.

Step 5: Run a full Rust test sweep

Run:

cargo test --tests

Expected: all Rust tests pass.

Step 6: Build all three binaries

Run:

cargo build --bin sgclaw --bin sg_claw --bin sg_claw_client

Expected: all three binaries compile.

Step 7: Perform a manual local smoke test

Manual test:

Start the browser app so ws://127.0.0.1:12345 is available.
Run cargo run --bin sg_claw.
In another terminal, run cargo run --bin sg_claw_client.
Submit a simple browser task such as opening a page or fetching visible text.
Confirm the client prints streaming logs and exactly one final completion summary.
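Confirm the old pipe-mode entry still starts via cargo run --bin sgclaw. A bare cargo run becomes ambiguous once the package has several binaries, unless default-run is set in Cargo.toml.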

Expected: both modes work side-by-side.

Step 8: Commit

git add Cargo.toml src/lib.rs src/config/settings.rs src/runtime/engine.rs
git commit -m "feat: wire parallel claw-ws transport"

Verification Checklist

Fast regression checks

cargo test --test browser_tool_test --test compat_browser_tool_test --test runtime_task_flow_test

Expected: current pipe/browser runtime behavior remains green.

Full Rust test sweep

cargo test --tests

Expected: all Rust tests pass.

Binary build verification

cargo build --bin sgclaw --bin sg_claw --bin sg_claw_client

Expected: all three binaries compile.

Manual end-to-end verification

Browser app listening on ws://127.0.0.1:12345
cargo run --bin sg_claw
cargo run --bin sg_claw_client
submit one browser task
verify streaming logs, final completion, and single-client lock behavior
verify cargo run --bin sgclaw still preserves the old pipe bootstrap

Notes for Implementation

Keep the current pipe bootstrap in src/lib.rs intact until the shared runner and pipe backend wrapper are both green.
Prefer small commits at each task boundary.
Keep the new WS path additive and isolated.
Do not ship partial browser capabilities under stable tool names.
Treat docs/_tmp_sgbrowser_ws_api_doc.txt as the browser WS protocol source of truth while implementing src/browser/ws_protocol.rs.
