Claw-WS Parallel Transport Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Add a parallel claw-ws transport path that keeps the current pipe mode intact while introducing a long-lived sg_claw local service, an interactive sg_claw_client, and a browser WebSocket backend at ws://127.0.0.1:12345.
Architecture: First extract a transport-agnostic submit-task runner and browser backend abstraction from the current pipe-coupled flow. Keep the existing pipe path as one adapter/backend, then add a fixed-protocol browser WebSocket backend plus a small service/session layer and an interactive CLI client that reuse the same runtime, orchestration, and browser-facing tool adapters.
Tech Stack: Rust 2021, current sgclaw compat runtime, zeroclaw runtime engine, serde/serde_json, existing MacPolicy, and a blocking WebSocket crate for v1 (tungstenite preferred over a broad async rewrite).
Scope Guardrails
- Keep the current pipe mode entrypoint and behavior working.
- Do not replace the existing browser pipe path.
- Add a parallel WebSocket path only.
- v1 supports one active client session only.
- Reuse existing tool names and runtime behavior whenever possible.
- If WS `Eval` support is incomplete, disable eval-dependent browser-script skill exposure in WS mode rather than shipping partial behavior.
- Do not broaden v1 with task queues, multi-client support, or admin endpoints.
File Structure
Existing files to reuse
- Modify: `src/lib.rs` - current pipe bootstrap and receive loop; keep as the legacy pipe entrypoint.
- Modify: `src/agent/mod.rs` - current `BrowserMessage::SubmitTask` entrypoint and config-loading flow.
- Modify: `src/compat/runtime.rs` - compat runtime and tool assembly.
- Modify: `src/compat/orchestration.rs` - direct workflow vs compat runtime routing.
- Modify: `src/compat/browser_tool_adapter.rs` - exposes `browser_action` and `superrpa_browser`.
- Modify: `src/compat/browser_script_skill_tool.rs` - browser-script skill execution.
- Modify: `src/compat/workflow_executor.rs` - direct browser workflows such as Zhihu flows.
- Reuse: `src/pipe/browser_tool.rs` - current browser command executor; retain as the pipe backend implementation.
- Reuse: `src/pipe/protocol.rs` - `BrowserMessage`, `AgentMessage`, `Action`, `ExecutionSurfaceMetadata`.
- Reuse: `src/security/mac_policy.rs` - local action/domain guardrails.
- Modify: `src/config/settings.rs` - minimal new config surface for WS mode.
- Optional modify: `src/runtime/engine.rs` - only if backend capability wiring requires it.
New files to create
- Create: `src/agent/task_runner.rs` - shared submit-task execution entrypoint.
- Create: `src/browser/mod.rs` - browser backend exports.
- Create: `src/browser/backend.rs` - `BrowserBackend` trait and helpers.
- Create: `src/browser/pipe_backend.rs` - wrapper around existing `BrowserPipeTool`.
- Create: `src/browser/ws_protocol.rs` - fixed browser WS request/response codec.
- Create: `src/browser/ws_backend.rs` - browser WS backend with blocking invoke semantics.
- Create: `src/service/mod.rs` - service exports.
- Create: `src/service/protocol.rs` - client/service WS message types.
- Create: `src/service/server.rs` - single-session `sg_claw` server.
- Create: `src/bin/sg_claw.rs` - service binary.
- Create: `src/bin/sg_claw_client.rs` - interactive CLI client.
- Create: `tests/task_runner_test.rs` - shared submit-task runner regressions.
- Create: `tests/browser_backend_capability_test.rs` - backend capability/tool exposure tests.
- Create: `tests/browser_ws_protocol_test.rs` - browser WS protocol tests.
- Create: `tests/browser_ws_backend_test.rs` - browser WS backend tests.
- Create: `tests/service_ws_session_test.rs` - single-session server tests.
- Create: `tests/service_task_flow_test.rs` - client/service task flow tests.
Task 1: Extract a shared submit-task runner
Files:
- Create: `src/agent/task_runner.rs`
- Modify: `src/agent/mod.rs`
- Modify: `src/lib.rs`
- Test: `tests/task_runner_test.rs`
- Reuse: `src/compat/runtime.rs`, `src/compat/orchestration.rs`

- Step 1: Write a failing runner regression test
Create tests/task_runner_test.rs covering:
- empty instruction returns the same `TaskComplete` failure summary
- missing LLM config still returns the same summary shape
- the pipe adapter still emits `LogEntry` before `TaskComplete`

- Step 2: Run the targeted regression tests first
Run:
cargo test --test runtime_task_flow_test --test task_runner_test
Expected: task_runner_test fails because the shared runner does not exist yet.
- Step 3: Define the transport-neutral request model
Create src/agent/task_runner.rs with a request struct that mirrors the current pipe payload:
pub struct SubmitTaskRequest {
pub instruction: String,
pub conversation_id: Option<String>,
pub messages: Vec<ConversationMessage>,
pub page_url: Option<String>,
pub page_title: Option<String>,
}
Normalize empty strings to None at the adapter boundary.
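The normalization rule can be sketched as one small helper applied to each optional field when the pipe adapter builds a `SubmitTaskRequest`. The name `normalize_optional` is hypothetical, and treating whitespace-only values as empty is an assumption that goes slightly beyond what the plan states.

```rust
/// Hypothetical adapter-boundary helper: an empty string means "absent".
/// Assumption: whitespace-only values are also treated as absent.
/// Non-blank values are returned unchanged (no trimming).
pub fn normalize_optional(value: String) -> Option<String> {
    if value.trim().is_empty() {
        None
    } else {
        Some(value)
    }
}
```

Doing this once at the boundary means the shared runner only ever sees `None` or a meaningful value, so it needs no empty-string checks of its own.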
- Step 4: Define an event sink abstraction
Add a small trait that can emit the current agent events without depending on a specific transport:
pub trait AgentEventSink {
fn send(&self, message: &AgentMessage) -> Result<(), PipeError>;
}
The existing pipe transport should implement this first.
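For the regression tests in Step 1, an in-memory sink is probably the simplest way to assert event order. The sketch below is self-contained: `AgentMessage` and `PipeError` are reduced stand-ins for the real types in `src/pipe/protocol.rs` (their exact shapes are assumed), and `RecordingSink` is a hypothetical test double.

```rust
use std::sync::Mutex;

// Stand-ins for the real protocol types; field shapes are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentMessage {
    LogEntry { level: String, message: String },
    TaskComplete { success: bool, summary: String },
}

#[derive(Debug)]
pub struct PipeError(pub String);

pub trait AgentEventSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError>;
}

/// Hypothetical test double that records every emitted event in order.
#[derive(Default)]
pub struct RecordingSink {
    events: Mutex<Vec<AgentMessage>>,
}

impl RecordingSink {
    /// Snapshot of the events seen so far.
    pub fn events(&self) -> Vec<AgentMessage> {
        self.events.lock().unwrap().clone()
    }
}

impl AgentEventSink for RecordingSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError> {
        self.events
            .lock()
            .map_err(|_| PipeError("sink lock poisoned".to_string()))?
            .push(message.clone());
        Ok(())
    }
}
```

Because `send` takes `&self`, the sink needs interior mutability; a `Mutex` keeps the double usable if the runner ever emits from another thread.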
- Step 5: Move submit-task execution into a shared function
Extract the body currently inside BrowserMessage::SubmitTask handling from src/agent/mod.rs into a shared function such as:
pub fn run_submit_task(
sink: &dyn AgentEventSink,
browser_backend: Arc<dyn BrowserBackend>,
context: &AgentRuntimeContext,
request: SubmitTaskRequest,
) -> Result<(), PipeError>
This function must still:
- validate empty instruction
- load sgclaw settings
- log runtime/config info
- choose orchestration vs compat runtime
- emit `AgentMessage::TaskComplete`

- Step 6: Keep pipe mode as a thin adapter
Refactor handle_browser_message_with_context(...) in src/agent/mod.rs so it only:
- pattern matches `BrowserMessage`
- converts `SubmitTask` into `SubmitTaskRequest`
- forwards into `run_submit_task(...)`

- Step 7: Re-run the runner regressions
Run:
cargo test --test runtime_task_flow_test --test task_runner_test
Expected: both tests pass and pipe behavior remains unchanged.
- Step 8: Commit
git add src/agent/mod.rs src/agent/task_runner.rs src/lib.rs tests/task_runner_test.rs
git commit -m "refactor: extract shared submit-task runner"
Task 2: Introduce a browser backend abstraction and wrap the current pipe implementation
Files:
- Create: `src/browser/mod.rs`
- Create: `src/browser/backend.rs`
- Create: `src/browser/pipe_backend.rs`
- Modify: `src/lib.rs`
- Modify: `src/compat/browser_tool_adapter.rs`
- Modify: `src/compat/browser_script_skill_tool.rs`
- Modify: `src/compat/runtime.rs`
- Modify: `src/compat/orchestration.rs`
- Modify: `src/compat/workflow_executor.rs`
- Test: `tests/browser_backend_capability_test.rs`
- Reuse: `src/pipe/browser_tool.rs`, `src/security/mac_policy.rs`

- Step 1: Add a failing backend capability test
Create tests/browser_backend_capability_test.rs to verify:
- pipe backend still exposes privileged surface metadata
- pipe backend still supports `Eval`
- browser-script tool exposure is disabled when `supports_eval()` is false

- Step 2: Run the current browser adapter tests first
Run:
cargo test --test browser_tool_test --test compat_browser_tool_test --test browser_backend_capability_test
Expected: new capability test fails because the backend abstraction does not exist yet.
- Step 3: Define the shared browser interface
Create src/browser/backend.rs:
pub trait BrowserBackend: Send + Sync {
fn invoke(
&self,
action: Action,
params: Value,
expected_domain: &str,
) -> Result<CommandOutput, PipeError>;
fn surface_metadata(&self) -> ExecutionSurfaceMetadata;
fn supports_eval(&self) -> bool {
true
}
}
- Step 4: Implement the pipe backend as a wrapper
Create src/browser/pipe_backend.rs that stores the current BrowserPipeTool<T> and forwards invoke(...) and surface_metadata() unchanged.
Pipe mode must continue using:
- `perform_handshake(...)`
- `MacPolicy::load_from_path(...)`
- `BrowserPipeTool::new(...).with_response_timeout(...)`

- Step 5: Refactor runtime and tool adapters to depend on `Arc<dyn BrowserBackend>`
Update:
- `src/compat/browser_tool_adapter.rs`
- `src/compat/browser_script_skill_tool.rs`
- `src/compat/runtime.rs`
- `src/compat/orchestration.rs`
- `src/compat/workflow_executor.rs`
Preserve external tool names:
- `browser_action`
- `superrpa_browser`

- Step 6: Add capability gating for eval-dependent script tools
If supports_eval() is false, do not expose browser-script skill tools from build_browser_script_skill_tools(...) in that backend mode.
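The gating rule can be sketched against a reduced version of the trait. Everything below is illustrative: the trait is trimmed to the capability flag only, `PipeLikeBackend`, `EvalLessBackend`, and `exposed_tool_names` are hypothetical names, and the real `build_browser_script_skill_tools(...)` signature is not assumed.

```rust
/// Reduced stand-in for the plan's `BrowserBackend` (capability flag only).
pub trait BrowserBackend {
    fn supports_eval(&self) -> bool {
        true
    }
}

/// Keeps the default, like the pipe backend: `Eval` is supported.
pub struct PipeLikeBackend;
/// Hypothetical WS backend whose `Eval` support is not ready yet.
pub struct EvalLessBackend;

impl BrowserBackend for PipeLikeBackend {}
impl BrowserBackend for EvalLessBackend {
    fn supports_eval(&self) -> bool {
        false
    }
}

/// Hypothetical tool assembly: the two stable tool names are always present,
/// script-skill tools only when the backend can run `Eval`.
pub fn exposed_tool_names(backend: &dyn BrowserBackend, script_skills: &[&str]) -> Vec<String> {
    let mut names = vec!["browser_action".to_string(), "superrpa_browser".to_string()];
    if backend.supports_eval() {
        names.extend(script_skills.iter().map(|skill| skill.to_string()));
    }
    names
}
```

Gating at assembly time means the model never sees a script tool it cannot use, which matches the guardrail against shipping partial behavior.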
- Step 7: Re-run browser adapter tests
Run:
cargo test --test browser_tool_test --test compat_browser_tool_test --test browser_backend_capability_test
Expected: all three pass.
- Step 8: Commit
git add src/browser src/lib.rs src/compat/browser_tool_adapter.rs src/compat/browser_script_skill_tool.rs src/compat/runtime.rs src/compat/orchestration.rs src/compat/workflow_executor.rs tests/browser_backend_capability_test.rs
git commit -m "refactor: abstract browser backend from pipe transport"
Task 3: Implement the fixed browser WebSocket protocol codec in isolation
Files:
- Create: `src/browser/ws_protocol.rs`
- Test: `tests/browser_ws_protocol_test.rs`
- Reuse: `docs/_tmp_sgbrowser_ws_api_doc.txt`

- Step 1: Write failing protocol codec tests
Create tests/browser_ws_protocol_test.rs covering:
- exact outbound frame encoding
- callback payload decoding
- unknown callback format rejection
- mapping coverage for every supported v1 action

- Step 2: Run the protocol tests first
Run:
cargo test --test browser_ws_protocol_test
Expected: fail because the WS protocol codec does not exist yet.
- Step 3: Encode the exact browser frame shapes
Create src/browser/ws_protocol.rs so it can build exact array-form payloads such as:
[requesturl, "sgBrowserExcuteJsCodeByArea", target_url, js_code, area]
Serialize to the JSON string format required by the browser service.
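A dependency-free sketch of that encoding follows. It assumes every element of the array, including `area`, is sent as a JSON string; the real field types must come from the browser API document. In the actual codec `serde_json` (already in the stack) would replace the hand-written escaping, and `encode_eval_frame` is a hypothetical name.

```rust
/// Escape a string as a JSON string literal, quotes included.
pub fn json_string(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('"');
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build the array-form frame for the eval call as a JSON string.
/// Assumption: all five positions are plain strings.
pub fn encode_eval_frame(request_url: &str, target_url: &str, js_code: &str, area: &str) -> String {
    let fields = [request_url, "sgBrowserExcuteJsCodeByArea", target_url, js_code, area];
    let encoded: Vec<String> = fields.iter().map(|field| json_string(field)).collect();
    format!("[{}]", encoded.join(","))
}
```

Keeping the function name literal in one place makes the "exact outbound frame encoding" test a plain string comparison.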
- Step 4: Define the v1 action mapping table
Support only the actions already needed by current sgclaw flows:
- `Navigate`
- `GetText`
- `Click`
- `Type`
- `Eval`
Document which browser functions each one maps to and what assumptions they rely on.
- Step 5: Define callback parsing and correlation rules
Represent callback-bearing operations explicitly, including the callback function naming or request-correlation strategy the backend will depend on.
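One possible correlation strategy, sketched under the assumption that the browser echoes back a caller-chosen callback function name: give each result-bearing request a unique name and resolve it exactly once. `CallbackRegistry`, the `sgclawCallback` prefix, and the error type are all hypothetical; the browser API document decides whether this scheme is viable.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum CorrelationError {
    /// The callback name was never registered or was already resolved.
    UnknownCallback(String),
}

/// Hypothetical registry mapping callback names to the action awaiting them.
#[derive(Default)]
pub struct CallbackRegistry {
    next_id: u64,
    pending: HashMap<String, String>,
}

impl CallbackRegistry {
    /// Reserve a unique callback name for one result-bearing request.
    pub fn register(&mut self, action: &str) -> String {
        self.next_id += 1;
        let name = format!("sgclawCallback{}", self.next_id);
        self.pending.insert(name.clone(), action.to_string());
        name
    }

    /// Match an incoming callback to its request; each name resolves once.
    pub fn resolve(&mut self, callback_name: &str) -> Result<String, CorrelationError> {
        self.pending
            .remove(callback_name)
            .ok_or_else(|| CorrelationError::UnknownCallback(callback_name.to_string()))
    }
}
```

Removing the entry on resolution turns duplicate or stale callbacks into explicit errors instead of silently matching a later request.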
- Step 6: Reject unsupported or malformed shapes early
Fail fast for:
- unsupported actions
- malformed callback payloads
- missing request correlation metadata

- Step 7: Re-run the protocol tests
Run:
cargo test --test browser_ws_protocol_test
Expected: pass with no network dependency.
- Step 8: Commit
git add src/browser/ws_protocol.rs tests/browser_ws_protocol_test.rs
git commit -m "test: codify fixed browser websocket protocol"
Task 4: Build the browser WS backend with synchronous invoke semantics
Files:
- Create: `src/browser/ws_backend.rs`
- Modify: `src/browser/mod.rs`
- Test: `tests/browser_ws_backend_test.rs`
- Reuse: `CommandOutput`, `PipeError`, `ExecutionSurfaceMetadata`, `MacPolicy`

- Step 1: Write failing backend behavior tests
Create tests/browser_ws_backend_test.rs covering:
- zero return + no callback => success
- non-zero return => failure
- zero return + callback => success with normalized `CommandOutput`
- callback timeout => timeout error
- dropped socket => clear failure

- Step 2: Run backend tests first
Run:
cargo test --test browser_ws_backend_test
Expected: fail because the WS backend does not exist yet.
- Step 3: Build a long-lived browser connection manager
Implement src/browser/ws_backend.rs to connect to ws://127.0.0.1:12345 and expose blocking invoke(...) calls.
Use a dedicated connection loop plus request/response coordination instead of scattering raw socket calls through the runtime.
- Step 4: Preserve local guardrails before send
Validate MacPolicy before translating an action into the browser WS protocol, matching current pipe backend behavior.
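The ordering requirement (policy check first, wire encoding and send second) can be sketched with a stand-in policy. `DomainPolicy` below is not `MacPolicy`; its allowlist shape and the `guarded_send` helper are assumptions used only to show that a denied action never reaches the socket.

```rust
/// Stand-in for `MacPolicy`; the real policy format and API are not assumed.
pub struct DomainPolicy {
    allowed_domains: Vec<String>,
}

impl DomainPolicy {
    pub fn new(allowed_domains: &[&str]) -> Self {
        Self {
            allowed_domains: allowed_domains.iter().map(|d| d.to_string()).collect(),
        }
    }

    /// Allow the action only when its expected domain is on the list.
    pub fn check(&self, action: &str, expected_domain: &str) -> Result<(), String> {
        if self.allowed_domains.iter().any(|d| d.as_str() == expected_domain) {
            Ok(())
        } else {
            Err(format!("policy denied {action} on {expected_domain}"))
        }
    }
}

/// Run `send` only after the policy check passes.
pub fn guarded_send<F>(
    policy: &DomainPolicy,
    action: &str,
    expected_domain: &str,
    send: F,
) -> Result<String, String>
where
    F: FnOnce() -> Result<String, String>,
{
    policy.check(action, expected_domain)?;
    send()
}
```

Passing the send step as a closure makes the guarantee testable: the test can count how often the closure ran.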
- Step 5: Normalize immediate status returns and delayed callbacks
For each invoke(...) call:
- fail immediately on non-zero return codes
- succeed immediately for operations with no data callback
- wait for the matching callback for result-bearing operations
- convert the final outcome into `CommandOutput`

- Step 6: Keep v1 concurrency intentionally serialized
Allow only one in-flight browser request at a time unless the browser callback protocol proves a stable request-id guarantee.
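Steps 5 and 6 can be sketched together: a mutex held for the whole exchange serializes requests, and the outcome rules map onto a small result type. `Transport`, `ScriptedTransport`, `SerializedInvoker`, and `InvokeError` are hypothetical, and `CommandOutput` here is a reduced stand-in for the real type.

```rust
use std::sync::Mutex;
use std::time::Duration;

/// Reduced stand-in for the real `CommandOutput`.
#[derive(Debug, PartialEq)]
pub struct CommandOutput {
    pub success: bool,
    pub data: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum InvokeError {
    Rejected(i32),
    CallbackTimeout,
}

/// Hypothetical seam over the socket so tests can use a fake.
pub trait Transport {
    /// Send one frame and return the browser's immediate status code.
    fn send(&mut self, frame: &str) -> i32;
    /// Wait for the callback payload, or `None` on timeout.
    fn wait_callback(&mut self, timeout: Duration) -> Option<String>;
}

/// Fake transport with a fixed status code and an optional one-shot callback.
pub struct ScriptedTransport {
    pub code: i32,
    pub callback: Option<String>,
}

impl Transport for ScriptedTransport {
    fn send(&mut self, _frame: &str) -> i32 {
        self.code
    }
    fn wait_callback(&mut self, _timeout: Duration) -> Option<String> {
        self.callback.take()
    }
}

pub struct SerializedInvoker<T: Transport> {
    transport: Mutex<T>,
    timeout: Duration,
}

impl<T: Transport> SerializedInvoker<T> {
    pub fn new(transport: T, timeout: Duration) -> Self {
        Self { transport: Mutex::new(transport), timeout }
    }

    pub fn invoke(&self, frame: &str, expects_callback: bool) -> Result<CommandOutput, InvokeError> {
        // Holding the lock across send + wait keeps one request in flight.
        let mut transport = self.transport.lock().unwrap_or_else(|p| p.into_inner());
        let code = transport.send(frame);
        if code != 0 {
            return Err(InvokeError::Rejected(code));
        }
        if !expects_callback {
            return Ok(CommandOutput { success: true, data: None });
        }
        match transport.wait_callback(self.timeout) {
            Some(payload) => Ok(CommandOutput { success: true, data: Some(payload) }),
            None => Err(InvokeError::CallbackTimeout),
        }
    }
}
```

With only one request in flight, the next callback to arrive can be attributed to the current request without a request id, which is the reason for serializing in v1.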
- Step 7: Re-run backend tests
Run:
cargo test --test browser_ws_backend_test
Expected: pass using mocks/fakes, not the real browser.
- Step 8: Commit
git add src/browser/mod.rs src/browser/ws_backend.rs tests/browser_ws_backend_test.rs
git commit -m "feat: add browser websocket backend"
Task 5: Add the sg_claw service protocol and single-session server
Files:
- Create: `src/service/mod.rs`
- Create: `src/service/protocol.rs`
- Create: `src/service/server.rs`
- Create: `src/bin/sg_claw.rs`
- Modify: `src/lib.rs`
- Modify: `Cargo.toml`
- Test: `tests/service_ws_session_test.rs`
- Reuse: `AgentMessage::LogEntry`, `AgentMessage::TaskComplete`, `SubmitTaskRequest`, `run_submit_task(...)`

- Step 1: Write failing service session tests
Create tests/service_ws_session_test.rs to verify:
- first client attaches
- second client gets `Busy`
- disconnect resets session state
- overlapping task submission is rejected clearly

- Step 2: Run the session tests first
Run:
cargo test --test service_ws_session_test
Expected: fail because the service layer does not exist yet.
- Step 3: Define a thin client/service WS protocol
In src/service/protocol.rs, reuse existing task/event shapes as much as possible:
ClientMessage::SubmitTask { instruction, conversation_id, messages, page_url, page_title }
ClientMessage::Ping
ServiceMessage::LogEntry { level, message }
ServiceMessage::TaskComplete { success, summary }
ServiceMessage::Busy { message }
- Step 4: Add the service event sink adapter
Implement AgentEventSink for the service session writer so the shared task runner can stream LogEntry and TaskComplete over the service WebSocket.
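A minimal sketch of such an adapter, assuming the session writer is fed through a channel: the sink translates each `AgentMessage` into a `ServiceMessage` and queues it for the socket-writing side. The type shapes are stand-ins, and `ServiceSessionSink` and the channel-based design are assumptions rather than the plan's prescribed structure.

```rust
use std::sync::mpsc::Sender;

// Stand-ins for the real message types; field shapes are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentMessage {
    LogEntry { level: String, message: String },
    TaskComplete { success: bool, summary: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum ServiceMessage {
    LogEntry { level: String, message: String },
    TaskComplete { success: bool, summary: String },
    Busy { message: String },
}

#[derive(Debug)]
pub struct PipeError(pub String);

pub trait AgentEventSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError>;
}

/// Hypothetical sink that forwards runner events to the session writer.
pub struct ServiceSessionSink {
    outbound: Sender<ServiceMessage>,
}

impl ServiceSessionSink {
    pub fn new(outbound: Sender<ServiceMessage>) -> Self {
        Self { outbound }
    }
}

impl AgentEventSink for ServiceSessionSink {
    fn send(&self, event: &AgentMessage) -> Result<(), PipeError> {
        let frame = match event {
            AgentMessage::LogEntry { level, message } => ServiceMessage::LogEntry {
                level: level.clone(),
                message: message.clone(),
            },
            AgentMessage::TaskComplete { success, summary } => ServiceMessage::TaskComplete {
                success: *success,
                summary: summary.clone(),
            },
        };
        // A closed channel means the client side of the session is gone.
        self.outbound
            .send(frame)
            .map_err(|_| PipeError("service client disconnected".to_string()))
    }
}
```

Surfacing a closed channel as an error lets the runner stop emitting once the client has disconnected.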
- Step 5: Implement single-active-client session state
Model explicit states such as:
- `Idle`
- `ClientAttached`
- `TaskRunning`
Reject a second client with ServiceMessage::Busy and close the socket. Reject overlapping tasks instead of queueing them.
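The three states and their rejection rules can be sketched as a small state machine with no networking. `Session` and `SessionError` are hypothetical names; resetting straight to `Idle` on disconnect, even mid-task, is a simplifying assumption, and the real server still has to decide what happens to a task that is already running.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Idle,
    ClientAttached,
    TaskRunning,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SessionError {
    /// A client is already attached; answer with `ServiceMessage::Busy`.
    Busy,
    /// A task was submitted without an attached client.
    NoClient,
    /// A task is already running; overlapping submissions are rejected.
    TaskAlreadyRunning,
}

pub struct Session {
    state: SessionState,
}

impl Session {
    pub fn new() -> Self {
        Self { state: SessionState::Idle }
    }

    pub fn state(&self) -> SessionState {
        self.state
    }

    /// Attach the first client; any later client is rejected as busy.
    pub fn attach(&mut self) -> Result<(), SessionError> {
        match self.state {
            SessionState::Idle => {
                self.state = SessionState::ClientAttached;
                Ok(())
            }
            _ => Err(SessionError::Busy),
        }
    }

    /// Start a task; tasks are never queued.
    pub fn start_task(&mut self) -> Result<(), SessionError> {
        match self.state {
            SessionState::ClientAttached => {
                self.state = SessionState::TaskRunning;
                Ok(())
            }
            SessionState::TaskRunning => Err(SessionError::TaskAlreadyRunning),
            SessionState::Idle => Err(SessionError::NoClient),
        }
    }

    /// Return to the attached state once the task has completed.
    pub fn finish_task(&mut self) {
        if self.state == SessionState::TaskRunning {
            self.state = SessionState::ClientAttached;
        }
    }

    /// Assumption: a disconnect always resets the session to idle.
    pub fn disconnect(&mut self) {
        self.state = SessionState::Idle;
    }
}
```

Keeping the transitions in one type lets `tests/service_ws_session_test.rs` cover the busy and overlap cases without opening a socket.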
- Step 6: Add the service binary
Create src/bin/sg_claw.rs that:
- loads config
- creates the browser WS backend
- listens for local client connections
- routes `SubmitTask` into `run_submit_task(...)`
Keep src/main.rs and the existing sgclaw::run() pipe path unchanged.
- Step 7: Re-run the session tests
Run:
cargo test --test service_ws_session_test
Expected: pass without the real browser.
- Step 8: Commit
git add src/service src/bin/sg_claw.rs src/lib.rs Cargo.toml tests/service_ws_session_test.rs
git commit -m "feat: add claw-ws service entrypoint"
Task 6: Add the sg_claw_client interactive CLI
Files:
- Create: `src/bin/sg_claw_client.rs`
- Modify: `Cargo.toml`
- Test: `tests/service_task_flow_test.rs`
- Reuse: `src/service/protocol.rs`

- Step 1: Write failing client/service task flow tests
Create tests/service_task_flow_test.rs to verify:
- the submit-task request reaches the service
- log entries stream in order
- the final summary arrives exactly once
- disconnect after task completion is handled cleanly

- Step 2: Run the flow tests first
Run:
cargo test --test service_task_flow_test
Expected: fail because the client does not exist yet.
- Step 3: Implement a thin interactive client loop
Create src/bin/sg_claw_client.rs that:
- connects to the local `sg_claw` service
- reads a line of user input
- sends `ClientMessage::SubmitTask`
- prints streamed `LogEntry` events as they arrive
- ends the turn on `TaskComplete`

- Step 4: Keep the client intentionally dumb
Do not duplicate runtime logic in the client. Browser access, skills, orchestration, and task execution remain entirely inside the service.
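In practice the client's per-turn logic can be little more than rendering. The sketch below consumes already-decoded service events and prints them until the turn ends. `render_turn`, the output format, and treating `Busy` as the end of the turn are assumptions, and `ServiceMessage` is a stand-in for the type in `src/service/protocol.rs`.

```rust
use std::io::{self, Write};

// Stand-in for the real service protocol type; field shapes are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum ServiceMessage {
    LogEntry { level: String, message: String },
    TaskComplete { success: bool, summary: String },
    Busy { message: String },
}

/// Print events until the turn ends.
/// Returns `Some(success)` when the turn finished, `None` if the stream
/// ended before any completion arrived (for example, a dropped connection).
pub fn render_turn<I, W>(events: I, out: &mut W) -> io::Result<Option<bool>>
where
    I: IntoIterator<Item = ServiceMessage>,
    W: Write,
{
    for event in events {
        match event {
            ServiceMessage::LogEntry { level, message } => writeln!(out, "[{level}] {message}")?,
            ServiceMessage::TaskComplete { success, summary } => {
                let prefix = if success { "done:" } else { "failed:" };
                writeln!(out, "{prefix} {summary}")?;
                return Ok(Some(success));
            }
            ServiceMessage::Busy { message } => {
                writeln!(out, "busy: {message}")?;
                return Ok(Some(false));
            }
        }
    }
    Ok(None)
}
```

Returning on the first `TaskComplete` is what makes the final summary print once per turn, even if the service were to send stray events afterwards.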
- Step 5: Re-run the flow tests
Run:
cargo test --test service_task_flow_test
Expected: pass without the real browser.
- Step 6: Build the new binaries explicitly
Run:
cargo build --bin sg_claw --bin sg_claw_client
Expected: both binaries compile successfully.
- Step 7: Commit
git add src/bin/sg_claw_client.rs Cargo.toml tests/service_task_flow_test.rs
git commit -m "feat: add interactive claw-ws client"
Task 7: Finish wiring, preserve pipe mode, and verify end-to-end behavior
Files:
- Modify: `Cargo.toml`
- Modify: `src/lib.rs`
- Modify: `src/config/settings.rs`
- Optional modify: `src/runtime/engine.rs`
- Reuse: `tests/browser_tool_test.rs`, `tests/runtime_task_flow_test.rs`, `tests/compat_runtime_test.rs`

- Step 1: Add only the minimum config surface for v1
Add settings such as:
- `browser_ws_url` defaulting to `ws://127.0.0.1:12345`
- `service_ws_listen_addr` defaulting to local loopback
Do not change the meaning of existing browser backend/profile settings just to represent service mode.
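A minimal sketch of the two settings with their defaults is shown below. `WsSettings`, `from_overrides`, and `listens_on_loopback` are hypothetical, and because the plan only says "local loopback" for the listen address, the port in the default is a placeholder rather than a decided value.

```rust
use std::net::SocketAddr;

/// Hypothetical container for the two new v1 settings.
#[derive(Debug, Clone, PartialEq)]
pub struct WsSettings {
    pub browser_ws_url: String,
    pub service_ws_listen_addr: String,
}

impl Default for WsSettings {
    fn default() -> Self {
        Self {
            browser_ws_url: "ws://127.0.0.1:12345".to_string(),
            // Placeholder port: the plan only requires a loopback address.
            service_ws_listen_addr: "127.0.0.1:42321".to_string(),
        }
    }
}

impl WsSettings {
    /// Explicit non-blank values win; missing or blank values use defaults.
    pub fn from_overrides(browser_ws_url: Option<String>, listen_addr: Option<String>) -> Self {
        let defaults = Self::default();
        Self {
            browser_ws_url: browser_ws_url
                .filter(|v| !v.trim().is_empty())
                .unwrap_or(defaults.browser_ws_url),
            service_ws_listen_addr: listen_addr
                .filter(|v| !v.trim().is_empty())
                .unwrap_or(defaults.service_ws_listen_addr),
        }
    }

    /// True when the listen address parses and is bound to loopback.
    pub fn listens_on_loopback(&self) -> bool {
        self.service_ws_listen_addr
            .parse::<SocketAddr>()
            .map(|addr| addr.ip().is_loopback())
            .unwrap_or(false)
    }
}
```

A loopback check like this could back a startup warning when the service is configured to listen on a non-local interface, which v1 does not intend to support.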
- Step 2: Keep external browser tool naming stable
Verify that the runtime still exposes:
- `superrpa_browser`
- `browser_action`
under both pipe and WS modes where the backend supports them.
- Step 3: Re-run the current pipe regression suite
Run:
cargo test --test browser_tool_test --test compat_browser_tool_test --test runtime_task_flow_test --test pipe_handshake_test --test pipe_protocol_test --test task_protocol_test
Expected: all existing pipe-oriented tests still pass unchanged.
- Step 4: Run the new WS-focused suite
Run:
cargo test --test task_runner_test --test browser_ws_protocol_test --test browser_ws_backend_test --test browser_backend_capability_test --test service_ws_session_test --test service_task_flow_test
Expected: all new tests pass without launching the real browser.
- Step 5: Run a full Rust test sweep
Run:
cargo test --tests
Expected: all Rust tests pass.
- Step 6: Build all three binaries
Run:
cargo build --bin sgclaw --bin sg_claw --bin sg_claw_client
Expected: all three binaries compile.
- Step 7: Perform a manual local smoke test
Manual test:
- Start the browser app so `ws://127.0.0.1:12345` is available.
- Run `cargo run --bin sg_claw`.
- In another terminal, run `cargo run --bin sg_claw_client`.
- Submit a simple browser task such as opening a page or fetching visible text.
- Confirm the client prints streaming logs and exactly one final completion summary.
- Confirm the old pipe-mode entry still starts via `cargo run`.
Expected: both modes work side-by-side.
- Step 8: Commit
git add Cargo.toml src/lib.rs src/config/settings.rs src/runtime/engine.rs
git commit -m "feat: wire parallel claw-ws transport"
Verification Checklist
Fast regression checks
cargo test --test browser_tool_test --test compat_browser_tool_test --test runtime_task_flow_test
Expected: current pipe/browser runtime behavior remains green.
Full Rust test sweep
cargo test --tests
Expected: all Rust tests pass.
Binary build verification
cargo build --bin sgclaw --bin sg_claw --bin sg_claw_client
Expected: all three binaries compile.
Manual end-to-end verification
- Browser app listening on `ws://127.0.0.1:12345`
- `cargo run --bin sg_claw`
- `cargo run --bin sg_claw_client`
- submit one browser task
- verify streaming logs, final completion, and single-client lock behavior
- verify `cargo run` still preserves old pipe bootstrap
Notes for Implementation
- Keep the current pipe bootstrap in `src/lib.rs` intact until the shared runner and pipe backend wrapper are both green.
- Prefer small commits at each task boundary.
- Keep the new WS path additive and isolated.
- Do not ship partial browser capabilities under stable tool names.
- Treat `docs/_tmp_sgbrowser_ws_api_doc.txt` as the browser WS protocol source of truth while implementing `src/browser/ws_protocol.rs`.