claw/docs/superpowers/plans/2026-04-01-claw-ws-parallel-transport.md
木炎 bdf8e12246 feat: align browser callback runtime and export flows
Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:44:53 +08:00

# Claw-WS Parallel Transport Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add a parallel `claw-ws` transport path that keeps the current pipe mode intact while introducing a long-lived `sg_claw` local service, an interactive `sg_claw_client`, and a browser WebSocket backend at `ws://127.0.0.1:12345`.
**Architecture:** First extract a transport-agnostic submit-task runner and browser backend abstraction from the current pipe-coupled flow. Keep the existing pipe path as one adapter/backend, then add a fixed-protocol browser WebSocket backend plus a small service/session layer and an interactive CLI client that reuse the same runtime, orchestration, and browser-facing tool adapters.
**Tech Stack:** Rust 2021, current sgclaw compat runtime, zeroclaw runtime engine, `serde`/`serde_json`, existing `MacPolicy`, and a blocking WebSocket crate for v1 (`tungstenite` preferred over a broad async rewrite).
---
## Scope Guardrails
- Keep the current pipe mode entrypoint and behavior working.
- Do **not** replace the existing browser pipe path.
- Add a **parallel** WebSocket path only.
- v1 supports **one active client session** only.
- Reuse existing tool names and runtime behavior whenever possible.
- If WS `Eval` support is incomplete, disable eval-dependent browser-script skill exposure in WS mode rather than shipping partial behavior.
- Do not broaden v1 with task queues, multi-client support, or admin endpoints.
---
## File Structure
### Existing files to reuse
- Modify: `src/lib.rs` — current pipe bootstrap and receive loop; keep as the legacy pipe entrypoint.
- Modify: `src/agent/mod.rs` — current `BrowserMessage::SubmitTask` entrypoint and config-loading flow.
- Modify: `src/compat/runtime.rs` — compat runtime and tool assembly.
- Modify: `src/compat/orchestration.rs` — direct workflow vs compat runtime routing.
- Modify: `src/compat/browser_tool_adapter.rs` — exposes `browser_action` and `superrpa_browser`.
- Modify: `src/compat/browser_script_skill_tool.rs` — browser-script skill execution.
- Modify: `src/compat/workflow_executor.rs` — direct browser workflows such as Zhihu flows.
- Reuse: `src/pipe/browser_tool.rs` — current browser command executor; retain as the pipe backend implementation.
- Reuse: `src/pipe/protocol.rs` — `BrowserMessage`, `AgentMessage`, `Action`, `ExecutionSurfaceMetadata`.
- Reuse: `src/security/mac_policy.rs` — local action/domain guardrails.
- Modify: `src/config/settings.rs` — minimal new config surface for WS mode.
- Optional modify: `src/runtime/engine.rs` — only if backend capability wiring requires it.
### New files to create
- Create: `src/agent/task_runner.rs` — shared submit-task execution entrypoint.
- Create: `src/browser/mod.rs` — browser backend exports.
- Create: `src/browser/backend.rs` — `BrowserBackend` trait and helpers.
- Create: `src/browser/pipe_backend.rs` — wrapper around existing `BrowserPipeTool`.
- Create: `src/browser/ws_protocol.rs` — fixed browser WS request/response codec.
- Create: `src/browser/ws_backend.rs` — browser WS backend with blocking invoke semantics.
- Create: `src/service/mod.rs` — service exports.
- Create: `src/service/protocol.rs` — client/service WS message types.
- Create: `src/service/server.rs` — single-session `sg_claw` server.
- Create: `src/bin/sg_claw.rs` — service binary.
- Create: `src/bin/sg_claw_client.rs` — interactive CLI client.
- Create: `tests/task_runner_test.rs` — shared submit-task runner regressions.
- Create: `tests/browser_backend_capability_test.rs` — backend capability/tool exposure tests.
- Create: `tests/browser_ws_protocol_test.rs` — browser WS protocol tests.
- Create: `tests/browser_ws_backend_test.rs` — browser WS backend tests.
- Create: `tests/service_ws_session_test.rs` — single-session server tests.
- Create: `tests/service_task_flow_test.rs` — client/service task flow tests.
---
## Task 1: Extract a shared submit-task runner
**Files:**
- Create: `src/agent/task_runner.rs`
- Modify: `src/agent/mod.rs`
- Modify: `src/lib.rs`
- Test: `tests/task_runner_test.rs`
- Reuse: `src/compat/runtime.rs`, `src/compat/orchestration.rs`
- [ ] **Step 1: Write a failing runner regression test**
Create `tests/task_runner_test.rs` covering:
- empty instruction returns the same `TaskComplete` failure summary
- missing LLM config still returns the same summary shape
- the pipe adapter still emits `LogEntry` before `TaskComplete`
- [ ] **Step 2: Run the targeted regression tests first**
Run:
```bash
cargo test --test runtime_task_flow_test --test task_runner_test
```
Expected: `task_runner_test` fails because the shared runner does not exist yet.
- [ ] **Step 3: Define the transport-neutral request model**
Create `src/agent/task_runner.rs` with a request struct that mirrors the current pipe payload:
```rust
pub struct SubmitTaskRequest {
    pub instruction: String,
    pub conversation_id: Option<String>,
    pub messages: Vec<ConversationMessage>,
    pub page_url: Option<String>,
    pub page_title: Option<String>,
}
```
Normalize empty strings to `None` at the adapter boundary.
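The normalization rule can be sketched as a small helper applied once, where the adapter builds the request. This is a minimal sketch, assuming the pipe payload carries plain `String` fields; `non_empty` and `from_pipe_fields` are hypothetical names, and `messages` is omitted to keep the example self-contained.

```rust
/// Simplified copy of the request struct above (`messages` omitted for brevity).
#[derive(Debug, PartialEq)]
pub struct SubmitTaskRequest {
    pub instruction: String,
    pub conversation_id: Option<String>,
    pub page_url: Option<String>,
    pub page_title: Option<String>,
}

/// Maps an empty string to `None`; any other value is kept unchanged.
pub fn non_empty(value: String) -> Option<String> {
    if value.is_empty() {
        None
    } else {
        Some(value)
    }
}

/// Hypothetical adapter-side constructor. It assumes the transport hands over
/// plain strings and that "empty" is the only spelling of "absent".
pub fn from_pipe_fields(
    instruction: String,
    conversation_id: String,
    page_url: String,
    page_title: String,
) -> SubmitTaskRequest {
    SubmitTaskRequest {
        instruction,
        conversation_id: non_empty(conversation_id),
        page_url: non_empty(page_url),
        page_title: non_empty(page_title),
    }
}
```

Doing this in the adapter means the shared runner only ever sees `Option` values and never has to compare against `""`.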
- [ ] **Step 4: Define an event sink abstraction**
Add a small trait that can emit the current agent events without depending on a specific transport:
```rust
pub trait AgentEventSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError>;
}
```
The existing pipe transport should implement this first.
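For the regression tests, an in-memory sink that records events in order may be enough to assert that `LogEntry` arrives before `TaskComplete`. The sketch below uses stand-in `AgentMessage` and `PipeError` types, since the real definitions in `src/pipe/protocol.rs` are not shown here and may differ; `RecordingSink` is a hypothetical test double.

```rust
use std::sync::Mutex;

// Stand-ins for the real protocol types; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentMessage {
    LogEntry { level: String, message: String },
    TaskComplete { success: bool, summary: String },
}

#[derive(Debug)]
pub struct PipeError(pub String);

pub trait AgentEventSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError>;
}

/// Test double that stores every event it receives, in arrival order.
#[derive(Default)]
pub struct RecordingSink {
    events: Mutex<Vec<AgentMessage>>,
}

impl RecordingSink {
    /// Returns a snapshot of the events recorded so far.
    pub fn events(&self) -> Vec<AgentMessage> {
        self.events.lock().expect("sink mutex poisoned").clone()
    }
}

impl AgentEventSink for RecordingSink {
    fn send(&self, message: &AgentMessage) -> Result<(), PipeError> {
        // `send` takes `&self`, so interior mutability is needed to record.
        self.events
            .lock()
            .map_err(|_| PipeError("sink mutex poisoned".to_string()))?
            .push(message.clone());
        Ok(())
    }
}
```

Because the trait takes `&self`, any sink that buffers or writes to a socket needs interior mutability of some kind; a `Mutex` is one straightforward choice.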
- [ ] **Step 5: Move submit-task execution into a shared function**
Extract the body currently inside `BrowserMessage::SubmitTask` handling from `src/agent/mod.rs` into a shared function such as:
```rust
pub fn run_submit_task(
    sink: &dyn AgentEventSink,
    browser_backend: Arc<dyn BrowserBackend>,
    context: &AgentRuntimeContext,
    request: SubmitTaskRequest,
) -> Result<(), PipeError>
```
This function must still:
- validate empty instruction
- load sgclaw settings
- log runtime/config info
- choose orchestration vs compat runtime
- emit `AgentMessage::TaskComplete`
- [ ] **Step 6: Keep pipe mode as a thin adapter**
Refactor `handle_browser_message_with_context(...)` in `src/agent/mod.rs` so it only:
- pattern matches `BrowserMessage`
- converts `SubmitTask` into `SubmitTaskRequest`
- forwards into `run_submit_task(...)`
- [ ] **Step 7: Re-run the runner regressions**
Run:
```bash
cargo test --test runtime_task_flow_test --test task_runner_test
```
Expected: both tests pass and pipe behavior remains unchanged.
- [ ] **Step 8: Commit**
```bash
git add src/agent/mod.rs src/agent/task_runner.rs src/lib.rs tests/task_runner_test.rs
git commit -m "refactor: extract shared submit-task runner"
```
---
## Task 2: Introduce a browser backend abstraction and wrap the current pipe implementation
**Files:**
- Create: `src/browser/mod.rs`
- Create: `src/browser/backend.rs`
- Create: `src/browser/pipe_backend.rs`
- Modify: `src/lib.rs`
- Modify: `src/compat/browser_tool_adapter.rs`
- Modify: `src/compat/browser_script_skill_tool.rs`
- Modify: `src/compat/runtime.rs`
- Modify: `src/compat/orchestration.rs`
- Modify: `src/compat/workflow_executor.rs`
- Test: `tests/browser_backend_capability_test.rs`
- Reuse: `src/pipe/browser_tool.rs`, `src/security/mac_policy.rs`
- [ ] **Step 1: Add a failing backend capability test**
Create `tests/browser_backend_capability_test.rs` to verify:
- pipe backend still exposes privileged surface metadata
- pipe backend still supports `Eval`
- browser-script tool exposure is disabled when `supports_eval()` is false
- [ ] **Step 2: Run the current browser adapter tests first**
Run:
```bash
cargo test --test browser_tool_test --test compat_browser_tool_test --test browser_backend_capability_test
```
Expected: new capability test fails because the backend abstraction does not exist yet.
- [ ] **Step 3: Define the shared browser interface**
Create `src/browser/backend.rs`:
```rust
pub trait BrowserBackend: Send + Sync {
    fn invoke(
        &self,
        action: Action,
        params: Value,
        expected_domain: &str,
    ) -> Result<CommandOutput, PipeError>;

    fn surface_metadata(&self) -> ExecutionSurfaceMetadata;

    fn supports_eval(&self) -> bool {
        true
    }
}
```
- [ ] **Step 4: Implement the pipe backend as a wrapper**
Create `src/browser/pipe_backend.rs` that stores the current `BrowserPipeTool<T>` and forwards `invoke(...)` and `surface_metadata()` unchanged.
Pipe mode must continue using:
- `perform_handshake(...)`
- `MacPolicy::load_from_path(...)`
- `BrowserPipeTool::new(...).with_response_timeout(...)`
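The wrapper itself can stay very small. The following is a sketch under stated assumptions: all types are stand-ins, `params` is a raw JSON string instead of the plan's `Value`, and `PipeTool` is a hypothetical trait standing in for the two methods the wrapper needs from `BrowserPipeTool<T>`.

```rust
// Stand-in types; the real ones live in `src/pipe/protocol.rs` and may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action { Navigate, GetText, Click, Type, Eval }

#[derive(Debug, Clone, PartialEq)]
pub struct CommandOutput { pub success: bool, pub data: String }

#[derive(Debug, Clone, PartialEq)]
pub struct ExecutionSurfaceMetadata { pub privileged: bool }

#[derive(Debug)]
pub struct PipeError(pub String);

pub trait BrowserBackend: Send + Sync {
    fn invoke(&self, action: Action, params: &str, expected_domain: &str)
        -> Result<CommandOutput, PipeError>;
    fn surface_metadata(&self) -> ExecutionSurfaceMetadata;
    fn supports_eval(&self) -> bool { true }
}

/// Hypothetical view of what the wrapper needs from the existing pipe tool.
pub trait PipeTool: Send + Sync {
    fn invoke(&self, action: Action, params: &str, expected_domain: &str)
        -> Result<CommandOutput, PipeError>;
    fn surface_metadata(&self) -> ExecutionSurfaceMetadata;
}

/// Forwards every call to the wrapped tool without changing behavior.
pub struct PipeBrowserBackend<T: PipeTool> { inner: T }

impl<T: PipeTool> PipeBrowserBackend<T> {
    pub fn new(inner: T) -> Self { Self { inner } }
}

impl<T: PipeTool> BrowserBackend for PipeBrowserBackend<T> {
    fn invoke(&self, action: Action, params: &str, expected_domain: &str)
        -> Result<CommandOutput, PipeError> {
        self.inner.invoke(action, params, expected_domain)
    }
    fn surface_metadata(&self) -> ExecutionSurfaceMetadata {
        self.inner.surface_metadata()
    }
    // `supports_eval` keeps the default of `true` for the pipe path.
}

/// Fake tool used only to exercise the wrapper without a real pipe.
pub struct FakePipeTool;

impl PipeTool for FakePipeTool {
    fn invoke(&self, action: Action, params: &str, expected_domain: &str)
        -> Result<CommandOutput, PipeError> {
        let data = format!("{:?} {} {}", action, params, expected_domain);
        Ok(CommandOutput { success: true, data })
    }
    fn surface_metadata(&self) -> ExecutionSurfaceMetadata {
        ExecutionSurfaceMetadata { privileged: true }
    }
}
```

Handshake, policy loading, and timeout configuration stay where they are today; the wrapper only changes the type the rest of the runtime depends on.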
- [ ] **Step 5: Refactor runtime and tool adapters to depend on `Arc<dyn BrowserBackend>`**
Update:
- `src/compat/browser_tool_adapter.rs`
- `src/compat/browser_script_skill_tool.rs`
- `src/compat/runtime.rs`
- `src/compat/orchestration.rs`
- `src/compat/workflow_executor.rs`
Preserve external tool names:
- `browser_action`
- `superrpa_browser`
- [ ] **Step 6: Add capability gating for eval-dependent script tools**
If `supports_eval()` is false, do **not** expose browser-script skill tools from `build_browser_script_skill_tools(...)` in that backend mode.
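The gating rule can be expressed as a pure function over the backend capability, which makes it easy to cover in `tests/browser_backend_capability_test.rs`. This is a sketch only: `EvalCapability`, `exposed_tool_names`, and both backend structs are hypothetical, and the real `build_browser_script_skill_tools(...)` returns tool objects, not names.

```rust
/// Only the capability query matters here; the plan's `BrowserBackend`
/// trait carries more methods.
pub trait EvalCapability {
    fn supports_eval(&self) -> bool;
}

/// Stand-in for a backend with full `Eval` support (the pipe path).
pub struct PipeLike;

/// Stand-in for a WS backend whose `Eval` path is not finished yet.
pub struct WsWithoutEval;

impl EvalCapability for PipeLike {
    fn supports_eval(&self) -> bool { true }
}

impl EvalCapability for WsWithoutEval {
    fn supports_eval(&self) -> bool { false }
}

/// Returns the tool names a runtime would expose for this backend.
/// The two stable browser tools are always present; script skill tools
/// are added only when the backend can evaluate scripts.
pub fn exposed_tool_names(
    backend: &dyn EvalCapability,
    script_skill_tools: &[&str],
) -> Vec<String> {
    let mut names = vec![
        "browser_action".to_string(),
        "superrpa_browser".to_string(),
    ];
    if backend.supports_eval() {
        names.extend(script_skill_tools.iter().map(|name| name.to_string()));
    }
    names
}
```

Omitting the tools entirely, instead of exposing them and failing at call time, keeps the model from planning around a capability the backend cannot deliver.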
- [ ] **Step 7: Re-run browser adapter tests**
Run:
```bash
cargo test --test browser_tool_test --test compat_browser_tool_test --test browser_backend_capability_test
```
Expected: all three pass.
- [ ] **Step 8: Commit**
```bash
git add src/browser src/lib.rs src/compat/browser_tool_adapter.rs src/compat/browser_script_skill_tool.rs src/compat/runtime.rs src/compat/orchestration.rs src/compat/workflow_executor.rs tests/browser_backend_capability_test.rs
git commit -m "refactor: abstract browser backend from pipe transport"
```
---
## Task 3: Implement the fixed browser WebSocket protocol codec in isolation
**Files:**
- Create: `src/browser/ws_protocol.rs`
- Test: `tests/browser_ws_protocol_test.rs`
- Reuse: `docs/_tmp_sgbrowser_ws_api_doc.txt`
- [ ] **Step 1: Write failing protocol codec tests**
Create `tests/browser_ws_protocol_test.rs` covering:
- exact outbound frame encoding
- callback payload decoding
- unknown callback format rejection
- mapping coverage for every supported v1 action
- [ ] **Step 2: Run the protocol tests first**
Run:
```bash
cargo test --test browser_ws_protocol_test
```
Expected: fail because the WS protocol codec does not exist yet.
- [ ] **Step 3: Encode the exact browser frame shapes**
Create `src/browser/ws_protocol.rs` so it can build exact array-form payloads such as:
```rust
[requesturl, "sgBrowserExcuteJsCodeByArea", target_url, js_code, area]
```
Serialize to the JSON string format required by the browser service.
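As an illustration of the array-form encoding, the sketch below builds the frame shown above with a hand-rolled JSON string escaper so that it has no dependencies. In the actual codec `serde_json` would likely do this work. The function names and the assumption that all five positions are strings (including `area`) are hypothetical; the browser API document remains the authority on field types.

```rust
/// Escapes `value` as a JSON string literal, including the surrounding quotes.
pub fn json_string(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('"');
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Builds the array-form frame for the eval call as a JSON string.
/// Assumes every position is a string; check the browser API document
/// before relying on this for `area`.
pub fn encode_eval_frame(
    request_url: &str,
    target_url: &str,
    js_code: &str,
    area: &str,
) -> String {
    let fields = [
        request_url,
        "sgBrowserExcuteJsCodeByArea",
        target_url,
        js_code,
        area,
    ];
    let parts: Vec<String> = fields.iter().map(|field| json_string(field)).collect();
    format!("[{}]", parts.join(","))
}
```

Escaping matters most for `js_code`, which will routinely contain quotes and newlines.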
- [ ] **Step 4: Define the v1 action mapping table**
Support only the actions already needed by current sgclaw flows:
- `Navigate`
- `GetText`
- `Click`
- `Type`
- `Eval`
Document which browser functions each one maps to and what assumptions they rely on.
- [ ] **Step 5: Define callback parsing and correlation rules**
Represent callback-bearing operations explicitly, including the callback function naming or request-correlation strategy the backend will depend on.
- [ ] **Step 6: Reject unsupported or malformed shapes early**
Fail fast for:
- unsupported actions
- malformed callback payloads
- missing request correlation metadata
- [ ] **Step 7: Re-run the protocol tests**
Run:
```bash
cargo test --test browser_ws_protocol_test
```
Expected: pass with no network dependency.
- [ ] **Step 8: Commit**
```bash
git add src/browser/ws_protocol.rs tests/browser_ws_protocol_test.rs
git commit -m "test: codify fixed browser websocket protocol"
```
---
## Task 4: Build the browser WS backend with synchronous invoke semantics
**Files:**
- Create: `src/browser/ws_backend.rs`
- Modify: `src/browser/mod.rs`
- Test: `tests/browser_ws_backend_test.rs`
- Reuse: `CommandOutput`, `PipeError`, `ExecutionSurfaceMetadata`, `MacPolicy`
- [ ] **Step 1: Write failing backend behavior tests**
Create `tests/browser_ws_backend_test.rs` covering:
- zero return + no callback => success
- non-zero return => failure
- zero return + callback => success with normalized `CommandOutput`
- callback timeout => timeout error
- dropped socket => clear failure
- [ ] **Step 2: Run backend tests first**
Run:
```bash
cargo test --test browser_ws_backend_test
```
Expected: fail because the WS backend does not exist yet.
- [ ] **Step 3: Build a long-lived browser connection manager**
Implement `src/browser/ws_backend.rs` to connect to `ws://127.0.0.1:12345` and expose blocking `invoke(...)` calls.
Use a dedicated connection loop plus request/response coordination instead of scattering raw socket calls through the runtime.
- [ ] **Step 4: Preserve local guardrails before send**
Validate `MacPolicy` before translating an action into the browser WS protocol, matching current pipe backend behavior.
- [ ] **Step 5: Normalize immediate status returns and delayed callbacks**
For each `invoke(...)` call:
- fail immediately on non-zero return codes
- succeed immediately for operations with no data callback
- wait for the matching callback for result-bearing operations
- convert the final outcome into `CommandOutput`
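The four rules above can be sketched as one function that takes the immediate return code and a channel on which the connection loop delivers callback payloads. This is a minimal sketch using `std::sync::mpsc`; `resolve_invoke`, `InvokeError`, and the simplified `CommandOutput` are hypothetical stand-ins, and the real backend would map these errors into `PipeError`.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Simplified stand-in for the real `CommandOutput`.
#[derive(Debug, PartialEq)]
pub struct CommandOutput {
    pub success: bool,
    pub data: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum InvokeError {
    /// The browser answered with a non-zero return code.
    BrowserRejected(i32),
    /// No callback arrived within the timeout.
    CallbackTimeout,
    /// The connection loop went away (for example, the socket dropped).
    ConnectionClosed,
}

/// Turns an immediate return code plus an optional delayed callback
/// into a single blocking result.
pub fn resolve_invoke(
    return_code: i32,
    expects_callback: bool,
    callbacks: &Receiver<String>,
    timeout: Duration,
) -> Result<CommandOutput, InvokeError> {
    if return_code != 0 {
        return Err(InvokeError::BrowserRejected(return_code));
    }
    if !expects_callback {
        return Ok(CommandOutput { success: true, data: None });
    }
    match callbacks.recv_timeout(timeout) {
        Ok(payload) => Ok(CommandOutput { success: true, data: Some(payload) }),
        Err(RecvTimeoutError::Timeout) => Err(InvokeError::CallbackTimeout),
        Err(RecvTimeoutError::Disconnected) => Err(InvokeError::ConnectionClosed),
    }
}
```

Each arm corresponds to one of the behavior tests listed in Step 1, so the same function can be driven from fakes without a socket.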
- [ ] **Step 6: Keep v1 concurrency intentionally serialized**
Allow only one in-flight browser request at a time unless the browser callback protocol is shown to guarantee stable request IDs.
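One simple way to get this serialization is to put the connection behind a `Mutex` and hold the lock for the whole send-and-wait cycle. The sketch below is an assumption about structure, not the plan's required design; `SerializedConnection` and `with_exclusive` are hypothetical names, and `C` stands for whatever connection handle the backend ends up using.

```rust
use std::sync::Mutex;

/// Wraps a connection so that only one caller at a time can use it.
pub struct SerializedConnection<C> {
    conn: Mutex<C>,
}

impl<C> SerializedConnection<C> {
    pub fn new(conn: C) -> Self {
        Self { conn: Mutex::new(conn) }
    }

    /// Runs `f` with exclusive access to the connection. The lock is held
    /// until `f` returns, so a request and its callback wait cannot
    /// interleave with another request.
    pub fn with_exclusive<R>(&self, f: impl FnOnce(&mut C) -> R) -> R {
        // Recover the inner value if a previous holder panicked.
        let mut guard = self.conn.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
        f(&mut guard)
    }
}
```

If request IDs later turn out to be reliable, this wrapper is the single place to relax in favor of a pending-request map.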
- [ ] **Step 7: Re-run backend tests**
Run:
```bash
cargo test --test browser_ws_backend_test
```
Expected: pass using mocks/fakes, not the real browser.
- [ ] **Step 8: Commit**
```bash
git add src/browser/mod.rs src/browser/ws_backend.rs tests/browser_ws_backend_test.rs
git commit -m "feat: add browser websocket backend"
```
---
## Task 5: Add the `sg_claw` service protocol and single-session server
**Files:**
- Create: `src/service/mod.rs`
- Create: `src/service/protocol.rs`
- Create: `src/service/server.rs`
- Create: `src/bin/sg_claw.rs`
- Modify: `src/lib.rs`
- Modify: `Cargo.toml`
- Test: `tests/service_ws_session_test.rs`
- Reuse: `AgentMessage::LogEntry`, `AgentMessage::TaskComplete`, `SubmitTaskRequest`, `run_submit_task(...)`
- [ ] **Step 1: Write failing service session tests**
Create `tests/service_ws_session_test.rs` to verify:
- first client attaches
- second client gets `Busy`
- disconnect resets session state
- overlapping task submission is rejected clearly
- [ ] **Step 2: Run the session tests first**
Run:
```bash
cargo test --test service_ws_session_test
```
Expected: fail because the service layer does not exist yet.
- [ ] **Step 3: Define a thin client/service WS protocol**
In `src/service/protocol.rs`, reuse existing task/event shapes as much as possible:
```rust
ClientMessage::SubmitTask { instruction, conversation_id, messages, page_url, page_title }
ClientMessage::Ping
ServiceMessage::LogEntry { level, message }
ServiceMessage::TaskComplete { success, summary }
ServiceMessage::Busy { message }
```
- [ ] **Step 4: Add the service event sink adapter**
Implement `AgentEventSink` for the service session writer so the shared task runner can stream `LogEntry` and `TaskComplete` over the service WebSocket.
- [ ] **Step 5: Implement single-active-client session state**
Model explicit states such as:
- `Idle`
- `ClientAttached`
- `TaskRunning`
Reject a second client with `ServiceMessage::Busy` and close the socket. Reject overlapping tasks instead of queueing them.
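The three states and their rejections can be sketched as a small state machine that the server consults before accepting a connection or a task. `Session`, `SessionError`, and the method names are hypothetical; the real server would translate `Busy` into `ServiceMessage::Busy` and close the socket.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Idle,
    ClientAttached,
    TaskRunning,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SessionError {
    /// A client is already attached; the newcomer must be turned away.
    Busy,
    /// A task was submitted without an attached client.
    NoClient,
    /// A task is already running; overlapping tasks are not queued.
    TaskAlreadyRunning,
}

pub struct Session {
    state: SessionState,
}

impl Session {
    pub fn new() -> Self {
        Self { state: SessionState::Idle }
    }

    pub fn state(&self) -> SessionState {
        self.state
    }

    /// Called when a client connects.
    pub fn attach(&mut self) -> Result<(), SessionError> {
        match self.state {
            SessionState::Idle => {
                self.state = SessionState::ClientAttached;
                Ok(())
            }
            _ => Err(SessionError::Busy),
        }
    }

    /// Called when the attached client submits a task.
    pub fn start_task(&mut self) -> Result<(), SessionError> {
        match self.state {
            SessionState::ClientAttached => {
                self.state = SessionState::TaskRunning;
                Ok(())
            }
            SessionState::TaskRunning => Err(SessionError::TaskAlreadyRunning),
            SessionState::Idle => Err(SessionError::NoClient),
        }
    }

    /// Called after `TaskComplete` has been sent.
    pub fn finish_task(&mut self) {
        if self.state == SessionState::TaskRunning {
            self.state = SessionState::ClientAttached;
        }
    }

    /// Called on disconnect; always returns the session to `Idle`.
    pub fn detach(&mut self) {
        self.state = SessionState::Idle;
    }
}
```

Keeping the transitions in one type makes the four cases in `tests/service_ws_session_test.rs` checkable without opening a socket.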
- [ ] **Step 6: Add the service binary**
Create `src/bin/sg_claw.rs` that:
- loads config
- creates the browser WS backend
- listens for local client connections
- routes `SubmitTask` into `run_submit_task(...)`
Keep `src/main.rs` and the existing `sgclaw::run()` pipe path unchanged.
- [ ] **Step 7: Re-run the session tests**
Run:
```bash
cargo test --test service_ws_session_test
```
Expected: pass without the real browser.
- [ ] **Step 8: Commit**
```bash
git add src/service src/bin/sg_claw.rs src/lib.rs Cargo.toml tests/service_ws_session_test.rs
git commit -m "feat: add claw-ws service entrypoint"
```
---
## Task 6: Add the `sg_claw_client` interactive CLI
**Files:**
- Create: `src/bin/sg_claw_client.rs`
- Modify: `Cargo.toml`
- Test: `tests/service_task_flow_test.rs`
- Reuse: `src/service/protocol.rs`
- [ ] **Step 1: Write failing client/service task flow tests**
Create `tests/service_task_flow_test.rs` to verify:
- the submit-task request reaches the service
- log entries stream in order
- the final summary arrives exactly once
- disconnect after task completion is handled cleanly
- [ ] **Step 2: Run the flow tests first**
Run:
```bash
cargo test --test service_task_flow_test
```
Expected: fail because the client does not exist yet.
- [ ] **Step 3: Implement a thin interactive client loop**
Create `src/bin/sg_claw_client.rs` that:
- connects to the local `sg_claw` service
- reads a line of user input
- sends `ClientMessage::SubmitTask`
- prints streamed `LogEntry` events as they arrive
- ends the turn on `TaskComplete`
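The receive half of one turn can be sketched independently of the socket by treating incoming events as an iterator and output as any `Write`. This is an illustrative sketch: `run_turn` is a hypothetical helper, the `ServiceMessage` field types are assumptions, and the printed format is arbitrary.

```rust
use std::io::{self, Write};

// Stand-in for the service protocol type; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum ServiceMessage {
    LogEntry { level: String, message: String },
    TaskComplete { success: bool, summary: String },
    Busy { message: String },
}

/// Prints events for a single turn and stops at the first terminal event.
/// Returns `Some(success)` after `TaskComplete`, or `None` if the service
/// was busy or the stream ended without a completion.
pub fn run_turn<I, W>(events: I, out: &mut W) -> io::Result<Option<bool>>
where
    I: IntoIterator<Item = ServiceMessage>,
    W: Write,
{
    for event in events {
        match event {
            ServiceMessage::LogEntry { level, message } => {
                writeln!(out, "[{}] {}", level, message)?;
            }
            ServiceMessage::TaskComplete { success, summary } => {
                writeln!(out, "done (success={}): {}", success, summary)?;
                return Ok(Some(success));
            }
            ServiceMessage::Busy { message } => {
                writeln!(out, "service busy: {}", message)?;
                return Ok(None);
            }
        }
    }
    Ok(None)
}
```

Returning at the first `TaskComplete` is what guarantees the client prints the final summary exactly once per turn.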
- [ ] **Step 4: Keep the client intentionally dumb**
Do **not** duplicate runtime logic in the client. Browser access, skills, orchestration, and task execution remain entirely inside the service.
- [ ] **Step 5: Re-run the flow tests**
Run:
```bash
cargo test --test service_task_flow_test
```
Expected: pass without the real browser.
- [ ] **Step 6: Build the new binaries explicitly**
Run:
```bash
cargo build --bin sg_claw --bin sg_claw_client
```
Expected: both binaries compile successfully.
- [ ] **Step 7: Commit**
```bash
git add src/bin/sg_claw_client.rs Cargo.toml tests/service_task_flow_test.rs
git commit -m "feat: add interactive claw-ws client"
```
---
## Task 7: Finish wiring, preserve pipe mode, and verify end-to-end behavior
**Files:**
- Modify: `Cargo.toml`
- Modify: `src/lib.rs`
- Modify: `src/config/settings.rs`
- Optional modify: `src/runtime/engine.rs`
- Reuse: `tests/browser_tool_test.rs`, `tests/runtime_task_flow_test.rs`, `tests/compat_runtime_test.rs`
- [ ] **Step 1: Add only the minimum config surface for v1**
Add settings such as:
- `browser_ws_url` defaulting to `ws://127.0.0.1:12345`
- `service_ws_listen_addr` defaulting to local loopback
Do **not** change the meaning of existing browser backend/profile settings just to represent service mode.
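A sketch of what the two defaults might look like as a standalone struct follows. The struct name `WsSettings` is hypothetical, and so is the service port: the plan only says "local loopback", so `PLACEHOLDER_SERVICE_PORT` is an invented value that must be replaced with whatever the project chooses.

```rust
use std::net::{Ipv4Addr, SocketAddr};

/// Invented for illustration only; the plan does not specify a service port.
pub const PLACEHOLDER_SERVICE_PORT: u16 = 42321;

/// Hypothetical container for the two WS-mode settings.
#[derive(Debug, Clone, PartialEq)]
pub struct WsSettings {
    /// Where the browser's WebSocket endpoint is expected.
    pub browser_ws_url: String,
    /// Where the `sg_claw` service listens for its single client.
    pub service_ws_listen_addr: SocketAddr,
}

impl Default for WsSettings {
    fn default() -> Self {
        Self {
            browser_ws_url: "ws://127.0.0.1:12345".to_string(),
            // Loopback only, so the service is not reachable from other hosts.
            service_ws_listen_addr: SocketAddr::from((
                Ipv4Addr::LOCALHOST,
                PLACEHOLDER_SERVICE_PORT,
            )),
        }
    }
}
```

Typing the listen address as `SocketAddr` instead of a string moves malformed values to config-load time, though the existing settings module may prefer strings for consistency.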
- [ ] **Step 2: Keep external browser tool naming stable**
Verify that the runtime still exposes:
- `superrpa_browser`
- `browser_action`
under both pipe and WS modes where the backend supports them.
- [ ] **Step 3: Re-run the current pipe regression suite**
Run:
```bash
cargo test --test browser_tool_test --test compat_browser_tool_test --test runtime_task_flow_test --test pipe_handshake_test --test pipe_protocol_test --test task_protocol_test
```
Expected: all existing pipe-oriented tests still pass unchanged.
- [ ] **Step 4: Run the new WS-focused suite**
Run:
```bash
cargo test --test task_runner_test --test browser_ws_protocol_test --test browser_ws_backend_test --test browser_backend_capability_test --test service_ws_session_test --test service_task_flow_test
```
Expected: all new tests pass without launching the real browser.
- [ ] **Step 5: Run a full Rust test sweep**
Run:
```bash
cargo test --tests
```
Expected: all Rust tests pass.
- [ ] **Step 6: Build all three binaries**
Run:
```bash
cargo build --bin sgclaw --bin sg_claw --bin sg_claw_client
```
Expected: all three binaries compile.
- [ ] **Step 7: Perform a manual local smoke test**
Manual test:
1. Start the browser app so `ws://127.0.0.1:12345` is available.
2. Run `cargo run --bin sg_claw`.
3. In another terminal, run `cargo run --bin sg_claw_client`.
4. Submit a simple browser task such as opening a page or fetching visible text.
5. Confirm the client prints streaming logs and exactly one final completion summary.
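6. Confirm the old pipe-mode entry still starts via `cargo run --bin sgclaw`. Plain `cargo run` becomes ambiguous once the package has several binaries, unless `default-run` is set in `Cargo.toml`.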
Expected: both modes work side-by-side.
- [ ] **Step 8: Commit**
```bash
git add Cargo.toml src/lib.rs src/config/settings.rs src/runtime/engine.rs
git commit -m "feat: wire parallel claw-ws transport"
```
---
## Verification Checklist
### Fast regression checks
```bash
cargo test --test browser_tool_test --test compat_browser_tool_test --test runtime_task_flow_test
```
Expected: current pipe/browser runtime behavior remains green.
### Full Rust test sweep
```bash
cargo test --tests
```
Expected: all Rust tests pass.
### Binary build verification
```bash
cargo build --bin sgclaw --bin sg_claw --bin sg_claw_client
```
Expected: all three binaries compile.
### Manual end-to-end verification
- Browser app listening on `ws://127.0.0.1:12345`
- `cargo run --bin sg_claw`
- `cargo run --bin sg_claw_client`
- submit one browser task
- verify streaming logs, final completion, and single-client lock behavior
- verify `cargo run --bin sgclaw` still preserves the old pipe bootstrap
---
## Notes for Implementation
- Keep the current pipe bootstrap in `src/lib.rs` intact until the shared runner and pipe backend wrapper are both green.
- Prefer small commits at each task boundary.
- Keep the new WS path additive and isolated.
- Do not ship partial browser capabilities under stable tool names.
- Treat `docs/_tmp_sgbrowser_ws_api_doc.txt` as the browser WS protocol source of truth while implementing `src/browser/ws_protocol.rs`.