Files
claw/docs/superpowers/plans/2026-04-02-ws-browser-backend-auth-plan.md
木炎 bdf8e12246 feat: align browser callback runtime and export flows
Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:44:53 +08:00

608 lines
23 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# WS Browser Backend Auth Replacement Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Replace the ws service paths empty-session-key `BrowserPipeTool` dependency with a ws-native browser backend path so real browser websocket calls work, while preserving legacy pipe behavior exactly.
**Architecture:** Keep the existing pipe entry untouched and add a ws-only parallel execution seam. The ws service path will construct a `ServiceBrowserWsClient` plus `WsBrowserBackend`, pass that backend through a new ws-only submit-task entry, and let the existing compat/runtime stack consume `Arc<dyn BrowserBackend>` instead of requiring `BrowserPipeTool` on the ws path.
**Tech Stack:** Rust 2021, current sgclaw agent/task runner, compat runtime/orchestration stack, `tungstenite`, `serde_json`, existing `MacPolicy`, existing `BrowserBackend`/`WsBrowserBackend`, and the current Rust test suite.
---
## Scope Guardrails
- Only change the ws service path.
- Do **not** change `src/lib.rs` pipe runtime behavior.
- Do **not** change pipe handshake semantics.
- Do **not** introduce fake session keys, fake HMAC seeds, or auth bypasses.
- Keep legacy `run_submit_task(...)` available for the pipe entry.
- If a shared layer must change, add a parallel ws-only entry instead of weakening the pipe path.
- Keep the current single-client, single-task service model.
- Do not broaden this slice into browser process launch, queueing, multi-client support, or protocol extensions.
---
## File Structure
### Existing files to modify
- Modify: `src/agent/task_runner.rs` — keep the current pipe-oriented submit path and add the ws-only backend-based submit path.
- Modify: `src/compat/runtime.rs` — add a backend-driven execution entry that accepts `Arc<dyn BrowserBackend>` directly, while keeping the current pipe-oriented public functions behaviorally unchanged.
- Modify: `src/compat/orchestration.rs` — add a matching backend-driven execution entry for orchestration/direct-route flows, while keeping the current pipe-oriented public functions behaviorally unchanged.
- Modify: `src/compat/workflow_executor.rs` — add backend-driven sibling APIs for direct-route/fallback execution, while keeping the current pipe-oriented public functions behaviorally unchanged.
- Modify: `src/service/server.rs` — replace the ws services `BrowserPipeTool::new(..., vec![])` path with a ws-native `WsClient` + `WsBrowserBackend` path.
- Modify: `src/service/mod.rs` — only if minimal re-export or call-signature changes are needed around the new ws-only submit path.
- Modify: `src/browser/mod.rs` — only if export cleanup is truly needed for the service wiring.
- Reuse: `src/agent/mod.rs` — keep the current pipe routing unchanged unless a tiny internal refactor is strictly needed to reuse shared code.
- Reuse: `src/browser/backend.rs` — existing shared browser backend trait.
- Reuse: `src/browser/ws_backend.rs` — existing ws-native browser backend implementation.
- Reuse: `src/browser/ws_protocol.rs` — existing browser websocket protocol codec.
- Reuse: `src/compat/browser_tool_adapter.rs` — should already speak `BrowserBackend`; only touch if a narrow ws regression forces it.
- Reuse: `src/compat/browser_script_skill_tool.rs` — eval-capability gating already exists; only touch if a narrow ws regression forces it.
- Reuse: `src/lib.rs` — pipe entrypoint must remain behaviorally unchanged; verify only.
### Existing tests to extend
- Modify: `tests/browser_ws_backend_test.rs` — keep existing ws backend coverage green after the service adapter wiring lands.
- Modify: `tests/browser_script_skill_tool_test.rs` — re-verify eval-gating and browser-script behavior after the shared compat/runtime seam changes.
- Modify: `tests/service_ws_session_test.rs` — update service-side unit/session tests to exercise the ws-only submit path.
- Modify: `tests/service_task_flow_test.rs` — add client→service chain coverage proving the ws path reaches a browser websocket and no longer emits `invalid hmac seed`.
- Modify: `src/service/server.rs` under `#[cfg(test)]` if the private service-side ws adapter cannot be exercised from an integration test crate without changing production visibility.
### New files to create
- Create: `tests/browser_ws_service_adapter_test.rs` if the adapter can be exercised through a public seam; otherwise keep the deterministic adapter tests as unit tests in `src/service/server.rs` so no production visibility changes are required.
---
## Task 1: Lock the ws-only behavior with deterministic failing tests
**Files:**
- Create: `tests/browser_ws_service_adapter_test.rs`
- Modify: `tests/service_task_flow_test.rs`
- Reuse: `tests/browser_ws_backend_test.rs`, `src/browser/ws_backend.rs`, `src/service/server.rs`
- [ ] **Step 1: Write the first failing backend/adapter test**
Create `tests/browser_ws_service_adapter_test.rs` with one focused test that directly exercises the ws-service adapter layer, without `sg_claw_client`, without LLM planning, and without natural-language tasks.
Start with the smallest real behavior from the spec:
- fake browser websocket server accepts one connection
- the ws-service adapter builds the same kind of client the service will use
- `WsBrowserBackend.invoke(Action::Navigate, ...)` succeeds on status `0`
- the fake server receives one text frame that decodes as a ws `Navigate` call
- [ ] **Step 2: Run that single new test and watch it fail**
Run:
```bash
cargo test --test browser_ws_service_adapter_test ws_service_backend_navigate_reaches_browser_websocket -- --nocapture
```
Expected: FAIL because the service-side ws client/adapter does not exist yet.
- [ ] **Step 3: Add the second failing deterministic test**
In the same file, add a test for the forced-close path:
- fake browser websocket server accepts a request, then closes/reset the socket before returning a status frame
- observe the error at the `WsBrowserBackend.invoke(...)` call site
- assert the outward error is exactly `PipeError::PipeClosed`
- [ ] **Step 4: Run only the forced-close test and watch it fail**
Run:
```bash
cargo test --test browser_ws_service_adapter_test ws_service_backend_maps_browser_disconnect_to_pipe_closed -- --nocapture
```
Expected: FAIL because the service-side ws client/adapter still does not exist.
- [ ] **Step 5: Add the third failing deterministic test**
In the same file, add a callback-timeout test:
- fake browser websocket server returns status `0`
- it never returns the callback frame
- assert the outward error at `invoke(...)` is exactly `PipeError::Timeout`
Use a tiny response timeout in the backend under test.
- [ ] **Step 6: Run only the callback-timeout test and watch it fail**
Run:
```bash
cargo test --test browser_ws_service_adapter_test ws_service_backend_times_out_waiting_for_callback -- --nocapture
```
Expected: FAIL because the service-side ws client/adapter still does not exist.
- [ ] **Step 7: Add the end-to-end failing regression for the auth bug**
Extend `tests/service_task_flow_test.rs` with one client→service integration test that:
- starts a fake browser websocket server
- starts the real `sg_claw` service binary with a temp config pointing `browserWsUrl` to that fake server
- starts the real `sg_claw_client`
- submits the fixed instruction `打开知乎热榜并读取页面主区域文本`
- captures service/client output
- asserts the fake browser server received at least one text frame
- asserts output does **not** contain `invalid hmac seed: session key must not be empty`
Do not assert planner details here. This test only proves the service path no longer goes through the empty-session-key auth failure.
- [ ] **Step 8: Run the integration regression and watch it fail**
Run:
```bash
cargo test --test service_task_flow_test ws_service_submit_task_no_longer_hits_invalid_hmac_seed -- --nocapture
```
Expected: FAIL on the current code because the ws service still constructs `BrowserPipeTool::new(..., vec![])`.
- [ ] **Step 9: Commit the red tests only after they are all in place**
Do not commit yet if any required red test was skipped. The next task will make them pass.
---
## Task 2: Add a ws-only browser-backend execution seam without changing the pipe path
**Files:**
- Modify: `src/agent/task_runner.rs`
- Modify: `src/compat/runtime.rs`
- Modify: `src/compat/orchestration.rs`
- Modify: `src/compat/workflow_executor.rs`
- Reuse: `src/agent/mod.rs`, `src/browser/backend.rs`
- Test: `tests/task_runner_test.rs`, `tests/browser_script_skill_tool_test.rs`
- [ ] **Step 1: Write the smallest failing runner-level ws entry test**
Extend `tests/task_runner_test.rs` with a focused test that proves there is a ws-only submit entry accepting `Arc<dyn BrowserBackend>` and an arbitrary event sink, while the old `run_submit_task(...)` signature still exists for pipe mode.
The test can stay on the missing-LLM-config path so it does not need a real browser call. It should compile only once the new ws-only function exists.
- [ ] **Step 2: Run the targeted runner test and watch it fail**
Run:
```bash
cargo test --test task_runner_test ws_only_submit_task_entry_accepts_browser_backend -- --nocapture
```
Expected: FAIL to compile or FAIL to link because the ws-only entry does not exist yet.
- [ ] **Step 3: Add the new ws-only submit-task entry in `src/agent/task_runner.rs`**
Keep the current pipe function intact:
```rust
pub fn run_submit_task<T: Transport + 'static>(... browser_tool: &BrowserPipeTool<T>, ...)
```
Add a parallel entry for the service path, for example:
```rust
pub fn run_submit_task_with_browser_backend(
sink: &dyn AgentEventSink,
browser_backend: Arc<dyn BrowserBackend>,
context: &AgentRuntimeContext,
request: SubmitTaskRequest,
) -> Result<(), PipeError>
```
Rules:
- share as much internal logic as possible with the pipe path
- do not change `run_submit_task(...)` behavior
- do not change `src/agent/mod.rs` pipe wiring except, at most, small internal refactoring to reuse common code
- [ ] **Step 4: Add a backend-driven compat runtime entry**
In `src/compat/runtime.rs`, add a parallel entry that accepts `Arc<dyn BrowserBackend>` directly instead of `BrowserPipeTool<T>`.
Keep the existing pipe-oriented public function in place.
The backend-driven entry must preserve:
- existing log emission order
- tool names (`superrpa_browser`, `browser_action`)
- existing browser-script tool gating behavior
- existing office/screen tool attachment logic
- existing conversation seeding and provider setup
- [ ] **Step 5: Add backend-driven orchestration and workflow-executor entries**
In `src/compat/orchestration.rs`, add the matching backend-driven entry so direct-route flows and fallback flows can run with `Arc<dyn BrowserBackend>` on the ws path.
In `src/compat/workflow_executor.rs`, add backend-driven sibling APIs for any direct-route/fallback execution that is currently hard-wired to `BrowserPipeTool<T>`.
Keep the existing pipe-oriented orchestration and workflow-executor public functions in place.
- [ ] **Step 6: Route the new ws-only submit entry through the backend-driven compat/orchestration/workflow-executor path**
Inside `src/agent/task_runner.rs`, make the new ws-only submit entry call the new backend-based compat/orchestration functions, while the old pipe entry keeps calling the old pipe-based functions.
This is the core compatibility seam, and it must cover both normal compat-runtime execution and direct-route/fallback workflow execution.
- [ ] **Step 7: Re-run the new runner test**
Run:
```bash
cargo test --test task_runner_test ws_only_submit_task_entry_accepts_browser_backend -- --nocapture
```
Expected: PASS.
- [ ] **Step 8: Re-run the full runner, workflow, and browser-script regressions**
Run:
```bash
cargo test --test task_runner_test --test browser_script_skill_tool_test -- --nocapture
```
Then run the workflow executor unit coverage that protects direct-route behavior:
```bash
cargo test compat::workflow_executor::tests -- --nocapture
```
Expected: all existing runner, workflow, and browser-script tests still pass, proving the pipe-facing path, direct-route behavior, and eval-gating stayed stable.
- [ ] **Step 9: Commit**
```bash
git add src/agent/task_runner.rs src/compat/runtime.rs src/compat/orchestration.rs src/compat/workflow_executor.rs tests/task_runner_test.rs tests/browser_script_skill_tool_test.rs
git commit -m "refactor: add ws-only browser backend submit path"
```
---
## Task 3: Replace the ws services empty-session-key browser tool with a ws-native backend
**Files:**
- Modify: `src/service/server.rs`
- Modify: `src/service/mod.rs` only if minimal re-export or signature cleanup is required
- Modify: `src/browser/mod.rs` only if export cleanup is needed
- Test: `tests/browser_ws_service_adapter_test.rs`
- Reuse: `src/browser/ws_backend.rs`, `src/browser/ws_protocol.rs`
- [ ] **Step 1: Write the smallest failing service-side adapter compile target**
Add a compile-level or construction-level assertion in `tests/browser_ws_service_adapter_test.rs` that the service path can construct the new service-side ws client type used by `serve_client(...)`.
This should fail until the type exists in `src/service/server.rs`.
- [ ] **Step 2: Run the adapter test group and watch the constructor test fail**
Run:
```bash
cargo test --test browser_ws_service_adapter_test -- --nocapture
```
Expected: FAIL because the service-side ws client type does not exist yet.
- [ ] **Step 3: Introduce `ServiceBrowserWsClient` in `src/service/server.rs`**
Create a narrow client type that owns the real websocket connection to `browser_ws_url` and implements `WsClient`:
Required responsibilities only:
- lazily connect on first use
- send raw text frames
- receive raw text frames with timeout
- map close/reset to exactly `PipeError::PipeClosed`
- map connect failure to exactly `PipeError::Protocol("browser websocket connect failed: ...")`
- map timeouts to exactly `PipeError::Timeout`
Do **not** duplicate `WsBrowserBackend` responsibilities here.
- [ ] **Step 4: Remove ws-path use of `BrowserPipeTool::new(..., vec![])`**
In `serve_client(...)`, replace this shape:
```rust
let transport = Arc::new(ServiceBrowserTransport::new(...));
let browser_tool = BrowserPipeTool::new(transport.clone(), mac_policy.clone(), vec![])
```
with the ws-native shape:
```rust
let ws_client = Arc::new(ServiceBrowserWsClient::new(...));
let browser_backend: Arc<dyn BrowserBackend> = Arc::new(
WsBrowserBackend::new(ws_client, mac_policy.clone(), initial_request_url(...))
.with_response_timeout(BROWSER_RESPONSE_TIMEOUT)
);
```
Then route the task through the new ws-only submit entry from Task 2.
- [ ] **Step 5: Delete or narrow old ws-path transport code that duplicated protocol handling**
Remove the service-only callback polling / response queue logic that existed solely to feed `BrowserPipeTool`.
Keep only what is still needed for:
- service client websocket I/O (`sg_claw_client``sg_claw`)
- browser websocket I/O (`sg_claw``browser_ws_url`)
Do not leave two competing ws protocol implementations in `src/service/server.rs`.
- [ ] **Step 6: Re-run deterministic adapter/backend tests**
Run:
```bash
cargo test --test browser_ws_service_adapter_test -- --nocapture
```
Expected: PASS, including:
- navigate success
- disconnect => `PipeError::PipeClosed`
- callback timeout => `PipeError::Timeout`
- [ ] **Step 7: Re-run existing ws backend tests**
Run:
```bash
cargo test --test browser_ws_backend_test -- --nocapture
```
Expected: PASS, confirming the service adapter change did not break the existing ws backend semantics.
- [ ] **Step 8: Commit**
```bash
git add src/service/server.rs src/service/mod.rs src/browser/mod.rs tests/browser_ws_service_adapter_test.rs
git commit -m "feat: switch ws service to ws-native browser backend"
```
---
## Task 4: Prove the auth bug is gone and pipe mode is unchanged
**Files:**
- Modify: `tests/service_ws_session_test.rs`
- Modify: `tests/service_task_flow_test.rs`
- Reuse: `src/lib.rs`, `src/service/mod.rs`, `src/compat/workflow_executor.rs`
- [ ] **Step 1: Update service session tests for the new ws-only call path**
Adjust any service session tests that still call `handle_client_message(...)` through the old ws-path `BrowserPipeTool` assumption.
Prefer one of these narrow approaches:
- overload `handle_client_message(...)` with a backend-based service entry used only in ws tests, or
- keep `handle_client_message(...)` pipe-oriented and test the ws path through `serve_client(...)` and the real service binary instead
Choose the option that changes the fewest existing tests and leaves the pipe path simplest.
- [ ] **Step 2: Run the focused service session file**
Run:
```bash
cargo test --test service_ws_session_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 3: Make the auth-regression integration test pass**
Re-run the exact end-to-end regression from Task 1:
```bash
cargo test --test service_task_flow_test ws_service_submit_task_no_longer_hits_invalid_hmac_seed -- --nocapture
```
Expected: PASS, with evidence that:
- the fake browser websocket server received at least one frame
- output no longer contains `invalid hmac seed: session key must not be empty`
- [ ] **Step 4: Add one explicit mandatory assertion for browser websocket connect failures**
Add one focused assertion that a browser websocket connect failure surfaces outward as:
```rust
PipeError::Protocol("browser websocket connect failed: ...")
```
Do not leave this semantic implied.
- [ ] **Step 5: Add one explicit ws direct-route regression**
Add one focused regression that proves a ws-backed browser backend can traverse a direct-route/fallback path that currently flows through `src/compat/workflow_executor.rs`.
Keep it deterministic and narrow. Prefer a fake backend plus direct function invocation over a planner-dependent natural-language end-to-end test.
- [ ] **Step 6: Run the ws-focused regression suite**
Run:
```bash
cargo test --test browser_ws_service_adapter_test --test browser_ws_backend_test --test browser_ws_protocol_test --test service_ws_session_test --test service_task_flow_test -- --nocapture
```
Then run the workflow-executor direct-route coverage:
```bash
cargo test compat::workflow_executor::tests -- --nocapture
```
Expected: all ws-focused and direct-route workflow tests pass.
- [ ] **Step 7: Run the required pipe and browser-script regression suite**
Run:
```bash
cargo test --test pipe_handshake_test --test browser_tool_test --test compat_browser_tool_test --test browser_script_skill_tool_test --test runtime_task_flow_test -- --nocapture
```
Expected: all required pipe and browser-script regressions pass unchanged.
- [ ] **Step 8: Run the full relevant verification sweep**
Run:
```bash
cargo test --test browser_ws_service_adapter_test --test browser_ws_backend_test --test browser_ws_protocol_test --test service_ws_session_test --test service_task_flow_test --test pipe_handshake_test --test browser_tool_test --test compat_browser_tool_test --test browser_script_skill_tool_test --test runtime_task_flow_test -- --nocapture
```
Then run:
```bash
cargo test compat::workflow_executor::tests -- --nocapture
```
Expected: full mixed ws+pipe verification passes in fresh runs.
- [ ] **Step 9: Build the affected binaries**
Run:
```bash
cargo build --bin sgclaw --bin sg_claw --bin sg_claw_client
```
Expected: all three binaries compile.
- [ ] **Step 10: Commit**
```bash
git add tests/service_ws_session_test.rs tests/service_task_flow_test.rs tests/browser_ws_service_adapter_test.rs src/compat/workflow_executor.rs
git commit -m "test: verify ws auth replacement and pipe regressions"
```
---
## Task 5: Manual smoke verification against the real browser
**Files:**
- Reuse only: no code changes unless a verified bug is found during smoke work
- [ ] **Step 1: Start the real browser websocket target**
Confirm the real sgBrowser endpoint is reachable at the configured `browserWsUrl`.
- [ ] **Step 2: Start the real ws service**
Run:
```bash
cargo run --bin sg_claw -- --config-path "D:/data/ideaSpace/rust/sgClaw/sgclaw_config.json"
```
Expected: service prints the resolved listen address and browser websocket URL.
- [ ] **Step 3: Run the minimal browser task through the real client**
Run from a separate terminal with UTF-8-safe input:
```bash
cargo run --bin sg_claw_client -- --config-path "D:/data/ideaSpace/rust/sgClaw/sgclaw_config.json"
```
Submit:
```text
打开知乎热榜并读取页面主区域文本
```
Expected:
- browser actions start executing
- no `invalid hmac seed: session key must not be empty`
- one final completion is returned
- [ ] **Step 4: Run the old Zhihu skill smoke**
Submit:
```text
读取知乎热榜数据,并导出 excel 文件
```
Expected: the task enters the real browser action path instead of dying at auth initialization.
- [ ] **Step 5: Re-check the legacy pipe entry without modifying it**
Run:
```bash
cargo run
```
Only verify startup behavior appropriate for the current pipe environment. Do not change pipe code during this smoke step.
- [ ] **Step 6: If a smoke failure appears, stop and debug before editing**
Any failure found here must be handled with:
- a fresh reproducer
- a failing automated test if feasible
- the smallest scoped fix
Do not fold speculative smoke fixes into this slice.
---
## Verification Checklist
### Deterministic ws-only tests
```bash
cargo test --test browser_ws_service_adapter_test --test browser_ws_backend_test --test browser_ws_protocol_test -- --nocapture
```
Expected: ws-native backend and service adapter semantics are green without LLM/planner dependencies.
### Client→service ws chain tests
```bash
cargo test --test service_ws_session_test --test service_task_flow_test -- --nocapture
```
Expected: the ws service path reaches the browser websocket and no longer emits the empty-session-key auth failure.
### Required pipe and browser-script regressions
```bash
cargo test --test pipe_handshake_test --test browser_tool_test --test compat_browser_tool_test --test browser_script_skill_tool_test --test runtime_task_flow_test -- --nocapture
```
Expected: legacy pipe behavior and browser-script eval-gating remain unchanged.
### Binary build verification
```bash
cargo build --bin sgclaw --bin sg_claw --bin sg_claw_client
```
Expected: all affected binaries compile.
### Manual end-to-end verification
- real sgBrowser running at configured `browserWsUrl`
- `cargo run --bin sg_claw -- --config-path "D:/data/ideaSpace/rust/sgClaw/sgclaw_config.json"`
- `cargo run --bin sg_claw_client -- --config-path "D:/data/ideaSpace/rust/sgClaw/sgclaw_config.json"`
- run the Zhihu minimal task
- run the old Zhihu export task
- verify no `invalid hmac seed` appears
- verify pipe startup still behaves as before
---
## Notes for Implementation
- Keep the current pipe bootstrap in `src/lib.rs` untouched.
- Prefer adding ws-only functions over changing existing pipe signatures.
- Reuse `WsBrowserBackend` for protocol semantics; do not re-implement callback handling inside the service.
- Keep `ServiceBrowserWsClient` narrow: connection lifecycle + raw websocket I/O only.
- Preserve exact outward error semantics from the spec:
- connect failure => `PipeError::Protocol("browser websocket connect failed: ...")`
- non-zero status => `PipeError::Protocol("browser returned non-zero status: ...")`
- callback timeout => `PipeError::Timeout`
- close/reset => `PipeError::PipeClosed`
- Do not claim success until the mixed ws+pipe verification commands have been run fresh.