claw/docs/superpowers/plans/2026-04-03-ws-browser-integration-surface-correction-plan.md
木炎 bdf8e12246 feat: align browser callback runtime and export flows
Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:44:53 +08:00

# WS Browser Integration Surface Correction Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Replace the unvalidated raw-ws-direct assumption with an evidence-backed decision: either prove a minimal sgBrowser bootstrap sequence for raw websocket control, or pivot to the real browser bridge surface.

**Architecture:** Treat the existing ws-native backend as a protocol/testing asset, not as a validated production integration surface. First build a narrow probe/validation harness that can run candidate bootstrap sequences and capture exact live transcripts from the real endpoint. Then branch decisively: if a reproducible bootstrap sequence yields real status/callback frames, implement that bootstrap path; otherwise stop raw-ws speculation and write the bridge-first implementation slice.

**Tech Stack:** Rust 2021, existing `src/browser/ws_protocol.rs` / `src/browser/ws_backend.rs`, service websocket infrastructure, `tungstenite`, `serde_json`, current Rust test suite, local sgBrowser websocket documentation.

---
## Scope Guardrails
- Do **not** add more speculative production fixes to `src/service/server.rs` just to “try one more thing.”
- Do **not** claim raw websocket is the supported path unless the live probe transcript proves it.
- Do **not** modify `src/lib.rs`, pipe handshake behavior, or the pipe browser-tool path.
- Do **not** implement both the bootstrap architecture and the bridge architecture in the same branch.
- Keep the ws-native code unless and until the bridge decision makes specific pieces obsolete.
- Prefer a dedicated probe surface over embedding validation logic into production request handling.
---
## File Structure
### Existing files to modify
- Modify: `src/browser/mod.rs`
  - export the new `ws_probe` module so both tests and the probe binary use the same library surface
- Modify: `src/browser/ws_protocol.rs`
  - only if a tiny helper extraction is required for test/probe readability
  - do not change existing protocol semantics in this slice
- Modify: `tests/browser_ws_protocol_test.rs`
  - add deterministic coverage for any extracted helper used by the probe harness
### New files to create
- Create: `src/bin/sgbrowser_ws_probe.rs`
  - standalone diagnostic binary for ordered frame-script probing against a live sgBrowser websocket endpoint
- Create: `src/browser/ws_probe.rs`
  - small reusable probe/transcript module, if needed, to keep the binary and tests focused
- Create: `tests/browser_ws_probe_test.rs`
  - deterministic fake-server tests for transcript capture, timeout reporting, and scripted sequence execution
- Create: `docs/_tmp_sgbrowser_ws_probe_transcript.md`
  - temporary evidence artifact capturing the real endpoint probe matrix and outcomes
- Create: `docs/superpowers/specs/2026-04-03-ws-browser-bootstrap-contract-design.md` **only if Option A wins after probing**
  - follow-up bootstrap contract design (see Task 4A), not part of the initial coding slice
- Create: `docs/superpowers/plans/2026-04-03-ws-browser-bootstrap-contract-plan.md` **only if Option A wins after probing**
  - follow-up bootstrap implementation plan (see Task 4A), not part of the initial coding slice
- Create: `docs/superpowers/specs/2026-04-03-ws-browser-bridge-path-design.md` **only if Option B wins after probing**
  - follow-up bridge design, not part of the initial coding slice
- Create: `docs/superpowers/plans/2026-04-03-ws-browser-bridge-path-plan.md` **only if Option B wins after probing**
  - follow-up bridge implementation plan, not part of the initial coding slice
### Files deliberately not changed in the initial slice
- `src/lib.rs`
- `src/agent/task_runner.rs`
- `src/compat/runtime.rs`
- `src/compat/orchestration.rs`
- `src/compat/workflow_executor.rs`
- `src/browser/ws_backend.rs`
Unless the probe results prove a real bootstrap contract, these files stay untouched.
---
## Task 1: Build a deterministic websocket probe harness before touching production behavior
**Files:**
- Create: `src/browser/ws_probe.rs`
- Create: `tests/browser_ws_probe_test.rs`
- Reuse: `src/browser/ws_protocol.rs`
- [ ] **Step 1: Write the first failing transcript test**
Create `tests/browser_ws_probe_test.rs` with one focused fake-server test that executes a scripted sequence of outgoing text frames and records all received text frames in order.
Start with this shape:
```rust
#[test]
fn probe_records_welcome_then_silence_transcript() {
    // fake server sends one welcome frame and then stays silent
    // probe result should preserve that exact transcript and mark timeout/silence explicitly
}
```
Required assertions:
- the probe can connect to the fake websocket server
- it can send a scripted first frame
- it records the first inbound text frame exactly
- it returns a transcript/result object that distinguishes timeout from protocol parse failure
- [ ] **Step 2: Run the single new test and verify it fails**
Run:
```bash
cargo test --test browser_ws_probe_test probe_records_welcome_then_silence_transcript -- --nocapture
```
Expected: FAIL because the probe harness does not exist yet.
- [ ] **Step 3: Add the second failing probe test for ordered multi-step scripts**
In the same file, add a test proving the harness can run multiple outgoing frames in a fixed order and keep the transcript segmented by step.
Test shape:
```rust
#[test]
fn probe_runs_ordered_frame_script_and_records_per_step_results() {
    // send bootstrap frame 1, bootstrap frame 2, then minimal action
    // fake server replies differently at each step
    // probe result preserves exact order and outcomes
}
```
Required assertions:
- outgoing frames are sent in the configured order
- inbound frames are attached to the correct step
- the probe can stop the sequence on timeout/close if configured
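The ordering and per-step segmentation asserted above can be explored without a socket at all. The following is a minimal sketch, assuming a hypothetical `FrameTransport` seam and an in-memory `ScriptedFake`; neither name comes from this plan, and the real harness would wrap a websocket instead. The stop policy (always stop on close, stop on timeout only when the step expected a reply) is also an assumption.

```rust
use std::collections::VecDeque;

pub struct ProbeStep {
    pub label: String,
    pub payload: String,
    pub expect_reply: bool,
}

#[derive(Debug, PartialEq)]
pub enum ProbeOutcome {
    Received(Vec<String>),
    TimedOut,
    Closed,
}

#[derive(Debug, PartialEq)]
pub struct ProbeStepResult {
    pub label: String,
    pub sent: String,
    pub outcome: ProbeOutcome,
}

/// Hypothetical seam: send one text frame, then report what was observed
/// before the timeout. A real implementation would wrap a websocket.
pub trait FrameTransport {
    fn send_and_collect(&mut self, payload: &str) -> ProbeOutcome;
}

/// In-memory fake that replays canned outcomes and logs every sent frame.
pub struct ScriptedFake {
    pub replies: VecDeque<ProbeOutcome>,
    pub sent: Vec<String>,
}

impl FrameTransport for ScriptedFake {
    fn send_and_collect(&mut self, payload: &str) -> ProbeOutcome {
        self.sent.push(payload.to_string());
        // Running out of canned replies is modelled as silence.
        self.replies.pop_front().unwrap_or(ProbeOutcome::TimedOut)
    }
}

/// Runs the steps in their configured order and keeps one result per step.
pub fn run_script<T: FrameTransport>(
    transport: &mut T,
    steps: &[ProbeStep],
    stop_on_failure: bool,
) -> Vec<ProbeStepResult> {
    let mut results = Vec::new();
    for step in steps {
        let outcome = transport.send_and_collect(&step.payload);
        // Assumed policy: a close always counts as failure, a timeout only
        // when the step expected a reply.
        let failed = match &outcome {
            ProbeOutcome::Closed => true,
            ProbeOutcome::TimedOut => step.expect_reply,
            ProbeOutcome::Received(_) => false,
        };
        results.push(ProbeStepResult {
            label: step.label.clone(),
            sent: step.payload.clone(),
            outcome,
        });
        if stop_on_failure && failed {
            break;
        }
    }
    results
}
```

Keeping the transport behind a trait is one way to make the ordering assertions deterministic; the fake-server tests in this task would still be needed to cover the real socket path.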
- [ ] **Step 4: Run the ordered-script test and verify it fails**
Run:
```bash
cargo test --test browser_ws_probe_test probe_runs_ordered_frame_script_and_records_per_step_results -- --nocapture
```
Expected: FAIL because the probe harness does not exist yet.
- [ ] **Step 5: Add the third failing probe test for close/reset visibility**
Add one focused fake-server test, named `probe_reports_socket_close_separately_from_timeout`, that closes the connection after a script step and asserts the transcript reports close/reset rather than a generic timeout.
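To illustrate the distinction this test is after, here is a minimal sketch at the raw `std::io` level. It assumes the harness can inspect the result of a blocking read with a read timeout set; the real harness would most likely map `tungstenite` errors instead, and `ReadClass` and `classify_read` are hypothetical names.

```rust
use std::io;

/// Hypothetical classification of one blocking read attempt.
#[derive(Debug, PartialEq)]
pub enum ReadClass {
    Data(usize),
    TimedOut,
    Closed,
    OtherError(io::ErrorKind),
}

/// Separates "nothing arrived in time" from "the peer went away".
pub fn classify_read(result: io::Result<usize>) -> ReadClass {
    match result {
        // A zero-length read on a stream socket means the peer closed it.
        Ok(0) => ReadClass::Closed,
        Ok(n) => ReadClass::Data(n),
        Err(e) => match e.kind() {
            // With a read timeout set, std reports expiry as WouldBlock on
            // Unix and TimedOut on Windows.
            io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut => ReadClass::TimedOut,
            io::ErrorKind::ConnectionReset
            | io::ErrorKind::ConnectionAborted
            | io::ErrorKind::BrokenPipe
            | io::ErrorKind::UnexpectedEof => ReadClass::Closed,
            other => ReadClass::OtherError(other),
        },
    }
}
```

Whatever the final mapping looks like, the point is that close/reset and silence land in different variants before any transcript is written.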
- [ ] **Step 6: Run the close/reset test and verify it fails**
Run:
```bash
cargo test --test browser_ws_probe_test probe_reports_socket_close_separately_from_timeout -- --nocapture
```
Expected: FAIL because the probe harness does not exist yet.
- [ ] **Step 7: Implement the minimal probe module**
Create `src/browser/ws_probe.rs` with only the types and behavior needed by the tests.
Recommended shape:
```rust
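use std::time::Duration;

pub struct ProbeStep {
    pub label: String,
    pub payload: String,
    pub expect_reply: bool,
}

#[derive(Debug)]
pub enum ProbeOutcome {
    Received(Vec<String>),
    TimedOut,
    Closed,
    ConnectFailed(String),
}

#[derive(Debug)]
pub struct ProbeStepResult {
    pub label: String,
    pub sent: String,
    pub outcome: ProbeOutcome,
}

// Placeholder error type; the variant is an assumption, shape it to what the tests need.
#[derive(Debug)]
pub enum ProbeError {
    InvalidArgs(String),
}

pub fn run_probe_script(
    ws_url: &str,
    timeout: Duration,
    steps: &[ProbeStep],
) -> Result<Vec<ProbeStepResult>, ProbeError> {
    // connect, send ordered frames, collect exact transcript
    todo!()
}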
```
Rules:
- do not parse business meaning yet
- do not mix this into normal task execution
- preserve exact raw text frames in transcript results
- keep the module small and diagnostic-oriented
- [ ] **Step 8: Re-run the new probe tests**
Run:
```bash
cargo test --test browser_ws_probe_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 9: Commit**
```bash
git add src/browser/ws_probe.rs tests/browser_ws_probe_test.rs
git commit -m "test: add sgBrowser websocket probe harness"
```
---
## Task 2: Add a standalone probe binary for live sgBrowser evidence collection
**Files:**
- Create: `src/bin/sgbrowser_ws_probe.rs`
- Modify: `src/browser/ws_probe.rs` (created in Task 1)
- Modify: `src/browser/mod.rs`
- Modify: `tests/browser_ws_probe_test.rs` (created in Task 1)
- [ ] **Step 1: Write the failing helper parser test**
In `tests/browser_ws_probe_test.rs`, add one focused test for a new helper function in `src/browser/ws_probe.rs`:
```rust
#[test]
fn parse_probe_args_accepts_ws_url_timeout_and_ordered_steps() {
    // parse a fixed argv-style slice into a ProbeCliConfig
}
```
Create and use this exact helper shape:
```rust
pub struct ProbeCliConfig {
    pub ws_url: String,
    pub timeout_ms: u64,
    pub steps: Vec<ProbeStep>,
}

pub fn parse_probe_args(args: &[String]) -> Result<ProbeCliConfig, ProbeError>
```
The test must assert that these exact arguments parse successfully and preserve step order:
```text
--ws-url ws://127.0.0.1:12345
--timeout-ms 1500
--step open-agent::["about:blank","sgOpenAgent"]
--step open-hot::["about:blank","sgBrowerserOpenPage","https://www.zhihu.com/hot"]
```
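One way the argument handling above could work is sketched below using only the standard library. The `label::payload` split on the first `::`, the `InvalidArgs` variant, and the choice to default `expect_reply` to `true` are assumptions made for illustration; the plan fixes only the function signature and the accepted flags.

```rust
#[derive(Debug, PartialEq)]
pub struct ProbeStep {
    pub label: String,
    pub payload: String,
    pub expect_reply: bool,
}

#[derive(Debug, PartialEq)]
pub struct ProbeCliConfig {
    pub ws_url: String,
    pub timeout_ms: u64,
    pub steps: Vec<ProbeStep>,
}

/// Hypothetical error variant; the plan does not define `ProbeError`.
#[derive(Debug, PartialEq)]
pub enum ProbeError {
    InvalidArgs(String),
}

pub fn parse_probe_args(args: &[String]) -> Result<ProbeCliConfig, ProbeError> {
    let mut ws_url = None;
    let mut timeout_ms = None;
    let mut steps = Vec::new();
    let mut iter = args.iter();
    // Every flag takes exactly one value, so consume arguments in pairs.
    while let Some(flag) = iter.next() {
        let value = iter
            .next()
            .ok_or_else(|| ProbeError::InvalidArgs(format!("missing value for {}", flag)))?;
        match flag.as_str() {
            "--ws-url" => ws_url = Some(value.clone()),
            "--timeout-ms" => {
                let ms = value
                    .parse::<u64>()
                    .map_err(|_| ProbeError::InvalidArgs(format!("bad timeout: {}", value)))?;
                timeout_ms = Some(ms);
            }
            "--step" => {
                // Split on the first "::" only; the payload stays byte-for-byte intact.
                let (label, payload) = value.split_once("::").ok_or_else(|| {
                    ProbeError::InvalidArgs(format!("step needs label::payload: {}", value))
                })?;
                steps.push(ProbeStep {
                    label: label.to_string(),
                    payload: payload.to_string(),
                    expect_reply: true, // assumed default
                });
            }
            other => return Err(ProbeError::InvalidArgs(format!("unknown flag: {}", other))),
        }
    }
    Ok(ProbeCliConfig {
        ws_url: ws_url.ok_or_else(|| ProbeError::InvalidArgs("--ws-url is required".to_string()))?,
        timeout_ms: timeout_ms
            .ok_or_else(|| ProbeError::InvalidArgs("--timeout-ms is required".to_string()))?,
        steps,
    })
}
```

A binary built on `std::env::args()` would need to skip the program name before passing the remaining arguments to a parser of this shape.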
- [ ] **Step 2: Run the parser test and verify it fails**
Run:
```bash
cargo test --test browser_ws_probe_test parse_probe_args_accepts_ws_url_timeout_and_ordered_steps -- --nocapture
```
Expected: FAIL because `parse_probe_args(...)` and `ProbeCliConfig` do not exist yet.
- [ ] **Step 3: Implement the helper and binary together**
In `src/browser/ws_probe.rs`, add `ProbeCliConfig` and `parse_probe_args(...)`.
In `src/browser/mod.rs`, add the module export:
```rust
pub mod ws_probe;
```
In `src/bin/sgbrowser_ws_probe.rs`, implement the binary using only `std::env::args()` plus `parse_probe_args(...)`.
Required behavior:
- accepts a websocket URL
- accepts a timeout in milliseconds
- accepts repeated ordered steps
- runs the probe harness
- prints a markdown-friendly transcript including:
  - step label
  - exact sent payload
  - exact received frames, if any
  - timeout/close outcome
Output shape can be simple, for example:
```text
STEP 1 bootstrap-open-agent
SEND: ["about:blank","sgOpenAgent"]
RECV: Welcome! You are client #1
OUTCOME: timeout
```
Rules:
- no production/browser-runtime integration
- no hidden fallback logic
- no “best effort” guessing of next steps
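The transcript shape described in this step could be rendered roughly as follows. This is a sketch, not the required output contract: the `received`, `closed`, and `connect-failed` labels and the `ERROR:` line are assumptions. Because the recommended `ProbeOutcome` carries either frames or a timeout, this sketch prints `OUTCOME: received` when frames arrived; representing "frames, then silence" in a single step would need an extra field, which is left open here.

```rust
pub enum ProbeOutcome {
    Received(Vec<String>),
    TimedOut,
    Closed,
    ConnectFailed(String),
}

pub struct ProbeStepResult {
    pub label: String,
    pub sent: String,
    pub outcome: ProbeOutcome,
}

/// Renders one block of STEP / SEND / RECV / OUTCOME lines per step.
/// Sent and received text is copied verbatim, with no interpretation.
pub fn render_transcript(results: &[ProbeStepResult]) -> String {
    let mut out = String::new();
    for (index, result) in results.iter().enumerate() {
        out.push_str(&format!("STEP {} {}\n", index + 1, result.label));
        out.push_str(&format!("SEND: {}\n", result.sent));
        let outcome = match &result.outcome {
            ProbeOutcome::Received(frames) => {
                for frame in frames {
                    out.push_str(&format!("RECV: {}\n", frame));
                }
                "received"
            }
            ProbeOutcome::TimedOut => "timeout",
            ProbeOutcome::Closed => "closed",
            ProbeOutcome::ConnectFailed(reason) => {
                out.push_str(&format!("ERROR: {}\n", reason));
                "connect-failed"
            }
        };
        out.push_str(&format!("OUTCOME: {}\n", outcome));
    }
    out
}
```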
- [ ] **Step 4: Re-run the parser/helper test**
Run:
```bash
cargo test --test browser_ws_probe_test parse_probe_args_accepts_ws_url_timeout_and_ordered_steps -- --nocapture
```
Expected: PASS.
- [ ] **Step 5: Build the probe binary**
Run:
```bash
cargo build --bin sgbrowser_ws_probe
```
Expected: PASS.
- [ ] **Step 6: Commit**
```bash
git add src/bin/sgbrowser_ws_probe.rs src/browser/ws_probe.rs src/browser/mod.rs tests/browser_ws_probe_test.rs
git commit -m "feat: add live sgBrowser websocket probe binary"
```
---
## Task 3: Run the real endpoint probe matrix and write the evidence transcript
**Files:**
- Create: `docs/_tmp_sgbrowser_ws_probe_transcript.md`
- Reuse only: `src/bin/sgbrowser_ws_probe.rs`, `docs/_tmp_sgbrowser_ws_api_doc.txt`
- [ ] **Step 1: Run the no-bootstrap baseline probe**
Run exactly:
```bash
cargo run --bin sgbrowser_ws_probe -- --ws-url "ws://127.0.0.1:12345" --timeout-ms 1500 --step "baseline-open::[\"about:blank\",\"sgBrowerserOpenPage\",\"https://www.zhihu.com/hot\"]"
```
Append the exact output under a `## baseline-open` heading in `docs/_tmp_sgbrowser_ws_probe_transcript.md`.
- [ ] **Step 2: Run the documented `sgOpenAgent` candidate**
Run exactly:
```bash
cargo run --bin sgbrowser_ws_probe -- --ws-url "ws://127.0.0.1:12345" --timeout-ms 1500 --step "open-agent::[\"about:blank\",\"sgOpenAgent\"]" --step "post-open-agent-open::[\"about:blank\",\"sgBrowerserOpenPage\",\"https://www.zhihu.com/hot\"]"
```
Append the exact output under a `## open-agent` heading.
- [ ] **Step 3: Run the documented `sgSetAuthInfo` candidate**
Run exactly:
```bash
cargo run --bin sgbrowser_ws_probe -- --ws-url "ws://127.0.0.1:12345" --timeout-ms 1500 --step "set-auth::[\"about:blank\",\"sgSetAuthInfo\",\"probe-user\",\"probe-token\"]" --step "post-set-auth-open::[\"about:blank\",\"sgBrowerserOpenPage\",\"https://www.zhihu.com/hot\"]"
```
Append the exact output under a `## set-auth` heading.
- [ ] **Step 4: Run the documented `sgBrowserLogin` candidate**
Run exactly:
```bash
cargo run --bin sgbrowser_ws_probe -- --ws-url "ws://127.0.0.1:12345" --timeout-ms 1500 --step "browser-login::{\"request\":\"use-json-helper\"}"
```
Before running, replace the placeholder payload `{"request":"use-json-helper"}` with the exact single-line JSON-array frame produced by the helper for:
```json
["about:blank","sgBrowserLogin",{"appName":"probe","userName":"probe","orgName":"probe","menus":[{"name":"probe","normalImg":"x","activeImg":"x","url":"https://www.zhihu.com/hot"}]}]
```
Then add a second `--step` to the same command with this payload:
```json
["about:blank","sgBrowerserOpenPage","https://www.zhihu.com/hot"]
```
Append the exact output under a `## browser-login` heading.
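Because this step assembles its `--step` values by hand, quoting mistakes are easy to make. The sketch below shows one way to build a step argument and wrap it for a POSIX shell. Both function names, and the `post-login-open` label used in the usage note, are hypothetical and not part of this plan.

```rust
/// Joins a label and a raw frame into the `label::payload` form used by `--step`.
pub fn step_arg(label: &str, frame: &str) -> String {
    format!("{}::{}", label, frame)
}

/// Wraps a value in single quotes for a POSIX shell. Embedded single quotes
/// are written as '\'' (close quote, escaped quote, reopen quote), so the
/// double quotes inside a JSON frame need no backslashes.
pub fn shell_single_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', "'\\''"))
}
```

For example, `shell_single_quote(&step_arg("post-login-open", frame))` yields a single shell word that can be pasted after `--step` without hand-escaping each double quote.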
- [ ] **Step 5: Run the documented `sgBrowerserActiveTab` candidate**
Run exactly:
```bash
cargo run --bin sgbrowser_ws_probe -- --ws-url "ws://127.0.0.1:12345" --timeout-ms 1500 --step "active-tab::[\"about:blank\",\"sgBrowerserActiveTab\",\"https://www.zhihu.com/hot\",\"probeCallback\"]" --step "post-active-tab-open::[\"about:blank\",\"sgBrowerserOpenPage\",\"https://www.zhihu.com/hot\"]"
```
Append the exact output under a `## active-tab` heading.
- [ ] **Step 6: Run one combined bootstrap candidate**
Run exactly:
```bash
cargo run --bin sgbrowser_ws_probe -- --ws-url "ws://127.0.0.1:12345" --timeout-ms 1500 --step "combined-open-agent::[\"about:blank\",\"sgOpenAgent\"]" --step "combined-active-tab::[\"about:blank\",\"sgBrowerserActiveTab\",\"https://www.zhihu.com/hot\",\"probeCallback\"]" --step "combined-open::[\"about:blank\",\"sgBrowerserOpenPage\",\"https://www.zhihu.com/hot\"]"
```
Append the exact output under a `## combined-bootstrap` heading.
- [ ] **Step 7: Run `requesturl` variants for the minimal action**
Run exactly these two additional commands:
```bash
cargo run --bin sgbrowser_ws_probe -- --ws-url "ws://127.0.0.1:12345" --timeout-ms 1500 --step "target-as-requesturl::[\"https://www.zhihu.com/hot\",\"sgBrowerserOpenPage\",\"https://www.zhihu.com/hot\"]"
```
```bash
cargo run --bin sgbrowser_ws_probe -- --ws-url "ws://127.0.0.1:12345" --timeout-ms 1500 --step "baidu-requesturl::[\"https://www.baidu.com\",\"sgBrowerserOpenPage\",\"https://www.zhihu.com/hot\"]"
```
Append the exact outputs under `## requesturl-variants`.
- [ ] **Step 8: Summarize the matrix in the transcript file**
At the end of `docs/_tmp_sgbrowser_ws_probe_transcript.md`, add this exact table template and fill it in:
```markdown
| Sequence | Sent frames | First reply | Final outcome | Decision signal |
| --- | --- | --- | --- | --- |
```
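If the table is filled in programmatically rather than by hand, a row could be produced along these lines. This is a small sketch assuming a hypothetical `matrix_row` helper; it escapes `|` and flattens newlines so that raw frames cannot break the table layout.

```rust
/// Renders one row of the five-column matrix table.
/// Cell order: sequence, sent frames, first reply, final outcome, decision signal.
pub fn matrix_row(cells: [&str; 5]) -> String {
    let escaped: Vec<String> = cells
        .iter()
        // A literal pipe would start a new column; a newline would end the row.
        .map(|cell| cell.replace('|', "\\|").replace('\n', " "))
        .collect();
    format!("| {} |", escaped.join(" | "))
}
```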
- [ ] **Step 9: Determine which architecture option wins**
Decision rule:
- if at least one sequence reproducibly yields real numeric status and/or callback frames for a real business action, Option A (bootstrap-validated raw websocket) wins
- otherwise, Option B (bridge-first) wins
Do not weaken this decision rule.
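The decision rule can be written down as a small pure function so that it is hard to reinterpret later. The sketch below is one possible reading: "reproducibly" is taken to mean that every run of a sequence produced the signal, across at least `min_runs` runs. That threshold, and all type and field names, are assumptions; judging whether a frame is a real status or callback frame is still done by reading the transcript.

```rust
#[derive(Debug, PartialEq)]
pub enum ArchitectureOption {
    /// Option A: bootstrap-validated raw websocket.
    BootstrapValidatedRawWebsocket,
    /// Option B: bridge-first.
    BridgeFirst,
}

/// Hypothetical per-sequence tally, filled in from the transcript file.
pub struct SequenceEvidence {
    pub name: String,
    /// How many times the sequence was executed.
    pub runs: u32,
    /// Runs that yielded status and/or callback frames for a real business action.
    pub runs_with_signal: u32,
}

/// Option A wins only if at least one sequence produced the signal in every
/// run, over at least `min_runs` runs (and never on zero runs).
pub fn decide(evidence: &[SequenceEvidence], min_runs: u32) -> ArchitectureOption {
    let proven = evidence
        .iter()
        .any(|seq| seq.runs >= min_runs.max(1) && seq.runs_with_signal == seq.runs);
    if proven {
        ArchitectureOption::BootstrapValidatedRawWebsocket
    } else {
        ArchitectureOption::BridgeFirst
    }
}
```

Under this reading, a matrix in which every sequence shows only the welcome banner followed by silence has `runs_with_signal == 0` everywhere and resolves to Option B.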
- [ ] **Step 10: Commit the evidence artifact**
```bash
git add docs/_tmp_sgbrowser_ws_probe_transcript.md
git commit -m "docs: capture sgBrowser websocket probe evidence"
```
---
## Task 4A: If Option A wins, write the narrow bootstrap implementation slice
**Files:**
- Create: `docs/superpowers/specs/2026-04-03-ws-browser-bootstrap-contract-design.md`
- Create: `docs/superpowers/plans/2026-04-03-ws-browser-bootstrap-contract-plan.md`
- Reuse as evidence input:
  - `docs/_tmp_sgbrowser_ws_probe_transcript.md`
  - `docs/_tmp_sgbrowser_ws_api_doc.txt`
  - `src/browser/ws_backend.rs`
  - `src/browser/ws_protocol.rs`
- [ ] **Step 1: Write one new design doc capturing the proven bootstrap contract**
Create:
```text
docs/superpowers/specs/2026-04-03-ws-browser-bootstrap-contract-design.md
```
Include:
- exact validated sequence
- exact required state (`requesturl`, active tab, agent page, auth payload)
- exact failure semantics
- why this is sufficient evidence to keep raw websocket as the product surface
- [ ] **Step 2: Write one new implementation plan for the bootstrap path**
Create:
```text
docs/superpowers/plans/2026-04-03-ws-browser-bootstrap-contract-plan.md
```
Plan only the minimal production changes required to embed the validated bootstrap sequence into the service/browser path.
- [ ] **Step 3: Commit the bootstrap decision docs**
```bash
git add docs/superpowers/specs/2026-04-03-ws-browser-bootstrap-contract-design.md docs/superpowers/plans/2026-04-03-ws-browser-bootstrap-contract-plan.md
git commit -m "docs: capture ws browser bootstrap contract"
```
- [ ] **Step 4: Stop after writing the bootstrap plan**
Do not begin production implementation in the same slice unless the user explicitly asks for execution.
---
## Task 4B: If Option B wins, write the bridge-first implementation slice
**Files:**
- Create: `docs/superpowers/specs/2026-04-03-ws-browser-bridge-path-design.md`
- Create: `docs/superpowers/plans/2026-04-03-ws-browser-bridge-path-plan.md`
- Reuse as evidence input:
  - `docs/_tmp_sgbrowser_ws_probe_transcript.md`
  - `frontend/archive/sgClaw验证-已归档/testRunner.js`
  - `docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md`
  - `docs/archive/项目管理与排期/协作时间表.md`
  - `docs/plans/2026-03-27-sgclaw-floating-chat-frontend-design.md`
- [ ] **Step 1: Write the bridge-path design doc**
Create `docs/superpowers/specs/2026-04-03-ws-browser-bridge-path-design.md`.
The design must specify:
- why raw websocket is considered non-validated for external control
- which bridge surface becomes authoritative
- where sgClaw should integrate (`FunctionsUI`, host bridge, `BrowserAction`, `CommandRouter`, or the nearest validated seam in this repo)
- how to preserve pipe behavior and existing abstractions where practical
- [ ] **Step 2: Write the bridge-path implementation plan**
Create `docs/superpowers/plans/2026-04-03-ws-browser-bridge-path-plan.md`.
The plan must:
- identify exact files to touch
- describe the narrowest adapter implementation
- keep TDD/task granularity as in this document
- avoid speculative work outside the bridge slice
- [ ] **Step 3: Commit the bridge decision docs**
```bash
git add docs/superpowers/specs/2026-04-03-ws-browser-bridge-path-design.md docs/superpowers/plans/2026-04-03-ws-browser-bridge-path-plan.md
git commit -m "docs: define bridge-first sgBrowser integration"
```
- [ ] **Step 4: Stop after writing the bridge plan**
Do not start the bridge implementation in the same slice unless the user explicitly asks for execution.
---
## Verification Checklist
### Deterministic probe harness tests
```bash
cargo test --test browser_ws_probe_test -- --nocapture
```
Expected: transcript capture, ordered scripts, timeout reporting, and close/reset reporting all pass.
### Probe binary build
```bash
cargo build --bin sgbrowser_ws_probe
```
Expected: PASS.
### Live evidence collection
- run the probe matrix against the real configured endpoint
- save exact transcripts to `docs/_tmp_sgbrowser_ws_probe_transcript.md`
- make the architecture decision using the documented rule
### Follow-up branch condition
- if Option A wins, repository contains a bootstrap-contract design + plan
- if Option B wins, repository contains a bridge-path design + plan
- no production runtime changes are made until that decision is written down
---
## Notes for Implementation
- The existing `WsBrowserBackend` fix that remembers the last navigated URL remains valid; do not revert it.
- The previous auth-replacement work also remains valid; it removed a real bug but did not prove the raw websocket architecture.
- Keep the probe tool brutally literal: exact sent frames, exact received frames, explicit timeout/close outcomes.
- Resist the temptation to make the probe “smart.” Smart probes hide evidence.
- If the real endpoint still replies only with the welcome banner and then silence across the matrix, treat that as a decision, not as an excuse for more guessing.