claw/docs/superpowers/specs/2026-04-03-ws-browser-bridge-path-design.md
木炎 bdf8e12246 feat: align browser callback runtime and export flows
Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:44:53 +08:00

# WS Browser Bridge Path Design
## Background
The repository now has explicit live evidence that the real sgBrowser websocket endpoint at `ws://127.0.0.1:12345` is **reachable** but is **not validated as an external-control surface**.
The probe transcript in `docs/_tmp_sgbrowser_ws_probe_transcript.md` shows a stable outcome across the full bootstrap matrix:
- direct open-page frame
- `sgOpenAgent`
- `sgSetAuthInfo`
- `sgBrowserLogin`
- `sgBrowerserActiveTab`
- combined bootstrap attempts
- alternate `requesturl` values
Across all of those sequences, the endpoint behaved like this:
1. websocket connection succeeds
2. first inbound text frame is always the banner `Welcome! You are client #1`
3. no sequence produced a reproducible numeric status frame for a real business action
4. no sequence produced a reproducible callback frame for a real business action
5. follow-on business frames timed out or produced no further usable protocol traffic
That means the current project can no longer treat raw external websocket business frames as the default production integration surface.
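The observed outcome can be expressed as a small frame classifier. This is a minimal sketch, not the repository's probe code: `FrameKind`, `ProbeSummary`, `classify_frame`, and `summarize` are hypothetical names, and it assumes a "numeric status frame" is a text frame whose whole payload parses as an integer. Callback frames are not detected separately because their shape is not documented here; they would land in `Other`.

```rust
/// How this sketch classifies one inbound text frame.
#[derive(Debug, PartialEq, Eq)]
enum FrameKind {
    /// The greeting seen in the transcript, e.g. "Welcome! You are client #1".
    Banner,
    /// Assumption: a status frame is a payload that parses as an integer.
    NumericStatus,
    /// Anything else, including a possible callback frame.
    Other,
}

fn classify_frame(frame: &str) -> FrameKind {
    let trimmed = frame.trim();
    if trimmed.starts_with("Welcome! You are client #") {
        FrameKind::Banner
    } else if trimmed.parse::<i64>().is_ok() {
        FrameKind::NumericStatus
    } else {
        FrameKind::Other
    }
}

/// Per-sequence counts of each frame kind.
#[derive(Debug, Default, PartialEq, Eq)]
struct ProbeSummary {
    banners: usize,
    numeric_status: usize,
    other: usize,
}

impl ProbeSummary {
    /// True when the endpoint greeted the client and then said nothing usable.
    fn banner_only(&self) -> bool {
        self.banners > 0 && self.numeric_status == 0 && self.other == 0
    }
}

fn summarize(frames: &[&str]) -> ProbeSummary {
    let mut summary = ProbeSummary::default();
    for &frame in frames {
        match classify_frame(frame) {
            FrameKind::Banner => summary.banners += 1,
            FrameKind::NumericStatus => summary.numeric_status += 1,
            FrameKind::Other => summary.other += 1,
        }
    }
    summary
}
```

Under these assumptions, every sequence in the bootstrap matrix would summarize as `banner_only()`, which is the condition that fails the acceptance bar.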
## Why the raw websocket path is now considered non-validated
The decision is not based on a guess. It is based on both live evidence and repository evidence.
### Live evidence
`docs/_tmp_sgbrowser_ws_probe_transcript.md` shows that the real endpoint did **not** yield the one thing raw external control needs:
- a reproducible status/callback response for a real browser action
Because that never happened, the bootstrap hypothesis did not clear the acceptance bar.
### Repository evidence
The rest of the repository already points to a different product integration model.
#### 1. Historical frontend code uses browser-host bridge surfaces
In `frontend/archive/sgClaw验证-已归档/testRunner.js:15-26`:
- the runtime checks for `window.sgFunctionsUI`
- the runtime checks for `window.BrowserAction`
- the working path uses `window.sgFunctionsUI(action, params, callback)`
That is a host/browser bridge contract, not an external raw websocket RPC contract.
#### 2. Prior architecture docs make `CommandRouter` the execution entry
In `docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md:16-18` and `:36-50`:
- reuse SuperRPA `CommandRouter` as the browser execution entry
- keep browser-side hosting, security re-check, and dispatch in SuperRPA
- avoid building parallel browser automation APIs
That is directly incompatible with treating raw external websocket business frames as the primary control plane.
#### 3. Project planning docs describe FunctionsUI IPC as the supported frontend seam
In `docs/archive/项目管理与排期/协作时间表.md:419-430`:
- Vue/FunctionsUI calls browser-host methods such as `window.superrpa.sgclaw.start()` and `sendCommand(...)`
- browser host pushes callbacks such as `onStatusChange(...)` and `onLog(...)`
Again, this is a bridge and host IPC model.
#### 4. Floating-chat planning already preserves named bridge calls
In `docs/plans/2026-03-27-sgclaw-floating-chat-frontend-design.md:289-293`:
- `connect()` issues `sgclawConnect`
- `start()` issues `sgclawStart`
- `stop()` issues `sgclawStop`
- `submitTask()` issues `sgclawSubmitTask`
That design work assumes a named browser bridge, not direct raw websocket frames.
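For illustration, the method-to-call mapping listed above can be captured as a closed enum. This is a sketch only: `LifecycleCall` and `bridge_name` are hypothetical names introduced here, and only the four call strings come from the planning doc.

```rust
/// Session/lifecycle operations named in the floating-chat planning doc.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LifecycleCall {
    Connect,
    Start,
    Stop,
    SubmitTask,
}

impl LifecycleCall {
    /// The named bridge call issued for each operation.
    fn bridge_name(self) -> &'static str {
        match self {
            LifecycleCall::Connect => "sgclawConnect",
            LifecycleCall::Start => "sgclawStart",
            LifecycleCall::Stop => "sgclawStop",
            LifecycleCall::SubmitTask => "sgclawSubmitTask",
        }
    }
}
```

A closed enum keeps the set of lifecycle calls explicit, so a caller cannot issue an arbitrary string as if it were a bridge call.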
## Decision
**Authoritative browser integration surface: the browser-host bridge path, not the raw external sgBrowser websocket business-frame path.**
More concretely, sgClaw should target this chain:
```text
sgClaw runtime
-> existing browser-facing bridge contract
-> FunctionsUI / host IPC
-> BrowserAction / sgclaw host callbacks
-> existing SuperRPA CommandRouter dispatch
```
## Authoritative seams for future implementation
Because this repository does not contain the full SuperRPA browser host source tree, the bridge-first implementation must integrate at the **nearest validated seam available in this repo**, while staying aligned with the external browser-host contract already documented.
The future implementation must model **two different bridge layers** explicitly instead of mixing them together.
### Layer 1: session/lifecycle bridge contract
This layer is evidenced by the named calls already present in repo documentation:
- `sgclawConnect`
- `sgclawStart`
- `sgclawStop`
- `sgclawSubmitTask`
This layer manages session setup, task submission, and host/UI lifecycle behavior.
It is important evidence that a browser-host bridge exists, but it is **not** the per-browser-action contract that a new `BrowserBackend` implementation should target.
### Layer 2: browser-action execution contract
This is the authoritative target for the new browser backend.
It is evidenced by:
- `window.BrowserAction(...)` in archived frontend code
- `FunctionsUI` / host IPC integration in archived planning docs
- browser-side dispatch through `CommandRouter` in `docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md`
In this repository, the concrete boundary must be a **repo-local semantic transport seam** that can be implemented and tested without access to the external SuperRPA host code.
That seam should be a narrow Rust-side contract such as `BridgeActionTransport`:
- input: semantic browser action request (`navigate`, `click`, `getText`, etc.) plus params and expected domain
- output: semantic success/error reply that can be normalized back into `BrowserBackend` results
`BridgeBrowserBackend` should target **Layer 2 only**.
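One possible shape for that seam is sketched below. Only the name `BridgeActionTransport` and the input/output description come from this document; the request and reply types, their field names, the `execute` signature, and the `EchoTransport` test double are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Semantic browser action request (hypothetical field layout).
#[derive(Debug, Clone, PartialEq, Eq)]
struct BridgeActionRequest {
    /// Action name such as "navigate", "click", or "getText".
    action: String,
    /// Action parameters, kept as plain strings in this sketch.
    params: BTreeMap<String, String>,
    /// Domain the action is expected to run against.
    expected_domain: String,
}

/// Semantic reply that a backend can normalize into its own result type.
#[derive(Debug, Clone, PartialEq, Eq)]
enum BridgeActionReply {
    Success { data: String },
    Error { message: String },
}

/// Repo-local transport seam; real implementations would talk to the host bridge.
trait BridgeActionTransport: Send + Sync {
    fn execute(&self, request: &BridgeActionRequest) -> BridgeActionReply;
}

/// Deterministic test double: echoes the action and domain back.
struct EchoTransport;

impl BridgeActionTransport for EchoTransport {
    fn execute(&self, request: &BridgeActionRequest) -> BridgeActionReply {
        if request.action.is_empty() {
            BridgeActionReply::Error {
                message: "empty action".to_string(),
            }
        } else {
            BridgeActionReply::Success {
                data: format!("{}@{}", request.action, request.expected_domain),
            }
        }
    }
}
```

Keeping the trait this narrow means it can be implemented and tested in this repository without any access to the external SuperRPA host code.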
### Explicit out-of-scope boundary
The following are outside this repository and therefore outside the immediate Rust implementation slice:
- actual SuperRPA C++ host/browser code
- actual `FunctionsUI` TypeScript host plumbing in the external browser repository
- actual `CommandRouter` implementation in the external browser repository
This repository should implement only:
- the Rust-side bridge contract types
- the Rust-side bridge transport/provider seam
- the Rust-side bridge-backed browser adapter
- deterministic tests against those seams
### What this means practically
The next implementation slice should **not** continue trying to make `WsBrowserBackend` drive the real browser endpoint directly.
Instead, the next implementation slice should introduce a **bridge-backed browser adapter** that:
- preserves the Rust-side `BrowserBackend` contract where practical
- translates browser actions onto the Layer-2 semantic bridge surface
- keeps lifecycle/session bridge calls separate from per-action browser execution
- leaves the raw websocket probe code as diagnostic infrastructure only
## Chosen architecture
Use a bridge-backed adapter design.
### Target shape
```text
compat/runtime/orchestration
-> Arc<dyn BrowserBackend>
-> BridgeBrowserBackend (new)
-> BridgeActionTransport (new repo-local seam)
-> external browser-host bridge / FunctionsUI IPC
-> BrowserAction / CommandRouter path
```
### Why this shape
- It preserves the already-useful Rust-side browser abstraction (`BrowserBackend`) instead of re-plumbing the entire runtime.
- It keeps raw websocket probing available for diagnostics without letting it dictate production architecture.
- It matches the architecture already documented for SuperRPA integration.
- It keeps future work narrow: one new adapter layer instead of rewriting all runtime behavior.
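The target shape can be sketched as follows. The `BrowserBackend` trait shown here is a hypothetical two-method stand-in, not the repository's real trait, and `BridgeActionTransport` is simplified to a single string parameter. `StaticTransport` and `build_backend` are illustrative names only.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the repository's `BrowserBackend` trait.
trait BrowserBackend: Send + Sync {
    fn navigate(&self, url: &str) -> Result<String, String>;
    fn get_text(&self, selector: &str) -> Result<String, String>;
}

/// Simplified transport seam: action name plus one string parameter.
trait BridgeActionTransport: Send + Sync {
    fn execute(&self, action: &str, param: &str) -> Result<String, String>;
}

/// Adapter that translates backend calls into semantic bridge actions.
struct BridgeBrowserBackend {
    transport: Arc<dyn BridgeActionTransport>,
}

impl BrowserBackend for BridgeBrowserBackend {
    fn navigate(&self, url: &str) -> Result<String, String> {
        self.transport.execute("navigate", url)
    }

    fn get_text(&self, selector: &str) -> Result<String, String> {
        self.transport.execute("getText", selector)
    }
}

/// In-memory transport used here in place of the external host bridge.
struct StaticTransport;

impl BridgeActionTransport for StaticTransport {
    fn execute(&self, action: &str, param: &str) -> Result<String, String> {
        match action {
            "navigate" => Ok(format!("navigated:{}", param)),
            "getText" => Ok(format!("text-of:{}", param)),
            other => Err(format!("unsupported action: {}", other)),
        }
    }
}

/// Callers keep holding `Arc<dyn BrowserBackend>`; only the wiring changes.
fn build_backend() -> Arc<dyn BrowserBackend> {
    Arc::new(BridgeBrowserBackend {
        transport: Arc::new(StaticTransport),
    })
}
```

Because orchestration code only sees `Arc<dyn BrowserBackend>`, swapping in the bridge-backed adapter would not require changes above the wiring point.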
## What stays the same
### Pipe path remains unchanged
The existing pipe path must remain behaviorally unchanged:
- `src/lib.rs`
- pipe handshake behavior
- `BrowserPipeTool`
- existing HMAC/domain validation semantics
The bridge-first work is about the **ws service / real browser integration path**, not about replacing or weakening the pipe path.
### Existing compat/runtime abstractions should be preserved where practical
The next slice should reuse:
- `BrowserBackend`
- existing browser tool adapters in compat/runtime
- existing task runner/orchestration flow
The new work should be concentrated in a bridge adapter and its wiring, not spread through unrelated layers.
## What does not stay the same
### Raw websocket is no longer the mainline production assumption
The repository may keep:
- `src/browser/ws_backend.rs`
- `src/browser/ws_protocol.rs`
- `src/browser/ws_probe.rs`
- `src/bin/sgbrowser_ws_probe.rs`
But those should now be treated as:
- protocol tooling
- fake-server test tooling
- live diagnostic/probe tooling
- possibly constrained compatibility code
In this repository they stay diagnostic-only and must not be treated as the production path for reaching the real browser.
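One way to make that rule hard to violate by accident is a guard at backend selection time. The sketch below is an assumption about how such a guard could look; `BrowserPath` and `select_production_path` are hypothetical names that do not exist in the repository.

```rust
/// Integration paths discussed in this document (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BrowserPath {
    Pipe,
    Bridge,
    /// Raw websocket tooling: probes, fake-server tests, diagnostics.
    RawWsDiagnostic,
}

impl BrowserPath {
    /// Only the pipe and bridge paths may back production runs.
    fn allowed_in_production(self) -> bool {
        !matches!(self, BrowserPath::RawWsDiagnostic)
    }
}

/// Rejects the raw websocket path when wiring a production backend.
fn select_production_path(requested: BrowserPath) -> Result<BrowserPath, String> {
    if requested.allowed_in_production() {
        Ok(requested)
    } else {
        Err(format!("{:?} is diagnostic-only", requested))
    }
}
```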
## Design constraints for the bridge slice
The bridge-path implementation must follow these constraints:
1. **No parallel browser API invention.** Reuse the real bridge/browser action surface already evidenced in docs and archived frontend code.
2. **No pipe regression.** Do not alter the working pipe entry path.
3. **Adapter-first design.** Prefer one bridge-backed backend implementation over broad runtime rewrites.
4. **TDD first.** Add focused bridge adapter tests before production wiring.
5. **Repository-local seam only.** Where external SuperRPA browser-host code is unavailable here, encode the contract in narrow adapters and tests instead of guessing internals.
## Testing implications
The bridge path changes what “proof” looks like.
### Required proof for the next slice
The next implementation slice must prove:
- a browser action can be emitted onto the bridge contract deterministically
- the bridge adapter maps replies/errors back into `BrowserBackend` semantics
- compat/runtime can use the bridge-backed backend without pipe regression
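The first two proofs can be covered with a recording test double, sketched below. All names here (`RecordingTransport`, `BackendError`, the free function `click`) are hypothetical, and the transport trait is simplified to one string parameter. The pipe-regression proof is not covered by this sketch.

```rust
use std::sync::Mutex;

/// Hypothetical backend-level error produced by the adapter.
#[derive(Debug, Clone, PartialEq, Eq)]
enum BackendError {
    Bridge(String),
}

/// Simplified transport seam: action name plus one string parameter.
trait BridgeActionTransport {
    fn execute(&self, action: &str, param: &str) -> Result<String, String>;
}

/// Test double that records every call and returns a canned reply.
struct RecordingTransport {
    calls: Mutex<Vec<(String, String)>>,
    reply: Result<String, String>,
}

impl RecordingTransport {
    fn new(reply: Result<String, String>) -> Self {
        RecordingTransport {
            calls: Mutex::new(Vec::new()),
            reply,
        }
    }

    /// Snapshot of the (action, param) pairs seen so far.
    fn calls(&self) -> Vec<(String, String)> {
        self.calls.lock().unwrap().clone()
    }
}

impl BridgeActionTransport for RecordingTransport {
    fn execute(&self, action: &str, param: &str) -> Result<String, String> {
        self.calls
            .lock()
            .unwrap()
            .push((action.to_string(), param.to_string()));
        self.reply.clone()
    }
}

/// Adapter-side action: emits "click" and maps bridge errors to backend errors.
fn click(transport: &dyn BridgeActionTransport, selector: &str) -> Result<(), BackendError> {
    transport
        .execute("click", selector)
        .map(|_| ())
        .map_err(BackendError::Bridge)
}
```

A test would then assert on both the recorded calls (deterministic emission) and the returned `Result` (reply and error mapping), with no live browser involved.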
### No longer required for acceptance
The next slice does **not** need to prove that raw websocket business frames work directly against `ws://127.0.0.1:12345`, because the current evidence does not support that path as the mainline assumption.
## Acceptance criteria for this design decision
This design decision holds only if future implementation meets all of these criteria:
1. The next production slice targets the browser-host bridge path rather than raw external websocket business frames.
2. The raw websocket probe tooling remains diagnostic only.
3. Existing pipe behavior stays unchanged.
4. The next implementation plan identifies a narrow bridge-backed adapter, not a broad architecture rewrite.
5. Future success claims are based on bridge-path execution evidence, not on reinterpreting the existing raw-websocket transcript.
## Consequences
### Positive
- Aligns implementation with the strongest evidence already in the repo
- Stops further speculative coding on the wrong control surface
- Preserves existing ws probe work as useful diagnostics
- Keeps the next slice narrow and testable
### Trade-off
- Requires an additional adapter design step before more production code can land
- Defers any hope that a small websocket tweak alone will unlock the real browser path
That trade-off is correct, because the current blocker is no longer a small protocol bug. It is an integration-surface mismatch.