# WS Browser Bridge Path Design

## Background

The repository now has explicit live evidence that the real sgBrowser websocket endpoint at `ws://127.0.0.1:12345` is **reachable** but is **not validated as an external-control surface**.

The probe transcript in `docs/_tmp_sgbrowser_ws_probe_transcript.md` shows a stable outcome across the full bootstrap matrix:

- direct open-page frame
- `sgOpenAgent`
- `sgSetAuthInfo`
- `sgBrowserLogin`
- `sgBrowerserActiveTab`
- combined bootstrap attempts
- alternate `requesturl` values

Across all of those sequences, the endpoint behaved like this:

1. websocket connection succeeds
2. first inbound text frame is always the banner `Welcome! You are client #1`
3. no sequence produced a reproducible numeric status frame for a real business action
4. no sequence produced a reproducible callback frame for a real business action
5. follow-on business frames timed out or produced no further usable protocol traffic

That means the current project can no longer treat raw external websocket business frames as the default production integration surface.

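The acceptance bar applied to each probe sequence can be expressed as a small classifier. This is a minimal sketch, not code from this repository: `ProbeOutcome`, `classify_probe`, and `BANNER_PREFIX` are hypothetical names. It assumes a numeric status frame would arrive as bare integer text, which the transcript never confirmed, and it generalizes the banner to any client number, which is also an assumption.

```rust
/// Outcome of one probe sequence (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
enum ProbeOutcome {
    /// Nothing arrived besides the connection banner (or nothing at all).
    BannerOnly,
    /// At least one frame parsed as an integer status (assumed wire format).
    StatusObserved,
    /// Non-banner traffic arrived, but none of it parsed as a status.
    UnrecognizedTraffic,
}

/// Assumption: the banner always starts with this text; only `#1` was observed.
const BANNER_PREFIX: &str = "Welcome! You are client #";

/// Classify the inbound text frames collected during one probe sequence.
fn classify_probe(frames: &[&str]) -> ProbeOutcome {
    let mut saw_other_traffic = false;
    for frame in frames {
        let text = frame.trim();
        // The banner and empty frames carry no business information.
        if text.is_empty() || text.starts_with(BANNER_PREFIX) {
            continue;
        }
        if text.parse::<i64>().is_ok() {
            return ProbeOutcome::StatusObserved;
        }
        saw_other_traffic = true;
    }
    if saw_other_traffic {
        ProbeOutcome::UnrecognizedTraffic
    } else {
        ProbeOutcome::BannerOnly
    }
}
```

Under this sketch, a sequence clears the bar only if it reaches `StatusObserved` reproducibly; a single run is not sufficient.
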
## Why the raw websocket path is now considered non-validated

The decision is not based on a guess. It is based on both live evidence and repository evidence.

### Live evidence

`docs/_tmp_sgbrowser_ws_probe_transcript.md` proves that the real endpoint did **not** yield the one thing raw external control needs:

- a reproducible status/callback response for a real browser action

Because that never happened, the bootstrap hypothesis did not clear the acceptance bar.

### Repository evidence

The rest of the repository already points to a different product integration model.

#### 1. Historical frontend code uses browser-host bridge surfaces

In `frontend/archive/sgClaw验证-已归档/testRunner.js:15-26`:

- the runtime checks for `window.sgFunctionsUI`
- the runtime checks for `window.BrowserAction`
- the working path uses `window.sgFunctionsUI(action, params, callback)`

That is a host/browser bridge contract, not an external raw websocket RPC contract.

#### 2. Prior architecture docs make `CommandRouter` the execution entry

In `docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md:16-18` and `:36-50`:

- reuse SuperRPA `CommandRouter` as the browser execution entry
- keep browser-side hosting, security re-check, and dispatch in SuperRPA
- avoid building parallel browser automation APIs

That is directly incompatible with treating raw external websocket business frames as the primary control plane.

#### 3. Project planning docs describe FunctionsUI IPC as the supported frontend seam

In `docs/archive/项目管理与排期/协作时间表.md:419-430`:

- Vue/FunctionsUI calls browser-host methods such as `window.superrpa.sgclaw.start()` and `sendCommand(...)`
- browser host pushes callbacks such as `onStatusChange(...)` and `onLog(...)`

Again, this is a bridge and host IPC model.

#### 4. Floating-chat planning already preserves named bridge calls

In `docs/plans/2026-03-27-sgclaw-floating-chat-frontend-design.md:289-293`:

- `connect()` issues `sgclawConnect`
- `start()` issues `sgclawStart`
- `stop()` issues `sgclawStop`
- `submitTask()` issues `sgclawSubmitTask`

That design work assumes a named browser bridge, not direct raw websocket frames.

## Decision

**Authoritative browser integration surface: the browser-host bridge path, not the raw external sgBrowser websocket business-frame path.**

More concretely, sgClaw should target this chain:

```text
sgClaw runtime
  -> existing browser-facing bridge contract
  -> FunctionsUI / host IPC
  -> BrowserAction / sgclaw host callbacks
  -> existing SuperRPA CommandRouter dispatch
```

## Authoritative seams for future implementation

Because this repository does not contain the full SuperRPA browser host source tree, the bridge-first implementation must integrate at the **nearest validated seam available in this repo**, while staying aligned with the external browser-host contract already documented.

The future implementation must model **two different bridge layers** explicitly instead of mixing them together.

### Layer 1: session/lifecycle bridge contract

This layer is evidenced by the named calls already present in repo documentation:

- `sgclawConnect`
- `sgclawStart`
- `sgclawStop`
- `sgclawSubmitTask`

This layer manages session setup, task submission, and host/UI lifecycle behavior.

It is important evidence that a browser-host bridge exists, but it is **not** the per-browser-action contract that a new `BrowserBackend` implementation should target.

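One way to keep the two layers from being mixed is to give each its own type, so a lifecycle call cannot be passed where a browser action is expected. The sketch below is illustrative only: the enum and method names are hypothetical, and only the four `sgclaw*` call names and the action names `navigate`, `click`, and `getText` come from this document.

```rust
/// Layer 1: session/lifecycle calls (variant names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LifecycleCall {
    Connect,
    Start,
    Stop,
    SubmitTask,
}

impl LifecycleCall {
    /// Named bridge call as listed in the planning docs.
    fn bridge_name(self) -> &'static str {
        match self {
            LifecycleCall::Connect => "sgclawConnect",
            LifecycleCall::Start => "sgclawStart",
            LifecycleCall::Stop => "sgclawStop",
            LifecycleCall::SubmitTask => "sgclawSubmitTask",
        }
    }
}

/// Layer 2: per-action browser execution (a deliberately separate type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BrowserActionKind {
    Navigate,
    Click,
    GetText,
}

impl BrowserActionKind {
    /// Semantic action name sent over the Layer-2 seam.
    fn action_name(self) -> &'static str {
        match self {
            BrowserActionKind::Navigate => "navigate",
            BrowserActionKind::Click => "click",
            BrowserActionKind::GetText => "getText",
        }
    }
}
```
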
### Layer 2: browser-action execution contract

This is the authoritative target for the new browser backend.

It is evidenced by:

- `window.BrowserAction(...)` in archived frontend code
- `FunctionsUI` / host IPC integration in archived planning docs
- browser-side dispatch through `CommandRouter` in `docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md`

In this repository, the concrete boundary must be a **repo-local semantic transport seam** that can be implemented and tested without access to the external SuperRPA host code.

That seam should be a narrow Rust-side contract such as `BridgeActionTransport`:

- input: semantic browser action request (`navigate`, `click`, `getText`, etc.) plus params and expected domain
- output: semantic success/error reply that can be normalized back into `BrowserBackend` results

`BridgeBrowserBackend` should target **Layer 2 only**.

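The input/output contract described above could take roughly the following shape. This is a sketch under stated assumptions, not the final contract: the struct fields, the string-map representation of params, the `execute` method name, and the `SingleDomainTransport` stand-in are all hypothetical. Only the name `BridgeActionTransport` and the idea of "action plus params plus expected domain in, success/error out" come from this document.

```rust
use std::collections::BTreeMap;

/// Semantic browser action request (field names are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
struct BridgeActionRequest {
    action: String,
    params: BTreeMap<String, String>,
    expected_domain: String,
}

/// Semantic reply that an adapter can normalize into backend results.
#[derive(Debug, Clone, PartialEq, Eq)]
enum BridgeActionReply {
    Success { data: String },
    Error { message: String },
}

/// Repo-local seam: everything behind this trait is external host code.
trait BridgeActionTransport: Send + Sync {
    fn execute(&self, request: &BridgeActionRequest) -> BridgeActionReply;
}

/// Hypothetical stand-in transport that accepts exactly one domain.
struct SingleDomainTransport {
    allowed_domain: String,
}

impl BridgeActionTransport for SingleDomainTransport {
    fn execute(&self, request: &BridgeActionRequest) -> BridgeActionReply {
        if request.expected_domain != self.allowed_domain {
            return BridgeActionReply::Error {
                message: format!("domain {} is not allowed", request.expected_domain),
            };
        }
        BridgeActionReply::Success {
            data: format!("{} ok", request.action),
        }
    }
}

/// Build a `navigate` request with a single `url` param.
fn navigate_request(url: &str, expected_domain: &str) -> BridgeActionRequest {
    let mut params = BTreeMap::new();
    params.insert("url".to_string(), url.to_string());
    BridgeActionRequest {
        action: "navigate".to_string(),
        params,
        expected_domain: expected_domain.to_string(),
    }
}
```

Keeping the trait to a single method keeps the seam narrow: a test double only has to implement `execute`.
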
### Explicit out-of-scope boundary

The following are outside this repository and therefore outside the immediate Rust implementation slice:

- actual SuperRPA C++ host/browser code
- actual `FunctionsUI` TypeScript host plumbing in the external browser repository
- actual `CommandRouter` implementation in the external browser repository

This repository should implement only:

- the Rust-side bridge contract types
- the Rust-side bridge transport/provider seam
- the Rust-side bridge-backed browser adapter
- deterministic tests against those seams

### What this means practically

The next implementation slice should **not** continue trying to make `WsBrowserBackend` drive the real browser endpoint directly.

Instead, the next implementation slice should introduce a **bridge-backed browser adapter** that:

- preserves the Rust-side `BrowserBackend` contract where practical
- translates browser actions onto the Layer-2 semantic bridge surface
- keeps lifecycle/session bridge calls separate from per-action browser execution
- leaves the raw websocket probe code as diagnostic infrastructure only

## Chosen architecture

Use a bridge-backed adapter design.

### Target shape

```text
compat/runtime/orchestration
  -> Arc<dyn BrowserBackend>
  -> BridgeBrowserBackend (new)
  -> BridgeActionTransport (new repo-local seam)
  -> external browser-host bridge / FunctionsUI IPC
  -> BrowserAction / CommandRouter path
```

### Why this shape

- It preserves the already-useful Rust-side browser abstraction (`BrowserBackend`) instead of re-plumbing the entire runtime.
- It keeps raw websocket probing available for diagnostics without letting it dictate production architecture.
- It matches the architecture already documented for SuperRPA integration.
- It keeps future work narrow: one new adapter layer instead of rewriting all runtime behavior.

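The adapter layering can be sketched as follows. The `BrowserBackend` trait shown here is a reduced, hypothetical stand-in: the real trait's methods and signatures in this repository are not reproduced. `BridgeReply`, `CannedTransport`, `invoke`, and `build_backend` are likewise illustrative names, assumed only to show how runtime code could keep holding `Arc<dyn BrowserBackend>` while the adapter delegates to the transport seam.

```rust
use std::sync::Arc;

/// Reduced stand-in for the repository's `BrowserBackend` (assumed signature).
trait BrowserBackend: Send + Sync {
    fn invoke(&self, action: &str, target: &str, expected_domain: &str)
        -> Result<String, String>;
}

/// Semantic reply coming back over the bridge (hypothetical shape).
enum BridgeReply {
    Success(String),
    Error(String),
}

/// Repo-local seam; the external host sits behind it.
trait BridgeActionTransport: Send + Sync {
    fn execute(&self, action: &str, target: &str, expected_domain: &str) -> BridgeReply;
}

/// New adapter: implements the backend contract by delegating to the seam.
struct BridgeBrowserBackend {
    transport: Arc<dyn BridgeActionTransport>,
}

impl BrowserBackend for BridgeBrowserBackend {
    fn invoke(&self, action: &str, target: &str, expected_domain: &str)
        -> Result<String, String>
    {
        // Normalize the bridge reply into the backend's Result semantics.
        match self.transport.execute(action, target, expected_domain) {
            BridgeReply::Success(data) => Ok(data),
            BridgeReply::Error(message) => {
                Err(format!("bridge action `{}` failed: {}", action, message))
            }
        }
    }
}

/// Placeholder transport that only understands `navigate`.
struct CannedTransport;

impl BridgeActionTransport for CannedTransport {
    fn execute(&self, action: &str, target: &str, _expected_domain: &str) -> BridgeReply {
        if action == "navigate" {
            BridgeReply::Success(format!("navigated to {}", target))
        } else {
            BridgeReply::Error("unsupported action".to_string())
        }
    }
}

/// Runtime code keeps depending on the trait object only.
fn build_backend() -> Arc<dyn BrowserBackend> {
    Arc::new(BridgeBrowserBackend {
        transport: Arc::new(CannedTransport),
    })
}
```

Because callers see only `Arc<dyn BrowserBackend>`, swapping in the bridge-backed adapter in this sketch requires no change above the backend boundary.
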
## What stays the same

### Pipe path remains unchanged

The existing pipe path must remain behaviorally unchanged:

- `src/lib.rs`
- pipe handshake behavior
- `BrowserPipeTool`
- existing HMAC/domain validation semantics

The bridge-first work is about the **ws service / real browser integration path**, not about replacing or weakening the pipe path.

### Existing compat/runtime abstractions should be preserved where practical

The next slice should reuse:

- `BrowserBackend`
- existing browser tool adapters in compat/runtime
- existing task runner/orchestration flow

The new work should be concentrated in a bridge adapter and its wiring, not spread through unrelated layers.

## What does not stay the same

### Raw websocket is no longer the mainline production assumption

The repository may keep:

- `src/browser/ws_backend.rs`
- `src/browser/ws_protocol.rs`
- `src/browser/ws_probe.rs`
- `src/bin/sgbrowser_ws_probe.rs`

But those should now be treated as:

- protocol tooling
- fake-server test tooling
- live diagnostic/probe tooling
- possibly constrained compatibility code

They should remain diagnostic-only in this repository and must not be treated as the production path for reaching the real browser.

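If the diagnostic-only rule is to be enforced in code rather than by convention, a small guard at backend selection time is one option. This is a hypothetical sketch: `BackendKind`, `RunMode`, and `check_backend_allowed` do not exist in the repository, and whether the runtime has any notion of a run mode is an assumption.

```rust
/// Which browser backend a caller is asking for (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendKind {
    Pipe,
    Bridge,
    WsProbe,
}

/// Whether the process is serving real tasks or running diagnostics.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RunMode {
    Production,
    Diagnostic,
}

/// Reject the raw websocket backend outside diagnostic runs.
fn check_backend_allowed(kind: BackendKind, mode: RunMode) -> Result<(), String> {
    match (kind, mode) {
        (BackendKind::WsProbe, RunMode::Production) => {
            Err("raw websocket backend is diagnostic-only".to_string())
        }
        // Pipe and bridge are allowed everywhere; ws is allowed in diagnostics.
        _ => Ok(()),
    }
}
```
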
## Design constraints for the bridge slice

The bridge-path implementation must follow these constraints:

1. **No parallel browser API invention.** Reuse the real bridge/browser action surface already evidenced in docs and archived frontend code.
2. **No pipe regression.** Do not alter the working pipe entry path.
3. **Adapter-first design.** Prefer one bridge-backed backend implementation over broad runtime rewrites.
4. **TDD first.** Add focused bridge adapter tests before production wiring.
5. **Repository-local seam only.** Where external SuperRPA browser-host code is unavailable here, encode the contract in narrow adapters and tests instead of guessing internals.

## Testing implications

The bridge path changes what “proof” looks like.

### Required proof for the next slice

The next implementation slice must prove:

- a browser action can be emitted onto the bridge contract deterministically
- the bridge adapter maps replies/errors back into `BrowserBackend` semantics
- compat/runtime can use the bridge-backed backend without pipe regression

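The first two proofs lend themselves to a scripted test double behind the transport seam. The sketch below assumes a simplified transport trait and a hypothetical `get_text` helper standing in for the adapter. `ScriptedTransport`, `ActionRequest`, and the error-message format are illustrative and not taken from the repository.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Simplified request shape (assumption) so emissions can be compared exactly.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ActionRequest {
    action: String,
    target: String,
}

/// Simplified stand-in for the repo-local transport seam.
trait BridgeActionTransport {
    fn execute(&self, request: ActionRequest) -> Result<String, String>;
}

/// Test double: records every request and answers from a scripted queue.
struct ScriptedTransport {
    sent: Mutex<Vec<ActionRequest>>,
    replies: Mutex<VecDeque<Result<String, String>>>,
}

impl ScriptedTransport {
    fn new(replies: Vec<Result<String, String>>) -> Self {
        Self {
            sent: Mutex::new(Vec::new()),
            replies: Mutex::new(replies.into()),
        }
    }

    /// Snapshot of everything emitted so far, in order.
    fn sent(&self) -> Vec<ActionRequest> {
        self.sent.lock().unwrap().clone()
    }
}

impl BridgeActionTransport for ScriptedTransport {
    fn execute(&self, request: ActionRequest) -> Result<String, String> {
        self.sent.lock().unwrap().push(request);
        self.replies
            .lock()
            .unwrap()
            .pop_front()
            .unwrap_or_else(|| Err("no scripted reply left".to_string()))
    }
}

/// Code under test: emits a `getText` action and normalizes the error.
fn get_text(transport: &dyn BridgeActionTransport, selector: &str) -> Result<String, String> {
    transport
        .execute(ActionRequest {
            action: "getText".to_string(),
            target: selector.to_string(),
        })
        .map_err(|e| format!("getText failed: {}", e))
}
```

A test then scripts one success and one error, calls `get_text` twice, and asserts both the returned values and the exact recorded requests. No live browser or websocket endpoint is involved, which keeps the proof deterministic.
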
### No longer required for acceptance

The next slice does **not** need to prove that raw websocket business frames work directly against `ws://127.0.0.1:12345`, because the current evidence rejected that path as the mainline assumption.

## Acceptance criteria for this design decision

This design is correct only if future implementation follows all of these:

1. The next production slice targets the browser-host bridge path rather than raw external websocket business frames.
2. The raw websocket probe tooling remains diagnostic only.
3. Existing pipe behavior stays unchanged.
4. The next implementation plan identifies a narrow bridge-backed adapter, not a broad architecture rewrite.
5. Future success claims are based on bridge-path execution evidence, not on reinterpreting the existing raw-websocket transcript.

## Consequences

### Positive

- Aligns implementation with the strongest evidence already in the repo
- Stops further speculative coding on the wrong control surface
- Preserves existing ws probe work as useful diagnostics
- Keeps the next slice narrow and testable

### Trade-off

- Requires an additional adapter design step before more production code can land
- Defers any hope that a small websocket tweak alone will unlock the real browser path

That trade-off is correct, because the current blocker is no longer a small protocol bug. It is an integration-surface mismatch.