feat: align browser callback runtime and export flows
Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -0,0 +1,276 @@
# WS Browser Bridge Path Design

## Background

The repository now has explicit live evidence that the real sgBrowser websocket endpoint at `ws://127.0.0.1:12345` is **reachable** but is **not validated as an external-control surface**.

The probe transcript in `docs/_tmp_sgbrowser_ws_probe_transcript.md` shows a stable outcome across the full bootstrap matrix:

- direct open-page frame
- `sgOpenAgent`
- `sgSetAuthInfo`
- `sgBrowserLogin`
- `sgBrowerserActiveTab`
- combined bootstrap attempts
- alternate `requesturl` values

Across all of those sequences, the endpoint behaved like this:

1. websocket connection succeeds
2. first inbound text frame is always the banner `Welcome! You are client #1`
3. no sequence produced a reproducible numeric status frame for a real business action
4. no sequence produced a reproducible callback frame for a real business action
5. follow-on business frames timed out or produced no further usable protocol traffic

That means the current project can no longer treat raw external websocket business frames as the default production integration surface.

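
The acceptance bar implied by the observations above can be sketched in Rust. This is only an illustration: the `Observation` labels, the function names, and the definition of "reproducible" (every recorded run saw a status or callback frame, with a minimum number of runs) are assumptions made for this sketch, not code or thresholds taken from the probe tooling.

```rust
/// What a single probe run observed, in order.
/// Hypothetical labels for illustration; not types from the repository.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Observation {
    Connected,
    Banner,
    StatusFrame,
    CallbackFrame,
    Timeout,
}

/// True when one run saw at least one status or callback frame.
pub fn run_has_business_response(run: &[Observation]) -> bool {
    run.iter()
        .any(|o| matches!(o, Observation::StatusFrame | Observation::CallbackFrame))
}

/// Assumed definition of "reproducible": at least one run exists, at least
/// `min_runs` runs were recorded, and every run saw a business response.
pub fn clears_acceptance_bar(runs: &[Vec<Observation>], min_runs: usize) -> bool {
    !runs.is_empty()
        && runs.len() >= min_runs
        && runs.iter().all(|run| run_has_business_response(run))
}
```

Under these assumptions, a set of runs that only ever reaches connect, banner, and timeout does not clear the bar, which matches the outcome recorded in the transcript.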
## Why the raw websocket path is now considered non-validated

The decision is not a guess. It rests on both live evidence and repository evidence.

### Live evidence

`docs/_tmp_sgbrowser_ws_probe_transcript.md` shows that the real endpoint did **not** yield the one thing raw external control needs:

- a reproducible status/callback response for a real browser action

Because that never happened, the bootstrap hypothesis did not clear the acceptance bar.

### Repository evidence

The rest of the repository already points to a different product integration model.

#### 1. Historical frontend code uses browser-host bridge surfaces

In `frontend/archive/sgClaw验证-已归档/testRunner.js:15-26`:

- the runtime checks for `window.sgFunctionsUI`
- the runtime checks for `window.BrowserAction`
- the working path uses `window.sgFunctionsUI(action, params, callback)`

That is a host/browser bridge contract, not an external raw websocket RPC contract.

#### 2. Prior architecture docs make `CommandRouter` the execution entry

In `docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md:16-18` and `:36-50`:

- reuse SuperRPA `CommandRouter` as the browser execution entry
- keep browser-side hosting, security re-check, and dispatch in SuperRPA
- avoid building parallel browser automation APIs

That is directly incompatible with treating raw external websocket business frames as the primary control plane.

#### 3. Project planning docs describe FunctionsUI IPC as the supported frontend seam

In `docs/archive/项目管理与排期/协作时间表.md:419-430`:

- Vue/FunctionsUI calls browser-host methods such as `window.superrpa.sgclaw.start()` and `sendCommand(...)`
- browser host pushes callbacks such as `onStatusChange(...)` and `onLog(...)`

Again, this is a bridge and host IPC model.

#### 4. Floating-chat planning already preserves named bridge calls

In `docs/plans/2026-03-27-sgclaw-floating-chat-frontend-design.md:289-293`:

- `connect()` issues `sgclawConnect`
- `start()` issues `sgclawStart`
- `stop()` issues `sgclawStop`
- `submitTask()` issues `sgclawSubmitTask`

That design work assumes a named browser bridge, not direct raw websocket frames.

## Decision

**Authoritative browser integration surface: the browser-host bridge path, not the raw external sgBrowser websocket business-frame path.**

More concretely, sgClaw should target this chain:

```text
sgClaw runtime
-> existing browser-facing bridge contract
-> FunctionsUI / host IPC
-> BrowserAction / sgclaw host callbacks
-> existing SuperRPA CommandRouter dispatch
```

## Authoritative seams for future implementation

Because this repository does not contain the full SuperRPA browser host source tree, the bridge-first implementation must integrate at the **nearest validated seam available in this repo**, while staying aligned with the external browser-host contract already documented.

The future implementation must model **two different bridge layers** explicitly instead of mixing them together.

### Layer 1: session/lifecycle bridge contract

This layer is evidenced by the named calls already present in repo documentation:

- `sgclawConnect`
- `sgclawStart`
- `sgclawStop`
- `sgclawSubmitTask`

This layer manages session setup, task submission, and host/UI lifecycle behavior.

It is important evidence that a browser-host bridge exists, but it is **not** the per-browser-action contract that a new `BrowserBackend` implementation should target.

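
One way to keep Layer 1 from leaking into per-action code is to give the lifecycle calls their own closed type. The sketch below is illustrative only: the `LifecycleCall` enum and `bridge_name` method are hypothetical names, while the four bridge call strings are the ones listed above.

```rust
/// Layer 1 lifecycle calls, kept as a closed set so they cannot be
/// confused with per-action browser requests.
/// Hypothetical type for illustration; not an existing repository type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleCall {
    Connect,
    Start,
    Stop,
    SubmitTask,
}

impl LifecycleCall {
    /// Every lifecycle call, in the order the design doc lists them.
    pub const ALL: [LifecycleCall; 4] = [
        LifecycleCall::Connect,
        LifecycleCall::Start,
        LifecycleCall::Stop,
        LifecycleCall::SubmitTask,
    ];

    /// The named bridge call issued for this lifecycle step.
    pub fn bridge_name(self) -> &'static str {
        match self {
            LifecycleCall::Connect => "sgclawConnect",
            LifecycleCall::Start => "sgclawStart",
            LifecycleCall::Stop => "sgclawStop",
            LifecycleCall::SubmitTask => "sgclawSubmitTask",
        }
    }
}
```

Because this type carries no action name, params, or expected domain, a backend that only accepts Layer 2 requests cannot be handed a lifecycle call by mistake.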
### Layer 2: browser-action execution contract

This is the authoritative target for the new browser backend.

It is evidenced by:

- `window.BrowserAction(...)` in archived frontend code
- `FunctionsUI` / host IPC integration in archived planning docs
- browser-side dispatch through `CommandRouter` in `docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md`

In this repository, the concrete boundary must be a **repo-local semantic transport seam** that can be implemented and tested without access to the external SuperRPA host code.

That seam should be a narrow Rust-side contract such as `BridgeActionTransport`:

- input: semantic browser action request (`navigate`, `click`, `getText`, etc.) plus params and expected domain
- output: semantic success/error reply that can be normalized back into `BrowserBackend` results

`BridgeBrowserBackend` should target **Layer 2 only**.

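
A minimal sketch of what such a seam could look like follows. Only the name `BridgeActionTransport` and the input/output description come from this design; the struct and enum names, the field layout, the string-valued params map, the `url` param key, and the `EchoTransport` stand-in are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Semantic browser action request: action name, params, expected domain.
/// Field layout is an assumption; params are simplified to string values.
#[derive(Debug, Clone, PartialEq)]
pub struct BridgeActionRequest {
    pub action: String,
    pub params: BTreeMap<String, String>,
    pub expected_domain: String,
}

/// Semantic reply that a backend can normalize into its own result type.
#[derive(Debug, Clone, PartialEq)]
pub enum BridgeActionReply {
    Success { data: String },
    Error { message: String },
}

/// The repo-local seam: one request in, one reply out.
pub trait BridgeActionTransport {
    fn execute(&self, request: &BridgeActionRequest) -> BridgeActionReply;
}

/// Hypothetical helper that builds a `navigate` request.
pub fn navigate_request(url: &str, expected_domain: &str) -> BridgeActionRequest {
    let mut params = BTreeMap::new();
    params.insert("url".to_string(), url.to_string());
    BridgeActionRequest {
        action: "navigate".to_string(),
        params,
        expected_domain: expected_domain.to_string(),
    }
}

/// Stand-in transport for tests: rejects requests with no expected domain,
/// otherwise reports success for the requested action.
pub struct EchoTransport;

impl BridgeActionTransport for EchoTransport {
    fn execute(&self, request: &BridgeActionRequest) -> BridgeActionReply {
        if request.expected_domain.is_empty() {
            return BridgeActionReply::Error {
                message: "missing expected domain".to_string(),
            };
        }
        BridgeActionReply::Success {
            data: format!("{} ok", request.action),
        }
    }
}
```

Keeping the trait to a single method means a real implementation and a test double differ only in what sits behind `execute`, which is what lets the seam be tested without the external host.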
### Explicit out-of-scope boundary

The following are outside this repository and therefore outside the immediate Rust implementation slice:

- actual SuperRPA C++ host/browser code
- actual `FunctionsUI` TypeScript host plumbing in the external browser repository
- actual `CommandRouter` implementation in the external browser repository

This repository should implement only:

- the Rust-side bridge contract types
- the Rust-side bridge transport/provider seam
- the Rust-side bridge-backed browser adapter
- deterministic tests against those seams

### What this means practically

The next implementation slice should **not** continue trying to make `WsBrowserBackend` drive the real browser endpoint directly.

Instead, the next implementation slice should introduce a **bridge-backed browser adapter** that:

- preserves the Rust-side `BrowserBackend` contract where practical
- translates browser actions onto the Layer-2 semantic bridge surface
- keeps lifecycle/session bridge calls separate from per-action browser execution
- leaves the raw websocket probe code as diagnostic infrastructure only

## Chosen architecture

Use a bridge-backed adapter design.

### Target shape

```text
compat/runtime/orchestration
-> Arc<dyn BrowserBackend>
-> BridgeBrowserBackend (new)
-> BridgeActionTransport (new repo-local seam)
-> external browser-host bridge / FunctionsUI IPC
-> BrowserAction / CommandRouter path
```

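
The Rust-side part of this chain can be sketched as an adapter that owns a transport and exposes itself as `Arc<dyn BrowserBackend>`. The real `BrowserBackend` trait is not shown in this document, so the `invoke` signature, `BackendError`, the request/reply types, and `StaticTransport` below are stand-ins invented for the sketch; only the names `BrowserBackend`, `BridgeBrowserBackend`, and `BridgeActionTransport` come from the design.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub struct ActionRequest {
    pub action: String,
    pub params: Vec<(String, String)>,
    pub expected_domain: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ActionReply {
    Success(String),
    Error(String),
}

pub trait BridgeActionTransport {
    fn execute(&self, request: &ActionRequest) -> ActionReply;
}

/// Stand-in error type; the repository's real error type may differ.
#[derive(Debug, PartialEq)]
pub enum BackendError {
    Bridge(String),
}

/// Stand-in for the existing trait; this signature is an assumption.
pub trait BrowserBackend {
    fn invoke(
        &self,
        action: &str,
        params: &[(&str, &str)],
        expected_domain: &str,
    ) -> Result<String, BackendError>;
}

/// Adapter: translates backend calls into Layer 2 requests and maps
/// replies back into backend results.
pub struct BridgeBrowserBackend<T: BridgeActionTransport> {
    transport: T,
}

impl<T: BridgeActionTransport> BridgeBrowserBackend<T> {
    pub fn new(transport: T) -> Self {
        Self { transport }
    }
}

impl<T: BridgeActionTransport> BrowserBackend for BridgeBrowserBackend<T> {
    fn invoke(
        &self,
        action: &str,
        params: &[(&str, &str)],
        expected_domain: &str,
    ) -> Result<String, BackendError> {
        let request = ActionRequest {
            action: action.to_string(),
            params: params
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
            expected_domain: expected_domain.to_string(),
        };
        match self.transport.execute(&request) {
            ActionReply::Success(data) => Ok(data),
            ActionReply::Error(message) => Err(BackendError::Bridge(message)),
        }
    }
}

/// Test double that always returns the same reply.
pub struct StaticTransport(pub ActionReply);

impl BridgeActionTransport for StaticTransport {
    fn execute(&self, _request: &ActionRequest) -> ActionReply {
        self.0.clone()
    }
}

/// Hands the adapter to callers behind the trait object they already use.
pub fn shared_backend<T: BridgeActionTransport + 'static>(
    transport: T,
) -> Arc<dyn BrowserBackend> {
    Arc::new(BridgeBrowserBackend::new(transport))
}
```

In this shape, compat/runtime code keeps holding a trait object and never sees the transport, so swapping the transport implementation does not touch orchestration code.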
### Why this shape

- It preserves the already-useful Rust-side browser abstraction (`BrowserBackend`) instead of re-plumbing the entire runtime.
- It keeps raw websocket probing available for diagnostics without letting it dictate production architecture.
- It matches the architecture already documented for SuperRPA integration.
- It keeps future work narrow: one new adapter layer instead of rewriting all runtime behavior.

## What stays the same

### Pipe path remains unchanged

The existing pipe path must remain behaviorally unchanged:

- `src/lib.rs`
- pipe handshake behavior
- `BrowserPipeTool`
- existing HMAC/domain validation semantics

The bridge-first work is about the **ws service / real browser integration path**, not about replacing or weakening the pipe path.

### Existing compat/runtime abstractions should be preserved where practical

The next slice should reuse:

- `BrowserBackend`
- existing browser tool adapters in compat/runtime
- existing task runner/orchestration flow

The new work should be concentrated in a bridge adapter and its wiring, not spread through unrelated layers.

## What does not stay the same

### Raw websocket is no longer the mainline production assumption

The repository may keep:

- `src/browser/ws_backend.rs`
- `src/browser/ws_protocol.rs`
- `src/browser/ws_probe.rs`
- `src/bin/sgbrowser_ws_probe.rs`

But those should now be treated as:

- protocol tooling
- fake-server test tooling
- live diagnostic/probe tooling
- possibly constrained compatibility code

They should remain diagnostic-only in this repository and must not be treated as the production path for reaching the real browser.

## Design constraints for the bridge slice

The bridge-path implementation must follow these constraints:

1. **No parallel browser API invention.** Reuse the real bridge/browser action surface already evidenced in docs and archived frontend code.
2. **No pipe regression.** Do not alter the working pipe entry path.
3. **Adapter-first design.** Prefer one bridge-backed backend implementation over broad runtime rewrites.
4. **TDD first.** Add focused bridge adapter tests before production wiring.
5. **Repository-local seam only.** Where external SuperRPA browser-host code is unavailable here, encode the contract in narrow adapters and tests instead of guessing internals.

## Testing implications

The bridge path changes what “proof” looks like.

### Required proof for the next slice

The next implementation slice must prove:

- a browser action can be emitted onto the bridge contract deterministically
- the bridge adapter maps replies/errors back into `BrowserBackend` semantics
- compat/runtime can use the bridge-backed backend without pipe regression

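
One way to make the first two proofs deterministic is a scripted transport that records every request it receives and replays canned replies in order. The sketch below is a hedged illustration: `ScriptedTransport`, the simplified request/reply types, and the fallback error string are hypothetical, and the real test doubles in the repository may be structured differently.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
pub struct ActionRequest {
    pub action: String,
    pub expected_domain: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ActionReply {
    Success(String),
    Error(String),
}

pub trait BridgeActionTransport {
    fn execute(&self, request: &ActionRequest) -> ActionReply;
}

/// Test double: records requests and replays scripted replies in order.
pub struct ScriptedTransport {
    replies: RefCell<VecDeque<ActionReply>>,
    seen: RefCell<Vec<ActionRequest>>,
}

impl ScriptedTransport {
    pub fn new(replies: Vec<ActionReply>) -> Self {
        Self {
            replies: RefCell::new(replies.into_iter().collect()),
            seen: RefCell::new(Vec::new()),
        }
    }

    /// Snapshot of every request emitted so far, in emission order.
    pub fn seen(&self) -> Vec<ActionRequest> {
        self.seen.borrow().clone()
    }
}

impl BridgeActionTransport for ScriptedTransport {
    fn execute(&self, request: &ActionRequest) -> ActionReply {
        self.seen.borrow_mut().push(request.clone());
        // Running out of scripted replies is reported as an error reply
        // rather than a panic, so tests can assert on it.
        self.replies
            .borrow_mut()
            .pop_front()
            .unwrap_or_else(|| ActionReply::Error("no scripted reply left".to_string()))
    }
}
```

A test can then assert both directions: the recorded requests show exactly what was emitted onto the bridge contract, and the scripted replies show how success and error outcomes come back.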
### No longer required for acceptance

The next slice does **not** need to prove that raw websocket business frames work directly against `ws://127.0.0.1:12345`, because the current evidence rejected that path as the mainline assumption.

## Acceptance criteria for this design decision

This design is correct only if future implementation follows all of these:

1. The next production slice targets the browser-host bridge path rather than raw external websocket business frames.
2. The raw websocket probe tooling remains diagnostic only.
3. Existing pipe behavior stays unchanged.
4. The next implementation plan identifies a narrow bridge-backed adapter, not a broad architecture rewrite.
5. Future success claims are based on bridge-path execution evidence, not on reinterpreting the existing raw-websocket transcript.

## Consequences

### Positive

- Aligns implementation with the strongest evidence already in the repo
- Stops further speculative coding on the wrong control surface
- Preserves existing ws probe work as useful diagnostics
- Keeps the next slice narrow and testable

### Trade-off

- Requires an additional adapter design step before more production code can land
- Defers any hope that a small websocket tweak alone will unlock the real browser path

That trade-off is correct, because the current blocker is no longer a small protocol bug. It is an integration-surface mismatch.