claw/docs/superpowers/specs/2026-04-03-ws-browser-bridge-path-design.md
木炎 bdf8e12246 feat: align browser callback runtime and export flows
Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:44:53 +08:00

WS Browser Bridge Path Design

Background

The repository now has explicit live evidence that the real sgBrowser websocket endpoint at ws://127.0.0.1:12345 is reachable but has not been validated as an external-control surface.

The probe transcript in docs/_tmp_sgbrowser_ws_probe_transcript.md shows a stable outcome across the full bootstrap matrix:

  • direct open-page frame
  • sgOpenAgent
  • sgSetAuthInfo
  • sgBrowserLogin
  • sgBrowerserActiveTab
  • combined bootstrap attempts
  • alternate requesturl values

Across all of those sequences, the endpoint behaved like this:

  1. websocket connection succeeds
  2. first inbound text frame is always the banner Welcome! You are client #1
  3. no sequence produced a reproducible numeric status frame for a real business action
  4. no sequence produced a reproducible callback frame for a real business action
  5. follow-on business frames timed out or produced no further usable protocol traffic

That means the current project can no longer treat raw external websocket business frames as the default production integration surface.
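
The acceptance bar implied by the observations above can be sketched as a small frame classifier. This is a minimal illustration, not the code in src/browser/ws_probe.rs: the names ProbeFrame, classify_frame, and sequence_validated are hypothetical. Because the callback frame format is not stated in this document, the sketch recognizes only the banner and a bare numeric status frame.

```rust
/// Hypothetical classification of inbound text frames seen during a probe.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeFrame {
    /// The connection banner, e.g. "Welcome! You are client #1".
    Banner,
    /// A frame whose whole payload parses as an integer status code.
    NumericStatus(i64),
    /// Anything else; kept verbatim for the transcript.
    Other(String),
}

/// Classify one inbound text frame (assumed shapes, see lead-in).
pub fn classify_frame(text: &str) -> ProbeFrame {
    let trimmed = text.trim();
    if trimmed.starts_with("Welcome! You are client #") {
        ProbeFrame::Banner
    } else if let Ok(code) = trimmed.parse::<i64>() {
        ProbeFrame::NumericStatus(code)
    } else {
        ProbeFrame::Other(trimmed.to_string())
    }
}

/// A probe sequence counts as validated only if at least one numeric
/// status frame arrived; the banner alone never clears the bar.
pub fn sequence_validated(frames: &[&str]) -> bool {
    frames
        .iter()
        .any(|f| matches!(classify_frame(f), ProbeFrame::NumericStatus(_)))
}
```

Under this sketch, a transcript that contains only the banner is rejected, which matches the outcome recorded for every sequence in the bootstrap matrix.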

Why the raw websocket path is now considered non-validated

This decision is not a guess. It rests on both live evidence and repository evidence.

Live evidence

docs/_tmp_sgbrowser_ws_probe_transcript.md shows that the real endpoint never yielded the one thing raw external control needs:

  • a reproducible status/callback response for a real browser action

Because that never happened, the bootstrap hypothesis did not clear the acceptance bar.

Repository evidence

The rest of the repository already points to a different product integration model.

1. Historical frontend code uses browser-host bridge surfaces

In frontend/archive/sgClaw验证-已归档/testRunner.js:15-26:

  • the runtime checks for window.sgFunctionsUI
  • the runtime checks for window.BrowserAction
  • the working path uses window.sgFunctionsUI(action, params, callback)

That is a host/browser bridge contract, not an external raw websocket RPC contract.

2. Prior architecture docs make CommandRouter the execution entry

In docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md:16-18 and :36-50:

  • reuse SuperRPA CommandRouter as the browser execution entry
  • keep browser-side hosting, security re-check, and dispatch in SuperRPA
  • avoid building parallel browser automation APIs

That is directly incompatible with treating raw external websocket business frames as the primary control plane.

3. Project planning docs describe FunctionsUI IPC as the supported frontend seam

In docs/archive/项目管理与排期/协作时间表.md:419-430:

  • Vue/FunctionsUI calls browser-host methods such as window.superrpa.sgclaw.start() and sendCommand(...)
  • browser host pushes callbacks such as onStatusChange(...) and onLog(...)

Again, this is a bridge and host IPC model.

4. Floating-chat planning already preserves named bridge calls

In docs/plans/2026-03-27-sgclaw-floating-chat-frontend-design.md:289-293:

  • connect() issues sgclawConnect
  • start() issues sgclawStart
  • stop() issues sgclawStop
  • submitTask() issues sgclawSubmitTask

That design work assumes a named browser bridge, not direct raw websocket frames.

Decision

Authoritative browser integration surface: the browser-host bridge path, not the raw external sgBrowser websocket business-frame path.

More concretely, sgClaw should target this chain:

sgClaw runtime
  -> existing browser-facing bridge contract
    -> FunctionsUI / host IPC
      -> BrowserAction / sgclaw host callbacks
        -> existing SuperRPA CommandRouter dispatch

Authoritative seams for future implementation

Because this repository does not contain the full SuperRPA browser host source tree, the bridge-first implementation must integrate at the nearest validated seam available in this repo, while staying aligned with the external browser-host contract already documented.

The future implementation must model two different bridge layers explicitly instead of mixing them together.

Layer 1: session/lifecycle bridge contract

This layer is evidenced by the named calls already present in repo documentation:

  • sgclawConnect
  • sgclawStart
  • sgclawStop
  • sgclawSubmitTask

This layer manages session setup, task submission, and host/UI lifecycle behavior.

It is important evidence that a browser-host bridge exists, but it is not the per-browser-action contract that a new BrowserBackend implementation should target.
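
One way to keep Layer 1 visibly separate from per-action execution is to give it its own small type. The sketch below is an assumption about how the Rust side might name these calls; only the four bridge call names come from the floating-chat design doc, and LifecycleCall and bridge_name are hypothetical.

```rust
/// Hypothetical Rust-side view of the Layer-1 session/lifecycle calls.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleCall {
    Connect,
    Start,
    Stop,
    SubmitTask,
}

impl LifecycleCall {
    /// Named bridge call as documented in the floating-chat design.
    pub fn bridge_name(self) -> &'static str {
        match self {
            LifecycleCall::Connect => "sgclawConnect",
            LifecycleCall::Start => "sgclawStart",
            LifecycleCall::Stop => "sgclawStop",
            LifecycleCall::SubmitTask => "sgclawSubmitTask",
        }
    }
}
```

Keeping this enum apart from any browser-action type makes it harder to route a navigate or click request through the lifecycle surface by accident.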

Layer 2: browser-action execution contract

This is the authoritative target for the new browser backend.

It is evidenced by:

  • window.BrowserAction(...) in archived frontend code
  • FunctionsUI / host IPC integration in archived planning docs
  • browser-side dispatch through CommandRouter in docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md

In this repository, the concrete boundary must be a repo-local semantic transport seam that can be implemented and tested without access to the external SuperRPA host code.

That seam should be a narrow Rust-side contract such as BridgeActionTransport:

  • input: semantic browser action request (navigate, click, getText, etc.) plus params and expected domain
  • output: semantic success/error reply that can be normalized back into BrowserBackend results

BridgeBrowserBackend should target Layer 2 only.
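
A minimal sketch of what such a seam could look like follows. Only the name BridgeActionTransport and the input/output description come from this document; the struct and enum names, the field layout, the execute signature, and StubTransport are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Semantic browser action request (hypothetical field layout).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BridgeActionRequest {
    /// Action name such as "navigate", "click", or "getText".
    pub action: String,
    /// Action parameters, kept as plain strings in this sketch.
    pub params: BTreeMap<String, String>,
    /// Domain the action is expected to run against.
    pub expected_domain: String,
}

/// Semantic reply that an adapter can normalize into backend results.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BridgeActionReply {
    Success { data: String },
    Error { message: String },
}

/// Repo-local seam: implementations talk to the external host bridge,
/// or to a fake in tests.
pub trait BridgeActionTransport: Send + Sync {
    fn execute(&self, request: &BridgeActionRequest) -> BridgeActionReply;
}

/// Test-only stub that accepts a fixed set of action names.
pub struct StubTransport;

impl BridgeActionTransport for StubTransport {
    fn execute(&self, request: &BridgeActionRequest) -> BridgeActionReply {
        match request.action.as_str() {
            "navigate" | "click" | "getText" => BridgeActionReply::Success {
                data: format!("{} ok", request.action),
            },
            other => BridgeActionReply::Error {
                message: format!("unsupported action: {}", other),
            },
        }
    }
}
```

The trait is deliberately narrow: it carries no websocket, IPC, or host details, so it can be implemented and tested without access to the external SuperRPA host code.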

Explicit out-of-scope boundary

The following are outside this repository and therefore outside the immediate Rust implementation slice:

  • actual SuperRPA C++ host/browser code
  • actual FunctionsUI TypeScript host plumbing in the external browser repository
  • actual CommandRouter implementation in the external browser repository

This repository should implement only:

  • the Rust-side bridge contract types
  • the Rust-side bridge transport/provider seam
  • the Rust-side bridge-backed browser adapter
  • deterministic tests against those seams

What this means practically

The next implementation slice should not continue trying to make WsBrowserBackend drive the real browser endpoint directly.

Instead, the next implementation slice should introduce a bridge-backed browser adapter that:

  • preserves the Rust-side BrowserBackend contract where practical
  • translates browser actions onto the Layer-2 semantic bridge surface
  • keeps lifecycle/session bridge calls separate from per-action browser execution
  • leaves the raw websocket probe code as diagnostic infrastructure only

Chosen architecture

Use a bridge-backed adapter design.

Target shape

compat/runtime/orchestration
  -> Arc<dyn BrowserBackend>
    -> BridgeBrowserBackend (new)
      -> BridgeActionTransport (new repo-local seam)
        -> external browser-host bridge / FunctionsUI IPC
          -> BrowserAction / CommandRouter path

Why this shape

  • It preserves the already-useful Rust-side browser abstraction (BrowserBackend) instead of re-plumbing the entire runtime.
  • It keeps raw websocket probing available for diagnostics without letting it dictate production architecture.
  • It matches the architecture already documented for SuperRPA integration.
  • It keeps future work narrow: one new adapter layer instead of rewriting all runtime behavior.
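
The target shape can be sketched as a thin adapter that holds a transport and exposes the backend trait. The real BrowserBackend trait in this repository is not reproduced here; the single-method trait below, the simplified transport signature, and FixedReplyTransport are stand-ins chosen only to show the wiring.

```rust
use std::sync::Arc;

/// Stand-in for the repository's BrowserBackend trait (assumed shape).
pub trait BrowserBackend: Send + Sync {
    fn navigate(&self, url: &str) -> Result<String, String>;
}

/// Simplified transport seam (assumed signature).
pub trait BridgeActionTransport: Send + Sync {
    fn execute(
        &self,
        action: &str,
        param: &str,
        expected_domain: &str,
    ) -> Result<String, String>;
}

/// Adapter: translates backend calls into semantic bridge actions.
pub struct BridgeBrowserBackend {
    transport: Arc<dyn BridgeActionTransport>,
    expected_domain: String,
}

impl BridgeBrowserBackend {
    pub fn new(transport: Arc<dyn BridgeActionTransport>, expected_domain: &str) -> Self {
        Self {
            transport,
            expected_domain: expected_domain.to_string(),
        }
    }
}

impl BrowserBackend for BridgeBrowserBackend {
    fn navigate(&self, url: &str) -> Result<String, String> {
        // The adapter only names the action; delivery is the transport's job.
        self.transport.execute("navigate", url, &self.expected_domain)
    }
}

/// Fake transport that always returns the same reply.
pub struct FixedReplyTransport(pub Result<String, String>);

impl BridgeActionTransport for FixedReplyTransport {
    fn execute(&self, _action: &str, _param: &str, _domain: &str) -> Result<String, String> {
        self.0.clone()
    }
}
```

Because callers hold an Arc<dyn BrowserBackend>, swapping this adapter in should not require changes to the orchestration code above it, assuming the real trait can be satisfied in the same way.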

What stays the same

Pipe path remains unchanged

The existing pipe path must remain behaviorally unchanged:

  • src/lib.rs
  • pipe handshake behavior
  • BrowserPipeTool
  • existing HMAC/domain validation semantics

The bridge-first work is about the ws service / real browser integration path, not about replacing or weakening the pipe path.

Existing compat/runtime abstractions should be preserved where practical

The next slice should reuse:

  • BrowserBackend
  • existing browser tool adapters in compat/runtime
  • existing task runner/orchestration flow

The new work should be concentrated in a bridge adapter and its wiring, not spread through unrelated layers.

What does not stay the same

Raw websocket is no longer the mainline production assumption

The repository may keep:

  • src/browser/ws_backend.rs
  • src/browser/ws_protocol.rs
  • src/browser/ws_probe.rs
  • src/bin/sgbrowser_ws_probe.rs

But those should now be treated as:

  • protocol tooling
  • fake-server test tooling
  • live diagnostic/probe tooling
  • possibly constrained compatibility code

They should remain diagnostic-only in this repository and must not be treated as the production path for reaching the real browser.

Design constraints for the bridge slice

The bridge-path implementation must follow these constraints:

  1. No parallel browser API invention. Reuse the real bridge/browser action surface already evidenced in docs and archived frontend code.
  2. No pipe regression. Do not alter the working pipe entry path.
  3. Adapter-first design. Prefer one bridge-backed backend implementation over broad runtime rewrites.
  4. TDD first. Add focused bridge adapter tests before production wiring.
  5. Repository-local seam only. Where external SuperRPA browser-host code is unavailable here, encode the contract in narrow adapters and tests instead of guessing internals.

Testing implications

The bridge path changes what “proof” looks like.

Required proof for the next slice

The next implementation slice must prove:

  • a browser action can be emitted onto the bridge contract deterministically
  • the bridge adapter maps replies/errors back into BrowserBackend semantics
  • compat/runtime can use the bridge-backed backend without pipe regression
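
The first two proofs can be made deterministic with a recording fake. The sketch below assumes a simplified transport signature and a free click helper in place of the real adapter; RecordingTransport and its fields are hypothetical test scaffolding, not existing repository code.

```rust
use std::sync::Mutex;

/// Simplified transport seam (assumed signature).
pub trait BridgeActionTransport {
    fn execute(&self, action: &str, params: &[(&str, &str)]) -> Result<String, String>;
}

/// Fake transport that records every emitted action and returns a
/// preconfigured reply, so assertions do not depend on a live browser.
pub struct RecordingTransport {
    pub seen: Mutex<Vec<String>>,
    pub reply: Result<String, String>,
}

impl BridgeActionTransport for RecordingTransport {
    fn execute(&self, action: &str, params: &[(&str, &str)]) -> Result<String, String> {
        // Render the request in a stable textual form for assertions.
        let rendered_params = params
            .iter()
            .map(|(k, v)| format!("{}={}", k, v))
            .collect::<Vec<_>>()
            .join(",");
        self.seen
            .lock()
            .unwrap()
            .push(format!("{}({})", action, rendered_params));
        self.reply.clone()
    }
}

/// Stand-in for one adapter method: emit a click action on the bridge.
pub fn click(transport: &dyn BridgeActionTransport, selector: &str) -> Result<String, String> {
    transport.execute("click", &[("selector", selector)])
}
```

A test can then assert both directions: the exact action emitted onto the bridge, and the error or success value handed back to the caller.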

No longer required for acceptance

The next slice does not need to prove that raw websocket business frames work directly against ws://127.0.0.1:12345, because the current evidence rejected that path as the mainline assumption.

Acceptance criteria for this design decision

This design decision holds only if future implementation satisfies all of the following:

  1. The next production slice targets the browser-host bridge path rather than raw external websocket business frames.
  2. The raw websocket probe tooling remains diagnostic only.
  3. Existing pipe behavior stays unchanged.
  4. The next implementation plan identifies a narrow bridge-backed adapter, not a broad architecture rewrite.
  5. Future success claims are based on bridge-path execution evidence, not on reinterpreting the existing raw-websocket transcript.

Consequences

Positive

  • Aligns implementation with the strongest evidence already in the repo
  • Stops further speculative coding on the wrong control surface
  • Preserves existing ws probe work as useful diagnostics
  • Keeps the next slice narrow and testable

Trade-offs

  • Requires an additional adapter design step before more production code can land
  • Defers any hope that a small websocket tweak alone will unlock the real browser path

Those trade-offs are acceptable, because the current blocker is no longer a small protocol bug. It is an integration-surface mismatch.