# WS Browser Integration Surface Correction Design

## Background

The current websocket service path already proved two things:

1. `sg_claw_client -> sg_claw` request handling works.
2. The ws-native backend/auth replacement removed the old pipe/HMAC mismatch that produced `invalid hmac seed: session key must not be empty`.

However, smoke testing against the real sgBrowser still fails.

Manual probing against the configured real browser websocket endpoint (`ws://127.0.0.1:12345`) produced a stable pattern:

- the connection succeeds
- the server sends one banner text frame such as `Welcome! You are client #1`
- after that, business frames receive no status frame and no callback frame
- this remains true for:
  - valid-looking `sgBrowerserOpenPage` frames
  - callback-based APIs
  - no-arg/context-light APIs
  - malformed or obviously wrong frames

At the same time, local documentation and archived frontend code point to a different integration model:

- the websocket API doc describes the websocket service as a transport replacement for page-context JavaScript calls, and requires the current page URL (`requesturl`) in each message
- archived frontend/product code uses `window.sgFunctionsUI(...)` and `window.BrowserAction(...)`
- archived architecture docs describe the supported product path as `FunctionsUI -> browser host bridge -> BrowserAction/CommandRouter`, not an arbitrary external process speaking raw browser websocket frames

This means the current assumption is no longer acceptable as the default architecture hypothesis:

- **Rejected default assumption:** `sg_claw` can directly control the real browser by speaking raw business frames to `browserWsUrl` as an external client, with no additional browser-host bridge, page context, or bootstrap/session contract.

That assumption may still turn out to be partially true, but it is no longer justified enough to continue coding against as the mainline design.

## Problem Statement

The project currently has a functioning ws-native transport implementation, but it does **not** have a validated real integration surface for sgBrowser.

The unresolved question is now architectural rather than syntactic:

### Possibility A: raw websocket is valid, but requires hidden bootstrap/preconditions

Examples suggested by the local API document:

- a real browser page must already exist and `requesturl` must refer to that page
- one or more setup calls such as `sgSetAuthInfo`, `sgBrowserLogin`, `sgOpenAgent`, or `sgBrowerserActiveTab` must happen first
- callbacks may require a browser-side JS/page context that an external process does not automatically have
- some APIs may only work against agent/show/hide areas after browser-side initialization

### Possibility B: raw websocket is not the supported external control surface

Instead, the real product path may require:

- `FunctionsUI` / browser-host IPC
- host-side security and routing
- `BrowserAction` / `CommandRouter` dispatch
- page-injected or browser-embedded execution context

If this is true, continuing to invest in raw external websocket business-frame handling as the main integration surface would be architectural drift.

## Goal

Replace the current unvalidated ws-native-direct assumption with a decision-backed integration strategy.

The next implementation slice must do exactly one of these two things based on evidence:

1. **Bootstrap path:** prove that raw websocket control is real and supported once the missing bootstrap/precondition sequence is performed, then codify that bootstrap sequence and keep `WsBrowserBackend` as the execution surface.
2. **Bridge path:** prove that raw websocket is not the real supported surface for external control, then pivot the runtime design so sgClaw targets the actual browser-host bridge / `BrowserAction` surface instead of pretending the raw websocket is enough.

## Non-goals

This correction slice does **not** include:

- broad feature work on the floating chat UI
- multi-client service redesign
- browser process lifecycle management
- speculative protocol expansion
- generic reconnection/backoff work
- rewriting the entire compat/runtime stack without evidence
- landing both bootstrap and bridge implementations in one branch

The purpose of this slice is to choose the correct integration surface first.

## Evidence Summary

### Evidence that the current raw-ws-direct assumption is weak

1. Real endpoint accepts connections but stays silent after the welcome/banner frame.
2. Silence occurs even for malformed frames, which suggests the endpoint is not acting like an openly documented RPC surface for arbitrary external clients.
3. The API documentation frames websocket use as a replacement for page-side JS invocation, not as a standalone public automation API.
4. The documentation repeatedly depends on `requesturl`, callback function names, target pages, and browser areas (`show`, `hide`, `agent`).
5. Historical frontend/product code uses `window.sgFunctionsUI(...)` and `window.BrowserAction(...)`, not raw external websocket business calls.
6. Historical architecture docs emphasize `FunctionsUI`, `CommandRouter`, and browser-host bridge seams.

### Evidence that the current ws-native work is still useful

1. The ws-native auth replacement removed a real bug.
2. The ws backend now correctly carries forward the last navigated request URL.
3. `WsBrowserBackend` and `ws_protocol` remain valuable as deterministic protocol tooling for fake-server tests and any future bootstrap validation.

So the conclusion is **not** “delete ws-native work.”

The conclusion is:

- do not treat raw external websocket control as validated product architecture yet
- use the ws-native code only behind a decision gate

## Design Decision

Adopt a **decision-gated integration strategy**.

### Decision Gate 1: Validate bootstrap viability first

Before any more production architecture changes, add a focused, deterministic validation harness that can exercise a candidate raw-websocket bootstrap sequence against a live endpoint.

The harness must support:

- ordered frame scripts
- exact frame logging
- exact timeout/silence observation
- trying candidate setup sequences such as:
  - `sgSetAuthInfo`
  - `sgBrowserLogin`
  - `sgOpenAgent`
  - `sgBrowerserActiveTab`
  - then a minimal action such as `sgBrowerserOpenPage` or `sgBrowserExcuteJsCodeByArea`
- trying the same action with different `requesturl` assumptions
- distinguishing these outcomes:
  - numeric status returned
  - callback returned
  - welcome only, then silence
  - close/reset
  - protocol error

This harness is not product code. It is an evidence tool that prevents blind implementation.

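The five outcomes listed above could be told apart by a small classifier over the recorded transcript. The sketch below is only an illustration under stated assumptions: the `Observed` and `ProbeOutcome` types are hypothetical, the banner is assumed to be recognizable by its `Welcome!` prefix, and a status frame is assumed to be a bare integer in a text frame. None of these details is confirmed by the real endpoint yet.

```rust
/// One observation recorded by the probe harness (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum Observed {
    /// A text frame received from the endpoint.
    Text(String),
    /// The peer closed or reset the connection.
    Closed,
    /// The frame could not be decoded or violated websocket framing.
    ProtocolError(String),
    /// Nothing arrived before the deadline.
    Timeout,
}

/// The five outcomes the harness must distinguish.
#[derive(Debug, Clone, PartialEq)]
pub enum ProbeOutcome {
    NumericStatus(i64),
    Callback(String),
    WelcomeOnlyThenSilence,
    CloseOrReset,
    ProtocolError(String),
}

/// Assumption: the banner is the text frame that starts with "Welcome!".
fn is_banner(text: &str) -> bool {
    text.starts_with("Welcome!")
}

/// Classify a transcript by the first meaningful event after the banner.
pub fn classify(transcript: &[Observed]) -> ProbeOutcome {
    for event in transcript {
        match event {
            Observed::Text(t) if is_banner(t) => continue,
            // Assumption: a status frame is a bare integer; any other text
            // frame is treated as a callback payload.
            Observed::Text(t) => {
                return match t.trim().parse::<i64>() {
                    Ok(code) => ProbeOutcome::NumericStatus(code),
                    Err(_) => ProbeOutcome::Callback(t.clone()),
                };
            }
            Observed::Closed => return ProbeOutcome::CloseOrReset,
            Observed::ProtocolError(e) => return ProbeOutcome::ProtocolError(e.clone()),
            Observed::Timeout => return ProbeOutcome::WelcomeOnlyThenSilence,
        }
    }
    // An empty or banner-only transcript counts as silence.
    ProbeOutcome::WelcomeOnlyThenSilence
}
```

Keeping the classifier a pure function over a recorded transcript means the same logic can run against fake-server tests and against logs captured from live runs.
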
### Decision Gate 2: Make bridge pivot the default fallback

If the validation harness cannot demonstrate a reproducible bootstrap sequence that yields real status/callback frames from the live browser endpoint, then raw websocket must be considered **non-validated for external control**.

At that point, the design must pivot to the bridge path:

- sgClaw browser control targets the real browser-host integration surface
- use the bridge already evidenced in docs/code (`FunctionsUI`, browser host IPC, `BrowserAction`, `CommandRouter`)
- keep raw websocket support, if retained at all, as a diagnostic or highly constrained adapter rather than the primary product path

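The gate itself can be stated as a small rule over the live-run results. The following is a minimal sketch, assuming hypothetical `RunOutcome` and `Decision` types and a caller-chosen `min_runs` threshold for what counts as "reproducible"; the document does not fix that threshold.

```rust
/// Outcome of one live run of a candidate bootstrap sequence (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RunOutcome {
    StatusOrCallback,
    Silence,
    ClosedOrError,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Decision {
    /// Option A: keep `WsBrowserBackend` behind a codified bootstrap.
    BootstrapPath,
    /// Option B: pivot to the browser-host bridge.
    BridgePath,
}

/// A candidate is reproducible only if it was run at least `min_runs` times
/// (and at least once) and every run produced a real status or callback frame.
pub fn is_reproducible(runs: &[RunOutcome], min_runs: usize) -> bool {
    runs.len() >= min_runs.max(1) && runs.iter().all(|r| *r == RunOutcome::StatusOrCallback)
}

/// Choose the bootstrap path only if at least one candidate is reproducible;
/// otherwise the bridge path is the default.
pub fn decide(candidates: &[(&str, Vec<RunOutcome>)], min_runs: usize) -> Decision {
    if candidates.iter().any(|(_, runs)| is_reproducible(runs, min_runs)) {
        Decision::BootstrapPath
    } else {
        Decision::BridgePath
    }
}
```

Note that a candidate which succeeds only some of the time still falls through to the bridge path, which matches the requirement that the bootstrap be reproducible rather than occasionally observed.
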
## Architecture Options

## Option A: Bootstrap-validated raw websocket path

Choose this only if the live validation harness produces repeatable evidence.

### Resulting architecture

```text
sg_claw_client
-> sg_claw service
-> bootstrap sequence executor
-> WsBrowserBackend
-> browserWsUrl
-> sgBrowser
```

### Required conditions

- a reproducible bootstrap sequence exists
- the sequence yields status/callback traffic for real business actions
- the sequence can be encoded as a narrow service-side precondition layer
- the sequence does not require unowned browser UI/manual setup outside a documented contract

### Allowed production changes if Option A wins

- add explicit bootstrap calls before first browser action
- persist validated session/context state needed by the real endpoint
- tighten `request_url` / target-page handling around the proven contract

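If Option A wins, the "explicit bootstrap calls before first browser action" could live in one narrow layer rather than being scattered across call sites. The sketch below assumes a hypothetical `BootstrapGate` type; the step name used in the usage note is a placeholder, not a validated bootstrap sequence.

```rust
/// Service-side precondition layer (hypothetical): emits the validated
/// bootstrap calls exactly once, immediately before the first browser action.
pub struct BootstrapGate {
    steps: Vec<String>,
    done: bool,
}

impl BootstrapGate {
    /// `steps` must come from a validated live transcript, never from guesses.
    pub fn new(steps: &[&str]) -> Self {
        BootstrapGate {
            steps: steps.iter().map(|s| s.to_string()).collect(),
            done: false,
        }
    }

    /// Return the calls to send, in order, for this browser action.
    pub fn calls_for(&mut self, action: &str) -> Vec<String> {
        let mut calls = Vec::new();
        if !self.done {
            // First action on this connection: prepend the bootstrap steps.
            calls.extend(self.steps.iter().cloned());
            self.done = true;
        }
        calls.push(action.to_string());
        calls
    }

    /// Forget the bootstrap state, e.g. after the connection was re-established.
    pub fn reset(&mut self) {
        self.done = false;
    }
}
```

For example, a gate built with `BootstrapGate::new(&["sgOpenAgent"])` would return the placeholder step followed by the action on the first `calls_for` call, and only the action afterwards. A production version would probably mark the bootstrap as done only after the endpoint confirmed it; this sketch skips that for brevity.
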
### Not allowed even if Option A wins

- guessing bootstrap steps without evidence
- silently sprinkling many setup calls into random locations
- broadening the compat/runtime API before the bootstrap contract is known

## Option B: Bridge-first integration path

Choose this if live validation does not prove a workable raw websocket bootstrap.

### Resulting architecture

```text
sg_claw_client
-> sg_claw service
-> bridge adapter
-> browser host / FunctionsUI / BrowserAction / CommandRouter
-> sgBrowser page actions
```

### Required conditions

- local docs/code show a stable supported bridge path
- raw websocket remains non-validated or only page-context-scoped
- the bridge surface can be wrapped behind the existing `BrowserBackend` abstraction or a sibling adapter without weakening pipe behavior

### Allowed production changes if Option B wins

- add a new browser backend implementation that targets the real bridge surface
- redirect ws service/browser execution away from raw business frames
- preserve ws-native code only for tests, probes, or intentionally constrained cases

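One way to picture "a new browser backend implementation that targets the real bridge surface" is a sibling implementation of a backend trait, with the service holding exactly one primary backend. The sketch below is illustrative only: `BrowserBackendSketch` is a hypothetical stand-in and does not reproduce the signature of the project's existing `BrowserBackend`, and `BridgeBackend` merely records calls instead of talking to the browser host.

```rust
/// Hypothetical stand-in for the existing `BrowserBackend` abstraction.
pub trait BrowserBackendSketch {
    fn name(&self) -> &'static str;
    /// Open `url` and return a backend-specific status string.
    fn open_page(&mut self, url: &str) -> Result<String, String>;
}

/// Placeholder for a backend that routes through the browser-host bridge.
pub struct BridgeBackend {
    pub dispatched: Vec<String>,
}

impl BrowserBackendSketch for BridgeBackend {
    fn name(&self) -> &'static str {
        "bridge"
    }

    fn open_page(&mut self, url: &str) -> Result<String, String> {
        // A real adapter would hand this to the host bridge; here we only record it.
        self.dispatched.push(format!("open_page {}", url));
        Ok("dispatched".to_string())
    }
}

/// The service owns a single primary backend, so execution cannot be
/// split between two competing surfaces.
pub struct BrowserService {
    primary: Box<dyn BrowserBackendSketch>,
}

impl BrowserService {
    pub fn new(primary: Box<dyn BrowserBackendSketch>) -> Self {
        BrowserService { primary }
    }

    pub fn primary_name(&self) -> &'static str {
        self.primary.name()
    }

    pub fn open_page(&mut self, url: &str) -> Result<String, String> {
        self.primary.open_page(url)
    }
}
```

Because the service takes its backend at construction time, swapping the raw-websocket backend for a bridge backend would be a wiring change rather than a change to every call site.
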
### Not allowed even if Option B wins

- pretending the old raw-ws mainline still works “well enough”
- leaving the service path ambiguously split between two competing primary surfaces

## Scope Guardrails for the Next Implementation Plan

The next implementation plan must obey these guardrails:

1. **One branch, one decision.** Do not implement both architecture options at once.
2. **Evidence before code.** If bootstrap is unproven, the next coding task is probe/validation tooling, not another speculative service/runtime refactor.
3. **Keep pipe untouched.** `src/lib.rs`, pipe handshake, and the pipe `BrowserPipeTool` path remain behaviorally unchanged.
4. **Do not delete ws-native code prematurely.** It still has value for protocol tests and validation tooling.
5. **Do not broaden success claims.** Removing `invalid hmac seed` did not make real browser control work.

## Testing Strategy

### Stage 1: Evidence tooling tests

Add deterministic tests for the live-probe/validation harness so it can:

- send an ordered frame script
- record exact received frames
- report silence/timeout precisely
- expose transcript output suitable for comparing candidate bootstrap sequences

These tests use a fake websocket server, not sgBrowser.

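A deterministic test of the harness does not need a real socket if the transport sits behind a small trait. The sketch below assumes hypothetical names (`Endpoint`, `FakeEndpoint`, `Entry`, `run_script`) and simplifies the protocol to one receive attempt per sent frame; a real harness would also need to read the banner and wait with an actual timeout.

```rust
use std::collections::VecDeque;

/// Minimal endpoint abstraction (hypothetical) so the harness can be tested
/// without sgBrowser or a real socket.
pub trait Endpoint {
    fn send(&mut self, frame: &str);
    /// Return the next received frame, or `None` if the timeout elapsed.
    fn recv(&mut self) -> Option<String>;
}

/// Scripted fake server: replies are queued up front, `None` means silence.
pub struct FakeEndpoint {
    pub sent: Vec<String>,
    replies: VecDeque<Option<String>>,
}

impl FakeEndpoint {
    pub fn with_replies(replies: &[Option<&str>]) -> Self {
        FakeEndpoint {
            sent: Vec::new(),
            replies: replies.iter().map(|r| r.map(|s| s.to_string())).collect(),
        }
    }
}

impl Endpoint for FakeEndpoint {
    fn send(&mut self, frame: &str) {
        self.sent.push(frame.to_string());
    }

    fn recv(&mut self) -> Option<String> {
        // An exhausted queue also reads as silence.
        self.replies.pop_front().flatten()
    }
}

/// One line of the transcript, in the order it happened.
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Sent(String),
    Received(String),
    Silence,
}

/// Send each frame in order and record exactly what came back after it.
pub fn run_script<E: Endpoint>(endpoint: &mut E, script: &[&str]) -> Vec<Entry> {
    let mut transcript = Vec::new();
    for &frame in script {
        endpoint.send(frame);
        transcript.push(Entry::Sent(frame.to_string()));
        match endpoint.recv() {
            Some(reply) => transcript.push(Entry::Received(reply)),
            None => transcript.push(Entry::Silence),
        }
    }
    transcript
}
```

Because the transcript is a plain `Vec<Entry>` with `PartialEq`, two candidate bootstrap sequences can be compared entry by entry in a test assertion or printed side by side.
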
### Stage 2: Live validation runs

Use the harness against the real endpoint with a fixed matrix of candidate sequences.

At minimum, compare:

1. no bootstrap -> minimal action
2. `sgOpenAgent` -> minimal action
3. `sgSetAuthInfo` -> minimal action
4. `sgBrowserLogin` -> minimal action
5. `sgBrowerserActiveTab` -> minimal action
6. combined documented bootstrap candidates -> minimal action
7. alternate `requesturl` values representing:
   - `about:blank`
   - target page URL
   - a currently open page URL if known

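The fixed matrix can be generated rather than hand-written, so every live run covers the same cases. This is a sketch under assumptions: the `Candidate` type is hypothetical, the ordering of the combined bootstrap is one arbitrary choice, and crossing every bootstrap variant with every `requesturl` value is an interpretation of the list above rather than something it states.

```rust
/// One row of the live validation matrix (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub bootstrap: Vec<&'static str>,
    pub request_url: String,
    pub action: &'static str,
}

/// Cross the six bootstrap variants with the given `requesturl` values.
pub fn candidate_matrix(action: &'static str, request_urls: &[&str]) -> Vec<Candidate> {
    let bootstraps: Vec<Vec<&'static str>> = vec![
        vec![], // no bootstrap
        vec!["sgOpenAgent"],
        vec!["sgSetAuthInfo"],
        vec!["sgBrowserLogin"],
        vec!["sgBrowerserActiveTab"],
        // Assumption: one possible ordering of the combined candidates.
        vec!["sgSetAuthInfo", "sgBrowserLogin", "sgOpenAgent", "sgBrowerserActiveTab"],
    ];
    let mut matrix = Vec::new();
    for bootstrap in &bootstraps {
        for url in request_urls {
            matrix.push(Candidate {
                bootstrap: bootstrap.clone(),
                request_url: url.to_string(),
                action,
            });
        }
    }
    matrix
}
```

With two `requesturl` values this yields twelve rows; adding a currently open page URL as a third value would give eighteen.
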
### Stage 3: Architecture-branch acceptance

If Option A wins:

- add one automated regression that proves the validated bootstrap sequence produces the first real status frame in a controlled integration test
- then continue with the narrowest production implementation plan

If Option B wins:

- write a new bridge-integration implementation plan before changing production code
- base all production tasks on the documented bridge surface

## Acceptance Criteria for This Design Correction

This design correction is successful only if future work follows these rules:

1. The repository has an explicit design document recording that raw ws-native direct control is **not currently validated**.
2. The next engineering slice starts with validation or bridge selection, not another speculative runtime refactor.
3. Any future claim that raw websocket is the supported production path must be backed by a reproducible live bootstrap transcript.
4. If that evidence does not appear, the project pivots to the bridge path rather than continuing to guess.

## Consequences

### Positive

- stops further speculative coding against an unproven surface
- preserves useful ws-native work without over-committing to it
- creates a clean decision point for the next implementation branch

### Trade-off

- this does not immediately unblock real browser control
- it intentionally inserts an evidence phase before more production changes

That trade-off is acceptable because the current failure mode is architectural uncertainty, not a missing two-line fix.