feat: align browser callback runtime and export flows

Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
木炎
2026-04-06 21:44:53 +08:00
parent 0dd655712c
commit bdf8e12246
55 changed files with 14440 additions and 1053 deletions


@@ -0,0 +1,564 @@
# Zhihu Release WS Function-Callback Migration Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Move only the Zhihu direct-execution path to the new Release browser websocket interaction style while keeping the existing pipe protocol and non-Zhihu submit behavior unchanged.
**Architecture:** Keep `ClientMessage` / `ServiceMessage`, `run_submit_task_with_browser_backend(...)`, and the high-level Zhihu workflow steps unchanged. First prove the exact Release browser interaction contract with transcript-backed probes. Then implement the smallest Zhihu-scoped backend path that follows that proven contract. Do not globally rewire the submit path unless the probe evidence proves there is no narrower safe seam.
**Tech Stack:** Rust, tungstenite, existing sgclaw service/client pipe protocol, `docs/_tmp_sgbrowser_ws_api_doc.txt`, Release browser websocket at `ws://127.0.0.1:12345`, current Zhihu direct-execution workflow.
---
## Context
The user has now made the target behavior explicit:
- the browser has changed and the working reference behavior is the user-provided HTML page that connects to `ws://127.0.0.1:12345`
- that page sends a bootstrap registration frame: `{"type":"register","role":"web"}`
- browser requests are still JSON arrays such as `[window.location.href, "sgBrowserSetTheme", "1"]` and `[window.location.href, "sgBrowerserGetUrls", "showUrls"]`
- callback-bearing browser behavior is now centered on page-defined JS callback functions like `showUrls`, not on Rust directly reading a websocket callback frame as the final business result
- the existing sgclaw pipe protocol must remain unchanged
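The two frame shapes listed above can be sketched as plain string builders. This is a minimal std-only illustration: `json_escape`, `register_frame`, and `array_frame` are hypothetical names, not existing sgclaw helpers, and the real code may well build these frames with a JSON library instead.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Bootstrap frame sent once, right after the websocket opens.
fn register_frame() -> String {
    r#"{"type":"register","role":"web"}"#.to_string()
}

/// Builds `[requesturl, action, ...args]` as a JSON array of strings.
fn array_frame(request_url: &str, action: &str, args: &[&str]) -> String {
    let mut parts = vec![request_url, action];
    parts.extend_from_slice(args);
    let quoted: Vec<String> = parts
        .iter()
        .map(|p| format!("\"{}\"", json_escape(p)))
        .collect();
    format!("[{}]", quoted.join(","))
}
```

Keeping the register frame and the array frames in separate builders mirrors the register-first ordering: the object-shaped frame is sent exactly once, and every later request is an array.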
The current sgclaw drift that must be corrected is visible in:
- `src/browser/ws_protocol.rs`
  - `Action::Navigate` currently emits `sgHideBrowserCallAfterLoaded` with an inline `callBackJsToCpp(...)` string
- `src/browser/ws_backend.rs`
  - Rust currently waits for a browser websocket callback frame and treats that as the action result
- `tests/service_ws_session_test.rs:498-605`
- `tests/service_task_flow_test.rs:499-635`
  - existing **generic submit-flow** regressions still lock in the old direct raw-websocket callback-frame assumption
  - these are useful as non-regression guardrails, but they are not themselves Zhihu-specific regressions
Zhihu-specific verification must therefore be added explicitly instead of assuming those Baidu-path tests already cover Zhihu.
The new browser style proves these facts and only these facts so far:
1. sgclaw must handle a register-first websocket handshake
2. browser requests are still `[requesturl, action, ...args]`
3. some browser capabilities now return through page-defined callback functions like `showUrls`
4. the current direct raw-websocket callback expectation in the Zhihu path is no longer a safe assumption
The production seam is **not** pre-decided here. Task 1 must determine whether Zhihu can be integrated by:
- Option A: a direct Zhihu-scoped backend with no helper page, or
- Option B: a helper page / relay design, because named page callbacks are the only reliable result path
Until Task 1 evidence is captured, both remain hypotheses.
## Evidence to preserve in the implementation
### Browser websocket API doc
From `docs/_tmp_sgbrowser_ws_api_doc.txt`:
- `ws://localhost:12345` is the browser websocket endpoint
- request frames are array payloads with `requesturl`
- `sgBrowerserGetUrls(callback)` uses a callback **function name**: `[requesturl,"sgBrowerserGetUrls", callback]`
- `sgBrowserCallAfterLoaded(targetUrl, callback)` and `sgHideBrowserCallAfterLoaded(targetUrl, callback)` use callback strings with parentheses
- `callBackJsToCpp(param)` uses `sourceUrl@_@targetUrl@_@callback@_@actionUrl@_@responseTxt`
- `sgBrowserRegJsFun(targeturl, funContent)` and `sgBrowserExcuteJsFun(targeturl, funName)` exist and may be useful when the helper page needs durable callback helpers
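The `callBackJsToCpp(param)` layout above can be split into its five fields as sketched below. `CallbackParam` and `parse_callback_param` are hypothetical names for illustration. Using `splitn(5, ...)` so that a `@_@` sequence inside `responseTxt` survives is an assumption of this sketch, not something the API doc states.

```rust
/// The five fields of a `callBackJsToCpp` parameter, in documented order.
#[derive(Debug, PartialEq)]
struct CallbackParam {
    source_url: String,
    target_url: String,
    callback: String,
    action_url: String,
    response_txt: String,
}

/// Parses `sourceUrl@_@targetUrl@_@callback@_@actionUrl@_@responseTxt`.
fn parse_callback_param(param: &str) -> Result<CallbackParam, String> {
    // Split at most 5 ways so the final field keeps any later "@_@" intact
    // (assumption: only the first four delimiters are structural).
    let fields: Vec<&str> = param.splitn(5, "@_@").collect();
    if fields.len() != 5 {
        return Err(format!(
            "expected 5 fields separated by @_@, got {}",
            fields.len()
        ));
    }
    Ok(CallbackParam {
        source_url: fields[0].to_string(),
        target_url: fields[1].to_string(),
        callback: fields[2].to_string(),
        action_url: fields[3].to_string(),
        response_txt: fields[4].to_string(),
    })
}
```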
### Current working HTML pattern from the user
The now-working reference interaction is:
```html
<script>
  const socket = new WebSocket('ws://127.0.0.1:12345');
  socket.onopen = () => {
    // register first; requests are only sent once the socket is open
    socket.send(JSON.stringify({type: 'register', role: 'web'}));
    socket.send(JSON.stringify([window.location.href, "sgBrowerserGetUrls", "showUrls"]));
  };
  function showUrls(urls) {
    // browser invokes this page-defined callback
  }
</script>
```
That is the browser behavior sgclaw now needs to follow.
---
## Critical files
### Production files to modify
- `src/browser/ws_protocol.rs`
- `src/compat/workflow_executor.rs` (only if a narrow Zhihu-specific correction is required after backend swap)
- `src/service/server.rs` (only if the chosen Zhihu-scoped integration seam must be wired here)
- `src/service/mod.rs` (only if startup plumbing changes are truly required)
- `src/browser/mod.rs`
### New production files likely needed
- `src/browser/zhihu_release_backend.rs`
  - a Zhihu-scoped `BrowserBackend` adapter that follows the proven Release browser interaction style without changing non-Zhihu routes
- `src/service/browser_callback_host.rs` **only if the probe proves a service-controlled helper page is actually required**
  - service-local helper-page lifecycle and callback relay, if evidence shows the browser cannot be driven safely without it
### Existing files to preserve
- `src/agent/task_runner.rs`
- `src/service/protocol.rs`
- `src/compat/orchestration.rs`
- `src/compat/runtime.rs`
- `src/pipe/*`
### Existing direct-ws files to review explicitly
- `src/browser/ws_backend.rs`
- `tests/browser_ws_backend_test.rs`
These files currently encode the old direct raw-websocket callback expectation. The implementation must either:
- leave them untouched as legacy/direct-contract coverage with no Zhihu production callers, or
- update/remove the Zhihu-specific assumptions they currently lock in.
### Primary test files
- `tests/browser_ws_probe_test.rs`
- `tests/browser_ws_protocol_test.rs`
- `tests/service_ws_session_test.rs`
- `tests/service_task_flow_test.rs`
- `tests/task_runner_test.rs`
- `tests/browser_ws_backend_test.rs`
---
## File structure decisions
### `src/browser/zhihu_release_backend.rs`
Prefer a Zhihu-scoped backend first.
Responsibilities:
- keep the same `BrowserBackend` trait surface
- implement only the behavior needed by the current Zhihu direct-execution route
- translate `Action::Navigate`, `Action::GetText`, and `Action::Eval` into the proven Release-browser interaction style
- normalize results back into `CommandOutput`
- avoid affecting non-Zhihu callers
This is the preferred seam because the user asked to change the current Zhihu flow, not to redesign the whole submit pipeline.
### `src/service/browser_callback_host.rs` (conditional)
Create this file only if Task 1 probe evidence proves that sgclaw must host or control a page in order to receive named callback-function results.
If it is needed, the plan must keep the design minimal and specific:
- one concrete transport only (choose websocket or HTTP, not “websocket or HTTP”)
- explicit readiness handshake
- explicit request correlation by `request_id`
- explicit cleanup when the submit task ends
If Task 1 shows a simpler seam, do not create this file.
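If the callback host does turn out to be necessary, the correlation and cleanup requirements could look roughly like the sketch below. `PendingRequests` and its methods are hypothetical and not part of sgclaw. The sketch only shows the bookkeeping: one `request_id` per outbound request, exactly-once resolution, and an explicit cleanup call when the submit task ends.

```rust
use std::collections::HashMap;

/// Tracks in-flight helper-page requests by `request_id` (hypothetical type).
#[derive(Default)]
struct PendingRequests {
    next_id: u64,
    // request_id -> action name, kept only for diagnostics
    waiting: HashMap<u64, String>,
}

impl PendingRequests {
    /// Allocates a fresh, monotonically increasing request_id.
    fn register(&mut self, action: &str) -> u64 {
        self.next_id += 1;
        self.waiting.insert(self.next_id, action.to_string());
        self.next_id
    }

    /// Resolves a request exactly once; unknown or repeated ids are errors.
    fn resolve(&mut self, request_id: u64) -> Result<String, String> {
        self.waiting
            .remove(&request_id)
            .ok_or_else(|| format!("unknown or already resolved request_id {}", request_id))
    }

    /// Drops everything still pending; returns how many were abandoned.
    fn cleanup(&mut self) -> usize {
        let abandoned = self.waiting.len();
        self.waiting.clear();
        abandoned
    }
}
```

Returning the abandoned count from `cleanup` gives a test something concrete to assert on when it checks that a finished submit task leaves no pending callbacks behind.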
### `src/browser/ws_protocol.rs`
This file must not continue to encode only the old direct-callback assumption.
It should become the shared place for doc-native request builders such as:
- browser bootstrap frames proven by the transcript
- `sgBrowserCallAfterLoaded` / `sgHideBrowserCallAfterLoaded`
- `sgBrowserExcuteJsCodeByArea`
- optional `sgBrowserRegJsFun` / `sgBrowserExcuteJsFun`
But do **not** let `ws_protocol.rs` absorb service-host lifecycle logic.
### `src/browser/ws_backend.rs` and `tests/browser_ws_backend_test.rs`
Handle these explicitly in the implementation:
- if they still describe a valid direct browser contract, keep them as isolated legacy/direct-ws coverage only
- if their current navigate/callback assumptions conflict with the proven Release Zhihu path, update or narrow those tests so they no longer describe the active Zhihu integration path
Do not leave the old direct-callback assumptions ambiguously “reviewed”; the implementation must make their status explicit.
---
## Task 1: Capture the new Release browser contract in a reproducible probe transcript
**Files:**
- Review/modify: `src/browser/ws_probe.rs`
- Review/modify: `src/bin/sgbrowser_ws_probe.rs`
- Review/modify: `tests/browser_ws_probe_test.rs`
- Create: `docs/_tmp_release_ws_callback_host_transcript.md`
- [ ] **Step 1: Verify current probe coverage against the Release-browser questions**
Read the existing probe module and tests and check whether they already prove all of the following:
- a register-first websocket script can be expressed
- a later array action frame can be expressed in the same script
- per-step inbound frames/outcomes are preserved separately
- timeout/close remain distinguishable in the transcript
Required result:
- identify the exact existing tests that already prove these behaviors
- identify the smallest missing Release-specific coverage, if any
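As a reference point for what "preserved separately" and "distinguishable" mean here, a transcript model along these lines would satisfy the checklist. `StepEnd`, `StepRecord`, and `render_transcript` are hypothetical names, and the existing `ws_probe` module may already model this differently.

```rust
/// How a probe step finished (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
enum StepEnd {
    Completed,
    Timeout,
    Closed,
}

/// One sent frame plus the inbound frames observed for that step only.
#[derive(Debug, Clone, PartialEq)]
struct StepRecord {
    sent: String,
    received: Vec<String>,
    end: StepEnd,
}

/// Renders steps so each one keeps its own inbound frames and outcome.
fn render_transcript(steps: &[StepRecord]) -> String {
    let mut out = String::new();
    for (index, step) in steps.iter().enumerate() {
        out.push_str(&format!("step {}\n  sent: {}\n", index + 1, step.sent));
        for frame in &step.received {
            out.push_str(&format!("  recv: {}\n", frame));
        }
        let end = match step.end {
            StepEnd::Completed => "completed",
            StepEnd::Timeout => "timeout",
            StepEnd::Closed => "closed",
        };
        out.push_str(&format!("  end: {}\n", end));
    }
    out
}
```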
- [ ] **Step 2: Add only the missing regression coverage**
If current tests do **not** already prove the Release-browser bootstrap shape, add the narrowest failing regression in `tests/browser_ws_probe_test.rs`.
Preferred shape if coverage is missing:
```rust
#[test]
fn probe_supports_register_then_array_action_script() {
// fake server expects:
// 1. {"type":"register","role":"web"}
// 2. ["http://127.0.0.1/helper.html","sgBrowerserGetUrls","showUrls"]
}
```
And, if still missing, add one regression proving per-step transcript separation for the register reply and later action reply.
If those behaviors are already covered, skip new test creation and record the exact test names to rely on.
- [ ] **Step 3: Run the relevant probe tests**
Run the narrowest exact tests that prove the Release bootstrap behavior, or the full file if multiple areas changed:
```bash
cargo test --test browser_ws_probe_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 4: Make the probe binary ergonomic for the Release transcript if needed**
Only if the current CLI cannot conveniently express the real Release-browser script, make the smallest change needed in `src/bin/sgbrowser_ws_probe.rs` / `src/browser/ws_probe.rs` so it can capture:
- register frame behavior
- minimal `sgBrowserSetTheme`
- minimal `sgBrowerserGetUrls`
- exact inbound websocket text per step
Do not redesign the probe if it already supports this.
- [ ] **Step 5: Run the live probe against the Release browser and record the real bootstrap**
Use the probe binary against the real endpoint to capture at minimum:
- register frame behavior
- minimal `sgBrowserSetTheme`
- minimal `sgBrowerserGetUrls`
- whether replies come back as websocket text, page-function invocation only, or both
Save the exact transcript in `docs/_tmp_release_ws_callback_host_transcript.md`.
Required output in that temp doc:
- exact sent frames
- exact received websocket frames
- the observed rule for when named callback functions are invoked
- whether Option A or Option B is supported by evidence
- [ ] **Step 6: Commit the probe-only slice if code changed**
If probe code/tests changed:
```bash
git add src/browser/ws_probe.rs src/bin/sgbrowser_ws_probe.rs tests/browser_ws_probe_test.rs docs/_tmp_release_ws_callback_host_transcript.md
git commit -m "test: capture release browser ws bootstrap contract"
```
If only the transcript doc changed, stage only that file and use a docs/test-appropriate commit message.
---
## Task 2: Choose the narrowest Zhihu-only production seam from the probe evidence
**Files:**
- Modify: `src/service/server.rs` (only if required)
- Modify: `src/browser/mod.rs`
- Modify: `src/compat/workflow_executor.rs` (only if required)
- Create: `src/browser/zhihu_release_backend.rs`
- Create: `src/service/browser_callback_host.rs` **only if required**
- Test: `tests/service_ws_session_test.rs`
- Test: `tests/service_task_flow_test.rs`
- [ ] **Step 1: Write down the seam decision in the plan notes before coding**
Based on the transcript from Task 1, record which one of these is supported by evidence:
- Option A: a Zhihu-scoped backend can talk to the Release browser directly with no service-hosted helper page
- Option B: a Zhihu-scoped backend needs a service-controlled helper page because named page callbacks are the only reliable way to get business results
Do not proceed until one option is chosen explicitly from evidence.
- [ ] **Step 2: Add a failing service/task-flow regression that proves only the Zhihu path changes**
Update or add focused tests so that:
- Zhihu submit flow uses the new Release-browser interaction seam
- non-Zhihu behavior is unchanged
- pipe messages remain unchanged
Required assertions:
- the new path is activated only for Zhihu route detection
- `ClientMessage` / `ServiceMessage` stay identical
- existing non-Zhihu submit behavior is not accidentally rerouted
- [ ] **Step 3: Run the new focused regression and confirm failure first**
Run the exact test names you added, one invocation per test file:
```bash
cargo test --test service_ws_session_test <new_test_name> -- --nocapture
cargo test --test service_task_flow_test <new_test_name> -- --nocapture
```
Expected: FAIL because the Zhihu-specific seam does not exist yet.
- [ ] **Step 4: Implement the chosen seam with the smallest blast radius**
If Option A won:
- add `src/browser/zhihu_release_backend.rs`
- wire it only where the Zhihu direct-execution route is selected
- leave global submit-path wiring alone
If Option B won:
- add `src/service/browser_callback_host.rs` with one specific transport and one explicit readiness/correlation model
- add `src/browser/zhihu_release_backend.rs` to talk to that helper path
- wire it only for the Zhihu route
In both cases:
- do not change non-Zhihu callers
- do not redesign `run_submit_task_with_browser_backend(...)`
- do not change the pipe protocol
- [ ] **Step 5: Make the status of old direct-ws code explicit**
Update `src/browser/ws_backend.rs` / `tests/browser_ws_backend_test.rs` only as needed so they no longer ambiguously describe the active Zhihu path.
Allowed outcomes:
- keep them untouched as legacy/direct-ws coverage with no Zhihu production caller
- narrow/update the tests so they no longer claim the active Zhihu integration path
Not allowed:
- leaving the plan and code in a state where both old and new paths appear to be the active Zhihu contract
- [ ] **Step 6: Run focused integration tests**
Run:
```bash
cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 7: Commit the seam-selection slice**
Adjust staged files to match the option actually implemented, for example:
```bash
git add src/browser/zhihu_release_backend.rs src/browser/mod.rs src/service/server.rs src/service/browser_callback_host.rs tests/service_ws_session_test.rs tests/service_task_flow_test.rs tests/browser_ws_backend_test.rs
git commit -m "feat: route zhihu flow through release browser ws contract"
```
Only stage files that were truly changed.
---
## Task 3: Implement Zhihu action mapping on the chosen Release-browser seam
**Files:**
- Modify: `src/browser/ws_protocol.rs`
- Modify: `src/browser/zhihu_release_backend.rs`
- Test: `tests/browser_ws_protocol_test.rs`
- Create: `tests/browser_zhihu_release_backend_test.rs`
- [ ] **Step 1: Write the first failing backend test for Zhihu navigate mapping**
Create `tests/browser_zhihu_release_backend_test.rs` with a fake transport/relay and assert that `Action::Navigate` for the Zhihu path becomes the exact browser request shape proven by Task 1.
Start with this shape:
```rust
#[test]
fn zhihu_release_backend_maps_navigate_to_proven_release_frame() {
// invoke Action::Navigate
// assert exact outbound frame/opcode chosen from transcript evidence
}
```
Required assertions:
- the call site still uses `BrowserBackend::invoke(...)`
- the exact outbound frame matches the recorded Release-browser evidence
- request correlation stays deterministic
- [ ] **Step 2: Run the single new backend test and verify it fails**
Run:
```bash
cargo test --test browser_zhihu_release_backend_test zhihu_release_backend_maps_navigate_to_proven_release_frame -- --nocapture
```
Expected: FAIL because the backend does not exist yet.
- [ ] **Step 3: Implement minimal `Navigate` support**
In `src/browser/zhihu_release_backend.rs`:
- implement `BrowserBackend`
- support `Action::Navigate` first
- use `ws_protocol.rs` helpers for exact browser-frame construction
- do not hardcode speculative opcodes; follow the transcript from Task 1
- [ ] **Step 4: Add failing tests for `GetText` and `Eval`**
Add tests proving:
- `Action::GetText` returns `CommandOutput.data == {"text": "..."}`
- `Action::Eval` returns `CommandOutput.data == {"text": "..."}`
- callback or relay failures become `PipeError::Protocol(...)`
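The result normalization these tests pin down can be sketched as follows. `CommandOutput` and `PipeError` here are local stand-ins defined only for this example, and the real sgclaw types almost certainly have different fields. `text_output` and `json_escape` are hypothetical helpers.

```rust
/// Stand-in for sgclaw's error type; only the variant named in the plan.
#[derive(Debug, PartialEq)]
enum PipeError {
    Protocol(String),
}

/// Stand-in for sgclaw's output type; `data` holds JSON text in this sketch.
#[derive(Debug, PartialEq)]
struct CommandOutput {
    data: String,
}

/// Minimal escaping for quotes, backslashes, and newlines (sketch only).
fn json_escape(input: &str) -> String {
    input
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Maps a callback/relay result into `{"text": ...}` or a protocol error.
fn text_output(callback_result: Result<String, String>) -> Result<CommandOutput, PipeError> {
    match callback_result {
        Ok(text) => Ok(CommandOutput {
            data: format!("{{\"text\":\"{}\"}}", json_escape(&text)),
        }),
        Err(reason) => Err(PipeError::Protocol(format!(
            "release browser callback failed: {}",
            reason
        ))),
    }
}
```

Routing both `GetText` and `Eval` through one mapping function would keep the two actions from drifting apart in how they report relay failures.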
- [ ] **Step 5: Implement `GetText` and `Eval` on the chosen seam**
Use the smallest proven mechanism:
- if the transcript proves page-defined callback functions are required, route through them
- if `callBackJsToCpp(...)` to a page context is still part of the proven path, use it deliberately
- if `sgBrowserRegJsFun` / `sgBrowserExcuteJsFun` becomes necessary, add it only with test coverage and only for the Zhihu path
- [ ] **Step 6: Run focused backend/protocol tests**
Run:
```bash
cargo test --test browser_zhihu_release_backend_test -- --nocapture
cargo test --test browser_ws_protocol_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 7: Commit the Zhihu backend slice**
```bash
git add src/browser/ws_protocol.rs src/browser/zhihu_release_backend.rs src/browser/mod.rs tests/browser_ws_protocol_test.rs tests/browser_zhihu_release_backend_test.rs
git commit -m "feat: add zhihu release ws backend"
```
---
## Task 4: Keep the Zhihu workflow logic stable and patch only proven mismatches
**Files:**
- Review: `src/compat/workflow_executor.rs`
- Test: `tests/service_task_flow_test.rs`
- Test: `tests/compat_runtime_test.rs` (only if a focused direct-execution regression is needed)
- [ ] **Step 1: Write a failing Zhihu-specific regression only if the chosen seam changes route assumptions**
If the new Zhihu backend changes request-url or target-url handling enough to break hotlist flow, add one focused failing regression for that exact behavior.
Candidate assertions:
- hotlist navigate still logs `navigate https://www.zhihu.com/hot`
- follow-up `GetText body` still targets the Zhihu page, not any helper page
- extractor `Eval` still runs against Zhihu, not any helper page
- [ ] **Step 2: Keep the current high-level Zhihu action sequence unless a test proves otherwise**
`src/compat/workflow_executor.rs` currently does the right high-level work:
- navigate to Zhihu hotlist
- poll body text until ready
- run the extractor script
Prefer to keep this file unchanged. Only patch it if the new backend needs a narrow explicit `target_url` fix or similar evidence-backed adjustment.
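For reference, the "poll body text until ready" step has roughly the following shape. This is a generic sketch with a hypothetical `poll_until_ready` helper, not the code in `workflow_executor.rs`. The readiness predicate and attempt limit are assumptions, and a real loop would also pause between attempts.

```rust
/// Calls `fetch` until `ready` accepts the text or attempts run out.
fn poll_until_ready<F, P>(mut fetch: F, ready: P, max_attempts: usize) -> Result<String, String>
where
    F: FnMut() -> Result<String, String>,
    P: Fn(&str) -> bool,
{
    for _ in 0..max_attempts {
        // A fetch error aborts polling immediately instead of being retried.
        let text = fetch()?;
        if ready(&text) {
            return Ok(text);
        }
    }
    Err(format!("body text not ready after {} attempts", max_attempts))
}
```

Because the loop only depends on a fetch closure, swapping the backend underneath `GetText` should not require touching the polling logic itself, which is the reason to leave the workflow file alone.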
- [ ] **Step 3: Run the smallest Zhihu-focused verification sweep**
Run:
```bash
cargo test --test service_task_flow_test -- --nocapture
cargo test --test compat_runtime_test zhihu -- --nocapture
```
If the `compat_runtime_test zhihu` filter is too broad or unstable, run the exact focused Zhihu cases that cover hotlist extraction.
- [ ] **Step 4: Commit only if a Zhihu-specific code change was actually required**
```bash
git add src/compat/workflow_executor.rs tests/service_task_flow_test.rs tests/compat_runtime_test.rs
git commit -m "fix: keep zhihu workflow aligned with release ws backend"
```
Skip this commit if no production change in `workflow_executor.rs` was needed.
---
## Task 5: Prove that pipe behavior and non-Zhihu behavior stayed unchanged
**Files:**
- Test: `tests/service_ws_session_test.rs`
- Test: `tests/service_task_flow_test.rs`
- Test: `tests/task_runner_test.rs`
- [ ] **Step 1: Add or update one regression that proves pipe messages are unchanged**
Use the smallest existing test seam to assert that `ClientMessage` / `ServiceMessage` payloads remain unchanged while the Zhihu route uses the new browser integration path internally.
- [ ] **Step 2: Add or update one regression that proves non-Zhihu behavior is unchanged**
Use a non-Zhihu submit or service-session case and assert it does not take the new Zhihu-specific backend path.
- [ ] **Step 3: Preserve current runtime regression guards**
The end-to-end tests must continue asserting that output does **not** contain:
- `invalid hmac seed: session key must not be empty`
- `Cannot drop a runtime in a context where blocking is not allowed`
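One way to keep these guards in a single place is a small helper that reports which forbidden messages appear in captured output. `FORBIDDEN_OUTPUT` and `forbidden_matches` are hypothetical names, and the existing tests may simply assert on each string inline.

```rust
/// Messages that must never appear in end-to-end output.
const FORBIDDEN_OUTPUT: [&str; 2] = [
    "invalid hmac seed: session key must not be empty",
    "Cannot drop a runtime in a context where blocking is not allowed",
];

/// Returns every forbidden message found in `output` (empty means clean).
fn forbidden_matches(output: &str) -> Vec<&'static str> {
    FORBIDDEN_OUTPUT
        .iter()
        .copied()
        .filter(|needle| output.contains(*needle))
        .collect()
}
```

A test can then assert `forbidden_matches(&output).is_empty()` and print the returned list on failure, which names the offending message directly.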
- [ ] **Step 4: Run the final focused verification sweep**
Run:
```bash
cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 5: Commit the verification sweep**
```bash
git add tests/service_ws_session_test.rs tests/service_task_flow_test.rs tests/task_runner_test.rs tests/browser_ws_backend_test.rs
git commit -m "test: constrain zhihu release ws migration scope"
```
Only stage files that were truly changed.
---
## Out of scope
Do **not** do these in this slice:
- change the pipe protocol
- change `ClientMessage` / `ServiceMessage`
- redesign `run_submit_task_with_browser_backend(...)`
- reintroduce any browser bridge surface
- keep adding speculative direct-raw-websocket callback patches to `ws_backend.rs`
- redesign non-Zhihu workflows unless the new backend abstraction forces a shared fix
- create a long-lived external dependency or third-party server just to host the helper page
---
## Verification checklist
Run at minimum:
```bash
cargo test --test browser_ws_probe_test -- --nocapture
cargo test --test browser_zhihu_release_backend_test -- --nocapture
cargo test --test browser_ws_protocol_test -- --nocapture
cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture
```
If Task 2 chose the helper-page / relay design, also run the helper-page-specific backend tests you added for that path.
Manual verification after code changes:
1. start the real Release browser/runtime that exposes `ws://127.0.0.1:12345`
2. start `sg_claw` with real config
3. start `sg_claw_client`
4. submit:
   - `打开知乎热榜获取前10条数据并导出 Excel` ("Open the Zhihu hot list, fetch the top 10 entries, and export them to Excel")
5. confirm the Zhihu path uses the exact Release-browser interaction seam proven by Task 1
6. if Task 2 chose Option B, confirm the helper page / relay path is used only for the Zhihu integration seam
7. confirm non-Zhihu behavior is unchanged
8. confirm the task completes without:
   - `timeout while waiting for browser message`
   - `invalid browser status frame: Welcome! You are client #1`
   - `invalid hmac seed: session key must not be empty`
   - `Cannot drop a runtime in a context where blocking is not allowed`
---
## Expected outcome
After this slice:
- sgclaw still exposes the same pipe/service contract
- Zhihu hotlist execution uses the Release-browser websocket contract proven by Task 1
- non-Zhihu behavior remains unchanged
- old direct-ws Zhihu assumptions are no longer ambiguous in production/tests
- if Option A won, Zhihu uses a direct Release-browser backend
- if Option B won, Zhihu uses the minimal helper-page / relay seam justified by the probe evidence