claw/docs/superpowers/plans/2026-04-03-zhihu-release-ws-function-callback-plan.md
木炎 bdf8e12246 feat: align browser callback runtime and export flows
Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:44:53 +08:00

# Zhihu Release WS Function-Callback Migration Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Move only the Zhihu direct-execution path to the new Release browser websocket interaction style while keeping the existing pipe protocol and non-Zhihu submit behavior unchanged.

**Architecture:** Keep `ClientMessage` / `ServiceMessage`, `run_submit_task_with_browser_backend(...)`, and the high-level Zhihu workflow steps unchanged. First prove the exact Release browser interaction contract with transcript-backed probes. Then implement the smallest Zhihu-scoped backend path that follows that proven contract. Do not globally rewire the submit path unless the probe evidence proves there is no narrower safe seam.

**Tech Stack:** Rust, tungstenite, existing sgclaw service/client pipe protocol, `docs/_tmp_sgbrowser_ws_api_doc.txt`, Release browser websocket at `ws://127.0.0.1:12345`, current Zhihu direct-execution workflow.

---
## Context
The user has now made the target behavior explicit:
- the browser has changed and the working reference behavior is the user-provided HTML page that connects to `ws://127.0.0.1:12345`
- that page sends a bootstrap registration frame: `{"type":"register","role":"web"}`
- browser requests are still JSON arrays such as `[window.location.href, "sgBrowserSetTheme", "1"]` and `[window.location.href, "sgBrowerserGetUrls", "showUrls"]`
- callback-bearing browser behavior is now centered on page-defined JS callback functions like `showUrls`, not on Rust directly reading a websocket callback frame as the final business result
- the existing sgclaw pipe protocol must remain unchanged
The current sgclaw drift that must be corrected is visible in:
- `src/browser/ws_protocol.rs`
  - `Action::Navigate` currently emits `sgHideBrowserCallAfterLoaded` with an inline `callBackJsToCpp(...)` string
- `src/browser/ws_backend.rs`
  - Rust currently waits for a browser websocket callback frame and treats that as the action result
- `tests/service_ws_session_test.rs:498-605`
- `tests/service_task_flow_test.rs:499-635`
  - existing **generic submit-flow** regressions still lock in the old direct raw-websocket callback-frame assumption
  - these are useful as non-regression guardrails, but they are not themselves Zhihu-specific regressions
Zhihu-specific verification must therefore be added explicitly instead of assuming those Baidu-path tests already cover Zhihu.
The new browser style proves these facts and only these facts so far:
1. sgclaw must handle a register-first websocket handshake
2. browser requests are still `[requesturl, action, ...args]`
3. some browser capabilities now return through page-defined callback functions like `showUrls`
4. the current direct raw-websocket callback expectation in the Zhihu path is no longer a safe assumption
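
The two frame shapes in facts 1 and 2 can be sketched with the standard library alone. This is a minimal illustration, not the `ws_protocol.rs` implementation: `json_string`, `register_frame`, and `request_frame` are hypothetical names, and production code would more likely build these frames with a JSON library.

```rust
/// Escape a string as a JSON string literal (surrounding quotes included).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Bootstrap frame sent first by the reference page.
fn register_frame() -> String {
    format!(
        "{{\"type\":{},\"role\":{}}}",
        json_string("register"),
        json_string("web")
    )
}

/// Browser request in the `[requesturl, action, ...args]` shape (string args only).
fn request_frame(request_url: &str, action: &str, args: &[&str]) -> String {
    let mut parts = vec![json_string(request_url), json_string(action)];
    parts.extend(args.iter().map(|a| json_string(a)));
    format!("[{}]", parts.join(","))
}
```

Calling `request_frame(url, "sgBrowerserGetUrls", &["showUrls"])` yields the same array shape the reference page sends, with the callback passed as a function name.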
The production seam is **not** pre-decided here. Task 1 must determine whether Zhihu can be integrated by:
- a direct Zhihu-scoped backend with no helper page (Option A in Task 2), or
- a helper page / relay design because named page callbacks are the only reliable result path (Option B in Task 2)
Until Task 1 evidence is captured, both remain hypotheses.
## Evidence to preserve in the implementation
### Browser websocket API doc
From `docs/_tmp_sgbrowser_ws_api_doc.txt`:
- `ws://localhost:12345` is the browser websocket endpoint
- request frames are array payloads with `requesturl`
- `sgBrowerserGetUrls(callback)` uses a callback **function name**: `[requesturl,"sgBrowerserGetUrls", callback]`
- `sgBrowserCallAfterLoaded(targetUrl, callback)` and `sgHideBrowserCallAfterLoaded(targetUrl, callback)` use callback strings with parentheses
- `callBackJsToCpp(param)` uses `sourceUrl@_@targetUrl@_@callback@_@actionUrl@_@responseTxt`
- `sgBrowserRegJsFun(targeturl, funContent)` and `sgBrowserExcuteJsFun(targeturl, funName)` exist and may be useful when the helper page needs durable callback helpers
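
The `callBackJsToCpp(param)` payload layout listed above can be split into its five fields as follows. This is a sketch under one stated assumption: the last field (`responseTxt`) may itself contain the `@_@` delimiter, so the split is capped at five parts. `CallbackPayload` and `parse_callback_param` are hypothetical names, not existing sgclaw items.

```rust
/// The five `@_@`-separated fields of a `callBackJsToCpp` parameter.
#[derive(Debug, PartialEq)]
struct CallbackPayload<'a> {
    source_url: &'a str,
    target_url: &'a str,
    callback: &'a str,
    action_url: &'a str,
    response_txt: &'a str,
}

/// Returns `None` when fewer than five fields are present.
/// Assumption: any extra `@_@` belongs to `response_txt`.
fn parse_callback_param(param: &str) -> Option<CallbackPayload<'_>> {
    let mut fields = param.splitn(5, "@_@");
    Some(CallbackPayload {
        source_url: fields.next()?,
        target_url: fields.next()?,
        callback: fields.next()?,
        action_url: fields.next()?,
        response_txt: fields.next()?,
    })
}
```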
### Current working HTML pattern from the user
The now-working reference interaction is:
```html
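<script>
  const socket = new WebSocket('ws://127.0.0.1:12345');
  socket.onopen = () => {
    // Bootstrap: register before sending any browser request.
    socket.send(JSON.stringify({ type: 'register', role: 'web' }));
  };
  // Hypothetical wrapper: call only after the socket is open
  // (the source does not show what triggers this request).
  function requestUrls() {
    socket.send(JSON.stringify([window.location.href, 'sgBrowerserGetUrls', 'showUrls']));
  }
  function showUrls(urls) {
    // browser invokes this page-defined callback
  }
</script>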
```
That is the browser behavior sgclaw now needs to follow.

---
## Critical files
### Production files to modify
- `src/browser/ws_protocol.rs`
- `src/compat/workflow_executor.rs` (only if a narrow Zhihu-specific correction is required after backend swap)
- `src/service/server.rs` (only if the chosen Zhihu-scoped integration seam must be wired here)
- `src/service/mod.rs` (only if startup plumbing changes are truly required)
- `src/browser/mod.rs`
### New production files likely needed
- `src/browser/zhihu_release_backend.rs`
  - a Zhihu-scoped `BrowserBackend` adapter that follows the proven Release browser interaction style without changing non-Zhihu routes
- `src/service/browser_callback_host.rs` **only if the probe proves a service-controlled helper page is actually required**
  - service-local helper-page lifecycle and callback relay, if evidence shows the browser cannot be driven safely without it
### Existing files to preserve
- `src/agent/task_runner.rs`
- `src/service/protocol.rs`
- `src/compat/orchestration.rs`
- `src/compat/runtime.rs`
- `src/pipe/*`
### Existing direct-ws files to review explicitly
- `src/browser/ws_backend.rs`
- `tests/browser_ws_backend_test.rs`
These files currently encode the old direct raw-websocket callback expectation. The implementation must either:
- leave them untouched as legacy/direct-contract coverage with no Zhihu production callers, or
- update/remove the Zhihu-specific assumptions they currently lock in.
### Primary test files
- `tests/browser_ws_probe_test.rs`
- `tests/browser_ws_protocol_test.rs`
- `tests/service_ws_session_test.rs`
- `tests/service_task_flow_test.rs`
- `tests/task_runner_test.rs`
- `tests/browser_ws_backend_test.rs`
---
## File structure decisions
### `src/browser/zhihu_release_backend.rs`
Prefer a Zhihu-scoped backend first.
Responsibilities:
- keep the same `BrowserBackend` trait surface
- implement only the behavior needed by the current Zhihu direct-execution route
- translate `Action::Navigate`, `Action::GetText`, and `Action::Eval` into the proven Release-browser interaction style
- normalize results back into `CommandOutput`
- avoid affecting non-Zhihu callers
This is the preferred seam because the user asked to change the current Zhihu flow, not to redesign the whole submit pipeline.
### `src/service/browser_callback_host.rs` (conditional)
Create this file only if Task 1 probe evidence proves that sgclaw must host or control a page in order to receive named callback-function results.
If it is needed, the plan must keep the design minimal and specific:
- one concrete transport only (choose websocket or HTTP, not “websocket or HTTP”)
- explicit readiness handshake
- explicit request correlation by `request_id`
- explicit cleanup when the submit task ends
If Task 1 shows a simpler seam, do not create this file.
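
If the callback host does turn out to be needed, the readiness, correlation, and cleanup requirements above could look roughly like this. It is a transport-free sketch: `CallbackRelay`, `RelayError`, and all method names are hypothetical, and the real design would attach this bookkeeping to whichever single transport Task 1 justifies.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum RelayError {
    NotReady,
    UnknownRequest(u64),
}

/// Tracks in-flight requests between the backend and a helper page.
#[derive(Default)]
struct CallbackRelay {
    ready: bool,
    next_id: u64,
    pending: HashMap<u64, String>, // request_id -> action name
}

impl CallbackRelay {
    /// Explicit readiness handshake: nothing may be sent before this.
    fn mark_ready(&mut self) {
        self.ready = true;
    }

    /// Allocate a `request_id` for an outbound action.
    fn begin(&mut self, action: &str) -> Result<u64, RelayError> {
        if !self.ready {
            return Err(RelayError::NotReady);
        }
        self.next_id += 1;
        self.pending.insert(self.next_id, action.to_string());
        Ok(self.next_id)
    }

    /// Match a callback result to its request; each id completes at most once.
    fn complete(&mut self, request_id: u64, result: String) -> Result<(String, String), RelayError> {
        let action = self
            .pending
            .remove(&request_id)
            .ok_or(RelayError::UnknownRequest(request_id))?;
        Ok((action, result))
    }

    /// Cleanup when the submit task ends; returns the ids that never completed.
    fn shutdown(&mut self) -> Vec<u64> {
        self.ready = false;
        let mut ids: Vec<u64> = self.pending.drain().map(|(id, _)| id).collect();
        ids.sort_unstable();
        ids
    }
}
```

Rejecting unknown or already-completed ids keeps a late or duplicated page callback from being attributed to the wrong action.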
### `src/browser/ws_protocol.rs`
Do not let this file keep only the old direct-callback assumption.
It should become the shared place for doc-native request builders such as:
- browser bootstrap frames proven by the transcript
- `sgBrowserCallAfterLoaded` / `sgHideBrowserCallAfterLoaded`
- `sgBrowserExcuteJsCodeByArea`
- optional `sgBrowserRegJsFun` / `sgBrowserExcuteJsFun`
But do **not** let `ws_protocol.rs` absorb service-host lifecycle logic.
### `src/browser/ws_backend.rs` and `tests/browser_ws_backend_test.rs`
Handle these explicitly in the implementation:
- if they still describe a valid direct browser contract, keep them as isolated legacy/direct-ws coverage only
- if their current navigate/callback assumptions conflict with the proven Release Zhihu path, update or narrow those tests so they no longer describe the active Zhihu integration path
Do not leave the old direct-callback assumptions ambiguously “reviewed”; the implementation must make their status explicit.

---
## Task 1: Capture the new Release browser contract in a reproducible probe transcript
**Files:**
- Review/modify: `src/browser/ws_probe.rs`
- Review/modify: `src/bin/sgbrowser_ws_probe.rs`
- Review/modify: `tests/browser_ws_probe_test.rs`
- Create: `docs/_tmp_release_ws_callback_host_transcript.md`
- [ ] **Step 1: Verify current probe coverage against the Release-browser questions**
Read the existing probe module and tests and check whether they already prove all of the following:
- a register-first websocket script can be expressed
- a later array action frame can be expressed in the same script
- per-step inbound frames/outcomes are preserved separately
- timeout/close remain distinguishable in the transcript
Required result:
- identify the exact existing tests that already prove these behaviors
- identify the smallest missing Release-specific coverage, if any
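
For orientation, the properties being checked (per-step separation, and timeout kept distinct from close) amount to a transcript model along these lines. The types below are hypothetical stand-ins and may not match what `src/browser/ws_probe.rs` actually defines.

```rust
use std::time::Duration;

/// What happened after one outbound frame was sent.
#[derive(Debug, Clone, PartialEq)]
enum StepOutcome {
    Received(Vec<String>), // exact inbound websocket text frames
    TimedOut(Duration),    // nothing arrived within the wait window
    Closed,                // peer closed the connection
}

#[derive(Debug, Clone, PartialEq)]
struct TranscriptStep {
    sent: String,
    outcome: StepOutcome,
}

/// Render one block per step so replies are never merged across steps.
fn render_transcript(steps: &[TranscriptStep]) -> String {
    let mut out = String::new();
    for (i, step) in steps.iter().enumerate() {
        out.push_str(&format!("step {}\n  sent: {}\n", i + 1, step.sent));
        match &step.outcome {
            StepOutcome::Received(frames) => {
                for frame in frames {
                    out.push_str(&format!("  recv: {}\n", frame));
                }
            }
            StepOutcome::TimedOut(wait) => {
                out.push_str(&format!("  timeout after {} ms\n", wait.as_millis()));
            }
            StepOutcome::Closed => out.push_str("  closed by peer\n"),
        }
    }
    out
}
```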
- [ ] **Step 2: Add only the missing regression coverage**
If current tests do **not** already prove the Release-browser bootstrap shape, add the narrowest failing regression in `tests/browser_ws_probe_test.rs`.
Preferred shape if coverage is missing:
```rust
#[test]
fn probe_supports_register_then_array_action_script() {
    // fake server expects:
    //   1. {"type":"register","role":"web"}
    //   2. ["http://127.0.0.1/helper.html","sgBrowerserGetUrls","showUrls"]
}
```
And, if still missing, add one regression proving per-step transcript separation for the register reply and later action reply.
If those behaviors are already covered, skip new test creation and record the exact test names to rely on.
- [ ] **Step 3: Run the relevant probe tests**
Run the narrowest exact tests that prove the Release bootstrap behavior, or the full file if multiple areas changed:
```bash
cargo test --test browser_ws_probe_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 4: Make the probe binary ergonomic for the Release transcript if needed**
Only if the current CLI cannot conveniently express the real Release-browser script, make the smallest change needed in `src/bin/sgbrowser_ws_probe.rs` / `src/browser/ws_probe.rs` so it can capture:
- register frame behavior
- minimal `sgBrowserSetTheme`
- minimal `sgBrowerserGetUrls`
- exact inbound websocket text per step
Do not redesign the probe if it already supports this.
- [ ] **Step 5: Run the live probe against the Release browser and record the real bootstrap**
Use the probe binary against the real endpoint to capture at minimum:
- register frame behavior
- minimal `sgBrowserSetTheme`
- minimal `sgBrowerserGetUrls`
- whether replies come back as websocket text, page-function invocation only, or both
Save the exact transcript in `docs/_tmp_release_ws_callback_host_transcript.md`.
Required output in that temp doc:
- exact sent frames
- exact received websocket frames
- the observed rule for when named callback functions are invoked
- whether Option A or Option B is supported by evidence
- [ ] **Step 6: Commit the probe-only slice if code changed**
If probe code/tests changed:
```bash
git add src/browser/ws_probe.rs src/bin/sgbrowser_ws_probe.rs tests/browser_ws_probe_test.rs docs/_tmp_release_ws_callback_host_transcript.md
git commit -m "test: capture release browser ws bootstrap contract"
```
If only the transcript doc changed, stage only that file and use a docs/test-appropriate commit message.

---
## Task 2: Choose the narrowest Zhihu-only production seam from the probe evidence
**Files:**
- Modify: `src/service/server.rs` (only if required)
- Modify: `src/browser/mod.rs`
- Modify: `src/compat/workflow_executor.rs` (only if required)
- Create: `src/browser/zhihu_release_backend.rs`
- Create: `src/service/browser_callback_host.rs` **only if required**
- Test: `tests/service_ws_session_test.rs`
- Test: `tests/service_task_flow_test.rs`
- [ ] **Step 1: Write down the seam decision in the plan notes before coding**
Based on the transcript from Task 1, record which one of these is supported by evidence:
- Option A: a Zhihu-scoped backend can talk to the Release browser directly with no service-hosted helper page
- Option B: a Zhihu-scoped backend needs a service-controlled helper page because named page callbacks are the only reliable way to get business results
Do not proceed until one option is chosen explicitly from evidence.
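
One possible way to make that decision mechanical is sketched below. The mapping is an assumption, not a rule stated by this plan: it treats Option A as viable only if every callback-bearing action observed in the transcript returned its result as websocket text, and falls back to Option B if any result arrived through a page function only. `ReplyChannel`, `SeamDecision`, and `decide_seam` are hypothetical names.

```rust
/// How the business result of one probed action came back.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReplyChannel {
    WebsocketText,
    PageFunctionOnly,
    Both,
    NoneObserved,
}

#[derive(Debug, PartialEq)]
enum SeamDecision {
    OptionA,   // direct Zhihu-scoped backend, no helper page
    OptionB,   // service-controlled helper page / relay
    Undecided, // evidence is missing or incomplete
}

/// Assumed reading of the transcript; adjust if Task 1 shows otherwise.
fn decide_seam(observed: &[ReplyChannel]) -> SeamDecision {
    if observed.is_empty() || observed.contains(&ReplyChannel::NoneObserved) {
        return SeamDecision::Undecided;
    }
    let all_readable_over_ws = observed
        .iter()
        .all(|c| matches!(c, ReplyChannel::WebsocketText | ReplyChannel::Both));
    if all_readable_over_ws {
        SeamDecision::OptionA
    } else {
        SeamDecision::OptionB
    }
}
```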
- [ ] **Step 2: Add a failing service/task-flow regression that proves only the Zhihu path changes**
Update or add focused tests so that:
- Zhihu submit flow uses the new Release-browser interaction seam
- non-Zhihu behavior is unchanged
- pipe messages remain unchanged
Required assertions:
- the new path is activated only for Zhihu route detection
- `ClientMessage` / `ServiceMessage` stay identical
- existing non-Zhihu submit behavior is not accidentally rerouted
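
The first and third assertions depend on a narrow route check. The sketch below assumes the route is detected from the target URL host, which this plan does not confirm; sgclaw may instead detect the Zhihu route from the task itself. `BackendKind`, `host_of`, and `select_backend` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum BackendKind {
    LegacyDirectWs, // existing path, unchanged for non-Zhihu callers
    ZhihuRelease,   // new Zhihu-scoped backend
}

/// Extract the host from an absolute URL (no IPv6 literal support).
fn host_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    let host_port = authority.rsplit('@').next()?; // drop any userinfo
    Some(host_port.split(':').next().unwrap_or(host_port))
}

/// Only exact `zhihu.com` or its subdomains select the new backend.
fn select_backend(target_url: &str) -> BackendKind {
    match host_of(target_url) {
        Some(h)
            if h.eq_ignore_ascii_case("zhihu.com")
                || h.to_ascii_lowercase().ends_with(".zhihu.com") =>
        {
            BackendKind::ZhihuRelease
        }
        _ => BackendKind::LegacyDirectWs,
    }
}
```

Matching on the parsed host rather than a substring keeps URLs that merely mention Zhihu in a path or query on the legacy path.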
- [ ] **Step 3: Run the new focused regression and confirm failure first**
Run only the tests you added, by exact name:
```bash
cargo test --test service_ws_session_test <new_test_name> -- --nocapture
cargo test --test service_task_flow_test <new_test_name> -- --nocapture
```
Expected: FAIL because the Zhihu-specific seam does not exist yet.
- [ ] **Step 4: Implement the chosen seam with the smallest blast radius**
If Option A won:
- add `src/browser/zhihu_release_backend.rs`
- wire it only where the Zhihu direct-execution route is selected
- leave global submit-path wiring alone
If Option B won:
- add `src/service/browser_callback_host.rs` with one specific transport and one explicit readiness/correlation model
- add `src/browser/zhihu_release_backend.rs` to talk to that helper path
- wire it only for the Zhihu route
In both cases:
- do not change non-Zhihu callers
- do not redesign `run_submit_task_with_browser_backend(...)`
- do not change the pipe protocol
- [ ] **Step 5: Make the status of old direct-ws code explicit**
Update `src/browser/ws_backend.rs` / `tests/browser_ws_backend_test.rs` only as needed so they no longer ambiguously describe the active Zhihu path.
Allowed outcomes:
- keep them untouched as legacy/direct-ws coverage with no Zhihu production caller
- narrow/update the tests so they no longer claim the active Zhihu integration path
Not allowed:
- leaving the plan and code in a state where both old and new paths appear to be the active Zhihu contract
- [ ] **Step 6: Run focused integration tests**
Run:
```bash
cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 7: Commit the seam-selection slice**
Adjust staged files to match the option actually implemented, for example:
```bash
git add src/browser/zhihu_release_backend.rs src/browser/mod.rs src/service/server.rs src/service/browser_callback_host.rs tests/service_ws_session_test.rs tests/service_task_flow_test.rs tests/browser_ws_backend_test.rs
git commit -m "feat: route zhihu flow through release browser ws contract"
```
Only stage files that were truly changed.

---
## Task 3: Implement Zhihu action mapping on the chosen Release-browser seam
**Files:**
- Modify: `src/browser/ws_protocol.rs`
- Modify: `src/browser/zhihu_release_backend.rs`
- Test: `tests/browser_ws_protocol_test.rs`
- Create: `tests/browser_zhihu_release_backend_test.rs`
- [ ] **Step 1: Write the first failing backend test for Zhihu navigate mapping**
Create `tests/browser_zhihu_release_backend_test.rs` with a fake transport/relay and assert that `Action::Navigate` for the Zhihu path becomes the exact browser request shape proven by Task 1.
Start with this shape:
```rust
#[test]
fn zhihu_release_backend_maps_navigate_to_proven_release_frame() {
    // invoke Action::Navigate
    // assert exact outbound frame/opcode chosen from transcript evidence
}
```
Required assertions:
- the call site still uses `BrowserBackend::invoke(...)`
- the exact outbound frame matches the recorded Release-browser evidence
- request correlation stays deterministic
- [ ] **Step 2: Run the single new backend test and verify it fails**
Run:
```bash
cargo test --test browser_zhihu_release_backend_test zhihu_release_backend_maps_navigate_to_proven_release_frame -- --nocapture
```
Expected: FAIL because the backend does not exist yet.
- [ ] **Step 3: Implement minimal `Navigate` support**
In `src/browser/zhihu_release_backend.rs`:
- implement `BrowserBackend`
- support `Action::Navigate` first
- use `ws_protocol.rs` helpers for exact browser-frame construction
- do not hardcode speculative opcodes; follow the transcript from Task 1
- [ ] **Step 4: Add failing tests for `GetText` and `Eval`**
Add tests proving:
- `Action::GetText` returns `CommandOutput.data == {"text": "..."}`
- `Action::Eval` returns `CommandOutput.data == {"text": "..."}`
- callback or relay failures become `PipeError::Protocol(...)`
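
The expected result shape can be illustrated with local stand-ins. `CommandOutput` and `PipeError` below are simplified placeholders for the real sgclaw types, whose actual fields are not shown in this plan, and `RelayReply` and `normalize_text_reply` are hypothetical. The JSON object `{"text": "..."}` is modelled as a string map to keep the sketch dependency-free.

```rust
use std::collections::BTreeMap;

/// Stand-in for sgclaw's error type; only the variant named in this plan.
#[derive(Debug, PartialEq)]
enum PipeError {
    Protocol(String),
}

/// Stand-in for sgclaw's `CommandOutput`; `data` models a JSON object.
#[derive(Debug, PartialEq)]
struct CommandOutput {
    data: BTreeMap<String, String>,
}

/// What the chosen seam handed back for a `GetText` or `Eval` action.
enum RelayReply {
    Text(String),
    Failed(String),
    Missing,
}

fn normalize_text_reply(reply: RelayReply) -> Result<CommandOutput, PipeError> {
    match reply {
        RelayReply::Text(text) => {
            let mut data = BTreeMap::new();
            data.insert("text".to_string(), text);
            Ok(CommandOutput { data })
        }
        RelayReply::Failed(reason) => Err(PipeError::Protocol(format!(
            "browser callback failed: {}",
            reason
        ))),
        RelayReply::Missing => Err(PipeError::Protocol(
            "browser callback never arrived".to_string(),
        )),
    }
}
```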
- [ ] **Step 5: Implement `GetText` and `Eval` on the chosen seam**
Use the smallest proven mechanism:
- if the transcript proves page-defined callback functions are required, route through them
- if `callBackJsToCpp(...)` to a page context is still part of the proven path, use it deliberately
- if `sgBrowserRegJsFun` / `sgBrowserExcuteJsFun` becomes necessary, add it only with test coverage and only for the Zhihu path
- [ ] **Step 6: Run focused backend/protocol tests**
Run:
```bash
cargo test --test browser_zhihu_release_backend_test -- --nocapture
cargo test --test browser_ws_protocol_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 7: Commit the Zhihu backend slice**
```bash
git add src/browser/ws_protocol.rs src/browser/zhihu_release_backend.rs src/browser/mod.rs tests/browser_ws_protocol_test.rs tests/browser_zhihu_release_backend_test.rs
git commit -m "feat: add zhihu release ws backend"
```
---
## Task 4: Keep the Zhihu workflow logic stable and patch only proven mismatches
**Files:**
- Review: `src/compat/workflow_executor.rs`
- Test: `tests/service_task_flow_test.rs`
- Test: `tests/compat_runtime_test.rs` (only if a focused direct-execution regression is needed)
- [ ] **Step 1: Write a failing Zhihu-specific regression only if the chosen seam changes route assumptions**
If the new Zhihu backend changes request-url or target-url handling enough to break hotlist flow, add one focused failing regression for that exact behavior.
Candidate assertions:
- hotlist navigate still logs `navigate https://www.zhihu.com/hot`
- follow-up `GetText body` still targets the Zhihu page, not any helper page
- extractor `Eval` still runs against Zhihu, not any helper page
- [ ] **Step 2: Keep the current high-level Zhihu action sequence unless a test proves otherwise**
`src/compat/workflow_executor.rs` currently does the right high-level work:
- navigate to Zhihu hotlist
- poll body text until ready
- run the extractor script
Prefer to keep this file unchanged. Only patch it if the new backend needs a narrow explicit `target_url` fix or similar evidence-backed adjustment.
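
The "poll body text until ready" step has roughly the following shape, which is why it should not need to change when the backend is swapped: only the `get_text` closure touches the backend. This is a generic sketch, not the code in `workflow_executor.rs`; `poll_until_ready`, the readiness predicate, and the string error type are assumptions.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Call `get_text` until `is_ready` accepts the result or `timeout` elapses.
/// Backend errors are returned immediately rather than retried.
fn poll_until_ready<F, P>(
    mut get_text: F,
    is_ready: P,
    timeout: Duration,
    interval: Duration,
) -> Result<String, String>
where
    F: FnMut() -> Result<String, String>,
    P: Fn(&str) -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        let text = get_text()?;
        if is_ready(&text) {
            return Ok(text);
        }
        if Instant::now() >= deadline {
            return Err("timed out waiting for page body".to_string());
        }
        thread::sleep(interval);
    }
}
```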
- [ ] **Step 3: Run the smallest Zhihu-focused verification sweep**
Run:
```bash
cargo test --test service_task_flow_test -- --nocapture
cargo test --test compat_runtime_test zhihu -- --nocapture
```
If the `compat_runtime_test zhihu` filter is too broad or unstable, run the exact focused Zhihu cases that cover hotlist extraction.
- [ ] **Step 4: Commit only if a Zhihu-specific code change was actually required**
```bash
git add src/compat/workflow_executor.rs tests/service_task_flow_test.rs tests/compat_runtime_test.rs
git commit -m "fix: keep zhihu workflow aligned with release ws backend"
```
Skip this commit if no production change in `workflow_executor.rs` was needed.

---
## Task 5: Prove that pipe behavior and non-Zhihu behavior stayed unchanged
**Files:**
- Test: `tests/service_ws_session_test.rs`
- Test: `tests/service_task_flow_test.rs`
- Test: `tests/task_runner_test.rs`
- [ ] **Step 1: Add or update one regression that proves pipe messages are unchanged**
Use the smallest existing test seam to assert that `ClientMessage` / `ServiceMessage` payloads remain unchanged while the Zhihu route uses the new browser integration path internally.
- [ ] **Step 2: Add or update one regression that proves non-Zhihu behavior is unchanged**
Use a non-Zhihu submit or service-session case and assert it does not take the new Zhihu-specific backend path.
- [ ] **Step 3: Preserve current runtime regression guards**
The end-to-end tests must continue asserting that output does **not** contain:
- `invalid hmac seed: session key must not be empty`
- `Cannot drop a runtime in a context where blocking is not allowed`
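
A small helper can keep these guards identical across test files. The sketch below is an assumption about how the checks might be shared, not existing test code; `FORBIDDEN_OUTPUT` and `forbidden_fragments` are hypothetical names.

```rust
/// Messages from earlier regressions that must never reappear in output.
const FORBIDDEN_OUTPUT: [&str; 2] = [
    "invalid hmac seed: session key must not be empty",
    "Cannot drop a runtime in a context where blocking is not allowed",
];

/// Return every forbidden fragment found in the captured output.
fn forbidden_fragments(output: &str) -> Vec<&'static str> {
    FORBIDDEN_OUTPUT
        .iter()
        .copied()
        .filter(|needle| output.contains(*needle))
        .collect()
}
```

An end-to-end test would then assert that `forbidden_fragments(&captured_output)` is empty, which reports the offending message by name when the guard fails.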
- [ ] **Step 4: Run the final focused verification sweep**
Run:
```bash
cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture
```
Expected: PASS.
- [ ] **Step 5: Commit the verification sweep**
```bash
git add tests/service_ws_session_test.rs tests/service_task_flow_test.rs tests/task_runner_test.rs tests/browser_ws_backend_test.rs
git commit -m "test: constrain zhihu release ws migration scope"
```
Only stage files that were truly changed.

---
## Out of scope
Do **not** do these in this slice:
- change the pipe protocol
- change `ClientMessage` / `ServiceMessage`
- redesign `run_submit_task_with_browser_backend(...)`
- reintroduce any browser bridge surface
- keep adding speculative direct-raw-websocket callback patches to `ws_backend.rs`
- redesign non-Zhihu workflows unless the new backend abstraction forces a shared fix
- create a long-lived external dependency or third-party server just to host the helper page
---
## Verification checklist
Run at minimum:
```bash
cargo test --test browser_ws_probe_test -- --nocapture
cargo test --test browser_zhihu_release_backend_test -- --nocapture
cargo test --test browser_ws_protocol_test -- --nocapture
cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture
```
If Task 2 chose the helper-page / relay design, also run the helper-page-specific backend tests you added for that path.
Manual verification after code changes:
1. start the real Release browser/runtime that exposes `ws://127.0.0.1:12345`
2. start `sg_claw` with real config
3. start `sg_claw_client`
4. submit:
   - `打开知乎热榜获取前10条数据并导出 Excel` (in English: "Open the Zhihu hot list, fetch the top 10 entries, and export them to Excel")
5. confirm the Zhihu path uses the exact Release-browser interaction seam proven by Task 1
6. if Task 2 chose Option B, confirm the helper page / relay path is used only for the Zhihu integration seam
7. confirm non-Zhihu behavior is unchanged
8. confirm the task completes without:
   - `timeout while waiting for browser message`
   - `invalid browser status frame: Welcome! You are client #1`
   - `invalid hmac seed: session key must not be empty`
   - `Cannot drop a runtime in a context where blocking is not allowed`
---
## Expected outcome
After this slice:
- sgclaw still exposes the same pipe/service contract
- Zhihu hotlist execution uses the Release-browser websocket contract proven by Task 1
- non-Zhihu behavior remains unchanged
- old direct-ws Zhihu assumptions are no longer ambiguous in production/tests
- if Option A won, Zhihu uses a direct Release-browser backend
- if Option B won, Zhihu uses the minimal helper-page / relay seam justified by the probe evidence