Zhihu Release WS Function-Callback Migration Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Move only the Zhihu direct-execution path to the new Release browser websocket interaction style while keeping the existing pipe protocol and non-Zhihu submit behavior unchanged.
Architecture: Keep ClientMessage / ServiceMessage, run_submit_task_with_browser_backend(...), and the high-level Zhihu workflow steps unchanged. First prove the exact Release browser interaction contract with transcript-backed probes. Then implement the smallest Zhihu-scoped backend path that follows that proven contract. Do not globally rewire the submit path unless the probe evidence proves there is no narrower safe seam.
Tech Stack: Rust, tungstenite, existing sgclaw service/client pipe protocol, docs/_tmp_sgbrowser_ws_api_doc.txt, Release browser websocket at ws://127.0.0.1:12345, current Zhihu direct-execution workflow.
Context
The user has now made the target behavior explicit:
- the browser has changed, and the working reference behavior is the user-provided HTML page that connects to ws://127.0.0.1:12345
- that page sends a bootstrap registration frame: {"type":"register","role":"web"}
- browser requests are still JSON arrays such as [window.location.href, "sgBrowserSetTheme", "1"] and [window.location.href, "sgBrowerserGetUrls", "showUrls"]
- callback-bearing browser behavior is now centered on page-defined JS callback functions like showUrls, not on Rust directly reading a websocket callback frame as the final business result
- the existing sgclaw pipe protocol must remain unchanged
The current sgclaw drift that must be corrected is visible in:
- src/browser/ws_protocol.rs: Action::Navigate currently emits sgHideBrowserCallAfterLoaded with an inline callBackJsToCpp(...) string
- src/browser/ws_backend.rs: Rust currently waits for a browser websocket callback frame and treats that as the action result
- tests/service_ws_session_test.rs:498-605 and tests/service_task_flow_test.rs:499-635: existing generic submit-flow regressions still lock in the old direct raw-websocket callback-frame assumption; these are useful as non-regression guardrails, but they are not themselves Zhihu-specific regressions
Zhihu-specific verification must therefore be added explicitly instead of assuming those Baidu-path tests already cover Zhihu.
The new browser style proves these facts and only these facts so far:
- sgclaw must handle a register-first websocket handshake
- browser requests are still [requesturl, action, ...args]
- some browser capabilities now return through page-defined callback functions like showUrls
- the current direct raw-websocket callback expectation in the Zhihu path is no longer a safe assumption
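Here is a minimal sketch of the two frame shapes listed above, using only the Rust standard library. The helper names json_string, register_frame and action_frame are hypothetical and are not existing ws_protocol.rs functions. Production code would more likely build these frames with serde_json.

```rust
/// Encode `s` as a JSON string literal, escaping quotes, backslashes,
/// and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Bootstrap frame the reference page sends first.
fn register_frame() -> String {
    r#"{"type":"register","role":"web"}"#.to_string()
}

/// Action frame: [requesturl, action, ...args] as a JSON array of strings.
fn action_frame(request_url: &str, action: &str, args: &[&str]) -> String {
    let parts: Vec<String> = [request_url, action]
        .iter()
        .chain(args.iter())
        .map(|s| json_string(s))
        .collect();
    format!("[{}]", parts.join(","))
}
```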
The production seam is not pre-decided here. Task 1 must determine whether Zhihu can be integrated by:
- a direct Zhihu-scoped backend with no helper page, or
- a helper page / relay design because named page callbacks are the only reliable result path
Until Task 1 evidence is captured, both remain hypotheses.
Evidence to preserve in the implementation
Browser websocket API doc
From docs/_tmp_sgbrowser_ws_api_doc.txt:
- ws://localhost:12345 is the browser websocket endpoint
- request frames are array payloads with requesturl as the first element
- sgBrowerserGetUrls(callback) uses a callback function name: [requesturl, "sgBrowerserGetUrls", callback]
- sgBrowserCallAfterLoaded(targetUrl, callback) and sgHideBrowserCallAfterLoaded(targetUrl, callback) use callback strings with parentheses
- callBackJsToCpp(param) uses sourceUrl@_@targetUrl@_@callback@_@actionUrl@_@responseTxt
- sgBrowserRegJsFun(targeturl, funContent) and sgBrowserExcuteJsFun(targeturl, funName) exist and may be useful when the helper page needs durable callback helpers
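A small parser illustrates the callBackJsToCpp(param) layout. This sketch assumes two things:

- the five fields always appear in the documented order
- only responseTxt may itself contain the @_@ separator, which is why the parser splits at most four times

CallbackParam, parse_callback_param and build_callback_param are hypothetical names.

```rust
/// Fields of a callBackJsToCpp param, in the order listed in the API doc.
#[derive(Debug, PartialEq)]
struct CallbackParam<'a> {
    source_url: &'a str,
    target_url: &'a str,
    callback: &'a str,
    action_url: &'a str,
    response_txt: &'a str,
}

const SEP: &str = "@_@";

/// Split on "@_@" at most four times so any separator inside responseTxt
/// stays part of the response (an assumption, not documented behavior).
fn parse_callback_param(param: &str) -> Result<CallbackParam<'_>, String> {
    let mut parts = param.splitn(5, SEP);
    let mut next = |name: &str| {
        parts
            .next()
            .ok_or_else(|| format!("callBackJsToCpp param missing field `{name}`"))
    };
    Ok(CallbackParam {
        source_url: next("sourceUrl")?,
        target_url: next("targetUrl")?,
        callback: next("callback")?,
        action_url: next("actionUrl")?,
        response_txt: next("responseTxt")?,
    })
}

/// Inverse of `parse_callback_param`, handy for building fake inputs in tests.
fn build_callback_param(p: &CallbackParam<'_>) -> String {
    [p.source_url, p.target_url, p.callback, p.action_url, p.response_txt].join(SEP)
}
```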
Current working HTML pattern from the user
The now-working reference interaction is:
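```javascript
const socket = new WebSocket('ws://127.0.0.1:12345');

socket.onopen = () => {
  // Bootstrap: register this page as a web client before any action frame.
  socket.send(JSON.stringify({type: 'register', role: 'web'}));
  // Action frame [requesturl, action, callbackName]; sent only once the socket
  // is open, because send() on a still-connecting socket throws.
  socket.send(JSON.stringify([window.location.href, "sgBrowerserGetUrls", "showUrls"]));
};

function showUrls(urls) {
  // browser invokes this page-defined callback by name
}
```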
That is the browser behavior sgclaw now needs to follow.
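This page relies on a fixed ordering: the register frame goes first and array action frames follow. sgclaw can enforce that ordering with a very small state check. ReleaseSession and its methods are hypothetical. The sketch only records frames in memory and does not open a real websocket.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SessionState {
    Connected,
    Registered,
}

/// Hypothetical outbound queue that enforces the register-first handshake.
struct ReleaseSession {
    state: SessionState,
    sent: Vec<String>,
}

impl ReleaseSession {
    fn new() -> Self {
        Self { state: SessionState::Connected, sent: Vec::new() }
    }

    /// Send the bootstrap frame exactly once.
    fn register(&mut self) -> Result<(), String> {
        if self.state == SessionState::Registered {
            return Err("register frame already sent".into());
        }
        self.sent.push(r#"{"type":"register","role":"web"}"#.into());
        self.state = SessionState::Registered;
        Ok(())
    }

    /// Queue an array action frame; rejected until `register` has run.
    fn send_action(&mut self, frame: &str) -> Result<(), String> {
        if self.state != SessionState::Registered {
            return Err(format!("action sent before register: {frame}"));
        }
        self.sent.push(frame.to_string());
        Ok(())
    }
}
```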
Critical files
Production files to modify
- src/browser/ws_protocol.rs
- src/compat/workflow_executor.rs (only if a narrow Zhihu-specific correction is required after the backend swap)
- src/service/server.rs (only if the chosen Zhihu-scoped integration seam must be wired here)
- src/service/mod.rs (only if startup plumbing changes are truly required)
- src/browser/mod.rs
New production files likely needed
- src/browser/zhihu_release_backend.rs: a Zhihu-scoped BrowserBackend adapter that follows the proven Release browser interaction style without changing non-Zhihu routes
- src/service/browser_callback_host.rs, only if the probe proves a service-controlled helper page is actually required: service-local helper-page lifecycle and callback relay, if evidence shows the browser cannot be driven safely without it
Existing files to preserve
- src/agent/task_runner.rs
- src/service/protocol.rs
- src/compat/orchestration.rs
- src/compat/runtime.rs
- src/pipe/*
Existing direct-ws files to review explicitly
- src/browser/ws_backend.rs
- tests/browser_ws_backend_test.rs
These files currently encode the old direct raw-websocket callback expectation. The implementation must either:
- leave them untouched as legacy/direct-contract coverage with no Zhihu production callers, or
- update/remove the Zhihu-specific assumptions they currently lock in.
Primary test files
- tests/browser_ws_probe_test.rs
- tests/browser_ws_protocol_test.rs
- tests/service_ws_session_test.rs
- tests/service_task_flow_test.rs
- tests/task_runner_test.rs
- tests/browser_ws_backend_test.rs
File structure decisions
src/browser/zhihu_release_backend.rs
Prefer a Zhihu-scoped backend first.
Responsibilities:
- keep the same BrowserBackend trait surface
- implement only the behavior needed by the current Zhihu direct-execution route
- translate Action::Navigate, Action::GetText, and Action::Eval into the proven Release-browser interaction style
- normalize results back into CommandOutput
- avoid affecting non-Zhihu callers
This is the preferred seam because the user asked to change the current Zhihu flow, not to redesign the whole submit pipeline.
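Here is a rough shape for such an adapter. It assumes a BrowserBackend trait with a single invoke(action) method. Action, CommandOutput, PipeError and BrowserBackend below are simplified stand-ins, not the real sgclaw definitions. ReleaseTransport is a hypothetical abstraction over whichever channel Task 1 proves. Frame construction is injected as a function, so the opcodes come from the transcript rather than being hardcoded here.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real sgclaw types; signatures are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Navigate { url: String },
    GetText { selector: String },
    Eval { script: String },
    Click { selector: String },
}

#[derive(Debug, PartialEq)]
struct CommandOutput {
    data: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum PipeError {
    Protocol(String),
}

trait BrowserBackend {
    fn invoke(&mut self, action: Action) -> Result<CommandOutput, PipeError>;
}

/// Whatever channel Task 1 proves carries a request to the Release browser.
trait ReleaseTransport {
    fn request(&mut self, frame: &str) -> Result<String, String>;
}

/// Zhihu-only adapter; `frame_for` comes from the recorded transcript.
struct ZhihuReleaseBackend<T: ReleaseTransport> {
    transport: T,
    frame_for: fn(&Action) -> String,
}

impl<T: ReleaseTransport> BrowserBackend for ZhihuReleaseBackend<T> {
    fn invoke(&mut self, action: Action) -> Result<CommandOutput, PipeError> {
        let frame = match &action {
            Action::Navigate { .. } | Action::GetText { .. } | Action::Eval { .. } => {
                (self.frame_for)(&action)
            }
            other => {
                return Err(PipeError::Protocol(format!(
                    "not supported on the Zhihu release backend: {other:?}"
                )))
            }
        };
        let reply = self.transport.request(&frame).map_err(PipeError::Protocol)?;
        let mut data = BTreeMap::new();
        // Navigate only needs to succeed; GetText/Eval surface the reply as "text".
        if !matches!(action, Action::Navigate { .. }) {
            data.insert("text".to_string(), reply);
        }
        Ok(CommandOutput { data })
    }
}
```

The adapter rejects unsupported actions itself, which keeps the blast radius at the Zhihu route. Nothing other than Navigate, GetText and Eval reaches the Release browser through this path.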
src/service/browser_callback_host.rs (conditional)
Create this file only if Task 1 probe evidence proves that sgclaw must host or control a page in order to receive named callback-function results.
If it is needed, the plan must keep the design minimal and specific:
- one concrete transport only (choose websocket or HTTP, not “websocket or HTTP”)
- explicit readiness handshake
- explicit request correlation by request_id
- explicit cleanup when the submit task ends
If Task 1 shows a simpler seam, do not create this file.
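If the helper page does turn out to be required, the three constraints above (readiness, request_id correlation and cleanup) fit in one small table. This sketch uses hypothetical names (CallbackRelay, begin, complete, cleanup) and is built on std::sync::mpsc. It deliberately leaves out the transport (websocket or HTTP), because that choice must come from the probe.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical request table for a helper-page relay.
struct CallbackRelay {
    ready: bool,
    next_id: u64,
    pending: HashMap<u64, Sender<String>>,
}

impl CallbackRelay {
    fn new() -> Self {
        Self { ready: false, next_id: 1, pending: HashMap::new() }
    }

    /// Called once the helper page reports it is ready.
    fn mark_ready(&mut self) {
        self.ready = true;
    }

    /// Allocate a request_id and a receiver for its callback result.
    fn begin(&mut self) -> Result<(u64, Receiver<String>), String> {
        if !self.ready {
            return Err("helper page not ready".into());
        }
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = channel();
        self.pending.insert(id, tx);
        Ok((id, rx))
    }

    /// Route a callback result to its waiter; unknown ids are protocol errors.
    fn complete(&mut self, id: u64, payload: String) -> Result<(), String> {
        let tx = self
            .pending
            .remove(&id)
            .ok_or_else(|| format!("no pending request with request_id {id}"))?;
        tx.send(payload).map_err(|_| format!("waiter for request_id {id} is gone"))
    }

    /// Drop every pending sender so blocked receivers observe disconnection.
    fn cleanup(&mut self) {
        self.pending.clear();
        self.ready = false;
    }
}
```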
src/browser/ws_protocol.rs
Do not let this file keep only the old direct-callback assumption.
It should become the shared place for doc-native request builders such as:
- browser bootstrap frames proven by the transcript
- sgBrowserCallAfterLoaded / sgHideBrowserCallAfterLoaded
- sgBrowserExcuteJsCodeByArea
- optional sgBrowserRegJsFun / sgBrowserExcuteJsFun
But do not let ws_protocol.rs absorb service-host lifecycle logic.
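One such builder is sketched below. It emits the sgBrowserCallAfterLoaded / sgHideBrowserCallAfterLoaded frames and rejects callbacks without parentheses. The frame layout [requesturl, action, targetUrl, callback] is an assumption extrapolated from the [requesturl, action, ...args] rule; the doc excerpt does not spell it out. LoadedMode and call_after_loaded_frame are hypothetical names. The quoting helper only escapes backslashes and double quotes.

```rust
/// Which of the two documented "call after loaded" actions to emit.
#[derive(Clone, Copy)]
enum LoadedMode {
    Visible,
    Hidden,
}

/// Minimal JSON string quoting: escapes backslashes and double quotes only.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Build [requesturl, action, targetUrl, callback]; the callback must be a
/// call expression such as `onLoaded()` because the doc requires parentheses.
fn call_after_loaded_frame(
    mode: LoadedMode,
    request_url: &str,
    target_url: &str,
    callback: &str,
) -> Result<String, String> {
    if !(callback.contains('(') && callback.ends_with(')')) {
        return Err(format!("callback must include parentheses: {callback}"));
    }
    let action = match mode {
        LoadedMode::Visible => "sgBrowserCallAfterLoaded",
        LoadedMode::Hidden => "sgHideBrowserCallAfterLoaded",
    };
    Ok(format!(
        "[{},{},{},{}]",
        quote(request_url),
        quote(action),
        quote(target_url),
        quote(callback)
    ))
}
```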
src/browser/ws_backend.rs and tests/browser_ws_backend_test.rs
Handle these explicitly in the implementation:
- if they still describe a valid direct browser contract, keep them as isolated legacy/direct-ws coverage only
- if their current navigate/callback assumptions conflict with the proven Release Zhihu path, update or narrow those tests so they no longer describe the active Zhihu integration path
Do not leave the old direct-callback assumptions ambiguously “reviewed”; the implementation must make their status explicit.
Task 1: Capture the new Release browser contract in a reproducible probe transcript
Files:
- Review/modify: src/browser/ws_probe.rs
- Review/modify: src/bin/sgbrowser_ws_probe.rs
- Review/modify: tests/browser_ws_probe_test.rs
- Create: docs/_tmp_release_ws_callback_host_transcript.md
- [ ] Step 1: Verify current probe coverage against the Release-browser questions
Read the existing probe module and tests and check whether they already prove all of the following:
- a register-first websocket script can be expressed
- a later array action frame can be expressed in the same script
- per-step inbound frames/outcomes are preserved separately
- timeout/close remain distinguishable in the transcript
Required result:
- identify the exact existing tests that already prove these behaviors
- identify the smallest missing Release-specific coverage, if any
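A transcript step can carry its own outcome. That makes two requirements concrete: per-step frames stay separate, and timeout stays distinguishable from close. The types below are a hypothetical model for reasoning about coverage, not the existing ws_probe.rs API.

```rust
/// Outcome of one scripted probe step, kept separate per step.
#[derive(Debug, Clone, PartialEq)]
enum StepOutcome {
    /// Websocket text frames received before the step's read window ended.
    Frames(Vec<String>),
    /// No frame arrived within the step's timeout.
    Timeout,
    /// The server closed the connection during this step.
    Closed,
}

#[derive(Debug, Clone)]
struct TranscriptStep {
    sent: String,
    outcome: StepOutcome,
}

/// Render one line per step so timeout and close never collapse together.
fn render_transcript(steps: &[TranscriptStep]) -> String {
    steps
        .iter()
        .enumerate()
        .map(|(i, step)| {
            let received = match &step.outcome {
                StepOutcome::Frames(frames) => format!("frames={frames:?}"),
                StepOutcome::Timeout => "timeout".to_string(),
                StepOutcome::Closed => "closed".to_string(),
            };
            format!("step {}: sent={} {}", i + 1, step.sent, received)
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```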
- [ ] Step 2: Add only the missing regression coverage
If current tests do not already prove the Release-browser bootstrap shape, add the narrowest failing regression in tests/browser_ws_probe_test.rs.
Preferred shape if coverage is missing:
```rust
#[test]
fn probe_supports_register_then_array_action_script() {
    // fake server expects:
    // 1. {"type":"register","role":"web"}
    // 2. ["http://127.0.0.1/helper.html","sgBrowerserGetUrls","showUrls"]
}
```
And, if still missing, add one regression proving per-step transcript separation for the register reply and later action reply.
If those behaviors are already covered, skip new test creation and record the exact test names to rely on.
- [ ] Step 3: Run the relevant probe tests
Run the narrowest exact tests that prove the Release bootstrap behavior, or the full file if multiple areas changed:
```bash
cargo test --test browser_ws_probe_test -- --nocapture
```
Expected: PASS.
- [ ] Step 4: Make the probe binary ergonomic for the Release transcript if needed
Only if the current CLI cannot conveniently express the real Release-browser script, make the smallest change needed in src/bin/sgbrowser_ws_probe.rs / src/browser/ws_probe.rs so it can capture:
- register frame behavior
- minimal sgBrowserSetTheme
- minimal sgBrowerserGetUrls
- exact inbound websocket text per step
Do not redesign the probe if it already supports this.
- [ ] Step 5: Run the live probe against the Release browser and record the real bootstrap
Use the probe binary against the real endpoint to capture at minimum:
- register frame behavior
- minimal sgBrowserSetTheme
- minimal sgBrowerserGetUrls
- whether replies come back as websocket text, page-function invocation only, or both
Save the exact transcript in docs/_tmp_release_ws_callback_host_transcript.md.
Required output in that temp doc:
- exact sent frames
- exact received websocket frames
- the observed rule for when named callback functions are invoked
- whether Option A (direct Zhihu-scoped backend) or Option B (service-controlled helper page / relay) is supported by evidence
- [ ] Step 6: Commit the probe-only slice if code changed
If probe code/tests changed:
```bash
git add src/browser/ws_probe.rs src/bin/sgbrowser_ws_probe.rs tests/browser_ws_probe_test.rs docs/_tmp_release_ws_callback_host_transcript.md
git commit -m "test: capture release browser ws bootstrap contract"
```
If only the transcript doc changed, stage only that file and use a docs/test-appropriate commit message.
Task 2: Choose the narrowest Zhihu-only production seam from the probe evidence
Files:
- Modify: src/service/server.rs (only if required)
- Modify: src/browser/mod.rs
- Modify: src/compat/workflow_executor.rs (only if required)
- Create: src/browser/zhihu_release_backend.rs
- Create: src/service/browser_callback_host.rs (only if required)
- Test: tests/service_ws_session_test.rs
- Test: tests/service_task_flow_test.rs
- [ ] Step 1: Write down the seam decision in the plan notes before coding
Based on the transcript from Task 1, record which one of these is supported by evidence:
- Option A: a Zhihu-scoped backend can talk to the Release browser directly with no service-hosted helper page
- Option B: a Zhihu-scoped backend needs a service-controlled helper page because named page callbacks are the only reliable way to get business results
Do not proceed until one option is chosen explicitly from evidence.
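The decision rule can be written as a small function so that the recorded choice is unambiguous. This encoding is hypothetical and makes two assumptions explicit:

- if websocket text frames already carry the business result, the narrower Option A wins even when page callbacks also fire
- if there is no usable evidence, the work stops rather than guessing

```rust
/// What the Task 1 transcript showed for callback-bearing actions.
#[derive(Debug, Clone, Copy)]
struct ProbeEvidence {
    /// Business results arrived as websocket text frames sgclaw can read.
    ws_text_carries_result: bool,
    /// Results were delivered by invoking a named page function (e.g. showUrls).
    page_callback_invoked: bool,
}

#[derive(Debug, PartialEq)]
enum Seam {
    /// Option A: Zhihu-scoped backend talks to the browser directly.
    DirectBackend,
    /// Option B: service-controlled helper page receives named callbacks.
    HelperPageRelay,
}

/// Pick the seam only from evidence; `None` means "do not proceed yet".
fn choose_seam(e: ProbeEvidence) -> Option<Seam> {
    match (e.ws_text_carries_result, e.page_callback_invoked) {
        (true, _) => Some(Seam::DirectBackend),
        (false, true) => Some(Seam::HelperPageRelay),
        (false, false) => None,
    }
}
```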
- [ ] Step 2: Add a failing service/task-flow regression that proves only the Zhihu path changes
Update or add focused tests so that:
- Zhihu submit flow uses the new Release-browser interaction seam
- non-Zhihu behavior is unchanged
- pipe messages remain unchanged
Required assertions:
- the new path is activated only for Zhihu route detection
- ClientMessage / ServiceMessage stay identical
- existing non-Zhihu submit behavior is not accidentally rerouted
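One way to keep the new path Zhihu-only is a single selection function with a test for each side of the boundary. This sketch assumes route detection can be keyed on the target URL host (zhihu.com and its subdomains). The real sgclaw route detection may work on the instruction text instead. BackendChoice, host_of and select_backend are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum BackendChoice {
    /// Existing submit path, untouched.
    Existing,
    /// New Zhihu-scoped Release-browser backend.
    ZhihuRelease,
}

/// Extract the host from an http(s) URL without external crates.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let end = rest
        .find(|c: char| c == '/' || c == ':' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    Some(&rest[..end])
}

/// Only zhihu.com and its subdomains take the new path.
fn select_backend(target_url: &str) -> BackendChoice {
    match host_of(target_url) {
        Some(h) if h == "zhihu.com" || h.ends_with(".zhihu.com") => BackendChoice::ZhihuRelease,
        _ => BackendChoice::Existing,
    }
}
```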
- [ ] Step 3: Run the new focused regression and confirm failure first
Run the narrowest exact test names you added in:
```bash
cargo test --test service_ws_session_test <new_test_name> -- --nocapture
cargo test --test service_task_flow_test <new_test_name> -- --nocapture
```
Expected: FAIL because the Zhihu-specific seam does not exist yet.
- [ ] Step 4: Implement the chosen seam with the smallest blast radius
If Option A won:
- add src/browser/zhihu_release_backend.rs
- wire it only where the Zhihu direct-execution route is selected
- leave global submit-path wiring alone
If Option B won:
- add src/service/browser_callback_host.rs with one specific transport and one explicit readiness/correlation model
- add src/browser/zhihu_release_backend.rs to talk to that helper path
- wire it only for the Zhihu route
In both cases:
- do not change non-Zhihu callers
- do not redesign run_submit_task_with_browser_backend(...)
- do not change the pipe protocol
- [ ] Step 5: Make the status of old direct-ws code explicit
Update src/browser/ws_backend.rs / tests/browser_ws_backend_test.rs only as needed so they no longer ambiguously describe the active Zhihu path.
Allowed outcomes:
- keep them untouched as legacy/direct-ws coverage with no Zhihu production caller
- narrow/update the tests so they no longer claim the active Zhihu integration path
Not allowed:
- leaving the plan and code in a state where both old and new paths appear to be the active Zhihu contract
- [ ] Step 6: Run focused integration tests
Run:
```bash
cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture
```
Expected: PASS.
- [ ] Step 7: Commit the seam-selection slice
Adjust staged files to match the option actually implemented, for example:
```bash
git add src/browser/zhihu_release_backend.rs src/browser/mod.rs src/service/server.rs src/service/browser_callback_host.rs tests/service_ws_session_test.rs tests/service_task_flow_test.rs tests/browser_ws_backend_test.rs
git commit -m "feat: route zhihu flow through release browser ws contract"
```
Only stage files that were truly changed.
Task 3: Implement Zhihu action mapping on the chosen Release-browser seam
Files:
- Modify: src/browser/ws_protocol.rs
- Modify: src/browser/zhihu_release_backend.rs
- Test: tests/browser_ws_protocol_test.rs
- Create: tests/browser_zhihu_release_backend_test.rs
- [ ] Step 1: Write the first failing backend test for Zhihu navigate mapping
Create tests/browser_zhihu_release_backend_test.rs with a fake transport/relay and assert that Action::Navigate for the Zhihu path becomes the exact browser request shape proven by Task 1.
Start with this shape:
```rust
#[test]
fn zhihu_release_backend_maps_navigate_to_proven_release_frame() {
    // invoke Action::Navigate
    // assert exact outbound frame/opcode chosen from transcript evidence
}
```
Required assertions:
- the call site still uses BrowserBackend::invoke(...)
- the exact outbound frame matches the recorded Release-browser evidence
- request correlation stays deterministic
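A fake transport that records outbound frames keeps the exact-frame and deterministic-correlation assertions cheap. In this sketch, request ids come from a plain counter starting at 1. A test would push the literal frame copied from the Task 1 transcript through request and compare sent against it. RecordingTransport and RequestIds are hypothetical names.

```rust
use std::collections::VecDeque;

/// Test double that records every outbound frame and replays scripted replies.
struct RecordingTransport {
    sent: Vec<String>,
    replies: VecDeque<Result<String, String>>,
}

impl RecordingTransport {
    fn new(replies: Vec<Result<String, String>>) -> Self {
        Self { sent: Vec::new(), replies: replies.into() }
    }

    fn request(&mut self, frame: &str) -> Result<String, String> {
        self.sent.push(frame.to_string());
        self.replies
            .pop_front()
            .unwrap_or_else(|| Err("no scripted reply left".to_string()))
    }
}

/// Deterministic correlation: ids start at 1 and increase by one per request.
struct RequestIds {
    next: u64,
}

impl RequestIds {
    fn new() -> Self {
        Self { next: 1 }
    }

    fn next_request_id(&mut self) -> u64 {
        let id = self.next;
        self.next += 1;
        id
    }
}
```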
- [ ] Step 2: Run the single new backend test and verify it fails
Run:
```bash
cargo test --test browser_zhihu_release_backend_test zhihu_release_backend_maps_navigate_to_proven_release_frame -- --nocapture
```
Expected: FAIL because the backend does not exist yet.
- [ ] Step 3: Implement minimal Navigate support
In src/browser/zhihu_release_backend.rs:
- implement BrowserBackend
- support Action::Navigate first
- use ws_protocol.rs helpers for exact browser-frame construction
- do not hardcode speculative opcodes; follow the transcript from Task 1
- [ ] Step 4: Add failing tests for GetText and Eval
Add tests proving:
- Action::GetText returns CommandOutput.data == {"text": "..."}
- Action::Eval returns CommandOutput.data == {"text": "..."}
- callback or relay failures become PipeError::Protocol(...)
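Both assertions could be satisfied by funnelling every relay reply through one normalization function. GetText and Eval then share the {"text": ...} shape, and every failure becomes PipeError::Protocol. CommandOutput and PipeError are simplified stand-ins: data is modelled as a string map, not a JSON value. RelayReply is a hypothetical name for whatever the chosen seam returns.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
struct CommandOutput {
    data: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum PipeError {
    Protocol(String),
}

/// What the callback/relay layer can hand back for one request.
enum RelayReply {
    /// A named page callback fired with this payload.
    Callback(String),
    /// No callback arrived before the deadline.
    Timeout,
    /// The browser or helper page reported an error string.
    Failed(String),
}

/// Normalize a GetText/Eval reply into {"text": ...} or a protocol error.
fn normalize_text_reply(action: &str, reply: RelayReply) -> Result<CommandOutput, PipeError> {
    match reply {
        RelayReply::Callback(text) => {
            let mut data = BTreeMap::new();
            data.insert("text".to_string(), text);
            Ok(CommandOutput { data })
        }
        RelayReply::Timeout => Err(PipeError::Protocol(format!("{action}: callback timed out"))),
        RelayReply::Failed(msg) => Err(PipeError::Protocol(format!("{action}: {msg}"))),
    }
}
```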
- [ ] Step 5: Implement GetText and Eval on the chosen seam
Use the smallest proven mechanism:
- if the transcript proves page-defined callback functions are required, route through them
- if callBackJsToCpp(...) to a page context is still part of the proven path, use it deliberately
- if sgBrowserRegJsFun / sgBrowserExcuteJsFun becomes necessary, add it only with test coverage and only for the Zhihu path
- [ ] Step 6: Run focused backend/protocol tests
Run:
```bash
cargo test --test browser_zhihu_release_backend_test -- --nocapture
cargo test --test browser_ws_protocol_test -- --nocapture
```
Expected: PASS.
- [ ] Step 7: Commit the Zhihu backend slice
```bash
git add src/browser/ws_protocol.rs src/browser/zhihu_release_backend.rs src/browser/mod.rs tests/browser_ws_protocol_test.rs tests/browser_zhihu_release_backend_test.rs
git commit -m "feat: add zhihu release ws backend"
```
Task 4: Keep the Zhihu workflow logic stable and patch only proven mismatches
Files:
- Review: src/compat/workflow_executor.rs
- Test: tests/service_task_flow_test.rs
- Test: tests/compat_runtime_test.rs (only if a focused direct-execution regression is needed)
- [ ] Step 1: Write a failing Zhihu-specific regression only if the chosen seam changes route assumptions
If the new Zhihu backend changes request-url or target-url handling enough to break hotlist flow, add one focused failing regression for that exact behavior.
Candidate assertions:
- hotlist navigate still logs navigate https://www.zhihu.com/hot
- follow-up GetText body still targets the Zhihu page, not any helper page
- extractor Eval still runs against Zhihu, not any helper page
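These assertions reduce to checking which page each recorded action ran on. The helper below is a hypothetical sketch with three assumptions:

- the test can capture (action, target_url) pairs from the run
- any GetText or Eval whose host is not zhihu.com or a subdomain counts as a helper-page leak
- it is enough to report the first violation

```rust
/// One entry from a recorded Zhihu run: the action name and the page it ran on.
struct LoggedAction<'a> {
    action: &'a str,
    target_url: &'a str,
}

fn is_zhihu(url: &str) -> bool {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))
        .unwrap_or(url);
    let host = rest.split(|c: char| c == '/' || c == ':' || c == '?').next().unwrap_or("");
    host == "zhihu.com" || host.ends_with(".zhihu.com")
}

/// Return the first violation: wrong navigate line, or GetText/Eval on a non-Zhihu page.
fn check_hotlist_log(log: &[LoggedAction<'_>]) -> Result<(), String> {
    let first = log.first().ok_or("empty action log")?;
    let nav_line = format!("{} {}", first.action, first.target_url);
    if nav_line != "navigate https://www.zhihu.com/hot" {
        return Err(format!("unexpected first action: {nav_line}"));
    }
    for entry in &log[1..] {
        if matches!(entry.action, "GetText" | "Eval") && !is_zhihu(entry.target_url) {
            return Err(format!("{} ran against {}", entry.action, entry.target_url));
        }
    }
    Ok(())
}
```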
- [ ] Step 2: Keep the current high-level Zhihu action sequence unless a test proves otherwise
src/compat/workflow_executor.rs currently does the right high-level work:
- navigate to Zhihu hotlist
- poll body text until ready
- run the extractor script
Prefer to keep this file unchanged. Only patch it if the new backend needs a narrow explicit target_url fix or similar evidence-backed adjustment.
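For reference, the high-level sequence can be sketched as a closure-driven loop. The sketch also shows where a backend swap could leak in: only through the three closures. This is not the contents of workflow_executor.rs. The readiness test (any non-blank body text), the poll budget and the function name run_hotlist_sequence are assumptions.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Navigate to the hotlist, poll body text until ready, then run the extractor.
fn run_hotlist_sequence<N, G, E>(
    mut navigate: N,
    mut get_body_text: G,
    mut eval_extractor: E,
    max_polls: usize,
    poll_interval: Duration,
) -> Result<String, String>
where
    N: FnMut(&str) -> Result<(), String>,
    G: FnMut() -> Result<String, String>,
    E: FnMut() -> Result<String, String>,
{
    navigate("https://www.zhihu.com/hot")?;
    for attempt in 1..=max_polls {
        let body = get_body_text()?;
        // Readiness heuristic is an assumption: any non-blank body text counts.
        if !body.trim().is_empty() {
            return eval_extractor();
        }
        if attempt < max_polls {
            sleep(poll_interval);
        }
    }
    Err(format!("body text not ready after {max_polls} polls"))
}
```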
- [ ] Step 3: Run the smallest Zhihu-focused verification sweep
Run:
```bash
cargo test --test service_task_flow_test -- --nocapture
cargo test --test compat_runtime_test zhihu -- --nocapture
```
If the compat_runtime_test zhihu filter is too broad or unstable, run the exact focused Zhihu cases that cover hotlist extraction.
- [ ] Step 4: Commit only if a Zhihu-specific code change was actually required
```bash
git add src/compat/workflow_executor.rs tests/service_task_flow_test.rs tests/compat_runtime_test.rs
git commit -m "fix: keep zhihu workflow aligned with release ws backend"
```
Skip this commit if no production change in workflow_executor.rs was needed.
Task 5: Prove that pipe behavior and non-Zhihu behavior stayed unchanged
Files:
- Test: tests/service_ws_session_test.rs
- Test: tests/service_task_flow_test.rs
- Test: tests/task_runner_test.rs
- [ ] Step 1: Add or update one regression that proves pipe messages are unchanged
Use the smallest existing test seam to assert that ClientMessage / ServiceMessage payloads remain unchanged while the Zhihu route uses the new browser integration path internally.
- [ ] Step 2: Add or update one regression that proves non-Zhihu behavior is unchanged
Use a non-Zhihu submit or service-session case and assert it does not take the new Zhihu-specific backend path.
- [ ] Step 3: Preserve current runtime regression guards
The end-to-end tests must continue asserting that output does not contain:
- invalid hmac seed: session key must not be empty
- Cannot drop a runtime in a context where blocking is not allowed
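A small helper can keep these guards in one place, so every end-to-end test checks the same strings. The two fragments come from the list above. runtime_regressions and FORBIDDEN_OUTPUT are hypothetical names. A test would assert that runtime_regressions(&output) is empty, and the manual-verification messages could be appended to the same list.

```rust
/// Output fragments that must never appear in end-to-end test output.
const FORBIDDEN_OUTPUT: &[&str] = &[
    "invalid hmac seed: session key must not be empty",
    "Cannot drop a runtime in a context where blocking is not allowed",
];

/// Return every forbidden fragment found in `output`, in declaration order.
fn runtime_regressions(output: &str) -> Vec<&'static str> {
    FORBIDDEN_OUTPUT
        .iter()
        .copied()
        .filter(|needle| output.contains(*needle))
        .collect()
}
```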
- [ ] Step 4: Run the final focused verification sweep
Run:
```bash
cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture
```
Expected: PASS.
- [ ] Step 5: Commit the verification sweep
```bash
git add tests/service_ws_session_test.rs tests/service_task_flow_test.rs tests/task_runner_test.rs tests/browser_ws_backend_test.rs
git commit -m "test: constrain zhihu release ws migration scope"
```
Only stage files that were truly changed.
Out of scope
Do not do these in this slice:
- change the pipe protocol
- change ClientMessage / ServiceMessage
- redesign run_submit_task_with_browser_backend(...)
- reintroduce any browser bridge surface
- keep adding speculative direct-raw-websocket callback patches to ws_backend.rs
- redesign non-Zhihu workflows unless the new backend abstraction forces a shared fix
- create a long-lived external dependency or third-party server just to host the helper page
Verification checklist
Run at minimum:
```bash
cargo test --test browser_ws_probe_test -- --nocapture
cargo test --test browser_zhihu_release_backend_test -- --nocapture
cargo test --test browser_ws_protocol_test -- --nocapture
cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture
```
If Task 2 chose the helper-page / relay design, also run the helper-page-specific backend tests you added for that path.
Manual verification after code changes:
- start the real Release browser/runtime that exposes ws://127.0.0.1:12345
- start sg_claw with real config
- start sg_claw_client
- submit: 打开知乎热榜,获取前10条数据,并导出 Excel (Open the Zhihu hot list, fetch the top 10 entries, and export them to Excel)
- confirm the Zhihu path uses the exact Release-browser interaction seam proven by Task 1
- if Task 2 chose Option B, confirm the helper page / relay path is used only for the Zhihu integration seam
- confirm non-Zhihu behavior is unchanged
- confirm the task completes without:
  - timeout while waiting for browser message
  - invalid browser status frame: Welcome! You are client #1
  - invalid hmac seed: session key must not be empty
  - Cannot drop a runtime in a context where blocking is not allowed
Expected outcome
After this slice:
- sgclaw still exposes the same pipe/service contract
- Zhihu hotlist execution uses the Release-browser websocket contract proven by Task 1
- non-Zhihu behavior remains unchanged
- old direct-ws Zhihu assumptions are no longer ambiguous in production/tests
- if Option A won, Zhihu uses a direct Release-browser backend
- if Option B won, Zhihu uses the minimal helper-page / relay seam justified by the probe evidence