claw/docs/superpowers/plans/2026-04-03-zhihu-release-ws-function-callback-plan.md
木炎 bdf8e12246 feat: align browser callback runtime and export flows
Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:44:53 +08:00


Zhihu Release WS Function-Callback Migration Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Move only the Zhihu direct-execution path to the new Release browser websocket interaction style while keeping the existing pipe protocol and non-Zhihu submit behavior unchanged.

Architecture: Keep ClientMessage / ServiceMessage, run_submit_task_with_browser_backend(...), and the high-level Zhihu workflow steps unchanged. First prove the exact Release browser interaction contract with transcript-backed probes. Then implement the smallest Zhihu-scoped backend path that follows that proven contract. Do not globally rewire the submit path unless the probe evidence proves there is no narrower safe seam.

Tech Stack: Rust, tungstenite, existing sgclaw service/client pipe protocol, docs/_tmp_sgbrowser_ws_api_doc.txt, Release browser websocket at ws://127.0.0.1:12345, current Zhihu direct-execution workflow.


Context

The user has now made the target behavior explicit:

  • the browser has changed and the working reference behavior is the user-provided HTML page that connects to ws://127.0.0.1:12345
  • that page sends a bootstrap registration frame: {"type":"register","role":"web"}
  • browser requests are still JSON arrays such as [window.location.href, "sgBrowserSetTheme", "1"] and [window.location.href, "sgBrowerserGetUrls", "showUrls"]
  • callback-bearing browser behavior is now centered on page-defined JS callback functions like showUrls, not on Rust directly reading a websocket callback frame as the final business result
  • the existing sgclaw pipe protocol must remain unchanged

The current sgclaw drift that must be corrected is visible in:

  • src/browser/ws_protocol.rs
    • Action::Navigate currently emits sgHideBrowserCallAfterLoaded with an inline callBackJsToCpp(...) string
  • src/browser/ws_backend.rs
    • Rust currently waits for a browser websocket callback frame and treats that as the action result
  • tests/service_ws_session_test.rs:498-605
  • tests/service_task_flow_test.rs:499-635
    • existing generic submit-flow regressions still lock in the old direct raw-websocket callback-frame assumption
    • these are useful as non-regression guardrails, but they are not themselves Zhihu-specific regressions

Zhihu-specific verification must therefore be added explicitly instead of assuming that the generic submit-flow tests above, which exercise the Baidu path, already cover Zhihu.

The new browser style proves these facts and only these facts so far:

  1. sgclaw must handle a register-first websocket handshake
  2. browser requests are still [requesturl, action, ...args]
  3. some browser capabilities now return through page-defined callback functions like showUrls
  4. the current direct raw-websocket callback expectation in the Zhihu path is no longer a safe assumption

The production seam is not pre-decided here. Task 1 must determine whether Zhihu can be integrated by:

  • a direct Zhihu-scoped backend with no helper page, or
  • a helper page / relay design because named page callbacks are the only reliable result path

Until Task 1 evidence is captured, both remain hypotheses.

Evidence to preserve in the implementation

Browser websocket API doc

From docs/_tmp_sgbrowser_ws_api_doc.txt:

  • ws://localhost:12345 is the browser websocket endpoint
  • request frames are array payloads with requesturl
  • sgBrowerserGetUrls(callback) uses a callback function name: [requesturl,"sgBrowerserGetUrls", callback]
  • sgBrowserCallAfterLoaded(targetUrl, callback) and sgHideBrowserCallAfterLoaded(targetUrl, callback) use callback strings with parentheses
  • callBackJsToCpp(param) uses sourceUrl@_@targetUrl@_@callback@_@actionUrl@_@responseTxt
  • sgBrowserRegJsFun(targeturl, funContent) and sgBrowserExcuteJsFun(targeturl, funName) exist and may be useful when the helper page needs durable callback helpers
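The documented `callBackJsToCpp(param)` layout can be illustrated with a small parser. This is a minimal std-only sketch: `CallbackParam` and `parse_callback_param` are hypothetical names, and keeping any extra `@_@` inside `responseTxt` is an assumption that the Task 1 transcript should confirm or refute.

```rust
/// Fields of a callBackJsToCpp parameter, in the documented order.
#[derive(Debug, PartialEq)]
struct CallbackParam {
    source_url: String,
    target_url: String,
    callback: String,
    action_url: String,
    response_txt: String,
}

const SEP: &str = "@_@";

/// Parse sourceUrl@_@targetUrl@_@callback@_@actionUrl@_@responseTxt.
/// splitn(5, ..) stops after four separators, so a later "@_@" stays inside
/// response_txt (assumption: the browser does not escape the separator).
fn parse_callback_param(param: &str) -> Result<CallbackParam, String> {
    let parts: Vec<&str> = param.splitn(5, SEP).collect();
    if parts.len() != 5 {
        return Err(format!("expected 5 fields, got {}", parts.len()));
    }
    Ok(CallbackParam {
        source_url: parts[0].to_string(),
        target_url: parts[1].to_string(),
        callback: parts[2].to_string(),
        action_url: parts[3].to_string(),
        response_txt: parts[4].to_string(),
    })
}
```

Rejecting short payloads with an error, rather than filling in empty fields, keeps malformed callback frames visible in the transcript.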

Current working HTML pattern from the user

The now-working reference interaction is:

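const socket = new WebSocket('ws://127.0.0.1:12345');
socket.onopen = () => {
  // bootstrap registration frame, sent first
  socket.send(JSON.stringify({type: 'register', role: 'web'}));
  // array request; send() throws while the socket is still connecting,
  // so it is issued only once the connection is open
  socket.send(JSON.stringify([window.location.href, "sgBrowerserGetUrls", "showUrls"]));
};
function showUrls(urls) {
  // browser invokes this page-defined callback
}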

That is the browser behavior sgclaw now needs to follow.
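On the Rust side, the same two outbound frames could be produced as plain JSON text. This is a minimal std-only sketch: `json_string`, `register_frame`, and `array_request` are hypothetical helper names, not existing `ws_protocol.rs` functions, and production code would more likely use a JSON library.

```rust
/// Escape a string as a JSON string literal (hypothetical helper, std only).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Bootstrap frame sent first, matching the reference page.
fn register_frame() -> String {
    r#"{"type":"register","role":"web"}"#.to_string()
}

/// Array request of the form [requesturl, action, ...args].
fn array_request(request_url: &str, action: &str, args: &[&str]) -> String {
    let mut parts = vec![json_string(request_url), json_string(action)];
    parts.extend(args.iter().map(|a| json_string(a)));
    format!("[{}]", parts.join(","))
}
```

For example, `array_request(url, "sgBrowerserGetUrls", &["showUrls"])` yields the same array shape the page sends, with `url` standing in for `window.location.href`.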


Critical files

Production files to modify

  • src/browser/ws_protocol.rs
  • src/compat/workflow_executor.rs (only if a narrow Zhihu-specific correction is required after backend swap)
  • src/service/server.rs (only if the chosen Zhihu-scoped integration seam must be wired here)
  • src/service/mod.rs (only if startup plumbing changes are truly required)
  • src/browser/mod.rs

New production files likely needed

  • src/browser/zhihu_release_backend.rs
    • a Zhihu-scoped BrowserBackend adapter that follows the proven Release browser interaction style without changing non-Zhihu routes
  • src/service/browser_callback_host.rs only if the probe proves a service-controlled helper page is actually required
    • service-local helper-page lifecycle and callback relay, if evidence shows the browser cannot be driven safely without it

Existing files to preserve

  • src/agent/task_runner.rs
  • src/service/protocol.rs
  • src/compat/orchestration.rs
  • src/compat/runtime.rs
  • src/pipe/*

Existing direct-ws files to review explicitly

  • src/browser/ws_backend.rs
  • tests/browser_ws_backend_test.rs

These files currently encode the old direct raw-websocket callback expectation. The implementation must either:

  • leave them untouched as legacy/direct-contract coverage with no Zhihu production callers, or
  • update/remove the Zhihu-specific assumptions they currently lock in.

Primary test files

  • tests/browser_ws_probe_test.rs
  • tests/browser_ws_protocol_test.rs
  • tests/service_ws_session_test.rs
  • tests/service_task_flow_test.rs
  • tests/task_runner_test.rs
  • tests/browser_ws_backend_test.rs

File structure decisions

src/browser/zhihu_release_backend.rs

Prefer a Zhihu-scoped backend first.

Responsibilities:

  • keep the same BrowserBackend trait surface
  • implement only the behavior needed by the current Zhihu direct-execution route
  • translate Action::Navigate, Action::GetText, and Action::Eval into the proven Release-browser interaction style
  • normalize results back into CommandOutput
  • avoid affecting non-Zhihu callers

This is the preferred seam because the user asked to change the current Zhihu flow, not to redesign the whole submit pipeline.

src/service/browser_callback_host.rs (conditional)

Create this file only if Task 1 probe evidence proves that sgclaw must host or control a page in order to receive named callback-function results.

If it is needed, the plan must keep the design minimal and specific:

  • one concrete transport only (choose websocket or HTTP, not “websocket or HTTP”)
  • explicit readiness handshake
  • explicit request correlation by request_id
  • explicit cleanup when the submit task ends

If Task 1 shows a simpler seam, do not create this file.
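If the helper host does turn out to be required, the request correlation and cleanup requirements above could look roughly like this. It is a sketch under assumptions: `PendingRequests` and its methods are hypothetical, and `request_id` is modeled as a simple counter.

```rust
use std::collections::HashMap;

/// Hypothetical correlation table for a helper-page relay.
#[derive(Default)]
struct PendingRequests {
    next_id: u64,
    /// request_id -> action name, kept for diagnostics.
    pending: HashMap<u64, String>,
}

impl PendingRequests {
    /// Allocate a request_id and remember which action is waiting on it.
    fn register(&mut self, action: &str) -> u64 {
        self.next_id += 1;
        self.pending.insert(self.next_id, action.to_string());
        self.next_id
    }

    /// Match an inbound result to its request; unknown or repeated ids are errors.
    fn resolve(&mut self, request_id: u64) -> Result<String, String> {
        self.pending
            .remove(&request_id)
            .ok_or_else(|| format!("no pending request with id {}", request_id))
    }

    /// Cleanup when the submit task ends: drop everything still waiting
    /// and report how many requests were abandoned.
    fn clear(&mut self) -> usize {
        let abandoned = self.pending.len();
        self.pending.clear();
        abandoned
    }
}
```

Treating a second `resolve` for the same id as an error makes duplicate or stale callbacks show up as protocol failures instead of being silently accepted.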

src/browser/ws_protocol.rs

Do not let this file keep only the old direct-callback assumption.

It should become the shared place for doc-native request builders such as:

  • browser bootstrap frames proven by the transcript
  • sgBrowserCallAfterLoaded / sgHideBrowserCallAfterLoaded
  • sgBrowserExcuteJsCodeByArea
  • optional sgBrowserRegJsFun / sgBrowserExcuteJsFun

But do not let ws_protocol.rs absorb service-host lifecycle logic.

src/browser/ws_backend.rs and tests/browser_ws_backend_test.rs

Handle these explicitly in the implementation:

  • if they still describe a valid direct browser contract, keep them as isolated legacy/direct-ws coverage only
  • if their current navigate/callback assumptions conflict with the proven Release Zhihu path, update or narrow those tests so they no longer describe the active Zhihu integration path

Do not leave the old direct-callback assumptions ambiguously “reviewed”; the implementation must make their status explicit.


Task 1: Capture the new Release browser contract in a reproducible probe transcript

Files:

  • Review/modify: src/browser/ws_probe.rs

  • Review/modify: src/bin/sgbrowser_ws_probe.rs

  • Review/modify: tests/browser_ws_probe_test.rs

  • Create: docs/_tmp_release_ws_callback_host_transcript.md

  • Step 1: Verify current probe coverage against the Release-browser questions

Read the existing probe module and tests and check whether they already prove all of the following:

  • a register-first websocket script can be expressed
  • a later array action frame can be expressed in the same script
  • per-step inbound frames/outcomes are preserved separately
  • timeout/close remain distinguishable in the transcript

Required result:

  • identify the exact existing tests that already prove these behaviors

  • identify the smallest missing Release-specific coverage, if any
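As a reference point while reviewing the probe, the properties listed in this step could be modeled as follows. The types `StepOutcome` and `StepRecord` and the function `render_transcript` are hypothetical and are not taken from `src/browser/ws_probe.rs`; the sketch only shows what "per-step outcomes preserved separately" and "timeout/close distinguishable" mean in data terms.

```rust
/// What happened after one probe step was sent (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
enum StepOutcome {
    /// Inbound text frames, kept verbatim and in arrival order.
    Frames(Vec<String>),
    /// No inbound frame arrived before the step's deadline.
    Timeout,
    /// The peer closed the connection during this step.
    Closed,
}

/// One sent frame together with its own outcome.
#[derive(Debug, Clone, PartialEq)]
struct StepRecord {
    sent: String,
    outcome: StepOutcome,
}

/// Render a transcript so that each step keeps its own outcome section.
fn render_transcript(steps: &[StepRecord]) -> String {
    let mut out = String::new();
    for (i, step) in steps.iter().enumerate() {
        out.push_str(&format!("step {} sent: {}\n", i + 1, step.sent));
        match &step.outcome {
            StepOutcome::Frames(frames) => {
                for frame in frames {
                    out.push_str(&format!("  received: {}\n", frame));
                }
            }
            StepOutcome::Timeout => out.push_str("  outcome: timeout\n"),
            StepOutcome::Closed => out.push_str("  outcome: closed\n"),
        }
    }
    out
}
```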

  • Step 2: Add only the missing regression coverage

If current tests do not already prove the Release-browser bootstrap shape, add the narrowest failing regression in tests/browser_ws_probe_test.rs.

Preferred shape if coverage is missing:

#[test]
fn probe_supports_register_then_array_action_script() {
    // fake server expects:
    // 1. {"type":"register","role":"web"}
    // 2. ["http://127.0.0.1/helper.html","sgBrowerserGetUrls","showUrls"]
}

And, if still missing, add one regression proving per-step transcript separation for the register reply and later action reply.

If those behaviors are already covered, skip new test creation and record the exact test names to rely on.

  • Step 3: Run the relevant probe tests

Run the narrowest exact tests that prove the Release bootstrap behavior, or the full file if multiple areas changed:

cargo test --test browser_ws_probe_test -- --nocapture

Expected: PASS.

  • Step 4: Make the probe binary ergonomic for the Release transcript if needed

Only if the current CLI cannot conveniently express the real Release-browser script, make the smallest change needed in src/bin/sgbrowser_ws_probe.rs / src/browser/ws_probe.rs so it can capture:

  • register frame behavior
  • minimal sgBrowserSetTheme
  • minimal sgBrowerserGetUrls
  • exact inbound websocket text per step

Do not redesign the probe if it already supports this.

  • Step 5: Run the live probe against the Release browser and record the real bootstrap

Use the probe binary against the real endpoint to capture at minimum:

  • register frame behavior
  • minimal sgBrowserSetTheme
  • minimal sgBrowerserGetUrls
  • whether replies come back as websocket text, page-function invocation only, or both

Save the exact transcript in docs/_tmp_release_ws_callback_host_transcript.md.

Required output in that temp doc:

  • exact sent frames

  • exact received websocket frames

  • the observed rule for when named callback functions are invoked

  • whether Option A (a direct Zhihu-scoped backend with no helper page) or Option B (a helper page / relay design) is supported by evidence

  • Step 6: Commit the probe-only slice if code changed

If probe code/tests changed:

git add src/browser/ws_probe.rs src/bin/sgbrowser_ws_probe.rs tests/browser_ws_probe_test.rs docs/_tmp_release_ws_callback_host_transcript.md
git commit -m "test: capture release browser ws bootstrap contract"

If only the transcript doc changed, stage only that file and use a docs/test-appropriate commit message.


Task 2: Choose the narrowest Zhihu-only production seam from the probe evidence

Files:

  • Modify: src/service/server.rs (only if required)

  • Modify: src/browser/mod.rs

  • Modify: src/compat/workflow_executor.rs (only if required)

  • Create: src/browser/zhihu_release_backend.rs

  • Create: src/service/browser_callback_host.rs only if required

  • Test: tests/service_ws_session_test.rs

  • Test: tests/service_task_flow_test.rs

  • Step 1: Write down the seam decision in the plan notes before coding

Based on the transcript from Task 1, record which one of these is supported by evidence:

  • Option A: a Zhihu-scoped backend can talk to the Release browser directly with no service-hosted helper page
  • Option B: a Zhihu-scoped backend needs a service-controlled helper page because named page callbacks are the only reliable way to get business results

Do not proceed until one option is chosen explicitly from evidence.
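One way to keep this decision mechanical is to write the rule down as code next to the transcript. The sketch below is illustrative only: `ReplyChannel`, `Seam`, and `choose_seam` are hypothetical names, and the rule itself (any required call that replies through a page function only forces Option B) is an assumption to be checked against the recorded evidence.

```rust
/// How a reply was observed for one browser call in the transcript.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReplyChannel {
    WebsocketText,
    PageFunctionOnly,
    Both,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Seam {
    /// Direct Zhihu-scoped backend, no helper page.
    OptionA,
    /// Service-controlled helper page / relay.
    OptionB,
}

/// Pick a seam from the observations for the calls the Zhihu path needs.
/// Returns None when there is no evidence yet, mirroring "do not proceed".
fn choose_seam(observed: &[ReplyChannel]) -> Option<Seam> {
    if observed.is_empty() {
        return None;
    }
    // Assumed rule: one call whose result is reachable only through a
    // page-defined function is enough to require the helper page.
    if observed.contains(&ReplyChannel::PageFunctionOnly) {
        Some(Seam::OptionB)
    } else {
        Some(Seam::OptionA)
    }
}
```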

  • Step 2: Add a failing service/task-flow regression that proves only the Zhihu path changes

Update or add focused tests so that:

  • Zhihu submit flow uses the new Release-browser interaction seam
  • non-Zhihu behavior is unchanged
  • pipe messages remain unchanged

Required assertions:

  • the new path is activated only for Zhihu route detection

  • ClientMessage / ServiceMessage stay identical

  • existing non-Zhihu submit behavior is not accidentally rerouted
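The "activated only for Zhihu route detection" assertion can be pictured with a small selector. This is a sketch under a loud assumption: it detects the route from the target URL host, whereas the real sgclaw route detection may key off something else entirely. `BackendKind`, `host_of`, and `select_backend` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum BackendKind {
    /// Existing path, left unchanged for non-Zhihu callers.
    LegacyWs,
    /// New Zhihu-scoped Release-browser path.
    ZhihuRelease,
}

/// Extract the host from an http(s) URL without external crates.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    let host_port = authority.rsplit('@').next()?; // drop any userinfo
    let host = host_port.split(':').next()?; // drop any port
    if host.is_empty() { None } else { Some(host) }
}

/// Route only zhihu.com and its subdomains to the new backend.
fn select_backend(target_url: &str) -> BackendKind {
    let host = host_of(target_url).map(|h| h.to_ascii_lowercase());
    match host.as_deref() {
        Some(h) if h == "zhihu.com" || h.ends_with(".zhihu.com") => BackendKind::ZhihuRelease,
        _ => BackendKind::LegacyWs,
    }
}
```

Matching on the parsed host rather than a substring keeps lookalike URLs such as `https://zhihu.com.evil.example/` on the legacy path.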

  • Step 3: Run the new focused regression and confirm failure first

Run only the tests you added, by exact name:

cargo test --test service_ws_session_test <new_test_name> -- --nocapture
cargo test --test service_task_flow_test <new_test_name> -- --nocapture

Expected: FAIL because the Zhihu-specific seam does not exist yet.

  • Step 4: Implement the chosen seam with the smallest blast radius

If Option A won:

  • add src/browser/zhihu_release_backend.rs
  • wire it only where the Zhihu direct-execution route is selected
  • leave global submit-path wiring alone

If Option B won:

  • add src/service/browser_callback_host.rs with one specific transport and one explicit readiness/correlation model
  • add src/browser/zhihu_release_backend.rs to talk to that helper path
  • wire it only for the Zhihu route

In both cases:

  • do not change non-Zhihu callers

  • do not redesign run_submit_task_with_browser_backend(...)

  • do not change the pipe protocol

  • Step 5: Make the status of old direct-ws code explicit

Update src/browser/ws_backend.rs / tests/browser_ws_backend_test.rs only as needed so they no longer ambiguously describe the active Zhihu path.

Allowed outcomes:

  • keep them untouched as legacy/direct-ws coverage with no Zhihu production caller
  • narrow/update the tests so they no longer claim the active Zhihu integration path

Not allowed:

  • leaving the plan and code in a state where both old and new paths appear to be the active Zhihu contract

  • Step 6: Run focused integration tests

Run:

cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture

Expected: PASS.

  • Step 7: Commit the seam-selection slice

Adjust staged files to match the option actually implemented, for example:

git add src/browser/zhihu_release_backend.rs src/browser/mod.rs src/service/server.rs src/service/browser_callback_host.rs tests/service_ws_session_test.rs tests/service_task_flow_test.rs tests/browser_ws_backend_test.rs
git commit -m "feat: route zhihu flow through release browser ws contract"

Only stage files that were truly changed.


Task 3: Implement Zhihu action mapping on the chosen Release-browser seam

Files:

  • Modify: src/browser/ws_protocol.rs

  • Modify: src/browser/zhihu_release_backend.rs

  • Test: tests/browser_ws_protocol_test.rs

  • Create: tests/browser_zhihu_release_backend_test.rs

  • Step 1: Write the first failing backend test for Zhihu navigate mapping

Create tests/browser_zhihu_release_backend_test.rs with a fake transport/relay and assert that Action::Navigate for the Zhihu path becomes the exact browser request shape proven by Task 1.

Start with this shape:

#[test]
fn zhihu_release_backend_maps_navigate_to_proven_release_frame() {
    // invoke Action::Navigate
    // assert exact outbound frame/opcode chosen from transcript evidence
}

Required assertions:

  • the call site still uses BrowserBackend::invoke(...)

  • the exact outbound frame matches the recorded Release-browser evidence

  • request correlation stays deterministic

  • Step 2: Run the single new backend test and verify it fails

Run:

cargo test --test browser_zhihu_release_backend_test zhihu_release_backend_maps_navigate_to_proven_release_frame -- --nocapture

Expected: FAIL because the backend does not exist yet.

  • Step 3: Implement minimal Navigate support

In src/browser/zhihu_release_backend.rs:

  • implement BrowserBackend

  • support Action::Navigate first

  • use ws_protocol.rs helpers for exact browser-frame construction

  • do not hardcode speculative opcodes; follow the transcript from Task 1
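A fake-transport version of the Navigate mapping might look like the sketch below. Everything here is a stand-in: `Action`, `FrameSink`, `RecordingSink`, and this `ZhihuReleaseBackend` shape are hypothetical and do not reproduce sgclaw's real `BrowserBackend` trait. The opcode and callback are injected so that the value proven by the Task 1 transcript can be supplied instead of a hardcoded guess.

```rust
/// Hypothetical stand-in for sgclaw's Action, reduced to Navigate.
enum Action {
    Navigate { url: String },
}

/// Where outbound frames go; each frame is the element list of an array request.
trait FrameSink {
    fn send(&mut self, frame: Vec<String>) -> Result<(), String>;
}

/// Fake transport that records every outbound frame for assertions.
#[derive(Default)]
struct RecordingSink {
    sent: Vec<Vec<String>>,
}

impl FrameSink for RecordingSink {
    fn send(&mut self, frame: Vec<String>) -> Result<(), String> {
        self.sent.push(frame);
        Ok(())
    }
}

struct ZhihuReleaseBackend<S: FrameSink> {
    sink: S,
    request_url: String,
    /// Opcode taken from the Task 1 transcript, injected rather than hardcoded.
    navigate_opcode: String,
    /// Callback string passed with the navigate request.
    navigate_callback: String,
}

impl<S: FrameSink> ZhihuReleaseBackend<S> {
    /// Map an action onto a [requesturl, opcode, targetUrl, callback] frame.
    fn invoke(&mut self, action: Action) -> Result<(), String> {
        match action {
            Action::Navigate { url } => self.sink.send(vec![
                self.request_url.clone(),
                self.navigate_opcode.clone(),
                url,
                self.navigate_callback.clone(),
            ]),
        }
    }
}
```

In a test, the backend can be built with a placeholder opcode such as `sgBrowserCallAfterLoaded` from the API doc, then swapped for whatever the transcript actually proves.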

  • Step 4: Add failing tests for GetText and Eval

Add tests proving:

  • Action::GetText returns CommandOutput.data == {"text": "..."}

  • Action::Eval returns CommandOutput.data == {"text": "..."}

  • callback or relay failures become PipeError::Protocol(...)
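The three assertions above amount to one normalization function. The sketch below uses local stand-ins, not sgclaw's real definitions: `CommandOutput` is reduced to a string map for `data`, `PipeError` to its `Protocol` variant, and `RelayResult` is a hypothetical description of what the callback or relay path reports.

```rust
use std::collections::BTreeMap;

/// Stand-in for sgclaw's CommandOutput, reduced to the `data` object.
#[derive(Debug, PartialEq)]
struct CommandOutput {
    data: BTreeMap<String, String>,
}

/// Stand-in for sgclaw's PipeError, reduced to the Protocol variant.
#[derive(Debug, PartialEq)]
enum PipeError {
    Protocol(String),
}

/// Outcome reported by the callback or relay path (hypothetical).
enum RelayResult {
    Text(String),
    Failed(String),
    TimedOut,
}

/// GetText and Eval both normalize to data == {"text": ...};
/// every callback or relay failure becomes PipeError::Protocol.
fn normalize(result: RelayResult) -> Result<CommandOutput, PipeError> {
    match result {
        RelayResult::Text(text) => {
            let mut data = BTreeMap::new();
            data.insert("text".to_string(), text);
            Ok(CommandOutput { data })
        }
        RelayResult::Failed(reason) => Err(PipeError::Protocol(format!(
            "browser callback failed: {}",
            reason
        ))),
        RelayResult::TimedOut => Err(PipeError::Protocol(
            "timed out waiting for browser callback".to_string(),
        )),
    }
}
```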

  • Step 5: Implement GetText and Eval on the chosen seam

Use the smallest proven mechanism:

  • if the transcript proves page-defined callback functions are required, route through them

  • if callBackJsToCpp(...) to a page context is still part of the proven path, use it deliberately

  • if sgBrowserRegJsFun / sgBrowserExcuteJsFun becomes necessary, add it only with test coverage and only for the Zhihu path

  • Step 6: Run focused backend/protocol tests

Run:

cargo test --test browser_zhihu_release_backend_test -- --nocapture
cargo test --test browser_ws_protocol_test -- --nocapture

Expected: PASS.

  • Step 7: Commit the Zhihu backend slice

git add src/browser/ws_protocol.rs src/browser/zhihu_release_backend.rs src/browser/mod.rs tests/browser_ws_protocol_test.rs tests/browser_zhihu_release_backend_test.rs
git commit -m "feat: add zhihu release ws backend"

Task 4: Keep the Zhihu workflow logic stable and patch only proven mismatches

Files:

  • Review: src/compat/workflow_executor.rs

  • Test: tests/service_task_flow_test.rs

  • Test: tests/compat_runtime_test.rs (only if a focused direct-execution regression is needed)

  • Step 1: Write a failing Zhihu-specific regression only if the chosen seam changes route assumptions

If the new Zhihu backend changes request-url or target-url handling enough to break hotlist flow, add one focused failing regression for that exact behavior.

Candidate assertions:

  • hotlist navigate still logs navigate https://www.zhihu.com/hot

  • follow-up GetText body still targets the Zhihu page, not any helper page

  • extractor Eval still runs against Zhihu, not any helper page

  • Step 2: Keep the current high-level Zhihu action sequence unless a test proves otherwise

src/compat/workflow_executor.rs currently does the right high-level work:

  • navigate to Zhihu hotlist
  • poll body text until ready
  • run the extractor script

Prefer to keep this file unchanged. Only patch it if the new backend needs a narrow explicit target_url fix or similar evidence-backed adjustment.
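For orientation, the "poll body text until ready" step of that sequence has roughly the following shape. This is not the code in `workflow_executor.rs`: `poll_until_ready` and its parameters are hypothetical, and the sketch leaves any delay between attempts to the caller's `get_text` closure.

```rust
/// Poll `get_text` until `is_ready` accepts the text or attempts run out.
/// Errors from `get_text` are returned immediately instead of being retried.
fn poll_until_ready<G, R>(
    mut get_text: G,
    is_ready: R,
    max_attempts: usize,
) -> Result<String, String>
where
    G: FnMut() -> Result<String, String>,
    R: Fn(&str) -> bool,
{
    for _ in 0..max_attempts {
        let text = get_text()?;
        if is_ready(&text) {
            return Ok(text);
        }
    }
    Err(format!("page not ready after {} attempts", max_attempts))
}
```

Because the loop depends only on the two closures, the same logic works whether the body text arrives through a direct backend or a helper-page relay, which is why the workflow file should rarely need to change.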

  • Step 3: Run the smallest Zhihu-focused verification sweep

Run:

cargo test --test service_task_flow_test -- --nocapture
cargo test --test compat_runtime_test zhihu -- --nocapture

If the compat_runtime_test zhihu filter is too broad or unstable, run the exact focused Zhihu cases that cover hotlist extraction.

  • Step 4: Commit only if a Zhihu-specific code change was actually required

git add src/compat/workflow_executor.rs tests/service_task_flow_test.rs tests/compat_runtime_test.rs
git commit -m "fix: keep zhihu workflow aligned with release ws backend"

Skip this commit if no production change in workflow_executor.rs was needed.


Task 5: Prove that pipe behavior and non-Zhihu behavior stayed unchanged

Files:

  • Test: tests/service_ws_session_test.rs

  • Test: tests/service_task_flow_test.rs

  • Test: tests/task_runner_test.rs

  • Step 1: Add or update one regression that proves pipe messages are unchanged

Use the smallest existing test seam to assert that ClientMessage / ServiceMessage payloads remain unchanged while the Zhihu route uses the new browser integration path internally.

  • Step 2: Add or update one regression that proves non-Zhihu behavior is unchanged

Use a non-Zhihu submit or service-session case and assert it does not take the new Zhihu-specific backend path.

  • Step 3: Preserve current runtime regression guards

The end-to-end tests must continue asserting that output does not contain:

  • invalid hmac seed: session key must not be empty

  • Cannot drop a runtime in a context where blocking is not allowed
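These guards can be expressed as a small shared helper. The two message strings are copied from the list above; the helper name `forbidden_fragments` and the constant `FORBIDDEN_OUTPUT` are hypothetical, and how the existing end-to-end tests actually capture output is not shown here.

```rust
/// Runtime error messages that must never appear in end-to-end output.
const FORBIDDEN_OUTPUT: [&str; 2] = [
    "invalid hmac seed: session key must not be empty",
    "Cannot drop a runtime in a context where blocking is not allowed",
];

/// Return every forbidden message found in `output`, in list order.
fn forbidden_fragments(output: &str) -> Vec<&'static str> {
    FORBIDDEN_OUTPUT
        .iter()
        .copied()
        .filter(|fragment| output.contains(*fragment))
        .collect()
}
```

A test would then assert that `forbidden_fragments(&captured_output)` is empty, which reports all offending messages at once when it fails.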

  • Step 4: Run the final focused verification sweep

Run:

cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture

Expected: PASS.

  • Step 5: Commit the verification sweep

git add tests/service_ws_session_test.rs tests/service_task_flow_test.rs tests/task_runner_test.rs tests/browser_ws_backend_test.rs
git commit -m "test: constrain zhihu release ws migration scope"

Only stage files that were truly changed.


Out of scope

Do not do these in this slice:

  • change the pipe protocol
  • change ClientMessage / ServiceMessage
  • redesign run_submit_task_with_browser_backend(...)
  • reintroduce any browser bridge surface
  • keep adding speculative direct-raw-websocket callback patches to ws_backend.rs
  • redesign non-Zhihu workflows unless the new backend abstraction forces a shared fix
  • create a long-lived external dependency or third-party server just to host the helper page

Verification checklist

Run at minimum:

cargo test --test browser_ws_probe_test -- --nocapture
cargo test --test browser_zhihu_release_backend_test -- --nocapture
cargo test --test browser_ws_protocol_test -- --nocapture
cargo test --test service_ws_session_test -- --nocapture
cargo test --test service_task_flow_test -- --nocapture
cargo test --test task_runner_test -- --nocapture

If Task 2 chose the helper-page / relay design, also run the helper-page-specific backend tests you added for that path.

Manual verification after code changes:

  1. start the real Release browser/runtime that exposes ws://127.0.0.1:12345
  2. start sg_claw with real config
  3. start sg_claw_client
  4. submit:
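    • 打开知乎热榜获取前10条数据并导出 Excel (in English: "Open the Zhihu hot list, fetch the top 10 entries, and export them to Excel"; submit the original Chinese text)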
  5. confirm the Zhihu path uses the exact Release-browser interaction seam proven by Task 1
  6. if Task 2 chose Option B, confirm the helper page / relay path is used only for the Zhihu integration seam
  7. confirm non-Zhihu behavior is unchanged
  8. confirm the task completes without:
    • timeout while waiting for browser message
    • invalid browser status frame: Welcome! You are client #1
    • invalid hmac seed: session key must not be empty
    • Cannot drop a runtime in a context where blocking is not allowed

Expected outcome

After this slice:

  • sgclaw still exposes the same pipe/service contract
  • Zhihu hotlist execution uses the Release-browser websocket contract proven by Task 1
  • non-Zhihu behavior remains unchanged
  • old direct-ws Zhihu assumptions are no longer ambiguous in production/tests
  • if Option A won, Zhihu uses a direct Release-browser backend
  • if Option B won, Zhihu uses the minimal helper-page / relay seam justified by the probe evidence