feat: align browser callback runtime and export flows
Consolidate the browser task runtime around the callback path, add safer artifact opening for Zhihu exports, and cover the new service/browser flows with focused tests and supporting docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -0,0 +1,506 @@
# WS Browser Backend Authentication Replacement Design

## Background

The `sg_claw` websocket service path can already accept `sg_claw_client` requests, reuse the shared task runner, connect to the real browser websocket address `browser_ws_url`, and enter the real skill execution chain. During live integration testing, however, every browser-related call fails with:

- `invalid hmac seed: session key must not be empty`

The root cause has been identified:

- Pipe mode in [src/lib.rs](src/lib.rs) obtains `session_key` from the handshake and uses it to construct `BrowserPipeTool`.
- WS service mode in [src/service/server.rs](src/service/server.rs) still constructs `BrowserPipeTool::new(..., vec![])`.
- `BrowserPipeTool`'s authentication model requires a non-empty session key. So although the ws service path speaks the browser websocket protocol, it still wrongly depends on the pipe-specific HMAC/session-key semantics.

As a result:

1. The `sg_claw_client -> sg_claw` connection works.
2. Skill loading and model calls work.
3. Real browser actions start executing.
4. But every browser tool call fails at the authentication layer.

## Goal

With changes **limited to ws mode**, switch the `sg_claw` service path to a **ws-native browser backend** that no longer depends on `BrowserPipeTool`'s pipe session-key authentication model, so that live browser integration works.

## Constraints

The change must:

- Touch only the ws-mode implementation
- Not break legacy pipe mode
- Not change pipe handshake semantics
- Not change the behavior of the pipe main entry point in `src/lib.rs`
- Not introduce temporary authentication bypasses or a fake seed
- Not expand scope to multiple clients, multiple tasks, queuing, or daemon management

## Non-goals

This change does not include:

- Auto-launching sgBrowser
- Browser process management
- Support for multiple browser instances
- Service/client UX improvements
- Browser ws protocol extensions
- Pipe mode refactoring
- A unified refactor that makes every runtime layer depend solely on `BrowserBackend`

## Current State Analysis

### Working pipe path

Pipe mode currently does the following in [src/lib.rs](src/lib.rs):

1. Reads the browser-side initialization message via `perform_handshake(...)`
2. Takes `session_key` from the handshake
3. Constructs the browser tool with `BrowserPipeTool::new(transport.clone(), mac_policy, handshake.session_key)`
4. Runs subsequent browser actions with pipe/HMAC semantics

This path already works and must not be touched in this change.

### Current ws service path

WS mode currently does the following in [src/service/server.rs](src/service/server.rs):

1. `sg_claw_client` sends the task to the `sg_claw` service
2. The service constructs a `ServiceBrowserTransport`
3. The service calls `BrowserPipeTool::new(transport.clone(), mac_policy.clone(), vec![])`
4. `ServiceBrowserTransport` encodes browser actions as browser websocket requests and sends them to `browser_ws_url`

The problem is step 3:

- The service speaks the browser websocket protocol
- Yet it still uses `BrowserPipeTool`
- `BrowserPipeTool` still insists on a pipe session key internally
- So live ws integration fails immediately

### Existing ws-native capabilities

The codebase already contains:

- [src/browser/ws_protocol.rs](src/browser/ws_protocol.rs): a fixed browser websocket protocol codec
- [src/browser/ws_backend.rs](src/browser/ws_backend.rs): `WsBrowserBackend`
- [src/browser/mod.rs](src/browser/mod.rs): already exports `WsBrowserBackend`

`WsBrowserBackend` does not depend on a pipe session key. Instead, it:

- Sends and receives text frames through `WsClient`
- Uses `MacPolicy` for per-action checks
- Handles the ws protocol through `encode_v1_action(...)` and `decode_callback_frame(...)`

This is exactly the model the ws service mode should use.

## Key Integration Seam

The actual seam in the shared runner has been confirmed:

- `run_submit_task(...)` in [src/agent/task_runner.rs](src/agent/task_runner.rs) still requires `&BrowserPipeTool<T>` directly
- [src/compat/runtime.rs](src/compat/runtime.rs) and [src/compat/orchestration.rs](src/compat/orchestration.rs) also still use `BrowserPipeTool<T>` as the main browser call object
- The compat runtime already has an internal tool-adapter layer built on `Arc<dyn BrowserBackend>`, but today that layer is created by wrapping `PipeBrowserBackend::from_inner(browser_tool)`

This means the change cannot simply swap the construction logic inside `src/service/server.rs`. It must add a minimal adapter seam on the **ws-only call surface** that lets service mode pass `WsBrowserBackend` into compat/runtime/orchestration, while pipe keeps using `BrowserPipeTool` unchanged.

The permitted minimal seam is defined as follows:

1. The pipe version of `run_submit_task(...)` stays unchanged and continues to serve the pipe entry point
2. Add a parallel entry point **used only by the ws service**, for example:
   - `run_submit_task_with_browser_backend(...)`
   - or an equivalent ws-only adapter called from the service side
3. Inside the ws-only entry point, the browser dependency may be narrowed to `Arc<dyn BrowserBackend>`
4. The behavior of `src/lib.rs`, the pipe handshake, and pipe `BrowserPipeTool` construction must not change

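
The ws-only entry point in item 2 could look roughly like the following sketch. It is a minimal, std-only illustration built on several assumptions. `Action`, `CommandOutput`, `PipeError`, and `BrowserBackend` are simplified stand-ins, not the repo's real definitions. `CannedBackend` is a hypothetical test double, and the fixed two-step "plan" replaces the real planner. The sketch only shows that the ws entry depends on nothing but `Arc<dyn BrowserBackend>`, so no session key is involved.

```rust
use std::sync::Arc;

// Simplified, hypothetical stand-ins for the repo's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Navigate,
    GetText,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CommandOutput {
    pub text: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum PipeError {
    Protocol(String),
    Timeout,
    PipeClosed,
}

pub trait BrowserBackend: Send + Sync {
    fn invoke(&self, action: Action, params: &str) -> Result<CommandOutput, PipeError>;
}

/// Hypothetical ws-only entry point, parallel to the untouched pipe
/// `run_submit_task(...)`. It only needs a trait object, so no pipe
/// session key is ever involved.
pub fn run_submit_task_with_browser_backend(
    instruction: &str,
    backend: Arc<dyn BrowserBackend>,
) -> Result<Vec<String>, PipeError> {
    let mut log = vec![format!("task: {instruction}")];
    // A real runner lets the planner choose actions; two fixed steps stand in here.
    let page = backend.invoke(Action::Navigate, "https://www.zhihu.com/hot")?;
    log.push(format!("navigate: {}", page.text));
    let body = backend.invoke(Action::GetText, "main")?;
    log.push(format!("getText: {}", body.text));
    Ok(log)
}

/// Test double that answers every action with a canned string.
pub struct CannedBackend;

impl BrowserBackend for CannedBackend {
    fn invoke(&self, action: Action, _params: &str) -> Result<CommandOutput, PipeError> {
        let text = match action {
            Action::Navigate => "ok",
            Action::GetText => "hot list",
        };
        Ok(CommandOutput { text: text.to_string() })
    }
}
```

The pipe `run_submit_task(...)` is not modified, so pipe callers still pass `&BrowserPipeTool<T>` exactly as before.
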
## Design Decisions

### Decision 1: the ws service path stops using `BrowserPipeTool`

The ws service path no longer constructs `BrowserPipeTool`.

Replacement:

- The service side provides a `WsClient` implementation
- It constructs `WsBrowserBackend` directly
- Browser actions in the ws service are executed through `WsBrowserBackend`

### Decision 2: the pipe path stays as-is

Pipe mode keeps using:

- the handshake
- `session_key`
- `BrowserPipeTool`

No semantic adjustments, no compatibility layer, and no changes to existing verification paths.

### Decision 3: the runner gets minimal wiring on the ws call surface only

The shared task runner is already reused; this change does not perform a large refactor.

The strategy is:

- Change only the call surface the ws service uses, so that it can use `WsBrowserBackend`
- If the shared call interface must be extended, make only **minimal, compatible changes with zero impact on pipe**
- No change that alters pipe behavior is allowed

### Decision 4: keep the existing browser websocket connection lifecycle

This change does not redesign connection management.

It keeps:

- a single client
- serial execution of a single task
- the browser websocket connection, maintained according to the existing service lifecycle

Only the execution path with the wrong authentication is replaced; no lifecycle optimization is done along the way.

## Target Architecture

### Target call chain

```text
sg_claw_client
  -> sg_claw service
  -> ws-native browser backend
  -> browser_ws_url
  -> sgBrowser
```

### Relationship to pipe

```text
pipe mode:
  browser process <-> stdio/pipe <-> sgclaw::run() <-> BrowserPipeTool

ws mode:
  sg_claw_client <-> sg_claw service <-> WsBrowserBackend <-> sgBrowser websocket
```

The two paths coexist and never mix authentication models.

## Module Design

### 1. `src/service/server.rs`

This is the core file for this change.

#### Current responsibilities

- Manage sending and receiving on the service client websocket
- Forward service requests into the shared runner
- Maintain the service->browser websocket transport bridge

#### Changes in this work

- Change the service->browser bridge from the `Transport + BrowserPipeTool` combination to `WsClient + WsBrowserBackend`
- Remove the ws service path's dependency on an empty `session_key`
- Keep the service socket lifecycle and session state machine

#### Target structure

An acceptable target shape:

- `ServiceBrowserWsClient`: implements `WsClient`
- It continues to maintain the real browser websocket connection internally
- `serve_client(...)` constructs `WsBrowserBackend` when handling a task
- The shared runner, or its ws call wrapper, executes browser actions through that backend

### 2. Shared runner / ws call wrapper

This change does not require converting the whole project to `BrowserBackend`.

However, ws service mode must be able to route browser actions to `WsBrowserBackend`.

An acceptable minimal approach:

- Introduce an adapter that serves ws mode only, in the layer the ws service uses
- The adapter delegates the browser call capabilities the runner needs to `WsBrowserBackend`

Requirements:

- Existing pipe call signatures stay unchanged; if they must be extended, pipe behavior must remain exactly the same
- Rewriting the pipe entry point for the sake of ws is not allowed

### 3. `src/browser/ws_backend.rs`

The existing implementation is reused as a rule.

Minimal patches are allowed only when:

- Live service integration reveals a capability the ws service needs that is not currently exposed
- The patch serves only the ws-native path
- It does not change the semantics of existing tests

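## Connection Responsibilities and Boundaries

To keep the service side and `WsBrowserBackend` from duplicating responsibilities, this change sets the following explicit constraints.

### `WsBrowserBackend` is responsible for

- Running the request of each `invoke(...)` serially (one request in flight at a time)
- Calling `encode_v1_action(...)`
- Sending websocket text frames
- Waiting for the immediate status frame
- If there is a callback, waiting for the callback frame and matching it by name
- Normalizing the result into `CommandOutput`
- Producing timeout/protocol errors according to existing `WsBrowserBackend` semantics
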
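
The responsibility list above can be traced through one `invoke(...)` call in the following sketch. It covers the policy check, encoding, the send, the status frame, the optional callback frame matched by name, and error mapping. Everything in it is hypothetical. `WsBackendSketch` is not the real `WsBrowserBackend`. The tab-separated request and the `<callback name>:<payload>` callback layout stand in for `encode_v1_action(...)` and `decode_callback_frame(...)`. `MacPolicy` is reduced to a domain allow-list.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum PipeError {
    Protocol(String),
    Timeout,
    PipeClosed,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CommandOutput {
    pub status: i64,
    pub callback_payload: Option<String>,
}

/// Hypothetical shape of `WsClient`: text frames out, text frames in.
pub trait WsClient {
    fn send_text(&mut self, text: &str) -> Result<(), PipeError>;
    fn recv_text_timeout(&mut self, timeout: Duration) -> Result<String, PipeError>;
}

/// Stand-in for `MacPolicy`: a plain domain allow-list.
pub struct MacPolicy {
    pub allowed_domains: Vec<String>,
}

pub struct WsBackendSketch<C: WsClient> {
    client: Mutex<C>, // held for a whole invoke, so requests never interleave
    policy: MacPolicy,
    timeout: Duration,
}

impl<C: WsClient> WsBackendSketch<C> {
    pub fn new(client: C, policy: MacPolicy, timeout: Duration) -> Self {
        Self { client: Mutex::new(client), policy, timeout }
    }

    pub fn into_client(self) -> C {
        self.client.into_inner().unwrap_or_else(|poisoned| poisoned.into_inner())
    }

    pub fn invoke(
        &self,
        action: &str,
        domain: &str,
        params: &str,
        callback: Option<&str>,
    ) -> Result<CommandOutput, PipeError> {
        if !self.policy.allowed_domains.iter().any(|d| d == domain) {
            return Err(PipeError::Protocol(format!("domain not allowed: {domain}")));
        }
        let mut client = self
            .client
            .lock()
            .map_err(|_| PipeError::Protocol("client lock poisoned".into()))?;
        // Hypothetical frame layout; the real frame comes from `encode_v1_action(...)`.
        client.send_text(&format!("{action}\t{params}\t{}", callback.unwrap_or("")))?;

        // 1) Immediate status frame: a bare integer, 0 meaning success.
        let raw = client.recv_text_timeout(self.timeout)?;
        let status: i64 = raw
            .trim()
            .parse()
            .map_err(|_| PipeError::Protocol(format!("expected status frame, got {raw:?}")))?;
        if status != 0 {
            return Err(PipeError::Protocol(format!("browser returned non-zero status: {status}")));
        }
        // 2) Optional callback frame, assumed here to look like "<callback name>:<payload>".
        let callback_payload = match callback {
            None => None,
            Some(name) => {
                let frame = client.recv_text_timeout(self.timeout)?;
                match frame.split_once(':') {
                    Some((got, payload)) if got == name => Some(payload.to_string()),
                    _ => return Err(PipeError::Protocol(format!("unexpected callback frame: {frame:?}"))),
                }
            }
        };
        Ok(CommandOutput { status, callback_payload })
    }
}

/// Scripted fake: records sent frames and replays queued replies.
/// An empty queue behaves like a browser that never answers.
pub struct ScriptedClient {
    pub sent: Vec<String>,
    pub replies: VecDeque<Result<String, PipeError>>,
}

impl WsClient for ScriptedClient {
    fn send_text(&mut self, text: &str) -> Result<(), PipeError> {
        self.sent.push(text.to_string());
        Ok(())
    }
    fn recv_text_timeout(&mut self, _timeout: Duration) -> Result<String, PipeError> {
        self.replies.pop_front().unwrap_or(Err(PipeError::Timeout))
    }
}
```

The scripted client also fits the red tests described later in this design. An empty reply queue yields `PipeError::Timeout`, and a queued `PipeError::PipeClosed` surfaces unchanged at the `invoke(...)` layer.
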
### The service-side `WsClient` adapter is responsible for

- Holding the real browser websocket connection
- Establishing the connection to `browser_ws_url` on the first request
- Delegating `send_text(...)` / `recv_text_timeout(...)` to the real websocket
- Mapping underlying close, reset, and timeout uniformly onto the existing `PipeError` semantics
- Not implementing request/response correlation, and not parsing browser ws protocol payloads

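
Here is a minimal sketch of the service-side adapter. It assumes a `WsClient` trait with `send_text` and `recv_text_timeout`, and a hypothetical `RawSocket` trait in place of a real websocket library, so the repo's actual websocket crate is not shown. The adapter connects lazily on the first request, does no protocol parsing, and maps close, reset, and timeout onto the fixed `PipeError` variants. Which `io::ErrorKind` values count as a reset is also an assumption.

```rust
use std::io;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum PipeError {
    Protocol(String),
    Timeout,
    PipeClosed,
}

/// Hypothetical shape of `WsClient` (text frames only).
pub trait WsClient {
    fn send_text(&mut self, text: &str) -> Result<(), PipeError>;
    fn recv_text_timeout(&mut self, timeout: Duration) -> Result<String, PipeError>;
}

/// Hypothetical socket interface standing in for a real websocket library.
/// `Ok(None)` from `recv` means the peer closed the connection normally.
pub trait RawSocket {
    fn send(&mut self, text: &str) -> io::Result<()>;
    fn recv(&mut self, timeout: Duration) -> io::Result<Option<String>>;
}

pub type Connector<S> = Box<dyn FnMut(&str) -> io::Result<S>>;

pub struct ServiceBrowserWsClient<S: RawSocket> {
    browser_ws_url: String,
    connect: Connector<S>,
    socket: Option<S>, // stays None until the first request
}

impl<S: RawSocket> ServiceBrowserWsClient<S> {
    pub fn new(browser_ws_url: &str, connect: Connector<S>) -> Self {
        Self { browser_ws_url: browser_ws_url.to_string(), connect, socket: None }
    }

    /// Connects lazily; a connect failure is reported as such, never as an auth error.
    fn socket(&mut self) -> Result<&mut S, PipeError> {
        if self.socket.is_none() {
            let socket = (self.connect)(self.browser_ws_url.as_str())
                .map_err(|e| PipeError::Protocol(format!("browser websocket connect failed: {e}")))?;
            self.socket = Some(socket);
        }
        self.socket.as_mut().ok_or(PipeError::PipeClosed)
    }
}

/// Maps transport failures onto the fixed outward `PipeError` variants.
fn map_io_error(e: io::Error) -> PipeError {
    match e.kind() {
        io::ErrorKind::TimedOut | io::ErrorKind::WouldBlock => PipeError::Timeout,
        io::ErrorKind::ConnectionReset
        | io::ErrorKind::ConnectionAborted
        | io::ErrorKind::BrokenPipe
        | io::ErrorKind::UnexpectedEof => PipeError::PipeClosed,
        _ => PipeError::Protocol(e.to_string()),
    }
}

impl<S: RawSocket> WsClient for ServiceBrowserWsClient<S> {
    fn send_text(&mut self, text: &str) -> Result<(), PipeError> {
        self.socket()?.send(text).map_err(map_io_error)
    }

    fn recv_text_timeout(&mut self, timeout: Duration) -> Result<String, PipeError> {
        match self.socket()?.recv(timeout) {
            Ok(Some(text)) => Ok(text),
            Ok(None) => Err(PipeError::PipeClosed), // normal close by the peer
            Err(e) => Err(map_io_error(e)),
        }
    }
}

/// Test double: replays scripted results; an empty script times out.
pub struct FakeSocket {
    pub inbound: Vec<io::Result<Option<String>>>,
}

impl RawSocket for FakeSocket {
    fn send(&mut self, _text: &str) -> io::Result<()> {
        Ok(())
    }
    fn recv(&mut self, _timeout: Duration) -> io::Result<Option<String>> {
        if self.inbound.is_empty() {
            Err(io::Error::new(io::ErrorKind::TimedOut, "no frame"))
        } else {
            self.inbound.remove(0)
        }
    }
}
```

The adapter never looks inside a frame. Correlating status and callback frames stays with `WsBrowserBackend`, as the boundary above requires.
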
### Explicitly not allowed

- The service side keeping hand-written callback polling logic
- The service side continuing to build packets by calling `encode_v1_action(...)` directly as the main path
- Duplicating `WsBrowserBackend`'s protocol handling logic on the service side

This guarantees that:

- `src/service/server.rs` is responsible only for "wiring"
- `src/browser/ws_backend.rs` remains responsible for "ws browser call semantics"

## Data Flow Design

### Success path

1. `sg_claw_client` sends `SubmitTask` to `sg_claw`
2. The service receives the task and enters the shared runner
3. When the runner needs a browser action:
   - the ws service calls `WsBrowserBackend.invoke(...)`
4. `WsBrowserBackend`:
   - checks the action with `MacPolicy`
   - encodes the request with `encode_v1_action(...)`
   - sends it to `browser_ws_url`
   - waits for the status frame
   - if there is a callback, keeps waiting for the callback frame
5. The result returns to the runner
6. The runner continues and streams logs and the completion to the client

### Failure paths

#### Browser websocket unreachable

- Return an explicit browser websocket connect error
- Do not masquerade as an authentication error

#### Browser returns a non-zero status

- Return an explicit protocol error: `browser returned non-zero status`

#### Callback timeout

- Return a timeout

#### Websocket disconnected

- Return `PipeError::PipeClosed`
- Let the service lifecycle logic handle it

#### Errors that must no longer occur

- `invalid hmac seed: session key must not be empty`

This error should disappear completely in ws mode.

## Failure Semantics

To simplify testing and implementation, the outward error semantics of the ws-only path are fixed as follows.

### Browser websocket connect failure

- outward: `PipeError::Protocol("browser websocket connect failed: ...")`

### Browser returns a non-zero status code

- outward: `PipeError::Protocol("browser returned non-zero status: ...")`

### Callback timeout

- outward: `PipeError::Timeout`
- Timeout source: reuse the current response timeout configuration of `WsBrowserBackend` / the ws service, 30 seconds by default

### Websocket closed normally by the peer, or reset

- outward: `PipeError::PipeClosed`
- Imprecise wording such as "an equivalent error" is not acceptable

### Errors this change must eliminate

- `invalid hmac seed: session key must not be empty`

If this error appears on any ws service integration path, the implementation is considered incomplete.

## Test Design

### Layered testing strategy

To avoid depending on nondeterministic LLM/planner behavior, tests for this change must be split into two layers, each asserting a different target.

#### A. Backend/adapter layer tests (deterministic)

This layer bypasses `sg_claw_client` and real model planning, and verifies ws-only technical behavior directly.

Goals:

1. The combination of `ServiceBrowserWsClient` and `WsBrowserBackend` can:
   - send `Navigate`
   - receive a `0` status
   - read the callback text in callback scenarios
2. When the fake browser server actively closes or resets the connection:
   - assert at the `WsClient` / `WsBrowserBackend.invoke(...)` observation layer that the outward error is `PipeError::PipeClosed`
3. When the fake browser server does not return a callback:
   - assert at the `WsBrowserBackend.invoke(...)` observation layer that the outward error is `PipeError::Timeout`
4. Tests at this layer do not depend on the LLM, the planner, or skill routing at all

Recommendations:

- Add a focused ws service/backend test
- Use fixed input actions issued directly from code, such as `invoke(Action::Navigate, ...)`, rather than natural-language tasks

#### B. Client -> service integration test (chain verification)

This layer verifies that the ws-only wiring has replaced the empty-session-key path. It is not responsible for fine-grained protocol-semantics assertions.

Goals:

1. Submit a minimal natural-language task through the real `sg_claw_client -> sg_claw service` chain
2. The fake browser websocket server receives at least one text frame from the ws-only path
3. Client/service output no longer contains:
   - `invalid hmac seed: session key must not be empty`
4. This layer proves only that:
   - the ws service no longer goes through the empty-session-key pipe authentication path
   - the real end-to-end chain can reach the browser websocket

This layer is not used to assert exact enum identity, nor to cover callback timeout/reset details.

### New red test 1: a basic ws-only backend/adapter call works

Goals:

- Do not use a natural-language task
- Directly construct the `WsClient` + `WsBrowserBackend` pair used by the ws service
- Call a fixed action: `Action::Navigate`, with the target URL fixed to `https://www.zhihu.com/hot`
- The fake browser websocket server returns `0`
- Assert that:
  - `invoke(...)` succeeds
  - the first text frame received by the fake server can be interpreted as `Navigate` under `ws_protocol` semantics

### New red test 2: fixed disconnect semantics for the ws-only backend/adapter

Goals:

- Do not use a natural-language task
- The fake browser websocket server actively closes or resets the connection after accepting the request
- Assert at the `invoke(...)` observation layer that:
  - the outward error is always `PipeError::PipeClosed`

### New red test 3: fixed callback-timeout semantics for the ws-only backend/adapter

Goals:

- Do not use a natural-language task
- The fake browser websocket server returns `0` but no callback frame
- Assert at the `invoke(...)` observation layer that:
  - the outward error is always `PipeError::Timeout`

### New red test 4: the client->service chain no longer triggers the empty-session-key error

Goals:

- Trigger a browser action through the real `sg_claw_client -> sg_claw service` chain
- Catch the requests with a fake browser websocket server
- Fix the task input to `打开知乎热榜并读取页面主区域文本` ("open the Zhihu hot list and read the page's main-area text")
- Assert that client/service output no longer contains:
  - `invalid hmac seed: session key must not be empty`
- Assert that the fake browser server received at least one text frame

### Regression tests

The following must be rerun and must keep passing.

#### Pipe regression

```bash
cargo test --test pipe_handshake_test -- --nocapture
```

If the implementation touches the upper-layer browser tool wiring, also run:

```bash
cargo test --test browser_tool_test --test compat_browser_tool_test --test runtime_task_flow_test -- --nocapture
```

#### WS regression

```bash
cargo test --test service_ws_session_test --test service_task_flow_test --test browser_ws_protocol_test --test browser_ws_backend_test -- --nocapture
```

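## Manual Acceptance

Using the real configuration and a real, already running sgBrowser:

1. Start sgBrowser and make sure `browserWsUrl` is reachable
2. Start `sg_claw`
3. Run:
   - `sg_claw_client`
4. Send the minimal Zhihu task:
   - `打开知乎热榜并读取页面主区域文本` (open the Zhihu hot list and read the page's main-area text)
5. Observe that:
   - `invalid hmac seed` no longer appears
   - real browser action logs appear
   - a single completion is returned
6. Then run the legacy Zhihu skill:
   - `读取知乎热榜数据,并导出 excel 文件` (read the Zhihu hot-list data and export it to an Excel file)
7. Verify that the legacy Zhihu skill enters the real browser execution path
8. Finally, confirm that the legacy pipe entry point still starts (verification only; modifying the pipe implementation for this is not allowed)
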
## Risks

### Risk 1: the ws service becomes too tightly coupled to the shared runner interface

Mitigation:

- Add the adapter only on the ws call surface
- Make no structural changes to the pipe main entry point

### Risk 2: accidentally changing the pipe call chain while adapting to the ws-native backend

Mitigation:

- Rerun all pipe regressions after every round of changes
- `src/lib.rs` must not change behavior

### Risk 3: inline connection logic in the ws service duplicating `WsBrowserBackend` responsibilities

Mitigation:

- First remove the authentication blocker with a minimal change
- Do not do large-scale cleanup along the way

## Pass Criteria

The work is complete only when all of the following hold:

1. The ws service path no longer depends on an empty session key
2. `invalid hmac seed: session key must not be empty` no longer appears
3. Real browser websocket requests reach sgBrowser or the fake browser server
4. The legacy Zhihu skill at least enters the real browser action execution chain
5. Pipe mode has zero regressions
6. All new and related tests pass

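## Implementation Recommendations

Implement in this order:

1. First add red tests that lock in "ws no longer triggers invalid hmac seed"
2. Then switch the ws service path to `WsBrowserBackend`
3. Run the ws tests
4. Run the pipe regressions
5. Run a smoke test with the real minimal Zhihu task
6. Then run a smoke test with the legacy Zhihu skill
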
@@ -0,0 +1,276 @@
# WS Browser Bridge Path Design

## Background

The repository now has explicit live evidence that the real sgBrowser websocket endpoint at `ws://127.0.0.1:12345` is **reachable** but is **not validated as an external-control surface**.

The probe transcript in `docs/_tmp_sgbrowser_ws_probe_transcript.md` shows a stable outcome across the full bootstrap matrix:

- direct open-page frame
- `sgOpenAgent`
- `sgSetAuthInfo`
- `sgBrowserLogin`
- `sgBrowerserActiveTab`
- combined bootstrap attempts
- alternate `requesturl` values

Across all of those sequences, the endpoint behaved like this:

1. websocket connection succeeds
2. first inbound text frame is always the banner `Welcome! You are client #1`
3. no sequence produced a reproducible numeric status frame for a real business action
4. no sequence produced a reproducible callback frame for a real business action
5. follow-on business frames timed out or produced no further usable protocol traffic

That means the current project can no longer treat raw external websocket business frames as the default production integration surface.

## Why the raw websocket path is now considered non-validated

The decision is not based on a guess. It is based on both live evidence and repository evidence.

### Live evidence

`docs/_tmp_sgbrowser_ws_probe_transcript.md` proves that the real endpoint did **not** yield the one thing raw external control needs:

- a reproducible status/callback response for a real browser action

Because that never happened, the bootstrap hypothesis did not clear the acceptance bar.
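
The acceptance bar can be stated mechanically. The sketch below uses hypothetical types (`FrameKind`, `ProbeRun`) to classify each inbound frame. A bootstrap sequence passes only if every one of at least N runs of that sequence received a numeric status frame, and the `Welcome!` banner never counts. As a deliberate simplification, it treats only numeric status frames as evidence and ignores callback frames.

```rust
/// Hypothetical classification of one inbound text frame seen by the probe.
#[derive(Debug, Clone, PartialEq)]
pub enum FrameKind {
    /// Connection banner such as "Welcome! You are client #1".
    Banner,
    /// A bare numeric status frame.
    Status(i64),
    /// Anything else, e.g. a possible callback payload.
    Other(String),
}

pub fn classify(frame: &str) -> FrameKind {
    let text = frame.trim();
    if text.starts_with("Welcome!") {
        FrameKind::Banner
    } else if let Ok(code) = text.parse::<i64>() {
        FrameKind::Status(code)
    } else {
        FrameKind::Other(text.to_string())
    }
}

/// One probe run: which bootstrap sequence was sent and what came back.
pub struct ProbeRun {
    pub sequence: &'static str,
    pub inbound: Vec<&'static str>,
}

fn got_status(run: &ProbeRun) -> bool {
    run.inbound.iter().any(|f| matches!(classify(f), FrameKind::Status(_)))
}

/// Acceptance bar: some sequence must get a status frame on every one of
/// at least `min_repeats` runs. The banner alone never counts.
pub fn clears_acceptance_bar(runs: &[ProbeRun], min_repeats: usize) -> bool {
    runs.iter().any(|candidate| {
        let same: Vec<&ProbeRun> = runs.iter().filter(|r| r.sequence == candidate.sequence).collect();
        same.len() >= min_repeats.max(1) && same.iter().all(|r| got_status(r))
    })
}
```

Under this rule, a transcript that only ever shows the banner fails for every sequence. That matches the outcome recorded above.
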
### Repository evidence

The rest of the repository already points to a different product integration model.

#### 1. Historical frontend code uses browser-host bridge surfaces

In `frontend/archive/sgClaw验证-已归档/testRunner.js:15-26`:

- the runtime checks for `window.sgFunctionsUI`
- the runtime checks for `window.BrowserAction`
- the working path uses `window.sgFunctionsUI(action, params, callback)`

That is a host/browser bridge contract, not an external raw websocket RPC contract.

#### 2. Prior architecture docs make `CommandRouter` the execution entry

In `docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md:16-18` and `:36-50`:

- reuse SuperRPA `CommandRouter` as the browser execution entry
- keep browser-side hosting, security re-check, and dispatch in SuperRPA
- avoid building parallel browser automation APIs

That is directly incompatible with treating raw external websocket business frames as the primary control plane.

#### 3. Project planning docs describe FunctionsUI IPC as the supported frontend seam

In `docs/archive/项目管理与排期/协作时间表.md:419-430`:

- Vue/FunctionsUI calls browser-host methods such as `window.superrpa.sgclaw.start()` and `sendCommand(...)`
- browser host pushes callbacks such as `onStatusChange(...)` and `onLog(...)`

Again, this is a bridge and host IPC model.

#### 4. Floating-chat planning already preserves named bridge calls

In `docs/plans/2026-03-27-sgclaw-floating-chat-frontend-design.md:289-293`:

- `connect()` issues `sgclawConnect`
- `start()` issues `sgclawStart`
- `stop()` issues `sgclawStop`
- `submitTask()` issues `sgclawSubmitTask`

That design work assumes a named browser bridge, not direct raw websocket frames.

## Decision

**Authoritative browser integration surface: the browser-host bridge path, not the raw external sgBrowser websocket business-frame path.**

More concretely, sgClaw should target this chain:

```text
sgClaw runtime
  -> existing browser-facing bridge contract
  -> FunctionsUI / host IPC
  -> BrowserAction / sgclaw host callbacks
  -> existing SuperRPA CommandRouter dispatch
```

## Authoritative seams for future implementation

Because this repository does not contain the full SuperRPA browser host source tree, the bridge-first implementation must integrate at the **nearest validated seam available in this repo**, while staying aligned with the external browser-host contract already documented.

The future implementation must model **two different bridge layers** explicitly instead of mixing them together.

### Layer 1: session/lifecycle bridge contract

This layer is evidenced by the named calls already present in repo documentation:

- `sgclawConnect`
- `sgclawStart`
- `sgclawStop`
- `sgclawSubmitTask`

This layer manages session setup, task submission, and host/UI lifecycle behavior.

It is important evidence that a browser-host bridge exists, but it is **not** the per-browser-action contract that a new `BrowserBackend` implementation should target.
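
One way to keep the two layers from mixing is to give them separate types and separate seams, as in the hypothetical sketch below. Only the four bridge names come from the repo documentation. `LifecycleCall`, `ActionCall`, `ActionSeam`, `SessionSeam`, and `RecordingSession` are illustrative names, not existing code.

```rust
/// Layer 1: session/lifecycle calls, named as in the floating-chat plan.
#[derive(Debug, Clone, PartialEq)]
pub enum LifecycleCall {
    Connect,
    Start,
    Stop,
    SubmitTask(String),
}

impl LifecycleCall {
    pub fn bridge_name(&self) -> &'static str {
        match self {
            LifecycleCall::Connect => "sgclawConnect",
            LifecycleCall::Start => "sgclawStart",
            LifecycleCall::Stop => "sgclawStop",
            LifecycleCall::SubmitTask(_) => "sgclawSubmitTask",
        }
    }
}

/// Layer 2: one semantic browser action (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct ActionCall {
    pub action: String,
    pub expected_domain: String,
}

/// Backend-facing seam: its signature admits only Layer-2 calls, so lifecycle
/// traffic cannot be routed through a `BrowserBackend` by mistake.
pub trait ActionSeam {
    fn execute(&mut self, call: &ActionCall) -> Result<String, String>;
}

/// Host-facing session seam for Layer-1 calls only.
pub trait SessionSeam {
    fn send(&mut self, call: &LifecycleCall) -> Result<(), String>;
}

/// Test double that records Layer-1 calls by bridge name.
#[derive(Default)]
pub struct RecordingSession {
    pub sent: Vec<&'static str>,
}

impl SessionSeam for RecordingSession {
    fn send(&mut self, call: &LifecycleCall) -> Result<(), String> {
        self.sent.push(call.bridge_name());
        Ok(())
    }
}
```
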
### Layer 2: browser-action execution contract

This is the authoritative target for the new browser backend.

It is evidenced by:

- `window.BrowserAction(...)` in archived frontend code
- `FunctionsUI` / host IPC integration in archived planning docs
- browser-side dispatch through `CommandRouter` in `docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md`

In this repository, the concrete boundary must be a **repo-local semantic transport seam** that can be implemented and tested without access to the external SuperRPA host code.

That seam should be a narrow Rust-side contract such as `BridgeActionTransport`:

- input: semantic browser action request (`navigate`, `click`, `getText`, etc.) plus params and expected domain
- output: semantic success/error reply that can be normalized back into `BrowserBackend` results

`BridgeBrowserBackend` should target **Layer 2 only**.
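
Here is a minimal sketch of the Layer-2 seam and an adapter over it, assuming a synchronous call shape. `BridgeActionRequest`, `BridgeActionReply`, `BackendError`, and `RecordingTransport` are hypothetical. `BridgeActionTransport` and `BridgeBrowserBackend` reuse the names proposed above, but their signatures here are illustrative only, and the adapter is not wired to the real `BrowserBackend` trait.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical semantic request carried by the Layer-2 seam.
#[derive(Debug, Clone, PartialEq)]
pub struct BridgeActionRequest {
    pub action: String, // e.g. "navigate", "click", "getText"
    pub params: HashMap<String, String>,
    pub expected_domain: String,
}

/// Hypothetical semantic reply from the host bridge.
#[derive(Debug, Clone, PartialEq)]
pub enum BridgeActionReply {
    Success { data: String },
    Failure { code: String, message: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum BackendError {
    /// The host ran the action and reported a failure.
    Action(String),
    /// The seam itself failed (host unreachable, IPC error, ...).
    Transport(String),
}

/// The narrow repo-local seam; a production impl would forward to FunctionsUI / host IPC.
pub trait BridgeActionTransport: Send + Sync {
    fn execute(&self, request: &BridgeActionRequest) -> Result<BridgeActionReply, String>;
}

/// Bridge-backed adapter: builds semantic requests and normalizes replies.
pub struct BridgeBrowserBackend<T: BridgeActionTransport> {
    transport: T,
}

impl<T: BridgeActionTransport> BridgeBrowserBackend<T> {
    pub fn new(transport: T) -> Self {
        Self { transport }
    }

    pub fn transport(&self) -> &T {
        &self.transport
    }

    pub fn invoke(
        &self,
        action: &str,
        params: &[(&str, &str)],
        expected_domain: &str,
    ) -> Result<String, BackendError> {
        let request = BridgeActionRequest {
            action: action.to_string(),
            params: params.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
            expected_domain: expected_domain.to_string(),
        };
        match self.transport.execute(&request).map_err(BackendError::Transport)? {
            BridgeActionReply::Success { data } => Ok(data),
            BridgeActionReply::Failure { code, message } => {
                Err(BackendError::Action(format!("{code}: {message}")))
            }
        }
    }
}

/// Deterministic fake transport that records every request it receives.
pub struct RecordingTransport {
    pub seen: Mutex<Vec<BridgeActionRequest>>,
    pub reply: BridgeActionReply,
}

impl BridgeActionTransport for RecordingTransport {
    fn execute(&self, request: &BridgeActionRequest) -> Result<BridgeActionReply, String> {
        self.seen.lock().unwrap().push(request.clone());
        Ok(self.reply.clone())
    }
}
```

A recording fake like this needs no browser host, which fits the repository-local, test-first constraints. It shows exactly what was emitted onto the seam and how each reply was normalized.
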
### Explicit out-of-scope boundary

The following are outside this repository and therefore outside the immediate Rust implementation slice:

- actual SuperRPA C++ host/browser code
- actual `FunctionsUI` TypeScript host plumbing in the external browser repository
- actual `CommandRouter` implementation in the external browser repository

This repository should implement only:

- the Rust-side bridge contract types
- the Rust-side bridge transport/provider seam
- the Rust-side bridge-backed browser adapter
- deterministic tests against those seams

### What this means practically

The next implementation slice should **not** continue trying to make `WsBrowserBackend` drive the real browser endpoint directly.

Instead, the next implementation slice should introduce a **bridge-backed browser adapter** that:

- preserves the Rust-side `BrowserBackend` contract where practical
- translates browser actions onto the Layer-2 semantic bridge surface
- keeps lifecycle/session bridge calls separate from per-action browser execution
- leaves the raw websocket probe code as diagnostic infrastructure only

## Chosen architecture

Use a bridge-backed adapter design.

### Target shape

```text
compat/runtime/orchestration
  -> Arc<dyn BrowserBackend>
  -> BridgeBrowserBackend (new)
  -> BridgeActionTransport (new repo-local seam)
  -> external browser-host bridge / FunctionsUI IPC
  -> BrowserAction / CommandRouter path
```

### Why this shape

- It preserves the already-useful Rust-side browser abstraction (`BrowserBackend`) instead of re-plumbing the entire runtime.
- It keeps raw websocket probing available for diagnostics without letting it dictate production architecture.
- It matches the architecture already documented for SuperRPA integration.
- It keeps future work narrow: one new adapter layer instead of rewriting all runtime behavior.

## What stays the same

### Pipe path remains unchanged

The existing pipe path must remain behaviorally unchanged:

- `src/lib.rs`
- pipe handshake behavior
- `BrowserPipeTool`
- existing HMAC/domain validation semantics

The bridge-first work is about the **ws service / real browser integration path**, not about replacing or weakening the pipe path.

### Existing compat/runtime abstractions should be preserved where practical

The next slice should reuse:

- `BrowserBackend`
- existing browser tool adapters in compat/runtime
- existing task runner/orchestration flow

The new work should be concentrated in a bridge adapter and its wiring, not spread through unrelated layers.

## What does not stay the same

### Raw websocket is no longer the mainline production assumption

The repository may keep:

- `src/browser/ws_backend.rs`
- `src/browser/ws_protocol.rs`
- `src/browser/ws_probe.rs`
- `src/bin/sgbrowser_ws_probe.rs`

But those should now be treated as:

- protocol tooling
- fake-server test tooling
- live diagnostic/probe tooling
- possibly constrained compatibility code

They should remain diagnostic-only in this repository and must not be treated as the production path for reaching the real browser.

## Design constraints for the bridge slice

The bridge-path implementation must follow these constraints:

1. **No parallel browser API invention.** Reuse the real bridge/browser action surface already evidenced in docs and archived frontend code.
2. **No pipe regression.** Do not alter the working pipe entry path.
3. **Adapter-first design.** Prefer one bridge-backed backend implementation over broad runtime rewrites.
4. **TDD first.** Add focused bridge adapter tests before production wiring.
5. **Repository-local seam only.** Where external SuperRPA browser-host code is unavailable here, encode the contract in narrow adapters and tests instead of guessing internals.

## Testing implications

The bridge path changes what “proof” looks like.

### Required proof for the next slice

The next implementation slice must prove:

- a browser action can be emitted onto the bridge contract deterministically
- the bridge adapter maps replies/errors back into `BrowserBackend` semantics
- compat/runtime can use the bridge-backed backend without pipe regression

### No longer required for acceptance

The next slice does **not** need to prove that raw websocket business frames work directly against `ws://127.0.0.1:12345`, because the current evidence rejected that path as the mainline assumption.

## Acceptance criteria for this design decision

This design is correct only if future implementation follows all of these:

1. The next production slice targets the browser-host bridge path rather than raw external websocket business frames.
2. The raw websocket probe tooling remains diagnostic only.
3. Existing pipe behavior stays unchanged.
4. The next implementation plan identifies a narrow bridge-backed adapter, not a broad architecture rewrite.
5. Future success claims are based on bridge-path execution evidence, not on reinterpreting the existing raw-websocket transcript.

## Consequences

### Positive

- Aligns implementation with the strongest evidence already in the repo
- Stops further speculative coding on the wrong control surface
- Preserves existing ws probe work as useful diagnostics
- Keeps the next slice narrow and testable

### Trade-off

- Requires an additional adapter design step before more production code can land
- Defers any hope that a small websocket tweak alone will unlock the real browser path

That trade-off is correct, because the current blocker is no longer a small protocol bug. It is an integration-surface mismatch.

@@ -0,0 +1,288 @@
# WS Browser Integration Surface Correction Design

## Background

The current websocket service path already proved two things:

1. `sg_claw_client -> sg_claw` request handling works.
2. The ws-native backend/auth replacement removed the old pipe/HMAC mismatch that produced `invalid hmac seed: session key must not be empty`.

However, the real sgBrowser smoke test still fails.

Manual probing against the configured real browser websocket endpoint (`ws://127.0.0.1:12345`) produced a stable pattern:

- the connection succeeds
- the server sends one banner text frame such as `Welcome! You are client #1`
- after that, business frames receive no status frame and no callback frame
- this remains true for:
  - valid-looking `sgBrowerserOpenPage` frames
  - callback-based APIs
  - no-arg/context-light APIs
  - malformed or obviously wrong frames

At the same time, local documentation and archived frontend code point to a different integration model:

- the websocket API doc describes the websocket service as a transport replacement for page-context JavaScript calls, and requires the current page URL (`requesturl`) in each message
- archived frontend/product code uses `window.sgFunctionsUI(...)` and `window.BrowserAction(...)`
- archived architecture docs describe the supported product path as `FunctionsUI -> browser host bridge -> BrowserAction/CommandRouter`, not an arbitrary external process speaking raw browser websocket frames

This means the current assumption is no longer acceptable as the default architecture hypothesis:

- **Rejected default assumption:** `sg_claw` can directly control the real browser by speaking raw business frames to `browserWsUrl` as an external client, with no additional browser-host bridge, page context, or bootstrap/session contract.

That assumption may still turn out to be partially true, but it is no longer justified enough to continue coding against as the mainline design.

## Problem Statement

The project currently has a functioning ws-native transport implementation, but it does **not** have a validated real integration surface for sgBrowser.

The unresolved question is now architectural rather than syntactic:

### Possibility A: raw websocket is valid, but requires hidden bootstrap/preconditions

Examples suggested by the local API document:

- a real browser page must already exist and `requesturl` must refer to that page
- one or more setup calls such as `sgSetAuthInfo`, `sgBrowserLogin`, `sgOpenAgent`, or `sgBrowerserActiveTab` must happen first
- callbacks may require a browser-side JS/page context that an external process does not automatically have
- some APIs may only work against agent/show/hide areas after browser-side initialization

### Possibility B: raw websocket is not the supported external control surface

Instead, the real product path may require:

- `FunctionsUI` / browser-host IPC
- host-side security and routing
- `BrowserAction` / `CommandRouter` dispatch
- page-injected or browser-embedded execution context

If this is true, continuing to invest in raw external websocket business-frame handling as the main integration surface would be architectural drift.

## Goal

Replace the current unvalidated ws-native-direct assumption with a decision-backed integration strategy.

The next implementation slice must do exactly one of these two things based on evidence:

1. **Bootstrap path:** prove that raw websocket control is real and supported once the missing bootstrap/precondition sequence is performed, then codify that bootstrap sequence and keep `WsBrowserBackend` as the execution surface.
2. **Bridge path:** prove that raw websocket is not the real supported surface for external control, then pivot the runtime design so sgClaw targets the actual browser-host bridge / `BrowserAction` surface instead of pretending the raw websocket is enough.
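
The "exactly one of two" rule can be expressed as a single function over probe evidence. This sketch is hypothetical. `BootstrapAttempt`, `IntegrationPath`, and the reproducibility threshold `min_runs` are illustrative, and the setup-call names are taken from the API names cited in this document.

```rust
/// Hypothetical summary of repeated probe runs for one bootstrap sequence.
#[derive(Debug, Clone)]
pub struct BootstrapAttempt {
    pub setup_calls: Vec<&'static str>, // e.g. ["sgSetAuthInfo", "sgBrowserLogin"]
    pub runs: u32,
    pub runs_with_status_and_callback: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub enum IntegrationPath {
    /// Codify this setup sequence and keep `WsBrowserBackend`.
    Bootstrap(Vec<&'static str>),
    /// Pivot to the browser-host bridge / `BrowserAction` surface.
    Bridge,
}

/// Exactly one path comes out. Bootstrap is chosen only when some sequence got
/// a status and a callback frame on every one of at least `min_runs` runs.
pub fn decide(attempts: &[BootstrapAttempt], min_runs: u32) -> IntegrationPath {
    attempts
        .iter()
        .find(|a| a.runs >= min_runs.max(1) && a.runs_with_status_and_callback == a.runs)
        .map(|a| IntegrationPath::Bootstrap(a.setup_calls.clone()))
        .unwrap_or(IntegrationPath::Bridge)
}
```

The function defaults to `Bridge` whenever no sequence is reproducible, so silence from the endpoint can never be read as partial success.
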
## Non-goals

This correction slice does **not** include:

- broad feature work on the floating chat UI
- multi-client service redesign
- browser process lifecycle management
- speculative protocol expansion
- generic reconnection/backoff work
- rewriting the entire compat/runtime stack without evidence
- landing both bootstrap and bridge implementations in one branch

The purpose of this slice is to choose the correct integration surface first.

## Evidence Summary

### Evidence that the current raw-ws-direct assumption is weak

1. The real endpoint accepts connections but stays silent after the welcome/banner frame.
2. Silence occurs even for malformed frames, which suggests the endpoint is not acting like an openly documented RPC surface for arbitrary external clients.
3. The API documentation frames websocket use as a replacement for page-side JS invocation, not as a standalone public automation API.
4. The documentation repeatedly depends on `requesturl`, callback function names, target pages, and browser areas (`show`, `hide`, `agent`).
5. Historical frontend/product code uses `window.sgFunctionsUI(...)` and `window.BrowserAction(...)`, not raw external websocket business calls.
6. Historical architecture docs emphasize `FunctionsUI`, `CommandRouter`, and browser-host bridge seams.

### Evidence that the current ws-native work is still useful

1. The ws-native auth replacement removed a real bug.
2. The ws backend now correctly carries forward the last navigated request URL.
3. `WsBrowserBackend` and `ws_protocol` remain valuable as deterministic protocol tooling for fake-server tests and any future bootstrap validation.

So the conclusion is **not** “delete ws-native work.”

The conclusion is:

- do not treat raw external websocket control as validated product architecture yet
- use the ws-native code only behind a decision gate

## Design Decision

Adopt a **decision-gated integration strategy**.

### Decision Gate 1: Validate bootstrap viability first

Before any more production architecture changes, add a focused, deterministic validation harness that can exercise a candidate raw-websocket bootstrap sequence against a live endpoint.

The harness must support:

- ordered frame scripts
- exact frame logging
- exact timeout/silence observation
- trying candidate setup sequences such as:
  - `sgSetAuthInfo`
  - `sgBrowserLogin`
  - `sgOpenAgent`
  - `sgBrowerserActiveTab`
  - then a minimal action such as `sgBrowerserOpenPage` or `sgBrowserExcuteJsCodeByArea`
- trying the same action with different `requesturl` assumptions
- distinguishing these outcomes:
  - numeric status returned
  - callback returned
  - welcome only, then silence
  - close/reset
  - protocol error

This harness is not product code. It is an evidence tool that prevents blind implementation.

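The harness could classify each recorded transcript so that the five outcomes above become mechanical rather than a judgment call. Below is a minimal Rust sketch under stated assumptions. `Observation`, `Outcome`, and `classify` are hypothetical names, not existing sgClaw types. The banner prefix is the one observed from the real endpoint. The callback check is only a JSON-array shape test, not a full protocol decode.

```rust
/// One event the probe observed after sending a candidate action (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
pub enum Observation {
    /// A text frame received from the endpoint.
    Text(String),
    /// Nothing arrived within the observation window.
    Timeout,
    /// A close frame or a connection reset.
    Closed,
}

/// The five outcomes the harness must tell apart.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    NumericStatus,
    Callback,
    WelcomeOnlyThenSilence,
    CloseOrReset,
    ProtocolError,
}

/// Banner prefix observed from the real endpoint.
fn is_welcome(frame: &str) -> bool {
    frame.starts_with("Welcome! You are client #")
}

/// Classifies one transcript: banners are skipped, and the first
/// remaining observation decides the outcome.
pub fn classify(transcript: &[Observation]) -> Outcome {
    for obs in transcript {
        match obs {
            Observation::Text(t) if is_welcome(t) => continue,
            Observation::Text(t) if t.parse::<i64>().is_ok() => return Outcome::NumericStatus,
            // Shape check only: callback frames are JSON arrays.
            Observation::Text(t) if t.starts_with('[') && t.ends_with(']') => {
                return Outcome::Callback
            }
            Observation::Text(_) => return Outcome::ProtocolError,
            Observation::Closed => return Outcome::CloseOrReset,
            Observation::Timeout => return Outcome::WelcomeOnlyThenSilence,
        }
    }
    // A transcript that ends without any business frame also counts as silence.
    Outcome::WelcomeOnlyThenSilence
}
```

The classifier skips banners before deciding. As a result, a welcome-only transcript that then times out lands in the silence bucket, which is the behavior the evidence summary reports for the live endpoint.
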
### Decision Gate 2: Make the bridge pivot the default fallback

If the validation harness cannot demonstrate a reproducible bootstrap sequence that yields real status/callback frames from the live browser endpoint, then raw websocket must be considered **non-validated for external control**.

At that point, the design must pivot to the bridge path:

- target sgClaw browser control at the real browser-host integration surface
- use the bridge already evidenced in docs/code (`FunctionsUI`, browser host IPC, `BrowserAction`, `CommandRouter`)
- keep raw websocket support, if retained at all, as a diagnostic or highly constrained adapter rather than the primary product path

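The fallback rule reduces to a small pure decision. The sketch below assumes each candidate sequence is summarized by repeated live runs. It also assumes that "reproducible" means real status/callback frames on every one of at least `min_runs` runs. That threshold and the names `Surface`, `RunSummary`, and `choose_surface` are illustrative assumptions, not part of the design.

```rust
/// Which integration surface the next branch should target.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Surface {
    /// Option A: bootstrap-validated raw websocket.
    BootstrapRawWebsocket,
    /// Option B: bridge-first; also the default when evidence is missing.
    Bridge,
}

/// Repeated live runs of one candidate bootstrap sequence.
pub struct RunSummary {
    pub sequence: String,
    /// Per run: did the minimal action return a real status or callback frame?
    pub real_frame_per_run: Vec<bool>,
}

/// Option A is chosen only when some sequence produced real frames on every
/// run, with at least `min_runs` runs (never fewer than one); otherwise the
/// bridge path wins.
pub fn choose_surface(runs: &[RunSummary], min_runs: usize) -> (Surface, Option<String>) {
    for run in runs {
        let reproducible = run.real_frame_per_run.len() >= min_runs.max(1)
            && run.real_frame_per_run.iter().all(|&ok| ok);
        if reproducible {
            return (Surface::BootstrapRawWebsocket, Some(run.sequence.clone()));
        }
    }
    (Surface::Bridge, None)
}
```

The function returns `Surface::Bridge` when no sequence qualifies. This encodes the rule that missing evidence selects Option B rather than further guessing.
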
## Architecture Options

## Option A: Bootstrap-validated raw websocket path

Choose this only if the live validation harness produces repeatable evidence.

### Resulting architecture

```text
sg_claw_client
-> sg_claw service
-> bootstrap sequence executor
-> WsBrowserBackend
-> browserWsUrl
-> sgBrowser
```

### Required conditions

- a reproducible bootstrap sequence exists
- the sequence yields status/callback traffic for real business actions
- the sequence can be encoded as a narrow service-side precondition layer
- the sequence does not require unowned browser UI/manual setup outside a documented contract

### Allowed production changes if Option A wins

- add explicit bootstrap calls before the first browser action
- persist validated session/context state needed by the real endpoint
- tighten `request_url` / target-page handling around the proven contract

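If Option A wins, the explicit bootstrap calls belong in one place rather than scattered across call sites. Below is a minimal Rust sketch. It assumes a stand-in `Backend` trait with a single `send` method (the real `BrowserBackend` signature may differ) and a validated bootstrap script supplied as data. `Bootstrapped` is a hypothetical wrapper name.

```rust
/// Stand-in for a browser backend (hypothetical; not sgClaw's real trait).
pub trait Backend {
    fn send(&mut self, frame: &str) -> Result<String, String>;
}

/// Replays a validated bootstrap script once, before the first real action.
pub struct Bootstrapped<B: Backend> {
    inner: B,
    script: Vec<String>,
    done: bool,
}

impl<B: Backend> Bootstrapped<B> {
    pub fn new(inner: B, script: Vec<String>) -> Self {
        Self { inner, script, done: false }
    }

    pub fn invoke(&mut self, frame: &str) -> Result<String, String> {
        if !self.done {
            for step in &self.script {
                // A failed step aborts this action; `done` stays false.
                self.inner
                    .send(step)
                    .map_err(|e| format!("bootstrap step failed: {}", e))?;
            }
            self.done = true;
        }
        self.inner.send(frame)
    }

    pub fn into_inner(self) -> B {
        self.inner
    }
}
```

If a bootstrap step fails, the action is aborted and `done` stays false. The next call therefore replays the whole script from the start instead of continuing with a half-initialized session.
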
### Not allowed even if Option A wins

- guessing bootstrap steps without evidence
- silently sprinkling many setup calls into random locations
- broadening the compat/runtime API before the bootstrap contract is known

## Option B: Bridge-first integration path

Choose this if live validation does not prove a workable raw websocket bootstrap.

### Resulting architecture

```text
sg_claw_client
-> sg_claw service
-> bridge adapter
-> browser host / FunctionsUI / BrowserAction / CommandRouter
-> sgBrowser page actions
```

### Required conditions

- local docs/code show a stable supported bridge path
- raw websocket remains non-validated or only page-context-scoped
- the bridge surface can be wrapped behind the existing `BrowserBackend` abstraction or a sibling adapter without weakening pipe behavior

### Allowed production changes if Option B wins

- add a new browser backend implementation that targets the real bridge surface
- redirect ws service/browser execution away from raw business frames
- preserve ws-native code only for tests, probes, or intentionally constrained cases

### Not allowed even if Option B wins

- pretending the old raw-ws mainline still works “well enough”
- leaving the service path ambiguously split between two competing primary surfaces

## Scope Guardrails for the Next Implementation Plan

The next implementation plan must obey these guardrails:

1. **One branch, one decision.** Do not implement both architecture options at once.
2. **Evidence before code.** If bootstrap is unproven, the next coding task is probe/validation tooling, not another speculative service/runtime refactor.
3. **Keep pipe untouched.** `src/lib.rs`, pipe handshake, and the pipe `BrowserPipeTool` path remain behaviorally unchanged.
4. **Do not delete ws-native code prematurely.** It still has value for protocol tests and validation tooling.
5. **Do not broaden success claims.** Removing the `invalid hmac seed` error did not, by itself, make real browser control work.

## Testing Strategy

### Stage 1: Evidence tooling tests

Add deterministic tests for the live-probe/validation harness so it can:

- send an ordered frame script
- record exact received frames
- report silence/timeout precisely
- expose transcript output suitable for comparing candidate bootstrap sequences

These tests use a fake websocket server, not sgBrowser.

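The Stage 1 behaviors can be exercised without sockets. The fake server is modelled as a thread that plays scripted reply frames over a channel, while the probe records every frame and reports silence when `recv_timeout` expires. This Rust sketch uses only `std::sync::mpsc` and `std::thread`. `Entry` and `run_script` are hypothetical names. A real test would put an actual fake websocket server behind the same transcript shape.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// One line of the probe transcript.
#[derive(Debug, PartialEq)]
pub enum Entry {
    Received(String),
    Silence { waited_ms: u64 },
    Closed,
}

/// Plays `script` (delay in ms, frame text) from a fake server thread and
/// records what the probe observes. With `keep_open`, the fake server stays
/// connected but quiet after the script, so the probe reports silence.
pub fn run_script(script: Vec<(u64, &'static str)>, keep_open: bool, silence_ms: u64) -> Vec<Entry> {
    let (tx, rx) = mpsc::channel::<String>();
    let server = thread::spawn(move || {
        for (delay_ms, frame) in script {
            thread::sleep(Duration::from_millis(delay_ms));
            if tx.send(frame.to_string()).is_err() {
                return; // the probe has stopped listening
            }
        }
        if keep_open {
            // Keep `tx` alive but send nothing, longer than the probe waits.
            thread::sleep(Duration::from_millis(silence_ms * 2));
        }
        // `tx` is dropped when this closure returns; the probe sees a close.
    });

    let mut transcript = Vec::new();
    loop {
        match rx.recv_timeout(Duration::from_millis(silence_ms)) {
            Ok(frame) => transcript.push(Entry::Received(frame)),
            Err(RecvTimeoutError::Timeout) => {
                transcript.push(Entry::Silence { waited_ms: silence_ms });
                break;
            }
            Err(RecvTimeoutError::Disconnected) => {
                transcript.push(Entry::Closed);
                break;
            }
        }
    }
    drop(rx);
    server.join().expect("fake server thread panicked");
    transcript
}
```

Passing `keep_open = true` reproduces the live symptom: a banner, then silence. Passing `false` shows the harness telling a close/reset apart from silence.
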
### Stage 2: Live validation runs

Use the harness against the real endpoint with a fixed matrix of candidate sequences.

At minimum, compare:

1. no bootstrap -> minimal action
2. `sgOpenAgent` -> minimal action
3. `sgSetAuthInfo` -> minimal action
4. `sgBrowserLogin` -> minimal action
5. `sgBrowerserActiveTab` -> minimal action
6. combined documented bootstrap candidates -> minimal action
7. alternate `requesturl` values representing:
   - `about:blank`
   - target page URL
   - a currently open page URL if known

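Generating the Stage 2 matrix from data keeps every live run labelled identically in transcripts. This Rust sketch makes two assumptions. First, each bootstrap variant is crossed with each `requesturl` value; the list above does not say whether item 7 is crossed with items 1 to 6 or run separately. Second, the combined candidate uses the order shown. `Candidate` and `build_matrix` are hypothetical names.

```rust
/// One live-validation run: bootstrap calls in order, then the minimal action.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub bootstrap: Vec<&'static str>,
    pub request_url: String,
}

/// Crosses every bootstrap variant with every `requesturl` assumption.
pub fn build_matrix(target_url: &str, open_page_url: Option<&str>) -> Vec<Candidate> {
    let bootstraps: Vec<Vec<&'static str>> = vec![
        vec![], // 1. no bootstrap
        vec!["sgOpenAgent"],
        vec!["sgSetAuthInfo"],
        vec!["sgBrowserLogin"],
        vec!["sgBrowerserActiveTab"],
        // 6. combined; this ordering is an assumption, not a documented sequence
        vec!["sgSetAuthInfo", "sgBrowserLogin", "sgOpenAgent", "sgBrowerserActiveTab"],
    ];
    let mut urls = vec!["about:blank".to_string(), target_url.to_string()];
    if let Some(open) = open_page_url {
        urls.push(open.to_string());
    }
    let mut matrix = Vec::new();
    for bootstrap in &bootstraps {
        for url in &urls {
            matrix.push(Candidate { bootstrap: bootstrap.clone(), request_url: url.clone() });
        }
    }
    matrix
}
```

With no known open page, this yields 12 runs (6 bootstrap variants times 2 URLs). Supplying a currently open page URL raises that to 18.
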
### Stage 3: Architecture-branch acceptance

If Option A wins:

- add one automated regression that proves the validated bootstrap sequence produces the first real status frame in a controlled integration test
- then continue with the narrowest production implementation plan

If Option B wins:

- write a new bridge-integration implementation plan before changing production code
- base all production tasks on the documented bridge surface

## Acceptance Criteria for This Design Correction

This design correction is successful only if future work follows these rules:

1. The repository has an explicit design document recording that raw ws-native direct control is **not currently validated**.
2. The next engineering slice starts with validation or bridge selection, not another speculative runtime refactor.
3. Any future claim that raw websocket is the supported production path must be backed by a reproducible live bootstrap transcript.
4. If that evidence does not appear, the project pivots to the bridge path rather than continuing to guess.

## Consequences

### Positive

- stops further speculative coding against an unproven surface
- preserves useful ws-native work without over-committing to it
- creates a clean decision point for the next implementation branch

### Trade-off

- this does not immediately unblock real browser control
- it intentionally inserts an evidence phase before more production changes

That trade-off is acceptable because the current failure mode is architectural uncertainty, not a missing two-line fix.
@@ -0,0 +1,105 @@
# WS Browser Welcome Frame Compatibility Design

## Background

Manual smoke verification after the ws-native browser backend auth replacement showed that real `sgBrowser` sends a banner text frame immediately after the websocket connection is established:

- `Welcome! You are client #1`

The current ws-native path treats the first received text frame as a protocol status frame. In `src/browser/ws_backend.rs`, `WsBrowserBackend::invoke(...)` reads one text frame and immediately parses it as an integer status code. That works for the existing deterministic tests, but it fails against the real browser because the first frame is a human-readable welcome banner rather than `0` or another numeric status.

This means the auth replacement is working (the old `invalid hmac seed: session key must not be empty` error no longer appears), but the real smoke test still fails on protocol parsing.

## Goal

Make the ws service path tolerate exactly one initial welcome/banner text frame from the real browser websocket, without weakening the general ws protocol semantics.

## Non-goals

This change must not:

- Relax parsing of arbitrary non-protocol text frames
- Change `WsBrowserBackend` into a browser-specific parser for banners
- Affect the legacy pipe path
- Add retry loops or broader reconnection logic
- Change callback handling semantics

## Chosen approach

Handle the welcome banner only in `ServiceBrowserWsClient`.

### Why this layer

`ServiceBrowserWsClient` is already the real-browser adapter used only by the ws service path in `src/service/server.rs`. The welcome frame is a quirk of the real browser endpoint rather than a property of the shared ws protocol abstraction. Keeping the compatibility behavior in the service-side client preserves the stricter semantics of `WsBrowserBackend` for all other callers and test doubles.

## Behavioral rules

1. Only the first received text frame after establishing a browser websocket connection may be treated as a welcome/banner candidate.
2. If that first text frame matches the real banner shape (currently observed as `Welcome! You are client #1`), the client discards it and continues waiting for the actual protocol frame.
3. The welcome skip is one-time only per connection, not per request. Because `ServiceBrowserWsClient` holds a persistent socket, this state must survive multiple `invoke(...)` calls on the same underlying websocket.
4. After the welcome skip:
   - status frames must still be numeric strings
   - callback frames must still match the existing JSON-array callback protocol
   - any other malformed frame remains a protocol error
5. Timeout, close/reset, and connect-failure semantics remain unchanged.

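Rule 4 means the parser that runs after the skip stays exactly as strict as before. Below is a Rust sketch with a hypothetical `Frame` type and `parse_frame` function. The callback branch is only a JSON-array shape check standing in for the existing callback decoder. The error text mirrors the one quoted in the acceptance criteria below.

```rust
/// A protocol frame after the one-time welcome skip has been applied.
#[derive(Debug, PartialEq)]
pub enum Frame {
    Status(i64),
    Callback(String),
}

/// Strict parsing: no banner tolerance and no trimming at this layer.
pub fn parse_frame(text: &str) -> Result<Frame, String> {
    if let Ok(code) = text.parse::<i64>() {
        return Ok(Frame::Status(code));
    }
    // Shape check only; the real client decodes against the callback protocol.
    if text.starts_with('[') && text.ends_with(']') {
        return Ok(Frame::Callback(text.to_string()));
    }
    Err(format!("invalid browser status frame: {}", text))
}
```

Fed the banner directly, this parser reproduces the `invalid browser status frame: Welcome! You are client #1` failure from the real smoke run. That is why the skip has to happen before the parser, not inside it.
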
## Matching strategy

Use a narrow string check in `ServiceBrowserWsClient` for a welcome/banner frame:

- starts with `Welcome! You are client #`

This is intentionally strict. We are adapting one known real-browser behavior, not introducing a generic “ignore garbage text” mode.

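The matching strategy and the one-time, per-connection state fit in a small gate in front of the strict parser. This Rust sketch assumes frames arrive through a `next_text` closure standing in for the persistent socket read. `WelcomeGate` and `next_protocol_frame` are hypothetical names. The flag lives with the connection, so it survives across `invoke(...)` calls.

```rust
/// Banner prefix observed from the real sgBrowser endpoint.
const WELCOME_PREFIX: &str = "Welcome! You are client #";

/// Per-connection state; lives as long as the socket, not per `invoke(...)`.
#[derive(Debug, Default)]
pub struct WelcomeGate {
    first_frame_seen: bool,
}

impl WelcomeGate {
    /// Returns the next frame for the strict protocol parser. The banner is
    /// discarded only if it is the very first text frame on this connection.
    pub fn next_protocol_frame<F>(&mut self, mut next_text: F) -> Result<String, String>
    where
        F: FnMut() -> Result<String, String>,
    {
        let frame = next_text()?;
        if !self.first_frame_seen {
            self.first_frame_seen = true;
            if frame.starts_with(WELCOME_PREFIX) {
                return next_text();
            }
        }
        Ok(frame)
    }
}
```

A non-matching first frame passes through untouched, so the strict parser still rejects it as a protocol error; the negative test relies on this. A second banner arriving later on the same connection is not skipped.
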
## Tests

### New red tests

Add focused unit tests to the test module in `src/service/server.rs`:

1. Positive case:
   - fake websocket server sends:
     1. `Welcome! You are client #1`
     2. `0`
   - then `WsBrowserBackend.invoke(Action::Navigate, ...)` succeeds

2. Negative case:
   - fake websocket server sends a different first text frame that does **not** match the known welcome prefix
   - assert the call still fails as a protocol error rather than silently skipping the frame

The positive test must fail before the implementation change and pass after it. The negative test guards the non-goal that we are not introducing a generic “ignore arbitrary text” mode.

### Regression coverage

Re-run:

- `cargo test service::server::tests -- --nocapture`
- `cargo test --test browser_ws_backend_test -- --nocapture`
- `cargo test --test service_task_flow_test -- --nocapture`

If those pass, re-run the earlier mixed ws+pipe sweep to confirm no unexpected regression escaped the targeted checks.

## Risks and controls

### Risk: swallowing a legitimate protocol error

Control:
- only allow the one-time skip on the first received text frame
- only skip frames matching the known welcome prefix

### Risk: broadening behavior beyond the service ws path

Control:
- keep the change entirely inside `ServiceBrowserWsClient`
- do not modify `WsBrowserBackend` parsing rules

## Acceptance criteria

The fix is complete only if all of the following are true:

1. The positive welcome-banner test fails before the change and passes after it.
2. The negative malformed-first-frame test proves that non-matching first text frames still fail as protocol errors.
3. The real ws service smoke test no longer fails with `invalid browser status frame: Welcome! You are client #1` when using the configured real sgBrowser endpoint.
4. Existing ws backend tests remain green.
5. Existing service task-flow regression remains green.
6. Pipe behavior remains unchanged, verified by the mixed ws+pipe regression suite.
@@ -0,0 +1,182 @@
# Zhihu WS Submit Realignment Design

## Background

The current Zhihu submit path drifted away from the documented browser websocket contract.

The authoritative contract for this repository is `docs/_tmp_sgbrowser_ws_api_doc.txt`.

For this slice, the spec anchors to these documented invariants only:

- connect to `ws://127.0.0.1:12345`
- send `{"type":"register","role":"web"}`
- send browser actions as JSON arrays `[requesturl, action, ...args]`
- let browser results come back through documented callback semantics such as `callBackJsToCpp(...)`
- keep the current page URL as the request owner instead of inventing an external helper page

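The two documented frame shapes can be produced without any dependency. This Rust sketch builds the JSON text by hand. `json_string`, `register_frame`, and `action_frame` are hypothetical helpers, and arguments are assumed to be strings. A real implementation would more likely use a JSON library.

```rust
/// Minimal JSON string escaper (quotes, backslashes, control characters).
pub fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// The documented registration message sent right after connecting.
pub fn register_frame() -> String {
    r#"{"type":"register","role":"web"}"#.to_string()
}

/// A documented action frame: `[requesturl, action, ...args]`.
pub fn action_frame(request_url: &str, action: &str, args: &[&str]) -> String {
    let mut parts = vec![json_string(request_url), json_string(action)];
    parts.extend(args.iter().map(|a| json_string(a)));
    format!("[{}]", parts.join(","))
}
```

`requesturl` is always the first array element, and the caller supplies the current page URL there. That is what keeps the backend page-owned rather than helper-page-owned.
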
The current production path does not follow that shape for Zhihu routes.

Instead, the submit path selects `BrowserCallbackBackend`, which starts `LiveBrowserCallbackHost` and attempts to bootstrap a local helper page at `/sgclaw/browser-helper.html`. That helper-page bootstrap is not part of the user's confirmed production model, and live evidence already shows it is the wrong assumption for the Release browser.

## Problem Statement

Zhihu submit currently fails before real work begins because the service path depends on a helper-page callback host bootstrap that the Release browser does not use.

That drift shows up in three ways:

1. Zhihu submit routes select the callback-host backend instead of the direct websocket backend.
2. The mainline request URL becomes the local helper page URL instead of the real browser page URL.
3. The submit path waits for helper-page readiness rather than proceeding through the documented websocket callback model.

This causes the observable failure:

- `timeout while waiting for browser message`
- no real Zhihu page is opened or acted on in the browser

## Goal

Realign the Zhihu submit path to the documented websocket callback model without changing the existing pipe/service contract.

Concretely, the target behavior is:

- Zhihu submit routes use the websocket browser backend directly
- browser messages keep the real page URL as `requesturl`
- browser actions continue to use documented websocket opcodes
- callback-bearing results continue to use the documented callback payload model
- the browser no longer depends on opening a local helper page before Zhihu work starts

## Non-goals

This slice does not include:

- changing `ClientMessage` or `ServiceMessage`
- changing `run_submit_task_with_browser_backend(...)`
- rewriting the Zhihu workflow itself
- adding a new browser bridge abstraction
- redesigning the pipe path
- deleting callback-host code that is outside the Zhihu submit mainline
- speculative protocol expansion beyond the documented websocket contract

## Chosen Approach

Choose **Option A**: withdraw Zhihu submit from the helper-page callback-host path and return it to the documented websocket callback model.

Rejected alternatives:

- **Option B**: keep the callback host but remove the helper bootstrap. This still preserves the wrong abstraction in the mainline.
- Build a new orchestration layer: this exceeds the requested scope.

## Mainline Architecture After Realignment

```text
sg_claw_client
-> sg_claw service / runtime submit path
-> existing BrowserBackend seam
-> WsBrowserBackend
-> ws://127.0.0.1:12345
-> documented browser opcodes and callback semantics
```

For Zhihu submit routes, the callback-host helper page is no longer part of the mainline execution chain.

## Required Production Changes

### 1. Route selection

Update submit-route backend selection so these routes no longer instantiate `BrowserCallbackBackend`:

- `WorkflowRoute::ZhihuHotlistExportXlsx`
- `WorkflowRoute::ZhihuHotlistScreen`
- `WorkflowRoute::ZhihuArticleEntry`
- `WorkflowRoute::ZhihuArticleDraft`
- `WorkflowRoute::ZhihuArticlePublish`

The change applies in both:

- the service submit path in `src/service/server.rs`
- the direct runtime submit path in `src/agent/mod.rs`

In the direct runtime submit path, the existing fallback for an unconfigured browser websocket URL stays unchanged:

- if a real browser websocket URL is configured, use `WsBrowserBackend` for the listed Zhihu routes
- if no browser websocket URL is configured, keep the existing pipe fallback instead of failing fast

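Both submit paths can share one selection rule for the listed routes. Below is a Rust sketch with stand-in types. The `WorkflowRoute` variant names come from this spec. `Other`, `BackendChoice`, and `select_backend` are hypothetical, and non-Zhihu routes are shown as keeping their current selection. Treating an empty URL as unconfigured is also an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkflowRoute {
    ZhihuHotlistExportXlsx,
    ZhihuHotlistScreen,
    ZhihuArticleEntry,
    ZhihuArticleDraft,
    ZhihuArticlePublish,
    /// Stand-in for every route outside this slice.
    Other,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BackendChoice {
    Ws { browser_ws_url: String },
    Pipe,
    /// The route keeps whatever it selected before this slice.
    Unchanged,
}

fn is_zhihu_submit_route(route: WorkflowRoute) -> bool {
    matches!(
        route,
        WorkflowRoute::ZhihuHotlistExportXlsx
            | WorkflowRoute::ZhihuHotlistScreen
            | WorkflowRoute::ZhihuArticleEntry
            | WorkflowRoute::ZhihuArticleDraft
            | WorkflowRoute::ZhihuArticlePublish
    )
}

/// Direct-runtime selection. There is deliberately no callback-host arm.
pub fn select_backend(route: WorkflowRoute, browser_ws_url: Option<&str>) -> BackendChoice {
    if !is_zhihu_submit_route(route) {
        return BackendChoice::Unchanged;
    }
    match browser_ws_url {
        // An empty URL is treated as unconfigured (assumption).
        Some(url) if !url.is_empty() => BackendChoice::Ws { browser_ws_url: url.to_string() },
        // Existing pipe fallback, not a fast failure.
        _ => BackendChoice::Pipe,
    }
}
```

No arm yields a callback-host backend for these routes, which is the point of the change. Assuming the service path always has `browser_ws_url` configured, it always lands on the `Ws` arm.
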
### 2. Request URL ownership

Keep `requesturl` aligned with the real browser page instead of the helper page.

Expected behavior:

- initial request URL comes from the existing submit-path request context
- after a successful navigate call, the websocket backend continues to update its request URL to the navigated target page
- later `getText` and `eval` calls run against the real Zhihu page URL

This preserves the documented page-owned websocket model.

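The ownership rule is a small state machine. The URL starts from the submit-path request context and moves only when a navigate call succeeds. Below is a Rust sketch; `RequestUrl`, `from_request_context`, and `after_navigate` are hypothetical names, not the backend's real fields.

```rust
/// Tracks which page owns outgoing frames.
pub struct RequestUrl {
    current: String,
}

impl RequestUrl {
    /// Starts from the page URL carried by the submit-path request context.
    pub fn from_request_context(page_url: &str) -> Self {
        Self { current: page_url.to_string() }
    }

    /// The `requesturl` to put first in the next action frame.
    pub fn current(&self) -> &str {
        &self.current
    }

    /// Only a successful navigate moves ownership to the target page.
    pub fn after_navigate(&mut self, target_url: &str, succeeded: bool) {
        if succeeded {
            self.current = target_url.to_string();
        }
    }
}
```

Every later `getText`/`eval` frame reads `current()`, so it carries the navigated Zhihu page URL rather than anything under `/sgclaw/`.
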
### 3. Callback semantics

Keep callback-bearing actions on the existing websocket protocol path, using the documented callback payload shape.

Required invariants:

- action frames remain `[requesturl, action, ...args]`
- navigate uses the documented opcode `sgHideBrowserCallAfterLoaded`
- `getText` and `eval` continue to emit `callBackJsToCpp(...)` payloads in the documented `sourceUrl@_@targetUrl@_@callback@_@actionUrl@_@responseTxt` form
- callback decoding remains on the websocket path instead of moving through localhost helper-page HTTP endpoints

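The documented callback payload is five fields joined by the literal separator `@_@`. Below is a Rust sketch of encoding and decoding it. It assumes only the final `responseTxt` field may itself contain `@_@`, which is why decoding uses `splitn(5, ...)`. `CallbackPayload` is a hypothetical type, and the sketch does not show how the string is wrapped inside `callBackJsToCpp(...)`.

```rust
/// Literal field separator of the documented callback payload.
const SEP: &str = "@_@";

#[derive(Debug, Clone, PartialEq)]
pub struct CallbackPayload {
    pub source_url: String,
    pub target_url: String,
    pub callback: String,
    pub action_url: String,
    pub response_txt: String,
}

impl CallbackPayload {
    /// `sourceUrl@_@targetUrl@_@callback@_@actionUrl@_@responseTxt`
    pub fn encode(&self) -> String {
        [
            self.source_url.as_str(),
            self.target_url.as_str(),
            self.callback.as_str(),
            self.action_url.as_str(),
            self.response_txt.as_str(),
        ]
        .join(SEP)
    }

    /// Splits into at most five fields, so a separator inside `responseTxt` survives.
    pub fn decode(raw: &str) -> Result<Self, String> {
        let parts: Vec<&str> = raw.splitn(5, SEP).collect();
        if parts.len() != 5 {
            return Err(format!("expected 5 fields separated by {}, got {}", SEP, parts.len()));
        }
        Ok(Self {
            source_url: parts[0].to_string(),
            target_url: parts[1].to_string(),
            callback: parts[2].to_string(),
            action_url: parts[3].to_string(),
            response_txt: parts[4].to_string(),
        })
    }
}
```
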
### 4. Callback-host removal from the Zhihu mainline

For this slice, callback-host code is removed from the Zhihu submit mainline, not redesigned.

Practical meaning:

- Zhihu submit must not start `LiveBrowserCallbackHost`
- Zhihu submit must not emit `sgBrowerserOpenPage` for `/sgclaw/browser-helper.html`
- Zhihu submit must not block on `/sgclaw/callback/ready`

Code outside the Zhihu submit mainline can remain unchanged unless tests require cleanup.

## Test Strategy

This slice follows TDD and replaces the stale helper-page assumptions with direct websocket submit-path assertions.

### Red tests to add or rewrite

1. Rewrite the current submit regression that asserts helper-page bootstrap.
   - old behavior under test: Zhihu submit bootstraps callback host
   - new behavior under test: Zhihu submit does **not** bootstrap callback host and does **not** emit helper-page frames

2. Add or update a focused submit-path regression proving request ownership stays on the real page.
   - after navigate, subsequent Zhihu browser actions must use the real target page URL rather than `/sgclaw/browser-helper.html`

3. Remove or rewrite any newly added red test whose only purpose was to preserve callback-host-without-helper behavior.
   - that test belongs to the rejected Option B path, not the chosen Option A path

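Red tests 1 and 2 reduce to assertions over the frames the websocket backend actually sent to a fake browser server. This Rust sketch assumes the test captures outbound frames as strings and that each frame is a documented JSON array with `requesturl` first. `check_zhihu_transcript` and its deliberately naive string extraction are hypothetical and avoid a JSON dependency.

```rust
const HELPER_PATH: &str = "/sgclaw/browser-helper.html";
const NAVIGATE: &str = "sgHideBrowserCallAfterLoaded";

/// Extracts the first JSON string element (the `requesturl`) from a frame
/// like `["https://...","action",...]`. Assumes no escaped quotes in the URL.
fn request_url_of(frame: &str) -> Option<&str> {
    let rest = frame.strip_prefix("[\"")?;
    let end = rest.find('"')?;
    Some(&rest[..end])
}

/// Returns an error describing the first violated expectation, if any.
pub fn check_zhihu_transcript(sent: &[&str], target_url: &str) -> Result<(), String> {
    if let Some(frame) = sent.iter().find(|f| f.contains(HELPER_PATH)) {
        return Err(format!("helper-page frame emitted: {}", frame));
    }
    let navigate_opcode = format!("\"{}\"", NAVIGATE);
    let nav = sent
        .iter()
        .position(|f| f.contains(navigate_opcode.as_str()))
        .ok_or("no navigate frame was sent")?;
    let follow_ups = &sent[nav + 1..];
    if follow_ups.is_empty() {
        return Err("no follow-up action after navigate".to_string());
    }
    for frame in follow_ups {
        if request_url_of(frame) != Some(target_url) {
            return Err(format!("follow-up not owned by {}: {}", target_url, frame));
        }
    }
    Ok(())
}
```

Frames alone cannot show whether `LiveBrowserCallbackHost` was started, so that part of red test 1 still needs its own assertion.
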
### Green verification

After the minimal code change, run focused verification in this order:

1. `agent_runtime_test` coverage for the submit path
2. relevant Zhihu `compat_runtime_test` coverage
3. submit/service websocket regressions impacted by route selection
4. stronger real-browser validation after focused tests pass

## Scope Guardrails

The implementation plan for this spec must obey all of the following:

1. Do not modify the pipe contract.
2. Do not add a new browser abstraction.
3. Do not broaden the change beyond the Zhihu submit path and its directly affected websocket protocol tests.
4. Do not keep the helper-page path as a second competing Zhihu mainline.
5. If live validation still reveals a callback-payload mismatch, only adjust the websocket protocol encoding/decoding at the exact mismatch point.

## Acceptance Criteria

The slice is complete when all of the following are true:

1. Zhihu submit routes no longer select the helper-page callback-host backend.
2. No Zhihu submit regression expects or observes `/sgclaw/browser-helper.html` bootstrap.
3. The websocket backend sends Zhihu follow-up actions with the real page URL as `requesturl`.
4. Focused automated tests covering the changed submit path pass.
5. Real-browser validation no longer fails at callback-host readiness timeout, emits no helper-page bootstrap frames, and emits at least one real-page follow-up browser action after navigate.
@@ -0,0 +1,219 @@
# Service Chat Web Console Design

## Background

The current natural-language entrypoint is the terminal client in `src/bin/sg_claw_client.rs`.
That client already talks to the existing service websocket, sends `ClientMessage`, and prints
`ServiceMessage` responses.

The repository also contains a separate browser callback helper at
`http://127.0.0.1:61058/sgclaw/browser-helper.html`. That page is part of the browser backend
execution path and must remain untouched.

For this slice, the authoritative boundary is:

- the new page may talk to the existing service websocket only
- the page must not talk to the browser websocket directly
- the page must not reuse or replace `browser-helper.html`
- the page must not change the service protocol or browser execution logic

## Problem Statement

Running `cargo run --bin sg_claw_client` and typing into stdin works, but it is inconvenient for
routine usage. The user wants a simple local HTML page with a websocket connection field, a
natural-language input box, and a send button.

The risk is scope drift: if the new page reaches into the browser-helper flow or changes backend
logic, it could damage the working Zhihu/browser path.

## Goal

Add a standalone local HTML console that connects to the existing service websocket and submits
natural-language tasks using the current `submit_task` message shape.

The page should be usable without changing `sg_claw`, `sg_claw_client`, `browser-helper.html`, or
any existing service/browser runtime behavior.

## Non-goals

This slice does not include:

- serving the page from the Rust service
- changing `ClientMessage` or `ServiceMessage`
- changing `src/service/server.rs`
- changing `src/browser/callback_host.rs`
- changing `src/browser/callback_backend.rs`
- changing the helper-page bootstrap flow
- adding authentication, persistence, or multi-session orchestration
- replacing the terminal client

## Chosen Approach

Choose Option A: add one standalone HTML file that opens in a normal browser and talks to the
existing service websocket at `ws://127.0.0.1:42321` by default.

Why this option:

- it is the narrowest possible change
- it reuses the already-working service protocol
- it does not alter the browser-helper path
- it keeps all runtime ownership in the existing Rust service

Rejected alternatives:

- extend `browser-helper.html` into a chat UI: wrong boundary; that page belongs to browser
  callback orchestration, not user task entry
- add a new HTTP server inside `sg_claw`: unnecessary for the requested scope
- replace the terminal client binary: not required; both clients can coexist

## File Placement

Create the page outside `frontend/runtime-host/`.

Chosen location:

- `frontend/service-console/sg_claw_service_console.html`

Reason:

- `frontend/runtime-host/` is reserved for SuperRPA runtime-host bundles
- the new page is a standalone local tool, not a Chromium-hosted bundle
- keeping it in its own directory makes the isolation explicit

## Page Architecture

The page is a single self-contained HTML file with inline CSS and inline JavaScript.
No build step and no frontend framework are required.

The page has three UI regions:

1. Connection bar
   - websocket URL input
   - connect/disconnect button
   - current connection state label

2. Message stream
   - appends service logs in arrival order
   - distinguishes connection info, task logs, errors, and final completion
   - keeps the current session visible until the page is refreshed

3. Task composer
   - one textarea for natural-language input
   - one send button
   - send disabled while the websocket is disconnected
   - while a task is in flight, keep the composer enabled and let repeated submits surface the
     existing service-side `busy` response rather than adding a new frontend queue

## Protocol Contract

The page must reuse the existing service protocol exactly.

### Outbound message

When the user clicks send, the page sends:

```json
{
  "type": "submit_task",
  "instruction": "<user input>",
  "conversation_id": "",
  "messages": [],
  "page_url": "",
  "page_title": ""
}
```

This matches the current terminal client shape in `src/bin/sg_claw_client.rs`.

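The only dynamic field in the payload is `instruction`, so the main thing that can go wrong is escaping user text. Below is a Rust sketch that produces the identical string, for example as a fixture to compare against the terminal client. `json_string` and `submit_task_json` are hypothetical helpers, not the terminal client's code. In the page itself, `JSON.stringify` on an equivalent object literal produces equivalent JSON.

```rust
/// Minimal JSON string escaper (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Builds the `submit_task` message with the same fields as the terminal client.
pub fn submit_task_json(instruction: &str) -> String {
    format!(
        r#"{{"type":"submit_task","instruction":{},"conversation_id":"","messages":[],"page_url":"","page_title":""}}"#,
        json_string(instruction)
    )
}
```
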
### Inbound messages

The page displays these existing `ServiceMessage` variants:

- `status_changed` -> render as a compact connection/runtime status row
- `log_entry` -> append as a chronological task log row
- `task_complete` -> append as the terminal result row for that submission
- `busy` -> append as a visible refusal/error row without automatic retry

No new message type is introduced.

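The inbound mapping is a single dispatch on the message `type` tag. It is shown in Rust for consistency with the other sketches; the page would do the same in its inline JavaScript. The sketch assumes the tag has already been read from the JSON. `RowKind` and `row_kind` are hypothetical. Returning `None` for unlisted types is also an assumption, since the spec names only these four.

```rust
/// How the message stream renders each known `ServiceMessage` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowKind {
    /// Compact connection/runtime status row.
    Status,
    /// Chronological task log row.
    Log,
    /// Terminal result row for the submission.
    Complete,
    /// Visible refusal; never retried automatically.
    Refusal,
}

pub fn row_kind(message_type: &str) -> Option<RowKind> {
    match message_type {
        "status_changed" => Some(RowKind::Status),
        "log_entry" => Some(RowKind::Log),
        "task_complete" => Some(RowKind::Complete),
        "busy" => Some(RowKind::Refusal),
        _ => None,
    }
}
```
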
## Interaction Flow

1. User opens the local HTML file with a normal browser, typically via `file://`.
2. User connects to the service websocket.
3. The page shows websocket connection status locally.
4. User enters a natural-language instruction and clicks send.
5. The page sends one `submit_task` payload over the service websocket.
6. The service continues to execute tasks exactly as it already does.
7. Incoming service messages are appended to the message stream.
8. After `task_complete`, the websocket remains open so the user can send another task.

## Error Handling

The page handles only UI-local failures:

- websocket connect failure -> show connection error and keep send disabled
- websocket disconnect mid-session -> mark disconnected and require reconnect
- empty instruction -> block send and show inline validation
- `busy` response -> show as a visible service-side refusal without retry logic

The page does not add retries, protocol fallbacks, or browser-runtime recovery logic.

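The composer rules from the page architecture and error-handling lists reduce to one check before sending. Below is a Rust sketch, in Rust for consistency with the other examples. `Conn`, `SendDecision`, and `decide_send` are hypothetical, and treating whitespace-only input as empty is an assumption. The in-flight flag is accepted but deliberately ignored, because repeated submits are supposed to reach the service and come back as `busy`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Conn {
    Disconnected,
    Connecting,
    Open,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SendDecision {
    /// Send this instruction (unchanged) inside a `submit_task` message.
    Send(String),
    /// Show this reason inline; nothing goes over the wire.
    Blocked(&'static str),
}

pub fn decide_send(conn: Conn, input: &str, _task_in_flight: bool) -> SendDecision {
    if conn != Conn::Open {
        return SendDecision::Blocked("not connected");
    }
    if input.trim().is_empty() {
        return SendDecision::Blocked("instruction is empty");
    }
    // No frontend queue: an in-flight task does not block sending, so a
    // repeated submit reaches the service, which answers with `busy`.
    SendDecision::Send(input.to_string())
}
```
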
## Isolation From `browser-helper.html`
|
||||
|
||||
This is the critical constraint.
|
||||
|
||||
The new page must never:
|
||||
|
||||
- reference `/sgclaw/browser-helper.html`
|
||||
- reference `/sgclaw/callback/ready`
|
||||
- reference `/sgclaw/callback/events`
|
||||
- reference `/sgclaw/callback/commands/next`
|
||||
- reference `/sgclaw/callback/commands/ack`
|
||||
- connect to `ws://127.0.0.1:12345`
|
||||
|
||||
The only network target owned by the page is the service websocket, defaulting to
|
||||
`ws://127.0.0.1:42321`.
|
||||
|
||||
Because of that boundary, the page does not interfere with the helper-page bootstrap path.

## Test Strategy

This slice stays minimal, so the automated guard is also minimal.

### Automated regression

Add one focused integration test in `tests/service_console_html_test.rs` that reads the standalone HTML source and asserts:

- the file exists at the agreed path and is resolved from `CARGO_MANIFEST_DIR`, so the test is stable across working directories
- it contains the service websocket default URL
- it contains `submit_task` payload construction
- it does not contain helper-page URLs or callback-host endpoints
- it does not contain the browser websocket URL

This test is a scope guard, not a browser-E2E suite.
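
Below is a minimal sketch of the guard. It assumes the page embeds the default service URL and the literal `submit_task` verbatim, so substring checks are enough. The helper names (`console_html_path`, `scope_violations`) and the constant lists are illustrative; the path and URLs come from this spec. The file-reading `#[test]` appears only as a comment because it relies on `env!("CARGO_MANIFEST_DIR")`, which Cargo sets at compile time.

```rust
use std::path::{Path, PathBuf};

/// Page location relative to the crate root (from the acceptance criteria).
const CONSOLE_REL_PATH: &str = "frontend/service-console/sg_claw_service_console.html";

/// Strings the page must contain.
const REQUIRED: &[&str] = &["ws://127.0.0.1:42321", "submit_task"];

/// Strings the page must never contain: helper page, callback-host
/// endpoints, and the browser websocket URL.
const FORBIDDEN: &[&str] = &[
    "browser-helper.html",
    "/sgclaw/callback/ready",
    "/sgclaw/callback/events",
    "/sgclaw/callback/commands/next",
    "/sgclaw/callback/commands/ack",
    "ws://127.0.0.1:12345",
];

/// Resolve the page from the crate root, not the current working directory.
fn console_html_path(manifest_dir: &Path) -> PathBuf {
    manifest_dir.join(CONSOLE_REL_PATH)
}

/// Return every scope violation found in the HTML source (empty = pass).
fn scope_violations(source: &str) -> Vec<String> {
    let mut problems = Vec::new();
    for needle in REQUIRED {
        if !source.contains(needle) {
            problems.push(format!("missing required `{needle}`"));
        }
    }
    for needle in FORBIDDEN {
        if source.contains(needle) {
            problems.push(format!("contains forbidden `{needle}`"));
        }
    }
    problems
}

// In tests/service_console_html_test.rs the guard would then be roughly:
//
// #[test]
// fn service_console_stays_in_scope() {
//     let path = console_html_path(Path::new(env!("CARGO_MANIFEST_DIR")));
//     let source = std::fs::read_to_string(&path).expect("console page exists");
//     assert_eq!(scope_violations(&source), Vec::<String>::new());
// }
```

Collecting every violation, rather than stopping at the first failed assertion, lets a single failing run report all scope leaks at once.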

### Manual smoke verification

With the existing service binary running:

1. open the HTML file in a browser
2. connect to the service websocket
3. confirm that local websocket open/close events and service `status_changed` messages both appear in the message stream
4. submit a natural-language task
5. confirm that log messages and the completion result render in the page
6. confirm that the helper-page path is unaffected, since the page never references it

## Acceptance Criteria

The slice is complete when all of the following are true:

1. `frontend/service-console/sg_claw_service_console.html` exists.
2. The page connects to the existing service websocket without backend changes.
3. The page sends the existing `submit_task` shape and receives existing `ServiceMessage` events.
4. The page does not reference `browser-helper.html`, callback-host endpoints, or the browser websocket URL.
5. Existing browser-helper logic remains untouched.
6. The automated source guard passes.
7. Manual smoke verification confirms a task can be submitted from the HTML page.

@@ -0,0 +1,373 @@
# Zhihu Hotlist Post-Export Auto-Open Design

## Background

The current Zhihu hotlist workflows already support two separate artifact outputs:

- `openxml_office` generates a local `.xlsx` file for hotlist export
- `screen_html_export` generates a local `.html` dashboard for presentation

Today, the workflow stops after artifact generation and returns a summary string such as:

- `已导出知乎热榜 Excel <path>` ("exported the Zhihu hotlist Excel")
- `已生成知乎热榜大屏 <path>` ("generated the Zhihu hotlist dashboard")

That means the user still has to open the generated file manually.

The user wants one additional post-export action, but only one at a time:

1. for Excel-oriented tasks, automatically open the generated `.xlsx` with the system default spreadsheet application
2. for dashboard-oriented tasks, automatically open the generated local dashboard HTML inside the running sgBrowser session

This is an exclusive choice, not a combined mode.

## Current Runtime Facts

The implementation must match the current browser/runtime boundary that already exists in the repo:

- the active service submit path in `src/service/server.rs` constructs `BrowserCallbackBackend`
- `BrowserCallbackBackend::invoke(Action::Navigate, ...)` currently emits `sgBrowerserOpenPage`, which opens a new visible browser tab and keeps the helper page alive
- `WsBrowserBackend::invoke(Action::Navigate, ...)` has different semantics and a different transport path from the callback-host service path
- `MacPolicy::validate(...)` currently rejects empty or non-domain values, so a raw `file://...` navigation cannot pass through the normal domain validation path today
- `screen_html_export` already returns `presentation.url`, which is the existing `file://` presentation URL contract for the generated dashboard

Those facts mean the design must not promise "replace the helper page" or "reuse identical tab behavior across all backends". The required success path for this slice is narrower: open the generated dashboard automatically in the current callback-host-backed sgBrowser service session without adding a new user-facing surface.

## Problem Statement

The existing workflow logic in `src/compat/workflow_executor.rs` already separates hotlist export from dashboard generation, but it treats both routes as artifact-only flows. The last mile is missing:

- the Excel route does not auto-open the generated file
- the dashboard route does not consume the generated dashboard presentation URL and open it automatically in the browser runtime

The risk is scope drift. This change must not:

- turn Excel-open and dashboard-open into a combined workflow
- add new help or help-like user-visible surfaces
- move orchestration into `frontend/service-console/`
- modify the websocket protocol
- modify `browser-helper.html`
- modify callback-host HTTP endpoints or their contracts
- change the artifact-generation contract of `openxml_office` or `screen_html_export`

## Goal

Extend the existing Zhihu hotlist post-export behavior so that:

- Excel tasks generate `.xlsx` and then auto-open it with the local system default spreadsheet application
- dashboard tasks generate `.html` and then auto-open that generated dashboard inside sgBrowser

On the current callback-host service path, "inside sgBrowser" means opening the generated dashboard in a new visible browser tab while the helper page stays alive. The user does not need to open the file manually.

## Non-goals

This slice does not include:

- opening Excel and dashboard in the same run
- adding a new combined route that auto-opens both artifacts
- adding any new help, helper, or user-visible assistance surface
- modifying `frontend/service-console/sg_claw_service_console.html`
- modifying `src/service/protocol.rs`
- modifying `browser-helper.html`
- modifying `/sgclaw/callback/*` contracts
- turning the browser backend into a general-purpose local filesystem browser
- changing the artifact-generation JSON contract of `openxml_office` or `screen_html_export`

## Chosen Approach

Keep the current two workflow routes, but add one route-specific post-export action to each:

- `ZhihuHotlistExportXlsx` -> generate `.xlsx`, then open it locally with the OS default app
- `ZhihuHotlistScreen` -> generate `.html`, then open the generated dashboard presentation URL in the browser runtime

For the dashboard route, use the existing `presentation.url` returned by `screen_html_export` as the authoritative browser-open URL. Do not add a separate URL conversion layer on the normal path when the tool already returns the presentation contract.

For this case, the compat opener must emit exactly this navigate request shape:

- `action`: `Action::Navigate`
- `expected_domain`: the exact literal `__sgclaw_local_dashboard__`
- `params.url`: the exact `presentation.url` returned by `screen_html_export`
- `params.sgclaw_local_dashboard_open.source`: the exact literal `compat.workflow_executor`
- `params.sgclaw_local_dashboard_open.kind`: the exact literal `zhihu_hotlist_screen`
- `params.sgclaw_local_dashboard_open.output_path`: the generated local dashboard artifact path
- `params.sgclaw_local_dashboard_open.presentation_url`: the same `file://` URL stored in `params.url`

On the current callback-host-backed service path, only that exact request shape is approved for the local-dashboard special case. A plain `Action::Navigate` with an arbitrary `file://...` URL, or a request missing any one of the required marker fields above, must continue to be rejected.
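
Below is a sketch of the request builder in `src/compat/artifact_open.rs`, using a typed stand-in for the request. The real backend most likely carries `params` as a `serde_json::Value`. The struct and constant names are hypothetical; every literal value comes from the list above.

```rust
/// Literal values from this spec; the constant names are hypothetical.
const LOCAL_DASHBOARD_DOMAIN: &str = "__sgclaw_local_dashboard__";
const MARKER_SOURCE: &str = "compat.workflow_executor";
const MARKER_KIND: &str = "zhihu_hotlist_screen";

/// Stand-in for the repo's `Action` enum (only the variant used here).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Action {
    Navigate,
}

/// Typed view of `params.sgclaw_local_dashboard_open`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LocalDashboardOpen {
    source: String,
    kind: String,
    output_path: String,
    presentation_url: String,
}

/// Hypothetical typed view of the whole navigate request.
#[derive(Debug, Clone, PartialEq, Eq)]
struct NavigateRequest {
    action: Action,
    expected_domain: String,
    /// `params.url`
    url: String,
    marker: LocalDashboardOpen,
}

/// Build the single approved request shape from a successful
/// `screen_html_export` result. `presentation_url` is copied verbatim into
/// both `params.url` and the marker; no URL is derived from `output_path`.
fn dashboard_navigate_request(output_path: &str, presentation_url: &str) -> NavigateRequest {
    NavigateRequest {
        action: Action::Navigate,
        expected_domain: LOCAL_DASHBOARD_DOMAIN.to_string(),
        url: presentation_url.to_string(),
        marker: LocalDashboardOpen {
            source: MARKER_SOURCE.to_string(),
            kind: MARKER_KIND.to_string(),
            output_path: output_path.to_string(),
            presentation_url: presentation_url.to_string(),
        },
    }
}
```

Keeping the literals as constants in one module lets the compat layer and the backend check share a single source of truth.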

Because normal `MacPolicy` domain validation cannot accept `file://...`, add a narrow local-dashboard presentation allowance in the browser backend/security boundary. That allowance must be limited to this one case:

- only for `Action::Navigate`
- only for generated local dashboard presentation URLs
- only for local HTML presentation, not arbitrary local paths or generic file browsing

Why this approach:

- it preserves the existing mutual exclusivity between Excel export and dashboard presentation
- it keeps artifact generation in the existing tools
- it keeps browser opening inside the existing browser backend boundary
- it uses the existing `screen_html_export` presentation contract instead of duplicating it
- it avoids pushing orchestration into the service console or protocol layer
- it stays compatible with the current callback-host runtime, where visible navigation is new-tab based
- it limits the guaranteed browser-open behavior in this slice to the callback-host-backed service path that the user is using today

Rejected alternatives:

- add a combined "Excel + dashboard" route: explicitly rejected, because the user wants one post-export action per run
- let `frontend/service-console/` decide when to open generated files: wrong layer; the console is only a submit/view surface
- add help UI to expose output choices: explicitly unwanted by the user
- change `browser-helper.html` so the helper page itself becomes the dashboard: this would break the current helper-page persistence model
- promise a backend-agnostic "replace the current page" behavior: inaccurate because callback-host and websocket backends do not share identical navigate semantics
- require the websocket backend to gain matching local-dashboard visible-open behavior in this slice: outside the narrow current-service-path goal

## File Responsibilities

### `src/compat/workflow_executor.rs`

Continue to own:

- route detection for Zhihu hotlist workflows
- artifact generation orchestration
- post-export summary construction

New responsibilities in this slice:

- parse the successful artifact payloads after `openxml_office` and `screen_html_export`
- call the route-specific post-export opener only after artifact creation succeeds
- for the dashboard route, consume `presentation.url` from the `screen_html_export` result payload
- keep generation success and post-export open success/failure distinct in the returned summary

### `src/compat/artifact_open.rs`

New helper module to keep side effects out of `workflow_executor.rs`.

Responsibilities:

- open a generated local `.xlsx` with the system default application
- open a generated local dashboard presentation URL through the existing `BrowserBackend`
- construct the exact approved dashboard navigate request shape used by this slice
- define the narrow local-dashboard presentation token/constants used by the compat layer and backend compatibility path
- return narrow success/failure results so `workflow_executor.rs` can produce accurate summaries

This module must stay small and focused. It is not a general launcher framework.

### `src/browser/callback_backend.rs`

New narrow responsibility in this slice:

- at the `BrowserCallbackBackend::invoke(Action::Navigate, params, expected_domain)` entrypoint, recognize only the exact approved local-dashboard presentation request shape
- preserve the current callback-host behavior of using `sgBrowerserOpenPage`, which opens a new visible tab and keeps the helper page alive
- reject local-file navigate attempts that do not include the exact post-export marker payload from the compat layer

This slice must not change callback-host polling, helper bootstrap, or callback endpoint behavior.
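
The sketch below shows one way the callback backend could classify an incoming `Action::Navigate` request before choosing a validation path. It uses a flattened, hypothetical view of `params`; `NavigateParams`, `NavigateRoute`, and `classify_navigate` are illustrative names. Only the full marker shape reaches the local-dashboard path. A bare `file://` navigate is refused, and everything else falls through to the unchanged `MacPolicy::validate(...)` path.

```rust
/// Hypothetical flattened view of the navigate `params`.
/// `None` means the field was absent from the request.
#[derive(Debug, Default)]
struct NavigateParams<'a> {
    url: Option<&'a str>,
    marker_source: Option<&'a str>,
    marker_kind: Option<&'a str>,
    marker_output_path: Option<&'a str>,
    marker_presentation_url: Option<&'a str>,
}

#[derive(Debug, PartialEq, Eq)]
enum NavigateRoute {
    /// Not a local-dashboard request: continue with normal `MacPolicy` validation.
    Ordinary,
    /// Exact approved shape: run the local-dashboard validator, then emit the
    /// existing `sgBrowerserOpenPage` command (new visible tab).
    LocalDashboard { url: String, output_path: String },
    Rejected(&'static str),
}

const LOCAL_DASHBOARD_DOMAIN: &str = "__sgclaw_local_dashboard__";

/// Called only for `Action::Navigate`; other actions never reach this check.
fn classify_navigate(expected_domain: &str, p: &NavigateParams<'_>) -> NavigateRoute {
    let is_file_url = p.url.map_or(false, |u| u.starts_with("file://"));
    if expected_domain != LOCAL_DASHBOARD_DOMAIN {
        // A plain file:// navigate without the marker is never special-cased.
        return if is_file_url {
            NavigateRoute::Rejected("local file navigation requires the dashboard marker")
        } else {
            NavigateRoute::Ordinary
        };
    }
    match (
        p.url,
        p.marker_source,
        p.marker_kind,
        p.marker_output_path,
        p.marker_presentation_url,
    ) {
        // Every marker field must be present, with the exact literals, and
        // `params.url` must equal the marker's `presentation_url`.
        (Some(url), Some("compat.workflow_executor"), Some("zhihu_hotlist_screen"), Some(path), Some(pres))
            if url == pres =>
        {
            NavigateRoute::LocalDashboard {
                url: url.to_string(),
                output_path: path.to_string(),
            }
        }
        _ => NavigateRoute::Rejected("incomplete or mismatched local-dashboard marker"),
    }
}
```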

### `src/browser/ws_backend.rs`

No required behavior change in this slice.

Notes:

- websocket transport semantics differ from the callback-host service path
- this spec does not require websocket backend local-dashboard visible-open support
- websocket-specific parity can be designed later as a separate slice if needed

### `src/security/mac_policy.rs`

New narrow responsibility in this slice:

- expose a small validation helper for the approved local-dashboard presentation case
- validate the real local presentation URL and artifact path for that case rather than treating `file://` as a normal allowed domain
- keep the normal domain-based validation path unchanged for ordinary remote navigation

The policy layer must not turn `file://` into a generally allowed "domain". This is an explicit special case for generated local dashboard presentation only.
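
Below is a minimal sketch of the narrow `MacPolicy` helper. It assumes `screen_html_export` emits unescaped `file:///` URLs for drive-letter or root paths. `validate_local_dashboard` and `LocalDashboardError` are hypothetical names. The sketch does not percent-decode, does not handle `file://host/...` URLs, and does not normalize case. A real helper would more likely use `url::Url::to_file_path` and compare canonicalized paths.

```rust
use std::path::Path;

/// Reasons the narrow allowance can refuse a request (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
enum LocalDashboardError {
    NotFileUrl,
    NotHtml,
    PathMismatch,
}

/// Strip the `file://` scheme into a comparable path string (no decoding).
fn file_url_to_path(url: &str) -> Option<String> {
    let rest = url.strip_prefix("file://")?;
    // "file:///C:/x" -> "C:/x" (Windows drive); "file:///tmp/x" -> "/tmp/x".
    let rest = match rest.as_bytes() {
        [b'/', drive, b':', ..] if drive.is_ascii_alphabetic() => &rest[1..],
        _ => rest,
    };
    Some(rest.to_string())
}

/// Unify separators so `C:\out\a.html` and `C:/out/a.html` compare equal.
fn normalize(p: &str) -> String {
    p.replace('\\', "/")
}

/// Accept only a `file://` URL that points at the same `.html` artifact
/// reported by the export tool; everything else is refused.
fn validate_local_dashboard(
    presentation_url: &str,
    output_path: &str,
) -> Result<(), LocalDashboardError> {
    let url_path = file_url_to_path(presentation_url).ok_or(LocalDashboardError::NotFileUrl)?;
    let is_html = Path::new(output_path)
        .extension()
        .and_then(|e| e.to_str())
        .map_or(false, |e| e.eq_ignore_ascii_case("html"));
    if !is_html {
        return Err(LocalDashboardError::NotHtml);
    }
    if normalize(&url_path) != normalize(output_path) {
        return Err(LocalDashboardError::PathMismatch);
    }
    Ok(())
}
```

Ordinary remote navigation never calls this helper, so the existing `MacPolicy::validate(...)` rules stay unchanged.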

### `src/compat/mod.rs`

Expose the new helper module.

## Route Semantics

### Excel export route

Trigger examples:

- `读取知乎热榜数据,并导出 excel 文件` ("read the Zhihu hotlist data and export an Excel file")
- `导出知乎热榜 xlsx` ("export the Zhihu hotlist as xlsx")

Expected behavior:

1. collect hotlist rows
2. call `openxml_office`
3. obtain `output_path`
4. open the generated `.xlsx` using the local OS default spreadsheet application
5. return a success summary reflecting both generation and open state

Summary rules:

- open succeeded -> `已导出并打开知乎热榜 Excel <path>` ("exported and opened the Zhihu hotlist Excel")
- open failed but file exists -> `已导出知乎热榜 Excel <path>,但自动打开失败:<reason>` ("exported the Zhihu hotlist Excel, but auto-open failed")

The workflow still counts artifact generation as successful even if the post-export open step fails.

### Dashboard route

Trigger examples:

- `读取知乎热榜数据并生成领导演示大屏` ("read the Zhihu hotlist data and generate a dashboard for presenting to leadership")
- `生成知乎热榜 dashboard` ("generate a Zhihu hotlist dashboard")
- `展示知乎热榜大屏` ("show the Zhihu hotlist dashboard")

Expected behavior:

1. collect hotlist rows
2. call `screen_html_export`
3. obtain `output_path`
4. obtain `presentation.url` from the tool result payload
5. invoke the browser opener through the existing `BrowserBackend`
6. return a success summary reflecting both generation and browser-open state

Summary rules:

- browser open succeeded -> `已在浏览器中打开知乎热榜大屏 <path>` ("opened the Zhihu hotlist dashboard in the browser")
- browser open failed but file exists -> `已生成知乎热榜大屏 <path>,但浏览器自动打开失败:<reason>` ("generated the Zhihu hotlist dashboard, but automatic browser open failed")

The workflow still counts artifact generation as successful even if the browser-open step fails.
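
The summary rules for both routes fit in one small function. This sketch assumes the opener reports failure as a `String` reason. `Route` and `post_export_summary` are hypothetical names. The message templates are copied from the rules above, including their full-width punctuation.

```rust
/// Which route produced the artifact (hypothetical; the routes are exclusive).
#[derive(Debug, Clone, Copy)]
enum Route {
    ExcelExport,
    DashboardScreen,
}

/// Build the user-facing summary once the artifact already exists.
/// A failed open step never turns the workflow into a failure; it only
/// changes the wording and appends the reason.
fn post_export_summary(route: Route, path: &str, open_result: Result<(), String>) -> String {
    match (route, open_result) {
        (Route::ExcelExport, Ok(())) => format!("已导出并打开知乎热榜 Excel {path}"),
        (Route::ExcelExport, Err(reason)) => {
            format!("已导出知乎热榜 Excel {path},但自动打开失败:{reason}")
        }
        (Route::DashboardScreen, Ok(())) => format!("已在浏览器中打开知乎热榜大屏 {path}"),
        (Route::DashboardScreen, Err(reason)) => {
            format!("已生成知乎热榜大屏 {path},但浏览器自动打开失败:{reason}")
        }
    }
}
```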

## Browser Boundary

This slice must preserve the current browser/runtime boundary.

Allowed:

- use the existing `BrowserBackend`
- use the existing `Action::Navigate`
- use the existing `screen_html_export` `presentation.url`
- add a narrow compatibility path so local generated dashboard presentation can pass backend validation

Not allowed:

- change `browser-helper.html`
- introduce a new callback-host endpoint
- move file-opening responsibility into the frontend service console
- add a new browser-side bootstrap flow
- require websocket protocol changes

Important semantic note:

- on the current service callback-host path, dashboard open is expected to use `sgBrowerserOpenPage`, so the generated dashboard appears in a new visible browser tab while the helper page remains available for later tasks
- websocket-backed browser execution may continue to differ; this slice does not require matching visible-open semantics there

## Local Dashboard Presentation Allowance

The local dashboard browser-open path needs an explicit narrow validation rule because `file://...` cannot pass the normal domain allowlist.

Requirements for the narrow allowance:

- only approved for `Action::Navigate`
- only approved for the exact compat marker payload described above
- only approved for generated local dashboard presentation URLs
- only approved when the validated local artifact path points to the generated dashboard HTML artifact returned by the same `screen_html_export` success payload
- only approved for local HTML presentation, not arbitrary executables or unrelated local files
- ordinary remote navigation must continue using the existing `MacPolicy::validate(...)` domain rules unchanged

This keeps the behavior small and auditable while still satisfying the user-visible dashboard auto-open requirement.

## Local File Opening Boundary

The Excel auto-open action is a local runtime side effect, not a browser action.

Requirements:

- use the system default application for `.xlsx`
- support the current Windows environment first
- keep the implementation minimal and focused on the generated artifact path

Not required in this slice:

- a cross-platform abstraction beyond the minimal shape needed for the current repo environment
- opening arbitrary user-selected files
- exposing local file opening to the service websocket protocol
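
Below is a minimal, std-only sketch of the Excel opener. On Windows it delegates to the `start` builtin of `cmd.exe`. The macOS `open` and Linux `xdg-open` branches are illustrative only, since this slice targets Windows first. `default_open_command` and `open_with_default_app` are hypothetical names. One caveat: Rust builds the Windows command line with the conventional C-runtime quoting rules, which `cmd.exe` does not parse the same way, so paths containing characters such as `&` or `%` may need extra handling. A production version might call the Win32 `ShellExecuteW` API instead.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) the command that opens `path` with the OS
/// default application.
fn default_open_command(path: &Path) -> Command {
    if cfg!(target_os = "windows") {
        // `start` is a cmd.exe builtin; the empty argument is the window
        // title, so a quoted path is not mistaken for the title.
        let mut cmd = Command::new("cmd");
        cmd.arg("/C").arg("start").arg("").arg(path);
        cmd
    } else if cfg!(target_os = "macos") {
        let mut cmd = Command::new("open");
        cmd.arg(path);
        cmd
    } else {
        let mut cmd = Command::new("xdg-open");
        cmd.arg(path);
        cmd
    }
}

/// Open the generated artifact; the error string feeds the summary's
/// "auto-open failed: <reason>" branch.
fn open_with_default_app(path: &Path) -> Result<(), String> {
    if !path.is_file() {
        return Err(format!("artifact not found: {}", path.display()));
    }
    let status = default_open_command(path)
        .status()
        .map_err(|e| e.to_string())?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("opener exited with {status}"))
    }
}
```

`status()` waits only for the launcher process (`cmd`, `open`, or `xdg-open`). These typically return as soon as they hand the file off, so the workflow is not blocked while the spreadsheet stays open.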

## Error Handling

### Excel route

If `.xlsx` generation fails:

- return the existing export failure

If `.xlsx` generation succeeds but auto-open fails:

- keep the artifact path in the summary
- mark only the auto-open step as failed
- do not delete the generated file

### Dashboard route

If `.html` generation fails:

- return the existing screen export failure

If `.html` generation succeeds but browser open fails:

- keep the artifact path in the summary
- mark only the browser-open step as failed
- do not delete the generated file

If the tool result is missing `presentation.url`:

- treat that as a protocol error in the post-export open step for this route
- keep the generated artifact path in the summary if it is available
- do not silently invent a different contract in the normal path

## Test Strategy

### Workflow tests

Update or add focused workflow coverage so that:

- Excel workflow still calls `openxml_office`
- dashboard workflow still calls `screen_html_export`
- the two routes remain mutually exclusive
- dashboard workflow consumes the tool's existing `presentation.url`

### New Excel post-export test

Add a focused regression proving:

- an Excel-oriented hotlist request triggers export
- the generated `.xlsx` path is passed into the local default-app opener
- no browser dashboard navigate is triggered for that route

### New dashboard post-export test

Add a focused regression proving:

- a dashboard-oriented hotlist request triggers HTML generation
- the generated tool payload `presentation.url` is used for browser open
- the browser backend receives a local-dashboard navigate request through the approved compat path
- no local spreadsheet opener is triggered for that route
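
One possible seam for these tests is to route the post-export call through a small trait, so a recording fake can observe which opener ran. `ArtifactOpener`, `RecordingOpener`, `Exported`, and `run_post_export` are hypothetical names, not existing repo APIs.

```rust
use std::cell::RefCell;

/// Hypothetical seam between `workflow_executor.rs` and `artifact_open.rs`.
trait ArtifactOpener {
    fn open_xlsx(&self, path: &str) -> Result<(), String>;
    fn open_dashboard(&self, output_path: &str, presentation_url: &str) -> Result<(), String>;
}

/// Test double that records calls instead of causing side effects.
#[derive(Default)]
struct RecordingOpener {
    calls: RefCell<Vec<String>>,
}

impl ArtifactOpener for RecordingOpener {
    fn open_xlsx(&self, path: &str) -> Result<(), String> {
        self.calls.borrow_mut().push(format!("xlsx:{path}"));
        Ok(())
    }

    fn open_dashboard(&self, _output_path: &str, presentation_url: &str) -> Result<(), String> {
        self.calls.borrow_mut().push(format!("dashboard:{presentation_url}"));
        Ok(())
    }
}

/// The artifact each route produced.
enum Exported<'a> {
    Xlsx { output_path: &'a str },
    Screen { output_path: &'a str, presentation_url: &'a str },
}

/// Post-export step: exactly one opener call per route.
fn run_post_export(artifact: Exported<'_>, opener: &dyn ArtifactOpener) -> Result<(), String> {
    match artifact {
        Exported::Xlsx { output_path } => opener.open_xlsx(output_path),
        Exported::Screen { output_path, presentation_url } => {
            opener.open_dashboard(output_path, presentation_url)
        }
    }
}
```

Each `Exported` variant maps to exactly one trait method, so the `match` itself enforces that the two routes are mutually exclusive. The tests then only need to check the recorded calls.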

### Backend/security compatibility tests

Add focused regressions proving:

- callback backend accepts the approved local-dashboard navigate case and still emits `sgBrowerserOpenPage`
- the narrow local-dashboard allowance rejects non-local or malformed URLs
- ordinary domain validation behavior remains unchanged for normal remote navigation

### Existing boundary tests remain unchanged

Do not change the service-console boundary guard. This slice is runtime behavior only.

## Acceptance Criteria

The slice is complete when all of the following are true:

1. Excel hotlist export still generates a local `.xlsx` artifact.
2. Excel hotlist export auto-opens that `.xlsx` with the system default spreadsheet application.
3. Dashboard hotlist export still generates a local `.html` artifact.
4. Dashboard hotlist export consumes the existing `screen_html_export` `presentation.url` and auto-opens it in the current callback-host-backed sgBrowser service session.
5. On the current callback-host service path, the dashboard opens automatically in a visible browser tab without breaking the helper-page runtime.
6. Excel-open and dashboard-open remain separate user-chosen flows, not a combined mode.
7. No new help or help-like user-visible surface is added.
8. The service console, websocket protocol, `browser-helper.html`, and callback-host endpoint surface remain untouched.