skill-lib/docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md
2026-03-25 02:17:55 +00:00

# SuperRPA sgClaw Browser Control Design
## Goal
Build `sgclaw` in two phases so it can control the existing SuperRPA browser with minimal new surface area.
- Phase 1: deliver a demo-safe closed loop for a fixed instruction like `打开百度搜索天气` ("open Baidu and search for weather").
- Phase 2: upgrade that loop into a real Agent flow backed by `deepseek-chat`.
The design must maximize reuse of existing SuperRPA browser interfaces and minimize working context for future contributors.
## Scope
### In Scope
- Reuse SuperRPA `CommandRouter` as the browser execution entry.
- Reuse existing browser rule and security infrastructure where possible.
- Keep the Rust side responsible for task understanding, sequencing, and pipe protocol.
- Keep the browser side responsible for process hosting, security re-check, and command dispatch.
- Use layered docs so contributors only read the smallest necessary document.
### Out of Scope
- New browser automation APIs parallel to `CommandRouter`
- Full SkillLoader / Memory / MCP work in Phase 1
- Broad action-set expansion beyond `click`, `type`, `navigate`, `getText`
## Existing Integration Points
### sgClaw Repository
- Pipe and security baseline already exist in [`src/pipe/protocol.rs`](/home/zyl/projects/sgClaw/src/pipe/protocol.rs), [`src/pipe/handshake.rs`](/home/zyl/projects/sgClaw/src/pipe/handshake.rs), [`src/pipe/browser_tool.rs`](/home/zyl/projects/sgClaw/src/pipe/browser_tool.rs), and [`src/security/mac_policy.rs`](/home/zyl/projects/sgClaw/src/security/mac_policy.rs).
### SuperRPA Repository
- Browser command entry: `src/chrome/browser/superrpa/router/command_router.h/.cc`
- Existing sgclaw dispatch/security area: `src/chrome/browser/superrpa/sgclaw/sgclaw_command_dispatcher.cc`, `src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.h/.cc`
- FunctionsUI front-end entry: `src/chrome/browser/resources/superrpa/devtools/functions/functions.ts`
- Rules and whitelist sources: `src/chrome/browser/superrpa/rules/*`, `src/chrome/browser/superrpa/zombie/resource_controller.*`
## Recommended Architecture
Use a thin-adapter design.
1. Rust owns `submit_task`, planning, pipe messages, response correlation, and final task completion.
2. SuperRPA owns `sgclaw` process lifecycle, JSON Line I/O, secondary security validation, and delegation into existing `CommandRouter`.
3. Phase 1 uses a rule-based planner for one narrow intent family: `打开百度搜索X` ("open Baidu and search for X").
4. Phase 2 replaces that planner with a real Agent runtime using `deepseek-chat`, but keeps the same `BrowserPipeTool` contract so browser-side code stays thin.
This preserves the browser's existing abstractions and avoids duplicating action logic.
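A minimal sketch of the response correlation that item 1 assigns to the Rust side, assuming each command carries a monotonically increasing `seq` that the browser echoes back. The `Command`, `Response`, and `Correlator` types are hypothetical illustrations, not the shapes defined in `src/pipe/protocol.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical command sent from Rust to the browser over the pipe.
#[derive(Debug, Clone, PartialEq)]
pub struct Command {
    pub seq: u64,
    pub action: String,
}

/// Hypothetical response sent back by the browser, echoing the command's seq.
#[derive(Debug, Clone, PartialEq)]
pub struct Response {
    pub seq: u64,
    pub ok: bool,
}

/// Tracks in-flight commands so each response can be matched to its command.
#[derive(Default)]
pub struct Correlator {
    next_seq: u64,
    pending: HashMap<u64, Command>,
}

impl Correlator {
    /// Allocates the next seq, records the command as pending, and returns it.
    pub fn issue(&mut self, action: &str) -> Command {
        self.next_seq += 1;
        let cmd = Command { seq: self.next_seq, action: action.to_string() };
        self.pending.insert(cmd.seq, cmd.clone());
        cmd
    }

    /// Returns the matching command, or None for an unknown or duplicate seq.
    pub fn resolve(&mut self, resp: &Response) -> Option<Command> {
        self.pending.remove(&resp.seq)
    }

    /// True when no command is awaiting a response.
    pub fn is_idle(&self) -> bool {
        self.pending.is_empty()
    }
}
```

Keeping correlation entirely on the Rust side means the browser only has to echo `seq`, which fits the thin-adapter goal. Under this sketch a task would be considered finished only once `is_idle()` returns true.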
## Phase Design
### Phase 1: Minimal Demo Loop
- Add task-level messages on top of the existing pipe.
- Accept a `submit_task` instruction from the browser bridge.
- Parse only one pattern family: open Baidu, enter query, click search.
- Return `task_complete` with summary and step log.
- Allow Baidu only in demo rules, not as a permanent broad whitelist expansion.
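The Phase 1 planner described above could look roughly like the following. This is a sketch only: the `Step` enum, the `plan` function, and the CSS selectors `#kw` and `#su` are assumptions for illustration and would need to be checked against the real page and the actual `CommandRouter` parameters.

```rust
/// Hypothetical browser steps, limited to the actions Phase 1 needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    Navigate { url: String },
    Type { selector: String, text: String },
    Click { selector: String },
}

/// The single intent family Phase 1 accepts: "open Baidu and search for X".
const PREFIX: &str = "打开百度搜索";

/// Maps an instruction to a fixed three-step plan, or None if it does not match.
pub fn plan(instruction: &str) -> Option<Vec<Step>> {
    let query = instruction.trim().strip_prefix(PREFIX)?.trim();
    if query.is_empty() {
        return None; // No search term given, so there is nothing to plan.
    }
    Some(vec![
        Step::Navigate { url: "https://www.baidu.com".to_string() },
        // Selectors below are assumed, not taken from the design doc.
        Step::Type { selector: "#kw".to_string(), text: query.to_string() },
        Step::Click { selector: "#su".to_string() },
    ])
}
```

Returning `None` for anything outside the pattern keeps the demo loop closed: unmatched instructions can be rejected up front instead of reaching the browser.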
### Phase 2: Real Agent Loop
- Add `agent/runtime.rs` and provider abstraction.
- Register `BrowserPipeTool` as `browser_action`.
- Default provider is DeepSeek with `base_url=https://api.deepseek.com` and model `deepseek-chat`.
- Keep provider config externalized through environment variables and settings files.
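One way to externalize the provider settings is sketched below, assuming environment variables named `SGCLAW_BASE_URL`, `SGCLAW_MODEL`, and `SGCLAW_API_KEY`. Those names and the `ProviderConfig` struct are hypothetical. Only the two default values come from this document, and settings-file loading is left out.

```rust
use std::env;

/// Hypothetical provider settings for the Phase 2 agent runtime.
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderConfig {
    pub base_url: String,
    pub model: String,
    pub api_key: Option<String>,
}

/// Builds a config from any key lookup, falling back to the DeepSeek defaults.
/// Taking the lookup as a parameter keeps the function testable without
/// mutating the process environment.
pub fn load_with(lookup: impl Fn(&str) -> Option<String>) -> ProviderConfig {
    ProviderConfig {
        base_url: lookup("SGCLAW_BASE_URL")
            .unwrap_or_else(|| "https://api.deepseek.com".to_string()),
        model: lookup("SGCLAW_MODEL").unwrap_or_else(|| "deepseek-chat".to_string()),
        api_key: lookup("SGCLAW_API_KEY"),
    }
}

/// Convenience wrapper that reads from the real process environment.
pub fn load_from_env() -> ProviderConfig {
    load_with(|key| env::var(key).ok())
}
```

With this shape, switching model or endpoint would be a matter of setting a variable before launching `sgclaw`, which is what the Phase 2 acceptance criterion about config without code changes asks for.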
## Security
- The HMAC input must match the browser contract exactly: `<seq>\n<action>\n<stable_json(params)>\n<expected_domain>`.
- Rust validates before send; browser validates again before dispatch.
- `rules.json` remains the source for domain/action allow rules.
- Demo-only domains like `baidu.com` must be clearly isolated in a demo profile or demo rules file.
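The signing input from the first bullet can be assembled as sketched here. The sketch assumes `stable_json` means a JSON object with lexicographically sorted keys and no whitespace, and that `params` is a flat string-to-string map. Both are assumptions, and the browser contract doc remains the authority. The HMAC computation itself is omitted because it needs a crypto crate.

```rust
use std::collections::BTreeMap;

/// Encodes a string as a JSON string literal with minimal escaping.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Assumed meaning of stable_json: keys in sorted order, no whitespace.
/// BTreeMap iterates in key order, which gives the sorting for free.
pub fn stable_json(params: &BTreeMap<String, String>) -> String {
    let fields: Vec<String> = params
        .iter()
        .map(|(k, v)| format!("{}:{}", json_string(k), json_string(v)))
        .collect();
    format!("{{{}}}", fields.join(","))
}

/// Joins the four contract fields with newlines, in the documented order.
pub fn signing_payload(
    seq: u64,
    action: &str,
    params: &BTreeMap<String, String>,
    expected_domain: &str,
) -> String {
    format!("{}\n{}\n{}\n{}", seq, action, stable_json(params), expected_domain)
}
```

Because both sides must produce byte-identical input, any difference in key order or escaping between the Rust and browser serializers would make validation fail, so a shared test vector between the two repositories is worth considering.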
## Context Control Strategy
Use four small docs instead of one large narrative:
1. This design doc: goals, boundaries, architecture.
2. Browser contract doc: exact message shapes and file paths.
3. Plan doc: execution order and concrete files.
4. Acceptance doc: smoke tests and failure matrix.
Each implementation task should point only to the doc section it needs.
## Testing Strategy
- Rust unit tests for protocol, planner, HMAC, and runtime message handling
- Rust integration tests for `submit_task -> command -> response -> task_complete`
- SuperRPA unit tests for process host, listener, security gate, and dispatch mapping
- Cross-repo smoke test for `打开百度搜索天气` ("open Baidu and search for weather")
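The `submit_task -> command -> response -> task_complete` flow could be exercised on the Rust side without a browser by injecting a fake transport, as in this sketch. `TaskComplete`, `run_task`, and the log format are hypothetical. The real message shapes belong to the browser contract doc.

```rust
/// Hypothetical final message: overall status, a summary, and a step log.
#[derive(Debug, PartialEq)]
pub struct TaskComplete {
    pub success: bool,
    pub summary: String,
    pub step_log: Vec<String>,
}

/// Runs actions in order through `send`, which stands in for the pipe round
/// trip (command out, response back). Stops at the first failure.
pub fn run_task<F>(actions: &[&str], mut send: F) -> TaskComplete
where
    F: FnMut(u64, &str) -> Result<(), String>,
{
    let mut step_log = Vec::new();
    for (i, action) in actions.iter().enumerate() {
        let seq = i as u64 + 1; // seq starts at 1 in this sketch
        match send(seq, action) {
            Ok(()) => step_log.push(format!("#{} {} ok", seq, action)),
            Err(e) => {
                step_log.push(format!("#{} {} failed: {}", seq, action, e));
                return TaskComplete {
                    success: false,
                    summary: format!("stopped at step {}", seq),
                    step_log,
                };
            }
        }
    }
    TaskComplete {
        success: true,
        summary: format!("{} steps completed", actions.len()),
        step_log,
    }
}
```

A test can pass a closure that always returns `Ok(())` for the happy path, or one that fails at a chosen `seq` to cover a row of the failure matrix, for example a security-gate rejection.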
## Acceptance Criteria
### Phase 1
- Start `sgclaw` from SuperRPA
- Send `submit_task`
- Navigate to Baidu and search for a keyword through existing browser actions
- Surface logs and final result back to FunctionsUI
### Phase 2
- Execute the same flow through `deepseek-chat`
- Keep the same browser contract and command mapping
- Expose provider/model config without code changes