SuperRPA sgClaw Browser Control Design
Goal
Build sgclaw in two phases so it can control the existing SuperRPA browser with minimal new surface area.
- Phase 1: deliver a demo-safe closed loop for a fixed instruction like `打开百度搜索天气` ("open Baidu and search for weather").
- Phase 2: upgrade that loop into a real Agent flow backed by `deepseek-chat`.
The design must maximize reuse of existing SuperRPA browser interfaces and minimize working context for future contributors.
Scope
In Scope
- Reuse SuperRPA `CommandRouter` as the browser execution entry.
- Reuse existing browser rule and security infrastructure where possible.
- Keep the Rust side responsible for task understanding, sequencing, and pipe protocol.
- Keep the browser side responsible for process hosting, security re-check, and command dispatch.
- Use layered docs so contributors only read the smallest necessary document.
Out of Scope
- New browser automation APIs parallel to `CommandRouter`
- Full SkillLoader / Memory / MCP work in Phase 1
- Broad action-set expansion beyond `click`, `type`, `navigate`, `getText`
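
The four-action limit can be enforced at parse time with a closed enum, so anything outside the set is rejected before it reaches the pipe. This is a minimal sketch: the type name `BrowserAction` and the `String` error are assumptions for illustration, not names taken from either repository.

```rust
use std::str::FromStr;

/// The four actions Phase 1 may emit (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BrowserAction {
    Click,
    Type,
    Navigate,
    GetText,
}

impl BrowserAction {
    /// Wire name, spelled as listed in this document.
    pub fn as_str(self) -> &'static str {
        match self {
            BrowserAction::Click => "click",
            BrowserAction::Type => "type",
            BrowserAction::Navigate => "navigate",
            BrowserAction::GetText => "getText",
        }
    }
}

impl FromStr for BrowserAction {
    type Err = String;

    /// Accepts only the four in-scope names; everything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "click" => Ok(BrowserAction::Click),
            "type" => Ok(BrowserAction::Type),
            "navigate" => Ok(BrowserAction::Navigate),
            "getText" => Ok(BrowserAction::GetText),
            other => Err(format!("action not in Phase 1 set: {other}")),
        }
    }
}
```

With this shape, `"scroll".parse::<BrowserAction>()` returns an error, so widening the action set requires an explicit code change rather than a new string.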
Existing Integration Points
sgClaw Repository
- Pipe and security baseline already exist in `src/pipe/protocol.rs`, `src/pipe/handshake.rs`, `src/pipe/browser_tool.rs`, and `src/security/mac_policy.rs`.
SuperRPA Repository
- Browser command entry: `src/chrome/browser/superrpa/router/command_router.h/.cc`
- Existing sgclaw dispatch/security area: `src/chrome/browser/superrpa/sgclaw/sgclaw_command_dispatcher.cc`, `src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.h/.cc`
- FunctionsUI front-end entry: `src/chrome/browser/resources/superrpa/devtools/functions/functions.ts`
- Rules and whitelist sources: `src/chrome/browser/superrpa/rules/*`, `src/chrome/browser/superrpa/zombie/resource_controller.*`
Recommended Architecture
Use a thin-adapter design.
- Rust owns `submit_task`, planning, pipe messages, response correlation, and final task completion.
- SuperRPA owns `sgclaw` process lifecycle, JSON Line I/O, secondary security validation, and delegation into existing `CommandRouter`.
- Phase 1 uses a rule-based planner for one narrow intent family: `打开百度搜索X` ("open Baidu and search for X").
- Phase 2 replaces that planner with a real Agent runtime using `deepseek-chat`, but keeps the same `BrowserPipeTool` contract so browser-side code stays thin.
This preserves the browser’s existing abstractions and avoids duplicating action logic.
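
Response correlation on the Rust side can be as small as a map keyed by sequence number. The sketch below assumes each outgoing command carries a monotonically increasing `seq` (the HMAC input in the Security section includes one) and that the browser echoes it in the response. The `Correlator` type and its methods are hypothetical, not the existing API in `src/pipe/`.

```rust
use std::collections::HashMap;

/// Tracks commands that were sent but not yet answered (hypothetical type).
#[derive(Default)]
pub struct Correlator {
    next_seq: u64,
    pending: HashMap<u64, String>,
}

impl Correlator {
    pub fn new() -> Self {
        Self::default()
    }

    /// Assigns the next sequence number and remembers the action under it.
    pub fn register(&mut self, action: &str) -> u64 {
        self.next_seq += 1;
        let seq = self.next_seq;
        self.pending.insert(seq, action.to_string());
        seq
    }

    /// Matches a response to its command. Returns None for an unknown or
    /// already-resolved seq, which the caller should treat as a protocol error.
    pub fn resolve(&mut self, seq: u64) -> Option<String> {
        self.pending.remove(&seq)
    }

    /// Number of commands still waiting for a response.
    pub fn pending(&self) -> usize {
        self.pending.len()
    }
}
```

Because the map lives entirely on the Rust side, the browser only has to echo `seq` back, which keeps the browser-side adapter thin.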
Phase Design
Phase 1: Minimal Demo Loop
- Add task-level messages on top of the existing pipe.
- Accept a `submit_task` instruction from the browser bridge.
- Parse only one pattern family: open Baidu, enter query, click search.
- Return `task_complete` with summary and step log.
- Allow Baidu only in demo rules, not as a permanent broad whitelist expansion.
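
The Phase 1 planner can be sketched as a prefix match that expands into three fixed steps. The `Step` struct, the Baidu URL, and the CSS selectors `#kw` and `#su` are assumptions for illustration; the real selectors and parameter shape belong to the browser contract.

```rust
/// One planned browser step (hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct Step {
    pub action: &'static str,
    pub target: String,
    pub value: Option<String>,
}

/// The only instruction family Phase 1 understands.
const PREFIX: &str = "打开百度搜索";

/// Returns the three-step plan, or None if the instruction does not match
/// the pattern or carries an empty query.
pub fn plan(instruction: &str) -> Option<Vec<Step>> {
    let query = instruction.trim().strip_prefix(PREFIX)?.trim();
    if query.is_empty() {
        return None;
    }
    Some(vec![
        // Assumed entry URL for the demo.
        Step { action: "navigate", target: "https://www.baidu.com".to_string(), value: None },
        // Assumed selector for the search box.
        Step { action: "type", target: "#kw".to_string(), value: Some(query.to_string()) },
        // Assumed selector for the search button.
        Step { action: "click", target: "#su".to_string(), value: None },
    ])
}
```

Returning `None` for anything outside the pattern keeps the demo loop closed: unmatched instructions fail fast instead of producing partial plans.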
Phase 2: Real Agent Loop
- Add `agent/runtime.rs` and provider abstraction.
- Register `BrowserPipeTool` as `browser_action`.
- Default provider is DeepSeek with `base_url=https://api.deepseek.com` and model `deepseek-chat`.
- Keep provider config externalized through environment variables and settings files.
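
One possible way to externalize provider config is to read overrides from the environment and fall back to the DeepSeek defaults named above. The variable names `SGCLAW_BASE_URL`, `SGCLAW_MODEL`, and `SGCLAW_API_KEY` are hypothetical, as is the `ProviderConfig` type; settings-file loading is left out of this sketch.

```rust
use std::env;

/// Resolved provider settings (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderConfig {
    pub base_url: String,
    pub model: String,
    pub api_key: Option<String>,
}

impl ProviderConfig {
    /// Builds a config from any key lookup, so tests can inject values
    /// without mutating the process environment.
    pub fn from_lookup<F: Fn(&str) -> Option<String>>(lookup: F) -> Self {
        ProviderConfig {
            base_url: lookup("SGCLAW_BASE_URL")
                .unwrap_or_else(|| "https://api.deepseek.com".to_string()),
            model: lookup("SGCLAW_MODEL")
                .unwrap_or_else(|| "deepseek-chat".to_string()),
            api_key: lookup("SGCLAW_API_KEY"),
        }
    }

    /// Reads the same keys from environment variables.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Routing everything through `from_lookup` means a settings-file source could later be layered in as another lookup without touching call sites.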
Security
- HMAC must be aligned to the browser contract exactly: `<seq>\n<action>\n<stable_json(params)>\n<expected_domain>`.
- Rust validates before send; browser validates again before dispatch.
- `rules.json` remains the source for domain/action allow rules.
- Demo-only domains like `baidu.com` must be clearly isolated in a demo profile or demo rules file.
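
The signing input can be assembled with the standard library alone. This sketch assumes `stable_json` means compact JSON with keys in sorted order and string-only values; the authoritative definition is whatever the browser contract specifies, and both sides must match it byte for byte. The MAC computation itself is left out, since it depends on the key and hash choices in `src/security/mac_policy.rs`.

```rust
use std::collections::BTreeMap;

/// Escapes a string for embedding in a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Assumed stable form: compact object, keys sorted (BTreeMap iteration order).
pub fn stable_json(params: &BTreeMap<String, String>) -> String {
    let fields: Vec<String> = params
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", escape_json(k), escape_json(v)))
        .collect();
    format!("{{{}}}", fields.join(","))
}

/// Builds `<seq>\n<action>\n<stable_json(params)>\n<expected_domain>`.
pub fn signing_payload(
    seq: u64,
    action: &str,
    params: &BTreeMap<String, String>,
    expected_domain: &str,
) -> String {
    format!("{}\n{}\n{}\n{}", seq, action, stable_json(params), expected_domain)
}
```

Using a sorted map makes the serialized form independent of insertion order, which is the property both validators rely on when they recompute the MAC.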
Context Control Strategy
Use four small docs instead of one large narrative:
- This design doc: goals, boundaries, architecture.
- Browser contract doc: exact message shapes and file paths.
- Plan doc: execution order and concrete files.
- Acceptance doc: smoke tests and failure matrix.
Each implementation task should point only to the doc section it needs.
Testing Strategy
- Rust unit tests for protocol, planner, HMAC, and runtime message handling
- Rust integration tests for `submit_task -> command -> response -> task_complete`
- SuperRPA unit tests for process host, listener, security gate, and dispatch mapping
- Cross-repo smoke test for `打开百度搜索天气`
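
An integration test could assert message ordering with a small state machine over the observed message kinds. The `Msg` enum and `is_valid_flow` helper are hypothetical; the sketch also assumes that commands and responses strictly alternate and that a task may complete after zero or more command/response pairs.

```rust
/// Message kinds seen on the pipe during one task (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Msg {
    SubmitTask,
    Command,
    Response,
    TaskComplete,
}

/// Returns true if the sequence is
/// submit_task, (command, response)*, task_complete.
pub fn is_valid_flow(msgs: &[Msg]) -> bool {
    enum State {
        Start,
        Idle,
        AwaitingResponse,
        Done,
    }

    let mut state = State::Start;
    for &m in msgs {
        state = match (state, m) {
            (State::Start, Msg::SubmitTask) => State::Idle,
            (State::Idle, Msg::Command) => State::AwaitingResponse,
            (State::AwaitingResponse, Msg::Response) => State::Idle,
            (State::Idle, Msg::TaskComplete) => State::Done,
            // Any other transition, including messages after Done, is invalid.
            _ => return false,
        };
    }
    matches!(state, State::Done)
}
```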
Acceptance Criteria
Phase 1
- Start `sgclaw` from SuperRPA
- Send `submit_task`
- Navigate to Baidu and search a keyword through existing browser actions
- Surface logs and final result back to FunctionsUI
Phase 2
- Execute the same flow through `deepseek-chat`
- Keep the same browser contract and command mapping
- Expose provider/model config without code changes