claw/docs/superpowers/specs/2026-03-25-superrpa-sgclaw-browser-control-design.md
2026-03-25 02:17:55 +00:00


SuperRPA sgClaw Browser Control Design

Goal

Build sgclaw in two phases so it can control the existing SuperRPA browser with minimal new surface area.

  • Phase 1: deliver a demo-safe closed loop for a fixed instruction such as 打开百度搜索天气 ("open Baidu and search for weather").
  • Phase 2: upgrade that loop into a real Agent flow backed by deepseek-chat.

The design must maximize reuse of existing SuperRPA browser interfaces and minimize working context for future contributors.

Scope

In Scope

  • Reuse SuperRPA CommandRouter as the browser execution entry.
  • Reuse existing browser rule and security infrastructure where possible.
  • Keep the Rust side responsible for task understanding, sequencing, and pipe protocol.
  • Keep the browser side responsible for process hosting, security re-check, and command dispatch.
  • Use layered docs so contributors only read the smallest necessary document.

Out of Scope

  • New browser automation APIs parallel to CommandRouter
  • Full SkillLoader / Memory / MCP work in Phase 1
  • Broad action-set expansion beyond click, type, navigate, getText

Existing Integration Points

sgClaw Repository

SuperRPA Repository

  • Browser command entry: src/chrome/browser/superrpa/router/command_router.h/.cc
  • Existing sgclaw dispatch/security area: src/chrome/browser/superrpa/sgclaw/sgclaw_command_dispatcher.cc, src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.h/.cc
  • FunctionsUI front-end entry: src/chrome/browser/resources/superrpa/devtools/functions/functions.ts
  • Rules and whitelist sources: src/chrome/browser/superrpa/rules/*, src/chrome/browser/superrpa/zombie/resource_controller.*

Use a thin-adapter design.

  1. Rust owns submit_task, planning, pipe messages, response correlation, and final task completion.
  2. SuperRPA owns sgclaw process lifecycle, JSON Line I/O, secondary security validation, and delegation into existing CommandRouter.
  3. Phase 1 uses a rule-based planner for one narrow intent family: 打开百度搜索X ("open Baidu and search for X").
  4. Phase 2 replaces that planner with a real Agent runtime using deepseek-chat, but keeps the same BrowserPipeTool contract so browser-side code stays thin.

This preserves the browser's existing abstractions and avoids duplicating action logic.
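Response correlation on the Rust side could be as small as a map from sequence number to the command that is still waiting for a reply. The sketch below assumes commands are identified by a monotonically increasing `seq`; the `Correlator` type and its method names are hypothetical and not taken from the sgclaw code.

```rust
use std::collections::HashMap;

/// Tracks commands that were sent to the browser and not yet answered.
/// Hypothetical helper; the real message types live in the browser contract doc.
pub struct Correlator {
    next_seq: u64,
    pending: HashMap<u64, String>, // seq -> action name
}

impl Correlator {
    pub fn new() -> Self {
        Correlator { next_seq: 1, pending: HashMap::new() }
    }

    /// Allocates the next sequence number and remembers the action sent with it.
    pub fn issue(&mut self, action: &str) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.pending.insert(seq, action.to_string());
        seq
    }

    /// Matches a response to its command; unknown or duplicate seqs are errors.
    pub fn resolve(&mut self, seq: u64) -> Result<String, String> {
        self.pending
            .remove(&seq)
            .ok_or_else(|| format!("no pending command with seq {seq}"))
    }

    /// Number of commands still waiting for a response.
    pub fn outstanding(&self) -> usize {
        self.pending.len()
    }
}
```

A task would be considered complete only once `outstanding()` is zero and the planner has no further steps, so the browser side never needs to track task state.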

Phase Design

Phase 1: Minimal Demo Loop

  • Add task-level messages on top of the existing pipe.
  • Accept a submit_task instruction from the browser bridge.
  • Parse only one pattern family: open Baidu, enter query, click search.
  • Return task_complete with summary and step log.
  • Allow Baidu only in demo rules, not as a permanent broad whitelist expansion.
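The Phase 1 planner described above can be sketched as a single prefix match that expands into three steps. This is an illustration under stated assumptions: the `Step` type, the Baidu URL, and the `#kw` / `#su` selectors are placeholders chosen for the example and are not specified by this design.

```rust
/// One planned browser step. Hypothetical shape for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Step {
    Navigate { url: String },
    Type { selector: String, text: String },
    Click { selector: String },
}

/// The only intent family Phase 1 understands: "open Baidu and search for X".
const PREFIX: &str = "打开百度搜索";

/// Returns the fixed three-step plan, or None when the instruction does not
/// match the pattern or carries an empty query.
pub fn plan(instruction: &str) -> Option<Vec<Step>> {
    let query = instruction.trim().strip_prefix(PREFIX)?.trim();
    if query.is_empty() {
        return None;
    }
    Some(vec![
        // Assumed entry URL for the demo.
        Step::Navigate { url: "https://www.baidu.com".to_string() },
        // Assumed selector for the search box.
        Step::Type { selector: "#kw".to_string(), text: query.to_string() },
        // Assumed selector for the search button.
        Step::Click { selector: "#su".to_string() },
    ])
}
```

Returning `None` for anything outside the pattern keeps the demo loop closed: unrecognized instructions fail fast instead of reaching the browser.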

Phase 2: Real Agent Loop

  • Add agent/runtime.rs and provider abstraction.
  • Register BrowserPipeTool as browser_action.
  • Default provider is DeepSeek with base_url=https://api.deepseek.com and model deepseek-chat.
  • Keep provider config externalized through environment variables and settings files.
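One way to keep provider settings out of the code is to resolve them through a lookup function with the DeepSeek values as defaults. The sketch below is an assumption-laden example: the `ProviderConfig` struct and the `SGCLAW_PROVIDER_*` variable names are invented for illustration, and only the default base URL and model come from this document.

```rust
/// Resolved provider settings. Hypothetical struct for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProviderConfig {
    pub base_url: String,
    pub model: String,
    pub api_key: Option<String>,
}

impl ProviderConfig {
    /// Builds a config from any key/value source; blank values count as unset.
    /// The variable names below are assumptions, not an established contract.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |key: &str| lookup(key).filter(|v| !v.trim().is_empty());
        ProviderConfig {
            base_url: get("SGCLAW_PROVIDER_BASE_URL")
                .unwrap_or_else(|| "https://api.deepseek.com".to_string()),
            model: get("SGCLAW_PROVIDER_MODEL")
                .unwrap_or_else(|| "deepseek-chat".to_string()),
            api_key: get("SGCLAW_PROVIDER_API_KEY"),
        }
    }

    /// Reads the same keys from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Taking a lookup closure instead of reading the environment directly means the same function can be fed from a settings file and exercised in unit tests without mutating process state.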

Security

  • The HMAC input must match the browser contract exactly, with the four fields joined by newline characters: <seq>\n<action>\n<stable_json(params)>\n<expected_domain>.
  • Rust validates before send; browser validates again before dispatch.
  • rules.json remains the source for domain/action allow rules.
  • Demo-only domains like baidu.com must be clearly isolated in a demo profile or demo rules file.
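To make the signed-message layout concrete, the sketch below builds the string that would be fed to the HMAC. It does not compute the HMAC itself, and it assumes that `stable_json` means key-sorted JSON with no whitespace over string-valued parameters; the authoritative definition belongs to the browser contract, so treat every function here as hypothetical.

```rust
use std::collections::BTreeMap;

/// Minimal JSON string escaping, sufficient for this illustration.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Assumed meaning of stable_json: keys in sorted order, no whitespace.
/// BTreeMap iterates in key order, which gives the stable ordering.
pub fn stable_json(params: &BTreeMap<String, String>) -> String {
    let fields: Vec<String> = params
        .iter()
        .map(|(k, v)| format!("{}:{}", escape_json(k), escape_json(v)))
        .collect();
    format!("{{{}}}", fields.join(","))
}

/// Joins the four fields with newlines in the order the contract lists them.
pub fn signing_payload(
    seq: u64,
    action: &str,
    params: &BTreeMap<String, String>,
    expected_domain: &str,
) -> String {
    format!("{}\n{}\n{}\n{}", seq, action, stable_json(params), expected_domain)
}
```

Because both sides must produce byte-identical input, a deterministic serializer like this matters more than the choice of HMAC library; any difference in key order or whitespace would make the browser-side check fail.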

Context Control Strategy

Use four small docs instead of one large narrative:

  1. This design doc: goals, boundaries, architecture.
  2. Browser contract doc: exact message shapes and file paths.
  3. Plan doc: execution order and concrete files.
  4. Acceptance doc: smoke tests and failure matrix.

Each implementation task should point only to the doc section it needs.

Testing Strategy

  • Rust unit tests for protocol, planner, HMAC, and runtime message handling
  • Rust integration tests for submit_task -> command -> response -> task_complete
  • SuperRPA unit tests for process host, listener, security gate, and dispatch mapping
  • Cross-repo smoke test for 打开百度搜索天气 ("open Baidu and search for weather")
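The Rust integration tests for the submit_task -> command -> response -> task_complete path could run against an in-memory stand-in for the browser rather than a real pipe. The following sketch assumes a hypothetical `BrowserPipe` trait, a `TaskComplete` struct with a summary and step log, and a `FakeBrowser` test double; none of these names are taken from the existing code.

```rust
/// Final message returned to the caller. Hypothetical shape.
#[derive(Debug, PartialEq, Eq)]
pub struct TaskComplete {
    pub success: bool,
    pub summary: String,
    pub step_log: Vec<String>,
}

/// Stand-in for the pipe that carries commands to the browser.
pub trait BrowserPipe {
    fn execute(&mut self, action: &str, target: &str) -> Result<(), String>;
}

/// Sends each command in order and stops at the first failure.
pub fn run_task(pipe: &mut dyn BrowserPipe, commands: &[(&str, &str)]) -> TaskComplete {
    let mut step_log = Vec::new();
    for (i, (action, target)) in commands.iter().enumerate() {
        match pipe.execute(action, target) {
            Ok(()) => step_log.push(format!("step {}: {} {} ok", i + 1, action, target)),
            Err(e) => {
                step_log.push(format!("step {}: {} {} failed: {}", i + 1, action, target, e));
                return TaskComplete {
                    success: false,
                    summary: format!("stopped at step {}", i + 1),
                    step_log,
                };
            }
        }
    }
    TaskComplete {
        success: true,
        summary: format!("completed {} steps", commands.len()),
        step_log,
    }
}

/// Test double that records every command and fails on one configured action.
pub struct FakeBrowser {
    pub fail_on: Option<&'static str>,
    pub seen: Vec<String>,
}

impl BrowserPipe for FakeBrowser {
    fn execute(&mut self, action: &str, target: &str) -> Result<(), String> {
        self.seen.push(format!("{action} {target}"));
        if self.fail_on.map_or(false, |f| f == action) {
            Err("rejected by fake".to_string())
        } else {
            Ok(())
        }
    }
}
```

A test can then assert both the happy path (three logged steps, `success == true`) and the failure path (the log ends at the rejected step and later commands are never sent).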

Acceptance Criteria

Phase 1

  • Start sgclaw from SuperRPA
  • Send submit_task
  • Navigate to Baidu and search a keyword through existing browser actions
  • Surface logs and final result back to FunctionsUI

Phase 2

  • Execute the same flow through deepseek-chat
  • Keep the same browser contract and command mapping
  • Expose provider/model config without code changes