# SuperRPA sgClaw Browser Control Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Deliver a two-phase integration where `sgclaw` first drives the existing SuperRPA browser through a minimal fixed-intent demo, then upgrades to a real Agent loop backed by `deepseek-chat`.

**Architecture:** Keep the browser side thin and reuse-first. Rust owns task understanding, pipe protocol, and sequencing; SuperRPA owns process hosting, secondary security checks, and delegation into the existing `CommandRouter`. Phase 1 uses a rule-based planner; Phase 2 swaps in an Agent runtime without changing browser command execution.

**Tech Stack:** Rust, JSON Line over STDIO, HMAC-SHA256, SuperRPA Chromium C++, existing `CommandRouter`, existing rules services, FunctionsUI bridge, DeepSeek OpenAI-compatible API (`deepseek-chat`).

---

## File Structure

### sgClaw Repository

- Create: `src/agent/mod.rs`
- Create: `src/agent/runtime.rs`
- Create: `src/agent/planner.rs`
- Create: `src/llm/mod.rs`
- Create: `src/llm/provider.rs`
- Create: `src/llm/deepseek.rs`
- Create: `src/config/mod.rs`
- Create: `src/config/settings.rs`
- Modify: `src/lib.rs`
- Modify: `src/main.rs`
- Modify: `src/pipe/protocol.rs`
- Modify: `src/pipe/browser_tool.rs`
- Modify: `src/security/hmac.rs`
- Modify: `resources/rules.json`
- Create: `tests/task_protocol_test.rs`
- Create: `tests/planner_test.rs`
- Create: `tests/runtime_task_flow_test.rs`
- Create: `tests/deepseek_provider_test.rs`
- Create: `tests/agent_runtime_test.rs`

### SuperRPA Repository

- Modify: `src/chrome/browser/superrpa/BUILD.gn`
- Modify: `src/chrome/browser/superrpa/router/command_router.h`
- Modify: `src/chrome/browser/superrpa/router/command_router.cc`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_command_dispatcher.cc`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.h`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.cc`
- Create or modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_process_host.h`
- Create or modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_process_host.cc`
- Create or modify: `src/chrome/browser/superrpa/sgclaw/pipe_listener.h`
- Create or modify: `src/chrome/browser/superrpa/sgclaw/pipe_listener.cc`
- Modify: `src/chrome/browser/resources/superrpa/devtools/functions/functions.ts`
- Modify: `src/chrome/browser/resources/superrpa/devtools/functions/functions_manifest.ts`
- Modify: `src/chrome/browser/superrpa/rules/rpa_rules_service_factory.cc`
- Test: `test("superrpa_unittests")`

## Task 1: Align Pipe Contract and Security Baseline

**Files:**
- Modify: `src/pipe/protocol.rs`
- Modify: `src/security/hmac.rs`
- Modify: `resources/rules.json`
- Create: `tests/task_protocol_test.rs`

- [ ] **Step 1: Write failing protocol tests for task-level messages**

Add tests covering `submit_task`, `task_complete`, and exact HMAC canonical string expectations.

- [ ] **Step 2: Run protocol-focused tests**

Run: `cargo test --test task_protocol_test --test pipe_protocol_test -q`
Expected: FAIL because the task-level messages and canonical signing are missing.

- [ ] **Step 3: Extend protocol types**

Add task-scope message variants in `src/pipe/protocol.rs` for:
- browser -> sgclaw `submit_task`
- sgclaw -> browser `task_complete`
- optional `log_entry`
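The three message kinds above could be modelled as one enum. This is a minimal sketch, not the actual contents of `src/pipe/protocol.rs`: the type name `TaskMessage` and every field name (`task_id`, `instruction`, `success`, `summary`, `level`, `message`) are assumptions made for illustration; only the three wire names come from this plan.

```rust
/// Task-level pipe messages. Field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum TaskMessage {
    /// browser -> sgclaw: a new natural-language task.
    SubmitTask { task_id: String, instruction: String },
    /// sgclaw -> browser: the task finished (successfully or not).
    TaskComplete { task_id: String, success: bool, summary: String },
    /// sgclaw -> browser: optional progress log line.
    LogEntry { task_id: String, level: String, message: String },
}

impl TaskMessage {
    /// Wire name of the message, as used in this plan.
    pub fn type_tag(&self) -> &'static str {
        match self {
            TaskMessage::SubmitTask { .. } => "submit_task",
            TaskMessage::TaskComplete { .. } => "task_complete",
            TaskMessage::LogEntry { .. } => "log_entry",
        }
    }

    /// True for messages that sgclaw sends to the browser.
    pub fn is_outbound(&self) -> bool {
        !matches!(self, TaskMessage::SubmitTask { .. })
    }
}
```

If the crate already uses `serde`, an internally tagged enum (`#[serde(tag = "type", rename_all = "snake_case")]`) would probably replace the hand-written `type_tag`.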

- [ ] **Step 4: Fix HMAC canonical string**

Change `src/security/hmac.rs` to sign:

```text
<seq>\n<action>\n<stable_json(params)>\n<expected_domain>
```
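A minimal sketch of how the canonical string might be assembled before it is passed to HMAC-SHA256. It assumes that `\n` denotes a real newline byte, that `params` is a flat map of string values, and that "stable" means keys in ascending order with no whitespace; none of these details are fixed by this plan. The function names are hypothetical, and the HMAC computation itself is left out because it needs an external crate.

```rust
use std::collections::BTreeMap;

/// Escape a string as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Serialize flat string params with sorted keys and no whitespace.
/// Assumption: params are string-valued; nested values are out of scope here.
pub fn stable_json(params: &BTreeMap<String, String>) -> String {
    let fields: Vec<String> = params
        .iter() // BTreeMap iterates in ascending key order
        .map(|(k, v)| format!("{}:{}", json_string(k), json_string(v)))
        .collect();
    format!("{{{}}}", fields.join(","))
}

/// Build `<seq>\n<action>\n<stable_json(params)>\n<expected_domain>`.
pub fn canonical_string(
    seq: u64,
    action: &str,
    params: &BTreeMap<String, String>,
    expected_domain: &str,
) -> String {
    format!("{}\n{}\n{}\n{}", seq, action, stable_json(params), expected_domain)
}
```

Both sides of the pipe must produce byte-identical output for the signature to verify, which is why key order and whitespace are pinned down rather than left to the JSON library's defaults.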

- [ ] **Step 5: Add demo rules isolation**

Add a clearly marked demo allow entry for Baidu in `resources/rules.json`, and note in the docs that the entry is demo-only.

- [ ] **Step 6: Re-run protocol tests**

Run: `cargo test --test task_protocol_test --test pipe_protocol_test -q`
Expected: PASS.

## Task 2: Build Phase 1 Rust Task Flow

**Files:**
- Create: `src/agent/mod.rs`
- Create: `src/agent/planner.rs`
- Modify: `src/lib.rs`
- Modify: `src/main.rs`
- Create: `tests/planner_test.rs`
- Create: `tests/runtime_task_flow_test.rs`

- [ ] **Step 1: Write failing planner tests**

Add tests for parsing:
- `打开百度搜索天气` ("open Baidu and search for weather")
- `打开百度搜索电网调度` ("open Baidu and search for power grid dispatching")

Expected output is an ordered action plan: `navigate`, `type`, `click`.

- [ ] **Step 2: Run planner tests**

Run: `cargo test --test planner_test -q`
Expected: FAIL because no planner exists.

- [ ] **Step 3: Implement rule-based planner**

Create `src/agent/planner.rs` with a minimal parser that only accepts the Baidu-search intent family and rejects everything else clearly.
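One possible shape for such a planner is sketched below. The `BrowserAction` type, the `plan` function, the fixed prefix match, the Baidu URL, and the `#kw` / `#su` selectors are all illustrative assumptions; the plan itself only fixes the action order `navigate`, `type`, `click` and the requirement to reject other intents.

```rust
/// One browser step in a plan. Field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum BrowserAction {
    Navigate { url: String },
    Type { selector: String, text: String },
    Click { selector: String },
}

/// Assumed trigger phrase for the only supported intent family.
const BAIDU_SEARCH_PREFIX: &str = "打开百度搜索";

/// Turn an instruction into an ordered plan, or reject it with a clear message.
pub fn plan(instruction: &str) -> Result<Vec<BrowserAction>, String> {
    let query = instruction
        .trim()
        .strip_prefix(BAIDU_SEARCH_PREFIX)
        .map(str::trim)
        .filter(|q| !q.is_empty())
        .ok_or_else(|| format!("unsupported instruction: {}", instruction))?;

    Ok(vec![
        BrowserAction::Navigate { url: "https://www.baidu.com".to_string() },
        // Selectors below are assumptions about the Baidu home page.
        BrowserAction::Type { selector: "#kw".to_string(), text: query.to_string() },
        BrowserAction::Click { selector: "#su".to_string() },
    ])
}
```

Rejecting an empty query as well as an unknown prefix keeps the failure mode explicit: the caller gets an `Err` instead of a plan that types nothing.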

- [ ] **Step 4: Wire `submit_task` handling into runtime entry**

Update `src/lib.rs` and `src/main.rs` so the Rust process can receive a task message, execute the planner, call `BrowserPipeTool`, and emit `task_complete`.

- [ ] **Step 5: Add end-to-end runtime test**

Use a mock transport to validate:
- receive `submit_task`
- send three browser commands
- consume three responses
- emit `task_complete`
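The four checks above could be exercised against an in-memory transport along these lines. This is a sketch under stated assumptions: the `Transport` trait, `MockTransport`, `run_task`, and the JSON shapes (`"type":"command"`, `"success":true`) are hypothetical, and the substring checks stand in for real JSON parsing.

```rust
use std::collections::VecDeque;

/// Line-oriented pipe abstraction (hypothetical; the real transport is STDIO).
pub trait Transport {
    fn send(&mut self, line: String);
    fn recv(&mut self) -> Option<String>;
}

/// In-memory transport: scripted inbound lines, recorded outbound lines.
pub struct MockTransport {
    pub inbound: VecDeque<String>,
    pub sent: Vec<String>,
}

impl Transport for MockTransport {
    fn send(&mut self, line: String) {
        self.sent.push(line);
    }
    fn recv(&mut self) -> Option<String> {
        self.inbound.pop_front()
    }
}

/// Receive `submit_task`, send each command and wait for its response,
/// then emit `task_complete`.
pub fn run_task<T: Transport>(transport: &mut T, actions: &[&str]) -> Result<(), String> {
    let first = transport.recv().ok_or("pipe closed before submit_task")?;
    if !first.contains("\"submit_task\"") {
        return Err(format!("expected submit_task, got: {}", first));
    }
    for action in actions {
        transport.send(format!("{{\"type\":\"command\",\"action\":\"{}\"}}", action));
        let response = transport.recv().ok_or("pipe closed while waiting for a response")?;
        if !response.contains("\"success\":true") {
            transport.send("{\"type\":\"task_complete\",\"success\":false}".to_string());
            return Err(format!("{} failed: {}", action, response));
        }
    }
    transport.send("{\"type\":\"task_complete\",\"success\":true}".to_string());
    Ok(())
}
```

A test would preload `inbound` with one `submit_task` line and three success responses, then assert that `sent` holds three commands followed by one `task_complete`.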

- [ ] **Step 6: Re-run Rust tests**

Run: `cargo test -q`
Expected: PASS for planner and runtime task flow.

## Task 3: Reuse Existing SuperRPA Browser Execution Path

**Files:**
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_process_host.h`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_process_host.cc`
- Modify: `src/chrome/browser/superrpa/sgclaw/pipe_listener.h`
- Modify: `src/chrome/browser/superrpa/sgclaw/pipe_listener.cc`
- Modify: `src/chrome/browser/superrpa/BUILD.gn`

- [ ] **Step 1: Add failing browser-side host/listener tests**

Cover:
- process start
- init handshake timeout
- JSON Line split and dispatch
- listener rejection of invalid payloads

- [ ] **Step 2: Implement process host skeleton**

Add lifecycle states and `Start/Stop/SendLine` using the existing sgclaw area, not a parallel subsystem.

- [ ] **Step 3: Implement listener**

Read `stdout`, split lines, reject empty/oversized/invalid JSON, and forward valid messages to sgclaw dispatch code.
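The listener itself lives in the C++ browser code; the sketch below uses Rust only to stay consistent with the other examples in this plan and to make the rejection rules concrete. The size cap, the names, and the shallow "looks like a JSON object" check are assumptions; a real listener would run a full JSON parser and would also buffer partial lines across reads.

```rust
/// Hypothetical per-line size cap; the plan does not specify a limit.
pub const MAX_LINE_BYTES: usize = 64 * 1024;

#[derive(Debug, PartialEq)]
pub enum LineError {
    Empty,
    Oversized,
    NotJsonObject,
}

/// Validate one line. The brace check is a stand-in for real JSON parsing.
pub fn validate_line(line: &str) -> Result<&str, LineError> {
    let trimmed = line.trim();
    if trimmed.is_empty() {
        return Err(LineError::Empty);
    }
    if trimmed.len() > MAX_LINE_BYTES {
        return Err(LineError::Oversized);
    }
    if !(trimmed.starts_with('{') && trimmed.ends_with('}')) {
        return Err(LineError::NotJsonObject);
    }
    Ok(trimmed)
}

/// Split a chunk of stdout into lines and keep only the ones that validate.
pub fn accept_lines(chunk: &str) -> Vec<&str> {
    chunk
        .lines()
        .filter_map(|line| validate_line(line).ok())
        .collect()
}
```

Checking size before content keeps the cost of rejecting a hostile or runaway line independent of how expensive the JSON check is.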

- [ ] **Step 4: Hook build targets**

Update `src/chrome/browser/superrpa/BUILD.gn` to compile the sgclaw host/listener path inside existing targets.

- [ ] **Step 5: Run browser unit tests**

Run the relevant `superrpa_unittests` target for the added cases.
Expected: PASS.

## Task 4: Reuse CommandRouter and Security Gates

**Files:**
- Modify: `src/chrome/browser/superrpa/router/command_router.h`
- Modify: `src/chrome/browser/superrpa/router/command_router.cc`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_command_dispatcher.cc`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.h`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.cc`
- Modify: `src/chrome/browser/superrpa/rules/rpa_rules_service_factory.cc`

- [ ] **Step 1: Write failing dispatch/security tests**

Cover:
- allowed Baidu demo task
- blocked non-whitelisted domain
- blocked unsupported action
- HMAC mismatch rejection

- [ ] **Step 2: Reuse command entrypoints**

Map sgclaw commands into existing methods:
- `ExecuteNavigate`
- `ExecuteType`
- `ExecuteClick`
- `ExecuteGetText`

- [ ] **Step 3: Reuse security layers**

Ensure the sgclaw path reads the existing rules services and uses `sgclaw_security_gate` for secondary checks before dispatch.
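The gate and dispatcher are C++ code in the SuperRPA repository; the sketch below uses Rust only for consistency with the other examples and shows one plausible ordering of the checks named in Steps 1 to 3. The action strings (in particular `get_text`), the rejection reasons, and the type names are assumptions; only the four `Execute*` method names come from this plan.

```rust
#[derive(Debug, PartialEq)]
pub enum GateDecision {
    Allow,
    Reject(&'static str),
}

/// Rules as the gate might see them after reading the rules services.
pub struct GateRules<'a> {
    pub allowed_domains: &'a [&'a str],
}

/// Map an sgclaw action to the existing router method it should reuse.
/// The action strings on the left are hypothetical wire names.
pub fn router_method(action: &str) -> Option<&'static str> {
    match action {
        "navigate" => Some("ExecuteNavigate"),
        "type" => Some("ExecuteType"),
        "click" => Some("ExecuteClick"),
        "get_text" => Some("ExecuteGetText"),
        _ => None,
    }
}

/// Secondary check before dispatch: signature, then action, then domain.
pub fn check_command(
    rules: &GateRules,
    hmac_valid: bool,
    action: &str,
    expected_domain: &str,
) -> GateDecision {
    if !hmac_valid {
        return GateDecision::Reject("hmac_mismatch");
    }
    if router_method(action).is_none() {
        return GateDecision::Reject("unsupported_action");
    }
    if !rules.allowed_domains.iter().any(|d| *d == expected_domain) {
        return GateDecision::Reject("domain_not_allowed");
    }
    GateDecision::Allow
}
```

Verifying the signature first means an unauthenticated message never reaches the rule lookups, so rejection reasons for unsigned traffic reveal nothing about the allow list.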

- [ ] **Step 4: Add demo rules source**

If needed, gate Baidu allow rules behind profile/demo config rather than broad permanent defaults.

- [ ] **Step 5: Re-run browser tests**

Run the focused security/dispatch unit tests.
Expected: PASS.

## Task 5: Wire FunctionsUI Submission and Result Flow

**Files:**
- Modify: `src/chrome/browser/resources/superrpa/devtools/functions/functions.ts`
- Modify: `src/chrome/browser/resources/superrpa/devtools/functions/functions_manifest.ts`
- Modify: browser-side bridge code that receives `window.__SUPER_RPA_BRIDGE__` calls

- [ ] **Step 1: Write failing UI bridge test or manual harness case**

Cover:
- `sgclaw_start`
- `sgclaw_stop`
- `sgclaw_submit_task`
- result/event propagation

- [ ] **Step 2: Add bridge entry points**

Expose minimal callable actions from FunctionsUI to the browser-side sgclaw host.

- [ ] **Step 3: Surface task lifecycle events**

Push state, logs, and final result back to FunctionsUI without introducing a new parallel UI subsystem.

- [ ] **Step 4: Validate manual smoke path**

Manual test:
1. Open FunctionsUI
2. Start sgclaw
3. Submit `打开百度搜索天气` ("open Baidu and search for weather")
4. Observe logs and completion summary

- [ ] **Step 5: Document the bridge contract**

Add a short browser-side note describing the exact payloads for start/stop/submit/result.

## Task 6: Add Phase 2 Agent Runtime with DeepSeek

**Files:**
- Create: `src/agent/runtime.rs`
- Create: `src/llm/mod.rs`
- Create: `src/llm/provider.rs`
- Create: `src/llm/deepseek.rs`
- Create: `src/config/mod.rs`
- Create: `src/config/settings.rs`
- Modify: `src/pipe/browser_tool.rs`
- Modify: `src/lib.rs`
- Create: `tests/deepseek_provider_test.rs`
- Create: `tests/agent_runtime_test.rs`

- [ ] **Step 1: Write failing provider tests**

Cover:
- config loading from env
- request shape for DeepSeek compatible chat API
- model default = `deepseek-chat`

- [ ] **Step 2: Implement provider abstraction**

Add a minimal provider trait and DeepSeek implementation using:
- `base_url=https://api.deepseek.com`
- model `deepseek-chat`
- API key from environment or config file, never hardcoded
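A sketch of the settings and trait this step describes, using only the standard library. The environment variable names (`DEEPSEEK_API_KEY`, `DEEPSEEK_BASE_URL`, `DEEPSEEK_MODEL`), the type and method names, and the `/chat/completions` path (assumed from the OpenAI-compatible convention) are illustrative and not taken from this plan; only the default base URL and model are. The HTTP call is omitted because it needs an external client crate.

```rust
pub const DEFAULT_BASE_URL: &str = "https://api.deepseek.com";
pub const DEFAULT_MODEL: &str = "deepseek-chat";

#[derive(Debug, Clone, PartialEq)]
pub struct DeepSeekSettings {
    pub base_url: String,
    pub model: String,
    pub api_key: String,
}

impl DeepSeekSettings {
    /// Build settings from any key lookup, so tests need not touch the real environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // The key is required and is never given a default.
        let api_key = lookup("DEEPSEEK_API_KEY")
            .filter(|k| !k.trim().is_empty())
            .ok_or_else(|| "DEEPSEEK_API_KEY is not set".to_string())?;
        Ok(Self {
            base_url: lookup("DEEPSEEK_BASE_URL").unwrap_or_else(|| DEFAULT_BASE_URL.to_string()),
            model: lookup("DEEPSEEK_MODEL").unwrap_or_else(|| DEFAULT_MODEL.to_string()),
            api_key,
        })
    }

    /// Read the same keys from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }

    /// Chat endpoint, assuming the OpenAI-compatible path layout.
    pub fn chat_url(&self) -> String {
        format!("{}/chat/completions", self.base_url.trim_end_matches('/'))
    }
}

/// Minimal provider abstraction; messages are (role, content) pairs.
pub trait LlmProvider {
    fn model(&self) -> &str;
    fn complete(&self, messages: &[(String, String)]) -> Result<String, String>;
}
```

Routing all configuration through `from_lookup` lets the provider tests in Step 1 assert the defaults and the missing-key error without mutating process-wide environment variables.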

- [ ] **Step 3: Write failing runtime tests**

Cover:
- tool registration for `browser_action`
- one think-act-observe cycle
- final summary generation after successful browser actions

- [ ] **Step 4: Implement Agent runtime**

Create a minimal `AgentRuntime` that can:
- receive task text
- call provider
- parse tool call
- invoke `BrowserPipeTool`
- emit `task_complete`
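The loop described above might look like the following sketch. Everything here is hypothetical scaffolding: the `ModelTurn`, `Provider`, and `BrowserTool` types, the step limit, and the scripted test doubles are illustrative, and the real `AgentRuntime` and `BrowserPipeTool` may be shaped differently. The returned summary is what the caller would wrap in `task_complete`.

```rust
use std::collections::VecDeque;

/// What the model asked for on one turn (already parsed from its response).
#[derive(Debug, Clone, PartialEq)]
pub enum ModelTurn {
    ToolCall { action: String, args: String },
    Final { summary: String },
}

pub trait Provider {
    fn next_turn(&mut self, transcript: &[String]) -> Result<ModelTurn, String>;
}

pub trait BrowserTool {
    fn invoke(&mut self, action: &str, args: &str) -> Result<String, String>;
}

/// Hypothetical cap so a confused model cannot loop forever.
pub const MAX_STEPS: usize = 8;

/// Think-act-observe loop: ask the provider, run the tool, feed back the observation.
pub fn run_agent<P: Provider, B: BrowserTool>(
    provider: &mut P,
    tool: &mut B,
    task: &str,
) -> Result<String, String> {
    let mut transcript = vec![format!("task: {}", task)];
    for _ in 0..MAX_STEPS {
        match provider.next_turn(&transcript)? {
            ModelTurn::ToolCall { action, args } => {
                let observation = tool.invoke(&action, &args)?;
                transcript.push(format!("action: {} {}", action, args));
                transcript.push(format!("observation: {}", observation));
            }
            ModelTurn::Final { summary } => return Ok(summary),
        }
    }
    Err("step limit reached without a final answer".to_string())
}

/// Test double: replays a fixed script of turns.
pub struct ScriptedProvider {
    pub turns: VecDeque<ModelTurn>,
}

impl Provider for ScriptedProvider {
    fn next_turn(&mut self, _transcript: &[String]) -> Result<ModelTurn, String> {
        self.turns.pop_front().ok_or_else(|| "script exhausted".to_string())
    }
}

/// Test double: records every call and always succeeds.
pub struct RecordingTool {
    pub calls: Vec<String>,
}

impl BrowserTool for RecordingTool {
    fn invoke(&mut self, action: &str, args: &str) -> Result<String, String> {
        self.calls.push(format!("{} {}", action, args));
        Ok("ok".to_string())
    }
}
```

Keeping the provider and the tool behind traits is what allows the runtime tests in Step 3 to run one full cycle without network access or a live browser.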

- [ ] **Step 5: Keep Phase 1 fallback**

Retain the rule-based planner as a fallback path for offline/demo use and for controlled debugging.

- [ ] **Step 6: Re-run Rust tests**

Run: `cargo test -q`
Expected: PASS including provider and runtime suites.

## Task 7: Final Cross-Repo Acceptance and Low-Context Docs

**Files:**
- Modify: `README.md`
- Create: `docs/superpowers/acceptance/2026-03-25-superrpa-sgclaw-browser-control.md`
- Modify: `docs/浏览器对接标准.md`
- Modify: `docs/sgclaw_project_team_kickoff.md`

- [ ] **Step 1: Write acceptance checklist**

Cover:
- handshake
- `submit_task`
- Baidu search success
- HMAC mismatch failure
- non-whitelisted domain rejection

- [ ] **Step 2: Create low-context handoff docs**

Write one short acceptance doc that links only the required files and commands for each phase.

- [ ] **Step 3: Run final smoke tests**

Rust repo:
`cargo test -q`

Browser repo:
run focused `superrpa_unittests`

Manual:
submit `打开百度搜索天气` ("open Baidu and search for weather")

- [ ] **Step 4: Update top-level docs**

Update README and browser contract docs so the next contributor can find:
- Phase 1 demo loop
- Phase 2 Agent loop
- exact integration points

- [ ] **Step 5: Commit in small slices**

Suggested commit order:
1. `feat: align sgclaw pipe contract for task flow`
2. `feat: add phase1 baidu demo planner`
3. `feat: wire superrpa sgclaw process host and dispatcher`
4. `feat: add functionsui sgclaw task bridge`
5. `feat: add deepseek-backed agent runtime`
6. `docs: add acceptance and integration notes`