skill-lib/docs/superpowers/plans/2026-03-25-superrpa-sgclaw-browser-control.md
2026-03-25 02:17:55 +00:00


# SuperRPA sgClaw Browser Control Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Deliver a two-phase integration where `sgclaw` first drives the existing SuperRPA browser through a minimal fixed-intent demo, then upgrades to a real Agent loop backed by `deepseek-chat`.
**Architecture:** Keep the browser side thin and reuse-first. Rust owns task understanding, pipe protocol, and sequencing; SuperRPA owns process hosting, secondary security checks, and delegation into existing `CommandRouter`. Phase 1 uses a rule-based planner; Phase 2 swaps in an Agent runtime without changing browser command execution.
**Tech Stack:** Rust, JSON Line over STDIO, HMAC-SHA256, SuperRPA Chromium C++, existing `CommandRouter`, existing rules services, FunctionsUI bridge, DeepSeek OpenAI-compatible API (`deepseek-chat`).
---
## File Structure
### sgClaw Repository
- Create: `src/agent/mod.rs`
- Create: `src/agent/runtime.rs`
- Create: `src/agent/planner.rs`
- Create: `src/llm/mod.rs`
- Create: `src/llm/provider.rs`
- Create: `src/llm/deepseek.rs`
- Create: `src/config/mod.rs`
- Create: `src/config/settings.rs`
- Modify: `src/lib.rs`
- Modify: `src/main.rs`
- Modify: `src/pipe/protocol.rs`
- Modify: `src/pipe/browser_tool.rs`
- Modify: `src/security/hmac.rs`
- Modify: `resources/rules.json`
- Create: `tests/task_protocol_test.rs`
- Create: `tests/planner_test.rs`
- Create: `tests/runtime_task_flow_test.rs`
### SuperRPA Repository
- Modify: `src/chrome/browser/superrpa/BUILD.gn`
- Modify: `src/chrome/browser/superrpa/router/command_router.h`
- Modify: `src/chrome/browser/superrpa/router/command_router.cc`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_command_dispatcher.cc`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.h`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.cc`
- Create or modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_process_host.h`
- Create or modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_process_host.cc`
- Create or modify: `src/chrome/browser/superrpa/sgclaw/pipe_listener.h`
- Create or modify: `src/chrome/browser/superrpa/sgclaw/pipe_listener.cc`
- Modify: `src/chrome/browser/resources/superrpa/devtools/functions/functions.ts`
- Modify: `src/chrome/browser/resources/superrpa/devtools/functions/functions_manifest.ts`
- Modify: `src/chrome/browser/superrpa/rules/rpa_rules_service_factory.cc`
- Test: `test("superrpa_unittests")`
## Task 1: Align Pipe Contract and Security Baseline
**Files:**
- Modify: `src/pipe/protocol.rs`
- Modify: `src/security/hmac.rs`
- Modify: `resources/rules.json`
- Create: `tests/task_protocol_test.rs`
- [ ] **Step 1: Write failing protocol tests for task-level messages**
Add tests covering `submit_task`, `task_complete`, and exact HMAC canonical string expectations.
- [ ] **Step 2: Run protocol-focused tests**
Run: `cargo test --test task_protocol_test --test pipe_protocol_test -q`
Expected: FAIL because the task-level messages and canonical signing are missing.
- [ ] **Step 3: Extend protocol types**
Add task-scope message variants in `src/pipe/protocol.rs` for:
- browser -> sgclaw `submit_task`
- sgclaw -> browser `task_complete`
- optional `log_entry`
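As a rough illustration of the three variants listed above, the following std-only sketch models them as one enum. The field names (`task_id`, `text`, `success`, `summary`, `level`, `message`) and the direction assigned to `log_entry` are assumptions made for the example, not part of the existing protocol.

```rust
/// Which side of the pipe emits a message.
#[derive(Debug, PartialEq)]
pub enum Direction {
    BrowserToSgclaw,
    SgclawToBrowser,
}

/// Task-scope messages. All field names here are hypothetical.
#[derive(Debug, PartialEq)]
pub enum TaskMessage {
    SubmitTask { task_id: String, text: String },
    TaskComplete { task_id: String, success: bool, summary: String },
    LogEntry { task_id: String, level: String, message: String },
}

impl TaskMessage {
    /// Wire-level type tag, matching the names used in this plan.
    pub fn type_name(&self) -> &'static str {
        match self {
            TaskMessage::SubmitTask { .. } => "submit_task",
            TaskMessage::TaskComplete { .. } => "task_complete",
            TaskMessage::LogEntry { .. } => "log_entry",
        }
    }

    /// Direction of travel. `log_entry` is assumed to flow sgclaw -> browser.
    pub fn direction(&self) -> Direction {
        match self {
            TaskMessage::SubmitTask { .. } => Direction::BrowserToSgclaw,
            TaskMessage::TaskComplete { .. } | TaskMessage::LogEntry { .. } => {
                Direction::SgclawToBrowser
            }
        }
    }
}
```

In the real `src/pipe/protocol.rs` these variants would presumably reuse whatever serialization the existing message types already use.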
- [ ] **Step 4: Fix HMAC canonical string**
Change `src/security/hmac.rs` to sign:
```text
<seq>\n<action>\n<stable_json(params)>\n<expected_domain>
```
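A minimal sketch of how the canonical string above could be assembled, assuming `params` is a flat map of string keys to string values. `stable_json` here is a hypothetical helper that relies on `BTreeMap` key ordering. The real params may be nested JSON, in which case a proper canonical serializer would be needed. The HMAC-SHA256 step itself is omitted.

```rust
use std::collections::BTreeMap;

/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Assumption: params are flat string pairs. BTreeMap iterates in sorted
/// key order, so the output does not depend on insertion order.
fn stable_json(params: &BTreeMap<String, String>) -> String {
    let body: Vec<String> = params
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", json_escape(k), json_escape(v)))
        .collect();
    format!("{{{}}}", body.join(","))
}

/// Builds `<seq>\n<action>\n<stable_json(params)>\n<expected_domain>`.
pub fn canonical_string(
    seq: u64,
    action: &str,
    params: &BTreeMap<String, String>,
    expected_domain: &str,
) -> String {
    format!("{}\n{}\n{}\n{}", seq, action, stable_json(params), expected_domain)
}
```

Both sides of the pipe must produce byte-identical output for the same message, so an exact-string test is more useful here than a round-trip test.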
- [ ] **Step 5: Add demo rules isolation**
Add a clearly marked demo-only allow entry for Baidu in `resources/rules.json`, and state in the docs that the entry exists for the demo only.
- [ ] **Step 6: Re-run protocol tests**
Run: `cargo test --test task_protocol_test --test pipe_protocol_test -q`
Expected: PASS.
## Task 2: Build Phase 1 Rust Task Flow
**Files:**
- Create: `src/agent/mod.rs`
- Create: `src/agent/planner.rs`
- Modify: `src/lib.rs`
- Modify: `src/main.rs`
- Create: `tests/planner_test.rs`
- Create: `tests/runtime_task_flow_test.rs`
- [ ] **Step 1: Write failing planner tests**
Add tests for parsing:
- `打开百度搜索天气` ("open Baidu and search for weather")
- `打开百度搜索电网调度` ("open Baidu and search for power grid dispatch")
Expected output is an ordered action plan: `navigate`, `type`, `click`.
- [ ] **Step 2: Run planner tests**
Run: `cargo test --test planner_test -q`
Expected: FAIL because no planner exists.
- [ ] **Step 3: Implement rule-based planner**
Create `src/agent/planner.rs` with a minimal parser that only accepts the Baidu-search intent family and rejects everything else clearly.
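One possible shape for such a parser is sketched below. The `Action` and `PlanError` types, the Baidu URL, and the `#kw` / `#su` selectors are assumptions chosen for illustration. The actual command parameters must match whatever `BrowserPipeTool` expects.

```rust
/// Browser actions emitted by the planner (hypothetical shape).
#[derive(Debug, PartialEq)]
pub enum Action {
    Navigate { url: String },
    Type { selector: String, text: String },
    Click { selector: String },
}

#[derive(Debug, PartialEq)]
pub enum PlanError {
    /// The task is outside the Baidu-search intent family.
    UnsupportedIntent(String),
    /// The prefix matched but no search query followed it.
    EmptyQuery,
}

/// The only accepted intent prefix: "open Baidu and search for ...".
const PREFIX: &str = "打开百度搜索";

pub fn plan(task: &str) -> Result<Vec<Action>, PlanError> {
    let task = task.trim();
    let query = task
        .strip_prefix(PREFIX)
        .ok_or_else(|| PlanError::UnsupportedIntent(task.to_string()))?
        .trim();
    if query.is_empty() {
        return Err(PlanError::EmptyQuery);
    }
    Ok(vec![
        // Assumed URL and selectors for the Baidu home page.
        Action::Navigate { url: "https://www.baidu.com".to_string() },
        Action::Type { selector: "#kw".to_string(), text: query.to_string() },
        Action::Click { selector: "#su".to_string() },
    ])
}
```

Returning a typed error for every other input keeps the "rejects everything else clearly" requirement testable.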
- [ ] **Step 4: Wire `submit_task` handling into runtime entry**
Update `src/lib.rs` and `src/main.rs` so the Rust process can receive a task message, execute the planner, call `BrowserPipeTool`, and emit `task_complete`.
- [ ] **Step 5: Add end-to-end runtime test**
Use a mock transport to validate:
- receive `submit_task`
- send three browser commands
- consume three responses
- emit `task_complete`
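The four checks above could be driven by an in-memory transport along these lines. The `Transport` trait, `MockTransport`, the JSON payload shapes, and `run_fixed_task` (a stand-in for the real task flow) are all hypothetical names for this sketch.

```rust
use std::collections::VecDeque;

/// Minimal line-oriented transport abstraction (assumed, not existing API).
pub trait Transport {
    fn recv(&mut self) -> Option<String>;
    fn send(&mut self, line: String);
}

/// In-memory transport: scripted inbound lines, recorded outbound lines.
pub struct MockTransport {
    pub inbound: VecDeque<String>,
    pub outbound: Vec<String>,
}

impl Transport for MockTransport {
    fn recv(&mut self) -> Option<String> {
        self.inbound.pop_front()
    }
    fn send(&mut self, line: String) {
        self.outbound.push(line);
    }
}

/// Stand-in for the Phase 1 flow: one submit_task in, three commands out
/// (each followed by one response), then task_complete.
pub fn run_fixed_task<T: Transport>(t: &mut T) -> Result<(), String> {
    let first = t.recv().ok_or("no submit_task received")?;
    if !first.contains("submit_task") {
        return Err(format!("unexpected first message: {}", first));
    }
    for action in ["navigate", "type", "click"] {
        t.send(format!("{{\"type\":\"command\",\"action\":\"{}\"}}", action));
        t.recv().ok_or_else(|| format!("no response for {}", action))?;
    }
    t.send("{\"type\":\"task_complete\",\"success\":true}".to_string());
    Ok(())
}
```

A test then seeds `inbound` with one `submit_task` and three responses, and asserts on the order and count of `outbound`.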
- [ ] **Step 6: Re-run Rust tests**
Run: `cargo test -q`
Expected: PASS for planner and runtime task flow.
## Task 3: Reuse Existing SuperRPA Browser Execution Path
**Files:**
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_process_host.h`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_process_host.cc`
- Modify: `src/chrome/browser/superrpa/sgclaw/pipe_listener.h`
- Modify: `src/chrome/browser/superrpa/sgclaw/pipe_listener.cc`
- Modify: `src/chrome/browser/superrpa/BUILD.gn`
- [ ] **Step 1: Add failing browser-side host/listener tests**
Cover:
- process start
- init handshake timeout
- JSON Line split and dispatch
- listener rejection of invalid payloads
- [ ] **Step 2: Implement process host skeleton**
Add lifecycle states and `Start`, `Stop`, and `SendLine` methods inside the existing `sgclaw/` directory rather than creating a parallel subsystem.
- [ ] **Step 3: Implement listener**
Read the sgclaw process's `stdout`, split it into lines, reject empty, oversized, or invalid-JSON lines, and forward valid messages to the sgclaw dispatch code.
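The browser-side listener itself is C++, but the splitting and rejection rules can be illustrated in Rust, the language used for the examples in this plan. The size limit, the `LineError` type, and the function names are assumptions. The `{`...`}` shape check is only a placeholder for real JSON parsing.

```rust
/// Hypothetical upper bound on a single JSON line.
pub const MAX_LINE_BYTES: usize = 64 * 1024;

#[derive(Debug, PartialEq)]
pub enum LineError {
    Empty,
    Oversized,
    NotJsonObject,
}

/// Removes every complete line from `buffer` and returns them without
/// their line endings. A trailing partial line stays in the buffer.
pub fn drain_complete_lines(buffer: &mut String) -> Vec<String> {
    let mut lines = Vec::new();
    while let Some(pos) = buffer.find('\n') {
        let line: String = buffer.drain(..=pos).collect();
        lines.push(line.trim_end_matches(|c| c == '\r' || c == '\n').to_string());
    }
    lines
}

/// Applies the rejection rules to one line and returns the trimmed line.
pub fn check_line(line: &str) -> Result<&str, LineError> {
    let trimmed = line.trim();
    if trimmed.is_empty() {
        return Err(LineError::Empty);
    }
    if trimmed.len() > MAX_LINE_BYTES {
        return Err(LineError::Oversized);
    }
    // Placeholder shape check; a real listener would fully parse the JSON.
    if !(trimmed.starts_with('{') && trimmed.ends_with('}')) {
        return Err(LineError::NotJsonObject);
    }
    Ok(trimmed)
}
```

Keeping the partial tail in the buffer matters because a pipe read can end in the middle of a message.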
- [ ] **Step 4: Hook build targets**
Update `src/chrome/browser/superrpa/BUILD.gn` to compile the sgclaw host/listener path inside existing targets.
- [ ] **Step 5: Run browser unit tests**
Run the relevant `superrpa_unittests` target for the added cases.
Expected: PASS.
## Task 4: Reuse CommandRouter and Security Gates
**Files:**
- Modify: `src/chrome/browser/superrpa/router/command_router.h`
- Modify: `src/chrome/browser/superrpa/router/command_router.cc`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_command_dispatcher.cc`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.h`
- Modify: `src/chrome/browser/superrpa/sgclaw/sgclaw_security_gate.cc`
- Modify: `src/chrome/browser/superrpa/rules/rpa_rules_service_factory.cc`
- [ ] **Step 1: Write failing dispatch/security tests**
Cover:
- allowed Baidu demo task
- blocked non-whitelisted domain
- blocked unsupported action
- HMAC mismatch rejection
- [ ] **Step 2: Reuse command entrypoints**
Map sgclaw commands into existing methods:
- `ExecuteNavigate`
- `ExecuteType`
- `ExecuteClick`
- `ExecuteGetText`
- [ ] **Step 3: Reuse security layers**
Ensure sgclaw path reads existing rules services and uses `sgclaw_security_gate` for secondary checks before dispatch.
- [ ] **Step 4: Add demo rules source**
If needed, gate Baidu allow rules behind profile/demo config rather than broad permanent defaults.
- [ ] **Step 5: Re-run browser tests**
Run the focused security/dispatch unit tests.
Expected: PASS.
## Task 5: Wire FunctionsUI Submission and Result Flow
**Files:**
- Modify: `src/chrome/browser/resources/superrpa/devtools/functions/functions.ts`
- Modify: `src/chrome/browser/resources/superrpa/devtools/functions/functions_manifest.ts`
- Modify: browser-side bridge code that receives `window.__SUPER_RPA_BRIDGE__` calls
- [ ] **Step 1: Write failing UI bridge test or manual harness case**
Cover:
- `sgclaw_start`
- `sgclaw_stop`
- `sgclaw_submit_task`
- result/event propagation
- [ ] **Step 2: Add bridge entry points**
Expose minimal callable actions from FunctionsUI to the browser-side sgclaw host.
- [ ] **Step 3: Surface task lifecycle events**
Push state, logs, and final result back to FunctionsUI without introducing a new parallel UI subsystem.
- [ ] **Step 4: Validate manual smoke path**
Manual test:
1. Open FunctionsUI
2. Start sgclaw
3. Submit `打开百度搜索天气`
4. Observe logs and completion summary
- [ ] **Step 5: Document the bridge contract**
Add a short browser-side note describing the exact payloads for start/stop/submit/result.
## Task 6: Add Phase 2 Agent Runtime with DeepSeek
**Files:**
- Create: `src/agent/runtime.rs`
- Create: `src/llm/mod.rs`
- Create: `src/llm/provider.rs`
- Create: `src/llm/deepseek.rs`
- Create: `src/config/mod.rs`
- Create: `src/config/settings.rs`
- Modify: `src/pipe/browser_tool.rs`
- Modify: `src/lib.rs`
- Create: `tests/deepseek_provider_test.rs`
- Create: `tests/agent_runtime_test.rs`
- [ ] **Step 1: Write failing provider tests**
Cover:
- config loading from env
- request shape for DeepSeek's OpenAI-compatible chat API
- model default = `deepseek-chat`
- [ ] **Step 2: Implement provider abstraction**
Add a minimal provider trait and DeepSeek implementation using:
- `base_url=https://api.deepseek.com`
- model `deepseek-chat`
- API key from environment or config file, never hardcoded
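A sketch of the settings side of this step, using only the standard library. The environment variable names (`DEEPSEEK_API_KEY`, `DEEPSEEK_BASE_URL`, `DEEPSEEK_MODEL`) and the `DeepSeekSettings` type are assumptions for the example. The `/chat/completions` path follows the OpenAI-compatible convention and should be confirmed against the DeepSeek API documentation. The HTTP call itself is left out.

```rust
pub const DEFAULT_BASE_URL: &str = "https://api.deepseek.com";
pub const DEFAULT_MODEL: &str = "deepseek-chat";

/// Provider settings. The API key is always supplied at runtime.
#[derive(Debug, PartialEq)]
pub struct DeepSeekSettings {
    pub base_url: String,
    pub model: String,
    pub api_key: String,
}

impl DeepSeekSettings {
    /// Builds settings from any key lookup, which keeps tests independent
    /// of the process environment. Variable names are hypothetical.
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let api_key = lookup("DEEPSEEK_API_KEY")
            .filter(|k| !k.trim().is_empty())
            .ok_or_else(|| "DEEPSEEK_API_KEY is not set".to_string())?;
        Ok(Self {
            base_url: lookup("DEEPSEEK_BASE_URL")
                .unwrap_or_else(|| DEFAULT_BASE_URL.to_string()),
            model: lookup("DEEPSEEK_MODEL").unwrap_or_else(|| DEFAULT_MODEL.to_string()),
            api_key,
        })
    }

    /// Reads the same keys from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| std::env::var(name).ok())
    }

    /// Chat endpoint URL, tolerating a trailing slash on the base URL.
    pub fn chat_url(&self) -> String {
        format!("{}/chat/completions", self.base_url.trim_end_matches('/'))
    }
}
```

Injecting the lookup function lets the "config loading from env" and "model default" tests from Step 1 run without mutating global state.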
- [ ] **Step 3: Write failing runtime tests**
Cover:
- tool registration for `browser_action`
- one think-act-observe cycle
- final summary generation after successful browser actions
- [ ] **Step 4: Implement Agent runtime**
Create a minimal `AgentRuntime` that can:
- receive task text
- call provider
- parse tool call
- invoke `BrowserPipeTool`
- emit `task_complete`
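The loop described above might look roughly like the following, with the provider and the browser tool hidden behind traits so that tests can script both. Every name here (`Provider`, `BrowserTool`, `ProviderReply`, `TaskComplete`, `run_task`, and the two test doubles) is hypothetical. The real runtime would wrap the DeepSeek client and `BrowserPipeTool`.

```rust
use std::collections::VecDeque;

/// What the model asks for next: a browser action or a final summary.
pub enum ProviderReply {
    ToolCall { action: String, args: String },
    Final(String),
}

pub trait Provider {
    fn next(&mut self, task: &str, observations: &[String]) -> Result<ProviderReply, String>;
}

pub trait BrowserTool {
    fn invoke(&mut self, action: &str, args: &str) -> Result<String, String>;
}

#[derive(Debug, PartialEq)]
pub struct TaskComplete {
    pub success: bool,
    pub summary: String,
}

/// Think-act-observe loop with a hard step limit.
pub fn run_task(
    provider: &mut dyn Provider,
    tool: &mut dyn BrowserTool,
    task: &str,
    max_steps: usize,
) -> TaskComplete {
    let mut observations: Vec<String> = Vec::new();
    for _ in 0..max_steps {
        match provider.next(task, &observations) {
            Ok(ProviderReply::Final(summary)) => return TaskComplete { success: true, summary },
            Ok(ProviderReply::ToolCall { action, args }) => match tool.invoke(&action, &args) {
                Ok(observation) => observations.push(observation),
                Err(e) => {
                    let summary = format!("browser action failed: {}", e);
                    return TaskComplete { success: false, summary };
                }
            },
            Err(e) => {
                return TaskComplete { success: false, summary: format!("provider error: {}", e) }
            }
        }
    }
    TaskComplete { success: false, summary: "step limit reached".to_string() }
}

/// Test double: replays a fixed script of replies.
pub struct ScriptedProvider {
    pub replies: VecDeque<ProviderReply>,
}

impl Provider for ScriptedProvider {
    fn next(&mut self, _task: &str, _obs: &[String]) -> Result<ProviderReply, String> {
        self.replies.pop_front().ok_or_else(|| "script exhausted".to_string())
    }
}

/// Test double: records the actions it was asked to perform.
pub struct RecordingTool {
    pub calls: Vec<String>,
}

impl BrowserTool for RecordingTool {
    fn invoke(&mut self, action: &str, _args: &str) -> Result<String, String> {
        self.calls.push(action.to_string());
        Ok(format!("{} ok", action))
    }
}
```

The step limit is a design choice for the sketch: it bounds the loop if the model never returns a final answer.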
- [ ] **Step 5: Keep Phase 1 fallback**
Retain the rule-based planner as a fallback path for offline/demo use and for controlled debugging.
- [ ] **Step 6: Re-run Rust tests**
Run: `cargo test -q`
Expected: PASS including provider and runtime suites.
## Task 7: Final Cross-Repo Acceptance and Low-Context Docs
**Files:**
- Modify: `README.md`
- Create: `docs/superpowers/acceptance/2026-03-25-superrpa-sgclaw-browser-control.md`
- Modify: `docs/浏览器对接标准.md`
- Modify: `docs/sgclaw_project_team_kickoff.md`
- [ ] **Step 1: Write acceptance checklist**
Cover:
- handshake
- `submit_task`
- Baidu search success
- HMAC mismatch failure
- non-whitelisted domain rejection
- [ ] **Step 2: Create low-context handoff docs**
Write one short acceptance doc that links only the required files and commands for each phase.
- [ ] **Step 3: Run final smoke tests**
- Rust repo: `cargo test -q`
- Browser repo: run the focused `superrpa_unittests` cases
- Manual: submit `打开百度搜索天气`
- [ ] **Step 4: Update top-level docs**
Update README and browser contract docs so the next contributor can find:
- Phase 1 demo loop
- Phase 2 Agent loop
- exact integration points
- [ ] **Step 5: Commit in small slices**
Suggested commit order:
1. `feat: align sgclaw pipe contract for task flow`
2. `feat: add phase1 baidu demo planner`
3. `feat: wire superrpa sgclaw process host and dispatcher`
4. `feat: add functionsui sgclaw task bridge`
5. `feat: add deepseek-backed agent runtime`
6. `docs: add acceptance and integration notes`