# Zhihu Hotlist To Excel Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Make sgClaw reliably read Zhihu hotlist data through a Zhihu browser skill and export the collected structured result into a local `.xlsx` file through an independent Office skill.

**Architecture:** Keep zeroclaw as the core planner, but stop it from wandering across unrelated tools once a browser-attached skill is selected. The hotlist skill must produce a strict structured artifact, and the Office skill must consume that artifact through a dedicated `openxml_office` tool that wraps the sibling `openxml_cli` project. For the first delivery, reuse `openxml_cli template render` with a bundled `.xlsx` template instead of inventing a new workbook-construction API.

**Tech Stack:** Rust, vendored zeroclaw, sgClaw browser pipe, skill packages under `/home/zyl/projects/sgClaw/skill_lib`, sibling `openxml_cli`, JSON payload handoff, `.xlsx` template render, Python/Rust regression tests, real-provider smoke verification.

## Scope Guard

- In scope:
  - browser-attached skill execution discipline
  - `zhihu-hotlist` structured export artifact
  - new `office-export-xlsx` skill
  - new `openxml_office` runtime tool
  - end-to-end acceptance for "读取知乎热榜数据,并导出 excel 文件" ("Read the Zhihu hotlist data and export an Excel file")
- Out of scope:
  - generic Office authoring platform
  - arbitrary shell-based export flows
  - browser-side file generation as the main export path
  - broad multi-site data export before Zhihu hotlist is stable

## Current Findings To Preserve

- Real-provider validation already proved that `zhihu-hotlist`, `zhihu-navigate`, and `zhihu-write` can be selected through `read_skill`.
- The current failure mode is not "skill missing" but "tool discipline collapse":
  - `file_read`, `glob_search`, and `shell` are attempted after `read_skill`
  - `zhihu-write` can fill title/body but still exceeds max tool iterations
  - `zhihu-navigate` succeeds for some intents but still detours through non-browser tools
- The sibling Office project already exists at `/home/zyl/projects/sgClaw/openxml_cli`.
- `openxml_cli` currently exposes `capabilities`, `template inspect`, `template validate`, and `template render`; it does not yet expose a direct "create workbook from scratch" command.

## Final Acceptance Contract

Input:

```text
读取知乎热榜数据,并导出 excel 文件
```

Required behavior:

1. sgClaw selects `zhihu-hotlist`.
2. sgClaw gathers hotlist rows through the SuperRPA browser interface only.
3. sgClaw converts the result into a structured JSON export payload.
4. sgClaw selects `office-export-xlsx`.
5. sgClaw calls `openxml_office`.
6. A local `.xlsx` file is produced and its path is returned.

Required logs:

- `read_skill zhihu-hotlist`
- browser actions only: `navigate`, `getText`, optionally `click`
- `read_skill office-export-xlsx`
- `call openxml_office`

Forbidden logs during the mainline path:

- `call shell`
- `call glob_search`
- `call file_read` on skill references or skill roots
- `docker run`

Required Excel content:

- one sheet named `知乎热榜`
- columns: `rank`, `title`, `heat`
- at least 10 hotlist rows
- exported values match the collected rows

## Task 1: Lock Browser-Attached Skill Runs To The Right Tools

**Files:**

- Modify: `/home/zyl/projects/sgClaw/claw/tests/compat_runtime_test.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/src/runtime/engine.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/src/runtime/tool_policy.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/src/compat/runtime.rs`

**Intent:**

- Once the task is clearly in a browser-attached Zhihu skill flow, the runtime must stop offering unrelated tools such as `shell`, `glob_search`, and arbitrary `file_read`.
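
The intent above amounts to a small filtering rule over the tool list. The following is a minimal sketch of that rule, not the actual code in `engine.rs` or `tool_policy.rs`: the `ToolContext` struct, its field names, and the `allowed_tools` function are hypothetical, and treating the browser phase as a strict allowlist (rather than only removing the three named tools) is an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical summary of the signals the runtime would derive from the
/// runtime profile, browser surface flag, and instruction intent.
#[derive(Clone, Copy)]
pub struct ToolContext {
    /// True once a browser-attached skill flow is active.
    pub browser_skill_active: bool,
    /// True when the instruction asks for an export (for example to `.xlsx`).
    pub export_intent: bool,
}

/// Returns the tools that may be offered to the planner.
/// Assumption: in a browser-attached flow only an explicit allowlist survives,
/// so `shell`, `glob_search`, and `file_read` drop out without being named.
pub fn allowed_tools(all_tools: &[&str], ctx: ToolContext) -> BTreeSet<String> {
    if !ctx.browser_skill_active {
        // Outside the constrained phase, leave the tool list untouched.
        return all_tools.iter().map(|t| t.to_string()).collect();
    }

    let mut keep = vec!["superrpa_browser", "browser_action", "read_skill"];
    if ctx.export_intent {
        // The Office tool is only offered when the task includes an export.
        keep.push("openxml_office");
    }

    all_tools
        .iter()
        .filter(|t| keep.contains(*t))
        .map(|t| t.to_string())
        .collect()
}
```

An allowlist fails closed: a tool registered later is hidden from the constrained phase until it is explicitly added, which matches the "tools only" wording of the second regression test name.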

**Step 1: Write the failing regression tests**

Add focused tests in `tests/compat_runtime_test.rs` for:

```rust
#[test]
fn zhihu_hotlist_skill_flow_does_not_expose_shell_or_glob_tools() {}

#[test]
fn browser_attached_export_flow_exposes_browser_and_office_tools_only() {}
```

Assertions to include:

- request tool list contains `superrpa_browser`
- request tool list contains `read_skill`
- request tool list does not contain `shell`
- request tool list does not contain `glob_search`
- request tool list does not contain generic `file_read` during the constrained browser skill phase

**Step 2: Run the focused tests to verify failure**

Run:

```bash
cargo test --test compat_runtime_test zhihu_hotlist_skill_flow_does_not_expose_shell_or_glob_tools -- --nocapture
cargo test --test compat_runtime_test browser_attached_export_flow_exposes_browser_and_office_tools_only -- --nocapture
```

Expected:

- fail because current runtime still exposes too many tools in browser-attached mode

**Step 3: Implement minimal constrained-tool policy**

Implement a browser-skill execution mode that:

- keeps `superrpa_browser`
- keeps compatibility alias `browser_action`
- keeps `read_skill`
- optionally keeps the new `openxml_office` tool only for export tasks
- removes `shell`, `glob_search`, and free-form `file_read` from the allowed tool list for these phases

Do this in `src/runtime/engine.rs` by deriving a narrower `allowed_tools` set from:

- runtime profile
- browser surface present flag
- instruction intent
- whether export mode is active

**Step 4: Re-run the focused tests**

Run the same commands.

Expected:

- both pass

## Task 2: Convert Zhihu Hotlist Skill To Structured Output First

**Files:**

- Modify: `/home/zyl/projects/sgClaw/skill_lib/skills/zhihu-hotlist/SKILL.md`
- Modify: `/home/zyl/projects/sgClaw/claw/tests/skill_lib_validation_test.py`
- Modify: `/home/zyl/projects/sgClaw/claw/tests/compat_runtime_test.rs`

**Intent:**

- The hotlist skill should stop ending with prose-only summaries. Its primary output must be a stable export artifact the Office skill can consume.

**Step 1: Write the failing tests**

Add tests that enforce:

- `zhihu-hotlist` prompt body contains an explicit `Export Artifact` section
- the artifact schema includes `sheet_name`, `columns`, and `rows`
- runtime regression checks can find those fields in the skill content when `read_skill` is used

**Step 2: Run tests to verify failure**

Run:

```bash
python3 -m unittest tests.skill_lib_validation_test
cargo test --test compat_runtime_test handle_browser_message_executes_real_zhihu_hotlist_skill_flow -- --nocapture
```

Expected:

- validation fails because the artifact contract is not yet required

**Step 3: Update `zhihu-hotlist`**

Add an `Export Artifact` section that requires this shape:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Also add hard rules:

- no extra exploratory tools after the browser data is collected
- prose summary is secondary, structured artifact is primary

**Step 4: Re-run tests**

Expected:

- validation passes

## Task 3: Create The Office Export Skill Package

**Files:**

- Create: `/home/zyl/projects/sgClaw/skill_lib/skills/office-export-xlsx/SKILL.md`
- Create: `/home/zyl/projects/sgClaw/skill_lib/skills/office-export-xlsx/references/export-flow.md`
- Create: `/home/zyl/projects/sgClaw/skill_lib/skills/office-export-xlsx/assets/zhihu_hotlist_template.xlsx`
- Modify: `/home/zyl/projects/sgClaw/claw/tests/skill_lib_validation_test.py`

**Intent:**

- Add a fully separate Office skill that knows nothing about browser scraping and only turns structured table data into a local Excel file.

**Step 1: Write the failing validation test**

Extend `tests/skill_lib_validation_test.py` so discovery expects:

```python
EXPECTED_SKILL_NAMES = [
    "office-export-xlsx",
    "zhihu-hotlist",
    "zhihu-navigate",
    "zhihu-write",
]
```

Also require the new skill to mention:

- `openxml_office`
- `.xlsx`
- `sheet_name`
- `columns`
- `rows`

**Step 2: Run the validation test to verify failure**

Run:

```bash
python3 -m unittest tests.skill_lib_validation_test
```

Expected:

- fail because the new skill package does not exist yet

**Step 3: Create the skill package**

`SKILL.md` must define:

- when to use: local Office export from structured rows
- required input schema
- output: exported file path
- tool rule: only call `openxml_office`, do not use browser tools

`export-flow.md` must define:

- validate payload shape
- choose output path
- invoke `openxml_office`
- return file path and row count

The first workbook template should be a fixed `zhihu_hotlist_template.xlsx` with:

- sheet `知乎热榜`
- row 1 headers already present
- table fill anchored to a stable name or placeholder expected by `openxml_cli`

**Step 4: Re-run validation**

Expected:

- new skill passes audit

## Task 4: Add The `openxml_office` Runtime Tool

**Files:**

- Create: `/home/zyl/projects/sgClaw/claw/src/compat/openxml_office_tool.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/src/compat/mod.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/src/compat/runtime.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/src/runtime/tool_policy.rs`
- Test: `/home/zyl/projects/sgClaw/claw/tests/compat_openxml_office_tool_test.rs`

**Intent:**

- Wrap sibling `openxml_cli` as a first-class local tool instead of leaking Office export through shell prompting.
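
Before a first-class tool hands anything to `openxml_cli`, it needs the "validate payload shape" step listed for `export-flow.md`. A minimal sketch of such a check follows. The `ExportPayload` struct, the `PayloadError` enum, and `validate_payload` are hypothetical names; representing every cell as a `String` and rejecting rows whose width differs from `columns` are simplifying assumptions, not rules stated in this plan.

```rust
/// Hypothetical in-memory form of the export artifact. Field names follow the
/// JSON shape used in this plan; cells are flattened to strings for brevity.
#[derive(Debug, Clone, PartialEq)]
pub struct ExportPayload {
    pub sheet_name: String,
    pub columns: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Reasons a payload is rejected before any export is attempted.
#[derive(Debug, PartialEq)]
pub enum PayloadError {
    EmptySheetName,
    MissingColumns,
    MissingRows,
    /// Assumed rule: every row must have exactly one cell per column.
    RowWidthMismatch { row_index: usize, expected: usize, found: usize },
}

/// Validates the payload shape and returns the row count on success,
/// which the export flow can report alongside the output path.
pub fn validate_payload(payload: &ExportPayload) -> Result<usize, PayloadError> {
    if payload.sheet_name.trim().is_empty() {
        return Err(PayloadError::EmptySheetName);
    }
    if payload.columns.is_empty() {
        return Err(PayloadError::MissingColumns);
    }
    if payload.rows.is_empty() {
        return Err(PayloadError::MissingRows);
    }
    let expected = payload.columns.len();
    for (row_index, row) in payload.rows.iter().enumerate() {
        if row.len() != expected {
            return Err(PayloadError::RowWidthMismatch {
                row_index,
                expected,
                found: row.len(),
            });
        }
    }
    Ok(payload.rows.len())
}
```

Returning a typed error rather than a free-form message keeps the rejection path easy to assert on in a tool test, and the row count returned on success lines up with the "return file path and row count" step of the export flow.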

**Step 1: Write the failing tool test**

Create `tests/compat_openxml_office_tool_test.rs` with cases for:

- capability probe
- render request assembly for xlsx export
- rejection when rows/columns are missing
- stable JSON output containing `output_path`

**Step 2: Run the test to verify failure**

Run:

```bash
cargo test --test compat_openxml_office_tool_test -- --nocapture
```

Expected:

- fail because the tool does not exist

**Step 3: Implement minimal tool**

Tool contract:

```json
{
  "action": "export_hotlist_xlsx",
  "template_path": ".../zhihu_hotlist_template.xlsx",
  "output_path": "/tmp/zhihu_hotlist.xlsx",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Implementation rules:

- write the payload JSON to a temp file
- invoke sibling `openxml_cli template render --request --json`
- return parsed JSON result and normalized `output_path`
- no free-form shell composition from model text

**Step 4: Re-run the focused tests**

Expected:

- pass

## Task 5: Wire Export Tasks To Use Two Skills In Sequence

**Files:**

- Modify: `/home/zyl/projects/sgClaw/claw/src/runtime/engine.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/tests/compat_runtime_test.rs`

**Intent:**

- The single user instruction must naturally flow from hotlist capture into Office export, not end after the first skill.
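
The two-skill sequence, together with the required and forbidden logs of the acceptance contract, can be checked mechanically. The sketch below is one possible shape for that check. The function name `check_mainline_log` is hypothetical, and it assumes log entries are plain strings that contain the phrases listed in the contract; the real log format may differ. The `call file_read` rule is left out because it depends on inspecting the path argument.

```rust
/// Checks a captured log against the mainline-path rules:
/// no forbidden entries, and hotlist skill -> office skill -> export call in order.
/// Assumption: each entry is a plain string containing the contract phrases.
pub fn check_mainline_log(log: &[&str]) -> Result<(), String> {
    // Forbidden entries that can be detected by substring alone.
    const FORBIDDEN: [&str; 3] = ["call shell", "call glob_search", "docker run"];

    for entry in log {
        if let Some(bad) = FORBIDDEN.iter().find(|f| entry.contains(**f)) {
            return Err(format!("forbidden log entry: {}", bad));
        }
    }

    // Index of the first entry containing the given phrase, if any.
    let position = |needle: &str| log.iter().position(|e| e.contains(needle));

    let hotlist = position("read_skill zhihu-hotlist")
        .ok_or("missing: read_skill zhihu-hotlist")?;
    let office = position("read_skill office-export-xlsx")
        .ok_or("missing: read_skill office-export-xlsx")?;
    let export = position("call openxml_office")
        .ok_or("missing: call openxml_office")?;

    if hotlist < office && office < export {
        Ok(())
    } else {
        Err("skills or export call out of order".to_string())
    }
}
```

A check like this only observes the outcome, so it stays compatible with the plan's choice not to add a hard-coded workflow engine: the ordering is asserted, not enforced.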

**Step 1: Write the failing runtime test**

Add a focused regression test for:

```rust
#[test]
fn zhihu_hotlist_export_task_reads_hotlist_skill_then_office_skill() {}
```

Assertions:

- request stream includes `read_skill zhihu-hotlist`
- later includes `read_skill office-export-xlsx`
- office phase exposes `openxml_office`
- no `shell` is exposed in the constrained task path

**Step 2: Run the test to verify failure**

Run:

```bash
cargo test --test compat_runtime_test zhihu_hotlist_export_task_reads_hotlist_skill_then_office_skill -- --nocapture
```

Expected:

- fail because the task currently has no structured handoff to Office export

**Step 3: Implement minimal chaining support**

Do not add a hard-coded workflow engine.

Minimal implementation:

- strengthen prompt contract so export tasks require structured hotlist artifact
- include `openxml_office` in allowed tools for export intent
- keep browser-only tools for the collection phase and Office-only tool for the export phase

**Step 4: Re-run the test**

Expected:

- pass

## Task 6: Add Real Acceptance Harness And Scoring

**Files:**

- Create: `/home/zyl/projects/sgClaw/claw/tools/live_acceptance/run_zhihu_hotlist_excel_acceptance.py`
- Create: `/home/zyl/projects/sgClaw/claw/docs/acceptance/2026-03-29-zhihu-hotlist-excel.md`

**Intent:**

- Make the final acceptance repeatable with the real user config and a transparent score.

**Step 1: Write the script**

The script must:

- use `/home/zyl/.config/superrpa/Default/superrpa/sgclaw_config.json`
- boot local `target/debug/sgclaw`
- send one browser `submit_task`
- respond to browser commands with controlled fixture responses
- capture:
  - loaded skills
  - selected skills
  - forbidden tool calls
  - final summary
  - exported file path

**Step 2: Define score rubric**

Rubric:

- `skill selection`: 30
- `tool discipline`: 25
- `hotlist data correctness`: 20
- `xlsx export success`: 20
- `final response quality`: 5

Automatic deductions:

- `shell` called: `-15`
- `glob_search` called: `-10`
- `file_read` on skill references: `-10`
- wrong skill selected first: `-15`
- export missing output path: `-20`

**Step 3: Run acceptance**

Run:

```bash
python3 tools/live_acceptance/run_zhihu_hotlist_excel_acceptance.py
```

Expected:

- prints total score and per-dimension breakdown
- stores final evidence in `docs/acceptance/2026-03-29-zhihu-hotlist-excel.md`

## Delivery Sequence

Execute in this order:

1. Task 1: constrain tools
2. Task 2: structure hotlist output
3. Task 3: add office skill package
4. Task 4: add `openxml_office`
5. Task 5: chain the two skills
6. Task 6: run acceptance and score

## Definition Of Done

- browser-attached hotlist tasks no longer wander into `shell`, `glob_search`, or ad-hoc `file_read`
- `office-export-xlsx` exists as an independent skill
- `openxml_office` exists as an explicit tool
- a single user task can collect hotlist data and export `.xlsx`
- acceptance score is at least `85/100`
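
To make the rubric arithmetic concrete, here is a minimal sketch of how a total could be computed and compared with the `85/100` threshold. It is written in Rust, the project's main language, although the harness itself is planned as a Python script. The type and function names are hypothetical, and two details are assumptions the plan does not specify: each deduction is applied at most once per run, and the total is floored at zero.

```rust
/// Points earned per rubric dimension (maximums: 30, 25, 20, 20, 5).
pub struct DimensionScores {
    pub skill_selection: i32,
    pub tool_discipline: i32,
    pub hotlist_data_correctness: i32,
    pub xlsx_export_success: i32,
    pub final_response_quality: i32,
}

/// Which automatic deductions were triggered during the run.
#[derive(Default)]
pub struct Violations {
    pub shell_called: bool,
    pub glob_search_called: bool,
    pub file_read_on_skill_references: bool,
    pub wrong_skill_selected_first: bool,
    pub export_missing_output_path: bool,
}

/// Sums the dimension scores (clamped to their maximums) and subtracts
/// each triggered deduction once. Assumption: the result never goes below 0.
pub fn total_score(s: &DimensionScores, v: &Violations) -> i32 {
    let base = s.skill_selection.clamp(0, 30)
        + s.tool_discipline.clamp(0, 25)
        + s.hotlist_data_correctness.clamp(0, 20)
        + s.xlsx_export_success.clamp(0, 20)
        + s.final_response_quality.clamp(0, 5);

    let mut deductions = 0;
    if v.shell_called { deductions += 15; }
    if v.glob_search_called { deductions += 10; }
    if v.file_read_on_skill_references { deductions += 10; }
    if v.wrong_skill_selected_first { deductions += 15; }
    if v.export_missing_output_path { deductions += 20; }

    (base - deductions).max(0)
}

/// The Definition Of Done requires at least 85 out of 100.
pub fn meets_definition_of_done(score: i32) -> bool {
    score >= 85
}
```

Under these assumptions, a run with full marks in every dimension and a single `shell` call scores 100 - 15 = 85 and still passes, while adding a `glob_search` call brings it to 75 and fails.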