Zhihu Hotlist To Excel Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Make sgClaw reliably read Zhihu hotlist data through a Zhihu browser skill and export the collected structured result into a local .xlsx file through an independent Office skill.
Architecture: Keep zeroclaw as the core planner, but stop it from wandering across unrelated tools once a browser-attached skill is selected. The hotlist skill must produce a strict structured artifact, and the Office skill must consume that artifact through a dedicated openxml_office tool that wraps the sibling openxml_cli project. For the first delivery, reuse openxml_cli template render with a bundled .xlsx template instead of inventing a new workbook-construction API.
Tech Stack: Rust, vendored zeroclaw, sgClaw browser pipe, skill packages under /home/zyl/projects/sgClaw/skill_lib, sibling openxml_cli, JSON payload handoff, .xlsx template render, Python/Rust regression tests, real-provider smoke verification.
Scope Guard
- In scope:
  - browser-attached skill execution discipline
  - zhihu-hotlist structured export artifact
  - new office-export-xlsx skill
  - new openxml_office runtime tool
  - end-to-end acceptance for "读取知乎热榜数据,并导出 excel 文件" ("read the Zhihu hotlist data and export an Excel file")
- Out of scope:
  - generic Office authoring platform
  - arbitrary shell-based export flows
  - browser-side file generation as the main export path
  - broad multi-site data export before Zhihu hotlist is stable
Current Findings To Preserve
- Real-provider validation already proved that zhihu-hotlist, zhihu-navigate, and zhihu-write can be selected through read_skill.
- The current failure mode is not "skill missing" but "tool discipline collapse":
  - file_read, glob_search, and shell are attempted after read_skill
  - zhihu-write can fill the title/body but still exceeds the max tool iterations
  - zhihu-navigate succeeds for some intents but still detours through non-browser tools
- The sibling Office project already exists at /home/zyl/projects/sgClaw/openxml_cli. openxml_cli currently exposes capabilities, template inspect, template validate, and template render; it does not yet expose a direct "create workbook from scratch" command.
Final Acceptance Contract
Input:
读取知乎热榜数据,并导出 excel 文件
(English: "Read the Zhihu hotlist data and export an Excel file")
Required behavior:
- sgClaw selects zhihu-hotlist.
- sgClaw gathers hotlist rows through the SuperRPA browser interface only.
- sgClaw converts the result into a structured JSON export payload.
- sgClaw selects office-export-xlsx.
- sgClaw calls openxml_office.
- A local .xlsx file is produced and its path is returned.
Required logs:
- read_skill zhihu-hotlist
- browser actions only: navigate, getText, optionally click
- read_skill office-export-xlsx
- call openxml_office
Forbidden logs during the mainline path:
- call shell
- call glob_search
- call file_read on skill references or skill roots
- docker run
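The required and forbidden log rules can be checked mechanically. The sketch below assumes each log entry is a single line such as "read_skill zhihu-hotlist", "browser navigate", or "call shell ls"; the real sgClaw log format may differ, and check_log_discipline, LogViolation, and the skill_roots parameter are hypothetical names used only for illustration.

```rust
// Hypothetical checker for the acceptance log contract.
// Assumed log shape: one entry per line, e.g. "read_skill zhihu-hotlist",
// "browser getText", "call openxml_office".

/// Entries that must appear, each after the previous one.
const REQUIRED_IN_ORDER: [&str; 3] = [
    "read_skill zhihu-hotlist",
    "read_skill office-export-xlsx",
    "call openxml_office",
];
/// Entries that must never appear on the mainline path.
const FORBIDDEN: [&str; 3] = ["call shell", "call glob_search", "docker run"];
/// The only browser actions allowed during collection.
const BROWSER_ACTIONS: [&str; 3] = ["navigate", "getText", "click"];

#[derive(Debug, PartialEq)]
enum LogViolation {
    Forbidden(String),
    FileReadOnSkillPath(String),
    UnexpectedBrowserAction(String),
    MissingOrOutOfOrder(&'static str),
}

fn check_log_discipline(lines: &[&str], skill_roots: &[&str]) -> Vec<LogViolation> {
    let mut violations = Vec::new();
    for &line in lines {
        if FORBIDDEN.iter().any(|f| line.starts_with(*f)) {
            violations.push(LogViolation::Forbidden(line.to_string()));
        }
        // file_read is only forbidden when it targets skill roots or references.
        if line.starts_with("call file_read") && skill_roots.iter().any(|root| line.contains(*root)) {
            violations.push(LogViolation::FileReadOnSkillPath(line.to_string()));
        }
        if let Some(action) = line.strip_prefix("browser ") {
            if !BROWSER_ACTIONS.contains(&action) {
                violations.push(LogViolation::UnexpectedBrowserAction(line.to_string()));
            }
        }
    }
    // Walk forward so each required entry must follow the one before it.
    let mut cursor = 0;
    for required in REQUIRED_IN_ORDER.iter().copied() {
        match lines[cursor..].iter().position(|l| l.starts_with(required)) {
            Some(offset) => cursor += offset + 1,
            None => violations.push(LogViolation::MissingOrOutOfOrder(required)),
        }
    }
    violations
}
```

An empty result means the run satisfies both lists; the harness in Task 6 could reuse the same rules for its deductions.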
Required Excel content:
- one sheet named 知乎热榜 ("Zhihu Hot List")
- columns: rank, title, heat
- at least 10 hotlist rows
- exported values match the collected rows
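A minimal sketch of the Excel content rules, assuming the workbook has already been read back into plain strings (reading .xlsx itself is out of scope here). The function name check_excel_content and the string-only row representation are illustrative assumptions.

```rust
// Hypothetical check for the "Required Excel content" rules. The caller is
// assumed to have loaded the sheet names, header row, and data rows as strings.
const SHEET_NAME: &str = "知乎热榜";
const COLUMNS: [&str; 3] = ["rank", "title", "heat"];
const MIN_ROWS: usize = 10;

fn check_excel_content(
    sheet_names: &[String],
    header: &[String],
    exported: &[Vec<String>],
    collected: &[Vec<String>],
) -> Result<(), String> {
    // Exactly one sheet, with the expected name.
    if sheet_names.len() != 1 || sheet_names[0] != SHEET_NAME {
        return Err(format!("expected one sheet named {}, got {:?}", SHEET_NAME, sheet_names));
    }
    // Header row must be rank, title, heat in that order.
    if !header.iter().map(String::as_str).eq(COLUMNS) {
        return Err(format!("unexpected header row: {:?}", header));
    }
    if exported.len() < MIN_ROWS {
        return Err(format!("only {} data rows, need at least {}", exported.len(), MIN_ROWS));
    }
    // Exported values must match what the browser phase collected.
    if exported != collected {
        return Err("exported rows differ from the collected hotlist rows".to_string());
    }
    Ok(())
}
```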
Task 1: Lock Browser-Attached Skill Runs To The Right Tools
Files:
- Modify: /home/zyl/projects/sgClaw/claw/tests/compat_runtime_test.rs
- Modify: /home/zyl/projects/sgClaw/claw/src/runtime/engine.rs
- Modify: /home/zyl/projects/sgClaw/claw/src/runtime/tool_policy.rs
- Modify: /home/zyl/projects/sgClaw/claw/src/compat/runtime.rs
Intent:
- Once the task is clearly in a browser-attached Zhihu skill flow, the runtime must stop offering unrelated tools such as shell, glob_search, and arbitrary file_read.
Step 1: Write the failing regression tests
Add focused tests in tests/compat_runtime_test.rs for:
#[test]
fn zhihu_hotlist_skill_flow_does_not_expose_shell_or_glob_tools() {}
#[test]
fn browser_attached_export_flow_exposes_browser_and_office_tools_only() {}
Assertions to include:
- request tool list contains superrpa_browser
- request tool list contains read_skill
- request tool list does not contain shell
- request tool list does not contain glob_search
- request tool list does not contain generic file_read during the constrained browser skill phase
Step 2: Run the focused tests to verify failure
Run:
cargo test --test compat_runtime_test zhihu_hotlist_skill_flow_does_not_expose_shell_or_glob_tools -- --nocapture
cargo test --test compat_runtime_test browser_attached_export_flow_exposes_browser_and_office_tools_only -- --nocapture
Expected:
- fail because current runtime still exposes too many tools in browser-attached mode
Step 3: Implement minimal constrained-tool policy
Implement a browser-skill execution mode that:
- keeps superrpa_browser
- keeps the compatibility alias browser_action
- keeps read_skill
- optionally keeps the new openxml_office tool, only for export tasks
- removes shell, glob_search, and free-form file_read from the allowed tool list for these phases
Do this in src/runtime/engine.rs by deriving a narrower allowed_tools set from:
- runtime profile
- browser surface present flag
- instruction intent
- whether export mode is active
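The derivation above can be sketched as a pure function over the four inputs. RuntimeProfile, Intent, and derive_allowed_tools are hypothetical names, not the existing engine.rs API; real intent detection will be richer than a two-variant enum. The sketch also drops any tool the runtime did not actually register.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-ins for the runtime profile and detected intent.
#[derive(Clone, Copy, PartialEq)]
enum RuntimeProfile {
    General,
    BrowserAttached,
}

#[derive(Clone, Copy, PartialEq)]
enum Intent {
    Other,
    ZhihuSkill,
}

fn derive_allowed_tools(
    registered: &[&'static str],
    profile: RuntimeProfile,
    browser_surface_present: bool,
    intent: Intent,
    export_mode: bool,
) -> BTreeSet<&'static str> {
    let constrained = profile == RuntimeProfile::BrowserAttached
        && browser_surface_present
        && intent == Intent::ZhihuSkill;
    if !constrained {
        // Outside the browser-skill flow, keep today's full tool list.
        return registered.iter().copied().collect();
    }
    // Allow-list: shell, glob_search and file_read are never added here.
    let mut allowed = BTreeSet::from(["superrpa_browser", "browser_action", "read_skill"]);
    if export_mode {
        allowed.insert("openxml_office");
    }
    // Never advertise a tool the runtime did not register.
    allowed.retain(|tool| registered.contains(tool));
    allowed
}
```

Using an allow-list rather than removing known-bad tools means any tool added later stays hidden in browser-skill mode until it is explicitly approved.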
Step 4: Re-run the focused tests
Run the same commands.
Expected:
- both pass
Task 2: Convert Zhihu Hotlist Skill To Structured Output First
Files:
- Modify: /home/zyl/projects/sgClaw/skill_lib/skills/zhihu-hotlist/SKILL.md
- Modify: /home/zyl/projects/sgClaw/claw/tests/skill_lib_validation_test.py
- Modify: /home/zyl/projects/sgClaw/claw/tests/compat_runtime_test.rs
Intent:
- The hotlist skill should stop ending with prose-only summaries. Its primary output must be a stable export artifact the Office skill can consume.
Step 1: Write the failing tests
Add tests that enforce:
- the zhihu-hotlist prompt body contains an explicit Export Artifact section
- the artifact schema includes sheet_name, columns, and rows
- runtime regression checks can find those fields in the skill content when read_skill is used
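The planned validation lives in Python, but the rule itself is small enough to sketch in Rust: find the Export Artifact section and confirm each field appears there as a quoted JSON key, not just anywhere in the prose. This assumes SKILL.md uses markdown "#" headings; section_body and missing_artifact_fields are hypothetical helpers.

```rust
/// Fields the Export Artifact section must mention as JSON keys.
const REQUIRED_FIELDS: [&str; 3] = ["sheet_name", "columns", "rows"];

/// Number of leading '#' characters (0 means "not a heading").
fn heading_level(line: &str) -> usize {
    line.chars().take_while(|&c| c == '#').count()
}

/// Text under the first heading titled `title`, up to the next heading of the
/// same or a higher level. Returns None if no such heading exists.
fn section_body(markdown: &str, title: &str) -> Option<String> {
    let mut lines = markdown.lines();
    let level = loop {
        let line = lines.next()?;
        let hashes = heading_level(line);
        if hashes > 0 && line[hashes..].trim() == title {
            break hashes;
        }
    };
    let body: Vec<&str> = lines
        .take_while(|line| {
            let hashes = heading_level(line);
            hashes == 0 || hashes > level
        })
        .collect();
    Some(body.join("\n"))
}

/// Lists the required fields missing from the Export Artifact section.
fn missing_artifact_fields(skill_md: &str) -> Result<Vec<&'static str>, String> {
    let body = section_body(skill_md, "Export Artifact")
        .ok_or_else(|| "missing Export Artifact section".to_string())?;
    Ok(REQUIRED_FIELDS
        .iter()
        .copied()
        // Look for the quoted key, e.g. "rows", to avoid matching prose.
        .filter(|field| !body.contains(&format!("\"{}\"", field)))
        .collect())
}
```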
Step 2: Run tests to verify failure
Run:
python3 -m unittest tests.skill_lib_validation_test
cargo test --test compat_runtime_test handle_browser_message_executes_real_zhihu_hotlist_skill_flow -- --nocapture
Expected:
- validation fails because the artifact contract is not yet required
Step 3: Update zhihu-hotlist
Add an Export Artifact section that requires this shape:
{
"source": "https://www.zhihu.com/hot",
"sheet_name": "知乎热榜",
"columns": ["rank", "title", "heat"],
"rows": [[1, "标题", "344万"]]
}
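The artifact shape above maps naturally onto a Rust type that the runtime could validate before handing it to the Office skill. This is a sketch, assuming rank is an integer while title and heat stay as display strings (the example values 标题 "title" and 344万 "3.44 million" come from the JSON above); Cell, ExportArtifact, and validate are hypothetical names, and JSON parsing is left out.

```rust
/// One table cell. Assumption: rank is an integer; title and heat are text.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Int(i64),
    Text(String),
}

/// Mirrors the Export Artifact JSON fields.
#[derive(Debug, Clone)]
struct ExportArtifact {
    source: String,
    sheet_name: String,
    columns: Vec<String>,
    rows: Vec<Vec<Cell>>,
}

impl ExportArtifact {
    fn validate(&self) -> Result<(), String> {
        if self.source.trim().is_empty() {
            return Err("source is empty".into());
        }
        if self.sheet_name.trim().is_empty() {
            return Err("sheet_name is empty".into());
        }
        if self.columns.is_empty() || self.rows.is_empty() {
            return Err("columns and rows must be non-empty".into());
        }
        let rank_first = self.columns[0] == "rank";
        for (i, row) in self.rows.iter().enumerate() {
            // Every row needs exactly one cell per column.
            if row.len() != self.columns.len() {
                return Err(format!("row {} has {} cells, expected {}", i, row.len(), self.columns.len()));
            }
            // When the first column is rank, ranks should run 1, 2, 3, ...
            if rank_first && row[0] != Cell::Int(i as i64 + 1) {
                return Err(format!("row {} has rank {:?}, expected {}", i, row[0], i + 1));
            }
        }
        Ok(())
    }
}
```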
Also add hard rules:
- no extra exploratory tools after the browser data is collected
- prose summary is secondary, structured artifact is primary
Step 4: Re-run tests
Expected:
- validation passes
Task 3: Create The Office Export Skill Package
Files:
- Create: /home/zyl/projects/sgClaw/skill_lib/skills/office-export-xlsx/SKILL.md
- Create: /home/zyl/projects/sgClaw/skill_lib/skills/office-export-xlsx/references/export-flow.md
- Create: /home/zyl/projects/sgClaw/skill_lib/skills/office-export-xlsx/assets/zhihu_hotlist_template.xlsx
- Modify: /home/zyl/projects/sgClaw/claw/tests/skill_lib_validation_test.py
Intent:
- Add a fully separate Office skill that knows nothing about browser scraping and only turns structured table data into a local Excel file.
Step 1: Write the failing validation test
Extend tests/skill_lib_validation_test.py so discovery expects:
EXPECTED_SKILL_NAMES = [
"office-export-xlsx",
"zhihu-hotlist",
"zhihu-navigate",
"zhihu-write",
]
Also require the new skill to mention:
- openxml_office
- .xlsx
- sheet_name
- columns
- rows
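For reference, the two new assertions reduce to simple checks, sketched here in Rust (the real test stays in Python). The sketch assumes discovery yields skill names in arbitrary order, so it sorts before comparing; discovery_matches and missing_terms are hypothetical helpers.

```rust
/// Expected skill names, already in sorted order.
const EXPECTED_SKILL_NAMES: [&str; 4] = ["office-export-xlsx", "zhihu-hotlist", "zhihu-navigate", "zhihu-write"];
/// Terms the office-export-xlsx SKILL.md must mention.
const OFFICE_SKILL_TERMS: [&str; 5] = ["openxml_office", ".xlsx", "sheet_name", "columns", "rows"];

/// True when discovery found exactly the expected skills, in any order.
fn discovery_matches(discovered: &[&str]) -> bool {
    let mut sorted = discovered.to_vec();
    sorted.sort_unstable();
    sorted == EXPECTED_SKILL_NAMES
}

/// Required terms that do not appear anywhere in the skill text.
fn missing_terms(skill_md: &str) -> Vec<&'static str> {
    OFFICE_SKILL_TERMS
        .iter()
        .copied()
        .filter(|term| !skill_md.contains(*term))
        .collect()
}
```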
Step 2: Run the validation test to verify failure
Run:
python3 -m unittest tests.skill_lib_validation_test
Expected:
- fail because the new skill package does not exist yet
Step 3: Create the skill package
SKILL.md must define:
- when to use: local Office export from structured rows
- required input schema
- output: exported file path
- tool rule: only call openxml_office; do not use browser tools
export-flow.md must define:
- validate payload shape
- choose output path
- invoke openxml_office
- return the file path and row count
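The four export-flow steps can be sketched as one function in which the openxml_office call is injected as a closure, so the flow is testable without openxml_cli. Payload, ExportResult, run_export_flow, and the fixed zhihu_hotlist file stem are assumptions for illustration, not the skill's actual wording.

```rust
use std::path::{Path, PathBuf};

/// Minimal payload view with the sheet_name / columns / rows fields.
struct Payload {
    sheet_name: String,
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

#[derive(Debug, PartialEq)]
struct ExportResult {
    output_path: PathBuf,
    row_count: usize,
}

/// Assumption: one fixed file name per export, placed in `dir`.
fn choose_output_path(dir: &Path, stem: &str) -> PathBuf {
    dir.join(format!("{}.xlsx", stem))
}

fn run_export_flow<F>(payload: &Payload, out_dir: &Path, invoke_openxml_office: F) -> Result<ExportResult, String>
where
    F: FnOnce(&Payload, &Path) -> Result<PathBuf, String>,
{
    // 1. Validate payload shape.
    if payload.sheet_name.is_empty() || payload.columns.is_empty() || payload.rows.is_empty() {
        return Err("payload needs sheet_name, columns and rows".to_string());
    }
    if payload.rows.iter().any(|r| r.len() != payload.columns.len()) {
        return Err("every row needs one cell per column".to_string());
    }
    // 2. Choose output path.
    let path = choose_output_path(out_dir, "zhihu_hotlist");
    // 3. Invoke the tool; it reports the path it actually wrote.
    let written = invoke_openxml_office(payload, &path)?;
    // 4. Return file path and row count.
    Ok(ExportResult { output_path: written, row_count: payload.rows.len() })
}
```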
The first workbook template should be a fixed zhihu_hotlist_template.xlsx with:
- sheet 知乎热榜
- row 1 headers already present
- table fill anchored to a stable name or placeholder expected by openxml_cli
Step 4: Re-run validation
Expected:
- new skill passes audit
Task 4: Add The openxml_office Runtime Tool
Files:
- Create: /home/zyl/projects/sgClaw/claw/src/compat/openxml_office_tool.rs
- Modify: /home/zyl/projects/sgClaw/claw/src/compat/mod.rs
- Modify: /home/zyl/projects/sgClaw/claw/src/compat/runtime.rs
- Modify: /home/zyl/projects/sgClaw/claw/src/runtime/tool_policy.rs
- Test: /home/zyl/projects/sgClaw/claw/tests/compat_openxml_office_tool_test.rs
Intent:
- Wrap the sibling openxml_cli as a first-class local tool instead of leaking Office export through shell prompting.
Step 1: Write the failing tool test
Create tests/compat_openxml_office_tool_test.rs with cases for:
- capability probe
- render request assembly for xlsx export
- rejection when rows/columns are missing
- stable JSON output containing output_path
Step 2: Run the test to verify failure
Run:
cargo test --test compat_openxml_office_tool_test -- --nocapture
Expected:
- fail because the tool does not exist
Step 3: Implement minimal tool
Tool contract:
{
"action": "export_hotlist_xlsx",
"template_path": ".../zhihu_hotlist_template.xlsx",
"output_path": "/tmp/zhihu_hotlist.xlsx",
"sheet_name": "知乎热榜",
"columns": ["rank", "title", "heat"],
"rows": [[1, "标题", "344万"]]
}
Implementation rules:
- write the payload JSON to a temp file
- invoke the sibling openxml_cli template render --request <file> --json
- return the parsed JSON result and a normalized output_path
- no free-form shell composition from model text
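The "no free-form shell" rule is easiest to honour by building a fixed argument vector and passing it to std::process::Command directly, so model text never becomes a shell string. The subcommand and flags below are the ones this plan names for openxml_cli; the helper names, the temp-file naming, and the path normalization rule (relative paths resolved against a working directory) are assumptions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Writes the already-serialized payload JSON to a file in the temp dir.
/// A real tool should pick a unique name (e.g. include the process id).
fn write_request_file(payload_json: &str, name: &str) -> io::Result<PathBuf> {
    let path = std::env::temp_dir().join(name);
    fs::write(&path, payload_json)?;
    Ok(path)
}

/// Fixed argv for `openxml_cli template render --request <file> --json`.
fn render_args(request_file: &Path) -> Vec<String> {
    vec![
        "template".into(),
        "render".into(),
        "--request".into(),
        request_file.display().to_string(),
        "--json".into(),
    ]
}

/// Builds the process without a shell; the caller runs `.output()` on it.
fn render_command(openxml_cli: &Path, request_file: &Path) -> Command {
    let mut cmd = Command::new(openxml_cli);
    cmd.args(render_args(request_file));
    cmd
}

/// Resolves a reported output_path against the tool's working directory.
fn normalize_output_path(raw: &str, working_dir: &Path) -> PathBuf {
    let p = Path::new(raw.trim());
    if p.is_absolute() {
        p.to_path_buf()
    } else {
        working_dir.join(p)
    }
}
```

Parsing the CLI's JSON reply is omitted here; in the tool it would sit between running the command and calling normalize_output_path.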
Step 4: Re-run the focused tests
Expected:
- pass
Task 5: Wire Export Tasks To Use Two Skills In Sequence
Files:
- Modify: /home/zyl/projects/sgClaw/claw/src/runtime/engine.rs
- Modify: /home/zyl/projects/sgClaw/claw/tests/compat_runtime_test.rs
Intent:
- The single user instruction must naturally flow from hotlist capture into Office export, not end after the first skill.
Step 1: Write the failing runtime test
Add a focused regression test for:
#[test]
fn zhihu_hotlist_export_task_reads_hotlist_skill_then_office_skill() {}
Assertions:
- request stream includes read_skill zhihu-hotlist
- a later request includes read_skill office-export-xlsx
- the office phase exposes openxml_office
- no shell is exposed in the constrained task path
Step 2: Run the test to verify failure
Run:
cargo test --test compat_runtime_test zhihu_hotlist_export_task_reads_hotlist_skill_then_office_skill -- --nocapture
Expected:
- fail because the task currently has no structured handoff to Office export
Step 3: Implement minimal chaining support
Do not add a hard-coded workflow engine.
Minimal implementation:
- strengthen prompt contract so export tasks require structured hotlist artifact
- include openxml_office in the allowed tools for export intent
- keep browser-only tools for the collection phase and the Office-only tool for the export phase
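Without a workflow engine, the phase split can be a two-state switch driven by which skill was last read. This sketch assumes the runtime moves to the export phase when it observes read_skill office-export-xlsx and, following the "Office-only tool" wording, exposes only openxml_office afterwards; ExportPhase and its methods are hypothetical.

```rust
// Hypothetical one-way phase switch: Collection -> Export.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExportPhase {
    Collection,
    Export,
}

impl ExportPhase {
    /// Reading the Office skill moves the task into the export phase;
    /// any other skill read leaves the phase unchanged.
    fn after_read_skill(self, skill: &str) -> ExportPhase {
        if skill == "office-export-xlsx" {
            ExportPhase::Export
        } else {
            self
        }
    }

    /// Tools advertised to the model in each phase.
    fn tools(self) -> &'static [&'static str] {
        match self {
            ExportPhase::Collection => &["superrpa_browser", "browser_action", "read_skill"],
            ExportPhase::Export => &["openxml_office"],
        }
    }
}
```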
Step 4: Re-run the test
Expected:
- pass
Task 6: Add Real Acceptance Harness And Scoring
Files:
- Create: /home/zyl/projects/sgClaw/claw/tools/live_acceptance/run_zhihu_hotlist_excel_acceptance.py
- Create: /home/zyl/projects/sgClaw/claw/docs/acceptance/2026-03-29-zhihu-hotlist-excel.md
Intent:
- Make the final acceptance repeatable with the real user config and a transparent score.
Step 1: Write the script
The script must:
- use /home/zyl/.config/superrpa/Default/superrpa/sgclaw_config.json
- boot the local target/debug/sgclaw
- send one browser submit_task
- respond to browser commands with controlled fixture responses
- capture:
  - loaded skills
  - selected skills
  - forbidden tool calls
  - final summary
  - exported file path
Step 2: Define score rubric
Rubric:
- skill selection: 30
- tool discipline: 25
- hotlist data correctness: 20
- xlsx export success: 20
- final response quality: 5
Automatic deductions:
- shell called: -15
- glob_search called: -10
- file_read on skill references: -10
- wrong skill selected first: -15
- export missing output path: -20
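The rubric arithmetic is sketched below in Rust for consistency with the other sketches; the harness itself is planned as a Python script. The sketch assumes each dimension is capped at its maximum, that deductions come off the total, and that the total never drops below zero. The pass threshold of 85 comes from the Definition Of Done; the struct and function names are hypothetical.

```rust
/// Points awarded per rubric dimension (hypothetical struct).
#[derive(Clone, Copy)]
struct DimensionScores {
    skill_selection: u32,
    tool_discipline: u32,
    data_correctness: u32,
    xlsx_export: u32,
    response_quality: u32,
}

/// Deduction triggers observed in the run (hypothetical struct).
#[derive(Clone, Copy)]
struct Observed {
    shell_called: bool,
    glob_search_called: bool,
    file_read_on_skill_refs: bool,
    wrong_skill_first: bool,
    missing_output_path: bool,
}

/// Maxima in field order: 30 + 25 + 20 + 20 + 5 = 100.
const MAX_POINTS: [u32; 5] = [30, 25, 20, 20, 5];
const PASS_THRESHOLD: u32 = 85;

fn total_score(d: &DimensionScores, o: &Observed) -> u32 {
    let awarded = [d.skill_selection, d.tool_discipline, d.data_correctness, d.xlsx_export, d.response_quality];
    // Cap each dimension at its maximum before summing.
    let base: u32 = awarded.iter().zip(MAX_POINTS.iter()).map(|(a, m)| (*a).min(*m)).sum();
    let deductions: u32 = [
        (o.shell_called, 15u32),
        (o.glob_search_called, 10),
        (o.file_read_on_skill_refs, 10),
        (o.wrong_skill_first, 15),
        (o.missing_output_path, 20),
    ]
    .iter()
    .filter(|&&(hit, _)| hit)
    .map(|&(_, points)| points)
    .sum();
    // Assumption: deductions apply to the total, floored at zero.
    base.saturating_sub(deductions)
}

fn passes(score: u32) -> bool {
    score >= PASS_THRESHOLD
}
```

Under these assumptions, a single shell call on an otherwise perfect run lands exactly on the 85-point threshold.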
Step 3: Run acceptance
Run:
python3 tools/live_acceptance/run_zhihu_hotlist_excel_acceptance.py
Expected:
- prints total score and per-dimension breakdown
- stores final evidence in docs/acceptance/2026-03-29-zhihu-hotlist-excel.md
Delivery Sequence
Execute in this order:
- Task 1: constrain tools
- Task 2: structure hotlist output
- Task 3: add office skill package
- Task 4: add openxml_office
- Task 5: chain the two skills
- Task 6: run acceptance and score
Definition Of Done
- browser-attached hotlist tasks no longer wander into shell, glob_search, or ad-hoc file_read
- office-export-xlsx exists as an independent skill
- openxml_office exists as an explicit tool
- a single user task can collect hotlist data and export .xlsx
- acceptance score is at least 85/100