feat: add initial skill authoring workspace
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
152
BROWSER_SKILL_AUTHORING.md
Normal file
@@ -0,0 +1,152 @@
# Browser Skill Authoring

This note captures the browser-skill authoring rules proven during the live
Zhihu hotlist export debugging on 2026-03-30.

## Why This Exists

The live browser run proved that a skill can be selected correctly, the
runtime can call the right browser-script tool, and the task can still fail if
the skill package does not encode enough deterministic extraction logic.

The concrete failure pattern was:

1. `zhihu-hotlist.extract_hotlist` was called correctly.
2. The packaged script relied on stale DOM classes and returned no rows.
3. The runtime fell back to generic `getText` probing.
4. The user saw selector thrashing instead of a stable extraction path.

This document exists to prevent the same failure pattern in future browser
skills.

## Authoring Rules

### 1. Use a packaged script for structured browser tasks

If the task's primary deliverable is structured data such as rows, fields, or a
stable artifact, the skill should expose a deterministic `browser_script` tool.

Do not rely on prose-only instructions for repeated structured extraction.

### 2. Keep a strict extraction ladder inside the script

For browser extraction skills, the script should try data sources in this order:

1. stable structured page state when available
2. generalized DOM candidates that are broader than one historical classname
3. controlled page-text parsing as the last deterministic fallback

Do not jump straight from one brittle selector family to generic browser
probing.
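
The ladder in rule 2 can be sketched as an ordered list of extractors. This is a minimal Python model of the control flow only, not the packaged browser script. The `page` dict, its `state`, `dom_nodes`, and `text` keys, and every function name are assumptions made for illustration.

```python
from typing import Callable, Optional

Rows = list
Extractor = Callable[[dict], Optional[Rows]]


def from_page_state(page: dict) -> Optional[Rows]:
    """Rung 1: structured page state, if the page exposes any (hypothetical key)."""
    items = page.get("state")
    if not items:
        return None
    return [[i + 1, item["title"], item["heat"]] for i, item in enumerate(items)]


def from_dom_candidates(page: dict) -> Optional[Rows]:
    """Rung 2: nodes matched by broad selectors rather than one classname."""
    nodes = page.get("dom_nodes")
    if not nodes:
        return None
    return [[i + 1, n["title"], n.get("heat", "")] for i, n in enumerate(nodes)]


def from_page_text(page: dict) -> Optional[Rows]:
    """Rung 3: controlled text parsing, one 'title<TAB>heat' pair per line."""
    rows = []
    for line in page.get("text", "").splitlines():
        title, sep, heat = line.partition("\t")
        if sep and title.strip():
            rows.append([len(rows) + 1, title.strip(), heat.strip()])
    return rows or None


# Order matters: the most stable source is always tried first.
LADDER = [
    ("page_state", from_page_state),
    ("dom_candidates", from_dom_candidates),
    ("page_text", from_page_text),
]


def extract(page: dict) -> tuple:
    """Try each rung in order; fail only after every rung is exhausted."""
    for name, extractor in LADDER:
        rows = extractor(page)
        if rows:
            return name, rows
    raise LookupError("unsupported page shape: no rung produced rows")
```

Returning the rung name alongside the rows makes it visible in logs when a skill has silently degraded to the text fallback.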

### 3. Treat generic `getText` probing as a fallback of last resort

The packaged script is the primary deterministic path.

If the script fails, it should fail for a specific reason:

- blocked/login/captcha page
- unsupported page shape
- artifact incomplete

Generic browser wandering should begin only after the packaged script has
exhausted its own deterministic fallbacks.
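
One way to keep script failures specific is to return a typed reason instead of an empty row list. The sketch below is an assumption-laden illustration: the `FailureReason` names mirror the three reasons listed above, while `BLOCK_MARKERS`, `ScriptResult`, and `classify` are hypothetical and not part of any existing runtime.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FailureReason(Enum):
    BLOCKED = "blocked/login/captcha page"
    UNSUPPORTED_SHAPE = "unsupported page shape"
    ARTIFACT_INCOMPLETE = "artifact incomplete"


# Hypothetical marker strings; a real skill would take them from its source flow.
BLOCK_MARKERS = ("captcha", "security verification", "安全验证")


@dataclass
class ScriptResult:
    rows: list = field(default_factory=list)
    failure: Optional[FailureReason] = None


def classify(page_text: str, rows: list, expected: int) -> ScriptResult:
    """Check the blocked state first so it is never reported as 'no rows'."""
    lowered = page_text.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return ScriptResult(failure=FailureReason.BLOCKED)
    if not rows:
        return ScriptResult(failure=FailureReason.UNSUPPORTED_SHAPE)
    if len(rows) < expected:
        # Keep the rows that were captured, but flag the shortfall.
        return ScriptResult(rows=rows, failure=FailureReason.ARTIFACT_INCOMPLETE)
    return ScriptResult(rows=rows)
```

Substring markers are a crude heuristic; the point is the ordering of the checks, not the detection method.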

### 4. Encode blocked-page semantics explicitly

A browser skill must distinguish:

- "the expected data is not present"
- "the page is blocked by login, captcha, or anti-bot state"

When the page is blocked, fail with an explicit message. Do not silently report
"no rows" if the real issue is that the page is not usable.

### 5. Make the structured artifact the primary contract

Upstream collection skills should return the structured artifact as soon as it
is stable.

For example, the Zhihu hotlist flow should produce:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Downstream skills such as Office export or screen rendering should consume that
artifact. They should not recollect source data.
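
A downstream skill can only trust the artifact if its shape is checked before use. The following is a minimal sketch of such a check, assuming `sheet_name`, `columns`, and `rows` are the required keys; the function name `artifact_problems` is hypothetical.

```python
REQUIRED_KEYS = ("sheet_name", "columns", "rows")


def artifact_problems(artifact: dict) -> list:
    """Return contract violations; an empty list means the artifact is usable."""
    missing = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in artifact]
    if missing:
        return missing
    columns, rows = artifact["columns"], artifact["rows"]
    if not isinstance(columns, list) or not columns:
        return ["columns must be a non-empty list"]
    if not isinstance(rows, list):
        return ["rows must be a list"]
    # Every row must line up with the header order defined by columns.
    return [
        f"row {index} does not align with columns"
        for index, row in enumerate(rows)
        if not isinstance(row, list) or len(row) != len(columns)
    ]
```

Returning a list of problems rather than raising on the first one lets the caller report every violation in a single failure message.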

### 6. Stop exploratory browser work after the artifact is stable

Once the primary artifact is complete:

- stop selector exploration
- stop unrelated browser wandering
- hand the artifact to the downstream skill or tool

Do not continue reading random page text after the final rows are already
captured.

### 7. Keep skill boundaries narrow and explicit

Separate the responsibilities:

- navigation skill: reach the destination and verify arrival
- collection skill: extract structured data
- export skill: render `.xlsx`
- presentation skill: render `.html` and `presentation`

Do not mix recollection, export, and presentation logic into one downstream
skill.

### 8. Encode host constraints in every browser skill

Browser skills should restate the SuperRPA host contract:

- use `superrpa_browser` semantics inside the browser host
- `expected_domain` is a bare hostname only
- selectors must be valid CSS selectors
- prefer direct routes before brittle click chains

These rules are not "obvious context". They belong in the skill.

### 9. Verify browser skills at multiple layers

A browser skill is not complete without verification at more than one layer:

1. script-level test for the packaged browser script
2. skill-library validation for package structure
3. runtime integration test proving the skill is actually called
4. live acceptance when a real browser session is available

The 2026-03-30 fix only became trustworthy after all four were aligned.

### 10. Keep logs versioned and skill names explicit

Live debugging is much faster when the runtime logs include:

- runtime version
- protocol version
- loaded skill names with versions
- explicit `call skill.tool` messages

Skill packages should be written assuming those logs are part of the
operability contract.

## Update Checklist

When editing a browser skill, check all of the following:

- Does the skill define a deterministic primary path?
- Does it state when generic probing is allowed?
- Does it distinguish blocked pages from missing data?
- Does it define the primary structured artifact clearly?
- Does it stop downstream skills from recollecting data?
- Does it include verification expectations, not only workflow prose?

If any answer is "no", the skill is still under-specified.
44
README.md
Normal file
@@ -0,0 +1,44 @@
# skill_lib

Dedicated ZeroClaw-style skill library for SGClaw browser workflows.

## Layout

Runtime skill packages live under `skills/<name>/`.

Each package should use this shape:

```text
skills/<skill-name>/
├── SKILL.md
├── references/
├── assets/
└── scripts/ # optional
```

## Package Contract

- Required file: `SKILL.md`
- Repository-standard frontmatter keys: `name`, `description`, `version`, `author`, `tags`
- `description` must be trigger-oriented and concise
- Keep `SKILL.md` focused on workflow and decision rules
- Put long operational detail into `references/`
- Put preserved source artifacts, templates, or snapshots into `assets/`
- Add `scripts/` only when deterministic repeated work should not be re-described in prose
- For browser extraction tasks, prefer a packaged deterministic script before generic browser probing
- Treat the structured artifact as the primary output contract for downstream skills
- Encode blocked/login/captcha failure semantics explicitly instead of collapsing them into "no data"
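
The frontmatter requirement in the contract above can be checked mechanically. This is a small sketch under stated assumptions: it reads only top-level `key:` lines between the leading `---` delimiters and does not parse YAML values, and the helper names are hypothetical rather than taken from the project's validator.

```python
import re

STANDARD_KEYS = ("name", "description", "version", "author", "tags")
_KEY_LINE = re.compile(r"^([A-Za-z_][\w-]*):")


def frontmatter_keys(skill_md: str) -> set:
    """Collect top-level keys from the leading '---' block of a SKILL.md."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        match = _KEY_LINE.match(line)
        if match:
            keys.add(match.group(1))
    return set()  # no closing delimiter: treat the file as having no frontmatter


def missing_keys(skill_md: str) -> list:
    """List the standard keys that the frontmatter does not declare."""
    present = frontmatter_keys(skill_md)
    return [key for key in STANDARD_KEYS if key not in present]
```

A full YAML parser would be needed to validate the values themselves, for example that `tags` is a list.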

## Authoring Guidance

Use [BROWSER_SKILL_AUTHORING.md](./BROWSER_SKILL_AUTHORING.md) when creating or
updating browser-facing skills. It captures the concrete rules learned from the
live Zhihu hotlist extraction failure and recovery.

## Scope

This repository stores skill packages, not Rust runtime code. Runtime dispatch, browser transport, and persistence implementations stay in their original source repositories.

## Verification

See [VERIFY.md](./VERIFY.md) for the repository-level validation checklist and expected structure checks.
60
VERIFY.md
Normal file
@@ -0,0 +1,60 @@
# VERIFY

Use this checklist when validating the `skill_lib` repository.

## Structural Checks

- All runtime packages live under `skill_lib/skills/`.
- Each skill package contains a maintained `SKILL.md`, optionally alongside a `SKILL.toml`.
- Each current skill package contains a `references/` directory.
- Each current skill package contains an `assets/` directory.
- Long operational detail lives in `references/`, not only in `SKILL.md`.
- Preserved remote source artifacts live in `assets/`.
- Browser extraction skills that promise structured output use `scripts/` when a deterministic script is required.
- Browser skills define blocked/login/captcha behavior explicitly instead of reporting only "no data".
- Downstream export/presentation skills consume upstream artifacts instead of recollecting browser data.

## Repository Scope Checks

- This repository does not contain Rust runtime dispatch code.
- This repository does not contain browser transport implementation code.
- This repository does not copy the original remote Rust modules into the skill packages.
- The repository is a skill description library, not a runtime executable.

## Package Inventory

Current packages:

- `zhihu-navigate`
- `zhihu-write`
- `zhihu-hotlist`
- `zhihu-hotlist-screen`
- `office-export-xlsx`

## Recommended Commands

```bash
find /home/zyl/projects/sgClaw/skill_lib/skills -mindepth 2 -maxdepth 2 -name SKILL.md | sort
find /home/zyl/projects/sgClaw/skill_lib/skills -type d \( -name references -o -name assets \) | sort
rg -n "^name: |^description: |^version: |^author: |^tags:" /home/zyl/projects/sgClaw/skill_lib/skills/*/SKILL.md
python3 /home/zyl/projects/sgClaw/claw/scripts/validate_skill_lib.py
```

From `/home/zyl/projects/sgClaw/claw`, run the project-local unit test suite:

```bash
python3 -m unittest tests.skill_lib_validation_test -v
python3 -m unittest tests.skill_script_hotlist_extractor_test -v
```

## Expected Outcome

- Exactly five active skill packages exist.
- Each package has both `references/` and `assets/`.
- Each maintained skill document exposes the standardized frontmatter keys used by this repository.
- The project-local validator reports `PASS` for all active packages.
- Browser-script packages pass their dedicated script-level regression tests.
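
The structural part of these expectations could be checked with a few lines of Python. The sketch below is illustrative only: it is not the project's `validate_skill_lib.py`, whose contents are not shown here, and the function name `package_problems` is an assumption.

```python
from pathlib import Path


def package_problems(skills_root: Path) -> dict:
    """Map each package name to the required entries it is missing."""
    problems = {}
    for package in sorted(p for p in skills_root.iterdir() if p.is_dir()):
        missing = []
        if not (package / "SKILL.md").is_file():
            missing.append("SKILL.md")
        for sub in ("references", "assets"):
            if not (package / sub).is_dir():
                missing.append(sub + "/")
        if missing:
            problems[package.name] = missing
    return problems
```

An empty result means every package directory has `SKILL.md`, `references/`, and `assets/`. It says nothing about frontmatter or script-level behavior, which the other checks cover.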

## Known Caveat

- `skills/zhihu-hotlist/assets/zhihu_hotlist_flow.source.json` was reconstructed from an earlier successful source fetch because the remote endpoint timed out during the final implementation batch. Its content matches the previously captured remote flow used in analysis.
84
skill_inventory.md
Normal file
@@ -0,0 +1,84 @@
# Skill Inventory

This library migrates Zhihu browser capability descriptions from the remote source repository:

- Repo: `https://gitea.fljx.top/admin/skill-lib`
- Source modules: `src/skill/*.rs`
- Source resources: `resources/skills/*.json`

## Migration Targets

| Target skill | Current remote capability | Source modules | Source resources |
| --- | --- | --- | --- |
| `zhihu-navigate` | `zhihu_navigate` | `src/skill/zhihu_navigation.rs` | `resources/skills/zhihu_navigation_pages.json` |
| `zhihu-write` | `zhihu_write` | `src/skill/zhihu.rs` | `resources/skills/zhihu_write_flow.json` |
| `zhihu-hotlist` | `zhihu_hotlist_collect` + `zhihu_hotlist_report` | `src/skill/zhihu_hotlist.rs`, `src/skill/zhihu_hotlist_store.rs` | `resources/skills/zhihu_hotlist_flow.json` |
| `zhihu-hotlist-screen` | downstream leadership/demo presentation flow derived from hotlist artifact | local downstream skill only | local template asset |
| `office-export-xlsx` | downstream Office export flow derived from structured artifact | local downstream skill only | local export-flow reference |

## Explicit Non-Goals

This migration does not port:

- Rust router dispatch from `src/skill/router.rs`
- Browser pipe transport and runtime execution code
- Snapshot persistence implementation details as executable code

The new repository is a skill library, not a Rust runtime.

## Source Notes Per Skill

### zhihu-navigate

- Source module: `src/skill/zhihu_navigation.rs`
- Source catalog: `resources/skills/zhihu_navigation_pages.json`
- Current model: route/component/flow/target
- Observed scope: `13` routes, `53` components, `16` flows, `69` targets
- Important risks:
  - natural-language alias routing drops ambiguous matches instead of explaining them
  - confirmed alias collisions: `关注分栏`, `回答排序菜单`
  - some selectors are stable (`href`, `aria-label`, `data-testid`), but some remain brittle or overly generic

### zhihu-write

- Source module: `src/skill/zhihu.rs`
- Source flow: `resources/skills/zhihu_write_flow.json`
- Current model: JSON-driven browser action sequence
- Important risks:
  - publish flow depends on brittle selectors and placeholder text
  - successful publish depends on URL capture plus title verification
  - article publishing should require an explicit human confirmation gate in skill form

### zhihu-hotlist

- Source modules:
  - `src/skill/zhihu_hotlist.rs`
  - `src/skill/zhihu_hotlist_store.rs`
- Source flow: `resources/skills/zhihu_hotlist_flow.json`
- Current model: collect hotlist page, scrape detail comments, aggregate counts, persist snapshot, render report
- Important risks:
  - comment collection can partially fail while the run still returns success
  - data completeness depends on current page DOM and comment visibility
  - reports need explicit caution language when metrics are partial or missing
  - extraction quality collapses if the packaged script depends on one stale DOM classname family

### zhihu-hotlist-screen

- Current model: downstream-only dashboard rendering from collected rows
- Important risks:
  - downstream skill may accidentally recollect browser data if its boundary is not explicit
  - presentation skill must preserve partial-data status from upstream artifact

### office-export-xlsx

- Current model: downstream-only `.xlsx` export from structured rows
- Important risks:
  - export skill may hide upstream partial-data status if the artifact contract is weak
  - Office export must not pull browser data directly

## Authoring Lessons From Live Browser Verification

- Structured browser tasks need packaged deterministic scripts, not prose-only guidance.
- Browser scripts should prefer stable data sources, then broader DOM candidates, then controlled text fallback.
- Login/captcha/blocked pages must be surfaced as explicit failure states.
- Once the primary artifact is stable, downstream skills should consume it and stop browser wandering.
80
skills/office-export-xlsx/SKILL.md
Normal file
@@ -0,0 +1,80 @@
---
name: office-export-xlsx
description: Use when the user wants to export structured table data into a local .xlsx file through the sgClaw Office pipeline.
version: 0.1.0
author: sgclaw
tags:
  - office
  - xlsx
  - export
---

# Office Export Xlsx

Convert a structured table artifact into a local `.xlsx` file. This skill is Office-only: it does not navigate websites, inspect browser DOM, or collect source data. It consumes already prepared rows and hands them to `openxml_office`.

## When to Use

- The user asks to export collected data as Excel.
- An upstream skill already produced `sheet_name`, `columns`, and `rows`.
- The task needs a local workbook path as the final deliverable.

Do not use this skill for:

- browser navigation
- data collection from live pages
- free-form spreadsheet editing without a structured table artifact

## Required Input Artifact

The upstream skill must provide this structure:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Rules:

- `sheet_name` is the target workbook sheet name.
- `columns` is the exact output header order.
- `rows` is the ordered row list to export.
- Do not invent missing rows.
- If the artifact is partial, preserve that status in the final result instead of hiding it.
- If the upstream artifact was blocked by login/verification or another browser-side issue, do not continue export as if the data were complete.

## Tool Contract

- Call `openxml_office` to render the final `.xlsx`.
- Do not use browser tools in this skill.
- Do not use `shell` as the main export path.
- Return the final local `.xlsx` output path after `openxml_office` succeeds.

## Workflow

1. Validate that `sheet_name`, `columns`, and `rows` are present.
2. Choose the workbook template and output path.
3. Pass the structured payload to `openxml_office`.
4. Return the output workbook path plus a short export summary.

Do not:

- recollect browser data from the source page
- reformat the structured artifact into prose before export
- continue other browser exploration after the workbook path is available

## References

- Use [export-flow.md](references/export-flow.md) for the exact export sequence.
- Template assets for this skill belong under `assets/`.

## Common Mistakes

- Mixing browser collection steps into the Office phase.
- Reformatting the rows into prose before export.
- Dropping the header order defined by `columns`.
- Returning success without the generated local file path.
7
skills/office-export-xlsx/assets/README.md
Normal file
@@ -0,0 +1,7 @@
# Assets

This directory stores workbook templates used by `office-export-xlsx`.

Planned first template:

- `zhihu_hotlist_template.xlsx`
40
skills/office-export-xlsx/references/export-flow.md
Normal file
@@ -0,0 +1,40 @@
# Export Flow

This skill assumes an upstream structured artifact already exists.

## Input Contract

Required keys:

- `sheet_name`
- `columns`
- `rows`

Recommended keys:

- `source`
- `partial`
- `notes`

## Export Sequence

1. Confirm the task is export-only.
2. Validate that `sheet_name` is non-empty.
3. Validate that `columns` is a non-empty list.
4. Validate that `rows` is a list of row arrays aligned to `columns`.
5. Call `openxml_office` with:
   - template path
   - output `.xlsx` path
   - `sheet_name`
   - `columns`
   - `rows`
6. Return the generated workbook path and any partial-data warning.

## Output Contract

Return:

- output file path
- sheet name
- row count
- whether the source artifact was partial
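
The validation steps and the output contract can be sketched together in Python. This is a minimal illustration, not the `openxml_office` interface: the request key names `template_path` and `output_path`, the summary key names, and the function name `plan_export` are all assumptions.

```python
def plan_export(artifact: dict, template_path: str, output_path: str) -> tuple:
    """Validate the artifact, then build the tool request and result summary."""
    sheet_name = artifact.get("sheet_name")
    columns = artifact.get("columns")
    rows = artifact.get("rows")
    if not isinstance(sheet_name, str) or not sheet_name.strip():
        raise ValueError("sheet_name must be non-empty")
    if not isinstance(columns, list) or not columns:
        raise ValueError("columns must be a non-empty list")
    if not isinstance(rows, list) or any(
        not isinstance(row, list) or len(row) != len(columns) for row in rows
    ):
        raise ValueError("rows must be a list of row arrays aligned to columns")
    request = {
        "template_path": template_path,  # hypothetical argument name
        "output_path": output_path,      # hypothetical argument name
        "sheet_name": sheet_name,
        "columns": columns,
        "rows": rows,
    }
    summary = {
        "output_path": output_path,
        "sheet_name": sheet_name,
        "row_count": len(rows),
        # Carry the upstream partial flag through instead of hiding it.
        "partial": bool(artifact.get("partial", False)),
    }
    return request, summary
```

Building the summary from the same validated values as the request keeps the reported row count and partial status consistent with what was actually exported.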
103
skills/zhihu-hotlist-screen/SKILL.md
Normal file
@@ -0,0 +1,103 @@
---
name: zhihu-hotlist-screen
description: Use when the user wants to turn collected Zhihu hotlist data into an ECharts dashboard, leadership demo screen, or new-tab HTML presentation artifact.
version: 0.1.0
author: sgclaw
tags:
  - zhihu
  - browser
  - hotlist
  - echarts
  - dashboard
---

# Zhihu Hotlist Screen

Convert an already collected Zhihu hotlist artifact into a local ECharts `.html` dashboard for leadership demos. This skill is downstream-only: it consumes the structured artifact from `zhihu-hotlist` and hands it to `screen_html_export`.

## When to Use

- The user asks for a Zhihu hotlist big screen (知乎热榜大屏), board (看板), dashboard, or ECharts demo page (演示页).
- The user wants a leadership demo artifact that can be opened in a new tab (新标签页).
- An upstream collection step already produced ordered hotlist rows.
- The final deliverable should be a local `.html` page instead of Excel.

Do not use this skill for:

- recollecting Zhihu data from scratch when no hotlist artifact exists
- replacing `zhihu-hotlist` as the browser collection skill
- exporting Office files such as `.xlsx`

## Upstream Browser Contract

- Upstream Zhihu collection still belongs to `zhihu-hotlist`, which uses `superrpa_browser` inside the sgClaw browser host.
- Upstream browser calls must keep `expected_domain` as a bare hostname such as `www.zhihu.com`.
- Upstream selectors must remain valid CSS selectors only.
- This skill does not invent new browser steps. It transforms the collected artifact after upstream browser work is stable.

## Required Input Artifact

Use the ordered artifact produced by `zhihu-hotlist`:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Rules:

- `rows` is the primary payload for this skill.
- Preserve the captured ranking order.
- Do not invent extra rows that were not collected upstream.
- If the upstream artifact is partial, keep that status visible in the final summary.

## Tool Contract

- Call `screen_html_export` to render the final ECharts dashboard.
- The tool result must include a local `.html` path.
- The tool result must include a `presentation` object for new-tab delivery.
- Prefer `presentation.url` as the browser-open target.
- Do not use `shell` as the primary rendering path.

## Workflow

1. Confirm that an upstream hotlist artifact with ordered `rows` already exists.
2. If the upstream artifact is missing, incomplete, or blocked by a login/verification condition, stop and report that upstream collection must be fixed first.
3. Load [render-flow.md](references/render-flow.md) and prepare the dashboard payload.
4. Call `screen_html_export`.
5. Return the final local `.html` path plus the `presentation` object.
6. State explicitly that the final presentation should open `presentation.url` in a new tab.

Do not:

- recollect Zhihu browser data inside this downstream skill
- restart browser probing after the upstream artifact is already stable
- hide upstream partial-data or blocked-page status in the final dashboard summary

## Output

Return a concise result with:

- artifact type: `echarts_dashboard`
- source snapshot identifier
- local `.html` output path
- `presentation` object
- whether the data is complete or partial
- optional short demo summary after the artifact fields

## References

- Use [render-flow.md](references/render-flow.md) for the exact downstream handoff.
- Template assets for this skill belong under `assets/`.
- The base screen template is `assets/zhihu-hotlist-echarts.html`.

## Common Mistakes

- Recollecting Zhihu data inside this downstream presentation skill.
- Returning only prose instead of the local `.html` path.
- Dropping the `presentation` contract needed for new-tab display.
- Mixing Excel export requirements into this dashboard skill.
2444
skills/zhihu-hotlist-screen/assets/zhihu-hotlist-echarts.html
Normal file
File diff suppressed because it is too large
44
skills/zhihu-hotlist-screen/references/render-flow.md
Normal file
@@ -0,0 +1,44 @@
# Render Flow

Use this flow after `zhihu-hotlist` has already produced ordered hotlist rows.

1. Keep the upstream artifact as the source of truth:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

2. Call `screen_html_export` with at least:

```json
{
  "snapshot_id": "zhihu-hotlist-20260329",
  "generated_at_ms": 1774713600000,
  "rows": [[1, "标题", "344万"]]
}
```

3. Expect the tool result to include:

```json
{
  "output_path": "/abs/path/zhihu-hotlist-screen.html",
  "presentation": {
    "mode": "new_tab",
    "title": "知乎热榜主题分类分析大屏",
    "url": "file:///abs/path/zhihu-hotlist-screen.html",
    "open_in_new_tab": true
  }
}
```

4. Final delivery rules:

- The local `.html` path is the generated artifact.
- `presentation.url` is the browser-open target.
- Final wording should say the dashboard is ready to be shown in a new tab (新标签页展示).
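
The handoff in this flow can be sketched as two small helpers: one builds the request from the upstream artifact, the other checks the tool result and picks the browser-open target. This is a hedged illustration that only mirrors the JSON shapes shown in this file. The function names are hypothetical, and the code does not call `screen_html_export`.

```python
def build_screen_request(artifact: dict, snapshot_id: str, generated_at_ms: int) -> dict:
    """Build the minimal request from the upstream artifact without altering rows."""
    rows = artifact.get("rows")
    if not rows:
        raise ValueError("upstream artifact has no rows; fix collection first")
    return {
        "snapshot_id": snapshot_id,
        "generated_at_ms": generated_at_ms,
        "rows": rows,
    }


def open_target(result: dict) -> str:
    """Return presentation.url after checking the required result fields."""
    output_path = result.get("output_path", "")
    if not output_path.endswith(".html"):
        raise ValueError("tool result lacks a local .html output_path")
    presentation = result.get("presentation")
    if not isinstance(presentation, dict) or not presentation.get("url"):
        raise ValueError("tool result lacks presentation.url")
    return presentation["url"]
```

Failing on a missing `presentation` object, rather than falling back to the file path, keeps the new-tab delivery contract explicit.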
113
skills/zhihu-hotlist/SKILL.md
Normal file
@@ -0,0 +1,113 @@
---
name: zhihu-hotlist
description: Use when the user wants to collect, snapshot, summarize, or report Zhihu hot list items and related comment metrics from browser-visible page data.
version: 0.1.0
author: sgclaw
tags:
  - zhihu
  - browser
  - hotlist
---

# Zhihu Hotlist

Collect Zhihu hot list items, optionally collect visible comment metrics from each item’s detail page, and render a compact report from the resulting snapshot. Use this skill for hotlist collection and reporting, not for article editing or general Zhihu navigation.

## When to Use

- The user asks to collect Zhihu hot list data.
- The user asks for a snapshot, ranking summary, or report of current Zhihu hot list items.
- The user wants visible comment metrics such as replies, upvotes, favorites, or heart counts from hot items.
- The task needs a structured report from an existing or newly captured snapshot.

Do not use this skill for:

- arbitrary Zhihu page navigation without hotlist collection
- writing or publishing Zhihu articles
- claiming complete data quality when comment collection partially fails

## Workflow

1. Decide whether the task is a collection run, a report run, or both.
2. For collection runs, call the packaged browser script tool `zhihu-hotlist.extract_hotlist` before any generic `getText` probing.
3. For collection rules and guard conditions, follow [collection-flow.md](references/collection-flow.md).
4. Inside the packaged script, prefer stable structured page state first, then broader DOM candidates, then controlled page-text fallback.
5. Produce the `Export Artifact` immediately after the browser data is stable.
6. If the page is blocked by login, captcha, or anti-bot state, fail explicitly instead of collapsing the issue into "no rows".
7. Surface partial failures explicitly instead of hiding them behind a success summary.
8. For report runs, format output using [report-format.md](references/report-format.md).
9. Apply the caution rules in [data-quality.md](references/data-quality.md) whenever metrics are partial, missing, or inferred from fragile selectors.

## SuperRPA Interface Contract

- Inside the sgClaw browser host, prefer `superrpa_browser` for Zhihu page actions. `browser_action` is only the compatibility alias.
- Always pass `expected_domain` as the bare hostname only, for example `www.zhihu.com`.
- All selectors must be valid CSS selectors because the host executes `document.querySelector(...)`.
- Never use XPath or jQuery-style pseudo-selectors such as `:contains(...)`.
- Prefer canonical route navigation such as `https://www.zhihu.com/hot` before fallback click chains.
- The primary deterministic extractor is the packaged browser script tool `zhihu-hotlist.extract_hotlist`.
- Use generic `getText` only as a last-resort fallback when the packaged extractor fails.
- Do not keep thrashing through selector variants once the packaged script has already produced the structured artifact.
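
Two of the contract rules above lend themselves to a cheap pre-flight check before a browser call is issued. The sketch below is a heuristic under stated assumptions: it is not a CSS parser and not part of the host, and the helper names are hypothetical. It rejects values that are obviously not bare hostnames and selectors that look like XPath or use jQuery-only pseudo-selectors.

```python
import re
from typing import Optional

# Dot-separated labels only: no scheme, port, path, or query.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_BARE_HOSTNAME = re.compile(rf"{_LABEL}(?:\.{_LABEL})+")
# Pseudo-selectors that jQuery accepts but document.querySelector does not.
_JQUERY_PSEUDO = re.compile(r":(?:contains|eq)\(")


def is_bare_hostname(value: str) -> bool:
    """True for values such as 'www.zhihu.com', False for URLs or host:port."""
    return _BARE_HOSTNAME.fullmatch(value) is not None


def selector_problem(selector: str) -> Optional[str]:
    """Return a short reason if the selector is clearly not plain CSS."""
    stripped = selector.lstrip()
    if stripped.startswith(("/", "./", "(")):
        return "looks like XPath"
    if _JQUERY_PSEUDO.search(stripped):
        return "jQuery-style pseudo-selector"
    return None
```

A selector that passes this check can still be invalid CSS; the check only catches the two mistakes named in the contract.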

## Partial-Failure Rule

- If hotlist items are captured but some comment-metric collections fail, report the run as partial.
- Include how many items lacked comment metrics.
- Do not phrase the result as fully complete when `partial_items > 0`.
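
The partial-failure rule reduces to a count and a status flag. A minimal sketch follows, assuming each collected item is a dict whose `comment_metrics` value is `None` when collection failed. That key and the function name are hypothetical; `partial_items` is the counter named in the rule above.

```python
def comment_metric_status(items: list) -> dict:
    """Summarize how many items lack comment metrics and label the run."""
    partial_items = sum(1 for item in items if item.get("comment_metrics") is None)
    return {
        "item_count": len(items),
        "partial_items": partial_items,
        # Never report "complete" while any item is missing its metrics.
        "status": "partial" if partial_items > 0 else "complete",
    }
```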

## Blocked-Page Rule

- If Zhihu responds with a login wall, captcha, security verification page, or anti-bot interstitial, state that condition explicitly.
- Do not misreport those states as ordinary "empty hotlist" outcomes.

## Export Artifact

The primary output of this skill is a structured artifact for downstream Office export. The structured artifact is primary. Any prose summary is secondary.

Return this shape as soon as hotlist collection is complete:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Rules:

- `sheet_name` must be exactly `知乎热榜`.
- `columns` must remain `["rank", "title", "heat"]`.
- `rows` must preserve the collected ranking order from the page.
- Each row must contain exactly three values: numeric rank, title text, and heat text.
- If fewer than the requested rows are visible, return the visible rows and mark the result as partial.
- After the artifact is complete, stop exploratory tool use and do not resume browser wandering.
- Do not switch to `shell`, `glob_search`, or unrelated file browsing once the hotlist rows are collected.
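
The artifact rules above can be enforced at the point where rows are packaged. The following sketch is illustrative: the function name `build_hotlist_artifact` is hypothetical, and marking the shortfall with a `partial` key is an assumption about how the partial status is carried.

```python
def build_hotlist_artifact(rows: list, top_n: int) -> dict:
    """Package collected rows, enforcing the fixed sheet name and column order."""
    for row in rows:
        if len(row) != 3:
            raise ValueError("each row needs exactly rank, title, heat")
        rank, title, heat = row
        if isinstance(rank, bool) or not isinstance(rank, int):
            raise ValueError("rank must be numeric")
        if not isinstance(title, str) or not isinstance(heat, str):
            raise ValueError("title and heat must be text")
    kept = rows[:top_n]  # slicing preserves the page's ranking order
    return {
        "source": "https://www.zhihu.com/hot",
        "sheet_name": "知乎热榜",
        "columns": ["rank", "title", "heat"],
        "rows": kept,
        # Fewer visible rows than requested: return them, but flag it.
        "partial": len(kept) < top_n,
    }
```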

## Output

Return a concise result with:

- operation type: `collect` or `report`
- requested `top_n`
- snapshot identifier when available
- item count
- whether comment metrics are complete or partial
- any missing or weak data areas
- the `Export Artifact` block shown above
- an optional short prose summary only after the artifact

## References

- Use [collection-flow.md](references/collection-flow.md) for browser-side collection steps.
- Use [report-format.md](references/report-format.md) for report rendering.
- Use [data-quality.md](references/data-quality.md) before making claims about completeness.
- Use `assets/zhihu_hotlist_flow.source.json` for exact selectors and guard text from the source flow.

## Common Mistakes

- Treating visible hotlist capture as equivalent to complete comment-metric capture.
- Forgetting that report mode can use an existing snapshot instead of recollecting.
- Ignoring weak selectors and generic button captures in comment areas.
- Reporting zeros as if they were confirmed values when the DOM capture may be incomplete.
19
skills/zhihu-hotlist/SKILL.toml
Normal file
@@ -0,0 +1,19 @@
[skill]
name = "zhihu-hotlist"
description = "Use when the user wants to collect, snapshot, summarize, or export Zhihu hot list items from the current browser page."
version = "0.1.0"
author = "sgclaw"
tags = ["zhihu", "browser", "hotlist"]

prompts = [
  "For live Zhihu hotlist extraction, call zhihu-hotlist.extract_hotlist before generic browser getText probing.",
]

[[tools]]
name = "extract_hotlist"
description = "Primary deterministic extractor for Zhihu hotlist rows on the current page. Use this before generic browser getText probing."
kind = "browser_script"
command = "scripts/extract_hotlist.js"

[tools.args]
top_n = "Maximum number of hotlist rows to return."
20
skills/zhihu-hotlist/assets/zhihu_hotlist_flow.source.json
Normal file
@@ -0,0 +1,20 @@
{
  "hotlist_url": "https://www.zhihu.com/hot",
  "domains": {
    "zhihu": "www.zhihu.com"
  },
  "literals": {
    "hotlist_guard": "热榜"
  },
  "selectors": {
    "hotlist_root": "main, body",
    "hotlist_item": ".HotList-item, [data-hot-item], section ol li",
    "hotlist_title_link": ".HotList-item-title a, h2 a, .ContentItem-title a",
    "hotlist_summary": ".HotList-item-summary, .HotItem-content, .RichContent-inner, .ContentItem-excerpt",
    "hotlist_heat": ".HotList-item-heat, .HotItem-metrics, .HotItem-hot",
    "comment_list": ".Comments-list, .CommentListV2, [data-testid='comment-list'], .CommentList",
    "comment_item": ".Comments-list > .CommentItem, .CommentListV2 > .CommentItem, .CommentItemV2, .CommentItem",
    "comment_metric": ".CommentItem-metric, .CommentItem-footer button, .ContentItem-actions button, button"
  }
}
68
skills/zhihu-hotlist/references/collection-flow.md
Normal file
@@ -0,0 +1,68 @@
# Collection Flow

This skill uses the preserved source flow in `assets/zhihu_hotlist_flow.source.json`.

## Source Model

The source implementation does four things:

1. ensure the browser is on the hotlist page
2. capture hotlist HTML
3. extract the top N items from the page
4. visit each item detail page and try to collect visible comment metrics

## Hotlist Page Detection

- Preferred page URL: `https://www.zhihu.com/hot`
- Domain: `www.zhihu.com`
- Guard text: `热榜`

The source flow first probes the current page for the guard text before deciding whether it must navigate.
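The detection step above can be sketched as a pure function. This is an illustration under assumptions, not the source flow itself: `resolveHotlistNavigation` is a hypothetical name, and combining the domain check with the guard-text probe is an assumption about how the two signals are used together.

```javascript
// Values copied from assets/zhihu_hotlist_flow.source.json.
const HOTLIST_URL = 'https://www.zhihu.com/hot';
const ZHIHU_DOMAIN = 'www.zhihu.com';
const HOTLIST_GUARD = '热榜';

// Hypothetical helper: returns the URL to navigate to, or null when the
// current page already looks like the hotlist and no navigation is needed.
function resolveHotlistNavigation(currentUrl, pageText) {
  let hostname = '';
  try {
    hostname = new URL(currentUrl).hostname;
  } catch (error) {
    hostname = ''; // Unparseable URL: treat as "not on Zhihu".
  }
  const onZhihu = hostname === ZHIHU_DOMAIN;
  const hasGuard = String(pageText || '').includes(HOTLIST_GUARD);
  return onZhihu && hasGuard ? null : HOTLIST_URL;
}
```

Returning the target URL instead of a boolean keeps the caller from hard-coding the route a second time.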
## Hotlist Extraction

The source selectors look for:

- hotlist root
- hotlist item
- title link
- summary
- heat text

If the page HTML is empty or exposes no items, the collection should be treated as failed.

## Comment Metric Collection

For each hot item:

1. navigate to the item detail page
2. wait for page root
3. scroll toward comments
4. wait for comment list
5. scroll comment list into view
6. capture page HTML
7. parse visible metrics from comment items

## Parsed Metrics

The source collector tries to extract:

- reply count
- upvote count
- favorite count
- heart count

It also preserves unmatched numeric metrics as raw metric fields when possible.

## Count Parsing

The source parser recognizes compact counts such as:

- plain integers
- `万`
- `亿`
- `k`
- `m`

Compact display text is rounded by the page, so treat counts parsed from it as approximations when summarizing.
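As a rough illustration of the count parsing described above, the sketch below expands a compact count into a number and flags it as approximate when a unit suffix was present. `parseCompactCount` is a hypothetical helper, not the source parser; `万` (10^4) and `亿` (10^8) are standard, while reading `k` as 10^3 and `m` as 10^6 is an assumption about the source's intent.

```javascript
// Multipliers for compact count suffixes. k/m values are assumed.
const UNIT_MULTIPLIERS = { '万': 1e4, '亿': 1e8, k: 1e3, m: 1e6 };

// Hypothetical helper: returns { value, approximate } or null when the text
// is not a recognizable count.
function parseCompactCount(text) {
  const match = String(text || '')
    .trim()
    .match(/^(\d+(?:\.\d+)?)\s*(万|亿|k|m)?$/i);
  if (!match) {
    return null;
  }
  const unit = (match[2] || '').toLowerCase();
  const multiplier = unit ? UNIT_MULTIPLIERS[unit] : 1;
  return {
    value: Math.round(Number(match[1]) * multiplier),
    // A unit suffix means the page already rounded the figure.
    approximate: Boolean(unit),
  };
}
```

Carrying the `approximate` flag alongside the value makes it harder to present a rounded display figure as an exact statistic later in a report.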
46
skills/zhihu-hotlist/references/data-quality.md
Normal file
@@ -0,0 +1,46 @@
# Data Quality

This skill can return useful partial data, but it must not overclaim completeness.

## Main Quality Risks

- comment areas may not load for every hot item
- the DOM may expose only visible comments, not the full set
- generic selectors may match the wrong footer controls
- compact text counts can be parsed but still reflect display approximations

## Partial Success Rule

The source implementation tracks partial item failures during comment collection. If some detail pages fail but the run still returns a snapshot:

- report the run as partial
- include how many items were missing comment metrics
- keep the successful hotlist capture separate from comment-metric completeness
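One way to keep hotlist capture and comment-metric completeness separate is to compute them as distinct fields. The sketch below is an assumption-laden illustration: `summarizeCompleteness` and the per-item flag `metrics_collected` are hypothetical names, not fields confirmed by the source store.

```javascript
// Hypothetical summary of a run. `metrics_collected` is an assumed per-item
// boolean set to false when the detail-page collection failed.
function summarizeCompleteness(items) {
  const partialItems = items.filter((item) => item.metrics_collected !== true).length;
  return {
    // Hotlist capture is reported on its own.
    item_count: items.length,
    // Comment-metric completeness is reported separately.
    partial_items: partialItems,
    comment_metrics: partialItems > 0 ? 'partial' : 'complete',
  };
}
```

With this shape, a run that captured every hotlist row but missed some detail pages still reports its full `item_count` while being labeled `partial`.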
## Snapshot Caveats

The source store design keeps:

- `snapshot_id`
- capture timestamp
- page URL
- collector version
- item list
- collection stats

This is enough for reproducible reporting, but it does not guarantee that every metric field was fully captured.

## Recommended Caution Language

Use wording like:

- `热榜列表已采集,评论指标为部分完成。` (The hotlist has been collected; comment metrics are partially complete.)
- `报告基于最新快照生成,部分条目缺少评论指标。` (The report is generated from the latest snapshot; some items lack comment metrics.)
- `数字来自页面可见指标,可能低于完整站内统计。` (Figures come from metrics visible on the page and may be lower than full site statistics.)

Avoid wording like:

- `全部评论指标已准确采集` (All comment metrics were collected accurately)
- `完整真实热度` (Complete, true heat)
- `无缺失` (Nothing missing)
41
skills/zhihu-hotlist/references/report-format.md
Normal file
@@ -0,0 +1,41 @@
# Report Format

The source report mode renders a compact text report from a snapshot.

## Header Line

Use this structure:

```text
知乎热榜报告 <snapshot_id>: 共 <item_count> 条,采集于 <captured_at_ms>
```

## Per-Item Line

Use this structure:

```text
<rank>. <title> | 热度 <heat_text> | 评论指标 <metric_count> 条 | 回复 <reply_total> | 赞同 <upvote_total> | 收藏 <favorite_total> | 红心 <heart_total>
```

## Field Semantics

- `metric_count`: number of collected comment metric records for the item
- `reply_total`: sum of reply counts across collected records
- `upvote_total`: sum of upvote counts across collected records
- `favorite_total`: sum of favorite counts across collected records
- `heart_total`: sum of heart counts across collected records
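The per-item line and its totals can be sketched as follows. This is an illustration only: the item and record field names (`comment_metrics`, `reply_count`, `upvote_count`, `favorite_count`, `heart_count`, `heat_text`) are assumptions about the snapshot shape, and `renderItemLine` is a hypothetical helper.

```javascript
// Sum one numeric field across records, counting missing values as 0.
function sumField(records, field) {
  return records.reduce((total, record) => total + (Number(record[field]) || 0), 0);
}

// Hypothetical renderer for the Per-Item Line structure.
function renderItemLine(item) {
  const records = item.comment_metrics || [];
  return [
    `${item.rank}. ${item.title}`,
    `热度 ${item.heat_text}`,
    `评论指标 ${records.length} 条`,
    `回复 ${sumField(records, 'reply_count')}`,
    `赞同 ${sumField(records, 'upvote_count')}`,
    `收藏 ${sumField(records, 'favorite_count')}`,
    `红心 ${sumField(records, 'heart_count')}`,
  ].join(' | ');
}
```

Note that an item with no collected records renders with zeros, so the surrounding summary still has to say whether those zeros are confirmed or simply missing.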
## Missing-Metric Handling

If an item has no collected comment metrics:

- keep the item in the report
- show metric count as `0`
- explicitly note partial data elsewhere in the result summary if the run was incomplete

## Report Mode Behavior

- If a specific snapshot ID is supplied, report from that snapshot.
- Otherwise, use the latest known snapshot.
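The snapshot choice in report mode might look like the sketch below. It assumes snapshots are plain objects carrying `snapshot_id` and `captured_at_ms`, and that "latest" means the largest capture timestamp; `selectSnapshot` is a hypothetical helper, not the source store's API.

```javascript
// Hypothetical snapshot selection for report mode.
function selectSnapshot(snapshots, snapshotId) {
  if (snapshotId) {
    // An explicit ID never falls back to another snapshot.
    return snapshots.find((snapshot) => snapshot.snapshot_id === snapshotId) || null;
  }
  // Otherwise pick the snapshot with the largest capture timestamp.
  return snapshots.reduce(
    (latest, snapshot) =>
      !latest || snapshot.captured_at_ms > latest.captured_at_ms ? snapshot : latest,
    null
  );
}
```

Returning `null` for an unknown ID, rather than silently using the latest snapshot, keeps the report from describing data the user did not ask for.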
262
skills/zhihu-hotlist/scripts/extract_hotlist.js
Normal file
@@ -0,0 +1,262 @@
// NOTE: `args` and the top-level `return` at the end of this file are not
// defined here; the browser-script host is assumed to wrap this file in a
// function and supply them.
const requestedTopN = Number(args.top_n || 10);
// Guard against non-numeric input, which would make the limit NaN and
// silently disable the row cap.
const limit = Number.isFinite(requestedTopN) ? Math.max(1, requestedTopN) : 10;

// Collapse whitespace, drop zero-width spaces, and trim.
function cleanText(value) {
  return String(value || '')
    .replace(/\s+/g, ' ')
    .replace(/\u200b/g, '')
    .trim();
}

// Return the first non-empty text found for the given selectors, or ''.
function pickText(root, selectors) {
  for (const selector of selectors) {
    const node = root.querySelector(selector);
    const text = cleanText(node && node.textContent);
    if (text) {
      return text;
    }
  }
  return '';
}

// Find a heat figure anywhere in the text (for example "344万" or "1.2k"),
// falling back to the first bare number.
function inferHeat(text) {
  const compact = cleanText(text);
  const match = compact.match(/(\d+(?:\.\d+)?)\s*(万|亿|k|K|m|M)(?:热度)?/);
  if (match) {
    return `${match[1]}${match[2]}`.replace('K', 'k').replace('M', 'm');
  }
  const plain = compact.match(/(\d+(?:\.\d+)?)(?:热度)?/);
  return plain ? plain[1] : '';
}

// Like inferHeat, but only accepts a unit-suffixed figure at the end of the text.
function extractHeatToken(text) {
  const compact = cleanText(text);
  const match = compact.match(/(\d+(?:\.\d+)?)\s*(万|亿|k|K|m|M)(?:热度)?$/);
  if (match) {
    return `${match[1]}${match[2]}`.replace('K', 'k').replace('M', 'm');
  }
  return '';
}
// Resolve the rank from a dedicated index node, then from leading digits in
// the item text, and finally from the item's position in the collected rows.
function inferRank(item, index) {
  const direct = pickText(item, [
    '.HotList-item-index',
    '.HotItem-index',
    '[data-rank]',
    '.RankingIndex',
  ]);
  const directNumber = Number.parseInt(direct, 10);
  if (Number.isFinite(directNumber) && directNumber > 0) {
    return directNumber;
  }

  const text = cleanText(item.textContent);
  const leading = text.match(/^(\d{1,2})\b/);
  if (leading) {
    return Number.parseInt(leading[1], 10);
  }

  return index + 1;
}

// DOM path: build [rank, title, heat] rows from candidate nodes, skipping
// candidates without a title or heat and de-duplicating by title.
function collectRows() {
  const candidates = collectDomCandidates();
  const seenTitles = new Set();
  const rows = [];

  for (const item of candidates) {
    const title = pickText(item, [
      '.HotList-item-title',
      '.HotList-item-title a',
      '.HotItem-content a',
      'h2 a',
      'h2',
      'a[href*="/question/"]',
    ]);
    if (!title || seenTitles.has(title)) {
      continue;
    }

    let heat = pickText(item, [
      '.HotList-item-metrics',
      '.HotList-item-heat',
      '.HotItem-metrics',
      '.HotItem-hot',
      '[data-heat]',
    ]);
    if (!heat) {
      heat = inferHeat(item.textContent);
    }
    if (!heat) {
      continue;
    }

    seenTitles.add(title);
    rows.push([
      inferRank(item, rows.length),
      title,
      heat,
    ]);

    if (rows.length >= limit) {
      break;
    }
  }

  return rows;
}
// Gather candidate item nodes, from specific historical classes down to
// broad structural selectors, de-duplicated by node identity.
function collectDomCandidates() {
  const selectors = [
    '.HotList-item',
    '.HotItem',
    '.HotList-list > *',
    '[data-hot-item]',
    'section ol li',
    'main li',
    'main article',
    'main [class*="Hot"]',
  ];
  const seen = new Set();
  const candidates = [];
  selectors.forEach((selector) => {
    const nodes = Array.from(document.querySelectorAll(selector));
    nodes.forEach((node) => {
      if (seen.has(node)) {
        return;
      }
      seen.add(node);
      candidates.push(node);
    });
  });
  return candidates;
}

// Text fallback sources, de-duplicated by normalized text and ordered
// longest first.
function collectTextSources() {
  const selectors = ['.HotList-list', '.HotList', '#root', 'main', 'body'];
  const sources = [];
  const seen = new Set();
  selectors.forEach((selector) => {
    const node = document.querySelector(selector);
    const rawText = String(node && (node.innerText || node.textContent || '') || '');
    const dedupeKey = cleanText(rawText);
    if (!dedupeKey || seen.has(dedupeKey)) {
      return;
    }
    seen.add(dedupeKey);
    sources.push(rawText);
  });
  return sources.sort((left, right) => right.length - left.length);
}

// Matches login-wall, captcha, and security-verification wording.
function looksLikeBlockedPage(text) {
  return /安全验证|异常访问|请完成验证|登录后继续|登录即可查看|验证码|访问受限/.test(text);
}

// Skip empty lines, page titles, and host UI chrome lines.
function shouldIgnoreTextLine(line) {
  if (!line) {
    return true;
  }
  if (line === '知乎热榜' || line === '首页 - 知乎' || line === '首页-知乎') {
    return true;
  }
  if (line.startsWith('/ ') || line.startsWith('当前页面 ·') ||
      line.startsWith('继续输入任务')) {
    return true;
  }
  return false;
}
// Text path: parse rows from page text, failing loudly on blocked pages.
function collectRowsFromText() {
  const sources = collectTextSources();
  for (const source of sources) {
    if (!source) {
      continue;
    }
    if (looksLikeBlockedPage(source)) {
      // "Zhihu currently requires login or security verification; hotlist
      // items cannot be read."
      throw new Error('知乎页面当前需要登录或完成安全验证,无法读取热榜条目');
    }

    const rows = parseRowsFromText(source);
    if (rows.length) {
      return rows.slice(0, limit);
    }
  }
  return [];
}

// Line-oriented parser: a rank line, one or more title lines, then a heat
// line; or a single line that carries the title and heat together.
function parseRowsFromText(text) {
  const lines = String(text || '')
    .split(/\n+/)
    .map(cleanText)
    .filter((line) => !!line && !shouldIgnoreTextLine(line));
  const seenTitles = new Set();
  const rows = [];
  let pendingRank = null;
  let titleParts = [];

  function pushRow(title, heat) {
    const normalizedTitle = cleanText(title);
    if (normalizedTitle && heat && !seenTitles.has(normalizedTitle)) {
      seenTitles.add(normalizedTitle);
      rows.push([
        pendingRank || rows.length + 1,
        normalizedTitle,
        heat,
      ]);
    }
    // Reset pending state even when the row is rejected (for example a
    // duplicate title), so stale rank and title fragments do not leak into
    // the next row.
    pendingRank = null;
    titleParts = [];
  }

  for (const rawLine of lines) {
    let line = rawLine;

    const rankOnly = line.match(/^(\d{1,2})$/);
    if (rankOnly && !titleParts.length) {
      pendingRank = Number(rankOnly[1]);
      continue;
    }

    const rankedLine = line.match(/^(\d{1,2})[.、\s]+(.+)$/);
    if (rankedLine) {
      pendingRank = Number(rankedLine[1]);
      line = cleanText(rankedLine[2]);
    }

    const inlineMatch = line.match(/^(.*?)(\d+(?:\.\d+)?)\s*(万|亿|k|K|m|M)(?:热度)?$/);
    if (inlineMatch && cleanText(inlineMatch[1])) {
      pushRow(cleanText(inlineMatch[1]), `${inlineMatch[2]}${inlineMatch[3]}`.replace('K', 'k').replace('M', 'm'));
      if (rows.length >= limit) {
        break;
      }
      continue;
    }

    const heatOnly = extractHeatToken(line);
    if (heatOnly && titleParts.length) {
      pushRow(titleParts.join(' '), heatOnly);
      if (rows.length >= limit) {
        break;
      }
      continue;
    }

    titleParts.push(line);
  }

  return rows;
}

// Extraction ladder: DOM candidates first, page-text parsing as the fallback.
const domRows = collectRows();
const rows = domRows.length ? domRows : collectRowsFromText();
if (!rows.length) {
  // "Could not extract Zhihu hotlist items from the page DOM."
  throw new Error('未能从页面 DOM 中提取到知乎热榜条目');
}

return {
  source: `${location.origin}${location.pathname}`,
  sheet_name: '知乎热榜',
  columns: ['rank', 'title', 'heat'],
  rows,
};
83
skills/zhihu-navigate/SKILL.md
Normal file
@@ -0,0 +1,83 @@
---
name: zhihu-navigate
description: Use when the user wants to open, switch, or navigate to a Zhihu page, tab, menu, profile area, notification area, message area, or creator area through browser actions.
version: 0.1.0
author: sgclaw
tags:
  - zhihu
  - browser
  - navigation
---

# Zhihu Navigate

Open or switch to known Zhihu destinations through browser navigation steps. Use this skill for page routing, menu opening, profile-area switching, and creator-center entry, not for article authoring or hotlist data extraction.

## When to Use

- The user asks to open a Zhihu page such as home, hot list, notifications, messages, search, creator center, or write article.
- The user asks to switch to a specific Zhihu tab, section, or profile sub-area.
- The user asks to open a Zhihu menu such as the avatar menu or notifications menu.
- The task is browser navigation inside Zhihu and the desired destination is one of the known catalog targets.

Do not use this skill for:

- writing or publishing Zhihu articles
- collecting hotlist data or producing reports
- arbitrary web browsing outside the known Zhihu catalog

## Workflow

1. Classify the request as one of these destination types:
   - route: direct page URL
   - component: clickable entry or tab
   - flow: multi-step navigation such as open menu, then open a nested destination
2. Match the requested destination against the known catalog in [routes-and-targets.md](references/routes-and-targets.md).
3. Prefer the most explicit target name available in the request.
4. If the request maps to a known ambiguous alias, stop and ask for clarification instead of guessing.
5. Use semantic selectors first and revalidate brittle selectors using [selector-strategy.md](references/selector-strategy.md) before relying on them.
6. In the SuperRPA browser host, use the packaged browser-script tool `zhihu-navigate.open_creator_entry` before generic probing when the target is creator center or article entry.
7. After navigation, verify the destination using URL, domain, or visible text whenever the page exposes a stable signal.
8. If Zhihu redirects to login, captcha, or verification state, report that block explicitly instead of pretending the target page was reached.
9. Once the target page is verified, stop exploratory click probing and hand off to the next skill if collection or editing is required.

## SuperRPA Interface Contract

- Inside the sgClaw browser host, prefer `superrpa_browser` for Zhihu routing and DOM interactions. `browser_action` is only the compatibility alias.
- Always pass `expected_domain` as the bare hostname only, for example `www.zhihu.com`.
- All selectors must be valid CSS selectors because the host executes `document.querySelector(...)`.
- Never use XPath or jQuery-style pseudo-selectors such as `:contains(...)`.
- Prefer the packaged browser-script tool over ad-hoc probing for creator/article entry.
- Prefer direct route navigation to known canonical URLs before brittle click chains.
- Do not cycle through multiple weak selectors when a canonical route is available.
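Two of the contract rules above are easy to pre-check before a call reaches the host. The sketch below is illustrative only: `checkExpectedDomain` and `checkSelector` are hypothetical helpers, and the patterns are simple heuristics rather than a full hostname or CSS grammar.

```javascript
// Bare hostname only: no scheme, port, path, or whitespace.
function checkExpectedDomain(value) {
  return /^[a-z0-9-]+(\.[a-z0-9-]+)+$/i.test(String(value || ''));
}

// Reject selector strings that the host's document.querySelector(...) would
// not accept as CSS. Heuristic, not a CSS parser.
function checkSelector(selector) {
  const text = String(selector || '').trim();
  if (!text) {
    return { ok: false, reason: 'empty selector' };
  }
  if (/:contains\(/i.test(text)) {
    return { ok: false, reason: 'jQuery-style :contains() is not valid CSS' };
  }
  if (/^(\.{0,2}\/|\()/.test(text)) {
    return { ok: false, reason: 'looks like an XPath expression' };
  }
  return { ok: true, reason: '' };
}
```

For example, `checkSelector("a[href='/creator']")` passes, while `checkSelector("button:contains('写文章')")` is rejected before any browser call is made.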
## Ambiguity Rules

- Treat `关注分栏` as ambiguous until the user says whether they mean home feed following or profile following.
- Treat `回答排序菜单` as ambiguous until the user says whether they mean question page sorting or answer page sorting.
- If two targets score equally, ask a direct clarification question instead of falling back to the first match.

## Output

Return a concise result with:

- requested destination
- resolved target key
- navigation type: route, component, or flow
- expected domain
- final URL when available
- verification result or uncertainty
- whether a login/verification redirect blocked the intended destination

## References

- Use [routes-and-targets.md](references/routes-and-targets.md) to identify the intended target and flow shape.
- Use [selector-strategy.md](references/selector-strategy.md) when a selector looks weak, overly generic, or DOM-version-dependent.
- Use the preserved source catalog in `assets/zhihu_navigation_pages.source.json` only when a reference file does not contain enough detail.

## Common Mistakes

- Confusing page navigation with article-writing actions.
- Treating menu opening and page opening as the same kind of operation.
- Ignoring alias collisions and silently picking one target.
- Trusting a generic selector without checking whether it still maps to the intended control.
22
skills/zhihu-navigate/SKILL.toml
Normal file
@@ -0,0 +1,22 @@
[skill]
name = "zhihu-navigate"
description = "Use when the user wants to open, switch, or navigate to a Zhihu page, tab, menu, profile area, notification area, message area, or creator area through browser actions."
version = "0.1.0"
author = "sgclaw"
tags = ["zhihu", "browser", "navigation"]

prompts = [
  "For Zhihu creator-center or write-article navigation inside the SuperRPA browser host, call zhihu-navigate.open_creator_entry before any generic browser getText, click, or selector probing.",
  "zhihu-navigate.open_creator_entry is the deterministic path for detecting login blocks, creator-home state, and editor-ready state on www.zhihu.com.",
  "Do not use zhuanlan.zhihu.com inside this BrowserAttached host unless the host policy explicitly allows it. Prefer canonical www.zhihu.com creator routes.",
  "Never generate jQuery-style :contains() selectors. If the packaged browser script is available, use it before any generic browser probing."
]

[[tools]]
name = "open_creator_entry"
description = "Inspect the current Zhihu page, detect login or creator/editor state, and resolve the stable creator/article-entry path on www.zhihu.com before generic probing."
kind = "browser_script"
command = "scripts/open_creator_entry.js"

[tools.args]
desired_target = "Requested target such as creator_home or article_editor."
2481
skills/zhihu-navigate/assets/zhihu_navigation_pages.source.json
Normal file
File diff suppressed because it is too large
75
skills/zhihu-navigate/references/routes-and-targets.md
Normal file
@@ -0,0 +1,75 @@
# Routes And Targets

This skill is derived from the remote navigation catalog in `assets/zhihu_navigation_pages.source.json`.

## Current Scope

- Routes: 13
- Components: 53
- Flows: 16
- Targets: 69

## Model

- `route`: direct page destination with a known URL
- `component`: clickable entry, tab, button, or menu control
- `flow`: multi-step sequence composed from routes and components
- `target`: user-facing entry point that resolves to exactly one route, component, or flow

## Representative Routes

- `home`
- `hot_list`
- `notifications_page`
- `messages_page`
- `search_results_page`
- `creator_home`
- `write_article`

## Representative Components

- `top_nav_home`
- `top_nav_hot`
- `top_nav_creator`
- `top_nav_notifications`
- `top_nav_avatar_menu`
- `notifications_tab_replies`
- `messages_all_tab`

## Representative Flows

- `open_avatar_menu`
- `open_notifications_menu`
- `open_creator_from_home`
- `open_profile_from_avatar_menu`
- `open_profile_answers_tab`
- `open_profile_articles_tab`
- `open_security_settings_from_avatar_menu`

## Confirmed Alias Conflicts

These aliases currently resolve to more than one target and must not be guessed:

| Alias | Conflicting targets |
| --- | --- |
| `关注分栏` | `home_feed_following_tab`, `profile_following_tab` |
| `回答排序菜单` | `answer_sort_menu`, `question_sort_menu` |

## Preferred Disambiguation Wording

When the user uses an ambiguous alias, ask for the missing context directly:

- `你说的“关注分栏”是首页关注流,还是个人主页里的关注分栏?` (By "following tab", do you mean the home feed following stream or the following tab on the profile page?)
- `你说的“回答排序菜单”是问题页的排序菜单,还是回答列表的排序菜单?` (By "answer sort menu", do you mean the sort menu on the question page or the one on the answer list?)

## Practical Routing Rule

Prefer the most explicit phrase in this order:

1. exact target name
2. exact alias
3. explicit page + area combination
4. generic area noun only

If a match at level 3 or 4 still resolves to more than one target, ask before acting.
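The routing order and the alias-conflict rule can be sketched as a small resolver. This is an illustration, not the catalog's real lookup: `resolveTarget` is hypothetical, the tables below are a hand-picked slice, and the `热榜` alias for `hot_list` is an assumed example (only the `关注分栏` conflict is taken from the table above).

```javascript
// Hypothetical slice of the catalog, for illustration only.
const TARGET_NAMES = ['hot_list', 'creator_home', 'home_feed_following_tab', 'profile_following_tab'];
const ALIASES = {
  '热榜': ['hot_list'], // assumed alias
  '关注分栏': ['home_feed_following_tab', 'profile_following_tab'],
};

// Exact target name first, then exact alias; never guess on a conflict.
function resolveTarget(phrase) {
  const text = String(phrase || '').trim();
  if (TARGET_NAMES.includes(text)) {
    return { status: 'resolved', target: text };
  }
  const matches = ALIASES[text] || [];
  if (matches.length === 1) {
    return { status: 'resolved', target: matches[0] };
  }
  if (matches.length > 1) {
    // The caller should ask the user which candidate is meant.
    return { status: 'ambiguous', candidates: matches };
  }
  return { status: 'unknown' };
}
```

Returning the candidate list on ambiguity gives the caller what it needs to phrase a direct clarification question.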
59
skills/zhihu-navigate/references/selector-strategy.md
Normal file
@@ -0,0 +1,59 @@
# Selector Strategy

The source catalog already mixes stable and brittle selectors. Use this order when validating or refreshing selectors.

## Preferred Order

1. Stable `href` selectors for direct links
2. `aria-label` and `role` selectors for tabs, menus, and buttons
3. `data-testid` selectors when available
4. Stable semantic class names tied to product structure
5. Generic class selectors only when none of the above exists
6. CSS hash classes as the final fallback, only when no better hook exists

## Good Patterns In The Current Catalog

- `a[href='/']`
- `a[href='/hot']`
- `a[href='/creator']`
- `button[aria-label='通知']`
- `[role='tab'][aria-label*='回复']`
- `[data-testid='sort-button']`

These are relatively resilient because they describe user-facing semantics instead of transient layout implementation.

## Known Brittle Or Weak Patterns

- `div.css-1q62b6s > div.css-byu4by`
- `button:has(img)`
- `.MoreButton`
- `.Popover`
- `.Tooltip`
- `.floating-menu`
- `.Modal`
- `.Dialog`

Risks:

- hash classes can change on any frontend build
- generic popup selectors can match the wrong layer
- image-based button matching is vulnerable to layout and icon changes

## Revalidation Rule

Before relying on a weak selector:

1. Check whether an `href`, `aria-label`, `role`, or `data-testid` selector now exists.
2. Confirm the selector matches exactly one intended element.
3. Confirm the element is visible and actionable in the current page state.
4. If the selector is still generic, pair it with a stronger page-context check before acting.
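Steps 2 and 3 of the revalidation rule can be expressed as a small check. The sketch below is an assumption-based illustration: `revalidateSelector` is a hypothetical helper that takes any root exposing `querySelectorAll` (such as `document`) plus a caller-supplied visibility predicate, so the visibility policy stays with the caller.

```javascript
// Hypothetical revalidation helper: the selector must match exactly one
// element, and that element must pass the caller's visibility predicate.
function revalidateSelector(root, selector, isVisible) {
  const matches = Array.from(root.querySelectorAll(selector));
  if (matches.length !== 1) {
    return { ok: false, reason: `expected exactly 1 match, found ${matches.length}` };
  }
  if (!isVisible(matches[0])) {
    return { ok: false, reason: 'matched element is not visible' };
  }
  return { ok: true, element: matches[0] };
}
```

In a page context this might be called as `revalidateSelector(document, "a[href='/creator']", (node) => node.getClientRects().length > 0)`; a failing result should be reported with the selector and page context rather than swapped for another generic selector.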
## Failure Handling

If a weak selector stops working:

- do not silently substitute another generic selector
- report which selector failed
- describe the page context where it failed
- request a selector refresh or DOM inspection before retrying
137
skills/zhihu-navigate/scripts/open_creator_entry.js
Normal file
@@ -0,0 +1,137 @@
// Collapse whitespace, drop zero-width spaces, and trim.
function cleanText(value) {
  return String(value || '')
    .replace(/\s+/g, ' ')
    .replace(/\u200b/g, '')
    .trim();
}

function bodyText() {
  const body = document.body;
  return cleanText(body && (body.innerText || body.textContent || ''));
}

// NOTE: the text check is broad. Any page whose body text contains one of
// these words (for example a visible 登录 link) is reported as blocked.
function isLoginBlocked(url, text) {
  return /\/signin\b|\/signup\b/.test(url) ||
    /登录|注册|验证码|安全验证|验证后继续|请先登录/.test(text);
}

// The editor is considered ready when a title field or rich-text box exists.
function hasEditorSignals() {
  return !!document.querySelector(
    "textarea[placeholder*='标题'], input[placeholder*='标题'], div[contenteditable='true'][role='textbox']"
  );
}

// Treats a node as visible unless its bounding box is 0 x 0.
function isVisible(node) {
  if (!node) {
    return false;
  }
  const rect = typeof node.getBoundingClientRect === 'function' ? node.getBoundingClientRect() : null;
  return !rect || rect.width > 0 || rect.height > 0;
}

// Matches "write article" / "publish article" entry labels.
function isWriteEntryText(text) {
  return !!text && (text.includes('写文章') || text.includes('发文章'));
}

function extractHref(node) {
  if (!node) {
    return '';
  }
  if (typeof node.href === 'string' && node.href) {
    return node.href;
  }
  if (typeof node.getAttribute === 'function') {
    return node.getAttribute('href') || '';
  }
  return '';
}

// Walk up from a text node to the nearest link, button, or focusable element.
function findClickableAncestor(node) {
  let current = node;
  while (current) {
    const tagName = String(current.tagName || '').toUpperCase();
    const role = typeof current.getAttribute === 'function' ? cleanText(current.getAttribute('role')) : '';
    const tabindex = typeof current.getAttribute === 'function' ? current.getAttribute('tabindex') : null;
    const href = extractHref(current);
    if (
      tagName === 'A' ||
      tagName === 'BUTTON' ||
      role === 'button' ||
      href ||
      (tabindex !== null && tabindex !== '')
    ) {
      return current;
    }
    current = current.parentElement || null;
  }
  return null;
}
// Look for a visible write-article entry: first among links and buttons,
// then among generic containers whose clickable ancestor can be resolved.
function findWriteEntry() {
  const directCandidates = Array.from(
    document.querySelectorAll("a[href], button, [role='button']")
  );
  const directMatch = directCandidates.find((node) => {
    const text = cleanText(node.textContent);
    if (!isWriteEntryText(text)) {
      return false;
    }
    return isVisible(node);
  });
  if (directMatch) {
    return directMatch;
  }

  const textNodes = Array.from(
    document.querySelectorAll("div, span, [tabindex]")
  );
  for (const node of textNodes) {
    if (!isVisible(node)) {
      continue;
    }
    const text = cleanText(node.textContent);
    if (!isWriteEntryText(text)) {
      continue;
    }
    const clickableAncestor = findClickableAncestor(node);
    if (clickableAncestor && isVisible(clickableAncestor)) {
      return clickableAncestor;
    }
  }

  return null;
}

// NOTE: `args` and the top-level `return` statements below are assumed to be
// supplied and handled by the browser-script host wrapper.
const currentUrl = location.href;
const text = bodyText();

if (isLoginBlocked(currentUrl, text)) {
  return {
    status: 'login_required',
    current_url: currentUrl,
  };
}

if (hasEditorSignals()) {
  return {
    status: 'editor_ready',
    current_url: currentUrl,
  };
}

const writeEntry = findWriteEntry();
if (writeEntry) {
  writeEntry.click();
  const href = extractHref(writeEntry);
  return {
    status: 'creator_entry_clicked',
    current_url: currentUrl,
    next_url: href || undefined,
  };
}

// Fallback: no login block, editor, or write entry was detected. The page is
// reported as creator_home without further verification.
return {
  status: 'creator_home',
  current_url: currentUrl,
  desired_target: String(args.desired_target || 'creator_home'),
};
83
skills/zhihu-write/SKILL.md
Normal file
@@ -0,0 +1,83 @@
|
||||
---
|
||||
name: zhihu-write
|
||||
description: Use when the user wants to draft, fill, or publish a Zhihu article through browser actions in the Zhihu editor or creator center.
|
||||
version: 0.1.0
|
||||
author: sgclaw
|
||||
tags:
|
||||
- zhihu
|
||||
- browser
|
||||
- writing
|
||||
---
|
||||
|
||||
# Zhihu Write
|
||||
|
||||
Draft or publish a Zhihu article through the creator center and editor flow. Use this skill for filling article title and body, entering the Zhihu editor, and confirming the publish sequence. Do not use it for generic page navigation or hotlist extraction.
|
||||
|
||||
## When to Use
|
||||
|
||||
- The user asks to write, draft, fill, edit, or publish a Zhihu article.
|
||||
- The task requires entering the Zhihu editor from creator center or the direct write page.
|
||||
- The task requires browser-side verification that the article title or publish URL is correct.
|
||||
|
||||
Do not use this skill for:
|
||||
|
||||
- opening ordinary Zhihu pages without editing content
|
||||
- collecting hotlist or comment metrics
|
||||
- bulk content generation without a clear article title and body
|
||||
|
||||
## Workflow
|
||||
|
||||
1. Confirm the article inputs exist and are non-empty: title and body are both required.
|
||||
2. Decide whether the run is draft-only or publish mode.
|
||||
3. In the SuperRPA browser host, call the packaged browser-script tools before any generic browser probing:
|
||||
- `zhihu-write.prepare_article_editor`
|
||||
- `zhihu-write.fill_article_draft`
|
||||
4. Enter the editor flow described in [editor-flow.md](references/editor-flow.md).
|
||||
5. Fill the title first, then the body.
|
||||
6. If the task is publish mode, require explicit human confirmation before clicking publish.
|
||||
7. After publish, verify the final state using title text and published URL when available.
|
||||
8. If a brittle selector is involved, revalidate it using [publish-safety.md](references/publish-safety.md) before acting.
|
||||
9. If the editor is blocked by login, verification, or missing permissions, report that explicit state instead of continuing generic probing.
|
||||
10. Once draft or publish verification succeeds, stop exploratory browser work.
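The first three workflow steps can be sketched as a small planning function. This is a minimal illustration, not code from the skill package: `planZhihuWriteRun`, its `request` shape, and the `missing_inputs` reason are hypothetical names; only the two tool names come from the workflow above.

```javascript
// Hypothetical helper: not part of the zhihu-write package.
function planZhihuWriteRun(request) {
  const title = String(request.title || '').trim();
  const body = String(request.body || '').trim();
  if (!title || !body) {
    // Step 1: both inputs are required before any browser action.
    return { ok: false, reason: 'missing_inputs', tools: [] };
  }
  // Step 2: anything other than an explicit publish request is draft-only.
  const mode = request.mode === 'publish' ? 'publish' : 'draft';
  // Step 3: packaged browser-script tools, in call order.
  const tools = [
    'zhihu-write.prepare_article_editor',
    'zhihu-write.fill_article_draft',
  ];
  return { ok: true, mode, tools };
}
```

A caller would stop immediately when `ok` is false, and otherwise invoke the listed tools in order before considering any generic probing.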
|
||||
|
||||
## SuperRPA Interface Contract
|
||||
|
||||
- Inside the sgClaw browser host, prefer `superrpa_browser` for Zhihu editor actions. `browser_action` is only the compatibility alias.
|
||||
- Always pass `expected_domain` as the bare hostname only, for example `www.zhihu.com`.
|
||||
- All selectors must be valid CSS selectors because the host executes `document.querySelector(...)`.
|
||||
- Never use XPath or jQuery-style pseudo-selectors such as `:contains(...)`.
|
||||
- Prefer the packaged browser-script tools over ad-hoc `getText` or `click` probing.
|
||||
- In the BrowserAttached host, use canonical `www.zhihu.com` creator routes first.
|
||||
- Do not navigate to `zhuanlan.zhihu.com` unless the host policy explicitly allows that domain.
|
||||
- Do not retry multiple weak editor selectors when a canonical creator/editor route is available.
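The domain and selector rules above could be checked before a call is sent to the host. The sketch below is an assumption about how such a pre-flight check might look; `checkBrowserCall` and the shape of `call` are hypothetical, and the XPath test is a heuristic rather than a parser.

```javascript
// Hypothetical pre-flight check; the names below are illustrative only.
function checkBrowserCall(call) {
  const problems = [];
  const domain = String(call.expected_domain || '');
  // A bare hostname has no scheme, path, port, or whitespace.
  if (!/^[a-z0-9-]+(\.[a-z0-9-]+)+$/i.test(domain)) {
    problems.push('expected_domain must be a bare hostname');
  }
  const selector = String(call.selector || '');
  // Heuristic: XPath expressions usually start with "/", "./", or "(".
  if (/^\s*(\/|\.\/|\()/.test(selector)) {
    problems.push('selector looks like XPath');
  }
  // jQuery-style pseudo-selector that document.querySelector rejects.
  if (/:contains\(/.test(selector)) {
    problems.push(':contains() is not valid CSS');
  }
  return problems;
}
```

An empty array means the call passes these checks; it does not prove that the selector matches anything on the page.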
|
||||
|
||||
## Confirmation Rule
|
||||
|
||||
- Draft-only mode does not require a publish confirmation gate.
|
||||
- Publish mode always requires an explicit confirmation from the human operator in the current session.
|
||||
- If the user’s wording is ambiguous between draft and publish, default to draft and ask before publishing.
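The confirmation rule reduces to a small gate. The following is a sketch under assumed names (`resolvePublishGate`, `requestedMode`, `confirmedInCurrentSession` are hypothetical); it only encodes the three bullets above.

```javascript
// Hypothetical gate; field names are illustrative, not from the skill package.
function resolvePublishGate(input) {
  // Draft requests and ambiguous wording both resolve to draft, with no gate.
  if (input.requestedMode !== 'publish') {
    return { mode: 'draft', mayClickPublish: false, askUser: false };
  }
  // Publish mode needs an explicit confirmation from the current session.
  const confirmed = input.confirmedInCurrentSession === true;
  return { mode: 'publish', mayClickPublish: confirmed, askUser: !confirmed };
}
```

Requiring the strict boolean `true` means a stale or loosely typed flag never opens the gate by accident.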
|
||||
|
||||
## Output
|
||||
|
||||
Return a concise result with:
|
||||
|
||||
- article title
|
||||
- mode: `draft` or `publish`
|
||||
- editor entry path used
|
||||
- final URL if one was captured
|
||||
- verification result
|
||||
- unresolved issues or brittle points encountered
|
||||
- whether login/verification/permission state blocked the requested action
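One possible shape for that result is sketched below. The skill does not define field names, so every key here (`entry_path`, `final_url`, `blocked_by`, and so on) is an assumption chosen to mirror the bullets one to one.

```javascript
// Hypothetical result builder; key names are illustrative only.
function buildRunResult(fields) {
  return {
    title: fields.title,
    mode: fields.mode === 'publish' ? 'publish' : 'draft',
    entry_path: fields.entryPath,
    final_url: fields.finalUrl || null, // null when no URL was captured
    verification: fields.verification,
    unresolved_issues: fields.unresolvedIssues || [],
    blocked_by: fields.blockedBy || null, // e.g. login, verification, permission
  };
}
```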
|
||||
|
||||
## References
|
||||
|
||||
- Use [editor-flow.md](references/editor-flow.md) for the action sequence and verification steps.
|
||||
- Use [publish-safety.md](references/publish-safety.md) before any live publish click.
|
||||
- Use `assets/zhihu_write_flow.source.json` when the references need exact selector or step names from the source flow.
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
- Publishing when the user only asked for a draft.
|
||||
- Treating a captured edit URL as proof of successful publication.
|
||||
- Ignoring title verification after the publish confirmation flow.
|
||||
- Reusing brittle selectors without checking whether a better semantic selector now exists.
|
||||
34
skills/zhihu-write/SKILL.toml
Normal file
@@ -0,0 +1,34 @@
|
||||
[skill]
|
||||
name = "zhihu-write"
|
||||
description = "Use when the user wants to draft, fill, or publish a Zhihu article through browser actions in the Zhihu editor or creator center."
|
||||
version = "0.1.0"
|
||||
author = "sgclaw"
|
||||
tags = ["zhihu", "browser", "writing"]
|
||||
|
||||
prompts = [
|
||||
"For Zhihu article drafting or publishing inside the SuperRPA browser host, call zhihu-write.prepare_article_editor before any generic browser getText, click, or selector probing.",
|
||||
"If zhihu-write.prepare_article_editor reports editor_ready, call zhihu-write.fill_article_draft with the title, body, and publish_mode arguments instead of generating ad-hoc browser selectors.",
|
||||
"Do not use zhuanlan.zhihu.com inside this BrowserAttached host unless the host policy explicitly allows it. Prefer canonical www.zhihu.com creator routes.",
|
||||
"If the user asked to publish but has not explicitly confirmed publishing in the current conversation, stop and ask for confirmation before any publish click.",
|
||||
"Never generate jQuery-style :contains() selectors. Use the packaged browser scripts before any generic browser probing."
|
||||
]
|
||||
|
||||
[[tools]]
|
||||
name = "prepare_article_editor"
|
||||
description = "Detect whether the current Zhihu page is login-blocked, editor-ready, or editor-unavailable on www.zhihu.com before article drafting or publishing."
|
||||
kind = "browser_script"
|
||||
command = "scripts/prepare_article_editor.js"
|
||||
|
||||
[tools.args]
|
||||
desired_mode = "Requested mode such as draft or publish."
|
||||
|
||||
[[tools]]
|
||||
name = "fill_article_draft"
|
||||
description = "Fill the current Zhihu article editor with title and body, and optionally continue the publish flow when explicit confirmation is already present."
|
||||
kind = "browser_script"
|
||||
command = "scripts/fill_article_draft.js"
|
||||
|
||||
[tools.args]
|
||||
title = "Article title to write into the Zhihu editor."
|
||||
body = "Article body to write into the Zhihu editor."
|
||||
publish_mode = "Use true only when explicit human confirmation to publish is already present in the current conversation."
|
||||
126
skills/zhihu-write/assets/zhihu_write_flow.source.json
Normal file
@@ -0,0 +1,126 @@
|
||||
{
|
||||
"entry_url": "https://www.zhihu.com/creator",
|
||||
"editor_url": "https://zhuanlan.zhihu.com/write",
|
||||
"domains": {
|
||||
"creator": "www.zhihu.com",
|
||||
"editor": "zhuanlan.zhihu.com"
|
||||
},
|
||||
"literals": {
|
||||
"write_entry_text": "写文章",
|
||||
"title_placeholder": "请输入标题(最多 100 个字)",
|
||||
"body_role": "textbox",
|
||||
"publish_text": "发布",
|
||||
"publish_confirm_text": "确认发布"
|
||||
},
|
||||
"selectors": {
|
||||
"creator_write_panel": "div.css-1q62b6s",
|
||||
"creator_write_entry": "div.css-1q62b6s > div.css-byu4by",
|
||||
"title_input": "textarea[placeholder='请输入标题(最多 100 个字)']",
|
||||
"body_editor": "div.notranslate.public-DraftEditor-content[contenteditable='true'][role='textbox']",
|
||||
"publish_button": "button.Button--primary.Button--blue",
|
||||
"publish_confirm_dialog": "div[role='dialog']",
|
||||
"publish_confirm_button": "div[role='dialog'] button.Button--primary.Button--blue",
|
||||
"published_title": "h1"
|
||||
},
|
||||
"steps": [
|
||||
{
|
||||
"name": "navigate_creator",
|
||||
"action": "navigate",
|
||||
"expected_domain": "creator",
|
||||
"url_ref": "entry_url",
|
||||
"log_message": "navigate https://www.zhihu.com/creator"
|
||||
},
|
||||
{
|
||||
"name": "click_write_article",
|
||||
"action": "click",
|
||||
"expected_domain": "creator",
|
||||
"selector_ref": "creator_write_entry",
|
||||
"wait_after_ms": 1500,
|
||||
"log_message": "click 写文章"
|
||||
},
|
||||
{
|
||||
"name": "wait_editor_ready",
|
||||
"action": "waitForSelector",
|
||||
"expected_domain": "editor",
|
||||
"selector_ref": "title_input",
|
||||
"timeout_ms": 8000,
|
||||
"log_message": "wait for editor title input"
|
||||
},
|
||||
{
|
||||
"name": "type_title",
|
||||
"action": "type",
|
||||
"expected_domain": "editor",
|
||||
"selector_ref": "title_input",
|
||||
"text_source": "title",
|
||||
"clear_first": true,
|
||||
"log_message": "type article title into 请输入标题(最多 100 个字)"
|
||||
},
|
||||
{
|
||||
"name": "type_body",
|
||||
"action": "type",
|
||||
"expected_domain": "editor",
|
||||
"selector_ref": "body_editor",
|
||||
"text_source": "body",
|
||||
"clear_first": true,
|
||||
"log_message": "type article body into editor textbox"
|
||||
},
|
||||
{
|
||||
"name": "scroll_publish_button",
|
||||
"action": "scrollTo",
|
||||
"expected_domain": "editor",
|
||||
"selector_ref": "publish_button",
|
||||
"only_when_publish": true,
|
||||
"log_message": "scroll to 发布"
|
||||
},
|
||||
{
|
||||
"name": "click_publish",
|
||||
"action": "click",
|
||||
"expected_domain": "editor",
|
||||
"selector_ref": "publish_button",
|
||||
"wait_after_ms": 800,
|
||||
"only_when_publish": true,
|
||||
"capture_url": true,
|
||||
"log_message": "click 发布"
|
||||
},
|
||||
{
|
||||
"name": "wait_publish_confirm_dialog",
|
||||
"action": "waitForSelector",
|
||||
"expected_domain": "editor",
|
||||
"selector_ref": "publish_confirm_dialog",
|
||||
"timeout_ms": 8000,
|
||||
"only_when_publish": true,
|
||||
"log_message": "wait for publish confirm dialog"
|
||||
},
|
||||
{
|
||||
"name": "click_publish_confirm",
|
||||
"action": "click",
|
||||
"expected_domain": "editor",
|
||||
"selector_ref": "publish_confirm_button",
|
||||
"wait_after_ms": 1500,
|
||||
"only_when_publish": true,
|
||||
"capture_url": true,
|
||||
"log_message": "click 确认发布"
|
||||
},
|
||||
{
|
||||
"name": "wait_published_title",
|
||||
"action": "waitForSelector",
|
||||
"expected_domain": "editor",
|
||||
"selector_ref": "published_title",
|
||||
"timeout_ms": 15000,
|
||||
"only_when_publish": true,
|
||||
"capture_url": true,
|
||||
"log_message": "wait for published article title"
|
||||
},
|
||||
{
|
||||
"name": "confirm_published_title",
|
||||
"action": "getText",
|
||||
"expected_domain": "editor",
|
||||
"selector_ref": "published_title",
|
||||
"only_when_publish": true,
|
||||
"expect_text_source": "title",
|
||||
"allow_empty_text": true,
|
||||
"capture_url": true,
|
||||
"log_message": "verify published article title"
|
||||
}
|
||||
]
|
||||
}
|
||||
53
skills/zhihu-write/references/editor-flow.md
Normal file
@@ -0,0 +1,53 @@
|
||||
# Editor Flow
|
||||
|
||||
This skill is based on the preserved source flow in `assets/zhihu_write_flow.source.json`.
|
||||
|
||||
## Entry Points
|
||||
|
||||
- Creator center entry URL: `https://www.zhihu.com/creator`
|
||||
- BrowserAttached direct editor URL: `https://www.zhihu.com/creator/posts/new`
|
||||
|
||||
`https://zhuanlan.zhihu.com/write` exists in the preserved source flow, but it is not the default entry point inside the current SuperRPA browser host because the host domain policy only guarantees `www.zhihu.com`.
|
||||
|
||||
The current BrowserAttached flow enters creator center first, then uses the packaged browser-script tools to resolve whether the session is blocked by login, already in the editor, or needs the canonical creator editor route.
|
||||
|
||||
## Required Inputs
|
||||
|
||||
- `title`
|
||||
- `body`
|
||||
|
||||
Both fields must be non-empty before any browser action starts.
|
||||
|
||||
## Core Sequence
|
||||
|
||||
1. Navigate to creator center.
|
||||
2. Click the write-article entry.
|
||||
3. Wait for the title input in the editor domain.
|
||||
4. Fill the title with `clear_first = true`.
|
||||
5. Fill the body editor with `clear_first = true`.
|
||||
6. If publish mode:
|
||||
- scroll to the publish button
|
||||
- click publish
|
||||
- wait for publish confirmation dialog
|
||||
- click confirm publish
|
||||
- wait for published title
|
||||
- verify published title text
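The core sequence can be sketched as a filter over the step list. The step names and the `only_when_publish` flag are taken from `assets/zhihu_write_flow.source.json`; the `stepsForMode` helper itself is hypothetical and ignores selectors, waits, and URL capture.

```javascript
// Step names mirror the preserved source flow; other step fields are omitted.
const CORE_STEPS = [
  { name: 'navigate_creator' },
  { name: 'click_write_article' },
  { name: 'wait_editor_ready' },
  { name: 'type_title' },
  { name: 'type_body' },
  { name: 'scroll_publish_button', only_when_publish: true },
  { name: 'click_publish', only_when_publish: true },
  { name: 'wait_publish_confirm_dialog', only_when_publish: true },
  { name: 'click_publish_confirm', only_when_publish: true },
  { name: 'wait_published_title', only_when_publish: true },
  { name: 'confirm_published_title', only_when_publish: true },
];

// Hypothetical helper: returns the step names that apply to the given mode.
function stepsForMode(steps, publishMode) {
  return steps
    .filter((step) => publishMode || !step.only_when_publish)
    .map((step) => step.name);
}
```

In draft mode the sequence stops after `type_body`, which matches the rule that publish-only steps never run without publish mode.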
|
||||
|
||||
## Readiness Checks
|
||||
|
||||
- The editor is considered ready only after the title input appears. The packaged scripts additionally require a visible body editor before reporting `editor_ready`.
|
||||
- The publish flow is not complete until at least one post-publish verification succeeds.
|
||||
|
||||
## URL Capture Rules
|
||||
|
||||
- Pre-publish clicks may return an editor URL.
|
||||
- A valid published article URL should match the published article prefix and should not end in `/edit`.
|
||||
- If publish mode finishes without a published article URL, treat the run as unconfirmed even if some clicks succeeded.
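These capture rules can be expressed as a small classifier. This is a sketch only: the source does not spell out the published article prefix, so `publishedPrefix` is a caller-supplied assumption, and `classifyCapturedUrl` with its return labels is hypothetical.

```javascript
// Hypothetical classifier; publishedPrefix must be supplied by the caller.
function classifyCapturedUrl(url, publishedPrefix) {
  const value = String(url || '');
  if (!value) {
    return 'missing';
  }
  // Ignore query string and fragment, then trailing slashes, before the /edit test.
  const path = value.split(/[?#]/)[0].replace(/\/+$/, '');
  if (path.endsWith('/edit')) {
    return 'editor';
  }
  const matchesPrefix = Boolean(publishedPrefix) && value.startsWith(publishedPrefix);
  return matchesPrefix ? 'published' : 'unknown';
}
```

Only a `published` result should count toward publish confirmation; `editor`, `unknown`, and `missing` all leave the run unconfirmed.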
|
||||
|
||||
## Known Brittle Points
|
||||
|
||||
- creator-center article entry selector
|
||||
- placeholder-based title input selector
|
||||
- generic primary-button publish selectors
|
||||
|
||||
These should be revalidated before any live publish run.
|
||||
49
skills/zhihu-write/references/publish-safety.md
Normal file
@@ -0,0 +1,49 @@
|
||||
# Publish Safety
|
||||
|
||||
Publishing is the highest-risk action in this skill. Treat it as a gated operation.
|
||||
|
||||
## Mandatory Safety Rules
|
||||
|
||||
- Do not publish without explicit human confirmation in the current conversation.
|
||||
- Do not assume `publish: true` in an old request still reflects the user’s latest intent.
|
||||
- Do not treat a successful click as proof of publication.
|
||||
|
||||
## What Must Be Verified
|
||||
|
||||
At least one of these post-publish checks should succeed:
|
||||
|
||||
- the final URL is a published Zhihu article URL
|
||||
- the visible page title matches the requested article title
|
||||
|
||||
If both checks fail, report the run as unconfirmed.
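The "at least one check" rule can be sketched as follows. The helper names (`normalizeTitle`, `verifyPublish`) and the `publishedPrefix` parameter are assumptions; the source does not define what a published Zhihu article URL looks like, so the caller has to supply that prefix.

```javascript
// Hypothetical helpers; not part of the skill package.
function normalizeTitle(value) {
  return String(value || '').replace(/\s+/g, ' ').trim();
}

function verifyPublish(check) {
  const finalUrl = String(check.finalUrl || '');
  // URL check: matches the caller-supplied prefix and is not an edit URL.
  const urlOk = Boolean(check.publishedPrefix) &&
    finalUrl.startsWith(check.publishedPrefix) &&
    !/\/edit\/?$/.test(finalUrl);
  // Title check: whitespace-normalized exact match against the requested title.
  const expected = normalizeTitle(check.expectedTitle);
  const titleOk = expected !== '' && normalizeTitle(check.observedTitle) === expected;
  return { urlOk, titleOk, status: urlOk || titleOk ? 'confirmed' : 'unconfirmed' };
}
```

Returning both flags alongside the status lets the report say which check succeeded instead of only stating the outcome.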
|
||||
|
||||
## Failure Cases
|
||||
|
||||
### Title verification fails
|
||||
|
||||
- Stop.
|
||||
- Report the expected title and the observed title.
|
||||
- Do not claim the article was published correctly.
|
||||
|
||||
### URL remains in edit mode
|
||||
|
||||
- Treat the result as draft or unconfirmed publish.
|
||||
- Report that the browser stayed on an editor-style URL.
|
||||
- Ask for manual review before any retry.
|
||||
|
||||
### Publish dialog does not appear
|
||||
|
||||
- Do not retry blindly on generic primary buttons.
|
||||
- Report that the dialog selector failed.
|
||||
- Revalidate selectors and page state first.
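The three failure cases can be mapped to report objects so that none of them ever claims a successful publish. This is an illustrative sketch; `reportPublishFailure`, the `kind` labels, and the `next_step` values are hypothetical names, not part of the skill.

```javascript
// Hypothetical mapping from failure case to report; all names are illustrative.
function reportPublishFailure(kind, details) {
  const info = details || {};
  switch (kind) {
    case 'title_mismatch':
      return {
        claim_published: false,
        next_step: 'stop',
        message: `expected title "${info.expected}", observed "${info.observed}"`,
      };
    case 'url_in_edit_mode':
      return {
        claim_published: false,
        next_step: 'manual_review',
        message: `browser stayed on an editor-style URL: ${info.url}`,
      };
    case 'dialog_missing':
      return {
        claim_published: false,
        next_step: 'revalidate_selectors',
        message: 'publish confirm dialog selector failed',
      };
    default:
      return {
        claim_published: false,
        next_step: 'manual_review',
        message: `unrecognized failure: ${kind}`,
      };
  }
}
```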
|
||||
|
||||
## Brittle Selectors To Revalidate First
|
||||
|
||||
- `div.css-1q62b6s > div.css-byu4by`
|
||||
- `textarea[placeholder='请输入标题(最多 100 个字)']`
|
||||
- `button.Button--primary.Button--blue`
|
||||
- `div[role='dialog'] button.Button--primary.Button--blue`
|
||||
- `h1`
|
||||
|
||||
These are usable as source references, but they will drift as the page changes, so do not treat them as stable.
|
||||
|
||||
179
skills/zhihu-write/scripts/fill_article_draft.js
Normal file
@@ -0,0 +1,179 @@
|
||||
function cleanText(value) {
|
||||
return String(value || '')
|
||||
.replace(/\s+/g, ' ')
|
||||
.replace(/\u200b/g, '')
|
||||
.trim();
|
||||
}
|
||||
|
||||
function pageText() {
|
||||
const body = document.body;
|
||||
return cleanText(body && (body.innerText || body.textContent || ''));
|
||||
}
|
||||
|
||||
function isLoginBlocked(url, text) {
|
||||
return /\/signin\b|\/signup\b/.test(url) ||
|
||||
/登录|注册|验证码|安全验证|验证后继续|请先登录/.test(text);
|
||||
}
|
||||
|
||||
function isVisible(node) {
|
||||
if (!node) {
|
||||
return false;
|
||||
}
|
||||
const rect = typeof node.getBoundingClientRect === 'function' ? node.getBoundingClientRect() : null;
|
||||
return !rect || rect.width > 0 || rect.height > 0;
|
||||
}
|
||||
|
||||
function attrText(node, name) {
|
||||
if (!node || typeof node.getAttribute !== 'function') {
|
||||
return '';
|
||||
}
|
||||
return cleanText(node.getAttribute(name) || '');
|
||||
}
|
||||
|
||||
function looksLikeTitleInput(node) {
|
||||
const signals = [
|
||||
attrText(node, 'placeholder'),
|
||||
attrText(node, 'data-placeholder'),
|
||||
attrText(node, 'aria-label'),
|
||||
].filter(Boolean);
|
||||
return signals.some((value) => value.includes('标题'));
|
||||
}
|
||||
|
||||
function dispatchTextInput(node) {
|
||||
node.dispatchEvent(new Event('input', { bubbles: true, composed: true }));
|
||||
node.dispatchEvent(new Event('change', { bubbles: true, composed: true }));
|
||||
}
|
||||
|
||||
function fillInput(node, value) {
|
||||
node.focus();
|
||||
if ('value' in node) {
|
||||
node.value = '';
|
||||
dispatchTextInput(node);
|
||||
node.value = value;
|
||||
dispatchTextInput(node);
|
||||
return;
|
||||
}
|
||||
|
||||
node.textContent = value;
|
||||
dispatchTextInput(node);
|
||||
}
|
||||
|
||||
function fillEditable(node, value) {
|
||||
node.focus();
|
||||
node.innerHTML = '';
|
||||
const lines = String(value || '').split(/\n/);
|
||||
lines.forEach((line, index) => {
|
||||
if (index > 0) {
|
||||
node.appendChild(document.createElement('br'));
|
||||
}
|
||||
node.appendChild(document.createTextNode(line));
|
||||
});
|
||||
dispatchTextInput(node);
|
||||
}
|
||||
|
||||
function findVisible(selectors) {
|
||||
for (const selector of selectors) {
|
||||
const nodes = Array.from(document.querySelectorAll(selector));
|
||||
const match = nodes.find((node) => {
|
||||
return isVisible(node);
|
||||
});
|
||||
if (match) {
|
||||
return match;
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
function findBodyEditor(titleInput) {
|
||||
for (const selector of [
|
||||
"div[contenteditable='true'][role='textbox']",
|
||||
"div.public-DraftEditor-content[contenteditable='true']",
|
||||
"[role='textbox'][contenteditable='true']",
|
||||
"[contenteditable='true'][data-placeholder]",
|
||||
"div[contenteditable='true']",
|
||||
]) {
|
||||
const nodes = Array.from(document.querySelectorAll(selector));
|
||||
const match = nodes.find((node) => isVisible(node) && node !== titleInput && !looksLikeTitleInput(node));
|
||||
if (match) {
|
||||
return match;
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
function findButtonByText(fragment) {
|
||||
const candidates = Array.from(document.querySelectorAll("button, [role='button'], a"));
|
||||
const wanted = cleanText(fragment);
|
||||
return candidates.find((node) => {
|
||||
const text = cleanText(node.textContent);
|
||||
if (!text || !text.includes(wanted)) {
|
||||
return false;
|
||||
}
|
||||
return isVisible(node);
|
||||
}) || null;
|
||||
}
|
||||
|
||||
const currentUrl = location.href;
|
||||
const text = pageText();
|
||||
if (isLoginBlocked(currentUrl, text)) {
|
||||
return {
|
||||
status: 'login_required',
|
||||
current_url: currentUrl,
|
||||
};
|
||||
}
|
||||
|
||||
const titleInput = findVisible([
|
||||
"textarea[placeholder*='标题']",
|
||||
"input[placeholder*='标题']",
|
||||
"textarea[data-placeholder*='标题']",
|
||||
"input[data-placeholder*='标题']",
|
||||
"[role='textbox'][aria-label*='标题']",
|
||||
"[contenteditable='true'][aria-label*='标题']",
|
||||
"[contenteditable='true'][data-placeholder*='标题']",
|
||||
]);
|
||||
const bodyEditor = findBodyEditor(titleInput);
|
||||
|
||||
if (!titleInput || !bodyEditor) {
|
||||
return {
|
||||
status: 'editor_not_ready',
|
||||
current_url: currentUrl,
|
||||
};
|
||||
}
|
||||
|
||||
fillInput(titleInput, String(args.title || ''));
|
||||
fillEditable(bodyEditor, String(args.body || ''));
|
||||
|
||||
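// publish_mode may arrive as a boolean or a string; only a value that stringifies to "true" (case-insensitive) enables the publish path.
const publishMode = String(args.publish_mode || '').toLowerCase() === 'true';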
|
||||
if (!publishMode) {
|
||||
return {
|
||||
status: 'draft_ready',
|
||||
current_url: currentUrl,
|
||||
title: cleanText(args.title),
|
||||
};
|
||||
}
|
||||
|
||||
const publishButton = findButtonByText('发布');
|
||||
if (!publishButton) {
|
||||
return {
|
||||
status: 'publish_button_missing',
|
||||
current_url: currentUrl,
|
||||
title: cleanText(args.title),
|
||||
};
|
||||
}
|
||||
publishButton.click();
|
||||
|
||||
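// This lookup runs synchronously right after the click; if the confirm dialog has not rendered yet, the script reports publish_clicked.
const confirmButton = findButtonByText('确认发布');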
|
||||
if (!confirmButton) {
|
||||
return {
|
||||
status: 'publish_clicked',
|
||||
current_url: currentUrl,
|
||||
title: cleanText(args.title),
|
||||
};
|
||||
}
|
||||
confirmButton.click();
|
||||
|
||||
return {
|
||||
status: 'publish_submitted',
|
||||
current_url: currentUrl,
|
||||
title: cleanText(args.title),
|
||||
};
|
||||
106
skills/zhihu-write/scripts/prepare_article_editor.js
Normal file
@@ -0,0 +1,106 @@
|
||||
function cleanText(value) {
|
||||
return String(value || '')
|
||||
.replace(/\s+/g, ' ')
|
||||
.replace(/\u200b/g, '')
|
||||
.trim();
|
||||
}
|
||||
|
||||
function pageText() {
|
||||
const body = document.body;
|
||||
return cleanText(body && (body.innerText || body.textContent || ''));
|
||||
}
|
||||
|
||||
function isLoginBlocked(url, text) {
|
||||
return /\/signin\b|\/signup\b/.test(url) ||
|
||||
/登录|注册|验证码|安全验证|验证后继续|请先登录/.test(text);
|
||||
}
|
||||
|
||||
function isVisible(node) {
|
||||
if (!node) {
|
||||
return false;
|
||||
}
|
||||
const rect = typeof node.getBoundingClientRect === 'function' ? node.getBoundingClientRect() : null;
|
||||
return !rect || rect.width > 0 || rect.height > 0;
|
||||
}
|
||||
|
||||
function attrText(node, name) {
|
||||
if (!node || typeof node.getAttribute !== 'function') {
|
||||
return '';
|
||||
}
|
||||
return cleanText(node.getAttribute(name) || '');
|
||||
}
|
||||
|
||||
function looksLikeTitleInput(node) {
|
||||
const signals = [
|
||||
attrText(node, 'placeholder'),
|
||||
attrText(node, 'data-placeholder'),
|
||||
attrText(node, 'aria-label'),
|
||||
].filter(Boolean);
|
||||
return signals.some((value) => value.includes('标题'));
|
||||
}
|
||||
|
||||
function findVisibleTitleInput() {
|
||||
const selectors = [
|
||||
"textarea[placeholder*='标题']",
|
||||
"input[placeholder*='标题']",
|
||||
"textarea[data-placeholder*='标题']",
|
||||
"input[data-placeholder*='标题']",
|
||||
"[role='textbox'][aria-label*='标题']",
|
||||
"[contenteditable='true'][aria-label*='标题']",
|
||||
"[contenteditable='true'][data-placeholder*='标题']",
|
||||
];
|
||||
for (const selector of selectors) {
|
||||
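// Check every match, not only the first, so a hidden first match does not mask a visible title input (same behavior as findVisible in fill_article_draft.js).
const node = Array.from(document.querySelectorAll(selector)).find((candidate) => isVisible(candidate));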
|
||||
if (isVisible(node)) {
|
||||
return node;
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
function findBodyEditor(titleInput) {
|
||||
const selectors = [
|
||||
"div[contenteditable='true'][role='textbox']",
|
||||
"div.public-DraftEditor-content[contenteditable='true']",
|
||||
"[role='textbox'][contenteditable='true']",
|
||||
"[contenteditable='true'][data-placeholder]",
|
||||
"div[contenteditable='true']",
|
||||
];
|
||||
for (const selector of selectors) {
|
||||
const nodes = Array.from(document.querySelectorAll(selector));
|
||||
const visible = nodes.find((node) => {
|
||||
return isVisible(node) && node !== titleInput && !looksLikeTitleInput(node);
|
||||
});
|
||||
if (visible) {
|
||||
return visible;
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
const currentUrl = location.href;
|
||||
const text = pageText();
|
||||
|
||||
if (isLoginBlocked(currentUrl, text)) {
|
||||
return {
|
||||
status: 'login_required',
|
||||
current_url: currentUrl,
|
||||
};
|
||||
}
|
||||
|
||||
const titleInput = findVisibleTitleInput();
|
||||
const bodyEditor = findBodyEditor(titleInput);
|
||||
|
||||
if (titleInput && bodyEditor) {
|
||||
return {
|
||||
status: 'editor_ready',
|
||||
current_url: currentUrl,
|
||||
title_placeholder: titleInput.getAttribute('placeholder') || '',
|
||||
};
|
||||
}
|
||||
|
||||
return {
|
||||
status: 'editor_unavailable',
|
||||
current_url: currentUrl,
|
||||
desired_mode: String(args.desired_mode || 'draft'),
|
||||
};
|
||||