skill-lib/skills/zhihu-hotlist/SKILL.md
木炎 51913555ad feat: add initial skill authoring workspace
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 18:34:56 +08:00

---
name: zhihu-hotlist
description: Use when the user wants to collect, snapshot, summarize, or report Zhihu hot list items and related comment metrics from browser-visible page data.
version: 0.1.0
author: sgclaw
tags:
- zhihu
- browser
- hotlist
---
# Zhihu Hotlist
Collect Zhihu hot list items, optionally collect visible comment metrics from each item's detail page, and render a compact report from the resulting snapshot. Use this skill for hotlist collection and reporting, not for article editing or general Zhihu navigation.
## When to Use
- The user asks to collect Zhihu hot list data.
- The user asks for a snapshot, ranking summary, or report of current Zhihu hot list items.
- The user wants visible comment metrics such as replies, upvotes, favorites, or heart counts from hot items.
- The task needs a structured report from an existing or newly captured snapshot.
Do not use this skill for:
- arbitrary Zhihu page navigation without hotlist collection
- writing or publishing Zhihu articles
- claiming complete data quality when comment collection partially fails
## Workflow
1. Decide whether the task is a collection run, a report run, or both.
2. For collection runs, call the packaged browser script tool `zhihu-hotlist.extract_hotlist` before any generic `getText` probing.
3. For collection rules and guard conditions, follow [collection-flow.md](references/collection-flow.md).
4. Inside the packaged script, prefer stable structured page state first, then broader DOM candidates, then controlled page-text fallback.
5. Produce the `Export Artifact` immediately after the browser data is stable.
6. If the page is blocked by login, captcha, or anti-bot state, fail explicitly instead of collapsing the issue into "no rows".
7. Surface partial failures explicitly instead of hiding them behind a success summary.
8. For report runs, format output using [report-format.md](references/report-format.md).
9. Apply the caution rules in [data-quality.md](references/data-quality.md) whenever metrics are partial, missing, or inferred from fragile selectors.
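Step 4's priority order can be pictured as a simple fallback chain. The sketch below is illustrative only: the packaged `zhihu-hotlist.extract_hotlist` script runs in the browser and its internals are not shown here, so `extract_with_fallbacks` and the strategy names are hypothetical. Each strategy returns rows or `None`, and the first non-empty result wins, so the broad page-text fallback never runs once structured state has succeeded.

```python
from __future__ import annotations

from typing import Callable, Optional

# A row is [rank, title, heat], matching the Export Artifact columns.
Row = list
# Hypothetical strategy type: returns rows, or None/[] when it finds nothing.
Strategy = Callable[[], Optional[list]]


def extract_with_fallbacks(strategies: list[tuple[str, Strategy]]) -> tuple[str, list[Row]]:
    """Run strategies in priority order; return (name, rows) from the first that yields rows.

    Sketch only: this helper and the strategy names are hypothetical,
    not the packaged script's API.
    """
    for name, strategy in strategies:
        rows = strategy()
        if rows:
            # Stop at the first successful tier; later, broader tiers never run.
            return name, rows
    # Nothing found: the caller must still check for a blocked page
    # before reporting an empty hotlist.
    return "none", []


if __name__ == "__main__":
    def from_page_state():  # structured page state (tier 1)
        return None

    def from_dom():  # broader DOM candidates (tier 2)
        return [[1, "Example title", "344万"]]

    def from_page_text():  # controlled page-text fallback (tier 3)
        return [[1, "Should not be used", "0"]]

    print(extract_with_fallbacks([
        ("page_state", from_page_state),
        ("dom", from_dom),
        ("page_text", from_page_text),
    ]))
```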
## SuperRPA Interface Contract
- Inside the sgClaw browser host, prefer `superrpa_browser` for Zhihu page actions. `browser_action` is only the compatibility alias.
- Always pass `expected_domain` as the bare hostname only, for example `www.zhihu.com`.
- All selectors must be valid CSS selectors because the host executes `document.querySelector(...)`.
- Never use XPath or jQuery-style pseudo-selectors such as `:contains(...)`.
- Prefer canonical route navigation such as `https://www.zhihu.com/hot` before fallback click chains.
- The primary deterministic extractor is the packaged browser script tool `zhihu-hotlist.extract_hotlist`.
- Use generic `getText` only as a last-resort fallback when the packaged extractor fails.
- Do not keep thrashing through selector variants once the packaged script has already produced the structured artifact.
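Two of the contract rules above are mechanical enough to check before a call reaches the host. The helpers below are a hypothetical pre-flight sketch, not part of `superrpa_browser`: `bare_hostname` reduces a URL to the hostname that `expected_domain` expects, and `selector_problem` flags XPath and a few common jQuery-only pseudo-selectors. The pseudo-selector list is an assumption and is not exhaustive; the authoritative test is whether `document.querySelector` accepts the selector in the browser.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, non-exhaustive list of jQuery-only pseudo-selectors.
# The negative lookahead keeps valid CSS such as :first-child from matching.
_JQUERY_PSEUDO = re.compile(
    r":(contains|eq|gt|lt|first|last|even|odd|visible|hidden)(?![\w-])"
)


def bare_hostname(value: str) -> str:
    """Reduce 'https://www.zhihu.com/hot' or 'www.zhihu.com' to 'www.zhihu.com'."""
    # urlsplit only recognises a host after a scheme or a leading '//'.
    if "//" not in value:
        value = "//" + value
    host = urlsplit(value).hostname
    if not host:
        raise ValueError(f"no hostname found in {value!r}")
    return host


def selector_problem(selector: str) -> Optional[str]:
    """Return a reason the selector is unsuitable for querySelector, or None if none was found."""
    s = selector.strip()
    if not s:
        return "empty selector"
    # XPath expressions typically start with '/', './' or '(' (e.g. '(//div)[1]').
    if s.startswith(("/", "./", "(")):
        return "looks like XPath; use a CSS selector"
    match = _JQUERY_PSEUDO.search(s)
    if match:
        return f"jQuery-only pseudo-selector ':{match.group(1)}' is not valid CSS"
    return None
```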
## Partial-Failure Rule
- If hotlist items are captured but some comment-metric collections fail, report the run as partial.
- Include how many items lacked comment metrics.
- Do not phrase the result as fully complete when `partial_items > 0`.
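A minimal sketch of the partial-failure rule, assuming each collected item records its comment metrics as `None` when collection failed. The `HotItem` type and `completeness` helper are hypothetical. Keeping a failure as `None`, rather than an empty or zeroed value, is what makes `partial_items` countable.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class HotItem:
    rank: int
    title: str
    heat: str
    # None means comment-metric collection failed for this item; it is not "zero comments".
    comment_metrics: Optional[dict] = None


def completeness(items: list[HotItem]) -> dict:
    """Summarise comment-metric coverage for a collection run (hypothetical helper)."""
    partial_items = sum(1 for item in items if item.comment_metrics is None)
    return {
        "item_count": len(items),
        "partial_items": partial_items,
        # Never report "complete" while any item lacks metrics.
        "comment_metrics": "partial" if partial_items > 0 else "complete",
    }
```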
## Blocked-Page Rule
- If Zhihu responds with a login wall, captcha, security verification page, or anti-bot interstitial, state that condition explicitly.
- Do not misreport those states as ordinary "empty hotlist" outcomes.
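One way to keep blocked pages distinct from an empty hotlist is to check for guard text before deciding that the result is empty. In this sketch, `classify_page` is hypothetical and the marker strings are English placeholders; the real guard text should come from `assets/zhihu_hotlist_flow.source.json`.

```python
from typing import Mapping, Sequence


def classify_page(page_text: str, rows: Sequence, guard_markers: Mapping[str, Sequence[str]]) -> str:
    """Return a blocked-state name, 'ok', or 'empty' (hypothetical helper).

    Guard markers are checked first, so a login wall or captcha is never
    reported as an ordinary empty hotlist.
    """
    lowered = page_text.lower()
    for state, markers in guard_markers.items():
        if any(marker.lower() in lowered for marker in markers):
            return state
    return "ok" if rows else "empty"


# Placeholder markers for illustration only; substitute the guard text from the source flow.
EXAMPLE_MARKERS = {
    "login_wall": ["please log in"],
    "captcha": ["captcha"],
    "security_check": ["security verification"],
}
```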
## Export Artifact
The primary output of this skill is a structured artifact for downstream Office export; any prose summary is secondary.
Return this shape as soon as hotlist collection is complete:
```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```
Rules:
- `sheet_name` must be exactly `知乎热榜` ("Zhihu Hot List").
- `columns` must remain `["rank", "title", "heat"]`.
- `rows` must preserve the collected ranking order from the page.
- Each row must contain exactly three values: numeric rank, title text, and heat text.
- If fewer rows than requested are visible, return the visible rows and mark the result as partial.
- After the artifact is complete, stop exploratory tool use and do not resume open-ended browsing.
- Do not switch to `shell`, `glob_search`, or unrelated file browsing once the hotlist rows are collected.
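The rules above can be checked mechanically before the artifact is handed to the Office export. This is a hypothetical validator sketch, not part of the skill package. It treats ascending `rank` values as a proxy for preserved page order, which assumes ranks were read from the page rather than renumbered.

```python
from __future__ import annotations

from typing import Any

SHEET_NAME = "知乎热榜"
COLUMNS = ["rank", "title", "heat"]


def validate_artifact(artifact: dict[str, Any], requested_top_n: int) -> tuple[list[str], bool]:
    """Return (problems, is_partial) for an Export Artifact (hypothetical helper)."""
    problems: list[str] = []
    if artifact.get("sheet_name") != SHEET_NAME:
        problems.append(f"sheet_name must be exactly {SHEET_NAME!r}")
    if artifact.get("columns") != COLUMNS:
        problems.append(f"columns must be {COLUMNS}")

    rows = artifact.get("rows")
    if not isinstance(rows, list):
        problems.append("rows must be a list")
        return problems, True

    ranks: list[int] = []
    for i, row in enumerate(rows):
        if not isinstance(row, list) or len(row) != 3:
            problems.append(f"row {i}: expected exactly three values")
            continue
        rank, title, heat = row
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(rank, int) or isinstance(rank, bool):
            problems.append(f"row {i}: rank must be an integer")
        else:
            ranks.append(rank)
        if not isinstance(title, str) or not isinstance(heat, str):
            problems.append(f"row {i}: title and heat must be text")

    # Proxy for "preserve the collected ranking order from the page".
    if ranks != sorted(ranks):
        problems.append("rows are not in page ranking order")

    # Fewer visible rows than requested is allowed, but the run is partial.
    is_partial = len(rows) < requested_top_n
    return problems, is_partial
```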
## Output
Return a concise result with:
- operation type: `collect` or `report`
- requested `top_n`
- snapshot identifier when available
- item count
- whether comment metrics are complete or partial
- any missing or weak data areas
- the `Export Artifact` block shown above
- an optional short prose summary only after the artifact
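As a sketch of the ordering above, the hypothetical `build_result` helper assembles the listed fields and appends any prose summary after the artifact. The key names are illustrative, not a fixed schema. Python dicts preserve insertion order, so when a summary is present it is always the last key.

```python
from __future__ import annotations

from typing import Any, Optional


def build_result(
    operation: str,
    top_n: int,
    artifact: dict[str, Any],
    partial_items: int,
    weak_areas: list[str],
    snapshot_id: Optional[str] = None,
    summary: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a run result; key names are hypothetical."""
    if operation not in ("collect", "report"):
        raise ValueError("operation must be 'collect' or 'report'")
    result: dict[str, Any] = {
        "operation": operation,
        "top_n": top_n,
        "snapshot_id": snapshot_id,  # None when no identifier is available
        "item_count": len(artifact.get("rows", [])),
        "comment_metrics": "partial" if partial_items > 0 else "complete",
        "weak_data_areas": list(weak_areas),
        "export_artifact": artifact,
    }
    if summary:
        # Prose is optional and always comes after the artifact.
        result["summary"] = summary
    return result
```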
## References
- Use [collection-flow.md](references/collection-flow.md) for browser-side collection steps.
- Use [report-format.md](references/report-format.md) for report rendering.
- Use [data-quality.md](references/data-quality.md) before making claims about completeness.
- Use `assets/zhihu_hotlist_flow.source.json` for exact selectors and guard text from the source flow.
## Common Mistakes
- Treating visible hotlist capture as equivalent to complete comment-metric capture.
- Forgetting that report mode can use an existing snapshot instead of re-collecting.
- Ignoring weak selectors and generic button captures in comment areas.
- Reporting zeros as if they were confirmed values when the DOM capture may be incomplete.
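The last mistake often starts in parsing. Here is a minimal sketch, assuming visible counts appear as plain digits or with a `万` (10,000) or `亿` (100,000,000) suffix, like the `344万` heat value in the artifact example; `parse_count` is hypothetical. Any missing or unrecognised text becomes `None`, so only an explicit zero on the page is reported as zero.

```python
import re
from typing import Optional

# Optional Chinese magnitude suffixes: 万 = 10^4, 亿 = 10^8.
_COUNT = re.compile(r"^(\d+(?:\.\d+)?)(万|亿)?$")
_MULTIPLIER = {None: 1, "万": 10_000, "亿": 100_000_000}


def parse_count(text: Optional[str]) -> Optional[int]:
    """Parse '1.2万' -> 12000 or '1,024' -> 1024; return None, never 0, when unreadable."""
    if text is None:
        return None
    cleaned = text.replace(",", "").replace(" ", "").strip()
    match = _COUNT.match(cleaned)
    if not match:
        # Missing or unfamiliar format: unknown, not a confirmed zero.
        return None
    return round(float(match.group(1)) * _MULTIPLIER[match.group(2)])
```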