skill-lib/skills/zhihu-hotlist/SKILL.md
木炎 51913555ad feat: add initial skill authoring workspace
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 18:34:56 +08:00


name: zhihu-hotlist
description: Use when the user wants to collect, snapshot, summarize, or report Zhihu hot list items and related comment metrics from browser-visible page data.
version: 0.1.0
author: sgclaw
tags: zhihu, browser, hotlist

Zhihu Hotlist

Collect Zhihu hot list items, optionally collect visible comment metrics from each item's detail page, and render a compact report from the resulting snapshot. Use this skill for hotlist collection and reporting, not for article editing or general Zhihu navigation.

When to Use

  • The user asks to collect Zhihu hot list data.
  • The user asks for a snapshot, ranking summary, or report of current Zhihu hot list items.
  • The user wants visible comment metrics such as replies, upvotes, favorites, or heart counts from hot items.
  • The task needs a structured report from an existing or newly captured snapshot.

Do not use this skill for:

  • arbitrary Zhihu page navigation without hotlist collection
  • writing or publishing Zhihu articles
  • claiming complete data quality when comment collection partially fails

Workflow

  1. Decide whether the task is a collection run, a report run, or both.
  2. For collection runs, call the packaged browser script tool zhihu-hotlist.extract_hotlist before any generic getText probing.
  3. For collection rules and guard conditions, follow collection-flow.md.
  4. Inside the packaged script, prefer stable structured page state first, then broader DOM candidates, then controlled page-text fallback.
  5. Produce the Export Artifact immediately after the browser data is stable.
  6. If the page is blocked by login, captcha, or anti-bot state, fail explicitly instead of collapsing the issue into "no rows".
  7. Surface partial failures explicitly instead of hiding them behind a success summary.
  8. For report runs, format output using report-format.md.
  9. Apply the caution rules in data-quality.md whenever metrics are partial, missing, or inferred from fragile selectors.
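
The extraction order in step 4 can be sketched as a simple fallback chain. This is a minimal illustration, not the packaged script: the strategy names and the `extract_with_fallback` helper are hypothetical, and each strategy is assumed to be a zero-argument callable that returns a list of rows or nothing.

```python
from typing import Callable, List, Optional, Tuple

Rows = List[list]
Strategy = Tuple[str, Callable[[], Optional[Rows]]]


def extract_with_fallback(strategies: List[Strategy]) -> Tuple[str, Rows]:
    """Try each strategy in the given order and return the first non-empty result.

    The caller orders the list as: structured page state, then broader DOM
    candidates, then controlled page-text fallback.
    """
    for name, strategy in strategies:
        rows = strategy()
        if rows:
            # Stop at the first strategy that yields rows; do not keep probing.
            return name, rows
    raise LookupError("no extraction strategy produced hotlist rows")


# Hypothetical strategies standing in for the real page probes.
strategies = [
    ("structured_state", lambda: None),
    ("dom_candidates", lambda: [[1, "标题", "344万"]]),
    ("page_text", lambda: []),
]
source, rows = extract_with_fallback(strategies)
```

Returning the strategy name alongside the rows makes it possible to flag results that came from the weaker fallbacks when applying data-quality.md.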

SuperRPA Interface Contract

  • Inside the sgClaw browser host, prefer superrpa_browser for Zhihu page actions. browser_action is only the compatibility alias.
  • Always pass expected_domain as the bare hostname only, for example www.zhihu.com.
  • All selectors must be valid CSS selectors because the host executes document.querySelector(...).
  • Never use XPath or jQuery-style pseudo-selectors such as :contains(...).
  • Prefer canonical route navigation such as https://www.zhihu.com/hot before fallback click chains.
  • The primary deterministic extractor is the packaged browser script tool zhihu-hotlist.extract_hotlist.
  • Use generic getText only as a last-resort fallback when the packaged extractor fails.
  • Do not keep thrashing through selector variants once the packaged script has already produced the structured artifact.
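
A pre-flight check for the `expected_domain` and selector rules might look like the sketch below. It is a heuristic under stated assumptions, not part of the host API: `is_bare_hostname` and `selector_problem` are hypothetical helpers, the list of jQuery-only pseudo-selectors is deliberately incomplete, and the example selectors are made up rather than taken from the source flow.

```python
import re
from typing import Optional

# Labels separated by dots, ending in an alphabetic TLD; no scheme, path, or port.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)

# A few jQuery extensions that document.querySelector rejects (not exhaustive).
JQUERY_ONLY = (":contains(", ":eq(", ":visible", ":hidden")


def is_bare_hostname(value: str) -> bool:
    """True for values such as www.zhihu.com, False for URLs or host:port."""
    return bool(HOSTNAME_RE.match(value))


def selector_problem(selector: str) -> Optional[str]:
    """Return a short reason if the selector is obviously not plain CSS, else None."""
    s = selector.strip()
    if not s:
        return "empty selector"
    if s.startswith(("/", "./", "(")):
        return "looks like XPath"
    for pseudo in JQUERY_ONLY:
        if pseudo in s:
            return "jQuery-only pseudo-selector " + pseudo
    return None
```

A `None` result only means no obvious problem was found; the browser remains the final judge of whether a selector parses.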

Partial-Failure Rule

  • If hotlist items are captured but some comment-metric collections fail, report the run as partial.
  • Include how many items lacked comment metrics.
  • Do not phrase the result as fully complete when partial_items > 0.
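
As a rough sketch of this rule, the run status can be derived directly from the count of items that lack comment metrics. The `summarize_run` helper and its field names other than `partial_items` are assumptions made for illustration.

```python
def summarize_run(total_items: int, partial_items: int) -> dict:
    """Label a run as partial whenever any item lacks comment metrics."""
    if partial_items < 0 or partial_items > total_items:
        raise ValueError("partial_items must be between 0 and total_items")
    status = "partial" if partial_items > 0 else "complete"
    return {
        "status": status,
        "total_items": total_items,
        "partial_items": partial_items,
        "message": f"{partial_items} of {total_items} items lacked comment metrics",
    }
```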

Blocked-Page Rule

  • If Zhihu responds with a login wall, captcha, security verification page, or anti-bot interstitial, state that condition explicitly.
  • Do not misreport those states as ordinary "empty hotlist" outcomes.
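
One way to keep blocked pages distinct from empty results is to check for guard text before concluding that the hotlist is empty. The sketch below assumes the visible page text is available as a string; the marker strings are illustrative placeholders only, and the real guard text should come from assets/zhihu_hotlist_flow.source.json.

```python
from typing import List, Optional

# Placeholder markers (assumption), not the guard text from the source flow.
BLOCK_MARKERS = {
    "captcha": ("验证码", "captcha"),
    "security_verification": ("安全验证",),
    "login_wall": ("登录后查看", "请先登录"),
}


def detect_block(page_text: str) -> Optional[str]:
    """Return the first matching block reason, or None if no marker is present."""
    lowered = page_text.lower()
    for reason, markers in BLOCK_MARKERS.items():
        if any(marker.lower() in lowered for marker in markers):
            return reason
    return None


def classify_outcome(rows: List[list], page_text: str) -> str:
    """Distinguish 'ok', 'blocked:<reason>', and a genuinely 'empty' hotlist."""
    if rows:
        return "ok"
    reason = detect_block(page_text)
    if reason is not None:
        return "blocked:" + reason
    return "empty"
```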

Export Artifact

The primary output of this skill is a structured artifact for downstream Office export; any prose summary is secondary.

Return this shape as soon as hotlist collection is complete:

{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}

Rules:

  • sheet_name must be exactly 知乎热榜.
  • columns must remain ["rank", "title", "heat"].
  • rows must preserve the collected ranking order from the page.
  • Each row must contain exactly three values: numeric rank, title text, and heat text.
  • If fewer rows than requested are visible, return the visible rows and mark the result as partial.
  • After the artifact is complete, stop exploratory tool use and do not resume browser wandering.
  • Do not switch to shell, glob_search, or unrelated file browsing once the hotlist rows are collected.
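
The shape rules above can be checked mechanically before the artifact is handed to the export step. The validator below is a minimal sketch; `validate_artifact` is a hypothetical helper, and it covers only the sheet name, columns, and row shape, not ranking order or partial-result marking.

```python
from typing import List

EXPECTED_SHEET = "知乎热榜"
EXPECTED_COLUMNS = ["rank", "title", "heat"]


def validate_artifact(artifact: dict) -> List[str]:
    """Return a list of problems; an empty list means the shape rules hold."""
    problems = []
    if artifact.get("sheet_name") != EXPECTED_SHEET:
        problems.append("sheet_name must be exactly " + EXPECTED_SHEET)
    if artifact.get("columns") != EXPECTED_COLUMNS:
        problems.append("columns must remain " + str(EXPECTED_COLUMNS))
    rows = artifact.get("rows")
    if not isinstance(rows, list):
        problems.append("rows must be a list")
        return problems
    for index, row in enumerate(rows):
        if not isinstance(row, list) or len(row) != 3:
            problems.append(f"row {index}: expected exactly three values")
            continue
        rank, title, heat = row
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(rank, bool) or not isinstance(rank, int):
            problems.append(f"row {index}: rank must be numeric")
        if not isinstance(title, str) or not isinstance(heat, str):
            problems.append(f"row {index}: title and heat must be text")
    return problems


artifact = {
    "source": "https://www.zhihu.com/hot",
    "sheet_name": "知乎热榜",
    "columns": ["rank", "title", "heat"],
    "rows": [[1, "标题", "344万"]],
}
problems = validate_artifact(artifact)
```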

Output

Return a concise result with:

  • operation type: collect or report
  • requested top_n
  • snapshot identifier when available
  • item count
  • whether comment metrics are complete or partial
  • any missing or weak data areas
  • the Export Artifact block shown above
  • an optional short prose summary only after the artifact

References

  • Use collection-flow.md for browser-side collection steps.
  • Use report-format.md for report rendering.
  • Use data-quality.md before making claims about completeness.
  • Use assets/zhihu_hotlist_flow.source.json for exact selectors and guard text from the source flow.

Common Mistakes

  • Treating visible hotlist capture as equivalent to complete comment-metric capture.
  • Forgetting that report mode can use an existing snapshot instead of recollecting.
  • Ignoring weak selectors and generic button captures in comment areas.
  • Reporting zeros as if they were confirmed values when the DOM capture may be incomplete.
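
The last mistake is easier to avoid if missing values and confirmed zeros are kept apart at parse time. The sketch below assumes each metric arrives as raw captured text or nothing at all; `parse_count` is a hypothetical helper, and it intentionally returns `None` for abbreviated values such as "1.2万" rather than guessing a number.

```python
import re
from typing import Optional

PLAIN_INTEGER = re.compile(r"\d[\d,]*")


def parse_count(raw: Optional[str]) -> Optional[int]:
    """Return an int only for a plain integer string; None means 'not captured'."""
    if raw is None:
        return None
    text = raw.strip()
    if PLAIN_INTEGER.fullmatch(text) is None:
        # Empty, abbreviated, or otherwise unrecognized text is treated as missing.
        return None
    return int(text.replace(",", ""))
```

With this convention, a reported `0` always comes from text that actually read "0" on the page, while `None` marks a metric that should be listed under missing or weak data areas.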