| name | description | version | author | tags |
|---|---|---|---|---|
| zhihu-hotlist | Use when the user wants to collect, snapshot, summarize, or report Zhihu hot list items and related comment metrics from browser-visible page data. | 0.1.0 | sgclaw | |
Zhihu Hotlist
Collect Zhihu hot list items, optionally collect visible comment metrics from each item’s detail page, and render a compact report from the resulting snapshot. Use this skill for hotlist collection and reporting, not for article editing or general Zhihu navigation.
When to Use
- The user asks to collect Zhihu hot list data.
- The user asks for a snapshot, ranking summary, or report of current Zhihu hot list items.
- The user wants visible comment metrics such as replies, upvotes, favorites, or heart counts from hot items.
- The task needs a structured report from an existing or newly captured snapshot.
Do not use this skill for:
- arbitrary Zhihu page navigation without hotlist collection
- writing or publishing Zhihu articles
- claiming complete data quality when comment collection partially fails
Workflow
- Decide whether the task is a collection run, a report run, or both.
- For collection runs, call the packaged browser script tool `zhihu-hotlist.extract_hotlist` before any generic `getText` probing.
- For collection rules and guard conditions, follow collection-flow.md.
- Inside the packaged script, prefer stable structured page state first, then broader DOM candidates, then controlled page-text fallback.
- Produce the `Export Artifact` immediately after the browser data is stable.
- If the page is blocked by login, captcha, or anti-bot state, fail explicitly instead of collapsing the issue into "no rows".
- Surface partial failures explicitly instead of hiding them behind a success summary.
- For report runs, format output using report-format.md.
- Apply the caution rules in data-quality.md whenever metrics are partial, missing, or inferred from fragile selectors.
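The extraction order in the workflow (structured page state first, then DOM candidates, then page-text fallback, with an explicit failure on blocked pages) can be sketched as below. This is a minimal illustration, not the packaged script: `extract_with_fallback`, `BlockedPageError`, and the strategy names are hypothetical, and each strategy is assumed to be a callable that returns rows or nothing.

```python
from typing import Callable, Optional, Sequence

Rows = list[list]


class BlockedPageError(RuntimeError):
    """Raised when the page is a login wall, captcha, or anti-bot interstitial."""


def extract_with_fallback(
    strategies: Sequence[tuple[str, Callable[[], Optional[Rows]]]],
    is_blocked: Callable[[], bool],
) -> tuple[str, Rows]:
    """Try each strategy in order and return the first one that yields rows.

    Both arguments are hypothetical hooks; the real extractor decides how to
    read page state and how to detect a blocked page.
    """
    # A blocked page is a distinct failure, never an empty result.
    if is_blocked():
        raise BlockedPageError("page blocked by login, captcha, or anti-bot state")
    for name, strategy in strategies:
        rows = strategy()
        if rows:
            # Stop at the first strategy that produced rows.
            return name, rows
    raise LookupError("no strategy produced hotlist rows")
```

Passing the strategies as an ordered list keeps the preference order visible at the call site, and the returned strategy name lets the report say how fragile the source of the rows was.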
SuperRPA Interface Contract
- Inside the sgClaw browser host, prefer `superrpa_browser` for Zhihu page actions. `browser_action` is only the compatibility alias.
- Always pass `expected_domain` as the bare hostname only, for example `www.zhihu.com`.
- All selectors must be valid CSS selectors because the host executes `document.querySelector(...)`.
- Never use XPath or jQuery-style pseudo-selectors such as `:contains(...)`.
- Prefer canonical route navigation such as `https://www.zhihu.com/hot` before fallback click chains.
- The primary deterministic extractor is the packaged browser script tool `zhihu-hotlist.extract_hotlist`.
- Use generic `getText` only as a last-resort fallback when the packaged extractor fails.
- Do not keep thrashing through selector variants once the packaged script has already produced the structured artifact.
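Two of the contract rules above are easy to check before a call is sent to the host. The sketch below is one possible pre-flight check under stated assumptions: `is_bare_hostname` and `selector_problems` are hypothetical helper names, and the selector check is a heuristic that catches only obvious XPath and jQuery-only syntax. It does not prove that a selector is valid CSS.

```python
import re

# A hostname made of dot-separated labels, with no scheme, port, or path.
_HOSTNAME_RE = re.compile(r"([a-z0-9-]{1,63}\.)+[a-z]{2,63}", re.IGNORECASE)

# Pseudo-selectors that exist in jQuery but not in CSS.
_JQUERY_ONLY = (":contains(", ":eq(", ":visible")


def is_bare_hostname(value: str) -> bool:
    """True for values like www.zhihu.com, false for URLs or host/path strings."""
    return len(value) <= 253 and _HOSTNAME_RE.fullmatch(value) is not None


def selector_problems(selector: str) -> list[str]:
    """Return heuristic reasons why document.querySelector would reject a selector."""
    problems: list[str] = []
    stripped = selector.strip()
    if not stripped:
        problems.append("selector is empty")
    # XPath expressions typically start with a path step.
    if stripped.startswith(("/", "./", "(/")):
        problems.append("looks like XPath, not CSS")
    for token in _JQUERY_ONLY:
        if token in stripped:
            problems.append(f"jQuery-only pseudo-selector {token}")
    return problems
```

An empty list from `selector_problems` means only that none of the known bad patterns were found.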
Partial-Failure Rule
- If hotlist items are captured but some comment-metric collections fail, report the run as partial.
- Include how many items lacked comment metrics.
- Do not phrase the result as fully complete when `partial_items > 0`.
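As a rough illustration of the rule, the run status can be derived from the per-item metric results instead of being written by hand. The function name and the convention that a failed collection is recorded as `None` are assumptions made for this sketch; only `partial_items` comes from the rule itself.

```python
from typing import Optional, Sequence


def summarize_comment_metrics(metrics_per_item: Sequence[Optional[dict]]) -> dict:
    """Count items whose comment metrics are missing and derive the run status.

    Assumes a failed collection for an item is recorded as None.
    """
    partial_items = sum(1 for metrics in metrics_per_item if metrics is None)
    return {
        "item_count": len(metrics_per_item),
        "partial_items": partial_items,
        # Never report "complete" while any item lacks metrics.
        "status": "partial" if partial_items > 0 else "complete",
    }
```

Deriving the status from the count keeps the summary wording and the number of affected items from drifting apart.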
Blocked-Page Rule
- If Zhihu responds with a login wall, captcha, security verification page, or anti-bot interstitial, state that condition explicitly.
- Do not misreport those states as ordinary "empty hotlist" outcomes.
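One way to keep blocked states separate from empty results is to classify the page before counting rows. In this sketch `classify_page` is a hypothetical helper, and the guard markers are passed in by the caller because the real guard text is not given here.

```python
def classify_page(
    page_text: str,
    row_count: int,
    guard_markers: dict[str, tuple[str, ...]],
) -> str:
    """Return 'blocked:<state>', 'ok', or 'empty' for a captured page.

    guard_markers maps a blocked-state name to phrases that identify it;
    the caller supplies them, this sketch does not define any.
    """
    # Check guard text first so a blocked page is never reported as empty.
    for state, markers in guard_markers.items():
        if any(marker in page_text for marker in markers):
            return f"blocked:{state}"
    return "ok" if row_count > 0 else "empty"
```

With this ordering, a page with zero rows is reported as `empty` only after every blocked state has been ruled out.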
Export Artifact
The primary output of this skill is a structured artifact for downstream Office export. The structured artifact is primary. Any prose summary is secondary.
Return this shape as soon as hotlist collection is complete:
{
"source": "https://www.zhihu.com/hot",
"sheet_name": "知乎热榜",
"columns": ["rank", "title", "heat"],
"rows": [[1, "标题", "344万"]]
}
Rules:
- `sheet_name` must be exactly `知乎热榜`.
- `columns` must remain `["rank", "title", "heat"]`.
- `rows` must preserve the collected ranking order from the page.
- Each row must contain exactly three values: numeric rank, title text, and heat text.
- If fewer than the requested rows are visible, return the visible rows and mark the result as partial.
- After the artifact is complete, stop exploratory tool use and do not resume browser wandering.
- Do not switch to `shell`, `glob_search`, or unrelated file browsing once the hotlist rows are collected.
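The shape rules above can be checked mechanically before the artifact is handed to the export step. The validator below is a sketch: `validate_export_artifact` is a hypothetical name, "numeric rank" is interpreted as an integer (an assumption), and ranking order is not checked because it can only be verified against the page itself.

```python
EXPECTED_SHEET_NAME = "知乎热榜"
EXPECTED_COLUMNS = ["rank", "title", "heat"]


def validate_export_artifact(artifact: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the shape is valid."""
    errors: list[str] = []
    if artifact.get("sheet_name") != EXPECTED_SHEET_NAME:
        errors.append("sheet_name must be exactly 知乎热榜")
    if artifact.get("columns") != EXPECTED_COLUMNS:
        errors.append('columns must remain ["rank", "title", "heat"]')
    rows = artifact.get("rows")
    if not isinstance(rows, list):
        errors.append("rows must be a list")
        return errors
    for index, row in enumerate(rows):
        if not isinstance(row, list) or len(row) != 3:
            errors.append(f"row {index}: expected exactly three values")
            continue
        rank, title, heat = row
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(rank, bool) or not isinstance(rank, int):
            errors.append(f"row {index}: rank must be numeric")
        if not isinstance(title, str) or not isinstance(heat, str):
            errors.append(f"row {index}: title and heat must be text")
    return errors
```

Returning every violation at once, instead of raising on the first, makes it easier to report all weak areas in a single result.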
Output
Return a concise result with:
- operation type: `collect` or `report`
- requested `top_n`
- snapshot identifier when available
- item count
- whether comment metrics are complete or partial
- any missing or weak data areas
- the `Export Artifact` block shown above
- an optional short prose summary only after the artifact
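The fields listed above could be carried in a small record so that none of them is dropped when the result is assembled. This is only a sketch: the class name `HotlistResult` and every field name except `top_n` are hypothetical, since no wire format for the result is defined here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HotlistResult:
    """Illustrative container for the result fields; names are assumptions."""

    operation: str          # "collect" or "report"
    top_n: int              # requested number of items
    item_count: int         # items actually present in the artifact
    comment_metrics: str    # "complete" or "partial"
    artifact: dict          # the Export Artifact block
    snapshot_id: Optional[str] = None
    weak_areas: list[str] = field(default_factory=list)
    summary: Optional[str] = None  # optional prose, shown after the artifact

    def __post_init__(self) -> None:
        # Reject values outside the two allowed choices for each field.
        if self.operation not in ("collect", "report"):
            raise ValueError("operation must be 'collect' or 'report'")
        if self.comment_metrics not in ("complete", "partial"):
            raise ValueError("comment_metrics must be 'complete' or 'partial'")
```

Validating the two enumerated fields at construction time catches a mislabelled run before it reaches the report.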
References
- Use collection-flow.md for browser-side collection steps.
- Use report-format.md for report rendering.
- Use data-quality.md before making claims about completeness.
- Use `assets/zhihu_hotlist_flow.source.json` for exact selectors and guard text from the source flow.
Common Mistakes
- Treating visible hotlist capture as equivalent to complete comment-metric capture.
- Forgetting that report mode can use an existing snapshot instead of recollecting.
- Ignoring weak selectors and generic button captures in comment areas.
- Reporting zeros as if they were confirmed values when the DOM capture may be incomplete.