---
name: zhihu-hotlist
description: Use when the user wants to collect, snapshot, summarize, or report Zhihu hot list items and related comment metrics from browser-visible page data.
version: 0.1.0
author: sgclaw
tags:
- zhihu
- browser
- hotlist
---

# Zhihu Hotlist

Collect Zhihu hot list items, optionally collect visible comment metrics from each item’s detail page, and render a compact report from the resulting snapshot. Use this skill for hotlist collection and reporting, not for article editing or general Zhihu navigation.

## When to Use

- The user asks to collect Zhihu hot list data.
- The user asks for a snapshot, ranking summary, or report of current Zhihu hot list items.
- The user wants visible comment metrics such as replies, upvotes, favorites, or heart counts from hot items.
- The task needs a structured report from an existing or newly captured snapshot.

Do not use this skill for:

- arbitrary Zhihu page navigation without hotlist collection
- writing or publishing Zhihu articles
- claiming complete data quality when comment collection partially fails

## Workflow

1. Decide whether the task is a collection run, a report run, or both.
2. For collection runs, call the packaged browser script tool `zhihu-hotlist.extract_hotlist` before any generic `getText` probing.
3. For collection rules and guard conditions, follow [collection-flow.md](references/collection-flow.md).
4. Inside the packaged script, prefer stable structured page state first, then broader DOM candidates, then controlled page-text fallback.
5. Produce the `Export Artifact` immediately after the browser data is stable.
6. If the page is blocked by login, captcha, or anti-bot state, fail explicitly instead of collapsing the issue into "no rows".
7. Surface partial failures explicitly instead of hiding them behind a success summary.
8. For report runs, format output using [report-format.md](references/report-format.md).
9. Apply the caution rules in [data-quality.md](references/data-quality.md) whenever metrics are partial, missing, or inferred from fragile selectors.

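
The priority order in step 4 can be pictured as an ordered fallback chain. The sketch below is illustrative only: the packaged script runs in the browser, and the three extractor functions here are hypothetical stand-ins, not the real `zhihu-hotlist.extract_hotlist` internals. Blocked-page detection is assumed to happen separately, before this chain runs.

```python
from typing import Callable, Optional

Row = tuple[int, str, str]  # (rank, title, heat)
Extractor = Callable[[], Optional[list[Row]]]


def first_successful(extractors: list[tuple[str, Extractor]]) -> tuple[str, list[Row]]:
    """Try each named extractor in priority order; return the first non-empty result."""
    for name, extract in extractors:
        rows = extract()
        if rows:  # None or [] means this strategy found nothing
            return name, rows
    # Fail loudly rather than returning an empty list that looks like success.
    raise LookupError("all extraction strategies returned no rows")


# Hypothetical stand-ins for the three strategies named in step 4.
def from_page_state() -> Optional[list[Row]]:
    return None  # pretend the structured page state was unavailable


def from_dom_candidates() -> Optional[list[Row]]:
    return [(1, "标题", "344万")]


def from_page_text() -> Optional[list[Row]]:
    return []


strategy, rows = first_successful([
    ("page_state", from_page_state),
    ("dom_candidates", from_dom_candidates),
    ("page_text", from_page_text),
])
print(strategy, rows)
```

Recording which strategy produced the rows makes it easier to flag results that came from a weaker fallback.
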
## SuperRPA Interface Contract

- Inside the sgClaw browser host, prefer `superrpa_browser` for Zhihu page actions. `browser_action` is only the compatibility alias.
- Always pass `expected_domain` as the bare hostname only, for example `www.zhihu.com`.
- All selectors must be valid CSS selectors because the host executes `document.querySelector(...)`.
- Never use XPath or jQuery-style pseudo-selectors such as `:contains(...)`.
- Prefer canonical route navigation such as `https://www.zhihu.com/hot` before fallback click chains.
- The primary deterministic extractor is the packaged browser script tool `zhihu-hotlist.extract_hotlist`.
- Use generic `getText` only as a last-resort fallback when the packaged extractor fails.
- Do not keep thrashing through selector variants once the packaged script has already produced the structured artifact.

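
Two of the contract rules above (bare hostname for `expected_domain`, CSS-only selectors) lend themselves to a cheap pre-flight check. The following is a minimal sketch, assuming such a check runs before the tool call; the helper names `is_bare_hostname` and `selector_problem` are hypothetical, the sample selectors are made up, and this is a heuristic filter rather than a full CSS parser.

```python
import re
from typing import Optional

# Bare hostname: dot-separated labels only (no scheme, port, path, or spaces).
_HOSTNAME = re.compile(
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}"
)


def is_bare_hostname(value: str) -> bool:
    """Return True for values like 'www.zhihu.com', False for URLs or paths."""
    return _HOSTNAME.fullmatch(value) is not None


def selector_problem(selector: str) -> Optional[str]:
    """Return a reason if the selector is clearly not plain CSS, else None."""
    stripped = selector.strip()
    # CSS selectors never start with '/', './' or '('; XPath expressions often do.
    if stripped.startswith(("/", "./", "(")):
        return "looks like XPath"
    if ":contains(" in stripped:
        return "uses the jQuery-only :contains() pseudo-selector"
    return None


print(is_bare_hostname("www.zhihu.com"))
print(is_bare_hostname("https://www.zhihu.com/hot"))
print(selector_problem("//div[@class='item']"))
print(selector_problem("a:contains('热榜')"))
print(selector_problem("section a[href]"))
```

A selector that passes this check can still be invalid CSS; the check only catches the two mistakes the contract names explicitly.
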
## Partial-Failure Rule

- If hotlist items are captured but some comment-metric collections fail, report the run as partial.
- Include how many items lacked comment metrics.
- Do not phrase the result as fully complete when `partial_items > 0`.

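
The rule above reduces to a small counting step. This sketch assumes each collected item is a dictionary with a `comment_metrics` key that is `None` when collection failed; that key and the `status` and `item_count` fields are illustrative assumptions, while `partial_items` is the name used in the rule itself.

```python
def summarize_comment_metrics(items: list[dict]) -> dict:
    """Count items whose comment metrics are missing and derive the run status."""
    # None means "not captured"; it is deliberately distinct from a captured 0.
    partial_items = sum(1 for item in items if item.get("comment_metrics") is None)
    return {
        "item_count": len(items),
        "partial_items": partial_items,
        "status": "partial" if partial_items > 0 else "complete",
    }


items = [
    {"rank": 1, "title": "标题 A", "comment_metrics": {"replies": 12, "upvotes": 0}},
    {"rank": 2, "title": "标题 B", "comment_metrics": None},  # collection failed
]
summary = summarize_comment_metrics(items)
print(summary)
```

Keeping "missing" as `None` rather than `0` avoids reporting an uncaptured metric as a confirmed zero.
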
## Blocked-Page Rule

- If Zhihu responds with a login wall, captcha, security verification page, or anti-bot interstitial, state that condition explicitly.
- Do not misreport those states as ordinary "empty hotlist" outcomes.

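
One way to keep "blocked" and "empty" apart is to classify the page before reporting. The sketch below is an assumption-laden illustration: the marker strings passed in are placeholders, and the real guard text should come from the source flow asset rather than from this example. It also assumes that a page with captured rows counts as usable, so markers are only consulted when no rows were found.

```python
from enum import Enum
from typing import Optional


class PageState(Enum):
    OK = "ok"
    BLOCKED = "blocked"
    EMPTY = "empty"


def classify_page(
    page_text: str, row_count: int, block_markers: tuple[str, ...]
) -> tuple[PageState, Optional[str]]:
    """Return the page state and, when blocked, the marker that matched."""
    if row_count > 0:
        return PageState.OK, None
    for marker in block_markers:
        if marker in page_text:
            return PageState.BLOCKED, marker
    return PageState.EMPTY, None


# Placeholder markers for illustration only; not the real guard text.
markers = ("安全验证", "验证码")
print(classify_page("请完成安全验证后继续访问", 0, markers))
print(classify_page("", 0, markers))
print(classify_page("热榜", 3, markers))
```

Returning the matched marker lets the final report name the specific blocking condition instead of a generic failure.
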
## Export Artifact

The primary output of this skill is a structured artifact for downstream Office export. Any prose summary is secondary.

Return this shape as soon as hotlist collection is complete:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Rules:

- `sheet_name` must be exactly `知乎热榜`.
- `columns` must remain `["rank", "title", "heat"]`.
- `rows` must preserve the collected ranking order from the page.
- Each row must contain exactly three values: numeric rank, title text, and heat text.
- If fewer rows are visible than requested, return the visible rows and mark the result as partial.
- After the artifact is complete, stop exploratory tool use and do not resume browser wandering.
- Do not switch to `shell`, `glob_search`, or unrelated file browsing once the hotlist rows are collected.

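
The shape rules above can be checked mechanically before the artifact is returned. This is a minimal sketch, not part of the packaged tool: `validate_artifact` is a hypothetical helper, and the strictly increasing rank check is an assumption about what "preserve the collected ranking order" means in practice.

```python
SHEET_NAME = "知乎热榜"
COLUMNS = ["rank", "title", "heat"]


def validate_artifact(artifact: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the shape is valid."""
    errors = []
    if artifact.get("sheet_name") != SHEET_NAME:
        errors.append("sheet_name must be exactly 知乎热榜")
    if artifact.get("columns") != COLUMNS:
        errors.append("columns must be ['rank', 'title', 'heat']")
    previous_rank = 0
    for index, row in enumerate(artifact.get("rows", [])):
        if not isinstance(row, (list, tuple)) or len(row) != 3:
            errors.append(f"row {index}: expected exactly 3 values")
            continue
        rank, title, heat = row
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(rank, bool) or not isinstance(rank, int):
            errors.append(f"row {index}: rank must be numeric")
            continue
        if not isinstance(title, str) or not isinstance(heat, str):
            errors.append(f"row {index}: title and heat must be text")
        # Assumption: page order implies strictly increasing ranks.
        if rank <= previous_rank:
            errors.append(f"row {index}: rank {rank} breaks page order")
        previous_rank = rank
    return errors


good = {
    "source": "https://www.zhihu.com/hot",
    "sheet_name": SHEET_NAME,
    "columns": ["rank", "title", "heat"],
    "rows": [[1, "标题", "344万"]],
}
bad = dict(good, rows=[[2, "b", "1万"], [1, "a", "2万"], ["3", "c"]])
print(validate_artifact(good))
print(validate_artifact(bad))
```

Heat stays a string such as `344万` because the rules call for heat text, not a parsed number.
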
## Output

Return a concise result with:

- operation type: `collect` or `report`
- requested `top_n`
- snapshot identifier when available
- item count
- whether comment metrics are complete or partial
- any missing or weak data areas
- the `Export Artifact` block shown above
- an optional short prose summary only after the artifact

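
The result fields listed above could be carried in a single record. The sketch below is one possible layout, assuming a Python-side wrapper; apart from `top_n`, `collect`, and `report`, every field name here is hypothetical and not defined by the skill.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RunResult:
    operation: str                      # "collect" or "report"
    top_n: int                          # requested number of items
    item_count: int                     # items actually captured
    comment_metrics: str                # "complete" or "partial"
    artifact: dict                      # the Export Artifact block
    snapshot_id: Optional[str] = None   # only when available
    weak_areas: list[str] = field(default_factory=list)
    summary: Optional[str] = None       # optional prose, shown after the artifact

    def __post_init__(self) -> None:
        if self.operation not in ("collect", "report"):
            raise ValueError(f"unknown operation: {self.operation!r}")
        if self.comment_metrics not in ("complete", "partial"):
            raise ValueError(f"unknown metrics state: {self.comment_metrics!r}")


result = RunResult(
    operation="collect",
    top_n=10,
    item_count=1,
    comment_metrics="partial",
    artifact={
        "source": "https://www.zhihu.com/hot",
        "sheet_name": "知乎热榜",
        "columns": ["rank", "title", "heat"],
        "rows": [[1, "标题", "344万"]],
    },
    weak_areas=["comment metrics missing for 1 item"],
)
print(asdict(result)["comment_metrics"])
```

Validating the two enumerated fields at construction time keeps an invalid operation type or metrics state from reaching the final report.
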
## References

- Use [collection-flow.md](references/collection-flow.md) for browser-side collection steps.
- Use [report-format.md](references/report-format.md) for report rendering.
- Use [data-quality.md](references/data-quality.md) before making claims about completeness.
- Use `assets/zhihu_hotlist_flow.source.json` for exact selectors and guard text from the source flow.

## Common Mistakes

- Treating visible hotlist capture as equivalent to complete comment-metric capture.
- Forgetting that report mode can use an existing snapshot instead of recollecting.
- Ignoring weak selectors and generic button captures in comment areas.
- Reporting zeros as if they were confirmed values when the DOM capture may be incomplete.