---
name: zhihu-hotlist
description: Use when the user wants to collect, snapshot, summarize, or report Zhihu hot list items and related comment metrics from browser-visible page data.
version: 0.1.0
author: sgclaw
tags:
  - zhihu
  - browser
  - hotlist
---

# Zhihu Hotlist

Collect Zhihu hot list items, optionally collect visible comment metrics from each item’s detail page, and render a compact report from the resulting snapshot.

Use this skill for hotlist collection and reporting, not for article editing or general Zhihu navigation.

## When to Use

- The user asks to collect Zhihu hot list data.
- The user asks for a snapshot, ranking summary, or report of current Zhihu hot list items.
- The user wants visible comment metrics such as replies, upvotes, favorites, or heart counts from hot items.
- The task needs a structured report from an existing or newly captured snapshot.

Do not use this skill for:

- arbitrary Zhihu page navigation without hotlist collection
- writing or publishing Zhihu articles
- claiming complete data quality when comment collection partially fails

## Workflow

1. Decide whether the task is a collection run, a report run, or both.
2. For collection runs, call the packaged browser script tool `zhihu-hotlist.extract_hotlist` before any generic `getText` probing.
3. For collection rules and guard conditions, follow [collection-flow.md](references/collection-flow.md).
4. Inside the packaged script, prefer stable structured page state first, then broader DOM candidates, then controlled page-text fallback.
5. Produce the `Export Artifact` immediately after the browser data is stable.
6. If the page is blocked by login, captcha, or anti-bot state, fail explicitly instead of collapsing the issue into "no rows".
7. Surface partial failures explicitly instead of hiding them behind a success summary.
8. For report runs, format output using [report-format.md](references/report-format.md).
9. Apply the caution rules in [data-quality.md](references/data-quality.md) whenever metrics are partial, missing, or inferred from fragile selectors.

## SuperRPA Interface Contract

- Inside the sgClaw browser host, prefer `superrpa_browser` for Zhihu page actions. `browser_action` is only the compatibility alias.
- Always pass `expected_domain` as the bare hostname only, for example `www.zhihu.com`.
- All selectors must be valid CSS selectors because the host executes `document.querySelector(...)`.
- Never use XPath or jQuery-style pseudo-selectors such as `:contains(...)`.
- Prefer canonical route navigation such as `https://www.zhihu.com/hot` before fallback click chains.
- The primary deterministic extractor is the packaged browser script tool `zhihu-hotlist.extract_hotlist`.
- Use generic `getText` only as a last-resort fallback when the packaged extractor fails.
- Do not keep thrashing through selector variants once the packaged script has already produced the structured artifact.

## Partial-Failure Rule

- If hotlist items are captured but some comment-metric collections fail, report the run as partial.
- Include how many items lacked comment metrics.
- Do not phrase the result as fully complete when `partial_items > 0`.

## Blocked-Page Rule

- If Zhihu responds with a login wall, captcha, security verification page, or anti-bot interstitial, state that condition explicitly.
- Do not misreport those states as ordinary "empty hotlist" outcomes.

## Export Artifact

The primary output of this skill is a structured artifact for downstream Office export.

The structured artifact is primary. Any prose summary is secondary.

Return this shape as soon as hotlist collection is complete:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Rules:

- `sheet_name` must be exactly `知乎热榜`.
- `columns` must remain `["rank", "title", "heat"]`.
- `rows` must preserve the collected ranking order from the page.
- Each row must contain exactly three values: numeric rank, title text, and heat text.
- If fewer than the requested rows are visible, return the visible rows and mark the result as partial.
- After the artifact is complete, stop exploratory tool use and do not resume browser wandering.
- Do not switch to `shell`, `glob_search`, or unrelated file browsing once the hotlist rows are collected.

## Output

Return a concise result with:

- operation type: `collect` or `report`
- requested `top_n`
- snapshot identifier when available
- item count
- whether comment metrics are complete or partial
- any missing or weak data areas
- the `Export Artifact` block shown above
- an optional short prose summary only after the artifact

## References

- Use [collection-flow.md](references/collection-flow.md) for browser-side collection steps.
- Use [report-format.md](references/report-format.md) for report rendering.
- Use [data-quality.md](references/data-quality.md) before making claims about completeness.
- Use `assets/zhihu_hotlist_flow.source.json` for exact selectors and guard text from the source flow.

## Common Mistakes

- Treating visible hotlist capture as equivalent to complete comment-metric capture.
- Forgetting that report mode can use an existing snapshot instead of recollecting.
- Ignoring weak selectors and generic button captures in comment areas.
- Reporting zeros as if they were confirmed values when the DOM capture may be incomplete.
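
## Artifact Validation Sketch

The `Export Artifact` rules and the Partial-Failure Rule can be checked mechanically before the artifact is handed to downstream export. The following is a minimal Python sketch, not part of the packaged skill: the function names `validate_artifact` and `run_status` and the parameter `requested_top_n` are hypothetical, and the sketch assumes the artifact has already been parsed into a plain dictionary. It does not verify that `rows` preserve the page's ranking order, since that can only be confirmed against the page itself.

```python
from typing import Any

# Fixed values taken from the Export Artifact rules.
EXPECTED_SHEET_NAME = "知乎热榜"
EXPECTED_COLUMNS = ["rank", "title", "heat"]


def validate_artifact(artifact: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return rule violations; an empty list means none found."""
    problems: list[str] = []
    if artifact.get("sheet_name") != EXPECTED_SHEET_NAME:
        problems.append("sheet_name must be exactly 知乎热榜")
    if artifact.get("columns") != EXPECTED_COLUMNS:
        problems.append('columns must remain ["rank", "title", "heat"]')

    rows = artifact.get("rows")
    if not isinstance(rows, list):
        problems.append("rows must be a list")
        return problems

    for index, row in enumerate(rows):
        if not isinstance(row, list) or len(row) != 3:
            problems.append(f"row {index} must contain exactly three values")
            continue
        rank, title, heat = row
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(rank, bool) or not isinstance(rank, int):
            problems.append(f"row {index}: rank must be numeric")
        if not isinstance(title, str) or not isinstance(heat, str):
            problems.append(f"row {index}: title and heat must be text")
    return problems


def run_status(artifact: dict[str, Any], requested_top_n: int, partial_items: int = 0) -> str:
    """Hypothetical helper: 'partial' if rows fall short of top_n or any item lacks comment metrics."""
    if len(artifact["rows"]) < requested_top_n or partial_items > 0:
        return "partial"
    return "complete"


if __name__ == "__main__":
    sample = {
        "source": "https://www.zhihu.com/hot",
        "sheet_name": "知乎热榜",
        "columns": ["rank", "title", "heat"],
        "rows": [[1, "标题", "344万"]],
    }
    print(validate_artifact(sample))            # []
    print(run_status(sample, requested_top_n=10))  # partial
```

Returning a list of violations rather than raising on the first one keeps every weak area visible in a single pass, which matches the requirement to surface partial failures instead of hiding them behind a success summary.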