skill-lib/skills/zhihu-hotlist/references/collection-flow.md
木炎 51913555ad feat: add initial skill authoring workspace
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 18:34:56 +08:00

Collection Flow

This skill uses the preserved source flow in assets/zhihu_hotlist_flow.source.json.

Source Model

The source implementation does four things:

  1. ensure the browser is on the hotlist page
  2. capture hotlist HTML
  3. extract the top N items from the page
  4. visit each item detail page and try to collect visible comment metrics
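
The four stages above can be sketched as a small orchestrator. This is only an illustration: the function name run_collection and its four injected callables are hypothetical and do not come from the source flow. The sketch also assumes that a failure while collecting metrics for one item should not abort the run, which is one reading of "try to collect".

```python
from typing import Callable, Dict, List, Optional


def run_collection(
    ensure_on_hotlist: Callable[[], None],
    capture_html: Callable[[], str],
    extract_items: Callable[[str, int], List[Dict]],
    collect_metrics: Callable[[Dict], Dict],
    top_n: int,
) -> List[Dict]:
    """Run the four stages in order (all callables are hypothetical)."""
    ensure_on_hotlist()                  # 1. make sure we are on the hotlist page
    html = capture_html()                # 2. capture hotlist HTML
    items = extract_items(html, top_n)   # 3. extract the top N items

    results = []
    for item in items:                   # 4. visit each item detail page
        metrics: Optional[Dict]
        try:
            metrics = collect_metrics(item)
        except Exception:
            # Assumption: a per-item failure leaves metrics empty
            # instead of failing the whole collection.
            metrics = None
        results.append({**item, "metrics": metrics})
    return results
```

Passing the stages in as callables keeps the ordering visible without committing to any particular browser driver.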

Hotlist Page Detection

  • Preferred page URL: https://www.zhihu.com/hot
  • Domain: www.zhihu.com
  • Guard text: 热榜 ("Hot List")

The source flow first probes the current page for the guard text before deciding whether it must navigate.
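
A minimal sketch of that decision, using the three values listed above. The function name needs_navigation is hypothetical, and combining the domain check with the guard-text probe is an assumption; the source flow may apply them differently.

```python
from urllib.parse import urlparse

HOTLIST_URL = "https://www.zhihu.com/hot"
HOTLIST_DOMAIN = "www.zhihu.com"
GUARD_TEXT = "热榜"


def needs_navigation(current_url: str, page_text: str) -> bool:
    """Return True when the browser should be sent to HOTLIST_URL.

    Assumption: the page counts as the hotlist only if it is on the
    expected domain AND its visible text contains the guard text.
    """
    on_domain = urlparse(current_url).hostname == HOTLIST_DOMAIN
    has_guard = GUARD_TEXT in page_text
    return not (on_domain and has_guard)
```

For example, a question page on www.zhihu.com whose text lacks the guard text would trigger navigation, while a page that already shows it would be left alone.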

Hotlist Extraction

The source selectors look for:

  • hotlist root
  • hotlist item
  • title link
  • summary
  • heat text

If the page HTML is empty or exposes no items, the collection should be treated as failed.
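
The failure rule can be sketched as a check that runs after extraction. The HotItem fields mirror the selector targets listed above, but the class itself, CollectionError, and check_extraction are hypothetical names; the actual selectors live in the source flow and are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


class CollectionError(RuntimeError):
    """Raised when the hotlist collection must be treated as failed."""


@dataclass
class HotItem:
    title: str
    url: str
    summary: str = ""
    heat_text: str = ""


def check_extraction(html: str, items: List[HotItem], top_n: int) -> List[HotItem]:
    """Fail on empty HTML or an empty item list, else keep the top N."""
    if not html.strip():
        raise CollectionError("hotlist HTML is empty")
    if not items:
        raise CollectionError("hotlist page exposed no items")
    return items[:top_n]
```

Raising instead of returning an empty list makes it harder for a caller to mistake a failed collection for a quiet day.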

Comment Metric Collection

For each hot item:

  1. navigate to the item detail page
  2. wait for page root
  3. scroll toward comments
  4. wait for comment list
  5. scroll comment list into view
  6. capture page HTML
  7. parse visible metrics from comment items
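
Steps 1 to 6 might look like the sketch below. The Page protocol is a hypothetical stand-in for whatever browser driver the skill runs on, and the selector values are left to the caller because this document does not list them. Step 7, the parsing, is a separate concern.

```python
from dataclasses import dataclass
from typing import Protocol


class Page(Protocol):
    """Hypothetical browser interface; not a real library API."""

    def goto(self, url: str) -> None: ...
    def wait_for(self, selector: str) -> None: ...
    def scroll_into_view(self, selector: str) -> None: ...
    def html(self) -> str: ...


@dataclass(frozen=True)
class DetailSelectors:
    page_root: str
    comment_section: str
    comment_list: str


def capture_comment_html(page: Page, item_url: str, sel: DetailSelectors) -> str:
    page.goto(item_url)                         # 1. navigate to the detail page
    page.wait_for(sel.page_root)                # 2. wait for page root
    page.scroll_into_view(sel.comment_section)  # 3. scroll toward comments
    page.wait_for(sel.comment_list)             # 4. wait for comment list
    page.scroll_into_view(sel.comment_list)     # 5. scroll comment list into view
    return page.html()                          # 6. capture page HTML
```

The two scroll steps are kept separate on the assumption that the comment list is only rendered once the comment area is near the viewport.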

Parsed Metrics

The source collector tries to extract:

  • reply count
  • upvote count
  • favorite count
  • heart count

It also preserves unmatched numeric metrics as raw metric fields when possible.
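
One way to picture the split between named metrics and raw metric fields is sketched below. The label strings, output field names, and classify_metrics itself are illustrative assumptions; the source collector's real labels and output shape are not shown in this document.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a visible label to a named metric field.
KNOWN_LABELS = {
    "reply": "reply_count",
    "upvote": "upvote_count",
    "favorite": "favorite_count",
    "heart": "heart_count",
}


def classify_metrics(pairs: List[Tuple[str, int]]) -> Dict:
    """Sort (label, value) pairs into named fields or raw metrics."""
    metrics: Dict = {field: None for field in KNOWN_LABELS.values()}
    metrics["raw_metrics"] = []
    for label, value in pairs:
        field = KNOWN_LABELS.get(label.strip().lower())
        if field is not None and metrics[field] is None:
            metrics[field] = value
        else:
            # Unmatched (or duplicate) numbers are kept, not discarded.
            metrics["raw_metrics"].append({"label": label, "value": value})
    return metrics
```

Keeping unmatched numbers means a layout change on the page shows up as unexpected raw fields instead of silently lost data.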

Count Parsing

The source parser recognizes plain integers and compact counts with a unit suffix, such as:

  • 亿 (hundred million, 10^8)
  • k (thousand, 10^3)
  • m (million, 10^6)

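Compact display text has usually been rounded or truncated, so a count parsed from it is an approximation: a value shown as 1.2k parses to 1200 even though the underlying count may differ by several dozen. Treat such values as approximate when summarizing.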
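
A minimal sketch of such a parser, covering only the forms listed above. The name parse_compact_count is hypothetical, and the case-insensitive suffix matching and stripping of thousands separators are assumptions, not confirmed behavior of the source parser.

```python
import re
from decimal import Decimal
from typing import Optional

_MULTIPLIERS = {"": 1, "k": 10**3, "m": 10**6, "亿": 10**8}
_COUNT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(亿|[kKmM])?\s*$")


def parse_compact_count(text: str) -> Optional[int]:
    """Parse '123', '1.2k', '3m' or '3.5亿' into an int, else None."""
    # Assumption: commas are thousands separators and can be dropped.
    match = _COUNT_RE.match(text.replace(",", ""))
    if match is None:
        return None
    number, suffix = match.groups()
    multiplier = _MULTIPLIERS[(suffix or "").lower()]
    # Decimal avoids float artifacts such as 1.15 * 1000 == 1149.999...
    return int(Decimal(number) * multiplier)
```

The result is the nominal value of the display text, so parse_compact_count("1.2k") gives 1200 regardless of the exact count behind it.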