Collection Flow
This skill uses the preserved source flow in assets/zhihu_hotlist_flow.source.json.
Source Model
The source implementation does four things:
- ensure the browser is on the hotlist page
- capture hotlist HTML
- extract the top N items from the page
- visit each item detail page and try to collect visible comment metrics
Hotlist Page Detection
- Preferred page URL: https://www.zhihu.com/hot
- Domain: www.zhihu.com
- Guard text: 热榜 ("hot list")
The source flow first probes the current page for the guard text before deciding whether it must navigate.
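The probe-then-navigate decision can be sketched as follows. This is a minimal illustration, not the source flow itself: the function name `needs_navigation` is hypothetical, and combining the domain check with the guard text check is an assumption about how the two values are used together.

```python
from urllib.parse import urlparse

# Values taken from the list above.
HOTLIST_URL = "https://www.zhihu.com/hot"
HOTLIST_DOMAIN = "www.zhihu.com"
GUARD_TEXT = "热榜"


def needs_navigation(current_url: str, page_text: str) -> bool:
    """Return True when the browser should be sent to HOTLIST_URL.

    Assumption: the page counts as the hotlist only if it is on the
    expected domain AND its visible text contains the guard text.
    """
    on_domain = urlparse(current_url).hostname == HOTLIST_DOMAIN
    has_guard = GUARD_TEXT in page_text
    return not (on_domain and has_guard)
```

A caller would navigate to `HOTLIST_URL` only when this returns `True`, which avoids reloading a page that is already the hotlist.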
Hotlist Extraction
The source selectors look for:
- hotlist root
- hotlist item
- title link
- summary
- heat text
If the page HTML is empty or exposes no items, the collection should be treated as failed.
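The failure rule above might look like this in code. The names `HotItem`, `HotlistCollectionError`, and `take_top_items` are hypothetical, and the sketch assumes the selector matching has already produced a list of items; the actual selectors live in the source flow and are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


class HotlistCollectionError(RuntimeError):
    """Raised when the hotlist page yields nothing usable."""


@dataclass
class HotItem:
    # Fields mirror the selector targets listed above.
    title: str
    url: str
    summary: Optional[str] = None
    heat_text: Optional[str] = None


def take_top_items(page_html: str, items: List[HotItem], top_n: int) -> List[HotItem]:
    """Return the first top_n items, failing on empty HTML or no items."""
    if not page_html.strip():
        raise HotlistCollectionError("hotlist HTML is empty")
    if not items:
        raise HotlistCollectionError("no hotlist items matched the selectors")
    return items[:top_n]
```

Raising an exception rather than returning an empty list keeps a failed collection from being mistaken for a hotlist that simply has no entries.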
Comment Metric Collection
For each hot item:
- navigate to the item detail page
- wait for page root
- scroll toward comments
- wait for comment list
- scroll comment list into view
- capture page HTML
- parse visible metrics from comment items
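The per-item steps up to the HTML capture can be sketched against an abstract browser. Everything here is illustrative: the `Browser` protocol, its method names, the selector constants, and the scroll distance are placeholders, not the source flow's actual driver API or selectors. Parsing the captured HTML is a separate step.

```python
from typing import Protocol


class Browser(Protocol):
    """Hypothetical minimal driver interface used only for this sketch."""

    def goto(self, url: str) -> None: ...
    def wait_for(self, selector: str) -> None: ...
    def scroll_by(self, pixels: int) -> None: ...
    def scroll_into_view(self, selector: str) -> None: ...
    def html(self) -> str: ...


# Placeholder selectors; the real ones are defined in the source flow.
PAGE_ROOT = "#root"
COMMENT_LIST = ".comment-list"


def capture_detail_html(browser: Browser, item_url: str, scroll_pixels: int = 2000) -> str:
    """Run the navigation and scroll steps in order, then return page HTML."""
    browser.goto(item_url)                  # navigate to the item detail page
    browser.wait_for(PAGE_ROOT)             # wait for page root
    browser.scroll_by(scroll_pixels)        # scroll toward comments
    browser.wait_for(COMMENT_LIST)          # wait for comment list
    browser.scroll_into_view(COMMENT_LIST)  # scroll comment list into view
    return browser.html()                   # capture page HTML
```

Keeping the driver behind a small interface makes the step order testable with a recording fake instead of a live browser.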
Parsed Metrics
The source collector tries to extract:
- reply count
- upvote count
- favorite count
- heart count
It also preserves unmatched numeric metrics as raw metric fields when possible.
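One way to separate named metrics from raw ones is sketched below. The label keywords (回复, 赞同, 收藏, 喜欢), the output field names, and the function `classify_metrics` are assumptions made for illustration; the source collector may match metrics differently.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mapping from visible label keywords to metric fields.
LABELS = {
    "回复": "reply_count",
    "赞同": "upvote_count",
    "收藏": "favorite_count",
    "喜欢": "heart_count",
}

# A number with an optional compact suffix, e.g. "12", "1.2万", "3k".
_NUMBER = re.compile(r"[0-9][0-9.,]*\s*[万亿kKmM]?")


def classify_metrics(snippets: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split visible text snippets into named metrics and raw numeric leftovers."""
    named: Dict[str, str] = {}
    raw: List[str] = []
    for snippet in snippets:
        match = _NUMBER.search(snippet)
        if match is None:
            continue  # no numeric content, nothing to record
        value = match.group().strip()
        for keyword, field in LABELS.items():
            if keyword in snippet:
                named.setdefault(field, value)  # keep the first value seen
                break
        else:
            raw.append(value)  # numeric but unlabeled: preserve as raw
    return named, raw
```

Values are kept as display text here so that count parsing can be applied, and audited, as a separate step.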
Count Parsing
The source parser recognizes compact counts such as:
- plain integers
- numbers with a 万, 亿, k, or m suffix
Compact display text is rounded, so treat counts parsed from it as approximate when summarizing them.
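A compact-count parser along these lines could look like the sketch below. The function name `parse_compact_count` is hypothetical. 万 (10^4) and 亿 (10^8) are standard Chinese numeric units; reading k and m as 10^3 and 10^6, ignoring case, and stripping thousands separators are assumptions about the source parser.

```python
import re
from decimal import Decimal
from typing import Optional

# Multiplier per suffix; k and m values are assumed, see the note above.
_MULTIPLIERS = {"": 1, "k": 10**3, "m": 10**6, "万": 10**4, "亿": 10**8}

_COUNT_RE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*(万|亿|k|m)?\s*$", re.IGNORECASE)


def parse_compact_count(text: str) -> Optional[int]:
    """Convert display text such as '1.2万' to an integer, or None if unrecognized."""
    match = _COUNT_RE.match(text.replace(",", ""))
    if match is None:
        return None
    number = match.group(1)
    suffix = (match.group(2) or "").lower()
    # Decimal avoids binary floating-point error in the multiplication.
    return int(Decimal(number) * _MULTIPLIERS[suffix])
```

For example, `parse_compact_count("1.2万")` returns `12000`, and unrecognized text returns `None` rather than raising.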