feat: add initial skill authoring workspace
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
68
skills/zhihu-hotlist/references/collection-flow.md
Normal file
@@ -0,0 +1,68 @@
# Collection Flow

This skill uses the preserved source flow in `assets/zhihu_hotlist_flow.source.json`.

## Source Model

The source implementation does four things:

1. ensure the browser is on the hotlist page
2. capture hotlist HTML
3. extract the top N items from the page
4. visit each item detail page and try to collect visible comment metrics

## Hotlist Page Detection

- Preferred page URL: `https://www.zhihu.com/hot`
- Domain: `www.zhihu.com`
- Guard text: `热榜` ("hot list")

The source flow first probes the current page for the guard text before deciding whether it must navigate.
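
The probe-then-navigate decision can be sketched as follows. This is a minimal illustration, not the source flow itself: the function names are hypothetical, and combining the domain check with the guard-text check is an assumption about how the three values above are used together.

```python
from typing import Optional
from urllib.parse import urlparse

HOTLIST_URL = "https://www.zhihu.com/hot"
HOTLIST_DOMAIN = "www.zhihu.com"
GUARD_TEXT = "热榜"


def is_on_hotlist_page(current_url: str, page_text: str) -> bool:
    """Return True when the current page already looks like the hotlist.

    Assumption: both the domain and the guard text must match.
    """
    on_domain = urlparse(current_url).hostname == HOTLIST_DOMAIN
    return on_domain and GUARD_TEXT in page_text


def navigation_target(current_url: str, page_text: str) -> Optional[str]:
    """Return the URL to navigate to, or None when no navigation is needed."""
    if is_on_hotlist_page(current_url, page_text):
        return None
    return HOTLIST_URL
```

A caller would pass the browser's current URL and visible text, and navigate only when the result is not `None`.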

## Hotlist Extraction

The source selectors look for:

- hotlist root
- hotlist item
- title link
- summary
- heat text

If the page HTML is empty or exposes no items, the collection should be treated as failed.
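
The failure rule above could be enforced with a small guard like the one below. It is a sketch under assumptions: the exception class, the function name, and the shape of `items` (a list of dicts produced by some earlier extraction step) are all hypothetical, and the real selectors are not shown here.

```python
from typing import Any, Dict, List


class HotlistCollectionError(RuntimeError):
    """Hypothetical error raised when the hotlist capture is unusable."""


def require_hot_items(
    html: str, items: List[Dict[str, Any]], top_n: int
) -> List[Dict[str, Any]]:
    """Fail on empty HTML or an empty item list, else keep the top N items."""
    if not html.strip():
        raise HotlistCollectionError("hotlist HTML is empty")
    if not items:
        raise HotlistCollectionError("hotlist page exposed no items")
    return items[:top_n]
```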

## Comment Metric Collection

For each hot item:

1. navigate to the item detail page
2. wait for page root
3. scroll toward comments
4. wait for comment list
5. scroll comment list into view
6. capture page HTML
7. parse visible metrics from comment items

## Parsed Metrics

The source collector tries to extract:

- reply count
- upvote count
- favorite count
- heart count

It also preserves unmatched numeric metrics as raw metric fields when possible.
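
One way to picture the "known fields plus raw leftovers" behavior is the sketch below. Everything in it is an assumption made for illustration: the label strings, the field names, and the `raw_metrics` key are hypothetical and are not taken from the source collector.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical mapping from a visible label to a named metric field.
LABEL_TO_FIELD = {
    "回复": "reply_count",
    "赞同": "upvote_count",
    "收藏": "favorite_count",
    "红心": "heart_count",
}


def bucket_metrics(pairs: Iterable[Tuple[str, int]]) -> Dict[str, Any]:
    """Sort (label, value) pairs into named fields, keeping the rest as raw."""
    raw: Dict[str, int] = {}
    record: Dict[str, Any] = {"raw_metrics": raw}
    for label, value in pairs:
        field = LABEL_TO_FIELD.get(label)
        if field is not None:
            record[field] = value
        else:
            raw[label] = value  # unmatched numeric metric, preserved as-is
    return record
```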

## Count Parsing

The source parser recognizes compact counts such as:

- plain integers
- `万` (ten thousand)
- `亿` (hundred million)
- `k` (thousand)
- `m` (million)

Use caution when summarizing parsed counts from compact display text: a value such as `1.2万` is already rounded for display, so the parsed number is an approximation.
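
A minimal sketch of compact-count parsing, assuming the suffixes listed above and nothing else. The function name, the regular expression, the comma stripping, and the acceptance of uppercase `K`/`M` are assumptions for illustration, not the source parser's actual rules.

```python
import re
from decimal import Decimal
from typing import Optional

_MULTIPLIERS = {
    "": 1,
    "k": 1_000,
    "m": 1_000_000,
    "万": 10_000,
    "亿": 100_000_000,
}
_COUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([kKmM万亿]?)")


def parse_compact_count(text: str) -> Optional[int]:
    """Parse the first compact count in text, or return None if none is found."""
    match = _COUNT_RE.search(text.replace(",", ""))
    if match is None:
        return None
    number, suffix = match.groups()
    # Decimal avoids float rounding surprises; int() truncates any remainder.
    return int(Decimal(number) * _MULTIPLIERS[suffix.lower()])
```

For example, `parse_compact_count("1.2万")` yields `12000`, which is only as precise as the displayed text.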
46
skills/zhihu-hotlist/references/data-quality.md
Normal file
@@ -0,0 +1,46 @@
# Data Quality

This skill can return useful partial data, but it must not overclaim completeness.

## Main Quality Risks

- comment areas may not load for every hot item
- the DOM may expose only visible comments, not the full set
- generic selectors may match the wrong footer controls
- compact text counts can be parsed but still reflect display approximations

## Partial Success Rule

The source implementation tracks partial item failures during comment collection. If some detail pages fail but the run still returns a snapshot:

- report the run as partial
- include how many items were missing comment metrics
- keep the successful hotlist capture separate from comment-metric completeness
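
The three rules above could translate into a run summary like this sketch. The function name, the dictionary keys, and the `"failed"` status for an empty capture are hypothetical choices, not fields confirmed by the source implementation.

```python
from typing import Any, Dict


def summarize_run(item_count: int, items_missing_metrics: int) -> Dict[str, Any]:
    """Report hotlist capture and comment-metric completeness separately."""
    if item_count == 0:
        status = "failed"
    elif items_missing_metrics > 0:
        status = "partial"
    else:
        status = "complete"
    return {
        "hotlist_captured": item_count > 0,
        "status": status,
        "items_missing_comment_metrics": items_missing_metrics,
    }
```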

## Snapshot Caveats

The source store design keeps:

- `snapshot_id`
- capture timestamp
- page URL
- collector version
- item list
- collection stats

This is enough for reproducible reporting, but it does not guarantee that every metric field was fully captured.
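
As a rough picture of that record, a dataclass might look like the sketch below. Only `snapshot_id` is named in the list above; the other attribute names, their types, and the `comment_metrics` item key are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Snapshot:
    """Hypothetical shape of a stored snapshot."""

    snapshot_id: str
    captured_at_ms: int
    page_url: str
    collector_version: str
    items: List[Dict[str, Any]] = field(default_factory=list)
    stats: Dict[str, int] = field(default_factory=dict)

    def items_missing_metrics(self) -> int:
        """Count items whose comment metrics are absent or empty."""
        return sum(1 for item in self.items if not item.get("comment_metrics"))
```

A helper like `items_missing_metrics` makes the caveat concrete: a snapshot can be well-formed while some items carry no metrics at all.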
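
## Recommended Caution Language

Use wording like:

- `热榜列表已采集,评论指标为部分完成。` ("The hotlist was collected; comment metrics are partially complete.")
- `报告基于最新快照生成,部分条目缺少评论指标。` ("The report was generated from the latest snapshot; some items are missing comment metrics.")
- `数字来自页面可见指标,可能低于完整站内统计。` ("Figures come from metrics visible on the page and may be lower than the site's full statistics.")

Avoid wording like:

- `全部评论指标已准确采集` ("All comment metrics were collected accurately")
- `完整真实热度` ("complete, true heat")
- `无缺失` ("nothing missing")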
41
skills/zhihu-hotlist/references/report-format.md
Normal file
@@ -0,0 +1,41 @@
# Report Format

The source report mode renders a compact text report from a snapshot.

## Header Line

Use this structure:

```text
知乎热榜报告 <snapshot_id>: 共 <item_count> 条,采集于 <captured_at_ms>
```

The fixed labels read: `知乎热榜报告` (Zhihu hotlist report), `共 … 条` (… items in total), `采集于` (captured at).

## Per-Item Line

Use this structure:

```text
<rank>. <title> | 热度 <heat_text> | 评论指标 <metric_count> 条 | 回复 <reply_total> | 赞同 <upvote_total> | 收藏 <favorite_total> | 红心 <heart_total>
```

The fixed labels read: `热度` (heat), `评论指标 … 条` (… comment metric records), `回复` (replies), `赞同` (upvotes), `收藏` (favorites), `红心` (hearts).

## Field Semantics

- `metric_count`: number of collected comment metric records for the item
- `reply_total`: sum of reply counts across collected records
- `upvote_total`: sum of upvote counts across collected records
- `favorite_total`: sum of favorite counts across collected records
- `heart_total`: sum of heart counts across collected records

## Missing-Metric Handling

If an item has no collected comment metrics:

- keep the item in the report
- show metric count as `0`
- explicitly note partial data elsewhere in the result summary if the run was incomplete
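
The header and per-item templates, together with the totals and the missing-metric rule, can be sketched as below. The function names and the per-record keys (`reply_count`, `upvote_count`, `favorite_count`, `heart_count`) are hypothetical; only the output layout follows the templates in this file.

```python
from typing import Any, Dict, List


def render_header(snapshot_id: str, item_count: int, captured_at_ms: int) -> str:
    """Render the report header line."""
    return f"知乎热榜报告 {snapshot_id}: 共 {item_count} 条,采集于 {captured_at_ms}"


def render_item_line(
    rank: int, title: str, heat_text: str, records: List[Dict[str, Any]]
) -> str:
    """Render one item line; an item with no records still renders, with zeros."""

    def total(key: str) -> int:
        # Missing or None values count as 0 so partial records still sum.
        return sum(record.get(key) or 0 for record in records)

    return (
        f"{rank}. {title} | 热度 {heat_text} | 评论指标 {len(records)} 条"
        f" | 回复 {total('reply_count')} | 赞同 {total('upvote_count')}"
        f" | 收藏 {total('favorite_count')} | 红心 {total('heart_count')}"
    )
```

Passing an empty `records` list keeps the item in the report with a metric count of `0`, which matches the missing-metric rule.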

## Report Mode Behavior

- If a specific snapshot ID is supplied, report from that snapshot.
- Otherwise, use the latest known snapshot.
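
The selection rule could be sketched as follows. Two assumptions are made here: snapshots are held in a dict keyed by snapshot ID, and "latest" means the largest `captured_at_ms`. The function name is hypothetical.

```python
from typing import Any, Dict, Optional


def select_snapshot(
    snapshots: Dict[str, Dict[str, Any]], snapshot_id: Optional[str] = None
) -> Dict[str, Any]:
    """Return the requested snapshot, or the most recently captured one."""
    if snapshot_id is not None:
        return snapshots[snapshot_id]  # raises KeyError for an unknown ID
    if not snapshots:
        raise LookupError("no snapshots available")
    return max(snapshots.values(), key=lambda snap: snap["captured_at_ms"])
```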