feat: add initial skill authoring workspace

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 18:34:56 +08:00
parent a461b0734e
commit 51913555ad
30 changed files with 7114 additions and 0 deletions
--- a/skills/zhihu-hotlist/references/collection-flow.md
+++ b/skills/zhihu-hotlist/references/collection-flow.md
@@ -0,0 +1,68 @@
+# Collection Flow
+
+This skill uses the preserved source flow in `assets/zhihu_hotlist_flow.source.json`.
+
+## Source Model
+
+The source implementation does four things:
+
+1. ensure the browser is on the hotlist page
+2. capture hotlist HTML
+3. extract the top N items from the page
+4. visit each item detail page and try to collect visible comment metrics
+
+## Hotlist Page Detection
+
+- Preferred page URL: `https://www.zhihu.com/hot`
+- Domain: `www.zhihu.com`
+- Guard text: `热榜` ("hot list")
+
+The source flow first probes the current page for the guard text before deciding whether it must navigate.
+
+## Hotlist Extraction
+
+The source selectors look for:
+
+- hotlist root
+- hotlist item
+- title link
+- summary
+- heat text
+
+If the page HTML is empty or yields no items, treat the collection as failed.
+
+## Comment Metric Collection
+
+For each hot item:
+
+1. navigate to the item detail page
+2. wait for page root
+3. scroll toward comments
+4. wait for comment list
+5. scroll comment list into view
+6. capture page HTML
+7. parse visible metrics from comment items
+
+## Parsed Metrics
+
+The source collector tries to extract:
+
+- reply count
+- upvote count
+- favorite count
+- heart count
+
+It also preserves unmatched numeric metrics as raw metric fields when possible.
+
+## Count Parsing
+
+The source parser recognizes compact counts such as:
+
+- plain integers
+- `万` (ten thousand)
+- `亿` (hundred million)
+- `k` (thousand)
+- `m` (million)
+
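+Compact display text is usually rounded, so treat counts parsed from it as approximations: `1.2万` parses to 12,000, but the exact count behind it is not recoverable from the page.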
+
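The suffix list under Count Parsing in `collection-flow.md` can be illustrated with a small parser. This is a minimal sketch, not the source parser: the function name `parse_compact_count`, the regular expression, the comma stripping, and the rounding behavior are all assumptions, and the `k`/`m` multipliers are taken from common usage rather than from the source.

```python
import re
from decimal import Decimal
from typing import Optional

# Multipliers for the suffixes named in the reference. 万 and 亿 are fixed by
# the language; the k/m values are assumed (thousand, million).
_MULTIPLIERS = {
    "": 1,
    "k": 1_000,
    "m": 1_000_000,
    "万": 10_000,
    "亿": 100_000_000,
}

# A number with an optional decimal part, followed by an optional suffix.
_COUNT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kKmM万亿]?)\s*$")


def parse_compact_count(text: str) -> Optional[int]:
    """Return the approximate integer value of a compact count, or None.

    Hypothetical helper: None means the text did not look like a count,
    so the caller can keep it as a raw metric field instead.
    """
    match = _COUNT_RE.match(text.replace(",", ""))
    if match is None:
        return None
    number, suffix = match.groups()
    # Decimal avoids binary floating-point drift on inputs such as "1.2".
    value = Decimal(number) * _MULTIPLIERS[suffix.lower()]
    return int(value.to_integral_value())


print(parse_compact_count("1.2万"))  # 12000
print(parse_compact_count("3亿"))    # 300000000
print(parse_compact_count("热榜"))   # None
```

Returning `None` instead of raising keeps unparseable text distinguishable from a real zero, which matters when the result is later summed into report totals.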
--- a/skills/zhihu-hotlist/references/data-quality.md
+++ b/skills/zhihu-hotlist/references/data-quality.md
@@ -0,0 +1,46 @@
+# Data Quality
+
+This skill can return useful partial data, but it must not overclaim completeness.
+
+## Main Quality Risks
+
+- comment areas may not load for every hot item
+- the DOM may expose only visible comments, not the full set
+- generic selectors may match the wrong footer controls
+- compact text counts can be parsed but still reflect display approximations
+
+## Partial Success Rule
+
+The source implementation tracks partial item failures during comment collection. If some detail pages fail but the run still returns a snapshot:
+
+- report the run as partial
+- include how many items were missing comment metrics
+- keep the successful hotlist capture separate from comment-metric completeness
+
+## Snapshot Caveats
+
+The source store design keeps:
+
+- `snapshot_id`
+- capture timestamp
+- page URL
+- collector version
+- item list
+- collection stats
+
+This is enough for reproducible reporting, but it does not guarantee that every metric field was fully captured.
+
+## Recommended Caution Language
+
+Use wording like:
+
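+- `热榜列表已采集，评论指标为部分完成。` ("The hotlist was captured; comment metrics are partially complete.")
+- `报告基于最新快照生成，部分条目缺少评论指标。` ("The report is generated from the latest snapshot; some items lack comment metrics.")
+- `数字来自页面可见指标，可能低于完整站内统计。` ("Numbers come from metrics visible on the page and may be lower than the site's full statistics.")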
+
+Avoid wording like:
+
+- `全部评论指标已准确采集` ("All comment metrics were captured accurately")
+- `完整真实热度` ("Complete, true heat")
+- `无缺失` ("Nothing missing")
+
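The Partial Success Rule in `data-quality.md` can be sketched as a small summary function. Everything here is illustrative: the `HotItem` shape, the `metrics_collected` flag, the function name `summarize_run`, and the status strings `ok`, `partial`, and `failed` are hypothetical and not taken from the source store. Only the caution sentence is copied from the recommended wording.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class HotItem:
    # Hypothetical item shape; the real snapshot schema is not shown here.
    rank: int
    title: str
    metrics_collected: bool  # False when the detail page failed to yield metrics


def summarize_run(items: List[HotItem]) -> Dict[str, Union[str, int]]:
    """Report hotlist capture separately from comment-metric completeness."""
    if not items:
        # An empty hotlist counts as a failed collection, not a partial one.
        return {"status": "failed", "item_count": 0,
                "missing_comment_metrics": 0, "note": ""}

    missing = sum(1 for item in items if not item.metrics_collected)
    return {
        "status": "partial" if missing else "ok",
        "item_count": len(items),
        "missing_comment_metrics": missing,
        # Recommended caution wording, used only when something is missing.
        "note": "热榜列表已采集，评论指标为部分完成。" if missing else "",
    }


summary = summarize_run([
    HotItem(1, "A", True),
    HotItem(2, "B", False),
    HotItem(3, "C", False),
])
print(summary["status"], summary["missing_comment_metrics"])  # partial 2
```

Keeping `item_count` and `missing_comment_metrics` as separate fields lets a caller state that the hotlist itself was captured even when comment metrics were not.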
--- a/skills/zhihu-hotlist/references/report-format.md
+++ b/skills/zhihu-hotlist/references/report-format.md
@@ -0,0 +1,41 @@
+# Report Format
+
+The source report mode renders a compact text report from a snapshot.
+
+## Header Line
+
+Use this structure:
+
+```text
+知乎热榜报告 <snapshot_id>: 共 <item_count> 条，采集于 <captured_at_ms>
+```
+
+## Per-Item Line
+
+Use this structure:
+
+```text
+<rank>. <title> | 热度 <heat_text> | 评论指标 <metric_count> 条 | 回复 <reply_total> | 赞同 <upvote_total> | 收藏 <favorite_total> | 红心 <heart_total>
+```
+
+## Field Semantics
+
+- `metric_count`: number of collected comment metric records for the item
+- `reply_total`: sum of reply counts across collected records
+- `upvote_total`: sum of upvote counts across collected records
+- `favorite_total`: sum of favorite counts across collected records
+- `heart_total`: sum of heart counts across collected records
+
+## Missing-Metric Handling
+
+If an item has no collected comment metrics:
+
+- keep the item in the report
+- show metric count as `0`
+- explicitly note partial data elsewhere in the result summary if the run was incomplete
+
+## Report Mode Behavior
+
+- If a specific snapshot ID is supplied, report from that snapshot.
+- Otherwise, use the latest known snapshot.
+
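The header and per-item templates in `report-format.md` can be rendered with a few lines of string formatting. This is a sketch under assumptions: the snapshot is modeled as a plain dict, and the keys `items`, `comment_metrics`, `reply_count`, `upvote_count`, `favorite_count`, and `heart_count` are hypothetical names chosen for illustration. Only `snapshot_id`, `captured_at_ms`, and the line layout come from the reference.

```python
from typing import Dict, List

# Hypothetical per-record keys, in the order the per-item line prints them.
METRIC_KEYS = ("reply_count", "upvote_count", "favorite_count", "heart_count")


def render_item_line(item: Dict) -> str:
    """Render one hot item; items without metrics stay in the report with zeros."""
    records: List[Dict] = item.get("comment_metrics") or []
    # Sum each metric across collected records, treating missing values as 0.
    reply, upvote, favorite, heart = (
        sum(record.get(key) or 0 for record in records) for key in METRIC_KEYS
    )
    return (
        f"{item['rank']}. {item['title']} | 热度 {item['heat_text']} | "
        f"评论指标 {len(records)} 条 | 回复 {reply} | 赞同 {upvote} | "
        f"收藏 {favorite} | 红心 {heart}"
    )


def render_report(snapshot: Dict) -> str:
    """Render the header line followed by one line per item."""
    items = snapshot.get("items") or []
    header = (
        f"知乎热榜报告 {snapshot['snapshot_id']}: "
        f"共 {len(items)} 条，采集于 {snapshot['captured_at_ms']}"
    )
    return "\n".join([header, *(render_item_line(item) for item in items)])


example = {
    "snapshot_id": "snap-1",
    "captured_at_ms": 1700000000000,
    "items": [
        {"rank": 1, "title": "A", "heat_text": "100 万热度",
         "comment_metrics": [{"reply_count": 2, "upvote_count": 10},
                             {"reply_count": 3, "upvote_count": 5, "heart_count": 1}]},
        {"rank": 2, "title": "B", "heat_text": "50 万热度"},
    ],
}
print(render_report(example))
```

The second item has no collected metrics, so it is kept in the output with a metric count of `0`, matching the missing-metric handling described above.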