feat: add initial skill authoring workspace

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
木炎
2026-04-02 18:34:56 +08:00
parent a461b0734e
commit 51913555ad
30 changed files with 7114 additions and 0 deletions


@@ -0,0 +1,68 @@
# Collection Flow
This skill uses the preserved source flow in `assets/zhihu_hotlist_flow.source.json`.
## Source Model
The source implementation does four things:
1. ensure the browser is on the hotlist page
2. capture hotlist HTML
3. extract the top N items from the page
4. visit each item detail page and try to collect visible comment metrics
## Hotlist Page Detection
- Preferred page URL: `https://www.zhihu.com/hot`
- Domain: `www.zhihu.com`
- Guard text: `热榜` ("hot list")
The source flow first probes the current page for the guard text before deciding whether it must navigate.
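The guard check can be sketched as a small pure function. This is a minimal sketch, not the source flow itself: the function name `needs_navigation` and the choice to require both the domain and the guard text are assumptions made for illustration.

```python
from urllib.parse import urlparse

HOTLIST_URL = "https://www.zhihu.com/hot"
HOTLIST_DOMAIN = "www.zhihu.com"
GUARD_TEXT = "热榜"


def needs_navigation(current_url: str, visible_text: str) -> bool:
    """Return True when the browser should be sent to HOTLIST_URL.

    Assumption: the page counts as "already on the hotlist" only when
    it is on the expected domain AND the guard text is visible.
    """
    on_domain = urlparse(current_url).hostname == HOTLIST_DOMAIN
    has_guard = GUARD_TEXT in visible_text
    return not (on_domain and has_guard)
```

A caller would navigate to `HOTLIST_URL` only when this returns `True`, which avoids a redundant page load when the browser is already in place.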
## Hotlist Extraction
The source selectors look for:
- hotlist root
- hotlist item
- title link
- summary
- heat text
If the page HTML is empty or exposes no items, the collection should be treated as failed.
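The failure rule above can be enforced right after extraction. The sketch below assumes the extractor has already produced a list of items; `HotlistCollectionError` and `require_hotlist_items` are hypothetical names, not identifiers from the source flow.

```python
class HotlistCollectionError(RuntimeError):
    """Raised when the hotlist page yields nothing usable."""


def require_hotlist_items(html: str, items: list) -> list:
    """Return items unchanged, or raise if the capture must count as failed."""
    if not html or not html.strip():
        raise HotlistCollectionError("hotlist HTML is empty")
    if not items:
        raise HotlistCollectionError("hotlist HTML exposed no items")
    return items
```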
## Comment Metric Collection
For each hot item:
1. navigate to the item detail page
2. wait for page root
3. scroll toward comments
4. wait for comment list
5. scroll comment list into view
6. capture page HTML
7. parse visible metrics from comment items
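Steps 1 to 6 above can be sketched against an abstract page driver. The `DetailPage` protocol, its method names, and the target labels (`"page_root"`, `"comments"`, `"comment_list"`) are all hypothetical stand-ins; the real flow uses whatever selectors are stored in the preserved source JSON.

```python
from typing import Protocol


class DetailPage(Protocol):
    """Hypothetical minimal browser interface used only for this sketch."""

    def goto(self, url: str) -> None: ...
    def wait_for(self, target: str) -> None: ...
    def scroll_to(self, target: str) -> None: ...
    def html(self) -> str: ...


def collect_comment_html(page: DetailPage, url: str) -> str:
    """Run steps 1-6 in order and return the captured page HTML."""
    page.goto(url)                  # 1. navigate to the item detail page
    page.wait_for("page_root")      # 2. wait for page root
    page.scroll_to("comments")      # 3. scroll toward comments
    page.wait_for("comment_list")   # 4. wait for comment list
    page.scroll_to("comment_list")  # 5. scroll comment list into view
    return page.html()              # 6. capture page HTML
```

Step 7, parsing metrics out of the returned HTML, is kept separate so that a parse failure can be recorded per item without repeating the navigation.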
## Parsed Metrics
The source collector tries to extract:
- reply count
- upvote count
- favorite count
- heart count
It also preserves unmatched numeric metrics as raw metric fields when possible.
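One way to keep unmatched metrics is to route every labelled number either to a known field or to a raw bucket. The sketch below is illustrative only: the label-to-field mapping, the field names, and the `raw_metrics` key are assumptions, with the Chinese labels borrowed from the report format rather than from the source collector.

```python
# Assumed mapping from visible label text to record field names.
KNOWN_LABELS = {
    "回复": "reply_count",
    "赞同": "upvote_count",
    "收藏": "favorite_count",
    "红心": "heart_count",
}


def classify_metrics(pairs: list[tuple[str, int]]) -> dict:
    """Split (label, value) pairs into known fields and raw leftovers."""
    record: dict = {"raw_metrics": {}}
    for label, value in pairs:
        field_name = KNOWN_LABELS.get(label)
        if field_name is not None:
            record[field_name] = value
        else:
            # Preserve the number under its original label instead of dropping it.
            record["raw_metrics"][label] = value
    return record
```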
## Count Parsing
The source parser recognizes compact counts such as:
- plain integers
- `万`
- `亿`
- `k`
- `m`
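Compact display text is already rounded by the site, so a count parsed from it (for example `1.2万` read as 12000) is an approximation. Say so when summarizing parsed counts, and do not present them as exact totals.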
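A parser for these forms might look like the following. This is a sketch under stated assumptions, not the source parser: accepting decimals, thousands separators, and upper-case `K`/`M` are choices made here for illustration, and `parse_compact_count` is a hypothetical name.

```python
import re
from decimal import Decimal
from typing import Optional

# Multipliers for the suffixes listed above; "" covers plain integers.
_MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000, "万": 10_000, "亿": 100_000_000}
_COUNT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kKmM万亿]?)\s*$")


def parse_compact_count(text: str) -> Optional[int]:
    """Parse display text such as '1.2万' or '3k'; return None if unrecognized."""
    match = _COUNT_RE.match(text.replace(",", ""))
    if match is None:
        return None
    number, suffix = match.groups()
    # Decimal avoids float rounding; int() drops any fraction of a single unit.
    return int(Decimal(number) * _MULTIPLIERS[suffix.lower()])
```

Returning `None` for unrecognized text, instead of `0`, keeps "no metric found" distinguishable from a real zero when the report is built.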


@@ -0,0 +1,46 @@
# Data Quality
This skill can return useful partial data, but it must not overclaim completeness.
## Main Quality Risks
- comment areas may not load for every hot item
- the DOM may expose only visible comments, not the full set
- generic selectors may match the wrong footer controls
- compact text counts can be parsed but still reflect display approximations
## Partial Success Rule
The source implementation tracks partial item failures during comment collection. If some detail pages fail but the run still returns a snapshot:
- report the run as partial
- include how many items were missing comment metrics
- keep the successful hotlist capture separate from comment-metric completeness
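The rule can be sketched as a small status object that keeps the two facts separate. The class name, field names, and the three status strings are assumptions for illustration; the source implementation may track partial failures differently.

```python
from dataclasses import dataclass


@dataclass
class RunStatus:
    hotlist_captured: bool      # did the hotlist list itself come back?
    items_total: int            # number of hot items in the snapshot
    items_missing_metrics: int  # items whose detail page gave no comment metrics

    @property
    def status(self) -> str:
        """Return 'failed', 'partial', or 'complete' for the whole run."""
        if not self.hotlist_captured or self.items_total == 0:
            return "failed"
        if self.items_missing_metrics > 0:
            return "partial"
        return "complete"
```

Because `hotlist_captured` and `items_missing_metrics` are stored independently, a summary can state that the list was captured while still reporting how many items lack comment metrics.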
## Snapshot Caveats
The source store design keeps:
- `snapshot_id`
- capture timestamp
- page URL
- collector version
- item list
- collection stats
This is enough for reproducible reporting, but it does not guarantee that every metric field was fully captured.
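As a rough illustration, the stored fields could be modelled like this. Only `snapshot_id` is named in the source store design; `captured_at_ms` is borrowed from the report header, and the remaining attribute names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Snapshot:
    snapshot_id: str
    captured_at_ms: int        # capture timestamp, assumed to be epoch milliseconds
    page_url: str
    collector_version: str
    items: list = field(default_factory=list)   # item list
    stats: dict = field(default_factory=dict)   # collection stats


def to_json(snapshot: Snapshot) -> str:
    """Serialize a snapshot; ensure_ascii=False keeps Chinese titles readable."""
    return json.dumps(asdict(snapshot), ensure_ascii=False, sort_keys=True)
```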
## Recommended Caution Language
Use wording like:
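- `热榜列表已采集,评论指标为部分完成。` (The hotlist was captured; comment metrics are only partially complete.)
- `报告基于最新快照生成,部分条目缺少评论指标。` (The report was generated from the latest snapshot; some items lack comment metrics.)
- `数字来自页面可见指标,可能低于完整站内统计。` (Figures come from metrics visible on the page and may be lower than the site's full statistics.)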
Avoid wording like:
- `全部评论指标已准确采集` (All comment metrics were collected accurately)
- `完整真实热度` (Complete, true heat)
- `无缺失` (Nothing missing)


@@ -0,0 +1,41 @@
# Report Format
The source report mode renders a compact text report from a snapshot.
## Header Line
Use this structure:
```text
知乎热榜报告 <snapshot_id>: 共 <item_count> 条,采集于 <captured_at_ms>
```
## Per-Item Line
Use this structure:
```text
<rank>. <title> | 热度 <heat_text> | 评论指标 <metric_count> 条 | 回复 <reply_total> | 赞同 <upvote_total> | 收藏 <favorite_total> | 红心 <heart_total>
```
## Field Semantics
- `metric_count`: number of collected comment metric records for the item
- `reply_total`: sum of reply counts across collected records
- `upvote_total`: sum of upvote counts across collected records
- `favorite_total`: sum of favorite counts across collected records
- `heart_total`: sum of heart counts across collected records
## Missing-Metric Handling
If an item has no collected comment metrics:
- keep the item in the report
- show metric count as `0`
- if the run was incomplete, state explicitly in the result summary that the data is partial
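The header line, per-item line, and missing-metric rule can be sketched together. The record keys (`reply_count`, `upvote_count`, `favorite_count`, `heart_count`) and both function names are assumptions; only the output templates come from this document.

```python
def render_header(snapshot_id: str, item_count: int, captured_at_ms: int) -> str:
    """Render the report header line."""
    return f"知乎热榜报告 {snapshot_id}: 共 {item_count} 条,采集于 {captured_at_ms}"


def render_item_line(rank: int, title: str, heat_text: str, records: list[dict]) -> str:
    """Render one item line; an empty records list yields zeros, not a skipped item."""

    def total(key: str) -> int:
        # Records missing a key contribute 0 to that total.
        return sum(record.get(key, 0) for record in records)

    return (
        f"{rank}. {title} | 热度 {heat_text} | 评论指标 {len(records)} 条"
        f" | 回复 {total('reply_count')} | 赞同 {total('upvote_count')}"
        f" | 收藏 {total('favorite_count')} | 红心 {total('heart_count')}"
    )
```

Summing with a default of 0 means an item with no collected records still renders, with `metric_count` shown as `0`. The report line alone cannot tell a real zero from a missing metric, which is why the run summary has to carry the partial-data note.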
## Report Mode Behavior
- If a specific snapshot ID is supplied, report from that snapshot.
- Otherwise, use the latest known snapshot.
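Snapshot selection could be sketched as follows. Treating "latest" as the largest `captured_at_ms`, representing snapshots as dictionaries, and raising on a missing ID are all assumptions made for this example.

```python
from typing import Optional


def select_snapshot(snapshots: list[dict], snapshot_id: Optional[str] = None) -> dict:
    """Pick the requested snapshot, or the most recently captured one."""
    if snapshot_id is not None:
        for snapshot in snapshots:
            if snapshot["snapshot_id"] == snapshot_id:
                return snapshot
        raise KeyError(f"unknown snapshot_id: {snapshot_id}")
    if not snapshots:
        raise LookupError("no snapshots available")
    # Assumption: "latest" means the greatest capture timestamp.
    return max(snapshots, key=lambda snapshot: snapshot["captured_at_ms"])
```

Raising when an explicit ID is not found, instead of silently falling back to the latest snapshot, keeps a report from being attributed to the wrong capture.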