skill-lib/skills/zhihu-hotlist/references/data-quality.md

# Data Quality

This skill can return useful partial data, but it must not overclaim completeness.

## Main Quality Risks

- comment areas may not load for every hot item
- the DOM may expose only visible comments, not the full set
- generic selectors may match the wrong footer controls
- compact count strings (abbreviated displays such as `1.2 万`) can be parsed, but the parsed number is still a display approximation, not an exact count
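
The last risk can be made explicit in code. The following is a minimal sketch, not the source implementation: `parse_compact_count`, `ParsedCount`, and the unit table are hypothetical names, and the suffixes handled here (`k`, `w`, `万`, `亿`, trailing `+`) are assumptions about what the page may display. The point is to carry an `approximate` flag alongside the parsed value so that reports do not present a rounded display figure as exact.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed unit suffixes; the real page may use a different set.
_UNITS = {
    "": 1,
    "k": 1_000, "K": 1_000,
    "w": 10_000, "W": 10_000,
    "万": 10_000,
    "亿": 100_000_000,
}

_COUNT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kKwW万亿]?)\s*\+?\s*$")


@dataclass(frozen=True)
class ParsedCount:
    value: int          # best-effort integer value
    approximate: bool   # True when the display text was abbreviated


def parse_compact_count(text: str) -> Optional[ParsedCount]:
    """Parse a displayed count; return None when the text is not a count."""
    match = _COUNT_RE.match(text.replace(",", ""))
    if match is None:
        return None
    number, unit = match.groups()
    value = round(float(number) * _UNITS[unit])
    # A unit suffix or a trailing "+" means the page rounded the figure.
    approximate = bool(unit) or "+" in text
    return ParsedCount(value=value, approximate=approximate)
```

A caller can then treat `parse_compact_count("1.2 万")` as roughly 12000 rather than exactly 12000, and treat a `None` result as a missing metric instead of zero.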

## Partial Success Rule

The source implementation tracks partial item failures during comment collection. If some detail pages fail but the run still returns a snapshot:

- report the run as partial
- include how many items were missing comment metrics
- keep the successful hotlist capture separate from comment-metric completeness
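
As an illustration of this rule, the sketch below derives a run summary from a list of items. It assumes each item is a mapping whose `comment_count` key is `None` or absent when the detail page failed; `RunSummary`, `summarize_run`, and the key name are hypothetical and are not taken from the source implementation.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping


@dataclass(frozen=True)
class RunSummary:
    status: str                          # "complete" or "partial"
    hotlist_items: int                   # items captured from the hotlist itself
    items_missing_comment_metrics: int   # items whose comment metrics failed


def summarize_run(items: Iterable[Mapping[str, Any]]) -> RunSummary:
    """Report hotlist capture and comment-metric completeness separately."""
    items = list(items)
    missing = sum(1 for item in items if item.get("comment_count") is None)
    return RunSummary(
        status="partial" if missing else "complete",
        hotlist_items=len(items),
        items_missing_comment_metrics=missing,
    )


summary = summarize_run([
    {"title": "a", "comment_count": 10},
    {"title": "b", "comment_count": None},  # detail page failed
    {"title": "c"},                         # metric never collected
])
print(summary.status, summary.items_missing_comment_metrics)
```

Keeping `hotlist_items` and `items_missing_comment_metrics` as separate fields means a successful hotlist capture is never reported as a failure just because some comment metrics are missing, and the reverse.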

## Snapshot Caveats

The source store design keeps:

- `snapshot_id`
- capture timestamp
- page URL
- collector version
- item list
- collection stats

This is enough for reproducible reporting, but it does not guarantee that every metric field was fully captured.
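
A minimal sketch of such a record follows. Only `snapshot_id` is named in the source; the other field names, the example values, and the `missing_metric_items` helper are assumptions made for illustration. The helper shows why a well-formed snapshot still needs a completeness check before its metrics are reported.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Snapshot:
    snapshot_id: str
    captured_at: str            # capture timestamp, ISO 8601 (assumed format)
    page_url: str
    collector_version: str
    items: List[Dict[str, Any]] = field(default_factory=list)
    stats: Dict[str, Any] = field(default_factory=dict)


def missing_metric_items(snapshot: Snapshot, metric: str = "comment_count") -> List[str]:
    """Return titles of items whose metric was not captured."""
    return [
        str(item.get("title", "?"))
        for item in snapshot.items
        if item.get(metric) is None
    ]


# Example values are invented for illustration only.
example = Snapshot(
    snapshot_id="snap-001",
    captured_at="2024-01-01T00:00:00+00:00",
    page_url="https://www.zhihu.com/hot",
    collector_version="0.0.0",
    items=[
        {"title": "a", "comment_count": 12},
        {"title": "b", "comment_count": None},
    ],
    stats={"items_total": 2, "items_missing_comment_metrics": 1},
)

serialized = json.dumps(asdict(example), ensure_ascii=False)
print(missing_metric_items(example))
```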

## Recommended Caution Language

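Use wording like the following (the Chinese string is the literal report text; the English gloss is for reference):

- `热榜列表已采集，评论指标为部分完成。` (The hotlist was captured; comment metrics are only partially complete.)
- `报告基于最新快照生成，部分条目缺少评论指标。` (This report is based on the latest snapshot; some items are missing comment metrics.)
- `数字来自页面可见指标，可能低于完整站内统计。` (Figures come from metrics visible on the page and may be lower than the full on-site statistics.)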

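Avoid wording like the following, which overclaims completeness:

- `全部评论指标已准确采集` (All comment metrics were captured accurately)
- `完整真实热度` (Complete, true popularity)
- `无缺失` (Nothing missing)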
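
The wording guidance above could be enforced with a small check before a report is published. This is a sketch under assumptions: `caution_lines` and `find_overclaims` are hypothetical helpers, and the mapping from a missing-item count to a caution sentence is one possible policy, not something the source prescribes. The phrase lists are copied from this section.

```python
from typing import List

# Phrases this document says to avoid.
AVOID_PHRASES = ("全部评论指标已准确采集", "完整真实热度", "无缺失")

VISIBLE_METRICS_NOTE = "数字来自页面可见指标，可能低于完整站内统计。"
PARTIAL_NOTE = "报告基于最新快照生成，部分条目缺少评论指标。"


def caution_lines(items_missing_comment_metrics: int) -> List[str]:
    """Pick caution sentences for a report (assumed policy)."""
    lines = [VISIBLE_METRICS_NOTE]  # always applies: counts are display values
    if items_missing_comment_metrics > 0:
        lines.append(PARTIAL_NOTE)
    return lines


def find_overclaims(report_text: str) -> List[str]:
    """Return every avoided phrase that appears in the report text."""
    return [phrase for phrase in AVOID_PHRASES if phrase in report_text]


report = "\n".join(caution_lines(items_missing_comment_metrics=2))
print(find_overclaims(report))               # no avoided phrases expected
print(find_overclaims("本次采集无缺失"))      # flags the overclaiming phrase
```

Substring matching is deliberately simple here; it will not catch paraphrases of the same overclaim, so it complements a human review rather than replacing one.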