# Data Quality

This skill can return useful partial data, but it must not overclaim completeness.

## Main Quality Risks

- comment areas may not load for every hot item
- the DOM may expose only visible comments, not the full set
- generic selectors may match the wrong footer controls
- compact text counts can be parsed but still reflect display approximations
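
The last risk above can be illustrated with a small parser. This is a minimal sketch, not the source implementation: the helper name `parse_compact_count`, the `ParsedCount` type, and the suffix table (`k`, `w`, `万`, `亿`) are assumptions about how counts might be displayed. The point is that a successfully parsed value still carries an `approximate` flag whenever the page showed a rounded figure.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical suffix multipliers; the real page may use other units.
_MULTIPLIERS = {"": 1, "k": 1_000, "w": 10_000, "万": 10_000, "亿": 100_000_000}

# A number, an optional unit suffix, and an optional trailing "+".
_COUNT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kKwW万亿]?)\+?\s*$")


@dataclass(frozen=True)
class ParsedCount:
    value: int          # best-effort integer value
    approximate: bool   # True when the display text was rounded or open-ended


def parse_compact_count(text: str) -> Optional[ParsedCount]:
    """Parse a displayed count; return None if the text is not a count."""
    match = _COUNT_RE.match(text.replace(",", ""))
    if match is None:
        return None
    number, suffix = match.groups()
    multiplier = _MULTIPLIERS[suffix.lower()]
    # A unit suffix or a trailing "+" means the page showed a rounded figure.
    approximate = bool(suffix) or text.rstrip().endswith("+")
    return ParsedCount(value=round(float(number) * multiplier), approximate=approximate)
```

Under these assumptions, `parse_compact_count("1.2万")` yields a value of 12000 marked as approximate, while `parse_compact_count("1,234")` yields an exact 1234. Reports can then phrase approximate values with the caution they need.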

## Partial Success Rule

The source implementation tracks partial item failures during comment collection. If some detail pages fail but the run still returns a snapshot:

- report the run as partial
- include how many items were missing comment metrics
- keep the successful hotlist capture separate from comment-metric completeness
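
The three rules above could be applied when summarizing a run. The sketch below is illustrative only: it assumes each item is a dict whose `comment_count` is `None` when the detail page failed, and the function name `summarize_run` and all output keys are hypothetical rather than taken from the source implementation.

```python
from typing import Any, Dict, List


def summarize_run(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize a run, keeping hotlist capture separate from comment completeness."""
    # Assumption: a missing or None `comment_count` means the detail page failed.
    missing = sum(1 for item in items if item.get("comment_count") is None)
    if not items:
        status = "empty"
    elif missing == 0:
        status = "complete"
    else:
        status = "partial"
    return {
        "hotlist_captured": len(items) > 0,        # list capture, reported on its own
        "items_total": len(items),
        "items_missing_comment_metrics": missing,  # how many items lack comment metrics
        "comment_metrics_status": status,
    }
```

Keeping `hotlist_captured` and `comment_metrics_status` as separate fields lets a report state that the list itself succeeded even when comment collection did not.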

## Snapshot Caveats

The source store design keeps:

- `snapshot_id`
- capture timestamp
- page URL
- collector version
- item list
- collection stats

This is enough for reproducible reporting, but it does not guarantee that every metric field was fully captured.
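
One way to picture the record described above is as a small data class with a per-metric coverage check. This is a sketch under assumptions: only `snapshot_id` is a name from the source, while `Snapshot`, the other field names, and the `coverage` method are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Snapshot:
    snapshot_id: str
    captured_at: datetime       # capture timestamp
    page_url: str
    collector_version: str
    items: List[Dict[str, Any]] = field(default_factory=list)
    stats: Dict[str, int] = field(default_factory=dict)

    def coverage(self, metric: str) -> float:
        """Fraction of items where `metric` is present and not None."""
        if not self.items:
            return 0.0
        present = sum(1 for item in self.items if item.get(metric) is not None)
        return present / len(self.items)
```

A report generator could call `snapshot.coverage("comment_count")` and switch to cautious wording whenever the result is below 1.0.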
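
## Recommended Caution Language

Use wording like:

- `热榜列表已采集,评论指标为部分完成。` (The hot list has been captured; comment metrics are only partially complete.)
- `报告基于最新快照生成,部分条目缺少评论指标。` (This report is based on the latest snapshot; some items are missing comment metrics.)
- `数字来自页面可见指标,可能低于完整站内统计。` (Figures come from metrics visible on the page and may be lower than the site's full statistics.)

Avoid wording like:

- `全部评论指标已准确采集` (All comment metrics were captured accurately)
- `完整真实热度` (Complete, true popularity)
- `无缺失` (Nothing missing)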