acceptance: stabilize zhihu hotlist excel flow
@@ -1,8 +1,8 @@
# Zhihu Hotlist Excel Acceptance

- Date: 2026-03-29 20:53:59 +0800
- Date: 2026-03-29 23:15:47 +0800
- Mode: real provider + live Zhihu hotlist API + simulated browser pipe
- Workspace: `/tmp/sgclaw-live-acceptance-5o87r76f`
- Workspace: `/tmp/sgclaw-live-acceptance-icg8dnwq`
- Final success: `True`
- Total score: `100/100`

@@ -16,68 +16,67 @@

## Final Output

- exported_path: `/tmp/sgclaw-live-acceptance-5o87r76f/.sgclaw-zeroclaw-workspace/out/zhihu-hotlist-1774788827307024821.xlsx`
- final_summary: `Perfect! I have successfully collected and exported the Zhihu hotlist data. Per the export completion contract, I must include the path of the generated local .xlsx file.
- exported_path: `/tmp/sgclaw-live-acceptance-icg8dnwq/.sgclaw-zeroclaw-workspace/zhihu_hotlist_20260329_2314.xlsx`
- final_summary: `Great! The Excel file has been exported successfully. Here is a complete summary of the operation and its results.

## Task Completion Summary
## Zhihu Hotlist Collection and Export Complete

**Operation type**: collect
**Data source**: https://www.zhihu.com/hot
**Collected at**: 2026-03-29 20:53:02 (+08:00)
**Items collected**: 10
**Comment metrics**: not collected (basic hotlist data only)
**Data source**: https://www.zhihu.com/hot
**Collected at**: 2026-03-29 23:14:41 (+08:00)
**Collection result**: successfully fetched 10 hotlist entries

**Export artifact**:
```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": 10
}
```
**Exported file**: `/tmp/sgclaw-live-acceptance-icg8dnwq/.sgclaw-zeroclaw-workspace/zhihu_hotlist_20260329_2314.xlsx`

**Generated Excel file path**: `/tmp/sgclaw-live-acceptance-5o87r76f/.sgclaw-zeroclaw-workspace/out/zhihu-hotlist-1774788827307024821.xlsx`
**Data overview**:
1. How should we view 张雪机车's victory at the 2026 WSBK Portugal round? - heat: 1399万
2. 李荣浩 presents evidence and fires four questions in a row at 单依纯 - heat: 805万
3. Japan plans to use foreign reserves to short international crude oil - heat: 517万
4. Officials report a woman acquitted and released after detention - heat: 429万
5. Major gold buyer Turkey's central bank sells off gold - heat: 196万
6. After its university of science and technology was struck, Iran lists US and Israeli universities as strike targets - heat: 173万
7. ByteDance takes on the internet giants single-handedly - heat: 151万
8. Parents of top students keep a low profile and don't post grades - heat: 129万
9. DDR5 memory prices decline - heat: 110万
10. A choice among four superpowers - heat: 75万

**Hotlist data overview**:
1. Topics around the official report on a woman acquitted and released after detention - heat 1105万
2. 张雪机车 wins the WSBK Portugal round - heat 756万
3. Senior high school students request removal of a bird's nest - heat 549万
4. Japan plans to short international crude oil - heat 359万
5. Google's TurboQuant paper suspected of fabrication - heat 338万
6. Iran lists US and Israeli universities as strike targets - heat 328万
7. China's smartphone market declines while Apple grows against the trend - heat 317万
8. Four-superpowers choice topic - heat 194万
9. Turkey's central bank sells off gold - heat 166万
10. China U23 men's football draws with North Korea - heat 99万
**Excel file contents**:
- Worksheet name: `知乎热榜`
- Columns: `rank`, `title`, `heat`
- Rows: 10 complete hotlist entries

The Zhihu hotlist data was collected live from the browser and exported to an Excel file with complete rank, title, and heat information.`
All hotlist data was collected live from the official Zhihu site and exported to Excel; the file is saved at the specified path.`
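The export artifact above amounts to a small contract: the payload must declare exactly the columns `rank`, `title`, `heat`, and a non-empty list of rows whose width matches the declared columns. A minimal standalone Python sketch of those checks (`validate_payload` and `EXPECTED_COLUMNS` are illustrative names, not part of the codebase):

```python
# Standalone sketch (not the real openxml_office tool): check an export
# payload against the contract shown above -- the exact column set
# [rank, title, heat] in any order, non-empty rows of matching width.
EXPECTED_COLUMNS = ["rank", "title", "heat"]

def validate_payload(payload: dict) -> list[str]:
    """Return human-readable contract violations (empty list = valid)."""
    errors = []
    columns = [str(c).strip() for c in payload.get("columns", [])]
    if sorted(columns) != sorted(EXPECTED_COLUMNS):
        errors.append("unsupported columns: expected [rank, title, heat]")
    rows = payload.get("rows", [])
    if not rows:
        errors.append("rows must not be empty")
    if any(len(row) != len(columns) for row in rows):
        errors.append("each row must match the declared columns length")
    return errors

print(validate_payload({
    "columns": ["rank", "title", "heat"],
    "rows": [[1, "问题一", "344万"]],
}))  # []
```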

## Skill Logs

- `DeepSeek config loaded from /tmp/sgclaw-live-acceptance-5o87r76f/sgclaw_config.json model=deepseek-chat base_url=https://api.deepseek.com`
- `DeepSeek config loaded from /tmp/sgclaw-live-acceptance-icg8dnwq/sgclaw_config.json model=deepseek-chat base_url=https://api.deepseek.com`
- `skills dir resolved to /home/zyl/projects/sgClaw/skill_lib/skills`
- `runtime profile=BrowserAttached skills_prompt_mode=Compact`
- `zeroclaw_process_message_primary`
- `先规划再执行知乎热榜 Excel 导出
navigate https://www.zhihu.com/hot
getText main
call openxml_office
return generated local .xlsx path`
- `loaded skills: office-export-xlsx, zhihu-hotlist, zhihu-hotlist-screen, zhihu-navigate, zhihu-write`
- `read_skill zhihu-hotlist`
- `navigate https://www.zhihu.com/hot`
- `getText main`
- `read_skill zhihu-hotlist`
- `call openxml_office`
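Between `getText main` and `call openxml_office`, the agent has to turn raw page text into the structured rows the tool expects. A hypothetical sketch of that hand-off, assuming hotlist lines shaped like the samples below (`parse_hotlist` and `LINE_RE` are invented names; the real skill may parse differently):

```python
import re

# Hypothetical hand-off sketch: parse hotlist lines of the form
# "<rank>. <title> | <heat>" into [rank, title, heat] rows.
LINE_RE = re.compile(r"^(\d+)\.\s*(.+?)\s*\|\s*(\S+)\s*$")

def parse_hotlist(text: str) -> list[list]:
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rank, title, heat = match.groups()
            rows.append([int(rank), title, heat])
    return rows

sample = "1. 官方通报女子被羁押后无罪释放 | 1105万\n2. 张雪机车 WSBK 夺冠 | 756万"
print(parse_hotlist(sample))
# [[1, '官方通报女子被羁押后无罪释放', '1105万'], [2, '张雪机车 WSBK 夺冠', '756万']]
```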

## Live Hotlist Sample

- 1. Officials report that a detained woman was acquitted and released, her state compensation claim was halted after 13 days, and a local joint investigation team has been formed. What most needs to be clarified, and what reflections does it prompt? | 1105万
- 2. How should we view 张雪机车's victory at the 2026 WSBK Portugal round, and what impact will it have on domestic motorcycle racing? | 756万
- 3. Senior high school students asked their school to remove a bird's nest because birdsong disturbed exam preparation; the principal wrote back that "learning to coexist with all things is a required course in growing up." How should this approach to education be evaluated? | 549万
- 4. Japan plans to use its foreign reserves to short international crude oil to rescue the yen. What do you think, and could this replay the 1996 "Sumitomo copper incident"? | 359万
- 5. Google's TurboQuant paper, claimed to cut memory use six-fold, is suspected of fabrication; the RaBitQ authors publish an exclusive response | 338万
- 6. After the strike on Iran University of Science and Technology, Iran lists US and Israeli universities as "legitimate strike targets." If the war spreads to educational institutions, is there any way back? | 328万
- 7. China's smartphone market fell 4%; why did Apple's sales grow 23% against the trend? | 317万
- 8. If you could choose one of four superpowers: invisibility, X-ray vision, flight, or foreseeing the next half hour, which would you pick? | 194万
- 9. Major gold buyer Turkey's central bank sold off $8 billion in gold during the Iran war. What does this mean? | 166万
- 10. In a youth friendly, China's U23 men's team drew 1-1 with North Korea. How should the match be assessed? | 99万
- 1. How should we view 张雪机车's victory at the 2026 WSBK Portugal round, and what impact will it have on domestic motorcycle racing? | 1399万
- 2. 李荣浩 presents evidence and asks 单依纯 four questions in a row: why were unlicensed songs included in the concert, and what may have gone wrong during its preparation? | 805万
- 3. Japan plans to use its foreign reserves to short international crude oil to rescue the yen. What do you think, and could this replay the 1996 "Sumitomo copper incident"? | 517万
- 4. Officials report that a detained woman was acquitted and released, her state compensation claim was halted after 13 days, and a local joint investigation team has been formed. What most needs to be clarified, and what reflections does it prompt? | 429万
- 5. Major gold buyer Turkey's central bank sold off $8 billion in gold during the Iran war. What does this mean? | 196万
- 6. After the strike on Iran University of Science and Technology, Iran lists US and Israeli universities as "legitimate strike targets." If the war spreads to educational institutions, is there any way back? | 173万
- 7. How has ByteDance managed, in just a few years, to take on all the internet giants single-handedly? | 151万
- 8. Why do the parents of the strongest students keep the lowest profile, never posting their children's grades on Moments? | 129万
- 9. DDR5 memory prices dropped noticeably in March. Is this a short-term phenomenon, or has the memory supply crunch genuinely eased? | 110万
- 10. If you could choose one of four superpowers: invisibility, X-ray vision, flight, or foreseeing the next half hour, which would you pick? | 75万

## Stderr

- `sgclaw ready: agent_id=cfae8218-6720-416e-a14e-6f85ce8ca6a4`
- `sgclaw ready: agent_id=7482cc6b-8fe0-4727-90da-7b3f62cad9b6`

@@ -1,6 +1,7 @@
use async_trait::async_trait;
use serde::Deserialize;
use serde_json::{json, Value};
use std::collections::BTreeSet;
use std::collections::BTreeMap;
use std::fs;
use std::path::{Path, PathBuf};
@@ -79,21 +80,35 @@ impl Tool for OpenXmlOfficeTool {
            .iter()
            .map(|value| value.to_string())
            .collect::<Vec<_>>();
        if parsed.columns != expected_columns {
            return Ok(failed_tool_result(
                "unsupported columns: expected [rank, title, heat]".to_string(),
            ));
        }
        let column_order = match resolve_column_order(&parsed.columns, &expected_columns) {
            Some(order) => order,
            None => {
                return Ok(failed_tool_result(
                    "unsupported columns: expected [rank, title, heat]".to_string(),
                ))
            }
        };

        if parsed.rows.is_empty() {
            return Ok(failed_tool_result("rows must not be empty".to_string()));
        }

        if parsed.rows.iter().any(|row| row.len() != parsed.columns.len()) {
            return Ok(failed_tool_result(
                "each row must match the declared columns length".to_string(),
            ));
        }

        if parsed.rows.iter().any(|row| row.len() != 3) {
            return Ok(failed_tool_result(
                "each row must contain exactly 3 values".to_string(),
            ));
        }
        let normalized_rows = parsed
            .rows
            .iter()
            .map(|row| reorder_row(row, &column_order))
            .collect::<Vec<_>>();

        let job_root = create_job_root(&self.workspace_root)?;
        let template_path = job_root.join("zhihu_hotlist_template.xlsx");
@@ -105,8 +120,8 @@ impl Tool for OpenXmlOfficeTool {
            .map(PathBuf::from)
            .unwrap_or_else(|| default_output_path(&self.workspace_root));

        write_hotlist_template(&template_path, parsed.rows.len())?;
        write_payload_json(&payload_path, &parsed.rows)?;
        write_hotlist_template(&template_path, normalized_rows.len())?;
        write_payload_json(&payload_path, &normalized_rows)?;
        write_request_json(&request_path, &template_path, &payload_path, &output_path)?;

        let rendered = run_openxml_cli(&request_path)?;
@@ -120,7 +135,7 @@ impl Tool for OpenXmlOfficeTool {
            output: json!({
                "sheet_name": DEFAULT_SHEET_NAME,
                "output_path": artifact_path,
                "row_count": parsed.rows.len(),
                "row_count": normalized_rows.len(),
                "renderer": OPENXML_OFFICE_TOOL_NAME
            })
            .to_string(),
@@ -156,6 +171,44 @@ fn default_output_path(workspace_root: &Path) -> PathBuf {
        .join(format!("zhihu-hotlist-{nanos}.xlsx"))
}

fn resolve_column_order(
    provided_columns: &[String],
    expected_columns: &[String],
) -> Option<Vec<usize>> {
    if provided_columns.len() != expected_columns.len() {
        return None;
    }

    let provided_set = provided_columns
        .iter()
        .map(|value| value.trim().to_string())
        .collect::<BTreeSet<_>>();
    let expected_set = expected_columns
        .iter()
        .cloned()
        .collect::<BTreeSet<_>>();

    if provided_set != expected_set {
        return None;
    }

    expected_columns
        .iter()
        .map(|expected| {
            provided_columns
                .iter()
                .position(|provided| provided.trim() == expected)
        })
        .collect::<Option<Vec<_>>>()
}

fn reorder_row(row: &[Value], column_order: &[usize]) -> Vec<Value> {
    column_order
        .iter()
        .map(|index| row[*index].clone())
        .collect()
}

fn write_payload_json(path: &Path, rows: &[Vec<Value>]) -> anyhow::Result<()> {
    let mut variables = BTreeMap::new();
    for (idx, row) in rows.iter().enumerate() {

@@ -51,3 +51,41 @@ async fn openxml_office_tool_renders_hotlist_xlsx_from_rows() {
    assert!(xml.contains("问题二"));
    assert!(!xml.contains("{{TITLE_1}}"));
}

#[tokio::test]
async fn openxml_office_tool_accepts_reordered_columns_when_rows_are_structured() {
    let workspace_root = temp_workspace_root();
    let output_path = workspace_root.join("out/zhihu-hotlist-reordered.xlsx");
    let tool = OpenXmlOfficeTool::new(workspace_root.clone());

    let result = tool
        .execute(json!({
            "sheet_name": "知乎热榜",
            "columns": ["title", "heat", "rank"],
            "rows": [
                ["问题一", "344万", 1],
                ["问题二", "266万", 2]
            ],
            "output_path": output_path
        }))
        .await
        .unwrap();

    assert!(result.success, "{result:?}");
    assert!(output_path.exists());

    let unzip = ProcessCommand::new("unzip")
        .args([
            "-p",
            output_path.to_str().unwrap(),
            "xl/worksheets/sheet1.xml",
        ])
        .output()
        .unwrap();
    assert!(unzip.status.success());

    let xml = String::from_utf8(unzip.stdout).unwrap();
    assert!(xml.contains("问题一"));
    assert!(xml.contains("344万"));
    assert!(xml.contains(">1<"));
}

@@ -7,14 +7,14 @@ class LiveAcceptanceScoreTest(unittest.TestCase):
    def test_score_acceptance_handles_preloaded_office_skill_without_read_skill_log(self):
        result = {
            "logs": [
                {"message": "navigate https://www.zhihu.com/hot"},
                {"message": "plan 读取知乎热榜并导出 Excel"},
                {"message": "navigate https://www.zhihu.com/hot"},
                {"message": "getText body"},
                {"message": "call openxml_office"},
            ],
            "final_task": {
                "success": True,
                "summary": "已导出 Excel",
                "summary": "已导出 Excel /tmp/sgclaw/out.xlsx",
            },
            "stderr": [],
            "exports": [],
@@ -25,6 +25,77 @@ class LiveAcceptanceScoreTest(unittest.TestCase):

        self.assertEqual(score["skill_selection"], 30)
        self.assertEqual(score["final_response_quality"], 5)
        self.assertNotIn("planner output missing before tool execution", score["deductions"])

    def test_score_acceptance_flags_missing_plan_repeated_summary_and_fake_export_path(self):
        repeated = "第一段总结。\n\n第一段总结。"
        result = {
            "logs": [
                {"message": "navigate https://www.zhihu.com/hot"},
                {"message": "getText main"},
                {"message": "call openxml_office"},
            ],
            "final_task": {
                "success": True,
                "summary": f"{repeated}\n\n导出路径:/tmp/not-real.xlsx",
            },
            "stderr": [],
            "exports": [],
        }
        items = [HotItem(rank=1, title="标题", heat="123万")]

        score = score_acceptance(result, items)

        self.assertIn("planner output missing before tool execution", score["deductions"])
        self.assertIn("repeated assistant paragraphs detected", score["deductions"])
        self.assertIn("export missing output path", score["deductions"])
        self.assertEqual(score["final_response_quality"], 0)

    def test_score_acceptance_flags_fake_rows_when_export_contains_no_live_hotlist_data(self):
        result = {
            "logs": [
                {"message": "plan 读取知乎热榜并导出 Excel"},
                {"message": "navigate https://www.zhihu.com/hot"},
                {"message": "getText main"},
                {"message": "call openxml_office"},
            ],
            "final_task": {
                "success": True,
                "summary": "已导出 Excel /tmp/sgclaw/out.xlsx",
            },
            "stderr": [],
            "exports": [],
        }
        items = [HotItem(rank=1, title="真实标题", heat="123万")]

        score = score_acceptance(result, items)

        self.assertIn("hotlist rows were not exported as structured live data", score["deductions"])
        self.assertEqual(score["hotlist_data_correctness"], 0)
        self.assertEqual(score["xlsx_export_success"], 0)

    def test_score_acceptance_flags_structured_handoff_retry_noise(self):
        result = {
            "logs": [
                {"message": "plan 读取知乎热榜并导出 Excel"},
                {"message": "navigate https://www.zhihu.com/hot"},
                {"message": "getText main"},
                {"message": "call openxml_office"},
                {"message": "unsupported columns: expected [rank, title, heat]"},
                {"message": "call openxml_office"},
            ],
            "final_task": {
                "success": True,
                "summary": "已导出 Excel /tmp/sgclaw/out.xlsx",
            },
            "stderr": [],
            "exports": [],
        }
        items = [HotItem(rank=1, title="真实标题", heat="123万")]

        score = score_acceptance(result, items)

        self.assertIn("structured handoff required export retries", score["deductions"])


if __name__ == "__main__":

@@ -250,16 +250,18 @@ def read_json_line(output_queue: queue.Queue[str], timeout: int) -> dict:


def score_acceptance(result: dict, items: list[HotItem]) -> dict:
    logs = [entry.get("message", "") for entry in result["logs"]]
    log_entries = result["logs"]
    logs = [entry.get("message", "") for entry in log_entries]
    final_task = result.get("final_task") or {}
    exports = [Path(path) for path in result["exports"]]
    exported_path = resolve_exported_path(exports, final_task.get("summary", ""))

    skill_selection = 0
    executed_hotlist_collection = (
    browser_path_exists = (
        "navigate https://www.zhihu.com/hot" in logs and
        any(message.startswith("getText ") for message in logs)
    )

    skill_selection = 0
    executed_hotlist_collection = browser_path_exists
    read_hotlist_skill = "read_skill zhihu-hotlist" in logs
    read_office_skill = "read_skill office-export-xlsx" in logs
    completed_office_export = "call openxml_office" in logs
@@ -302,12 +304,24 @@ def score_acceptance(result: dict, items: list[HotItem]) -> dict:

    final_response_quality = 0
    summary = final_task.get("summary", "")
    if final_task.get("success") and summary.strip():
    repeated_paragraphs = find_repeated_paragraphs(summary)
    if final_task.get("success") and summary.strip() and not repeated_paragraphs:
        final_response_quality = 5

    deductions = []
    planner_index = find_planner_log_index(log_entries)
    first_tool_index = find_first_tool_execution_index(logs)
    if planner_index is None or (first_tool_index is not None and planner_index > first_tool_index):
        deductions.append("planner output missing before tool execution")
    if repeated_paragraphs:
        deductions.append("repeated assistant paragraphs detected")
    if not exported_path:
        deductions.append("export missing output path")
    if browser_path_exists and (not exported_path or hotlist_data_correctness == 0):
        deductions.append("hotlist rows were not exported as structured live data")
    if logs.count("call openxml_office") > 1 or any(
            "unsupported columns:" in message for message in logs):
        deductions.append("structured handoff required export retries")

    total_score = (
        skill_selection
@@ -316,6 +330,7 @@ def score_acceptance(result: dict, items: list[HotItem]) -> dict:
        + xlsx_export_success
        + final_response_quality
    )
    total_score = max(0, total_score - acceptance_penalty(deductions))

    return {
        "total_score": total_score,
@@ -333,6 +348,58 @@ def score_acceptance(result: dict, items: list[HotItem]) -> dict:
    }


def find_planner_log_index(log_entries: list[dict]) -> int | None:
    for index, entry in enumerate(log_entries):
        message = str(entry.get("message", "")).strip()
        if entry.get("level") == "plan":
            return index
        if not message:
            continue
        if message.startswith("plan ") or "先规划再执行" in message:
            return index
    return None


def find_first_tool_execution_index(logs: list[str]) -> int | None:
    tool_prefixes = (
        "navigate ",
        "click ",
        "type ",
        "getText ",
        "call openxml_office",
        "call screen_html_export",
    )
    for index, message in enumerate(logs):
        if message.startswith(tool_prefixes):
            return index
    return None


def find_repeated_paragraphs(summary: str) -> list[str]:
    seen: set[str] = set()
    repeated: list[str] = []
    for paragraph in re.split(r"\n\s*\n", summary):
        normalized = re.sub(r"\s+", " ", paragraph).strip()
        if not normalized:
            continue
        if normalized in seen and normalized not in repeated:
            repeated.append(normalized)
            continue
        seen.add(normalized)
    return repeated
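A standalone demo of the rule above, with the same splitting and whitespace normalization restated inline so the snippet runs on its own: paragraphs that differ only in internal whitespace still count as repeats.

```python
import re

# Standalone demo of the repeated-paragraph rule (logic restated inline):
# paragraphs are split on blank lines and whitespace-normalized before
# comparison, so runs of spaces do not hide a duplicate.
def find_repeated_paragraphs(summary: str) -> list[str]:
    seen, repeated = set(), []
    for paragraph in re.split(r"\n\s*\n", summary):
        normalized = re.sub(r"\s+", " ", paragraph).strip()
        if not normalized:
            continue
        if normalized in seen and normalized not in repeated:
            repeated.append(normalized)
            continue
        seen.add(normalized)
    return repeated

print(find_repeated_paragraphs("已导出 Excel\n\n已导出   Excel"))  # ['已导出 Excel']
```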


def acceptance_penalty(deductions: list[str]) -> int:
    penalty_map = {
        "planner output missing before tool execution": 10,
        "repeated assistant paragraphs detected": 10,
        "export missing output path": 10,
        "hotlist rows were not exported as structured live data": 15,
        "structured handoff required export retries": 10,
    }
    return sum(penalty_map.get(item, 0) for item in deductions)
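A worked example of the penalty arithmetic, with `penalty_map` copied from `acceptance_penalty` above: two deductions cost 10 and 15 points, and the clamped total mirrors `max(0, total_score - acceptance_penalty(deductions))`.

```python
# penalty_map copied from acceptance_penalty above; unknown deduction
# strings contribute 0, and the final total is clamped at zero.
penalty_map = {
    "planner output missing before tool execution": 10,
    "repeated assistant paragraphs detected": 10,
    "export missing output path": 10,
    "hotlist rows were not exported as structured live data": 15,
    "structured handoff required export retries": 10,
}

deductions = [
    "planner output missing before tool execution",
    "hotlist rows were not exported as structured live data",
]
penalty = sum(penalty_map.get(item, 0) for item in deductions)
print(penalty)                # 25
print(max(0, 100 - penalty))  # 75
```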


def resolve_exported_path(exports: list[Path], summary: str) -> Path | None:
    match = re.search(r"(/[^\s`]+\.xlsx)", summary)
    if match: