diff --git a/docs/acceptance/2026-03-29-zhihu-hotlist-excel.md b/docs/acceptance/2026-03-29-zhihu-hotlist-excel.md index 999280c..a6ef5b0 100644 --- a/docs/acceptance/2026-03-29-zhihu-hotlist-excel.md +++ b/docs/acceptance/2026-03-29-zhihu-hotlist-excel.md @@ -1,8 +1,8 @@ # Zhihu Hotlist Excel Acceptance -- Date: 2026-03-29 20:53:59 +0800 +- Date: 2026-03-29 23:15:47 +0800 - Mode: real provider + live Zhihu hotlist API + simulated browser pipe -- Workspace: `/tmp/sgclaw-live-acceptance-5o87r76f` +- Workspace: `/tmp/sgclaw-live-acceptance-icg8dnwq` - Final success: `True` - Total score: `100/100` @@ -16,68 +16,67 @@ ## Final Output -- exported_path: `/tmp/sgclaw-live-acceptance-5o87r76f/.sgclaw-zeroclaw-workspace/out/zhihu-hotlist-1774788827307024821.xlsx` -- final_summary: `完美!我已成功完成知乎热榜数据的收集和导出。根据导出完成合同,我必须包含生成的本地.xlsx文件路径。 +- exported_path: `/tmp/sgclaw-live-acceptance-icg8dnwq/.sgclaw-zeroclaw-workspace/zhihu_hotlist_20260329_2314.xlsx` +- final_summary: `太好了!Excel文件已成功导出。现在让我提供完整的操作总结和结果。 -## 任务完成总结 +## 知乎热榜数据收集与导出完成 **操作类型**: 收集 (collect) -**数据源**: https://www.zhihu.com/hot -**采集时间**: 2026-03-29 20:53:02 (+08:00) -**采集条目数**: 10条 -**评论指标**: 未采集(仅热榜基础数据) +**数据来源**: https://www.zhihu.com/hot +**采集时间**: 2026-03-29 23:14:41 (+08:00) +**采集结果**: 成功获取10条热榜数据 -**导出工件**: -```json -{ - "source": "https://www.zhihu.com/hot", - "sheet_name": "知乎热榜", - "columns": ["rank", "title", "heat"], - "rows": 10 -} -``` +**导出文件**: `/tmp/sgclaw-live-acceptance-icg8dnwq/.sgclaw-zeroclaw-workspace/zhihu_hotlist_20260329_2314.xlsx` -**生成的Excel文件路径**: `/tmp/sgclaw-live-acceptance-5o87r76f/.sgclaw-zeroclaw-workspace/out/zhihu-hotlist-1774788827307024821.xlsx` +**数据概览**: +1. 如何看待张雪机车在2026年WSBK葡萄牙站夺冠? - 热度: 1399万 +2. 李荣浩摆证据4连质问单依纯 - 热度: 805万 +3. 日本拟动用外储做空国际原油 - 热度: 517万 +4. 官方通报女子被羁押后无罪释放 - 热度: 429万 +5. 黄金大买家土耳其央行抛售黄金 - 热度: 196万 +6. 伊朗科技大学遭袭后伊朗将美以大学列为袭击目标 - 热度: 173万 +7. 字节跳动单挑互联网巨头 - 热度: 151万 +8. 学霸家长低调不晒成绩 - 热度: 129万 +9. DDR5内存价格下降 - 热度: 110万 +10. 四种超能力选择 - 热度: 75万 -**热榜数据概览**: -1. 
官方通报女子被羁押后无罪释放相关话题 - 热度1105万 -2. 张雪机车WSBK葡萄牙站夺冠 - 热度756万 -3. 高三学生请求拆除鸟巢事件 - 热度549万 -4. 日本拟做空国际原油 - 热度359万 -5. 谷歌TurboQuant论文疑似造假 - 热度338万 -6. 伊朗将美以大学列为袭击目标 - 热度328万 -7. 中国智能手机市场下滑苹果逆势增长 - 热度317万 -8. 超能力选择话题 - 热度194万 -9. 土耳其央行抛售黄金 - 热度166万 -10. 中国U23男足战平朝鲜队 - 热度99万 +**Excel文件内容**: +- 工作表名称: `知乎热榜` +- 列: `rank` (排名), `title` (标题), `heat` (热度) +- 行数: 10条完整热榜数据 -知乎热榜数据已成功从浏览器实时采集并导出为Excel文件,文件包含完整的排名、标题和热度信息。` +所有热榜数据已成功从知乎官网实时采集并导出为Excel格式,文件已保存在指定路径。` ## Skill Logs -- `DeepSeek config loaded from /tmp/sgclaw-live-acceptance-5o87r76f/sgclaw_config.json model=deepseek-chat base_url=https://api.deepseek.com` +- `DeepSeek config loaded from /tmp/sgclaw-live-acceptance-icg8dnwq/sgclaw_config.json model=deepseek-chat base_url=https://api.deepseek.com` - `skills dir resolved to /home/zyl/projects/sgClaw/skill_lib/skills` - `runtime profile=BrowserAttached skills_prompt_mode=Compact` - `zeroclaw_process_message_primary` +- `先规划再执行知乎热榜 Excel 导出 +navigate https://www.zhihu.com/hot +getText main +call openxml_office +return generated local .xlsx path` - `loaded skills: office-export-xlsx, zhihu-hotlist, zhihu-hotlist-screen, zhihu-navigate, zhihu-write` +- `read_skill zhihu-hotlist` - `navigate https://www.zhihu.com/hot` - `getText main` -- `read_skill zhihu-hotlist` - `call openxml_office` ## Live Hotlist Sample -- 1. 官方通报女子被羁押后无罪释放,申请国赔 13 天被叫停,当地成立联合调查组,最该查清什么?带来哪些深思? | 1105万 -- 2. 如何看待张雪机车在 2026 年 WSBK 葡萄牙站夺冠?这对国内的摩托赛事发展有什么影响? | 756万 -- 3. 高三学生因鸟鸣干扰备考请求学校拆除鸟巢,校长回信「学会与万物共存是成长的必修课」,如何评价此教育方式? | 549万 -- 4. 日本拟动用外储做空国际原油,以挽救日元汇率,对此你怎么看,其会重演 96 年「住友铜事件」么? | 359万 -- 5. 谷歌称可节省 6 倍内存的 TurboQuant 论文疑似造假,RaBitQ 作者独家发文 | 338万 -- 6. 伊朗科技大学遭袭后,伊朗将美以大学列为「合法袭击目标」,如果战争扩大到教育机构,冲突还有回头路吗? | 328万 -- 7. 中国智能手机市场下滑 4%,为何苹果销售额逆势增长 23%? | 317万 -- 8. 假如有四种超能力选择,分别为:隐身、透视、飞行、预见未来半小时发生的事情,只能选择一个,你会选择哪个? | 194万 -- 9. 黄金大买家土耳其央行在伊朗战争期间抛售 80 亿美元黄金,这意味着什么? | 166万 -- 10. 国青友谊赛,中国 U23 男足 1 比 1 战平朝鲜队,如何评价本场比赛? | 99万 +- 1. 如何看待张雪机车在 2026 年 WSBK 葡萄牙站夺冠?这对国内的摩托赛事发展有什么影响? | 1399万 +- 2. 
李荣浩摆证据 4 连质问单依纯,为什么没有授权的歌曲也能放进演唱会?演唱会筹备中可能出了什么问题? | 805万 +- 3. 日本拟动用外储做空国际原油,以挽救日元汇率,对此你怎么看,其会重演 96 年「住友铜事件」么? | 517万 +- 4. 官方通报女子被羁押后无罪释放,申请国赔 13 天被叫停,当地成立联合调查组,最该查清什么?带来哪些深思? | 429万 +- 5. 黄金大买家土耳其央行在伊朗战争期间抛售 80 亿美元黄金,这意味着什么? | 196万 +- 6. 伊朗科技大学遭袭后,伊朗将美以大学列为「合法袭击目标」,如果战争扩大到教育机构,冲突还有回头路吗? | 173万 +- 7. 字节跳动是怎么短短数年就能单挑所有互联网巨头的? | 151万 +- 8. 为什么越厉害的学霸,她们家长越低调?从来不在朋友圈晒孩子成绩? | 129万 +- 9. DDR5 内存价格 3 月出现明显下降,请问这是短期现象,还是内存供需紧张真的缓和了? | 110万 +- 10. 假如有四种超能力选择,分别为:隐身、透视、飞行、预见未来半小时发生的事情,只能选择一个,你会选择哪个? | 75万 ## Stderr -- `sgclaw ready: agent_id=cfae8218-6720-416e-a14e-6f85ce8ca6a4` +- `sgclaw ready: agent_id=7482cc6b-8fe0-4727-90da-7b3f62cad9b6` diff --git a/src/compat/openxml_office_tool.rs b/src/compat/openxml_office_tool.rs index 53770c6..9e26a84 100644 --- a/src/compat/openxml_office_tool.rs +++ b/src/compat/openxml_office_tool.rs @@ -1,6 +1,7 @@ use async_trait::async_trait; use serde::Deserialize; use serde_json::{json, Value}; +use std::collections::BTreeSet; use std::collections::BTreeMap; use std::fs; use std::path::{Path, PathBuf}; @@ -79,21 +80,35 @@ impl Tool for OpenXmlOfficeTool { .iter() .map(|value| value.to_string()) .collect::<Vec<_>>(); - if parsed.columns != expected_columns { - return Ok(failed_tool_result( - "unsupported columns: expected [rank, title, heat]".to_string(), - )); - } + let column_order = match resolve_column_order(&parsed.columns, &expected_columns) { + Some(order) => order, + None => { + return Ok(failed_tool_result( + "unsupported columns: expected [rank, title, heat]".to_string(), + )) + } + }; if parsed.rows.is_empty() { return Ok(failed_tool_result("rows must not be empty".to_string())); } + if parsed.rows.iter().any(|row| row.len() != parsed.columns.len()) { + return Ok(failed_tool_result( + "each row must match the declared columns length".to_string(), + )); + } + if parsed.rows.iter().any(|row| row.len() != 3) { return Ok(failed_tool_result( "each row must contain exactly 3 values".to_string(), )); } + let normalized_rows = parsed + .rows 
+ .iter() + .map(|row| reorder_row(row, &column_order)) + .collect::<Vec<_>>(); let job_root = create_job_root(&self.workspace_root)?; let template_path = job_root.join("zhihu_hotlist_template.xlsx"); @@ -105,8 +120,8 @@ impl Tool for OpenXmlOfficeTool { .map(PathBuf::from) .unwrap_or_else(|| default_output_path(&self.workspace_root)); - write_hotlist_template(&template_path, parsed.rows.len())?; - write_payload_json(&payload_path, &parsed.rows)?; + write_hotlist_template(&template_path, normalized_rows.len())?; + write_payload_json(&payload_path, &normalized_rows)?; write_request_json(&request_path, &template_path, &payload_path, &output_path)?; let rendered = run_openxml_cli(&request_path)?; @@ -120,7 +135,7 @@ impl Tool for OpenXmlOfficeTool { output: json!({ "sheet_name": DEFAULT_SHEET_NAME, "output_path": artifact_path, - "row_count": parsed.rows.len(), + "row_count": normalized_rows.len(), "renderer": OPENXML_OFFICE_TOOL_NAME }) .to_string(), @@ -156,6 +171,44 @@ fn default_output_path(workspace_root: &Path) -> PathBuf { .join(format!("zhihu-hotlist-{nanos}.xlsx")) } +fn resolve_column_order( + provided_columns: &[String], + expected_columns: &[String], +) -> Option<Vec<usize>> { + if provided_columns.len() != expected_columns.len() { + return None; + } + + let provided_set = provided_columns + .iter() + .map(|value| value.trim().to_string()) + .collect::<BTreeSet<_>>(); + let expected_set = expected_columns + .iter() + .cloned() + .collect::<BTreeSet<_>>(); + + if provided_set != expected_set { + return None; + } + + expected_columns + .iter() + .map(|expected| { + provided_columns + .iter() + .position(|provided| provided.trim() == expected) + }) + .collect::<Option<Vec<_>>>() +} + +fn reorder_row(row: &[Value], column_order: &[usize]) -> Vec<Value> { + column_order + .iter() + .map(|index| row[*index].clone()) + .collect() +} + fn write_payload_json(path: &Path, rows: &[Vec<Value>]) -> anyhow::Result<()> { let mut variables = BTreeMap::new(); for (idx, row) in rows.iter().enumerate() { diff --git 
a/tests/compat_openxml_office_tool_test.rs b/tests/compat_openxml_office_tool_test.rs index 63a0b0b..2799476 100644 --- a/tests/compat_openxml_office_tool_test.rs +++ b/tests/compat_openxml_office_tool_test.rs @@ -51,3 +51,41 @@ async fn openxml_office_tool_renders_hotlist_xlsx_from_rows() { assert!(xml.contains("问题二")); assert!(!xml.contains("{{TITLE_1}}")); } + +#[tokio::test] +async fn openxml_office_tool_accepts_reordered_columns_when_rows_are_structured() { + let workspace_root = temp_workspace_root(); + let output_path = workspace_root.join("out/zhihu-hotlist-reordered.xlsx"); + let tool = OpenXmlOfficeTool::new(workspace_root.clone()); + + let result = tool + .execute(json!({ + "sheet_name": "知乎热榜", + "columns": ["title", "heat", "rank"], + "rows": [ + ["问题一", "344万", 1], + ["问题二", "266万", 2] + ], + "output_path": output_path + })) + .await + .unwrap(); + + assert!(result.success, "{result:?}"); + assert!(output_path.exists()); + + let unzip = ProcessCommand::new("unzip") + .args([ + "-p", + output_path.to_str().unwrap(), + "xl/worksheets/sheet1.xml", + ]) + .output() + .unwrap(); + assert!(unzip.status.success()); + + let xml = String::from_utf8(unzip.stdout).unwrap(); + assert!(xml.contains("问题一")); + assert!(xml.contains("344万")); + assert!(xml.contains(">1<")); +} diff --git a/tests/live_acceptance_score_test.py b/tests/live_acceptance_score_test.py index f90e973..b4f82c0 100644 --- a/tests/live_acceptance_score_test.py +++ b/tests/live_acceptance_score_test.py @@ -7,14 +7,14 @@ class LiveAcceptanceScoreTest(unittest.TestCase): def test_score_acceptance_handles_preloaded_office_skill_without_read_skill_log(self): result = { "logs": [ - {"message": "navigate https://www.zhihu.com/hot"}, + {"message": "plan 读取知乎热榜并导出 Excel"}, {"message": "navigate https://www.zhihu.com/hot"}, {"message": "getText body"}, {"message": "call openxml_office"}, ], "final_task": { "success": True, - "summary": "已导出 Excel", + "summary": "已导出 Excel /tmp/sgclaw/out.xlsx", }, 
"stderr": [], "exports": [], @@ -25,6 +25,77 @@ class LiveAcceptanceScoreTest(unittest.TestCase): self.assertEqual(score["skill_selection"], 30) self.assertEqual(score["final_response_quality"], 5) + self.assertNotIn("planner output missing before tool execution", score["deductions"]) + + def test_score_acceptance_flags_missing_plan_repeated_summary_and_fake_export_path(self): + repeated = "第一段总结。\n\n第一段总结。" + result = { + "logs": [ + {"message": "navigate https://www.zhihu.com/hot"}, + {"message": "getText main"}, + {"message": "call openxml_office"}, + ], + "final_task": { + "success": True, + "summary": f"{repeated}\n\n导出路径:/tmp/not-real.xlsx", + }, + "stderr": [], + "exports": [], + } + items = [HotItem(rank=1, title="标题", heat="123万")] + + score = score_acceptance(result, items) + + self.assertIn("planner output missing before tool execution", score["deductions"]) + self.assertIn("repeated assistant paragraphs detected", score["deductions"]) + self.assertIn("export missing output path", score["deductions"]) + self.assertEqual(score["final_response_quality"], 0) + + def test_score_acceptance_flags_fake_rows_when_export_contains_no_live_hotlist_data(self): + result = { + "logs": [ + {"message": "plan 读取知乎热榜并导出 Excel"}, + {"message": "navigate https://www.zhihu.com/hot"}, + {"message": "getText main"}, + {"message": "call openxml_office"}, + ], + "final_task": { + "success": True, + "summary": "已导出 Excel /tmp/sgclaw/out.xlsx", + }, + "stderr": [], + "exports": [], + } + items = [HotItem(rank=1, title="真实标题", heat="123万")] + + score = score_acceptance(result, items) + + self.assertIn("hotlist rows were not exported as structured live data", score["deductions"]) + self.assertEqual(score["hotlist_data_correctness"], 0) + self.assertEqual(score["xlsx_export_success"], 0) + + def test_score_acceptance_flags_structured_handoff_retry_noise(self): + result = { + "logs": [ + {"message": "plan 读取知乎热榜并导出 Excel"}, + {"message": "navigate https://www.zhihu.com/hot"}, + 
{"message": "getText main"}, + {"message": "call openxml_office"}, + {"message": "unsupported columns: expected [rank, title, heat]"}, + {"message": "call openxml_office"}, + ], + "final_task": { + "success": True, + "summary": "已导出 Excel /tmp/sgclaw/out.xlsx", + }, + "stderr": [], + "exports": [], + } + items = [HotItem(rank=1, title="真实标题", heat="123万")] + + score = score_acceptance(result, items) + + self.assertIn("structured handoff required export retries", score["deductions"]) if __name__ == "__main__": diff --git a/tools/live_acceptance/run_zhihu_hotlist_excel_acceptance.py b/tools/live_acceptance/run_zhihu_hotlist_excel_acceptance.py index 58fff81..4048672 100644 --- a/tools/live_acceptance/run_zhihu_hotlist_excel_acceptance.py +++ b/tools/live_acceptance/run_zhihu_hotlist_excel_acceptance.py @@ -250,16 +250,18 @@ def read_json_line(output_queue: queue.Queue[str], timeout: int) -> dict: def score_acceptance(result: dict, items: list[HotItem]) -> dict: - logs = [entry.get("message", "") for entry in result["logs"]] + log_entries = result["logs"] + logs = [entry.get("message", "") for entry in log_entries] final_task = result.get("final_task") or {} exports = [Path(path) for path in result["exports"]] exported_path = resolve_exported_path(exports, final_task.get("summary", "")) - - skill_selection = 0 - executed_hotlist_collection = ( + browser_path_exists = ( "navigate https://www.zhihu.com/hot" in logs and any(message.startswith("getText ") for message in logs) ) + + skill_selection = 0 + executed_hotlist_collection = browser_path_exists read_hotlist_skill = "read_skill zhihu-hotlist" in logs read_office_skill = "read_skill office-export-xlsx" in logs completed_office_export = "call openxml_office" in logs @@ -302,12 +304,24 @@ def score_acceptance(result: dict, items: list[HotItem]) -> dict: final_response_quality = 0 summary = final_task.get("summary", "") - if final_task.get("success") and summary.strip(): + repeated_paragraphs = 
find_repeated_paragraphs(summary) + if final_task.get("success") and summary.strip() and not repeated_paragraphs: final_response_quality = 5 deductions = [] + planner_index = find_planner_log_index(log_entries) + first_tool_index = find_first_tool_execution_index(logs) + if planner_index is None or (first_tool_index is not None and planner_index > first_tool_index): + deductions.append("planner output missing before tool execution") + if repeated_paragraphs: + deductions.append("repeated assistant paragraphs detected") if not exported_path: deductions.append("export missing output path") + if browser_path_exists and (not exported_path or hotlist_data_correctness == 0): + deductions.append("hotlist rows were not exported as structured live data") + if logs.count("call openxml_office") > 1 or any( + "unsupported columns:" in message for message in logs): + deductions.append("structured handoff required export retries") total_score = ( skill_selection @@ -316,6 +330,7 @@ def score_acceptance(result: dict, items: list[HotItem]) -> dict: + xlsx_export_success + final_response_quality ) + total_score = max(0, total_score - acceptance_penalty(deductions)) return { "total_score": total_score, @@ -333,6 +348,58 @@ def score_acceptance(result: dict, items: list[HotItem]) -> dict: } +def find_planner_log_index(log_entries: list[dict]) -> int | None: + for index, entry in enumerate(log_entries): + message = str(entry.get("message", "")).strip() + if entry.get("level") == "plan": + return index + if not message: + continue + if message.startswith("plan ") or "先规划再执行" in message: + return index + return None + + +def find_first_tool_execution_index(logs: list[str]) -> int | None: + tool_prefixes = ( + "navigate ", + "click ", + "type ", + "getText ", + "call openxml_office", + "call screen_html_export", + ) + for index, message in enumerate(logs): + if message.startswith(tool_prefixes): + return index + return None + + +def find_repeated_paragraphs(summary: str) -> list[str]: + 
seen: set[str] = set() + repeated: list[str] = [] + for paragraph in re.split(r"\n\s*\n", summary): + normalized = re.sub(r"\s+", " ", paragraph).strip() + if not normalized: + continue + if normalized in seen and normalized not in repeated: + repeated.append(normalized) + continue + seen.add(normalized) + return repeated + + +def acceptance_penalty(deductions: list[str]) -> int: + penalty_map = { + "planner output missing before tool execution": 10, + "repeated assistant paragraphs detected": 10, + "export missing output path": 10, + "hotlist rows were not exported as structured live data": 15, + "structured handoff required export retries": 10, + } + return sum(penalty_map.get(item, 0) for item in deductions) + + def resolve_exported_path(exports: list[Path], summary: str) -> Path | None: match = re.search(r"(/[^\s`]+\.xlsx)", summary) if match:
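Reviewer note: the new scoring helpers in `run_zhihu_hotlist_excel_acceptance.py` can be exercised in isolation. The sketch below restates `find_repeated_paragraphs` and the penalty table from this patch. `apply_penalty` is a hypothetical wrapper, not part of the patch, that combines `acceptance_penalty` with the `max(0, ...)` clamp. The raw scores 85 and 15 are made-up inputs chosen only to show the subtraction and the clamp.

```python
import re

# Penalty weights copied from acceptance_penalty() in this patch.
PENALTY_MAP = {
    "planner output missing before tool execution": 10,
    "repeated assistant paragraphs detected": 10,
    "export missing output path": 10,
    "hotlist rows were not exported as structured live data": 15,
    "structured handoff required export retries": 10,
}


def find_repeated_paragraphs(summary: str) -> list[str]:
    """Return each whitespace-normalized paragraph that occurs more than once."""
    seen: set[str] = set()
    repeated: list[str] = []
    # Paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", summary):
        normalized = re.sub(r"\s+", " ", paragraph).strip()
        if not normalized:
            continue
        if normalized in seen:
            # Report a duplicated paragraph once, however often it repeats.
            if normalized not in repeated:
                repeated.append(normalized)
            continue
        seen.add(normalized)
    return repeated


def apply_penalty(raw_score: int, deductions: list[str]) -> int:
    """Hypothetical wrapper: subtract the summed penalties, clamping at zero."""
    penalty = sum(PENALTY_MAP.get(item, 0) for item in deductions)
    return max(0, raw_score - penalty)


# The second paragraph differs only in whitespace, so it counts as a repeat.
summary = "First summary.\n\nFirst   summary.\n\nExport path: /tmp/not-real.xlsx"
repeated = find_repeated_paragraphs(summary)

deductions = [
    "repeated assistant paragraphs detected",
    "export missing output path",
]
score_high = apply_penalty(85, deductions)  # made-up raw score
score_low = apply_penalty(15, deductions)   # made-up raw score; the clamp applies
```

Unknown deduction strings contribute zero because of `PENALTY_MAP.get(item, 0)`, so adding a new deduction message without a matching table entry will not lower the total score.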