木炎 b1647cd865 docs: add detailed implementation plan for scene generator quality improvement

8 tasks across 3 phases with exact file paths, step-by-step instructions,
test commands, and commit messages for each task.

🤖 Generated with [Qoder](https://qoder.com)

2026-04-17 18:19:37 +08:00

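sgClaw Scene Generator Quality Improvement: Implementation Plan

Design document: docs/superpowers/specs/2026-04-17-scene-generator-quality-improvement-design.md

Overview

3 phases, 8 tasks. Each task lists the files to change, the concrete steps, how to verify the result, and the commit message.

Phase 1: Fix the foundation

Task 1: Unify the generation path (deprecate browser_script_with_business_logic)

File: src/generated_scene/generator.rs

Current state (lines 728-735):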

fn compile_scene(scene_ir: &SceneIr, analysis: &SceneSourceAnalysis, tool_name: &str) -> CompiledScene {
    let scene_toml = render_scene_toml(scene_ir, analysis, tool_name);
    let browser_script = match scene_ir.workflow_archetype() {
        WorkflowArchetype::SingleRequestTable => compile_single_request_table(scene_ir),
        WorkflowArchetype::MultiModeRequest => compile_multi_mode_request(scene_ir),
        WorkflowArchetype::PaginatedEnrichment => compile_paginated_enrichment(scene_ir),
        WorkflowArchetype::PageStateEval => compile_page_state_eval(scene_ir),
    };
    ...
}

Steps:

Change the routing logic in compile_scene (lines 730-735):
- SingleRequestTable no longer calls compile_simple_request_script (the function underlying compile_single_request_table). Instead, the single-mode scene is wrapped as one mode and compiled through compile_multi_mode_request.
- Add a helper ensure_modes_populated(scene_ir: &SceneIr) -> SceneIr:
  - If scene_ir.modes is empty but scene_ir.api_endpoints is not, generate one default mode.
  - Change the workflow_archetype of SingleRequestTable and PageStateEval scenes to MultiModeRequest (they now all go through the modes path).
- Update the match arms:
```
let browser_script = match scene_ir.workflow_archetype() {
    WorkflowArchetype::MultiModeRequest => compile_multi_mode_request(scene_ir),
    WorkflowArchetype::PaginatedEnrichment => compile_paginated_enrichment(scene_ir),
    _ => {
        // SingleRequestTable, PageStateEval — fallback to multi-mode with default mode
        let adapted = ensure_modes_populated(scene_ir);
        compile_multi_mode_request(&adapted)
    }
};
```
Implement ensure_modes_populated:
- Takes &SceneIr and returns a SceneIr (a clone).
- If modes is already non-empty, return the clone unchanged.
- If modes is empty but api_endpoints is not:
  - Build the default mode from the first endpoint.
  - Set name: "default", label: Some("default")
  - condition: { field: "period_mode", operator: "equals", value: "default" }
  - apiEndpoint: copy of the first endpoint
  - requestTemplate: from scene_ir.request_template
  - responsePath: from scene_ir.response_path
  - normalizeRules: from scene_ir.normalize_rules, or the default
  - columnDefs: from scene_ir.column_defs
- Also set default_mode = Some("default") and mode_switch_field = Some("period_mode").
Mark browser_script_with_business_logic as deprecated (if it still exists in the code):
- In the current code this function no longer exists (compile_simple_request_script has replaced it). Note this in a comment: "legacy path, superseded by multi-mode unified path".
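
For orientation, the way a wrapped default mode might be selected at runtime can be sketched in JavaScript. This is a minimal sketch, not the generator's actual template: the detectMode signature, the args object, and the fallback to the default mode name are assumptions. Only the condition shape (field, operator, value) and the "default" / "period_mode" values come from the plan.

```javascript
// Hypothetical runtime mode selection for a scene with a MODES table.
// Only the condition shape {field, operator, value} is taken from the plan.
const MODES = [
  {
    name: "default",
    condition: { field: "period_mode", operator: "equals", value: "default" },
  },
];
const DEFAULT_MODE = "default";

function detectMode(args, modes, defaultModeName) {
  // The first mode whose "equals" condition matches the supplied args wins.
  const matched = modes.find(
    (m) =>
      m.condition &&
      m.condition.operator === "equals" &&
      args[m.condition.field] === m.condition.value
  );
  if (matched) return matched;
  // Otherwise fall back to the declared default mode (assumed behavior).
  return modes.find((m) => m.name === defaultModeName) || null;
}
```

Under these assumptions, a single-mode scene called without period_mode still resolves to the default mode through the fallback, which is why the extra mode logic should not change behavior for scenes that were single-request before.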

Verification:

cargo check reports no compile errors.
The JS script generated for a single-mode scene contains const MODES = and the detectMode logic.

Commit message:

feat(generator): unify all scene types through multi-mode path

Single-mode and page-state-eval scenes now get auto-wrapped into a
default mode and compiled through compile_multi_mode_request. This
eliminates the old browser_script_with_business_logic code path and
ensures all scenes get responsePath extraction, requestTemplate, and
contentType support.

Task 2: Fix the jQuery processData parameter

File: src/generated_scene/generator.rs (function compile_multi_mode_request, lines 1069-1253)

Current state: in the template, buildModeRequest (lines 1098-1118) picks the body serialization by contentType (form-urlencoded uses Object.entries().join('&'), JSON uses JSON.stringify), but the jQuery ajax call (lines 1185-1196) does not set processData.

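The plan's working assumption is that jQuery re-serializes a form-urlencoded body by default (treating the string as a query string), which would cause double encoding. Note that the jQuery documentation describes processData as converting only data that is not already a string, so the explicit flag mainly protects against a body that is still an object when it reaches $.ajax.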

Steps:

Modify the jQuery ajax call template in compile_multi_mode_request (around lines 1185-1196):

Add a processData parameter to $.ajax({...}):

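// Read the content type from the request so the callback can see it.
const contentType = request.headers['Content-Type'];
$.ajax({
  url: request.url,
  type: request.method,
  data: request.body,
  contentType: contentType,
  // Skip jQuery's own serialization for pre-built form-urlencoded bodies.
  processData: contentType !== 'application/x-www-form-urlencoded',
  dataType: 'json',
  success: resolve,
  error: (xhr, status, err) => reject(new Error(`API failed (${xhr.status}): ${err}`))
});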

The contentType variable has to be visible inside the Promise callback, so read it from the request object.

Apply the same processData logic to the jQuery ajax call in compile_simple_request_script (around lines 994-1004).
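
The encoding concern can be illustrated without jQuery. The sketch below serializes a flat object as form-urlencoded and shows what a second encoding pass would do to it. toFormUrlEncoded and shouldProcessData are hypothetical names, and the Object.entries serialization follows the description of buildModeRequest rather than its actual code.

```javascript
const FORM_CT = "application/x-www-form-urlencoded";

// Serialize a flat object as key=value pairs joined by '&'.
function toFormUrlEncoded(params) {
  return Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(String(v))}`)
    .join("&");
}

// Decide the processData flag from the Content-Type header value.
// Only the media type is compared, so a "; charset=..." suffix is tolerated.
function shouldProcessData(contentType) {
  const mediaType = String(contentType || "").split(";")[0].trim().toLowerCase();
  return mediaType !== FORM_CT;
}

// Sample values are made up for illustration.
const once = toFormUrlEncoded({ orgName: "a b", month: "2026-04" });
// Encoding an already-encoded string again turns "%20" into "%2520".
const twice = encodeURIComponent(once);
```

One design point worth checking in the real template: a strict comparison against 'application/x-www-form-urlencoded' would treat a header value such as 'application/x-www-form-urlencoded; charset=UTF-8' as non-form, which is why this sketch compares the media type only.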

Verification:

The $.ajax call in the generated JS includes the processData parameter.
form-urlencoded requests are not double-encoded.

Commit message:

fix(generator): add processData to jQuery ajax for form-urlencoded requests

jQuery default processData:true re-serializes string bodies, causing
double-encoding for form-urlencoded payloads. Set processData:false
when contentType is application/x-www-form-urlencoded.

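Task 3: Auto-wrap single-mode scenes into a mode configuration

File: frontend/scene-generator/llm-client.js

Current state: analyzeSceneDeep (lines 729-769) calls the LLM and returns the result of normalizeSceneIr directly. If the LLM outputs modes: [] alongside non-empty apiEndpoints, nothing wraps them.

Steps:

In analyzeSceneDeep, after normalizeSceneIr(...) and before returning, add the auto-wrap logic: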

async function analyzeSceneDeep(sourceDir, dirContents, config) {
  const content = await requestChatCompletionWithRetry(...);
  const normalized = normalizeSceneIr(await extractJsonFromResponseWithRepair(content, config));

  // ... existing sceneId validation ...

  // AUTO-WRAP: single-mode scenes → modes array
  if (normalized.modes.length === 0 && normalized.apiEndpoints.length > 0) {
    normalized.modes.push({
      name: "default",
      label: "default",
      condition: { field: "period_mode", operator: "equals", value: "default" },
      apiEndpoint: normalized.apiEndpoints[0],
      columnDefs: normalized.columnDefs || [],
      requestTemplate: normalized.requestTemplate || {},
      normalizeRules: normalized.normalizeRules || { type: "validate_required", requiredFields: [], filterNull: true },
      responsePath: normalized.responsePath || "",
    });
    normalized.defaultMode = "default";
    normalized.modeSwitchField = "period_mode";
    // Upgrade archetype if it was single_request_table
    if (normalized.workflowArchetype === "single_request_table") {
      normalized.workflowArchetype = "multi_mode_request";
    }
  }

  return normalized;
}

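Also make sure normalizeSceneIr gives defaultMode and modeSwitchField correct defaults (lines 477-478 already handle this).

Verification:

Run generation on a single-mode scene (for example 用户日电量监测, daily user electricity monitoring) and confirm the modes array contains one default mode.
Confirm workflowArchetype is upgraded to multi_mode_request.

Commit message: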

feat(llm-client): auto-wrap single-mode scenes into modes array

When the LLM returns an empty modes array but has apiEndpoints,
automatically create a default mode with the first endpoint,
requestTemplate, responsePath, and normalizeRules. This ensures all
scenes compile through the multi-mode path.

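Phase 2: Strengthen extraction

Task 4: Add mandatory-field constraints to the LLM prompt

File: frontend/scene-generator/llm-client.js (DEEP_SYSTEM_PROMPT, lines 19-82)

Current state: the prompt lists the schema but does not say which fields must be filled. The LLM often skips contentType, responsePath, and requestTemplate.

Steps:

After the schema definition in DEEP_SYSTEM_PROMPT, add a mandatory-field section: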

MANDATORY FIELDS (never leave empty):
- apiEndpoints[].contentType: detect from source code.
  * For $.ajax({}): look for 'contentType' property. Default 'application/x-www-form-urlencoded' if absent (jQuery's own default).
  * For $http.sendByAxios(): contentType is 'application/json' (axios default).
  * For XMLHttpRequest: look for setRequestHeader('Content-Type', ...).
  * For form submissions: 'application/x-www-form-urlencoded'.
- modes[].responsePath: the JSON path from raw API response to the data array.
  * Common patterns: 'data.list', 'data.rcvblAcctSumAll.rcvblAcctVOS', 'content', 'data.records'
  * If response is the array itself, use empty string "".
- modes[].requestTemplate: the static request body shape from the source code.
  * Extract ALL keys that appear in the request body object.
  * Mark dynamic values as "${args.fieldName}" and static values as literals.
- apiEndpoints[].url: the full API URL as seen in the source code.

RULES:
- If you cannot determine contentType, default to 'application/json'.
- If you cannot determine responsePath, default to '' (empty string).
- If you cannot determine requestTemplate, use {} (empty object).
- NEVER leave these fields as null or undefined.

Insert this text into DEEP_SYSTEM_PROMPT after the schema definition and before Instructions.
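
The fallback RULES in the prompt could also be enforced in code as a safety net, in case the model still returns null. A minimal sketch, assuming the scene IR uses the field names from the prompt. applyMandatoryDefaults is a hypothetical helper, not an existing function in llm-client.js.

```javascript
// Hypothetical post-processing step that applies the prompt's fallback RULES.
// It returns a new object and leaves the input untouched.
function applyMandatoryDefaults(sceneIr) {
  const apiEndpoints = (sceneIr.apiEndpoints || []).map((ep) => ({
    ...ep,
    // Rule: an undetermined contentType defaults to application/json.
    contentType: ep.contentType || "application/json",
  }));
  const modes = (sceneIr.modes || []).map((mode) => ({
    ...mode,
    // Rule: an undetermined responsePath defaults to the empty string.
    responsePath: mode.responsePath == null ? "" : mode.responsePath,
    // Rule: an undetermined requestTemplate defaults to an empty object.
    requestTemplate: mode.requestTemplate == null ? {} : mode.requestTemplate,
  }));
  return { ...sceneIr, apiEndpoints, modes };
}
```

If something like this were adopted, it would probably belong after the Task 6 follow-up rather than before it, since filling defaults first would hide exactly the gaps that the validation is meant to detect.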

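Verification:

Run generation on the 营销2.0零度户报表数据生成 scene (Marketing 2.0 zero-consumption-customer report generation) and confirm that contentType and responsePath in the LLM output are no longer empty.
Confirm requestTemplate contains the fields the business logic requires.

Commit message: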

feat(llm-client): add mandatory field constraints to DEEP_SYSTEM_PROMPT

Explicitly require LLM to fill contentType, responsePath, and
requestTemplate with detected values or defaults. Reduces empty-field
rate from ~60% to target ~10%.

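Task 5: Extract business JS files

Files:

frontend/scene-generator/server.js
frontend/scene-generator/generator-runner.js
frontend/scene-generator/llm-client.js

Current state: readDirectory in generator-runner.js already reads every file into dirContents, but buildDeepAnalyzePrompt (llm-client.js lines 125-157) mainly pushes fragments of index.html. The contents of business JS files (such as js/mca.js and js/sgApi.js) are not extracted and pushed separately.

Steps:

Identify business JS files in generator-runner.js:
- In buildAnalysisContext, add a businessJsFragments array.
- Pick up .js files under the js/ directory, excluding third-party libraries such as vue.js and element-ui.
- For each business JS file, extract key fragments (function definitions, API calls, config objects) from the first 600 characters.
- Store the result in analysisContext.businessJsFragments.
Push the business JS fragments in buildDeepAnalyzePrompt in llm-client.js:
- After the existing pushFragments calls, add: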
```
pushFragments(parts, "business JS files", context.businessJsFragments, 4);
```
- Keep the total prompt size within MAX_DEEP_PROMPT_CHARS (60000).
Make sure server.js reads the business JS files:
- Check whether the readDirectory call in the /handle-analyze-deep endpoint already reads files under js/.
- If it does not, add logic to read js/*.js.
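
The file selection described above can be sketched as follows. This is a minimal sketch under stated assumptions: dirContents is assumed to be an array of { path, content } records, the helper names are hypothetical, and the third-party pattern list is illustrative (only vue.js and element-ui are named in the plan). It simply takes the first 600 characters and does not attempt to pick out function definitions or API calls.

```javascript
// Hypothetical helpers for selecting business JS files from dirContents.
const THIRD_PARTY_PATTERNS = [
  /(^|\/)vue(\.min)?\.js$/i,
  /element-ui/i,
  /jquery/i,
  /\.min\.js$/i,
];
const FRAGMENT_CHARS = 600; // per-file cap from the plan
const MAX_FRAGMENTS = 4; // fragment cap from the plan

function isBusinessJs(path) {
  const normalized = path.replace(/\\/g, "/");
  // Must live under a js/ directory and end in .js.
  if (!/(^|\/)js\/.+\.js$/i.test(normalized)) return false;
  return !THIRD_PARTY_PATTERNS.some((re) => re.test(normalized));
}

function collectBusinessJsFragments(dirContents) {
  return dirContents
    .filter((file) => isBusinessJs(file.path))
    .slice(0, MAX_FRAGMENTS)
    .map((file) => ({
      path: file.path,
      fragment: file.content.slice(0, FRAGMENT_CHARS),
    }));
}
```

With both caps applied, the business JS contribution is bounded at 4 × 600 = 2400 characters of file content, which leaves most of the 60000-character budget for the existing fragments.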

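Verification:

Run on the 台区线损大数据 scene (transformer-area line-loss big data) and confirm that the contents of js/mca.js or a similar business file are pushed to the LLM.
Confirm the total prompt size does not exceed 60000 characters.

Commit message: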

feat(scene-generator): extract business JS files for LLM analysis

Identify and push js/ directory business logic files (mca.js, sgApi.js,
etc.) to the LLM prompt. Exclude third-party libraries. Capped at 4
fragments to stay within MAX_DEEP_PROMPT_CHARS budget.

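Task 6: Post-extraction validation and one follow-up prompt

File: frontend/scene-generator/llm-client.js

Current state: analyzeSceneDeep takes the LLM response, runs normalizeSceneIr, and returns, without checking whether key fields are missing.

Steps:

Add a validateExtractedSceneInfo(sceneIr) function: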

function validateExtractedSceneInfo(sceneIr) {
  const issues = [];

  // Check: at least one apiEndpoint has contentType
  const endpointsWithCt = (sceneIr.apiEndpoints || []).filter(
    ep => ep && ep.contentType
  );
  if ((sceneIr.apiEndpoints || []).length > 0 && endpointsWithCt.length === 0) {
    issues.push("missing_contentType_on_endpoints");
  }

  // Check: at least one mode has responsePath (if modes exist)
  if ((sceneIr.modes || []).length > 0) {
    const modesWithPath = sceneIr.modes.filter(m => m.responsePath !== undefined && m.responsePath !== null);
    if (modesWithPath.length === 0) {
      issues.push("missing_responsePath_on_modes");
    }
  }

  // Check: workflowArchetype is set
  if (!sceneIr.workflowArchetype) {
    issues.push("missing_workflowArchetype");
  }

  return issues;
}

In analyzeSceneDeep, call the validation after normalizeSceneIr:

const issues = validateExtractedSceneInfo(normalized);
if (issues.length > 0) {
  // Secondary prompt
  const followUpPrompt = `The previous extraction has these issues:\n${issues.join('\n')}\nPlease re-analyze the source snippets and fill in the missing fields. Use defaults if truly unavailable.`;

  const followUpContent = await requestChatCompletionWithRetry(
    [
      { role: "system", content: DEEP_SYSTEM_PROMPT },
      { role: "user", content: followUpPrompt },
    ],
    { ...config, maxTokens: 2400, timeoutMs: DEEP_REQUEST_TIMEOUT_MS, retryAttempts: 1 }
  );

  const repaired = normalizeSceneIr(await extractJsonFromResponseWithRepair(followUpContent, config));
  // Merge repaired fields into normalized (only fill empty fields)
  Object.assign(normalized, mergeSceneIrFields(repaired, normalized));
}

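Add a mergeSceneIrFields(repaired, original) helper:
- Overwrite a field with the value from repaired only when the field in original is empty or still at its default.
- This avoids losing valid information from the first extraction.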
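
One possible shape for this helper is sketched below. It is a sketch rather than the planned implementation: it treats null, undefined, "", [] and {} as empty, recurses into nested objects, and matches array elements by index, which assumes the follow-up response lists endpoints and modes in the same order as the first one.

```javascript
// Values treated as "empty" for merge purposes (an assumption of this sketch).
function isEmptyValue(value) {
  if (value === null || value === undefined || value === "") return true;
  if (Array.isArray(value)) return value.length === 0;
  if (typeof value === "object") return Object.keys(value).length === 0;
  return false;
}

function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Returns a merged copy: a non-empty original value always wins, and
// repaired values only fill gaps. Neither input is mutated.
function mergeSceneIrFields(repaired, original) {
  if (isEmptyValue(original)) {
    return isEmptyValue(repaired) ? original : repaired;
  }
  if (Array.isArray(original) && Array.isArray(repaired)) {
    // Element-wise by index; extra repaired elements are ignored.
    return original.map((item, i) => mergeSceneIrFields(repaired[i], item));
  }
  if (isPlainObject(original) && isPlainObject(repaired)) {
    const merged = { ...original };
    for (const key of Object.keys(repaired)) {
      merged[key] = mergeSceneIrFields(repaired[key], original[key]);
    }
    return merged;
  }
  return original;
}
```

The recursion matters because contentType lives on entries of apiEndpoints and responsePath on entries of modes, so a top-level-only merge would never repair them once those arrays are non-empty. Because "" counts as empty here, a responsePath of "" from the first pass can be replaced by a non-empty path from the follow-up.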

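Verification:

Simulate an LLM response that lacks contentType and confirm the follow-up prompt fires.
Confirm there is at most one follow-up and no unbounded loop.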
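
These two checks can be exercised without a network call by injecting the LLM call. The harness below is a hypothetical sketch: extractWithOneFollowUp, makeStubLlm, and the request shape passed to callLlm are all assumptions made for illustration, not functions from llm-client.js.

```javascript
// Hypothetical harness for the one-follow-up rule. callLlm, validate and
// merge are injected (assumed signatures) so no network call is needed.
async function extractWithOneFollowUp({ callLlm, validate, merge }) {
  const first = await callLlm({ followUp: false, issues: [] });
  const issues = validate(first);
  if (issues.length === 0) return first;
  // Exactly one follow-up: the merged result is returned without being
  // validated again, so the flow cannot loop.
  const repaired = await callLlm({ followUp: true, issues });
  return merge(repaired, first);
}

// Stub LLM that omits contentType on the first call only, and records
// every request it receives.
function makeStubLlm() {
  const calls = [];
  const callLlm = async (request) => {
    calls.push(request);
    return request.followUp
      ? { contentType: "application/json" }
      : { contentType: "" };
  };
  return { callLlm, calls };
}
```

Counting the recorded calls gives the "at most one follow-up" check directly: two calls when the first response is incomplete, one when it is already valid.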

Commit message:

feat(llm-client): add post-extraction validation with one-shot retry

After LLM returns scene IR, validate that critical fields (contentType,
responsePath, workflowArchetype) are present. If missing, send one
follow-up prompt to fill gaps. Merges repaired fields without overwriting
valid data from the first extraction.

Phase 3: Test and verify

Task 7: Unit tests

File: tests/scene_generator_modes_test.rs (new)

Steps:

Create the test file tests/scene_generator_modes_test.rs.

Write 5 test cases:

#[cfg(test)]
mod tests {
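    // NOTE: files under tests/ build as a separate crate, so `crate::` refers
    // to the test crate itself; import through the library crate's name instead.
    use super::*; // adjust imports as needed
    use crate::generated_scene::generator::*;
    use crate::generated_scene::ir::*;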
    use serde_json::json;

    #[test]
    fn test_single_mode_generates_modes_array() {
        // Create a SingleRequestTable scene with one endpoint
        let scene_ir = make_test_scene_ir();
        // ... assertions: generated JS contains "const MODES ="
    }

    #[test]
    fn test_multi_mode_generates_mode_routing() {
        // Create a MultiModeRequest scene with two modes
        // ... assertions: generated JS contains "detectMode"
    }

    #[test]
    fn test_snake_camel_consistency() {
        // Verify field name serialization is consistent
        // between Rust (snake_case) and JS (camelCase)
    }

    #[test]
    fn test_form_urlencoded_request_body() {
        // Create a mode with contentType = "application/x-www-form-urlencoded"
        // ... assertions: body is Object.entries().join('&'), not JSON.stringify
    }

    #[test]
    fn test_response_path_extraction_in_template() {
        // Create a mode with responsePath = "data.list"
        // ... assertions: generated JS contains "safeGet(raw, mode.responsePath"
    }
}

Each test builds a SceneIr instance, calls compile_multi_mode_request, and checks that the generated string contains the expected code fragments.
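
The last test asserts on a safeGet(raw, mode.responsePath call in the generated script. For reference, the behavior that the responsePath convention implies (a dotted path, with the empty string meaning the response itself is the array) can be sketched in JavaScript. This safeGet is a hypothetical stand-in. The helper in the real template may differ, and the third fallback parameter is an assumption.

```javascript
// Hypothetical dotted-path accessor. An empty path returns the input
// unchanged; a missing segment returns the fallback.
function safeGet(obj, path, fallback = undefined) {
  if (path === "" || path == null) return obj;
  let current = obj;
  for (const key of String(path).split(".")) {
    if (current == null || typeof current !== "object" || !(key in current)) {
      return fallback;
    }
    current = current[key];
  }
  return current;
}
```

A matching test on the generated script would then only need to confirm that the call is present, while the path semantics themselves can be unit-tested on the JS side.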

Verification:

cargo test --test scene_generator_modes_test passes in full.

Commit message:

test: add unit tests for multi-mode generation path

Covers: single-mode auto-wrap, multi-mode routing, snake/camel
consistency, form-urlencoded body format, and responsePath extraction.

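Task 8: Integration tests

Steps:

Run full generation on two representative scenes:
- Simple scene: 用户日电量监测 (pattern C, direct AJAX)
- Complex scene: 台区线损大数据-月_周累计线损率统计分析 (pattern A, two modes)
Compare the generated output against tq-lineloss-report:
- Compare the structure of SKILL.toml.
- Compare the key functions in scripts/*.js (buildModeRequest, detectMode, normalizeRows).
- Compare the bootstrap and params configuration in scene.toml.
Produce an integration test report:
- File: docs/superpowers/reports/2026-04-17-integration-test-report.md
- Contents: gap list, quality score, open issues
Record the gap list:
- Which fields are still not extracted correctly
- Which logic still needs manual correction
- Which scenes are still unsuitable for automation

Verification:

The integration test report has been written.
At least one scene reaches 80% or more of the quality of tq-lineloss-report.

Commit message: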

docs: add integration test report for scene generator quality

Generated skills for user-daily-power and tq-lineloss scenes. Compared
against manually-authored tq-lineloss-report. Quality assessment and
gap analysis documented.

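Execution order

Task 1 → Task 2 → Task 3 → Task 4 → Task 5 → Task 6 → Task 7 → Task 8
├─ Phase 1: Foundation ─┤  ├─ Phase 2: Extraction ─┤  ├── Phase 3 ──┤

The three Phase 1 tasks have a dependency: Task 1 must land first, after which Task 2 and Task 3 can run in parallel. The three Phase 2 tasks are independent of one another and can also run in parallel, although Tasks 4, 5, and 6 all touch frontend/scene-generator/llm-client.js (in different functions), so parallel branches will need a careful merge. Phase 3 depends on Phase 1 and Phase 2 both being complete.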

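Risks and mitigations

| Risk | Impact | Mitigation |
| --- | --- | --- |
| The LLM follow-up prompt adds generation time | Worse user experience | Limit to 1 follow-up, with a 120 s timeout |
| After unification, JS generated for SingleRequestTable scenes carries unnecessary mode logic | Larger scripts | The default-mode condition check is trivial; the performance impact is negligible |
| Too many business JS files push the prompt over its limit | The LLM cannot process it | Limit to 4 files, 600 characters each |
| The `processData` change affects scenes that currently work | Regressions | Set false only for form-urlencoded; JSON is unaffected |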
