claw/docs/superpowers/plans/2026-04-17-scene-generator-quality-improvement-plan.md
木炎 b1647cd865 docs: add detailed implementation plan for scene generator quality improvement
8 tasks across 3 phases with exact file paths, step-by-step instructions,
test commands, and commit messages for each task.

🤖 Generated with [Qoder](https://qoder.com)
2026-04-17 18:19:37 +08:00

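# sgClaw Scene Generator Quality Improvement: Implementation Plan
> Corresponding design document: `docs/superpowers/specs/2026-04-17-scene-generator-quality-improvement-design.md`
## Overview
3 phases, 8 tasks. Each task lists the files to change, the concrete steps, how to verify the result, and the commit message.
---
## Phase 1: Fix the foundation
### Task 1: Unify the generation path (deprecate `browser_script_with_business_logic`)
**File**: `src/generated_scene/generator.rs`
**Current state** (lines 728-735):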
```rust
fn compile_scene(scene_ir: &SceneIr, analysis: &SceneSourceAnalysis, tool_name: &str) -> CompiledScene {
    let scene_toml = render_scene_toml(scene_ir, analysis, tool_name);
    let browser_script = match scene_ir.workflow_archetype() {
        WorkflowArchetype::SingleRequestTable => compile_single_request_table(scene_ir),
        WorkflowArchetype::MultiModeRequest => compile_multi_mode_request(scene_ir),
        WorkflowArchetype::PaginatedEnrichment => compile_paginated_enrichment(scene_ir),
        WorkflowArchetype::PageStateEval => compile_page_state_eval(scene_ir),
    };
    ...
}
```
**Steps**:
1. **Change the routing logic in `compile_scene`** (lines 730-735):
- `SingleRequestTable` no longer calls `compile_simple_request_script` (the function underlying `compile_single_request_table`); instead, the single-mode scene is wrapped into one mode and compiled through `compile_multi_mode_request`.
- Add a helper function `ensure_modes_populated(scene_ir: &SceneIr) -> SceneIr`:
  - If `scene_ir.modes` is empty but `scene_ir.api_endpoints` is not, generate a default mode.
  - Change the `workflow_archetype` of `SingleRequestTable` and `PageStateEval` scenes to `MultiModeRequest` (because everything now goes through the modes path).
- Change the match arms:
```rust
let browser_script = match scene_ir.workflow_archetype() {
    WorkflowArchetype::MultiModeRequest => compile_multi_mode_request(scene_ir),
    WorkflowArchetype::PaginatedEnrichment => compile_paginated_enrichment(scene_ir),
    _ => {
        // SingleRequestTable, PageStateEval — fallback to multi-mode with default mode
        let adapted = ensure_modes_populated(scene_ir);
        compile_multi_mode_request(&adapted)
    }
};
```
2. **Implement `ensure_modes_populated`**:
- Takes `&SceneIr` and returns a `SceneIr` (a clone).
- If `modes` is already non-empty, return the clone directly.
- If `modes` is empty but `api_endpoints` is not:
  - Build the default mode from the first endpoint.
  - Set `name: "default"`, `label: Some("default")`.
  - `condition`: `{ field: "period_mode", operator: "equals", value: "default" }`
  - `apiEndpoint`: copy the first endpoint.
  - `requestTemplate`: take `scene_ir.request_template`.
  - `responsePath`: take `scene_ir.response_path`.
  - `normalizeRules`: take `scene_ir.normalize_rules`, or the default.
  - `columnDefs`: take `scene_ir.column_defs`.
- Also set `default_mode = Some("default")` and `mode_switch_field = Some("period_mode")`.
3. **Mark `browser_script_with_business_logic` as deprecated** (if it still exists in the code):
- In the current code the function no longer exists (it has been replaced by `compile_simple_request_script`). Note this in a comment: "legacy path, superseded by multi-mode unified path".
**Verification**:
- `cargo check` reports no compile errors.
- The JS script generated for a single-mode scene contains `const MODES =` and the `detectMode` logic.
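To make the verification target concrete, the following is a minimal sketch of what a `MODES` table and a `detectMode` function could look like for a scene that was auto-wrapped into a single default mode. The exact shape of the generated script is not shown in this plan, so the `DEFAULT_MODE` constant, the function signature, and the fallback behaviour are assumptions made for illustration only.

```javascript
// Hypothetical shape of the generated mode table; field names follow the plan,
// but the real generated script may differ.
const MODES = [
  {
    name: "default",
    condition: { field: "period_mode", operator: "equals", value: "default" },
    responsePath: "data.list",
  },
];
const DEFAULT_MODE = "default"; // assumed constant name

// Return the first mode whose condition matches args, else the default mode.
function detectMode(args, modes = MODES, defaultMode = DEFAULT_MODE) {
  const matched = modes.find((mode) => {
    const { field, operator, value } = mode.condition || {};
    // Only the "equals" operator is sketched here.
    return operator === "equals" && args[field] === value;
  });
  return matched || modes.find((mode) => mode.name === defaultMode) || null;
}
```

With a single default mode, a call without `period_mode` still resolves to that mode through the fallback, which is why the extra mode logic should not change behaviour for former single-request scenes.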
**Commit message**:
```
feat(generator): unify all scene types through multi-mode path
Single-mode and page-state-eval scenes now get auto-wrapped into a
default mode and compiled through compile_multi_mode_request. This
eliminates the old browser_script_with_business_logic code path and
ensures all scenes get responsePath extraction, requestTemplate, and
contentType support.
```
---
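### Task 2: Fix the jQuery `processData` parameter
**File**: `src/generated_scene/generator.rs` (the `compile_multi_mode_request` function, lines 1069-1253)
**Current state**: In the template, the `buildModeRequest` function (lines 1098-1118) already chooses the body serialization by `contentType` (form-urlencoded uses `Object.entries().join('&')`, JSON uses `JSON.stringify`), but the jQuery ajax call (lines 1185-1196) does **not** set the `processData` parameter.
With the default `processData: true`, jQuery serializes the request `data` again as a query string, which double-encodes a form-urlencoded body. (Per the jQuery documentation this conversion applies to non-string `data`, so it bites when the body reaches `$.ajax` as an object rather than as an already-serialized string.)
**Steps**:
1. Modify the jQuery ajax call template in `compile_multi_mode_request` (around lines 1185-1196):
- Add the `processData` parameter to `$.ajax({...})`: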
```javascript
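// Read the content type from the request so it is in scope inside the Promise callback.
const contentType = request.headers['Content-Type'];
$.ajax({
  url: request.url,
  type: request.method,
  data: request.body,
  contentType: contentType,
  // Form bodies are already serialized; tell jQuery not to process them again.
  processData: contentType !== 'application/x-www-form-urlencoded',
  dataType: 'json',
  success: resolve,
  error: (xhr, status, err) => reject(new Error(`API failed (${xhr.status}): ${err}`))
});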
```
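- The `contentType` value must be accessible inside the Promise callback; extract it from the `request` object.
2. Apply the same change to the jQuery ajax call in `compile_simple_request_script` (around lines 994-1004), adding the same `processData` logic.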
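The relationship between `contentType`, the serialized body, and `processData` can be sketched outside the generator template as follows. This is an illustration under assumptions: `encodeForm` and `buildAjaxOptions` are hypothetical helper names, and the use of `encodeURIComponent` is one reasonable way to build a form body, not necessarily what `buildModeRequest` does.

```javascript
const FORM_URLENCODED = "application/x-www-form-urlencoded";

// Serialize a flat payload object into a form-urlencoded string (hypothetical helper).
function encodeForm(payload) {
  return Object.entries(payload)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");
}

// Build the options object that would be handed to $.ajax (hypothetical helper).
function buildAjaxOptions(request) {
  const contentType = request.headers["Content-Type"];
  const isForm = contentType === FORM_URLENCODED;
  return {
    url: request.url,
    type: request.method,
    data: request.body,
    contentType,
    // The form body is already a serialized string, so jQuery should leave it alone.
    processData: !isForm,
    dataType: "json",
  };
}
```

Keeping the decision in one place means the multi-mode and simple-request templates can share the same rule instead of duplicating the comparison.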
**Verification**:
- The `$.ajax` call in the generated JS contains the `processData` parameter.
- Form-urlencoded requests are not double-encoded.
**Commit message**:
```
fix(generator): add processData to jQuery ajax for form-urlencoded requests
jQuery default processData:true re-serializes string bodies, causing
double-encoding for form-urlencoded payloads. Set processData:false
when contentType is application/x-www-form-urlencoded.
```
---
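### Task 3: Auto-wrap single-mode scenes into a mode configuration
**File**: `frontend/scene-generator/llm-client.js`
**Current state**: `analyzeSceneDeep` (lines 729-769) calls the LLM and returns the result of `normalizeSceneIr` directly. If the LLM outputs `modes: []` but does provide `apiEndpoints`, nothing wraps them into a mode.
**Steps**:
1. In `analyzeSceneDeep`, after `normalizeSceneIr(...)` and before returning, add the auto-wrap logic: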
```javascript
async function analyzeSceneDeep(sourceDir, dirContents, config) {
  const content = await requestChatCompletionWithRetry(...);
  const normalized = normalizeSceneIr(await extractJsonFromResponseWithRepair(content, config));
  // ... existing sceneId validation ...
  // AUTO-WRAP: single-mode scenes → modes array
  if (normalized.modes.length === 0 && normalized.apiEndpoints.length > 0) {
    normalized.modes.push({
      name: "default",
      label: "default",
      condition: { field: "period_mode", operator: "equals", value: "default" },
      apiEndpoint: normalized.apiEndpoints[0],
      columnDefs: normalized.columnDefs || [],
      requestTemplate: normalized.requestTemplate || {},
      normalizeRules: normalized.normalizeRules || { type: "validate_required", requiredFields: [], filterNull: true },
      responsePath: normalized.responsePath || "",
    });
    normalized.defaultMode = "default";
    normalized.modeSwitchField = "period_mode";
    // Upgrade archetype if it was single_request_table
    if (normalized.workflowArchetype === "single_request_table") {
      normalized.workflowArchetype = "multi_mode_request";
    }
  }
  return normalized;
}
```
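2. Also make sure `normalizeSceneIr` gives `defaultMode` and `modeSwitchField` correct default values (already handled at lines 477-478).
**Verification**:
- Run generation for a single-mode scene (for example `用户日电量监测`) and confirm that the `modes` array contains one default mode.
- Confirm that `workflowArchetype` is upgraded to `multi_mode_request`.
**Commit message**: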
```
feat(llm-client): auto-wrap single-mode scenes into modes array
When the LLM returns an empty modes array but has apiEndpoints,
automatically create a default mode with the first endpoint,
requestTemplate, responsePath, and normalizeRules. This ensures all
scenes compile through the multi-mode path.
```
---
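## Phase 2: Strengthen extraction
### Task 4: Add mandatory-field constraints to the LLM prompt
**File**: `frontend/scene-generator/llm-client.js` (`DEEP_SYSTEM_PROMPT`, lines 19-82)
**Current state**: The prompt lists the schema but does not stress which fields **must** be filled. The LLM frequently skips `contentType`, `responsePath`, and `requestTemplate`.
**Steps**:
1. After the schema definition in `DEEP_SYSTEM_PROMPT`, add a **mandatory field constraints** paragraph: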
```
MANDATORY FIELDS (never leave empty):
- apiEndpoints[].contentType: detect from source code.
* For $.ajax({}): look for 'contentType' property. Default 'application/json' if absent.
* For $http.sendByAxios(): contentType is 'application/json' (axios default).
* For XMLHttpRequest: look for setRequestHeader('Content-Type', ...).
* For form submissions: 'application/x-www-form-urlencoded'.
- modes[].responsePath: the JSON path from raw API response to the data array.
* Common patterns: 'data.list', 'data.rcvblAcctSumAll.rcvblAcctVOS', 'content', 'data.records'
* If response is the array itself, use empty string "".
- modes[].requestTemplate: the static request body shape from the source code.
* Extract ALL keys that appear in the request body object.
* Mark dynamic values as "${args.fieldName}" and static values as literals.
- apiEndpoints[].url: the full API URL as seen in the source code.
RULES:
- If you cannot determine contentType, default to 'application/json'.
- If you cannot determine responsePath, default to '' (empty string).
- If you cannot determine requestTemplate, use {} (empty object).
- NEVER leave these fields as null or undefined.
```
2. Insert this text into `DEEP_SYSTEM_PROMPT` after the schema definition and before `Instructions`.
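To show what the `responsePath` rule in the prompt means in practice, here is a minimal sketch of a dot-path lookup. The plan mentions `safeGet(raw, mode.responsePath` in the generated script but does not show its body, so this implementation, its `fallback` parameter, and the choice to return an empty array on a miss are assumptions.

```javascript
// Walk a dot-separated path; an empty path returns the raw value itself.
// Returns `fallback` when any segment is missing (assumed behaviour).
function safeGet(raw, path, fallback = []) {
  if (!path) return raw === undefined || raw === null ? fallback : raw;
  let current = raw;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object" || !(key in current)) {
      return fallback;
    }
    current = current[key];
  }
  return current === undefined || current === null ? fallback : current;
}

// Example: responsePath "data.list" selects the row array from the envelope.
const rows = safeGet({ code: 0, data: { list: [{ id: 1 }, { id: 2 }] } }, "data.list");
```

This is why the prompt asks for `""` rather than `null` when the response is itself the array: an empty string is a valid path meaning "use the response as-is".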
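**Verification**:
- Run generation for the `营销2.0零度户报表数据生成` scene and confirm that the `contentType` and `responsePath` in the LLM output are no longer empty.
- Confirm that `requestTemplate` contains the fields the business logic requires.
**Commit message**: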
```
feat(llm-client): add mandatory field constraints to DEEP_SYSTEM_PROMPT
Explicitly require LLM to fill contentType, responsePath, and
requestTemplate with detected values or defaults. Reduces empty-field
rate from ~60% to target ~10%.
```
---
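### Task 5: Add business JS file extraction
**Files**:
- `frontend/scene-generator/server.js`
- `frontend/scene-generator/generator-runner.js`
**Current state**: `readDirectory` in `generator-runner.js` already reads every file into `dirContents`, but `buildDeepAnalyzePrompt` (`llm-client.js`, lines 125-157) mainly pushes fragments of `index.html`. The contents of business JS files (such as `js/mca.js` and `js/sgApi.js`) are not extracted and pushed separately.
**Steps**:
1. **Identify business JS files in `generator-runner.js`**:
- In the `buildAnalysisContext` function, add a `businessJsFragments` array.
- Pick up the `.js` files under the `js/` directory (excluding third-party libraries such as `vue.js` and `element-ui`).
- For each business JS file, extract key fragments (function definitions, API calls, configuration objects) limited to the first 600 characters.
- Put the result into `analysisContext.businessJsFragments`.
2. **Push the business JS fragments in `buildDeepAnalyzePrompt` in `llm-client.js`**:
- After the existing `pushFragments` calls, add: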
```javascript
pushFragments(parts, "business JS files", context.businessJsFragments, 4);
```
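- Make sure the total prompt size does not exceed `MAX_DEEP_PROMPT_CHARS` (60000).
3. **Make sure `server.js` reads the business JS files**:
- Check whether the `readDirectory` call in the `/handle-analyze-deep` endpoint already reads the files under the `js/` directory.
- If it does not, add logic to read the `js/*.js` files.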
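The identification step can be sketched as below. This is a simplified illustration under assumptions: the input is taken to be a plain object mapping relative paths to file text (the real `dirContents` shape is not shown in this plan), the third-party prefix list is illustrative, and the snippet is a plain truncation to 600 characters rather than a targeted extraction of function definitions or API calls.

```javascript
// Base-name prefixes treated as third-party libraries (illustrative, not exhaustive).
const THIRD_PARTY_PREFIXES = ["vue", "element-ui", "jquery"];
const MAX_FRAGMENT_CHARS = 600;
const MAX_FRAGMENTS = 4;

function isThirdParty(path) {
  const baseName = path.split("/").pop().toLowerCase();
  return THIRD_PARTY_PREFIXES.some((prefix) => baseName.startsWith(prefix));
}

// files: assumed shape { "js/mca.js": "<source text>", ... }
function collectBusinessJsFragments(files) {
  return Object.entries(files)
    .filter(([path]) => /^js\/[^/]+\.js$/.test(path)) // direct children of js/ only
    .filter(([path]) => !isThirdParty(path))
    .slice(0, MAX_FRAGMENTS)
    .map(([path, text]) => ({ path, snippet: text.slice(0, MAX_FRAGMENT_CHARS) }));
}
```

With both caps applied, the business JS contribution to the prompt is bounded at roughly 4 × 600 characters plus labels, which leaves most of the 60000-character budget for the existing fragments.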
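**Verification**:
- Run the `台区线损大数据` scene and confirm that the contents of `js/mca.js` or a similar business file are pushed to the LLM.
- Confirm that the total prompt size does not exceed 60000 characters.
**Commit message**: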
```
feat(scene-generator): extract business JS files for LLM analysis
Identify and push js/ directory business logic files (mca.js, sgApi.js,
etc.) to the LLM prompt. Exclude third-party libraries. Capped at 4
fragments to stay within MAX_DEEP_PROMPT_CHARS budget.
```
---
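### Task 6: Post-extraction validation and a follow-up query
**File**: `frontend/scene-generator/llm-client.js`
**Current state**: After receiving the LLM response, `analyzeSceneDeep` runs `normalizeSceneIr` and returns immediately, without checking whether critical fields are missing.
**Steps**:
1. Add a `validateExtractedSceneInfo(sceneIr)` function: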
```javascript
function validateExtractedSceneInfo(sceneIr) {
  const issues = [];
  // Check: at least one apiEndpoint has contentType
  const endpointsWithCt = (sceneIr.apiEndpoints || []).filter(
    ep => ep && ep.contentType
  );
  if ((sceneIr.apiEndpoints || []).length > 0 && endpointsWithCt.length === 0) {
    issues.push("missing_contentType_on_endpoints");
  }
  // Check: at least one mode has responsePath (if modes exist)
  if ((sceneIr.modes || []).length > 0) {
    const modesWithPath = sceneIr.modes.filter(m => m.responsePath !== undefined && m.responsePath !== null);
    if (modesWithPath.length === 0) {
      issues.push("missing_responsePath_on_modes");
    }
  }
  // Check: workflowArchetype is set
  if (!sceneIr.workflowArchetype) {
    issues.push("missing_workflowArchetype");
  }
  return issues;
}
```
2. In `analyzeSceneDeep`, call the validation after `normalizeSceneIr`:
```javascript
const issues = validateExtractedSceneInfo(normalized);
if (issues.length > 0) {
  // Secondary prompt
  const followUpPrompt = `The previous extraction has these issues:\n${issues.join('\n')}\nPlease re-analyze the source snippets and fill in the missing fields. Use defaults if truly unavailable.`;
  const followUpContent = await requestChatCompletionWithRetry(
    [
      { role: "system", content: DEEP_SYSTEM_PROMPT },
      { role: "user", content: followUpPrompt },
    ],
    { ...config, maxTokens: 2400, timeoutMs: DEEP_REQUEST_TIMEOUT_MS, retryAttempts: 1 }
  );
  const repaired = normalizeSceneIr(await extractJsonFromResponseWithRepair(followUpContent, config));
  // Merge repaired fields into normalized (only fill empty fields)
  Object.assign(normalized, mergeSceneIrFields(repaired, normalized));
}
```
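3. Add a `mergeSceneIrFields(repaired, original)` helper:
- Overwrite a field with the value from `repaired` only when that field in `original` is empty or still at its default value.
- This avoids losing valid information from the first extraction.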
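One way the merge helper could work is sketched below. The plan only names `mergeSceneIrFields` and states its rule, so this body is an assumption: it treats `null`, `undefined`, `""`, `[]`, and `{}` as "empty", works on top-level fields only, and returns a patch object suitable for `Object.assign(normalized, ...)`.

```javascript
// Treat null, undefined, "", [] and {} as "empty" (an assumption of this sketch).
function isEmptyValue(value) {
  if (value === undefined || value === null || value === "") return true;
  if (Array.isArray(value)) return value.length === 0;
  if (typeof value === "object") return Object.keys(value).length === 0;
  return false;
}

// Return only the repaired top-level fields that fill gaps in the original.
function mergeSceneIrFields(repaired, original) {
  const patch = {};
  for (const [key, value] of Object.entries(repaired || {})) {
    if (isEmptyValue(original[key]) && !isEmptyValue(value)) {
      patch[key] = value;
    }
  }
  return patch;
}
```

Because the sketch is shallow, a missing `contentType` nested inside `apiEndpoints[]` or a missing `responsePath` inside `modes[]` would not be repaired when the array itself is non-empty; a real implementation would likely need a per-element merge for those two arrays.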
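**Verification**:
- Simulate a scene where the LLM response lacks `contentType` and confirm that the follow-up query is triggered.
- Confirm that at most one follow-up is sent, so there is no infinite loop.
**Commit message**: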
```
feat(llm-client): add post-extraction validation with one-shot retry
After LLM returns scene IR, validate that critical fields (contentType,
responsePath, workflowArchetype) are present. If missing, send one
follow-up prompt to fill gaps. Merges repaired fields without overwriting
valid data from the first extraction.
```
---
## Phase 3: Test and verify
### Task 7: Unit tests
**File**: `tests/scene_generator_modes_test.rs` (new)
**Steps**:
1. Create the test file `tests/scene_generator_modes_test.rs`.
2. Write 5 test cases:
```rust
// NOTE: files under tests/ are compiled as a separate crate, so `super::*` and
// `crate::` paths do not resolve there; replace `crate` with the library crate's
// name (or move this module into src/) and adjust imports as needed.
#[cfg(test)]
mod tests {
    use super::*; // adjust imports as needed
    use crate::generated_scene::generator::*;
    use crate::generated_scene::ir::*;
    use serde_json::json;

    #[test]
    fn test_single_mode_generates_modes_array() {
        // Create a SingleRequestTable scene with one endpoint
        let scene_ir = make_test_scene_ir();
        // ... assertions: generated JS contains "const MODES ="
    }

    #[test]
    fn test_multi_mode_generates_mode_routing() {
        // Create a MultiModeRequest scene with two modes
        // ... assertions: generated JS contains "detectMode"
    }

    #[test]
    fn test_snake_camel_consistency() {
        // Verify field name serialization is consistent
        // between Rust (snake_case) and JS (camelCase)
    }

    #[test]
    fn test_form_urlencoded_request_body() {
        // Create a mode with contentType = "application/x-www-form-urlencoded"
        // ... assertions: body is Object.entries().join('&'), not JSON.stringify
    }

    #[test]
    fn test_response_path_extraction_in_template() {
        // Create a mode with responsePath = "data.list"
        // ... assertions: generated JS contains "safeGet(raw, mode.responsePath"
    }
}
```
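3. Each test builds a `SceneIr` instance, calls `compile_multi_mode_request`, and checks that the generated string contains the expected code fragment.
**Verification**:
- `cargo test --test scene_generator_modes_test` passes in full (the `--test` flag selects the integration test file; a bare `cargo test scene_generator_modes_test` only filters by test function name).
**Commit message**: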
```
test: add unit tests for multi-mode generation path
Covers: single-mode auto-wrap, multi-mode routing, snake/camel
consistency, form-urlencoded body format, and responsePath extraction.
```
---
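### Task 8: Integration tests
**Steps**:
1. **Pick two representative scenes and run full generation**:
- Simple scene: `用户日电量监测` (pattern C, direct AJAX)
- Complex scene: `台区线损大数据-月_周累计线损率统计分析` (pattern A, dual-mode)
2. **Compare the generated output with tq-lineloss-report**:
- Compare the structure of `SKILL.toml`.
- Compare the key functions in `scripts/*.js` (`buildModeRequest`, `detectMode`, `normalizeRows`).
- Compare the bootstrap and params configuration in `scene.toml`.
3. **Produce the integration test report**:
- File: `docs/superpowers/reports/2026-04-17-integration-test-report.md`
- Contents: gap list, quality score, open issues
4. **Record the gap list**:
- Which fields are still not extracted correctly
- Which logic still needs manual correction
- Which scenes are still unsuitable for automation
**Verification**:
- The integration test report has been written.
- The generated output for at least one scene reaches 80% or more of the quality of tq-lineloss-report.
**Commit message**: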
```
docs: add integration test report for scene generator quality
Generated skills for user-daily-power and tq-lineloss scenes. Compared
against manually-authored tq-lineloss-report. Quality assessment and
gap analysis documented.
```
---
## Execution order
```
Task 1 → Task 2 → Task 3 → Task 4 → Task 5 → Task 6 → Task 7 → Task 8
├── Phase 1: Fix foundation ──┤ ├── Phase 2: Strengthen extraction ──┤ ├─ Phase 3 ─┤
```
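The three Phase 1 tasks have dependencies (Task 1 must finish first; Task 2 and Task 3 can then run in parallel).
The three Phase 2 tasks can run in parallel (Tasks 4, 5, and 6 change different parts of the code, although Tasks 4 and 6 both edit `llm-client.js` and Task 5 also touches `buildDeepAnalyzePrompt` there, so those edits need to be merged).
Phase 3 depends on Phases 1 and 2 being fully complete.
## Risks and mitigations
| Risk | Impact | Mitigation |
|------|--------|------------|
| The LLM follow-up query increases generation time | Worse user experience | Limit to 1 follow-up, with a 120 s timeout |
| After unification, JS generated for SingleRequestTable scenes contains unnecessary mode logic | Larger script size | The default-mode condition check is simple; the performance impact is negligible |
| Too many business JS files push the prompt over its limit | The LLM cannot process it | Limit to 4 files, 600 characters each |
| The `processData` change affects scenes that currently work | Regressions | Set it to `false` only for form-urlencoded; JSON requests are unaffected |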