# Enhanced LLM Extraction Schema - Implementation Plan > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. **Goal:** Enhance the LLM extraction schema to support multi-mode business logic, enabling automatic generation of scripts like tq-lineloss-report that switch between month/week modes. **Architecture:** Extend existing `SceneInfoJson` in Rust with new mode-related structs. Enhance LLM prompt in `llm-client.js` to detect multi-mode patterns. Add new template function `browser_script_with_modes()` for generating mode-aware JavaScript. **Tech Stack:** Rust (serde_json), JavaScript (Node.js), LLM API --- ## File Structure | File | Action | Purpose | |------|--------|---------| | `src/generated_scene/generator.rs` | Modify | Add mode-related schema structs and multi-mode template | | `frontend/scene-generator/llm-client.js` | Modify | Enhance DEEP_SYSTEM_PROMPT for mode detection | | `frontend/scene-generator/server.js` | Modify | Handle enhanced schema in deep analysis endpoint | --- ### Task 1: Add Rust Schema Structs for Multi-Mode Support **Files:** - Modify: `src/generated_scene/generator.rs` (after line 21) **Goal:** Add new Rust structs to parse the enhanced JSON schema with modes support. - [ ] **Step 1: Add ModeConditionJson struct** Add after `ApiEndpointJson` struct (line 21): ```rust #[derive(Debug, Clone, serde::Serialize, serde::Deserialize)] pub struct ModeConditionJson { pub field: String, #[serde(default = "default_equals")] pub operator: String, pub value: serde_json::Value, } fn default_equals() -> String { "equals".to_string() } ``` - [ ] **Step 2: Add NormalizeRulesJson struct** Add after `ModeConditionJson`: ```rust #[derive(Debug, Clone, serde::Serialize, serde::Deserialize)] pub struct NormalizeRulesJson { #[serde(rename = "type", default = "default_validate_all")] pub rules_type: String, #[serde(default)] pub required_fields: Vec, #[serde(default = "default_true")] pub filter_null: bool, } fn default_validate_all() -> String { "validate_all_columns".to_string() } fn default_true() -> bool { true } ``` - [ ] **Step 3: Add ModeConfigJson struct** Add after `NormalizeRulesJson`: ```rust #[derive(Debug, Clone, serde::Serialize, serde::Deserialize)] pub struct ModeConfigJson { pub name: String, #[serde(default)] pub label: Option, pub condition: ModeConditionJson, #[serde(rename = "apiEndpoint")] pub api_endpoint: ApiEndpointEnhancedJson, #[serde(rename = "columnDefs", default)] pub column_defs: Vec<(String, String)>, #[serde(rename = "requestTemplate", default)] pub request_template: Option, #[serde(rename = "normalizeRules", default)] pub normalize_rules: Option, #[serde(rename = "responsePath", default)] pub response_path: Option, } ``` - [ ] **Step 4: Add ApiEndpointEnhancedJson struct** Add before `ModeConfigJson`: ```rust #[derive(Debug, Clone, serde::Serialize, serde::Deserialize)] pub struct ApiEndpointEnhancedJson { pub name: String, pub url: String, #[serde(default)] pub method: String, #[serde(rename = "contentType", default)] pub content_type: Option, #[serde(default)] pub description: Option, } ``` - [ ] **Step 5: Enhance SceneInfoJson struct** Modify `SceneInfoJson` to add mode fields (add after line 54, before the closing brace): ```rust // Multi-mode support (new fields) #[serde(default)] pub modes: Vec, #[serde(rename = "defaultMode", default)] pub default_mode: Option, #[serde(rename = "modeSwitchField", default)] pub mode_switch_field: Option, ``` - [ ] **Step 6: Verify the changes** Run `cargo check` to verify: ```bash cargo check ``` Expected: No compilation errors. - [ ] **Step 7: Commit** ```bash git add src/generated_scene/generator.rs git commit -m "feat(generator): add multi-mode schema structs for enhanced LLM extraction" ``` --- ### Task 2: Enhance LLM Extraction Prompt **Files:** - Modify: `frontend/scene-generator/llm-client.js` (lines 16-46) **Goal:** Enhance `DEEP_SYSTEM_PROMPT` to instruct LLM to detect multi-mode business logic. - [ ] **Step 1: Replace DEEP_SYSTEM_PROMPT with enhanced version** Replace the entire `DEEP_SYSTEM_PROMPT` constant (lines 16-46): ```javascript const DEEP_SYSTEM_PROMPT = `你是一个场景代码分析专家。分析场景源码,提取关键业务信息。 ## 分析目标 1. **多模式识别** (关键): - 查找条件分支逻辑 (if/switch) 中基于 period_mode、reportType 等字段的分支 - 识别不同分支对应的 API 端点、列定义、请求格式 - 如果发现多模式,使用 modes 数组格式输出 2. **API 端点**: 识别所有 HTTP 请求地址 (URL, method, contentType, 用途) - 从 $.ajax/fetch 调用中提取 contentType - 检测请求格式: application/json 或 application/x-www-form-urlencoded 3. **请求模板**: 识别请求参数结构 - 提取硬编码的分页参数 (rows, page, sidx, sord) - 识别模板变量如 \${args.org_code} 4. **数据归一化**: 识别数据处理规则 - 查找数据渲染/表格填充逻辑 - 检测数据验证条件 (哪些字段不能为空) 5. **响应路径**: 识别数据在响应中的位置 - 如 response.content 或 response.data ## 输出格式 ### 单模式场景 (无 modes 数组): { "sceneId": "string", "sceneName": "string", "sceneKind": "report_collection | monitoring", "expectedDomain": "string", "targetUrl": "string", "apiEndpoints": [{"name": "", "url": "", "method": "POST"}], "staticParams": {"key": "value"}, "columnDefs": [["fieldName", "中文列名"]] } ### 多模式场景 (有 modes 数组): { "sceneId": "tq-lineloss-report", "sceneName": "台区线损报表", "sceneKind": "report_collection", "modes": [ { "name": "month", "label": "月度报表", "condition": {"field": "period_mode", "operator": "equals", "value": "month"}, "apiEndpoint": { "name": "月度线损查询", "url": "http://...", "method": "POST", "contentType": "application/x-www-form-urlencoded" }, "columnDefs": [["ORG_NAME", "供电单位"], ...], "requestTemplate": {"orgno": "\${args.org_code}", "rows": 1000, "page": 1}, "normalizeRules": {"type": "validate_all_columns", "filterNull": true}, "responsePath": "content" }, { "name": "week", "label": "周报表", "condition": {"field": "period_mode", "operator": "equals", "value": "week"}, "apiEndpoint": {...}, "columnDefs": [...], ... } ], "defaultMode": "month", "modeSwitchField": "period_mode" } **重要**: 如果发现代码中有基于 period_mode 的 if/switch 分支,必须使用多模式格式输出!`; ``` - [ ] **Step 2: Verify JavaScript syntax** ```bash node --check frontend/scene-generator/llm-client.js ``` Expected: No syntax errors. - [ ] **Step 3: Commit** ```bash git add frontend/scene-generator/llm-client.js git commit -m "feat(llm): enhance DEEP_SYSTEM_PROMPT for multi-mode detection" ``` --- ### Task 3: Implement Multi-Mode Template in Rust **Files:** - Modify: `src/generated_scene/generator.rs` (add new function after `browser_script_with_business_logic`) **Goal:** Add a new template function that generates mode-aware JavaScript. - [ ] **Step 1: Add browser_script_with_modes function** Add after `browser_script_with_business_logic` function (after line 476): ```rust fn browser_script_with_modes(scene_id: &str, scene_info: &SceneInfoJson) -> String { let modes_json = serde_json::to_string_pretty(&scene_info.modes).unwrap_or_else(|_| "[]".to_string()); let default_mode = scene_info.default_mode.as_deref().unwrap_or("month"); let mode_switch_field = scene_info.mode_switch_field.as_deref().unwrap_or("period_mode"); format!(r#"const REPORT_NAME = '{scene_id}'; const MODES = {modes_json}; const DEFAULT_MODE = '{default_mode}'; const MODE_SWITCH_FIELD = '{mode_switch_field}'; function normalizePayload(payload) {{ if (typeof payload === 'string') {{ try {{ return JSON.parse(payload); }} catch (_) {{ return {{}}; }} }} return payload && typeof payload === 'object' ? payload : {{}}; }} function validateArgs(args) {{ const errors = []; if (!args.org_code) errors.push('Missing org_code'); if (!args.period_value) errors.push('Missing period_value'); return {{ valid: errors.length === 0, errors }}; }} function detectMode(args) {{ const modeValue = args[MODE_SWITCH_FIELD] || DEFAULT_MODE; return MODES.find(m => m.condition.value === modeValue) || MODES[0]; }} function buildModeRequest(args, mode) {{ const endpoint = mode.apiEndpoint; const template = mode.requestTemplate || {{}}; const contentType = endpoint.contentType || 'application/json'; const url = endpoint.url; const method = endpoint.method || 'POST'; let body; if (contentType === 'application/x-www-form-urlencoded') {{ body = {{ ...template }}; for (const [key, value] of Object.entries(body)) {{ if (typeof value === 'string' && value.startsWith('${{') && value.endsWith('}}')) {{ const expr = value.slice(2, -1); try {{ body[key] = eval(expr); }} catch (e) {{ body[key] = args.org_code; }} }} }} body.orgno = args.org_code; }} else {{ body = JSON.stringify({{ ...template, ...args }}); }} return {{ url, method, headers: {{ 'Content-Type': contentType }}, body }}; }} function normalizeModeRows(data, mode) {{ const rules = mode.normalizeRules || {{ type: 'validate_all_columns', filterNull: true }}; const columns = mode.columnDefs.map(([key]) => key); if (!Array.isArray(data)) return []; return data.map(row => {{ const result = {{}}; for (const key of columns) {{ const v = row[key]; result[key] = (v === null || v === undefined || v === '') ? '' : String(v).trim(); }} return result; }}).filter(row => {{ if (!rules.filterNull) return true; if (rules.type === 'validate_required' && rules.requiredFields) {{ return rules.requiredFields.every(f => row[f] !== ''); }} return columns.every(k => row[k] !== ''); }}); }} function determineArtifactStatus({{ blockedReason = '', fatalError = '', reasons = [], rows = [] }}) {{ if (blockedReason) return 'blocked'; if (fatalError) return 'error'; if (reasons.length > 0) return 'partial'; if (!rows.length) return 'empty'; return 'ok'; }} function buildArtifact({{ status, blockedReason = '', fatalError = '', reasons = [], rows = [], args, columnDefs, columns }}) {{ return {{ type: 'report-artifact', report_name: REPORT_NAME, status: status || determineArtifactStatus({{ blockedReason, fatalError, reasons, rows }}), period: {{ mode: args.period_mode, mode_code: args.period_mode_code, value: args.period_value, payload: normalizePayload(args.period_payload) }}, org: {{ label: args.org_label, code: args.org_code }}, column_defs: columnDefs || [], columns: columns || [], rows, counts: {{ detail_rows: rows.length }}, partial_reasons: reasons.filter(r => r && !r.startsWith('api_') && !r.startsWith('validation_')), reasons: Array.from(new Set(reasons.filter(Boolean))) }}; }} const defaultDeps = {{ validatePageContext(args) {{ const host = (globalThis.location?.hostname || '').trim(); const expected = (args.expected_domain || '').trim(); if (!host) return {{ ok: false, reason: 'page_context_unavailable' }}; if (host !== expected) return {{ ok: false, reason: 'page_context_mismatch' }}; return {{ ok: true }}; }}, async queryModeData(args, mode) {{ const endpoint = mode.apiEndpoint; const request = buildModeRequest(args, mode); const contentType = endpoint.contentType || 'application/json'; // Prefer jQuery if (typeof $ !== 'undefined' && typeof $.ajax === 'function') {{ return new Promise((resolve, reject) => {{ $.ajax({{ url: request.url, type: request.method, data: request.body, contentType: contentType, dataType: 'json', success: resolve, error: (xhr, status, err) => reject(new Error( `API failed (${{xhr.status}}): ${{err}} | body=${{(xhr.responseText || '').substring(0, 200)}}` )) }}); }}); }} // Fallback: fetch if (typeof fetch === 'function') {{ const response = await fetch(request.url, {{ method: request.method, headers: request.headers, body: request.method !== 'GET' ? request.body : undefined }}); if (!response.ok) {{ const text = await response.text().catch(() => ''); throw new Error(`HTTP ${{response.status}}: ${{text.substring(0, 200)}}`); }} return response.json(); }} throw new Error('No HTTP client available (need jQuery or fetch)'); }} }}; async function buildBrowserEntrypointResult(args, deps = defaultDeps) {{ // 1. Parameter validation const validation = validateArgs(args); if (!validation.valid) {{ const mode = detectMode(args); return buildArtifact({{ status: 'blocked', blockedReason: 'validation_failed', reasons: validation.errors, rows: [], args, columnDefs: mode.columnDefs, columns: mode.columnDefs.map(([key]) => key) }}); }} // 2. Page context validation const pageValidation = typeof deps.validatePageContext === 'function' ? deps.validatePageContext(args) : {{ ok: true }}; if (!pageValidation?.ok) {{ const mode = detectMode(args); return buildArtifact({{ status: 'blocked', blockedReason: pageValidation?.reason || 'page_context_mismatch', reasons: [pageValidation?.reason || 'page_context_mismatch'], rows: [], args, columnDefs: mode.columnDefs, columns: mode.columnDefs.map(([key]) => key) }}); }} // 3. Detect mode const mode = detectMode(args); // 4. Data fetching const reasons = []; let rawData = null; try {{ rawData = await (deps.queryModeData ? deps.queryModeData(args, mode) : Promise.resolve([])); }} catch (error) {{ return buildArtifact({{ status: 'error', fatalError: error.message, reasons: ['api_query_failed:' + error.message], rows: [], args, columnDefs: mode.columnDefs, columns: mode.columnDefs.map(([key]) => key) }}); }} // 5. Extract response data const responsePath = mode.responsePath || ''; let data = rawData; if (responsePath && rawData) {{ data = rawData[responsePath] || rawData; }} // 6. Row normalization const rows = normalizeModeRows(data, mode); if (rows.length === 0 && Array.isArray(data) && data.length > 0) {{ reasons.push('row_normalization_partial'); }} // 7. Build artifact return buildArtifact({{ reasons, rows, args, columnDefs: mode.columnDefs, columns: mode.columnDefs.map(([key]) => key) }}); }} if (typeof module !== 'undefined') {{ module.exports = {{ buildBrowserEntrypointResult, normalizePayload, validateArgs, detectMode, buildModeRequest, normalizeModeRows, buildArtifact, determineArtifactStatus, MODES, REPORT_NAME }}; }} if (typeof args !== 'undefined') {{ return buildBrowserEntrypointResult(args); }} "#, scene_id = scene_id, modes_json = modes_json, default_mode = default_mode, mode_switch_field = mode_switch_field) } ``` - [ ] **Step 2: Modify browser_script function to use multi-mode template** Replace the `browser_script` function (lines 270-277): ```rust fn browser_script(scene_id: &str, analysis: &SceneSourceAnalysis, scene_info: Option<&SceneInfoJson>) -> String { match scene_info { Some(info) if !info.modes.is_empty() => { browser_script_with_modes(scene_id, info) } Some(info) if !info.api_endpoints.is_empty() || !info.column_defs.is_empty() => { browser_script_with_business_logic(scene_id, analysis, info) } _ => browser_script_skeleton(scene_id, analysis), } } ``` - [ ] **Step 3: Verify compilation** ```bash cargo check ``` Expected: No errors. - [ ] **Step 4: Commit** ```bash git add src/generated_scene/generator.rs git commit -m "feat(generator): add multi-mode template for mode-aware script generation" ``` --- ### Task 4: Add Unit Tests for Schema Parsing **Files:** - Create: `src/generated_scene/generator_test.rs` **Goal:** Add tests to verify the enhanced schema parses correctly. - [ ] **Step 1: Create test file** Create `src/generated_scene/generator_test.rs`: ```rust #[cfg(test)] mod tests { use super::*; #[test] fn test_parse_mode_condition() { let json = r#"{"field": "period_mode", "operator": "equals", "value": "month"}"#; let condition: ModeConditionJson = serde_json::from_str(json).unwrap(); assert_eq!(condition.field, "period_mode"); assert_eq!(condition.operator, "equals"); assert_eq!(condition.value.as_str().unwrap(), "month"); } #[test] fn test_parse_normalize_rules() { let json = r#"{"type": "validate_required", "requiredFields": ["ORG_NAME"], "filterNull": true}"#; let rules: NormalizeRulesJson = serde_json::from_str(json).unwrap(); assert_eq!(rules.rules_type, "validate_required"); assert_eq!(rules.required_fields, vec!["ORG_NAME"]); assert!(rules.filter_null); } #[test] fn test_parse_mode_config() { let json = r#"{ "name": "month", "label": "月度报表", "condition": {"field": "period_mode", "operator": "equals", "value": "month"}, "apiEndpoint": {"name": "test", "url": "http://example.com", "method": "POST"}, "columnDefs": [["ORG_NAME", "供电单位"]], "responsePath": "content" }"#; let mode: ModeConfigJson = serde_json::from_str(json).unwrap(); assert_eq!(mode.name, "month"); assert_eq!(mode.column_defs.len(), 1); assert_eq!(mode.response_path, Some("content".to_string())); } #[test] fn test_parse_scene_info_with_modes() { let json = r#"{ "sceneId": "test-report", "sceneName": "测试报表", "sceneKind": "report_collection", "modes": [ {"name": "month", "condition": {"field": "period_mode", "value": "month"}, "apiEndpoint": {"name": "m", "url": "http://a"}, "columnDefs": []}, {"name": "week", "condition": {"field": "period_mode", "value": "week"}, "apiEndpoint": {"name": "w", "url": "http://b"}, "columnDefs": []} ], "defaultMode": "month", "modeSwitchField": "period_mode" }"#; let info: SceneInfoJson = serde_json::from_str(json).unwrap(); assert_eq!(info.scene_id, "test-report"); assert_eq!(info.modes.len(), 2); assert_eq!(info.default_mode, Some("month".to_string())); } #[test] fn test_parse_scene_info_backward_compatible() { // Old format without modes should still work let json = r#"{ "sceneId": "old-report", "sceneName": "旧格式报表", "apiEndpoints": [{"name": "test", "url": "http://example.com"}], "columnDefs": [["col1", "列1"]] }"#; let info: SceneInfoJson = serde_json::from_str(json).unwrap(); assert_eq!(info.scene_id, "old-report"); assert!(info.modes.is_empty()); assert_eq!(info.api_endpoints.len(), 1); } } ``` - [ ] **Step 2: Add test module to generator.rs** Add at the end of `generator.rs`: ```rust #[cfg(test)] mod generator_test; ``` - [ ] **Step 3: Run tests** ```bash cargo test --lib generator ``` Expected: All tests pass. - [ ] **Step 4: Commit** ```bash git add src/generated_scene/generator_test.rs src/generated_scene/generator.rs git commit -m "test(generator): add unit tests for multi-mode schema parsing" ``` --- ### Task 5: Integration Test with tq-lineloss-report **Files:** - Test: Generate skill from tq-lineloss-report source **Goal:** Verify the enhanced template can generate a multi-mode script. - [ ] **Step 1: Build the project** ```bash cargo build --release ``` Expected: Build succeeds. - [ ] **Step 2: Create a test multi-mode scene-info JSON** Create a test JSON file to simulate LLM output: ```json { "sceneId": "tq-lineloss-test", "sceneName": "台区线损测试报表", "sceneKind": "report_collection", "modes": [ { "name": "month", "label": "月度报表", "condition": {"field": "period_mode", "operator": "equals", "value": "month"}, "apiEndpoint": { "name": "月度线损查询", "url": "http://20.76.57.61:18080/gsllys/fourVerEightHor/fourVerEightHorLinelossRateList", "method": "POST", "contentType": "application/x-www-form-urlencoded" }, "columnDefs": [["ORG_NAME", "供电单位"], ["YGDL", "供电量"], ["YYDL", "售电量"]], "requestTemplate": {"orgno": "${args.org_code}", "rows": 1000, "page": 1, "sidx": "ORG_NO", "sord": "asc"}, "normalizeRules": {"type": "validate_all_columns", "filterNull": true}, "responsePath": "content" }, { "name": "week", "label": "周报表", "condition": {"field": "period_mode", "operator": "equals", "value": "week"}, "apiEndpoint": { "name": "周线损查询", "url": "http://20.76.57.61:18080/gsllys/tqLinelossStatis/getYearMonWeekLinelossAnalysisList", "method": "POST", "contentType": "application/x-www-form-urlencoded" }, "columnDefs": [["ORG_NAME", "供电单位"], ["LINE_LOSS_RATE", "线损率"]], "requestTemplate": {"orgno": "${args.org_code}", "tjzq": "week", "rows": 1000}, "normalizeRules": {"type": "validate_required", "requiredFields": ["ORG_NAME", "LINE_LOSS_RATE"], "filterNull": true}, "responsePath": "content" } ], "defaultMode": "month", "modeSwitchField": "period_mode" } ``` Save to `tmp_multi_mode_test.json`. - [ ] **Step 3: Run generator with the multi-mode JSON** ```bash cargo run --bin sg_scene_generate -- --source-dir "examples/test-scene" --scene-id "tq-lineloss-test" --scene-name "台区线损测试" --output-root "tmp_multi_test" --scene-info-json "$(cat tmp_multi_mode_test.json)" ``` Expected: Skill package generated without errors. - [ ] **Step 4: Verify generated script syntax** ```bash node --check tmp_multi_test/skills/tq-lineloss-test/scripts/collect_tq_lineloss_test.js ``` Expected: No syntax errors. - [ ] **Step 5: Verify generated script has multi-mode logic** Check that the generated script contains: - `detectMode()` function - `MODES` constant with mode configurations - `buildModeRequest()` function - `normalizeModeRows()` function - [ ] **Step 6: Commit** ```bash git add -A git commit -m "test: verify multi-mode template generates valid JavaScript" ``` --- ### Task 6: Update Web UI to Display Mode Information **Files:** - Modify: `frontend/scene-generator/sg_scene_generator.html` **Goal:** Add mode information display in the extraction preview panel. - [ ] **Step 1: Add mode display section** Add after the column defs display in the preview panel: ```html ``` - [ ] **Step 2: Add JavaScript to populate mode info** In the `showExtractionPreview` function, add: ```javascript // Show modes if present const modesSection = document.getElementById('modes-preview'); const modesList = document.getElementById('modes-list'); if (data.modes && data.modes.length > 0) { modesSection.style.display = 'block'; modesList.innerHTML = data.modes.map(mode => { const name = escapeHtml(mode.name || 'unknown'); const label = escapeHtml(mode.label || ''); const api = escapeHtml(mode.apiEndpoint?.url || ''); return `
${name}${label ? ` (${label})` : ''}: ${api}
`; }).join(''); } else { modesSection.style.display = 'none'; } ``` - [ ] **Step 3: Verify changes** ```bash node --check frontend/scene-generator/sg_scene_generator.html ``` Note: HTML files can't be syntax-checked directly, just verify the server starts. - [ ] **Step 4: Commit** ```bash git add frontend/scene-generator/sg_scene_generator.html git commit -m "feat(ui): add mode information display in extraction preview" ``` --- ## Self-Review Checklist **1. Spec Coverage:** - [x] Multi-mode schema structs → Task 1 - [x] Enhanced LLM prompt → Task 2 - [x] Multi-mode template → Task 3 - [x] Unit tests → Task 4 - [x] Integration test → Task 5 - [x] UI update → Task 6 **2. Placeholder Scan:** - No TBD, TODO, or placeholder text found - All code snippets are complete - All commands have expected output **3. Type Consistency:** - `ModeConditionJson` field names match JSON schema - `ModeConfigJson` uses `apiEndpoint` (camelCase) matching JSON - `NormalizeRulesJson` uses `rules_type` with serde rename --- ## Execution Handoff Plan complete and saved to `docs/superpowers/plans/2026-04-17-enhanced-llm-extraction-schema-plan.md`. Two execution options: **1. Subagent-Driven (recommended)** - I dispatch a fresh subagent per task, review between tasks, fast iteration **2. Inline Execution** - Execute tasks in this session using executing-plans, batch execution with checkpoints Which approach?