From dd7b3c582a536173de77f937911ac6d28466c2ca Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=9C=A8=E7=82=8E?= <635735027@qq.com>
Date: Fri, 17 Apr 2026 09:51:59 +0800
Subject: [PATCH] docs: add LLM-driven skill generation design spec
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Design for enhancing sg_scene_generate to produce complete,
runnable skill packages with:

- Deep LLM extraction from index.html (API endpoints, params, columns)
- Enhanced Rust template rendering with business logic
- Web UI preview of extracted results

🤖 Generated with [Qoder](https://qoder.com)
---
 ...4-17-llm-driven-skill-generation-design.md | 490 ++++++++++++++++++
 1 file changed, 490 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-04-17-llm-driven-skill-generation-design.md

diff --git a/docs/superpowers/specs/2026-04-17-llm-driven-skill-generation-design.md b/docs/superpowers/specs/2026-04-17-llm-driven-skill-generation-design.md
new file mode 100644
index 0000000..aaaf63a
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-17-llm-driven-skill-generation-design.md
@@ -0,0 +1,490 @@
+# LLM-Driven Skill Generation Design
+
+> **Status:** Draft
+> **Date:** 2026-04-17
+> **Author:** Qoder
+
+## Problem Statement
+
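+`sg_scene_generate` currently produces only a "skeleton" skill package with no real business logic:
+
+### Current output vs. actual requirements
+
+| Aspect | Current output | Actual requirement (tq-lineloss-report) |
+|--------|----------------|------------------------------------------|
+| Script size | 51 lines | 433 lines |
+| API endpoints | None | Fully defined |
+| Static parameters | None | Business parameters present |
+| Column definitions | Generic template | Business-specific |
+| Runnability | Needs manual completion | Works out of the box |
+
+### Root causes
+
+1. **LLM analysis never reads index.html**, so the business logic in the scene source is ignored
+2. **Only scene-id/scene-name are extracted**, so key information such as APIs, parameters, and column definitions is missing
+3. **The Rust template is too simple**: it only emits a skeleton and cannot render business logic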
+
+## Goal
+
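+Have `sg_scene_generate` automatically produce a complete, **directly runnable** skill package that includes:
+
+- API endpoint definitions
+- Static business parameters
+- Column definitions (for report export)
+- A skeleton for the data collection logic
+- Argument validation logic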
+
+## Non-Goals
+
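+- No 100% automation: complex business logic still needs manual review
+- No support for every JavaScript pattern: only common cases are covered
+- No replacement of the existing Rust template system: this design builds on it
+- No handling of authentication/authorization logic: the runtime environment is responsible for that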
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                            Web UI (HTML)                            │
+│  [Pick scene dir] → [Analyze] → [Preview result] → [Generate skill] │
+└─────────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────────┐
+│                     Node.js Server (server.js)                      │
+│  /analyze  →  LLM deep extraction (index.html + scripts)            │
+│  /generate →  passes the extraction result to the Rust CLI          │
+└─────────────────────────────────────────────────────────────────────┘
+                              │
+          ┌───────────────────┼───────────────────┐
+          ▼                   ▼                   ▼
+┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
+│   llm-client.js │  │  generator-     │  │   Rust CLI      │
+│ (extraction)    │  │  runner.js      │  │ (rendering)     │
+└─────────────────┘  └─────────────────┘  └─────────────────┘
+```
+
+## LLM Extraction Schema
+
+### Input Sources
+
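+| File | Extracted content |
+|------|-------------------|
+| `index.html` | API endpoints, static parameters, column definitions, business methods |
+| `scripts/*.js` | Helper functions, data transformation logic |
+| Directory structure | How the files are organized |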
+
+### Output Schema
+
+```json
+{
+  "sceneId": "string - scene identifier",
+  "sceneName": "string - scene name in Chinese",
+  "sceneKind": "report_collection | monitoring",
+  "sourceSystem": "string - name of the source system",
+  "expectedDomain": "string - target domain",
+  "targetUrl": "string | null - target page URL",
+  "apiEndpoints": [
+    {
+      "name": "string - API name",
+      "url": "string - full URL",
+      "method": "GET | POST",
+      "description": "string - purpose"
+    }
+  ],
+  "staticParams": {
+    "key": "value - static business parameter"
+  },
+  "columnDefs": [
+    ["fieldName", "Chinese column label"]
+  ],
+  "entryMethod": "string - entry method name",
+  "businessLogic": {
+    "dataFetch": "string - description of the data-fetching logic",
+    "dataTransform": "string - description of the data transformation logic"
+  }
+}
+```
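+
+A small validator can make the schema above concrete. The sketch below is illustrative only: `validateSceneInfo` is a hypothetical helper, not part of the existing code, and it assumes that `sceneId`, `sceneName`, and `sceneKind` are the required fields while everything else is optional.
+
+```javascript
+// Hypothetical helper: checks an extracted object against the schema above.
+// Which fields are required is an assumption, not something the spec states.
+const SCENE_KINDS = ['report_collection', 'monitoring'];
+const HTTP_METHODS = ['GET', 'POST'];
+
+function validateSceneInfo(info) {
+  const errors = [];
+  const isStr = (v) => typeof v === 'string' && v.length > 0;
+
+  if (!info || typeof info !== 'object') return ['sceneInfo must be an object'];
+  if (!isStr(info.sceneId)) errors.push('sceneId is required');
+  if (!isStr(info.sceneName)) errors.push('sceneName is required');
+  if (!SCENE_KINDS.includes(info.sceneKind)) errors.push('sceneKind is invalid');
+
+  // Each endpoint needs a name, a URL, and a supported HTTP method.
+  (info.apiEndpoints || []).forEach((ep, i) => {
+    const e = ep || {};
+    if (!isStr(e.name) || !isStr(e.url)) errors.push(`apiEndpoints[${i}] needs name and url`);
+    if (!HTTP_METHODS.includes(e.method)) errors.push(`apiEndpoints[${i}].method is invalid`);
+  });
+
+  // Each column definition is a [fieldName, label] pair of non-empty strings.
+  (info.columnDefs || []).forEach((def, i) => {
+    const ok = Array.isArray(def) && def.length === 2 && def.every(isStr);
+    if (!ok) errors.push(`columnDefs[${i}] must be [fieldName, label]`);
+  });
+
+  return errors;
+}
+```
+
+An empty array means the object passed every check, so a preview step could show the returned messages next to the fields that need attention.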
+
+### LLM Prompt Template
+
+```
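+You are an expert at analyzing scene code. Analyze the scene source below and extract the key business information.
+
+## Analysis goals
+
+1. **API endpoints**: identify every HTTP request URL
+2. **Static parameters**: identify hardcoded business parameters
+3. **Column definitions**: identify the column configuration of data tables/exports
+4. **Business logic**: understand the data fetching and transformation flow
+
+## Source content
+
+=== Directory structure ===
+{directoryTree}
+
+=== index.html ===
+{indexHtmlContent}
+
+=== Script files ===
+{scriptsContent}
+
+## Output format
+
+Return the extraction result as JSON: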
+{
+  "sceneId": "...",
+  "sceneName": "...",
+  ...
+}
+```
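+
+As a rough illustration of how the source sections of this template might be filled in, the sketch below assembles them from the three placeholders and caps the size of `index.html`. The parameter shape of `buildDeepAnalyzePrompt`, the `truncate` helper, the section labels, and the 60,000-character budget are all assumptions made for this example.
+
+```javascript
+// Assumed budget for index.html; the spec does not give a number.
+const MAX_INDEX_HTML_CHARS = 60000;
+
+// Cut overly long text and leave a marker so the model knows it is partial.
+function truncate(text, limit) {
+  if (text.length <= limit) return text;
+  return text.slice(0, limit) + '\n<!-- truncated -->';
+}
+
+// Hypothetical signature: builds only the "source content" part of the prompt.
+function buildDeepAnalyzePrompt({ directoryTree, indexHtmlContent, scriptsContent }) {
+  return [
+    '=== Directory structure ===',
+    directoryTree || '',
+    '',
+    '=== index.html ===',
+    truncate(indexHtmlContent || '', MAX_INDEX_HTML_CHARS),
+    '',
+    '=== Script files ===',
+    scriptsContent || '',
+  ].join('\n');
+}
+```
+
+Plain truncation is the simplest way to bound token usage; picking out only the script-bearing parts of the page would likely keep more of the relevant logic, at the cost of extra parsing.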
+
+## Data Flow
+
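+### Current flow
+
+```
+User selects a directory
+    ↓
+Node.js reads the directory structure and script files (index.html is not read)
+    ↓
+LLM extracts only scene-id and scene-name
+    ↓
+Rust generates a skeleton script (no business logic)
+```
+
+### Revised flow
+
+```
+User selects a directory
+    ↓
+Node.js reads the directory structure, index.html, and script files
+    ↓
+LLM performs deep extraction of APIs/parameters/column definitions/business logic
+    ↓
+Web UI shows the extraction result for the user to confirm
+    ↓
+After confirmation, the extraction result is passed to Rust via a CLI argument
+    ↓
+Rust renders the complete script from the extraction result
+```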
+
+## Implementation Details
+
+### Task 1: Enhance llm-client.js
+
+**File**: `frontend/scene-generator/llm-client.js`
+
+**Changes**:
+1. Add a `buildDeepAnalyzePrompt()` function
+2. Extend `SYSTEM_PROMPT` with deep-extraction instructions
+3. Add an `extractSceneInfo()` function to handle the complex JSON
+
+**New interface**:
+```javascript
+async function analyzeSceneDeep(sourceDir, dirContents, indexHtmlContent, config) {
+  // Returns the complete SceneInfo object
+}
+```
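+
+LLM replies often wrap JSON in a fenced block or surround it with commentary, which is presumably what `extractSceneInfo()` has to cope with. The following is a minimal sketch under that assumption; the single-argument signature and the default values are illustrative, not taken from the existing code.
+
+```javascript
+// Three backticks, built at runtime so this example can sit inside a fence.
+const FENCE = '`'.repeat(3);
+const FENCED_JSON = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE);
+
+// Hypothetical signature: takes the raw reply text and returns a SceneInfo-like object.
+function extractSceneInfo(replyText) {
+  // Prefer the contents of a fenced block when the model adds one.
+  const fenced = replyText.match(FENCED_JSON);
+  const candidate = fenced ? fenced[1] : replyText;
+
+  // Keep only the outermost {...} span to skip surrounding commentary.
+  const start = candidate.indexOf('{');
+  const end = candidate.lastIndexOf('}');
+  if (start === -1 || end <= start) {
+    throw new Error('No JSON object found in LLM reply');
+  }
+
+  const parsed = JSON.parse(candidate.slice(start, end + 1));
+
+  // Default the collection fields so downstream code can iterate safely.
+  return {
+    ...parsed,
+    apiEndpoints: parsed.apiEndpoints || [],
+    staticParams: parsed.staticParams || {},
+    columnDefs: parsed.columnDefs || [],
+  };
+}
+```
+
+`JSON.parse` still throws on malformed JSON, so a caller would want to catch that and either retry the request or surface the raw reply in the preview.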
+
+### Task 2: Enhance generator-runner.js
+
+**File**: `frontend/scene-generator/generator-runner.js`
+
+**Changes**:
+1. `readDirectory()` additionally reads `index.html`
+2. The return value gains an `indexHtml` field
+
+```javascript
+function readDirectory(sourceDir) {
+  // ... existing logic ...
+  
+  const indexHtmlPath = p.join(sourceDir, "index.html");
+  if (fs.existsSync(indexHtmlPath)) {
+    result.indexHtml = fs.readFileSync(indexHtmlPath, "utf-8");
+  }
+  
+  return result;
+}
+```
+
+### Task 3: Enhance server.js
+
+**File**: `frontend/scene-generator/server.js`
+
+**Changes**:
+1. `/analyze` calls the deep analysis
+2. `/generate` passes the extraction result to the Rust CLI
+
+```javascript
+async function handleAnalyze(req, res) {
+  // ...
+  const indexHtml = dirContents.indexHtml;
+  const result = await analyzeSceneDeep(sourceDir, dirContents, indexHtml, config);
+  res.json(result);
+}
+
+async function handleGenerate(req, res) {
+  const { sceneInfo, outputRoot, lessons } = body;
+  // Pass sceneInfo to the Rust CLI as a JSON argument
+}
+```
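+
+One way to hand `sceneInfo` to the CLI is to serialize it and pass it as a single argv entry, which avoids shell quoting problems. The sketch below assumes exactly that; `buildGeneratorArgs`, `runGenerator`, and the `binaryPath` and `extraArgs` parameters are hypothetical names, and any other flags the CLI already accepts would go in `extraArgs`.
+
+```javascript
+const { spawn } = require('child_process');
+
+// Hypothetical helper: builds the argv for the Rust CLI.
+// Only --scene-info-json comes from this spec; extraArgs carries existing flags.
+function buildGeneratorArgs(sceneInfo, extraArgs = []) {
+  return ['--scene-info-json', JSON.stringify(sceneInfo), ...extraArgs];
+}
+
+// Hypothetical helper: runs the CLI without a shell and collects stderr.
+function runGenerator(binaryPath, sceneInfo, extraArgs = []) {
+  return new Promise((resolve, reject) => {
+    const child = spawn(binaryPath, buildGeneratorArgs(sceneInfo, extraArgs), { shell: false });
+    let stderr = '';
+    child.stderr.on('data', (chunk) => { stderr += chunk; });
+    child.on('error', reject);
+    child.on('close', (code) => {
+      if (code === 0) resolve();
+      else reject(new Error(stderr || `generator exited with code ${code}`));
+    });
+  });
+}
+```
+
+Because the JSON travels as one argument, a very large `sceneInfo` could run into the operating system's command-line length limit; writing it to a temporary file would be a possible alternative in that case.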
+
+### Task 4: Enhance the Rust CLI
+
+**File**: `src/bin/sg_scene_generate.rs`
+
+**New argument**:
+```bash
+--scene-info-json '<JSON>'  # Complete scene info as JSON
+```
+
+**Parsing logic**:
+```rust
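+// Assumes serde (with the derive feature) handles parsing. The JSON keys are
+// camelCase (sceneId, apiEndpoints, ...), so fields are renamed on deserialization.
+// ApiEndpoint and BusinessLogic would need Deserialize too, and BusinessLogic the same rename.
+use serde::Deserialize;
+use std::collections::HashMap;
+
+#[derive(Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct SceneInfoJson {
+    scene_id: String,
+    scene_name: String,
+    scene_kind: String,
+    source_system: Option<String>,
+    expected_domain: Option<String>,
+    target_url: Option<String>,
+    api_endpoints: Option<Vec<ApiEndpoint>>,
+    static_params: Option<HashMap<String, String>>,
+    column_defs: Option<Vec<(String, String)>>,
+    entry_method: Option<String>,
+    business_logic: Option<BusinessLogic>,
+}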
+```
+
+### Task 5: Enhance generator.rs
+
+**File**: `src/generated_scene/generator.rs`
+
+**New structures**:
+```rust
+pub struct ApiEndpoint {
+    pub name: String,
+    pub url: String,
+    pub method: String,
+    pub description: Option<String>,
+}
+
+pub struct BusinessLogic {
+    pub data_fetch: Option<String>,
+    pub data_transform: Option<String>,
+}
+
+pub struct SceneInfo {
+    pub scene_id: String,
+    pub scene_name: String,
+    pub scene_kind: SceneKind,
+    pub source_system: Option<String>,
+    pub expected_domain: Option<String>,
+    pub target_url: Option<String>,
+    pub api_endpoints: Vec<ApiEndpoint>,
+    pub static_params: HashMap<String, String>,
+    pub column_defs: Vec<(String, String)>,
+    pub entry_method: Option<String>,
+    pub business_logic: Option<BusinessLogic>,
+}
+```
+
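+**Template rendering enhancement**:
+```rust
+fn browser_script_with_business_logic(scene_id: &str, info: &SceneInfo) -> String {
+    // Generate the complete script from SceneInfo, including the API endpoint
+    // definitions, static parameters, column definitions, and data-fetching logic
+    todo!("render the script from `info`")
+}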
+```
+
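+### Task 6: Web UI preview
+
+**File**: `frontend/scene-generator/sg_scene_generator.html`
+
+**Changes**:
+1. Show a summary of the extraction result once analysis completes
+2. Let the user edit key fields
+3. Proceed to generation after confirmation
+
+**Displayed fields**:
+- Scene ID / name
+- API endpoint list
+- Column definition preview
+- Static parameter summary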
+
+## Generated Script Template
+
+### Structure
+
+```javascript
+// ===== Auto-generated section =====
+
+// Constants
+const REPORT_NAME = '{scene_id}';
+const API_BASE = '{api_base_url}';
+const EXPECTED_DOMAIN = '{expected_domain}';
+
+// API endpoints
+const API_ENDPOINTS = {
+  {api_name}: '{api_path}',
+  // ...
+};
+
+// Static parameters
+const STATIC_PARAMS = {
+  {key}: '{value}',
+  // ...
+};
+
+// Column definitions
+const COLUMN_DEFS = [
+  ['{field}', '{label}'],
+  // ...
+];
+const COLUMNS = COLUMN_DEFS.map(([k]) => k);
+
+// ===== Standard framework =====
+
+function validateArgs(args) { /* argument validation */ }
+function buildRequest(args) { /* build the request */ }
+function normalizeRows(rawRows) { /* normalize the data */ }
+function buildArtifact(opts) { /* build the Artifact */ }
+
+async function buildBrowserEntrypointResult(args, deps = defaultDeps()) {
+  // 1. Argument validation
+  const validation = validateArgs(args);
+  if (!validation.ok) {
+    return buildArtifact({ status: 'blocked', reasons: validation.reasons });
+  }
+
+  // 2. Page context validation
+  const pageValidation = deps.validatePageContext?.(args);
+  if (!pageValidation?.ok) {
+    return buildArtifact({ status: 'blocked', reasons: ['page_context_mismatch'] });
+  }
+
+  // 3. Data fetching
+  try {
+    const request = buildRequest(args);
+    const response = await deps.queryData(request);
+    const rows = normalizeRows(response.rows || []);
+    
+    return buildArtifact({
+      status: rows.length > 0 ? 'ok' : 'empty',
+      column_defs: COLUMN_DEFS,
+      columns: COLUMNS,
+      rows,
+    });
+  } catch (error) {
+    return buildArtifact({ status: 'error', reasons: [error.message] });
+  }
+}
+
+// ===== Default dependency implementations =====
+
+function defaultDeps() {
+  return {
+    validatePageContext(args) {
+      const host = globalThis.location?.hostname;
+      return host === args.expected_domain 
+        ? { ok: true } 
+        : { ok: false, reason: 'domain_mismatch' };
+    },
+    
+    async queryData(request) {
+      // Call the actual API according to API_ENDPOINTS
+      // This is a template and may need adjusting for the specific scene
+      if (typeof $ !== 'undefined' && typeof $.ajax === 'function') {
+        return new Promise((resolve, reject) => {
+          $.ajax({
+            url: API_ENDPOINTS.primary,
+            type: 'POST',
+            data: request,
+            success: resolve,
+            error: (xhr, status, err) => reject(new Error(`API failed: ${err}`)),
+          });
+        });
+      }
+      throw new Error('No HTTP client available');
+    },
+  };
+}
+
+// ===== Module exports =====
+
+if (typeof module !== 'undefined') {
+  module.exports = {
+    buildBrowserEntrypointResult,
+    validateArgs,
+    buildRequest,
+    normalizeRows,
+    COLUMN_DEFS,
+    COLUMNS,
+  };
+}
+
+if (typeof args !== 'undefined') {
+  return buildBrowserEntrypointResult(args);
+}
+```
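+
+The template leaves `validateArgs` and `normalizeRows` as stubs. As a rough idea of what generated bodies could look like, the sketch below projects each raw row onto the declared columns and reports missing arguments as `reasons`. The sample column names, the assumption that rows arrive as plain objects, and the `requiredKeys` parameter are all made up for illustration.
+
+```javascript
+// Sample column definitions, invented for this example only.
+const COLUMN_DEFS = [['orgName', 'Organization'], ['lossRate', 'Line loss rate']];
+const COLUMNS = COLUMN_DEFS.map(([k]) => k);
+
+// Keep only the declared columns and turn every value into a string.
+// Assumes each raw row is a plain object keyed by field name.
+function normalizeRows(rawRows) {
+  if (!Array.isArray(rawRows)) return [];
+  return rawRows
+    .filter((row) => row && typeof row === 'object')
+    .map((row) => Object.fromEntries(
+      COLUMNS.map((key) => [key, row[key] == null ? '' : String(row[key])])
+    ));
+}
+
+// Report each missing required argument as a "missing_<key>" reason.
+// requiredKeys is a hypothetical parameter; its default is a guess.
+function validateArgs(args, requiredKeys = ['expected_domain']) {
+  const reasons = requiredKeys
+    .filter((key) => args == null || args[key] == null || args[key] === '')
+    .map((key) => `missing_${key}`);
+  return { ok: reasons.length === 0, reasons };
+}
+```
+
+Returning `{ ok, reasons }` matches how `buildBrowserEntrypointResult` consumes the validation result in the template.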
+
+## Testing Strategy
+
+### Unit Tests
+
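+1. **LLM extraction tests**
+   - Test against fixture HTML files
+   - Verify that the extracted fields are complete
+   - Verify that JSON parsing is robust
+
+2. **Template rendering tests**
+   - Verify that the generated script is syntactically valid
+   - Verify that the constants are defined correctly
+   - Verify that the function structure is complete
+
+### Integration Tests
+
+1. **End-to-end tests**
+   - Select a fixture scene directory
+   - Analyze → preview → generate
+   - Verify that the generated skill can be loaded
+
+2. **Real-scene tests**
+   - Use the 营销零度户 (marketing zero-usage customer) scene
+   - Compare against the version Claude converted by hand
+   - Verify functional equivalence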
+
+## Migration Path
+
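+### Phase 1: Infrastructure (Week 1)
+- Task 1-2: LLM extraction enhancement
+- Task 3: Server changes
+
+### Phase 2: Rust templates (Week 2)
+- Task 4: CLI argument extension
+- Task 5: Generator template enhancement
+
+### Phase 3: User experience (Week 3)
+- Task 6: Web UI preview
+- Integration tests
+
+### Phase 4: Validation and tuning (Week 4)
+- Real-scene tests
+- Template tuning
+- Documentation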
+
+## Risks and Mitigations
+
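+| Risk | Impact | Mitigation |
+|------|--------|------------|
+| Inaccurate LLM extraction | Generated script does not run | Provide preview and editing in the Web UI so results can be corrected by hand |
+| Varied scene source formats | Higher extraction failure rate | Cover common patterns and provide a fallback |
+| Excessive token consumption | Higher cost | Cap how much of index.html is read and prioritize key sections |
+| Complex business logic cannot be generated automatically | Incomplete functionality | Generate a skeleton plus TODO comments that prompt manual completion |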
+
+## Success Criteria
+
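+1. **Automation rate**: a runnable script is generated automatically for 80% of scenes
+2. **Accuracy**: extracted APIs/parameters/column definitions are more than 90% accurate
+3. **Manual intervention**: average manual editing time per scene is under 5 minutes
+4. **Parity with Claude**: the generated script is functionally equivalent to the version Claude converted by hand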