feat: add config-owned direct submit runtime

Keep browser-attached workflows on the configured direct-skill path and align the Zhihu export/browser regression contracts with the current ws merge state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 15:45:42 +08:00
parent b454fa3f54 311cc1fee6
commit 883647dffc
29 changed files with 5218 additions and 585 deletions
--- a/docs/superpowers/specs/2026-04-09-config-owned-direct-skill-dispatch-design.md
+++ b/docs/superpowers/specs/2026-04-09-config-owned-direct-skill-dispatch-design.md
@@ -0,0 +1,125 @@
+# Config-Owned Direct Skill Dispatch Design
+
+**Goal:** Preserve the current minimal submit flow where sgClaw accepts natural-language input, directly invokes one configured staged browser skill without calling an LLM, and keeps dispatch ownership in sgClaw configuration rather than external skill metadata.
+
+**Status:** Approved design direction for the next slice. The current minimal direct-submit path already works; this document records the ownership boundary that future dispatch-policy work should follow.
+
+---
+
+## Decision Summary
+
+1. Keep direct-skill selection in sgClaw configuration.
+2. Continue using `skillsDir` plus `directSubmitSkill` as the only control surface for the no-LLM direct path.
+3. Do not add sgClaw-specific dispatch fields to files under `D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging` in this slice.
+4. Keep the currently bound skill as `fault-details-report.collect_fault_details`.
+5. When dispatch expands beyond one fixed skill, add the next policy layer on the sgClaw side first, not in `scene.json` or `SKILL.toml`.
+
+---
+
+## Current Minimal Flow
+
+The intended user experience stays unchanged:
+- the user types natural language into the input box
+- sgClaw receives `BrowserMessage::SubmitTask`
+- sgClaw loads runtime config
+- if `directSubmitSkill` is configured, sgClaw bypasses LLM routing and directly resolves the configured staged skill from `skillsDir`
+- sgClaw executes the target `browser_script` tool through the browser runtime and returns the result
+- if `directSubmitSkill` is absent, sgClaw falls back to the existing orchestration / compat behavior
+
+This keeps the first slice small while preserving a clear seam for future expansion.
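The branch in the flow above can be sketched as a small pure function. This is a minimal illustration, not the `claw-new` implementation (which is Rust): `chooseSubmitRoute` and its return shape are hypothetical, and treating a missing `skillsDir` as an error is an assumption the spec does not state.

```javascript
// Hypothetical helper: decide which submit path applies for a loaded config.
// `directSubmitSkill` and `skillsDir` are the config keys named in this spec;
// the returned object shape is illustrative only.
function chooseSubmitRoute(config) {
  const target = (config.directSubmitSkill || "").trim();
  if (!target) {
    // No direct skill configured: keep the existing orchestration / compat path.
    return { route: "fallback" };
  }
  if (!config.skillsDir) {
    // Assumption: a direct skill without a skills root cannot be resolved.
    return { route: "error", reason: "directSubmitSkill is set but skillsDir is missing" };
  }
  return { route: "direct", skillsDir: config.skillsDir, target };
}
```

Keeping the decision a function of config alone is what lets an operator switch behavior without touching code or skill assets.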
+
+---
+
+## Ownership Boundary
+
+### sgClaw configuration owns dispatch choice
+
+sgClaw configuration is responsible for deciding whether submit-task should bypass the LLM path and which direct skill should run.
+
+For the current slice, that means:
+- `skillsDir` tells sgClaw where to load staged skills from
+- `directSubmitSkill` tells sgClaw which `skill.tool` should be used for the direct path
+
+Example:
+
+```json
+{
+  "skillsDir": "D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging",
+  "directSubmitSkill": "fault-details-report.collect_fault_details"
+}
+```
+
+### skill_staging owns skill identity and execution assets
+
+Files under `D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging` remain responsible for describing the skill package, tool identity, and browser-script implementation.
+
+For the current bound skill:
+- `D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging/scenes/fault-details-report/scene.json`
+- `D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging/skills/fault-details-report/SKILL.toml`
+- `D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging/skills/fault-details-report/scripts/collect_fault_details.js`
+
+These files already provide enough information for sgClaw to locate the package and run the tool. This slice does not add a new dispatch field inside them.
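As a rough sketch of how a `skill.tool` value could be turned into the three paths listed above: `resolveStagedSkill` is a hypothetical name, and the rule that the script file is named after the tool is inferred from the one bound skill, not confirmed as a general convention.

```javascript
// Hypothetical resolver: split "skill.tool" and build the staged paths.
// Assumes the script file is named after the tool, as for the bound skill above.
function resolveStagedSkill(skillsDir, directSubmitSkill) {
  const dot = directSubmitSkill.indexOf(".");
  if (dot <= 0 || dot === directSubmitSkill.length - 1) {
    throw new Error(`expected "skill.tool", got "${directSubmitSkill}"`);
  }
  const skill = directSubmitSkill.slice(0, dot);
  const tool = directSubmitSkill.slice(dot + 1);
  const root = skillsDir.replace(/\/+$/, ""); // drop trailing slashes
  return {
    skill,
    tool,
    scene: `${root}/scenes/${skill}/scene.json`,
    manifest: `${root}/skills/${skill}/SKILL.toml`,
    script: `${root}/skills/${skill}/scripts/${tool}.js`,
  };
}
```

Nothing in this lookup needs a dispatch field inside the staged files, which is the point of the boundary.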
+
+---
+
+## Why This Boundary Is Recommended
+
+### One source of truth for routing
+
+If sgClaw configuration owns the direct-skill decision, the operator can switch the direct skill by changing config only. There is no need to edit code and no need to mutate external skill assets just to change routing.
+
+### Avoid freezing external manifest semantics too early
+
+`skill_staging` is an external skill asset set. Adding sgClaw-specific dispatch metadata now would couple the staged-skill format to one integration strategy before the policy model is stable.
+
+### Preserve a clean migration path
+
+The current minimal path is intentionally narrow: one fixed configured direct skill, no LLM dispatch, no per-skill policy registry yet. Keeping dispatch control in sgClaw makes it easier to add a broader policy layer later without rewriting the staged-skill package format first.
+
+---
+
+## Explicit Non-Goals
+
+This design does not do the following:
+- redesign the submit-task protocol
+- move dispatch control into `scene.json` or `SKILL.toml`
+- require every staged skill to declare `direct_browser` or `llm_agent` right now
+- expand the current direct path into generic natural-language intent classification
+- change the browser-script execution model
+- change the current fallback orchestration / compat execution semantics when `directSubmitSkill` is not configured
+
+---
+
+## Current Skill Contract
+
+The current direct path remains intentionally deterministic.
+
+For `fault-details-report.collect_fault_details`, sgClaw derives only the minimum required arguments:
+- `expected_domain` from the current `page_url`
+- `period` from an explicit `YYYY-MM` token in the user's natural-language input
+
+That means the UX still looks like natural-language submission, but the runtime does not ask an LLM to infer intent or invent missing parameters. If the period is missing, sgClaw should return a clear error instead of guessing.
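The deterministic derivation described above might look like the following. It is a sketch under assumptions: `deriveDirectArgs` is a hypothetical name, the result shape is illustrative, and using the URL hostname as `expected_domain` is a guess at what "domain" means here.

```javascript
// Hypothetical argument derivation for the direct path; no LLM involved.
function deriveDirectArgs(instruction, pageUrl) {
  // Accept only an explicit YYYY-MM token with a valid month (01-12).
  const match = instruction.match(/(?<!\d)(\d{4})-(0[1-9]|1[0-2])(?!\d)/);
  if (!match) {
    // Return a clear error instead of guessing a period.
    return { ok: false, error: "missing period: include an explicit YYYY-MM token" };
  }
  let expectedDomain;
  try {
    // Assumption: the hostname of page_url is what expected_domain should hold.
    expectedDomain = new URL(pageUrl).hostname;
  } catch (_) {
    return { ok: false, error: `cannot derive expected_domain from page_url: ${pageUrl}` };
  }
  return { ok: true, args: { expected_domain: expectedDomain, period: match[0] } };
}
```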
+
+---
+
+## Future Dispatch Policy Direction
+
+When more than one staged skill needs routing control, the next layer should still begin on the sgClaw side.
+
+Recommended direction:
+- keep `directSubmitSkill` as the current bootstrap switch for the minimal fixed-skill path
+- introduce an sgClaw-owned registry or config mapping that can later express `skill.tool -> direct_browser | llm_agent`
+- keep external skill manifests unchanged until the policy surface proves stable in real use
+
+Only after the routing model is stable should we consider whether external skill metadata needs a default dispatch hint.
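One possible shape for that future sgClaw-owned mapping, shown only to make the direction concrete. Everything here is hypothetical: the function names, the plain-object input, and especially the default of `llm_agent` for unlisted skills, which this document does not decide.

```javascript
// Hypothetical sgClaw-owned policy table: "skill.tool" -> dispatch mode.
const DISPATCH_MODES = new Set(["direct_browser", "llm_agent"]);

function buildDispatchRegistry(entries) {
  const registry = new Map();
  for (const [target, mode] of Object.entries(entries)) {
    if (!DISPATCH_MODES.has(mode)) {
      throw new Error(`unknown dispatch mode "${mode}" for ${target}`);
    }
    registry.set(target, mode);
  }
  return registry;
}

// Assumption: skills without an entry take the LLM path.
function dispatchModeFor(registry, target) {
  return registry.get(target) || "llm_agent";
}
```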
+
+---
+
+## Resulting Design Rule
+
+For this project, the direct-skill decision remains config-owned:
+- sgClaw config decides whether submit-task bypasses the LLM path
+- staged skill metadata identifies what the skill is and how its browser tool runs
+- future per-skill dispatch policy should be added in sgClaw first, not in `skill_staging`
+
+This is the approved baseline for the next dispatch-policy slice.
--- a/docs/superpowers/specs/2026-04-10-fault-details-full-skill-alignment-design.md
+++ b/docs/superpowers/specs/2026-04-10-fault-details-full-skill-alignment-design.md
@@ -0,0 +1,495 @@
+# Fault Details Full Skill Alignment Design
+
+**Goal:** Upgrade `fault-details-report.collect_fault_details` from an empty artifact shell into a real staged business skill that matches the original fault-details package's collection, normalization, summary, export, and report-history behavior, while keeping direct-skill routing config-owned in `claw-new`.
+
+**Status:** Approved design direction for the next remediation slice.
+
+---
+
+## Decision Summary
+
+1. Keep direct-skill selection in `claw-new` via `skillsDir` + `directSubmitSkill`; do not move dispatch ownership into `skill_staging` manifests.
+2. Put the fault-details business logic in `D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging`, not in `claw-new`.
+3. Align the staged skill with the original package's real behavior: query raw rows, normalize detail columns, derive summary rows, call localhost export, and write report history.
+4. Keep the current browser-execution seam narrow: use the existing `browser_script` / browser-eval path, not a new browser protocol or new opcodes.
+5. Add a narrow artifact interpreter in `claw-new` so structured fault-details results map cleanly to `TaskComplete.success` and a readable completion summary.
+
+---
+
+## Why This Slice Exists
+
+The current staged skill contract and the current staged skill implementation do not match.
+
+### What the original package actually does
+
+The original package under `D:/desk/智能体资料/大四区报告监测项/故障明细` does all of the following:
+
+- reads the selected date range from the page UI
+- queries the D4 repair-order data source
+- filters and normalizes raw rows into the canonical detail export schema
+- derives grouped summary rows by `gds`
+- calls `http://localhost:13313/SurfaceServices/personalBread/export/faultDetailsExportXLSXS`
+- auto-opens/downloads the generated file
+- writes report history through `http://localhost:13313/ReportServices/Api/setReportLog`
+
+### What the staged skill currently does
+
+The current staged `collect_fault_details.js` only returns an empty `report-artifact` shell with empty `rows` and empty summary `sections`.
+
+It also still uses a Node-style export shape instead of the browser-eval entrypoint shape that the current `browser_script` runtime expects. In practice, this means the staged script is not yet aligned with the real runtime contract even before business behavior is considered.
+
+This slice closes that gap by making the staged skill actually perform the work the original package performs, but through the current sgClaw direct-skill runtime.
+
+---
+
+## Design Rules
+
+### 1. `claw-new` owns routing, not business transforms
+
+`claw-new` stays responsible for:
+
+- loading config
+- deciding whether submit-task takes the direct-skill path
+- resolving the configured staged skill
+- executing the staged browser-script tool
+- turning the returned artifact into `TaskComplete.success` + human-readable summary
+
+`claw-new` must **not** become the place where the original fault classification table, detail-row field mapping, or summary aggregation rules are reimplemented.
+
+### 2. `skill_staging` owns fault-details business behavior
+
+The staged skill package owns:
+
+- query orchestration inside the browser page context
+- raw-row extraction
+- canonical detail-row normalization
+- classification and derived fields
+- summary-sheet derivation
+- localhost export request
+- localhost report-log request
+- structured result payload
+
+### 3. Keep the current browser seam narrow
+
+Do not introduce a new browser bridge, callback protocol, or skill-specific browser opcode for this slice.
+
+The implementation should continue using the current `browser_script` execution seam already wired through `claw-new/src/compat/browser_script_skill_tool.rs` and `claw-new/src/compat/direct_skill_runtime.rs`.
+
+### 4. Match business behavior, not the original shell verbatim
+
+The original package is a local HTML/Vue shell that uses `BrowserAction(...)`, timers, and hidden-browser choreography. That shell does **not** need to be recreated inside `claw-new`.
+
+What must be preserved is the business outcome:
+
+- same canonical detail columns
+- same key field mappings
+- same classification rules
+- same summary metrics
+- same downstream export/history behavior
+- same distinction between empty, partial, blocked, and failed work
+
+---
+
+## Ownership Boundary and Landing Zones
+
+### Staged skill changes
+
+These changes land in `D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging`.
+
+Primary files:
+
+- `skills/fault-details-report/scripts/collect_fault_details.js`
+  - becomes the real browser-eval entrypoint
+  - must directly `return` the final structured artifact from the wrapped browser script
+  - may contain internal helper functions, but should remain self-contained for the current runtime
+- `skills/fault-details-report/SKILL.toml`
+  - keep `browser_script`
+  - tighten the tool description so it matches the real behavior
+  - do not turn `SKILL.toml` into the source of truth for classification rules or routing policy
+- `skills/fault-details-report/SKILL.md`
+  - align the written contract with the implemented runtime behavior
+- `skills/fault-details-report/references/collection-flow.md`
+  - align the staged flow with the implemented query/export/history sequence
+- `skills/fault-details-report/references/data-quality.md`
+  - stay authoritative for canonical columns, required fields, classification tables, `qxxcjl`-based reason heuristics, summary rules, and partial semantics
+- `scenes/fault-details-report/scene.json`
+  - keep the scene contract aligned with the actual output and state semantics
+  - do not move classification or routing policy into scene metadata
+
+### Caller/runtime changes
+
+These changes land in `D:/data/ideaSpace/rust/sgClaw/claw-new`.
+
+Primary files:
+
+- `src/compat/direct_skill_runtime.rs`
+  - keep configured direct-skill execution here
+  - add narrow structured-artifact interpretation after the browser-script returns
+- `src/agent/mod.rs`
+  - keep the current direct-submit routing seam here
+  - do not add fault-specific business logic here
+- `src/compat/browser_script_skill_tool.rs`
+  - keep the browser-script contract strict: browser-eval entrypoint, no Node-only assumptions
+- `tests/agent_runtime_test.rs`
+  - direct-submit path and result-surface regressions
+- `tests/browser_script_skill_tool_test.rs`
+  - browser-script execution-shape regressions
+
+If a new helper is needed in `claw-new`, it should be a narrow artifact-format/parser helper, not a new business-rules module.
+
+---
+
+## Target Runtime Flow
+
+### Step 1: Submit-task stays config-owned
+
+The user still types natural language into the current sgClaw input.
+
+`claw-new`:
+
+- receives `BrowserMessage::SubmitTask`
+- loads `SgClawSettings`
+- sees `directSubmitSkill = "fault-details-report.collect_fault_details"`
+- bypasses LLM routing exactly as it does now
+- resolves the staged skill from `skillsDir`
+
+This preserves the already approved config-owned routing boundary.
+
+### Step 2: Browser-script tool executes as a true browser entrypoint
+
+`collect_fault_details.js` must be shaped for the current runtime:
+
+- the script runs inside the current browser page context through `eval`
+- it must not rely on `module.exports`
+- it must directly `return collectFaultDetails(args)` from the wrapped script body
+
+This is required because the current sgClaw browser-script runtime reads one script file and wraps it in a browser-side IIFE.
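To see why the entrypoint shape matters, the wrapping can be simulated. This is a sketch only: `wrapBrowserScript` and `runWrapped` are hypothetical stand-ins, and the actual wrapper in `claw-new` may differ in detail. The local `module` shadow imitates a page context, where no CommonJS `module` exists.

```javascript
// Hypothetical wrapper; the real one lives in claw-new and may differ.
function wrapBrowserScript(source, args) {
  // The file content becomes a function body, so a top-level `return`
  // is what hands the artifact back to the caller.
  return `(function (args) {\n${source}\n})(${JSON.stringify(args)})`;
}

function runWrapped(source, args) {
  // Shadow Node's `module` so the sketch behaves like a page context.
  const module = undefined;
  try {
    return { value: eval(wrapBrowserScript(source, args)) };
  } catch (err) {
    return { error: String(err) };
  }
}

// Browser-eval shape: helpers plus a direct top-level return.
const browserShaped = `
function collectFaultDetails(args) {
  return { type: "report-artifact", period: args.period, rows: [] };
}
return collectFaultDetails(args);
`;

// Node-style shape: assigns to module.exports and never returns a value.
const nodeShaped = `
module.exports = function collectFaultDetails(args) {
  return { type: "report-artifact", period: args.period, rows: [] };
};
`;

const good = runWrapped(browserShaped, { period: "2026-03" });
const bad = runWrapped(nodeShaped, { period: "2026-03" });
```

Here `good.value` carries the artifact, while `bad` carries an error and no value, which is the mismatch this step removes.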
+
+### Step 3: The skill reads the page-selected time range
+
+The source-of-truth query window should come from the current page state, matching the original package behavior.
+
+Design rule:
+
+- read the selected start and end time from the business page controls or page state
+- include that exact selected range in the returned artifact
+- keep `period` as a bootstrap label from `claw-new`, not as a license to silently guess a different business range
+
+Compatibility rule with the current direct-submit seam:
+
+- the current `claw-new` direct path still requires an explicit `YYYY-MM` token in the user's instruction in order to enter the configured direct-skill flow
+- that requirement remains in place for this slice
+- once inside the skill, the browser page's selected start/end range is the source of truth for collection
+- the returned artifact should include both the user-visible `period` label and the exact selected page range so mismatches are observable instead of hidden
+
+If the page-selected range cannot be read reliably, the skill should return `blocked` instead of inventing a month-wide query window from `period` alone.
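A minimal sketch of the `blocked` rule, assuming a hypothetical `readControls` callback that stands in for whatever page-specific lookup the staged skill ends up using. The reason string `selected_range_unreadable` is also illustrative.

```javascript
// Hypothetical: `readControls` returns { start, end } strings or throws.
function resolveSelectedRange(period, readControls) {
  let range = null;
  try {
    range = readControls();
  } catch (_) {
    range = null;
  }
  const readable = range
    && typeof range.start === "string" && range.start !== ""
    && typeof range.end === "string" && range.end !== "";
  if (!readable) {
    // Do not fall back to a month-wide window derived from `period`.
    return {
      type: "report-artifact",
      report_name: "fault-details-report",
      period,
      status: "blocked",
      partial_reasons: ["selected_range_unreadable"],
    };
  }
  // Keep both the label and the exact range so mismatches stay observable.
  return { period, selected_range: { start: range.start, end: range.end } };
}
```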
+
+### Step 4: The skill collects raw rows and normalizes detail fields
+
+The staged skill must reproduce the original package's detail normalization logic inside the browser-executed script.
+
+That includes preserving the canonical detail schema from the original `excleIni[0].cols`, including the key transforms already present in the original package, such as:
+
+- `slsj = bxsj`
+- `gssgs = "甘肃省电力公司"`
+- `sgs` derived from the current company/city context
+- `gddw = maintOrgName`
+- `gds = maintGroupName`
+- `clzt = "处理完成"`
+- `bdz = bdzMc`
+- `line = xlmc10`
+- `pb = byqmc`
+
+The staged skill must also port the original classification/derivation logic that fills:
+
+- `sxfl1`
+- `sxfl2`
+- `sxfl3`
+- `gzsb`
+- `gzyy`
+
+That includes the original matching table and the `qxxcjl`-based text extraction heuristics that derive the fault reason.
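The direct field mappings listed in this step can be sketched as follows. This covers only the mappings named here; the classification fields are left blank because the matching table and the `qxxcjl` heuristics live in the original package and are not restated in this spec. `normalizeDetailRow` and `context.sgs` are hypothetical, and it is an assumption that pass-through columns share their raw field names.

```javascript
// Canonical detail columns, as listed in the artifact contract of this spec.
const DETAIL_COLUMNS = ["qxdbh", "gssgs", "sgs", "gddw", "gds", "slsj", "yjflMc",
  "ejflMc", "sjflMc", "gzms", "yhbh", "yhmc", "lxr", "gzdd", "lxdh", "bxsj",
  "gdsj", "clzt", "qxxcjl", "bdz", "line", "pb", "sxfl1", "sxfl2", "sxfl3",
  "gzsb", "gzyy", "bz"];

function normalizeDetailRow(raw, context) {
  const mapped = {
    slsj: raw.bxsj,
    gssgs: "甘肃省电力公司",
    sgs: context.sgs, // hypothetical: company/city context resolved by the caller
    gddw: raw.maintOrgName,
    gds: raw.maintGroupName,
    clzt: "处理完成",
    bdz: raw.bdzMc,
    line: raw.xlmc10,
    pb: raw.byqmc,
  };
  const row = {};
  for (const column of DETAIL_COLUMNS) {
    // Prefer the explicit mapping, then a same-named raw field, else blank.
    row[column] = mapped[column] ?? raw[column] ?? "";
  }
  return row;
}
```

Building each row from the column list keeps raw-only fields such as `maintOrgName` out of the canonical table.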
+
+### Step 5: The skill derives summary rows from normalized detail rows
+
+The staged skill must derive the summary sheet by grouping the normalized detail rows by `gds` and computing the same business totals the original package computes.
+
+At minimum that includes:
+
+- `index`
+- `gsName`
+- `fwDept`
+- `className`
+- `allCount`
+- `wxCount`
+- `khcCount`
+- `sbdSbCount`
+- `gyGzCount`
+- `dyGzCount`
+- `tqdzCount`
+- `tqbxCount`
+- `dyxlCount`
+- `bqxCount`
+- `jllCount`
+- `bhxCount`
+- `qftdCount`
+
+The summary derivation must stay in the staged skill so the same package can later be routed by LLM without moving business logic back into `claw-new`.
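The grouping itself can be illustrated with a deliberately incomplete sketch. It groups by `gds` (as the original package is described as doing) and fills only `index` and `allCount`; the remaining counters depend on classification rules this spec does not restate, so they are omitted rather than guessed. `summarizeByGds` and the `gds` output key are hypothetical.

```javascript
// Group normalized rows by `gds` and count them.
function summarizeByGds(detailRows) {
  const totals = new Map();
  for (const row of detailRows) {
    const key = row.gds || "";
    totals.set(key, (totals.get(key) || 0) + 1);
  }
  // Map preserves first-seen order, which keeps the output deterministic.
  return Array.from(totals, ([gds, allCount], i) => ({ index: i + 1, gds, allCount }));
}
```

For rows whose `gds` values are `A`, `B`, `A`, this yields two summary rows: `A` with `allCount` 2 and `B` with `allCount` 1.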
+
+### Step 6: The skill performs downstream export and report logging
+
+After detail rows and summary rows are available, the staged skill should reproduce the original package's downstream behavior:
+
+- build the export payload for `faultDetailsExportXLSXS`
+- call the localhost export endpoint
+- capture the returned export path/URL
+- write report history via `setReportLog`
+
+Important boundary:
+
+- export/report-log are downstream side effects
+- they do not redefine whether collection itself succeeded
+- if collection succeeds but export/logging fails, the result is `partial`, not a full collection failure
+- auto-opening/downloading the exported file is out of scope for this slice; this slice records the export path/result in the artifact but does not add new opener/UI behavior in `claw-new`
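The boundary above can be expressed as a pure step that runs after the network calls have finished. This is a sketch under assumptions: `applyDownstream` is a hypothetical name, the outcome records (`{ success, path?, error? }`) are an assumed shape, and the reason strings are illustrative.

```javascript
// Hypothetical: outcomes are produced by whatever code calls the localhost endpoints.
function applyDownstream(artifact, exportOutcome, logOutcome) {
  const reasons = [...(artifact.partial_reasons || [])];
  if (!exportOutcome.success) reasons.push("export_failed");
  if (!logOutcome.success) reasons.push("report_log_failed");
  return {
    ...artifact,
    // Collection already succeeded, so a downstream failure only downgrades ok to partial.
    status: reasons.length > 0 && artifact.status === "ok" ? "partial" : artifact.status,
    partial_reasons: reasons,
    downstream: {
      export: { attempted: true, ...exportOutcome },
      report_log: { attempted: true, ...logOutcome },
    },
  };
}
```

Because the function never touches `rows` or `sections`, a failed export cannot erase a successful collection.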
+
+### Step 7: The skill returns one structured artifact
+
+The staged skill should return one self-describing JSON artifact containing:
+
+- business identity (`type`, `report_name`)
+- selected period label
+- exact selected start/end range
+- canonical detail columns + normalized rows
+- summary section columns + rows
+- counts
+- business status
+- partial reasons if any
+- downstream export outcome
+- downstream report-log outcome
+
+### Step 8: `claw-new` interprets the artifact, not the business rules
+
+After the browser script returns, `claw-new` should parse the JSON artifact and map it into the final submit-task behavior.
+
+Recommended mapping:
+
+- `status = ok` -> `TaskComplete.success = true`
+- `status = partial` -> `TaskComplete.success = true`, with warnings in summary
+- `status = empty` -> `TaskComplete.success = true`, clearly reported as empty-result
+- `status = blocked` -> `TaskComplete.success = false`
+- `status = error` -> `TaskComplete.success = false`
+
+This keeps business classification in the staged skill while preventing false-positive success in the direct path.
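The mapping above is small enough to sketch in a few lines. The real interpreter would be Rust code in `claw-new`; JavaScript is used here only to stay in the same language as the staged skill. `interpretArtifact`, the summary format, and the choice to treat an unknown status as failure are all assumptions.

```javascript
// Sketch only: maps a returned artifact to { success, summary }.
function interpretArtifact(artifact) {
  const successStatuses = new Set(["ok", "partial", "empty"]);
  const status = artifact && artifact.type === "report-artifact" ? artifact.status : undefined;
  const counts = (artifact && artifact.counts) || {};
  const reasons = (artifact && artifact.partial_reasons) || [];
  const parts = [`status=${status || "unknown"}`];
  if (counts.detail_rows !== undefined) parts.push(`detail_rows=${counts.detail_rows}`);
  if (counts.summary_rows !== undefined) parts.push(`summary_rows=${counts.summary_rows}`);
  if (reasons.length > 0) parts.push(`reasons=${reasons.join(",")}`);
  // Assumption: unknown or missing status is a failure, to avoid false-positive success.
  return { success: successStatuses.has(status), summary: parts.join("; ") };
}
```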
+
+---
+
+## Artifact Contract
+
+The returned payload should stay `type = "report-artifact"`, but it must become rich enough to describe the real run.
+
+Recommended contract:
+
+```json
+{
+  "type": "report-artifact",
+  "report_name": "fault-details-report",
+  "period": "2026-03",
+  "selected_range": {
+    "start": "2026-03-08 16:00:00",
+    "end": "2026-03-09 16:00:00"
+  },
+  "columns": ["qxdbh", "gssgs", "sgs", "gddw", "gds", "slsj", "yjflMc", "ejflMc", "sjflMc", "gzms", "yhbh", "yhmc", "lxr", "gzdd", "lxdh", "bxsj", "gdsj", "clzt", "qxxcjl", "bdz", "line", "pb", "sxfl1", "sxfl2", "sxfl3", "gzsb", "gzyy", "bz"],
+  "rows": [],
+  "sections": [
+    {
+      "name": "summary-sheet",
+      "columns": ["index", "gsName", "fwDept", "className", "allCount", "wxCount", "khcCount", "sbdSbCount", "gyGzCount", "dyGzCount", "tqdzCount", "tqbxCount", "dyxlCount", "bqxCount", "jllCount", "bhxCount", "qftdCount"],
+      "rows": []
+    }
+  ],
+  "counts": {
+    "detail_rows": 0,
+    "summary_rows": 0
+  },
+  "status": "ok",
+  "partial_reasons": [],
+  "downstream": {
+    "export": {
+      "attempted": true,
+      "success": true,
+      "path": "http://localhost:13313/.../fault-details.xlsx"
+    },
+    "report_log": {
+      "attempted": true,
+      "success": true,
+      "report_name": "国网XX故障报修明细表(03月09日)",
+      "path": "http://localhost:13313/.../fault-details.xlsx"
+    }
+  }
+}
+```
+
+### Contract notes
+
+- `rows` is the canonical returned detail table, not the export-service transport payload.
+- If the export service still requires a placeholder row for an empty spreadsheet, that placeholder should be synthesized only for the downstream export call, not as the canonical returned `rows` contract.
+- `counts` should be computed from the canonical returned tables.
+- `selected_range`, `columns`, `sections`, `counts`, `status`, and `partial_reasons` should always be present for `ok`, `partial`, and `empty`.
+- For `blocked` and `error`, the artifact should still include `type`, `report_name`, `period`, `status`, and `partial_reasons`; `selected_range`, `columns`, `sections`, and `counts` should be included whenever they were already known before the failure point.
+- `downstream` should be omitted only when export/report-log were not attempted yet; otherwise include it with `attempted` / `success` flags and any available path or failure detail.
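The presence rules in these notes lend themselves to a simple shape check, for example in skill-side tests. This is a sketch: `missingArtifactKeys` is a hypothetical helper, and it only checks the keys that are unconditionally required, not the "whenever already known" cases.

```javascript
const ALWAYS_REQUIRED = ["type", "report_name", "period", "status", "partial_reasons"];
const REQUIRED_FOR_RESULTS = ["selected_range", "columns", "sections", "counts"];

// Returns the list of missing keys; an empty list means the shape is acceptable.
function missingArtifactKeys(artifact) {
  const hasResults = ["ok", "partial", "empty"].includes(artifact.status);
  const required = hasResults
    ? [...ALWAYS_REQUIRED, ...REQUIRED_FOR_RESULTS]
    : ALWAYS_REQUIRED;
  return required.filter((key) => !(key in artifact));
}
```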
+
+---
+
+## Error Handling and Status Semantics
+
+### `ok`
+
+Use `ok` when all of the following are true:
+
+- raw collection succeeded
+- required detail-field normalization succeeded
+- summary derivation succeeded
+- export succeeded
+- report-log write succeeded
+
+### `partial`
+
+Use `partial` when detail collection succeeded but at least one later stage degraded, including:
+
+- one or more required fields could not be normalized, but the row set remains exportable and summary derivation can proceed with explicit gaps recorded
+- summary derivation was incomplete, but the detail table is still available
+- export failed after rows were available
+- report-log write failed after rows/export were available
+
+Escalation rule:
+
+- if the raw query succeeds but required fields are missing so broadly that the canonical detail table cannot be produced at all, use `error`, not `partial`
+- if summary derivation cannot even start because the normalized detail rows are structurally unusable, use `error`, not `partial`
+
+`partial_reasons` must name the degraded stage instead of hiding it.
+
+### `empty`
+
+Use `empty` when:
+
+- the query succeeds for the selected range
+- zero real detail rows match
+
+This is not a failure.
+
+If the business flow still wants an empty export file or placeholder export payload, that happens downstream without changing the semantic meaning of the result.
+
+### `blocked`
+
+Use `blocked` when the page/session preconditions are not met, for example:
+
+- expected page/session is not available
+- required page controls cannot be read
+- login/session state is missing or expired
+- required browser-visible APIs are unavailable in the current page context
+
+### `error`
+
+Use `error` when the run starts but fails due to operational or parsing problems, for example:
+
+- request failure
+- page script failure
+- raw response parse failure
+- malformed export response
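Read together, the five status definitions above imply an order of precedence. The sketch below is one reading of it, not a confirmed rule: `decideStatus` and every flag name on `run` are hypothetical, and export failures follow the `partial` rule here, with no separate handling for a malformed export response.

```javascript
// Hypothetical stage flags collected during a run; names are illustrative.
function decideStatus(run) {
  // Preconditions first: nothing else is meaningful without the page/session.
  if (!run.preconditionsMet) return "blocked";
  // Query or parsing problems, or an unusable detail table, escalate to error.
  if (!run.querySucceeded || !run.detailTableProduced) return "error";
  // A successful query with zero matching rows is not a failure.
  if (run.detailRowCount === 0) return "empty";
  const degraded = run.fieldGaps || !run.summaryComplete
    || !run.exportSucceeded || !run.reportLogSucceeded;
  return degraded ? "partial" : "ok";
}
```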
+
+### `claw-new` completion mapping
+
+`claw-new` should convert structured status into final submit completion behavior:
+
+- `ok` / `partial` / `empty`: return a success completion with a concise human summary
+- `blocked` / `error`: return a failed completion with a concise human summary
+
+This avoids the current risk where a structured error-like payload could still be surfaced as a nominal success string.
+
+---
+
+## Testing and Acceptance Strategy
+
+### Skill-side deterministic coverage
+
+Add deterministic coverage around the staged skill's business logic in `skill_staging` for:
+
+- canonical detail field mapping
+- classification table parity
+- `gzyy` extraction heuristics
+- summary aggregation parity
+- empty-result handling
+- partial-result generation when downstream export/logging fails
+- browser-script entrypoint shape (`return ...`, not `module.exports`)
+
+The classification/summary tests should use fixed raw-row fixtures so the business rules are validated without a live browser session.
+
+### `claw-new` runtime regressions
+
+Add Rust coverage in `claw-new` for:
+
+- direct-submit success with a populated `report-artifact`
+- `partial` artifact mapping to `TaskComplete.success = true`
+- `empty` artifact mapping to `TaskComplete.success = true`
+- `blocked` / `error` artifact mapping to `TaskComplete.success = false`
+- browser-script helper behavior for a real browser-eval return payload
+
+### Manual acceptance
+
+The live manual acceptance bar for this slice should be:
+
+1. Configure `skillsDir` to the staged skill root and `directSubmitSkill` to `fault-details-report.collect_fault_details`.
+2. Attach sgClaw to the real target browser page/session.
+3. Submit a natural-language fault-details request without LLM routing.
+4. Verify the staged skill:
+   - reads the selected page range
+   - queries real fault rows
+   - produces populated detail rows
+   - produces populated summary rows
+   - exports the workbook through localhost
+   - writes report history
+5. Verify the final sgClaw completion message reports the correct status, counts, and downstream file/log outcome.
+
+### Acceptance matrix
+
+At minimum, acceptance should cover:
+
+- normal populated result
+- empty result with no matching rows
+- partial result where export or report-log fails after collection
+- blocked result where page/session preconditions are missing
+- error result where parsing/query execution fails
+
+---
+
+## Explicit Non-Goals
+
+This slice does **not**:
+
+- move routing ownership out of `claw-new`
+- require LLM routing to be available first
+- add per-skill dispatch metadata to external manifests for routing policy
+- introduce a new browser protocol or browser opcode
+- recreate the original Vue shell inside `claw-new`
+- move fault classification logic into Rust
+- redesign the submit-task protocol beyond better interpretation of the returned artifact
+
+---
+
+## Resulting Design Rule
+
+For the fault-details path:
+
+- `claw-new` decides whether to invoke the fixed staged skill
+- the staged skill performs the real fault business workflow
+- the staged skill returns a structured artifact that describes collection + downstream outcomes
+- `claw-new` interprets that artifact for submit-task success/failure and summary output
+
+That keeps routing config-owned, keeps business logic with the staged skill, and makes `fault-details-report.collect_fault_details` ready for both the current no-LLM path and a later LLM-routed path.
+
+---
+
+## Document Landing Zones
+
+- Approved spec: `docs/superpowers/specs/2026-04-10-fault-details-full-skill-alignment-design.md`
+- Follow-up implementation plan: `docs/superpowers/plans/2026-04-10-fault-details-full-skill-alignment-plan.md`