feat: add generated scene skill platform hardening

Author: 木炎
Date: 2026-04-21 23:19:06 +08:00
Parent: 118fc77935
Commit: 956f0c2b68
439 changed files with 61974 additions and 3645 deletions

# Request URL Resolution Design
**Goal:** Replace the temporary hardcoded line-loss request URL logic in `src/service/server.rs` with a single bootstrap-target resolution path that prefers current page context, then deterministic submit results, then skill-owned metadata, and only then falls back to `about:blank`.
**Status:** Approved design direction for the next slice.
---
## Problem
The current callback-host bootstrap path still derives the first helper-page request URL in `src/service/server.rs`:
- `initial_request_url_for_submit_task(...)` prefers `request.page_url`
- then `derive_request_url_from_instruction(...)`
- then falls back to `about:blank`
This is currently patched with a temporary line-loss branch:
- if instruction contains `线损` or `lineloss`
- return `http://20.76.57.61:18080`
That temporary branch is the wrong ownership boundary:
1. service code is guessing scene intent from raw instruction text
2. deterministic submit already has a structured execution plan with `target_url`
3. future direct-submit skills may also need a bootstrap URL, but should not require new Rust hardcoded branches every time
The result is duplicated routing knowledge and brittle request URL derivation.
---
## Decision Summary
1. Introduce one sgClaw-owned bootstrap target resolver for service submit bootstrap.
2. Resolution order is:
- explicit `request.page_url`
- deterministic submit execution plan
- skill metadata fallback
- `about:blank`
3. Deterministic submit is the primary source of truth for deterministic scenes such as line loss.
4. Skill metadata provides the compatibility fallback for direct browser-script skills that do not have a deterministic plan.
5. Remove the current line-loss text-match hardcode from `src/service/server.rs` once the resolver is in place.
---
## Recommended Architecture
### 1. Add a dedicated bootstrap-target result type
Introduce a small sgClaw-side result type dedicated to callback-host bootstrap only.
Recommended shape:
```rust
pub struct SubmitBootstrapTarget {
    pub request_url: String,
    pub expected_domain: Option<String>,
    pub source: BootstrapTargetSource,
}

pub enum BootstrapTargetSource {
    PageContext,
    DeterministicPlan,
    SkillConfig,
    Fallback,
}
```
This type should remain intentionally small.
It is **not** a generic execution-plan object. Its only job is to answer:
- what URL should the helper page bootstrap against on first submit?
- where did that value come from?
Keeping this object narrow avoids coupling callback-host bootstrap to unrelated execution details.
---
### 2. Move URL derivation into one resolver
Replace the current `initial_request_url_for_submit_task(...)` branching with a single resolver, conceptually:
1. If `SubmitTaskRequest.page_url` exists and is non-empty, use it.
2. Else attempt deterministic parsing through `decide_deterministic_submit(...)`.
- If it returns `Execute(plan)`, use `plan.target_url`.
3. Else inspect configured direct-submit skill metadata.
- If metadata exposes a bootstrap URL, use it.
4. Else return `about:blank`.
This keeps service bootstrap logic declarative and removes scene-specific guessing from `server.rs`.
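A minimal sketch of this precedence chain, reusing the result type from the previous section (`expected_domain` omitted for brevity). The inputs arrive pre-extracted because the real `decide_deterministic_submit(...)` and skill-metadata signatures are not shown here; treat the function shape as an assumption, not the actual sgClaw API:

```rust
// Sketch only: inputs are passed in pre-extracted because the real
// deterministic-submit and skill-metadata APIs are not reproduced here.
pub struct SubmitBootstrapTarget {
    pub request_url: String,
    pub source: BootstrapTargetSource,
}

#[derive(Debug, PartialEq)]
pub enum BootstrapTargetSource {
    PageContext,
    DeterministicPlan,
    SkillConfig,
    Fallback,
}

pub fn resolve_bootstrap_target(
    page_url: Option<&str>,
    deterministic_target: Option<&str>, // plan.target_url when Execute(plan)
    skill_bootstrap_url: Option<&str>,  // skill-owned metadata hint, if any
) -> SubmitBootstrapTarget {
    // 1. explicit page context wins
    if let Some(url) = page_url.filter(|u| !u.is_empty()) {
        return SubmitBootstrapTarget {
            request_url: url.to_string(),
            source: BootstrapTargetSource::PageContext,
        };
    }
    // 2. deterministic execution plan
    if let Some(url) = deterministic_target {
        return SubmitBootstrapTarget {
            request_url: url.to_string(),
            source: BootstrapTargetSource::DeterministicPlan,
        };
    }
    // 3. skill metadata fallback
    if let Some(url) = skill_bootstrap_url {
        return SubmitBootstrapTarget {
            request_url: url.to_string(),
            source: BootstrapTargetSource::SkillConfig,
        };
    }
    // 4. terminal fallback
    SubmitBootstrapTarget {
        request_url: "about:blank".to_string(),
        source: BootstrapTargetSource::Fallback,
    }
}

fn main() {
    let t = resolve_bootstrap_target(None, Some("http://plan.test"), None);
    assert_eq!(t.source, BootstrapTargetSource::DeterministicPlan);
}
```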
---
### 3. Deterministic submit becomes the primary truth for line loss
`src/compat/deterministic_submit.rs` already contains structured line-loss routing data:
- `DeterministicExecutionPlan.expected_domain`
- `DeterministicExecutionPlan.target_url`
For line-loss requests, service bootstrap should use `plan.target_url` rather than reconstructing or hardcoding a URL in `server.rs`.
That means the current temporary branch:
```rust
if instruction.contains("线损") || instruction.contains("lineloss") {
    return Some("http://20.76.57.61:18080".to_string());
}
```
should disappear entirely after the resolver is introduced.
This is the cleanest fix because the deterministic parser already owns the scene contract.
---
### 4. Skill metadata is the fallback, not the primary owner
For non-deterministic direct browser-script skills, service may still need a bootstrap URL even when there is no page context.
The fallback should come from skill-owned metadata with minimal fields:
- `bootstrap_url`
- `expected_domain`
Recommended semantics:
- `bootstrap_url`: the page URL service should use when opening the helper/bootstrap context
- `expected_domain`: the hostname the direct runtime can use when page context is absent
This metadata should only be consulted **after** page context and deterministic parsing fail.
That preserves the user-selected policy:
- deterministic plan first
- skill metadata second
It also avoids forcing deterministic scenes to duplicate already-structured routing data in skill config.
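A minimal sketch of the metadata carrier, assuming a simple validity check so malformed values degrade to the fallback tier rather than erroring. The type name, method, and validity rule are hypothetical; only the two field names come from the design:

```rust
// Sketch: hypothetical metadata carrier; field names follow the design,
// everything else (type name, method, validity rule) is an assumption.
#[derive(Debug, Default, Clone)]
pub struct DirectSkillBootstrapMeta {
    pub bootstrap_url: Option<String>,
    pub expected_domain: Option<String>,
}

impl DirectSkillBootstrapMeta {
    /// Returns the bootstrap URL only when it is present, non-empty, and
    /// carries an http(s) scheme; malformed values are ignored so the
    /// resolver can fall through to `about:blank` instead of hard-failing.
    pub fn usable_bootstrap_url(&self) -> Option<&str> {
        self.bootstrap_url
            .as_deref()
            .filter(|u| u.starts_with("http://") || u.starts_with("https://"))
    }
}

fn main() {
    let bad = DirectSkillBootstrapMeta {
        bootstrap_url: Some("not a url".into()),
        expected_domain: None,
    };
    assert!(bad.usable_bootstrap_url().is_none());
}
```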
---
## Ownership Boundary
### sgClaw owns bootstrap resolution policy
sgClaw should own the policy and precedence order for request URL resolution.
That includes:
- checking current page context first
- deciding when deterministic parsing is authoritative
- deciding when skill metadata is an allowed fallback
- falling back to `about:blank` when nothing else resolves
This policy belongs in sgClaw because it is part of submit-path orchestration, not part of an individual skill script.
### deterministic submit owns deterministic scene targets
If a scene already resolves to `DeterministicExecutionPlan`, then that plan owns:
- the authoritative `target_url`
- the authoritative `expected_domain`
Service should consume that plan rather than re-deriving equivalent information from raw text.
### skill metadata owns direct-skill bootstrap hints
When there is no deterministic plan, the skill package may own the minimal hints needed for bootstrap compatibility:
- `bootstrap_url`
- `expected_domain`
The skill should not own resolution precedence. It only provides data for the fallback tier.
---
## File-Level Design
### `src/service/server.rs`
Change responsibilities here from “derive request URL by ad hoc branch logic” to “ask the resolver for a bootstrap target”.
Expected changes:
- replace `initial_request_url_for_submit_task(...)` logic with a call into a resolver
- delete `derive_request_url_from_instruction(...)` or reduce it to thin legacy glue during migration
- remove the line-loss text-match hardcode entirely
- keep callback-host startup logic unchanged apart from where `bootstrap_url` comes from
### `src/compat/deterministic_submit.rs`
No routing policy should move into service from this module.
Expected role in the new design:
- continue producing `DeterministicExecutionPlan`
- expose enough information for service bootstrap resolution to reuse `plan.target_url`
- remain the source of truth for deterministic line-loss target selection
### `src/compat/direct_skill_runtime.rs`
This module currently resolves skill tool execution and derives `expected_domain` from task context.
For this slice, it does **not** need a full behavior rewrite.
Expected role:
- add or expose a helper for reading direct skill metadata if service needs it
- keep runtime execution behavior stable unless required by the new metadata seam
A later slice may allow runtime execution to use skill-owned `expected_domain` fallback too, but that is not required to land this service TODO.
### `src/config/settings.rs`
If sgClaw configuration needs to point to direct-skill metadata or enable fallback behavior explicitly, add the minimum structure here.
However, this slice should avoid creating a second parallel source of target URLs inside sgClaw config if the same information can be read from skill metadata.
The key rule is:
- do not replace one hardcode with a different hardcoded config map inside `settings.rs`
### Skill metadata loading seam
The design assumes a small read path that can answer:
- for the configured direct-submit skill, is there a `bootstrap_url`?
- is there an `expected_domain`?
The exact storage location can follow the existing staged-skill packaging model, but the new metadata should remain minimal and execution-adjacent rather than introducing a new wide dispatch schema.
---
## Data Flow
### Current desired flow
1. service receives `ClientMessage::SubmitTask`
2. service converts it to `SubmitTaskRequest`
3. service resolves `SubmitBootstrapTarget`
4. service passes `SubmitBootstrapTarget.request_url` into `LiveBrowserCallbackHost::start_with_browser_ws_url(...)`
5. callback-host bootstraps helper page using that URL
6. remaining task execution continues unchanged
### Resolution behavior examples
#### Case A: page context exists
- request includes `page_url=https://www.zhihu.com`
- resolver returns `PageContext`
- service uses that URL directly
#### Case B: line-loss deterministic request, no page context
- request has no `page_url`
- deterministic parser returns `Execute(plan)`
- resolver returns `DeterministicPlan` with `request_url=plan.target_url`
- service uses line-loss target URL from the plan
#### Case C: direct-submit skill with configured bootstrap URL, no page context, not deterministic
- request has no `page_url`
- deterministic parser returns `NotDeterministic`
- configured direct skill metadata exposes `bootstrap_url`
- resolver returns `SkillConfig`
#### Case D: nothing resolves
- no page context
- no deterministic plan
- no skill metadata bootstrap URL
- resolver returns `Fallback` with `about:blank`
---
## Testing Strategy
### Resolver-focused tests
Add focused tests covering precedence:
1. `page_url` wins over everything else
2. deterministic line-loss `Execute(plan)` wins when `page_url` is absent
3. skill metadata fallback is used only when no deterministic plan exists
4. `about:blank` remains the terminal fallback
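The four precedence cases can be written as a self-contained table-style test; `resolve` below is a local stand-in for the real resolver, not the actual sgClaw API:

```rust
// Self-contained sketch of the precedence tests; `resolve` is a local
// stand-in for the real resolver, not the sgClaw API.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Source { PageContext, DeterministicPlan, SkillConfig, Fallback }

fn resolve(page: Option<&str>, plan: Option<&str>, skill: Option<&str>) -> (String, Source) {
    match (page, plan, skill) {
        (Some(u), _, _) if !u.is_empty() => (u.into(), Source::PageContext),
        (_, Some(u), _) => (u.into(), Source::DeterministicPlan),
        (_, _, Some(u)) => (u.into(), Source::SkillConfig),
        _ => ("about:blank".into(), Source::Fallback),
    }
}

fn main() {
    // 1. page_url wins over everything else
    assert_eq!(
        resolve(Some("https://www.zhihu.com"), Some("http://plan"), Some("http://skill")).1,
        Source::PageContext
    );
    // 2. deterministic plan wins when page_url is absent
    assert_eq!(resolve(None, Some("http://plan"), Some("http://skill")).1, Source::DeterministicPlan);
    // 3. skill metadata is used only when no deterministic plan exists
    assert_eq!(resolve(None, None, Some("http://skill")).1, Source::SkillConfig);
    // 4. about:blank remains the terminal fallback
    assert_eq!(resolve(None, None, None).0, "about:blank");
}
```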
### Regression coverage for the removed TODO
Add a regression proving that the service no longer depends on:
- `instruction.contains("线损")`
- `instruction.contains("lineloss")`
The line-loss bootstrap URL should now come from the deterministic plan only.
### Direct skill fallback tests
Add tests for:
- configured skill metadata with valid `bootstrap_url`
- missing `bootstrap_url`
- malformed `bootstrap_url`
- mismatch between metadata and current page context precedence
Malformed `bootstrap_url` metadata should be treated as unusable fallback data rather than a hard error for service bootstrap resolution:
- if page context exists, page context still wins
- if deterministic plan exists, deterministic plan still wins
- if malformed metadata is the only candidate, resolver should ignore it and fall through to `about:blank`
### Existing callback-host tests remain stable
Do not redesign callback-host behavior in this slice.
The callback-host tests should only need enough updates to reflect the new bootstrap URL source, not a new helper lifecycle contract.
---
## Migration Plan Shape
Recommended implementation order:
1. Introduce the bootstrap-target resolver and narrow result type.
2. Wire deterministic line-loss resolution into it using `DeterministicExecutionPlan.target_url`.
3. Remove the temporary line-loss hardcode from `server.rs`.
4. Add skill metadata fallback for configured direct-submit skills.
5. Expand tests to lock precedence and fallback behavior.
This order lands the TODO removal early without forcing the full fallback design to be implemented blindly first.
---
## Explicit Non-Goals
This slice does **not**:
- redesign callback-host lifecycle
- redesign deterministic scene parsing
- redesign direct-submit routing ownership
- introduce a broad scene registry for request URL derivation
- change browser command protocol
- rewrite direct skill execution behavior beyond what is needed for metadata lookup
- replace all current uses of page context with skill metadata
---
## Design Rule
For service bootstrap request URL resolution:
- current page context stays first
- deterministic execution plans are the authoritative source for deterministic scenes
- skill metadata provides a narrow fallback for non-deterministic direct skills
- `about:blank` remains the final fallback
- `src/service/server.rs` must not contain scene-specific text-match hardcodes such as the current line-loss TODO

# Generated Scene Skill Platform Design
**Goal:** Evolve `sgClaw` from one-off business-scene integrations into a platform that can generate, register, and invoke staged scene skills through a generic runtime path, while keeping v1 implementation strictly limited to report/collection-oriented `browser_script` scenes.
**Status:** Approved brainstorming direction for formal specification.
---
## Decision Summary
1. `sgClaw` should become a scene-skill platform, not a growing set of per-scene Rust branches.
2. V1 should support only report/collection-oriented `browser_script` scenes generated from existing scenario directories.
3. The generated output must include both the staged skill package and a platform registration manifest so that new scenes can be discovered and invoked with minimal or zero per-scene Rust changes.
4. In the intranet near term, deterministic mode remains the explicit `。。。` suffix path; no model is required for v1 invocation.
5. The design must preserve the existing main architecture, stay close to the current `browser_script` and artifact pipeline, and avoid platform changes that drift into a general workflow engine.
6. The implementation should happen on a new branch copied from `ws`, not directly inside the current `ws` branch.
7. The generator and runtime must be separated by explicit contracts so the generator can later be extracted into a standalone project.
8. The platform design must turn the full `tq-lineloss-report` lessons learned into durable documentation and generator input rules so future generated skills do not repeat the same mistakes.
---
## Hard Constraints
### 1. Extensibility is mandatory
The platform must support future extension without forcing redesign of the core contracts. The design must leave clean seams for:
- additional scene types
- additional deterministic matchers
- additional parameter resolver types
- additional tool invokers beyond `browser_script`
- future LLM semantic routing on top of the same registered scene contracts
- future extraction of the generator into a separate project
### 2. Stay on the main line
The core objective is:
- generate staged scene skills from existing scenario directories
- register them automatically
- invoke them through a generic deterministic runtime path
The design must not drift into:
- a full low-code workflow engine
- a general browser RPA authoring platform
- a full login/session orchestration platform in v1
- a broad runtime rewrite unrelated to generated scene skill support
### 3. Preserve the current architecture theme
The design should reuse and generalize the parts of `sgClaw` that already look platform-like:
- skills discovery/loading
- `browser_script` execution seams
- artifact interpretation
- export/postprocess seams
- bootstrap target resolution seams
It must avoid large theme-breaking rewrites of the runtime unless a generic platform seam truly requires them.
### 4. Execution branch strategy
This work is large enough that implementation should not land directly on the active `ws` branch. The future implementation plan must explicitly require:
- start from the current `ws` branch state
- create a new branch copied from `ws`
- perform platform conversion there
- preserve `ws` as the stable reference baseline during the migration
### 5. Generator extraction must remain possible
The generator should not be tightly coupled to `claw-new` internals. The boundary between runtime and generator must be a stable package/manifest contract so the generator can later move into a separate project without redesigning registered scene skills.
### 6. `tq-lineloss-report` lessons learned must become first-class inputs
The design must require a durable lessons-learned document derived from the full `tq-lineloss-report` path, including deterministic routing, canonical parameterization, bootstrap targets, pipe/ws differences, timeout chains, artifact contracts, and Rust-side export constraints.
This document is not an appendix. It is a required generator-design input and future template hardening source.
The document must be split into two layers so it remains enforceable instead of becoming loose prose:
- a structured machine-consumable lessons artifact that generator templates can read or reference deterministically
- a human-oriented narrative/analysis document explaining the why, trade-offs, and debugging history behind those lessons
### 7. Use the superpowers process end-to-end
This design must be carried through the superpowers flow:
- brainstorming
- formal spec
- review loop
- user review
- implementation planning
### 8. Think through the details before implementation
The spec must make the critical details explicit now so execution does not discover foundational contract problems halfway through.
---
## Why This Platform Exists
The current line-loss integration proves that `sgClaw` can support a staged business scene, but it also exposes the current architecture problem:
- the staged skill package exists and is useful
- the `browser_script` execution seam exists and is useful
- the runtime has some generic pieces already
- but deterministic routing, parameter normalization, bootstrap target selection, and scene-specific invocation are still too tied to one-off Rust code
Examples visible in the current code:
- `src/compat/deterministic_submit.rs` hardcodes the line-loss suffix route, target URL, host, scene matcher, org resolver, and period resolver
- `src/service/server.rs:453` already has a more general bootstrap-target seam, but it still delegates deterministic planning to scene-specific logic
- `src/compat/direct_skill_runtime.rs:148` already knows how to resolve and execute a `browser_script` tool from the skills directory, which is a strong existing platform primitive
- `src/runtime/engine.rs:232` already has multi-directory runtime skill loading and browser-surface-aware filtering, which is another platform primitive
The design goal is to promote the reusable parts into a stable platform and move scene-specific behavior into generated packages plus scene manifests.
---
## V1 Scope
### In scope
V1 is strictly limited to report/collection-oriented `browser_script` scenes generated from existing scenario directories.
That means:
- input source is an existing scenario directory containing page assets and business JS logic
- generated output is a staged skill package plus a platform registration manifest
- runtime invocation uses deterministic `。。。` routing only
- execution reuses the existing `browser_script` invocation chain
- output is a structured report artifact plus optional generic report postprocessing such as local XLSX export/open
### Out of scope
V1 does **not** include:
- generic action/authoring scenes such as navigation, form filling, publishing, or editor automation
- arbitrary multi-step workflow orchestration
- session/login orchestration as a generic platform capability
- non-`browser_script` tool generation
- full LLM semantic scene routing implementation
- a universal low-code engine
### Spec-level future seams
The spec **must** define extension interfaces for future use, but those extensions are not part of v1 implementation:
- matcher extension seam for future LLM semantic selection
- resolver extension seam for more complex domain parsing
- invoker extension seam for new tool kinds
- artifact interpreter extension seam for non-report results
- postprocessor extension seam beyond report export/open
- generator packaging seam for future project extraction
---
## Platform Architecture
The recommended platform has five units.
### 1. Scene Source Analyzer
Input:
- an existing scenario directory
- typical source artifacts such as `index.html`, `js/*`, business requests, export calls, state dictionaries, and target pages
Responsibility:
- inspect source structure and collect candidate scene metadata
- identify the likely business page URL/domain
- identify likely collection mode (report/collection in v1)
- extract request-shape hints, output table hints, export/report-log hints, and page dependencies
- record uncertainty instead of guessing when source evidence is incomplete
This unit is analysis-only. It does not perform runtime registration or invocation.
### 2. Skill Generator
Input:
- analyzed scene source description
- generator templates
- lessons-learned rules derived from existing scenes such as `tq-lineloss-report`
Output:
- staged skill package
- platform registration manifest
- generated references and contract docs
Generated package contents for v1:
- `SKILL.toml`
- `SKILL.md`
- `references/collection-flow.md`
- `references/data-quality.md`
- `scripts/*.js`
- `scripts/*.test.js`
- optional scene snapshot assets
- `scene.toml`
The generator is responsible for producing complete registration-ready output, not just scaffolding files.
### 3. Scene Registry Loader
Responsibility:
- scan staged skill directories
- locate `scene.toml`
- validate scene registration contracts
- register scenes into a unified runtime registry
This replaces the long-term need for per-scene Rust wiring.
The existing runtime already has useful loading primitives in `src/runtime/engine.rs:361` and skill-dir normalization in `src/compat/config_adapter.rs:90`. V1 should build on those instead of replacing them.
### 4. Generic Deterministic Dispatcher
Responsibility:
- activate only when the raw instruction ends with `。。。`
- iterate registered scenes, not hardcoded scene branches
- evaluate deterministic match rules declared in `scene.toml`
- resolve required canonical parameters using platform resolver types
- produce either:
- mismatch / unsupported-scene prompt
- missing/ambiguous parameter prompt
- executable scene invocation plan
#### Multi-match and precedence rules
Extensibility means multiple registered scenes may match the same deterministic request. The platform must define this explicitly instead of allowing hidden first-match behavior.
Design rules:
- deterministic dispatch must score candidate scenes through declared match signals rather than raw file-load order
- higher-confidence signals may include page URL/title context, explicit include/exclude keyword fit, and resolver success for required parameters
- plain keyword overlap alone is not sufficient justification for silently choosing one scene when another remains plausible
- if two or more scenes remain materially plausible after deterministic scoring and required-parameter evaluation, the dispatcher must fail closed with an explicit ambiguity prompt rather than guessing
- the future implementation plan must lock the scoring and tie-break order in tests
- bootstrap/page-context signals are allowed to participate in disambiguation, but they must be declared and explainable
This keeps the system extensible without turning new scenes into routing contradictions.
This should replace scene-specific logic currently concentrated in `src/compat/deterministic_submit.rs`.
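A fail-closed scoring sketch of these rules; the signal set, weights, and tie policy shown here are illustrative placeholders for what the spec must lock down in tests:

```rust
// Sketch: deterministic dispatch scoring with a fail-closed ambiguity rule.
// Signal names, weights, and tie handling are illustrative assumptions.
#[derive(Debug, PartialEq)]
enum Dispatch<'a> {
    Execute(&'a str),        // single winning scene id
    Ambiguous(Vec<&'a str>), // fail closed: prompt instead of guessing
    NoMatch,
}

struct Candidate<'a> {
    scene_id: &'a str,
    keyword_fit: u32,      // declared include/exclude keyword fit
    page_context_fit: u32, // declared page URL/title signal fit
    params_resolved: bool, // all required resolvers succeeded
}

fn dispatch<'a>(candidates: &'a [Candidate<'a>]) -> Dispatch<'a> {
    // score only scenes whose required parameters resolved
    let scored: Vec<(&str, u32)> = candidates
        .iter()
        .filter(|c| c.params_resolved)
        .map(|c| (c.scene_id, c.keyword_fit + 2 * c.page_context_fit))
        .collect();
    let best = match scored.iter().map(|(_, s)| *s).max() {
        Some(s) if s > 0 => s,
        _ => return Dispatch::NoMatch,
    };
    let top: Vec<&str> = scored
        .iter()
        .filter(|(_, s)| *s == best)
        .map(|(id, _)| *id)
        .collect();
    if top.len() == 1 {
        Dispatch::Execute(top[0])
    } else {
        // two or more materially plausible scenes: fail closed
        Dispatch::Ambiguous(top)
    }
}

fn main() {
    let cands = [
        Candidate { scene_id: "a", keyword_fit: 2, page_context_fit: 1, params_resolved: true },
        Candidate { scene_id: "b", keyword_fit: 2, page_context_fit: 1, params_resolved: true },
    ];
    assert_eq!(dispatch(&cands), Dispatch::Ambiguous(vec!["a", "b"]));
}
```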
### 5. Generic Execution Pipeline
Responsibility:
- invoke the resolved tool through the existing `browser_script` seam
- reuse bootstrap target resolution
- interpret the artifact according to the registered artifact contract
- run generic report postprocessing such as Rust-side XLSX export
- keep business-specific interpretation out of the platform core
The strong requirement is to preserve the already-validated common path in:
- `src/compat/direct_skill_runtime.rs`
- `src/compat/browser_script_skill_tool.rs`
- the existing report-artifact and export seams
---
## Scene Registration Contract
The central platform contract is a per-scene registration manifest, named `scene.toml` in this design.
### Why a separate manifest is needed
`SKILL.toml` describes tools. It does not fully describe:
- deterministic routing rules
- scene identity
- platform parameter resolution contracts
- bootstrap target rules
- artifact interpretation rules
- generic postprocessing declarations
Without this manifest, the generator would only create files while the runtime would still need scene-specific Rust changes.
### Manifest responsibilities
Each generated scene manifest must declare:
1. scene identity and runtime entrypoint
2. bootstrap/page context requirements
3. deterministic matching rules
4. parameter schema and resolver mapping
5. execution contract
6. artifact contract
7. postprocess contract
8. schema/version metadata sufficient for long-term generator/runtime evolution
### Manifest versioning and registry rules
The manifest contract must be explicit and versioned from the start.
Required rules:
- every `scene.toml` must declare a manifest schema version independent from the scene version
- the runtime must validate schema compatibility before registration
- scene registration must require globally unique `scene.id` values across all loaded scene roots
- duplicate scene IDs must fail registration deterministically rather than silently overriding an earlier scene
- the future implementation plan must decide and test the duplicate policy explicitly, but the default design rule is fail-fast with a clear error describing both conflicting manifest locations
- manifest evolution must prefer additive compatibility where possible so a future standalone generator can target the same runtime contract intentionally rather than by coincidence
This versioned contract is part of the extraction seam: it is what allows the runtime and a future standalone generator to evolve without private coupling.
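The fail-fast rules above can be sketched as follows; the manifest struct and error type are simplified stand-ins, not the actual runtime types:

```rust
// Sketch: schema gate plus fail-fast duplicate scene.id check at
// registration time. Struct shape and error type are stand-ins.
use std::collections::HashMap;

const SUPPORTED_SCHEMA_VERSION: &str = "1";

#[derive(Debug)]
struct SceneManifest {
    id: String,
    schema_version: String,
    path: String, // where this scene.toml was loaded from
}

fn register(manifests: Vec<SceneManifest>) -> Result<HashMap<String, SceneManifest>, String> {
    let mut registry: HashMap<String, SceneManifest> = HashMap::new();
    for m in manifests {
        // validate schema compatibility before registration
        if m.schema_version != SUPPORTED_SCHEMA_VERSION {
            return Err(format!(
                "unsupported manifest schema '{}' in {}",
                m.schema_version, m.path
            ));
        }
        // duplicate scene IDs fail fast, naming both manifest locations
        if let Some(existing) = registry.get(&m.id) {
            return Err(format!(
                "duplicate scene id '{}' in {} and {}",
                m.id, existing.path, m.path
            ));
        }
        registry.insert(m.id.clone(), m);
    }
    Ok(registry)
}

fn main() {
    let err = register(vec![
        SceneManifest { id: "s1".into(), schema_version: "1".into(), path: "a/scene.toml".into() },
        SceneManifest { id: "s1".into(), schema_version: "1".into(), path: "b/scene.toml".into() },
    ]);
    assert!(err.is_err());
}
```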
### Recommended manifest shape
```toml
[scene]
id = "tq-lineloss-report"
skill = "tq-lineloss-report"
tool = "collect_lineloss"
kind = "browser_script"
version = "0.1.0"
category = "report_collection"

[manifest]
schema_version = "1"

[bootstrap]
expected_domain = "20.76.57.61"
target_url = "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor"
page_title_keywords = ["线损"]
requires_target_page = true

[deterministic]
suffix = "。。。"
include_keywords = ["线损", "月累计", "周累计"]
exclude_keywords = ["知乎"]

[[params]]
name = "org"
resolver = "dictionary_entity"
required = true
prompt_missing = "已命中台区线损报表技能,但缺少供电单位。"
prompt_ambiguous = "已命中台区线损报表技能,但供电单位存在歧义,请补充更完整名称。"

[params.resolver_config]
dictionary_ref = "references/org-dictionary.json"
output_label_field = "org_label"
output_code_field = "org_code"

[[params]]
name = "period"
resolver = "month_week_period"
required = true
prompt_missing = "已命中台区线损报表技能,但缺少统计周期。"
prompt_ambiguous = "已命中台区线损报表技能,但统计周期存在歧义,请补充更明确表达。"

[artifact]
type = "report-artifact"
success_status = ["ok", "partial", "empty"]
failure_status = ["blocked", "error"]

[postprocess]
exporter = "xlsx_report"
auto_open = "excel"
```
### Design rule
`scene.toml` declares behavior. It does not contain business JS code.
- business collection logic stays in `scripts/*.js`
- platform match/resolver selection stays in the manifest
- generic runtime execution stays in the platform
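For illustration only, the registration-relevant fields could map onto Rust-side structs like these. Names mirror the manifest above; the real types, optional fields, and any serde/toml wiring are for the spec to decide:

```rust
// Sketch of a Rust-side shape the manifest could deserialize into.
// Field names mirror the TOML example; types are illustrative.
#[derive(Debug)]
struct DeterministicRules {
    suffix: String,
    include_keywords: Vec<String>,
    exclude_keywords: Vec<String>,
}

#[derive(Debug)]
struct ParamSpec {
    name: String,
    resolver: String, // e.g. "dictionary_entity", "month_week_period"
    required: bool,
}

#[derive(Debug)]
struct SceneRegistration {
    id: String,
    skill: String,
    tool: String,
    kind: String,           // "browser_script" only in v1
    schema_version: String, // manifest schema, distinct from scene version
    deterministic: DeterministicRules,
    params: Vec<ParamSpec>,
}

fn main() {
    let scene = SceneRegistration {
        id: "tq-lineloss-report".into(),
        skill: "tq-lineloss-report".into(),
        tool: "collect_lineloss".into(),
        kind: "browser_script".into(),
        schema_version: "1".into(),
        deterministic: DeterministicRules {
            suffix: "。。。".into(),
            include_keywords: vec!["线损".into()],
            exclude_keywords: vec!["知乎".into()],
        },
        params: vec![ParamSpec { name: "org".into(), resolver: "dictionary_entity".into(), required: true }],
    };
    assert_eq!(scene.kind, "browser_script");
}
```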
---
## Platform-Provided Generic Capabilities
The platform should expose a small, explicit set of reusable capability types.
### 1. Scene Matchers
V1 deterministic matcher types should stay simple and declarative:
- include keywords
- exclude keywords
- required suffix
- optional page URL/title constraints
This is enough for v1 report scenes and avoids overbuilding NLP into deterministic mode.
Future seam:
- add a semantic matcher interface for model-based routing later without changing the rest of the scene contract
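The v1 matcher can be sketched as pure data plus one predicate. Whether include keywords require any-of or all-of semantics is an open spec decision; any-of is assumed here:

```rust
// Sketch: v1 declarative deterministic matcher. Any-of semantics for
// include keywords is an assumption the spec must confirm.
struct DeterministicMatch {
    suffix: String, // e.g. "。。。"
    include_keywords: Vec<String>,
    exclude_keywords: Vec<String>,
}

impl DeterministicMatch {
    fn matches(&self, instruction: &str) -> bool {
        instruction.ends_with(&self.suffix)
            && self.include_keywords.iter().any(|k| instruction.contains(k.as_str()))
            && !self.exclude_keywords.iter().any(|k| instruction.contains(k.as_str()))
    }
}

fn main() {
    let m = DeterministicMatch {
        suffix: "。。。".to_string(),
        include_keywords: vec!["线损".to_string()],
        exclude_keywords: vec!["知乎".to_string()],
    };
    assert!(m.matches("查询月累计线损。。。"));
    assert!(!m.matches("线损")); // missing suffix
}
```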
### 2. Parameter Resolvers
The platform should provide reusable resolver types instead of scene-specific branches.
Recommended v1 resolver types:
- `dictionary_entity`
- maps aliases to canonical label/code pairs using scene-provided dictionary data
- `month_week_period`
- parses month/week intent and canonical time payloads
- `fixed_enum`
- maps deterministic text options into fixed internal values
- `literal_passthrough`
- preserves an already explicit literal value
Design rule:
If a new scene needs a new resolver **type**, add a reusable platform capability. Do not add a scene-specific Rust branch.
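The resolver seam can be sketched as a small trait. The trait shape, outcome enum, and the `fixed_enum` implementation below are illustrative assumptions, not the actual platform API:

```rust
// Sketch: a reusable resolver seam. The trait, outcome enum, and this
// fixed_enum example are illustrative assumptions.
enum Resolved {
    Ok(String),             // single canonical value
    Missing,                // no candidate found in the instruction
    Ambiguous(Vec<String>), // multiple plausible canonical values
}

trait ParamResolver {
    fn resolve(&self, instruction: &str) -> Resolved;
}

/// `fixed_enum`: maps deterministic text options into fixed internal values.
struct FixedEnum {
    options: Vec<(String, String)>, // (surface text, canonical value)
}

impl ParamResolver for FixedEnum {
    fn resolve(&self, instruction: &str) -> Resolved {
        let hits: Vec<String> = self
            .options
            .iter()
            .filter(|(text, _)| instruction.contains(text.as_str()))
            .map(|(_, canonical)| canonical.clone())
            .collect();
        match hits.len() {
            0 => Resolved::Missing,
            1 => Resolved::Ok(hits[0].clone()),
            _ => Resolved::Ambiguous(hits),
        }
    }
}

fn main() {
    let r = FixedEnum {
        options: vec![("月累计".into(), "month".into()), ("周累计".into(), "week".into())],
    };
    assert!(matches!(r.resolve("月累计线损。。。"), Resolved::Ok(ref v) if v == "month"));
    assert!(matches!(r.resolve("线损。。。"), Resolved::Missing));
}
```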
### 3. Bootstrap Resolvers
The platform must be able to produce:
- `expected_domain`
- `target_url`
- page-context validation hints
These should come from registration metadata, not from per-scene hardcoded constants.
This generalizes the existing bootstrap-target seam already present in `src/service/server.rs:453`.
### 4. Tool Invokers
V1 supports one invoker type only:
- `browser_script`
This is intentionally narrow. It keeps the platform close to the existing main architecture and avoids broad redesign.
Future seam:
- later add invokers for other tool kinds without changing scene registration concepts
### 5. Artifact Interpreters and Postprocessors
V1 should provide generic handling for report-style results:
- `report-artifact` interpreter
- `xlsx_report` exporter/postprocessor
- open-after-export policies
The platform should not know about line-loss business fields specifically. It should only know the generic artifact contract.
---
## Generated Skill Package Contract
V1 generated scenes must follow a predictable staged package shape.
### Required generated files
- `SKILL.toml`
- `SKILL.md`
- `scene.toml`
- `references/collection-flow.md`
- `references/data-quality.md`
- `scripts/<entry>.js`
- `scripts/<entry>.test.js`
- optional support assets such as scene snapshots
### V1 generated skill assumptions
Generated report/collection skills must:
- accept normalized canonical args only
- validate expected page context before collection
- avoid re-parsing raw user language inside the script
- return one structured artifact object
- keep page/API collection logic inside the script
- leave generic interpretation/export policy to the platform where possible
### Separation rule
The generated skill package owns:
- page inspection
- page-side state usage
- page/API calls
- row normalization
- scene-local docs and references
The platform owns:
- scene discovery and registration
- deterministic scene selection
- canonical parameter resolution using generic resolver types
- tool invocation
- artifact interpretation
- generic postprocessing
---
## Migration Path from `tq-lineloss` One-Off to Platform Sample
The current line-loss implementation should not be discarded. It should become the first migration sample and platform proof point.
### Why line-loss is the right sample
It already exercised most of the hard problems:
- deterministic routing via `。。。`
- canonical org resolution
- canonical month/week resolution
- staged `browser_script` packaging
- bootstrap target selection
- report artifact shaping
- local export needs
- pipe/ws transport differences
- real browser/runtime timeout and callback-host issues
### Phase A: Extract generic registry and invocation seams
First, add:
- scene registry loader
- manifest reader/validator
- generic deterministic dispatch planning
while preserving the existing `browser_script` execution seam.
### Phase B: Convert `tq-lineloss` into the first manifest-driven scene
Move line-loss specific declarations out of hardcoded Rust branches and into registration data:
- scene identity
- target URL and expected domain
- deterministic scene match rules
- resolver mapping
- artifact/postprocess declarations
Keep the business collection script in the skill package.
### Phase C: Build the generator on top of the stabilized contract
Once line-loss runs through the manifest-driven platform path, define generator templates that produce the same contracts automatically from scenario directories.
### Phase D: Add future semantic routing later
When model access is available, layer semantic routing onto the same registered scene contracts.
The LLM should eventually help with:
- selecting a scene
- filling unresolved parameters
But it should not replace the registered execution contract.
---
## Generator Extraction Boundary
The design must support eventually moving the generator out of `sgClaw`.
### Required extraction seam
The generator and runtime should communicate only through generated artifacts and contracts:
- staged skill package layout
- `scene.toml`
- any scene-local dictionaries/reference data
### Consequence
The runtime must not depend on internal generator implementation details.
This means:
- do not let the runtime call generator internals directly
- do not let the generator rely on private runtime types as its only output format
- keep manifest and package contracts explicit and versionable
This is what makes later extraction into a separate repository practical.
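As a sketch of what an explicit, versionable contract could look like at this seam (the `SceneManifest` struct, its fields, and the version constant are illustrative assumptions, not existing sgClaw types):

```rust
// Hypothetical versioned contract at the generator/runtime seam.
// All names here are illustrative, not existing sgClaw types.

const SUPPORTED_MANIFEST_VERSION: u32 = 1;

#[derive(Debug)]
pub struct SceneManifest {
    pub manifest_version: u32,
    pub scene_id: String,
    pub target_url: String,
}

/// The runtime accepts generator output only through this explicit check,
/// never by calling generator internals.
pub fn accept_manifest(m: &SceneManifest) -> Result<(), String> {
    if m.manifest_version != SUPPORTED_MANIFEST_VERSION {
        return Err(format!(
            "unsupported manifest version {} (runtime supports {})",
            m.manifest_version, SUPPORTED_MANIFEST_VERSION
        ));
    }
    if m.scene_id.is_empty() || m.target_url.is_empty() {
        return Err("manifest missing scene_id or target_url".to_string());
    }
    Ok(())
}

fn main() {
    let manifest = SceneManifest {
        manifest_version: 1,
        scene_id: "tq-lineloss".to_string(),
        target_url: "http://20.76.57.61:18080".to_string(),
    };
    // A future runtime would reject anything that fails this boundary check.
    assert!(accept_manifest(&manifest).is_ok());
}
```

Versioning the manifest explicitly is what lets the generator move to its own repository later without breaking older staged packages.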
---
## `tq-lineloss-report` Lessons-Learned Document Requirement
The platform design requires a dedicated lessons-learned document based on the full `tq-lineloss-report` implementation and debugging path.
### Why this document is required
The line-loss path uncovered issues that a naive generator would recreate immediately.
These include:
- deterministic routing and prompt semantics
- strict canonical org/period normalization
- no hidden page-default fallback
- target URL / expected-domain / bootstrap-target contracts
- `browser_script` target URL requirements
- artifact shape discipline
- Rust-side XLSX export necessity because browser-side localhost export can fail under remote page origin constraints
- pipe vs ws differences
- callback-host and helper bootstrap timeout risks
- real-world service-console runtime validation gaps
### Required sections
The lessons document should at minimum cover:
1. source-scene assumptions that must be surfaced explicitly
2. deterministic routing pitfalls
3. canonical parameterization pitfalls
4. bootstrap target and page-context pitfalls
5. execution transport pitfalls (pipe/ws)
6. artifact and export pitfalls
7. testing pitfalls
8. manual runtime validation pitfalls
9. what should become generator template rules
10. what should remain scene-specific manual work
### Required format and location
The design requires both artifacts to live in a stable, versioned location under the project docs so future plans and a future standalone generator can depend on them intentionally.
Recommended shape:
- `docs/superpowers/references/tq-lineloss-lessons-learned.md`
- human-oriented narrative and rationale
- `docs/superpowers/references/tq-lineloss-lessons-learned.toml`
- structured generator input rules
The TOML artifact should be organized as reusable rule sections such as:
- deterministic routing rules
- canonical parameter rules
- bootstrap/target-url rules
- artifact/postprocess rules
- validation/test checklist rules
The generator should consume the structured TOML rules as template constraints or generation-time validation inputs, while the Markdown document remains the explainability companion for human reviewers.
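One possible shape for consuming those rule sections as generation-time validation, sketched below with illustrative types (`LessonRule`, `GeneratedScene`, and the specific rule variants are assumptions, not the frozen TOML schema):

```rust
// Hypothetical generation-time validation driven by the structured
// lessons-learned rules. Rule variants mirror the section list above.

#[derive(Debug)]
enum LessonRule {
    DeterministicRouting { required_suffix: &'static str },
    BootstrapTargetUrl { forbid_localhost: bool },
}

struct GeneratedScene {
    instruction_suffix: String,
    bootstrap_url: String,
}

/// Collect rule violations instead of failing fast, so the generator can
/// surface a full rectification report to reviewers.
fn violations(scene: &GeneratedScene, rules: &[LessonRule]) -> Vec<String> {
    let mut out = Vec::new();
    for rule in rules {
        match rule {
            LessonRule::DeterministicRouting { required_suffix } => {
                if !scene.instruction_suffix.ends_with(required_suffix) {
                    out.push("deterministic routing suffix missing".to_string());
                }
            }
            LessonRule::BootstrapTargetUrl { forbid_localhost } => {
                if *forbid_localhost && scene.bootstrap_url.contains("localhost") {
                    out.push("bootstrap URL must not be localhost".to_string());
                }
            }
        }
    }
    out
}

fn main() {
    let rules = vec![
        LessonRule::DeterministicRouting { required_suffix: "。。。" },
        LessonRule::BootstrapTargetUrl { forbid_localhost: true },
    ];
    let scene = GeneratedScene {
        instruction_suffix: "查询线损。。。".to_string(),
        bootstrap_url: "http://20.76.57.61:18080".to_string(),
    };
    assert!(violations(&scene, &rules).is_empty());
}
```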
### How it should be used
This document becomes:
- template hardening input for the generator
- a checklist for reviewing generated scenes
- a planning artifact for deciding which pieces can be automated safely
---
## Existing Code Surfaces to Reuse
The design should explicitly build on these current platform-adjacent surfaces rather than replacing them wholesale.
### Skills discovery and loading
- `src/runtime/engine.rs:232` load skills for surface from configurable directories
- `src/runtime/engine.rs:361` load runtime skills across multiple roots
- `src/compat/config_adapter.rs:90` skill-dir normalization
### Generic `browser_script` execution
- `src/compat/direct_skill_runtime.rs:91` raw output execution helper
- `src/compat/direct_skill_runtime.rs:148` tool resolution from staged skills
- `src/compat/browser_script_skill_tool.rs` script loading/wrapping/invocation pipeline
### Bootstrap target resolution seam
- `src/service/server.rs:453` submit bootstrap target resolution
### Current one-off deterministic branch that should be generalized
- `src/compat/deterministic_submit.rs`
The line-loss-specific pieces in that file are the main migration targets for platform conversion.
---
## Failure Semantics
The platform must preserve explicit, business-safe failure semantics.
### Deterministic mismatch
If the request ends with `。。。` but no registered scene matches, the runtime must return an explicit deterministic mismatch response.
### Missing / ambiguous parameters
If a registered scene matches but required parameters cannot be resolved uniquely, the runtime must prompt rather than guess.
### Execution failure
Execution failures should be interpreted according to the registered artifact contract and generic report semantics, not through per-scene special cases in the platform core.
### Design rule
The platform should never silently recover by using page defaults when the scene contract requires canonical inputs.
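The failure semantics above can be made unavoidable by encoding them as an explicit outcome type, so no code path can silently fall through to page defaults. A minimal sketch (the names are illustrative, not existing sgClaw types):

```rust
// Illustrative outcome type for the submit path's failure semantics.
#[derive(Debug, PartialEq)]
enum SubmitOutcome {
    Dispatched { scene_id: String },
    DeterministicMismatch,          // `。。。` suffix but no registered scene matched
    MissingParameters(Vec<String>), // prompt the user instead of guessing
    ExecutionFailed(String),        // interpreted via the registered artifact contract
}

fn outcome_for(matched_scene: Option<&str>, unresolved: Vec<String>) -> SubmitOutcome {
    match matched_scene {
        None => SubmitOutcome::DeterministicMismatch,
        Some(_) if !unresolved.is_empty() => SubmitOutcome::MissingParameters(unresolved),
        Some(id) => SubmitOutcome::Dispatched { scene_id: id.to_string() },
    }
}

fn main() {
    // No scene matched the deterministic suffix: explicit mismatch, no guessing.
    assert_eq!(outcome_for(None, vec![]), SubmitOutcome::DeterministicMismatch);
}
```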
---
## Verification Requirements for the Future Implementation Plan
The future implementation plan must verify:
1. registry loading from generated scene manifests
2. deterministic dispatch through registered scenes instead of per-scene branches
3. manifest-driven bootstrap target selection
4. manifest-driven parameter resolver dispatch
5. generic `browser_script` invocation of generated scenes
6. generic report artifact interpretation
7. generic XLSX postprocessing compatibility
8. unchanged behavior for existing non-scene core flows outside v1 scope
9. migration of `tq-lineloss` from hardcoded branch to manifest-driven sample
10. branch strategy based on a new branch copied from `ws`
11. lessons-learned document completeness and reuse as generator input
12. separation seam sufficient for future generator extraction
---
## Out of Scope for the V1 Implementation Plan
The future implementation plan should explicitly avoid:
- generic login/session capability as a first-class v1 platform subsystem
- full semantic routing implementation with models
- generalized action workflows
- a full scene DSL runtime
- direct implementation of multiple non-report scene kinds
- replacing the validated core `browser_script` execution path with a new protocol
- broad architectural rewrites unrelated to generated scene skill support
---
## Recommended First Implementation Slice
The most stable first slice is:
1. create the scene manifest contract and validator
2. build a registry loader over existing staged skill directories
3. generalize deterministic dispatch to use registered scenes
4. migrate `tq-lineloss` into the first manifest-driven scene
5. document all line-loss lessons learned
6. only then build the scenario-directory-to-skill generator
This keeps the platform grounded in a working runtime contract before the generator is asked to automate against it.
---
## Final Recommendation
Build `sgClaw` into a generated scene skill platform by separating it into:
- a generic runtime platform that discovers, matches, resolves, invokes, and postprocesses scenes using manifest-driven contracts
- a scenario-directory-to-skill generator that emits staged skill packages and scene registration manifests
Implement v1 only for report/collection-oriented `browser_script` scenes, keep deterministic invocation on the explicit `。。。` suffix, migrate `tq-lineloss` into the first manifest-driven sample, and preserve a clean extraction seam so the generator can later become its own project.

View File

@@ -0,0 +1,236 @@
# Generated Scene Rectification Design
> **Status:** Draft
> **Date:** 2026-04-17
> **Author:** Codex
## Problem Statement
Although the current automatic scene-to-skill pipeline has already introduced `Scene IR`, `workflowArchetype`, and readiness grading, it still exhibits three fatal deviations on complex marketing report scenes:
1. `sceneId` degrades from the Chinese scene name into a low-information identifier, e.g. `营销2.0零度户报表数据生成 -> 2-0`.
2. bootstrap gets polluted by `localhost` export or helper services, so the intranet entry domain resolves incorrectly.
3. workflow classification happens prematurely on incomplete evidence, so `paginated_enrichment` scenes end up missing key step evidence such as `paginate` and `secondary_request`.
Stacked together, these three problems produce false positives: results that structurally look like generated skills but are guaranteed to fail once deployed on the intranet. The current generator still leans toward "template filling" and has not yet formed a rectification chain that is conservative enough for intranet scenes, auditable, and capable of rejecting wrong output.
## Rectification Goal
The goal of this rectification is not to keep raising the "generation success rate", but to converge the generator into a more stable `scene skill rectifier` that:
1. No longer emits business-meaningless `sceneId` values such as `2-0`.
2. No longer lets `localhost`, static-asset addresses, or template noise compete for bootstrap.
3. No longer rashly outputs `paginated_enrichment` or other high-complexity archetypes when workflow evidence is missing.
4. No longer disguises low-confidence generation results as skills ready for direct intranet trial runs.
5. Lays a reusable rectification foundation for the future mini `skill-creator`.
## Non-Goals
1. Do not resolve full compatibility for all historical scenes in this round.
2. Do not cover login-state recovery, complex authentication, or host-browser differences in this round.
3. Do not require the LLM to recover Chinese scene semantics alone; rectification still prioritizes deterministic evidence.
4. Do not require all browser workflow details to be generated in one step; failing closed before the readiness gate is allowed.
## Current Failure Reconstruction
`营销2.0零度户报表数据生成` 为例,当前错误链路大致如下:
1. 目录名 fallback 经过仅保留 `[a-z0-9]` 的 slug 规则后,中文主体被剥离,只剩 `2-0`
2. URL 候选集合没有分层,`http://localhost:13313/...``http://yx.gs.sgcc.com.cn``http://yxgateway.gs.sgcc.com.cn/...` 被放在同一个 bootstrap 竞争池。
3. 工作流分类优先命中通用的“多模式字段”信号,导致 archetype 判定先偏向 `multi_mode_request`
4. endpoint 集合又被 `${apiUrl}`、静态模板串、localhost 导出地址等噪声稀释,最终 workflow step 构造无法稳定提取“分页主请求 -> 逐户补数 -> 过滤 -> 导出”的完整证据。
5. 生成器仍然给出可输出 skill造成用户误以为该 skill 具备内网可运行性。
所以整改不能只修某一个 if 分支而是必须同时修正命名链、bootstrap 链、workflow 链和 readiness 链。
## Rectification Principles
### 1. Fail Closed Takes Priority over Fail Open
Whenever evidence is insufficient in any critical chain, whether `sceneId`, bootstrap, or workflow, the generator must downgrade, block, or flag explicitly instead of producing a skill that merely "looks complete".
### 2. Classify Evidence Before Consuming It
URLs, function names, pagination variables, filter conditions, and export calls must first go through evidence stratification and noise removal before they participate in archetype classification or template compilation.
### 3. Bootstrap and Export Must Be Decoupled
The intranet business entry domain and the local export service are entirely different runtime concepts. `localhost` may serve as export/downstream evidence, but it must never become a bootstrap candidate.
### 4. Names Must Be Business-Readable
`sceneId` is not a technical directory identifier; it is the skill's business identity. Any low-entropy, numeric, or placeholder-style id must be treated as an invalid result.
### 5. Archetype Output Must Be Constrained by the Full Workflow
`paginated_enrichment` cannot be emitted just because pagination fields exist; it additionally requires the minimum combination of a main list request, pagination evidence, secondary-request evidence, and an aggregation/filter/export chain.
## Target Architecture
```text
source scene
-> source scan
-> evidence stratification
-> naming chain
-> bootstrap chain
-> workflow chain
-> archetype gating
-> readiness grading
-> skill generation or fail-closed report
```
The core of the rectification is not adding one more large-model step, but adding four constraints around the existing `Scene IR`:
1. naming constraints
2. bootstrap constraints
3. workflow constraints
4. readiness constraints
## Rectification Design
### Naming Chain Rectification
The goal of the `sceneId` rectification is to make "Chinese directory name -> business-readable sceneId" a controlled chain, instead of letting the fallback degrade freely.
The rectification plan:
1. `sceneId` candidate sources are layered by priority:
- a business-semantic id explicitly returned by the LLM
- a deterministically extracted combination of English business keywords
- controlled transliteration / alias rules based on the Chinese scene name
- the final fallback may only be kept as an `invalid_candidate`; it must never be written to disk directly
2. Add `sceneId` validity checks:
- must not be purely or predominantly numeric
- must not be shorter than the minimum business-readable length
- must not consist only of version numbers or generic words
- must not be semantically disconnected from `sceneName`
3. Low-entropy ids such as `2-0`, `1-0`, `report`, and `scene` are uniformly judged `invalid_scene_id`.
4. Once an invalid id is hit, the generator may only output a rectification report or require manual confirmation; it must not generate a formal skill directory directly.
The outcome of this chain: `sceneId` is upgraded from a "string-sanitization artifact" to a "business-identity product".
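A minimal sketch of the validity checks listed above; the length threshold and the stop-word list are assumptions, not frozen values:

```rust
// Hypothetical sceneId validity check. Thresholds and the low-entropy list
// below are illustrative assumptions.
fn is_valid_scene_id(id: &str) -> bool {
    let digits = id.chars().filter(|c| c.is_ascii_digit()).count();
    let letters = id.chars().filter(|c| c.is_ascii_alphabetic()).count();
    let low_entropy = ["report", "scene", "2-0", "1-0"];
    id.len() >= 8                     // minimum business-readable length (assumed)
        && letters > digits           // reject numeric-dominated ids
        && !low_entropy.contains(&id) // reject known low-entropy ids
}

fn main() {
    assert!(is_valid_scene_id("marketing-zero-consumer-report"));
    assert!(!is_valid_scene_id("2-0"));
}
```

A check on semantic linkage to `sceneName` would need the alias/transliteration layer described above and is deliberately left out of this sketch.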
### Bootstrap Chain Rectification
The goal of the bootstrap rectification is to split the URL candidate set into distinct evidence tiers and eliminate `localhost` pollution completely.
The rectification plan:
1. First classify URL candidates by role:
- `business_entry`
- `business_api`
- `gateway_api`
- `export_service`
- `local_helper`
- `static_asset`
- `template_noise`
2. Role-recognition rules require:
- `localhost`, `127.0.0.1`, `SurfaceServices`, and `ReportServices` default to `export_service` or `local_helper`
- `.js`, `.css`, template-placeholder URLs, and string-formatting fragments go to `static_asset` or `template_noise`
- `sourceUrl` page constants, home entry addresses, and business gateway prefixes preferentially go to `business_entry` or `business_api`
3. The bootstrap decision may only consume:
- `business_entry`
- `business_api`
- `gateway_api`
4. When `business_entry` and `gateway_api` coexist:
- `expectedDomain` takes the primary business domain
- the API prefix stays in the request evidence and does not directly override the target page
5. When only `localhost` or noise addresses are recognized, bootstrap must be judged `unresolved_bootstrap` and the generator downgrades immediately.
The outcome of this chain: `localhost` can still survive as "export dependency evidence", but it is never again eligible for promotion to the business entry domain.
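The role stratification above can be sketched as a classifier plus a filter that feeds only business-tier roles into the bootstrap decision. This is a simplified sketch with only four of the seven roles; the matching rules are assumptions, not the full production rule set:

```rust
// Simplified URL role stratification for bootstrap candidate selection.
#[derive(Debug, PartialEq)]
enum UrlRole { BusinessEntry, ExportService, StaticAsset, TemplateNoise }

fn classify_url(url: &str) -> UrlRole {
    if url.contains("${") {
        return UrlRole::TemplateNoise; // formatting placeholder fragments
    }
    if url.contains("localhost") || url.contains("127.0.0.1") {
        return UrlRole::ExportService; // never a bootstrap candidate
    }
    if url.ends_with(".js") || url.ends_with(".css") {
        return UrlRole::StaticAsset;
    }
    UrlRole::BusinessEntry
}

/// Only business-tier URLs may compete for bootstrap.
fn bootstrap_candidates<'a>(urls: &[&'a str]) -> Vec<&'a str> {
    urls.iter()
        .copied()
        .filter(|u| classify_url(u) == UrlRole::BusinessEntry)
        .collect()
}

fn main() {
    let urls = [
        "http://localhost:13313/export",
        "http://yx.gs.sgcc.com.cn",
        "app/${apiUrl}/list",
        "theme/main.css",
    ];
    assert_eq!(bootstrap_candidates(&urls), vec!["http://yx.gs.sgcc.com.cn"]);
}
```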
### Workflow Chain Rectification
The goal of the workflow rectification is to move from "field-feature classification" to "request-chain-reconstruction classification".
The rectification plan:
1. In the `Scene IR`, split workflow evidence into four tiers:
- request evidence
- pagination evidence
- secondary request evidence
- post-process evidence
2. Archetype classification no longer relies first on generic field names such as `type/tab/mode/status`; it relies instead on:
- whether a main list request exists
- whether a stable pagination-variable combination exists
- whether per-item or per-batch secondary requests exist
- whether filter, aggregation, or export actions exist
3. Endpoints must be normalized before entering the workflow:
- strip `${apiUrl}`, formatting placeholders, log text, and exception strings
- strip the interference of `localhost` export endpoints on archetype classification
- merge different concatenation forms of the same business API
4. The minimum evidence threshold for `paginated_enrichment` becomes:
- at least one main list request
- at least one pagination-variable group
- at least one secondary-request entry or an explicitly defined per-user enrichment function
- at least one post-processing action: one of `filter`, `transform`, or `export`
5. If only part of the evidence is satisfied:
- the result may be kept as `candidate_paginated_enrichment`
- but it must not enter the formal `paginated_enrichment` compilation path
6. `multi_mode_request` only holds when mode switching significantly changes the request body, column definitions, or response path; a generic field-name hit is not enough.
The outcome of this chain: marketing-style scenes only enter the corresponding compiler after the "pagination + enrichment + post-processing" evidence is genuinely reconstructed.
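The minimum evidence threshold in item 4 can be sketched as a gate that distinguishes the formal archetype from the candidate state. The types are illustrative assumptions:

```rust
// Hypothetical minimum-evidence gate for `paginated_enrichment`.
struct WorkflowEvidence {
    main_list_request: bool,
    pagination_vars: bool,
    secondary_request: bool, // or an explicit per-user enrichment function
    post_process: bool,      // filter / transform / export
}

#[derive(Debug, PartialEq)]
enum ArchetypeVerdict {
    PaginatedEnrichment,          // full evidence: may enter the compiler
    CandidatePaginatedEnrichment, // partial evidence: kept out of compilation
    NotApplicable,
}

fn gate_paginated_enrichment(e: &WorkflowEvidence) -> ArchetypeVerdict {
    let hits = [e.main_list_request, e.pagination_vars, e.secondary_request, e.post_process];
    match hits.iter().filter(|h| **h).count() {
        4 => ArchetypeVerdict::PaginatedEnrichment,
        1..=3 => ArchetypeVerdict::CandidatePaginatedEnrichment,
        _ => ArchetypeVerdict::NotApplicable,
    }
}

fn main() {
    let full = WorkflowEvidence {
        main_list_request: true,
        pagination_vars: true,
        secondary_request: true,
        post_process: true,
    };
    assert_eq!(gate_paginated_enrichment(&full), ArchetypeVerdict::PaginatedEnrichment);
}
```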
### Readiness Chain Rectification
The goal of the readiness rectification is to clearly separate "can be generated" from "can be trial-run on the intranet".
The rectification plan:
1. Add critical gates:
- `scene_id_valid`
- `bootstrap_resolved`
- `workflow_complete_for_archetype`
- `runtime_contract_compatible`
2. Only when all gates pass may the result be graded `A` or `B`.
3. When any critical gate fails:
- the result can only be `C`
- the UI and report must explicitly show the missing items
- an analysis report may still be produced, but a runnable skill should not be output by default
4. For reference scenes such as `marketing-zero-consumer-report`, the minimum readiness pass conditions should be pinned explicitly as:
- a non-degenerate `sceneId`
- bootstrap pointing into the `yx.gs.sgcc.com.cn` system
- workflow containing `paginate`
- workflow containing `secondary_request`
- workflow containing `filter` and `export`
The outcome of this chain: the generator switches from "writing files counts as success" to "only passing the gates counts as trial-runnable".
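The gate-to-grade rule above can be sketched as follows; this sketch grades all-pass as `B` and assumes the `A`/`B` distinction comes from additional quality signals not modeled here:

```rust
// Hypothetical readiness grading over the four critical gates.
struct Gates {
    scene_id_valid: bool,
    bootstrap_resolved: bool,
    workflow_complete_for_archetype: bool,
    runtime_contract_compatible: bool,
}

/// Returns the grade plus the list of missing gates for the UI/report.
fn readiness_grade(g: &Gates) -> (char, Vec<&'static str>) {
    let mut missing = Vec::new();
    if !g.scene_id_valid { missing.push("scene_id_valid"); }
    if !g.bootstrap_resolved { missing.push("bootstrap_resolved"); }
    if !g.workflow_complete_for_archetype { missing.push("workflow_complete_for_archetype"); }
    if !g.runtime_contract_compatible { missing.push("runtime_contract_compatible"); }
    if missing.is_empty() {
        ('B', missing) // A vs B would need extra quality signals (assumed)
    } else {
        ('C', missing) // any failed gate forces C and surfaces the gaps
    }
}

fn main() {
    let all = Gates {
        scene_id_valid: true,
        bootstrap_resolved: true,
        workflow_complete_for_archetype: true,
        runtime_contract_compatible: true,
    };
    assert_eq!(readiness_grade(&all).0, 'B');
}
```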
## Superpowers Landing Strategy
This rectification must land through the `superpowers` spec -> plan -> execution three-stage process; jumping straight to scattered patches is not acceptable.
The landing order should be:
1. First write a matching execution plan under `docs/superpowers/plans` based on this design.
2. Then break the plan into four task packages: naming, bootstrap, workflow, and readiness.
3. During implementation, stay strictly within the plan boundary; do not expand into login, authentication, host integration, or other out-of-scope areas.
## File Impact
This design is expected to mainly affect the following areas:
| File | Responsibility |
|------|----------------|
| `frontend/scene-generator/generator-runner.js` | naming fallback, URL stratification, workflow evidence reconstruction, pre-gate analysis |
| `frontend/scene-generator/llm-client.js` | sceneId semantic-completion constraints, evidence-summary input, low-entropy output interception |
| `frontend/scene-generator/server.js` | readiness aggregation, rectification-risk output, generation blocking strategy |
| `frontend/scene-generator/sg_scene_generator.html` | displaying invalid sceneId, bootstrap roles, missing workflow evidence, and readiness risks |
| `src/generated_scene/analyzer.rs` | evidence classification, endpoint denoising, archetype pre-checks |
| `src/generated_scene/ir.rs` | carrying stratified evidence, candidate archetypes, and gate status |
| `src/generated_scene/generator.rs` | deciding by gate whether compilation and formal output are allowed |
## Acceptance Criteria
This rectification design meets its goal when the following hold:
1. Chinese scene directories no longer degrade into low-entropy `sceneId` values such as `2-0` or `1-0`.
2. `localhost`, `127.0.0.1`, and export-service addresses no longer enter the bootstrap competition chain.
3. `marketing-zero-consumer-report` enters the `paginated_enrichment` compilation path only when `paginate + secondary_request + post-process` evidence exists.
4. When evidence is insufficient, the system outputs a rectification report and readiness risks instead of generating a runnable skill by default.
5. The generator's external positioning converges from "automatic template generator" to "gated general scene-skill converter".
## Open Questions
1. For the Chinese-to-English `sceneId` strategy, should we introduce a fixed alias vocabulary, or let the LLM propose candidates that rules then validate and converge?
2. When `gateway_api` and `business_entry` coexist, should the `Scene IR` keep both `entryDomain` and `apiDomain`?
3. After a readiness block, should users still be allowed to manually confirm and force-generate a `draft` skill package?

View File

@@ -0,0 +1,790 @@
# sgClaw Scene Skill 60-to-90 Roadmap Design
> **Status:** Draft
> **Date:** 2026-04-17
> **Author:** Codex
## Problem Statement
`sg_scene_generate` and its companion runtime have evolved from a single-point prototype into an early scene skill platform, with these foundations in place:
- `Scene IR`
- `workflowArchetype` classification
- deterministic + LLM merge
- readiness / blocker / fail-closed
- scene registry / resolver / dispatch
- generic `report-artifact` / `xlsx_report` postprocessing
Judged by business results, however, automatic generation quality still generally sits at the "about 60 points" level: the system can recognize some structure, generate basic skill packages, and block obviously wrong results, but it cannot yet reliably reconstruct the business semantics embodied by high-quality skills such as `tq-lineloss-report`.
This means the core problem is no longer "is there a compiler", but:
> the compiler works, but the `Scene IR` fed into it still does not look enough like real business.
The system has started to address:
- low-quality `sceneId`
- bootstrap polluted by `localhost`
- archetype misclassification
- readiness fail-open
But the following high-value gaps remain unsolved:
1. Insufficient recovery of mode-specific request / response / column / normalize semantics
2. The main request, enrichment request, filter, and export chains of `paginated_enrichment` cannot be bound reliably
3. BrowserAction, cross-page injection, host bridging, and local export chains in the original scenes are not modeled uniformly
4. The generator still leans toward "information extraction + template filling" rather than "evidence-driven workflow semantic recovery"
The goal of this design is therefore not to keep patching prompts or templates locally, but to lay out the overall route that lifts automatic scene-skill generation from the 60-point level to a level that can support `tq`-grade automatic compilation, and to formalize it as a design.
## Execution Context
Scene skill conversion does not target standalone public webpages; it targets scene packages running in an intranet environment. Scenes are usually embedded in and executed by a self-developed browser rather than running independently in a general-purpose browser.
The typical execution chain:
1. The user first logs into the unified platform inside the browser
2. The unified platform aggregates multiple business systems and hosts the scene entries
3. After the user triggers a scene, the scene script switches or logs into the target business system
4. Login and context warm-up do not necessarily happen explicitly in the current page; they may be completed through hidden frames, host interfaces, background injection, or browser pipe capabilities
Therefore, the source scene's `index.html`, `js/`, and page buttons are only part of the execution chain, not the full runtime boundary. Automatic scene skill conversion must treat "scene source + host browser capabilities + platform context + target system context" as one problem space and must stop mis-modeling scenes as ordinary static webpage scripts.
## Goal
Without overturning the existing `Scene IR + compiler + runtime` architecture, lift scene skill generation from the 60-point stage of "structure recognizable, results blockable" to the 90-point stage of "business semantics recoverable, sample families compilable reliably".
Concrete goals:
1. For representative sample scenes, reliably recover the mode matrix, requestTemplate, responsePath, columnDefs, normalizeRules, the parameter contract, and the bootstrap contract
2. For simple single-request report scenes, form a high-pass-rate mass-production template
3. For complex paginated-enrichment scenes, at least identify the workflow problem space correctly, and fail closed reliably when evidence is insufficient
4. Converge the scene skill platform from "automatic template generator" into a "gated general scene skill converter"
### Success Definition
The staged success criterion of this design is no longer "how closely the generated result mirrors the structure of some reference skill", but whether a skill generated from a general scene can run directly in the intranet environment, obtain correct data, and produce a correct report.
Staged success requires all three loops to close simultaneously:
1. The execution loop closes: the generated skill completes execution in the intranet environment hosted by the self-developed browser
2. The data loop closes: the data obtained after query, pagination, and extraction is correct and complete
3. The artifact loop closes: the generated Excel or other report matches the business rules
This round therefore prioritizes general report scenes of the form "single system, single page, explicit query conditions, paginate to exhaustion, generate the report by rule". Special scenes, tool-style scenes, and high-complexity workflow scenes are not required to pass fully in the first round, but they must be recognized, classified, and bounded correctly.
## Non-Goals
1. No commitment to covering all 102 scenes in one step
2. No commitment to 100% one-shot automatic hits on arbitrary historical scenes
3. This round does not solve every runtime-environment problem such as login recovery, authentication compatibility, or browser-host differences
4. The BrowserAction cross-page execution chain is not fully abstracted in phase one
5. Complex document rendering, template upload, and attachment parsing scenes are not part of P0
6. This round does not detail the automatic recovery of "unified platform login + target business system backend login", but this host execution precondition must be modeled explicitly
## Current Landscape
## Full Scene-Set Structural Assessment
A full read of `D:\desk\智能体资料\全量业务场景\一平台场景` shows:
- roughly 102 scene directories in total
- the vast majority share a unified `index.html` entry
- the technical shells are highly homogeneous, dominated by Vue2 + jQuery + ElementUI
- common dependencies include `vue.js`, `jquery.js`, `elementui.js`, `moment.js`, `dpage.min.js`
- common bridge scripts include `ami.js`, `mca.js`, `/a_js/YPTAPI.js`
- the page structure is usually an automation workbench of "work info + execution log + historical reports + one-click run"
These are not 102 completely distinct technical structures, but a handful of front-end scene-package templates reused across different businesses.
## 场景包结构流派
### 1. 单文件内联型
典型:`95598、12398、流程超期风险工单明细`
特征:
- 顶层几乎只有一个 `index.html`
- 大量业务逻辑直接内联在页面内的 `new Vue({...})`
- 适合做源码内联语义提取样板
### 2. 标准静态包型
典型:`台区线损大数据-月_周累计线损率统计分析`
特征:
- 结构通常为 `index.html + css/ + js/ + images/`
- 业务逻辑拆分在 `js/` 目录
- 线损 / 电量分析类核心样板大多属于这一流派
### 3. 模板 / 导出增强型
典型:`供电可靠检修计划报表`
特征:
- 目录包含 `assets/``html/``copy/``docx/xlsx` 模板文件
- 不只是查数还带文档渲染、模板处理、Excel/Word 导出
### 4. 带历史副本的重包型
典型:`力禾动环系统巡视记录`
特征:
-`.history/``fsdownload/`、多个 `index*.html`
- 含历史版本、下载副本、调试残留
- 是后续自动提取的高噪声来源
## System Distribution Characteristics
High-frequency systems and domains include:
- `yx.gs.sgcc.com.cn`
- `yxgateway.gs.sgcc.com.cn`
- `south.95598.sgcc.com.cn`
- `pms30.gs.sgcc.com.cn:32003`
- `20.76.57.61:18080`
- `10.4.39.180`
- multiple `20.* / 21.* / 25.* / 10.*` intranet-IP systems
Meanwhile, many scenes depend on:
- `localhost:13313`
- `localhost:13311`
- `localhost:13312`
This shows that many scenes are not pure business pages but mixtures of:
- business API chains
- browser page logic
- host bridging
- local export / local services
## Host Browser Runtime Context
`localhost:13313``localhost:13311``localhost:13312` 以及 `ws://localhost:12345` 这类地址,在当前问题域中不应被简单视为错误业务域或无意义噪声。结合 [多核浏览器管道API接口文档](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/多核浏览器管道API接口文档.docx) 可知,这些地址主要属于自研浏览器宿主提供的本地桥接能力,用于承接页面与浏览器内核、隐藏域、本地服务之间的通信。
结合现有文档与场景结构,宿主浏览器至少提供以下能力:
- websocket 管道通信
- 隐藏域页面打开与加载完成回调
- 指定域执行 JS
- 浏览器侧 ajax 代理
- 登录初始化与退出
- 主界面 / 隐藏域 / agent 区域切换
- 本地服务路径获取与导出相关能力
因此,`localhost:*` 不应继续直接归类为 bootstrap domain 或主业务 endpoint而应识别为 `host runtime dependency``browser bridge capability` 证据:既要从“目标业务域识别”中剥离,又不能从“运行时依赖识别”中抹掉。
后续自动转化链需要同时区分三类对象:
- 真实业务目标域,如 `yx.gs.sgcc.com.cn`
- 宿主浏览器桥接域,如 `localhost:*`、浏览器 websocket、本地服务
- 场景自身页面与静态资源
若这三类对象继续混合建模,系统将持续出现 bootstrap 错判、workflow 误归因和 readiness 过宽放行。
## Scene Family Segmentation
Based on a survey of the `102` scene directories' naming, page structure, front-end dependencies, and page-shell shapes, the scene set converges into 5 implementation families rather than `102` independent technical problems.
### Segmentation Result
The `102` scenes break down as:
1. General single-page report group: `68`
2. Multi-mode report group: `11`
3. Paginated detail enrichment group: `10`
4. Tool/detection precursor group: `8`
5. Low-priority noise group: `5`
### Segmentation Interpretation
`G1 + G2 + G3` totals `89` scenes, about `87%` of the full sample. The mainstream problem space is therefore report-style scene skill conversion, not tool-style, governance-document-style, or high-noise boundary scenes.
This breakdown yields three conclusions directly:
1. The overall route should first build stable archetypes around report scenes, not design a general super-framework for boundary scenes first
2. `G4` is not the current battlefield, but it is structurally akin to the later `200+` pure-JS detection scenes, so the architecture should reserve room for it
3. `G5` should not enter the first-round mainline; it serves as a boundary sample set for downgrade recognition and fail-closed handling
## Implementation Mapping by Scene Family
### G1: General Single-Page Report Group
Positioning:
- the main coverage group
- closest to the general definition of "single system, single page, explicit query conditions, paginate or aggregate then export a report"
Recommended archetypes:
- `single_request_table`
- `wrapped_single_mode`
Recommended phase:
- `P1`
Primary goals:
- form a high-pass-rate mass-production template
- prove the general report family can migrate at scale
Representative scenes:
- `售电收入日统计`
- `高低压新增报装容量月度统计表`
- `供电可靠率指标统计表`
- `光伏用户超容情况报表`
- `供电服务工单业务统计表`
Key acceptance points:
1. query conditions recoverable
2. request / response contract recoverable
3. data extraction correct
4. report export correct
5. reusability across same-family scenes
### G2: Multi-Mode Report Group
Positioning:
- the capability-ceiling validation group
- used to prove whether the generator has `tq`-grade business semantic recovery capability
Recommended archetype:
- `multi_mode_request`
Recommended phase:
- `P0`
Primary goal:
- validate recovery of the mode matrix, mode-switch semantics, and per-mode contracts for multi-mode reports
Representative scenes:
- `台区线损大数据-月_周累计线损率统计分析`
- `用户日电量监测`
- `线损同期差异报表`
Key acceptance points:
1. mode matrix recovered
2. mode-switch semantics recovered
3. per-mode request / response contracts recovered
4. column definitions and normalize rules recovered
5. report results correct after intranet execution
### G3: Paginated Detail Enrichment Group
Positioning:
- the complex-workflow risk-control group
- used to establish the recognition boundary for pagination, details, enrichment, and fail-closed behavior
Recommended archetype:
- `paginated_enrichment`
Recommended phase:
- `P1`
Primary goal:
- establish complex workflow recognition and fail-closed capability
Representative scenes:
- `95598工单明细表`
- `故障明细`
- `重复致电(敏感)客户信息明细表`
Key acceptance points:
1. main, pagination, and enrichment chains split correctly
2. details pulled exhaustively and correctly
3. export chain recognized correctly
4. stable fail-closed behavior when evidence is insufficient
### G4: Tool/Detection Precursor Group
Positioning:
- the precursor group outside the report mainline
- reserves host execution capability for the later `200+` detection-style scenes
Recommended archetypes:
- `embedded_page_tool`
- `page_exec_check`
Recommended phase:
- `P2`
Primary goal:
- reserve capability for in-page JS execution, host-bridge recognition, and non-report result collection
Representative scenes:
- `文件自动采集`
- `计量数据助手`
- `巡视计划完成情况自动检索`
Key acceptance points:
1. in-page JS execution can be modeled
2. host-bridge dependencies can be recognized
3. non-report results can be collected
### G5: Low-Priority Noise Group
Positioning:
- the boundary group outside the first-round mainline
Recommended phase:
- downgraded handling
Handling strategy:
1. recognize first rather than adapt
2. fail closed wherever possible
3. never let this group define mainline archetypes
## Why Current Quality Stops Around 60
The system can currently extract:
- `sceneId`
- `sceneName`
- endpoints
- pagination variables
- some filter expressions
- archetypes
- readiness risks
But these capabilities still stop at the "information extraction" layer; they have not reached the "workflow semantic recovery" layer.
Five root causes keep quality around 60:
1. The targets are still shallow
The system recognizes URLs, function names, and pagination fields, but cannot reliably recover the mode matrix, request contract, or response contract
2. A middle evidence layer is missing
Deterministic, LLM, and compiler results still flow too directly into the `Scene IR`, without an adjudicable semantic evidence layer
3. Archetype constraints are still coarse
The system recognizes `multi_mode_request` and `paginated_enrichment` but cannot reliably prove "why they hold" or "which minimum evidence supports them"
4. Business and host chains are not fully separated
`localhost`, export interfaces, BrowserAction, static assets, and template noise still easily pollute bootstrap and workflow inference
5. The runtime context is not modeled explicitly
The current generation chain still treats scenes as "page source + API extraction", but real execution depends on the unified platform, the self-developed browser, hidden-frame login, host ajax, and local bridging. Once the runtime context is absent, host bridging gets mistaken for the business main chain, or scenes that require the host environment get judged as independently runnable skills
## What 90 Means
The "90 points" in this design does not mean every scene runs automatically; it means the generator begins to show `tq`-grade business semantic recovery capability.
A scene skill only enters the 90-point band when it satisfies all of the following:
1. the parameter contract is recoverable
2. mode-switch semantics are recoverable
3. each mode's request template is recoverable
4. each mode's response extraction path is recoverable
5. column definitions and normalization rules are recoverable
6. page-context and target-domain validation exist
7. the artifact output structure is stable
8. on failure it gives business-level reasons or fails closed reliably
## Design Principles
### 1. Upgrade from "Information Extractor" to "Workflow Semantic Recoverer"
The generator cannot stay at the level of extracting URLs, function names, and fields; it must recover:
- business entry semantics
- parameter contract semantics
- mode-switch semantics
- request construction semantics
- response extraction semantics
- row normalization semantics
- workflow semantics
- local / host dependency semantics
### 2. Accumulate Evidence First, Then Reduce to Scene IR
The target main chain is:
`source -> semantic evidence layer -> evidence merge / conflict resolution -> Scene IR -> compiler`
not:
`source -> LLM summary -> Scene IR -> compiler`
### 3. Fail-Closed Takes Priority over Fail-Open
For complex scenes, incomplete evidence should downgrade to a draft or an analysis report rather than masquerade as a runnable skill.
### 4. Archetypes Must Be Driven by Workflow Evidence
The conditions for `multi_mode_request`, `single_request_table`, and `paginated_enrichment` must be supported by minimum workflow evidence sets, not by surface keyword hits.
### 5. Build Family Canonical Answers First, Then Scale Out
The route does not aim to cover the 102 scenes from the start; it first drills through a few high-value sample families, then replicates to similar scenes.
### 6. Separate the Host Layer from the Business Layer Before Archetype and Scene IR Reduction
The generator must explicitly separate three information layers:
- the business semantics layer
- the host browser capability layer
- the local service / export / login bridging layer
The business semantics layer recognizes the target system, parameter contract, mode switching, request construction, response extraction, and artifact structure. The host browser capability layer recognizes execution mechanisms such as hidden-frame loading, JS injection, browser ajax proxying, and tab or area switching. The local service / login bridging layer carries dependencies such as unified platform login, target-system backend login, local export, and host websocket communication.
If these three layers keep being modeled together, the system will repeat three error classes: mistaking `localhost` for a bootstrap domain, mistaking host bridging for business workflow, and judging scenes that lack required host capabilities as independently runnable skills. The future Scene IR, or its preceding evidence layer, must therefore complete this stratification before archetype judgment and compilation.
## Target Architecture
```text
scene source
-> source scan
-> semantic evidence extraction
-> evidence stratification
-> evidence merge / conflict resolution
-> archetype contract gating
-> Scene IR
-> compiler
-> runtime compatibility check
-> readiness grading
-> runnable skill or fail-closed report
```
## Required Semantic Recovery Domains
Subsequent work must be organized around these 8 semantic recovery domains:
1. business entry semantics
2. parameter contract semantics
3. mode-switch semantics
4. request construction semantics
5. response extraction semantics
6. row normalization semantics
7. workflow semantics
8. local / host dependency semantics
## P0 / P1 Sample Strategy
## P0 Sample Set
P0 does not pursue coverage; it separately proves:
- the capability ceiling
- stability at scale
- correct recognition of complex workflows
The P0 samples are fixed as:
1. `台区线损大数据-月_周累计线损率统计分析`
2. `用户日电量监测`
3. `95598工单明细表`
### P0-1: `台区线损大数据-月_周累计线损率统计分析`
Positioning:
- `multi_mode_request.month_week_table`
- the primary `tq` sample
#### Canonical Benchmark Mapping
`台区线损大数据-月_周累计线损率统计分析` is the original scene source of `tq-lineloss-report`. In this roadmap, `tq-lineloss-report` is not an ordinary reference skill but the first canonical benchmark for measuring "`tq`-grade business semantic recovery capability".
The automatic conversion target for this scene should not stop at "produces a runnable skill package"; it should approach the key business semantics already embodied in `tq-lineloss-report`, including:
- the `month / week` mode matrix
- each mode's request and response contracts
- column definitions, normalization rules, and export semantics
- bootstrap and target-system context constraints
Later evaluation of automatic generation quality should prioritize its fit with the key semantics of `tq-lineloss-report`, not merely whether a runnable artifact was produced.
#### Benchmark Role Clarification
`tq-lineloss-report` is a high-quality reference sample that has completed the "scene -> skill -> runs on the intranet" path, but it is not the sole hard canonical answer in this design. Later automatic results are not required to mechanically copy every surface form of `tq-lineloss-report`; they must reach the same level on key business semantics, intranet executability, and report correctness.
The role of `tq-lineloss-report` in this design is therefore to:
1. prove the "scene -> skill -> runs on the intranet" route is feasible
2. provide a high-quality semantic-recovery reference for multi-mode report scenes
3. provide a business-level comparison baseline for P0-1, not the only output template
Its significance:
- proves whether the system can recover the `mode matrix`
- benchmarks directly against the existing `tq-lineloss-report`
### P0-2: `用户日电量监测`
Positioning:
- `single_request_table` / `wrapped_single_mode`
- the single-request mass-production sample
Its significance:
- proves the system can bring the simple report family to a high pass rate
### P0-3: `95598工单明细表`
Positioning:
- `paginated_enrichment.list_detail_filter_export`
- the primary pre-research sample for paginated enrichment
Its significance:
- proves the system can at least recognize complex workflows correctly, and fail closed when evidence is insufficient
## P1 Family Expansion
### Line-Loss / Energy Multi-Mode Family
- `白银线损周报`
- `线损同期差异报表`
- `线损大数据-窃电分析`
- `供电所线路电量统计`
- `台区零度户月度用电量与台区线损电量对比核查报表`
### Single-Request Report Family
- `售电收入日统计`
- `高低压新增报装容量月度统计表`
- `电能表现场检验完成率指标报表`
- `供电可靠率指标统计表`
- `光伏用户超容情况报表`
### Paginated Enrichment Family
- `95598、12398、流程超期风险工单明细`
- `故障明细`
- `重复致电(敏感)客户信息明细表`
- `营销业务管控监测日报表`
## Minimal Implementation Roadmap
### Roadmap Prioritization Rationale
Since `G1 + G2 + G3` already covers the vast majority of the `102` scenes, roadmap priority is set by scene family rather than by business department.
The first-round priority order is fixed as:
1. First validate the semantic-recovery ceiling with the `G2` multi-mode report group
2. Then validate scalable migration with the `G1` general single-page report group
3. Then validate complex workflow handling and fail-closed behavior with the `G3` paginated detail enrichment group
`G4` is kept as the precursor group for later detection-scene expansion; `G5` does not enter the first-round mainline.
## Phase 1: Drill Through the Primary `tq` Sample
Primary scene:
- `台区线损大数据-月_周累计线损率统计分析`
The three critical milestones:
1. reliably recover the full `mode matrix`
2. close the parameter-contract loop
3. make compilation output structurally isomorphic to the hand-written `tq` skill
## Phase 2: Build the Single-Request Mass-Production Template
Primary scene:
- `用户日电量监测`
The three critical milestones:
1. reliably recover the request / response / normalize trio
2. shrink the pseudo-generic fallback main path
3. prove reusability across same-family single-request scenes
## Phase 3: Correct Recognition and Blocking for Paginated Enrichment
Primary scene:
- `95598工单明细表`
The three critical milestones:
1. correctly split the main, enrichment, and export chains
2. establish the minimum compilable evidence set for `paginated_enrichment`
3. fail closed reliably when evidence is insufficient
## Three Global Preconditions
Before entering the three phases above, three global precondition milestones must be met:
1. establish an adjudicable semantic evidence layer
2. establish the minimum compilable business contract
3. freeze the canonical answers for the P0 samples
## Precondition 1: Semantic Evidence Layer
Minimum objects that must land:
1. a unified evidence object schema
2. the core evidence type set
3. evidence merge rules
4. the mapping boundary from evidence to the `Scene IR`
The minimum evidence type set should include:
- `bootstrap_candidate`
- `endpoint_candidate`
- `mode_candidate`
- `request_template_candidate`
- `response_path_candidate`
- `column_defs_candidate`
- `normalize_rules_candidate`
- `workflow_candidate`
- `localhost_dependency_candidate`
- `browser_action_candidate`
- `export_candidate`
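The evidence type set above can be sketched as a single enum plus a merge helper, so deterministic and LLM candidates land in one adjudicable pool. The types and the highest-confidence merge rule are illustrative assumptions:

```rust
// Hypothetical unified evidence pool for the semantic evidence layer.
#[derive(Debug, PartialEq)]
enum EvidenceKind {
    BootstrapCandidate,
    EndpointCandidate,
    ModeCandidate,
    RequestTemplateCandidate,
    ResponsePathCandidate,
    ColumnDefsCandidate,
    NormalizeRulesCandidate,
    WorkflowCandidate,
    LocalhostDependencyCandidate,
    BrowserActionCandidate,
    ExportCandidate,
}

struct Evidence {
    kind: EvidenceKind,
    source: String, // e.g. "deterministic: page constant" or "llm: summary"
    confidence: f32,
}

/// One possible merge rule: keep the highest-confidence candidate per kind.
fn merge_highest(evidence: &[Evidence], kind: EvidenceKind) -> Option<&Evidence> {
    evidence
        .iter()
        .filter(|e| e.kind == kind)
        .max_by(|a, b| a.confidence.partial_cmp(&b.confidence).unwrap())
}

fn main() {
    let pool = vec![
        Evidence { kind: EvidenceKind::BootstrapCandidate, source: "page constant sourceUrl".to_string(), confidence: 0.9 },
        Evidence { kind: EvidenceKind::BootstrapCandidate, source: "directory fallback".to_string(), confidence: 0.2 },
        Evidence { kind: EvidenceKind::ExportCandidate, source: "localhost export call".to_string(), confidence: 0.8 },
    ];
    let best = merge_highest(&pool, EvidenceKind::BootstrapCandidate).unwrap();
    assert_eq!(best.source, "page constant sourceUrl");
}
```

Real conflict resolution would weigh evidence tiers (deterministic over LLM) rather than raw confidence alone; that policy is left open here.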
## Precondition 2: Minimal Compilable Business Contract
Minimum objects that must land:
1. the minimum contract table per archetype
2. the contract gate list
3. blocking rules
4. the minimum output contract per archetype
Unified gate names must at least include:
- `bootstrap_resolved`
- `request_contract_complete`
- `response_contract_complete`
- `workflow_contract_complete`
- `runtime_contract_compatible`
## Precondition 3: P0 Canonical Answers
Minimum objects that must land:
1. the canonical `Scene IR` for each of the three P0 samples
2. the key evidence checklist for each sample
3. acceptance criteria for each sample
4. a failure taxonomy for each sample
## Acceptance Criteria
The completion marker for this design is not "every scene can generate a skill", but that the following hold:
1. The primary `tq` sample reliably recovers the mode matrix and produces results highly isomorphic to the hand-written version
2. The primary single-request sample reliably recovers the request / response / normalize trio and extends to multiple same-family scenes
3. The primary paginated-enrichment sample reliably recognizes its problem space and fails closed when evidence is insufficient
4. An adjudicable semantic evidence layer exists before the `Scene IR`, rather than full-text summaries flowing straight into the compiler
5. Archetypes have explicit minimum compilable business contracts, and the generator no longer fabricates runnable skills
## Risks
1. If prompts keep being strengthened directly without adding the evidence layer, quality gains will quickly hit a ceiling
2. Without a minimum compilable contract, the compiler will keep swallowing results that "look like business IR but whose evidence does not close"
3. Without frozen P0 canonical answers, later regression and migration validation lose their calibration baseline
4. If the paginated enrichment family becomes the main battlefield too early, localhost, export chains, and host chains will pollute bootstrap and workflow again
## Open Questions
1. Should the evidence layer be introduced as a standalone `Evidence IR`, or first as embedded fields extending the current `Scene IR`?
2. Should `localhost_dependency_candidate`, `browser_action_candidate`, and `export_candidate` be hard gates in the first version?
3. Do the P0 canonical answers need to be frozen as a dual baseline of fixtures plus golden IR?
4. For complex paginated-enrichment scenes, should phase one allow generating a `draft skill`, or only analysis reports?
## G1 Boundary Refinement
After this rectification round, `G1` no longer means "every scene that looks like a report"; it converges to "the general single-page report family with a single system, a single main request, and directly recoverable request and table contracts".
`G1` may only admit the following structures:
1. Single system, single main page; the main flow does not depend on complex host bridging.
2. A recognizable main query entry exists, and the request template, response path, and column definitions can be recovered directly.
3. The result body is a single table or a one-shot summary, without local persistence followed by secondary analysis.
4. The primary output form is a direct tabular result, an Excel result, or an equivalent one-shot data summary.
Any of the following features excludes a scene from `G1` directly:
1. The main business chain advances through `BrowserAction`, `sgBrowserExcuteJsCode`, or similar host-bridge interfaces.
2. The page needs multiple rounds of chained callbacks / sub-requests to complete the final result.
3. One scene contains categorized inventorying, bucketed aggregation, or endpoint sweeps across multiple business endpoints.
4. Report generation first requires local persistence, SQL aggregation, secondary analysis, or document assembly.
5. The primary output is not a direct table but a two-stage artifact such as a Word/topic document generated after capture.
On this basis, `G1` gains one upper-boundary subtype: `G1-E`, the light-enrichment summary type. It still belongs to `G1` but allows the mild extension of "one main request + a few supplementary queries", provided the main query contract stays clear, the final output remains a one-shot summary table, and the supplementary chain does not escalate into a host-bridged multi-step workflow.
## Family Reassignment
Based on the first round of real sample migration and structural review, the assignment of the following 4 boundary samples is frozen; they are no longer mixed in as ordinary `G1` samples:
| Sample scene | Previous slot | Final family | Reassignment rationale |
| --- | --- | --- | --- |
| 高低压新增报装容量月度统计表 | G1 candidate | G1-E light-enrichment summary type | The main query still exists, but the page also contains a few supplementary-query endpoints; better suited as a G1 upper-boundary sample than as pure `single_request_table`. |
| 电能表现场检验完成率指标报表 | G1 candidate | G6 host-bridged multi-step query type | The main chain depends on `BrowserAction` / `sgBrowserExcuteJsCode` with chained callbacks; the multi-step query traits are clear. |
| 计量资产库存统计 | G1 candidate | G7 multi-endpoint inventory summary type | The page sweeps multiple asset-statistics endpoints by category; this is multi-endpoint inventory summarization, not single-request table recovery. |
| 95598供电服务月报 | G1 candidate | G8 capture-persist-analyze-to-document type | Data is captured first, then persisted via `localhost` and analyzed with SQL, and finally a document is produced; structurally beyond the direct-report family. |
To keep family boundaries from drifting again, this design also freezes the minimum definitions of these 4 new and revised families:
1. `G1-E` light-enrichment summary type
A clear main query, light supplementary queries, and a final result that still reduces to a one-shot summary.
2. `G6` host-bridged multi-step query type
Business queries must advance through host-bridge interfaces, with an explicit multi-step workflow or callback chain.
3. `G7` multi-endpoint inventory summary type
The page pulls from multiple endpoints item by item and then aggregates; the core difficulties are endpoint grouping, metric alignment, and aggregation assembly.
4. `G8` capture-persist-analyze-to-document type
Page capture is only the first stage; local storage, SQL analysis, or document generation follows.
## Implementation Impact
The direct impact of this boundary rectification is not "a few new names"; it adjusts the downstream implementation order, validation criteria, and gate strategy.
First, `single_request_table` no longer acts as the fallback for everything that "looks like a report". For scenes whose evidence falls into the following structures, the compiler should fail closed rather than keep disguising them as runnable `G1` skills:
1. host-bridge-dominated scenes.
2. multi-endpoint inventory summary scenes.
3. local-persist-then-analyze scenes.
Second, the implementation order is fixed as:
1. `G1-E` first
2. then `G6`
3. then `G7`
4. `G8` last
The rationale for this order:
1. `G1-E` is closest to the existing `G1` capability, the natural first step after tightening the boundary.
2. `G6` first requires solving host bridging and workflow evidence modeling.
3. `G7` handles multi-endpoint grouping and aggregation separately, after `G6`.
4. `G8` depends on the full capture, persist, analyze, and document back-chain; it is the most complex and safest to do last.
Finally, this rectification changes the downstream acceptance criteria:
1. `G1` acceptance shifts from "was a skill generated" to "was a complete and self-consistent main request contract recovered".
2. Boundary samples may no longer pass merely because `single_request_table` generation succeeded.
3. Until the new families land, their samples should output explicit family assignment and blocking reasons instead of low-quality pseudo-runnable skills.

View File

@@ -0,0 +1,375 @@
# Scene Skill Compiler Design
> **Status:** Draft
> **Date:** 2026-04-17
> **Author:** Codex
## Problem Statement
The current `sg_scene_generate` already has basic scene recognition, LLM extraction, and template rendering, but overall it is still closer to a "scene metadata extractor + template filler" than a truly reusable general skill converter.
This shows clearly on two comparison samples:
| Sample | Conversion method | Result |
|------|----------|------|
| `tq-lineloss-report` | Claude-based semantic reconstruction | Output is close to a runnable skill; it explicitly expresses month/week modes, request bodies, column definitions, and response paths |
| `marketing-zero-consumer-report` | Current project's automatic conversion | Output is a "skeleton skill" that cannot correctly express the composite workflow of pagination, per-user enrichment, filtering, and export |
### Root Causes
1. The current pipeline mainly extracts "fields"; it does not reliably extract "workflows".
2. The generator assumes by default that report scenes resemble "single request -> table normalization -> artifact output".
3. `scene.toml`, bootstrap, the parameter contract, and the browser script are still heavily hard-coded to each other.
4. LLM input is truncated and lacks prioritized extraction of key logic fragments.
5. The runtime resolvers are weak and cannot reliably absorb more complex automatically generated results.
### Typical Failure Mode
Taking `marketing-zero-consumer-report` as the example, the original scene actually:
1. Fetches the org or user list.
2. Pulls user data page by page.
3. Issues a secondary request per user to fill in charge information.
4. Applies the business filter `charge !== 0`.
5. Assembles the export data and exports it through the local service.
But the current automatic result misclassified it as a "single-request report", so:
- the bootstrap domain and target-page source are unstable.
- the browser script uses only the first API endpoint.
- normalization applies directly to the raw list, with no pagination loop or secondary requests.
- the generated parameter contract does not match the real business flow.
## Goal
Upgrade the current scene generator into a mini "skill-creator / scene skill compiler" for common intranet scenes, with these capabilities:
1. Understand the scene workflow first, then pick a template and generate the skill.
2. Cover common report-style intranet scenes, not just single-request table scenes.
3. Produce a runnability assessment before generation, reducing "generated successfully but fails on the intranet".
4. Let similar scenes reuse one conversion mechanism instead of being rewritten one by one.
## Non-Goals
1. Do not aim to support all historical scenes and all front-end stacks at once.
2. Do not solve every login, authentication, cross-origin, and browser-host-difference problem in phase one.
3. Do not require the LLM to complete semantic recovery alone; rule-based extraction must participate.
4. Do not require generated results to be 100% free of human review.
## Design Principles
### 1. Model First, Then Generate
The original scene must first be modeled into a unified `Scene IR`; the compiler then renders the skill by archetype.
### 2. Extracting "Workflow Evidence" Beats Extracting "Field Inventories"
For a general skill converter, pagination, mode switching, secondary requests, export actions, and filter conditions matter more than bare URLs and column definitions.
### 3. Deterministic First, LLM Completes
Deterministic information such as URLs, request methods, pagination variables, entry functions, column headers, and export calls is extracted by rules first; the LLM handles semantic merging, naming, and completion.
### 4. Templates Split by Archetype
One generic template can no longer cover all report scenes. Each workflow archetype must have its own compilation path.
### 5. The Runtime Contract Must Match Generation Capability
Whatever parameter contract the generator emits, the runtime resolvers must be able to absorb; otherwise the generator must downgrade or request manual completion.
## Architecture
### Target Architecture
```text
scene source directory
-> scene scanner
-> deterministic rule extraction
-> LLM semantic completion
-> Scene IR merge
-> archetype classification
-> archetype compiler
-> skill package
-> static acceptance / runnability grading
```
### Core Pipeline
1. Scan `index.html`, `scripts/*.js`, the directory structure, and visible dependencies.
2. The rule extractor captures deterministic evidence.
3. The LLM extracts high-level semantics from chunked key context.
4. Merge into a unified `Scene IR`.
5. Route to the matching compiler by `workflowArchetype`.
6. Generate `scene.toml`, `SKILL.toml`, the browser script, and companion docs.
7. Run static gates and readiness grading.
## Scene IR
### Top-Level Fields
```json
{
"sceneId": "marketing-zero-consumer-report",
"sceneName": "营销2.0零度户报表数据生成",
"sceneKind": "report_collection",
"workflowArchetype": "paginated_enrichment",
"bootstrap": {
"expectedDomain": "yx.gs.sgcc.com.cn",
"targetUrl": "http://yx.gs.sgcc.com.cn"
},
"params": [
{
"name": "org_code",
"resolver": "org_tree",
"required": true
}
],
"modes": [],
"workflowSteps": [
{ "type": "paginate", "entry": "getUserList" },
{ "type": "foreach", "source": "userList" },
{ "type": "secondary_request", "entry": "getUserCharges" },
{ "type": "filter", "expr": "charge !== 0" },
{ "type": "export", "entry": "exportExcel" }
],
"requestTemplate": {},
"responsePath": "data.rows",
"normalizeRules": {
"type": "field_map"
},
"artifactContract": {
"type": "report-artifact"
},
"validationHints": {
"requiresTargetPage": true
},
"evidence": []
}
```
### Required IR Blocks
| Block | Purpose |
|------|---------|
| `workflowArchetype` | decides compiler routing |
| `bootstrap` | decides the target domain, target page, and helper-page behavior |
| `params` | decides the `scene.toml` parameter contract |
| `modes` | expresses multi-mode logic such as month/week, day/month, or report-type switching |
| `workflowSteps` | expresses composite flows such as pagination, loops, secondary requests, filtering, and export |
| `requestTemplate` | expresses fixed request bodies and parameter mappings |
| `responsePath` | specifies the response data extraction path |
| `normalizeRules` | specifies field mapping, empty-row filtering, and key-field validation |
| `artifactContract` | specifies the output artifact structure and status semantics |
| `evidence` | retains extraction evidence for UI preview and human review |
## Workflow Archetypes
Phase one should first stabilize support for these archetypes:
| Archetype | Scene traits | Typical sample |
|-----------|----------|----------|
| `single_request_table` | a single request directly returns a table or list | simple report scenes |
| `multi_mode_request` | one scene has month/week or similar modes; request body and column definitions switch per mode | `tq-lineloss-report` |
| `paginated_enrichment` | paginate the main list first, then enrich per item or batch, then filter or export | `marketing-zero-consumer-report` |
| `page_state_eval` | leans toward state checks, page verdicts, and light collection | monitoring or state-verdict scenes |
### Routing Rules
1. 有显式模式切换条件,优先判定为 `multi_mode_request`
2. 有分页调用且伴随逐条二次请求,优先判定为 `paginated_enrichment`
3. 无明确请求链、以页面状态判定为主,归到 `page_state_eval`
4. 其余默认归到 `single_request_table`,但标记为低置信度。
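The four routing rules above can be sketched as a priority match. This is an illustrative sketch only: the enum and signal-struct names are assumptions, not the actual sgClaw types.

```rust
/// Hypothetical archetype labels; the real type lives in the generator crate.
#[derive(Debug, PartialEq)]
enum Archetype {
    SingleRequestTable,
    MultiModeRequest,
    PaginatedEnrichment,
    PageStateEval,
}

/// Deterministic signals assumed to come out of the rule extractor.
struct Signals {
    has_mode_switch: bool,
    has_pagination: bool,
    has_secondary_request: bool,
    has_request_chain: bool,
}

/// Applies the four routing rules in priority order; the final branch is the
/// default fallback and should be flagged low-confidence by the caller.
fn route(sig: &Signals) -> Archetype {
    if sig.has_mode_switch {
        Archetype::MultiModeRequest
    } else if sig.has_pagination && sig.has_secondary_request {
        Archetype::PaginatedEnrichment
    } else if !sig.has_request_chain {
        Archetype::PageStateEval
    } else {
        Archetype::SingleRequestTable
    }
}
```

Keeping the rules in one ordered function makes the precedence explicit, so mode signals can never be outvoted by pagination noise.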
## Extraction Architecture
### Stage 1: Deterministic Extraction
The rule extractor is responsible for high-certainty information:
1. URLs, request methods, `contentType`, and how request bodies are assembled.
2. Pagination parameters such as `page`, `rows`, `pageSize`, `sidx`, `sord`.
3. Entry, export, list, and detail functions.
4. `if/switch` mode branches.
5. Table headers, column definitions, field mappings.
6. Explicit filter conditions such as `charge !== 0`.
### Stage 2: LLM Semantic Completion
The LLM is responsible for:
1. Merging and naming the rule-extraction results.
2. Completing the high-level description of `workflowSteps`.
3. Judging the archetype.
4. Inferring `requestTemplate`, `responsePath`, and `normalizeRules`.
5. Emitting uncertain items and confidence scores.
### Input Strategy
The current "truncate to the first 15000/3000 characters" approach must be replaced with:
1. A directory-structure summary.
2. Chunked `index.html`.
3. Priority injection of function fragments that hit key patterns.
4. Priority injection of URLs and request-building statements.
5. Staged questioning per scene instead of stuffing the full context in at once.
## Compiler Architecture
### Compiler Split
Split the current single generator into multiple compilers behind archetype routing:
| Compiler | Responsibility |
|----------|----------------|
| `single_request_table` compiler | Generates simple table-collection skills |
| `multi_mode_request` compiler | Generates multi-mode switching skills |
| `paginated_enrichment` compiler | Generates pagination + enrichment + filtering skills |
| `page_state_eval` compiler | Generates state-judgment or lightweight monitoring skills |
### Compiler Outputs
Every compiler is responsible for:
1. Generating `scene.toml`.
2. Generating `SKILL.toml`.
3. Generating the browser script.
4. Generating referenced docs such as `collection-flow.md`.
5. Emitting a readiness rating and risk notes.
### Marketing Case
`marketing-zero-consumer-report` must go through the `paginated_enrichment` compiler and at minimum produce these logic skeletons:
1. Paginated collection of the main list.
2. A secondary request per user.
3. Field aggregation.
4. Business filtering.
5. Export or report-artifact output.
### TQ Case
`tq-lineloss-report` must go through the `multi_mode_request` compiler and at minimum produce these logic skeletons:
1. Mode detection.
2. Mode-specific request-body construction.
3. Mode-specific column definitions.
4. Fixed response-path extraction.
5. Unified artifact output.
## Runtime Contract Alignment
The current runtime parameter resolvers live mainly in [src/compat/scene_platform/resolvers.rs] and are still fairly basic. The compiler design must handle this constraint explicitly.
### Short-Term Strategy
There are two short-term options:
1. Extend the resolver set to support more parameter contracts.
2. Constrain generation output to only the parameter models the current runtime can digest.
### Recommended Resolver Additions
Phase one should add the following resolvers:
- `mode_enum`
- `date_range`
- `org_tree`
- `page_size`
- `hidden_static`
- `derived_param`
If the resolvers are not extended for now, the generator must clearly flag the "incompatible with runtime contract" risk in the UI and in the generation report.
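A minimal sketch of how the proposed resolver names could map onto a closed runtime set. The real registry in `src/compat/scene_platform/resolvers.rs` may differ; unknown names returning `None` is one way the "incompatible with runtime contract" risk could surface mechanically.

```rust
/// Hypothetical closed set of resolver kinds mirroring the list above.
#[derive(Debug, PartialEq)]
enum ResolverKind {
    ModeEnum,
    DateRange,
    OrgTree,
    PageSize,
    HiddenStatic,
    DerivedParam,
}

/// Maps a scene.toml resolver name to a kind; an unknown name returns None,
/// which the generator should report as a runtime-contract incompatibility.
fn resolver_kind(name: &str) -> Option<ResolverKind> {
    match name {
        "mode_enum" => Some(ResolverKind::ModeEnum),
        "date_range" => Some(ResolverKind::DateRange),
        "org_tree" => Some(ResolverKind::OrgTree),
        "page_size" => Some(ResolverKind::PageSize),
        "hidden_static" => Some(ResolverKind::HiddenStatic),
        "derived_param" => Some(ResolverKind::DerivedParam),
        _ => None,
    }
}
```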
## Verification And Readiness Gates
### Static Gates
After generation, the result must first pass the static gates:
1. Was a business entry point identified?
2. Was the core request chain identified?
3. Was the correct bootstrap identified?
4. Does the parameter contract match the archetype?
5. Does the compiler cover all key steps?
### Readiness Levels
Each generated result should carry a graded label:
| Level | Meaning |
|-------|---------|
| `A` | Ready for an intranet trial run |
| `B` | Structurally correct, but manual review is recommended before a trial run |
| `C` | Draft only; manual logic completion is required |
### Minimum Acceptance For Reference Scenes
`marketing-zero-consumer-report`:
1. Classified as `paginated_enrichment`.
2. The main-list request and the secondary enrichment request are identified.
3. The `charge !== 0` filter logic is identified.
4. The result no longer degrades to a single-request table template.
`tq-lineloss-report`:
1. Classified as `multi_mode_request`.
2. The month/week dual modes are identified.
3. The mode-switch field, request bodies, and column definitions are distinguished per mode.
4. The result is structurally isomorphic to the handwritten skill.
## File Impact
### New Or Modified Areas
| File | Responsibility |
|------|----------------|
| `frontend/scene-generator/llm-client.js` | Deep-extraction schema, chunked context, confidence output |
| `frontend/scene-generator/generator-runner.js` | File reading, key-fragment extraction, directory summaries |
| `frontend/scene-generator/server.js` | Analysis endpoints, IR pass-through, generation reports |
| `frontend/scene-generator/sg_scene_generator.html` | Extraction preview, risk display, readiness display |
| `src/bin/sg_scene_generate.rs` | Accepts `Scene IR` or IR JSON arguments |
| `src/generated_scene/analyzer.rs` | Deterministic extraction, archetype-assist detection |
| `src/generated_scene/generator.rs` | Archetype routing and multi-compiler orchestration |
| `src/generated_scene/ir.rs` | Defines the unified `Scene IR` |
| `src/compat/scene_platform/resolvers.rs` | Parameter-contract alignment and extension |
## Migration Strategy
### Phase 1
Fix the current obvious errors first:
1. Correct the bootstrap source.
2. Remove the generic-report default hardcoding.
3. Replace the truncation-based LLM input.
4. Show an extraction preview before generation.
### Phase 2
Introduce the `Scene IR` and complete the "model first, generate second" mainline refactor.
### Phase 3
Wire in the archetype classifier and the multiple compilers.
### Phase 4
Backfill the runtime resolvers and add the readiness gates.
## Open Questions
1. Should the `Scene IR` land as a standalone JSON file in the output directory so it can be reused and replayed later?
2. Should `page_state_eval` keep sharing the current `report-artifact`, or get its own artifact type?
3. Should users be allowed to manually correct the archetype, bootstrap, and parameter contract in the Web UI before generating?
## Acceptance Criteria
This design meets expectations when:
1. One generation pipeline covers both `tq-lineloss-report` and `marketing-zero-consumer-report`, two clearly different scene classes.
2. `marketing` no longer inevitably fails on the intranet because of a wrong archetype.
3. `scene.toml` no longer defaults to wrong organization, period, and title keywords.
4. Generated results carry an explicit readiness grade so users can see risk before generating.
5. The generator is repositioned from a "template filler" to a "general scene skill compiler".

# G1-E Light Enrichment Report Design
**Goal:** Define the formal implementation scope of the `G1-E` light-enrichment summary subtype so the generator can reliably recover compilable business semantics within the "one main request + a few enrichment requests + one summary output" boundary, clearly separated from plain `G1 single_request_table`.
**Status:** Draft
---
## Decision Summary
1. `G1-E` is the upper-boundary subtype of `G1`, not a transition bucket toward `G6/G7/G8`.
2. `G1-E` only takes report scenes with a clear main query, lightweight enrichment, and a final merge into a single summary result.
3. The compilation target of `G1-E` is not to keep masquerading as pure `single_request_table`, but to explicitly emit a three-part "main request + light enrichment + summary shaping" contract.
4. `G1-E` must recover three evidence classes: main-request evidence, enrichment-request evidence, and summary-mapping evidence.
5. When the enrichment chain escalates to a host-bridge workflow, multi-endpoint inventory, or local-persistence analysis, the system must fail closed and reclassify rather than keep generating a `G1-E` skill.
6. `高低压新增报装容量月度统计表` is the current `G1-E` P0 exemplar, used to freeze the minimal compilable answer.
---
## Why This Family Exists
The problem with `G1` today is not just a low recognition rate; it mixes two structures:
1. Truly single-main-request, single-table reports.
2. Light summary reports that add a few enrichment, completion, or mapping requests on top of the main request.
Both look like "ordinary reports", but forcing the second class into `single_request_table` causes:
1. Only the main page state gets extracted; the real business requests do not.
2. Even when generation succeeds, there is no enrichment contract, so the run result is incomplete.
3. The generator mistakes enrichment-style samples for the generic template and keeps polluting the `G1` family.
These scenes therefore need their own `G1-E` bucket, which preserves their place at the upper boundary of general reports while preventing them from posing as pure single-request reports.
---
## Canonical P0 Sample
The canonical `G1-E` P0 sample is fixed to:
- `高低压新增报装容量月度统计表`
Its structural traits:
1. A clear main query entry exists: `getWkorderAll`.
2. A few enrichment requests exist:
- `queryElectCustInfo`
- `queryBusAcpt`
- `getBatchPerCust97`
3. The final output is still a single statistical summary, not a host-driven multi-step task, and not a "persist locally, analyze, then produce a document" pipeline.
Phase-one `G1-E` implementation and acceptance both target this sample; no horizontal family expansion happens in this phase.
---
## Non-Negotiable Boundaries
### 1. `G1-E` stays in the direct-report family
`G1-E` must keep the basic property that the business query is recoverable directly from the scene page; it must not introduce:
1. Host-bridge-driven execution.
2. Explicit multi-step workflows chained via callbacks.
3. Multi-endpoint categorized inventory followed by unified aggregation.
4. Two-stage back halves such as `localhost` persistence, SQL analysis, or document export.
### 2. `G1-E` is not a fallback class
A scene enters `G1-E` only when the main request is explicit, the enrichment count is bounded, and each enrichment has a single responsibility.
If a sample merely "looks more complex than G1" but its evidence cannot converge to the light-enrichment model, it must be blocked and re-routed.
### 3. Compiler output must express the enrichment chain explicitly
For `G1-E`, the generator can no longer emit a vague "request + table" skeleton.
The output structure must make visible:
1. Which request is the main one.
2. What each enrichment request fills in.
3. How enrichment results are merged back into the main result.
---
## Family Definition
The minimal definition of the `G1-E` light-enrichment summary subtype is:
1. A recognizable main query request pulls the main list or main statistics.
2. A small number of enrichment requests, typically `1-3`, each with a clear responsibility and no open-ended workflow.
3. Enrichment triggers are derivable from main-result row fields, fixed context parameters, or a bounded enumeration dimension.
4. The final output is still a single summary table or statistics result, with no local persistence and re-analysis.
5. The page as a whole still reads as one report task, not a stitch of independent business flows.
---
## Evidence Requirements
`G1-E` must recover at least the following three evidence layers.
### 1. Main Request Evidence
Must recover:
1. The main-request endpoint.
2. The main-request parameter template.
3. The main-request response path.
4. The main-table or main-result field mapping.
### 2. Enrichment Request Evidence
For each enrichment request, must recover:
1. The enrichment endpoint.
2. The trigger condition.
3. The sources of key input parameters.
4. The consumed subset of the returned fields.
### 3. Merge / Normalize Evidence
Must recover:
1. The join keys between main and enrichment results.
2. The rules producing summary, supplemental, or mapped columns.
3. The mapping between final output fields and source fields.
If any layer is too incomplete to close the loop, `G1-E` must block instead of pseudo-generating a runnable skill.
---
## Scene IR Contract
The minimal `G1-E` `Scene IR` should no longer reuse the flat structure of pure `single_request_table`; it extends into three parts:
1. `main_request`
- the main query definition
2. `enrichment_requests[]`
- the list of enrichment requests
3. `merge_plan`
- merge-back, field-completion, and final summary rules between main and enrichment results
Suggested minimal fields:
- `main_request.endpoint`
- `main_request.params`
- `main_request.response_path`
- `main_request.columns`
- `enrichment_requests[].endpoint`
- `enrichment_requests[].param_bindings`
- `enrichment_requests[].response_path`
- `enrichment_requests[].consumed_fields`
- `merge_plan.join_keys`
- `merge_plan.field_mappings`
- `merge_plan.aggregate_rules`
- `merge_plan.output_columns`
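The three-part contract above can be sketched as a minimal IR fragment. This is an illustrative sketch: the endpoint names come from the P0 sample listed earlier, but the parameter names, response paths, join keys, and column names are assumptions, not extracted values.

```json
{
  "main_request": {
    "endpoint": "getWkorderAll",
    "params": { "statMonth": "${stat_month}" },
    "response_path": "data.rows",
    "columns": ["custNo", "orgName", "applyCapacity"]
  },
  "enrichment_requests": [
    {
      "endpoint": "queryElectCustInfo",
      "param_bindings": { "custNo": "row.custNo" },
      "response_path": "data",
      "consumed_fields": ["voltType"]
    }
  ],
  "merge_plan": {
    "join_keys": ["custNo"],
    "field_mappings": { "row.voltType": "volt_type" },
    "aggregate_rules": [],
    "output_columns": ["custNo", "orgName", "volt_type", "applyCapacity"]
  }
}
```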
---
## Compiler Contract
The `G1-E` compile stage adds at least these gates:
1. `main_request_resolved`
- is the main request fully recovered
2. `enrichment_requests_resolved`
- are all required enrichment requests identified
3. `merge_plan_resolved`
- are the main/enrichment merge-back rules recovered
4. `g1e_scope_compatible`
- is the scene still inside the light-enrichment boundary rather than drifting into `G6/G7/G8`
Required compiler behavior:
1. If `main_request_resolved = false`, block immediately.
2. If enrichment requests appear to exist but `enrichment_requests_resolved = false`, do not degrade to a `G1` success.
3. If `merge_plan_resolved = false`, do not emit a pseudo skill that lacks merge-back logic.
4. If host bridging, multi-endpoint scanning, or persistence analysis is detected, block and suggest a family re-routing.
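The gate behavior above can be sketched as a first-blocker check. The gate names match this spec; the struct and the blocking function are illustrative, not the actual compiler types.

```rust
/// Hypothetical gate results produced by the G1-E compile stage.
struct G1eGates {
    main_request_resolved: bool,
    enrichment_requests_resolved: bool,
    merge_plan_resolved: bool,
    g1e_scope_compatible: bool,
}

/// Returns the first blocking reason, or None when compilation may proceed.
/// Missing enrichment or merge evidence must block rather than degrade to G1.
fn first_blocker(g: &G1eGates) -> Option<&'static str> {
    if !g.main_request_resolved {
        Some("missing_main_request")
    } else if !g.enrichment_requests_resolved {
        Some("missing_enrichment_request")
    } else if !g.merge_plan_resolved {
        Some("missing_merge_plan")
    } else if !g.g1e_scope_compatible {
        Some("scope_upgraded")
    } else {
        None
    }
}
```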
---
## Runtime Shape
The target runtime shape of `G1-E` is fixed to:
1. Run the main request first.
2. Trigger bounded enrichment based on the main result.
3. Merge enrichment results back into the main result.
4. Emit a single summary result.
"Bounded enrichment" must stay controlled:
1. No unbounded recursion.
2. No escalation into host-driven multi-step orchestration.
3. No drift into endpoint inventory scanning.
---
## Failure Taxonomy
The first `G1-E` version must explicitly distinguish at least these failure types:
1. `missing_main_request`
- the main request was not recovered
2. `missing_enrichment_request`
- enrichment requests exist but were not fully recovered
3. `missing_merge_plan`
- both the main and enrichment chains are visible, but the merge-back relation is incomplete
4. `scope_upgraded_to_g6`
- actually a host-bridge multi-step query
5. `scope_upgraded_to_g7`
- actually a multi-endpoint inventory summary
6. `scope_upgraded_to_g8`
- actually a capture, persist, analyze, and produce-document pipeline
---
## Acceptance Criteria
Phase-one `G1-E` completion does not mean "some skill directory was produced"; it requires:
1. `高低压新增报装容量月度统计表` reliably recovers the three-part semantics of main request, enrichment requests, and merge rules.
2. Results no longer degrade to empty shells carrying only `page_state_eval`, `params=[]`, and `requestEntries=[]`.
3. The compiler no longer misjudges results that lack the enrichment contract as plain `G1` successes.
4. When a sample crosses the boundary, the system blocks explicitly and points to `G6/G7/G8` instead of emitting a low-quality skill.
---
## Out of Scope
This spec currently does not cover:
1. Workflow modeling for `G6` host-bridge multi-step queries.
2. The multi-endpoint inventory framework for `G7` inventory-summary scenes.
3. The local storage and document-generation back half of `G8` capture-and-document pipelines.
4. Large-scale family expansion beyond `G1-E`.
---
## Next Step
Once this spec is frozen, the next step is to derive the corresponding implementation plan, scoped to exactly three objects:
1. Completing the `G1-E` evidence layer.
2. Landing the three-part `G1-E` `Scene IR` and compiler gates.
3. P0 exemplar validation of `高低压新增报装容量月度统计表`.

# G2 Family Remediation Design
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
## Problem Statement
The first round of real-sample migration has completed real generation and benchmark analysis for three representative `G2` samples:
1. `台区线损大数据-月_周累计线损率统计分析`
2. `白银线损周报`
3. `线损同期差异报表`
The analysis conclusions are already fixed in:
1. `docs/superpowers/reports/2026-04-18-r1-real-tq-lineloss-analysis.md`
2. `docs/superpowers/reports/2026-04-18-g2-first-round-blocker-summary.md`
3. `docs/superpowers/reports/2026-04-18-first-round-migration-and-candidate-validation-report.md`
The current problem is no longer "can the generator produce a skill package"; it is:
> The generator can produce packages, but for the `G2` line-loss multi-mode report family it still cannot recover the main business semantics, so no sample reaches the candidate-validation gate.
The `G2` family reproducibly hits these blockers:
1. The archetype collapses from `multi_mode_request` to `paginated_enrichment`.
2. Bootstrap consistently lands on the wrong `20.77.115.36:31051`.
3. `modes = []`; only default fields remain, with no real mode structure.
4. `requestTemplate = null`; the parameter contract is empty.
5. `columnDefs = []`; column semantics and required fields are missing.
6. Endpoint candidates are heavily polluted by static dependencies, external links, and other business systems.
7. Readiness is over-optimistic and detached from real runnability.
The remediation goal is therefore not to keep expanding samples, but to break the core `G2` contradiction first.
## Goal
Without leaving the existing `Scene IR + generator + readiness` framework, lift the `G2` family from "extracts partial signals" to "reliably recovers the line-loss multi-mode report main chain".
The direct goals of the remediation phase:
1. Correctly identify the `G2` family archetype as `multi_mode_request`.
2. Correctly recover the main business bootstrap instead of landing on a wrong entry domain.
3. Recover the `month/week` mode matrix.
4. Recover the mode-specific request contract.
5. Recover the mode-specific response path / column defs / normalize rules.
6. Build effective isolation against endpoint pollution.
7. Align readiness with real business closure so it is no longer inflated.
## Success Criteria
After remediation, `G2` samples must at least satisfy:
1. `workflowArchetype = multi_mode_request`
2. `bootstrap.expectedDomain` and `targetUrl` anchored to the line-loss business-bearing surface
3. Non-empty `modes` containing at least `month` and `week`
4. An explicit request contract per mode
5. An explicit response path and column defs per mode
6. Generated scripts no longer degrading into the generic `paginate -> secondary_request -> filter` skeleton
7. Readiness no longer granting `A` while core contracts are missing
The pass bar for remediation remains "reaches the candidate-validation gate"; it does not claim intranet runnability.
## Non-Goals
This remediation phase does not:
1. Extend to `G1/G3` family remediation.
2. Solve unified-platform login, target-system backend login, or host authentication recovery.
3. Rewrite the whole `Scene IR` framework.
4. Abstract every BrowserAction chain into a brand-new runtime model.
5. Chase coverage of all 102 scenes.
## Scope
The remediation scope is strictly limited to:
1. `G2` family semantic recognition and the compile chain.
2. Analyzer / evidence / generator / readiness logic directly tied to `G2` recognition.
3. `G2` fixtures, tests, and benchmarking baselines.
It does not enter:
1. `G1` mass-production optimization.
2. `G3` complex pagination-enrichment workflow remediation.
3. Runtime transport, browser-bridge protocol, or login-chain refactors.
## Root Cause Framing
Per the first-round reports, the current `G2` distortion is not a single bug but four stacked layers:
### 1. Misplaced Signal Weighting
The system did capture:
1. `month/week/tjzq/mode`
2. the core line-loss endpoints
3. `responsePath = content`
But at decision time, pagination, enrichment, and filter noise carried more weight, so the wrong archetype won.
### 2. Misplaced Bootstrap Selection
The bootstrap candidate logic conflates the "visible entry page" with the "real business-bearing page", so `targetUrl` consistently lands wrong.
### 3. Missing Mode Reconstruction
The system sees the surface mode signals but never promotes them into:
1. a mode matrix
2. a per-mode request builder
3. a per-mode response parser
4. a per-mode column / normalize contract
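A minimal sketch of what the promoted mode structure could look like, assuming the evidence signals listed above (`tjzq`, `month/week`, `responsePath = content`); all other field names and values here are illustrative, not extracted.

```json
{
  "modeSwitchField": "tjzq",
  "modes": [
    {
      "id": "month",
      "requestTemplate": { "tjzq": "month", "statDate": "${stat_month}" },
      "responsePath": "content",
      "columnDefs": ["orgName", "monthLineLossRate"]
    },
    {
      "id": "week",
      "requestTemplate": { "tjzq": "week", "statDate": "${stat_week}" },
      "responsePath": "content",
      "columnDefs": ["orgName", "weekLineLossRate"]
    }
  ]
}
```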
### 4. Overly Permissive Readiness Gate
Current readiness measures "structural generation completeness" rather than "business contract closure", so wrong results get waved through with high scores.
## Design Principles
The remediation phase follows these principles:
1. Fix the decision first, then the templates.
2. Fix the main chain first, then the copy.
3. Narrow the `G2` boundary first, then expand to other families.
4. Every fix must land in regressable fixtures and tests.
5. Any `G2` sample that cannot close must fail closed instead of posing as a candidate skill.
## Workstreams
The remediation phase splits into five workstreams:
### WS1: G2 Archetype Rectification
Objective:
Stop pagination/enrichment noise from hijacking the `G2` family; `multi_mode_request` must win first.
Includes:
1. Tightening the `G2` archetype recognition criteria.
2. Raising the weight of `month/week/tjzq/mode` signals.
3. Lowering the misleading pull of generic pagination signals on `G2`.
### WS2: Bootstrap Rectification
Objective:
Make bootstrap selection target the real business-bearing surface, not a page shell or a wrong entry.
Includes:
1. Distinguishing entry pages, shell pages, and the real main business page.
2. Keeping `localhost:*`, static resources, and external links excluded.
3. Adding a main-business bootstrap selection constraint for `G2`.
### WS3: Mode Contract Reconstruction
Objective:
Recover the `month/week` mode matrix from the evidence layer and emit mode-specific contracts.
Includes:
1. Identifying the mode switch field.
2. Recovering `modes[]`.
3. Recovering a request template per mode.
4. Recovering response path / column defs / normalize rules per mode.
### WS4: Endpoint Purification
Objective:
Separate real business endpoints from dependency-library, external-link, and foreign-system noise.
Includes:
1. Filtering third-party library and documentation URLs.
2. Filtering static resources and dependency-package strings.
3. Raising the ranking weight of line-loss business endpoint candidates.
### WS5: Readiness Tightening
Objective:
Make readiness represent "contract closure", not "generation finished".
Includes:
1. Adding mandatory `G2` gates.
2. Downgrading when `modes / request / columnDefs` are missing.
3. Blocking inflated `A` grades.
## Required Deliverables
The remediation phase must produce at least:
1. The `G2` remediation plan.
2. Updated `G2` fixtures and canonical comparison assets.
3. `G2` regression tests.
4. The second-round real-migration report after remediation.
## Acceptance
Remediation acceptance is judged on the three `G2` family samples:
1. `台区线损大数据-月_周累计线损率统计分析`
2. `白银线损周报`
3. `线损同期差异报表`
At minimum:
1. The three no longer uniformly collapse into `paginated_enrichment`.
2. At least the first sample reaches the candidate-validation gate.
3. The second and third samples either emit `G2` contracts closer to the real structure, or fail closed explicitly when evidence is insufficient.
## Next Step
Based on this design, the next step lands:
- `docs/superpowers/plans/2026-04-18-g2-remediation-plan.md`
That plan covers `G2` remediation only; it does not extend to `G1/G3` or large-scale scene migration.

# G3 Paginated Enrichment Design
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
> **Upstream Inputs:**
> [2026-04-17-scene-skill-60-to-90-roadmap-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-17-scene-skill-60-to-90-roadmap-design.md)
> [2026-04-17-scene-skill-60-to-90-roadmap-plan.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/plans/2026-04-17-scene-skill-60-to-90-roadmap-plan.md)
> [2026-04-18-first-real-scene-migration-execution-sheet.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-18-first-real-scene-migration-execution-sheet.md)
## Problem Statement
The `60 -> 90` mainline has reached staged convergence on the `G2` multi-mode report family and the `G1-E` light-enrichment subtype:
1. `G2` has advanced from "the main sample cannot take shape" to "the main sample plus several variants are in candidate validation".
2. `G1` boundary tightening is done, and the first `G1-E` `P0` exemplar has passed validation.
But `P0-3` on the mainline is still missing:
1. `paginated_enrichment` has no formal `spec / plan / canonical` triplet yet.
2. `95598工单明细表` is designated the `P0-3` main exemplar but has no unified evidence layer, minimal contract, or failure taxonomy.
3. The system still lacks a formal path that, when facing mixed pagination, enrichment, export, and host-bridge workflows, first decomposes correctly and then decides to generate or block.
So the question is no longer "whether to keep adding `G2` variants"; it is:
> The `G3` pagination-enrichment family must move from "known to be complex" to "structurally decomposable, contract-adjudicable, and reliably fail-closed when evidence is insufficient".
## Goal
Without overturning the existing `Scene IR + compiler + readiness` framework, upgrade `G3` from a loose label for pagination-enrichment scenes into a modelable, compilable, blockable complex-workflow archetype.
The direct goals of this design:
1. Correctly identify the `G3` main request chain, pagination chain, enrichment chain, and export chain.
2. Separate the business workflow from host-bridge behavior so `localhost:*`, BrowserAction, or host injection no longer seizes the business main chain.
3. Establish the minimal compilable evidence set for `paginated_enrichment`.
4. Establish the minimal `G3` business contract and gates.
5. Make `95598工单明细表` the first `P0-3` canonical answer and failure baseline.
6. Fail closed reliably when evidence is insufficient; stop producing pseudo-runnable skills.
## Success Criteria
After the first `G3` design round, the minimum success bar is fixed as:
1. `95598工单明细表` is no longer treated as an "ordinary paginated table" or a "vague workflow".
2. The system can explicitly decompose:
- `main request chain`
- `pagination chain`
- `enrichment chain`
- `export chain`
3. The system can explicitly distinguish:
- `business workflow evidence`
- `host bridge / localhost dependency evidence`
4. `paginated_enrichment` has a minimal contract instead of being just a fallback archetype name.
5. With insufficient evidence, results `fail-closed` according to a fixed taxonomy.
6. The first `G3` regression round either reaches the candidate-validation gate or gives an accurate blocking reason.
## Non-Goals
This `G3` round does not:
1. Extend remediation to all `102` scenes at once.
2. Expand `G6/G7/G8` in parallel.
3. Solve unified-platform login, hidden-field login recovery, or host transport refactors.
4. Require restoring every complex work-order business detail in round one.
5. Balloon into "unlock the whole 95598 family in one pass".
6. Loosen gates just to produce skills sooner.
## Scope
This `G3` round is strictly limited to:
1. `G3` family boundary definition.
2. `G3` evidence-layer modeling.
3. `G3` minimal contract and gate design.
4. `G3` `P0-3` canonical baseline design.
5. First-round regression scope for `95598工单明细表` plus one `G3` expansion sample.
This round does not enter:
1. Large-scale `95598` scene expansion.
2. Re-bucketing all work-order subfamilies.
3. Runtime protocol changes.
4. Post-export processing, persistence analysis, document production, and other higher-level artifact chains.
## Fixed Samples
This design freezes the following exemplars:
### P0 Main Sample
1. `95598工单明细表`
Positioning:
1. `paginated_enrichment.list_detail_filter_export`
2. `P0-3` main exemplar
3. The sole calibration source for the first `G3 canonical`
### P1 Expansion Sample
1. `95598、12398、流程超期风险工单明细`
Positioning:
1. The first `G3` expansion exemplar.
2. Validates whether the `P0-3` contract and evidence layer are reusable.
No third first-round `G3` exemplar is added before this design completes.
## Family Definition
The formal `G3` definition is fixed as:
> Complex-workflow report scenes whose main chain is paginated detail pull-through, accompanied by detail enrichment, related backfill, filter/dedupe, export actions, or staged aggregation.
Under this definition, `G3` exhibits one or more of the following traits:
1. A clear main query interface exists, but the final result is not available in one page.
2. Explicit paginate-to-completion or a rolling time window is needed.
3. Secondary enrichment or related detail queries per list row are needed.
4. Main, enrichment, and export chains coexist.
5. The final artifact depends on paginated-detail completeness, not on a single request's response.
## Inclusion Rules
The `G3` entry conditions are fixed as:
1. A main query chain candidate exists.
2. Pagination-control evidence exists.
3. Enrichment, detail, or secondary-chain evidence exists.
4. The end goal is detail pull-through, completion, filtering, export, or aggregation.
5. The business chain can be layered apart from the host-bridge chain.
## Exclusion Rules
A scene leaves the current `G3` when any of the following holds:
1. Only a single request returning a table exists, with no pagination-and-enrichment loop.
2. Only page-click chains exist, with no recoverable business main chain.
3. The main value sits in local persistence, SQL analysis, or a Word artifact pipeline, and the business main chain cannot be recovered.
4. `localhost:*` or host-bridge actions overwhelm the business-request evidence.
5. The core problem is host multi-step bridging or document production, not pagination enrichment.
## Root Cause Framing
`G3` has lagged behind not because it is "too complex to do", but because of three foundational gaps:
### 1. Workflow Signals Are Still Flattened
The current pipeline is better at extracting:
1. endpoint names
2. parameter fragments
3. export call traces
4. BrowserAction or page-control traces
But it lacks a mechanism to rebuild these signals into a layered workflow, so:
1. The main chain and the enrichment chain get entangled.
2. The export chain is easily mistaken for the main business chain.
3. Host bridging and the business chain get entangled.
### 2. Paginated Contract Is Missing
The system has no `G3`-specific minimal contract, so it cannot answer:
1. What counts as "pagination chain recovered".
2. What counts as "enrichment chain recovered".
3. What counts as "join key established".
4. What counts as "the export chain is an auxiliary action, not the main chain".
### 3. Fail-Closed Taxonomy Is Missing
Even when the system realizes a result must not pass, there is no fixed failure-type table, which leads to:
1. Vague blocking reasons.
2. Unexplainable readiness.
3. Results unusable for later regression.
## Design Principles
The `G3` design phase follows these principles:
1. Decompose the workflow first, then discuss generation.
2. Build the evidence layer first, then the contract.
3. Isolate the host chain first, then recover the business main chain.
4. Prioritize accurate `fail-closed` decisions over a high pass rate.
5. Every rule must land in `fixture / test / report`.
## Required Evidence Types
On top of the generic evidence layer, the first-round minimal `G3` evidence type set is fixed as:
1. `main_request_candidate`
2. `pagination_candidate`
3. `enrichment_request_candidate`
4. `join_key_candidate`
5. `export_candidate`
6. `workflow_step_candidate`
7. `dedupe_or_merge_rule_candidate`
8. `host_bridge_candidate`
9. `localhost_dependency_candidate`
10. `browser_action_candidate`
## Evidence Layer Requirements
The `G3` evidence layer must at minimum answer:
1. What is the main query chain?
2. Where does pagination control come from?
3. Which enrichment-chain candidates exist?
4. Which fields link the main chain and the enrichment chain?
5. Does the export action belong to the business chain or to the artifact chain?
6. Which behaviors are host bridging or local dependencies?
## Minimal Business Contract
The minimal compilable `G3` contract includes at least:
1. `main_request`
2. `pagination_plan`
3. `enrichment_requests[]`
4. `join_keys[]`
5. `export_plan`
6. `merge_or_dedupe_rules`
Only when all of these objects close may `G3` enter a compilable state.
## Required Gates
The unified `G3` gate set includes at least:
1. `g3_main_request_resolved`
2. `g3_pagination_contract_complete`
3. `g3_enrichment_contract_complete`
4. `g3_join_key_resolved`
5. `g3_export_path_identified`
6. `g3_runtime_scope_compatible`
## Fail-Closed Policy
The following cases must explicitly `fail-closed`:
1. The main request chain is missing.
2. A pagination chain exists but its termination condition is unclear.
3. An enrichment chain exists but the join key is unclear.
4. Only export actions exist, with no business main chain.
5. Host-bridge evidence clearly outweighs business evidence.
6. Runtime dependencies clearly exceed the current `G3` contract boundary.
## P0 Canonical Target
Once the `95598工单明细表` canonical baseline is complete, at least these assets are frozen:
1. The canonical `Scene IR`.
2. The key evidence checklist.
3. The minimal contract table.
4. The acceptance checklist.
5. The failure taxonomy.
## Failure Taxonomy
The first `G3` failure taxonomy includes at least:
1. `main_chain_missing`
2. `pagination_incomplete`
3. `enrichment_incomplete`
4. `join_key_missing`
5. `export_only_without_business_chain`
6. `host_bridge_pollution`
7. `runtime_dependency_unresolved`
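The taxonomy above could be frozen as a closed enum with stable string codes, so blockers stay machine-checkable across regression rounds. The Rust shape is illustrative; only the string codes come from the list above.

```rust
/// Closed set of first-round G3 failure types; variants mirror the taxonomy.
#[derive(Debug, PartialEq)]
enum G3Failure {
    MainChainMissing,
    PaginationIncomplete,
    EnrichmentIncomplete,
    JoinKeyMissing,
    ExportOnlyWithoutBusinessChain,
    HostBridgePollution,
    RuntimeDependencyUnresolved,
}

/// Stable string codes for reports and regression baselines.
fn code(f: &G3Failure) -> &'static str {
    match f {
        G3Failure::MainChainMissing => "main_chain_missing",
        G3Failure::PaginationIncomplete => "pagination_incomplete",
        G3Failure::EnrichmentIncomplete => "enrichment_incomplete",
        G3Failure::JoinKeyMissing => "join_key_missing",
        G3Failure::ExportOnlyWithoutBusinessChain => "export_only_without_business_chain",
        G3Failure::HostBridgePollution => "host_bridge_pollution",
        G3Failure::RuntimeDependencyUnresolved => "runtime_dependency_unresolved",
    }
}
```

An exhaustive `match` means adding a new failure type forces every report site to handle it, which keeps the taxonomy and the reports in lockstep.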
## Validation Baseline
`G3` regressions must check, under one unified rubric:
1. Is the archetype correct?
2. Is the bootstrap reasonable?
3. Is the main request chain recovered?
4. Is the pagination chain recovered?
5. Is the enrichment chain recovered?
6. Is the join key recovered?
7. Is the export chain recovered?
8. Is the host chain isolated?
9. Are readiness and blockers explainable?
## Required Deliverables
Landing this design produces at least:
1. The `G3` design doc.
2. The `G3` implementation plan.
3. First-round `G3` `fixture / test` expansion targets.
4. The canonical design target for `95598工单明细表`.
5. The first-round `G3` validation report template.
## Acceptance
This design is complete when:
1. `G3` has moved from a loose label into a formal archetype design.
2. `95598工单明细表` is fixed as the `P0-3` main exemplar.
3. The `G3` evidence layer, minimal contract, gates, and fail-closed rubric are explicitly defined.
4. Later implementation no longer treats `G3` as the "fallback for anything complex".
## Next Step
Based on this design, the next step lands:
- `docs/superpowers/plans/2026-04-18-g3-paginated-enrichment-plan.md`
The `plan` covers `G3 / P0-3` implementation only; it does not extend to `G6/G7/G8` or a full-scene rollout.

# G6 Host Bridge Workflow Design
> Date: 2026-04-18
> Status: Initial implementation slice
## Goal
Define `G6 宿主桥接多步查询型` (host-bridge multi-step query) as a separate scene family so boundary samples no longer fall back into `G1` or `G1-E`.
The initial implementation goal is classification and fail-closed safety, not runnable generation.
## Family Definition
`G6` covers scenes where the business workflow is primarily advanced by host-browser bridge actions instead of direct request contracts.
Minimum signals:
1. explicit host bridge action such as `BrowserAction(...)`
2. explicit browser script bridge such as `sgBrowserExcuteJsCode(...)`
3. callback-driven request progression
4. business endpoints nested behind the host callback chain
5. optional `localhost:*` dependency as host runtime evidence
## P0 Boundary Sample
`电能表现场检验完成率指标报表`
Repo-local representative:
`tests/fixtures/generated_scene/g6_host_bridge_workflow`
## Contract Policy
The first slice intentionally does not generate runnable skills for `G6`.
Instead, it must:
1. classify the scene as `host_bridge_workflow`
2. preserve host bridge actions as evidence
3. preserve `localhost:*` dependencies as host-runtime evidence
4. prevent fallback to `single_request_table`
5. prevent fallback to `single_request_enrichment`
6. fail closed with a stable blocker
## Non-Goals
1. no host transport redesign
2. no callback runtime implementation
3. no full browser bridge orchestration
4. no broad `G7/G8` expansion
5. no weakening of `G1-E` or `G3` gates
## Readiness Gates
The first slice adds these G6-specific gates:
1. `g6_host_bridge_detected`
2. `g6_fail_closed`
`g6_fail_closed` is expected to fail until a real G6 runtime contract exists.
## Acceptance Criteria
1. `G6` fixture is classified as `host_bridge_workflow`
2. generation fails closed instead of writing a pseudo-runnable skill
3. ordinary localhost export noise does not get promoted to `G6`
4. existing `G1-E`, `G3`, `G2`, and canonical tests remain green

# G7 Multi Endpoint Inventory Design
> Date: 2026-04-18
> Status: Initial implementation slice
## Goal
Define `G7 多接口盘点汇总型` (multi-endpoint inventory summary) as a separate family so multi-endpoint inventory scenes no longer fall back into `G1` or `G1-E`.
The first implementation slice is classification and fail-closed safety only.
## Family Definition
`G7` covers scenes that query multiple inventory/statistics endpoints by asset category and aggregate the results into one report.
Minimum signals:
1. three or more inventory/statistics endpoints
2. endpoint names or URLs carrying `assetStats`, `inventory`, `stock`, `AcqTrml`, `MeterCommonModule`, or `JlGnModule`
3. no explicit host bridge action requirement
4. no local SQL/document-generation pipeline requirement
## P0 Boundary Sample
`计量资产库存统计`
Repo-local representative:
`tests/fixtures/generated_scene/g7_multi_endpoint_inventory`
## Contract Policy
The first slice intentionally blocks runnable generation until a real G7 inventory contract exists.
The initial system must:
1. classify as `multi_endpoint_inventory`
2. preserve inventory endpoint evidence
3. avoid fallback to `single_request_table`
4. avoid fallback to `single_request_enrichment`
5. fail closed with a stable blocker
## Acceptance Criteria
1. the representative fixture classifies as `multi_endpoint_inventory`
2. at least five inventory endpoints are detected in the fixture
3. generation fails closed
4. existing `G1-E`, `G3`, `G6`, and `G2` regressions remain green

# G8 Local Document Pipeline Design
> Date: 2026-04-18
> Status: Initial implementation slice
## Goal
Define `G8 抓取落库分析出文档型` (capture, persist, analyze, produce documents) as a separate family so local storage, SQL analysis, and document-generation scenes no longer fall back into `G1`, `G1-E`, `G6`, or `G3`.
The first implementation slice is classification and fail-closed safety only.
## Family Definition
`G8` covers scenes where page/browser data capture is only the front half of the workflow. The business result depends on a downstream local pipeline:
1. local service persistence or `selectData`
2. SQL analysis such as `definedSqlQuery`
3. document generation such as `docExport`
4. optional host bridge actions
5. optional `localhost:*` dependencies
## P0 Boundary Sample
`95598供电服务月报`
Repo-local representative:
`tests/fixtures/generated_scene/g8_local_doc_pipeline`
## Contract Policy
The first slice intentionally blocks runnable generation until a real G8 local document pipeline contract exists.
The initial system must:
1. classify as `local_doc_pipeline`
2. preserve local pipeline evidence
3. avoid fallback to `page_state_eval`
4. avoid fallback to `host_bridge_workflow`
5. avoid fallback to `single_request_table`
6. fail closed with a stable blocker
## Priority Rule
When both host bridge and local document pipeline signals exist, `G8` wins over `G6`.
Reason: `G6` is about host-bridge-driven query progression; `G8` is about the downstream local storage, SQL, and document production chain.
## Acceptance Criteria
1. representative fixture classifies as `local_doc_pipeline`
2. local pipeline actions include `definedSqlQuery`
3. local pipeline actions include `docExport`
4. local pipeline actions include `selectData`
5. generation fails closed
6. existing `G1-E`, `G3`, `G6`, `G7`, and `G2` regressions remain green

# Scene Generator Ops Console Design
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
## Problem Statement
Although the current `http://127.0.0.1:3210/` page already supports scene selection, deep analysis, skill generation, and log display, its default form is still closer to a "developer debug console" than an "ops execution workbench".
The main problems:
1. The first screen shows too much: configuration, analysis results, technical detail, and logs are all expanded at once, which creates heavy cognitive load.
2. Large amounts of English titles, field names, and technical jargon are exposed directly to operators, raising the comprehension cost.
3. Debug data such as `Scene IR`, `workflowArchetype`, `requestTemplate`, and `evidence` is visible by default, which does not fit the default ops use case.
4. The page currently serves developer debugging first, not ops execution, result confirmation, and troubleshooting.
The page therefore needs to converge from a "debug panel" into an "ops-facing scene skill generation workbench", lowering the entry bar via information layering, Chinese copy, and a dual-mode design.
## Goal
Without weakening the existing analysis and generation capabilities, restructure the scene generator page to:
1. Serve ops execution by default.
2. Use Chinese copy by default.
3. Show only conclusions, actions, and results by default.
4. Fold technical detail into a debug layer.
The staged goal after the restructure is that operators can complete the following tasks without understanding the underlying `Scene IR` or archetype jargon:
1. Pick a scene directory.
2. Start analysis.
3. Judge whether generation is possible.
4. Start generation.
5. View the output directory or the failure reason.
## Non-Goals
1. Do not change the scene generator backend API protocol in this UI round.
2. Do not refactor the analysis algorithms or generation logic in this round.
3. Do not delete existing debug info; only adjust default visibility and information layering.
4. Do not attempt a complete visual redesign in one pass.
## User Roles
The page must clearly distinguish two user types:
### 1. Ops operators
Mainly care about:
1. Which scene to process.
2. Whether generation is currently possible.
3. Why generation is not possible.
4. Where the generated result is.
5. Whether manual confirmation is needed.
### 2. Developers / debuggers
Mainly care about:
1. `workflowArchetype`
2. `Scene IR`
3. `requestTemplate`
4. `evidence`
5. `bootstrap`
6. the raw log stream
The default UI must serve ops operators first; developers and debuggers reach the second information layer via "technical details".
## Design Principles
### 1. Ops mode by default
The home page defaults to the "ops execution workbench", not the "technical debug panel".
### 2. Conclusions before evidence
The first screen shows first:
1. Current status.
2. Scene recognition result.
3. Executability assessment.
4. Risk summary.
5. Generation result.
Technical evidence, raw structures, and low-level logs come later.
### 3. Chinese by default
Ops-facing titles, buttons, statuses, risk notes, and result copy should all be in Chinese.
### 4. Technical detail folded
`Scene IR`, `evidence`, `requestTemplate`, `workflow steps`, and similar data go into "技术详情(调试用)", folded by default.
### 5. Business-shaped status
Do not show raw fields such as `Readiness A/B/C` or `workflowArchetype` to ops; map them to readable business statuses.
## Information Architecture
The page should converge into the following five areas:
### 1. Top overview area
Explains the page's purpose and overall status at a glance.
Suggested contents:
1. Page title.
2. Page subtitle.
3. Service status.
4. Current status.
5. Last action time.
### 2. Left main action area
Carries the inputs and actions that ops use daily.
Suggested contents:
1. Scene directory.
2. Scene name.
3. Output directory.
4. Start analysis.
5. Generate skill.
6. Restart.
7. Advanced settings (folded).
### 3. Right result summary area
The core first-screen area, carrying:
1. Scene recognition result.
2. Executability assessment.
3. Risk notes.
4. Generation result.
### 4. Bottom execution process area
Shows key execution logs as Chinese summaries, not the raw developer SSE stream.
### 5. Technical detail area
Folded by default; for development and troubleshooting only.
Suggested contents:
1. Scene recognition details.
2. Endpoint and request information.
3. Execution steps.
4. Mode information.
5. Recognition evidence.
6. Risks and missing items.
7. Raw JSON / Scene IR.
8. Raw technical logs.
## Default Page Layout
Suggested page structure:
```text
[标题区]
场景 Skill 生成工作台
当前状态 | 服务状态 | 最近操作时间
[左侧:操作区]
场景目录
场景名称
输出目录
开始分析
生成 Skill
高级设置(折叠)
[右侧:结果摘要区]
  卡片1:场景识别结果
  卡片2:可执行性评估
  卡片3:风险提示
  卡片4:生成结果
[底部:执行过程]
中文摘要日志
[折叠区:技术详情(调试用)]
场景识别详情
工作流步骤
模式信息
请求模板
证据与风险
原始 JSON
原始技术日志
```
## Default Field Visibility
### Tier 1: ops must see
Always visible by default:
1. Scene directory.
2. Scene name.
3. Output directory.
4. Current status.
5. Scene type.
6. Executability assessment.
7. Risk summary.
8. Generation result.
9. Output directory / result files.
### Tier 2: ops occasionally see
Shown in brief by default:
1. Target system.
2. Output type.
3. Last execution result.
4. Blocking reason.
### Tier 3: developers / debuggers see
Folded by default:
1. `scene-id`
2. `scene-kind`
3. `targetUrl override`
4. `workflow archetype override`
5. `requestTemplate`
6. `staticParams`
7. `evidence`
8. `confidence`
9. `bootstrap domain`
10. `workflow steps`
11. `endpoints`
12. raw SSE logs
## Chinese Copy Strategy
### Page Title
Suggested title:
- `场景 Skill 生成工作台`
Suggested subtitle:
- `用于分析场景、生成 Skill,并查看内网执行准备情况`
### Main Action Labels
Suggested button copy:
1. `选择目录`
2. `开始分析`
3. `生成 Skill`
4. `重新开始`
5. `恢复默认`
6. `打开输出目录`
7. `查看结果文件`
### Section Titles
Suggested section titles:
1. `场景操作`
2. `分析结果`
3. `场景识别结果`
4. `可执行性评估`
5. `风险提示`
6. `生成结果`
7. `执行过程`
8. `技术详情(调试用)`
### Status Copy
Suggested primary page statuses:
1. `待选择场景`
2. `已选择场景,待分析`
3. `分析中`
4. `分析完成`
5. `可直接生成`
6. `可生成但需确认`
7. `暂不建议生成`
8. `生成中`
9. `生成完成`
10. `生成失败`
## Readiness Mapping
Do not show raw `Readiness A/B/C` to ops; map it as:
1. `A -> 可直接生成`
2. `B -> 可生成但需确认`
3. `C -> 暂不建议生成`
## Archetype Mapping
Do not show English archetype names to ops; map them as:
1. `single_request_table -> 单页报表`
2. `wrapped_single_mode -> 单页报表`
3. `multi_mode_request -> 多模式报表`
4. `paginated_enrichment -> 分页明细`
5. `page_state_eval -> 页面检测`
6. `embedded_page_tool -> 工具场景`
7. `page_exec_check -> 检测场景`
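The two mappings above amount to a pure lookup from an internal grade to operator-facing copy. A minimal sketch, assuming a single-character readiness grade; the label strings are the suggested UI copy from the mapping above, and the function itself is illustrative.

```rust
/// Maps an internal readiness grade to the proposed operator-facing copy.
/// Unknown grades get a neutral label instead of leaking the raw value.
fn readiness_label(level: char) -> &'static str {
    match level {
        'A' => "可直接生成",
        'B' => "可生成但需确认",
        'C' => "暂不建议生成",
        _ => "未知状态",
    }
}
```

Keeping this as one lookup (rather than scattering grade checks through the UI) means the ops copy can be revised in a single place.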
## Result Copy Examples
The executability assessment area should use Chinese business-state copy, for example:
1. `已识别完整查询链与报表输出链,可直接生成`
2. `主要流程已识别,但存在部分风险,建议确认后生成`
3. `当前缺少关键执行信息,暂不建议直接生成`
The risk notes area should use short Chinese risk lines, for example:
1. `未识别完整分页链`
2. `导出规则识别不完整`
3. `目标系统地址存在冲突`
4. `场景类型识别置信度偏低`
5. `存在宿主桥接依赖,需内网环境验证`
The execution process area should use Chinese summary logs, for example:
1. `已开始分析场景`
2. `已完成基础信息识别`
3. `已完成深度分析`
4. `已识别场景类型:多模式报表`
5. `已开始生成 Skill`
6. `Skill 已生成完成`
7. `输出目录:xxx`
8. `生成失败:未识别完整分页补数链`
## Interaction Model
### Default Flow
The default ops flow should converge to:
1. Pick the scene directory.
2. Click start analysis.
3. Review the analysis summary.
4. Click generate skill.
5. View the output directory or the failure reason.
### Advanced Flow
Only on analysis failure, generation failure, or troubleshooting does the user enter:
1. Expand risk details.
2. Expand technical details.
3. View raw logs and recognition evidence.
## Implementation Priorities
This UI improvement should proceed in the following order:
1. Chinese copy by default.
2. Technical detail hidden by default.
3. Only "status summary + actions + results" shown by default.
4. Chinese log area.
5. Advanced settings folded.
6. Technical details folded.
## Acceptance Criteria
This UI round is complete when:
1. Operators can complete scene analysis and skill generation without understanding terms such as `Scene IR` or `workflowArchetype`.
2. The first screen no longer shows large untranslated English titles or low-level technical fields.
3. The first screen carries mainly "status summary + actions + results", with technical detail folded by default.
4. The page serves ops execution by default, while technical debugging remains available in the second-layer area.
## Open Questions
1. Is an explicit "ops mode / debug mode" toggle needed, instead of layering by folding alone?
2. Should result files be openable directly from the page?
3. Should the risk area distinguish "blockers" from "reminders"?

# sgClaw Scene Skill Post-Roadmap Execution Design
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
> **Upstream Plan:** [2026-04-17-scene-skill-60-to-90-roadmap-plan.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/plans/2026-04-17-scene-skill-60-to-90-roadmap-plan.md)
## Problem Statement
The current `60-to-90 roadmap` has already completed the planned mainline scope:
1. `G2` is now a code-backed promoted family baseline with no remaining queue item.
2. `G1-E` is now a code-backed promoted family baseline with no remaining queue item.
3. `G3` is now a code-backed promoted family baseline with no remaining queue item.
4. `G6/G7/G8` remain established boundary-runtime families.
5. `Track E` already has frozen snapshot, current overlay, family assets, and roadmap status assets.
This means the next problem is no longer:
`How do we finish the current roadmap?`
It is now:
`How do we convert completed repo-local roadmap assets into a stable execution board, a real-sample validation program, and a bounded next-stage roadmap without reopening old implementation work?`
## Goal
Define the next-stage execution design after the current roadmap closure, with three explicit goals:
1. unify the current `102-scene` execution state into one authoritative board
2. introduce real-sample validation as the next quality gate above repo-local fixture success
3. prepare the next bounded roadmap for boundary families and runtime gaps without silently extending the old roadmap
## Success Definition
The next stage is considered successful when:
1. every currently known scene has a stable current-state label in one execution board
2. `repo-local baseline success` and `real-sample success` are explicitly separated
3. the next roadmap boundary is written down before new implementation work begins
4. deferred families and runtime gaps have explicit entry criteria instead of ad hoc expansion
## Scope
This next-stage design includes:
1. current execution-board unification
2. real-sample validation planning and first-round recording
3. boundary-family and runtime-gap prioritization
4. next-stage roadmap design and plan assets
This design does not include:
1. reopening `G1/G2/G3` P0/P1 compiler work already completed
2. unlimited fixture expansion
3. full `102-scene` end-to-end runtime rollout
4. direct implementation of unified login recovery
5. direct implementation of all host-runtime and transport gaps
## Current Baseline
The current repo already has the following stable assets:
1. `roadmap_execution_status_2026-04-18.json`
2. `scene_ledger_snapshot_2026-04-18.json`
3. `scene_ledger_status_2026-04-18.json`
4. `p1_family_manifest.json`
5. `p1_family_results.json`
Together they show that the roadmap mainline is complete at the repo-local level, but they do not yet provide:
1. one unified `102-scene current execution board`
2. one authoritative real-sample validation layer
3. one explicit next-stage roadmap boundary
## Design Principles
1. Do not extend the old roadmap silently.
2. Keep `repo-local promotion` and `real-world validation` as separate stages.
3. Treat family assets as stable inputs, not as temporary scratch data.
4. Keep `G4/G5` deferred until a new entry decision is documented.
5. Keep runtime-gap planning separate from archetype-family planning.
6. Keep execution-board work minimal and subordinate to real-sample validation.
7. Move into real-sample validation as soon as `G2`, `G1-E`, and `G3` each have one mappable real sample.
8. Defer any new asset that does not directly support current validation execution.
## Workstream Model
The next stage is divided into four workstreams:
1. `WS1` Current Execution Board Unification
2. `WS2` Real Sample Validation
3. `WS3` Boundary and Runtime Gap Planning
4. `WS4` Next Roadmap Definition
## WS1: Current Execution Board Unification
### Intent
Unify the frozen snapshot, current overlay, family assets, and roadmap status into one authoritative scene-execution board.
This board is a support layer for validation, not a new standalone asset program.
### Required Outputs
1. current execution board
2. snapshot-vs-current diff table
3. family-to-scene mapping table
### Acceptance
1. every scene has one current-state label
2. promoted baseline and promoted expansion states are visible at scene level
3. no manual cross-reading across multiple assets is required to know current status
4. the board stays limited to the minimum structure required by real-sample validation
## WS2: Real Sample Validation
### Intent
Introduce the next quality layer above fixture success by validating representative real samples for current mainline families.
Once one mappable real sample exists for each of `G2`, `G1-E`, and `G3`, this workstream takes priority over further board refinement.
### Required Outputs
1. real-sample validation plan
2. first-round validation records
3. mismatch taxonomy
4. execution-board status updates
### Acceptance
1. each mainline family has at least one real-sample validation record
2. real-world mismatch reasons are explicit
3. fixture success is no longer treated as the final success state
4. validation execution is not blocked by nonessential board or reporting assets
## WS3: Boundary and Runtime Gap Planning
### Intent
Prepare the next bounded scope by deciding what should happen with `G4/G5` and with runtime gaps that the current roadmap intentionally excluded.
### Required Outputs
1. boundary family readiness notes
2. deferred family entry criteria
3. runtime gap matrix
4. prioritization note for next implementation round
### Acceptance
1. `G4/G5` do not enter by drift
2. runtime gaps have explicit classifications
3. next implementation round has a documented reason for scope choice
## WS4: Next Roadmap Definition
### Intent
Write the next bounded roadmap instead of continuing indefinitely under the old one.
### Required Outputs
1. post-roadmap design
2. post-roadmap plan
3. milestone table
4. new completion criteria
### Acceptance
1. the next stage has its own scope guardrails
2. the next stage has its own completion criteria
3. new work no longer depends on stretching the old roadmap beyond closure
## Completion Criteria
This design is considered fully executed when:
1. the current roadmap is explicitly marked completed in execution assets
2. the execution board is unified
3. real-sample validation has begun with formal records
4. a new bounded roadmap exists for post-roadmap work


@@ -0,0 +1,64 @@
# sgClaw Scene Skill Real Sample Validation Roadmap Design
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
> **Upstream Plan:** [2026-04-18-scene-skill-post-roadmap-execution-plan.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/plans/2026-04-18-scene-skill-post-roadmap-execution-plan.md)
## Problem Statement
The completed `60-to-90 roadmap` established repo-local promoted baselines for `G2`, `G1-E`, and `G3`, but the next quality barrier is no longer family promotion.
It is now real-sample validation:
1. `G2` already has a real mismatch anchor.
2. `G1-E` already has a real pass anchor.
3. `G3` is now selected into the real-sample queue but still lacks an executed real-run record.
4. `G6/G7/G8` remain boundary families until runtime gaps are explicitly closed.
The next roadmap must therefore be validation-first instead of asset-first.
## Goal
Define the next bounded roadmap around three immediate goals:
1. convert current selected real samples into formal pass/mismatch/fail-closed records
2. use validation pressure to decide whether boundary families or deferred families should enter implementation
3. keep execution-board work subordinate to validation rather than growing into a new asset program
## Scope
This roadmap includes:
1. real-sample execution for currently selected `G2/G1-E/G3` anchors
2. validation-result-driven scope decisions for `G6/G7/G8`
3. entry decisions for `G4/G5` only after explicit criteria are met
This roadmap does not include:
1. reopening completed repo-local compiler work for `G1/G2/G3`
2. unlimited fixture expansion
3. full 102-scene runtime rollout
4. direct implementation of all runtime gaps in one round
## Design Principles
1. Real-sample validation is the primary execution axis.
2. Execution-board changes must only exist to support validation records.
3. Boundary-family expansion must be justified by validation pressure, not drift.
4. Deferred-family entry must be decided explicitly before implementation begins.
## Workstream Model
1. `WS1` Mainline Real Sample Execution
2. `WS2` Validation Result Triage And Scope Decisions
3. `WS3` Boundary Runtime Enablement Decision
4. `WS4` Deferred Family Entry Decision
## Completion Criteria
This roadmap is complete when:
1. `G2`, `G1-E`, and `G3` each have executed real-sample records
2. the next implementation scope is selected from validation evidence
3. boundary-family and deferred-family entry decisions are documented before new implementation begins


@@ -0,0 +1,71 @@
# 102 Final Coverage Status Rollup Design
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-design.md`
> Parent Layer: `Layer E`
> Status: Active
## Intent
Publish a final, policy-governed coverage rollup after the residual 13 closure sequence.
This design consolidates the latest full-coverage reconciliation candidate view with the residual 13 follow-up reconciliation result. It does not update the official execution board.
## Inputs
1. `tests/fixtures/generated_scene/full_coverage_reconciliation_candidates_2026-04-19.json`
2. `tests/fixtures/generated_scene/residual_13_reconciliation_candidates_2026-04-19.json`
3. `tests/fixtures/generated_scene/boundary_residual_hold_decision_2026-04-19.json`
4. `tests/fixtures/generated_scene/bootstrap_target_residual_isolation_2026-04-19.json`
5. `tests/fixtures/generated_scene/promotion_board_reconciliation_policy_2026-04-19.json`
## Output
1. `tests/fixtures/generated_scene/final_coverage_status_rollup_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-102-final-coverage-status-rollup-report.md`
## Rollup Rule
Start from the 102-scene full coverage reconciliation candidates.
For every scene present in the residual 13 reconciliation result, replace its previous candidate status with the residual follow-up candidate status.
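Under an assumed shape where each candidate asset is a flat map from scene id to candidate status, the rule above reduces to a map override. The sketch below is illustrative, not the actual rollup tooling; the helper name `roll_up` and the no-new-scenes assertion are assumptions.

```rust
use std::collections::BTreeMap;

/// Rollup rule sketch: start from the full-coverage candidate view and,
/// for every scene present in the residual-13 result, replace its previous
/// candidate status with the residual follow-up candidate status.
fn roll_up(
    full_coverage: &BTreeMap<String, String>,
    residual_13: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut rollup = full_coverage.clone();
    for (scene_id, status) in residual_13 {
        // Assumption: the residual result may only override scenes already
        // present in the 102-scene view, never introduce new ones.
        assert!(rollup.contains_key(scene_id), "unknown scene: {scene_id}");
        rollup.insert(scene_id.clone(), status.clone());
    }
    rollup
}
```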
## Status Model
1. `framework-auto-pass-candidate`
2. `framework-structured-fail-closed`
3. `framework-valid-host-bridge`
4. `source-unreadable`
5. `unsupported-family`
6. `misclassified-unresolved`
7. `missing-source`
## Expected Final Shape
After residual closure:
1. `95` framework auto-pass candidates
2. `7` structured fail-closed / hold / isolation candidates
3. `0` unresolved states
4. `0` source-unreadable states
5. `0` unsupported-family states
6. `0` misclassified-unresolved states
## Boundary
This design must not:
1. update `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`;
2. modify `src/generated_scene/analyzer.rs`;
3. modify `src/generated_scene/generator.rs`;
4. promote scenes to official board state;
5. rerun the 102 sweep;
6. add a family.
## Acceptance Criteria
1. The rollup contains exactly `102` scenes.
2. Residual 13 updates are applied.
3. The rollup summary matches the final expected shape.
4. The official execution board remains untouched.
5. The report states whether official board reconciliation should be the next step.


@@ -0,0 +1,20 @@
# 102 Framework Closure Rollup Design
> Date: 2026-04-19
> Parent Sequence: `2026-04-19-final-2-residual-child-plan-sequence-plan.md`
## Intent
Publish the final framework-level status for all 102 scenes after the final-2 residual sequence.
This is a reporting layer, not an implementation layer.
## Closure States
The rollup must distinguish:
1. full framework auto-pass
2. named structured hold
3. unresolved status
The target is `unresolved = 0`.


@@ -0,0 +1,66 @@
# 102 Full Coverage Follow-Up Sweep And Reconciliation Design
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Child Sequence: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-child-plan-sequence-plan.md`
> Parent Layer: `Layer E`
## Intent
Run one full 102-scene follow-up sweep after Routes 2 through 6 are complete, then publish a reconciliation candidate view governed by the Route 6 promotion policy.
This design measures cumulative coverage delta. It does not directly update the official execution board.
## Inputs
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. Route 2 follow-up assets
3. Route 3 follow-up assets
4. Route 4 follow-up assets
5. Route 5 boundary decisions
6. Route 6 promotion policy
## Output Assets
1. `tests/fixtures/generated_scene/full_coverage_followup_sweep_2026-04-19.json`
2. `tests/fixtures/generated_scene/full_coverage_reconciliation_candidates_2026-04-19.json`
3. `docs/superpowers/reports/2026-04-19-102-full-coverage-followup-sweep-report.md`
4. `docs/superpowers/reports/2026-04-19-102-full-coverage-reconciliation-candidates-report.md`
## Status Model
The sweep must report:
1. `auto-pass`
2. `fail-closed-known`
3. `adjudicated-valid-host-bridge`
4. `source-unreadable`
5. `missing-source`
6. `unsupported-family`
7. `misclassified-unresolved`
The reconciliation candidate view must also report:
1. `framework-auto-pass-candidate`
2. `framework-structured-fail-closed`
3. `framework-valid-host-bridge`
4. `hygiene-pass-candidate`
5. `hygiene-fail-closed-candidate`
## Guardrails
1. Do not modify `src/generated_scene/analyzer.rs`.
2. Do not modify `src/generated_scene/generator.rs`.
3. Do not update `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`.
4. Do not promote scenes directly.
5. Do not open a new family.
6. Do not start implementation work from the sweep result.
## Completion Criteria
1. A fixed 102-scene sweep result exists.
2. A reconciliation candidate asset exists.
3. Coverage delta is reported against the previous structured follow-up baseline.
4. The report states the remaining gap to the 102-scene target.


@@ -0,0 +1,116 @@
# 102 Full Sweep Dry-Run Design
> Date: 2026-04-19
> Status: Draft
> Upstream Context: completed `scene-skill 60-to-90` roadmap and post-roadmap real-sample closures
## 1. Intent
This design defines a bounded, read-only dry-run over the full `102` scene ledger.
The target is:
`measure current generic scene-to-skill coverage without changing generator behavior or promoting scene status`
## 2. Problem Statement
The current project has three different coverage numbers:
1. real-sample executed pass: `5 / 102`
2. code-backed ledger coverage: `23 / 102`
3. repo-local family regression pass count: `24 / 24`
These numbers are all valid, but none answers the direct question:
`how many of the 102 scenes can the current generic analyzer/generator handle if we run them all now?`
This dry-run answers that question.
## 3. Scope Boundary
This design is limited to measurement.
It may include:
1. reading the current `102` execution board
2. resolving local source directories under the fixed real-scene root
3. running analyzer/generator dry-runs against available sources
4. collecting success, fail-closed, missing-source, and unsupported results
5. publishing a standalone dry-run JSON and report
It must not include:
1. changing analyzer logic
2. changing generator logic
3. changing existing family baselines
4. changing `scene_execution_board_2026-04-18.json`
5. promoting scenes from dry-run results
6. creating new family plans
7. running more than the fixed `102` ledger set
## 4. Fixed Inputs
### Execution Board
`tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
### Scene Root
`D:/desk/智能体资料/全量业务场景/一平台场景`
### Generator
`cargo run --bin sg_scene_generate`
## 5. Fixed Outputs
### Dry-Run Result JSON
`tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
### Dry-Run Output Root
`examples/full_sweep_dry_run_2026-04-19`
### Report
`docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`
## 6. Classification Model
Each scene must receive exactly one final dry-run status:
1. `auto-pass`
2. `fail-closed-known`
3. `misclassified`
4. `unsupported-family`
5. `missing-source`
6. `source-unreadable`
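The exactly-one-status constraint can be modeled as a closed enum over the six labels, with unknown labels rejected rather than defaulted. This is an illustrative sketch, not code from the sweep tool.

```rust
/// The six final dry-run statuses; each scene receives exactly one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DryRunStatus {
    AutoPass,
    FailClosedKnown,
    Misclassified,
    UnsupportedFamily,
    MissingSource,
    SourceUnreadable,
}

impl DryRunStatus {
    /// Parse the kebab-case label used in the dry-run JSON; any other
    /// label is rejected instead of being mapped to a default status.
    fn parse(label: &str) -> Option<Self> {
        match label {
            "auto-pass" => Some(Self::AutoPass),
            "fail-closed-known" => Some(Self::FailClosedKnown),
            "misclassified" => Some(Self::Misclassified),
            "unsupported-family" => Some(Self::UnsupportedFamily),
            "missing-source" => Some(Self::MissingSource),
            "source-unreadable" => Some(Self::SourceUnreadable),
            _ => None,
        }
    }
}
```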
## 7. Coverage Metrics
The dry-run must report at least these numbers:
1. `realSampleExecutedPass`
2. `codeBackedLedgerCoverage`
3. `dryRunAutoPass`
4. `dryRunActionableCoverage`
5. `missingSource`
6. `sourceUnreadable`
7. `unsupportedFamily`
## 8. Non-Negotiable Stop Rules
1. If a scene fails, record the failure and continue.
2. If many scenes fail with the same blocker, record the blocker and do not fix it in this dry-run.
3. If dry-run discovers a likely bug, write it as a follow-up recommendation only.
4. Do not update the execution board from dry-run output.
## 9. Exit Condition
This design is complete when the project has a single bounded plan that:
1. defines the dry-run tool/task
2. defines the dry-run output schema
3. preserves read-only behavior against generator logic and board status
4. produces a report that answers actual generic coverage over `102` scenes


@@ -0,0 +1,208 @@
# 102 Full Sweep Dry-Run Triage Design
> Date: 2026-04-19
> Status: Draft
> Upstream Result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
> Upstream Report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`
## Design Intent
Split the non-pass buckets from the `102` scene full sweep into concrete, actionable triage categories without changing generator behavior or promoting scene status.
The design answers:
`why did 62 scenes not become dry-run auto-pass, and which blocker should be handled first?`
## Starting Point
The upstream dry-run produced:
| Status | Count |
| --- | ---: |
| `auto-pass` | 40 |
| `fail-closed-known` | 26 |
| `misclassified` | 5 |
| `source-unreadable` | 31 |
| `missing-source` | 0 |
| `unsupported-family` | 0 |
| Total | 102 |
The triage scope is only the `62` non-pass records.
## Scope Guardrails
1. do not edit `src/generated_scene/analyzer.rs`
2. do not edit `src/generated_scene/generator.rs`
3. do not change scene generation logic
4. do not update `scene_execution_board_2026-04-18.json`
5. do not promote scenes from this triage
6. do not add family baselines
7. do not create implementation plans from a single failure
8. do not rerun outside the fixed `102` scene set
## Fixed Inputs
1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
3. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
4. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
## Fixed Outputs
1. triage result: `tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json`
2. triage report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md`
## Triage Order
The order is fixed:
1. timeout triage
2. misclassification triage
3. no-report failure triage
This order is deliberate:
1. timeouts are the largest bucket and include already-mapped `G2` scenes
2. misclassification has the cleanest routing-quality signal
3. no-report failures are too broad until the higher-signal buckets are separated
## Timeout Triage Model
Input bucket:
`dryRunStatus = source-unreadable`
Current count:
`31`
Current reason:
`generator timeout after 30s`
Target second-level labels:
1. `timeout-known-family-sample`
2. `timeout-unvalidated-source`
3. `timeout-large-source`
4. `timeout-command-hang`
5. `timeout-generator-slow-but-progressing`
6. `timeout-undetermined`
Minimum evidence per timeout record:
1. source directory exists
2. file count
3. total source bytes
4. current group
5. current board status
6. real sample record id if present
7. whether a partial skill directory exists
8. whether a partial generation report exists
Diagnostic reruns are allowed only for classification. A longer rerun success does not promote the scene.
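A labeling pass over this evidence might look like the following sketch. The byte threshold, the evidence subset, and the precedence order are all assumptions: this design fixes the second-level labels, not the decision rules that assign them.

```rust
/// Subset of the minimum evidence captured per timeout record.
struct TimeoutEvidence {
    has_real_sample_record: bool, // known-family anchor exists for the scene
    total_source_bytes: u64,
    partial_skill_dir: bool,      // generator produced partial output
    board_validated: bool,        // scene has a validated board status
}

/// One plausible labeling order (assumed, not defined by this design).
fn label_timeout(e: &TimeoutEvidence) -> &'static str {
    const LARGE_SOURCE_BYTES: u64 = 50 * 1024 * 1024; // assumed threshold
    if e.has_real_sample_record {
        "timeout-known-family-sample"
    } else if e.partial_skill_dir {
        "timeout-generator-slow-but-progressing"
    } else if e.total_source_bytes >= LARGE_SOURCE_BYTES {
        "timeout-large-source"
    } else if !e.board_validated {
        "timeout-unvalidated-source"
    } else {
        "timeout-undetermined"
    }
}
```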
## Misclassification Triage Model
Input bucket:
`dryRunStatus = misclassified`
Current count:
`5`
Current shape:
1. `G3 -> host_bridge_workflow`: `3`
2. `G1-E -> host_bridge_workflow`: `2`
Target second-level labels:
1. `route-overprefer-host-bridge`
2. `board-expectation-stale`
3. `mixed-workflow-host-bridge-valid`
4. `scene-family-split-needed`
5. `misclassification-undetermined`
Minimum evidence per misclassification record:
1. board expected group
2. expected archetype
3. dry-run inferred archetype
4. current source asset
5. real sample layer status
6. generated report path
7. failed or conflicting signal summary
This phase does not correct routing logic.
## No-Report Failure Triage Model
Input bucket:
`dryRunStatus = fail-closed-known` and reason is `generator failed without generation report`
Current count:
`25`
Target failure stages:
1. `source-scan`
2. `analyzer`
3. `ir-assembly`
4. `readiness-before-report`
5. `compiler-package-write`
6. `panic-or-process-error`
7. `unknown-no-report`
The one `bootstrap_target` failure remains separately tracked and is not merged into no-report failures.
Minimum evidence per no-report record:
1. exit code if available
2. stdout tail
3. stderr tail
4. partial skill directory exists
5. partial references directory exists
6. generated report exists
7. inferred failure stage
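Stage inference from this evidence could be sketched as follows. The signal-to-stage mapping is an assumption: this design fixes the stage names, not the inference rules, and a real pass would also weigh the exit code and stdout tail.

```rust
/// Subset of the minimum evidence captured per no-report record.
struct NoReportEvidence<'a> {
    stderr_tail: &'a str,
    partial_skill_dir: bool,
    partial_references_dir: bool,
}

/// One plausible inference order (assumed, not defined by this design).
fn infer_failure_stage(e: &NoReportEvidence<'_>) -> &'static str {
    if e.stderr_tail.contains("panicked") {
        // a Rust panic message in the stderr tail dominates other signals
        "panic-or-process-error"
    } else if e.partial_skill_dir {
        // package writing started but no generation report was produced
        "compiler-package-write"
    } else if e.partial_references_dir {
        // references were assembled, so a pre-report readiness gate stopped the run
        "readiness-before-report"
    } else {
        "unknown-no-report"
    }
}
```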
## Result Schema
Top-level fields:
```json
{
"triageDate": "2026-04-19",
"scope": "102-full-sweep-dry-run-triage",
"sourceDryRun": "tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json",
"summary": {},
"timeoutTriage": [],
"misclassificationTriage": [],
"noReportFailureTriage": [],
"bootstrapTargetFailures": [],
"recommendations": []
}
```
Each triage record keeps the original dry-run scene id and scene name.
## Completion Criteria
This triage is complete when:
1. all `31` timeout records have a second-level timeout label
2. all `5` misclassified records have a routing triage label
3. all `25` no-report failures have an inferred failure stage
4. the `bootstrap_target` case remains separately visible
5. no scene status is promoted
6. no generator or analyzer logic is changed
## Stop Rule
Stop after publishing the triage JSON and report.
Do not start implementation correction from this triage unless a new bounded implementation plan is explicitly created later.


@@ -0,0 +1,239 @@
# 102 Full Sweep Improvement Roadmap Design
> Date: 2026-04-19
> Status: Draft
> Upstream Dry-Run: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`
> Upstream Triage: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md`
## Design Intent
Use the full `102` scene dry-run and triage results to define a single improvement roadmap for generic `scene -> skill` coverage.
This roadmap is the post-triage equivalent of the earlier `60-to-90` roadmap. It is not a single bugfix plan. It is the governing design for turning measured dry-run blockers into bounded implementation tracks.
The design answers:
`how do we move from 40/102 dry-run auto-pass and 66/102 actionable coverage toward a higher verified generic conversion rate without drifting into unbounded fixes?`
## Current Baseline
The current measured state is:
| Metric | Count |
| --- | ---: |
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |
The non-pass triage state is:
| Bucket | Count | Triage conclusion |
| --- | ---: | --- |
| Timeout | 31 | `19 timeout-unvalidated-source`, `8 timeout-large-source`, `4 timeout-known-family-sample` |
| Misclassified | 5 | all `route-overprefer-host-bridge` |
| No-report failure | 25 | all `readiness-before-report` |
| Bootstrap target | 1 | separate `bootstrap_target` |
## Problem Statement
The generic generator already auto-passes more scenes than the formal ledger coverage shows, but the result is not trustworthy enough to promote automatically because:
1. known-family scenes still appear in the timeout bucket
2. `host_bridge_workflow` can over-absorb scenes expected to remain `G3` or `G1-E`
3. many fail-closed cases terminate before a structured generation report exists
4. timeout and no-report failures hide actionable blocker details
## Roadmap Goal
Improve the measurable generic conversion pipeline, not by adding new families first, but by reducing ambiguity in the current failure surface.
The roadmap has four goals:
1. make known-family timeout results explainable and repeatable
2. correct or formally adjudicate host-bridge routing over-preference
3. convert pre-report failures into structured fail-closed results
4. rerun a bounded `102` sweep to measure coverage delta
## Scope Guardrails
1. do not add new scene families in this roadmap
2. do not promote scenes directly from diagnostic runs
3. do not update `scene_execution_board_2026-04-18.json` until a later explicit status-sync plan
4. do not use one failure as justification for an unbounded rewrite
5. do not reopen completed `G1-E / G2 / G3 / G6 / G7` real-sample pass records unless they are part of a fixed regression check
6. do not start `G4 / G5`
7. do not implement login recovery, full host runtime, or attachment pipeline work in this roadmap
## Workstreams
1. `WS1` Timeout and Source-Scale Diagnostics
2. `WS2` Host-Bridge Routing Boundary Correction
3. `WS3` Structured Fail-Closed Reporting
4. `WS4` Coverage Delta Sweep and Decision Board
## Track A: Known-Family Timeout Diagnostics
### Intent
Separate known-family timeout behavior from generic unvalidated-source timeout behavior.
### Input
The `4` records labeled:
`timeout-known-family-sample`
### Expected Output
Each known-family timeout gets one of:
1. `known-family-rerun-pass`
2. `known-family-source-scale-timeout`
3. `known-family-generator-hotspot`
4. `known-family-contract-blocked-after-long-run`
5. `known-family-timeout-unresolved`
### Design Constraint
A longer rerun success does not promote a scene. It only changes diagnostic classification.
## Track B: Timeout Source-Scale Policy
### Intent
Create a bounded input filtering and scan-budget policy for large source directories without changing family semantics.
### Input
The timeout labels:
1. `timeout-large-source`
2. `timeout-unvalidated-source`
### Expected Output
1. source file selection policy
2. large vendor/library ignore list policy
3. scan-budget decision table
4. timeout reporting shape
### Design Constraint
This track is allowed to improve scan boundaries, but not allowed to change archetype semantics.
## Track C: Host-Bridge Route Over-Preference Correction
### Intent
Prevent `host_bridge_workflow` from absorbing scenes that should remain `G3` or `G1-E` when business-chain evidence is stronger.
### Input
The `5` records labeled:
`route-overprefer-host-bridge`
### Expected Output
Each misclassification gets one of:
1. `route-corrected-to-g3`
2. `route-corrected-to-g1e`
3. `board-expectation-reclassified`
4. `valid-host-bridge-workflow`
5. `route-conflict-unresolved`
### Design Constraint
This track must preserve the already-passed `G6` real sample and must not degrade `G3` or `G1-E` canonical tests.
## Track D: Readiness-Before-Report Structured Fail-Closed
### Intent
Convert `generator failed without generation report` into structured, machine-readable fail-closed results.
### Input
The `25` records labeled:
`readiness-before-report`
### Expected Output
Each case produces a generation report or equivalent dry-run failure record with:
1. inferred archetype
2. blocker stage
3. missing contract pieces
4. failed gate name
5. actionable reason
### Design Constraint
This track should not make failing scenes pass. It should make failures explainable.
## Track E: Bootstrap Target Isolation
### Intent
Keep the single `bootstrap_target` failure separate so it does not pollute the no-report or route-correction work.
### Input
The `1` bootstrap target failure:
`用户停电频次分析监测` (customer outage frequency analysis monitoring)
### Expected Output
1. isolated bootstrap failure note
2. decision whether it belongs to later bootstrap normalization work
### Design Constraint
No bootstrap auto-recovery or login work is included in this roadmap.
## Track F: Coverage Delta Sweep
### Intent
After bounded improvements, rerun a comparable `102` sweep and compare against the baseline.
### Input
1. baseline dry-run result
2. updated generator after approved tracks
3. same `102` scene board
### Expected Output
1. new dry-run result
2. coverage delta report
3. category movement table
4. decision board for remaining blockers
### Design Constraint
The rerun must be comparable to the baseline. It cannot silently change the scene set.
## Success Criteria
This roadmap succeeds when:
1. all known-family timeouts are separated from unvalidated timeout noise
2. all five host-bridge over-preference cases are adjudicated
3. no-report failures become structured fail-closed outputs
4. a follow-up full sweep shows measurable improvement or a clearly explained plateau
5. no new family is introduced to mask existing failure categories
## Out of Scope
1. new `G4/G5` implementation
2. full login recovery
3. browser host runtime transport implementation
4. local document attachment pipeline
5. automatic scene promotion into the execution board
6. full manual validation of all `102` generated skills


@@ -0,0 +1,119 @@
# 102 Sweep Status Reconciliation Design
> Date: 2026-04-19
> Status: Draft
> Upstream Follow-Up Sweep: `tests/fixtures/generated_scene/full_sweep_improvement_followup_2026-04-19.json`
> Upstream Route Decisions: `tests/fixtures/generated_scene/remaining_route_conflict_decisions_2026-04-19.json`
## Intent
Create a single reconciled status view after the `102` full-sweep improvement roadmap and the remaining route-conflict adjudication.
This design does not change generation behavior. It reconciles measurement assets so the next roadmap starts from a trustworthy status baseline instead of reading stale `misclassified` counts from the follow-up sweep.
## Problem
The current assets intentionally remain separated:
1. the original execution board records current scene status
2. the follow-up sweep records measured analyzer/generator results
3. the route-conflict decision asset adjudicates the remaining `4` sweep misclassifications
Because the follow-up sweep still contains `4` `misclassified` records, a reader can incorrectly treat them as unresolved route bugs. The later route-conflict plan decided all `4` are `valid-host-bridge-workflow`.
The next step needs a reconciled view that preserves the raw sweep result while adding the final adjudicated state.
## Scope
In scope:
1. merge follow-up sweep records with route-conflict decisions
2. produce reconciled status counts
3. mark the `4` previous misclassifications as `adjudicated-valid-host-bridge`
4. preserve the `2` remaining timeouts as unresolved timeout inputs
5. summarize the `48` structured fail-closed records by archetype and blocker
6. produce a reconciliation report for the next roadmap
Out of scope:
1. modifying `analyzer.rs`
2. modifying `generator.rs`
3. modifying `scene_execution_board_2026-04-18.json`
4. promoting any scene
5. creating or changing family baselines
6. rerunning the `102` sweep
7. implementing fixes for fail-closed records or timeouts
## Inputs
Required inputs:
1. `tests/fixtures/generated_scene/full_sweep_improvement_followup_2026-04-19.json`
2. `tests/fixtures/generated_scene/remaining_route_conflict_decisions_2026-04-19.json`
Optional read-only inputs:
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `docs/superpowers/reports/2026-04-19-102-full-sweep-improvement-coverage-delta-report.md`
3. `docs/superpowers/reports/2026-04-19-remaining-route-conflict-correction-report.md`
## Reconciled Status Model
Every scene keeps its raw follow-up `dryRunStatus`.
The reconciliation adds `reconciledStatus`:
1. `auto-pass`
2. `fail-closed-known`
3. `adjudicated-valid-host-bridge`
4. `source-unreadable`
5. `missing-source`
6. `unsupported-family`
The only status transformation in this plan is:
`misclassified` + route decision `valid-host-bridge-workflow` -> `adjudicated-valid-host-bridge`
If a `misclassified` record has no matching final decision, it must remain `misclassified-unresolved`.
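The two rules above can be stated directly as a small function. `reconcile` is a hypothetical helper name for this sketch, but the mapping itself follows the stated rules: one permitted transformation, unresolved misclassifications kept visible, everything else carried through unchanged.

```rust
/// Apply the single status transformation this plan allows.
/// `route_decision` is the final adjudicated decision for the scene, if any.
fn reconcile(dry_run_status: &str, route_decision: Option<&str>) -> String {
    match (dry_run_status, route_decision) {
        // the only permitted transformation
        ("misclassified", Some("valid-host-bridge-workflow")) => {
            "adjudicated-valid-host-bridge".to_string()
        }
        // a misclassified record without a matching final decision must
        // remain visible as unresolved
        ("misclassified", _) => "misclassified-unresolved".to_string(),
        // every other raw status is carried through unchanged
        (other, _) => other.to_string(),
    }
}
```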
## Expected Reconciled Counts
Based on the current follow-up sweep and route decisions, the expected reconciliation is:
| Reconciled status | Count |
| --- | ---: |
| `auto-pass` | 48 |
| `fail-closed-known` | 48 |
| `adjudicated-valid-host-bridge` | 4 |
| `source-unreadable` | 2 |
| `missing-source` | 0 |
| `unsupported-family` | 0 |
| Total | 102 |
## Follow-Up Inputs for Future Roadmaps
The reconciliation should make the next candidates explicit:
1. `48` structured fail-closed records for workflow evidence / contract completion analysis
2. `2` remaining timeout records for source-scale or command hang diagnostics
3. `4` valid-host-bridge adjudications for optional execution-board expectation cleanup, not analyzer correction
## Deliverables
1. `tests/fixtures/generated_scene/full_sweep_status_reconciliation_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-102-sweep-status-reconciliation-report.md`
## Acceptance Criteria
1. total reconciled scene count is exactly `102`
2. all `4` route conflicts are reconciled from `misclassified` to `adjudicated-valid-host-bridge`
3. no `misclassified` status remains unless it lacks a route decision
4. the `2` timeout cases remain separate and unresolved
5. no execution board status is changed
6. no analyzer or generator logic is changed
## Stop Rule
Stop after publishing the reconciliation JSON and report.
Do not start fail-closed implementation, timeout diagnostics, or execution-board sync inside this plan.


@@ -0,0 +1,27 @@
# Bootstrap Target Normalization Roadmap Design
> Date: 2026-04-19
> Parent Sequence: `2026-04-19-final-2-residual-child-plan-sequence-plan.md`
> Fixed Scene: `sweep-091-scene`
## Intent
Normalize the remaining `page_state_eval` bootstrap target residual without opening general login recovery or browser navigation runtime.
## Fixed Scope
Only `sweep-091-scene` is in scope.
## Minimal Success Definition
The scene must either:
1. become `framework-auto-pass-candidate`; or
2. remain `framework-structured-fail-closed` with a narrower named bootstrap target reason.
## Forbidden Scope
1. no general login recovery
2. no full browser navigation runtime
3. no host-bridge runtime work
4. no new family


@@ -0,0 +1,50 @@
# Boundary Fail-Closed Decision Design
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 5: boundary-family fail-closed`
> Parent Layer: `Layer C + Layer D`
## Intent
Inspect the remaining boundary-family fail-closed records and decide whether they should be:
1. deferred
2. kept as boundary fail-closed
3. opened into one bounded correction slice
This is a decision-first route.
## Fixed Input Bucket
1. `local_doc_pipeline = 5`
2. `host_bridge_workflow = 1`
3. `page_state_eval/bootstrap_target = 1`
## Allowed Files
1. boundary decision JSON assets
2. boundary decision report assets
3. optional next-plan design/plan files only if a bounded boundary slice is justified
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Expected Delta
1. no code-level coverage delta is required
2. the expected result is a decision-quality delta:
- defer
- hold
- open one bounded slice
## Stop Rule
Stop after the boundary decision is published.
Do not start boundary implementation under this plan.

@@ -0,0 +1,121 @@
# Boundary Family Real-Sample Entry Roadmap Design
> Date: 2026-04-19
> Status: Draft
> Upstream Validation Layer: [real_sample_validation_records_2026-04-18.json](D:/data/ideaSpace/rust/sgClaw/claw-new/tests/fixtures/generated_scene/real_sample_validation_records_2026-04-18.json)
> Upstream Entry Rules: [boundary_runtime_entry_rules_2026-04-18.json](D:/data/ideaSpace/rust/sgClaw/claw-new/tests/fixtures/generated_scene/boundary_runtime_entry_rules_2026-04-18.json)
## 1. Intent
This design defines the next bounded roadmap after the mainline real-sample anchors are closed.
The current mainline state is:
1. `G1-E = executed-pass`
2. `G2 = executed-pass`
3. `G3 = executed-pass`
The next roadmap therefore should not reopen mainline contract correction.
The next bounded question is narrower:
`Which boundary family, if any, is allowed to enter real-sample execution scope next?`
## 2. Problem Statement
The repo already has boundary families established at the fixture and family-asset layer:
1. `G6 = host_bridge_workflow`
2. `G7 = multi_endpoint_inventory`
3. `G8 = local_doc_pipeline`
But none of them has been promoted into real-sample execution scope.
At this point the strongest risk is not lack of family assets.
It is the lack of a bounded admission rule for moving a boundary family from `hold-as-boundary` to `real-sample-entry-candidate`.
Without a dedicated roadmap, any next step is likely to drift into:
1. accidental boundary implementation
2. premature runtime-platform work
3. reopening deferred families
## 3. Scope Boundary
This roadmap is limited to boundary-family entry decision work.
It may include:
1. comparing `G6 / G7 / G8` against explicit real-sample entry criteria
2. selecting at most one boundary family as the next execution candidate
3. producing a bounded recommendation and follow-up plan
It must not include:
1. implementing new runtime-platform capabilities
2. executing a real sample for more than one boundary family
3. opening `G4 / G5`
4. reopening `G1-E / G2 / G3`
5. broadening into a new all-family migration program
## 4. Current Decision Inputs
The current repo state already gives the key decision inputs:
1. `G6` requires host-bridge execution semantics beyond repo-local coverage
2. `G7` requires real multi-endpoint aggregation verification
3. `G8` requires local document pipeline runtime and attachment handling
These are not implementation tasks yet.
They are admission constraints.
## 5. Roadmap Goal
The goal of this roadmap is not to make a boundary family pass immediately.
The goal is to produce one bounded and defensible next execution target:
1. select exactly one next boundary family
2. explain why it is first
3. explain why the other two remain held
4. define the minimum real-sample entry slice for the selected family
## 6. Preferred Outcome
The preferred outcome is:
1. one selected boundary family
2. one bounded real-sample execution plan for that family
3. the other boundary families explicitly remain `hold-as-boundary`
An acceptable fallback outcome is:
1. no boundary family is admitted yet
2. a new bounded roadmap is required for runtime-platform prerequisites first
## 7. Acceptance Logic
This roadmap is successful when:
1. the next post-mainline step is no longer ambiguous
2. only one next-family direction is opened
3. boundary-family expansion pressure is kept bounded
4. deferred families remain untouched
## 8. Out of Scope
The following are explicitly out of scope:
1. new scene-generator family work
2. new canonical answers
3. new mainline contract correction
4. login recovery implementation
5. host runtime or transport implementation beyond decision-level scoping

@@ -0,0 +1,57 @@
# Boundary Runtime Prerequisites Roadmap Design
> Date: 2026-04-19
> Status: Draft
> Upstream Decision: [2026-04-19-post-g7-boundary-decision-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-post-g7-boundary-decision-report.md)
## 1. Intent
This design defines the next bounded slice after the post-`G7` boundary decision selects `prerequisites-only hold`.
The target is:
`boundary runtime prerequisites roadmap`
## 2. Why This Direction
This direction is selected because:
1. `G7` is already closed and should not be reopened
2. `G6` still depends on stronger host-bridge real execution semantics
3. `G8` still depends on local document runtime and attachment handling
4. forcing either family into execution now would exceed the bounded next-step budget
## 3. Scope Boundary
This design is limited to prerequisite scoping only.
It may include:
1. separating `G6` prerequisite pressure from `G8` prerequisite pressure
2. defining the minimum prerequisite slice needed before either family can enter real-sample scope
3. selecting one bounded prerequisite direction
It must not include:
1. executing `G6` or `G8`
2. implementing host-runtime or local-doc runtime directly
3. reopening `G7`
4. reopening `G1-E / G2 / G3`
5. opening `G4 / G5`
## 4. Target Outcome
The bounded target outcome is one of two states:
1. a selected prerequisite direction for `G6`
2. or a selected prerequisite direction for `G8`
The design rejects direct family execution under this slice.
## 5. Exit Condition
This design is complete when implementation can be bounded to one roadmap that:
1. compares `G6` and `G8` prerequisite burden directly
2. selects exactly one prerequisite direction
3. publishes one bounded follow-up plan

@@ -0,0 +1,21 @@
# Final 2 Official Board Reconciliation Refresh Design
> Date: 2026-04-19
> Parent Sequence: `2026-04-19-final-2-residual-child-plan-sequence-plan.md`
## Intent
Apply candidate results from either bootstrap normalization or host-bridge runtime roadmap to the official board.
This design is only a refresh layer. It does not decide, rerun, or implement runtime behavior.
## Inputs
One or both of:
1. `tests/fixtures/generated_scene/bootstrap_target_normalization_reconciliation_candidates_2026-04-19.json`
2. `tests/fixtures/generated_scene/host_bridge_runtime_reconciliation_candidates_2026-04-19.json`
## Output
The official board framework summary should reflect the selected residual roadmap result while preserving workbook and business-status fields.

@@ -0,0 +1,51 @@
# Final 2 Residual Child Plan Sequence Design
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Layer: `Layer E / Route 5 + Route 6`
> Upstream Board State: `framework-auto-pass = 100`, `framework-structured-fail-closed = 2`
## Intent
Define the remaining bounded plan sequence after local-doc runtime closure. Only two `framework-structured-fail-closed` residuals remain:
1. `sweep-085-scene`: `host_bridge_workflow`, next action `future-host-bridge-runtime-roadmap-input`
2. `sweep-091-scene`: `page_state_eval`, next action `future-bootstrap-target-normalization-roadmap-input`
This sequence prevents drift back into prior G6 micro-plans. Every next step must be anchored to this final-2 residual sequence and the 102 full coverage parent framework.
## Design Rules
1. Do not reuse the old G6 semantics micro-plan chain as an execution path.
2. Do not start host-bridge and bootstrap work in the same implementation plan.
3. Do not update the official board inside diagnostic or implementation plans.
4. Do not add a new family.
5. Do not modify unrelated mainline contracts for G1-E, G2, or G3.
6. Any implementation must target exactly one residual scene unless a later parent-framework revision expands the fixed input bucket.
## Sequence
1. `Final 2 Residual Roadmap Prioritization`
Decide which residual enters implementation first.
2. `Bootstrap Target Normalization Roadmap`
Bounded roadmap for `sweep-091-scene`, if selected by prioritization.
3. `Host-Bridge Runtime Roadmap`
Bounded roadmap for `sweep-085-scene`, if selected by prioritization.
4. `Final 2 Official Board Reconciliation Refresh`
Consume the selected roadmap output and update only framework-layer board fields.
5. `102 Framework Closure Rollup`
Publish the final 102-status view after both residuals are closed or explicitly held.
## Expected End State
The target end state is one of:
1. `102 framework-auto-pass`, `0 structured fail-closed`
2. `101 framework-auto-pass`, `1 structured fail-closed with named runtime hold`
3. `100 framework-auto-pass`, `2 structured fail-closed with named runtime holds`
The third state is allowed only if both residuals are explicitly held by bounded decision plans.

@@ -0,0 +1,25 @@
# Final 2 Residual Roadmap Prioritization Design
> Date: 2026-04-19
> Parent Sequence: `2026-04-19-final-2-residual-child-plan-sequence-plan.md`
## Intent
Choose the next executable residual roadmap between:
1. `bootstrap target normalization`
2. `host-bridge runtime`
The decision must use the current official board state and must not start implementation.
## Decision Criteria
1. fixed residual count
2. scope clarity
3. implementation risk
4. probability of improving framework auto-pass count
5. risk of regression to already-passing paths
## Expected Output
The output is a decision asset and report naming exactly one selected first roadmap. The non-selected roadmap remains queued.

@@ -0,0 +1,39 @@
# G1-E Remaining Fail-Closed Closure Design
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 4: G1-E / single_request_enrichment`
> Parent Layer: `Layer C + Layer D`
## Intent
Reduce the remaining `G1-E / single_request_enrichment` structured fail-closed bucket after Routes 2 and 3 are complete or deferred.
## Fixed Input Bucket
`single_request_enrichment = 2`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/scene_generator_test.rs`
5. Route 4 local inventory and report assets
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. Route 2 and Route 3 assets
3. Route 5+ assets
## Expected Delta
1. reduce the remaining `G1-E` fail-closed bucket
2. preserve current real-sample `G1-E` pass
## Stop Rule
Stop after the Route 4 bucket is rerun and either reduced or explicitly deferred.

@@ -0,0 +1,132 @@
# G2 Real Sample Contract Correction Design
> Date: 2026-04-19
> Status: Draft
> Upstream Roadmap: [2026-04-18-scene-skill-real-sample-validation-roadmap-plan.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/plans/2026-04-18-scene-skill-real-sample-validation-roadmap-plan.md)
> Trigger Record: `rsv-g2-001`
## 1. Intent
This bounded design defines the next mainline correction slice after `G3` is closed as an executed pass.
The only target is:
`G2 real-sample contract correction`
The purpose is to reduce the current `G2` real-sample mismatch from a broad first-round failure bundle into either:
1. a corrected executable pass
2. or a smaller named contract mismatch
## 2. Current Trigger
The current real-sample validation record for `G2` is:
1. `recordId = rsv-g2-001`
2. `validationState = executed-mismatch`
3. `mismatchCodes = [archetype_mismatch, bootstrap_mismatch, request_contract_missing, column_defs_missing]`
From the current mainline status, `G2` is now the strongest unresolved real-sample pressure.
## 3. Scope Boundary
This design is strictly bounded to the real-sample contract gap for the fixed `G2` anchor:
1. `台区线损大数据-月_周累计线损率统计分析`
The correction scope is limited to:
1. bootstrap target correctness
2. request contract correctness
3. column-definition correctness
4. output-contract correctness
This design does not reopen:
1. completed `G2` family expansion work
2. new `G2` candidate promotion
3. `G1-E`
4. `G3`
5. `G6 / G7 / G8`
6. `G4 / G5`
7. login recovery or broader runtime-platform work
## 4. Problem Statement
The current mismatch is no longer about whether `G2` exists as a family.
That work is already closed at the repo-local family layer.
The current problem is narrower:
1. the fixed real sample still does not close against the intended `tq-lineloss-report`-level business contract
2. the validation layer still records a compound mismatch instead of a narrowed real-sample outcome
Based on the real-sample analysis and existing `G2` remediation reports, the remaining pressure should be treated as a contract-alignment issue around:
1. target bootstrap surface
2. mode-specific request template completeness
3. output column semantics
4. output correctness against the intended lineloss artifact
## 5. Correction Principles
The correction must obey these principles:
1. prefer narrowing the current real-sample mismatch over broad family refactoring
2. preserve `fail-closed` behavior for unresolved `G2` variants
3. do not broaden `G2` routing into unrelated line-loss-like scenes
4. keep the correction anchored on the fixed real sample rather than batch fixtures
5. only update validation assets after the real-sample outcome becomes narrower than the current broad mismatch bundle
## 6. Target Outcome
The target outcome is one of two bounded states:
### A. Preferred outcome
`rsv-g2-001` becomes:
1. `executed-pass`
### B. Acceptable narrower outcome
`rsv-g2-001` remains `executed-mismatch`, but with a smaller named mismatch such as:
1. bootstrap-only mismatch
2. request-contract-only mismatch
3. column-contract-only mismatch
4. output-contract-only mismatch
The design explicitly rejects leaving `G2` unchanged at the same coarse four-code mismatch bundle.
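The acceptance split above (pass, or a strictly narrower named mismatch) can be expressed as a small check. The four-code baseline comes from the current `rsv-g2-001` bundle; the function shape itself is an illustrative assumption, not an existing sgClaw API:

```rust
/// Illustrative narrowing check for the G2 rerun outcome.
/// `new_mismatch_codes` is the mismatch-code list of the rerun record.
fn outcome_is_narrower(new_mismatch_codes: &[&str], executed_pass: bool) -> bool {
    // A pass always counts as progress. Otherwise the rerun must name at
    // least one code and carry strictly fewer than the current four-code
    // bundle from rsv-g2-001.
    executed_pass || (!new_mismatch_codes.is_empty() && new_mismatch_codes.len() < 4)
}
```

Under this rule, leaving the record at the same coarse four-code bundle is rejected, while a bootstrap-only or request-contract-only mismatch is accepted as a narrower outcome.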
## 7. Required Verification Surfaces
The correction must be verified against these surfaces:
1. real generated `generation-report.json`
2. intended `tq-lineloss-report` semantic baseline
3. automated regression that names the corrected real-sample pattern
4. validation-layer assets:
- `real_sample_validation_records_2026-04-18.json`
- `scene_execution_board_2026-04-18.json`
- `boundary_runtime_entry_rules_2026-04-18.json` if prioritization changes
## 8. Out of Scope
The following are explicitly out of scope for this design:
1. promoting more `G2` fixtures
2. redesigning all `G2` subtype handling
3. rewriting the general `multi_mode_request` compiler
4. opening a new `G2` family roadmap
5. changing unrelated validation records
## 9. Exit Condition
This design is complete when implementation can be bounded to a single plan that:
1. freezes the fixed `G2` real sample
2. isolates the remaining bootstrap/request/column/output gap
3. narrows the real-sample outcome
4. updates validation assets without reopening family-expansion work

@@ -0,0 +1,44 @@
# G2 Remaining Fail-Closed Closure Design
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 3: G2 / multi_mode_request`
> Parent Layer: `Layer C + Layer D`
## Intent
Reduce the remaining `G2 / multi_mode_request` structured fail-closed bucket after Route 2 is complete or deferred.
## Fixed Input Bucket
`multi_mode_request = 4`
The child plan owns only the currently remaining `G2` structured fail-closed scenes.
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/scene_generator_test.rs`
5. route-local Route 3 inventory and report assets
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. Route 2 assets
3. Route 4+ assets
## Expected Delta
1. reduce the Route 3 bucket count
2. preserve current real-sample `G2` executed-pass
## Stop Rule
Stop after the Route 3 bucket is rerun and either:
1. reduced, or
2. explicitly deferred with named blocker

@@ -0,0 +1,68 @@
# G3 Enrichment Request Closure Design
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 2: G3 / paginated_enrichment`
> Parent Layer: `Layer C + Layer D`
## Intent
Reduce the largest repeated `G3 / paginated_enrichment` fail-closed subgroup by recovering missing enrichment-request contract evidence without relaxing gates.
## Fixed Input Bucket
Primary bucket:
`paginated_enrichment + g3_enrichment_contract + secondary_request`
This child plan targets the repeated scenes whose structured fail-closed state shows:
1. `g3_enrichment_contract_complete` failed
2. request/response contract failed because `secondary_request` is missing
## Current Pattern
The current follow-up assets show a repeated subgroup where:
1. a main paginated scene is recognized
2. primary request shape is sufficiently visible to classify as `paginated_enrichment`
3. enrichment request extraction is not closed
4. secondary response extraction is not closed
This is the first recovery slice because it appears more frequently than the export-plan-specific slice.
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/scene_generator_test.rs`
5. route-local sweep follow-up assets created by this plan
6. route-local reports created by this plan
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. unrelated family manifests and promotion assets
3. Route 3, Route 4, Route 5, and Route 6 plan files
## Expected Delta
Expected delta is measured only against the Route 2 bucket:
1. some `paginated_enrichment` fail-closed records should move from `g3_enrichment_contract` to either:
- `auto-pass`, or
- a narrower remaining contract blocker
2. no current `G3` canonical or real-sample pass may regress
## Stop Rule
Stop after:
1. the targeted subgroup is rerun in a bounded way
2. coverage delta is measured
3. remaining unresolved `G3` fail-closed scenes are left for the next Route 2 child plan
Do not absorb export-plan-specific work into this plan unless it is strictly required to preserve contract coherence for the targeted subgroup.

@@ -0,0 +1,49 @@
# G3 Export Plan Closure Design
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 2: G3 / paginated_enrichment`
> Parent Layer: `Layer C + Layer D`
## Intent
Reduce the second repeated `G3 / paginated_enrichment` fail-closed subgroup by recovering missing export-plan evidence without loosening workflow completeness gates.
## Fixed Input Bucket
Primary bucket:
`paginated_enrichment + g3_export_plan + export_plan`
This child plan targets the repeated scenes whose structured fail-closed state shows:
1. `workflow_contract_complete` and/or `workflow_complete_for_archetype` failed because `export_plan` is missing
2. `g3_export_path_identified` failed because `g3_export_plan` is incomplete
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/scene_generator_test.rs`
5. route-local follow-up assets
6. route-local reports
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. Route 3 and later implementation assets
3. promotion/board policy assets
## Expected Delta
1. reduce the count of `paginated_enrichment` fail-closed records driven primarily by export-plan absence
2. if scenes still fail, narrow them to a smaller residual blocker such as runtime scope or enrichment contract
## Stop Rule
Stop after the export-plan subgroup is rerun and the resulting residual bucket is explicitly measured.
Do not continue into Route 2 residual closure under this plan.

@@ -0,0 +1,142 @@
# G3 Real Sample Archetype Correction Design
> Date: 2026-04-19
> Status: Draft
> Upstream Roadmap: [2026-04-18-scene-skill-real-sample-validation-roadmap-plan.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/plans/2026-04-18-scene-skill-real-sample-validation-roadmap-plan.md)
> Trigger Report: [2026-04-19-g3-real-sample-execution-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g3-real-sample-execution-report.md)
## Intent
This design starts after the real-sample validation roadmap confirmed that the `G3` anchor real sample executes, but collapses into `local_doc_pipeline` and then fails closed.
The purpose of this design is not to broaden the scene generator again. It is to correct the real-sample archetype routing boundary so that:
1. the real sample `95598工单明细表` stays inside `G3 / paginated_enrichment`
2. `G8 / local_doc_pipeline` keeps its current boundary-family fail-closed role
3. `G3` and `G8` no longer compete on the same evidence tier for this real sample
## Problem Statement
The repo-local `G3` baseline is stable, but the real sample currently produces:
1. executable input discovery
2. archetype collapse into `local_doc_pipeline`
3. fail-closed result due to incomplete `local_doc_pipeline` workflow evidence
This means the strongest gap is no longer generic workflow incompleteness. The strongest gap is a real-sample archetype routing mismatch between:
1. `G3` business request-chain evidence
2. `G8` local storage / document pipeline evidence
## Scope
This design covers only the bounded correction needed to resolve the above mismatch.
Included:
1. compare repo-local `G3` canonical evidence with real-sample evidence
2. split business request-chain evidence from local document pipeline evidence
3. re-order or tighten archetype routing between `G3` and `G8`
4. add regression coverage for the real-sample mismatch pattern
5. re-run the real sample and record the corrected outcome
Excluded:
1. opening `G4 / G5`
2. expanding `G6 / G7 / G8` runtime implementation
3. broad runtime integration work such as login recovery or transport redesign
4. generalized scene-generator redesign outside the `G3 vs G8` routing boundary
5. continuing batch expansion or new fixture-family growth unrelated to this mismatch
## Design Principles
1. Mainline first
The correction serves the mainline `G3` real-sample path first. Boundary-family preservation is a constraint, not the main objective.
2. Fail-closed must remain intact
The fix must not weaken fail-closed behavior for truly incomplete `G3` or `G8` inputs.
3. Evidence tier separation
Local SQL, doc export, local config, or helper persistence evidence must not outrank business request-chain evidence when the sample still has a recoverable `G3` chain.
4. Real-sample anchored
Acceptance is defined by the real sample outcome, not by repo-local fixture success alone.
## Evidence Model Adjustment
The current mismatch implies that the analyzer and generator need a sharper evidence split between two layers:
1. `business_workflow_evidence`
- main request
- pagination fields
- enrichment requests
- join keys
- export path connected to the business chain
2. `local_pipeline_evidence`
- local persistence
- `definedSqlQuery`
- `docExport`
- local helper service
- host or localhost pipeline artifacts
For this correction, the routing rule must treat `local_pipeline_evidence` as secondary when:
1. the `G3` business chain is materially present
2. the local pipeline is downstream support or artifact generation
3. the sample still matches the `G3` minimal contract more strongly than the `G8` minimal contract
## Routing Boundary Decision
The required routing decision is:
1. prefer `paginated_enrichment` when the sample contains:
- a main request
- pagination control
- at least one enrichment or detail chain
- join-key recoverability
2. route to `local_doc_pipeline` only when local pipeline evidence is the dominant workflow backbone and the business request chain cannot form a `G3` contract
This means `G8` remains valid, but its trigger threshold must be higher when a recoverable `G3` mainline exists.
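The routing preference above can be sketched as a single decision function. The struct fields and archetype labels mirror the evidence model in this design, but they are illustrative assumptions, not the actual types in `src/generated_scene/analyzer.rs`:

```rust
// Hypothetical evidence summary; field names are illustrative.
struct EvidenceSummary {
    has_main_request: bool,
    has_pagination: bool,
    has_enrichment_chain: bool,
    join_keys_recoverable: bool,
    local_pipeline_dominant: bool,
}

// Routing sketch: a recoverable G3 business chain outranks local-pipeline
// evidence; G8 fires only when the local pipeline is the dominant backbone
// and no G3 contract can form; otherwise the sample stays fail-closed.
fn route_archetype(e: &EvidenceSummary) -> &'static str {
    let g3_chain = e.has_main_request
        && e.has_pagination
        && e.has_enrichment_chain
        && e.join_keys_recoverable;
    if g3_chain {
        "paginated_enrichment"
    } else if e.local_pipeline_dominant {
        "local_doc_pipeline"
    } else {
        "fail_closed"
    }
}
```

Note that `local_pipeline_dominant` is never consulted while the G3 chain is materially present, which encodes the higher trigger threshold for `G8`.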
## Expected Code Touch Points
This design is expected to touch only the current generated-scene core:
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/ir.rs`
3. `src/generated_scene/generator.rs`
4. `tests/scene_generator_test.rs`
5. real-sample validation assets under `tests/fixtures/generated_scene/`
## Validation Strategy
The correction must be verified in three layers:
1. deterministic routing regression
prove that the `G3 vs G8` evidence split behaves as intended
2. generator regression
prove that the corrected path still compiles or fail-closes for the right reason
3. real-sample rerun
prove that `95598工单明细表` no longer collapses into `local_doc_pipeline`
## Success Criteria
This design is considered satisfied when:
1. the `G3` real sample no longer routes into `local_doc_pipeline`
2. the real sample resolves as `paginated_enrichment`, or fail-closes inside `G3` for a `G3`-specific reason
3. `G8` representative behavior remains intact
4. the real-sample validation layer records the corrected family outcome
## Non-Goals
This design does not try to guarantee that the `G3` real sample becomes fully runnable in one step.
If the corrected run still fails, that is acceptable only when:
1. the failure remains inside `G3`
2. the blocker is a real `G3` contract gap
3. the result no longer depends on accidental collapse into `G8`

@@ -0,0 +1,59 @@
# G3 Real Sample Output Contract Verification Design
> Date: 2026-04-19
> Upstream Closure: [2026-04-19-g3-real-sample-runtime-contract-correction-closure-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g3-real-sample-runtime-contract-correction-closure-report.md)
## 1. Intent
The previous bounded plan corrected the remaining runtime-scope failure for the real sample `95598工单明细表`.
The sample now:
1. routes inside `G3 / paginated_enrichment`
2. passes `g3_runtime_scope_compatible`
3. reaches `readiness.level = A`
The remaining mainline gap is narrower:
`output_contract_not_verified`
This design defines the next bounded scope:
`G3 real-sample output / contract verification`
## 2. Observed Remaining Gap
The generated real-sample package now satisfies structural routing and runtime-scope gates, but the validation layer still does not verify:
1. whether the main request / enrichment split matches the intended business output
2. whether recovered join keys and dedupe rules are semantically correct rather than merely syntactically complete
3. whether the current generated artifact shape matches the expected real-sample output contract
This means the remaining risk is no longer routing or runtime admission. It is output-level contract fidelity.
## 3. Scope Guardrails
1. do not reopen the completed `G3` archetype-correction scope
2. do not reopen the completed `G3` runtime-scope correction scope
3. do not broaden this work into `G8` runtime implementation
4. do not reopen `G3` family expansion or new fixture growth
5. do not open `G4 / G5`
6. do not weaken fail-closed behavior to force a `passed` record
## 4. Correction Target
The bounded target is:
1. verify the real-sample `G3` output contract against the intended business contract
2. narrow the remaining mismatch from generic `output_contract_not_verified` to:
- verified pass
- or a smaller named contract/output mismatch
3. keep the result anchored in the real sample rather than repo-local proxies
## 5. Expected Outcome
After this scope:
1. the validation record for `rsv-g3-001` should either become `executed-pass`
2. or it should retain `executed-mismatch` with a more specific output/contract code than the current generic label
3. the next scope recommendation should move away from `G3` unless a genuinely narrower output issue remains

@@ -0,0 +1,53 @@
# G3 Real Sample Runtime Contract Correction Design
> Date: 2026-04-19
> Upstream Closure: [2026-04-19-g3-real-sample-archetype-correction-closure-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g3-real-sample-archetype-correction-closure-report.md)
## 1. Intent
The previous bounded plan corrected the real-sample routing boundary for `95598工单明细表`.
The sample now stays in `G3 / paginated_enrichment`, but it still leaves a mainline gap:
`g3_runtime_scope_compatible = false`
This design defines the next bounded scope:
`G3 real-sample runtime / contract correction`
## 2. Observed Remaining Gap
The corrected real-sample rerun shows:
1. archetype is now `paginated_enrichment`
2. main request, pagination, enrichment, join keys, and export path are all present
3. the remaining blocker is the current runtime-scope rule, which treats the volume of localhost evidence as incompatible
The current gate is too coarse for this real sample because the localhost evidence is subordinate to the restored business chain, not the controlling workflow backbone.
## 3. Scope Guardrails
1. do not reopen `G3` family expansion
2. do not broaden this work into `G8` runtime implementation
3. do not change `G6 / G7 / G8` behavior except where a shared generic gate must remain consistent
4. do not weaken fail-closed behavior for scenes that still do not satisfy the `G3` minimum contract
5. do not treat asset-only updates as progress unless they follow a real rerun result
## 4. Correction Target
The bounded target is:
1. keep the real sample in `paginated_enrichment`
2. narrow `g3_runtime_scope_compatible` so it distinguishes:
- subordinate host-runtime dependencies inside a valid `G3` business chain
- dominant host-runtime dependencies that still justify fail-closed
3. preserve explicit visibility of remaining output or data-verification gaps
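The subordinate-versus-dominant distinction above can be sketched as a narrowed gate. The signature and the reference-count comparison are assumptions for illustration, not the shipped `g3_runtime_scope_compatible` rule:

```rust
// Narrowed runtime-scope gate sketch: localhost evidence is treated as
// subordinate only when a complete G3 business chain exists and the
// business-chain references are not outweighed by localhost references.
// The counting heuristic is an assumption.
fn runtime_scope_compatible(
    business_chain_complete: bool,
    localhost_refs: usize,
    business_refs: usize,
) -> bool {
    business_chain_complete && business_refs >= localhost_refs
}
```

A sample with a restored business chain and a minority of localhost references passes, while a localhost-dominated sample still fail-closes.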
## 5. Expected Outcome
After correction:
1. the real sample should still resolve as `paginated_enrichment`
2. `g3_runtime_scope_compatible` should pass when localhost evidence is present but subordinate
3. any remaining mismatch should move from `runtime_scope_gap` to a narrower contract or output-verification gap
4. `G8` representative behavior must not regress

@@ -0,0 +1,52 @@
# G3 Residual Contract Closure Design
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 2: G3 / paginated_enrichment`
> Parent Layer: `Layer C + Layer D`
## Intent
Handle the remaining `G3` fail-closed records that are still unresolved after the enrichment-request and export-plan child plans have finished.
## Fixed Input Bucket
Residual `G3 / paginated_enrichment` records after:
1. `G3 enrichment-request closure`
2. `G3 export-plan closure`
Expected residual themes:
1. `g3_runtime_scope`
2. `join_key`
3. mixed residual contract blockers
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/scene_generator_test.rs`
5. route-local residual inventory and report assets
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. Route 3+ plan files
## Expected Delta
1. reduce the residual Route 2 bucket after the first two child plans
2. or explicitly defer a smaller residual set with named blockers
## Stop Rule
Stop when:
1. the residual Route 2 bucket is either materially reduced, or
2. the remaining residual Route 2 scenes are explicitly named and deferred
After this point, Route 2 is considered complete or deferred.

@@ -0,0 +1,51 @@
# G6 Host-Bridge Callback Semantics Design
> Date: 2026-04-19
> Status: Draft
> Upstream Report: [2026-04-19-g6-host-bridge-execution-semantics-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g6-host-bridge-execution-semantics-report.md)
## 1. Intent
This design defines the next bounded slice after `G6 host-bridge execution semantics`.
The target is:
`G6 host-bridge callback semantics`
## 2. Why This Slice
The previous semantic slice isolated two seams, but the tighter next pressure is callback-side semantics:
1. invocation semantics are already identified
2. callback completion semantics still determine whether later real execution can be bounded safely
## 3. Scope Boundary
This design is limited to callback semantics only.
It may include:
1. defining completion states for callback requests
2. defining blocked/error/partial/ok transitions
3. defining how callback semantics constrain later real-sample entry
It must not include:
1. implementing host-runtime directly
2. executing a `G6` real sample
3. reopening `G7`
4. opening `G8`
## 4. Target Outcome
The bounded target outcome is one state:
1. one bounded `G6 host-bridge callback semantics` plan
## 5. Exit Condition
This design is complete when implementation can be bounded to one plan that:
1. freezes callback semantics only
2. separates completion state logic from transport/runtime implementation
3. emits one bounded follow-up plan


@@ -0,0 +1,53 @@
# G6 Host-Bridge Callback State Verification Design
> Date: 2026-04-19
> Status: Draft
> Upstream Report: [2026-04-19-g6-host-bridge-callback-semantics-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g6-host-bridge-callback-semantics-report.md)
## 1. Intent
This design defines the next bounded slice after `G6 host-bridge callback semantics`.
The target is:
`G6 host-bridge callback state verification`
## 2. Why This Slice
The callback states are now explicit, but they have not yet been bounded into a verification-oriented slice.
The next pressure is narrower:
1. verify state transitions as a bounded model
2. keep that verification separate from real execution and host-runtime implementation
## 3. Scope Boundary
This design is limited to callback state verification only.
It may include:
1. defining bounded verification targets for `ok/partial/blocked/error`
2. defining what evidence is sufficient for each transition
3. defining how verification narrows a later `G6` real-sample entry
It must not include:
1. implementing host-runtime directly
2. executing a `G6` real sample
3. opening `G8`
4. reopening `G7`
## 4. Target Outcome
The bounded target outcome is one state:
1. one bounded `G6 host-bridge callback state verification` plan
## 5. Exit Condition
This design is complete when implementation can be bounded to one plan that:
1. freezes callback-state verification scope
2. defines bounded verification targets
3. emits one bounded follow-up plan


@@ -0,0 +1,52 @@
# G6 Host-Bridge Entry Gate Design
> Date: 2026-04-19
> Status: Draft
> Upstream Report: [2026-04-19-g6-host-bridge-entry-readiness-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g6-host-bridge-entry-readiness-report.md)
## 1. Intent
This design defines the next bounded slice after `G6 host-bridge entry readiness`.
The target is:
`G6 host-bridge entry gate`
## 2. Why This Slice
The semantic readiness criteria are now explicit.
The next bounded pressure is:
1. turn those criteria into a bounded future entry gate without opening real execution
## 3. Scope Boundary
This design is limited to gate modeling only.
It may include:
1. defining pass/fail gate conditions for a future `G6` entry slice
2. defining which readiness criteria are hard blockers
3. defining how the gate narrows later real-sample entry
It must not include:
1. executing a `G6` real sample
2. implementing host-runtime directly
3. opening `G8`
4. reopening `G7`
## 4. Target Outcome
The bounded target outcome is one state:
1. one bounded `G6 host-bridge entry gate` plan
## 5. Exit Condition
This design is complete when implementation can be bounded to one plan that:
1. freezes gate-model scope
2. defines bounded gate conditions
3. emits one bounded follow-up plan


@@ -0,0 +1,52 @@
# G6 Host-Bridge Entry Gate Verification Design
> Date: 2026-04-19
> Status: Draft
> Upstream Report: [2026-04-19-g6-host-bridge-entry-gate-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g6-host-bridge-entry-gate-report.md)
## 1. Intent
This design defines the next bounded slice after `G6 host-bridge entry gate`.
The target is:
`G6 host-bridge entry gate verification`
## 2. Why This Slice
The future entry gate is now explicit.
The next bounded pressure is:
1. verify the gate model itself before any later real-sample entry slice is considered
## 3. Scope Boundary
This design is limited to gate verification only.
It may include:
1. defining bounded verification targets for the hard gate
2. defining how fail-close reasons are checked
3. defining how gate verification narrows later `G6` entry work
It must not include:
1. executing a `G6` real sample
2. implementing host-runtime directly
3. opening `G8`
4. reopening `G7`
## 4. Target Outcome
The bounded target outcome is one state:
1. one bounded `G6 host-bridge entry gate verification` plan
## 5. Exit Condition
This design is complete when implementation can be bounded to one plan that:
1. freezes gate-verification scope
2. defines bounded verification targets
3. emits one bounded follow-up plan


@@ -0,0 +1,51 @@
# G6 Host-Bridge Entry Readiness Design
> Date: 2026-04-19
> Status: Draft
> Upstream Report: [2026-04-19-g6-host-bridge-callback-state-verification-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g6-host-bridge-callback-state-verification-report.md)
## 1. Intent
This design defines the next bounded slice after `G6 host-bridge callback state verification`.
The target is:
`G6 host-bridge entry readiness`
## 2. Why This Slice
The callback states are now explicit and their verification priority is explicit.
The next bounded pressure is:
1. determine whether those bounded semantics are sufficient to define a future `G6` real-sample entry gate
## 3. Scope Boundary
This design is limited to entry-readiness modeling only.
It may include:
1. defining bounded readiness criteria for future `G6` entry
2. defining which semantic pieces must be present before `G6` real-sample execution may be opened
It must not include:
1. executing a `G6` real sample
2. implementing host-runtime directly
3. opening `G8`
4. reopening `G7`
## 4. Target Outcome
The bounded target outcome is one state:
1. one bounded `G6 host-bridge entry readiness` plan
## 5. Exit Condition
This design is complete when implementation can be bounded to one plan that:
1. freezes entry-readiness scope
2. defines bounded readiness criteria
3. emits one bounded follow-up plan


@@ -0,0 +1,51 @@
# G6 Host-Bridge Execution Semantics Design
> Date: 2026-04-19
> Status: Draft
> Upstream Report: [2026-04-19-g6-host-bridge-prerequisites-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g6-host-bridge-prerequisites-report.md)
## 1. Intent
This design defines the next bounded slice after `G6 host-bridge prerequisites` isolates the minimum blocked capability.
The target is:
`G6 host-bridge execution semantics`
## 2. Why This Slice
This slice is selected because the remaining `G6` gap is narrower than broad host-runtime implementation:
1. bridge action invocation semantics
2. callback completion semantics
## 3. Scope Boundary
This design is limited to semantic scoping only.
It may include:
1. defining the minimum bridge action semantic
2. defining the minimum callback completion semantic
3. defining how those semantics bound later `G6` real-sample entry
It must not include:
1. implementing host-runtime directly
2. executing a `G6` real sample
3. opening `G8`
4. reopening `G7`
## 4. Target Outcome
The bounded target outcome is one state:
1. one bounded `G6 host-bridge execution semantics` plan
## 5. Exit Condition
This design is complete when implementation can be bounded to one plan that:
1. freezes the semantic boundary
2. separates bridge invocation from callback completion
3. emits one bounded follow-up plan


@@ -0,0 +1,53 @@
# G6 Host-Bridge Prerequisites Design
> Date: 2026-04-19
> Status: Draft
> Upstream Decision: [2026-04-19-boundary-runtime-prerequisites-decision-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-boundary-runtime-prerequisites-decision-report.md)
## 1. Intent
This design defines the next bounded slice after the boundary-runtime prerequisites roadmap selects `G6 host-bridge prerequisites`.
The target is:
`G6 host-bridge prerequisites`
## 2. Why G6 First
`G6` is selected because:
1. it is blocked by one clearer prerequisite line
2. that prerequisite line is narrower than the combined local-doc and attachment burden on `G8`
3. it is the smaller bounded next step after `G7`
## 3. Scope Boundary
This design is limited to prerequisite scoping only.
It may include:
1. isolating the minimum host-bridge execution semantics needed before `G6` real-sample entry
2. defining a bounded prerequisite slice
3. publishing one follow-up bounded plan
It must not include:
1. executing a `G6` real sample
2. implementing host-runtime directly
3. reopening `G7`
4. opening `G8`
5. opening `G4 / G5`
## 4. Target Outcome
The bounded target outcome is one state:
1. one bounded `G6` prerequisites plan
## 5. Exit Condition
This design is complete when implementation can be bounded to one plan that:
1. freezes the `G6` prerequisite boundary
2. isolates the minimum blocked host-bridge capability
3. publishes a bounded follow-up plan


@@ -0,0 +1,111 @@
# G6 Real-Sample Entry Preparation And Bounded Execution Design
> Date: 2026-04-19
> Status: Draft
> Replaces Further G6 Micro-Planning: use this design as the single surviving `G6` design reference
## 1. Intent
This design replaces the previous over-split `G6` micro-planning chain with one implementation-oriented bounded design.
The target is:
`G6 real-sample entry preparation and bounded execution`
## 2. Why This Redesign Exists
The prior `G6` work was split too finely into:
1. prerequisites
2. execution semantics
3. callback semantics
4. callback-state verification
5. entry readiness
6. entry gate
That chain produced useful conclusions, but it also created planning recursion.
This redesign stops that recursion.
The older `G6` planning documents are now treated only as input material, not as separate execution tracks.
## 3. Preserved Inputs
The only conclusions preserved from the earlier `G6` planning chain are:
1. `G6` already has classification, family preservation, and a minimum runtime-contract shape
2. the remaining pressure is `host bridge real execution semantics`
3. callback completion states are already explicit:
- `blocked`
- `error`
- `partial`
- `ok`
4. the future `G6` fail-close reasons are already explicit:
- `g6_bridge_invocation_semantics_missing`
- `g6_callback_completion_semantics_missing`
- `g6_callback_state_targets_missing`
No further `G6` semantic sub-plans should be opened for the same topic.
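The preserved callback states and fail-close reasons can be sketched as a small Rust model. The type and method names here are illustrative, not the repo's actual host-bridge types:

```rust
// Illustrative sketch of the G6 callback completion states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CallbackState {
    Blocked,
    Error,
    Partial,
    Ok,
}

impl CallbackState {
    /// Only `Ok` closes the G6 line; every other state must be
    /// bounded before a real-sample entry slice may open.
    fn allows_real_sample_entry(self) -> bool {
        matches!(self, CallbackState::Ok)
    }

    /// Hypothetical mapping from a named fail-close reason to the
    /// state it forces.
    fn from_fail_close_reason(reason: &str) -> Option<CallbackState> {
        match reason {
            "g6_bridge_invocation_semantics_missing"
            | "g6_callback_completion_semantics_missing"
            | "g6_callback_state_targets_missing" => Some(CallbackState::Blocked),
            _ => None,
        }
    }
}
```

Under this model the three named fail-close reasons all collapse into `Blocked`, which is exactly why no further semantic sub-plans are needed on this topic.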
## 4. Scope Boundary
This design is limited to one bounded `G6` mainline preparation-and-execution slice.
It may include:
1. freezing one final `G6` entry gate
2. implementing one minimum host-bridge execution seam
3. running one fixed `G6` real sample
4. writing back one bounded validation result
It must not include:
1. opening more `G6` semantic sub-plans
2. reopening `G7`
3. opening `G8`
4. opening `G4 / G5`
5. broad host-runtime platform redesign
6. multi-sample `G6` family expansion
## 5. Fixed Target
This design allows only one `G6` fixed real-sample anchor.
The exact sample must remain the existing `G6` representative real sample already referenced by current boundary-family materials.
No second `G6` real sample may be introduced under this design.
## 6. Target Outcome
The bounded target outcome is only one of two states:
1. `executed-pass`
2. `named mismatch`
The design explicitly rejects a third outcome of “write another semantic clarification plan”.
## 7. Stop Conditions
This redesign introduces hard stop conditions:
1. once the fixed `G6` real sample is executed, no new `G6` semantic sub-plan may be created
2. if the result is `mismatch`, only an implementation correction plan may follow
3. if the result is `executed-pass`, the `G6` line closes immediately
## 8. Execution Shape
The single surviving `G6` execution shape is:
1. freeze the final entry gate
2. implement the minimum host-bridge execution seam
3. run the fixed real sample once
4. update validation assets and close
## 9. Exit Condition
This design is complete when one bounded plan exists that:
1. freezes the final `G6` gate
2. moves directly into implementation
3. runs one fixed real sample
4. closes with `executed-pass` or `named mismatch`


@@ -0,0 +1,57 @@
# G7 Real-Sample Entry Design
> Date: 2026-04-19
> Status: Draft
> Upstream Decision: [2026-04-19-boundary-family-entry-decision-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-boundary-family-entry-decision-report.md)
## 1. Intent
This design defines the next bounded execution slice after the boundary-family entry roadmap selected `G7` as the only next candidate.
The target is:
`G7 real-sample entry`
## 2. Why G7
`G7` is selected because:
1. it already has a minimal runnable runtime contract
2. it does not require host bridge execution semantics as strongly as `G6`
3. it does not require local document pipeline and attachment handling as strongly as `G8`
## 3. Scope Boundary
This design is limited to one representative `G7` real sample:
1. `计量资产库存统计`
It may include:
1. real-sample contract differential
2. bounded real-sample rerun
3. validation-layer update if the result narrows
It must not include:
1. new `G7` family expansion
2. new runtime-platform work
3. `G6` or `G8` execution
4. `G4 / G5`
## 4. Target Outcome
The bounded target outcome is one of two states:
1. `executed-pass`
2. or a smaller named `G7` real-sample mismatch
The design rejects opening generalized boundary-family work beyond this one representative sample.
## 5. Exit Condition
This design is complete when implementation can be bounded to one plan that:
1. freezes one `G7` real sample
2. reruns it against the existing minimal `G7` runtime contract
3. updates the validation layer with a narrower outcome


@@ -0,0 +1,27 @@
# Host-Bridge Runtime Roadmap Design
> Date: 2026-04-19
> Parent Sequence: `2026-04-19-final-2-residual-child-plan-sequence-plan.md`
> Fixed Scene: `sweep-085-scene`
## Intent
Close or narrow the remaining host-bridge runtime residual without resurrecting the old G6 semantics micro-plan chain.
## Fixed Scope
Only `sweep-085-scene` is in scope.
## Minimal Success Definition
The scene must either:
1. become `framework-auto-pass-candidate`; or
2. remain `framework-structured-fail-closed` with a narrower named host-bridge runtime hold.
## Forbidden Scope
1. no general host-runtime transport implementation
2. no new G6 semantics micro-plan
3. no changes to G1-E/G2/G3 routes
4. no new family


@@ -0,0 +1,70 @@
# Local-Doc Official Board Reconciliation Refresh Design
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Roadmap: `2026-04-19-local-doc-runtime-roadmap-plan.md`
> Layer: `Layer E / official board reconciliation`
## Intent
Consume the five local-doc reconciliation candidates produced by the local-doc runtime roadmap and refresh only their framework status in the official execution board.
This design exists because the local-doc roadmap intentionally stopped before official board update. The board update needs a bounded reconciliation refresh that applies the promotion policy without modifying generation logic.
## Fixed Inputs
1. `tests/fixtures/generated_scene/local_doc_runtime_reconciliation_candidates_2026-04-19.json`
2. `tests/fixtures/generated_scene/promotion_board_reconciliation_policy_2026-04-19.json`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Scope
Only these five scene ids are in scope:
1. `sweep-033-scene`
2. `sweep-034-scene`
3. `sweep-042-scene`
4. `sweep-051-scene`
5. `sweep-074-scene`
## Refresh Rule
If a fixed-scope scene is present in the local-doc candidate asset as `framework-auto-pass-candidate`, update only framework-layer board fields:
- `currentFrameworkStatus`
- `currentFrameworkCandidateStatus`
- `currentFrameworkArchetype`
- `currentFrameworkReadiness`
- `currentFrameworkSource`
- `currentFrameworkDecisionOverlay`
- `currentFrameworkNextAction`
- `currentFrameworkCanAutoUpdateBoard`
Workbook snapshot fields, `currentGroup`, `currentStatus`, real-sample fields, and scene names are preserved.
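A minimal sketch of the refresh rule, assuming a simplified board record (the real board JSON carries far more fields than shown, and the field names here are illustrative):

```rust
// Simplified board record: only framework-layer fields are refreshable.
#[derive(Clone, Debug, PartialEq)]
struct BoardRecord {
    current_group: String,                      // preserved
    current_status: String,                     // preserved
    current_framework_status: String,           // refreshable
    current_framework_candidate_status: String, // refreshable
}

/// Apply the refresh rule: framework-layer fields change only when the
/// local-doc candidate asset marks the scene as an auto-pass candidate.
/// Returns true when a refresh was applied.
fn refresh(record: &mut BoardRecord, candidate_status: Option<&str>) -> bool {
    match candidate_status {
        Some("framework-auto-pass-candidate") => {
            record.current_framework_status = "framework-auto-pass".into();
            record.current_framework_candidate_status =
                "framework-auto-pass-candidate".into();
            true
        }
        _ => false,
    }
}
```

The preserved fields are never touched by `refresh`, which is the point of restricting the rule to framework-layer fields.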
## Forbidden Scope
1. Do not modify `src/generated_scene/analyzer.rs`.
2. Do not modify `src/generated_scene/generator.rs`.
3. Do not rerun the 102 sweep.
4. Do not update host-bridge residuals.
5. Do not update bootstrap residuals.
6. Do not rename official-board scenes.
7. Do not promote non-framework business status.
## Expected Result
The official board framework summary moves from:
- `framework-auto-pass = 95`
- `framework-structured-fail-closed = 7`
to:
- `framework-auto-pass = 100`
- `framework-structured-fail-closed = 2`
The remaining two structured fail-closed records should be:
- one host-bridge runtime residual;
- one bootstrap target normalization residual.


@@ -0,0 +1,61 @@
# Local-Doc Runtime Roadmap Design
> Date: 2026-04-19
> Parent Decision: `2026-04-19-residual-runtime-roadmap-prioritization-design.md`
> Parent Residual Bucket: `local_doc_pipeline`
> Status: Draft
## Intent
Define the next bounded roadmap for the five remaining `local_doc_pipeline` residuals.
This roadmap exists because official board reconciliation identified five explained structured fail-closed scenes that require local document runtime and attachment/document handling semantics.
## Fixed Input Bucket
The roadmap is limited to the five official board residuals with:
1. `currentFrameworkStatus = framework-structured-fail-closed`
2. `currentFrameworkArchetype = local_doc_pipeline`
3. `currentFrameworkNextAction = future-local-doc-runtime-roadmap-input`
## Target Scenes
1. `sweep-033-scene` / `供电可靠率指标统计表`
2. `sweep-034-scene` / `供电可靠性数据质量自查报告月报`
3. `sweep-042-scene` / `国网金昌供电公司营商环境周例会报告`
4. `sweep-051-scene` / `嘉峪关可靠性分析报告`
5. `sweep-074-scene` / `同兴智能安全督查日报`
## Roadmap Goal
Move the five local-doc residuals from generic structured fail-closed to one of:
1. runnable local-doc contract;
2. named local-doc runtime missing capability;
3. explicit non-goal that remains policy-held.
## Boundary
This design must not:
1. modify host-bridge runtime;
2. open bootstrap target normalization;
3. add a new family;
4. update the official board without a dedicated reconciliation step;
5. treat local document runtime as generic paginated enrichment.
## Required Work Areas
1. local document source evidence extraction;
2. document artifact contract;
3. attachment/input dependency modeling;
4. local pipeline execution seam;
5. fail-closed reasons specific to local-doc runtime.
## Acceptance Criteria
1. The five target scenes remain the only input bucket.
2. Each scene has a local-doc runtime decision.
3. Any implementation step remains bounded to `local_doc_pipeline`.
4. Follow-up reconciliation is explicit and separate.


@@ -0,0 +1,63 @@
# Official Board Reconciliation Design
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-design.md`
> Parent Layer: `Layer E`
> Status: Active
## Intent
Apply the final 102-scene coverage rollup to the official execution board in a controlled, auditable way.
This design is the only point where `scene_execution_board_2026-04-18.json` may be updated from the final coverage rollup.
## Inputs
1. `tests/fixtures/generated_scene/final_coverage_status_rollup_2026-04-19.json`
2. `tests/fixtures/generated_scene/promotion_board_reconciliation_policy_2026-04-19.json`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Outputs
1. updated `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/official_board_reconciliation_2026-04-19.json`
3. `docs/superpowers/reports/2026-04-19-official-board-reconciliation-report.md`
## Status Mapping
1. `framework-auto-pass-candidate` maps to `framework-auto-pass`.
2. `framework-structured-fail-closed` maps to `framework-structured-fail-closed`.
3. `framework-valid-host-bridge` maps to `framework-valid-host-bridge`.
4. unresolved raw statuses remain explicit and must not be collapsed.
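The mapping above can be sketched as one total function that refuses to collapse unresolved statuses; the function name is illustrative:

```rust
// Sketch of the reconciliation status mapping. Unresolved raw statuses
// are surfaced as errors rather than silently collapsed.
fn map_framework_status(raw: &str) -> Result<&'static str, String> {
    match raw {
        "framework-auto-pass-candidate" => Ok("framework-auto-pass"),
        "framework-structured-fail-closed" => Ok("framework-structured-fail-closed"),
        "framework-valid-host-bridge" => Ok("framework-valid-host-bridge"),
        other => Err(format!("unresolved framework status: {other}")),
    }
}
```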
## Board Fields
The reconciliation may add or update only current framework fields:
1. `currentFrameworkStatus`
2. `currentFrameworkArchetype`
3. `currentFrameworkReadiness`
4. `currentFrameworkSource`
5. `currentFrameworkDecisionOverlay`
6. `currentFrameworkNextAction`
7. `currentFrameworkCanAutoUpdateBoard`
Existing frozen workbook snapshot fields must be preserved.
## Boundary
This design must not:
1. modify `src/generated_scene/analyzer.rs`;
2. modify `src/generated_scene/generator.rs`;
3. rerun the 102 sweep;
4. promote real-sample validation status;
5. remove existing snapshot fields from the board.
## Acceptance Criteria
1. Official board still contains exactly `102` scenes.
2. Final framework status counts are `95` framework auto-pass and `7` structured fail-closed.
3. No unresolved framework status remains.
4. Reconciliation JSON records all updated scenes.
5. Report explains the remaining `7` residuals and the next roadmap inputs.


@@ -0,0 +1,111 @@
# Post-G7 Boundary Decision Roadmap Design
> Date: 2026-04-19
> Status: Draft
> Upstream Validation Layer: [real_sample_validation_records_2026-04-18.json](D:/data/ideaSpace/rust/sgClaw/claw-new/tests/fixtures/generated_scene/real_sample_validation_records_2026-04-18.json)
> Upstream Entry Rules: [boundary_runtime_entry_rules_2026-04-18.json](D:/data/ideaSpace/rust/sgClaw/claw-new/tests/fixtures/generated_scene/boundary_runtime_entry_rules_2026-04-18.json)
> Upstream Closure: [2026-04-19-g7-real-sample-entry-closure-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g7-real-sample-entry-closure-report.md)
## 1. Intent
This design defines the next bounded roadmap after `G7` has closed as the first executed boundary-family real sample.
The current validated state is now:
1. `G1-E = executed-pass`
2. `G2 = executed-pass`
3. `G3 = executed-pass`
4. `G7 = executed-pass`
So the next roadmap must not reopen any closed mainline slice and must not continue extending the finished `G7` plan.
The only question under this roadmap is:
`After G7, should another boundary family enter real-sample scope next, or should boundary work stop and defer to prerequisites?`
## 2. Problem Statement
The prior boundary-entry roadmap solved the first ambiguity by selecting `G7`.
That ambiguity is now closed.
The remaining ambiguity is narrower:
1. whether `G6` is now the next justified boundary-family entry candidate
2. whether `G8` is now the next justified boundary-family entry candidate
3. or whether both should remain held and a bounded prerequisites roadmap should be opened first
Without a new roadmap, the next step would drift into one of three bad outcomes:
1. reopening `G7` after closure
2. opening both `G6` and `G8` at once
3. starting runtime-platform implementation without a bounded decision slice
## 3. Scope Boundary
This roadmap is limited to a post-`G7` boundary-family decision.
It may include:
1. restating the now-closed `G7` result
2. comparing only `G6` and `G8` as remaining boundary candidates
3. determining whether one of them is admitted next
4. or determining that both remain held and a prerequisites slice is needed
5. publishing one bounded follow-up `design + plan`
It must not include:
1. reopening `G7` implementation or expansion
2. reopening `G1-E / G2 / G3`
3. opening `G4 / G5`
4. implementing host-runtime, transport, or local-doc prerequisites
5. executing real samples for more than one boundary family
## 4. Current Decision Inputs
The current repo state already provides the relevant admission constraints:
1. `G6` still needs stronger host-bridge real execution semantics than current repo-local coverage
2. `G8` still needs stronger local document pipeline and attachment/runtime handling than current repo-local coverage
3. `G7` is no longer a candidate because it has already closed as an executed pass
These are decision inputs only.
They are not yet implementation tasks.
## 5. Roadmap Goal
The goal of this roadmap is to reduce the post-`G7` boundary question to one bounded next step:
1. select exactly one next bounded direction
2. either `G6`
3. or `G8`
4. or a prerequisites-only slice with both held
## 6. Preferred Outcome
The preferred outcome is:
1. either one selected next boundary family
2. or one bounded prerequisites roadmap
3. with the non-selected direction explicitly held
## 7. Acceptance Logic
This roadmap is successful when:
1. `G6` and `G8` no longer compete ambiguously
2. `G7` is not reopened
3. only one bounded next direction is emitted
4. no runtime-platform implementation is started under roadmap scope
## 8. Out of Scope
The following are explicitly out of scope:
1. new scene-generator family work
2. new canonical answers
3. new mainline contract correction
4. direct host-runtime implementation
5. direct local-doc runtime implementation
6. `G4 / G5`


@@ -0,0 +1,48 @@
# Promotion And Board Reconciliation Policy Design
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 6: promotion and board reconciliation`
> Parent Layer: `Layer E`
## Intent
Define how stronger framework-resolved statuses may update the official scene state representation without over-promoting diagnostics.
## Fixed Input Bucket
This route owns policy, not an archetype bucket.
Fixed policy inputs:
1. `auto-pass`
2. `fail-closed-known`
3. `adjudicated-valid-host-bridge`
4. hygiene-aware timeout interpretation
## Allowed Files
1. policy design and policy plan docs
2. reconciliation-policy JSON assets
3. policy reports
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. direct scene promotion inside `scene_execution_board_2026-04-18.json`
## Expected Delta
Policy-only delta:
1. future board updates become rule-driven
2. diagnostics and promotion are no longer conflated
## Stop Rule
Stop after the policy rules are published.
Do not apply policy updates to the execution board under this plan.


@@ -0,0 +1,93 @@
# Remaining Route Conflict Correction Design
> Date: 2026-04-19
> Status: Draft
> Upstream Report: `docs/superpowers/reports/2026-04-19-102-full-sweep-improvement-coverage-delta-report.md`
## Design Intent
Resolve the remaining `4` route conflicts from the follow-up `102` sweep without reopening the broader full-sweep improvement roadmap.
The design answers:
`which of the remaining G3/G2 vs host_bridge_workflow conflicts should be corrected, and which should be formally adjudicated as valid host-bridge workflows?`
## Fixed Input
The fixed input set is exactly the `4` misclassified records from `tests/fixtures/generated_scene/full_sweep_improvement_followup_2026-04-19.json`:
| Scene | Expected group | Expected archetype | Current inferred archetype |
| --- | --- | --- | --- |
| `95598报修工单日管控` | `G3` | `paginated_enrichment` | `host_bridge_workflow` |
| `95598重要服务事项报备统计表` | `G3` | `paginated_enrichment` | `host_bridge_workflow` |
| `台区线损台区月度高负损预测` | `G2` | `multi_mode_request` | `host_bridge_workflow` |
| `配网支撑月报(95598抢修统计报表)` | `G3` | `paginated_enrichment` | `host_bridge_workflow` |
## Scope Guardrails
1. do not add new scene families
2. do not reopen timeout work
3. do not reopen readiness-before-report work
4. do not update `scene_execution_board_2026-04-18.json`
5. do not promote scenes automatically
6. do not weaken `G6` host-bridge real-sample pass
7. do not weaken `G2` or `G3` canonical / real-sample pass
8. do not make `host_bridge_workflow` lose when it is the only complete contract
## Route Decision Model
Each conflict must be assigned exactly one final route decision:
1. `route-corrected-to-g3`
2. `route-corrected-to-g2`
3. `valid-host-bridge-workflow`
4. `board-expectation-stale`
5. `route-conflict-unresolved`
## Evidence Rules
### G3 Wins Over G6 Only When
1. business endpoint evidence is present
2. pagination evidence is present
3. response path evidence is present
4. at least one of enrichment, join-key, or export workflow evidence is present
5. host bridge evidence is subordinate rather than the only execution path
### G2 Wins Over G6 Only When
1. line-loss / electricity business signal is present
2. mode or prediction signal is present
3. request contract can be inferred
4. host bridge evidence is subordinate rather than the only execution path
### G6 Remains Valid When
1. host bridge action is the only complete execution path
2. callback / localhost dependency dominates the workflow
3. business-chain evidence does not close the expected G2/G3 contract
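The decision model and evidence rules can be sketched together as one routing function. The booleans are illustrative stand-ins for the analyzer's richer contract evidence, and the `board-expectation-stale` decision is omitted because it requires board comparison rather than per-scene evidence:

```rust
#[derive(Debug, PartialEq)]
enum RouteDecision {
    RouteCorrectedToG3,
    RouteCorrectedToG2,
    ValidHostBridgeWorkflow,
    RouteConflictUnresolved,
}

// Illustrative evidence flags for one conflicted scene.
#[derive(Default)]
struct Evidence {
    business_endpoint: bool,
    pagination: bool,
    response_path: bool,
    enrichment_join_or_export: bool,
    line_loss_signal: bool,
    mode_or_prediction_signal: bool,
    request_contract_inferable: bool,
    host_bridge_only_path: bool,
}

fn decide(e: &Evidence) -> RouteDecision {
    // G3 wins only when the full business chain closes and host bridge
    // is subordinate.
    let g3 = e.business_endpoint
        && e.pagination
        && e.response_path
        && e.enrichment_join_or_export
        && !e.host_bridge_only_path;
    // G2 wins only when business, mode, and contract signals close.
    let g2 = e.line_loss_signal
        && e.mode_or_prediction_signal
        && e.request_contract_inferable
        && !e.host_bridge_only_path;
    if g3 {
        RouteDecision::RouteCorrectedToG3
    } else if g2 {
        RouteDecision::RouteCorrectedToG2
    } else if e.host_bridge_only_path {
        RouteDecision::ValidHostBridgeWorkflow
    } else {
        RouteDecision::RouteConflictUnresolved
    }
}
```

Note the asymmetry: host bridge never "loses" by default; it is overridden only when a complete G2 or G3 contract closes.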
## Expected Deliverables
1. route conflict decision JSON
2. route conflict correction report
3. bounded routing regression tests if implementation correction is needed
4. follow-up probe result for the same `4` records
## Completion Criteria
This design is complete when:
1. all `4` conflicts have explicit final decisions
2. corrected routes are verified by targeted generation probes
3. valid host-bridge cases remain documented rather than forced into G2/G3
4. existing `G2/G3/G6` regressions still pass
## Out of Scope
1. full `102` sweep rerun unless explicitly required after route correction
2. timeout optimization
3. new family creation
4. login / host runtime implementation
5. execution board status sync


@@ -0,0 +1,47 @@
# Residual Runtime Roadmap Prioritization Design
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-design.md`
> Parent Layer: `Layer E`
> Status: Active
## Intent
Choose the next bounded roadmap after official board reconciliation.
The official board now has `95` framework auto-pass scenes and `7` explained structured fail-closed residuals. This design compares the three residual roadmap inputs and selects exactly one next roadmap.
## Inputs
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/official_board_reconciliation_2026-04-19.json`
## Candidate Roadmaps
1. `local-doc runtime roadmap`
2. `host-bridge runtime roadmap`
3. `bootstrap target normalization roadmap`
## Decision Criteria
1. `impact`: number of residual scenes addressed.
2. `scope clarity`: whether the required implementation boundary is clear.
3. `prerequisite weight`: whether the roadmap requires large external runtime work.
4. `risk`: likelihood of disturbing the already reconciled `95` framework auto-pass scenes.
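One illustrative way to make the four criteria comparable is a simple weighted score; the weights below are assumptions for this sketch, not policy the repo defines:

```rust
struct RoadmapCandidate {
    name: &'static str,
    residual_scenes: i32,          // impact
    scope_is_clear: bool,          // scope clarity
    needs_heavy_prerequisites: bool, // prerequisite weight
    risks_reconciled_scenes: bool, // risk to the 95 auto-pass scenes
}

// Hypothetical weights: impact dominates, heavy prerequisites and
// regression risk subtract.
fn score(c: &RoadmapCandidate) -> i32 {
    let mut s = c.residual_scenes * 2;
    if c.scope_is_clear {
        s += 2;
    }
    if c.needs_heavy_prerequisites {
        s -= 3;
    }
    if c.risks_reconciled_scenes {
        s -= 4;
    }
    s
}
```

Under these assumed weights, a five-scene local-doc bucket with a clear boundary outranks a single-scene roadmap that still needs heavy runtime prerequisites.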
## Boundary
This design is decision-only. It must not:
1. modify `src/generated_scene/analyzer.rs`;
2. modify `src/generated_scene/generator.rs`;
3. modify `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`;
4. start runtime implementation;
5. add a new family.
## Acceptance Criteria
1. All `7` residual records are accounted for.
2. Exactly one next roadmap is selected.
3. Deferred roadmaps have explicit reasons.
4. A next bounded design/plan is created for the selected roadmap only.


@@ -0,0 +1,98 @@
# Scene Skill 102 Final Materialization Design
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Upstream Status: `framework-auto-pass = 102 / 102`
## Intent
Freeze a single final skill asset set for the current 102 scenes.
The previous framework work proved that all 102 scenes can be adapted at the framework layer, but the generated skill packages are spread across multiple follow-up directories. Before static, mock, or production-like validation, the project needs one canonical materialized skill set.
## Key Decision
Do not clean or overwrite existing `examples/*` follow-up directories.
Instead, create a new isolated materialization root:
`examples/scene_skill_102_final_materialization_2026-04-19`
Rationale:
1. previous `examples/*` directories are audit artifacts for earlier plans;
2. deleting them would destroy provenance;
3. a new root gives validation a stable input set;
4. final materialization can be repeated without mutating history.
## Inputs
1. official board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. final framework rollup: `tests/fixtures/generated_scene/scene_skill_102_framework_closure_rollup_2026-04-19.json`
3. scene source root: `D:/desk/智能体资料/全量业务场景/一平台场景`
4. generator binary: `cargo run --bin sg_scene_generate`
## Required Hygiene
Before generation, build a clean materialization input manifest.
The official board is the status authority, but it may contain historical encoding or control-character artifacts. The materialization manifest must therefore validate:
1. exactly 102 rows;
2. unique scene ids;
3. source directory exists for each row;
4. scene name used for generation is stable;
5. unsafe control characters are not propagated into final manifest fields.
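The five hygiene rules above can be expressed as one validation pass over the manifest rows. This is a minimal sketch; the `ManifestRow` field names are assumptions for illustration, not the project's actual manifest schema.

```rust
use std::collections::HashSet;
use std::path::Path;

// One row of the materialization input manifest. Field names are
// illustrative assumptions, not the project's actual schema.
struct ManifestRow {
    scene_id: String,
    scene_name: String,
    source_dir: String,
}

// Reject control characters; ordinary tab whitespace is allowed.
fn has_unsafe_control_chars(s: &str) -> bool {
    s.chars().any(|c| c.is_control() && c != '\t')
}

fn validate(rows: &[ManifestRow]) -> Result<(), String> {
    // Rule 1: exactly 102 rows.
    if rows.len() != 102 {
        return Err(format!("expected 102 rows, got {}", rows.len()));
    }
    let mut seen = HashSet::new();
    for row in rows {
        // Rule 2: unique scene ids.
        if !seen.insert(row.scene_id.as_str()) {
            return Err(format!("duplicate scene id: {}", row.scene_id));
        }
        // Rule 3: source directory exists for each row.
        if !Path::new(&row.source_dir).is_dir() {
            return Err(format!("missing source dir for {}", row.scene_id));
        }
        // Rules 4/5: stable, control-character-free scene name.
        if row.scene_name.trim().is_empty() || has_unsafe_control_chars(&row.scene_name) {
            return Err(format!("unstable or unsafe scene name for {}", row.scene_id));
        }
    }
    Ok(())
}

fn main() {
    // An empty manifest violates the exactly-102-rows rule.
    println!("{:?}", validate(&[]));
}
```

Failing fast on the first violated rule keeps the materialization fail-closed: no manifest is emitted unless all five checks pass.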
## Output Layout
```text
examples/scene_skill_102_final_materialization_2026-04-19/
skills/
sweep-001-scene/
SKILL.toml
SKILL.md
scene.toml
scripts/
references/
generation-report.json
generation-report.md
manifest/
scene_skill_102_final_materialization_manifest_2026-04-19.json
```
Repository-level fixture outputs:
1. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
2. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json`
Report:
1. `docs/superpowers/reports/2026-04-19-scene-skill-102-final-materialization-report.md`
## Success Criteria
The materialization is successful when:
1. all 102 scene ids are attempted;
2. all 102 have a generated skill directory;
3. each generated skill directory has required files;
4. each generated report has `readiness.level = A` or otherwise has a named failure in the failures asset;
5. the manifest is the only input to later static/mock validation plans.
## Non-Goals
1. no production execution;
2. no mock validation;
3. no static validation beyond presence/manifest checks;
4. no deletion of old `examples/*`;
5. no official board mutation;
6. no new family or runtime implementation.
## Follow-Up
After materialization succeeds, the next roadmap should be:
`102 static and mock validation roadmap`
That roadmap must consume the final materialization manifest, not scattered follow-up directories.

View File

@@ -0,0 +1,221 @@
# Scene Skill 102 Full Coverage Child Plan Sequence Design
> Date: 2026-04-19
> Status: Draft
> Parent Framework Design: `docs/superpowers/specs/2026-04-19-scene-skill-102-full-coverage-framework-design.md`
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
## Intent
Turn the parent `102` full-coverage framework into a fixed downstream child-plan sequence.
This design does not implement any bucket directly. It defines:
1. the ordered bounded plans that must be created and executed next
2. which bucket each bounded plan owns
3. which layer and route each bounded plan belongs to
4. which plans are implementation plans and which plans are policy/reconciliation plans
5. what later plans are not allowed to skip
The main purpose is to stop later work from drifting into ad hoc micro-plans.
## Current Parent Baseline
The parent framework freezes the current integrated state as:
| Status | Count |
| --- | ---: |
| `auto-pass` | 48 |
| `fail-closed-known` | 47 |
| `adjudicated-valid-host-bridge` | 4 |
| raw `source-unreadable` | 3 |
| Total | 102 |
The timeout hygiene layer additionally shows:
| Hygiene interpretation | Count |
| --- | ---: |
| `timeout-as-pass-candidate` | 2 |
| `timeout-as-fail-closed-candidate` | 1 |
| `timeout-still-unreadable` | 0 |
## Child Sequence Principles
Every child plan in this sequence must follow the parent framework requirements:
1. one child plan belongs to exactly one route
2. one child plan belongs to exactly one layer
3. one child plan owns one fixed input bucket
4. one child implementation slice should target one repeated recoverable pattern
5. child plans must not silently absorb neighboring buckets
6. a child plan must stop after its declared delta is measured
## Ordered Child Routes
The child plan sequence begins at `Route 2`, because `Route 1` has already been completed at the parent-framework level.
### Route 2
`Layer C + Layer D`
Target:
`G3 / paginated_enrichment structured fail-closed bucket`
Current bucket size:
`34`
Fixed bounded child-plan order inside Route 2:
1. `G3 enrichment-request closure`
2. `G3 export-plan closure`
3. `G3 residual contract closure`
The residual plan only begins after the first two plans have either:
1. produced measurable delta, or
2. been explicitly closed as deferred
### Route 3
`Layer C + Layer D`
Target:
`G2 / multi_mode_request structured fail-closed bucket`
Current bucket size:
`4`
Fixed bounded child-plan order inside Route 3:
1. `G2 remaining fail-closed closure`
No Route 3 follow-up plan may begin until Route 2 has been completed or explicitly deferred.
### Route 4
`Layer C + Layer D`
Target:
`G1-E / single_request_enrichment structured fail-closed bucket`
Current bucket size:
`2`
Fixed bounded child-plan order inside Route 4:
1. `G1-E remaining fail-closed closure`
No Route 4 follow-up plan may begin until Route 3 has been completed or explicitly deferred.
### Route 5
`Layer C + Layer D`
Target:
Boundary-family fail-closed buckets:
1. `local_doc_pipeline = 5`
2. `host_bridge_workflow = 1`
3. `page_state_eval/bootstrap_target = 1`
Fixed bounded child-plan order inside Route 5:
1. `boundary fail-closed decision`
This route is decision-first by design. It must not start implementation correction before mainline routes have been reduced or deferred.
### Route 6
`Layer E`
Target:
Promotion thresholds and board reconciliation policy.
Fixed bounded child-plan order inside Route 6:
1. `promotion and board reconciliation policy`
This route must start only after Routes 2 through 5 have stable post-delta reporting.
## Required Child Plan Fields
Every bounded child plan in this sequence must declare:
1. parent framework reference
2. parent route name
3. parent layer name
4. fixed input bucket
5. allowed file set
6. forbidden file set
7. expected coverage delta
8. stop statement
If one of these is missing, the plan is not valid under this sequence.
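The eight-field validity rule can be made mechanical. The sketch below assumes a hypothetical `ChildPlan` shape; it is not project code, only an illustration of how a missing declaration invalidates the plan.

```rust
// A bounded child plan is valid only when all eight required fields
// are declared. The struct shape is an illustrative sketch.
#[derive(Default)]
struct ChildPlan {
    parent_framework: Option<String>,
    parent_route: Option<String>,
    parent_layer: Option<String>,
    input_bucket: Option<String>,
    allowed_files: Option<Vec<String>>,
    forbidden_files: Option<Vec<String>>,
    expected_delta: Option<String>,
    stop_statement: Option<String>,
}

impl ChildPlan {
    fn missing_fields(&self) -> Vec<&'static str> {
        let mut missing = Vec::new();
        if self.parent_framework.is_none() { missing.push("parent framework reference"); }
        if self.parent_route.is_none() { missing.push("parent route name"); }
        if self.parent_layer.is_none() { missing.push("parent layer name"); }
        if self.input_bucket.is_none() { missing.push("fixed input bucket"); }
        if self.allowed_files.is_none() { missing.push("allowed file set"); }
        if self.forbidden_files.is_none() { missing.push("forbidden file set"); }
        if self.expected_delta.is_none() { missing.push("expected coverage delta"); }
        if self.stop_statement.is_none() { missing.push("stop statement"); }
        missing
    }

    fn is_valid(&self) -> bool {
        self.missing_fields().is_empty()
    }
}

fn main() {
    let empty = ChildPlan::default();
    println!("valid: {}, missing: {}", empty.is_valid(), empty.missing_fields().len());
}
```

Naming every missing field (instead of returning a bare boolean) keeps rejection reports actionable when a plan is turned away.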
## Implementation vs Policy Split
The child sequence intentionally separates implementation plans from policy plans.
Implementation-oriented plans:
1. `G3 enrichment-request closure`
2. `G3 export-plan closure`
3. `G3 residual contract closure`
4. `G2 remaining fail-closed closure`
5. `G1-E remaining fail-closed closure`
Decision or policy-oriented plans:
1. `boundary fail-closed decision`
2. `promotion and board reconciliation policy`
## Expected Coverage Movement
This sequence does not promise `auto-pass` growth on every child plan.
Expected valid deltas include:
1. `fail-closed-known` reduction
2. stronger structured fail-closed naming
3. bucket shrinkage within one archetype
4. policy-recognized status strengthening
Invalid deltas include:
1. scene-name hardcoding
2. silent gate relaxation
3. route changes that are not measured against current canonical and real-sample anchors
## Stop Rules
This child-plan sequence forbids:
1. opening a child implementation plan outside Routes 2 through 6
2. creating route-local semantics micro-plans that do not reduce a measured bucket
3. mixing timeout hygiene with contract recovery in the same bounded implementation plan
4. updating `scene_execution_board_2026-04-18.json` inside any Route 2 through Route 5 implementation plan
5. starting Route 6 before post-Route-5 status is stable enough for policy design
## Completion Condition
This child-plan sequence remains active until all of these are true:
1. the Route 2 child plans are completed or deferred
2. the Route 3 child plan is completed or deferred
3. the Route 4 child plan is completed or deferred
4. the Route 5 decision plan is completed
5. the Route 6 policy plan is completed
At that point, the parent framework may either:
1. remain active with no open child routes, or
2. be revised into a new parent framework revision

View File

@@ -0,0 +1,391 @@
# Scene Skill 102 Full Coverage Framework Design
> Date: 2026-04-19
> Status: Draft
> Upstream Roadmap: `docs/superpowers/plans/2026-04-17-scene-skill-60-to-90-roadmap-plan.md`
> Upstream Reconciliation: `tests/fixtures/generated_scene/full_sweep_status_reconciliation_2026-04-19.json`
> Upstream Follow-up: `tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json`
> Upstream Timeout Hygiene: `tests/fixtures/generated_scene/timeout_rerun_hygiene_integration_2026-04-19.json`
## Intent
Provide the single post-roadmap framework design for driving the current sgClaw scene-to-skill pipeline from partial `102` scene coverage to full bounded `102` scene coverage.
This design is intentionally broader than the bounded micro-plans used so far. It defines:
1. the current actual state of the `102` scene set
2. what is still missing before `100%` coverage can be claimed
3. the layered framework that all future changes must fit into
4. the fixed route order for future implementation work
5. the stop rules that prevent the project from drifting into unbounded plan recursion
This design is meant to become the single parent framework for later bounded plans.
## Current State
### Raw Current State
From the latest integrated assets:
| Status | Count |
| --- | ---: |
| `auto-pass` | 48 |
| `fail-closed-known` | 47 |
| `adjudicated-valid-host-bridge` | 4 |
| raw `source-unreadable` | 3 |
| Total | 102 |
### Timeout Hygiene Overlay
The timeout hygiene layer shows that the raw `3` timeout records are not all hard unreadable records:
| Hygiene-aware timeout interpretation | Count |
| --- | ---: |
| `timeout-as-pass-candidate` | 2 |
| `timeout-as-fail-closed-candidate` | 1 |
| `timeout-still-unreadable` | 0 |
| `timeout-rerun-error` | 0 |
### Interpretation
This means the framework has already reached these milestones:
1. there are no `unsupported-family` scenes in the current `102` sweep
2. there are no unresolved route conflicts left in the current `102` sweep
3. the remaining gap is no longer “framework cannot classify this scene”
4. the remaining gap is “contract does not close” or “timeout budget/hygiene distorts the raw reading”
## What Is Still Missing Before 100% Coverage
`100%` coverage does not mean all `102` scenes must become direct `auto-pass`.
For this framework, `100% bounded coverage` means:
1. every scene is classified into a supported framework path
2. every non-pass result is either:
- structured fail-closed with named blocker
- valid host-bridge workflow adjudication
- hygiene-aware timeout interpretation
3. there are no unresolved buckets like:
- unsupported family
- unresolved route conflict
- opaque no-report failure
- unexplained timeout
Under that definition, the missing gap is:
### Missing Gap A: Structured Contract Closure
There are still `47` structured fail-closed records.
Current distribution:
| Archetype | Count |
| --- | ---: |
| `paginated_enrichment` | 34 |
| `local_doc_pipeline` | 5 |
| `multi_mode_request` | 4 |
| `single_request_enrichment` | 2 |
| `host_bridge_workflow` | 1 |
| `page_state_eval` | 1 |
This is the largest remaining implementation gap.
### Missing Gap B: Timeout Hygiene Integration into Main Reporting
The timeout hygiene layer now exists, but it is still a reporting-side overlay. It has not yet been folded into the primary current-state narrative used by later roadmap decisions.
### Missing Gap C: Current-State Overlay vs Execution Board
The project intentionally did not update `scene_execution_board_2026-04-18.json` during these bounded plans. That is correct, but it means the official board is still behind the latest integrated view.
### Missing Gap D: Promotion Policy
The project still lacks a single parent rule that says when a structured fail-closed scene may be promoted from:
1. fail-closed
2. fail-closed with stronger evidence
3. bounded rerun pass candidate
into a stronger scene-level coverage status.
## Framework Layers
All future work must land in exactly one of these layers.
### Layer A: Source Scan and Budget Layer
Purpose:
1. source directory size handling
2. file filtering
3. timeout budget policy
4. rerun hygiene
Owned concerns:
1. source scan volume
2. timeout policy
3. rerun interpretation
Must not own:
1. archetype routing
2. contract closure logic
3. scene promotion
Primary code area:
1. `src/generated_scene/analyzer.rs`
2. reporting JSON and sweep scripts
### Layer B: Archetype Routing Layer
Purpose:
1. decide the correct framework path:
- `single_request_table`
- `single_request_enrichment`
- `multi_mode_request`
- `paginated_enrichment`
- `host_bridge_workflow`
- `multi_endpoint_inventory`
- `local_doc_pipeline`
Owned concerns:
1. route precedence
2. mixed-evidence routing boundaries
3. route adjudication support
Must not own:
1. timeout policy
2. contract synthesis beyond routing evidence
3. board reconciliation
Primary code area:
1. `src/generated_scene/analyzer.rs`
### Layer C: Contract Recovery Layer
Purpose:
Recover the minimum business contract fields needed by each supported archetype.
Owned concerns:
1. request contract recovery
2. response contract recovery
3. pagination plan recovery
4. enrichment request recovery
5. join key recovery
6. export plan recovery
7. mode matrix recovery
Must not own:
1. timeout policy
2. execution board updates
3. status promotion
Primary code area:
1. `src/generated_scene/generator.rs`
2. `src/generated_scene/ir.rs`
### Layer D: Structured Fail-Closed and Reporting Layer
Purpose:
Make every incomplete scene fail in an explainable and structured way.
Owned concerns:
1. readiness-before-report classification
2. blocker naming
3. `contractSnapshot`
4. generation-report completeness
Must not own:
1. route preference
2. source scan budget
3. promotion policy
Primary code area:
1. `src/generated_scene/generator.rs`
2. reporting assets under `tests/fixtures/generated_scene/`
### Layer E: Sweep, Reconciliation, and Coverage Layer
Purpose:
Measure the whole `102` scene set, reconcile multiple interpretation layers, and report trustworthy coverage.
Owned concerns:
1. full sweep outputs
2. route adjudication overlay
3. timeout hygiene overlay
4. integrated coverage reporting
5. board reconciliation planning
Must not own:
1. analyzer implementation changes
2. generator implementation changes
Primary assets:
1. `tests/fixtures/generated_scene/*full_sweep*`
2. `tests/fixtures/generated_scene/*reconciliation*`
3. `tests/fixtures/generated_scene/*timeout*hygiene*`
4. `docs/superpowers/reports/*coverage*`
## Coverage Definitions
This framework uses four explicit coverage concepts.
### Coverage 1: Direct Pass Coverage
Scenes with direct `auto-pass`.
Current count:
`48 / 102`
### Coverage 2: Framework-Resolved Coverage
Scenes in one of:
1. `auto-pass`
2. `adjudicated-valid-host-bridge`
3. structured `fail-closed-known`
4. hygiene-aware timeout interpretation
This is the best measure of whether the framework has “caught” the scene set.
### Coverage 3: Promotion Coverage
Scenes already represented as promoted or boundary family assets in current project assets.
This is lower than framework-resolved coverage because promotion is intentionally conservative.
### Coverage 4: Real-Sample Execution Coverage
Scenes that have actual selected and executed real-sample validation records.
This is the strictest coverage metric.
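The first two coverage concepts are direct arithmetic over the status counts in this design. The sketch below uses the current counts; the struct and function names are illustrative, not existing project code.

```rust
// Compute direct-pass and framework-resolved coverage from the
// status counts reported in this design.
struct StatusCounts {
    auto_pass: u32,
    adjudicated_host_bridge: u32,
    structured_fail_closed: u32,
    hygiene_aware_timeout: u32,
    total: u32,
}

fn direct_pass_coverage(c: &StatusCounts) -> f64 {
    c.auto_pass as f64 / c.total as f64
}

fn framework_resolved_coverage(c: &StatusCounts) -> f64 {
    (c.auto_pass
        + c.adjudicated_host_bridge
        + c.structured_fail_closed
        + c.hygiene_aware_timeout) as f64
        / c.total as f64
}

fn main() {
    let current = StatusCounts {
        auto_pass: 48,
        adjudicated_host_bridge: 4,
        structured_fail_closed: 47,
        hygiene_aware_timeout: 3, // 2 pass candidates + 1 fail-closed candidate
        total: 102,
    };
    println!("direct pass: {:.1}%", 100.0 * direct_pass_coverage(&current));
    println!("framework-resolved: {:.1}%", 100.0 * framework_resolved_coverage(&current));
}
```

With the current counts, framework-resolved coverage is already `102 / 102` while direct pass coverage remains `48 / 102`, which is exactly why the two metrics must stay separate.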
## Fixed Route Order for Future Work
Future work must follow this order.
### Route 1: Finish Layer E Hygiene Integration
Goal:
Make sweep and reconciliation reporting hygiene-aware by default.
This route is nearly finished and should be closed first.
### Route 2: `G3 / paginated_enrichment` Contract Closure
Goal:
Work down the largest remaining structured fail-closed bucket.
Why first:
1. largest bucket by count
2. most important for closing the remaining `102` gap
3. already split into repeated missing-contract patterns
Expected sub-order:
1. `enrichment_request_missing`
2. `export_plan_missing`
3. then any remaining `join_key` or runtime-scope style gaps
### Route 3: `G2 / multi_mode_request` Small-Bucket Closure
Goal:
Close the remaining `4` multi-mode structured fail-closed records.
Why third:
1. clear archetype
2. relatively small bucket
3. mainline family already has real-sample pass anchor
### Route 4: `G1-E / single_request_enrichment` Small-Bucket Closure
Goal:
Close the remaining `2` G1-E structured fail-closed records.
Why fourth:
1. smallest mainline bucket
2. framework anchor already exists
3. lower leverage than G3 and G2
### Route 5: Decide on `local_doc_pipeline` and `host_bridge_workflow`
Goal:
Handle the remaining boundary-family fail-closed records only after the mainline buckets are reduced.
This route must not start before Routes 2 through 4 have completed or been explicitly deferred.
### Route 6: Reconciliation and Board Promotion Policy
Goal:
Define how stronger framework-resolved statuses can update the execution board without over-promoting scenes.
This must be done only after contract-closure routes have produced stable deltas.
## What Future Plans Must Contain
Every later bounded implementation plan must explicitly declare:
1. which framework layer it belongs to
2. which route from this design it belongs to
3. which code modules it is allowed to touch
4. which code modules it must not touch
5. how it protects current real-sample and canonical passes
6. what exact delta it expects to produce in the `102` scene state
If a future plan cannot answer those six items, it is out of framework and should not start.
## Stop Rules
The framework forbids:
1. starting a new micro-plan that only renames a narrower semantics problem without moving toward a route completion
2. treating timeout rerun success as promotion
3. updating execution board state inside a diagnostic plan
4. opening `G4/G5` before the current structured fail-closed mainline is reduced
5. using prompt-only tuning as a substitute for contract recovery
## What 100% Looks Like
This framework considers `100% bounded coverage` achieved when:
1. `unsupported-family = 0`
2. `missing-source = 0`
3. `misclassified-unresolved = 0`
4. `timeout-still-unreadable = 0`
5. every remaining non-pass scene is structured and attributable to a supported framework path
6. execution board and reconciliation reporting can express the current scene state without ambiguity
This is different from `100% auto-pass`.
`100% auto-pass` is not the immediate target.
`100% bounded framework coverage` is the immediate target.
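The six completion conditions above reduce to a single predicate over a sweep rollup. The struct shape below is an illustrative sketch, not an existing reporting schema.

```rust
// Predicate form of the `100% bounded coverage` definition.
struct SweepRollup {
    unsupported_family: u32,
    missing_source: u32,
    misclassified_unresolved: u32,
    timeout_still_unreadable: u32,
    // Non-pass scenes lacking a named, supported framework path.
    unattributed_non_pass: u32,
}

fn bounded_coverage_complete(s: &SweepRollup) -> bool {
    s.unsupported_family == 0
        && s.missing_source == 0
        && s.misclassified_unresolved == 0
        && s.timeout_still_unreadable == 0
        && s.unattributed_non_pass == 0
}

fn main() {
    let rollup = SweepRollup {
        unsupported_family: 0,
        missing_source: 0,
        misclassified_unresolved: 0,
        timeout_still_unreadable: 0,
        unattributed_non_pass: 0,
    };
    println!("bounded coverage complete: {}", bounded_coverage_complete(&rollup));
}
```

Note that `auto_pass` does not appear in the predicate: a structured fail-closed scene with a named blocker counts as covered, which is the whole distinction between bounded coverage and `100% auto-pass`.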

View File

@@ -0,0 +1,130 @@
# Structured Fail-Closed Improvement Roadmap Design
> Date: 2026-04-19
> Status: Draft
> Upstream Reconciliation: `tests/fixtures/generated_scene/full_sweep_status_reconciliation_2026-04-19.json`
## Intent
Turn the `48` structured fail-closed records from the reconciled `102` sweep into a governed improvement roadmap.
The objective is not to weaken gates or inflate `auto-pass`. The objective is to classify contract gaps, identify the highest-value bounded correction slices, and then improve generic scene-to-skill conversion where evidence can be recovered safely.
## Current Reconciled Baseline
After status reconciliation, the `102` scene set is:
| Reconciled status | Count |
| --- | ---: |
| `auto-pass` | 48 |
| `fail-closed-known` | 48 |
| `adjudicated-valid-host-bridge` | 4 |
| `source-unreadable` | 2 |
| `missing-source` | 0 |
| `unsupported-family` | 0 |
The `4` raw route conflicts are no longer unresolved route bugs. They are valid host-bridge workflows.
This roadmap therefore focuses on `fail-closed-known = 48`.
## Fail-Closed Buckets
| Inferred archetype | Reason | Count |
| --- | --- | ---: |
| `paginated_enrichment` | `workflow evidence is incomplete before package generation` | 35 |
| `local_doc_pipeline` | `workflow evidence is incomplete before package generation` | 5 |
| `multi_mode_request` | `workflow evidence is incomplete before package generation` | 4 |
| `single_request_enrichment` | `workflow evidence is incomplete before package generation` | 2 |
| `host_bridge_workflow` | `workflow evidence is incomplete before package generation` | 1 |
| `page_state_eval` | `bootstrap_target` | 1 |
The first priority is the `35` `paginated_enrichment` records: they form the largest bucket and map to the most important currently generic workflow family.
## Scope Guardrails
In scope:
1. classify the `48` structured fail-closed records by missing contract piece
2. prioritize bounded correction slices
3. implement bounded evidence recovery only after classification shows repeated recoverable patterns
4. keep all fail-closed semantics intact
5. rerun a bounded follow-up sweep after corrections
Out of scope:
1. adding new scene families
2. starting `G4/G5`
3. login recovery
4. full browser host runtime transport
5. local document attachment runtime
6. auto-promoting scenes into the execution board
7. weakening readiness gates to increase pass counts
8. reopening the already adjudicated `4` valid-host-bridge workflows
9. handling the `2` remaining timeout records in this roadmap
## Workstreams
1. `WS1` Fail-Closed Inventory and Gap Taxonomy
2. `WS2` G3 Paginated Enrichment Contract Recovery
3. `WS3` Small-Bucket Contract Recovery
4. `WS4` Bootstrap Target Isolation
5. `WS5` Follow-Up Sweep and Coverage Delta
## Gap Taxonomy
Every structured fail-closed record must receive one primary missing-contract label:
1. `main_request_missing`
2. `pagination_plan_missing`
3. `enrichment_request_missing`
4. `join_key_missing`
5. `export_plan_missing`
6. `mode_matrix_missing`
7. `mode_request_contract_missing`
8. `single_request_enrichment_contract_missing`
9. `host_bridge_contract_missing`
10. `local_doc_contract_missing`
11. `bootstrap_target_unresolved`
12. `mixed_or_ambiguous_contract_gap`
Secondary labels may be added, but every record must have exactly one primary label.
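The exactly-one-primary-label rule is easiest to enforce in the record type itself. The enum variants below mirror the taxonomy; the record shape is an illustrative sketch, not the inventory's actual schema.

```rust
// Each structured fail-closed record carries exactly one primary
// missing-contract label; secondary labels are optional.
#[derive(Debug, Clone, PartialEq)]
enum GapLabel {
    MainRequestMissing,
    PaginationPlanMissing,
    EnrichmentRequestMissing,
    JoinKeyMissing,
    ExportPlanMissing,
    ModeMatrixMissing,
    ModeRequestContractMissing,
    SingleRequestEnrichmentContractMissing,
    HostBridgeContractMissing,
    LocalDocContractMissing,
    BootstrapTargetUnresolved,
    MixedOrAmbiguousContractGap,
}

struct FailClosedRecord {
    scene_id: String,
    primary: GapLabel,        // exactly one, enforced by the type
    secondary: Vec<GapLabel>, // zero or more
}

fn main() {
    let record = FailClosedRecord {
        scene_id: "sweep-007-scene".to_string(),
        primary: GapLabel::EnrichmentRequestMissing,
        secondary: vec![GapLabel::ExportPlanMissing],
    };
    println!(
        "{}: primary={:?}, secondary={}",
        record.scene_id,
        record.primary,
        record.secondary.len()
    );
}
```

Making `primary` a plain field rather than a list means a record with zero or two primary labels cannot be represented at all, which removes one class of inventory errors.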
## Correction Strategy
Corrections must be pattern-based, not scene-by-scene.
Allowed correction types:
1. bounded evidence extraction for repeated field names or workflow structures
2. bounded IR fallback only when evidence is explicit and traceable
3. more specific fail-closed reason reporting
4. regression tests for each recovered pattern
Forbidden correction types:
1. hard-coding a scene name to pass
2. converting fail-closed records to pass without closing the contract
3. broad route-precedence rewrites
4. disabling or relaxing gates
## Expected Outputs
1. `tests/fixtures/generated_scene/structured_fail_closed_inventory_2026-04-19.json`
2. `tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json`
3. `docs/superpowers/reports/2026-04-19-structured-fail-closed-inventory-report.md`
4. `docs/superpowers/reports/2026-04-19-structured-fail-closed-improvement-coverage-delta-report.md`
5. `docs/superpowers/reports/2026-04-19-structured-fail-closed-improvement-roadmap-closure-report.md`
## Acceptance Criteria
1. all `48` fail-closed records are inventoried
2. all `48` records have exactly one primary missing-contract label
3. the `35` `paginated_enrichment` records are split into actionable G3 gap groups
4. implementation, if performed, is limited to repeated recoverable patterns
5. no adjudicated host-bridge record is reopened
6. follow-up results are measured against the reconciled baseline
7. execution board status remains unchanged
## Completion Signal
The roadmap is complete when the `48` structured fail-closed records are no longer a single broad bucket and the follow-up sweep quantifies whether bounded evidence recovery improved safe conversion coverage.

View File

@@ -0,0 +1,86 @@
# Structured Fail-Closed Residual 13 Closure Design
> Date: 2026-04-19
> Status: Draft
> Parent Framework: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Layer: `Layer C` and `Layer D`
> Upstream Asset: `tests/fixtures/generated_scene/full_coverage_reconciliation_candidates_2026-04-19.json`
## Intent
Close or explicitly defer the remaining `13` `framework-structured-fail-closed` scenes after the 102 full-coverage follow-up sweep.
This design does not update the official execution board. It defines how the remaining residual bucket must be split before any implementation work starts.
## Current Residual Bucket
The current residual bucket contains `13` scenes:
| Archetype | Count | Direction |
| --- | ---: | --- |
| `paginated_enrichment` | 4 | mainline residual closure |
| `multi_mode_request` | 2 | mainline residual closure |
| `local_doc_pipeline` | 5 | boundary hold or future G8 runtime roadmap |
| `host_bridge_workflow` | 1 | boundary hold or future G6 runtime roadmap |
| `page_state_eval` | 1 | bootstrap target isolation |
## Fixed Residual Scenes
| Scene ID | Scene | Archetype | Reason |
| --- | --- | --- | --- |
| `sweep-007-scene` | `95598供电服务月报` | `paginated_enrichment` | workflow evidence incomplete |
| `sweep-018-scene` | `白银线损周报` | `multi_mode_request` | readiness not A/B |
| `sweep-033-scene` | `供电可靠率指标统计表` | `local_doc_pipeline` | workflow evidence incomplete |
| `sweep-034-scene` | `供电可靠性数据质量自查报告月报` | `local_doc_pipeline` | workflow evidence incomplete |
| `sweep-039-scene` | `故障报修工单信息统计表` | `paginated_enrichment` | workflow evidence incomplete |
| `sweep-042-scene` | `国网金昌供电公司营商环境周例会报告` | `local_doc_pipeline` | workflow evidence incomplete |
| `sweep-051-scene` | `嘉峪关可靠性分析报告` | `local_doc_pipeline` | workflow evidence incomplete |
| `sweep-068-scene` | `输变电设备运行分析报告` | `paginated_enrichment` | workflow evidence incomplete |
| `sweep-071-scene` | `台区线损大数据-月_周累计线损率统计分析` | `multi_mode_request` | readiness not A/B |
| `sweep-074-scene` | `同兴智能安全督查日报` | `local_doc_pipeline` | workflow evidence incomplete |
| `sweep-084-scene` | `巡视计划完成情况自动检索` | `paginated_enrichment` | workflow evidence incomplete |
| `sweep-085-scene` | `业扩报装管理制度` | `host_bridge_workflow` | workflow evidence incomplete |
| `sweep-091-scene` | `用户停电频次分析监测` | `page_state_eval` | readiness not A/B |
## Route Mapping
This residual bucket must be split into bounded child work:
1. `Residual Route A`: `G3 / paginated_enrichment` residual closure for 4 scenes.
2. `Residual Route B`: `G2 / multi_mode_request` residual closure for 2 scenes.
3. `Residual Route C`: boundary hold decision for local-doc and host-bridge residuals.
4. `Residual Route D`: bootstrap target isolation for the page-state residual.
5. `Residual Route E`: follow-up mini sweep and reconciliation candidate refresh.
## Guardrails
1. Do not update `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`.
2. Do not add new families.
3. Do not start `G4/G5`.
4. Do not relax readiness gates.
5. Do not merge boundary runtime work with mainline contract closure.
6. Do not classify local-doc residuals as mainline pass without a separate G8 runtime roadmap.
7. Do not classify host-bridge residuals as mainline pass without a separate G6 runtime roadmap.
## Implementation Boundaries
Allowed implementation routes:
1. `G3` residual workflow evidence recovery.
2. `G2` residual request/response/mode readiness correction.
Decision-only routes:
1. `local_doc_pipeline` residuals.
2. `host_bridge_workflow` residual.
3. `page_state_eval/bootstrap_target` residual.
## Completion Criteria
This residual design is complete when:
1. all 13 scenes are assigned to one residual route;
2. the mainline residual routes have bounded implementation plans;
3. boundary and bootstrap residuals have explicit hold/isolate decisions;
4. a follow-up mini sweep plan exists to measure the residual closure delta.

View File

@@ -0,0 +1,99 @@
# Timeout Budget and Rerun Hygiene Design
> Date: 2026-04-19
> Status: Draft
> Upstream Diagnostic: `docs/superpowers/reports/2026-04-19-timeout-regression-diagnostic-report.md`
## Intent
Prevent budget-sensitive scenes from being miscounted as `source-unreadable` when they can resolve into:
1. `executed-pass`
2. structured `fail-closed`
under a bounded rerun budget.
This design does not attempt to improve scene understanding. It only changes timeout handling and rerun classification hygiene.
## Problem Statement
The timeout regression diagnostic produced:
| Scene id | Diagnostic label | Actual behavior under `90s` |
| --- | --- | --- |
| `sweep-015-scene` | `timeout-rerun-pass` | completed successfully |
| `sweep-025-scene` | `timeout-rerun-pass` | completed successfully |
| `sweep-040-scene` | `timeout-rerun-fail-closed` | resolved into structured fail-closed |
This means the current fixed `45s` budget is too coarse for a subset of scenes. It collapses:
1. budget-sensitive success
2. budget-sensitive fail-closed
3. true unreadable or hanging cases
into the same `source-unreadable` bucket.
## Scope
In scope:
1. define a bounded timeout-budget policy
2. define when a diagnostic rerun is allowed
3. define how rerun results should be classified
4. define output JSON and report for timeout hygiene verification
Out of scope:
1. analyzer logic changes
2. generator contract recovery changes
3. scene promotion
4. execution board updates
5. full `102` sweep improvement work
6. timeout implementation unrelated to rerun hygiene
## Policy
### Primary Sweep Budget
The initial sweep still runs with the fixed primary budget.
### Secondary Diagnostic Budget
When a scene ends with:
1. `source-unreadable`
2. reason `generator timeout after 45s`
it becomes eligible for one bounded rerun under a secondary timeout budget.
### Rerun Result Mapping
A bounded rerun may only map to:
1. `timeout-rerun-pass`
2. `timeout-rerun-fail-closed`
3. `timeout-rerun-timeout`
4. `timeout-rerun-error`
These are hygiene classifications, not promoted scene statuses.
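The mapping can be made total and deterministic by checking the timeout flag first and then the rerun exit state. The `RerunOutcome` shape below is an assumption for illustration; only the four output classifications are fixed by this design.

```rust
// Map one bounded rerun outcome onto exactly one of the four
// hygiene classifications.
#[derive(Debug, PartialEq)]
enum RerunHygiene {
    TimeoutRerunPass,
    TimeoutRerunFailClosed,
    TimeoutRerunTimeout,
    TimeoutRerunError,
}

struct RerunOutcome {
    timed_out: bool,
    exit_code: Option<i32>,        // None if the process was killed
    report_status: Option<String>, // status read from the generation report, if any
}

fn classify(outcome: &RerunOutcome) -> RerunHygiene {
    if outcome.timed_out {
        return RerunHygiene::TimeoutRerunTimeout;
    }
    match (outcome.exit_code, outcome.report_status.as_deref()) {
        (Some(0), Some("pass")) => RerunHygiene::TimeoutRerunPass,
        (_, Some("fail-closed")) => RerunHygiene::TimeoutRerunFailClosed,
        _ => RerunHygiene::TimeoutRerunError,
    }
}

fn main() {
    let outcome = RerunOutcome { timed_out: false, exit_code: Some(0), report_status: Some("pass".into()) };
    println!("{:?}", classify(&outcome));
}
```

Because the catch-all arm resolves to `TimeoutRerunError`, an unexpected rerun state can never silently become a pass, which matches the fail-closed posture of the hygiene layer.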
### Board and Promotion Boundary
Even when rerun succeeds:
1. do not update `scene_execution_board_2026-04-18.json`
2. do not convert the scene to promoted status
3. do not silently merge the rerun result into canonical scene status
## Output
1. `tests/fixtures/generated_scene/timeout_budget_rerun_hygiene_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-timeout-budget-rerun-hygiene-report.md`
## Success Criteria
1. timeout scenes are no longer treated as a single unreadable bucket in the hygiene layer
2. rerun-pass and rerun-fail-closed are distinguishable
3. true timeout cases remain distinguishable
4. no analyzer or generator implementation changes are made
5. no execution board updates are made

View File

@@ -0,0 +1,114 @@
# Timeout Regression Diagnostic Design
> Date: 2026-04-19
> Status: Draft
> Upstream Plan: `docs/superpowers/plans/2026-04-19-structured-fail-closed-improvement-roadmap-plan.md`
> Upstream Follow-up: `tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json`
## Intent
Diagnose the three timeout records visible after the structured fail-closed improvement follow-up sweep.
This design is diagnostic-only. It does not change analyzer or generator logic, promote scenes, update the execution board, or treat a longer rerun success as a validated pass.
## Problem Statement
The structured fail-closed improvement follow-up sweep produced:
| Status | Count |
| --- | ---: |
| `auto-pass` | 48 |
| `fail-closed-known` | 47 |
| `adjudicated-valid-host-bridge` | 4 |
| `source-unreadable` | 3 |
The three `source-unreadable` records are timeout records:
| Scene id | Scene | Timeout type |
| --- | --- | --- |
| `sweep-015-scene` | `任务报表` | persistent timeout |
| `sweep-025-scene` | `力禾动环系统巡视记录` | persistent timeout |
| `sweep-040-scene` | `嘉峪关日报` | new regression timeout |
`sweep-040-scene` is the most important record because it regressed from `fail-closed-known` in the reconciled baseline to `source-unreadable` in the follow-up sweep.
## Scope
In scope:
1. identify the three timeout records from the follow-up sweep
2. collect source directory diagnostics
3. run bounded diagnostic reruns with longer timeout budgets
4. classify each timeout into a secondary timeout reason
5. publish diagnostic JSON and report
Out of scope:
1. analyzer or generator implementation changes
2. readiness gate changes
3. execution board updates
4. scene promotion
5. family baseline changes
6. handling the remaining `47` structured fail-closed records
7. handling the `4` adjudicated host-bridge records
## Diagnostic Labels
Each timeout must receive exactly one final diagnostic label:
1. `timeout-rerun-pass`
2. `timeout-rerun-fail-closed`
3. `timeout-large-source`
4. `timeout-command-hang`
5. `timeout-nondeterministic`
6. `timeout-source-scan-heavy`
7. `timeout-unknown`
Secondary labels may be attached for:
1. large file count
2. large total size
3. many HTML or JS files
4. generated report present after rerun
5. stderr decode noise
6. elapsed time near budget
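The label assignment above can be sketched as a small classifier over the rerun evidence. This is an illustrative sketch only: the type and field names are assumptions for this design, and it models only the rerun-outcome labels (source-shape labels such as `timeout-large-source` would come from the directory diagnostics, not the rerun outcome).

```rust
// Illustrative sketch: derive a final diagnostic label from one bounded
// diagnostic rerun. Names and rules are assumptions for this design,
// not implemented code; only the rerun-outcome labels are modeled.

#[derive(Debug, PartialEq)]
enum TimeoutLabel {
    RerunPass,       // `timeout-rerun-pass`
    RerunFailClosed, // `timeout-rerun-fail-closed`
    CommandHang,     // `timeout-command-hang`
    Unknown,         // `timeout-unknown`
}

struct RerunEvidence {
    timed_out: bool,        // the rerun hit the longer budget too
    exit_code: Option<i32>, // exit code when the rerun finished
    report_produced: bool,  // a generation report appeared after rerun
}

fn classify(e: &RerunEvidence) -> TimeoutLabel {
    if e.timed_out {
        // still over budget even with the longer allowance
        return TimeoutLabel::CommandHang;
    }
    match (e.exit_code, e.report_produced) {
        (Some(0), true) => TimeoutLabel::RerunPass,
        (Some(0), false) => TimeoutLabel::Unknown,
        (Some(_), _) => TimeoutLabel::RerunFailClosed,
        (None, _) => TimeoutLabel::Unknown,
    }
}
```

A rerun that finishes with a non-zero exit maps to `timeout-rerun-fail-closed`, which keeps the fail-closed contract visible instead of folding the record back into an unreadable bucket.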
## Required Evidence
For each timeout record, collect:
1. scene id
2. scene name
3. source directory
4. previous reconciled status
5. follow-up status
6. file count
7. total source bytes
8. HTML file count
9. JavaScript file count
10. largest files
11. diagnostic rerun exit code
12. diagnostic rerun elapsed seconds
13. diagnostic rerun timed out flag
14. generation report path if produced
15. generation status if produced
16. final diagnostic label
## Output
Diagnostic output:
`tests/fixtures/generated_scene/timeout_regression_diagnostic_2026-04-19.json`
Report output:
`docs/superpowers/reports/2026-04-19-timeout-regression-diagnostic-report.md`
## Success Criteria
1. exactly three timeout records are diagnosed
2. `sweep-040-scene` is explicitly marked as the regression timeout
3. the two persistent timeout records remain distinguishable from the regression timeout
4. each record has one final diagnostic label
5. no implementation changes are made
6. no execution board state is updated


@@ -0,0 +1,99 @@
# Timeout Rerun Hygiene Integration Design
> Date: 2026-04-19
> Status: Draft
> Upstream Hygiene: `tests/fixtures/generated_scene/timeout_budget_rerun_hygiene_2026-04-19.json`
> Upstream Reconciliation: `tests/fixtures/generated_scene/full_sweep_status_reconciliation_2026-04-19.json`
## Intent
Integrate timeout rerun hygiene into future sweep interpretation and reconciliation reporting so budget-sensitive timeout scenes are not miscounted as a single `source-unreadable` bucket.
This is a reporting and classification integration only. It does not change analyzer or generator behavior.
## Problem Statement
The timeout hygiene layer now distinguishes:
1. `rerun-resolved-pass`
2. `rerun-resolved-fail-closed`
3. `rerun-still-timeout`
4. `rerun-error`
Without integration, a future sweep or reconciliation still risks reporting:
`source-unreadable = 3`
when the actual hygiene-aware interpretation is:
1. `2` budget-sensitive pass candidates
2. `1` budget-sensitive fail-closed candidate
3. `0` persistent timeout after rerun
## Scope
In scope:
1. define a hygiene-aware reporting schema
2. define how raw timeout status and rerun hygiene status coexist
3. define reconciliation-layer summary fields
4. define output JSON and report for the integrated view
Out of scope:
1. analyzer changes
2. generator changes
3. execution board updates
4. scene promotion
5. full `102` sweep rerun
6. timeout implementation fixes
## Integration Rules
### Raw Status Preservation
The raw sweep status is preserved.
Example:
`source-unreadable`
remains stored as the raw sweep result.
### Hygiene Overlay
When a timeout record has a hygiene record, the integrated layer adds:
1. `hygieneStatus`
2. `hygieneInterpretation`
### Hygiene Interpretation
Allowed integrated timeout interpretations:
1. `timeout-as-pass-candidate`
2. `timeout-as-fail-closed-candidate`
3. `timeout-still-unreadable`
4. `timeout-rerun-error`
### Summary Output
The integrated summary must report both:
1. raw status counts
2. hygiene-aware timeout interpretation counts
This prevents lossy reporting.
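The overlay rules above can be sketched as a pure mapping from the preserved raw status plus the hygiene status to the integrated interpretation. This is a sketch only; the function name is an assumption, while the status strings mirror this design.

```rust
// Sketch of the hygiene overlay: the raw sweep status is kept untouched,
// and a hygiene-aware interpretation is derived alongside it. Function
// name is an illustrative assumption; status strings follow this design.

fn interpret(raw_status: &str, hygiene_status: Option<&str>) -> &'static str {
    if raw_status != "source-unreadable" {
        return "non-timeout";
    }
    match hygiene_status {
        Some("rerun-resolved-pass") => "timeout-as-pass-candidate",
        Some("rerun-resolved-fail-closed") => "timeout-as-fail-closed-candidate",
        Some("rerun-still-timeout") => "timeout-still-unreadable",
        Some("rerun-error") => "timeout-rerun-error",
        // no hygiene record: stay conservative
        _ => "timeout-still-unreadable",
    }
}
```

Because the raw status is passed through unchanged, both summary layers (raw status counts and interpretation counts) can be computed from the same records.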
## Output
1. `tests/fixtures/generated_scene/timeout_rerun_hygiene_integration_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-timeout-rerun-hygiene-integration-report.md`
## Success Criteria
1. raw timeout counts remain visible
2. hygiene-aware timeout interpretation counts are added
3. the three timeout records become distinguishable in reconciliation reporting
4. no analyzer or generator code is changed
5. no execution board state is updated


@@ -0,0 +1,56 @@
# Deterministic Keyword Scoring Refinement Design
> Date: 2026-04-20
> Parent Plan: `2026-04-20-scene-skill-102-deterministic-invocation-readiness-plan.md`
## Intent
Resolve the 9 deterministic dispatch ambiguity gaps found after normalizing the final materialized scene skills for `。。。` invocation.
The previous readiness pass proved that 92 complete packages can be selected by full-scene-name samples, but 9 scenes collide because the generated include keywords are too broad under the current scoring rules.
## Fixed Gap Set
1. `sweep-026-scene / 县区公司故障明细`
2. `sweep-034-scene / 售电收入日统计排程预测`
3. `sweep-037-scene / 嘉峪关可靠性分析报告`
4. `sweep-038-scene / 嘉峪关周报`
5. `sweep-039-scene / 嘉峪关故障明细`
6. `sweep-040-scene / 嘉峪关日报`
7. `sweep-041-scene / 嘉峪关月报`
8. `sweep-044-scene / 国网金昌供电公司指挥中心生产例会报告`
9. `sweep-045-scene / 国网金昌供电公司营商环境周例会报告`
## Scope
Allowed:
1. Refine deterministic include/exclude keywords for the fixed 9 scenes and direct collision partners when needed.
2. Run dispatch dry-run checks without browser execution.
3. Publish refinement decisions and readiness delta.
Forbidden:
1. Do not execute browser scripts.
2. Do not repair `sweep-012-scene`.
3. Do not change generated scripts.
4. Do not update official execution board.
5. Do not modify runtime dispatch code unless this design is superseded by a separate runtime-scoring implementation plan.
## Strategy
Prefer manifest-level disambiguation first:
1. keep full scene names as primary keywords;
2. remove overly broad standalone tokens from colliding scenes where they create ties;
3. add distinctive phrase keywords only when present in the scene name;
4. use `exclude_keywords` only for direct mutually exclusive cases.
Runtime scoring changes are out of scope for this plan unless manifest refinement cannot make all 9 gaps uniquely selectable.
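The dispatch dry-run for this strategy can be sketched as keyword scoring plus a uniqueness check, where a tie or a zero-match means the ambiguity gap is still open. The struct layout and the scoring rule (count of matched include keywords; exclude keywords disqualify outright) are illustrative assumptions and do not modify runtime dispatch code.

```rust
// Illustrative dry-run scorer for manifest-level disambiguation.
// Scoring rule is an assumption for this sketch, not the runtime rule.

struct SkillKeywords<'a> {
    scene_id: &'a str,
    include: &'a [&'a str],
    exclude: &'a [&'a str],
}

fn score(sample: &str, skill: &SkillKeywords) -> Option<usize> {
    if skill.exclude.iter().any(|k| sample.contains(k)) {
        return None; // excluded outright
    }
    let hits = skill.include.iter().filter(|&&k| sample.contains(k)).count();
    if hits == 0 { None } else { Some(hits) }
}

fn select_unique<'a>(sample: &str, skills: &[SkillKeywords<'a>]) -> Option<&'a str> {
    let mut best: Option<(usize, &'a str, bool)> = None; // (score, id, tied)
    for s in skills {
        if let Some(sc) = score(sample, s) {
            best = Some(match best {
                None => (sc, s.scene_id, false),
                Some((b, id, tied)) => {
                    if sc > b {
                        (sc, s.scene_id, false)
                    } else if sc == b {
                        (b, id, true) // tie: ambiguity gap stays open
                    } else {
                        (b, id, tied)
                    }
                }
            });
        }
    }
    match best {
        Some((_, id, false)) => Some(id),
        _ => None,
    }
}
```

Under this rule, keeping the full scene name as a primary keyword lets `嘉峪关日报` outscore a sibling that only matches the shared `嘉峪关` token, while removing broad standalone tokens eliminates the remaining ties.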
## Completion Criteria
1. All 9 fixed gaps have final decisions.
2. Full-scene-name dispatch dry-run uniquely selects the expected scene for each fixed gap.
3. No new ambiguity is introduced for the complete 101-package set.
4. `sweep-012-scene` remains excluded.


@@ -0,0 +1,68 @@
# Final Skill Human-Readable Index Design
> Date: 2026-04-20
> Parent Plan: `2026-04-19-scene-skill-102-final-materialization-plan.md`
## Intent
Make the final materialized skill set usable by humans without changing canonical scene ids, generation logic, or generated scripts.
The final skill directories intentionally use stable ids such as `sweep-001-scene`, but stable ids alone are not enough for review, validation, or handoff. This design adds a human-readable index and normalizes metadata for the already materialized skill packages.
## Scope
This design only addresses readability and metadata.
Allowed:
1. Create `SCENE_INDEX.md` under the final materialization root.
2. Create `scene_skill_102_index.json` under the final materialization root.
3. Update existing materialized `SKILL.toml` files with `display_name`, `scene_id`, and `scene_name`.
4. Update existing materialized `SKILL.md` files with readable scene metadata.
5. Publish a superpowers report.
Forbidden:
1. Do not rerun `sg_scene_generate`.
2. Do not rename `sweep-xxx-scene` directories.
3. Do not modify generated scripts.
4. Do not modify `src/generated_scene/analyzer.rs`, `src/generated_scene/generator.rs`, or `src/generated_scene/ir.rs`.
5. Do not repair `sweep-012-scene` package generation failure.
6. Do not update the official execution board.
## Data Sources
Authoritative scene names come from:
`tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
Materialization state comes from:
`tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
Failure state comes from:
`tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json`
## Output Shape
`SCENE_INDEX.md` must include:
1. scene id;
2. scene name;
3. archetype;
4. readiness;
5. materialization status;
6. skill directory;
7. failure reason when applicable.
`scene_skill_102_index.json` must contain the same mapping in machine-readable form.
Each complete skill package should expose the readable name in `SKILL.toml` and `SKILL.md` while preserving the stable `name = "sweep-xxx-scene"` identifier.
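The shared mapping behind `SCENE_INDEX.md` and `scene_skill_102_index.json` can be sketched as one record type rendered into a markdown table row. The field names are assumptions mirroring the list above, not a frozen schema.

```rust
// Sketch of one index record and its SCENE_INDEX.md row rendering.
// Field names are illustrative assumptions based on the output shape.

struct IndexRow<'a> {
    scene_id: &'a str,
    scene_name: &'a str,
    archetype: &'a str,
    readiness: &'a str,
    materialization_status: &'a str,
    skill_dir: &'a str,
    failure_reason: Option<&'a str>, // only set for failed packages
}

fn markdown_row(r: &IndexRow) -> String {
    format!(
        "| {} | {} | {} | {} | {} | {} | {} |",
        r.scene_id,
        r.scene_name,
        r.archetype,
        r.readiness,
        r.materialization_status,
        r.skill_dir,
        r.failure_reason.unwrap_or("-"),
    )
}
```

Rendering both artifacts from one record type keeps the 102-row markdown index and the machine-readable JSON index from drifting apart.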
## Completion Criteria
1. The index contains exactly 102 rows.
2. The failed `sweep-012-scene` is present and explicitly marked as failed.
3. Complete packages have readable metadata.
4. No generation or recovery work is performed.


@@ -0,0 +1,180 @@
# Generated Scene Rule Hardening Route Design
> Date: 2026-04-20
> Status: Draft
> Parent roadmap:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-plan.md`
> Upstream ledger:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-plan.md`
## Intent
Define the bounded route-design stage that converts the completed source-first runtime-semantics ledger into reusable hardening routes.
This design does not implement analyzer/generator changes. It only decides:
1. which reusable hardening routes exist
2. how large each route is
3. which routes should be executed first
4. how downstream implementation slices should be decomposed
## Why This Stage Exists
The ledger proved that the current 102-scene set contains reusable generator-level gaps, not just isolated scene-specific defects.
The project now needs a route map that:
1. groups scenes by reusable rule fixes
2. avoids scene-by-scene patching
3. defines a strict implementation order
4. makes full rematerialization and validation refresh possible after hardening
## Fixed Inputs
1. `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-report.md`
## Scope
In scope:
1. route clustering from the completed ledger
2. route prioritization by coverage and reuse
3. bounded child-plan sequencing for implementation
4. rematerialization dependency declaration
5. validation refresh dependency declaration
Out of scope:
1. any `src/` change
2. any skill manifest edit
3. any rematerialization
4. any validation rerun
5. any inner-network execution
## Route Candidates
The ledger already establishes five generator-level hardening routes.
### Route 1: Resolver-to-Request Mapping Hardening
Goal:
Recover reusable mapping metadata between resolver outputs and request payload fields.
Examples:
1. `org_code -> orgno`
2. `period_payload.fdate -> fdate`
3. `period_payload.weekSfdate -> weekSfdate`
4. `period_payload.weekEfdate -> weekEfdate`
Reason for high priority:
This route spans the full 102-scene set and directly blocks runtime equivalence.
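The recovered metadata can be represented as a plain name-to-name table. The pairs below are exactly the route examples; the lookup helper around them is an illustrative assumption, not the shipped representation.

```rust
use std::collections::HashMap;

// Sketch of Route 1 mapping metadata: resolver output names on the left,
// request payload field names on the right. Pairs are the route examples;
// the helper functions are illustrative assumptions.

fn mapping() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("org_code", "orgno"),
        ("period_payload.fdate", "fdate"),
        ("period_payload.weekSfdate", "weekSfdate"),
        ("period_payload.weekEfdate", "weekEfdate"),
    ])
}

fn request_field(resolver_output: &str) -> Option<&'static str> {
    // A miss means the mapping metadata is incomplete for this scene,
    // which is precisely the gap this route hardens away.
    mapping().get(resolver_output).copied()
}
```

With this metadata carried in generated assets, callers no longer need to guess that a resolved `org_code` must be sent as `orgno`.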
### Route 2: Runtime URL Classification Hardening
Goal:
Separate URL roles during generation:
1. runtime context URL
2. module route URL
3. API endpoint URL
4. entry URL hints
Reason for high priority:
This route also spans the full set and is required to stop callers from guessing `page_url` semantics at execution time.
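The four URL roles can be sketched as an enum with a placeholder classifier. The string heuristics here (`/api/`, `#/`) are illustrative assumptions only; the real role assignment would come from generation-time evidence rather than URL sniffing at execution time.

```rust
// Sketch of Route 2 URL roles. Heuristics are illustrative assumptions;
// generation-time evidence is the intended source of the classification.

#[derive(Debug, PartialEq)]
enum UrlRole {
    RuntimeContext, // browser context the skill runs inside
    ModuleRoute,    // deep route inside the app
    ApiEndpoint,    // request target, not navigable
    EntryHint,      // app entry candidate
}

fn classify_url(url: &str, is_bootstrap_context: bool) -> UrlRole {
    if is_bootstrap_context {
        UrlRole::RuntimeContext
    } else if url.contains("/api/") || url.ends_with(".do") {
        UrlRole::ApiEndpoint
    } else if url.contains("#/") {
        UrlRole::ModuleRoute
    } else {
        UrlRole::EntryHint
    }
}
```

Once each generated URL carries an explicit role, callers can bind `page_url` to the runtime context deterministically instead of guessing among entry, route, and endpoint candidates.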
### Route 3: Embedded Dictionary Extraction Hardening
Goal:
Recover richer dictionaries and trees from source-side JS/HTML assets instead of shipping starter subsets only.
Reason for high priority:
This route also spans the full set and is the only scalable answer to sweep-030-style organization coverage failures.
### Route 4: Parameter Default Semantics Recovery Hardening
Goal:
Recover page-native default period/date/mode behavior from source evidence into generated parameter metadata.
Reason for priority:
This route is slightly narrower than the first three but still affects the majority of scenes and is highly visible in runtime invocation.
### Route 5: Invocation Alias Generation Hardening
Goal:
Broaden deterministic invocation coverage so user wording is not forced to match canonical scene names exactly.
Reason for priority:
This route is selective rather than universal, so it should follow structural hardening routes unless a route-local blocker proves otherwise.
## Prioritization Rules
Route order must be based on:
1. breadth of scene coverage
2. generator-level reuse
3. ability to reduce repeat inner-network rediscovery
4. dependency order between routes
It must not be based on:
1. anecdotal scene debugging order
2. whichever single scene was most recently tested
## Fixed Route Order
The current route order should be:
1. `resolver_request_mapping_hardening`
2. `runtime_url_classification_hardening`
3. `embedded_dictionary_extraction_hardening`
4. `parameter_default_semantics_recovery_hardening`
5. `alias_generation_hardening`
## Implementation-Slice Policy
No route should be implemented as one unbounded giant patch.
Each route must later be split into bounded child plans with:
1. a fixed scene bucket
2. explicit allowed files
3. explicit forbidden files
4. an expected coverage delta
5. a stop statement
## Required Downstream Outputs
This route-design stage must yield:
1. one route-sequencing plan
2. one bounded implementation plan per top route
3. one full rematerialization refresh plan after route execution
4. one validation refresh plan after rematerialization
## Acceptance Criteria
This design is complete when:
1. all five reusable routes are explicit
2. route order is fixed
3. route ordering is justified by ledger evidence, not anecdotes
4. downstream implementation decomposition rules are explicit
5. this stage remains design-only
## Stop Statement
Stop after publishing the route design and its child sequencing plan.
Do not implement any hardening route inside this design.


@@ -0,0 +1,53 @@
# Generated Scene Rule Hardening Route Sequence Design
> Date: 2026-04-20
> Status: Draft
> Parent plan:
> - `docs/superpowers/plans/2026-04-20-generated-scene-rule-hardening-route-plan.md`
## Intent
Freeze the downstream execution sequence after the completed runtime-semantics ledger.
This design does not implement hardening. It only turns the route order into concrete bounded child plans.
## Fixed Route Order
1. `resolver_request_mapping_hardening`
2. `runtime_url_classification_hardening`
3. `embedded_dictionary_extraction_hardening`
4. `parameter_default_semantics_recovery_hardening`
5. `alias_generation_hardening`
## Child-Plan Policy
Each child plan below is a first reusable implementation slice, not full route closure.
That is intentional:
1. each slice must stay bounded
2. each slice must operate on a route-local bucket
3. residual route expansion should happen only after implementation results are known
## Required Downstream Plans
1. `2026-04-20-generated-scene-resolver-request-mapping-hardening-plan.md`
2. `2026-04-20-generated-scene-runtime-url-classification-hardening-plan.md`
3. `2026-04-20-generated-scene-embedded-dictionary-extraction-hardening-plan.md`
4. `2026-04-20-generated-scene-parameter-default-semantics-hardening-plan.md`
5. `2026-04-20-generated-scene-invocation-alias-generation-hardening-plan.md`
6. `2026-04-20-generated-scene-runtime-semantics-rematerialization-refresh-plan.md`
7. `2026-04-20-generated-scene-runtime-semantics-validation-refresh-plan.md`
## Acceptance Criteria
This design is complete when:
1. route order is fixed
2. child-plan list is explicit
3. rematerialization and validation refresh are frozen as mandatory dependencies
4. implementation still has not started
## Stop Statement
Stop after publishing the route sequence design and route sequence plan.


@@ -0,0 +1,203 @@
# Generated Scene Runtime Semantics Gap Analysis Design
> Status: Superseded by `docs/superpowers/specs/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-design.md`
## Objective
Produce a bounded, implementation-free analysis of runtime semantics gaps across the final 102 generated scene skills, using `sweep-030-scene` as the anchor case that exposed five concrete gap classes during inner-network validation.
This design does **not** modify analyzer, generator, runtime, skill manifests, or execution assets. It only defines how to analyze and classify the gaps that remain between:
- `generated_scene` framework-level success
- real inner-network invocation / execution equivalence
## Anchor Case
The anchor case is:
- `sweep-030-scene / 台区线损大数据-月_周累计线损率统计分析`
Inner-network debugging exposed the following gap classes:
1. `invocation_alias_gap`
2. `dictionary_recovery_gap`
3. `parameter_default_semantics_gap`
4. `resolver_to_request_mapping_gap`
5. `runtime_url_semantics_gap`
The analysis generalizes these five classes across the full 102-scene final materialization set.
## Scope
In scope:
- Analyze the final 102 generated skills under:
- `examples/scene_skill_102_final_materialization_2026-04-19/skills`
- Inspect:
- `scene.toml`
- `SKILL.toml`
- `references/generation-report.json`
- `references/org-dictionary.json` where present
- generated browser scripts where needed for request mapping evidence
- Compare generated assets against source-scene evidence when required to validate dictionary and runtime-url semantics
- Produce a 102-scene gap inventory and summary report
Out of scope:
- Any code change in `src/`
- Any edit to generated skill packages
- Any update to execution board / official board
- Any new pseudo-production execution
- Any new inner-network fix for a specific scene
## Problem Statement
The repository has already reached:
- `102 / 102` framework auto-pass
- `102 / 102` final materialized skills
- deterministic invocation readiness
But `sweep-030-scene` demonstrated that generated skills can still diverge from real runtime semantics in ways not captured by framework-level closure:
- user phrasing differs from canonical scene name
- source scene contains complete org dictionaries not fully recovered into the generated skill
- source page defaults dates / periods while generated invocation initially required explicit period values
- resolver outputs and request field names do not align 1:1
- runtime context URL semantics differ from module-route URL semantics
Therefore the next bounded step is analysis, not implementation.
## Gap Taxonomy
Each scene may be tagged with zero or more of the following gap classes:
### 1. `invocation_alias_gap`
Definition:
- Natural operator phrasing is likely not covered by current deterministic `include_keywords`
Indicators:
- Deterministic keywords only contain canonical scene title
- Scene title includes punctuation / separators / compound mode phrases
- Existing reports already required alias normalization
### 2. `dictionary_recovery_gap`
Definition:
- Source scene contains embedded dictionaries / trees / option arrays, but generated skill only carries a starter subset or no dictionary at all
Indicators:
- Source contains files like `city.js`, `dict.js`, `enum.js`, `options.js`
- Source JS includes tree/option structures with labels/codes/children
- Generated `references/org-dictionary.json` is empty or much smaller than source evidence
### 3. `parameter_default_semantics_gap`
Definition:
- Source page applies default values (date, period, mode, range, org) when user omits them, but generated skill currently treats them as required or unresolved
Indicators:
- Source contains `moment()` / date defaulting / initial query payloads
- Generated parameter readiness previously required explicit user input
### 4. `resolver_to_request_mapping_gap`
Definition:
- Resolved semantic parameters do not align directly with actual request field names or payload layout used by the source page
Indicators:
- Resolver outputs `org_code` while request uses `orgno`, or analogous mismatches
- Generated request template uses placeholders not directly populated by resolver outputs
- Source request payload structure differs from generated request mapping
### 5. `runtime_url_semantics_gap`
Definition:
- Generated skill does not clearly distinguish between app-entry URL, module-route URL, and API endpoint URL for runtime binding
Indicators:
- `scene.toml` only stores one `bootstrap.target_url`
- Inner-network execution shows app-entry URL succeeds while module-route URL fails, or vice versa
- Generation report contains both an app entry and a deeper route candidate
## Inputs
Primary inputs:
- `examples/scene_skill_102_final_materialization_2026-04-19/skills`
- `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`
Anchor-case source evidence:
- `D:/desk/智能体资料/全量业务场景/一平台场景/台区线损大数据-月_周累计线损率统计分析`
## Output Artifacts
### 1. JSON inventory
- `tests/fixtures/generated_scene/generated_scene_runtime_semantics_gap_analysis_2026-04-20.json`
Required structure:
- top-level summary counts by gap class
- per-scene records
- per-risk-bucket grouping
Each scene record should include:
- `sceneId`
- `sceneName`
- `archetype`
- `riskLevel`
- `gaps`
- `evidence`
- `recommendedFixRoutes`
### 2. Human-readable report
- `docs/superpowers/reports/2026-04-20-generated-scene-runtime-semantics-gap-analysis-report.md`
The report must answer:
1. How many scenes likely have each gap type
2. Which families / archetypes are most affected
3. Which gaps are generator-level
4. Which gaps are runtime-only and should not be pushed back into generation
5. Which next implementation routes should be prioritized
## Risk Buckets
Scenes should be grouped into:
- `high`: multi-parameter or runtime-sensitive scenes where inner-network invocation is likely to diverge without further hardening
- `medium`: scenes with likely alias / dictionary / default-semantics issues but lower execution sensitivity
- `low`: scenes with no immediate evidence of these five gap classes
## Acceptance Criteria
This analysis is complete when:
1. All 102 final materialized scenes have a runtime-semantics record
2. `sweep-030-scene` is explicitly analyzed under all applicable gap classes
3. Summary counts exist for all five gap classes
4. Dictionary recovery gap is supported by direct source-vs-generated evidence for the anchor case
5. The report recommends next implementation routes without changing code
## Stop Statement
Stop after publishing the JSON inventory and report.
Do not open implementation work from this design.


@@ -0,0 +1,165 @@
# Generated Scene Source Evidence Cross-Scan Design
> Date: 2026-04-20
> Status: Draft
> Parent roadmap:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-plan.md`
## Intent
Execute the first bounded child step of the source-first runtime semantics hardening roadmap:
`scan the original 102 source scenes for high-signal runtime-semantics evidence`
This design does not change analyzer, generator, manifests, or runtime behavior. It only defines how to scan original source-scene evidence so later rule-hardening routes can be derived from source truth rather than from already-generated skills alone.
## Objective
For every scene in the current 102-scene set:
1. locate the original source directory
2. perform a bounded source evidence scan
3. record whether source-side evidence exists for the five anchor gap classes:
- `invocation_alias_gap`
- `dictionary_recovery_gap`
- `parameter_default_semantics_gap`
- `resolver_to_request_mapping_gap`
- `runtime_url_semantics_gap`
## Scope
In scope:
1. source directories under:
- `D:/desk/智能体资料/全量业务场景/一平台场景`
2. current 102-scene mapping from existing materialization / board assets
3. bounded file-content scanning over high-signal files
4. JSON ledger + human-readable report
Out of scope:
1. any code change in `src/`
2. any generated skill change
3. any rematerialization
4. any execution board update
5. any pseudo-production execution
## Required Scan Targets
The scan should prioritize only high-signal evidence sources.
### 1. Invocation alias evidence
Signals:
1. scene name variants
2. menu labels
3. button labels
4. route names
5. report titles
6. user-facing Chinese phrases in HTML / JS
### 2. Dictionary recovery evidence
Signals:
1. `city.js`
2. `dict.js`
3. `enum.js`
4. `options*.js`
5. tree / option arrays with `label`, `value`, `code`, `children`
### 3. Parameter default semantics evidence
Signals:
1. `moment(`
2. `dayjs(`
3. default query parameter assignment
4. implicit month/week/date initialization
### 4. Resolver-to-request mapping evidence
Signals:
1. `$.ajax`
2. `fetch`
3. `contentType`
4. request `data`
5. request body field names
6. mode-specific request payloads
### 5. Runtime URL semantics evidence
Signals:
1. app entry URLs
2. module route URLs
3. API endpoint URLs
4. host runtime / bootstrap page hints
## Scan Strategy
This is not a full source index.
The scan should:
1. use bounded heuristics and targeted filename/content patterns
2. avoid exhaustive deep parsing of every file
3. record evidence flags and representative evidence paths
4. be sufficient to classify scenes for later hardening routes
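The bounded heuristic scan can be sketched as substring checks over one file's name and content, emitting evidence flags named after the gap classes. The patterns mirror the scan targets above; the function shape and flag names are illustrative assumptions, not the real scanner.

```rust
// Sketch of the bounded per-file evidence scan. Pattern lists mirror the
// scan targets; the function shape is an illustrative assumption.

fn evidence_flags(file_name: &str, content: &str) -> Vec<&'static str> {
    let mut flags = Vec::new();
    let dict_names = ["city.js", "dict.js", "enum.js"];
    if dict_names.iter().any(|n| file_name.ends_with(n))
        || file_name.starts_with("options")
    {
        flags.push("dictionary_recovery_gap");
    }
    if content.contains("moment(") || content.contains("dayjs(") {
        flags.push("parameter_default_semantics_gap");
    }
    if content.contains("$.ajax") || content.contains("fetch(") {
        flags.push("resolver_to_request_mapping_gap");
    }
    flags
}
```

Recording only flags plus representative evidence paths keeps the scan bounded: no file is deeply parsed, yet each scene can still be classified for the later hardening routes.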
## Inputs
Primary inputs:
1. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
2. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
3. source root:
- `D:/desk/智能体资料/全量业务场景/一平台场景`
Anchor validation source:
1. `D:/desk/智能体资料/全量业务场景/一平台场景/台区线损大数据-月_周累计线损率统计分析`
## Output Artifacts
### JSON
- `tests/fixtures/generated_scene/generated_scene_source_evidence_cross_scan_2026-04-20.json`
Each scene record should include:
1. `sceneId`
2. `sceneName`
3. `sourceDir`
4. `evidenceFlags`
5. `evidenceFiles`
6. `riskHints`
### Report
- `docs/superpowers/reports/2026-04-20-generated-scene-source-evidence-cross-scan-report.md`
The report must answer:
1. how many scenes show dictionary evidence
2. how many scenes show default parameter semantics
3. how many scenes show request field aliasing
4. how many scenes show multi-URL semantics
5. which scenes look most similar to `sweep-030-scene`
## Acceptance Criteria
This design is complete when:
1. all 102 scenes are included in the cross-scan
2. the five evidence families are explicit
3. the output JSON structure is defined
4. the scan remains analysis-only
## Stop Statement
Stop after publishing the child design and child plan.
Do not execute the scan inside this design.


@@ -0,0 +1,298 @@
# Generated Scene Source-First Runtime Semantics Hardening Design
> Date: 2026-04-20
> Status: Draft
> Supersedes:
> - `docs/superpowers/specs/2026-04-20-generated-scene-runtime-semantics-gap-analysis-design.md`
> Upstream Parent:
> - `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Upstream Materialization:
> - `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
## Intent
Define the next parent roadmap for `generated_scene` after framework closure has already been achieved.
The purpose is no longer:
- whether the `102` scenes can be generated into skills
That has already been proven.
The purpose is now:
- scan the original `102` source scenes for runtime-semantics evidence
- identify all scenes that can reproduce the same class of divergence exposed by `sweep-030-scene`
- harden analyzer / generator / manifest rules at the rule level rather than scene-by-scene
- regenerate the full `102` skill set from the hardened rules
- rerun validation assets so future inner-network execution does not rediscover the same class of defects one scene at a time
This design deliberately moves from a weak `generated-skill-first` analysis to a stronger `source-first` analysis and regeneration program.
## Why the Previous Analysis Was Not Enough
The superseded analysis-only design focused mainly on the already-generated skill assets.
That is insufficient for the actual project goal, because the goal is not simply to describe gaps that already surfaced in generated skills. The goal is to:
1. proactively find other source scenes with the same latent runtime-semantics risks as `sweep-030-scene`
2. correct the generation rules once
3. regenerate the full 102-scene bundle
4. avoid repeated inner-network rediscovery of the same class of defects
Therefore the correct parent approach must be source-first.
## Anchor Problem Family
`sweep-030-scene / 台区线损大数据-月_周累计线损率统计分析` exposed five reusable gap classes:
1. `invocation_alias_gap`
2. `dictionary_recovery_gap`
3. `parameter_default_semantics_gap`
4. `resolver_to_request_mapping_gap`
5. `runtime_url_semantics_gap`
The roadmapping problem is no longer “fix sweep-030”.
It is:
`find every source scene in the current 102 set that can reproduce one or more of these five gap classes, then harden generation rules and rematerialize the whole set`
## Source-First Principle
For this roadmap, the original source scenes are the primary truth.
Generated skills are secondary, derived artifacts used for comparison.
This means:
1. risk discovery starts from original source-scene files, not from generated output alone
2. generated skills are used to measure what is missing compared with source evidence
3. implementation targets rule-level recovery, not scene-name patching
4. the roadmap is incomplete until the full 102 skills are regenerated from hardened rules
## Scope
In scope:
1. Scan the original 102 source-scene directories under:
- `D:/desk/智能体资料/全量业务场景/一平台场景`
2. Cross-map each source scene to the current final generated skill
3. Detect source-side evidence for the five runtime-semantics gap classes
4. Produce a full risk ledger for all 102 scenes
5. Define the bounded implementation routes required to harden generation rules
6. Define the required full rematerialization and validation refresh after rule changes
Out of scope:
1. Inner-network execution itself
2. Login / credential handling
3. Host-bridge runtime hardening outside current generated-scene semantics
4. Scene-by-scene ad hoc inner-network patching as the primary method
## Problem Restatement
The repository already reached:
1. `102 / 102` framework auto-pass
2. `102 / 102` materialized skills
3. deterministic invocation readiness
4. full direct mock pass
But `sweep-030-scene` proved that generated skills can still diverge from original scene runtime semantics in ways that only surface when actually invoked in a browser-attached environment.
The project cannot sustainably close that gap by waiting for each scene to fail in inner-network execution.
The missing capability is:
`source-first runtime semantics extraction and rule hardening`
## Runtime-Semantics Gap Taxonomy
The five anchor gap classes remain the canonical taxonomy.
### 1. `invocation_alias_gap`
The original scene affords natural operator phrasing, but the generated deterministic manifest is too narrow.
### 2. `dictionary_recovery_gap`
The original scene contains embedded dictionaries, trees, or option structures, but the generated skill only restores a starter subset or no dictionary.
### 3. `parameter_default_semantics_gap`
The original page supplies default time / mode / org semantics, but the generated skill initially treats the parameter as explicitly required.
### 4. `resolver_to_request_mapping_gap`
The generated resolver output names are not the actual request payload field names used by the original page.
### 5. `runtime_url_semantics_gap`
The generated skill does not properly separate:
1. app-entry URL
2. module-route URL
3. API endpoint URL
4. runtime browser context URL
## New Required Source-Side Scan
The new parent roadmap must explicitly scan the original source scenes for high-signal evidence.
### Evidence families to scan
1. Dictionary files
- `city.js`
- `dict.js`
- `enum.js`
- `options*.js`
- tree / option / label-code-value arrays
2. Default-parameter semantics
- `moment(`
- `dayjs(`
- month/week defaulting
- implicit query payload initialization
3. Request payload semantics
- `$.ajax`
- `fetch`
- `contentType`
- `data`
- request body field names
4. Runtime URL semantics
- app entry URLs
- module route URLs
- menu navigation targets
- bootstrap candidates
5. Invocation alias evidence
- titles
- menu labels
- button text
- route names
- report names
- operator-facing wording
### Required output of the scan
For each source scene:
1. whether embedded dictionaries exist
2. whether page defaults exist
3. whether request-field aliasing exists
4. whether multiple URL kinds exist
5. whether natural alias variation is likely
## Work Product Hierarchy
The roadmap should produce three layers of output.
### Layer 1: Source-Side Risk Ledger
A full 102-scene ledger that starts from original source evidence.
### Layer 2: Rule-Hardening Route Map
A route map that groups scenes by reusable rule fixes rather than by scene name.
### Layer 3: Rematerialization + Validation Refresh Plan
A controlled plan for regenerating all 102 skills and refreshing validation assets after the rule changes land.
## Core Routes
The source-first roadmap must be split into these fixed routes:
### Route A: Source Cross-Scan and Evidence Ledger
Goal:
Build a full 102-scene source-first runtime-semantics risk inventory.
### Route B: Rule-Level Hardening Design
Goal:
Translate the source-first gaps into rule-level changes for analyzer/generator/manifest output.
Primary targets:
1. alias generation
2. dictionary extraction
3. parameter default recovery
4. resolver-to-request field mapping
5. runtime URL classification
### Route C: Bounded Implementation Slices
Goal:
Implement the rule-level hardening in bounded slices organized by reusable fix route, not by single scene.
### Route D: Full 102 Rematerialization
Goal:
Regenerate all 102 skills after hardening so the new rules actually propagate to the released skill bundle.
### Route E: Validation Refresh
Goal:
Refresh:
1. deterministic invocation readiness
2. parameter readiness
3. static validation
4. direct mock execution
5. offline / pseudo-production handoff assets
## Inputs
Primary source inventory:
- `D:/desk/智能体资料/全量业务场景/一平台场景`
Primary generated comparison inventory:
- `examples/scene_skill_102_final_materialization_2026-04-19/skills`
Supporting assets:
- `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
- `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`
## Deliverables
### 1. Source-first risk ledger
- `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
### 2. Source-first analysis report
- `docs/superpowers/reports/2026-04-20-generated-scene-source-first-runtime-semantics-report.md`
### 3. Rule-hardening roadmap outputs
Not implemented in this design, but this design must define the bounded next plans that follow the ledger.
## Acceptance Criteria
This design is successful when:
1. it explicitly requires source-scene cross-scan over the full 102 set
2. it no longer relies on generated-skill-only inspection as the main discovery method
3. it makes full rematerialization a required downstream step
4. it treats `sweep-030-scene` as an anchor case, not a one-off patch
5. it defines a route from source scan to rule hardening to regeneration
## Stop Rule
Stop after publishing the parent design and parent plan.
Do not begin source scanning or implementation inside this design document.


@@ -0,0 +1,200 @@
# Generated Scene Source-First Runtime Semantics Ledger Design
> Date: 2026-04-20
> Status: Draft
> Parent roadmap:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-plan.md`
> Upstream scan:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-evidence-cross-scan-plan.md`
## Intent
Define the second bounded child step of the source-first runtime semantics hardening roadmap:
`merge source-side evidence with generated-skill evidence into a full 102-scene runtime-semantics ledger`
This design is still analysis-only. It does not modify `src/`, generated skills, validation assets, or execution-board state.
## Objective
For every scene in the current 102-scene set:
1. merge source-side evidence from the completed cross-scan
2. compare that evidence against current generated skill manifests and references
3. assign one or more canonical runtime-semantics gap classes
4. assign a bounded `riskLevel`
5. distinguish:
- reusable generator-level rule gap
- runtime-only residual
6. publish a source-first runtime-semantics ledger that becomes the only valid input for later hardening-route design
## Fixed Gap Taxonomy
The ledger must continue using the five gap classes already anchored by `sweep-030-scene`:
1. `invocation_alias_gap`
2. `dictionary_recovery_gap`
3. `parameter_default_semantics_gap`
4. `resolver_to_request_mapping_gap`
5. `runtime_url_semantics_gap`
No additional gap class should be invented inside this ledger stage unless the evidence is clearly outside these five and cannot be expressed as a subtype.
## Scope
In scope:
1. the completed source cross-scan asset
2. the current final generated skills under `examples/scene_skill_102_final_materialization_2026-04-19/skills`
3. current deterministic invocation readiness assets
4. current natural-language parameter readiness assets
5. current parameter dictionary normalization assets
6. source-to-generated comparison for all 102 scenes
7. JSON ledger + human-readable report
Out of scope:
1. any change in `src/`
2. any skill manifest or script edit
3. any rematerialization
4. any validation rerun
5. any inner-network execution
## Required Comparisons
The ledger stage must compare source evidence with generated output along these axes.
### 1. Invocation alias comparison
Check whether source-side operator wording, labels, route names, or titles imply broader natural-language coverage than the current generated `include_keywords`.
### 2. Dictionary comparison
Check whether source-side dictionaries, trees, or option arrays imply a richer entity dictionary than the generated `references/*dictionary*.json` assets currently expose.
### 3. Parameter default semantics comparison
Check whether source-side date / period / mode initialization implies a default-value policy that the generated manifest or resolver metadata does not currently preserve.
### 4. Resolver-to-request mapping comparison
Check whether source-side request field names differ from generated resolver output names and whether the generated skill currently encodes an explicit mapping.
### 5. Runtime URL comparison
Check whether source-side evidence implies multiple URL roles:
1. app entry URL
2. module route URL
3. API endpoint URL
4. runtime browser context URL
and whether the generated skill currently collapses those roles into a single ambiguous target.
## Ledger Schema
Each scene record in the runtime-semantics ledger should include:
1. `sceneId`
2. `sceneName`
3. `sourceDir`
4. `archetype`
5. `readiness`
6. `riskLevel`
7. `gaps`
8. `generatorLevelGap`
9. `runtimeOnlyResidual`
10. `recommendedFixRoutes`
11. `sourceEvidenceSummary`
12. `generatedEvidenceSummary`
13. `comparisonNotes`
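An illustrative record in that schema, using the `sweep-030-scene` anchor case. The gap list mirrors the confirmed inner-network findings cited later in this design; `sceneName`, `sourceDir`, `readiness`, and `recommendedFixRoutes` values are placeholders, not ledger facts:

```javascript
// Illustrative ledger record. Field names follow the schema above;
// placeholder values are marked; gaps mirror the sweep-030-scene
// anchor findings (all five gap classes confirmed at runtime).
const exampleRecord = {
  sceneId: "sweep-030-scene",
  sceneName: "(readable scene name placeholder)",
  sourceDir: "(original source scene directory placeholder)",
  archetype: "multi_mode_request",
  readiness: "(readiness status placeholder)",
  riskLevel: "high",
  gaps: [
    "invocation_alias_gap",
    "dictionary_recovery_gap",
    "parameter_default_semantics_gap",
    "resolver_to_request_mapping_gap",
    "runtime_url_semantics_gap",
  ],
  // both flags set for illustration; the taxonomy allows this,
  // but the ledger must then explain why
  generatorLevelGap: true,
  runtimeOnlyResidual: true,
  recommendedFixRoutes: ["(rule-hardening route id placeholder)"],
  sourceEvidenceSummary: "embedded org dictionary, page default period, request field aliasing",
  generatedEvidenceSummary: "starter-subset dictionary, narrow include_keywords",
  comparisonNotes: "anchor case; both flags set, so an explanation is required",
};
```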
## Risk-Level Rules
The ledger should use bounded, reproducible risk levels:
### `high`
Use when the scene has strong source evidence for one or more gap classes and the current generated skill visibly lacks equivalent semantics.
### `medium`
Use when the scene has source evidence for one or more gap classes, but current generated output appears partially aligned or the mismatch is plausible rather than explicit.
### `low`
Use when source evidence exists but generated output already appears materially aligned, or when the residual is likely runtime-only rather than generator-level.
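The three rules above can be expressed as one bounded predicate. The boolean input names are assumptions about how upstream comparison flags might be represented; the three-way outcome follows the high/medium/low rules directly:

```javascript
// Minimal sketch of the bounded risk-level rules above.
// Input flag names are assumptions for this sketch.
function assignRiskLevel({ hasSourceEvidence, generatedAligned, explicitMismatch }) {
  if (!hasSourceEvidence) return "low";                     // nothing to compare against
  if (explicitMismatch && !generatedAligned) return "high"; // visibly missing semantics
  if (!generatedAligned) return "medium";                   // plausible, partial mismatch
  return "low";                                             // materially aligned
}
```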
## Generator-Level vs Runtime-Only
The ledger must classify whether a scene's residuals should later drive generator hardening or should remain runtime-only.
### `generatorLevelGap = true`
Use when source evidence proves the generated skill is missing semantics that should be recoverable during generation.
### `runtimeOnlyResidual = true`
Use when the remaining risk is primarily:
1. login / session
2. host runtime behavior
3. local-doc / host-bridge environment
4. inner-network-only execution context
and not a generation-semantic omission.
These two flags are not mutually exclusive; when a scene carries both, the ledger must explain why.
## Inputs
Primary inputs:
1. `tests/fixtures/generated_scene/generated_scene_source_evidence_cross_scan_2026-04-20.json`
2. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
3. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
5. `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`
Anchor runtime findings:
1. the confirmed `sweep-030-scene` inner-network findings:
- alias mismatch
- starter-subset org dictionary
- page-semantic default period behavior
- request-field mismatch
- runtime context URL ambiguity
## Output Artifacts
### JSON
- `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
### Report
- `docs/superpowers/reports/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-report.md`
The report must answer:
1. how many scenes are `high`, `medium`, `low`
2. how many scenes carry each gap class
3. how many scenes appear to require generator-level fixes
4. how many scenes look runtime-only
5. which route clusters are likely to yield the highest reuse
## Acceptance Criteria
This design is complete when:
1. it defines a full-scene ledger stage rather than scene-by-scene notes
2. it binds the ledger to the fixed five-gap taxonomy
3. it defines how source evidence and generated evidence are compared
4. it defines `riskLevel`, `generatorLevelGap`, and `runtimeOnlyResidual`
5. it remains analysis-only
## Stop Statement
Stop after publishing this ledger design and its child plan.
Do not execute the ledger build inside this design.


@@ -0,0 +1,59 @@
# Scene Skill 102 Deterministic Invocation Readiness Design
> Date: 2026-04-20
> Parent Asset: `examples/scene_skill_102_final_materialization_2026-04-19`
## Intent
Prepare the final materialized scene skills for deterministic natural-language invocation through sgClaw.
This design addresses the gap between a generated skill package and a deterministically callable scene skill. The sgClaw runtime already has a registry-backed deterministic dispatch path for instructions ending with `。。。`, but the final materialized `scene.toml` files are not yet normalized for that convention.
## Current Findings
1. `101 / 102` scenes have complete skill packages and `scene.toml`.
2. `1 / 102` scene, `sweep-012-scene / 业扩报装管理制度`, is still a materialization failure.
3. `0 / 101` complete generated scene manifests currently use the deterministic suffix `。。。`.
4. `101 / 101` complete generated scene manifests currently use exactly one include keyword, usually the full scene name.
5. `10 / 101` complete generated scene manifests currently define runtime-supported params.
## Scope
Allowed:
1. Analyze deterministic invocation readiness for the complete final skill set.
2. Normalize `scene.toml` deterministic metadata for complete packages.
3. Generate invocation samples and dispatch dry-run evidence.
4. Publish readiness assets and reports.
Forbidden:
1. Do not execute browser scripts.
2. Do not change generated JavaScript scripts.
3. Do not modify sgClaw runtime dispatch code unless a later dedicated implementation plan is created.
4. Do not repair `sweep-012-scene`.
5. Do not start static, mock, or production validation.
6. Do not rename skill directories.
## Readiness Model
A skill is deterministic-invocation-ready when:
1. it has a valid `scene.toml`;
2. `[deterministic].suffix = "。。。"`;
3. `include_keywords` can match at least the readable scene name;
4. the scene can be uniquely selected by dispatch dry-run using an expected invocation sample;
5. required params either resolve or produce a structured prompt.
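Criteria 1 through 3 can be checked statically against a parsed manifest. A minimal sketch, assuming the manifest has already been parsed from `scene.toml` into a plain object with `deterministic.suffix` and `include_keywords` fields (the field names come from this document; the object shape is an assumption):

```javascript
// Static readiness predicate over a parsed scene.toml manifest.
// Manifest object shape is an assumption for this sketch.
// Criteria 4 and 5 (dispatch dry-run, param resolution) need the
// runtime and are out of scope here.
function isDeterministicReady(manifest, sceneName) {
  if (!manifest || !manifest.deterministic) return false;
  if (manifest.deterministic.suffix !== "。。。") return false;   // criterion 2
  const keywords = manifest.include_keywords || [];
  return keywords.some((k) => sceneName.includes(k));           // criterion 3
}
```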
## Expected Outputs
1. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_2026-04-20.json`
2. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_samples_2026-04-20.json`
3. `docs/superpowers/reports/2026-04-20-scene-skill-102-deterministic-invocation-readiness-report.md`
## Completion Criteria
1. All 101 complete packages have the deterministic suffix normalized to `。。。`.
2. Dispatch dry-run results are available for all complete packages.
3. The failed `sweep-012-scene` remains explicitly excluded and listed as not ready.
4. No browser execution or production validation is performed.


@@ -0,0 +1,76 @@
# Scene Skill 102 Full Direct Mock Execution Design
> Date: 2026-04-20
> Status: Draft
> Upstream Mock Harness: `docs/superpowers/plans/2026-04-20-scene-skill-102-mock-runtime-harness-implementation-plan.md`
> Input Harness Results: `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_harness_results_2026-04-20.json`
## Intent
Extend mock runtime validation from representative execution to direct execution of all `102` materialized scene skills.
This design remains strictly local and mock-only. It does not perform real browser execution, production system access, or business-data validation.
## Current Baseline
The previous mock runtime harness run produced:
| Status | Count |
| --- | ---: |
| representative `mock-runtime-pass` | 19 |
| representative failures | 0 |
That result proves archetype-level representative viability, but it does not prove every generated script can directly execute in a mock runtime.
## Goal
Produce a direct mock runtime result for all `102` materialized skills.
Each scene must receive exactly one of:
1. `direct-mock-pass`
2. `direct-mock-partial`
3. `direct-mock-fail`
## Validation Boundary
Allowed:
1. read final generated skill packages
2. load generated scripts in Node
3. inject fake runtime dependencies
4. invoke `buildBrowserEntrypointResult`
5. write direct mock result assets and report
Forbidden:
1. do not modify generated skill packages
2. do not modify `src/generated_scene/analyzer.rs`
3. do not modify `src/generated_scene/generator.rs`
4. do not rematerialize skills
5. do not update official board
6. do not open a real browser
7. do not access real network or production systems
8. do not claim production pass
## Expected Output
1. `tests/fixtures/generated_scene/scene_skill_102_full_direct_mock_execution_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-scene-skill-102-full-direct-mock-execution-report.md`
## Interpretation
If all `102` scenes pass direct mock execution, the project can say:
`102 / 102 generated skills can load and execute their primary entrypoint under controlled fake dependencies.`
It still cannot say:
`102 / 102 generated skills are production-ready.`
## Stop Rule
Stop after direct mock results and report are published.
Do not start pseudo-production batch selection under this design.


@@ -0,0 +1,150 @@
# Scene Skill 102 Mock Runtime Harness Implementation Design
> Date: 2026-04-20
> Status: Draft
> Upstream Validation: `docs/superpowers/plans/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-plan.md`
> Input Matrix: `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json`
## Intent
Define a bounded implementation stage for mock runtime harnesses after the `102` materialized skill set has passed static package validation and deterministic dispatch dry-run.
This design is not production validation. It exists to prove that generated skill scripts can be loaded and exercised against controlled fake dependencies before any real browser, host bridge, or production system is touched.
## Current Baseline
From `scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json`:
| Archetype | Count | Representative scenes |
| --- | ---: | --- |
| `paginated_enrichment` | 51 | `sweep-001-scene`, `sweep-002-scene`, `sweep-003-scene` |
| `host_bridge_workflow` | 26 | `sweep-007-scene`, `sweep-009-scene`, `sweep-010-scene` |
| `multi_mode_request` | 10 | `sweep-020-scene`, `sweep-023-scene`, `sweep-030-scene` |
| `local_doc_pipeline` | 6 | `sweep-012-scene`, `sweep-017-scene`, `sweep-019-scene` |
| `single_request_enrichment` | 5 | `sweep-013-scene`, `sweep-016-scene`, `sweep-068-scene` |
| `multi_endpoint_inventory` | 2 | `sweep-084-scene`, `sweep-085-scene` |
| `page_state_eval` | 2 | `sweep-066-scene`, `sweep-094-scene` |
Current matrix status:
| Status | Count |
| --- | ---: |
| `mock-covered-by-representative` | 19 |
| `mock-needs-harness` | 83 |
Important interpretation:
`mock-covered-by-representative` currently means representative selection only. It does not mean scripts have been executed in a mock runtime.
## Harness Layers
### Layer 1: Script Load Harness
Purpose:
1. load generated browser scripts in a controlled JavaScript runtime
2. verify the entry module does not fail during parse/load
3. verify referenced helper files are present
Output status:
`script-load-pass` or `script-load-fail`
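Layer 1 can be sketched as a single Node load attempt. This assumes the generated entry file is CommonJS-loadable; the status strings are the layer's own outputs:

```javascript
// Layer 1 sketch: load one generated script in Node and classify the
// result. Catches both parse errors and top-level evaluation errors.
function scriptLoadStatus(scriptPath) {
  try {
    require(scriptPath); // parse + top-level evaluation only
    return "script-load-pass";
  } catch (err) {
    return "script-load-fail";
  }
}
```

Helper-file presence (the layer's third check) would be a separate `fs.existsSync` pass over the files the manifest references.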
### Layer 2: Fake Dependency Harness
Purpose:
Provide controlled fake implementations for:
1. `fetch`
2. browser DOM
3. host bridge action/callback
4. local document service
5. artifact writer
Output status:
`mock-dependency-ready` or `mock-dependency-missing`
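The most reused fake is `fetch`. A minimal sketch of a recording fake with canned responses; the `{ code, data }` body shape is an assumption for illustration, not the real endpoint contract:

```javascript
// Layer 2 sketch: a controlled fake fetch that records every call and
// replays canned responses keyed by URL. Body shape is an assumption.
function makeFakeFetch(cannedResponses) {
  const calls = [];
  const fakeFetch = async (url, options = {}) => {
    calls.push({ url, options });                              // record for later assertions
    const body = cannedResponses[url] ?? { code: 404, data: null };
    return { ok: true, json: async () => body };               // minimal Response-like object
  };
  return { fakeFetch, calls };
}
```

The recorded `calls` array is what lets Layer 3 assert that the expected request was attempted.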
### Layer 3: Representative Flow Harness
Purpose:
Run representative scene scripts far enough to prove control-flow integrity.
Checks:
1. expected request or host action is attempted
2. controlled empty-data response is handled
3. controlled non-empty response is normalized
4. artifact metadata is produced when declared
5. error response does not crash outside structured failure path
Output status:
`mock-runtime-pass`, `mock-runtime-fail`, or `mock-runtime-partial`
### Layer 4: Matrix Propagation
Purpose:
Propagate representative results to same-archetype scenes without claiming direct execution for every scene.
Output statuses:
1. `mock-runtime-representative-pass`
2. `mock-runtime-covered-by-representative`
3. `mock-runtime-needs-direct-run`
4. `mock-runtime-fail`
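Layer 4 fan-out can be sketched as one mapping pass. Input shapes (`archetype`, `isRepresentative`, a per-archetype representative result map) are assumptions; the status strings are the layer's own:

```javascript
// Layer 4 sketch: representative results fan out to same-archetype
// scenes as coverage-only statuses, never as direct execution claims.
function propagate(scenes, representativeResults) {
  return scenes.map((scene) => {
    const repStatus = representativeResults[scene.archetype];
    if (scene.isRepresentative) {
      return { ...scene, status: repStatus === "mock-runtime-pass"
        ? "mock-runtime-representative-pass"
        : "mock-runtime-fail" };
    }
    if (repStatus === "mock-runtime-pass") {
      return { ...scene, status: "mock-runtime-covered-by-representative" };
    }
    return { ...scene, status: "mock-runtime-needs-direct-run" };
  });
}
```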
## Route Order
The route order is fixed:
1. `Route 1`: `paginated_enrichment` mock harness
2. `Route 2`: `multi_mode_request` and `single_request_enrichment` mock harnesses
3. `Route 3`: `multi_endpoint_inventory` and `page_state_eval` mock harnesses
4. `Route 4`: `local_doc_pipeline` and `host_bridge_workflow` mock harnesses
5. `Route 5`: publish integrated mock runtime validation report
Rationale:
1. `paginated_enrichment` is the largest bucket and should validate the most reused generated flow first.
2. `multi_mode_request` and `single_request_enrichment` are mainline API flows and can share fake fetch infrastructure.
3. `multi_endpoint_inventory` and `page_state_eval` are small buckets and should be validated after the mainline fetch harness exists.
4. `local_doc_pipeline` and `host_bridge_workflow` require more specialized fakes and must not drive the first harness implementation.
## Scope Guardrails
Allowed:
1. add mock validation harness files
2. add mock validation tests
3. read generated final skill packages
4. execute generated scripts only inside a mock runtime
5. publish mock runtime validation result assets and reports
Forbidden:
1. do not modify generated skill scripts under `examples/scene_skill_102_final_materialization_2026-04-19/skills`
2. do not modify `src/generated_scene/analyzer.rs`
3. do not modify `src/generated_scene/generator.rs`
4. do not rematerialize the `102` skills
5. do not update `scene_execution_board_2026-04-18.json`
6. do not start real browser execution
7. do not connect to real business systems
8. do not require production credentials, VPN, SSO, or internal network access
## Expected Assets
1. `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_harness_results_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-scene-skill-102-mock-runtime-harness-report.md`
## Stop Rules
Stop after representative mock runtime results and the integrated report are published.
Do not continue into production validation under this plan.
Do not claim `102 / 102` real runtime pass from mock results.


@@ -0,0 +1,99 @@
# Scene Skill 102 Natural-Language Parameter Readiness Design
> Date: 2026-04-20
> Parent: `2026-04-20-scene-skill-102-deterministic-invocation-readiness-design.md`
## Intent
Clarify whether the final 102 materialized scene skills can be invoked with natural-language query conditions before pseudo-production testing.
This design separates deterministic dispatch readiness from parameter readiness. A skill can be selected by an instruction ending with `。。。` while still not being able to parse query conditions such as organization, period, date, or report mode.
## Current Baseline
1. `102 / 102` scene skills are final-materialized.
2. `102 / 102` scene skills are deterministic dispatch ready for `。。。` suffix invocation.
3. `102 / 102` scene skills pass full direct mock execution.
4. Only a subset currently declares explicit `[[params]]` in `scene.toml`.
## Problem Statement
Internal-network validation should not use only `场景名。。。` as the invocation pattern for every skill.
Parameterized skills must be validated with representative natural-language query conditions. For example:
```text
兰州公司 台区线损大数据 月累计线损率统计分析。。。
```
This should resolve:
1. `兰州公司` as organization;
2. `月累计` as period mode;
3. the scene keywords as deterministic skill selection evidence.
If a skill has required params but lacks usable resolver resources, it must be flagged before pseudo-production execution.
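The resolution the example above expects can be sketched as alias lookups. The `兰州公司` mapping comes from the deterministic-submit fixtures cited later in this document; the `周累计` entry and the table/field names are assumptions for this sketch, and real skills resolve from `references/org-dictionary.json` plus manifest resolver metadata:

```javascript
// Hypothetical resolver sketch for the example instruction above.
// ORG_ALIASES entry for 兰州公司 comes from the fixtures; the 周累计
// entry and all shapes here are illustrative assumptions.
const ORG_ALIASES = {
  "兰州公司": { org_label: "国网兰州供电公司", org_code: "62401" },
};
const PERIOD_MODES = { "月累计": "month", "周累计": "week" };

function resolveConditions(instruction) {
  const resolved = {};
  for (const [alias, org] of Object.entries(ORG_ALIASES)) {
    if (instruction.includes(alias)) resolved.org = org;
  }
  for (const [word, mode] of Object.entries(PERIOD_MODES)) {
    if (instruction.includes(word)) resolved.period_mode = mode;
  }
  return resolved; // empty fields signal a structured prompt is needed
}
```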
## Scope
Allowed:
1. Analyze all final 102 skill manifests.
2. Classify parameter readiness for each scene.
3. Generate recommended natural-language invocation samples.
4. Identify resolver-resource gaps.
5. Publish readiness JSON and report.
Forbidden:
1. Do not modify `src/compat/scene_platform/dispatch.rs`.
2. Do not modify `src/compat/scene_platform/resolvers.rs`.
3. Do not modify `src/generated_scene/analyzer.rs`.
4. Do not modify `src/generated_scene/generator.rs`.
5. Do not edit final generated skill packages.
6. Do not execute browser, host bridge, localhost services, or production network.
7. Do not update `scene_execution_board_2026-04-18.json`.
## Readiness Classes
### `parameter-ready`
The skill declares required params and all required resolver resources are present and populated enough to support deterministic parsing.
### `parameter-gap`
The skill declares required params, but at least one required resolver cannot currently resolve real user input because of missing, empty, or unsupported resolver configuration.
### `parameter-not-required`
The skill has no declared required params. It may still accept descriptive natural language, but the current runtime will primarily use it for deterministic scene selection rather than structured argument extraction.
### `parameter-implicit-risk`
The skill has no declared required params, but the scene name suggests likely query conditions such as month, week, day, company, county, report period, or business object. These scenes should be tested carefully because user wording may imply filters that current manifests do not parse.
## Output Model
Each scene record should include:
1. `sceneId`
2. `sceneName`
3. `archetype`
4. `skillDir`
5. `hasParams`
6. `requiredParams`
7. `resolverStatus`
8. `parameterReadiness`
9. `recommendedInvocation`
10. `minimalInvocation`
11. `parameterizedInvocation`
12. `gaps`
13. `notes`
## Completion Criteria
1. All `102` scenes are classified.
2. The report states how many scenes require explicit query conditions.
3. The report states how many required-param scenes are actually resolver-ready.
4. The report states which scenes should not be validated with only `场景名。。。`.
5. The report does not claim production readiness.


@@ -0,0 +1,75 @@
# Scene Skill 102 Parameter Dictionary And Invocation Template Normalization Design
> Date: 2026-04-20
> Parent: `2026-04-20-scene-skill-102-natural-language-parameter-readiness-design.md`
## Intent
Make the `10` required-param scene skills usable with natural-language query conditions before pseudo-production batch execution.
This design does not broaden deterministic dispatch. It only closes the immediate parameter-readiness gap discovered in `scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`.
## Current Gap
`10` `multi_mode_request` skills declare required params:
1. `org` via `dictionary_entity`
2. `period` via `month_week_period`
However, each generated `references/org-dictionary.json` is currently an empty array. Therefore inputs such as:
```text
兰州公司 台区线损大数据 月累计 2026-03。。。
```
have the correct shape, but cannot resolve `兰州公司` into `org_label` and `org_code`.
## Scope
Allowed:
1. Populate `references/org-dictionary.json` for the fixed `10` required-param skills.
2. Use the existing, already-tested minimal organization aliases from deterministic-submit fixtures:
- `国网兰州供电公司` / `62401`
- `城关供电分公司` / `6240108`
- `国网天水供电公司` / `62403`
3. Refresh natural-language parameter readiness assets.
4. Refresh invocation samples.
5. Refresh pseudo-production handoff inputs for selected scenes that require params.
Forbidden:
1. Do not modify `src/compat/scene_platform/dispatch.rs`.
2. Do not modify `src/compat/scene_platform/resolvers.rs`.
3. Do not modify `src/generated_scene/analyzer.rs`.
4. Do not modify `src/generated_scene/generator.rs`.
5. Do not execute browser, host bridge, localhost services, or production network.
6. Do not claim this starter dictionary is a full production organization dictionary.
7. Do not add params to the other `92` non-param skills under this plan.
## Starter Dictionary Policy
The populated dictionary is a pseudo-production starter dictionary. It is sufficient to validate the natural-language parameter plumbing for inputs such as `兰州公司` and `月累计 2026-03`.
It is not a full province-wide organization dictionary. A later production hardening step may replace or expand it with the real unit tree.
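Written out, the starter dictionary might look like the following. The three units and codes are the fixture values listed above; the field names (`label`, `code`, `aliases`) and the alias entries are assumptions about the `references/org-dictionary.json` shape, not the confirmed format:

```javascript
// Sketch of the populated starter dictionary. Units and codes come
// from the deterministic-submit fixtures; field names and aliases
// are assumptions for this sketch.
const starterOrgDictionary = [
  { label: "国网兰州供电公司", code: "62401", aliases: ["兰州公司"] },
  { label: "城关供电分公司", code: "6240108", aliases: [] },
  { label: "国网天水供电公司", code: "62403", aliases: ["天水公司"] },
];
```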
## Readiness Target
After this plan:
1. `10 / 10` required-param skills should move from `parameter-gap` to `parameter-ready`.
2. Their recommended invocation samples should include:
- organization alias;
- scene name;
- month/week mode;
- concrete period value;
- `。。。` suffix.
3. Pseudo-production handoff should not use bare scene names for these scenes.
## Completion Criteria
1. All fixed `10` dictionaries are non-empty and parseable.
2. Natural-language parameter readiness is refreshed.
3. Invocation samples are refreshed.
4. Pseudo-production handoff is refreshed for selected required-param scenes.
5. No runtime source files are modified.


@@ -0,0 +1,90 @@
# Scene Skill 102 Pseudo-Production Batch Execution Design
> Date: 2026-04-20
> Parent: `2026-04-20-scene-skill-102-pseudoprod-batch-execution-preparation-plan.md`
## Intent
Execute the prepared 10-scene pseudo-production batch in an operator-provided quasi-production or production-like environment and collect structured evidence.
This design defines execution boundaries and result recording. It does not embed credentials or require credentials to be stored in the repository.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_execution_handoff_2026-04-20.json`
2. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_evidence_checklist_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_execution_record_template_2026-04-20.json`
4. final materialized skill directory:
`examples/scene_skill_102_final_materialization_2026-04-19/skills`
## Execution Boundary
Pseudo-production execution may only run in an operator-approved environment where the operator has provided:
1. browser or sgClaw runtime access
2. required network access
3. valid session state outside the repository
4. a local evidence output directory
5. approval to capture redacted logs/screenshots/artifacts
## Credential Rule
Never store these in the repository:
1. passwords
2. cookies
3. access tokens
4. Authorization headers
5. private keys
6. VPN secrets
7. internal session dumps
Execution records may reference that an operator-provided session was used, but must not include the session material.
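Before any execution record is committed, a redaction pass should scrub the key classes listed above. A minimal sketch; the key pattern is an assumption derived from that list, not an exhaustive policy:

```javascript
// Illustrative recursive redaction pass over an execution record.
// SENSITIVE_KEYS is a minimal assumption based on the credential rule
// above; a real policy would be reviewed, not pattern-guessed.
const SENSITIVE_KEYS = /password|cookie|token|authorization|key|secret|session/i;

function redact(record) {
  if (Array.isArray(record)) return record.map(redact);
  if (record && typeof record === "object") {
    const out = {};
    for (const [key, value] of Object.entries(record)) {
      out[key] = SENSITIVE_KEYS.test(key) ? "[REDACTED]" : redact(value);
    }
    return out;
  }
  return record; // primitives pass through unchanged
}
```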
## Execution Result States
Each selected scene must resolve to exactly one:
1. `pseudo-prod-pass`
2. `login-blocked`
3. `network-blocked`
4. `host-bridge-blocked`
5. `local-doc-runtime-blocked`
6. `data-mismatch`
7. `artifact-mismatch`
8. `environment-unavailable`
9. `runtime-error`
## Evidence Requirements
Each selected scene must collect:
1. `execution-record.json`
2. console log
3. network summary
4. screenshot when browser target page is required
5. exported artifact if produced
6. notes
All evidence must be redacted before committing any summary to the repository.
## Repository Outputs
The repository should receive only redacted and structured execution summaries:
1. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_batch_execution_results_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-scene-skill-102-pseudoprod-batch-execution-report.md`
Raw evidence stays outside the repository; only material that has been explicitly redacted may be committed.
## Forbidden Scope
This design does not allow:
1. committing credentials
2. modifying generated skill packages
3. modifying analyzer/generator/runtime code
4. updating official board status
5. expanding beyond the selected 10 scenes
6. treating pseudo-production pass as full production certification


@@ -0,0 +1,100 @@
# Scene Skill 102 Pseudo-Production Batch Execution Preparation Design
> Date: 2026-04-20
> Parent: `2026-04-20-scene-skill-102-pseudoprod-batch-selection-plan.md`
## Intent
Prepare the first pseudo-production execution batch without running it.
This design defines environment handoff requirements, evidence collection rules, and execution record templates for the 10 selected scenes from the batch selection plan.
## Fixed Batch
The execution preparation batch is fixed by:
- `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_batch_selection_2026-04-20.json`
The batch contains 10 scenes across:
1. `paginated_enrichment`: 4
2. `multi_mode_request`: 2
3. `single_request_enrichment`: 2
4. `multi_endpoint_inventory`: 1
5. `page_state_eval`: 1
## Preparation Only
This design does not allow real execution.
It only prepares:
1. environment handoff checklist
2. evidence package layout
3. per-scene execution record template
4. failure taxonomy mapping
5. operator instructions
## Environment Handoff Requirements
The operator must provide or confirm these outside the repository:
1. target browser or quasi-production host
2. network access to required internal endpoints
3. valid login/session state where needed
4. allowed output directory for downloaded/exported artifacts
5. console log capture method
6. network log capture method
7. screenshot capture method
No credentials, tokens, cookies, or secrets should be stored in the repository.
## Evidence Package Layout
Each scene should use a local evidence folder outside tracked credentials:
```text
pseudoprod_evidence/
<scene-id>/
execution-record.json
console.log
network-summary.json
screenshot.png
exported-artifact.*
notes.md
```
The repository may store templates and redacted summaries, but not sensitive raw credentials or session material.
## Execution Result States
Each scene must resolve to one of:
1. `pseudo-prod-pass`
2. `login-blocked`
3. `network-blocked`
4. `host-bridge-blocked`
5. `local-doc-runtime-blocked`
6. `data-mismatch`
7. `artifact-mismatch`
8. `environment-unavailable`
9. `runtime-error`
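These nine states can be modeled as a closed enum so execution records cannot drift into undeclared states. A minimal sketch; the type and function names are assumptions for illustration, not code from the repository:

```rust
/// The nine allowed execution result states from this design.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResultState {
    PseudoProdPass,
    LoginBlocked,
    NetworkBlocked,
    HostBridgeBlocked,
    LocalDocRuntimeBlocked,
    DataMismatch,
    ArtifactMismatch,
    EnvironmentUnavailable,
    RuntimeError,
}

/// Parse the string form used in execution records; anything outside the
/// taxonomy is rejected instead of being silently accepted.
fn parse_result_state(s: &str) -> Option<ResultState> {
    use ResultState::*;
    Some(match s {
        "pseudo-prod-pass" => PseudoProdPass,
        "login-blocked" => LoginBlocked,
        "network-blocked" => NetworkBlocked,
        "host-bridge-blocked" => HostBridgeBlocked,
        "local-doc-runtime-blocked" => LocalDocRuntimeBlocked,
        "data-mismatch" => DataMismatch,
        "artifact-mismatch" => ArtifactMismatch,
        "environment-unavailable" => EnvironmentUnavailable,
        "runtime-error" => RuntimeError,
        _ => return None,
    })
}

fn main() {
    assert_eq!(parse_result_state("pseudo-prod-pass"), Some(ResultState::PseudoProdPass));
    // "production-pass" is deliberately not a legal state in this stage.
    assert!(parse_result_state("production-pass").is_none());
    println!("result-state taxonomy check ok");
}
```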
## Forbidden Scope
This design does not allow:
1. running browser automation
2. invoking real target systems
3. storing credentials
4. modifying generated skills
5. modifying `analyzer.rs`, `generator.rs`, dispatch, or runtime code
6. updating official board status
## Expected Outputs
1. environment handoff checklist JSON
2. per-scene execution record template JSON
3. evidence checklist JSON
4. preparation report

View File

@@ -0,0 +1,97 @@
# Scene Skill 102 Pseudo-Production Batch Selection Design
> Date: 2026-04-20
> Parent: `2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-plan.md`
> Upstream: `2026-04-20-sweep-015-direct-mock-partial-closure-plan.md`
## Intent
Select the first bounded pseudo-production validation batch after all 102 generated skills have passed local full direct mock execution.
This design does not execute pseudo-production. It only defines the candidate selection rules, batch composition, evidence requirements, and stop conditions for the next execution stage.
## Current Baseline
1. Final materialized skills: `102 / 102`
2. Deterministic invocation readiness: `102 / 102`
3. Static validation: `102 / 102`
4. Dispatch dry-run: `102 / 102`
5. Full direct mock execution: `102 / 102`
6. Pseudo-production readiness:
- `pseudo-prod-ready`: `70`
- `real-env-required`: `32`
## Selection Principle
The first pseudo-production batch should be small, balanced, and low-risk.
It should include only scenes that are:
1. materialized
2. deterministic dispatch ready
3. static validated
4. direct mock pass
5. pseudo-prod-ready
It should not include scenes that require host-bridge runtime, local-doc runtime, document export runtime, or other real-environment-only dependencies in the first batch.
## Batch Shape
The first batch should contain `10` scenes:
1. `paginated_enrichment`: 4
2. `multi_mode_request`: 2
3. `single_request_enrichment`: 2
4. `multi_endpoint_inventory`: 1
5. `page_state_eval`: 1
`host_bridge_workflow` and `local_doc_pipeline` are explicitly excluded from the first pseudo-production batch because their readiness records require real environment dependencies.
## Required Evidence Per Scene
Each selected scene must produce or collect:
1. console log
2. network log or request summary
3. screenshot if browser target page is required
4. exported file if an artifact is produced
5. generation report reference
6. deterministic invocation input used
7. final execution classification
## Failure Taxonomy
Pseudo-production execution results must classify failures as one of:
1. `login-blocked`
2. `network-blocked`
3. `host-bridge-blocked`
4. `local-doc-runtime-blocked`
5. `data-mismatch`
6. `artifact-mismatch`
7. `environment-unavailable`
8. `runtime-error`
## Forbidden Scope
This design does not allow:
1. executing browser automation
2. accessing production credentials
3. accessing real business systems
4. modifying generated skill packages
5. modifying `analyzer.rs`, `generator.rs`, or runtime dispatch
6. updating official board status
7. claiming production pass
## Expected Output
The output is a pseudo-production batch plan asset that names:
1. selected scenes
2. deferred scenes
3. selection reasons
4. execution prerequisites
5. required evidence checklist
6. next execution plan input

View File

@@ -0,0 +1,173 @@
# Scene Skill 102 Static, Mock, And Pseudo-Production Validation Design
> Date: 2026-04-20
> Status: Draft
> Upstream Framework: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Upstream Materialization: `docs/superpowers/plans/2026-04-19-scene-skill-102-final-materialization-plan.md`
> Upstream Invocation Readiness: `docs/superpowers/plans/2026-04-20-scene-skill-102-deterministic-invocation-readiness-plan.md`
## Intent
Define the validation stage after the `102` scene set has reached:
1. `102 / 102` final materialized skill packages
2. `102 / 102` deterministic invocation readiness using the `U+3002 x3` suffix
3. `0` materialization failures
4. `0` deterministic dispatch ambiguities
This design does not extend the framework coverage work. It starts the next stage: proving that the materialized skill assets are structurally healthy, dispatchable, mock-runnable, and ready for a later real-environment validation campaign.
## Current Baseline
Fixed inputs:
1. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
2. `examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md`
3. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
4. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
5. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json`
6. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
Current state:
| Layer | Count |
| --- | ---: |
| materialized skill packages | 102 / 102 |
| deterministic invocation ready | 102 / 102 |
| known materialization failures | 0 |
| deterministic ambiguities | 0 |
## Validation Layers
This validation stage has four layers. Each layer answers a different question.
### Layer 1: Static Package Validation
Question:
Can every materialized skill package be parsed, indexed, and inspected without executing browser or business runtime code?
Checks:
1. required files exist:
- `SKILL.toml`
- `SKILL.md`
- `scene.toml`
- `references/generation-report.json`
- at least one runtime script
2. TOML files parse successfully
3. JSON reports parse successfully
4. `scene.toml` references the expected `sceneId`, tool, suffix, and keyword fields
5. `SKILL.toml` contains stable machine name and human-readable display metadata
6. generated scripts are non-empty and referenced consistently
Output status:
`static-validated` or `static-invalid`
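The file-presence part of check 1 can be sketched with std-only filesystem code. The required paths are the ones listed above; the function names and the simplified status mapping are illustrative, and the TOML/JSON parse checks and the "at least one runtime script" check are omitted for brevity:

```rust
use std::path::Path;

/// Files every materialized skill package must contain (Layer 1, check 1).
const REQUIRED_FILES: &[&str] = &[
    "SKILL.toml",
    "SKILL.md",
    "scene.toml",
    "references/generation-report.json",
];

/// Returns the required files that are missing from a skill package
/// directory; an empty result means the presence check passes.
fn missing_required_files(skill_dir: &Path) -> Vec<String> {
    REQUIRED_FILES
        .iter()
        .filter(|rel| !skill_dir.join(rel).is_file())
        .map(|rel| rel.to_string())
        .collect()
}

/// Collapse the presence check into the Layer 1 output status.
fn static_status(skill_dir: &Path) -> &'static str {
    if missing_required_files(skill_dir).is_empty() {
        "static-validated"
    } else {
        "static-invalid"
    }
}

fn main() {
    // An empty directory fails closed: every required file is reported missing.
    let dir = std::env::temp_dir().join("skill_static_check_demo");
    std::fs::create_dir_all(&dir).unwrap();
    assert_eq!(static_status(&dir), "static-invalid");
    println!("missing: {:?}", missing_required_files(&dir));
}
```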
### Layer 2: Deterministic Invocation Dry-Run
Question:
Can sgClaw select the correct skill for deterministic user input ending in the `U+3002 x3` suffix without using an LLM?
Checks:
1. full scene name plus the `U+3002 x3` suffix resolves to the expected skill
2. index sample utterance resolves to the expected skill
3. duplicate or ambiguous keyword matches are reported
4. scenes with parameter hints are flagged for later parameter validation
This layer must not execute the selected skill. It only validates registry and dispatch behavior.
Output status:
`dispatch-dry-run-pass`, `dispatch-ambiguous`, or `dispatch-no-match`
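A minimal dry-run classifier for these three outcomes, assuming the registry is a flat (keyword, skill-id) list and that dispatch matches via substring containment as the sweep-030 design describes; names and the registry shape are illustrative, not the real dispatch API:

```rust
/// The deterministic invocation suffix: three U+3002 ideographic full stops.
const DETERMINISTIC_SUFFIX: &str = "。。。";

/// Layer-2 style dry-run: classify an instruction against a keyword
/// registry without executing the selected skill.
fn classify_dry_run(instruction: &str, registry: &[(&str, &str)]) -> &'static str {
    // Inputs without the deterministic suffix never enter deterministic dispatch.
    let Some(body) = instruction.strip_suffix(DETERMINISTIC_SUFFIX) else {
        return "dispatch-no-match";
    };
    let hits = registry.iter().filter(|(kw, _)| body.contains(kw)).count();
    match hits {
        0 => "dispatch-no-match",
        1 => "dispatch-dry-run-pass",
        _ => "dispatch-ambiguous",
    }
}

fn main() {
    let registry = [
        ("任务报表", "sweep-015-scene"),
        ("台区线损率统计分析", "sweep-030-scene"),
    ];
    assert_eq!(classify_dry_run("任务报表。。。", &registry), "dispatch-dry-run-pass");
    // Suffix missing: the input is not a deterministic invocation at all.
    assert_eq!(classify_dry_run("任务报表", &registry), "dispatch-no-match");
    // Two keywords hit: the ambiguity must be reported, never guessed away.
    assert_eq!(
        classify_dry_run("任务报表 台区线损率统计分析。。。", &registry),
        "dispatch-ambiguous"
    );
}
```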
### Layer 3: Mock Runtime Validation
Question:
Can representative generated scripts execute their control flow against mocked browser, fetch, host-bridge, and local-doc dependencies?
This layer is not full production validation. It only proves that generated scripts can run through their main control path with controlled fake responses.
Checks:
1. generated script module loads
2. entry function is callable
3. mock request paths are invoked in expected order
4. empty data and basic error data do not crash the script
5. artifact metadata path is produced when the archetype declares exports
Scope:
This layer should begin with archetype representatives, then expand only if the representative harness is stable.
Output status:
`mock-runtime-pass`, `mock-runtime-fail`, or `mock-runtime-not-covered`
### Layer 4: Pseudo-Production Validation Plan
Question:
What must be true before moving from mock validation into real environment validation?
This layer defines the pre-production checklist and evidence bundle. It does not require production credentials or real system access.
Checklist:
1. environment variable and runtime dependency inventory
2. browser or host-bridge dependency declaration
3. expected artifact type per skill
4. required screenshots, logs, HAR files, console logs, or generated artifacts for later real execution
5. pass/fail taxonomy for real-environment results
Output status:
`pseudo-prod-ready`, `pseudo-prod-blocked`, or `real-env-required`
## Non-Goals
This design does not:
1. modify `src/generated_scene/analyzer.rs`
2. modify `src/generated_scene/generator.rs`
3. rematerialize the `102` skill packages
4. update `scene_execution_board_2026-04-18.json`
5. start browser-integrated production execution
6. require live credentials, VPN, SSO, or production network access
7. claim `102 / 102` real-sample executed-pass
## Validation Status Model
Each scene should eventually have independent statuses:
1. `materializationStatus`
2. `deterministicDispatchStatus`
3. `staticValidationStatus`
4. `mockRuntimeStatus`
5. `pseudoProductionReadinessStatus`
6. `realEnvironmentExecutionStatus`
This prevents the project from confusing generated skill availability with production correctness.
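A sketch of that per-scene status record, with one independent field per layer. The field names mirror the list above; the concrete passing-status strings (for example `"materialized"`) are assumptions except where this design names them:

```rust
/// Per-scene status record: one independent field per validation layer, so
/// skill availability is never conflated with production correctness.
#[derive(Debug, Default, Clone)]
struct SceneValidationStatus {
    materialization_status: Option<String>,
    deterministic_dispatch_status: Option<String>,
    static_validation_status: Option<String>,
    mock_runtime_status: Option<String>,
    pseudo_production_readiness_status: Option<String>,
    real_environment_execution_status: Option<String>,
}

/// A scene only qualifies for pseudo-production when every earlier layer
/// has an explicit passing status; later layers never imply earlier ones.
fn is_pseudo_prod_candidate(s: &SceneValidationStatus) -> bool {
    s.materialization_status.as_deref() == Some("materialized")
        && s.deterministic_dispatch_status.as_deref() == Some("dispatch-dry-run-pass")
        && s.static_validation_status.as_deref() == Some("static-validated")
        && s.mock_runtime_status.as_deref() == Some("mock-runtime-pass")
        && s.pseudo_production_readiness_status.as_deref() == Some("pseudo-prod-ready")
}

fn main() {
    let mut s = SceneValidationStatus::default();
    assert!(!is_pseudo_prod_candidate(&s)); // no layer recorded yet
    s.materialization_status = Some("materialized".into());
    s.deterministic_dispatch_status = Some("dispatch-dry-run-pass".into());
    s.static_validation_status = Some("static-validated".into());
    s.mock_runtime_status = Some("mock-runtime-pass".into());
    s.pseudo_production_readiness_status = Some("pseudo-prod-ready".into());
    assert!(is_pseudo_prod_candidate(&s));
    println!("status model check ok");
}
```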
## Expected Deliverables
The implementation plan should produce:
1. static validation result JSON
2. deterministic dry-run validation JSON
3. mock runtime readiness matrix
4. pseudo-production checklist
5. validation report
6. next-stage decision on whether to start real environment validation
## Stop Rules
Stop after publishing validation readiness assets and reports.
Do not proceed into real production execution under this plan.
Do not modify generated framework logic under this plan.

View File

@@ -0,0 +1,37 @@
# Sweep 012 Materialization Recovery Design
> Date: 2026-04-20
> Parent Plan: `2026-04-19-scene-skill-102-final-materialization-plan.md`
## Intent
Recover the single failed final materialization package:
`sweep-012-scene / 业扩报装管理制度`
The official framework board expects this scene to be `paginated_enrichment / A`, but final materialization classified it as `host_bridge_workflow / C` and failed before writing the required skill files.
## Scope
Allowed:
1. Diagnose why this one scene materializes as `host_bridge_workflow`.
2. Apply the smallest bounded correction needed for this scene to materialize.
3. Re-run only `sweep-012-scene` into the existing final materialization root.
4. Refresh final materialization manifest, failures asset, human-readable index, and deterministic readiness for this scene.
5. Publish a recovery report.
Forbidden:
1. Do not rerun all 102 scenes.
2. Do not change other scene packages unless shared metadata refresh requires aggregate indexes.
3. Do not start static, mock, or production validation.
4. Do not update the official execution board.
5. Do not create a new family.
## Completion Criteria
1. `sweep-012-scene` has `SKILL.toml`, `SKILL.md`, `scene.toml`, and at least one script.
2. `sweep-012-scene` deterministic suffix is `。。。`.
3. Full-name deterministic dispatch selects `sweep-012-scene`.
4. Final materialization failure count becomes `0`.

View File

@@ -0,0 +1,64 @@
# Sweep 015 Direct Mock Partial Closure Design
> Date: 2026-04-20
> Parent: `2026-04-20-scene-skill-102-full-direct-mock-execution-plan.md`
> Scope: bounded mock-only closure for `sweep-015-scene`
## Intent
Close the single remaining `direct-mock-partial` from the 102 full direct mock run before starting pseudo-production batch selection.
The only target scene is:
- `sweep-015-scene / 任务报表`
## Problem Statement
The full direct mock execution produced:
- `direct-mock-pass`: 101
- `direct-mock-partial`: 1
The partial scene is `sweep-015-scene`. Its generated script loads and completes the mock runtime path, but returns artifact status `partial` because all mock rows are filtered out by the script-level business filter:
- `FILTER_EXPR = "row.status == 5"`
The full direct mock runner's generic fake row currently does not provide `status = 5`, so the mock data does not satisfy the generated skill's declared business filter.
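The allowed fix is a one-field change to the runner's generic fake row. A hedged sketch of the contract; the field set, value types, and function names are illustrative, not the real runner's code:

```rust
use std::collections::HashMap;

/// Build the generic fake row used by the full direct mock runner.
/// The fix: include `status = 5` so rows survive sweep-015-scene's
/// declared business filter `row.status == 5`.
fn generic_fake_row() -> HashMap<&'static str, i64> {
    let mut row = HashMap::new();
    row.insert("id", 1);
    row.insert("status", 5); // previously absent, so the filter dropped every row
    row
}

/// Stand-in for evaluating the script-level filter expression.
fn passes_status_filter(row: &HashMap<&str, i64>) -> bool {
    row.get("status").copied() == Some(5)
}

fn main() {
    let row = generic_fake_row();
    assert!(passes_status_filter(&row));
    println!("mock row satisfies the FILTER_EXPR contract");
}
```

Because the change lives entirely in the runner's fixture data, the generated skill and its `partial`-returning behavior stay untouched, which keeps the fix inside the allowed scope below.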
## Root Cause Classification
This is a mock fixture contract gap, not a generator or generated-skill defect.
Evidence:
- `sweep-015-scene` generation report has readiness `A`.
- `sweep-015-scene` has complete `paginated_enrichment` workflow evidence.
- The script returns `partial` rather than throwing or failing to load.
- The mock runner's fake row lacks the field required by the script filter.
## Allowed Changes
1. Update the full direct mock runner fake data so the mock row satisfies `sweep-015-scene`'s filter contract.
2. Rerun full direct mock execution.
3. Refresh the full direct mock JSON/report.
4. Publish a closure report.
## Forbidden Changes
1. Do not modify generated skill packages.
2. Do not modify `src/generated_scene/analyzer.rs`.
3. Do not modify `src/generated_scene/generator.rs`.
4. Do not modify `src/generated_scene/ir.rs`.
5. Do not access real browser, real network, production credentials, or business systems.
6. Do not start pseudo-production batch selection under this design.
## Expected Outcome
The full direct mock result should become:
- `direct-mock-pass`: 102
- `direct-mock-partial`: 0
- `direct-mock-fail`: 0
This only proves local mock runtime closure. It does not prove pseudo-production or production execution.

View File

@@ -0,0 +1,86 @@
# Sweep-030 Deterministic Keyword / Alias Normalization Design
## Intent
Provide a bounded fix so `sweep-030-scene` can be deterministically matched from the service console input form used in the inner-network environment, without changing sgClaw runtime, callback-host behavior, or resolver logic.
This design only targets the deterministic manifest surface of:
- `examples/scene_skill_102_final_materialization_2026-04-19/skills/sweep-030-scene/scene.toml`
## Problem
The current `sweep-030-scene` deterministic manifest only exposes one `include_keywords` entry:
- `台区线损大数据-月_周累计线损率统计分析`
But the real operator input uses a more natural phrase:
- `兰州公司 台区线损大数据 月累计线损率统计分析。。。`
Current deterministic dispatch requires `instruction.contains(keyword)`. Because the manifest keyword is too narrow and punctuation-sensitive, the match count stays at `include_hits = 0`, and dispatch returns the unsupported-scene prompt before the skill can be selected.
## Scope
### In Scope
- Update deterministic keyword / alias coverage for `sweep-030-scene`
- Preserve current suffix `。。。`
- Preserve current param declarations (`org`, `period`)
- Publish a route-local verification asset and report
### Out Of Scope
- Any change to `src/compat/scene_platform/dispatch.rs`
- Any change to resolver implementation
- Any change to callback-host, browser runtime, or helper-page lifecycle
- Any change to `bootstrap.target_url`
- Any change to official board or final materialization status
- Any broader `G2` family normalization outside `sweep-030-scene`
## Design
Normalize `sweep-030-scene` deterministic aliases so the scene can be matched by the natural phrases already used in inner-network testing.
The deterministic alias set should cover at least:
- `台区线损大数据-月_周累计线损率统计分析`
- `台区线损大数据 月累计线损率统计分析`
- `台区线损大数据 周累计线损率统计分析`
- `台区线损大数据 月累计`
- `台区线损大数据 周累计`
- `台区线损率统计分析`
The alias set should remain specific enough not to collide with unrelated `G2` scenes.
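Assuming `scene.toml` carries its deterministic surface in a shape like the fragment below (the section and field names are assumptions, not the repository's exact schema), the normalized alias set would be expressed as additional `include_keywords` entries while the suffix stays untouched:

```toml
# Illustrative scene.toml fragment for sweep-030-scene.
# Section/field names are assumed; the existing (org, period) param
# declarations are preserved elsewhere in the file and omitted here.
[deterministic]
suffix = "。。。"
include_keywords = [
    "台区线损大数据-月_周累计线损率统计分析",
    "台区线损大数据 月累计线损率统计分析",
    "台区线损大数据 周累计线损率统计分析",
    "台区线损大数据 月累计",
    "台区线损大数据 周累计",
    "台区线损率统计分析",
]
```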
## Expected Result
This fix should let the following type of input clear deterministic dispatch:
- `兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。`
This design does not claim to fix helper bootstrap or callback-host startup. It only ensures that `sweep-030-scene` is selected first, so the next layer can be tested correctly.
## Allowed Files
- `examples/scene_skill_102_final_materialization_2026-04-19/skills/sweep-030-scene/scene.toml`
- `tests/fixtures/generated_scene/sweep_030_deterministic_keyword_alias_normalization_2026-04-20.json`
- `docs/superpowers/reports/2026-04-20-sweep-030-deterministic-keyword-alias-normalization-report.md`
## Forbidden Files
- `src/compat/scene_platform/dispatch.rs`
- `src/browser/callback_host.rs`
- `src/service/server.rs`
- `src/generated_scene/*`
- `resources/rules.json`
## Stop Rule
Stop after:
1. `sweep-030-scene` deterministic aliases are normalized
2. A route-local verification record is written
3. A report is published
Do not proceed into helper-page / request-URL debugging within this plan.

View File

@@ -0,0 +1,56 @@
# Generated Scene Local-Doc Pipeline Residual Closure Design
Date: 2026-04-21
Parent status source:
- `docs/superpowers/reports/2026-04-21-generated-scene-runtime-semantics-validation-refresh-execution-report.md`
## Problem Statement
After runtime-semantics hardening, rematerialization rerun, and validation refresh rerun, the remaining unresolved generated-scene residuals are narrowed to `6` `local_doc_pipeline` scenes:
1. `sweep-025-scene`
2. `sweep-047-scene`
3. `sweep-050-scene`
4. `sweep-052-scene`
5. `sweep-062-scene`
6. `sweep-087-scene`
All six currently fail closed during generation because their workflow evidence is still considered incomplete for the `local_doc_pipeline` archetype.
## Goal
Create a bounded route that inspects only these six residual scenes, identifies the missing reusable workflow-evidence patterns, and closes them at generator/analyzer rule level so that the next rematerialization rerun can reduce or eliminate the remaining `local_doc_pipeline` failures.
## Source-First Principle
This route remains source-first:
1. inspect the six original source scene directories
2. identify reusable evidence shapes that should count as valid `local_doc_pipeline` workflow evidence
3. encode only reusable evidence recovery, not one-off generated-output patching
## Closure Target
The route succeeds if it publishes a reusable first slice that makes the missing workflow evidence recognizable for the bounded six-scene bucket.
The route does not itself claim final closure of all six scenes; final closure is proven only after downstream rematerialization rerun and validation refresh rerun.
## Expected Reusable Focus
The likely reusable closure surface is one or more of:
1. doc-export evidence variants not currently recognized
2. local report-log / staging-file workflow shapes not currently recognized
3. query-leg + doc-export combinations that should count as `local_doc_pipeline`
4. evidence recovery from source-side helper scripts or embedded config files
## Outputs
This design leads to a bounded implementation plan that:
1. fixes only generator/analyzer rule recovery for the six-scene bucket
2. publishes route-local followup JSON
3. publishes route-local report
4. does not rerun rematerialization or validation refresh inside the route

View File

@@ -0,0 +1,127 @@
# Generated Scene Runtime Semantics Offline Validation Bundle Refresh Design
Date: 2026-04-21
## Context
The runtime-semantics hardening rerun has produced a refreshed 102-scene bundle:
- `examples/scene_skill_102_runtime_semantics_rematerialization_2026-04-21`
The validation refresh confirms:
- `materialized = 102 / 102`
- `deterministicReady = 102 / 102`
- `staticPass = 102 / 102`
- `directMockPass = 102 / 102`
- `pseudoProdSelected = 7`
The previous offline validation bundle under `dist/sgclaw_102_pseudoprod_validation_bundle_2026-04-20` is now stale because it was based on the pre-runtime-semantics materialization.
## Problem
Pseudo-production validation must not continue from the stale 2026-04-20 package. The inner-network operator needs a refreshed offline bundle that carries the new canonical 2026-04-21 runtime-semantics skills and the refreshed pseudo-production handoff assets.
## Goal
Create a bounded plan for refreshing a portable offline validation bundle for the 2026-04-21 runtime-semantics skill set.
The bundle must be suitable for copying to an inner-network machine that does not have `cargo`, Rust sources, or repository test infrastructure.
## Target Bundle
Target directory:
- `dist/sgclaw_102_runtime_semantics_validation_bundle_2026-04-21`
Required shape:
```text
dist/sgclaw_102_runtime_semantics_validation_bundle_2026-04-21/
sg_claw.exe
skills/
README.md
BATCH_001.md
BUNDLE_MANIFEST.json
docs/
SCENE_INDEX.md
scene_skill_102_index.json
handoff/
scene_skill_102_runtime_semantics_pseudoprod_execution_handoff_2026-04-21.json
scene_skill_102_runtime_semantics_pseudoprod_evidence_checklist_2026-04-21.json
scene_skill_102_runtime_semantics_pseudoprod_execution_record_template_2026-04-21.json
scene_skill_102_runtime_semantics_deterministic_invocation_readiness_2026-04-21.json
scene_skill_102_runtime_semantics_natural_language_parameter_readiness_2026-04-21.json
scene_skill_102_runtime_semantics_natural_language_invocation_samples_2026-04-21.json
scene_skill_102_runtime_semantics_full_direct_mock_execution_2026-04-21.json
resources/
rules-102-business-targets-candidate.json
rules-102-business-targets-merged.json
rules-102-business-targets.patch
results/
evidence/
```
## Fixed Inputs
1. `examples/scene_skill_102_runtime_semantics_rematerialization_2026-04-21`
2. `tests/fixtures/generated_scene/generated_scene_runtime_semantics_rematerialization_manifest_2026-04-21.json`
3. `tests/fixtures/generated_scene/generated_scene_runtime_semantics_rematerialization_failures_2026-04-21.json`
4. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_deterministic_invocation_readiness_2026-04-21.json`
5. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_natural_language_parameter_readiness_2026-04-21.json`
6. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_natural_language_invocation_samples_2026-04-21.json`
7. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_pseudoprod_execution_handoff_2026-04-21.json`
8. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_pseudoprod_evidence_checklist_2026-04-21.json`
9. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_pseudoprod_execution_record_template_2026-04-21.json`
10. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_full_direct_mock_execution_2026-04-21.json`
11. Current locally built `sg_claw.exe`
12. Optional rule assets under `resources/`
## Required Bundle Semantics
1. The `skills/` directory must be copied from the 2026-04-21 runtime-semantics rematerialization output.
2. The bundle must not reuse skills from `scene_skill_102_final_materialization_2026-04-19`.
3. The bundle must not reuse the old 2026-04-20 offline package as the source of truth.
4. The first batch must be derived from the refreshed 2026-04-21 pseudo-production handoff.
5. `BATCH_001.md` must show the exact natural-language input and page URL fields expected by `sg_claw_service_console.html`.
6. The README must explain that the skills directory is configured with the exact JSON field name `skillsDir`.
7. The README must state that credentials, cookies, tokens, VPN secrets, and private keys must not be stored in the bundle.
## Scope
Allowed:
1. Create a new bundle directory under `dist/`.
2. Copy the refreshed 102 skill packages into the bundle.
3. Copy refreshed handoff/readiness/mock assets into the bundle.
4. Copy rules candidate assets into the bundle when present.
5. Write bundle README, batch instructions, manifest, and empty evidence/results directories.
6. Write a bundle refresh report.
Forbidden:
1. No `src/` changes.
2. No generated skill edits.
3. No rematerialization rerun.
4. No validation refresh rerun.
5. No pseudo-production execution.
6. No production browser, production network, or credentials.
7. No official board updates.
8. No deletion of the old 2026-04-20 bundle.
## Validation
The bundle refresh execution must verify:
1. `skills/` contains exactly 102 skill directories.
2. Every skill directory contains required package files.
3. Copied `scene.toml` files parse cleanly enough to support structural presence checks.
4. `BUNDLE_MANIFEST.json` is valid JSON.
5. Copied handoff JSON assets are valid JSON.
6. `BATCH_001.md` contains exactly the selected pseudo-production batch entries from the refreshed handoff.
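Checks 1 and 4 can be sketched with std-only filesystem code. The layout matches the bundle shape above; function names and the throwaway demo paths are illustrative, and the JSON check is kept structural rather than a full parse:

```rust
use std::fs;
use std::path::Path;

/// Check 1: count the skill directories under `skills/`; the refreshed
/// bundle must contain exactly 102.
fn count_skill_dirs(bundle_root: &Path) -> std::io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(bundle_root.join("skills"))? {
        if entry?.file_type()?.is_dir() {
            count += 1;
        }
    }
    Ok(count)
}

/// Check 4 (simplified): `BUNDLE_MANIFEST.json` must at least read as a
/// JSON document. A real check would fully parse it; this std-only sketch
/// only verifies the structural envelope.
fn manifest_looks_like_json(bundle_root: &Path) -> std::io::Result<bool> {
    let text = fs::read_to_string(bundle_root.join("BUNDLE_MANIFEST.json"))?;
    let trimmed = text.trim_start();
    Ok(trimmed.starts_with('{') || trimmed.starts_with('['))
}

fn main() -> std::io::Result<()> {
    // Demo against a throwaway layout in the temp dir, not a real bundle.
    let root = std::env::temp_dir().join("bundle_refresh_demo");
    fs::create_dir_all(root.join("skills/sweep-001-scene"))?;
    fs::write(root.join("BUNDLE_MANIFEST.json"), "{\"skills\": 102}")?;
    assert!(count_skill_dirs(&root)? >= 1);
    assert!(manifest_looks_like_json(&root)?);
    println!("bundle structural checks ok");
    Ok(())
}
```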
## Stop Statement
Stop after the refreshed offline bundle and bundle refresh report are published.
Do not execute inner-network validation inside this route.

View File

@@ -0,0 +1,63 @@
# Generated Scene Runtime Semantics Post-Refresh Residual Closure Design
Date: 2026-04-21
Parent execution:
- `docs/superpowers/plans/2026-04-21-generated-scene-runtime-semantics-validation-refresh-execution-plan.md`
## Intent
Close the post-refresh residuals exposed by the 2026-04-21 validation refresh before any pseudo-production reuse is attempted.
## Residual Scope
This residual closure is strictly limited to the two regressions exposed by validation refresh:
1. rematerialized `scene.toml` deterministic suffix regression
2. `sweep-078-scene` TOML generation corruption
## Why A Separate Residual Stage Is Required
The validation refresh proved that the hardened bundle is not yet a stable canonical bundle for downstream execution:
1. `95` rematerialized scenes regressed from `suffix = "。。。"` to scene-name suffixes
2. `sweep-078-scene` emitted invalid TOML
These are generator / serialization residuals, not pseudo-production or runtime-environment residuals.
## Fixed Inputs
1. `tests/fixtures/generated_scene/generated_scene_runtime_semantics_rematerialization_manifest_2026-04-21.json`
2. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_deterministic_invocation_readiness_2026-04-21.json`
3. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_static_validation_2026-04-21.json`
4. `docs/superpowers/reports/2026-04-21-generated-scene-runtime-semantics-validation-refresh-execution-report.md`
## Required Closure Targets
### Target A
Restore deterministic suffix generation semantics for rematerialized `scene.toml` outputs so deterministic validation no longer collapses to `0 / 102`.
### Target B
Identify and close the serialization path that emitted invalid TOML in `sweep-078-scene`.
## Non-Goals
This residual closure does not include:
1. pseudo-production execution
2. rematerialization of `local_doc_pipeline` residual scenes
3. runtime callback-host / helper debugging
4. service-console changes
5. new hardening routes outside the two residuals above
## Expected Downstream
After this residual closure, the next step should be:
1. rerun rematerialization execution
2. rerun validation refresh
Only then should pseudo-production selection be reconsidered.

View File

@@ -0,0 +1,55 @@
# Generated Scene Runtime Semantics Rematerialization Execution Design
Date: 2026-04-21
Parent dependency plan:
- `docs/superpowers/plans/2026-04-20-generated-scene-runtime-semantics-rematerialization-refresh-plan.md`
## Purpose
Execute the rematerialization refresh required after the source-first runtime semantics hardening routes.
This design exists because the parent dependency plan explicitly forbids executing rematerialization inside itself.
## Fixed Inputs
1. Hardened generator implementation after these completed routes:
- `resolver_request_mapping_hardening`
- `runtime_url_classification_hardening`
- `embedded_dictionary_extraction_hardening`
- `parameter_default_semantics_recovery_hardening`
- `alias_generation_hardening`
2. Current source mapping from `generated_scene_source_evidence_cross_scan_2026-04-20.json`.
3. Current final materialization directory:
- `examples/scene_skill_102_final_materialization_2026-04-19`
## Execution Target
Create a refreshed final materialization directory using the hardened generator rules.
Recommended output root:
- `examples/scene_skill_102_runtime_semantics_rematerialization_2026-04-21`
## Required Outputs
1. Refreshed 102-skill materialization directory.
2. Refreshed materialization manifest.
3. Refreshed failures asset.
4. Refreshed human-readable scene index.
5. Rematerialization execution report.
## Guardrails
1. Do not manually edit generated skill files after generation.
2. Do not update official execution board.
3. Do not run production, browser, or intranet execution.
4. Do not run validation refresh in this plan.
5. Preserve old final materialization output as an audit artifact.
## Stop Statement
Stop after refreshed materialization assets and report are published.
Do not execute validation refresh inside this plan.

View File

@@ -0,0 +1,72 @@
# Generated Scene Runtime Semantics Validation Refresh Execution Design
Date: 2026-04-21
Parent dependency plan:
- `docs/superpowers/plans/2026-04-20-generated-scene-runtime-semantics-validation-refresh-plan.md`
Parent rematerialization execution:
- `docs/superpowers/plans/2026-04-21-generated-scene-runtime-semantics-rematerialization-execution-plan.md`
## Intent
Execute the full validation refresh required after the hardened runtime-semantics rematerialization.
## Why A Child Execution Plan Is Required
The parent validation refresh plan is dependency-only and explicitly forbids executing validation refresh inside that plan. A separate execution plan is required to:
1. consume the refreshed canonical 102-skill bundle
2. regenerate validation-layer assets against the refreshed bundle
3. keep validation evidence separate from the old pre-hardening assets
## Fixed Validation Layers
The execution must refresh these layers in order:
1. deterministic invocation readiness
2. natural-language parameter readiness
3. static validation
4. direct mock execution
5. pseudo-production handoff assets
## Canonical Input Bundle
All validation in this execution must consume:
- `examples/scene_skill_102_runtime_semantics_rematerialization_2026-04-21`
Old validation assets from the pre-hardening bundle may be read for comparison only, but may not be reused as proof.
## Required Outputs
1. refreshed deterministic invocation readiness asset
2. refreshed natural-language parameter readiness asset
3. refreshed static validation asset
4. refreshed direct mock execution asset
5. refreshed pseudo-production handoff assets
6. one aggregated validation refresh report
## Boundary
This execution may:
1. read the refreshed runtime-semantics bundle
2. regenerate validation JSON / markdown assets under `tests/fixtures/generated_scene/` and `docs/superpowers/reports/`
3. regenerate validation helper directories under `examples/` when the existing validation flow requires route-local outputs
This execution may not:
1. modify `src/`
2. modify the refreshed generated skills
3. rerun rematerialization
4. update the official board
5. start real browser / pseudo-production execution
## Success Criteria
1. validation refresh assets are published for every required layer
2. all refreshed assets explicitly reference the 2026-04-21 rematerialized bundle
3. residual validation blockers are recorded explicitly rather than masked by silently reusing stale evidence