feat: add generated scene skill platform hardening

木炎
2026-04-21 23:19:06 +08:00
parent 118fc77935
commit 956f0c2b68
439 changed files with 61974 additions and 3645 deletions


@@ -0,0 +1,418 @@
# Request URL Resolution Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Replace the temporary line-loss request URL hardcode in `src/service/server.rs` with a unified bootstrap-target resolver that prefers current page context, then deterministic submit plans, then skill metadata, and finally `about:blank`.
**Architecture:** Add a small service-owned resolver that returns a narrow `SubmitBootstrapTarget` result and centralizes precedence rules. Reuse `DeterministicExecutionPlan.target_url` as the authoritative source for deterministic line-loss scenes, then add minimal skill metadata fallback for configured direct browser-script skills, while keeping callback-host behavior unchanged.
**Tech Stack:** Rust, serde/serde_json, tungstenite, zeroclaw skill loader, staged `SKILL.toml` manifests, cargo test
---
### Task 1: Add resolver-focused red tests for precedence
**Files:**
- Modify: `src/service/server.rs:422-467`
- Test: `src/service/server.rs` (crate-local resolver tests)
- Test: `tests/service_ws_session_test.rs`
- [ ] **Step 1: Write the failing page-context precedence test**
In a crate-local unit test inside `src/service/server.rs`, add a focused resolver test that exercises the request-url resolver with:
- non-empty `page_url = "https://already-open.example.com/page"`
- an instruction that would otherwise match deterministic line-loss logic
- configured direct skill metadata present
Assert the resolved bootstrap target uses the explicit non-empty `page_url` and reports `PageContext` source.
- [ ] **Step 2: Run the test to verify it fails**
Run: `cargo test page_context_bootstrap_target_wins_over_deterministic_and_skill_fallback --lib -- --nocapture`
Expected: FAIL because no unified resolver/source enum exists yet.
- [ ] **Step 3: Write the failing deterministic-precedence test**
In `src/service/server.rs` crate-local tests, add a focused test for a deterministic line-loss instruction with no `page_url`.
Use the same instruction shape already accepted by `decide_deterministic_submit(...)`, and assert:
- resolver source is `DeterministicPlan`
- resolved `request_url` equals `DeterministicExecutionPlan.target_url`
- no raw `instruction.contains("线损")` fallback is needed
- [ ] **Step 4: Run the test to verify it fails**
Run: `cargo test deterministic_bootstrap_target_uses_plan_target_url --lib -- --nocapture`
Expected: FAIL because service still uses `derive_request_url_from_instruction(...)`.
- [ ] **Step 5: Write the failing skill-fallback test**
In `src/service/server.rs` crate-local tests, add a focused test for:
- no `page_url`
- instruction not deterministic
- configured direct-submit skill metadata provides `bootstrap_url`
Assert resolver source is `SkillConfig` and `request_url` matches metadata.
- [ ] **Step 6: Run the test to verify it fails**
Run: `cargo test skill_metadata_bootstrap_url_is_used_when_no_page_context_or_plan_exists --lib -- --nocapture`
Expected: FAIL because skill metadata is not read today.
- [ ] **Step 7: Write the failing malformed-metadata fallback test**
In `src/service/server.rs` crate-local tests, add a focused test for malformed `bootstrap_url` metadata, with no page context and no deterministic plan.
Assert the resolver:
- ignores malformed metadata
- returns `Fallback`
- resolves to `about:blank`
- [ ] **Step 8: Run the test to verify it fails**
Run: `cargo test malformed_skill_bootstrap_url_falls_back_to_about_blank --lib -- --nocapture`
Expected: FAIL because malformed metadata is not handled by a resolver yet.
---
### Task 2: Introduce the bootstrap-target resolver in service code
**Files:**
- Modify: `src/service/server.rs:280-467`
- Modify: `src/service/mod.rs:17-22`
- Test: `src/service/server.rs` (crate-local resolver tests)
- [ ] **Step 1: Add the narrow resolver types in service code**
In `src/service/server.rs`, add:
```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub(crate) struct SubmitBootstrapTarget {
    pub request_url: String,
    pub expected_domain: Option<String>,
    pub source: BootstrapTargetSource,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) enum BootstrapTargetSource {
    PageContext,
    DeterministicPlan,
    SkillConfig,
    Fallback,
}
```
Keep them scoped to service code. Do not create a generic cross-runtime planning object.
- [ ] **Step 2: Add a minimal resolver entry point**
Implement a service-owned function in `src/service/server.rs`, conceptually:
```rust
pub(crate) fn resolve_submit_bootstrap_target(
    request: &crate::agent::SubmitTaskRequest,
    workspace_root: &Path,
    settings: &SgClawSettings,
) -> SubmitBootstrapTarget
```
Initial behavior for this step:
- return `PageContext` only when `request.page_url` exists and is non-empty after trimming
- add a crate-local regression that empty/whitespace `page_url` does not short-circuit later precedence tiers
- otherwise fall through to existing behavior temporarily so the new tests can compile incrementally
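The non-empty check described above can be sketched as a small helper. This is a minimal illustration under assumptions: the name `non_empty_page_url` is hypothetical, and returning the trimmed value (rather than the raw one) is a choice the plan does not prescribe.

```rust
/// Hypothetical helper (name not from the plan): yields the page URL only
/// when it is present and non-empty after trimming, so blank or
/// whitespace-only values fall through to the later precedence tiers.
fn non_empty_page_url(page_url: Option<&str>) -> Option<&str> {
    page_url.map(str::trim).filter(|url| !url.is_empty())
}
```

A resolver could call this first and return a `PageContext` target only when it yields `Some(..)`.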
- [ ] **Step 3: Update service startup to call the resolver**
At the callback-host startup call site in `serve_client(...)`, replace:
```rust
let bootstrap_url = initial_request_url_for_submit_task(&request);
```
with resolver usage:
```rust
let bootstrap_target = resolve_submit_bootstrap_target(&request, context.workspace_root(), &settings);
let bootstrap_url = bootstrap_target.request_url;
```
Use the current settings-loading seam already used elsewhere in service code. Keep callback-host startup behavior otherwise unchanged.
- [ ] **Step 4: Keep resolver visibility crate-local**
Do not make the resolver types broadly public for integration tests. Keep the resolver and `BootstrapTargetSource` crate-local, and keep source-level assertions in `src/service/server.rs` unit tests.
Keep re-exporting the existing `initial_request_url_for_submit_task(...)` seam through `src/service/mod.rs` only if production callers still require that wiring; otherwise remove it.
- [ ] **Step 5: Run the first precedence test to verify it passes**
Run: `cargo test page_context_bootstrap_target_wins_over_deterministic_and_skill_fallback --lib -- --nocapture`
Expected: PASS.
- [ ] **Step 6: Commit**
```bash
git add src/service/server.rs src/service/mod.rs
git commit -m "refactor(service): add submit bootstrap target resolver scaffold"
```
---
### Task 3: Make deterministic submit the authoritative source for line-loss bootstrap URLs
**Files:**
- Modify: `src/service/server.rs:422-467`
- Modify: `src/compat/deterministic_submit.rs:13-101`
- Test: `src/service/server.rs` (crate-local resolver tests)
- Test: `tests/service_ws_session_test.rs`
- [ ] **Step 1: Write a small service-side seam for deterministic resolution**
In `src/service/server.rs`, update the resolver so that when `page_url` is absent it calls:
```rust
crate::compat::deterministic_submit::decide_deterministic_submit(
    &request.instruction,
    request.page_url.as_deref(),
    request.page_title.as_deref(),
)
```
Only `DeterministicSubmitDecision::Execute(plan)` should produce a deterministic bootstrap target.
Treat `NotDeterministic` and `Prompt { .. }` as “no deterministic bootstrap target” for service startup.
- [ ] **Step 2: Use `plan.target_url` directly**
Map `DeterministicSubmitDecision::Execute(plan)` to:
- `request_url = plan.target_url.clone()`
- `expected_domain = Some(plan.expected_domain.clone())`
- `source = BootstrapTargetSource::DeterministicPlan`
Do not reconstruct the URL in `server.rs`.
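The mapping in Steps 1 and 2 might look roughly like the sketch below. The `DeterministicExecutionPlan` and `DeterministicSubmitDecision` definitions here are stand-ins written only for this example: the real types live in `src/compat/deterministic_submit.rs`, and the `message` field on `Prompt` and the helper name `deterministic_bootstrap_target` are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BootstrapTargetSource {
    PageContext,
    DeterministicPlan,
    SkillConfig,
    Fallback,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct SubmitBootstrapTarget {
    request_url: String,
    expected_domain: Option<String>,
    source: BootstrapTargetSource,
}

// Stand-in for the real plan type; only the two fields used here are modeled.
struct DeterministicExecutionPlan {
    target_url: String,
    expected_domain: String,
}

// Stand-in for the real decision enum; the `message` field is hypothetical.
enum DeterministicSubmitDecision {
    NotDeterministic,
    Prompt { message: String },
    Execute(DeterministicExecutionPlan),
}

/// Only `Execute(plan)` yields a bootstrap target; the URL is copied from
/// the plan as-is and never rebuilt from instruction text.
fn deterministic_bootstrap_target(
    decision: &DeterministicSubmitDecision,
) -> Option<SubmitBootstrapTarget> {
    match decision {
        DeterministicSubmitDecision::Execute(plan) => Some(SubmitBootstrapTarget {
            request_url: plan.target_url.clone(),
            expected_domain: Some(plan.expected_domain.clone()),
            source: BootstrapTargetSource::DeterministicPlan,
        }),
        DeterministicSubmitDecision::NotDeterministic
        | DeterministicSubmitDecision::Prompt { .. } => None,
    }
}
```

Returning `Option` keeps the "no deterministic bootstrap target" cases uniform, so the resolver can simply continue to the next tier.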
- [ ] **Step 3: Remove the temporary line-loss hardcode**
Delete this branch from `derive_request_url_from_instruction(...)` or remove the function entirely if it is no longer needed:
```rust
if instruction.contains("线损") || instruction.contains("lineloss") {
    return Some("http://20.76.57.61:18080".to_string());
}
```
Keep any still-needed legacy Zhihu fallback only if the resolver still requires it after deterministic integration.
- [ ] **Step 4: Add/adjust a deterministic regression test**
In `src/service/server.rs` crate-local tests, add a focused assertion that line-loss bootstrap URL now comes from `DeterministicExecutionPlan.target_url`, not raw text matching.
A good assertion shape is:
- call resolver with deterministic line-loss instruction
- assert `request_url == "http://20.76.57.61:18080/gsllys/tqLinelossStatis/tqQualifyRateMonitor"`
- assert `source == DeterministicPlan`
- [ ] **Step 5: Run deterministic tests to verify they pass**
Run: `cargo test deterministic_bootstrap_target_uses_plan_target_url --lib -- --nocapture`
Expected: PASS.
- [ ] **Step 6: Run service websocket coverage for the same precedence**
Run: `cargo test callback_host --test service_ws_session_test -- --nocapture`
Expected: PASS with no line-loss hardcode dependency.
- [ ] **Step 7: Commit**
```bash
git add src/service/server.rs src/compat/deterministic_submit.rs tests/service_ws_session_test.rs
git commit -m "refactor(service): derive line-loss bootstrap URL from deterministic plan"
```
---
### Task 4: Add skill-metadata fallback for configured direct-submit skills
**Files:**
- Modify: `src/compat/direct_skill_runtime.rs:114-153`
- Modify: `src/service/server.rs:422-467`
- Optionally modify: `src/config/settings.rs` only if a tiny metadata pointer is required
- Modify: `D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging/skills/fault-details-report/SKILL.toml`
- Optionally modify: `D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging/skills/95598-weekly-monitor-report/SKILL.toml`
- Test: `src/service/server.rs` (crate-local resolver tests)
- Test: `tests/service_ws_session_test.rs`
- [ ] **Step 1: Define the minimal skill metadata shape**
Extend staged `SKILL.toml` parsing expectations to support a narrow metadata seam for browser-script direct skills.
The plan target fields are:
- `bootstrap_url`
- `expected_domain`
Keep the metadata minimal. Do not add a broad dispatch registry or scene-policy schema.
Recommended TOML shape in the skill manifest:
```toml
[tools.metadata]
bootstrap_url = "https://example.com/path"
expected_domain = "example.com"
```
If the actual skill loader only supports per-tool custom fields in another location, use that established seam instead. Do not invent a parallel config file.
- [ ] **Step 2: Add a helper that reads fallback metadata for the configured direct skill**
In `src/compat/direct_skill_runtime.rs`, add a helper like:
```rust
pub(crate) fn resolve_direct_submit_bootstrap_metadata(
    configured_tool: &str,
    workspace_root: &Path,
    settings: &SgClawSettings,
) -> Result<Option<DirectSubmitBootstrapMetadata>, PipeError>
```
Recommended shape:
```rust
pub(crate) struct DirectSubmitBootstrapMetadata {
    pub bootstrap_url: String,
    pub expected_domain: Option<String>,
}
```
Reuse the existing `resolve_browser_script_skill(...)` lookup path so the service resolver does not duplicate staged-skill discovery logic.
- [ ] **Step 3: Validate metadata conservatively**
When reading fallback metadata:
- accept only non-empty `bootstrap_url`
- require it to parse as a valid absolute URL
- normalize or preserve `expected_domain` only if non-empty
- on malformed metadata, return `Ok(None)` for resolver purposes instead of failing service startup
This keeps malformed fallback data from breaking submits and matches the approved spec.
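A conservative validator along these lines could be sketched as follows. It is std-only, so `looks_like_absolute_url` is a deliberately rough stand-in for a real URL parser, and both function names are hypothetical. The sketch returns `Option`; the helper in the plan would wrap the result as `Ok(None)` or `Ok(Some(..))`.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct DirectSubmitBootstrapMetadata {
    bootstrap_url: String,
    expected_domain: Option<String>,
}

/// Rough absolute-URL check (assumption: `<scheme>://<non-empty host>` with
/// no whitespace). A real implementation would likely use a URL parser.
fn looks_like_absolute_url(candidate: &str) -> bool {
    let Some((scheme, rest)) = candidate.split_once("://") else {
        return false;
    };
    let scheme_ok = scheme.chars().next().map_or(false, |c| c.is_ascii_alphabetic())
        && scheme
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
    let host = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");
    scheme_ok && !host.is_empty() && !candidate.chars().any(char::is_whitespace)
}

/// Malformed or empty metadata yields `None` instead of an error, so the
/// resolver can fall through to `about:blank`.
fn validate_bootstrap_metadata(
    bootstrap_url: Option<&str>,
    expected_domain: Option<&str>,
) -> Option<DirectSubmitBootstrapMetadata> {
    let url = bootstrap_url.map(str::trim).filter(|u| !u.is_empty())?;
    if !looks_like_absolute_url(url) {
        return None;
    }
    // Keep the expected domain only when it is non-empty after trimming.
    let expected_domain = expected_domain
        .map(str::trim)
        .filter(|d| !d.is_empty())
        .map(str::to_string);
    Some(DirectSubmitBootstrapMetadata {
        bootstrap_url: url.to_string(),
        expected_domain,
    })
}
```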
- [ ] **Step 4: Wire skill metadata into the service resolver**
Update `resolve_submit_bootstrap_target(...)` to:
- check skill metadata only after page context and deterministic parsing fail
- use `SkillConfig` as the source when metadata resolves
- fall through to `about:blank` when metadata is missing or malformed
- [ ] **Step 5: Add a staged-skill fixture update**
Update at least one configured direct skill fixture, likely `fault-details-report`, to include valid fallback metadata.
Use concrete values appropriate for that skill's target page; do not reuse the line-loss URL.
- [ ] **Step 6: Run the skill-fallback test to verify it passes**
Run: `cargo test skill_metadata_bootstrap_url_is_used_when_no_page_context_or_plan_exists --lib -- --nocapture`
Expected: PASS.
- [ ] **Step 7: Run the malformed-metadata test to verify it passes**
Run: `cargo test malformed_skill_bootstrap_url_falls_back_to_about_blank --lib -- --nocapture`
Expected: PASS.
- [ ] **Step 8: Commit**
```bash
git add src/compat/direct_skill_runtime.rs src/service/server.rs D:/data/ideaSpace/rust/sgClaw/claw/claw/skills/skill_staging/skills/fault-details-report/SKILL.toml tests/service_ws_session_test.rs
git commit -m "feat(service): add direct skill bootstrap URL fallback metadata"
```
---
### Task 5: Remove obsolete request-url glue and lock the final precedence contract
**Files:**
- Modify: `src/service/server.rs:422-467`
- Modify: `src/service/mod.rs:20-22`
- Test: `src/service/server.rs` (crate-local resolver tests)
- Test: `tests/service_ws_session_test.rs`
- [ ] **Step 1: Delete obsolete helper logic**
If `derive_request_url_from_instruction(...)` is no longer needed after resolver landing, delete it completely.
If a tiny legacy Zhihu-only seam still remains, keep it private behind the resolver and remove the old public shape from `service::browser_ws_client` if no longer needed.
- [ ] **Step 2: Lock the precedence contract with one final matrix test**
In `src/service/server.rs` crate-local tests, add one table-driven or clearly segmented test that verifies all four final outcomes:
- non-empty page context wins
- deterministic plan wins when page context is absent or empty
- skill metadata wins when page context and deterministic plan are absent
- fallback becomes `about:blank` when nothing resolves
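The four outcomes above can be illustrated with a reduced model of the resolver that takes already-resolved candidates instead of a request. `resolve_precedence` and `present` are hypothetical names, and the real `resolve_submit_bootstrap_target(...)` works on `SubmitTaskRequest`, the workspace root, and settings rather than on plain strings.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BootstrapTargetSource {
    PageContext,
    DeterministicPlan,
    SkillConfig,
    Fallback,
}

/// Treats blank or whitespace-only candidates as absent.
fn present(candidate: Option<&str>) -> Option<&str> {
    candidate.map(str::trim).filter(|value| !value.is_empty())
}

/// Reduced precedence model: page context, then deterministic plan, then
/// skill metadata, and finally `about:blank`.
fn resolve_precedence(
    page_url: Option<&str>,
    plan_target_url: Option<&str>,
    skill_bootstrap_url: Option<&str>,
) -> (String, BootstrapTargetSource) {
    let tiers = [
        (present(page_url), BootstrapTargetSource::PageContext),
        (present(plan_target_url), BootstrapTargetSource::DeterministicPlan),
        (present(skill_bootstrap_url), BootstrapTargetSource::SkillConfig),
    ];
    for &(candidate, source) in tiers.iter() {
        if let Some(url) = candidate {
            return (url.to_string(), source);
        }
    }
    ("about:blank".to_string(), BootstrapTargetSource::Fallback)
}
```

A table-driven test can then list one row per tier and assert both the URL and the source, which keeps the precedence contract readable in a single place.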
- [ ] **Step 3: Run the focused resolver suite**
Run: `cargo test bootstrap_target --lib -- --nocapture`
Expected: PASS.
- [ ] **Step 4: Run service websocket regression coverage**
Run: `cargo test callback_host --test service_ws_session_test -- --nocapture`
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add src/service/server.rs src/service/mod.rs tests/service_ws_session_test.rs
git commit -m "refactor(service): finalize bootstrap target precedence"
```
---
### Task 6: Full verification and implementation handoff check
**Files:** None (verification only)
- [ ] **Step 1: Run focused deterministic and direct-skill tests**
Run: `cargo test deterministic_submit -- --nocapture`
Expected: PASS.
Run: `cargo test direct_submit -- --nocapture`
Expected: PASS.
- [ ] **Step 2: Run service submit regression coverage**
Run: `cargo test --test service_task_flow_test -- --nocapture`
Expected: PASS.
Run: `cargo test --test service_ws_session_test -- --nocapture`
Expected: PASS.
- [ ] **Step 3: Run targeted config/settings coverage if touched**
Run: `cargo test service_protocol_update_config_test -- --nocapture`
Expected: PASS.
- [ ] **Step 4: Build the project**
Run: `cargo build --bin sg_claw`
Expected: PASS.
- [ ] **Step 5: Manual behavior checklist**
Verify manually:
1. Existing page-attached submits still bootstrap against the current page URL.
2. A deterministic line-loss submit without page context boots the helper against the line-loss target page from `DeterministicExecutionPlan.target_url`.
3. Non-deterministic configured direct skill without page context uses skill metadata bootstrap URL if present.
4. Missing or malformed skill metadata does not crash startup and falls back to `about:blank`.
5. No service code remains that hardcodes line-loss request URL by checking raw instruction text.
- [ ] **Step 6: Final commit (only if verification revealed required follow-up fixes)**
```bash
git add -A
git commit -m "test: lock request URL resolution precedence"
```
Only create this commit if verification required an additional code or test fix.


@@ -0,0 +1,441 @@
# Generated Scene Rectification Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Rectify the generated-scene pipeline so it stops emitting false-positive runnable skills for complex internal scenes, specifically by fixing `sceneId` degeneration, bootstrap pollution, incomplete workflow reconstruction, and readiness fail-open behavior.
**Architecture:** Keep the current `Scene IR` pipeline, but add four hard control chains around it: naming validation, bootstrap evidence stratification, workflow evidence reconstruction, and readiness gating. Generation must fail closed whenever these chains are incomplete.
**Tech Stack:** Rust, Node.js, HTML/CSS/JavaScript, serde_json, OpenAI-compatible LLM API
---
## Scope Check
This plan implements the design in:
- `docs/superpowers/specs/2026-04-17-generated-scene-rectification-design.md`
This plan builds on the existing generated-scene foundation already described in:
- `docs/superpowers/specs/2026-04-17-scene-skill-compiler-design.md`
- `docs/superpowers/specs/2026-04-17-llm-driven-skill-generation-design.md`
- `docs/superpowers/specs/2026-04-17-enhanced-llm-extraction-schema-design.md`
This plan does not attempt to solve:
- login or authentication recovery
- Chromium host integration or browser embedding changes
- full runtime resolver expansion beyond what this rectification needs
- arbitrary historical scene compatibility outside the reference regression cases
---
## File Map
### Frontend scene generator
| File | Action | Purpose |
|------|--------|---------|
| `frontend/scene-generator/generator-runner.js` | Modify | Implement naming fallback control, URL evidence stratification, workflow evidence cleanup, and pre-generation gate inputs |
| `frontend/scene-generator/llm-client.js` | Modify | Tighten sceneId semantic constraints and reject low-entropy LLM naming output |
| `frontend/scene-generator/server.js` | Modify | Aggregate readiness gates, block unsafe generation, and return rectification diagnostics |
| `frontend/scene-generator/sg_scene_generator.html` | Modify | Show invalid `sceneId`, bootstrap role breakdown, workflow evidence completeness, and generation block reasons |
### Rust generated-scene pipeline
| File | Action | Purpose |
|------|--------|---------|
| `src/generated_scene/analyzer.rs` | Modify | Add endpoint denoising, evidence role typing, and stricter archetype preconditions |
| `src/generated_scene/ir.rs` | Modify | Extend IR to carry candidate roles, gate states, and workflow evidence completeness |
| `src/generated_scene/generator.rs` | Modify | Prevent compiler routing when gates fail and surface fail-closed diagnostics |
### Tests and fixtures
| File | Action | Purpose |
|------|--------|---------|
| `tests/scene_generator_test.rs` | Modify | Cover naming, bootstrap, workflow, and readiness regression cases |
| `tests/scene_generator_html_test.rs` | Modify | Cover HTML/UI risk and blocking output |
| `tests/fixtures/generated_scene/paginated_enrichment/*` | Modify | Preserve marketing-like reference coverage |
| `tests/fixtures/generated_scene/multi_mode/*` | Modify | Preserve tq-like multi-mode coverage |
| Additional fixture files as needed | Create | Add low-entropy naming and localhost-pollution regression inputs |
---
## Scope Guardrails
- Do not broaden this work into a generic scene-generator redesign.
- Do not remove the existing `Scene IR` structure; extend and constrain it.
- Do not let `localhost` or helper/export endpoints participate in bootstrap selection.
- Do not silently coerce invalid `sceneId` values into accepted ids.
- Do not route into `paginated_enrichment` unless its minimum workflow evidence is complete.
- Do not emit a default runnable skill when any rectification gate fails.
---
### Task 1: Rectify Naming Chain
**Files:**
- Modify: `frontend/scene-generator/generator-runner.js`
- Modify: `frontend/scene-generator/llm-client.js`
- Modify: `frontend/scene-generator/server.js`
- Modify: `src/generated_scene/ir.rs`
**Goal:** Stop Chinese-source scenes from degrading into low-information ids such as `2-0`, and turn `sceneId` into a validated business identifier instead of a raw slug fallback.
- [ ] **Step 1: Classify sceneId candidate sources**
Define explicit candidate tiers for `sceneId`:
1. LLM semantic business id
2. deterministic keyword-derived id
3. controlled alias/transliteration fallback
4. invalid fallback candidate
Expected result: the pipeline can explain where the chosen id came from.
- [ ] **Step 2: Add low-entropy sceneId validation**
Implement shared validation rules that reject ids which are:
- numeric-only or numeric-dominant
- too short to be business-readable
- generic placeholders such as `scene` or `report`
- semantically detached from the extracted `sceneName`
Expected result: ids like `2-0`, `1-0`, `scene`, `report` are blocked.
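A minimal sketch of the mechanical part of these rules is shown below, written in Rust to match the `src/generated_scene` side. The thresholds (`MIN_LETTERS`, "digits at least as many as letters") and all names are assumptions, and the "semantically detached from `sceneName`" rule is not modeled because it needs evidence beyond the id string.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SceneIdRejection {
    GenericPlaceholder,
    NumericDominant,
    TooShort,
}

// Assumed values; the plan names `scene` and `report` only as examples.
const GENERIC_PLACEHOLDERS: [&str; 2] = ["scene", "report"];
// Assumed minimum number of ASCII letters for a business-readable id.
const MIN_LETTERS: usize = 4;

/// Rejects low-entropy ids: placeholders, numeric-only or numeric-dominant
/// ids, and ids with too few letters.
fn validate_scene_id(scene_id: &str) -> Result<(), SceneIdRejection> {
    let id = scene_id.trim().to_ascii_lowercase();
    if GENERIC_PLACEHOLDERS.contains(&id.as_str()) {
        return Err(SceneIdRejection::GenericPlaceholder);
    }
    let letters = id.chars().filter(|c| c.is_ascii_alphabetic()).count();
    let digits = id.chars().filter(|c| c.is_ascii_digit()).count();
    if digits > 0 && digits >= letters {
        return Err(SceneIdRejection::NumericDominant);
    }
    if letters < MIN_LETTERS {
        return Err(SceneIdRejection::TooShort);
    }
    Ok(())
}
```

Returning a typed rejection rather than a boolean lets the server and UI report the invalidation reason directly.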
- [ ] **Step 3: Fail closed on invalid sceneId**
Update generation flow so invalid `sceneId` produces:
- `invalid_scene_id` gate failure
- readiness downgrade
- analysis/report output only unless explicitly overridden later by a separate approved flow
Expected result: invalid ids never create a formal generated skill directory by default.
- [ ] **Step 4: Surface naming diagnostics in server/UI**
Return and display:
- chosen `sceneId`
- candidate source
- validation result
- invalidation reason if blocked
- [ ] **Step 5: Add regression tests**
Cover at least:
- Chinese source name that previously degraded to `2-0`
- valid semantic id chosen over slug fallback
- invalid low-entropy id blocked from generation
- [ ] **Step 6: Commit**
```bash
git add frontend/scene-generator/generator-runner.js frontend/scene-generator/llm-client.js frontend/scene-generator/server.js src/generated_scene/ir.rs tests/scene_generator_test.rs
git commit -m "fix(generator): block degenerate generated scene ids"
```
---
### Task 2: Rectify Bootstrap Chain
**Files:**
- Modify: `frontend/scene-generator/generator-runner.js`
- Modify: `frontend/scene-generator/server.js`
- Modify: `src/generated_scene/analyzer.rs`
- Modify: `src/generated_scene/ir.rs`
**Goal:** Separate business bootstrap candidates from localhost/export/helper URLs so internal-network entry domains resolve correctly.
- [ ] **Step 1: Add URL evidence role stratification**
Classify URL candidates into:
- `business_entry`
- `business_api`
- `gateway_api`
- `export_service`
- `local_helper`
- `static_asset`
- `template_noise`
Expected result: every URL candidate is typed before bootstrap selection.
- [ ] **Step 2: Add deterministic localhost and noise rejection**
Ensure that:
- `localhost`
- `127.0.0.1`
- `SurfaceServices`
- `ReportServices`
- `.js` / `.css` assets
- template placeholders and format strings
are routed away from bootstrap candidates.
Expected result: helper/export/static/template strings can remain as evidence but can never win bootstrap.
- [ ] **Step 3: Redefine bootstrap resolution order**
Bootstrap selection may only consume:
1. `business_entry`
2. `business_api`
3. `gateway_api`
When only helper/noise roles exist, set bootstrap to unresolved and downgrade readiness.
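One way to picture Steps 1 to 3 is the sketch below. It covers only the rejection side and the selection order: how a URL earns `BusinessEntry`, `BusinessApi`, or `GatewayApi` is not specified in this plan, so those roles are taken as given inputs. The function names and the template markers (`${`, `{{`, `%s`) are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UrlRole {
    BusinessEntry,
    BusinessApi,
    GatewayApi,
    ExportService,
    LocalHelper,
    StaticAsset,
    TemplateNoise,
}

/// Returns the noise/helper role for a candidate, or `None` when none of
/// the rejection rules match.
fn noise_role(candidate: &str) -> Option<UrlRole> {
    if candidate.contains("${") || candidate.contains("{{") || candidate.contains("%s") {
        return Some(UrlRole::TemplateNoise);
    }
    let after_scheme = candidate.split_once("://").map_or(candidate, |(_, rest)| rest);
    let host = after_scheme
        .split(|c: char| matches!(c, '/' | ':' | '?' | '#'))
        .next()
        .unwrap_or("");
    if host.eq_ignore_ascii_case("localhost") || host == "127.0.0.1" {
        return Some(UrlRole::LocalHelper);
    }
    if candidate.contains("SurfaceServices") || candidate.contains("ReportServices") {
        return Some(UrlRole::ExportService);
    }
    let path = candidate
        .split(|c: char| matches!(c, '?' | '#'))
        .next()
        .unwrap_or(candidate);
    if path.ends_with(".js") || path.ends_with(".css") {
        return Some(UrlRole::StaticAsset);
    }
    None
}

/// Only the three business roles may win bootstrap, in this order.
fn bootstrap_rank(role: UrlRole) -> Option<u8> {
    match role {
        UrlRole::BusinessEntry => Some(0),
        UrlRole::BusinessApi => Some(1),
        UrlRole::GatewayApi => Some(2),
        _ => None,
    }
}

/// Picks the best-ranked candidate; `None` means bootstrap is unresolved.
fn select_bootstrap<'a>(candidates: &[(&'a str, UrlRole)]) -> Option<&'a str> {
    candidates
        .iter()
        .filter_map(|(url, role)| bootstrap_rank(*role).map(|rank| (rank, *url)))
        .min_by_key(|(rank, _)| *rank)
        .map(|(_, url)| url)
}
```

Because helper and noise roles have no rank, they stay available as evidence in the candidate list but cannot be selected.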
- [ ] **Step 4: Preserve export/helper evidence separately**
Retain localhost/export endpoints as downstream evidence for workflow/reporting, but isolate them from `expectedDomain` and `targetUrl`.
- [ ] **Step 5: Add regression tests**
Cover at least:
- marketing-like source choosing `yx.gs.sgcc.com.cn` over `localhost`
- mixed business + gateway scene preserving business target page
- scene with only localhost/noise ending in unresolved bootstrap
- [ ] **Step 6: Commit**
```bash
git add frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js src/generated_scene/analyzer.rs src/generated_scene/ir.rs tests/scene_generator_test.rs
git commit -m "fix(generator): stratify bootstrap evidence and exclude localhost"
```
---
### Task 3: Rectify Workflow Chain
**Files:**
- Modify: `frontend/scene-generator/generator-runner.js`
- Modify: `frontend/scene-generator/server.js`
- Modify: `src/generated_scene/analyzer.rs`
- Modify: `src/generated_scene/ir.rs`
- Modify: `src/generated_scene/generator.rs`
**Goal:** Reconstruct workflow from request-chain evidence instead of generic field names, so `paginated_enrichment` is only emitted when its true workflow exists.
- [ ] **Step 1: Split workflow evidence into typed layers**
Represent workflow evidence as:
- request evidence
- pagination evidence
- secondary request evidence
- post-process evidence
Expected result: archetype decisions operate on structured workflow signals instead of a flat endpoint list.
- [ ] **Step 2: Denoise endpoint and method evidence**
Normalize and filter out:
- `${apiUrl}`
- template placeholders
- exception strings
- log text fragments
- localhost export endpoints
Expected result: workflow reconstruction only consumes business-relevant requests.
- [ ] **Step 3: Tighten archetype routing rules**
Require `paginated_enrichment` to have at minimum:
1. one main list request
2. one pagination variable set
3. one secondary request or explicit per-item enrichment function
4. one post-process action among `filter`, `transform`, `export`
If only part of this exists, preserve it as candidate evidence but do not route into the compiler.
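The minimum-evidence rule above could be checked with something like the following sketch. The `WorkflowEvidence` shape, field names, and the labels for missing pieces are hypothetical; the actual IR in `src/generated_scene/ir.rs` may carry this information differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PostProcess {
    Filter,
    Transform,
    Export,
}

// Hypothetical evidence summary; not the real Scene IR structure.
#[derive(Debug, Default)]
struct WorkflowEvidence {
    main_list_requests: usize,
    has_pagination_variables: bool,
    secondary_requests: usize,
    has_per_item_enrichment_fn: bool,
    post_process: Vec<PostProcess>,
}

/// Lists which of the four required pieces are missing.
fn missing_paginated_enrichment_pieces(evidence: &WorkflowEvidence) -> Vec<&'static str> {
    let mut missing = Vec::new();
    if evidence.main_list_requests == 0 {
        missing.push("main_list_request");
    }
    if !evidence.has_pagination_variables {
        missing.push("pagination_variables");
    }
    if evidence.secondary_requests == 0 && !evidence.has_per_item_enrichment_fn {
        missing.push("secondary_request");
    }
    if evidence.post_process.is_empty() {
        missing.push("post_process");
    }
    missing
}

/// Routing is allowed only when nothing is missing (fail closed).
fn may_route_to_paginated_enrichment(evidence: &WorkflowEvidence) -> bool {
    missing_paginated_enrichment_pieces(evidence).is_empty()
}
```

Returning the list of missing pieces, not just a boolean, also gives the UI the "missing workflow pieces" detail it needs.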
- [ ] **Step 4: Narrow multi_mode detection**
Allow `multi_mode_request` only when mode switching materially changes at least one of:
- request body
- endpoint shape
- response path
- column definition
Expected result: generic `type/tab/mode/status` fields alone no longer misclassify marketing-like scenes.
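As a small illustration of this rule, the check might reduce to the sketch below. `ModeDivergence` and its fields are hypothetical names for "mode switching materially changes X"; how each flag is derived from evidence is outside this sketch.

```rust
// Hypothetical summary of what actually differs between modes.
#[derive(Debug, Default, Clone, Copy)]
struct ModeDivergence {
    request_body: bool,
    endpoint_shape: bool,
    response_path: bool,
    column_definition: bool,
}

/// A scene qualifies as `multi_mode_request` only when at least one material
/// divergence exists; the mere presence of a generic `type/tab/mode/status`
/// field is deliberately not an input.
fn qualifies_as_multi_mode(divergence: &ModeDivergence) -> bool {
    divergence.request_body
        || divergence.endpoint_shape
        || divergence.response_path
        || divergence.column_definition
}
```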
- [ ] **Step 5: Block compiler routing on incomplete workflow**
Update generator-side routing so incomplete evidence cannot produce a formal `paginated_enrichment` skill package.
- [ ] **Step 6: Add regression tests**
Cover at least:
- marketing-like scene must expose `paginate` + `secondary_request` + post-process evidence
- generic mode fields without real mode divergence must not force `multi_mode_request`
- noisy endpoint lists must still reconstruct the correct business request chain
- [ ] **Step 7: Commit**
```bash
git add frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js src/generated_scene/analyzer.rs src/generated_scene/ir.rs src/generated_scene/generator.rs tests/scene_generator_test.rs
git commit -m "fix(generator): require complete workflow evidence before archetype routing"
```
---
### Task 4: Rectify Readiness Chain
**Files:**
- Modify: `frontend/scene-generator/server.js`
- Modify: `frontend/scene-generator/sg_scene_generator.html`
- Modify: `src/generated_scene/ir.rs`
- Modify: `src/generated_scene/generator.rs`
- Modify: `tests/scene_generator_html_test.rs`
**Goal:** Turn readiness into a hard gate that distinguishes analysis output from runnable skill output.
- [ ] **Step 1: Add explicit rectification gates**
Track at minimum:
- `scene_id_valid`
- `bootstrap_resolved`
- `workflow_complete_for_archetype`
- `runtime_contract_compatible`
Expected result: readiness is derived from named gates rather than a loose score only.
- [ ] **Step 2: Enforce fail-closed readiness rules**
Require:
- all core gates pass for readiness `A` or `B`
- any core gate failure forces readiness `C`
- generation endpoint blocks runnable output on gate failure
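The fail-closed rules can be sketched as a cap applied on top of whatever score-based grade already exists. The struct layout and function names are assumptions; only the four gate names and the `A`/`B`/`C` levels come from this plan, and how `A` is distinguished from `B` is left to the existing scoring.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Readiness {
    A,
    B,
    C,
}

#[derive(Debug, Clone, Copy)]
struct RectificationGates {
    scene_id_valid: bool,
    bootstrap_resolved: bool,
    workflow_complete_for_archetype: bool,
    runtime_contract_compatible: bool,
}

impl RectificationGates {
    /// Names of the gates that did not pass, for diagnostics.
    fn failed(&self) -> Vec<&'static str> {
        let checks = [
            ("scene_id_valid", self.scene_id_valid),
            ("bootstrap_resolved", self.bootstrap_resolved),
            ("workflow_complete_for_archetype", self.workflow_complete_for_archetype),
            ("runtime_contract_compatible", self.runtime_contract_compatible),
        ];
        let mut failed = Vec::new();
        for &(name, passed) in checks.iter() {
            if !passed {
                failed.push(name);
            }
        }
        failed
    }
}

/// Any failed gate forces `C`; otherwise the score-based grade is kept.
fn derive_readiness(gates: &RectificationGates, scored: Readiness) -> Readiness {
    if gates.failed().is_empty() {
        scored
    } else {
        Readiness::C
    }
}

/// Runnable output is emitted only for `A` or `B` after gating.
fn may_emit_runnable_skill(gates: &RectificationGates, scored: Readiness) -> bool {
    derive_readiness(gates, scored) != Readiness::C
}
```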
- [ ] **Step 3: Separate analysis result from generation result**
When gates fail, allow:
- analysis preview
- evidence report
- block reasons
But do not default to:
- full skill emission
- compiler success messaging
- [ ] **Step 4: Expose readiness breakdown in UI**
Display:
- gate names
- pass/fail state
- missing workflow pieces
- bootstrap resolution reason
- invalid sceneId reason
- [ ] **Step 5: Add regression tests**
Cover at least:
- invalid `sceneId` forcing readiness `C`
- unresolved bootstrap forcing readiness `C`
- incomplete paginated workflow forcing readiness `C`
- fully valid reference fixture remaining eligible for generation
- [ ] **Step 6: Commit**
```bash
git add frontend/scene-generator/server.js frontend/scene-generator/sg_scene_generator.html src/generated_scene/ir.rs src/generated_scene/generator.rs tests/scene_generator_html_test.rs tests/scene_generator_test.rs
git commit -m "fix(generator): enforce readiness fail-closed gating"
```
---
### Task 5: Reference Regression Verification
**Files:**
- Modify: `tests/scene_generator_test.rs`
- Modify: `tests/scene_generator_html_test.rs`
- Modify/Create: relevant fixtures under `tests/fixtures/generated_scene/`
**Goal:** Lock the rectification against the two reference scene families and ensure future changes do not reintroduce the same false positives.
- [ ] **Step 1: Regress marketing-like fixture**
Verify the marketing reference path now satisfies:
- non-degenerate `sceneId`
- bootstrap rooted in `yx.gs.sgcc.com.cn` family
- workflow includes `paginate`
- workflow includes `secondary_request`
- readiness does not pass if any of the above are missing
- [ ] **Step 2: Regress tq-like fixture**
Verify the tq reference path still satisfies:
- stable semantic `sceneId`
- valid non-localhost bootstrap
- genuine `multi_mode_request` detection
- no downgrade caused by the stricter marketing rectification rules
- [ ] **Step 3: Run verification commands**
Run:
```bash
cargo check
cargo test --test scene_generator_test -- --nocapture
cargo test --test scene_generator_html_test -- --nocapture
node --check frontend/scene-generator/llm-client.js
node --check frontend/scene-generator/generator-runner.js
node --check frontend/scene-generator/server.js
```
Expected result: rectification passes both Rust and Node validation plus regression coverage.
- [ ] **Step 4: Record outcomes in generated reports if needed**
If the implementation emits readiness or analysis JSON reports, ensure the test fixtures assert the key blocked/passed states directly.
- [ ] **Step 5: Commit**
```bash
git add tests/scene_generator_test.rs tests/scene_generator_html_test.rs tests/fixtures/generated_scene frontend/scene-generator/llm-client.js frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js src/generated_scene/analyzer.rs src/generated_scene/ir.rs src/generated_scene/generator.rs
git commit -m "test(generator): lock generated scene rectification regressions"
```
---
## Acceptance Criteria
This plan is complete when all of the following are true:
1. Chinese-source scene names no longer degrade into low-entropy ids like `2-0`.
2. `localhost`, `127.0.0.1`, export services, and helper URLs no longer compete for bootstrap resolution.
3. `paginated_enrichment` routing only occurs when pagination, secondary request, and post-process evidence are all present.
4. Incomplete evidence paths fail closed with explicit readiness gate failures instead of generating false-positive runnable skills.
5. The marketing-like and tq-like reference scenes both remain covered by automated regression tests.
## Rollback Strategy
If this rectification causes unacceptable regressions:
1. Revert the latest rectification task commit only, not unrelated generated-scene work.
2. Keep the previous `Scene IR` and compiler structure intact.
3. Preserve newly added fixtures and tests where possible, then relax only the specific gate or classifier that caused the regression.
## Notes For Executors
- Implement this plan strictly in order: naming, bootstrap, workflow, readiness, verification.
- Do not skip ahead to UI polish before the gating logic is in place.
- Do not add speculative resolver or login work under this plan.
- Any need for user override or forced draft generation must be handled as a separate follow-up spec, not smuggled into this rectification plan.


@@ -0,0 +1,382 @@
# sgClaw Scene Skill 60-to-90 Roadmap Plan
> **Status:** Draft
> **Date:** 2026-04-17
> **Author:** Codex
> **Upstream Spec:** [2026-04-17-scene-skill-60-to-90-roadmap-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-17-scene-skill-60-to-90-roadmap-design.md)
## Plan Intent
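This plan breaks down the design for raising scene skill auto-generation capability from 60 to 90 points into executable delivery phases, task boundaries, acceptance conditions, and an implementation order. The plan strictly follows the upstream `spec`: it does not widen the problem space and does not introduce implementation goals that the `spec` has not confirmed.
The core goals covered by this plan are limited to:
1. Build an adjudicable semantic evidence layer
2. Build a minimal compilable business contract
3. Freeze the canonical answers for the P0 reference samples
4. Follow the P0-to-P1 route to move automatic scene skill conversion from structure recognition to business-semantics recovery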
## Success Criteria Baseline
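This plan adopts the success criteria already settled in the upstream `spec`: phase success is no longer judged solely by whether the generated result resembles the structure of some reference skill as closely as possible. The primary criterion is whether a skill generated for a general scene can run directly on the internal network, fetch the correct data, and produce the correct report.
Implementation acceptance therefore checks the following three closed loops together:
1. Execution loop: the generated skill completes execution on the internal network hosted by the in-house browser
2. Data loop: the data obtained after query, pagination, and extraction is correct and complete
3. Artifact loop: the generated Excel file or other report conforms to the business rules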
## Scope Guardrails
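The following boundaries stay fixed while this plan is executed:
1. Covering all 102 scenes in a single pass is not a goal
2. This plan does not implement automatic recovery of unified-platform login or of back-end login to the target business systems
3. The full BrowserAction chain is not abstracted in one go
4. Scenes involving complex document rendering, template upload, or attachment parsing are excluded from P0
5. "More prompt tuning first" is not a substitute for building the evidence layer, the contract layer, and the canonical answers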
## Scene Family Baseline
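This plan inherits the upstream `spec`'s family grouping of the `102` scenes:
1. `G1` general single-page report group: `68`
2. `G2` multi-mode report group: `11`
3. `G3` paginated detail enrichment group: `10`
4. `G4` tool-detection precursor group: `8`
5. `G5` low-priority noise group: `5`
The mainline scope of this plan is `G1 + G2 + G3`, which together cover `89` scenes, about `87%` of all samples. `G4` is reserved as a precursor for later detection-type expansion; `G5` is deprioritized by default and stays out of the first mainline round.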
## Workstreams
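This plan is split into four main workstreams:
1. `WS1` Semantic evidence layer
2. `WS2` Minimal compilable business contract
3. `WS3` P0 canonical answers and calibration baseline
4. `WS4` P0/P1 reference-sample rollout and validation
The dependency among the four workstreams is: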
`WS1 + WS2 + WS3 -> WS4`
## Phase Overview
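The plan proceeds in five phases:
1. Phase 0: Freeze boundaries and reference samples
2. Phase 1: Build the semantic evidence layer
3. Phase 2: Build the minimal compilable business contract
4. Phase 3: Freeze P0 canonical answers
5. Phase 4: Validate the 60-to-90 capability gain step by step along the P0/P1 route
Phase 4 advances by scene family rather than by business department, in this fixed order:
1. Start with the `G2` multi-mode report group to validate the ceiling of semantic recovery
2. Then take the `G1` general single-page report group to validate migration at scale
3. Then take the `G3` paginated detail enrichment group to validate complex workflows and fail-closed behavior
4. `G4` is kept for later detection-type expansion
5. `G5` is deprioritized by default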
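## Phase 0: Freeze Boundaries and Reference Samples
### Objective
Before construction starts, freeze the problem boundaries, the P0 reference samples, the P1 families, and the benchmark baseline so that they do not drift repeatedly during implementation.
### Tasks
1. Freeze the P0 reference-sample list
2. Freeze the P1 family list
3. Freeze the canonical mapping `台区线损大数据-月_周累计线损率统计分析 -> tq-lineloss-report`
4. Freeze the host-browser execution context and the semantic classification of `localhost:*`
5. Freeze the layering constraints among the business semantics layer, the host-browser capability layer, and the login and local-bridge layer
6. Freeze the five groups of the `102` scenes and the grouping criteria
7. Freeze the mapping from each group to archetype, phase, and acceptance focus
### Deliverables
1. Frozen reference-sample list
2. Sample-to-archetype mapping table
3. Note on the host/business layering constraints
4. Note on the canonical benchmark mapping
5. List of the five scene groups
6. Note on the group-to-implementation mapping
### Exit Criteria
1. The P0 / P1 reference samples no longer change
2. `tq-lineloss-report` is explicitly designated the canonical benchmark for P0-1
3. `localhost:*` is explicitly defined as host-bridge evidence rather than a default business domain
4. The five groups of the `102` scenes and their implementation criteria no longer drift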
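## Phase 1: Build the Semantic Evidence Layer
### Objective
Upgrade the generation path from "source code summarized directly into Scene IR" to "source code first forms adjudicable semantic evidence, which is then reduced to Scene IR".
### Tasks
1. Define a unified evidence-object schema
2. Define the layering of evidence sources
3. Define the rules for merging evidence and resolving conflicts
4. Define the mapping boundary from evidence to `Scene IR`
5. Establish the core set of evidence types
### Required Evidence Types
The minimal first-version set of evidence types is fixed as: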
1. `bootstrap_candidate`
2. `endpoint_candidate`
3. `mode_candidate`
4. `request_template_candidate`
5. `response_path_candidate`
6. `column_defs_candidate`
7. `normalize_rules_candidate`
8. `workflow_candidate`
9. `localhost_dependency_candidate`
10. `browser_action_candidate`
11. `export_candidate`
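One way to keep this list closed is to model it as an enum whose string tags match the names above. This is a sketch under assumptions: `EvidenceKind`, `as_str`, and `parse` are illustrative names, not the project's actual types.

```rust
/// The eleven first-version evidence types, as named in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvidenceKind {
    BootstrapCandidate,
    EndpointCandidate,
    ModeCandidate,
    RequestTemplateCandidate,
    ResponsePathCandidate,
    ColumnDefsCandidate,
    NormalizeRulesCandidate,
    WorkflowCandidate,
    LocalhostDependencyCandidate,
    BrowserActionCandidate,
    ExportCandidate,
}

impl EvidenceKind {
    const ALL: [EvidenceKind; 11] = [
        EvidenceKind::BootstrapCandidate,
        EvidenceKind::EndpointCandidate,
        EvidenceKind::ModeCandidate,
        EvidenceKind::RequestTemplateCandidate,
        EvidenceKind::ResponsePathCandidate,
        EvidenceKind::ColumnDefsCandidate,
        EvidenceKind::NormalizeRulesCandidate,
        EvidenceKind::WorkflowCandidate,
        EvidenceKind::LocalhostDependencyCandidate,
        EvidenceKind::BrowserActionCandidate,
        EvidenceKind::ExportCandidate,
    ];

    /// The snake_case tag used in the evidence-type dictionary.
    fn as_str(self) -> &'static str {
        match self {
            EvidenceKind::BootstrapCandidate => "bootstrap_candidate",
            EvidenceKind::EndpointCandidate => "endpoint_candidate",
            EvidenceKind::ModeCandidate => "mode_candidate",
            EvidenceKind::RequestTemplateCandidate => "request_template_candidate",
            EvidenceKind::ResponsePathCandidate => "response_path_candidate",
            EvidenceKind::ColumnDefsCandidate => "column_defs_candidate",
            EvidenceKind::NormalizeRulesCandidate => "normalize_rules_candidate",
            EvidenceKind::WorkflowCandidate => "workflow_candidate",
            EvidenceKind::LocalhostDependencyCandidate => "localhost_dependency_candidate",
            EvidenceKind::BrowserActionCandidate => "browser_action_candidate",
            EvidenceKind::ExportCandidate => "export_candidate",
        }
    }

    /// Unknown tags are rejected instead of being mapped to a default kind.
    fn parse(tag: &str) -> Option<EvidenceKind> {
        Self::ALL.iter().copied().find(|kind| kind.as_str() == tag)
    }
}
```

Rejecting unknown tags in `parse` keeps an unexpected evidence type from silently entering the merge step.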
### Deliverables
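1. Evidence-object schema document
2. Evidence-type dictionary
3. Evidence-merge rules document
4. Evidence-to-`Scene IR` mapping rules document
5. Sample evidence-extraction results for the P0 reference samples
### Acceptance Criteria
1. Every P0 reference sample can output a structured evidence set
2. `localhost:*`, host JS injection, and hidden-field behavior each have a dedicated evidence slot
3. Every core field of `Scene IR` can be traced back to its evidence source
4. Conflicting evidence follows an explicit adjudication path instead of being swallowed by the final summary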
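## Phase 2: Build the Minimal Compilable Business Contract
### Objective
Upgrade archetype determination from "keyword hit" to "the minimal business contract holds", so that the compiler accepts only inputs whose evidence is closed.
### Tasks
1. Define the minimal compilable contract for each archetype
2. Define the unified gate list
3. Define the blocking rules applied when a gate fails
4. Define the minimal output contract for each archetype
5. Establish fail-closed-first readiness criteria
### Required Gates
The unified gate names include at least: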
1. `bootstrap_resolved`
2. `request_contract_complete`
3. `response_contract_complete`
4. `workflow_contract_complete`
5. `runtime_contract_compatible`
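The fail-closed intent behind these gates can be sketched as follows. The struct and function names are assumptions made for illustration; only the five gate names come from the list above.

```rust
/// One boolean per gate; `Default` leaves every gate closed (false).
#[derive(Debug, Default, Clone, Copy)]
struct GateResults {
    bootstrap_resolved: bool,
    request_contract_complete: bool,
    response_contract_complete: bool,
    workflow_contract_complete: bool,
    runtime_contract_compatible: bool,
}

/// Names of the gates that did not pass, in declaration order.
fn failed_gates(g: &GateResults) -> Vec<&'static str> {
    let checks = [
        ("bootstrap_resolved", g.bootstrap_resolved),
        ("request_contract_complete", g.request_contract_complete),
        ("response_contract_complete", g.response_contract_complete),
        ("workflow_contract_complete", g.workflow_contract_complete),
        ("runtime_contract_compatible", g.runtime_contract_compatible),
    ];
    checks
        .iter()
        .filter(|(_, passed)| !*passed)
        .map(|(name, _)| *name)
        .collect()
}

/// The compiler accepts the scene only when no gate failed.
fn is_compilable(g: &GateResults) -> bool {
    failed_gates(g).is_empty()
}
```

Because `GateResults::default()` starts with every gate closed, a scene for which no evidence was evaluated is blocked rather than passed by accident.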
### Deliverables
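1. Archetype minimal-contract table
2. Gate decision table
3. Blocker / readiness rules table
4. Sample archetype output contracts
### Acceptance Criteria
1. `multi_mode_request`, `single_request_table`, and `paginated_enrichment` each have an explicit minimal contract
2. A scene that fails a gate can no longer pass itself off as a runnable skill
3. The readiness result distinguishes "insufficient business evidence" from "unmet host runtime dependency"
4. The compiler input boundary is clear: it no longer accepts unclosed IR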
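## Phase 3: Freeze P0 Canonical Answers
### Objective
Establish stable canonical answers, key-evidence lists, and acceptance baselines for the three main P0 reference samples, to serve as the sole calibration source for later regression and migration.
### Tasks
1. Freeze the canonical `Scene IR` of the three P0 reference samples
2. Freeze the key-evidence lists of the three P0 reference samples
3. Freeze the acceptance criteria of the three P0 reference samples
4. Freeze the failure taxonomy of the three P0 reference samples
5. Establish how a canonical answer is compared with the actual generated result
### P0 Canonical Targets
1. `台区线损大数据-月_周累计线损率统计分析`
Reference: `tq-lineloss-report`
2. `用户日电量监测`
Benchmark: the single-request scale-out reference sample
3. `95598工单明细表`
Benchmark: the paginated-enrichment recognition and blocking reference sample
### Deliverables
1. Canonical `Scene IR` for the three P0 reference samples
2. Key semantic-evidence baselines for the three P0 reference samples
3. Acceptance tables for the three P0 reference samples
4. Failure-type tables for the three P0 reference samples
### Acceptance Criteria
1. P0-1 explicitly uses `tq-lineloss-report` as a high-quality reference sample rather than as the only hard canonical answer
2. All three P0 reference samples have a way to align "generated result vs canonical answer"
3. Every later capability upgrade can be regression-checked for deviation from the P0 canonical answers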
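## Phase 4: Validate the 60-to-90 Gain Step by Step Along the P0/P1 Route
### Objective
Following the priorities already defined in the `spec`, with P0 as the core and P1 as the extension, validate step by step how the automatic converter moves from structure recognition toward business-semantics recovery.
This phase advances by scene family rather than by business department. The first-round goal is not to cover all `102` scenes but to break through the mainstream report-type scenes first and then expand gradually.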
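### Track A: P0-1 `tq` Main Reference Sample
#### Goal
Deliver the main reference capability for `multi_mode_request.month_week_table`, and bring the result to the same level as `tq-lineloss-report` in key business semantics, internal-network executability, and report correctness.
#### Tasks
1. Recover the full `month / week` mode matrix
2. Recover the request contract and response contract of each mode
3. Recover column definitions, normalization rules, and export semantics
4. Verify bootstrap and the target-system context constraints
5. Build a key-semantics comparison between the automatic result and `tq-lineloss-report`
#### Acceptance Criteria
1. The `mode matrix` is recovered stably
2. The key request / response contracts are recovered stably
3. The generated result reaches the high-quality reference level in key business semantics and internal-network report results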
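### Track B: P0-2 Single-Request Scale-Out Reference Sample
#### Goal
Show that the single-request report family can yield a general conversion template with a high pass rate.
#### Tasks
1. Recover the request / response / normalize triple
2. Shrink the pseudo-generic fallback main path
3. Validate migration across samples of the same family
#### Acceptance Criteria
1. The `single_request_table` reference sample passes stably
2. Samples of the same family are reusable
3. Result judgment no longer over-relies on full-text summarization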
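### Track C: P0-3 Paginated Enrichment Reference Sample
#### Goal
Correctly identify the problem space of complex paginated-enrichment scenes and block reliably when evidence is insufficient.
#### Tasks
1. Separate the main request chain, the enrichment chain, and the export chain
2. Establish the minimal compilable evidence set for `paginated_enrichment`
3. Distinguish business workflow from host-bridge behavior
4. Implement the fail-closed decision
#### Acceptance Criteria
1. The paginated-enrichment workflow is decomposed correctly
2. Insufficient evidence reliably results in fail-closed
3. The host chain or `localhost:*` is no longer mistaken for the business main chain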
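### Track D: P1 Family Expansion
#### Goal
Once the P0 reference samples are stable, migrate the capability to the defined P1 families and verify that the route can be replicated at scale.
#### Tasks
1. Migrate the line-loss / electricity-consumption multi-mode family
2. Migrate the single-request report family
3. Migrate the paginated-enrichment family
4. Record the reuse success rate and failure types for each family
#### Acceptance Criteria
1. Each P1 family completes at least one round of migration validation on a representative scene
2. P1 validation relies mainly on the evidence, contract, and canonical-answer system built up in P0
3. When a scene exceeds the current archetype or contract capability boundary, the result is an explicit fail-closed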
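### Track E: Scene Family Expansion Policy
#### Goal
Using the five scene groups as the unit, state which families enter the mainline and which are only reserved or deprioritized.
#### Tasks
1. Set a scale-out migration cadence for the `G1` general single-page report group
2. Set a deep-dive reference-sample cadence for the `G2` multi-mode report group
3. Set a complex-chain recognition cadence for the `G3` paginated detail enrichment group
4. For the `G4` tool-detection precursor group, keep only the architectural entry point and the criteria for later expansion
5. Set a default deprioritization rule for the `G5` low-priority noise group
#### Acceptance Criteria
1. `G1 + G2 + G3` form the first-round mainline scope
2. `G4` does not compete for current mainline resources but keeps an entry point for later detection-type expansion
3. `G5` does not pollute the mainline archetypes or acceptance criteria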
## Milestone Order
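The prerequisite milestones occur in this fixed order:
1. First complete the semantic evidence layer
2. Then complete the minimal compilable business contract
3. Then freeze the P0 canonical answers
Large-scale family expansion does not start until these three milestones are complete.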
## File-Level Planning Targets
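Later implementation under this plan must cover at least the following asset types:
1. Upstream design drafts in `docs/superpowers/specs/`
2. Phase plans and progress plans in `docs/superpowers/plans/`
3. Implementation of the evidence layer, the contract layer, and the readiness / blocker logic in the scene generation chain
4. Fixtures, golden IR, acceptance baselines, or equivalent calibration assets for the P0 reference samples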
## Completion Criteria
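This plan is complete when:
1. The `tq` main reference sample stably recovers the core business semantics and reaches the high-quality reference level in internal-network execution and report results
2. The single-request main sample yields a replicable, high-pass-rate template that covers mainstream general report scenes
3. The paginated-enrichment main sample reliably recognizes complex workflows and fails closed when evidence is insufficient
4. An adjudicable evidence layer exists in front of `Scene IR`
5. An explicit contract gate exists in front of archetype selection
6. The P0 canonical answers have become the unified calibration baseline for later migration and regression
7. The implementation mainline is clearly focused on `G1 + G2 + G3` and is no longer pulled off course by boundary scenes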
## Risks and Control Points
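1. If the evidence layer is too thin from the start, the contracts and canonical answers built on it lose their support
2. If the contract gates are defined too loosely, the system keeps fabricating runnable skills
3. If the P0 canonical answers are not frozen, later optimization loses its alignment baseline
4. If P1 expansion starts too early, host-noise pollution is likely to return before the layering is finished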
## Out of Plan
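The following items are explicitly outside the direct deliverables of this plan:
1. Automatic recovery of the unified-platform login flow
2. Implementation details of back-end login to the target business systems
3. Full abstraction of browser host capabilities
4. A one-shot guarantee that every scene runs end to end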


@@ -0,0 +1,663 @@
# Scene Skill Compiler Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Upgrade `sg_scene_generate` from a scene metadata extractor plus template filler into a reusable scene skill compiler that can understand workflow semantics, classify scene archetypes, and generate runnable skills for both `tq-lineloss-report`-style and `marketing-zero-consumer-report`-style internal scenes.
**Architecture:** Introduce a unified `Scene IR`, switch extraction to a hybrid deterministic-plus-LLM pipeline, route generation by `workflowArchetype`, align runtime resolver contracts, and add readiness gates so users can tell whether a generated skill is safe to trial on the internal network.
**Tech Stack:** Rust, Node.js, HTML/CSS/JavaScript, serde_json, OpenAI-compatible LLM API
---
## Scope Check
This plan implements the design in:
- `docs/superpowers/specs/2026-04-17-scene-skill-compiler-design.md`
This plan builds on the existing generator work already described in:
- `docs/superpowers/specs/2026-04-17-llm-driven-skill-generation-design.md`
- `docs/superpowers/specs/2026-04-17-enhanced-llm-extraction-schema-design.md`
- `docs/superpowers/specs/2026-04-17-progressive-template-enhancement-design.md`
- `docs/superpowers/specs/2026-04-16-multi-scene-kind-generator-design.md`
This plan does not attempt to solve:
- full login and authentication reconstruction
- all historical scene patterns in one pass
- 100% no-touch generation without human review
---
## File Map
### Core generator pipeline
| File | Action | Purpose |
|------|--------|---------|
| `frontend/scene-generator/llm-client.js` | Modify | Replace truncation-only extraction with chunked workflow-aware extraction and `Scene IR` schema output |
| `frontend/scene-generator/generator-runner.js` | Modify | Add deterministic scene scanning, key-fragment selection, and IR support |
| `frontend/scene-generator/server.js` | Modify | Expose analysis, preview, readiness, and generation endpoints for `Scene IR` |
| `frontend/scene-generator/sg_scene_generator.html` | Modify | Show extraction preview, archetype classification, bootstrap, risks, and readiness |
### Rust backend
| File | Action | Purpose |
|------|--------|---------|
| `src/generated_scene/analyzer.rs` | Modify | Add deterministic extraction helpers and archetype support |
| `src/generated_scene/generator.rs` | Modify | Route generation by archetype and compile from `Scene IR` instead of ad hoc fields |
| `src/generated_scene/ir.rs` | Create | Define unified `Scene IR` structs and serde contracts |
| `src/bin/sg_scene_generate.rs` | Modify | Accept `Scene IR` JSON or file input and pass it into generator |
| `src/compat/scene_platform/resolvers.rs` | Modify | Align runtime parameter resolution with generated contracts |
### Tests and fixtures
| File | Action | Purpose |
|------|--------|---------|
| `tests/scene_generator_test.rs` | Modify | Cover new analysis, archetype classification, and generation routing |
| `tests/generated_scene_*` or related fixtures | Modify/Create | Add representative fixtures for single-request, multi-mode, and paginated-enrichment scenes |
---
## Scope Guardrails
- Do not break existing `--scene-id`, `--scene-name`, or `--scene-kind` compatibility.
- Do not require all scenes to provide complete metadata in HTML meta tags.
- Do not force the runtime to support new resolver contracts unless generation is updated to gate incompatible output.
- Do not assume all report scenes share `org + period` params.
- Do not silently generate low-confidence skills as if they were runnable.
---
### Task 1: Fix Current Hard Failures Before Compiler Refactor
**Files:**
- Modify: `frontend/scene-generator/llm-client.js`
- Modify: `frontend/scene-generator/generator-runner.js`
- Modify: `frontend/scene-generator/server.js`
- Modify: `frontend/scene-generator/sg_scene_generator.html`
- Modify: `src/generated_scene/generator.rs`
**Goal:** Stop the most obvious wrong outputs that currently make generated skills fail on the internal network even before the full compiler architecture lands.
- [ ] **Step 1: Remove report-scene hardcoded parameter assumptions**
Audit `scene.toml` generation in `src/generated_scene/generator.rs` and remove default injection of generic report params such as:
- fixed `org`
- fixed `period`
- default dictionary entity for a specific city
- generic page title keywords like `["报表", "线损"]`
Expected result: generated params come from extracted scene semantics or are omitted when not confidently known.
- [ ] **Step 2: Rework bootstrap source priority**
Change bootstrap derivation so `expected_domain` and `target_url` are resolved using this order:
1. explicit deep extraction result
2. deterministic extraction from business entry points
3. HTML meta tags if trustworthy
4. fallback empty with warning
Explicitly prevent script-host URLs such as static JS includes from becoming the business domain by mistake.
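The priority order in this step could look roughly like the sketch below. All type and function names are hypothetical, and the `.js` suffix test is an assumed stand-in for whatever script-host detection the real generator ends up using.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BootstrapSource {
    DeepExtraction,
    Deterministic,
    HtmlMeta,
}

/// Candidate target URLs per source; `html_meta` should be filled only when trusted.
#[derive(Debug, Default)]
struct BootstrapCandidates {
    deep_extraction: Option<String>,
    deterministic: Option<String>,
    html_meta: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
struct Bootstrap {
    target_url: String,
    source: Option<BootstrapSource>,
    warning: Option<String>,
}

/// Assumption: a URL whose path ends in `.js` is a static script include.
fn is_script_asset(url: &str) -> bool {
    let path = url.split(|c: char| c == '?' || c == '#').next().unwrap_or(url);
    path.to_ascii_lowercase().ends_with(".js")
}

/// Walks the sources in priority order and falls back to empty with a warning.
fn resolve_bootstrap(c: &BootstrapCandidates) -> Bootstrap {
    let ordered = [
        (BootstrapSource::DeepExtraction, &c.deep_extraction),
        (BootstrapSource::Deterministic, &c.deterministic),
        (BootstrapSource::HtmlMeta, &c.html_meta),
    ];
    for (source, candidate) in ordered.iter() {
        if let Some(url) = candidate {
            if !url.trim().is_empty() && !is_script_asset(url) {
                return Bootstrap {
                    target_url: url.clone(),
                    source: Some(*source),
                    warning: None,
                };
            }
        }
    }
    Bootstrap {
        target_url: String::new(),
        source: None,
        warning: Some("bootstrap unresolved: no trustworthy candidate".to_string()),
    }
}
```

Returning the winning source alongside the URL lets the preview explain why a given domain was chosen.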
- [ ] **Step 3: Replace naive truncation with chunked extraction input**
Update `frontend/scene-generator/llm-client.js` and `frontend/scene-generator/generator-runner.js` so they no longer send only the first `15000/3000` characters. Replace with:
1. directory tree summary
2. `index.html` chunking
3. URL-bearing fragments
4. request-construction fragments
5. branching logic fragments
6. export-related fragments
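The plan places this logic in the Node files, so the following Rust sketch only illustrates two of the ideas: chunking on character boundaries instead of cutting at a fixed prefix, and picking URL-bearing lines. `chunk_by_chars` and `url_bearing_lines` are hypothetical helpers.

```rust
/// Splits `text` into chunks of at most `max_chars` characters.
/// Counting characters (not bytes) keeps multi-byte text such as Chinese intact.
fn chunk_by_chars(text: &str, max_chars: usize) -> Vec<&str> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut chunks = Vec::new();
    let mut start = 0; // byte offset where the current chunk begins
    let mut count = 0; // characters collected in the current chunk
    for (idx, _) in text.char_indices() {
        if count == max_chars {
            chunks.push(&text[start..idx]);
            start = idx;
            count = 0;
        }
        count += 1;
    }
    if start < text.len() {
        chunks.push(&text[start..]);
    }
    chunks
}

/// Keeps only lines that mention an absolute HTTP(S) URL.
fn url_bearing_lines(text: &str) -> Vec<&str> {
    text.lines()
        .filter(|line| line.contains("http://") || line.contains("https://"))
        .collect()
}
```

Unlike prefix truncation, every part of `index.html` ends up in some chunk, so request-construction code near the end of the file is not lost.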
- [ ] **Step 4: Add analysis preview and risk banner in Web UI**
Update `frontend/scene-generator/sg_scene_generator.html` and `frontend/scene-generator/server.js` to preview:
- detected archetype
- bootstrap
- key endpoints
- extracted params
- workflow steps
- confidence and risk notes
- [ ] **Step 5: Verify with marketing and tq reference scenes**
Run local analysis against the two reference scenes and confirm:
- `marketing-zero-consumer-report` no longer resolves the wrong domain
- `tq-lineloss-report` still identifies mode-related structures
- generated preview no longer shows generic hardcoded report params
- [ ] **Step 6: Commit**
```bash
git add frontend/scene-generator/llm-client.js frontend/scene-generator/generator-runner.js frontend/scene-generator/server.js frontend/scene-generator/sg_scene_generator.html src/generated_scene/generator.rs
git commit -m "fix(generator): remove hardcoded report defaults and improve bootstrap extraction"
```
---
### Task 2: Introduce Unified Scene IR
**Files:**
- Create: `src/generated_scene/ir.rs`
- Modify: `src/generated_scene/generator.rs`
- Modify: `src/bin/sg_scene_generate.rs`
- Modify: `frontend/scene-generator/server.js`
- Modify: `frontend/scene-generator/llm-client.js`
**Goal:** Introduce a single intermediate representation that all extraction and compilation stages use.
- [ ] **Step 1: Add Rust `Scene IR` structs**
Create `src/generated_scene/ir.rs` with serde-enabled structs for:
- `SceneIr`
- `BootstrapIr`
- `ParamIr`
- `ModeIr`
- `WorkflowStepIr`
- `ArtifactContractIr`
- `NormalizeRulesIr`
- `ReadinessIr`
- `EvidenceIr`
Minimum top-level fields:
```json
{
"sceneId": "",
"sceneName": "",
"sceneKind": "",
"workflowArchetype": "",
"bootstrap": {},
"params": [],
"modes": [],
"workflowSteps": [],
"requestTemplate": {},
"responsePath": "",
"normalizeRules": {},
"artifactContract": {},
"validationHints": {},
"evidence": []
}
```
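A std-only sketch of how these top-level fields might map onto Rust structs is shown here. It is not the contents of `src/generated_scene/ir.rs`: the serde derives are left out so the snippet compiles without external crates (the real file would presumably derive `Serialize`/`Deserialize` and use `#[serde(rename_all = "camelCase")]`), `JsonObject` is a placeholder for contracts that are not yet defined, and the two `BootstrapIr` fields are assumed from the `expected_domain` / `target_url` names used in Task 1.

```rust
use std::collections::BTreeMap;

/// Stand-in for a free-form JSON object until the real contracts are defined.
type JsonObject = BTreeMap<String, String>;

#[derive(Debug, Default, Clone, PartialEq)]
struct BootstrapIr {
    expected_domain: String,
    target_url: String,
}

/// Field comments give the JSON key each Rust field corresponds to.
#[derive(Debug, Default, Clone, PartialEq)]
struct SceneIr {
    scene_id: String,               // "sceneId"
    scene_name: String,             // "sceneName"
    scene_kind: String,             // "sceneKind"
    workflow_archetype: String,     // "workflowArchetype"
    bootstrap: BootstrapIr,         // "bootstrap"
    params: Vec<JsonObject>,        // "params"
    modes: Vec<JsonObject>,         // "modes"
    workflow_steps: Vec<JsonObject>, // "workflowSteps"
    request_template: JsonObject,   // "requestTemplate"
    response_path: String,          // "responsePath"
    normalize_rules: JsonObject,    // "normalizeRules"
    artifact_contract: JsonObject,  // "artifactContract"
    validation_hints: JsonObject,   // "validationHints"
    evidence: Vec<JsonObject>,      // "evidence"
}
```

Deriving `Default` means a partially known scene can be expressed with struct-update syntax, which fits the backward-compatible "partial IR" translation described in Step 2.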
- [ ] **Step 2: Wire `Scene IR` into generator entrypoints**
Update `src/bin/sg_scene_generate.rs` to accept either:
- `--scene-info-json` upgraded to the new IR contract, or
- a new `--scene-ir-json` / `--scene-ir-file` parameter
Keep backward compatibility by translating old scene info into partial IR where needed.
- [ ] **Step 3: Refactor generator to compile from IR**
Update `src/generated_scene/generator.rs` so its internal interfaces no longer directly depend on loosely grouped fields like `expectedDomain`, `staticParams`, and `columnDefs` alone. It should compile from unified `SceneIr`.
- [ ] **Step 4: Update Node server to pass IR through generation**
Modify `frontend/scene-generator/server.js` so analyze endpoints return IR-shaped JSON and generate endpoints pass the same structure into Rust without flattening.
- [ ] **Step 5: Verify serde and CLI compatibility**
Run:
```bash
cargo check
node --check frontend/scene-generator/server.js
node --check frontend/scene-generator/llm-client.js
```
Expected: `cargo check` succeeds and both Node files pass the syntax check with the new IR contract.
- [ ] **Step 6: Commit**
```bash
git add src/generated_scene/ir.rs src/generated_scene/generator.rs src/bin/sg_scene_generate.rs frontend/scene-generator/server.js frontend/scene-generator/llm-client.js
git commit -m "feat(generator): introduce unified scene ir for analysis and compilation"
```
---
### Task 3: Build Hybrid Extraction Pipeline
**Files:**
- Modify: `src/generated_scene/analyzer.rs`
- Modify: `frontend/scene-generator/generator-runner.js`
- Modify: `frontend/scene-generator/llm-client.js`
- Modify: `frontend/scene-generator/server.js`
**Goal:** Split extraction into deterministic signal collection plus LLM semantic completion.
- [ ] **Step 1: Implement deterministic extraction helpers**
Add helper logic in `src/generated_scene/analyzer.rs` or adjacent extraction code to detect:
- URLs and request methods
- `contentType`
- request payload builders
- pagination variables such as `page`, `rows`, `pageSize`
- branch variables such as `period_mode`, `reportType`
- entry methods
- export methods
- obvious filter expressions such as `charge !== 0`
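Detection of pagination and branch variables can start as a whole-identifier scan. This is a deliberately naive sketch with hypothetical helper names; it also matches identifiers inside string literals and comments, which a real analyzer would need to filter out.

```rust
/// True when `name` occurs in `source` as a whole identifier, not as a substring.
fn has_identifier(source: &str, name: &str) -> bool {
    source
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '_' || c == '$'))
        .any(|token| token == name)
}

/// Returns the subset of `names` that appear in `source`, in the order given.
fn detect_identifiers<'a>(source: &str, names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| has_identifier(source, name))
        .collect()
}
```

Matching whole identifiers avoids false hits such as `homepage` being counted as `page`.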
- [ ] **Step 2: Create key-fragment selection in Node runner**
Update `frontend/scene-generator/generator-runner.js` to extract and package:
- directory summary
- URL fragments
- branch fragments
- request-body fragments
- response normalization fragments
- export fragments
for LLM analysis.
- [ ] **Step 3: Redesign LLM prompt for workflow understanding**
Update `frontend/scene-generator/llm-client.js` so the prompt explicitly asks for:
- `workflowArchetype`
- `bootstrap`
- `params`
- `modes`
- `workflowSteps`
- `requestTemplate`
- `responsePath`
- `normalizeRules`
- `artifactContract`
- `confidence`
- `uncertainties`
- [ ] **Step 4: Merge deterministic and LLM results**
Implement merge logic in `frontend/scene-generator/server.js` or a dedicated helper:
- deterministic extraction wins for hard facts
- LLM fills missing semantics
- conflicts are surfaced in preview as warnings
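The three merge rules can be expressed per field as below. The plan puts the merge in `server.js` or a helper, so this Rust version is only an illustration, and `MergedField` / `merge_field` are assumed names.

```rust
#[derive(Debug, PartialEq, Eq)]
struct MergedField {
    value: Option<String>,
    warning: Option<String>,
}

/// Deterministic extraction wins, the LLM fills gaps, and disagreements become warnings.
fn merge_field(name: &str, deterministic: Option<&str>, llm: Option<&str>) -> MergedField {
    match (deterministic, llm) {
        // Both sides disagree: keep the hard fact and surface the conflict.
        (Some(d), Some(l)) if d != l => MergedField {
            value: Some(d.to_string()),
            warning: Some(format!(
                "{}: kept deterministic `{}`, LLM proposed `{}`",
                name, d, l
            )),
        },
        // Deterministic value present (and not contradicted).
        (Some(d), _) => MergedField { value: Some(d.to_string()), warning: None },
        // Only the LLM has a value: it fills the missing semantics.
        (None, Some(l)) => MergedField { value: Some(l.to_string()), warning: None },
        (None, None) => MergedField { value: None, warning: None },
    }
}
```

The collected `warning` strings are what the preview would show next to the affected field.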
- [ ] **Step 5: Verify against reference workflows**
Check that:
- `marketing-zero-consumer-report` emits workflow steps including `paginate`, `secondary_request`, `filter`, and `export`
- `tq-lineloss-report` emits `modes`, `defaultMode`, and `modeSwitchField`
- [ ] **Step 6: Commit**
```bash
git add src/generated_scene/analyzer.rs frontend/scene-generator/generator-runner.js frontend/scene-generator/llm-client.js frontend/scene-generator/server.js
git commit -m "feat(generator): add hybrid deterministic and llm workflow extraction"
```
---
### Task 4: Add Workflow Archetype Classification
**Files:**
- Modify: `src/generated_scene/analyzer.rs`
- Modify: `src/generated_scene/ir.rs`
- Modify: `frontend/scene-generator/server.js`
- Modify: `frontend/scene-generator/sg_scene_generator.html`
**Goal:** Reliably classify scenes so the correct compiler path is chosen.
- [ ] **Step 1: Add archetype enum support**
Define and support these initial archetypes:
- `single_request_table`
- `multi_mode_request`
- `paginated_enrichment`
- `page_state_eval`
- [ ] **Step 2: Implement classification rules**
Classification logic should prefer:
1. `multi_mode_request` when explicit mode-switch branching exists
2. `paginated_enrichment` when paginated list fetch plus secondary requests are detected
3. `page_state_eval` when page-state judgment dominates
4. `single_request_table` as fallback with lower confidence
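Read as code, the preference order is a first-match chain. The signal struct and the two-level `Confidence` enum are assumptions for illustration; the plan only says that the fallback carries lower confidence.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Archetype {
    SingleRequestTable,
    MultiModeRequest,
    PaginatedEnrichment,
    PageStateEval,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Confidence {
    Normal,
    Low,
}

/// Boolean signals a deterministic scan might produce (hypothetical shape).
#[derive(Debug, Default, Clone, Copy)]
struct Signals {
    mode_switch_branching: bool,
    paginated_list_fetch: bool,
    secondary_requests: bool,
    page_state_dominant: bool,
}

/// Applies the four rules in order; the first rule that matches wins.
fn classify(s: &Signals) -> (Archetype, Confidence) {
    if s.mode_switch_branching {
        (Archetype::MultiModeRequest, Confidence::Normal)
    } else if s.paginated_list_fetch && s.secondary_requests {
        (Archetype::PaginatedEnrichment, Confidence::Normal)
    } else if s.page_state_dominant {
        (Archetype::PageStateEval, Confidence::Normal)
    } else {
        (Archetype::SingleRequestTable, Confidence::Low)
    }
}
```

Note that pagination alone, without secondary requests, falls through to the later rules instead of selecting `paginated_enrichment`.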
- [ ] **Step 3: Expose classification confidence**
Add confidence and evidence fields to the preview payload so UI can show why a scene was classified into an archetype.
- [ ] **Step 4: Add manual override support in UI**
Allow users to override archetype in `frontend/scene-generator/sg_scene_generator.html` before final generation, but preserve the original detected result and confidence.
- [ ] **Step 5: Verify reference classifications**
Expected:
- `marketing-zero-consumer-report` => `paginated_enrichment`
- `tq-lineloss-report` => `multi_mode_request`
- [ ] **Step 6: Commit**
```bash
git add src/generated_scene/analyzer.rs src/generated_scene/ir.rs frontend/scene-generator/server.js frontend/scene-generator/sg_scene_generator.html
git commit -m "feat(generator): classify scenes by workflow archetype with confidence"
```
---
### Task 5: Split Generator Into Archetype Compilers
**Files:**
- Modify: `src/generated_scene/generator.rs`
- Optionally create: `src/generated_scene/compiler_single_request.rs`
- Optionally create: `src/generated_scene/compiler_multi_mode.rs`
- Optionally create: `src/generated_scene/compiler_paginated_enrichment.rs`
- Optionally create: `src/generated_scene/compiler_page_state.rs`
**Goal:** Replace the single generic report template with explicit compiler paths.
- [ ] **Step 1: Add compiler routing by archetype**
Update `src/generated_scene/generator.rs` so generation dispatches on `workflowArchetype`.
- [ ] **Step 2: Implement `single_request_table` compiler**
Generate:
- minimal `scene.toml`
- direct request browser script
- artifact output for simple table/list data
- [ ] **Step 3: Implement `multi_mode_request` compiler**
Generate:
- mode detection
- mode-specific request builders
- mode-specific column definitions
- mode-specific response extraction
- unified artifact output
Reference target: `tq-lineloss-report`
- [ ] **Step 4: Implement `paginated_enrichment` compiler**
Generate:
- paginated list loop
- per-item or batched secondary requests
- aggregation and transform steps
- business filters
- final artifact or export output
Reference target: `marketing-zero-consumer-report`
- [ ] **Step 5: Implement `page_state_eval` compiler**
Generate:
- state-check script skeleton
- light artifact semantics for monitoring or status checks
- [ ] **Step 6: Verify generated outputs by archetype**
Validate that generated scripts no longer:
- define multiple API endpoints but use only the first
- collapse mode-aware scenes into one request body
- flatten paginated enrichment scenes into one-step normalization
- [ ] **Step 7: Commit**
```bash
git add src/generated_scene/generator.rs src/generated_scene/compiler_*.rs
git commit -m "feat(generator): split scene generation into workflow archetype compilers"
```
---
### Task 6: Align Runtime Resolver Contracts
**Files:**
- Modify: `src/compat/scene_platform/resolvers.rs`
- Modify: `src/generated_scene/generator.rs`
- Modify: `src/generated_scene/ir.rs`
- Modify: `tests/scene_generator_test.rs`
**Goal:** Ensure generated parameter contracts are either executable by the runtime or explicitly flagged as unsupported.
- [ ] **Step 1: Audit current resolver coverage**
Document which current contracts are already supported, including:
- `dictionary_entity`
- `month_week_period`
- `fixed_enum`
- `literal_passthrough`
- [ ] **Step 2: Add missing resolver types or gate them**
Choose one of these paths per parameter type:
1. implement new runtime resolver support
2. downgrade generation to an existing supported resolver
3. block generation with explicit readiness warning
Recommended additions:
- `mode_enum`
- `date_range`
- `org_tree`
- `page_size`
- `hidden_static`
- `derived_param`
- [ ] **Step 3: Reflect runtime compatibility in generated metadata**
Generated output should clearly indicate:
- supported params
- unresolved params
- manual-completion requirements
- [ ] **Step 4: Add tests for resolver alignment**
Extend tests to ensure a generated skill cannot claim runnable readiness when its params require unsupported resolver behavior.
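The invariant that such a test protects can be sketched as follows. The sketch assumes that the four resolver types listed in Step 1 are the supported ones, which is exactly what the audit is meant to confirm; the function names are hypothetical.

```rust
/// Assumption: these four resolver types are the ones the runtime already supports.
const SUPPORTED_RESOLVERS: [&str; 4] = [
    "dictionary_entity",
    "month_week_period",
    "fixed_enum",
    "literal_passthrough",
];

/// Returns the resolver types a generated skill needs but the runtime lacks.
fn unsupported_resolvers<'a>(required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|r| !SUPPORTED_RESOLVERS.iter().any(|s| s == r))
        .collect()
}

/// A skill may claim runnable readiness only if nothing is unsupported.
fn may_claim_runnable(required: &[&str]) -> bool {
    unsupported_resolvers(required).is_empty()
}
```

The list returned by `unsupported_resolvers` doubles as the "unresolved params" section of the generated metadata described in Step 3.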
- [ ] **Step 5: Commit**
```bash
git add src/compat/scene_platform/resolvers.rs src/generated_scene/generator.rs src/generated_scene/ir.rs tests/scene_generator_test.rs
git commit -m "feat(runtime): align generated scene contracts with resolver support"
```
---
### Task 7: Add Readiness Gates And Generation Report
**Files:**
- Modify: `frontend/scene-generator/server.js`
- Modify: `frontend/scene-generator/sg_scene_generator.html`
- Modify: `src/generated_scene/ir.rs`
- Modify: `src/generated_scene/generator.rs`
**Goal:** Make generation output self-describing so users know whether a skill is ready for internal-network trial.
- [ ] **Step 1: Add static readiness checks**
Implement checks for:
- entrypoint detection
- request-chain completeness
- bootstrap plausibility
- param/runtime compatibility
- archetype compiler completeness
- [ ] **Step 2: Add readiness levels**
Define:
- `A` = ready for direct internal-network trial
- `B` = structurally correct, human review recommended
- `C` = draft only, manual completion required
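How the Step 1 checks feed these levels is not fixed by the plan. The sketch below assumes one plausible mapping: any failed check yields `C`, and a fully passing scene is `A` only when classification confidence is high. All names are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Readiness {
    A, // ready for direct internal-network trial
    B, // structurally correct, human review recommended
    C, // draft only, manual completion required
}

/// Results of the five static readiness checks from Step 1.
#[derive(Debug, Default, Clone, Copy)]
struct StaticChecks {
    entrypoint_detected: bool,
    request_chain_complete: bool,
    bootstrap_plausible: bool,
    params_runtime_compatible: bool,
    archetype_compiler_complete: bool,
}

/// Assumed mapping: any failed check gives C; otherwise A with high confidence, else B.
fn grade(checks: &StaticChecks, high_confidence: bool) -> Readiness {
    let all_pass = checks.entrypoint_detected
        && checks.request_chain_complete
        && checks.bootstrap_plausible
        && checks.params_runtime_compatible
        && checks.archetype_compiler_complete;
    match (all_pass, high_confidence) {
        (false, _) => Readiness::C,
        (true, true) => Readiness::A,
        (true, false) => Readiness::B,
    }
}
```

With this mapping, a scene that has not been checked at all (`StaticChecks::default()`) grades as `C`, which keeps the grading fail-closed.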
- [ ] **Step 3: Generate human-readable report**
Each analysis or generation result should include:
- archetype
- confidence
- key evidence
- detected risks
- missing pieces
- readiness level
- [ ] **Step 4: Display readiness in Web UI**
Show the readiness grade before generation and after generation, with explicit warnings for internal-network execution risk.
- [ ] **Step 5: Verify readiness outcomes**
Expected baseline:
- `tq-lineloss-report` should reach `A` or high-confidence `B`
- `marketing-zero-consumer-report` should not be labeled runnable unless pagination and secondary-request logic are correctly represented
- [ ] **Step 6: Commit**
```bash
git add frontend/scene-generator/server.js frontend/scene-generator/sg_scene_generator.html src/generated_scene/ir.rs src/generated_scene/generator.rs
git commit -m "feat(generator): add readiness grading and generation risk reporting"
```
---
### Task 8: Add Regression Coverage For Reference Scenes
**Files:**
- Modify: `tests/scene_generator_test.rs`
- Create/Modify: scene generator fixtures as needed
**Goal:** Lock in the two reference scenes as ongoing regression cases.
- [ ] **Step 1: Add marketing classification fixture coverage**
Test that the marketing source scene is classified as `paginated_enrichment` and contains evidence for:
- paginated list request
- secondary request
- filter rule
- export step
- [ ] **Step 2: Add tq classification fixture coverage**
Test that the tq source scene is classified as `multi_mode_request` and contains evidence for:
- month mode
- week mode
- distinct request templates
- distinct column definitions
- [ ] **Step 3: Add generation-shape assertions**
Assert that generated outputs differ by archetype and do not collapse to a single generic template shape.
- [ ] **Step 4: Run verification**
```bash
cargo test --test scene_generator_test -- --nocapture
```
Expected: both reference cases pass and guard against regression.
- [ ] **Step 5: Commit**
```bash
git add tests/scene_generator_test.rs tests/fixtures
git commit -m "test(generator): add regression coverage for marketing and tq reference scenes"
```
---
## Delivery Sequence
Recommended implementation order:
1. Task 1: hard failure fixes
2. Task 2: `Scene IR`
3. Task 3: hybrid extraction
4. Task 4: archetype classification
5. Task 5: compiler split
6. Task 6: resolver alignment
7. Task 7: readiness gates
8. Task 8: regression coverage
Rationale:
- Task 1 stops current bad outputs early.
- Tasks 2 to 5 establish the new compiler backbone.
- Tasks 6 and 7 prevent false claims of runnability.
- Task 8 locks the new architecture against regression.
---
## Verification Strategy
### Static Verification
- `cargo check`
- `cargo test --test scene_generator_test -- --nocapture`
- `node --check frontend/scene-generator/llm-client.js`
- `node --check frontend/scene-generator/generator-runner.js`
- `node --check frontend/scene-generator/server.js`
### Functional Verification
For `marketing-zero-consumer-report`:
- detected as `paginated_enrichment`
- bootstrap resolves to business domain, not static script host
- generated workflow includes pagination and secondary requests
- generation is not labeled runnable if those steps are missing
For `tq-lineloss-report`:
- detected as `multi_mode_request`
- month and week logic remain distinct
- request templates and column definitions are mode-specific
### UI Verification
Confirm the scene generator UI now shows:
- detected archetype
- confidence
- bootstrap
- key params
- readiness grade
- risk notes
---
## Acceptance Criteria
This plan is complete when all of the following are true:
1. `sg_scene_generate` consumes a unified `Scene IR`.
2. The analysis pipeline can distinguish at least `single_request_table`, `multi_mode_request`, `paginated_enrichment`, and `page_state_eval`.
3. `tq-lineloss-report` is generated through the multi-mode compiler path.
4. `marketing-zero-consumer-report` is generated through the paginated-enrichment compiler path.
5. Generated `scene.toml` no longer injects unrelated default org/period assumptions.
6. Bootstrap resolution no longer mistakes external script hosts for business target domains.
7. Runtime resolver compatibility is explicit, not implicit.
8. Generation results include readiness grading and risk reporting before internal-network trial.


@@ -0,0 +1,193 @@
# G1 Boundary Tightening and Family Reassignment Implementation Plan
> Date: 2026-04-18
> Status: Draft
> Source:
> - `docs/superpowers/specs/2026-04-17-scene-skill-60-to-90-roadmap-design.md`
> - `examples/g1_batch_round1/`
## 1. Plan Intent
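This plan addresses the overly broad boundary of the `G1` general single-page report group.
Hands-on testing and structural analysis of the following 4 boundary samples have confirmed that the current `G1` classification admits scenes it should not:
1. `高低压新增报装容量月度统计表`
2. `电能表现场检验完成率指标报表`
3. `计量资产库存统计`
4. `95598供电服务月报`
The conclusion is not "keep observing" but "rectification is required":
1. The definition of `G1` must be tightened
2. These 4 samples must be reassigned
3. Later implementation must follow the new boundary and must not keep these 4 samples mixed in one class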
## 2. Rectification Objective
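The goals of this rectification round are fixed as:
1. Tighten the `G1` definition so that it stops polluting `single_request_table`
2. Reassign the 4 boundary samples to the correct families
3. Give later implementation a clear order, so that boundary samples are no longer lumped together as "general reports"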
## 3. Final Reassignment Decision
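For the duration of this plan, the formal classification of the 4 samples is fixed as follows:
1. `高低压新增报装容量月度统计表`
- Stays in `G1`
- Subtype label: `G1-E`, light enrichment summary type
2. `电能表现场检验完成率指标报表`
- Split out of `G1`
- New family label: `G6`, host-bridged multi-step query type
3. `计量资产库存统计`
- Split out of `G1`
- New family label: `G7`, multi-endpoint inventory summary type
4. `95598供电服务月报`
- Split out of `G1`
- New family label: `G8`, fetch, local-store, analyze, and document-output type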
## 4. Scope Guardrails
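The boundaries of this plan are fixed as follows:
1. The line-loss family `G2` is not modified
2. The reassignment is not extended to all `102` scenes
3. Only the `G1` boundary definition and these 4 boundary samples are handled
4. This plan does not directly implement the full `G6/G7/G8` capabilities
5. This plan prioritizes producing "boundary tightening + family reassignment + implementation order"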
## 5. Phase Overview
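The execution order is fixed as:
`Phase 0 -> Phase 1 -> Phase 2 -> Phase 3`
### Phase 0: Freeze the Rectification Criteria
Goals:
1. Freeze the revised `G1` definition
2. Freeze the formal reassignment of the 4 samples
Exit criteria:
1. Later discussion no longer treats these 4 samples together as `G1` candidates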
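### Phase 1: Tighten the G1 Boundary
Goals:
1. Explicitly narrow `G1` to "general single-page reports"
2. List the structural features that do not belong to `G1` explicitly as exclusion conditions
Required outputs:
1. Revised `G1` definition
2. `G1` entry conditions
3. `G1` exclusion conditions
4. A note on `G1-E` as the upper-boundary subtype
Exit criteria:
1. `single_request_table` no longer takes in host-bridged, inventory, or store-and-analyze scenes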
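### Phase 2: Sample Reassignment and Family Records
Goals:
1. Formally move the 4 samples into their families
2. Establish minimal definitions for `G6/G7/G8`
Required outputs:
1. Sample reassignment table
2. `G6` minimal definition
3. `G7` minimal definition
4. `G8` minimal definition
Exit criteria:
1. The 4 samples are no longer in the "ambiguous G1 candidate" state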
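### Phase 3: Fix the Follow-Up Implementation Order
Goals:
1. Determine the follow-up development order
2. Avoid concurrent sprawl across families
Fixed order:
1. First continue with `高低压新增报装容量月度统计表`
- as `G1-E`
2. Then open `G6` separately
- `电能表现场检验完成率指标报表`
3. Then evaluate `G7`
- `计量资产库存统计`
4. Finally evaluate `G8`
- `95598供电服务月报`
Exit criteria:
1. The follow-up task order is clear
2. `G1` stops absorbing new boundary samples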
## 6. Family-Level Rectification Rules
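### 6.1 G1 Revised Rules
`G1` keeps only scenes that satisfy the following:
1. Single system, hosted on a single page
2. A reasonably clear main request chain exists
3. The request template and response path can be recovered directly
4. The final result is a single table or a one-off statistical summary
5. No dependence on complex host bridging
6. No dependence on local persistence or SQL analysis
### 6.2 G1 Exclusion Rules
A scene showing any one of the following features is no longer classified as `G1`:
1. `BrowserAction / sgBrowserExcuteJsCode` drives the business requests
2. A clearly multi-round, callback-chained workflow exists
3. Multiple business endpoints within one scene scan data by category
4. Local persistence, secondary analysis, or SQL aggregation is needed before the report
5. The output is mainly a Word document pipeline rather than a direct table result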
## 7. Implementation Priority
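The priorities are fixed as follows:
1. `P0`
- `高低压新增报装容量月度统计表`
- Goal: verify whether `G1-E` holds up as the stable upper boundary of `G1`
2. `P1`
- `电能表现场检验完成率指标报表`
- Goal: validate the minimal workflow definition of `G6`
3. `P2`
- `计量资产库存统计`
- Goal: validate the multi-endpoint aggregation boundary of `G7`
4. `P3`
- `95598供电服务月报`
- Goal: validate the boundary of the `G8` fetch, store, and analyze chain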
## 8. Deliverables
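On completion this plan produces at least:
1. Revised `G1` boundary text
2. Reassignment table for the 4 boundary samples
3. Minimal family definitions for `G6/G7/G8`
4. Follow-up implementation priority list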
## 9. Completion Criteria
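This plan is complete when:
1. The `G1` definition has been formally tightened
2. The 4 boundary samples have been formally reassigned
3. `高低压新增报装容量月度统计表` has been confirmed as `G1-E`
4. `电能表现场检验完成率指标报表`, `计量资产库存统计`, and `95598供电服务月报` are no longer used as `G1` samples
5. The follow-up development order is fixed and boundary ownership is no longer re-debated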


@@ -0,0 +1,212 @@
# G1-E Light Enrichment Report Plan
> Date: 2026-04-18
> Status: Draft
> Source:
> - `docs/superpowers/specs/2026-04-18-g1-e-light-enrichment-report-design.md`
> - `docs/superpowers/specs/2026-04-17-scene-skill-60-to-90-roadmap-design.md`
> - `docs/superpowers/reports/2026-04-18-g1-boundary-reassignment-report.md`
## 1. Plan Intent
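This plan moves `G1-E` (light enrichment summary type) from a conceptual boundary to an implementable state.
This round solves exactly one problem:
1. Enable the generator to recover compilable three-part business semantics for scenes shaped as "single main request + a few enrichment lookups + one summary output".
This plan does not handle `G6/G7/G8` and does not extend to other families.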
## 2. Scope
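Only three kinds of work are in scope:
1. Completing the `G1-E` evidence layer
2. Landing the three-part `Scene IR` / compiler gates for `G1-E`
3. Validating the P0 reference sample `高低压新增报装容量月度统计表`
Explicitly excluded:
1. `G6`, host-bridged multi-step query type
2. `G7`, multi-endpoint inventory summary type
3. `G8`, fetch, local-store, analyze, and document-output type
4. Large-scale family expansion across the `102` scenes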
## 3. Fixed Sample
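The only P0 reference sample for this plan is fixed as:
1. `高低压新增报装容量月度统计表`
The frozen targets for this sample are:
1. Main request: `getWkorderAll`
2. Enrichment requests:
- `queryElectCustInfo`
- `queryBusAcpt`
- `getBatchPerCust97`
3. The final result is recovered as a three-part structure: main request, enrichment requests, and merge rules
No second `G1-E` sample is added before this plan is complete.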
## 4. Phase Overview
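The execution order is fixed as:
`Phase 0 -> Phase 1 -> Phase 2 -> Phase 3`
### Phase 0: Freeze Contract
Goals:
1. Freeze the minimal definition of `G1-E`
2. Freeze the target definitions of the main chain, enrichment chain, and merge chain for the P0 sample
Required outputs:
1. `G1-E` spec
2. Target-structure note for the P0 sample
3. Failure classification criteria
Exit criteria:
1. Later implementation does not fall back to plain `G1 single_request_table`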
### Phase 1: Evidence Layer Completion
目标:
1. 让提取链路可以显式产出 `main_request` 证据
2. 让提取链路可以显式产出 `enrichment_request` 证据
3. 让提取链路可以显式产出 `merge_plan` 证据
必须落地的对象:
1. `main_request` 证据 schema
2. `enrichment_request` 证据 schema
3. `merge_plan` 证据 schema
4. 对应的越界识别信号
退出标准:
1. P0 样板不再只落到 `page_state_eval`
2. 提取结果中能看见主请求、补查请求、并回规则候选
### Phase 2: Scene IR And Compiler Gates
目标:
1.`Scene IR` 中承载三段式结构
2. 在 compiler 中增加 `G1-E` 专属 gate
3. 防止缺失补查契约的结果误判为普通 `G1` 成功
必须落地的对象:
1. `main_request`
2. `enrichment_requests[]`
3. `merge_plan`
4. `main_request_resolved`
5. `enrichment_requests_resolved`
6. `merge_plan_resolved`
7. `g1e_scope_compatible`
退出标准:
1. `G1-E` 可以独立于 `single_request_table` 被判定
2. 越界样本会被阻断,而不是伪成功
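The three-part structure and the four gates named in Phase 2 can be pictured as follows. This is a minimal sketch under stated assumptions: the struct layout, the `String` placeholder for `merge_plan`, and the `out_of_scope_signals` field are hypothetical; only the gate names and the IR slot names come from this plan.

```rust
/// Hypothetical request record; a real IR would carry templates and paths.
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub endpoint: String,
}

/// Assumed shape of the three-part `G1-E` Scene IR.
#[derive(Debug, Default)]
pub struct G1eSceneIr {
    pub main_request: Option<Request>,
    pub enrichment_requests: Vec<Request>,
    pub merge_plan: Option<String>, // placeholder representation
    pub out_of_scope_signals: Vec<String>, // e.g. G6/G7/G8 indicators (assumed)
}

/// Returns the names of all gates that fail; an empty list means the IR
/// passes every `G1-E` gate in this sketch.
pub fn failed_gates(ir: &G1eSceneIr) -> Vec<&'static str> {
    let mut failed = Vec::new();
    if ir.main_request.is_none() {
        failed.push("main_request_resolved");
    }
    if ir.enrichment_requests.is_empty() {
        failed.push("enrichment_requests_resolved");
    }
    if ir.merge_plan.is_none() {
        failed.push("merge_plan_resolved");
    }
    if !ir.out_of_scope_signals.is_empty() {
        failed.push("g1e_scope_compatible");
    }
    failed
}
```

Reporting every failed gate, rather than stopping at the first, keeps the blocking reason explainable when several parts of the contract are missing at once.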
### Phase 3: P0 Validation
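Goals:
1. Use `高低压新增报装容量月度统计表` to validate the minimal `G1-E` closed loop
2. Freeze the first version of the acceptance baseline
Must deliver:
1. P0 reference-sample generation result
2. P0 reference-sample validation record
3. P0 reference-sample failure attribution record
Exit criteria:
1. The main request, enrichment requests, and merge-back rule are all recovered reliably
2. The result is no longer an empty shell `params=[] / requestEntries=[] / columnDefs=[]`
3. The system fails closed when evidence is missing
## 5. Work Breakdown
### Task Group A: G1-E Evidence Modeling
Task goals:
1. Define the main-request evidence object
2. Define the enrichment-request evidence object
3. Define the merge-back-rule evidence object
4. Specify the signals that indicate a scene crosses into `G6/G7/G8`
Done when:
1. `G1-E` no longer depends on the fuzzy judgment of "does the whole text look like a report"
### Task Group B: G1-E IR / Compiler Integration
Task goals:
1. Build a three-part `Scene IR` for `G1-E`
2. Add the `G1-E` gates
3. Close the path where "enrichment is missing but the scene still succeeds as ordinary G1"
Done when:
1. The success conditions of `G1-E` and `G1` are formally separated
### Task Group C: P0 Sample Verification
Task goals:
1. Regenerate `高低压新增报装容量月度统计表`
2. Check that the main request, enrichment requests, and merge-back rule are complete
3. Produce a validation report
Done when:
1. `高低压新增报装容量月度统计表` becomes the first standard reference sample for `G1-E`
## 6. Deliverables
On completion, this plan produces at least:
1. `G1-E` evidence layer implementation
2. `G1-E` three-part `Scene IR`
3. `G1-E` compiler gates
4. P0 generation and validation results for `高低压新增报装容量月度统计表`
5. The corresponding remediation or validation report
## 7. Acceptance Criteria
This plan is complete when:
1. `G1-E` has moved from a documented definition to an implementable, verifiable state
2. `高低压新增报装容量月度统计表` is no longer mis-generated as an ordinary empty-shell `G1` skill
3. The generator can explicitly recover:
- the main request
- the enrichment requests
- the merge-back rule
4. When evidence is insufficient or the structure is out of scope, the system blocks and explains why
## 8. Execution Guardrails
The following boundaries must be respected during execution:
1. Do not mix `G6/G7/G8` capabilities into `G1-E` ahead of time
2. Do not add a second `G1-E` reference sample
3. Do not loosen gates just to "get a skill generated first"
4. Do not let `G1-E` degrade back into plain `single_request_table`
## 9. Next Plan
After this plan completes, the follow-up order is fixed as:
1. If `G1-E` P0 validation passes, then decide whether to add a second `G1-E` reference sample
2. Then move on to an independent spec / plan for `G6`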

View File

@@ -0,0 +1,304 @@
# G2 Family Expansion Remediation Plan
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
> **Upstream Inputs:**
> [2026-04-18-g2-remediation-plan.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/plans/2026-04-18-g2-remediation-plan.md)
> [2026-04-18-g2-second-round-remediation-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-18-g2-second-round-remediation-report.md)
## Plan Intent
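This plan picks up from the previous round of remediation on the `G2` main sample and moves the goal from "getting the `tq` main sample working" to "extending coverage to `G2` family variants".
The previous round established:
1. `台区线损大数据-月_周累计线损率统计分析` can enter the candidate validation list
2. The `G2` main-sample chain is now compilable
3. `白银线损周报`
4. `线损同期差异报表`
Items 3 and 4, the two remaining real samples, still consistently `fail-close`.
The core goal of this plan is therefore not to redo the main-sample remediation, but to fill in recognition and contract recovery for the two remaining variant classes inside the `G2` family.
## Success Baseline
The minimum success criteria after this plan completes are fixed as:
1. `白银线损周报` is no longer blocked outright because the `G2` contract is missing
2. `线损同期差异报表` is no longer forced into the `tq` main-report template
3. The generator can clearly distinguish at least two newly added `G2` family subtypes
4. Each new subtype has its own minimal explainable contract
5. Samples with insufficient evidence continue to `fail-close`
6. Readiness stays consistent with "whether the sample reaches the candidate validation list"
7. A third-round `G2` family expansion regression report is produced
## Scope Guardrails
The following boundaries stay unchanged during execution:
1. Do not switch to `G1`
2. Do not switch to `G3`
3. Do not start unified login, hidden-domain login, or host transport refactoring
4. Do not extend to the full set of 102 scenes
5. Do not let this plan grow into a rewrite of the general scene skill platform
6. Do not overturn the criteria already converged for the `tq` main sample in the previous round
## Target Samples
This plan runs only against the following three real `G2` family samples:
1. `台区线损大数据-月_周累计线损率统计分析`
2. `白银线损周报`
3. `线损同期差异报表`
Their roles are:
1. `台区线损大数据-月_周累计线损率统计分析`
Role: `G2-A` main-sample baseline; regression is not allowed
2. `白银线损周报`
Role: `G2-B` weekly-report single-sided mode variant
3. `线损同期差异报表`
Role: `G2-C` mixed-linkage variant
## Family Expansion Hypothesis
Based on the previous report, this plan first narrows the `G2` family to three classes:
1. `G2-A`
Definition: `tq` main-report type with a stable `month/week + cols1/cols2 + mode-specific request/response`
Current status: already on the candidate validation list
2. `G2-B`
Definition: weekly-report variant leaning to a single-sided mode; it has `week/tjzq` and the main line-loss interface, but lacks a dual-mode column contract on par with the main sample
Current representative: `白银线损周报`
3. `G2-C`
Definition: mixed variant in which the main line-loss chain is linked with an external system, so line-loss interfaces and linkage interfaces coexist
Current representative: `线损同期差异报表`
The remediation principles of this plan are:
1. Do not force `G2-B/G2-C` into the shape of `G2-A`
2. First establish the boundaries of the three subtypes
3. Then give each subtype its own minimal contract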
## Workstreams
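This plan is split into five workstreams:
1. `WS1` G2 subtype layering and classification convergence
2. `WS2` G2-B weekly-report variant contract completion
3. `WS3` G2-C mixed-linkage variant isolation
4. `WS4` G2 family readiness grading rework
5. `WS5` Third-round real-sample regression and report
## Phase Overview
This plan proceeds in four phases:
1. Phase 0: Freeze the family expansion goals
2. Phase 1: Establish the `G2-A/G2-B/G2-C` subtype boundaries
3. Phase 2: Complete the minimal contracts for `G2-B/G2-C` separately
4. Phase 3: Regress the three real samples and produce the expansion report
Execution order is fixed as:
`Phase 0 -> Phase 1 -> Phase 2 -> Phase 3`
## Phase 0: Freeze Expansion Goals
### Objective
Freeze the family facts converged in the previous round, so the problem is not again framed as "the main sample is still broken".
### Tasks
1. Freeze the criteria under which `G2-A` already qualifies
2. Freeze the direct blockers of `G2-B` and `G2-C`
3. Freeze the rule that this plan only adds family expansion and does not roll back the main-sample chain
### Exit Criteria
1. The `tq` main sample is treated as the baseline and is no longer a remediation target
2. The family expansion problem is explicitly framed as "missing variant support"
## Phase 1: Establish Subtype Boundaries
### Objective
Let the system tell `G2-A/G2-B/G2-C` apart, instead of routing every `G2` scene through the same path.
### WS1: G2 Subtype Layering and Classification Convergence
#### Task 1
Audit the signal differences among the current real `G2` samples and clarify:
1. Which signals belong to `G2-A`
2. Which signals belong to `G2-B`
3. Which signals belong to `G2-C`
#### Task 2
Add subtype classification rules to `G2` that can distinguish at least:
1. Dual-mode main-report type
2. Weekly-report single-sided mode type
3. Mixed-linkage type
#### Task 3
Add fixtures and regression tests proving that:
1. `G2-A` does not regress
2. `G2-B` is no longer mis-mapped onto `G2-A`
3. `G2-C` is no longer mis-mapped onto `G2-A`
### Phase 1 Exit Criteria
1. The `G2` family can be layered internally
2. The generation path no longer assumes every `G2` scene is a `tq` main report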
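## Phase 2: Complete Minimal Variant Contracts
### Objective
Build a "small enough but explainable" contract for `G2-B` and for `G2-C`.
### WS2: G2-B Weekly-Report Variant Contract Completion
#### Task 4
Define the minimal contract for `G2-B`, including at least:
1. The primary mode or primary period field
2. The corresponding request template
3. The corresponding response path
4. The corresponding column/required fields
#### Task 5
Modify the analyzer / generator / scene ir assembly logic so that `白银线损周报` outputs a non-empty contract instead of being blocked outright for a missing contract.
#### Task 6
Add or update tests proving that `G2-B` can stand on its own without the full `month/week` dual-mode structure.
### WS3: G2-C Mixed-Linkage Variant Isolation
#### Task 7
Audit, within `线损同期差异报表`:
1. The main line-loss chain
2. The linkage chain with the 同期 (same-period) system
3. Which part belongs to the main-report contract
#### Task 8
Establish isolation rules for `G2-C` so that mixed-linkage interfaces do not pollute main-report generation.
#### Task 9
Define the minimal compilable contract for `G2-C`, allowing:
1. The main chain to enter candidate validation
2. The linkage chain to be retained as risk or extension evidence
rather than mixing everything together and failing outright.
#### Task 10
Add or update tests proving that `G2-C` can at least reliably output a structured "main chain + linkage risk" result.
### WS4: G2 Family Readiness Grading Rework
#### Task 11
Add subtype-level readiness gates for `G2-A/G2-B/G2-C`.
#### Task 12
Adjust the readiness grading logic to guarantee that:
1. `G2-A` can reach `A` when the full dual-mode contract is satisfied
2. `G2-B` can reach a candidate validation grade when its minimal contract is satisfied
3. `G2-C` receives an explainable grade even if only the main chain is closed
4. Insufficient evidence still results in `fail-close`
#### Task 13
Add tests proving that readiness no longer misjudges every `G2` subtype by the `G2-A` standard.
### Phase 2 Exit Criteria
1. `G2-B` has a minimal contract
2. `G2-C` has an isolated minimal contract
3. Readiness is consistent with the subtype criteria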
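## Phase 3: Third-Round Real-Sample Regression
### Objective
Using the expanded family capability, regress the three real samples again and publish a formal conclusion.
### WS5: Third-Round Real-Sample Regression and Report
#### Task 14
Regenerate the following three real samples:
1. `台区线损大数据-月_周累计线损率统计分析`
2. `白银线损周报`
3. `线损同期差异报表`
#### Task 15
Compare under unified criteria:
1. Subtype classification
2. bootstrap
3. request contract
4. response / column / normalize contract
5. readiness
6. Whether the sample enters the candidate validation list
#### Task 16
Produce the third-round `G2` family expansion remediation report, stating at least:
1. Whether `G2-A` remains stable
2. Whether `G2-B` enters the candidate validation list
3. Whether `G2-C` enters the candidate validation list or still needs to fail-close
4. Whether the remaining blockers have shifted from "main sample cannot be generated" to "a few variants pending expansion"
### Deliverables
1. `G2` family expansion regression tests
2. Fixtures for `G2-B/G2-C`
3. Third-round real-sample generation results
4. Third-round `G2` family expansion remediation report
### Acceptance Criteria
1. `G2-A` does not regress
2. `G2-B` reaches at least an explainable contract or a candidate validation grade
3. `G2-C` achieves at least successful main-chain isolation and is no longer polluted by bundle-wide noise
4. The three samples are no longer forced through a single `G2-A` model
## File-Level Targets
Executing this plan touches at least the following asset types:
1. analyzer / generator / readiness implementations under `src/generated_scene/`
2. `G2-B/G2-C` fixtures under `tests/fixtures/generated_scene/`
3. Tests under `tests/` related to scene generator / readiness / family regression
4. The third-round family expansion report under `docs/superpowers/reports/`
## Completion Criteria
This plan is complete when:
1. `G2` has moved from "a single main sample works" to "at least three subtypes are distinguishable"
2. `白银线损周报` and `线损同期差异报表` are no longer merely failing closed passively
3. The decision on whether to extend to more line-loss variants can be based on the third-round family expansion report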

View File

@@ -0,0 +1,331 @@
# G2 Family Remediation Plan
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
> **Upstream Spec:** [2026-04-18-g2-remediation-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-18-g2-remediation-design.md)
## Plan Intent
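This plan breaks the `G2` family remediation design into executable tasks. The goal is to move the current line-loss multi-mode report family from "signals are captured but main-chain reconstruction fails" to "at least the first sample reaches the candidate validation threshold".
This plan is strictly limited to `G2` family remediation and does not extend to:
1. `G1`
2. `G3`
3. Broader real-scene migration
4. Login recovery, host protocol refactoring, or runtime transport changes
## Success Baseline
The minimum success criteria for the remediation phase are fixed as:
1. `台区线损大数据-月_周累计线损率统计分析` no longer collapses into `paginated_enrichment`
2. At least this sample produces a `multi_mode_request` structure
3. `bootstrap` lands on the page that hosts the main line-loss business
4. `modes` recovers at least `month` and `week`
5. The mode-specific `request/response/column/normalize` contracts are no longer empty
6. Readiness no longer reports an inflated `A` when core contracts are missing
7. The sample result reaches the "eligible for candidate validation" threshold
## Scope Guardrails
The following boundaries stay unchanged during execution:
1. Do not switch to executing `G1/G3` samples
2. Do not keep adding more `G2` observation samples of the same kind
3. Do not start manual intranet validation within this plan
4. Do not handle unified login or hidden-domain login recovery within this plan
5. Do not diverge into a general refactor of the scene skill platform
## Target Samples
Remediation and regression in this plan cover only the following three `G2` samples:
1. `台区线损大数据-月_周累计线损率统计分析`
2. `白银线损周报`
3. `线损同期差异报表`
Corresponding artifact paths:
1. `examples/real_scene_batch_round1/skills/real-tq-lineloss-report-r1`
2. `examples/real_scene_batch_round1/skills/real-baiyin-lineloss-weekly-r1`
3. `examples/real_scene_batch_round1/skills/real-lineloss-period-diff-r1`
## Workstreams
This plan is split into five workstreams, mapping one-to-one to the upstream `spec`:
1. `WS1` G2 archetype rectification
2. `WS2` bootstrap rectification
3. `WS3` mode contract reconstruction
4. `WS4` endpoint purification
5. `WS5` readiness tightening
## Phase Overview
This plan proceeds in four phases:
1. Phase 0: Freeze the remediation baseline
2. Phase 1: Fix recognition and selection
3. Phase 2: Rebuild the `G2` contract
4. Phase 3: Regress real samples and produce the remediation report
Execution order is fixed as:
`Phase 0 -> Phase 1 -> Phase 2 -> Phase 3`
`Phase 1` comes before `Phase 2` to avoid piling more template logic on top of a wrong archetype and a wrong bootstrap.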
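## Phase 0: Freeze the Remediation Baseline
### Objective
Freeze the first-round blockers, benchmarking criteria, and acceptance thresholds of the current `G2` family so the boundary does not drift during remediation.
### Tasks
1. Freeze the current failure profile of the three `G2` samples
2. Freeze `tq-lineloss-report` as the primary anchor reference for `G2`
3. Freeze the `G2` candidate validation threshold
4. Freeze the boundary that remediation stays within `G2`
### Deliverables
1. This plan
2. The existing `G2` blocker summary
3. The existing first-round migration and candidate validation reports
### Exit Criteria
1. No further `G2` observation samples of the same kind are added during execution
2. "Let's just try it on the intranet first" is no longer accepted as a substitute for closing the remediation loop
## Phase 1: Fix Recognition and Selection
### Objective
First correct the `G2` main-chain classification by resolving three upstream problems: archetype, bootstrap, and endpoint pollution.
### WS1: G2 Archetype Rectification
#### Task 1
Audit the source of the current `G2` archetype misclassification and confirm:
1. Which pagination signals are taking over the decision
2. Which mode signals never reach the main classification
3. Where the priority conflict between `multi_mode_request` and `paginated_enrichment` currently lies
#### Task 2
Modify the `G2` archetype classification logic so that the following signals carry higher weight within `G2`:
1. `month/week`
2. `mode`
3. `tjzq`
4. Multiple groups of line-loss interfaces within the same scene
5. Mode-switch branch fields
#### Task 3
Add or update regression tests proving that:
1. The current `G2` fixture is no longer classified as `paginated_enrichment`
2. The `G2` fixes do not break the existing `G3` fixture
### WS2: Bootstrap Rectification
#### Task 4
Audit the current bootstrap selection logic and determine why all three samples consistently land on `20.77.115.36:31051`.
#### Task 5
Introduce stricter bootstrap selection constraints for `G2`:
1. Prefer the page that actually hosts the line-loss business
2. Exclude page-shell entries and wrong primary domains
3. Continue to exclude `localhost:*`, third-party library URLs, and static asset URLs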
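Part of the Task 5 constraints, namely excluding `localhost:*` and static asset URLs, could be expressed as a simple URL filter. This is a hedged sketch: the function name, the loopback check, and the extension list are assumptions for illustration, and it does not attempt the "prefer the real business page" or "wrong primary domain" rules, which need scene-specific evidence.

```rust
/// Returns false for URLs that should never be chosen as a bootstrap target
/// in this sketch: loopback hosts and paths ending in a static-asset
/// extension (the extension list is an assumed example).
pub fn is_bootstrap_candidate(url: &str) -> bool {
    let lower = url.to_ascii_lowercase();
    // Drop the scheme if present ("https://host/path" -> "host/path").
    let rest = lower.split("://").nth(1).unwrap_or(lower.as_str());
    let host_port = rest.split('/').next().unwrap_or("");
    let host = host_port.split(':').next().unwrap_or("");
    if host == "localhost" || host == "127.0.0.1" {
        return false;
    }
    // Ignore query string and fragment before checking the extension.
    let path = rest.split(|c| c == '?' || c == '#').next().unwrap_or(rest);
    const STATIC_EXT: [&str; 6] = [".js", ".css", ".png", ".jpg", ".svg", ".woff"];
    !STATIC_EXT.iter().any(|ext| path.ends_with(*ext))
}
```

Filtering `.js` paths also removes most third-party library URLs as a side effect, but a real implementation would probably want an explicit rule for those rather than relying on the extension.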
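#### Task 6
Add or update tests proving that:
1. The `G2` main-sample bootstrap no longer lands on the wrong entry
2. `localhost:*` is still retained only as host-dependency evidence
### WS4: Endpoint Purification
#### Task 7
Audit the sources of pollution in the current endpoint extraction and clarify how each of the following categories ends up among the business candidates:
1. Third-party dependency libraries
2. External documentation links
3. Static asset URLs
4. Leftover interfaces from other business systems
#### Task 8
Tighten the endpoint candidate filtering and ranking rules so that, in `G2` samples:
1. Main line-loss business interfaces rank near the top
2. External links and dependency-library URLs no longer enter the main business candidates
3. Endpoints from other business systems can no longer easily take over the main chain
#### Task 9
Add tests proving that:
1. The ranking of the `G2` main endpoint improves noticeably
2. Noise endpoints no longer pollute the generated main script
### Phase 1 Exit Criteria
1. `G2` fixture archetype classification is corrected
2. `G2` bootstrap selection is corrected
3. `G2` endpoint candidate ranking is corrected
## Phase 2: Rebuild the G2 Contract
### Objective
Once the main-chain classification is correct, restore the mode-specific contracts that `G2` requires and apply a stricter readiness check.
### WS3: Mode Contract Reconstruction
#### Task 10
Define the minimal mode contract for `G2`, including at least:
1. `modes[]`
2. `defaultMode`
3. `modeSwitchField`
4. per-mode `requestTemplate`
5. per-mode `responsePath`
6. per-mode `columnDefs`
7. per-mode `normalizeRules`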
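The minimal mode contract from Task 10 maps naturally onto a pair of structs. The sketch below is illustrative only: the Rust field names mirror the contract keys listed above, but the types, the use of `String` placeholders, and the completeness rule (which here does not require `normalize_rules` to be non-empty) are assumptions.

```rust
use std::collections::BTreeMap;

/// Per-mode part of the assumed contract.
#[derive(Debug, Clone, Default)]
pub struct ModeContract {
    pub request_template: Option<String>,
    pub response_path: Option<String>,
    pub column_defs: Vec<String>,
    pub normalize_rules: Vec<String>,
}

/// Scene-level part of the assumed contract, keyed by mode name.
#[derive(Debug, Clone, Default)]
pub struct G2Contract {
    pub modes: BTreeMap<String, ModeContract>,
    pub default_mode: Option<String>,
    pub mode_switch_field: Option<String>,
}

impl G2Contract {
    /// Complete means: at least one mode, a default mode that exists,
    /// a switch field, and every mode has template, path, and columns.
    pub fn is_complete(&self) -> bool {
        let default_ok = self
            .default_mode
            .as_ref()
            .map_or(false, |m| self.modes.contains_key(m));
        default_ok
            && self.mode_switch_field.is_some()
            && !self.modes.is_empty()
            && self.modes.values().all(|m| {
                m.request_template.is_some()
                    && m.response_path.is_some()
                    && !m.column_defs.is_empty()
            })
    }
}
```

Keying `modes` by name makes the `defaultMode` consistency check a single lookup, which is why a map is used here instead of a plain list.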
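#### Task 11
Modify the `Scene IR` assembly or generation logic so that, when evidence is sufficient, `G2` samples actually output `modes[]` instead of keeping only empty-shell default fields.
#### Task 12
Modify the `G2` generated-script template or compile path so it no longer degrades into the generic:
- `paginate -> secondary_request -> filter`
The generated result must reflect:
1. The `month` mode
2. The `week` mode
3. Request differences between modes
4. Column differences between modes
#### Task 13
Add or update tests proving that:
1. `台区线损大数据-月_周累计线损率统计分析` outputs non-empty `modes`
2. At least one `G2` fixture recovers a mode-specific contract
### WS5: Readiness Tightening
#### Task 14
Add or tighten gates for `G2`, covering at least:
1. `g2_archetype_resolved`
2. `g2_bootstrap_resolved`
3. `g2_modes_present`
4. `g2_request_contract_complete`
5. `g2_response_contract_complete`
#### Task 15
Adjust the readiness grading logic so the following cases no longer receive a high grade:
1. `modes = []`
2. `requestTemplate = null`
3. `columnDefs = []`
4. archetype misclassification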
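One way to read Task 15 is as a cap applied after the normal grading step. The sketch below assumes an `A/B/C` readiness scale and assumes that any of the four listed conditions forces the grade down to `C`; the plan itself only says such cases must not receive a high grade, so the exact target grade is an assumption, as are the struct and function names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Readiness {
    A,
    B,
    C,
}

/// Hypothetical summary of what was recovered for one `G2` scene.
#[derive(Debug, Clone, Copy)]
pub struct G2Facts {
    pub mode_count: usize,
    pub has_request_template: bool,
    pub column_count: usize,
    pub archetype_is_multi_mode: bool,
}

/// Caps the proposed grade: if any core contract piece is missing,
/// the result is forced to `C` (assumed policy), otherwise unchanged.
pub fn cap_readiness(proposed: Readiness, f: &G2Facts) -> Readiness {
    let core_missing = f.mode_count == 0
        || !f.has_request_template
        || f.column_count == 0
        || !f.archetype_is_multi_mode;
    if core_missing {
        Readiness::C
    } else {
        proposed
    }
}
```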
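#### Task 16
Add or update tests proving that:
1. `G2` samples that are not closed get downgraded or blocked
2. Readiness is consistent with the candidate validation threshold
### Phase 2 Exit Criteria
1. At least the `G2` main sample has an explainable mode contract
2. Readiness is no longer inflated
3. The `G2` generation result is structurally capable of reaching the candidate threshold
## Phase 3: Regress Real Samples and Produce the Remediation Report
### Objective
After remediation, regenerate the three real `G2` samples and publish the formal second-round conclusion.
### Tasks
#### Task 17
Regenerate the following three `G2` samples:
1. `台区线损大数据-月_周累计线损率统计分析`
2. `白银线损周报`
3. `线损同期差异报表`
#### Task 18
Using exactly the same criteria as the first round, compare the following items:
1. archetype
2. bootstrap
3. modes
4. request contract
5. response / column / normalize contract
6. readiness
#### Task 19
Produce the post-remediation second-round report, containing at least:
1. Which blockers have been fixed
2. Which blockers remain
3. Which samples enter the candidate validation list
4. Which samples still need to fail closed
### Deliverables
1. Second-round `G2` real-sample generation results
2. Second-round `G2` remediation regression report
3. Updated candidate validation list
### Acceptance Criteria
1. `台区线损大数据-月_周累计线损率统计分析` at least enters the candidate validation list
2. The three samples no longer uniformly collapse into `paginated_enrichment`
3. Readiness is broadly consistent with the actual degree of business closure
## File-Level Targets
Executing this plan touches at least the following asset types:
1. analyzer / generator / readiness implementations under `src/generated_scene/`
2. `G2` fixtures or canonical assets under `tests/fixtures/generated_scene/`
3. Regression tests under `tests/` related to scene generator / canonical / readiness
4. The second-round remediation report under `docs/superpowers/reports/`
## Completion Criteria
This plan is complete when:
1. The `G2` main sample reaches the candidate validation threshold
2. `G2` family blockers move from "consistently reproduced" to "partly fixed and explainable"
3. The decision on whether to switch to `G1/G3` can be based on the post-remediation second-round report rather than on the first-round failure profile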

View File

@@ -0,0 +1,458 @@
# G3 Paginated Enrichment Plan
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
> **Upstream Spec:** [2026-04-18-g3-paginated-enrichment-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-18-g3-paginated-enrichment-design.md)
## Plan Intent
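This plan breaks the `G3` paginated enrichment family design into executable tasks. The goal is to move the current `paginated_enrichment` from "a broad label for complex workflows" to a formal mainline archetype that has an evidence layer, a minimal contract, a canonical baseline, and fail-closed classification.
This plan is strictly limited to landing `G3 / P0-3` and does not extend to:
1. `G6/G7/G8`
2. Concurrent remediation of the entire `95598` family
3. Login recovery or host transport refactoring
4. Large-scale rollout across the 102 scenes
## Success Baseline
The minimum success criteria after this plan completes are fixed as:
1. `95598工单明细表` is no longer just a vague "complex work-order" sample
2. The generation chain can explicitly recover:
- `main request`
- `pagination plan`
- `enrichment requests`
- `export plan`
3. `localhost:*`, host injection, and BrowserAction are no longer misjudged as the business main chain
4. `G3` has a minimal compilable contract and independent gates
5. Results are reliably `fail-closed` when evidence is insufficient
6. `95598、12398、流程超期风险工单明细` can enter reuse validation as the first expansion reference sample
## Scope Guardrails
The following boundaries stay unchanged during execution:
1. Do not let `G3` fall back to ordinary paginated-table recognition
2. Do not mix host bridge capabilities into the `G3` contract ahead of time
3. Do not loosen gates just to get a skill generated first
4. Do not start `G6/G7/G8` in parallel
5. Do not perform real manual intranet validation within this plan
## Target Samples
Remediation and regression in this plan cover only the following two reference samples:
1. `95598工单明细表`
2. `95598、12398、流程超期风险工单明细`
Their roles are fixed as:
1. `95598工单明细表`
- Role: `P0-3` primary reference sample
- Goal: freeze the `G3 canonical`
2. `95598、12398、流程超期风险工单明细`
- Role: first expansion reference sample
- Goal: verify whether the `G3` contract and evidence layer are reusable
## Workstreams
This plan is split into five workstreams:
1. `WS1` G3 boundary freeze and reference-sample filing
2. `WS2` G3 evidence layer modeling
3. `WS3` G3 Scene IR / compiler gate / readiness construction
4. `WS4` G3 P0 canonical and failure taxonomy freeze
5. `WS5` G3 real-sample regression and report
## Phase Overview
This plan proceeds in five phases:
1. Phase 0: Freeze the `G3` boundary and reference samples
2. Phase 1: Build the `G3` evidence layer
3. Phase 2: Build the `G3` minimal contract and gates
4. Phase 3: Freeze the `P0-3 canonical`
5. Phase 4: Regress real samples and produce the first-round report
Execution order is fixed as:
`Phase 0 -> Phase 1 -> Phase 2 -> Phase 3 -> Phase 4`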
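## Phase 0: Freeze the G3 Boundary and Reference Samples
### Objective
First fix the problem boundary, primary reference sample, and expansion reference sample of `G3`, so that work-order scenes, host bridge scenes, and export-analysis scenes do not get mixed together again during development.
### WS1: G3 Boundary Freeze and Reference-Sample Filing
#### Task 1
Freeze the formal definition of `G3`:
1. It is not an ordinary paginated table
2. It is not a host bridge type
3. It is a complex workflow report in which a "main query chain + pagination chain + enrichment chain + export chain" coexist
#### Task 2
Freeze `95598工单明细表` as the only `P0-3` primary reference sample.
#### Task 3
Freeze `95598、12398、流程超期风险工单明细` as the first expansion reference sample.
#### Task 4
Freeze the `G3` entry conditions:
1. A main query chain candidate exists
2. Pagination control evidence exists
3. An enrichment or associated-detail chain exists
4. The final result depends on paging through all data, enrichment, export, or aggregation
#### Task 5
Freeze the `G3` exclusion conditions:
1. Ordinary reports that a single request can complete
2. Scenes driven only by BrowserAction with no stable business main chain
3. Scenes whose body is local persistence analysis or document artifacts
4. `localhost:*` or host dependencies clearly outweigh business evidence
### Deliverables
1. `G3` family definition
2. `G3` reference-sample list
3. `G3` entry and exclusion conditions
4. Boundary notes between `G3` and other families
### Exit Criteria
1. `95598工单明细表` is no longer discussed as a vague work-order sample
2. `G3` is no longer confused with host bridge or document-artifact scenes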
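## Phase 1: Build the G3 Evidence Layer
### Objective
Upgrade the path that compresses source code directly into `Scene IR`: first form adjudicable `G3` evidence, then reduce it into `Scene IR`.
### WS2: G3 Evidence Layer Modeling
#### Task 6
Define `main_request_candidate`, carrying:
1. The main query endpoint
2. The query parameter template
3. The time range or primary filter condition
#### Task 7
Define `pagination_candidate`, carrying:
1. The page-number field
2. The pageSize field
3. The paging termination condition
4. The rolling-window or interval advancement rule
#### Task 8
Define `enrichment_request_candidate`, carrying:
1. Detail enrichment queries
2. Secondary interfaces
3. Associated enrichment
#### Task 9
Define `join_key_candidate`, carrying:
1. Work-order number
2. Process number
3. User number
4. Device number
5. Other keys that link the main chain to the enrichment chain
#### Task 10
Define `export_candidate`, carrying:
1. The export interface
2. Export parameters
3. Pre-export actions
4. Artifact type
#### Task 11
Define `workflow_step_candidate`, carrying the ordering relationship among:
1. Main query
2. Paging
3. Enrichment
4. Aggregation
5. Export
#### Task 12
Define `dedupe_or_merge_rule_candidate`, carrying:
1. Deduplication rules
2. Main-to-detail merge-back rules
3. Cross-page accumulation rules
#### Task 13
Define `host_bridge_candidate` and `localhost_dependency_candidate`, ensuring the host chain is retained only as independent evidence.
#### Task 14
Establish evidence merging and conflict adjudication rules, clarifying:
1. What belongs to the business main chain
2. What belongs to host bridging
3. What belongs to the result export chain
### Deliverables
1. `G3` evidence schema
2. `G3` evidence type dictionary
3. Evidence merging rules
4. First evidence example for `95598工单明细表`
### Exit Criteria
1. The main chain, pagination chain, enrichment chain, export chain, and host chain can each be presented in their own slot
2. `localhost:*` no longer leaks into the business main chain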
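## Phase 2: Build the G3 Minimal Contract and Gates
### Objective
Upgrade the `G3` classification standard from "looks like a paginated enrichment scene" to "does the minimal business contract hold".
### WS3: G3 Scene IR / Compiler Gate / Readiness Construction
#### Task 15
Define the `G3` minimal contract, including at least:
1. `main_request`
2. `pagination_plan`
3. `enrichment_requests[]`
4. `join_keys[]`
5. `export_plan`
6. `merge_or_dedupe_rules`
#### Task 16
Carry the `G3`-specific structure in `Scene IR`, so it no longer degrades into ordinary empty-shell `paginated_enrichment` fields.
#### Task 17
Add `G3` gates, including at least:
1. `g3_main_request_resolved`
2. `g3_pagination_contract_complete`
3. `g3_enrichment_contract_complete`
4. `g3_join_key_resolved`
5. `g3_export_path_identified`
6. `g3_runtime_scope_compatible`
#### Task 18
Define the blocker / readiness classification criteria, which must distinguish:
1. Insufficient business evidence
2. Pagination contract not closed
3. Enrichment contract not closed
4. Export chain depends on the host
5. Runtime dependency not satisfied
#### Task 19
Land the `fail-closed` rules:
1. Main request chain missing: block
2. Pagination chain exists but the termination condition is unclear: block
3. Enrichment chain exists but the join key is unclear: block
4. Only an export action with no business main chain: block
5. Host bridge evidence clearly outweighs business evidence: block
#### Task 20
Add tests proving that an unclosed `G3` sample cannot pass as a runnable skill.
### Deliverables
1. `G3` minimal contract table
2. `G3` gate table
3. `G3` blocker / readiness table
4. `G3` Scene IR example
### Exit Criteria
1. `G3` has its own independent gates
2. Unclosed results are blocked accurately
3. The `compiler` no longer accepts an unclosed `G3 IR`
## Phase 3: Freeze the P0-3 Canonical
### Objective
Turn `95598工单明细表` into the first standard answer, key evidence baseline, and failure taxonomy baseline for `G3`.
### WS4: G3 P0 Canonical and Failure Taxonomy Freeze
#### Task 21
Freeze the canonical `Scene IR` of `95598工单明细表`.
#### Task 22
Freeze the key evidence list, including at least:
1. Main request chain
2. Pagination chain
3. Enrichment chain
4. join key
5. Export chain
6. Host dependencies
#### Task 23
Freeze the acceptance checklist, checking at least:
1. Whether the main chain is recovered
2. Whether the pagination chain is recovered
3. Whether the enrichment chain is recovered
4. Whether the join key is recovered
5. Whether the export chain is recovered
6. Whether the host chain is isolated
7. Whether readiness matches the actual degree of closure
#### Task 24
Freeze the failure taxonomy, including at least:
1. `main_chain_missing`
2. `pagination_incomplete`
3. `enrichment_incomplete`
4. `join_key_missing`
5. `export_only_without_business_chain`
6. `host_bridge_pollution`
7. `runtime_dependency_unresolved`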
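The five fail-closed rules from Task 19 and the taxonomy from Task 24 could be connected roughly as follows. This is a sketch under assumptions: the `G3Evidence` fields are hypothetical, the mapping from each rule to a taxonomy entry is a guess at the intended pairing, and "clearly outweighs" is approximated here by a plain `>` comparison.

```rust
/// Taxonomy entries from Task 24 (only those used by the Task 19 rules
/// are produced below).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum G3Blocker {
    MainChainMissing,
    PaginationIncomplete,
    EnrichmentIncomplete,
    JoinKeyMissing,
    ExportOnlyWithoutBusinessChain,
    HostBridgePollution,
    RuntimeDependencyUnresolved,
}

/// Hypothetical evidence summary for one scene.
#[derive(Debug, Default)]
pub struct G3Evidence {
    pub has_main_request: bool,
    pub has_pagination: bool,
    pub pagination_termination_known: bool,
    pub enrichment_count: usize,
    pub join_key_count: usize,
    pub has_export: bool,
    pub host_bridge_evidence: usize,
    pub business_evidence: usize,
}

/// Returns every blocker that applies; any non-empty result means the
/// scene must fail closed.
pub fn g3_blockers(e: &G3Evidence) -> Vec<G3Blocker> {
    let mut out = Vec::new();
    if !e.has_main_request {
        // An export with no main chain gets the more specific label.
        out.push(if e.has_export {
            G3Blocker::ExportOnlyWithoutBusinessChain
        } else {
            G3Blocker::MainChainMissing
        });
    }
    if e.has_pagination && !e.pagination_termination_known {
        out.push(G3Blocker::PaginationIncomplete);
    }
    if e.enrichment_count > 0 && e.join_key_count == 0 {
        out.push(G3Blocker::JoinKeyMissing);
    }
    if e.host_bridge_evidence > e.business_evidence {
        out.push(G3Blocker::HostBridgePollution);
    }
    out
}
```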
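#### Task 25
Establish the method for aligning "generated result vs canonical".
### Deliverables
1. `G3` P0 canonical `Scene IR`
2. `G3` P0 evidence baseline
3. `G3` acceptance checklist
4. `G3` failure taxonomy table
### Exit Criteria
1. `95598工单明细表` becomes the first unified calibration source for `G3`
2. Every later `G3` regression can be checked against the fixed taxonomy
## Phase 4: Real-Sample Regression and First-Round Report
### Objective
First close the loop with the `P0` primary reference sample, then use one expansion reference sample to verify whether the `G3` contract is reusable.
### WS5: G3 Real-Sample Regression and Report
#### Task 26
Regenerate `95598工单明细表`.
#### Task 27
Check under unified criteria:
1. archetype
2. bootstrap
3. main request
4. pagination plan
5. enrichment requests
6. join keys
7. export plan
8. localhost / host bridge separation
9. readiness / blocker
#### Task 28
Produce the `G3 P0 validation report`. Only the following three conclusions are allowed:
1. `通过` (pass)
2. `Fail-closed 且理由准确` (fail-closed with an accurate reason)
3. `误判,需要整改` (misjudged, remediation required)
#### Task 29
Regenerate `95598、12398、流程超期风险工单明细`.
#### Task 30
Compare it with the `P0` reference sample:
1. Which contracts are reusable
2. Which blockers are common to the family
3. Which complexity is specific to the expansion reference sample
#### Task 31
Produce the `G3 first-round family expansion report`.
### Deliverables
1. `G3` P0 reference-sample generation result
2. `G3` P0 validation report
3. `G3` expansion reference-sample generation result
4. `G3` first-round family expansion report
### Acceptance Criteria
1. `95598工单明细表` reaches at least "structure fully recovered" or "fail-closed with an accurate reason"
2. The expansion reference sample is no longer flattened into an ordinary paginated table
3. `G3` failure results are explainable
4. `G3` forms at least a first version of family reuse criteria
## File-Level Targets
Executing this plan touches at least the following asset types:
1. `docs/superpowers/specs/`
2. `docs/superpowers/plans/`
3. `docs/superpowers/reports/`
4. Implementations under `src/generated_scene/` related to the evidence layer, contract layer, and readiness
5. `tests/fixtures/generated_scene/`
6. `tests/`
## Completion Criteria
This plan is complete when:
1. `G3` has a formal boundary definition
2. `G3` has a minimal evidence layer and a minimal contract
3. `G3` has independent gates and fail-closed criteria
4. `95598工单明细表` has become the `P0-3 canonical`
5. The first-round real-sample regression of `G3` has delivered a formal conclusion
## Next Step
After this plan completes, the follow-up order is fixed as:
1. First stabilize the `G3` `P0` and first-round expansion reference samples, then decide whether to add a second `G1-E` reference sample
2. Then decide whether to start an independent design and plan for `G6`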

View File

@@ -0,0 +1,77 @@
# G6 Host Bridge Workflow Plan
> Date: 2026-04-18
> Status: Initial implementation slice
## Plan Intent
Start the `G6` line after `G1-E` second-sample reuse has been validated.
This plan implements the first safe slice only: classification, evidence separation, readiness gates, and fail-closed behavior.
## Phase 0: Boundary Freeze
Tasks:
1. keep `电能表现场检验完成率指标报表` as the P0 boundary sample
2. define the repo-local representative fixture
3. keep `G6` separate from `G1`, `G1-E`, `G3`, `G7`, and `G8`
Deliverables:
1. `G6` design doc
2. `G6` plan doc
3. repo-local representative fixture
Acceptance criteria:
1. `G6` is no longer discussed as a `G1` candidate
2. `G6` is not treated as a generic localhost-pollution case
## Phase 1: Analyzer Classification
Tasks:
1. add `host_bridge_workflow` as a workflow archetype
2. detect explicit host bridge actions
3. keep `localhost:*` as supporting host-runtime evidence
4. ensure explicit host bridge signals outrank `G1-E`
5. ensure ordinary localhost export noise does not become `G6`
Acceptance criteria:
1. `g6_host_bridge_workflow` fixture classifies as `host_bridge_workflow`
2. `bootstrap_localhost_pollution` remains a non-G6 business scene
## Phase 2: Fail-Closed Gate
Tasks:
1. add readiness risks for missing or unsupported G6 contract
2. add `g6_host_bridge_detected`
3. add `g6_fail_closed`
4. block generation before runnable output
Acceptance criteria:
1. `G6` generation returns a controlled error
2. error message includes `host_bridge_workflow`
3. no pseudo-runnable skill is produced
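The Phase 2 acceptance criteria describe a controlled error whose message names the archetype. A minimal sketch of that behavior follows; the `FailClosed` type, the `generate_skill` signature, and the placeholder success value are hypothetical, and only the archetype string and gate names come from this plan.

```rust
use std::fmt;

/// Hypothetical controlled error returned instead of a runnable skill.
#[derive(Debug, PartialEq)]
pub struct FailClosed {
    pub archetype: String,
    pub gates: Vec<&'static str>,
}

impl fmt::Display for FailClosed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "generation blocked for archetype {} (gates: {})",
            self.archetype,
            self.gates.join(", ")
        )
    }
}

/// Blocks `host_bridge_workflow` before any output is produced; other
/// archetypes fall through to a stand-in success value.
pub fn generate_skill(archetype: &str) -> Result<String, FailClosed> {
    if archetype == "host_bridge_workflow" {
        return Err(FailClosed {
            archetype: archetype.to_string(),
            gates: vec!["g6_host_bridge_detected", "g6_fail_closed"],
        });
    }
    Ok(format!("skill:{}", archetype))
}
```

Returning the error before any artifact is written is what keeps a pseudo-runnable skill from appearing on disk.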
## Phase 3: Regression
Tasks:
1. run scene generator regression
2. run family regression
3. run family policy regression
4. run canonical regression
Acceptance criteria:
1. all target regressions pass
2. no `G1-E/G3/G2` behavior regresses
## Next Step
After this safe G6 slice, continue to `G7 多接口盘点汇总型` boundary assessment unless G6 runtime implementation becomes the selected priority.

View File

@@ -0,0 +1,68 @@
# G7 Multi Endpoint Inventory Plan
> Date: 2026-04-18
> Status: Initial implementation slice
## Plan Intent
Start `G7` after the safe `G6` classification slice.
This plan only establishes boundary classification and fail-closed behavior. It does not implement runnable multi-endpoint inventory aggregation.
## Phase 0: Boundary Freeze
Tasks:
1. use `计量资产库存统计` as the P0 boundary sample
2. define a repo-local representative fixture
3. keep `G7` separate from `G1`, `G1-E`, `G6`, and `G8`
Acceptance criteria:
1. `G7` is no longer a `G1` candidate
2. `G7` is not confused with host bridge workflow
## Phase 1: Analyzer Classification
Tasks:
1. add `multi_endpoint_inventory` as a workflow archetype
2. detect inventory endpoint families
3. classify scenes with three or more inventory endpoints as `G7`
Acceptance criteria:
1. `g7_multi_endpoint_inventory` fixture classifies as `multi_endpoint_inventory`
2. inventory endpoint names include `assetStatsQueryMeter` and `assetStatsQueryJlGnModule`
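The "three or more inventory endpoints" rule can be sketched as a simple count. The `assetStatsQuery` prefix is an assumption inferred from the two endpoint names listed above; the real detector may use a different signal for what counts as an inventory endpoint.

```rust
/// Counts endpoint names that look like inventory endpoints. The prefix
/// check is an assumed heuristic, not the confirmed detection rule.
pub fn count_inventory_endpoints(names: &[&str]) -> usize {
    names
        .iter()
        .filter(|n| n.starts_with("assetStatsQuery"))
        .count()
}

/// Applies the plan's threshold: three or more inventory endpoints
/// classify the scene as `multi_endpoint_inventory`.
pub fn is_multi_endpoint_inventory(names: &[&str]) -> bool {
    count_inventory_endpoints(names) >= 3
}
```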
## Phase 2: Fail-Closed Gate
Tasks:
1. add `g7_inventory_endpoints_detected`
2. add `g7_fail_closed`
3. block generation before runnable output
Acceptance criteria:
1. generation returns a controlled error
2. error message includes `multi_endpoint_inventory`
3. no pseudo-runnable skill is produced
## Phase 3: Regression
Tasks:
1. run scene generator regression
2. run family regression
3. run family policy regression
4. run canonical regression
Acceptance criteria:
1. all target regressions pass
2. no existing family baseline regresses
## Next Step
After this safe G7 slice, continue to `G8 抓取落库分析出文档型` boundary assessment.

View File

@@ -0,0 +1,70 @@
# G8 Local Document Pipeline Plan
> Date: 2026-04-18
> Status: Initial implementation slice
## Plan Intent
Start `G8` after the safe `G7` classification slice.
This plan only establishes boundary classification and fail-closed behavior. It does not implement runnable local storage, SQL, or document generation orchestration.
## Phase 0: Boundary Freeze
Tasks:
1. use `95598供电服务月报` as the P0 boundary sample
2. define a repo-local representative fixture
3. keep `G8` separate from `G1`, `G1-E`, `G6`, `G7`, and `G3`
Acceptance criteria:
1. `G8` is no longer a `G1` candidate
2. `G8` is not collapsed into generic host bridge workflow
## Phase 1: Analyzer Classification
Tasks:
1. add `local_doc_pipeline` as a workflow archetype
2. detect `definedSqlQuery`
3. detect `docExport`
4. detect `selectData` / local config service persistence
5. prioritize `G8` over `G6` when both signals exist
Acceptance criteria:
1. `g8_local_doc_pipeline` fixture classifies as `local_doc_pipeline`
2. local pipeline actions are visible in deterministic facts
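The priority rule in Phase 1, `G8` over `G6` when both signals exist, is essentially an ordered check. In the sketch below the marker strings come from this document, but treating any single `G8` marker as sufficient and using `sgBrowserExcuteJsCode` as the only `G6` marker are assumptions made for illustration.

```rust
/// Classifies a scene source into `local_doc_pipeline`,
/// `host_bridge_workflow`, or neither. G8 markers are checked first so
/// that they win when both kinds of signal are present.
pub fn classify_boundary(source: &str) -> Option<&'static str> {
    let g8_markers = ["definedSqlQuery", "docExport", "selectData"];
    let has_g8 = g8_markers.iter().any(|m| source.contains(*m));
    let has_g6 = source.contains("sgBrowserExcuteJsCode");
    if has_g8 {
        Some("local_doc_pipeline")
    } else if has_g6 {
        Some("host_bridge_workflow")
    } else {
        None
    }
}
```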
## Phase 2: Fail-Closed Gate
Tasks:
1. add `g8_local_doc_pipeline_detected`
2. add `g8_fail_closed`
3. block generation before runnable output
Acceptance criteria:
1. generation returns a controlled error
2. error message includes `local_doc_pipeline`
3. no pseudo-runnable skill is produced
## Phase 3: Regression
Tasks:
1. run scene generator regression
2. run family regression
3. run family policy regression
4. run canonical regression
Acceptance criteria:
1. all target regressions pass
2. no existing family baseline regresses
## Next Step
After this safe G8 slice, the boundary-reassignment sequence has a code-backed fail-closed guard for `G1-E`, `G6`, `G7`, and `G8`.

View File

@@ -0,0 +1,215 @@
# Line-Loss Family Variant Expansion Plan
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
> **Upstream Inputs:**
> [2026-04-18-g2-family-expansion-plan.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/plans/2026-04-18-g2-family-expansion-plan.md)
> [2026-04-18-g2-family-expansion-third-round-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-18-g2-family-expansion-third-round-report.md)
## Plan Intent
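This plan builds on the three line-loss subtypes `G2-A/G2-B/G2-C` that have already converged, and moves the goal from "getting three representative samples working" to "replicable expansion to more line-loss variants".
What has been established so far:
1. The `G2-A` dual-mode main-report type can be generated
2. The `G2-B` weekly-report single-sided mode type can be generated
3. The `G2-C` mixed-linkage type can be generated
The next stage therefore stops fine-tuning these three samples and instead builds the replication chain "line-loss scene -> subtype -> minimal contract -> candidate validation".
## Success Baseline
The minimum success criteria after this plan completes are fixed as:
1. A new batch of real line-loss scenes can be assigned to an existing subtype or a new subtype
2. Every new subtype has a minimal contract standard
3. Each new subtype has at least 2 to 3 real samples that complete migration validation
4. Samples that cannot be classified or lack a sufficient contract continue to `fail-close`
5. A "line-loss family implementation mapping table" is produced
6. A new round of the line-loss family expansion report is produced
## Scope Guardrails
The following boundaries stay unchanged during execution:
1. Do not extend to non-line-loss report families
2. Do not handle unified login, hidden-domain login, or host transport refactoring
3. Do not perform real manual intranet validation within this plan
4. Do not let this plan grow into a one-shot rollout across all 102 scenes
5. Do not go back and overturn the already converged `G2-A/G2-B/G2-C` results
## Phase Overview
This plan proceeds in five phases:
1. Phase 0: Freeze the line-loss expansion baseline
2. Phase 1: Build the line-loss variant grouping list
3. Phase 2: Establish minimal contract standards for the new variants
4. Phase 3: Extend fixtures / classification / generation chain group by group
5. Phase 4: Regress real samples and produce the expansion report
Execution order is fixed as:
`Phase 0 -> Phase 1 -> Phase 2 -> Phase 3 -> Phase 4`
## Phase 0: Freeze the Expansion Baseline
### Objective
Freeze the line-loss family baseline achieved so far as the starting point for horizontal replication.
### Tasks
1. Freeze the current criteria for `G2-A/G2-B/G2-C`
2. Freeze the minimal contract facts of these three subtypes
3. Freeze the rule that this plan does not return to the "get the main sample working" stage
### Exit Criteria
1. `G2-A/G2-B/G2-C` are treated as the established family baseline
2. The expansion work is explicitly framed as "replication to more line-loss variants"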
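## Phase 1: Build the Line-Loss Variant Grouping List
### Objective
First group "which line-loss scenes still need to be covered" instead of adding samples piecemeal.
### Tasks
1. From the existing line-loss scenes, select the candidate samples closest to the current family
2. Group by structure rather than by name, into at least:
- Dual-mode main-report type
- Weekly/daily-report single-sided mode type
- Ranking/detail main-chain type
- Main line-loss chain + external system linkage type
- Anomaly diagnosis/detail drill-down type
3. Pick 2 to 3 representative samples for each group first
### Deliverables
1. Line-loss variant grouping list
2. Representative sample list for each group
### Exit Criteria
1. Work no longer proceeds one isolated scene at a time
2. Subsequent remediation proceeds with the "group" as the unit
## Phase 2: Establish Minimal Contract Standards for New Variants
### Objective
For each new line-loss variant group, define "what counts as minimally usable" before touching the generator.
### Tasks
1. Define a minimal contract for each group, specifying at least:
- Main endpoint
- request template
- response path
- Key fields or column defs
- normalize / required fields
2. Specify which chains belong to the main contract
3. Specify which chains count as extension evidence or risk evidence
### Deliverables
1. Line-loss variant minimal contract table
2. Candidate validation threshold for each group
### Exit Criteria
1. Every group has a unified classification standard
2. Subsequent development no longer relies on ad hoc, single-sample decisions
## Phase 3: Extend the Generation Chain by Group
### Objective
Wire the new variant groups into the analyzer / generator / readiness one class at a time.
### Tasks
1. For each group, add fixtures first
2. For each group, add tests first
3. Then add subtype classification
4. Then add minimal contract recovery
5. Then add readiness grading
### Rules
1. Every group must have a fixture before its logic is changed
2. Multiple groups must not spread in parallel without boundaries
3. Finish one group before moving to the next
### Deliverables
1. New line-loss variant fixtures
2. New family regression tests
3. The corresponding analyzer / generator / readiness extensions
### Exit Criteria
1. At least 1 to 2 new line-loss variant classes can be generated
2. The existing `G2-A/G2-B/G2-C` do not regress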
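## Phase 4: Real-Sample Regression and Expansion Report
### Objective
Validate the expanded line-loss family capability against real samples rather than stopping at the fixture level.
### Tasks
1. Regenerate the representative samples of each group
2. Compare:
- Subtype classification
- bootstrap
- request contract
- response / column / normalize contract
- readiness
- Whether the sample enters the candidate validation list
3. Produce the line-loss family expansion regression report
### Deliverables
1. Real-sample generation results
2. Line-loss family expansion regression report
3. Updated candidate validation list
### Acceptance Criteria
1. At least 2 new line-loss variant groups enter the candidate validation stage
2. Scenes that cannot be classified continue to `fail-close`
3. The existing three classes `G2-A/G2-B/G2-C` do not regress
## Workstream Breakdown
This plan is expected to land through the following workstreams:
1. `WS1` Line-loss variant inventory and grouping
2. `WS2` Minimal contract design for new variants
3. `WS3` fixture / regression test extension
4. `WS4` analyzer / generator / readiness extension
5. `WS5` Real-sample regression and report
## File-Level Targets
Executing this plan is expected to touch the following asset types:
1. `docs/superpowers/plans/`
2. `docs/superpowers/reports/`
3. `tests/fixtures/generated_scene/`
4. `tests/`
5. `src/generated_scene/`
## Completion Criteria
This plan is complete when:
1. The line-loss family is no longer explainable only through three representative samples
2. A "replicate by group" expansion method is in place, replacing sample-by-sample patching
3. The decision on whether to extend to broader scenes can be based on the results of this line-loss family expansion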

View File

@@ -0,0 +1,237 @@
# Scene Generator Ops Console Plan
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
> **Upstream Spec:** [2026-04-18-scene-generator-ops-console-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-18-scene-generator-ops-console-design.md)
## Plan Intent
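This plan narrows the scene generator page from a "developer debugging console" into an "operations-facing scene Skill generation workbench", and breaks the information architecture, Chinese localization, visibility layering, and interaction flow already specified in the upstream `spec` into executable implementation steps.
This plan covers only the front-end page layer and page interaction layer. It does not extend to the scene generator back-end analysis logic or to changes in the generation protocol.
## Scope Guardrails
The following boundaries stay unchanged while this plan is executed:
1. Do not modify the scene generator back-end interface protocol
2. Do not rewrite the analysis algorithm or the Skill generation logic
3. Do not delete existing debugging information; only adjust default visibility and display hierarchy
4. Do not expand this plan into building a new front-end design system
## Primary Outcome
The direct goal of this plan is that operations staff can complete the following without understanding low-level terms such as `Scene IR`, `workflowArchetype`, or `requestTemplate`:
1. Select the scene directory
2. Start analysis
3. Judge whether generation is possible
4. Start generation
5. View the result directory or the failure reason
## Workstreams
This plan is split into four workstreams:
1. `WS1` Information architecture and page layering convergence
2. `WS2` Chinese localization and business-state mapping
3. `WS3` Log, result, and risk summary convergence
4. `WS4` Debug information folding and two-tier experience wrap-up
## Phase Overview
The plan proceeds in five phases:
1. Phase 0: Freeze page goals and criteria
2. Phase 1: Complete the information architecture reorganization
3. Phase 2: Complete Chinese localization and business-state mapping
4. Phase 3: Complete log and result area convergence
5. Phase 4: Complete debug-layer folding and overall acceptance
## Phase 0: Freeze Page Goals and Criteria
### Objective
First freeze the page's target users, default usage mode, primary state presentation, and the boundaries between level-1, level-2, and level-3 information, so that layout and positioning are not changed at the same time during implementation.
### Tasks
1. Freeze the page role definition: operations executors first, developers / debuggers second
2. Freeze the page positioning: an operations workbench, not a developer debugging console
3. Freeze the default mode: operations mode by default, technical details folded
4. Freeze the level-1 / level-2 / level-3 information boundaries
5. Freeze the criteria for state presentation, scene type mapping, and executability mapping
### Deliverables
1. Page role description
2. Information hierarchy boundary description
3. State and scene type mapping table
4. Visibility strategy description
### Exit Criteria
1. The page's default target user no longer wavers
2. The boundary between level-1 information and technical details no longer wavers
3. The Chinese state and type mapping criteria are frozen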
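## Phase 1: Complete the Information Architecture Reorganization
### Objective
Reorganize the current page structure, in which "config area + analysis area + generation log + technical fields" are mixed together, into a workbench structure that operations staff can understand.
### Tasks
1. Reorganize the top overview area
2. Reorganize the left main operation area
3. Reorganize the right result summary area
4. Reorganize the bottom execution process area
5. Reserve a technical details area, folded by default
### Required Sections
The first-screen structure is fixed as:
1. Top overview area
2. Left main operation area
3. Right result summary area
4. Bottom execution process area
5. Technical details area
### Deliverables
1. Page section structure implementation
2. Section titles and section order implementation
3. The primary visual path for the level-1 flow
### Acceptance Criteria
1. The first screen no longer exposes a large amount of technical detail at once
2. The default operations flow can be completed as "select directory -> analyze -> generate -> view result"
3. The page structure shifts from a "debug panel" to a "workbench"
## Phase 2: Complete Chinese Localization and Business-State Mapping
### Objective
Replace the many English titles, buttons, and technical terms on the current page with Chinese wording aimed at operations staff, and map low-level technical states to business-readable states.
### Tasks
1. Replace the page title, subtitle, and section titles
2. Replace button text and input placeholder text
3. Replace log label text
4. Build the Chinese mapping for `Readiness`
5. Build the Chinese mapping for archetype
### Required Mappings
The minimum mapping set includes (English glosses in parentheses are explanatory only; the UI text is the Chinese string):
1. `Readiness A/B/C -> 可直接生成 / 可生成但需确认 / 暂不建议生成` (ready to generate / can generate but needs confirmation / generation not recommended for now)
2. `single_request_table -> 单页报表` (single-page report)
3. `multi_mode_request -> 多模式报表` (multi-mode report)
4. `paginated_enrichment -> 分页明细` (paginated detail)
5. `page_state_eval -> 页面检测` (page check)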
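The required mappings amount to two small lookup tables. The page itself is front-end code, so the Rust below is only a language-neutral sketch of the lookup logic; the function names and the choice to return `None` for unknown keys (leaving the fallback text to the caller) are assumptions.

```rust
/// Maps a readiness grade to the Chinese label listed in the plan.
pub fn readiness_label(level: &str) -> Option<&'static str> {
    match level {
        "A" => Some("可直接生成"),
        "B" => Some("可生成但需确认"),
        "C" => Some("暂不建议生成"),
        _ => None,
    }
}

/// Maps a workflow archetype to the Chinese scene-type label listed in
/// the plan. Archetypes outside the minimum mapping set return `None`.
pub fn archetype_label(archetype: &str) -> Option<&'static str> {
    match archetype {
        "single_request_table" => Some("单页报表"),
        "multi_mode_request" => Some("多模式报表"),
        "paginated_enrichment" => Some("分页明细"),
        "page_state_eval" => Some("页面检测"),
        _ => None,
    }
}
```

Keeping both mappings in one place is one way to address the risk, noted later in this plan, of inconsistent state mapping across the page.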
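### Deliverables
1. Chinese titles and buttons implementation
2. Chinese state mapping implementation
3. Chinese scene type mapping implementation
4. Chinese risk and result text implementation
### Acceptance Criteria
1. The first screen no longer shows large areas of untranslated English
2. Operations staff can directly understand the main states and scene types
3. Technical terms are no longer the primary text on the home screen
## Phase 3: Complete Log and Result Area Convergence
### Objective
Make the page log and result area serve "execution and troubleshooting" first, rather than raw streamed debug output.
### Tasks
1. Rename `Generation Log` to `执行过程` (execution process)
2. Localize the `status / log / complete / error` labels into Chinese
3. Condense the raw stream log into a Chinese summary log by default
4. Improve the success/failure state display in the `生成结果` (generation result) area
5. Make the output directory and result file entry points more prominent
### Deliverables
1. Chinese summary log
2. Generation result card
3. Failure reason summary
4. Output directory entry point
### Acceptance Criteria
1. Operations staff can understand the execution process without reading low-level SSE technical messages
2. On success, the result directory can be found quickly
3. On failure, the Chinese failure reason can be seen quickly
## Phase 4: Complete Debug-Layer Folding and Overall Acceptance
### Objective
Keep the development and troubleshooting capabilities, but push them down into a debug layer by default so they do not interfere with first-screen use by operations staff.
### Tasks
1. Move `Scene IR`, `requestTemplate`, `evidence`, `workflow steps`, and similar items into the technical details area
2. Move `scene-id`, `scene-kind`, `targetUrl override`, and `workflow archetype override` into advanced settings
3. Verify the default visibility logic
4. Verify the experience boundary between operations mode and debug mode
5. Complete the final page acceptance
### Deliverables
1. Folded advanced settings area
2. Folded technical details area
3. Final page visibility strategy implementation
### Acceptance Criteria
1. The operations home screen carries only the state summary, operations, and results
2. Developers can still view the full technical information through the folded areas
3. The experience of "the default first screen is a technical debug panel" no longer occurs
## File-Level Planning Targets
Subsequent implementation of this plan covers at least the following assets:
1. [sg_scene_generator.html](D:/data/ideaSpace/rust/sgClaw/claw-new/frontend/scene-generator/sg_scene_generator.html)
2. Front-end scripts related to page display text and visibility logic
3. Front-end styles and rendering logic related to the page title, section structure, and state mapping
## Completion Criteria
This plan is complete when:
1. The page's default form has shifted from a "developer debugging console" to an "operations workbench"
2. The first screen has completed Chinese localization and business-state mapping
3. The default operations flow can be completed on the first screen without relying on the technical details area
4. Debug information is retained but no longer floods the home screen by default
5. Failure reasons, risk hints, and the result directory are directly understandable to operations staff
## Risks and Control Points
1. If only the text is changed and the information architecture is not, the page will remain bloated
2. If fields are only hidden and the result summary is not redone, operations staff still cannot quickly judge whether generation is possible
3. If too much technical information is removed, development and troubleshooting efficiency will suffer
4. If the state mapping is inconsistent, the page will feel disjointed, with low-level technical semantics still showing under Chinese titles
## Out of Plan
The following items are outside the direct delivery scope of this plan:
1. Refactoring the scene generator back-end analysis logic
2. Changes to the Skill generation protocol
3. New server-side interfaces for the page
4. Operations permissions, account systems, or multi-role access control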

View File

@@ -0,0 +1,277 @@
# sgClaw Scene Skill Post-Roadmap Execution Plan
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
> **Upstream Spec:** [2026-04-18-scene-skill-post-roadmap-execution-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-18-scene-skill-post-roadmap-execution-design.md)
## Plan Intent
This plan starts after the closure of the current `60-to-90 roadmap`.
Its purpose is not to reopen `G1/G2/G3` implementation, but to:
1. unify current execution state
2. start real-sample validation
3. plan the next bounded roadmap
## Scope Guardrails
1. Do not reopen completed `G1/G2/G3` repo-local baseline implementation.
2. Do not keep expanding fixtures as the primary mode of progress.
3. Do not silently pull `G4/G5` into implementation.
4. Do not directly implement unified login recovery in this plan.
5. Do not treat the old roadmap as still open-ended.
6. Phase 1 execution-board work must stay minimal and exist only to support Phase 2 real-sample validation.
7. Once `G2`, `G1-E`, and `G3` each have at least one mappable real sample, execution must move immediately into Phase 2.
8. Any new asset that does not directly support real-sample validation is deferred to Phase 3 or Phase 4.
## Workstreams
1. `WS1` Current Execution Board Unification
2. `WS2` Real Sample Validation
3. `WS3` Boundary and Runtime Gap Planning
4. `WS4` Next Roadmap Definition
## Phase Overview
1. Phase 0: Freeze Handover Boundary
2. Phase 1: Build Current Execution Board
3. Phase 2: Start Real Sample Validation
4. Phase 3: Define Boundary and Runtime Entry Rules
5. Phase 4: Publish the Next Roadmap
Execution order is fixed as:
`Phase 0 -> Phase 1 -> Phase 2 -> Phase 3 -> Phase 4`
## Phase 0: Freeze Handover Boundary
### Objective
Freeze the boundary between the completed roadmap and the next-stage work.
### Tasks
1. Freeze current roadmap completion status.
2. Freeze current mainline family status for `G2`, `G1-E`, and `G3`.
3. Freeze current boundary family status for `G6/G7/G8`.
4. Freeze current deferred status for `G4/G5`.
### Deliverables
1. roadmap handover snapshot
2. next-stage scope statement
3. current family-state matrix
### Acceptance Criteria
1. old and new roadmap boundaries are explicit
2. next-stage work is no longer mixed into the old roadmap
## Phase 1: Build Current Execution Board
### Objective
Create the minimum authoritative execution board required to start real-sample validation for the current `102-scene` status.
### WS1
#### Task 1
Build one `102-scene current execution board`.
#### Task 2
Define the stable scene status vocabulary:
1. `promoted-baseline`
2. `promoted-expansion`
3. `boundary-family`
4. `deferred`
5. `degraded`
6. `unvalidated`
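The six labels above form a closed vocabulary, so one way to keep the board honest is to model them as an enum and reject anything else at parse time. The following is a minimal sketch under that assumption; the type name `SceneStatus` and its methods are hypothetical and do not come from the repository.

```rust
/// Hypothetical enum mirroring the six status labels listed in Task 2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SceneStatus {
    PromotedBaseline,
    PromotedExpansion,
    BoundaryFamily,
    Deferred,
    Degraded,
    Unvalidated,
}

impl SceneStatus {
    /// Every label, in the order the plan lists them.
    pub const ALL: [SceneStatus; 6] = [
        SceneStatus::PromotedBaseline,
        SceneStatus::PromotedExpansion,
        SceneStatus::BoundaryFamily,
        SceneStatus::Deferred,
        SceneStatus::Degraded,
        SceneStatus::Unvalidated,
    ];

    /// The label exactly as it is written in the plan.
    pub fn as_str(self) -> &'static str {
        match self {
            SceneStatus::PromotedBaseline => "promoted-baseline",
            SceneStatus::PromotedExpansion => "promoted-expansion",
            SceneStatus::BoundaryFamily => "boundary-family",
            SceneStatus::Deferred => "deferred",
            SceneStatus::Degraded => "degraded",
            SceneStatus::Unvalidated => "unvalidated",
        }
    }

    /// Parse a board label; anything outside the vocabulary yields `None`.
    pub fn parse(label: &str) -> Option<SceneStatus> {
        Self::ALL.iter().copied().find(|s| s.as_str() == label)
    }
}
```

Returning `None` for unknown labels lets a board loader fail loudly instead of silently inventing a seventh state.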
#### Task 3
Map current `G2/G1-E/G3` scene promotions into the board.
#### Task 4
Generate a snapshot-vs-current diff asset.
#### Task 5
Stop Phase 1 immediately after `G2`, `G1-E`, and `G3` each have at least one mappable real sample entry in the board.
### Deliverables
1. `102-scene current execution board`
2. snapshot-vs-current diff report
3. scene-to-family status mapping
### Acceptance Criteria
1. every scene has one current-state label
2. promoted states are visible without reading multiple assets
3. board status matches current family assets
4. the board is limited to the minimum fields needed by Phase 2 validation records
5. no Phase 1 asset is added unless it directly supports real-sample validation
## Phase 2: Start Real Sample Validation
### Objective
Create the next quality layer above fixture success.
### WS2
#### Task 5
Choose the first real-sample validation set for:
1. `G2`
2. `G1-E`
3. `G3`
#### Task 6
Freeze validation criteria:
1. compile success
2. readiness correctness
3. data correctness
4. output correctness
5. fail-closed correctness
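The five frozen criteria can be carried as one flag each on a validation record, which makes "which criterion failed" a direct lookup. This is an illustrative sketch only; `ValidationOutcome` and its field names are assumptions, since the plan does not define the record template's shape.

```rust
/// Hypothetical record shape: one flag per frozen validation criterion.
#[derive(Debug, Clone, Default)]
pub struct ValidationOutcome {
    pub compile_success: bool,
    pub readiness_correct: bool,
    pub data_correct: bool,
    pub output_correct: bool,
    pub fail_closed_correct: bool,
}

impl ValidationOutcome {
    /// Names of the criteria that did not hold, in the plan's order.
    pub fn failed_criteria(&self) -> Vec<&'static str> {
        let checks = [
            ("compile success", self.compile_success),
            ("readiness correctness", self.readiness_correct),
            ("data correctness", self.data_correct),
            ("output correctness", self.output_correct),
            ("fail-closed correctness", self.fail_closed_correct),
        ];
        checks
            .iter()
            .filter(|(_, ok)| !*ok)
            .map(|(name, _)| *name)
            .collect()
    }

    /// A sample passes only when every criterion holds.
    pub fn passed(&self) -> bool {
        self.failed_criteria().is_empty()
    }
}
```

The list returned by `failed_criteria` could feed the mismatch taxonomy directly, so mismatch reasons stay explicit and reusable.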
#### Task 7
Create a real-sample validation record template.
#### Task 8
Record first-round real-sample results.
#### Task 9
Write mismatches back into the execution board.
#### Task 10
Reject requests for new board-only assets that do not unblock current validation execution.
### Deliverables
1. real-sample validation plan
2. real-sample record template
3. first-round validation records
4. mismatch taxonomy
### Acceptance Criteria
1. each mainline family has at least one real-sample record
2. real-sample status is separated from fixture status
3. mismatch reasons are explicit and reusable
4. Phase 2 begins as soon as `G2`, `G1-E`, and `G3` each have one mappable real sample
## Phase 3: Define Boundary and Runtime Entry Rules
### Objective
Prepare the next bounded execution scope instead of drifting into it.
### WS3
#### Task 11
Assess `G6/G7/G8` boundary-family readiness for future expansion.
#### Task 12
Define formal entry criteria for `G4/G5`.
#### Task 13
Build a runtime-gap matrix for:
1. login recovery
2. host-runtime integration
3. transport/runtime gaps
4. local document and attachment workflows
#### Task 14
Separate:
1. archetype-family gaps
2. runtime-platform gaps
### Deliverables
1. boundary readiness note
2. deferred family entry criteria
3. runtime gap matrix
4. prioritization note
### Acceptance Criteria
1. `G4/G5` do not enter the next build round without documented criteria
2. runtime gaps are tracked separately from family expansion
3. next implementation scope has an explicit reason
## Phase 4: Publish the Next Roadmap
### Objective
Replace open-ended continuation with a new bounded roadmap.
### WS4
#### Task 15
Write the next-stage design.
#### Task 16
Write the next-stage plan.
#### Task 17
Define milestone ordering.
#### Task 18
Define next-stage completion criteria.
### Deliverables
1. post-roadmap design
2. post-roadmap plan
3. milestone table
4. completion criteria
### Acceptance Criteria
1. new implementation work has a new roadmap
2. the old roadmap is no longer implicitly extended
3. next-stage completion can be judged independently
## Milestone Order
1. Freeze the handover boundary
2. Unify the execution board
3. Start real-sample validation
4. Freeze boundary/runtime entry rules
5. Publish the next roadmap
No new implementation round should begin before milestones 1 to 4 are complete.
No Phase 1 expansion should continue after the minimum board needed for milestone 3 is available.
## Completion Criteria
This plan is complete when:
1. the current roadmap is explicitly closed
2. the execution board is unified
3. real-sample validation is formally underway
4. a new bounded roadmap exists for post-roadmap work


@@ -0,0 +1,128 @@
# sgClaw Scene Skill Real Sample Validation Roadmap Plan
> **Status:** Draft
> **Date:** 2026-04-18
> **Author:** Codex
> **Upstream Spec:** [2026-04-18-scene-skill-real-sample-validation-roadmap-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-18-scene-skill-real-sample-validation-roadmap-design.md)
## Plan Intent
This plan starts after the post-roadmap execution board and first-round validation layer are in place.
Its purpose is to:
1. execute selected real samples for `G2`, `G1-E`, and `G3`
2. use validation outcomes to decide the next bounded implementation scope
3. avoid drifting back into fixture-first or asset-first work
## Scope Guardrails
1. Do not reopen completed repo-local baseline implementation for `G1/G2/G3`.
2. Do not create new board-only assets unless they unblock current validation execution.
3. Do not open `G4/G5` implementation before formal entry decisions are documented.
4. Do not pull `G6/G7/G8` into the next build round without explicit validation pressure.
## Workstreams
1. `WS1` Mainline Real Sample Execution
2. `WS2` Validation Result Triage
3. `WS3` Boundary Runtime Entry Decision
4. `WS4` Deferred Family Entry Decision
## Phase 0: Execute Mainline Real Samples
### Objective
Convert selected `G2`, `G1-E`, and `G3` anchors into executed real-sample records.
### Tasks
1. Execute `G2` anchor validation updates from the current mismatch baseline.
2. Keep `G1-E` real pass anchor as the current positive baseline.
3. Execute the pending `G3` real sample.
4. Write all outcomes into the validation record layer.
### Deliverables
1. updated real-sample validation records
2. updated mismatch taxonomy usage
3. updated execution-board validation statuses
### Acceptance Criteria
1. `G2`, `G1-E`, and `G3` each have executed real-sample records
2. `selected-not-yet-run` no longer remains for current mainline anchors
## Phase 1: Triage Results Into Scope Decisions
### Objective
Use validation results, not fixture status, to choose the next bounded scope.
### Tasks
1. classify each mainline family result as `stable`, `mismatch-driven`, or `blocked-by-runtime`
2. identify which problems are compiler-family gaps and which are runtime gaps
3. define the next recommended scope from validation evidence
### Deliverables
1. validation triage report
2. next-scope recommendation
### Acceptance Criteria
1. the next scope is justified by executed validation evidence
2. repo-local success no longer acts as the sole decision signal
## Phase 2: Boundary Runtime Entry Decision
### Objective
Decide whether `G6/G7/G8` should stay boundary-only or enter a runtime-focused roadmap.
### Tasks
1. compare boundary-family runtime gaps against executed validation pressure
2. decide whether any boundary family should enter the next roadmap
3. document non-entry decisions explicitly when scope stays closed
### Deliverables
1. boundary runtime decision note
2. next-roadmap inclusion or exclusion list
### Acceptance Criteria
1. `G6/G7/G8` entry decisions are explicit
2. no boundary family enters by drift
## Phase 3: Deferred Family Entry Decision
### Objective
Decide whether `G4/G5` should remain closed or enter a later roadmap.
### Tasks
1. compare deferred-family criteria against current validation pressure
2. confirm whether `G4/G5` remain deferred or degraded
3. record the decision before any new implementation starts
### Deliverables
1. deferred family decision note
2. updated next-roadmap scope boundary
### Acceptance Criteria
1. `G4/G5` entry decisions are explicit
2. deferred families do not enter implementation implicitly
## Completion Criteria
This plan is complete when:
1. all selected mainline anchors have executed real-sample records
2. the next implementation scope is selected from validation outcomes
3. boundary and deferred family entry decisions are documented


@@ -0,0 +1,51 @@
# 102 Final Coverage Status Rollup Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Layer: `Layer E`
> Status: Active
## Plan Intent
Create the final 102-scene coverage rollup after residual 13 closure. This plan publishes a candidate/status view only.
## Fixed Inputs
1. `tests/fixtures/generated_scene/full_coverage_reconciliation_candidates_2026-04-19.json`
2. `tests/fixtures/generated_scene/residual_13_reconciliation_candidates_2026-04-19.json`
3. `tests/fixtures/generated_scene/boundary_residual_hold_decision_2026-04-19.json`
4. `tests/fixtures/generated_scene/bootstrap_target_residual_isolation_2026-04-19.json`
5. `tests/fixtures/generated_scene/promotion_board_reconciliation_policy_2026-04-19.json`
## Allowed Files
1. `tests/fixtures/generated_scene/final_coverage_status_rollup_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-102-final-coverage-status-rollup-report.md`
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `src/generated_scene/analyzer.rs`
3. `src/generated_scene/generator.rs`
## Tasks
1. Load the 102-scene full coverage reconciliation candidate view.
2. Load the residual 13 reconciliation candidate view.
3. Replace matching residual scenes in the 102 view with residual follow-up candidate statuses.
4. Attach boundary/bootstrap overlay decisions where present.
5. Produce final coverage summary.
6. Publish the rollup JSON.
7. Publish the rollup report.
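Tasks 3 and 5 amount to an overlay followed by a count. A rough sketch, assuming each view can be reduced to a map from scene name to status string (the real JSON schema is not shown in this plan, and both function names are hypothetical):

```rust
use std::collections::BTreeMap;

/// Overlay residual follow-up statuses onto the full candidate view.
/// Only scenes already present in `full_view` are replaced, so the
/// scene count of the rollup cannot drift away from the input count.
pub fn overlay_residual(
    full_view: &BTreeMap<String, String>,
    residual: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut rollup = full_view.clone();
    for (scene, status) in residual {
        if let Some(slot) = rollup.get_mut(scene) {
            *slot = status.clone();
        }
    }
    rollup
}

/// Count scenes per status for the final coverage summary.
pub fn summarize(rollup: &BTreeMap<String, String>) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for status in rollup.values() {
        *counts.entry(status.clone()).or_insert(0) += 1;
    }
    counts
}
```

Ignoring residual entries that have no match in the full view is a design choice in this sketch; a stricter variant could report them as errors instead.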
## Completion Criteria
1. Final rollup contains `102` scenes.
2. Final summary has `95` framework auto-pass candidates and `7` structured fail-closed candidates.
3. There are `0` source-unreadable, unsupported-family, missing-source, and misclassified-unresolved records.
4. Official execution board is not modified.
5. Report names the next bounded step.
## Stop Statement
Stop after the final coverage rollup JSON and report are published. Do not update the official execution board under this plan.


@@ -0,0 +1,42 @@
# 102 Framework Closure Rollup Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Sequence: `2026-04-19-final-2-residual-child-plan-sequence-plan.md`
> Status: Draft
## Plan Intent
Publish the final 102-scene framework closure rollup after the final-2 residual roadmaps and board refresh are complete.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. optional `tests/fixtures/generated_scene/final_2_official_board_reconciliation_refresh_2026-04-19.json`
## Allowed Files
1. `tests/fixtures/generated_scene/scene_skill_102_framework_closure_rollup_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-scene-skill-102-framework-closure-rollup-report.md`
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Tasks
1. Load official board.
2. Count framework statuses.
3. List any remaining structured fail-closed scenes and their named next actions.
4. Verify unresolved count is zero.
5. Publish closure rollup JSON and report.
## Expected Delta
No implementation delta. This is the final reporting layer.
## Stop Statement
Stop after publishing the 102 framework closure rollup. Do not start another runtime roadmap under this plan.


@@ -0,0 +1,62 @@
# 102 Full Coverage Follow-Up Sweep And Reconciliation Plan
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Layer: `Layer E`
> Upstream Design: `docs/superpowers/specs/2026-04-19-102-full-coverage-followup-sweep-and-reconciliation-design.md`
## Plan Intent
Run one fixed full 102-scene follow-up sweep after Route 2 through Route 6 have closed, then publish a policy-governed reconciliation candidate view.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/g3_enrichment_request_closure_followup_2026-04-19.json`
3. `tests/fixtures/generated_scene/g3_export_plan_closure_followup_2026-04-19.json`
4. `tests/fixtures/generated_scene/g3_residual_contract_closure_2026-04-19.json`
5. `tests/fixtures/generated_scene/g2_remaining_fail_closed_closure_followup_2026-04-19.json`
6. `tests/fixtures/generated_scene/g1e_remaining_fail_closed_closure_followup_2026-04-19.json`
7. `tests/fixtures/generated_scene/boundary_fail_closed_decision_2026-04-19.json`
8. `tests/fixtures/generated_scene/promotion_board_reconciliation_policy_2026-04-19.json`
## Allowed Files
1. follow-up sweep JSON asset
2. reconciliation candidate JSON asset
3. follow-up sweep report
4. reconciliation candidate report
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
4. family implementation assets
## Tasks
1. run fixed 102-scene follow-up sweep
2. classify raw sweep result
3. apply Route 5 route decisions where applicable
4. apply Route 6 promotion policy to build reconciliation candidate view
5. publish coverage delta and remaining-gap report
## Expected Coverage Delta
The plan should quantify cumulative delta after Routes 2, 3, and 4.
## Completion Criteria
1. total scene count is 102
2. every scene has one raw sweep status
3. every scene has one reconciliation candidate status
4. coverage delta is reported
5. official execution board is not modified
## Stop Statement
Stop after publishing the follow-up sweep and reconciliation candidate reports.
Do not start a new implementation route under this plan.


@@ -0,0 +1,197 @@
# 102 Full Sweep Dry-Run Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-102-full-sweep-dry-run-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-102-full-sweep-dry-run-design.md)
## Plan Intent
Run one bounded, read-only full sweep over the `102` scene ledger to measure actual generic `scene -> skill` coverage.
The plan answers:
`how many of the 102 scenes can the current generic analyzer/generator handle today?`
## Scope Guardrails
1. do not change analyzer logic
2. do not change generator logic
3. do not promote scenes into `scene_execution_board_2026-04-18.json`
4. do not add new family baselines
5. do not create new family implementation plans
6. do not fix failures during this dry-run
7. do not run outside the fixed `102` scene set
## Fixed Inputs
1. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
3. generator command: `cargo run --bin sg_scene_generate`
## Fixed Outputs
1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
3. report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md`
## Workstreams
1. `WS1` Build Scene Inventory
2. `WS2` Run Analyzer/Generator Dry-Run
3. `WS3` Classify Results
4. `WS4` Publish Coverage Report
## Phase 0: Freeze Dry-Run Boundary
### Objective
Make the dry-run a measurement exercise only.
### Tasks
1. freeze the execution board input
2. freeze the local scene root
3. freeze the dry-run output paths
4. explicitly mark the run as read-only with respect to generator behavior and board status
### Deliverables
1. fixed input statement
2. fixed output statement
3. dry-run no-promotion statement
### Acceptance Criteria
1. no analyzer/generator implementation file is edited for this dry-run
2. `scene_execution_board_2026-04-18.json` is not modified by dry-run results
3. failures are recorded, not fixed
## Phase 1: Build Scene Inventory
### Objective
Construct a deterministic inventory of all `102` scene names and expected source directories.
### Tasks
1. read `scene_execution_board_2026-04-18.json`
2. extract all scene entries
3. map each scene name to `D:/desk/智能体资料/全量业务场景/一平台场景/<sceneName>`
4. check whether each source directory exists
5. assign initial inventory status:
- `source-present`
- `missing-source`
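The inventory step can be sketched as a pure mapping from scene names to paths plus an existence check. The types and the function name below are hypothetical; reading the scene names out of the execution board JSON is left out because its schema is not shown here.

```rust
use std::path::{Path, PathBuf};

/// The two initial inventory statuses named in the plan.
#[derive(Debug, PartialEq, Eq)]
pub enum InventoryStatus {
    SourcePresent,
    MissingSource,
}

/// Hypothetical inventory row: scene name, expected directory, status.
#[derive(Debug)]
pub struct InventoryEntry {
    pub scene_name: String,
    pub source_path: PathBuf,
    pub status: InventoryStatus,
}

/// Map each scene name to `<scene_root>/<scene_name>` and record whether
/// that directory exists. A missing directory is recorded, not treated
/// as an error, so it cannot stop the sweep.
pub fn build_inventory(scene_root: &Path, scene_names: &[String]) -> Vec<InventoryEntry> {
    scene_names
        .iter()
        .map(|name| {
            let source_path = scene_root.join(name);
            let status = if source_path.is_dir() {
                InventoryStatus::SourcePresent
            } else {
                InventoryStatus::MissingSource
            };
            InventoryEntry {
                scene_name: name.clone(),
                source_path,
                status,
            }
        })
        .collect()
}
```

Because every input name produces exactly one entry, the acceptance check "inventory count equals `102`" reduces to checking the length of the input list.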
### Deliverables
1. inventory section inside `full_sweep_dry_run_2026-04-19.json`
2. missing-source list
### Acceptance Criteria
1. inventory count equals `102`
2. every scene has a source path
3. missing source does not stop the sweep
## Phase 2: Run Analyzer/Generator Dry-Run
### Objective
Attempt current generic generation for every source-present scene without fixing failures.
### Tasks
1. generate a stable safe scene id for each scene
2. invoke `sg_scene_generate` for each source-present scene
3. write outputs under `examples/full_sweep_dry_run_2026-04-19`
4. for successful generation, read `references/generation-report.json`
5. for failed generation, capture stderr/stdout and exit code
6. continue until all `102` scenes are processed
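Task 1 asks for a stable, safe scene id. Since many scene names are not ASCII, one possible approach is to keep whatever ASCII alphanumerics exist and append a deterministic hash of the full name. The sketch below uses FNV-1a purely as an illustration; the plan does not say which scheme `sg_scene_generate` expects, and `safe_scene_id` is a hypothetical helper.

```rust
/// FNV-1a 64-bit hash over the UTF-8 bytes of the scene name.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Derive a filesystem-safe, deterministic id from a scene name.
/// The output contains only ASCII letters, digits, and '-'.
pub fn safe_scene_id(scene_name: &str) -> String {
    // Keep at most 24 ASCII alphanumerics as a human-readable hint.
    let slug: String = scene_name
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_lowercase())
        .take(24)
        .collect();
    let hash = fnv1a64(scene_name.as_bytes());
    if slug.is_empty() {
        format!("scene-{:016x}", hash)
    } else {
        format!("scene-{}-{:016x}", slug, hash)
    }
}
```

The id depends only on the scene name, so rerunning the sweep writes each scene to the same directory under the dry-run output root.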
### Deliverables
1. per-scene dry-run execution record
2. generated output root for successful scenes
3. captured error messages for failed scenes
### Acceptance Criteria
1. every source-present scene has a generator result
2. no failure aborts the full sweep
3. generator results are isolated under the dry-run output root
## Phase 3: Classify Results
### Objective
Turn raw dry-run output into actionable coverage categories.
### Tasks
1. classify generated `A/B` readiness with no blocker as `auto-pass`
2. classify generator blocking with known gate/contract reason as `fail-closed-known`
3. classify obvious family mismatch as `misclassified`
4. classify evidence outside current families as `unsupported-family`
5. classify absent directories as `missing-source`
6. classify read/analyze failures as `source-unreadable`
7. compute top blockers by frequency
8. compute counts by inferred archetype
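The classification rules in tasks 1 to 6 can be read as a single function that assigns exactly one status per scene. The sketch below is one possible reading: the `SceneRun` fields and, in particular, the precedence order between the rules are assumptions, because the plan lists the categories without ordering them.

```rust
/// The six final dry-run statuses named in the tasks above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DryRunStatus {
    AutoPass,
    FailClosedKnown,
    Misclassified,
    UnsupportedFamily,
    MissingSource,
    SourceUnreadable,
}

/// Hypothetical per-scene facts gathered by the sweep.
#[derive(Debug, Clone, Copy)]
pub struct SceneRun<'a> {
    pub source_exists: bool,
    /// False when reading or analyzing the source failed.
    pub analyzed: bool,
    pub expected_archetype: Option<&'a str>,
    /// `None` when the evidence matches no current family.
    pub inferred_archetype: Option<&'a str>,
    /// Readiness grade from the generation report, if one was written.
    pub readiness: Option<char>,
    pub blocker: Option<&'a str>,
}

/// Assign exactly one status. The check order is an assumed precedence.
pub fn classify(run: &SceneRun) -> DryRunStatus {
    if !run.source_exists {
        return DryRunStatus::MissingSource;
    }
    if !run.analyzed {
        return DryRunStatus::SourceUnreadable;
    }
    let inferred = match run.inferred_archetype {
        Some(archetype) => archetype,
        None => return DryRunStatus::UnsupportedFamily,
    };
    if let Some(expected) = run.expected_archetype {
        if expected != inferred {
            return DryRunStatus::Misclassified;
        }
    }
    match (run.readiness, run.blocker) {
        (Some('A'), None) | (Some('B'), None) => DryRunStatus::AutoPass,
        _ => DryRunStatus::FailClosedKnown,
    }
}
```

Because the function always returns one variant, the acceptance criterion "every scene has exactly one final status" holds by construction.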
### Deliverables
1. final dry-run status per scene
2. summary counts
3. by-archetype counts
4. top-blocker list
### Acceptance Criteria
1. every scene has exactly one final status
2. total classified count equals `102`
3. every non-pass scene has a reason
## Phase 4: Publish Report
### Objective
Answer the coverage question without changing project state.
### Tasks
1. write `full_sweep_dry_run_2026-04-19.json`
2. write `2026-04-19-102-full-sweep-dry-run-report.md`
3. report these four headline numbers:
- `real-sample executed pass`
- `code-backed ledger coverage`
- `dry-run auto-pass`
- `dry-run actionable coverage`
4. list next recommended blocker, but do not start implementation
### Deliverables
1. dry-run JSON
2. dry-run report
### Acceptance Criteria
1. report can answer actual generic coverage over `102` scenes
2. report separates proven coverage from predicted/dry-run coverage
3. report does not promote scene status
## Completion Criteria
This plan is complete when:
1. all `102` scenes are included in the dry-run result
2. the dry-run result has stable summary counts
3. the report explains the gap between `5/102`, `23/102`, and dry-run coverage
4. no generator logic or execution board status is modified
## Non-Negotiable Stop Rule
After this dry-run starts:
1. do not fix generator failures inside the sweep
2. do not create new family implementation plans from a single failure
3. do not update the execution board automatically
4. stop after publishing the dry-run result and report


@@ -0,0 +1,240 @@
# 102 Full Sweep Dry-Run Triage Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: `docs/superpowers/specs/2026-04-19-102-full-sweep-dry-run-triage-design.md`
## Plan Intent
Turn the `62` non-pass records from the full sweep into concrete triage buckets while staying measurement-only.
The plan must not fix generator failures. It only explains them.
## Fixed Inputs
1. dry-run result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
2. dry-run output root: `examples/full_sweep_dry_run_2026-04-19`
3. execution board: `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
4. scene root: `D:/desk/智能体资料/全量业务场景/一平台场景`
## Fixed Outputs
1. triage result: `tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json`
2. triage report: `docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md`
## Non-Negotiable Scope Guardrails
1. do not edit analyzer implementation
2. do not edit generator implementation
3. do not update `scene_execution_board_2026-04-18.json`
4. do not promote any scene
5. do not add new family baselines
6. do not start implementation correction during triage
7. do not expand beyond the fixed `102` scene set
## Workstreams
1. `WS1` Timeout Triage
2. `WS2` Misclassification Triage
3. `WS3` No-Report Failure Triage
4. `WS4` Publish Triage Result
## Phase 0: Freeze Triage Boundary
### Objective
Make the triage a classification exercise only.
### Tasks
1. read the upstream dry-run result
2. verify the upstream result has `102` scenes
3. verify non-pass buckets are:
- `31` timeout records
- `5` misclassified records
- `25` no-report records
- `1` bootstrap-target record
4. freeze the triage order:
- timeout first
- misclassification second
- no-report third
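The bucket counts in task 3 must add up to the `62` non-pass records, and the upstream result must still contain `102` scenes. A small guard like the following could run before triage starts; the struct and function names are hypothetical, while the numbers are the ones frozen in this plan.

```rust
/// Non-pass bucket counts as read from the upstream dry-run result.
pub struct NonPassBuckets {
    pub timeout: u32,
    pub misclassified: u32,
    pub no_report: u32,
    pub bootstrap_target: u32,
}

impl NonPassBuckets {
    /// Sum of all non-pass buckets.
    pub fn total(&self) -> u32 {
        self.timeout + self.misclassified + self.no_report + self.bootstrap_target
    }
}

/// Check the upstream result against the frozen numbers (102 scenes,
/// 62 non-pass records) before any triage work begins.
pub fn verify_frozen_input(scene_count: u32, buckets: &NonPassBuckets) -> Result<(), String> {
    if scene_count != 102 {
        return Err(format!("expected 102 scenes, found {}", scene_count));
    }
    if buckets.total() != 62 {
        return Err(format!("expected 62 non-pass records, found {}", buckets.total()));
    }
    Ok(())
}
```

With the plan's values, 31 + 5 + 25 + 1 = 62, so the guard passes; any drift in the upstream file would surface as an error rather than as a silently different triage scope.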
### Deliverables
1. frozen triage input statement
2. frozen non-pass bucket counts
3. frozen triage order
### Acceptance Criteria
1. triage input count is stable
2. no code is changed
3. no board status is updated
## Phase 1: Timeout Triage
### Objective
Split the `31` timeout records into second-level reasons.
### Tasks
1. select records where `dryRunStatus = source-unreadable`
2. verify reason is `generator timeout after 30s`
3. collect source directory metadata:
- source directory exists
- file count
- total source bytes
- largest file path
- largest file bytes
4. collect dry-run artifact metadata:
- generated skill directory exists
- references directory exists
- generation report exists
5. preserve board context:
- current group
- current status
- current source asset
- real sample record id
6. optionally run one diagnostic longer-timeout attempt for classification only
7. assign one timeout label:
- `timeout-known-family-sample`
- `timeout-unvalidated-source`
- `timeout-large-source`
- `timeout-command-hang`
- `timeout-generator-slow-but-progressing`
- `timeout-undetermined`
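The source directory metadata in task 3 (existence, file count, total bytes, largest file) can be gathered with a plain directory walk. This is a minimal sketch using only the standard library; `SourceMetadata` and `collect_source_metadata` are hypothetical names, and the thresholds that would turn these numbers into a timeout label are deliberately left out because the plan does not define them.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical container for the source metadata listed in task 3.
#[derive(Debug, Default)]
pub struct SourceMetadata {
    pub exists: bool,
    pub file_count: u64,
    pub total_bytes: u64,
    /// Path and size of the largest regular file, if any file was found.
    pub largest_file: Option<(PathBuf, u64)>,
}

/// Walk `dir` iteratively and collect size metadata.
/// Symlinks are skipped because `DirEntry::file_type` does not follow them.
pub fn collect_source_metadata(dir: &Path) -> io::Result<SourceMetadata> {
    let mut meta = SourceMetadata::default();
    if !dir.is_dir() {
        return Ok(meta); // `exists` stays false
    }
    meta.exists = true;
    let mut pending = vec![dir.to_path_buf()];
    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending.push(entry.path());
            } else if file_type.is_file() {
                let len = entry.metadata()?.len();
                meta.file_count += 1;
                meta.total_bytes += len;
                let is_larger = match &meta.largest_file {
                    Some((_, max)) => len > *max,
                    None => true,
                };
                if is_larger {
                    meta.largest_file = Some((entry.path(), len));
                }
            }
        }
    }
    Ok(meta)
}
```

The walk only reads the file system, which keeps it consistent with the measurement-only guardrails of this triage.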
### Deliverables
1. `timeoutTriage[]` records in the triage JSON
2. timeout label summary
3. timeout size/source metadata summary
### Acceptance Criteria
1. all `31` timeout records have a second-level label
2. no timeout is treated as unsupported family by default
3. no long-timeout rerun result promotes a scene
## Phase 2: Misclassification Triage
### Objective
Explain the `5` board-vs-archetype conflicts.
### Tasks
1. select records where `dryRunStatus = misclassified`
2. preserve:
- board expected group
- expected archetype
- inferred archetype
- current source asset
- real sample layer status
3. inspect existing dry-run report path when present
4. collect route-conflict evidence:
- whether host bridge evidence dominates
- whether G3 or G1-E evidence is still present
- whether current board expectation came from baseline or expansion
5. assign one routing triage label:
- `route-overprefer-host-bridge`
- `board-expectation-stale`
- `mixed-workflow-host-bridge-valid`
- `scene-family-split-needed`
- `misclassification-undetermined`
### Deliverables
1. `misclassificationTriage[]` records in the triage JSON
2. routing conflict summary
3. high-priority routing risk list
### Acceptance Criteria
1. all `5` misclassified records have a routing label
2. no routing code is changed
3. the report identifies whether implementation correction is justified later
## Phase 3: No-Report Failure Triage
### Objective
Split the `25` generic no-report failures into concrete failure stages.
### Tasks
1. select records where:
- `dryRunStatus = fail-closed-known`
- `reason = generator failed without generation report`
2. collect command artifacts:
- exit code
- stdout tail
- stderr tail
3. inspect output artifacts:
- skill directory exists
- references directory exists
- any report file exists
4. infer one failure stage:
- `source-scan`
- `analyzer`
- `ir-assembly`
- `readiness-before-report`
- `compiler-package-write`
- `panic-or-process-error`
- `unknown-no-report`
5. keep `bootstrap_target` failure separate
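For the "stdout tail" and "stderr tail" artifacts in task 2, a small helper that keeps only the last few lines of captured output is usually enough. The function below is a hypothetical sketch; how many lines to keep is not specified by the plan and is left to the caller.

```rust
/// Return the last `max_lines` lines of captured output, joined by '\n'.
/// Shorter output is returned whole; a trailing newline is dropped.
pub fn tail_lines(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    let start = lines.len().saturating_sub(max_lines);
    lines[start..].join("\n")
}
```

Storing only the tail keeps each triage record small while preserving the part of the output most likely to name the failure stage.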
### Deliverables
1. `noReportFailureTriage[]` records in the triage JSON
2. `bootstrapTargetFailures[]` records in the triage JSON
3. failure-stage summary
### Acceptance Criteria
1. all `25` no-report failures have an inferred failure stage
2. the `bootstrap_target` case is not hidden in the no-report bucket
3. every non-pass record remains explainable without implementation changes
## Phase 4: Publish Triage Result
### Objective
Publish a bounded triage result and stop.
### Tasks
1. write `full_sweep_dry_run_triage_2026-04-19.json`
2. write `2026-04-19-102-full-sweep-dry-run-triage-report.md`
3. include:
- timeout triage summary
- misclassification triage summary
- no-report triage summary
- recommended next blocker
4. explicitly state that the triage does not promote scenes or start fixes
### Deliverables
1. triage JSON
2. triage report
### Acceptance Criteria
1. all `62` non-pass records are covered
2. every non-pass record has a second-level explanation
3. the report identifies the next blocker without implementing it
4. no generator/analyzer file is modified
5. `scene_execution_board_2026-04-18.json` is not modified
## Completion Criteria
This plan is complete when:
1. `31` timeout records have timeout labels
2. `5` misclassified records have routing labels
3. `25` no-report failures have failure stages
4. `1` bootstrap-target failure is separately tracked
5. the triage JSON and report are published
6. execution stops without implementation work


@@ -0,0 +1,305 @@
# 102 Full Sweep Improvement Roadmap Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: `docs/superpowers/specs/2026-04-19-102-full-sweep-improvement-roadmap-design.md`
> Upstream Dry-Run Result: `tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json`
> Upstream Triage Result: `tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json`
## Plan Intent
Turn the `102` scene dry-run and triage findings into a governed improvement roadmap.
This plan is intentionally broad like the earlier `60-to-90` roadmap. It coordinates multiple bounded implementation tracks instead of starting isolated fixes from individual failures.
## Baseline
Current measured baseline:
| Metric | Count |
| --- | ---: |
| Real-sample executed pass | 5 / 102 |
| Code-backed ledger coverage | 23 / 102 |
| Dry-run auto-pass | 40 / 102 |
| Dry-run actionable coverage | 66 / 102 |
Current triage baseline:
| Bucket | Count | Triage conclusion |
| --- | ---: | --- |
| Timeout | 31 | `19 timeout-unvalidated-source`, `8 timeout-large-source`, `4 timeout-known-family-sample` |
| Misclassified | 5 | all `route-overprefer-host-bridge` |
| No-report failure | 25 | all `readiness-before-report` |
| Bootstrap target | 1 | separate `bootstrap_target` |
## Scope Guardrails
1. do not add new scene families
2. do not update `scene_execution_board_2026-04-18.json` inside this roadmap
3. do not promote scenes directly from diagnostic or dry-run results
4. do not reopen completed real-sample passes except as regression checks
5. do not start `G4/G5`
6. do not implement full login recovery
7. do not implement full host runtime transport
8. do not implement local document attachment runtime
9. do not create unbounded micro-plans from a single failure
## Workstreams
1. `WS1` Timeout Diagnostics and Scan Budget
2. `WS2` Routing Boundary Correction
3. `WS3` Structured Fail-Closed Reporting
4. `WS4` Follow-Up Sweep and Coverage Delta
## Phase 0: Freeze Improvement Baseline
### Objective
Freeze the dry-run and triage outputs as the only accepted inputs to this roadmap.
### Tasks
1. freeze `full_sweep_dry_run_2026-04-19.json`
2. freeze `full_sweep_dry_run_triage_2026-04-19.json`
3. freeze the four headline metrics:
- `5/102` real-sample pass
- `23/102` code-backed ledger coverage
- `40/102` dry-run auto-pass
- `66/102` dry-run actionable coverage
4. freeze the problem buckets:
- `4` known-family timeouts
- `8` large-source timeouts
- `19` unvalidated-source timeouts
- `5` host-bridge over-preference cases
- `25` readiness-before-report failures
- `1` bootstrap-target failure
### Deliverables
1. baseline statement
2. frozen blocker inventory
3. roadmap entry criteria
### Acceptance Criteria
1. no additional scene is added to scope
2. no implementation starts before the baseline is frozen
3. dry-run and triage assets are treated as immutable inputs
## Phase 1: Known-Family Timeout Diagnostics
### Objective
Resolve the highest-priority ambiguity: known-family scenes that timed out in the full sweep.
### Tasks
1. select only records labeled `timeout-known-family-sample`
2. capture source scale metrics and previous family context
3. run bounded diagnostic attempts if needed
4. classify each record as:
- `known-family-rerun-pass`
- `known-family-source-scale-timeout`
- `known-family-generator-hotspot`
- `known-family-contract-blocked-after-long-run`
- `known-family-timeout-unresolved`
5. publish diagnostic result
### Deliverables
1. known-family timeout diagnostic JSON
2. known-family timeout diagnostic report
### Acceptance Criteria
1. all `4` known-family timeout records are classified
2. no scene is promoted from diagnostic success
3. no generator logic is changed in the diagnostic step
## Phase 2: Source-Scale and Scan-Budget Improvement
### Objective
Reduce timeout noise caused by oversized source directories and obvious vendor/library files.
### Tasks
1. analyze `timeout-large-source` and `timeout-unvalidated-source`
2. define source scan budget policy
3. define vendor/library ignore policy
4. implement only bounded source scanning or timeout reporting changes
5. verify no canonical or real-sample regression is introduced
### Deliverables
1. source scan budget policy
2. bounded scan implementation if approved by Phase 1 evidence
3. timeout reporting regression tests
### Acceptance Criteria
1. large source directories no longer dominate the full sweep by accidental vendor-file scanning
2. known-family samples are not made worse
3. archetype semantics are unchanged
## Phase 3: Host-Bridge Route Over-Preference Correction
### Objective
Correct or formally adjudicate the five cases where `host_bridge_workflow` over-absorbed `G3` or `G1-E` expected scenes.
### Tasks
1. select the `5` `route-overprefer-host-bridge` records
2. compare business-chain evidence against host-bridge evidence
3. define routing precedence rules for:
- `G3` vs `G6`
- `G1-E` vs `G6`
4. implement bounded routing correction only if evidence supports it
5. preserve regression coverage for:
- `G3` real-sample pass
- `G1-E` real-sample pass
- `G6` real-sample pass
6. classify each case as:
- `route-corrected-to-g3`
- `route-corrected-to-g1e`
- `board-expectation-reclassified`
- `valid-host-bridge-workflow`
- `route-conflict-unresolved`
### Deliverables
1. route over-preference correction report
2. routing regression tests
3. updated dry-run classification for the five fixed records
### Acceptance Criteria
1. all `5` route conflicts are adjudicated
2. `host_bridge_workflow` no longer wins solely because host evidence exists
3. existing `G6` pass remains stable
4. no broad routing rewrite is introduced
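
A sketch of one precedence rule that would satisfy acceptance criterion 2, assuming evidence can be reduced to three booleans. This is not the adjudicated rule set: the actual precedence for `G3` vs `G6` and `G1-E` vs `G6` is what Phase 3 has to define, and `RouteEvidence`, `Route`, and `resolve_route` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    G3,
    G1E,
    G6HostBridge,
    /// No single route is supported; the case stays a conflict for manual review.
    Unresolved,
}

/// Hypothetical evidence summary for one scene.
pub struct RouteEvidence {
    pub g3_business_chain: bool,
    pub g1e_business_chain: bool,
    pub host_bridge: bool,
}

/// Assumed rule: business-chain evidence outranks host-bridge evidence, so
/// `host_bridge_workflow` is chosen only when no business chain competes.
pub fn resolve_route(e: &RouteEvidence) -> Route {
    match (e.g3_business_chain, e.g1e_business_chain, e.host_bridge) {
        (true, false, _) => Route::G3,
        (false, true, _) => Route::G1E,
        (false, false, true) => Route::G6HostBridge,
        // Both business chains present, or no evidence at all.
        _ => Route::Unresolved,
    }
}
```

Under this assumption an existing `G6` pass with host evidence and no business-chain evidence still resolves to `G6HostBridge`.
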
## Phase 4: Structured Fail-Closed Reporting
### Objective
Convert `readiness-before-report` failures into structured failure reports instead of process-level no-report failures.
### Tasks
1. select the `25` `readiness-before-report` records
2. identify where generation exits before report emission
3. define a minimal failure-report schema for pre-package fail-closed
4. emit structured failure records with:
- inferred archetype
- failed gate
- blocker reason
- missing contract pieces
- stderr summary if any
5. keep scenes failing unless their contracts are actually complete
### Deliverables
1. pre-report fail-closed schema
2. implementation of structured failure report emission
3. regression tests covering at least one `paginated_enrichment`, one `local_doc_pipeline`, one `multi_mode_request`, and one `single_request_enrichment` pre-report failure
### Acceptance Criteria
1. no-report failures are reduced or eliminated as a category
2. failing scenes still fail closed
3. failure reasons become machine-readable
4. auto-pass count is not inflated by looser gates
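
A minimal sketch of what the pre-package failure record could look like as a Rust type. The field list follows task 4; the struct name, the `scene_id` field, and `incomplete_fields` are assumptions added for illustration, and the real schema is a Phase 4 deliverable.

```rust
/// Hypothetical pre-package fail-closed record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PrePackageFailureReport {
    /// Assumed identifier so the record can be joined back to the sweep.
    pub scene_id: String,
    /// `None` when generation exited before an archetype was inferred.
    pub inferred_archetype: Option<String>,
    pub failed_gate: String,
    pub blocker_reason: String,
    pub missing_contract_pieces: Vec<String>,
    /// "stderr summary if any"
    pub stderr_summary: Option<String>,
}

impl PrePackageFailureReport {
    /// Names of required fields that are blank. An empty result means the
    /// record carries enough to be machine-readable; it never means "pass".
    pub fn incomplete_fields(&self) -> Vec<&'static str> {
        let mut missing = Vec::new();
        if self.scene_id.trim().is_empty() {
            missing.push("scene_id");
        }
        if self.failed_gate.trim().is_empty() {
            missing.push("failed_gate");
        }
        if self.blocker_reason.trim().is_empty() {
            missing.push("blocker_reason");
        }
        missing
    }
}
```

The type deliberately has no success variant, which keeps it from being used to loosen a gate.
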
## Phase 5: Bootstrap Target Isolation
### Objective
Keep the single `bootstrap_target` failure isolated and decide whether it belongs to later bootstrap normalization work.
### Tasks
1. preserve `用户停电频次分析监测` as a separate bootstrap failure
2. inspect whether the failure is caused by missing target URL, domain mismatch, or unsupported bootstrap pattern
3. produce a bootstrap isolation note
4. do not implement login or bootstrap auto-recovery
### Deliverables
1. bootstrap target isolation note
2. decision whether the case enters a later bootstrap-normalization roadmap
### Acceptance Criteria
1. the bootstrap case does not pollute readiness-before-report work
2. no login recovery implementation is started
## Phase 6: Follow-Up Full Sweep and Coverage Delta
### Objective
Measure whether the bounded changes from Phases 2-5 improved generic coverage.
### Tasks
1. rerun the fixed `102` scene full sweep with the same scene set
2. produce a new dry-run result
3. compare against the baseline:
- auto-pass delta
- actionable coverage delta
- timeout delta
- misclassification delta
- no-report delta
4. publish coverage delta report
5. decide whether to move to execution-board status sync or another bounded improvement cycle
### Deliverables
1. follow-up full sweep JSON
2. coverage delta report
3. remaining blocker decision board
### Acceptance Criteria
1. scene set remains exactly `102`
2. baseline and follow-up are comparable
3. improvements are quantified, not assumed
4. no execution board status is changed automatically
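
The comparison in task 3 can be sketched as below, assuming both sweeps are already reduced to per-category counts. `SweepCounts`, `CoverageDelta`, and `coverage_delta` are hypothetical names; the `102` gate comes from acceptance criterion 1.

```rust
pub const EXPECTED_SCENES: u32 = 102;

/// Hypothetical per-sweep summary counts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SweepCounts {
    pub scene_total: u32,
    pub auto_pass: u32,
    pub actionable: u32,
    pub timeout: u32,
    pub misclassified: u32,
    pub no_report: u32,
}

/// Signed follow-up minus baseline for each tracked category.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CoverageDelta {
    pub auto_pass: i64,
    pub actionable: i64,
    pub timeout: i64,
    pub misclassified: i64,
    pub no_report: i64,
}

/// Refuse to compare unless both sweeps cover exactly the fixed scene set.
pub fn coverage_delta(
    baseline: &SweepCounts,
    followup: &SweepCounts,
) -> Result<CoverageDelta, String> {
    if baseline.scene_total != EXPECTED_SCENES || followup.scene_total != EXPECTED_SCENES {
        return Err(format!(
            "scene sets are not comparable: baseline={}, follow-up={}, expected={}",
            baseline.scene_total, followup.scene_total, EXPECTED_SCENES
        ));
    }
    let diff = |after: u32, before: u32| i64::from(after) - i64::from(before);
    Ok(CoverageDelta {
        auto_pass: diff(followup.auto_pass, baseline.auto_pass),
        actionable: diff(followup.actionable, baseline.actionable),
        timeout: diff(followup.timeout, baseline.timeout),
        misclassified: diff(followup.misclassified, baseline.misclassified),
        no_report: diff(followup.no_report, baseline.no_report),
    })
}
```

The function only reports numbers; any decision about the execution board stays manual.
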
## Milestone Order
The order is fixed:
1. Phase 0: freeze baseline
2. Phase 1: known-family timeout diagnostics
3. Phase 2: source-scale and scan-budget improvement
4. Phase 3: host-bridge route over-preference correction
5. Phase 4: structured fail-closed reporting
6. Phase 5: bootstrap target isolation
7. Phase 6: follow-up full sweep and coverage delta
Do not start Phase 3 before Phase 1 is completed. Known-family timeout ambiguity affects the interpretation of current coverage.
Do not start Phase 6 before Phases 2-5 have either completed or been explicitly deferred with reasons.
## Completion Criteria
This roadmap is complete when:
1. known-family timeouts are no longer mixed with generic timeout noise
2. host-bridge over-preference cases are adjudicated
3. readiness-before-report failures become structured fail-closed records
4. the bootstrap target case is isolated
5. a follow-up full sweep quantifies coverage delta
6. no new family is introduced as a shortcut around current blockers
## Out of Plan
1. new family implementation
2. `G4/G5` implementation
3. browser host runtime transport
4. login recovery
5. attachment/local document runtime
6. automatic execution board promotion

View File

@@ -0,0 +1,140 @@
# 102 Sweep Status Reconciliation Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: `docs/superpowers/specs/2026-04-19-102-sweep-status-reconciliation-design.md`
## Plan Intent
Reconcile the follow-up `102` sweep result with the final route-conflict decisions so the next roadmap uses a trustworthy status baseline.
This plan is a status reconciliation plan, not an implementation plan.
## Scope Guardrails
1. do not modify `src/generated_scene/analyzer.rs`
2. do not modify `src/generated_scene/generator.rs`
3. do not modify `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
4. do not promote any scene
5. do not add or modify family baselines
6. do not rerun the `102` sweep
7. do not implement fixes for fail-closed or timeout records
## Phase 0: Freeze Inputs
### Objective
Freeze the exact reconciliation inputs.
### Tasks
1. read `full_sweep_improvement_followup_2026-04-19.json`
2. read `remaining_route_conflict_decisions_2026-04-19.json`
3. verify follow-up sweep scene count is `102`
4. verify route-decision conflict count is `4`
### Deliverables
1. input validation summary
### Acceptance Criteria
1. reconciliation does not proceed if follow-up scene count is not `102`
2. reconciliation does not proceed if route-decision count is not `4`
## Phase 1: Merge Route Decisions
### Objective
Apply route-conflict decisions as a reconciliation overlay without changing raw sweep status.
### Tasks
1. match route decisions by `sceneId`
2. for each matching scene, keep `dryRunStatus = misclassified`
3. add `routeDecision = valid-host-bridge-workflow`
4. set `reconciledStatus = adjudicated-valid-host-bridge`
5. preserve decision reason and evidence summary
### Deliverables
1. route-decision overlay records
### Acceptance Criteria
1. all `4` route decisions match a follow-up scene
2. all `4` are reconciled to `adjudicated-valid-host-bridge`
3. no broad status rewrite is performed
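
A sketch of the overlay step, including the Phase 0 count gates, assuming records are already loaded into memory. The struct and function names are hypothetical; the field names mirror the `sceneId`, `dryRunStatus`, `routeDecision`, and `reconciledStatus` keys used above. For this plan the expected counts would be `102` and `4`.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of one follow-up scene record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SceneRecord {
    pub scene_id: String,
    pub dry_run_status: String,
    pub route_decision: Option<String>,
    pub reconciled_status: Option<String>,
    pub decision_reason: Option<String>,
}

/// Hypothetical form of one route-conflict decision.
pub struct RouteDecision {
    pub scene_id: String,
    pub reason: String,
}

/// Apply decisions as an overlay. `dry_run_status` is never written.
/// Nothing is modified unless every decision matches a scene.
pub fn apply_route_overlay(
    scenes: &mut [SceneRecord],
    decisions: &[RouteDecision],
    expected_scenes: usize,
    expected_decisions: usize,
) -> Result<usize, String> {
    if scenes.len() != expected_scenes {
        return Err(format!("scene count is {}, expected {}", scenes.len(), expected_scenes));
    }
    if decisions.len() != expected_decisions {
        return Err(format!(
            "decision count is {}, expected {}",
            decisions.len(),
            expected_decisions
        ));
    }

    // Index scenes by id using owned keys so the slice can be mutated later.
    let mut index: HashMap<String, usize> = HashMap::new();
    for (i, scene) in scenes.iter().enumerate() {
        index.insert(scene.scene_id.clone(), i);
    }

    // First pass: resolve every decision before touching any record.
    let mut targets = Vec::with_capacity(decisions.len());
    for decision in decisions {
        match index.get(decision.scene_id.as_str()) {
            Some(&i) => targets.push((i, decision)),
            None => return Err(format!("no follow-up scene for {}", decision.scene_id)),
        }
    }

    // Second pass: write only the overlay fields.
    for (i, decision) in targets {
        let scene = &mut scenes[i];
        scene.route_decision = Some("valid-host-bridge-workflow".to_string());
        scene.reconciled_status = Some("adjudicated-valid-host-bridge".to_string());
        scene.decision_reason = Some(decision.reason.clone());
    }
    Ok(decisions.len())
}
```
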
## Phase 2: Build Reconciled Status Counts
### Objective
Build the reconciled status summary for all `102` scenes.
### Tasks
1. copy all follow-up scene records into a new reconciliation asset
2. assign `reconciledStatus` for every scene
3. count statuses:
- `auto-pass`
- `fail-closed-known`
- `adjudicated-valid-host-bridge`
- `source-unreadable`
- `missing-source`
- `unsupported-family`
- `misclassified-unresolved`
4. summarize fail-closed records by archetype and reason
5. preserve remaining timeout records as unresolved timeout inputs
### Deliverables
1. `tests/fixtures/generated_scene/full_sweep_status_reconciliation_2026-04-19.json`
### Acceptance Criteria
1. total scene count is `102`
2. reconciled status count total is `102`
3. unresolved misclassification count is `0`
4. timeout count remains `2`
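
The counting step can be sketched as follows, assuming each scene has already been assigned one `reconciledStatus` string. `count_statuses` and `totals_match` are hypothetical helpers; the sketch does not decide which status the remaining timeout records receive.

```rust
use std::collections::BTreeMap;

/// Count scenes per reconciled status. A `BTreeMap` keeps the summary in a
/// stable, sorted order so the JSON asset diffs cleanly between runs.
pub fn count_statuses<'a>(statuses: &[&'a str]) -> BTreeMap<&'a str, usize> {
    let mut counts: BTreeMap<&'a str, usize> = BTreeMap::new();
    for &status in statuses {
        *counts.entry(status).or_insert(0) += 1;
    }
    counts
}

/// Acceptance criteria 1 and 2: the per-status counts must add up to the
/// scene total (`102` for this plan).
pub fn totals_match(counts: &BTreeMap<&str, usize>, expected_total: usize) -> bool {
    counts.values().sum::<usize>() == expected_total
}
```

A missing key, for example `misclassified-unresolved`, reads as a count of `0` when the summary is checked against acceptance criterion 3.
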
## Phase 3: Publish Reconciliation Report
### Objective
Make the reconciled state readable and actionable.
### Tasks
1. summarize raw follow-up counts
2. summarize reconciled counts
3. list `4` valid-host-bridge adjudications
4. list `2` remaining timeout inputs
5. summarize `48` fail-closed-known records as the next implementation-analysis candidate
6. state explicitly that the execution board was not changed
### Deliverables
1. `docs/superpowers/reports/2026-04-19-102-sweep-status-reconciliation-report.md`
### Acceptance Criteria
1. report explains why raw `misclassified = 4` no longer means unresolved route bugs
2. report identifies the next likely roadmap input without starting it
3. report confirms no code or execution-board changes
## Completion Criteria
This plan is complete when:
1. reconciliation JSON exists
2. reconciliation report exists
3. all `4` route conflicts are represented as adjudicated valid host-bridge workflows
4. no unresolved misclassification remains
5. `2` timeouts and `48` fail-closed records remain visible as separate future inputs
## Stop Statement
Stop after publishing the reconciliation JSON and report.
Do not start the next roadmap in this plan.

View File

@@ -0,0 +1,44 @@
# Bootstrap Target Normalization Roadmap Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Sequence: `2026-04-19-final-2-residual-child-plan-sequence-plan.md`
> Fixed Scene: `sweep-091-scene`
> Status: Draft
## Plan Intent
Run a bounded bootstrap target normalization slice for the single remaining `page_state_eval` residual.
## Fixed Input Bucket
1. `sweep-091-scene`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/scene_generator_test.rs`
4. `tests/fixtures/generated_scene/bootstrap_target_normalization_followup_2026-04-19.json`
5. `tests/fixtures/generated_scene/bootstrap_target_normalization_reconciliation_candidates_2026-04-19.json`
6. `docs/superpowers/reports/2026-04-19-bootstrap-target-normalization-roadmap-report.md`
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Tasks
1. Freeze the current `sweep-091-scene` generation report.
2. Identify whether the failure is a missing target URL, target-domain ambiguity, or policy-held navigation dependency.
3. Implement at most one bounded bootstrap target normalization slice if the target can be recovered from deterministic source evidence.
4. Rerun only `sweep-091-scene`.
5. Publish follow-up and reconciliation candidate assets.
## Expected Delta
Target delta is `+1 framework-auto-pass-candidate` if deterministic bootstrap target recovery is possible. Otherwise the delta is `0`, with a narrower named hold.
## Stop Statement
Stop after the single-scene follow-up and reconciliation candidates are published. Do not update the official board under this plan.

View File

@@ -0,0 +1,38 @@
# Bootstrap Target Residual Isolation Plan
> Date: 2026-04-19
> Status: Draft
> Parent Plan: `docs/superpowers/plans/2026-04-19-structured-fail-closed-residual-13-closure-plan.md`
> Parent Route: `Residual Route D`
> Parent Layer: `Layer D`
## Plan Intent
Isolate the remaining page-state/bootstrap-target residual without starting login recovery or runtime navigation implementation.
## Fixed Input Bucket
1. `sweep-091-scene` / `用户停电频次分析监测`
## Allowed Files
1. isolation JSON asset
2. isolation report
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. login/runtime implementation files
4. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Tasks
1. preserve the residual as bootstrap-target isolated
2. publish isolation report
3. do not implement login recovery
## Stop Statement
Stop after isolation assets are published.

View File

@@ -0,0 +1,55 @@
# Boundary Fail-Closed Decision Plan
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 5: boundary-family fail-closed`
> Parent Layer: `Layer C + Layer D`
> Upstream Design: `docs/superpowers/specs/2026-04-19-boundary-fail-closed-decision-design.md`
## Plan Intent
Publish a decision for the remaining boundary-family fail-closed buckets after mainline routes are complete or deferred.
## Fixed Input Bucket
1. `local_doc_pipeline = 5`
2. `host_bridge_workflow = 1`
3. `page_state_eval/bootstrap_target = 1`
## Allowed Files
1. boundary decision JSON assets
2. boundary decision report assets
3. optional next bounded boundary plan docs
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Tasks
1. freeze the Route 5 bucket state
2. inspect each boundary subgroup
3. decide defer/hold/open-slice
4. publish Route 5 decision report
## Expected Coverage Delta
Decision-only delta:
1. unresolved boundary ambiguity should go to zero
## Completion Criteria
1. every Route 5 subgroup has a named decision
2. any follow-up bounded plan is explicit and optional
## Stop Statement
Stop after the Route 5 decision report is published.
Do not begin boundary implementation under this plan.

View File

@@ -0,0 +1,139 @@
# Boundary Family Real-Sample Entry Roadmap Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-boundary-family-real-sample-entry-roadmap-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-boundary-family-real-sample-entry-roadmap-design.md)
## Plan Intent
This roadmap determines the next bounded step after `G1-E / G2 / G3` have all closed as executed real-sample passes.
Its only purpose is:
`decide whether one boundary family may enter real-sample execution scope next`
## Scope Guardrails
1. do not reopen `G1-E / G2 / G3`
2. do not implement runtime-platform prerequisites under this roadmap
3. do not execute real samples for more than one boundary family
4. do not open `G4 / G5`
5. do not turn this work into a new family-asset expansion program
## Candidate Boundary Families
The only candidates under this roadmap are:
1. `G6`
2. `G7`
3. `G8`
## Workstreams
1. `WS1` Freeze the Post-Mainline Starting State
2. `WS2` Evaluate Boundary-Family Entry Readiness
3. `WS3` Select One Next Candidate or Hold All
4. `WS4` Publish the Next Bounded Execution Slice
## Phase 0: Freeze the Starting State
### Objective
Lock the roadmap start point so the decision does not drift back into old mainline work.
### Tasks
1. freeze `G1-E / G2 / G3` as closed executed passes
2. freeze `G6 / G7 / G8` as held boundary families
3. freeze `G4 / G5` as out of scope
### Deliverables
1. starting-state note
2. fixed candidate list
### Acceptance Criteria
1. no mainline or deferred family work is reopened under this roadmap
## Phase 1: Evaluate Boundary-Family Entry Readiness
### Objective
Compare `G6 / G7 / G8` against explicit entry criteria instead of intuition.
### Tasks
1. restate the current entry condition for each boundary family
2. compare the required runtime gap for each family
3. estimate which family needs the smallest new capability to enter real-sample scope
### Deliverables
1. boundary-family comparison matrix
2. smallest-entry-cost summary
### Acceptance Criteria
1. the next candidate family can be justified with explicit criteria
2. the rejected families have explicit hold reasons
## Phase 2: Select One Next Candidate or Hold All
### Objective
Reduce the next-step ambiguity to a single bounded decision.
### Tasks
1. select exactly one family as the next real-sample entry candidate
2. or explicitly conclude that all boundary families remain held
3. record why the non-selected families remain out of scope
### Deliverables
1. boundary-family entry decision
2. hold reasons for non-selected families
### Acceptance Criteria
1. no more than one next family is opened
2. the decision is bounded and defensible
## Phase 3: Publish the Next Bounded Execution Slice
### Objective
Turn the decision into the next actionable bounded plan.
### Tasks
1. if one family is selected, write a bounded `design + plan` for its minimum real-sample entry slice
2. if none is selected, write a bounded prerequisites plan instead
3. update the decision report layer
### Deliverables
1. next-family bounded `design`
2. next-family bounded `plan`
3. roadmap closure report
### Acceptance Criteria
1. the next step is ready to execute without reopening roadmap scope
2. only one bounded direction is emitted
## Completion Criteria
This roadmap is complete when:
1. the post-mainline next step is reduced to one bounded direction
2. `G6 / G7 / G8` no longer compete ambiguously for priority
3. a single follow-up `design + plan` exists for the selected direction
## Next Step
After this roadmap completes:
1. execute the selected family-entry slice if one family is admitted
2. otherwise execute the bounded prerequisites slice before any boundary family enters real-sample scope

View File

@@ -0,0 +1,38 @@
# Boundary Residual Hold Decision Plan
> Date: 2026-04-19
> Status: Draft
> Parent Plan: `docs/superpowers/plans/2026-04-19-structured-fail-closed-residual-13-closure-plan.md`
> Parent Route: `Residual Route C`
> Parent Layer: `Layer D`
## Plan Intent
Decide whether the remaining `local_doc_pipeline` and `host_bridge_workflow` residual records should remain held or enter a future runtime roadmap.
## Fixed Input Bucket
1. five `local_doc_pipeline` residual records
2. one `host_bridge_workflow` residual record
## Allowed Files
1. decision JSON asset
2. decision report
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Tasks
1. classify each boundary residual as hold/defer/runtime-roadmap-input
2. do not implement runtime support
3. publish decision report
## Stop Statement
Stop after decision assets are published.

View File

@@ -0,0 +1,123 @@
# Boundary Runtime Prerequisites Roadmap Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-boundary-runtime-prerequisites-roadmap-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-boundary-runtime-prerequisites-roadmap-design.md)
## Plan Intent
This roadmap determines the next bounded prerequisites slice after the post-`G7` boundary decision concludes that direct `G6` or `G8` execution should not start yet.
Its only purpose is:
`select one bounded prerequisite direction before the next boundary-family real-sample attempt`
## Scope Guardrails
1. do not execute `G6` or `G8`
2. do not reopen `G7`
3. do not reopen `G1-E / G2 / G3`
4. do not implement host-runtime or local-doc runtime under this roadmap
5. do not open `G4 / G5`
## Candidate Prerequisite Directions
The only candidates under this roadmap are:
1. `G6 host-bridge prerequisites`
2. `G8 local-doc prerequisites`
## Workstreams
1. `WS1` Freeze the Post-G7 Boundary Hold State
2. `WS2` Compare G6 and G8 Prerequisite Burden
3. `WS3` Select One Prerequisite Direction
4. `WS4` Publish the Next Bounded Prerequisites Slice
## Phase 0: Freeze the Starting State
### Objective
Lock the roadmap start point so no closed family work is reopened.
### Tasks
1. freeze `G7` as closed
2. freeze `G6` and `G8` as held pending prerequisites
3. freeze `G1-E / G2 / G3` as closed
4. freeze `G4 / G5` as out of scope
### Deliverables
1. starting-state note
2. fixed prerequisite candidate list
### Acceptance Criteria
1. no family execution begins under this roadmap
## Phase 1: Compare Prerequisite Burden
### Objective
Compare `G6` and `G8` at the prerequisite level instead of at the execution level.
### Tasks
1. restate the smallest blocked capability for `G6`
2. restate the smallest blocked capability for `G8`
3. compare which prerequisite can be isolated more cleanly
### Deliverables
1. prerequisite comparison matrix
2. smallest-prerequisite summary
### Acceptance Criteria
1. the selected prerequisite direction is justified explicitly
## Phase 2: Select One Prerequisite Direction
### Objective
Reduce the post-`G7` prerequisite ambiguity to one bounded decision.
### Tasks
1. select exactly one direction:
- `G6 host-bridge prerequisites`
- or `G8 local-doc prerequisites`
2. record why the other direction remains held
### Deliverables
1. prerequisite direction decision
2. hold reason for the non-selected direction
### Acceptance Criteria
1. only one next direction is opened
2. the decision is bounded and defensible
## Phase 3: Publish the Next Bounded Slice
### Objective
Turn the decision into the next executable bounded artifact.
### Tasks
1. write one bounded follow-up design and plan for the selected prerequisite direction
2. publish a roadmap closure report
### Deliverables
1. next bounded `design`
2. next bounded `plan`
3. roadmap closure report
### Acceptance Criteria
1. the next step is ready without extending this roadmap
2. only one bounded direction is emitted

View File

@@ -0,0 +1,54 @@
# Final 2 Official Board Reconciliation Refresh Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Sequence: `2026-04-19-final-2-residual-child-plan-sequence-plan.md`
> Status: Draft
## Plan Intent
Refresh official board framework fields after one or both final-2 residual roadmaps publish reconciliation candidates.
## Fixed Inputs
At least one of:
1. `tests/fixtures/generated_scene/bootstrap_target_normalization_reconciliation_candidates_2026-04-19.json`
2. `tests/fixtures/generated_scene/host_bridge_runtime_reconciliation_candidates_2026-04-19.json`
Also required:
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/promotion_board_reconciliation_policy_2026-04-19.json`
## Allowed Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/final_2_official_board_reconciliation_refresh_2026-04-19.json`
3. `docs/superpowers/reports/2026-04-19-final-2-official-board-reconciliation-refresh-report.md`
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
## Tasks
1. Load candidate assets that exist.
2. Verify each candidate belongs to `sweep-085-scene` or `sweep-091-scene`.
3. Match board rows by `sceneId`.
4. Update only framework-layer fields.
5. Recompute board framework summary.
6. Publish reconciliation refresh JSON and report.
## Expected Delta
Delta depends on candidate assets:
1. one closed residual: `framework-auto-pass +1`, `framework-structured-fail-closed -1`
2. both closed residuals: `framework-auto-pass +2`, `framework-structured-fail-closed -2`
3. held residuals: no count delta, but narrower next action / hold reason
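
The three cases above reduce to counting closed candidates, as in this sketch. `Candidate`, `FrameworkDelta`, and `expected_delta` are hypothetical names, and the `closed` flag stands in for whatever field the candidate assets actually use to mark a recovered residual.

```rust
/// The only scenes a candidate may refer to under this plan (task 2).
pub const FINAL_2_SCENES: [&str; 2] = ["sweep-085-scene", "sweep-091-scene"];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FrameworkDelta {
    pub auto_pass: i32,
    pub structured_fail_closed: i32,
}

/// Hypothetical reduced view of one reconciliation candidate.
pub struct Candidate<'a> {
    pub scene_id: &'a str,
    /// True when the residual was closed, false when it remains held.
    pub closed: bool,
}

/// Each closed residual moves one scene from structured fail-closed to
/// auto-pass; held residuals leave the counts unchanged.
pub fn expected_delta(candidates: &[Candidate<'_>]) -> Result<FrameworkDelta, String> {
    let mut closed = 0i32;
    let mut seen: Vec<&str> = Vec::new();
    for c in candidates {
        if !FINAL_2_SCENES.iter().any(|s| *s == c.scene_id) {
            return Err(format!("{} is outside the final-2 bucket", c.scene_id));
        }
        if seen.iter().any(|s| *s == c.scene_id) {
            return Err(format!("duplicate candidate for {}", c.scene_id));
        }
        seen.push(c.scene_id);
        if c.closed {
            closed += 1;
        }
    }
    Ok(FrameworkDelta { auto_pass: closed, structured_fail_closed: -closed })
}
```
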
## Stop Statement
Stop after the final-2 board reconciliation refresh JSON and report are published.

View File

@@ -0,0 +1,47 @@
# Final 2 Residual Child Plan Sequence Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Layer: `Layer E / Route 5 + Route 6`
> Status: Draft
## Plan Intent
Create the remaining child-plan sequence for the last two framework structured fail-closed residuals. This plan only defines the sequence and child plan boundaries; it does not execute implementation.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/local_doc_official_board_reconciliation_refresh_2026-04-19.json`
## Fixed Residual Bucket
1. `sweep-085-scene`: `host_bridge_workflow`, `future-host-bridge-runtime-roadmap-input`
2. `sweep-091-scene`: `page_state_eval`, `future-bootstrap-target-normalization-roadmap-input`
## Child Plans
1. `2026-04-19-final-2-residual-roadmap-prioritization-plan.md`
2. `2026-04-19-bootstrap-target-normalization-roadmap-plan.md`
3. `2026-04-19-host-bridge-runtime-roadmap-plan.md`
4. `2026-04-19-final-2-official-board-reconciliation-refresh-plan.md`
5. `2026-04-19-102-framework-closure-rollup-plan.md`
## Scope Guardrails
1. Do not modify `analyzer.rs`.
2. Do not modify `generator.rs`.
3. Do not update the official board under this sequence-definition plan.
4. Do not run a full 102 sweep under this plan.
5. Do not reopen G1-E, G2, G3, or local-doc runtime work.
6. Do not continue the old G6 micro-plan chain.
## Completion Criteria
1. The final-2 residual child plan sequence exists.
2. Each child plan declares parent route, fixed input bucket, allowed files, forbidden files, expected delta, and stop statement.
3. The next executable child plan is the prioritization plan.
## Stop Statement
Stop after the final-2 child plan sequence is created. Do not execute any child plan under this sequence-definition plan.

View File

@@ -0,0 +1,43 @@
# Final 2 Residual Roadmap Prioritization Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Sequence: `2026-04-19-final-2-residual-child-plan-sequence-plan.md`
> Status: Draft
## Plan Intent
Select the next residual roadmap from the final two structured fail-closed records.
## Fixed Input Bucket
1. `sweep-085-scene`: host-bridge runtime residual
2. `sweep-091-scene`: bootstrap target normalization residual
## Allowed Files
1. `tests/fixtures/generated_scene/final_2_residual_roadmap_prioritization_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-final-2-residual-roadmap-prioritization-report.md`
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Tasks
1. Load the current official board.
2. Extract the two residuals.
3. Score bootstrap normalization vs host-bridge runtime.
4. Select exactly one first roadmap.
5. Publish decision JSON.
6. Publish decision report.
## Expected Delta
No coverage delta. This is a decision-only plan.
## Stop Statement
Stop after the prioritization asset and report are published. Do not start the selected roadmap under this plan.

View File

@@ -0,0 +1,55 @@
# G1-E Remaining Fail-Closed Closure Plan
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 4: G1-E / single_request_enrichment`
> Parent Layer: `Layer C + Layer D`
> Upstream Design: `docs/superpowers/specs/2026-04-19-g1e-remaining-fail-closed-closure-design.md`
## Plan Intent
Implement one bounded correction slice for the remaining Route 4 `G1-E` fail-closed records.
## Fixed Input Bucket
`single_request_enrichment = 2`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/scene_generator_test.rs`
5. Route 4 local inventory and report assets
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. Route 2 and Route 3 assets
3. Route 5+ assets
## Tasks
1. freeze the two Route 4 records
2. confirm the repeated missing contract
3. implement one bounded `G1-E` correction slice
4. rerun bounded validation
5. publish Route 4 delta
## Expected Coverage Delta
1. reduce the `G1-E` fail-closed bucket
2. preserve current `G1-E` real-sample pass and canonical stability
## Completion Criteria
1. Route 4 bucket has measured before/after status
2. Route 4 is closed or explicitly deferred
## Stop Statement
Stop after Route 4 delta is measured.
Do not begin Route 5 under this plan.

View File

@@ -0,0 +1,185 @@
# G2 Real Sample Contract Correction Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g2-real-sample-contract-correction-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g2-real-sample-contract-correction-design.md)
> Trigger Record: `rsv-g2-001`
## Plan Intent
This plan implements one bounded mainline correction slice:
`G2 real-sample contract correction`
Its purpose is to reduce the current real-sample `G2` mismatch from the broad bundle:
1. `bootstrap_mismatch`
2. `request_contract_missing`
3. `column_defs_missing`
4. `output correctness not closed`
into either:
1. a verified pass
2. or a smaller named contract mismatch
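
One way to make "smaller" checkable is to treat the mismatch bundle as a set of labels and require the rerun result to be a strict subset of it. That reading is an assumption, not something this plan states, and it would reject a rerun that surfaces a newly named mismatch. `CorrectionOutcome` and `classify_outcome` are hypothetical names.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CorrectionOutcome {
    /// No mismatch labels remain after the rerun.
    VerifiedPass,
    /// Fewer labels remain, and all of them were in the original bundle.
    NarrowerMismatch,
    /// Same bundle, a larger one, or one containing labels not seen before.
    NotNarrower,
}

/// Compare the mismatch labels before and after a correction slice.
pub fn classify_outcome<'a>(
    before: &BTreeSet<&'a str>,
    after: &BTreeSet<&'a str>,
) -> CorrectionOutcome {
    if after.is_empty() {
        CorrectionOutcome::VerifiedPass
    } else if after.len() < before.len() && after.is_subset(before) {
        CorrectionOutcome::NarrowerMismatch
    } else {
        CorrectionOutcome::NotNarrower
    }
}
```
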
## Scope Guardrails
1. do not reopen completed `G2` family expansion work
2. do not add new `G2` fixtures or promote new `G2` candidates
3. do not reopen `G3`, `G1-E`, or boundary families
4. do not turn this work into login recovery or broader runtime-platform implementation
5. do not update validation assets until the real-sample outcome becomes narrower than the current broad mismatch bundle
## Fixed Verification Anchor
The only anchor under this plan is:
1. `台区线损大数据-月_周累计线损率统计分析`
Mapped real-sample record:
1. `rsv-g2-001`
## Workstreams
1. `WS1` Real-Sample Contract Differential
2. `WS2` Bootstrap and Request Contract Narrowing
3. `WS3` Column and Output Contract Narrowing
4. `WS4` Regression, Rerun, and Validation Closure
## Phase 0: Freeze the Correction Boundary
### Objective
Lock the scope to the fixed `G2` real sample and its remaining contract gaps.
### Tasks
1. freeze `rsv-g2-001` as the only real-sample correction target
2. freeze the current mismatch bundle from the validation layer
3. freeze `G2` family-expansion outputs as completed and out of scope
### Deliverables
1. correction-boundary note
2. fixed mismatch statement
### Acceptance Criteria
1. no new `G2` family-expansion task is opened
2. the correction target is explicitly limited to real-sample contract closure
## Phase 1: Build the Real-Sample Contract Differential
### Objective
Make the smallest remaining real-sample contract mismatch explicit before code changes.
### Tasks
1. compare the current real generated `SceneIr` against the intended `tq-lineloss-report` contract
2. isolate whether the dominant remaining gap is:
- bootstrap target selection
- per-mode request template completeness
- output column semantics
- output artifact correctness
3. write a minimum contract-gap summary
### Deliverables
1. contract differential note
2. minimum gap summary
### Acceptance Criteria
1. the smallest remaining `G2` mismatch is explicit
2. the next implementation target is narrower than the current broad mismatch bundle
## Phase 2: Narrow Bootstrap and Request Contract Gaps
### Objective
Correct only the bootstrap and request-side contract pieces that the real sample proves are still too coarse.
### Tasks
1. adjust `G2` bootstrap resolution only where the real sample proves it is still misaligned
2. adjust mode-specific request contract recovery only where the real sample proves it is still incomplete
3. preserve fail-closed behavior for unresolved `G2` variants
### Deliverables
1. bounded bootstrap correction
2. bounded request-contract correction
### Acceptance Criteria
1. the real sample no longer keeps the same broad bootstrap/request mismatch shape
2. unrelated `G2` family fixtures are not broadened or reclassified
## Phase 3: Narrow Column and Output Contract Gaps
### Objective
Reduce the remaining output-side mismatch to a verified or smaller state.
### Tasks
1. adjust `G2` column-definition recovery only where the real sample proves it is still incomplete
2. adjust output-contract verification only where the real sample proves the generated artifact is too coarse
3. keep readiness and fail-closed behavior intact for still-unresolved samples
### Deliverables
1. bounded column-contract correction
2. bounded output-contract correction
### Acceptance Criteria
1. the real-sample mismatch becomes narrower than the current broad bundle
2. `G2` does not regress into false positives for unresolved variants
## Phase 4: Regression, Rerun, and Validation Closure
### Objective
Use rerun and validation-layer updates to close the bounded `G2` correction loop.
### Tasks
1. add or update a regression test that names the corrected `G2` real-sample pattern
2. rerun the fixed real sample
3. record whether:
- the sample becomes `executed-pass`
- or the remaining mismatch is now smaller and named
4. update the validation-layer assets
5. write a formal closure report
### Deliverables
1. rerun output
2. updated validation assets
3. `G2` real-sample contract-correction closure report
### Acceptance Criteria
1. `rsv-g2-001` no longer remains unchanged as the same broad mismatch bundle
2. the narrowed outcome is covered by automated regression
3. validation assets record the narrower `G2` state
## Completion Criteria
This plan is complete when:
1. the fixed `G2` real sample no longer remains at the same broad mismatch bundle
2. the narrower result is covered by automated regression
3. validation assets are updated with the narrowed outcome
4. completed `G2` family-expansion work remains untouched
## Next Step
After this plan completes:
1. if `G2` becomes `executed-pass`, mainline real-sample pressure moves off both `G2` and `G3`
2. if `G2` still has a smaller named mismatch, move only to that narrower `G2` correction slice

View File

@@ -0,0 +1,55 @@
# G2 Remaining Fail-Closed Closure Plan
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 3: G2 / multi_mode_request`
> Parent Layer: `Layer C + Layer D`
> Upstream Design: `docs/superpowers/specs/2026-04-19-g2-remaining-fail-closed-closure-design.md`
## Plan Intent
Implement one bounded correction slice for the remaining Route 3 `G2` fail-closed records.
## Fixed Input Bucket
`multi_mode_request = 4`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/scene_generator_test.rs`
5. Route 3 local inventory and report assets
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. Route 2 assets
3. Route 4+ assets
## Tasks
1. freeze the four Route 3 records
2. confirm the repeated missing contract
3. implement one bounded `G2` correction slice
4. rerun bounded validation
5. publish Route 3 delta
## Expected Coverage Delta
1. reduce the `multi_mode_request` fail-closed bucket
2. protect current `G2` real-sample pass and canonical stability
## Completion Criteria
1. Route 3 bucket has measured before/after status
2. Route 3 is closed or explicitly deferred
## Stop Statement
Stop after Route 3 delta is measured.
Do not begin Route 4 under this plan.

View File

@@ -0,0 +1,48 @@
# G2 Residual 2 Readiness Closure Plan
> Date: 2026-04-19
> Status: Draft
> Parent Plan: `docs/superpowers/plans/2026-04-19-structured-fail-closed-residual-13-closure-plan.md`
> Parent Route: `Residual Route B`
> Parent Layer: `Layer C`
## Plan Intent
Close the `2` remaining `G2 / multi_mode_request` structured fail-closed records by correcting bounded readiness or contract interpretation.
## Fixed Input Bucket
1. `sweep-018-scene` / `白银线损周报`
2. `sweep-071-scene` / `台区线损大数据-月_周累计线损率统计分析`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/scene_generator_test.rs`
4. route-local follow-up JSON/report assets
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. G3/G6/G8 route code unless required to preserve regression tests
3. family baseline manifests
## Tasks
1. inspect the two fixed G2 residuals;
2. determine whether readiness labels `02` and `00` are report parsing artifacts or real contract gaps;
3. implement one bounded G2 correction if justified;
4. rerun only the two fixed scenes;
5. publish delta report.
## Expected Coverage Delta
Target: reduce the `2` G2 residual fail-closed records.
## Stop Statement
Stop after the two-scene route-local follow-up and report.
Do not continue into G1-E, G3, or boundary work.

View File

@@ -0,0 +1,56 @@
# G3 Enrichment Request Closure Plan
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 2: G3 / paginated_enrichment`
> Parent Layer: `Layer C + Layer D`
> Upstream Design: `docs/superpowers/specs/2026-04-19-g3-enrichment-request-closure-design.md`
## Plan Intent
Implement the first bounded `G3` contract-recovery slice by recovering repeated enrichment-request and secondary-request evidence gaps inside the remaining `paginated_enrichment` fail-closed bucket.
## Fixed Input Bucket
`paginated_enrichment + g3_enrichment_contract + secondary_request`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/scene_generator_test.rs`
5. route-local follow-up JSON and report assets
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. Route 3+ plan files
3. family promotion assets
## Tasks
1. freeze the targeted `G3` subgroup from the current follow-up asset
2. confirm the repeated enrichment-request missing pattern
3. implement one bounded contract-recovery slice
4. rerun only the bounded validation needed by this subgroup
5. publish subgroup delta and residual subgroup count
## Expected Coverage Delta
1. reduce the count of `paginated_enrichment` fail-closed records caused primarily by enrichment-request closure failure
2. do not reduce canonical or real-sample `G3` pass stability
## Completion Criteria
1. targeted subgroup has a measured before/after count
2. remaining unresolved Route 2 issues are explicitly handed to the next child plan
3. no route drift into `host_bridge_workflow`
## Stop Statement
Stop after the targeted enrichment-request subgroup has been corrected or explicitly bounded as residual.
Do not continue into export-plan closure work under this plan.

View File

@@ -0,0 +1,55 @@
# G3 Export Plan Closure Plan
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 2: G3 / paginated_enrichment`
> Parent Layer: `Layer C + Layer D`
> Upstream Design: `docs/superpowers/specs/2026-04-19-g3-export-plan-closure-design.md`
## Plan Intent
Implement the second bounded `G3` contract-recovery slice by recovering repeated export-plan evidence gaps inside the remaining `paginated_enrichment` fail-closed bucket.
## Fixed Input Bucket
`paginated_enrichment + g3_export_plan + export_plan`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/scene_generator_test.rs`
5. route-local follow-up JSON and report assets
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. Route 3+ plan files
3. promotion policy assets
## Tasks
1. freeze the targeted export-plan subgroup
2. confirm repeated `export_plan` and `g3_export_plan` missing pattern
3. implement one bounded export-plan recovery slice
4. rerun bounded validation only for this subgroup
5. publish delta and residual Route 2 inventory
## Expected Coverage Delta
1. reduce the count of `paginated_enrichment` records whose primary blocker is export-plan absence
2. preserve stable `G3` canonical and real-sample anchors
## Completion Criteria
1. export-plan subgroup count is lower or more narrowly classified
2. residual Route 2 bucket is explicitly measured
## Stop Statement
Stop after the export-plan subgroup has been rerun and measured.
Do not continue into Route 2 residual closure under this plan.

View File

@@ -0,0 +1,171 @@
# G3 Real Sample Archetype Correction Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g3-real-sample-archetype-correction-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g3-real-sample-archetype-correction-design.md)
> Trigger Report: [2026-04-19-g3-real-sample-execution-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g3-real-sample-execution-report.md)
## Plan Intent
This plan implements the next bounded scope selected by the real-sample validation roadmap:
`mainline G3 real-sample archetype correction`
Its purpose is to correct the routing boundary that currently makes the real sample `95598工单明细表` collapse into `local_doc_pipeline`.
## Scope Guardrails
1. Do not reopen the completed `G3` repo-local family expansion program.
2. Do not broaden this work into `G8` runtime implementation.
3. Do not open `G4 / G5`.
4. Do not add new family-expansion fixtures unrelated to the real-sample mismatch.
5. Do not weaken fail-closed behavior in order to force a pass result.
6. Do not treat generic asset updates as progress unless they directly unblock the real-sample rerun.
## Workstreams
1. `WS1` Real-Sample Evidence Differential
2. `WS2` G3-vs-G8 Routing Boundary Correction
3. `WS3` Regression and Fail-Closed Integrity
4. `WS4` Real-Sample Rerun and Closure
## Phase 0: Freeze the Correction Boundary
### Objective
Lock the scope to one mismatch: the `G3` real sample being misrouted into `G8`.
### Tasks
1. freeze `95598工单明细表` as the only real-sample correction anchor
2. freeze the current observed mismatch:
- `archetype_mismatch`
- `evidence_not_closed`
3. freeze current `G8` behavior as a boundary-family constraint that must not regress
### Deliverables
1. correction-boundary note
2. fixed anchor and mismatch statement
### Acceptance Criteria
1. no additional family or runtime scope is added under this plan
2. the correction target is explicitly `G3 vs G8` routing
## Phase 1: Build the Real-Sample Evidence Differential
### Objective
Understand why the real sample routes differently from the repo-local `G3` baseline.
### WS1 Tasks
1. compare repo-local `G3` canonical evidence against real-sample deterministic facts
2. isolate which evidence currently drives `local_doc_pipeline`
3. isolate which `G3` business-chain signals are present but losing in routing
4. write a differential summary that identifies the minimum routing fix
### Deliverables
1. evidence differential note
2. real-sample routing-pressure summary
### Acceptance Criteria
1. the team can point to the specific evidence classes causing `G8` to win
2. the minimum routing correction is explicit before code changes start
## Phase 2: Correct the G3-vs-G8 Routing Boundary
### Objective
Change routing so recoverable `G3` business-chain evidence outranks `G8` local-pipeline evidence for this mismatch class.
### WS2 Tasks
1. tighten the `local_doc_pipeline` trigger threshold for mixed-evidence scenes
2. raise the priority of `G3` when:
- main request exists
- pagination contract is recoverable
- enrichment or detail chain exists
3. keep `G8` routing only when local pipeline evidence is still the dominant workflow backbone
4. preserve fail-closed behavior if the sample still does not satisfy the `G3` minimum contract after routing correction
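
The routing priority described in the WS2 tasks can be sketched as a small decision function. This is only an illustration under stated assumptions: `RoutingEvidence`, `Route`, and `route_mixed_scene` are hypothetical names, not the actual types in `src/generated_scene/analyzer.rs`, and the rule that a dominant local pipeline still wins over a complete `G3` chain is one reading of task 3.

```rust
/// Hypothetical evidence summary for a mixed-evidence scene.
/// Field names are illustrative and do not mirror the real analyzer types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RoutingEvidence {
    pub has_main_request: bool,
    pub pagination_recoverable: bool,
    pub has_enrichment_or_detail_chain: bool,
    /// True only when local pipeline evidence is the dominant workflow backbone.
    pub local_pipeline_is_backbone: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    /// `G3` / `paginated_enrichment`
    PaginatedEnrichment,
    /// `G8` / `local_doc_pipeline`
    LocalDocPipeline,
    /// Neither contract is satisfied; no runnable output is produced.
    FailClosed,
}

pub fn route_mixed_scene(e: RoutingEvidence) -> Route {
    // The G3 business chain needs all three signals from the plan.
    let g3_chain = e.has_main_request
        && e.pagination_recoverable
        && e.has_enrichment_or_detail_chain;

    if g3_chain && !e.local_pipeline_is_backbone {
        Route::PaginatedEnrichment
    } else if e.local_pipeline_is_backbone {
        // Assumption: G8 keeps the scene only when it is the dominant backbone.
        Route::LocalDocPipeline
    } else {
        // Incomplete G3 evidence must not fall through to a pseudo-runnable route.
        Route::FailClosed
    }
}
```

In this sketch an incomplete `G3` chain without a dominant local pipeline fails closed instead of defaulting to `local_doc_pipeline`, which is the behavior the acceptance criteria ask for.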
### Deliverables
1. analyzer and generator routing update
2. explicit `G3 vs G8` routing rule in code comments or tests where needed
### Acceptance Criteria
1. the real-sample mismatch no longer defaults to `local_doc_pipeline`
2. `G8` representative classification remains intact
3. incomplete `G3` still fail-closes without pseudo-runnable output
## Phase 3: Lock Regression and Fail-Closed Integrity
### Objective
Prove the correction does not trade one false positive for another.
### WS3 Tasks
1. add a deterministic regression test for the mixed `G3/G8` evidence pattern
2. add a generator regression test showing the corrected route stays inside `G3`
3. retain or strengthen `G8` regression so the boundary family does not collapse
4. verify that unresolved `G3` cases still fail closed for `G3` reasons
### Deliverables
1. regression tests for `G3 vs G8`
2. updated validation fixtures or assertions as needed
### Acceptance Criteria
1. no regression causes `G8` to disappear as a boundary archetype
2. no regression reintroduces a false-positive runnable skill
3. test coverage explicitly names the corrected mismatch pattern
## Phase 4: Rerun the Real Sample and Close the Loop
### Objective
Use the actual real sample to confirm the correction outcome and record the next state.
### WS4 Tasks
1. rerun `sg_scene_generate` on `95598工单明细表`
2. record whether the sample now:
- resolves as `paginated_enrichment`
- or fail-closes inside `G3`
3. update the real-sample validation record layer
4. write a formal correction closure report
### Deliverables
1. rerun output
2. updated real-sample validation assets
3. `G3` archetype-correction closure report
### Acceptance Criteria
1. the rerun no longer reports `local_doc_pipeline` as the controlling archetype
2. the validation layer records the corrected family outcome
3. the next scope recommendation can move from `G3 archetype correction` to the next remaining mainline gap
## Completion Criteria
This plan is complete when:
1. the `G3` real sample no longer collapses into `local_doc_pipeline`
2. the corrected route is covered by automated regression
3. real-sample validation assets are updated with the new outcome
4. `G8` remains a valid boundary-family archetype with no unintended regression
## Next Step
After this plan completes:
1. if `G3` real-sample routing is corrected and still shows a `G3` contract gap, move to `G3` real-sample contract correction
2. if `G3` stabilizes, return to the next mainline mismatch in priority order, which is `G2` real-sample contract correction

View File

@@ -0,0 +1,173 @@
# G3 Real Sample Output Contract Verification Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g3-real-sample-output-contract-verification-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g3-real-sample-output-contract-verification-design.md)
> Trigger Report: [2026-04-19-g3-real-sample-runtime-contract-correction-closure-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g3-real-sample-runtime-contract-correction-closure-report.md)
## Plan Intent
This plan implements the next bounded mainline scope after `G3` runtime-scope correction:
`G3 real-sample output / contract verification`
Its purpose is to reduce the remaining real-sample mismatch from a generic verification gap to either:
1. a verified pass
2. or a smaller named output/contract mismatch
## Scope Guardrails
1. do not reopen the completed `G3` archetype-correction scope
2. do not reopen the completed `G3` runtime-scope correction scope
3. do not broaden this work into `G8` runtime implementation
4. do not reopen `G3` family expansion or add unrelated fixtures
5. do not open `G4 / G5`
6. do not update validation assets until output verification produces a narrower outcome
## Workstreams
1. `WS1` Real-Sample Output Contract Differential
2. `WS2` G3 Output Contract Narrowing
3. `WS3` Regression and Verification Integrity
4. `WS4` Real-Sample Verification Rerun and Closure
## Phase 0: Freeze the Verification Boundary
### Objective
Lock the scope to one remaining mismatch: `output_contract_not_verified`.
### Tasks
1. freeze `95598工单明细表` as the only verification anchor
2. freeze current remaining mismatch:
- `output_contract_not_verified`
3. freeze current `G3` routing and runtime-scope behavior as completed constraints that must not regress
### Deliverables
1. verification-boundary note
2. fixed output-gap statement
### Acceptance Criteria
1. no additional family or runtime scope is added under this plan
2. the correction target is explicitly `G3 output / contract verification`
## Phase 1: Build the Real-Sample Output Contract Differential
### Objective
Understand exactly what part of the generated real-sample contract is still unverified.
### Tasks
1. compare the real generated `SceneIr` against the intended `G3` business output contract
2. isolate which fields are structurally present but semantically too broad
3. isolate whether the dominant gap is:
- main request selection
- enrichment request partitioning
- join key correctness
- merge/dedupe correctness
- export contract correctness
4. write a minimum verification-gap summary before code changes begin
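
The five candidate gap classes in task 3 lend themselves to a small enumeration, so that a generic `output_contract_not_verified` result can be replaced by a named one. The sketch below is hypothetical: `OutputGap` and `dominant_gap` are illustrative names, and treating the first failing check in list order as the dominant gap is an assumption, not a rule stated in this plan.

```rust
/// Hypothetical names for the five gap classes listed in the tasks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputGap {
    MainRequestSelection,
    EnrichmentRequestPartitioning,
    JoinKeyCorrectness,
    MergeDedupeCorrectness,
    ExportContractCorrectness,
}

/// Returns the first check that did not verify, or `None` when every check passed.
/// Assumption: the caller supplies checks in priority order.
pub fn dominant_gap(checks: &[(OutputGap, bool)]) -> Option<OutputGap> {
    checks
        .iter()
        .find(|(_, verified)| !*verified)
        .map(|(gap, _)| *gap)
}
```

A `None` result corresponds to the "verified pass" outcome in the plan intent, while `Some(gap)` corresponds to the "smaller named output/contract mismatch" outcome.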
### Deliverables
1. output-contract differential note
2. minimum verification-gap summary
### Acceptance Criteria
1. the smallest remaining output mismatch is explicit
2. the next change target is narrower than the current generic verification label
## Phase 2: Narrow the G3 Output Contract Gap
### Objective
Reduce the real-sample mismatch from generic non-verified output to a specific verified contract state.
### Tasks
1. adjust the minimum `G3` output-contract logic only where the real sample proves it is too coarse
2. keep routing and runtime-scope logic unchanged unless required by output verification
3. preserve fail-closed behavior for scenes whose output contract is still unresolved
### Deliverables
1. bounded output-contract update
2. explicit verification rule in code or tests where needed
### Acceptance Criteria
1. the real-sample mismatch is narrower than `output_contract_not_verified`
2. no unrelated family is reclassified or broadened
3. the corrected result stays inside `G3`
## Phase 3: Lock Regression and Verification Integrity
### Objective
Prove the narrower contract logic does not create false positives.
### Tasks
1. add or update a regression test that names the corrected real-sample verification pattern
2. retain mixed-boundary, `G8`, and canonical regressions
3. verify unresolved `G3` cases still fail closed when the output contract is genuinely incomplete
### Deliverables
1. regression tests for `G3` output verification
2. updated assertions where needed
### Acceptance Criteria
1. no regression causes `G8` to disappear as a boundary archetype
2. no regression causes unrelated `single_request_table` or other families to drift
3. test coverage explicitly names the corrected output-verification pattern
## Phase 4: Rerun the Real Sample and Close the Loop
### Objective
Use the actual real sample to confirm the narrowed output-verification outcome and record the next state.
### Tasks
1. rerun `sg_scene_generate` on `95598工单明细表`
2. record whether:
- the sample becomes `executed-pass`
- or the remaining mismatch is narrower than `output_contract_not_verified`
3. update the real-sample validation record layer
4. write a formal closure report
### Deliverables
1. rerun output
2. updated real-sample validation assets
3. `G3` output-contract-verification closure report
### Acceptance Criteria
1. the rerun no longer leaves the generic `output_contract_not_verified` label unchanged
2. the validation layer records a narrower family outcome
3. the next scope recommendation can move from `G3` to the next mainline gap when appropriate
## Completion Criteria
This plan is complete when:
1. the `G3` real sample no longer ends at the generic `output_contract_not_verified` label
2. the narrowed result is covered by automated regression
3. real-sample validation assets are updated with the new outcome
4. `G8` and prior `G3` routing/runtime corrections remain intact
## Next Step
After this plan completes:
1. if `G3` becomes `executed-pass`, return to the next mainline mismatch in priority order, which is `G2` real-sample contract correction
2. if `G3` still has a smaller output-specific mismatch, move only to that narrower `G3` verification slice

View File

@@ -0,0 +1,166 @@
# G3 Real Sample Runtime Contract Correction Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g3-real-sample-runtime-contract-correction-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g3-real-sample-runtime-contract-correction-design.md)
> Trigger Report: [2026-04-19-g3-real-sample-archetype-correction-closure-report.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/reports/2026-04-19-g3-real-sample-archetype-correction-closure-report.md)
## Plan Intent
This plan implements the next bounded scope selected after `G3` archetype correction:
`mainline G3 real-sample runtime / contract correction`
Its purpose is to narrow the remaining real-sample gap from a coarse runtime-scope failure to the smallest accurate contract state.
## Scope Guardrails
1. do not reopen the completed `G3` archetype-correction scope
2. do not broaden this work into `G8` runtime implementation
3. do not open `G4 / G5`
4. do not add new family-expansion fixtures unrelated to the real-sample mismatch
5. do not weaken fail-closed behavior for incomplete `G3` scenes
6. do not update validation assets until the rerun result changes
## Workstreams
1. `WS1` Runtime-Scope Differential
2. `WS2` G3 Runtime-Scope Gate Narrowing
3. `WS3` Regression and Fail-Closed Integrity
4. `WS4` Real-Sample Rerun and Closure
## Phase 0: Freeze the Correction Boundary
### Objective
Lock the scope to one remaining mismatch: `G3` real-sample runtime scope compatibility.
### Tasks
1. freeze `95598工单明细表` as the only correction anchor
2. freeze current remaining mismatch:
- `runtime_scope_gap`
- `output_contract_not_verified`
3. freeze current `G8` behavior as a boundary-family constraint that must not regress
### Deliverables
1. correction-boundary note
2. fixed runtime-gap statement
### Acceptance Criteria
1. no additional family or runtime scope is added under this plan
2. the correction target is explicitly `G3 runtime scope`, not a broader runtime program
## Phase 1: Build the Runtime-Scope Differential
### Objective
Understand why the current gate still marks the real sample as runtime-incompatible.
### Tasks
1. compare current `G3` runtime-scope gate logic against the corrected real-sample evidence
2. isolate which localhost evidence should remain subordinate
3. isolate what dominant-runtime pattern should still fail closed
4. write a minimum gate-narrowing summary before code changes begin
### Deliverables
1. runtime-scope differential note
2. gate-narrowing summary
### Acceptance Criteria
1. the minimum change to `g3_runtime_scope_compatible` is explicit
2. the team can distinguish subordinate host-runtime evidence from dominant runtime takeover
## Phase 2: Narrow the G3 Runtime-Scope Gate
### Objective
Allow valid `G3` real samples with subordinate localhost evidence to stay runtime-compatible.
### Tasks
1. narrow `g3_runtime_scope_compatible` so it considers business-chain dominance, not only localhost evidence count
2. preserve fail-closed behavior for scenes whose business chain is still not dominant
3. keep `G8` representative behavior intact
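
The difference between a coarse gate and the narrowed gate can be sketched as follows. Everything here is an assumption made for illustration: the struct, both function names, the zero check used for the coarse gate, and the dominance rule (strictly more business-chain requests than localhost requests). The real `g3_runtime_scope_compatible` logic in the generator may differ.

```rust
/// Hypothetical evidence counts; not the real generator input type.
#[derive(Debug, Clone, Copy)]
pub struct RuntimeScopeEvidence {
    pub business_chain_requests: usize,
    pub localhost_requests: usize,
}

/// Coarse gate (assumed shape): any localhost evidence blocks the scene.
pub fn coarse_runtime_scope_gate(e: RuntimeScopeEvidence) -> bool {
    e.localhost_requests == 0
}

/// Narrowed gate (sketch): localhost evidence is tolerated while it stays
/// subordinate to the business chain. A scene with no business chain at all
/// still fails closed, because 0 > 0 is false.
pub fn narrowed_runtime_scope_gate(e: RuntimeScopeEvidence) -> bool {
    e.business_chain_requests > e.localhost_requests
}
```

Under these assumptions, a scene with three business-chain requests and one localhost request fails the coarse gate but passes the narrowed one, while a scene dominated by localhost traffic is rejected by both.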
### Deliverables
1. generator gate update
2. explicit regression rule for subordinate localhost evidence inside `G3`
### Acceptance Criteria
1. the corrected real sample no longer fails the runtime-scope gate for the old coarse reason
2. `G8` representative classification remains intact
3. incomplete `G3` scenes still fail closed for `G3` reasons
## Phase 3: Lock Regression and Fail-Closed Integrity
### Objective
Prove the narrowed gate does not create a pseudo-runnable class of scenes.
### Tasks
1. add a regression test for a real-sample-like `G3` scene with subordinate localhost evidence
2. retain `G8` regression and mixed-boundary regression
3. verify unresolved `G3` scenes still fail closed when business-chain dominance is absent
### Deliverables
1. regression tests for `G3 runtime scope`
2. updated assertions where needed
### Acceptance Criteria
1. no regression causes `G8` to disappear as a boundary archetype
2. no regression causes unrelated `single_request_table` or other families to drift
3. test coverage explicitly names the corrected runtime-scope pattern
## Phase 4: Rerun the Real Sample and Close the Loop
### Objective
Use the actual real sample to confirm the narrowed runtime-scope outcome and record the next state.
### Tasks
1. rerun `sg_scene_generate` on `95598工单明细表`
2. record whether:
- `g3_runtime_scope_compatible` now passes
   - any remaining mismatch is narrower than the runtime-scope failure
3. update the real-sample validation record layer
4. write a formal closure report
### Deliverables
1. rerun output
2. updated real-sample validation assets
3. `G3` runtime-contract-correction closure report
### Acceptance Criteria
1. the rerun no longer fails for `g3_runtime_scope`
2. the validation layer records the narrowed family outcome
3. the next scope recommendation can move from `G3 runtime correction` to the next remaining mainline gap
## Completion Criteria
This plan is complete when:
1. the `G3` real sample no longer fails for the old coarse runtime-scope reason
2. the narrowed gate is covered by automated regression
3. real-sample validation assets are updated with the new outcome
4. `G8` remains a valid boundary-family archetype with no unintended regression
## Next Step
After this plan completes:
1. if `G3` still has a narrower output or data-verification gap, move to `G3` real-sample output or contract verification
2. if `G3` stabilizes, return to the next mainline mismatch in priority order, which is `G2` real-sample contract correction

View File

@@ -0,0 +1,50 @@
# G3 Residual 4 Workflow Evidence Closure Plan
> Date: 2026-04-19
> Status: Draft
> Parent Plan: `docs/superpowers/plans/2026-04-19-structured-fail-closed-residual-13-closure-plan.md`
> Parent Route: `Residual Route A`
> Parent Layer: `Layer C`
## Plan Intent
Close the `4` remaining `G3 / paginated_enrichment` structured fail-closed scenes by recovering missing workflow evidence without relaxing gates.
## Fixed Input Bucket
1. `sweep-007-scene` / `95598供电服务月报`
2. `sweep-039-scene` / `故障报修工单信息统计表`
3. `sweep-068-scene` / `输变电设备运行分析报告`
4. `sweep-084-scene` / `巡视计划完成情况自动检索`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/scene_generator_test.rs`
4. route-local follow-up JSON/report assets
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. family baseline manifests
3. G6/G8 runtime implementation files
## Tasks
1. inspect the four fixed reports and source scenes;
2. identify the repeated missing G3 evidence subtype;
3. implement one bounded G3 recovery slice;
4. rerun only the four fixed scenes;
5. publish delta report.
## Expected Coverage Delta
Target: reduce the `4` G3 residual fail-closed records.
## Stop Statement
Stop after the four-scene route-local follow-up and report.
Do not continue into G2 or boundary residual work.

View File

@@ -0,0 +1,54 @@
# G3 Residual Contract Closure Plan
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 2: G3 / paginated_enrichment`
> Parent Layer: `Layer C + Layer D`
> Upstream Design: `docs/superpowers/specs/2026-04-19-g3-residual-contract-closure-design.md`
## Plan Intent
Implement the final bounded Route 2 slice for any `G3` residual contract blockers left after enrichment-request and export-plan closure work.
## Fixed Input Bucket
Residual `G3 / paginated_enrichment` bucket after the first two Route 2 child plans.
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/scene_generator_test.rs`
5. route-local residual inventory and report assets
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. Route 3+ implementation assets
## Tasks
1. freeze post-Route-2 residual inventory
2. group residual blockers
3. implement at most one bounded residual correction slice
4. rerun bounded validation
5. declare Route 2 complete or deferred
## Expected Coverage Delta
1. shrink or explicitly name the final residual `G3` bucket
2. produce a clean handoff into Route 3
## Completion Criteria
1. Route 2 is no longer open-ended
2. remaining residual `G3` records are explicitly categorized
## Stop Statement
Stop after Route 2 is explicitly closed or deferred.
Do not begin Route 3 work under this plan.

View File

@@ -0,0 +1,72 @@
# G6 Host-Bridge Callback Semantics Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g6-host-bridge-callback-semantics-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g6-host-bridge-callback-semantics-design.md)
## Plan Intent
This plan executes one bounded next slice:
`G6 host-bridge callback semantics`
## Scope Guardrails
1. do not execute a `G6` real sample
2. do not implement host-runtime directly
3. do not open `G8`
4. do not reopen `G7`
5. do not open `G4 / G5`
## Workstreams
1. `WS1` Freeze callback semantics scope
2. `WS2` Define completion-state semantics
3. `WS3` Publish one bounded callback-semantic result
## Phase 0: Freeze the Boundary
### Objective
Lock the plan to callback semantics only.
### Tasks
1. freeze `G6` as the only target
2. freeze transport/runtime implementation and real execution as out of scope
### Acceptance Criteria
1. no broader host-runtime work begins under this plan
## Phase 1: Define Completion-State Semantics
### Objective
Turn callback completion into an explicit bounded semantic model.
### Tasks
1. define `ok`
2. define `partial`
3. define `blocked`
4. define `error`
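
One way to keep the four states bounded is a closed enumeration with strict parsing. This is a minimal sketch: `CallbackState` is a hypothetical type name, the plan does not yet define what each state means, and rejecting unknown labels is an assumption chosen to match the fail-closed posture used elsewhere in these plans.

```rust
use std::str::FromStr;

/// Hypothetical closed set of callback completion states named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallbackState {
    Ok,
    Partial,
    Blocked,
    Error,
}

impl FromStr for CallbackState {
    type Err = String;

    /// Accepts only the four labels from the plan; anything else is rejected
    /// rather than being mapped to a default state.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ok" => Ok(CallbackState::Ok),
            "partial" => Ok(CallbackState::Partial),
            "blocked" => Ok(CallbackState::Blocked),
            "error" => Ok(CallbackState::Error),
            other => Err(format!("unknown callback state: {other}")),
        }
    }
}
```

With this shape, `"partial".parse::<CallbackState>()` yields the `Partial` variant, and a label outside the four defined states produces an error instead of a silent fallback.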
### Acceptance Criteria
1. callback state logic is explicit and bounded
## Phase 2: Publish the Bounded Result
### Objective
Turn the callback-semantic model into one bounded next artifact.
### Tasks
1. publish the semantic result
2. if needed, publish the next bounded follow-up plan
### Acceptance Criteria
1. the next step remains narrower than direct host-runtime implementation

View File

@@ -0,0 +1,72 @@
# G6 Host-Bridge Callback State Verification Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g6-host-bridge-callback-state-verification-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g6-host-bridge-callback-state-verification-design.md)
## Plan Intent
This plan executes one bounded next slice:
`G6 host-bridge callback state verification`
## Scope Guardrails
1. do not execute a `G6` real sample
2. do not implement host-runtime directly
3. do not open `G8`
4. do not reopen `G7`
5. do not open `G4 / G5`
## Workstreams
1. `WS1` Freeze callback-state verification scope
2. `WS2` Define verification targets for `ok/partial/blocked/error`
3. `WS3` Publish one bounded verification result
## Phase 0: Freeze the Boundary
### Objective
Lock the plan to callback-state verification only.
### Tasks
1. freeze `G6` as the only target
2. freeze implementation and real execution as out of scope
### Acceptance Criteria
1. no broader host-runtime work begins under this plan
## Phase 1: Define Verification Targets
### Objective
Turn the explicit callback states into bounded verification targets.
### Tasks
1. define verification target for `ok`
2. define verification target for `partial`
3. define verification target for `blocked`
4. define verification target for `error`
### Acceptance Criteria
1. callback verification targets are explicit and bounded
## Phase 2: Publish the Bounded Result
### Objective
Turn the callback-state verification model into one bounded next artifact.
### Tasks
1. publish the verification result
2. if needed, publish the next bounded follow-up plan
### Acceptance Criteria
1. the next step remains narrower than direct host-runtime implementation

View File

@@ -0,0 +1,71 @@
# G6 Host-Bridge Entry Gate Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g6-host-bridge-entry-gate-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g6-host-bridge-entry-gate-design.md)
## Plan Intent
This plan executes one bounded next slice:
`G6 host-bridge entry gate`
## Scope Guardrails
1. do not execute a `G6` real sample
2. do not implement host-runtime directly
3. do not open `G8`
4. do not reopen `G7`
5. do not open `G4 / G5`
## Workstreams
1. `WS1` Freeze entry-gate scope
2. `WS2` Define bounded gate conditions
3. `WS3` Publish one bounded gate result
## Phase 0: Freeze the Boundary
### Objective
Lock the plan to entry-gate modeling only.
### Tasks
1. freeze `G6` as the only target
2. freeze implementation and real execution as out of scope
### Acceptance Criteria
1. no broader host-runtime work begins under this plan
## Phase 1: Define Gate Conditions
### Objective
Turn the semantic readiness criteria into bounded gate conditions.
### Tasks
1. define hard gate conditions
2. define soft/optional later conditions
3. define fail-close gate reasons
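
The split between hard conditions, soft conditions, and fail-close reasons can be sketched generically. All names below (`GateKind`, `GateCheck`, `GateOutcome`, `evaluate_entry_gate`) are hypothetical, and the concrete conditions are deliberately left out because this plan has not defined them yet.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateKind {
    /// Must pass before the entry gate opens.
    Hard,
    /// Optional or later condition; recorded but not blocking.
    Soft,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GateCheck {
    pub name: &'static str,
    pub kind: GateKind,
    pub passed: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GateOutcome {
    /// All hard conditions passed; unmet soft conditions are listed as deferred.
    Open { deferred: Vec<&'static str> },
    /// At least one hard condition failed; each failure is a fail-close reason.
    FailClosed { reasons: Vec<&'static str> },
}

pub fn evaluate_entry_gate(checks: &[GateCheck]) -> GateOutcome {
    // Collect the names of failed checks of one kind.
    let failed = |kind: GateKind| -> Vec<&'static str> {
        checks
            .iter()
            .filter(|c| c.kind == kind && !c.passed)
            .map(|c| c.name)
            .collect()
    };

    let reasons = failed(GateKind::Hard);
    if !reasons.is_empty() {
        return GateOutcome::FailClosed { reasons };
    }
    GateOutcome::Open { deferred: failed(GateKind::Soft) }
}
```

Keeping the fail-close reasons as named values makes each one individually verifiable, which is the input the later gate-verification slice would need.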
### Acceptance Criteria
1. entry-gate conditions are explicit and bounded
## Phase 2: Publish the Bounded Result
### Objective
Turn the gate model into one bounded next artifact.
### Tasks
1. publish the gate result
2. if needed, publish the next bounded follow-up plan
### Acceptance Criteria
1. the next step remains narrower than direct host-runtime implementation

@@ -0,0 +1,70 @@
# G6 Host-Bridge Entry Gate Verification Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g6-host-bridge-entry-gate-verification-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g6-host-bridge-entry-gate-verification-design.md)
## Plan Intent
This plan executes one bounded next slice:
`G6 host-bridge entry gate verification`
## Scope Guardrails
1. do not execute a `G6` real sample
2. do not implement host-runtime directly
3. do not open `G8`
4. do not reopen `G7`
5. do not open `G4 / G5`
## Workstreams
1. `WS1` Freeze gate-verification scope
2. `WS2` Define bounded verification targets for the hard gate
3. `WS3` Publish one bounded verification result
## Phase 0: Freeze the Boundary
### Objective
Lock the plan to gate verification only.
### Tasks
1. freeze `G6` as the only target
2. freeze implementation and real execution as out of scope
### Acceptance Criteria
1. no broader host-runtime work begins under this plan
## Phase 1: Define Verification Targets
### Objective
Turn the hard gate into bounded verification targets.
### Tasks
1. define verification target for each hard gate condition
2. define verification target for each fail-close reason
### Acceptance Criteria
1. gate verification targets are explicit and bounded
## Phase 2: Publish the Bounded Result
### Objective
Turn the gate-verification model into one bounded next artifact.
### Tasks
1. publish the verification result
2. if needed, publish the next bounded follow-up plan
### Acceptance Criteria
1. the next step remains narrower than direct host-runtime implementation

@@ -0,0 +1,71 @@
# G6 Host-Bridge Entry Readiness Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g6-host-bridge-entry-readiness-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g6-host-bridge-entry-readiness-design.md)
## Plan Intent
This plan executes one bounded next slice:
`G6 host-bridge entry readiness`
## Scope Guardrails
1. do not execute a `G6` real sample
2. do not implement host-runtime directly
3. do not open `G8`
4. do not reopen `G7`
5. do not open `G4 / G5`
## Workstreams
1. `WS1` Freeze entry-readiness scope
2. `WS2` Define bounded readiness criteria
3. `WS3` Publish one bounded readiness result
## Phase 0: Freeze the Boundary
### Objective
Lock the plan to entry-readiness only.
### Tasks
1. freeze `G6` as the only target
2. freeze implementation and real execution as out of scope
### Acceptance Criteria
1. no broader host-runtime work begins under this plan
## Phase 1: Define Readiness Criteria
### Objective
Turn the explicit callback verification model into bounded entry-readiness criteria.
### Tasks
1. define which semantics are required before `G6` entry can open
2. define which semantics remain optional
3. define the minimal readiness threshold
### Acceptance Criteria
1. entry-readiness criteria are explicit and bounded
## Phase 2: Publish the Bounded Result
### Objective
Turn the readiness model into one bounded next artifact.
### Tasks
1. publish the readiness result
2. if needed, publish the next bounded follow-up plan
### Acceptance Criteria
1. the next step remains narrower than direct host-runtime implementation

@@ -0,0 +1,71 @@
# G6 Host-Bridge Execution Semantics Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g6-host-bridge-execution-semantics-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g6-host-bridge-execution-semantics-design.md)
## Plan Intent
This plan executes one bounded next slice:
`G6 host-bridge execution semantics`
## Scope Guardrails
1. do not execute a `G6` real sample
2. do not implement host-runtime directly
3. do not open `G8`
4. do not reopen `G7`
5. do not open `G4 / G5`
## Workstreams
1. `WS1` Freeze the semantic boundary
2. `WS2` Separate bridge invocation from callback completion
3. `WS3` Publish one bounded semantic result
## Phase 0: Freeze the Boundary
### Objective
Lock the plan to semantic scoping only.
### Tasks
1. freeze `G6` as the only target
2. freeze real execution and implementation as out of scope
### Acceptance Criteria
1. no host-runtime implementation begins under this plan
## Phase 1: Separate the Minimum Semantics
### Objective
Turn the blocked capability into explicit bounded semantics.
### Tasks
1. isolate bridge action invocation semantics
2. isolate callback completion semantics
3. keep both separate from broader host-runtime work
### Acceptance Criteria
1. the semantic model is explicit and bounded
## Phase 2: Publish the Bounded Result
### Objective
Turn the semantic model into one bounded next artifact.
### Tasks
1. publish the semantic result
2. if needed, publish the next bounded follow-up plan
### Acceptance Criteria
1. the next step remains narrower than direct host-runtime implementation

@@ -0,0 +1,71 @@
# G6 Host-Bridge Prerequisites Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g6-host-bridge-prerequisites-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g6-host-bridge-prerequisites-design.md)
## Plan Intent
This plan executes one bounded next slice:
`G6 host-bridge prerequisites`
## Scope Guardrails
1. do not execute a `G6` real sample
2. do not implement host-runtime directly under this plan
3. do not reopen `G7`
4. do not open `G8`
5. do not open `G4 / G5`
## Workstreams
1. `WS1` Freeze the G6 prerequisite boundary
2. `WS2` Isolate the minimum blocked host-bridge capability
3. `WS3` Publish one bounded prerequisite result
## Phase 0: Freeze the Boundary
### Objective
Lock the plan to `G6` prerequisite scoping only.
### Tasks
1. freeze `G6` as the only target
2. freeze `G6` real-sample execution as out of scope
### Acceptance Criteria
1. no other boundary family is touched under this plan
## Phase 1: Isolate the Minimum Blocked Capability
### Objective
Reduce `G6` prerequisite pressure to the smallest explicit capability gap.
### Tasks
1. restate the current `G6` hold condition
2. isolate the minimum host-bridge execution semantic still missing
3. keep that capability separate from broader runtime-platform work
### Acceptance Criteria
1. the blocked capability is explicit and bounded
## Phase 2: Publish the Bounded Result
### Objective
Turn the isolated prerequisite into one bounded next artifact.
### Tasks
1. publish the prerequisite result
2. if needed, publish the next bounded follow-up plan
### Acceptance Criteria
1. the next step is narrower than broad host-runtime implementation

@@ -0,0 +1,159 @@
# G6 Real-Sample Entry Preparation And Bounded Execution Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g6-real-sample-entry-preparation-and-bounded-execution-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g6-real-sample-entry-preparation-and-bounded-execution-design.md)
## Plan Intent
This plan is the only surviving `G6` execution plan after the redesign.
Its purpose is:
`stop G6 planning recursion and move directly to one bounded implementation-plus-real-sample slice`
## Scope Guardrails
1. do not open any new `G6` semantic sub-plan
2. do not reopen `G7`
3. do not open `G8`
4. do not open `G4 / G5`
5. do not broaden into host-runtime platform redesign
6. do not add more than one fixed `G6` real sample
## Preserved G6 Gate
The final frozen `G6` gate under this plan is:
### Hard Conditions
1. `host-bridge-action-invocation-defined`
2. `callback-request-completion-defined`
3. `callback-state-verification-targets-defined`
### Soft Later Conditions
1. `host-runtime-transport-implementation`
2. `real-sample-execution-proof`
### Fail-Close Reasons
1. `g6_bridge_invocation_semantics_missing`
2. `g6_callback_completion_semantics_missing`
3. `g6_callback_state_targets_missing`
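
A minimal sketch of how this frozen gate could be evaluated, assuming each hard condition pairs with the fail-close reason at the same position in the lists above. That pairing is inferred from the ordering and naming, not stated by the plan. `G6GateInputs` and `evaluate_g6_gate` are hypothetical names.

```rust
/// Hypothetical snapshot of which G6 semantics are already defined.
#[derive(Debug, Clone, Copy, Default)]
pub struct G6GateInputs {
    // Hard conditions: all three must hold before the gate opens.
    pub host_bridge_action_invocation_defined: bool,
    pub callback_request_completion_defined: bool,
    pub callback_state_verification_targets_defined: bool,
    // Soft later conditions: recorded, but they never block the gate.
    pub host_runtime_transport_implemented: bool,
    pub real_sample_execution_proven: bool,
}

/// Returns `Ok(())` when every hard condition holds; otherwise returns the
/// first fail-close reason, checked in the order the plan lists them.
pub fn evaluate_g6_gate(inputs: &G6GateInputs) -> Result<(), &'static str> {
    if !inputs.host_bridge_action_invocation_defined {
        return Err("g6_bridge_invocation_semantics_missing");
    }
    if !inputs.callback_request_completion_defined {
        return Err("g6_callback_completion_semantics_missing");
    }
    if !inputs.callback_state_verification_targets_defined {
        return Err("g6_callback_state_targets_missing");
    }
    Ok(())
}
```

The soft conditions are deliberately absent from the checks, which mirrors their "later" status in the gate.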
## Workstreams
1. `WS1` Freeze the Final G6 Entry Gate
2. `WS2` Implement the Minimum Host-Bridge Execution Seam
3. `WS3` Run the Fixed G6 Real Sample
4. `WS4` Write Back Validation And Close
## Phase 0: Freeze The Final Gate
### Objective
Stop semantic drift and declare the gate final for this execution slice.
### Tasks
1. treat the hard `G6` gate as frozen
2. treat the fail-close reasons as frozen
3. explicitly forbid any further `G6` semantic micro-plan under this line
### Deliverables
1. final frozen `G6` gate note
2. final fixed-sample statement
### Acceptance Criteria
1. no further `G6` semantic clarification plan is produced
## Phase 1: Implement The Minimum Execution Seam
### Objective
Add only the minimum implementation needed to let the fixed `G6` real sample enter one controlled execution attempt.
### Tasks
1. implement the minimum host-bridge invocation seam required by the fixed sample
2. implement the minimum callback completion handling required by the fixed sample
3. keep the change narrower than generic host-runtime redesign
4. preserve fail-close behavior when the frozen hard conditions are not met
### Deliverables
1. bounded `G6` code change
2. bounded regression tests
### Acceptance Criteria
1. `G6` execution support is improved only at the seam required by the fixed sample
2. unrelated families are untouched
3. fail-close remains explicit
## Phase 2: Execute The Fixed Real Sample
### Objective
Use one real `G6` sample to prove whether the bounded implementation slice is enough.
### Tasks
1. run the fixed `G6` real sample once
2. classify the result only as:
   - `executed-pass`
   - `named mismatch`
3. do not open a new semantic sub-plan regardless of result
### Deliverables
1. real execution result
2. fixed-sample execution note
### Acceptance Criteria
1. the result is narrower than `not executed`
2. the result is not deferred into another semantic-planning loop
## Phase 3: Validation Closure
### Objective
Write the fixed result back and close the line.
### Tasks
1. update validation-layer assets
2. if pass: close `G6`
3. if mismatch: write one implementation correction plan only
4. publish a closure report
### Deliverables
1. validation asset update
2. closure report
3. optional implementation correction plan if mismatch occurs
### Acceptance Criteria
1. `G6` ends in `executed-pass` or `named mismatch`
2. no new semantic micro-plan is emitted
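
The two-outcome rule from Phases 2 and 3 can be sketched as a closed enum, so that no third outcome (such as another semantic plan) is representable. The type and function names below are hypothetical and serve only as illustration.

```rust
/// The only two outcomes the plan allows for the fixed G6 real sample.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum G6SampleResult {
    ExecutedPass,
    /// Carries the name of the mismatch (free-form in this sketch).
    NamedMismatch(String),
}

/// The only two follow-up actions Phase 3 allows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClosureAction {
    CloseG6,
    WriteOneImplementationCorrectionPlan,
}

/// Label as written in the plan text.
pub fn result_label(result: &G6SampleResult) -> &'static str {
    match result {
        G6SampleResult::ExecutedPass => "executed-pass",
        G6SampleResult::NamedMismatch(_) => "named mismatch",
    }
}

/// Pass closes the line; mismatch yields exactly one correction plan.
pub fn closure_action(result: &G6SampleResult) -> ClosureAction {
    match result {
        G6SampleResult::ExecutedPass => ClosureAction::CloseG6,
        G6SampleResult::NamedMismatch(_) => {
            ClosureAction::WriteOneImplementationCorrectionPlan
        }
    }
}
```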
## Completion Criteria
This plan is complete when:
1. one bounded implementation seam is landed
2. one fixed `G6` real sample is executed
3. the line closes with `executed-pass` or `named mismatch`
## Non-Negotiable Stop Rule
After this plan starts executing:
1. do not create another `G6` semantic plan
2. if the run fails, create only one implementation correction plan
3. if the run passes, close the `G6` line immediately

@@ -0,0 +1,92 @@
# G7 Real-Sample Entry Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-g7-real-sample-entry-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-g7-real-sample-entry-design.md)
## Plan Intent
This plan executes one bounded next slice:
`G7 real-sample entry`
## Scope Guardrails
1. do not reopen mainline families
2. do not execute `G6` or `G8`
3. do not add new `G7` family fixtures
4. do not implement new runtime-platform prerequisites under this plan
5. do not open `G4 / G5`
## Fixed Verification Anchor
The only target under this plan is:
1. `计量资产库存统计`
## Workstreams
1. `WS1` Freeze the G7 real-sample boundary
2. `WS2` Build the real-sample contract differential
3. `WS3` Rerun the fixed real sample against the existing G7 runtime contract
4. `WS4` Update validation assets and close the loop
## Phase 0: Freeze the Boundary
### Objective
Lock the plan to one `G7` representative sample.
### Tasks
1. freeze `计量资产库存统计` as the only real-sample anchor
2. freeze existing `G7` repo-local runtime contract as the starting baseline
### Acceptance Criteria
1. no other boundary family is touched under this plan
## Phase 1: Build the Differential
### Objective
Understand whether the existing `G7` runtime contract is already close enough for a real-sample rerun.
### Tasks
1. compare the representative `G7` fixture contract to the chosen real sample
2. isolate the smallest remaining contract risk
### Acceptance Criteria
1. the rerun target is explicit and bounded
## Phase 2: Real-Sample Rerun
### Objective
Use the fixed real sample to test the current `G7` runtime contract.
### Tasks
1. run `sg_scene_generate` on the fixed `G7` real sample
2. record whether the result is a pass or a smaller mismatch
### Acceptance Criteria
1. the outcome is narrower than `not yet executed`
## Phase 3: Validation Closure
### Objective
Write the result back into the validation layer and close the bounded slice.
### Tasks
1. update validation assets if the outcome narrows
2. write a closure report
### Acceptance Criteria
1. the next boundary-family ambiguity is reduced further without broadening roadmap scope

@@ -0,0 +1,44 @@
# Host-Bridge Runtime Roadmap Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Sequence: `2026-04-19-final-2-residual-child-plan-sequence-plan.md`
> Fixed Scene: `sweep-085-scene`
> Status: Draft
## Plan Intent
Run a bounded host-bridge runtime slice for the single remaining `host_bridge_workflow` residual.
## Fixed Input Bucket
1. `sweep-085-scene`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/scene_generator_test.rs`
4. `tests/fixtures/generated_scene/host_bridge_runtime_followup_2026-04-19.json`
5. `tests/fixtures/generated_scene/host_bridge_runtime_reconciliation_candidates_2026-04-19.json`
6. `docs/superpowers/reports/2026-04-19-host-bridge-runtime-roadmap-report.md`
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Tasks
1. Freeze the current `sweep-085-scene` generation report.
2. Identify the exact host-bridge runtime missing piece.
3. Implement at most one bounded correction slice if it can be expressed in generated-scene contract or fail-closed reporting.
4. Rerun only `sweep-085-scene`.
5. Publish follow-up and reconciliation candidate assets.
## Expected Delta
Target delta is `+1 framework-auto-pass-candidate` if the host-bridge contract can be closed without full runtime transport. Otherwise the delta is `0`, with a narrower named runtime hold.
## Stop Statement
Stop after the single-scene follow-up and reconciliation candidates are published. Do not update the official board under this plan.

@@ -0,0 +1,50 @@
# Local-Doc Official Board Reconciliation Refresh Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Roadmap: `2026-04-19-local-doc-runtime-roadmap-plan.md`
> Status: Active
## Plan Intent
Refresh the official execution board using the five local-doc framework auto-pass candidates produced by the local-doc runtime roadmap.
## Fixed Inputs
1. `tests/fixtures/generated_scene/local_doc_runtime_reconciliation_candidates_2026-04-19.json`
2. `tests/fixtures/generated_scene/promotion_board_reconciliation_policy_2026-04-19.json`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Allowed Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/local_doc_official_board_reconciliation_refresh_2026-04-19.json`
3. `docs/superpowers/reports/2026-04-19-local-doc-official-board-reconciliation-refresh-report.md`
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
## Tasks
1. Load the official execution board.
2. Load the local-doc reconciliation candidates.
3. Verify the candidate asset contains exactly the five fixed local-doc scene ids.
4. Match board rows by `sceneId`.
5. Update only framework-layer fields for the five matched rows.
6. Recompute board framework summary counts.
7. Publish reconciliation refresh JSON.
8. Publish reconciliation refresh report.
## Acceptance Criteria
1. Board scene count remains `102`.
2. The five fixed local-doc scene ids have `currentFrameworkStatus = framework-auto-pass`.
3. Board framework counts are `framework-auto-pass = 100` and `framework-structured-fail-closed = 2`.
4. Host-bridge and bootstrap residuals remain structured fail-closed.
5. Analyzer and generator are not modified by this plan.
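
A rough sketch of Tasks 3 through 6, written against in-memory rows rather than the real JSON assets. The `BoardRow` shape and the `refresh_board` function are assumptions for illustration. The actual board schema is not shown in this plan, and real code would presumably load the fixtures through serde.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified board row; the real board schema may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BoardRow {
    pub scene_id: String,
    pub current_framework_status: String,
}

pub const FRAMEWORK_AUTO_PASS: &str = "framework-auto-pass";

/// Verifies the candidate ids equal the fixed id set, updates only the
/// matched rows, and returns recomputed framework status counts.
pub fn refresh_board(
    rows: &mut [BoardRow],
    candidate_ids: &[&str],
    fixed_ids: &[&str],
) -> Result<BTreeMap<String, usize>, String> {
    let candidates: BTreeSet<&str> = candidate_ids.iter().copied().collect();
    let fixed: BTreeSet<&str> = fixed_ids.iter().copied().collect();
    // Reject duplicates and any drift from the fixed scene set.
    if candidates.len() != candidate_ids.len() || candidates != fixed {
        return Err("candidate asset does not match the fixed scene ids".to_string());
    }
    // Every candidate must match a board row by sceneId.
    for id in &candidates {
        if !rows.iter().any(|row| row.scene_id == *id) {
            return Err(format!("no board row for sceneId {}", id));
        }
    }
    // Update only the framework-layer field of matched rows.
    for row in rows.iter_mut() {
        if candidates.contains(row.scene_id.as_str()) {
            row.current_framework_status = FRAMEWORK_AUTO_PASS.to_string();
        }
    }
    // Recompute the summary counts from the rows themselves.
    let mut counts = BTreeMap::new();
    for row in rows.iter() {
        *counts.entry(row.current_framework_status.clone()).or_insert(0) += 1;
    }
    Ok(counts)
}
```

On the real 102-row board, the returned counts would be compared against acceptance criterion 3 (`100` and `2`) before publishing the refresh JSON.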
## Stop Statement
Stop after the local-doc official board reconciliation refresh JSON and report are published. Do not start host-bridge runtime or bootstrap normalization under this plan.

@@ -0,0 +1,54 @@
# Local-Doc Runtime Roadmap Plan
> Date: 2026-04-19
> Parent Decision: `2026-04-19-residual-runtime-roadmap-prioritization-plan.md`
> Parent Residual Bucket: `local_doc_pipeline`
> Status: Draft
## Plan Intent
Plan the bounded closure path for the five `local_doc_pipeline` residuals selected by the residual runtime roadmap prioritization decision.
## Fixed Input Bucket
Only these scenes are in scope:
1. `sweep-033-scene`
2. `sweep-034-scene`
3. `sweep-042-scene`
4. `sweep-051-scene`
5. `sweep-074-scene`
## Initial Phases
### Phase 0: Freeze Local-Doc Residual Baseline
Capture current generation reports and missing pieces for the five scenes.
### Phase 1: Local-Doc Evidence Inventory
Classify document source, attachment dependency, local service dependency, and output artifact expectation.
### Phase 2: Minimal Local-Doc Contract Design
Define the smallest contract that can distinguish runnable local-doc pipelines from policy-held local-doc pipelines.
### Phase 3: Bounded Implementation Slice
Implement only the contract recovery or fail-closed detail required by the five-scene bucket.
### Phase 4: Follow-Up Sweep And Reconciliation
Rerun only the five target scenes and publish candidates. Do not update the official board inside this phase.
## Forbidden Scope
1. host-bridge runtime roadmap;
2. bootstrap target normalization;
3. G4/G5;
4. full attachment runtime implementation unless explicitly required by the minimal contract;
5. official board update.
## Stop Statement
Stop after the local-doc five-scene follow-up and reconciliation candidates are published. A later official board reconciliation plan must consume the result.

@@ -0,0 +1,50 @@
# Official Board Reconciliation Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Layer: `Layer E`
> Status: Active
## Plan Intent
Update the official execution board from the final coverage status rollup.
## Fixed Inputs
1. `tests/fixtures/generated_scene/final_coverage_status_rollup_2026-04-19.json`
2. `tests/fixtures/generated_scene/promotion_board_reconciliation_policy_2026-04-19.json`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Allowed Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/official_board_reconciliation_2026-04-19.json`
3. `docs/superpowers/reports/2026-04-19-official-board-reconciliation-report.md`
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
## Tasks
1. Load the official execution board.
2. Load the final coverage rollup.
3. Match scenes by `sceneId` where present, falling back to ordered index only if necessary.
4. Preserve frozen workbook fields.
5. Add final framework status fields to each board scene.
6. Update board summary with framework status counts.
7. Publish reconciliation JSON.
8. Publish reconciliation report.
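
Tasks 3 through 6 can be sketched as follows. This is an illustration under assumptions: the struct shapes are hypothetical, "falling back to ordered index" is read as "use the rollup record's own position when no `sceneId` match exists", and only the framework status field is written so that frozen workbook fields stay untouched.

```rust
use std::collections::BTreeMap;

/// Hypothetical board scene; `workbook_name` stands in for frozen workbook fields.
#[derive(Debug, Clone)]
pub struct BoardScene {
    pub scene_id: Option<String>,
    pub workbook_name: String,
    pub framework_status: Option<String>,
}

/// Hypothetical rollup record.
#[derive(Debug, Clone)]
pub struct RollupRecord {
    pub scene_id: Option<String>,
    pub framework_status: String,
}

/// Writes the rollup status onto each board scene and returns status counts.
pub fn reconcile(
    board: &mut [BoardScene],
    rollup: &[RollupRecord],
) -> Result<BTreeMap<String, usize>, String> {
    // Index fallback is only safe when both lists have the same length.
    if board.len() != rollup.len() {
        return Err(format!(
            "board has {} scenes but rollup has {}",
            board.len(),
            rollup.len()
        ));
    }
    for (index, record) in rollup.iter().enumerate() {
        // Prefer a sceneId match; fall back to ordered index only if needed.
        let by_id = record.scene_id.as_deref().and_then(|id| {
            board
                .iter()
                .position(|scene| scene.scene_id.as_deref() == Some(id))
        });
        let target = by_id.unwrap_or(index);
        board[target].framework_status = Some(record.framework_status.clone());
    }
    // Board summary: count scenes per framework status.
    let mut counts = BTreeMap::new();
    for scene in board.iter() {
        if let Some(status) = &scene.framework_status {
            *counts.entry(status.clone()).or_insert(0) += 1;
        }
    }
    Ok(counts)
}
```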
## Completion Criteria
1. Board scene count remains `102`.
2. Framework status counts are `95` framework auto-pass and `7` structured fail-closed.
3. No source-unreadable, unsupported-family, missing-source, or unresolved status remains.
4. Analyzer and generator are not modified by this plan.
5. Reconciliation report is published.
## Stop Statement
Stop after the official board reconciliation JSON and report are published. Do not start runtime-roadmap work under this plan.

@@ -0,0 +1,144 @@
# Post-G7 Boundary Decision Roadmap Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: [2026-04-19-post-g7-boundary-decision-roadmap-design.md](D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-19-post-g7-boundary-decision-roadmap-design.md)
## Plan Intent
This roadmap determines the next bounded step after `G7` has already closed as the first boundary-family executed real sample.
Its only purpose is:
`decide whether G6 or G8 may enter real-sample execution scope next, or whether both remain held pending prerequisites`
## Scope Guardrails
1. do not reopen `G7`
2. do not reopen `G1-E / G2 / G3`
3. do not implement runtime-platform prerequisites under this roadmap
4. do not execute real samples for more than one remaining boundary family
5. do not open `G4 / G5`
## Candidate Directions
The only remaining directions under this roadmap are:
1. `G6`
2. `G8`
3. `prerequisites-only hold`
## Workstreams
1. `WS1` Freeze the Post-G7 Starting State
2. `WS2` Compare G6 and G8 Entry Cost
3. `WS3` Select One Next Direction
4. `WS4` Publish the Next Bounded Slice
## Phase 0: Freeze the Starting State
### Objective
Lock the roadmap start point so the decision cannot drift back into closed work.
### Tasks
1. freeze `G7` as closed executed-pass
2. freeze `G6` and `G8` as the only remaining boundary candidates
3. freeze `G1-E / G2 / G3` as closed
4. freeze `G4 / G5` as out of scope
### Deliverables
1. starting-state note
2. fixed candidate list
### Acceptance Criteria
1. no closed family is reopened under this roadmap
## Phase 1: Compare the Remaining Boundary Candidates
### Objective
Compare `G6` and `G8` using explicit entry cost and prerequisite pressure.
### Tasks
1. restate the current hold condition for `G6`
2. restate the current hold condition for `G8`
3. compare which one requires the smaller new capability to enter real-sample scope
4. compare whether either direction is still too expensive and should remain held
### Deliverables
1. `G6 vs G8` comparison matrix
2. smallest-next-step summary
### Acceptance Criteria
1. the preferred next direction is justified explicitly
2. the non-selected direction has an explicit hold reason
## Phase 2: Select One Next Direction
### Objective
Reduce the post-`G7` ambiguity to one bounded decision.
### Tasks
1. select exactly one direction:
   - `G6`
   - `G8`
   - or `prerequisites-only hold`
2. record why the other directions remain out of scope
### Deliverables
1. post-`G7` boundary decision
2. hold reasons for non-selected directions
### Acceptance Criteria
1. only one next direction is opened
2. the decision is bounded and defensible
## Phase 3: Publish the Next Bounded Slice
### Objective
Turn the decision into the next executable bounded artifact.
### Tasks
1. if `G6` is selected, write a bounded `G6 real-sample entry` design and plan
2. if `G8` is selected, write a bounded `G8 real-sample entry` design and plan
3. if `prerequisites-only hold` is selected, write a bounded prerequisites roadmap
4. publish a roadmap closure report
### Deliverables
1. next bounded `design`
2. next bounded `plan`
3. roadmap closure report
### Acceptance Criteria
1. the next step is ready without extending this roadmap
2. only one bounded direction is emitted
## Completion Criteria
This roadmap is complete when:
1. the post-`G7` next step is reduced to one bounded direction
2. `G6` and `G8` no longer compete ambiguously
3. a single follow-up `design + plan` exists for the selected direction
## Next Step
After this roadmap completes:
1. execute the selected bounded slice
2. do not reopen this roadmap during execution

@@ -0,0 +1,60 @@
# Promotion And Board Reconciliation Policy Plan
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Route: `Route 6: promotion and board reconciliation`
> Parent Layer: `Layer E`
> Upstream Design: `docs/superpowers/specs/2026-04-19-promotion-and-board-reconciliation-policy-design.md`
## Plan Intent
Publish the promotion and reconciliation policy that governs how future stronger statuses may update official scene-state assets.
## Fixed Input Bucket
Policy inputs only:
1. `auto-pass`
2. `fail-closed-known`
3. `adjudicated-valid-host-bridge`
4. hygiene-aware timeout interpretation
## Allowed Files
1. policy design and plan docs
2. policy JSON assets
3. policy reports
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Tasks
1. define promotion thresholds
2. define how timeout hygiene is represented
3. define how structured fail-closed progress is represented
4. define what evidence is sufficient for board reconciliation
5. publish policy assets
## Expected Coverage Delta
No direct scene-count delta is required.
The expected result is policy readiness for later rule-driven reconciliation.
## Completion Criteria
1. promotion thresholds are explicit
2. timeout hygiene representation is explicit
3. board update rules are explicit
## Stop Statement
Stop after the Route 6 policy is published.
Do not update the execution board under this plan.

@@ -0,0 +1,168 @@
# Remaining Route Conflict Correction Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: `docs/superpowers/specs/2026-04-19-remaining-route-conflict-correction-design.md`
## Plan Intent
Adjudicate and, where evidence supports it, correct the remaining `4` route conflicts from the follow-up full sweep.
This is a bounded route-conflict plan, not a new full-sweep roadmap.
## Fixed Input
Use only the `4` `misclassified` records from:
`tests/fixtures/generated_scene/full_sweep_improvement_followup_2026-04-19.json`
The fixed scene set is:
1. `95598报修工单日管控`
2. `95598重要服务事项报备统计表`
3. `台区线损台区月度高负损预测`
4. `配网支撑月报(95598抢修统计报表)`
## Fixed Outputs
1. `tests/fixtures/generated_scene/remaining_route_conflict_decisions_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-remaining-route-conflict-correction-report.md`
## Scope Guardrails
1. do not touch timeout handling
2. do not touch structured fail-closed reporting
3. do not add new families
4. do not update execution board
5. do not promote scenes
6. do not weaken current `G2/G3/G6` pass cases
7. do not force a scene into G2/G3 if host bridge is the only complete path
## Phase 0: Freeze Conflict Set
### Objective
Freeze the `4` route conflicts as the only input.
### Tasks
1. read `full_sweep_improvement_followup_2026-04-19.json`
2. select only `dryRunStatus = misclassified`
3. verify count is `4`
4. freeze expected group and inferred archetype for each record
### Deliverables
1. frozen route conflict inventory
### Acceptance Criteria
1. exactly `4` records are in scope
2. no extra scene is added
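
The freeze step amounts to a filter plus a count check. A minimal sketch, assuming each follow-up record exposes a scene name and a `dryRunStatus` string. `SweepRecord` and `freeze_conflict_set` are hypothetical names, and the real asset would be parsed from JSON rather than built in memory.

```rust
/// Hypothetical, simplified view of one follow-up sweep record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SweepRecord {
    pub scene_name: String,
    pub dry_run_status: String,
}

/// Number of route conflicts this plan is allowed to touch.
pub const EXPECTED_CONFLICTS: usize = 4;

/// Selects only `misclassified` records and fails closed if the count drifts.
pub fn freeze_conflict_set(records: &[SweepRecord]) -> Result<Vec<&SweepRecord>, String> {
    let frozen: Vec<&SweepRecord> = records
        .iter()
        .filter(|record| record.dry_run_status == "misclassified")
        .collect();
    if frozen.len() != EXPECTED_CONFLICTS {
        return Err(format!(
            "expected {} misclassified records, found {}",
            EXPECTED_CONFLICTS,
            frozen.len()
        ));
    }
    Ok(frozen)
}
```

Failing on any count other than `4` enforces both acceptance criteria at once: nothing is missing and no extra scene is added.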
## Phase 1: Evidence Adjudication
### Objective
Decide whether each conflict should be corrected or retained as host bridge.
### Tasks
1. inspect existing generation reports for the `4` records
2. compare business-chain evidence against host-bridge evidence
3. apply the route decision model:
   - `route-corrected-to-g3`
   - `route-corrected-to-g2`
   - `valid-host-bridge-workflow`
   - `board-expectation-stale`
   - `route-conflict-unresolved`
4. write preliminary decisions
### Deliverables
1. preliminary route conflict decision table
### Acceptance Criteria
1. all `4` records have a preliminary decision
2. no code is changed before evidence is adjudicated
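
The route decision model from Phase 1 can be sketched as a closed enum whose labels match the five decision strings. The `needs_analyzer_change` helper encodes an assumption drawn from Phase 2: only the two `route-corrected-*` decisions justify touching analyzer routing. All names are hypothetical.

```rust
/// The five decisions of the route decision model (type name is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RouteDecision {
    RouteCorrectedToG3,
    RouteCorrectedToG2,
    ValidHostBridgeWorkflow,
    BoardExpectationStale,
    RouteConflictUnresolved,
}

impl RouteDecision {
    /// Label as it appears in the plan text.
    pub fn label(self) -> &'static str {
        match self {
            RouteDecision::RouteCorrectedToG3 => "route-corrected-to-g3",
            RouteDecision::RouteCorrectedToG2 => "route-corrected-to-g2",
            RouteDecision::ValidHostBridgeWorkflow => "valid-host-bridge-workflow",
            RouteDecision::BoardExpectationStale => "board-expectation-stale",
            RouteDecision::RouteConflictUnresolved => "route-conflict-unresolved",
        }
    }

    /// Assumption: only corrected routes require an analyzer routing patch;
    /// every other decision leaves the analyzer unchanged.
    pub fn needs_analyzer_change(self) -> bool {
        matches!(
            self,
            RouteDecision::RouteCorrectedToG3 | RouteDecision::RouteCorrectedToG2
        )
    }
}
```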
## Phase 2: Bounded Route Correction
### Objective
Apply only the route corrections justified by Phase 1.
### Tasks
1. update analyzer routing precedence only if evidence supports correction
2. keep valid host-bridge cases unchanged
3. add targeted regression tests for corrected cases
4. preserve existing `G2/G3/G6` real-sample and canonical tests
### Deliverables
1. bounded analyzer routing patch if needed
2. route conflict regression tests
### Acceptance Criteria
1. corrected records no longer misclassify
2. valid host-bridge records remain host bridge
3. no broad routing rewrite is introduced
## Phase 3: Targeted Probe
### Objective
Verify only the fixed `4` records after correction.
### Tasks
1. rerun generation for the same `4` scenes
2. record resulting archetype and readiness
3. classify each final decision
4. write final decision JSON
### Deliverables
1. `remaining_route_conflict_decisions_2026-04-19.json`
### Acceptance Criteria
1. all `4` records have final probe results
2. no full `102` sweep is required by this plan
## Phase 4: Report and Stop
### Objective
Publish the route conflict report and stop.
### Tasks
1. write the route conflict correction report
2. include final decisions for all `4` records
3. list verification commands
4. explicitly state that the execution board is not updated
### Deliverables
1. route conflict correction report
### Acceptance Criteria
1. all `4` conflicts are adjudicated
2. tests pass
3. no execution board update is made
## Completion Criteria
This plan is complete when:
1. the fixed `4` route conflicts have final decisions
2. targeted probes have been run
3. relevant regressions pass
4. decision JSON and report are published
5. execution stops without opening another plan

@@ -0,0 +1,40 @@
# Residual 13 Follow-Up Sweep And Reconciliation Plan
> Date: 2026-04-19
> Status: Draft
> Parent Plan: `docs/superpowers/plans/2026-04-19-structured-fail-closed-residual-13-closure-plan.md`
> Parent Route: `Residual Route E`
> Parent Layer: `Layer E`
## Plan Intent
Measure the cumulative delta after residual Routes A through D complete.
## Fixed Input Bucket
The fixed input bucket is the same `13` residual scenes from the parent residual closure plan.
## Allowed Files
1. residual follow-up JSON asset
2. residual reconciliation candidate JSON asset
3. residual follow-up report
4. residual reconciliation report
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Tasks
1. rerun the fixed 13 residual scenes;
2. classify raw statuses;
3. apply promotion policy;
4. report remaining residual count.
## Stop Statement
Stop after the residual follow-up and reconciliation reports are published.

@@ -0,0 +1,49 @@
# Residual Runtime Roadmap Prioritization Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Layer: `Layer E`
> Status: Active
## Plan Intent
Select the next roadmap from the three residual candidates (local-doc runtime, host-bridge runtime, and bootstrap target normalization) after official board reconciliation.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/official_board_reconciliation_2026-04-19.json`
## Allowed Files
1. `tests/fixtures/generated_scene/residual_runtime_roadmap_prioritization_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-residual-runtime-roadmap-prioritization-report.md`
3. selected next roadmap design
4. selected next roadmap plan
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Tasks
1. Load official board residual records.
2. Group residuals by next action.
3. Score local-doc runtime, host-bridge runtime, and bootstrap target normalization.
4. Select exactly one next roadmap.
5. Publish prioritization JSON.
6. Publish prioritization report.
7. Create design/plan for the selected roadmap only.
## Completion Criteria
1. All `7` residual records are represented.
2. Exactly one selected roadmap exists.
3. Non-selected roadmaps are deferred with reasons.
4. No implementation file is modified.
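
The selection rule in the tasks and completion criteria above can be sketched as follows. This is a minimal illustration, not the plan's actual scoring method: the function name `select_roadmap`, the numeric scores, and the result keys are all assumptions. It only shows the invariant that exactly one roadmap is selected and every other candidate is deferred with a reason.

```python
def select_roadmap(scores, deferral_reasons):
    """Pick exactly one roadmap; every other candidate must carry a deferral reason.

    `scores` maps roadmap name -> numeric score (hypothetical scale).
    `deferral_reasons` maps roadmap name -> reason text for non-selected roadmaps.
    """
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    if len(winners) != 1:
        # A tie would violate "exactly one selected roadmap".
        raise ValueError(f"ambiguous selection: {winners}")
    selected = winners[0]
    deferred = {}
    for name in scores:
        if name == selected:
            continue
        reason = deferral_reasons.get(name)
        if not reason:
            raise ValueError(f"missing deferral reason for {name}")
        deferred[name] = reason
    return {"selected": selected, "deferred": deferred}
```

Raising on a tie keeps the decision explicit: the prioritization report has to break the tie by hand instead of silently picking the first candidate.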
## Stop Statement
Stop after prioritization assets and the selected next roadmap design/plan are published. Do not execute the selected roadmap under this plan.

@@ -0,0 +1,129 @@
# Scene Skill 102 Final Materialization Plan
> Date: 2026-04-19
> Parent Framework: `2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Parent Layer: final asset materialization before validation
> Status: Draft
## Plan Intent
Generate and freeze a single canonical `102` skill package set for later static, mock, and production-like validation.
This plan answers whether all 102 scenes have materialized skill assets, not just framework auto-pass status.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/scene_skill_102_framework_closure_rollup_2026-04-19.json`
3. scene source root: `D:/desk/智能体资料/全量业务场景/一平台场景`
## Output Root
`examples/scene_skill_102_final_materialization_2026-04-19`
## Allowed Files
1. `examples/scene_skill_102_final_materialization_2026-04-19/**`
2. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
3. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json`
4. `docs/superpowers/reports/2026-04-19-scene-skill-102-final-materialization-report.md`
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
5. existing `examples/*` follow-up roots outside the output root
## Phase 0: Freeze Materialization Boundary
### Tasks
1. Confirm framework rollup is `102 / 102`.
2. Confirm materialization does not delete existing `examples/*`.
3. Confirm this plan does not perform static/mock/production validation.
### Acceptance Criteria
1. Scope is materialization only.
2. Output root is isolated.
## Phase 1: Build Materialization Input Manifest
### Tasks
1. Load official board or fallback source-list assets.
2. Produce exactly 102 materialization input rows.
3. Validate unique scene ids.
4. Resolve source directory for each scene.
5. Sanitize manifest-only string fields for control characters.
### Acceptance Criteria
1. Input manifest has 102 rows.
2. No missing source directory remains.
3. No duplicate scene id remains.
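
The Phase 1 checks could be expressed roughly as below. This is a sketch under assumptions: the row keys `sceneId` and `sourceDir` mirror the placeholders used in the Phase 2 command, but the real manifest schema is not defined in this plan, and `validate_manifest`, `sanitize`, and the injected `dir_exists` callback are hypothetical helpers.

```python
import unicodedata

EXPECTED_ROWS = 102  # row count required by the acceptance criteria


def sanitize(value: str) -> str:
    """Drop control characters (Unicode category Cc) from a manifest-only string."""
    return "".join(ch for ch in value if unicodedata.category(ch) != "Cc")


def validate_manifest(rows, dir_exists, expected=EXPECTED_ROWS):
    """Return a list of problems; an empty list means the criteria hold.

    `dir_exists` is a callable such as os.path.isdir, injected so the
    check can be exercised without touching the real source root.
    """
    problems = []
    if len(rows) != expected:
        problems.append(f"expected {expected} rows, found {len(rows)}")
    seen = set()
    for row in rows:
        scene_id = row["sceneId"]
        if scene_id in seen:
            problems.append(f"duplicate scene id: {scene_id}")
        seen.add(scene_id)
        if not dir_exists(row["sourceDir"]):
            problems.append(f"missing source directory for {scene_id}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a single pass report every manifest defect at once.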
## Phase 2: Generate 102 Skill Packages
### Tasks
For each manifest row, run:
```powershell
cargo run --bin sg_scene_generate -- `
--source-dir "<sourceDir>" `
--scene-id "<sceneId>" `
--scene-name "<sceneName>" `
--scene-kind report_collection `
--output-root "D:/data/ideaSpace/rust/sgClaw/claw-new/examples/scene_skill_102_final_materialization_2026-04-19"
```
### Acceptance Criteria
1. Every row is attempted.
2. No single scene failure stops the full batch.
3. stdout/stderr/result status are captured.
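
One way to satisfy these criteria is a batch loop that records every outcome and never aborts. The sketch below is an assumption about how such a driver might look, not the project's actual tooling: `run_generator`, `run_batch`, and the result keys (`exitCode`, `status`, and so on) are hypothetical names.

```python
import subprocess


def run_generator(command):
    """Run one generator command and capture its exit code and output."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return {"exitCode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


def run_batch(rows, runner):
    """Attempt every row; a failing scene is recorded, never allowed to stop the batch.

    `runner` takes a manifest row and returns a dict shaped like the
    result of run_generator. It may raise; the error is captured.
    """
    results = []
    for row in rows:
        record = {"sceneId": row["sceneId"]}
        try:
            record.update(runner(row))
            record["status"] = "generated" if record["exitCode"] == 0 else "failed"
        except Exception as err:  # e.g. the executable could not be started
            record.update({"status": "failed", "exitCode": None,
                           "stdout": "", "stderr": str(err)})
        results.append(record)
    return results
```

In practice `runner` would build the `cargo run --bin sg_scene_generate` argument list from the row and pass it to `run_generator`; it is injected here so the loop can be tested without invoking the generator.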
## Phase 3: Verify Materialized Package Presence
### Tasks
For each scene, check:
1. `SKILL.toml`
2. `SKILL.md`
3. `scene.toml`
4. `references/generation-report.json`
5. at least one script under `scripts/`
### Acceptance Criteria
1. All successful rows have required files.
2. Failures are explicit in the failures asset.
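
The presence check for a single package directory might look like this. It is a minimal sketch: the file list comes from the tasks above, while the function name `missing_package_items` and the `"scripts/*"` marker are assumptions.

```python
from pathlib import Path

# Files every materialized package must contain, per the Phase 3 task list.
REQUIRED_FILES = [
    "SKILL.toml",
    "SKILL.md",
    "scene.toml",
    "references/generation-report.json",
]


def missing_package_items(package_dir):
    """Return the required items that are absent; an empty list means complete."""
    root = Path(package_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    scripts = root / "scripts"
    has_script = scripts.is_dir() and any(p.is_file() for p in scripts.iterdir())
    if not has_script:
        missing.append("scripts/*")  # marker for "no script found"
    return missing
```

A non-empty result for a row that Phase 2 reported as generated is what would be written to the failures asset.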
## Phase 4: Publish Manifest And Report
### Tasks
1. Publish final materialization manifest.
2. Publish final materialization failures.
3. Publish superpowers report.
### Acceptance Criteria
1. Manifest row count is 102.
2. Report states generated count, failure count, readiness distribution, and next validation input.
3. The report explicitly states that old `examples/*` roots were not cleaned.
## Expected Delta
No framework coverage delta. Expected asset delta is:
1. `102` canonical final skill package rows;
2. one stable manifest for later validation.
## Stop Statement
Stop after final materialization manifest, failures asset, and report are published. Do not start static, mock, or production validation under this plan.

@@ -0,0 +1,132 @@
# Scene Skill 102 Full Coverage Child Plan Sequence Plan
> Date: 2026-04-19
> Status: Draft
> Parent Framework Plan: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Upstream Design: `docs/superpowers/specs/2026-04-19-scene-skill-102-full-coverage-child-plan-sequence-design.md`
## Plan Intent
Create the full bounded child-plan sequence for `Route 2` through `Route 6` under the `102` full-coverage parent framework.
This plan only creates the downstream plan tree. It does not implement any bucket directly.
## Scope Guardrails
1. do not modify `analyzer.rs`
2. do not modify `generator.rs`
3. do not modify `ir.rs`
4. do not update `scene_execution_board_2026-04-18.json`
5. do not rerun `102` sweep
6. do not open new families
7. do not collapse multiple buckets into one child implementation plan
## Workstreams
1. `WS1` Route 2 child plans
2. `WS2` Route 3 child plans
3. `WS3` Route 4 child plans
4. `WS4` Route 5 child plans
5. `WS5` Route 6 child plans
## Phase 0: Freeze Sequence Inputs
### Objective
Freeze the parent baseline and route order before generating child plans.
### Tasks
1. freeze parent framework references
2. freeze current bucket sizes
3. freeze route order from Route 2 through Route 6
### Deliverables
1. child-plan sequence design
2. child-plan sequence plan
### Acceptance Criteria
1. all later child plans can reference the same parent baseline
2. route order is explicit and cannot drift
## Phase 1: Route 2 Child Plans
### Objective
Create the first three bounded child plans under the largest remaining mainline bucket.
### Tasks
1. create `G3 enrichment-request closure` design and plan
2. create `G3 export-plan closure` design and plan
3. create `G3 residual contract closure` design and plan
### Deliverables
1. Route 2 child designs
2. Route 2 child plans
### Acceptance Criteria
1. each Route 2 child plan owns a narrower fixed bucket
2. Route 2 plans declare allowed and forbidden file sets
3. Route 2 plans declare expected deltas separately
## Phase 2: Route 3 and Route 4 Child Plans
### Objective
Create the bounded plans for the smaller remaining mainline buckets.
### Tasks
1. create `G2 remaining fail-closed closure` design and plan
2. create `G1-E remaining fail-closed closure` design and plan
### Deliverables
1. Route 3 child design and plan
2. Route 4 child design and plan
### Acceptance Criteria
1. Route 3 and Route 4 remain downstream of Route 2
2. neither plan absorbs Route 2 issues
## Phase 3: Route 5 and Route 6 Child Plans
### Objective
Create the policy and decision plans that follow mainline contract-recovery work.
### Tasks
1. create `boundary fail-closed decision` design and plan
2. create `promotion and board reconciliation policy` design and plan
### Deliverables
1. Route 5 child design and plan
2. Route 6 child design and plan
### Acceptance Criteria
1. Route 5 is decision-first, not implementation-first
2. Route 6 is policy-only
## Completion Criteria
This plan is complete when:
1. Route 2 through Route 6 all have bounded child designs and plans
2. every child plan declares parent route, parent layer, input bucket, allowed files, forbidden files, expected delta, and stop statement
3. later work can proceed without inventing new unanchored micro-plans
## Stop Statement
Stop after the bounded child-plan sequence for Route 2 through Route 6 has been created.
Do not implement any route from this sequence under this plan.

@@ -0,0 +1,298 @@
# Scene Skill 102 Full Coverage Framework Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-19-scene-skill-102-full-coverage-framework-design.md`
## Plan Intent
Turn the current sgClaw post-roadmap work into a single controlled framework for driving the `102` scene set toward full bounded coverage.
This plan is the parent roadmap for all later bounded plans. Future bounded plans must fit inside one of the routes defined here.
## Current Baseline
Current integrated baseline:
| Status | Count |
| --- | ---: |
| `auto-pass` | 48 |
| `fail-closed-known` | 47 |
| `adjudicated-valid-host-bridge` | 4 |
| raw `source-unreadable` | 3 |
| Total | 102 |
Timeout hygiene overlay:
| Hygiene interpretation | Count |
| --- | ---: |
| `timeout-as-pass-candidate` | 2 |
| `timeout-as-fail-closed-candidate` | 1 |
| `timeout-still-unreadable` | 0 |
| `timeout-rerun-error` | 0 |
## Overall Goal
The overall goal is:
`100% bounded framework coverage for the current 102 scene set`
This means:
1. every scene is covered by a supported framework path
2. every non-pass scene has a structured and named reason
3. no unresolved timeout, unsupported-family, or route-conflict bucket remains
It does not require `100% auto-pass`.
## Scope Guardrails
1. do not start `G4/G5`
2. do not add new families unless this parent framework is updated first
3. do not treat diagnostics as promotions
4. do not update `scene_execution_board_2026-04-18.json` inside diagnostic or bounded recovery plans
5. do not mix timeout policy work with contract recovery work in the same bounded implementation plan
6. do not create semantics-only micro-plans that are not tied to one of the routes below
## Workstreams
1. `WS1` Coverage and Reporting Integrity
2. `WS2` Mainline Contract Closure
3. `WS3` Boundary Bucket Handling
4. `WS4` Promotion and Board Reconciliation
## Phase 0: Freeze the Parent Framework
### Objective
Make this plan the single parent framework for the next improvement cycle.
### Tasks
1. freeze the current integrated baseline
2. freeze the five framework layers
3. freeze the route order
4. forbid out-of-framework micro-plan drift
### Deliverables
1. parent framework design
2. parent framework plan
### Acceptance Criteria
1. future bounded plans can be mapped to one framework layer
2. future bounded plans can be mapped to one route
## Phase 1: Close Reporting Integrity
### Objective
Finish the reporting-side work so the `102` scene set is measured correctly before further implementation.
### Route
`Route 1: Layer E hygiene integration`
### Tasks
1. preserve raw timeout counts
2. preserve hygiene-aware timeout interpretation
3. preserve route adjudication
4. preserve structured fail-closed buckets
5. produce reconciliation-friendly current-state reporting
### Deliverables
1. timeout hygiene integration assets
2. reconciliation-friendly integrated reporting
### Acceptance Criteria
1. no unresolved timeout interpretation remains
2. no unresolved route conflict remains
## Phase 2: Mainline G3 Contract Closure
### Objective
Reduce the largest remaining fail-closed bucket in a controlled way.
### Route
`Route 2: G3 / paginated_enrichment`
### Tasks
1. freeze the current `G3` fail-closed subgrouping
2. select the top repeated recoverable pattern
3. implement bounded contract recovery
4. rerun only the bounded validation needed by that slice
5. measure delta against the parent baseline
### Deliverables
1. bounded G3 implementation plan(s)
2. bounded G3 implementation report(s)
3. updated coverage delta assets
### Acceptance Criteria
1. no scene-name hardcoding
2. no gate relaxation
3. canonical `G3` and real-sample `G3` remain stable
## Phase 3: Mainline G2 Closure
### Objective
Reduce the remaining `multi_mode_request` fail-closed bucket.
### Route
`Route 3: G2 / multi_mode_request`
### Tasks
1. freeze the current `4` G2 fail-closed records
2. identify the common missing contract
3. implement one bounded G2 correction slice
4. rerun bounded validation
### Deliverables
1. bounded G2 implementation plan(s)
2. bounded G2 implementation report(s)
### Acceptance Criteria
1. real-sample `G2` pass remains stable
2. no route drift into host-bridge or other families
## Phase 4: Mainline G1-E Closure
### Objective
Reduce the remaining `single_request_enrichment` fail-closed bucket.
### Route
`Route 4: G1-E / single_request_enrichment`
### Tasks
1. freeze the current `2` G1-E fail-closed records
2. identify the common missing contract
3. implement one bounded G1-E correction slice
4. rerun bounded validation
### Deliverables
1. bounded G1-E implementation plan(s)
2. bounded G1-E implementation report(s)
### Acceptance Criteria
1. real-sample `G1-E` pass remains stable
2. no route drift into host-bridge or page-state families
## Phase 5: Boundary Buckets After Mainline
### Objective
Touch boundary-family fail-closed buckets only after the mainline buckets have been reduced or explicitly deferred.
### Route
`Route 5: local_doc_pipeline and host_bridge_workflow remaining fail-closed`
### Tasks
1. inspect the `5` local-doc records
2. inspect the `1` host-bridge fail-closed record
3. decide whether to defer or open one bounded boundary correction slice
### Deliverables
1. boundary bucket decision report
2. optional bounded boundary plan
### Acceptance Criteria
1. no boundary slice starts before mainline routes are resolved or deferred
## Phase 6: Promotion and Board Policy
### Objective
Define how stronger framework-resolved statuses may flow back into official scene status assets.
### Route
`Route 6: promotion and board reconciliation`
### Tasks
1. define promotion thresholds
2. define how hygiene-aware timeout results are represented
3. define how structured fail-closed progress is represented
4. define what can and cannot update the execution board
### Deliverables
1. promotion policy design
2. execution-board reconciliation plan
### Acceptance Criteria
1. diagnostics remain distinct from promotion
2. execution board updates become rule-driven instead of ad hoc
## Route Order
The route order is fixed:
1. finish reporting integrity
2. reduce `G3` fail-closed bucket
3. reduce `G2` fail-closed bucket
4. reduce `G1-E` fail-closed bucket
5. inspect boundary fail-closed buckets
6. define promotion and board reconciliation policy
No bounded plan may skip ahead in this order unless this parent plan is revised first.
## Required Contents for Future Bounded Plans
Every future bounded plan must include:
1. parent route reference
2. parent framework layer
3. fixed input bucket
4. exact files allowed to change
5. files that must not change
6. expected coverage delta
7. stop statement
If one of these is missing, the bounded plan is not valid under this framework.
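
The validity rule above lends itself to a mechanical check. The sketch below assumes a bounded plan has already been parsed into a dictionary; the key names and the function `missing_sections` are hypothetical, since this framework does not define a machine-readable plan format.

```python
# One key per required item in the list above (key names are illustrative).
REQUIRED_SECTIONS = (
    "parent_route",
    "framework_layer",
    "fixed_input_bucket",
    "allowed_files",
    "forbidden_files",
    "expected_delta",
    "stop_statement",
)


def missing_sections(plan: dict) -> list:
    """Return required sections that are absent or empty; [] means the plan is valid."""
    return [key for key in REQUIRED_SECTIONS if not plan.get(key)]
```

Treating an empty value the same as a missing key prevents a plan from passing with a placeholder heading and no content.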
## Completion Criteria
This parent framework remains active until all of the following are true:
1. `unsupported-family = 0`
2. `missing-source = 0`
3. `misclassified-unresolved = 0`
4. `timeout-still-unreadable = 0`
5. every remaining non-pass scene is either:
   - structured fail-closed
   - adjudicated valid host-bridge
   - policy-recognized timeout rerun hygiene result
6. board reconciliation policy exists
## Stop Statement
This is a parent framework plan.
Do not implement code directly from this plan.
All implementation must happen through later bounded plans that explicitly declare which route and which layer they belong to.

@@ -0,0 +1,263 @@
# Structured Fail-Closed Improvement Roadmap Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Spec: `docs/superpowers/specs/2026-04-19-structured-fail-closed-improvement-roadmap-design.md`
> Upstream Reconciliation: `tests/fixtures/generated_scene/full_sweep_status_reconciliation_2026-04-19.json`
## Plan Intent
Coordinate the next improvement cycle for the `48` structured fail-closed records from the reconciled `102` sweep.
This is a roadmap-level plan. It intentionally starts with inventory and gap taxonomy before any implementation correction.
## Baseline
Current reconciled `102` status:
| Status | Count |
| --- | ---: |
| `auto-pass` | 48 |
| `fail-closed-known` | 48 |
| `adjudicated-valid-host-bridge` | 4 |
| `source-unreadable` | 2 |
Fail-closed distribution:
| Inferred archetype | Count |
| --- | ---: |
| `paginated_enrichment` | 35 |
| `local_doc_pipeline` | 5 |
| `multi_mode_request` | 4 |
| `single_request_enrichment` | 2 |
| `host_bridge_workflow` | 1 |
| `page_state_eval` | 1 |
## Scope Guardrails
1. do not add new scene families
2. do not start `G4/G5`
3. do not implement login recovery
4. do not implement full host runtime transport
5. do not implement local document attachment runtime
6. do not update `scene_execution_board_2026-04-18.json`
7. do not promote scenes directly from dry-run or follow-up results
8. do not reopen `adjudicated-valid-host-bridge` records
9. do not handle the `2` timeout records in this roadmap
10. do not loosen readiness gates to increase pass count
## Workstreams
1. `WS1` Fail-Closed Inventory and Gap Taxonomy
2. `WS2` G3 Paginated Enrichment Recovery
3. `WS3` Small-Bucket Recovery
4. `WS4` Bootstrap Isolation
5. `WS5` Follow-Up Sweep and Reporting
## Phase 0: Freeze Structured Fail-Closed Baseline
### Objective
Freeze the `48` fail-closed records as the only implementation-analysis input.
### Tasks
1. read `full_sweep_status_reconciliation_2026-04-19.json`
2. verify total scene count is `102`
3. verify `fail-closed-known = 48`
4. verify `adjudicated-valid-host-bridge = 4`
5. verify `source-unreadable = 2`
6. extract only records with `reconciledStatus = fail-closed-known`
### Deliverables
1. frozen fail-closed input list
2. baseline validation summary
### Acceptance Criteria
1. exactly `48` records enter this roadmap
2. route-adjudicated records are excluded
3. timeout records are excluded
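
The Phase 0 verification and extraction could be sketched as follows. The field name `reconciledStatus` and the expected counts come from this plan; the function name `freeze_fail_closed` and the assumption that the reconciliation JSON has already been loaded into a flat list of records are illustrative.

```python
from collections import Counter

# Expected counts taken from the baseline table in this plan.
EXPECTED_COUNTS = {
    "fail-closed-known": 48,
    "adjudicated-valid-host-bridge": 4,
    "source-unreadable": 2,
}


def freeze_fail_closed(records, total=102, expected=EXPECTED_COUNTS):
    """Verify the baseline counts, then return only the fail-closed records."""
    if len(records) != total:
        raise ValueError(f"expected {total} scenes, found {len(records)}")
    counts = Counter(record["reconciledStatus"] for record in records)
    for status, want in expected.items():
        if counts[status] != want:
            raise ValueError(f"{status}: expected {want}, found {counts[status]}")
    return [r for r in records if r["reconciledStatus"] == "fail-closed-known"]
```

Because the filter keeps only `fail-closed-known`, route-adjudicated and timeout records are excluded by construction.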
## Phase 1: Build Fail-Closed Inventory and Gap Taxonomy
### Objective
Split the `48` records into actionable missing-contract buckets.
### Tasks
1. inspect each fail-closed record
2. assign exactly one primary missing-contract label:
   - `main_request_missing`
   - `pagination_plan_missing`
   - `enrichment_request_missing`
   - `join_key_missing`
   - `export_plan_missing`
   - `mode_matrix_missing`
   - `mode_request_contract_missing`
   - `single_request_enrichment_contract_missing`
   - `host_bridge_contract_missing`
   - `local_doc_contract_missing`
   - `bootstrap_target_unresolved`
   - `mixed_or_ambiguous_contract_gap`
3. attach secondary labels when useful
4. group by inferred archetype and primary label
5. identify top repeated recoverable patterns
### Deliverables
1. `tests/fixtures/generated_scene/structured_fail_closed_inventory_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-structured-fail-closed-inventory-report.md`
### Acceptance Criteria
1. all `48` records have exactly one primary label
2. the `35` `paginated_enrichment` records are explicitly split
3. no implementation is performed in this phase
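
Grouping the inventory by archetype and primary label might be sketched like this. The label set is copied from the task list above; the record keys `inferredArchetype` and `primaryLabel` and the function name `group_inventory` are assumptions, because the inventory JSON schema is not specified here.

```python
from collections import Counter

# The twelve primary missing-contract labels listed in the Phase 1 tasks.
PRIMARY_LABELS = {
    "main_request_missing",
    "pagination_plan_missing",
    "enrichment_request_missing",
    "join_key_missing",
    "export_plan_missing",
    "mode_matrix_missing",
    "mode_request_contract_missing",
    "single_request_enrichment_contract_missing",
    "host_bridge_contract_missing",
    "local_doc_contract_missing",
    "bootstrap_target_unresolved",
    "mixed_or_ambiguous_contract_gap",
}


def group_inventory(records):
    """Count records per (archetype, primary label), largest group first."""
    groups = Counter()
    for record in records:
        label = record["primaryLabel"]  # a single string field: exactly one label
        if label not in PRIMARY_LABELS:
            raise ValueError(f"unknown primary label: {label}")
        groups[(record["inferredArchetype"], label)] += 1
    return groups.most_common()
```

The head of the returned list is a candidate for the top repeated pattern that Phase 2 would consider; whether it is actually recoverable still needs manual judgment.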
## Phase 2: G3 Paginated Enrichment Recovery Slice
### Objective
Improve the largest bucket only when Phase 1 identifies repeated recoverable G3 patterns.
### Tasks
1. select only `paginated_enrichment` records from the inventory
2. prioritize repeated primary labels in this order:
   - `main_request_missing`
   - `pagination_plan_missing`
   - `enrichment_request_missing`
   - `join_key_missing`
   - `export_plan_missing`
3. define bounded recovery rules for the top repeated pattern
4. implement only traceable evidence recovery
5. add regression tests for the recovered pattern
6. preserve canonical `G3` and real-sample `G3` pass
### Deliverables
1. G3 recovery implementation if evidence supports it
2. regression tests for the recovered pattern
3. G3 recovery report
### Acceptance Criteria
1. no scene-name hardcoding
2. no gate relaxation
3. recovered fields are traceable to source evidence
4. existing `G3` canonical and real-sample tests pass
## Phase 3: Small-Bucket Recovery Slice
### Objective
Handle smaller buckets only after the G3 slice is complete or explicitly deferred.
### Tasks
1. inspect `local_doc_pipeline = 5`
2. inspect `multi_mode_request = 4`
3. inspect `single_request_enrichment = 2`
4. inspect `host_bridge_workflow = 1`
5. choose at most one bounded non-G3 recovery slice
6. preserve existing real-sample passes for `G1-E`, `G2`, `G6`, `G7`
### Deliverables
1. small-bucket recovery decision report
2. optional bounded implementation and tests
### Acceptance Criteria
1. only one small-bucket slice is implemented in this roadmap
2. no `G8` attachment/local document runtime is started
3. no full host runtime transport is started
## Phase 4: Bootstrap Target Isolation
### Objective
Keep the single `page_state_eval + bootstrap_target` record separate.
### Tasks
1. identify the bootstrap target record
2. preserve it as a separate future input
3. do not implement login recovery
4. produce bootstrap isolation note
### Deliverables
1. bootstrap isolation note
### Acceptance Criteria
1. bootstrap target does not pollute G3 or small-bucket recovery
2. no login or bootstrap auto-recovery is implemented
## Phase 5: Follow-Up Sweep and Coverage Delta
### Objective
Measure the impact of bounded recovery work.
### Tasks
1. rerun the fixed `102` scene sweep
2. produce a new follow-up result
3. compare against the reconciled baseline:
   - auto-pass delta
   - fail-closed-known delta
   - actionable coverage delta
   - timeout count
   - adjudicated host-bridge count
4. publish coverage delta report
### Deliverables
1. `tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-structured-fail-closed-improvement-coverage-delta-report.md`
3. `docs/superpowers/reports/2026-04-19-structured-fail-closed-improvement-roadmap-closure-report.md`
### Acceptance Criteria
1. scene set remains exactly `102`
2. improvements are measured, not assumed
3. execution board remains unchanged
4. the fail-closed count drops only when contracts close or records are isolated into a more specific bucket
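
The comparison against the reconciled baseline can be sketched as a per-status delta over count tables. The function name `status_delta` and the dictionary shape are assumptions; the only fact taken from this plan is that both sweeps must cover exactly `102` scenes.

```python
def status_delta(baseline, followup, expected_total=102):
    """Return follow-up minus baseline per status, after checking both totals.

    `baseline` and `followup` map status name -> scene count.
    """
    for name, counts in (("baseline", baseline), ("follow-up", followup)):
        total = sum(counts.values())
        if total != expected_total:
            raise ValueError(f"{name} covers {total} scenes, expected {expected_total}")
    statuses = sorted(set(baseline) | set(followup))
    return {s: followup.get(s, 0) - baseline.get(s, 0) for s in statuses}
```

Since both totals are pinned to the same number, the deltas always sum to zero, which gives the coverage delta report a quick arithmetic sanity check.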
## Milestone Order
The order is fixed:
1. Phase 0: freeze fail-closed baseline
2. Phase 1: build inventory and taxonomy
3. Phase 2: G3 recovery slice
4. Phase 3: small-bucket recovery slice
5. Phase 4: bootstrap target isolation
6. Phase 5: follow-up sweep and delta
Do not start implementation before Phase 1 is complete.
Do not start small-bucket recovery before the G3 slice is completed or explicitly deferred with reasons.
## Completion Criteria
This roadmap is complete when:
1. all `48` structured fail-closed records are inventoried and labeled
2. the `35` G3 records are split into actionable contract-gap groups
3. at least the highest-value repeated recoverable pattern is either implemented or explicitly deferred
4. small buckets are inspected and at most one bounded slice is implemented
5. the bootstrap target remains isolated
6. a follow-up sweep quantifies coverage delta
7. no new family is introduced
## Stop Statement
Stop after the follow-up sweep, delta report, and closure report.
Do not automatically update the execution board or start another roadmap inside this plan.

@@ -0,0 +1,150 @@
# Structured Fail-Closed Residual 13 Closure Plan
> Date: 2026-04-19
> Status: Draft
> Parent Framework: `docs/superpowers/plans/2026-04-19-scene-skill-102-full-coverage-framework-plan.md`
> Upstream Design: `docs/superpowers/specs/2026-04-19-structured-fail-closed-residual-13-closure-design.md`
> Fixed Input: `tests/fixtures/generated_scene/full_coverage_reconciliation_candidates_2026-04-19.json`
## Plan Intent
Turn the remaining `13` `framework-structured-fail-closed` scenes into a controlled residual closure sequence.
This plan is a coordinator plan. It does not directly implement code. Implementation must happen only in bounded child plans declared below.
## Fixed Input Bucket
The fixed input bucket is the `13` scenes with:
`reconciliationCandidateStatus = framework-structured-fail-closed`
from:
`tests/fixtures/generated_scene/full_coverage_reconciliation_candidates_2026-04-19.json`
## Residual Routes
### Residual Route A: G3 Residual Closure
Fixed input:
1. `sweep-007-scene` / `95598供电服务月报`
2. `sweep-039-scene` / `故障报修工单信息统计表`
3. `sweep-068-scene` / `输变电设备运行分析报告`
4. `sweep-084-scene` / `巡视计划完成情况自动检索`
Expected child plan:
`2026-04-19-g3-residual-4-workflow-evidence-closure-plan.md`
Allowed implementation area:
1. G3 workflow evidence recovery.
2. G3 contract assembly.
3. bounded G3 route-local validation.
Forbidden:
1. G8 runtime.
2. G6 host bridge runtime.
3. new family creation.
### Residual Route B: G2 Residual Closure
Fixed input:
1. `sweep-018-scene` / `白银线损周报`
2. `sweep-071-scene` / `台区线损大数据-月_周累计线损率统计分析`
Expected child plan:
`2026-04-19-g2-residual-2-readiness-closure-plan.md`
Allowed implementation area:
1. G2 readiness interpretation.
2. G2 mode/request/response contract correction.
3. bounded G2 route-local validation.
Forbidden:
1. changing G2 real-sample pass semantics;
2. adding a new G2 variant family;
3. route drift into host bridge.
### Residual Route C: Boundary Residual Decision
Fixed input:
1. `sweep-033-scene` / `供电可靠率指标统计表`
2. `sweep-034-scene` / `供电可靠性数据质量自查报告月报`
3. `sweep-042-scene` / `国网金昌供电公司营商环境周例会报告`
4. `sweep-051-scene` / `嘉峪关可靠性分析报告`
5. `sweep-074-scene` / `同兴智能安全督查日报`
6. `sweep-085-scene` / `业扩报装管理制度`
Expected child plan:
`2026-04-19-boundary-residual-hold-decision-plan.md`
Allowed action:
1. decision-only hold/defer classification.
2. no implementation.
### Residual Route D: Bootstrap Residual Isolation
Fixed input:
1. `sweep-091-scene` / `用户停电频次分析监测`
Expected child plan:
`2026-04-19-bootstrap-target-residual-isolation-plan.md`
Allowed action:
1. bootstrap target isolation.
2. no login recovery implementation.
### Residual Route E: Residual Follow-Up Reconciliation
Expected child plan:
`2026-04-19-residual-13-followup-sweep-and-reconciliation-plan.md`
Allowed action:
1. route-local or fixed 13-scene follow-up sweep.
2. reconciliation candidate refresh.
3. no official board update.
## Phase Order
1. Run Residual Route A.
2. Run Residual Route B.
3. Run Residual Route C.
4. Run Residual Route D.
5. Run Residual Route E.
Do not skip to Route E before Routes A through D are complete.
## Deliverables
1. residual 13 design.
2. residual 13 coordinator plan.
3. child bounded plans for Routes A through E.
## Completion Criteria
1. the 13 residual scenes are fully assigned to residual routes;
2. every residual route has an expected child plan name;
3. mainline residuals are separated from boundary/bootstrap residuals;
4. no implementation is performed directly by this coordinator plan.
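
The first completion criterion (all 13 scenes assigned, none assigned twice) can be checked mechanically. In the sketch below the scene ids are copied from Residual Routes A through D above; the function name `check_assignment` and the single-letter route keys are illustrative.

```python
# Scene ids per residual route, as listed in this plan.
ROUTES = {
    "A": ["sweep-007-scene", "sweep-039-scene", "sweep-068-scene", "sweep-084-scene"],
    "B": ["sweep-018-scene", "sweep-071-scene"],
    "C": ["sweep-033-scene", "sweep-034-scene", "sweep-042-scene",
          "sweep-051-scene", "sweep-074-scene", "sweep-085-scene"],
    "D": ["sweep-091-scene"],
}


def check_assignment(routes, expected_total=13):
    """Return scene id -> route, failing on duplicates or a wrong total."""
    owner = {}
    for route, scene_ids in routes.items():
        for scene_id in scene_ids:
            if scene_id in owner:
                raise ValueError(f"{scene_id} assigned to {owner[scene_id]} and {route}")
            owner[scene_id] = route
    if len(owner) != expected_total:
        raise ValueError(f"expected {expected_total} scenes, found {len(owner)}")
    return owner
```

Route E is omitted because it has no fixed scene list of its own; it reruns the scenes already owned by Routes A through D.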
## Stop Statement
Stop after publishing this coordinator plan and its child plan skeletons.
Do not modify implementation files under this coordinator plan.

@@ -0,0 +1,144 @@
# Timeout Budget and Rerun Hygiene Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-19-timeout-budget-rerun-hygiene-design.md`
> Upstream Diagnostic: `docs/superpowers/reports/2026-04-19-timeout-regression-diagnostic-report.md`
## Plan Intent
Create a bounded timeout-budget and rerun-hygiene layer so budget-sensitive scenes are not collapsed into a single `source-unreadable` bucket.
This plan is classification and reporting only. It does not change analyzer or generator code.
## Scope Guardrails
1. do not modify `src/generated_scene/analyzer.rs`
2. do not modify `src/generated_scene/generator.rs`
3. do not update `scene_execution_board_2026-04-18.json`
4. do not promote scenes
5. do not rerun the full `102` sweep
6. do not treat rerun success as validated pass
7. do not start timeout implementation fixes
## Fixed Input
The fixed input is:
`tests/fixtures/generated_scene/timeout_regression_diagnostic_2026-04-19.json`
Only the three diagnosed timeout records enter this plan.
## Phase 0: Freeze Timeout Diagnostic Input
### Objective
Freeze the timeout diagnostic records before hygiene mapping.
### Tasks
1. read the timeout diagnostic JSON
2. verify that the total number of timeout records is `3`
3. verify the label set is:
   - `timeout-rerun-pass = 2`
   - `timeout-rerun-fail-closed = 1`
### Deliverables
1. frozen timeout diagnostic baseline
### Acceptance Criteria
1. exactly `3` records enter this hygiene plan
2. no non-timeout scene enters the plan
## Phase 1: Define Hygiene Mapping
### Objective
Map timeout diagnostic results to explicit rerun hygiene statuses.
### Tasks
1. map `timeout-rerun-pass` to `rerun-resolved-pass`
2. map `timeout-rerun-fail-closed` to `rerun-resolved-fail-closed`
3. preserve any future timeout as `rerun-still-timeout`
4. preserve any future unexpected exit as `rerun-error`
### Deliverables
1. explicit rerun hygiene mapping table
### Acceptance Criteria
1. each timeout diagnostic label maps to one hygiene status
2. pass-like rerun and fail-closed rerun remain distinct
## Phase 2: Build Hygiene Output
### Objective
Publish a hygiene-layer view for the three timeout records.
### Tasks
1. write `tests/fixtures/generated_scene/timeout_budget_rerun_hygiene_2026-04-19.json`
2. include:
   - original timeout status
   - diagnostic label
   - rerun hygiene status
   - elapsed seconds
   - report presence
   - readiness if present
3. summarize how many records are:
   - `rerun-resolved-pass`
   - `rerun-resolved-fail-closed`
   - `rerun-still-timeout`
   - `rerun-error`
### Deliverables
1. timeout budget hygiene JSON
### Acceptance Criteria
1. all three timeout records appear in the hygiene JSON
2. each has exactly one hygiene status
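
The Phase 1 mapping and the Phase 2 summary could be sketched together as below. The two label-to-status pairs and the four status names come from this plan; the `rerun_outcome` parameter with its values `"completed"`, `"timeout"`, and `"error"` is an assumption about how a future rerun result might be reported.

```python
from collections import Counter

# Diagnostic label -> hygiene status, as defined in Phase 1.
HYGIENE_MAP = {
    "timeout-rerun-pass": "rerun-resolved-pass",
    "timeout-rerun-fail-closed": "rerun-resolved-fail-closed",
}

HYGIENE_STATUSES = (
    "rerun-resolved-pass",
    "rerun-resolved-fail-closed",
    "rerun-still-timeout",
    "rerun-error",
)


def hygiene_status(diagnostic_label, rerun_outcome="completed"):
    """Map one record to exactly one hygiene status."""
    if rerun_outcome == "timeout":
        return "rerun-still-timeout"
    if rerun_outcome == "error":
        return "rerun-error"
    return HYGIENE_MAP[diagnostic_label]  # KeyError flags an unmapped label


def summarize(statuses):
    """Count records per hygiene status, reporting zero for unused statuses."""
    counts = Counter(statuses)
    return {status: counts[status] for status in HYGIENE_STATUSES}
```

Keeping the zero-count statuses in the summary makes it visible that `rerun-still-timeout` and `rerun-error` were considered and found empty, not forgotten.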
## Phase 3: Publish Report
### Objective
Publish the bounded timeout hygiene report without changing scene status.
### Tasks
1. write `docs/superpowers/reports/2026-04-19-timeout-budget-rerun-hygiene-report.md`
2. explain why `sweep-040-scene` should not be counted the same way as a hard unreadable source
3. explain why `sweep-015-scene` and `sweep-025-scene` are budget-sensitive pass candidates
4. state that this remains a hygiene layer, not a promotion layer
### Deliverables
1. timeout budget and rerun hygiene report
### Acceptance Criteria
1. report exists
2. no execution board update is made
3. no implementation change is made
## Completion Criteria
This plan is complete when:
1. timeout diagnostic input is frozen
2. rerun hygiene mapping is defined
3. hygiene JSON is published
4. hygiene report is published
## Stop Statement
Stop after publishing the timeout hygiene JSON and report.
Do not start timeout implementation or scene promotion inside this plan.

@@ -0,0 +1,178 @@
# Timeout Regression Diagnostic Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-19-timeout-regression-diagnostic-design.md`
> Upstream Follow-up: `tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json`
## Plan Intent
Run a bounded diagnostic for the three timeout records after the structured fail-closed improvement follow-up sweep.
This plan only diagnoses timeout behavior. It does not implement fixes.
## Scope Guardrails
1. do not modify `src/generated_scene/analyzer.rs`
2. do not modify `src/generated_scene/generator.rs`
3. do not update `scene_execution_board_2026-04-18.json`
4. do not promote scenes
5. do not add family baselines
6. do not handle the remaining structured fail-closed records
7. do not handle adjudicated host-bridge records
8. do not treat diagnostic rerun success as validated scene pass
## Fixed Input
The fixed input is:
`tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json`
Only records with `followupStatus = source-unreadable` and reason `generator timeout after 45s` enter this plan.
Expected fixed set:
| Scene id | Scene | Type |
| --- | --- | --- |
| `sweep-015-scene` | `任务报表` | persistent timeout |
| `sweep-025-scene` | `力禾动环系统巡视记录` | persistent timeout |
| `sweep-040-scene` | `嘉峪关日报` | regression timeout |
## Phase 0: Freeze Timeout Inputs
### Objective
Freeze the exact timeout set before diagnostics.
### Tasks
1. read the follow-up sweep JSON
2. filter `source-unreadable` timeout records
3. verify the count is exactly `3`
4. identify `sweep-040-scene` as the regression timeout
### Deliverables
1. frozen timeout input list
### Acceptance Criteria
1. exactly `3` timeout records enter diagnostics
2. no non-timeout record enters diagnostics
## Phase 1: Source Directory Diagnostics
### Objective
Determine whether timeout records are likely caused by source scale or source structure.
### Tasks
1. inspect each source directory
2. count all files
3. count HTML files
4. count JavaScript files
5. compute total source bytes
6. record the largest files
### Deliverables
1. per-scene source diagnostics in JSON
### Acceptance Criteria
1. all `3` timeout records have source diagnostics
2. missing directories are reported explicitly
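One possible shape for the per-scene source diagnostics is sketched here. The output key names (`fileCount`, `largestFiles`, and so on) and the helper name `diagnose_source_dir` are assumptions for illustration; the plan does not fix a JSON schema.

```python
from pathlib import Path


def diagnose_source_dir(source_dir, top_n=3):
    """Collect size and file-type counts for one scene source directory."""
    root = Path(source_dir)
    if not root.is_dir():
        # Missing directories are reported explicitly, never skipped.
        return {"sourceDir": str(root), "exists": False}

    files = [p for p in root.rglob("*") if p.is_file()]
    sizes = {p: p.stat().st_size for p in files}
    largest = sorted(files, key=lambda p: sizes[p], reverse=True)[:top_n]
    return {
        "sourceDir": str(root),
        "exists": True,
        "fileCount": len(files),
        "htmlFileCount": sum(p.suffix.lower() in {".html", ".htm"} for p in files),
        "jsFileCount": sum(p.suffix.lower() == ".js" for p in files),
        "totalBytes": sum(sizes.values()),
        "largestFiles": [
            {"path": p.relative_to(root).as_posix(), "bytes": sizes[p]}
            for p in largest
        ],
    }
```

The result is a plain dictionary, so it can be written out with `json.dump` as one entry per timeout scene.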
## Phase 2: Bounded Diagnostic Rerun
### Objective
Check whether each timeout completes under a longer diagnostic budget.
### Tasks
1. rerun each timeout scene with a diagnostic timeout budget
2. write output under `examples/timeout_regression_diagnostic_2026-04-19`
3. capture exit code
4. capture elapsed seconds
5. record whether a `generation-report.json` is produced
6. do not update any execution status based on the result
### Deliverables
1. diagnostic rerun result per timeout scene
### Acceptance Criteria
1. each timeout has exactly one diagnostic rerun result
2. rerun success is marked only as diagnostic evidence
3. rerun failure is categorized, not fixed
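A bounded rerun wrapper might look like the following sketch. The generator command line is not specified in this plan, so `command` is left to the caller. The result keys and the recursive search for `generation-report.json` under the output directory are assumptions.

```python
import subprocess
import sys
import time
from pathlib import Path


def diagnostic_rerun(command, output_dir, budget_seconds):
    """Run one bounded rerun and return evidence; never update scene status."""
    started = time.monotonic()
    try:
        completed = subprocess.run(
            command, timeout=budget_seconds, capture_output=True
        )
        exit_code, timed_out = completed.returncode, False
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run once the budget is exceeded.
        exit_code, timed_out = None, True
    elapsed = round(time.monotonic() - started, 3)

    out = Path(output_dir)
    report_found = out.is_dir() and any(out.rglob("generation-report.json"))
    return {
        "exitCode": exit_code,
        "timedOut": timed_out,
        "elapsedSeconds": elapsed,
        "generationReportProduced": report_found,
        "diagnosticEvidenceOnly": True,  # success here is not a validated pass
    }
```

The `diagnosticEvidenceOnly` flag is a hypothetical way to carry the guardrail into the data, so downstream readers cannot mistake a rerun success for a scene pass.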
## Phase 3: Timeout Labeling
### Objective
Assign each timeout one final diagnostic label.
### Tasks
1. assign one primary diagnostic label:
- `timeout-rerun-pass`
- `timeout-rerun-fail-closed`
- `timeout-large-source`
- `timeout-command-hang`
- `timeout-nondeterministic`
- `timeout-source-scan-heavy`
- `timeout-unknown`
2. attach secondary labels when useful
3. distinguish persistent timeouts from the regression timeout
### Deliverables
1. labeled timeout diagnostic JSON
### Acceptance Criteria
1. all `3` records have exactly one primary diagnostic label
2. `sweep-040-scene` remains clearly identified as the regression timeout
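The "exactly one primary label" rule can be enforced at record-construction time. In this sketch the record keys, the `timeoutKind` field, and the helper name are hypothetical; secondary labels are treated as free-form because the plan does not restrict them. The label used in the example is a placeholder, not a diagnostic finding.

```python
PRIMARY_LABELS = frozenset({
    "timeout-rerun-pass",
    "timeout-rerun-fail-closed",
    "timeout-large-source",
    "timeout-command-hang",
    "timeout-nondeterministic",
    "timeout-source-scan-heavy",
    "timeout-unknown",
})
REGRESSION_SCENE_ID = "sweep-040-scene"


def label_timeout(scene_id, primary, secondary=()):
    """Build a labeled record with one validated primary label."""
    if primary not in PRIMARY_LABELS:
        raise ValueError(f"unknown primary label: {primary!r}")
    return {
        "sceneId": scene_id,
        "primaryLabel": primary,
        # Drop the primary label from the secondary list to avoid duplication.
        "secondaryLabels": sorted(set(secondary) - {primary}),
        "timeoutKind": "regression" if scene_id == REGRESSION_SCENE_ID else "persistent",
    }


example = label_timeout("sweep-040-scene", "timeout-unknown")  # placeholder label
```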
## Phase 4: Diagnostic Report
### Objective
Publish diagnostic results without starting implementation.
### Tasks
1. write `tests/fixtures/generated_scene/timeout_regression_diagnostic_2026-04-19.json`
2. write `docs/superpowers/reports/2026-04-19-timeout-regression-diagnostic-report.md`
3. summarize whether the next step should be timeout implementation, rerun hygiene, or no action
### Deliverables
1. `tests/fixtures/generated_scene/timeout_regression_diagnostic_2026-04-19.json`
2. `docs/superpowers/reports/2026-04-19-timeout-regression-diagnostic-report.md`
### Acceptance Criteria
1. diagnostic output exists
2. report exists
3. no implementation changes are made
4. no execution board update is made
## Completion Criteria
This plan is complete when:
1. the three timeout records are frozen
2. each has source diagnostics
3. each has one diagnostic rerun result
4. each has one final diagnostic label
5. JSON and report are published
## Stop Statement
Stop after publishing the timeout diagnostic JSON and report.
Do not start timeout implementation or status promotion inside this plan.


@@ -0,0 +1,140 @@
# Timeout Rerun Hygiene Integration Plan
> Date: 2026-04-19
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-19-timeout-rerun-hygiene-integration-design.md`
> Upstream Hygiene: `tests/fixtures/generated_scene/timeout_budget_rerun_hygiene_2026-04-19.json`
## Plan Intent
Integrate timeout rerun hygiene into sweep and reconciliation reporting.
This plan only changes the reporting layer. It does not change scene generation behavior.
## Scope Guardrails
1. do not modify `src/generated_scene/analyzer.rs`
2. do not modify `src/generated_scene/generator.rs`
3. do not update `scene_execution_board_2026-04-18.json`
4. do not promote scenes
5. do not rerun the `102` sweep
6. do not start timeout implementation fixes
## Fixed Inputs
1. `tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json`
2. `tests/fixtures/generated_scene/timeout_budget_rerun_hygiene_2026-04-19.json`
## Phase 0: Freeze Inputs
### Objective
Freeze the sweep follow-up and timeout hygiene inputs.
### Tasks
1. verify follow-up sweep status counts
2. verify timeout hygiene summary:
- `rerun-resolved-pass = 2`
- `rerun-resolved-fail-closed = 1`
- `rerun-still-timeout = 0`
- `rerun-error = 0`
### Deliverables
1. frozen integration input set
### Acceptance Criteria
1. only the fixed follow-up and hygiene inputs are used
## Phase 1: Build Hygiene Overlay
### Objective
Attach timeout hygiene results onto raw timeout scenes.
### Tasks
1. match timeout hygiene records to the follow-up sweep by `sceneId`
2. preserve the raw `source-unreadable` status
3. add:
- `hygieneStatus`
- `hygieneInterpretation`
4. map:
- `rerun-resolved-pass -> timeout-as-pass-candidate`
- `rerun-resolved-fail-closed -> timeout-as-fail-closed-candidate`
- `rerun-still-timeout -> timeout-still-unreadable`
- `rerun-error -> timeout-rerun-error`
### Deliverables
1. timeout hygiene overlay records
### Acceptance Criteria
1. all three timeout scenes receive one overlay status
2. raw status is preserved
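The overlay step is a join by `sceneId` plus a fixed lookup. The sketch below assumes both inputs are lists of dictionaries and that the raw status lives under `followupStatus`; the scene ids in the example are made up, since this plan does not state which scene resolved to which hygiene status.

```python
HYGIENE_INTERPRETATION = {
    "rerun-resolved-pass": "timeout-as-pass-candidate",
    "rerun-resolved-fail-closed": "timeout-as-fail-closed-candidate",
    "rerun-still-timeout": "timeout-still-unreadable",
    "rerun-error": "timeout-rerun-error",
}


def build_overlay(timeout_records, hygiene_records):
    """Attach hygiene fields to each raw timeout record without altering it."""
    by_scene = {h["sceneId"]: h for h in hygiene_records}
    overlay = []
    for record in timeout_records:
        # A KeyError here means a timeout scene has no hygiene result.
        status = by_scene[record["sceneId"]]["hygieneStatus"]
        overlay.append({
            **record,  # the raw followupStatus is copied through unchanged
            "hygieneStatus": status,
            "hygieneInterpretation": HYGIENE_INTERPRETATION[status],
        })
    return overlay


# Hypothetical ids; real records come from the two fixed input files.
raw = [{"sceneId": "scene-a", "followupStatus": "source-unreadable"}]
hygiene = [{"sceneId": "scene-a", "hygieneStatus": "rerun-resolved-pass"}]
overlay = build_overlay(raw, hygiene)
```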
## Phase 2: Build Integrated Summary
### Objective
Publish a hygiene-aware timeout summary alongside the raw sweep summary.
### Tasks
1. preserve raw follow-up status counts
2. add hygiene-aware timeout interpretation counts
3. summarize:
- `timeout-as-pass-candidate`
- `timeout-as-fail-closed-candidate`
- `timeout-still-unreadable`
- `timeout-rerun-error`
### Deliverables
1. integrated summary block
### Acceptance Criteria
1. raw and hygiene-aware summaries both exist
2. timeout bucket is no longer lossy in the integrated output
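A minimal sketch of the two-layer summary follows. It only counts the timeout overlay records, whereas the real raw summary would cover the whole follow-up sweep; the output keys `raw` and `hygieneAware` are assumptions. The sample counts mirror the hygiene summary frozen in Phase 0 (two pass candidates, one fail-closed candidate).

```python
from collections import Counter

INTERPRETATIONS = (
    "timeout-as-pass-candidate",
    "timeout-as-fail-closed-candidate",
    "timeout-still-unreadable",
    "timeout-rerun-error",
)


def integrated_summary(overlay_records):
    """Report raw status counts next to hygiene-aware interpretation counts."""
    raw = Counter(r["followupStatus"] for r in overlay_records)
    aware = Counter(r["hygieneInterpretation"] for r in overlay_records)
    return {
        "raw": dict(raw),
        # Emit every interpretation, including zero counts, so nothing is implicit.
        "hygieneAware": {name: aware.get(name, 0) for name in INTERPRETATIONS},
    }


sample_overlay = [
    {"followupStatus": "source-unreadable", "hygieneInterpretation": "timeout-as-pass-candidate"},
    {"followupStatus": "source-unreadable", "hygieneInterpretation": "timeout-as-pass-candidate"},
    {"followupStatus": "source-unreadable", "hygieneInterpretation": "timeout-as-fail-closed-candidate"},
]
summary = integrated_summary(sample_overlay)
```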
## Phase 3: Publish Integrated Output
### Objective
Publish the bounded reconciliation-friendly hygiene integration output.
### Tasks
1. write `tests/fixtures/generated_scene/timeout_rerun_hygiene_integration_2026-04-19.json`
2. write `docs/superpowers/reports/2026-04-19-timeout-rerun-hygiene-integration-report.md`
3. state that this is an interpretation/reporting layer only
### Deliverables
1. timeout hygiene integration JSON
2. timeout hygiene integration report
### Acceptance Criteria
1. both files exist
2. no execution board update is made
3. no implementation change is made
## Completion Criteria
This plan is complete when:
1. inputs are frozen
2. timeout hygiene overlay is attached
3. integrated raw and hygiene-aware summaries are published
4. JSON and report are written
## Stop Statement
Stop after publishing the integration JSON and report.
Do not start implementation or board updates inside this plan.


@@ -0,0 +1,86 @@
# Deterministic Keyword Scoring Refinement Plan
> Date: 2026-04-20
> Design: `2026-04-20-deterministic-keyword-scoring-refinement-design.md`
## Plan Intent
Close the 9 deterministic dispatch ambiguity gaps through bounded manifest keyword refinement and dry-run verification.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_2026-04-20.json`
2. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
3. `examples/scene_skill_102_final_materialization_2026-04-19/skills/*/scene.toml`
## Allowed Files
1. `examples/scene_skill_102_final_materialization_2026-04-19/skills/*/scene.toml`
2. `tests/fixtures/generated_scene/deterministic_keyword_scoring_refinement_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
4. `docs/superpowers/reports/2026-04-20-deterministic-keyword-scoring-refinement-report.md`
## Forbidden Files
1. `src/compat/scene_platform/dispatch.rs`
2. `src/compat/scene_platform/resolvers.rs`
3. `src/generated_scene/analyzer.rs`
4. `src/generated_scene/generator.rs`
5. generated `scripts/*`
6. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Phase 0: Freeze Gap Set
### Tasks
1. Load readiness gaps from the parent readiness asset.
2. Confirm the fixed gap set is exactly 9 ambiguous dispatch entries.
### Acceptance Criteria
1. No additional gap categories are pulled into scope.
2. `sweep-012-scene` remains excluded.
## Phase 1: Refine Manifest Keywords
### Tasks
1. For each fixed gap, identify its direct collision partner.
2. Narrow include keywords to distinctive full phrases.
3. Remove broad standalone collision tokens where they create ties.
4. Add explicit exclude keywords only when a pair is mutually exclusive.
### Acceptance Criteria
1. The fixed 9 scenes retain non-empty include keywords.
2. No generated script is changed.
## Phase 2: Dispatch Dry-Run Verification
### Tasks
1. Re-run dispatch dry-run for all 101 complete packages.
2. Verify the fixed 9 gaps uniquely select their expected scene by full-name sample.
3. Check that no previously-ready scene regresses into ambiguity or no-match.
### Acceptance Criteria
1. `dispatchReady = 101` or all residual gaps are explicitly justified.
2. `ambiguous = 0` unless escalated to a separate runtime scoring plan.
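To illustrate why narrowing a broad token removes a tie, here is a toy dry-run. The scoring rule (count of include keywords found as substrings, with any exclude keyword disqualifying the scene) is an assumption made for this sketch and is not the dispatch logic in `src/compat/scene_platform/dispatch.rs`. The `exclude_keywords` field name, the scene ids, and the keywords are likewise hypothetical.

```python
def score(manifest, utterance):
    """Toy score: number of include keywords present, 0 if any exclude hits."""
    if any(k in utterance for k in manifest.get("exclude_keywords", [])):
        return 0
    return sum(k in utterance for k in manifest["include_keywords"])


def dry_run(manifests, utterance):
    """Classify one sample as ready, ambiguous, or no-match."""
    scores = {scene_id: score(m, utterance) for scene_id, m in manifests.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return ("no-match", [])
    winners = sorted(s for s, value in scores.items() if value == best)
    return ("ready" if len(winners) == 1 else "ambiguous", winners)


before = {
    "scene-a": {"include_keywords": ["monthly report"]},
    "scene-b": {"include_keywords": ["report"]},  # broad standalone token
}
after = {
    "scene-a": {"include_keywords": ["monthly report"]},
    "scene-b": {"include_keywords": ["daily report"]},  # narrowed to a full phrase
}
result_before = dry_run(before, "monthly report")
result_after = dry_run(after, "monthly report")
```

In this toy model the broad token `report` ties with the full phrase, and replacing it with a distinctive phrase resolves the tie without touching any scoring code.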
## Phase 3: Publish Report
### Tasks
1. Publish refinement JSON.
2. Publish post-refinement readiness JSON.
3. Publish report.
### Acceptance Criteria
1. Report states before/after ready and ambiguous counts.
2. Report states whether runtime scoring changes are needed.
## Stop Statement
Stop after refinement assets and report are published. Do not start browser execution, runtime dispatch implementation, or `sweep-012-scene` recovery under this plan.


@@ -0,0 +1,101 @@
# Final Skill Human-Readable Index Plan
> Date: 2026-04-20
> Parent Plan: `2026-04-19-scene-skill-102-final-materialization-plan.md`
> Design: `2026-04-20-final-skill-human-readable-index-design.md`
## Plan Intent
Add human-readable lookup and metadata to the final materialized skill set so reviewers can identify which `sweep-xxx-scene` skill maps to which business scene.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
3. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json`
4. `examples/scene_skill_102_final_materialization_2026-04-19`
## Allowed Files
1. `examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md`
2. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
3. `examples/scene_skill_102_final_materialization_2026-04-19/skills/*/SKILL.toml`
4. `examples/scene_skill_102_final_materialization_2026-04-19/skills/*/SKILL.md`
5. `docs/superpowers/reports/2026-04-20-final-skill-human-readable-index-report.md`
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
5. generated `scripts/*`
6. existing materialization manifest and failures assets
## Phase 0: Freeze Metadata Boundary
### Tasks
1. Confirm final materialization root exists.
2. Confirm official board has 102 scene mappings.
3. Confirm this plan does not repair failed packages.
### Acceptance Criteria
1. Scope is metadata/index only.
2. Stable `sweep-xxx-scene` ids are preserved.
## Phase 1: Build Human-Readable Mapping
### Tasks
1. Load scene id and scene name from official board.
2. Load materialization status from final materialization manifest and failures asset.
3. Produce 102 mapping rows.
### Acceptance Criteria
1. Row count is 102.
2. `sweep-012-scene` is included and marked failed.
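The mapping step can be sketched as a join between the board and the materialization status sets. The board entry keys, the `skillDir` convention of naming each skill directory after its scene id, and the placeholder scene name for `sweep-012-scene` are assumptions; the real assets may use a different layout.

```python
def build_mapping_rows(board, materialized_ids, failed_ids):
    """Produce one human-readable row per board scene."""
    rows = []
    for entry in board:
        scene_id = entry["sceneId"]
        if scene_id in failed_ids:
            status = "failed"
        elif scene_id in materialized_ids:
            status = "materialized"
        else:
            raise ValueError(f"{scene_id} has no materialization status")
        rows.append({
            "sceneId": scene_id,  # stable id is preserved as-is
            "sceneName": entry["sceneName"],
            "status": status,
            "skillDir": f"skills/{scene_id}",  # assumed directory convention
        })
    return rows


# Two illustrative entries; the real board carries 102.
board = [
    {"sceneId": "sweep-012-scene", "sceneName": "(name from board)"},
    {"sceneId": "sweep-015-scene", "sceneName": "任务报表"},
]
rows = build_mapping_rows(board, {"sweep-015-scene"}, {"sweep-012-scene"})
```

Against the full board, a check such as `len(rows) == 102` would cover the row-count acceptance criterion.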
## Phase 2: Publish Index Assets
### Tasks
1. Write `SCENE_INDEX.md`.
2. Write `scene_skill_102_index.json`.
### Acceptance Criteria
1. Index files are present.
2. Index files include scene id, scene name, archetype, readiness, status, and skill directory.
## Phase 3: Normalize Skill Metadata
### Tasks
1. For each complete package, update `SKILL.toml` readable fields while preserving `[skill].name`.
2. For each complete package, update `SKILL.md` readable summary.
3. Skip failed packages that lack required files.
### Acceptance Criteria
1. Complete packages expose readable scene names.
2. Failed packages remain explicit failures.
3. Generated scripts are not modified.
## Phase 4: Publish Report
### Tasks
1. Publish human-readable index report.
2. State materialized package count and skipped failed package count.
### Acceptance Criteria
1. Report explains how to find scene-to-skill mapping.
2. Report states that no generation or recovery was performed.
## Stop Statement
Stop after index assets, metadata normalization, and report are published. Do not start static/mock validation or `sweep-012-scene` recovery under this plan.


@@ -0,0 +1,47 @@
# Generated Scene Embedded Dictionary Extraction Hardening Plan
> Date: 2026-04-20
> Status: Draft
> Parent route:
> - `embedded_dictionary_extraction_hardening`
> Parent ledger:
> - `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
## Plan Intent
Implement the first reusable slice for source-side dictionary/tree extraction.
## Fixed Input Bucket
Use the bounded bucket:
1. scenes with declared `org` parameters
2. scenes with source-side dictionary evidence (`city.js`, `dict.js`, `enum.js`, tree/options files)
3. scenes whose current generated `org-dictionary.json` is absent or starter-sized
This first slice should center on the 10 parameterized scenes that most resemble `sweep-030-scene`.
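The second bucket criterion, source-side dictionary evidence, could be approximated by a filename check like the one below. This is written in Python for illustration only; the actual extraction would live in the Rust analyzer. Treating any filename containing `tree` or `options` as evidence is an assumption, as are the sample paths.

```python
from pathlib import PurePosixPath

DICTIONARY_FILE_NAMES = {"city.js", "dict.js", "enum.js"}
DICTIONARY_NAME_HINTS = ("tree", "options")  # assumed reading of "tree/options files"


def dictionary_evidence(paths):
    """Return the source paths that look like dictionary or tree data files."""
    hits = []
    for raw_path in paths:
        name = PurePosixPath(raw_path).name.lower()
        if name in DICTIONARY_FILE_NAMES or any(h in name for h in DICTIONARY_NAME_HINTS):
            hits.append(raw_path)
    return hits


evidence = dictionary_evidence(["js/city.js", "index.html", "data/orgTree.json"])
```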
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. route-local generator tests
## Forbidden Files
1. no edits to already materialized dictionaries under `examples/`
2. no runtime resolver implementation beyond what the generation output requires
3. no board assets
4. no pseudo-production handoff assets
## Expected Coverage Delta
1. generated dictionaries move beyond starter subsets for the bucketed scenes
2. dictionary recovery becomes source-driven rather than hand-seeded
## Stop Statement
Stop after the first reusable dictionary-extraction slice is implemented and route-local follow-up assets are published.
Do not attempt complete organization-tree closure for every scene inside this route plan.


@@ -0,0 +1,47 @@
# Generated Scene Invocation Alias Generation Hardening Plan
> Date: 2026-04-20
> Status: Draft
> Parent route:
> - `alias_generation_hardening`
> Parent ledger:
> - `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
## Plan Intent
Implement the first reusable slice for natural-language alias generation.
## Fixed Input Bucket
Use the bounded bucket:
1. scenes with source-side alias evidence
2. scenes whose current generated deterministic manifests still expose only narrow keyword coverage
3. high-risk browser-script report scenes where operator wording is likely to diverge from canonical scene names
This first slice should prefer the densest high-risk alias bucket rather than the full 84-scene route at once.
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. route-local generator tests
## Forbidden Files
1. no runtime scoring changes in sgClaw dispatch
2. no service-console changes
3. no direct edits to final materialized `scene.toml`
4. no board assets
## Expected Coverage Delta
1. generated `include_keywords` become less brittle for the bucketed scenes
2. deterministic invocation becomes less dependent on exact canonical wording
## Stop Statement
Stop after the first reusable alias-generation slice is implemented and route-local follow-up assets are published.
Do not attempt one-shot full alias closure for every scene inside this route plan.


@@ -0,0 +1,47 @@
# Generated Scene Parameter Default Semantics Hardening Plan
> Date: 2026-04-20
> Status: Draft
> Parent route:
> - `parameter_default_semantics_recovery_hardening`
> Parent ledger:
> - `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
## Plan Intent
Implement the first reusable slice for page-native default period/date/mode recovery.
## Fixed Input Bucket
Use the bounded bucket:
1. scenes with explicit `period` parameters
2. scenes whose source evidence shows implicit month/week/date initialization
3. scenes whose current generated manifests do not encode a reusable default strategy
This first slice should center on the parameterized monthly/weekly scenes highlighted by the ledger.
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. route-local generator tests
## Forbidden Files
1. no runtime resolver patching beyond what the generation metadata requires
2. no edits to the generated skill bundle under `examples/`
3. no board assets
4. no pseudo-production assets
## Expected Coverage Delta
1. generated parameter metadata can preserve source-side default semantics for the bucketed scenes
2. callers are no longer forced to supply values that the source page itself normally supplies
## Stop Statement
Stop after the first reusable default-semantics slice is implemented and route-local follow-up assets are published.
Do not expand to all possible date semantics inside this route plan.


@@ -0,0 +1,47 @@
# Generated Scene Resolver Request Mapping Hardening Plan
> Date: 2026-04-20
> Status: Draft
> Parent route:
> - `resolver_request_mapping_hardening`
> Parent ledger:
> - `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
## Plan Intent
Implement the first reusable mapping slice for request-field recovery.
## Fixed Input Bucket
Use the bounded bucket:
1. scenes with explicit `org` and/or `period` params
2. scenes whose source evidence shows request-field tokens like `orgno`, `fdate`, `weekSfdate`, `weekEfdate`
3. scenes currently lacking explicit generated request-mapping metadata
This first slice is expected to center on the parameterized `multi_mode_request` family and adjacent structured-request scenes.
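As a rough illustration of what request-field evidence detection might involve, the sketch below scans source text for the tokens named above and groups them by the parameter they are likely to feed. It is written in Python for brevity, while the real slice would be implemented in the Rust analyzer. The token-to-parameter assignment and the sample snippet are assumptions.

```python
import re

# Assumed association between request-field tokens and declared params.
REQUEST_TOKEN_TO_PARAM = {
    "orgno": "org",
    "fdate": "period",
    "weekSfdate": "period",
    "weekEfdate": "period",
}
TOKEN_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, REQUEST_TOKEN_TO_PARAM)) + r")\b"
)


def request_mapping_evidence(source_text):
    """Group request-field tokens found in the source by parameter name."""
    mapping = {}
    for token in sorted(set(TOKEN_PATTERN.findall(source_text))):
        mapping.setdefault(REQUEST_TOKEN_TO_PARAM[token], []).append(token)
    return mapping


snippet = "params = {orgno: org, weekSfdate: start, weekEfdate: end}"
mapping = request_mapping_evidence(snippet)
```

The word boundaries keep `fdate` from matching inside `weekSfdate`, so each token is counted only when it appears as a whole identifier.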
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. route-local generator tests
## Forbidden Files
1. no edits to the final materialized skill bundle
2. no execution-board assets
3. no runtime / browser callback host changes
4. no service console assets
## Expected Coverage Delta
1. introduce reusable request-field mapping metadata rather than scene-name patches
2. reduce `resolver_to_request_mapping_gap` in the highest-signal parameterized bucket
## Stop Statement
Stop after the first reusable mapping slice is implemented and route-local follow-up assets are published.
Do not yet attempt full 102-scene closure inside this route plan.


@@ -0,0 +1,143 @@
# Generated Scene Rule Hardening Route Plan
> Date: 2026-04-20
> Status: Draft
> Parent roadmap:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-plan.md`
> Parent design:
> - `docs/superpowers/specs/2026-04-20-generated-scene-rule-hardening-route-design.md`
> Upstream ledger:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-plan.md`
## Plan Intent
Convert the completed runtime-semantics ledger into a bounded hardening-route sequence.
This stage decides execution order and the next child implementation plans. It does not change code yet.
## Fixed Inputs
1. `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-report.md`
## Scope Guardrails
Allowed:
1. cluster scenes by reusable route
2. freeze route order
3. define bounded child implementation plans
4. define rematerialization dependency
5. define validation refresh dependency
Forbidden:
1. no implementation changes in `src/`
2. no skill manifest changes
3. no rematerialization execution
4. no validation reruns
5. no inner-network execution
## Phase 0: Freeze Route Order
### Objective
Turn the ledger into one fixed route order for downstream implementation.
### Ordered Routes
1. `resolver_request_mapping_hardening`
2. `runtime_url_classification_hardening`
3. `embedded_dictionary_extraction_hardening`
4. `parameter_default_semantics_recovery_hardening`
5. `alias_generation_hardening`
### Acceptance
1. the order is explicit and no longer derived ad hoc during later implementation
## Phase 1: Build Route Clusters
### Objective
Cluster scenes from the ledger into reusable route buckets.
### Tasks
1. count all scenes covered by each route
2. identify the densest scene families per route
3. identify route-local anchor scenes
### Acceptance
1. each route has a stable implementation bucket definition
## Phase 2: Define Bounded Child Implementation Plans
### Objective
Create one bounded implementation child plan for each top route.
### Required child plans
1. `2026-04-20-generated-scene-resolver-request-mapping-hardening-plan.md`
2. `2026-04-20-generated-scene-runtime-url-classification-hardening-plan.md`
3. `2026-04-20-generated-scene-embedded-dictionary-extraction-hardening-plan.md`
4. `2026-04-20-generated-scene-parameter-default-semantics-hardening-plan.md`
5. `2026-04-20-generated-scene-invocation-alias-generation-hardening-plan.md`
### Acceptance
1. each child plan has a fixed scope and stop rule
2. no child plan exists solely to hardcode behavior for a single scene name
## Phase 3: Declare Rematerialization Dependency
### Objective
Make full 102-scene rematerialization a mandatory downstream stage after route execution.
### Tasks
1. define `generated-scene-runtime-semantics-rematerialization-refresh-plan`
2. freeze it as required after implementation
### Acceptance
1. no route may be considered complete without rematerialization
## Phase 4: Declare Validation Refresh Dependency
### Objective
Make validation refresh mandatory after rematerialization.
### Tasks
1. define `generated-scene-runtime-semantics-validation-refresh-plan`
2. require refresh of:
- deterministic invocation readiness
- natural-language parameter readiness
- static validation
- direct mock execution
- pseudo-production handoff
### Acceptance
1. no route may be considered fully closed until validation assets are refreshed
## Deliverables
1. route design / sequencing report
2. route cluster JSON
3. bounded child-plan list for the five routes
## Stop Statement
Stop after:
1. publishing the route design / sequencing assets
2. publishing the five child implementation plans
3. publishing rematerialization and validation-refresh dependency plans
Do not execute route implementation inside this plan.


@@ -0,0 +1,34 @@
# Generated Scene Rule Hardening Route Sequence Plan
> Date: 2026-04-20
> Status: Draft
> Parent design:
> - `docs/superpowers/specs/2026-04-20-generated-scene-rule-hardening-route-sequence-design.md`
## Plan Intent
Publish the bounded child-plan tree that follows the completed runtime-semantics ledger.
## Fixed Sequence
1. `generated-scene-resolver-request-mapping-hardening`
2. `generated-scene-runtime-url-classification-hardening`
3. `generated-scene-embedded-dictionary-extraction-hardening`
4. `generated-scene-parameter-default-semantics-hardening`
5. `generated-scene-invocation-alias-generation-hardening`
6. `generated-scene-runtime-semantics-rematerialization-refresh`
7. `generated-scene-runtime-semantics-validation-refresh`
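A fixed sequence like this can be guarded mechanically. The sketch below is one hypothetical way to do so: the helper name `next_stage` and the idea of tracking completed stages as a list are assumptions, not part of the plan.

```python
FIXED_SEQUENCE = (
    "generated-scene-resolver-request-mapping-hardening",
    "generated-scene-runtime-url-classification-hardening",
    "generated-scene-embedded-dictionary-extraction-hardening",
    "generated-scene-parameter-default-semantics-hardening",
    "generated-scene-invocation-alias-generation-hardening",
    "generated-scene-runtime-semantics-rematerialization-refresh",
    "generated-scene-runtime-semantics-validation-refresh",
)


def next_stage(completed):
    """Return the next stage in order, or None when the sequence is finished."""
    done = list(completed)
    # Completed stages must be an exact prefix of the fixed sequence.
    if done != list(FIXED_SEQUENCE[: len(done)]):
        raise ValueError("completed stages do not follow the fixed sequence")
    if len(done) == len(FIXED_SEQUENCE):
        return None
    return FIXED_SEQUENCE[len(done)]
```

Requiring an exact prefix means the rematerialization refresh cannot be reached until all five hardening routes are recorded, and the validation refresh cannot precede rematerialization.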
## Deliverables
1. route cluster JSON
2. route sequence report
3. five bounded child implementation plans
4. one rematerialization refresh dependency plan
5. one validation refresh dependency plan
## Stop Statement
Stop after publishing the child-plan tree.
Do not implement any route in this plan.


@@ -0,0 +1,122 @@
# Generated Scene Runtime Semantics Gap Analysis Plan
> Status: Superseded by `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-plan.md`
## Parent
- Parent design: `docs/superpowers/specs/2026-04-20-generated-scene-runtime-semantics-gap-analysis-design.md`
## Goal
Analyze the 102 final generated scene skills for runtime-semantics divergence, using `sweep-030-scene` as the anchor case and systematizing the five gap classes exposed during inner-network validation.
This plan is analysis-only.
## Fixed Inputs
- `examples/scene_skill_102_final_materialization_2026-04-19/skills`
- `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
- `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`
- Anchor source:
- `D:/desk/智能体资料/全量业务场景/一平台场景/台区线损大数据-月_周累计线损率统计分析`
## Boundaries
Allowed:
- Read skill manifests, reports, references, and selected source-scene evidence
- Produce JSON inventory and report
Forbidden:
- No edits in `src/`
- No edits to generated skills
- No materialization reruns
- No execution board updates
- No pseudo-production execution
- No implementation patch for any scene
## Phase 0: Freeze Gap Taxonomy
Tasks:
1. Freeze the five runtime-semantics gap classes from the anchor case
2. Define high / medium / low risk buckets
3. Lock analysis outputs and stop rule
Acceptance:
1. The five gap classes are explicit and stable
2. The plan remains analysis-only
## Phase 1: Anchor-Case Evidence Extraction
Tasks:
1. Read `sweep-030-scene` generated assets:
- `scene.toml`
- `references/generation-report.json`
- `references/org-dictionary.json`
- generated script
2. Read source-scene evidence from the original `台区线损大数据-月_周累计线损率统计分析`
3. Record direct evidence for:
- alias gap
- dictionary recovery gap
- parameter default semantics gap
- resolver-to-request mapping gap
- runtime URL semantics gap
Acceptance:
1. `sweep-030-scene` has explicit evidence for each applicable gap class
## Phase 2: 102-Scene Inventory Scan
Tasks:
1. Scan all 102 final skills
2. Extract:
- deterministic keywords
- params presence
- dictionary reference presence
- bootstrap target presence
- generation-report URL evidence
3. Tag scenes with likely gap classes using bounded heuristics
Acceptance:
1. Every scene gets a runtime-semantics record
2. Every scene has `riskLevel` and `gaps`
## Phase 3: Family / Archetype Grouping
Tasks:
1. Group findings by archetype / family
2. Count gap incidence by bucket
3. Separate:
- generator-level fix candidates
- runtime-only residuals
Acceptance:
1. Summary counts exist per gap type and per archetype
2. Report can distinguish generator vs runtime responsibilities
## Phase 4: Publish Analysis Assets
Deliverables:
1. `tests/fixtures/generated_scene/generated_scene_runtime_semantics_gap_analysis_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-generated-scene-runtime-semantics-gap-analysis-report.md`
Acceptance:
1. All 102 scenes are represented
2. `sweep-030-scene` is explicitly called out as anchor evidence
3. The report recommends next implementation routes, but does not execute them
## Stop Statement
Stop after publishing the JSON inventory and report.


@@ -0,0 +1,34 @@
# Generated Scene Runtime Semantics Rematerialization Refresh Plan
> Date: 2026-04-20
> Status: Draft
> Dependency stage:
> - post route implementation
## Plan Intent
Make full 102-scene rematerialization mandatory after runtime-semantics hardening routes land.
## Fixed Inputs
1. completed route-local hardening reports
2. current canonical final skill root
3. current final materialization manifest/failure assets
## Required Outputs
1. refreshed final 102-skill materialization directory
2. refreshed materialization manifest
3. refreshed materialization failures asset
4. refreshed scene index / metadata layer
## Guardrails
1. no route may be considered complete without this refresh
2. rematerialization must use hardened generator rules, not manual skill edits
## Stop Statement
Stop after publishing the rematerialization refresh plan.
Do not execute rematerialization inside this dependency plan.


@@ -0,0 +1,29 @@
# Generated Scene Runtime Semantics Validation Refresh Plan
> Date: 2026-04-20
> Status: Draft
> Dependency stage:
> - post rematerialization refresh
## Plan Intent
Make validation refresh mandatory after runtime-semantics rematerialization.
## Required Refresh Layers
1. deterministic invocation readiness
2. natural-language parameter readiness
3. static validation
4. direct mock execution
5. pseudo-production handoff assets
## Guardrails
1. validation must consume the refreshed canonical 102-skill bundle
2. old validation assets may not be reused as proof of the hardened bundle
## Stop Statement
Stop after publishing the validation refresh plan.
Do not execute validation refresh inside this dependency plan.


@@ -0,0 +1,47 @@
# Generated Scene Runtime URL Classification Hardening Plan
> Date: 2026-04-20
> Status: Draft
> Parent route:
> - `runtime_url_classification_hardening`
> Parent ledger:
> - `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
## Plan Intent
Implement the first reusable slice that separates runtime URL roles during generation.
## Fixed Input Bucket
Use the bounded bucket:
1. scenes with strong source evidence for multiple URL roles
2. scenes whose current generated manifest only exposes `target_url`
3. high-signal browser-script scenes where runtime context URL and module-route URL are likely to diverge
This first slice should focus on the highest-risk parameterized browser families before broader expansion.
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. route-local generator tests
## Forbidden Files
1. no callback-host/runtime implementation
2. no service-console changes
3. no direct edits to generated skills
4. no board or validation assets
## Expected Coverage Delta
1. generated metadata can distinguish app-entry/runtime-context/module-route roles
2. callers are no longer forced to guess `page_url` semantics for the bucketed scenes
## Stop Statement
Stop after the first reusable URL-classification slice is implemented and route-local follow-up assets are published.
Do not expand to every scene in this route plan.


@@ -0,0 +1,94 @@
# Generated Scene Source Evidence Cross-Scan Plan
> Date: 2026-04-20
> Status: Draft
> Parent roadmap:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-plan.md`
> Parent design:
> - `docs/superpowers/specs/2026-04-20-generated-scene-source-evidence-cross-scan-design.md`
## Goal
Perform a bounded source-first cross-scan over the original 102 scene directories so the project can identify which scenes share the same runtime-semantics risk family as `sweep-030-scene`.
This plan is analysis-only.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
2. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
3. source root:
- `D:/desk/智能体资料/全量业务场景/一平台场景`
## Boundaries
Allowed:
1. map the current 102 scenes to original source directories
2. scan bounded source evidence
3. publish JSON inventory and report
Forbidden:
1. no edits in `src/`
2. no edits to generated skills
3. no rematerialization
4. no validation reruns
5. no execution board updates
## Phase 0: Freeze Scene Mapping
Tasks:
1. derive the exact 102-scene source directory mapping
2. validate that each scene maps to one source directory or an explicit missing record
Acceptance:
1. all 102 scenes have a source mapping status
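The mapping freeze could be sketched as follows. It assumes, purely for illustration, that each source directory is named after the scene name and sits directly under the source root; the record keys and the helper name are also hypothetical.

```python
from pathlib import Path


def map_scenes_to_sources(scenes, source_root):
    """Give every scene a mapping status: mapped or an explicit missing record."""
    root = Path(source_root)
    ledger = []
    for scene in scenes:
        candidate = root / scene["sceneName"]  # assumed naming convention
        found = candidate.is_dir()
        ledger.append({
            "sceneId": scene["sceneId"],
            "sourceDir": str(candidate) if found else None,
            "mappingStatus": "mapped" if found else "missing",
        })
    return ledger
```

Because a record is emitted for every scene, a missing directory shows up as an explicit `missing` entry instead of a silently shorter ledger.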
## Phase 1: Run Bounded Source Evidence Scan
Tasks:
1. scan for alias evidence
2. scan for dictionary evidence
3. scan for default parameter evidence
4. scan for request mapping evidence
5. scan for runtime URL evidence
Acceptance:
1. each scene has evidence flags
2. representative evidence files are recorded where found
## Phase 2: Build Cross-Scan Ledger
Tasks:
1. write one record per scene
2. tag scenes with source-side risk hints
3. explicitly identify scenes that look similar to `sweep-030-scene`
Acceptance:
1. all 102 scenes appear in the ledger
2. the anchor case is clearly represented
## Phase 3: Publish Assets
Deliverables:
1. `tests/fixtures/generated_scene/generated_scene_source_evidence_cross_scan_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-generated-scene-source-evidence-cross-scan-report.md`
Acceptance:
1. the JSON can be used as the next input to the runtime-semantics ledger stage
2. the report summarizes the five evidence families across the 102-scene set
## Stop Statement
Stop after publishing the JSON inventory and report.
Do not start rule-hardening or rematerialization in this plan.

@@ -0,0 +1,214 @@
# Generated Scene Source-First Runtime Semantics Hardening Plan
> Date: 2026-04-20
> Status: Draft
> Parent design: `docs/superpowers/specs/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-design.md`
## Plan Intent
Replace the weaker generated-skill-first analysis path with a stronger source-first roadmap:
1. scan all 102 original source scenes
2. detect scenes that can reproduce the same runtime-semantics defect classes exposed by `sweep-030-scene`
3. convert those findings into rule-level hardening routes
4. require full 102-scene rematerialization after rule changes
5. refresh the full validation stack after rematerialization
## Why This Plan Exists
The project goal is not merely to document gaps after they have already surfaced in inner-network testing.
The goal is to prevent the same class of defect from reappearing across the remaining source scenes.
Therefore this plan is driven by original source-scene evidence, not generated skill artifacts alone.
## Fixed Inputs
1. Original source root:
- `D:/desk/智能体资料/全量业务场景/一平台场景`
2. Current final generated skills:
- `examples/scene_skill_102_final_materialization_2026-04-19/skills`
3. Current 102-skill materialization manifest
4. Current invocation / parameter readiness assets
5. `sweep-030-scene` inner-network runtime findings
## Scope Guardrails
Allowed:
1. scan all 102 original source-scene directories
2. compare source evidence against current generated skills
3. produce risk ledgers, reports, and downstream bounded plans
Forbidden in this parent plan:
1. no implementation changes in `src/`
2. no skill manifest edits
3. no rematerialization execution yet
4. no validation reruns yet
5. no inner-network patching as a substitute for source-first analysis
## Workstreams
1. `WS1` Source Evidence Scan
2. `WS2` Runtime-Semantics Risk Ledger
3. `WS3` Rule Hardening Route Design
4. `WS4` Full Rematerialization and Validation Refresh Planning
## Phase 0: Freeze Parent Scope
### Objective
Make this the new parent roadmap for generated-scene runtime semantics hardening.
### Tasks
1. freeze the five gap classes
2. freeze the source-first principle
3. freeze rematerialization as a required downstream step
### Acceptance
1. future work must start from source-scene evidence
2. future fixes must be rule-level before scene-level
## Phase 1: Full 102 Source Cross-Scan
### Objective
Systematically scan the original 102 source scenes for high-signal evidence related to the five runtime-semantics gap classes.
### Required scan targets
1. dictionary / enum / tree files
2. default parameter logic
3. request payload field names
4. runtime URL candidates
5. operator-facing wording and alias sources
### Tasks
1. map each scene id to its original source directory
2. run a bounded evidence scan over all 102 source directories
3. tag source-side evidence flags per scene
### Deliverables
1. source evidence scan JSON
2. source evidence scan report
### Acceptance
1. all 102 scenes have source evidence flags
2. `sweep-030-scene` is validated as anchor evidence
## Phase 2: Build the Source-First Runtime Semantics Ledger
### Objective
Merge source-side evidence with generated-skill evidence into a full runtime-semantics risk ledger.
### Tasks
1. compare source evidence with generated manifests and references
2. assign gap classes per scene
3. assign risk level per scene
4. distinguish:
- generator-level rule gap
- runtime-only residual
### Deliverables
1. `generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
2. source-first runtime semantics report
### Acceptance
1. all 102 scenes are represented
2. each scene has `gaps`, `riskLevel`, and `recommendedFixRoutes`
## Phase 3: Convert Ledger into Rule-Hardening Routes
### Objective
Turn the source-first ledger into bounded implementation routes that modify reusable generation rules rather than scene-specific patches.
### Candidate hardening routes
1. alias generation hardening
2. embedded dictionary extraction hardening
3. parameter default semantics recovery hardening
4. resolver-to-request mapping hardening
5. runtime URL classification hardening
### Tasks
1. count scenes affected by each route
2. prioritize routes by coverage gain and reuse
3. define bounded implementation slices for the top routes
### Deliverables
1. child-plan sequence for runtime semantics hardening
2. bounded route plans for top reusable fixes
### Acceptance
1. no route is scene-name hardcoded
2. route priority is based on 102-scene reuse, not anecdotal debugging order
## Phase 4: Require Full 102 Rematerialization
### Objective
Ensure that hardened rules are propagated into the final generated skill inventory.
### Tasks
1. define full 102 rematerialization as mandatory after route implementation
2. define materialization outputs that must be refreshed
3. define how the canonical final skill bundle is replaced
### Deliverables
1. full rematerialization refresh plan
### Acceptance
1. no runtime-semantics hardening route may be considered complete without rematerialization
## Phase 5: Require Validation Refresh
### Objective
Refresh downstream validation after rematerialization so improved rules are measured end-to-end.
### Required refresh layers
1. deterministic invocation readiness
2. natural-language parameter readiness
3. static validation
4. direct mock execution
5. pseudo-production handoff refresh
### Deliverables
1. validation refresh plan
### Acceptance
1. the new final 102-skill bundle is revalidated before more inner-network testing
## Immediate Next Output
This parent plan should immediately lead to a new bounded child plan:
- `2026-04-20-generated-scene-source-evidence-cross-scan-plan.md`
That child plan should perform the actual source cross-scan over the 102 original scenes.
## Stop Statement
Stop after publishing this parent plan and its design.
Do not execute the source cross-scan or implementation inside this plan.

@@ -0,0 +1,143 @@
# Generated Scene Source-First Runtime Semantics Ledger Plan
> Date: 2026-04-20
> Status: Draft
> Parent roadmap:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-first-runtime-semantics-hardening-plan.md`
> Parent design:
> - `docs/superpowers/specs/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-design.md`
> Upstream completed step:
> - `docs/superpowers/plans/2026-04-20-generated-scene-source-evidence-cross-scan-plan.md`
## Plan Intent
Build the full source-first runtime-semantics ledger for the current 102-scene set.
This stage exists to convert the completed source cross-scan into a reusable comparison ledger before any analyzer/generator hardening route is defined.
## Fixed Inputs
1. `tests/fixtures/generated_scene/generated_scene_source_evidence_cross_scan_2026-04-20.json`
2. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
3. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
5. `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`
6. `sweep-030-scene` inner-network findings already established in prior discussion and analysis assets
## Scope Guardrails
Allowed:
1. read source cross-scan outputs
2. read current generated skills and references
3. compare source evidence with generated evidence
4. assign gap classes, risk levels, and route hints
5. publish ledger JSON and report
Forbidden:
1. no implementation changes in `src/`
2. no manifest or script edits
3. no rematerialization
4. no validation reruns
5. no execution-board update
6. no inner-network testing
## Phase 0: Freeze Ledger Inputs
### Objective
Make the cross-scan asset and current generated-skill assets the only valid inputs for this ledger stage.
### Tasks
1. verify the cross-scan JSON parses
2. verify all 102 scenes are represented
3. verify the current generated skill root is readable
### Acceptance
1. the ledger stage starts from a stable 102-scene evidence base
## Phase 1: Build Per-Scene Comparison Records
### Objective
For each scene, merge source evidence with generated-skill evidence into one comparison record.
### Tasks
1. load source evidence flags, evidence files, alias samples, request tokens, and runtime URL samples
2. read current scene-level generated manifests/references as needed
3. summarize generated-side evidence for:
- invocation aliases
- dictionaries
- parameter defaults
- request mapping
- runtime URL roles
4. write one comparison record per scene
### Acceptance
1. all 102 scenes have both source-side and generated-side summaries
## Phase 2: Assign Gap Classes and Risk Levels
### Objective
Convert comparison records into a stable runtime-semantics risk ledger.
### Tasks
1. assign `gaps` from the fixed five-class taxonomy
2. assign `riskLevel = high|medium|low`
3. assign:
- `generatorLevelGap`
- `runtimeOnlyResidual`
4. record `comparisonNotes`
5. assign `recommendedFixRoutes` from the route hints
### Acceptance
1. every scene has `gaps`
2. every scene has `riskLevel`
3. every scene has `recommendedFixRoutes`
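
A rough sketch of how a comparison record could be turned into a ledger record. The gap-class keys, the risk thresholds, and the route naming are all assumptions made for illustration; the plan fixes only the five-class taxonomy and the `high|medium|low` scale. The `generatorLevelGap` and `runtimeOnlyResidual` markers are omitted because their decision rule is not stated here.

```python
# Hypothetical keys for the five gap classes.
GAP_CLASSES = [
    "alias",
    "dictionary",
    "parameterDefault",
    "requestMapping",
    "runtimeUrl",
]


def build_ledger_record(scene_id, source_flags, generated_flags):
    """A gap exists when source evidence is present but generated evidence is not."""
    gaps = [
        g for g in GAP_CLASSES
        if source_flags.get(g) and not generated_flags.get(g)
    ]
    # Assumed thresholds: three or more gaps is high, any gap is medium.
    if len(gaps) >= 3:
        risk = "high"
    elif gaps:
        risk = "medium"
    else:
        risk = "low"
    return {
        "sceneId": scene_id,
        "gaps": gaps,
        "riskLevel": risk,
        "recommendedFixRoutes": [f"{g}-hardening" for g in gaps],
    }
```

Every record carries all three acceptance fields even when `gaps` is empty, so downstream counting never has to special-case missing keys.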
## Phase 3: Aggregate Route-Level Signals
### Objective
Produce route-level reuse signals from the scene ledger so the next stage can design bounded hardening routes.
### Tasks
1. count scenes carrying each gap class
2. count scenes marked `generatorLevelGap`
3. count scenes marked `runtimeOnlyResidual`
4. identify the highest-density reusable route clusters
### Acceptance
1. the ledger can drive downstream route prioritization without returning to anecdotal scene debugging
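
The aggregation tasks above amount to a few counts over the ledger. A minimal sketch, assuming each ledger record is a dict with a `gaps` list and optional boolean `generatorLevelGap` / `runtimeOnlyResidual` fields (the summary key names are hypothetical):

```python
from collections import Counter


def aggregate_route_signals(ledger):
    """Summarize per-gap-class reuse across all ledger records."""
    gap_counts = Counter(g for rec in ledger for g in rec.get("gaps", []))
    return {
        "sceneCount": len(ledger),
        "gapClassCounts": dict(gap_counts),
        "generatorLevelGapCount": sum(
            1 for rec in ledger if rec.get("generatorLevelGap")
        ),
        "runtimeOnlyResidualCount": sum(
            1 for rec in ledger if rec.get("runtimeOnlyResidual")
        ),
        # Gap classes ordered from most to least affected scenes.
        "routesByDensity": [g for g, _ in gap_counts.most_common()],
    }
```

Ordering by scene count is one simple reading of "highest-density reusable route clusters"; a weighted score could replace it if risk level should also influence priority.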
## Deliverables
1. `tests/fixtures/generated_scene/generated_scene_source_first_runtime_semantics_ledger_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-generated-scene-source-first-runtime-semantics-ledger-report.md`
## Expected Coverage
The ledger should represent:
1. all 102 scenes
2. all five canonical gap classes
3. source-first route hints derived from the completed cross-scan
## Stop Statement
Stop after:
1. publishing the ledger JSON
2. publishing the ledger report
3. summarizing the highest-reuse hardening routes
Do not yet create implementation route plans inside this ledger plan.

@@ -0,0 +1,110 @@
# Scene Skill 102 Deterministic Invocation Readiness Plan
> Date: 2026-04-20
> Design: `2026-04-20-scene-skill-102-deterministic-invocation-readiness-design.md`
## Plan Intent
Make the materialized scene skills ready for sgClaw deterministic invocation using natural-language instructions ending with `。。。`.
This plan does not prove production execution. It only prepares and verifies registry/dispatch readiness.
## Fixed Inputs
1. `examples/scene_skill_102_final_materialization_2026-04-19`
2. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
3. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Allowed Files
1. `examples/scene_skill_102_final_materialization_2026-04-19/skills/*/scene.toml`
2. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_samples_2026-04-20.json`
4. `docs/superpowers/reports/2026-04-20-scene-skill-102-deterministic-invocation-readiness-report.md`
## Forbidden Files
1. `src/compat/scene_platform/dispatch.rs`
2. `src/compat/scene_platform/resolvers.rs`
3. `src/generated_scene/analyzer.rs`
4. `src/generated_scene/generator.rs`
5. generated `scripts/*`
6. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Phase 0: Freeze Invocation Readiness Boundary
### Tasks
1. Confirm final materialization root exists.
2. Confirm human-readable index exists.
3. Confirm this plan excludes browser execution and runtime changes.
### Acceptance Criteria
1. Scope is deterministic invocation readiness only.
2. `sweep-012-scene` remains outside complete-package normalization.
## Phase 1: Normalize Deterministic Manifest Metadata
### Tasks
1. For each complete package, set `[deterministic].suffix = "。。。"`.
2. Preserve scene id, skill, tool, bootstrap, params, artifact, and postprocess sections.
3. Generate include keywords from:
- full scene name;
- meaningful scene-name tokens;
- archetype/family hints when available.
4. Keep exclude keywords.
### Acceptance Criteria
1. All complete packages use suffix `。。。`.
2. Every complete package has non-empty include keywords.
3. Skill directories and scripts are unchanged.
## Phase 2: Build Invocation Samples
### Tasks
For each complete package, generate at least:
1. full-name sample: `<sceneName>。。。`
2. keyword sample: `<bestKeyword>。。。`
3. parameterized sample when params exist.
### Acceptance Criteria
1. Sample asset contains all complete packages.
2. Failed package is listed as excluded.
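
A small sketch of sample generation under stated assumptions: the "best" keyword is taken to be the longest include keyword, and parameterized samples simply append parameter values to the scene name. Neither rule is confirmed by this plan, and `build_samples` is a hypothetical helper.

```python
SUFFIX = "。。。"  # deterministic invocation suffix named in this plan


def build_samples(scene_name, keywords, params=None):
    """Return the invocation samples for one complete package."""
    samples = [{"kind": "full-name", "input": scene_name + SUFFIX}]
    if keywords:
        # Assumption: the longest keyword is the most specific one.
        best = max(keywords, key=len)
        samples.append({"kind": "keyword", "input": best + SUFFIX})
    if params:
        # Assumption: parameter values are appended in declaration order.
        rendered = " ".join(str(value) for value in params.values())
        samples.append(
            {"kind": "parameterized", "input": f"{scene_name} {rendered}{SUFFIX}"}
        )
    return samples
```

Scenes without params get two samples and parameterized scenes get three, which matches the "at least" wording of the task list.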
## Phase 3: Dispatch Dry-Run
### Tasks
1. Run registry-backed dispatch checks without browser execution.
2. Verify full-name sample selects the expected scene.
3. Record ambiguous or unsupported dispatch results.
4. Record required-param prompts separately from dispatch misses.
### Acceptance Criteria
1. Every complete package has a dispatch result.
2. Results distinguish selected, prompt, ambiguous, and no-match.
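
The four result classes can be illustrated with a toy matcher. This is not sgClaw's dispatch logic: the registry shape, the substring matching, and the way required params are detected are all simplifying assumptions, and `classify_dispatch` is a hypothetical name.

```python
def classify_dispatch(instruction, registry, suffix="。。。"):
    """registry: list of dicts with sceneId, include, exclude, requiredParams.

    requiredParams maps a param name to the values the resolver accepts
    (an assumed shape for this sketch).
    """
    if not instruction.endswith(suffix):
        return {"status": "no-match", "reason": "missing-suffix"}
    text = instruction[: -len(suffix)]
    hits = [
        scene for scene in registry
        if any(k in text for k in scene["include"])
        and not any(k in text for k in scene.get("exclude", []))
    ]
    if not hits:
        return {"status": "no-match", "reason": "no-keyword"}
    if len(hits) > 1:
        return {
            "status": "ambiguous",
            "candidates": [scene["sceneId"] for scene in hits],
        }
    scene = hits[0]
    missing = [
        name
        for name, values in scene.get("requiredParams", {}).items()
        if not any(v in text for v in values)
    ]
    if missing:
        # The scene was selected; only parameters are missing.
        return {"status": "prompt", "sceneId": scene["sceneId"], "missing": missing}
    return {"status": "selected", "sceneId": scene["sceneId"]}
```

Keeping `prompt` separate from `no-match` mirrors task 4: a required-param prompt means dispatch succeeded and must not be counted as a dispatch miss.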
## Phase 4: Publish Readiness Report
### Tasks
1. Publish readiness JSON.
2. Publish invocation sample JSON.
3. Publish superpowers report.
### Acceptance Criteria
1. Report states deterministic-ready count.
2. Report states gap count and gap categories.
3. Report states whether runtime dispatch changes are needed.
## Stop Statement
Stop after readiness assets and report are published. Do not start browser execution, static validation, production validation, or runtime dispatch implementation under this plan.

@@ -0,0 +1,112 @@
# Scene Skill 102 Full Direct Mock Execution Plan
> Date: 2026-04-20
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-20-scene-skill-102-full-direct-mock-execution-design.md`
## Plan Intent
Run all `102` final materialized scene skill scripts through a local direct mock runtime.
This plan expands beyond representative harness execution, but remains fully mock-only and local.
## Fixed Inputs
1. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
2. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
3. `tests/fixtures/generated_scene/scene_skill_102_static_validation_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_harness_results_2026-04-20.json`
## Planned Outputs
1. `tests/fixtures/generated_scene/scene_skill_102_full_direct_mock_execution_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-scene-skill-102-full-direct-mock-execution-report.md`
## Allowed Files
1. new direct mock runner under `tests/`
2. `tests/fixtures/generated_scene/scene_skill_102_full_direct_mock_execution_2026-04-20.json`
3. `docs/superpowers/reports/2026-04-20-scene-skill-102-full-direct-mock-execution-report.md`
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `examples/scene_skill_102_final_materialization_2026-04-19/skills/**`
5. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Phase 0: Freeze Direct Mock Boundary
### Tasks
1. Confirm representative mock harness is complete.
2. Confirm this plan does not mutate generated skill packages.
3. Confirm this plan does not use real network, browser, or credentials.
### Acceptance Criteria
1. direct mock starts from final materialized skills
2. generated skills remain unchanged
## Phase 1: Build Direct Mock Runner
### Tasks
1. load the `102` scene index
2. locate each generated script
3. reuse fake runtime dependencies by archetype
4. call `buildBrowserEntrypointResult`
5. capture artifact status, row count, failure reason, and mock request log
### Acceptance Criteria
1. every scene is attempted
2. no single scene failure aborts the full run
3. no real request is sent
## Phase 2: Execute Direct Mock For 102
### Tasks
1. run the direct mock runner
2. write per-scene direct mock result
3. classify each scene as:
- `direct-mock-pass`
- `direct-mock-partial`
- `direct-mock-fail`
### Acceptance Criteria
1. output record count is `102`
2. each failure has a named reason
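
The run loop and classification from Phases 1 and 2 can be sketched as below. The generated scripts are JavaScript, so this Python version only models the control flow: each scene is represented by a zero-argument callable standing in for the entrypoint, and the pass/partial/fail rule is an assumption rather than the plan's definition.

```python
def run_direct_mock(scenes):
    """scenes: dict of scene id -> callable returning {"rows": [...], "artifact": bool}.

    Each callable is a stand-in for a generated entrypoint run against fakes.
    """
    results = []
    for scene_id, entrypoint in scenes.items():
        record = {"sceneId": scene_id, "rowCount": 0, "failureReason": None}
        try:
            output = entrypoint()
            rows = output.get("rows", [])
            record["rowCount"] = len(rows)
            if output.get("artifact") and rows:
                record["status"] = "direct-mock-pass"
            elif output.get("artifact"):
                record["status"] = "direct-mock-partial"
                record["failureReason"] = "empty-rows"
            else:
                record["status"] = "direct-mock-fail"
                record["failureReason"] = "no-artifact"
        except Exception as exc:
            # One scene failing must not abort the remaining scenes.
            record["status"] = "direct-mock-fail"
            record["failureReason"] = f"{type(exc).__name__}: {exc}"
        results.append(record)
    return results
```

Catching the exception per scene is what guarantees that the output record count equals the input scene count.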
## Phase 3: Publish Report
### Tasks
1. summarize direct mock pass/fail
2. summarize results by archetype
3. identify remaining mock-only blockers
4. recommend whether pseudo-production batch selection should start
### Acceptance Criteria
1. report does not claim production execution
2. report separates mock pass from production pass
## Completion Criteria
This plan is complete when:
1. all `102` scenes have direct mock results
2. JSON asset is published
3. report is published
4. generated skill packages remain unchanged
## Stop Statement
Stop after publishing direct mock execution results and report.
Do not start pseudo-production batch selection under this plan.

@@ -0,0 +1,279 @@
# Scene Skill 102 Mock Runtime Harness Implementation Plan
> Date: 2026-04-20
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-20-scene-skill-102-mock-runtime-harness-implementation-design.md`
> Input Matrix: `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json`
## Plan Intent
Implement and execute bounded mock runtime harnesses for representative generated scene skills.
This plan validates generated script control flow under fake dependencies. It does not validate production access, real data correctness, or browser-integrated host behavior.
## Fixed Inputs
1. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
2. `tests/fixtures/generated_scene/scene_skill_102_static_validation_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_dispatch_dry_run_validation_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json`
5. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_readiness_2026-04-20.json`
## Planned Outputs
1. `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_harness_results_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-scene-skill-102-mock-runtime-harness-report.md`
## Allowed Files
1. new mock harness files under `tests/` or `tests/fixtures/generated_scene/`
2. `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_harness_results_2026-04-20.json`
3. `docs/superpowers/reports/2026-04-20-scene-skill-102-mock-runtime-harness-report.md`
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `examples/scene_skill_102_final_materialization_2026-04-19/skills/**`
5. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Workstreams
1. `WS1` Mock harness foundation
2. `WS2` Mainline fetch archetype harnesses
3. `WS3` Small bucket harnesses
4. `WS4` Boundary/runtime harnesses
5. `WS5` Integrated result reporting
## Phase 0: Freeze Mock Runtime Boundary
### Objective
Freeze mock validation as a non-production, non-browser, non-network stage.
### Tasks
1. Confirm static validation is `102 / 102`.
2. Confirm deterministic dispatch dry-run is `102 / 102`.
3. Confirm this plan does not mutate generated skill packages.
4. Confirm this plan does not require production credentials or network access.
### Deliverables
1. baseline section in final mock runtime harness report
### Acceptance Criteria
1. no production environment is accessed
2. no generated skill is modified
3. no official board status is changed
## Phase 1: Mock Harness Foundation
### Objective
Create the shared fake runtime primitives used by all representative harnesses.
### Tasks
1. define fake `fetch`
2. define fake browser DOM surface
3. define fake artifact writer
4. define fake host bridge callback surface
5. define fake local-doc service surface
6. define common result schema:
- `script-load-pass`
- `mock-runtime-pass`
- `mock-runtime-partial`
- `mock-runtime-fail`
### Deliverables
1. shared mock harness implementation
### Acceptance Criteria
1. harness foundation does not call real network
2. harness foundation can run without browser or credentials
3. harness foundation can load a generated script from the final materialization root
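
As an illustration of the fake `fetch` primitive, the sketch below returns canned payloads and records every call so that request order can be checked afterwards. It is written in Python for brevity; the real harness would need an equivalent in the scripts' own runtime, and `FakeFetch` is a hypothetical name.

```python
class FakeFetch:
    """Canned-response stand-in for fetch; never touches the network."""

    def __init__(self, responses):
        self.responses = responses  # url -> canned payload
        self.request_log = []       # every call, in order

    def __call__(self, url, payload=None):
        self.request_log.append({"url": url, "payload": payload})
        if url not in self.responses:
            # Fail loudly so an unexpected request becomes a named failure.
            raise LookupError(f"no canned response for {url}")
        return self.responses[url]
```

Raising on unknown URLs, rather than returning an empty payload, makes an unplanned request visible in the result record instead of silently passing.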
## Phase 2: Route 1 - Paginated Enrichment Harness
### Objective
Validate the largest archetype bucket first.
### Fixed Representatives
1. `sweep-001-scene`
2. `sweep-002-scene`
3. `sweep-003-scene`
### Tasks
1. load each representative script
2. provide fake primary page response
3. provide fake enrichment response
4. verify expected request order where observable
5. verify artifact metadata or structured result is produced
### Deliverables
1. paginated enrichment mock result records
### Acceptance Criteria
1. each representative receives a `mock-runtime-*` status
2. no real request is sent
3. failures include named failure reason
## Phase 3: Route 2 - G2 And G1-E Fetch Harnesses
### Objective
Validate fetch-based mainline small buckets.
### Fixed Representatives
`multi_mode_request`:
1. `sweep-020-scene`
2. `sweep-023-scene`
3. `sweep-030-scene`
`single_request_enrichment`:
1. `sweep-013-scene`
2. `sweep-016-scene`
3. `sweep-068-scene`
### Tasks
1. run representative scripts with fake fetch
2. verify mode/request paths for multi-mode scenes
3. verify enrichment path for single-request enrichment scenes
4. record pass/fail reason
### Deliverables
1. multi-mode request mock result records
2. single-request enrichment mock result records
### Acceptance Criteria
1. each representative receives a `mock-runtime-*` status
2. real-sample or production execution is not started
## Phase 4: Route 3 - Inventory And Page-State Harnesses
### Objective
Validate the small specialized buckets.
### Fixed Representatives
`multi_endpoint_inventory`:
1. `sweep-084-scene`
2. `sweep-085-scene`
`page_state_eval`:
1. `sweep-066-scene`
2. `sweep-094-scene`
### Tasks
1. run multi-endpoint representatives with fake endpoint responses
2. run page-state representatives with fake DOM state
3. record pass/fail reason
### Deliverables
1. inventory mock result records
2. page-state mock result records
### Acceptance Criteria
1. each representative receives a `mock-runtime-*` status
2. no host browser is required
## Phase 5: Route 4 - Local-Doc And Host-Bridge Harnesses
### Objective
Validate boundary runtime families with fake local-doc and fake host-bridge surfaces.
### Fixed Representatives
`local_doc_pipeline`:
1. `sweep-012-scene`
2. `sweep-017-scene`
3. `sweep-019-scene`
`host_bridge_workflow`:
1. `sweep-007-scene`
2. `sweep-009-scene`
3. `sweep-010-scene`
### Tasks
1. run local-doc representatives with fake local document query and export responses
2. run host-bridge representatives with fake action and callback completion responses
3. classify boundary failures as mock harness gaps or script contract gaps
### Deliverables
1. local-doc mock result records
2. host-bridge mock result records
### Acceptance Criteria
1. no real host bridge is invoked
2. no real local document service is invoked
3. failures are explicitly categorized
## Phase 6: Integrated Mock Runtime Report
### Objective
Publish representative execution results and propagated matrix interpretation.
### Tasks
1. write `scene_skill_102_mock_runtime_harness_results_2026-04-20.json`
2. summarize representative pass/fail by archetype
3. summarize which non-representative scenes are covered only by representative inference
4. identify which archetypes still require direct mock expansion
5. recommend whether to proceed to pseudo-production batch planning
### Deliverables
1. `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_harness_results_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-scene-skill-102-mock-runtime-harness-report.md`
### Acceptance Criteria
1. report distinguishes representative execution from propagated coverage
2. report does not claim production execution
3. report does not update official board
## Completion Criteria
This plan is complete when:
1. every fixed representative has a mock runtime result record
2. integrated mock runtime results JSON is published
3. mock runtime report is published
4. generated skill packages remain unchanged
5. no real browser or production environment was used
## Stop Statement
Stop after publishing mock runtime harness results and report.
Do not start pseudo-production or real-environment validation under this plan.

@@ -0,0 +1,104 @@
# Scene Skill 102 Natural-Language Parameter Readiness Plan
> Date: 2026-04-20
> Design: `2026-04-20-scene-skill-102-natural-language-parameter-readiness-design.md`
## Plan Intent
Build a 102-scene natural-language invocation parameter readiness view before pseudo-production testing.
This plan answers which skills should be invoked with query conditions such as organization and period, which skills currently only support scene-keyword deterministic selection, and which required-param skills have resolver gaps.
## Fixed Inputs
1. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
2. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_full_direct_mock_execution_2026-04-20.json`
## Allowed Outputs
1. `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
2. `tests/fixtures/generated_scene/scene_skill_102_natural_language_invocation_samples_2026-04-20.json`
3. `docs/superpowers/reports/2026-04-20-scene-skill-102-natural-language-parameter-readiness-report.md`
## Forbidden Files
1. `src/compat/scene_platform/dispatch.rs`
2. `src/compat/scene_platform/resolvers.rs`
3. `src/generated_scene/analyzer.rs`
4. `src/generated_scene/generator.rs`
5. `examples/scene_skill_102_final_materialization_2026-04-19/skills/*`
6. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Phase 0: Freeze Boundary
### Tasks
1. Confirm final skill count is `102`.
2. Confirm this plan is analysis-only.
3. Confirm no browser, network, host bridge, or production execution is performed.
### Acceptance Criteria
1. No generated skill files are modified.
2. No runtime source files are modified.
## Phase 1: Parameter Manifest Scan
### Tasks
1. Read each `SKILL.toml` for scene name and archetype.
2. Read each `scene.toml` for deterministic suffix, params, and resolver declarations.
3. Record required params and resolver types.
4. Check resolver resources such as dictionary files.
### Acceptance Criteria
1. All `102` scenes have one parameter scan record.
2. Required-param scenes are explicitly identified.
## Phase 2: Readiness Classification
### Tasks
1. Mark scenes with supported, populated resolver resources as `parameter-ready`.
2. Mark scenes with empty or missing resolver resources as `parameter-gap`.
3. Mark no-param scenes as `parameter-not-required`.
4. Mark no-param scenes with likely filter words as `parameter-implicit-risk`.
### Acceptance Criteria
1. Every scene has exactly one primary readiness class.
2. Resolver gaps list concrete file or config reasons.
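
The four readiness classes can be expressed as a single decision function. The scan-record shape (`requiredParams` entries with `name`, `resource`, and `entries`) and the filter-word list are assumptions made for this sketch; the plan does not define either.

```python
# Hypothetical filter words hinting that a no-param scene still expects conditions.
FILTER_WORDS = ("monthly", "by unit")


def classify_readiness(scene, filter_words=FILTER_WORDS):
    """Return exactly one primary readiness class plus concrete gap reasons."""
    required = scene.get("requiredParams", [])
    if required:
        reasons = [
            f"{p['name']}: resolver resource {p.get('resource')} is empty or missing"
            for p in required
            if not p.get("entries")
        ]
        if reasons:
            return {"class": "parameter-gap", "reasons": reasons}
        return {"class": "parameter-ready", "reasons": []}
    name = scene.get("sceneName", "")
    if any(word in name for word in filter_words):
        return {"class": "parameter-implicit-risk", "reasons": []}
    return {"class": "parameter-not-required", "reasons": []}
```

Because the branches are mutually exclusive, every scene receives exactly one primary class, which is the first acceptance criterion.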
## Phase 3: Invocation Sample Generation
### Tasks
1. Generate minimal invocation samples for every scene.
2. Generate parameterized samples for scenes with required params.
3. Generate cautionary samples for implicit-risk scenes.
4. Make clear when organization or period wording is not currently parsed.
### Acceptance Criteria
1. Sample JSON covers all `102` scenes.
2. Parameterized samples do not assume that unresolved resolver gaps are already resolved.
## Phase 4: Publish Report
### Tasks
1. Write readiness JSON.
2. Write invocation sample JSON.
3. Write superpowers report with counts, gaps, and next route.
### Acceptance Criteria
1. Report explains why a bare scene-name invocation (`场景名。。。`, i.e. `<sceneName>。。。`) is insufficient for parameterized scenes.
2. Report states whether pseudo-production batch input should be regenerated.
3. Stop after report; do not start implementation.
## Stop Statement
Stop after readiness assets and report are published. Do not edit runtime, generated skills, board assets, or pseudo-production execution records under this plan.

@@ -0,0 +1,124 @@
# Scene Skill 102 Parameter Dictionary And Invocation Template Normalization Plan
> Date: 2026-04-20
> Design: `2026-04-20-scene-skill-102-parameter-dictionary-template-normalization-design.md`
## Plan Intent
Close the natural-language parameter readiness gap for the fixed `10` required-param scene skills and refresh pseudo-production invocation templates.
## Fixed Input Bucket
The only input bucket is the `10` scenes marked `parameter-gap` in:
`tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
## Allowed Files
1. `examples/scene_skill_102_final_materialization_2026-04-19/skills/{fixed-10}/references/org-dictionary.json`
2. `tests/fixtures/generated_scene/scene_skill_102_natural_language_parameter_readiness_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_natural_language_invocation_samples_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_execution_handoff_2026-04-20.json`
5. `tests/fixtures/generated_scene/scene_skill_102_parameter_dictionary_template_normalization_2026-04-20.json`
6. `docs/superpowers/reports/2026-04-20-scene-skill-102-parameter-dictionary-template-normalization-report.md`
## Forbidden Files
1. `src/compat/scene_platform/dispatch.rs`
2. `src/compat/scene_platform/resolvers.rs`
3. `src/generated_scene/analyzer.rs`
4. `src/generated_scene/generator.rs`
5. `examples/scene_skill_102_final_materialization_2026-04-19/skills/*/scripts/*`
6. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Phase 0: Freeze Scope
### Tasks
1. Confirm the fixed `10` required-param scenes.
2. Confirm all current gaps are empty org dictionaries.
3. Confirm no runtime code changes are needed.
### Acceptance Criteria
1. No non-param scene is changed.
2. No browser or production execution is started.
## Phase 1: Populate Starter Organization Dictionaries
### Tasks
1. Write the pseudo-production starter dictionary into each of the fixed `10` skills.
2. Use the already-tested aliases:
- `兰州公司`
- `兰州供电公司`
- `国网兰州供电公司`
- `城关供电分公司`
- `城关分公司`
- `天水公司`
- `天水供电公司`
- `国网天水供电公司`
3. Mark dictionary provenance as starter, not full production.
### Acceptance Criteria
1. All fixed `10` dictionaries are non-empty arrays.
2. Each dictionary contains alias coverage for `兰州公司`.
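
A sketch of the starter dictionary and its acceptance check. The entry schema (`canonical`, `aliases`, `provenance`) is hypothetical, and grouping the eight aliases into three organizations is an inference from their names rather than something this plan states.

```python
import json

# Assumed grouping of the listed aliases by organization.
STARTER_GROUPS = [
    ["兰州公司", "兰州供电公司", "国网兰州供电公司"],
    ["城关供电分公司", "城关分公司"],
    ["天水公司", "天水供电公司", "国网天水供电公司"],
]


def build_starter_dictionary(groups=STARTER_GROUPS):
    """Build a non-empty array of entries marked as starter provenance."""
    return [
        {
            # Assumption: the longest alias serves as the canonical name.
            "canonical": max(group, key=len),
            "aliases": list(group),
            "provenance": "starter",
        }
        for group in groups
    ]


def meets_acceptance(dictionary):
    """Non-empty array that covers the alias 兰州公司."""
    return (
        isinstance(dictionary, list)
        and len(dictionary) > 0
        and any("兰州公司" in entry.get("aliases", []) for entry in dictionary)
    )


def to_json(dictionary):
    # ensure_ascii=False keeps the Chinese aliases readable in the file.
    return json.dumps(dictionary, ensure_ascii=False, indent=2)
```

Carrying `provenance` on every entry lets later stages tell starter data apart from a full production unit tree.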
## Phase 2: Refresh Parameter Readiness
### Tasks
1. Re-scan all `102` skills.
2. Recompute parameter readiness.
3. Verify the fixed `10` move to `parameter-ready`.
4. Keep implicit-risk classification for no-param scenes.
### Acceptance Criteria
1. `parameter-gap = 0`.
2. `parameter-ready = 10`.
3. `total scenes = 102`.
## Phase 3: Refresh Invocation Templates
### Tasks
1. Generate parameterized samples for the fixed `10`.
2. Ensure samples include a concrete period, e.g. `月累计 2026-03` (monthly cumulative, March 2026).
3. Ensure samples keep the `。。。` suffix.
### Acceptance Criteria
1. All fixed `10` have parameterized sample input.
2. No-param scenes keep minimal invocation samples.
## Phase 4: Refresh Pseudo-Production Handoff
### Tasks
1. Update the selected pseudo-production handoff entries that are in the fixed `10`.
2. Replace bare scene-name inputs with parameterized inputs.
3. Preserve credential policy and evidence collection fields.
### Acceptance Criteria
1. Selected required-param scenes no longer use bare `场景名。。。` in handoff.
2. No credentials are written to the repository.
## Phase 5: Publish Normalization Report
### Tasks
1. Publish normalization JSON.
2. Publish superpowers report.
3. State remaining limits explicitly.
### Acceptance Criteria
1. Report states that dictionaries are starter dictionaries, not complete production unit trees.
2. Report states the next step: a pseudo-production execution preparation refresh.
## Stop Statement
Stop after dictionaries, readiness assets, invocation samples, handoff, and report are refreshed. Do not run browser, production, or runtime implementation work under this plan.

@@ -0,0 +1,111 @@
# Scene Skill 102 Pseudo-Production Batch Execution Plan
> Date: 2026-04-20
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-20-scene-skill-102-pseudoprod-batch-execution-design.md`
## Plan Intent
Run the prepared 10-scene pseudo-production batch in an operator-provided environment and record structured results.
This plan is bounded to execution and evidence collection for the selected 10 scenes.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_execution_handoff_2026-04-20.json`
2. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_evidence_checklist_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_execution_record_template_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_batch_selection_2026-04-20.json`
## Planned Outputs
1. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_batch_execution_results_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-scene-skill-102-pseudoprod-batch-execution-report.md`
## Allowed Files
1. planned execution result JSON
2. planned execution report
3. redacted evidence summaries if explicitly generated
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `examples/scene_skill_102_final_materialization_2026-04-19/skills/**`
5. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
6. any credential, token, cookie, or secret file
## Phase 0: Confirm Environment Readiness
### Tasks
1. Confirm operator-provided browser/runtime environment exists.
2. Confirm network/session access is provided outside repository.
3. Confirm evidence output location.
4. Confirm redaction rules.
### Acceptance Criteria
1. No credentials are stored in repository.
2. Execution does not start unless environment readiness is confirmed externally.
## Phase 1: Execute Selected Scenes
### Tasks
For each selected scene:
1. use the deterministic invocation input ending with `。。。`
2. execute through sgClaw runtime or agreed quasi-production host
3. collect console log
4. collect network summary
5. capture screenshot if target page is required
6. capture exported artifact if produced
7. record final result state
### Acceptance Criteria
1. every selected scene has one execution record
2. every record has exactly one result state
3. failures use the allowed taxonomy
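
The three acceptance criteria above lend themselves to an automated check. This is a sketch only: the record field names (`sceneId`, `resultState`, `failureCategory`) are hypothetical, the result states follow the pass/blocker/mismatch/runtime-error summary used in Phase 3, and the allowed taxonomy is assumed to be the seven-category list from the static, mock, and pseudo-production validation plan.

```python
RESULT_STATES = {"pass", "blocker", "mismatch", "runtime-error"}
FAILURE_TAXONOMY = {
    "login-blocked", "network-blocked", "host-bridge-blocked", "data-mismatch",
    "artifact-mismatch", "environment-unavailable", "runtime-error",
}


def validate_records(selected_scene_ids, records):
    """Return human-readable problems; an empty list means all criteria hold."""
    by_scene = {}
    for record in records:
        by_scene.setdefault(record["sceneId"], []).append(record)

    problems = []
    for scene_id in selected_scene_ids:
        scene_records = by_scene.get(scene_id, [])
        # Criterion 1: exactly one execution record per selected scene.
        if len(scene_records) != 1:
            problems.append(f"{scene_id}: expected 1 record, found {len(scene_records)}")
            continue
        record = scene_records[0]
        state = record.get("resultState")
        # Criterion 2: one result state, drawn from the allowed set.
        if state not in RESULT_STATES:
            problems.append(f"{scene_id}: unknown result state {state!r}")
            continue
        # Criterion 3: non-pass results must name an allowed failure category.
        category = record.get("failureCategory")
        if state != "pass" and category not in FAILURE_TAXONOMY:
            problems.append(
                f"{scene_id}: failure category {category!r} is outside the allowed taxonomy"
            )
    return problems
```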
## Phase 2: Redact And Normalize Evidence
### Tasks
1. redact credentials, cookies, tokens, Authorization headers, and private data
2. normalize evidence paths
3. confirm each evidence checklist item is present or explicitly unavailable
### Acceptance Criteria
1. no secret material enters repository output
2. missing evidence has a reason
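
A redaction pass over collected logs might look like the sketch below. The patterns are illustrative assumptions that cover only header-style and `key=value` style secrets; they are not a complete rule set, and private business data would still need scene-specific handling agreed with the operator.

```python
import re

# Header lines whose whole value is secret material.
_HEADER_PATTERN = re.compile(
    r"(?im)^([ \t]*(?:authorization|cookie|set-cookie)[ \t]*:[ \t]*).*$"
)
# key=value or key: value pairs such as token=..., access_token=..., password: ...
_PAIR_PATTERN = re.compile(
    r"(?i)((?:token|password|secret)[ \t]*[=:][ \t]*)[^\s&;,]+"
)


def redact(text):
    """Replace secret values with a fixed marker while keeping the key visible."""
    text = _HEADER_PATTERN.sub(r"\g<1>[REDACTED]", text)
    text = _PAIR_PATTERN.sub(r"\g<1>[REDACTED]", text)
    return text


sample_log = "Authorization: Bearer abc.def\nCookie: sid=42\nGET /x?token=abc123&page=1"
redacted_log = redact(sample_log)
```

Keeping the key name in the output makes it possible to confirm afterwards that a field was present and redacted, rather than silently dropped.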
## Phase 3: Publish Execution Results
### Tasks
1. write execution results JSON
2. write execution report
3. summarize pass/blocker/mismatch/runtime-error counts
4. list follow-up blockers
### Acceptance Criteria
1. selected scene count remains 10
2. report does not claim full production certification
3. official board is not updated under this plan
## Completion Criteria
This plan is complete when all 10 selected scenes have structured execution records and a redacted execution report is published.
## Stop Statement
Stop after publishing execution results and report.
Do not update official board status under this plan.

@@ -0,0 +1,114 @@
# Scene Skill 102 Pseudo-Production Batch Execution Preparation Plan
> Date: 2026-04-20
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-20-scene-skill-102-pseudoprod-batch-execution-preparation-design.md`
## Plan Intent
Prepare the first pseudo-production batch for execution without executing it.
This plan creates handoff and evidence templates for the 10 selected scenes.
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_batch_selection_2026-04-20.json`
2. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
3. `tests/fixtures/generated_scene/scene_skill_102_full_direct_mock_execution_2026-04-20.json`
## Planned Outputs
1. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_execution_handoff_2026-04-20.json`
2. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_execution_record_template_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_evidence_checklist_2026-04-20.json`
4. `docs/superpowers/reports/2026-04-20-scene-skill-102-pseudoprod-batch-execution-preparation-report.md`
## Allowed Files
1. planned JSON assets
2. planned report
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `examples/scene_skill_102_final_materialization_2026-04-19/skills/**`
5. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Phase 0: Freeze Preparation Boundary
### Tasks
1. Confirm selected batch is exactly 10 scenes.
2. Confirm all 10 are direct-mock-pass.
3. Confirm no browser/network execution happens in this plan.
### Acceptance Criteria
1. No production or quasi-production target is invoked.
2. No credentials are requested or stored.
## Phase 1: Build Environment Handoff
### Tasks
1. List required environment inputs for the operator.
2. Map required dependencies per selected scene.
3. Define credential handling rule: outside repository only.
### Acceptance Criteria
1. Every selected scene has environment prerequisites.
2. The handoff asset contains no credential values.
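
One way to back the second criterion is to scan the handoff JSON for credential-like keys that carry a non-empty value. The key fragments and the example structure below are assumptions for illustration; the real handoff schema is not shown in this plan.

```python
CREDENTIAL_KEY_FRAGMENTS = ("password", "token", "cookie", "secret", "authorization")


def find_credential_values(node, path="$"):
    """Return JSON-path-like locations of credential-like keys with non-empty string values."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            looks_secret = any(frag in key.lower() for frag in CREDENTIAL_KEY_FRAGMENTS)
            if looks_secret and isinstance(value, str) and value:
                hits.append(child_path)
            else:
                hits.extend(find_credential_values(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_credential_values(item, f"{path}[{index}]"))
    return hits


# Hypothetical handoff fragment: the policy is recorded, the value is left empty.
handoff = {
    "scenes": [
        {
            "sceneId": "sweep-030-scene",
            "credentialPolicy": "outside-repository-only",
            "environment": {"sessionToken": ""},
        }
    ]
}
```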
## Phase 2: Build Evidence Checklist
### Tasks
1. Define evidence required for each scene.
2. Define evidence file names.
3. Define redaction requirements for logs and screenshots.
### Acceptance Criteria
1. Every selected scene has an evidence checklist.
2. Every checklist includes final execution classification.
## Phase 3: Build Execution Record Template
### Tasks
1. Define common execution record fields.
2. Include per-scene placeholders for operator output.
3. Include allowed result states and failure taxonomy.
### Acceptance Criteria
1. The template can record pass, blocker, mismatch, and runtime error.
2. The template stores references to evidence files, not credentials.
## Phase 4: Publish Preparation Report
### Tasks
1. Summarize selected batch.
2. Summarize environment handoff.
3. Summarize evidence package structure.
4. Identify next bounded execution plan.
### Acceptance Criteria
1. Report states this is preparation-only.
2. Report does not claim pseudo-production execution.
## Completion Criteria
This plan is complete when handoff, evidence checklist, record template, and report are published.
## Stop Statement
Stop after publishing preparation assets.
Do not run pseudo-production execution under this plan.

@@ -0,0 +1,127 @@
# Scene Skill 102 Pseudo-Production Batch Selection Plan
> Date: 2026-04-20
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-20-scene-skill-102-pseudoprod-batch-selection-design.md`
## Plan Intent
Select the first pseudo-production validation batch from the 102 final materialized skills.
This plan is selection-only. It does not run pseudo-production execution.
## Fixed Inputs
1. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
2. `tests/fixtures/generated_scene/scene_skill_102_static_validation_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_dispatch_dry_run_validation_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_full_direct_mock_execution_2026-04-20.json`
5. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_readiness_2026-04-20.json`
## Planned Outputs
1. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_batch_selection_2026-04-20.json`
2. `docs/superpowers/reports/2026-04-20-scene-skill-102-pseudoprod-batch-selection-report.md`
## Allowed Files
1. the planned output JSON
2. the planned report
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `examples/scene_skill_102_final_materialization_2026-04-19/skills/**`
5. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Phase 0: Freeze Selection Boundary
### Tasks
1. Confirm all 102 scenes are direct-mock-pass.
2. Confirm this plan does not execute browser automation or real network access.
3. Confirm this plan does not update official board status.
### Acceptance Criteria
1. Selection starts from a clean `102 / 102` local mock baseline.
2. Selection does not mutate generated skills or runtime code.
## Phase 1: Build Eligible Candidate Set
### Tasks
1. Read pseudo-production readiness records.
2. Keep only `pseudo-prod-ready` scenes.
3. Exclude `real-env-required` scenes from first batch.
4. Join with direct mock results and static/dispatch readiness.
### Acceptance Criteria
1. Every selected candidate is static validated.
2. Every selected candidate is dispatch ready.
3. Every selected candidate is direct-mock-pass.
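
The filter-and-join in this phase can be sketched as follows. The input shapes (a readiness map, two sets of validated scene ids, and a mock-status map) are assumptions about how the four fixed inputs might be loaded; only the status strings come from this plan.

```python
def eligible_candidates(readiness, static_validated, dispatch_ready, mock_status):
    """Keep scenes that are pseudo-prod-ready and pass all three upstream gates."""
    return sorted(
        scene_id
        for scene_id, status in readiness.items()
        if status == "pseudo-prod-ready"          # drops real-env-required scenes
        and scene_id in static_validated          # static validated
        and scene_id in dispatch_ready            # dispatch ready
        and mock_status.get(scene_id) == "direct-mock-pass"
    )
```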
## Phase 2: Select Balanced First Batch
### Tasks
Select `10` scenes with archetype balance:
1. `paginated_enrichment`: 4
2. `multi_mode_request`: 2
3. `single_request_enrichment`: 2
4. `multi_endpoint_inventory`: 1
5. `page_state_eval`: 1
### Acceptance Criteria
1. The selected batch contains exactly `10` scenes.
2. The batch excludes host-bridge and local-doc runtime-dependent scenes.
3. Every selected scene has a deterministic invocation input.
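
The archetype quotas above sum to 10, so a quota-driven pick yields exactly the required batch size. A minimal sketch, assuming candidates arrive as `(scene_id, archetype)` pairs already ordered by preference (the ordering rule itself is not specified in this plan):

```python
QUOTAS = {
    "paginated_enrichment": 4,
    "multi_mode_request": 2,
    "single_request_enrichment": 2,
    "multi_endpoint_inventory": 1,
    "page_state_eval": 1,
}


def select_batch(candidates):
    """Take candidates in order until each archetype quota is filled."""
    remaining = dict(QUOTAS)
    batch = []
    for scene_id, archetype in candidates:
        if remaining.get(archetype, 0) > 0:
            batch.append(scene_id)
            remaining[archetype] -= 1
    unmet = {name: count for name, count in remaining.items() if count > 0}
    if unmet:
        # Fail loudly instead of publishing an unbalanced batch.
        raise ValueError(f"archetype quota not met: {unmet}")
    return batch
```

Raising on an unmet quota keeps the "exactly `10` scenes" criterion from being satisfied by accident with a skewed mix.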
## Phase 3: Define Evidence Checklist
### Tasks
For each selected scene, define required evidence:
1. deterministic invocation input
2. console log
3. network log or request summary
4. screenshot if browser target page is required
5. exported file if produced
6. generation report path
7. failure taxonomy slot
### Acceptance Criteria
1. Every selected scene has a complete checklist.
2. Checklist does not require production credentials to be stored in the repository.
## Phase 4: Publish Selection Report
### Tasks
1. Write selection JSON.
2. Write selection report.
3. Summarize selected and deferred scenes.
### Acceptance Criteria
1. The report states this is selection-only.
2. The report does not claim pseudo-production execution.
3. The report identifies the next bounded execution plan.
## Completion Criteria
This plan is complete when the first pseudo-production batch selection JSON and report are published.
## Stop Statement
Stop after publishing selection assets.
Do not execute pseudo-production validation under this plan.

@@ -0,0 +1,252 @@
# Scene Skill 102 Static, Mock, And Pseudo-Production Validation Plan
> Date: 2026-04-20
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-design.md`
> Parent Stage: validation after final materialization and deterministic invocation readiness
## Plan Intent
Define the next validation stage for the fully materialized `102` scene skill set.
This plan validates package health, deterministic dispatch readiness, mock runtime feasibility, and pseudo-production readiness. It does not perform real production execution.
## Fixed Inputs
1. `examples/scene_skill_102_final_materialization_2026-04-19/skills`
2. `examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md`
3. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
4. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
5. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json`
6. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
## Planned Outputs
1. `tests/fixtures/generated_scene/scene_skill_102_static_validation_2026-04-20.json`
2. `tests/fixtures/generated_scene/scene_skill_102_dispatch_dry_run_validation_2026-04-20.json`
3. `tests/fixtures/generated_scene/scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json`
4. `tests/fixtures/generated_scene/scene_skill_102_pseudoprod_readiness_2026-04-20.json`
5. `docs/superpowers/reports/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-report.md`
## Scope Guardrails
Allowed:
1. read final materialized skill packages
2. parse `SKILL.toml`, `scene.toml`, and generation reports
3. run deterministic dispatch dry-run without executing selected skills
4. build mock runtime validation matrix
5. publish validation JSON and report assets
Forbidden:
1. do not modify `src/generated_scene/analyzer.rs`
2. do not modify `src/generated_scene/generator.rs`
3. do not modify generated skill scripts during this validation plan
4. do not rematerialize the `102` skills
5. do not update `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
6. do not start real browser or production environment execution
7. do not require production credentials, SSO, VPN, or real system access
## Workstreams
1. `WS1` Static package validation
2. `WS2` Deterministic invocation dry-run validation
3. `WS3` Mock runtime validation matrix
4. `WS4` Pseudo-production readiness checklist
5. `WS5` Validation report and next-stage decision
## Phase 0: Freeze Validation Baseline
### Objective
Freeze the `102` final skill set as the input to validation.
### Tasks
1. Confirm final materialization count is `102 / 102`.
2. Confirm materialization failure count is `0`.
3. Confirm deterministic invocation readiness is `102 / 102`.
4. Confirm this plan does not rematerialize skills.
### Deliverables
1. validation baseline section in final report
### Acceptance Criteria
1. validation begins from the final materialization root
2. no source scene directories are rescanned
3. no generated scene logic is changed
## Phase 1: Static Package Validation
### Objective
Validate that all `102` skill packages are structurally complete and parseable.
### Tasks
For each skill:
1. check `SKILL.toml`
2. check `SKILL.md`
3. check `scene.toml`
4. check `references/generation-report.json`
5. check at least one script under `scripts/`
6. parse TOML and JSON files
7. compare `sceneId`, display name, archetype, readiness, suffix, and keyword fields against index and manifest assets
### Deliverables
1. `scene_skill_102_static_validation_2026-04-20.json`
### Acceptance Criteria
1. every scene has exactly one static validation record
2. every static failure has a named reason
3. total records equal `102`
## Phase 2: Deterministic Invocation Dry-Run Validation
### Objective
Validate deterministic dispatch selection for the `。。。` suffix (`U+3002` x3) without executing the selected skills.
### Tasks
For each skill:
1. construct one canonical utterance from the scene display name plus the `。。。` suffix (`U+3002` x3)
2. optionally construct one keyword-based utterance when safe
3. dry-run deterministic selection against the skill registry
4. record selected skill id, ambiguity count, and no-match status
### Deliverables
1. `scene_skill_102_dispatch_dry_run_validation_2026-04-20.json`
### Acceptance Criteria
1. every complete skill has a dispatch dry-run result
2. ambiguous and no-match outcomes are explicit
3. no selected skill is executed
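
To make the dry-run record shape concrete, here is a sketch that builds the canonical utterance and selects against an in-memory registry. The substring-based keyword matching and the field names are assumptions for illustration; the actual selection rules live in the dispatcher and may differ. Nothing in the sketch executes a skill.

```python
SUFFIX = "\u3002" * 3  # three ideographic full stops: "。。。"


def canonical_utterance(display_name):
    """Scene display name followed by the deterministic suffix."""
    return display_name + SUFFIX


def dry_run_select(utterance, registry):
    """Select by keyword without executing anything.

    registry maps skill id -> list of keywords (hypothetical shape).
    """
    if not utterance.endswith(SUFFIX):
        return {"selectedSkillId": None, "ambiguityCount": 0, "noMatch": True}
    body = utterance[: -len(SUFFIX)]
    matches = [
        skill_id
        for skill_id, keywords in registry.items()
        if any(keyword in body for keyword in keywords)
    ]
    return {
        "selectedSkillId": matches[0] if len(matches) == 1 else None,
        "ambiguityCount": len(matches) if len(matches) > 1 else 0,
        "noMatch": not matches,
    }
```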
## Phase 3: Mock Runtime Validation Matrix
### Objective
Define and, where safe, prepare mock runtime validation by archetype.
### Tasks
1. group `102` skills by workflow archetype
2. identify one to three representatives per archetype
3. define mock dependencies required by each archetype:
- fake fetch
- fake browser DOM
- fake host bridge
- fake local-doc service
- fake artifact writer
4. classify each skill as:
- `mock-covered-by-representative`
- `mock-needs-harness`
- `mock-not-safe-yet`
### Deliverables
1. `scene_skill_102_mock_runtime_validation_matrix_2026-04-20.json`
### Acceptance Criteria
1. every scene is assigned a mock-runtime coverage status
2. every archetype has a named harness requirement
3. this phase does not require real network or browser credentials
## Phase 4: Pseudo-Production Readiness Checklist
### Objective
Define what evidence is required before real-environment validation can start.
### Tasks
For each scene or archetype:
1. record required runtime dependencies
2. record expected artifact type
3. record whether host bridge, browser, localhost service, or document pipeline is required
4. define required execution evidence:
- console logs
- network logs
- screenshots
- exported files
- generated artifact metadata
5. define failure taxonomy:
- `login-blocked`
- `network-blocked`
- `host-bridge-blocked`
- `data-mismatch`
- `artifact-mismatch`
- `environment-unavailable`
- `runtime-error`
### Deliverables
1. `scene_skill_102_pseudoprod_readiness_2026-04-20.json`
### Acceptance Criteria
1. every scene has a pseudo-production readiness record
2. every real-environment blocker has a named category
3. no production credentials are required by this phase
## Phase 5: Publish Validation Report
### Objective
Publish one report that separates static readiness, dispatch readiness, mock readiness, and pseudo-production readiness.
### Tasks
1. summarize static validation results
2. summarize dispatch dry-run results
3. summarize mock runtime coverage matrix
4. summarize pseudo-production readiness categories
5. recommend whether to start real-environment validation and at what batch size
### Deliverables
1. `docs/superpowers/reports/2026-04-20-scene-skill-102-static-mock-pseudoprod-validation-report.md`
### Acceptance Criteria
1. report explains that `102 / 102` materialization is not the same as `102 / 102` production execution
2. report lists remaining validation blockers, if any
3. report does not promote any scene to real executed-pass
## Expected Status Outputs
This plan should produce these independent status counts:
1. `staticValidationStatus`
2. `dispatchDryRunStatus`
3. `mockRuntimeCoverageStatus`
4. `pseudoProductionReadinessStatus`
## Completion Criteria
This plan is complete when:
1. all planned validation assets are published
2. all `102` scenes have static validation records
3. all `102` scenes have dispatch dry-run records
4. all `102` scenes have mock runtime matrix records
5. all `102` scenes have pseudo-production readiness records
6. the validation report is published
## Stop Statement
Stop after publishing static, dispatch, mock-runtime matrix, pseudo-production readiness assets, and the report.
Do not execute real production validation under this plan.

@@ -0,0 +1,96 @@
# Sweep 012 Materialization Recovery Plan
> Date: 2026-04-20
> Design: `2026-04-20-sweep-012-materialization-recovery-design.md`
## Plan Intent
Recover the only failed final materialization scene package and refresh final materialization/readiness assets to reach `102 / 102` materialized packages.
## Fixed Input
`sweep-012-scene / 业扩报装管理制度`
Source directory:
`D:/desk/智能体资料/全量业务场景/一平台场景/业扩报装管理制度`
Final output root:
`examples/scene_skill_102_final_materialization_2026-04-19`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `tests/scene_generator_test.rs`
4. `examples/scene_skill_102_final_materialization_2026-04-19/skills/sweep-012-scene/**`
5. `examples/scene_skill_102_final_materialization_2026-04-19/SCENE_INDEX.md`
6. `examples/scene_skill_102_final_materialization_2026-04-19/scene_skill_102_index.json`
7. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_manifest_2026-04-19.json`
8. `tests/fixtures/generated_scene/scene_skill_102_final_materialization_failures_2026-04-19.json`
9. `tests/fixtures/generated_scene/scene_skill_102_deterministic_invocation_readiness_after_keyword_refinement_2026-04-20.json`
10. `docs/superpowers/reports/2026-04-20-sweep-012-materialization-recovery-report.md`
## Forbidden Files
1. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
2. other final skill package scripts outside `sweep-012-scene`
3. production/static/mock validation assets
## Phase 0: Diagnose Single-Scene Failure
### Tasks
1. Reproduce current failure for `sweep-012-scene`.
2. Compare analyzer output with official framework expectation.
3. Identify whether the failure is route selection, missing bootstrap, or workflow contract recovery.
### Acceptance Criteria
1. Root cause is stated before implementation.
## Phase 1: Apply Minimal Recovery
### Tasks
1. Make the smallest code or fixture correction needed for `sweep-012-scene`.
2. Add or update a focused regression test.
### Acceptance Criteria
1. Existing G3/G6 route behavior remains protected.
2. The correction is not generalized beyond evidence-supported behavior.
## Phase 2: Re-materialize Sweep 012 Only
### Tasks
1. Run `sg_scene_generate` only for `sweep-012-scene`.
2. Verify required files exist.
3. Normalize human-readable metadata and deterministic invocation metadata for this package.
### Acceptance Criteria
1. `sweep-012-scene` package is complete.
2. Its `scene.toml` uses suffix `。。。`.
## Phase 3: Refresh Assets
### Tasks
1. Patch final materialization manifest for `sweep-012-scene`.
2. Clear final materialization failures.
3. Refresh `SCENE_INDEX.md` and `scene_skill_102_index.json`.
4. Refresh deterministic readiness after keyword refinement.
5. Publish recovery report.
### Acceptance Criteria
1. Materialized count is `102`.
2. Failure count is `0`.
3. Deterministic dispatch ready count is `102`.
## Stop Statement
Stop after `sweep-012-scene` recovery assets and report are published. Do not start static/mock/production validation under this plan.

@@ -0,0 +1,103 @@
# Sweep 015 Direct Mock Partial Closure Plan
> Date: 2026-04-20
> Status: Draft
> Upstream Design: `docs/superpowers/specs/2026-04-20-sweep-015-direct-mock-partial-closure-design.md`
## Plan Intent
Close the only remaining `direct-mock-partial` before pseudo-production batch selection.
This plan is mock-only and targets one fixed scene:
- `sweep-015-scene / 任务报表`
## Fixed Inputs
1. `tests/fixtures/generated_scene/scene_skill_102_full_direct_mock_execution_2026-04-20.json`
2. `tests/generated_scene_full_direct_mock_runner.js`
3. `examples/scene_skill_102_final_materialization_2026-04-19/skills/sweep-015-scene/scripts/collect_sweep_015_scene.js`
4. `examples/scene_skill_102_final_materialization_2026-04-19/skills/sweep-015-scene/references/generation-report.json`
## Planned Outputs
1. refreshed `tests/fixtures/generated_scene/scene_skill_102_full_direct_mock_execution_2026-04-20.json`
2. refreshed `docs/superpowers/reports/2026-04-20-scene-skill-102-full-direct-mock-execution-report.md`
3. `docs/superpowers/reports/2026-04-20-sweep-015-direct-mock-partial-closure-report.md`
## Allowed Files
1. `tests/generated_scene_full_direct_mock_runner.js`
2. `tests/fixtures/generated_scene/scene_skill_102_full_direct_mock_execution_2026-04-20.json`
3. `docs/superpowers/reports/2026-04-20-scene-skill-102-full-direct-mock-execution-report.md`
4. `docs/superpowers/reports/2026-04-20-sweep-015-direct-mock-partial-closure-report.md`
## Forbidden Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. `examples/scene_skill_102_final_materialization_2026-04-19/skills/**`
5. `tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json`
## Phase 0: Confirm Partial Cause
### Tasks
1. Confirm `sweep-015-scene` is the only partial.
2. Confirm the generated script uses `row.status == 5`.
3. Confirm the mock row does not currently satisfy that filter.
### Acceptance Criteria
1. The closure target remains exactly one scene.
2. The cause is classified as mock fixture contract mismatch.
## Phase 1: Patch Mock Fixture Contract
### Tasks
1. Add the minimal filter-satisfying field to direct mock fake row data.
2. Keep the change generic enough to support filter-aware mock execution but narrow enough not to mask runtime failures.
### Acceptance Criteria
1. No generated skill package is modified.
2. No generator/analyzer code is modified.
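
The mismatch diagnosed in Phase 0 can be illustrated in a few lines. The sketch below approximates the generated script's `row.status == 5` check in Python (the real runner and script are JavaScript); the row fields other than `status` are invented placeholders.

```python
def passes_status_filter(row):
    """Approximate the generated script's `row.status == 5` check.

    JavaScript loose equality also accepts the string "5", so both are allowed here.
    """
    return row.get("status") in (5, "5")


# Mock row before the patch: no status field, so the filter drops it.
base_row = {"id": 1, "name": "mock-row"}

# Minimal patch: add only the filter-satisfying field.
patched_row = {**base_row, "status": 5}

rows_before = [row for row in [base_row] if passes_status_filter(row)]
rows_after = [row for row in [patched_row] if passes_status_filter(row)]
```

With the unpatched row the filtered result is empty, which is consistent with a partial outcome caused by the fixture rather than by the generated skill.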
## Phase 2: Rerun Full Direct Mock
### Tasks
1. Run the full direct mock runner.
2. Confirm `sweep-015-scene` becomes `direct-mock-pass`.
3. Confirm total output record count is `102`.
### Acceptance Criteria
1. `direct-mock-pass = 102`.
2. `direct-mock-partial = 0`.
3. `direct-mock-fail = 0`.
## Phase 3: Publish Closure
### Tasks
1. Refresh the full direct mock report.
2. Publish a sweep-015 closure report.
### Acceptance Criteria
1. Closure report states this is mock-only.
2. Closure report does not claim pseudo-production or production pass.
## Completion Criteria
This plan is complete when the full direct mock result has no partial/fail records and the closure report is published.
## Stop Statement
Stop after publishing the closure report.
Do not start pseudo-production batch selection under this plan.

@@ -0,0 +1,93 @@
# Sweep-030 Deterministic Keyword / Alias Normalization Plan
## Parent
- Framework phase: post-materialization deterministic invocation hardening
- Parent route: line-loss service-console readiness correction
- Parent design: [2026-04-20-sweep-030-deterministic-keyword-alias-normalization-design.md](/D:/data/ideaSpace/rust/sgClaw/claw-new/docs/superpowers/specs/2026-04-20-sweep-030-deterministic-keyword-alias-normalization-design.md)
## Objective
Make `sweep-030-scene` selectable from the natural service-console phrasing used by operators, without touching sgClaw runtime.
## Fixed Input
- `sweep-030-scene`
- Current failing operator phrasing:
- `兰州公司 台区线损大数据 月累计线损率统计分析。。。`
- Current parameter-complete phrasing:
- `兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。`
## Phase 0: Freeze Boundary
### Tasks
1. Confirm the current `scene.toml` deterministic keyword set for `sweep-030-scene`
2. Confirm the natural operator phrasing to support
3. Freeze allowed / forbidden files
### Deliverable
- frozen route-local scope
### Acceptance
- scope stays limited to `sweep-030-scene`
## Phase 1: Normalize Deterministic Aliases
### Tasks
1. Update `include_keywords` in `sweep-030-scene/scene.toml`
2. Preserve current suffix `。。。`
3. Keep aliases specific to line-loss monthly/weekly cumulative phrasing
### Deliverable
- normalized deterministic manifest
### Acceptance
- `台区线损大数据 月累计线损率统计分析` can satisfy at least one `include_keywords` match
- no runtime files are changed
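
The matchability criterion can be expressed as a small check. This is a sketch under stated assumptions: the whitespace-insensitive substring match and the example keyword are hypothetical, and the real `include_keywords` semantics are defined by the dispatcher, which this plan does not touch.

```python
SUFFIX = "。。。"


def matches_include_keywords(utterance, include_keywords):
    """True if the suffix is present and any keyword occurs in the utterance body,
    ignoring whitespace on both sides."""
    if not utterance.endswith(SUFFIX):
        return False
    body = "".join(utterance[: -len(SUFFIX)].split())
    return any("".join(keyword.split()) in body for keyword in include_keywords)


# Hypothetical normalized alias set for sweep-030-scene.
include_keywords = ["台区线损大数据 月累计线损率统计分析"]

natural = "兰州公司 台区线损大数据 月累计线损率统计分析。。。"
parameter_complete = "兰州公司 台区线损大数据 月累计线损率统计分析 2026-03。。。"
```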
## Phase 2: Route-Local Verification Record
### Tasks
1. Record the normalized alias set
2. Record the operator-ready sample input
3. Record the next-layer note that helper/request-URL validation is still pending
### Deliverables
- `tests/fixtures/generated_scene/sweep_030_deterministic_keyword_alias_normalization_2026-04-20.json`
- `docs/superpowers/reports/2026-04-20-sweep-030-deterministic-keyword-alias-normalization-report.md`
### Acceptance
- verification asset exists
- report states that this plan only fixes dispatch matchability
## Allowed Files
- `examples/scene_skill_102_final_materialization_2026-04-19/skills/sweep-030-scene/scene.toml`
- `tests/fixtures/generated_scene/sweep_030_deterministic_keyword_alias_normalization_2026-04-20.json`
- `docs/superpowers/reports/2026-04-20-sweep-030-deterministic-keyword-alias-normalization-report.md`
## Forbidden Files
- `src/compat/scene_platform/dispatch.rs`
- `src/browser/callback_host.rs`
- `src/service/server.rs`
- `src/generated_scene/*`
- `resources/rules.json`
## Expected Delta
- `sweep-030-scene` becomes dispatch-matchable from the natural operator wording used in the service console
- no claim about helper bootstrap success within this plan
## Stop Statement
Stop after manifest alias normalization and route-local report publication. Do not continue into callback-host or bootstrap-target fixes.

@@ -0,0 +1,76 @@
# Generated Scene Local-Doc Pipeline Residual Closure Plan
Date: 2026-04-21
Parent status source:
- `docs/superpowers/reports/2026-04-21-generated-scene-runtime-semantics-validation-refresh-execution-report.md`
Parent design:
- `docs/superpowers/specs/2026-04-21-generated-scene-local-doc-pipeline-residual-closure-design.md`
## Intent
Close the remaining bounded `local_doc_pipeline` residual bucket that still blocks full rematerialization completeness after runtime-semantics hardening.
## Fixed Residual Bucket
Only these six scenes are in scope:
1. `sweep-025-scene`
2. `sweep-047-scene`
3. `sweep-050-scene`
4. `sweep-052-scene`
5. `sweep-062-scene`
6. `sweep-087-scene`
## Fixed Inputs
1. `tests/fixtures/generated_scene/generated_scene_runtime_semantics_rematerialization_failures_2026-04-21.json`
2. `tests/fixtures/generated_scene/generated_scene_runtime_semantics_rematerialization_manifest_2026-04-21.json`
3. `docs/superpowers/reports/2026-04-21-generated-scene-runtime-semantics-validation-refresh-execution-report.md`
4. corresponding original scene source directories under `D:/desk/智能体资料/全量业务场景/一平台场景`
## Allowed Files
1. `src/generated_scene/analyzer.rs`
2. `src/generated_scene/generator.rs`
3. `src/generated_scene/ir.rs`
4. route-local generator tests under `tests/scene_generator*.rs`
5. route-local assets under `tests/fixtures/generated_scene/`
6. route-local report under `docs/superpowers/reports/`
## Forbidden Files
1. no rematerialization rerun inside this plan
2. no validation refresh rerun inside this plan
3. no edits to already materialized skill bundles under `examples/`
4. no runtime/service-console/browser changes
5. no official board updates
6. no pseudo-production execution
## Required Work
1. inspect the six source scenes and classify the exact missing workflow-evidence shapes
2. implement only reusable `local_doc_pipeline` evidence recovery needed for this bounded bucket
3. add route-local regression coverage for the recovered evidence shapes
4. publish route-local followup JSON
5. publish route-local report
## Required Outputs
1. `tests/fixtures/generated_scene/generated_scene_local_doc_pipeline_residual_closure_followup_2026-04-21.json`
2. `docs/superpowers/reports/2026-04-21-generated-scene-local-doc-pipeline-residual-closure-report.md`
## Validation
1. prove the bounded six-scene evidence shapes are now recognized at analyzer/generator level
2. keep validation route-local to this plan
3. do not claim bundle-level closure until the downstream rematerialization rerun has completed
## Stop Statement
Stop after the bounded `local_doc_pipeline` residual closure slice and route-local assets are published.
Do not rerun rematerialization or validation refresh inside this plan.

@@ -0,0 +1,130 @@
# Generated Scene Runtime Semantics Offline Validation Bundle Refresh Plan
Date: 2026-04-21
Design:
- `docs/superpowers/specs/2026-04-21-generated-scene-runtime-semantics-offline-validation-bundle-refresh-design.md`
Upstream plans:
- `docs/superpowers/plans/2026-04-21-generated-scene-runtime-semantics-rematerialization-execution-plan.md`
- `docs/superpowers/plans/2026-04-21-generated-scene-runtime-semantics-validation-refresh-execution-plan.md`
## Intent
Build a refreshed offline validation bundle from the 2026-04-21 runtime-semantics rematerialized 102-skill set and refreshed validation handoff assets.
This plan only prepares the portable bundle. It does not run pseudo-production validation.
## Fixed Inputs
1. `examples/scene_skill_102_runtime_semantics_rematerialization_2026-04-21`
2. `tests/fixtures/generated_scene/generated_scene_runtime_semantics_rematerialization_manifest_2026-04-21.json`
3. `tests/fixtures/generated_scene/generated_scene_runtime_semantics_rematerialization_failures_2026-04-21.json`
4. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_deterministic_invocation_readiness_2026-04-21.json`
5. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_natural_language_parameter_readiness_2026-04-21.json`
6. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_natural_language_invocation_samples_2026-04-21.json`
7. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_pseudoprod_execution_handoff_2026-04-21.json`
8. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_pseudoprod_evidence_checklist_2026-04-21.json`
9. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_pseudoprod_execution_record_template_2026-04-21.json`
10. `tests/fixtures/generated_scene/scene_skill_102_runtime_semantics_full_direct_mock_execution_2026-04-21.json`
11. `target/debug/sg_claw.exe` or an equivalent current local `sg_claw.exe`
12. Optional rule assets:
- `resources/rules-102-business-targets-candidate.json`
- `resources/rules-102-business-targets-merged.json`
- `resources/rules-102-business-targets.patch`
## Output Directory
Create:
- `dist/sgclaw_102_runtime_semantics_validation_bundle_2026-04-21`
Do not delete or mutate:
- `dist/sgclaw_102_pseudoprod_validation_bundle_2026-04-20`
## Phases
### Phase 0: Input Verification
1. Confirm the 2026-04-21 rematerialization directory exists.
2. Confirm it contains 102 skill directories.
3. Confirm the 2026-04-21 rematerialization failures asset reports zero failures.
4. Confirm the 2026-04-21 validation refresh assets exist.
5. Confirm the pseudo-production handoff contains the refreshed selected batch.
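The directory checks in Phase 0 (steps 1 and 2) could be scripted along the following lines. This is only a minimal sketch: the helper names `count_subdirs` and `verify_skill_count` are hypothetical, and it assumes the skill directories sit directly under the path passed in, which this plan does not state. The JSON checks in steps 3 to 5 are not covered.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Count the immediate subdirectories of `root`; plain files are ignored.
fn count_subdirs(root: &Path) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            count += 1;
        }
    }
    Ok(count)
}

/// Hypothetical helper: check that `root` exists and holds exactly
/// `expected` skill directories (102 for this plan).
fn verify_skill_count(root: &Path, expected: usize) -> Result<(), String> {
    if !root.is_dir() {
        return Err(format!("missing directory: {}", root.display()));
    }
    let found = count_subdirs(root).map_err(|e| e.to_string())?;
    if found == expected {
        Ok(())
    } else {
        Err(format!(
            "expected {} skill directories, found {}",
            expected, found
        ))
    }
}
```

Calling `verify_skill_count` with the rematerialization directory (or its `skills/` subdirectory, if that is where the skills live) and `102` would return an error message describing the mismatch instead of panicking, so the result can be recorded in the report.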
### Phase 1: Bundle Directory Preparation
1. Create the target `dist/` bundle directory.
2. Create subdirectories:
- `skills/`
- `docs/`
- `handoff/`
- `resources/`
- `results/`
- `evidence/`
3. Copy `sg_claw.exe` into the bundle root.
4. Copy the 2026-04-21 rematerialized `skills/` content into the bundle.
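The directory layout from Phase 1 (steps 1 and 2) can be created idempotently with the standard library. The sketch below is illustrative only: `prepare_bundle_dirs` is a hypothetical helper name, and copying `sg_claw.exe` and the skills (steps 3 and 4) is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Subdirectories listed in Phase 1, step 2.
const BUNDLE_SUBDIRS: [&str; 6] = [
    "skills",
    "docs",
    "handoff",
    "resources",
    "results",
    "evidence",
];

/// Hypothetical helper: create the bundle root and its subdirectories.
/// `create_dir_all` succeeds when a directory already exists, so the
/// function can be rerun safely.
fn prepare_bundle_dirs(bundle_root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut created = Vec::new();
    for name in BUNDLE_SUBDIRS.iter() {
        let dir = bundle_root.join(name);
        fs::create_dir_all(&dir)?;
        created.push(dir);
    }
    Ok(created)
}
```

Because the function only ever creates directories under the path it is given, pointing it at the new 2026-04-21 bundle path leaves the 2026-04-20 bundle untouched.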
### Phase 2: Documentation And Handoff Copy
1. Copy `SCENE_INDEX.md` and `scene_skill_102_index.json` from the 2026-04-21 rematerialization output into `docs/`.
2. Copy refreshed validation/handoff assets into `handoff/`.
3. Copy rule candidate assets into `resources/` when present.
4. Generate `README.md` with:
- config sample using `skillsDir`
- browser WebSocket reminder
- service console execution notes
- credential safety warning
- page URL guidance
5. Generate `BATCH_001.md` from the refreshed pseudo-production handoff selected batch.
6. Generate `BUNDLE_MANIFEST.json`.
### Phase 3: Bundle Validation
1. Confirm `skills/` contains 102 scene directories.
2. Confirm every skill directory contains:
- `SKILL.toml`
- `SKILL.md`
- `scene.toml`
- `scripts/`
3. Confirm critical JSON files parse.
4. Confirm `BATCH_001.md` includes the same selected batch count as the handoff.
5. Confirm no credentials or tokens are introduced by the generated bundle docs.
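The per-skill structure check in Phase 3, step 2 lends itself to a small scan. The following is a sketch under stated assumptions: `missing_skill_entries` is a hypothetical helper name, `scripts` is treated as a directory and the other three entries as regular files, and the JSON parsing and credential checks (steps 3 and 5) are not attempted.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Entries every skill directory must contain (Phase 3, step 2).
const REQUIRED_ENTRIES: [&str; 4] = ["SKILL.toml", "SKILL.md", "scene.toml", "scripts"];

/// Hypothetical helper: return "<skill>/<entry>" for each required entry
/// that is missing under `skills_dir`. An empty result means every skill
/// directory has the expected structure.
fn missing_skill_entries(skills_dir: &Path) -> io::Result<Vec<String>> {
    let mut missing = Vec::new();
    for entry in fs::read_dir(skills_dir)? {
        let entry = entry?;
        if !entry.file_type()?.is_dir() {
            continue; // only skill directories are inspected
        }
        let skill = entry.file_name().to_string_lossy().into_owned();
        for required in REQUIRED_ENTRIES.iter() {
            let path = entry.path().join(required);
            // Assumption: `scripts` is a directory, the rest are files.
            let present = if *required == "scripts" {
                path.is_dir()
            } else {
                path.is_file()
            };
            if !present {
                missing.push(format!("{}/{}", skill, required));
            }
        }
    }
    missing.sort(); // stable output for the report
    Ok(missing)
}
```

Returning the full list of gaps, rather than failing on the first one, makes it easier to quote the validation result in the Phase 4 report.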
### Phase 4: Report
Publish:
- `docs/superpowers/reports/2026-04-21-generated-scene-runtime-semantics-offline-validation-bundle-refresh-report.md`
The report must include:
1. bundle path
2. copied skill count
3. selected batch count
4. copied handoff assets
5. validation result
6. explicit statement that no pseudo-production execution was run
## Allowed Files
1. `dist/sgclaw_102_runtime_semantics_validation_bundle_2026-04-21/**`
2. `docs/superpowers/reports/2026-04-21-generated-scene-runtime-semantics-offline-validation-bundle-refresh-report.md`
## Forbidden Files
1. No `src/` changes
2. No `examples/scene_skill_102_runtime_semantics_rematerialization_2026-04-21/**` edits
3. No validation asset edits under `tests/fixtures/generated_scene/`
4. No official board updates
5. No old bundle deletion
6. No credentials or secret material
## Stop Statement
Stop after the refreshed offline validation bundle and report are generated and validated.
Do not run browser, inner-network, or pseudo-production validation inside this plan.

Some files were not shown because too many files have changed in this diff.