102 Full Sweep Dry-Run Plan

Date: 2026-04-19
Status: Draft
Upstream Spec: 2026-04-19-102-full-sweep-dry-run-design.md

Plan Intent

Run one bounded, read-only full sweep over the 102-scene ledger to measure the actual coverage of generic scene -> skill generation.

The plan answers one question:

How many of the 102 scenes can the current generic analyzer/generator handle today?

Scope Guardrails

do not change analyzer logic
do not change generator logic
do not promote scenes into scene_execution_board_2026-04-18.json
do not add new family baselines
do not create new family implementation plans
do not fix failures during this dry-run
do not run scenes outside the fixed 102-scene set

Fixed Inputs

execution board: tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json
scene root: D:/desk/智能体资料/全量业务场景/一平台场景
generator command: cargo run --bin sg_scene_generate

Fixed Outputs

dry-run result: tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json
dry-run output root: examples/full_sweep_dry_run_2026-04-19
report: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md
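
One way to keep these inputs and outputs fixed in a driver script is to collect them in a single immutable structure. The following is a minimal sketch: the `DryRunConfig` class and its field names are hypothetical, and only the path and command values are taken from this plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DryRunConfig:
    """Fixed inputs and outputs for the dry-run (hypothetical structure)."""

    # Fixed inputs, copied from this plan.
    execution_board: str = (
        "tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json"
    )
    scene_root: str = "D:/desk/智能体资料/全量业务场景/一平台场景"
    generator_command: tuple = ("cargo", "run", "--bin", "sg_scene_generate")

    # Fixed outputs, copied from this plan.
    dry_run_result: str = (
        "tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json"
    )
    dry_run_output_root: str = "examples/full_sweep_dry_run_2026-04-19"
    report: str = (
        "docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md"
    )


# A single shared instance; frozen=True rejects later reassignment of fields.
CONFIG = DryRunConfig()
```

Freezing the dataclass means an accidental change to a path during the sweep raises an error instead of silently redirecting output.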

Workstreams

WS1 Build Scene Inventory
WS2 Run Analyzer/Generator Dry-Run
WS3 Classify Results
WS4 Publish Coverage Report

Phase 0: Freeze Dry-Run Boundary

Objective

Make the dry-run a measurement exercise only.

Tasks

freeze the execution board input
freeze the local scene root
freeze the dry-run output paths
explicitly mark the run as read-only with respect to generator behavior and board status

Deliverables

fixed input statement
fixed output statement
dry-run no-promotion statement

Acceptance Criteria

no analyzer/generator implementation file is edited for this dry-run
scene_execution_board_2026-04-18.json is not modified by dry-run results
failures are recorded, not fixed
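
The second criterion can be checked mechanically by hashing the execution board before and after the sweep. This is a sketch under that assumption; the helper names `file_digest` and `assert_unchanged` are hypothetical and not part of the project.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_unchanged(path: Path, digest_before: str) -> None:
    """Raise RuntimeError if the file no longer matches the recorded digest."""
    digest_after = file_digest(path)
    if digest_after != digest_before:
        raise RuntimeError(f"{path} was modified during the dry-run")
```

A driver would call `file_digest` on the board once at the start of Phase 0 and `assert_unchanged` after Phase 4.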

Phase 1: Build Scene Inventory

Objective

Construct a deterministic inventory of all 102 scene names and expected source directories.

Tasks

read scene_execution_board_2026-04-18.json
extract all scene entries
map each scene name to D:/desk/智能体资料/全量业务场景/一平台场景/<sceneName>
check whether each source directory exists
assign initial inventory status:
- source-present
- missing-source

Deliverables

inventory section inside full_sweep_dry_run_2026-04-19.json
missing-source list

Acceptance Criteria

inventory count equals 102
every scene has a source path
missing source does not stop the sweep
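
The inventory step could look roughly like the sketch below. It assumes the scene names have already been extracted from the board; the board's JSON schema is not described in this plan, so extraction is left out. The record keys (`sceneName`, `sourcePath`, `inventoryStatus`) and the function name are hypothetical.

```python
from pathlib import Path

EXPECTED_SCENE_COUNT = 102  # inventory count required by this plan


def build_inventory(scene_names, scene_root, expected_count=EXPECTED_SCENE_COUNT):
    """Map each scene name to its source directory and record whether it exists."""
    inventory = []
    # Sort so the inventory order does not depend on the input order.
    for name in sorted(scene_names):
        source = Path(scene_root) / name
        status = "source-present" if source.is_dir() else "missing-source"
        inventory.append(
            {
                "sceneName": name,          # hypothetical key
                "sourcePath": str(source),  # hypothetical key
                "inventoryStatus": status,  # hypothetical key
            }
        )
    # A missing directory is recorded, not raised; only a wrong count is an error.
    if len(inventory) != expected_count:
        raise ValueError(
            f"inventory has {len(inventory)} scenes, expected {expected_count}"
        )
    return inventory
```

Recording `missing-source` as a status rather than raising keeps the sweep going, which matches the last acceptance criterion.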

Phase 2: Run Analyzer/Generator Dry-Run

Objective

Attempt current generic generation for every source-present scene without fixing failures.

Tasks

generate a stable safe scene id for each scene
invoke sg_scene_generate for each source-present scene
write outputs under examples/full_sweep_dry_run_2026-04-19
for successful generation, read references/generation-report.json
for failed generation, capture stderr/stdout and exit code
continue until all 102 scenes are processed
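
This plan does not say how the stable safe scene id is derived. One possible approach, shown here only as an assumption, is to hash the normalized scene name so that non-ASCII names map to ASCII ids that are safe to use as directory names. The `safe_scene_id` function and the `scene-` prefix are hypothetical.

```python
import hashlib
import unicodedata


def safe_scene_id(scene_name: str, length: int = 12) -> str:
    """Derive an ASCII id from a scene name; the same name always gives the same id."""
    # Normalize so visually identical names with different code points agree.
    normalized = unicodedata.normalize("NFC", scene_name.strip())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    # A truncated digest is short but not collision-proof; callers should
    # check the ids for uniqueness across the full scene set.
    return f"scene-{digest[:length]}"
```

Because the id depends only on the name, rerunning the sweep writes each scene to the same output directory.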

Deliverables

per-scene dry-run execution record
generated output root for successful scenes
captured error messages for failed scenes

Acceptance Criteria

every source-present scene has a generator result
no failure aborts the full sweep
generator results are isolated under the dry-run output root
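
The sweep loop described in this phase can be sketched as follows. The arguments that `sg_scene_generate` accepts are not given in this plan, so the caller supplies a hypothetical `build_command` function that returns the full argument list for one scene. The record keys and function names are also assumptions.

```python
import subprocess


def run_one(command, timeout_seconds=None):
    """Run one generator invocation and capture its outcome instead of raising."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            errors="replace",  # undecodable output bytes must not abort the sweep
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired) as error:
        # The process could not start or timed out; record that as a failure.
        return {"exitCode": None, "stdout": "", "stderr": str(error)}
    return {
        "exitCode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


def run_sweep(inventory, build_command):
    """Run the generator for every source-present scene; never stop on failure."""
    records = []
    for entry in inventory:
        record = dict(entry)
        if entry["inventoryStatus"] == "source-present":
            record["generator"] = run_one(build_command(entry))
        else:
            record["generator"] = None  # missing-source scenes are skipped
        records.append(record)
    return records
```

Each failure becomes data in the record, so one broken scene cannot abort the remaining scenes.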

Phase 3: Classify Results

Objective

Turn raw dry-run output into actionable coverage categories.

Tasks

classify generated scenes with A/B readiness and no blocker as auto-pass
classify generator blocks with a known gate/contract reason as fail-closed-known
classify obvious family mismatch as misclassified
classify evidence outside current families as unsupported-family
classify absent directories as missing-source
classify read/analyze failures as source-unreadable
compute top blockers by frequency
compute counts by inferred archetype

Deliverables

final dry-run status per scene
summary counts
by-archetype counts
top-blocker list

Acceptance Criteria

every scene has exactly one final status
total classified count equals 102
every non-pass scene has a reason
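
The acceptance criteria and summary deliverables for this phase can be checked together once each scene carries a final status. The sketch below assumes classified records with hypothetical keys `sceneName`, `finalStatus`, `reason`, `archetype`, and `blockers`; how a status is inferred from generator output is not shown because the report schema is not described in this plan.

```python
from collections import Counter

FINAL_STATUSES = {
    "auto-pass",
    "fail-closed-known",
    "misclassified",
    "unsupported-family",
    "missing-source",
    "source-unreadable",
}


def summarize(records, expected_total=102, top_n=5):
    """Validate classified records and compute the summary deliverables."""
    if len(records) != expected_total:
        raise ValueError(f"classified {len(records)} scenes, expected {expected_total}")
    names = [record["sceneName"] for record in records]
    if len(set(names)) != len(names):
        raise ValueError("a scene appears more than once")
    for record in records:
        status = record.get("finalStatus")
        if status not in FINAL_STATUSES:
            raise ValueError(f"{record['sceneName']}: unknown status {status!r}")
        if status != "auto-pass" and not record.get("reason"):
            raise ValueError(f"{record['sceneName']}: non-pass scene has no reason")
    blockers = Counter(b for r in records for b in r.get("blockers", []))
    return {
        "summaryCounts": dict(Counter(r["finalStatus"] for r in records)),
        "byArchetype": dict(Counter(r.get("archetype", "unknown") for r in records)),
        "topBlockers": blockers.most_common(top_n),
    }
```

Storing one `finalStatus` value per unique scene name is what enforces the "exactly one final status" criterion here.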

Phase 4: Publish Report

Objective

Answer the coverage question without changing project state.

Tasks

write full_sweep_dry_run_2026-04-19.json
write 2026-04-19-102-full-sweep-dry-run-report.md
report these four headline numbers:
- real-sample executed pass
- code-backed ledger coverage
- dry-run auto-pass
- dry-run actionable coverage
list the next recommended blocker to address, but do not start implementation
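
Writing the result file could be as simple as the sketch below. The `write_result` helper is hypothetical, and the payload layout is left to the caller because this plan does not define it. The sketch keeps non-ASCII scene names readable and sorts keys so repeated runs produce comparable files.

```python
import json
from pathlib import Path


def write_result(path, payload):
    """Write the dry-run result as UTF-8 JSON with a stable key order."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps scene names legible instead of \u escapes.
    text = json.dumps(payload, ensure_ascii=False, indent=2, sort_keys=True)
    target.write_text(text + "\n", encoding="utf-8")
```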
