木炎 956f0c2b68 feat: add generated scene skill platform hardening

2026-04-21 23:19:06 +08:00


102 Full Sweep Dry-Run Triage Design

Date: 2026-04-19
Status: Draft
Upstream Result: tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json
Upstream Report: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md

Design Intent

Split the non-pass buckets from the 102-scene full sweep into concrete, actionable triage categories without changing generator behavior or promoting scene status.

The design answers:

Why did 62 scenes not reach dry-run auto-pass, and which blocker should be handled first?

Starting Point

The upstream dry-run produced:

Status	Count
`auto-pass`	40
`fail-closed-known`	26
`misclassified`	5
`source-unreadable`	31
`missing-source`	0
`unsupported-family`	0
Total	102

The triage scope is only the 62 non-pass records.
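As a quick arithmetic check of the table above, the following sketch sums the buckets and confirms that the non-pass scope is 62. The counts are copied from the table; the names `STATUS_COUNTS` and `non_pass_total` are illustrative only.

```python
# Counts copied from the dry-run status table.
STATUS_COUNTS = {
    "auto-pass": 40,
    "fail-closed-known": 26,
    "misclassified": 5,
    "source-unreadable": 31,
    "missing-source": 0,
    "unsupported-family": 0,
}


def non_pass_total(counts):
    """Sum every bucket except auto-pass."""
    return sum(n for status, n in counts.items() if status != "auto-pass")


total = sum(STATUS_COUNTS.values())
non_pass = non_pass_total(STATUS_COUNTS)
print(total, non_pass)
```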

Scope Guardrails

do not edit src/generated_scene/analyzer.rs
do not edit src/generated_scene/generator.rs
do not change scene generation logic
do not update scene_execution_board_2026-04-18.json
do not promote scenes from this triage
do not add family baselines
do not create implementation plans from a single failure
do not rerun outside the fixed 102-scene set

Fixed Inputs

dry-run result: tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json
dry-run output root: examples/full_sweep_dry_run_2026-04-19
execution board: tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json
scene root: D:/desk/智能体资料/全量业务场景/一平台场景

Fixed Outputs

triage result: tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json
triage report: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md

Triage Order

The order is fixed:

timeout triage
misclassification triage
no-report failure triage

This order is deliberate:

timeouts are the largest bucket and include already-mapped G2 scenes
misclassification has the cleanest routing-quality signal
no-report failures are too broad until the higher-signal buckets are separated
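The fixed order could be applied to the dry-run records roughly as follows. This is a minimal sketch, not the project's tooling: only `dryRunStatus` and its values come from this design, while the `reason` field name, the exact reason string, and the function names are assumptions.

```python
# Fixed triage order from the design.
TRIAGE_ORDER = ("timeout", "misclassification", "no-report")

# Assumed exact wording of the no-report reason string.
NO_REPORT_REASON = "generator failed without generation report"


def triage_bucket(record):
    """Map one dry-run record to a triage bucket name, or None if out of scope."""
    status = record.get("dryRunStatus")
    if status == "source-unreadable":
        return "timeout"
    if status == "misclassified":
        return "misclassification"
    # "reason" is a hypothetical field name for the recorded failure reason.
    if status == "fail-closed-known" and record.get("reason") == NO_REPORT_REASON:
        return "no-report"
    return None  # auto-pass, the bootstrap_target case, and anything else


def split_in_order(records):
    """Return (bucket, records) pairs in the fixed triage order."""
    buckets = {name: [] for name in TRIAGE_ORDER}
    for record in records:
        name = triage_bucket(record)
        if name is not None:
            buckets[name].append(record)
    return [(name, buckets[name]) for name in TRIAGE_ORDER]
```

Records that fall through to `None` are deliberately left out, which keeps the bootstrap_target failure from being merged into the no-report bucket.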

Timeout Triage Model

Input bucket:

dryRunStatus = source-unreadable

Current count:

31

Current reason:

generator timeout after 30s

Target second-level labels:

timeout-known-family-sample
timeout-unvalidated-source
timeout-large-source
timeout-command-hang
timeout-generator-slow-but-progressing
timeout-undetermined

Minimum evidence per timeout record:

source directory exists
file count
total source bytes
current group
current board status
real sample record id if present
whether a partial skill directory exists
whether a partial generation report exists

Diagnostic reruns are allowed only for classification. A rerun that succeeds with a longer timeout does not promote the scene.
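The source-side evidence (directory exists, file count, total source bytes) can be gathered without running the generator. The sketch below assumes a recursive walk of the scene's source directory is what "file count" and "total source bytes" mean; the function name and the output key names are hypothetical.

```python
from pathlib import Path


def source_evidence(source_dir):
    """Collect read-only evidence about one scene's source directory."""
    root = Path(source_dir)
    if not root.is_dir():
        return {"sourceDirExists": False, "fileCount": 0, "totalSourceBytes": 0}
    # Count regular files only, at any depth below the source directory.
    files = [p for p in root.rglob("*") if p.is_file()]
    return {
        "sourceDirExists": True,
        "fileCount": len(files),
        "totalSourceBytes": sum(p.stat().st_size for p in files),
    }
```

Because this only reads file metadata, it stays within the guardrail of not changing generator behavior.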

Misclassification Triage Model

Input bucket:

dryRunStatus = misclassified

Current count:

5

Current shape:

G3 -> host_bridge_workflow: 3
G1-E -> host_bridge_workflow: 2

Target second-level labels:

route-overprefer-host-bridge
board-expectation-stale
mixed-workflow-host-bridge-valid
scene-family-split-needed
misclassification-undetermined

Minimum evidence per misclassification record:

board expected group
expected archetype
dry-run inferred archetype
current source asset
real sample layer status
generated report path
failed or conflicting signal summary

This phase does not correct routing logic.
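The "current shape" summary above is a count of (board expected group, dry-run inferred archetype) pairs. A minimal sketch of that tally follows; the field names `expectedGroup` and `inferredArchetype` are assumptions, and the sample records are placeholders that reproduce the documented counts, not real scene entries.

```python
from collections import Counter


def misclassification_shape(records):
    """Count records per (expected group, inferred archetype) pair."""
    return Counter((r["expectedGroup"], r["inferredArchetype"]) for r in records)


# Placeholder records matching the documented shape: 3 x G3 and 2 x G1-E.
sample = (
    [{"expectedGroup": "G3", "inferredArchetype": "host_bridge_workflow"}] * 3
    + [{"expectedGroup": "G1-E", "inferredArchetype": "host_bridge_workflow"}] * 2
)

shape = misclassification_shape(sample)
for (group, archetype), count in shape.most_common():
    print(f"{group} -> {archetype}: {count}")
```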

No-Report Failure Triage Model

Input bucket:

dryRunStatus = fail-closed-known and reason = generator failed without generation report

Current count:

25

Target failure stages:

source-scan
analyzer
ir-assembly
readiness-before-report
compiler-package-write
panic-or-process-error
unknown-no-report

The remaining fail-closed-known record, the single bootstrap_target failure, stays separately tracked and is not merged into the no-report failures.

Minimum evidence per no-report record:

exit code if available
stdout tail
stderr tail
partial skill directory exists
partial references directory exists
generated report exists
inferred failure stage
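One way to assemble the evidence listed above into a single record is sketched here. All key names, the 20-line tail length, and the three path parameters are assumptions; this design does not specify where the partial skill directory, the references directory, or the generation report live, so the caller passes them in.

```python
from pathlib import Path


def tail(text, max_lines=20):
    """Keep only the last max_lines lines of captured output."""
    if max_lines <= 0:
        return ""
    return "\n".join(text.splitlines()[-max_lines:])


def no_report_evidence(exit_code, stdout, stderr, skill_dir, references_dir, report_path):
    """Build one evidence record; exit_code may be None when unavailable."""
    return {
        "exitCode": exit_code,
        "stdoutTail": tail(stdout),
        "stderrTail": tail(stderr),
        "partialSkillDirExists": Path(skill_dir).is_dir(),
        "partialReferencesDirExists": Path(references_dir).is_dir(),
        "generatedReportExists": Path(report_path).is_file(),
        # Start at the catch-all stage until the evidence supports a narrower one.
        "inferredFailureStage": "unknown-no-report",
    }
```

Defaulting to `unknown-no-report` keeps a record from being assigned a specific stage before someone has read the tails.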

Result Schema

Top-level fields:

{
  "triageDate": "2026-04-19",
  "scope": "102-full-sweep-dry-run-triage",
  "sourceDryRun": "tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json",
  "summary": {},
  "timeoutTriage": [],
  "misclassificationTriage": [],
  "noReportFailureTriage": [],
  "bootstrapTargetFailures": [],
  "recommendations": []
}

Each triage record keeps the original dry-run scene id and scene name.
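A small sketch of building an empty result with the top-level fields above and checking that none is missing. The helper names are hypothetical; the field names and fixed values are taken from the schema.

```python
import json

# Top-level field names copied from the result schema.
TOP_LEVEL_FIELDS = (
    "triageDate",
    "scope",
    "sourceDryRun",
    "summary",
    "timeoutTriage",
    "misclassificationTriage",
    "noReportFailureTriage",
    "bootstrapTargetFailures",
    "recommendations",
)


def empty_triage_result():
    """Return a result skeleton with every top-level field present."""
    return {
        "triageDate": "2026-04-19",
        "scope": "102-full-sweep-dry-run-triage",
        "sourceDryRun": "tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json",
        "summary": {},
        "timeoutTriage": [],
        "misclassificationTriage": [],
        "noReportFailureTriage": [],
        "bootstrapTargetFailures": [],
        "recommendations": [],
    }


def missing_top_level_fields(result):
    """List schema fields that are absent from a result object."""
    return [name for name in TOP_LEVEL_FIELDS if name not in result]


skeleton = empty_triage_result()
print(json.dumps(skeleton, indent=2))
```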

Completion Criteria

This triage is complete when:

all 31 timeout records have a second-level timeout label
all 5 misclassified records have a routing triage label
all 25 no-report failures have an inferred failure stage
the bootstrap_target case remains separately visible
no scene status is promoted
no generator or analyzer logic is changed
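The countable criteria above can be checked mechanically against the triage JSON. In this sketch the per-record field names `label` and `failureStage` are assumptions, since this design does not fix them. The last two criteria (no promotion, no logic change) cannot be verified from the triage JSON alone and are not covered.

```python
# Expected record counts per section, from the completion criteria.
EXPECTED_COUNTS = {
    "timeoutTriage": 31,
    "misclassificationTriage": 5,
    "noReportFailureTriage": 25,
}

# Hypothetical name of the field that carries each record's triage outcome.
LABEL_FIELD = {
    "timeoutTriage": "label",
    "misclassificationTriage": "label",
    "noReportFailureTriage": "failureStage",
}


def completion_problems(result):
    """Return a list of human-readable problems; empty means the checks pass."""
    problems = []
    for section, expected in EXPECTED_COUNTS.items():
        records = result.get(section, [])
        if len(records) != expected:
            problems.append(f"{section}: expected {expected} records, found {len(records)}")
        field = LABEL_FIELD[section]
        unlabeled = sum(1 for r in records if not r.get(field))
        if unlabeled:
            problems.append(f"{section}: {unlabeled} records without {field}")
    if not result.get("bootstrapTargetFailures"):
        problems.append("bootstrapTargetFailures: bootstrap_target case is not visible")
    return problems
```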

Stop Rule

Stop after publishing the triage JSON and report.

Do not start implementation correction from this triage unless a new bounded implementation plan is explicitly created later.
