102 Full Sweep Dry-Run Triage Design

Date: 2026-04-19
Status: Draft
Upstream Result: tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json
Upstream Report: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-report.md

Design Intent

Split the non-pass buckets from the 102-scene full sweep into concrete, actionable triage categories, without changing generator behavior or promoting any scene's status.

The design answers:

Why did 62 of the 102 scenes fail to reach dry-run auto-pass, and which blocker should be handled first?

Starting Point

The upstream dry-run produced:

Status               Count
auto-pass               40
fail-closed-known       26
misclassified            5
source-unreadable       31
missing-source           0
unsupported-family       0
Total                  102

The triage scope is only the 62 non-pass records.
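
As a quick sanity check on the figures above, the non-pass scope can be derived from the status counts. This is a minimal sketch that only restates the table; the variable names are illustrative and not part of the upstream result file.

```python
# Counts copied from the status table above.
status_counts = {
    "auto-pass": 40,
    "fail-closed-known": 26,
    "misclassified": 5,
    "source-unreadable": 31,
    "missing-source": 0,
    "unsupported-family": 0,
}

total = sum(status_counts.values())
# Everything that is not auto-pass is in scope for triage.
non_pass = total - status_counts["auto-pass"]

print(total, non_pass)
```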

Scope Guardrails

  1. do not edit src/generated_scene/analyzer.rs
  2. do not edit src/generated_scene/generator.rs
  3. do not change scene generation logic
  4. do not update scene_execution_board_2026-04-18.json
  5. do not promote scenes from this triage
  6. do not add family baselines
  7. do not create implementation plans from a single failure
  8. do not rerun outside the fixed 102-scene set

Fixed Inputs

  1. dry-run result: tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json
  2. dry-run output root: examples/full_sweep_dry_run_2026-04-19
  3. execution board: tests/fixtures/generated_scene/scene_execution_board_2026-04-18.json
  4. scene root: D:/desk/智能体资料/全量业务场景/一平台场景

Fixed Outputs

  1. triage result: tests/fixtures/generated_scene/full_sweep_dry_run_triage_2026-04-19.json
  2. triage report: docs/superpowers/reports/2026-04-19-102-full-sweep-dry-run-triage-report.md

Triage Order

The order is fixed:

  1. timeout triage
  2. misclassification triage
  3. no-report failure triage

This order is deliberate:

  1. timeouts are the largest bucket and include already-mapped G2 scenes
  2. misclassification has the cleanest routing-quality signal
  3. no-report failures are too broad until the higher-signal buckets are separated

Timeout Triage Model

Input bucket:

dryRunStatus = source-unreadable

Current count:

31

Current reason:

generator timeout after 30s

Target second-level labels:

  1. timeout-known-family-sample
  2. timeout-unvalidated-source
  3. timeout-large-source
  4. timeout-command-hang
  5. timeout-generator-slow-but-progressing
  6. timeout-undetermined

Minimum evidence per timeout record:

  1. source directory exists
  2. file count
  3. total source bytes
  4. current group
  5. current board status
  6. real sample record id if present
  7. whether a partial skill directory exists
  8. whether a partial generation report exists
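
The first three evidence items (directory existence, file count, total bytes) can be gathered with a plain directory walk. The sketch below is one possible way to do it, not the project's tooling; the function name and the output keys `sourceDirExists`, `fileCount`, and `totalSourceBytes` are assumptions made for illustration.

```python
import os


def collect_source_evidence(source_dir):
    """Return existence, file count, and total bytes for one scene source directory.

    Key names are hypothetical; the spec only lists the evidence items.
    """
    if not os.path.isdir(source_dir):
        return {"sourceDirExists": False, "fileCount": 0, "totalSourceBytes": 0}

    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            file_count += 1
            try:
                total_bytes += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file is counted, but its size is skipped if it cannot be read.
                continue

    return {
        "sourceDirExists": True,
        "fileCount": file_count,
        "totalSourceBytes": total_bytes,
    }
```

The remaining evidence items (group, board status, real sample record id, partial outputs) come from the execution board and the dry-run output root, so they are not covered here.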

Diagnostic reruns are allowed only for classification. A rerun that succeeds with a longer timeout does not promote the scene.

Misclassification Triage Model

Input bucket:

dryRunStatus = misclassified

Current count:

5

Current shape:

  1. G3 -> host_bridge_workflow: 3
  2. G1-E -> host_bridge_workflow: 2
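
The shape above is a tally of (expected group, inferred archetype) pairs. A minimal sketch of that tally follows; the records are placeholders that only mirror the counts listed here, and the field names `expectedGroup` and `inferredArchetype` are assumptions rather than the dry-run file's confirmed schema.

```python
from collections import Counter

# Placeholder records mirroring the counts in "Current shape".
# Scene ids and field names are hypothetical.
records = [
    {"sceneId": "scene-a", "expectedGroup": "G3", "inferredArchetype": "host_bridge_workflow"},
    {"sceneId": "scene-b", "expectedGroup": "G3", "inferredArchetype": "host_bridge_workflow"},
    {"sceneId": "scene-c", "expectedGroup": "G3", "inferredArchetype": "host_bridge_workflow"},
    {"sceneId": "scene-d", "expectedGroup": "G1-E", "inferredArchetype": "host_bridge_workflow"},
    {"sceneId": "scene-e", "expectedGroup": "G1-E", "inferredArchetype": "host_bridge_workflow"},
]

# Count each expected -> inferred transition.
shape = Counter((r["expectedGroup"], r["inferredArchetype"]) for r in records)

for (expected, inferred), count in shape.most_common():
    print(f"{expected} -> {inferred}: {count}")
```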

Target second-level labels:

  1. route-overprefer-host-bridge
  2. board-expectation-stale
  3. mixed-workflow-host-bridge-valid
  4. scene-family-split-needed
  5. misclassification-undetermined

Minimum evidence per misclassification record:

  1. board expected group
  2. expected archetype
  3. dry-run inferred archetype
  4. current source asset
  5. real sample layer status
  6. generated report path
  7. failed or conflicting signal summary

This phase does not correct routing logic.

No-Report Failure Triage Model

Input bucket:

dryRunStatus = fail-closed-known, with reason "generator failed without generation report"

Current count:

25

Target failure stages:

  1. source-scan
  2. analyzer
  3. ir-assembly
  4. readiness-before-report
  5. compiler-package-write
  6. panic-or-process-error
  7. unknown-no-report

The one bootstrap_target failure accounts for the remaining fail-closed-known record (26 in total, 25 of them no-report). It remains tracked separately and is not merged into the no-report failures.
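
The bucket split described in this section can be sketched as a filter over the dry-run records. This is an illustration under assumptions: the field names `dryRunStatus` and `reason` are taken from the wording above, and treating every other fail-closed-known reason as "separately tracked" is a simplification, since the spec does not say how the bootstrap_target record is identified.

```python
NO_REPORT_REASON = "generator failed without generation report"


def split_fail_closed(records):
    """Split fail-closed-known records into no-report failures and the rest.

    Records with any other status are ignored.
    """
    no_report, separately_tracked = [], []
    for rec in records:
        if rec.get("dryRunStatus") != "fail-closed-known":
            continue
        if rec.get("reason") == NO_REPORT_REASON:
            no_report.append(rec)
        else:
            # For example, the bootstrap_target case stays out of the no-report bucket.
            separately_tracked.append(rec)
    return no_report, separately_tracked
```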

Minimum evidence per no-report record:

  1. exit code if available
  2. stdout tail
  3. stderr tail
  4. partial skill directory exists
  5. partial references directory exists
  6. generated report exists
  7. inferred failure stage
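
One way to assemble this evidence for a single record is sketched below. It is a hedged illustration, not the project's implementation: the key names, the default tail length of 20 lines, and the assumption that the references directory is a `references` subdirectory of the skill directory are all hypothetical. The inferred stage defaults to `unknown-no-report` until a reviewer assigns a more specific one.

```python
import os


def tail(text, max_lines=20):
    """Return the last max_lines lines of text (empty string if max_lines <= 0)."""
    if max_lines <= 0:
        return ""
    return "\n".join(text.splitlines()[-max_lines:])


def no_report_evidence(exit_code, stdout, stderr, skill_dir, report_path):
    """Build the minimum evidence record for one no-report failure."""
    return {
        "exitCode": exit_code,  # may be None when no exit code is available
        "stdoutTail": tail(stdout),
        "stderrTail": tail(stderr),
        "partialSkillDirExists": os.path.isdir(skill_dir),
        # Assumed layout: references live under the skill directory.
        "partialReferencesDirExists": os.path.isdir(os.path.join(skill_dir, "references")),
        "generatedReportExists": os.path.isfile(report_path),
        "inferredFailureStage": "unknown-no-report",
    }
```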

Result Schema

Top-level fields:

{
  "triageDate": "2026-04-19",
  "scope": "102-full-sweep-dry-run-triage",
  "sourceDryRun": "tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json",
  "summary": {},
  "timeoutTriage": [],
  "misclassificationTriage": [],
  "noReportFailureTriage": [],
  "bootstrapTargetFailures": [],
  "recommendations": []
}

Each triage record keeps the original dry-run scene id and scene name.
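
A minimal sketch of building the empty result and checking that every record carries its identity follows. The top-level keys are taken from the schema above; the per-record field names `sceneId` and `sceneName` are assumptions, since the spec only says that the scene id and scene name are kept.

```python
RECORD_LISTS = (
    "timeoutTriage",
    "misclassificationTriage",
    "noReportFailureTriage",
    "bootstrapTargetFailures",
)


def new_triage_result():
    """Return an empty result with the top-level fields from the schema."""
    return {
        "triageDate": "2026-04-19",
        "scope": "102-full-sweep-dry-run-triage",
        "sourceDryRun": "tests/fixtures/generated_scene/full_sweep_dry_run_2026-04-19.json",
        "summary": {},
        "timeoutTriage": [],
        "misclassificationTriage": [],
        "noReportFailureTriage": [],
        "bootstrapTargetFailures": [],
        "recommendations": [],
    }


def records_missing_identity(result):
    """List (section, index) pairs for records lacking a scene id or scene name."""
    missing = []
    for key in RECORD_LISTS:
        for index, rec in enumerate(result.get(key, [])):
            if not rec.get("sceneId") or not rec.get("sceneName"):
                missing.append((key, index))
    return missing
```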

Completion Criteria

This triage is complete when:

  1. all 31 timeout records have a second-level timeout label
  2. all 5 misclassified records have a routing triage label
  3. all 25 no-report failures have an inferred failure stage
  4. the bootstrap_target case remains separately visible
  5. no scene status is promoted
  6. no generator or analyzer logic is changed
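
The countable criteria (items 1 to 4) can be checked mechanically against the triage JSON. The sketch below assumes hypothetical per-record label fields (`timeoutLabel`, `routingLabel`, `failureStage`), which the spec does not name. Items 5 and 6 concern the execution board and the source tree, so they need a separate review, for example of the change diff.

```python
EXPECTED_COUNTS = {
    "timeoutTriage": 31,
    "misclassificationTriage": 5,
    "noReportFailureTriage": 25,
}

# Hypothetical label field per section; adjust to the real record schema.
LABEL_FIELDS = {
    "timeoutTriage": "timeoutLabel",
    "misclassificationTriage": "routingLabel",
    "noReportFailureTriage": "failureStage",
}


def completion_problems(result):
    """Return human-readable problems; an empty list means criteria 1-4 hold."""
    problems = []
    for key, expected in EXPECTED_COUNTS.items():
        records = result.get(key, [])
        if len(records) != expected:
            problems.append(f"{key}: expected {expected} records, found {len(records)}")
        unlabeled = sum(1 for rec in records if not rec.get(LABEL_FIELDS[key]))
        if unlabeled:
            problems.append(f"{key}: {unlabeled} records without {LABEL_FIELDS[key]}")
    if not result.get("bootstrapTargetFailures"):
        problems.append("bootstrapTargetFailures: bootstrap_target case is not visible")
    return problems
```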

Stop Rule

Stop after publishing the triage JSON and report.

Do not start implementation fixes based on this triage unless a new, bounded implementation plan is explicitly created later.