# Timeout Regression Diagnostic Design
> Date: 2026-04-19
> Status: Draft
> Upstream Plan: `docs/superpowers/plans/2026-04-19-structured-fail-closed-improvement-roadmap-plan.md`
> Upstream Follow-up: `tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json`
## Intent
Diagnose the three timeout records visible after the structured fail-closed improvement follow-up sweep.

This design is diagnostic-only. It does not change analyzer or generator logic, promote scenes, or update the execution board, and it does not treat a rerun that succeeds under a longer timeout budget as a validated pass.
## Problem Statement
The structured fail-closed improvement follow-up sweep produced:

| Status | Count |
| --- | ---: |
| `auto-pass` | 48 |
| `fail-closed-known` | 47 |
| `adjudicated-valid-host-bridge` | 4 |
| `source-unreadable` | 3 |

The three `source-unreadable` records are timeout records:

| Scene id | Scene | Timeout type |
| --- | --- | --- |
| `sweep-015-scene` | `任务报表` | persistent timeout |
| `sweep-025-scene` | `力禾动环系统巡视记录` | persistent timeout |
| `sweep-040-scene` | `嘉峪关日报` | new regression timeout |

`sweep-040-scene` is the most important record because it regressed from `fail-closed-known` in the reconciled baseline to `source-unreadable` in the follow-up sweep.
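
The distinction between the two timeout types can be sketched as a comparison of the two statuses. This is a minimal illustration, not the project's code: the function name `timeout_type` is hypothetical, and it assumes that a "persistent" timeout means the record was already `source-unreadable` in the reconciled baseline.

```python
from typing import Optional

SOURCE_UNREADABLE = "source-unreadable"


def timeout_type(baseline_status: str, followup_status: str) -> Optional[str]:
    """Return the timeout type for one scene, or None if it did not time out.

    Assumption: a timeout shows up as `source-unreadable` in the sweep output.
    """
    if followup_status != SOURCE_UNREADABLE:
        return None
    if baseline_status == SOURCE_UNREADABLE:
        # Already unreadable in the reconciled baseline.
        return "persistent timeout"
    # Readable in the baseline, unreadable now.
    return "new regression timeout"


# sweep-040-scene: fail-closed-known in the baseline, source-unreadable now.
print(timeout_type("fail-closed-known", "source-unreadable"))  # new regression timeout
```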
## Scope
In scope:
1. identify the three timeout records from the follow-up sweep
2. collect source directory diagnostics
3. run bounded diagnostic reruns with longer timeout budgets
4. classify each timeout with one final diagnostic label that records its secondary timeout reason
5. publish diagnostic JSON and report

Out of scope:
1. analyzer or generator implementation changes
2. readiness gate changes
3. execution board updates
4. scene promotion
5. family baseline changes
6. handling the remaining `47` structured fail-closed records
7. handling the `4` adjudicated host-bridge records
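
In-scope step 3, the bounded diagnostic rerun, might look like the sketch below. It uses only the standard library's `subprocess.run` with a hard `timeout`. The helper name `bounded_rerun` and the returned keys are hypothetical, and the command to rerun is a placeholder because this design does not name the actual analyzer or generator invocation.

```python
import subprocess
import sys
import time


def bounded_rerun(command, timeout_seconds):
    """Run `command` once under a hard time budget and record the outcome."""
    started = time.monotonic()
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            timeout=timeout_seconds,
        )
        exit_code = completed.returncode
        timed_out = False
        stderr = completed.stderr
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before raising, so nothing is left running.
        exit_code = None
        timed_out = True
        stderr = exc.stderr or b""
    elapsed = time.monotonic() - started
    return {
        "exit_code": exit_code,
        "elapsed_seconds": round(elapsed, 3),
        "timed_out": timed_out,
        # errors="replace" keeps undecodable stderr bytes from aborting the run.
        "stderr": stderr.decode("utf-8", errors="replace"),
    }


# Placeholder command; a real run would invoke the diagnostic target instead.
result = bounded_rerun([sys.executable, "-c", "print('ok')"], timeout_seconds=30)
print(result["exit_code"], result["timed_out"])
```

Keeping the budget explicit per call makes it easy to record how close the elapsed time came to the limit, which feeds the "elapsed time near budget" secondary label.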
## Diagnostic Labels
Each timeout must receive exactly one final diagnostic label:
1. `timeout-rerun-pass`
2. `timeout-rerun-fail-closed`
3. `timeout-large-source`
4. `timeout-command-hang`
5. `timeout-nondeterministic`
6. `timeout-source-scan-heavy`
7. `timeout-unknown`

Secondary labels may be attached for:
1. large file count
2. large total size
3. many HTML or JS files
4. generated report present after rerun
5. stderr decode noise
6. elapsed time near budget
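
The label rules above can be checked mechanically. The sketch below takes the seven final labels verbatim from this section. The secondary-label slugs are hypothetical identifiers invented for illustration, since this design describes the secondary labels only in prose, and `validate_labels` is likewise a hypothetical helper.

```python
FINAL_LABELS = frozenset({
    "timeout-rerun-pass",
    "timeout-rerun-fail-closed",
    "timeout-large-source",
    "timeout-command-hang",
    "timeout-nondeterministic",
    "timeout-source-scan-heavy",
    "timeout-unknown",
})

# Hypothetical slugs: the design lists these conditions without naming identifiers.
SECONDARY_LABELS = frozenset({
    "large-file-count",
    "large-total-size",
    "many-html-or-js-files",
    "report-present-after-rerun",
    "stderr-decode-noise",
    "elapsed-near-budget",
})


def validate_labels(final_label, secondary_labels=()):
    """Return a list of problems; an empty list means the labels are valid.

    Taking `final_label` as a single value enforces "exactly one final label".
    """
    problems = []
    if final_label not in FINAL_LABELS:
        problems.append(f"unknown final label: {final_label!r}")
    for label in secondary_labels:
        if label not in SECONDARY_LABELS:
            problems.append(f"unknown secondary label: {label!r}")
    return problems


print(validate_labels("timeout-large-source", ["large-file-count"]))  # []
```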
## Required Evidence
For each timeout record, collect:
1. scene id
2. scene name
3. source directory
4. previous reconciled status
5. follow-up status
6. file count
7. total source bytes
8. HTML file count
9. JavaScript file count
10. largest files
11. diagnostic rerun exit code
12. diagnostic rerun elapsed seconds
13. diagnostic rerun timed out flag
14. generation report path if produced
15. generation status if produced
16. final diagnostic label
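
Evidence items 6 through 10 (file count, total bytes, HTML and JavaScript counts, largest files) can be gathered in a single directory walk. The following is a rough sketch under stated assumptions: the function name, the returned keys, the suffix sets, and the `top_n` default are all illustrative choices, not part of this design.

```python
import os
import tempfile
from pathlib import Path

# Assumed suffix sets; the design does not say which extensions count.
HTML_SUFFIXES = {".html", ".htm"}
JS_SUFFIXES = {".js", ".mjs"}


def collect_source_stats(source_dir, top_n=5):
    """Walk `source_dir` once and summarize its files."""
    files = []
    for root, _dirs, names in os.walk(source_dir):
        for name in names:
            path = Path(root) / name
            try:
                size = path.stat().st_size
            except OSError:
                continue  # entries that cannot be stat'ed are skipped
            files.append((path, size))
    largest = sorted(files, key=lambda item: item[1], reverse=True)[:top_n]
    return {
        "file_count": len(files),
        "total_source_bytes": sum(size for _, size in files),
        "html_file_count": sum(1 for p, _ in files if p.suffix.lower() in HTML_SUFFIXES),
        "js_file_count": sum(1 for p, _ in files if p.suffix.lower() in JS_SUFFIXES),
        "largest_files": [{"path": str(p), "bytes": s} for p, s in largest],
    }


# Demo on a throwaway directory rather than a real scene source.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.html").write_text("<html></html>")
    (Path(tmp) / "app.js").write_text("console.log(1);")
    (Path(tmp) / "notes.txt").write_text("x")
    demo_stats = collect_source_stats(tmp)

print(demo_stats["file_count"], demo_stats["html_file_count"], demo_stats["js_file_count"])  # 3 1 1
```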
## Output
Diagnostic output:

`tests/fixtures/generated_scene/timeout_regression_diagnostic_2026-04-19.json`

Report output:

`docs/superpowers/reports/2026-04-19-timeout-regression-diagnostic-report.md`
## Success Criteria
1. exactly three timeout records are diagnosed
2. `sweep-040-scene` is explicitly marked as the regression timeout
3. the two persistent timeout records remain distinguishable from the regression timeout
4. each record has exactly one final diagnostic label
5. no implementation changes are made
6. no execution board state is updated
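
Criteria 1 through 4 can be checked against the diagnostic records themselves. The sketch below assumes each record is a dictionary with hypothetical keys `scene_id`, `timeout_type`, and `final_label`, and that `timeout_type` is either `"regression"` or `"persistent"`. The actual JSON schema is not fixed by this design. Criteria 5 and 6 concern repository and board state and are outside what this check can see.

```python
REGRESSION_SCENE_ID = "sweep-040-scene"


def check_success_criteria(records):
    """Return a list of violated criteria; an empty list means 1-4 hold."""
    problems = []
    if len(records) != 3:
        problems.append(f"expected 3 timeout records, found {len(records)}")
    scene_ids = [record.get("scene_id") for record in records]
    if REGRESSION_SCENE_ID not in scene_ids:
        problems.append(f"{REGRESSION_SCENE_ID} is missing")
    for record in records:
        scene_id = record.get("scene_id")
        is_regression = record.get("timeout_type") == "regression"
        # Only sweep-040-scene may carry the regression marking, and it must carry it.
        if is_regression != (scene_id == REGRESSION_SCENE_ID):
            problems.append(f"{scene_id}: regression marking is wrong")
        if not record.get("final_label"):
            problems.append(f"{scene_id}: missing final diagnostic label")
    return problems


# Placeholder labels for illustration only, not actual diagnostic findings.
example_records = [
    {"scene_id": "sweep-015-scene", "timeout_type": "persistent", "final_label": "timeout-unknown"},
    {"scene_id": "sweep-025-scene", "timeout_type": "persistent", "final_label": "timeout-unknown"},
    {"scene_id": "sweep-040-scene", "timeout_type": "regression", "final_label": "timeout-unknown"},
]
print(check_success_criteria(example_records))  # []
```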