Timeout Regression Diagnostic Report

Date: 2026-04-19
Plan: docs/superpowers/plans/2026-04-19-timeout-regression-diagnostic-plan.md
Follow-up input: tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json

Scope

This diagnostic covers only the three timeout records from the structured fail-closed improvement follow-up sweep.

No analyzer or generator logic was changed.

No execution board state was updated.

Frozen Timeout Inputs

Scene id	Scene	Type	Previous reconciled status	Follow-up status
`sweep-015-scene`	`????`	persistent timeout	`source-unreadable`	`source-unreadable`
`sweep-025-scene`	`??????????`	persistent timeout	`source-unreadable`	`source-unreadable`
`sweep-040-scene`	`?????`	regression timeout	`fail-closed-known`	`source-unreadable`

Source Diagnostics

Scene id	File count	HTML	JS	Total bytes
`sweep-015-scene`	93	10	21	96,922,420
`sweep-025-scene`	137	51	38	11,274,750
`sweep-040-scene`	50	2	21	5,037,507

Interpretation:

- sweep-015-scene is the largest source set by total bytes and contains many zip artifacts.
- sweep-025-scene is not the largest by bytes, but it has the highest combined HTML and JavaScript file count.
- sweep-040-scene is materially smaller than the two persistent timeout records, so its regression does not look like a pure source-scale problem.
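
The columns in the Source Diagnostics table could be collected with a short script along these lines. This is a minimal sketch, not the tool used for this report: `source_diagnostics` is a hypothetical helper, and the choice of file extensions counted as HTML, JS, and zip is an assumption.

```python
from pathlib import Path


def source_diagnostics(scene_dir: Path) -> dict:
    """Count files and bytes under one scene's source directory.

    Assumption: HTML means .html/.htm, JS means .js, zip means .zip.
    """
    files = [p for p in scene_dir.rglob("*") if p.is_file()]
    suffixes = [p.suffix.lower() for p in files]
    return {
        "file_count": len(files),
        "html": sum(s in {".html", ".htm"} for s in suffixes),
        "js": sum(s == ".js" for s in suffixes),
        "zip": sum(s == ".zip" for s in suffixes),
        "total_bytes": sum(p.stat().st_size for p in files),
    }
```

Comparing `html + js` and `total_bytes` across scenes gives the same two axes used in the interpretation above: scan-heavy sources versus large sources.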

Diagnostic Rerun

A bounded diagnostic rerun was executed for each timeout record with a 90s timeout budget.

Scene id	Elapsed seconds	Exit code	Timed out	Generation report	Result
`sweep-015-scene`	74.76	0	`false`	present	readiness `A`
`sweep-025-scene`	49.03	0	`false`	present	readiness `A`
`sweep-040-scene`	45.91	1	`false`	present	fail-closed, readiness `C`
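
A bounded rerun of this kind can be sketched as follows. It assumes the generator is invoked as a subprocess; `bounded_rerun` and `RerunResult` are hypothetical names, and the actual command line is not stated in this report.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RerunResult:
    elapsed_seconds: float
    exit_code: Optional[int]  # None when the run was killed at the budget
    timed_out: bool


def bounded_rerun(command: List[str], budget_seconds: float = 90.0) -> RerunResult:
    """Run one command under a wall-clock budget and record how it ended."""
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises TimeoutExpired at the budget.
        completed = subprocess.run(command, timeout=budget_seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        return RerunResult(time.monotonic() - start, None, True)
    return RerunResult(time.monotonic() - start, completed.returncode, False)
```

Recording the exit code separately from the timed-out flag is what lets a run such as sweep-040-scene (exit code 1, not timed out) be told apart from a true timeout.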

Final Diagnostic Labels

Scene id	Final label	Secondary labels
`sweep-015-scene`	`timeout-rerun-pass`	`large-total-source`, `zip-heavy-source`
`sweep-025-scene`	`timeout-rerun-pass`	`source-scan-heavy`, `high-html-js-count`
`sweep-040-scene`	`timeout-rerun-fail-closed`	`regression-timeout`, `budget-sensitive-timeout`
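
The final labels follow from the rerun table in a way that can be written down as a small rule. The sketch below is an assumption inferred from the three observed rows, not a documented labeling policy; `final_label` is a hypothetical name, and cases this report did not observe return `None` rather than an invented label.

```python
from typing import Optional


def final_label(exit_code: Optional[int], timed_out: bool, report_present: bool) -> Optional[str]:
    """Map one rerun outcome to a final diagnostic label.

    Assumption: a non-zero exit with a generation report present means
    the run failed closed, as it did for sweep-040-scene.
    """
    if timed_out or not report_present:
        return None  # not observed in this diagnostic; no label defined here
    if exit_code == 0:
        return "timeout-rerun-pass"
    return "timeout-rerun-fail-closed"


# The three rows of the Diagnostic Rerun table: (exit code, timed out, report present).
rows = {
    "sweep-015-scene": (0, False, True),
    "sweep-025-scene": (0, False, True),
    "sweep-040-scene": (1, False, True),
}
labels = {scene: final_label(*row) for scene, row in rows.items()}
```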

Conclusions

The two persistent timeout records are not hard failures. Under a bounded 90s diagnostic rerun, both completed successfully.
sweep-040-scene is the only real regression timeout. Under the same 90s diagnostic rerun, it resolved into a structured fail-closed result instead of timing out.
The current timeout bucket is therefore mixed:
- two records are budget-sensitive successful runs
- one record is a budget-sensitive regression that should really be treated as a structured fail-closed case after rerun
Timeout implementation should not be the first next step. The higher-value next step is rerun hygiene and a timeout-budget policy, so that scenes like sweep-040-scene are not miscounted as unreadable when they can resolve into a concrete fail-closed result.
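
One possible shape for such a timeout-budget policy is sketched below: a scene is recorded as `source-unreadable` only after every budget in an escalating list has been exhausted. This is an illustration under assumptions, not a proposed implementation. `resolve_with_budgets` and the `run_scene` callable are hypothetical, the result strings `pass` and `fail-closed` are placeholders, and the original sweep budget is not stated in this report, so the 30 s value in the example is invented.

```python
from typing import Callable, Optional, Sequence, Tuple

# (timed_out, exit_code) as returned by a hypothetical scene runner.
RunOutcome = Tuple[bool, Optional[int]]


def resolve_with_budgets(run_scene: Callable[[float], RunOutcome],
                         budgets: Sequence[float]) -> str:
    """Try each budget in order; stop at the first run that finishes."""
    for budget in budgets:
        timed_out, exit_code = run_scene(budget)
        if timed_out:
            continue  # escalate to the next, larger budget
        return "pass" if exit_code == 0 else "fail-closed"
    return "source-unreadable"  # every budget was exhausted


def fake_sweep_040(budget: float) -> RunOutcome:
    # Simulated runner: sweep-040-scene needed 45.91 s and exited with code 1.
    return (True, None) if budget < 45.91 else (False, 1)


status = resolve_with_budgets(fake_sweep_040, budgets=[30.0, 90.0])
```

With the placeholder budgets, the simulated scene times out at 30 s and resolves to a fail-closed result at 90 s, which matches the behavior observed in the diagnostic rerun.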

Boundaries Preserved

This diagnostic did not:

- change analyzer or generator code
- update scene_execution_board_2026-04-18.json
- promote any scene
- rerun the full 102-scene sweep
- start an implementation correction plan
