feat: add generated scene skill platform hardening
This commit is contained in:
@@ -0,0 +1,178 @@
|
||||
# Timeout Regression Diagnostic Plan
|
||||
|
||||
> Date: 2026-04-19
|
||||
> Status: Draft
|
||||
> Upstream Design: `docs/superpowers/specs/2026-04-19-timeout-regression-diagnostic-design.md`
|
||||
> Upstream Follow-up: `tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json`
|
||||
|
||||
## Plan Intent
|
||||
|
||||
Run a bounded diagnostic for the three timeout records after the structured fail-closed improvement follow-up sweep.
|
||||
|
||||
This plan only diagnoses timeout behavior. It does not implement fixes.
|
||||
|
||||
## Scope Guardrails
|
||||
|
||||
1. do not modify `src/generated_scene/analyzer.rs`
|
||||
2. do not modify `src/generated_scene/generator.rs`
|
||||
3. do not update `scene_execution_board_2026-04-18.json`
|
||||
4. do not promote scenes
|
||||
5. do not add family baselines
|
||||
6. do not handle the remaining structured fail-closed records
|
||||
7. do not handle adjudicated host-bridge records
|
||||
8. do not treat diagnostic rerun success as validated scene pass
|
||||
|
||||
## Fixed Input
|
||||
|
||||
The fixed input is:
|
||||
|
||||
`tests/fixtures/generated_scene/structured_fail_closed_improvement_followup_2026-04-19.json`
|
||||
|
||||
Only records with `followupStatus = source-unreadable` and reason `generator timeout after 45s` enter this plan.
|
||||
|
||||
Expected fixed set:
|
||||
|
||||
| Scene id | Scene | Type |
|
||||
| --- | --- | --- |
|
||||
| `sweep-015-scene` | `任务报表` | persistent timeout |
|
||||
| `sweep-025-scene` | `力禾动环系统巡视记录` | persistent timeout |
|
||||
| `sweep-040-scene` | `嘉峪关日报` | regression timeout |
|
||||
|
||||
## Phase 0: Freeze Timeout Inputs
|
||||
|
||||
### Objective
|
||||
|
||||
Freeze the exact timeout set before diagnostics.
|
||||
|
||||
### Tasks
|
||||
|
||||
1. read the follow-up sweep JSON
|
||||
2. filter `source-unreadable` timeout records
|
||||
3. verify the count is exactly `3`
|
||||
4. identify `sweep-040-scene` as the regression timeout
|
||||
|
||||
### Deliverables
|
||||
|
||||
1. frozen timeout input list
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
1. exactly `3` timeout records enter diagnostics
|
||||
2. no non-timeout record enters diagnostics
|
||||
|
||||
## Phase 1: Source Directory Diagnostics
|
||||
|
||||
### Objective
|
||||
|
||||
Determine whether timeout records are likely caused by source scale or source structure.
|
||||
|
||||
### Tasks
|
||||
|
||||
1. inspect each source directory
|
||||
2. count all files
|
||||
3. count HTML files
|
||||
4. count JavaScript files
|
||||
5. compute total source bytes
|
||||
6. record the largest files
|
||||
|
||||
### Deliverables
|
||||
|
||||
1. per-scene source diagnostics in JSON
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
1. all `3` timeout records have source diagnostics
|
||||
2. missing directories are reported explicitly
|
||||
|
||||
## Phase 2: Bounded Diagnostic Rerun
|
||||
|
||||
### Objective
|
||||
|
||||
Check whether each timeout completes under a longer diagnostic budget.
|
||||
|
||||
### Tasks
|
||||
|
||||
1. rerun each timeout scene with a diagnostic timeout budget
|
||||
2. write output under `examples/timeout_regression_diagnostic_2026-04-19`
|
||||
3. capture exit code
|
||||
4. capture elapsed seconds
|
||||
5. record whether a `generation-report.json` is produced
|
||||
6. do not update any execution status based on the result
|
||||
|
||||
### Deliverables
|
||||
|
||||
1. diagnostic rerun result per timeout scene
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
1. each timeout has exactly one diagnostic rerun result
|
||||
2. rerun success is marked only as diagnostic evidence
|
||||
3. rerun failure is categorized, not fixed
|
||||
|
||||
## Phase 3: Timeout Labeling
|
||||
|
||||
### Objective
|
||||
|
||||
Assign each timeout one final diagnostic label.
|
||||
|
||||
### Tasks
|
||||
|
||||
1. assign one primary diagnostic label:
|
||||
- `timeout-rerun-pass`
|
||||
- `timeout-rerun-fail-closed`
|
||||
- `timeout-large-source`
|
||||
- `timeout-command-hang`
|
||||
- `timeout-nondeterministic`
|
||||
- `timeout-source-scan-heavy`
|
||||
- `timeout-unknown`
|
||||
2. attach secondary labels when useful
|
||||
3. distinguish persistent timeouts from regression timeout
|
||||
|
||||
### Deliverables
|
||||
|
||||
1. labeled timeout diagnostic JSON
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
1. all `3` records have exactly one primary diagnostic label
|
||||
2. `sweep-040-scene` remains clearly identified as the regression timeout
|
||||
|
||||
## Phase 4: Diagnostic Report
|
||||
|
||||
### Objective
|
||||
|
||||
Publish diagnostic results without starting implementation.
|
||||
|
||||
### Tasks
|
||||
|
||||
1. write `tests/fixtures/generated_scene/timeout_regression_diagnostic_2026-04-19.json`
|
||||
2. write `docs/superpowers/reports/2026-04-19-timeout-regression-diagnostic-report.md`
|
||||
3. summarize whether the next step should be timeout implementation, rerun hygiene, or no action
|
||||
|
||||
### Deliverables
|
||||
|
||||
1. `tests/fixtures/generated_scene/timeout_regression_diagnostic_2026-04-19.json`
|
||||
2. `docs/superpowers/reports/2026-04-19-timeout-regression-diagnostic-report.md`
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
1. diagnostic output exists
|
||||
2. report exists
|
||||
3. no implementation changes are made
|
||||
4. no execution board update is made
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
This plan is complete when:
|
||||
|
||||
1. the three timeout records are frozen
|
||||
2. each has source diagnostics
|
||||
3. each has one diagnostic rerun result
|
||||
4. each has one final diagnostic label
|
||||
5. JSON and report are published
|
||||
|
||||
## Stop Statement
|
||||
|
||||
Stop after publishing the timeout diagnostic JSON and report.
|
||||
|
||||
Do not start timeout implementation or status promotion inside this plan.
|
||||
Reference in New Issue
Block a user