sgclaw: snapshot today's runtime and skill updates
551
docs/plans/2026-03-26-zeroclaw-prompt-safety-hardening-plan.md
Normal file
@@ -0,0 +1,551 @@
# ZeroClaw Prompt Safety Hardening Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Harden ZeroClaw prompt handling and tool execution so non-skill freeform operations degrade to read-only or business-approved execution, while trusted skill-defined operations retain bounded execution privileges.

**Architecture:** Build a security gate around the existing prompt and tool-entry paths instead of rewriting the full prompt architecture. The gate classifies prompt-injection risk, records operation provenance (`trusted_skill` vs `non_skill`), sanitizes injected workspace/skill content, and enforces execution mode transitions (`clean`, `suspect_readonly`, `suspect_waiting_approval`, `suspect_business_approved`). Trusted skills gain structured business-operation metadata; non-skill operations require business-level approval before any privileged capability is released.

**Tech Stack:** Rust, vendored ZeroClaw (`third_party/zeroclaw`), existing approval/autonomy system, current prompt guard and prompt builder tests, `cargo test`.

### Task 1: Create an Isolated Worktree and Verify a Clean Baseline

**Files:**
- Modify: `/home/zyl/projects/sgClaw/claw/.gitignore`
- Create: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/**`

**Step 1: Verify the worktree directory is safe to use**

Run:
```bash
cd /home/zyl/projects/sgClaw/claw
ls -d .worktrees
git check-ignore -v .worktrees
```

Expected: `.worktrees` exists and is ignored by git. If `git check-ignore` prints nothing, add `.worktrees/` to `.gitignore` before continuing.

**Step 2: Create the implementation worktree**

Run:
```bash
cd /home/zyl/projects/sgClaw/claw
git worktree add .worktrees/zeroclaw-prompt-safety-hardening -b zeroclaw-prompt-safety-hardening
```

Expected: a new branch and worktree are created.

**Step 3: Run the baseline tests in the worktree**

Run:
```bash
cd /home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening
cargo test -p zeroclawlabs prompt_guard -- --nocapture
cargo test -p zeroclawlabs build_system_prompt -- --nocapture
```

Expected: existing relevant tests pass before any code changes.

**Step 4: Commit the clean worktree setup if `.gitignore` changed**

Run:
```bash
git add .gitignore
git commit -m "chore: prepare worktree for prompt safety hardening"
```

Expected: commit only if `.gitignore` required an adjustment.

### Task 2: Add the Core Security-Mode Data Model

**Files:**
- Create: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/security/operation_policy.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/security/mod.rs`
- Test: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/security/operation_policy.rs`

**Step 1: Write the failing policy tests**

Add tests that prove:
- suspicious non-skill input maps to `suspect_readonly`
- trusted skill operations can request bounded privileged execution
- any out-of-scope capability request downgrades the operation

Use concrete enums and assertions, for example:
```rust
assert_eq!(
    ExecutionMode::from_guard_and_provenance(GuardRisk::Suspicious, OperationProvenance::NonSkill),
    ExecutionMode::SuspectReadOnly
);
```

**Step 2: Run the tests to verify RED**

Run:
```bash
cd /home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening
cargo test -p zeroclawlabs operation_policy -- --nocapture
```

Expected: fail because the new types do not exist yet.

**Step 3: Implement the minimal policy model**

Define:
- `GuardRisk` (`Clean`, `Suspicious`, `Dangerous`)
- `OperationProvenance` (`TrustedSkill`, `NonSkill`, `Mixed`)
- `ExecutionMode` (`Clean`, `SuspectReadOnly`, `SuspectWaitingApproval`, `SuspectBusinessApproved`)
- `CapabilityClass` for privileged business actions

Add small helper functions that do only state mapping. Do not pull prompt-building logic into this module.
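
A minimal sketch of what this module might look like, assuming the enum and function names from Steps 1 and 3. The mapping rules other than the `Suspicious` + `NonSkill` case asserted in Step 1 are assumptions for illustration, not a confirmed policy.

```rust
// Hypothetical sketch of operation_policy.rs.
// Only the (Suspicious, NonSkill) -> SuspectReadOnly rule comes from the plan;
// the other arms are assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuardRisk {
    Clean,
    Suspicious,
    Dangerous,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationProvenance {
    TrustedSkill,
    NonSkill,
    Mixed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Clean,
    SuspectReadOnly,
    SuspectWaitingApproval,
    SuspectBusinessApproved,
}

impl ExecutionMode {
    /// Pure state mapping: no prompt building and no I/O.
    pub fn from_guard_and_provenance(risk: GuardRisk, provenance: OperationProvenance) -> Self {
        match (risk, provenance) {
            // Assumption: clean input keeps the normal mode for every provenance.
            (GuardRisk::Clean, _) => ExecutionMode::Clean,
            // Assumption: a trusted skill keeps its bounded privileges on
            // suspicious input; the bounds are enforced later at dispatch.
            (GuardRisk::Suspicious, OperationProvenance::TrustedSkill) => ExecutionMode::Clean,
            // Everything else, including Mixed provenance and Dangerous
            // input, degrades to read-only.
            _ => ExecutionMode::SuspectReadOnly,
        }
    }
}
```

The approval-related modes (`SuspectWaitingApproval`, `SuspectBusinessApproved`) are never produced by this mapping in the sketch; they would be reached through the approval flow in Task 6.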

**Step 4: Re-run the policy tests to verify GREEN**

Run:
```bash
cargo test -p zeroclawlabs operation_policy -- --nocapture
```

Expected: the new policy tests pass.

**Step 5: Commit**

Run:
```bash
git add third_party/zeroclaw/src/security/mod.rs third_party/zeroclaw/src/security/operation_policy.rs
git commit -m "feat: add prompt security execution mode model"
```

### Task 3: Add Structured Skill Trust Metadata

**Files:**
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/skills/mod.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/tools/read_skill.rs`
- Test: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/skills/mod.rs`

**Step 1: Write failing skill metadata tests**

Add tests that prove:
- `SKILL.toml` can declare a business operation type, capability list, argument constraints, and `step_budget`
- markdown-only skills default to unprivileged metadata
- malformed privileged metadata is rejected or downgraded safely

Use a manifest shape like:
```toml
[skill]
name = "export-report"
description = "Export the monthly report"

[security]
operation_type = "browser_export_data"
allowed_capabilities = ["browser_read", "browser_export"]
step_budget = 6
approval_mode = "trusted_skill"
```

**Step 2: Run the tests to verify RED**

Run:
```bash
cargo test -p zeroclawlabs skill -- --nocapture
```

Expected: fail because the structured metadata fields are missing.

**Step 3: Implement minimal structured metadata**

Extend `Skill` with a structured security block, for example:
- `operation_type`
- `business_description`
- `allowed_capabilities`
- `arg_constraints`
- `step_budget`
- `approval_mode`

Default markdown-only skills to unprivileged metadata so existing skills remain compatible.
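
The security block could be modeled roughly as follows. This is a sketch under assumptions: the field names follow the list above, but the field types, the `SkillSecurity` and `ApprovalMode` names, and the `validated` helper are hypothetical, and TOML deserialization is left out.

```rust
// Hypothetical types; only the field names come from the plan.

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ApprovalMode {
    /// Default for markdown-only skills.
    #[default]
    Unprivileged,
    TrustedSkill,
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SkillSecurity {
    pub operation_type: Option<String>,
    pub business_description: Option<String>,
    pub allowed_capabilities: Vec<String>,
    /// Assumed shape: (argument name, constraint expression) pairs.
    pub arg_constraints: Vec<(String, String)>,
    pub step_budget: u32,
    pub approval_mode: ApprovalMode,
}

impl SkillSecurity {
    /// Downgrade malformed privileged metadata to the unprivileged default
    /// instead of trusting a partially filled block.
    pub fn validated(self) -> Self {
        let malformed = self.approval_mode == ApprovalMode::TrustedSkill
            && (self.operation_type.is_none()
                || self.allowed_capabilities.is_empty()
                || self.step_budget == 0);
        if malformed {
            SkillSecurity::default()
        } else {
            self
        }
    }
}
```

Deriving `Default` keeps existing markdown-only skills compatible: a skill with no `[security]` table simply gets `SkillSecurity::default()`, which grants nothing.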

**Step 4: Make `read_skill` expose the metadata**

Return or prepend enough structured metadata so the runtime can distinguish trusted skill operations from plain prompt text.

**Step 5: Re-run the tests to verify GREEN**

Run:
```bash
cargo test -p zeroclawlabs skill -- --nocapture
```

Expected: skill parsing and `read_skill` tests pass.

**Step 6: Commit**

Run:
```bash
git add third_party/zeroclaw/src/skills/mod.rs third_party/zeroclaw/src/tools/read_skill.rs
git commit -m "feat: add trusted skill security metadata"
```

### Task 4: Sanitize Injected Workspace and Skill Content Before Prompt Assembly

**Files:**
- Create: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/security/prompt_sanitizer.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/security/mod.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/channels/mod.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/agent/prompt.rs`
- Test: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/channels/mod.rs`

**Step 1: Write failing sanitizer tests**

Add tests that prove:
- dangerous bootstrap phrases are removed, escaped, or summarized before prompt injection
- control characters are stripped
- overlong files are truncated with an audit-friendly marker
- safe business content remains readable

**Step 2: Run the tests to verify RED**

Run:
```bash
cargo test -p zeroclawlabs build_system_prompt -- --nocapture
```

Expected: fail because injected files are still copied verbatim.

**Step 3: Implement the sanitizer**

Create a small sanitizer that:
- strips control characters
- caps content length
- flags prompt-override phrases
- emits sanitized content plus metadata such as `truncated` and matched rules

Use this sanitizer in:
- `load_openclaw_bootstrap_files`
- any shared path in `agent/prompt.rs` that renders workspace or skill text into the system prompt
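
The four sanitizer behaviors listed in this step can be sketched as below. The `Sanitized` type, the `sanitize` signature, the phrase list, and the truncation marker text are all assumptions for illustration; a real rule set would be larger and driven by the tests from Step 1. This sketch only flags override phrases and leaves removal or escaping to the caller.

```rust
/// Result of sanitizing one injected file (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Sanitized {
    pub content: String,
    pub truncated: bool,
    pub matched_rules: Vec<&'static str>,
}

// Illustrative phrases only, not a vetted rule set.
const OVERRIDE_PHRASES: &[&str] = &[
    "ignore previous instructions",
    "disregard the system prompt",
];

// Assumed marker text; chosen so truncation is easy to find in audit logs.
const TRUNCATION_MARKER: &str = "\n[truncated by prompt sanitizer]";

pub fn sanitize(input: &str, max_chars: usize) -> Sanitized {
    // Strip control characters but keep newlines and tabs for readability.
    let cleaned: String = input
        .chars()
        .filter(|c| !c.is_control() || *c == '\n' || *c == '\t')
        .collect();

    // Flag prompt-override phrases with a case-insensitive substring match.
    let lowered = cleaned.to_lowercase();
    let matched_rules: Vec<&'static str> = OVERRIDE_PHRASES
        .iter()
        .copied()
        .filter(|phrase| lowered.contains(*phrase))
        .collect();

    // Cap the length by characters (never mid code point) and mark the cut.
    let truncated = cleaned.chars().count() > max_chars;
    let mut content: String = cleaned.chars().take(max_chars).collect();
    if truncated {
        content.push_str(TRUNCATION_MARKER);
    }

    Sanitized { content, truncated, matched_rules }
}
```

Matching runs after control characters are stripped, so a phrase split by an embedded control character is still caught.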

**Step 4: Re-run the tests to verify GREEN**

Run:
```bash
cargo test -p zeroclawlabs build_system_prompt -- --nocapture
```

Expected: prompt-building tests pass with the new sanitized behavior.

**Step 5: Commit**

Run:
```bash
git add third_party/zeroclaw/src/security/mod.rs third_party/zeroclaw/src/security/prompt_sanitizer.rs third_party/zeroclaw/src/channels/mod.rs third_party/zeroclaw/src/agent/prompt.rs
git commit -m "feat: sanitize injected workspace prompt content"
```

### Task 5: Wire `PromptGuard` into Main Agent and Gateway Entry Points

**Files:**
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/security/prompt_guard.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/agent/agent.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/gateway/mod.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/gateway/ws.rs`
- Test: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/agent/agent.rs`

**Step 1: Write failing entry-point tests**

Add tests that prove:
- suspicious input marks the turn as degraded instead of silently continuing
- dangerous input is blocked
- clean input remains unchanged

Prefer tests that assert on a security decision object instead of brittle prompt strings.

**Step 2: Run the tests to verify RED**

Run:
```bash
cargo test -p zeroclawlabs prompt_guard -- --nocapture
cargo test -p zeroclawlabs agent -- --nocapture
```

Expected: fail because no entry path consumes the guard result.

**Step 3: Implement guarded entry evaluation**

Before each turn:
- scan the inbound user content
- map the guard result into `GuardRisk`
- create an execution context carrying risk and provenance
- attach audit details for later logging

Keep the existing `PromptGuard` regexes unless a test demands a specific adjustment.
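
One way to get a testable security decision object is to map the guard result into a small struct. In the sketch below, `GuardVerdict` is a stand-in for whatever `PromptGuard` actually returns (its real shape is not shown in this plan), and `SecurityDecision` and `decide` are hypothetical names.

```rust
/// Stand-in for the real `PromptGuard` result type, which may differ.
pub enum GuardVerdict {
    Safe,
    /// Names of the guard rules that matched.
    Suspicious(Vec<String>),
    /// Reason the input was rejected.
    Blocked(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuardRisk {
    Clean,
    Suspicious,
    Dangerous,
}

/// Decision object that tests can assert on instead of prompt strings.
#[derive(Debug, PartialEq, Eq)]
pub struct SecurityDecision {
    pub risk: GuardRisk,
    pub blocked: bool,
    /// Audit detail kept for later logging.
    pub matched_rules: Vec<String>,
}

pub fn decide(verdict: GuardVerdict) -> SecurityDecision {
    match verdict {
        GuardVerdict::Safe => SecurityDecision {
            risk: GuardRisk::Clean,
            blocked: false,
            matched_rules: Vec::new(),
        },
        // Suspicious input is degraded, not blocked.
        GuardVerdict::Suspicious(rules) => SecurityDecision {
            risk: GuardRisk::Suspicious,
            blocked: false,
            matched_rules: rules,
        },
        GuardVerdict::Blocked(reason) => SecurityDecision {
            risk: GuardRisk::Dangerous,
            blocked: true,
            matched_rules: vec![reason],
        },
    }
}
```

An entry point would call `decide` once per turn and pass the result, together with the operation provenance, into the execution context.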

**Step 4: Re-run the tests to verify GREEN**

Run:
```bash
cargo test -p zeroclawlabs prompt_guard -- --nocapture
cargo test -p zeroclawlabs agent -- --nocapture
```

Expected: suspicious and blocked paths now behave deterministically.

**Step 5: Commit**

Run:
```bash
git add third_party/zeroclaw/src/security/prompt_guard.rs third_party/zeroclaw/src/agent/agent.rs third_party/zeroclaw/src/gateway/mod.rs third_party/zeroclaw/src/gateway/ws.rs
git commit -m "feat: enforce prompt guard at runtime entry points"
```

### Task 6: Add Business-Level Privileged Operation Registry and Approval Tokens

**Files:**
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/approval/mod.rs`
- Create: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/security/business_approval.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/security/mod.rs`
- Test: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/security/business_approval.rs`

**Step 1: Write failing business approval tests**

Add tests that prove:
- only operations in the privileged registry can request approval
- approval tokens bind to `session_id`, `operation_type`, `allowed_capabilities`, `step_budget`, and expiration
- a mismatched or expired approval token is rejected

**Step 2: Run the tests to verify RED**

Run:
```bash
cargo test -p zeroclawlabs business_approval -- --nocapture
```

Expected: fail because the business approval registry does not exist yet.

**Step 3: Implement the registry and token model**

Create:
- a privileged business operation registry
- a single-operation approval token
- helper checks for `can_request_approval` and `matches_execution_request`

Model approval at the business-operation level, not raw tool calls.
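
A rough sketch of the token and registry checks, assuming the bound fields listed in Step 1. The `ApprovalToken` and `ExecutionRequest` types, the static registry contents, and the use of Unix seconds for expiry are assumptions; only the two helper names come from this step.

```rust
/// Single-operation approval token (hypothetical layout).
#[derive(Debug, Clone)]
pub struct ApprovalToken {
    pub session_id: String,
    pub operation_type: String,
    pub allowed_capabilities: Vec<String>,
    pub step_budget: u32,
    /// Assumed representation: expiry as Unix seconds.
    pub expires_at_unix: u64,
}

/// What the runtime is about to execute (hypothetical type).
pub struct ExecutionRequest<'a> {
    pub session_id: &'a str,
    pub operation_type: &'a str,
    pub capability: &'a str,
}

// Assumed registry contents; a real registry would likely be configurable.
const PRIVILEGED_OPERATIONS: &[&str] = &["browser_export_data"];

/// Only registered business operations may ask for approval at all.
pub fn can_request_approval(operation_type: &str) -> bool {
    PRIVILEGED_OPERATIONS.contains(&operation_type)
}

impl ApprovalToken {
    /// True only if every bound field matches and the token is still live.
    /// The caller passes the current time so the check stays deterministic.
    pub fn matches_execution_request(&self, req: &ExecutionRequest, now_unix: u64) -> bool {
        now_unix < self.expires_at_unix
            && self.step_budget > 0
            && self.session_id == req.session_id
            && self.operation_type == req.operation_type
            && self.allowed_capabilities.iter().any(|c| c == req.capability)
    }
}
```

Passing `now_unix` in, rather than reading the clock inside the check, makes the expiry tests from Step 1 straightforward to write.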

**Step 4: Extend the existing approval module**

Teach the approval module to carry business-level fields through the current request/response flow without breaking old call sites.

**Step 5: Re-run the tests to verify GREEN**

Run:
```bash
cargo test -p zeroclawlabs business_approval -- --nocapture
```

Expected: the token validation and registry tests pass.

**Step 6: Commit**

Run:
```bash
git add third_party/zeroclaw/src/approval/mod.rs third_party/zeroclaw/src/security/mod.rs third_party/zeroclaw/src/security/business_approval.rs
git commit -m "feat: add business-level approval registry"
```

### Task 7: Enforce Execution Modes in Tool Dispatch

**Files:**
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/agent/dispatcher.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/agent/agent.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/agent/loop_.rs`
- Test: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/agent/dispatcher.rs`

**Step 1: Write failing dispatcher tests**

Add tests that prove:
- `suspect_readonly` allows only safe read capabilities
- a `trusted_skill` operation can execute only the capabilities listed in its metadata, and only within its `step_budget`
- `mixed` or non-skill privileged calls require a matching business approval token

**Step 2: Run the tests to verify RED**

Run:
```bash
cargo test -p zeroclawlabs dispatcher -- --nocapture
```

Expected: fail because the dispatcher does not yet know about execution modes.

**Step 3: Implement capability enforcement**

Before dispatching any tool:
- resolve the operation context
- map the tool call to a capability class
- reject calls outside the current execution mode
- decrement or validate `step_budget` for approved bounded flows

Do not rely on prompt text for enforcement.
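
The pre-dispatch check might look like the sketch below. The `CapabilityClass` variants, the tool-name mapping, the `OperationContext` layout, and the rule that `Clean` mode defers to the existing approval/autonomy system are assumptions for illustration.

```rust
// Hypothetical capability classes; the plan only names the type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CapabilityClass {
    Read,
    Write,
    Export,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Clean,
    SuspectReadOnly,
    SuspectWaitingApproval,
    SuspectBusinessApproved,
}

pub struct OperationContext {
    pub mode: ExecutionMode,
    /// Capabilities released by skill metadata or an approval token.
    pub allowed: Vec<CapabilityClass>,
    pub step_budget: u32,
}

/// Assumed mapping; unknown tools are treated as privileged writes.
pub fn classify_tool(tool_name: &str) -> CapabilityClass {
    match tool_name {
        "file_read" | "read_skill" => CapabilityClass::Read,
        "browser_export" => CapabilityClass::Export,
        _ => CapabilityClass::Write,
    }
}

/// Decide from structured state only, never from prompt text.
pub fn authorize(ctx: &mut OperationContext, tool_name: &str) -> Result<(), String> {
    let class = classify_tool(tool_name);
    match ctx.mode {
        // Assumption: clean turns defer to the existing approval system.
        ExecutionMode::Clean => Ok(()),
        ExecutionMode::SuspectReadOnly | ExecutionMode::SuspectWaitingApproval => {
            if class == CapabilityClass::Read {
                Ok(())
            } else {
                Err(format!("{tool_name} denied in {:?}", ctx.mode))
            }
        }
        ExecutionMode::SuspectBusinessApproved => {
            if class == CapabilityClass::Read {
                return Ok(());
            }
            if !ctx.allowed.contains(&class) {
                return Err(format!("{tool_name} is outside the approved capabilities"));
            }
            if ctx.step_budget == 0 {
                return Err("step budget exhausted".to_string());
            }
            // Each privileged call consumes one step of the bounded flow.
            ctx.step_budget -= 1;
            Ok(())
        }
    }
}
```

Treating unknown tools as `Write` is a fail-closed choice: a newly added tool is denied in degraded modes until someone classifies it.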

**Step 4: Re-run the tests to verify GREEN**

Run:
```bash
cargo test -p zeroclawlabs dispatcher -- --nocapture
```

Expected: dispatch now respects read-only, trusted skill, and business-approved modes.

**Step 5: Commit**

Run:
```bash
git add third_party/zeroclaw/src/agent/dispatcher.rs third_party/zeroclaw/src/agent/agent.rs third_party/zeroclaw/src/agent/loop_.rs
git commit -m "feat: enforce execution mode in tool dispatch"
```

### Task 8: Default Skills Prompt Injection to Compact for Safer Runtime Behavior

**Files:**
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/config/schema.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/agent/prompt.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/channels/mod.rs`
- Test: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/config/schema.rs`

**Step 1: Write the failing configuration test**

Add a test that asserts the default skill prompt injection mode is `Compact` unless explicitly configured otherwise.

**Step 2: Run the test to verify RED**

Run:
```bash
cargo test -p zeroclawlabs skills_prompt_injection_mode -- --nocapture
```

Expected: fail because defaults still point to `Full`.

**Step 3: Implement the default flip**

Update config defaults and any prompt-builder defaults that currently assume `Full`. Keep explicit user config backward compatible.
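
The default flip can be illustrated with a small enum. The type name `SkillsPromptInjectionMode` is inferred from the test filter above and the `resolve_mode` helper is hypothetical; the real config schema may structure this differently.

```rust
// Assumed type name; only the `Full` and `Compact` variants come from the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum SkillsPromptInjectionMode {
    Full,
    /// New default: inject a compact skill summary instead of full text.
    #[default]
    Compact,
}

/// Hypothetical resolution step: an explicit user setting always wins,
/// so existing configs that ask for `Full` keep working.
pub fn resolve_mode(configured: Option<SkillsPromptInjectionMode>) -> SkillsPromptInjectionMode {
    configured.unwrap_or_default()
}
```

Keeping the default in one place (the `#[default]` variant) avoids the prompt builder and the config schema drifting apart.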

**Step 4: Re-run the test to verify GREEN**

Run:
```bash
cargo test -p zeroclawlabs skills_prompt_injection_mode -- --nocapture
```

Expected: default configuration now resolves to `Compact`.

**Step 5: Commit**

Run:
```bash
git add third_party/zeroclaw/src/config/schema.rs third_party/zeroclaw/src/agent/prompt.rs third_party/zeroclaw/src/channels/mod.rs
git commit -m "feat: default skills prompt injection to compact"
```

### Task 9: Add Audit Logging and Regression Coverage

**Files:**
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/observability/mod.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/agent/agent.rs`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/src/channels/mod.rs`
- Create: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/third_party/zeroclaw/tests/prompt_safety_regression.rs`

**Step 1: Write the failing regression tests**

Cover:
- prompt override attack from user content
- malicious `AGENTS.md` bootstrap content
- trusted skill execution within bounds
- non-skill privileged request requiring business approval
- approval token mismatch
- session history restore preserving degraded mode

**Step 2: Run the tests to verify RED**

Run:
```bash
cargo test -p zeroclawlabs --test prompt_safety_regression -- --nocapture
```

Expected: fail because the end-to-end behavior is not wired together yet.

**Step 3: Implement audit logging**

Record:
- input hash
- matched guard rules
- risk level
- provenance
- execution mode transitions
- approval decisions

Avoid logging raw sensitive content.
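
An audit record carrying these six fields could be sketched as follows. The `AuditRecord` layout and log-line format are assumptions. The sketch hashes with the standard library's `DefaultHasher` only to stay dependency-free; it is not a cryptographic hash, and a real audit trail would more likely use something like SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical audit record; stores a hash, never the raw input.
#[derive(Debug)]
pub struct AuditRecord {
    pub input_hash: u64,
    pub matched_rules: Vec<String>,
    pub risk: String,
    pub provenance: String,
    /// (from, to) execution modes.
    pub mode_transition: (String, String),
    /// `None` when no approval was requested.
    pub approval_decision: Option<bool>,
}

/// Non-cryptographic stand-in for a real content hash.
pub fn hash_input(input: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    input.hash(&mut hasher);
    hasher.finish()
}

impl AuditRecord {
    /// Render a single log line that contains no raw user content.
    pub fn to_log_line(&self) -> String {
        format!(
            "input_hash={:016x} risk={} provenance={} mode={}->{} rules={} approval={:?}",
            self.input_hash,
            self.risk,
            self.provenance,
            self.mode_transition.0,
            self.mode_transition.1,
            self.matched_rules.join(","),
            self.approval_decision
        )
    }
}
```

Logging rule names rather than the matched text keeps the audit line useful for triage without echoing attacker-controlled or sensitive content.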

**Step 4: Re-run the regression tests to verify GREEN**

Run:
```bash
cargo test -p zeroclawlabs --test prompt_safety_regression -- --nocapture
```

Expected: the regression suite passes.

**Step 5: Commit**

Run:
```bash
git add third_party/zeroclaw/src/observability/mod.rs third_party/zeroclaw/src/agent/agent.rs third_party/zeroclaw/src/channels/mod.rs third_party/zeroclaw/tests/prompt_safety_regression.rs
git commit -m "test: add prompt safety regression coverage"
```

### Task 10: Final Verification and Integration Review

**Files:**
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/docs/L5-提示词分布与安全改造方案.md`
- Modify: `/home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening/docs/README.md`

**Step 1: Run targeted verification**

Run:
```bash
cd /home/zyl/projects/sgClaw/claw/.worktrees/zeroclaw-prompt-safety-hardening
cargo test -p zeroclawlabs prompt_guard -- --nocapture
cargo test -p zeroclawlabs build_system_prompt -- --nocapture
cargo test -p zeroclawlabs dispatcher -- --nocapture
cargo test -p zeroclawlabs --test prompt_safety_regression -- --nocapture
```

Expected: all prompt safety and dispatcher tests pass.

**Step 2: Run a broad ZeroClaw package test pass if time permits**

Run:
```bash
cargo test -p zeroclawlabs -- --nocapture
```

Expected: no regressions in the vendored package test suite, or a documented list of unrelated existing failures.

**Step 3: Update the security design docs**

Document:
- execution modes
- trusted skill metadata contract
- business approval flow
- why non-skill privileged actions are gated

**Step 4: Commit the docs**

Run:
```bash
git add docs/L5-提示词分布与安全改造方案.md docs/README.md
git commit -m "docs: record prompt safety hardening design"
```

**Step 5: Prepare merge review notes**

Write a short integration summary covering:
- changed entry points
- backward-compatibility expectations
- any skills that need metadata upgrades
- rollout recommendation for existing integrators