skill-lib/BROWSER_SKILL_AUTHORING.md
木炎 51913555ad feat: add initial skill authoring workspace
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 18:34:56 +08:00


# Browser Skill Authoring
This note captures the browser-skill authoring rules proven during the live
Zhihu hotlist export debugging on 2026-03-30.
## Why This Exists
The live browser run proved that a skill can be selected correctly, the
runtime can call the right browser-script tool, and the task can still fail if
the skill package does not encode enough deterministic extraction logic.
The concrete failure pattern was:
1. `zhihu-hotlist.extract_hotlist` was called correctly.
2. The packaged script relied on stale DOM classes and returned no rows.
3. The runtime fell back to generic `getText` probing.
4. The user saw selector thrashing instead of a stable extraction path.

This document exists to prevent the same failure pattern in future browser
skills.
## Authoring Rules
### 1. Use a packaged script for structured browser tasks
If the task's primary deliverable is structured data such as rows, fields, or a
stable artifact, the skill should expose a deterministic `browser_script` tool.
Do not rely on prose-only instructions for repeated structured extraction.
### 2. Keep a strict extraction ladder inside the script
For browser extraction skills, the script should try data sources in this order:
1. stable structured page state when available
2. generalized DOM candidates that are broader than one historical classname
3. controlled page-text parsing as the last deterministic fallback

Do not jump straight from one brittle selector family to generic browser
probing.
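
The ladder above can be sketched in Python as an ordered list of rungs, where the first rung that yields rows wins. This is a minimal illustration, not the packaged script: the `Page` snapshot dict, its keys (`state`, `dom_items`, `text`), the `hotList` key, and the `<rank> <title> <heat>热度` text-line format are all assumptions made for the example.

```python
import re
from typing import Callable

Page = dict  # hypothetical snapshot: {"state": ..., "dom_items": ..., "text": ...}
Rows = list[list]


def from_page_state(page: Page) -> Rows:
    # Rung 1: structured state embedded in the page, when available.
    items = (page.get("state") or {}).get("hotList") or []
    return [[rank, it["title"], it.get("heat", "")] for rank, it in enumerate(items, 1)]


def from_dom_candidates(page: Page) -> Rows:
    # Rung 2: items gathered from several broad selectors, not one classname.
    items = [it for it in page.get("dom_items") or [] if it.get("title")]
    return [[rank, it["title"], it.get("heat", "")] for rank, it in enumerate(items, 1)]


# Assumed visible-text line format: "<rank> <title> <heat>热度".
_TEXT_ROW = re.compile(r"^(\d+)\s+(.+?)\s+(\S+)热度$", re.MULTILINE)


def from_page_text(page: Page) -> Rows:
    # Rung 3: controlled parsing of visible text, the last deterministic fallback.
    return [[int(r), t, h] for r, t, h in _TEXT_ROW.findall(page.get("text") or "")]


LADDER: list[tuple[str, Callable[[Page], Rows]]] = [
    ("page_state", from_page_state),
    ("dom_candidates", from_dom_candidates),
    ("page_text", from_page_text),
]


def extract_rows(page: Page) -> tuple[str, Rows]:
    """Return (rung name, rows) from the first rung that yields rows."""
    for name, rung in LADDER:
        rows = rung(page)
        if rows:
            return name, rows
    raise LookupError("unsupported page shape: all deterministic rungs returned no rows")
```

Returning the rung name alongside the rows makes it visible in logs when the script has dropped to a lower rung, which is usually the first sign that the page shape has drifted.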
### 3. Treat generic `getText` probing as a fallback of last resort
The packaged script is the primary deterministic path.
If the script fails, it should fail for a specific reason:
- blocked/login/captcha page
- unsupported page shape
- artifact incomplete

Generic browser wandering should begin only after the packaged script has
exhausted its own deterministic fallbacks.
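
One way to keep those failure reasons specific is to raise a typed error that carries the reason, so the runtime never has to guess why the script returned nothing. The sketch below is illustrative only: `FailureReason`, `ExtractionError`, `check_result`, and the `BLOCK_MARKERS` strings are hypothetical names and values, and a real skill would tune the markers per site.

```python
from enum import Enum


class FailureReason(Enum):
    BLOCKED = "blocked/login/captcha page"
    UNSUPPORTED_SHAPE = "unsupported page shape"
    ARTIFACT_INCOMPLETE = "artifact incomplete"


class ExtractionError(Exception):
    def __init__(self, reason: FailureReason, detail: str):
        super().__init__(f"{reason.value}: {detail}")
        self.reason = reason


# Hypothetical markers for a blocked page; not taken from any real skill.
BLOCK_MARKERS = ("captcha", "验证码", "登录", "sign in")


def check_result(rows: list, page_text: str) -> list:
    """Return rows unchanged, or raise with a specific reason when empty."""
    if rows:
        return rows
    lowered = page_text.lower()
    hit = next((m for m in BLOCK_MARKERS if m in lowered), None)
    if hit is not None:
        # The page is unusable, which is different from "no data".
        raise ExtractionError(FailureReason.BLOCKED, f"found marker {hit!r}")
    raise ExtractionError(FailureReason.UNSUPPORTED_SHAPE, "no rows and no block marker")
```

The markers are consulted only when no rows were extracted, since a healthy page can still contain a login link somewhere in its chrome.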
### 4. Encode blocked-page semantics explicitly
A browser skill must distinguish:
- "the expected data is not present"
- "the page is blocked by login, captcha, or anti-bot state"

When the page is blocked, fail with an explicit message. Do not silently report
"no rows" if the real issue is that the page is not usable.
### 5. Make the structured artifact the primary contract
Upstream collection skills should return the structured artifact as soon as it
is stable.
For example, the Zhihu hotlist flow should produce:
```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```
Downstream skills such as Office export or screen rendering should consume that
artifact. They should not recollect source data.
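
A small sketch of what consuming that artifact could look like on the downstream side: validate the shape once, then render from it without touching the browser again. The function names `validate_artifact` and `to_csv` are hypothetical, and CSV stands in here for whatever format the export skill actually produces.

```python
import csv
import io

REQUIRED_KEYS = {"source", "sheet_name", "columns", "rows"}


def validate_artifact(artifact: dict) -> dict:
    """Check the artifact shape shown above; raise ValueError if incomplete."""
    missing = REQUIRED_KEYS - artifact.keys()
    if missing:
        raise ValueError(f"artifact incomplete: missing {sorted(missing)}")
    if not artifact["rows"]:
        raise ValueError("artifact incomplete: no rows")
    width = len(artifact["columns"])
    for index, row in enumerate(artifact["rows"]):
        if len(row) != width:
            raise ValueError(f"artifact incomplete: row {index} has {len(row)} cells, expected {width}")
    return artifact


def to_csv(artifact: dict) -> str:
    """Render a validated artifact as CSV text, using only the artifact itself."""
    validate_artifact(artifact)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(artifact["columns"])
    writer.writerows(artifact["rows"])
    return buffer.getvalue()
```

Because `to_csv` takes only the artifact, it has no way to recollect source data, which keeps the boundary between collection and export explicit.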
### 6. Stop exploratory browser work after the artifact is stable
Once the primary artifact is complete:
- stop selector exploration
- stop unrelated browser wandering
- hand the artifact to the downstream skill or tool

Do not continue reading random page text after the final rows are already
captured.
### 7. Keep skill boundaries narrow and explicit
Separate the responsibilities:
- navigation skill: reach the destination and verify arrival
- collection skill: extract structured data
- export skill: render `.xlsx`
- presentation skill: render `.html` and `presentation`

Do not mix recollection, export, and presentation logic into one downstream
skill.
### 8. Encode host constraints in every browser skill
Browser skills should restate the SuperRPA host contract:
- use `superrpa_browser` semantics inside the browser host
- `expected_domain` is a bare hostname only
- selectors must be valid CSS selectors
- prefer direct routes before brittle click chains

These rules are not "obvious context". They belong in the skill.
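
The `expected_domain` constraint is easy to check mechanically before a skill ships. The sketch below assumes that "bare hostname" means no scheme, port, path, or query, and uses the usual hostname label syntax (letters, digits, and inner hyphens, at most 63 characters per label). `is_bare_hostname` is a hypothetical helper, not part of the SuperRPA host.

```python
import re

# One DNS-style label: alphanumeric at both ends, hyphens allowed inside.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_HOSTNAME = re.compile(rf"{_LABEL}(?:\.{_LABEL})*", re.IGNORECASE)


def is_bare_hostname(value: str) -> bool:
    """True when value is only a hostname: no scheme, port, path, or query."""
    return len(value) <= 253 and _HOSTNAME.fullmatch(value) is not None
```

For example, `is_bare_hostname("www.zhihu.com")` passes, while `"https://www.zhihu.com/hot"` and `"www.zhihu.com:443"` are rejected because `:` and `/` cannot appear in a label.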
### 9. Verify browser skills at multiple layers
A browser skill is not complete without verification at more than one layer:
1. script-level test for the packaged browser script
2. skill-library validation for package structure
3. runtime integration test proving the skill is actually called
4. live acceptance when a real browser session is available

The 2026-03-30 fix only became trustworthy after all four were aligned.
### 10. Keep logs versioned and skill names explicit
Live debugging is much faster when the runtime logs include:
- runtime version
- protocol version
- loaded skill names with versions
- explicit `call skill.tool` messages

Skill packages should be written assuming those logs are part of the
operability contract.
## Update Checklist
When editing a browser skill, check all of the following:
- Does the skill define a deterministic primary path?
- Does it state when generic probing is allowed?
- Does it distinguish blocked pages from missing data?
- Does it define the primary structured artifact clearly?
- Does it stop downstream skills from recollecting data?
- Does it include verification expectations, not only workflow prose?

If any answer is "no", the skill is still under-specified.