feat: add initial skill authoring workspace
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
152
BROWSER_SKILL_AUTHORING.md
Normal file
@@ -0,0 +1,152 @@
# Browser Skill Authoring

This note captures the browser-skill authoring rules proven during the live
Zhihu hotlist export debugging on 2026-03-30.

## Why This Exists

The live browser run proved that a skill can be selected correctly, the
runtime can call the right browser-script tool, and the task can still fail if
the skill package does not encode enough deterministic extraction logic.

The concrete failure pattern was:

1. `zhihu-hotlist.extract_hotlist` was called correctly.
2. The packaged script relied on stale DOM classes and returned no rows.
3. The runtime fell back to generic `getText` probing.
4. The user saw selector thrashing instead of a stable extraction path.

This document exists to prevent the same failure pattern in future browser
skills.

## Authoring Rules

### 1. Use a packaged script for structured browser tasks

If the task's primary deliverable is structured data such as rows, fields, or a
stable artifact, the skill should expose a deterministic `browser_script` tool.

Do not rely on prose-only instructions for repeated structured extraction.

### 2. Keep a strict extraction ladder inside the script

For browser extraction skills, the script should try data sources in this order:

1. stable structured page state when available
2. generalized DOM candidates that are broader than one historical classname
3. controlled page-text parsing as the last deterministic fallback

Do not jump straight from one brittle selector family to generic browser
probing.
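
The ladder above can be sketched as an ordered list of extractors that is walked until one yields rows. This is a minimal illustration in Python, not the packaged script itself (which would presumably run browser-side). The snapshot keys `state`, `dom_cards`, and `text`, and all function names, are hypothetical.

```python
import re
from typing import Optional


def from_page_state(page: dict) -> Optional[list]:
    """Rung 1: structured state embedded in the page (hypothetical key)."""
    items = page.get("state")
    if not items:
        return None
    return [[rank, it["title"], it["heat"]] for rank, it in enumerate(items, start=1)]


def from_dom_candidates(page: dict) -> Optional[list]:
    """Rung 2: broad DOM candidates, not tied to one historical class name."""
    titled = [c for c in page.get("dom_cards") or [] if c.get("title")]
    rows = [[rank, c["title"], c.get("heat", "")] for rank, c in enumerate(titled, start=1)]
    return rows or None


def from_page_text(page: dict) -> Optional[list]:
    """Rung 3: controlled parsing of visible text lines like '1 Title 344万'."""
    rows = []
    for line in (page.get("text") or "").splitlines():
        m = re.match(r"^\s*(\d+)\s+(.+?)\s+(\S+)\s*$", line)
        if m:
            rows.append([int(m.group(1)), m.group(2), m.group(3)])
    return rows or None


# Order matters: most stable source first, text parsing last.
LADDER = [
    ("page_state", from_page_state),
    ("dom_candidates", from_dom_candidates),
    ("page_text", from_page_text),
]


def extract_rows(page: dict) -> tuple:
    """Return (rung_name, rows) from the first rung that produces rows."""
    for name, rung in LADDER:
        rows = rung(page)
        if rows:
            return name, rows
    raise LookupError("unsupported page shape: every deterministic rung returned no rows")
```

Returning the rung name alongside the rows makes it visible in logs when a skill has silently degraded from structured state to text parsing.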

### 3. Treat generic `getText` probing as a fallback of last resort

The packaged script is the primary deterministic path.

If the script fails, it should fail for a specific reason:

- blocked/login/captcha page
- unsupported page shape
- artifact incomplete

Generic browser wandering should begin only after the packaged script has
exhausted its own deterministic fallbacks.

### 4. Encode blocked-page semantics explicitly

A browser skill must distinguish:

- "the expected data is not present"
- "the page is blocked by login, captcha, or anti-bot state"

When the page is blocked, fail with an explicit message. Do not silently report
"no rows" if the real issue is that the page is not usable.
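
One way to keep "blocked" and "no data" apart is to give every failure a named reason, mirroring the three reasons listed under rule 3. The sketch below is an assumption-laden illustration: the `Failure` enum, `SkillError`, `check_result`, and the marker strings are all hypothetical, and a real skill would tune the markers per site.

```python
from enum import Enum


class Failure(Enum):
    BLOCKED = "blocked/login/captcha page"
    UNSUPPORTED_SHAPE = "unsupported page shape"
    ARTIFACT_INCOMPLETE = "artifact incomplete"


class SkillError(Exception):
    """Raised with an explicit reason instead of returning empty rows."""

    def __init__(self, reason: Failure, detail: str):
        super().__init__(f"{reason.value}: {detail}")
        self.reason = reason


# Hypothetical block markers, matched case-insensitively against page text.
BLOCK_MARKERS = ("login", "sign in", "captcha", "登录", "验证码")


def check_result(page_text: str, rows: list, expected_min_rows: int = 1) -> list:
    """Return rows if usable, otherwise raise SkillError with a specific reason."""
    lowered = page_text.lower()
    hit = next((m for m in BLOCK_MARKERS if m in lowered), None)
    # Markers only count when extraction produced nothing: a normal page
    # may legitimately contain a "login" link next to real rows.
    if not rows and hit is not None:
        raise SkillError(Failure.BLOCKED, f"page text contains marker {hit!r}")
    if not rows:
        raise SkillError(Failure.UNSUPPORTED_SHAPE, "no rows and no block marker found")
    if len(rows) < expected_min_rows:
        raise SkillError(
            Failure.ARTIFACT_INCOMPLETE,
            f"got {len(rows)} rows, expected at least {expected_min_rows}",
        )
    return rows
```

A caller can branch on `err.reason` to decide whether generic probing is worth attempting at all; on a blocked page it usually is not.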

### 5. Make the structured artifact the primary contract

Upstream collection skills should return the structured artifact as soon as it
is stable.

For example, the Zhihu hotlist flow should produce:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Downstream skills such as Office export or screen rendering should consume that
artifact. They should not re-collect source data.
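
A downstream consumer can enforce this contract by validating the artifact and then working only from it. The following is a minimal sketch, assuming the four keys shown in the JSON example are all required; `validate_artifact` and `export_rows` are hypothetical names.

```python
import json

REQUIRED_KEYS = ("source", "sheet_name", "columns", "rows")


def validate_artifact(artifact: dict) -> dict:
    """Check the artifact shape before any downstream skill touches it."""
    missing = [k for k in REQUIRED_KEYS if k not in artifact]
    if missing:
        raise ValueError(f"artifact missing keys: {missing}")
    if not artifact["rows"]:
        raise ValueError("artifact has no rows")
    width = len(artifact["columns"])
    for i, row in enumerate(artifact["rows"]):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} cells, expected {width}")
    return artifact


def export_rows(artifact: dict) -> list:
    """Downstream consumer: reads only the artifact, never reopens the page."""
    validate_artifact(artifact)
    return [artifact["columns"], *artifact["rows"]]


sample = json.loads(
    '{"source": "https://www.zhihu.com/hot", "sheet_name": "知乎热榜",'
    ' "columns": ["rank", "title", "heat"], "rows": [[1, "标题", "344万"]]}'
)
table = export_rows(sample)  # header row followed by data rows
```

Because `export_rows` takes the artifact as its only input, it has no way to reach back into the browser, which keeps the collection and export responsibilities separate.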

### 6. Stop exploratory browser work after the artifact is stable

Once the primary artifact is complete:

- stop selector exploration
- stop unrelated browser wandering
- hand the artifact to the downstream skill or tool

Do not continue reading random page text after the final rows are already
captured.

### 7. Keep skill boundaries narrow and explicit

Separate the responsibilities:

- navigation skill: reach the destination and verify arrival
- collection skill: extract structured data
- export skill: render `.xlsx`
- presentation skill: render `.html` and `presentation`

Do not mix re-collection, export, and presentation logic into one downstream
skill.

### 8. Encode host constraints in every browser skill

Browser skills should restate the SuperRPA host contract:

- use `superrpa_browser` semantics inside the browser host
- `expected_domain` is a bare hostname only
- selectors must be valid CSS selectors
- prefer direct routes before brittle click chains

These rules are not "obvious context". They belong in the skill.
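
The `expected_domain` rule is easy to check mechanically. The sketch below assumes "bare hostname" means no scheme, port, or path, which is an interpretation rather than something the host contract spells out here; both helper names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# One DNS label: letters, digits, inner hyphens, at most 63 characters.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_bare_hostname(value: str) -> bool:
    """True for 'www.zhihu.com'; False for anything with scheme, port, or path."""
    if not value or len(value) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in value.lower().split("."))


def hostname_from_url(url: str) -> str:
    """Derive a value suitable for expected_domain from a full URL."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in {url!r}")
    return host
```

For instance, `hostname_from_url("https://www.zhihu.com/hot")` yields `www.zhihu.com`, which passes `is_bare_hostname`, while the full URL itself does not.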

### 9. Verify browser skills at multiple layers

A browser skill is not complete without verification at more than one layer:

1. script-level test for the packaged browser script
2. skill-library validation for package structure
3. runtime integration test proving the skill is actually called
4. live acceptance when a real browser session is available

The 2026-03-30 fix only became trustworthy after all four were aligned.

### 10. Keep logs versioned and skill names explicit

Live debugging is much faster when the runtime logs include:

- runtime version
- protocol version
- loaded skill names with versions
- explicit `call skill.tool` messages

Skill packages should be written assuming those logs are part of the
operability contract.
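
As an illustration only, the four log items could be emitted like this with Python's standard `logging` module. The message layout, the function names, and the version strings in the usage note are assumptions, not the actual runtime's log format.

```python
import logging


def log_startup(log: logging.Logger, runtime_version: str,
                protocol_version: str, skills: dict) -> None:
    """Emit runtime/protocol versions and every loaded skill as name@version."""
    log.info("runtime=%s protocol=%s", runtime_version, protocol_version)
    loaded = ", ".join(f"{name}@{ver}" for name, ver in sorted(skills.items()))
    log.info("skills loaded: %s", loaded)


def log_call(log: logging.Logger, skill: str, tool: str) -> None:
    """Emit an explicit 'call skill.tool' line for each tool invocation."""
    log.info("call %s.%s", skill, tool)
```

Calling `log_call(log, "zhihu-hotlist", "extract_hotlist")` would produce the line `call zhihu-hotlist.extract_hotlist`, which is enough to confirm from the log alone that the packaged script, and not generic probing, handled the request.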

## Update Checklist

When editing a browser skill, check all of the following:

- Does the skill define a deterministic primary path?
- Does it state when generic probing is allowed?
- Does it distinguish blocked pages from missing data?
- Does it define the primary structured artifact clearly?
- Does it stop downstream skills from re-collecting data?
- Does it include verification expectations, not only workflow prose?

If any answer is "no", the skill is still under-specified.