# Browser Skill Authoring

This note captures the browser-skill authoring rules proven during the live
Zhihu hotlist export debugging on 2026-03-30.

## Why This Exists

The live browser run proved that a skill can be selected correctly, the
runtime can call the right browser-script tool, and the task can still fail if
the skill package does not encode enough deterministic extraction logic.

The concrete failure pattern was:

1. `zhihu-hotlist.extract_hotlist` was called correctly.
2. The packaged script relied on stale DOM classes and returned no rows.
3. The runtime fell back to generic `getText` probing.
4. The user saw selector thrashing instead of a stable extraction path.

This document exists to prevent the same failure pattern in future browser
skills.

## Authoring Rules

### 1. Use a packaged script for structured browser tasks

If the task's primary deliverable is structured data such as rows, fields, or a
stable artifact, the skill should expose a deterministic `browser_script` tool.

Do not rely on prose-only instructions for repeated structured extraction.

### 2. Keep a strict extraction ladder inside the script

For browser extraction skills, the script should try data sources in this order:

1. stable structured page state when available
2. generalized DOM candidates that are broader than one historical classname
3. controlled page-text parsing as the last deterministic fallback

Do not jump straight from one brittle selector family to generic browser
probing.

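The ladder above can be sketched as an ordered list of extractors, where the first non-empty result wins. This is a minimal illustration in Python, not the packaged script itself: `run_extraction_ladder` and the three stand-in rung functions are hypothetical names, and a real script would read page state, the DOM, and page text instead of returning canned values.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[object]
# Each rung returns rows, or an empty list when its data source yields nothing.
Extractor = Callable[[], List[Row]]


def run_extraction_ladder(
    rungs: Sequence[Tuple[str, Extractor]],
) -> Tuple[str, List[Row]]:
    """Try each rung in order and return (rung name, rows) for the first hit."""
    tried = []
    for name, extract in rungs:
        rows = extract()
        if rows:
            return name, rows
        tried.append(name)
    # Every deterministic source came back empty: fail with a specific reason.
    raise LookupError("unsupported page shape; tried: " + ", ".join(tried))


# Hypothetical rungs standing in for real page access.
def from_page_state() -> List[Row]:
    return []  # pretend the page exposes no structured state


def from_dom_candidates() -> List[Row]:
    return [[1, "标题", "344万"]]  # pretend a broad DOM match found one row


def from_page_text() -> List[Row]:
    return []  # last deterministic fallback, not reached in this run


source, rows = run_extraction_ladder([
    ("page_state", from_page_state),
    ("dom_candidates", from_dom_candidates),
    ("page_text", from_page_text),
])
print(source, rows)
```

Returning the rung name alongside the rows makes it visible in logs which data source actually produced the artifact.
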
### 3. Treat generic `getText` probing as a fallback of last resort

The packaged script is the primary deterministic path.

If the script fails, it should fail for a specific reason:

- blocked/login/captcha page
- unsupported page shape
- artifact incomplete

Generic browser wandering should begin only after the packaged script has
exhausted its own deterministic fallbacks.

### 4. Encode blocked-page semantics explicitly

A browser skill must distinguish:

- "the expected data is not present"
- "the page is blocked by login, captcha, or anti-bot state"

When the page is blocked, fail with an explicit message. Do not silently report
"no rows" if the real issue is that the page is not usable.

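One way to keep the two cases apart is to classify the page before reporting a result. The sketch below is illustrative only: `PageStatus`, `classify_page`, `require_rows`, both error classes, and the marker strings are assumptions made for this example, and a real skill would tune its blocked-page detection per site.

```python
from enum import Enum
from typing import List


class PageStatus(Enum):
    OK = "ok"
    BLOCKED = "blocked"  # login wall, captcha, or anti-bot page
    NO_DATA = "no_data"  # page is usable but the expected rows are absent


# Hypothetical marker strings (all lowercase); tune these per site.
BLOCK_MARKERS = ("captcha", "verify you are human", "请登录", "sign in to continue")


class BlockedPageError(RuntimeError):
    """The page is not usable in its current state."""


class MissingDataError(RuntimeError):
    """The page is usable but the expected data is not present."""


def classify_page(page_text: str, rows: List[list]) -> PageStatus:
    # Rows win: a usable page may still mention login somewhere in its chrome.
    if rows:
        return PageStatus.OK
    lowered = page_text.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return PageStatus.BLOCKED
    return PageStatus.NO_DATA


def require_rows(page_text: str, rows: List[list]) -> List[list]:
    """Return rows, or raise an error that names the specific failure."""
    status = classify_page(page_text, rows)
    if status is PageStatus.BLOCKED:
        raise BlockedPageError("page is blocked by login, captcha, or anti-bot state")
    if status is PageStatus.NO_DATA:
        raise MissingDataError("the expected data is not present")
    return rows
```

With two distinct error types, the runtime can surface "blocked" to the user instead of retrying selectors against a page that will never contain rows.
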
### 5. Make the structured artifact the primary contract

Upstream collection skills should return the structured artifact as soon as it
is stable.

For example, the Zhihu hotlist flow should produce:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Downstream skills such as Office export or screen rendering should consume that
artifact. They should not re-collect source data.

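A downstream skill can check the artifact shape before consuming it. The following is a minimal sketch that assumes the four keys shown in the JSON example are all required and that every row must match the column count; `validate_artifact` is a hypothetical helper, not part of any existing package.

```python
import json
from typing import Any, Dict, List

# Keys taken from the example artifact; treating all four as required is an assumption.
REQUIRED_KEYS = ("source", "sheet_name", "columns", "rows")


def validate_artifact(artifact: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the artifact is usable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in artifact]
    if problems:
        return problems
    columns, rows = artifact["columns"], artifact["rows"]
    if not rows:
        problems.append("artifact incomplete: no rows")
    for index, row in enumerate(rows):
        if len(row) != len(columns):
            problems.append(
                f"row {index} has {len(row)} cells, expected {len(columns)}"
            )
    return problems


artifact = json.loads("""
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
""")
problems = validate_artifact(artifact)
print(problems)
```

Returning a list of problems instead of a boolean lets the caller report "artifact incomplete" with the specific reason attached.
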
### 6. Stop exploratory browser work after the artifact is stable

Once the primary artifact is complete:

- stop selector exploration
- stop unrelated browser wandering
- hand the artifact to the downstream skill or tool

Do not continue reading random page text after the final rows are already
captured.

### 7. Keep skill boundaries narrow and explicit

Separate the responsibilities:

- navigation skill: reach the destination and verify arrival
- collection skill: extract structured data
- export skill: render `.xlsx`
- presentation skill: render `.html` and `presentation`

Do not mix re-collection, export, and presentation logic into one downstream
skill.

### 8. Encode host constraints in every browser skill

Browser skills should restate the SuperRPA host contract:

- use `superrpa_browser` semantics inside the browser host
- `expected_domain` is a bare hostname only
- selectors must be valid CSS selectors
- prefer direct routes before brittle click chains

These rules are not "obvious context". They belong in the skill.

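The bare-hostname rule is easy to check mechanically. The sketch below shows one possible check using only the Python standard library; `is_bare_hostname` and `hostname_from_url` are hypothetical helpers, and the host itself may enforce the rule differently.

```python
import re
from urllib.parse import urlsplit

# One DNS label: letters, digits, and inner hyphens, at most 63 characters.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_HOSTNAME_RE = re.compile(rf"{_LABEL}(?:\.{_LABEL})*", re.IGNORECASE)


def is_bare_hostname(value: str) -> bool:
    """True for values like 'www.zhihu.com'; False if a scheme, port, or path is present."""
    return len(value) <= 253 and _HOSTNAME_RE.fullmatch(value) is not None


def hostname_from_url(url: str) -> str:
    """Reduce a full URL to the bare hostname suitable for `expected_domain`."""
    hostname = urlsplit(url).hostname
    if hostname is None:
        raise ValueError(f"no hostname in {url!r}")
    return hostname


expected_domain = hostname_from_url("https://www.zhihu.com/hot")
print(expected_domain, is_bare_hostname(expected_domain))
```

Deriving `expected_domain` from the route URL keeps the two values from drifting apart when a skill is edited.
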
### 9. Verify browser skills at multiple layers

A browser skill is not complete without verification at more than one layer:

1. script-level test for the packaged browser script
2. skill-library validation for package structure
3. runtime integration test proving the skill is actually called
4. live acceptance when a real browser session is available

The 2026-03-30 fix only became trustworthy after all four were aligned.

### 10. Keep logs versioned and skill names explicit

Live debugging is much faster when the runtime logs include:

- runtime version
- protocol version
- loaded skill names with versions
- explicit `call skill.tool` messages

Skill packages should be written assuming those logs are part of the
operability contract.

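As a rough illustration of what such log lines could look like, the sketch below formats a startup banner and a call message. The exact layout, the function names, and the version strings are assumptions for this example; the real runtime's log format may differ.

```python
from typing import Dict


def startup_banner(
    runtime_version: str, protocol_version: str, skills: Dict[str, str]
) -> str:
    """One line carrying runtime version, protocol version, and skill versions."""
    loaded = ", ".join(
        f"{name}@{version}" for name, version in sorted(skills.items())
    )
    return f"runtime={runtime_version} protocol={protocol_version} skills=[{loaded}]"


def call_message(skill: str, tool: str) -> str:
    """Explicit `call skill.tool` message."""
    return f"call {skill}.{tool}"


# Hypothetical version numbers, used only to show the shape of the output.
banner = startup_banner("1.4.0", "3", {"zhihu-hotlist": "0.2.1"})
message = call_message("zhihu-hotlist", "extract_hotlist")
print(banner)
print(message)
```
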
## Update Checklist

When editing a browser skill, check all of the following:

- Does the skill define a deterministic primary path?
- Does it state when generic probing is allowed?
- Does it distinguish blocked pages from missing data?
- Does it define the primary structured artifact clearly?
- Does it stop downstream skills from re-collecting data?
- Does it include verification expectations, not only workflow prose?

If any answer is "no", the skill is still under-specified.