Browser Skill Authoring
This note captures the browser-skill authoring rules proven during live debugging of the Zhihu hotlist export on 2026-03-30.
Why This Exists
The live browser run showed that even when a skill is selected correctly and the runtime calls the right browser-script tool, the task can still fail if the skill package does not encode enough deterministic extraction logic.
The concrete failure pattern was:
- zhihu-hotlist.extract_hotlist was called correctly.
- The packaged script relied on stale DOM classes and returned no rows.
- The runtime fell back to generic getText probing.
- The user saw selector thrashing instead of a stable extraction path.
This document exists to prevent the same failure pattern in future browser skills.
Authoring Rules
1. Use a packaged script for structured browser tasks
If the task's primary deliverable is structured data such as rows, fields, or a stable artifact, the skill should expose a deterministic browser_script tool.
Do not rely on prose-only instructions for repeated structured extraction.
2. Keep a strict extraction ladder inside the script
For browser extraction skills, the script should try data sources in this order:
- stable structured page state when available
- generalized DOM candidates that are broader than one historical classname
- controlled page-text parsing as the last deterministic fallback
Do not jump straight from one brittle selector family to generic browser probing.
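The ladder can be sketched as an ordered list of extraction functions, each returning rows or None. This is a minimal illustration, not the packaged Zhihu script: the page is modeled as a plain dict so the logic is testable outside a browser, and the keys ("state", "hot_items", "nodes", "text"), the "HotItem" class token, and the "344万热度" text format are all hypothetical.

```python
import re
from typing import Callable, Optional

Row = list  # [rank, title, heat]

def from_structured_state(page: dict) -> Optional[list[Row]]:
    """Rung 1: stable structured page state (hypothetical 'state.hot_items' key)."""
    items = page.get("state", {}).get("hot_items")
    if not items:
        return None
    return [[i + 1, it["title"], it["heat"]] for i, it in enumerate(items)]

def from_dom_candidates(page: dict) -> Optional[list[Row]]:
    """Rung 2: match a broad class token (hypothetical 'HotItem'), not one exact classname."""
    rows = []
    for node in page.get("nodes", []):
        classes = " ".join(node.get("classes", []))
        if "HotItem" in classes and node.get("title"):
            rows.append([len(rows) + 1, node["title"], node.get("heat", "")])
    return rows or None

# Hypothetical visible-text format: "<rank> <title> <heat>万热度" ("热度" = heat).
HEAT_LINE = re.compile(r"^(\d+)\s+(.+?)\s+(\d+(?:\.\d+)?万)\s*热度$")

def from_page_text(page: dict) -> Optional[list[Row]]:
    """Rung 3: controlled parsing of page text, the last deterministic fallback."""
    rows = []
    for line in page.get("text", "").splitlines():
        m = HEAT_LINE.match(line.strip())
        if m:
            rows.append([int(m.group(1)), m.group(2), m.group(3)])
    return rows or None

LADDER: list[Callable[[dict], Optional[list[Row]]]] = [
    from_structured_state,
    from_dom_candidates,
    from_page_text,
]

def extract_rows(page: dict) -> tuple[str, list[Row]]:
    """Try each rung in order and report which rung produced the rows."""
    for rung in LADDER:
        rows = rung(page)
        if rows:
            return rung.__name__, rows
    # Fail for a specific reason instead of handing off to generic probing.
    raise LookupError("unsupported page shape: every deterministic rung returned no rows")
```

Returning the rung name alongside the rows makes it visible in logs which layer of the ladder actually succeeded.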
3. Treat generic getText probing as a fallback of last resort
The packaged script is the primary deterministic path.
If the script fails, it should fail for a specific reason:
- blocked/login/captcha page
- unsupported page shape
- artifact incomplete
Generic browser wandering should begin only after the packaged script has exhausted its own deterministic fallbacks.
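One way to make this rule checkable is to have the script return a result object that carries one of the specific failure reasons, and to gate generic probing on it. This is a sketch: ScriptResult, TOTAL_RUNGS, and may_start_generic_probing are hypothetical names, and refusing to probe a blocked page is an added assumption (probing a login or captcha wall cannot produce data).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class FailureReason(Enum):
    BLOCKED = "blocked/login/captcha page"
    UNSUPPORTED_SHAPE = "unsupported page shape"
    ARTIFACT_INCOMPLETE = "artifact incomplete"

@dataclass
class ScriptResult:
    ok: bool
    rows: list
    reason: Optional[FailureReason] = None
    rungs_tried: int = 0

TOTAL_RUNGS = 3  # hypothetical: structured state, DOM candidates, page text

def may_start_generic_probing(result: ScriptResult) -> bool:
    """Allow generic getText probing only as a last resort."""
    if result.ok:
        return False  # the packaged script succeeded; nothing to probe
    if result.reason is None:
        return False  # a failure without a specific reason is a script bug, not a cue to wander
    if result.reason is FailureReason.BLOCKED:
        return False  # assumption: surface blocked pages to the user instead of probing them
    return result.rungs_tried >= TOTAL_RUNGS  # only after every deterministic fallback ran
```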
4. Encode blocked-page semantics explicitly
A browser skill must distinguish:
- "the expected data is not present"
- "the page is blocked by login, captcha, or anti-bot state"
When the page is blocked, fail with an explicit message. Do not silently report "no rows" if the real issue is that the page is not usable.
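A minimal sketch of the distinction, assuming the script can see the current URL and visible text: the blocked check runs first, so an empty result afterwards really means "data not present". The marker strings and the PageBlockedError name are hypothetical and would need tuning per site.

```python
# Hypothetical markers; a real skill would tune these for its target site.
BLOCK_URL_MARKERS = ("/signin", "/login", "captcha")
BLOCK_TEXT_MARKERS = ("验证码", "请登录", "unusual traffic")  # "captcha", "please log in"

class PageBlockedError(RuntimeError):
    """Raised when the page is unusable, so callers never mistake it for 'no rows'."""

def check_usable(url: str, text: str) -> None:
    lowered_url = url.lower()
    for marker in BLOCK_URL_MARKERS:
        if marker in lowered_url:
            raise PageBlockedError(f"page blocked: url contains {marker!r} ({url})")
    for marker in BLOCK_TEXT_MARKERS:
        if marker in text:
            raise PageBlockedError(f"page blocked: text contains {marker!r}")

def rows_or_explicit_failure(url: str, text: str, rows: list) -> list:
    check_usable(url, text)  # blocked state is checked first...
    if not rows:             # ...so an empty result here really means "data not present"
        raise LookupError("expected data is not present on a usable page")
    return rows
```

The two failures use different exception types so the runtime can report them differently.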
5. Make the structured artifact the primary contract
Upstream collection skills should return the structured artifact as soon as it is stable.
For example, the Zhihu hotlist flow should produce:
{
"source": "https://www.zhihu.com/hot",
"sheet_name": "知乎热榜",
"columns": ["rank", "title", "heat"],
"rows": [[1, "标题", "344万"]]
}
Downstream skills such as Office export or screen rendering should consume that artifact. They should not recollect source data.
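The artifact above can be given a small typed wrapper that validates row width before handing it on. This is an illustrative sketch; TableArtifact is a hypothetical name, and the field names mirror the JSON example.

```python
import json
from dataclasses import dataclass, field

@dataclass
class TableArtifact:
    """Structured artifact handed from the collection skill to downstream skills."""
    source: str
    sheet_name: str
    columns: list
    rows: list = field(default_factory=list)

    def validate(self) -> None:
        # Every row must carry exactly one value per column.
        for i, row in enumerate(self.rows):
            if len(row) != len(self.columns):
                raise ValueError(f"row {i} has {len(row)} values, expected {len(self.columns)}")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(
            {"source": self.source, "sheet_name": self.sheet_name,
             "columns": self.columns, "rows": self.rows},
            ensure_ascii=False,  # keep Chinese titles readable in the serialized artifact
        )
```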
6. Stop exploratory browser work after the artifact is stable
Once the primary artifact is complete:
- stop selector exploration
- stop unrelated browser wandering
- hand the artifact to the downstream skill or tool
Do not continue reading random page text after the final rows are already captured.
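A sketch of the stopping rule, assuming extraction attempts arrive lazily (for example from a generator) so that returning early really does stop further page reads. The definition of "stable" used here (two identical consecutive attempts with at least min_rows rows) is a hypothetical choice for illustration.

```python
from typing import Iterable

def collect_until_stable(reads: Iterable[list], min_rows: int) -> list:
    """Consume extraction attempts and stop at the first stable result.

    Hypothetical stability rule: two consecutive attempts return identical
    rows and at least `min_rows` rows.
    """
    previous = None
    for rows in reads:
        if rows == previous and len(rows) >= min_rows:
            return rows  # stop here: no more selector exploration or page reads
        previous = rows
    raise RuntimeError("artifact incomplete: rows never stabilized")
```

Because the generator is consumed lazily, any attempt after the stable one is never executed.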
7. Keep skill boundaries narrow and explicit
Separate the responsibilities:
- navigation skill: reach the destination and verify arrival
- collection skill: extract structured data
- export skill: render .xlsx
- presentation skill: render .html and presentation
Do not mix recollection, export, and presentation logic into one downstream skill.
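The boundary is easiest to enforce in the function signatures: downstream renderers accept only the artifact, never a browser handle, so they cannot recollect. This sketch uses CSV as a stand-in for .xlsx (writing .xlsx needs a third-party library) and a bare HTML table for presentation; both function names are hypothetical.

```python
import csv
import html
import io

def export_csv(artifact: dict) -> str:
    """Export-skill stand-in: renders only what the artifact already contains."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(artifact["columns"])
    writer.writerows(artifact["rows"])
    return buf.getvalue()

def render_html(artifact: dict) -> str:
    """Presentation-skill stand-in: an HTML table built from the same artifact."""
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in artifact["columns"])
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in artifact["rows"]
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
```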
8. Encode host constraints in every browser skill
Browser skills should restate the SuperRPA host contract:
- use superrpa_browser semantics inside the browser host
- expected_domain is a bare hostname only
- selectors must be valid CSS selectors
- prefer direct routes before brittle click chains
These rules are not "obvious context". They belong in the skill.
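The expected_domain rule can be checked mechanically before a skill ships. The sketch below is a hypothetical validator using RFC 1123-style label rules; selector validity is not covered, since that needs a real CSS parser or the browser itself (document.querySelector throws a SyntaxError for an invalid selector).

```python
import re

# One hostname label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def is_bare_hostname(value: str) -> bool:
    """True for 'www.zhihu.com'; False if a scheme, path, port, or query is present."""
    if not value or len(value) > 253 or any(ch in value for ch in "/:?#@ "):
        return False
    return all(_LABEL.match(label) for label in value.split("."))
```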
9. Verify browser skills at multiple layers
A browser skill is not complete without verification at more than one layer:
- script-level test for the packaged browser script
- skill-library validation for package structure
- runtime integration test proving the skill is actually called
- live acceptance when a real browser session is available
The 2026-03-30 fix only became trustworthy after all four were aligned.
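At the script level, fixtures of captured page text let the parsing logic be tested without a live browser. The sketch below assumes a hypothetical text format and a stand-in parse_hotlist_text function; it can be run with python -m unittest.

```python
import re
import unittest

# Hypothetical visible-text format: "<rank> <title> <heat>万热度".
HEAT_LINE = re.compile(r"^(\d+)\s+(.+?)\s+(\d+(?:\.\d+)?万)\s*热度$")

def parse_hotlist_text(text: str) -> list:
    """Stand-in for the packaged script's text-parsing fallback."""
    rows = []
    for line in text.splitlines():
        m = HEAT_LINE.match(line.strip())
        if m:
            rows.append([int(m.group(1)), m.group(2), m.group(3)])
    return rows

class ParseHotlistTextTest(unittest.TestCase):
    # Captured text with a noise line ("广告" = advertisement) between rows.
    FIXTURE = "1 标题 344万热度\n广告\n2 另一个标题 12.5万热度\n"

    def test_parses_rows_in_rank_order(self):
        self.assertEqual(
            parse_hotlist_text(self.FIXTURE),
            [[1, "标题", "344万"], [2, "另一个标题", "12.5万"]],
        )

    def test_returns_no_rows_for_login_wall_text(self):
        # "请登录后查看" = "log in to view"; no rows should be invented.
        self.assertEqual(parse_hotlist_text("请登录后查看"), [])
```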
10. Keep logs versioned and skill names explicit
Live debugging is much faster when the runtime logs include:
- runtime version
- protocol version
- loaded skill names with versions
- explicit call skill.tool messages
Skill packages should be written assuming those logs are part of the operability contract.
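A minimal sketch of those log lines using the standard logging module. The message formats ("runtime=... protocol=... skills=[...]" and "call skill.tool") are hypothetical, not the SuperRPA runtime's actual log format.

```python
import logging

def log_session_start(logger: logging.Logger, runtime_version: str,
                      protocol_version: str, skills: dict) -> None:
    """One line with runtime and protocol versions plus every loaded skill@version."""
    loaded = ", ".join(f"{name}@{ver}" for name, ver in sorted(skills.items()))
    logger.info("runtime=%s protocol=%s skills=[%s]", runtime_version, protocol_version, loaded)

def log_tool_call(logger: logging.Logger, skill: str, tool: str) -> None:
    """Explicit 'call skill.tool' line so a live log shows which packaged tool ran."""
    logger.info("call %s.%s", skill, tool)
```

Sorting the skill names keeps the startup line stable across runs, which makes logs from two sessions easy to diff.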
Update Checklist
When editing a browser skill, check all of the following:
- Does the skill define a deterministic primary path?
- Does it state when generic probing is allowed?
- Does it distinguish blocked pages from missing data?
- Does it define the primary structured artifact clearly?
- Does it stop downstream skills from recollecting data?
- Does it include verification expectations, not only workflow prose?
If any answer is "no", the skill is still under-specified.
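The checklist can also be run mechanically against a skill manifest. This sketch assumes a hypothetical manifest with one field per checklist question; the key names are illustrative, not an existing package format.

```python
# Hypothetical manifest keys, one per checklist question.
CHECKLIST = {
    "deterministic primary path": "primary_tool",
    "when generic probing is allowed": "probing_policy",
    "blocked pages vs missing data": "blocked_page_handling",
    "primary structured artifact": "artifact_schema",
    "no downstream recollection": "downstream_contract",
    "verification expectations": "verification",
}

def unanswered_questions(manifest: dict) -> list:
    """Checklist items the manifest leaves empty; non-empty means under-specified."""
    return [question for question, key in CHECKLIST.items() if not manifest.get(key)]
```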