# Browser Skill Authoring

This note captures the browser-skill authoring rules proven during the live Zhihu hotlist export debugging on 2026-03-30.

## Why This Exists

The live browser run proved that a skill can be selected correctly, the runtime can call the right browser-script tool, and the task can still fail if the skill package does not encode enough deterministic extraction logic.

The concrete failure pattern was:

1. `zhihu-hotlist.extract_hotlist` was called correctly.
2. The packaged script relied on stale DOM classes and returned no rows.
3. The runtime fell back to generic `getText` probing.
4. The user saw selector thrashing instead of a stable extraction path.

This document exists to prevent the same failure pattern in future browser skills.

## Authoring Rules

### 1. Use a packaged script for structured browser tasks

If the task's primary deliverable is structured data such as rows, fields, or a stable artifact, the skill should expose a deterministic `browser_script` tool.

Do not rely on prose-only instructions for repeated structured extraction.

### 2. Keep a strict extraction ladder inside the script

For browser extraction skills, the script should try data sources in this order:

1. stable structured page state when available
2. generalized DOM candidates that are broader than one historical classname
3. controlled page-text parsing as the last deterministic fallback

Do not jump straight from one brittle selector family to generic browser probing.

### 3. Treat generic `getText` probing as a fallback of last resort

The packaged script is the primary deterministic path.

If the script fails, it should fail for a specific reason:

- blocked/login/captcha page
- unsupported page shape
- artifact incomplete

Generic browser wandering should begin only after the packaged script has exhausted its own deterministic fallbacks.

### 4. Encode blocked-page semantics explicitly

A browser skill must distinguish:

- "the expected data is not present"
- "the page is blocked by login, captcha, or anti-bot state"

When the page is blocked, fail with an explicit message. Do not silently report "no rows" if the real issue is that the page is not usable.

### 5. Make the structured artifact the primary contract

Upstream collection skills should return the structured artifact as soon as it is stable.

For example, the Zhihu hotlist flow should produce:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Downstream skills such as Office export or screen rendering should consume that artifact. They should not recollect source data.

### 6. Stop exploratory browser work after the artifact is stable

Once the primary artifact is complete:

- stop selector exploration
- stop unrelated browser wandering
- hand the artifact to the downstream skill or tool

Do not continue reading random page text after the final rows are already captured.

### 7. Keep skill boundaries narrow and explicit

Separate the responsibilities:

- navigation skill: reach the destination and verify arrival
- collection skill: extract structured data
- export skill: render `.xlsx`
- presentation skill: render `.html` and `presentation`

Do not mix recollection, export, and presentation logic into one downstream skill.

### 8. Encode host constraints in every browser skill

Browser skills should restate the SuperRPA host contract:

- use `superrpa_browser` semantics inside the browser host
- `expected_domain` is a bare hostname only
- selectors must be valid CSS selectors
- prefer direct routes before brittle click chains

These rules are not "obvious context". They belong in the skill.

### 9. Verify browser skills at multiple layers

A browser skill is not complete without verification at more than one layer:

1. script-level test for the packaged browser script
2. skill-library validation for package structure
3. runtime integration test proving the skill is actually called
4. live acceptance when a real browser session is available

The 2026-03-30 fix only became trustworthy after all four were aligned.

### 10. Keep logs versioned and skill names explicit

Live debugging is much faster when the runtime logs include:

- runtime version
- protocol version
- loaded skill names with versions
- explicit `call skill.tool` messages

Skill packages should be written assuming those logs are part of the operability contract.

## Update Checklist

When editing a browser skill, check all of the following:

- Does the skill define a deterministic primary path?
- Does it state when generic probing is allowed?
- Does it distinguish blocked pages from missing data?
- Does it define the primary structured artifact clearly?
- Does it stop downstream skills from recollecting data?
- Does it include verification expectations, not only workflow prose?

If any answer is "no", the skill is still under-specified.
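
As a rough illustration of rules 2 through 5, the sketch below shows one way the extraction ladder, the three specific failure reasons, and the structured artifact could fit together. It is written in Python for readability and is not the packaged Zhihu script. The `page` dictionary shape (`state`, `dom_candidates`, `text`), the helper names, the exception classes, and the `BLOCKED_MARKERS` strings are all assumptions made for this example; a real skill would take these inputs from the browser host and use site-specific blocked-page signals.

```python
import re
from typing import Optional

COLUMNS = ["rank", "title", "heat"]

# Hypothetical markers; a real skill would use signals specific to its site.
BLOCKED_MARKERS = ("captcha", "please log in")


class BlockedPageError(RuntimeError):
    """The page is behind login, captcha, or anti-bot state."""


class UnsupportedPageShapeError(RuntimeError):
    """Every deterministic source was tried and none produced rows."""


class ArtifactIncompleteError(RuntimeError):
    """Rows were found but some are missing required fields."""


def from_page_state(page: dict) -> Optional[list]:
    """Rung 1: stable structured page state, when the page exposes it."""
    items = (page.get("state") or {}).get("items")
    if not items:
        return None
    return [[i + 1, it.get("title", ""), it.get("heat", "")]
            for i, it in enumerate(items)]


def from_dom_candidates(page: dict) -> Optional[list]:
    """Rung 2: generalized DOM candidates, not tied to one classname."""
    nodes = [n for n in (page.get("dom_candidates") or []) if n.get("title")]
    rows = [[i + 1, n["title"], n.get("heat", "")] for i, n in enumerate(nodes)]
    return rows or None


# Assumed text layout for this sketch: "<rank> <title> <heat>" per line.
LINE_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\S+)\s*$")


def from_page_text(page: dict) -> Optional[list]:
    """Rung 3: controlled page-text parsing, the last deterministic fallback."""
    rows = []
    for line in (page.get("text") or "").splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append([int(match.group(1)), match.group(2), match.group(3)])
    return rows or None


def extract_hotlist(page: dict, source: str) -> dict:
    """Walk the ladder in order and return the artifact or a specific failure."""
    text = (page.get("text") or "").lower()
    if any(marker in text for marker in BLOCKED_MARKERS):
        # Blocked pages are reported as blocked, never as "no rows".
        raise BlockedPageError("page is blocked by login, captcha, or anti-bot state")

    for rung in (from_page_state, from_dom_candidates, from_page_text):
        rows = rung(page)
        if rows:
            break
    else:
        raise UnsupportedPageShapeError("no deterministic source produced rows")

    if any(len(row) != len(COLUMNS) or not row[1] for row in rows):
        raise ArtifactIncompleteError("some rows are missing required fields")

    return {
        "source": source,
        "sheet_name": "知乎热榜",
        "columns": COLUMNS,
        "rows": rows,
    }
```

In this sketch the caller can tell the three failure reasons apart by exception type, so a runtime could decide whether generic probing is allowed at all. Once `extract_hotlist` returns, the artifact is complete and can be handed to the downstream export or presentation skill without further page reads.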