skill-lib/BROWSER_SKILL_AUTHORING.md
木炎 51913555ad feat: add initial skill authoring workspace
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 18:34:56 +08:00

Browser Skill Authoring

This note captures the browser-skill authoring rules proven during the live Zhihu hotlist export debugging on 2026-03-30.

Why This Exists

The live browser run proved that a skill can be selected correctly, the runtime can call the right browser-script tool, and the task can still fail if the skill package does not encode enough deterministic extraction logic.

The concrete failure pattern was:

  1. zhihu-hotlist.extract_hotlist was called correctly.
  2. The packaged script relied on stale DOM classes and returned no rows.
  3. The runtime fell back to generic getText probing.
  4. The user saw selector thrashing instead of a stable extraction path.

This document exists to prevent the same failure pattern in future browser skills.

Authoring Rules

1. Use a packaged script for structured browser tasks

If the task's primary deliverable is structured data such as rows, fields, or a stable artifact, the skill should expose a deterministic browser_script tool.

Do not rely on prose-only instructions for repeated structured extraction.

2. Keep a strict extraction ladder inside the script

For browser extraction skills, the script should try data sources in this order:

  1. stable structured page state when available
  2. generalized DOM candidates that are broader than one historical classname
  3. controlled page-text parsing as the last deterministic fallback

Do not jump straight from one brittle selector family to generic browser probing.
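The ladder above can be sketched as an ordered list of extractors that are tried in turn. This is a minimal illustration only: `PageSnapshot`, the `hotList` state key, and the tab-separated text layout are hypothetical stand-ins, not part of the SuperRPA host or the real Zhihu page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class PageSnapshot:
    """Hypothetical, simplified view of a loaded page (not a SuperRPA type)."""
    state: Optional[dict] = None                          # structured page state, if exposed
    dom_items: List[dict] = field(default_factory=list)   # hits from broad DOM candidates
    text: str = ""                                        # visible page text


def from_state(page: PageSnapshot) -> List[list]:
    # Rung 1: stable structured page state ("hotList" is an assumed key).
    items = (page.state or {}).get("hotList", [])
    return [[i + 1, it["title"], it["heat"]] for i, it in enumerate(items)]


def from_dom(page: PageSnapshot) -> List[list]:
    # Rung 2: generalized DOM candidates, not tied to one historical classname.
    titled = [it for it in page.dom_items if it.get("title")]
    return [[i + 1, it["title"], it.get("heat", "")] for i, it in enumerate(titled)]


def from_text(page: PageSnapshot) -> List[list]:
    # Rung 3: controlled page-text parsing (assumes "rank<TAB>title<TAB>heat" lines).
    rows = []
    for line in page.text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].strip().isdigit():
            rows.append([int(parts[0]), parts[1].strip(), parts[2].strip()])
    return rows


LADDER: List[Tuple[str, Callable[[PageSnapshot], List[list]]]] = [
    ("page_state", from_state),
    ("dom_candidates", from_dom),
    ("page_text", from_text),
]


def extract_rows(page: PageSnapshot) -> Tuple[str, List[list]]:
    """Return (rung_name, rows) from the first rung that yields any rows."""
    for name, rung in LADDER:
        rows = rung(page)
        if rows:
            return name, rows
    return "none", []
```

Returning the rung name alongside the rows makes it visible in logs which fallback actually produced the artifact.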

3. Treat generic getText probing as a fallback of last resort

The packaged script is the primary deterministic path.

If the script fails, it should fail for a specific reason:

  • blocked/login/captcha page
  • unsupported page shape
  • artifact incomplete

Generic browser wandering should begin only after the packaged script has exhausted its own deterministic fallbacks.
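One way to make the failure reason explicit is to have the script raise a typed error that the caller converts into a structured result. The sketch below is an assumption about how this could look: `FailureReason`, `ScriptFailure`, and `run_packaged_script` are hypothetical names, and treating an empty result as "artifact incomplete" is an illustrative policy choice.

```python
from enum import Enum
from typing import Callable, List


class FailureReason(Enum):
    # The three reasons listed above; the enum itself is hypothetical.
    BLOCKED = "blocked/login/captcha page"
    UNSUPPORTED_SHAPE = "unsupported page shape"
    ARTIFACT_INCOMPLETE = "artifact incomplete"


class ScriptFailure(Exception):
    """Raised by a packaged script when it gives up for a known reason."""

    def __init__(self, reason: FailureReason, detail: str) -> None:
        super().__init__(f"{reason.value}: {detail}")
        self.reason = reason
        self.detail = detail


def run_packaged_script(script: Callable[[], List[list]]) -> dict:
    """Run the script and always report either rows or a specific reason."""
    try:
        rows = script()
    except ScriptFailure as failure:
        return {"ok": False, "reason": failure.reason.value,
                "detail": failure.detail, "rows": []}
    if not rows:
        # Assumed policy: an empty result without a stated reason is
        # reported as an incomplete artifact, never as silent success.
        return {"ok": False, "reason": FailureReason.ARTIFACT_INCOMPLETE.value,
                "detail": "script returned no rows", "rows": []}
    return {"ok": True, "reason": None, "detail": "", "rows": rows}
```

A caller can then decide whether generic probing is allowed by inspecting `reason` instead of guessing from an empty row list.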

4. Encode blocked-page semantics explicitly

A browser skill must distinguish:

  • "the expected data is not present"
  • "the page is blocked by login, captcha, or anti-bot state"

When the page is blocked, fail with an explicit message. Do not silently report "no rows" if the real issue is that the page is not usable.
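A minimal sketch of that distinction, assuming the script has the visible page text available when extraction returns nothing. The marker strings and the function name `classify_empty_result` are illustrative assumptions; a real skill would need markers tuned to the target site.

```python
from typing import List

# Hypothetical markers for login, captcha, or anti-bot pages (stored lowercase).
BLOCK_MARKERS = ("登录", "验证码", "captcha", "sign in")


def classify_empty_result(page_text: str, rows: List[list]) -> str:
    """Return "ok", "blocked", or "no_data" for an extraction attempt."""
    if rows:
        return "ok"
    lowered = page_text.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        # The page is not usable; this must not be reported as "no rows".
        return "blocked"
    return "no_data"
```

The markers are only consulted when no rows were extracted, so a login link on an otherwise healthy page does not trigger a false "blocked" result.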

5. Make the structured artifact the primary contract

Upstream collection skills should return the structured artifact as soon as it is stable.

For example, the Zhihu hotlist flow should produce:

{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}

Downstream skills such as Office export or screen rendering should consume that artifact. They should not recollect source data.
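A downstream skill can check the artifact's shape before consuming it instead of recollecting data. The following is a sketch under the assumption that the four keys shown in the example are all required; `validate_artifact` is a hypothetical helper, not an existing API.

```python
from typing import List

REQUIRED_KEYS = ("source", "sheet_name", "columns", "rows")


def validate_artifact(artifact: dict) -> List[str]:
    """Return a list of problems; an empty list means the artifact is usable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in artifact]
    if problems:
        return problems
    if not artifact["rows"]:
        problems.append("rows is empty")
    width = len(artifact["columns"])
    for index, row in enumerate(artifact["rows"]):
        # Every row must have exactly one cell per declared column.
        if len(row) != width:
            problems.append(f"row {index} has {len(row)} cells, expected {width}")
    return problems
```

An export or presentation skill could refuse to run when the returned list is non-empty and report the problems back to the collection step.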

6. Stop exploratory browser work after the artifact is stable

Once the primary artifact is complete:

  • stop selector exploration
  • stop unrelated browser wandering
  • hand the artifact to the downstream skill or tool

Do not continue reading random page text after the final rows are already captured.

7. Keep skill boundaries narrow and explicit

Separate the responsibilities:

  • navigation skill: reach the destination and verify arrival
  • collection skill: extract structured data
  • export skill: render .xlsx
  • presentation skill: render .html and other presentation output

Do not mix recollection, export, and presentation logic into one downstream skill.

8. Encode host constraints in every browser skill

Browser skills should restate the SuperRPA host contract:

  • use superrpa_browser semantics inside the browser host
  • expected_domain is a bare hostname only
  • selectors must be valid CSS selectors
  • prefer direct routes before brittle click chains

These rules are not "obvious context". They belong in the skill.
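The expected_domain rule is easy to check mechanically. The sketch below assumes "bare hostname" means no scheme, port, path, or query; that reading and the helper names are assumptions rather than the documented SuperRPA contract.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Dot-separated labels of letters, digits, and hyphens, ending in an alphabetic TLD.
_HOSTNAME = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def is_bare_hostname(value: str) -> bool:
    """True for "www.zhihu.com"; False for URLs, ports, or paths."""
    return bool(_HOSTNAME.match(value))


def hostname_from_url(url: str) -> Optional[str]:
    """Derive the bare hostname from a full URL, or None if it has none."""
    return urlsplit(url).hostname
```

For example, a skill author holding the URL `https://www.zhihu.com/hot` would pass `hostname_from_url(...)` as expected_domain rather than the URL itself.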

9. Verify browser skills at multiple layers

A browser skill is not complete without verification at more than one layer:

  1. script-level test for the packaged browser script
  2. skill-library validation for package structure
  3. runtime integration test proving the skill is actually called
  4. live acceptance when a real browser session is available

The 2026-03-30 fix only became trustworthy after all four were aligned.

10. Keep logs versioned and skill names explicit

Live debugging is much faster when the runtime logs include:

  • runtime version
  • protocol version
  • loaded skill names with versions
  • explicit "call skill.tool" messages

Skill packages should be written assuming those logs are part of the operability contract.
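As an illustration of what such log lines might contain, the sketch below formats the four items into two lines. The exact layout, the field names, and the version strings in the usage note are hypothetical; the real runtime's log format is not specified here.

```python
from typing import Dict


def format_startup_log(runtime_version: str, protocol_version: str,
                       skills: Dict[str, str]) -> str:
    """One line with runtime version, protocol version, and loaded skills."""
    loaded = ", ".join(f"{name}@{version}" for name, version in sorted(skills.items()))
    return f"runtime={runtime_version} protocol={protocol_version} skills=[{loaded}]"


def format_call_log(skill: str, tool: str) -> str:
    """One explicit line per tool invocation, naming both skill and tool."""
    return f"call {skill}.{tool}"
```

With made-up versions, `format_startup_log("1.4.0", "3", {"zhihu-hotlist": "0.2.0"})` yields `runtime=1.4.0 protocol=3 skills=[zhihu-hotlist@0.2.0]`, and `format_call_log("zhihu-hotlist", "extract_hotlist")` yields `call zhihu-hotlist.extract_hotlist`.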

Update Checklist

When editing a browser skill, check all of the following:

  • Does the skill define a deterministic primary path?
  • Does it state when generic probing is allowed?
  • Does it distinguish blocked pages from missing data?
  • Does it define the primary structured artifact clearly?
  • Does it stop downstream skills from recollecting data?
  • Does it include verification expectations, not only workflow prose?

If any answer is "no", the skill is still under-specified.