skill-lib/BROWSER_SKILL_AUTHORING.md

# Browser Skill Authoring

This note captures the browser-skill authoring rules that were validated while
debugging the live Zhihu hotlist export on 2026-03-30.

## Why This Exists

The live browser run showed that a task can still fail even when the skill is
selected correctly and the runtime calls the right browser-script tool. That
happens when the skill package does not encode enough deterministic extraction
logic.

The concrete failure pattern was:

1. `zhihu-hotlist.extract_hotlist` was called correctly.
2. The packaged script relied on stale DOM classes and returned no rows.
3. The runtime fell back to generic `getText` probing.
4. The user saw selector thrashing instead of a stable extraction path.

This document exists to prevent the same failure pattern in future browser
skills.

## Authoring Rules

### 1. Use a packaged script for structured browser tasks

If the task's primary deliverable is structured data such as rows, fields, or a
stable artifact, the skill should expose a deterministic `browser_script` tool.

Do not rely on prose-only instructions for repeated structured extraction.

### 2. Keep a strict extraction ladder inside the script

For browser extraction skills, the script should try data sources in this order:

1. stable structured page state when available
2. generalized DOM candidates that are broader than one historical classname
3. controlled page-text parsing as the last deterministic fallback

Do not jump straight from one brittle selector family to generic browser
probing.
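
The ladder above can be sketched as an ordered list of extractors that is walked until one yields rows. This is a minimal illustration, not the packaged script: the `page` dict and its `state`, `dom_items`, and `text` keys are hypothetical stand-ins for whatever the real browser script reads from the page.

```python
import re
from typing import Callable, Optional

Rows = list[list]
Rung = tuple[str, Callable[[dict], Optional[Rows]]]


def from_page_state(page: dict) -> Optional[Rows]:
    # Rung 1: structured state embedded in the page (hypothetical "state" key).
    items = page.get("state") or []
    return [[i + 1, item["title"], item.get("heat", "")] for i, item in enumerate(items)]


def from_dom_candidates(page: dict) -> Optional[Rows]:
    # Rung 2: text collected from several broad selectors (hypothetical "dom_items" key).
    titles = [t.strip() for t in page.get("dom_items") or [] if t.strip()]
    return [[i + 1, title, ""] for i, title in enumerate(titles)]


# Only lines shaped like "1. title" or "1 title" count as rows.
RANKED_LINE = re.compile(r"^\s*(\d+)[.\s]+(.+?)\s*$")


def from_page_text(page: dict) -> Optional[Rows]:
    # Rung 3: controlled parsing of visible page text, the last deterministic fallback.
    rows = []
    for line in (page.get("text") or "").splitlines():
        match = RANKED_LINE.match(line)
        if match:
            rows.append([int(match.group(1)), match.group(2), ""])
    return rows


RUNGS: list[Rung] = [
    ("page_state", from_page_state),
    ("dom_candidates", from_dom_candidates),
    ("page_text", from_page_text),
]


def run_ladder(page: dict, rungs: list[Rung] = RUNGS) -> tuple[str, Rows]:
    """Return (rung name, rows) from the first rung that produces rows."""
    for name, extract in rungs:
        rows = extract(page)
        if rows:
            return name, rows
    # Every deterministic rung is exhausted; the caller decides what happens next.
    raise LookupError("all deterministic rungs exhausted")
```

Returning the rung name alongside the rows makes it visible in logs when a skill has silently degraded from page state to text parsing.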

### 3. Treat generic `getText` probing as a fallback of last resort

The packaged script is the primary deterministic path.

If the script fails, it should fail for a specific reason:

- blocked/login/captcha page
- unsupported page shape
- artifact incomplete

Generic browser wandering should begin only after the packaged script has
exhausted its own deterministic fallbacks.
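
One way to make the failure reason explicit is to raise a typed error instead of returning an empty result. The sketch below is an assumption about shape only: `FailureReason`, `ExtractionFailure`, and `require_complete` are hypothetical names, and the rule that an empty result means "unsupported page shape" is chosen here for illustration.

```python
from enum import Enum


class FailureReason(Enum):
    BLOCKED_PAGE = "blocked_page"  # login, captcha, or anti-bot page
    UNSUPPORTED_PAGE_SHAPE = "unsupported_page_shape"
    ARTIFACT_INCOMPLETE = "artifact_incomplete"


class ExtractionFailure(Exception):
    """Raised when the packaged script cannot produce a usable artifact."""

    def __init__(self, reason: FailureReason, detail: str) -> None:
        super().__init__(f"{reason.value}: {detail}")
        self.reason = reason
        self.detail = detail


def require_complete(rows: list[list], expected_width: int) -> list[list]:
    """Return rows unchanged, or fail with a named reason rather than an empty result."""
    if not rows:
        raise ExtractionFailure(
            FailureReason.UNSUPPORTED_PAGE_SHAPE, "no rows matched any known layout"
        )
    wrong_width = [row for row in rows if len(row) != expected_width]
    if wrong_width:
        raise ExtractionFailure(
            FailureReason.ARTIFACT_INCOMPLETE,
            f"{len(wrong_width)} row(s) do not have {expected_width} cells",
        )
    return rows
```

A caller can then branch on `failure.reason` to decide whether generic probing is worth attempting at all.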

### 4. Encode blocked-page semantics explicitly

A browser skill must distinguish:

- "the expected data is not present"
- "the page is blocked by login, captcha, or anti-bot state"

When the page is blocked, fail with an explicit message. Do not silently report
"no rows" if the real issue is that the page is not usable.
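
The distinction can be made mechanical by checking for block markers whenever extraction comes back empty. This is a rough sketch: the marker strings are illustrative assumptions, not a list taken from any real page, and a production skill would need markers tuned to its target site.

```python
# Hypothetical phrases that suggest a login wall, captcha, or anti-bot interstitial.
BLOCK_MARKERS = ("captcha", "验证码", "log in to continue", "unusual traffic")


def classify_page(page_text: str, rows: list) -> str:
    """Return 'ok', 'blocked', or 'no_data' for one extraction attempt."""
    if rows:
        return "ok"
    lowered = page_text.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return "blocked"
    return "no_data"


def failure_message(status: str) -> str:
    """Map a non-ok status to the message shown to the user."""
    messages = {
        "blocked": "the page is blocked by login, captcha, or anti-bot state",
        "no_data": "the expected data is not present",
    }
    return messages[status]
```

Markers are only consulted when no rows were found, so a normal page that happens to mention a captcha does not get misreported as blocked.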

### 5. Make the structured artifact the primary contract

Upstream collection skills should return the structured artifact as soon as it
is stable.

For example, the Zhihu hotlist flow should produce:

```json
{
  "source": "https://www.zhihu.com/hot",
  "sheet_name": "知乎热榜",
  "columns": ["rank", "title", "heat"],
  "rows": [[1, "标题", "344万"]]
}
```

Downstream skills such as Office export or screen rendering should consume that
artifact. They should not recollect source data.
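
A downstream skill can check the artifact before consuming it, which gives it a reason to reject bad input rather than recollect. The validator below is a minimal sketch based only on the fields in the example above; `validate_artifact` is a hypothetical helper, and the real contract may require more.

```python
REQUIRED_KEYS = ("source", "sheet_name", "columns", "rows")


def validate_artifact(artifact: dict) -> list[str]:
    """Return a list of problems; an empty list means the artifact looks usable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in artifact]
    if problems:
        return problems

    columns, rows = artifact["columns"], artifact["rows"]
    if (
        not isinstance(columns, list)
        or not columns
        or not all(isinstance(column, str) for column in columns)
    ):
        return ["columns must be a non-empty list of strings"]
    if not isinstance(rows, list) or not rows:
        return ["rows must be a non-empty list"]

    # Every row must line up with the declared columns.
    for index, row in enumerate(rows):
        if len(row) != len(columns):
            problems.append(
                f"row {index} has {len(row)} cells, expected {len(columns)}"
            )
    return problems


artifact = {
    "source": "https://www.zhihu.com/hot",
    "sheet_name": "知乎热榜",
    "columns": ["rank", "title", "heat"],
    "rows": [[1, "标题", "344万"]],
}
assert validate_artifact(artifact) == []
```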

### 6. Stop exploratory browser work after the artifact is stable

Once the primary artifact is complete:

- stop selector exploration
- stop unrelated browser wandering
- hand the artifact to the downstream skill or tool

Do not continue reading random page text after the final rows are already
captured.

### 7. Keep skill boundaries narrow and explicit

Separate the responsibilities:

- navigation skill: reach the destination and verify arrival
- collection skill: extract structured data
- export skill: render `.xlsx`
- presentation skill: render `.html` and `presentation`

Do not mix recollection, export, and presentation logic into one downstream
skill.

### 8. Encode host constraints in every browser skill

Browser skills should restate the SuperRPA host contract:

- use `superrpa_browser` semantics inside the browser host
- `expected_domain` is a bare hostname only
- selectors must be valid CSS selectors
- prefer direct routes before brittle click chains

These rules are not "obvious context". They belong in the skill.
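
The `expected_domain` rule is easy to check mechanically. The sketch below assumes that "bare hostname" means dot-separated letter/digit/hyphen labels with no scheme, port, or path; `is_bare_hostname` is a hypothetical helper, and the actual host may accept a different set of values.

```python
import re

# One DNS-style label: letters, digits, and inner hyphens, at most 63 characters.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_HOSTNAME = re.compile(_LABEL + r"(?:\." + _LABEL + r")*", re.IGNORECASE)


def is_bare_hostname(value: str) -> bool:
    """True for values like 'www.zhihu.com'; False for URLs, ports, and paths."""
    return len(value) <= 253 and _HOSTNAME.fullmatch(value) is not None


assert is_bare_hostname("www.zhihu.com")
assert not is_bare_hostname("https://www.zhihu.com")
assert not is_bare_hostname("www.zhihu.com/hot")
```

Running a check like this during skill-library validation catches a full URL in `expected_domain` before it reaches a live session.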

### 9. Verify browser skills at multiple layers

A browser skill is not complete until it has been verified at several layers:

1. script-level test for the packaged browser script
2. skill-library validation for package structure
3. runtime integration test proving the skill is actually called
4. live acceptance when a real browser session is available

The 2026-03-30 fix only became trustworthy after all four were aligned.

### 10. Keep logs versioned and skill names explicit

Live debugging is much faster when the runtime logs include:

- runtime version
- protocol version
- loaded skill names with versions
- explicit `call skill.tool` messages

Skill packages should be written assuming those logs are part of the
operability contract.

## Update Checklist

When editing a browser skill, check all of the following:

- Does the skill define a deterministic primary path?
- Does it state when generic probing is allowed?
- Does it distinguish blocked pages from missing data?
- Does it define the primary structured artifact clearly?
- Does it stop downstream skills from recollecting data?
- Does it include verification expectations, not only workflow prose?

If any answer is "no", the skill is still under-specified.