Quickstart: driving a task with an LLM
OpenErrand is "the pipe" — you bring the intelligence. The SDK's decide(ctx)
callback turns each PageContext into the next Command; back it with whatever
LLM you want. This guide shows it with Claude.
When you need this. A signed playbook with deterministic steps needs no LLM for the happy path; the steps drive it. Reach for an LLM decider on the cold-start / fallback path: a flow you haven't recorded yet, or a step that broke because the page changed.
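One way to combine the two paths is sketched below. It rests on assumptions. The types are minimal stand-ins for the `@obep/protocol` shapes. `RecordedSteps` and `withLlmFallback` are hypothetical names, not SDK APIs. The wrapper replays a recorded step while it still fits the page. It hands the page to the LLM decider on a cold start or when the recorded step's `ref` has disappeared.

```typescript
// Minimal stand-ins for the @obep/protocol types (assumed shapes, illustration only).
type Command = { action: string; [key: string]: unknown };
interface PageElement { ref: string; type: string; label: string }
interface PageContext { url: string; interactiveElements: PageElement[] }
type Decide = (ctx: PageContext) => Promise<Command>;

// Hypothetical: returns the recorded command for this page, or null if nothing is recorded.
type RecordedSteps = (ctx: PageContext) => Command | null;

// Replay the recorded step while it still fits the page; otherwise ask the LLM decider.
export function withLlmFallback(recorded: RecordedSteps, llm: Decide): Decide {
  return async (ctx) => {
    const step = recorded(ctx);
    if (step === null) return llm(ctx); // cold start: no recorded step for this page

    // A recorded step has "broken" if it targets a ref the current page no longer lists.
    const ref = step.ref;
    if (typeof ref === "string" && !ctx.interactiveElements.some((e) => e.ref === ref)) {
      return llm(ctx);
    }
    return step;
  };
}
```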
#The shape
client.run({ url, userId, decide }) calls your decide(ctx) once per step. The
extension sends up the page as a stripped element list (refs + labels + types — no
values, no screenshot by default); your decider picks one action; the extension
enforces it against the signed playbook fence and executes it. Loop until done.
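The loop can be sketched as follows, as an illustration of the contract only. The `Extension` interface and `runLoop` are hypothetical stand-ins for what `client.run()` and the extension do together. The `maxSteps` cap is an assumption, not a documented SDK limit.

```typescript
// Minimal stand-ins for the @obep/protocol types (assumed shapes, illustration only).
type Command = { action: string; [key: string]: unknown };
interface PageElement { ref: string; type: string; label: string }
interface PageContext { url: string; interactiveElements: PageElement[] }
type Decide = (ctx: PageContext) => Promise<Command>;

// Hypothetical view of the extension side: capture the stripped page, check a command, run it.
interface Extension {
  capture(): PageContext;
  allowed(cmd: Command): boolean; // the signed-playbook fence check
  execute(cmd: Command): void;
}

// One decide() call per step; every non-terminal command passes the fence before it executes.
export async function runLoop(ext: Extension, decide: Decide, maxSteps = 50): Promise<Command> {
  for (let step = 0; step < maxSteps; step++) {
    const cmd = await decide(ext.capture());
    if (cmd.action === "done") return cmd; // the decider ends the task
    if (!ext.allowed(cmd)) throw new Error(`blocked by fence: ${cmd.action}`);
    ext.execute(cmd);
  }
  throw new Error("step budget exhausted");
}
```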
#Install
```bash
npm install @anthropic-ai/sdk zod
```
#A Claude-backed decider
```typescript
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
import { zodOutputFormat } from "@anthropic-ai/sdk/helpers/zod";
import type { Command, PageContext } from "@obep/protocol";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY

// The closed OBEP action surface, as a schema the model MUST emit (structured output).
const CommandSchema = z.discriminatedUnion("action", [
  z.object({ action: z.literal("navigate"), url: z.string() }),
  z.object({ action: z.literal("click"), ref: z.string() }),
  z.object({ action: z.literal("fill"), ref: z.string(), value: z.string() }),
  z.object({ action: z.literal("fillSecret"), ref: z.string(), credentialKey: z.string() }),
  z.object({ action: z.literal("upload"), ref: z.string(), file: z.string() }),
  z.object({ action: z.literal("wait"), ref: z.string().optional(), timeoutMs: z.number().optional() }),
  z.object({ action: z.literal("extract"), ref: z.string(), as: z.string() }),
  // Two-argument z.record works in both Zod 3 and Zod 4 (Zod 4 dropped the one-argument form).
  z.object({ action: z.literal("done"), result: z.record(z.string(), z.unknown()).optional() }),
]);

const SYSTEM = `You drive a web task in the user's own browser, one step at a time.
Each turn you receive the current page as a list of interactive elements (ref, type, label).
Choose exactly ONE next action from the allowed set to make progress toward the goal.
Rules:
- Address elements only by their "ref". Never invent a ref that isn't listed.
- To enter a saved login, use fillSecret with the credentialKey — never type a
  password as a value. You never see secret values.
- Call "done" when the goal is complete, with any extracted result.
- Prefer the smallest action that makes progress; do not guess at off-page navigation.`;

export function makeClaudeDecider(goal: string) {
  // Per-task memory: the goal, plus every page seen and action taken so far.
  const history: Anthropic.MessageParam[] = [{ role: "user", content: `Goal: ${goal}` }];

  return async function decide(ctx: PageContext): Promise<Command> {
    // This step's user turn: the stripped page (refs + types + labels, no values).
    history.push({
      role: "user",
      content: `URL: ${ctx.url}\nElements:\n${ctx.interactiveElements
        .map((e) => `- ${e.ref} <${e.type}> ${e.label}`)
        .join("\n")}`,
    });

    const res = await anthropic.messages.parse({
      model: "claude-opus-4-8",
      max_tokens: 1024, // output is one small command
      system: [{ type: "text", text: SYSTEM, cache_control: { type: "ephemeral" } }],
      messages: history,
      output_config: { format: zodOutputFormat(CommandSchema) },
    });

    // parsed_output is validated against CommandSchema, but can be null (e.g. a refusal
    // or hitting max_tokens), so fail the step loudly instead of returning undefined.
    const command = res.parsed_output;
    if (!command) throw new Error(`no valid command (stop_reason: ${res.stop_reason})`);

    history.push({ role: "assistant", content: JSON.stringify(command) });
    return command as Command;
  };
}
```
Wire it into a task:
```typescript
import { RelayClient } from "@obep/sdk";
import { WebSocket } from "ws";

// RELAY_WS (your relay's WebSocket URL) and API_KEY are placeholders for your own config.
const client = new RelayClient({ url: RELAY_WS, apiKey: API_KEY, WebSocketImpl: WebSocket });

const result = await client.run({
  url: "https://portal.example.com/login",
  userId: "dana",
  decide: makeClaudeDecider("Log in, upload the claim document, then read the confirmation number."),
});
```
#Why this is safe even though the model is "untrusted"
The enforcement layer treats the LLM as adversarial, which is exactly right:
- The fence still wins. Whatever the model emits is re-checked on-device against the signed playbook's allowedDomains / allowedActions / allowedCredentialKeys before it runs. A hallucinated navigate to an off-fence domain is blocked, not executed.
- The model never sees secrets. fillSecret carries only a credentialKey; the value is resolved from the on-device vault. And capture minimization means the model receives element labels, not field values, so a password on the page is never in the context you send to Claude.
So a misbehaving or prompt-injected model can, at worst, fail the task — it cannot exfiltrate a credential or escape the playbook's domains.
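The fence check can be pictured as a pure function over the three allow-lists. This is a hypothetical sketch, not the extension's actual code. Treating subdomains of an allowed domain as in-fence is an assumption about the matching rule. An exact-match rule would be stricter.

```typescript
// Hypothetical playbook fence, mirroring the three allow-lists named above.
interface Fence {
  allowedDomains: string[];
  allowedActions: string[];
  allowedCredentialKeys: string[];
}
type Command = { action: string; url?: string; credentialKey?: string; [key: string]: unknown };
type Verdict = { ok: true } | { ok: false; reason: string };

export function checkAgainstFence(cmd: Command, fence: Fence): Verdict {
  if (!fence.allowedActions.includes(cmd.action)) {
    return { ok: false, reason: `action ${cmd.action} not allowed` };
  }
  if (cmd.action === "navigate") {
    let host: string;
    try {
      host = new URL(cmd.url ?? "").hostname; // a missing or malformed URL throws here
    } catch {
      return { ok: false, reason: "invalid URL" };
    }
    // Assumed rule: exact domain or a true subdomain ("a.portal.example.com", not "evilportal.example.com").
    const inFence = fence.allowedDomains.some((d) => host === d || host.endsWith(`.${d}`));
    if (!inFence) return { ok: false, reason: `domain ${host} outside fence` };
  }
  if (cmd.action === "fillSecret" && !fence.allowedCredentialKeys.includes(cmd.credentialKey ?? "")) {
    return { ok: false, reason: "credential key not allowed" };
  }
  return { ok: true };
}
```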
#Model choice for a per-step loop
run() calls your decider once per step, so latency compounds over a flow. The
default here is claude-opus-4-8, the most capable tier. The model choice is yours: if step latency matters more than per-step reasoning depth, switch to a faster tier (claude-haiku-4-5 or claude-sonnet-4-6) by changing the model string. For flows that need real reasoning (ambiguous pages, recovery), add adaptive thinking:
thinking: { type: "adaptive" }.
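Before switching tiers, measure where the time goes. The sketch below wraps any decider and records the latency of each `decide()` call. `withStepTiming` is a hypothetical helper, not part of the SDK, and it is generic over the context and command types.

```typescript
type Decide<Ctx, Cmd> = (ctx: Ctx) => Promise<Cmd>;

// Hypothetical helper: wraps any decider and records each step's wall-clock latency in ms.
export function withStepTiming<Ctx, Cmd>(decide: Decide<Ctx, Cmd>) {
  const stepsMs: number[] = [];
  const timed: Decide<Ctx, Cmd> = async (ctx) => {
    const start = Date.now();
    try {
      return await decide(ctx);
    } finally {
      stepsMs.push(Date.now() - start); // recorded even when the model call throws
    }
  };
  return { decide: timed, stepsMs };
}
```

In use, you would pass `timed.decide` (for example from `withStepTiming(makeClaudeDecider(goal))`) to `client.run()`. After the run, sum `stepsMs` to see how much of the flow's wall time went to the model.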
#Keeping it cheap: caching across steps
Each step re-sends the whole conversation so far (the goal + every prior page + action), so the same prefix is processed on every call. Two caching levers:
- The conversation prefix is the real win. Put a cache_control breakpoint on the last block of the most recent turn each step; the next step reads the cached prefix and pays full price only for the new page. This is the standard multi-turn pattern, and the savings compound as the flow grows.
- The system prompt caches too, but only if it's large enough. The minimum cacheable prefix on Opus-tier models is 4096 tokens (2048 on Sonnet/Haiku); a short instruction block like the one above falls below that, so its cache_control marker is a silent no-op. It pays off when your system prompt is large (detailed policy, few-shot examples). Verify with usage.cache_read_input_tokens: if it stays 0, nothing was cached.
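The moving breakpoint from the first lever can be sketched as a pure function over the history array. It assumes the interfaces below as minimal local copies of the Messages API shapes. Real code would use the SDK's `Anthropic.MessageParam` types. `withMovingBreakpoint` is a hypothetical name. It marks only the newest block and drops older markers, because the API allows at most four cache_control breakpoints per request.

```typescript
// Minimal local copies of the Messages API shapes used here (real code: Anthropic.MessageParam).
interface TextBlock { type: "text"; text: string; cache_control?: { type: "ephemeral" } }
interface Message { role: "user" | "assistant"; content: string | TextBlock[] }

// Returns a copy of `history` with exactly one cache_control breakpoint, on the last block
// of the final message. Older markers are dropped; the input array is not mutated.
export function withMovingBreakpoint(history: Message[]): Message[] {
  return history.map((msg, i): Message => {
    // Normalize string content to a text block so it can carry cache_control.
    const blocks: TextBlock[] =
      typeof msg.content === "string"
        ? [{ type: "text", text: msg.content }]
        : msg.content.map((b): TextBlock => ({ type: "text", text: b.text }));
    if (i === history.length - 1 && blocks.length > 0) {
      const lastIdx = blocks.length - 1;
      blocks[lastIdx] = { ...blocks[lastIdx], cache_control: { type: "ephemeral" } };
    }
    return { role: msg.role, content: blocks };
  });
}
```

In the decider you would pass `messages: withMovingBreakpoint(history)` to the parse call. The stored `history` stays unmarked, so each step places a fresh breakpoint on its own newest turn.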
#Notes
- messages.parse + output_config.format forces the model's output to match CommandSchema, so you get a validated Command with no brittle JSON parsing.
- Keep decide deterministic-ish: address by ref, and let the extension's enforcement (not your prompt) be the security boundary.
- This is "bring your own LLM": the same decide(ctx) => Command contract works with any provider; only the call inside decide changes.
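A provider-agnostic decider might look like the sketch below. It assumes a hypothetical `CallModel` hook that sends a prompt and returns a command object, however your provider produces one. The types are minimal stand-ins for `@obep/protocol`. For brevity the sketch keeps no history. The ref check is only a fail-fast for invented refs. It is not a security boundary, because the extension's fence still re-checks every command.

```typescript
// Minimal stand-ins for the @obep/protocol types (assumed shapes, illustration only).
type Command = { action: string; ref?: string; [key: string]: unknown };
interface PageElement { ref: string; type: string; label: string }
interface PageContext { url: string; interactiveElements: PageElement[] }

// Hypothetical provider hook: the only provider-specific piece.
type CallModel = (prompt: string) => Promise<Command>;

export function makeDecider(goal: string, callModel: CallModel) {
  return async function decide(ctx: PageContext): Promise<Command> {
    const prompt =
      `Goal: ${goal}\nURL: ${ctx.url}\nElements:\n` +
      ctx.interactiveElements.map((e) => `- ${e.ref} <${e.type}> ${e.label}`).join("\n");
    const cmd = await callModel(prompt);
    // Fail fast on a ref the page never listed (the model "inventing" an element).
    if (cmd.ref !== undefined && !ctx.interactiveElements.some((e) => e.ref === cmd.ref)) {
      throw new Error(`model invented ref ${cmd.ref}`);
    }
    return cmd;
  };
}
```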