anvil-serving

The quality-gated local-model router for coding harnesses.

Local where it's been proven, cloud where it hasn't — verified, with automatic fallback.

License: MIT

Point your coding harness (Claude Code, Codex, Aider, Cline, or Continue, with OpenClaw as the near-first-class beachhead) at one anvil-serving endpoint. For each request, the router resolves an intent to a tier (fast-local, heavy-local, or cloud) using a measured per-(model, work-class) quality profile, cheaply verifies the output, and falls back to the next tier (ultimately cloud) when the local answer fails verification. The harness sees one reliable endpoint and never silently absorbs a local-quality failure mid-run.
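The verify-and-fallback loop described above can be sketched roughly as follows. This is a minimal illustration, not anvil-serving's actual code: the names Tier, route, and verify are hypothetical, the backends are stubs, and returning the last tier's answer when nothing verifies is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Tier:
    name: str                        # e.g. "fast-local", "heavy-local", "cloud"
    generate: Callable[[str], str]   # hypothetical backend call: prompt -> output


def route(prompt: str, tiers: Sequence[Tier],
          verify: Callable[[str], bool]) -> Tuple[str, str]:
    """Try tiers in order and return (tier_name, output) for the first verified output."""
    last: Optional[Tuple[str, str]] = None
    for tier in tiers:
        output = tier.generate(prompt)
        last = (tier.name, output)
        if verify(output):
            return last
    if last is None:
        raise ValueError("no tiers configured")
    # Assumption: if every tier fails verification, hand back the final tier's answer.
    return last


# Stub backends: the local tier returns an empty answer, which fails the cheap check.
tiers = [
    Tier("fast-local", lambda p: ""),
    Tier("cloud", lambda p: "1. parse\n2. build"),
]
tier_name, output = route("plan the work", tiers, verify=lambda out: bool(out.strip()))
print(tier_name)  # prints "cloud"
```

The caller only ever sees the pair returned by route, which mirrors the "one reliable endpoint" behaviour: the failed local attempt never reaches the harness.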


Why a router, and not just another proxy

Transport is a commodity: LiteLLM, claude-code-router, Ollama, and OpenRouter all move tokens. None of them knows whether a local model can actually do the work in front of it; they route by static rules (model name, cost, regex). On anvil's real PRD→tasks planning prompt, the gap was measured directly:

  • Local output is structurally valid ≥92% of the time, so structural validity is not the differentiator.
  • But on dependency/ordering reasoning, local collapses: frontier scores 24.75/25, fast-local 16.0, and heavy-local 13.25 (local ≈ 54–65% of frontier).
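The relative figures follow directly from the scores quoted above. A short check, using only those three numbers (the variable names are illustrative):

```python
# Scores from the planning-prompt measurement above (out of 25).
FRONTIER = 24.75
local_scores = {"fast": 16.0, "heavy": 13.25}

# Each local tier's score as a fraction of the frontier score.
relative = {tier: score / FRONTIER for tier, score in local_scores.items()}

for tier, ratio in relative.items():
    print(f"{tier}: {ratio:.1%} of frontier")
# fast: 64.6% of frontier
# heavy: 53.5% of frontier
```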

A dumb proxy sends that planning request to local and silently corrupts a long agent run. The defensible asset is therefore not the transport — it's the quality profile (per model × work-class, measured on the operator's own workload) plus the verify-and-fallback loop.


Intent presets in the model field

Callers declare an intent — a closed enum of named presets — instead of a model name:

planning   quick-edit   review   chat   long-context

Presets are accepted bare (planning) or namespaced (anvil/planning). Each preset resolves internally to hard constraints (context length, privacy, tool support, cost ceiling) that filter the candidate pool, plus a quality intent that ranks the survivors via the profile. Filter, then rank.
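A minimal sketch of "filter, then rank", assuming a profile keyed by (model, intent) with a numeric quality score. The class names, field names, model names, and scores below are all hypothetical, and only two of the four hard constraints (context length and tool support) are modelled to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

INTENTS = {"planning", "quick-edit", "review", "chat", "long-context"}
NAMESPACE = "anvil/"


def parse_intent(model_field: str) -> str:
    """Accept a bare ("planning") or namespaced ("anvil/planning") preset."""
    name = model_field[len(NAMESPACE):] if model_field.startswith(NAMESPACE) else model_field
    if name not in INTENTS:
        raise ValueError(f"unknown intent preset: {model_field!r}")
    return name


@dataclass(frozen=True)
class Candidate:
    model: str
    context_length: int
    supports_tools: bool


@dataclass(frozen=True)
class Constraints:
    min_context: int
    needs_tools: bool


def select(intent: str, candidates: Sequence[Candidate], constraints: Constraints,
           profile: Dict[Tuple[str, str], float]) -> List[str]:
    """Drop candidates that violate a hard constraint, then rank survivors by profile score."""
    survivors = [
        c for c in candidates
        if c.context_length >= constraints.min_context
        and (c.supports_tools or not constraints.needs_tools)
    ]
    # Unmeasured (model, intent) pairs rank last with an assumed score of 0.0.
    ranked = sorted(survivors, key=lambda c: profile.get((c.model, intent), 0.0), reverse=True)
    return [c.model for c in ranked]


# Made-up candidates and scores, for illustration only.
candidates = [
    Candidate("model-a", 8192, True),      # filtered out: context too short
    Candidate("model-b", 32768, True),
    Candidate("model-c", 131072, False),   # filtered out: no tool support
    Candidate("model-d", 65536, True),
]
profile = {
    ("model-a", "planning"): 0.90,
    ("model-b", "planning"): 0.60,
    ("model-c", "planning"): 0.95,
    ("model-d", "planning"): 0.80,
}
intent = parse_intent("anvil/planning")
print(select(intent, candidates, Constraints(min_context=16000, needs_tools=True), profile))
# ['model-d', 'model-b']
```

Note that model-c has the highest score but never reaches the ranking step: hard constraints remove it first, which is the point of filtering before ranking.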


Quickstart

pip install anvil-serving

# create the config directory, then copy and edit the local-only example config
mkdir -p ~/.config/anvil-serving
cp configs/example.toml ~/.config/anvil-serving/config.toml

# start the router
anvil-serving serve --config ~/.config/anvil-serving/config.toml

See Model Settings Example for a full annotated configuration, and How it works for the full design reference.


Section                   What's there
How it works              Full architecture: intents, tiers, quality gate, verify-and-fallback
Model settings            Annotated config file with all options
Serves & eval             Managing model serves + running evals
External benchmarks       Millstone import/fetch, reports, exports, and local benchmark comparisons
OpenClaw integration      Plugin spec for the OpenClaw gateway
OpenClaw live validation  Validation runbook for OpenClaw
Cost model                Advise-and-defer plan: local-only default, opt-in metered cloud
ADRs                      Architecture decisions
Changelog                 Release history

Cloud is off by default

The default config ships local-only, with $0 in metered API billing. Cloud is opt-in and explicit: you must declare a CloudBackend section in your config to unlock it. See ADR-0001 and the advise-and-defer plan for the full rationale.
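The opt-in behaviour can be pictured as a fallback chain that only gains a cloud tier when the config declares one. This is a sketch under stated assumptions: the config is taken to be already parsed into a dict, and the key name "cloud_backend" and the function fallback_chain are hypothetical stand-ins, not the project's real config schema.

```python
from typing import Any, Dict, List

LOCAL_TIERS = ["fast-local", "heavy-local"]


def fallback_chain(config: Dict[str, Any]) -> List[str]:
    """Return the tier order; cloud is appended only when explicitly declared."""
    chain = list(LOCAL_TIERS)
    if config.get("cloud_backend"):  # hypothetical key standing in for the CloudBackend section
        chain.append("cloud")
    return chain


print(fallback_chain({}))
# ['fast-local', 'heavy-local']
print(fallback_chain({"cloud_backend": {"provider": "example"}}))
# ['fast-local', 'heavy-local', 'cloud']
```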