Model-Agnostic Architecture

Overview

Model-agnostic architecture is the practice of building LLM-powered systems so that the specific model — GPT, Claude, Gemini, Llama, whatever ships next week — is an implementation detail, not a foundation.^[1] The system sits on top of an abstraction; the abstraction calls whichever model is currently the right one for the job at hand.

This is not a philosophical stance. It is a response to the empirical fact that the frontier model changes every six to eight weeks, prices reset, context windows grow, and the model that was cheapest per token last quarter may no longer be the cheapest option for the same job. Systems built against a specific model age badly. Systems built against an interface age well.

Why It Matters

The frontier moves. Better models appear regularly. Your system should benefit from them without a rewrite.
Prices drop. A year after a model's release, equivalent quality usually costs 10–30% of the original price. A system coupled to one model cannot capture this.
Capabilities diverge. One model is best at reasoning, another at structured output, a third at long-context retrieval. You want to route, not commit.
Vendors go down. Not often, but memorably. A hard dependency on one provider is an SLA tied to theirs.
Compliance varies. Some customers need on-prem, some need specific geos, some need open-weights. A model-coupled system cannot be sold to all three.

In short: coupling to one model is a decision to rebuild the system every 12–18 months. The Aggressive Craftsmanship position on this is predictable: don't.

The Layered Architecture

Production LLM systems converge on a recognizable stack. Each layer sits on the one below and can be swapped without the others noticing:^[2]

LAYER	RESPONSIBILITY	SWAPPABLE?
User Interface	Surface (chat, form, app)	Yes
Orchestration	Decides which tools/agents/models run when	Yes
Agents / Tools	Execute discrete skills (search, code, call APIs)	Yes
Retrieval	Memory, documents, vector stores	Yes
Model Gateway	Abstracts provider-specific calls	Yes
Models	The actual LLMs (possibly many)	Especially yes
Safety / Eval	Guardrails, logging, regression tests	Yes

Core Patterns

1. The Unified Wrapper.

A single class or function that standardizes input and output across providers so application code never calls openai or anthropic directly. Swap providers by swapping the wrapper's backend, not the caller.^[3]
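
A minimal sketch of the wrapper idea, under stated assumptions: `Completion`, `ChatBackend`, `EchoBackend`, and `LLMClient` are hypothetical names invented for illustration, and `EchoBackend` stands in for a real provider adapter so the example runs without any vendor SDK.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Completion:
    """Provider-neutral result that application code depends on."""
    text: str
    model: str


class ChatBackend(Protocol):
    """Hypothetical interface; each provider adapter would implement it."""
    def complete(self, prompt: str) -> Completion: ...


class EchoBackend:
    """Stand-in adapter. A real adapter would call a vendor SDK here."""
    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> Completion:
        return Completion(text=f"[{self.model}] {prompt}", model=self.model)


class LLMClient:
    """The only LLM entry point application code imports."""
    def __init__(self, backend: ChatBackend) -> None:
        self._backend = backend

    def ask(self, prompt: str) -> str:
        return self._backend.complete(prompt).text


client = LLMClient(EchoBackend("model-a"))
answer_a = client.ask("ping")

# Swapping providers changes one constructor argument, not the callers.
client = LLMClient(EchoBackend("model-b"))
answer_b = client.ask("ping")
```

Call sites only ever see `LLMClient.ask`, so a provider change stays inside the adapter that implements `ChatBackend`.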

2. Prompt-as-Data, not Prompt-as-Code.

Prompts live in versioned files (YAML, JSON, Markdown) — not hardcoded strings. This makes them reviewable, testable, and most importantly swappable when a new model wants a different format.
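
One way this could look, as a sketch only: the JSON layout (`name`, `version`, `variants`), the path mentioned in the comment, and the `render_prompt` helper are all assumptions made for illustration. The file contents are inlined so the example runs as-is.

```python
import json
from string import Template

# In a real project this JSON would live in a versioned file such as
# prompts/summarize.json (hypothetical path); it is inlined to stay runnable.
PROMPT_FILE_CONTENTS = """
{
  "name": "summarize",
  "version": 3,
  "variants": {
    "default": "Summarize the following text in $max_words words or fewer:\\n$text",
    "model-b": "<task>summarize</task><limit>$max_words</limit><input>$text</input>"
  }
}
"""


def render_prompt(spec: dict, model: str, **values: str) -> str:
    """Pick the model-specific variant if one exists, else the default."""
    variants = spec["variants"]
    template = variants.get(model, variants["default"])
    return Template(template).substitute(**values)


spec = json.loads(PROMPT_FILE_CONTENTS)
default_prompt = render_prompt(
    spec, "model-a", max_words="50", text="LLMs change fast."
)
model_b_prompt = render_prompt(
    spec, "model-b", max_words="50", text="LLMs change fast."
)
```

Adding a format for a new model then becomes a reviewed change to a data file, with no change to application code.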

3. Router / Dispatcher.

Don't send every request to the biggest model. Route simple classifications and summaries to small/cheap models; reserve top-tier models for reasoning-heavy work. The material principle applies: use the right model for the load.
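
A rule-based router can be very small. The sketch below is illustrative: the tier names, the `Task` categories, and the input-length threshold are assumptions, and a production router might also weigh latency, cost budgets, or evaluation scores.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    CLASSIFY = "classify"
    SUMMARIZE = "summarize"
    REASON = "reason"


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


# Hypothetical tier names; map them to real models in configuration.
SMALL, LARGE = "small-cheap-model", "top-tier-model"
LONG_INPUT_CHARS = 20_000  # assumed cutoff, not a recommendation


def route(task: Task, prompt: str) -> Route:
    """Send only reasoning-heavy or very long requests to the large tier."""
    if task is Task.REASON:
        return Route(LARGE, "reasoning-heavy task")
    if len(prompt) > LONG_INPUT_CHARS:
        return Route(LARGE, "input too long for the small tier")
    return Route(SMALL, "simple task")


decision = route(Task.CLASSIFY, "Is this email spam?")
```

Returning the reason alongside the model name makes routing decisions easy to log and audit later.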

4. Golden-Path Evaluation.

A fixed set of prompts with known-good outputs, run nightly against whichever model is in rotation. Without this, you cannot confidently switch models. With it, you can.
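
A golden-path run can be sketched as a loop over fixed cases. Everything named here is hypothetical: the three cases, the exact-match comparison after normalization, the fake `candidate_model`, and the 0.95 switching threshold are assumptions for illustration. Real suites often need fuzzier scoring than exact match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str


GOLDEN_SET = [
    GoldenCase("2 + 2 =", "4"),
    GoldenCase("Capital of France?", "Paris"),
    GoldenCase("Opposite of hot?", "cold"),
]


def normalize(text: str) -> str:
    return text.strip().lower()


def run_golden_set(
    model: Callable[[str], str], cases: list[GoldenCase]
) -> tuple[float, list[GoldenCase]]:
    """Return the pass rate and the cases the model got wrong."""
    failures = [
        c for c in cases if normalize(model(c.prompt)) != normalize(c.expected)
    ]
    return 1 - len(failures) / len(cases), failures


def candidate_model(prompt: str) -> str:
    """Fake model for the demo; it gets one case wrong on purpose."""
    canned = {
        "2 + 2 =": "4",
        "Capital of France?": " paris ",
        "Opposite of hot?": "warm",
    }
    return canned[prompt]


pass_rate, failures = run_golden_set(candidate_model, GOLDEN_SET)
safe_to_switch = pass_rate >= 0.95  # assumed threshold
```

Because `run_golden_set` takes any callable, the same suite can score the current model and a candidate side by side before a switch.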

5. Structured Output, Always.

Free-form text replies lock you in (the next model phrases things differently; your parser breaks). Ask for structured output — JSON, function calls, schema-validated — so the contract is the data, not the prose.
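
To make "the contract is the data" concrete, here is a small sketch that validates a model reply with the standard library only. The `Ticket` schema, the allowed categories, and the priority range are invented for illustration; a real system might use a schema library or a provider's structured-output mode instead.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    category: str
    priority: int


ALLOWED_CATEGORIES = {"billing", "bug", "other"}  # hypothetical schema


def parse_ticket(raw: str) -> Ticket:
    """Validate a model reply against the contract; raise ValueError if violated."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    category, priority = data.get("category"), data.get("priority")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError(f"priority must be an integer, got {priority!r}")
    if not 1 <= priority <= 3:
        raise ValueError(f"priority must be between 1 and 3, got {priority}")
    return Ticket(category, priority)


ticket = parse_ticket('{"category": "bug", "priority": 2}')
```

A prose reply such as "Sure! This looks like a bug." fails at the JSON step, so a model that drifts from the contract is caught at the boundary and never reaches downstream code.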

6. Open-Weights Escape Hatch.

Wherever possible, ensure there is some open-weights model that can do the job, even if worse. This guarantees you are never held hostage by a vendor, and — separately — gives you an on-prem option the moment a customer asks for one.
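
The escape hatch can also serve as a runtime fallback. In the sketch below, `ProviderError`, both backends, and `complete_with_fallback` are hypothetical; the hosted backend simulates an outage so the open-weights path is exercised.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a backend when its provider is unavailable (hypothetical)."""


Backend = Callable[[str], str]


def hosted_backend(prompt: str) -> str:
    raise ProviderError("hosted provider is down")  # simulated outage


def open_weights_backend(prompt: str) -> str:
    # Stand-in for a locally served open-weights model.
    return f"local answer to: {prompt}"


def complete_with_fallback(
    prompt: str, backends: Sequence[tuple[str, Backend]]
) -> tuple[str, str]:
    """Try each backend in order; return (backend name, answer) from the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all backends failed: " + "; ".join(errors))


used, answer = complete_with_fallback(
    "ping",
    [("hosted", hosted_backend), ("open-weights", open_weights_backend)],
)
```

Keeping the open-weights backend in the chain means it is exercised regularly, which is the only way to know the escape hatch still opens.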

"If swapping the model requires re-architecting, you didn't build an architecture. You built a shrine." — Burbridge

Failure Modes

Prompt-leak. Code depends on phrasing that only one model reliably produces.
SDK-lock. Application imports vendor SDK directly; abstracting later requires touching every call site.
Hidden features. Using provider-only features (e.g., one vendor's cache or eval product) without an escape plan.
No evals. Swapping is terrifying because no one knows whether the new model is worse. Fix: run a golden-path evaluation against both models before switching.
"We'll abstract later." The same mistake as "we'll write the tests later." Both are promises made by people who will not be there when the bill comes.

Burbridge's Approach

Every LLM-backed system Burbridge ships assumes the model will be swapped at least twice before retirement. The first swap is usually a cost optimization; the second is usually a capability upgrade. If either is painful, the system was built wrong.

See Also

Open Source Agents (the current state of frameworks that make this easier) and Aggressive Craftsmanship (the philosophical parent).

References

1. Databricks. (2026). "Agent system design patterns." Agent Framework documentation.
2. Besiroglu, Z. (2026). "Architecture Patterns for LLM Systems." Medium, March 2026.
3. AIMultiple. (2026). "LLM Orchestration in 2026: Top 22 frameworks and gateways."

Type	Architectural pattern
Applies to	Any LLM-backed system
Cost	Small up-front, large downstream savings
Alternative	Rewriting every 12–18 months
