Model-Agnostic Architecture
Build the system so the model can be swapped. The model will be swapped.
Overview
Model-agnostic architecture is the practice of building LLM-powered systems so that the specific model — GPT, Claude, Gemini, Llama, whatever ships next week — is an implementation detail, not a foundation.[1] The system sits on top of an abstraction; the abstraction calls whichever model is currently the right one for the job at hand.
This is not a philosophical stance. It is a response to an empirical pattern: the frontier model changes every six to eight weeks, prices reset, context windows grow, and the model that was cheapest per token last quarter may no longer be the cheapest option for the same job. Systems built against a specific model age badly. Systems built against an interface age like a good foundation.
Why It Matters
- The frontier moves. Better models appear regularly. Your system should benefit from them without a rewrite.
- Prices drop. A year after a model's release, equivalent quality usually costs 10–30% of the original price. A coupled system cannot capture this.
- Capabilities diverge. One model is best at reasoning, another at structured output, a third at long-context retrieval. You want to route, not commit.
- Vendors go down. Not often, but memorably. A hard dependency on one provider means your uptime can be no better than theirs.
- Compliance varies. Some customers need on-prem, some need specific geos, some need open-weights. A model-coupled system cannot be sold to all three.
In short: coupling to one model is a decision to rebuild the system every 12–18 months. The Aggressive Craftsmanship position on this is predictable: don't.
The Layered Architecture
Production LLM systems converge on a recognizable stack. Each layer sits on the one below and can be swapped without the others noticing:[2]
| LAYER | RESPONSIBILITY | SWAPPABLE? |
|---|---|---|
| User Interface | Surface (chat, form, app) | Yes |
| Orchestration | Decides which tools/agents/models run when | Yes |
| Agents / Tools | Execute discrete skills (search, code, call APIs) | Yes |
| Retrieval | Memory, documents, vector stores | Yes |
| Model Gateway | Abstracts provider-specific calls | Yes |
| Models | The actual LLMs (possibly many) | Especially yes |
| Safety / Eval | Guardrails, logging, regression tests | Yes |
Core Patterns
1. The Unified Wrapper.
A single class or function that standardizes input and output across providers so application code never calls openai or anthropic directly. Swap providers by swapping the wrapper's backend, not the caller.[3]
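A minimal sketch of the wrapper idea, under stated assumptions: the names `ChatBackend`, `Completion`, `ModelGateway`, and `EchoBackend` are hypothetical and invented for illustration, and `EchoBackend` stands in for a real adapter that would call a vendor SDK inside `complete()`. The point is only that the caller holds a gateway, never a provider client.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Completion:
    """Provider-neutral result handed back to application code."""
    text: str
    model: str


class ChatBackend(Protocol):
    """Hypothetical interface every provider adapter would implement."""
    def complete(self, system: str, user: str) -> Completion: ...


class EchoBackend:
    """Stand-in adapter; a real one would wrap a vendor SDK call here."""
    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, system: str, user: str) -> Completion:
        return Completion(text=f"[{self.model}] {user}", model=self.model)


class ModelGateway:
    """The only object application code talks to."""
    def __init__(self, backend: ChatBackend) -> None:
        self._backend = backend

    def swap(self, backend: ChatBackend) -> None:
        # Changing providers is a one-line change here, not at every call site.
        self._backend = backend

    def ask(self, user: str, system: str = "You are a helpful assistant.") -> Completion:
        return self._backend.complete(system=system, user=user)


gateway = ModelGateway(EchoBackend("model-a"))
first = gateway.ask("Summarize this ticket.")
gateway.swap(EchoBackend("model-b"))
second = gateway.ask("Summarize this ticket.")  # same call, different backend
```

The call site is identical before and after the swap, which is the property the pattern is meant to buy.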
2. Prompt-as-Data, not Prompt-as-Code.
Prompts live in versioned files (YAML, JSON, Markdown), not hardcoded strings. This makes them reviewable, testable, and, most importantly, swappable when a new model wants a different format.
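One way this could look, as a sketch rather than a prescribed layout: the JSON schema below (`name`, `version`, `variants`), the model names, and the `render_prompt` helper are all assumptions made for illustration. The JSON is inlined so the example runs on its own; in practice it would sit in a versioned file next to the code.

```python
import json
from string import Template

# Assumed prompt file content. In a real project this would be loaded from
# a versioned file rather than embedded in the module.
PROMPT_FILE = """
{
  "name": "summarize",
  "version": 2,
  "variants": {
    "default": "Summarize the following text in $max_words words or fewer:\\n$text",
    "model-b": "<task>summarize</task><limit>$max_words</limit><input>$text</input>"
  }
}
"""


def render_prompt(raw: str, model: str, **values: object) -> str:
    """Pick the variant for `model` (or the default) and fill in placeholders."""
    spec = json.loads(raw)
    variants = spec["variants"]
    template = variants.get(model, variants["default"])
    # substitute() raises KeyError if a placeholder is missing, so a broken
    # prompt fails loudly instead of being sent half-filled.
    return Template(template).substitute(**values)


default_prompt = render_prompt(PROMPT_FILE, "model-a", max_words="50", text="hello")
model_b_prompt = render_prompt(PROMPT_FILE, "model-b", max_words="50", text="hello")
```

Adding a format for a new model then means adding a variant to the data file; the calling code does not change.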
3. Router / Dispatcher.
Don't send every request to the biggest model. Route simple classifications and summaries to small, cheap models; reserve top-tier models for reasoning-heavy work. The Material principle applies: use the right model for the load.
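A rough sketch of a dispatcher, assuming three task types and placeholder tier names (`small-model`, `large-model`, `long-context-model`). The length threshold is an arbitrary illustrative value, not a recommendation.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    CLASSIFY = "classify"
    SUMMARIZE = "summarize"
    REASON = "reason"


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


# Hypothetical tier names; a real table would name whatever models are in rotation.
ROUTES = {
    Task.CLASSIFY: "small-model",
    Task.SUMMARIZE: "small-model",
    Task.REASON: "large-model",
}

LONG_INPUT_CHARS = 20_000  # assumed cutoff, chosen only for the example


def route(task: Task, prompt: str) -> Route:
    """Choose a model tier from the task type and the input size."""
    if len(prompt) > LONG_INPUT_CHARS:
        return Route("long-context-model", "input exceeds the assumed length cutoff")
    return Route(ROUTES[task], f"default tier for {task.value}")
```

Because the routing table is plain data, re-pointing a tier at a newer or cheaper model is a one-line change.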
4. Golden-Path Evaluation.
A fixed set of prompts with known-good outputs, run nightly against whichever model is in rotation. Without this, you cannot confidently switch models. With it, you can.
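The harness can be very small. The sketch below assumes a model is any callable from prompt string to output string, and it expresses "known-good" as a predicate per case rather than an exact string match (an assumption, since exact matching is brittle across models). `GoldenCase`, `run_golden_set`, `safe_to_switch`, and the two stub models are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def run_golden_set(model: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Return the fraction of golden cases the model passes."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    return passed / len(cases)


def safe_to_switch(candidate, baseline, cases, tolerance: float = 0.0) -> bool:
    """A candidate is acceptable if it scores within `tolerance` of the baseline."""
    return run_golden_set(candidate, cases) >= run_golden_set(baseline, cases) - tolerance


GOLDEN = [
    GoldenCase("What is 2 + 2?", lambda out: "4" in out),
    GoldenCase("Capital of France?", lambda out: "paris" in out.lower()),
]


# Stub models standing in for real gateway calls.
def baseline_model(prompt: str) -> str:
    return {"What is 2 + 2?": "4", "Capital of France?": "Paris"}.get(prompt, "")


def candidate_model(prompt: str) -> str:
    return {"What is 2 + 2?": "The answer is 4."}.get(prompt, "I am not sure.")
```

Here `safe_to_switch(candidate_model, baseline_model, GOLDEN)` is false because the candidate misses one of the two cases, which is exactly the signal a nightly run is supposed to surface before a swap.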
5. Structured Output, Always.
Free-form text replies lock you in: the next model phrases things differently and your parser breaks. Ask for structured output (JSON, function calls, schema-validated responses) so the contract is the data, not the prose.
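As an illustration of treating the data as the contract, the sketch below validates a model reply against a small schema using only the standard library. The `TicketTriage` shape, the allowed categories, and the 1 to 5 priority range are invented for the example; a production system might use a schema library instead.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TicketTriage:
    category: str
    priority: int


ALLOWED_CATEGORIES = {"billing", "bug", "other"}  # assumed for the example


def parse_triage(raw: str) -> TicketTriage:
    """Parse and validate a model reply; raise ValueError on any contract breach."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    category = data.get("category")
    priority = data.get("priority")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int) or not 1 <= priority <= 5:
        raise ValueError(f"priority must be an integer from 1 to 5, got {priority!r}")
    return TicketTriage(category=category, priority=priority)


ok = parse_triage('{"category": "bug", "priority": 2}')
```

A reply such as "Sure! This looks like a bug." fails at the `json.loads` step, so a model that drifts into prose is caught at the boundary rather than deep inside application logic.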
6. Open-Weights Escape Hatch.
Wherever possible, ensure there is some open-weights model that can do the job, even if worse. This guarantees you are never held hostage by a vendor, and — separately — gives you an on-prem option the moment a customer asks for one.
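The escape hatch can be wired in as an ordered fallback chain. This is a sketch under assumptions: `ProviderUnavailable`, `complete_with_fallback`, and both stub backends are hypothetical, and the hosted backend simply simulates an outage. A real adapter would translate the vendor's own error types into the shared exception.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a backend adapter when its provider cannot serve the request."""


Backend = tuple[str, Callable[[str], str]]  # (name, callable from prompt to text)


def complete_with_fallback(prompt: str, backends: Sequence[Backend]) -> tuple[str, str]:
    """Try each backend in order; return (backend name, output) from the first that works."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderUnavailable("all backends failed: " + "; ".join(errors))


def hosted_model(prompt: str) -> str:
    raise ProviderUnavailable("simulated outage")  # stand-in for a vendor failure


def local_open_weights(prompt: str) -> str:
    return f"(local) {prompt}"  # stand-in for a self-hosted open-weights model


used, answer = complete_with_fallback(
    "Classify this ticket.",
    [("hosted", hosted_model), ("local-open-weights", local_open_weights)],
)
```

The same ordering list doubles as the on-prem configuration: a deployment that may not call hosted providers just omits them.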
"If swapping the model requires re-architecting, you didn't build an architecture. You built a shrine." — Burbridge
Failure Modes
- Prompt-leak. Code depends on phrasing that only one model reliably produces.
- SDK-lock. Application imports vendor SDK directly; abstracting later requires touching every call site.
- Hidden features. Using provider-only features (e.g., one vendor's cache or eval product) without an escape plan.
- No evals. Swapping is terrifying because no one knows whether the new model is worse. Fix: run the golden-path evaluation set against the candidate before every swap.
- "We'll abstract later." The same mistake as "we'll write the tests later." Both are promises made by people who will not be there when the bill comes.
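SDK-lock in particular can be caught mechanically. The sketch below is one possible guard, not an established tool: it uses the standard `ast` module to report vendor SDK imports in a piece of source code, so a CI check could fail when anything outside the gateway module imports them. The `VENDOR_SDKS` set and the `vendor_imports` helper are assumptions for illustration.

```python
import ast

# Top-level package names to fence off; extend as providers are added.
VENDOR_SDKS = {"openai", "anthropic"}


def vendor_imports(source: str) -> set[str]:
    """Return the vendor SDK packages imported anywhere in `source`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in VENDOR_SDKS:
                found.add(top_level)
    return found


coupled = vendor_imports("import openai\nfrom anthropic import Anthropic\n")
clean = vendor_imports("from gateway import ModelGateway\n")
```

Run over every application file except the gateway's own adapters, a non-empty result is an early warning that a call site has bypassed the abstraction.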
Burbridge's Approach
Every LLM-backed system Burbridge ships assumes the model will be swapped at least twice before retirement. The first swap is usually a cost optimization; the second is usually a capability upgrade. If either is painful, the system was built wrong.
See also: Open Source Agents (the current state of frameworks that make this easier), and Aggressive Craftsmanship (the philosophical parent).
See Also
Software · Open Source Agents · Aggressive Craftsmanship · The Material · Foundation · The One Ring
References
- Databricks. (2026). "Agent system design patterns." Agent Framework documentation.
- Besiroglu, Z. (2026). "Architecture Patterns for LLM Systems." Medium, March 2026.
- AImultiple. (2026). "LLM Orchestration in 2026: Top 22 frameworks and gateways."