The 5 layers of an agentic AI stack

"We have one agent working. How do we scale it to ten?"

That's the question a fractional CTO sent me last week, and it's the question that lands in my inbox in some form every few days now. The honest answer is that you can't scale from one agent to ten by adding nine more agents. You scale by building a stack underneath them. And most teams that get stuck at one agent are stuck because they're missing two or three layers of that stack and don't know it yet.

Here's the model we use. Five layers. Each one named by what it does, not by what category of tool it is. With a real tool example at each layer and a one-line diagnostic for what breaks if you skip it.

Why "the agent stack" diagrams you've seen so far don't help you ship

You've seen the diagrams. LangChain pyramid. a16z pyramid. The one where "agent layer" sits on top of "model layer" on top of "data layer" on top of "infra layer." They're pretty. They tell you nothing about what to build first or what breaks if you skip a layer.

Look, I've drawn those pyramids too. They're not wrong, they're just decorative. They name categories, not jobs. "Data layer" is a category. "The place an agent reads what it did yesterday so it doesn't repeat itself today" is a job. The second framing is the one you can build against.

So here's a stack named by jobs. Five layers. Each layer answers a specific question. If the layer is missing, the question doesn't get answered, and your agent system breaks in the specific way that question predicts.

Layer 1: Model, "what thinks"

The model layer is the LLM your agents call. Claude 3.7 Sonnet, GPT-5, Gemini 2.5, whichever frontier model you're paying for. This is the layer everyone builds first and the layer everyone over-invests in.

Concrete: at OperatorIQ we use Claude Sonnet for drafting work, Claude Opus for strategic decisions, Haiku for high-volume tail tasks like syndication formatting and link checking. The split saves us roughly 40% on monthly spend versus running everything on the top-tier model.
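A split like this can live in a small routing table rather than in anyone's head. Here's a minimal sketch, assuming task types are plain strings. The task names and the pick_tier helper are made up for illustration, and the tier values are labels, not real API model IDs.

```python
from enum import Enum


class Tier(Enum):
    """Model tiers, cheapest to most expensive. Labels only, not API model IDs."""
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Hypothetical task categories, loosely following the split described above.
ROUTES = {
    "link_check": Tier.HAIKU,
    "syndication_format": Tier.HAIKU,
    "draft": Tier.SONNET,
    "strategy": Tier.OPUS,
}


def pick_tier(task_type: str, default: Tier = Tier.SONNET) -> Tier:
    """Return the tier for a task type; unknown tasks fall back to the mid tier."""
    return ROUTES.get(task_type, default)
```

Defaulting unknown work to the middle tier instead of the top one is a deliberate choice in this sketch: a task has to earn its way onto the expensive model.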

What breaks if this layer is missing: nothing thinks. Obvious. Nobody skips this one.

What breaks if this layer is over-built: you spend $4K a month on Opus calls when 70% of the work could've run on Sonnet or Haiku. We see this in nearly every audit we run. Teams pick the most expensive model "to be safe" and burn budget that should've gone to layers 2 and 3.

The trap on this layer is that it's the easy layer. You add an API key and you're done. So teams keep adding things to this layer (more prompts, longer system messages, more tools wired directly to the model) instead of moving up the stack.

Layer 2: Memory, "what the agent remembers"

The memory layer is what lets an agent know what it did yesterday, what it was told last week, and what the other agents on the team are doing right now. This is the layer that turns a script into a worker.

There are three kinds of memory you actually need.

  1. Episodic memory. What this specific agent did, in order, with timestamps. A runs.jsonl file or a Postgres table works. The agent appends to it after every cycle. Tomorrow's run reads yesterday's tail.
  2. Shared state. What every agent on the team is doing right now. A single state.json file or a small Postgres row per agent. Every agent reads it on cycle start and writes to it on cycle end.
  3. Reference memory. Long-running facts that don't change cycle to cycle. Customer ICP, voice profile, brand rules, the company's actual offerings. Markdown files on disk are fine for this; you don't need a vector DB until you actually need one.
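The first two kinds fit in a few functions. This is a rough sketch of the file-based version, assuming one runs.jsonl per agent and one shared state.json. The function names and record fields are illustrative, not a fixed schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_run(log_path: Path, agent: str, summary: str) -> dict:
    """Append one episodic record as a single JSON line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "summary": summary,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_tail(log_path: Path, n: int = 20) -> list[dict]:
    """Return the last n records, oldest first; empty list if no log exists yet."""
    if not log_path.exists():
        return []
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines[-n:] if line.strip()]


def update_state(state_path: Path, agent: str, status: str) -> dict:
    """Record what one agent is doing in the shared state file."""
    state = {}
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    state[agent] = {
        "status": status,
        "updated": datetime.now(timezone.utc).isoformat(),
    }
    # Write to a temp file, then swap it in, so readers never see a partial file.
    tmp = state_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(state_path)
    return state
```

One caveat: update_state is a read-modify-write, so two agents writing at the same instant can overwrite each other. That race is a good reason to move shared state to a Postgres row per agent once cycles start to overlap.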

Concrete: our memory layer is Postgres for episodic plus shared state, plus a folder of markdown files for reference memory. Total cost: $20/month for the Postgres instance on Supabase. We don't use a vector DB at all. We tried Pinecone, removed it after a month, never missed it.

What breaks if this layer is missing: the agent forgets what it did yesterday. It re-emails the same prospect. It re-publishes the same post. It re-runs the same migration. We've watched teams ship "agentic" systems with no memory layer and they look magical for a week, then start hallucinating their own work history.

The trap on this layer is reaching for vector DBs first. You probably don't need one. Episodic plus state plus reference covers 95% of cases. Add a vector DB the day you actually can't find what you're looking for in episodic memory by date or by tag.

Layer 3: Orchestration, "who runs what when"

The orchestration layer decides which agent runs, in what order, on what schedule, and what happens when an agent fails. This is the layer most teams skip and the layer that determines whether you have a team or a pile of scripts.

There are two flavors of orchestration to think about.

  1. Time-based. Agent X runs every morning at 06:00 ET. Agent Y runs every 30 minutes during business hours. A scheduler does this. Windows Task Scheduler, cron, GitHub Actions schedules, or n8n schedule nodes all work. Pick one and stick with it.
  2. Event-based. Agent Y runs whenever Agent X drops a file in a specific folder. Or whenever a webhook fires. Or whenever a row appears in a queue table. This is the part that turns a schedule into a system.

Concrete: our orchestration runs on Windows Task Scheduler (WTS) for time-based triggers (it calls Python scripts that fire the agents) plus a TRIGGER_*.md file convention for event-based handoffs (one agent writes the trigger, the next agent's cycle reads and consumes it). Total cost: zero. WTS is free, the trigger files are markdown.
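The consuming side of a trigger-file convention can be very small. A minimal sketch, assuming triggers land in an inbox folder and get moved to a done folder once claimed. The folder layout and the consume_triggers name are assumptions for illustration, not a description of our exact scripts.

```python
from pathlib import Path


def consume_triggers(inbox: Path, done: Path) -> list[tuple[str, str]]:
    """Claim every TRIGGER_*.md file in inbox and return (name, body) pairs.

    Each file is moved into `done` before it is read, so a repeated cycle
    does not process the same trigger twice.
    """
    done.mkdir(parents=True, exist_ok=True)
    claimed = []
    for path in sorted(inbox.glob("TRIGGER_*.md")):
        target = done / path.name
        try:
            path.replace(target)
        except FileNotFoundError:
            continue  # the file vanished between listing and moving; skip it
        claimed.append((target.name, target.read_text(encoding="utf-8")))
    return claimed
```

The downstream agent calls this at the start of its cycle and does nothing if the list comes back empty. Keeping consumed triggers in a done folder instead of deleting them also leaves you a cheap audit trail of every handoff.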

What breaks if this layer is missing: you become the orchestrator. You're the one deciding which agent to run next. The whole "autonomous" promise of agentic AI collapses because every cycle is gated on you opening a terminal and typing python run_agent.py.

The trap on this layer is reaching for Kubernetes or a workflow engine before you have three agents. You don't need it yet. WTS plus a folder of trigger files will take you past 10 agents before you outgrow it. We're at 17 and we still haven't needed anything heavier.

Layer 4: Tooling, "what the agent can touch"

The tooling layer is the set of integrations the agent has authority to call. Send an email. Push a commit. Update a CRM record. Pay an invoice. Edit a config file. Each tool is an action the agent is allowed to take in the world, with an authority envelope around how much it can do without a human in the loop.

Concrete: our agents touch Gmail (via IMAP/SMTP), GitHub (via the gh CLI), Stripe (via the Stripe Python SDK), Apollo (via the Apollo MCP), HubSpot (via the HubSpot MCP), Substack (via Playwright, no public API), and a dozen markdown files on disk. Each integration has a hard cap on what the agent can do without escalating. Outreach Closer can send emails up to a daily quota; past the quota, it queues for the founder.

What breaks if this layer is missing: the agent thinks but can't act. You get a system that drafts emails and never sends them, that recommends commits and never pushes them. The promised value of "the agent does the work" never lands because there's a human in the loop on every single output.

What breaks if this layer is over-built without a verification layer above it: the agent ships things you didn't want shipped. It emails the wrong list. It commits the wrong branch. It pays the wrong invoice. This is the failure mode that gets agentic projects shut down. The fix isn't "give the agent less tooling," it's "build layer 5 alongside layer 4, not after it."

The trap on this layer is wiring tools directly to the model with no envelope. Every tool needs a quota, a scope, and an escalation path. "Can send emails" is wrong. "Can send up to 25 emails per day to prospects in segment X, with subject lines that pass the linter, escalates anything that fails the linter" is right.
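An envelope like that is only a few lines of code in front of the tool call. Here's a hedged sketch, assuming a single send-email tool. The Envelope class, its field names, and the reason strings are all hypothetical, and the daily reset of the counter is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Envelope:
    """Authority envelope for one tool: a quota, a scope, and an escalation path."""
    daily_quota: int
    allowed_segments: set[str]
    lint: Callable[[str], bool]  # returns True when the subject line passes
    used_today: int = 0
    escalated: list[dict] = field(default_factory=list)

    def request_send(self, segment: str, subject: str) -> str:
        """Return "send" if the action is inside the envelope, else "escalate"."""
        reason = None
        if segment not in self.allowed_segments:
            reason = "out of scope"
        elif not self.lint(subject):
            reason = "failed linter"
        elif self.used_today >= self.daily_quota:
            reason = "quota reached"
        if reason:
            # Nothing is dropped silently: every refusal lands in a human queue.
            self.escalated.append(
                {"segment": segment, "subject": subject, "reason": reason}
            )
            return "escalate"
        self.used_today += 1
        return "send"
```

The agent never calls the email tool directly. It asks the envelope, and only a "send" answer reaches the integration. Everything else waits in the escalated list for a person to review.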

Layer 5: Verification, "what catches the mistakes"

The verification layer reads the work the other agents produced and checks it before anything ships externally. This is the layer that nobody draws on the pretty pyramid diagrams and the layer that determines whether the system is safe to leave running overnight.

There are three kinds of verification you actually need.

  1. Output linting. A rules engine that checks every customer-facing draft for banned phrases, format violations, missing CTAs, broken links. Cheap. Runs in seconds. Catches 80% of issues before they ship.
  2. Cross-agent challenge. A second agent reads the first agent's output and disagrees in writing when something's off. This is the check that catches the 15% the linter misses. It costs real model spend, so use it on high-stakes outputs (outreach copy, financial decisions, public posts), not on every artifact.
  3. Reality check. A scheduled pass that takes the agent's claims ("I emailed 25 prospects yesterday") and verifies them against the source of truth ("did 25 messages actually leave the Gmail sent folder?"). This catches the 5% of cases where the agent lies about its own work.
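The first of those three is the cheapest to start on. A minimal output linter might look like the sketch below, assuming drafts are plain strings. The banned phrases and the CTA pattern are placeholder rules, not a real rule set.

```python
import re

# Hypothetical rule set; swap in your own banned phrases.
BANNED_PHRASES = ("synergy", "game-changer", "circle back")
EM_DASH = "\u2014"
# Placeholder CTA rule: the draft must contain a link or ask for a reply.
CTA_PATTERN = re.compile(r"https?://\S+|\breply\b", re.IGNORECASE)


def lint_draft(text: str) -> list[str]:
    """Return a list of rule violations; an empty list means the draft passes."""
    problems = []
    lowered = text.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            problems.append(f"banned phrase: {phrase}")
    if EM_DASH in text:
        problems.append("contains an em-dash")
    if not CTA_PATTERN.search(text):
        problems.append("missing CTA (no link or reply prompt)")
    return problems
```

Returning the list of problems instead of a bare pass/fail matters more than it looks. The drafting agent can read the violations and fix its own draft before a human ever sees it.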

Concrete: our verification layer is a Python linter for output rules (banned phrases, em-dashes, format), a QA sub-agent that reads outreach drafts before they queue for approval, and a daily verification cycle that reconciles claimed work against actual outputs. Total cost: ~$30/month in model spend for the QA sub-agent, zero for the linter.

What breaks if this layer is missing: the agent ships work that's wrong, and the first time you find out is when a customer points it out. Or worse, when nobody ever points it out and the wrongness compounds for months. We had this happen in month two. An agent claimed it had emailed 14 leads. It had drafted 14 emails and sent zero. We caught it because the Analyst agent's daily reconciliation flagged the gap. Without that reconciliation, we'd have thought we had 14 conversations in flight when we had zero.
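A reconciliation like that is a set comparison at heart. Here's a sketch, assuming you can get two sets of identifiers: what the agent logged as sent, and what the mailbox shows as sent. How you pull those IDs from your log and your mail provider is up to you; the reconcile function and its field names are illustrative.

```python
def reconcile(claimed_ids: set[str], sent_ids: set[str]) -> dict:
    """Compare what an agent says it sent against what the mailbox shows."""
    return {
        "claimed": len(claimed_ids),
        "verified": len(claimed_ids & sent_ids),
        "missing": sorted(claimed_ids - sent_ids),    # claimed but never sent
        "unclaimed": sorted(sent_ids - claimed_ids),  # sent but never logged
    }


# The month-two incident, replayed: 14 claimed sends, nothing in the sent folder.
report = reconcile({f"lead-{i}" for i in range(14)}, set())
needs_attention = bool(report["missing"] or report["unclaimed"])
```

Any non-empty missing or unclaimed list is worth a flag. The first means the agent overstated its work. The second means something left the building without being logged.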

The trap on this layer is treating verification as a feature you'll add later. You won't. Add it from day one, even if it's just a 50-line linter.

The build order

Most teams build their agent stack in the wrong order. Here's the order we recommend and the reason.

  1. Model. You need the LLM to do anything. Pick one, get an API key, move on.
  2. Memory. Build this before you build your second agent. If your first agent has no memory, your second agent will inherit the same hole and the bug will compound.
  3. Orchestration. Build this when you have two agents. A schedule and an event trigger file convention. Don't reach for Kubernetes.
  4. Tooling. Wire integrations one at a time. Each one with an authority envelope.
  5. Verification. Build the linter alongside Layer 4. Build the QA sub-agent when the first piece of agent-shipped work goes external.

The order people actually build in is usually model, then tools, then more model, then more tools, then they realize they need memory, then they realize they need orchestration, then they never quite get to verification. That's the path to a one-agent demo that never becomes a system.

What a working stack looks like, all five layers, in one paragraph

Claude Sonnet drafts the work (Layer 1). Postgres plus markdown files on disk hold what every agent did yesterday and what they're doing today (Layer 2). Windows Task Scheduler fires the cycles and trigger files chain the handoffs (Layer 3). Each agent has one to five integrations wired with quotas and escalation paths (Layer 4). A linter and a QA sub-agent catch mistakes before anything ships externally (Layer 5). Total monthly infrastructure cost for a 17-agent system: under $200, mostly model spend.

That's the whole stack. No vector DB, no LangChain, no Kubernetes, no purpose-built "agent framework." Boring tools, named jobs, layered cleanly.

If you want this built for your business inside seven days, we ship it as a productized service. The blueprint, the agents, the verification layer, the lot. See our blueprints for what we ship and what it costs.

What to read next

If you got value from this, the cornerstone post in this series is "What is an agentic-AI-first business?" It's the definition piece that anchors everything else we write about. The companion to this stack post is the org chart of an agentic-AI-first company, which is the same model from the human-organization angle instead of the infrastructure angle.

Coming next in this series: the agentic maturity model, how to tell where your company is on the spectrum from "copilot" to "colleague," with the move you make at each stage.

If you want to talk about your stack, email christine@operatoriq.io. Tell me which layer you're missing. I'll tell you what to do about it.

Cheers, Christine