Why Production AI Fails (And It's Not the AI)

Written by Mary Kaplan | April 21, 2026

"Our agents work great in demos. They fall apart in production."

Our CEO, Eric Barroca, just published a deep technical analysis that finally explains why. And honestly? It's changed how I talk about what we're building at Vertesia.

Read Eric's full article here →

The demo-to-production gap

Here's what we've learned at Vertesia: almost every AI agent—whether you're using CrewAI, LangGraph, or building from scratch—runs the same basic loop at its core:

The model reasons about what to do next
It selects a tool to use
The tool executes
The result goes back to the model
Repeat until done

That loop? It's not the problem. In fact, it works remarkably well.

The problem is what happens when you go from processing a few documents to processing thousands. When you go from one agent to hundreds running concurrently. When your model provider's capacity fluctuates unpredictably throughout the day.

That's when the system around the loop starts to fail.

What enterprise AI needs

Eric calls the infrastructure around the reasoning loop "the harness." After watching our customers navigate these challenges, I can't think of a better term.

Here's what enterprises are telling us they actually need:

Reliability under failure
- When an agent has been running for three hours analyzing contracts and the process crashes, you can't start over from scratch. You need durable execution that survives failures and picks up where it left off.
- This is why Vertesia uses Temporal under the hood—it's infrastructure-grade reliability for AI workloads.
Intelligent resource management
- Model providers don't publish their capacity limits, and those limits change throughout the day. Basic retry logic isn't enough. You need a system that understands dynamic capacity constraints and manages throughput intelligently—like air traffic control for API calls.
Production-grade document processing
- Here's something that doesn't get talked about enough: bad document extraction silently kills reasoning quality. If your PDF parser misses tables, loses headers, or mangles formatting, even the best LLM can't compensate.
- At Vertesia, we treat document processing as critical infrastructure—semantic chunking, embeddings, versioning, and indexing are first-class concerns, not afterthoughts.
Tools that actually work together
- We talk to teams that have built 50+ custom tools for their agents. Then they tell me their agents keep picking the wrong tool or getting confused by inconsistent outputs.
- Quality beats quantity. We organize tools into skills that activate on-demand, with consistent formatting, error handling, and context protection. The agent gets exactly what it needs, when it needs it—nothing more.
Real-world event handling
- Production work isn't linear. Users upload new files mid-run. Human approvals are required before proceeding. Supervisors need to redirect agents based on interim results.
- Your system needs to listen and respond to events while working, not operate in a sealed bubble.
Memory that compounds
- Beyond storing chat history, production systems need searchable content, reusable artifacts, and the ability to learn from past runs. Intelligence should accumulate over time, not reset with every conversation.

The controversial take: process control matters

Here's where Eric's perspective really resonated with me, because I hear this pain point constantly:

Not every business workflow should be an unconstrained conversational loop.

Sometimes you need deterministic control—typed state, explicit transitions, human approval gates, retry logic, and auditability. But you still want agents to handle the genuinely open-ended parts.

This hybrid approach is exactly what our Process Engine does. The engine provides the guardrails and control panel. Agents operate as bounded workers within that structure. You get the best of both worlds: reliability where you need it, intelligence where it matters.

The technical deep dive

For the complete technical perspective, read Eric's full analysis:

Agent Reasoning Is Just a While Loop. Production Intelligence Is Everything Around It.

It's the first in a series exploring what it really takes to run AI agents in production. If you're building or buying AI infrastructure, reading it is worth your time.

View full post