Vertesia CEO and co-founder Eric Barroca recently joined GenAI Global, the weekly podcast hosted by MIT Professor John R. Williams and Dr. Abel Sanchez, two of the most respected voices in applied AI and enterprise systems. The show explores the rapidly evolving landscape of generative AI (GenAI) and agentic systems, with a strong focus on real-world deployment, governance, and the technical decisions that determine whether AI succeeds or stalls inside organizations.
Before Eric joined the conversation, Abel and John were discussing some of the emerging pressure points facing enterprises as they adopt generative AI: the hidden challenges of legacy systems, and why 95% of AI pilots fail despite enormous investment. They touched on everything from cloud migration and system fragmentation to the surprising complexity of identity, permissions, and data quality—recurring themes across their enterprise consulting and research at MIT.
Given that backdrop, they turned to Eric to dive into a topic the two hosts have been increasingly focused on: content—how companies manage it, how AI agents consume it, and why content preparation is becoming one of the most important building blocks for enterprise AI.
With Eric’s 20+ years in enterprise content and digital asset management (including as former CEO of Nuxeo), he offered a rare, deeply technical perspective on how AI agents actually interact with enterprise content and why foundational elements like data structure and permissions determine whether agents behave safely and correctly.
Building on that expertise, the three turned to some of the most important—and most misunderstood—questions enterprises face when adopting agentic AI, including: what a context window really means in enterprise work, how to keep agents within a user's permissions, how identity federation makes that control scalable, how to make ERP and legacy data usable for agents, which kinds of content need preparation, and how to choose the right model for each task.
Below is the full Q&A, edited for clarity and optimized for readers looking to understand the mechanics of enterprise agentic AI.
Q1. When companies start building AI agents to actually do work, they keep hearing about “context windows.” What does that mean in a real enterprise environment, and why does it matter?
A: A context window is basically the model’s short-term memory — the amount of information it can actively hold in its “mind” while it’s reasoning. It sounds big on paper; many models today allow 200k or 300k tokens, which is the equivalent of a few hundred pages. But in real enterprise work, that fills up fast. Once you start loading contracts, slide decks, spreadsheets, or images, that window gets saturated — and when it does, the quality of the model’s output drops sharply.
That’s why managing context is a huge part of making agents truly useful. Life is a large context, work is a large context, and business processes are constantly changing. So you need a way to load the right information, remove what’s no longer relevant, and structure everything so the model can stay efficient inside a fixed memory limit.
This is where real software engineering still matters. You can’t just dump everything into the model and hope for the best. You need patterns and tools that help the agent decide what to keep in memory, when to refresh it, and how to package content so the model can process it cleanly.
If you don’t do that, you get exactly the problems people complain about — hallucinations, incomplete outputs, or agents that get confused because there’s simply too much noise in the window. Managing the context window carefully is essential to keeping outputs high-quality and making sure the model actually completes the task you gave it.
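To make that concrete, here is a minimal sketch of one context-budget pattern: score each candidate piece of content for relevance and pack items into the prompt until a fixed token budget is reached. The names, scoring, and the rough 4-characters-per-token estimate are assumptions for illustration, not a specific product's implementation; a real system would use the model's tokenizer and summarize dropped items rather than simply discard them.

```python
# Minimal sketch of a context-budget pattern (illustrative only, not a
# specific product's implementation). Token counts are estimated at roughly
# 4 characters per token; a real system would use the model's tokenizer.
from dataclasses import dataclass

@dataclass
class ContextItem:
    name: str
    text: str
    relevance: float  # 0.0-1.0, e.g. from a retrieval score or recency

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def pack_context(items: list[ContextItem], budget_tokens: int) -> list[ContextItem]:
    """Keep the most relevant items that fit in the budget; anything that
    doesn't fit is left out (or summarized upstream before retrying)."""
    selected, used = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost <= budget_tokens:
            selected.append(item)
            used += cost
    return selected

if __name__ == "__main__":
    items = [
        ContextItem("task_brief.md", "..." * 500, relevance=1.0),
        ContextItem("contract.pdf", "..." * 5000, relevance=0.9),
        ContextItem("old_email_thread.txt", "..." * 8000, relevance=0.2),
    ]
    for kept in pack_context(items, budget_tokens=8000):
        print("keeping:", kept.name)  # the low-relevance email thread is dropped
```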
Q2. In a company, not everyone has the same level of access to data or systems. How do you keep AI agents from overstepping—especially when LLMs can “see” everything you give them?
A: This is one of the biggest—and least talked about—challenges of deploying agents inside an enterprise. Once you solve the AI part, the real issue becomes access: who is allowed to see what, and how do you keep an agent from reaching into places a human in that same role would never be allowed to touch?
In a company, access is never uniform. If you’re an accountant level one, you don’t see what an accountant level ten sees. A project manager may only access documents for their project—but not every document in the department. Finance sees different things than marketing. And the “need to know / right to know” rules are extremely complex, especially in large organizations.
The problem is that LLMs don’t grow up in a world of permission tables. If you give them access to everything, they will use everything. And unlike a human, the model will not “politely ignore” information it shouldn’t touch. If it sees a way to complete the task by grabbing a sensitive file, it will do it—because there’s no innate restraint mechanism.
So the only safe way is to enforce permissions at the query level. That’s how we do it. Every agent runs as the user—not as an admin, not as a system superuser—and we propagate that identity end-to-end. If the agent queries Salesforce, it does so with the exact user permissions of the real person. Same for SharePoint, Box, internal tools, everything.
This is critical, because if you don’t enforce permissions at the moment the model tries to fetch information, the model will find data that a human wouldn’t look for. And once that information is inside its context window, you can’t tell the model “pretend you didn’t see that.” It’s too late. It will process it.
So real enterprise AI requires the same strict access controls we use for humans—just applied to models. You have to restrict every tool call, every query, and every interaction to the permission scope of the user asking the agent to do the task. That’s the only way to prevent leakage and keep the system safe.
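As a rough illustration of what enforcing permissions at the query level can look like, here is a sketch of a search tool the agent calls instead of hitting the data store directly. The user, role, and document fields are hypothetical, not a specific vendor's API; the point is that filtering happens before anything can enter the model's context.

```python
# Illustrative sketch of query-level permission enforcement (hypothetical data
# model, not a specific vendor's API). The agent never searches the store
# directly; it calls this tool, which carries the requesting user's identity.
from dataclasses import dataclass

@dataclass
class User:
    user_id: str
    roles: set[str]
    projects: set[str]

@dataclass
class Document:
    doc_id: str
    project: str
    required_role: str
    text: str

def search_documents(user: User, query: str, store: list[Document]) -> list[Document]:
    """Apply the same need-to-know rules a human in this role would face,
    before any result can reach the model's context window."""
    visible = [
        d for d in store
        if d.project in user.projects and d.required_role in user.roles
    ]
    return [d for d in visible if query.lower() in d.text.lower()]
```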
Q3. From a design perspective, when you say agents “run with an identity,” it sounds like a scalable way to control access. Is that how you think about it, especially given all the differences between cloud identity systems?
A: Yes — that’s exactly the model. We use something called workload and workforce identity federation, which essentially lets an agent operate on behalf of the user instead of as its own superuser. So when an agent accesses Azure, Office 365, SharePoint, Google, or AWS, it does so carrying the user’s identity.
Two things happen as a result. First, the agent never needs broad, standing credentials of its own. Second, when the model calls Salesforce or SharePoint or any SaaS app, it does so with the exact permissions of the user who initiated the task. That lets us restrict what the agent can query or fetch, and it keeps the enterprise’s existing access-control model intact.
This works well in modern cloud environments that support identity federation. The real complexity shows up when you hit legacy systems — the ones still using old-school usernames and passwords. For those, you need proxies to exchange credentials and enforce permission boundaries, because one thing is absolutely certain:
If the model can see something, it will use it.
There’s no way to tell an LLM “ignore this sensitive file” once it’s already in its context — that’s not how these systems work. So every single system the agent touches must enforce the user’s permissions at the moment of access.
This identity-based approach doesn’t make the problem simple. Identity and permissions have been among the hardest problems in enterprise software for decades, and AI doesn’t solve that problem; it actually complicates it. But solving it is what makes enterprise-grade AI deployment possible. Agents must respect the same boundaries as humans, and that only works when every tool call and every query carries the authenticated identity of the person behind it.
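For readers who want to picture the mechanics, here is a hedged sketch of the on-behalf-of pattern that workload and workforce identity federation enables: the user's token is exchanged for a short-lived, narrowly scoped credential before each downstream call. The endpoint URLs and audience values are placeholders; each identity provider (Azure, Google, AWS) has its own federation API, so treat this as a pattern, not a drop-in integration.

```python
# Hedged sketch of the "agent runs as the user" flow via token exchange
# (the on-behalf-of pattern). URLs, audiences, and response fields below are
# placeholders, not a real provider's API.
import requests

TOKEN_EXCHANGE_URL = "https://idp.example.com/oauth2/token"  # placeholder IdP

def exchange_for_downstream_token(user_access_token: str, audience: str) -> str:
    """Trade the user's token for a short-lived token scoped to one downstream
    system, so the agent never holds broad, standing credentials of its own."""
    resp = requests.post(
        TOKEN_EXCHANGE_URL,
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
            "subject_token": user_access_token,
            "audience": audience,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

def call_sharepoint_as_user(user_access_token: str, query: str) -> dict:
    """Every downstream call carries a credential derived from the user's
    identity, so the target system applies that user's permissions."""
    token = exchange_for_downstream_token(user_access_token, audience="sharepoint")
    resp = requests.get(
        "https://sharepoint.example.com/api/search",  # placeholder endpoint
        headers={"Authorization": f"Bearer {token}"},
        params={"q": query},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```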
Q4. When companies try to give AI agents access to systems like ERPs, the data is often messy, siloed, or locked behind layers of permissions. Many teams just export everything into BigQuery or a data lake. But how do you actually make that data usable for an agent?
A: In most companies, you can’t use ERP or legacy system data “as is.” You still have to prepare the data so the agent can work with it, because LLMs don’t operate like relational databases and they can’t reason across raw tables and joins the way SAP or Oracle does.
That’s why the first step is always creating tools that act as the interface between the model and the system. The model never directly queries SAP. Instead, a tool impersonates the user, enforces permissions, selects the right slice of data, cleans it, and passes only what’s needed into the model. That layer is what makes the system safe and usable.
But you also have to prepare the data itself. You can’t dump 20 ERP tables into the model’s context window and expect it to figure out the joins. It won’t. So you need to flatten the data — remove complex relationships, denormalize it, and put it into a form that is natural for the model to consume.
This is what people often describe as denormalization, and it’s exactly right. For 50 years we engineered data for software — normalized tables, foreign keys, lookups, schemas optimized for compute. Now we have to reverse that and structure data in a way that models can understand, which happens to be the way humans understand it.
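Here is a tiny, self-contained example of that flattening step, using made-up order, customer, and product records: the tool resolves the joins up front and hands the model one plain-language record per row instead of normalized tables.

```python
# Minimal sketch of denormalizing relational data for a model (illustrative
# only, with made-up records). The tool joins the tables up front and emits
# flat, human-readable records instead of foreign keys and lookups.
orders = [{"order_id": 1001, "customer_id": 7, "product_id": 42, "qty": 3}]
customers = {7: {"name": "Acme Corp", "region": "EMEA"}}
products = {42: {"name": "Industrial Valve", "unit_price": 129.50}}

def flatten_order(order: dict) -> str:
    customer = customers[order["customer_id"]]
    product = products[order["product_id"]]
    total = order["qty"] * product["unit_price"]
    # One self-contained sentence per record: nothing left for the model to join.
    return (
        f"Order {order['order_id']}: {customer['name']} ({customer['region']}) "
        f"bought {order['qty']} x {product['name']} at ${product['unit_price']:.2f} "
        f"each, total ${total:.2f}."
    )

print(flatten_order(orders[0]))
# Order 1001: Acme Corp (EMEA) bought 3 x Industrial Valve at $129.50 each, total $388.50.
```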
Q5. When you think about preparing data for agents, it’s not just ERP tables or CSVs. What kinds of content do enterprises actually need to process so that models can use them effectively?
A: It’s essentially everything a company produces or touches. ERP data is the easy example, but in reality, enterprises run on an enormous amount of free-flowing content: reports, contracts, post-mortems, presentations, policies, spreadsheets, images, videos — all the things people create and read every day.
Take something like an insurance claim. You don’t just have a single form. You have supporting documents, photos of the damage, maybe videos, maybe scanned PDFs — all kinds of material the agent needs to understand. Or think about a 300-page PDF with embedded charts and images. That’s extremely common. A 200-page report full of pictures? Also common. And the model can’t simply “read” any of that out of the box.
So you have to transform this content into something the model can understand. For text-heavy pages, that means parsing and structuring the text. For images or graphs, it means generating descriptions or converting visual information into data. If you convert an annual report, for example, the images and charts must be described or the model will miss half the meaning — and numbers without charts don’t tell the story.
This is true across industries. Creative teams produce tons of images. Financial services produce long reports. Healthcare has scanned documents and forms. Manufacturing has photos and diagrams. More and more, we’re seeing video and audio — recordings, transcripts, walkthroughs.
Under the hood, all systems look similar: there’s an API to call or a database to query. But you can’t feed the raw structure directly to a model. SAP isn’t going to work if you dump raw tables. Salesforce isn’t going to work if you just export JSON. Every one of these systems needs a “UX for the model” — a transformed, human-readable version of the data.
That’s the pattern. Whether it’s content, images, video, or structured system data, you have to prepare it in a form that a model can consume naturally. If it’s natural for a human, it’s natural for the model. And that’s what unlocks all these different content types inside an enterprise.
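A sketch of that preparation pipeline is below. The functions `extract_pages` and `describe_image` are placeholders for a real document parser and a vision-capable model call, not specific library functions; what matters is the shape of the output: page text interleaved with written descriptions of every chart or photo, so nothing visual is silently lost.

```python
# Hedged sketch of a content-preparation step: each page becomes model-ready
# text, and embedded images/charts are replaced by generated descriptions.
# extract_pages and describe_image are placeholders, not real library calls.
from dataclasses import dataclass

@dataclass
class Page:
    number: int
    text: str
    images: list[bytes]

def extract_pages(path: str) -> list[Page]:
    raise NotImplementedError("plug in a real PDF/Office parser here")

def describe_image(image: bytes) -> str:
    raise NotImplementedError("plug in a vision-capable model call here")

def prepare_document(path: str) -> str:
    """Produce a single text representation in which charts and photos are
    described in words, so the model doesn't miss half the meaning."""
    chunks = []
    for page in extract_pages(path):
        chunks.append(f"[Page {page.number}]\n{page.text.strip()}")
        for i, image in enumerate(page.images, start=1):
            chunks.append(f"[Page {page.number}, figure {i}] {describe_image(image)}")
    return "\n\n".join(chunks)
```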
Q6. Earlier this week, you mentioned that different AI models have different strengths. Can you walk us through how you choose which model to use for which kind of task — and what differences you’ve observed?
A: Models aren’t interchangeable. They have very different strengths, different price points, and even different behaviors depending on where they’re hosted. That’s why you need to match the model to the task instead of assuming one model can do everything.
For example, when we need extraction — meaning you have a lot of content and you just want the gist or structured pieces out of it — we mostly use Gemini Flash. Flash is fast, cheap, and very good at pulling information out of documents.
When you need advanced reasoning, we observe that Claude performs much better. But here’s where it gets interesting: Claude doesn’t behave the same way everywhere. We recently ran a controlled test using a complex agent with long-running logic and many loops. Everything was identical — same prompt, same tools, same context — but Claude running on Google Vertex AI behaved differently from Claude running on AWS.
Not just slightly different outputs — different behavior.
We don’t know exactly why, because we don’t see system prompts or deployment parameters. But the infrastructure is different: Google uses TPUs, Amazon uses GPUs. That alone can affect everything from latency to context handling to how the model processes long sequences. So the environment absolutely matters.
We also use GPT-5 for reasoning tasks and it performs well. But the larger point is:
You have to test. These models have different capabilities, different costs, and different trade-offs. Even the same model from the same company can behave differently depending on which cloud you run it on. You can’t assume uniformity — you have to choose deliberately based on what you’re trying to do.
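As a closing illustration, here is what a simple task-to-model routing table might look like, using the examples from the conversation. The mapping itself is an assumption for illustration, not a recommendation; as Eric stresses, the only reliable way to fill it in is to test each model, on each cloud, against your own workload.

```python
# Illustrative model-routing sketch based on the pattern described above:
# cheap, fast models for extraction; stronger reasoning models for complex,
# long-running agent logic. Names mirror the conversation, not a benchmark.
ROUTING_TABLE = {
    "extraction": "gemini-flash",  # fast/cheap: pull structure out of documents
    "reasoning": "claude",         # long-running agent logic, many loops
    "reasoning_alt": "gpt-5",      # also used for reasoning; compare in tests
}

def pick_model(task_type: str) -> str:
    """Select a model family for a task type; fall back to the reasoning tier
    when a task type hasn't been benchmarked yet."""
    return ROUTING_TABLE.get(task_type, ROUTING_TABLE["reasoning"])

# The environment matters too: the same model can behave differently on
# different clouds (e.g., Claude on Google Vertex AI vs. Claude on AWS), so
# any routing table should be validated per deployment, not assumed.
```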
To listen to the full episode, click here!