Getting Real on the Cost of Tokens

Written by Jonny McFadden | June 2, 2026

As agentic AI becomes more widely adopted, token costs are becoming a big topic of conversation. The discourse online often misinterprets this trend, framing agentic AI as a budget-burner rather than what it actually is: expanding, mission-critical infrastructure.

The concern is valid, but we need to reframe how we look at it. Token costs aren’t rising because the technology is inefficient; they’re rising because your AI is actually doing more work. As you roll out more agents and automate deeper workflows, your volume naturally scales. It’s a sign of adoption and value.

This is where I think we're collectively getting it wrong: we're measuring the cost without measuring the value.

The wrong question

When someone asks "how much did that agent run cost?", the implied comparison is almost always: compared to doing nothing.

That's the wrong baseline.

The right question is: compared to what?

Because if an agent completes in two hours a task that would have taken a skilled person a full week, we're not talking about a cost problem. We're talking about a leverage question. And leverage, in this business, is everything.

Let’s do the math

At Vertesia, we practice what we preach. We’ve been using our own platform to build and scale these interfaces, and the results have completely shifted our understanding of velocity.

Below are just a few examples of what I have personally used Vertesia for in the past week:

Writing documentation for our platform
Writing onboarding material for a new hire
Researching a prospect to better understand their problems and potential solutions
Brainstorming, scoping, and building a solution for multiple customer problems
Summarizing meeting notes
Designing and building client-facing presentations
And my favorite…literally having an AI agent demo itself end to end for a prospective client

Does it cost money to do the above? Of course it does, but the cost is a tiny fraction of what it would take to hire more people to do this manually.

We work with a range of enterprise customers to build custom AI-powered applications: tailored tools that fit their specific workflows, data, and teams. One of the things we do a lot of is build out the UI layer for these apps.

Front-end development is notoriously time-consuming. Translating a design concept into a working, polished interface means handling responsiveness, state management, edge cases, and accessibility, and all of it takes time. Good developers are expensive, and those who also understand the AI application layer are even harder to find.

On a recent project for a global retailer under serious deadline pressure, a deliverable that would normally have required a full week of focused front-end engineering time was completed in under two days.

The old way: A week of senior engineering time, fully loaded, sits in the range of $3,000 to $5,000 when you factor in salary, benefits, and overhead.
The Vertesia way: The token cost for the automated research, configuration, and generation loops? A tiny fraction of that. We're talking orders of magnitude less expensive.

But here's the part that doesn't show up in the token bill: our engineers weren't just moving faster. They were thinking at a higher level. Instead of getting lost in boilerplate and repetitive configuration, they focused on critical reviews and architectural integrity. Because Vertesia's agents handled the heavy lifting, my colleagues and I could spend more time on cognitively intensive tasks, and the overall quality went up, not down.

Knowing when to walk away: the bad use cases

Now, let’s clear something up. Just because AI can solve a problem doesn't mean it should.

A lot of the current AI space is saturated with hype, leading people to deploy complex agentic workflows for things that simply don't justify the spend. If you build an agent to solve a minor administrative papercut, and that agent costs you $20 per run to fix a problem that wasn't really that big of a deal... you’ve selected a bad use case.

The math only works when the value of the outcome dwarfs the cost of the input. If you are using expensive compute to solve low-value problems, you are burning money. Recognizing bad use cases is just as important as identifying good ones. Prioritizing high-value opportunities is the only way to avoid concluding that “AI is just too expensive for our business.”
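To make "the value of the outcome dwarfs the cost of the input" a little more concrete, here is a minimal sketch of a use-case screen. The function name, the 10x threshold, and the dollar values assigned to each outcome are all assumptions for illustration; only the $20-per-run figure comes from the example above.

```python
def worth_automating(value_per_outcome: float,
                     cost_per_run: float,
                     min_ratio: float = 10.0) -> bool:
    """Return True when the outcome is worth at least `min_ratio` times the run cost.

    `min_ratio` is an assumed stand-in for "dwarfs"; pick your own threshold.
    """
    if cost_per_run <= 0:
        raise ValueError("cost_per_run must be positive")
    return value_per_outcome / cost_per_run >= min_ratio


# A $20-per-run agent fixing a papercut assumed to be worth $5 each time.
print(worth_automating(value_per_outcome=5.0, cost_per_run=20.0))     # False

# The same $20 run applied to an outcome assumed to be worth $4,000.
print(worth_automating(value_per_outcome=4000.0, cost_per_run=20.0))  # True
```

The hard part in practice is estimating the value side honestly; the division is the easy bit.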

Rethinking the unit cost

We need a new mental model here. Token cost is an input cost, just like compute, cloud storage, or SaaS seat licenses. Nobody looks at their AWS bill and asks "was this worth it?" in isolation. They ask: what did we build with it? What revenue did it generate?

The same logic applies to tokens. When you frame it as cost per outcome instead of cost per token, the math looks completely different. You stop asking "did we spend too much on AI?" and start asking "what's the cost of not using it?"
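Here is a rough sketch of that reframing in code. The `Workload` class, the `leverage` helper, and the $200 token bill are hypothetical placeholders, not figures from our projects; the $3,000 manual cost is the low end of the engineering-week range quoted earlier. The sketch also leaves out human review time, which you would want to add to the agent side in a real comparison.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    token_spend: float  # total token bill for the period, in dollars
    outcomes: int       # deliverables completed in the same period

    def cost_per_outcome(self) -> float:
        if self.outcomes <= 0:
            raise ValueError("need at least one completed outcome")
        return self.token_spend / self.outcomes


def leverage(manual_cost_per_outcome: float, agent_cost_per_outcome: float) -> float:
    """How many times cheaper the agent-assisted outcome is than the manual one."""
    if agent_cost_per_outcome <= 0:
        raise ValueError("agent_cost_per_outcome must be positive")
    return manual_cost_per_outcome / agent_cost_per_outcome


# Hypothetical month: $200 of tokens across 4 delivered interfaces.
month = Workload(token_spend=200.0, outcomes=4)
print(month.cost_per_outcome())                   # 50.0
print(leverage(3000.0, month.cost_per_outcome())) # 60.0
```

Reporting the second number next to the token bill is what shifts the conversation from "what did we spend?" to "what did we get for it?"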

What costs should you be watching?

None of this means you should ignore token costs. You absolutely should understand them. A few things we pay close attention to:

Task scoping: Agents that wander are agents that spend. Tight, well-defined tasks with clear stopping conditions are dramatically more efficient.

Caching and reuse: Don't pay to process the same context twice. Prompt caching alone can cut costs significantly in repetitive workflows.

Output quality gates: A cheap run that produces rework is more expensive than a slightly pricier run that gets it right the first time.
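Two of these levers are easy to put numbers on. The sketch below estimates input-token spend with and without caching of a shared context, then compares the expected cost of a cheap run that often needs rework against a pricier run that rarely does. Every price, token count, and probability is an assumed placeholder rather than any provider's actual rate, and the caching model ignores cache-write surcharges and cache expiry.

```python
def input_cost(runs: int, shared_tokens: int, unique_tokens: int,
               price_per_mtok: float, cached_rate: float = 1.0) -> float:
    """Dollar cost of input tokens across `runs` calls.

    shared_tokens: context repeated on every run (the cacheable part)
    unique_tokens: per-run context that is never cached
    cached_rate:   fraction of the normal price charged on a cache hit
                   (1.0 means no caching discount at all)
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # The first run pays full price for the shared context;
    # every later run pays the discounted cached rate for it.
    shared_billed = shared_tokens + shared_tokens * (runs - 1) * cached_rate
    unique_billed = unique_tokens * runs
    return (shared_billed + unique_billed) * price_per_mtok / 1_000_000


def expected_cost(run_cost: float, rework_probability: float,
                  rework_cost: float) -> float:
    """Run cost plus the probability-weighted cost of fixing a bad result."""
    return run_cost + rework_probability * rework_cost


# Assumed workload: 100 runs, 50k shared tokens, 1k unique tokens, $2 per million.
no_cache = input_cost(100, 50_000, 1_000, price_per_mtok=2.0)
with_cache = input_cost(100, 50_000, 1_000, price_per_mtok=2.0, cached_rate=0.1)
print(f"without caching: ${no_cache:.2f}, with caching: ${with_cache:.2f}")

# Assumed quality trade-off: rework costs $50 of someone's time.
cheap = expected_cost(run_cost=1.0, rework_probability=0.40, rework_cost=50.0)
careful = expected_cost(run_cost=3.0, rework_probability=0.05, rework_cost=50.0)
print(f"cheap run: ${cheap:.2f}, careful run: ${careful:.2f}")
```

Under these assumptions the run that costs three times as much on paper ends up far cheaper once rework is priced in, which is the point of the quality gate.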

The goal isn't to spend as little as possible on tokens. The goal is to spend wisely, with a clear line of sight to the value being created.

The bottom line, your bottom dollar

The conversation about AI costs is happening in the wrong currency. Tokens are the mechanism. Outcomes are the point.

The next time someone sends you a screenshot of your AI spend with a question mark, reframe the conversation:

What did we build?

How fast did we build it?

What would it have cost us to do it the old way?

My guess is the answer changes the conversation pretty quickly.
