Intelligence Demand Is Infinite

Last quarter our AI bill barely moved. Token usage went up and to the right, the kind of curve you screenshot for an investor deck. A token is roughly a word’s worth of text a model reads or writes, and everything in AI gets billed by the token. The cost line stayed flat. That gap is the whole story of where AI is going, and almost nobody is pricing it in correctly.

Here is my thesis. Demand for intelligence is close to infinite. We will never run out of things we want a model to read, draft, summarize, classify, or decide. But inside that infinite demand, the workloads split hard. Within 12 to 18 months, roughly 80% of them will run on models that cost 99% less than today’s frontier. The other 20% will stay on the latest, most expensive generation, because for that slice raw IQ is the product. Scientific work. Hard reasoning. The orchestrator agents that sit on top and direct a fleet of smaller models underneath, where one wrong call cascades into all the rest.

Most teams are running the entire 80% on frontier models anyway. They are paying a genius tax. Frontier prices for commodity work, on every prompt, forever.

The MacBook test

Think about how many MacBooks or gaming rigs ship with the maxed-out CPU and GPU. It is a small fraction. Most people buy the model that is plenty good for what they do, and the spec sheet of the top tier stops mattering the moment the cheaper one clears the bar. Intelligence is heading the same way. The frontier keeps climbing, and most work quietly stops needing the frontier.

The difference is the speed of the price collapse. This is not Moore’s law. It is faster, and it is not close.

Andreessen Horowitz called it LLMflation: for a model of equivalent performance, inference cost (what it costs to actually run the model and get an answer) has dropped about 10x every year. The numbers are almost hard to believe. Hitting GPT-4 quality cost around $20 per million tokens in late 2022. By late 2025 it was roughly $0.40. Epoch AI measured the same collapse from a different angle and found the rate ranges from 9x to as much as 900x per year, fastest for the routine tasks, slowest for the hardest reasoning. DeepSeek showed up and undercut incumbents by 90%. PC compute during its revolution and bandwidth during the dotcom boom never fell this fast.

So the maxed-out tier is real, and a sliver of the market genuinely needs it. Everyone else is overpaying for capability the workload will never touch.

The trap inside the good news

Here is where the optimists get lazy.

Cheaper tokens do not mean cheaper bills. They mean more tokens. The industry already has a name for what I lived last quarter: the LLM Cost Paradox. Per-token price drops 10x while consumption for some workloads climbs 100x. The savings get eaten alive by volume. Reasoning models, the ones that work through a problem step by step before answering, make it worse, because a single request can now burn thousands of hidden reasoning tokens, and you pay for every one.
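
The arithmetic is worth spelling out. A minimal sketch in Python, using the 10x and 100x figures above; the function name is made up for illustration, and real bills depend on the mix of workloads:

```python
def bill_multiplier(price_drop: float, usage_growth: float) -> float:
    """Return the factor by which the total bill changes.

    price_drop: how many times cheaper each token gets (10 means 10x cheaper).
    usage_growth: how many times more tokens get consumed (100 means 100x more).
    """
    return usage_growth / price_drop


# The paradox case: price falls 10x, consumption climbs 100x.
paradox = bill_multiplier(price_drop=10, usage_growth=100)

# The break-even case: usage grows exactly as fast as the price falls.
flat = bill_multiplier(price_drop=10, usage_growth=10)

print(f"10x cheaper, 100x more tokens: bill changes by {paradox:g}x")
print(f"10x cheaper, 10x more tokens: bill changes by {flat:g}x")
```

The bill only shrinks when usage grows more slowly than the price falls, which is exactly what near-infinite demand makes unlikely.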

This is Jevons paradox with a GPU. When something gets cheap and useful, we do not pocket the savings. We do more of it. Demand for intelligence being near-infinite is exactly what guarantees the bill keeps climbing even as the unit price falls off a cliff.

Which leads to the conclusion that matters. The constraint on this industry is not better models. It is energy and compute.

The real ceiling

We have spent three years treating model quality as the bottleneck. Will it reason. Will it hallucinate. Will the next generation be smart enough. Those questions are getting answered fast, and they are becoming the easy part.

The hard part is physical. Every token is a quantity of electricity and a slice of a chip that has to exist, be powered, and be cooled. There is a hard physical floor for how little energy a computation can possibly use, and a recent analysis found today’s AI hardware runs astronomically far above it, with room to get orders of magnitude more efficient. The headroom is enormous, but closing it is a problem of chip factories, power grids, and watts, not cleverness. You cannot prompt your way to a power plant.

So the winners over the next two years will not be whoever has the smartest model. Everyone will have a smart enough model. The winners will be whoever turns a fixed amount of energy and silicon into the most useful work.

I have seen this movie before

If you have spent real time in software and services, you already know how this ends, because it is always how it goes.

In the early 2000s the money in digital media was in routing traffic across ad networks. You had a flood of ad inventory and a stack of buyers paying different rates, and the margin lived in the layer in the middle that decided, in real time, which impression went to which network. Here is the part that matters: that layer did not exist. There was no product to buy. So we built it. We wrote our own stacking services and yield routers, because building the router was the only way to capture the money sitting in between. It worked. That machine made us millions. It also took constant attention and a dedicated team, because a yield router is never finished. The market underneath it moves every day.

Intelligence routing is that same pattern, one layer up the stack. Same shape, bigger prize. The ad version optimized pennies per impression. This one optimizes the cost of thinking itself. Done right, it does not just save millions. For the teams that own the routing layer, it yields billions.

So when the cost paradox hit us, I did not reach for routing because it was elegant. I reached for it because I had built this exact machine before, and the bill made it obvious the pattern was back.

You are going to need a router

At Outsider Labs we hit the cost paradox head-on early, while building our live project CO/AI, the publication you are reading now. We had a range of models we wanted to use, and workflows stacked on top for things like image processing. We watched the math and realized no amount of waiting for cheaper models would fix it on its own. So we started doing the work directly, routing prompts to the cheapest model that can clear the bar and reserving the frontier for the 20% that truly needs it.

To be clear about what we have and what we don’t: we do not have a polished router product, and almost nobody does yet. We route by judgment and glue code. But the lesson from doing it is unmistakable. Every company is going to need a real router for this. The only difference from 2003 is that this time you have a choice. Back then nothing existed and you had to build the router yourself. Today you can wire up an off-the-shelf layer like OpenRouter, or build something custom for your own workflows. What you cannot do is skip it and keep defaulting every call to the frontier.

Think of it as a dispatch pattern. A workflow is a series of decisions, and not every decision is hard. Tagging, enrichment, classification, extraction, first drafts. That work goes to a small, cheap, often local model that returns the same answer the expensive one would. When the task genuinely calls for a deep check, real analysis, or high-stakes reasoning, you dispatch it up to the state-of-the-art model. The router decides, per call, which brain the job deserves.
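
Here is one way that dispatch pattern could look, as a minimal Python sketch. It is an illustration, not the glue code we actually run: the model names, the Request type, and the high_stakes flag are all hypothetical, and a real router would also weigh quality checks, latency, and fallbacks.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; swap in whatever tiers you actually use.
CHEAP_MODEL = "small-local-model"
FRONTIER_MODEL = "frontier-model"

# Routine work the cheap tier is assumed to handle (the list from the text).
ROUTINE_TASKS = {"tagging", "enrichment", "classification", "extraction", "first_draft"}


@dataclass(frozen=True)
class Request:
    task: str                  # what kind of decision this call is
    prompt: str                # the text sent to the model
    high_stakes: bool = False  # caller marks calls where a wrong answer is costly


def route(request: Request) -> str:
    """Pick the cheapest tier that is assumed to clear the bar for this call."""
    # Unknown task types and high-stakes calls go up to the frontier by default.
    if request.high_stakes or request.task not in ROUTINE_TASKS:
        return FRONTIER_MODEL
    return CHEAP_MODEL


if __name__ == "__main__":
    calls = [
        Request(task="tagging", prompt="Tag this article."),
        Request(task="classification", prompt="Is this contract clause risky?", high_stakes=True),
        Request(task="scientific_analysis", prompt="Check this derivation."),
    ]
    for call in calls:
        print(f"{call.task:>20} -> {route(call)}")
```

The design choice worth noticing is the default: anything the router does not recognize goes to the expensive model, so a routing mistake costs money rather than quality. Whether that is the right default depends on the workload.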

Done well, the results are stark. We have kept costs roughly flat while token usage grew exponentially. Flat is the new down. In a market where the price of intelligence falls 10x a year, holding spend steady while usage explodes is a massive real gain, and most teams are leaving it on the table because they default every call to the frontier.

A few moves follow directly:

Treat your model choice as a portfolio, not a default. Every workload gets matched to a tier on purpose.
Build routing in early. Retrofitting it after you are locked to one expensive model is painful.
Measure work per watt, not tokens per dollar. The metric that wins is useful output against the physical input.
Assume the cheap tier is good enough until a workload proves it isn’t. Make the frontier earn its place.

Where this is all going

Start with the obvious. General-purpose routers will appear, because the need is universal and the margin in the middle is real. Then the platforms move. Apple, Google, and Microsoft will bake routing into the operating system. You will get a mix of local and remote: a capable model running on your device for most of what you ask, and a hop out to a frontier provider only when the task demands it. You will not choose. The system will.

For most people that is the whole story. The general user never has to think about any of this. The chatbot that ships with their phone or laptop will be more than enough, and the routing underneath it stays invisible.

Power users and businesses are a different animal. They will want a router they can operate and monitor, not a black box buried in someone else’s OS. They will want to see which model handled what, set their own rules, control their own spend, and deploy the thing across the entire company. That is a product you run, not a feature you accept.

And that is the shift that matters most. Intelligence stops being a project and becomes an assumption. Companies will take it for granted the way they take electricity and bandwidth for granted, baked into the hiring plan and the IT stack as a baseline rather than a special initiative. The router is how a company turns that assumption into something it can actually see, control, and afford.

Underneath all of it, the old lesson holds. I watched it play out in ad tech. The networks that supplied the inventory did not capture the value. The layer that owned the routing did. Intelligence is about to learn the same thing. The frontier labs become wholesale suppliers of the expensive 20%, called in by a dispatcher they do not control, whether that dispatcher is the OS on a billion phones or the router a company runs for itself. Own the routing layer and you own the leverage. The prize was never the model.

The honest counterpoint

Here is the strongest case against everything I just said. If the best models resist commoditization, then quality still matters and the moat is not just energy and routing.

I think that is half right, and it sharpens the thesis instead of breaking it. Good, general-purpose intelligence will commoditize all the way down. It will be nearly free and run right on your phone. But the truly special models will not. A model fine-tuned for one hard job, trained deep on a narrow domain until it is genuinely expert, is the rare thing the cheap tier cannot fake. As the floor falls, the distance to the top gets more valuable, not less.

So the 20% is more durable than “the latest, most expensive generation.” It is the genuinely differentiated models, the super-smart and purpose-built, and they will command a premium long after everyday intelligence is a free feature of your operating system.

Which is the whole point of the router. Intelligence is splitting in two. The basic kind becomes a commodity you barely pay for, running locally on the device in your hand. The exceptional kind stays worth real money. The scarce things underneath are power and the layer that decides which kind each request deserves. Build like everyday intelligence is the easy part, because it already is. Pay for the exceptional kind only when the job earns it.

Anthony Batt // Outsider Labs & CO/AI // @djabatt
