The AI boom is built on a basic assumption: Bigger models are stronger, and the strongest models win. Now the industry is about to find out what happens if that assumption starts to break down.
Deployment costs have already forced users to take a second look at smaller and cheaper models. This kind of value shopping for models is new, and it is not clear how it will affect the industry, but the impact is likely to be significant.
One prediction, put forward most prominently by Coinbase co-founder Brian Armstrong, is that the vast majority of workloads will move to cheaper models.
“The demand for intelligence is endless, but 80% of workloads will run on 99% cheaper models within 12-18 months,” Armstrong wrote on X. “20% of workloads will still run on the latest generation models where maximum IQ is important.”
If Armstrong’s prediction comes true, it’s hard to overstate how significant the changes will be for the AI industry.
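For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It rests on a simplifying assumption that is not in Armstrong’s post: every workload costs the same when run on a frontier model, so the share of workloads equals the share of spend. The function name `blended_cost` is made up for this illustration.

```python
def blended_cost(share_moved: float, discount: float) -> float:
    """Fraction of the original spend that remains after a shift.

    share_moved: fraction of workloads moved to cheaper models (0..1)
    discount:    how much cheaper those models are (0.99 means 99% cheaper)

    Assumption (ours, not Armstrong's): all workloads cost the same
    on a frontier model, so workload share equals spend share.
    """
    if not (0.0 <= share_moved <= 1.0 and 0.0 <= discount <= 1.0):
        raise ValueError("share_moved and discount must be between 0 and 1")
    stays_on_frontier = 1.0 - share_moved          # still paying full price
    moved_to_cheap = share_moved * (1.0 - discount)  # paying the reduced price
    return stays_on_frontier + moved_to_cheap


remaining = blended_cost(share_moved=0.80, discount=0.99)
print(f"{remaining:.1%} of the original spend remains")  # 20.8%
```

Under that assumption, total spend would fall by roughly 79%, with almost all of what remains going to the 20% of workloads that stay on frontier models. Real workloads vary widely in cost, so the true figure could differ substantially.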
Until now, most AI companies competed on quality, which meant defaulting to the most advanced model available. If the same work could be handled by cheaper models without compromising quality, it would mean a huge shift in the economics of AI. And critically, much of the savings will come out of the pockets of the big labs, hitting OpenAI and Anthropic financially just as they head into their IPOs.
It’s a potentially seismic shift in the industry that hinges on one key question: Are companies ready to switch to smaller models?
Preliminary tests show that when the system is set up correctly, cheaper models can work without any loss in quality. In a recent test by legal AI tool Harvey, the company was able to reduce inference costs by a factor of three without compromising quality. The test, carried out in partnership with inference platform Fireworks AI, combined Claude Opus with Fireworks’ GLM 5.1, switching to Opus for the most intensive tasks. The result was a significantly lower load in terms of server time and overall cost.
“Quality comes first, and in legal it always will,” Harvey co-founder Gabe Pereira told TechCrunch, referring to the AI-related legal services his startup provides. “But the definition of quality is evolving from simply using the most powerful model for everything to using the best model that gets the right answer most efficiently.”
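The general pattern, sending routine work to a cheaper model and escalating only the hardest tasks, can be sketched in a few lines of Python. This is not Harvey’s or Fireworks’ implementation, which the companies have not detailed here. The model names, relative costs, difficulty scores and threshold below are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_task: float  # hypothetical relative cost units


# Placeholder models; the 10x cost gap is an assumption for illustration.
SMALL = Model("small-model", 1.0)
LARGE = Model("frontier-model", 10.0)


def route(difficulty: float, threshold: float = 0.8) -> Model:
    """Send a task to the large model only if it looks hard enough.

    difficulty is a hypothetical score in [0, 1]; a real system would
    need some way to estimate it, which is the hard part.
    """
    return LARGE if difficulty >= threshold else SMALL


def total_cost(difficulties: list[float], threshold: float = 0.8) -> float:
    """Total cost of a batch when each task is routed individually."""
    return sum(route(d, threshold).cost_per_task for d in difficulties)


tasks = [0.1, 0.3, 0.5, 0.9, 0.2]  # made-up difficulty scores
routed = total_cost(tasks)
all_large = LARGE.cost_per_task * len(tasks)
print(f"routed: {routed}, everything on the large model: {all_large}")
```

The savings in a setup like this depend entirely on how many tasks clear the threshold and on how reliably difficulty can be judged in advance; a router that misjudges hard tasks would save money at the expense of quality.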
This trend is often framed in terms of Chinese models, or of large labs versus open-weight models, but this misses the larger point. The real difference is not between proprietary and open models; it is between large models and small models. You can save money by switching from GPT-5.5 to DeepSeek’s V4 Flash, but switching to GPT-5.4-mini works just as well.
There is an active price war between in-house offerings from major labs and independently served open-weight models. For the small vs. large question, it doesn’t matter which small model wins.
All of this may seem obvious (of course you shouldn’t use more compute than necessary), but it goes against the scale-first approach that has so far dominated the industry. Inspired by the “bitter lesson,” labs pushed the boundaries of what AI models could do, eager to train the most computationally intensive models possible. With prices heavily subsidized by investors, customers had no reason to choose anything other than the most advanced option.
With token prices rising and subsidies slowing, users are facing cost pressure for the first time. We don’t know if the new cost pressure will actually drive enterprise users to smaller models. They could just as easily save money by making fewer calls, using less context, or simply abandoning the least promising deployments.
But if it turns out that most deployments can run on a smaller model, that could seriously reduce the incremental demand for inference and raise new questions about how to justify the cost of developing a frontier model.