The AI Spending Trap: Token Prices Fell 98%, Enterprise Bills Are Still Rising.

AI promised to replace human labor. Uber burned through its entire 2026 AI coding budget in four months. Here is what the math actually looks like.

Jun 02, 2026

If you haven’t explored our previous research, you may revisit some of our earlier due diligence reports and thematic notes below. Each piece reflects the same thesis-driven framework we apply across every investment case.

If AI Is Killing SaaS, Why Are SaaS Earnings Accelerating? (May 20)
Legend Series: Super Investor - Chuck Akre (May 26)
The Risk Framework | Part 2: Position Sizing Is the Most Underrated Risk Management Tool You Have (May 13)
Thesis Weekly_Week 21 | The War Tax Is Starting to Expire (May 31)

Opening

Praveen Neppalli Naga, Uber’s Chief Technology Officer, thought he had budgeted correctly for AI in 2026. He had not. By April, the company had consumed its entire annual allocation for AI coding tools. “I went back to the drawing board,” he told The Information, “because the budget I thought I would need has already evaporated.”

Uber was not alone. Microsoft gave thousands of engineers, product managers, and designers access to Claude Code in late 2025, encouraging experimentation. Within months, the company began canceling licenses across its Experiences and Devices group, the team behind Windows, Microsoft 365, and Surface, with a June 30 cutoff. Bryan Catanzaro, Nvidia’s Vice President of Applied Deep Learning, put it bluntly: “For my team, the cost of compute is far beyond the costs of the employees.”

This is the irony at the center of the current AI moment. The companies most aggressively deploying AI are openly acknowledging that it has become, in many contexts, more expensive than the human labor it was supposed to replace. The narrative of inevitable cost reduction has collided with the reality of how token-based pricing actually works at scale.

None of this means AI is overrated as a technology. It means most organizations are making the same set of calculable errors when they evaluate the economics. Understanding those errors, and understanding precisely where AI genuinely does reduce cost and where it does not, is the difference between deploying AI as a competitive advantage and deploying it as an expensive liability.

Part One: What Is Inside a Token

Before evaluating the economics of AI versus human labor, it is worth tracing what actually happens when an enterprise deploys a large language model at scale. The cost of a token is not a price. It is the visible surface of a multi-layer cost structure that most organizations never fully account for.

Layer One: The Chip

Every token generated by a frontier model requires GPU or specialized AI accelerator compute. The H100, Nvidia’s widely deployed data center GPU for both training and inference, costs between $30,000 and $40,000 per unit to purchase outright. Cloud providers rent it for between $1.49 and $6.98 per hour depending on the provider and commitment level, and that range has fallen significantly: H100 cloud prices dropped from a peak of $7 to $8 per hour to under $2 per hour for some configurations by mid-2026.
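An hourly rental rate only becomes a token price once it is divided by throughput. The sketch below shows that conversion under stated assumptions: the aggregate throughput of 1,000 tokens per second and the utilization figures are hypothetical placeholders, not measured H100 numbers, and the calculation ignores every cost except GPU rental.

```python
def cost_per_million_tokens(
    gpu_hourly_rate: float,
    tokens_per_second: float,
    utilization: float = 1.0,
) -> float:
    """GPU rental cost per million generated tokens.

    utilization is the fraction of paid hours spent actually serving traffic;
    idle capacity is still billed, so lower utilization raises the unit cost.
    """
    tokens_per_paid_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_rate / tokens_per_paid_hour * 1_000_000


# Hypothetical assumption: aggregate throughput of one GPU with request batching.
ASSUMED_TOKENS_PER_SECOND = 1_000

# Hourly rates taken from the range cited in the text.
for rate in (1.49, 2.00, 6.98):
    full = cost_per_million_tokens(rate, ASSUMED_TOKENS_PER_SECOND)
    half = cost_per_million_tokens(rate, ASSUMED_TOKENS_PER_SECOND, utilization=0.5)
    print(f"${rate:.2f}/hr: ${full:.2f} per million tokens at full use, "
          f"${half:.2f} at 50% utilization")
```

Under these assumptions the unit cost scales linearly with the hourly rate and inversely with throughput and utilization, which is why falling rental prices flow straight through to token prices.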

Token prices reflect this underlying chip economics. GPT-4-equivalent performance now costs approximately $0.40 per million tokens, down from $20 per million in late 2022, a 98% reduction. This number is cited frequently as evidence that AI is becoming dramatically cheaper. It is accurate as far as it goes. It stops too early.

Layer Two: Energy

The IEA’s 2025 Energy and AI report projected that global data center electricity consumption could double by 2030, with AI workloads accounting for the majority of incremental demand. Major markets including Northern Virginia, Silicon Valley, and Northern Europe have seen power approval timelines for new facilities stretch to 24 to 36 months, regardless of capital availability. Electricity is no longer an unconstrained resource for AI infrastructure.

The energy cost per token does not appear on any API invoice, but it is real. Current AI inference hardware consumes approximately 0.0001 to 0.002 watt-hours per output token. An H100 GPU draws 700 watts at peak. Data center Power Usage Effectiveness (PUE) ratios of 1.2 to 1.5 mean that for every watt delivered to IT equipment such as the GPU, an additional 0.2 to 0.5 watts is consumed by cooling, power distribution, and other facility infrastructure. None of this overhead is itemized in the token price.
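The per-token energy range can be sanity-checked from the 700-watt peak draw and the PUE range above. In this sketch the throughput of 1,000 tokens per second and the electricity price of $0.10 per kWh are hypothetical assumptions. PUE is applied as total facility power divided by IT equipment power.

```python
def energy_per_token_wh(gpu_watts: float, pue: float, tokens_per_second: float) -> float:
    """Facility-level watt-hours consumed per generated token.

    PUE (Power Usage Effectiveness) = total facility power / IT equipment power,
    so multiplying the GPU draw by PUE adds cooling and distribution overhead.
    """
    facility_watts = gpu_watts * pue
    joules_per_token = facility_watts / tokens_per_second  # 1 W = 1 J/s
    return joules_per_token / 3600  # 1 Wh = 3600 J


GPU_PEAK_WATTS = 700               # H100 peak draw, from the text
ASSUMED_TOKENS_PER_SECOND = 1_000  # hypothetical batched throughput
ASSUMED_USD_PER_KWH = 0.10         # hypothetical electricity price

for pue in (1.2, 1.5):
    wh = energy_per_token_wh(GPU_PEAK_WATTS, pue, ASSUMED_TOKENS_PER_SECOND)
    kwh_per_million = wh * 1_000_000 / 1_000
    usd_per_million = kwh_per_million * ASSUMED_USD_PER_KWH
    print(f"PUE {pue}: {wh:.6f} Wh/token, "
          f"{kwh_per_million:.3f} kWh and ${usd_per_million:.4f} per million tokens")
```

With these inputs the result falls inside the 0.0001 to 0.002 Wh range quoted above. Lower throughput, for example from small batches or long reasoning chains, raises the energy per token proportionally.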

Layer Three: The Inference Tax

Training receives the headlines. The $100 million model. The breakthrough architecture. Inference, the continuous cost of serving that model in production, is where the budget dies quietly. GPT-4’s training cost approximately $150 million. Within two years, its inference costs exceeded $2.3 billion, roughly 15 times the training bill.

Inference represented approximately 55% of total AI infrastructure spending as of early 2026. As models shift from simple question-and-answer interactions to agentic workflows that plan, use tools, execute code, and self-correct, token consumption per task multiplies. Models like GPT-4 Pro now process approximately 22,000 tokens in a single planning-and-execution exchange, compared to a few hundred for a simple query. Gartner projects that by 2030, even a 90% reduction in token unit prices will be outpaced by this consumption growth, leaving enterprise AI total costs higher than today.
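The Gartner argument is multiplicative: cost per task equals price per token times tokens per task. A minimal sketch, using the 22,000-token figure from the text and a hypothetical 300 tokens as a stand-in for "a few hundred" (an assumption, not a measured number):

```python
def task_cost_multiple(price_reduction: float, old_tokens: int, new_tokens: int) -> float:
    """Cost of one task after the change, as a multiple of its cost before.

    cost = price_per_token * tokens_per_task, so the ratio is
    (1 - price_reduction) * (new_tokens / old_tokens).
    """
    return (1 - price_reduction) * new_tokens / old_tokens


def break_even_reduction(old_tokens: int, new_tokens: int) -> float:
    """Price cut needed just to keep cost per task flat as tokens per task grow."""
    return 1 - old_tokens / new_tokens


SIMPLE_QUERY_TOKENS = 300     # hypothetical stand-in for "a few hundred"
AGENTIC_TASK_TOKENS = 22_000  # planning-and-execution exchange, from the text

multiple = task_cost_multiple(0.90, SIMPLE_QUERY_TOKENS, AGENTIC_TASK_TOKENS)
needed = break_even_reduction(SIMPLE_QUERY_TOKENS, AGENTIC_TASK_TOKENS)
print(f"A 90% price cut still leaves cost per task {multiple:.1f}x higher")
print(f"Break-even price cut: {needed:.1%}")
```

Under these assumptions, a workload that moves from simple queries to agentic exchanges needs a price cut of roughly 98.6% just to hold cost per task flat. A 90% cut leaves it about 7.3 times higher.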

Layer Four: Verification and Oversight

This is the layer that breaks most enterprise AI business cases. Frontier models show an average hallucination rate of approximately 9.2% on general knowledge tasks. Well-architected production systems using retrieval-augmented generation can reduce this to below 3%, but not to zero. Forrester Research estimates that hallucination mitigation costs approximately $14,200 per enterprise employee per year, covering the work of checking outputs, correcting errors, handling escalations, and rebuilding trust with downstream stakeholders who received incorrect information.

Microsoft’s own 2025 data found that knowledge workers spend 4.3 hours per week, more than 10% of a 40-hour week, verifying AI outputs. A 20-agent deployment requires dedicated human oversight running at approximately $23,000 per agent per year, or roughly $460,000 annually for the deployment. This cost does not appear in the model API bill. It appears in the headcount budget of the teams that have to catch what the model gets wrong.

The complete cost chain of a deployed AI token, from chip depreciation through energy through inference infrastructure through human verification, is typically two to four times the visible API cost for enterprise deployments. Organizations that model the economics using API price alone are systematically underestimating their total cost of ownership.
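One way to see how the hidden layers compound is to add them up explicitly. This sketch uses two figures from the text: $23,000 of oversight per agent per year and 4.3 verification hours per worker per week. The API spend, infrastructure overhead ratio, number of verifying workers, working weeks, and loaded hourly cost are hypothetical inputs chosen for illustration. The resulting multiplier is only as good as those assumptions.

```python
from dataclasses import dataclass


@dataclass
class AIDeployment:
    annual_api_spend: float            # visible model bill, USD (hypothetical)
    infra_overhead_ratio: float        # integration, hosting, monitoring as a share of API spend (hypothetical)
    agents: int                        # AI agents in production
    oversight_per_agent: float         # dedicated human oversight per agent per year, USD
    verifying_workers: int             # employees who check outputs (hypothetical)
    verification_hours_per_week: float
    loaded_hourly_cost: float          # fully loaded cost of one worker-hour, USD (hypothetical)
    working_weeks: int = 48            # hypothetical working weeks per year

    def hidden_costs(self) -> dict:
        """Costs that never show up on the API invoice."""
        return {
            "infrastructure": self.annual_api_spend * self.infra_overhead_ratio,
            "agent oversight": self.agents * self.oversight_per_agent,
            "output verification": (self.verifying_workers
                                    * self.verification_hours_per_week
                                    * self.working_weeks
                                    * self.loaded_hourly_cost),
        }

    def total_cost(self) -> float:
        return self.annual_api_spend + sum(self.hidden_costs().values())

    def multiplier(self) -> float:
        """Total cost of ownership as a multiple of the visible API bill."""
        return self.total_cost() / self.annual_api_spend


deployment = AIDeployment(
    annual_api_spend=500_000,
    infra_overhead_ratio=0.30,
    agents=20,
    oversight_per_agent=23_000,
    verifying_workers=10,
    verification_hours_per_week=4.3,
    loaded_hourly_cost=60.0,
)
for layer, cost in deployment.hidden_costs().items():
    print(f"{layer}: ${cost:,.0f}")
print(f"Total: ${deployment.total_cost():,.0f} ({deployment.multiplier():.2f}x the API bill)")
```

In this example, agent oversight alone comes close to the API bill. A model priced on API cost alone would miss most of that.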

Part Two: Where AI Is Genuinely Cheaper Than Humans

With the full cost structure in view, a more precise picture emerges of where the economics actually favor AI deployment.

High-volume, standardized, low-stakes tasks

The clearest case for AI cost advantage is in tasks that are highly repetitive, structurally predictable, and where the cost of an error is low or easily caught before causing damage.

Customer service at scale is the canonical example. The median U.S. customer service representative earned $42,830 in 2024, according to BLS data. Fully loaded with benefits, payroll taxes, management overhead, and infrastructure, the annual cost reaches approximately $60,000 to $72,000. The token cost of processing equivalent customer interactions using a well-configured API is orders of magnitude lower.
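The wage figures can be turned into a cost per interaction for a rough comparison. Several inputs are hypothetical assumptions: an interaction volume of 40 per day over 230 working days, 5,000 tokens per AI-handled interaction, and the $0.40-per-million price quoted earlier. The sketch deliberately leaves out the integration, QA, and escalation costs that the following paragraph adds back.

```python
BLS_MEDIAN_WAGE = 42_830               # 2024 median, from the text
LOADED_COST_RANGE = (60_000, 72_000)   # fully loaded annual cost, from the text

ASSUMED_INTERACTIONS_PER_YEAR = 40 * 230   # hypothetical: 40 per day, 230 days
ASSUMED_TOKENS_PER_INTERACTION = 5_000     # hypothetical, prompt plus response
ASSUMED_USD_PER_MILLION_TOKENS = 0.40      # GPT-4-equivalent price cited earlier


def human_cost_per_interaction(loaded_annual_cost: float) -> float:
    return loaded_annual_cost / ASSUMED_INTERACTIONS_PER_YEAR


def token_cost_per_interaction() -> float:
    return ASSUMED_TOKENS_PER_INTERACTION * ASSUMED_USD_PER_MILLION_TOKENS / 1_000_000


ai_cost = token_cost_per_interaction()
for loaded in LOADED_COST_RANGE:
    overhead = loaded / BLS_MEDIAN_WAGE
    human_cost = human_cost_per_interaction(loaded)
    print(f"Loaded ${loaded:,}: {overhead:.2f}x median wage, "
          f"${human_cost:.2f} per interaction vs ${ai_cost:.4f} in tokens "
          f"({human_cost / ai_cost:,.0f}x)")
```

On tokens alone the gap is roughly three orders of magnitude. That gap is what makes the token math look decisive, and it is the fixed costs around the tokens that narrow it.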

But the honest accounting does not stop there. Add enterprise infrastructure, integration costs, quality assurance oversight, and the human escalation layer required for complex or emotionally sensitive interactions, and the economics become viable only when the team being replaced exceeds approximately five to eight human agents. Below that threshold, human representatives are often cheaper on a total cost basis. The BLS projects customer service representative employment will decline 5% from 2024 to 2034. That is consistent with a real economic case, and also with a transition slower than the simple token math implies.
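The threshold effect follows from a fixed cost layer. In this sketch, an AI deployment carries a fixed annual cost for platform, integration, and QA, plus a variable cost for each human representative it replaces, covering tokens and the share of escalations that still need people. Both the $300,000 fixed cost and the $20,000 per-representative variable cost are hypothetical inputs, not benchmarks. The $66,000 loaded cost is the midpoint of the range above.

```python
import math
from typing import Optional


def break_even_team_size(
    fixed_ai_cost: float,
    variable_ai_cost_per_rep: float,
    loaded_cost_per_rep: float,
) -> Optional[int]:
    """Smallest team size at which the AI option is strictly cheaper.

    Human cost = n * loaded_cost_per_rep
    AI cost    = fixed_ai_cost + n * variable_ai_cost_per_rep
    """
    saving_per_rep = loaded_cost_per_rep - variable_ai_cost_per_rep
    if saving_per_rep <= 0:
        return None  # AI never becomes cheaper under these inputs
    return math.floor(fixed_ai_cost / saving_per_rep) + 1


n = break_even_team_size(
    fixed_ai_cost=300_000,            # hypothetical
    variable_ai_cost_per_rep=20_000,  # hypothetical
    loaded_cost_per_rep=66_000,       # midpoint of $60,000 to $72,000
)
print(f"AI becomes cheaper at a team of {n} representatives")
```

Raising the fixed layer or the escalation share moves the threshold up. Below the threshold, the people are cheaper.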

Additional task categories where AI delivers genuine cost advantage:

Large-scale document classification and data extraction, where accuracy requirements are moderate and human review is the final gate
Code review for standard patterns and security vulnerabilities, where the output is verified by a human engineer before acting
Initial contract drafting for standard transaction types, where a lawyer reviews the AI output rather than writing from scratch
Translation and localization at volume, where the cost differential is large enough that human review of AI output is still cheaper than human translation from scratch

The common thread is structure. AI outperforms human labor economically when the task has a predictable form, the inputs are well-defined, and the output is reviewed before high-stakes use.

Companies capturing this advantage:

Salesforce’s Agentforce deploys AI agents for standardized customer service workflows, charging on a per-conversation basis rather than a per-seat model. For customers with high-volume, structured inquiry patterns, the unit economics work because the deployment is confined to the structured, lower-stakes tier. ServiceNow applies the same logic to IT service management, automating common ticket resolution while routing complex issues to human technicians.
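Confining automation to the structured tier can be expressed as a simple routing rule. This is a hypothetical sketch of the pattern, not Agentforce's or ServiceNow's actual logic or API. The category list and confidence threshold are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical examples of structured, low-stakes request types.
STANDARD_CATEGORIES = {"password_reset", "order_status", "software_install"}
# Hypothetical threshold; in practice it would be tuned against the cost of errors.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class Ticket:
    category: str
    model_confidence: float  # classifier confidence that a standard resolution applies
    sensitive: bool          # e.g. a complaint, billing dispute, or distressed customer


def route(ticket: Ticket) -> str:
    """Automate only structured, low-stakes, high-confidence tickets."""
    if ticket.sensitive:
        return "human"
    if ticket.category not in STANDARD_CATEGORIES:
        return "human"
    if ticket.model_confidence < CONFIDENCE_THRESHOLD:
        return "human"
    return "ai"


print(route(Ticket("password_reset", 0.97, sensitive=False)))   # ai
print(route(Ticket("billing_dispute", 0.99, sensitive=False)))  # human
print(route(Ticket("order_status", 0.60, sensitive=False)))     # human
```

Every branch that returns "human" is part of the escalation layer the cost model has to pay for. The threshold choice trades automation rate against the cost of errors.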

Tasks Where AI Augments Rather Than Replaces

MIT’s study of automation economics, which focused on tasks involving computer vision, found that AI automation is economically justifiable for only 23% of the wages tied to the tasks examined. The remaining 77% does not mean AI has no role. It means the economic case there is for augmentation rather than replacement: a human with an AI tool outperforms either alone, at a combined cost lower than two humans operating independently.

The empirical evidence on what augmentation actually produces is now specific enough to be actionable. A study published in the Quarterly Journal of Economics in May 2025 examined 5,172 customer support agents given access to a generative AI assistant. Productivity, measured by issues resolved per hour, increased 15% on average. Critically, the gains were largest for lower-skilled workers, who benefited most from access to institutional knowledge embedded in the AI tool, while the highest-performing agents saw smaller gains. The AI did not replace the agents. It compressed the performance gap between the best and the rest.
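Whether a 15% productivity gain pays for the tool depends on cost per resolved issue. In this minimal sketch, three inputs are hypothetical: a baseline of 2 issues per hour, a loaded hourly cost of $33 (roughly $66,000 spread over about 2,000 hours), and a tool cost of $2 per hour. Only the 15% average gain comes from the study. The sketch also ignores how unevenly the gains were distributed across skill levels.

```python
def cost_per_issue(loaded_hourly_cost: float, tool_cost_per_hour: float,
                   issues_per_hour: float) -> float:
    """Labor plus tool cost for one hour, spread over the issues resolved in it."""
    return (loaded_hourly_cost + tool_cost_per_hour) / issues_per_hour


def max_tool_cost_per_hour(loaded_hourly_cost: float, productivity_gain: float) -> float:
    """Hourly tool cost at which augmented and unaugmented cost per issue are equal.

    Solves (w + t) / (r * (1 + g)) = w / r for t, giving t = w * g.
    """
    return loaded_hourly_cost * productivity_gain


LOADED_HOURLY = 33.0      # hypothetical
BASELINE_RATE = 2.0       # hypothetical issues resolved per hour
GAIN = 0.15               # average productivity gain reported in the study
ASSUMED_TOOL_COST = 2.0   # hypothetical

baseline = cost_per_issue(LOADED_HOURLY, 0.0, BASELINE_RATE)
augmented = cost_per_issue(LOADED_HOURLY, ASSUMED_TOOL_COST, BASELINE_RATE * (1 + GAIN))
print(f"Without AI: ${baseline:.2f} per issue")
print(f"With AI:    ${augmented:.2f} per issue")
print(f"Tool stays worthwhile below ${max_tool_cost_per_hour(LOADED_HOURLY, GAIN):.2f}/hour")
```

The break-even condition does not depend on the baseline rate. A tool is worth its price as long as its hourly cost stays below the worker's loaded hourly cost times the productivity gain.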

In financial analysis, a natural experiment using FactSet’s 2023 AI platform launch found that AI-assisted analysts produced reports with 40% more distinct information sources, 34% broader topical coverage, and 25% greater use of advanced analytical methods, while also improving timeliness. However, forecast errors rose 59%, as AI-assisted reports conveyed a more complex mix of information that proved harder to synthesize, particularly for analysts under heavy cognitive load. That last finding matters. AI made analysts faster and broader. It did not make them more accurate. The judgment layer remained indispensable, and in some cases, the additional information volume created by AI increased the cognitive burden on the human responsible for the final call.

A February 2026 review of the empirical evidence across multiple studies found that roughly 80% of U.S. workers have at least 10% of their tasks exposed to LLM assistance, and about 19% have 50% or more of their tasks exposed. The research characterizes current AI systems as primarily augmenting human labor rather than automating it outright, with AI adoption increasing complexity in augmentation-prone roles while reducing skill requirements in automation-prone roles.

Three domains illustrate the augmentation dynamic with particular clarity.

In legal research, AI tools reduce the time required to survey case law, identify relevant precedents, and draft initial briefs. The productivity gain is real. The lawyer’s judgment in applying that research to the specific facts of a client’s situation, and the professional liability that attaches to the advice given, remains entirely human. The total cost of legal work delivered falls because the AI absorbs the low-judgment research hours. It does not fall because the lawyer is eliminated.

In medical diagnosis support, AI systems have demonstrated detection rates for specific conditions that match or exceed experienced clinicians in controlled settings. In practice, current regulatory frameworks in major jurisdictions generally require physician validation before AI output informs clinical decisions, with narrow exceptions such as FDA-authorized autonomous screening for diabetic retinopathy. The physician who reviews AI-flagged imaging studies is not freed from the task. They are redirected to higher-judgment interpretation and patient communication, with AI handling the first-pass pattern recognition. The cost per diagnosis falls. The physician does not disappear from the workflow.

In software engineering, generative AI tools allow developers to generate, test, and document code more rapidly, improve data quality, and accelerate system integration work. BLS analysis suggests that strong business demand for software infrastructure is likely to outweigh the employment effect of these productivity gains, so employment in these roles is not projected to decline despite significant AI-driven productivity improvement. More output per engineer. Not fewer engineers.
