In the past, the AI cost model was simple: IT purchased seats with one of the model providers and put down a credit card for API fees that were a rounding error due to limited use. License costs scaled 1:1 with a team’s headcount and were trivial compared to overall compensation, and API fees were directly linked to experimental features.
This year, the rise of coding agents and agentic workflows broke the status quo.
Among software engineers, the economics are no longer legible when a single agentic process can fan out across tools and reasoning loops and unpredictably consume $5, $50, or $500+ in an afternoon. Every week brings news stories of “tokenmaxxing” engineers burning through thousands of dollars overnight with parallel agents, or of researchers attacking frontier math problems at $1K-10K+ per attempt. At the extreme end, Nvidia’s Jensen Huang said he would be “deeply alarmed” if a $500K engineer didn’t consume at least $250K in compute.
Yet the emergence of agents is not limited to IT spend. Sales, marketing, and finance are all moving toward agentic tooling. The budgeting infrastructure that worked for the last decade will have to adapt.
Open Problems in AI Budgeting
Measurement:
Lines of code and PRs were imperfect metrics before agents; now they can be actively misleading. Neither is a useful measure of productivity or of value created for the firm. Agent-generated code ships faster than humans can review it, sometimes leading to maintenance overhead or production brittleness as “cognitive debt” accumulates in the codebase.
Prior AI adoption often used time savings to justify expenditures. With background agentic processes, we’re often tackling greenfield work that wasn’t previously done at all, without a cost-savings proposition. How do you calculate ROI on a counterfactual?
Trends:
- Companies are in the early days, moving toward an imperfect blend of (a) quantitative LoC/PR metrics and (b) impact judgments (e.g., number of new features released).
- DevOps pipeline health and code quality assessments take time to measure and are hard to attribute to individuals, but they are worthwhile long-term measures.
- Inevitably, many startups will be created to grade agent productivity and usefulness.
Estimation:
Estimating the complexity of a software development task has always been difficult, and agents bring additional challenges.
We lack the ability to predict exactly what tool calls, decompositions, and reasoning loops an agent will pursue to complete a task. For background subagents, a poorly structured workflow could fan out across a system and process millions of tokens before anyone notices.
Even when tasks are reasonably well understood, developers rarely have real-time instrumentation to translate token throughput into costs and weigh those costs against the value delivered. The feedback loop doesn’t exist for engineers, and managed tools are even more opaque.
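A rough version of that missing feedback loop takes very little code. Below is a minimal Python sketch of a per-task cost meter, assuming you can read input and output token counts from each model response; the class and method names are hypothetical, and the per-million-token prices in the usage lines are placeholders rather than any provider’s actual rates. It also enforces an optional hard cap, the simplest guard against the runaway fan-out described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional


class BudgetExceeded(RuntimeError):
    """Raised once recorded spend passes the configured cap."""


@dataclass
class CostMeter:
    # Hypothetical list prices in USD per million tokens; substitute real rates.
    input_per_m: float
    output_per_m: float
    cap_usd: Optional[float] = None
    by_task: Dict[str, float] = field(default_factory=lambda: defaultdict(float))

    @property
    def spent_usd(self) -> float:
        return sum(self.by_task.values())

    def record(self, task: str, input_tokens: int, output_tokens: int) -> float:
        """Attribute one model call's cost to a task, then enforce the cap."""
        cost = (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000
        # The call has already been billed, so record it before checking the cap;
        # raising here stops the calling agent loop from issuing further calls.
        self.by_task[task] += cost
        if self.cap_usd is not None and self.spent_usd > self.cap_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.cap_usd:.2f} cap")
        return cost


meter = CostMeter(input_per_m=3.0, output_per_m=15.0, cap_usd=1.0)
meter.record("triage", input_tokens=200_000, output_tokens=10_000)  # $0.75
try:
    meter.record("triage", input_tokens=100_000, output_tokens=0)
except BudgetExceeded as err:
    print(err)  # spent $1.05 of $1.00 cap
```

Recording first and raising second reflects that tokens are billed as soon as a call completes: the cap cannot refund the call that crossed it, only stop the next one.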
Trends:
- Organizations are starting to calculate their blended effective token cost (factoring in caching, input/output ratios, etc.), since list prices rarely reflect what they actually pay.
- Agent frameworks are moving towards granular process control to prevent costly downside risks. All organizations should have basic spend limits to avoid worst-case scenarios.
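A blended effective rate is essentially a token-weighted average of the rates you actually pay. The sketch below assumes a simple pricing model with three token classes (uncached input, cached input, and output) and ignores cache-write surcharges, batch discounts, and negotiated pricing; the usage volumes and rates in the example are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    uncached_input: int  # input tokens billed at the full input rate
    cached_input: int    # input tokens billed at the discounted cache-read rate
    output: int          # generated tokens


@dataclass(frozen=True)
class Rates:
    # USD per million tokens; the values used below are hypothetical
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def blended_cost_per_m(usage: Usage, rates: Rates) -> float:
    """Token-weighted average price in USD per million tokens processed."""
    total_tokens = usage.uncached_input + usage.cached_input + usage.output
    if total_tokens == 0:
        raise ValueError("no usage recorded")
    weighted = (
        usage.uncached_input * rates.input_per_m
        + usage.cached_input * rates.cached_input_per_m
        + usage.output * rates.output_per_m
    )
    return weighted / total_tokens


# A hypothetical month: 70M of 90M input tokens served from cache.
usage = Usage(uncached_input=20_000_000, cached_input=70_000_000, output=10_000_000)
rates = Rates(input_per_m=3.0, cached_input_per_m=0.3, output_per_m=15.0)
print(f"${blended_cost_per_m(usage, rates):.2f} per million tokens")  # $2.31 per million tokens
```

With heavy cache reuse, the blended rate in this example falls below the nominal input price even though output tokens cost five times as much, which is why list prices alone are a poor basis for forecasting.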
Routing:
Frontier models with high effort settings are orders of magnitude more expensive than commodity models but are a rational choice for high-value, complex problems. Cheaper models are adequate for many routine tasks such as bug triage and customer support.
The problem is that this capability boundary is hard to pin down and constantly shifting. For now, organizations rely on engineers hard-coding model configurations into specific workflows, and this form of cost optimization remains immature.
Trends:
- The research labs are working toward better built-in routers, so expect workable solutions by the end of 2026. However, these may only apply within the labs’ own agent frameworks (e.g., Claude Managed Agents), so plenty of work remains.
- Time and budget preferences will become configurable at the agent level. Expect more customization around “fast” mode and cascades trading off time vs. cost vs. quality.
- Cheap open-source models are saturating more benchmarks, so an increasing swath of current-day workloads won’t require frontier routing.
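A cost-versus-quality cascade can be sketched as a loop over model tiers ordered from cheapest to most capable, escalating only when a quality gate rejects the answer and stopping before a per-task budget is exceeded. This is a minimal illustration, not any lab’s router: the tier names, cost estimates, and `call` functions are hypothetical stand-ins for real model clients, and `accept` stands in for whatever validation a workflow has (tests passing, a grader model, a schema check). It handles cost and quality only; latency could be added as a second budget in the same way.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Tier:
    name: str                   # hypothetical model label, not a real model ID
    est_cost_usd: float         # assumed per-attempt cost estimate
    call: Callable[[str], str]  # stand-in for a real model client


def run_cascade(
    task: str,
    tiers: List[Tier],              # ordered cheapest first
    accept: Callable[[str], bool],  # quality gate for an answer
    budget_usd: float,
) -> Tuple[Optional[str], List[str], float]:
    """Try tiers in order, escalating only when the quality gate rejects an answer."""
    tried: List[str] = []
    spent = 0.0
    for tier in tiers:
        if spent + tier.est_cost_usd > budget_usd:
            break  # escalating further would exceed the task budget
        answer = tier.call(task)
        spent += tier.est_cost_usd
        tried.append(tier.name)
        if accept(answer):
            return answer, tried, spent
    return None, tried, spent


# Usage with stub models: the cheap tier's draft fails the gate, so the task escalates.
tiers = [
    Tier("small-model", 0.01, lambda task: "draft"),
    Tier("frontier-model", 1.00, lambda task: "verified fix"),
]
answer, tried, spent = run_cascade(
    "fix flaky test", tiers, lambda a: a == "verified fix", budget_usd=2.0)
print(answer, tried, round(spent, 2))  # verified fix ['small-model', 'frontier-model'] 1.01
```

When the cheap tier passes the gate, the frontier tier is never called; when the budget is too small to escalate, the function returns `None` and the caller decides whether to raise the budget or hand the task to a person.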
Timelines:
An annual budget planning process will fail. An AI budget set in June 2025 from the trailing 12 months of usage would not account for the costs and opportunities of agentic AI. The pace of change is so rapid that executives need to monitor and manage spend actively: a thoughtful spending policy might only hold until the next model or agent framework release. Enterprise change management efforts with 2-3 year timelines will be obsolete before delivery.
Trends:
- Contracts are being negotiated on short terms (1 year or less) to hedge against capability change.
- Founders with direct spend authority are pushing experiments outside of typical budget cycles.
- Small teams are being empowered to run multiple prototypes in parallel.
Four Positions on Token Spend:
Organizations generally take one of the following stances.
- Headcount-led: Spend = subscription licenses. These organizations were late to LLMs and view AI as a productivity tool. API spend is negligible and is treated like any other third-party API spend.
- Project-led: Spend is allocated per project (“$50K for the contracts migration”). This approach tends to treat AI as a one-time purchase rather than persistent capability and discourages experimentation.
- Metric-led: Employees are evaluated on token usage in performance reviews and compared to peers via leaderboards. Executives use such measures to force adoption, but competition on usage has led to gratuitous waste and policy reversals on several occasions.
- Workflow-led: Spend is the operating cost of an agent-native process; these agent roles are the budget line items. An organization evaluates cost per unit of work completed rather than money spent per developer.
Most organizations are stuck in the Headcount- and Project-led positions. I am not aware of any large organization that has completed a reorientation to Workflow-led spending.
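Under a Workflow-led position, the headline number becomes cost per unit of work. The sketch below assumes each agent run can be tagged with a workflow name, its attributed spend, and a count of units that passed review; all names and figures are hypothetical. Counting only reviewed units, rather than everything an agent emits, is a design choice here meant to avoid rewarding unreviewed output.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Run:
    workflow: str        # hypothetical agent role, e.g. "contract-review"
    spend_usd: float     # model and tool spend attributed to this run
    units_accepted: int  # units of work that passed human or automated review


def cost_per_accepted_unit(runs: Iterable[Run]) -> Dict[str, float]:
    """Total spend divided by accepted units, per workflow (inf if nothing was accepted)."""
    spend: Dict[str, float] = defaultdict(float)
    accepted: Dict[str, int] = defaultdict(int)
    for run in runs:
        spend[run.workflow] += run.spend_usd
        accepted[run.workflow] += run.units_accepted
    return {
        wf: spend[wf] / accepted[wf] if accepted[wf] else math.inf
        for wf in spend
    }


runs = [
    Run("contract-review", 12.0, 40),
    Run("contract-review", 8.0, 10),
    Run("ticket-triage", 30.0, 0),
]
print(cost_per_accepted_unit(runs))  # {'contract-review': 0.4, 'ticket-triage': inf}
```

An infinite cost per unit flags a workflow that is spending without producing accepted work, which is exactly the case a per-developer view of spend tends to hide.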
Rethinking the Firm
Organizational budgets are, to a large extent, a map of how firms view the world and themselves. Reporting hierarchies are theories of how work gets done, headcount follows reporting channels, and budgets are set based on headcount.
Once or twice a generation, a structural revolution ripples through firms and forces a fundamental reevaluation. Offshoring altered the assumptions about team composition and communication. The internet eliminated geographic friction and exposed firms to global competitors and distribution channels. Cloud computing removed the capital requirements for enterprise infrastructure, enabling faster scaling than ever before. The firms that adapted their organizational structure fastest won out over competitors.
For AI, these structural revolutions arrive every three years or so. For now, firms are experimenting with forward-deployed engineers to embed technical capacity in non-technical functions. Some startups are beginning to operationalize agentic workflows independently of staffing and to allocate budgets accordingly. But unlike prior transitions, there is no true “end state” for agent adoption: the capability frontier is advancing faster than firms can plan. The AI transition cannot be treated as a narrow procurement question. The organizations that succeed will treat organizational design as a continuous process, with agents as first-class participants in how work gets done.
