Token Budget Wars: Enterprise AI Enters the "Age of Accounting"

2026/05/29 02:37
👤ODAILY

AI costs, ROI and intra-enterprise resource allocation

Original title: Token Budget Wars

Original author: Jaya Gupta

Translated by: Peggy

Editor's note: Enterprise AI is moving from the "whether to use it" phase to the "how to account for it" phase.

Over the past two years, many companies pushed employees to use AI, largely to keep pace with technology trends and competitive pressure. But as AI inference costs moved from experimental budgets into ongoing operating expenses, CEOs and CFOs began asking a more practical question: how much value has AI actually created? What does each dollar of token spend produce?

This is the heart of Token Budget Wars. The so-called token budget war is not just an attempt to shrink the AI bill; it is a reallocation of resources: deciding which workloads deserve more token budget, which tasks should move to cheaper models, which processes can replace outsourcing or in-house labour, and which are simply inefficient consumption.

The article's most important point is that AI usage does not equal value. In the SaaS era, usage generally meant the software was being used; in the AI era, token consumption only shows that "the meter is running". The same workflow can cost several times more or less depending on the prompt, the context, the model chosen and the number of retries. A growing bill may mean AI is doing real work, or it may mean the system is spinning its wheels.

Therefore, in the next phase of enterprise AI, the key is not just model capability but whether token costs can be matched to business results. The first phase proved that AI can do the work; the second must answer whether the work is worth what it costs.

The following is the original text:

Enterprise AI has moved from "whether to use it" to "how to allocate it"

At the top of the company, the new currency is your ability to quantify AI return on investment. Every function is being asked the same questions: What did you produce? What did it cost? For the past two years, CEOs woke up to Jim Cramer on CNBC (#bearish), watched competitors announce productivity gains, and told their companies to use AI. Now the real pressure comes from the follow-up question: prove the value.

Claude was released in November 2025, by which point most enterprises had already locked their annual budgets for 2026. By the first quarter, actual enterprise usage had run well past what was planned. Inference cost is no longer just an experimental budget line; it is a continuing operating cost. That raises a new question: where does AI really create value?

This question is hard to answer because the utility of a token is not quantified. The bill does not tell you whether the spend replaced labour, generated revenue, reduced risk, sped up a process, or just paid for a group of engineers burning tokens with abandon (#metames). At hundreds of thousands of dollars, the spend still looks like an experiment. Past a certain threshold, say seven figures, it becomes infrastructure. Technical differences start to have a material effect on the income statement: for the same workflow and the same inputs, two runs can differ in token cost by 5 to 10 times without anything visibly going wrong. At experimental scale that volatility is already expensive; at infrastructure scale it becomes a number the CFO has to explain to the CEO.

Call it "marginal token utility": the business value produced by each dollar of inference cost. This is the number that really matters at scale, and today it is invisible to most companies.

The question in the boardroom is shifting from "Is AI useful?" to "Where does AI really create leverage?" That is why the so-called token budget war is, at bottom, a fight over token allocation.

The fight over token allocation has escalated quickly because it collides with a management instinct that has held for 30 years: a bigger team means a bigger job, and a bigger job means more power. In the past, a senior executive's success was measured by the size of the team they ran: the number of direct reports and the headcount beneath them on the org chart.

But when intelligence becomes a scarce resource, the new marker of status becomes: how much intelligence can you deploy?

AI spending is essentially competing with labour costs

Most AI budget requests boil down to one of three propositions: replacing outsourced labour, replacing internal labour, or generating new revenue.

An employee has a salary. A BPO outsourcing contract is priced per ticket, per claim, per invoice or per audit. People understand these units. Inference cost is more complicated, because what it finally costs to complete a task depends on how the system behaves while executing it. A claims-processing task that needs three retries, a manual correction and a frontier model may cost more than the outsourced labour it was meant to replace. That is why the discussion is shifting to a different question: what does it cost to complete one outcome? For example: cost per ticket resolved, per claim processed, per contract reviewed, per invoice completed, per additional hire avoided, per customer retained, or per dollar of revenue converted.

Executives have realized that BPO is the easiest place to set a benchmark, because it is already priced per completed unit. Comparing internal staff with AI is much harder: employees do many different things in a day (including scrolling TikTok over lunch); productivity gains usually show up as avoided hiring or as capacity freed up in scattered pieces; and managers resist cutting headcount on the strength of partial automation alone. BPO gives the business team a quantifiable baseline.
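To make the "cost per completed outcome" comparison concrete, here is a minimal sketch in Python. The `WorkflowRun` fields, the dollar amounts and the BPO unit price are all hypothetical values chosen for illustration, not figures from the article; the point is only that failed attempts and manual corrections belong in the numerator, while only finished units count in the denominator.

```python
from dataclasses import dataclass


@dataclass
class WorkflowRun:
    """One attempt at a unit of work, e.g. one claim (hypothetical schema)."""
    inference_cost_usd: float     # token spend for this attempt
    human_review_cost_usd: float  # cost of any manual correction
    completed: bool               # whether the unit of work was actually finished


def cost_per_completed_unit(runs: list[WorkflowRun]) -> float:
    """Total spend (inference plus human) divided by units actually completed."""
    completed = sum(1 for r in runs if r.completed)
    if completed == 0:
        raise ValueError("no completed units; cost per outcome is undefined")
    total = sum(r.inference_cost_usd + r.human_review_cost_usd for r in runs)
    return total / completed


# Illustrative data: four attempts, one of which failed and one of which
# needed a manual correction.
runs = [
    WorkflowRun(0.40, 0.00, True),
    WorkflowRun(1.20, 2.00, True),
    WorkflowRun(0.90, 0.00, False),
    WorkflowRun(0.50, 0.00, True),
]
BPO_PRICE_PER_CLAIM_USD = 3.00  # assumed contract price, for comparison only

agent_cost = cost_per_completed_unit(runs)
print(f"agent: ${agent_cost:.2f} per completed claim, "
      f"BPO: ${BPO_PRICE_PER_CLAIM_USD:.2f} per claim")
```

The failed run still contributes its $0.90 to the total, which is what keeps the comparison with a per-unit BPO price honest.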

This differs from the logic of SaaS. SaaS trained enterprises to treat usage as a proxy for value.

But AI breaks this. The amount of inference the same workflow consumes can vary widely with the prompt, the context, the model selected, the tools called, the number of retries, and whether the agent gets stuck. The unit on the bill, the token, is stable, but the amount of work it represents is not.

More precisely: signal and noise are measured in the same unit. A rising token bill may mean real work is getting done, but it may also mean compute is being wasted on bad prompts, irrelevant context, unnecessary tool calls, repeated reasoning and overpowered models. Two businesses can have identical token bills while the reality underneath is very different: one is turning inference into results, the other is paying for waste, and the two look exactly the same as line items on the bill.

SaaS usage tells you the software is being used. AI usage only tells you the meter is running. It does not tell you whether the company is actually moving faster.

Why marginal token utility is invisible

There are three main reasons.

The first is retries. If the probability that a single attempt completes the workflow correctly is p, the expected token consumption per resolved workflow is roughly T/p, where T is the base cost of one attempt. If the completion rate drops from 90% to 70%, the effective cost per resolution rises from T/0.9 to T/0.7, an increase of about 28.6% rather than 20%, because failures compound. In business workflows, inputs tend to be messy and edge cases matter. Failure does not just lower accuracy; it changes the economics.
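The T/p figure can be checked with a few lines of Python. This is a minimal sketch under two assumptions that are not stated in the article: every attempt costs the same base amount T, and attempts succeed independently with probability p, so the number of attempts until success follows a geometric distribution with mean 1/p. The function name and the $1.00 base cost are illustrative.

```python
def expected_cost_per_resolution(base_cost: float, success_prob: float) -> float:
    """Expected spend until the first success, assuming independent attempts
    that each cost `base_cost` and succeed with probability `success_prob`."""
    if not 0.0 < success_prob <= 1.0:
        raise ValueError("success_prob must be in (0, 1]")
    return base_cost / success_prob


T = 1.00  # illustrative base cost per attempt, in dollars
cost_at_90 = expected_cost_per_resolution(T, 0.90)
cost_at_70 = expected_cost_per_resolution(T, 0.70)
increase = cost_at_70 / cost_at_90 - 1.0

print(f"p = 0.90: ${cost_at_90:.3f} per resolved workflow")
print(f"p = 0.70: ${cost_at_70:.3f} per resolved workflow")
print(f"increase: {increase:.1%}")
```

Real workflows can break both assumptions (a retry often carries a longer context and so costs more than the first attempt), in which case the true increase would be larger than this estimate.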

The second is context bloat. For operations that depend heavily on the attention mechanism, inference cost grows roughly as O(n^2) in the context length n. Double the context and the cost roughly quadruples. Everyone wants the model to have enough information, so systems tend to oversupply: five documents would have been enough, but 50 are retrieved; the connector pulls in the entire email thread; and the agent keeps carrying a stale conversation history.
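A short Python sketch shows how quickly the quadratic approximation described above grows. It assumes a pure O(n^2) cost model, which is the article's simplification: it describes the attention computation, not necessarily how a given provider bills for tokens. The function name and the 4,000-token baseline are illustrative.

```python
def relative_attention_cost(context_tokens: int, baseline_tokens: int) -> float:
    """Cost relative to a baseline context, under a pure O(n^2) assumption."""
    if context_tokens <= 0 or baseline_tokens <= 0:
        raise ValueError("token counts must be positive")
    return (context_tokens / baseline_tokens) ** 2


baseline = 4_000  # illustrative context size for "five documents"
for n in (4_000, 8_000, 40_000):
    ratio = relative_attention_cost(n, baseline)
    print(f"{n:>6} tokens -> {ratio:.0f}x baseline cost")
```

Under this model, going from five retrieved documents to 50 (ten times the context) costs on the order of a hundred times as much for the attention-bound part of the work.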

The third is routing. When a team does not know which model is good enough, it defaults to the strongest one. A basic classification task may end up running on the same model that was chosen for complex reasoning. Once call volume reaches the millions, the choice between sending simple tasks to a small model and sending everything to a frontier model is often the difference between a manageable bill and a board-level issue.
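The routing effect can be sketched with a toy bill calculation in Python. Everything here is hypothetical: the two model tiers, the per-call prices, the set of task types treated as "simple" and the monthly call volumes are invented for illustration and do not reflect any vendor's pricing.

```python
from typing import Callable

# Assumed average cost per call for each tier (illustrative numbers only).
PRICE_PER_CALL_USD = {"small": 0.0004, "frontier": 0.02}

# Task types assumed to be safe for the small model.
SIMPLE_TASKS = {"classification", "extraction"}


def route(task_type: str) -> str:
    """Send known-simple task types to the small model, the rest to frontier."""
    return "small" if task_type in SIMPLE_TASKS else "frontier"


def monthly_bill(call_counts: dict[str, int], router: Callable[[str], str]) -> float:
    """Total spend for a month of calls under the given routing policy."""
    return sum(PRICE_PER_CALL_USD[router(task)] * count
               for task, count in call_counts.items())


calls = {"classification": 9_000_000, "complex_reasoning": 1_000_000}

all_frontier = monthly_bill(calls, lambda _task: "frontier")
routed = monthly_bill(calls, route)

print(f"everything on frontier: ${all_frontier:,.0f}")
print(f"with routing:           ${routed:,.0f}")
```

The hard part in practice is not the arithmetic but knowing which task types belong in `SIMPLE_TASKS`, which requires measuring quality per task rather than guessing.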

Non-software industries will feel this pain in the form of a "transformation". Software companies were the first to see the problem, because the work being optimized is already fully instrumented. Engineering teams have metrics such as PRs, commits, deployments, incidents, cycle time and mean time to recovery, and these metrics connect to the product. It is not perfect, but this kind of work is easier to measure.

Non-software enterprises will feel the problem more deeply because their work is operational: claims settlement, underwriting, customer service tickets, compliance reviews, supply chain exceptions, payment disputes. Companies with real-world assets will face the same problem. These workflows have traditionally been measured in headcount, cycle time, SLA attainment and error rate, and the bar is often higher: results must hold up under audit, not merely on average. The unit of work and the unit of cost do not speak the same language and do not sit in the same organization. The technical team can see token consumption and the business side can see changes in the workflow, but connecting the two requires several teams to agree first on what to measure.

I think software companies will experience the token budget war as a productivity measurement problem, which lines up with the many "AI layoffs" we have already seen; non-software companies will experience it as a transformation problem.

The missing layer is attribution from tokens to results. An enterprise needs a translation layer that links inference spend to the work completed and the business results produced. This layer must answer three questions. What is the true cost of this workflow, including retries and corrections? Which parts of the execution trajectory actually mattered, and which were wasted? Did the work change the business model: for example, fewer tickets handled per customer service agent, shorter resolution cycles, a smaller BPO budget, deferred hiring? The next layer is outcome attribution in the language of the business. Instead of simply saying "this workflow cost $2.13", it says "this type of claim is cheaper for the agent to process than BPO, but when the policy requires extra exception documents, retries destroy the economics."

Measurement becomes memory. To connect a token to a result, an enterprise must capture what happened in between: what the agent saw, what it retrieved, which tools it used, what it ignored, where it retried, when a human overrode it, which rule was applied, which precedent was used, and why one path worked and another failed. The measurement layer must record the decision trajectory, which is exactly what enterprises have never really had. Systems of record capture what happened, but rarely why. A CRM, for example, can tell you that a deal was pushed back, but not the unwritten judgment behind the sales forecast.
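One way to picture such a decision trajectory is as a list of typed steps, each carrying its own cost. The Python sketch below is an assumed, simplified schema: the `Step` and `Trajectory` classes, the step kinds and the example claim are all hypothetical, and the amounts are chosen only so that the total matches the article's "$2.13" example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One event in an agent run (hypothetical schema)."""
    kind: str        # e.g. "retrieval", "tool_call", "retry", "decision"
    detail: str      # what happened and, where known, why
    cost_usd: float  # inference cost attributed to this step


@dataclass
class Trajectory:
    """The recorded path from context to action to outcome for one workflow."""
    workflow: str
    outcome: str
    steps: list[Step] = field(default_factory=list)

    def total_cost(self) -> float:
        return sum(s.cost_usd for s in self.steps)

    def cost_by_kind(self) -> dict[str, float]:
        """Aggregate spend per step kind, to show where the money went."""
        totals: dict[str, float] = {}
        for s in self.steps:
            totals[s.kind] = totals.get(s.kind, 0.0) + s.cost_usd
        return totals


trace = Trajectory(
    workflow="claim_settlement",
    outcome="approved",
    steps=[
        Step("retrieval", "fetched policy documents", 0.25),
        Step("tool_call", "looked up claimant record", 0.10),
        Step("retry", "policy required an extra exception document", 0.75),
        Step("retry", "re-ran after the document was attached", 0.75),
        Step("decision", "applied standard payout rule", 0.28),
    ],
)

retry_share = trace.cost_by_kind()["retry"] / trace.total_cost()
print(f"total: ${trace.total_cost():.2f}, spent on retries: {retry_share:.0%}")
```

A record like this is what lets a cost report say not only what a workflow cost but that most of the spend went to retries triggered by one specific policy requirement.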

The reasoning behind decisions is one of the most perishable assets in a company, because it lives in Slack threads, email chains, escalation meetings and people's heads. The problem is that people leave and processes change.

AI changes this because agents create trajectories. Every retrieval, tool call, retry, escalation, manual correction and final decision is part of the path from context to action to outcome. At first, companies will capture these trajectories to justify the spend. But once captured, they are worth more than the cost report itself, because they become a durable record of how the organization actually makes decisions. (Cough, context graph. Though I really have been hearing that term a lot lately.)

The allocation layer is the real prize. If inference becomes a metered resource in the customer's operating model, every dollar must prove it was worth spending. Which vendors can explain when tokens converted into results, when they did not, and why?

Enterprises will not figure this out on their own. They will buy it as a transformation. Fortune 500 companies have done this many times before: buckle up, hire McKinsey, bring in every ex-Palantir employee on the market, and have the CEO drive the change from the top down. Token-to-result attribution will arrive the same way ERP, BI and digital transformation did: as a "program" with executive sponsorship, whose underlying infrastructure eventually becomes the new source of truth. The founders who can pull this off will build different kinds of founding teams, and will themselves differ from the traditional founder archetype.

Whoever knows what tokens actually accomplished can make the allocation decisions: which workloads deserve more budget, which should be capped, which should switch to cheaper models, which should stay with people, and which can replace BPO. Once you make those decisions, you control how AI spend flows through the enterprise, and you earn the trust needed to allocate the resource.

The first phase of enterprise AI proved that models can do the work. The next phase will determine how much of that work is worth paying for. As Charlie Munger said: show me the incentive and I'll show you the outcome.
