The AI-era philosophy of thrift: how to spend every Token where it counts
The unit on the bill has changed again and again; the instinct to save has never changed.

In the telegraph era, when every word was billed, ink was money. People learned to compress a thousand words to the bare minimum: "Returning soon" was a long letter, and "All is well" carried the heaviest longing.
Later the telephone entered the home, but long-distance calls were billed by the second. Parents kept long-distance calls short: business done, hang up fast. The moment the conversation began to wander, the thought of the bill would snuff out the warmth that had just surfaced.
Later still the internet arrived, billed by the hour. People watched the timer on the screen, closed each web page as soon as they finished reading it, downloaded videos rather than watching online; "streaming" was a luxury verb. At the end of every download progress bar lay both a longing to connect with the world and a dread of the remaining balance.
The unit on the bill has changed again and again; the instinct to save has never changed.
Now the Token has become the currency of the AI era. Yet most people have not learned to keep house in this era, because we have never had to reckon gains and losses in an invisible unit of compute.
When ChatGPT debuted in 2022, almost nobody cared about Tokens. That was AI's all-you-can-eat era: twenty dollars a month, chat as much as you pleased.
But since AI Agents caught fire, Token spending has become something every Agent user has to watch.
Unlike a simple one-question-one-answer chat, a single task flow can trigger hundreds of API calls behind the scenes. An Agent's autonomous thinking has a price: every self-correction and every tool call maps to the meter ticking on your bill. Then one day you discover the balance is gone and you have no idea what the Agent actually did.
In everyday life everyone knows how to save. At the market we strip off the wilted leaves before the produce hits the scale; on the way to the airport, the veteran driver knows to dodge the morning rush.
The money-saving logic of the digital world is the same; only the unit of account has changed from pounds and kilometers to Tokens.

In the past we saved because things were scarce; in the AI era we save for precision.
This article offers an AI-era approach to saving money, so that every Token you spend lands where it counts.
Strip the wilted leaves before the scale
In the AI era, the value of information is determined not by its breadth but by its purity.
AI's billing logic is to charge for every word it reads. Whether you feed it real knowledge or meaningless formatting, if the model reads it, you pay for it.
So the first mindset to build is a subconscious feel for the signal-to-noise ratio.
Every word, every image, every line of code you feed the AI is billed. Before you hand it anything, ask yourself: how much of this is genuinely needed? How much is wilted leaves?
Long-winded openers like "Hello, could you please help me with...", repeated background explanations, and unpruned code comments are all wilted leaves.
Beyond that, the most common waste is tossing a raw PDF or screenshot straight at the AI. It spares you effort, but in the AI era "effort saved" usually means "money spent".
A fully formatted PDF carries, besides the body text, headers, footers, chart labels, hidden watermarks, and a pile of layout markup. None of it helps the AI understand your problem, yet all of it is billed.
Next time, convert the PDF into clean Markdown text before feeding it in. Turn a 10 MB PDF into 10 KB of clean text and you not only cut the cost by 99%, you also let the AI think noticeably faster.
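If you do this often, it is worth a tiny script. Below is a minimal sketch using PyMuPDF (pip install pymupdf); the file names are illustrative, and headers or footers that survive extraction may still need a manual pass.

```python
import fitz  # PyMuPDF's traditional import name

# Pull only the body text out of the PDF, dropping layout markup.
doc = fitz.open("report.pdf")                 # illustrative input file
pages = [page.get_text() for page in doc]
text = "\n\n".join(pages)

with open("report.md", "w", encoding="utf-8") as f:
    f.write(text)

print(f"{len(text)} characters of clean text extracted")
```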
Images are another token devourer.
In a vision model's logic, the AI does not care how beautiful your picture is, only how many pixels it occupies.
Take Claude's official accounting: image Token consumption = (width in pixels × height in pixels) ÷ 750.
A 1,000 × 1,000 pixel image costs roughly 1,334 Tokens, which at Claude Sonnet 4.6 input pricing is about $0.004 per image.
Compress the same image to 200 × 200 pixels and it consumes 54 Tokens, about $0.00016: a 25-fold difference.
Many people throw full-resolution phone photos and 4K screenshots at the AI, even though one such image burns enough Tokens for the AI to read half a novel. If the task is just recognizing text in a picture or making a simple visual judgment, say reading the amount on an invoice, reading the text in a manual, or telling whether an indicator light is red or green, 4K resolution is pure waste; compress the image to the smallest usable resolution.
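A small sketch of both steps, using the formula above and Pillow (pip install pillow) for the downscaling; the file name is illustrative.

```python
import math
from PIL import Image

def image_tokens(width: int, height: int) -> int:
    # Claude's documented rule of thumb: tokens = (width * height) / 750
    return math.ceil(width * height / 750)

print(image_tokens(1000, 1000))  # 1334 Tokens
print(image_tokens(200, 200))    # 54 Tokens

img = Image.open("invoice.jpg")   # illustrative input file
img.thumbnail((200, 200))         # shrink in place, aspect ratio preserved
img.save("invoice_small.jpg", quality=85)
```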
Yet the biggest Token waster on the input side is not file format; it is an inefficient way of speaking.
Many people chat with the AI the way they chat with a neighbor: socially, in fragments. They toss out "write me a web page", wait for the AI to spit out a half-finished product, then add a detail, then go another round. This toothpaste-squeezing dialogue makes the AI regenerate content over and over, and every round of revision is extra Token spend.
In practice, for the same requirement, a drawn-out multi-round dialogue ends up consuming three to five times the Tokens of one clear, complete statement.
The real way to save is to drop the inefficient social experiment and state the requirements, boundary conditions, and examples in one go. Spend less effort explaining what not to do, since negative instructions often cost more than positive ones; tell it what to do, and give one clear, correct example.
Likewise, if you already know where the target is, tell the AI directly; do not make it play detective.
Order an AI to "look at the user-related code" and it must run a sweeping scan, analysis, and guesswork backstage; tell it "look at src/services/user.ts" and the Token consumption is worlds apart. In the digital world, precise information is the greatest economy.
Don't pay for the AI's politeness
Large-model bills hide a detail many people never notice: output Tokens usually cost three to five times more than input Tokens.
In other words, what the AI says to you is far more expensive than what you say to it. Take Claude Sonnet 4.6: input runs only $3 per million Tokens, while output jumps to $15, a full fivefold gap.
Openers like "Sure, I understand your requirement, let me answer that for you" and closers like "Hope the above helps!" are polite social glue in human conversation, but on the API bill every one of these information-free pleasantries costs you money.
The most effective cure for output waste is to set rules for the AI. Use the system prompt to tell it plainly: no filler, no explanations, no restating the question, just the answer.
Rules like these are set once and hold in every conversation, a genuine "invest once, benefit forever". But in writing the rules, many people fall into another trap: piling up directives in long natural-language paragraphs.
Engineers' field data suggest that a directive's power lies not in word count but in density. Compress a 500-word system prompt to 180 words, by deleting empty politeness, merging duplicate instructions, and recasting paragraphs as a terse bullet list, and the AI's output quality barely changes while per-call Token consumption can drop by 64 percent.
There is an even more proactive control: cap the output length. Many people never set an output ceiling and leave the "right to speak" entirely to the AI, which often ends in runaway cost. You may need only a short sentence, but the AI, eager to display some kind of "intelligence", hands you an 800-word essay.
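Combining the two controls, here is a minimal sketch with the Anthropic Python SDK (pip install anthropic): terse rules in the system prompt plus a hard output ceiling. The model ID is illustrative; check the current model list before relying on it.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_RULES = (
    "Answer directly. No greetings, no restating the question, "
    "no closing pleasantries. Prefer terse lists over paragraphs."
)

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model ID
    max_tokens=300,             # hard ceiling on billable output
    system=SYSTEM_RULES,
    messages=[{"role": "user", "content": "Three ways to speed up a slow SQL query?"}],
)
print(response.content[0].text)
```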
If what you want is pure data, force the AI to return a structured format instead of a sprawling natural-language description. For the same information payload, JSON consumes far fewer Tokens than loose prose, because structured data strips out all the connective tissue, hedges, and self-explanations, keeping only the concentrated logical core. In the AI era, what is worth paying for is the value of the result, not the AI's pointless self-narration.
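A sketch of the idea: spell out the schema, demand bare JSON, and parse the reply directly. The invoice text, schema, and simulated reply are all made up for illustration.

```python
import json

invoice_text = "ACME Corp | Invoice date 2025-03-14 | Total due: $1,280.00"

# Demand bare JSON; every connective word the model skips is output
# Tokens you do not pay for.
prompt = (
    "Extract vendor, date (YYYY-MM-DD) and total (number) from the invoice "
    'below. Return ONLY compact JSON like {"vendor": "", "date": "", '
    '"total": 0}. No commentary, no markdown fence.\n\n' + invoice_text
)

# Send `prompt` through your client of choice; the reply below is simulated.
reply = '{"vendor": "ACME Corp", "date": "2025-03-14", "total": 1280.0}'
data = json.loads(reply)
print(data["total"])  # 1280.0
```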
Beyond that, the AI's "overthinking" may be quietly draining your balance.
Some advanced models have an "extended thinking" mode that performs massive internal reasoning before answering. That reasoning is billed too, and at output prices.
The mode is built for complex tasks that genuinely need deep logic, yet many people leave it on for simple questions. For tasks that need no deep reasoning, telling the AI up front "no explanation needed, just the answer", or switching the mode off altogether, saves real money.
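In the Anthropic API, extended thinking is opt-in per call, so for simple questions the cheapest move is simply not to request it. A sketch follows; the model ID is illustrative, and the commented-out parameter reflects the documented API shape as I understand it, so verify against current docs.

```python
import anthropic

client = anthropic.Anthropic()

# Simple lookup: no extended thinking requested, so no reasoning Tokens billed.
response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model ID
    max_tokens=50,
    # thinking={"type": "enabled", "budget_tokens": 10_000},  # reserve for hard tasks
    messages=[{"role": "user", "content": "In one word: capital of Australia?"}],
)
print(response.content[0].text)
```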
Don't make the AI reread the old ledger
Large models have no real memory. They just frantically reread the old ledger.
This is a billing mechanism many people never learn. Every time you send a new message in a dialogue window, the AI does not start from your latest sentence; it rereads everything said before, every turn, every code snippet, every reference document, and only then answers you.
On the Token bill, this total recall is not free. The cost of the behind-the-scenes full reread snowballs as the rounds pile up, which guarantees that the heavier a conversation's history, the more expensive each new question becomes.
One study tracked 496 real conversations of more than 20 turns. The first message averaged 14,000 Tokens read, at a cost of about 3.6 cents; by the fiftieth message the model was rereading an average of 79,000 Tokens, about 4.5 cents per message, with rereading history accounting for roughly 80 percent of total cost. The context keeps swelling, too: by message 50 the AI was handling 5.6 times the context of message 1.
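To feel how the reread snowballs, here is a toy calculation. The per-turn size and price are assumptions for illustration, not figures from that study.

```python
PRICE = 3 / 1_000_000  # assumed input price: $3 per million Tokens
PER_TURN = 1_500       # assumed Tokens added per turn (question plus answer)

history, total = 0, 0.0
for turn in range(1, 51):
    history += PER_TURN
    total += history * PRICE  # the entire history is reread every turn
    if turn in (1, 10, 50):
        print(f"turn {turn:2d}: rereading {history:6,} Tokens, "
              f"cumulative input cost ${total:.4f}")
```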
The simplest habit that fixes this: one task, one dialogue box.
When a topic is finished, start a new conversation without hesitation; don't treat the AI as a chat window that never closes. It sounds simple, but plenty of people cannot bring themselves to do it, thinking "what if I need the old context later?" In reality, most of the what-ifs never happen, and in the meantime you are paying several times over on every single message.
When a dialogue genuinely must continue but the context has grown heavy, tools can compress it. Claude Code has a /compact command that condenses a long history into a short summary, a tidy decluttering of the conversation.
An even bigger money-saver is Prompt Caching. If you reuse the same system prompt, or attach the same reference document in every conversation, the AI can cache that part of the input. Subsequent calls read it at a deep discount instead of full price.
Anthropic's official pricing puts cache-hit Tokens at 1/10 of the normal input price, and OpenAI's Prompt Caching cuts input costs by about 50%. A paper published on arXiv in January 2026 tested long-running tasks across several AI platforms and found that prompt caching reduced API costs by 45% to 80%.
In other words, the first time you feed the AI something you pay full price; on Anthropic's pricing, every call after that pays a tenth. For anyone who reuses the same standard documents or system prompts day after day, this feature saves a mountain of Tokens.
But Prompt Caching has a precondition: the content and order of your system prompt and reference documents must stay exactly identical and sit at the very front of the conversation. Any change invalidates the cache, and you pay full price again. So if you have a fixed working preamble, freeze it and stop fiddling with it.
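A sketch of Anthropic-style prompt caching: mark the large, stable prefix as cacheable and keep it byte-for-byte identical across calls. The model ID and file name are illustrative; the cache_control block follows Anthropic's documented prompt-caching API.

```python
import anthropic

client = anthropic.Anthropic()

with open("style_guide.md", encoding="utf-8") as f:
    STYLE_GUIDE = f.read()  # the large, never-changing reference document

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model ID
    max_tokens=500,
    system=[
        {
            "type": "text",
            "text": STYLE_GUIDE,
            "cache_control": {"type": "ephemeral"},  # cache hit on later calls
        }
    ],
    messages=[{"role": "user", "content": "Review this paragraph for style: ..."}],
)
print(response.content[0].text)
```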
The last context-management technique is loading on demand. Many people stuff every rule, document, and caveat into the system prompt, "just in case".
The price is that even a trivial task force-loads thousands of words of rules and burns a pile of Tokens for nothing. Claude Code's official documentation suggests keeping CLAUDE.md within 200 lines, splitting scenario-specific rules into separate skill files, and loading each rule set only in the scenario that needs it. Keeping the context lean is the deepest respect you can pay your compute bill.
Don't drive the Porsche to buy groceries
Different AI models differ enormously in price.
Claude Opus 4.6 charges $5 per million input Tokens and $25 for output; Claude Haiku 3.5 charges $0.8 for input and $4 for output, roughly a sixfold gap. Hiring the top model for information-gathering and format-cleanup chores is not just slow, it is expensive.

Smart usage means bringing the tiered division of labor we take for granted in human society into AI society: match tasks of different difficulty to models at different prices.
Just as in the real world you would not hire a million-a-year expert to haul bricks, the same goes for AI. Claude Code's official documentation likewise suggests letting Sonnet handle most programming work, reserving Opus for complex architecture decisions and multi-step reasoning, and handing simple subtasks to Haiku.
More concretely, the practical pattern is a two-stage workflow. Stage one uses free or cheap base models for the dirty work: data collection, format cleanup, first drafts, simple classification and aggregation. Stage two feeds the refined, high-purity material to the top model for the core decisions and deep processing.
For example, to analyze a 100-page industry report, first have Gemini Flash extract the key data and conclusions into a 10-page brief, then send that brief to Claude Opus for deep analysis and judgment. This two-stage workflow compresses cost sharply while preserving quality, as the sketch below illustrates.
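Here is the same two-stage idea within a single provider's SDK, cheap tier for distillation and top tier for judgment. Model IDs and the input file are illustrative; the Gemini-plus-Claude pairing from the example works the same way.

```python
import anthropic

client = anthropic.Anthropic()

def ask(model: str, prompt: str, max_tokens: int) -> str:
    msg = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

report = open("industry_report.txt", encoding="utf-8").read()  # illustrative file

# Stage 1: the cheap model distills 100 pages into a tight brief.
summary = ask(
    "claude-haiku-4-5",  # illustrative cheap-tier model ID
    "Extract only the key figures and conclusions, max 800 words:\n" + report,
    max_tokens=1_500,
)

# Stage 2: the top model reasons over the distilled brief, never the raw report.
analysis = ask(
    "claude-opus-4-1",   # illustrative top-tier model ID
    "Based on this brief, assess the three biggest risks:\n" + summary,
    max_tokens=800,
)
print(analysis)
```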
A step beyond simple two-stage processing is deep division of labor through task decomposition. A complex engineering task can usually be split into several independent subtasks, each matched to the most suitable model.
In a coding task, for instance, let the cheap model write the scaffolding and sample code first, and hand only the core logic to the expensive model. Each subtask gets a clean, focused context; the results are more accurate and the bill is smaller.
Maybe you never needed to spend that Token
Everything above tackles the tactical question of how to save. But a more fundamental question gets overlooked by many: does this action need to spend Tokens at all?
The greatest saving is not optimizing the call; it is deciding not to make it. We reach for the AI by reflex, forgetting that in many scenarios this is shooting a mosquito with a big, expensive cannon.
Take automatic email processing. The AI treats every message as an independent task to understand, classify, and answer, and Tokens pour out. But spend 30 seconds scanning the inbox yourself, manually filter out the mail that obviously needs no AI, and hand over only the rest: the cost instantly drops to a fraction. Human judgment here is not an obstacle; it is the best filter.
People in the telegraph age knew exactly what one extra word cost, so they weighed every word: an instinct bred by scarcity. The AI era is no different. Once you truly know what it costs to make the AI say one more sentence, you will naturally weigh whether a question is worth asking, whether a task needs the top model or a cheap one, and whether that context is dead weight.
This is the most valuable cost-saving skill of all. The smartest usage is not letting AI replace people, but letting AI and people each do what they do best. When this sensitivity to Tokens hardens into a conditioned reflex, you stop being someone who pays for compute and become someone who commands it.
