OpenAI is developing its own AI chip. Will the cost of ChatGPT fall?

2026/06/26 01:15

The first step targets the cost of inference

TL;DR
• OpenAI and Broadcom unveiled Jalapeño, an in-house AI inference chip, with initial deployment targeted by the end of 2026.
• The chip targets LLM inference rather than replacing training GPUs, and the early performance claims still come from the companies' own tests.
• OpenAI has taken a first step toward reducing its dependence on Nvidia, but the scale, volume, and real-workload performance have not yet been disclosed.

On June 24, OpenAI and Broadcom announced Jalapeño, an in-house AI accelerator for large language model inference, which OpenAI describes as its first "Intelligence Processor". According to the two companies' announcements, Jalapeño is still in the testing and sampling phase, with initial deployment targeted by the end of 2026 and expansion continuing over the following years. For OpenAI, the point is not to replace all of its GPUs immediately, but to move the growing inference demand from ChatGPT, Codex, the API, and future agent products onto a combination of software and hardware better suited to its own models.

Starting from inference, with an eye on day-to-day query costs

Demand for AI chips broadly divides into training and inference. Training determines the upper limit of a model's capability; inference determines whether, once training is complete, the model can serve requests from a large user base at an acceptable cost.

Jalapeño targets the latter. OpenAI says the chip will run large language model workloads serving ChatGPT, Codex, the API, and future agent products. For ordinary users it does not directly change the chat interface, but it may affect the cost, speed, and scalability of handling requests on the back end.

This is also the common direction of the wave of in-house chips at AI companies. Google has its TPU, and Amazon and Meta are also pushing custom accelerators. These chips do not necessarily replace Nvidia completely; instead they move the most stable, largest-scale internal workloads onto chips better suited to the companies' own models and software stacks.

OpenAI has long depended on external GPU supply, especially high-end accelerator cards. As usage of its models grows, relying on purchases of general-purpose GPUs alone brings two pressures: cost and supply are constrained by the external supply chain, and general-purpose chips do not necessarily reach maximum efficiency for a specific model and service pattern. The direct change Jalapeño brings is to move OpenAI from being purely a buyer of compute to taking part in defining it.

Broadcom handles silicon implementation and connectivity technology, but the wafer foundry has not been disclosed

Jalapeño was designed from scratch by OpenAI. Broadcom provides silicon implementation, networking, and connectivity technology, and Celestica takes part in delivery at the card, rack, and system levels. The official announcement did not disclose the wafer foundry, so Broadcom cannot simply be described as responsible for manufacturing.

Broadcom has deep experience in custom ASICs and data center networking, and over the past few years it has become one of the suppliers behind the in-house chips of several AI giants. For OpenAI, choosing Broadcom looks more like handing the architectural requirements that originate inside a model company to a mature semiconductor and systems supply chain that can turn them into a deployable product.

The performance claims still come with caveats. According to both companies, Jalapeño aims to match the capability of current leading AI accelerators while delivering a significant improvement in performance per watt over today's advanced solutions. They also stressed that engineering samples are already running workloads in the lab at target frequency and power, that final performance is still being measured, and that a more detailed technical report will be published in the coming months.

No third-party benchmarks have been published, and key figures such as exact throughput, latency, power consumption, and the size of the reduction in cost per query have not been disclosed. Some market commentary places the chip directly alongside Nvidia's Blackwell or Google's TPU, but until data from real deployments is available, any claimed cost reduction or performance lead counts only as early company testing.
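
To see why the undisclosed figures matter, here is a rough sketch of how a performance-per-watt gain would feed into the electricity cost of a single query. The function name, the tokens-per-joule metric, and every number are hypothetical placeholders chosen for illustration, not figures for Jalapeño or any other chip, and the calculation ignores hardware purchase, cooling, and idle capacity.

```python
def energy_cost_per_query(tokens_per_query, tokens_per_joule, usd_per_kwh):
    """Electricity cost in USD of one query (energy only, nothing else)."""
    joules = tokens_per_query / tokens_per_joule
    kwh = joules / 3.6e6  # 1 kWh = 3.6 million joules
    return kwh * usd_per_kwh


# All values below are made-up placeholders, not measurements of any real chip.
baseline = energy_cost_per_query(tokens_per_query=1000, tokens_per_joule=2.0, usd_per_kwh=0.10)
improved = energy_cost_per_query(tokens_per_query=1000, tokens_per_joule=3.0, usd_per_kwh=0.10)

# Fractional reduction in energy cost when tokens per joule rises from 2.0 to 3.0.
saving = 1 - improved / baseline
print(f"energy cost per query falls by {saving:.0%}")
```

Under these assumptions a 1.5x gain in tokens per joule trims the energy share of the cost by about a third, which is why the real throughput and power figures are needed before any claim about cost per query can be checked.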

Nine months to tape-out: AI-assisted chip design becomes another thread

Jalapeño has another officially highlighted detail: it took only nine months to go from initial design to tape-out. That figure is not the full production cycle; it reflects how quickly the chip design reached the manufacturing-preparation stage.

In traditional chip development, the path from architecture design through verification and the EDA flow to tape-out usually takes a long time. OpenAI says the process was accelerated with the help of its own models. The company is trying to show that AI can not only write code and generate content but also take part in hardware engineering workflows, helping to shorten chip design and verification time.

If this approach can be reused in later products, Jalapeño is not just a single-generation chip but the starting point of a multi-generation custom compute platform for OpenAI. OpenAI and Broadcom announced a 10 GW custom AI accelerator collaboration in October 2025, with deployment scheduled to begin in the second half of 2026 and to be completed by the end of 2029. Jalapeño is the first product from that collaboration to be shown publicly.

However, design speed is not the same as deployment speed. Whether Jalapeño reaches OpenAI's core infrastructure also depends on subsequent volume production, packaging, high-bandwidth memory supply, server system integration, and data center installation. A bottleneck in any of these could affect the pace of initial deployment before the end of the year.

A way to reduce dependence on Nvidia, not to replace it

The market most readily interprets Jalapeño as "OpenAI challenges Nvidia". More precisely, OpenAI has begun looking for a custom path outside Nvidia for part of its inference load.

Nvidia's strength is not limited to single-GPU performance; it also includes the CUDA ecosystem, training and inference toolchains, system-level interconnect, developer base, and supply scale. Even if OpenAI deploys Jalapeño successfully, it will not leave Nvidia right away. High-end GPUs remain key infrastructure, especially for frontier large-model training, general-purpose R&D, and diverse workloads.

Inference is the more natural entry point for custom chips. Request patterns are relatively stable, model structures are more controllable, and service targets are clearer. If OpenAI moves the medium- and high-frequency, standardized inference tasks of ChatGPT, Codex, or the API onto Jalapeño, there would be real gains in cost and supply flexibility.

For Broadcom, AI companies developing their own chips does not mean chip companies get bypassed; it brings more custom projects. However, the revenue potential of such projects still depends on component costs, delivery scale, and system complexity, especially the price and availability of critical supplies such as high-bandwidth memory.

The real test for Jalapeño is not whether it can be benchmarked against Blackwell or TPU at launch, but how widely it is deployed, what real workloads it runs, and how much it reduces inference costs. For OpenAI, this "Mexican chili" is only a first taste of relief from the GPU bottleneck; whether it can become the main course will have to wait for deployment data.
