
Full GTC speech: market demand will exceed $1 trillion by 2027; everyone should develop an OpenClaw strategy

2026/03/17 14:22

The underlying business logic driving future growth will be "Token Factory Economics."

ORIGINAL TITLE: "Jensen Huang's GTC Full Speech: The Age of Reasoning, At Least a Trillion Dollars in 2027, OpenClaw Is the New Operating System"
Original by: Wall Street

On 16 March 2026, the conference officially opened with a keynote address by NVIDIA founder and CEO Jensen Huang.

At this conference, regarded as the "annual pilgrimage of the AI industry," Jensen Huang described NVIDIA's transformation from a chip company into an AI infrastructure and factory company. Facing the market's biggest concerns, the sustainability of its results and its remaining room for growth, Huang detailed the underlying business logic that drives future growth: "Token Factory Economics."

Performance guidance is extremely optimistic: "at least $1 trillion in 2027."

Over the past two years, global demand for AI compute has exploded exponentially. As large models have evolved from perception to generation to reasoning and action (agentic tasks), compute consumption has risen dramatically. Jensen Huang gave a very strong forecast for the order book and market ceiling that investors care most about.

Jensen Huang stated in his speech:

Last year at this time, I said we saw $500 billion in high-certainty demand, covering Blackwell and Rubin through 2026. Now, right here, I see at least $1 trillion in demand for 2027.

Jensen Huang's trillion-dollar outlook is expected to push NVIDIA's share price up 4.3 percent.

He then went further on this number:

Does that sound reasonable? That is exactly my point. In fact, we will still be supply-constrained. I am certain actual compute demand will be far higher.

Huang pointed out that NVIDIA's systems have proved to be the world's "lowest-cost infrastructure." That generality lets customers fully utilize, across nearly every kind of AI model and over a long life cycle, the $1 trillion they have invested.

Currently, 60 percent of NVIDIA's business comes from the top five hyperscale cloud service providers, while the other 40 percent is widely distributed across sovereign clouds, enterprises, industry, robotics, and edge computing.

Token Factory Economics: every watt of performance determines the business lifeline

To justify this $1 trillion of demand, Jensen Huang presented the world's CEOs with a whole new business framework. He noted that the data center of the future is no longer a repository of files but a "factory" that produces tokens (the basic unit of AI output).

Huang stressed:

Every data center, every plant, is by definition power-constrained. A 1 GW plant will never become 2 GW; that is the law of physics and atoms. At fixed power, whoever produces the most tokens per watt has the lowest production cost.

Jensen Huang divides future AI services into tiered business levels:

• Free tier (high throughput, low speed)

• Intermediate tier (~$3 per million tokens)

• Advanced tier (~$6 per million tokens)

• High-speed tier (~$45 per million tokens)

• Hypervelocity tier (~$150 per million tokens)

He pointed out that as models grow larger and contexts longer, AI gets smarter, but the token production rate falls. Huang stated:

In this token factory, your throughput and your token generation speed translate directly into your revenue next year.

Huang emphasized that NVIDIA's architecture lets customers achieve extremely high throughput at the free tier while delivering a staggering 35x performance gain at the highest-value reasoning tier.
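The economics Huang describes reduce to simple arithmetic: at fixed power, revenue scales with tokens per second per watt times the tier price. A minimal sketch, in which all throughput figures are hypothetical and only the per-million-token pricing mirrors the tiers quoted in the speech:

```python
# Sketch of "Token Factory Economics": at a fixed power budget, revenue is
# throughput (tokens/s per watt) times price. Illustrative numbers only.

POWER_WATTS = 1e9                    # a 1 GW "token factory", as in the speech
SECONDS_PER_YEAR = 3600 * 24 * 365

def annual_revenue(tokens_per_sec_per_watt: float,
                   price_per_million_tokens: float) -> float:
    """Yearly revenue of a power-limited plant serving one pricing tier."""
    tokens_per_year = tokens_per_sec_per_watt * POWER_WATTS * SECONDS_PER_YEAR
    return tokens_per_year / 1e6 * price_per_million_tokens

# Two hypothetical architectures at the same $6-per-million tier:
# the one with more tokens per watt earns proportionally more.
baseline = annual_revenue(0.1, 6.0)
improved = annual_revenue(0.35, 6.0)   # 3.5x tokens/watt -> 3.5x revenue
```

Because power is the fixed input, the ratio of revenues is exactly the ratio of tokens per watt, which is the point of Huang's "every watt determines the business lifeline" framing.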

Vera Rubin achieves a 350x gain in two years; Groq fills in hyperspeed reasoning

Under these physical constraints, NVIDIA unveiled its most complex AI computing system ever, Vera Rubin. Huang stated:

I used to talk about Hopper, and I would hold up a chip, and that was cute. But with Vera Rubin, you think in terms of the system. This 100 percent liquid-cooled system completely eliminates traditional cabling; installing a rack used to take two days and now takes only two hours.

Huang noted that through extreme end-to-end hardware co-design, Vera Rubin achieves an astonishing leap within the same 1 GW data center:

In just two years, we raised the token production rate from 22 million to 700 million, a 350x increase. Moore's Law would deliver only about 1.5x over the same period.

To address memory-bandwidth bottlenecks under hyperspeed reasoning conditions (e.g., 1,000 tokens/s), NVIDIA presented the ultimate answer from its integration of the acquired company Groq: asymmetric disaggregated inference.

Huang explained:

The two processors have distinct characteristics: the Groq chip has 500 MB of SRAM, while a Rubin chip has 288 GB of memory.

Huang noted that through the Dynamo software system, NVIDIA routes the compute-heavy prefill phase to Vera Rubin and the latency-sensitive decode phase to Groq. Huang also offered configuration advice for enterprises:

If your workload is mainly high-throughput, use 100 percent Vera Rubin; if you have heavy demand for high-value, programming-grade token generation, give 25 percent of your data-center footprint to Groq.
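That sizing advice can be encoded as a simple rule of thumb. The 50 percent workload threshold below is my own illustrative assumption; the speech gives only the two endpoints (all-throughput shops and heavily latency-sensitive shops):

```python
# Hedged sketch of the rack-allocation heuristic from the speech.
# The 0.5 threshold is an assumption, not a figure from the talk.

def allocate_racks(total_racks: int, latency_sensitive_share: float) -> dict:
    """Split a data center between prefill (Vera Rubin) and decode (Groq).

    latency_sensitive_share: fraction of the workload that is high-value,
    low-latency token generation (e.g., agentic coding).
    """
    # Mostly-throughput shops run 100% Vera Rubin; heavily latency-sensitive
    # shops carve out ~25% of their footprint for Groq, per the speech.
    groq_fraction = 0.25 if latency_sensitive_share > 0.5 else 0.0
    groq = round(total_racks * groq_fraction)
    return {"vera_rubin": total_racks - groq, "groq": groq}

print(allocate_racks(100, latency_sensitive_share=0.2))  # {'vera_rubin': 100, 'groq': 0}
print(allocate_racks(100, latency_sensitive_share=0.8))  # {'vera_rubin': 75, 'groq': 25}
```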

It was also revealed that the Groq LP30 chip, fabbed at Samsung's foundry, is already in volume production and is expected to ship in the third quarter, while the first Vera Rubin rack is running on the Microsoft Azure cloud.

Moreover, on optical interconnect technology, Huang demonstrated the world's first volume-produced co-packaged optics switch, Spectrum-X, and calmed the market's debate over the "copper retreat" narrative:

We need more copper cables, more optical chips, more CPO.

Agents end traditional SaaS; "annual salary plus tokens" becomes the new Silicon Valley standard

Beyond the hardware barriers, Jensen Huang devoted significant space to the AI software and ecosystem revolution, particularly the explosion of agents.

He described the open-source project OpenClaw as "the most popular open-source project in human history," claiming that it took just a few weeks to surpass what Linux achieved over the past 30 years. Huang said that OpenClaw is essentially the "operating system" of the agentic computer.

Huang asserted:

Every SaaS (software-as-a-service) company will become an AaaS (agent-as-a-service) company. To secure the safe deployment of agents that can access sensitive data and execute code, NVIDIA has introduced the enterprise-grade NeMo Claw reference design, adding a policy engine and a privacy router.

For ordinary workers, the change is also near. Huang described the shape of the future workplace:

In the future, every engineer in our company needs an annual token budget. Their base salary might be hundreds of thousands of dollars a year, and on top of that I will give them roughly half that amount again in tokens, letting them achieve 10x efficiency gains. This is Silicon Valley's new recruiting chip: how many tokens does your offer carry?
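The offer arithmetic Huang sketches can be made concrete. The $6-per-million price below is borrowed from the mid-tier pricing mentioned earlier in the article and is purely illustrative:

```python
# Illustrative arithmetic for the "salary plus token budget" offer described
# in the speech. The token price is a hypothetical tier rate, not a quote.

def offer_value(base_salary: float, token_budget_fraction: float = 0.5,
                price_per_million: float = 6.0) -> dict:
    """Value an offer as cash plus an annual inference-token allowance."""
    token_budget_usd = base_salary * token_budget_fraction
    tokens = token_budget_usd / price_per_million * 1e6
    return {"cash": base_salary,
            "token_budget_usd": token_budget_usd,
            "annual_tokens": tokens}

# $300k base, half again in tokens: a $150k budget buys 25 billion tokens/year
# at $6 per million under these assumptions.
o = offer_value(300_000)
```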

At the end of the speech, Huang also teased the next-generation compute architecture, Feynman, which will for the first time bring scale-up over copper and co-packaged optics to the same level. Even more striking, NVIDIA is developing Vera Rubin Space-1, a data-center computer to be deployed in space, completely opening up the imagination of AI compute extending beyond Earth.

The full text of the GTC 2026 speech follows (translated with AI assistance):

Moderator: Please welcome to the stage Jensen Huang, founder and CEO of NVIDIA.

Jensen Huang, founder and CEO: Welcome to GTC. I would like to remind you that this is a technical conference. It gives me great pleasure to see so many people lined up early in the morning, and all of you here.

At GTC, we will focus on three main themes: technology, platforms, and ecosystems.

NVIDIA now has three main platforms: the CUDA-X platform, the systems platform, and our recently launched AI factory platform.

Before we begin, I would like to thank our warm-up hosts: Sarah Guo of Conviction, Alfred Lin of Sequoia Capital (NVIDIA's first venture investor), and Gavin Baker, NVIDIA's first major institutional investor. All three have deep insight into technology and great influence across the technology ecosystem. Of course, I would also like to thank all the distinguished guests I personally invited today. Thanks to this all-star team.

I would also like to thank all the companies here today. We have the technology, the platforms, and a rich ecosystem. The companies present represent almost every participant in a $10 trillion industry, and 450 companies have sponsored this event. We are deeply grateful.

The conference features 1,000 technical sessions and 2,000 speakers, and covers every layer of the five-layer cake of artificial intelligence: from infrastructure such as land, power, and machinery, to the chips, platforms, models, and applications that ultimately drive the entire industry.

CUDA: twenty years of technology accumulation

Everything starts right here. This year is the twentieth anniversary of CUDA.

For two decades, we have been developing this architecture. CUDA is a revolutionary invention: SIMT (single instruction, multiple threads) lets developers write standard code and scale it across many threads, making it far easier to program than the earlier SIMD architectures. We have also recently added the Tiles feature, which helps developers program Tensor Cores, as well as the various mathematical algorithms on which artificial intelligence depends today. Currently, CUDA has thousands of tools, compilers, frameworks, and libraries, hundreds of thousands of projects in open-source communities, and has been deeply integrated into every technology ecosystem.

This chart reveals 100 percent of NVIDIA's strategic logic, and I have been talking about this slide since the beginning. The hardest and most central element to achieve is the "installed base" at the bottom of the chart. Over the past two decades, we have accumulated hundreds of millions of CUDA-capable GPUs and computing systems worldwide.

Our GPUs cover every cloud platform, serving almost all computer manufacturers and industries. CUDA's huge installed base is the underlying reason this flywheel keeps accelerating. The installed base attracts developers, who create new algorithms and breakthroughs, open new markets, create new ecosystems, and attract more companies to expand the installed base: a flywheel that keeps gaining speed.

NVIDIA's download volume is growing at a striking rate, already large and still increasing. This flywheel has let our computing platform sustain massive applications and develop new breakthroughs.

More importantly, it gives this infrastructure an extremely long useful life. The reason is obvious: NVIDIA CUDA runs an extremely wide range of workloads, covering every phase of the AI life cycle, all kinds of data-processing platforms, and all kinds of scientific solvers. So the real value of a GPU, once installed, is extremely high. That is why utilization of the Ampere-architecture GPUs we released six years ago is still going up.

The underlying causes of all this are: the installed base is large, the flywheel is strong, and the developer base is broad. When these factors work together with continuous updates to our software, costs keep declining. Accelerated computing dramatically improves application performance, so users not only get an initial performance jump but continue to benefit from falling compute costs as we iterate the software over time. We are willing to provide long-term support to every GPU in the world, because they are all fully compatible.

We are willing to do this because the installed base is so large: millions of users benefit from every new optimization we publish. This dynamic has allowed the NVIDIA architecture to keep costs falling while expanding its coverage and accelerating its own growth, ultimately stimulating new growth. CUDA is the core of all this.

From GeForce to CUDA: 25 years of evolution

And our journey with CUDA actually started 25 years ago.

GeForce: I believe many people here grew up with GeForce. GeForce is NVIDIA's most successful go-to-market program. We have been cultivating future customers since before you could afford one: your parents bought one for you and became NVIDIA's first users, buying our products year after year, until one day you grew up to be a fine computer scientist, a true customer and developer.

This is the foundation GeForce laid 25 years ago. Twenty-five years ago, we invented the programmable shader: an obvious but far-reaching invention, the world's first programmable accelerator, the pixel shader. Five years later we created CUDA, one of the most important investments in our history. The company had limited finances at the time, but we invested the vast majority of our profits in extending CUDA from GeForce to every computer. We pushed so hard because we were convinced of its potential. Despite early difficulties, the company held to this belief for 13 generations over 20 years, and now CUDA is everywhere.

It was the pixel shader that drove the GeForce revolution. About eight years ago, we launched RTX, a complete architectural overhaul for modern computer graphics. GeForce brought CUDA to the world, and that is why scholars such as Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton, and Andrew Ng discovered that the GPU could be a powerful tool for accelerating deep learning, thereby triggering the explosion of artificial intelligence ten years ago.

Ten years ago, we decided to combine programmable shading with two new ideas. One was hardware ray tracing, which was technically challenging. The second was a forward-looking idea: about a decade ago, we predicted that AI would completely change computer graphics. Just as GeForce brought AI to the world, AI will now recreate the way computer graphics are made.

Today, I want to show you the future. This is our next generation of graphics, which we call Neural Rendering: deep 3D graphics combined with artificial intelligence. This is DLSS 5. Take a look.

Neural rendering: fusing structured data with generative AI

Isn't this amazing? Computer graphics has been reinvented.

What did we do? We combine controllable 3D graphics (the ground truth of the virtual world) and their structured data, then integrate generative AI and probabilistic computation. One is fully deterministic, the other highly realistic. We combine the two to achieve precision and control through structured data while generating the imagery in real time. The result is content that is beautiful and completely controllable.

This idea of fusing structured information with generative AI will repeat in one industry after another. Structured data is the cornerstone of trustworthy AI.

Accelerated platforms for structured and unstructured data

Now I want to show you a technical chart.

Structured data: the familiar SQL, Spark, Pandas, and Velox, and important platforms such as Snowflake, Databricks, Amazon EMR, Azure Fabric, and Google BigQuery, all process data frames (DataFrames). These data frames, like giant spreadsheets, carry all the information of the business world and are the ground truth of business computing.

In the AI era, we need AI to use structured data, and we need to accelerate that. In the past, accelerating structured-data processing was about making enterprises more efficient. In the future, AI will consume these data structures far faster than humans, and AI agents will make massive numbers of calls to structured databases.

As for unstructured data: vector databases, PDFs, video, audio, and the like make up the vast majority of the world's data, roughly 90 percent of the data generated each year. In the past, these data were almost completely unusable: we read them and put them in a file system, and that was all. We could not search or query them, because unstructured data lacks a simple index and requires understanding of meaning and context. Now AI can do this: with multimodal perception and understanding, AI can read a PDF document, understand what it means, and embed it into a larger, queryable structure.

NVIDIA has created two foundations for this:

cuDF: accelerates data-frame and structured-data processing

cuVS: accelerates vector storage, semantic search, and unstructured-data processing for AI

These two platforms will be among the most important foundational platforms of the future.
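As a deliberately tiny illustration of the embed-and-query idea described above, here is a library-free Python sketch. The character-frequency "embedding" is a stand-in for the learned embeddings a real system (for example, one accelerated by cuVS on GPU) would use:

```python
# Minimal sketch: unstructured text becomes vectors, queries retrieve by
# similarity. The "embedding" is a toy bag-of-characters, purely illustrative.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: character-frequency vector of the lowercased text."""
    return Counter(text.lower())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = ["quarterly revenue report", "gpu cooling maintenance", "revenue forecast 2027"]
index = [(d, embed(d)) for d in docs]          # the "vector store"

query = embed("revenue")
best = max(index, key=lambda item: cosine(query, item[1]))
print(best[0])
```

A production pipeline replaces `embed` with a multimodal model and `cosine` over `index` with an approximate-nearest-neighbor search, but the shape of the computation, embed once, query by similarity, is the same.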

Today, we announced collaborations with a number of enterprises. IBM, the inventor of the SQL language, will use cuDF to accelerate its WatsonX Data platform. Dell worked with us to create the Dell AI Data Platform, integrating cuDF and cuVS and achieving significant performance improvements in the NTT Data project. Google Cloud, for its part, is now accelerating not only Vertex AI but also BigQuery, and in its work with Snapchat cut compute costs by nearly 80 percent.

The benefits of accelerated computing are threefold: speed, scale, cost. This follows the logic of Moore's Law: achieve leaps in performance through accelerated computing while continuously optimizing algorithms, so that everyone enjoys continuously falling compute costs.

NVIDIA built an accelerated-computing platform that brings together libraries: RTX, cuDF, cuVS, and more. These libraries are integrated into global cloud services and OEM systems to reach users worldwide.

Deep collaboration with major cloud service providers

Google Cloud: We accelerate Vertex AI and BigQuery, with deep integration into JAX/XLA, while also delivering excellent performance on PyTorch; ours is the only accelerator in the world that excels on both PyTorch and JAX/XLA. We brought customers like Base10, CrowdStrike, Puma, and Salesforce into the Google Cloud ecosystem.

AWS: We accelerate EMR, SageMaker, and Bedrock, with deep AWS integration. What particularly excited me this year is that we brought OpenAI onto AWS, which will significantly boost AWS's cloud-computing consumption and help OpenAI expand its regional deployment and scale.

Microsoft: The first supercomputer we built, at 100 PFLOPS, was also the first deployed on Azure, and it laid an important foundation for the work with OpenAI. We accelerate Azure cloud services and AI Foundry, jointly advance Azure's regional expansion, and work deeply on Bing search.

It is worth mentioning our confidential-computing capability, which ensures that even operators cannot access user data and models; ours was the world's first GPU to support confidential computing, enabling the secure deployment of OpenAI and Anthropic models in cloud environments around the world. In the case of Synopsys, we accelerated all of its EDA and CAD workflows and deployed them to Microsoft Azure.

Oracle: We were Oracle's first AI customer, and I am proud to have been the first to explain the concept of an AI cloud to Oracle. They have grown rapidly since, and we introduced many partners, such as Cohere, Fireworks, and OpenAI.

CoreWeave: The world's first AI-native cloud, created to serve GPU hosting and AI cloud workloads, with an excellent customer base and strong growth.

Palantir + Dell: Together we created a new AI platform based on Palantir's Ontology platform, deploying fully localized AI, from data processing (structured or unstructured) to full-stack accelerated computing, in any country and in any air-gapped environment.

NVIDIA has established this special partnership with global cloud service providers: a win-win ecosystem that brings our customers to the cloud.

Vertically integrated, horizontally open: NVIDIA's core strategy

NVIDIA is the world's first vertically integrated, horizontally open company.

The need for this model is very simple: accelerated computing is not a chip problem, nor merely a systems problem; its complete formulation is accelerating applications. CPUs used to make computers run faster, but that road has hit a bottleneck. In the future, only application- or domain-specific acceleration can sustain performance leaps and falling costs.

That is why NVIDIA has had to cultivate one library after another, one domain after another, one vertical industry after another. We are a vertically integrated computing company; there is no other way. We must understand applications, understand domains, deeply understand algorithms, and be able to deploy them in any scenario: data centers, cloud, on-premises, edge, and even robotic systems.

At the same time, NVIDIA remains horizontally open, willing to integrate its technology into any partner's platform so the whole world can enjoy the dividends of acceleration.

This is fully reflected in this GTC's attendee mix. The financial-services sector has the highest share of attendees; hopefully developers, not traders. Our ecosystem covers both upstream and downstream supply chains. Last year was the best year in history for businesses 50, 70, even 150 years old. We are at the beginning of something very, very important.

CUDA-X: the acceleration engine across industries

In every vertical, NVIDIA has a deep footprint:

Autonomous driving: broad and far-reaching coverage

Financial services: quantitative investing is moving from hand-crafted feature engineering to supercomputer-driven deep learning, entering its "Transformer moment"

Healthcare: in its own "ChatGPT moment," spanning AI-assisted drug discovery, AI-assisted diagnosis, medical customer service, and more

Industry: the world's largest construction wave is under way

Entertainment and games: real-time AI platforms support translation, live streaming, in-game interaction, and smart shopping agents

Robotics: more than a decade of work, with three major computer architectures in place

Telecommunications: in an industry of roughly $2 trillion, the base station will evolve from a single-purpose communications box into an AI infrastructure platform called Aerial, in deep partnership with companies such as Nokia and T-Mobile

At the heart of all these areas are our CUDA-X libraries, the very essence of NVIDIA as an algorithms company. These libraries are the company's core asset, enabling the platform to deliver real value across industries.

One of the most important of these is cuDNN, which completely transformed artificial intelligence and triggered the Big Bang of modern AI.

(CUDA-X presentation video plays)

Everything you just saw is simulation: solvers based on physical principles, AI surrogate physics models, and physical-AI robotics models. Everything is simulated; there is no hand animation or rigging. This is the core of NVIDIA's power: unlocking these opportunities through a deep understanding of algorithms, combined organically with the computing platform.

AI-native enterprises and the new computing era

You just saw Walmart, Eli Lilly, JPMorgan Chase, Lowe's, Toyota, and the other industrial giants that define today's society, plus a large number of companies you have never heard of: what we call AI natives. The list is extremely long, including OpenAI, Anthropic, and a large number of emerging companies in different verticals.

Over the past two years, the industry has seen an astonishing takeoff. Venture investment flowing into startups reached $150 billion, the largest in human history. More importantly, for the first time, single investments jumped from millions of dollars to hundreds of millions or even billions.

There is only one reason: for the first time in history, each such company requires substantial computing resources and a huge volume of tokens. The industry is creating tokens, generating them, or adding value to tokens from institutions such as Anthropic and OpenAI.

Just as the PC revolution, the internet revolution, and the mobile-cloud revolution each created a cohort of epoch-making businesses, this generation of platform change will also produce an extremely influential group of companies that will become important forces in the future.

Three historic breakthroughs behind all this

What happened in the last two years? Three big things.

First: ChatGPT launches the generative AI era (late 2022 to 2023)

AI could not only perceive and understand, it could also generate original content. I showed the fusion of generative AI with computer graphics. Generative AI fundamentally changes how computing works, from retrieval to generation, which profoundly affects computer architecture, deployment, and the meaning of computing itself.

Second: reasoning AI, represented by o1

The ability to reason lets AI self-reflect, plan, and decompose problems: breaking problems it cannot handle directly into manageable steps. o1 made generative AI credible and able to reason on the basis of real information. As a result, the number of tokens fed into context and generated for thinking grew enormously, and the amount of computation rose sharply.

Third: Claude Code, the first agentic model

It reads documents, writes code, compiles, tests, evaluates, and iterates. Claude Code has completely overhauled software engineering: 100 percent of NVIDIA's engineers use one or more of Claude Code, Codex, and Cursor; there is not a single software engineer who is not using AI.

This is a whole new turning point: you are no longer asking AI what something is, where it is, or how it is done; the AI is creating, implementing, building, using tools, reading documents, decomposing problems, and acting. AI has gone from perception, to generation, to reasoning, to now actually being able to act.

Over the past two years, the computation required for inference has increased roughly 10,000-fold, and usage roughly 100-fold. I have always said that compute demand has increased a million-fold in the past two years; it is what everyone feels, what OpenAI feels, what Anthropic feels. The more compute you have, the more tokens you produce, the more revenue you earn, and the smarter the AI gets. The inflection point of reasoning has arrived.

Trillions of dollars

Last year at this time, I stood here and said we had high confidence in roughly $500 billion of demand and purchase orders for Blackwell and Rubin through 2026.

Today, a year later at GTC, I stand here and tell you: looking ahead to 2027, I see at least $1 trillion. And I am certain actual compute demand will be far more than that.

2025: NVIDIA's year of inference

2025 was the year of inference. We wanted to ensure that, beyond training and post-training, excellence is maintained at every stage of the AI life cycle, so that the infrastructure you invest in runs efficiently and continuously, and the longer it runs, the lower its unit cost.

At the same time, Anthropic and Meta officially joined the NVIDIA platform; together they represent a third of global AI compute demand. Open-source models are near the frontier and are everywhere.

NVIDIA is currently the only platform in the world that can run every AI domain: language, biology, computer graphics, computer vision, speech, proteins and chemistry, robotics, and more; every AI model, whether at the edge or in the cloud, in any language. The commonality of the NVIDIA architecture across all these scenarios makes us the lowest-cost and most trusted platform.

At present, 60 percent of NVIDIA's business comes from the world's top five hyperscale cloud service providers, with the remaining 40 percent across regional clouds, sovereign clouds, enterprise, industry, robotics, edge computing, and more. The breadth of AI's coverage is itself its resilience; this is without doubt a completely new platform shift.

Grace Blackwell and NVLink 72: a bold architectural bet

At the peak of the Hopper architecture, we decided to completely reorganize the system, expanding NVLink from 8 GPUs to NVLink 72 and fully disaggregating the computing system. Grace Blackwell NVLink 72 was a huge technical bet; it was not easy for any of our partners, and we express our sincere gratitude to all.

At the same time, we introduced NVFP4: not just an ordinary FP4, but a brand-new type of Tensor Core and compute unit. We have demonstrated that NVFP4 delivers inference without loss of precision, with enormous performance and energy-efficiency gains, and the same holds for training.

In addition, a series of new algorithms such as Dynamo and TensorRT-LLM have emerged, and we have even dedicated billions of dollars to kernel optimization, building a supercomputer called DGX Cloud.

Our inference performance has proved remarkable. Data from SemiAnalysis, the most comprehensive AI inference performance assessment to date, shows NVIDIA ahead on both dimensions: tokens per watt and cost per token. Moore's Law alone would have given the H200 a 1.5x improvement, but we delivered 35x. Dylan Patel of SemiAnalysis even said: "Jensen is being conservative; it's actually 50 times." He's right.

To quote him directly: "Jensen sandbagged."

NVIDIA's cost per token is the lowest in the world, and no one else currently comes close. This is the result of extreme co-design.

In the case of Fireworks, before NVIDIA's full software and algorithm update, the average token speed was about 700 per second; after the update it approached 5,000 per second, roughly a 7x increase. That is the power of extreme co-design.

AI factories: from data center to token factory

The data center used to be a repository of files; now it is a factory that produces tokens. Every cloud service provider and every AI company will use "token-factory efficiency" as a core performance indicator.

This is my core argument:

• Vertical axis: throughput, the number of tokens per second at fixed power

• Horizontal axis: interactivity (token speed); the faster each response, the larger the model you can run, the longer the context, and the smarter the AI

Tokens are a new bulk commodity which, once the market matures, will be priced in tiers:

• Free tier (high throughput, low speed)

• Intermediate tier (~$3 per million tokens)

• Advanced tier (~$6 per million tokens)

• High-speed tier (~$45 per million tokens)

• Hypervelocity tier (~$150 per million tokens)

Compared with Hopper, Grace Blackwell delivers 35x at the highest-value tier and unlocks a whole new tier. In a simplified model allocating 25 percent of power to each of four tiers, Grace Blackwell generates five times the revenue of Hopper.
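The simplified tier-revenue model can be sketched as follows. Only the tier prices come from the talk; the per-tier throughput figures are invented for illustration, so the resulting multiple here is not the talk's 5x figure, which depends on NVIDIA's own (unpublished) per-tier numbers:

```python
# Hedged sketch of the tier-revenue model: split a fixed power budget across
# paid tiers and sum revenue. Throughput numbers below are hypothetical.

TIER_PRICES = {"intermediate": 3.0, "advanced": 6.0,
               "high_speed": 45.0, "hypervelocity": 150.0}   # $ per 1M tokens

def plant_revenue(tokens_per_sec: dict, seconds: float = 3600 * 24 * 365) -> float:
    """Annual revenue of a token factory given per-tier throughput (tokens/s)."""
    return sum(tokens_per_sec[t] * seconds / 1e6 * TIER_PRICES[t]
               for t in TIER_PRICES)

# Hypothetical: an architecture that matches the baseline at low tiers but
# sustains 35x more tokens/s at the hypervelocity tier earns several times more.
hopper_like = {"intermediate": 1e6, "advanced": 1e6,
               "high_speed": 1e5, "hypervelocity": 1e4}
blackwell_like = dict(hopper_like, hypervelocity=35e4)   # 35x at the top tier

ratio = plant_revenue(blackwell_like) / plant_revenue(hopper_like)
```

With these made-up throughputs the multiple works out to about 4.4x; the key structural point survives any choice of numbers: because the top tier is priced 25 to 50 times higher, a throughput gain concentrated there dominates total revenue.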

Vera Rubin: the next-generation AI computing system

(Vera Rubin introduction video plays)

Vera Rubin is a complete, end-to-end optimized system designed specifically for agentic workloads:

• Large language model compute core: an NVLink 72 GPU cluster handling prefill and KV cache

• The new Vera CPU: purpose-built for very high single-thread performance, using LPDDR5 memory for excellent energy efficiency; the world's only data-center CPU built on LPDDR5, well suited to agentic AI

• Storage: BlueField 4 + CX-9, an all-new storage platform for the AI era, adopted across the global storage industry

• CPO Spectrum-X switch: a co-packaged-optics Ethernet switch, now in full production

• Kyber: an all-new rack system joining 144 GPUs into a single NVLink domain, with compute in front and NVLink switching in back, forming one giant computer

• Rubin Ultra: next-generation supernodes, designed vertically to fit the Kyber rack and support larger-scale NVLink interconnect

Vera Rubin is 100% liquid-cooled, its installation time has dropped from two days to two hours, and it introduces 45°C warm-water cooling, greatly easing the cooling burden on data centers. Satya Nadella has sent word that the first Vera Rubin rack is now online at Microsoft Azure, which I find enormously encouraging.

Groq Integration: Extending Inference Performance to the Extreme

We brought on the Groq team and licensed its technology. Groq builds a deterministic dataflow processor: statically scheduled by the compiler, with large amounts of on-chip SRAM, optimized for a single inference workload, delivering extremely low latency and very high token generation speed.

However, Groq's limited memory capacity (about 500 MB of on-chip SRAM) makes it hard to hold a large model's parameters and KV cache on its own, which limits large-scale deployment.

The solution is Dynamo, our open-source inference-scheduling software. With Dynamo, we disaggregate the inference pipeline:

• Prefill and attention decode: runs on Vera Rubin (compute-heavy, needs large KV cache storage)

• Feed-forward network decode: the token-generation stage, runs on Groq (needs high bandwidth and low latency)

Linking the two tightly over Ethernet cuts latency roughly in half for certain models. Under unified scheduling by Dynamo, the "AI factory operating system", overall performance rises 35x, reaching an inference tier that NVLink 72 alone could not touch before.
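The disaggregated split described above can be sketched as a toy stage router: each pipeline stage is pinned to the hardware pool that suits it. All class and function names here are hypothetical illustrations for the idea of stage-to-pool assignment, not the Dynamo API.

```python
# Toy sketch of disaggregated inference scheduling: prefill/attention
# on one hardware pool, feed-forward decode on another. Names are
# hypothetical; real Dynamo scheduling is far more involved.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Stage:
    name: str
    pool: str   # which hardware pool serves this stage

# Stage-to-pool assignment following the split in the speech.
PIPELINE = [
    Stage("prefill", "vera_rubin"),          # compute-heavy, large KV cache
    Stage("attention_decode", "vera_rubin"), # needs the KV cache locally
    Stage("ffn_decode", "groq"),             # bandwidth-bound token generation
]

def route(request_id: str) -> List[Tuple[str, str]]:
    """Return the ordered (stage, pool) assignments for one request."""
    return [(s.name, s.pool) for s in PIPELINE]

print(route("req-1"))
```

The design choice this illustrates: the KV-cache-heavy stages stay where memory is plentiful, while the latency-critical token-generation stage moves to the SRAM-based part, with the network bridging the two.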

Recommended Groq and Vera Rubin configurations:

• If the workload is throughput-dominated, use 100% Vera Rubin for high-value tokens

• If interactivity matters, introduce Groq, at a recommended mix of roughly 25% Groq + 75% Vera Rubin

Groq LP30, manufactured by Samsung, is now in production, with deliveries expected to begin in Q3. Thanks to Samsung for its full cooperation.

A historic leap in inference

To quantify the progress: over two years, the token output of a 1-gigawatt AI factory rises from 20 million tokens/s to 700 million tokens/s, a 35x increase. That is the power of extreme co-design.

Technology road map

• Blackwell: now in production; Oberon standard rack; copper scale-up to NVLink 72; optional optical scale-up to NVLink 576

• Vera Rubin (current): Kyber rack, NVLink 144 (copper cable); Oberon rack, NVLink 72 plus optics, scaling to NVLink 576; Spectrum 6, the world's first CPO switch

• Vera Rubin Ultra (forthcoming): a new-generation Rubin Ultra GPU and the LP35 chip (the first to integrate NVFP4), for a further multiple of performance

• Feynman (next generation): a new GPU and the LP40 chip (jointly developed by the NVIDIA and Groq teams, with NVFP4 integrated); a new CPU, Rosa (Rosalyn); BlueField 5; CX-10; and Kyber racks supporting both copper and CPO expansion

The roadmap is clear: copper scale-up, optical scale-up, and optical scale-out advance in parallel, and we need all our partners to keep expanding capacity in copper, fiber optics, and CPO.

NVIDIA DSX: A Digital Twin Platform for AI Factories

AI factories are becoming increasingly complex, yet the many technology suppliers whose parts make them up have never collaborated during the design phase; their components first meet in the data center. That is clearly not good enough.

To that end, we created Omniverse and, on top of it, the NVIDIA DSX platform, where all partners can design and operate gigawatt-class AI factories in a virtual world. DSX provides:

• Integrated mechanical, thermal, electrical, and network simulation

• Grid interconnection for coordinated energy-saving operation

• Dynamic power and cooling optimization based on data-center Max-Q

Conservatively, the system can roughly double energy efficiency, a very significant gain at the scales we are discussing. Omniverse, which began with a digital Earth, will host digital twins at every scale, and we are working with partners worldwide to build the largest computer in human history.

NVIDIA is also heading into space. The Thor chip has passed radiation certification and is already operating on satellites. We are working with partners on Vera Rubin Space-1 for orbital data centers. Heat dissipation is the core challenge in space, and we are assembling top engineers to tackle it.

OpenClaw: The Operating System of the Agentic Age

Peter Steinberger developed a piece of software called OpenClaw. It is the most popular open-source project in history, surpassing in just a few weeks what took Linux 30 years to achieve.

OpenClaw is essentially an agent system that can:

• Manage resources and access tools, file systems, and large language models

• Execute scheduled and timed tasks

• Decompose problems step by step and invoke sub-agents

• Accept and produce any modality: voice, video, text, email, and more

It really is an operating system: an operating system for agentic computers. Windows made the personal computer possible; OpenClaw makes the personal agent possible.
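The capability list above amounts to a scheduler loop over tools and sub-agents. A minimal sketch of that pattern follows; every class, method, and tool name is a hypothetical illustration, not the OpenClaw interface.

```python
# Minimal sketch of an agent "operating system" loop in the spirit of
# the capabilities above: register tools, then dispatch the steps of a
# decomposed task to them and collect results. All names hypothetical.

from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]

class MiniAgentOS:
    def __init__(self):
        self.tools: Dict[str, Tool] = {}

    def register(self, name: str, tool: Tool) -> None:
        """Make a tool (or sub-agent) available to the scheduler."""
        self.tools[name] = tool

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        """Execute a plan: an ordered list of (tool_name, argument) steps."""
        results = []
        for tool_name, arg in plan:
            results.append(self.tools[tool_name](arg))
        return results

agent_os = MiniAgentOS()
agent_os.register("search", lambda q: f"results for {q}")
agent_os.register("summarize", lambda text: text.upper())

print(agent_os.run([("search", "token pricing"), ("summarize", "ai factories")]))
```

A real agent OS adds the parts elided here: persistent memory, timed triggers, multimodal I/O, and recursive sub-agent spawning, but the core shape is this dispatch loop.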

Every enterprise needs its own OpenClaw strategy, just as it needed a Linux, HTML, and Kubernetes strategy.

The Complete Reshaping of Enterprise IT

Traditional IT: data and documents enter the system, flow through tools and workflows, and ultimately become tools for people to use. Software companies build the tools; global systems integrators (GSIs) and consulting firms help businesses adopt them.

Future IT: every SaaS company will transform into AaaS (Agentic as a Service), providing not just tools but AI agents specialized for particular domains.

But here is a key challenge: agents inside an enterprise can access sensitive data, execute code, and communicate externally. In a business environment, this must be strictly governed.

To that end, we worked with Peter to build security into an enterprise-grade edition, launching:

• NeMo Claw (reference design): an OpenClaw-based enterprise reference framework integrating NVIDIA's agentic AI toolkit

• Open Shield (security layer): integrated into OpenClaw, providing policy engines, guardrails, and private routing to keep enterprise data secure

• NeMo Cloud: downloadable, and able to interface with the policy engines of every SaaS enterprise

This is the renaissance of enterprise IT: a $2 trillion industry about to grow by trillions more, shifting from providing tools to providing specialized AI agent services.

I can fully foresee a future in which every engineer in a company has an annual token budget. An engineer might earn hundreds of thousands of dollars a year; I would grant them an additional token quota worth half their salary, and their output would grow tenfold. "How large a token quota comes with the offer" will become a new topic in Silicon Valley recruiting.
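A back-of-envelope sketch of the "quota worth half the salary" idea: the salary figure and the chosen price tier below are illustrative assumptions, with the per-million-token price taken from the tier list earlier in the speech.

```python
# Back-of-envelope: how many tokens a "half the salary" quota buys.
# Salary is an assumed example; $6/M tokens is the "advanced" tier
# price quoted in the talk.

salary = 300_000                    # $/year, assumed for illustration
quota_dollars = salary / 2          # quota worth half the salary
price_per_million = 6.0             # $ per 1M tokens (advanced tier)

tokens_per_year = quota_dollars / price_per_million * 1_000_000
print(int(tokens_per_year))         # tokens the quota buys per year
```

At these assumed numbers the quota comes to 25 billion tokens a year, which shows why a per-engineer token budget is plausible line-item accounting rather than a rounding error.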

Every enterprise will be both a consumer of tokens (for its engineers) and a producer of tokens (for its customers). The significance of OpenClaw cannot be overstated; it is as important as HTML or Linux.

NVIDIA OPEN MODEL INITIATIVE

For custom agents (Custom Claw), we provide NVIDIA's own frontier models:

• Nemotron: large language models

• Cosmos: world foundation models

We are at the technological frontier in every one of these fields, and we are committed to continuous iteration: Nemotron 4, Cosmos 1, Cosmos 2, Groq, and onward to the next generations.

Nemotron 3 ranks among the top three models worldwide on OpenClaw and sits at the frontier. Nemotron 3 Ultra will be the strongest foundation model ever, supporting nations in building sovereign AI.

Today, we announced the Nemotron Alliance, which will invest billions of dollars in AI foundation model development. Its members include Black Forest Labs, Cursor, LangChain, Mistral, Perplexity, Reflection, Sarvam (India), and Thinking Machines (Mira Murati's lab).

Enterprise software companies, one after another, are integrating the NeMo Claw reference design and NVIDIA's agentic AI toolkit into their products.

Physical AI and Robotics

Digital agents act in the digital world, writing code and analyzing data; physical AI is embodied intelligence: robots.

This GTC features 110 robots in total, covering nearly every robotics and development company in the world. NVIDIA provides three computers (a training computer, a simulation computer, and an onboard computer) along with complete software stacks and AI models.

On autonomous driving: the "ChatGPT moment" for self-driving has arrived. Today we announce four new partners joining the NVIDIA RoboTaxi Ready platform: BYD, Hyundai, Nissan, and Geely, with a combined annual production of 18 million vehicles. Together with Mercedes, Toyota, and GM before them, this further strengthens the lineup. We also announced a major collaboration with Uber to deploy and hail RoboTaxi Ready vehicles in multiple cities.

In industrial robotics, companies such as ABB, Universal Robots, and KUKA are working with us to combine physical AI models with simulation systems and bring robots onto production lines worldwide.

In telecommunications, Caterpillar and T-Mobile are on board as well. In the future, the wireless base station will no longer be just a communications node but an NVIDIA Aerial AI-RAN: an intelligent edge computing platform that senses traffic in real time, adapts beamforming, and improves energy efficiency.

Special segment: Olaf

(playing Disney Olaf robot demonstration video)

Newton is working

I'm so happy to see you!

Yes, because I gave you a computer: Jetson.

What's that?

It's in your stomach.

Amazing!

You learned to walk in Omniverse.

I like walking. It's much better than riding a reindeer, looking up at the beautiful sky.

Jensen Huang: That's thanks to physics simulation: the Newton solver, built on NVIDIA Warp, which we developed jointly with Disney and DeepMind so that what you learn transfers to the real physical world.

That's what I'm talking about.

That's why you're so smart. And I'm a snowman, not a snowball.

Can you imagine the future of Disneyland, all these robotic characters roaming freely through the park? But honestly, I thought you'd be taller. I've never seen such a short snowman.

Olaf: (without permission)

Jensen Huang: Will you help me finish my speech today?

Bravo

Keynote summary

Jensen Huang: Today, we shared the following core themes:

1. The inflection point of inference has arrived: inference is now the core AI workload, tokens are a new commodity, and inference performance directly determines revenue

2. The AI factory age: data centers have evolved from file storage facilities into token production plants, and in the future every company will measure its competitiveness by "AI factory efficiency"

3. The OpenClaw agent revolution: OpenClaw opens the era of agentic computing; enterprise IT is moving from the age of tools to the age of agents, and every enterprise needs an OpenClaw strategy

4. Physical AI and robotics: intelligence is becoming embodied; autonomous vehicles, industrial robots, and humanoid robots together form the next great opportunity for physical AI

Thank you, GTC. Have fun!

Original Link
