
World Models Move from Prediction to Planning: HWM and Long-Range Control Challenges

2026/04/18 02:39
ODAILY

On 3 April, researchers from NYU and Meta FAIR published the paper "Hierarchical Planning with Latent World Models" (HWM; https://arxiv.org/abs/2604.03208). Rather than pursuing ever more realistic pictures of the future, the paper turns to a long-standing practical obstacle for world models: as the task chain grows longer, prediction errors accumulate and the action search space expands rapidly.


Introduction

In recent years, world-model research has focused mainly on representation learning and future prediction: a model learns to understand the world and then predicts how it will evolve. This line of work has produced a representative set of results. V-JEPA 2 (Video Joint Embedding Predictive Architecture 2, a video world model released by Meta in 2025) was pre-trained on more than 1 million hours of internet video and, combined with a small amount of robot interaction data, demonstrated the potential of world models for understanding, prediction and zero-shot robot planning.

Yet being able to predict is not the same as being able to complete a long task. In multi-stage control, a system usually faces two pressures. First, prediction errors keep accumulating over a long rollout (a multi-step simulation of the future), so the whole trajectory becomes increasingly likely to drift away from the goal. Second, the action search space grows rapidly with the planning horizon, so planning costs keep rising. HWM does not rewrite the underlying learning approach of the world model. Instead, it adds a hierarchical planning structure on top of an existing action-conditioned world model, so that the system first lays out a path of stages and only then works out local actions.
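The first pressure can be made concrete with a minimal sketch. It assumes a toy 1-D system whose true dynamics are s' = 1.05 * s + a and a learned model that gets the coefficient slightly wrong (1.04); both are illustrative assumptions, not values from the paper. Because an open-loop rollout feeds the model its own predictions, a one-step error of 0.01 keeps growing with the horizon.

```python
def true_step(s: float, a: float) -> float:
    # Hypothetical "real" dynamics of the toy system.
    return 1.05 * s + a

def model_step(s: float, a: float) -> float:
    # Hypothetical learned model with a small coefficient error.
    return 1.04 * s + a

def rollout_errors(s0: float, actions: list[float]) -> list[float]:
    """Absolute gap between true and predicted state after each open-loop step."""
    s_true, s_pred, errors = s0, s0, []
    for a in actions:
        s_true = true_step(s_true, a)
        s_pred = model_step(s_pred, a)  # the model only ever sees its own predictions
        errors.append(abs(s_true - s_pred))
    return errors

errs = rollout_errors(1.0, [0.1] * 30)
print(f"step 1: {errs[0]:.3f}  step 10: {errs[9]:.3f}  step 30: {errs[29]:.3f}")
```

Each step's gap is larger than the one before it, which is why long rollouts become unreliable even when single-step predictions look accurate.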

Technically, V-JEPA 2 (https://ai.meta.com/research/vjepa/) leans toward world representation and basic prediction, HWM toward long-horizon planning, and WAV (World Action Plan: Self-Improving World Models via Forward-Inverse Asymmetry, https://arxiv.org/abs/2604.01985) toward having models identify and correct their own predictions. The three lines of work are gradually converging: the focus of world-model research has shifted from merely predicting the future to turning predictive ability into system capabilities that can be executed, corrected and verified.

I. Why long-horizon control remains a bottleneck for world models

The difficulty of long-horizon control is easiest to see in robotic tasks. Take a robotic-arm task such as picking up a cup and putting it in a drawer: this is not a single move but a sequence of steps. The system must approach the object, adjust its pose, complete the grasp, move to the target position, deal with the drawer and then place the cup. Once the chain gets long, both problems appear at the same time: prediction errors accumulate along the rollout, and the action search space expands rapidly.
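The second pressure is combinatorial. A back-of-the-envelope sketch, assuming a hypothetical discretised arm with 8 candidate actions per step, counts how many action sequences an exhaustive planner would face as the horizon grows, and compares that with splitting the same horizon into 2-step segments searched one after another. Real planners sample rather than enumerate, so the numbers only indicate the trend.

```python
def flat_candidates(num_actions: int, horizon: int) -> int:
    """Number of action sequences an exhaustive single-level planner must consider."""
    return num_actions ** horizon

def segmented_candidates(num_actions: int, horizon: int, segments: int) -> int:
    """Candidates when the horizon is split into equal segments searched one at a time,
    assuming the subgoal for each segment is already fixed."""
    assert horizon % segments == 0, "toy version: segments must divide the horizon"
    return segments * num_actions ** (horizon // segments)

for h in (2, 6, 12):
    # Hypothetical: 8 discrete actions per step, segments of 2 steps each.
    print(h, flat_candidates(8, h), segmented_candidates(8, h, segments=h // 2))
```

Under these assumptions a 12-step horizon already means 8**12 (about 6.9e10) flat candidates versus 384 when six 2-step segments are searched in turn. The segmented count leaves out the cost of choosing the subgoals themselves, which is the job an upper planning layer has to take on.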

What such systems usually lack is not local prediction but the ability to organize a long-range goal into stages. Many actions that locally move away from the target are in fact necessary intermediate steps: the arm is raised before it reaches for an object, and it backs off slightly and adjusts its angle before opening a drawer.
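A toy grid, assumed purely for illustration, shows why such detours matter. A wall sits between start and goal. A planner that only accepts moves that reduce the distance to the goal never leaves the start, while the same planner succeeds once it is given a subgoal that first leads away from the goal. The subgoal is hand-picked here; producing subgoals like it automatically is what an upper planning layer is for. All names are hypothetical.

```python
WALLS = {(1, 0), (1, 1)}  # hypothetical obstacle between start (0, 0) and goal (2, 0)
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def dist(p, q):
    # Manhattan distance on the grid.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def greedy(start, goal, max_steps=20):
    """Always take the move that most reduces distance to goal; stop if none does."""
    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == goal:
            break
        options = [(pos[0] + dx, pos[1] + dy) for dx, dy in MOVES]
        options = [p for p in options if p not in WALLS]
        best = min(options, key=lambda p: dist(p, goal))
        if dist(best, goal) >= dist(pos, goal):
            break  # every legal move looks worse locally: stuck
        pos = best
        path.append(pos)
    return path

def via_subgoals(start, subgoals):
    """Run the same greedy planner toward each subgoal in turn."""
    path = [start]
    for g in subgoals:
        path += greedy(path[-1], g)[1:]
    return path

print(greedy((0, 0), (2, 0))[-1])                  # -> (0, 0)
print(via_subgoals((0, 0), [(1, 2), (2, 0)])[-1])  # -> (2, 0)
```

The subgoal (1, 2) is farther from the goal than the start is, yet passing through it is the only way around the wall.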

In demonstration tasks, world models already produce consistent predictions. Once they move into real control settings, however, performance starts to decline and problems follow. The pressure comes not only from the representations themselves but also from the planning layer.

II. How HWM restructures the planning process

HWM splits the originally single-level planning process into two layers. The upper layer sets the direction of each stage on a longer time scale, and the lower layer handles local execution on a shorter time scale. Planning thus runs at two different temporal rhythms rather than one.

When a single layer handles a long task, it usually has to search the entire action chain directly in the low-level action space. The longer the task, the higher the search cost and the more easily prediction errors propagate through a multi-step rollout. With HWM's decomposition, the upper layer only chooses the route on a longer time scale and the lower layer only completes the current segment, so one long task is broken into several shorter ones and planning complexity drops.
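A minimal sketch of the two-layer split, assuming a toy 1-D world in which the world models are exact and planning is brute-force enumeration. HWM itself plans with learned models in a latent space; the names low_model, high_model, plan_flat and plan_hierarchical are hypothetical. Counting world-model evaluations shows where the saving comes from.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical low-level actions: step left, stay, step right

def low_model(s: int, a: int) -> int:
    """Toy low-level world model: predicted state after one primitive action."""
    return s + a

def high_model(s: int, z: int) -> int:
    """Toy high-level world model: z stands for the net effect of k primitive steps."""
    return s + z

def search(s0, goal, options, horizon, model, stats):
    """Roll out every option sequence and keep the one whose end state is closest to goal."""
    best, best_end = None, None
    for seq in product(options, repeat=horizon):
        s = s0
        for a in seq:
            s = model(s, a)
            stats["calls"] += 1  # count world-model evaluations as planning cost
        if best is None or abs(s - goal) < abs(best_end - goal):
            best, best_end = seq, s
    return best, best_end

def plan_flat(s0, goal, horizon):
    stats = {"calls": 0}
    plan, _ = search(s0, goal, ACTIONS, horizon, low_model, stats)
    return plan, stats["calls"]

def plan_hierarchical(s0, goal, horizon, k):
    stats = {"calls": 0}
    # Upper layer: choose horizon // k coarse moves, each covering k primitive steps.
    zs, _ = search(s0, goal, range(-k, k + 1), horizon // k, high_model, stats)
    plan, s = [], s0
    for z in zs:
        # Lower layer: only a short k-step search toward the next subgoal s + z.
        seg, s = search(s, s + z, ACTIONS, k, low_model, stats)
        plan.extend(seg)
    return tuple(plan), stats["calls"]

flat_plan, flat_cost = plan_flat(0, 8, horizon=8)
hier_plan, hier_cost = plan_hierarchical(0, 8, horizon=8, k=4)
print(flat_plan == hier_plan, flat_cost, hier_cost)  # -> True 52488 810
```

Both planners find the same eight-step plan here, but the hierarchical one evaluates the model 810 times instead of 52,488, because the upper layer fixes intermediate targets and each lower-layer search spans only k steps. The toy numbers illustrate the mechanism; they do not reproduce the paper's measurements.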

Another key design choice is that a high-level action is not simply the difference between two states; instead, an encoder compresses a sequence of low-level actions into a high-level action. For a long task, what matters is not only how far apart the start and end points are but also how the intermediate steps are organized. If the high level looked only at the displacement, it could easily lose the path information contained in the action chain.
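The difference between the two summaries can be shown with a toy example. The encoder below is a hand-written, hypothetical stand-in for a learned action encoder and is not HWM's architecture: two action chunks with the same net displacement look identical to a state-difference summary but remain distinguishable after encoding.

```python
from typing import Sequence

Action = tuple[int, int]  # hypothetical 2-D displacement per primitive step

def state_delta(actions: Sequence[Action]) -> Action:
    """What a 'difference between two states' summary keeps: only the net displacement."""
    return (sum(a[0] for a in actions), sum(a[1] for a in actions))

def encode_actions(actions: Sequence[Action]) -> tuple[int, ...]:
    """Hypothetical stand-in for a learned encoder: net displacement plus
    time-weighted sums, which also record when each movement happened."""
    dx, dy = state_delta(actions)
    wx = sum(t * a[0] for t, a in enumerate(actions, start=1))
    wy = sum(t * a[1] for t, a in enumerate(actions, start=1))
    return (dx, dy, wx, wy)

straight = [(1, 0), (1, 0)]                   # push straight across
detour = [(0, 1), (1, 0), (1, 0), (0, -1)]    # lift, move across, lower
print(state_delta(straight) == state_delta(detour))        # -> True
print(encode_actions(straight) == encode_actions(detour))  # -> False
```

A high level that only sees (2, 0) for both chunks cannot tell the collision-free detour from the direct push; the encoded version keeps that distinction.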

HWM reflects a hierarchical way of organizing tasks. Faced with a multi-stage process, the system no longer works out all actions in one pass; it starts from a coarse path of stages and then executes and corrects it segment by segment. Once this hierarchy is built into the world model, predictive ability begins to translate more reliably into planning ability.

III. From 0% to 70%: what the results show

In the real-world pick-and-place tasks reported in the paper, the system is given only the final goal condition, with no hand-specified intermediate goals. Under these conditions HWM achieves a 70% success rate, while the single-level world model achieves 0%. Long-horizon tasks that were almost impossible before become achievable once hierarchical planning is introduced.

The paper also tests simulated environments such as object pushing and maze navigation. The results show that hierarchical planning not only raises the success rate but also lowers the computational cost of planning: in some environments, planning cost falls to as little as roughly a quarter of the original, while the success rate stays higher or comparable.

IV. From V-JEPA to HWM to WAV

V-JEPA 2 represents the world-representation route. It was pre-trained on more than 1 million hours of internet video and then post-trained on less than 62 hours of robot video, yielding a world model that can understand, predict and plan in the physical world. It shows that a model can acquire world representations through large-scale observation and transfer them to robot planning.
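Planning with such a latent world model can be sketched as goal-conditioned search: encode the current and goal observations, then look for actions whose predicted latent ends up closest to the goal latent. The sketch below uses the cross-entropy method (CEM) with an L1 distance; the V-JEPA 2 report describes planning in a broadly similar way, but the encoder, predictor, horizon and sample sizes here are toy assumptions, not Meta's implementation.

```python
import random

def encode(obs):
    """Hypothetical frozen encoder: maps a 2-D toy observation to a latent (identity here)."""
    return (float(obs[0]), float(obs[1]))

def predict(z, a):
    """Hypothetical action-conditioned predictor operating in latent space."""
    return (z[0] + a[0], z[1] + a[1])

def energy(z0, actions, z_goal):
    """L1 distance between the predicted final latent and the goal latent."""
    z = z0
    for a in actions:
        z = predict(z, a)
    return abs(z[0] - z_goal[0]) + abs(z[1] - z_goal[1])

def cem_plan(z0, z_goal, horizon=3, samples=200, elites=20, iters=5, seed=0):
    """Cross-entropy method: sample action sequences, keep the lowest-energy ones, refit."""
    rng = random.Random(seed)
    mu = [[0.0, 0.0] for _ in range(horizon)]
    sigma = [[1.0, 1.0] for _ in range(horizon)]
    best, best_e = None, float("inf")
    for _ in range(iters):
        pop = [[(rng.gauss(mu[t][0], sigma[t][0]), rng.gauss(mu[t][1], sigma[t][1]))
                for t in range(horizon)] for _ in range(samples)]
        pop.sort(key=lambda seq: energy(z0, seq, z_goal))
        top = pop[:elites]
        e = energy(z0, top[0], z_goal)
        if e < best_e:
            best, best_e = top[0], e
        for t in range(horizon):  # refit a Gaussian per time step and action dimension
            for d in range(2):
                vals = [seq[t][d] for seq in top]
                m = sum(vals) / elites
                mu[t][d] = m
                sigma[t][d] = (sum((v - m) ** 2 for v in vals) / elites) ** 0.5 + 1e-3
    return best, best_e

plan, cost = cem_plan(encode((0, 0)), encode((1.5, -0.9)))
print(len(plan), round(cost, 3))  # in receding-horizon use, only plan[0] would be executed
```

Because the search runs entirely through the predictor, its quality is bounded by the world model's accuracy over the chosen horizon, which is where the long-horizon problems described above come in.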

HWM comes next. Models already have world representations and basic forecasting capabilities, but once they enter multi-stage control, error accumulation and search-space expansion come to the surface. HWM does not change the underlying representation-learning route; instead, it adds a multi-timescale planning structure on top of an action-conditioned world model. It addresses the question of how a model forms a set of intermediate steps and advances segment by segment.

WAV, for its part, focuses further on verification. A world model that is meant to support policy optimization and deployment cannot only predict; it must also detect and correct the areas where its predictions are prone to distortion. WAV is concerned with how models check themselves.
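As a loose illustration of a model checking its own predictions, the sketch below pairs a forward model with an inverse model and flags any prediction from which the inverse model cannot recover the action that produced it. This generic forward-inverse consistency check is a swapped-in technique, not WAV's method; both models and the tolerance are hypothetical.

```python
def forward_model(s: float, a: float) -> float:
    """Hypothetical learned forward model: accurate only for |a| <= 1 (its 'training range')."""
    return s + max(-1.0, min(1.0, a))

def inverse_model(s: float, s_next: float) -> float:
    """Hypothetical inverse model: infers which action explains the transition s -> s_next."""
    return s_next - s

def flag_unreliable(s: float, a: float, tol: float = 1e-6) -> bool:
    """Flag a prediction if the inverse model cannot recover the action behind it."""
    s_pred = forward_model(s, a)
    return abs(inverse_model(s, s_pred) - a) > tol

print([flag_unreliable(0.0, a) for a in (0.5, -0.8, 2.0)])  # -> [False, False, True]
```

The flagged case is the one outside the forward model's reliable range, which is the kind of region a self-checking world model would want to find before a planner relies on it.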

V-JEPA leans toward world representation, HWM toward task planning and WAV toward result verification. The three differ, but they point in the same direction: the next phase of world models is no longer just internal forecasting but a system capability to predict, plan and validate.

V. From internal prediction to deployable systems

Much past world-model work has aimed at making future-state predictions more continuous or internal world representations more stable. The focus of current research has begun to shift: a system needs to turn its judgment of the environment into action and keep revising its next step once results come in. To get closer to real deployment, it must limit error propagation over long-horizon tasks, narrow the search space and reduce inference cost.
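The closed loop described here can be sketched with a toy example, assuming a hypothetical constant drift that the model does not know about. Executing a whole plan open-loop lets the error build up over all ten steps, while replanning from the observed state after every step (receding-horizon control, as in model predictive control) leaves only the last step's error. The dynamics and the drift value are illustrative assumptions.

```python
DRIFT = 0.1  # hypothetical disturbance the model does not know about

def real_step(s: float, a: float) -> float:
    # What actually happens in the environment.
    return s + a + DRIFT

def plan(s: float, goal: float, steps: int) -> list[float]:
    """Model-based plan under the (wrong) assumption s' = s + a: spread the gap evenly."""
    return [(goal - s) / steps] * steps

def open_loop(s: float, goal: float, steps: int) -> float:
    for a in plan(s, goal, steps):  # plan once, never look at the outcome
        s = real_step(s, a)
    return s

def closed_loop(s: float, goal: float, steps: int) -> float:
    for remaining in range(steps, 0, -1):
        a = plan(s, goal, remaining)[0]  # replan from the observed state, execute step one
        s = real_step(s, a)
    return s

print(abs(open_loop(0.0, 5.0, 10) - 5.0), abs(closed_loop(0.0, 5.0, 10) - 5.0))
```

The open-loop run misses the goal by about 1.0 (ten steps of drift), the closed-loop run by about 0.1 (one step of drift).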

These changes also matter for AI agents. Many agent systems can already handle short-chain tasks, such as calling tools, reading documents and carrying out a few steps of instructions. But once a task becomes a long, multi-stage chain that requires re-planning along the way, performance declines. The difficulty is not fundamentally different from that of robotic control: what is missing is the capacity to organize a high-level path, which leaves local execution disconnected from the overall objective.

The hierarchical approach HWM offers, with an upper layer responsible for the path and stage objectives, a lower layer responsible for local actions and feedback, and an additional layer that verifies results, will continue to appear in more systems. In the next phase of world models, the focus is no longer just on predicting the future but on organizing prediction, execution and revision into a working pipeline.
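A minimal sketch of this division of labour for a generic agent loop, reusing the cup-and-drawer task from Section I. The stage list, the retry budget and every function name are hypothetical; in a real system the stage list would come from a high-level planner and the checks from a separate verifier rather than being written by hand.

```python
from typing import Callable

State = dict
# A stage is (name, low-level action, check on the observed result); all hypothetical.
Stage = tuple[str, Callable[[State], None], Callable[[State], bool]]

def run_stages(state: State, stages: list[Stage], max_tries: int = 3) -> list[str]:
    """Upper layer: fixed stage order. Lower layer: act and retry until the check passes."""
    log = []
    for name, act, check in stages:
        for attempt in range(1, max_tries + 1):
            act(state)
            if check(state):  # validate the outcome, not merely that the action ran
                log.append(f"{name}: ok after {attempt}")
                break
        else:  # retries exhausted: hand control back to the upper layer
            log.append(f"{name}: failed, re-plan needed")
            break
    return log

def grasp(s: State) -> None:
    s["holding"] = True

def pull_drawer(s: State) -> None:
    s["drawer"] = s.get("drawer", 0.0) + 0.4  # hypothetical: each pull opens it by 40%

def place(s: State) -> None:
    if s.get("holding") and s.get("drawer", 0.0) >= 1.0:
        s["cup_in_drawer"] = True

stages = [
    ("grasp cup", grasp, lambda s: s.get("holding", False)),
    ("open drawer", pull_drawer, lambda s: s.get("drawer", 0.0) >= 1.0),
    ("place cup", place, lambda s: s.get("cup_in_drawer", False)),
]
print(run_stages({}, stages))
```

The drawer stage needs three pulls before its check passes, and the loop only moves on once it does; a stage that never passes stops the run and reports that re-planning is needed instead of silently continuing.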
