HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model

*Equal Contributions    Equal Advising
¹Tsinghua University, ²HKUST (Guangzhou), ³ETH Zurich, ⁴Xiaomi Robotics Lab

Abstract

Humanoid robots show significant potential for complex whole-body interaction tasks in unstructured environments. Despite substantial recent progress in Human-Object Interaction (HOI), prevailing methods mostly address the manipulation of fully actuated objects, where the target is rigidly coupled to the robot's end-effector and its state is strictly constrained by the robot's kinematics. This paradigm neglects the pervasive class of underactuated objects, which have independent dynamics and non-holonomic constraints and pose significant control challenges because of complex coupling forces and frequent visual occlusions. To bridge this gap, we propose HAIC, a unified framework that enables robust interaction across a spectrum of object dynamics without relying on external state estimation. Central to our approach is a novel dynamics predictor that infers high-order object states, specifically velocity and acceleration, solely from proprioceptive history. These predictions are explicitly projected onto static geometric priors to construct a spatially grounded representation of dynamic occupancy, allowing the policy to internalize collision boundaries and contact affordances in visual blind spots. We employ an asymmetric fine-tuning strategy in which the world model continuously adapts to the student policy's exploration, ensuring robust state estimation under distribution shifts. We evaluate our framework on a humanoid robot. Empirical results demonstrate that HAIC achieves high success rates in agile object interactions, including skateboarding, cart pushing, and cart pulling under various load weights, by proactively compensating for inertial perturbations. By predicting the dynamics of multiple objects, HAIC also masters multi-object interaction, including long-horizon tasks and carrying a box across composed terrain.
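
To make the idea of projecting predicted dynamics onto a static geometric prior more concrete, here is a minimal sketch. It is not the HAIC implementation: it assumes the object geometry is a fixed set of 3D points, that the predictor outputs a single linear velocity and acceleration, and that motion over a short horizon follows constant-acceleration kinematics with no rotation. The function name `project_dynamic_occupancy` and all of its parameters are hypothetical.

```python
import numpy as np


def project_dynamic_occupancy(points, velocity, acceleration, horizon, steps):
    """Sweep static geometry points along a constant-acceleration trajectory.

    Illustrative assumption only, not the method from the paper.

    points:       (N, 3) object geometry points (the static geometric prior).
    velocity:     (3,) predicted object linear velocity.
    acceleration: (3,) predicted object linear acceleration.
    horizon:      look-ahead time in seconds.
    steps:        number of time samples in [0, horizon].

    Returns an array of shape (steps, N, 3) with the point positions at
    each sampled time.
    """
    points = np.asarray(points, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    acceleration = np.asarray(acceleration, dtype=float)

    times = np.linspace(0.0, horizon, steps)  # shape (steps,)
    t = times[:, None]                        # shape (steps, 1)

    # Displacement d(t) = v * t + 0.5 * a * t^2, shape (steps, 3).
    displacement = t * velocity + 0.5 * t**2 * acceleration

    # Broadcast: every geometry point is translated by each displacement.
    return points[None, :, :] + displacement[:, None, :]


# Two points on a cart edge, moving and accelerating along +x.
swept = project_dynamic_occupancy(
    points=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    velocity=[1.0, 0.0, 0.0],
    acceleration=[2.0, 0.0, 0.0],
    horizon=1.0,
    steps=3,
)
```

The returned array could then be rasterized into an occupancy grid or fed to a policy directly; how HAIC encodes this representation is not specified in the abstract.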

Underactuated Object Interaction

Case #1: Skateboarding

Case #2: Cart Pushing

Case #3: Cart Pulling

Long-horizon Interaction


Multi-terrain Interaction

Case #1: Cross slope

Case #2: Cross stair

Case #3: Composed terrain

Generalization

Generalization #1: Box Size

Generalization #2: Terrain Rotation

Generalization #3: Load Weight

Failure Cases

BibTeX

@article{li2026haic,
  title = {HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model},
  author = {Li, Dongting and Chen, Xingyu and Wu, Qianyang and Chen, Bo and Wu, Sikai and Wu, Hanyu and Zhang, Guoyao and Li, Liang and Zhou, Mingliang and Xiang, Diyun and Ma, Jianzhu and Zhang, Qiang and Xu, Renjing},
  journal = {arXiv preprint arXiv:2602.11758},
  year = {2026}
}