BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination

Xingyu Peng¹,²*, Chen Gao¹,³*, Liankai Jin¹*, Annan Li¹, Si Liu¹⁺

¹Beihang University ²Zhongguancun Academy ³National University of Singapore

*Equal Contribution ⁺Corresponding Author

Checkpoints Dataset Code

Overview of BiCoord. (a) The data generation pipeline. (b) An example trajectory of the Cook task. Each trajectory is divided into several stages, each with sub-goals and arm behaviours. BiCoord also embodies key features of bimanual coordination, such as phased coupling, spatial-temporal constraints, and predictive coordination. (c) We design metrics to evaluate bimanual manipulation benchmarks: STI characterizes the temporal and spatial coupling of the two arms, and long-horizon metrics reflect task length. We also evaluate four methods on BiCoord in both single-task and multi-task settings.

Abstract

Bimanual manipulation, i.e., the coordinated use of two robotic arms to complete tasks, is essential for achieving human-level dexterity in robotics. Recent simulation benchmarks, e.g., RoboTwin and RLBench2, have advanced data-driven learning for bimanual manipulation. However, their tasks are short-horizon and only loosely coordinated, and so fail to capture the spatial-temporal coupling inherent in real-world bimanual behaviors. To address this gap, we introduce BiCoord, a benchmark for long-horizon and tightly coordinated bimanual manipulation. Specifically, BiCoord comprises diverse tasks that require continuous inter-arm dependency and dynamic role exchange across multiple sub-goals. We also propose a suite of quantitative metrics that evaluate coordination from temporal, spatial, and spatial-temporal perspectives, enabling systematic measurement of bimanual cooperation. Experimental results show that representative manipulation policies, e.g., DP, RDT, Pi0, and OpenVLA-OFT, struggle with long-duration and highly coupled tasks, revealing fundamental challenges in long-horizon, tightly coordinated manipulation. We hope BiCoord can serve as a foundation for studying long-horizon cooperative manipulation and inspire future research on coordination-aware robotic learning.

Features of BiCoord

Details of Tasks

Experimental Results

Results of single-task learning.

Results of multi-task learning.

Acknowledgements

We sincerely thank RoboTwin 2.0 for its outstanding contributions to bimanual manipulation simulation, its convenient action APIs, and its open-source release. We also thank DP, RDT, OpenVLA-OFT, and Pi0 for their outstanding and representative contributions to manipulation and VLAs.

License

This benchmark is released under the MIT License.