Overview
Natural organisms, particularly soft-bodied animals, explore and interact with their environments effectively using highly redundant body structures. Inspired by nature, engineers have integrated soft materials into traditionally rigid robot designs, driving significant advances in the field of soft robotics. These designs enable robots to bend, twist, and continuously deform along their entire length. This inherent deformability provides safe and adaptive solutions, especially in applications such as human-robot collaboration, search and rescue, and exploration and manipulation in unstructured environments.
However, soft robots are inherently underactuated, highly nonlinear mechanical systems: they are immersed in an elastic potential field and subject to dissipative forces that contribute to their stability. This underactuation, combined with their complex dynamics, makes soft robots challenging to control, a problem that has attracted researchers from fields including mechanical engineering, control theory, and computer science.
Recent advances in multimodal learning, particularly the integration of vision and language, offer a promising direction for improving soft robot autonomy. By leveraging vision-language models, soft robots can interpret human instructions in natural language while grounding their actions in visual perception. Therefore, this workshop focuses on multimodal soft robot planning, aiming to develop efficient control strategies that bridge the gap between high-level human intent and low-level robot execution. The ultimate goal is to enhance the adaptability and usability of soft robots in real-world applications.
Invited Speakers

Cosimo Della Santina
TU Delft

Xiang LI
Tsinghua University

Shuqiang Jiang
Chinese Academy of Sciences

Qin Jin
Renmin University of China

Jiebo Luo
University of Rochester

Mohan Kankanhalli
National University of Singapore
Schedule
Tentative schedule - Half-day workshop
| Time | Event |
| --- | --- |
| TBD | Opening Remarks |
| TBD | Invited Talk: Cosimo Della Santina |
| TBD | Invited Talk: Xiang LI |
| TBD | Invited Talk: Shuqiang Jiang |
| TBD | Coffee Break |
| TBD | Invited Talk: Qin Jin |
| TBD | Invited Talk: Jiebo Luo |
| TBD | Invited Talk: Mohan Kankanhalli |
| TBD | Challenge Results Announcement |
| TBD | Challenge Winner Presentations |
| TBD | Paper Presentations |
| TBD | Panel Discussion & Closing Remarks |
Challenge
To advance research in multimodal soft robot planning, we propose two challenge tasks:
Task 1: Vision-Language Manipulation for Soft Robots
In this task, a soft robot operates in a cluttered workspace containing various objects, such as cubes, spheres, and cones. One end of the soft robot is fixed to the surface, while the other end moves freely to perform manipulation. The robot receives a natural language instruction and multi-perspective visual observations as inputs; the instruction specifies the objects to be manipulated and their target locations.

Example instructions:
- "pick up the cone and place it in the gray area on the right side of the workspace."
- "pick up the blue cylinder on the left and place it in the gray area."
- "place the blue cube in the blue area and then place the green cube in the green area."
- "pick up the red cube behind the tall cone and place it in the red area."
Easy-Track: Single-Object Manipulation
In the Easy-Track, each instruction involves only one object. For example, an instruction might be: “Pick up the cone and place it in the gray area on the right side of the workspace.”

Illustration of Task 1 Easy-Track: The soft robot must reach the cone and place it in the target area while avoiding collision with the cube in between.
Hard-Track: Multi-Object Manipulation
The Hard-Track increases complexity by involving multiple objects within a single manipulation task. For example, an instruction might be: “Pick up the red cube behind the tall cone and place it in the red area.”

Illustration of Task 1 Hard-Track: The soft robot must first move the cone and cylinder obstacles before placing the cube in its target location.
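To make the Task 1 input/output structure concrete, the sketch below shows what a participant's policy might look like: it consumes the instruction plus multi-view observations and emits a symbolic pick-and-place plan. All class and method names here (`Observation`, `KeywordPolicy`, `plan`) are our own assumptions for illustration, not the official challenge API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical observation for one step: the language instruction plus RGB
# frames from several camera viewpoints (nested lists standing in for image
# arrays). This is NOT the official challenge interface.
@dataclass
class Observation:
    instruction: str
    views: List[List[List[Tuple[int, int, int]]]]  # one HxW RGB grid per camera

# A trivial keyword baseline: extract the target object and target area from
# the instruction and return a symbolic pick-and-place plan. A real entry
# would ground these in the visual observations instead.
class KeywordPolicy:
    OBJECTS = ("cone", "cube", "sphere", "cylinder")
    AREAS = ("gray", "red", "green", "blue")

    def plan(self, obs: Observation) -> List[str]:
        text = obs.instruction.lower()
        target = next((o for o in self.OBJECTS if o in text), None)
        area = next((a for a in self.AREAS if f"{a} area" in text), None)
        if target is None or area is None:
            return ["no-op"]
        return [f"pick {target}", f"place {area} area"]

obs = Observation(
    instruction="pick up the cone and place it in the gray area on the right side of the workspace.",
    views=[],
)
print(KeywordPolicy().plan(obs))  # ['pick cone', 'place gray area']
```

A keyword match like this would only cover the Easy-Track's single-object phrasing; the Hard-Track's referring expressions ("the red cube behind the tall cone") require visual grounding.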
Task 2: Vision-Language Navigation for Soft Robots
Vision-language navigation for soft robots opens a new setting for embodied intelligence, in which compliant robots follow natural language instructions to navigate dynamic environments, adapting their morphology to the obstacles they encounter.
Easy-Track: Navigation in sparse obstacles example
Easy-Track: Different view of navigation with sparse obstacles
Hard-Track: Navigation in dense obstacles example
Hard-Track: Multiple sub-goals navigation scenario
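As a simplified illustration of the Hard-Track's multiple-sub-goal structure, the sketch below plans a route that visits a sequence of sub-goals in order, using breadth-first search on a 2D occupancy grid. The real challenge involves a deformable robot in a continuous environment; this grid abstraction and all function names are our own assumptions, not challenge code.

```python
from collections import deque
from typing import List, Optional, Tuple

Cell = Tuple[int, int]

def bfs_path(grid: List[List[int]], start: Cell, goal: Cell) -> List[Cell]:
    """Shortest 4-connected path on an occupancy grid (0 = free, 1 = obstacle)."""
    rows, cols = len(grid), len(grid[0])
    prev: dict = {start: None}
    queue = deque([start])
    while queue:
        cur: Optional[Cell] = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:          # walk back through predecessors
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return []  # goal unreachable

def navigate(grid: List[List[int]], start: Cell, subgoals: List[Cell]) -> List[Cell]:
    """Chain BFS legs so the robot visits each sub-goal in the given order."""
    route, pos = [start], start
    for goal in subgoals:
        leg = bfs_path(grid, pos, goal)
        if not leg:
            return []  # some sub-goal unreachable
        route += leg[1:]
        pos = goal
    return route

grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],   # two obstacle cells the route must detour around
    [0, 0, 0, 0],
]
route = navigate(grid, (0, 0), [(0, 3), (2, 0)])
print(route)
```

A navigation entry for the challenge would replace the grid with perception from the robot's cameras and the sub-goal list with goals parsed from the language instruction, but the visit-sub-goals-in-order skeleton stays the same.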
Organizers

Si Liu
Beihang University

Li Wen
Beihang University

Chen Gao
Beihang University

Ziyu Ren
Beihang University

Luting Wang
Beihang University

Jiaqi Liu
Beihang University

Heqing Yang
Beihang University

Xingyu Chen
Beihang University

Youning Duo
Beihang University
Call for Papers
This workshop aims to bring together researchers and practitioners from different disciplines to share ideas and methods on soft robot learning. We welcome research contributions as well as best-practice contributions on (but not limited to) the following topics:
- Multimodal robot manipulation in constrained environments.
- Multimodal navigation with mobile robots.
- Learning-based paradigms for soft robot control.
- Multimodal perception and modeling for soft-bodied robots.
All submissions must present original work not under review at any other workshop, conference, or journal. The workshop accepts papers describing both completed work and work in progress. Only one submission format is accepted: full papers, which must follow the formatting guidelines of the main conference, ACM MM 2025.
Important Dates
- Paper submission deadline: TBD
- Paper notification: TBD
- Camera-ready submission: TBD
- Challenge submission deadline: TBD
- Workshop date: At ACM MM 2025
Prizes
- Prizes for the winner of each challenge track
- Best paper award