Overview
Embodied intelligence, realized through robots' interaction with their environment, has evolved rapidly from rule-based control to autonomous systems that integrate deep learning and reinforcement learning. Current research in embodied intelligence predominantly focuses on rigid robotic platforms. However, rigid materials not only limit flexibility and increase collision risk but also leave such robots poorly suited to unstructured and constrained environments. To address these limitations, researchers have drawn inspiration from the biological traits of mollusks, introducing flexible materials into robotic design and advancing embodied intelligence centered on soft robotic platforms. Owing to their deformable bodies, soft robots offer highly adaptive and safe solutions, particularly for human-robot collaboration and tasks in complex environments. Nevertheless, their underactuated nature and strongly nonlinear dynamics pose significant challenges for the design of autonomous control systems.
This workshop focuses on multimodal perception and decision-making for soft robots, promoting cutting-edge embodied-intelligence research built on soft robotic platforms. We aim to bring together researchers and practitioners to explore emerging challenges and solutions, including the design of adaptive control systems for soft robots, the integration of multimodal sensory data, and the optimization of decision-making algorithms under nonlinear dynamics.
Invited Speakers

Cosimo Della Santina
TU Delft

Xiang Li
Tsinghua University

Shuqiang Jiang
Chinese Academy of Sciences

Qin Jin
Renmin University of China

Jiebo Luo
University of Rochester
Call for Papers
This workshop aims to bring together researchers and practitioners from different disciplines to share ideas and methods on soft robot learning. We welcome research contributions as well as best-practice contributions on (but not limited to) the following topics:
- Multimodal Embodied Navigation: visual navigation, vision-language navigation, soft robot navigation
- Multimodal Embodied Manipulation: grasping, dexterous manipulation, soft-hand manipulation, tool manipulation
- Embodied Reasoning: spatial reasoning, affordance learning, task planning
- Embodied Perception: multi-modal perception, active perception
- Embodied Simulation: 2D/3D reconstruction, sim-to-real, benchmark
- Control Methods for Soft Robots: model-based/learning-based control methods
For this workshop, we accept papers meeting the following requirements:
- 4–8 pages for the main text, plus up to 2 pages for references.
- Topics include but are not limited to original ideas, perspectives, research visions, and open challenges in the themes outlined above.
Submission Website: https://openreview.net/group?id=acmmm.org%2FACMMM%2F2025%2FWorkshop%2FRoboSoft#tab-your-consoles
Submission templates are available on the ACM MM 2025 website.
Submissions must adhere to the ACM MM 2025 submission policies.
The workshop will also present a Best Paper Award.
Challenge
To advance research in multimodal soft robot planning, we propose two challenge tasks.
This competition employs Elastica, open-source software developed by the Gazzola Lab at UIUC, for soft-body dynamics modeling, establishing a benchmark platform for simulating soft robot dynamics and interaction. In this benchmark, each soft robot is modeled as a single Cosserat rod moving freely in 3D space, serving as a flexible manipulator in Task 1 and a flexible mobile body in Task 2. The rod has a Young's modulus of 10 MPa, giving it the bending stiffness typical of soft robots.
Actuation is realized through internal moments distributed along the rod's length. The continuous activation function is described by spline curves defined by N independent control points and approaches zero at both ends of the rod. Precise control is achieved by decomposing the overall actuation into orthogonal moment components along the local normal and binormal directions (inducing bending) and the tangent direction (inducing torsion).
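As a concrete illustration of this setup, the sketch below builds a single Cosserat rod in PyElastica with the specified 10 MPa Young's modulus and applies a spline-parameterized internal torque profile that vanishes at both rod ends. Only the Young's modulus comes from the challenge description; the rod geometry, density, number of control points, control-point values, and the SplineTorques class are illustrative assumptions, and keyword names may vary slightly between PyElastica versions.

```python
import numpy as np
from scipy.interpolate import CubicSpline
import elastica as ea

# --- Rod definition (Young's modulus from the challenge; other values are assumptions) ---
n_elements = 50
young = 10e6  # 10 MPa, as specified in the challenge
rod = ea.CosseratRod.straight_rod(
    n_elements=n_elements,
    start=np.zeros(3),
    direction=np.array([0.0, 0.0, 1.0]),
    normal=np.array([0.0, 1.0, 0.0]),
    base_length=1.0,      # assumed rod length [m]
    base_radius=0.025,    # assumed rod radius [m]
    density=1000.0,       # assumed density [kg/m^3]
    youngs_modulus=young,
    shear_modulus=young / (2.0 * (1.0 + 0.5)),  # assumed near-incompressible material
)

# --- Spline activation: N control points, pinned to zero at both rod ends ---
N = 6  # number of independent control points (assumption)
control_points = np.array([0.0, 0.3, 0.8, -0.5, 0.2, 0.0])  # illustrative moment amplitudes [N*m]
s_ctrl = np.linspace(0.0, 1.0, N)            # control-point locations (normalized arc length)
s_elem = np.linspace(0.0, 1.0, n_elements)   # element centers along the rod
torque_profile = CubicSpline(s_ctrl, control_points)(s_elem)

class SplineTorques(ea.NoForces):
    """Apply the spline-defined internal torque about one local direction.

    direction_index 0/1 (normal/binormal) induces bending; 2 (tangent) induces torsion.
    """

    def __init__(self, torque_profile, direction_index=0):
        super().__init__()
        self.torque_profile = torque_profile
        self.direction_index = direction_index

    def apply_torques(self, system, time=0.0):
        # external_torques is stored in the rod's local (material) frame, shape (3, n_elements)
        system.external_torques[self.direction_index, :] += self.torque_profile
```

In a complete simulation, such a forcing class would be attached with PyElastica's standard add_forcing_to(rod).using(...) pattern and integrated with a timestepper, following the official Elastica examples.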
Task 1: Vision-Language Manipulation for Soft Robot
Vision-Language Manipulation aims to endow soft robots with the ability to interact with objects based on human instructions and visual perception—a capability crucial for manufacturing and medical fields, encompassing scenarios such as object grasping, component assembly, item classification, and even surgical assistance. In this task, the soft robot must operate within a complex workspace containing various objects (e.g., cubes, spheres, cones). One end of the robot is fixed to a base, while the other end moves freely to accomplish manipulation tasks.
The system inputs include natural language instructions (specifying the object to manipulate and its target position) and multi-view visual observations. The robot must first recognize and localize the target object from visual inputs, then execute motions to transport it to the specified position. The operation is deemed successful when the object accurately reaches the target location.
Instruction: Move the football to the basketball
Instruction: Move the smaller yellow roadblocks next to the larger roadblocks
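The exact tolerance behind "accurately reaches the target location" is not specified above; as a placeholder, success can be checked with a simple distance threshold between the manipulated object and the target position, as in the sketch below. The function name and the 5 cm tolerance are assumptions, not the official evaluation metric.

```python
import numpy as np

def manipulation_success(object_position, target_position, tolerance=0.05):
    """Return True if the object's center lies within `tolerance` meters of the target.

    The 0.05 m tolerance is an illustrative assumption; the official challenge
    evaluation may use a different criterion.
    """
    distance = float(np.linalg.norm(np.asarray(object_position) - np.asarray(target_position)))
    return distance <= tolerance

# Example: the object ends up about 2.8 cm from the commanded target, which counts as a success here.
print(manipulation_success([0.10, 0.02, 0.0], [0.12, 0.00, 0.0]))  # True
```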
Task 2: Vision-Language Navigation for Soft Robot
Vision-Language Navigation requires soft robots to autonomously explore complex environments by comprehending linguistic instructions and parsing visual cues. This task is significant for applications such as disaster search and rescue as well as exploration. Within this task, the agent must process synchronized multimodal inputs comprising visual observations and natural language instructions, which requires cross-modal alignment between vision-language understanding and soft-body dynamics modeling: instructions must be translated into continuum-mechanics control actions.
The solution space must jointly optimize semantic localization accuracy, deformation trajectory smoothness, and obstacle avoidance feasibility under time-varying boundary conditions. Vision-Language Navigation for soft robots establishes a new research domain in embodied intelligence, where soft robots execute navigation tasks through morphological adaptation in dynamic environments.
Instruction: Navigate to the basketball between two footballs
Instruction: Navigate to the football next to the yellow cone
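To make the joint optimization above concrete, one way a submission might score an episode is a weighted combination of the three criteria (semantic localization accuracy, deformation trajectory smoothness, and obstacle avoidance feasibility). The sketch below is only an illustrative formulation; the weights, terms, and function name are assumptions rather than the official challenge metric.

```python
import numpy as np

def navigation_cost(goal_error, tip_trajectory, min_obstacle_clearance,
                    w_goal=1.0, w_smooth=0.1, w_clear=0.5):
    """Illustrative composite cost for one navigation episode (lower is better).

    goal_error: final distance [m] between the rod and the instructed goal object.
    tip_trajectory: (T, 3) array of positions of the rod's moving end over the episode.
    min_obstacle_clearance: smallest rod-to-obstacle distance [m] observed (negative if penetrating).
    All weights are assumptions.
    """
    # Semantic localization accuracy: how close the robot ends up to the described goal.
    goal_term = goal_error
    # Deformation trajectory smoothness: mean squared second difference of the tip path.
    accel = np.diff(np.asarray(tip_trajectory), n=2, axis=0)
    smooth_term = float(np.mean(np.sum(accel**2, axis=1))) if len(accel) else 0.0
    # Obstacle avoidance feasibility: penalize contact or penetration.
    clearance_term = max(0.0, -min_obstacle_clearance)
    return w_goal * goal_term + w_smooth * smooth_term + w_clear * clearance_term
```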
Important Dates
- Paper submission deadline: July 11, 2025
- Challenge submission deadline: July 30, 2025
- Paper notification: August 1, 2025
- Camera-ready submission: August 11, 2025
- Workshop date: October 27–28, 2025
Program Committee

Si Liu
Beihang University

Li Wen
Beihang University

Chen Gao
Beihang University

Ziyu Ren
Beihang University

Luting Wang
Beihang University

Jiaqi Liu
Beihang University

Heqing Yang
Beihang University

Xingyu Chen
Beihang University

Youning Duo
Beihang University
Challenge Technical Committee

Ziyu Wei
Beihang University

Hongliang Huang
Beihang University