Leaderboard

Registration: All teams wishing to participate and obtain official eligibility must register here; team member information can be updated after registration.

Submission: All teams may submit docker images here.

Note: If a team makes multiple submissions, the best-scoring one is used.

VLM
Team ID Name Score Submit Time
4c1593 SoftVisioBots - -
- - - -
- - - -
- - - -
- - - -
VLN
Team ID Name Score Submit Time
4c1593 SoftVisioBots - -
- - - -
- - - -
- - - -
- - - -

*The above tables are updated every two days.

Data Details

Download VLM data

Download VLN data

The data directory structure is as follows:

train
├── annotations.json
├── scenarios
│   ├── 0
│   │   └── config.yaml
│   ├── ...
│   └── 99
└── trajectories
    ├── 0
    │   ├── actions.json (only exists for VLN tasks)
    │   ├── state_action.pkl
    │   └── visual
    │       ├── step_00000.png
    │       ├── step_00001.png
    │       └── ...
    ├── ...
    └── 99

The scenarios directory stores the environment configurations.

The trajectories directory stores the data annotations for each piece of data, with subfolders named by the data's id.

The visual directory stores top-down renderings of the task process. The visualizations provided in the dataset are rendered every 1333 simulation time steps (i.e., 1333 time steps between two consecutive frames; this follows from the default simulator settings, where rendering_fps = 15 and time_step = 5.0e-05 give 1 / (15 × 5.0e-05) ≈ 1333 steps per frame). For more fine-grained visualization, users can render independently.
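
A minimal sketch for loading the rendered frames of one trajectory and estimating the simulation time of each frame (the paths follow the layout above; the helper and its defaults are illustrative, not part of the official toolkit):

import glob
import os

from PIL import Image  # Pillow, for reading the PNG frames


def load_frames(trajectory_dir, time_step=5.0e-05, rendering_fps=15):
    """Load top-down renderings and estimate the simulation time of each frame."""
    # ~1333 simulation steps between two consecutive frames, as noted above.
    steps_per_frame = round(1.0 / (rendering_fps * time_step))
    frame_paths = sorted(glob.glob(os.path.join(trajectory_dir, "visual", "step_*.png")))
    frames = []
    for idx, path in enumerate(frame_paths):
        frames.append({
            "image": Image.open(path),
            "sim_step": idx * steps_per_frame,
            "sim_time": idx * steps_per_frame * time_step,  # seconds
        })
    return frames


frames = load_frames("train/trajectories/0")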

File Details

config.yaml contains configuration information for the simulation environment.

Example
objects:
- center:
  - 0.4863890172516796
  - 0.2153593989325976
  - 3.1806980291836973
  color: ''
  mesh_path: ./assets/cylinder.stl
  scale:
  - 0.215
  - 0.215
  - 0.215
  shape: cylinder
  type: mesh_surface
- center:
  - 2.8291035111323892
  - 0.23340298631651013
  - 4.21278268891604
  density: 1.0
  radius: 0.23340298631651013
  type: sphere

rod:
  base_length: 0.5
  base_radius: 0.025
  density: 1000
  direction:
  - 0.0
  - 0.0
  - 1.0
  n_elem: 20
  normal:
  - 0.0
  - 1.0
  - 0.0
  poisson_ratio: 0.5
  start:
  - 0.0
  - 0.0
  - 0.0
  youngs_modulus: 10000000
simulator:
  collect_data: true
  final_time: 10.0
  rendering_fps: 15
  time_step: 5.0e-05
  update_interval: 1
Keyword Meaning
objects Configurations for all objects in the environment other than the soft robot, including basic types such as sphere and mesh surface.
rod Configuration of the Cosserat rod that simulates the soft robot; modifying these parameters is not recommended.
simulator Simulator-related configuration. The maximum simulation duration can be adjusted via the final_time field; modifying the other parameters is not recommended.
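
For reference, the configuration can be read with a standard YAML parser. A minimal sketch (the field names are taken from the example above; the path and print statements are only illustrative):

import yaml  # PyYAML

with open("train/scenarios/0/config.yaml") as f:
    config = yaml.safe_load(f)

# Objects other than the soft robot (spheres, mesh surfaces, ...).
for obj in config["objects"]:
    print(obj["type"], obj["center"])

# Cosserat rod parameters; modifying these is not recommended.
print("rod elements:", config["rod"]["n_elem"])

# Only final_time is meant to be adjusted.
print("simulated duration:", config["simulator"]["final_time"], "s")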

annotations.json contains all annotation information.

Example
// VLM
[
    {
        "id": 0,
        "target_object_id": 5,
        "target_position_id": 6,
        "instruction": "Pick up the basketball and place it in the gray zone."
    },
    {
        "id": 1,
        "target_object_id": 5,
        "target_position_id": 6,
        "instruction": "Pick up the red book and place it in the gray zone."
    }
]
// VLN
[
    {
        "id": 0,
        "target_id": 10,
        "description": "Explore the environment and find: indigo hemisphere, remember to carefully cross any potential obstacles."
    },
    {
        "id": 1,
        "target_id": 10,
        "description": "Navigate to: red cone, ensuring you avoid all obstacles to arrive safely."
    }
]

Keyword Meaning
id The data id. The corresponding environment configuration is stored in scenarios/<id>, and the trajectory annotations are stored in trajectories/<id>.
target_id (VLN) ID of the target object in the environment.
target_object_id (VLM) ID of the target object in the environment.
target_position_id (VLM) ID of the target position in the environment, mainly used to construct the simulation environment.
instruction / description Text guidance for the task (the field is named instruction in the VLM annotations and description in the VLN annotations).
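
A minimal sketch for reading annotations.json and locating the corresponding scenario and trajectory folders (the paths follow the directory layout above; the snippet itself is illustrative, not part of the toolkit):

import json
import os

data_root = "train"  # root of the downloaded VLM or VLN data

with open(os.path.join(data_root, "annotations.json")) as f:
    annotations = json.load(f)

for ann in annotations:
    idx = ann["id"]
    scenario_cfg = os.path.join(data_root, "scenarios", str(idx), "config.yaml")
    trajectory_dir = os.path.join(data_root, "trajectories", str(idx))
    # VLM annotations carry "instruction"; VLN annotations carry "description".
    text = ann.get("instruction", ann.get("description"))
    print(idx, text, scenario_cfg, trajectory_dir)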

state_action.pkl stores the actions executed and the soft robot's states while completing the current task.

Keyword Meaning
rod_time (n_time_steps, ), recording the time instants corresponding to the rod’s positions.
torque_time (n_time_steps, ), recording the time instants of applied torques.
position (n_time_steps, 3, n_elem+1), position information of the rod at each time step. 3 denotes the number of spatial coordinates, and n_elem is the total number of rod segments (details can be found in Cosserat rod simulation methods). Positions record the start and end positions of each segment rather than the center, hence the third dimension is n_elem+1 instead of n_elem.
velocity (n_time_steps, 3, n_elem+1), velocity information of the rod at each time step, with each dimension defined as above.
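
A minimal sketch for inspecting one trajectory's state/action record, assuming the pickle deserializes into a dictionary keyed by the field names above (adapt the access pattern if the actual container differs):

import pickle

with open("train/trajectories/0/state_action.pkl", "rb") as f:
    record = pickle.load(f)  # assumed to be a dict with the keys listed above

rod_time = record["rod_time"]    # (n_time_steps,)
position = record["position"]    # (n_time_steps, 3, n_elem + 1)
velocity = record["velocity"]    # (n_time_steps, 3, n_elem + 1)

# The last axis indexes the n_elem + 1 segment endpoints of the rod.
print(rod_time.shape, position.shape, velocity.shape)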

For VLN, we have encapsulated higher-level atomic actions (i.e., move forward, turn left, turn right) and map these actions to torques for control. The atomic actions taken to complete the task are recorded in actions.json in the format timestep: action_type. For time steps that are not explicitly listed, the most recently specified action is maintained. The torques corresponding to these atomic actions are exactly the torques recorded in state_action.pkl.

Example
{"9000": 1, "43000": 2, "49000": 1}

Codebase

Inference

# inference on the VLN task
# --control-mode: torque or action, depending on your control mode
# --run-gt: execute the provided trajectory
# --visualize: save images and a video of the entire trajectory
python -m ssim.inference.vln \
    --data-path your/data_path \
    --work-dir your/work_dir \
    --control-mode torque \
    --run-gt \
    --visualize

# inference on the VLM task
# --run-gt: execute the provided trajectory
# --visualize: save images and a video of the entire trajectory
python -m ssim.inference.vlm \
    --data-path your/data_path \
    --work-dir your/work_dir \
    --run-gt \
    --visualize

Customize Model

Define Your Model

ssim/inference/model.py provides the basic framework for custom models. You can define your own model based on VLNModel or VLMModel, implement their predefined __init__() and forward() methods, and keep the interfaces aligned. Once this is done, you can run the test script directly to test your model.
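
For illustration only, a skeleton of what such a subclass might look like; the actual constructor arguments and the inputs/outputs of forward() are defined in ssim/inference/model.py and should be taken from there rather than from this sketch:

from ssim.inference.model import VLNModel  # or VLMModel for the VLM task


class MyVLNModel(VLNModel):
    def __init__(self, checkpoint_path=None, **kwargs):
        # checkpoint_path is an illustrative argument; match the real
        # signature defined in ssim/inference/model.py.
        super().__init__(**kwargs)
        self.checkpoint_path = checkpoint_path

    def forward(self, observation):
        # Map the current observation to the next action (or torque),
        # matching the interface expected by the inference scripts.
        raise NotImplementedError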

Train Your Model

If you need to perform imitation learning training, you can train your model directly on the provided dataset after defining it.
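
As one possible starting point, a sketch of a PyTorch dataset that exposes each trajectory's rendered frames together with its recorded states and actions; the file names follow the data description above, while the pairing and preprocessing are left to you:

import glob
import os
import pickle

from torch.utils.data import Dataset


class TrajectoryDataset(Dataset):
    """Illustrative imitation-learning dataset over train/trajectories/<id>."""

    def __init__(self, data_root):
        self.trajectory_dirs = sorted(
            glob.glob(os.path.join(data_root, "trajectories", "*")),
            key=lambda p: int(os.path.basename(p)),
        )

    def __len__(self):
        return len(self.trajectory_dirs)

    def __getitem__(self, index):
        traj_dir = self.trajectory_dirs[index]
        with open(os.path.join(traj_dir, "state_action.pkl"), "rb") as f:
            record = pickle.load(f)
        frame_paths = sorted(glob.glob(os.path.join(traj_dir, "visual", "step_*.png")))
        # Convert frames, states, and actions into tensors as required by your model.
        return {"record": record, "frame_paths": frame_paths}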

For reinforcement learning training:

You can build your RL training environment by inheriting from the provided base environments and customizing methods such as reward() and get_state(); refer to the initialization scripts in ssim/inference/vln or ssim/inference/vlm for environment setup.
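
For illustration, the overriding pattern might look like the following; BaseEnv is only a stand-in for the base environment class shipped with the codebase and should be replaced with the real one:

# BaseEnv stands in for the base environment class provided alongside
# ssim/inference/vln and ssim/inference/vlm; import the actual class instead.
class BaseEnv:
    pass


class MyTrainingEnv(BaseEnv):
    def get_state(self):
        # Return the observation your policy consumes
        # (rod state, rendering, target information, ...).
        raise NotImplementedError

    def reward(self):
        # Shape the reward for your RL algorithm,
        # e.g. the negative distance to the target.
        raise NotImplementedError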

Development Toolkit

During this workshop, we provide a base Docker image for teams to set up their environment. The image is pre-configured with the dependencies for Elastica and PyTorch and can be obtained with the docker pull command.

To build the base environment container, follow the steps below:

docker pull crpi-juuq9gkz24fdpj2r.cn-beijing.personal.cr.aliyuncs.com/wangluting/robosoft:1.0
docker run -v <data_path>:/data -d --name <name> -it <image> /bin/bash

Teams are required to develop their programs on top of the provided base image, push the image to Docker Hub, and submit its Docker Hub URL. We will use this image for testing.
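
One possible submission workflow, starting from a container created with the docker run command above (the container name, image name, and tag are placeholders to replace with your own):

docker login
docker commit <name> <dockerhub_user>/<image_name>:<tag>
docker push <dockerhub_user>/<image_name>:<tag>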

Specifically, we will mount the test data into the container's /data directory via docker run -v, so please ensure that the /data directory in the image is empty.

After mounting, the directory structure below will be available inside the container. Teams should keep this file layout and complete the task based on it.

/data
├── vln_train
├── vlm_train
├── vln_eval
│   ├── ...
│   └── annotations.json
└── vlm_eval
    ├── ...
    └── annotations.json