This repo contains the code implementation for VLA-Touch:

**VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback**

[Arxiv] [Project Page] [Video]
We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing without fine-tuning the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model to provide semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision.
Figure 1: Dual-level tactile feedback framework of VLA-Touch.
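The dual-level design described above can be sketched in a few lines. Everything below (class and function names, the tactile signal format, the correction rule) is an illustrative stand-in, not the repo's actual API: the real system wires together Octopi (the tactile-language model), a frozen RDT-1B-based VLA, and the controllers in `residual_controller/`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TactileReading:
    marker_displacement: List[float]  # placeholder for a raw tactile signal


def octopi_describe(reading: TactileReading) -> str:
    """Stand-in for the tactile-language model: maps raw touch to semantics."""
    soft = sum(abs(v) for v in reading.marker_displacement) < 1.0
    return "the object feels soft" if soft else "the object feels rigid"


def vla_policy(instruction: str, tactile_hint: str) -> List[float]:
    """Stand-in for the frozen VLA: proposes a coarse action (no fine-tuning)."""
    return [0.1] * 7  # e.g. a 7-DoF end-effector action


def tactile_controller(action: List[float], reading: TactileReading) -> List[float]:
    """Stand-in for the tactile-conditioned controller: refines the VLA action."""
    correction = 0.01 if "soft" in octopi_describe(reading) else -0.01
    return [a + correction for a in action]


# One step of the dual-level loop: semantic feedback informs high-level
# planning, while low-level tactile signals refine the executed action.
reading = TactileReading(marker_displacement=[0.1, 0.2, 0.1])
hint = octopi_describe(reading)               # high-level semantic feedback
coarse = vla_policy("wipe the table", hint)   # frozen VLA proposes an action
refined = tactile_controller(coarse, reading) # controller refines it
```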
## Installation

1. Follow the RDT-1B installation (see the official RDT-1B instructions).

2. Clone VLA-Touch and copy its files into your RDT-1B directory, replacing the originals:

   ```bash
   git clone https://github.com/jxbi1010/VLA-Touch
   # Copy relevant files to your RDT-1B directory, replacing originals as needed
   ```

3. Download the dataset and controller checkpoints from the Google Drive folder, or get the processed dataset from Hugging Face. Then copy the controller checkpoints:

   ```bash
   cp controller_ckpt/* VLA/residual_controller/checkpoints/
   ```
## Dataset Processing (for reference)

1. Copy the dataset files:

   ```bash
   cp vla_data/* VLA/data/datasets/
   ```

2. Convert raw data to `.h5` format:

   ```bash
   # Run the provided scripts to convert raw data
   cd VLA/data/franka_data
   python convert*_to_h5.py  # Replace with actual processing scripts
   # The resulting files should look like: vla_data/wipe_example/episode_*.h5
   ```

   If you need our processed dataset, please contact us.

3. Compute dataset statistics and update the configs:

   ```bash
   # Use RDT scripts to compute dataset statistics
   python compute_dataset_stats.py
   ```

4. Install Octopi by following the instructions in `octopi/README.md`.

5. Copy the Octopi data files:

   ```bash
   # Download from Google Drive and copy to the correct location
   cp octopi_data/* octopi/octopi_s/data/
   ```
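Dataset statistics of the kind `compute_dataset_stats.py` produces are typically per-dimension min/max/mean/std over all actions, used for normalization. The actual script and the keys it reads belong to RDT; the function below is only a minimal stdlib sketch of the idea, with illustrative names and toy data:

```python
import math


def dataset_stats(episodes):
    """Per-dimension min/max/mean/std over all action vectors.

    `episodes` is a list of episodes, each a list of fixed-length
    action vectors (illustrative layout, not RDT's actual format).
    """
    actions = [a for ep in episodes for a in ep]
    n = len(actions)
    stats = {}
    for d in range(len(actions[0])):
        col = [a[d] for a in actions]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n  # population variance
        stats[d] = {
            "min": min(col),
            "max": max(col),
            "mean": mean,
            "std": math.sqrt(var),
        }
    return stats


# Toy example: two episodes of 2-D actions
eps = [[[0.0, 1.0], [2.0, 1.0]], [[4.0, 1.0]]]
s = dataset_stats(eps)
```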
## Training and Inference

1. Follow RDT-1B to fine-tune the VLA base model (without tactile data).

2. Run the scripts in `residual_controller/` for controller training and testing, e.g.:

   ```bash
   # Training for the interpolant controller
   python bridge_train.py
   # Testing for the interpolant controller
   python bridger_test.py
   # Training for the residual controller
   python lstm_train.py
   # Testing for the residual controller
   python lstm_step_test.py
   ```

3. For Octopi inference, run `octopi/octopi_s/touch_vla.py` with your own VLM API.

4. Our inference method is modified from the RDT inference script; our version will be released soon.
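Since the inference script is not yet released, the sketch below only illustrates how the released pieces could compose at inference time: the frozen VLA proposes an action chunk, and a residual controller applies a per-step tactile correction. Every function here is a hypothetical stand-in, not the repo's API:

```python
# Hedged sketch of a tactile-refined inference loop; all names are
# illustrative stand-ins for the frozen RDT-1B policy and the LSTM
# residual controller trained by lstm_train.py.

def vla_action_chunk(observation):
    """Stand-in for the frozen VLA: returns a chunk of 7-DoF actions."""
    return [[0.05 * t] * 7 for t in range(4)]


def residual_step(action, tactile):
    """Stand-in for the residual controller: a small per-step correction
    predicted from the tactile signal (here, a toy contact flag)."""
    residual = 0.02 if tactile["contact"] else 0.0
    return [a + residual for a in action]


def rollout(observation, tactile_stream):
    executed = []
    chunk = vla_action_chunk(observation)  # coarse plan from the VLA
    for action, tactile in zip(chunk, tactile_stream):
        # refine each step with the latest tactile reading
        executed.append(residual_step(action, tactile))
    return executed


traj = rollout({"rgb": None}, [{"contact": t > 1} for t in range(4)])
```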
## Citation

If you find our work useful, please consider citing:

```bibtex
@misc{bi2025vlatouchenhancingvisionlanguageactionmodels,
      title={VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback},
      author={Jianxin Bi and Kevin Yuchen Ma and Ce Hao and Mike Zheng Shou and Harold Soh},
      year={2025},
      eprint={2507.17294},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2507.17294},
}
```

## License

VLA-Touch is licensed under the MIT license. See the LICENSE file for details.
## Acknowledgements

VLA-Touch builds on many open-source works, including BRIDGeR, Octopi, and RDT-1B. We thank the authors for open-sourcing their code and for their great contributions to the community.
