This repo contains the code of Touch-LLM for UniTouch. Our code is built on top of the ImageBind and LLaMA-Adapter codebases.
If you only need the touch encoder to extract tactile embeddings (without the language model):
- Download the pretrained touch encoder (`last_new.ckpt`) from the HuggingFace model hub and put it in the `./UniTouch` folder, at the same level as `touch_qa.py`.
- Run the standalone touch encoder:

  ```
  python load_touch_encoder.py
  ```

This will load only the touch encoder and extract 1024-dimensional embeddings from touch images. You can use these embeddings for:
- Touch-based similarity search
- Touch classification
- Touch feature extraction
- Cross-modal retrieval (touch-to-vision, touch-to-text, etc.)
Example usage:
```python
import torch
import ImageBind.data as data
from ImageBind.models.x2touch_model_part import x2touch, ModalityType

# Load the touch encoder
model = x2touch(pretrained=True)
model.eval()

# Extract embeddings from touch images
touch_images = data.load_and_transform_vision_data(["path/to/touch.jpg"], device="cuda")
with torch.no_grad():
    embeddings = model({ModalityType.TOUCH: touch_images})[ModalityType.TOUCH]  # Shape: [1, 1024]
```
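As a further illustration, here is a minimal sketch of touch-based similarity search with these embeddings. The gallery and query image paths below are placeholders, and the cosine-similarity ranking is just one simple choice, not something provided by the released code:

```python
import torch
import ImageBind.data as data
from ImageBind.models.x2touch_model_part import x2touch, ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the touch encoder once and reuse it for all queries
model = x2touch(pretrained=True)
model.eval()
model.to(device)  # no-op if the encoder is already on this device

# Embed a small gallery of touch images (placeholder paths)
gallery_paths = ["touch_gallery/01.jpg", "touch_gallery/02.jpg", "touch_gallery/03.jpg"]
gallery_inputs = data.load_and_transform_vision_data(gallery_paths, device=device)
with torch.no_grad():
    gallery_emb = model({ModalityType.TOUCH: gallery_inputs})[ModalityType.TOUCH]  # [N, 1024]

# Embed the query touch image (placeholder path)
query_inputs = data.load_and_transform_vision_data(["path/to/query_touch.jpg"], device=device)
with torch.no_grad():
    query_emb = model({ModalityType.TOUCH: query_inputs})[ModalityType.TOUCH]  # [1, 1024]

# Rank the gallery by cosine similarity to the query
query_emb = torch.nn.functional.normalize(query_emb, dim=-1)
gallery_emb = torch.nn.functional.normalize(gallery_emb, dim=-1)
scores = query_emb @ gallery_emb.T  # [1, N]
ranking = scores.squeeze(0).argsort(descending=True)
for rank, idx in enumerate(ranking.tolist()):
    print(f"{rank + 1}. {gallery_paths[idx]} (cosine similarity {scores[0, idx].item():.3f})")
```

The cross-modal retrieval use cases listed above follow the same pattern; the only difference is that the gallery embeddings come from another encoder in the shared embedding space (e.g., ImageBind's vision or text encoder) rather than from the touch encoder.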
If you need the full Touch-LLM for tactile question answering:

- Download the pretrained touch encoder (`last_new.ckpt`) from the HuggingFace model hub and put it in the `./UniTouch` folder, at the same level as `touch_qa.py`.
- Download the `ckpts` folder from the HuggingFace model hub and put it in the `./UniTouch` folder, at the same level as `touch_qa.py`.
- Download the `llama_ori` folder from the HuggingFace model hub and put it in the `./UniTouch` folder, at the same level as `touch_qa.py`.
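Once these downloads are in place, the `./UniTouch` folder should contain roughly the following (only the files and folders mentioned in these steps are shown; the rest of the repo is omitted):

```
UniTouch/
├── touch_qa.py
├── last_new.ckpt
├── ckpts/
└── llama_ori/
```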
- Run Touch-LLM for tactile question answering:

  ```
  CUDA_VISIBLE_DEVICES=0 python touch_qa.py
  ```

If you find our work useful, please cite:

```bibtex
@inproceedings{yang2024binding,
title={Binding touch to everything: Learning unified multimodal tactile representations},
author={Yang, Fengyu and Feng, Chao and Chen, Ziyang and Park, Hyoungseob and Wang, Daniel and Dou, Yiming and Zeng, Ziyao and Chen, Xien and Gangopadhyay, Rit and Owens, Andrew and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={26340--26353},
year={2024}
}
```