
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations (CVPR 2024)



This repo contains the Touch-LLM code for UniTouch. Our code is built on top of the ImageBind and LLaMA-Adapter codebases.

UniTouch model

Inference with Pretrained Models

Option 1: Touch Encoder Only (Lightweight)

If you only need the touch encoder to extract tactile embeddings (without the language model):

  1. Download the pretrained touch encoder (last_new.ckpt) from the HuggingFace model hub and place it in the ./UniTouch folder, at the same level as touch_qa.py.

  2. Run the standalone touch encoder:

python load_touch_encoder.py

This will load only the touch encoder and extract 1024-dimensional embeddings from touch images. You can use these embeddings for:

  • Touch-based similarity search
  • Touch classification
  • Touch feature extraction
  • Cross-modal retrieval (touch-to-vision, touch-to-text, etc.)

Example usage:

import torch
import ImageBind.data as data
from ImageBind.models.x2touch_model_part import x2touch, ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the touch encoder and move it to the same device as the inputs
model = x2touch(pretrained=True)
model.eval()
model.to(device)

# Extract embeddings from touch images
touch_images = data.load_and_transform_vision_data(["path/to/touch.jpg"], device=device)
with torch.no_grad():
    embeddings = model({ModalityType.TOUCH: touch_images})[ModalityType.TOUCH]  # Shape: [1, 1024]
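
As a minimal sketch of the touch-based similarity search use case listed above, the snippet below embeds a small gallery of touch images and a query image with the same encoder, then ranks the gallery by cosine similarity. The gallery and query paths are placeholders; everything else reuses the API shown in the example above.

import torch
import torch.nn.functional as F
import ImageBind.data as data
from ImageBind.models.x2touch_model_part import x2touch, ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the touch encoder once and reuse it for every image
model = x2touch(pretrained=True)
model.eval()
model.to(device)

# Placeholder paths: replace with your own gallery and query touch images
gallery_paths = ["gallery/touch_0.jpg", "gallery/touch_1.jpg", "gallery/touch_2.jpg"]
query_path = ["query/touch.jpg"]

with torch.no_grad():
    gallery = model({ModalityType.TOUCH: data.load_and_transform_vision_data(gallery_paths, device=device)})[ModalityType.TOUCH]  # [N, 1024]
    query = model({ModalityType.TOUCH: data.load_and_transform_vision_data(query_path, device=device)})[ModalityType.TOUCH]       # [1, 1024]

# Rank the gallery by cosine similarity to the query embedding
scores = F.cosine_similarity(query, gallery)  # [N]
best = scores.argmax().item()
print(f"Most similar touch image: {gallery_paths[best]} (score: {scores[best].item():.3f})")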

Option 2: Full Touch-LLM (with Language Model)

If you need the full Touch-LLM for tactile question answering:

  1. Download the pretrained touch encoder (last_new.ckpt) from the HuggingFace model hub and place it in the ./UniTouch folder, at the same level as touch_qa.py.

  2. Download the ckpts folder from the HuggingFace model hub and place it in the ./UniTouch folder, at the same level as touch_qa.py.

  3. Download the llama_ori folder from the HuggingFace model hub and place it in the ./UniTouch folder, at the same level as touch_qa.py.

  4. Run Touch-LLM for tactile question answering:

CUDA_VISIBLE_DEVICES=0 python touch_qa.py
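
Since touch_qa.py expects all three downloads to sit next to it, a quick sanity check of the layout can save a failed run. Below is a minimal sketch that only uses the file and folder names listed in the steps above; run it from inside ./UniTouch.

from pathlib import Path

# Expected layout inside ./UniTouch, next to touch_qa.py
# (names taken from the download steps above)
expected = [
    Path("last_new.ckpt"),  # pretrained touch encoder
    Path("ckpts"),          # checkpoint folder from the HuggingFace model hub
    Path("llama_ori"),      # LLaMA weights folder from the HuggingFace model hub
    Path("touch_qa.py"),    # tactile question answering script
]

for path in expected:
    print(f"{'ok' if path.exists() else 'MISSING':>7}  {path}")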

Citation

@inproceedings{yang2024binding,
  title={Binding touch to everything: Learning unified multimodal tactile representations},
  author={Yang, Fengyu and Feng, Chao and Chen, Ziyang and Park, Hyoungseob and Wang, Daniel and Dou, Yiming and Zeng, Ziyao and Chen, Xien and Gangopadhyay, Rit and Owens, Andrew and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={26340--26353},
  year={2024}
}
