
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations (CVPR 2024)



This repo contains the Touch-LLM code for UniTouch. Our code is built on top of the ImageBind and LLaMA-Adapter codebases.

UniTouch model

Inference with Pretrained Models

Option 1: Touch Encoder Only (Lightweight)

If you only need the touch encoder to extract tactile embeddings (without the language model):

  1. Download the pretrained touch encoder (last_new.ckpt) from the HuggingFace model hub and place it in the ./UniTouch folder, at the same level as touch_qa.py.

  2. Run the standalone touch encoder:

python load_touch_encoder.py

This will load only the touch encoder and extract 1024-dimensional embeddings from touch images. You can use these embeddings for:

  • Touch-based similarity search
  • Touch classification
  • Touch feature extraction
  • Cross-modal retrieval (touch-to-vision, touch-to-text, etc.)

Example usage:

import torch
import ImageBind.data as data
from ImageBind.models.x2touch_model_part import x2touch, ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the touch encoder and move it to the same device as the inputs
model = x2touch(pretrained=True)
model.eval()
model.to(device)

# Extract embeddings from touch images
touch_images = data.load_and_transform_vision_data(["path/to/touch.jpg"], device=device)
with torch.no_grad():
    embeddings = model({ModalityType.TOUCH: touch_images})[ModalityType.TOUCH]  # Shape: [1, 1024]
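
As a minimal sketch of the touch-based similarity search use case listed above, the snippet below embeds a small gallery of touch images and a query image with the same encoder, then ranks the gallery by cosine similarity. The gallery and query paths are placeholders; everything else reuses the API shown in the example above.

import torch
import torch.nn.functional as F
import ImageBind.data as data
from ImageBind.models.x2touch_model_part import x2touch, ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the touch encoder once and reuse it for every image
model = x2touch(pretrained=True)
model.eval()
model.to(device)

# Placeholder paths: replace with your own gallery and query touch images
gallery_paths = ["gallery/touch_0.jpg", "gallery/touch_1.jpg", "gallery/touch_2.jpg"]
query_path = ["query/touch.jpg"]

with torch.no_grad():
    gallery = model({ModalityType.TOUCH: data.load_and_transform_vision_data(gallery_paths, device=device)})[ModalityType.TOUCH]  # [N, 1024]
    query = model({ModalityType.TOUCH: data.load_and_transform_vision_data(query_path, device=device)})[ModalityType.TOUCH]       # [1, 1024]

# Rank the gallery by cosine similarity to the query embedding
scores = F.cosine_similarity(query, gallery)  # [N]
best = scores.argmax().item()
print(f"Most similar touch image: {gallery_paths[best]} (score: {scores[best].item():.3f})")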

Option 2: Full Touch-LLM (with Language Model)

If you need the full Touch-LLM for tactile question answering:

  1. Download the pretrained touch encoder (last_new.ckpt) from the HuggingFace model hub and place it in the ./UniTouch folder, at the same level as touch_qa.py.

  2. Download the ckpts folder from the HuggingFace model hub and place it in the ./UniTouch folder, at the same level as touch_qa.py.

  3. Download the llama_ori folder from the HuggingFace model hub and place it in the ./UniTouch folder, at the same level as touch_qa.py.

  4. Run Touch-LLM for tactile question answering:

CUDA_VISIBLE_DEVICES=0 python touch_qa.py
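
Since touch_qa.py expects all three downloads to sit next to it, a quick sanity check of the layout can save a failed run. Below is a minimal sketch that only uses the file and folder names listed in the steps above; run it from inside ./UniTouch.

from pathlib import Path

# Expected layout inside ./UniTouch, next to touch_qa.py
# (names taken from the download steps above)
expected = [
    Path("last_new.ckpt"),  # pretrained touch encoder
    Path("ckpts"),          # checkpoint folder from the HuggingFace model hub
    Path("llama_ori"),      # LLaMA weights folder from the HuggingFace model hub
    Path("touch_qa.py"),    # tactile question answering script
]

for path in expected:
    print(f"{'ok' if path.exists() else 'MISSING':>7}  {path}")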

Citation

@inproceedings{yang2024binding,
  title={Binding touch to everything: Learning unified multimodal tactile representations},
  author={Yang, Fengyu and Feng, Chao and Chen, Ziyang and Park, Hyoungseob and Wang, Daniel and Dou, Yiming and Zeng, Ziyao and Chen, Xien and Gangopadhyay, Rit and Owens, Andrew and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={26340--26353},
  year={2024}
}
