GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting

ACM MM 2025

Lei Yao, Yi Wang, Yi Zhang, Moyun Liu, Lap-Pui Chau



Note: Since this work is still in progress, the full pre-training code has not been released yet.

📝 To-Do List

  • Environment installation instructions.
  • Instructions for processing the (pretraining) dataset.
  • Data processing code (pretraining part).
  • Release downstream training configs.
  • Release trained weights and experiment records.
  • Release pretraining code.

🌟 Pipeline

(Pipeline overview figure.)

🔨 Installation

Our model is built on the Pointcept toolkit; you can follow its official instructions to install the required packages (a sketch of the remaining steps is given after the Docker note below):

conda create -n GaussianCross python=3.8 -y
conda activate GaussianCross

xxx

Note that Pointcept also provides a script to build the corresponding Docker image: build_image.sh
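
The remaining steps follow Pointcept's official installation guide. As a rough sketch only (the CUDA and package versions below are assumptions to be matched to your own setup, not a confirmed recipe from this repository), the typical dependencies look like this:

# PyTorch: pick the build matching your CUDA driver (11.3 shown only as an example)
conda install ninja -y
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y

# Common Pointcept dependencies (assumed from the Pointcept toolkit's instructions)
conda install h5py pyyaml -c anaconda -y
conda install sharedarray tensorboard tensorboardx yapf addict einops scipy plyfile termcolor timm -c conda-forge -y
conda install pytorch-cluster pytorch-scatter pytorch-sparse -c pyg -y
pip install torch-geometric spconv-cu113 open3d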

🔍 Data Preprocessing

ScanNet V2 & ScanNet200

  • Download the ScanNet V2 dataset.
  • Run the preprocessing code on the raw ScanNet data as follows (see the sketch after this list):
xxx
  • Link the processed dataset to the codebase:
# PROCESSED_SCANNET_DIR: the directory of the processed ScanNet dataset.
mkdir data
ln -s ${PROCESSED_SCANNET_DIR} ${CODEBASE_DIR}/data/scannet
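
For reference, the raw ScanNet preprocessing in Pointcept is usually invoked along the following lines; the script path and flags below come from the Pointcept toolkit and are an assumption with respect to this repository, so check the released processing code once it is available:

# RAW_SCANNET_DIR: the directory of the downloaded raw ScanNet V2 dataset.
# PROCESSED_SCANNET_DIR: the output directory for the processed dataset.
python pointcept/datasets/preprocessing/scannet/preprocess_scannet.py \
  --dataset_root ${RAW_SCANNET_DIR} \
  --output_root ${PROCESSED_SCANNET_DIR}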

S3DIS

We use the preprocessed S3DIS data provided by Pointcept.

  • Link the processed dataset to the codebase:
# PROCESSED_S3DIS_DIR: the directory of the processed S3DIS dataset.
ln -s ${PROCESSED_S3DIS_DIR} ${CODEBASE_DIR}/data/s3dis

🚀 Training

As in Pointcept, the training process is driven by the configs in the configs folder. The training scripts create an experiment folder under exp and back up essential code into it. The training config, log file, TensorBoard records, and checkpoints are also saved there during training.

Attention: A critical difference from Pointcept is that most data augmentation operations are performed on the GPU in this file. Make sure ToTensor comes before the augmentation operations.

Download the pretrained 3D backbone from GaussianCross.

ScanNet V2

# Load the pretrained model
WEIGHT="path/to/downloaded/model/model_last.pth"

# Linear Probing
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-base-lin -n semseg-spunet-base-lin -w $WEIGHT
# Semantic Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-base -n semseg-spunet-base -w $WEIGHT
# Instance Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c insseg-pg-spunet-base -n insseg-pg-spunet-base -w $WEIGHT
# Parameter Efficiency and Data Efficiency
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-efficient-[la20-lr20] -n semseg-spunet-efficient-[la20-lr20] -w $WEIGHT

ScanNet200

# Linear Probing
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet200 -c semseg-spunet-base-lin -n semseg-spunet-base-lin -w $WEIGHT
# Semantic Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet200 -c semseg-spunet-base -n semseg-spunet-base -w $WEIGHT
# Instance Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet200 -c insseg-pg-spunet-base -n insseg-pg-spunet-base -w $WEIGHT

S3DIS

# Semantic Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d s3dis -c semseg-spunet-base-area[1-5] -n semseg-spunet-base-area[1-5] -w $WEIGHT
# Linear Probing
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d s3dis -c semseg-spunet-base-area[1-5]-lin -n semseg-spunet-base-area[1-5]-lin -w $WEIGHT
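
After fine-tuning, evaluation follows the usual Pointcept workflow. As a sketch (scripts/test.sh and its flags are assumed from the Pointcept toolkit rather than confirmed by this repository), testing a finished ScanNet semantic segmentation experiment looks roughly like this:

# Evaluate the best checkpoint of an experiment created by the training scripts above
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/test.sh -g 4 -d scannet -n semseg-spunet-base -w model_best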

📚 License

This repository is released under the MIT license.

👏 Acknowledgement

The research work was conducted in the JC STEM Lab of Machine Learning and Computer Vision funded by The Hong Kong Jockey Club Charities Trust.

Our code is primarily built upon Pointcept, Ponder V2 and gsplat.

📝 Citation

@article{yao2025gaussiancross,
  title={GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting},
  author={Yao, Lei and Wang, Yi and Zhang, Yi and Liu, Moyun and Chau, Lap-Pui},
  journal={arXiv preprint arXiv:2508.02172},
  year={2025}
}
or
@inproceedings{yao2025gaussiancross, 
  title={GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting}, 
  author={Yao, Lei and Wang, Yi and Zhang, Yi and Liu, Moyun and Chau, Lap-Pui}, 
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia}, 
  year={2025}
}
