GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting

ACM MM 2025

Lei Yao, Yi Wang, Yi Zhang, Moyun Liu, Lap-Pui Chau



Note: Since this work is still in progress, the full pre-training code has not been released yet.

📝 To-Do List

  • Environment installation instructions.
  • Instructions for processing the (pretraining) dataset.
  • Data processing code (pretraining part).
  • Release downstream training configs.
  • Release trained weights and experiment records.
  • Release pretraining code.

🌟 Pipeline

(Pipeline overview figure.)

🔨 Installation

Our model is built on the Pointcept toolkit; you can follow its official instructions to install the required packages (a sketch of the remaining steps is given after the Docker note below):

conda create -n GaussianCross python=3.8 -y
conda activate GaussianCross

xxx

Note that Pointcept also provides a script to build the corresponding Docker image: build_image.sh
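
The remaining steps follow Pointcept's official installation guide. As a rough sketch only (the CUDA and package versions below are assumptions to be matched to your own setup, not a confirmed recipe from this repository), the typical dependencies look like this:

# PyTorch: pick the build matching your CUDA driver (11.3 shown only as an example)
conda install ninja -y
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y

# Common Pointcept dependencies (assumed from the Pointcept toolkit's instructions)
conda install h5py pyyaml -c anaconda -y
conda install sharedarray tensorboard tensorboardx yapf addict einops scipy plyfile termcolor timm -c conda-forge -y
conda install pytorch-cluster pytorch-scatter pytorch-sparse -c pyg -y
pip install torch-geometric spconv-cu113 open3d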

🔍 Data Preprocessing

ScanNet V2 & ScanNet200

  • Download the ScanNet V2 dataset.
  • Run the preprocessing code on the raw ScanNet data as follows (see the sketch after this list):
xxx
  • Link the processed dataset to the codebase:
# PROCESSED_SCANNET_DIR: the directory of the processed ScanNet dataset.
mkdir data
ln -s ${PROCESSED_SCANNET_DIR} ${CODEBASE_DIR}/data/scannet
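
For reference, the raw ScanNet preprocessing in Pointcept is usually invoked along the following lines; the script path and flags below come from the Pointcept toolkit and are an assumption with respect to this repository, so check the released processing code once it is available:

# RAW_SCANNET_DIR: the directory of the downloaded raw ScanNet V2 dataset.
# PROCESSED_SCANNET_DIR: the output directory for the processed dataset.
python pointcept/datasets/preprocessing/scannet/preprocess_scannet.py \
  --dataset_root ${RAW_SCANNET_DIR} \
  --output_root ${PROCESSED_SCANNET_DIR}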

S3DIS

We use the preprocessed S3DIS data provided by Pointcept.

  • Link the processed dataset to the codebase:
# PROCESSED_S3DIS_DIR: the directory of the processed S3DIS dataset.
ln -s ${PROCESSED_S3DIS_DIR} ${CODEBASE_DIR}/data/s3dis

🚀 Training

As in Pointcept, the training process is driven by the configs in the configs folder. The training scripts create an experiment folder under exp and back up essential code into it. The training config, log file, TensorBoard records, and checkpoints are also saved there during training.

Attention: A critical difference from Pointcept is that most data augmentation operations are performed on the GPU in this file. Make sure ToTensor comes before the augmentation operations.

Download the pretrained 3D backbone from GaussianCross.

ScanNet V2

# Load the pretrained model
WEIGHT="path/to/downloaded/model/model_last.pth"

# Linear Probing
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-base-lin -n semseg-spunet-base-lin -w $WEIGHT
# Semantic Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-base -n semseg-spunet-base -w $WEIGHT
# Instance Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c insseg-pg-spunet-base -n insseg-pg-spunet-base -w $WEIGHT
# Parameter Efficiency and Data Efficiency
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-efficient-[la20-lr20] -n semseg-spunet-efficient-[la20-lr20] -w $WEIGHT

ScanNet200

# Linear Probing
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet200 -c semseg-spunet-base-lin -n semseg-spunet-base-lin -w $WEIGHT
# Semantic Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet200 -c semseg-spunet-base -n semseg-spunet-base -w $WEIGHT
# Instance Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet200 -c insseg-pg-spunet-base -n insseg-pg-spunet-base -w $WEIGHT

S3DIS

# Semantic Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d s3dis -c semseg-spunet-base-area[1-5] -n semseg-spunet-base-area[1-5] -w $WEIGHT
# Linear Probing
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d s3dis -c semseg-spunet-base-area[1-5]-lin -n semseg-spunet-base-area[1-5]-lin -w $WEIGHT
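
After fine-tuning, evaluation follows the usual Pointcept workflow. As a sketch (scripts/test.sh and its flags are assumed from the Pointcept toolkit rather than confirmed by this repository), testing a finished ScanNet semantic segmentation experiment looks roughly like this:

# Evaluate the best checkpoint of an experiment created by the training scripts above
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/test.sh -g 4 -d scannet -n semseg-spunet-base -w model_best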

📚 License

This repository is released under the MIT license.

👏 Acknowledgement

The research work was conducted in the JC STEM Lab of Machine Learning and Computer Vision funded by The Hong Kong Jockey Club Charities Trust.

Our code is primarily built upon Pointcept, Ponder V2 and gsplat.

📝 Citation

@article{yao2025gaussiancross,
  title={GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting},
  author={Yao, Lei and Wang, Yi and Zhang, Yi and Liu, Moyun and Chau, Lap-Pui},
  journal={arXiv preprint arXiv:2508.02172},
  year={2025}
}
or
@inproceedings{yao2025gaussiancross, 
  title={GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting}, 
  author={Yao, Lei and Wang, Yi and Zhang, Yi and Liu, Moyun and Chau, Lap-Pui}, 
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia}, 
  year={2025}
}
