ACM MM2025
Lei Yao, Yi Wang, Yi Zhang, Moyun Liu, Lap-Pui Chau
Note: Since this work is still in progress, the full pre-training code has not been released yet.
- Environment installation instructions.
- Instructions for processing the (pretraining) dataset.
- Data processing code (pretraining part).
- Release downstream training configs.
- Release trained weights and experiment records.
- Release pretraining code.
Our model is built on the Pointcept toolkit; you can follow its official instructions to install the required packages:
conda create -n GaussianCross python=3.8 -y
conda activate GaussianCross
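Beyond creating the conda environment, Pointcept-style codebases typically need PyTorch plus sparse-convolution and point-operation packages. The package set and versions below are assumptions based on Pointcept's setup, not confirmed by this repository, so prefer the versions pinned in Pointcept's official instructions:

```shell
# Assumed Pointcept-style dependency set -- match the versions in
# Pointcept's README to your CUDA toolkit before running these.
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia -y
pip install spconv-cu118   # sparse convolution backend for SpUNet
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
pip install h5py pyyaml tensorboard tensorboardx yapf addict einops scipy plyfile termcolor timm
```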
Note that they also provide a script to build the corresponding docker image: build_image.sh
ScanNet V2 & ScanNet200
- Download the ScanNet V2 dataset.
- Run preprocessing code for raw ScanNet as follows:
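Assuming this repository follows Pointcept's preprocessing layout (the script path below is inferred from Pointcept, not stated in this README), the raw scans can be converted with something like:

```shell
# RAW_SCANNET_DIR: the directory of the downloaded raw ScanNet v2 dataset.
# PROCESSED_SCANNET_DIR: the output directory for the processed dataset.
python pointcept/datasets/preprocessing/scannet/preprocess_scannet.py \
  --dataset_root ${RAW_SCANNET_DIR} \
  --output_root ${PROCESSED_SCANNET_DIR}
```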
- Link the processed dataset to the codebase:
# PROCESSED_SCANNET_DIR: the directory of the processed ScanNet dataset.
mkdir data
ln -s ${PROCESSED_SCANNET_DIR} ${CODEBASE_DIR}/data/scannet
S3DIS
We use the preprocessed S3DIS data provided by Pointcept.
- Link processed dataset to codebase:
# PROCESSED_S3DIS_DIR: the directory of the processed S3DIS dataset.
ln -s ${PROCESSED_S3DIS_DIR} ${CODEBASE_DIR}/data/s3dis
As in Pointcept, the training process is driven by the configs in the configs folder. The training scripts create an experiment folder under exp and back up the essential code into it; the training config, log file, TensorBoard records, and checkpoints are also saved there during training.
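For orientation, the resulting experiment folder typically looks like the sketch below; the exact file and folder names are assumptions based on Pointcept's conventions:

```text
exp/
└── scannet/
    └── semseg-spunet-base/     # the experiment name passed via -n
        ├── code/               # backup of essential code
        ├── model/              # checkpoints, e.g. model_last.pth
        ├── config.py           # the resolved training config
        ├── tensorboard/        # TensorBoard event files
        └── train.log           # training log
```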
Attention: A critical difference from Pointcept is that most data augmentation operations are performed on the GPU in this file. Make sure ToTensor comes before the augmentation operations.
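The ordering constraint above can be sketched schematically. The classes below are stand-ins for illustration only, not the repo's actual transforms: ToTensor converts host-side numpy arrays into (GPU) tensors, and the subsequent augmentations assume tensor input, so they would fail on raw arrays.

```python
import numpy as np

class ToTensor:
    """Stand-in for the numpy -> (GPU) tensor conversion step."""
    def __call__(self, data):
        # The real pipeline would call torch.from_numpy(...).cuda();
        # here we only tag the arrays so the ordering constraint is visible.
        return {k: ("tensor", v) for k, v in data.items()}

class RandomScale:
    """Stand-in GPU augmentation: accepts only tensor-tagged inputs."""
    def __init__(self, scale):
        self.scale = scale
    def __call__(self, data):
        tag, coord = data["coord"]
        assert tag == "tensor", "GPU augmentations require ToTensor first"
        data["coord"] = (tag, coord * self.scale)
        return data

class Compose:
    """Apply a list of transforms in order."""
    def __init__(self, transforms):
        self.transforms = transforms
    def __call__(self, data):
        for t in self.transforms:
            data = t(data)
        return data

# Correct order: convert to tensors first, then run the GPU augmentations.
pipeline = Compose([ToTensor(), RandomScale(1.1)])
out = pipeline({"coord": np.ones((4, 3))})
```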
Download the pretrained 3D backbone from GaussianCross.
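If you want to inspect or partially load the downloaded checkpoint outside the provided training scripts, note that checkpoints saved from multi-GPU (DistributedDataParallel) training commonly prefix every weight key with "module.". The helper below is hypothetical (strip_module_prefix is our name, not part of this codebase), and the checkpoint layout it assumes is an assumption:

```python
def strip_module_prefix(state_dict):
    """Return a copy of state_dict with any leading 'module.' removed."""
    prefix = "module."
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

# Tiny stand-in state dict (real entries would be torch tensors).
ckpt = {"module.backbone.conv1.weight": 0.5, "module.backbone.bn1.bias": 0.1}
clean = strip_module_prefix(ckpt)
```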
ScanNet V2
# Load the pretrained model
WEIGHT="path/to/downloaded/model/model_last.pth"
# Linear Probing
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-base-lin -n semseg-spunet-base-lin -w $WEIGHT
# Semantic Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-base -n semseg-spunet-base -w $WEIGHT
# Instance Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c insseg-pg-spunet-base -n insseg-pg-spunet-base -w $WEIGHT
# Parameter Efficiency and Data Efficiency
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-efficient-[la20-lr20] -n semseg-spunet-efficient-[la20-lr20] -w $WEIGHT
ScanNet200
# Linear Probing
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet200 -c semseg-spunet-base-lin -n semseg-spunet-base-lin -w $WEIGHT
# Semantic Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet200 -c semseg-spunet-base -n semseg-spunet-base -w $WEIGHT
# Instance Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet200 -c insseg-pg-spunet-base -n insseg-pg-spunet-base -w $WEIGHT
S3DIS
# Linear Probing
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d s3dis -c semseg-spunet-base-area[1-5]-lin -n semseg-spunet-base-area[1-5]-lin -w $WEIGHT
# Semantic Segmentation
CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d s3dis -c semseg-spunet-base-area[1-5] -n semseg-spunet-base-area[1-5] -w $WEIGHT
This repository is released under the MIT license.
The research work was conducted in the JC STEM Lab of Machine Learning and Computer Vision funded by The Hong Kong Jockey Club Charities Trust.
Our code is primarily built upon Pointcept, PonderV2, and gsplat.
@article{yao2025gaussiancross,
title={GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting},
author={Yao, Lei and Wang, Yi and Zhang, Yi and Liu, Moyun and Chau, Lap-Pui},
journal={arXiv preprint arXiv:2508.02172},
year={2025}
}
or
@inproceedings{yao2025gaussiancross,
title={GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting},
author={Yao, Lei and Wang, Yi and Zhang, Yi and Liu, Moyun and Chau, Lap-Pui},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
year={2025}
}
