# [TPAMI 2025] Spatial Frequency Modulation for Semantic Segmentation
In this work, we identify and address the "aliasing degradation" problem in modern deep neural networks, where high-frequency information crucial for semantic segmentation is distorted during downsampling.
Instead of simply filtering out these valuable details, we introduce Spatial Frequency Modulation (SFM), a novel framework that:
- Modulates high-frequency features to a lower frequency band before downsampling, protecting them from aliasing.
- Demodulates these features back to their original high frequency during upsampling, recovering fine-grained details for more accurate segmentation.
Our lightweight and plug-and-play modules, Adaptive Resampling (ARS) and Multi-Scale Adaptive Upsampling (MSAU), can be seamlessly integrated into various CNN and Transformer architectures to significantly boost their performance.
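As a toy illustration of why plain downsampling destroys high-frequency content, the NumPy sketch below (illustrative only, not code from this repository) downsamples a 1-D sine whose frequency exceeds the post-downsampling Nyquist limit:

```python
import numpy as np

# 1-D analogue of aliasing during feature downsampling (the paper works
# on 2-D feature maps inside a network; this is only an illustration).
n, f = 64, 24                 # 64 samples of a sine with 24 cycles
x = np.arange(n) / n
signal = np.sin(2 * np.pi * f * x)

down = signal[::2]            # stride-2 downsampling: Nyquist drops to 16 cycles
spec = np.abs(np.fft.rfft(down))
alias_f = int(np.argmax(spec[1:]) + 1)  # dominant frequency after downsampling
print(f, alias_f)             # the 24-cycle detail reappears at a wrong, lower frequency
```

Here the 24-cycle component folds onto 32 − 24 = 8 cycles. SFM's response is to *move* such components below the new Nyquist limit before the stride, instead of low-pass filtering them away.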
Figure: An illustration of our SFM framework. Adaptive Resampling (ARS) is inserted before downsampling layers to perform frequency modulation, and Multi-Scale Adaptive Upsampling (MSAU) is used to demodulate the features and produce the final high-resolution segmentation map.
- Identifying "Aliasing Degradation": We quantitatively demonstrate that a higher aliasing ratio in feature maps leads to lower segmentation accuracy, providing a clear motivation for frequency-aware network design.
- Spatial Frequency Modulation (SFM): We propose a novel framework to preserve high-frequency details by modulating them to lower frequencies to survive downsampling, and then demodulating them to recover the details.
- Lightweight and Effective Modules: We implement SFM with two novel modules:
- Adaptive Resampling (ARS): A lightweight module that learns to densely sample high-frequency regions (e.g., boundaries, textures) to effectively lower their frequency representation.
- Multi-Scale Adaptive Upsampling (MSAU): A module that performs non-uniform upsampling to reverse the modulation and refines segmentation details by modeling multi-scale pixel relationships.
- Broad Applicability: Our method is model-agnostic and consistently improves various state-of-the-art segmentation models, including CNN-based (ResNet, ConvNeXt, InternImage) and Transformer-based (Swin) architectures, with minimal computational overhead.
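The aliasing ratio mentioned above can be pictured as the share of spectral power lying beyond the Nyquist limit of the downsampled grid. The sketch below is one plausible formalization for illustration only; `aliasing_ratio` is a hypothetical helper, and the paper's exact definition may differ:

```python
import numpy as np

def aliasing_ratio(feat: np.ndarray, stride: int = 2) -> float:
    """Fraction of 2-D spectral power above the Nyquist limit of the
    stride-downsampled grid (illustrative stand-in for the paper's metric)."""
    power = np.abs(np.fft.fft2(feat)) ** 2
    fy = np.abs(np.fft.fftfreq(feat.shape[0]))[:, None]  # cycles/sample
    fx = np.abs(np.fft.fftfreq(feat.shape[1]))[None, :]
    nyquist = 0.5 / stride  # highest frequency the downsampled map can keep
    above = (fy > nyquist) | (fx > nyquist)
    return float(power[above].sum() / power.sum())

rng = np.random.default_rng(0)
noise = rng.standard_normal((32, 32))        # broadband "feature map"
keep = np.abs(np.fft.fftfreq(32))
lowpass = (keep[:, None] <= 0.2) & (keep[None, :] <= 0.2)
smooth = np.real(np.fft.ifft2(np.fft.fft2(noise) * lowpass))  # band-limited map
print(aliasing_ratio(smooth, 2), aliasing_ratio(noise, 2))
```

A band-limited map scores near zero, while a broadband map has most of its power at risk of aliasing; the paper's observation is that a higher ratio correlates with lower mIoU.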
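ARS can be pictured as learned non-uniform sampling: placing more samples where spatial frequency is high stretches those regions out, lowering their per-sample frequency. The 1-D sketch below (a hypothetical `adaptive_resample` helper, not the repository's learned module) uses gradient magnitude as a stand-in for the learned sampling density:

```python
import numpy as np

def adaptive_resample(signal: np.ndarray, n_out: int):
    """Sample more densely where the local gradient is large (toy ARS)."""
    grad = np.abs(np.gradient(signal))
    density = grad + grad.mean() + 1e-6        # keep some samples everywhere
    cdf = np.cumsum(density)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])  # monotone map to [0, 1]
    # Invert the CDF: uniform steps in cdf-space land densely where the
    # density (gradient) is high.
    u = np.linspace(0, 1, n_out)
    x = np.arange(len(signal))
    pos = np.interp(u, cdf, x)                 # non-uniform sample positions
    return np.interp(pos, x, signal), pos

step = np.repeat([0.0, 1.0], 32)               # sharp boundary at index 32
resampled, pos = adaptive_resample(step, 32)
near_edge = int(np.sum(np.abs(pos - 32) < 8))  # samples clustered at the edge
print(near_edge)
```

Uniform stride-2 sampling would place only ~8 of 32 samples within that window; the adaptive warp concentrates far more there. Demodulation (the MSAU side) would invert this warp, e.g. by interpolating from `pos` back onto the uniform grid.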
| Model | Dataset | mIoU |
|---|---|---|
| Mask2Former (code, config, ckpt) | ADE20K | 47.7 |
If you find our work useful in your research, please consider citing our paper:
```bibtex
@article{chen2023spatialfrequency,
  title={Spatial Frequency Modulation for Semantic Segmentation},
  author={Chen, Linwei and Fu, Ying and Gu, Lin and Zheng, Dezhi and Dai, Jifeng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2025},
  volume={47},
  number={11},
  pages={9767-9784},
  doi={10.1109/TPAMI.2025.3592621}
}
```
This project is built upon the excellent MMSegmentation toolbox. We thank the authors for their open-source contribution.
If you encounter any problems or bugs, please don't hesitate to contact me at chenlinwei@bit.edu.cn or charleschen2013@163.com. To help us assist you effectively, please include a brief self-introduction with your name, affiliation, and position. For more in-depth help, feel free to share additional information, such as a link to your personal website. I would be happy to discuss the work with you and offer support.

