LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis
Zhenxiong Tan, Xinyin Ma, Gongfan Fang, and Xinchao Wang
Learning and Vision Lab, National University of Singapore
LiteFocus is a tool for accelerating diffusion-based text-to-audio (TTA) models, currently implemented on top of the base model AudioLDM2. It doubles inference speed while also improving audio quality.
- Prepare Environment (optional)

      conda create -n litefocus python=3.10
      conda activate litefocus

- Install Base Model

      pip3 install git+https://github.com/haoheliu/AudioLDM2.git

To enable LiteFocus, call `inject_lite_focus` on the model before generating audio:

    import scipy.io.wavfile
    from audioldm2 import text_to_audio, build_model
    from litefocus import inject_lite_focus, disable_lite_focus  # added for LiteFocus

    model = build_model(model_name='audioldm2-full')
    inject_lite_focus(model)  # added for LiteFocus

    # Generate 40 seconds of audio from a text prompt.
    waveform = text_to_audio(
        latent_diffusion=model,
        duration=40,
        text='Musical constellations twinkling in the night sky, forming a cosmic melody.',
    )
    scipy.io.wavfile.write("out.wav", rate=16000, data=waveform)

To switch LiteFocus off again:

    disable_lite_focus(model)

To customize LiteFocus, pass a config when injecting:

    config = {
        'same_frequency': True,
        'cross_frequency': True,
        'sparse_ratio': 0.1
    }
    inject_lite_focus(model, config)

| Parameter | Description | Default Value |
|---|---|---|
| `same_frequency` | Enables attention to tokens sharing the same frequency. | `True` |
| `cross_frequency` | Enables attention to tokens used for cross-frequency compensation. | `True` |
| `sparse_ratio` | Specifies the sparsity ratio for `cross_frequency`. | `0.1` |
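
To build intuition for how the three options interact, here is a minimal, illustrative sketch of a sparse attention mask over a time-frequency token grid. It is not LiteFocus's implementation. The function names (`sketch_attention_mask`, `density`), the row-major token layout, the random choice of cross-frequency keys, the always-on self-attention, and the reading of `sparse_ratio` as the fraction of cross-frequency keys that are kept are all assumptions made for the example.

```python
import random


def sketch_attention_mask(n_time, n_freq, same_frequency=True,
                          cross_frequency=True, sparse_ratio=0.1, seed=0):
    """Return an N x N boolean mask (N = n_time * n_freq).

    mask[q][k] is True when query token q may attend to key token k.
    Tokens are assumed to be laid out row-major: index = t * n_freq + f.
    """
    if not 0.0 <= sparse_ratio <= 1.0:
        raise ValueError("sparse_ratio must be in [0, 1]")

    rng = random.Random(seed)
    n = n_time * n_freq
    freq_of = [i % n_freq for i in range(n)]  # frequency bin of each token
    mask = [[False] * n for _ in range(n)]

    for q in range(n):
        mask[q][q] = True  # assumption: a token always attends to itself
        cross_keys = []
        for k in range(n):
            if k == q:
                continue
            if freq_of[k] == freq_of[q]:
                # Same frequency bin, different time step.
                if same_frequency:
                    mask[q][k] = True
            else:
                cross_keys.append(k)
        if cross_frequency and cross_keys:
            # Assumption: keep a random fraction of the other-frequency keys.
            n_keep = round(sparse_ratio * len(cross_keys))
            for k in rng.sample(cross_keys, n_keep):
                mask[q][k] = True
    return mask


def density(mask):
    """Fraction of query-key pairs that are allowed by the mask."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)


if __name__ == "__main__":
    mask = sketch_attention_mask(n_time=4, n_freq=5)
    print(density(mask))  # 0.3
```

On this 4 x 5 toy grid, each query keeps its 4 same-frequency keys (itself included) plus 2 of the 16 cross-frequency keys, so 6 of 20 keys survive per query. Under these assumptions, a smaller `sparse_ratio` means fewer cross-frequency connections.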
- AudioLDM2 Integration
- Diffusers pipeline Integration
@inproceedings{tan2024litefocus,
  title={LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis},
  author={Tan, Zhenxiong and Ma, Xinyin and Fang, Gongfan and Wang, Xinchao},
  booktitle={Proc. Interspeech 2024},
  pages={4878--4882},
  year={2024}
}