Yuanshi9815/LiteFocus

LiteFocus

LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis
Zhenxiong Tan, Xinyin Ma, Gongfan Fang, and Xinchao Wang
Learning and Vision Lab, National University of Singapore

TL;DR (Too Long; Didn't Read)

LiteFocus is a tool for accelerating diffusion-based text-to-audio (TTA) models, currently implemented on top of the base model AudioLDM2. It roughly doubles inference speed and improves audio quality.

Setup

  • Prepare Environment (optional)
conda create -n litefocus python=3.10
conda activate litefocus
  • Install Base Model
pip3 install git+https://github.com/haoheliu/AudioLDM2.git

Usage

Basic Usage

from audioldm2 import text_to_audio, build_model
import scipy.io.wavfile

# LiteFocus: import the injection helpers
from litefocus import inject_lite_focus, disable_lite_focus

model = build_model(model_name='audioldm2-full')

# LiteFocus: patch the model before generating audio
inject_lite_focus(model)

waveform = text_to_audio(
    latent_diffusion=model,
    duration=40,
    text='Musical constellations twinkling in the night sky, forming a cosmic melody.',
)

# Save the result as a 16 kHz WAV file
scipy.io.wavfile.write("out.wav", rate=16000, data=waveform)

Disable LiteFocus

disable_lite_focus(model)
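
Because LiteFocus can be switched on and off on the same model, the speedup can be checked locally. The following is a minimal sketch, not part of this repository: `compare_runtime` is a hypothetical helper that takes three plain callables, so it makes no assumptions about the AudioLDM2 or LiteFocus APIs beyond the calls shown above.

```python
import time
from typing import Callable, Dict


def compare_runtime(
    generate: Callable[[], object],
    enable: Callable[[], object],
    disable: Callable[[], object],
) -> Dict[str, float]:
    """Time one generation with the acceleration enabled and one with it disabled."""
    # First run: acceleration switched on.
    enable()
    start = time.perf_counter()
    generate()
    enabled_s = time.perf_counter() - start

    # Second run: acceleration switched off again.
    disable()
    start = time.perf_counter()
    generate()
    disabled_s = time.perf_counter() - start

    # Guard against a zero-length measurement for trivially fast callables.
    speedup = disabled_s / enabled_s if enabled_s > 0 else float("inf")
    return {"enabled_s": enabled_s, "disabled_s": disabled_s, "speedup": speedup}


# Hypothetical usage with the objects from Basic Usage (`prompt` is any text prompt):
# timings = compare_runtime(
#     generate=lambda: text_to_audio(latent_diffusion=model, duration=40, text=prompt),
#     enable=lambda: inject_lite_focus(model),
#     disable=lambda: disable_lite_focus(model),
# )
# print(timings)
```

The first generation on a freshly loaded model usually includes one-off warm-up costs, so running `generate` once before measuring gives a fairer comparison.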

Configuration

config = {
    'same_frequency': True,
    'cross_frequency': True,
    'sparse_ratio': 0.1
}

inject_lite_focus(model, config)
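
When only one option changes, it can be convenient to start from the documented defaults and override a single key. The sketch below is an illustration only: `DEFAULT_CONFIG` and `make_config` are hypothetical names, not part of the `litefocus` package, and the check that `sparse_ratio` lies in [0, 1] is an assumption, since this README does not state the valid range.

```python
from typing import Any, Dict

# Defaults as documented in this README's Configuration section.
DEFAULT_CONFIG: Dict[str, Any] = {
    "same_frequency": True,
    "cross_frequency": True,
    "sparse_ratio": 0.1,
}


def make_config(**overrides: Any) -> Dict[str, Any]:
    """Return a complete config dict: documented defaults plus the given overrides."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"Unknown LiteFocus option(s): {sorted(unknown)}")

    config = {**DEFAULT_CONFIG, **overrides}

    for flag in ("same_frequency", "cross_frequency"):
        if not isinstance(config[flag], bool):
            raise TypeError(f"{flag} must be a bool, got {type(config[flag]).__name__}")

    # Assumption: a sparsity ratio is a fraction between 0 and 1.
    if not 0.0 <= float(config["sparse_ratio"]) <= 1.0:
        raise ValueError("sparse_ratio is assumed to lie in [0, 1]")

    return config


# Hypothetical usage: keep the defaults but use a different sparsity ratio.
# inject_lite_focus(model, make_config(sparse_ratio=0.2))
```

The helper always returns all three keys, so the dict passed to `inject_lite_focus` has the same shape as the example above.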
| Parameter | Description | Default Value |
| --- | --- | --- |
| same_frequency | Enables attention to tokens that share the same frequency. | True |
| cross_frequency | Enables cross-frequency compensation, i.e. attention to tokens at other frequencies. | True |
| sparse_ratio | Specifies the sparsity ratio used for cross_frequency. | 0.1 |
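
To give an intuition for how the three options interact, here is a toy sketch of an attention mask over a small time-frequency grid. It is not the repository's implementation. It assumes that tokens are laid out as `index = t * num_freq + f`, that every token may attend to itself, and that cross-frequency tokens are kept at a fixed stride of `round(1 / sparse_ratio)` time steps. The actual selection rule in LiteFocus may differ, and `build_attention_mask` is a hypothetical name.

```python
from typing import List


def build_attention_mask(
    num_time: int,
    num_freq: int,
    same_frequency: bool = True,
    cross_frequency: bool = True,
    sparse_ratio: float = 0.1,
) -> List[List[bool]]:
    """Return mask[q][k], True where query token q may attend to key token k."""
    n = num_time * num_freq
    # Assumed rule: keep roughly a `sparse_ratio` fraction of time steps, evenly strided.
    stride = max(1, round(1 / sparse_ratio)) if sparse_ratio > 0 else None

    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        _, q_freq = divmod(q, num_freq)
        for k in range(n):
            k_time, k_freq = divmod(k, num_freq)
            if q == k:
                # Assumption: a token always sees itself, so no row is empty.
                mask[q][k] = True
            elif k_freq == q_freq:
                # Same frequency bin at another time step.
                mask[q][k] = same_frequency
            else:
                # Different frequency bin: only a sparse, strided subset is kept.
                mask[q][k] = (
                    cross_frequency and stride is not None and k_time % stride == 0
                )
    return mask


def density(mask: List[List[bool]]) -> float:
    """Fraction of query-key pairs that are attended."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


if __name__ == "__main__":
    full = build_attention_mask(20, 4, sparse_ratio=1.0)
    sparse = build_attention_mask(20, 4, sparse_ratio=0.1)
    print(f"dense: {density(full):.2f}, sparse: {density(sparse):.2f}")
```

In this toy model, lowering `sparse_ratio` reduces the number of attended query-key pairs, which is where a sparse attention pattern would save computation.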

To-Do

  • AudioLDM2 Integration
  • Diffusers pipeline Integration

Citation

@inproceedings{tan2024litefocus,
  title={LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis},
  author={Tan, Zhenxiong and Ma, Xinyin and Fang, Gongfan and Wang, Xinchao},
  booktitle={Proc. Interspeech 2024},
  pages={4878--4882},
  year={2024}
}

About

[Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA models, currently implemented with the base model AudioLDM2.
