A real-time low-latency music source separation plugin. Drop it on a track and get 4 separate stems: drums, bass, other, and vocals.
With a latency of 11.6 milliseconds, it is designed for spatializing DJ sets in real time: split the mix into stems, place them in the room, and create an immersive experience.
Built with JUCE and ONNX Runtime, using HS-TasNet.
Available as VST3 and AU.
StemgenRT is a multi-output plugin with 4 stereo output buses:
- Drums
- Bass
- Other (synths, guitars, etc.)
- Vocals
To set it up:
- Insert StemgenRT on your source track (e.g., a DJ mix or full song)
- Create 4 auxiliary/bus tracks to receive each stem
- Route each of the plugin's stem outputs to its corresponding aux track
Check your DAW's documentation for multi-output plugin routing.
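If you are curious how that layout looks on the JUCE side, here is a minimal sketch, assuming one dry main output plus the four stem buses; the helper function and bus names are illustrative, not taken from the plugin's source:

```cpp
#include <juce_audio_processors/juce_audio_processors.h>

// Illustrative only: the bus layout a JUCE processor would pass to its
// AudioProcessor base class. Whether StemgenRT declares the dry main output
// as a separate bus is an assumption here.
static juce::AudioProcessor::BusesProperties makeStemLayout()
{
    return juce::AudioProcessor::BusesProperties()
        .withInput  ("Input",  juce::AudioChannelSet::stereo(), true)
        .withOutput ("Main",   juce::AudioChannelSet::stereo(), true)
        .withOutput ("Drums",  juce::AudioChannelSet::stereo(), true)
        .withOutput ("Bass",   juce::AudioChannelSet::stereo(), true)
        .withOutput ("Other",  juce::AudioChannelSet::stereo(), true)
        .withOutput ("Vocals", juce::AudioChannelSet::stereo(), true);
}
```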
Note: Set your DAW to 44.1 kHz; the model only works at this sample rate.
Note: The macOS plugin is not signed (yet). You need to sign it yourself: `codesign --force --deep --sign - StemgenRT.component`
First, grab the ONNX Runtime dependency:
```
# macOS
./scripts/download-onnxruntime.sh

# Windows (PowerShell)
./scripts/download-onnxruntime.ps1
```

Then build with CMake:

```
cmake -S . -B build
cmake --build build
```

For a release build:

```
cmake -S . -B build-release
cmake --build build-release
```

The plugin runs a neural network to separate audio, but neural networks are slow and audio callbacks are fast. To bridge the gap:
- Audio thread collects incoming samples and feeds them to a ring buffer
- Inference thread runs the model asynchronously in the background
- Audio thread picks up the processed stems when ready
If inference can't keep up, the plugin gracefully crossfades to the dry signal rather than glitching.
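A minimal sketch of that hand-off, assuming a single-producer/single-consumer ring buffer; the class name, capacity handling, and per-sample float format are illustrative, not the plugin's actual implementation:

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Lock-free SPSC FIFO: the audio thread pushes raw samples into one of these
// and pops processed stems from another; the inference thread does the
// opposite. Neither side ever blocks.
template <std::size_t Capacity>
class SpscRingBuffer
{
public:
    // Producer (audio thread): returns false when full, i.e. inference can't keep up.
    bool push (float sample)
    {
        const auto w    = writeIndex.load (std::memory_order_relaxed);
        const auto next = (w + 1) % Capacity;
        if (next == readIndex.load (std::memory_order_acquire))
            return false;                          // full: fall back to dry
        buffer[w] = sample;
        writeIndex.store (next, std::memory_order_release);
        return true;
    }

    // Consumer thread: returns false when nothing is ready yet.
    bool pop (float& sample)
    {
        const auto r = readIndex.load (std::memory_order_relaxed);
        if (r == writeIndex.load (std::memory_order_acquire))
            return false;                          // empty
        sample = buffer[r];
        readIndex.store ((r + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::array<float, Capacity> buffer {};
    std::atomic<std::size_t> writeIndex { 0 }, readIndex { 0 };
};
```

In this scheme the audio callback pushes input into one FIFO and pops stems from another; when a pop comes back empty, it crossfades to the dry signal instead of blocking.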
A few DSP tricks help the model out:
- HP/LP split + LP reinjection — Input is split by an LR4 crossover. The HP band goes to the model; the LP band bypasses inference and is reinjected after the model output (currently biased toward the bass stem) to keep the low end stable.
- Chunk boundary crossfade — The model outputs more samples than the 512-sample center region. Extra samples from the right context are crossfaded with the next chunk's start, eliminating discontinuities at chunk boundaries.
- Input normalization — Context-aware: RMS is computed over both the context window and the current input chunk, then normalized to a consistent level before inference. This avoids extreme gain swings at transients (e.g., loud kick tail in context, silence in input) and pushes the model's noise floor below the signal level.
- Vocals gate — Detects spurious low-level content in the vocals stem (common on instrumentals) using both energy-ratio and absolute-level thresholds. Gated content is transferred to the "other" stem to preserve total energy. Asymmetric attack/release smoothing prevents pumping (see the sketch below).
- Soft gating — When input is silent, output is silent. Prevents the model from hallucinating noise.
- Low-band stabilizer — Reconstructs low-band stem balance using dry-constrained low-frequency energy and suppresses synthetic high-frequency leakage on low-only inputs.
The main bus is dry passthrough. The stem buses carry model output with LP reinjection, low-band stabilization, and gates applied.
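As an illustration of the vocals gate in particular, here is a rough sketch of the threshold decision and the asymmetric smoothing; the thresholds, coefficients, and structure are assumptions made for the example, not the plugin's actual values:

```cpp
#include <cstddef>
#include <vector>

// Illustrative vocals gate: closes when the vocals stem is quiet both in
// absolute terms and relative to the whole mix, and moves whatever it removes
// into the "other" stem so total energy is preserved.
struct VocalsGate
{
    float absThresholdRms = 1.0e-3f;   // assumed absolute RMS threshold
    float ratioThreshold  = 0.05f;     // assumed vocals-to-mix ratio threshold
    float attackCoeff     = 0.5f;      // open quickly
    float releaseCoeff    = 0.05f;     // close slowly to avoid pumping
    float gain            = 1.0f;      // smoothed gate gain

    // Called once per chunk with the stem buffers and their RMS levels.
    void process (std::vector<float>& vocals, std::vector<float>& other,
                  float vocalsRms, float mixRms)
    {
        const bool open = vocalsRms > absThresholdRms
                       && vocalsRms > ratioThreshold * mixRms;
        const float target = open ? 1.0f : 0.0f;

        // Asymmetric one-pole smoothing: different coefficients for
        // rising (attack) and falling (release) gain.
        const float coeff = (target > gain) ? attackCoeff : releaseCoeff;
        gain += coeff * (target - gain);

        for (std::size_t i = 0; i < vocals.size(); ++i)
        {
            const float kept = vocals[i] * gain;
            other[i] += vocals[i] - kept;   // transfer gated content to "other"
            vocals[i] = kept;
        }
    }
};
```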
You might expect GPU to be faster, but for this particular model it often isn't:
- 1D convolutions — GPUs are optimized for 2D (images). 1D audio convolutions don't parallelize as well.
- Batch size of 1 — Real-time audio processes one chunk at a time. GPUs shine with large batches.
- Memory-bound ops — Reshapes and audio operations are limited by memory bandwidth, not compute. Your CPU cache is actually fast for this.
- Kernel launch overhead — Each GPU operation has ~5-20μs overhead. With many small ops, it adds up.
That said, GPU builds are available if you want to try.
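For a rough idea of what that involves, here is a hedged sketch of session setup with ONNX Runtime's C++ API; the model path, thread count, and optimization level are placeholders, and the commented-out lines only apply to a CUDA-enabled build:

```cpp
#include <onnxruntime_cxx_api.h>

// Illustrative session setup. By default everything runs on the CPU execution
// provider; a GPU build would register CUDA as shown in the commented lines,
// with ONNX Runtime falling back to CPU for anything the provider can't run.
Ort::Session makeSession (Ort::Env& env)
{
    Ort::SessionOptions options;
    options.SetIntraOpNumThreads (2);   // keep CPU usage predictable alongside the DAW
    options.SetGraphOptimizationLevel (GraphOptimizationLevel::ORT_ENABLE_ALL);

    // OrtCUDAProviderOptions cudaOptions {};
    // options.AppendExecutionProvider_CUDA (cudaOptions);

    return Ort::Session (env, ORT_TSTR ("model.onnx"), options);
}
```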
Licensed under the MIT License.
