Sanitune is an AI-powered tool for creating clean versions of songs. It separates vocals from instrumentals, transcribes the vocal track, detects explicit words, and then mutes, bleeps, or replaces those words before remixing the song.

Key features:

- Three edit modes: `mute`, `bleep`, and `replace` (voice replacement with pitch/timbre matching)
- Web UI: Gradio-based interface with drag-and-drop upload, transcript highlighting, and audio preview
- AI-powered suggestions: BYO API key (Anthropic/OpenAI) for context-aware replacement word selection
- Voice synthesis: Edge-TTS (speech) or Bark (singing) for replacement word generation
- Cloud voice conversion: Optional Kits.ai integration for singer voice matching
- Format preservation: Output matches input format (MP3, FLAC, WAV, OGG, M4A, etc.)
- CLI + Docker: Processing runs locally, so audio never leaves your machine unless you enable the optional AI or cloud features

How it works:

```text
Upload Song → Separate Vocals & Instrumentals → Transcribe Lyrics
  → Detect Profanity → [AI Suggestions] → Mute/Bleep/Replace → Remix → Clean Song
```

- Source separation — Demucs v4 (Meta) isolates vocals from the instrumental track
- Transcription — WhisperX transcribes lyrics with precise word-level timestamps
- Detection — Configurable profanity word lists flag explicit content (English + Spanish)
- AI suggestions (optional) — LLM picks context-aware clean replacements that match rhyme, syllable count, and tone
- Processing — Flagged words are muted, bleeped, or replaced with synthesized clean alternatives
- Remix — Processed vocals are merged back with the untouched instrumental
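
The detection and mute steps can be illustrated with a small shell sketch. It is not Sanitune's implementation. It assumes word-level timestamps have been exported as hypothetical tab-separated `start end word` lines (in seconds) and that the word list holds one lowercase word per line, and it uses placeholder words rather than a real profanity list. The hypothetical `build_mute_filter` function flags matches case-insensitively, pads each flagged span, and prints an ffmpeg `volume` filter chain that silences those spans.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the detect -> mute idea; not Sanitune's code.
#
# build_mute_filter WORDLIST TIMESTAMPS [PAD]
#   WORDLIST    one lowercase word per line (assumed format)
#   TIMESTAMPS  tab-separated "start<TAB>end<TAB>word" lines in seconds (assumed format)
#   PAD         seconds of padding around each flagged word (default 0.05)
# Prints an ffmpeg audio filter chain suitable for `ffmpeg -af`.
build_mute_filter() {
  local wordlist=$1 timestamps=$2 pad=${3:-0.05}
  awk -F'\t' -v pad="$pad" '
    # First file: load non-empty word-list entries into a lookup table.
    FILENAME == ARGV[1] { if (NF) flagged[tolower($1)] = 1; next }
    {
      # Normalise the transcribed word: lowercase, strip common punctuation.
      w = tolower($3)
      gsub(/[.,!?;:"()]/, "", w)
      if (w in flagged) {
        start = $1 - pad
        if (start < 0) start = 0
        # volume=0 is applied only while t lies inside the padded span.
        chain = chain sep sprintf("volume=enable=\047between(t,%.2f,%.2f)\047:volume=0", start, $2 + pad)
        sep = ","
      }
    }
    END { print chain }
  ' "$wordlist" "$timestamps"
}
```

Because the remix step leaves the instrumental untouched, a filter like this would be applied to the separated vocal stem only, e.g. `ffmpeg -i vocals.wav -af "$(build_mute_filter words.txt words.tsv)" vocals_muted.wav` (placeholder file names; check that the filter string is non-empty first, since ffmpeg rejects an empty `-af`).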

Installation:

```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e ".[lyrics,web,ai]"
```
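
Audio I/O relies on soundfile plus ffmpeg (see the tech stack table). If ffmpeg has to be installed separately on your system, a quick pre-flight check avoids a confusing first run. A minimal sketch; `check_deps` is a hypothetical helper, and the command names assume a typical Linux or macOS setup:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of Sanitune): report every
# command in the argument list that is not on PATH; return 1 if any is missing.
check_deps() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      status=1
    fi
  done
  return "$status"
}

# python for the virtualenv above, ffmpeg for audio decoding/encoding.
check_deps python ffmpeg || echo "Install the missing tools before continuing." >&2
```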

CLI usage:

```bash
# Basic mute flow
sanitune process song.mp3 --mode mute --language en

# Bleep explicit words instead
sanitune process song.mp3 --mode bleep --language es --bleep-freq 1200

# Replace with voice synthesis
sanitune process song.mp3 --mode replace --synth-engine bark -l es

# AI-powered contextual replacements (BYO API key)
sanitune process song.mp3 --mode replace \
  --ai-provider anthropic --ai-api-key sk-ant-...

# Override device or output location
sanitune process song.mp3 --device cuda --output song_clean.wav
```

Web UI:

```bash
pip install -e ".[web]"
sanitune web --port 7860
```

Then open http://localhost:7860 in your browser.
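
For a folder of songs, the CLI can be driven from a shell loop. This sketch is hypothetical: `clean_name` and `batch_clean` are illustrative helpers, the `_clean` suffix is borrowed from the examples above rather than being a documented default, and it assumes `--output` honours whatever extension you pass, so each clean copy keeps its input's format. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical batch helpers; not part of Sanitune.

# clean_name PATH -> DIR/NAME_clean.EXT (keeps the original extension)
clean_name() {
  local dir base
  dir=$(dirname -- "$1")
  base=$(basename -- "$1")
  case $base in
    *.*) printf '%s/%s_clean.%s\n' "$dir" "${base%.*}" "${base##*.}" ;;
    *)   printf '%s/%s_clean\n' "$dir" "$base" ;;
  esac
}

# batch_clean MODE FILE... : run `sanitune process` once per file,
# stopping at the first failure.
batch_clean() {
  local mode=$1 f
  shift
  for f in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo sanitune process "$f" --mode "$mode" --output "$(clean_name "$f")"
    else
      sanitune process "$f" --mode "$mode" --output "$(clean_name "$f")" || return
    fi
  done
}
```

For example, `DRY_RUN=1 batch_clean bleep ~/Music/*.flac` previews the commands; dropping `DRY_RUN=1` runs them. Processing files one at a time keeps memory use predictable, which matters on the 4 GB minimum configuration.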

Optional lyrics lookup:

```bash
pip install -e ".[lyrics]"
sanitune process song.mp3 \
  --artist "Artist Name" \
  --title "Song Title"
```

When `--artist` and `--title` are provided, Sanitune may query external lyrics providers to cross-reference the transcription. Only song metadata and any optional provider API tokens are sent; the audio stays local.
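
If your files follow an `Artist - Title.ext` naming scheme (an assumption; Sanitune does not require it), the metadata flags can be filled in from the file name. `artist_title_from_name` is a hypothetical helper that splits on the first ` - ` and sets the shell variables `artist` and `title`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive --artist/--title from "Artist - Title.ext".
# Sets the shell variables `artist` and `title`; returns 1 if the name
# does not contain " - ".
artist_title_from_name() {
  local base
  base=$(basename -- "$1")
  base=${base%.*}                      # drop the extension
  case $base in
    *" - "*) ;;
    *) echo "not in 'Artist - Title' form: $1" >&2; return 1 ;;
  esac
  artist=${base%% - *}                 # everything before the first " - "
  title=${base#* - }                   # everything after the first " - "
}
```

Usage: `artist_title_from_name "$f" && sanitune process "$f" --artist "$artist" --title "$title"`. Splitting on the first separator keeps titles such as `Song - Live` intact.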

Docker:

```bash
# CLI mode
docker compose run --rm sanitune process input/song.mp3 -o output/song_clean.wav

# Web UI mode (uncomment the sanitune-web service in docker-compose.yml first)
docker compose up sanitune-web
```

| Environment Variable | Default | Description |
|---|---|---|
| `SANITUNE_DEVICE` | `auto` | Processing device: `auto`, `cpu`, `cuda`, `mps` |
| `SANITUNE_LANGUAGE` | `en` | Profanity detection language: `en`, `es` |
| `SANITUNE_DEFAULT_MODE` | `mute` | Default cleaning mode: `mute`, `bleep` |
| `SANITUNE_MAX_FILE_SIZE` | `200` | Maximum upload file size in MB |
| `SANITUNE_BLEEP_FREQ` | `1000` | Bleep tone frequency in Hz |
| `SANITUNE_AI_API_KEY` | | API key for AI replacement suggestions (Anthropic/OpenAI) |
| `KITS_API_KEY` | | Kits.ai API key for cloud voice conversion |
| `KITS_VOICE_MODEL_ID` | | Kits.ai voice model ID |

Hardware requirements:

| Mode | Minimum | Recommended |
|---|---|---|
| Mute/Bleep | 4 GB RAM, any CPU | 8 GB RAM, 4+ cores |

Processing times (per 3-minute song, approximate):

| Hardware | Mute/Bleep |
|---|---|
| CPU (4 cores) | ~3 min |
| NVIDIA RTX 3060 | ~30 sec |
| Apple M2 | ~45 sec |
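
Taking the 4-core CPU as the baseline (3 min = 180 s, roughly real time), the table's figures work out to the relative speedups below. The album estimate additionally assumes processing time scales roughly linearly with track length, which the table does not state:

```math
\begin{aligned}
\text{RTX 3060: } \frac{180\ \text{s}}{30\ \text{s}} &= 6\times \text{ faster}, &\quad 60\ \text{min of audio} &\approx \frac{60}{6} = 10\ \text{min} \\
\text{Apple M2: } \frac{180\ \text{s}}{45\ \text{s}} &= 4\times \text{ faster}, &\quad 60\ \text{min of audio} &\approx \frac{60}{4} = 15\ \text{min}
\end{aligned}
```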

Tech stack:

| Component | Library | Purpose |
|---|---|---|
| Source Separation | Demucs v4 | Isolate vocals from instrumentals |
| Transcription | WhisperX | Word-level lyrics transcription |
| Optional Lyrics Lookup | syncedlyrics, lyricsgenius | Cross-reference lyrics when explicitly enabled |
| Audio I/O | soundfile + ffmpeg | Audio reading and writing |

Known limitations:

- Voice replacement quality is experimental: TTS-based synthesis doesn't yet match singing voices closely
- Bark singing output varies in quality depending on language and speaker preset
- Kits.ai voice conversion requires a paid account and has a 1 request/minute rate limit
- AI suggestions require a BYO API key (Anthropic or OpenAI)
- A GPU is recommended; CPU-only processing works but takes several times longer (see the processing times above)

See ROADMAP.md for the full development plan.
Contributions are welcome. Please open an issue first to discuss what you'd like to change.