Voiovo
human + AI agent collaboration · agents draft via API · you fine-tune in the UI · same job, two front doors

Generated in partnership with AI agents.

Voiovo is built for the way real video work happens now: an AI agent drafts the first pass, a human reviews and fine-tunes. Agents call the API to plan storyboards, regenerate individual scenes, flip a scene's kind from atmospheric still to live chart, swap voices, adjust effects bundles — everything addressable by scene index, every round-trip non-destructive. You open the same job in the browser, watch the preview, edit captions, retime, swap a portrait, click "Re-render assembly only" and the tweak ships in 30 seconds. ElevenLabs narration, talking-head lipsync, transitions, vignette / grain / color grade, karaoke captions with word-pop, music with sidechain ducking, SFX placed to the millisecond on individual spoken words — all available to both sides.


Drafted by an agent, fine-tuned in the UI — both rendered through Voiovo.

New Render-time tools

Every job carries a declarative effects bundle the renderer applies in one ffmpeg pass. Pick a preset (None / Subtle / Punchy / Cinematic) or roll your own combination on the Style tab.

talking heads

Lipsync any scene

prunaai p-video-avatar · 10 credits/sec

Pick a portrait from your persons library; voiovo slices the matching narration window per scene and lipsyncs it directly from the still photo via prunaai p-video-avatar (no base-video step). 720p output, fast, photoreal. Swap the person between renders — cache invalidates per scene automatically.

scene 0 + scene 6 talking head, the rest AI imagery — common hook+close pattern
transitions

20 transitions, cycled

multi-select chip panel

fade · slide left/right/up/down · wipe · circle open/close · radial · pixelize · fade-through-black/white · smooth slides. Pick one or several — the renderer cycles through them across scene boundaries for visual variety. ffmpeg 7 xfade under the hood.

["slideleft","slideright","fade"] → cycles slideleft/slideright/fade across each cut
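The cycling behaviour can be pictured with a short Python sketch. It assumes one transition per cut (a video with N scenes has N - 1 cuts); the function name `assign_transitions` is hypothetical and not part of Voiovo's API.

```python
from itertools import cycle, islice


def assign_transitions(selected, scene_count):
    """Return one transition name per cut, cycling through the selection."""
    if not selected:
        raise ValueError("select at least one transition")
    cuts = max(scene_count - 1, 0)  # N scenes -> N - 1 scene boundaries
    return list(islice(cycle(selected), cuts))


# Six scenes means five cuts; the three chosen transitions repeat in order.
plan = assign_transitions(["slideleft", "slideright", "fade"], scene_count=6)
print(plan)
```

With three transitions selected and six scenes, the fourth cut wraps back to the first transition in the list.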
captions

Caption styling toolkit

5 independent toggles

Word pop-in (vertical scale per current word, no sentence reflow), emphasis highlight (numbers + ALL-CAPS auto-color), progressive reveal (words appear when spoken — YouTube-shorts style), background pill on/off, thin black outline on/off. All compose. Burned in via libass at caption time, no extra ffmpeg pass.

progressive reveal + word pop + outline + bg off → the YouTube-shorts caption look
progress bar

Progress indicator, 3 modes

full · per-scene · per-step

Thin top-edge bar that grows pixel-by-pixel via crop+overlay. Three reset modes: across the whole video, per scene, or per listicle step. Color picker with an Inherit button that follows the caption highlight.

listicle short with 5 methods + per-step bar → 5 fresh sweeps, one per method
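One way to think about the three reset modes is as a single function over a list of reset points. The sketch below is an illustration under assumptions, not the renderer's code: `progress_fraction`, `bar_width_px`, and the timing values are all hypothetical.

```python
from bisect import bisect_right


def progress_fraction(t, reset_points, total):
    """Fraction (0..1) of the bar filled at time t, in seconds.

    reset_points holds the sorted times at which the bar restarts:
    [0.0] for the full mode, scene starts for per-scene,
    step starts for per-step.
    """
    t = min(max(t, 0.0), total)
    i = max(bisect_right(reset_points, t) - 1, 0)
    start = reset_points[i]
    end = reset_points[i + 1] if i + 1 < len(reset_points) else total
    if end <= start:
        return 1.0
    return min(max((t - start) / (end - start), 0.0), 1.0)


def bar_width_px(t, reset_points, total, frame_width=1080):
    """Bar width in pixels for a frame of the given width."""
    return int(progress_fraction(t, reset_points, total) * frame_width)


# Made-up listicle: five methods of 12 seconds each in a 60-second short.
steps = [0.0, 12.0, 24.0, 36.0, 48.0]
print(progress_fraction(18.0, steps, 60.0))   # halfway through method 2
print(progress_fraction(18.0, [0.0], 60.0))   # same moment in full mode
```

Passing `[0.0]` gives the whole-video bar; passing scene or step start times gives a fresh sweep per segment.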
grade · grain · vignette

Three finishing touches

single ffmpeg pass

Color grade presets (cool-cyan for data/news, warm-amber for spiritual/story, desaturated for cinematic) + film grain (1–30, temporal noise) + vignette (quadratic-eased angle). All compose into one filter chain.

data short → cool-cyan grade + grain 6 + vignette 0.4 = "Bloomberg + film stock"
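To make "one filter chain" concrete, here is a rough Python sketch that joins three standard ffmpeg filters (`colorbalance` or `eq`, `noise`, `vignette`) into a single comma-separated chain. The preset values, the `build_filter_chain` name, and the mapping from vignette strength to lens angle (strength squared, scaled to ffmpeg's 0 to PI/2 range) are assumptions for illustration; the page does not say which filters or values Voiovo actually uses.

```python
import math

# Hypothetical grade presets expressed as standard ffmpeg filters.
GRADES = {
    "cool-cyan": "colorbalance=rm=-0.10:bm=0.10",
    "warm-amber": "colorbalance=rm=0.10:bm=-0.10",
    "desaturated": "eq=saturation=0.60",
}


def build_filter_chain(grade=None, grain=0, vignette=0.0):
    """Compose grade, grain, and vignette into one ffmpeg filter chain."""
    parts = []
    if grade is not None:
        parts.append(GRADES[grade])
    if grain:
        if not 1 <= grain <= 30:
            raise ValueError("grain must be between 1 and 30")
        # allf=t asks the noise filter for temporal (per-frame) noise.
        parts.append(f"noise=alls={grain}:allf=t")
    if vignette > 0:
        # Assumed quadratic easing of a 0..1 strength onto 0..PI/2 radians.
        angle = (vignette ** 2) * (math.pi / 2)
        parts.append(f"vignette=angle={angle:.4f}")
    return ",".join(parts)


chain = build_filter_chain("cool-cyan", grain=6, vignette=0.4)
print(chain)
```

The resulting string could be passed to ffmpeg's `-vf` option, so all three touches run in the same pass.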
sfx · word-aligned

SFX placed on words

15 built-in + your generor archive

Built-in library of 15 sounds (whooshes, ticks, drones, stings, beeps) shared across every voiovo user. Plus your personal archive on generor.com. The Plan-tab editor shows each scene's narration as clickable word chips — click a word to align an SFX hit to that exact moment via per-word TTS timings.

click "twenty" in the narration → tick fires at the syllable
cheap re-render

Assembly-only re-render

~30% of full cost

Tweaked a font, color, transition, or effect? Hit "Re-render assembly only" instead of paying for image gen + lipsync + TTS again. Worker reuses every cached asset and only re-runs assembly + thumbnail. ~30s vs ~6 min, ~113 credits vs ~374 credits.

change Punchy → Cinematic preset on a finished job → new mp4 in 30 seconds
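The quoted figures are consistent with the ~30% headline, as a quick check in Python shows. The numbers come from the card above; the variable names are only for illustration.

```python
# Approximate figures quoted for one job: full render vs assembly-only.
full_credits, assembly_credits = 374, 113
full_seconds, assembly_seconds = 6 * 60, 30

cost_ratio = assembly_credits / full_credits
speedup = full_seconds / assembly_seconds

print(f"{cost_ratio:.0%} of full cost, {speedup:.0f}x faster")
```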

Live integrations across the alexiuz network

Voiovo pulls real artifacts from sister sites and renders them as scenes (or audio segments) in your video. Attribution on visual integrations is part of the rendered pixels — not removable, not ambiguous.

data charts

Real-time animated charts

Mention an asset, country indicator, or commodity in your script and voiovo renders the actual time-series chart from the database, animated on a draw-in. Cryptos, stocks, FRED economic series, World Bank indicators, NOAA climate.

"US 10-year yield since 1980" → live chart, fully labeled, source line baked in
3D globes

Country highlight rotations

When your script names a country, voiovo rotates a real 3D globe (Three.js + WebGL) to bring that country to center, highlights the territory in brand color, and locks the frame for the rest of the scene. Cinematic-grade.

"In Switzerland..." → globe spins to Europe, Switzerland glows cyan
citations

Sourced wisdom citations

125,000+ openly licensed passages from the Bible, Quran, Bhagavad Gita, Stoics, Taoists, Zen masters, Sufi poets and more, with canonical references rendered as on-screen attribution. No LLM-hallucinated quotes; only verified, attributable text.

"Love your enemies" → Luke 6:27 quote-card overlays the scene
scene editing

Edit any scene in Imaig

round-trip with imaig.com

Don't like a generated scene image? Click "Edit in Imaig" on the scene grid — the image opens in the imaig editor. Tweak it (filters, AI inpaint, regenerate, color grade), click "Send back", and voiovo replaces the scene + offers to reassemble. No download/upload dance.

scene 4 looks off → one click into imaig → one click back → reassemble
audio editing API

Server-side audio ops

via ahudio.com /api/v1/process

Declarative audio processing as a service: trim · fade in/out · gain · normalize (LUFS) · pad silence · mix · lowpass · highpass · tempo. One endpoint, one ffmpeg pass per request, output cached by (input + ops + format). Voiovo's render pipeline + 3rd-party LLMs both consume it.

trim a drone-low to 3.4s, fade-out 200ms, gain −6 dB, mix under narration — one POST
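A declarative request like the one above might be assembled as follows. This is a sketch under assumptions: the field names (`input`, `ops`, `format`, `end_s`, `duration_ms`, `db`, `with`) and both helper functions are hypothetical, since the page names the endpoint and the operations but not the payload schema. The cache key simply hashes a canonical JSON form of (input + ops + format).

```python
import hashlib
import json


def build_request(input_id, ops, fmt="mp3"):
    """Assemble a declarative processing request (hypothetical schema)."""
    return {"input": input_id, "ops": ops, "format": fmt}


def cache_key(request):
    """Derive a stable key from input + ops + format.

    Keys are sorted so dict ordering does not matter; the ops list keeps
    its order, because applying ops in a different order changes the audio.
    """
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


request = build_request(
    "drone-low",
    [
        {"op": "trim", "end_s": 3.4},
        {"op": "fade_out", "duration_ms": 200},
        {"op": "gain", "db": -6},
        {"op": "mix", "with": "narration"},
    ],
)
key = cache_key(request)
print(key[:16])
```

Two requests with the same input, ops, and format produce the same key, which is what makes a cached result reusable.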
SFX library

15 shared sound effects

generated via generor.com

Built-in SFX library shared across every voiovo user — whooshes, ticks, chimes, drones, stings, beeps. Plus your personal generor archive shows up in the same picker. Per-scene SFX entries take a creation_id, gain (dB), and an offset_ms aligned to per-word TTS timings.

tick at the syllable "twenty" + drone-low under the close + bass-thud sting at the final word
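Word alignment comes down to looking up a word's start time and using it as the offset. The sketch below uses the three entry fields named above (`creation_id`, `gain`, `offset_ms`); the shape of the timing data, the `sfx_entry` helper, and the sample values are assumptions for illustration.

```python
def sfx_entry(creation_id, word, word_timings, gain_db=0.0, occurrence=0):
    """Build an SFX entry whose offset matches a spoken word.

    word_timings is a list of (word, start_ms) pairs relative to the
    start of the scene, as per-word TTS timings might provide.
    """
    target = word.lower()
    starts = [
        start_ms
        for spoken, start_ms in word_timings
        if spoken.lower().strip(".,!?;:") == target
    ]
    if occurrence >= len(starts):
        raise ValueError(f"word {word!r} not found at occurrence {occurrence}")
    return {
        "creation_id": creation_id,
        "gain": gain_db,
        "offset_ms": starts[occurrence],
    }


# Made-up timings for one scene's narration.
timings = [("Over", 0), ("twenty", 310), ("percent.", 720)]
entry = sfx_entry("tick", "twenty", timings, gain_db=-3.0)
print(entry)
```

The `occurrence` argument covers narration where the same word is spoken more than once in a scene.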

1. Write or generate

Paste your own script, or use the built-in idea generator + script writer to produce a punchy 150-word draft from a niche.

2. Pick an aesthetic

Photoreal documentary, cinematic Roman, ink-wash, dystopian near-future, Persian miniature — every scene stays on style.

3. Storyboard auto-anchored

Scenes line up with the moment they're being narrated. AI atmospheric stills mix with charts, globes, and citation cards.

4. Stills, motion, or talking heads

Default: high-quality stills with smooth Ken Burns. Optional AI motion-video clips per scene (Seedance / Kling). Or set any scene to talking_head with a person from your library — lipsync runs against your TTS narration.

5. Effects bundle, one ffmpeg pass

Pick a preset (Subtle / Punchy / Cinematic) or compose: cross-dissolve transitions cycled across cuts, vignette, film grain, color grade, progress bar (full / per-scene / per-step), caption emphasis, word pop-in, progressive reveal, outline, background pill on/off.

6. Karaoke captions + sound design

Word-by-word brand-color highlight, opening title card, "METHOD 1/2/3" pop-ins auto-detected, header overlays, safe-area padding for iPhone notch and TikTok UI. Music with sidechain ducking, per-scene SFX placed on individual spoken words via per-word TTS timings.

7. Tune cheaply, re-render

"Re-render assembly only" reuses cached images, TTS, and lipsync — ~30% of full cost, finishes in seconds. Use it for font / color / effect / transition tweaks. "Re-render new version" runs the full pipeline when you've edited content.

8. Output bundle

mp4 ready for YouTube/TikTok/Reels, matching thumbnail, .ass + .srt subtitle files, downloadable per-scene images and audio chunks for remixing. Old versions stay reachable; new renders write to -v(N+1).

Aesthetic library

cinematic-roman · Caravaggio chiaroscuro, marble busts, candlelit scrolls. Stoicism, Greco-Roman content.
photoreal-documentary · NYT-feature-photo realism. Long-form essays, profiles.
dystopian-near-future · Black Mirror / Severance / Children of Men. Cold institutional architecture.
ink-wash-east-asian · Restrained monochrome, large negative space. Buddhism, Taoism, Zen content.
persian-miniature-sufi · Jewel tones, ornamental tile, dervishes. Sufi / Islamic content.
renaissance-painterly · Oil-paint feel, classical composition. Christianity, Greek philosophy.
existential-noir · French new wave / Hopper. Camus, Sartre territory.
data-news-anchor · Bloomberg-Terminal cinematic. Stats, finance, market analysis.

Pay per render

Uses your alexiuz credits. Same balance as Generor, Imaig, Ahudio — buy credits once, use them anywhere on the network.

~200 credits · 60s short, Ken Burns stills
~350 credits · 60s short with AI motion video
+10 credits/sec · per second of talking-head lipsync
~1,800 credits · 18-min long-form essay
+30–80 credits · per chart / globe / citation scene (sister-site cost)
~30% of full cost · assembly-only re-render for tweaks
free · built-in SFX library, sister-site music, mascot loops
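For a rough budget, the rates above can be combined into a small estimator. This sketch covers only the 60-second short and assumes the add-ons stack on top of the base price; the function name and that additive model are assumptions, and the listed prices are themselves approximate.

```python
def estimate_short_credits(motion_video=False, lipsync_seconds=0, integration_scenes=0):
    """Return a (low, high) credit estimate for a 60-second short."""
    base = 350 if motion_video else 200      # motion video vs Ken Burns stills
    lipsync = 10 * lipsync_seconds           # +10 credits per lipsync second
    low = base + lipsync + 30 * integration_scenes
    high = base + lipsync + 80 * integration_scenes
    return low, high


# Stills-based short with 12 s of talking head and two chart/globe scenes.
low, high = estimate_short_credits(lipsync_seconds=12, integration_scenes=2)
print(f"full render: ~{low} to ~{high} credits")
print(f"assembly-only tweak: ~{round(0.3 * low)} to ~{round(0.3 * high)} credits")
```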

What you control

Format · Aesthetic · Voice · Image / video / lipsync model per job · Talking-head persona library · Caption font / color / size / position · Effects bundle (transitions, vignette, grain, color grade, progress bar, caption emphasis / word-pop / progressive reveal / outline / background) · Per-scene SFX with word-aligned offsets · Mascot, logo, attribution overlays · Edit-in-Imaig · Cheap assembly-only re-render · Or full re-render with media regen

What you get back

mp4 (1920×1080 long, 1080×1920 short) with safe-area padding · thumbnail (matching aspect) · .ass and .srt subtitle files · per-scene images and audio chunks for re-mixing · storyboard.json for editing scene anchors and citation overrides