Voiovo is built for the way real video work happens now: an AI agent drafts the first pass, a human reviews and fine-tunes. Agents call the API to plan storyboards, regenerate individual scenes, flip a scene's kind from atmospheric still to live chart, swap voices, adjust effects bundles — everything addressable by scene index, every round-trip non-destructive. You open the same job in the browser, watch the preview, edit captions, retime, swap a portrait, click "Re-render assembly only" and the tweak ships in 30 seconds. ElevenLabs narration, talking-head lipsync, transitions, vignette / grain / color grade, karaoke captions with word-pop, music with sidechain ducking, SFX placed to the millisecond on individual spoken words — all available to both sides.
Drafted by an agent, fine-tuned in the UI — both rendered through Voiovo.
Every job carries a declarative effects bundle the renderer applies in one ffmpeg pass. Pick a preset (None / Subtle / Punchy / Cinematic) or roll your own combination on the Style tab.
Pick a portrait from your persons library; voiovo slices the matching narration window per scene and lipsyncs it directly from the still photo via prunaai p-video-avatar (no base-video step). 720p output, fast, photoreal. Swap the person between renders — cache invalidates per scene automatically.
scene 0 + scene 6 talking head, the rest AI imagery: a common hook+close pattern
fade · slide left/right/up/down · wipe · circle open/close · radial · pixelize · fade-through-black/white · smooth slides. Pick one or several; the renderer cycles through them across scene boundaries for visual variety. ffmpeg 7 xfade under the hood.
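The cycling behaviour can be pictured with a short sketch. It is an illustration only, assuming the renderer hands out one transition per cut in list order and wraps around when the list runs out; `assign_transitions` is a hypothetical helper, not part of any Voiovo API. The transition names are ffmpeg xfade names.

```python
from itertools import cycle, islice


def assign_transitions(transitions: list[str], scene_count: int) -> list[str]:
    """Return one transition name per cut; N scenes have N - 1 cuts."""
    if scene_count < 2 or not transitions:
        return []
    # cycle() repeats the chosen list forever; islice() takes one name per cut.
    return list(islice(cycle(transitions), scene_count - 1))


# Seven scenes give six cuts, so a three-item list is used exactly twice.
cuts = assign_transitions(["slideleft", "slideright", "fade"], 7)
print(cuts)
```

With a single transition in the list, every cut gets that transition; with one scene there are no cuts and the result is empty.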
["slideleft","slideright","fade"] → cycles slideleft/slideright/fade across each cut
Word pop-in (vertical scale per current word, no sentence reflow), emphasis highlight (numbers + ALL-CAPS auto-color), progressive reveal (words appear when spoken, YouTube-shorts style), background pill on/off, thin black outline on/off. All compose. Burned in via libass at caption time, no extra ffmpeg pass.
progressive reveal + word pop + outline + bg off → the YouTube-shorts caption look
Thin top-edge bar that grows pixel-by-pixel via crop+overlay. Three reset modes: across the whole video, per scene, or per listicle step. Color picker with an Inherit button that follows the caption highlight.
listicle short with 5 methods + per-step bar → 5 fresh sweeps, one per method
Color grade presets (cool-cyan for data/news, warm-amber for spiritual/story, desaturated for cinematic) + film grain (1–30, temporal noise) + vignette (quadratic-eased angle). All compose into one filter chain.
data short → cool-cyan grade + grain 6 + vignette 0.4 = "Bloomberg + film stock"
Built-in library of 15 sounds (whooshes, ticks, drones, stings, beeps) shared across every voiovo user. Plus your personal archive on generor.com. The Plan-tab editor shows each scene's narration as clickable word chips; click a word to align an SFX hit to that exact moment via per-word TTS timings.
click "twenty" in the narration → tick fires at the syllable
Tweaked a font, color, transition, or effect? Hit "Re-render assembly only" instead of paying for image gen + lipsync + TTS again. The worker reuses every cached asset and only re-runs assembly + thumbnail: ~30s vs ~6 min, ~113 vs ~374 credits.
Voiovo pulls real artifacts from sister sites and renders them as scenes (or audio segments) in your video. Attribution on visual integrations is part of the rendered pixels — not removable, not ambiguous.
Mention an asset, country indicator, or commodity in your script and voiovo renders the actual time-series chart from the database, animated on a draw-in. Cryptos, stocks, FRED economic series, World Bank indicators, NOAA climate.
"US 10-year yield since 1980" → live chart, fully labeled, source line baked in
When your script names a country, voiovo rotates a real 3D globe (Three.js + WebGL) to bring that country to center, highlights the territory in brand color, and locks the frame for the rest of the scene. Cinematic-grade.
"In Switzerland..." → globe spins to Europe, Switzerland glows cyan
125,000+ openly licensed passages from the Bible, Quran, Bhagavad Gita, Stoics, Taoists, Zen masters, Sufi poets and more, with canonical references rendered as on-screen attribution. No LLM-hallucinated quotes; only verified, attributable text.
"Love your enemies" → Luke 6:27 quote-card overlays the scene
Don't like a generated scene image? Click "Edit in Imaig" on the scene grid and the image opens in the imaig editor. Tweak it (filters, AI inpaint, regenerate, color grade), click "Send back", and voiovo replaces the scene + offers to reassemble. No download/upload dance.
scene 4 looks off → one click into imaig → one click back → reassemble
Declarative audio processing as a service: trim · fade in/out · gain · normalize (LUFS) · pad silence · mix · lowpass · highpass · tempo. One endpoint, one ffmpeg pass per request, output cached by (input + ops + format). Both Voiovo's render pipeline and third-party LLMs consume it.
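Caching by (input + ops + format) can be pictured as hashing a canonical serialization of those three parts. The sketch below is an assumption about how such a key might be built, not the service's documented scheme; `cache_key` and the op field names (`op`, `end_s`, `duration_ms`, `db`) are hypothetical.

```python
import hashlib
import json


def cache_key(input_sha256: str, ops: list[dict], fmt: str) -> str:
    """Derive a deterministic key from the input digest, the ordered op list, and the output format."""
    canonical = json.dumps(
        {"input": input_sha256, "ops": ops, "format": fmt},
        sort_keys=True,          # key order inside each op must not change the key
        separators=(",", ":"),   # no whitespace, so formatting cannot change the key
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical op names and fields, for illustration only.
ops = [
    {"op": "trim", "end_s": 3.4},
    {"op": "fade_out", "duration_ms": 200},
    {"op": "gain", "db": -6},
]
source_digest = hashlib.sha256(b"example audio bytes").hexdigest()

key_mp3 = cache_key(source_digest, ops, "mp3")
key_wav = cache_key(source_digest, ops, "wav")
print(key_mp3 != key_wav)  # a different output format yields a different key
```

The op list stays ordered in the key because applying the same operations in a different order generally produces different audio, so those requests should not share a cache entry.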
trim a drone-low to 3.4s, fade-out 200ms, gain −6 dB, mix under narration: one POST
Built-in SFX library shared across every voiovo user: whooshes, ticks, chimes, drones, stings, beeps. Plus your personal generor archive shows up in the same picker. Per-scene SFX entries take a creation_id, gain (dB), and an offset_ms aligned to per-word TTS timings.
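A minimal sketch of how a word click could turn into an SFX entry. The `creation_id`, `gain`, and `offset_ms` fields come from the description above; the shape of the per-word timing records (`word`, `start_ms`, `end_ms`, measured from the start of the scene's narration) and the helper `sfx_entry_for_word` are assumptions made for illustration.

```python
def sfx_entry_for_word(word_timings, word, creation_id, gain_db=0.0):
    """Build a per-scene SFX entry whose offset_ms is the start of the first matching spoken word."""
    target = word.lower()
    for timing in word_timings:
        if timing["word"].lower() == target:
            return {
                "creation_id": creation_id,
                "gain": gain_db,
                "offset_ms": timing["start_ms"],
            }
    raise ValueError(f"word {word!r} not found in narration timings")


# Made-up timings, in milliseconds from the start of the scene's narration.
timings = [
    {"word": "Only", "start_ms": 0, "end_ms": 310},
    {"word": "twenty", "start_ms": 310, "end_ms": 720},
    {"word": "percent", "start_ms": 720, "end_ms": 1250},
]
entry = sfx_entry_for_word(timings, "twenty", creation_id="tick-01", gain_db=-6.0)
print(entry)
```

If a word occurs more than once in a scene, this sketch picks the first occurrence; a real editor would presumably key on the clicked chip's position instead.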
Paste your own script, or use the built-in idea generator + script writer to produce a punchy 150-word draft from a niche.
Photoreal documentary, cinematic Roman, ink-wash, dystopian near-future, Persian miniature — every scene stays on style.
Scenes line up with the moment they're being narrated. AI atmospheric stills mix with charts, globes, and citation cards.
Default: high-quality stills with smooth Ken Burns. Optional AI motion-video clips per scene (Seedance / Kling). Or set any scene to talking_head with a person from your library — lipsync runs against your TTS narration.
Pick a preset (Subtle / Punchy / Cinematic) or compose: cross-dissolve transitions cycled across cuts, vignette, film grain, color grade, progress bar (full / per-scene / per-step), caption emphasis, word pop-in, progressive reveal, outline, background pill on/off.
Word-by-word brand-color highlight, opening title card, "METHOD 1/2/3" pop-ins auto-detected, header overlays, safe-area padding for iPhone notch and TikTok UI. Music with sidechain ducking, per-scene SFX placed on individual spoken words via per-word TTS timings.
"Re-render assembly only" reuses cached images, TTS, and lipsync — ~30% of full cost, finishes in seconds. Use it for font / color / effect / transition tweaks. "Re-render new version" runs the full pipeline when you've edited content.
mp4 ready for YouTube/TikTok/Reels, matching thumbnail, .ass + .srt subtitle files, downloadable per-scene images and audio chunks for remixing. Old versions stay reachable; new renders write to -v(N+1).
cinematic-roman · Caravaggio chiaroscuro, marble busts, candlelit scrolls. Stoicism, Greco-Roman content.
photoreal-documentary · NYT-feature-photo realism. Long-form essays, profiles.
dystopian-near-future · Black Mirror / Severance / Children of Men. Cold institutional architecture.
ink-wash-east-asian · Restrained monochrome, large negative space. Buddhism, Taoism, Zen content.
persian-miniature-sufi · Jewel tones, ornamental tile, dervishes. Sufi / Islamic content.
renaissance-painterly · Oil-paint feel, classical composition. Christianity, Greek philosophy.
existential-noir · French new wave / Hopper. Camus, Sartre territory.
data-news-anchor · Bloomberg-Terminal cinematic. Stats, finance, market analysis.
Uses your alexiuz credits. Same balance as Generor, Imaig, Ahudio — buy credits once, use them anywhere on the network.
~200 · 60s short, Ken Burns stills
~350 · 60s short with AI motion video
+10 /sec · per second of talking-head lipsync
~1,800 · 18-min long-form essay
+30–80 · per chart / globe / citation scene (sister-site cost)
~30% of full cost · assembly-only re-render for tweaks
free · built-in SFX library, sister-site music, mascot loops
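The list prices above can be combined into a rough estimate. This is a back-of-the-envelope sketch, assuming the surcharges simply add to the base price of a 60s short and that the ~30% assembly-only ratio applies to the whole total; neither assumption is confirmed by the price list, and both function names are hypothetical.

```python
def estimate_short_credits(motion_video=False, lipsync_seconds=0, integration_scenes=0):
    """Return a (low, high) credit range for a 60s short, using the approximate list prices."""
    base = 350 if motion_video else 200     # ~350 with AI motion video, ~200 with Ken Burns stills
    lipsync = 10 * lipsync_seconds          # +10 per second of talking-head lipsync
    low = base + lipsync + 30 * integration_scenes    # +30 per chart / globe / citation scene at best
    high = base + lipsync + 80 * integration_scenes   # +80 per such scene at worst
    return low, high


def assembly_only(credits):
    """Apply the ~30% assembly-only re-render ratio to a full-render figure."""
    return round(credits * 30 / 100)


# Stills-based short with 8 seconds of talking head and 2 chart scenes.
low, high = estimate_short_credits(lipsync_seconds=8, integration_scenes=2)
print(low, high)                                # 340 440
print(assembly_only(low), assembly_only(high))  # 102 132
```

All inputs are the approximate figures from the list, so the result is a range to budget against rather than a quote.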
Format · Aesthetic · Voice · Image / video / lipsync model per job · Talking-head persona library · Caption font / color / size / position · Effects bundle (transitions, vignette, grain, color grade, progress bar, caption emphasis / word-pop / progressive reveal / outline / background) · Per-scene SFX with word-aligned offsets · Mascot, logo, attribution overlays · Edit-in-Imaig · Cheap assembly-only re-render · Or full re-render with media regen
mp4 (1920×1080 long, 1080×1920 short) with safe-area padding · thumbnail (matching aspect) · .ass and .srt subtitle files · per-scene images and audio chunks for re-mixing · storyboard.json for editing scene anchors and citation overrides