🎬 LongLive 2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

💡 TLDR: Infra with NVFP4 and parallelism for both training and inference

News

🔥 [2026.06.01] We released LongLive-RAG, a general retrieval-augmented framework for long video generation.
🔥 [2026.05.30] LongLive2.0 now supports I2V AR teacher-forcing training and I2V DMD distillation for Wan2.2-TI2V-5B.
⚡ [2026.05.25] We optimized the NVFP4 inference path with fused Triton RoPE/adaLN kernels, reduced KV-cache synchronization overhead, in-place quantized KV-cache updates, faster FP4 KV dequantization, pinned VAE transfers, and safer LoRA-before-quantization setup, improving overall throughput by 18.6%.
🔥 [2026.05.13] We released LongLive 2.0, an infrastructure with NVFP4, parallelism, and multi-shot support for AR training, DMD distillation, and inference (⚡45.7 FPS). The original LongLive 1.0 is now in the v1.0 branch.
🔥 [2026.04.12] LongLive supports KV-cache compression with TriAttention, reducing the KV cache by 50% with no quality drop.
🎉 [2026.01.27] LongLive was accepted to ICLR 2026.
🔥 [2026.01.11] LongLive supports adapting its original RoPE into KV-cache relative RoPE, which enables infinitely long video generation.
🔥 [2025.11.03] We implemented LongLive on the linear-attention model SANA-Video. SANA-Video can now generate 60s interactive videos in real time.
🔥 [2025.09.29] We released the paper, this GitHub repo with all training and inference code, the LongLive-1.3B model weights, and the demo website.

Introduction

LongLive 1.0: Real-time Interactive Long Video Generation. It is available in the v1.0 branch.

LongLive 2.0: an NVFP4 Parallel Infrastructure for Long Video Generation

For training, it supports:
- Balanced sequence parallelism for T2V/I2V AR training (teacher-forcing).
- T2V/I2V AR training on multi-shot (or single-shot) videos.
- NVFP4 (or BF16) for both AR training and few-step distillation.

For inference, it supports:
- NVFP4 inference (W4A4) and NVFP4 KV cache.
- Multi-shot attention sink.
- Sequence-parallel inference.
- Async decoding.
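
For readers new to the format, NVFP4 stores values as 4-bit floats (E2M1) that share one scale per small block. The pure-Python sketch below simulates that idea on a flat list. It assumes 16-element blocks and keeps the scale in full precision, whereas real kernels store the block scale in FP8 and pack the 4-bit codes. All names are hypothetical and nothing here is taken from the LongLive code.

```python
# Simulated NVFP4-style block quantization in pure Python (illustrative only).
# Assumptions: 16-element blocks, E2M1 value grid, one full-precision scale per block.

E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # non-negative FP4 magnitudes
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, quantized values) for one block of floats."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    scale = amax / E2M1_GRID[-1]  # map the largest magnitude onto 6.0
    quantized = []
    for x in block:
        # Snap the scaled magnitude to the nearest representable grid value.
        magnitude = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        quantized.append(magnitude if x >= 0 else -magnitude)
    return scale, quantized


def fake_quantize(values):
    """Quantize then dequantize a flat list, block by block."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, q = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(scale * v for v in q)
    return out


weights = [0.01 * i - 0.15 for i in range(32)]
restored = fake_quantize(weights)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Because each block is scaled by its own largest magnitude, the worst-case rounding error in a block is bounded by that block's scale, which is why small blocks help preserve accuracy.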

LongLive 1.0: Real-time Interactive Long Video Generation. It accepts sequential user prompts and generates corresponding videos in real time, enabling user-guided long video generation. The key insights are attention sink, KV-recache, and streaming long tuning.
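
As a rough illustration of the attention-sink idea, the sketch below keeps the first few cache entries as permanent anchors alongside a rolling window of recent entries. It is a generic pattern written with hypothetical names (SinkWindowCache, num_sink, window), not LongLive's implementation, and it does not cover KV-recache or streaming long tuning.

```python
from collections import deque


class SinkWindowCache:
    """Keep the first `num_sink` entries forever plus a rolling window of recent ones."""

    def __init__(self, num_sink: int, window: int):
        self.num_sink = num_sink
        self.sink = []                      # permanent anchor entries
        self.recent = deque(maxlen=window)  # oldest entry is evicted automatically

    def append(self, frame_kv):
        if len(self.sink) < self.num_sink:
            self.sink.append(frame_kv)
        else:
            self.recent.append(frame_kv)

    def visible(self):
        """Entries a new frame would attend to: sinks first, then the window."""
        return self.sink + list(self.recent)


cache = SinkWindowCache(num_sink=1, window=3)
for frame_index in range(10):
    cache.append(f"kv_{frame_index}")  # placeholder for per-frame key/value tensors
context = cache.visible()
```

The cache size stays constant no matter how many frames are appended, which is the property that makes this kind of scheme attractive for long generation.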

Getting Started

Quick Start

BF16

import torch
from omegaconf import OmegaConf

from pipeline import CausalDiffusionInferencePipeline
from utils.config import normalize_config
from utils.inference_utils import (
    load_generator_checkpoint,
    place_vae_for_streaming,
    prepare_single_prompt_inputs,
    save_video,
)

prompt = "A compact silver robot walks through a clean robotics lab."
merged_checkpoint_path = "LongLive-2.0-5B/model_bf16.pt"

config = normalize_config(OmegaConf.load("configs/inference.yaml"))
device = torch.device("cuda")

torch.set_grad_enabled(False)
pipe = CausalDiffusionInferencePipeline(config, device=device)
load_generator_checkpoint(pipe.generator, merged_checkpoint_path)
pipe = pipe.to(device=device, dtype=torch.bfloat16)
place_vae_for_streaming(pipe, config)  # honor streaming_vae + vae_device when set
pipe.generator.model.eval().requires_grad_(False)

noise, prompts = prepare_single_prompt_inputs(config, prompt, device)
video = pipe.inference(noise=noise, text_prompts=prompts)
save_video(video[0], "videos/quickstart/sample.mp4", fps=24)

place_vae_for_streaming is a no-op unless inference.streaming_vae is true and inference.vae_device is set, so enabling streaming-pipeline decode in your YAML config is enough; the script does not need to change.
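
The condition described above can be sketched as a small predicate. This is an assumption about the behavior, not the repository's code: the function name should_relocate_vae is hypothetical, the config is modeled as a plain dict, "set" is taken to mean "present and not None", and "cuda:1" is only an example value.

```python
def should_relocate_vae(inference_cfg: dict) -> bool:
    """Mirror the documented rule: both settings must be present and enabled."""
    streaming = bool(inference_cfg.get("streaming_vae", False))
    vae_device = inference_cfg.get("vae_device")  # None means "not set" (assumption)
    return streaming and vae_device is not None


# Only the last combination would trigger a relocation under this reading.
cases = [
    {},
    {"streaming_vae": True},
    {"streaming_vae": False, "vae_device": "cuda:1"},
    {"streaming_vae": True, "vae_device": "cuda:1"},
]
results = [should_relocate_vae(case) for case in cases]
```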

NVFP4

Point checkpoints.generator_ckpt in configs/nvfp4/inference_nvfp4.yaml at the downloaded checkpoint and set model_quant_use_transformer_engine according to the backend you are using:

TransformerEngine checkpoint (model_te.pt): model_quant_use_transformer_engine: true
FourOverSix checkpoint (model_4o6.pt): model_quant_use_transformer_engine: false
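
Since a mismatch between the checkpoint and the flag is easy to introduce, a small guard like the following may help. It is a sketch that only encodes the two filename-to-flag pairs listed above; the helper name transformer_engine_flag and the example path are hypothetical.

```python
from pathlib import Path

# Filenames and flag values taken from the two cases listed above.
BACKEND_FLAG_BY_CHECKPOINT = {
    "model_te.pt": True,    # TransformerEngine
    "model_4o6.pt": False,  # FourOverSix
}


def transformer_engine_flag(checkpoint_path: str) -> bool:
    """Return the model_quant_use_transformer_engine value for a checkpoint file."""
    name = Path(checkpoint_path).name
    if name not in BACKEND_FLAG_BY_CHECKPOINT:
        raise ValueError(f"Unrecognized NVFP4 checkpoint name: {name}")
    return BACKEND_FLAG_BY_CHECKPOINT[name]


# Example path is illustrative only.
use_te = transformer_engine_flag("checkpoints/model_te.pt")
```

The returned value could then be compared against the YAML setting before the pipeline is built, so that a wrong pairing fails early with a clear message.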

setup_nvfp4_pipeline handles checkpoint loading, NVFP4 module wrapping, weight materialization, dtype/device placement, and the streaming-pipeline VAE relocation for both backends. Do not use the BF16 pipe.to(...) shortcut here, because it would cast the quantized buffers.

import torch
from omegaconf import OmegaConf

from pipeline import CausalDiffusionInferencePipeline
from utils.config import normalize_config
from utils.inference_utils import prepare_single_prompt_inputs, save_video, setup_nvfp4_pipeline

prompt = "A compact silver robot walks through a clean robotics lab."

config = normalize_config(OmegaConf.load("configs/nvfp4/inference_nvfp4.yaml"))
device = torch.device("cuda")

torch.set_grad_enabled(False)
pipe = CausalDiffusionInferencePipeline(config, device=device)
setup_nvfp4_pipeline(pipe, config, device)
pipe.generator.model.eval().requires_grad_(False)

noise, prompts = prepare_single_prompt_inputs(config, prompt, device)
video = pipe.inference(noise=noise, text_prompts=prompts)
save_video(video[0], "videos/quickstart/sample_nvfp4.mp4", fps=24)

Training Modes

LongLive2.0 supports both T2V and I2V training. Each modality follows the same two-stage recipe: AR teacher-forcing training first, then DMD distillation from the AR checkpoint.

T2V Training

torchrun --standalone --nnodes=1 --nproc_per_node=8 train.py \
  --config_path configs/train_ar.yaml \
  --logdir logs/train_ar \
  --wandb-save-dir wandb \
  --disable-wandb

torchrun --standalone --nnodes=1 --nproc_per_node=8 train.py \
  --config_path configs/train_dmd.yaml \
  --logdir logs/train_dmd \
  --wandb-save-dir wandb \
  --disable-wandb
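
The two commands above run in a fixed order, AR first and DMD second. A minimal Python sketch of a driver that builds both commands is shown below. It reuses only the flags and paths from the commands above; the helper name build_train_command is hypothetical, and the actual launch is left commented out.

```python
import shlex


def build_train_command(config_path: str, logdir: str, nproc_per_node: int = 8) -> list[str]:
    """Assemble one single-node torchrun invocation of train.py."""
    return [
        "torchrun", "--standalone", "--nnodes=1", f"--nproc_per_node={nproc_per_node}",
        "train.py",
        "--config_path", config_path,
        "--logdir", logdir,
        "--wandb-save-dir", "wandb",
        "--disable-wandb",
    ]


# Stage order matters: DMD distillation starts from the AR checkpoint.
T2V_STAGES = [
    ("configs/train_ar.yaml", "logs/train_ar"),
    ("configs/train_dmd.yaml", "logs/train_dmd"),
]

commands = [build_train_command(cfg, logdir) for cfg, logdir in T2V_STAGES]
for command in commands:
    print(shlex.join(command))
    # To launch for real: subprocess.run(command, check=True)
```

Running with check=True would stop the sequence if the AR stage fails, so the DMD stage never starts from a missing checkpoint.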

I2V Training

torchrun --standalone --nnodes=1 --nproc_per_node=8 train.py \
  --config_path configs/train_i2v_ar.yaml \
  --logdir logs/train_i2v_ar \
  --wandb-save-dir wandb \
  --disable-wandb

torchrun --standalone --nnodes=1 --nproc_per_node=8 train.py \
  --config_path configs/train_i2v_dmd.yaml \
  --logdir logs/train_i2v_dmd \
  --wandb-save-dir wandb \
  --disable-wandb

For I2V configs, set algorithm.i2v: true and algorithm.independent_first_frame: true. data.image_or_video_shape[1] is the full latent sequence length, for example 96, not 96 + 1: the clean image latent replaces the first latent during denoising and that first latent is masked out of the training loss. For I2V DMD, set checkpoints.generator_ckpt to the I2V AR checkpoint used to initialize the student.
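
The sequence-length convention above can be illustrated with a toy loss mask. This is a sketch in plain Python under the stated assumptions, not the trainer's code: the function names are hypothetical and per-latent losses are modeled as a list of floats.

```python
def i2v_loss_mask(num_latents: int) -> list[float]:
    """1.0 where a latent contributes to the loss, 0.0 for the image-conditioned first latent."""
    if num_latents < 2:
        raise ValueError("need at least one latent after the conditioning frame")
    return [0.0] + [1.0] * (num_latents - 1)


def masked_mean(per_latent_loss: list[float], mask: list[float]) -> float:
    """Average the loss over unmasked latents only."""
    kept = sum(mask)
    return sum(value * m for value, m in zip(per_latent_loss, mask)) / kept


sequence_length = 96  # data.image_or_video_shape[1]: the full length, not 96 + 1
mask = i2v_loss_mask(sequence_length)

# The first latent's loss (5.0 here) is ignored because the clean image latent replaces it.
loss = masked_mean([5.0] + [1.0] * (sequence_length - 1), mask)
```

The mask has the same length as the latent sequence, so the image latent occupies slot 0 instead of being prepended as an extra element.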

Models

Model	FPS ↑	Params	VBench ↑	Multi-shot
LongLive-1.3B	20.7	1.3B	84.87
LongLive-2.0-5B	24.8	5B	85.06	✅
LongLive-2.0-5B-NVFP4-4Step	29.7	5B	84.51	✅
LongLive-2.0-5B-NVFP4-2Step	45.7	5B	83.14	✅

License

This repository is released under the Apache 2.0 license. See LICENSE for details.

Citation

Please consider citing our work if you find it useful:

@article{longlive_2.0,
  title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},
  author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},
  journal={arXiv preprint arXiv:2605.18739},
  year={2026}
}

@inproceedings{longlive,
    title={Longlive: Real-time interactive long video generation}, 
    author={Yang, Shuai and Huang, Wei and Chu, Ruihang and Xiao, Yicheng and Zhao, Yuyang and Wang, Xianbang and Li, Muyang and Xie, Enze and Chen, Yingcong and Lu, Yao and others},
    booktitle={ICLR},
    year={2026},
}

@article{longlive_rag,
  title         = {LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation},
  author        = {Hu, Qixin and Yang, Shuai and Huang, Wei and Han, Song and Chen, Yukang},
  journal       = {arXiv preprint arXiv:2606.02553},
  year          = {2026}
}

Acknowledgement

Self-Forcing: the AR training codebase and formulation we build upon.
Wan2.2: the base video diffusion model components used in this release.
