DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent Keyframes

Zhende Song · Chenchen Wang · Jiamu Sheng · Chi Zhang · Shengji Tang · Jiayuan Fan✦ · Tao Chen

(✦ Corresponding Author)

From Fudan University and Tencent PCG
Please follow the instructions below to install the required packages. Our training process is mainly based on LLaMA-VID, and our short-video evaluation process is mainly based on the quantitative_evaluation protocol from Video-ChatGPT.
Clone this repository:

```shell
git clone https://github.com/Deaddawn/DreamFrame-code.git
```

Install packages (tested on A100 and RTX 3090 with CUDA 11.8; we recommend sticking to the package versions we provide, as changes in the versions of diffusers and transformers may lead to certain issues):

```shell
conda create -n DreamFrame python=3.10 -y
conda activate DreamFrame
cd DreamFrame
pip install -r requirements.txt
```

The data generation process of DreamFrame consists of three stages: (1) Movie Plot Generation, (2) Style Immobilization, and (3) Video Instruction Data Generation.
We adopt a story-expansion strategy that incrementally generates frame descriptions through three levels, and we provide example prompts for each level. Use any LLM (we use GPT-4) to generate the frame descriptions and organize them into a JSON file like this: story_js
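The three-level expansion can be sketched as follows. This is only an illustrative sketch: `call_llm` is a stub standing in for whichever LLM API you use (the authors use GPT-4), and the prompts and JSON layout here are placeholders, not the repository's actual format.

```python
import json

def call_llm(prompt):
    # Stub for an LLM call; replace with a real API request.
    # Here it just fabricates two child descriptions per input.
    return [f"{prompt} -> detail {i}" for i in range(2)]

def expand_story(synopsis, n_levels=3):
    # Level 1: a movie-level synopsis; each further level expands
    # every description into finer-grained ones, ending with
    # per-frame descriptions.
    level = [synopsis]
    for _ in range(n_levels - 1):
        level = [d for item in level for d in call_llm(item)]
    return level

frame_descriptions = expand_story("A detective chases a thief through Paris")
story = {"story_id": 0, "frames": frame_descriptions}
print(json.dumps(story, indent=2))
```

With the two-way stub above, three levels yield four leaf frame descriptions; a real LLM would of course branch more widely and write richer text.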
Style Immobilization learns a style embedding that can be used to generate style-consistent key frames. Learning the embedding requires a style-related keyword and a set of style-related images. The keyword can be obtained from stage one. For the style-related images, we simply generate them with sdxl-1.0-base from the detailed style description (you can find an example in the prompt we provide).
Here is an example of training a style embedding with the keyword "Dramatic":

```shell
cd StyleImmobilization
python style_embedding.py --style_keyword Dramatic --image_path ./style
```

The learned style embedding will be saved in the "Embeddings" folder. Training should only take 5–10 minutes (tested on an A100).
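Conceptually, style immobilization follows the textual-inversion recipe: only the embedding of the new style token is optimized, while the rest of the model stays frozen. The toy sketch below illustrates just that optimization pattern; the real diffusion loss is replaced by a stand-in quadratic loss so the sketch runs on its own, and all names and numbers are illustrative.

```python
import random

random.seed(0)
dim = 8
# The only trainable parameters: the new style-token embedding.
style_embedding = [random.gauss(0.0, 1.0) for _ in range(dim)]
# Stand-in for the training signal the frozen model would provide.
target = [1.0] * dim

lr = 0.1
for _ in range(200):
    # Gradient of the toy loss sum_i (e_i - t_i)^2 w.r.t. e.
    grads = [2.0 * (e - t) for e, t in zip(style_embedding, target)]
    style_embedding = [e - lr * g for e, g in zip(style_embedding, grads)]

# style_embedding has converged toward `target`; in the real pipeline
# the converged vector is what gets saved to the Embeddings folder.
```

The key design point mirrored here is that freezing everything except one embedding keeps training cheap, which is why learning a style takes only minutes on a single GPU.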
After training a style embedding, you can generate style-consistent keyframes from the aforementioned JSON file like this:
```shell
cd StyleImmobilization
python generate.py --js_path ./json/story_info_0.json --embed_path ./Embeddings/story_0_Dramatic.pt --keyword Dramatic --save_path ./save_path
```

We provide our baseline model and the model trained on our generated dataset. For more detailed information, refer to LLaMA-VID-model. Please follow LLaMA-VID to prepare the necessary settings, and feel free to use our provided checkpoint.
| Type | Max Token | Base LLM | Finetuning Data | Finetuning schedule | Download |
|---|---|---|---|---|---|
| Base Model | 64K | Vicuna-7B-v1.5 | LLaVA1.5-VideoChatGPT-Instruct | full_ft-1e | ckpt |
| DreamFrame-7B | 64K | Vicuna-7B-v1.5 | LLaVA1.5-VideoChatGPT-Instruct + DreamFrameQA | full_ft-1e | ckpt |
The data generated by our pipeline consists of key frame images with their corresponding QAs and dialogues. You can download it here: DreamFrame-Data.

We follow MVBench, Video-Bench, and TempCompass to conduct evaluations.
We would like to thank the following repos for their great work:
- Our model is trained based on LLaMA-VID.
- We build our pipeline based on textual-inversion.
