ReCorD (ACM MM 2024)

ReCorD: Reasoning and Correcting Diffusion for HOI Generation

Jian-Yu Jiang-Lin¹,²* Kang-Yang Huang²* Ling Lo¹ Yi-Ning Huang¹
Terence Lin¹ Jhih-Ciang Wu² Hong-Han Shuai¹ Wen-Huang Cheng²
*Indicates Equal Contribution

¹ National Yang Ming Chiao Tung University ² National Taiwan University

Overview: ReCorD is a training-free approach that integrates a latent diffusion model (LDM) with visual language models (VLMs) to improve image generation for multimedia content, particularly the accurate depiction of human-object interactions (HOIs).

Installation

conda create -n ReCorD python=3.10.13 -y
conda activate ReCorD
pip install -r requirements.txt

Usage

Run main.py to generate the example defined in the script. To customize image generation, adjust the prompts and other configuration options in main.py.

python main.py

Configuration

Here’s a brief overview of key configuration options:

prompt: Text prompts for generating images.
guidance_scale: Controls the specificity of the image to the prompt (higher values generate more specific images).
n_inference_steps: Number of steps for the model's inference, affecting the clarity and detail of the output.
run_standard_sd: Boolean to choose between standard and custom Stable Diffusion runs.
scale_factor: Affects the resolution or quality of generated images.
max_iter_to_alter: Maximum iterations to modify attention for specified indices.
output_path: File path where generated images will be stored.
viz_path: Path for saving attention maps.
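
For intuition on guidance_scale: in standard Stable Diffusion pipelines this value is the classifier-free guidance weight, which pushes the noise prediction away from the unconditional estimate and toward the prompt-conditioned one. The sketch below illustrates that combination rule on plain Python lists. The function name and toy values are illustrative assumptions, not code from this repository, and ReCorD's own pipeline may apply guidance differently.

```python
from typing import List


def apply_guidance(noise_uncond: List[float],
                   noise_text: List[float],
                   guidance_scale: float) -> List[float]:
    """Classifier-free guidance: uncond + scale * (text - uncond), element-wise."""
    if len(noise_uncond) != len(noise_text):
        raise ValueError("noise predictions must have the same length")
    return [u + guidance_scale * (t - u)
            for u, t in zip(noise_uncond, noise_text)]


# Toy values standing in for the denoiser's two noise predictions.
uncond = [0.0, 1.0]
text = [1.0, 1.0]

# A scale of 1.0 reproduces the prompt-conditioned prediction;
# larger scales amplify the difference between the two predictions.
print(apply_guidance(uncond, text, 1.0))
print(apply_guidance(uncond, text, 7.5))
```

Only the components where the two predictions differ are affected by the scale, which is why higher values tie the image more closely to the prompt.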

You can modify these settings in the main.py script to fit your specific requirements:

from pathlib import Path

# Assumption: RunConfig is defined in the repository's config.py.
from config import RunConfig

config = RunConfig(
    prompt=['a boy is feeding', 'a boy is feeding a bird'],
    guidance_scale=7.5,
    n_inference_steps=50,
    run_standard_sd=False,
    scale_factor=20,
    max_iter_to_alter=25,
    output_path=Path('./outputs'),
    viz_path=Path('./attention_maps')
)
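
To compare a standard Stable Diffusion run against the custom one, it can help to derive one configuration from another rather than editing values by hand. The following is a minimal sketch using a hypothetical stand-in class, SketchRunConfig, that mirrors the fields listed above. It is not the repository's RunConfig, and the validation rules are assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace
from pathlib import Path
from typing import List


@dataclass
class SketchRunConfig:
    """Hypothetical stand-in mirroring the documented fields; not the repo's RunConfig."""
    prompt: List[str]
    guidance_scale: float = 7.5
    n_inference_steps: int = 50
    run_standard_sd: bool = False
    scale_factor: int = 20
    max_iter_to_alter: int = 25
    output_path: Path = Path("./outputs")
    viz_path: Path = Path("./attention_maps")

    def __post_init__(self) -> None:
        # Assumed sanity checks, for illustration only.
        if not self.prompt:
            raise ValueError("prompt must contain at least one string")
        if self.n_inference_steps <= 0:
            raise ValueError("n_inference_steps must be positive")


# Custom run, using the same values as the example above.
custom = SketchRunConfig(prompt=["a boy is feeding", "a boy is feeding a bird"])

# Baseline run: same settings, but standard Stable Diffusion and a separate folder.
baseline = replace(custom, run_standard_sd=True,
                   output_path=Path("./outputs/baseline"))

print(custom.run_standard_sd, custom.output_path)
print(baseline.run_standard_sd, baseline.output_path)
```

Writing the two runs to separate output folders keeps the generated images from overwriting each other.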

Run Interaction-aware Reasoning Module

python ./GPT4V/gpt4v.py

interaction_aware_reasoning(generated_image_path, api_key, input_annotations=None)

generated_image_path: The path of the generated image.
api_key: API key for accessing GPT-4V.
input_annotations: Input annotations for the GPT-4V query.

input_annotations should be a dictionary in the following format:

{
    "object": "motorcycle",
    "verb": "ride",
    "human_bboxes": "[169, 20, 349, 379]",
    "object_bboxes": "[60, 121, 465, 421]"
}
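
Because the example stores each bounding box as a string rather than a list, building the dictionary by hand is easy to get wrong. The helper below is a minimal sketch of one way to assemble it from numeric boxes. The function make_input_annotations is hypothetical and not part of this repository, and the four-coordinate check is an assumption based on the example values.

```python
import json
from typing import Dict, Sequence


def make_input_annotations(obj: str, verb: str,
                           human_bbox: Sequence[int],
                           object_bbox: Sequence[int]) -> Dict[str, str]:
    """Build a dictionary shaped like the example above (hypothetical helper)."""
    for name, box in (("human_bbox", human_bbox), ("object_bbox", object_bbox)):
        if len(box) != 4:
            raise ValueError(f"{name} must have exactly 4 coordinates")
    return {
        "object": obj,
        "verb": verb,
        # The example stores each box as a string, so serialize the list.
        "human_bboxes": json.dumps(list(human_bbox)),
        "object_bboxes": json.dumps(list(object_bbox)),
    }


annotations = make_input_annotations(
    "motorcycle", "ride", [169, 20, 349, 379], [60, 121, 465, 421]
)
print(annotations)
```

The resulting dictionary could then be passed as the input_annotations argument of interaction_aware_reasoning, assuming the function accepts the format shown in the example.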

Acknowledgment

This project builds upon and is deeply influenced by the following repositories:

We are grateful to these outstanding open-source projects for providing the groundwork that made this project possible.

