Jian-Yu Jiang-Lin1,2* Kang-Yang Huang2* Ling Lo1 Yi-Ning Huang1
Terence Lin1 Jhih-Ciang Wu2 Hong-Han Shuai1 Wen-Huang Cheng2
*Indicates Equal Contribution
1 National Yang Ming Chiao Tung University 2 National Taiwan University
Overview: ReCorD is a training-free approach that integrates latent diffusion models (LDMs) with vision-language models (VLMs) to generate images that accurately depict human-object interactions (HOIs).
conda create -n ReCorD python=3.10.13 -y
conda activate ReCorD
pip install -r requirements.txt
Run main.py to generate the example images defined in the script. To customize image generation, adjust the prompts and other configuration options in main.py.
python main.py
Here’s a brief overview of key configuration options:
- prompt: Text prompts for generating images.
- guidance_scale: Controls the specificity of the image to the prompt (higher values generate more specific images).
- n_inference_steps: Number of steps for the model's inference, affecting the clarity and detail of the output.
- run_standard_sd: Boolean to choose between standard and custom Stable Diffusion runs.
- scale_factor: Affects the resolution or quality of generated images.
- max_iter_to_alter: Maximum iterations to modify attention for specified indices.
- output_path: File path where generated images will be stored.
- viz_path: Path for saving attention maps.
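For context on guidance_scale: Stable Diffusion pipelines commonly apply classifier-free guidance, which pushes the text-conditioned noise prediction away from the unconditional one by the given scale. The sketch below illustrates that general formula only. It assumes ReCorD follows this standard formulation, and the function name `apply_guidance` is hypothetical and not part of this repository.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine two noise predictions with classifier-free guidance.

    Hypothetical helper for illustration; plain lists stand in for tensors.
    """
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


# With a scale of 1.0 the result equals the text-conditioned prediction;
# larger scales move further in the direction suggested by the prompt.
guided = apply_guidance([0.0, 1.0], [1.0, 1.0], guidance_scale=7.5)
```

A larger guidance_scale amplifies the difference between the two predictions, which is why higher values follow the prompt more closely.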
You can modify these settings in the main.py script to fit your specific requirements:
config = RunConfig(
prompt=['a boy is feeding', 'a boy is feeding a bird'],
guidance_scale=7.5,
n_inference_steps=50,
run_standard_sd=False,
scale_factor=20,
max_iter_to_alter=25,
output_path=Path('./outputs'),
viz_path=Path('./attention_maps')
)
Run Interaction-aware Reasoning Module
python ./GPT4V/gpt4v.py
- generated_image_path: The path of the generated image.
- api_key: API key for accessing GPT4V.
- input_annotations: Input annotations for the GPT4V query.
input_annotations should be a dictionary with the following format:
{
"object": "motorcycle",
"verb": "ride",
"human_bboxes": "[169, 20, 349, 379]",
"object_bboxes": "[60, 121, 465, 421]"
}
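Because the bounding boxes in this format are stored as strings rather than lists, it can help to check an annotation dictionary before passing it to the script. The following is a minimal sketch, not part of the repository: the helpers `parse_bbox` and `check_annotations` are hypothetical names, and the check assumes each box holds exactly four numeric coordinates.

```python
import json

# Keys taken from the example format above.
REQUIRED_KEYS = ("object", "verb", "human_bboxes", "object_bboxes")


def parse_bbox(text):
    """Parse a box stored as a string such as "[169, 20, 349, 379]"."""
    box = json.loads(text)
    if not (isinstance(box, list) and len(box) == 4):
        raise ValueError(f"expected four coordinates, got {text!r}")
    return [int(v) for v in box]


def check_annotations(annotations):
    """Verify the required keys exist and return the parsed boxes."""
    missing = [key for key in REQUIRED_KEYS if key not in annotations]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    return {
        "human": parse_bbox(annotations["human_bboxes"]),
        "object": parse_bbox(annotations["object_bboxes"]),
    }


input_annotations = {
    "object": "motorcycle",
    "verb": "ride",
    "human_bboxes": "[169, 20, 349, 379]",
    "object_bboxes": "[60, 121, 465, 421]",
}
boxes = check_annotations(input_annotations)
```

The original dictionary is left unchanged, so it can still be supplied in the string format shown above.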
This project builds upon and is deeply influenced by the following repositories:
We are grateful to these outstanding open-source projects for providing the groundwork that made this project possible.
