This step is for the authors; you can skip it.
We use the Google docstring format for our docstrings and the pre-commit library to check our code. To install pre-commit, run the following commands:

```bash
conda install pre-commit  # or pip install pre-commit
pre-commit install
```

The pre-commit hooks will run automatically when you try to commit changes to the repository.
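For reference, a Google-style docstring looks like this (a generic illustration; the function is hypothetical and not taken from the repo):

```python
def normalize_box(box, width, height):
    """Normalize a bounding box to relative image coordinates.

    Args:
        box (tuple[float, float, float, float]): Box as (x1, y1, x2, y2) in pixels.
        width (int): Image width in pixels.
        height (int): Image height in pixels.

    Returns:
        tuple[float, float, float, float]: Box with each coordinate scaled to [0, 1].
    """
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)
```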
These tasks require a GUI, so it is recommended that you run them on a Mac or Windows machine.
```bash
git clone https://github.com/Mars-tin/pixrefer.git
cd pixrefer
pip install -e .
```

If you are using a Mac, run the following code to install PyAudio:

```bash
brew install portaudio
pip install pyaudio
```
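To verify that PortAudio and PyAudio installed correctly, you can list the audio devices PyAudio can see (a quick sanity check, not part of the repo; your microphone should appear in the output):

```python
import pyaudio

pa = pyaudio.PyAudio()
# Print every device PortAudio can see; your microphone should be listed
# with at least one input channel.
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    print(f"{i}: {info['name']} (inputs: {info['maxInputChannels']})")
pa.terminate()
```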
Run the following code to install google-cloud-speech:

```bash
pip install google-cloud-speech
```

Download the data:

```bash
git lfs install
git clone https://huggingface.co/datasets/Seed42Lab/Pixrefer_data
```

If you have already downloaded the data above and want to download the pragmatics preference data:
```bash
cd Pixrefer_data
mkdir pragmatic
git worktree add pragmatic pragmatics_preference
cd -
```

If you want to renew the data:
```bash
cd Pixrefer_data
git pull origin main
cd -
```

Alternatively, clone the pragmatics_preference branch directly:

```bash
git clone -b pragmatics_preference https://huggingface.co/datasets/Seed42Lab/Pixrefer_data
```

If you want to renew the data:
```bash
cd Pixrefer_data/pragmatic
git pull origin pragmatics_preference
cd -
```
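To sanity-check the download, you can try loading one of the annotation files (the path below is one of the files referenced later in this README; the exact JSON structure is not assumed):

```python
import json
from pathlib import Path

# Path to one of the annotation files referenced later in this README.
path = Path("Pixrefer_data/data/rel_user_input/llava_7b_concise_results.json")

with path.open() as f:
    data = json.load(f)

# len() works whether the top level is a list of records or a dict.
print(f"Loaded {len(data)} top-level entries from {path.name}")
```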
Create an empty .env file:

```bash
touch .env
```

And add the content below:

```
GOOGLE_API_KEY={YOUR_API_KEY}
```

Replace {YOUR_API_KEY} with the real API key provided.
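The interface reads this key from the environment. As a rough sketch of how a .env file is typically loaded in Python (this assumes the python-dotenv package; the repo may do it differently), you can check that your key is picked up:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv (assumption, not necessarily what the repo uses)

load_dotenv()  # reads key=value pairs from .env into os.environ
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_API_KEY not found -- check your .env file")
```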
```bash
bash pixrefer/interface/run_rel.sh
```

Please note that you need to change the JSON file path in run_rel.sh first.
Replace the following path with your given data path. For example, if you need to annotate llava_7b_concise_results.json:

```bash
--json_path Pixrefer_data/data/rel_user_input/llava_7b_concise_results.json  # replace the example gpt_4o file path here
```

Also replace the output_dir when you annotate another file, so that your results will not be overwritten:

```bash
--output_dir output/user_rel/regular  # replace the example concise dir if you are annotating the regular data
```

For each image, you are required to click where you think the unique object in the red box (which you cannot see) is located.
If you find multiple objects that match the description, click `Multiple Match` and confirm your guess.
If you cannot find such an object in the image, click `Cannot Tell Where The Object Is` and confirm your guess.
You can always press Enter (Return) on your keyboard to quickly confirm and move to the next image.
```bash
bash pixrefer/interface/run_reg.sh
```

For each image, you are required to give at least one description of the object in the red box, so that it can be uniquely identified by another person.

Write a text description:
After you finish, please click `Save Description` to save your result, and you will see a green 'Text ✓'.

Record an audio description:
Please note that you need to set the Google API key in the .env file to proceed.
Click `Audio` to switch to the audio mode, and click `Start Recording` to record. When you finish, click `Stop Recording`. You can edit the transcribed text, and click `Save Description` to save the edited result.
You can always press Enter (Return) on your keyboard to quickly confirm and move to the next image.
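For background, the audio mode relies on google-cloud-speech (installed earlier) to turn your recording into text. A minimal standalone transcription call with that library looks roughly like this (the file name, encoding, and sample rate are placeholder assumptions, and your Google credentials must already be configured; the interface handles all of this for you):

```python
from google.cloud import speech

client = speech.SpeechClient()

# Placeholder file name; the interface records the audio for you.
with open("recording.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,  # must match how the audio was recorded
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```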
```bash
bash pixrefer/interface/run_pragmatic.sh
```

Please note that you need to change the JSON file path in run_pragmatic.sh first.
Replace the following path with your given data path. For example, if you need to annotate user_6_allocation.json:

```bash
--json_path Pixrefer_data/pragmatic/user_input/user_6_allocation.json  # replace the example user_1 file path here
```

For this task, select one of the following options to describe the object pointed to by the arrow, compared to the other object in the image.
Please note:
- Follow your first instinct.
- The options appear in a different order for each image.
- The maximum number of images that can be annotated at a time is 25. Once this limit is reached, please take a break for at least 10 minutes before continuing.




