🎨 VCode: SVG as Symbolic Visual Representation


TL;DR: SVG code as a Visual Representation

Overview

See our demo video for fun!

VCode_demo_video.mp4

📣 News

  • [2025.12.20] 🌟 Added GPT-5.2 to our benchmark; it performs solidly, below Gemini-3-Pro but ahead of Claude-4.5-Sonnet.
  • [2025.11.21] 🔥 Added Gemini-3-Pro to our benchmark, showing excellent performance.
  • [2025.11.08] 🎥 Released our demo video featuring lots of fun memes and reaction images converted into SVGs.
  • [2025.11.08] 🚀 We now offer a free trial API on our 🤗 HuggingFace Space.
  • [2025.11.05] 🔥 We are honored to be featured as 🤗 HuggingFace Daily Paper #1.


🛠️ Installation

Environment

git clone -b main --single-branch https://github.com/CSU-JPG/VCode.git
cd VCode
conda create -n vcode python=3.10.2 -y
conda activate vcode
conda install pytorch=2.5.1 torchvision=0.20.1 torchaudio=2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install -r requirements.txt
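After installing, a quick sanity check can confirm that the core dependencies are importable. This helper is not part of VCode; the package names are taken from the conda install line above.

```python
import importlib.util

def check_env(packages=("torch", "torchvision", "torchaudio")):
    """Report which of the expected packages are importable in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

if __name__ == "__main__":
    for name, ok in check_env().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```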

🚀 Quick Start

🧩 VCode-suite

VCode-suite is a comprehensive toolkit that automates the full image-to-SVG-to-render workflow. It includes both integrated pipelines and independent modules for generation, rendering, and revision. Users can either run the end-to-end pipelines for batch processing, or execute individual scripts for customized control.

📁 vcode-suite/
├── filter.py
├── img2svg.py
├── img2svgthinking.py
├── img2svg-w-visual-tool.py
├── img2text2svg.py
├── pipeline.sh
├── revision_pipeline.sh
├── revision.py
└── svg_render_img.py

💡 Tip: The pipelines (pipeline.sh, revision_pipeline.sh) perform fully automated batch processing, while the Python scripts (img2svg.py, img2text2svg.py, revision.py, etc.) can be run independently to support flexible and modular experimentation within the VCode framework.

⚙️ Usage

1️⃣ Generate and render SVGs

pipeline.sh orchestrates the full image-to-SVG-to-render workflow. It can connect to different generation modules — img2svg, img2text2svg, or img2svgthinking — to convert images into SVGs, then filter and render them into pixel images.

chmod +x pipeline.sh
./pipeline.sh
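Conceptually, the image-to-SVG step of the pipeline iterates over an image folder and writes one .svg per input. The sketch below uses a stub generator in place of the model call, so the function names and filtering rule are illustrative assumptions, not the scripts' actual code.

```python
from pathlib import Path

def generate_svg(image_path: Path) -> str:
    """Stub for the model call (img2svg.py would query an API here)."""
    return (f'<svg xmlns="http://www.w3.org/2000/svg">'
            f'<!-- {image_path.name} --></svg>')

def run_pipeline(images_dir: str, svg_dir: str) -> list:
    """Image -> SVG step of the workflow: one .svg per input image."""
    out = Path(svg_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for img in sorted(Path(images_dir).iterdir()):
        if img.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue  # skip non-image files (an assumed filtering rule)
        svg_path = out / (img.stem + ".svg")
        svg_path.write_text(generate_svg(img))
        written.append(svg_path)
    return written
```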

2️⃣ Optimize generated SVGs

revision_pipeline.sh automates the revision and optimization process. It takes the previously generated SVGs (generated_svgs/) and rendered images (generated_imgs/), calls the API-based revision module, and outputs the optimized SVGs and renders to optimized_svgs/ and optimized_imgs/.

chmod +x revision_pipeline.sh
./revision_pipeline.sh

3️⃣ Run scripts independently

Both generation and revision scripts can be executed independently for flexible and customized workflows.

Each core generation script — img2svg.py, img2text2svg.py, img2svgthinking.py, and img2svg-w-visual-tool.py — can directly convert input images into SVG code. Similarly, revision.py can be run independently to optimize previously generated SVGs through visual feedback.


Run img2svg.py

python vcode-suite/img2svg.py \
/path/to/input_images \
./generated_svgs \
--model gpt-5 \
--base-url https://openrouter.ai/api/v1 \
--api-key <OPENROUTER_API_KEY> \
--max-tokens 16384
Argument           Type  Default                        Description
images_folder      str   -                              Path to the input folder containing image files.
svg_output_folder  str   -                              Directory to save the generated SVG files.
--model            str   gpt-5                          API model name used for conversion.
--base-url         str   https://openrouter.ai/api/v1   Base URL of the API endpoint.
--api-key          str   -                              API key for authentication.
--sleep            int   5                              Seconds to wait between consecutive API calls.
--max-tokens       int   16384                          Maximum number of tokens allowed in the model's response.
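Since --base-url defaults to an OpenAI-compatible endpoint, the request img2svg.py sends presumably follows the common chat-completions shape: the image is base64-encoded into a data URL inside the message content. The payload builder below is a sketch of that format, not the script's exact code.

```python
import base64

def build_request(image_bytes: bytes, model: str = "gpt-5",
                  max_tokens: int = 16384) -> dict:
    """Build a chat-completions payload asking the model to emit SVG code."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Reproduce this image as SVG code."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```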

Run revision.py

python vcode-suite/revision.py \
--svg-folder ./generated_svgs \
--original-folder ./input_images \
--rendered-folder ./generated_imgs \
--output-folder ./optimized_svgs \
--analysis-folder ./visual_analysis \
--base-url https://openrouter.ai/api/v1 \
--api-key <OPENROUTER_API_KEY> \
--model gpt-5 \
--max-tokens 16384
Argument           Type  Default                        Description
--svg-folder       str   -                              Root directory containing the SVG files to optimize.
--original-folder  str   -                              Directory of the original reference images.
--rendered-folder  str   -                              Directory of rendered images corresponding to the SVGs.
--output-folder    str   -                              Directory to save the optimized SVG files.
--analysis-folder  str   -                              Directory to save visual comparison and analysis .txt files.
--base-url         str   https://openrouter.ai/api/v1   Base URL of the API endpoint.
--api-key          str   -                              API key for authentication.
--model            str   gpt-5                          Model used for revision.
--max-tokens       int   16384                          Maximum number of tokens allowed in the model response.

💡 Tip: The revision.py script refines existing SVGs based on visual comparison feedback, while generation scripts (img2svg.py, img2text2svg.py, img2svgthinking.py, img2svg-w-visual-tool.py) create SVGs from input images_folder. You can flexibly mix and match these tools depending on your pipeline needs.
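Before revising, each SVG must be matched with its original and rendered image across the three input folders. A minimal sketch of that pairing step is below; the matching rule (shared filename stem) is an assumption based on the directory layouts shown in this README, not revision.py's actual logic.

```python
from pathlib import Path

def pair_inputs(svg_folder, original_folder, rendered_folder):
    """Match each SVG with its original and rendered image by filename stem."""
    originals = {p.stem: p for p in Path(original_folder).rglob("*") if p.is_file()}
    rendered = {p.stem: p for p in Path(rendered_folder).rglob("*") if p.is_file()}
    triples = []
    for svg in sorted(Path(svg_folder).rglob("*.svg")):
        # only revise SVGs that have both reference images available
        if svg.stem in originals and svg.stem in rendered:
            triples.append((svg, originals[svg.stem], rendered[svg.stem]))
    return triples
```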


🔮 Evaluation

⚙️ Usage

1️⃣ Generate rendered images for all three datasets

Use the VCode-suite pipeline (or standalone scripts) to render images for each dataset. Original images are already in data/:

  • MM-Vet: data/mm-vet/images
  • CV-Bench: data/cv-bench
  • MMMU: data/mmmu/mmmu_dev_processed_single_img_subset

Running your pipeline will produce, per dataset, a folder like:

generated_svgs/
generated_imgs/  ← used by the evaluators

2️⃣ Run each dataset’s evaluator

Each evaluator is a shell script under evaluation/…. They all follow the same usage:

chmod +x evaluation/mm-vet/mmvet_eval.sh
./evaluation/mm-vet/mmvet_eval.sh
chmod +x evaluation/cv-bench/cvbench_eval.sh
./evaluation/cv-bench/cvbench_eval.sh
chmod +x evaluation/mmmu/mmmu_eval.sh
./evaluation/mmmu/mmmu_eval.sh

These scripts will read your generated_imgs/ and compute scores.

💡 Reference: For directory organization and example script configuration, see example_results/ (it shows a working layout you can mirror).


3️⃣ Calculate each dataset’s metrics

Full Command with Options

python metrics.py \
--folder1 /path/to/reference_images \
--folder2 /path/to/model_outputs/gpt-4o \
--ckpt google/siglip2-so400m-patch14-384

Command Line Arguments

Argument   Required  Default                             Description
--folder1  ✅ Yes    -                                   Path to the reference images folder.
--folder2  ✅ Yes    -                                   Path to the model output folder (containing generated_imgs/ and generated_svgs/).
--ckpt     ❌ No     google/siglip2-so400m-patch14-384   SigLIP model checkpoint.
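Given the default --ckpt, the score presumably compares SigLIP embeddings of each reference/render pair, which is commonly done via cosine similarity. The final comparison step can be sketched in plain Python (embedding extraction with the transformers checkpoint is omitted; treating the metric as cosine similarity is an assumption about metrics.py's internals):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```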

Expected Directory Layout:

Reference Images Folder (--folder1)

Location: data/mm-vet/images (example path - can be customized)

folder1/
├── category1/
│   ├── image001.png
│   ├── image002.jpg
│   └── ...
├── category2/
│   ├── image003.png
│   └── ...
└── ...

Model Output Folder (--folder2)

Location: example_results/mm-vet/Gemini-2.5-Pro (example path - can be customized)

folder2/
├── generated_imgs/           # Generated/rendered images
│   ├── category1/
│   │   ├── image001.png
│   │   ├── image002.jpg
│   │   └── ...
│   ├── category2/
│   │   ├── image003.png
│   │   └── ...
│   └── ...
│
└── generated_svgs/           # SVG source files
   ├── category1/
   │   ├── image001.svg
   │   ├── image002.svg
   │   └── ...
   ├── category2/
   │   ├── image003.svg
   │   └── ...
   └── ...
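A quick way to catch layout mistakes before running metrics.py is to check that every rendered image under generated_imgs/ has a same-stem SVG under generated_svgs/. This validator is an illustration only, not part of the VCode suite.

```python
from pathlib import Path

def validate_folder2(folder2):
    """Return the expected-but-missing SVG paths for a --folder2 layout."""
    imgs = Path(folder2, "generated_imgs")
    svgs = Path(folder2, "generated_svgs")
    missing = []
    for img in imgs.rglob("*"):
        if not img.is_file():
            continue
        # mirror the category subfolder structure, swapping the extension
        svg = svgs / img.relative_to(imgs).with_suffix(".svg")
        if not svg.is_file():
            missing.append(str(svg))
    return missing
```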

📌 Citation

If you find our work useful, please cite:

@article{vcode,
  title={VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation},
  author={Lin, Kevin Qinghong and Zheng, Yuhao and Ran, Hangyu and Zhu, Dantong and Mao, Dongxing and Li, Linjie and Torr, Philip and Wang, Alex Jinpeng},
  journal={arXiv preprint arXiv:2511.02778},
  year={2025}
}
