
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis (COLM'25)


Quick Start

Setting Up the Environment

  1. Create and activate a Conda environment:

    conda create -y -n CodeARC python=3.10.12
    conda activate CodeARC
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set API keys: export valid API keys for the services you plan to use:

    export OPENAI_API_KEY=<your_openai_api_key>
    export ANTHROPIC_API_KEY=<your_anthropic_api_key>
    export TOGETHER_API_KEY=<your_together_api_key>

Running Main Evaluation

python3 run.py --model_name openai/gpt-4o-mini --total_idx 20

We support OpenAI models (e.g., openai/gpt-4o), Anthropic models (e.g., anthropic/claude-3-7-sonnet-20250219), and models served by Together AI (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo). For testing purposes, you can pass --total_idx 20 to limit evaluation to 20 problems instead of the full dataset (1114 problems). See run.py for additional configuration options.
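For example, the same flags work with the other providers listed above:

python3 run.py --model_name anthropic/claude-3-7-sonnet-20250219 --total_idx 20
python3 run.py --model_name meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo --total_idx 20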

To summarize results:

python3 src/compute_metrics.py

HuggingFace Datasets

The CodeARC datasets are hosted on HuggingFace as anjiangwei/CodeARC-Problems and anjiangwei/CodeARC-Invocations.

Setting up a HuggingFace Account

  1. Obtain an access token from your HuggingFace account settings (https://huggingface.co/settings/tokens).

  2. Login using the token:

    Option A: Use the command line:

    huggingface-cli login
    huggingface-cli whoami

    Option B: Set the token as an environment variable:

    export HF_TOKEN=<your_huggingface_token>
    
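    Option C: Log in from Python (a minimal sketch, assuming the huggingface_hub package is installed):

    from huggingface_hub import login

    # Authenticate this environment with your HuggingFace access token
    login(token="<your_huggingface_token>")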

Accessing Datasets via the HuggingFace datasets Library

You can directly load the datasets using the HuggingFace datasets library:

from datasets import load_dataset

# Define dataset paths
hf_problems_path = "anjiangwei/CodeARC-Problems"
hf_invocations_path = "anjiangwei/CodeARC-Invocations"

# Load datasets
problems_dataset = load_dataset(hf_problems_path)
invocations_dataset = load_dataset(hf_invocations_path)

# Example: Access the first training sample
print(problems_dataset["train"][0])
print(invocations_dataset["train"][0])
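
To sanity check what was loaded, here is a minimal sketch that uses only generic datasets APIs (the actual field names are defined by the dataset cards, not this example):

# Inspect splits, column names, and sizes
print(problems_dataset)
print(problems_dataset["train"].column_names)
print(len(invocations_dataset["train"]))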

Citation

If you find our work useful, please cite our paper:

@inproceedings{wei2025codearc,
  title={Code{ARC}: Benchmarking Reasoning Capabilities of {LLM} Agents for Inductive Program Synthesis},
  author={Anjiang Wei and Tarun Suresh and Jiannan Cao and Naveen Kannan and Yuheng Wu and Kai Yan and Thiago S. F. X. Teixeira and Ke Wang and Alex Aiken},
  booktitle={Second Conference on Language Modeling},
  year={2025},
  url={https://openreview.net/forum?id=Q5pVZCrrKr}
}

License

This project is licensed under the Apache 2.0 License.
