This repo implements the benchmarks and pipeline logic for this paper.
Two benchmarks are available (support for ALFRED is currently being added). They share the following directory structure:
    data/
      dataset_root/
        subtask_name/
          observations/    # contains .png files
          domain.pddl
          instruction.txt
          problem.pddl
          plan.txt
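For illustration, here is a minimal sketch of how one subtask could be loaded from this layout. The `load_subtask` helper and the `data/dataset_root` path are assumptions based on the tree above, not part of the repo:

```python
from pathlib import Path

def load_subtask(subtask_dir: Path) -> dict:
    """Read the files of a single subtask into a dict (hypothetical helper)."""
    return {
        "domain": (subtask_dir / "domain.pddl").read_text(),
        "instruction": (subtask_dir / "instruction.txt").read_text(),
        "problem": (subtask_dir / "problem.pddl").read_text(),
        "plan": (subtask_dir / "plan.txt").read_text(),
        # observations/ holds the .png observation images for this subtask
        "observations": sorted((subtask_dir / "observations").glob("*.png")),
    }

# Iterate over every subtask directory under a dataset root
for subtask_dir in sorted(Path("data/dataset_root").iterdir()):
    if subtask_dir.is_dir():
        task = load_subtask(subtask_dir)
```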
We use scripts/main.py as the entry point to choose a dataset and a pipeline to evaluate.
Command-line arguments include:
- `--dataset_dir`
- `--dataset`
- `--model`
- `--pipeline`
Example:

    python scripts/main.py --dataset_dir ... --dataset ... --model ... --pipeline ...
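A minimal sketch of how these arguments might be parsed in scripts/main.py; the argparse usage, help strings, and `required` settings here are assumptions and may differ from the actual code:

```python
import argparse

parser = argparse.ArgumentParser(description="Evaluate a pipeline on a benchmark dataset.")
parser.add_argument("--dataset_dir", required=True, help="path to the dataset root, e.g. data/")
parser.add_argument("--dataset", required=True, help="name of the benchmark dataset")
parser.add_argument("--model", required=True, help="model used by the pipeline")
parser.add_argument("--pipeline", required=True, help="pipeline to evaluate")
args = parser.parse_args()
```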
- Output directory structure:

      results/
        <dataset>_<pipeline>_<model>/
          subtask_name/
            subtask_name.pddl
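For reference, a sketch of how the output path could be assembled from the arguments above; all variable values below are hypothetical placeholders:

```python
from pathlib import Path

# Placeholder values; in practice these come from the CLI arguments
dataset, pipeline, model = "my_dataset", "my_pipeline", "my_model"
subtask_name = "subtask_name"
generated_pddl = "(define (problem example) ...)"  # output produced by the pipeline

# results/<dataset>_<pipeline>_<model>/subtask_name/subtask_name.pddl
out_dir = Path("results") / f"{dataset}_{pipeline}_{model}" / subtask_name
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / f"{subtask_name}.pddl").write_text(generated_pddl)
```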