A benchmark for LLMs and related tools that reason about and answer questions about software logic. We present results comparing the performance of LLM-only answers with those of LLM + CodeLogician (ImandraX).
Given a Python function and questions about its behavior (e.g., "How many distinct output scenarios exist?"), we compare:
- LLM-only: Answers generated by prompting LLMs with only the source code
- LLM + Automated Reasoning: Answers generated by CodeLogician, a neurosymbolic agentic governance framework
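For illustration, here is a hypothetical example of the kind of Python function and question a benchmark item contains. The function `shipping_fee` and the question text are invented for this sketch; they are not taken from the `examples/` directory.

```python
# Hypothetical benchmark-style model (not an actual examples/ entry).

def shipping_fee(weight_kg: float, is_member: bool) -> float:
    """Compute a shipping fee with a heavy-item surcharge and a member discount."""
    fee = 5.0
    if weight_kg > 20.0:
        fee += 10.0   # heavy-item surcharge
    if is_member:
        fee *= 0.5    # 50% member discount
    return fee

# A question in the style of questions.yaml might be:
#   "How many distinct output scenarios (branch combinations) does shipping_fee have?"
# Here there are 4: (heavy? yes/no) x (member? yes/no), with fees 5.0, 15.0, 2.5, 7.5.
```

The repository is organized as follows: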
├── examples/                    # 50 benchmark examples
│   └── <example_name>/
│       ├── model.py             # Python source code
│       ├── model.iml            # IML specification + analysis (region decomp/VG)
│       ├── questions.yaml       # 3 questions about the code
│       ├── answer_CL.yaml       # Answers from CodeLogician
│       ├── answer_LLM.yaml      # Answers from pure LLM
│       ├── metrics.yaml         # Evaluation scores
│       └── cl_analysis/         # Raw CodeLogician outputs (JSON)
├── analysis/                    # Aggregated results and plots
├── generate_CL_answers/         # Prompts for generating CodeLogician answers (see README)
├── generate_llm_answers.py      # Generate pure LLM responses
├── generate_metrics.py          # Evaluate and compare answers
└── aggregate_metrics.py         # Aggregate and visualize results
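Below is a minimal sketch of how the evaluation flow fits together, assuming the directory layout above. The helper names (`load_example`, `score_example`, `aggregate`) are illustrative only and are not the actual APIs of `generate_llm_answers.py`, `generate_metrics.py`, or `aggregate_metrics.py`.

```python
# Sketch of the per-example read / score / aggregate loop (illustrative, not the real scripts).
import pathlib
import yaml  # PyYAML

EXAMPLES_DIR = pathlib.Path("examples")

def load_example(example_dir: pathlib.Path) -> dict:
    """Read the source, questions, and both answer sets for one benchmark example."""
    return {
        "source": (example_dir / "model.py").read_text(),
        "questions": yaml.safe_load((example_dir / "questions.yaml").read_text()),
        "answer_cl": yaml.safe_load((example_dir / "answer_CL.yaml").read_text()),
        "answer_llm": yaml.safe_load((example_dir / "answer_LLM.yaml").read_text()),
    }

def score_example(example: dict) -> dict:
    """Placeholder for the per-question comparison done in generate_metrics.py."""
    # In the real pipeline each answer is judged against the question (and the IML
    # analysis); here we only report how many questions each answer set covers.
    return {
        "n_questions": len(example["questions"]),
        "cl_answered": len(example["answer_cl"]),
        "llm_answered": len(example["answer_llm"]),
    }

def aggregate() -> list[dict]:
    """Rough analogue of aggregate_metrics.py: collect per-example scores."""
    results = []
    for example_dir in sorted(EXAMPLES_DIR.iterdir()):
        if example_dir.is_dir():
            results.append({"example": example_dir.name, **score_example(load_example(example_dir))})
    return results

if __name__ == "__main__":
    for row in aggregate():
        print(row)
```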