
Tables & Resources

This page contains statistical tables and resources from our comprehensive survey on Issue Resolution in Software Engineering.


Evaluation & Training Datasets

A comprehensive survey and statistical overview of issue resolution datasets. We categorize these datasets based on programming language, modality support, source repositories, data scale (Amount), and the availability of reproducible execution environments.

| Dataset | Language | Multimodal | Repos | Amount | Environment | Link |
| --- | --- | --- | --- | --- | --- | --- |
| Single-PL Datasets | | | | | | |
| SWE-Fixer | Python | | 856 | 115,406 | | GitHub, HuggingFace, HuggingFace |
| SWE-smith | Python | | 128 | 50k | | GitHub, HuggingFace |
| SWE-Lego | Python | | 3,251 | 32,119 | | GitHub, HuggingFace |
| SWE-rebench | Python | | 3,468 | 21,336 | | GitHub, HuggingFace |
| SWE-bench-train | Python | | 37 | 19k | | GitHub, HuggingFace |
| SWE-Flow | Python | | 74 | 18,081 | | GitHub |
| Skywork-SWE | Python | | 2,531 | 10,169 | | - |
| R2E-Gym | Python | | 10 | 8,135 | | GitHub, HuggingFace |
| RepoForge | Python | | - | 7.3k | | - |
| SWE-bench-extra | Python | | 2k | 6.38k | | HuggingFace |
| SWE-Gym | Python | | 11 | 2,438 | | GitHub, HuggingFace |
| SWE-bench | Python | | 12 | 2,294 | | GitHub, HuggingFace |
| SWE-bench-java | Java | | 19 | 1,797 | | GitHub, HuggingFace |
| FEA-bench | Python | | 83 | 1,401 | | GitHub, HuggingFace |
| SWE-bench-Live | Python | | 164 | 1,565 | | GitHub, HuggingFace |
| Loc-Bench | Python | | - | 560 | | GitHub, HuggingFace |
| SWE-bench Verified | Python | | - | 500 | | GitHub, HuggingFace |
| SWE-bench Lite | Python | | 12 | 300 | | GitHub, HuggingFace |
| SWE-MERA | Python | | 200 | 300 | | GitHub, HuggingFace |
| SWE-Bench-CL | Python | | 8 | 273 | | GitHub |
| SWE-Sharp-Bench | C# | | 17 | 150 | | GitHub, HuggingFace |
| SWE-Perf | Python | | 12 | 140 | | GitHub, HuggingFace |
| Visual SWE-bench | Python | | 11 | 133 | | GitHub, HuggingFace |
| SWE-EVO | Python | | 7 | 48 | | GitHub |
| Multi-PL Datasets | | | | | | |
| SWE-Mirror | Python, Rust, Go | | 40 | 60k | | - |
| Multi-SWE-bench | Java, JS, TS, Go, Rust, C, C++ | | 76 | 4,723 | | GitHub, HuggingFace |
| Swing-Bench | Python, Go, C++, Rust | | 400 | 2,300 | | - |
| SWE-PolyBench | Python, Java, JS, TS | | 21 | 2,110 | | GitHub, HuggingFace, HuggingFace |
| SWE-Compass | Python, JS, TS, Java, C, C++, Go, Rust, Kotlin, C# | | - | 2,000 | | GitHub, HuggingFace |
| SWE-Bench Pro | Python, Go, TS | | 41 | 1,865 | | GitHub, HuggingFace |
| SWE-bench++ | Python, Go, TS, JS, Ruby, PHP, Java, Rust, C++, C#, C | | 3,971 | 1,782 | | GitHub, HuggingFace |
| SWE-Lancer | JS, TS | | - | 1,488 | | GitHub |
| OmniGIRL | Python, TS, Java, JS | | 15 | 959 | | GitHub, HuggingFace |
| SWE-bench Multimodal | JS, TS, HTML, CSS | | 17 | 619 | | GitHub, HuggingFace |
| SWE-fficiency | Python, Cython | | 9 | 498 | | GitHub |
| SWE-Factory | Python, Java, JS, TS | | 12 | 430 | | GitHub, HuggingFace |
| SWE-bench-Live-MultiLang & Windows | Python, JS, TS, C, C++, C#, Java, Go, Rust | | 238 | 418 | | GitHub, HuggingFace, HuggingFace |
| SWE-bench Multilingual | C, C++, Go, Java, JS, TS, Rust, Python, Ruby, PHP | | 42 | 300 | | GitHub, HuggingFace |
| SWE-InfraBench | Python, TS | | - | 100 | | - |
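
The Amount column mixes exact counts (such as 115,406) with rounded shorthand (such as 50k or 6.38k), which makes the rows awkward to compare programmatically. The following is a minimal sketch of one way to normalize these cells to integers. The function name `parse_amount` and the sample dictionary are illustrative assumptions, not part of the survey; the sample values are copied from the table above.

```python
import re
from typing import Optional


def parse_amount(text: str) -> Optional[int]:
    """Convert an Amount cell such as '115,406', '50k', or '6.38k' to an int.

    Returns None for a missing value ('-' or an empty cell).
    """
    cleaned = text.strip().replace(",", "").lower()
    if cleaned in {"", "-"}:
        return None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(k?)", cleaned)
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    number, suffix = match.groups()
    # 'k' denotes thousands; values without a suffix are taken literally.
    return round(float(number) * (1000 if suffix == "k" else 1))


# A few rows copied from the table (hypothetical in-memory representation).
sample = {
    "SWE-Fixer": "115,406",
    "SWE-smith": "50k",
    "SWE-bench-extra": "6.38k",
    "SWE-bench": "2,294",
    "SWE-Mirror": "60k",
}

# Sort dataset names by normalized size, largest first.
ranked = sorted(sample, key=lambda name: parse_amount(sample[name]), reverse=True)
print(ranked)
```

Values written with a "k" suffix are rounded in the source, so the normalized integers are approximate for those rows.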

Training Trajectory Datasets

A survey of trajectory datasets used for agent training or analysis. We list the programming language, number of source repositories, and total trajectories for each dataset.

| Dataset | Language | Repos | Amount | Link |
| --- | --- | --- | --- | --- |
| SWE-Fixer | Python | 856 | 69,752 | GitHub, HuggingFace |
| SWE-rebench | Python | 1,823 | 67,074 | HuggingFace |
| R2E-Gym | Python | 10 | 3,321 | GitHub, HuggingFace |
| SWE-Synth | Python | 11 | 3,018 | GitHub, HuggingFace |
| SWE-Factory | Python | 10 | 2,809 | GitHub, HuggingFace |
| SWE-Gym | Python | 11 | 491 | GitHub, HuggingFace |
| SWE-Lego | Python | 3,251 | 14.6k | GitHub |

SFT-based Methods

Overview of SFT-based methods for issue resolution. This table categorizes models by their base architecture and training scaffold, sorted by performance (Res. %) in descending order.

| Model Name | Base Model | Size | Arch. | Training Scaffold | Res. (%) | Code | Data | Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SWE-rebench-openhands-Qwen3-235B-A22B | Qwen3-235B-A22B | 235B-A22B | MoE | OpenHands | 59.9 | - | HuggingFace | HuggingFace |
| SWE-Lego-Qwen3-32B | Qwen3-32B | 32B | Dense | OpenHands | 57.6 | GitHub | HuggingFace | HuggingFace |
| SWE-rebench-openhands-Qwen3-30B-A3B | Qwen3-30B-A3B | 30B-A3B | MoE | OpenHands | 49.7 | - | HuggingFace | HuggingFace |
| Devstral | Mistral Small 3 | 22B | Dense | OpenHands | 46.8 | - | Website | HuggingFace |
| Co-PatcheR | Qwen2.5-Coder-14B | 3×14B | Dense | PatchPilot-mini | 46.0 | GitHub | - | HuggingFace |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | Agentless | 45.0 | GitHub | HuggingFace | HuggingFace |
| SWE-Lego-Qwen3-8B | Qwen3-8B | 8B | Dense | OpenHands | 44.4 | GitHub | HuggingFace | HuggingFace |
| Lingma SWE-GPT | Qwen2.5-72B-Instruct | 72B | Dense | SWESynInfer | 30.2 | GitHub | - | - |
| SWE-Gym-Qwen-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands, MoatlessTools | 20.6 | GitHub | - | HuggingFace |
| Lingma SWE-GPT | Qwen2.5-Coder-7B | 7B | Dense | SWESynInfer | 18.2 | GitHub | - | - |
| SWE-Gym-Qwen-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands, MoatlessTools | 16.4 | GitHub | - | HuggingFace |
| SWE-Gym-Qwen-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands, MoatlessTools | 10.6 | GitHub | - | HuggingFace |

RL-based Methods

Overview of RL-trained models for issue resolution, grouped by parameter size. The table lists each model's base model, the training scaffold used for rollouts, the type of reward signal (Outcome vs. Process), and the performance (Res. %) on issue resolution benchmarks.

| Model Name | Base Model | Size | Arch. | Train. Scaffold | Reward | Res. (%) | Code | Data | Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 560B Models (MoE) | | | | | | | | | |
| LongCat-Flash-Think | LongCatFlash-Base | 560B-A27B | MoE | R2E-Gym | Outcome | 60.4 | GitHub | - | HuggingFace |
| 72B Models | | | | | | | | | |
| Kimi-Dev | Qwen2.5-72B-Base | 72B | Dense | BugFixer + TestWriter | Outcome | 60.4 | GitHub | - | HuggingFace |
| SWE-RL | Llama-3.3-70B-Instruct | 70B | Dense | Agentless-mini | Outcome | 41.0 | GitHub | - | - |
| Multi-turn RL (Nebius) | Qwen2.5-72B-Instruct | 72B | Dense | SWE-agent | Outcome | 39.0 | - | - | - |
| Agent-RLVR-RM-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 27.8 | - | - | - |
| Agent-RLVR-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 22.4 | - | - | - |
| 32B Models | | | | | | | | | |
| OpenHands Critic | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | - | 66.4 | GitHub | - | HuggingFace |
| KAT-Dev-32B | Qwen3-32B | 32B | Dense | - | - | 62.4 | - | - | HuggingFace |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | - | Outcome | 60.2 | GitHub | HuggingFace | HuggingFace |
| FoldAgent | Seed-OSS-36B-Instruct | 36B | Dense | FoldAgent | Process | 58.0 | GitHub | Website | - |
| SeamlessFlow-32B | Qwen3-32B | 32B | Dense | SWE-agent | Outcome | 45.8 | GitHub | - | - |
| DeepSWE | Qwen3-32B | 32B | Dense | R2E-Gym | Outcome | 42.2 | GitHub | HuggingFace | HuggingFace |
| SA-SWE-32B | - | 32B | Dense | SkyRL-Agent | - | 39.4 | - | - | - |
| OpenHands LM v0.1 | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | - | 37.2 | GitHub | - | HuggingFace |
| SWE-Dev-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands | Outcome | 36.6 | GitHub | - | HuggingFace |
| Satori-SWE | Qwen2.5-Coder-32B | 32B | Dense | Retriever + Code editor | Outcome | 35.8 | GitHub | HuggingFace | HuggingFace |
| SoRFT-32B | Qwen2.5-Coder-32B | 32B | Dense | Agentless | Outcome | 30.8 | - | - | - |
| Agent-RLVR-32B | Qwen2.5-Coder-32B | 32B | Dense | Localization + Repair | Outcome | 21.6 | - | - | - |
| 14B Models | | | | | | | | | |
| Agent-RLVR-14B | Qwen2.5-Coder-14B | 14B | Dense | Localization + Repair | Outcome | 18.0 | - | - | - |
| SEAlign-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands | Process | 17.7 | - | - | - |
| 7-8B Models | | | | | | | | | |
| SeamlessFlow-8B | Qwen3-8B | 8B | Dense | SWE-agent | Outcome | 27.4 | GitHub | - | - |
| SWE-Dev-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Outcome | 23.4 | GitHub | - | HuggingFace |
| SoRFT-7B | Qwen2.5-Coder-7B | 7B | Dense | Agentless | Outcome | 21.4 | - | - | - |
| SWE-Dev-8B | Llama-3.1-8B | 8B | Dense | OpenHands | Outcome | 18.0 | GitHub | - | HuggingFace |
| SEAlign-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Process | 15.0 | - | - | - |
| SWE-Dev-9B | GLM-4-9B | 9B | Dense | OpenHands | Outcome | 13.6 | GitHub | - | HuggingFace |
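
Because the RL table is grouped by parameter size, a common follow-up question is which entry leads each group. The sketch below shows one way to compute that from (model, group, Res.) tuples. The rows are a small subset copied from the table above; the tuple layout and the helper name `best_per_group` are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# (model name, size group, Res. %) for a subset of the rows in the table.
ROWS = [
    ("LongCat-Flash-Think", "560B Models (MoE)", 60.4),
    ("Kimi-Dev", "72B Models", 60.4),
    ("SWE-RL", "72B Models", 41.0),
    ("OpenHands Critic", "32B Models", 66.4),
    ("KAT-Dev-32B", "32B Models", 62.4),
    ("Agent-RLVR-14B", "14B Models", 18.0),
    ("SEAlign-14B", "14B Models", 17.7),
    ("SeamlessFlow-8B", "7-8B Models", 27.4),
    ("SWE-Dev-7B", "7-8B Models", 23.4),
]


def best_per_group(
    rows: Iterable[Tuple[str, str, float]],
) -> Dict[str, Tuple[str, float]]:
    """Return the highest-scoring (model, score) pair for each size group."""
    best: Dict[str, Tuple[str, float]] = {}
    for name, group, score in rows:
        # Keep the first row seen on ties, matching the table's own ordering.
        if group not in best or score > best[group][1]:
            best[group] = (name, score)
    return best


for group, (name, score) in best_per_group(ROWS).items():
    print(f"{group}: {name} ({score})")
```

The reported numbers come from different training and evaluation scaffolds, so a per-group maximum is a convenience for reading the table rather than a controlled comparison.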

General Foundation Models

Overview of general foundation models evaluated on issue resolution. The table details the specific inference scaffolds (e.g., OpenHands, Agentless) employed during the evaluation process to achieve the reported results.

| Model Name | Size | Arch. | Inf. Scaffold | Reward | Res. (%) | Code | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MiMo-V2-Flash | 309B-A15B | MoE | Agentless | Outcome | 73.4 | GitHub | HuggingFace |
| KAT-Coder | - | - | Claude Code | Outcome | 73.4 | - | Website |
| DeepSeek V3.2 | 671B-A37B | MoE | Claude Code, RooCode | - | 73.1 | GitHub | HuggingFace |
| Kimi-K2-Instruct | 1T | MoE | Agentless | Outcome | 71.6 | - | HuggingFace |
| Qwen3-Coder | 480B-A35B | MoE | OpenHands | Outcome | 69.6 | GitHub | HuggingFace |
| GLM-4.6 | 355B-A32B | MoE | OpenHands | Outcome | 68.0 | - | HuggingFace |
| gpt-oss-120b | 116.8B-A5.1B | MoE | Internal tool | Outcome | 62.0 | GitHub | HuggingFace |
| MiniMax M2 | 230B-A10B | MoE | R2E-Gym | Outcome | 61.0 | GitHub | HuggingFace |
| gpt-oss-20b | 20.9B-A3.6B | MoE | Internal tool | Outcome | 60.0 | GitHub | HuggingFace |
| GLM-4.5-Air | 106B-A12B | MoE | OpenHands | Outcome | 57.6 | - | - |
| MiniMax M1-80k | 456B-A45.9B | MoE | Agentless | Outcome | 56.0 | GitHub | Website |
| MiniMax M1-40k | 456B-A45.9B | MoE | Agentless | Outcome | 55.6 | GitHub | Website |
| Seed1.5-Thinking | 200B-A20B | MoE | - | Outcome | 47.0 | GitHub | - |
| Llama 4 Maverick | 400B-A17B | MoE | mini-SWE-agent | Outcome | 21.0 | GitHub | HuggingFace |
| Llama 4 Scout | 109B-A17B | MoE | mini-SWE-agent | Outcome | 9.1 | GitHub | HuggingFace |
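
The Size column for MoE models uses a compact notation such as 480B-A35B, read here as total parameters followed by active parameters per token (an assumption about the table's convention, following the usual naming of models like Qwen3-235B-A22B). A minimal sketch of parsing that notation is given below. The names `ModelSize`, `parse_size`, and `active_fraction` are hypothetical helpers, and ensemble notations such as 3×14B are deliberately out of scope.

```python
import re
from typing import NamedTuple, Optional


class ModelSize(NamedTuple):
    total_b: float              # total parameters, in billions
    active_b: Optional[float]   # active parameters, in billions (None if not stated)


_UNIT_IN_BILLIONS = {"B": 1.0, "T": 1000.0}
# Matches '72B', '1T', '480B-A35B'; the 'A' prefix is treated as optional.
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)([BT])(?:-A?(\d+(?:\.\d+)?)([BT]))?")


def parse_size(text: str) -> Optional[ModelSize]:
    """Parse a Size cell; return None when the size is not disclosed ('-')."""
    cleaned = text.strip()
    if cleaned in {"", "-"}:
        return None
    match = _SIZE_RE.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    total, total_unit, active, active_unit = match.groups()
    total_b = float(total) * _UNIT_IN_BILLIONS[total_unit]
    active_b = (
        None if active is None else float(active) * _UNIT_IN_BILLIONS[active_unit]
    )
    return ModelSize(total_b, active_b)


def active_fraction(size: ModelSize) -> Optional[float]:
    """Share of parameters that are active, when both figures are known."""
    if size.active_b is None:
        return None
    return size.active_b / size.total_b


print(parse_size("480B-A35B"))  # ModelSize(total_b=480.0, active_b=35.0)
print(parse_size("1T"))         # ModelSize(total_b=1000.0, active_b=None)
```

Keeping the total and active counts as separate fields makes it possible to sort or filter by either one without re-parsing the string.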