SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner

Official implementation of the ICML 2025 paper: Synthesizing Software Engineering Data in a Test-Driven Manner

✨ Overview

[Figure: SWE-Flow Framework]

SWE-Flow is a data-synthesis framework that turns unit tests into fully verifiable, incremental development tasks. It constructs a Runtime Dependency Graph (RDG) to trace function interactions and automatically derives a step-by-step development schedule. Each step in the schedule provides:

  • Partial codebase for each step
  • Unit tests that express the high-level requirement
  • Minimal code patch needed to make the tests pass

With this pipeline, we generated 16,061 training instances and 2,020 test instances from real-world GitHub projects, forming the SWE-Flow Dataset. Fine-tuning open models on this dataset yields significant gains on TDD-oriented coding tasks.
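
To make the scheduling idea concrete, here is a minimal sketch of how a dependency graph could be turned into incremental steps. It is not the SWE-Flow implementation: the function names, the `rdg` and `test_coverage` dictionaries, and `build_schedule` are all hypothetical, and plain topological ordering (via Python's standard `graphlib`) stands in for whatever scheduling the framework actually performs.

```python
from graphlib import TopologicalSorter

# Hypothetical runtime dependency graph: each function maps to the set of
# functions it was observed calling while the unit tests ran.
rdg = {
    "parse": set(),
    "validate": {"parse"},
    "render": {"parse", "validate"},
}

# Hypothetical record of which functions each unit test exercised.
test_coverage = {
    "test_parse": {"parse"},
    "test_validate": {"parse", "validate"},
    "test_render": {"parse", "validate", "render"},
}


def build_schedule(rdg, test_coverage):
    """Derive one development step per function, dependencies first."""
    # static_order() yields every node after all of its dependencies.
    order = list(TopologicalSorter(rdg).static_order())
    implemented = set()
    remaining = dict(test_coverage)
    steps = []
    for func in order:
        codebase = sorted(implemented)  # partial codebase before this step
        implemented.add(func)
        # Tests whose required functions are all available after this step.
        ready = sorted(t for t, needed in remaining.items() if needed <= implemented)
        for t in ready:
            del remaining[t]
        steps.append({"codebase": codebase, "implement": func, "tests": ready})
    return steps


if __name__ == "__main__":
    for i, step in enumerate(build_schedule(rdg, test_coverage), start=1):
        print(i, step)
```

Each resulting step mirrors the three items listed above: the code that already exists, the tests that state the requirement, and the single function that has to be written to make those tests pass.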

🔧 Installation

git clone https://github.com/Hambaobao/SWE-Flow.git
cd SWE-Flow
pip install -e .

📚 Documentation

🤝 Contributing

Contributions are welcome! A detailed CONTRIBUTING.md guideline will be added soon. Feel free to open issues or pull requests in the meantime.

📄 License

This repository is licensed under the MIT License. See License for the full text.

📌 Citation

If you use SWE-Flow, please cite:

@misc{zhang2025sweflow,
      title={SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner}, 
      author={Lei Zhang and Jiaxi Yang and Min Yang and Jian Yang and Mouxiang Chen and Jiajun Zhang and Zeyu Cui and Binyuan Hui and Junyang Lin},
      year={2025},
      eprint={2506.09003},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.09003}, 
}

🙏 Acknowledgments

Work done during an internship at Alibaba Qwen. We thank the Alibaba Qwen Team and the open-source community for the projects that enabled SWE-Flow.
