Popular repositories
- skillsbench Public
SkillsBench evaluates how well skills work and how effective agents are at using them.
Repositories
- benchflow Public
Framework for creating high-fidelity, complex RL environments and evaluation tasks
- skillsbench Public
SkillsBench evaluates how well skills work and how effective agents are at using them.
- agents Public
Minimal coding agents, easy to play with (opencode TUI) and easy to benchmark (ACP on BenchFlow) — built around mini-swe-agent
- mini-swe-code Public archive
mini-swe-agent + opencode TUI (mini-opencode) — BenchFlow's standalone interactive mini-swe distribution
- skillsbench-trajectories Public
- skillsbench-leaderboard Public
- posttrainarena Public
- mini-swe-agent Public (forked from SWE-agent/mini-swe-agent)
The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >74% on SWE-bench verified!