Papers
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Datasets (11)
benchflow/env0-experiment-trajectories
Updated • 50
benchflow/skillsbench
Updated • 4.14k • 6
benchflow/skillsbench-leaderboard
Updated • 11.9k • 1
benchflow/benchmarks
Updated • 49
benchflow/skillsbench-research-artifacts
Updated • 35
benchflow/skillsbench-trajectories-apr2026
Updated • 525 • 1
benchflow/skillsbench-data
Updated • 94.3k • 331
benchflow/ClawsBench
Updated • 7.83k • 536 • 2
benchflow/artifacts
Updated • 50
benchflow/skills_parquet
Updated • 35.5k • 18 • 1