Mercor is in the Forbes AI 50 for the 2nd year in a row. Thank you to our experts, customers, and team for being a part of this. Advance the frontier of AI with us. We're hiring across nearly every function. Check out our open roles at the link in the comments.
Mercor
Software Development
San Francisco, California
689,877 followers
Defining the future of work
About us
Mercor is defining the future of work. We connect human expertise with leading AI labs and enterprises to train frontier models.
- Website
- mercor.com
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2023
Locations
- Primary: San Francisco, California 94105, US
Updates
-
Ayushi spent years building at the intersection of AI and healthcare, most recently as the founder of a healthcare AI startup. She knew what it felt like to search for product-market fit from the inside, and what it cost when you didn't find it.

When she started thinking about what came next, she was deliberate. She wanted colleagues who understood founder life without her having to explain it. About 30% of people at Mercor are former founders.

"After years of trying to build something from nothing, there is a specific energy in joining a team that's already sprinting and finding out you can keep pace."

At Mercor, she's working on problems that only exist at scale, helping build the infrastructure that connects human expertise to AI advancement.

Read Ayushi's story at the link in the comments.
-
We are excited to announce our collaboration with Artificial Analysis on APEX-Agents-AA — an independent, live leaderboard evaluating AI agents on the professional tasks that knowledge workers do every day.

The leaderboard is built on APEX-Agents, Mercor's open-source benchmark of 480 tasks across investment banking, management consulting, and corporate law — including tool implementations, rubrics, and grading workflows, all available to the community for evaluation and training. Artificial Analysis runs a subset of these tasks through their open-source Stirrup harness, providing a reproducible, independent baseline that any team can verify and build on.

APEX-Agents-AA results:
🥇 GPT-5.4: 33.3%
🥈 Claude Opus 4.6: 33.0%
🥉 Gemini 3.1 Pro Preview: 32.0%

The top three frontier models are separated by just 1.3 percentage points. The leaderboard will update with key model releases.

Check it out at the link in the comments.
-
The privacy and security of our customers and contractors are foundational to everything we do at Mercor. We recently identified that we were one of thousands of companies impacted by a supply chain attack involving LiteLLM. Our security team moved promptly to contain and remediate the incident. We are conducting a thorough investigation supported by leading third-party forensics experts. We will continue to communicate with our customers and contractors directly as appropriate and devote the resources necessary to resolving the matter as soon as possible.
-
Does Training on the APEX-Agents Dev Set Generalize Beyond the Benchmark?

Applied Compute post-trained GLM-4.7 on ~2,000 expert Mercor tasks and achieved state-of-the-art legal performance on APEX-Agents. We then evaluated that model, AC-Small, on benchmarks outside its training distribution.

On GDPVal, AC-Small's win+tie rate rose from 55.0% to 62.7% (+7.7pp), placing it 5th overall and ahead of Opus 4.5.

To understand where the gain came from, we ran two ablations:
- On Toolathlon, AC-Small improved by +8.1pp, from 26.5% to 34.6%.
- On APEX, which removes tool use and agent loops, AC-Small moved up seven spots, beating Opus 4.5, Sonnet 4.5, and Grok 4.

The biggest surprise was medicine. AC-Small placed 4th at 64.8%, ahead of GPT 5.4, Gemini 3.1 Pro, and o3, despite zero medical tasks in training. The gains appear to come from stronger procedural discipline: preserving sub-details, checking intermediate outputs, and catching logical errors.

Read more at the links in the comments.
-
"The most important problem in the world is what we do all day for work and how the knowledge work economy operates." - Brendan Foody, at Upfront Ventures Summit. Brendan sat down with Sundeep Peechu of Felicis to talk about the future of work, what's blocking enterprise AI, and why humans become more valuable as AI advances. Watch the full video at the link in the comments.
-
Mercor reposted this
Traditional coding benchmarks do not reflect how software is actually built and maintained. That's why we built a new benchmark, APEX-SWE, in partnership with Cognition. It measures whether AI models can perform complex, real-world software engineering work: shipping systems that work and debugging them when they don't.

APEX-SWE Leaderboard | Pass@1
🥇 OpenAI GPT-5.3 Codex (High) at 41.5%
🥈 Anthropic Opus 4.6 (High) at 40.5%
🥉 Anthropic Opus 4.5 (High) at 38.7%

Every frontier model fails on nearly 60% of real production tasks.
-
Introducing APEX-SWE, in collaboration with Cognition. They see firsthand that real software engineering is not just writing code anymore.

On APEX-SWE, no model reliably solves real production software engineering tasks. OpenAI GPT-5.3 Codex (High) tops the leaderboard at 41.5% Pass@1, followed by Anthropic Opus 4.6 (High) at 40.5%.

APEX-SWE tests two things legacy benchmarks ignore: building and deploying end-to-end systems across cloud services and databases, and diagnosing real production failures from logs and unstructured context.

Read more about our new benchmark at the link in the comments.
-
Colin built his career inside UK institutions, including Cambridge University, the Bank of England, and the Financial Conduct Authority, where he helped reshape national consumer protection rules. When he moved to the U.S., he realized how much professional networks mattered, and he didn't want to start over. He found Mercor instead.

Now, Colin works alongside investment bankers, PhD economists, and law professors on some of the most complex legal and economic problems.

"I'm teaching AI how to do legal reasoning, how to distinguish a case subtly from another one, and I'm asking it to reproduce some of the hardest work that I've done in my own professional life. It's felt like something that's given me my dignity back as a professional."

Find your next opportunity: www.mercor.com
-
Amresh Subramaniam spent nearly a decade at McKinsey before joining Mercor. He was ready to stop just advising on AI and start building it. In his first year, he worked with top AI labs and built a team of more than 20 people, learning what kinds of data actually move the needle on model performance.

For Amresh, it's the experts who've made the work meaningful: people who joined for flexibility, who hit a rough patch, or who simply wanted to put their expertise to work on their own terms.

Read more of Amresh's story at the link in the comments.