Labelbox

Software Development

San Francisco, California 36,104 followers

The data factory for leading AI teams

About us

Labelbox is the data factory for leading AI labs and AI-powered enterprises. Innovate faster using Labelbox’s on-demand expert labeling services and unified software to deliver high-quality, frontier data with control and speed.

Website
https://labelbox.com/
Industry
Software Development
Company size
51-200 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2018

Locations

Employees at Labelbox

Updates

  • This week, we had the pleasure of hosting 50+ researchers and builders from leading AI companies to meet, talk and socialize (MTS 😎) at Labelbox HQ. Huge thanks to Dwarkesh Patel, Sholto Douglas (Anthropic), Mo Bavarian (OpenAI), and Melvin Johnson (DeepMind) for leading our fireside chat on scaling RL and the pursuit of AGI.

  • 🏆 Forbes’ 2026 list of America’s Best Startup Employers is out, and we’re proud to see Labelbox on the list. We’re committed to enabling the next generation of AI by powering the data and evaluation for the world’s most advanced teams. Recognition like this reflects the people building that mission every day. See the full list: https://bit.ly/4u8CumB

  • Voice agents are evolving from rigid turn-based designs toward continuous, natural conversation, enabling streaming comprehension and generation at the same time. However, most existing benchmarks are either turn-based or latency-focused and do not directly test whether models can maintain reasoning when users interrupt or update objectives mid-utterance. We introduce EchoChain 🔊, a novel benchmark for evaluating reasoning under pressure in full-duplex dialogue. Key findings:
    - Full-duplex models often fail to properly integrate interruption information, sometimes ignoring the interruption entirely.
    - A major weakness in today’s most advanced models is staying consistent when new input arrives while they are still responding.
    - In many cases, a model performs well when it can respond uninterrupted but struggles once it is interrupted mid-response.
    Check out the full analysis in our blog post, and stay tuned for the arXiv paper, which will be released in the coming days. https://lnkd.in/g3QkNZdb

  • Model safety is often judged by refusal rates on AI safety benchmarks. But what if our evaluations are flagging overtly negative or sensitive language rather than detecting genuine adversarial behavior? In our latest research, we show that when this language is removed, frontier models previously labeled as safe frequently fail, exposing a gap between how model safety is evaluated using benchmarks and how adversarial behavior occurs in the real world. Key findings:
    - AI safety benchmarks are over-reliant on explicit triggering language, provoking model refusals unrealistically.
    - Removing these cues significantly degrades safety performance, challenging prior assumptions about the robustness of safety evaluations.
    - We found evidence that both internal safety evaluations and safety alignment techniques use similar language patterns, further questioning the robustness of safety evaluations.
    - Our novel “intent laundering” framework serves as a strong diagnostic and red-teaming tool, exposing where model safety succeeds and where it fails.
    Read the full blog post for the complete analysis. https://lnkd.in/g84dywcR

  • Today, Dario (CEO of Anthropic) x Dwarkesh unpacked where AI is headed, from exponential scaling to what he calls a “country of geniuses in a data center”. A few key takeaways:
    - RL is about generalization, not specialization: Like early pretraining, the goal isn’t mastering one task, but building rich environments and broad data so models generalize across domains.
    - 1–3 years to a “country of geniuses”: Dario estimates ~50/50 odds that AI systems collectively match the output of an entire nation of top experts in a few years. Not a single superintelligence, but millions of genius-level systems in parallel.
    - Context as the next unlock: With context windows in the tens of millions of tokens, models could absorb months of workflow in one pass. The goal: steerable, human-aligned systems, as opposed to unchecked autonomous actors.
    - Software engineering goes end to end: Models are moving from writing code to executing full engineering cycles: setup, debugging, iteration. Bottlenecks now shift from syntax to judgment.
    - Diffusion will lag capability, briefly: Enterprise adoption slows even with rapid growth, but AI can onboard itself via docs, Slack threads, and codebases. By compressing the adoption curve, trillions in AI-driven revenue by 2030 becomes realistic.
    Excited to be featured in this conversation, showcasing how we help leading AI teams build high-fidelity RL environments and tighten the iteration loop so models learn from the most informative experiences.

  • We’re excited to share that we’ve acquired Upcraft to bring AI agents to the heart of how we scale human expertise for frontier AI. Upcraft’s AI-powered automation strengthens Alignerr by helping us recruit, engage, and empower a global network of domain experts who train and evaluate the world’s most advanced models. As leading AI teams invest billions into post-training and reinforcement learning, expert-generated data has become the true bottleneck for injecting models with the taste and judgment that only deep human expertise can provide. A big welcome to Greg Caplan and the Upcraft team, and we look forward to building together. https://lnkd.in/g4rjRNeA

  • Elon x Dwarkesh x John Collison from Stripe just went live. Their almost three-hour chat (over some Guinness 🍻) dives into what actually limits the next phase of AI and how Elon plans to break through. A few takeaways from this must-watch episode:
    - Space as the next data center: Solar power in orbit is roughly five times more effective than on Earth. Within thirty to thirty-six months, Musk believes space could become the most economically viable location for AI compute, with Starship launching massive power and compute capacity into orbit.
    - Humanoid robots as the economic unlock: Optimus could be the ultimate productivity multiplier, potentially expanding the global economy by orders of magnitude. The hardest problem is hands. The endgame is robots that eventually build robots.
    - Power as the next bottleneck: Electricity production outside China is flat while compute demand is exploding. Musk says the true scaling wall for AI on Earth is utilities, not just models.
    - Debuggability as a safety requirement: Tools that show where a model’s reasoning went wrong, trace the origin of errors, or detect potential deception will be essential as AI grows more capable.
    - Efficiency as an existential issue: Interest on national debt now exceeds the military budget. Musk argues that massive productivity gains from AI and robotics are not optional. They are existential.
    We’re excited to be featured in the conversation, helping leading AI teams scale high-quality robotics and reinforcement learning data so their models learn from the right experiences and reach their full potential.

  • A few research takeaways from NeurIPS 2025, pointing toward a 2026 focused on rigorous evaluation and continually learning AI systems:
    - Evaluation moves to the core: Data contamination, shortcut learning, and unfaithful benchmarks increasingly blur the line between genuine capability gains and test-data overfitting. Designing tasks that faithfully target underlying capabilities is now a first-order research problem and opportunity.
    - Agents everywhere: The field is moving beyond static foundation models toward interactive agents, with reinforcement learning re-emerging as infrastructure for continual, experience-driven improvement at inference time.
    - From inflection to consolidation: Expect benchmarks that deliberately surface failure modes, alongside agentic systems that learn across multi-turn interaction in complex, dynamic environments.
    At Labelbox, these themes directly shape our work. We’re building high-signal, contamination-resistant datasets and capability-focused evaluations to more faithfully measure the performance of frontier AI systems and uncover their failure modes. https://lnkd.in/gsjcn_dK

  • Our Labelbox holiday party this week at the beautifully designed Hedge Coffee was full of great vibes and even greater people. As the team took turns on the turntables with espresso martinis in hand, we celebrated everything we’ve built together this year, while getting energized for a big year ahead.

