William Berrios
2,366 posts
William Berrios
@w33lliam
Engineer @UNIoficial 🇵🇪
San Francisco, CA
wberriosr.com
Joined January 2020
2,071
Following
986
Followers

  • William Berrios
    @w33lliam
    Jul 14, 2025
    Tired of seeing O3 hallucinate? 😵‍💫 Today, I am excited to share how we built the least hallucinatory LLM in the 🌍 Our GLMv2, developed at @ContextualAI, just claimed 1st place 🥇 on the FACTS Grounded leaderboard by Google DeepMind — outperforming Gemini-2.5-pro, Claude 4, and
    581K
  • William Berrios
    @w33lliam
    Jun 29, 2023
    Announcing LENS 🔎, a framework for vision-augmented language models. - Outperforms Flamingo by 9% (56->65%) on VQAv2 - Eliminates the additional cost of multimodal pre-training Demo: lens.contextual.ai Blog+Paper+Code: contextual.ai/introducing-le… A 🧵 [1/N]
    78K
  • William Berrios
    @w33lliam
    Jun 23, 2025
    Excited to share 🤯 that our LMUnit models with @ContextualAI just claimed the top spots on RewardBench2 🥇 How did we manage to rank +5% higher than models like Gemini, Claude 4, and GPT4.1? More in the details below: 🧵 1/11
    77K
  • William Berrios
    @w33lliam
    Jun 30, 2023
    If you want to augment your favorite LLM with vision capabilities like GPT-4, take a look at the following: Blog+Paper: contextual.ai/introducing-le… Demo: lens.contextual.ai Code: github.com/ContextualAI/l…
    AK
    @_akhaliq
    Jun 29, 2023
    Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language paper page: huggingface.co/papers/2306.16… propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a
    8.3K
  • William Berrios
    @w33lliam
    Jul 22, 2025
    📢 As promised ✨, we're open-sourcing LMUnit! Our SoTA generative model for fine-grained criteria evaluation of your LLM responses 🎯 ✅ SoTA on Flask & BigGbench ✅ SoTA generative reward model on RewardBench2 🤗 Models available on @huggingface: tiny.cc/qjzp001 💻
    7.1K
  • William Berrios
    @w33lliam
    Mar 18, 2022
    Happy to share that the fantastic teamwork with @ArtDeza resulted in our new paper, which shows that a Vision Transformer trained in an adversarial manner and coupled with rotation invariance achieves a new SOTA in Area V4 at the @brain_score competition 🤯. #COSYNE22
    Arturo Deza
    @ArtDeza
    Mar 18, 2022
    1/ Excited to share our new paper showing that training a Transformer adversarially with rotation invariance achieves new SOTA in Area V4 in @brain_score! We also scored 2nd in the aggregate competition. This epic tour-de-force was led by @W33lliam96 openreview.net/forum?id=SOulr…
  • William Berrios
    @w33lliam
    Jun 23, 2025
    Replying to @w33lliam
    As a quick recap, in LMUnit, we utilize "natural language unit tests," which decompose response quality into explicit, testable criteria. Instead of relying on opaque metrics like "pick the better response," each quality aspect becomes a specific question that humans can
    2.1K
  • William Berrios
    @w33lliam
    Jun 23, 2025
    Replying to @w33lliam
    Everything is available 👀 🏆 Leaderboard: huggingface.co/spaces/allenai… 🔧 API: contextual.ai/request-lmunit… 💻 RewardBench2 code submission: github.com/ContextualAI/e… 📄 Paper: arxiv.org/abs/2412.13091 ⭐ LMUnit-llama3.1 available now, LMUnit-qwen2.5 coming soon! 9/11
    Reward Bench Leaderboard - a Hugging Face Space by allenai
    From huggingface.co
    882
  • William Berrios
    @w33lliam
    Aug 9, 2022
    Would it be a good idea to extend the discussion period by 1 week? Hopefully, most authors could then get a response to their rebuttal 🙃 #NeurIPS2022
  • William Berrios
    @w33lliam
    Apr 3, 2025
    LMUnit in CI/CD pipelines for catching regressions ❤️
    Contextual AI
    @ContextualAI
    Apr 3, 2025
    🔥 Introducing the most reliable way to evaluate LLMs and agents in production! It's time to stop “vibe testing” your AI systems. Our latest developer's guide shows you how to rigorously test AI systems so that they hold up in production, using Contextual AI's LMUnit evaluation
    131
  • William Berrios
    @w33lliam
    Jun 23, 2025
    Replying to @w33lliam
    But here is also one of our most exciting results: When humans evaluate using our unit tests instead of traditional preference judgments, inter-annotator agreement jumps from 71% to 86%! That's a 15-point improvement in human consensus, just by asking better questions. 6/11
    1.3K
  • William Berrios
    @w33lliam
    Dec 13, 2023
    🌟 Gen AI advances fairness in AI models with 🔁Diffusion Perturbations! Explore our demographic-balanced dataset for fair AI evaluation, led by the remarkable @niclui97 and @bryanchiaw! Let's ensure AI fairness prevails! 🤖✨ #AIForFairness #AAAI24
    Nicholas Lui
    @niclui97
    Dec 13, 2023
    Can Gen AI help us evaluate the fairness of AI models? The answer is YES. Excited to announce 🔁Diffusion Perturbations, a diffusion-based approach to create datasets balanced across demographic traits. Paper: arxiv.org/abs/2311.15108 Dataset: huggingface.co/datasets/Diffu… 🧵👇! 1/N
    693
  • William Berrios
    @w33lliam
    Jun 29, 2023
    Replying to @w33lliam
    Make your favorite LLM vision-augmented with just a pip install and a few lines of Python! [9/N]
    609
  • William Berrios
    @w33lliam
    Mar 19, 2024
    Excited to share what we have been working on at @ContextualAI! RAG 2.0, our end-to-end system for developing production-grade AI 🚀 Check out our post with benchmarks and long-context experiments!
    Contextual AI
    @ContextualAI
    Mar 19, 2024
    Today, we’re excited to announce RAG 2.0, our end-to-end system for developing production-grade AI. Using RAG 2.0, we’ve created Contextual Language Models (CLMs), which achieve state-of-the-art performance on a variety of industry benchmarks. CLMs outperform strong RAG
    A bar chart comparing the accuracy of multiple RAG-based models on a series of benchmarks like NQ and TriviaQA. RAG 2.0 (Contextual Language Model) exceeds SOTA across all benchmarks shown compared to RAG baselines built using GPT-4 and Mixtral.
    360