Kyle Lo (@kylelostat) / X

940 posts

Kyle Lo

@kylelostat

language models, mts @MicrosoftAI, ex co-lead of Olmo @allen_ai, @uwcse, he/him, kylelo.bsky.social🧋

Seattle, WA

Joined January 2019

Pinned
Kyle Lo
@kylelostat
Dec 17, 2025
olmo 3 paper finally on arxiv 🫡
thx to our teammates esp folks who chased additional baselines
thx to arxiv-latex-cleaner and overleaf feature for chasing latex bugs
thx for all the helpful discussions after our Nov release, best part of open science is progressing together!
57K
Kyle Lo
@kylelostat
Nov 5, 2025
why intern at Ai2?
🐟interns own major parts of our model development, sometimes even leading whole projects
🐡we're committed to open science & actively help our interns publish their work
reach out if u wanna build open language models together 🤝
links👇
79K
Kyle Lo
@kylelostat
Jan 3, 2025
kicking off 2025 with our OLMo 2 tech report while payin homage to the sequelest of sequels 🫡
🚗 2 OLMo 2 Furious 🔥 is everythin we learned since OLMo 1, with deep dives into:
🚖 stable pretrain
🚔 lr anneal
🤝 data curricula
🤝 soups
🚘 tulu post-train
🚜 compute infra
👇🧵
47K
Kyle Lo
@kylelostat
Nov 26, 2024
Excited to share OLMo 2!
🐟 7B and 13B weights, up to 4-5T tokens, fully open data, code, etc
🐠 better architecture and recipe for training stability
🐡 staged training, with new data mix Dolmino🍕during annealing
🦈 state-of-the-art OLMo 2 Instruct models
links below 👇
15K
Kyle Lo
@kylelostat
Oct 22, 2025
woah guess VLMs for OCR the hottest research topic this week😆 since the first olmOCR, we've been.. 🔥training our VLM using RLVR with binary unit test rewards🔥 it's incredibly effective & unit test creation easy to scale w synthetic data pipelines
Ai2
@allen_ai
Oct 22, 2025
We’re updating olmOCR, our model for turning PDFs & scans into clean text with support for tables, equations, handwriting, & more. olmOCR 2 uses synthetic data + unit tests as verifiable rewards to reach state-of-the-art performance on challenging documents. 🧵
14K
Kyle Lo
@kylelostat
Apr 25, 2023
The Semantic Reader project combines AI & HCI research to explore the future of scientific reading. We ask: “Can we create intelligent, interactive, and accessible reading interfaces for research papers, even atop existing PDFs?” semanticscholar.org/reader/67a5bac… 1/n
40K
Kyle Lo
@kylelostat
Oct 16, 2024
we're hiring research interns to join OLMo!🐙 if you're a student looking to dive into the excitement/chaos of training language models, great opportunity to embed with our team & cook some open models together 🍳
24K
Kyle Lo
@kylelostat
Feb 22, 2024
DM me if you're interested in:
🐋creating high-quality pretraining datasets
🐊studying data's impact on LM capabilities
🦉tools for sensemaking over large corpora
🐡adapting LMs to specialized domains like science
🐈evaluation through human interaction
Semantic Scholar Research @ AI2
@ai2_s2research
Feb 22, 2024
📣 Job opportunities at Semantic Scholar Research @ the Allen Institute for AI (AI2) for post-doctoral & pre-doctoral researchers starting in 2024! 📣 Our team works on NLP and HCI research with a focus on open LLMs and LLM-powered research support tools and assistants.
21K
Kyle Lo
@kylelostat
Aug 31, 2021
@allen_ai is hiring #nlproc #ml #ai #hci researchers to join @SemanticScholar! We welcome postdocs, scientists, engineers, research interns & predoctoral researchers. Happy to chat if you're interested! Our team: research.semanticscholar.org Apply by *Oct 15*: allenai.org/careers
Kyle Lo
@kylelostat
Dec 10, 2024
the science of LMs should be fully open✨ today, we are giving our NeurIPS 2024 tutorial on language model development. everything from data, training, adaptation. published or not, no secrets 🫡 tues, 12/10, 9:30am PT ☕️ West Ballroom B
7K
Kyle Lo
@kylelostat
Dec 10, 2023
so happy to have our work recognized at #emnlp2023 🥳 big thanks to @shannonzshen @blnewm @josephcc @soldni and collaborators at @SemanticScholar @allen_ai @UCBerkeley @MIT two of my favorite aspects of this work:
EMNLP 2026
@emnlpmeeting
Dec 10, 2023
EMNLP 2023 Best Paper Demo PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents (Kyle Lo, Zejiang Shen, Benjamin Newman, Joseph Chang et al.) aclanthology.org/2023.emnlp-dem… #EMNLP2023 #NLProc
20K
Kyle Lo
@kylelostat
Aug 14, 2024
"Can we have two golds?"
Jesse Dodge
@JesseDodge
Aug 14, 2024
Congrats to our team for winning two paper awards at #ACL2024! OLMo won the Best Theme Paper award, and Dolma won a Best Resource Paper award! All the credit goes to the whole team for the massive group effort 🎉🎉
7.5K
Kyle Lo
@kylelostat
Oct 6, 2023
Don't forget to apply by *Oct 15* for @allen_ai research internships! Interested in language models of science, evaluating AI-generated text, challenging retrieval settings, and human-AI collaborative reading/writing? Come work with meeee!😸
Semantic Scholar Research @ AI2
@ai2_s2research
Sep 12, 2023
@allen_ai @SemanticScholar is hiring #nlproc #hci #ml #ai researchers for internships! For start dates in spring/summer 2024, apply by *Oct 15*. About us: research.semanticscholar.org Apply: boards.greenhouse.io/thealleninstit…
27K
Kyle Lo
@kylelostat
Oct 23, 2025
one of the big motivations behind olmOCR 2’s use of RLVR with binary unit tests. the ability to easily define unit tests for model failures + retrain makes iteration really easy
tech report out 👉arxiv.org/abs/2510.19817
Nathan Lambert
@natolambert
Oct 23, 2025
I agree that RLVR is definitely much more satisfying and engaging than RLHF bug bashing (or human data chasing)
13K