Orion Weller
643 posts
Orion Weller
@orionweller
PhD student @jhuclsp. Prev. intern @AIatMeta, @GoogleDeepMind, @samaya_ai, @allen_ai. Research: LLMs, Search, Agents
Baltimore
orionweller.github.io
Joined March 2015
1,050 Following
2,164 Followers
  • Orion Weller
    @orionweller
    Aug 29, 2025
    Instructions/reasoning are now everywhere in retrieval - we want embeddings to do it all! 🚀 But... is it even possible? 🤔 Turns out, it's not possible for single-vector models 😱 theoretically and empirically! To make it obvious we OSS a simple eval SoTA models flop on! 🧵
  • Orion Weller
    @orionweller
    Mar 25, 2024
    LLMs can use complex instructions - why can’t retrieval models? We build FollowIR, a training/test set of real-world human retrieval instructions. Our FollowIR-7B is the best IR model for instruct-following, even beating @cohere @openai retrievers 🤯 📝 arxiv.org/abs/2403.15246
  • Orion Weller
    @orionweller
    Feb 26, 2025
    Ever wonder how test-time compute would do in retrieval? 🤔 introducing ✨rank1✨ rank1 is distilled from R1 & designed for reranking. rank1 is state-of-the-art at complex reranking tasks in reasoning, instruction-following, and general semantics (often 2x RankLlama 🤯) 🧵
  • Orion Weller
    @orionweller
    Jul 25, 2024
    🚨 We all complain a lot about reviewers/ACs/SACs in the ML/NLP community. But why not look at the data to see what’s going on? I found some crazy statistics about who is doing/not doing this service in the *CL community. 😱 orionweller.github.io/blog/2024/revi… 🧵
  • Orion Weller
    @orionweller
    Jul 16, 2025
    🤔 Have you ever wondered how good ModernBERT is compared to decoders like Llama? We made an open-data version of ModernBERT and used the same recipe for encoders and decoders. Turns out, our encoder model beats ModernBERT and our decoder model beats Llama 3.2 / SmolLM2 🤯 🧵
  • Orion Weller
    @orionweller
    May 23, 2023
    Can we guide LLMs to quote text from their pre-training data using prefixes like "According To ..", improving grounding and reducing hallucination? We discovered that LLMs do have this capability and can increase or decrease quoting on request 🤯 📝:arxiv.org/abs/2305.13252 1/5
  • Orion Weller
    @orionweller
    Sep 18, 2024
    Introducing ✨Promptriever ✨ the first retriever that can be prompted like an LM with free-form prompts! Our secret: query-level instruction training lets you keep the promptability of the base LM! 🚫 keyword-matching ✅ instruction search 📝 arxiv.org/abs/2409.11136
  • Orion Weller
    @orionweller
    Sep 15, 2023
    Using LLMs for query or document expansion in retrieval (e.g. HyDE and Doc2Query) has scores going 📈 But do these approaches work for all IR models and for different types of distribution shifts? Turns out it's actually more 📉 🚨 📝 (arxiv soon): orionweller.github.io/assets/pdf/LLM…
    Image: a plot whose x-axis is the baseline score of the rankers (nDCG@10) and whose y-axis is the change in model score after an expansion is applied.

    There are three sets of results, one dataset per shift type: TREC DL (no shift), FiQA (domain shift), and ArguAna (query shift). Each set is shown as a scatter plot with a trend line. The trend is the same in all three: as the baseline score increases, the gain from expansion decreases.

    On TREC DL, the worst models have a base score of ~40 and improve by 10 points with expansion; the best models score >70 and drop by 5 points with expansion.

    On FiQA, the worst models have a base score of ~15 and improve by 5 points with expansion; the best models score ~45 and drop by 3 points with expansion.

    On ArguAna, the worst models have a base score of ~25 and improve by >20 points with expansion; the best models score >55 and drop by 1 point with expansion.
  • Orion Weller
    @orionweller
    Apr 11, 2022
    I'm excited to announce that I have been awarded both the NSF GRFP and the DoD NDSEG fellowships!
  • Orion Weller
    @orionweller
    May 27, 2021
    Life update: I'll be joining @jhuclsp to start my PhD in the fall! I'm grateful for the many researchers who have helped mentor me to this point and am looking forward to future collaborations on the East Coast!
  • Orion Weller
    @orionweller
    Jan 24, 2025
    Now accepted to ICLR! Excited to see everyone in Singapore ✈️
    Orion Weller
    @orionweller
    Sep 18, 2024
    Introducing ✨Promptriever ✨ the first retriever that can be prompted like an LM with free-form prompts! Our secret: query-level instruction training lets you keep the promptability of the base LM! 🚫 keyword-matching ✅ instruction search 📝 arxiv.org/abs/2409.11136
  • Orion Weller
    @orionweller
    May 15, 2023
    🚨 Negation misunderstanding in IR systems can lead to dire outcomes, as seen in the quoted Google Search example from 2021. But are SOTA IR models any better now? 🔍 Spoiler: nearly all IR models perform worse than random! #IR #NLProc
    soft
    @soft
    Oct 16, 2021
    The Google search summary vs the actual page
  • Orion Weller
    @orionweller
    Jul 7, 2022
    Excited to share our #NAACL2022 work (and my first @jhuclsp) on pretraining, federated learning (FL), and multilingual data! We provide the first study on the impact of multilingual partitioning on FL algorithms, showing that non-IID settings can cause a large drop in performance
  • Orion Weller
    @orionweller
    Dec 21, 2022
    Search is becoming critical to many NLP tasks now - but how can we defend against malicious actors who attack knowledge sources? We introduce a simple and effective method that uses query augmentation and answer redundancy to provide gains of 5-20% EM! arxiv.org/abs/2212.10002
