Sumit (@_reachsumit) / X

10.2K posts

Sumit

@_reachsumit

Senior ML Engineer @Meta | prev: @TikTok_us, @Amazon, @Samsung | UChicago Alum blog.reachsumit.com 🇮🇳→🇰🇷→🇦🇺→🇨🇦→🇺🇲

Seattle, WA

Joined April 2010

Pinned
Sumit
@_reachsumit
Oct 8, 2025
In the final post of the Adaptive RAG series, we explore how to treat selective retrieval as a core, learned skill, moving from passive observation to active, intelligent decision-making.
Teaching Models to Decide When to Retrieve: Adaptive RAG, Part 4
From blog.reachsumit.com
8K
Sumit
@_reachsumit
Mar 11, 2024
Is Cosine-Similarity of Embeddings Really About Similarity?
Netflix cautions against blindly using cosine similarity as a measure of semantic similarity between learned embeddings, as it can yield arbitrary and meaningless results.
📝arxiv.org/abs/2403.05440
373K
Sumit
@_reachsumit
Jun 10, 2024
RAG Does Not Work for Enterprises
Explores the challenges and requirements for implementing RAG in enterprises, proposing potential solutions such as semantic search and hybrid queries, along with an evaluation framework to validate enterprise-grade RAG solutions.
📝arxiv.org/abs/2406.04369
107K
Sumit
@_reachsumit
Aug 29, 2025
On the Theoretical Limitations of Embedding-Based Retrieval
@orionweller et al. at Google DeepMind demonstrate that vector embeddings have fundamental limitations in representing all possible document combinations.
📝arxiv.org/abs/2508.21038 👨🏽‍💻github.com/google-deepmin…
arxiv.org
On the Theoretical Limitations of Embedding-Based Retrieval
Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These...
63K
Sumit
@_reachsumit
Jan 18, 2024
Foundations of Vector Retrieval
This 185-page monograph provides a summary of major algorithmic milestones in the vector retrieval literature, with the goal of serving as a self-contained reference for new and established researchers.
📝arxiv.org/abs/2401.09350
25K
Sumit
@_reachsumit
Jan 13, 2025
Small Language Models (SLMs) Can Still Pack a Punch: A survey
Amazon presents a survey of Small Language Models (1-8B parameters), exploring how these smaller models can match or outperform larger counterparts.
📝arxiv.org/abs/2501.05465
17K
Sumit
@_reachsumit
Dec 9, 2024
Semantic Retrieval at Walmart
Presents a hybrid search system deployed at Walmart that combines a traditional inverted index with embedding-based neural retrieval to better answer tail queries, significantly improving relevance.
📝
arxiv.org
Semantic Retrieval at Walmart
In product search, the retrieval of candidate products before re-ranking is more critical and challenging than other search like web search, especially for tail queries, which have a complex and...
19K
Sumit
@_reachsumit
Nov 22, 2024
FastRAG: Retrieval Augmented Generation for Semi-structured Data
Introduces a RAG approach that uses schema and script learning techniques to improve data processing time by up to 90% and reduce costs by 85% compared to GraphRAG.
arxiv.org/abs/2411.13773
16K
Sumit
@_reachsumit
Mar 10, 2025
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Introduces a two-stage RL approach enabling LLMs to autonomously invoke search during reasoning.
📝arxiv.org/abs/2503.05592 👨🏽‍💻github.com/SsmallSong/R1-…
13K
Sumit
@_reachsumit
Nov 25, 2024
Understanding LLM Embeddings for Regression
Demonstrates that LLM embeddings can outperform traditional feature engineering for high-dimensional regression tasks while preserving Lipschitz continuity in the embedding space.
arxiv.org/abs/2411.14708
19K
Sumit
@_reachsumit
Jan 30, 2025
WARP: An Efficient Engine for Multi-Vector Retrieval
Introduces an efficient engine that significantly reduces query latency for multi-vector retrieval systems through implicit decompression and dynamic similarity imputation.
📝arxiv.org/abs/2501.17788 👨🏽‍💻github.com/jlscheerer/xtr…
arxiv.org
WARP: An Efficient Engine for Multi-Vector Retrieval
Multi-vector retrieval methods such as ColBERT and its recent variant, the ConteXtualized Token Retriever (XTR), offer high accuracy but face efficiency challenges at scale. To address this, we...
42K
Sumit
@_reachsumit
Jul 29, 2024
REAPER: Reasoning based Retrieval Planning for Complex RAG Systems
Amazon presents an LLM-based planner for generating efficient retrieval plans in conversational AI systems, offering reduced latency, higher accuracy, and easy scalability.
📝arxiv.org/abs/2407.18553
13K
Sumit
@_reachsumit
Nov 26, 2024
A Survey on LLM-as-a-Judge
Presents a comprehensive survey examining how to build reliable LLM-as-a-Judge systems, exploring strategies for improving consistency, mitigating biases, and adapting to diverse assessment scenarios.
📝arxiv.org/abs/2411.15594 👨🏽‍💻github.com/IDEA-FinAI/LLM…
arxiv.org
A Survey on LLM-as-a-Judge
Accurate and consistent evaluation is crucial for decision-making across numerous fields, yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large Language...
12K
Sumit
@_reachsumit
Jul 24, 2024
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Salesforce presents a survey of LLM alignment methods, categorizing approaches into four main topics and identifying future research directions.
📝arxiv.org/abs/2407.16216
8.9K