I am a PhD student at the University of Illinois Urbana-Champaign, advised by Prof. Heng Ji. Previously, I was an AI Resident at IBM Research, New York, where I had the pleasure of working with Vittorio Castelli, Avirup Sil, and Salim Roukos.
I graduated from the Indian Institute of Technology Madras in 2018 with a bachelor's degree in computer science. While at IIT Madras, I worked with Prof. Mitesh Khapra and Prof. Balaraman Ravindran.
I'm interested in natural language processing and large language models, particularly in agentic search, embedding models and reranking, and information seeking for retrieval-augmented generation.
We introduce WiNELL, an agentic framework for automatic Wikipedia updating that continuously monitors online sources for recent facts, identifies updates relevant to the article under consideration, and generates well-formed edit suggestions.
We introduce SWERank, a code ranking framework for software issue localization, which identifies the relevant code that needs to be modified to fix a software issue.
We introduce CoRNStack, a large-scale, high-quality contrastive training dataset for code that spans multiple programming languages. We demonstrate that contrastive training of embedding models using CoRNStack leads to state-of-the-art performance across a variety of code retrieval tasks.
We introduce INFOGENT, a novel modular and feedback-driven framework for web information aggregation involving three distinct components: Navigator, Extractor, and Aggregator.
We propose to compute an improved vector representation of the query using supervision from the re-ranker at inference time, thereby improving the retriever's Recall@K. Our approach is parameter-free, lightweight, and can serve arbitrary retrieve-and-rerank pipelines, significantly improving retrieval recall in multiple domains, languages, and modalities.
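The update rule below is a minimal illustrative sketch, not the paper's exact procedure: it nudges the query vector toward a reranker-score-weighted average of the retrieved document vectors at inference time. All names, the weighting scheme, and the assumption of nonnegative reranker scores are my own for illustration.

```python
# Hedged sketch of reranker-supervised query refinement at inference time.
# Assumes plain list-of-floats vectors and nonnegative reranker scores.

def refine_query(query_vec, doc_vecs, reranker_scores, lr=0.1):
    """One refinement step: move the query toward a reranker-score-weighted
    average of the retrieved document vectors (parameter-free, inference-only)."""
    total = sum(reranker_scores)
    weights = [s / total for s in reranker_scores]
    # Score-weighted centroid of the retrieved documents.
    target = [sum(w * d[i] for w, d in zip(weights, doc_vecs))
              for i in range(len(query_vec))]
    # Interpolate the query toward that centroid.
    return [q + lr * (t - q) for q, t in zip(query_vec, target)]

q = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 1.0]]
scores = [1.0, 3.0]  # hypothetical reranker scores
print(refine_query(q, docs, scores))  # [0.975, 0.1]
```

A second retrieval pass with the refined vector can then surface documents the original query embedding missed, improving Recall@K without touching model parameters.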
We introduce SmartBook, a generalizable automated framework designed to assist human analysts in real-time situation report generation from large news corpora, by generating a structured report with multiple hypotheses (claims) summarized and grounded with rich links to factual evidence.
We introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates.
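The core idea of reading a ranking off the first decoding step can be sketched as follows. This is a simplified illustration: it assumes the LLM is prompted with a query and candidates labeled by single-token identifiers ("A", "B", "C"), and that we have already extracted the first-step logits over those identifier tokens; `first_step_logits` below is hypothetical example data, not real model output.

```python
# Hedged sketch of FIRST-style listwise reranking from a single decoding step.
# Instead of generating the full ranked sequence, sort candidate identifiers
# by the logit the model assigns each of them at the first generation step.

def rank_by_first_token_logits(first_step_logits):
    """Return candidate identifiers ordered by descending first-token logit."""
    return sorted(first_step_logits, key=first_step_logits.get, reverse=True)

# Hypothetical logits over identifier tokens at the first decoding step.
first_step_logits = {"A": 1.2, "B": 3.4, "C": 2.1}
print(rank_by_first_token_logits(first_step_logits))  # ['B', 'C', 'A']
```

Because only one forward step is needed rather than autoregressively decoding the whole permutation, this makes listwise LLM reranking substantially cheaper at inference time.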
We introduce any-granularity ranking, which leverages multi-vector embeddings to rank at varying levels of granularity while maintaining encoding at a single (coarser) level of granularity.
We propose to improve an OpenQA model's generalizability across different corpora and domains by mitigating the model's over-memorization of knowledge.
We introduce a framework that enhances the accuracy and context efficiency of retrieval-based LLM personalization through collaborative data refinement. The method also excels in cold-start scenarios.
We introduce the use of progressive response generation to integrate real-time web search results, where a preliminary response buys time for a detailed follow-up, ensuring a smooth user interaction. As a result, our method cuts user waiting times for voice-based chatbots by 50%.
To tackle passive conversations, we propose to integrate social commonsense reasoning for the generation of search queries in knowledge-powered conversations. We leverage a commonsense dialog system to establish connections related to the conversation topic, which subsequently guides an instruction-driven query generation model.
We propose the novel task of summarizing the reactions of different speakers with respect to a given event. We create a new multi-document summarization benchmark, SumREN, along with a pipeline-based framework for summarizing reported speech, which generates summaries that are more abstractive and factually consistent.
We present NewsClaims, a new benchmark for knowledge-aware claim detection that redefines the claim detection problem to include extraction of additional attributes related to the claim. NewsClaims benchmarks claim detection in emerging scenarios, comprising unseen topics with no training data.
We propose a targeted synthetic data generation method that identifies poorly attended entities and conditions generation on them, teaching neural IR models to attend more uniformly and robustly to all entities in a given passage.
We show that synthetic examples generated using a sequence-to-sequence generator can be effective in improving the robustness of neural IR, with gains in both in-domain and out-of-domain scenarios.
We propose a fine-grained claim detection framework that leverages zero-shot question answering using directed questions to solve a diverse set of sub-tasks such as topic filtering, claim object detection, and claimer detection.
We propose a new benchmark for multimedia question answering over news articles and introduce a novel data generation framework for generating questions that are grounded on objects in images and answered using the news body text.
We present COVID-19 Claim Radar, a system that automatically extracts claims relating to COVID-19 in news articles. We provide a comprehensive structured view of such claims, with rich attributes (such as claimers and their affiliations) and associated knowledge elements (such as events, relations and entities).
We explore using a synthetic example generation approach to improve the performance of state-of-the-art open-domain end-to-end question answering systems in a specialized domain, such as COVID-19.
While most previous work addresses fake news detection at the document level, we are the first to propose misinformation detection at the knowledge-element level, which both achieves higher detection accuracy and makes the results more explainable.
We introduce a neuro-symbolic question answering system that leverages AMR for question understanding and uses a pipeline-based approach involving a semantic parser, entity and relationship linkers and a neuro-symbolic reasoner.
We formulate synthetic pre-training tasks that can transfer to downstream tasks, by using structure in unlabeled text. We show considerable gains on multiple tasks in the IT domain: question answering, document ranking and duplicate question detection.
We propose an approach for correcting partial-match answers (EM=0, 0&lt;F1&lt;1) into exact matches (EM=1, F1=1), obtaining up to 1.3% improvement over a RoBERTa-based machine reading comprehension system in both monolingual and multilingual evaluation.
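For readers unfamiliar with the EM/F1 criterion above, here is a minimal sketch of the standard SQuAD-style metrics that define a partial match. This simplified version uses whitespace tokenization with no answer normalization, purely for illustration.

```python
# Hedged sketch of the exact-match (EM) and token-level F1 metrics used to
# identify partial-match answers (EM=0, 0<F1<1), the targets of answer correction.
from collections import Counter

def exact_match(pred, gold):
    """1 if the prediction string equals the gold answer exactly, else 0."""
    return int(pred == gold)

def token_f1(pred, gold):
    """Token-overlap F1 between prediction and gold answer."""
    p, g = pred.split(), gold.split()
    common = Counter(p) & Counter(g)       # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

# A partial-match answer that correction would turn into an exact match:
print(exact_match("the Eiffel Tower", "Eiffel Tower"))  # 0
print(token_f1("the Eiffel Tower", "Eiffel Tower"))     # 0.8
```

Answer correction then rewrites such predictions (e.g., trimming the spurious "the") so that both EM and F1 reach 1.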
We propose self-learning approaches to improve AMR parsers, via generation of synthetic text and synthetic AMR as well as refinement of actions from the oracle. We achieve state-of-the-art performance in AMR parsing on benchmark AMR 1.0 and AMR 2.0 datasets.
We design a novel multi-level memory architecture that retains the natural hierarchy of the knowledge base without breaking it into subject-relation-object triples. We use separate memories for the dialog context and the KB to learn different memory readers.
We design a modular network that uses depth-wise and 1D convolutions for visual reasoning on scientific plots. We achieve state-of-the-art accuracy on the FigureQA dataset, bettering Relation Networks by 7% with over an order of magnitude less training time.
We propose a graph generative model based on probabilistic edge replacement grammars. We design an algorithm to build graph grammars by capturing the statistically significant sub-graph patterns.