I'm a third-year PhD student in computer science at Cornell, where I'm fortunate to be advised by Raaz Dwivedi and Kilian Q. Weinberger.
I'm currently working on using distribution compression (a.k.a. "thinning") to speed up training and inference of large language models (LLMs). My longer-term goal is to enable LLM agents to efficiently search over large and dynamic datastores.
Tl;dr: Developed an agentic workflow for multi-step question answering that coordinates multiple LLM calls to achieve stronger reasoning than vanilla Chain-of-Thought.
Tl;dr: Developed a new analysis of thinning algorithms that adapts to low-rank structure, enabling faster dot-product attention in Transformers (Thinformer), stochastic gradient descent (KH-SGD), and deep kernel hypothesis testing (DeepCTT).