Zack Ankner (@ZackAnkner) / X

Zack Ankner

365 posts

@ZackAnkner

Prev @MIT.

zackankner.com

Joined September 2019

Pinned
Zack Ankner
@ZackAnkner
Aug 22, 2024
Excited to announce our new work: Critique-out-Loud (CLoud) reward models. CLoud reward models first produce a chain of thought critique of the input before predicting a scalar reward, allowing reward models to reason explicitly instead of implicitly! arxiv.org/abs/2408.11791
71K
Zack Ankner
@ZackAnkner
Jun 3, 2024
New paper where we explore using a small LM’s perplexity to prune the pretraining data for larger LMs. We find that small LMs can prune data for up to 30x larger LMs, data pruning works in the overtrained and data-constrained regimes, and more!
arxiv.org
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small...
In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing...
73K
Zack Ankner
@ZackAnkner
Sep 4, 2023
My EMNLP paper got desk-rejected post-rebuttal because I posted it to arxiv 25 minutes after the anonymity deadline. I was optimistic about our reviews, so I spent a whole week while visiting my family writing rebuttals and coding experiments to respond.
Naomi Saphra
@nsaphra
Sep 4, 2023
Just got a desk reject, post-rebuttals, for a paper being submitted to arxiv <30 min late for the anonymity deadline. I talk about how the ACL embargo policy hurts junior researchers and makes ACL venues less desirable for NLP work. I don’t talk about the pointless NOISE it adds.
105K
Zack Ankner
@ZackAnkner
Feb 9, 2024
Excited to announce Hydra decoding! 🚀 We introduce sequential dependence in Medusa decoding and achieve up to a 1.31x and 2.71x improvement in throughput as compared to Medusa and baseline decoding! Paper: arxiv.org/abs/2402.05109 Github: github.com/zankner/Hydra
21K
Zack Ankner
@ZackAnkner
Nov 7, 2023
arxiv.org/abs/2310.01889 "Ring Attention with Blockwise Transformers for Near-Infinite Context" TLDR: Ring Attention: A distributed algorithm to efficiently compute exact attention over arbitrarily long sequence lengths. Details in thread 👇
arxiv.org
Ring Attention with Blockwise Transformers for Near-Infinite Context
Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands...
12K
Zack Ankner
@ZackAnkner
May 24, 2023
Should we really use constant masking rates for BERT pretraining? We introduce dynamic masking rate schedules, a simple but effective method for improving masked language modeling (MLM) pretraining. Paper: zackankner.com/mlm-schedule.p…
24K
Zack Ankner
@ZackAnkner
Nov 11, 2024
There have been a lot of anecdotes about the Llama3 series of models being harder to post-training quantize (PTQ) than Llama2. As part of this paper, we investigated the hypothesis that the degradation from PTQ grows with the token-to-parameter ratio (TPR), i.e., as you overtrain.
Tanishq Kumar
@tanishqkumar07
Nov 11, 2024
[1/7] New paper alert! Heard about the BitNet hype or that Llama-3 is harder to quantize? Our new work studies both! We formulate scaling laws for precision, across both pre and post-training arxiv.org/pdf/2411.04330. TLDR; - Models become harder to post-train quantize as they
11K
Zack Ankner
@ZackAnkner
Sep 25, 2024
SCoRe paper was a fun read (arxiv.org/abs/2409.12917)! One surprising result was that you don't effectively learn self-correction with naive multi-turn RL. Instead they do a two-stage approach, where the first stage maximizes the reward while fixing the first-turn's ...
arxiv.org
Training Language Models to Self-Correct via Reinforcement Learning
Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Current methods for training...
7.9K
Zack Ankner
@ZackAnkner
Jul 12, 2024
Hydra was accepted to COLM! Going to be dropping some new perf improvements and batched decoding support as well soon 😁
Zack Ankner
@ZackAnkner
Feb 9, 2024
Excited to announce Hydra decoding! 🚀 We introduce sequential dependence in Medusa decoding and achieve up to a 1.31x and 2.71x improvement in throughput as compared to Medusa and baseline decoding! Paper: arxiv.org/abs/2402.05109 Github: github.com/zankner/Hydra
6.3K
Zack Ankner
@ZackAnkner
Oct 9, 2024
Agreed ;) But in all seriousness, it's cool to see everyone converging on reward models that perform explicit reasoning by critiquing out loud. Super excited to see how people build on top of these works.
Rishabh Agarwal
@agarwl_
Oct 8, 2024
Imitation is the best form of flattery ;) Great to see more work on generative verifiers and reward models.
12K
Zack Ankner
@ZackAnkner
Sep 5, 2024
Code and models for Critique-out-Loud (CLoud) reward models are finally public! The repo comes with a gradio demo you can run, so hopefully people can mess around with the models 😃 Code: github.com/zankner/CLoud
Zack Ankner
@ZackAnkner
Aug 22, 2024
Excited to announce our new work: Critique-out-Loud (CLoud) reward models. CLoud reward models first produce a chain of thought critique of the input before predicting a scalar reward, allowing reward models to reason explicitly instead of implicitly! arxiv.org/abs/2408.11791
8.9K
Zack Ankner
@ZackAnkner
Sep 4, 2023
Replying to @ZackAnkner
It’s especially hard being an undergraduate researcher who already has to balance a full-time class schedule while trying to fit in research. But instead of just complaining I would like to highlight the paper because I am proud of the work we did.
3.5K
Zack Ankner
@ZackAnkner
Oct 6, 2024
Heading to @COLM_conf to present our work Hydra! Would love to meet people there so please DM me if you want to chat about reward models, verifiers, ML sys, inference time compute ... and honestly anything else.
2K
Zack Ankner
@ZackAnkner
Jun 3, 2024
Super cool to see our work picked up by @_akhaliq as someone who has been reading the daily paper dumps for a while 😁! Can also find my summary of the paper here: x.com/ZackAnkner/sta…
AK
@_akhaliq
Jun 3, 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models. In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language
15K