Bailin Wang

107 posts

@bailin_28

LLM researcher

Cambridge

Joined February 2012

Bailin Wang
@bailin_28
Sep 29, 2024
I'm looking for a summer '25 intern at Apple AI/ML, New York. Focus: long-context modeling for LLM pretraining. Apply link: shorturl.at/wyK0G Please also email me your resume after applying.
170K
Bailin Wang
@bailin_28
Jan 29, 2024
What is the landscape of sequence models (e.g., Mamba, GLA), and what are the intrinsic limitations (if any) of those efficient architectures? I'm really excited about our effort on these questions, from the perspective of formal language learning and in-context learning.
Ekin Akyürek
@akyurekekin
Jan 29, 2024
Can insights from synthetic experiments and interpretability lead to real improvements in language modeling? We: > propose a formal model for in-context learning > uncover "n-gram heads" = high order induction heads, crucial for ICLL > improve Transformer LM perplexity by 6.7%
12K
Bailin Wang
@bailin_28
Dec 12, 2023
It seems that data-dependent gating is the core ingredient for effective linear-complexity alternatives to softmax attention, as shown in both our GLA and Mamba. We (the authors) are also at NeurIPS, and happy to chat more about this!
Yikang Shen
@Yikang_Shen
Dec 12, 2023
Impressed by the performance of Mamba and believe in RNN? We provide a simple alternative solution! Excited to share Gated Linear Attention (GLA-Transformer). (1/n) arxiv.org/abs/2312.06635
9.6K
Bailin Wang
@bailin_28
Nov 27, 2023
Check out Linlu’s recent work on LMs’ phenomenal 🤩 yet puzzling 🧐 behavior!
Linlu Qiu
@linluqiu
Nov 27, 2023
How good are LMs at inductive reasoning? How are their behaviors similar to/contrasted with those of humans? We study these via iterative hypothesis refinement. We observe that LMs are phenomenal hypothesis proposers, but they also behave as puzzling inductive reasoners: (1/n)
6.2K
Bailin Wang
@bailin_28
Dec 12, 2023
@Ziwphd and I will present our Grammar Prompting work tomorrow at #NeurIPS2023. Happy to chat more about: - LLMs for structured language generation (e.g., programs, molecules, and robotic plans) - Structured/formal chain-of-thought for reasoning
5.2K
Bailin Wang
@bailin_28
Nov 7, 2023
Exciting!
Leroy Wang
@LiruiWang1
Nov 7, 2023
Replying to @LiruiWang1
Excited to share that GenSim won the outstanding paper award at the LangRob workshop at CoRL 2023!
1.3K
Bailin Wang
@bailin_28
Jan 29, 2024
Replying to @bailin_28
I was also quite surprised that all models we tested (including standard attention) can benefit consistently from adding a simple static n-gram head.
937
Bailin Wang
@bailin_28
Dec 12, 2023
Replying to @bailin_28
@srush_nlp @jefrankle’s bet (isattentionallyouneed.com) might depend on whether the question was “Is softmax attention all you need?” :)
387
Bailin Wang
@bailin_28
Jul 8, 2020
Interested in NL → SQL? Come to our Q&A session 14A. We propose integrating structured relations into the transformer to bias representation learning. #acl2020nlp Paper: virtual.acl2020.org/paper_main.677… Code: github.com/microsoft/rat-… Joint work with @rshin, @Skiminok, @AllenLao, and Matt
GitHub - microsoft/rat-sql: A relation-aware semantic parsing model from English to SQL
From github.com
Bailin Wang
@bailin_28
Apr 2, 2019
Well, meditation really helps me pull myself together and become more aware of the stress I'm under.
Bailin Wang
@bailin_28
Jul 8, 2017
Uncertainty in Deep Learning (PhD Thesis) | Yarin Gal - Blog | Cambridge Machine Learning Group mlg.eng.cam.ac.uk/yarin/blog_224… via @yaringal
Bailin Wang
@bailin_28
Apr 25, 2024
Replying to @sivil_taram and @Francis_YAO_
Very interesting work! @Francis_YAO_ yeah, we found that sub-quadratic models (e.g., linear attention, SSMs) are not good at in-context learning, primarily because of their limited ability to retrieve.
297
Bailin Wang
@bailin_28
May 23, 2017
Replying to @SugarKLeonard2
One more year, Manu😢
Bailin Wang
@bailin_28
Jun 9, 2017
“An Adversarial Review of ‘Adversarial Generation of Natural Language’” by Yoav Goldberg medium.com/@yoav.goldberg…