Bailin Wang
107 posts
@bailin_28
LLM researcher
Cambridge
berlino.github.io
Joined February 2012
2,569
Following
1,649
Followers
  • Bailin Wang
    @bailin_28
    Sep 29, 2024
    I'm looking for a Summer '25 intern at Apple AI/ML, New York. Focus: long-context modeling for LLM pretraining. Apply link: shorturl.at/wyK0G. Please also email me your resume after applying.
    170K
  • Bailin Wang
    @bailin_28
    Jan 29, 2024
    What is the landscape of sequence models (e.g., Mamba, GLA), and what are the intrinsic limitations (if any) of those efficient architectures? I'm really excited about our effort on these questions, from the perspective of formal language learning and in-context learning.
    Ekin Akyürek
    @akyurekekin
    Jan 29, 2024
    Can insights from synthetic experiments and interpretability lead to real improvements in language modeling? We: > propose a formal model for in-context learning > uncover "n-gram heads" = high order induction heads, crucial for ICLL > improve Transformer LM perplexity by 6.7%
    https://arxiv.org/abs/2401.12973
    12K
  • Bailin Wang
    @bailin_28
    Dec 12, 2023
    It seems that data-dependent gating is the core ingredient for effective linear-complexity alternatives to softmax attention, as shown in both our GLA and Mamba. We (the authors) are also at NeurIPS, and happy to chat more about this!
    Yikang Shen
    @Yikang_Shen
    Dec 12, 2023
    Impressed by the performance of Mamba and believe in RNN? We provide a simple alternative solution! Excited to share Gated Linear Attention (GLA-Transformer). (1/n) arxiv.org/abs/2312.06635
    9.6K
  • Bailin Wang
    @bailin_28
    Nov 27, 2023
    Check out Linlu’s recent work on LMs’ phenomenal 🤩 yet puzzling 🧐 behavior!
    Linlu Qiu
    @linluqiu
    Nov 27, 2023
    How good are LMs at inductive reasoning? How are their behaviors similar to/contrasted with those of humans? We study these via iterative hypothesis refinement. We observe that LMs are phenomenal hypothesis proposers, but they also behave as puzzling inductive reasoners: (1/n)
    6.2K
  • Bailin Wang
    @bailin_28
    Dec 12, 2023
    @Ziwphd and I will present our Grammar Prompting work tomorrow at #NeurIPS2023. Happy to chat more on:
    - LLMs for structured language generation (e.g., programs, molecules, and robotic plans)
    - Structured/formal chain-of-thought for reasoning
    5.2K
  • Bailin Wang
    @bailin_28
    Nov 7, 2023
    Exciting!
    Leroy Wang
    @LiruiWang1
    Nov 7, 2023
    Replying to @LiruiWang1
    Excited to share that GenSim won the outstanding paper award at the LangRob workshop at CoRL 2023!
    1.3K
  • Bailin Wang
    @bailin_28
    Jan 29, 2024
    Replying to @bailin_28
    I was also quite surprised that all models we tested (including standard attention) can benefit consistently from adding a simple static n-gram head.
    937
  • Bailin Wang
    @bailin_28
    Dec 12, 2023
    Replying to @bailin_28
    @srush_nlp @jefrankle’s bet (isattentionallyouneed.com) might depend on whether the question was “Is softmax attention all you need?” :)
    387
  • Bailin Wang
    @bailin_28
    Jul 8, 2020
    Interested in NL-to-SQL? Come to our Q&A session 14A. We proposed integrating structured relations into the Transformer to bias representation learning. #acl2020nlp Paper: virtual.acl2020.org/paper_main.677… Code: github.com/microsoft/rat-… Joint work with @rshin, @Skiminok, @AllenLao, and Matt
    GitHub - microsoft/rat-sql: A relation-aware semantic parsing model from English to SQL
    From github.com
  • Bailin Wang
    @bailin_28
    Apr 2, 2019
    Well, meditation really helps me pull myself together and become more aware of the stress I'm under.
  • Bailin Wang
    @bailin_28
    Jul 8, 2017
    Uncertainty in Deep Learning (PhD Thesis) | Yarin Gal - Blog | Cambridge Machine Learning Group mlg.eng.cam.ac.uk/yarin/blog_224… via @yaringal
  • Bailin Wang
    @bailin_28
    Apr 25, 2024
    Replying to @sivil_taram and @Francis_YAO_
    Very interesting work! @Francis_YAO_ yeah, we found that sub-quadratic models (e.g., linear attention, SSMs) are not good at in-context learning, primarily because of their limited retrieval ability.
    297
  • Bailin Wang
    @bailin_28
    May 23, 2017
    Replying to @SugarKLeonard2
    One more year, Manu😢
  • Bailin Wang
    @bailin_28
    Jun 9, 2017
    “An Adversarial Review of ‘Adversarial Generation of Natural Language’” by Yoav Goldberg medium.com/@yoav.goldberg…
