Chunting Zhou
142 posts
Chunting Zhou
@violet_zct
Past: Research Scientist at FAIR | PhD @CMU. she/her.
Seattle, WA
violet-zct.github.io
Joined July 2015
310 Following
3,399 Followers
  • Pinned
    Chunting Zhou
    @violet_zct
    Aug 21, 2024
    Introducing *Transfusion* - a unified approach for training models that can generate both text and images. arxiv.org/pdf/2408.11039 Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This
    240K
  • Chunting Zhou
    @violet_zct
    May 22, 2023
    How do you turn a language model into a chatbot without any user interactions? We introduce LIMA: a LLaMA-based model fine-tuned on only 1,000 curated prompts and responses, which produces shockingly good responses. * No user data * No model distillation * No RLHF
    180K
  • Chunting Zhou
    @violet_zct
    Sep 23, 2022
    I'm excited to share our work on a new sequence modeling architecture called Mega: Moving Average Equipped Gated Attention. Mega achieves SOTA results on multiple benchmarks, including NMT, Long Range Arena, language modeling, ImageNet and raw speech classification.
  • Chunting Zhou
    @violet_zct
    Apr 16, 2024
    How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)? We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head
    97K
  • Chunting Zhou
    @violet_zct
    Dec 26, 2019
    "Understanding Knowledge Distillation in Non-autoregressive Machine Translation" (openreview.net/forum?id=BygFV…, accepted by ICLR 2020!): We first systematically studied why and how non-autoregressive generation models benefit from the knowledge distillation technique.
    openreview.net
    Understanding Knowledge Distillation in Non-autoregressive Machine...
    We systematically examine why knowledge distillation is crucial to the training of non-autoregressive translation (NAT) models, and propose methods to further improve the distilled data to best...
  • Chunting Zhou
    @violet_zct
    Sep 29, 2022
    Mega is now open source at: github.com/facebookresear… Feel free to play with it! @MaxMa1987 @XiangKong4 @junxian_he @liangkegui @jonathanmay @gneubig @LukeZettlemoyer
    Chunting Zhou
    @violet_zct
    Sep 23, 2022
    I'm excited to share our work on a new sequence modeling architecture called Mega: Moving Average Equipped Gated Attention. Mega achieves SOTA results on multiple benchmarks, including NMT, Long Range Arena, language modeling, ImageNet and raw speech classification.
  • Chunting Zhou
    @violet_zct
    Sep 6, 2019
    Introducing FlowSeq: Generative Flow-based Non-Autoregressive Seq2Seq generation @emnlp2019 (arxiv.org/abs/1909.02480)! FlowSeq allows for efficient parallel decoding while modeling the joint distribution of the output sequence. Our code is at github.com/XuezheMax/flow….
  • Chunting Zhou
    @violet_zct
    Oct 13, 2021
    Check out our recent work on parameter-efficient fine-tuning. We present a unified framework that establishes connections between state-of-the-art methods (e.g. Prefix-Tuning, Adapters, LoRA). Great collaboration with @junxian_he and others @MaxMa1987, @BergKirkpatrick @gneubig!
  • Chunting Zhou
    @violet_zct
    May 17, 2024
    🚀 Excited to introduce Chameleon, our work in mixed-modality early-fusion foundation models from last year! 🦎 Capable of understanding and generating text and images in any sequence. Check out our paper to learn more about its SOTA performance and versatile capabilities!
    AI at Meta
    Meta
    @AIatMeta
    May 17, 2024
    Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models. This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence. Paper ➡️ go.fb.me/7rb19n
    18K
  • Chunting Zhou
    @violet_zct
    Sep 14, 2021
    I am excited to introduce our EMNLP paper "Distributionally Robust Multilingual Machine Translation". To encourage uniform performance across languages, we propose a new learning objective for multilingual training based on the concept of distributionally robust optimization.
  • Chunting Zhou
    @violet_zct
    Aug 21, 2024
    Replying to @violet_zct
    Modeling innovations: * Global causal attention + bidirectional attention within each image is crucial. * Introducing modality-specific encoding and decoding layers improves performance and can compress each image to 64 or even 16 patches! 4/5
    27K
  • Chunting Zhou
    @violet_zct
    Sep 4, 2019
    Does syntactic reordering help in Neural Machine Translation? Check out our @emnlp2019 paper arxiv.org/pdf/1909.00040…: we reorder target-language monolingual sentences into the source order and use them as an additional source of training supervision. (1/2)
  • Chunting Zhou
    @violet_zct
    Aug 1, 2021
    Neural sequence generation models can hallucinate unfaithful content that is not supported by the source input. How can we detect the hallucinated tokens and reduce them? Check out our ACL Findings paper: Detecting Hallucinated Content in Conditional Neural Sequence Generation.
  • Chunting Zhou
    @violet_zct
    May 22, 2023
    Replying to @violet_zct
    So what do we learn about alignment? One hypothesis is that almost all of the knowledge and capabilities are already learned in pretraining, and alignment just needs to teach the model which format or style to use when interacting with a user. Let me know if you have other hypotheses! (8/8)
    4.4K
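
Note on the Aug 21, 2024 reply above ("Global causal attention + bidirectional attention within each image"): one way to picture that attention pattern is as a boolean mask that is causal everywhere, except that patches belonging to the same image may also attend to each other in both directions. The sketch below is only an illustration under assumptions, not the Transfusion implementation: the function name `mixed_modality_mask`, the `TEXT` marker, and the per-token image-id encoding are all hypothetical.

```python
from typing import List

TEXT = -1  # hypothetical marker: this position holds a text token


def mixed_modality_mask(image_ids: List[int]) -> List[List[bool]]:
    """Build an attention mask for a mixed text/image sequence.

    image_ids[i] is TEXT for a text token, or a non-negative id shared
    by every patch of the same image (an assumed encoding).
    mask[q][k] is True when query position q may attend to key position k.
    """
    n = len(image_ids)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            # Global causal rule: attend to the current and earlier positions.
            causal = k <= q
            # Bidirectional rule: patches of the same image see each other.
            same_image = image_ids[q] != TEXT and image_ids[q] == image_ids[k]
            mask[q][k] = causal or same_image
    return mask


# Two text tokens, three patches of image 0, then one more text token.
mask = mixed_modality_mask([TEXT, TEXT, 0, 0, 0, TEXT])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row is a query position; an `x` marks a key position it may attend to. Text rows stay strictly lower-triangular, while the three image rows form a fully filled block.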