Chunting Zhou

142 posts

@violet_zct

Past: Research Scientist at FAIR | PhD @CMU. she/her.

Seattle, WA

Joined July 2015

Pinned
Chunting Zhou
@violet_zct
Aug 21, 2024
Introducing *Transfusion* - a unified approach for training models that can generate both text and images. arxiv.org/pdf/2408.11039 Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This
240K
Chunting Zhou
@violet_zct
May 22, 2023
How do you turn a language model into a chatbot without any user interactions? We introduce LIMA: a LLaMa-based model fine-tuned on only 1,000 curated prompts and responses, which produces shockingly good responses. * No user data * No model distillation * No RLHF
180K
Chunting Zhou
@violet_zct
Sep 23, 2022
I'm excited to share our work on a new sequence modeling architecture called Mega: Moving Average Equipped Gated Attention. Mega achieves SOTA results on multiple benchmarks, including NMT, Long Range Arena, language modeling, ImageNet and raw speech classification.
Chunting Zhou
@violet_zct
Apr 16, 2024
How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)? We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head
97K
Chunting Zhou
@violet_zct
Dec 26, 2019
"Understanding Knowledge Distillation in Non-autoregressive Machine Translation" (openreview.net/forum?id=BygFV…, accepted by ICLR 2020!): We systematically studied why and how non-autoregressive generation models benefit from knowledge distillation.
openreview.net
Understanding Knowledge Distillation in Non-autoregressive Machine...
We systematically examine why knowledge distillation is crucial to the training of non-autoregressive translation (NAT) models, and propose methods to further improve the distilled data to best...
Chunting Zhou
@violet_zct
Sep 29, 2022
Mega is now open source at: github.com/facebookresear… Feel free to play with it! @MaxMa1987 @XiangKong4 @junxian_he @liangkegui @jonathanmay @gneubig @LukeZettlemoyer
Chunting Zhou
@violet_zct
Sep 6, 2019
Introducing FlowSeq: Generative Flow-based Non-Autoregressive Seq2Seq generation @emnlp2019 (arxiv.org/abs/1909.02480)! FlowSeq allows for efficient parallel decoding while modeling the joint distribution of the output sequence. Our code is at github.com/XuezheMax/flow….
Chunting Zhou
@violet_zct
Oct 13, 2021
Check out our recent work on parameter-efficient fine-tuning. We present a unified framework that establishes connections between state-of-the-art methods (e.g. Prefix-Tuning, Adapters, LoRA). Great collaboration with @junxian_he and others @MaxMa1987, @BergKirkpatrick @gneubig!
Chunting Zhou
@violet_zct
May 17, 2024
🚀 Excited to introduce Chameleon, our work in mixed-modality early-fusion foundation models from last year! 🦎 Capable of understanding and generating text and images in any sequence. Check out our paper to learn more about its SOTA performance and versatile capabilities!
AI at Meta
@AIatMeta
May 17, 2024
Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models. This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence. Paper ➡️ go.fb.me/7rb19n
18K
Chunting Zhou
@violet_zct
Sep 14, 2021
I am excited to introduce our EMNLP paper "Distributionally Robust Multilingual Machine Translation". To encourage uniform performance across languages, we propose a new learning objective for multilingual training based on the concept of distributionally robust optimization.
Chunting Zhou
@violet_zct
Aug 21, 2024
Replying to @violet_zct
Modeling innovations: * Global causal attention + bidirectional attention within each image is crucial. * Introducing modality-specific encoding and decoding layers improves performance and can compress each image to 64 or even 16 patches! 4/5
27K
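The attention pattern described in the post above (causal attention across the whole sequence, bidirectional attention among the patches of one image) can be illustrated with a small mask-building sketch. This is only an illustration under stated assumptions, not the Transfusion implementation: the function name `mixed_modality_mask` and the `image_spans` input format are hypothetical, chosen for this example.

```python
def mixed_modality_mask(seq_len, image_spans):
    """Build a seq_len x seq_len boolean attention mask.

    mask[q][k] is True when query position q may attend to key position k.
    image_spans is a list of half-open (start, end) index ranges, one per
    image. This input format is an assumption made for the sketch.
    """
    # Global causal attention: each position sees itself and earlier positions.
    mask = [[k <= q for k in range(seq_len)] for q in range(seq_len)]
    # Bidirectional attention within each image: every patch of an image
    # may also see the later patches of that same image.
    for start, end in image_spans:
        for q in range(start, end):
            for k in range(start, end):
                mask[q][k] = True
    return mask


# Example layout: 2 text tokens, one 3-patch image at positions 2-4, 1 text token.
mask = mixed_modality_mask(6, [(2, 5)])
```

In this layout the first image patch (position 2) can attend to the last patch (position 4), while the text token at position 1 still cannot see any image patch that follows it.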
Chunting Zhou
@violet_zct
Sep 4, 2019
Does syntactic reordering help in Neural Machine Translation? Check out our @emnlp2019 paper arxiv.org/pdf/1909.00040…: we reorder target-language monolingual sentences into the source order and use them as an additional source of training supervision. (1/2)
Chunting Zhou
@violet_zct
Aug 1, 2021
Neural sequence generation models can hallucinate unfaithful content that is not supported by the source input. How can we detect the hallucinated tokens and reduce them? Check out our ACL Findings paper: Detecting Hallucinated Content in Conditional Neural Sequence Generation.
Chunting Zhou
@violet_zct
May 22, 2023
Replying to @violet_zct
So what do we learn about alignment? One hypothesis is that almost all of the knowledge and capabilities are already learned in pretraining, and alignment just needs to teach the model which format or style to use when interacting with a user. Let me know if you have other hypotheses! (8/8)
4.4K