Alexander Kolesnikov
444 posts
@__kolesnikov__
Joined January 2019
196 Following · 11.5K Followers
  • Alexander Kolesnikov
    @__kolesnikov__
    Dec 4, 2024
    Ok, it is yesterday's news already, but a good night's sleep is important. After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @XiaohuaZhai and @giffmana, we will establish the OpenAI Zurich office. Proud of our past work and looking forward to the future.
    110K
  • Alexander Kolesnikov
    @__kolesnikov__
    Feb 17, 2023
    Vision meets RL! We reveal that policy gradient can be used for tuning vision models to optimize complex metrics, such as mAP, PQ or “color diversity”, observing large performance boosts on tasks like object detection, panoptic segmentation, etc. arxiv.org/abs/2302.08242
    218K
  • Alexander Kolesnikov
    @__kolesnikov__
    May 23, 2022
    I've always been frustrated that, beyond image classification, computer vision is full of complex and task-specific components. Thus, very excited to share our new work, where we propose a unified modeling approach for vision: arxiv.org/abs/2205.10337. More in the thread🧵.
  • Alexander Kolesnikov
    @__kolesnikov__
    May 4, 2022
    Let me introduce big_vision: the original home of ViT, MLP-Mixer, LiT and many more. These days I do all my research in this codebase: it is great for doing vision research with an emphasis on large-scale pretraining and transfer. Highlights in 🧵 ⬇️ Link:
    github.com/google-research/big_vision: Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
  • Alexander Kolesnikov
    @__kolesnikov__
    Oct 23, 2020
    We release pre-trained vision transformer models and code for inference/fine-tuning: github.com/google-researc…. There is still a long way to go towards understanding transformers in vision, and I am looking forward to future research. Hope this release will be a good starting point.
  • Alexander Kolesnikov
    @__kolesnikov__
    May 14, 2024
    We just released PaliGemma-3B, a very capable Vision-Language Model. Do not waste any time, finetune it for your task: Code: github.com/google-researc… Colab: colab.research.google.com/github/google-… Kaggle: kaggle.com/models/google/… HF: huggingface.co/collections/go… Vertex AI: console.cloud.google.com/vertex-ai/publ…
    28K
  • Alexander Kolesnikov
    @__kolesnikov__
    May 5, 2021
    MLP-Mixer (a new vision architecture based on MLP only) code and pretrained models are now available: github.com/google-researc…. Looking forward to community contributions that will shed some light on how Mixer works and how to make it even better. paper: arxiv.org/abs/2105.01601.
  • Alexander Kolesnikov
    @__kolesnikov__
    Nov 1, 2023
    We've landed a big revamp of github.com/google-researc…. The main new feature is support for flexible weight sharding, which doesn't get in the way of cutting-edge research code. Scaling ViTs, ResNets, MLP-Mixers, SigLIPs (and so on) beyond single GPU/TPU device memory becomes easy.
    github.com/google-research/big_vision: Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
    81K
  • Alexander Kolesnikov
    @__kolesnikov__
    Dec 2, 2024
    I always dreamed of a model that simultaneously 1. optimizes NLL of raw pixel data, 2. generates competitive high-res natural images, 3. is practical. But it seemed too good to be true. Until today! Our new JetFormer model (arxiv.org/abs/2411.19722) ticks all of these boxes. 🧵
    Michael Tschannen
    @mtschannen
    Dec 2, 2024
    Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)? We have been pondering this over the summer and developed a new model: JetFormer 🌊🤖 arxiv.org/abs/2411.19722 A thread 👇 1/
    68K
  • Alexander Kolesnikov
    @__kolesnikov__
    Dec 20, 2024
    With some delay, JetFormer's *prequel* paper is finally out on arXiv: a radically simple ViT-based normalizing flow (NF) model that achieves SOTA results in its class. Jet is one of the key components of JetFormer, deserving a standalone report. Let's unpack: 🧵⬇️
    57K
  • Alexander Kolesnikov
    @__kolesnikov__
    Jul 28, 2022
    We have open-sourced UViM models and complete training/inference/eval code. You can now train new models yourself and explore the released models (and UViM guiding codes) in the interactive colabs. All available at github.com/google-researc…. UViM paper: arxiv.org/abs/2205.10337.
    UViM predictions in colab.
  • Alexander Kolesnikov
    @__kolesnikov__
    Nov 15, 2021
    Also an interesting survey on MLP-Mixer and concurrent/follow-up research: arxiv.org/abs/2111.04060 Crazy that all of it happened in only ~6 months.
    Neil Houlsby
    @neilhoulsby
    Nov 15, 2021
    An incredibly thorough-looking survey of Vision Transformers! It has only been just over a year since we published ViT. I thought it would be useful, but didn't imagine this much cool innovation would happen. arxiv.org/abs/2111.06091
  • Alexander Kolesnikov
    @__kolesnikov__
    May 16, 2022
    Do not want to miss out on the recent trend, so I officially announce that 1. All my ICML 2022 papers were rejected. 2. All my ICML 2022 papers were accepted. 3. Both statements above are true.
  • Alexander Kolesnikov
    @__kolesnikov__
    Jul 11, 2024
    Our PaliGemma technical report is finally out: arxiv.org/abs/2407.07726. We share many insights that we learned while cooking the PaliGemma-3B model. Both about pretraining and transfer.
    Lucas Beyer (bl16)
    @giffmana
    Jul 10, 2024
    ✨PaliGemma report will hit arxiv tonight. We tried hard to make it interesting, and not "here model. sota results. kthxbye." So here's some of the many interesting ablations we did, check the paper tomorrow for more! 🧶
    arxiv.org
    PaliGemma: A versatile 3B VLM for transfer
    PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base...
    19K
