Jannik Kossen
336 posts
Jannik Kossen
@janundnik
AI Research Scientist at FAIR (@meta) working on LLMs for CodeGen and Reasoning. PhD Student @OATML_Oxford and @oxcsml. Interned @DeepMind and @GoogleAI.
Oxford / Berlin
jlko.eu
Joined April 2011
692 Following · 1,693 Followers
  • Jannik Kossen
    @janundnik
    Jun 7, 2021
    🗞New Paper🗞 🤖🧪Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning 🧪🤖 Huge thanks to @neilbband* as well as @clarelyle, @AidanNGomez, @tom_rainforth, @yaringal, and @OATML_Oxford ! Introducing 🚀Non-Parametric Transformers🚀 1/
    Overview of the Non-Parametric Transformer. (a) The input dataset and mask matrix are stacked and (b) linearly embedded for all datapoints independently. NPT then applies (c) Attention Between Datapoints across all n samples of hidden dimension h = d · e. (d) Attention Between Attributes then attends between the attributes for each datapoint independently. We repeat steps (c) and (d) and obtain a final prediction from a separate linear projection.
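Steps (a) to (d) in the caption above can be sketched in a few lines of NumPy. This is a rough illustration, not the authors' implementation: the single-head attention, the residual connections, the random weights, and all names (`npt_forward`, `init_params`, `w_in`, `w_out`) are assumptions made here to keep the example small.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    ex = np.exp(x)
    return ex / ex.sum(axis=axis, keepdims=True)


def self_attention(h, wq, wk, wv):
    """Single-head dot-product self-attention over the second-to-last axis of h."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def init_params(d, e, num_layers, rng):
    """Random weights for illustration only (hypothetical parameter layout)."""
    def attn(width):
        return [rng.normal(scale=width ** -0.5, size=(width, width)) for _ in range(3)]
    return {
        "w_in": rng.normal(size=(d, 2, e)),   # one linear embedding per attribute
        "layers": [{"abd": attn(d * e), "aba": attn(e)} for _ in range(num_layers)],
        "w_out": rng.normal(size=(d, e)),     # final linear projection
    }


def npt_forward(x, mask, params):
    n, d = x.shape
    stacked = np.stack([x, mask], axis=-1)                   # (a) data + mask -> (n, d, 2)
    h = np.einsum("ndc,dce->nde", stacked, params["w_in"])   # (b) embed -> (n, d, e)
    e = h.shape[-1]
    for layer in params["layers"]:
        flat = h.reshape(n, d * e)                           # each row is one datapoint, h = d * e
        flat = flat + self_attention(flat, *layer["abd"])    # (c) attention across the n datapoints
        h = flat.reshape(n, d, e)
        h = h + self_attention(h, *layer["aba"])             # (d) attention across the d attributes
    return np.einsum("nde,de->nd", h, params["w_out"])       # one prediction per attribute


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
mask = np.zeros_like(x)
mask[:, -1] = 1.0                 # treat the last column as the masked target
x_in = x * (1 - mask)             # hide masked values from the model
params = init_params(d=3, e=4, num_layers=2, rng=rng)
pred = npt_forward(x_in, mask, params)
```

Because step (c) attends across rows, the prediction for one datapoint changes when another datapoint in the input changes, which is the behaviour the paper title refers to.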
  • Jannik Kossen
    @janundnik
    Nov 11, 2024
    Life update! I've joined FAIR @Meta as an AI Research Scientist to work on code generation with LLMs in @syhw's team ✨ Thanks to everyone who supported me along the way 🙏 I'm super excited for what's to come!
    27K
  • Jannik Kossen
    @janundnik
    Jul 3, 2023
    My @DeepMind internship project just got published in TMLR 🎉 Looking for a challenging application for your multimodal agents? We propose a temporal, multimodal decision-making task that's hard for contemporary models. 📄 openreview.net/forum?id=Gbu1b…
    27K
  • Jannik Kossen
    @janundnik
    Aug 10, 2023
    Have you heard that In-Context Learning in LLMs does not learn label relationships? Our new paper shows this is usually not true – even for smaller models with < 10B parameters. We also study *how* LLMs incorporate label information in a variety of experiments. 🧵1/N
    26K
  • Jannik Kossen
    @janundnik
    Oct 22, 2021
    Super happy that 🧪🤖 Non-Parametric Transformers 🤖🧪 is accepted to NeurIPS 2021! 🗞Join us at the virtual poster session 📹 Camera-ready with more experiments soon Try NPTs and tell us what you think! @OATML_Oxford @neilbband @clarelyle @AidanNGomez @tom_rainforth @yaringal
  • Jannik Kossen
    @janundnik
    Aug 18, 2023
    Interested in few-shot in-context learning (ICL) in LLMs? You might like to hear about this neat trick. Many papers just report performance at a single fixed number N of in-context examples. Well it turns out, you can get nice ICL training curves like these at *no extra cost*.
    29K
  • Jannik Kossen
    @janundnik
    May 2, 2020
    I am super excited to announce that I will be joining @yaringal and @tom_rainforth at @OATML_Oxford as a PhD student this fall! 🧠🤖
  • Jannik Kossen
    @janundnik
    Jun 19, 2024
    Our work on detecting hallucinations in LLMs just got published in @Nature! Check it out :)
    Sebastian Farquhar
    @seb_far
    Jun 19, 2024
    Is your LLM hallucinating? 👻 Our @Nature paper shows how to detect when an LLM is making things up. A 'confabulating' LLM answers with inconsistent meanings when re-asked the same question. We use this to estimate uncertainty and detect confabulations. Learn more 🧵👇 1/
    9K
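The idea in the quoted post, that inconsistent meanings across repeated answers signal a confabulation, can be illustrated with a small sketch. It groups sampled answers by meaning and computes the entropy over the groups. The function names and the string-matching check `normalized_match` are hypothetical stand-ins chosen here; a real system would need a proper test of whether two answers mean the same thing, and this is not presented as the paper's exact procedure.

```python
import math


def cluster_by_meaning(answers, same_meaning):
    """Greedily group answers; each answer joins the first cluster whose representative matches."""
    clusters = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])  # no existing cluster matched, so start a new one
    return clusters


def meaning_entropy(answers, same_meaning):
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    probs = [len(cluster) / total for cluster in clusters]
    return -sum(p * math.log(p) for p in probs)


def normalized_match(a, b):
    """Toy equivalence check (assumption): equal after trimming, lowercasing, dropping a final period."""
    return a.strip().lower().rstrip(".") == b.strip().lower().rstrip(".")


consistent = ["Paris", "paris.", " Paris "]       # same meaning every time
inconsistent = ["Paris", "Lyon", "Marseille"]     # a different meaning on each re-ask

low = meaning_entropy(consistent, normalized_match)
high = meaning_entropy(inconsistent, normalized_match)
```

Answers that agree in meaning collapse into one cluster and give an entropy of zero, while answers that all disagree give the maximum value for that number of samples.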
  • Jannik Kossen
    @janundnik
    May 16, 2021
    Extremely happy (albeit late) to announce that 🧪🏃‍♀️Active Testing – Sample-Efficient Model Evaluation🏃‍♀️🧪 is accepted to ICML2021! 📚🖨 arxiv.org/abs/2103.05331 Huge thanks to the fantastic co-authors @seb_far, @yaringal, and @tom_rainforth of my first paper at @OATML_Oxford!
    Active Testing: Sample-Efficient Model Evaluation
    We introduce a new framework for sample-efficient model evaluation that we call active testing. While approaches like active learning reduce the number of labels needed for model training,...
  • Jannik Kossen
    @janundnik
    Aug 20, 2023
    "The authors have effectively addressed my concerns with their rebuttal. Therefore, I will not be lowering my score." 🧐
    27K
  • Jannik Kossen
    @janundnik
    Dec 11, 2023
    👀 Looking to improve contrastive learning with pre-trained models? 🎷Check out my @GoogleAI internship project at NeurIPS this week – Poster #807 in Session 1 on Wednesday. 🔥 With Three Towers, the image tower benefits from both contrastive learning and pre-training!
    9.4K
  • Jannik Kossen
    @janundnik
    Oct 31, 2022
    Our NeurIPS Oral🔥 🧪Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation shows actively learned models predict test loss better than using only labels. 📄arxiv.org/abs/2202.06881 Thank you @seb_far @yaringal @tom_rainforth @OATML_Oxford
  • Jannik Kossen
    @janundnik
    Dec 6, 2021
    🤖🧪Non-Parametric Transformers 🧪🤖 🌎@NeurIPSConf Poster Session 1 📆 Dec 7, 4.30 – 6.00 pm GMT Come visit, hang out, or ask questions – help make virtual conferences more fun! neurips.cc/virtual/2021/p… @neilbband @clarelyle @AidanNGomez @tom_rainforth @yaringal @OATML_Oxford
  • Jannik Kossen
    @janundnik
    Jul 19, 2021
    🧐Planning #ICML2021? 👀Chat to us about 'Active Testing' in Session 5 on Thu, 3-6 UTC! 🧪We introduce strategies for sample-efficient model evaluation. ✍️@janundnik @sebfar @yaringal @tom_rainforth @OATML_Oxford @oxcsml 📹icml.cc/virtual/2021/s… 📄proceedings.mlr.press/v139/kossen21a…
    We introduce a new framework for sample-efficient model evaluation that we call active testing. While approaches like active learning reduce the number of labels needed for model training, existing literature largely ignores the cost of labeling test data, typically unrealistically assuming large test sets for model evaluation. This creates a disconnect to real applications, where test labels are important and just as expensive, e.g. for optimizing hyperparameters. Active testing addresses this by carefully selecting the test points to label, ensuring model evaluation is sample-efficient. To this end, we derive theoretically-grounded and intuitive acquisition strategies that are specifically tailored to the goals of active testing, noting these are distinct to those of active learning. As actively selecting labels introduces a bias, we further show how to remove this bias while reducing the variance of the estimator at the same time.
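To make the bias point in the abstract above concrete, here is a minimal simulation of choosing which test points to label non-uniformly and then correcting for that choice. It uses plain importance sampling with replacement, a simpler stand-in and not necessarily the estimator derived in the paper. The acquisition distribution `q`, the `proxy` scores, and the function name are assumptions for illustration.

```python
import numpy as np


def importance_weighted_risk(losses, q, num_labels, rng):
    """Estimate the mean test loss from num_labels points drawn with probabilities q.

    `losses` holds the loss of every test point only because this is a simulation;
    in practice a loss is observed only for the points that get labeled.
    """
    n = len(losses)
    idx = rng.choice(n, size=num_labels, replace=True, p=q)  # test points chosen for labeling
    # Dividing by n * q undoes the bias from sampling some points more often than others.
    return float(np.mean(losses[idx] / (n * q[idx])))


rng = np.random.default_rng(0)
true_losses = rng.uniform(0.0, 1.0, size=200)   # simulated per-point test losses
proxy = true_losses + 0.1                       # hypothetical guess of where the loss is large
q = proxy / proxy.sum()                         # acquisition distribution over test points

estimates = [importance_weighted_risk(true_losses, q, 20, rng) for _ in range(2000)]
```

Averaged over many repetitions, the weighted estimates centre on the true mean loss even though each one uses only 20 labels. Averaging the sampled losses without the weights would overestimate the risk here, since high-loss points are drawn more often.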