Lucas Beyer (bl16) (@giffmana) / X

27.4K posts

@giffmana

Researcher (now: Meta. ex: OpenAI, DeepMind, Brain, RWTH Aachen), Gamer, Hacker, Belgian. Anon feedback: admonymous.co/giffmana ✗DMs → email

Zürich, Suisse

Joined December 2013

Pinned
Lucas Beyer (bl16)
@giffmana
Sep 14, 2022
My Transformer tutorial slides are now available at lucasb.eyer.be/transformer. I'll append recordings to this thread as I get them. If you want to use some of the slides for your lecture, you may, as long as you credit me. If you'd like me to give the lecture: maybe; e-mail me.
Lucas Beyer (bl16)
@giffmana
Sep 11, 2022
Giving a lecture introducing the Transformer architecture in all gory details at @M2lSchool tomorrow. Also got permission to publish slides and will share recording if/when I get one. It's a pretty cool set of slides, largely thanks to @_basilM for inspiration!
Lucas Beyer (bl16)
@giffmana
Sep 23, 2025
Did you know that when they say stuff like "The A18 uses TSMC's 3nm process" or "announced the 2nm node", the 3nm or 2nm doesn't actually mean anything?! It's just like a version number. They make it up. Literally nothing measures 2nm or 3nm. I certainly didn't know.
773K
Lucas Beyer (bl16)
@giffmana
Nov 15, 2025
I would like to, for no reason at all, remind my dear Googlers of the yearly training they get about insider trading.
Sundar Pichai
@sundarpichai
Nov 14, 2025
🤔🤔
1.4M
Lucas Beyer (bl16)
@giffmana
Jun 26, 2025
hey all, couple quick notes: 1) yes, we will be joining Meta. 2) no, we did not get 100M sign-on, that's fake news. Excited about what's ahead though, will share more in due time! cc @__kolesnikov__ and @XiaohuaZhai.
705K
Lucas Beyer (bl16)
@giffmana
Dec 29, 2022
How good of a BERT can one get in ONE DAY on ONE GPU? With all the recent studies about scaling compute up, this paper takes a refreshing turn and does a deep dive into scaling down compute. It's well written and chock-full of insights. Here is my summary and my opinions. 🧶 1/N
851K
Lucas Beyer (bl16)
@giffmana
Sep 15, 2025
Huh, did you really put your height and weight in your resume in the '70s!?
Priyanshu Priyank
@PriyanshuP1405
Sep 14, 2025
Bill Gates' resume as a fresher at Harvard. The resume and experience are still better than probably 99% of the college students in tech out there.
352K
Lucas Beyer (bl16)
@giffmana
Feb 6, 2025
Our PR folks somehow just forgot that they bought chat.com or something lol
OpenAI
@OpenAI
Feb 5, 2025
ChatGPT search is now available to everyone on chatgpt.com — no sign up required.
764K
Lucas Beyer (bl16)
@giffmana
Jun 21, 2025
Ladies and gentlemen, I present to you the most surreal exchange I've had on X, the everything app:
230K
Lucas Beyer (bl16)
@giffmana
Dec 27, 2024
This actually reproduces as of today. In 5 out of 8 generations, DeepSeekV3 claims to be ChatGPT (v4), while claiming to be DeepSeekV3 only 3 times. Gives you a rough idea of some of their training data distribution.
Ross Lazer
@rosslazer
Dec 27, 2024
Replying to @mathemagic1an
LOL I'm coming around to your theory
1.1M
Lucas Beyer (bl16)
@giffmana
Nov 20, 2024
Hahahaha
Liron Shapira
@liron
Nov 20, 2024
Replying to @liron
astralcodexten.com/p/how-did-you-…
287K
Lucas Beyer (bl16)
@giffmana
Aug 19, 2025
Looks like there's an ongoing xAI exodus. Wild.
709K
Lucas Beyer (bl16)
@giffmana
Jul 1, 2025
guys I'm under observation now👀
234K
Lucas Beyer (bl16)
@giffmana
Feb 4, 2025
I took a brief look at the Harmonic Loss paper. tl;dr: instead of dot-product with softmax, do Euclidean distance with normalized 1/d**n. I kinda want this to work. I've dabbled with preferring Euclidean many times throughout my career (e.g. triplet loss). However...
David D. Baek
@dbaek__
Feb 4, 2025
1/9 🚨 New Paper Alert: Cross-Entropy Loss is NOT What You Need! 🚨 We introduce harmonic loss as alternative to the standard CE loss for training neural networks and LLMs! Harmonic loss achieves 🛠️significantly better interpretability, ⚡faster convergence, and ⏳less grokking!
476K
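The tl;dr in the post above ("euclid dist with normalized 1/d**n") can be sketched roughly as follows. This is a minimal illustration of that one-line summary, not the paper's implementation: the function names `harmonic_probs` and `harmonic_loss`, the default exponent `n=2.0`, and the `eps` guard are all assumptions made for the example.

```python
import math


def harmonic_probs(x, class_centers, n=2.0, eps=1e-12):
    """Class probabilities from inverse-power Euclidean distances.

    x             : input vector (list of floats)
    class_centers : one weight vector per class (hypothetical name)
    n             : exponent applied to the distance (assumed default)
    eps           : small constant guarding against division by zero (assumption)
    """
    # d_i = Euclidean distance between the input and each class vector,
    # used in place of the usual dot-product logit.
    dists = [math.dist(x, w) for w in class_centers]
    # Unnormalized score 1 / d_i**n: closer class vectors score higher.
    scores = [1.0 / (d ** n + eps) for d in dists]
    # Normalize the scores so they sum to 1, in place of a softmax.
    total = sum(scores)
    return [s / total for s in scores]


def harmonic_loss(x, class_centers, target, n=2.0):
    """Negative log-probability of the target class."""
    return -math.log(harmonic_probs(x, class_centers, n)[target])


# Toy usage: the input sits at distance 1 from class 0 and distance 2 from class 1.
probs = harmonic_probs([0.0, 0.0], [[1.0, 0.0], [2.0, 0.0]])
loss = harmonic_loss([0.0, 0.0], [[1.0, 0.0], [2.0, 0.0]], target=0)
```

With n=2, the toy input gets scores of about 1 and 0.25, so class 0 receives roughly 80% of the probability mass. Unlike softmax over dot products, the score depends only on how close the input is to each class vector.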
Lucas Beyer (bl16)
@giffmana
Jan 10, 2025
I've been trying out Cursor with o1 for a few weeks now, and it's been giving me proper "holy shit, this changes things a bit" vibes. The most impressive to me is not the "generate code for XYZ" you see everywhere. That's nice, but I can also do that myself just fine, so it's
250K