Roger Grosse (@RogerGrosse) / X

Roger Grosse

1,037 posts

@RogerGrosse

Joined July 2015

Roger Grosse
@RogerGrosse
Nov 24, 2018
Important paper from Google on large batch optimization. They do impressively careful experiments measuring # iterations needed to achieve target validation error at various batch sizes. The main "surprise" is the lack of surprises. [thread] arxiv.org/abs/1811.03600
Roger Grosse
@RogerGrosse
Nov 17, 2018
ICLR reviewers keep insisting on ImageNet experiments and expensive-to-train SOTA architectures. Effectively, they require proof that you've spent sufficiently many GPU cycles for your conclusions to be taken seriously. ICLR papers are a cryptocurrency.
Roger Grosse
@RogerGrosse
Oct 8, 2020
SGD and Adam are good enough for training most neural net architectures.
Roger Grosse
@RogerGrosse
Oct 22, 2018
The Deep Learning and RL Summer School videos are up!
Deep Learning and Reinforcement Learning Summer School, Toronto 2018
From videolectures.net
Roger Grosse
@RogerGrosse
Oct 22, 2021
Google has a significant fraction of the world's top AI talent, and yet Gmail has recently been marking as spam nearly every email from the undergraduates in my ML course. It sometimes even spam filters emails from my grad students or replies to messages I sent.
Roger Grosse
@RogerGrosse
Feb 16, 2024
Here's what I see as a likely AGI trajectory over the next decade. I claim that later parts of the path present the biggest alignment risks/challenges. The alignment world has been focusing a lot on the lower left corner lately, which I'm worried is somewhat of a Maginot line.
Roger Grosse
@RogerGrosse
Dec 29, 2017
My vote for deep learning result of the year -- and this has gotten almost no hype -- is machine translation without parallel text. I'd thought this was impossible. Says something interesting about language. arxiv.org/abs/1710.04087 arxiv.org/abs/1710.11041 arxiv.org/abs/1711.00043
arxiv.org
Word Translation Without Parallel Data
State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can...
Roger Grosse
@RogerGrosse
Aug 16, 2020
This 2019 paper on Fourier analysis of adversarial robustness, by Dong Yin et al., is really worth a look. It gives a simple, intuitive way of understanding a wide variety of adversarial and robustness phenomena. papers.nips.cc/paper/9483-a-f…
Roger Grosse
@RogerGrosse
Oct 26, 2018
Reversible RNNs: reduce memory costs of GRU and LSTM networks by 10-15x without loss in performance. Also 5-10x for attention-based architectures. New paper with Matt MacKay, Paul Vicol, and Jimmy Ba, to appear at NIPS. arxiv.org/abs/1810.10999
Roger Grosse
@RogerGrosse
Mar 8, 2019
Excited to release our paper on Self-Tuning Networks, a way of adapting regularization hyperparameters online during training. This is the work of Matt MacKay, Paul Vicol, and @jonLorraine9, to appear at ICLR 2019.
arxiv.org
Self-Tuning Networks: Bilevel Optimization of Hyperparameters...
Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization...
Roger Grosse
@RogerGrosse
Oct 11, 2020
90% of all confusion about neural net training dynamics would vanish if everyone got used to thinking about and measuring neural net Jacobians, Hessians, Fisher information matrices, etc.
Roger Grosse
@RogerGrosse
Jan 3, 2024
I'm teaching a new course on AI Alignment this term at the University of Toronto. The first half will cover idealized models of future AI systems (optimal planners, universal induction, etc.), and the second half will cover practical alignment techniques in the context of LLMs.
Roger Grosse
@RogerGrosse
Jul 13, 2018
The deep learning revolution happened in Canada because Canadians are used to long winters.
Tim Vieira
@xtimv
Jul 12, 2018
New name for @NipsConference "AI Winter" — Miro Dudík
Roger Grosse
@RogerGrosse
Jun 19, 2017
The Wasserstein GAN should have been called the GAN whose Discriminator's A Lipschitz Function (GANDALF).