Roger Grosse
@RogerGrosse
1,037 posts · Joined July 2015
807 Following · 11.7K Followers
  • Roger Grosse
    @RogerGrosse
    Nov 24, 2018
    Important paper from Google on large batch optimization. They do impressively careful experiments measuring # iterations needed to achieve target validation error at various batch sizes. The main "surprise" is the lack of surprises. [thread] arxiv.org/abs/1811.03600
  • Roger Grosse
    @RogerGrosse
    Nov 17, 2018
    ICLR reviewers keep insisting on ImageNet experiments and expensive-to-train SOTA architectures. Effectively, they require proof that you've spent sufficiently many GPU cycles for your conclusions to be taken seriously. ICLR papers are a cryptocurrency.
  • Roger Grosse
    @RogerGrosse
    Oct 8, 2020
    SGD and Adam are good enough for training most neural net architectures.
  • Roger Grosse
    @RogerGrosse
    Oct 22, 2018
    The Deep Learning and RL Summer School videos are up!
    Deep Learning and Reinforcement Learning Summer School, Toronto 2018
    From videolectures.net
  • Roger Grosse
    @RogerGrosse
    Oct 22, 2021
    Google has a significant fraction of the world's top AI talent, and yet Gmail has recently been marking as spam nearly every email from the undergraduates in my ML course. It sometimes even spam-filters emails from my grad students or replies to messages I sent.
  • Roger Grosse
    @RogerGrosse
    Feb 16, 2024
    Here's what I see as a likely AGI trajectory over the next decade. I claim that later parts of the path present the biggest alignment risks/challenges. The alignment world has been focusing a lot on the lower left corner lately, which I'm worried is somewhat of a Maginot line.
  • Roger Grosse
    @RogerGrosse
    Dec 29, 2017
    My vote for deep learning result of the year -- and this has gotten almost no hype -- is machine translation without parallel text. I'd thought this was impossible. Says something interesting about language. arxiv.org/abs/1710.04087 arxiv.org/abs/1710.11041 arxiv.org/abs/1711.00043
    arxiv.org
    Word Translation Without Parallel Data
    State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can...
  • Roger Grosse
    @RogerGrosse
    Aug 16, 2020
    This 2019 paper on Fourier analysis of adversarial robustness, by Dong Yin et al., is really worth a look. It gives a simple, intuitive way of understanding a wide variety of adversarial and robustness phenomena. papers.nips.cc/paper/9483-a-f…
  • Roger Grosse
    @RogerGrosse
    Oct 26, 2018
    Reversible RNNs: reduce memory costs of GRU and LSTM networks by 10-15x without loss in performance. Also 5-10x for attention-based architectures. New paper with Matt MacKay, Paul Vicol, and Jimmy Ba, to appear at NIPS. arxiv.org/abs/1810.10999
  • Roger Grosse
    @RogerGrosse
    Mar 8, 2019
    Excited to release our paper on Self-Tuning Networks, a way of adapting regularization hyperparameters online during training. This is the work of Matt MacKay, Paul Vicol, and @jonLorraine9, to appear at ICLR 2019.
    arxiv.org
    Self-Tuning Networks: Bilevel Optimization of Hyperparameters...
    Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization...
  • Roger Grosse
    @RogerGrosse
    Oct 11, 2020
    90% of all confusion about neural net training dynamics would vanish if everyone got used to thinking about and measuring neural net Jacobians, Hessians, Fisher information matrices, etc.
  • Roger Grosse
    @RogerGrosse
    Jan 3, 2024
    I'm teaching a new course on AI Alignment this term at the University of Toronto. The first half will cover idealized models of future AI systems (optimal planners, universal induction, etc.), and the second half will cover practical alignment techniques in the context of LLMs.
  • Roger Grosse
    @RogerGrosse
    Jul 13, 2018
    The deep learning revolution happened in Canada because Canadians are used to long winters.
    Quoting:
    Tim Vieira
    @xtimv
    Jul 12, 2018
    New name for @NipsConference "AI Winter" — Miro Dudík
  • Roger Grosse
    @RogerGrosse
    Jun 19, 2017
    The Wasserstein GAN should have been called the GAN whose Discriminator's A Lipschitz Function (GANDALF).
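The Oct 26, 2018 post on Reversible RNNs credits reversibility for the memory savings. The general idea is that if every hidden-state update can be inverted exactly, the backward pass can reconstruct earlier hidden states from later ones instead of storing them all. The sketch below is a generic additive coupling in the style of RevNets, not the reversible GRU/LSTM from the paper; the update functions `f` and `g` and all other names are hypothetical, and integer arithmetic is assumed so that inversion is exact (with floating point, round-off would make reconstruction only approximate).

```python
from typing import List, Tuple

State = Tuple[int, int]


def f(h: int, x: int) -> int:
    # Hypothetical update; the coupling never needs to invert f or g themselves.
    return (3 * h + x) % 97


def g(h: int, x: int) -> int:
    # Hypothetical update for the second half of the state.
    return (h * h + 2 * x) % 89


def forward_step(state: State, x: int) -> State:
    h1, h2 = state
    h1 = h1 + f(h2, x)  # update the first half using the second half
    h2 = h2 + g(h1, x)  # update the second half using the *new* first half
    return h1, h2


def inverse_step(state: State, x: int) -> State:
    h1, h2 = state
    h2 = h2 - g(h1, x)  # undo the second update first
    h1 = h1 - f(h2, x)  # then undo the first
    return h1, h2


def run(inputs: List[int], init: State = (1, 2)) -> State:
    state = init
    for x in inputs:
        state = forward_step(state, x)
    return state  # only the final state is kept


def reconstruct(final: State, inputs: List[int]) -> List[State]:
    # Walk backwards through the inputs, recomputing each earlier state.
    states = [final]
    for x in reversed(inputs):
        states.append(inverse_step(states[-1], x))
    return list(reversed(states))  # ordered from initial to final


inputs = [5, 3, 8, 1, 4]
final = run(inputs)
states = reconstruct(final, inputs)
print(states[0])  # prints (1, 2), the initial state
```

Only the final state and the inputs are needed to recover every intermediate state, which is the property that lets reversible architectures trade extra computation for memory.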
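The Self-Tuning Networks abstract (Mar 8, 2019 post) frames hyperparameter optimization as a bilevel problem: the optimal parameters on the training set depend on the hyperparameters, and the hyperparameters are chosen to do well on validation data. The sketch below illustrates only that formulation, not the paper's method, which per the post adapts regularization hyperparameters online during training. It assumes a one-dimensional ridge regression whose inner optimum w*(lam) has a closed form, so the hypergradient dL_val/dlam follows from the chain rule. The data, step size, and function names are hypothetical.

```python
from typing import List

# Hypothetical toy data (not from the paper): y is roughly 2 * x.
X_TRAIN: List[float] = [1.0, 2.0, 3.0, 4.0]
Y_TRAIN: List[float] = [2.5, 3.5, 6.5, 8.0]
X_VAL: List[float] = [1.5, 2.5, 3.5]
Y_VAL: List[float] = [3.0, 5.0, 7.0]

SXY = sum(x * y for x, y in zip(X_TRAIN, Y_TRAIN))  # sum of x*y on the training set
SXX = sum(x * x for x in X_TRAIN)                    # sum of x^2 on the training set


def inner_optimum(lam: float) -> float:
    """Lower level: w*(lam) = argmin_w sum((w*x - y)^2) + lam * w^2, in closed form."""
    return SXY / (SXX + lam)


def val_loss(lam: float) -> float:
    """Upper level: validation loss evaluated at the lower-level optimum."""
    w = inner_optimum(lam)
    return sum((w * x - y) ** 2 for x, y in zip(X_VAL, Y_VAL))


def hypergradient(lam: float) -> float:
    """Chain rule: dL_val/dlam = dL_val/dw * dw*/dlam."""
    w = inner_optimum(lam)
    dval_dw = sum(2.0 * (w * x - y) * x for x, y in zip(X_VAL, Y_VAL))
    dw_dlam = -SXY / (SXX + lam) ** 2
    return dval_dw * dw_dlam


def tune(lam: float = 3.0, lr: float = 2.0, steps: int = 200) -> float:
    """Gradient descent on the hyperparameter, projected to keep lam >= 0."""
    for _ in range(steps):
        lam = max(0.0, lam - lr * hypergradient(lam))
    return lam


print(round(tune(), 6))  # prints 0.5: w*(0.5) = 61 / 30.5 = 2.0 fits the validation set exactly
```

The closed-form inner solution is what makes this toy easy; when the inner problem is a neural network trained by SGD, w*(lam) has no closed form, which is the difficulty that online methods are designed to address.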
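The Oct 11, 2020 post recommends measuring quantities such as Hessians when reasoning about training dynamics. As a minimal sketch under stated assumptions (not a method from the thread), the snippet below estimates the top Hessian eigenvalue of a loss using only gradient evaluations: it approximates Hessian-vector products with a central finite difference of the gradient and runs power iteration on them. The toy quadratic loss and the names `hvp` and `top_hessian_eigenvalue` are hypothetical; on a real network one would normally compute Hessian-vector products with automatic differentiation instead.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def hvp(grad_fn: Callable[[Vector], Vector], theta: Vector, v: Vector,
        eps: float = 1e-4) -> Vector:
    """Approximate H(theta) @ v by a central finite difference of the gradient:
    (grad(theta + eps * v) - grad(theta - eps * v)) / (2 * eps)."""
    plus = grad_fn([t + eps * vi for t, vi in zip(theta, v)])
    minus = grad_fn([t - eps * vi for t, vi in zip(theta, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_hessian_eigenvalue(grad_fn: Callable[[Vector], Vector], theta: Vector,
                           iters: int = 100, seed: int = 0) -> float:
    """Power iteration using only Hessian-vector products. Converges to the
    eigenvalue of largest magnitude (which can be negative at a saddle point)."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in theta])
    for _ in range(iters):
        v = normalize(hvp(grad_fn, theta, v))
    # Rayleigh quotient v^T H v for the final unit vector v.
    return sum(a * b for a, b in zip(v, hvp(grad_fn, theta, v)))


# Toy loss (an assumption for illustration): 0.5 * theta^T A theta, gradient A @ theta.
A = [[3.0, 1.0], [1.0, 2.0]]


def quad_grad(theta: Vector) -> Vector:
    return [sum(a * t for a, t in zip(row, theta)) for row in A]


lam = top_hessian_eigenvalue(quad_grad, [0.5, -1.0])
print(round(lam, 3))  # prints 3.618, i.e. (5 + sqrt(5)) / 2
```

Because the estimate needs only gradient calls, the same loop scales to models with far too many parameters to form the Hessian explicitly.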
