Important paper from Google on large batch optimization. They do impressively careful experiments measuring # iterations needed to achieve target validation error at various batch sizes. The main "surprise" is the lack of surprises. [thread]
arxiv.org/abs/1811.03600
Roger Grosse
- ICLR reviewers keep insisting on ImageNet experiments and expensive-to-train SOTA architectures. Effectively, they require proof that you've spent sufficiently many GPU cycles for your conclusions to be taken seriously. ICLR papers are a cryptocurrency.
- SGD and Adam are good enough for training most neural net architectures.
- The Deep Learning and RL Summer School videos are up!
- Google has a significant fraction of the world's top AI talent, and yet Gmail has recently been marking as spam nearly every email from the undergraduates in my ML course. It sometimes even spam-filters emails from my grad students or replies to messages I sent.
- Here's what I see as a likely AGI trajectory over the next decade. I claim that later parts of the path present the biggest alignment risks/challenges. The alignment world has been focusing a lot on the lower left corner lately, which I'm worried is somewhat of a Maginot line.
- My vote for deep learning result of the year -- and this has gotten almost no hype -- is machine translation without parallel text. I'd thought this was impossible. Says something interesting about language. arxiv.org/abs/1710.04087 arxiv.org/abs/1710.11041 arxiv.org/abs/1711.00043
- This 2019 paper on Fourier analysis of adversarial robustness, by Dong Yin et al., is really worth a look. It gives a simple, intuitive way of understanding a wide variety of adversarial and robustness phenomena. papers.nips.cc/paper/9483-a-f…
- Reversible RNNs: reduce memory costs of GRU and LSTM networks by 10-15x without loss in performance. Also 5-10x for attention-based architectures. New paper with Matt MacKay, Paul Vicol, and Jimmy Ba, to appear at NIPS. arxiv.org/abs/1810.10999
- Excited to release our paper on Self-Tuning Networks, a way of adapting regularization hyperparameters online during training. This is the work of Matt MacKay, Paul Vicol, and @jonLorraine9, to appear at ICLR 2019.
- 90% of all confusion about neural net training dynamics would vanish if everyone got used to thinking about and measuring neural net Jacobians, Hessians, Fisher information matrices, etc.
- I'm teaching a new course on AI Alignment this term at the University of Toronto. The first half will cover idealized models of future AI systems (optimal planners, universal induction, etc.), and the second half will cover practical alignment techniques in the context of LLMs.
- The deep learning revolution happened in Canada because Canadians are used to long winters.
- The Wasserstein GAN should have been called the GAN whose Discriminator's A Lipschitz Function (GANDALF).





