
Sunday, June 14, 2020

GPT-3

I am not sure if people who haven't been doing machine learning can appreciate how weird GPT-3 is or what "few-shot learner" means.

In normal NLP machine learning, you might start with a pre-trained language model that encodes the relationships between words of a language. You then build a training set of data, probably 1000s of items where you have labels applied to text, correct translations, answers to questions, and things like that. You then train the model to minimize error on that training set. Then, with the trained model, you send it new samples of text and it spits out a label, translation, or answer as appropriate.
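To make that concrete, here is a minimal toy sketch of that conventional workflow in keras. The data, labels, and layer sizes are all made up for illustration; a real project would have thousands of labeled items and would start from a pre-trained language model rather than training an embedding from scratch.

    import numpy as np
    from tensorflow import keras

    # Toy stand-in for a labeled training set (a real one would have 1000s of items).
    texts = np.array(["great movie", "loved it", "terrible plot", "awful acting"])
    labels = np.array([1, 1, 0, 0])

    # Turn text into integer sequences the model can consume.
    vectorizer = keras.layers.TextVectorization(output_sequence_length=4)
    vectorizer.adapt(texts)
    x = vectorizer(texts)

    # A tiny classifier; a real project would build on a pre-trained language model.
    model = keras.Sequential([
        keras.layers.Embedding(input_dim=1000, output_dim=8),
        keras.layers.GlobalAveragePooling1D(),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Train to minimize error on the labeled set...
    model.fit(x, labels, epochs=20, verbose=0)

    # ...then send the trained model new text and it spits out a label.
    print(model.predict(vectorizer(np.array(["great acting"]))))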

With GPT-3, a much bigger language model, trained to just predict the next word, you don't have to do any of that. You construct the whole task in the last bit, where you would normally be sending a trained model new samples of text. The trick is, you send a description of the task in with the text. So you could send in:

 translate from English to French: hat => chapeau, cat => chat, hello => 
and it would send back "bonjour".

It learned enough about language, and saw enough examples of what typically follows "translate from English to French", to get good performance on that task. This wouldn't be surprising if it had been trained on that task, but there is no task-specific training (aka fine-tuning). It was just trained to predict the next word. With a big enough model (a mind-boggling 175 billion parameters), it just picks up the whole task as a pattern.
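The mechanics are easy to sketch. GPT-3 itself sits behind OpenAI's API, so the example below uses its much smaller relative GPT-2 from the transformers library as a stand-in; GPT-2 won't do the translation reliably, but the shape of the interaction, a prompt describing the task and nothing else, is the same.

    from transformers import pipeline

    # A small stand-in for GPT-3: same idea, vastly fewer parameters.
    generator = pipeline("text-generation", model="gpt2")

    # The entire "task" lives in the prompt: a description plus a few examples.
    prompt = "translate from English to French: hat => chapeau, cat => chat, hello =>"

    # No labels, no fine-tuning: just ask for the most likely continuation.
    result = generator(prompt, max_new_tokens=3)
    print(result[0]["generated_text"])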

Read the paper.

Thursday, August 29, 2019

Black box

https://news.ycombinator.com/item?id=20826979

Unpopular quote from my image and video processing professor - “The only problem with machine learning is that the machine does the learning and you don’t.”

While I understand that is missing a lot of nuance, it has stuck with me over the past few years as I feel like I am missing out on the cool machine learning work going on out there.
There is a ton of learning about calculus, probability, and statistics when doing machine learning, but I can’t shake the fact that at the end of the day, the output is basically a black box. As you start toying with AI you realize that the only way to learn from your architecture and results is by tuning parameters and trial and error.
Of course there are many applications that only AI can solve, which is all good and well, but I’m curious to hear from some heavy machine learning practitioners - what is exciting to you about your work?
This is a serious inquiry because I want to know if it’s worth exploring again. In the past university AI classes I took, I just got bored writing tiny programs that leveraged AI libraries to classify images, do some simple predictions etc.

The complaint that a neural network, or deep learning model, is a "black box", where you can't see why the model is producing a particular output, seems quite weak to me. Not being interested in the field because of that is worse, because it is like saying that we shouldn't study neuroscience because we don't understand how memories are stored in the nervous system. The box is not black. It is just a very complicated box, with lots of weights and layers. You can look at the weights and inputs and layers and literally see exactly what simple math applied to the inputs led to the output. It might take you a while, but it is all there. We are building tools all the time to make this process more efficient, but, unlike a brain, no one is stopping you from reaching into the box and taking a look.
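As a small illustration (a minimal sketch, assuming tensorflow/keras is installed and using an untrained toy network), you can pull every weight out of a model and reproduce its output with nothing but ordinary matrix math:

    import numpy as np
    from tensorflow import keras

    # A toy "box": two dense layers with randomly initialized weights.
    model = keras.Sequential([
        keras.Input(shape=(3,)),
        keras.layers.Dense(4, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    x = np.array([[0.1, 0.5, 0.9]], dtype="float32")

    # Reach into the box: every weight matrix and bias vector is right there.
    w1, b1, w2, b2 = model.get_weights()

    # Redo the "simple math applied to the inputs" by hand.
    hidden = np.maximum(0.0, x @ w1 + b1)                # relu(x W1 + b1)
    by_hand = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))  # sigmoid(h W2 + b2)

    print(model.predict(x))  # what the library computes
    print(by_hand)           # the same number, straight from the weights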

Friday, October 27, 2017

Keras

I've been working hard to keep up to date with recent progress in machine learning.  The progress in Deep Learning is exciting to me, especially because it builds on all of the things I learned about neural networks in 1995-99 in college and grad school (particularly some work with Dr. William Levy at UVA) but never really had the opportunity to put into practice directly, except in some simple classifier-type situations.

In any case, last year I took a short course from Miner and Kasch taught by Florian Muellerklein that was quite good. You can check out the slides and code on github.  I'd previously been messing around with Torch, Caffe, TensorFlow, cuDNN, and a variety of other libraries; while easier than the old days of finding eigenvalues in C++ or running things out of MATLAB, they required a lot of configuration and such.  In the course, we jumped right into using keras. Wow, so much easier. It's a bit like a Ruby on Rails for neural networks, giving you sensible defaults so you can get going right away and avoid common errors, but different in that it is just a simpler interface overlaying other libraries.  If anyone is diving into this stuff, the keras path is the best path that I've tried.
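For a sense of how little ceremony keras asks for, here is roughly the canonical MNIST example (a sketch, assuming a recent tensorflow install; the layer sizes and epoch count are just typical defaults, not anything tuned):

    from tensorflow import keras

    # Standard dataset, downloaded on first use.
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # A small fully connected classifier built from keras's sensible defaults.
    model = keras.Sequential([
        keras.Input(shape=(28, 28)),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])

    # One line each to configure, train, and evaluate.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5)
    model.evaluate(x_test, y_test)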

To help one out on this path, there are now a couple of books on keras. Deep Learning with Python is the one that I recommend. It is written by Francois Chollet, who is the creator and maintainer of keras and now works for Google. Because of that, I think it does the best job of communicating how the library is intended to be used, and it puts things in the right context of experimentation with hyper-parameters and other topics that can take up all of your time.

In any case, I was recently asked what I thought about the joint project Gluon from Amazon and Microsoft. I think it is trying to do the same thing as keras, and I am hoping that it doesn't mean more divergence in the stack.  Already, the deep learning stack is pushing in the hardware direction. There was already a level of matrix math operations implemented in hardware, which is why Nvidia GPUs were such an early accelerator of progress, thanks to similar needs in graphics processing, and AMD and Intel are not far behind, with Intel's acquisition of Nervana Systems a key recent purchase aimed at chips built for this purpose. Google's Tensor Processing Units (TPUs) take this a step further and are designed to push more of the TensorFlow code into hardware.  Obviously Microsoft and Amazon don't want to be left behind, as they won't be buying TPUs for AWS or Azure. Even Tesla is looking at building its own chips for image analysis in cars.

That leaves those of us at the software level trying to find the right API to code against. Right now, the answer for me is keras, but keeping an eye on the whole stack is necessary to see which choice is the right one to make at the top of it.