Stories by Sara Robinson on Medium

SQL or ML? The same dataset, two ways.

Sara Robinson — Mon, 17 Dec 2018 15:57:08 GMT

This is the second post in a two-part series going through my “when to use machine learning” flowchart:

The first post focused on two things:

Figuring out whether you want to use your data for generating future predictions or historical trend analysis. If it’s the latter, you don’t need ML
Why ML is often a good solution for generating predictions on video, image, or audio data

In this post I’ll focus on the bottom quadrant of the flowchart: taking text, numerical, and categorical data and deciding whether ML is a good fit for your prediction task. This is the tricky part. With the unstructured nature of images and audio, it’s almost always obvious that ML is the best choice since defining rules like “if the majority of pixels are orange” quickly gets out of hand. But with text, numerical, and categorical data the answer isn’t always so obvious.

Finding patterns in your data

For this section let’s assume you’ve found yourself at this step in the flowchart:

First, use SQL or other data analysis tools to identify patterns between your inputs and the thing you’re predicting.

We can’t get much further in this discussion without looking at some actual data. Let’s get started using log files as an example. I’m choosing log files because logs generate tons of data, which might lead people to think they’d be a good fit for running through an ML model. Are they? It depends.

“The thing you’re predicting” is the important piece here. Let’s say I want to flag error logs generated by my application. Since I’ve already got lots of log data, I could classify each log as “error” or “not_error,” build a binary classification model with a framework like TensorFlow, train the model, figure out where to serve it, and then make an API call to my model whenever new logs come in. That sounds like a lot of trouble, which is why we should see if there’s a simpler approach before we go the ML route.

Time to test this with real data. I stumbled upon this jackpot of log datasets on GitHub. For this example I’ll look at the BlueGene/L logs — a dataset of logs generated by the BlueGene/L supercomputer. Disclaimer: I am not an expert in analyzing logs or HPC 😮

The GitHub repo provides a small preview of the original dataset. Each log consists of a message, timestamp, ID, location, type, and more (all detailed here). The original dataset is a newline-delimited text file, with a log statement on each line:

https://medium.com/media/5873b8a52a29f3e5cc85656a9452e0ac/href

Before I write a script to split the log strings into a more structured format, let’s look at the data to see if we can spot any trends. It looks like nearly every log with an error contains the word “error,” “fatal,” or “failed.” Rather than building out an entire ML system, it would be way easier to run new logs through something like this:

https://medium.com/media/901f2ae9d858b86b7a4bd6b0c5a5cdd1/href

In the sample file with 2000 logs, the above IF statement is able to flag 207 of them as errors. By clearly defining what I wanted to predict and doing a quick search for patterns in my data, I’ve saved myself lots of time and built a solution in five lines of code.
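The gist embedded above isn’t reproduced in this copy of the post, so here’s a minimal Python sketch of the same kind of keyword rule. It assumes case-insensitive substring matching, and the names `is_error` and `count_errors` are hypothetical, not taken from the original code.

```python
# Keywords that nearly every error log seemed to contain.
ERROR_KEYWORDS = ("error", "fatal", "failed")


def is_error(log_line: str) -> bool:
    """Return True if the log line contains any of the error keywords.

    Matching is case-insensitive (an assumption), so "FATAL" and
    "fatal" are treated the same way.
    """
    text = log_line.lower()
    return any(keyword in text for keyword in ERROR_KEYWORDS)


def count_errors(lines) -> int:
    """Count how many lines the keyword rule flags as errors."""
    return sum(is_error(line) for line in lines)
```

To run it over a log file, pass an open file handle, e.g. `count_errors(open(path))` where `path` points at your copy of the sample. Substring matching is deliberately crude (it will flag any harmless message that happens to mention “error”), but a few lines like these are often enough before reaching for ML.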

The same type of analysis could be applied to any dataset, not just log files. For example, if I worked at a clothing company and had a dataset of customer demographic and purchasing data, I’d want to do the same “what am I predicting?” analysis before jumping to ML. If all I want to do is figure out which customers to email about a new winter coat, it might be as simple as “IF user lives in cold climate, promote the coat.” But if I want to use their past purchasing history to recommend new shirts they might like, it might be time for ML.

ML on text, numerical, and categorical data

Working with the same log file dataset as we did above, I’d now like to change the thing I’m predicting. Instead of flagging logs that indicate errors, I want to identify logs that indicate anomalies. The dataset is already labeled for this, using a “-” in the first character of each log to indicate “non-alert” logs. Logs without a dash symbol are alerts. This makes it a good dataset for alert detection, but I’m still not 100% convinced that machine learning is the right tool for the job. Before I decide, I want to compare alert vs. non-alert logs to see if I can spot any patterns.

To do this, I’ve written a script to extract a few pieces of data from each log (using the full BlueGene dataset here) and write it to a CSV, which I then uploaded to BigQuery for exploration. The result is a 340MB table with 4.7M rows and four columns: the alert label, plus each log’s location, type, and message.
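Here’s a rough Python sketch of that extraction step. It assumes each BlueGene/L line is whitespace-separated, with the alert label first and the component (which I’m calling location), severity level (type), and free-form message in the 8th, 9th, and 10th positions; check that layout against the dataset’s documentation before relying on it. The function names and sample lines are illustrative.

```python
import csv
import io

COLUMNS = ["alert", "location", "type", "message"]


def parse_bgl_line(line):
    """Split one BlueGene/L log line into the four columns used in this post.

    Assumed whitespace-separated layout: label, epoch seconds, date, node,
    timestamp, node again, "RAS", component (location), level (type), message.
    Returns None for lines that don't have enough fields.
    """
    parts = line.rstrip("\n").split(maxsplit=9)
    if len(parts) < 9:
        return None
    label = parts[0]
    return {
        # A leading "-" marks a non-alert log; otherwise the label is the alert code.
        "alert": "non_alert" if label == "-" else label,
        "location": parts[7],
        "type": parts[8],
        "message": parts[9] if len(parts) > 9 else "",
    }


def logs_to_csv(lines, out):
    """Write parsed log lines as CSV rows to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for line in lines:
        row = parse_bgl_line(line)
        if row is not None:
            writer.writerow(row)


if __name__ == "__main__":
    # Illustrative lines in the assumed layout.
    sample = [
        "- 1117838570 2005.06.03 R02-M1-N0-C:J12-U11 2005-06-03-15.42.50.675872 "
        "R02-M1-N0-C:J12-U11 RAS KERNEL INFO instruction cache parity error corrected",
        "KERNDTLB 1118536327 2005.06.11 R30-M0-N9-C:J16-U01 2005-06-11-17.32.07.581048 "
        "R30-M0-N9-C:J16-U01 RAS KERNEL FATAL data TLB error interrupt",
    ]
    buffer = io.StringIO()
    logs_to_csv(sample, buffer)
    print(buffer.getvalue())
```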

The alert column contains the code for the alert, or the string non_alert. With a simple query I can see how many of the 4.7M rows contain alerts. There are 348,698 alerts, about 7% of the total dataset. Let’s see if we can spot any trends by doing some queries on the alert logs. In this phase of data analysis, simple SQL aggregations are almost always the solution — sometimes with COUNT(), and sometimes with AVG(). In this query we’ll look at a breakdown of the location of alerts using COUNT():

https://medium.com/media/e49ce4dff4ac33d888c9bc5c9d982169/href

And here’s the result:

It looks like most of the alerts occurred at the kernel or app level. Does that mean I could write something like IF location IN ('KERNEL', 'APP'), it’s an alert? Let’s look at all the logs from the KERNEL level:

https://medium.com/media/6cf656e4f8f5fb250f65042b57f9f3f6/href

Hmmm, over 4 million of the ‘KERNEL’ logs are not alerts, so it doesn’t look like we can draw any conclusions from the location. We can repeat the same analysis for log type. Here’s a breakdown of log types for only the alert messages:

Almost all of them are FATAL. But if we look at a breakdown of alert type for the FATAL logs, most of them are not alerts:

We can conclude that FATAL logs have a higher likelihood of being alerts, but we definitely cannot say that if a log is FATAL, it is an alert.
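The FATAL result is a classic case of two conditional probabilities that sound alike but aren’t: the share of alerts that are FATAL versus the share of FATAL logs that are alerts. This Python sketch computes both from (type, is_alert) pairs; the counts in the example are made up to mirror the pattern, not taken from the BlueGene/L data.

```python
from collections import Counter


def conditional_shares(rows, log_type="FATAL"):
    """Return (P(type | alert), P(alert | type)) from (log_type, is_alert) pairs.

    The two ratios share a numerator but divide by different totals, so one
    can be high while the other is low.
    """
    counts = Counter(rows)  # keys are (log_type, is_alert) pairs
    both = counts[(log_type, True)]
    total_alerts = sum(n for (_, is_alert), n in counts.items() if is_alert)
    total_of_type = sum(n for (t, _), n in counts.items() if t == log_type)
    return both / total_alerts, both / total_of_type


if __name__ == "__main__":
    # Made-up counts: most alerts are FATAL, but most FATAL logs are not alerts.
    toy_rows = (
        [("FATAL", True)] * 90
        + [("INFO", True)] * 10
        + [("FATAL", False)] * 310
        + [("INFO", False)] * 590
    )
    type_given_alert, alert_given_type = conditional_shares(toy_rows)
    print(f"P(FATAL | alert) = {type_given_alert:.2f}")  # 0.90
    print(f"P(alert | FATAL) = {alert_given_type:.3f}")  # 0.225
```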

Next I’ll look at the log message field. Since this is free-form text, it won’t be quite as easy to extract patterns as the location and type fields in the log data. But it turns out there are many repeated log messages within the alert logs:

Can we conclude that logs with a “data TLB error interrupt” error message are more likely to be alerts? Let’s break down the alert type for logs with that message:

https://medium.com/media/9d7ce7d78be5b232db25ce622c290d82/href

Boom! Now we’re getting somewhere. All of the “data TLB error interrupt” messages are alerts. Looking more closely at my GROUP BY message query above, there are 49,042 rows in the output, meaning there are 49,042 unique log messages associated with alerts. Since the top message accounts for nearly half of all alerts, in theory I could write an IF statement to classify all logs with ‘data TLB error interrupt’ as alerts. But that rule would only catch about 45% of new alerts; the rest would slip through.

For a moment let’s pretend that this approach for classifying logs is a “machine learning model”:

https://medium.com/media/615aec42191529158e78ff03857248aa/href

Of course it’s not actually an ML model; it’s just an IF statement. But pretend for a moment that the classifications it’s generating are coming from an ML model. If I only care that the logs my model flags as alerts really are alerts (a metric known as precision), this model is doing very well. Precision asks: of all the logs my model classified as alerts, what % were correct? In terms of precision, this model achieves 100%. Pretty impressive!

But what about all those logs with other messages that my “model” missed? That’s what recall measures — what % of all alerts did my model identify correctly? My model’s recall is 45%, which is not so great. But I still may not need machine learning. First I need to decide which I care more about:

Option 1: My system doesn’t flag any non-alerts as alerts. In the process, it may miss some logs that are alerts.
Option 2: My system flags a high percentage of possible alerts. In the process, it may surface some false positives (logs flagged as alerts that actually aren’t alerts).
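To make the two metrics concrete, here’s a small Python sketch that computes precision and recall for the “data TLB error interrupt” rule. The toy data is invented so that the rule catches 9 of 20 alerts and flags nothing else, mirroring the 100% precision and 45% recall described above.

```python
def precision_recall(predictions, labels):
    """Precision and recall for binary predictions, where True means "alert"."""
    pairs = list(zip(predictions, labels))
    true_pos = sum(1 for p, y in pairs if p and y)
    false_pos = sum(1 for p, y in pairs if p and not y)
    false_neg = sum(1 for p, y in pairs if not p and y)
    # Precision: of the logs flagged as alerts, how many really were alerts?
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of all the real alerts, how many did we flag?
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data: 9 "data TLB" alerts, 11 other alerts, 80 non-alerts.
    logs = (
        [("data TLB error interrupt", True)] * 9
        + [("some other alert message", True)] * 11
        + [("routine status message", False)] * 80
    )
    predictions = [msg == "data TLB error interrupt" for msg, _ in logs]
    labels = [is_alert for _, is_alert in logs]
    print(precision_recall(predictions, labels))  # (1.0, 0.45)
```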

This is often a tradeoff. If I prefer option #1, I can likely proceed without machine learning. I could even improve my system by adding more of the top alert log messages to my IF statement, along with some common features of non-alerts:

https://medium.com/media/fa2a26486280e5c55cb92ac56326da4c/href

This rule-based approach will correctly classify the alerts and non-alerts it covers, but if I want it to be all-encompassing, it’ll quickly get long and messy.
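One way to grow the IF statement without hand-picking messages is to generate it from the labeled data: take the most common alert messages that never show up in non-alert logs. This hypothetical helper does that and reports the fraction of alerts the resulting rules cover (their recall). Because messages are added in order of frequency, each extra rule buys no more coverage than the one before it, which is why a long tail of 49,042 unique alert messages makes the rule list unwieldy.

```python
from collections import Counter


def build_message_rules(logs, k):
    """Pick the k most common alert messages that never appear in non-alert logs.

    logs: iterable of (message, is_alert) pairs.
    Returns (rule set, fraction of alerts covered by the rules).
    """
    logs = list(logs)
    alert_counts = Counter(msg for msg, is_alert in logs if is_alert)
    non_alert_msgs = {msg for msg, is_alert in logs if not is_alert}
    # Keep only messages that were always alerts, so precision stays 100% on this data.
    pure = [(msg, n) for msg, n in alert_counts.most_common() if msg not in non_alert_msgs]
    rules = {msg for msg, _ in pure[:k]}
    covered = sum(alert_counts[msg] for msg in rules)
    total_alerts = sum(alert_counts.values())
    return rules, (covered / total_alerts if total_alerts else 0.0)
```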

If I want my system to flag as many potential alerts as possible (Option 2), a machine learning approach may work best with this dataset. And since I’ve already done some data analysis to determine which factors influence the likelihood that a log contains an alert, I know the pieces of data I’ll be using to train my model.

We can use BigQuery ML to take existing data stored in BigQuery and use it to train a machine learning model. With BQML we can choose between three different types of models:

Linear regression: predict a numerical value (like revenue)
Binary logistic regression: predict the probability that your input belongs to one of two classes
Multi-class logistic regression: predict which of more than two classes your input belongs to

Our case of predicting whether or not a log is an alert is a great fit for binary logistic regression since we’ll have 2 classes: alert and non_alert. For our model we’ll use the log message, type of log, and log location to predict whether or not it’s an alert. We can create and train our model with a single SQL query in BQML:

https://medium.com/media/668bbaf5baa8e6342dd0422ec07f92f6/href

Let’s break down what’s happening in our query:

In the first line we create the model and give it a name
Then we tell BQML the type of model we’re building (logistic regression) and the column from our table that will be the label: this is the thing our model is predicting. In our case, it’s the alert column
The rest is a regular old SQL query where we extract all of the data that will be used as input and output to our model. Remember that the alert column had more than 2 values: non_alert, or the alert code if the log was an alert. I’ve used a CASE statement to convert this data to two classes

The model took about 10 minutes to train, and once it finished we can see the loss (error) reported after each iteration of training:

Here we want to focus on the Evaluation Data Loss, since this measures our model’s error on our test set. When we train our BQML model, it’ll automatically split our data: using the majority of the data to train the model and then reserving a subset of the data to see how our model performs on data it hasn’t seen before. We want to see loss decreasing as our model trains for more iterations, which is exactly what we get here.
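BQML handles all of this inside the CREATE MODEL query, but it can help to see the moving parts. The following is a from-scratch Python sketch, not BQML’s implementation: it converts the alert column to two classes (like the CASE statement), gives each (column, value) pair its own one-hot weight, holds out 20% of the rows for evaluation, and records the evaluation log loss after each epoch of stochastic gradient descent. The learning rate, epoch count, and split fraction are arbitrary assumptions.

```python
import math
import random


def to_binary_label(alert_value):
    """Mirror the CASE statement: 0 for non_alert, 1 for any alert code."""
    return 0 if alert_value == "non_alert" else 1


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows, labels, epochs=20, lr=0.5, eval_fraction=0.2, seed=0):
    """Train binary logistic regression on one-hot categorical features with SGD.

    rows: list of dicts such as {"location": ..., "type": ..., "message": ...}.
    Returns (predict_proba, eval_losses), where eval_losses holds the log loss
    on the held-out split after each epoch.
    """
    data = list(zip(rows, labels))
    random.Random(seed).shuffle(data)
    n_eval = max(1, int(len(data) * eval_fraction))
    eval_set, train_set = data[:n_eval], data[n_eval:]

    # One weight per (column, value) pair seen in the training split.
    vocab = {}
    for row, _ in train_set:
        for pair in row.items():
            vocab.setdefault(pair, len(vocab))
    weights = [0.0] * len(vocab)
    bias = 0.0

    def active(row):
        # Indices of the one-hot columns that are "on"; unseen values are dropped.
        return [vocab[pair] for pair in row.items() if pair in vocab]

    def predict_proba(row):
        return sigmoid(bias + sum(weights[i] for i in active(row)))

    def log_loss(dataset):
        eps = 1e-12
        total = 0.0
        for row, y in dataset:
            p = min(max(predict_proba(row), eps), 1.0 - eps)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        return total / len(dataset)

    eval_losses = []
    for _ in range(epochs):
        for row, y in train_set:
            error = predict_proba(row) - y  # derivative of log loss w.r.t. the logit
            bias -= lr * error
            for i in active(row):
                weights[i] -= lr * error
        eval_losses.append(log_loss(eval_set))
    return predict_proba, eval_losses
```

If training is working, the values in `eval_losses` should trend downward, which is the same signal to look for in BQML’s Evaluation Data Loss column.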

We’ve got a trained model, now what? Let’s use it to generate a prediction, which we can also do with SQL. Here I’ll feed it one example for prediction, using a log with one of the most infrequently occurring alert messages (our model should predict alert). When we run predictions, BQML creates a new field with the name of our output column prefixed with predicted_:

https://medium.com/media/b6b4f1ad7680af95c03496a05cea891f/href

And here’s the result:

The model has correctly predicted it’s an alert! We’re seeing very high accuracy here since BQML treats free-form text as categorical data. The model associates every unique log message with a numerical value, and it can memorize the combination of certain messages, locations, and log types that result in alerts. In other words, it is handling each message in its entirety rather than learning patterns within the text. If we pass it the same log message as the one above with a period at the end of the sentence, it would likely classify it as non_alert. To train a model that is better at generalizing, we should build a custom model using some natural language approaches to learning patterns in our text (more on that in a future post).
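The period problem comes from treating each message as a single category. This simplified Python sketch (not BQML’s exact encoding) shows that a one-hot encoding keyed on the whole message gives an unseen variant no active column, while a word-level view like the hypothetical `word_tokens` helper still sees the same four words. That word-level idea is one starting point for the natural language approaches mentioned above.

```python
import re


def build_vocab(messages):
    """Give each distinct training message its own one-hot column."""
    vocab = {}
    for message in messages:
        vocab.setdefault(message, len(vocab))
    return vocab


def encode_whole_message(message, vocab):
    """One-hot encode a message as a single category; unseen messages are all zeros."""
    vector = [0] * len(vocab)
    if message in vocab:
        vector[vocab[message]] = 1
    return vector


def word_tokens(message):
    """Word-level view of a message: lowercase alphanumeric tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9]+", message.lower())
```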

There you have it — with the same dataset, we’ve solved one problem with a simple IF statement and another with an ML model. Along the way we used some data analysis to help us figure out which use case was a better fit for ML.

Thank you to my teammates Lak and Terry for their feedback on this post.

What’s next?

Hopefully you’ve now got a better idea of the process you might go through to determine if ML is a good fit. Want more of what was covered here? Check out these resources:

This ML crash course has good descriptions of precision and recall
Lak has a great post on this topic
Get started with BigQuery
Train a model in BQML
Learn how to build a custom model for text classification with this post

Got feedback on this post or something you’d like to see covered in the future? Leave a comment or find me on Twitter @SRobTweets.

Machine learning on machines: building a model to evaluate CPU performance

Sara Robinson — Wed, 14 Nov 2018 15:40:20 GMT

Wouldn’t it be cool if we could train a machine learning model to predict machine performance? In this post, we’ll look at a linear regression model I built using BQML to predict the performance of a machine given hardware and software specifications.

All of the work I do is software related so I rarely think about the hardware running my code — this gave me an opportunity to learn about the hardware side of things.

The dataset: SPEC

To train this model I used data from SPEC*, an organization that builds tools for evaluating computer performance and energy efficiency. They published a series of benchmarking results from 2017, where they used 43 different tests to evaluate the performance of specific hardware. Their benchmarks are divided into 4 categories: integer and floating point tests measuring performance through both time (called SPEC speed) and throughput (called SPEC rate):

The four categories of SPEC 2017 benchmarks.

For this post, I’ll focus mostly on the Integer Speed tests.

Each test includes specs on the hardware and software of the system where the test was run, along with results for several benchmarks. The benchmarks are intended to simulate different types of applications or workloads.

Here’s an example of the hardware and software data for one test:

https://medium.com/media/9f48834168286d662a126e4b3d67790c/href

And here’s a subset of the SPEC benchmarks for the above system:

https://medium.com/media/ac04061f9f38b813257ca301d1dbf2ed/href

Why so many benchmarks, and what do they all mean? SPEC’s goal is to evaluate machine performance across a variety of common workloads, all outlined here. For example, the 631.deepsjeng_s benchmark evaluates the speed of a machine running the alpha-beta search algorithm for playing a game of chess. SPEC also has benchmarks for other AI workloads, ocean modeling, video compression, and more.

Is there a relationship between hardware, software, and benchmark results?

The short answer is — yes! But I wanted to confirm this before naively dropping inputs into a model and hoping for the best.

To take a closer look at the data, I wrote a Python script to parse the CSV files for each test using the CPU2017 Integer Speed results. The script extracts data on the hardware and software used in each test, along with the score for each of the benchmarks run on that machine. In the script, I first initialize a dict() with the data I want to capture from each test:

https://medium.com/media/4ba9f1356b1ac190f4be965eee681e36/href

Then I write the header row to the CSV using the keys from the dict above:

https://medium.com/media/8d2c268320ac21eaa07023decc8012a4/href

Here’s an example CSV that I’ll be iterating over in the script. The next step is to iterate over the local directory where I’ve saved all the CSV files for the integer speed tests, and use the Python csv module to read them:

https://medium.com/media/3abe1f03b3489b4bf3d7c4b32847dcde/href

The files aren’t in a typical CSV format, but they all follow the same pattern, so I can grab the benchmark data with the following:

https://medium.com/media/8c413d848d09cf2ed45f54ef3e1085bc/href

Using the benchmark start and end indices, I can iterate over the rows of the CSV that contain the benchmarks. Then I look for the specific data points I’m collecting within a CSV. Here’s how I grab the vendor and machine model name:

https://medium.com/media/f78ae84baba485ebd2750fcf5350ee76/href

When I’m done collecting all of the machine specs, I create a CSV string of the data and write it to my file:

https://medium.com/media/87045c8d52d7a5e96e8a5c45fb18e02e/href
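The embedded snippets aren’t reproduced in this copy, so here’s a condensed Python sketch of the same flow: collect spec fields into a dict, write a header row from its keys, find the benchmark section by its start and end indices, and write one CSV row per result file. The marker text, field labels, and score column are hypothetical placeholders; the real SPEC files will need their own values.

```python
import csv
import io

# Hypothetical layout details: check them against a real SPEC result CSV.
BENCHMARK_START = "Selected Results Table"
SPEC_FIELDS = {  # label in the result file -> column in our dataset
    "Hardware Vendor:": "vendor",
    "Model Name:": "model_name",
    "Cores:": "cores",
}


def parse_result_rows(rows):
    """Pull machine specs and benchmark scores out of one result file's rows.

    Assumes benchmarks sit under a start-marker row plus one header row and
    run until the next blank row, with the score in the second column.
    """
    record = dict.fromkeys(SPEC_FIELDS.values(), "")
    for row in rows:
        if len(row) >= 2 and row[0] in SPEC_FIELDS:
            record[SPEC_FIELDS[row[0]]] = row[1]

    start = next((i for i, row in enumerate(rows) if row and row[0] == BENCHMARK_START), None)
    if start is None:
        raise ValueError("benchmark section not found")
    end = next((i for i in range(start + 2, len(rows)) if not any(rows[i])), len(rows))
    scores = {row[0]: row[1] for row in rows[start + 2:end] if len(row) >= 2}
    return record, scores


def write_dataset(result_files, benchmark_names, out):
    """Write one CSV row per result file: spec columns, then benchmark scores."""
    writer = csv.writer(out)
    spec_columns = list(SPEC_FIELDS.values())
    writer.writerow(spec_columns + benchmark_names)  # header row from the dict values
    for f in result_files:
        record, scores = parse_result_rows(list(csv.reader(f)))
        writer.writerow([record[c] for c in spec_columns]
                        + [scores.get(b, "") for b in benchmark_names])


if __name__ == "__main__":
    # A tiny made-up file in the assumed layout.
    sample = (
        "Hardware Vendor:,Acme\nModel Name:,X1\nCores:,8\n\n"
        "Selected Results Table\nBenchmark,Score\n631.deepsjeng_s,4.2\n\n"
    )
    out = io.StringIO()
    write_dataset([io.StringIO(sample)], ["631.deepsjeng_s"], out)
    print(out.getvalue())
```

For real result files, pass handles opened with `open(path, newline="")`, as the csv module recommends.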

With all of the machine data and benchmark scores in a CSV file, I used a Python notebook and matplotlib to explore relationships between machine specs and benchmark results to see if there was a linear relationship. Here I’ve plotted the nominal speed of a machine and the 631.deepsjeng_s (alpha-beta search) benchmark:

Looks like I could draw a line or curve that roughly estimates the relationship above. Let’s take a look at another input. Here’s a plot of a machine’s number of cores and the 631.deepsjeng_s benchmark:

While there’s some relationship between the inputs and benchmark measured above, it’s clear that we couldn’t use a single feature (like number of cores) to accurately predict a machine’s speed. There are many features of a machine that affect its performance, and to make use of all of them we’ll build a linear regression model to evaluate the speed of a machine. Here’s what we’ll use as inputs:

- The hardware vendor
- The machine model name
- Nominal and max speed of the microprocessor in MHz
- Number of cores
- Number of memory channels
- Size of L1, L2, and L3 caches
- Memory in GB
- Memory speed
- The OS running on the system
- The compiler being used
- The company running the test (Lenovo, Huawei, Dell, etc.)
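Scatter plots are a good way to eyeball a relationship, but a correlation coefficient gives a quick number to go with them. This Python sketch computes the Pearson correlation between one input (say, number of cores) and a benchmark score: values near ±1 suggest a strong linear relationship, and values near 0 suggest little or none. It only captures linear relationships, so it complements the plots rather than replacing them. (Python 3.10+ also ships `statistics.correlation`.)

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```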

A linear regression model typically outputs a single numeric value, so we’ll need to create separate models to predict each benchmark score. It’s important to note that not all of our inputs are numerical: vendor name, model name, OS, compiler, and test sponsor are categorical, string inputs. Since a linear regression model expects numerical inputs, we’ll need a way to encode these as numbers before feeding them into our model.
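One common way to turn string inputs into numbers is one-hot encoding: each distinct value gets its own 0/1 column, which avoids implying an order (mapping Dell to 1 and Lenovo to 2 would suggest Lenovo is somehow “more”). BQML’s automatic preprocessing treats string columns in a similar spirit, so this Python sketch is only meant to show the idea; the field names and OS values in the example are illustrative.

```python
def build_categories(records, categorical_fields):
    """Collect the sorted distinct values seen for each categorical field."""
    return {field: sorted({r[field] for r in records}) for field in categorical_fields}


def to_feature_vector(record, numeric_fields, categories):
    """Numeric fields as floats, then one 0/1 column per category value."""
    vector = [float(record[field]) for field in numeric_fields]
    for field, values in categories.items():
        vector.extend(1.0 if record[field] == value else 0.0 for value in values)
    return vector
```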

Building a linear regression model with BigQuery Machine Learning

We could write the code for our linear regression model by hand, but I’d like to focus on the predictions generated by my model rather than the nitty gritty of the model code. BigQuery Machine Learning (BQML) is a great tool for this job. If I create a BigQuery table with my input data and benchmarks, I can write a SQL query to train a linear regression model. BQML will handle the underlying model code, hyperparameter tuning, splitting my data into training and test sets, and converting categorical data to integers.

Since I’ve already got the data for my model as a CSV, I can use the BigQuery web UI to upload this data into a table. Here’s the schema for the table:

And here’s a preview of the data:

To get a sense of one of the categorical features we’re working with, let’s take a look at the sponsor field in our table, which refers to the company that ran an individual test. Here’s the query we’ll run to get a breakdown of our data by test sponsor:

https://medium.com/media/17185b380ef36eb2ead4257339b84a68/href

The results:

And here are the top operating systems used in the tests:

I can train my model with a single SQL query in BQML:

https://medium.com/media/1cbc478b0cd5123ef2c1d6ef0359b137/href

This took 1 minute and 14 seconds to train. When it completes, I can look at the stats for each iteration of training:

Loss metrics for each epoch of training in BQML

The number I want to focus on here is the Evaluation Data Loss: the loss calculated on the held-out evaluation data after each iteration of training. We can see that it has steadily decreased from the first to the last iteration. To get additional evaluation metrics, I can run an ML.EVALUATE query. Here are the results:

Evaluation metrics on my BQML model

Mean squared error (MSE) measures the difference between the values our model predicted on the test set and the actual values: it’s the average of the squared differences. You can also think of it as the average squared distance between the actual values and your regression (best fit) line. A smaller value is better, and our model’s 0.0121 MSE is very good.

Since this is a regression model (predicting a continuous numerical value), the best way to see how it performed is to evaluate the difference between the value predicted by the model and the ground truth benchmark score. We can do this with an ML.PREDICT query.

When this query runs, it’ll create a new field prefixed with predicted_ and the name of our label column (in this case _631_deepsjeng_s). Let’s run ML.PREDICT across the original dataset and output a few features of the system being tested, the actual speed test result, and predicted benchmark:

https://medium.com/media/481c9c34bbac6d07a6d87354633a1f0c/href

Our predictions are very close to the actual score values!

To measure this another way, we can subtract the predicted score from the actual score and then take the average of the absolute values of these differences (the mean absolute error):

https://medium.com/media/1286c137f93f6ab6894aeb052f66b7df/href

On average, our model’s prediction is only 0.04 off from the actual score, which is pretty impressive.
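Both metrics are simple averages over the prediction errors. Here’s a Python sketch of MSE, its square root (RMSE), and the mean absolute error computed by the query above.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

RMSE is often easier to read because it’s in the same units as the benchmark score: the 0.0121 MSE above corresponds to an RMSE of 0.11.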

Generating predictions on new data

Now that I’ve got a trained model, I can use it to predict the speed benchmarks for a machine that wasn’t part of my training set with the following query:

https://medium.com/media/4e97441e77aacd76a08dd30c1afaff3f/href

The actual SPEC speed score for this machine on the 631.deepsjeng_s benchmark is 5.19. When I run this query, my model predicts 5.20, which is pretty close to the actual value. I can now use this model to generate predictions on new combinations of hardware and software specs that weren’t included in my dataset. In addition to the alpha-beta search benchmark, I created models for other Integer Speed benchmarks that had similar accuracy to this one; let me know if you’re interested in these results and I can share details.

Because I didn’t have to write any of the underlying model code, I was able to focus on feature engineering and finding the right combination of inputs to build a high accuracy model — thanks BQML!

More with SPEC data

If you’re interested in experimenting with this data and creating your own models, check out the Integer Speed benchmark results here. It’s also worth noting that SPEC has lots of data on previous CPU benchmarks. The 2006 SPEC CPU benchmarks have many more years’ worth of data (though SPEC has retired this set), so you could train a model using a similar approach to predict the performance of older machines.

This post focused on integer speed benchmarks. I haven’t included the other SPEC benchmarks here (floating point speed and throughput) because they didn’t perform as well with linear regression models (MSE in the 100s compared to MSE of less than 0.1 in the integer speed models). I’m planning to experiment with these tests using other types of models, stay tuned :)

In addition to SPEC, there are also many other performance benchmarks you could use, like LINPACK, which is used to rank the TOP500 list of supercomputers.

Have feedback?

This is my first foray into hardware performance data, so I’d love any feedback you have on the features I used as inputs, other types of models to try, or anything else. Leave a comment below or find me on Twitter at @SRobTweets.

If you want to learn more about BQML, my teammates have some great posts: this one covers predicting Stack Overflow response times, and this one walks you through building a model to predict flight delays.

Thank you to these awesome people for their contributions and feedback: Brian Kang, Shane Glass, Abhishek Kashyap, Eric Van Hensbergen, Jon Masters, and a few other SPEC experts at Red Hat 😀

*********************

*Data source: https://www.spec.org/cpu2017/results/cint2017.html Data retrieved on November 11 2018. SPEC Fair Use Rules: https://www.spec.org/fairuse.html

Machine learning on machines: building a model to evaluate CPU performance was originally published in HackerNoon.com on Medium.

When should you use machine learning? — Part 1

Sara Robinson — Thu, 08 Nov 2018 16:35:56 GMT


I’m constantly fascinated by machine learning and always excited to find new projects for it. But as trendy as ML has become, sometimes a SQL query or IF statement can accomplish the same job as an ML model in much less time. I wanted to gauge interest in this topic before diving in, so I sketched a quick flowchart while on a plane and posted it on Twitter:

Sara Robinson on Twitter

I made a little doodle to help you determine when to use ML, and when to use a SQL query or IF statement instead. It's just a draft, so let me know what you think!

I guess this is something a lot of people are thinking about! In this post I’ll go through the paths in the flowchart with specific examples using real datasets. This series will focus on supervised learning — using labeled data to train a model. This is Part 1, so I won’t cover all the paths in the flowchart just yet. Let’s dive in.

How will you use your data?

Starting from the top of the chart, the first thing to think about when solving any problem with data is how you’ll use the data when you’re finished with the task. ML is primarily for using data to make future predictions. If you’re only looking at historical trends in your data there’s no need to build a machine learning model.

Also, chances are if you’re analyzing historical data you’ll want to know why certain events occurred and exactly how two pieces of data are related. A machine learning model, on the other hand, creates a sort of black box between your inputs and predictions. For example, it may be able to predict that there’s an 80% likelihood a specific person will buy X brand of shoes given their age, past purchase history, and interests. But the model won’t be able to tell you how it came to those conclusions.

What’s the best approach if we only care about analyzing historical trends? Let’s take a look at this public domain dataset of U.S. Congressional bills as an example. It contains data on ~400k bills that have gone through Congress since 1947. For each bill, it includes a description, whether it was passed, data on who introduced it, the topic, and much more. I want to figure out which factors have contributed to passing a bill in the past. I’ve put this data on BigQuery so I can easily run some SQL queries on the data. First, I want to see the percentage of all bills in the dataset that have passed. I can do this with a simple SQL query:

https://medium.com/media/1d21396eb9e5e766361b6b9cf751bebc/href

The results:

On average, only 4.03% of proposed bills pass. Let’s see if the topic of a bill affects whether or not it will pass:

https://medium.com/media/89d42e34512525fe0d30a74ace7189f3/href

Here are the results, visualized using Data Studio:

It looks like certain topics do indeed have a higher percentage of passing. Bills related to Public Lands and Government Operations are much more likely to pass than bills about Social Welfare, Macroeconomics, or Labor. It would also be worth investigating whether the percentage of bills passed for each of these categories has changed over time, or whether any data related to the person who proposed the bill affects its likelihood of passing.
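The GROUP BY topic query boils down to counting bills and passed bills per topic. For readers who’d rather explore outside BigQuery, here’s a Python sketch of the same aggregation over (topic, passed) pairs; the rows in the example are made up.

```python
from collections import defaultdict


def pass_rate_by_topic(bills):
    """bills: iterable of (topic, passed) pairs -> {topic: percent of bills passed}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for topic, did_pass in bills:
        totals[topic] += 1
        passed[topic] += int(did_pass)
    return {topic: 100.0 * passed[topic] / totals[topic] for topic in totals}
```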

It’s worth reiterating that the dataset we choose doesn’t dictate whether or not it’s a good fit for ML; it all depends on what you want to do with the data. The bills dataset I used for SQL analysis above could also be used for ML if I define a clear set of inputs and outputs for my model. For example, I could use the bill title to train a model to predict the topic of a new bill. Or, I could use several data points for each bill to predict whether or not a new bill will pass.

If we do decide to use our data to train an ML model, the analysis above is an important first step. We want to understand any relationships between the data we’re feeding into our model and the thing we’re predicting before we ML-ify it, especially if we’re dealing with structured data like the example above (more on that in the next post).

What type of data are you working with?

Next let’s move along to the right side of the flowchart. For the rest of the post I’ll assume you want to use your data to generate future predictions.

Videos, images, audio

With unstructured data like videos, images, or audio files, machine learning will typically be your best bet. To understand why, think about how you’d go about analyzing these types of files without machine learning. If you wanted to write a program to determine whether an image contains an apple or an orange, you’d need to iterate over all the pixels in the image with a series of IF statements or rules you’ve specified in advance. You’ll quickly discover quite a few edge cases that don’t fit into the rules you’ve defined, like black & white images, or an image of a cross-section of an orange. That’s exactly why ML is a good fit: you can train a model on thousands of images of apples and oranges at all different angles and it’ll be able to generalize on images it hasn’t seen before.
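Here’s what the naive rule-based version might look like in Python, using a hypothetical “majority of pixels are orange” rule with arbitrary RGB thresholds. Images are represented as plain lists of (r, g, b) tuples to keep the sketch dependency-free.

```python
def looks_orange(pixel):
    """Crude rule for an 'orange' pixel; the thresholds are arbitrary."""
    r, g, b = pixel
    return r > 200 and 100 <= g <= 180 and b < 80


def is_orange_image(pixels):
    """Call it an orange if more than half of the pixels look orange."""
    pixels = list(pixels)
    return sum(looks_orange(p) for p in pixels) > len(pixels) / 2
```

Feed it a black & white photo of the same orange, where every pixel has equal red, green, and blue values, and the rule can never fire, which is exactly the kind of edge case described above.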

I haven’t yet made a distinction between using already available pre-trained models vs. building your own custom deep learning models. This largely depends on the type of task you’re solving. Let’s take this image of an orange as an example:

I could run this through a pre-trained model or build my own from scratch; it really depends on what I want the output of my model to be. If I’m building some sort of orange photo contest app and need my model to tell me simply whether or not an image contains an orange, I could utilize a pre-trained model like the one provided in the Cloud Vision API. Here are the labels I get back:

https://medium.com/media/9dd22dec575ca0e7a2f769c9eb095ffc/href

The Vision API will work great for my orange / not orange use case. But this image is actually of a blood orange, and now I want my app to identify what type of orange is in the image. I can’t reliably get this back from the Vision API, but I could use a tool like AutoML Vision to train my own custom model to identify blood oranges, navel oranges, and mandarin oranges. I don’t need to write any of the underlying model code to use AutoML, I just need to upload 10+ images of each type of orange into the UI and click a train button. AutoML makes use of a technique called Neural Architecture Search, which uses an ML model to generate different types of model architectures to find the optimal one for a particular task. But the great thing about AutoML is that I don’t need to understand how this works. When my model training completes I can evaluate its accuracy in the UI:

An AutoML Vision model for classifying mandarin, navel, and blood oranges.

When I’m happy with the results I can generate predictions via a custom REST API endpoint.

So far I haven’t had to write any of the model code since the prediction I want from my orange image hasn’t required it. But now I’m getting a little fancier, and I want my app to take the same types of images as input and then highlight the regions of the image that contain orange rind. We’re getting pretty specific here, so this is not a task for your average pre-trained model.

There are many ways to solve this, but one approach would be to build a TensorFlow model that makes use of transfer learning. With transfer learning, we can take a model that’s already been trained on lots of images to accomplish a task similar to ours (like identifying regions, or “masks” in an image), and then use that as a starting point for our training. This way we don’t need as much training data as we would if we were building a model from scratch. TensorFlow has a few models built for this exact use case, listed at the bottom of the table here. This blog post provides a great walkthrough of how to implement this in TensorFlow. If you’re looking to build a similar model with bounding boxes (identify boxed regions in the image instead of masks), I’ve got a post on doing this to detect Taylor Swift.

Video and audio data

I haven’t touched on generating predictions from video or audio data with machine learning, but the approach is very similar to what I’ve outlined above for images. If I wanted to analyze videos without machine learning, I’d need to take the pixel approach I mentioned above one step further and apply it across each frame in a video. This would quickly get messy with a rule-based approach. To get started analyzing videos with machine learning, you can check out the Cloud Video Intelligence API to get scene-level video annotations. For a deep dive on building your own custom model for analyzing videos, check out this Medium post.
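To make "apply it across each frame" concrete, here is a minimal Python sketch of that rule-based approach. It assumes each frame has already been decoded into a list of RGB tuples; the "orange" color thresholds and the 50% majority cutoff are hypothetical values picked for illustration, not tuned ones.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255
Frame = List[Pixel]


def is_orange_pixel(p: Pixel) -> bool:
    """Hypothetical hand-written color rule: strong red, medium green, little blue."""
    r, g, b = p
    return r >= 200 and 100 <= g <= 180 and b <= 80


def frame_is_orange(frame: Frame, majority: float = 0.5) -> bool:
    """Flag a frame when more than `majority` of its pixels pass the color rule."""
    if not frame:
        return False
    orange = sum(1 for p in frame if is_orange_pixel(p))
    return orange / len(frame) > majority


def orange_frames(video: List[Frame]) -> List[int]:
    """Apply the per-frame rule to every frame and return the flagged indices."""
    return [i for i, frame in enumerate(video) if frame_is_orange(frame)]


if __name__ == "__main__":
    orange, dark = (250, 140, 30), (20, 20, 20)
    video = [
        [orange, orange, orange, dark],  # 75% orange
        [orange, dark, dark, dark],      # 25% orange
    ]
    print(orange_frames(video))  # [0]
```

Every number here is a guess, and lighting changes, shadows, or an orange wall in the background would each need yet another rule, which is why this approach gets messy quickly.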

Audio data is also unstructured, but the process for analyzing and generating predictions on it without machine learning is different from looking at the pixels in an image. Let’s say I want to write a rule-based program to determine whether or not an audio file contains a human speaking. First, I’d need to read the raw audio data and run it through a Fourier Transform (FT). I am not an expert on FTs, but suffice it to say it will transform the data from the time domain to the frequency domain. Once I have the frequency data (measured in hertz), I can find the average frequency and compare that number to the range of frequencies a human voice typically occupies (85–255 Hz).

Sounds simple enough, right? You could write a script to calculate the average frequency of a file and then check whether it falls within the typical 85–255 Hz human range. But what about cases where a person’s voice falls slightly outside the range? Even with frequencies only slightly less than 85 or more than 255, the rule-based approach would classify these as “not human.” Or, take a file with a person speaking and a cat meowing. If the meows take up more time, this will pull the average out of the voice range and the script will classify it as “not human” as well. There are also many other sounds that occupy a similar frequency to a human voice. You get the idea.
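The rule from the last two paragraphs can be sketched in a few lines of Python. This is an illustration under several assumptions: it uses a naive discrete Fourier transform from the standard library, treats "average frequency" as the magnitude-weighted mean of the spectrum (the spectral centroid), and uses synthetic sine waves in place of real audio. The 450 Hz "meow" is a hypothetical value, and its larger amplitude stands in for a meow that dominates the file.

```python
import cmath
import math
from typing import List

VOICE_RANGE_HZ = (85.0, 255.0)  # typical human voice range cited above


def spectrum(samples: List[float]) -> List[float]:
    """Magnitudes of a naive O(n^2) DFT for bins 0..n/2 (non-negative frequencies)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags


def average_frequency(samples: List[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency in Hz (the spectral centroid)."""
    mags = spectrum(samples)
    n = len(samples)
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum((k * sample_rate / n) * m for k, m in enumerate(mags)) / total


def sounds_human(samples: List[float], sample_rate: float) -> bool:
    """The rule: 'human' if the average frequency lands inside the voice range."""
    low, high = VOICE_RANGE_HZ
    return low <= average_frequency(samples, sample_rate) <= high


def tone(freq: float, amplitude: float, sample_rate: float, n: int) -> List[float]:
    """Synthetic sine wave standing in for real audio."""
    return [amplitude * math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    rate, n = 1000, 500                    # 0.5 s at 1 kHz gives 2 Hz per DFT bin
    voice = tone(150, 1.0, rate, n)
    low_voice = tone(80, 1.0, rate, n)     # a voice just below the range
    meow = tone(450, 2.0, rate, n)         # hypothetical, louder "meow"
    mixed = [v + m for v, m in zip(voice, meow)]
    for name, clip in [("voice", voice), ("low voice", low_voice), ("voice + meow", mixed)]:
        print(name, round(average_frequency(clip, rate)), sounds_human(clip, rate))
```

The pure 150 Hz tone passes, but the 80 Hz voice fails, and mixing in the louder 450 Hz tone drags the average up to 350 Hz, so the file is labeled "not human" even though it contains a voice.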

It would be better if we could train a program to learn what the data associated with a human voice looks like, or how to identify a meow, or the word “go.” For a simple way to convert audio to text or text to audio, try the Cloud Speech APIs. And to get started building your own model for custom audio analysis (like training a model to predict the specific person speaking), check out this TensorFlow tutorial.

What’s next?

Stay tuned for part 2 where I’ll cover when to use machine learning to analyze text, numerical, and categorical data. And for more info on the tools I’ve covered in this post, check out:

I’d love to hear what you think about this approach for evaluating whether to solve a problem with ML. Find me on Twitter @SRobTweets or leave a comment below.

Classifying congressional bills with machine learning

Sara Robinson — Thu, 01 Nov 2018 13:49:37 GMT

I’m always looking for new datasets for ML projects, so I was particularly excited to discover this public domain dataset of ~400k congressional bills. The dataset has 20+ data points for each bill. Here’s an example subset of this data for one bill:

Title: A bill to provide for the expansion of the James Campbell National Wildlife Refuge Honolulu County Hawaii
ID: 109-S-1165
URL: https://www.congress.gov/bill/109th-congress/senate-bill/1165
Topic: Public Lands
Date Introduced: 6 June 2005
Date Passed: 25 May 2006
Congressperson who introduced it: Daniel Inouye
Passed: Yes

The bills from this dataset were all manually assigned a topic by domain experts. The resolutions in the dataset are unlabeled, which makes it a great fit for AutoML Natural Language: we can use the labeled bills to train the model and see how it performs on unlabeled resolutions. We’ll use the title of the bill as the input to our model, and the label will be the topic. The original dataset includes a higher-level topic (the Major field in the dataset) and a more specific (Minor) topic for each bill. To keep things simple, we’ll use just the Major topic field to categorize bills.

Note that there are many possible input and label combinations you could use to build ML models from this data; we’re just using two in this example, Title and Major topic. Here are some sample inputs and predictions for our model:

Title: To permit the televising of Supreme Court proceedings.

Category: Technology

-----

Title: A bill to provide a program of tax adjustment for small business and for persons engaged in small business.

Category: Domestic commerce

The first step in building a model with AutoML NL is uploading a CSV of training data. I wrote a script to extract only bills with a topic assigned and strip some characters from the text. The resulting CSV looks like this:

https://medium.com/media/14cd130a7324aa27091bab023044a8df/href
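The extraction script itself isn't reproduced here, so this is a minimal Python sketch of the same idea under stated assumptions: the raw data is a CSV with hypothetical `Title` and `Major` columns, an empty `Major` value means the row is unlabeled, and "stripping characters" is assumed to mean removing double quotes and collapsing whitespace.

```python
import csv
import io
import re
from typing import Iterable, TextIO


def clean_title(title: str) -> str:
    """Assumed cleanup: drop double quotes and collapse runs of whitespace/newlines."""
    return re.sub(r"\s+", " ", title.replace('"', "")).strip()


def write_training_csv(rows: Iterable[dict], out: TextIO) -> int:
    """Write `title,topic` rows for labeled bills only; return how many were written."""
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for row in rows:
        topic = (row.get("Major") or "").strip()   # hypothetical column name
        title = clean_title(row.get("Title") or "")
        if not topic or not title:
            continue  # unlabeled bill or resolution: leave it out of training
        writer.writerow([title, topic])
        written += 1
    return written


if __name__ == "__main__":
    raw = io.StringIO(
        'Title,Major\n'
        '"To permit the televising of\nSupreme Court proceedings.",Technology\n'
        '"A concurrent resolution",\n'
    )
    out = io.StringIO()
    print(write_training_csv(csv.DictReader(raw), out))
    print(out.getvalue(), end="")
```

Writing through the `csv` module means a title that contains a comma gets quoted correctly instead of breaking the two-column layout AutoML expects.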

If you’d like to play with the dataset I used to train the model, I’ve made the CSV file publicly available here.

Training an AutoML NL model

Now that we’ve got a CSV with our text inputs and their associated labels, we can upload this directly to the AutoML UI to create our dataset:

Once it uploads, we can look at all of our training data in the UI:

To train our model, all we need to do is press the train button:

How cool is that?! We don’t need to write any of the underlying model code — AutoML will handle that automagically for us.

Evaluating our model

To evaluate our model we’ll look at the confusion matrix in the Evaluate tab of the UI:

It may look confusing (it is called a confusion matrix after all), but it turns out it’s not so hard to understand: what we ideally want to see here is a strong diagonal from the top left. The diagonal tells us, for each topic, the percentage of text items from our test set that the model classified correctly. Side note: AutoML automagically splits the data we upload into training, test, and validation sets.

The confusion matrix shows the accuracy for 10 of our 20 topics, but we can also look individually at a topic to see specific examples that our model classified correctly and incorrectly. Looking here, we might want to improve the training data for Domestic Commerce bills, since our model only classified 78% of those correctly, and confused about 10% of them as Macroeconomics.
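To show where percentages like "78% correct, 10% confused as Macroeconomics" come from, here is a small Python sketch that builds a row-normalized confusion matrix from (true label, predicted label) pairs. Each row sums to 100%, the diagonal cell is the share of that topic classified correctly, and the off-diagonal cells are the confusions. The pairs in the example are invented for illustration; they are not this model's test results.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def confusion_percentages(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """matrix[true][predicted] = % of items with that true label given that prediction."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for true_label, predicted in pairs:
        counts[true_label][predicted] += 1
    matrix = {}
    for true_label, row in counts.items():
        total = sum(row.values())
        matrix[true_label] = {pred: 100.0 * n / total for pred, n in row.items()}
    return matrix


if __name__ == "__main__":
    # Invented example: 10 Domestic commerce test items, 8 classified correctly.
    pairs = ([("Domestic commerce", "Domestic commerce")] * 8
             + [("Domestic commerce", "Macroeconomics")] * 2
             + [("Defense", "Defense")] * 5)
    for true_label, row in confusion_percentages(pairs).items():
        print(true_label, row)
```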

Generating predictions on unlabeled bills

Next it’s time for the best part — generating predictions on unlabeled bills. We can try this out right in the AutoML NL UI. Let’s try the following unlabeled bill:

A concurrent resolution making the necessary arrangements for the inauguration of the President-elect and Vice President-elect of the United States.

The model says there’s a 95.6% chance this bill is related to Government Operations. I’ve run two more examples through our model to see how it performs on new data:

Bill: Honoring women who have served, and who are currently serving, as members of the Armed Forces and recognizing the recently expanded service opportunities available to female members of the Armed Forces.

Predicted label: Civil Rights 88.5%, Defense 7.5%

-----

Bill: Setting forth the congressional budget for the United States Government for fiscal year 2010 and including the appropriate budgetary levels for fiscal years 2009 and 2011 through 2014.

Predicted label: Macroeconomics 99.5%

We don’t have the ground truth label to compare our model’s predictions to since these are unlabeled, but the results generated look pretty accurate. If we want to build an app that auto-classifies new bills, we could do that with a simple AutoML API request to our trained model. Here’s an example using curl:

https://medium.com/media/5c45897cc0d5cc73a4e96cc4824f61b9/href
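As a rough Python counterpart to a curl call like this, here is a sketch that builds (but does not send) the same kind of prediction request with the standard library. It assumes the `v1beta1` REST endpoint and `us-central1` location AutoML Natural Language used when this was written, plus an OAuth access token you supply yourself (for example from `gcloud auth application-default print-access-token`); the project and model IDs are placeholders. Check the current AutoML docs before relying on the URL or field names.

```python
import json
import urllib.request

API_ROOT = "https://automl.googleapis.com/v1beta1"  # assumed API version


def build_nl_predict_request(project_id: str, model_id: str, text: str,
                             access_token: str,
                             location: str = "us-central1") -> urllib.request.Request:
    """Build (but do not send) a POST request to an AutoML NL model's predict method."""
    url = f"{API_ROOT}/projects/{project_id}/locations/{location}/models/{model_id}:predict"
    body = {"payload": {"textSnippet": {"content": text, "mime_type": "text/plain"}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_nl_predict_request("my-project", "MY_MODEL_ID",
                                   "A bill to fund rural broadband.", "ACCESS_TOKEN")
    print(req.full_url)
    # Sending it requires real credentials and a deployed model:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```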

Be on the lookout for a post from one of my teammates that covers using this API to classify new bills.

What’s next?

For the American readers here: since Election Day is coming up in the US and this post covered a political dataset, I’ll use this opportunity as a shameless plug to encourage you to vote :)

Ok, back to my regularly scheduled programming — check out these resources to learn more about what I covered here:

And if writing model code is your thing, check out this post I did on building a text classification model with Keras to predict the price of wine given its description, or this one on using TF Hub to build a text classification model. Stay tuned for more blog posts on this dataset. I’d also love to hear if you do anything interesting with it! You can leave a comment below or find me on Twitter at @SRobTweets.

Crowdsourcing ML training data with the AutoML API and Firebase

Sara Robinson — Mon, 08 Oct 2018 08:12:25 GMT

Want to build an ML model but don’t have enough training data? In this post I’ll show you how I built an ML pipeline that gathers labeled, crowdsourced training data, uploads it to an AutoML dataset, and then trains a model. I’ll be showing an image classification model using AutoML Vision in this example, but the same pipeline could easily be adapted to AutoML Natural Language. Here’s an overview of how it works:

A web app asks users to upload an image and assign a label
Using Cloud Functions for Firebase, the labeled image gets uploaded to Cloud Storage
When we have a specified number of training images per label, create a CSV of image paths and their corresponding label
Upload the CSV to an AutoML dataset
Kick off model training

Here’s an architecture diagram:

Want to jump to the code? The full example is available on GitHub.

Collecting crowdsourced images

Let’s say I’m building a model to detect types of cheese. I don’t have enough training images of my own, so I’d like to collect images from cheesemongers around the world. For that I’ve built a simple web app:

This model will only have 3 labels (brie, blue, camembert), and the user will be able to select the label for the photo they are uploading. I’m using the Firebase SDK for Cloud Storage to upload images to Cloud Storage directly from the web client.

Images will automatically be uploaded to a bucket called .appspot.com. Because AutoML requires images to be in a bucket called -vcm, I’ve created a cloud function that will copy uploaded images to this vcm bucket:

https://medium.com/media/435d3c3da970fc6ffe6aedecc34598f5/href

Once the image is copied, the function will write to my Firebase Realtime Database, where I’m keeping track of the number of images we’ve collected for each label:

I make use of Firebase transactions to update the label count in my Firebase database:

https://medium.com/media/9cf0e19c04353aa1a1bde2501b9e7ca5/href
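Transactions matter here because several users may upload a "brie" image at the same moment. Realtime Database transactions are optimistic: your update function runs against the current value, and if another client writes to that location first, the function is re-run with the new value. The Python sketch below only simulates that retry pattern with a toy in-memory store; `VersionedStore` and its methods are hypothetical and are not the Firebase API.

```python
import threading
from typing import Callable, Dict, Tuple


class VersionedStore:
    """Toy in-memory store with compare-and-set; a stand-in, not the Firebase API."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, int]] = {}  # key -> (version, value)
        self._lock = threading.Lock()

    def read(self, key: str) -> Tuple[int, int]:
        with self._lock:
            return self._data.get(key, (0, 0))

    def compare_and_set(self, key: str, expected_version: int, value: int) -> bool:
        with self._lock:
            version, _ = self._data.get(key, (0, 0))
            if version != expected_version:
                return False  # another writer got there first
            self._data[key] = (version + 1, value)
            return True


def transaction(store: VersionedStore, key: str, update: Callable[[int], int]) -> int:
    """Optimistic update: re-run `update` on the latest value until the write wins."""
    while True:
        version, current = store.read(key)
        new_value = update(current)
        if store.compare_and_set(key, version, new_value):
            return new_value


def increment_concurrently(store: VersionedStore, key: str, writers: int) -> int:
    """Simulate `writers` uploads each bumping the same label count at once."""
    threads = [
        threading.Thread(target=transaction, args=(store, key, lambda n: n + 1))
        for _ in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.read(key)[1]


if __name__ == "__main__":
    print(increment_concurrently(VersionedStore(), "brie", 50))  # 50
```

A plain read-then-write from each upload could lose increments when two uploads read the same old count; the retry loop is what keeps the final count at exactly one per image.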

Uploading crowdsourced images to an AutoML dataset

I have a separate function that’s triggered whenever a label count is updated in my Firebase database shown above. AutoML Vision requires at least 10 images per label to train a model, but we’ll likely want more for higher accuracy. Here I’ve specified the number of images I’d like to collect for each label. If we’ve reached that number, we’re ready to upload our images to AutoML:

https://medium.com/media/4a2e26cb6ad187c6feb68b529b4803f5/href
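Stripped of the Firebase trigger, the check boils down to "does every label have enough images yet?". Here it is as a small Python sketch: the three labels come from the cheese example, the 100-image target is a hypothetical value, and the guard reflects AutoML Vision's 10-images-per-label minimum.

```python
from typing import Dict, Sequence

LABELS = ("brie", "blue", "camembert")  # labels from the cheese example
AUTOML_MIN_PER_LABEL = 10               # AutoML Vision's minimum per label


def ready_to_train(counts: Dict[str, int], labels: Sequence[str], target: int) -> bool:
    """True once every label has at least `target` collected images."""
    if target < AUTOML_MIN_PER_LABEL:
        raise ValueError(f"target must be at least {AUTOML_MIN_PER_LABEL} images per label")
    return all(counts.get(label, 0) >= target for label in labels)


if __name__ == "__main__":
    target = 100  # hypothetical collection goal
    print(ready_to_train({"brie": 120, "blue": 100, "camembert": 100}, LABELS, target))  # True
    print(ready_to_train({"brie": 120, "blue": 99}, LABELS, target))                     # False
```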

You could also write this function to kick off training periodically, for example every time you have 500 new labeled images.

To upload labeled images to AutoML, we can create a CSV where the first column contains the GCS path of the image and the second column contains the label for that image. Once we’ve collected enough images, we’ll create a CSV:

https://medium.com/media/698ff1de138260c9118ed97d963eee5a/href
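Here is a minimal Python sketch of that CSV step, assuming you already know the Cloud Storage path of every collected image, grouped by label. The bucket and file names are hypothetical. Writing through the `csv` module keeps the output well-formed even if a path ever contains a comma.

```python
import csv
import io
from typing import Dict, List


def build_import_csv(images_by_label: Dict[str, List[str]]) -> str:
    """Return CSV text with one `gs://path,label` row per image."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for label, paths in sorted(images_by_label.items()):
        for path in paths:
            if not path.startswith("gs://"):
                raise ValueError(f"expected a Cloud Storage URI, got {path!r}")
            writer.writerow([path, label])
    return buf.getvalue()


if __name__ == "__main__":
    print(build_import_csv({
        "brie": ["gs://my-project-vcm/brie/1.jpg"],
        "blue": ["gs://my-project-vcm/blue/7.jpg"],
    }), end="")
```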

Next, we’ll upload our CSV to Cloud Storage and then use the importData method from the AutoML API to add these images to our AutoML dataset. Our JSON request to importData includes our project and dataset IDs, along with the CSV filepath:

https://medium.com/media/cb8c4fe3f122987fe48699026c461ab0/href

We could alternatively pass a list of individual image paths to inputUris, but currently the only way to upload labeled data to AutoML through the API is to provide a CSV path. With our request above, we’re ready to call importData:

https://medium.com/media/843eafa57289e0b34bd4bbe631f94219/href
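To show what the `importData` request contains independent of the Node.js client, here is a Python helper that builds the URL and JSON body. It assumes the `v1beta1` REST path and `us-central1` location AutoML used when this was written; the project ID, dataset ID, and CSV path are placeholders.

```python
import json
from typing import Tuple

API_ROOT = "https://automl.googleapis.com/v1beta1"  # assumed API version


def import_data_request(project_id: str, dataset_id: str, csv_uri: str,
                        location: str = "us-central1") -> Tuple[str, str]:
    """Build the POST URL and JSON body for an AutoML importData call."""
    url = (f"{API_ROOT}/projects/{project_id}/locations/{location}"
           f"/datasets/{dataset_id}:importData")
    # The CSV of `gs://path,label` rows is referenced by its Cloud Storage URI.
    body = {"inputConfig": {"gcsSource": {"inputUris": [csv_uri]}}}
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = import_data_request("my-project", "MY_DATASET_ID",
                                    "gs://my-project-vcm/labels.csv")
    print(url)
    print(body)
```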

Note that the maximum duration for a single cloud function is 9 minutes. If you’re collecting lots of images you should expect the AutoML import to take longer than this, and you’ll want to use something other than Cloud Functions to kick off your model training.

When you check your AutoML project, you should see that images are being uploaded to your dataset. Once the import completes, the function calls the createModel method, which kicks off training, creates a new model, and deploys it:

https://medium.com/media/108b884b2db9a36d60bfc629fc40d9a2/href
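For reference, here is a rough Python sketch of a `createModel` request body, under the same assumptions as a direct REST call would need (the `v1beta1` path, `us-central1`, placeholder IDs). The image-classification metadata is left empty on the assumption that default training settings are acceptable. The call returns a long-running operation rather than the finished model, so code that waits on it needs to poll the operation.

```python
import json
from typing import Tuple

API_ROOT = "https://automl.googleapis.com/v1beta1"  # assumed API version


def create_model_request(project_id: str, dataset_id: str, display_name: str,
                         location: str = "us-central1") -> Tuple[str, str]:
    """Build the POST URL and JSON body for an AutoML Vision createModel call."""
    url = f"{API_ROOT}/projects/{project_id}/locations/{location}/models"
    body = {
        "displayName": display_name,
        "datasetId": dataset_id,
        # Empty metadata: assume default training settings for image classification.
        "imageClassificationModelMetadata": {},
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    print(create_model_request("my-project", "MY_DATASET_ID", "cheese_model"))
```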

That’s all you need to build a pipeline for collecting labeled training data!

Get started

Inspired to start crowdsourcing training data for your own ML models? Dive into the AutoML API docs. I used the Node.js client in this example, but there are also client libraries for Python and Java. In addition to the importData and createModel methods of the AutoML API I’ve shown, it has many other cool features like exporting a dataset, updating the IAM policy for a model, and of course generating predictions. It’s also worth noting that AutoML Vision has a human labeling feature. If you’ve already collected the images you’ll use to train your model but need help labeling them, it’s worth a try. Check out the full code for this example on GitHub, and let me know what you think on Twitter at @SRobTweets.

Query a custom AutoML model with Cloud Functions and Firebase

Sara Robinson — Mon, 27 Aug 2018 19:29:46 GMT

If you haven’t heard about AutoML yet, it’s the newest ML offering on Google Cloud and lets you build custom ML models trained on your own data, with no model code required. It’s currently available for images, text, and translation models. There are lots of resources out there to help you prepare your data and train models in AutoML, so in this post I want to focus on the prediction (or serving) part of AutoML.

I’ll walk you through building a simple web app to generate predictions on your trained model. It makes use of Firebase and Cloud Functions so it’s entirely serverless (yes, I put serverless and ML in the same blog post 🙄). Here’s the app architecture:

Want to skip to the code? It’s all available in this GitHub repo.

The AutoML API

I was particularly excited to discover that in addition to providing an entire UI for building and training models, AutoML has an API for adding training data, deploying models, generating predictions, and more. Let’s say you’re crowdsourcing training data for your model: with the AutoML API you could dynamically add new data to your project’s dataset and regularly train updated versions of your model. I’ll cover that in a future post; here I’ll focus on the prediction piece.

For this demo we’ll build a web app for generating predictions on a trained AutoML Vision model (though it could easily be adapted to AutoML NL since they use the same API). The particular model I’ll be querying can detect the type of cloud in an image. On the frontend, users will be able to upload an image for prediction. Our app will upload that image to Firebase Storage, which will kick off a Cloud Function. Inside the function we’ll call the AutoML API and return the prediction data to our frontend client. The finished product looks like this:

Setting up your Firebase project

Firebase is a great way to get apps up and running quickly without worrying about managing servers. It provides a variety of SDKs that make it easy to do things like upload images, save data, and authenticate users directly from client-side JavaScript.

For this blog post I’ll assume you already have a trained AutoML Vision model that’s ready for predictions. The next step is to associate this project with Firebase. Head over to the Firebase console and click Add project. Then click on the dropdown and select the Cloud project where you’ve created your AutoML model. If you’ve never used Firebase before, you’ll also need to install the CLI.

Next, clone the code from this GitHub repo and cd into the directory where you’ve downloaded it. To initialize Firebase in that directory run firebase init and select Firestore, Functions, Hosting, and Storage when prompted (this demo uses all four):

Now we’re ready to go. In the next step we’ll set up and deploy the Cloud Function that calls AutoML.

AutoML + Cloud Functions for Firebase

You can use Cloud Functions independently of Firebase, but since I’m using so many Firebase features in my app already, I’ll make use of the handy Firebase SDK for Cloud Functions. Take a look at the functions/index.js file and update the 3 variables at the top to reflect the info for your project:

https://medium.com/media/6fb7cf2e8b7a0262788cb3588bf80cfb/href

Our Cloud Function is defined in exports.callCustomModel. To trigger this function whenever a file is added to our Storage bucket we use: functions.storage.object().onFinalize(). Here’s what’s happening in the function:

Download the image to our Cloud Functions file system (we can use the /tmp directory to do this)
Base64 encode the image to prepare it for the AutoML prediction request
Make the AutoML prediction request using the handy nodejs-automl package
Write the prediction response to Cloud Firestore
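The function in this post is written in Node.js; as a language-neutral illustration, here are steps 2 and 3 sketched in Python. It assumes the image has already been downloaded to a local file (any `/tmp/...` path you pass in is hypothetical) and builds the JSON body an AutoML Vision predict call expects, with the image base64-encoded under `payload.image.imageBytes`.

```python
import base64
import json


def vision_predict_body(image_bytes: bytes) -> str:
    """JSON body for an AutoML Vision predict call with a base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"payload": {"image": {"imageBytes": encoded}}})


def body_from_file(path: str) -> str:
    """Read an already-downloaded image (e.g. from /tmp) and build the request body."""
    with open(path, "rb") as f:
        return vision_predict_body(f.read())


if __name__ == "__main__":
    print(vision_predict_body(b"abc"))  # {"payload": {"image": {"imageBytes": "YWJj"}}}
```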

We can create an AutoML prediction client with 2 lines of code:

https://medium.com/media/eed858777d3036de0082a6e8aa49ac11/href

The request JSON to make an AutoML prediction looks like this:

https://medium.com/media/e6295422bb95438deab4f3cea0c4528f/href

All we need to do to send this to the AutoML API is create a prediction client and call predict():

https://medium.com/media/339c48b6b7e6b21474e6a4a4d9a723d3/href

Time to deploy the function. From the root directory of this project, run firebase deploy --only functions. When the deploy completes you can test it out by navigating to the Storage section of your Firebase console and uploading an image:

Uploading an image to Firebase Storage

Then, head over to the Functions part of the console to look at the logs. If the prediction request completed successfully, you should see the prediction response JSON in the logs:

Function logs

Inside the function, we also write the prediction metadata to Firestore so that our app can display this data on the client. In the Firestore console, you should see the metadata saved in an images/ collection:

Prediction metadata in Cloud Firestore

With the function working, it’s time to set up the app frontend.

Putting it all together

To test the frontend locally, run the command firebase serve from the root directory of your project and navigate to localhost:5000. Click on the Upload a cloud photo button in the top right. If the image you uploaded returned a prediction from your model, you should see that displayed in the app. Remember that this app is configured for my cloud detector model, but you can easily modify the code to make it work for your own domain. When you upload a photo, check your Functions, Firestore, and Storage dashboards to ensure everything is working.

Finally, let’s make use of Firebase Hosting to deploy the frontend so we can share it with others! Deploying the app is as simple as running the command firebase deploy --only hosting. When the deploy finishes your app will be live at your own firebaseapp.com domain.

That’s it! We’re getting predictions from a custom ML model with entirely serverless technology. To dive into the details of everything covered in this post, check out these resources:

Let me know what you think in the comments or find me on Twitter at @SRobTweets.

Query a custom AutoML model with Cloud Functions and Firebase was originally published in HackerNoon.com on Medium, where people are continuing the conversation by highlighting and responding to this story.

Building a ML keynote demo for 100,000+ people

Sara Robinson — Mon, 20 Aug 2018 14:11:16 GMT

Did you miss the AutoML announcements and demos during the Cloud Next ’18 keynote? I’ve got you covered! In this post I’ll provide an overview of the AutoML products launched and the demos I showed during the keynote. I’ll also share some insights on the demo building process so that you can apply it to your own demos, if that’s your thing.

What is AutoML?

AutoML is a new ML offering on Google Cloud that lets you train custom machine learning models on your own data. The best part? You don’t have to worry about writing any of the model code: just press a train button and you’ll magically get access to your custom predictions through a REST API endpoint.

If you think of machine learning tools across a spectrum, AutoML falls right in the middle:

AutoML is currently available in 3 variants, all of which launched in beta at Next:

Vision: build image classification models trained on images from your dataset to do things like classifying the type of cloud in an image.
Natural Language: build custom text classification models to classify sentences and text documents into your own categories.
Translation: build domain-specific translation models to improve translation for industry-specific jargon and linguistic nuances.

During the keynote I showed a demo of AutoML Vision and NL. If you want to watch me be loaded into a 14x14-foot rotating cube (there’s something I never thought I’d say), you can watch the recording here:

https://medium.com/media/4f4710b82ddaa2768ce6e17a3d49424c/href

One of my favorite things about AutoML is that it gives you a domain-specific model, trained on your own dataset. This also makes it difficult to demo since any AutoML example will be niche and therefore not generally applicable. We cycled through lots (seriously lots) of different datasets for AutoML keynote demo ideas, with the goal of finding something that would inspire people to think about how they could apply AutoML to their own problem. I’ll outline both demos below.

You won’t be-leaf the AutoML Vision demo

To demo AutoML Vision I built a model that would take an image of a leaf and predict its species. But let’s start from the beginning — how did I land on leaves? Behind any good machine learning model is a high quality set of training data.

Finding an image dataset

The size of a training dataset depends on the type of model you’re building. If you’re building a model from scratch you’ll need on the order of thousands of images per label for high accuracy. Luckily AutoML makes use of a technique called transfer learning, which involves using a pre-trained model as a starting point for training a model on a similar task rather than starting from scratch. As a result, training a model with AutoML doesn’t require as much training data. You can start with as few as 10 images per label, though you’ll probably want a few more for a high accuracy model.

With that in mind, I needed to find or create an image dataset specific enough to highlight the value of AutoML Vision. There are a lot of great publicly available image datasets out there, but many of them use higher level labels which we could get from something like the Cloud Vision API.

Here’s another kicker: I needed to make sure I had the rights to use the dataset for this demo. That meant it needed to either be public domain, have a creative commons license that allowed for commercial use, or comprise images taken by folks (like me) who released all rights to the images. Along the way I discovered two handy tools for filtering datasets and images by license:

CC Search: this searches across multiple sources for images, and you can filter by licenses that allow for commercial use or modification
Kaggle: my favorite place to find all sorts of datasets. You can filter by licenses in the dropdown on the Datasets page. Or, you can look at the license for any individual dataset on the Overview page:

Note that there are many different types of creative commons licenses. Anything with “NC” in the name (like “CC BY-NC-SA 4.0”) means non-commercial, so you can’t use these datasets for any commercial purposes (and an enterprise cloud conference is pretty much as commercial as it gets).

After weighing all these requirements we settled on the Leafsnap dataset, which comprises over 30k images of leaves from 185 different species.

Preparing the data and training a model

To keep things simple I used only 10 types of leaves from the original dataset. After narrowing down the images I was ready to upload my data to AutoML Vision. There are a couple of ways to do this (more details in this video):

Upload them directly to Cloud Storage and create a CSV with the filepath of your image in one column, and the associated label(s) in another. Once I got my images in GCS, I wrote a quick Python script to generate the CSV
Put your images in folders with the name of their label and create a zip. When you upload the zip AutoML will assign labels based on folder names
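The quick CSV script from the first option isn't shown, so here is a minimal Python sketch that combines the two ideas: it writes the `path,label` CSV, taking each label from the image's parent folder name the way the zip upload does. It assumes the images were copied to Cloud Storage as `gs://bucket/.../label/file`; the bucket and species folder names are hypothetical.

```python
import csv
import io
from typing import Iterable


def labels_from_folders(gcs_paths: Iterable[str]) -> str:
    """CSV text of `path,label` rows, using each image's parent folder as its label."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for path in gcs_paths:
        if not path.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {path!r}")
        parts = path[len("gs://"):].split("/")
        if len(parts) < 3:  # need at least bucket / label folder / file
            raise ValueError(f"expected gs://bucket/.../label/file: {path!r}")
        writer.writerow([path, parts[-2]])
    return buf.getvalue()


if __name__ == "__main__":
    print(labels_from_folders([
        "gs://my-bucket/leaves/acer_rubrum/1.jpg",
        "gs://my-bucket/leaves/quercus_alba/9.jpg",
    ]), end="")
```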

Once you’ve imported all your images you’ll see them in the UI:

Training a model is really as simple as pressing a button:

Evaluating the model and generating predictions

Once your model is trained, you can look at some common ML metrics to evaluate its accuracy. My favorite is the confusion matrix, which shows the percentage of images from the test set that a model was able to classify correctly. Ideally you want to see a strong diagonal down the top left, like this:

You can click on any of the squares to see which images the model found confusing. Next, it’s time for the best part — generating predictions on images your model hasn’t seen before.

You may have noticed that the leaf images in the Leafsnap dataset are pretty homogeneous: they’re all taken on white, lab-like backgrounds. Since this is the entire “world” our model was trained on, we shouldn’t expect it to do a great job classifying images of leaves with busy backgrounds, like this:

I was, however, impressed to discover that the model was able to classify some images of leaves “in the wild” where the shape of the leaf was more clearly defined in the image:

If I add more varied images to the training dataset, I can expect higher-accuracy predictions on leaf images taken in the wild.

The screenshot above shows how to generate a prediction in the AutoML UI. But chances are you’ll want to build an app that dynamically generates predictions on your trained model. That’s where the AutoML API comes in. Here’s all you need in your JSON request to the AutoML API:

https://medium.com/media/bda5e24d8a4c5c2efcb8c8ea2e05c0c0/href

And here’s how you’d make a request to your custom model with curl:

https://medium.com/media/ccde50663fdc3c119d59b97cb5b325d5/href

AutoML Natural Language demo

The same dataset selection criteria from Vision also applied to NL. I was looking for a dataset specific enough that you couldn’t take the text and run it through the Cloud NL API’s content classification method (which is great in a lot of cases). I also needed to filter by dataset license. I did some fun experiments while looking for NL dataset options, like building a model that can predict the region a wine came from based on the description, a model for predicting the source of an article based only on its headline, and a model to classify text messages as “spam” or “not spam” (but think about showing spammy text messages on a giant screen at a livestreamed event 😂).

For the demo we chose to go with a dataset on Kaggle provided by the non-profit DonorsChoose:

DonorsChoose matches teachers who need resources for their classroom with donors. They currently have a team of volunteers that manually screens and categorizes each submission, so that donors are matched with projects they care about. The categories are specific to their application (things like “Lab Equipment,” “Art Supplies,” and “Trips”), so a generic pre-trained API wouldn’t work. DonorsChoose released a public domain dataset with over 1 million teacher requests to Kaggle to see if a community of ML experts and data scientists could build them a solution. I wanted to see if AutoML could help.

Preparing the data and training a model

Because AutoML NL has a limit of 100,000 examples in a dataset, I wrote a script to gather a subset of the original 1M+ dataset, limiting it to 9 categories. Uploading text data to AutoML NL is as simple as creating a CSV with your raw text in one column and the associated category in another (you can also build models with multiple labels per text input). Once the data has been imported, you’ll see something similar to AutoML Vision:

To train the model, you guessed it — just press a button! Note that training NL models currently takes considerably longer than Vision. I’ve found NL models take about 3–4 hours to train (you’ll get an email when it completes).
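The subsetting script isn't shown either. The post doesn't say how the subset was chosen, so sampling an equal number of examples per category is an assumption in this Python sketch: it's one reasonable way to stay under AutoML NL's 100,000-example cap without letting one category dominate. The category names come from the DonorsChoose examples above; the rows are made up.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

MAX_EXAMPLES = 100_000  # AutoML NL dataset limit mentioned above

Row = Tuple[str, str]  # (request text, category)


def balanced_subset(rows: Iterable[Row], categories: Sequence[str],
                    max_total: int = MAX_EXAMPLES, seed: int = 0) -> List[Row]:
    """Keep only `categories` and sample up to max_total // len(categories) rows from each."""
    wanted = set(categories)
    by_category: Dict[str, List[Row]] = defaultdict(list)
    for text, category in rows:
        if category in wanted:
            by_category[category].append((text, category))
    per_category = max_total // len(categories)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset: List[Row] = []
    for category in categories:
        items = by_category[category]
        subset.extend(rng.sample(items, min(per_category, len(items))))
    return subset


if __name__ == "__main__":
    rows = ([(f"request {i}", "Art Supplies") for i in range(50)]
            + [(f"request {i}", "Trips") for i in range(5)]
            + [("unrelated request", "Other")])
    print(len(balanced_subset(rows, ["Art Supplies", "Trips"], max_total=20)))  # 15
```

With 9 categories and the 100,000 cap, this allows up to 11,111 examples per category; categories with fewer examples simply contribute all of theirs.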

Evaluating the model and generating predictions

To evaluate the accuracy of single-label NL models, we can also look at the confusion matrix:

Here we also see a pretty strong diagonal from the top left, which means our model categorized the majority of our test data correctly. Now time to generate a prediction on text input our model hasn’t seen before:

In this example our model predicted the correct category of “Lab Equipment” with 98% confidence for this particular text input:

My students need aquarium lighting for our environmental clownfish aquaculture project.

Like AutoML Vision, we have access to a custom API endpoint for generating predictions:

https://medium.com/media/6a795c240ebb8d1b69c80f374c1819b3/href

It’s important to note that only I, and any developers I’ve shared my project with, have access to this model. None of the training data for these models will be used to improve the Cloud ML APIs.

Presentation time

Once you’ve built the demo, what could possibly go wrong? Turns out lots of things. Live demos of developer products are always high risk, but my team are big believers in them. It’s infinitely more authentic to show a product being used live than to show a slide and talk through a product’s features. Even if something goes wrong, that’s real and relatable to developers.

I was amazed at how thorough our A/V team was in helping me think about backup plans for every possible type of demo failure:

Tech (AutoML) fail: What if the prediction request failed? If this happened I had screenshots of a successful prediction ready to go in a separate browser window. For some demos a backup screen recording is best.
Hardware fail: If the machine I was demoing from stopped working, we had a second machine with the demo configured and the A/V team was ready to switch to it if necessary. Not only was the demo configured on the backup machine, one of my teammates was backstage mirroring everything I was doing on screen in near realtime so that if we had to switch to this machine it would be in the same place where I left off.
Human fail: Humans are not perfect, and so in the event that I failed to show up or deliver the demo I had another teammate backstage ready to present.

Before I presented I had a few hours to freak out backstage and convince myself that no one would actually watch this thing. I spent most of that time listening to Hamilton. Luckily I didn’t find out until after I presented that there were over 100k people on the livestream 😮

My biggest takeaway from all of this is it takes a village to build a four-minute demo. I was on stage but tons of people made this happen.

Train your own models

If you made it this far, maybe you’ve got some ideas for your own AutoML models! Dive into the docs for AutoML Vision, Natural Language, and Translation (I’ll cover Translation in a future post). Want to share what you’re building with AutoML, or have suggestions of interesting datasets? Find me on Twitter at @SRobTweets.

This story is published in Noteworthy, where thousands come every day to learn about the people & ideas shaping the products we love.

Predicting the price of wine with the Keras Functional API and TensorFlow

Sara Robinson — Mon, 23 Apr 2018 17:01:00 GMT

Can you put a dollar value on “elegant, fine tannins,” “ripe aromas of cassis,” or “dense and toasty”? It turns out a machine learning model can. In this post I’ll explain how I built a wide and deep network using Keras (tf.keras) to predict the price of wine from its description. For those of you new to Keras, it’s the higher-level TensorFlow API for building ML models. And if you’d like to skip right to the code, it’s available on GitHub here. You can also run the model directly in the browser with zero setup using Colab here.

Shout-out to Francois, Josh, and Yufeng for their help and input on this post.

The model: wide & deep with Keras

I’ve been building a lot of Keras models recently (here are some examples) using the Sequential model API, but I wanted to try out the Functional API. The Sequential API is the best way to get started with Keras: it lets you easily define models as a stack of layers. The Functional API allows for more flexibility, and is best suited for models with multiple inputs or combined models. A good use case for the Functional API is implementing a wide and deep network in Keras. There are lots of great resources on wide and deep, so I won’t focus on the specifics, but if you’re interested in learning more I recommend this post.

And before you jump to solve your ML problem with a wide and deep network, it’s best to make sure it’s well suited for what you’re trying to predict. If you’ve got a prediction task where there’s a relatively direct relationship between inputs and outputs, a wide model will probably suffice. Wide models are models with sparse feature vectors, or vectors with mostly zero values. Multi-layer deep networks, on the other hand, have been known to do well on tasks like image or speech recognition, where there may be unexpected relationships between inputs and outputs. If you’ve got a prediction task that could benefit from both of these models (recommendation models or models with text inputs are good examples), wide & deep might be a good fit. In this case, I tried a wide model and a deep model separately, then combined them, and found accuracy to be best with wide & deep together. Let’s dive in.

The dataset: predicting the price of wine

We’ll use this wine dataset from Kaggle to see:

Can we predict the price of a bottle of wine from its description and variety?

This problem is well suited for wide & deep learning because it involves text input and there isn’t an obvious correlation between a wine’s description and its price. We can’t definitively say that wines with the word “fruity” in the description are more expensive, or that wines with “soft tannins” are cheaper. In addition, there are multiple ways to represent text when we feed it into our model, and each can lead to different types of insights. There are both wide representations (bags of words) and deep ones (embeddings), and combining the two can allow us to extract more meaning from text. This dataset has lots of different feature possibilities, but we’ll use only the description and variety to keep things relatively simple. Here’s a sample input and prediction from this dataset:

Inputs

Description: Powerful vanilla scents rise from the glass, but the fruit, even in this difficult vintage, comes out immediately. It’s tart and sharp, with a strong herbal component, and the wine snaps into focus quickly with fruit, acid, tannin, herb and vanilla in equal proportion. Firm and tight, still quite young, this wine needs decanting and/or further bottle age to show its best.
Variety: Pinot Noir

Prediction

Price — $45

To begin, here are all the imports we’ll need to build this model:

https://medium.com/media/6b9cbc036e351f845dd71447e7eaad87/href

Since the output (prediction) of our model is a number for price, we’ll feed the price value directly to our model for training and evaluation. The full code for this model is available on GitHub. Here I’ll highlight the key points.

First, let’s download the data and convert it to a Pandas data frame:

https://medium.com/media/1aaae3b57c3c88ce8d7542d499d97ef4/href

Next we’ll split it into a training and testing set and extract the features and labels:

https://medium.com/media/7bd1217a79892b84024649768be808c4/href
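If the embedded gist doesn’t render, here is a minimal plain-Python sketch of the same idea: hold out a fraction of the rows for testing, then separate features (description, variety) from labels (price). The 80/20 ratio, the optional seeded shuffle, the hypothetical `rows` tuples, and the `train_test_split` helper are all assumptions for illustration; the post’s own code works on a Pandas data frame.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=None):
    """Split rows into (train, test), keeping train_fraction of them for training."""
    rows = list(rows)  # copy so the caller's list isn't reordered
    if seed is not None:
        # Shuffle first so the split isn't biased by the order of the source file.
        random.Random(seed).shuffle(rows)
    train_size = int(len(rows) * train_fraction)
    return rows[:train_size], rows[train_size:]


# Hypothetical (description, variety, price) rows standing in for the data frame.
rows = [("description %d" % i, "Pinot Noir", 10.0 + i) for i in range(10)]
train, test = train_test_split(rows, train_fraction=0.8, seed=42)

# Features (description, variety) and labels (price) for the training set.
train_features = [(desc, variety) for desc, variety, _ in train]
train_labels = [price for _, _, price in train]
print(len(train), len(test))  # 8 2
```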

Part 1: the wide model

Feature 1: Wine description

To create a wide representation of our text descriptions we’ll use a bag of words model. More on that here, but for a quick recap: a bag of words model looks for the presence of words in each input to our model. You can think of each input as a bag of Scrabble tiles, where each tile contains a word instead of a letter. The model doesn’t take into account the order of words in a description, just the presence or absence of a word.

Think of the inputs for a bag of words model as Scrabble tiles, where each tile contains a word (instead of a letter) from our input.

Instead of looking at every word found in every description in our dataset, we’ll limit our bag of words to the top 12,000 words in our dataset (don’t worry, there’s a built-in Keras utility for creating this vocabulary). This is considered “wide” because the input to our model for each description will be a 12,000-element vector of 1s and 0s indicating the presence of words from our vocabulary in a particular description.

Keras has some handy utilities for text preprocessing that we’ll use to convert the text descriptions into a bag of words. With a bag of words model we’ll typically want to only include a subset of the total words found in our dataset in the vocabulary. In this example I used 12,000 words, but this is a hyperparameter you can tune (try a few values and see what works best on your dataset). We can use the Keras Tokenizer class to create our bag of words vocabulary:

https://medium.com/media/04ce733ad9fd5a3ed50d3c78a6ff26f5/href

Then we’ll use the texts_to_matrix function to convert each description to a bag of words vector:

https://medium.com/media/916a144ae4eef185f0ec3e9e48a164a5/href
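To make the bag of words idea concrete without Keras, the sketch below builds a top-N vocabulary and turns a description into a vector of 1s and 0s. It illustrates the technique only: the helper names are hypothetical, it splits on whitespace, and it uses zero-based columns, whereas Keras’s `Tokenizer` also strips punctuation and reserves index 0.

```python
from collections import Counter


def build_vocab(texts, max_words):
    """Map the max_words most frequent words to column indices 0..max_words-1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(max_words))}


def to_bag_of_words(text, vocab):
    """Return a len(vocab)-element list with 1 where a vocabulary word appears in text."""
    vector = [0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:  # words outside the top max_words are ignored
            vector[vocab[word]] = 1
    return vector


descriptions = ["Ripe fruity wine", "Fruity and dense wine", "Dense tannins"]
vocab = build_vocab(descriptions, max_words=3)  # the post uses 12,000
print(to_bag_of_words("ripe dense", vocab))
```

Note that word order and word counts are both discarded: "dense fruity" and "fruity fruity dense" produce the same vector.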

Feature 2: Wine variety

In the original Kaggle dataset there are 632 total varietals of wine. To make it easier for our models to extract patterns, I did a bit of preprocessing to keep only the top 40 varietals (around 65% of the original dataset, or 96k total examples). We’ll use a Keras utility to convert each of these varieties to an integer representation, and then we’ll create 40-element one-hot vectors for each input to indicate the variety:

https://medium.com/media/665003c79f9caae2d83c152dba7501c9/href
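The two preprocessing steps for variety (keep only the most common categories, then one-hot encode) can be sketched in plain Python. The `top_categories` and `one_hot` helpers and the tiny variety list are hypothetical; the post itself relies on Keras utilities for the encoding.

```python
from collections import Counter


def top_categories(values, n):
    """Return the n most common category names, most common first."""
    return [name for name, _ in Counter(values).most_common(n)]


def one_hot(value, categories):
    """Return a len(categories)-element list with a single 1 at value's position."""
    vector = [0] * len(categories)
    vector[categories.index(value)] = 1
    return vector


varieties = ["Pinot Noir", "Chardonnay", "Pinot Noir", "Riesling", "Chardonnay", "Pinot Noir"]
keep = top_categories(varieties, 2)              # the post keeps the top 40
filtered = [v for v in varieties if v in keep]   # drop rows with rarer varieties
encoded = [one_hot(v, keep) for v in filtered]
print(keep)          # ['Pinot Noir', 'Chardonnay']
print(encoded[:2])   # [[1, 0], [0, 1]]
```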

Now we’re ready to build the wide model.

Building the wide model with the Keras functional API

Keras has two APIs for building models: the Sequential API and the Functional API. The Functional API gives us a bit more flexibility in how we define our layers, and lets us combine multiple feature inputs into one layer. It also makes it easy to combine our wide and deep models into one when we’re ready. With the Functional API, we can define our wide model in just a few lines of code. First, we’ll define our input layer as a 12k-element vector (one element for each word in our vocabulary). We’ll then connect this to our Dense output layer to generate price predictions:

https://medium.com/media/24d357ba48a1fb49d18388e5854f32a3/href

Then we’ll compile the model so it’s ready to use:

https://medium.com/media/ea6f48199facdf489b21bf8ef76c9710/href

If we were using the wide model on its own, this is where we’d run training with fit() and evaluation with evaluate(). Since we’re going to combine it with our deep model later on we can hold off on training until the two models are combined. Time to build our deep model!

Part 2: the deep model

To create a deep representation of the wine’s description we’ll represent it as an embedding. There are lots of resources on word embeddings, but the short version is that they provide a way to map words to vectors so that similar words are closer together in vector space.

Representing descriptions as a word embedding

To convert our text descriptions to an embedding layer, we’ll first need to convert each description to a vector of integers corresponding to each word in our vocabulary. We can do that with the handy Keras texts_to_sequences method:

https://medium.com/media/c4c5a0e058e6a5bc38ee05a40e0c0e0f/href
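Unlike the bag of words vector, this representation keeps word order. A minimal sketch of the conversion, assuming a hypothetical three-word `word_index` (the Keras tokenizer builds the real one from the dataset):

```python
def texts_to_index_sequences(texts, word_index):
    """Replace each known word with its integer index, keeping word order; unknown words are dropped."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]


# Hypothetical vocabulary. Index 0 is left unused so it can serve as padding later.
word_index = {"fruity": 1, "wine": 2, "dense": 3}
sequences = texts_to_index_sequences(["Dense fruity wine", "fruity tannins"], word_index)
print(sequences)  # [[3, 1, 2], [1]]
```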

Now that we’ve got integerized description vectors, we need to make sure they’re all the same length to feed them into our model. Keras has a handy method for that too. We’ll use pad_sequences to add zeros to each description vector so that they’re all the same length (I used 170 as the max length so that no descriptions were cut short):

https://medium.com/media/a06fb022732c3ce8b92587bf3065a1c3/href
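Here is a small sketch of what padding does. Keras’s `pad_sequences` pads and truncates at the start of each sequence by default (`padding='pre'`, `truncating='pre'`), and this hypothetical `pad` helper follows that convention; with a max length of 170 no description gets truncated, so only the padding matters here.

```python
def pad(sequences, max_length, value=0):
    """Pad (and, if needed, truncate) each sequence at the start to exactly max_length items."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    padded = []
    for seq in sequences:
        seq = list(seq)[-max_length:]  # keep only the last max_length items
        padded.append([value] * (max_length - len(seq)) + seq)
    return padded


print(pad([[3, 1, 2], [1]], max_length=5))  # [[0, 0, 3, 1, 2], [0, 0, 0, 0, 1]]
```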

With our descriptions converted to vectors that are all the same length, we’re ready to create our embedding layer and feed it into a deep model.

Building the deep model

There are two ways to create an embedding layer — we can use weights from pre-trained embeddings (there are many open source word embeddings) or we can learn the embeddings from our vocabulary. It’s best to experiment with both and see which one performs better on your dataset. Here we’ll use learned embeddings.

First, we’ll define the shape of our inputs to the deep model. Then we’ll feed it into the Embedding layer. Here I’m using an Embedding layer with 8 dimensions (you can experiment with tweaking the dimensionality of your embedding layer). The output of the Embedding layer will be a three-dimensional tensor with shape: [batch size, sequence length (170 in this example), embedding dimension (8 in this example)]. In order to connect our Embedding layer to the Dense, fully connected output layer we need to flatten it first:

https://medium.com/media/c45fb0dfdb8c8739ac979620aad347c4/href
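The shape arithmetic is easier to see for a single example. In this sketch the embedding table is hypothetical random numbers (in Keras the values are learned during training), and the batch dimension is left out: each of the 170 word indices becomes an 8-number vector, and flattening turns the 170 × 8 matrix into 1,360 values for the Dense layer.

```python
import random

VOCAB_SIZE = 12000
SEQUENCE_LENGTH = 170
EMBEDDING_DIM = 8

# Hypothetical embedding table: one EMBEDDING_DIM-long vector per vocabulary index.
rng = random.Random(0)
embedding_table = [[rng.uniform(-0.05, 0.05) for _ in range(EMBEDDING_DIM)]
                   for _ in range(VOCAB_SIZE)]


def embed(sequence):
    """Look up each word index, giving a [sequence length, embedding dim] matrix."""
    return [embedding_table[index] for index in sequence]


def flatten(matrix):
    """Concatenate the rows into one long vector, as a Flatten layer does per example."""
    return [x for row in matrix for x in row]


# A hypothetical padded description: 160 zeros followed by 10 word indices.
padded_description = [0] * (SEQUENCE_LENGTH - 10) + list(range(1, 11))
embedded = embed(padded_description)
flat = flatten(embedded)
print(len(embedded), len(embedded[0]), len(flat))  # 170 8 1360
```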

Once the embedding layer is flattened, we can connect it to the output layer and compile the model:

https://medium.com/media/01b764f73f61cec3c669c32414d71eb7/href

Part 3: wide and deep

Once we’ve defined both of our models, combining them is easy. We simply need to create a layer that concatenates the outputs from each model, feed the result into a fully connected Dense layer, and finally define a combined model from the inputs and output of both. Since each model is predicting the same thing (price), the labels for each one will be the same. Also note that since the output of our model is a numerical value we don’t need to do any preprocessing; it’s already in the right format:

https://medium.com/media/b91c88cd389aece82ba902ed7c43d9b8/href
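Conceptually, the final layer described above takes the two single-value outputs, concatenates them into a 2-element vector, and computes a weighted sum. This sketch uses hypothetical hand-picked weights; in the real model those weights are learned, and training the combined model also updates both sub-models.

```python
def dense(inputs, weights, bias):
    """A single-unit Dense layer with no activation: dot(inputs, weights) + bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def wide_and_deep_price(wide_prediction, deep_prediction, weights, bias):
    # Concatenate the two single-value outputs into a 2-element vector...
    merged = [wide_prediction, deep_prediction]
    # ...then let a final Dense(1) layer weigh how much to trust each one.
    return dense(merged, weights, bias)


print(wide_and_deep_price(40.0, 50.0, weights=[0.5, 0.5], bias=0.0))  # 45.0
```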

With that, it’s time to run training and evaluation. You can experiment to find the number of training epochs and the batch size that work best for your dataset:

https://medium.com/media/87d66080d084ade063312d020be34175/href

Generating predictions on our trained model

Time for the most important part — seeing how our model performs on data it hasn’t seen before. To do this, we can call predict() on our trained model, passing it our test dataset (in a future post I’ll cover how to get predictions from plain text input):

https://medium.com/media/da17e6f1c9ba96dd862fe6942691885e/href

Then we’ll compare predictions to the actual values for the first 15 wines from our test dataset:

https://medium.com/media/ac3a33032bfff099a34ce199d452e31a/href

How did the model do? Let’s take a look at three examples from our test set:

Powerful vanilla scents rise from the glass, but the fruit, even in this difficult vintage, comes out immediately. It's tart and sharp, with a strong herbal component, and the wine snaps into focus quickly with fruit, acid, tannin, herb and vanilla in equal proportion. Firm and tight, still quite young, this wine needs decanting and/or further bottle age to show its best.
Predicted:  46.233624 Actual:  45.0


A good everyday wine. It's dry, full-bodied and has enough berry-cherry flavors to get by, wrapped into a smooth texture.
Predicted:  9.694958 Actual:  10.0

Here's a modern, round and velvety Barolo (from Monforte d'Alba) that will appeal to those who love a thick and juicy style of wine. The aromas include lavender, allspice, cinnamon, white chocolate and vanilla. Tart berry flavors backed by crisp acidity and firm tannins give the mouthfeel determination and grit.
Predicted:  41.028854 Actual:  49.0

Pretty well! It turns out there is some relationship between a wine’s description and its price. We may not be able to see it instinctively, but our ML model can.
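One way to put a number on "pretty well" is the mean absolute error, the average dollar amount a prediction is off by. The sketch below computes it for just the three examples above, so it is not a measure of the model’s accuracy on the full test set.

```python
predicted = [46.233624, 9.694958, 41.028854]
actual = [45.0, 10.0, 49.0]

# Absolute error per wine, then the average across wines.
errors = [abs(p - a) for p, a in zip(predicted, actual)]
mean_absolute_error = sum(errors) / len(errors)
print([round(e, 2) for e in errors])  # [1.23, 0.31, 7.97]
print(round(mean_absolute_error, 2))  # 3.17
```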

What’s next?

We covered a lot of material here but there are always more layers 😉. In a future post, I’ll cover how to train this model in the cloud. Also, a trained model isn’t the end of the road. If you’re training a model chances are you probably want to build an app that makes predictions on it. In another post I’ll cover serving this model in production and building an app to make predictions against it: enter a wine description, predict the price.

Want to build your own wide + deep model in Keras? Check out the full code from this model on GitHub and dive into the Keras Functional API docs. Let me know if you have any feedback in the comments or on Twitter @SRobTweets. Cheers! 🥂

Predicting the price of wine with the Keras Functional API and TensorFlow was originally published in TensorFlow on Medium, where people are continuing the conversation by highlighting and responding to this story.

Build a Taylor Swift detector with the TensorFlow Object Detection API, ML Engine, and Swift

Sara Robinson — Tue, 02 Jan 2018 15:57:10 GMT

Note: as of this writing there is no official TensorFlow library for Swift; I used Swift to build the client app for prediction requests against my model. This may change in the future, but Taylor has the final say on that.

Here’s what we’re building:

The TensorFlow Object Detection API demos let you recognize the location of objects in an image, which can lead to some super cool applications. But because I spend more time taking pictures of people than of things, I wanted to see if the same technology could be applied to recognizing faces. Turns out it worked pretty well! I used it to build the Taylor Swift detector pictured above.

In this post I’ll outline the steps I took to get from a collection of T-Swift images to an iOS app that made predictions against a trained model:

Preprocess images: resize, label, split them into training and test sets, and convert to the Pascal VOC format
Convert images to TFRecords for feeding to the Object Detection API
Train the model on Cloud ML Engine using MobileNet
Export the trained model and deploy it to ML Engine for serving
Build an iOS frontend that makes prediction requests against the trained model (in Swift, obviously)

Here’s an architecture diagram of how it all fits together:

And if you’d rather skip to the code, you can find it on GitHub.

Looking at it now, it all seems so simple

Before I dive into the steps, it would help to explain some of the technology and terms we’ll be using: The TensorFlow Object Detection API is a framework built on top of TensorFlow for identifying objects in images. For example, you can train it with lots of photos of cats and once it’s trained you can pass in an image of a cat and it’ll return a list of rectangles where it thinks there’s a cat in the image. And while it has API in the name you can think of it more as a set of handy utilities for transfer learning.

But training a model to recognize objects in an image takes time and tons of data. The coolest part of Object Detection is that it supports five pre-trained models for transfer learning. Here’s an analogy to help you understand how transfer learning works: when a child is learning their first language they are exposed to many examples and corrected if they misidentify something. For example, the first time they learn to identify a cat they’ll see their parents point to the cat and say the word “cat,” and this repetition strengthens pathways in their brain. When they then learn how to identify a dog, the child doesn’t need to start from scratch. They can use a similar recognition process that they did for the cat, but apply it to a slightly different task. That’s how transfer learning works too.

I don’t have time to find and label thousands of TSwift images, but I can reuse the features extracted by those models, which were trained on millions of images, and modify the last few layers to apply them to my specific classification task (detecting TSwift).

Step 1: Preprocessing images

Big thank you to Dat Tran who wrote this awesome post on training a raccoon detector with TF Object Detection. I followed his blog post for labeling images and converting them to the correct format for TensorFlow. His post has the details; I’ll summarize my steps here.

My first step was downloading 200 images of Taylor Swift from Google Images. Turns out there’s a Chrome extension for that — it’ll download all results from a Google Images search. Before labeling my images I split them into two datasets: train and test. I reserved the test set to test the accuracy of my model on images it didn’t see during training. Per Dat’s recommendations, I wrote a resize script to make sure none of the images were wider than 600px.
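The resize script itself isn’t shown, but the core calculation is simple: if an image is wider than 600px, scale both dimensions by the same factor so the aspect ratio is preserved. The `resized_dimensions` helper below is hypothetical; you would pass its result to an image library such as Pillow’s `Image.resize`.

```python
def resized_dimensions(width, height, max_width=600):
    """Return (width, height) scaled down so width <= max_width, keeping the aspect ratio."""
    if width <= max_width:
        return width, height  # already narrow enough; leave it alone
    scale = max_width / width
    return max_width, round(height * scale)


print(resized_dimensions(1200, 800))  # (600, 400)
print(resized_dimensions(500, 300))   # (500, 300)
```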

Because the Object Detection API will tell us where our object is in the image, you can’t just pass it images and labels as training data. You need to pass it a bounding box identifying where the object is in your image and the label associated with that bounding box (in our dataset we’ll only have one label, tswift).

To generate the bounding boxes for our images I used LabelImg, as recommended in Dat’s raccoon detector blog post. LabelImg is a Python program that lets you hand label images and returns an xml file for each image with the bounding box and associated label (I did spend an entire morning labeling tswift images while people walked by my desk with concerned glances). Here’s how it works — I define the bounding box on an image and give it the label tswift:

Then LabelImg generates an xml file that looks like the following:

https://medium.com/media/01ebd2f7aa073a84ccdb41f00cc75d05/href
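Before converting to TFRecords, a conversion script has to read these annotations back. The sketch below parses a trimmed, hypothetical annotation in the Pascal VOC layout that LabelImg writes (`size`, then one `object` per box with a `name` and `bndbox`) using Python’s built-in XML parser; the coordinates are made up for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<annotation>
  <filename>tswift1.jpg</filename>
  <size><width>600</width><height>400</height><depth>3</depth></size>
  <object>
    <name>tswift</name>
    <bndbox><xmin>210</xmin><ymin>45</ymin><xmax>390</xmax><ymax>280</ymax></bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return (width, height, [(label, xmin, ymin, xmax, ymax), ...]) from a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width, height = int(size.find("width").text), int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):  # one <object> per labeled box
        box = obj.find("bndbox")
        coords = [int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, *coords))
    return width, height, boxes


print(read_boxes(SAMPLE_XML))  # (600, 400, [('tswift', 210, 45, 390, 280)])
```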

Now I have an image, a bounding box, and a label but I need to convert this into a format that TensorFlow accepts — a binary representation of this data called a TFRecord. I wrote a script to do this based on the guide provided in the Object Detection repo. To use my script, you’ll need to clone the tensorflow/models repo locally and package the Object Detection API:

# From tensorflow/models/research/
python setup.py sdist
(cd slim && python setup.py sdist)

Now you’re ready to run the TFRecord script. Run the command below from the tensorflow/models/research directory, and pass it the following flags (run it twice: once for training data, once for test data):

python convert_labels_to_tfrecords.py \
--output_path=train.record \
--images_dir=path/to/your/training/images/ \
--labels_dir=path/to/training/label/xml/
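One detail a conversion script like this has to handle: as I understand the Pascal VOC conversion example in the Object Detection repo, box corners are stored in the TFRecord as fractions of the image width and height rather than as pixels. The `normalize_box` helper below is a hypothetical sketch of that step, reusing made-up coordinates.

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Convert pixel box corners to the 0-1 range, relative to image width and height."""
    return xmin / width, ymin / height, xmax / width, ymax / height


print(normalize_box(210, 45, 390, 280, width=600, height=400))  # (0.35, 0.1125, 0.65, 0.7)
```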

Step 2: Training a TSwift detector on Cloud Machine Learning Engine

I could train this model on my laptop but that would take time, lots of resources, and if I had to put my computer away and do something else the training job would abruptly stop. That’s what the cloud is for! We can leverage the cloud to run our training across many cores to get the entire job done in a few hours. And when I use Cloud ML Engine I can run a training job even faster by leveraging GPUs (graphics processing units), which are specialized silicon chips that excel at the type of computations that our model performs. Utilizing this processing power, I can kick off a training job and then go jam out to TSwift for a few hours while my model trains.

Setting up Cloud ML Engine

With all my data in TFRecord format I’m ready to upload it to the cloud and start training. First I created a project in the Google Cloud console and enabled Cloud ML Engine:

Then I’ll create a Cloud Storage bucket to package up all the resources for my model. Make sure to specify a region for the bucket (don’t choose multi-regional):

I’ll create a /data subdirectory within this bucket to put the training and test TFRecord files:

The Object Detection API also needs a pbtxt file that maps labels to an integer ID. Since I only have one label this will be very short:

https://medium.com/media/5e2b37cfb7be6fe2060f20d58fd25ad2/href
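With more labels, writing this file by hand gets tedious. Here is a small hypothetical generator for the same `item { id, name }` format; IDs start at 1 because the Object Detection API reserves 0 for the background class.

```python
def label_map_pbtxt(labels):
    """Build label map text, assigning IDs starting at 1 (0 is reserved for background)."""
    items = []
    for i, name in enumerate(labels, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (i, name))
    return "".join(items)


print(label_map_pbtxt(["tswift"]))
```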

Adding the MobileNet checkpoints for transfer learning

I’m not training this model from scratch so when I run training I’ll need to point to the pre-trained model I’ll be building on. I chose to use a MobileNet model — MobileNets are a series of small models optimized for mobile. While I won’t be serving my model directly on a mobile device, MobileNet will train quickly and allow for faster prediction requests. I downloaded this MobileNet checkpoint for use in my training. A checkpoint is a binary file that contains the state of a TensorFlow model at a specific point in the training process. After downloading and unzipping the checkpoint, you’ll see that it contains three files:

I’ll need all of them to train the model, so I put them in the same data/ directory in my Cloud Storage bucket.

There’s one more file to add before running the training job. The Object Detection script needs a way to find our model checkpoint, label map, and training data. We’ll do that with a config file. The TF Object Detection repo has sample config files for each of the five pre-trained model types. I used the one for MobileNet here and updated all of the PATH_TO_BE_CONFIGURED placeholders with the corresponding paths in my Cloud Storage bucket. In addition to connecting my model to the data in Cloud Storage, this file also configures several hyperparameters for my model like convolution size, activation functions, and steps.

Here are all the files that should be in my /data Cloud Storage bucket before I begin training:

I’ll also create train/ and eval/ subdirectories in my bucket — this is where TensorFlow will write my model checkpoint files while running training and evaluation jobs.

Now I’m ready to run training, which I can do through the gcloud command line tool. Note that you need to clone tensorflow/models/research locally and run this training script from that directory:

https://medium.com/media/3202df26b7cc5b785811c364cc65893e/href

While training is running, I also kicked off the evaluation job. This will evaluate the accuracy of my model using data it hasn’t seen before:

https://medium.com/media/ae53d9547d15df080e2b428af416937d/href

You can verify that your job is running correctly and inspect the logs for a specific job by navigating to the Jobs section of ML Engine in your Cloud console:

Step 3: Deploying the model to serve predictions

To deploy the model to ML Engine I need to convert my model checkpoints to a ProtoBuf. In my train/ bucket, I can see checkpoint files saved from a few points throughout my training process:

The first line of the checkpoint file tells me the latest checkpoint path, so I’ll download the 3 files from that checkpoint locally. There should be a .index, .meta, and .data file for each checkpoint. With these saved in a local directory I can make use of Object Detection’s handy export_inference_graph script to convert these to a ProtoBuf. To run the script below, you’ll need to define the local path to your MobileNet config file, the checkpoint number of the model checkpoint you downloaded from the training job, and the name of the directory you’d like the exported graph to be written to:

https://medium.com/media/9583a433be88b80fa5ffcf23aab40c61/href
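Finding the checkpoint number can be scripted. TensorFlow’s `checkpoint` file starts with a line like `model_checkpoint_path: "model.ckpt-12345"`; the sketch below parses that line and lists the files to download. The step number is hypothetical, and the `.data-00000-of-00001` suffix assumes a single data shard, so check your bucket for the exact names. (For a local directory, `tf.train.latest_checkpoint` does the lookup for you.)

```python
import re

# Hypothetical contents of the 'checkpoint' file in the train/ directory.
CHECKPOINT_FILE = '''model_checkpoint_path: "model.ckpt-12345"
all_model_checkpoint_paths: "model.ckpt-11000"
all_model_checkpoint_paths: "model.ckpt-12345"
'''


def latest_checkpoint(checkpoint_text):
    """Return (prefix, step) from the first line of a TensorFlow 'checkpoint' file."""
    first_line = checkpoint_text.splitlines()[0]
    prefix = re.search(r'"(.+)"', first_line).group(1)
    step = int(prefix.rsplit("-", 1)[1])  # the number after the last '-'
    return prefix, step


prefix, step = latest_checkpoint(CHECKPOINT_FILE)
files = [prefix + suffix for suffix in (".index", ".meta", ".data-00000-of-00001")]
print(step, files)
```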

After this script runs, you should see a saved_model/ directory inside your .pb output directory. Upload the saved_model.pb file (don’t worry about the other generated files) to the /data directory in your Cloud Storage bucket.

Now you’re ready to deploy the model to ML Engine for serving. First, use gcloud to create your model:

gcloud ml-engine models create tswift_detector

Then create the first version of your model by pointing it to the saved model ProtoBuf you just uploaded to Cloud Storage:

gcloud ml-engine versions create v1 --model=tswift_detector --origin=gs://${YOUR_GCS_BUCKET}/data  --runtime-version=1.4

Once the model deploys I’m ready to use ML Engine’s online prediction API to generate a prediction on a new image.

Step 4: Building a prediction client with Firebase Functions and Swift

I wrote an iOS client in Swift for making prediction requests on my model (because why write a TSwift detector in any other language?). The Swift client uploads an image to Cloud Storage, which triggers a Firebase Function that makes the prediction request in Node.js and saves the resulting prediction image and data to Cloud Storage and Firestore.

First, in my Swift client I added a button for users to access their device’s photo library. Once the user selects a photo, this triggers an action which uploads the image to Cloud Storage:

https://medium.com/media/38836612f0ffe1e3fad6cddcfacb593d/href

Next I wrote the Firebase Function triggered on uploads to the Cloud Storage bucket for my project. It takes the image, base64 encodes it, and sends it to ML Engine for prediction. You can find the full function code here. Below I’ve included the part of the function where I make a request to the ML Engine prediction API (thank you to Bret McGowen for his expert Cloud Functions help on getting this working!):

https://medium.com/media/081aa9fe890bdac6b6c6015b0a4abf0d/href

In the ML Engine response, we get:

detection_boxes which we can use to define a bounding box around Taylor if she was detected in an image
detection_scores return a confidence value for each detection box. I’ll only include detections that have a score higher than 70%
detection_classes tells us the label ID associated with our detection. In this case it will always be 1 since there’s only one label
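The function code is in Node.js, but the post-processing logic is easy to sketch in Python: keep detections scoring above 70%, then convert each box to pixel coordinates for drawing. This assumes the Object Detection API’s usual box layout of normalized `[ymin, xmin, ymax, xmax]`; the `prediction` values below are hypothetical.

```python
def confident_detections(prediction, image_width, image_height, threshold=0.7):
    """Keep detections above threshold and convert boxes to pixel (left, top, right, bottom)."""
    results = []
    for box, score in zip(prediction["detection_boxes"], prediction["detection_scores"]):
        if score <= threshold:
            continue  # only keep scores higher than the threshold
        ymin, xmin, ymax, xmax = box  # assumed normalized [ymin, xmin, ymax, xmax] order
        results.append({
            "score": score,
            "box": (round(xmin * image_width), round(ymin * image_height),
                    round(xmax * image_width), round(ymax * image_height)),
        })
    return results


prediction = {
    "detection_boxes": [[0.1, 0.35, 0.7, 0.65], [0.2, 0.1, 0.5, 0.3]],
    "detection_scores": [0.93, 0.41],
    "detection_classes": [1.0, 1.0],  # always 1 here: the single 'tswift' label
}
print(confident_detections(prediction, image_width=600, image_height=400))
```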

In the function I use detection_boxes to draw a box on the image if Taylor was detected, along with the confidence score. Then I save the new boxed image to Cloud Storage, and write the image’s filepath to Cloud Firestore so I can read the path and download the new image (with the rectangle) in my iOS app:

https://medium.com/media/0e8783fc92d797903c0cb8e2cb169fa9/href

Finally, in my iOS app I can listen for updates to the Firestore path for the image. If a detection was found, I’ll download the image and display it in my app along with the detection confidence score. This function will replace the comment in the first Swift snippet above:

https://medium.com/media/3b5ce9e168d092b103ccbe15f05171a4/href

Woohoo! We’ve got a working Taylor Swift detector. Note that the focus here was not on accuracy (I only had 140 images in my training set) so the model did incorrectly identify some images of people you might mistake for tswift. But if I find time to hand label more images I will update the model and publish the app in the App Store :)

What’s next?

This post covered a lot of information. Want to build your own? Here’s a breakdown of the steps with links to resources:

Preprocessing data: I followed Dat’s blog post on using LabelImg to hand label images and generate xml files with bounding box data. Then I wrote this script to convert labeled images to TFRecords
Training and evaluating an Object Detection model: using the approach from this blog post, I uploaded training and test data to Cloud Storage and used ML Engine to run training and evaluation
Deploying the model to ML Engine: I used the gcloud CLI to deploy my model to ML Engine
Making prediction requests: I used the Firebase SDK for Cloud Functions to make an online prediction request to my ML Engine model. This request was triggered by an upload to Firebase Storage from my Swift app. In my function, I wrote the prediction metadata to Firestore.

Have questions or topics you’d like me to cover in future posts? Leave a comment or find me on Twitter @SRobTweets.

Build a Taylor Swift detector with the TensorFlow Object Detection API, ML Engine, and Swift was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

Adding Computer Vision to your iOS App

Sara Robinson — Thu, 26 Oct 2017 16:31:06 GMT

Recently I’ve been using the Google Cloud Machine Learning APIs with Node.js and Python, but I wondered — wouldn’t it be cool if there was an easy way to add them to a mobile app? That’s where the magic of Firebase comes in. I built an iOS app in Swift that makes use of the Cloud Vision API via the Firebase SDK for Cloud Functions. Here’s how it works:

The iOS client uploads an image to Cloud Storage for Firebase. This triggers a Cloud Function, where I’ve written Node.js code to send the image to the Vision API’s safe search and web detection methods. When I get the Vision API response, I write the response JSON into a document in Cloud Firestore (the latest Firebase DB offering). The iOS client has a listener on the Firestore document where my function will write the image data so it can display results in the UI. Here’s what the app looks like on an iPhone X (!) in the simulator (ignore the janky UI, I’m not an iOS design expert):

Let’s dive into the code! Before I get started I’ll create a new project in the Firebase console and initialize Storage, Functions, and Firestore for the project using the Firebase CLI:

Update: @joaolaq built an Android version of this app after I published this post. Check it out on GitHub!

Step 1: Upload an image from the device to Cloud Storage for Firebase

The first step is to upload an image from the iPhone’s photo library to Cloud Storage. I connected the upload button in my Storyboard to an IBAction:

https://medium.com/media/75b97a9cba34b4937829034828947805/href

In my UIImagePickerController delegate method I can upload the image to Cloud Storage with just a few lines of code:

https://medium.com/media/6255d77d7c95f8505e549a45660209c8/href

Step 2: Send the image to the Vision API with Cloud Functions for Firebase

The Firebase SDK for Cloud Functions lets us respond to different events in our Firebase environment, in this case a file being uploaded to Cloud Storage. Anytime our iOS client uploads a new photo, we’ll send that photo to the Vision API and then process the response.

The Vision API gives us access to a pre-trained model for image recognition in a single API call. It has many features (read about them all here), but in this example I’ll use safe search and web detection. Why these two? Many mobile apps use images in some way or another, and rather than having someone manually review whether these images are appropriate we can automate this with an API call to keep our app SFW. Web detection is one of my favorite Vision API features because you can do cool things like reverse image search to find similar and matching images.

Cloud Functions are written in Node.js. Here’s what our Vision API call looks like:

https://medium.com/media/8a92ac46e62f208c102f995984e1b58f/href

It’s worth noting that with one REST API request we’re able to get tons of data on our images, whereas we’d otherwise need to build a custom model from scratch and give it enough training data to be able to flag images as inappropriate and find visually similar images from across the web. In addition to all of that, we’d also need to handle serving our model and making prediction requests.

Here’s what the response JSON looks like for safeSearchAnnotation from the cat picture used in my demo above:

https://medium.com/media/a9e37a9153fc06991911cafd2020c7c5/href
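To actually keep an app SFW, you need a rule for acting on these likelihoods. The Vision API reports each safe search category (adult, spoof, medical, violence, racy) as a likelihood from VERY_UNLIKELY to VERY_LIKELY. The sketch below flags categories at LIKELY or above; that threshold and the sample values are assumptions to tune for your own app, not the ones from my demo.

```python
FLAG_LEVELS = {"LIKELY", "VERY_LIKELY"}  # hypothetical policy: flag LIKELY or worse


def flagged_categories(safe_search_annotation):
    """Return the safe search categories whose likelihood is LIKELY or VERY_LIKELY."""
    return sorted(category for category, likelihood in safe_search_annotation.items()
                  if likelihood in FLAG_LEVELS)


annotation = {"adult": "VERY_UNLIKELY", "spoof": "UNLIKELY", "medical": "VERY_UNLIKELY",
              "violence": "VERY_UNLIKELY", "racy": "POSSIBLE"}
print(flagged_categories(annotation))  # []
```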

If you look at the gif of my demo you’ll notice that when I sent it a selfie I took in Cambridge UK, it returned pictures of other people’s selfies from the same location. Pretty cool! Here’s part of the webDetection JSON response:

https://medium.com/media/d5f0bbdd8c66827575b987550b7a4d7d/href
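Before writing results anywhere, it can help to pull out just the fields the app displays. The sketch below reads `webEntities` (sorted by score) and the URLs under `visuallySimilarImages` from a webDetection response; the response values, including the `entityId`s, are hypothetical, and the helper name is my own.

```python
def summarize_web_detection(web_detection, max_entities=3):
    """Pull entity descriptions and visually similar image URLs out of a webDetection response."""
    entities = sorted(web_detection.get("webEntities", []),
                      key=lambda e: e.get("score", 0), reverse=True)
    names = [e["description"] for e in entities if "description" in e][:max_entities]
    similar = [img["url"] for img in web_detection.get("visuallySimilarImages", [])]
    return names, similar


web_detection = {
    "webEntities": [
        {"entityId": "/m/example1", "score": 0.8, "description": "Cambridge"},
        {"entityId": "/m/example2", "score": 1.2, "description": "Selfie"},
    ],
    "visuallySimilarImages": [{"url": "https://example.com/similar1.jpg"}],
}
print(summarize_web_detection(web_detection))
```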

Step 3: Write our Vision API response to Cloud Firestore

Now that we’ve got our Vision API response, we need a way to connect it to our iOS app. Cloud Firestore is a great option for this. Within our function, we’ll use the firebase-admin npm package to store the response JSON as a document in Firestore. The response for each image will be stored as a document under our images collection with a key corresponding to the filename of our image (this code will replace the comment in our Cloud Function above):

https://medium.com/media/cb0a668665a76da88cd6608464bcb493/href

And here’s what the data for one image looks like in the Firestore console:

Now we’ve got our image data from the Vision API stored in a database!

Step 4: Listen for Firestore updates and display Vision API data in our iOS app

Going back to the image upload function from step 1, I’ll add a listener to the Firestore location for my image:

https://medium.com/media/b654668be44198e6905118ce7f1a0cad/href

And then display the web entities and similar images from the Vision API in the app:

https://medium.com/media/3c5d87e965ed956caa1062bfa4c94f2c/href

That’s it! With a bit of Swift and Node.js I’ve got an iPhone app that makes use of a pre-trained model for image recognition.

What’s next?

This app focused on the Firebase iOS SDK, but you could easily do the same thing with Android. Stay tuned for future posts exploring ML from a mobile development perspective, and I’d love to hear what you think: add comments below or find me on Twitter @SRobTweets if you’ve got ideas for future posts.