<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Deval Parikh on Medium]]></title>
        <description><![CDATA[Stories by Deval Parikh on Medium]]></description>
        <link>https://medium.com/@devalpp?source=rss-981e779d864e------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*WATtZDiXRhkzamdBKVzi5A.png</url>
            <title>Stories by Deval Parikh on Medium</title>
            <link>https://medium.com/@devalpp?source=rss-981e779d864e------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 08 Apr 2026 08:25:32 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@devalpp/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Visualizing Backpropagation in Neural Network Training]]></title>
            <link>https://medium.com/data-science/visualizing-backpropagation-in-neural-network-training-2647f5977fdb?source=rss-981e779d864e------2</link>
            <guid isPermaLink="false">https://medium.com/p/2647f5977fdb</guid>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[neural-networks]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[data-visualization]]></category>
            <dc:creator><![CDATA[Deval Parikh]]></dc:creator>
            <pubDate>Fri, 14 Jan 2022 19:20:51 GMT</pubDate>
            <atom:updated>2022-01-14T19:20:51.394Z</atom:updated>
            <content:encoded><![CDATA[<h3>Visualizing Backpropagation in Neural Network Training at Any Scale</h3><h4>Using HiPlot to generate parallel coordinate plots to visualize deep learning model training.</h4><figure><img alt="A parallel coordinate plot generated to visualize ML training" src="https://cdn-images-1.medium.com/max/1024/1*tveesbVZ-nCY_vZjlSCWOA.png" /><figcaption>A parallel coordinate plot to visualize ML training (Image by Author)</figcaption></figure><p>Understanding and debugging a Neural Network’s performance on a dataset is a critical chapter in the end-to-end lifecycle of a Machine Learning (ML) model. Having the ability to comprehend how a model is training can provide valuable insight into where improvements can be made. In this article, we will walk through creating a simple, yet effective, method of visualizing a process called <a href="https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd">backpropagation</a> during Neural Network training. The visualization technique we will be using is called <a href="https://www.data-to-viz.com/graph/parallel.html">parallel coordinate plots</a>. This is generally a technique used to visualize many different features with varying units or types from multiple data points. Below is an outline of the rest of this article:</p><ol><li>Understanding the importance of evaluating deep learning models</li><li>Building a foundation</li><li>Generating a visualization</li></ol><h3>1. 
Understanding the importance of evaluating deep learning models</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*LJeisZ7hbzoTYb9Z" /><figcaption>Photo by <a href="https://unsplash.com/@hansreniers?utm_source=medium&amp;utm_medium=referral">Hans Reniers</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>Deep learning models have real-world applications ranging from fraud detection to self-driving vehicles, applications that often touch our daily lives directly. When a model is scaled to be used by millions of users, millions of times a day, a marginal improvement in model metrics can translate into a significant real-world impact.</p><p>By using visualizations, we can add another layer of depth to understanding how exactly our models are training and performing over each <a href="https://deepai.org/machine-learning-glossary-and-terms/epoch">epoch</a>. The better we understand our models, the clearer our decisions during model selection become. We can also use this information to determine whether our models are overfitting or underfitting with respect to the evaluation metric used, which can help with tuning hyperparameters.</p><h3>2. Building a foundation</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*fu5PfapomhT5YbNl" /><figcaption>Photo by <a href="https://unsplash.com/@etiennegirardet?utm_source=medium&amp;utm_medium=referral">Etienne Girardet</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>Now that we understand the benefits that visualizing model training can provide, let’s get building! 
This example will be using Python version 3.7.</p><p>We will start by importing our Python dependencies:</p><pre>import tensorflow as tf</pre><pre>from tensorflow.keras import layers<br>from tensorflow.keras import models</pre><p>For this example model, we will be using the [1] <a href="https://keras.io/api/datasets/boston_housing/"><strong>Keras Boston Housing dataset</strong></a>. This dataset contains 13 features of houses around Boston, with the target value being the median value of the home. To download and load the data, add the following lines:</p><pre># Download dataset<br>dataset = tf.keras.datasets.boston_housing</pre><pre># Test train split<br>(x_train, y_train), (x_test, y_test) = dataset.load_data()</pre><pre># Normalize features using statistics from the training data only<br>mean = x_train.mean(axis=0)<br>x_train -= mean<br>std = x_train.std(axis=0)<br>x_train /= std<br><br>x_test -= mean<br>x_test /= std</pre><p>Now that the data is ready to be trained on, we will create a function to build our barebones Neural Network.</p><pre>def build_model():<br>    model = models.Sequential()<br>    model.add(layers.Dense(64, activation=&#39;relu&#39;,<br>              input_shape=(x_train.shape[1],)))<br>    model.add(layers.Dense(64, activation=&#39;relu&#39;))<br>    model.add(layers.Dense(1))<br>    model.compile(optimizer=&#39;rmsprop&#39;, loss=&#39;mse&#39;, metrics=[&#39;mae&#39;])<br>    return model</pre><p>Lastly, let&#39;s train the model!</p><pre>num_epochs = 18<br>model = build_model()<br>model_history = model.fit(x_train, y_train, epochs=num_epochs, batch_size=16, verbose=0)<br><br>test_mse_score, test_mae_score = model.evaluate(x_test, y_test)<br>print(test_mae_score)</pre><p>Here our model is using <strong>Mean Absolute Error</strong> (MAE) as an evaluation metric. MAE can be used to evaluate how accurate our model is and can give a sense of performance. 
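</p><p>As a quick sanity check, we can reproduce the metric outside of Keras. Below is a minimal sketch using NumPy; the sample values are illustrative, not output from the Boston Housing model:</p>

```python
import numpy as np

# Hypothetical predicted and actual home values (in $1000s)
y_pred = np.array([22.0, 30.5, 18.0, 25.0])
y_true = np.array([24.0, 29.0, 16.5, 25.0])

# MAE: the mean of the absolute differences between predictions and targets
mae = np.mean(np.abs(y_pred - y_true))
print(mae)  # 1.25
```

<p>This is the same quantity Keras reports when metrics=[&#39;mae&#39;] is passed to compile(). 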
The MAE score is always non-negative, and the closer the score is to 0, the better your model is performing. MAE is calculated by taking the average magnitude of the errors: take the absolute value of each predicted value minus the corresponding actual value, sum all of those absolute errors, and divide by the total number of data points. 🎉 You’ve got your MAE score.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/824/1*_aPHtBO39z9V7mcUo2YBQA.png" /><figcaption>The formula for calculating MAE (Image by Author)</figcaption></figure><h3>3. Generating a visualization</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*phAkJ1Y43Ecdl0Y0" /><figcaption>Photo by <a href="https://unsplash.com/@isaacmsmith?utm_source=medium&amp;utm_medium=referral">Isaac Smith</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>Now for the best part! Let&#39;s visually observe how model training went with our baseline Neural Network at each epoch by generating a parallel plot!</p><p>To do this, we will leverage Facebook’s open-source HiPlot library. <strong>HiPlot</strong> is a lightweight tool that can easily and quickly generate a parallel coordinate plot from the provided data. This is a convenient way of displaying multiple features side-by-side to visually analyze what the data looks like. In our case, we can use metadata extracted from our model’s training to very quickly create a stunning and interactive visualization. 
More information about HiPlot can be found on the <a href="https://github.com/facebookresearch/hiplot">HiPlot GitHub repository</a>.</p><p>To generate the visualization, you will first need to install the HiPlot Python package:</p><pre>pip install -U hiplot</pre><p>Next, you will need to import HiPlot and use its functions to generate the plot.</p><pre>import hiplot as hip</pre><pre>data = [{&#39;epoch&#39;: idx,<br>         &#39;loss&#39;: model_history.history[&#39;loss&#39;][idx], <br>         &#39;mae&#39;: model_history.history[&#39;mae&#39;][idx]}<br>       for idx in range(num_epochs)]<br>hip.Experiment.from_iterable(data).display()</pre><p>The code above uses the training history returned by the Keras model.fit() method to load the per-epoch metrics into HiPlot to be visualized and interacted with.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hm7axgABJpTTAD-BCqwtGQ.gif" /><figcaption>Interacting with the parallel plot generated by HiPlot (Image by Author)</figcaption></figure><p>🎉 Congratulations on building a Neural Network model and visualizing the backpropagation process with a parallel plot generated by HiPlot! You’ll notice that the MAE score progressively improves with each epoch, and we can now analyze and interact with this data to see the exact epoch where fitting tapers off.</p><p>Going through this process, we can see how quick and seamless creating these interactive visualizations can be, while still gaining real insight into how a model behaved during training, all from the training history data. One of the biggest <strong>advantages</strong> of using HiPlot is that you can add many other attributes to the visualization to paint an even fuller picture, without significantly increasing computing power needs.</p><h3>References</h3><p>[1] Boston Housing Dataset: <a href="https://keras.io/api/datasets/boston_housing/">https://keras.io/api/datasets/boston_housing/</a></p><p>Harrison, D. 
and Rubinfeld, D.L. ‘<a href="https://www.law.berkeley.edu/files/Hedonic.PDF">Hedonic prices and the demand for clean air</a>’ (1978), J. Environ. Economics &amp; Management, vol. 5, 81–102</p><hr><p><a href="https://medium.com/data-science/visualizing-backpropagation-in-neural-network-training-2647f5977fdb">Visualizing Backpropagation in Neural Network Training</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building a Real Time Chat Application with NLP Super Powers!]]></title>
            <link>https://medium.com/data-science/building-a-real-time-chat-application-with-nlp-super-powers-ce800e19cb2b?source=rss-981e779d864e------2</link>
            <guid isPermaLink="false">https://medium.com/p/ce800e19cb2b</guid>
            <category><![CDATA[tensorflow]]></category>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Deval Parikh]]></dc:creator>
            <pubDate>Tue, 01 Jun 2021 01:08:32 GMT</pubDate>
            <atom:updated>2021-06-01T16:49:43.166Z</atom:updated>
            <content:encoded><![CDATA[<h3>Building a Real Time Chat Application with NLP Capabilities</h3><h4>A Chat App with Sentiment Analysis and Tone Detection using TensorFlow JS Deep Learning API, IBM Cloud, Node.JS, Web Sockets, and React</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*UALSe7FSNkTbgjqU" /><figcaption>Photo by <a href="https://unsplash.com/@lunarts?utm_source=medium&amp;utm_medium=referral">Volodymyr Hryshchenko</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>In a world where Artificial Intelligence (AI) and Machine Learning (ML) models are being leveraged to obtain real-time information and continuously improve customer experience, I wanted to discover how effective ML can be at understanding human conversations, and even build our own custom application that implements this technology.</p><h4>Conversational Machine Learning in Practice</h4><p>The world is quickly adapting to using AI-driven applications to assist humans on a day-to-day basis. Siri and Alexa are driven by Natural Language Processing (NLP)-based ML models that are continuously training and iterating to sound more natural and offer more value by understanding complex human dialog. Not to mention Google’s latest conversation technology, <a href="https://blog.google/technology/ai/lamda/">LaMDA</a>. LaMDA is trained to interpret larger blocks of human conversation and understand how multiple words or phrases relate to each other, gaining more context about the inputted text, specifically targeting human dialog. It is then able to dynamically generate a response that is natural and human-like. 
The following is from Google’s AI blog post, <a href="https://blog.google/technology/ai/lamda/"><em>LaMDA: our breakthrough conversation technology</em></a>, which briefly describes the deep learning technology LaMDA is built on:</p><blockquote>[1] LaMDA’s conversational skills have been years in the making. Like many recent language models, including BERT and GPT-3, it’s built on <a href="https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html">Transformer</a>, a neural network architecture that Google Research invented and open-sourced in 2017</blockquote><p>This technology is impressive and is quickly proving how valuable it can be in our daily lives, from making reservations for us to eliminating the need for human-powered call centers.</p><h4>Challenges of Machine Learning</h4><p>However, it’s not all rainbows and sunshine: the process of training and integrating ML models into production applications comes with many challenges. How do we test these models? How do we develop reliable models? How do we ensure biases in ML models are optimally mitigated? How easily can we build scalable applications that use these models? 
These are all questions to consider when developing ML solutions at scale.</p><h3>Let’s Build Our Own App!</h3><p>In this article, we are going to build a custom full stack application that utilizes <a href="https://www.tensorflow.org/tutorials/keras/text_classification_with_hub">Google’s TensorFlow</a> and <a href="https://www.ibm.com/cloud/watson-natural-language-understanding">IBM’s Cloud Machine Learning as a Service (MLaaS) platform</a> to discover how effectively engineers can develop maintainable, reliable, and scalable full stack ML solutions using Cloud-based tools.</p><p><strong>The application we will be building is a real-time chat application that is able to detect the tone of the users’ messages.</strong> As you can imagine, the use cases for this span widely, from understanding customers’ interactions with customer service chats to understanding how well a production AI chatbot is performing.</p><h4>Architecture</h4><figure><img alt="Real-time Chat App Architecture" src="https://cdn-images-1.medium.com/max/1001/0*lC1Y1l-SsuqRrn4j.png" /><figcaption>At a high level, this is the architecture of the application.</figcaption></figure><p>The high-level architecture uses React and TypeScript to build out our custom user interface, and Node.JS with the Socket.IO library to enable real-time, bidirectional network communication between the end user and the application server. Since Socket.IO gives us event-based communication, we can make network calls to our ML services asynchronously whenever a message is sent from an end-user host.</p><h4>Backend</h4><p>As for the ML services, we can make an HTTP request (that contains the content of the chat message in the payload) to IBM’s Cloud-based tone detection models. 
The response from IBM’s web service will be a JSON object containing an array of the tones that the model classified that message as having, such as</p><pre>[“Joy”, “Positive”, “Happy”]</pre><p>Asynchronously, our Node.JS web service can make a request to TensorFlow’s Sentiment API. TensorFlow’s ML model is a Convolutional Neural Network based Deep Learning architecture that has been trained on 50,000 labeled IMDB movie reviews to predict the sentiment of newly inputted text. We will send each new chat message through TensorFlow’s pre-trained model to maintain an average sentiment score for the entire chat conversation.</p><p><a href="https://medium.com/media/dac16f73062ebeb0758c013b58f49726/href">View the code gist on Medium</a></p><p>In the above gist, you can see that when a client sends a new message, the server calls two functions, getTone and updateSentiment, passing the text value of the chat message into those functions.</p><p><a href="https://medium.com/media/e9ecd2030d2031054514d081b4d80b61/href">View the code gist on Medium</a></p><p><a href="https://medium.com/media/b612eec0ee803444b7d77f20018e3f5d/href">View the code gist on Medium</a></p><p><a href="https://medium.com/media/1e7f22a05aecb4b06207113b7fb6dd7e/href">View the code gist on Medium</a></p><p>The predict function above is called in updateSentiment. This function loads the TensorFlow pre-trained model via a network fetch, preprocesses the input data, and uses the model to evaluate a sentiment score. 
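</p><p>The server code in this project is Node.JS (linked in the gists), but the conversation-level average described above reduces to a simple running-average update. Purely to illustrate the arithmetic, here is a Python sketch; the function name and signature are hypothetical, not part of the actual app:</p>

```python
def update_sentiment(avg_score: float, n_messages: int, new_score: float):
    """Fold one new per-message sentiment score into the running conversation average."""
    new_avg = (avg_score * n_messages + new_score) / (n_messages + 1)
    return new_avg, n_messages + 1

# Example: a conversation averaging 0.8 over 3 messages receives a message scoring 0.4
avg, n = update_sentiment(0.8, 3, 0.4)  # avg is now ~0.7 across 4 messages
```

<p>Tracking only the running average and message count keeps the per-message cost constant, with no need to re-score past messages. 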
This all happens in the background, in parallel with other backend tasks.</p><h4>Frontend</h4><p><a href="https://medium.com/media/4c2befb9881e5ca97850da3936f99ce0/href">View the code gist on Medium</a></p><p>In the code above, we are building a functional React component to handle client-side interaction with the chat application. Since we are using a functional component, we have access to React hooks such as useState and useEffect. The <strong>message</strong> state holds the value of the current user’s inputted text. The <strong>messages</strong> state holds the array of all sent messages. You can see the connection to the Socket server in useEffect, which runs on every render of the component. When a new message is emitted from the server, an event is triggered for the UI to receive and render that new message to all online user instances.</p><p><a href="https://medium.com/media/187f21ff1fc18d567537c0397feba6bf/href">View the code gist on Medium</a></p><p>The messageObject emitted from the sendMessage function reaches the server, which parses out messageObject.body (the actual value of the user-sent chat message) and processes it through the ML web services built out previously. 
It will then build and return a new object containing the message, username, and the tone of the message acquired from the ML model’s output.</p><p><a href="https://medium.com/media/7106e0ffef21d14738f34763d06467d4/href">View the code gist on Medium</a></p><p>And the final result… <strong>TADA!</strong></p><figure><img alt="Demo of Real-time Chat App" src="https://cdn-images-1.medium.com/max/1024/1*KmVHS7miXDwxMc1ZRMKXqQ.gif" /><figcaption>A demonstration of the application we built in this article. Each chat bubble contains the tag of the tone of that given message. (Source: Author)</figcaption></figure><h4>Full Source Code</h4><p>To check out the full code, visit the GitHub repository:</p><p><a href="https://github.com/devalparikh/NLP_Chat_App">devalparikh/NLP_Chat_App</a></p><h4>Conclusion</h4><p>Through building this proof-of-concept project, I hope to demonstrate how seamless it is to develop a custom application that integrates Machine Learning tools and techniques, utilizing Cloud-based Web Services to stay scalable and maintainable. By using IBM’s Cloud Services and Google’s TensorFlow Pre-Trained Sentiment Model, we were able to build a chat application that can classify the tone of each chat message, as well as the overall sentiment of the conversation.</p><h4><strong>References</strong></h4><p>[1] E. Collins, Z. 
Ghahramani, <a href="https://blog.google/technology/ai/lamda/">LaMDA: our breakthrough conversation technology</a> (2021), <a href="https://blog.google/technology/ai">https://blog.google/technology/ai</a></p><hr><p><a href="https://medium.com/data-science/building-a-real-time-chat-application-with-nlp-super-powers-ce800e19cb2b">Building a Real Time Chat Application with NLP Super Powers!</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>