Generatones | Devpost

Generatone

Introducing Generatone

Receiving a call from a loved one, from an old friend or from a complete stranger: these are all special calls that we cherish, so why do they all come with the same boring ringtones? Why limit yourself to a few dozen predefined ringtones when the calls that follow them are so unique?

This is why we decided to build Generatone, an A.I. that creates a new ringtone every time you receive a call, so that each and every ring of your phone is unique.

Technical details

Collectively, our group members had already used various ML algorithms for projects, usually DNNs for classification. For MAISHacks, we wanted to try something new, so we settled on a project in which an ML algorithm generates content. We chose a ringtone generator because we could not find anybody online who had done the same project.

We first wanted to build a Generative Adversarial Network (GAN), but we then realized how complex it would be to build such a model for sequential data. After some thought, we settled on a Recurrent Neural Network using a Long Short-Term Memory (LSTM) architecture, which is well suited to sequential audio data. We also decided to use MIDI instead of MP3 for the file format: a model trained on MP3 audio would likely have produced little more than white noise, whereas MIDI is almost guaranteed to yield a usable result because the file format directly specifies which notes to play instead of frequencies.
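To illustrate why notes are an easier target than raw frequencies, here is a small sketch (not part of our project code) of how a MIDI note number maps to a pitch. The helper names are hypothetical; the formula is the standard equal-temperament tuning with A4 = note 69 = 440 Hz. A model working on MIDI only has to choose among 128 note numbers, and the player takes care of the actual sound.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_note_to_frequency(note_number: int) -> float:
    """Return the equal-tempered frequency in Hz for a MIDI note number.

    Assumes the usual reference tuning: note 69 (A4) = 440 Hz.
    """
    return 440.0 * 2.0 ** ((note_number - 69) / 12)


def midi_note_name(note_number: int) -> str:
    """Return a readable name such as 'C4', using the convention where note 60 is C4."""
    octave = note_number // 12 - 1
    return f"{NOTE_NAMES[note_number % 12]}{octave}"


if __name__ == "__main__":
    for note in (60, 64, 67, 69):
        print(note, midi_note_name(note), round(midi_note_to_frequency(note), 2))
```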

Ultimately, we built and trained a functional LSTM that generated a variety of ringtones, and exported them to playable MIDI format.
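As a rough idea of what exporting to MIDI involves, the sketch below writes a list of note numbers as a minimal single-track Standard MIDI File. It is not our actual export code (we used a MIDI library for that); it relies only on the Python standard library, and the function names, note length and velocity are assumptions chosen for illustration.

```python
import struct


def encode_variable_length(value: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    chunks = [value & 0x7F]
    value >>= 7
    while value:
        # Every byte except the last one has its high bit set.
        chunks.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(chunks))


def notes_to_midi_bytes(pitches, ticks_per_note=240, ticks_per_beat=480, velocity=80):
    """Build a format-0 MIDI file that plays `pitches` one after another."""
    track = bytearray()
    for pitch in pitches:
        # Note on (channel 0), immediately after the previous event.
        track += encode_variable_length(0) + bytes([0x90, pitch, velocity])
        # Note off (channel 0), after the note has sounded for ticks_per_note.
        track += encode_variable_length(ticks_per_note) + bytes([0x80, pitch, 0])
    track += b"\x00\xFF\x2F\x00"  # end-of-track meta event
    # Header chunk: length 6, format 0, one track, timing resolution.
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)
```

Writing the returned bytes to a file opened in binary mode (for example a hypothetical ringtone.mid) should give a file that ordinary MIDI players can open.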

How we built it

Python
Keras LSTM
MIDI Parsing library
Flask web app
HTML/CSS

Challenges we ran into

We had to figure out how to work with audio files, and how to design a neural network architecture to generate ringtones.

We decided to use MIDI files, since a MIDI file represents a series of notes. We taught ourselves to use MidiToolKit, a Python library designed specifically to parse and create MIDI files.

We had to find training data, so Justin Lacoste (beloved team member) downloaded 600 ringtones and manually annotated a significant number of them with sentiment labels for future analysis.

We then had to find a way to generate music. We initially planned to use a GAN-inspired network, but we eventually decided on a recurrent neural network generator (without a discriminator), as the complexity of a newly trained GAN risked overfitting the data.

We had to learn how to implement LSTMs through Keras. This led us to work out several techniques of our own:

Converting MIDI files to a matrix format that a neural network could read
Selecting specific parameters for the LSTM input
Creating a sliding window for the notes
Fine-tuning the LSTM architecture to achieve the best results
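The first three steps above can be sketched roughly as follows. This is an illustration, not our exact code: it assumes each ringtone has already been reduced to a list of MIDI pitch numbers, and it uses a one-hot row per note as the matrix format. The function names and the window size are hypothetical.

```python
N_PITCHES = 128  # MIDI note numbers range from 0 to 127


def one_hot(pitch, size=N_PITCHES):
    """Return a list of `size` zeros with a single 1 at index `pitch`."""
    row = [0] * size
    row[pitch] = 1
    return row


def make_training_windows(pitches, window_size):
    """Slide a fixed-size window over a pitch sequence.

    Returns (inputs, targets): each input is `window_size` one-hot rows and
    each target is the one-hot row of the note that follows that window.
    """
    inputs, targets = [], []
    for start in range(len(pitches) - window_size):
        window = pitches[start:start + window_size]
        inputs.append([one_hot(p) for p in window])
        targets.append(one_hot(pitches[start + window_size]))
    return inputs, targets


if __name__ == "__main__":
    melody = [60, 62, 64, 65, 67]  # made-up example melody
    x, y = make_training_windows(melody, window_size=3)
    print(len(x), "windows of shape", len(x[0]), "x", len(x[0][0]))
```

Each window then becomes one training example: the network sees the notes in the window and learns to predict the next one.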

We also ran into overfitting at times, which led us to add dropout and adjust hyperparameters such as the number of nodes, the activation function and the optimizer.
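For readers who want a starting point, a Keras LSTM with dropout could look roughly like the sketch below. Every value here (window size, number of units, dropout rate, activation, optimizer) is a placeholder and not the configuration we ended up with, and the function name is hypothetical. It assumes TensorFlow is installed and that the input is a window of one-hot encoded notes.

```python
def build_ringtone_lstm(window_size=32, n_pitches=128, units=256,
                        dropout_rate=0.3, activation="tanh", optimizer="adam"):
    """Build a small next-note predictor: LSTM -> Dropout -> softmax over pitches.

    All default hyperparameters are illustrative assumptions.
    """
    # Imported inside the function so this file can be loaded without TensorFlow.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(window_size, n_pitches)),
        keras.layers.LSTM(units, activation=activation),
        # Dropout randomly zeroes a fraction of activations during training,
        # which is one common way to reduce overfitting.
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(n_pitches, activation="softmax"),
    ])
    model.compile(optimizer=optimizer, loss="categorical_crossentropy")
    return model
```

Exposing the number of units, the activation and the optimizer as arguments makes it easy to try different combinations when the model starts to overfit.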

Accomplishments that we're proud of

Aggregating 600 MIDI music files for various ringtones
Inventing a method to parse MIDI music files and convert them into an easily readable format for LSTM neural nets
Deploying a functional LSTM that generates ringtones and outputs them into MIDI music files
Creating a Flask web app to showcase our ringtones