Distributed Hyperparameter Optimization

Do you like creating neural networks but hate the time it takes to optimize them, in particular the process of tuning hyperparameters? Don't you hate burning out your RAM while training the 90th variation of your model in your quest for better accuracy? Does code like this make your eyes bleed? (Real code from one of my friends' projects:)

for nn in test_num_nodes_per_hidden_layer:
    for a in test_activation:
        for ki in test_kernel_initializer:
            for of in test_optimizer_function:
                for lf in test_loss_function:
                    for df in test_discount_factor:
                        for r in test_reward:
                            for p in test_punishment:
                                for dr in test_draw_reward:
                                    for tfq in test_transfer_frequency:
                                        for ed in test_epsilon_decay:
                                            for edt in test_epsilon_decay_type:
                                                for me in test_min_epsilon:
                                                    for i in test_include_whos_turn:
                                                        for ms in test_memory_size:
                                                            for (
                                                                rbs
                                                            ) in test_replay_batch_size:
                                                                n = "{}".format(
                                                                    datetime.datetime.now().timestamp()
                                                                )
                                                                entry = pandas.DataFrame.from_dict(
                                                                    {
                                                                        "name": [n],
                                                                        "test_include_whos_turn": [
                                                                            i
                                                                        ],
                                                                        "test_num_nodes_per_hidden_layer": [
                                                                            nn
                                                                        ],
                                                                        "test_activation": [
                                                                            a
                                                                        ],
                                                                        "test_kernel_initializer": [
                                                                            ki
                                                                        ],
                                                                        "test_learning_rate": [
                                                                            of.learning_rate
                                                                        ],
                                                                        "test_optimizer_function": [
                                                                            of
                                                                        ],
                                                                        "test_loss_function": [
                                                                            lf
                                                                        ],
                                                                        "test_discount_factor": [
                                                                            df
                                                                        ],
                                                                        "test_reward": [
                                                                            r
                                                                        ],
                                                                        "test_punishment": [
                                                                            p
                                                                        ],
                                                                        "test_draw_reward": [
                                                                            dr
                                                                        ],
                                                                        "test_transfer_frequency": [
                                                                            tfq
                                                                        ],
                                                                        "test_epsilon_decay": [
                                                                            ed
                                                                        ],
                                                                        "test_epsilon_decay_type": [
                                                                            edt
                                                                        ],
                                                                        "test_memory_size": [
                                                                            ms
                                                                        ],
                                                                        "test_replay_batch_size": [
                                                                            rbs
                                                                        ],
                                                                    }
                                                                )
                                                                results = pandas.concat(
                                                                    [results, entry],
                                                                    ignore_index=True,
                                                                )
                                                                entry.to_csv(
                                                                    "results.csv",
                                                                    mode="a",
                                                                    header=False,
                                                                )
                                                                m = model.Model(
                                                                    model_name=n,
                                                                    model=model.create_model(
                                                                        include_whos_turn=i,
                                                                        num_nodes_per_hidden_layer=nn,
                                                                        activation=a,
                                                                        kernel_initializer=ki,
                                                                        optimizer_function=of,
                                                                        loss_function=lf,
                                                                    ),
                                                                    discount_factor=df,
                                                                    num_episodes=test_num_episodes,
                                                                    include_whos_turn=i,
                                                                    reward=r,
                                                                    punishment=p,
                                                                    draw_reward=dr,
                                                                    transfer_frequency=tfq,
                                                                    epsilon_decay=ed,
                                                                    epsilon_decay_type=edt,
                                                                    min_epsilon=me,
                                                                )
                                                                m.train(
                                                                    with_output=False,
                                                                    save_plots=True,
                                                                )

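For the record, a nesting tower like the one above can be flattened into a single loop with itertools.product. This is only a sketch: the parameter lists and the training call are hypothetical placeholders, not the project's actual names.

```python
import itertools

# Hypothetical stand-ins for a few of the real hyperparameter lists above.
grid = {
    "num_nodes_per_hidden_layer": [32, 64],
    "activation": ["relu", "tanh"],
    "epsilon_decay": [0.99, 0.995],
}

# itertools.product yields every combination of the value lists,
# replacing one level of nesting per hyperparameter.
combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

for params in combos:
    # train_model(**params)  # placeholder for the real training call
    pass
```

The same sixteen hyperparameters would still produce the same exponential number of combinations, but at least the code stays flat no matter how many parameters you add.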
Us too! That's why we created this tool that allows broke students (just like you) to distribute your AI training workloads over all of your friends' computers, without selling your organs to AWS to afford a few hours of SageMaker!

Each client needs Python installed and must run the bash install script, which installs the necessary ML dependencies. Then all they need to run is the worker.py file, and they will automatically start receiving hyperparameters to optimize! Once they're connected, a WebSocket server written in Go puts all of your little minions to work. Hard work! As soon as a worker finishes its job, it is immediately sent the next set of hyperparameters to train. Finally, a convenient frontend written in React lets you configure which combinations of hyperparameters you want to optimize and trigger the start of training. It also provides real-time feedback with information delivered directly from the Go WebSocket server.
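The wire format between the Go server and the Python workers can be as simple as JSON messages tagged with a type field. The field names below are illustrative assumptions, not the project's exact schema.

```python
import json

def encode_job(job_id, hyperparams):
    """Serialize a job assignment the server might push to a worker.

    Both job_id and the hyperparams dict are hypothetical fields,
    shown only to illustrate a type-tagged JSON protocol.
    """
    return json.dumps({"type": "job", "id": job_id, "hyperparams": hyperparams})

def decode_message(raw):
    """Parse an incoming message and dispatch on its type tag."""
    msg = json.loads(raw)
    if msg["type"] == "job":
        return ("train", msg["id"], msg["hyperparams"])
    if msg["type"] == "result":
        return ("record", msg["id"], msg["accuracy"])
    raise ValueError("unknown message type: " + msg["type"])
```

Because JSON has native support in Go (encoding/json), Python (json), and JavaScript, a type-tagged scheme like this lets all three sides of the system agree on message shapes without any shared code.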

We initially tried implementing a genetic algorithm in Python instead of the brute-force "grid search" approach, hoping it would converge on the best hyperparameters; in practice, though, we found the simple grid search more effective, as the genetic algorithm was too unstable to converge within a reasonable amount of time. Shout-out to Adrian for spending a painful 12 hours debugging it and trying to make it work!
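For the curious, the genetic-algorithm idea looked roughly like this. This is a simplified sketch under stated assumptions: the fitness function here is a toy stand-in, whereas the real version scored each candidate by actually training a model, which is exactly where the instability crept in.

```python
import random

def evolve(grid, fitness, generations=20, pop_size=10, mutation_rate=0.3, seed=0):
    """Toy genetic search over a hyperparameter grid.

    grid: dict mapping hyperparameter name -> list of allowed values.
    fitness: callable scoring a candidate dict (higher is better).
    """
    rng = random.Random(seed)
    sample = lambda: {k: rng.choice(v) for k, v in grid.items()}
    population = [sample() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half of the population as survivors.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover: each gene comes from one of the two parents.
            child = {k: rng.choice([a[k], b[k]]) for k in grid}
            # Mutation: occasionally re-roll one hyperparameter.
            if rng.random() < mutation_rate:
                k = rng.choice(list(grid))
                child[k] = rng.choice(grid[k])
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)
```

With a cheap, deterministic fitness function this converges quickly; with a noisy one (like validation accuracy after a short training run), selection keeps getting fooled by lucky runs, which is the instability described above.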

Briefly, some challenges we ran into were:

  1. Learning concurrency patterns while attempting to write efficient Go
  2. Designing our own protocol for WebSocket messages
  3. Transferring data across three different programming languages through WebSockets
  4. Hosting the final project