Inspiration
Many interesting machine learning models need high-quality ground-truth data, yet data labelling is often painful, with clunky UIs and very basic inputs.
What it does
We wanted to build a personalized cosmetics company using machine learning, but ground truth for tasks like lip segmentation is hard to find, and labelling it manually one image at a time is painful. Instead, we built UIs for labelling that data quickly. The ML model trains in real time, and its output assists the rater (e.g. the model guesses a segmentation mask and the rater simply corrects it instead of starting from scratch).
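The assist loop described above can be sketched as follows. This is a minimal illustration, not the project's code. It assumes masks are flat row-major arrays of 0/1 and that the rater's corrections arrive as a list of pixel indices to flip. `applyCorrections` and `labellingEffort` are hypothetical names.

```javascript
// Minimal sketch of model-assisted labelling: the model proposes a binary
// segmentation mask and the rater only flips the pixels it got wrong.
// Masks are flat row-major arrays of 0/1; all names are hypothetical.

function applyCorrections(predictedMask, flippedPixels) {
  const mask = predictedMask.slice(); // never mutate the model's proposal
  for (const i of flippedPixels) {
    if (i < 0 || i >= mask.length) {
      throw new RangeError(`pixel index ${i} out of bounds`);
    }
    mask[i] = mask[i] === 1 ? 0 : 1;
  }
  return mask;
}

// Rough effort comparison: from scratch, the rater would paint every
// foreground pixel; with assistance, they only flip the wrong ones.
function labellingEffort(finalMask, flippedPixels) {
  const fromScratch = finalMask.reduce((sum, v) => sum + v, 0);
  return { fromScratch, assisted: flippedPixels.length };
}

// Usage: a 3x4 image where the model missed one lip pixel (index 6)
// and wrongly included another (index 11).
const predicted = [0, 1, 1, 0,
                   1, 1, 0, 1,
                   0, 1, 1, 1];
const corrections = [6, 11];
const finalMask = applyCorrections(predicted, corrections);
console.log(finalMask);
console.log(labellingEffort(finalMask, corrections)); // { fromScratch: 8, assisted: 2 }
```

The comparison is crude, but it captures why the design helps. When the model's guess is close, the rater touches only a handful of pixels instead of painting the whole mask.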
How we built it
To keep the scope simple, we focused on labelling deep fakes. We collected some data from the internet and generated some synthetic data ourselves.
Our backend, built in node.js, contains the logic for serving images to label and for recording rater results. It also includes an ML model, built with tfjs, that is updated in real time as rater input comes in. Our frontend lets the rater move through the data quickly using keyboard shortcuts, labelling either one image at a time or in grids.
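The backend flow can be illustrated with a minimal in-memory sketch. It serves the next image along with the model's current guess, records the rater's answer, and updates the model immediately. The sketch rests on several assumptions. `LabelingSession`, `nextImage`, `nextBatch`, `recordLabel`, the `predict`/`update` model interface and the `f`/`r` key bindings are all hypothetical, and a trivial stand-in model replaces the real tfjs model.

```javascript
// Minimal in-memory sketch of a labelling backend. The model is any object
// with predict(features) and update(features, label) (hypothetical API).

class LabelingSession {
  constructor(images, model) {
    this.pending = [...images]; // unlabelled images: { id, features }
    this.labels = new Map();    // id -> label recorded from the rater
    this.model = model;
  }

  // One-by-one mode: next image plus the model's current guess, so the UI
  // can pre-fill it and the rater only has to confirm or correct it.
  nextImage() {
    const image = this.pending[0];
    if (!image) return null;
    return { id: image.id, guess: this.model.predict(image.features) };
  }

  // Grid mode: up to n images at once, each with the model's guess.
  nextBatch(n) {
    return this.pending.slice(0, n).map((img) => ({
      id: img.id,
      guess: this.model.predict(img.features),
    }));
  }

  // Record the rater's answer and update the model on it right away.
  recordLabel(id, label) {
    const idx = this.pending.findIndex((img) => img.id === id);
    if (idx === -1) throw new Error(`unknown or already-labelled image ${id}`);
    const [image] = this.pending.splice(idx, 1);
    this.labels.set(id, label);
    this.model.update(image.features, label);
  }
}

// Hypothetical keyboard shortcuts the frontend could map to labels.
const KEY_TO_LABEL = { f: "fake", r: "real" };

// Usage with a trivial stand-in model that guesses the majority label so far.
const counts = { fake: 0, real: 0 };
const countingModel = {
  predict: () => (counts.fake >= counts.real ? "fake" : "real"),
  update: (_features, label) => { counts[label] += 1; },
};

const session = new LabelingSession(
  [{ id: "img1", features: [0.1] }, { id: "img2", features: [0.9] }],
  countingModel,
);
const first = session.nextImage();  // { id: "img1", guess: "fake" }
session.recordLabel(first.id, KEY_TO_LABEL["r"]);
const second = session.nextImage(); // { id: "img2", guess: "real" }
```

Because each recorded label immediately calls `update`, the guess served for the next image already reflects the rater's latest answer.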
Challenges we ran into
Our initial models were simple convolutional networks only 4-5 layers deep, and their performance was very poor. We ended up switching to pretrained base models with classifiers built on top.
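The "pretrained base plus classifier on top" idea can be illustrated without any dependencies. In this sketch, `frozenBase` is a hypothetical stand-in for a pretrained network's feature extractor. The write-up does not name the base model, and a real setup would use tfjs. Only a logistic-regression head is trained, taking one stochastic-gradient step per rater label. The `mean`/`variance` input fields are invented for illustration.

```javascript
// Dependency-free sketch of "pretrained base + classifier on top".
// frozenBase stands in for a pretrained network's embedding and is never
// trained; only the logistic-regression head below is updated.

// Hypothetical frozen feature extractor: maps raw input to a feature vector.
function frozenBase(input) {
  return [input.mean, input.variance, 1]; // trailing 1 acts as a bias term
}

class LogisticHead {
  constructor(dim, learningRate = 0.5) {
    this.w = new Array(dim).fill(0);
    this.lr = learningRate;
  }

  // Probability that the input is a fake (label 1).
  predictProba(features) {
    const z = features.reduce((s, x, i) => s + x * this.w[i], 0);
    return 1 / (1 + Math.exp(-z));
  }

  // One SGD step on the log loss for a single example, so the head can
  // be updated as each rater label arrives.
  update(features, label) {
    const error = this.predictProba(features) - label; // dLoss/dz
    for (let i = 0; i < this.w.length; i++) {
      this.w[i] -= this.lr * error * features[i];
    }
  }
}

// Usage: toy examples where (by assumption) fakes have higher variance.
const head = new LogisticHead(3);
const fake = { mean: 0.5, variance: 0.9 };
const real = { mean: 0.5, variance: 0.1 };
for (let step = 0; step < 200; step++) {
  head.update(frozenBase(fake), 1); // rater said "fake"
  head.update(frozenBase(real), 0); // rater said "real"
}
console.log(head.predictProba(frozenBase(fake)).toFixed(3));
console.log(head.predictProba(frozenBase(real)).toFixed(3));
```

Keeping the base fixed means each update touches only a few head weights. That is one reason this kind of setup can plausibly keep up with rater input in real time.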
Accomplishments that we're proud of
We have an ML model that updates in real time, and a slick, fast UI.
What we learned
tfjs, the concept of active learning in ML, and the challenges of data labelling.
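Uncertainty sampling is one common active-learning strategy. The write-up does not say which strategy the project used, so treat this as an assumed example. The sketch takes the model's probability that each unlabelled image is fake and sends the rater the images the model is least sure about first. `uncertainty` and `pickMostUncertain` are hypothetical names.

```javascript
// Sketch of uncertainty sampling (an assumed active-learning strategy):
// rank unlabelled images by how close the model's probability is to 0.5.

function uncertainty(p) {
  return 1 - Math.abs(p - 0.5) * 2; // 1 at p = 0.5, 0 at p = 0 or 1
}

function pickMostUncertain(predictions, k) {
  return [...predictions]
    .sort((a, b) => uncertainty(b.proba) - uncertainty(a.proba))
    .slice(0, k)
    .map((p) => p.id);
}

const predictions = [
  { id: "a", proba: 0.97 },
  { id: "b", proba: 0.52 },
  { id: "c", proba: 0.10 },
  { id: "d", proba: 0.61 },
];
console.log(pickMostUncertain(predictions, 2)); // [ 'b', 'd' ]
```

Labels on these borderline images tend to be the most informative for the model, while confident predictions like "a" add little.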
What's next for false:positive
Make it a one-stop shop for accelerated dataset labelling.
Built With
- javascript
- node.js
- react
- tensorflow