The Pillars of Creation

femto

Femto is a novel image compression methodology and related suite of tools designed at MHacks: Refactor. It was created to allow machine learning algorithms to run more efficiently on images by compressing them using pre-computed features. Further, a femto encoding can be made smaller than a JPEG of the same image without sacrificing information and is transmissible as text, making it both lightweight and low-overhead.

The femto suite includes an automated setup tool for Apache Spark (PySpark) and the SciPy stack on Linode servers, allowing distributed scientific computing using master and slave configurations on multiple machines.

The Intuition

Femto treats images as an array of pixels and performs singular value decomposition to isolate important features. The singular values are encoded using a collection of performance optimizations (including multi-stage clustering to increase redundancy, indexing using k-strings, and standard text compression) on the server side. A pre-chosen amount of encoded data is then sent to the client where image reconstruction or data analysis takes place.
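The core idea above can be sketched in a few lines of NumPy. This is a minimal illustration of truncated singular value decomposition on a grayscale array, not the femto implementation: the function names `compress` and `reconstruct`, the synthetic test image, and the choice of `rank=8` are all assumptions made for the example, and the clustering, k-string indexing, and text compression stages are left out entirely.

```python
import numpy as np

def compress(image, rank):
    """Keep only the `rank` largest singular triplets of a 2-D grayscale array."""
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]

def reconstruct(u, s, vt):
    """Rebuild an approximation of the image from the truncated factors."""
    # Scaling the columns of u by s is equivalent to u @ diag(s).
    return (u * s) @ vt

# Synthetic 64x64 grayscale "image" (hypothetical data): a smooth gradient plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)
image = np.outer(x, x) + 0.01 * rng.standard_normal((64, 64))

u, s, vt = compress(image, rank=8)
approx = reconstruct(u, s, vt)

# Numbers that would need to be sent instead of the 64 * 64 = 4096 raw pixels.
stored_values = u.size + s.size + vt.size
relative_error = np.linalg.norm(image - approx) / np.linalg.norm(image)
print(stored_values, relative_error)
```

Choosing a smaller rank sends less data at the cost of a larger reconstruction error, which is one way to read the "pre-chosen amount of encoded data" mentioned above.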

Performance

The femto protocol compresses large TIF images by a factor of up to 8 without significant loss in quality or information. Additionally, a femto encoding is between 40% and 50% of the size of the equivalent JPEG encoding (when converting from large TIF or Raw files).

Tested Use Cases

Femto has been tested using clustering algorithms (Lloyd's) and computer vision algorithms (blob identification). Although the femto files were smaller, analyses run on images reconstructed with the femto protocol were just as accurate as those run on uncompressed TIF images, and they completed faster.
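For readers unfamiliar with Lloyd's algorithm, here is a small pure-Python sketch of it applied to grayscale pixel intensities. It is not the test harness used for femto, which is not described here; the function name `lloyd_1d`, the sample pixel values, and the starting centers are hypothetical and chosen only to make the example easy to follow.

```python
def lloyd_1d(values, centers, iterations=20):
    """Lloyd's algorithm on scalar values: alternate assignment and mean update."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to its group's mean (unchanged if empty).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers

# Hypothetical pixel intensities: a dark cluster and a bright cluster.
pixels = [10, 12, 11, 200, 205, 198, 9, 202]
centers = lloyd_1d(pixels, centers=[0.0, 255.0])
print(centers)  # [10.5, 201.25]
```

Running the same clustering on an original image and on its reconstruction, then comparing the resulting centers, is one simple way to check that compression has not changed the outcome of an analysis.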

Built With

apache-spark
bash
computer-vision
jupyter
linode
machine-learning
python

Submitted to

MHacks: Refactor

Created by

I worked on the mathematics behind the compression engine and on its optimization, built the grayscale compression and reconstruction engines, and implemented the computer vision test case.

Alex Cordover

I was in charge of creating a distributed system using Spark and Linode servers, and I created automation tools to help set up a large master-slave network. I also helped work out the mathematics behind our algorithm, implemented color image compression, and had the idea of increasing redundancy via machine learning.

Alberto Rios

Updates

Alex Cordover started this project — Feb 21, 2016 04:34 AM EST
