WCCL - Webcam Collective Communications Library

Inspiration

Have you ever wanted to build a distributed model training system by setting up a bunch of laptops in a circle, pointing their webcams at each other, and letting them talk using nothing but their screens? Probably not, because the bandwidth would suck and relying on pixel-perfect accuracy from grainy webcam footage is not a recipe for success. But that's exactly why it would be really sick if it were possible. And there's only one way to find out.

What it does

First, you set up multiple root/worker nodes within webcam view of each other. The user then triggers calibration: each laptop locks onto the others' screens and measures the range of colors it should expect under the current lighting. Once calibrated, the nodes send tensors to each other by encoding them into pixels on their screens and synchronously reading the others' screens through their webcams, forming a mini distributed system. Screens and webcams are their only communication channel; no wireless network is needed. We used this to train a 670-parameter model on MNIST, reaching up to 83% accuracy in 2.5 minutes.
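To make the "encode tensors into pixels, calibrate, then read them back" loop concrete, here is a minimal sketch of one way it could work. It is not WCCL's actual code. It assumes grayscale cells instead of full color, four brightness levels (2 bits per cell), 8x8-pixel cells, and a hypothetical calibration frame whose cells cycle through every level. All names (`encode`, `reference_frame`, `calibrate`, `decode`) are illustrative.

```python
import numpy as np

# Hypothetical parameters: 4 brightness levels (2 bits per cell), 8x8-pixel cells.
LEVELS = np.array([0, 85, 170, 255], dtype=np.uint8)
BITS_PER_CELL = 2
CELL = 8

def to_pixels(grid: np.ndarray) -> np.ndarray:
    """Blow each symbol up into a CELL x CELL block of identical pixels."""
    return np.repeat(np.repeat(grid, CELL, axis=0), CELL, axis=1)

def encode(tensor: np.ndarray, cols: int = 32) -> np.ndarray:
    """Pack a float32 tensor into a grayscale image of solid square cells."""
    raw = np.frombuffer(tensor.astype(np.float32).tobytes(), dtype=np.uint8)
    bits = np.unpackbits(raw).reshape(-1, BITS_PER_CELL)  # MSB-first bit pairs
    symbols = bits[:, 0].astype(np.int64) * 2 + bits[:, 1]  # each pair -> 0..3
    rows = -(-len(symbols) // cols)  # ceiling division
    padded = np.zeros(rows * cols, dtype=np.int64)  # unused cells show level 0
    padded[: len(symbols)] = symbols
    return to_pixels(LEVELS[padded.reshape(rows, cols)])

def reference_frame(rows: int, cols: int) -> np.ndarray:
    """Known pattern cycling through every level, shown during calibration."""
    return to_pixels(LEVELS[np.arange(rows * cols) % len(LEVELS)].reshape(rows, cols))

def sample_centers(image: np.ndarray) -> np.ndarray:
    """Read one brightness value from the middle of each cell, row-major."""
    return image[CELL // 2 :: CELL, CELL // 2 :: CELL].astype(np.float64).ravel()

def calibrate(observed_reference: np.ndarray) -> np.ndarray:
    """Average how each level actually appears to this webcam in this lighting."""
    centers = sample_centers(observed_reference)
    labels = np.arange(centers.size) % len(LEVELS)
    return np.array([centers[labels == k].mean() for k in range(len(LEVELS))])

def decode(image: np.ndarray, n_values: int, levels) -> np.ndarray:
    """Snap each cell to the nearest calibrated level and rebuild the tensor."""
    levels = np.asarray(levels, dtype=np.float64)
    centers = sample_centers(image)
    symbols = np.abs(centers[:, None] - levels[None, :]).argmin(axis=1)
    symbols = symbols[: n_values * 32 // BITS_PER_CELL]  # drop padding cells
    bits = np.stack([symbols >> 1, symbols & 1], axis=1).ravel().astype(np.uint8)
    return np.frombuffer(np.packbits(bits).tobytes(), dtype=np.float32)
```

Calibration is what lets the decoder tolerate lighting: instead of snapping readings to the ideal levels 0/85/170/255, it snaps them to whatever those levels actually looked like through this webcam in this room. A real link would also have to locate and rectify the sender's screen in the camera frame and handle separate color channels, which this sketch leaves out.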

How we built it

Sweat, tears, and love. Shoutout to the Spoon conference room in the Neo office.

Challenges we ran into

One of the core problems was how to transmit our data visually without losing accuracy. For the highest bandwidth, we wanted a high frame rate, a wide color range, and small pixels, but each of these also makes the lossy webcam images much harder to decode correctly. Many factors affected our accuracy, including which wall we were facing, which laptop we were using, and how bright the room's lights were.
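The bandwidth versus accuracy tension can be made concrete with some back-of-the-envelope arithmetic. This sketch computes a raw upper bound on throughput and the per-channel noise margin. It assumes evenly spaced levels over 0 to 255 and ignores decode errors, framing, and dropped or repeated frames. The function names and the example numbers below are hypothetical, not measurements from WCCL.

```python
import math

def raw_bits_per_second(width_px: int, height_px: int, cell_px: int,
                        levels_per_channel: int, channels: int, fps: float) -> float:
    """Upper bound on payload rate, ignoring errors, framing, and dropped frames."""
    cells = (width_px // cell_px) * (height_px // cell_px)  # smaller cells -> more cells
    bits_per_cell = channels * math.log2(levels_per_channel)  # more levels -> more bits
    return cells * bits_per_cell * fps

def decision_margin(levels_per_channel: int, full_scale: float = 255.0) -> float:
    """How far a channel reading can drift before it snaps to the wrong level."""
    spacing = full_scale / (levels_per_channel - 1)
    return spacing / 2

def seconds_to_send(n_params: int, bps: float, bits_per_param: int = 32) -> float:
    """Time for one pass of a float32 tensor at the given raw rate."""
    return n_params * bits_per_param / bps
```

For example, a 1280x720 region with 16-pixel cells, 4 levels on each of 3 color channels, and 5 fps gives 3,600 cells x 6 bits x 5 fps = 108,000 bits/s, or about 0.2 s for one pass of a 670-parameter float32 tensor. Moving to 8 levels per channel raises throughput by 50% but shrinks the margin from 42.5 to about 18.2 brightness steps, which is exactly the kind of headroom that lighting and camera differences eat up.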

Accomplishments that we're proud of

We ended up reimplementing a number of non-trivial existing techniques from scratch to make this work, including pseudo-QR codes, distributed clock synchronization, and color calibration. We also built a robust evaluation setup to test different pixel sizes, color ranges, frame rates, and other variables.
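Of the pieces listed above, clock synchronization is the easiest to show in isolation. The sketch below uses the standard NTP-style estimate from four timestamps. We are not claiming WCCL uses this exact scheme; in a webcam link, each "send" and "receive" would itself be a frame shown on one screen and decoded by another laptop's camera, which is an assumption about how the mapping would work. `Exchange` and `best_offset` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Exchange:
    t0: float  # node A shows its request frame (A's clock)
    t1: float  # node B decodes the request (B's clock)
    t2: float  # node B shows its reply frame (B's clock)
    t3: float  # node A decodes the reply (A's clock)

    @property
    def offset(self) -> float:
        """Estimated B_clock - A_clock, assuming equal delays in each direction."""
        return ((self.t1 - self.t0) + (self.t2 - self.t3)) / 2

    @property
    def delay(self) -> float:
        """Round-trip transit time, excluding B's turnaround time."""
        return (self.t3 - self.t0) - (self.t2 - self.t1)

def best_offset(exchanges: List[Exchange]) -> float:
    """Trust the exchange with the smallest round trip."""
    return min(exchanges, key=lambda e: e.delay).offset
```

Picking the lowest-delay sample is a common heuristic: the offset estimate is only wrong by half the asymmetry between the two one-way delays, and a short round trip bounds how large that asymmetry can be. That matters here because webcams on different laptops can pick up new frames at very different rates.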

What we learned

Wireless networks are really nice for communicating data compared to reading pixels, and we take them for granted. It also turns out that one of our laptops could consistently read new frames from its webcam about 10x faster than any of our others. This variability across Mac hardware surprised us and served as a good warning against working in hardware industries in the future.