Inspiration

Distance vision impairment or blindness affects an estimated 100 million people globally [1]. It's critical to recognise that blindness is a continuum [2] and that everyone's experience is different: some people have blurred vision, tunnel vision, or issues with depth perception or object recognition, while others can only perceive light or are entirely blind. We chose to explore ways to ease the symptoms of visual impairment and designed a system as our entry to the challenge.

What it does

We developed a system, in collaboration with a doctor, that can recognise and announce 80 distinct everyday objects (both indoors and outdoors) in real time. Visual feedback in the form of light flashes can eventually be provided via a prototype wearable IoT device. We designed the system to: i) identify a variety of objects, ii) be adaptable to different needs, iii) offer real-time verbal and visual feedback, and iv) be simple and easy to use.

How we built it

Back-End Machine Learning Inference: The system's back end runs on an EC2 G4 instance optimised for machine learning in an AWS Wavelength Zone. Incoming connections are handled by a Rust-based WebSocket server, which decodes frames and passes them to the ML model. We used the excellent C-based Darknet framework with YOLOv4 for object detection. Using the GPUs available on EC2 G4 instances, we achieve real-time object detection at roughly 0.05 seconds per frame.
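For illustration, here is a minimal sketch of what the frame-handling WebSocket server could look like in Rust, using `tokio` and `tokio-tungstenite`. The `detect_objects` placeholder, the crate versions, and the port are assumptions rather than our exact setup; in the real back end that call goes to Darknet/YOLOv4.

```rust
// Minimal sketch: accept WebSocket connections, treat each binary message as
// one compressed camera frame, run detection, and send the result back.
//
// [dependencies]
// tokio = { version = "1", features = ["full"] }
// tokio-tungstenite = "0.21"
// futures-util = "0.3"

use futures_util::{SinkExt, StreamExt};
use tokio::net::TcpListener;
use tokio_tungstenite::{accept_async, tungstenite::Message};

// Placeholder for the actual Darknet/YOLOv4 call: takes a JPEG frame and
// returns detected labels as a JSON string.
fn detect_objects(_jpeg: &[u8]) -> String {
    r#"["person","car"]"#.to_string()
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:9001").await?;
    while let Ok((stream, addr)) = listener.accept().await {
        tokio::spawn(async move {
            let ws = match accept_async(stream).await {
                Ok(ws) => ws,
                Err(e) => {
                    eprintln!("handshake failed for {addr}: {e}");
                    return;
                }
            };
            let (mut tx, mut rx) = ws.split();
            // Each binary message is one compressed camera frame from the app.
            while let Some(Ok(msg)) = rx.next().await {
                if let Message::Binary(frame) = msg {
                    let detections = detect_objects(&frame);
                    if tx.send(Message::Text(detections)).await.is_err() {
                        break; // client disconnected
                    }
                }
            }
        });
    }
    Ok(())
}
```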

Android App: The system's user-facing component is an Android app that connects to the camera and streams video to the back-end server for real-time object detection. To reduce bandwidth requirements, frames are shrunk and suitably compressed before being sent. The objects detected by the server are then sorted by priority: a car, for example, is more relevant to the user, and potentially more dangerous, than a person. Such objects are announced first and, once the wearable IoT device is integrated, will also alter the blink rate/colors of its LEDs.
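As a rough illustration of this prioritisation (sketched in Rust for consistency with the back end; the priority table and the `Detection` struct are assumptions, not the app's actual code):

```rust
// Conceptual sketch of ranking detected objects before announcing them.
#[derive(Debug)]
struct Detection {
    label: String,
    confidence: f32,
}

// Higher number = announced earlier. Potentially hazardous objects such as
// vehicles outrank, say, a person standing nearby. Illustrative values only.
fn priority(label: &str) -> u8 {
    match label {
        "car" | "bus" | "truck" | "bicycle" => 3,
        "person" | "dog" => 2,
        _ => 1,
    }
}

fn rank_for_announcement(mut detections: Vec<Detection>) -> Vec<Detection> {
    detections.sort_by(|a, b| {
        priority(&b.label)
            .cmp(&priority(&a.label))
            .then(b.confidence.total_cmp(&a.confidence))
    });
    detections
}

fn main() {
    let ranked = rank_for_announcement(vec![
        Detection { label: "person".into(), confidence: 0.91 },
        Detection { label: "car".into(), confidence: 0.88 },
    ]);
    // The car is announced first despite its lower confidence score.
    println!("{ranked:?}");
}
```

Ties between equally ranked labels fall back to detection confidence, so the most certain hazard is announced first.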

The app includes accessibility settings to meet the needs of people with various types and degrees of visual impairment: the speech rate (how fast object announcements are spoken) can be varied, the language can be selected (currently English, German, and Chinese are supported), and visual feedback settings are in place for once integration with the wearable device is established.

Challenges we ran into

Encoding of video: We noticed that readily available implementations of several video-streaming approaches, including RTSP, buffer frames to reduce lag and stutter. Unfortunately, this added roughly two seconds of latency on average (depending on buffer size), which is far too long for our needs. Reducing the buffer size would likely have resulted in visible compression artefacts, which would make object detection more difficult. We eventually chose to manually resize and compress frames ourselves, and the results were excellent.
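The resize-and-compress step, sketched here with the Rust `image` crate (the real pipeline runs on the Android side, and the 640-pixel target and JPEG quality of 60 are assumed values, not our exact parameters):

```rust
use image::codecs::jpeg::JpegEncoder;
use image::imageops::FilterType;

// Decode whatever the camera produced, downscale it, then re-encode as JPEG.
fn shrink_and_compress(raw: &[u8]) -> image::ImageResult<Vec<u8>> {
    let img = image::load_from_memory(raw)?;
    // Downscale to a size the detector handles well; keeps bandwidth low
    // without introducing heavy compression artefacts.
    let small = img.resize(640, 640, FilterType::Triangle).to_rgb8();
    let mut out = Vec::new();
    // Quality ~60: low enough to save bandwidth, high enough to avoid
    // artefacts that confuse the detector (assumed trade-off).
    JpegEncoder::new_with_quality(&mut out, 60).encode_image(&small)?;
    Ok(out)
}
```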

Testing: Testing the system was one of the most difficult tasks we faced. AWS Wavelength is currently only available in a few cities across the United States and only accessible via the Verizon network, so we faced excessive latency from our location, which is far from the AWS Wavelength Zones. Verizon kindly provided access to the Nova Testing Platform for the duration of the challenge, which was extremely helpful: it let us test the Android app's access to AWS Wavelength in a 5G environment. During development, we mostly relied on generic testing in a local area network, then deployed to a test system running on a standard AWS EC2 instance in the Asia Pacific region.

Accomplishments that we're proud of

We're thrilled that we got precise real-time object detection working reliably, and that we can now announce objects using text-to-speech in a variety of languages to suit diverse needs. We hope that people with visual impairments will find this software beneficial.

What we learned

We had the opportunity to try out a variety of new things during the course of this project:

1) AWS Wavelength and 5G technology, which allow us to connect with ultra-low latency and high bandwidth.

2) Real-time object detection using machine learning on GPU-enabled EC2 G4 instances.

3) An in-depth look at Android video capture and transcoding.

What's next for NoBuff

We want to make spoken object announcements smarter. Objects are currently sorted by priority and grouped into rough locations; this works fine, but it can get a little verbose. Object persistence (keeping track of objects as they move across successive frames) would help us reduce noise and make announcements clearer.
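One possible starting point, sketched in Rust: match detections in the new frame against the previous frame by intersection-over-union (IoU), and only announce objects without a match. The `BBox` type and the 0.3 threshold are illustrative assumptions, not a committed design.

```rust
// Rough sketch of object persistence: a detection that overlaps a same-label
// box from the previous frame is treated as the same object and not re-announced.
#[derive(Clone, Copy)]
struct BBox { x1: f32, y1: f32, x2: f32, y2: f32 }

fn iou(a: BBox, b: BBox) -> f32 {
    let ix = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let iy = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = ix * iy;
    let area = |r: BBox| (r.x2 - r.x1) * (r.y2 - r.y1);
    let union = area(a) + area(b) - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Returns the indices of detections that did not appear in the previous frame.
fn new_objects(prev: &[(String, BBox)], curr: &[(String, BBox)]) -> Vec<usize> {
    curr.iter()
        .enumerate()
        .filter(|(_, (label, bbox))| {
            !prev.iter().any(|(pl, pb)| pl == label && iou(*pb, *bbox) > 0.3)
        })
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let prev = vec![("car".to_string(), BBox { x1: 0.0, y1: 0.0, x2: 1.0, y2: 1.0 })];
    let curr = vec![
        ("car".to_string(), BBox { x1: 0.05, y1: 0.0, x2: 1.0, y2: 1.0 }),   // same car, skip
        ("person".to_string(), BBox { x1: 2.0, y1: 2.0, x2: 3.0, y2: 3.0 }), // new, announce
    ];
    println!("announce indices: {:?}", new_objects(&prev, &curr)); // -> [1]
}
```

With matching in place, the app would announce only genuinely new objects instead of repeating the same car every frame.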

We'd also like to improve the wearable IoT device and properly integrate it with the app so that the two can communicate.
