Inspiration
Errors, if not corrected in a timely manner, can lead to severe consequences. Although robotics and computer technologies handle many repetitive and critical tasks, many positions still require human work. Errors made on an assembly line can cause major delays, and many plane crashes have reportedly resulted from pilot error. But "to err is human" and perfection is an illusion. The simplest way to deal with this is to monitor the operator during critical operations. We envisage a vision-based system that keeps an eye on the operator and signals when an action could lead to, or has already led to, an error. This permits timely correction.
What it does
Our system, named "RoBoss" (short for 'Robot Boss'), constantly monitors an operator in a work environment while (s)he performs critical actions and gives a real-time warning when an error is found. The system is designed to signal in two scenarios: (1) it issues code 'yellow' when the current action is predicted to lead to a fatal error, and (2) it issues code 'red' when a particular action has led to an error. Note that RoBoss, just like a boss, monitors only the actions of the operator, not the state of the equipment or product.
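A minimal sketch of this two-level signaling logic (the flag names here are illustrative placeholders, not our exact implementation):

```python
from enum import Enum

class Alert(Enum):
    NONE = "none"
    YELLOW = "yellow"  # current action is predicted to lead to an error
    RED = "red"        # an action has already led to an error

def assess(action_is_risky: bool, error_occurred: bool) -> Alert:
    """Map the recognized activity state to RoBoss's two warning codes.

    `action_is_risky` and `error_occurred` would come from the
    activity-recognition pipeline; here they are placeholders.
    """
    if error_occurred:
        return Alert.RED      # code 'red': the error has happened
    if action_is_risky:
        return Alert.YELLOW   # code 'yellow': an error is predicted
    return Alert.NONE
```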
How we built it
RoBoss is a sequential activity recognition agent that integrates instance segmentation (object detection, recognition, masking, etc.), object orientation detection, gesture recognition, and activity recognition (traditional methods).
RoBoss must be tailored to a specific task and environment. As a proof of concept, we defined a hypothetical critical task: watering a plant :). We have a plant and two containers, one with water and one with a poisonous liquid. The task is to pour the water, not the poisonous liquid, into the plant. We collected our own data from this scene and used it for all of our experiments.
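At a high level, one pass of the pipeline for this task looks roughly like the sketch below. The `segmenter` and `hand_tracker` objects stand in for our Mask R-CNN instance segmentation and background-subtraction gesture modules; their interfaces and the decision rule are illustrative, not our exact code:

```python
def overlaps(box_a, box_b):
    """True if two (x1, y1, x2, y2) boxes intersect."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def process_frame(frame, segmenter, hand_tracker):
    """One simplified pass of the RoBoss pipeline for the plant-watering task."""
    # 1. Instance segmentation: locate the plant and both containers,
    #    e.g. {'plant': box, 'water': box, 'poison': box}.
    objects = segmenter.detect(frame)

    # 2. Gesture detection: locate the operator's moving hand (or None).
    hand = hand_tracker.locate(frame)

    # 3. Simple activity reasoning over the detections.
    if hand is None or 'poison' not in objects:
        return 'none'
    if overlaps(hand, objects['poison']):
        if 'plant' in objects and overlaps(hand, objects['plant']):
            return 'red'       # poison at the plant: the error has occurred
        return 'yellow'        # carrying the poison: an error is predicted
    return 'none'
```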
Challenges we ran into
Sequential activity recognition is a very challenging task because it integrates multiple major computer vision applications: instance segmentation, gesture detection, etc. Researching and implementing each of these technologies takes time, and the error of the final integration accumulates across each of these function blocks. It was even more challenging for us because we had to implement technologies we were not familiar with in a short period of time. Collecting and pre-processing our own data also took time.
Accomplishments that we're proud of
We are very proud that we made all the major blocks work separately and together: instance segmentation using open-source Mask R-CNN code and gesture detection using OpenCV background subtraction. We were able to demonstrate the plant-watering scenario, in which RoBoss warns the operator about pouring the poisonous liquid into the potted plant!
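For reference, the core of the gesture detection is OpenCV's standard MOG2 background subtraction; a minimal version looks like this (the video path is a placeholder, and the post-processing parameters are illustrative defaults, not necessarily the ones we tuned):

```python
import cv2

# MOG2 background subtractor: moving regions (e.g. the operator's hand)
# show up as foreground against the learned static background.
back_sub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                              detectShadows=True)

cap = cv2.VideoCapture("watering_scene.mp4")  # placeholder path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Foreground mask for the current frame.
    fg_mask = back_sub.apply(frame)
    # Morphological opening removes small noise before contour analysis.
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN,
                               cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    cv2.imshow("foreground", fg_mask)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```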
What we learned
We learned how to use object segmentation and gesture detection technologies and how to integrate them together for a challenging task.
We struggled with multiple open-source implementations, including OpenPose and Detectron. We learned to give up wisely and switch to new open-source options, which was key to implementing our idea in limited time.
What's next for Roboss
First, we will make RoBoss run in real time once we get access to the server's GUI. Then, with better gesture detection technology (e.g., OpenPose), we will detect and recognize activities more accurately. We want to extend our hypothetical task to much more realistic and complex tasks in real working environments and help human operators work with less stress.
