The Janus Array: A Universal Time-Series Visual Anomaly Engine

Inspiration

We were always a little bothered by how the inspection of expensive assets—like bridges, factory equipment, or crucial infrastructure—relies on someone flipping through old, static photos and trying to "spot the difference." Human eyes get tired, and subtle changes over months or years, like a hairline crack growing or paint slowly fading, are often missed until they become catastrophic failures.

Our inspiration was simple: We wanted to build a machine that could precisely track the history of an object's appearance. We didn't want to just compare a production line item to a perfect mold (a "golden reference"); we wanted to compare the object today ($Image_{Time2}$) to the same object six months ago ($Image_{Time1}$). This shift allowed us to move from simple quality control to predictive maintenance and generalized asset degradation monitoring.


What It Does

The Janus Array is an industrial-grade "spot the difference" engine that operates on two images of the same object captured at different points in time ($Image_{Time1}$ and $Image_{Time2}$). Rather than comparing a part against a perfect "golden reference," it is built to detect generalized degradation such as scratching, denting, corrosion, or material fading. It first performs a robust Homography-based registration to align the two images despite differences in camera viewpoint, angle, or scale. Once aligned, the system outputs a pixel-level segmentation mask of the changed region, localizes it with a bounding box, and attaches a classified text label (e.g., "Stress Fracture" or "Color Shift") describing the nature of the anomaly for immediate, actionable intelligence.


How We Built It

We approached this as a robust, modular engineering challenge, building the pipeline in five distinct phases using Python (OpenCV/PyTorch), with MATLAB for prototyping.

  1. Phase 1: The Realistic Dataset

    • We deliberately captured our datasets under variable conditions to simulate the field. We photographed our test objects (e.g., rusted metal, painted surfaces) with subtle variations in ambient lighting and camera angle for $Image_{Time1}$ and $Image_{Time2}$.
    • Anomaly Generation: We manually introduced realistic forms of degradation (e.g., using fine-tip pens for stress fractures, light abrasion for wear).
    • Ground Truth: We used MATLAB's Image Labeler to meticulously paint pixel-accurate masks covering only the region of change. This effort paid off: high-quality ground truth was crucial for teaching the model to separate real change from noise.
  2. Phase 2: The Janus Vision (Registration)

    • Because the camera was never perfectly still between captures, even a one-pixel shift would corrupt a naive comparison.
    • Prototyping (MATLAB): We quickly tested feature-extraction methods like SIFT and ORB to determine the most stable approach for finding matching points across the time-series images.
    • Implementation (OpenCV/Python): We used the matched features to estimate a Homography Matrix. This matrix let us warp $Image_{Time2}$ into alignment with $Image_{Time1}$, effectively cancelling out global movement (shift, rotation, and perspective change). A minimal sketch of this step appears after this list.
  3. Phase 3: The Brain (Change Segmentation)

    • We needed a model that could understand subtle context. Simple image subtraction was useless due to residual lighting noise.
    • U-Net Architecture: We chose the U-Net segmentation model. Its encoder-decoder structure with skip connections preserves fine spatial detail, so it learns to isolate small, highly localized anomalies while ignoring low-frequency noise such as residual illumination differences.
    • Input Stacking: We trained the model on a 6-channel input (the 3 RGB channels of $Image_{Time1}$ stacked with the 3 RGB channels of the aligned $Image_{Time2}$), teaching it to identify the meaningful differences between the two time steps (see the stacking sketch after this list).
  4. Phase 4: Classification and Interpretation

    • A mask is great, but industrial users need labels.
    • We built a small, sequential Convolutional Neural Network (CNN) that takes the cropped area of the detected change (the "blob" from the U-Net) and classifies it into one of our defined categories ("Scratch," "Crack," "Fouling," etc.); a sketch of such a classifier appears after this list.
  5. Phase 5: Real-Time Visualization

    • The final Python script orchestrates the pipeline: Camera Frame → Normalization → Registration → U-Net Prediction → Contour Detection (OpenCV) → CNN Classification → Color-Coded Bounding Box output to the live feed.
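
The sketches below are minimal illustrations under stated assumptions, not our exact production code. First, the Phase 2 registration step, assuming ORB features and OpenCV's RANSAC homography estimator (function names, feature counts, and thresholds are illustrative):

```python
import cv2
import numpy as np

def register_images(img_t1, img_t2, max_features=2000, ratio=0.75):
    """Warp img_t2 into img_t1's frame using ORB features + a RANSAC homography."""
    orb = cv2.ORB_create(nfeatures=max_features)
    gray1 = cv2.cvtColor(img_t1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(img_t2, cv2.COLOR_BGR2GRAY)
    kp1, des1 = orb.detectAndCompute(gray1, None)
    kp2, des2 = orb.detectAndCompute(gray2, None)

    # Match binary descriptors and keep only confident matches (ratio test).
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    if len(good) < 4:
        raise ValueError("Not enough matches to estimate a homography")

    # Points in Image_Time2 (source) and Image_Time1 (destination).
    src = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects outlier matches before estimating the 3x3 homography.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    h, w = img_t1.shape[:2]
    return cv2.warpPerspective(img_t2, H, (w, h)), H
```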
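
Next, the 6-channel input stacking from Phase 3. The U-Net itself (built with in_channels=6 and a single output channel) is assumed to come from elsewhere and is not shown:

```python
import torch

def make_six_channel_input(img_t1, aligned_t2):
    """Stack two aligned RGB frames (H, W, 3 uint8 arrays) into one (1, 6, H, W) tensor."""
    t1 = torch.from_numpy(img_t1).permute(2, 0, 1).float() / 255.0      # (3, H, W)
    t2 = torch.from_numpy(aligned_t2).permute(2, 0, 1).float() / 255.0  # (3, H, W)
    x = torch.cat([t1, t2], dim=0)  # (6, H, W): both time steps side by side
    return x.unsqueeze(0)           # add a batch dimension

# Hypothetical usage with any U-Net accepting 6 input channels:
#   logits = unet(make_six_channel_input(img_t1, aligned_t2))   # (1, 1, H, W)
#   change_mask = (torch.sigmoid(logits) > 0.5).squeeze().cpu().numpy()
```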
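
A small sequential CNN of the kind described in Phase 4; the layer sizes and class names below are illustrative assumptions:

```python
import torch.nn as nn

# Hypothetical label set; the real categories are the ones we defined during labeling.
CLASSES = ["Scratch", "Crack", "Fouling", "Color Shift"]

classifier = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # tolerates variable crop sizes
    nn.Flatten(),
    nn.Linear(64, len(CLASSES)),
)
# Input: a normalized (N, 3, H, W) crop of the changed region (the "blob");
# output: one logit per degradation class.
```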
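
And the Phase 5 hand-off from the predicted mask to the color-coded overlay, sketched with OpenCV contours (helper names and the area threshold are illustrative):

```python
import cv2
import numpy as np

def boxes_from_mask(change_mask, min_area=50):
    """Turn a binary change mask (H, W array of 0/1) into bounding boxes."""
    mask_u8 = change_mask.astype(np.uint8) * 255
    # OpenCV 4.x: findContours returns (contours, hierarchy).
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep only blobs large enough to matter; tiny specks are treated as noise.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

def draw_results(frame, boxes, labels, color=(0, 0, 255)):
    """Overlay bounding boxes and class labels on the live frame."""
    for (x, y, w, h), label in zip(boxes, labels):
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        cv2.putText(frame, label, (x, max(y - 5, 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
    return frame
```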

Challenges We Ran Into

  • The Problem of Illumination Noise: When comparing images taken at different times of day, shadows and light reflections were instantly flagged as "change." Our initial attempt at simple grayscale normalization failed, so we added a gentle per-channel histogram normalization step before feature matching to suppress this noise without erasing genuine color-shift information (a sketch of this idea appears after this list).
  • Imbalanced Data and Dice Loss: In any real-world scenario, 99.9% of an object is not a defect. A standard Binary Cross-Entropy loss would have produced a model that simply predicted "no defect" everywhere and still scored 99.9% accuracy. The fix was Dice Loss, which prioritizes accurate prediction of the rare defect pixels and forces the U-Net to stay sensitive to small, meaningful changes (a minimal Dice loss sketch also appears after this list).
  • The Jitter Effect: Even with the best registration, minor sub-pixel misalignment (the "jitter") persisted. This highlighted the strength of the U-Net: it learned to treat this low-magnitude, full-image jitter as noise, a clear advantage over traditional subtraction methods.
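
One way the per-channel normalization idea can be realized is with CLAHE applied to each channel before feature matching; this is a minimal sketch of the concept rather than exact parameters:

```python
import cv2

def normalize_channels(img_bgr, clip_limit=2.0, tile=(8, 8)):
    """Gently equalize contrast per channel to tame illumination differences
    while leaving large-scale color information mostly intact."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    b, g, r = cv2.split(img_bgr)
    return cv2.merge([clahe.apply(b), clahe.apply(g), clahe.apply(r)])
```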
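
And a minimal soft Dice loss for the binary change mask (the smoothing term is illustrative):

```python
import torch

def dice_loss(logits, targets, smooth=1.0):
    """Soft Dice loss: logits are the raw U-Net output (N, 1, H, W),
    targets are the ground-truth masks in {0, 1} with the same shape."""
    probs = torch.sigmoid(logits).view(logits.size(0), -1)
    targets = targets.view(targets.size(0), -1)
    intersection = (probs * targets).sum(dim=1)
    union = probs.sum(dim=1) + targets.sum(dim=1)
    dice = (2.0 * intersection + smooth) / (union + smooth)
    return 1.0 - dice.mean()  # small only when the rare defect pixels are predicted well
```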

Accomplishments That We're Proud Of

  • Building a truly Universal Registration module that could successfully align images despite significant field-simulated perspective changes, which is mandatory for any time-series application.
  • Successfully leveraging the U-Net architecture in an unconventional way (6-channel difference input) to perform anomaly segmentation, which provided much cleaner results than traditional methods.
  • Creating a system capable of both localization (where the change is) and classification (what the change is), delivering immediate, actionable intelligence to the user.

What We Learned

The most significant lesson was that Computer Vision is an engineering pipeline, not just a single Deep Learning model. The model's success depended entirely on the quality of the two upstream steps: Data Labeling (which consumed roughly 60% of our time) and the Alignment Module. No amount of complex training could fix poor registration. We learned to respect classic CV techniques (like Homography) as the non-negotiable foundation for real-world reliability.


What's Next for The Janus Array?

The next phase for The Janus Array is moving from 2D images to 3D models. We plan to integrate with a ROS 2 (Robot Operating System 2) framework and use an RGB-D camera to:

  1. 3D Registration: Align two point-clouds of an object over time, making our system completely invariant to viewpoint and scale.
  2. Volumetric Change Detection: Detect changes not just on the surface (scratches), but changes in shape (dents, warpage) by comparing the volume of the object at $Time_1$ versus $Time_2$.
  3. Autonomous Field Deployment: Use ROS 2 to orchestrate a mobile robot or drone that autonomously follows a pre-defined path to capture $Image_{Time2}$ from the same vantage points every time, ensuring highly consistent inspection data.

Built With

Python, OpenCV, PyTorch, MATLAB