Project Story: Arm Photo Enhancer

Inspiration

The inspiration for this project came from observing a fundamental gap in mobile photography. While smartphone cameras have become incredibly sophisticated, the photos I capture are often degraded by poor lighting, motion blur, compression artifacts, and noise. Professional photo restoration software exists, but it requires expensive desktop applications, cloud processing, or specialized hardware.

I asked myself: could I bring research-grade image restoration algorithms directly to mobile devices? With the advancement of Arm processors and NNAPI acceleration, I believed sophisticated multi-model AI pipelines could run efficiently on smartphones, enabling professional-quality restoration without compromising user privacy or requiring internet connectivity.

What I Learned

Deep Learning for Image Restoration

I gained a deep understanding of multiple image restoration paradigms:

  1. Zero-Reference Enhancement: Zero-DCE uses a curve-adjustment approach in which the model learns to predict pixel-wise curve parameters $\alpha$ that transform input pixel intensities $I$ into the enhanced output $O$ (a minimal code sketch follows this list):

$$O(x,y) = I(x,y) + \alpha(x,y) \cdot I(x,y) \cdot (1 - I(x,y))$$

This eliminates the need for paired training data while achieving natural-looking enhancements.

  2. Degradation-Aware Processing: The DA-CLIP encoder generates context embeddings $c_{img}$ and $c_{deg}$ that guide restoration by encoding both image content and degradation characteristics into a joint latent space.

  3. Stochastic Differential Equations for Restoration: The UNet model implements reverse-time SDE integration for iterative restoration:

$$dx = [\theta(\mu - x) - \sigma^2 \nabla_x \log p(x|c)] dt + \sigma dW$$
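As a concrete illustration of the Zero-DCE curve above, here is a minimal Kotlin sketch of a single curve step applied to a normalized pixel buffer. `applyCurveStep` and its arguments are illustrative names; the real model predicts `alphaMap` and applies the curve iteratively.

```kotlin
// One Zero-DCE curve-adjustment step: O = I + alpha * I * (1 - I).
// `pixels` and `alphaMap` are per-channel values normalized to [0, 1];
// Zero-DCE predicts alphaMap and applies this step iteratively (8 times in the paper).
fun applyCurveStep(pixels: FloatArray, alphaMap: FloatArray): FloatArray {
    require(pixels.size == alphaMap.size) { "pixel buffer and curve map must align" }
    return FloatArray(pixels.size) { i ->
        val v = pixels[i]
        (v + alphaMap[i] * v * (1f - v)).coerceIn(0f, 1f)
    }
}
```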

Mobile AI Optimization

Memory Management: I learned that mobile devices require careful memory orchestration. Loading all three models simultaneously (Zero-DCE: 187MB, DA-CLIP: 698MB, UNet: 187MB) caused out-of-memory crashes. I implemented lazy loading where restoration models only load when needed and are explicitly freed after use.
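A sketch of that lazy load/release pattern, assuming ONNX Runtime's Java/Android API (`ai.onnxruntime`); the `LazyModel` wrapper is an illustrative name, not the app's actual class.

```kotlin
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession

// Each restoration model is created only when requested and closed immediately
// after use, so the three large models never coexist on the heap.
class LazyModel(private val modelPath: String) {
    private val env = OrtEnvironment.getEnvironment()
    private var session: OrtSession? = null

    fun acquire(): OrtSession =
        session ?: env.createSession(modelPath, OrtSession.SessionOptions()).also { session = it }

    fun release() {
        session?.close() // frees the native buffers backing the model weights
        session = null
    }
}
```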

NNAPI Integration: The Android Neural Networks API (NNAPI) provides hardware acceleration on Arm SoCs, but it requires careful tensor layout management and proper session configuration. I learned to balance performance gains with compatibility across different Arm architectures.
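A minimal sketch of that session configuration, assuming an ONNX Runtime build that ships the NNAPI execution provider; falling back to CPU when `addNnapi()` throws is one reasonable compatibility strategy.

```kotlin
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtException
import ai.onnxruntime.OrtSession

// Prefer the NNAPI execution provider; ONNX Runtime keeps CPU as the implicit
// fallback for any operators NNAPI cannot handle on a given device.
fun createAcceleratedSession(env: OrtEnvironment, modelPath: String): OrtSession {
    val options = OrtSession.SessionOptions()
    try {
        options.addNnapi() // requires an ORT package built with the NNAPI EP
    } catch (e: OrtException) {
        // NNAPI unavailable on this device/build: stay on the CPU provider
    }
    return env.createSession(modelPath, options)
}
```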

Image Processing Fundamentals: High-quality mobile image processing requires attention to detail. I implemented custom Canvas-based scaling with anti-aliasing to eliminate edge artifacts, proper EXIF orientation handling, and aspect-ratio-preserving layouts.

How I Built It

Implementation Phases

Phase 1: Model Integration. Started with ONNX Runtime integration, converting PyTorch models to ONNX format. The initial implementation used CPU-only execution for validation.

Phase 2: NNAPI Acceleration. Added the NNAPI backend with proper error handling. Discovered that NNAPI requires specific tensor layouts and careful memory alignment.

Phase 3: Image Processing Pipeline. Implemented the complete preprocessing chain: EXIF rotation correction, high-quality scaling with anti-aliasing, and proper color-space handling. The key insight was using Canvas-based rendering with the right Paint flags.

Phase 4: Background Processing. Moved restoration into a foreground service with proper lifecycle management, notification updates, and broadcast receivers for progress communication.
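A sketch of the progress plumbing such a service might use; the channel id, notification id, broadcast action, and extra keys are hypothetical, and it assumes the notification channel and (on Android 13+) the notification permission are already set up.

```kotlin
import android.app.Service
import android.content.Intent
import androidx.core.app.NotificationCompat
import androidx.core.app.NotificationManagerCompat

private const val RESTORE_NOTIFICATION_ID = 42 // hypothetical id

// Update the foreground notification so progress stays visible when the app is
// minimized, and broadcast the same step count for an in-app receiver.
fun Service.reportProgress(step: Int, totalSteps: Int) {
    val notification = NotificationCompat.Builder(this, "restoration_channel")
        .setSmallIcon(android.R.drawable.stat_sys_download)
        .setContentTitle("Restoring photo")
        .setProgress(totalSteps, step, /* indeterminate = */ false)
        .setOngoing(true)
        .build()
    NotificationManagerCompat.from(this).notify(RESTORE_NOTIFICATION_ID, notification)

    sendBroadcast(Intent("com.example.enhancer.RESTORATION_PROGRESS").apply {
        putExtra("step", step)
        putExtra("total", totalSteps)
    })
}
```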

Phase 5: UI/UX Polish. Built a custom ImageComparisonSlider with proper aspect-ratio handling, implemented Material Design 3 theming, and added tappable save notifications.

Challenges I Faced

Challenge 1: Model Conversion Issues

Problem: Converting the DA-CLIP model to ONNX format proved extremely challenging. Multiple conversion attempts resulted in models that produced low-quality embeddings or failed to export critical components.

Investigation: DA-CLIP uses a complex vision transformer architecture with custom attention mechanisms and degradation-aware components. During the PyTorch to ONNX conversion, several issues emerged:

  • Custom attention layers didn't trace correctly
  • Dynamic shapes in the encoder caused export failures
  • Quantization attempts degraded the embedding quality significantly

Root Cause: Vision transformers with custom operations often have components that don't map cleanly to ONNX operators. The degradation analysis pathway in DA-CLIP includes specialized layers not present in standard CLIP implementations.

Solution: After multiple iterations, I successfully converted the model by:

  • Simplifying certain custom operations while preserving functionality
  • Using explicit shape specifications during export
  • Validating embedding quality against the original PyTorch model
  • Testing with various degradation types to ensure the converted model maintained its degradation-aware capabilities

Challenge 2: Memory Management

Problem: App crashed with OutOfMemoryError during restoration, especially on devices with 4GB RAM.

Root Cause: Three models loaded simultaneously consumed over 1GB of heap space, plus intermediate tensors and bitmap buffers.

Solution: Implemented strategic memory management: models load lazily, run sequentially, and are freed explicitly after each stage, so the full set of models is never resident at once.

Challenge 3: Image Quality Issues

Problem: Users reported jagged edges after restoration, appearing as "zig-zag lines."

Root Cause: Default Bitmap.createScaledBitmap() uses basic bilinear interpolation, which creates aliasing artifacts on edges.

Solution: Replaced it with Canvas-based high-quality scaling, which provides proper anti-aliasing and eliminates the visible artifacts.
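A minimal sketch of this Canvas-based approach, using the standard Android `Paint` flags for filtered, anti-aliased, dithered drawing; the helper name is illustrative.

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Paint
import android.graphics.Rect

// High-quality replacement for Bitmap.createScaledBitmap(): draw through a
// Canvas with bitmap filtering, anti-aliasing, and dithering enabled.
fun scaleBitmapHq(src: Bitmap, dstWidth: Int, dstHeight: Int): Bitmap {
    val dst = Bitmap.createBitmap(dstWidth, dstHeight, Bitmap.Config.ARGB_8888)
    val paint = Paint(Paint.FILTER_BITMAP_FLAG or Paint.ANTI_ALIAS_FLAG or Paint.DITHER_FLAG)
    Canvas(dst).drawBitmap(src, null, Rect(0, 0, dstWidth, dstHeight), paint)
    return dst
}
```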

Challenge 4: UI Blocking

Problem: Long restoration operations blocked the UI, making the app appear frozen. The initial dialog-based progress indicator couldn't be dismissed.

Root Cause: Modal dialogs block user interaction even though processing runs in the background service.

Solution: Removed the blocking dialog and moved the progress display to a status-bar notification with live updates, allowing users to minimize the app during processing. Implemented proper per-step progress in the notification for background monitoring.

Challenge 5: Image Orientation

Problem: Photos from the gallery appeared rotated incorrectly.

Root Cause: Android cameras store orientation in EXIF metadata rather than physically rotating pixels.

Solution: Implemented complete EXIF orientation handling, covering all eight orientation values, including rotations and flips.
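A sketch of what complete handling can look like, assuming the `androidx.exifinterface` library; the helper name is illustrative, and the stream must point at the same image the bitmap was decoded from.

```kotlin
import android.graphics.Bitmap
import android.graphics.Matrix
import androidx.exifinterface.media.ExifInterface
import java.io.InputStream

// Map all eight EXIF orientation values onto a Matrix and bake the
// rotation/flip into the pixels before any further processing.
fun applyExifOrientation(bitmap: Bitmap, exifStream: InputStream): Bitmap {
    val orientation = ExifInterface(exifStream)
        .getAttributeInt(ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL)
    val matrix = Matrix()
    when (orientation) {
        ExifInterface.ORIENTATION_ROTATE_90 -> matrix.postRotate(90f)
        ExifInterface.ORIENTATION_ROTATE_180 -> matrix.postRotate(180f)
        ExifInterface.ORIENTATION_ROTATE_270 -> matrix.postRotate(270f)
        ExifInterface.ORIENTATION_FLIP_HORIZONTAL -> matrix.postScale(-1f, 1f)
        ExifInterface.ORIENTATION_FLIP_VERTICAL -> matrix.postScale(1f, -1f)
        ExifInterface.ORIENTATION_TRANSPOSE -> { matrix.postRotate(90f); matrix.postScale(-1f, 1f) }
        ExifInterface.ORIENTATION_TRANSVERSE -> { matrix.postRotate(270f); matrix.postScale(-1f, 1f) }
        else -> return bitmap // ORIENTATION_NORMAL or undefined: nothing to do
    }
    return Bitmap.createBitmap(bitmap, 0, 0, bitmap.width, bitmap.height, matrix, true)
}
```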

Technical Insights

Tensor Processing: Working with ONNX Runtime taught me the importance of proper tensor memory layout. The models expect NCHW (batch, channels, height, width) format, requiring careful preprocessing:

$$T_{NCHW}[n,c,h,w] = \text{normalize}(I_{HWC}[h,w,c])$$
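A sketch of that HWC-to-NCHW conversion for an Android `Bitmap`; plain division by 255 is an assumed normalization, since each model may expect its own mean/std.

```kotlin
import android.graphics.Bitmap
import java.nio.FloatBuffer

// Convert interleaved ARGB pixels (HWC) into three planar channels (NCHW, N = 1)
// normalized to [0, 1], ready to wrap in an ONNX Runtime tensor.
fun bitmapToNchw(bitmap: Bitmap): FloatBuffer {
    val w = bitmap.width
    val h = bitmap.height
    val pixels = IntArray(w * h)
    bitmap.getPixels(pixels, 0, w, 0, 0, w, h)

    val plane = h * w
    val buffer = FloatBuffer.allocate(3 * plane)
    for (i in 0 until plane) {
        val p = pixels[i]
        buffer.put(i,             ((p shr 16) and 0xFF) / 255f) // R plane
        buffer.put(plane + i,     ((p shr 8) and 0xFF) / 255f)  // G plane
        buffer.put(2 * plane + i, (p and 0xFF) / 255f)          // B plane
    }
    return buffer
}
```

The resulting buffer can then be wrapped with `OnnxTensor.createTensor(env, buffer, longArrayOf(1, 3, height, width))` before inference.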

Diffusion Sampling: Implementing 100-step reverse diffusion required understanding the balance between quality and performance. Each step computes:

$$x_{t-1} = x_t + \nabla_x \log p(x_t | c) \cdot \Delta t + \sigma \epsilon, \quad \epsilon \sim \mathcal{N}(0,I)$$
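A conceptual Kotlin sketch of that sampling loop, matching the update rule above; `scoreModel` stands in for the UNet's ONNX inference call and is hypothetical, as is the fixed `sigma` (a real implementation follows a noise schedule).

```kotlin
import java.util.Random

// Iterate the reverse update x_{t-1} = x_t + score(x_t, t) * dt + sigma * eps,
// where score approximates the conditional gradient of log p(x_t | c).
fun reverseDiffusion(
    xT: FloatArray,                              // noisy starting image, flattened
    steps: Int = 100,
    sigma: Float = 0.05f,                        // toy value; real code uses a schedule
    scoreModel: (FloatArray, Int) -> FloatArray  // hypothetical UNet inference call
): FloatArray {
    val x = xT.copyOf()
    val dt = 1f / steps
    val rng = Random()
    for (t in steps downTo 1) {
        val score = scoreModel(x, t)
        for (i in x.indices) {
            x[i] += score[i] * dt + sigma * rng.nextGaussian().toFloat()
        }
    }
    return x
}
```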

Conclusion

Building Arm Photo Enhancer demonstrated that sophisticated AI-powered image restoration can run efficiently on mobile devices when properly optimized. The key lessons were: careful memory management is critical for multi-model pipelines, NNAPI acceleration provides substantial benefits when properly configured, and a good user experience requires thoughtful background-processing design.

This project proves that the boundary between desktop and mobile AI capabilities continues to blur, bringing professional-grade computational photography tools to billions of Arm-powered smartphones worldwide.

Built With

  • arm
  • da-clip
  • nnapi
  • onnx
  • unet
  • zero-dce