🧠 About the Project — Cortex Pocket

Offline-first AI. Arm-optimized performance. Cross-platform intelligence.

🌟 Inspiration

Modern AI assistants rely almost entirely on cloud inference. This creates several limitations:

  • They stop working offline
  • Sensitive data is sent to external servers
  • They depend on paid API keys
  • They are inaccessible in low-connectivity regions
  • They provide no control over performance or privacy

I wanted to challenge this model.

The inspiration behind Cortex Pocket was simple:

What if a full AI assistant could run entirely on your device, powered only by its Arm CPU?

From that question, Cortex Pocket was born — a mobile-first, privacy-first, offline-first AI assistant capable of running modern LLMs without touching the cloud. And for those who still want cloud access, Cortex Pocket offers a secure optional remote mode, fully encrypted and user-controlled.

🚀 Project Overview

Cortex Pocket is an Android-first AI application designed to run Large Language Models (LLMs) fully on-device, using Arm-optimized inference via llama.cpp. While Android is the primary target platform, the project also supports:

  • 📱 iOS
  • 🌐 Web (remote API mode only)
  • 🐧 Linux
  • 🍎 macOS
  • 🪟 Windows

This makes Cortex Pocket not just a mobile AI assistant, but a cross-platform developer tool and research sandbox.

🧩 How I Built It — Technical Architecture

Cortex Pocket is built using a layered, modular architecture that balances performance, privacy, and portability.

Tech Stack

  • Flutter 3.10+ — Fast, beautiful cross-platform UI
  • Dart 3.0+ FFI — Low-level bridging to native inference
  • llama.cpp (GGUF) — High-performance local AI engine
  • Flutter Secure Storage — Encrypted local persistence
  • Custom Benchmark Engine — Real-time profiling
  • JSON-driven Personas — Developer, Security, Writer, Analyst, Guide (loading sketched after this list)
  • Optional Cloud Mode — Gemini/OpenAI API support
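
Because personas are plain JSON, adding a new one requires no code changes. Here is a minimal sketch of how a persona definition might be loaded; the field names are illustrative, not the actual schema:

```dart
import 'dart:convert';

/// Illustrative persona schema; the real field names may differ.
class Persona {
  final String name;
  final String systemPrompt;
  final double temperature;

  Persona({required this.name, required this.systemPrompt, required this.temperature});

  factory Persona.fromJson(Map<String, dynamic> json) => Persona(
        name: json['name'] as String,
        systemPrompt: json['system_prompt'] as String,
        temperature: (json['temperature'] as num).toDouble(),
      );
}

void main() {
  const raw =
      '{"name": "Developer", "system_prompt": "You are a senior engineer.", "temperature": 0.2}';
  final persona = Persona.fromJson(jsonDecode(raw) as Map<String, dynamic>);
  print('${persona.name} @ temperature ${persona.temperature}');
}
```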

🧬 Architecture Diagram (Textual)

Flutter UI (Material)
│
├── UI Layer (Screens, Widgets)
│
├── Services Layer
│    ├── LLMService (local/remote switching)
│    ├── BenchmarkService
│    ├── PersonaService
│    └── FileReasoningService
│
├── Data Layer
│    ├── Repositories (models, personas)
│    ├── Encrypted Storage (chat, configs)
│    └── Runtime Providers
│
├── FFI Layer
│    ├── llama_ffi.dart
│    ├── llama_bridge.dart
│    └── llama_types.dart
│
└── Native Inference (llama.cpp / GGUF)

This separation keeps the codebase maintainable and easy to extend, for example when adding new models or personas.
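
To make the FFI layer concrete, here is a minimal sketch of the kind of binding that lives in llama_ffi.dart. The library name and the cortex_init symbol are illustrative stand-ins, not the actual exported API:

```dart
import 'dart:ffi';
import 'package:ffi/ffi.dart';

// Hypothetical native signature; the real bridge wraps llama.cpp's C API.
typedef _InitNative = Int32 Function(Pointer<Utf8> modelPath, Int32 nThreads);
typedef _InitDart = int Function(Pointer<Utf8> modelPath, int nThreads);

final DynamicLibrary _lib = DynamicLibrary.open('libllama_bridge.so');

// 'cortex_init' is an illustrative symbol name, not the real export.
final _InitDart _cortexInit =
    _lib.lookupFunction<_InitNative, _InitDart>('cortex_init');

int loadModel(String path, int threads) {
  final cPath = path.toNativeUtf8();
  try {
    return _cortexInit(cPath, threads);
  } finally {
    malloc.free(cPath); // never leak native strings across the bridge
  }
}
```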

🏎️ Arm Optimization (Core Requirement)

Because this project is part of the Arm AI Developer Challenge, I focused heavily on Arm-specific performance optimization:

✔ Compiled llama.cpp with Arm NEON

Vectorized matrix compute for improved token throughput.

✔ Armv8-A acceleration

Enabled math intrinsics for faster inference.

✔ FP16 KV cache

Storing the KV cache in FP16 roughly halves its memory footprint and bandwidth usage versus FP32, keeping token rates more consistent over long contexts.

✔ Quantized GGUF Models

Supports Q8 → Q4 → Q3 levels for low-RAM devices.
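
Model selection follows the device rather than a fixed default. A minimal sketch of that logic, with illustrative RAM thresholds rather than measured cutoffs:

```dart
/// Illustrative mapping from free RAM to a GGUF quantization level;
/// the thresholds are examples, not measured cutoffs.
String pickQuantLevel(int freeRamMb) {
  if (freeRamMb >= 6000) return 'Q8_0';   // plenty of headroom
  if (freeRamMb >= 3000) return 'Q4_K_M'; // mid-range default
  return 'Q3_K_S';                        // low-RAM fallback
}

void main() => print(pickQuantLevel(2500)); // Q3_K_S
```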

✔ big.LITTLE CPU Scheduling

Tuned the thread count (see the sketch after this list) to balance:

  • performance (big cores)
  • thermal limits
  • battery consumption
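
In practice this means not simply spawning one thread per core: saturating the LITTLE cores adds latency and heat for little throughput. A minimal sketch of the heuristic, assuming a typical 4+4 big.LITTLE layout (Dart cannot query cluster topology directly, so the numbers are illustrative):

```dart
import 'dart:io';

/// Illustrative heuristic: keep inference on the performance cluster
/// and leave headroom for the UI thread and thermal limits.
int pickThreadCount() {
  final cores = Platform.numberOfProcessors;
  // Typical big.LITTLE phones expose 8 logical cores (4 big + 4 LITTLE);
  // using about half keeps the work on the big cores.
  return (cores ~/ 2).clamp(1, 6).toInt();
}

void main() => print('Using ${pickThreadCount()} inference threads');
```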

✔ On-device Benchmark Suite

Measures (a token-rate sketch follows the list):

  • token/s
  • RAM usage
  • model load time
  • CPU core usage
  • quantization efficiency
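
A minimal sketch of the token-rate measurement, where generate stands in for the real streaming inference call in BenchmarkService:

```dart
/// Illustrative tokens-per-second measurement; `generate` stands in
/// for the real streaming inference call.
Future<double> measureTokensPerSecond(
    Stream<String> Function(String prompt) generate) async {
  final watch = Stopwatch()..start();
  var tokens = 0;
  await for (final _ in generate('Benchmark prompt')) {
    tokens++; // count each streamed token as it arrives
  }
  watch.stop();
  return tokens / (watch.elapsedMilliseconds / 1000.0);
}
```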

These optimizations allow Cortex Pocket to run efficiently even on mid-range Android devices, completely offline.

🔍 What I Learned

1. Low-level LLM inference

Building the FFI bridge into Flutter gave me hands-on understanding of:

  • KV cache
  • attention computation
  • tokenizer performance
  • quantized kernels
  • memory spikes during inference

2. Mobile hardware constraints

Phones introduce real challenges:

  • limited RAM
  • thermal throttling
  • mixed-performance CPU clusters
  • background app limits
  • memory fragmentation

I learned how to mitigate these issues through smarter resource management.

3. Privacy-focused engineering

Designing a zero-cloud default mode required thinking deeply about:

  • encrypted local storage
  • opt-in consent
  • key handling
  • fallback logic

4. AI UX design

Token streaming, file reasoning, personas, and model selection all required thoughtful user experience design.

5. Scalable cross-platform architecture

Flutter made it possible to extend Cortex Pocket beyond Android. Supporting Web-only remote mode taught me how to gracefully degrade capabilities per platform.

⚠️ Challenges I Faced

1. Mobile FFI stability

Integrating llama.cpp into a Flutter app required:

  • careful memory management
  • isolating native calls
  • handling token streaming without blocking the UI (sketched below)
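
A minimal sketch of the streaming approach: inference runs in a worker isolate and tokens flow back over a port, so the UI isolate never blocks. The hard-coded token list stands in for real native output:

```dart
import 'dart:isolate';

// Worker isolate: runs the (blocking) native inference loop and streams
// each token back over a SendPort.
void _inferenceWorker(SendPort out) {
  for (final token in ['Hello', ' from', ' the', ' worker']) {
    out.send(token);
  }
  out.send(null); // end-of-stream sentinel
}

Stream<String> streamTokens() async* {
  final port = ReceivePort();
  await Isolate.spawn(_inferenceWorker, port.sendPort);
  await for (final msg in port) {
    if (msg == null) break; // generation finished
    yield msg as String;    // UI rebuilds per token, never blocked
  }
  port.close();
}

Future<void> main() async {
  await for (final t in streamTokens()) {
    print(t);
  }
}
```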

2. Managing large models on limited RAM

I had to design (see the sketch after this list):

  • model unload/reload mechanisms
  • low-memory safe quantization pipelines
  • file streaming for large GGUF models
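
A minimal sketch of the unload/reload idea; loadModel and freeModel are hypothetical stand-ins for the real FFI calls:

```dart
/// Illustrative model lifecycle guard; loadModel/freeModel stand in
/// for the real FFI calls into llama.cpp.
class ModelManager {
  String? _loadedPath;

  Future<void> ensureLoaded(String path) async {
    if (_loadedPath == path) return; // already resident
    if (_loadedPath != null) {
      freeModel(); // release native memory before the next load
    }
    await loadModel(path);
    _loadedPath = path;
  }

  /// Called on low-memory signals from the OS.
  void onMemoryPressure() {
    if (_loadedPath != null) {
      freeModel();
      _loadedPath = null;
    }
  }
}

Future<void> loadModel(String path) async {/* FFI call (stand-in) */}
void freeModel() {/* FFI call (stand-in) */}
```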

3. Real-time performance

Keeping the UI responsive while tokens stream in real time was non-trivial.

4. Secure cloud fallback

Ensuring:

  • encrypted API keys
  • clear user consent
  • zero accidental network calls

was challenging but rewarding.
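
For the key-handling piece, a minimal sketch using flutter_secure_storage (the storage key name is illustrative):

```dart
import 'package:flutter_secure_storage/flutter_secure_storage.dart';

const _storage = FlutterSecureStorage();

/// Persist the user's API key in platform-encrypted storage
/// (Keystore on Android, Keychain on iOS).
Future<void> saveApiKey(String key) =>
    _storage.write(key: 'cloud_api_key', value: key);

/// Returns null when no key is stored, which keeps the app in
/// local-only mode by default.
Future<String?> readApiKey() => _storage.read(key: 'cloud_api_key');
```

In this sketch, a missing key simply means cloud mode stays off, so no network path exists until the user explicitly opts in.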

5. Cross-platform consistency

Web cannot run local models, so I designed a remote-only mode (sketched after this list) with:

  • graceful degradation
  • UI state indicators
  • cloud-only benchmarks
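
A minimal sketch of that per-platform gate, using Flutter's kIsWeb flag (the InferenceMode enum is illustrative):

```dart
import 'package:flutter/foundation.dart' show kIsWeb;

enum InferenceMode { local, remote }

/// Web cannot host the native llama.cpp bridge, so it is forced to
/// remote mode; other platforms default to fully local inference.
InferenceMode defaultMode() => kIsWeb ? InferenceMode.remote : InferenceMode.local;
```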

🌍 Impact & Future Vision

Cortex Pocket demonstrates that modern AI does not need server farms or proprietary APIs. A phone — even a mid-range one — is enough.

This project lays the foundation for:

  • offline developer assistants
  • field research tools
  • privacy-safe enterprise assistants
  • AI for low-connectivity regions
  • educational AI labs
  • portable coding companions
  • cybersecurity analysis tools

By being Android-first and cross-platform, Cortex Pocket is designed to scale to many environments while keeping privacy at its core.

🏁 Final Thoughts

Building Cortex Pocket has been one of the most rewarding technical projects I've completed. It blends:

  • low-level AI engineering
  • Arm performance optimization
  • Flutter cross-platform development
  • privacy-first design
  • multi-persona intelligence
  • smart cloud fallback logic

It aligns directly with the mission of the Arm AI Developer Challenge:

Show that powerful AI can run completely on-device, without the cloud.

Cortex Pocket proves exactly that — AI can be fast, private, portable, and truly yours.
