🧠 About the Project — Cortex Pocket
Offline-first AI. Arm-optimized performance. Cross-platform intelligence.
🌟 Inspiration
Modern AI assistants rely almost entirely on cloud inference. This creates several limitations:
- They stop working offline
- Sensitive data is sent to external servers
- They depend on paid API keys
- They are inaccessible in low-connectivity regions
- They provide no control over performance or privacy
I wanted to challenge this model.
The inspiration behind Cortex Pocket was simple:
What if a full AI assistant could run entirely on your device, powered only by its Arm CPU?
From that question, Cortex Pocket was born — a mobile-first, privacy-first, offline-first AI assistant capable of running modern LLMs without touching the cloud. And for those who still want cloud access, Cortex Pocket offers a secure optional remote mode, fully encrypted and user-controlled.
🚀 Project Overview
Cortex Pocket is an Android-first AI application designed to run fully on-device Large Language Models (LLMs) using Arm-optimized inference via llama.cpp. While Android is the primary target platform, the project also supports:
- 📱 iOS
- 🌐 Web (remote API mode only)
- 🐧 Linux
- 🍎 macOS
- 🪟 Windows
This makes Cortex Pocket not just a mobile AI assistant, but a cross-platform developer tool and research sandbox.
🧩 How I Built It — Technical Architecture
Cortex Pocket is built using a layered, modular architecture that balances performance, privacy, and portability.
Tech Stack
- Flutter 3.10+ — Fast, beautiful cross-platform UI
- Dart 3.0+ FFI — Low-level bridging to native inference
- llama.cpp (GGUF) — High-performance local AI engine
- Flutter Secure Storage — Encrypted local persistence
- Custom Benchmark Engine — Real-time profiling
- JSON-driven Personas — Developer, Security, Writer, Analyst, Guide (a schema sketch follows this list)
- Optional Cloud Mode — Gemini/OpenAI API support
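To make the persona system concrete, here is a minimal sketch of how a JSON persona might be modeled in Dart. The field names (id, displayName, systemPrompt, temperature) are illustrative assumptions, not the app's actual schema:

```dart
import 'dart:convert';

/// Illustrative persona model; the field names are assumptions,
/// not the app's real schema.
class Persona {
  final String id;
  final String displayName;
  final String systemPrompt;
  final double temperature;

  const Persona({
    required this.id,
    required this.displayName,
    required this.systemPrompt,
    this.temperature = 0.7,
  });

  factory Persona.fromJson(Map<String, dynamic> json) => Persona(
        id: json['id'] as String,
        displayName: json['displayName'] as String,
        systemPrompt: json['systemPrompt'] as String,
        temperature: (json['temperature'] as num?)?.toDouble() ?? 0.7,
      );
}

void main() {
  const raw = '''
  {"id": "developer", "displayName": "Developer",
   "systemPrompt": "You are a concise senior software engineer.",
   "temperature": 0.4}
  ''';
  final persona = Persona.fromJson(jsonDecode(raw) as Map<String, dynamic>);
  print('${persona.displayName}: ${persona.systemPrompt}');
}
```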
🧬 Architecture Diagram (Textual)
Flutter UI (Material)
│
├── UI Layer (Screens, Widgets)
│
├── Services Layer
│ ├── LLMService (local/remote switching)
│ ├── BenchmarkService
│ ├── PersonaService
│ └── FileReasoningService
│
├── Data Layer
│ ├── Repositories (models, personas)
│ ├── Encrypted Storage (chat, configs)
│ └── Runtime Providers
│
├── FFI Layer
│ ├── llama_ffi.dart
│ ├── llama_bridge.dart
│ └── llama_types.dart
│
└── Native Inference (llama.cpp / GGUF)
This separation keeps the codebase maintainable and makes it easy to extend, for example by adding new models or personas.
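To give a flavor of the FFI layer, here is a minimal sketch of how llama_bridge.dart might load the native library and bind one symbol with dart:ffi. llama_backend_init is a real llama.cpp entry point, but exact signatures vary across llama.cpp versions, so treat this as a sketch rather than the app's actual bindings:

```dart
import 'dart:ffi';
import 'dart:io';

// llama_backend_init is a real llama.cpp entry point, but signatures vary
// across llama.cpp versions; treat these bindings as a sketch.
typedef _BackendInitC = Void Function();
typedef _BackendInitDart = void Function();

class LlamaBridge {
  final DynamicLibrary _lib;

  LlamaBridge._(this._lib);

  /// Loads libllama from the platform-appropriate location.
  factory LlamaBridge.open() {
    final lib = Platform.isAndroid
        ? DynamicLibrary.open('libllama.so')
        : DynamicLibrary.process(); // symbols linked into the host process
    return LlamaBridge._(lib);
  }

  /// Initializes the llama.cpp backend once per process.
  void backendInit() {
    final init = _lib
        .lookupFunction<_BackendInitC, _BackendInitDart>('llama_backend_init');
    init();
  }
}
```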
🏎️ Arm Optimization (Core Requirement)
Because this project is part of the Arm AI Developer Challenge, I focused heavily on Arm-specific performance optimization:
✔ Compiled llama.cpp with Arm NEON
Vectorized matrix compute for improved token throughput.
✔ Armv8-A acceleration
Built for Armv8-A so llama.cpp's math intrinsics compile to faster native instructions.
✔ FP16 KV cache
Halves KV-cache memory traffic relative to FP32, easing memory-bandwidth pressure and keeping token latency more consistent.
✔ Quantized GGUF Models
Supports Q8 → Q4 → Q3 levels for low-RAM devices.
✔ big.LITTLE CPU Scheduling
Tuned the inference thread count (sketched after this list) to balance:
- performance (big cores)
- thermal limits
- battery consumption
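A minimal sketch of that heuristic: target roughly the big-core cluster's worth of threads instead of every core, since LITTLE cores stall synchronized matrix work and extra threads mostly add heat. Platform.numberOfProcessors is standard Dart; the half-the-cores estimate and the clamp bounds are assumptions, not the app's tuned policy:

```dart
import 'dart:io';

/// Picks an inference thread count for big.LITTLE CPUs.
/// Assumed heuristic: target roughly the big-core cluster (about half the
/// logical cores on typical 4+4 phone SoCs), clamped to a safe range so the
/// UI, battery, and thermal headroom are not starved.
int pickThreadCount() {
  final cores = Platform.numberOfProcessors;
  final bigCoreEstimate = (cores / 2).floor();
  return bigCoreEstimate.clamp(2, 6).toInt();
}
```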
✔ On-device Benchmark Suite
Measures:
- tokens/s
- RAM usage
- model load time
- CPU core usage
- quantization efficiency
These optimizations allow Cortex Pocket to run efficiently even on mid-range Android devices, completely offline.
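At its core, the throughput measurement reduces to tokens emitted divided by wall-clock time. A minimal sketch, where generate is a hypothetical stand-in for the app's streaming inference call:

```dart
/// Measures decode throughput in tokens per second.
/// `generate` is a hypothetical stand-in for the app's streaming inference
/// call; it should return the number of tokens it emitted.
Future<double> measureTokensPerSecond(
  Future<int> Function(String prompt) generate,
) async {
  final stopwatch = Stopwatch()..start();
  final tokens = await generate('Explain Arm NEON in one paragraph.');
  stopwatch.stop();
  return tokens / (stopwatch.elapsedMilliseconds / 1000.0);
}
```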
🔍 What I Learned
1. Low-level LLM inference
Building the FFI bridge into Flutter gave me hands-on understanding of:
- KV cache
- attention computation
- tokenizer performance
- quantized kernels
- memory spikes during inference
2. Mobile hardware constraints
Phones introduce real challenges:
- limited RAM
- thermal throttling
- mixed-performance CPU clusters
- background app limits
- memory fragmentation
I learned how to mitigate these issues through smarter resource management.
3. Privacy-focused engineering
Designing a zero-cloud default mode required thinking deeply about the points below (a secure-storage sketch follows the list):
- encrypted local storage
- opt-in consent
- key handling
- fallback logic
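For example, the optional cloud API key never touches plain preferences; it goes through the platform keystore via flutter_secure_storage. A minimal sketch, with an illustrative key name:

```dart
import 'package:flutter_secure_storage/flutter_secure_storage.dart';

// Backed by the Android Keystore / iOS Keychain; nothing lands in plaintext.
const _storage = FlutterSecureStorage();

/// Saves the optional cloud API key. The key name is illustrative.
Future<void> saveApiKey(String apiKey) =>
    _storage.write(key: 'remote_api_key', value: apiKey);

/// Returns null when the user has never opted in to cloud mode.
Future<String?> readApiKey() => _storage.read(key: 'remote_api_key');
```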
4. AI UX design
Token streaming, file reasoning, personas, and model selection all required thoughtful user experience design.
5. Scalable cross-platform architecture
Flutter made it possible to extend Cortex Pocket beyond Android. Supporting a remote-only Web mode taught me how to degrade capabilities gracefully per platform.
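A minimal sketch of that per-platform gating, assuming a simple two-mode enum (the real service likely carries more state):

```dart
import 'package:flutter/foundation.dart' show kIsWeb;

enum InferenceMode { local, remote }

/// Web builds have no filesystem or FFI path to llama.cpp, so they are
/// pinned to remote mode; every other platform defaults to local inference.
InferenceMode defaultModeForPlatform() =>
    kIsWeb ? InferenceMode.remote : InferenceMode.local;
```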
⚠️ Challenges I Faced
1. Mobile FFI stability
Integrating llama.cpp into a Flutter app required the following (a token-streaming sketch appears after this list):
- careful memory management
- isolating native calls
- handling token streaming without blocking the UI
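The pattern that keeps the UI responsive is running inference in a separate isolate and streaming tokens back over a port. A minimal Dart 3 sketch; the worker body is a placeholder where the real llama.cpp FFI calls would go:

```dart
import 'dart:isolate';

/// Streams tokens from a worker isolate so native inference never blocks
/// the UI thread. A `null` message marks end-of-generation.
Stream<String> streamTokens(String prompt) async* {
  final port = ReceivePort();
  await Isolate.spawn(_inferenceWorker, (port.sendPort, prompt));
  await for (final message in port) {
    if (message == null) break;
    yield message as String;
  }
  port.close();
}

void _inferenceWorker((SendPort, String) args) {
  final (sendPort, _) = args; // the prompt would feed the tokenizer here
  // Placeholder loop: the real worker would drive the llama.cpp FFI
  // bindings and forward each decoded token as it arrives.
  for (final token in ['Hello', ', ', 'world', '.']) {
    sendPort.send(token);
  }
  sendPort.send(null);
}
```

The UI simply listens to streamTokens(...) and appends each token as it arrives, so generation never janks a frame.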
2. Managing large models on limited RAM
I had to design the following (a quantization-selection sketch follows the list):
- model unload/reload mechanisms
- low-memory safe quantization pipelines
- file streaming for large GGUF models
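The quantization pipeline reduces to a RAM-aware choice of GGUF file. A sketch using standard llama.cpp quantization names; the thresholds here are illustrative assumptions, not the app's tuned values:

```dart
/// Chooses a GGUF quantization level from a free-RAM estimate.
/// The thresholds are illustrative assumptions, not the app's tuned values;
/// the level names are standard llama.cpp quantization variants.
String pickQuantLevel(int freeRamBytes) {
  const gib = 1024 * 1024 * 1024;
  if (freeRamBytes > 6 * gib) return 'Q8_0';
  if (freeRamBytes > 4 * gib) return 'Q4_K_M';
  return 'Q3_K_M';
}
```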
3. Real-time performance
Keeping the UI responsive while tokens stream in real time was non-trivial.
4. Secure cloud fallback
Ensuring:
- encrypted API keys
- clear user consent
- zero accidental network calls
was challenging but rewarding.
5. Cross-platform consistency
The web platform cannot run local models, so I designed a remote-only mode with:
- graceful degradation
- UI state indicators
- cloud-only benchmarks
🌍 Impact & Future Vision
Cortex Pocket demonstrates that modern AI does not need server farms or proprietary APIs. A phone — even a mid-range one — is enough.
This project lays the foundation for:
- offline developer assistants
- field research tools
- privacy-safe enterprise assistants
- AI for low-connectivity regions
- educational AI labs
- portable coding companions
- cybersecurity analysis tools
By being Android-first and cross-platform, Cortex Pocket is designed to scale to many environments while keeping privacy at its core.
🏁 Final Thoughts
Building Cortex Pocket has been one of the most rewarding technical projects I've completed. It blends:
- low-level AI engineering
- Arm performance optimization
- Flutter cross-platform development
- privacy-first design
- multi-persona intelligence
- smart cloud fallback logic
It aligns directly with the mission of the Arm AI Developer Challenge:
Show that powerful AI can run completely on-device, without the cloud.
Cortex Pocket proves exactly that — AI can be fast, private, portable, and truly yours.