🧠 About the Project — Cortex Pocket
Offline-first AI. Arm-optimized performance. Cross-platform intelligence.
🌟 Inspiration
Modern AI assistants rely almost entirely on cloud inference. This creates several limitations:
- They stop working offline
- Sensitive data is sent to external servers
- They depend on paid API keys
- They are inaccessible in low-connectivity regions
- They provide no control over performance or privacy
I wanted to challenge this model.
The inspiration behind Cortex Pocket was simple:
What if a full AI assistant could run entirely on your device, powered only by its Arm CPU?
From that question, Cortex Pocket was born — a mobile-first, privacy-first, offline-first AI assistant capable of running modern LLMs without touching the cloud. And for those who still want cloud access, Cortex Pocket offers a secure optional remote mode, fully encrypted and user-controlled.
🚀 Project Overview
Cortex Pocket is an Android-first AI application designed to run fully on-device Large Language Models (LLMs) using Arm-optimized inference via llama.cpp. While Android is the primary target platform, the project also supports:
- 📱 iOS
- 🌐 Web (remote API mode only)
- 🐧 Linux
- 🍎 macOS
- 🪟 Windows
This makes Cortex Pocket not just a mobile AI assistant, but a cross-platform developer tool and research sandbox.
🧩 How I Built It — Technical Architecture
Cortex Pocket is built using a layered, modular architecture that balances performance, privacy, and portability.
Tech Stack
- Flutter 3.10+ — Fast, beautiful cross-platform UI
- Dart 3.0+ FFI — Low-level bridging to native inference
- llama.cpp (GGUF) — High-performance local AI engine
- Flutter Secure Storage — Encrypted local persistence
- Custom Benchmark Engine — Real-time profiling
- JSON-driven Personas — Developer, Security, Writer, Analyst, Guide (a schema sketch follows this list)
- Optional Cloud Mode — Gemini/OpenAI API support
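To make the persona system concrete, here is a minimal sketch of how a JSON persona might be modeled in Dart. The field names (id, displayName, systemPrompt, temperature) are illustrative assumptions, not the app's actual schema:

```dart
import 'dart:convert';

/// Illustrative persona model; the field names are assumptions,
/// not the app's real schema.
class Persona {
  final String id;
  final String displayName;
  final String systemPrompt;
  final double temperature;

  const Persona({
    required this.id,
    required this.displayName,
    required this.systemPrompt,
    this.temperature = 0.7,
  });

  factory Persona.fromJson(Map<String, dynamic> json) => Persona(
        id: json['id'] as String,
        displayName: json['displayName'] as String,
        systemPrompt: json['systemPrompt'] as String,
        temperature: (json['temperature'] as num?)?.toDouble() ?? 0.7,
      );
}

void main() {
  const raw = '''
  {"id": "developer", "displayName": "Developer",
   "systemPrompt": "You are a concise senior software engineer.",
   "temperature": 0.4}
  ''';
  final persona = Persona.fromJson(jsonDecode(raw) as Map<String, dynamic>);
  print('${persona.displayName}: ${persona.systemPrompt}');
}
```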
🧬 Architecture Diagram (Textual)
Flutter UI (Material)
│
├── UI Layer (Screens, Widgets)
│
├── Services Layer
│ ├── LLMService (local/remote switching)
│ ├── BenchmarkService
│ ├── PersonaService
│ └── FileReasoningService
│
├── Data Layer
│ ├── Repositories (models, personas)
│ ├── Encrypted Storage (chat, configs)
│ └── Runtime Providers
│
├── FFI Layer
│ ├── llama_ffi.dart
│ ├── llama_bridge.dart
│ └── llama_types.dart
│
└── Native Inference (llama.cpp / GGUF)
This separation keeps the codebase maintainable and makes it easy to extend, for example by adding new models or personas.
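To give a flavor of the FFI layer, here is a minimal sketch of how llama_bridge.dart might load the native library and bind one symbol with dart:ffi. llama_backend_init is a real llama.cpp entry point, but exact signatures vary across llama.cpp versions, so treat this as a sketch rather than the app's actual bindings:

```dart
import 'dart:ffi';
import 'dart:io';

// llama_backend_init is a real llama.cpp entry point, but signatures vary
// across llama.cpp versions; treat these bindings as a sketch.
typedef _BackendInitC = Void Function();
typedef _BackendInitDart = void Function();

class LlamaBridge {
  final DynamicLibrary _lib;

  LlamaBridge._(this._lib);

  /// Loads libllama from the platform-appropriate location.
  factory LlamaBridge.open() {
    final lib = Platform.isAndroid
        ? DynamicLibrary.open('libllama.so')
        : DynamicLibrary.process(); // symbols linked into the host process
    return LlamaBridge._(lib);
  }

  /// Initializes the llama.cpp backend once per process.
  void backendInit() {
    final init = _lib
        .lookupFunction<_BackendInitC, _BackendInitDart>('llama_backend_init');
    init();
  }
}
```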
🏎️ Arm Optimization (Core Requirement)
Because this project is part of the Arm AI Developer Challenge, I focused heavily on Arm-specific performance optimization:
✔ Compiled llama.cpp with Arm NEON
Vectorized matrix compute for improved token throughput.
✔ Armv8-A acceleration
Built for Armv8-A so llama.cpp's math intrinsics compile to faster native instructions.
✔ FP16 KV cache
Halves KV-cache memory traffic relative to FP32, easing memory-bandwidth pressure and keeping token latency more consistent.
✔ Quantized GGUF Models
Supports Q8 → Q4 → Q3 levels for low-RAM devices.
✔ big.LITTLE CPU Scheduling
Tuned the inference thread count (sketched after this list) to balance:
- performance (big cores)
- thermal limits
- battery consumption
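A minimal sketch of that heuristic: target roughly the big-core cluster's worth of threads instead of every core, since LITTLE cores stall synchronized matrix work and extra threads mostly add heat. Platform.numberOfProcessors is standard Dart; the half-the-cores estimate and the clamp bounds are assumptions, not the app's tuned policy:

```dart
import 'dart:io';

/// Picks an inference thread count for big.LITTLE CPUs.
/// Assumed heuristic: target roughly the big-core cluster (about half the
/// logical cores on typical 4+4 phone SoCs), clamped to a safe range so the
/// UI, battery, and thermal headroom are not starved.
int pickThreadCount() {
  final cores = Platform.numberOfProcessors;
  final bigCoreEstimate = (cores / 2).floor();
  return bigCoreEstimate.clamp(2, 6).toInt();
}
```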
✔ On-device Benchmark Suite
Measures:
- tokens/s
- RAM usage
- model load time
- CPU core usage
- quantization efficiency
These optimizations allow Cortex Pocket to run efficiently even on mid-range Android devices, completely offline.
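At its core, the throughput measurement reduces to tokens emitted divided by wall-clock time. A minimal sketch, where generate is a hypothetical stand-in for the app's streaming inference call:

```dart
/// Measures decode throughput in tokens per second.
/// `generate` is a hypothetical stand-in for the app's streaming inference
/// call; it should return the number of tokens it emitted.
Future<double> measureTokensPerSecond(
  Future<int> Function(String prompt) generate,
) async {
  final stopwatch = Stopwatch()..start();
  final tokens = await generate('Explain Arm NEON in one paragraph.');
  stopwatch.stop();
  return tokens / (stopwatch.elapsedMilliseconds / 1000.0);
}
```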
🔍 What I Learned
1. Low-level LLM inference
Building the FFI bridge into Flutter gave me hands-on understanding of:
- KV cache
- attention computation
- tokenizer performance
- quantized kernels
- memory spikes during inference
2. Mobile hardware constraints
Phones introduce real challenges:
- limited RAM
- thermal throttling
- mixed-performance CPU clusters
- background app limits
- memory fragmentation
I learned how to mitigate these issues through smarter resource management.
3. Privacy-focused engineering
Designing a zero-cloud default mode required thinking deeply about the points below (a secure-storage sketch follows the list):
- encrypted local storage
- opt-in consent
- key handling
- fallback logic
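For example, the optional cloud API key never touches plain preferences; it goes through the platform keystore via flutter_secure_storage. A minimal sketch, with an illustrative key name:

```dart
import 'package:flutter_secure_storage/flutter_secure_storage.dart';

// Backed by the Android Keystore / iOS Keychain; nothing lands in plaintext.
const _storage = FlutterSecureStorage();

/// Saves the optional cloud API key. The key name is illustrative.
Future<void> saveApiKey(String apiKey) =>
    _storage.write(key: 'remote_api_key', value: apiKey);

/// Returns null when the user has never opted in to cloud mode.
Future<String?> readApiKey() => _storage.read(key: 'remote_api_key');
```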
4. AI UX design
Token streaming, file reasoning, personas, and model selection all required thoughtful user experience design.
5. Scalable cross-platform architecture
Flutter made it possible to extend Cortex Pocket beyond Android. Supporting a remote-only Web mode taught me how to degrade capabilities gracefully per platform.
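A minimal sketch of that per-platform gating, assuming a simple two-mode enum (the real service likely carries more state):

```dart
import 'package:flutter/foundation.dart' show kIsWeb;

enum InferenceMode { local, remote }

/// Web builds have no filesystem or FFI path to llama.cpp, so they are
/// pinned to remote mode; every other platform defaults to local inference.
InferenceMode defaultModeForPlatform() =>
    kIsWeb ? InferenceMode.remote : InferenceMode.local;
```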
⚠️ Challenges I Faced
1. Mobile FFI stability
Integrating llama.cpp into a Flutter app required the following (a token-streaming sketch appears after this list):
- careful memory management
- isolating native calls
- handling token streaming without blocking the UI
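The pattern that keeps the UI responsive is running inference in a separate isolate and streaming tokens back over a port. A minimal Dart 3 sketch; the worker body is a placeholder where the real llama.cpp FFI calls would go:

```dart
import 'dart:isolate';

/// Streams tokens from a worker isolate so native inference never blocks
/// the UI thread. A `null` message marks end-of-generation.
Stream<String> streamTokens(String prompt) async* {
  final port = ReceivePort();
  await Isolate.spawn(_inferenceWorker, (port.sendPort, prompt));
  await for (final message in port) {
    if (message == null) break;
    yield message as String;
  }
  port.close();
}

void _inferenceWorker((SendPort, String) args) {
  final (sendPort, _) = args; // the prompt would feed the tokenizer here
  // Placeholder loop: the real worker would drive the llama.cpp FFI
  // bindings and forward each decoded token as it arrives.
  for (final token in ['Hello', ', ', 'world', '.']) {
    sendPort.send(token);
  }
  sendPort.send(null);
}
```

The UI simply listens to streamTokens(...) and appends each token as it arrives, so generation never janks a frame.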
2. Managing large models on limited RAM
I had to design the following (a quantization-selection sketch follows the list):
- model unload/reload mechanisms
- low-memory safe quantization pipelines
- file streaming for large GGUF models
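The quantization pipeline reduces to a RAM-aware choice of GGUF file. A sketch using standard llama.cpp quantization names; the thresholds here are illustrative assumptions, not the app's tuned values:

```dart
/// Chooses a GGUF quantization level from a free-RAM estimate.
/// The thresholds are illustrative assumptions, not the app's tuned values;
/// the level names are standard llama.cpp quantization variants.
String pickQuantLevel(int freeRamBytes) {
  const gib = 1024 * 1024 * 1024;
  if (freeRamBytes > 6 * gib) return 'Q8_0';
  if (freeRamBytes > 4 * gib) return 'Q4_K_M';
  return 'Q3_K_M';
}
```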
3. Real-time performance
Keeping the UI responsive while tokens stream in real time was non-trivial.
4. Secure cloud fallback
Ensuring:
- encrypted API keys
- clear user consent
- zero accidental network calls
was challenging but rewarding.
5. Cross-platform consistency
The web platform cannot run local models, so I designed a remote-only mode with:
- graceful degradation
- UI state indicators
- cloud-only benchmarks
🌍 Impact & Future Vision
Cortex Pocket demonstrates that modern AI does not need server farms or proprietary APIs. A phone — even a mid-range one — is enough.
This project lays the foundation for:
- offline developer assistants
- field research tools
- privacy-safe enterprise assistants
- AI for low-connectivity regions
- educational AI labs
- portable coding companions
- cybersecurity analysis tools
By being Android-first and cross-platform, Cortex Pocket is designed to scale to many environments while keeping privacy at its core.
🏁 Final Thoughts
Building Cortex Pocket has been one of the most rewarding technical projects I've completed. It blends:
- low-level AI engineering
- Arm performance optimization
- Flutter cross-platform development
- privacy-first design
- multi-persona intelligence
- smart cloud fallback logic
It aligns directly with the mission of the Arm AI Developer Challenge:
Show that powerful AI can run completely on-device, without the cloud.
Cortex Pocket proves exactly that — AI can be fast, private, portable, and truly yours.