Omerta AI: The Silent Counsel for iPhone
Inspiration
The inspiration for Omerta AI stemmed from a critical, modern conflict in legal practice. Analysis of the American Bar Association's 2023 Cybersecurity TechReport revealed a stark paradox: while 73% of lawyers plan to adopt generative AI for efficiency, 29% of law firms have experienced a security incident, with cloud-based tools and mobile devices posing significant risks to attorney-client privilege. The project was conceived to resolve this by leveraging the powerful, private hardware already in professionals' pockets. The goal is to design an AI system where confidentiality is not a feature but the foundational architecture, making the choice between efficiency and ethics obsolete.
What It Does
Omerta AI is a conceptual native iOS application designed to function as a fully autonomous, on-device legal assistant. It enables legal professionals to perform sensitive tasks on an iPhone without any data transmission. The application would allow for the private recording and instant transcription of client meetings, using on-device models to generate summaries and extract action items. It could conduct legal research by querying a pre-loaded, local database of case law via Retrieval-Augmented Generation (RAG), even in offline environments. Confidential documents could be analyzed locally for clauses and risks, with all outputs—draft letters, case notes, and strategy memos—being structured and stored solely within the iPhone's encrypted storage, upholding the sanctity of attorney-client privilege.
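To make the offline research loop concrete, here is a toy sketch in Swift of the retrieval step behind such a RAG query. It assumes the pre-loaded case-law passages ship with precomputed embedding vectors; the CasePassage type and retrieve function are illustrative, not part of any existing SDK.

```swift
import Foundation

// A passage of pre-loaded case law with a precomputed embedding vector.
struct CasePassage {
    let citation: String
    let text: String
    let embedding: [Float]
}

// Cosine similarity between two equal-length vectors.
func cosine(_ a: [Float], _ b: [Float]) -> Float {
    let dot = zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
    let norm = (a.reduce(Float(0)) { $0 + $1 * $1 } * b.reduce(Float(0)) { $0 + $1 * $1 }).squareRoot()
    return norm > 0 ? dot / norm : 0
}

// Return the top-k passages most similar to the query embedding. The query
// embedding would come from the same on-device model used to index the corpus.
func retrieve(query: [Float], corpus: [CasePassage], topK: Int = 3) -> [CasePassage] {
    corpus
        .map { (passage: $0, score: cosine($0.embedding, query)) }
        .sorted { $0.score > $1.score }
        .prefix(topK)
        .map { $0.passage }
}
```

The returned passages would then be injected into the local LLM's prompt, keeping the entire research loop offline.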
How We Would Build It
The technical blueprint for Omerta AI is architected as a fortress, built upon the RunAnywhere iOS SDK to enforce privacy at the deepest level.
Privacy-First Foundation: The SDK would be initialized with SDKInitParams(mode: .localOnly), configuring the core ServiceContainer to disable all network-dependent services. A custom LocalOnlyRoutingService would be implemented to override any default cloud routing, ensuring a 100% on-device execution path.

On-Device AI Pipeline: User audio would be captured via AVAudioEngine and managed by a custom IOSAudioSession. The audio would flow through the SDK's VADComponent for voice detection and into the STTComponent, which would use a locally registered WhisperKitProvider running a quantized Whisper Tiny model. The transcribed text would be passed to the LLMComponent. The project would leverage the SDK's ModelRegistry and ModuleRegistry to prioritize and load a local Gemma-2B-IT-INT4 or Phi-2 model, executing inference via an optimized CoreMLProvider for Apple's Neural Engine (a wiring sketch follows this list).

Data Security & Feasibility: All application data, from case files to the AI models themselves, would reside within the iOS sandbox. The iOS Data Protection API (.completeUntilFirstUserAuthentication) would be used to tie file encryption to the device passcode. The key technical strategy for mobile feasibility is dynamic model selection based on a DeviceCapabilityCache, which checks ProcessInfo for available RAM: devices like an iPhone 15 Pro Max would run the more capable Gemma-2B model, while older hardware would automatically fall back to a smaller model such as TinyLlama, with the SDK's MemoryManager handling aggressive cache purging on memory-pressure warnings (see the feasibility sketch after this list).
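The pipeline wiring could look like the following sketch. The type names (RunAnywhereSDK, SDKInitParams, VADComponent, STTComponent, LLMComponent, and so on) are the ones used in the blueprint above, but the initializers and call signatures here are assumptions for illustration, not the SDK's documented API.

```swift
import AVFoundation

// Illustrative only: all signatures below are assumed, not documented SDK API.
func configureLocalOnlyPipeline() throws {
    // 1. Initialize in strict local-only mode so no network-backed
    //    services are ever registered.
    let sdk = try RunAnywhereSDK.initialize(SDKInitParams(mode: .localOnly))

    // 2. Register on-device providers so routing can never take a cloud path.
    sdk.moduleRegistry.register(WhisperKitProvider(model: "whisper-tiny-quantized"))
    sdk.moduleRegistry.register(CoreMLProvider()) // runs on the Apple Neural Engine

    // 3. Chain the components: microphone -> VAD -> STT -> LLM, all on device.
    let audio = IOSAudioSession(engine: AVAudioEngine())
    let pipeline = sdk.buildPipeline(
        audio: audio,
        components: [VADComponent(), STTComponent(), LLMComponent()]
    )
    pipeline.onOutput { caseNote in
        // Outputs stay inside the app sandbox; nothing is transmitted.
        print(caseNote)
    }
    try pipeline.start()
}
```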
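The feasibility mechanics, by contrast, lean on standard Apple APIs. Below is a minimal sketch of the RAM-gated model choice and the passcode-tied file encryption; the model tiers and the 6 GB threshold are illustrative assumptions.

```swift
import Foundation

// Illustrative model tiers; the 6 GB cutoff is an assumption, not a benchmark.
enum LocalModel: String {
    case gemma2B = "gemma-2b-it-int4"
    case tinyLlama = "tinyllama-1.1b"
}

// The RAM check a DeviceCapabilityCache would memoize: iPhone 15 Pro-class
// devices (8 GB) clear the bar, while older 4 GB hardware falls back.
func selectModel() -> LocalModel {
    let ramGB = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
    return ramGB >= 6 ? .gemma2B : .tinyLlama
}

// Persist a case note so it stays encrypted until the user first unlocks
// the device after boot, via the iOS Data Protection API.
func saveCaseNote(_ note: String, to url: URL) throws {
    try Data(note.utf8).write(
        to: url,
        options: [.completeFileProtectionUntilFirstUserAuthentication]
    )
}
```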
Challenges We Would Face
The primary challenge in realizing this concept is the extreme hardware constraint of running advanced AI models on a mobile device. Fitting a ~1.2GB model into the active memory of an iOS app, which must also accommodate the OS and frameworks, would risk constant EXC_RESOURCE crashes. The solution would require deep integration with the SDK's memory management and pre-emptive caching strategies. Achieving low latency for a good user experience is another major hurdle; overcoming initial inference times of 8-10 seconds would necessitate model optimization for CoreML, token streaming for instant feedback, and model pre-warming. Finally, designing a useful offline RAG system requires solving the data problem: curating, embedding, and storing a relevant subset of legal knowledge locally without consuming excessive storage space.
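As one concrete mitigation, the app could subscribe to the system's memory-pressure signal and evict caches before the OS terminates the process. This sketch uses Apple's real DispatchSource API; purgeModelCaches is a hypothetical stand-in for whatever eviction hook the SDK's MemoryManager exposes.

```swift
import Dispatch

enum CacheTier { case activeModelOnly, recentlyUsed }

// Hypothetical stand-in for the SDK MemoryManager's eviction hook.
func purgeModelCaches(keeping tier: CacheTier) {
    // Drop pre-warmed models, KV caches, and embeddings as needed.
}

// React to system memory pressure before hitting the EXC_RESOURCE limit.
let memoryPressure = DispatchSource.makeMemoryPressureSource(
    eventMask: [.warning, .critical],
    queue: .main
)
memoryPressure.setEventHandler {
    if memoryPressure.data.contains(.critical) {
        purgeModelCaches(keeping: .activeModelOnly) // keep only what inference needs now
    } else {
        purgeModelCaches(keeping: .recentlyUsed)    // trim pre-warmed spares on warning
    }
}
memoryPressure.activate()
```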
Accomplishments That We're Proud Of
This concept demonstrates a fully realized architectural blueprint for privacy-first AI. The design proves that a complex, multi-model pipeline (VAD → STT → LLM → RAG) can be conceptualized to run entirely within iOS's security sandbox under a "zero-byte egress" policy. The dynamic hardware adaptation logic is a sophisticated answer to mobile diversity, ensuring the idea is viable across a wide range of iPhones. Furthermore, the concept translates a powerful technical SDK into a focused, professional user experience that directly addresses a critical, data-backed need in a high-stakes industry.
What We Learned
The research and design process underscored that true privacy is an architectural outcome, not a setting. It cannot be added to a cloud-based system but must be the starting premise. The project revealed the intricate balance required for on-device AI: trading off model size and capability against latency and stability, all within a strict memory budget. It also highlighted the importance of the RunAnywhere SDK's design patterns—like its protocol-oriented services, plugin registry, and component lifecycle—as the essential framework that makes such a privacy-native application conceivable.
What's Next for Omerta AI
The immediate next step for this concept is the development of a minimum viable prototype using the RunAnywhere SDK's demo applications as a foundation, focusing on proving the core local transcription-and-summary pipeline. Following validation, the roadmap would expand to integrate a local vector database for the RAG system and develop the structured output templates for legal workflows. The ultimate vision is to extend the principle beyond iOS, utilizing the RunAnywhere SDK's Kotlin Multiplatform capabilities to bring the same standard of unbreakable, on-device confidentiality to Android and desktop platforms, establishing a new benchmark for trusted AI in professional services.