Omerta AI: The Silent Counsel for iPhone
Inspiration
The inspiration for Omerta AI stemmed from a critical, modern conflict in legal practice. Analysis of the American Bar Association's 2023 Cybersecurity TechReport revealed a stark paradox: while 73% of lawyers plan to adopt generative AI for efficiency, 29% of law firms have experienced a security incident, with cloud-based tools and mobile devices posing significant risks to attorney-client privilege. The project was conceived to resolve this by leveraging the powerful, private hardware already in professionals' pockets. The goal is to design an AI system where confidentiality is not a feature but the foundational architecture, making the choice between efficiency and ethics obsolete.
What It Does
Omerta AI is a conceptual native iOS application designed to function as a fully autonomous, on-device legal assistant. It enables legal professionals to perform sensitive tasks on an iPhone without any data transmission. The application would allow for the private recording and instant transcription of client meetings, using on-device models to generate summaries and extract action items. It could conduct legal research by querying a pre-loaded, local database of case law via Retrieval-Augmented Generation (RAG), even in offline environments. Confidential documents could be analyzed locally for clauses and risks, with all outputs—draft letters, case notes, and strategy memos—being structured and stored solely within the iPhone's encrypted storage, upholding the sanctity of attorney-client privilege.
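To make the offline research loop concrete, here is a toy sketch in Swift of the retrieval step behind such a RAG query. It assumes the pre-loaded case-law passages ship with precomputed embedding vectors; the CasePassage type and retrieve function are illustrative, not part of any existing SDK.

```swift
import Foundation

// A passage of pre-loaded case law with a precomputed embedding vector.
struct CasePassage {
    let citation: String
    let text: String
    let embedding: [Float]
}

// Cosine similarity between two equal-length vectors.
func cosine(_ a: [Float], _ b: [Float]) -> Float {
    let dot = zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
    let norm = (a.reduce(Float(0)) { $0 + $1 * $1 } * b.reduce(Float(0)) { $0 + $1 * $1 }).squareRoot()
    return norm > 0 ? dot / norm : 0
}

// Return the top-k passages most similar to the query embedding. The query
// embedding would come from the same on-device model used to index the corpus.
func retrieve(query: [Float], corpus: [CasePassage], topK: Int = 3) -> [CasePassage] {
    corpus
        .map { (passage: $0, score: cosine($0.embedding, query)) }
        .sorted { $0.score > $1.score }
        .prefix(topK)
        .map { $0.passage }
}
```

The returned passages would then be injected into the local LLM's prompt, keeping the entire research loop offline.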
How We Would Build It
The technical blueprint for Omerta AI is architected as a fortress, built upon the RunAnywhere iOS SDK to enforce privacy at the deepest level.
Privacy-First Foundation: The SDK would be initialized with SDKInitParams(mode: .localOnly), configuring the core ServiceContainer to disable all network-dependent services. A custom LocalOnlyRoutingService would be implemented to override any default cloud routing, ensuring a 100% on-device execution path.

On-Device AI Pipeline: User audio would be captured via AVAudioEngine and managed by a custom IOSAudioSession. The audio would flow through the SDK's VADComponent for voice detection and into the STTComponent, which would use a locally registered WhisperKitProvider running a quantized Whisper Tiny model. The transcribed text would be passed to the LLMComponent. The project would leverage the SDK's ModelRegistry and ModuleRegistry to prioritize and load a local Gemma-2B-IT-INT4 or Phi-2 model, executing inference via an optimized CoreMLProvider for Apple's Neural Engine (a wiring sketch follows this list).

Data Security & Feasibility: All application data, from case files to the AI models themselves, would reside within the iOS sandbox. The iOS Data Protection API (.completeUntilFirstUserAuthentication) would be used to tie file encryption to the device passcode. The key technical strategy for mobile feasibility is dynamic model selection based on a DeviceCapabilityCache, which checks ProcessInfo for available RAM: devices like an iPhone 15 Pro Max would run the more capable Gemma-2B model, while older hardware would automatically fall back to a smaller model such as TinyLlama, with the SDK's MemoryManager handling aggressive cache purging on memory-pressure warnings (see the feasibility sketch after this list).
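The pipeline wiring could look like the following sketch. The type names (RunAnywhereSDK, SDKInitParams, VADComponent, STTComponent, LLMComponent, and so on) are the ones used in the blueprint above, but the initializers and call signatures here are assumptions for illustration, not the SDK's documented API.

```swift
import AVFoundation

// Illustrative only: all signatures below are assumed, not documented SDK API.
func configureLocalOnlyPipeline() throws {
    // 1. Initialize in strict local-only mode so no network-backed
    //    services are ever registered.
    let sdk = try RunAnywhereSDK.initialize(SDKInitParams(mode: .localOnly))

    // 2. Register on-device providers so routing can never take a cloud path.
    sdk.moduleRegistry.register(WhisperKitProvider(model: "whisper-tiny-quantized"))
    sdk.moduleRegistry.register(CoreMLProvider()) // runs on the Apple Neural Engine

    // 3. Chain the components: microphone -> VAD -> STT -> LLM, all on device.
    let audio = IOSAudioSession(engine: AVAudioEngine())
    let pipeline = sdk.buildPipeline(
        audio: audio,
        components: [VADComponent(), STTComponent(), LLMComponent()]
    )
    pipeline.onOutput { caseNote in
        // Outputs stay inside the app sandbox; nothing is transmitted.
        print(caseNote)
    }
    try pipeline.start()
}
```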
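The feasibility mechanics, by contrast, lean on standard Apple APIs. Below is a minimal sketch of the RAM-gated model choice and the passcode-tied file encryption; the model tiers and the 6 GB threshold are illustrative assumptions.

```swift
import Foundation

// Illustrative model tiers; the 6 GB cutoff is an assumption, not a benchmark.
enum LocalModel: String {
    case gemma2B = "gemma-2b-it-int4"
    case tinyLlama = "tinyllama-1.1b"
}

// The RAM check a DeviceCapabilityCache would memoize: iPhone 15 Pro-class
// devices (8 GB) clear the bar, while older 4 GB hardware falls back.
func selectModel() -> LocalModel {
    let ramGB = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
    return ramGB >= 6 ? .gemma2B : .tinyLlama
}

// Persist a case note so it stays encrypted until the user first unlocks
// the device after boot, via the iOS Data Protection API.
func saveCaseNote(_ note: String, to url: URL) throws {
    try Data(note.utf8).write(
        to: url,
        options: [.completeFileProtectionUntilFirstUserAuthentication]
    )
}
```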
Challenges We Would Face
The primary challenge in realizing this concept is the extreme hardware constraint of running advanced AI models on a mobile device. Fitting a ~1.2GB model into the active memory of an iOS app, which must also accommodate the OS and frameworks, would risk constant EXC_RESOURCE crashes. The solution would require deep integration with the SDK's memory management and pre-emptive caching strategies. Achieving low latency for a good user experience is another major hurdle; overcoming initial inference times of 8-10 seconds would necessitate model optimization for CoreML, token streaming for instant feedback, and model pre-warming. Finally, designing a useful offline RAG system requires solving the data problem: curating, embedding, and storing a relevant subset of legal knowledge locally without consuming excessive storage space.
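As one concrete mitigation, the app could subscribe to the system's memory-pressure signal and evict caches before the OS terminates the process. This sketch uses Apple's real DispatchSource API; purgeModelCaches is a hypothetical stand-in for whatever eviction hook the SDK's MemoryManager exposes.

```swift
import Dispatch

enum CacheTier { case activeModelOnly, recentlyUsed }

// Hypothetical stand-in for the SDK MemoryManager's eviction hook.
func purgeModelCaches(keeping tier: CacheTier) {
    // Drop pre-warmed models, KV caches, and embeddings as needed.
}

// React to system memory pressure before hitting the EXC_RESOURCE limit.
let memoryPressure = DispatchSource.makeMemoryPressureSource(
    eventMask: [.warning, .critical],
    queue: .main
)
memoryPressure.setEventHandler {
    if memoryPressure.data.contains(.critical) {
        purgeModelCaches(keeping: .activeModelOnly) // keep only what inference needs now
    } else {
        purgeModelCaches(keeping: .recentlyUsed)    // trim pre-warmed spares on warning
    }
}
memoryPressure.activate()
```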
Accomplishments That We're Proud Of
This concept demonstrates a fully realized architectural blueprint for privacy-first AI. The design proves that a complex, multi-model pipeline (VAD → STT → LLM → RAG) can be conceptualized to run entirely within iOS's security sandbox under a "zero-byte egress" policy. The dynamic hardware adaptation logic is a sophisticated answer to mobile diversity, ensuring the idea is viable across a wide range of iPhones. Furthermore, the concept translates a powerful technical SDK into a focused, professional user experience that directly addresses a critical, data-backed need in a high-stakes industry.
What We Learned
The research and design process underscored that true privacy is an architectural outcome, not a setting. It cannot be added to a cloud-based system but must be the starting premise. The project revealed the intricate balance required for on-device AI: trading off model size and capability against latency and stability, all within a strict memory budget. It also highlighted the importance of the RunAnywhere SDK's design patterns—like its protocol-oriented services, plugin registry, and component lifecycle—as the essential framework that makes such a privacy-native application conceivable.
What's Next for Omerta AI
The immediate next step for this concept is the development of a minimum viable prototype using the RunAnywhere SDK's demo applications as a foundation, focusing on proving the core local transcription-and-summary pipeline. Following validation, the roadmap would expand to integrate a local vector database for the RAG system and develop the structured output templates for legal workflows. The ultimate vision is to extend the principle beyond iOS, utilizing the RunAnywhere SDK's Kotlin Multiplatform capabilities to bring the same standard of unbreakable, on-device confidentiality to Android and desktop platforms, establishing a new benchmark for trusted AI in professional services.