Atila (@atiorh) / X

843 posts

Atila

@atiorh

on-device AI at @argmax

San Francisco, CA

Joined July 2016

Pinned
Atila
@atiorh
Jul 24, 2025
Argmax Pro SDK is now generally available!!
argmax
@argmax
Jul 24, 2025
Introducing Real-time Transcription with Nvidia Parakeet
- Same top accuracy as file transcription
- Best-in-market 160 ms lips-to-screen latency
- 744x more cost-efficient compared to cloud APIs
- Available in Argmax Pro SDK starting today!
Link in comments
9.6K
Atila
@atiorh
Jun 14, 2023
Exciting updates to #stablediffusion with Core ML!
- 6-bit weight compression that yields just under 1 GB
- Up to 30% improved Neural Engine performance
- New benchmarks on iPhone, iPad and Macs
- Multilingual system text encoder support
- ControlNet
github.com/apple/ml-stabl… 🧵
GitHub - apple/ml-stable-diffusion: Stable Diffusion with Core ML on Apple Silicon
From github.com
377K
Atila
@atiorh
Sep 28, 2023
Stable Diffusion XL on iPhone with Core ML!
- 4-bit weight compression
- Works on iOS 17 & iPhone 13 Pro or newer
- Other features and improvements to the repo 🧵
GitHub - apple/ml-stable-diffusion: Stable Diffusion with Core ML on Apple Silicon
From github.com
232K
Atila
@atiorh
Jun 6, 2022
As part of #WWDC22, we are open-sourcing a reference implementation of the Transformer architecture optimized for the Apple Neural Engine (ANE)! github.com/apple/ml-ane-t… (1/n) 🧵
GitHub - apple/ml-ane-transformers: Reference implementation of the Transformer architecture...
From github.com
Atila
@atiorh
Dec 1, 2022
Delighted to share #stablediffusion with Core ML on Apple Silicon built on top of @huggingface diffusers! 🧵
Atila
@atiorh
May 4, 2024
Thanks for updating the license to MIT @Apple! Let's build 🫡
ml-stable-diffusion/LICENSE.md at main · apple/ml-stable-diffusion
From github.com
49K
Atila
@atiorh
Dec 21, 2023
My takeaways from Apple's “LLM in a flash” (1/n)
AK
@_akhaliq
Dec 20, 2023
Apple announces LLM in a flash: Efficient Large Language Model Inference with Limited Memory paper page: huggingface.co/papers/2312.11… Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their
164K
Atila
@atiorh
Jul 27, 2023
Stable Diffusion XL with Core ML on Apple Silicon! #SDXL The model grew 3x in size to ~2.6 billion parameters so we are releasing a new model compression technique that yields variants quantized to as little as 3 bits with minimal output difference 🧵
69K
Atila
@atiorh
Sep 12, 2023
35 TFlops of ML compute in your pocket! (#iPhone15Pro) On-device inference is getting interesting.. #AppleEvent
117K
Atila
@atiorh
Jul 29, 2024
Apple Intelligence hits the market in beta today: A pretty impressive 2.6b on-device LLM running on the Neural Engine compressed down to ~1GB. It consumes way below 10W. Congrats to my former teammates & colleagues on landing this! Tech report is also out:
Max Weinbach
@mweinbach
Jul 29, 2024
Depending on the task you give Apple Intelligence, it can peak up to ~5.5W on the ANE. Mail summarization is less than 1-2W, but rewriting here hits up to around 5.9W. This is admittedly very efficient. Also, it did a better job at re-writing this document than Gemini did lol
25K
Atila
@atiorh
Dec 1, 2022
Replying to @atiorh
Please refer to our code repository for details:
GitHub - apple/ml-stable-diffusion: Stable Diffusion with Core ML on Apple Silicon
From github.com
Atila
@atiorh
Jun 18, 2024
Thanks @Apple
argmax
@argmax
Jun 18, 2024
WhisperKit is 40% faster on iOS 18
Improved from 165 to 237 tok/s on whisper-base
Repo: github.com/argmaxinc/Whis…
Test App: testflight.apple.com/join/LPVOyJZW
22K
Atila
@atiorh
Nov 9, 2023
Persimmon-8b LLM (@AdeptAILabs) has ~95% activation sparsity in many of its layers which is crazy! Here is a gist that prints some stats. Most zeros are shared across tokens too:
Activation Sparsity in LLMs
From gist.github.com
40K
Atila
@atiorh
Dec 1, 2022
Replying to @atiorh
Today's release of macOS Ventura 13.1 Beta 4 and iOS and iPadOS 16.2 Beta 4 includes optimizations that let Stable Diffusion run with improved efficiency on the Apple Neural Engine as well as on Apple Silicon GPU