our workers ai team sprinted through saturday to get llama 4 up,
learned a lot over the last 24h (and still learning) - want to see how we think about llama 4 from a provider’s perspective? 🧵
michelle
2,185 posts
terminal romantic @cloudflare
New York
Joined January 2019
- i gotta work harder so i can be on a tbpn trading card
- probably my fave trip to sf~ thankful for the friends old and new <3
- we’re just getting started
  Quoting: Replicate + Cloudflare. @replicate’s world-class inference platform joins the best network in the world: Cloudflare. Read more on the blog: blog.cloudflare.com/replicate-join…
- turns out that if you up the bpm the workers go faster
  Quoting: cloudflare workers are really fast as of late god damn
- Replying to @michellechen: did i get anything wrong? is this interesting to you? (we’re hiring!) last plug, try out Llama 4 on Workers AI today
- vibe coded a little experiment - was curious how effective it is to route requests to different models based on the complexity of the prompt. compass.mchen.workers.dev 🧵 here's what i learned:
- excited to be a launch partner with @AIatMeta - Llama 3.1 8B is live on workers ai blog.cloudflare.com/meta-llama-3-1… fastest way to try it out - playground.ai.cloudflare.com
- Replying to @michellechen: i pulled an all nighter and now i’m at a layover for my intercontinental flight and bored. disclaimer: i’m just a PM that learns from the people around her (including you - tell me where i’m wrong). so two things i want to touch on: 1) 17b MoE is not small 2) is a 10M context
- Replying to @michellechen: small tangent - people always ask about gemini context window. yeah it’s big, it probably uses some sliding window-like architecture too (don’t quote me). most notably though, google has its own proprietary accelerators called TPUs. much more GPU memory, so they can fit larger