100% open source framework for realtime voice and multimodal AI. Maintained by the @trydaily engineering team with support from the Pipecat developer community.
Big day today. Pipecat version 1.0. Two years in the making. The most widely used framework for voice agents, but not just voice agents. Pipecat is a framework for realtime, multi-modal, multi-model AI applications. Contributions from NVIDIA, all the foundation labs, AWS, GCP,
The new sonic-2 voice models from @cartesia are available in Pipecat.
Latency is 40ms for the `sonic-turbo` version of the model and 90ms for the larger `sonic-2` model.
- Cartesia developer docs: docs.cartesia.ai/build-with-car…
- A complete voice agent example project:
Lots of new things in 0.0.40.
✨ Function calling and prompt caching for @AnthropicAI Claude 3.5 Sonnet
✨ Llama 3.1 function calling support in the @togethercompute service
✨ A complete implementation of the RTVI standard
✨ Studypal, a new application example from the team at
This Thursday, Sep 4th - Voice AI Meetup in London!
We love talking with you all at these meetups, and are excited to be back in the UK. Register for your spot in the thread.
🍻 Networking, food, drinks with fellow voice AI engineers and teams
🫡 Live demos, panel, and chats with
🚀 Today we’re launching Pipecat TV! A video podcast about Pipecat: new features, how-tos, advanced tricks, community interviews & more. 🎙️ Fun + useful for all voice AI builders. Watch the pilot episode now! 🐱📺
Pipecat 0.0.62 released today. Highlights include:
➡️ Support for @gladia_io's Solaria speech-to-text model, released today.
➡️ A new memory layer service courtesy of @mem0ai.
➡️ WhisperSTTServiceMLX for Whisper transcription on Apple Silicon
➡️ A new peer-to-peer WebRTC
.@gladia_io announced their new Solaria speech-to-text model today.
I hear so, so many rave reviews about Gladia from people building French-language voice AI agents.
Gladia's new model supports more than 100 languages. The language auto-detection is the best I've seen from any
Can you beat my 1-929-LLM-GAME high score?
We've been exploring what you can do with speech-to-speech models.
Here's a word-guessing game, built with the Gemini Multimodal Live API, Vercel, and Twilio, that has a bunch of interesting features ... 🧵
Today we’re announcing an open standard for Real-time Voice and Video Inference: RTVI-AI.
The RTVI abstractions and data structures define how client applications communicate with inference services.
These are the “real-time APIs” for use cases like:
- Voice chat with LLMs