Stream Blog

Open Vision Agents by Stream: Open Source SDK for Building Low-Latency Vision AI Apps

Vision Agents is a new, open-source framework from Stream that helps developers quickly build low-latency vision AI applications. The project is completely open-source and ships with over ten out-of-the-box integrations, including day-one support for leading real-time voice and video models like OpenAI Realtime and Gemini Live. Text-to-speech, speech-to-text, and speech-to-speech models are also natively
Read more ->
4 min read

The 8 Best Platforms To Build Voice AI Agents

Voice assistants like Siri and Alexa are great for non-trivial everyday personal assistive tasks. However, they are limited when it comes to answering complex questions accurately, providing real-time information, and handling turn-taking and user interruptions. Try asking Siri about the best things
Read more ->
13 min read

The 6 Best LLM Tools To Run Models Locally

Running large language models (LLMs) like DeepSeek Chat, ChatGPT, and Claude usually involves sending data to servers managed by DeepSeek, OpenAI, and other AI model providers. While these services are secure, some businesses prefer to keep their data offline for greater privacy.
Read more ->
12 min read

Using Stream to Build a Livestream Chat App in Next.js

I always wondered how to create the dynamic chat experience of livestreams, like those found on YouTube, but with the added convenience of allowing anyone to participate without logging in. With Next.js and Stream, I was able to create that experience.
Read more ->
8 min read

Guide to AI Agent Protocols: MCP, A2A, ACP & More

As agents become more commonplace, new protocols are being invented to simplify and standardize complex workflows. With so many emerging standards, it’s increasingly important to know which to use and how they fit together. This guide breaks down the most widely recognized AI agent protocols and explains how they interoperate, overlap, and complement each other.

Read more ->
10 min read

Seeing with GPT‑4o: Building with OpenAI’s Vision Capabilities

Over the last few years, developers have gone from using language models for text-only chat to relying on them as general-purpose perception systems. You’re not only building chatbots; you’re building apps that use text, audio, and vision to understand and act on the world around them. GPT-4o is the most capable step yet: a single

Read more ->
14 min read

Lessons from Redesigning a Multi-Product Developer Dashboard

B2B dashboards tend to evolve quietly. New features get added. New data appears. Navigation grows more complex over time. Eventually, what started as a focused interface becomes a dense surface area that’s difficult to extend, harder to learn, and increasingly fragile to change. At Stream, we recently rebuilt our dashboard from the ground up to

Read more ->
3 min read

Clone MedTalk: HIPAA-Ready Video and Chat Consultations in Flutter

Telehealth is transforming the way patients and providers connect, offering faster access to care and reducing barriers caused by distance or scheduling. A critical part of this experience is enabling secure, real-time video consultations alongside features like chat messaging for sharing updates, questions, and follow-ups. With Stream’s healthcare chat solution, developers can build HIPAA-ready communication

Read more ->
24 min read

Build vs. Buy In-App Chat: The Ultimate Decision Guide

Adding in-app chat is one of the most common (and underestimated) product decisions teams face. Today, AI has made it easier than ever to prototype messaging features quickly. A small team can scaffold a working chat experience in days, not months. But shipping a demo is not the same as running a production chat system.

Read more ->
19 min read

From Cameras to Action: Real‑World Applications of Vision and Speech AI

You’re working in a warehouse when you see an automated forklift barreling towards a coworker. You whip out your phone and type "STOP!" into the app controlling the vehicle. You add another exclamation point to make sure it knows it’s an emergency. That’s not good enough, and it’s not how things have to be. AI

Read more ->
9 min read

Lessons from Building an AI Football Commentator

Vision Agents is our open-source framework for quickly building low-latency video AI applications on the edge. It runs on Stream’s global edge network by default, supports any edge provider, and integrates with 25+ leading voice and video AI models. To put the framework to the test, we built a real-time sports commentator using stock

Read more ->
10 min read

Content Moderation Ethics: Navigating Bias, Censorship & Fairness

Online communities rely on moderation to function, yet every moderation decision carries significant ethical implications. The platforms that choose to remove, allow, promote, or downrank content directly shape culture, discourse, and safety. As AI systems take on more moderation responsibilities and regulatory scrutiny increases, the stakes continue to rise. Ethical moderation is no longer only

Read more ->
8 min read