Inspiration

IronIQ started as a solo attempt to solve a problem I personally experienced and could not ignore: fitness apps don't coach. They record. At the exact moments when guidance matters most, mid-set, mid-rest, mid-hesitation, the software disappears. I saw a massive disconnect in the fitness industry: either you use a passive tracking app that forces you to stare at a screen and tap tiny buttons with sweaty hands, breaking your flow, or you pay a premium for a human trainer. I wanted to build the middle ground: an AI companion that is truly present. With the release of Gemini's Multimodal Live API, I realized the technology finally existed to create a coach that can see (via camera), hear (via low-latency voice), and adapt in real time, just like a human. IronIQ was born from the desire to make elite, interactive coaching accessible to everyone, regardless of budget or language.

Human trainers solve this, but they don't scale and remain out of reach for most people. At the same time, interacting with screens during workouts is fundamentally broken: touching a phone with sweaty hands and switching between timers, plans, and music adds friction at the worst possible moment.

What changed was feasibility. Real-time, low-latency multimodal AI finally made it possible to build a system that could listen, speak, and respond fast enough to act like a coach. IronIQ exists to test a single thesis: coaching can be built as software, and it can happen live.


What it does

IronIQ is a voice-first, multimodal AI gym coach that operates in a continuous real-time loop. The user places the phone down and trains. IronIQ manages pacing, rest periods, exercise flow, and adaptations without requiring screen interaction.

It listens through live audio, responds conversationally, and can use the camera to understand environment and equipment. It reacts to hesitation, adjusts workouts on the fly, and maintains full session context from start to finish.

This is not a tracking tool. It is an active coaching system designed to feel closer to a human trainer than to a traditional app.


How I built it

I designed IronIQ around one non-negotiable constraint: latency. If the system couldn’t respond fast enough to feel present, it would fail.

I implemented raw PCM audio streaming with voice activity detection to achieve sub-second responses. The AI runs in a persistent state loop, sharing memory across voice, UI, and vision so context is never lost mid-session.
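
The writeup doesn't include the actual streaming code, but the idea behind gating raw PCM with voice activity detection can be sketched with a simple energy-based VAD. The sample rate, frame size, and threshold below are illustrative assumptions, not IronIQ's real parameters:

```python
import struct

SAMPLE_RATE = 16000    # 16 kHz mono, a common rate for speech pipelines (assumption)
FRAME_MS = 20          # analysis window length in milliseconds
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000
ENERGY_THRESHOLD = 500.0  # tuned empirically; purely a placeholder value

def frame_energy(pcm_frame: bytes) -> float:
    """Mean absolute amplitude of one 16-bit little-endian PCM frame."""
    samples = struct.unpack(f"<{len(pcm_frame) // 2}h", pcm_frame)
    return sum(abs(s) for s in samples) / max(len(samples), 1)

def is_speech(pcm_frame: bytes, threshold: float = ENERGY_THRESHOLD) -> bool:
    """Crude voice-activity gate: only frames above the energy threshold
    would be forwarded to the model, keeping the stream lean and responsive."""
    return frame_energy(pcm_frame) > threshold
```

A real deployment would use a trained VAD rather than raw energy, but the principle is the same: decide per 20 ms frame whether audio is worth sending, so silence never adds latency.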

Video input is streamed at low frame rates to provide environmental awareness without heavy compute overhead. The UI is intentionally minimal and fully synchronized with the AI’s internal state, allowing manual overrides that never break the conversational flow.
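
The low-frame-rate streaming described above comes down to throttling: drop camera frames so the vision path runs at a fixed, low rate. This is a minimal sketch of that idea; the 1 FPS target is my assumption, not a documented IronIQ setting:

```python
import time

TARGET_FPS = 1.0  # one frame per second for scene awareness (assumed value)

class FrameThrottle:
    """Drop frames so downstream vision work runs at a low, fixed rate."""

    def __init__(self, fps: float = TARGET_FPS):
        self.min_interval = 1.0 / fps
        self.last_sent = float("-inf")

    def should_send(self, now=None) -> bool:
        """Return True only when enough time has passed since the last
        forwarded frame; all other frames are silently discarded."""
        now = time.monotonic() if now is None else now
        if now - self.last_sent >= self.min_interval:
            self.last_sent = now
            return True
        return False
```

The camera capture callback would call `should_send()` per frame and skip encoding entirely on a False, which is where the compute savings actually come from.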

Every component was built to serve a single goal: real-time coaching that feels continuous, not reactive.


Challenges I ran into

Latency was the hardest problem. Even small delays instantly break the illusion of intelligence and presence. I spent significant time tuning audio buffers, streaming logic, and state transitions to make responses feel immediate.

State synchronization was another major challenge. Voice commands, timers, camera input, and UI actions all needed to update the same internal model in real time. Any desynchronization would undermine trust.
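
One common way to keep voice, timers, camera, and UI from diverging is a single authoritative state object that every input path mutates through one atomic entry point. The sketch below illustrates that pattern under my own assumptions; the field names are not IronIQ's actual schema:

```python
import threading
from dataclasses import dataclass, field

@dataclass
class SessionState:
    """Single source of truth shared by voice, timer, camera, and UI paths.
    Field names here are illustrative placeholders."""
    exercise: str = "idle"
    rest_seconds: int = 0
    version: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def update(self, **changes) -> int:
        """Apply a change from any input source atomically and bump the
        version counter, so stale readers can detect they are out of date."""
        with self._lock:
            for key, value in changes.items():
                setattr(self, key, value)
            self.version += 1
            return self.version
```

Because every mutation goes through `update()`, a voice command and a manual UI override can never race each other into inconsistent views, which is exactly the trust problem the paragraph above describes.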

Finally, designing coaching behavior that felt motivating rather than robotic required iteration. Coaching is behavioral, not informational, and getting that right was more complex than building the technical stack.


Accomplishments that I'm proud of

I built a real-time AI coaching loop that users can complete full workouts with. I achieved conversational-speed responses fast enough to interrupt mistakes, not comment afterward. I created localized coaching personas that adapt behavior, not just language. I delivered a cohesive multimodal system built end-to-end by a single developer.

Most importantly, I proved that an AI coach can feel credible during real physical effort.


What I learned

Fitness is not a data problem; it’s a timing problem. Guidance delivered late is functionally useless.

Voice-first interaction is not optional in physical environments—it’s required. Multimodality only works when all inputs share a single authoritative state.

I also learned that perceived presence matters more than technical perfection. Users trust systems that feel attentive and responsive, even if they are not flawless.


What's next for IronIQ

The next phase is deeper embodiment. I plan to expand computer vision for real-time form awareness and correction, scale localized coaching personas, and prepare IronIQ to move beyond a standalone app into gyms and wellness platforms.

IronIQ’s direction is not more features. It is more authority. The goal is to turn AI coaching into real-time infrastructure—built by one person, designed to scale far beyond one.
