Inspiration

PaperHeart grew out of a natural extension of my startup, Pajama Cats Media, where we bring AI to illustrated children's books for interactive storytelling. After learning about OpenAI's Realtime API, I wanted to test the limits of natural voice interaction—especially using voice activity detection (VAD)—and quickly prototype a demo showcasing the technology.

What it does

PaperHeart adds a lightweight, conversational AI agent on top of any PDF. Users can feed it extra knowledge for simple retrieval-augmented generation (RAG). Potential applications include personalized financial advice, explaining insurance bills, educational tutoring, or even acting as a virtual host that guides customers through a French restaurant menu and takes orders.
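
To illustrate the "simple RAG" idea, here is a minimal sketch in Swift: retrieve the snippet of user-supplied knowledge most relevant to a question and prepend it to the prompt. The naive keyword-overlap scoring and the function names (`retrieve`, `buildPrompt`) are illustrative assumptions, not the app's actual retrieval code.

```swift
// Score a snippet by how many query words it shares (naive overlap; a real
// system would use embeddings, but the idea is the same).
func score(_ snippet: String, _ queryWords: Set<String>) -> Int {
    let words = Set(snippet.lowercased().split(separator: " ").map(String.init))
    return words.intersection(queryWords).count
}

// Return the snippet with the highest overlap score, if any.
func retrieve(query: String, from snippets: [String]) -> String? {
    let queryWords = Set(query.lowercased().split(separator: " ").map(String.init))
    return snippets.max { score($0, queryWords) < score($1, queryWords) }
}

// Prepend the retrieved context to the user's question before sending to the LLM.
func buildPrompt(query: String, knowledge: [String]) -> String {
    let context = retrieve(query: query, from: knowledge) ?? ""
    return "Context: \(context)\n\nQuestion: \(query)"
}
```

In practice the retrieved context is injected into the agent's prompt before each LLM call, so answers stay grounded in the document and any extra knowledge the user supplied.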

How we built it

We built PaperHeart as a native iOS app in Swift, using a finite state machine (FSM) to control the conversational flow. On the AI side, we integrated LangChain and Gemini, and paired streaming text-to-speech (TTS) with real-time voice activity detection.
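
The FSM approach can be sketched as a small Swift type. The state and event names below are illustrative assumptions (the app's actual states are not shown); the key behavior is the VAD-driven "barge-in" transition that lets the user interrupt the agent mid-speech.

```swift
// Conversational states for a voice agent layered on a document.
enum ConversationState {
    case idle       // waiting for the user to open a document
    case speaking   // agent is playing streamed TTS audio
    case listening  // capturing user speech
    case thinking   // waiting on the LLM for the next response
}

// Events that drive transitions (names are hypothetical).
enum ConversationEvent {
    case documentOpened
    case userStartedSpeaking   // fired by voice activity detection
    case userStoppedSpeaking
    case responseReady         // LLM finished generating a reply
    case playbackFinished
}

struct ConversationFSM {
    private(set) var state: ConversationState = .idle

    // Apply an event; ignore events that are invalid in the current state.
    mutating func handle(_ event: ConversationEvent) {
        switch (state, event) {
        case (.idle, .documentOpened):           state = .speaking  // greet the user
        case (.speaking, .userStartedSpeaking):  state = .listening // barge-in via VAD
        case (.speaking, .playbackFinished):     state = .listening
        case (.listening, .userStoppedSpeaking): state = .thinking
        case (.thinking, .responseReady):        state = .speaking
        default: break
        }
    }
}
```

Keeping every transition in one exhaustive switch makes the interaction flow easy to audit, which matters when asynchronous VAD and LLM callbacks can arrive in any order.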

Challenges we ran into

One major challenge was designing the interaction flow—making sure LLM responses and real-time voice processing stayed smooth and natural in a procedural experience.

Accomplishments that we're proud of

It works! We achieved real-time, voice-driven interactions layered on top of static documents.

What we learned

Building a prototype with real-time AI voice interaction isn’t too hard with today's tools, but polishing it to feel as smooth and natural as a commercial product is a much bigger challenge. We also learned that real-world usage demands more robustness on both the AI and voice input sides—especially around handling environmental noise.

What's next for PaperHeart

We're focusing on consolidating around one use case and validating it with real customers.

How to test

  1. Download the iOS app via TestFlight (iPhone or iPad): https://testflight.apple.com/join/18gd5tey
  2. Log in with the demo account (username: demo@foo.com, password: paperheart)
  3. Open one of the documents to listen and talk to the AI agent (try Investment Strategy Bulletin first). You can speak at any moment to interrupt the AI, just as in natural conversation.
