The inspiration for TalkSense-AI came from witnessing the real-world challenges faced by call center agents who often struggle to deliver quick, accurate responses during live customer interactions. Traditional systems lack real-time AI-driven support, resulting in slower resolutions and inconsistent customer experiences.

We envisioned a solution that could listen, analyze, and assist in real time, turning every conversation into an opportunity for better service. With recent advances in speech recognition and generative AI, we realized it was finally possible to build an intelligent co-pilot for live calls.


What It Does

TalkSense-AI is a comprehensive Live Call Analytics & AI Assistance Platform that empowers agents and supervisors with:

  • Real-time Speech-to-Text Transcription using browser microphone input
  • AI-Powered Call Assistance providing contextual, in-call suggestions
  • Sentiment Analysis to track customer emotions throughout the conversation
  • AI Chatbot powered by Bedrock for instant Q&A from your knowledge base
  • Call Simulator for realistic training and mock call scenarios
  • Data Management System to upload, parse, and analyze call recordings, PDFs, and documents
  • Post-Call Analytics with detailed insights such as talk time, interruptions, and resolution patterns
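
As a rough illustration of the post-call metrics mentioned above, the sketch below computes talk time per speaker and a simple interruption count from timestamped transcript segments. The `Segment` shape, the speaker labels, and the rule "a segment that starts before the previous speaker has finished counts as an interruption" are assumptions made for this example, not the project's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance (hypothetical shape; times in seconds)."""
    speaker: str  # e.g. "AGENT" or "CUSTOMER"
    start: float
    end: float


def talk_time(segments):
    """Return the total speaking time per speaker, in seconds."""
    totals = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals


def count_interruptions(segments):
    """Count segments that begin while the previous, different speaker is still talking."""
    ordered = sorted(segments, key=lambda s: s.start)
    interruptions = 0
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.speaker != prev.speaker and cur.start < prev.end:
            interruptions += 1
    return interruptions


call = [
    Segment("AGENT", 0.0, 4.0),
    Segment("CUSTOMER", 3.5, 9.0),  # starts before the agent finishes
    Segment("AGENT", 9.5, 12.0),
]

print(talk_time(call))            # seconds spoken per speaker
print(count_interruptions(call))  # number of overlapping speaker changes
```

Comparing only adjacent segments keeps the sketch short; a fuller version would also need to handle long overlapping monologues and short back-channel utterances such as "mm-hm".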

How We Built It

We built a full-stack system: a React + Vite frontend, a FastAPI backend, and AWS services (Transcribe, Bedrock, Comprehend, S3, and DynamoDB) for transcription, generative AI, sentiment analysis, and storage.

Architecture Overview

Frontend (React + Vite)

  • Real-time audio capture via MediaRecorder or Web Audio API
  • WebSocket-based transcript updates and agent suggestions
  • Dynamic UI updates using React state, Radix UI, and Tailwind CSS
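// Open the WebSocket for live call analytics once, when the component mounts
const wsRef = useRef(null);
useEffect(() => {
  wsRef.current = new WebSocket('ws://localhost:8000/ws/live-call-analytics');
  return () => wsRef.current?.close(); // close the socket on unmount
}, []);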

Backend (FastAPI + AWS)

  • Real-time audio ingestion over WebSocket
  • Amazon Transcribe for live speech-to-text with Call Analytics metadata
  • Bedrock inference to generate contextual suggestions from the knowledge base
  • Sentiment analysis using Amazon Comprehend
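# Real-time WebSocket endpoint for streaming transcription
from amazon_transcribe.client import TranscribeStreamingClient
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/live-call-analytics")
async def live_call_analytics(websocket: WebSocket):
    await websocket.accept()  # complete the WebSocket handshake before streaming
    # AWS_REGION is defined elsewhere (for example, loaded from the environment)
    client = TranscribeStreamingClient(region=AWS_REGION)
    # Open a Call Analytics stream for 16 kHz PCM audio in US English
    stream = await client.start_call_analytics_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=16000,
        media_encoding="pcm"
    )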
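
To make the stream parameters above more concrete, here is a small sketch of how raw audio might be cut into fixed-duration chunks before being forwarded to the transcription stream. It assumes 16-bit samples and a single channel; `chunk_size_bytes`, `split_pcm`, and the 100 ms chunk duration are hypothetical names and values chosen for illustration.

```python
SAMPLE_RATE_HZ = 16000  # matches media_sample_rate_hz in the endpoint above
BYTES_PER_SAMPLE = 2    # assumption: 16-bit signed PCM
CHANNELS = 1            # assumption: mono audio


def chunk_size_bytes(chunk_ms, sample_rate_hz=SAMPLE_RATE_HZ,
                     bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Number of bytes that hold `chunk_ms` milliseconds of PCM audio."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample * channels


def split_pcm(audio, chunk_ms=100):
    """Yield consecutive chunks of `audio` (bytes), each at most `chunk_ms` long."""
    size = chunk_size_bytes(chunk_ms)
    for offset in range(0, len(audio), size):
        yield audio[offset:offset + size]


one_second_of_silence = bytes(chunk_size_bytes(1000))
chunks = list(split_pcm(one_second_of_silence, chunk_ms=100))
print(chunk_size_bytes(100), len(chunks))
```

Smaller chunks lower latency but increase the number of WebSocket messages, so the chunk duration is a tuning knob rather than a fixed requirement.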

Key Implementation Features

  • WebSocket Integration for low-latency audio and data transfer
  • Amazon Transcribe Call Analytics with speaker separation
  • Amazon Bedrock (Titan LLM) for real-time generative AI responses
  • S3-based Knowledge Base for document storage and retrieval
  • Sentiment Analysis using Amazon Comprehend
  • Post-call Summarization using Bedrock with structured prompts
  • Secure Upload System for audio files, PDFs, and more
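
The sentiment tracking listed above could look roughly like the following sketch, which calls Amazon Comprehend's `detect_sentiment` operation once per utterance and keeps a timeline of the results. The `track_sentiment` helper and the `FakeComprehend` stand-in are assumptions for illustration; the stand-in only mimics the response shape so the example runs without AWS credentials.

```python
def track_sentiment(comprehend, utterances, language_code="en"):
    """Label each non-empty utterance using a Comprehend-style client."""
    timeline = []
    for text in utterances:
        if not text.strip():
            continue  # skip blank transcript fragments
        result = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
        timeline.append({
            "text": text,
            "sentiment": result["Sentiment"],
            "scores": result["SentimentScore"],
        })
    return timeline


class FakeComprehend:
    """Offline stand-in that mimics the detect_sentiment response shape.

    It is a keyword stub for demonstration only, not a sentiment model.
    """

    def detect_sentiment(self, Text, LanguageCode):
        negative = "not" in Text.lower()
        return {
            "Sentiment": "NEGATIVE" if negative else "POSITIVE",
            "SentimentScore": {
                "Positive": 0.1 if negative else 0.9,
                "Negative": 0.9 if negative else 0.1,
                "Neutral": 0.0,
                "Mixed": 0.0,
            },
        }


timeline = track_sentiment(
    FakeComprehend(),
    ["Thanks, that fixed it.", "   ", "This is not working."],
)
for entry in timeline:
    print(entry["sentiment"], "-", entry["text"])
```

With real credentials, the stand-in would be replaced by `boto3.client("comprehend")`, which exposes the same `detect_sentiment(Text=..., LanguageCode=...)` call.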

Challenges We Faced

  • Real-time Audio Processing: Achieving low-latency browser-to-AWS streaming with synchronized audio
  • WebSocket Stability: Managing concurrent WebSocket sessions for multiple users
  • Multi-Service Integration: Coordinating Bedrock, Transcribe, Comprehend, and S3 within the same workflow
  • Context Management: Maintaining continuous conversation context for live AI prompts
  • CORS Configuration: Ensuring secure cross-origin access for APIs and sockets
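
One way to approach the context management challenge above is a rolling window that keeps only the most recent turns and renders them into a prompt. This is a minimal sketch under that assumption; the `ConversationContext` class, the window size, and the prompt layout are hypothetical and not taken from the project's code.

```python
from collections import deque


class ConversationContext:
    """Keep the most recent turns of a call and render them into a prompt."""

    def __init__(self, max_turns=6):
        # deque(maxlen=...) silently drops the oldest turn once the window is full
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, speaker, text):
        self._turns.append((speaker, text.strip()))

    def build_prompt(self, instruction):
        transcript = "\n".join(f"{speaker}: {text}" for speaker, text in self._turns)
        return f"{instruction}\n\nConversation so far:\n{transcript}\n\nSuggestion:"


context = ConversationContext(max_turns=2)
context.add_turn("CUSTOMER", "Hi, I was charged twice this month.")
context.add_turn("AGENT", "I'm sorry about that, let me check your account.")
context.add_turn("CUSTOMER", "It was on the 3rd and the 5th.")

prompt = context.build_prompt("Suggest the agent's next reply in one sentence.")
print(prompt)
```

A fixed turn count keeps prompt size predictable; a token budget or a running summary of older turns are alternative designs when long calls need more history.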

What We Learned

  • Implementing real-time GenAI workflows in production-grade stacks
  • Working in depth with Amazon Transcribe Call Analytics and speaker separation
  • Designing efficient WebSocket-based pipelines
  • Crafting effective AI prompts for contextual assistance
  • Leveraging cloud-native AWS architectures for scalable AI products

What's Next for TalkSense-AI

  • Multi-Language Support for global customer interaction
  • Advanced Analytics Dashboard with real-time KPIs and escalation tracking
  • CRM Integrations (Salesforce, HubSpot) for data syncing
  • Mobile App for agents and managers on the go
  • Custom ML Models for industry-specific classification (e.g., healthcare, legal, banking)

Built With

Languages & Frameworks

  • Python (FastAPI, Uvicorn)
  • JavaScript (React, Vite)
  • HTML5/CSS3 (Tailwind CSS)

Cloud Services

  • Amazon Web Services (AWS)
  • Amazon Transcribe Call Analytics
  • Amazon Bedrock (Titan Text)
  • Amazon S3 (Storage)
  • Amazon Comprehend (Sentiment Analysis)
  • Amazon DynamoDB (Conversation Storage)

APIs & Libraries

  • WebSocket API
  • Amazon Transcribe Streaming SDK
  • Boto3 (AWS SDK for Python)
  • PyPDF2 (PDF Document Parsing)
  • Radix UI (Component Library)

Development Tools

  • React Router
  • Python-dotenv
  • Pydantic
  • Aiohttp

Real-time Technologies

  • WebSocket Protocol
  • Audio Streaming APIs
  • Real-time Transcription
  • Live Sentiment Analysis
