Inspiration

Working as the architect at a CX automation org, I found a gap in this space. There are plenty of bot-building apps out there, but they require you to run on their platform, in their ecosystem, usually building via a low-code/no-code editor.

The tooling that's more dev-centric tends to be too raw - nothing really helps you run the conversation, and you end up writing far more code than you'd ever have guessed you'd need.

We've been working on BotRunner since early January. The core has been up and running since mid-February, and since then we've mostly been using it rather than building on it.

What it does

At its core, BotRunner pings a webhook you provide when someone calls or texts an assigned phone number, and accepts a JSON response from your webhook telling it how to handle that call. You choose whether to accept the call, then what to say, what questions to ask, etc. BotRunner's job is to manage the conversation, handle transfers, convert the user's input into useful data, and, for phone calls, handle speech-to-text and text-to-speech for you. It also manages state for you so that your API can remain stateless.
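To make the webhook round trip concrete, here is a minimal sketch of a stateless handler written as a pure Python function. Every field name in it (`type`, `state`, `answers`, `accept`, `say`, `ask`, `expect`, `hangup`) is a hypothetical placeholder chosen for illustration, not BotRunner's actual JSON schema. The point is only the shape of the exchange: an event comes in along with whatever state was handed back last time, and the response carries the next instruction plus the state to remember.

```python
from typing import Any


def handle_event(event: dict[str, Any]) -> dict[str, Any]:
    """Return the next instruction for one webhook call.

    All keys used here are illustrative assumptions, not a real schema.
    The handler keeps nothing in memory between calls: anything it needs
    later is returned under "state" and expected back on the next event.
    """
    state = event.get("state") or {}

    if event.get("type") == "call_started":
        # Accept the call, greet the caller, and ask one typed question.
        return {
            "accept": True,
            "say": "Thanks for calling.",
            "ask": {
                "name": "appointment_time",
                "prompt": "When would you like to come in?",
                "expect": "DateTime",
            },
            "state": {"step": "awaiting_time"},
        }

    if state.get("step") == "awaiting_time":
        # The parsed answer is assumed to arrive under "answers".
        when = event.get("answers", {}).get("appointment_time")
        return {"say": f"You're booked for {when}.", "hangup": True, "state": {}}

    # Anything unrecognised: decline rather than guess.
    return {"accept": False}
```

In a real service this function would sit behind whatever web framework you already use; because it only reads the incoming event, it can be scaled or restarted freely.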

As part of processing user input, we currently lean heavily on Microsoft's AI- and ML-based tools: Azure Speech and CLU (Conversational Language Understanding). CLU is a heavily ML-based system for translating user input into usable JSON, extracting the user's intent and the data ("entities") the user provides for that intent.

The goal of BotRunner, more than anything else, is to let any developer with some web experience converse with their users in a way that makes sense to them. That means the real core of "what BotRunner does" is this: tackling the hard parts of telephony and user communication and abstracting them away. To that end, we're building SDKs in several languages to help with executing conversations, and we've launched our first one in C#.
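As an illustration of what "usable JSON" means here, the sketch below pulls the top intent and the entities out of a CLU-style result. The payload is an abridged example modelled on the general shape of a CLU prediction (`topIntent`, plus a list of entities with `category` and `text`); treat the exact shape as an assumption and check it against the service documentation. The intent and entity names (`MovePiece`, `Piece`, `Square`) are invented for the example.

```python
from typing import Any


def extract(result: dict[str, Any]) -> tuple[str, dict[str, str]]:
    """Reduce a CLU-style result to (intent, {entity category: spoken text})."""
    prediction = result["result"]["prediction"]
    entities = {e["category"]: e["text"] for e in prediction.get("entities", [])}
    return prediction["topIntent"], entities


# Abridged, hand-written sample; the names and scores are made up.
sample = {
    "kind": "ConversationResult",
    "result": {
        "query": "move my knight to D4",
        "prediction": {
            "topIntent": "MovePiece",
            "intents": [{"category": "MovePiece", "confidenceScore": 0.97}],
            "entities": [
                {"category": "Piece", "text": "knight", "offset": 8, "length": 6},
                {"category": "Square", "text": "D4", "offset": 18, "length": 2},
            ],
        },
    },
}

intent, entities = extract(sample)
```

Note that keying entities by category keeps only the last value if a category repeats; a real handler would probably keep a list per category.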

How we built it

I created an Ubuntu VM, installed Asterisk (a very popular open-source VoIP system) and .NET, and wrote a C# application that communicates with Asterisk. It connects to a websocket Asterisk provides for receiving events and sends actions such as answering a call, bridging it, opening an AudioSocket (another Asterisk tool), etc.

Alma worked on the CLU models that convert the 'standard' inputs we handle on behalf of our users, so they can just say they want a DateTime back instead of building and modeling every possible input from a user. Meanwhile, I set up BotRunner to send and receive audio with Azure Speech and to push the text parsed by Azure Speech into CLU.
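The event-in, action-out loop described above can be sketched as follows. This is not the BotRunner code (which is C#); it is a small Python illustration that assumes the websocket in question is the Asterisk REST Interface (ARI) event stream, where a `StasisStart` event announces a new channel and actions are sent as REST calls relative to the `/ari` base path. The `bridge-<channel id>` naming is a hypothetical convention.

```python
import json


def actions_for(raw_event: str) -> list[tuple[str, str]]:
    """Map one ARI-style event (JSON text) to the REST calls to make next.

    Returns (HTTP method, path) pairs relative to the ARI base path.
    Only the arrival of a new call is handled; other events are ignored.
    """
    event = json.loads(raw_event)
    if event.get("type") != "StasisStart":
        return []

    channel_id = event["channel"]["id"]
    bridge_id = f"bridge-{channel_id}"  # hypothetical naming scheme
    return [
        # Answer the incoming channel.
        ("POST", f"/channels/{channel_id}/answer"),
        # Create a mixing bridge, then place the caller in it.
        ("POST", f"/bridges?type=mixing&bridgeId={bridge_id}"),
        ("POST", f"/bridges/{bridge_id}/addChannel?channel={channel_id}"),
    ]
```

Keeping the decision logic separate from the websocket and HTTP plumbing like this makes it easy to unit test without a running Asterisk instance.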

Nate and I both worked on the demos that utilize and demonstrate the connection with BotRunner.

Challenges we ran into

Mostly low-level concerns that you don't deal with every day in higher-level languages that don't typically compile to native code:

  • Asterisk has a useful PR that wasn't yet merged into the main codebase, so I had to manually recreate those changes in my local copy of Asterisk
  • Communicating with AudioSocket means managing an active TCP socket
  • Real-time communication over voice means milliseconds matter - we have to fire off a 320-byte packet of audio exactly every 20 milliseconds to avoid stutter. When running the app locally, the audio is painfully slow
  • Speech engines default to 'expected' words from users. They don't expect you to say "chess" or "D4", so building our demo application, which lets you play chess over the phone, required custom-training a speech model.
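The 320-byte figure falls out of the audio format: assuming 8 kHz, 16-bit mono PCM, 20 ms of audio is 8000 × 2 × 0.020 = 320 bytes. The sketch below shows one way to frame and pace those packets in Python. It assumes AudioSocket's simple framing (a one-byte message type, a two-byte big-endian payload length, then the payload, with `0x10` marking audio) and schedules against absolute deadlines so that small sleep overruns don't accumulate into drift. The `send` callable stands in for writing to the TCP socket; none of this is BotRunner's actual implementation.

```python
import struct
import time
from typing import Callable, Iterator

SAMPLE_RATE_HZ = 8000   # assumed: 8 kHz signed linear PCM
BYTES_PER_SAMPLE = 2    # 16-bit mono
FRAME_MS = 20
FRAME_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_MS // 1000  # 320
KIND_AUDIO = 0x10       # assumed AudioSocket message type for audio


def frame(payload: bytes) -> bytes:
    """Prefix a payload with a 3-byte header: type, then big-endian length."""
    return struct.pack("!BH", KIND_AUDIO, len(payload)) + payload


def chunks(pcm: bytes) -> Iterator[bytes]:
    """Split PCM into 20 ms chunks, padding the last one with silence."""
    for start in range(0, len(pcm), FRAME_BYTES):
        yield pcm[start:start + FRAME_BYTES].ljust(FRAME_BYTES, b"\x00")


def send_paced(
    pcm: bytes,
    send: Callable[[bytes], object],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], object] = time.sleep,
) -> None:
    """Send one frame every 20 ms, measured against absolute deadlines."""
    deadline = clock()
    for chunk in chunks(pcm):
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)
        send(frame(chunk))
        # Advance from the previous deadline, not from "now", to avoid drift.
        deadline += FRAME_MS / 1000
```

With a connected socket this would be called as `send_paced(pcm, sock.sendall)`. Injecting `clock` and `sleep` is a design choice that lets the pacing logic be tested without actually waiting.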

Accomplishments that we're proud of

  • Just getting all of these pieces to work together
  • The ease of building out a new BotRunner-based app using the BotRunner SDK
  • The new SIP-based demo available on our website - talking to BotRunner by phone over the web is pretty cool the first time you do it!

What we learned

  • How to work with a lot of the modern ML-based tooling for handling user input
  • How to use WebRTC for voice communication on the web
  • Dealing with real-time communication issues
  • How to write an SDK for your own API that others can consume and that really helps them execute.

What's next for BotRunner

Features, features, features! We'll be adding outbound calling, logging, and reporting first. After that, we'll be building our own BotRunner-based apps to help entrepreneurs answer their phones and deflect junk calls. Finally, we'll add more SDKs in the most popular languages for web services (Java, Python, Node, etc.).
