Built for the AWS Hackathon · Powered by Amazon Nova 2 Sonic
SonicServe is a fully autonomous AI voice agent for restaurants. Callers speak naturally to place orders, make reservations, browse the menu, and request call transfers or manager escalations - no hold music, no phone trees. A parallel Slack agent lets staff manage store operations (hours, inventory, and policies) without leaving their chat workspace.
The system is split into two independent paths:
Voice path: Twilio receives an inbound call and opens a WebSocket media stream to the FastAPI server. Audio is relayed to Amazon Nova Sonic (Bedrock) which handles bidirectional streaming with barge-in detection. Nova Sonic calls tools via an MCP server (FastMCP) to handle reservations, orders, menu lookups, escalations, and call transfers.
Slack path: A Strands agent powered by Amazon Nova Pro listens for staff mentions and DMs in Slack. It calls its own set of tools directly (no MCP) to update business hours, manage Square inventory, and edit policies stored in PostgreSQL. The agent runs in Socket Mode, so Slack pushes events directly over a persistent WebSocket connection - no public endpoint required.
| Layer | Technology |
|---|---|
| Voice AI | Amazon Nova Sonic (Bedrock) |
| Slack AI | Amazon Nova Pro (Bedrock) via Strands |
| Telephony | Twilio, PSTN + WebSocket media streams |
| Tool protocol | MCP via FastMCP |
| POS integration | Square Orders and Catalog API |
| Database | PostgreSQL (RDS) |
| API server | FastAPI + Uvicorn |
| Deployment | Docker, AWS ECR, AWS ECS |
| Local tunneling | ngrok |
.
├── app.py # FastAPI entrypoint, Twilio WebSocket + Slack action handler
├── nova_sonic.py # Nova Sonic session, audio relay, barge-in logic
├── mcp_client.py # MCP client wrapper (tool discovery + invocation)
├── mcp_server.py # MCP server, registers all voice tools
├── Dockerfile # Voice server container
├── docker-compose.yml # Voice server compose
├── requirements.txt
│
├── tools/ # MCP tools (voice path only)
│ ├── business_info.py # get_business_hours, get_location, get_parking_info
│ ├── escalation.py # escalate_to_manager (Slack Block Kit)
│ ├── menu.py # get_menu, search_menu
│ ├── orders.py # place_order, get_order_status
│ ├── policies.py # get_policy
│ ├── reservations.py # create_reservation, cancel_reservation
│ └── transfer.py # transfer_call (Twilio)
│
├── data/
│ ├── master_data.py # In-memory cache, loaded from DB + Square at startup
│ ├── policies_data.py # Fallback static policies
│ ├── restaurant.py # Static restaurant + parking info
│ ├── square_payload.json # Square catalog seed payload
│ └── seed/
│ ├── business_hours.sql # Creates and seeds business_hours table
│ ├── square_batch_insert.py
│ ├── square_batch_delete.py
│ └── clover-bootstrap.py
│
├── utils/
│ ├── audio.py # u-law / PCM resampling (Twilio / Nova Sonic)
│ ├── pos/
│ │ ├── base.py # POSProvider abstract interface
│ │ ├── factory.py # Returns Square or Clover provider
│ │ ├── square.py # Square Orders + Catalog API
│ │ └── clover.py # Clover Orders API
│ ├── rds/
│ │ ├── business_hours/core.py
│ │ ├── policies/core.py
│ │ └── reservations/core.py
│ └── slack/
│ └── actions.py # Signature verification + Block Kit action handlers
│
├── public/
│ └── assets/
│ └── sonicserve_arch.png
│
└── slack-agent/ # Independent Slack operations agent
├── slack_app.py # Slack Bolt app, mentions + DMs
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── db/
│ └── schema.sql # Creates business_hours + policy_information tables
└── agent/
├── agent.py # Strands agent, Nova Pro, tool registration
└── tools/
├── business_hours.py # get_business_hours, update_business_hours
├── square_inventory.py # mark_item_sold_out, mark_item_back_in_stock
└── policy.py # list_policies, get_policy, update_policy
- Python 3.12+
- Docker
- AWS account with Bedrock access: Nova Sonic (
amazon.nova-2-sonic-v1:0) and Nova Pro (us.amazon.nova-pro-v1:0) enabled inus-east-1 - Twilio account with a phone number
- Square developer account (sandbox or production)
- PostgreSQL database (local or RDS)
- Slack app with bot token, signing secret, and app-level token (for Socket Mode)
- ngrok (local development only)
Create a .env file in the project root. The same file is used by both the voice server and the Slack agent; copy it into slack-agent/ as well, or mount it via Docker.
# AWS credentials
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=us-east-1
# Square POS
SQUARE_ACCESS_TOKEN=
SQUARE_LOCATION_ID=
SQUARE_ENV=sandbox # sandbox or production
# Clover POS (optional alternative to Square)
CLOVER_API_TOKEN=
CLOVER_MERCHANT_ID=
# Slack
SLACK_BOT_TOKEN= # xoxb- bot token
SLACK_APP_TOKEN= # xapp- app-level token, required for Socket Mode
SLACK_SIGNING_SECRET= # used to verify incoming Slack requests
SLACK_ESCALATION_CHANNEL=#manager-alerts
SLACK_RESERVATION_CHANNEL=#reservations
# PostgreSQL
DATABASE_URL=postgresql://user:password@host:5432/dbname
# Twilio
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=
TRANSFER_PHONE_NUMBER= # E.164 format, e.g. +16175550100Run the seed SQL once against your PostgreSQL instance to create the required tables and insert default data:
psql $DATABASE_URL -f data/seed/business_hours.sql
psql $DATABASE_URL -f slack-agent/db/schema.sqlTo load the menu catalog into Square sandbox:
pip install squareup python-dotenv
python data/seed/square_batch_insert.py# Install dependencies
pip install -r requirements.txt
# Start the FastAPI server
uvicorn app:app --reload --port 3000
# In a second terminal, open a public tunnel
ngrok http 3000Copy the ngrok HTTPS URL and set it as your Twilio phone number's Voice webhook:
https://<your-ngrok-id>.ngrok.io/incoming-call
Twilio will POST to /incoming-call on each inbound call and upgrade to a WebSocket at /media-stream.
cd slack-agent
pip install -r requirements.txt
python slack_app.pyThe Slack agent uses Socket Mode, so no public URL or ngrok tunnel is needed. Make sure Socket Mode is enabled in your Slack app settings and SLACK_APP_TOKEN is present in your .env.
sequenceDiagram
participant Caller
participant Twilio
participant FastAPI
participant NovaSonic
participant MCP
Caller->>Twilio: Inbound call
Twilio->>FastAPI: POST /incoming-call
FastAPI-->>Twilio: TwiML <Stream url="wss://.../media-stream"/>
Twilio->>FastAPI: WebSocket connect
FastAPI->>NovaSonic: Start Bedrock session
NovaSonic-->>Caller: "Thank you for calling Nova Dine..."
loop Conversation
Caller->>FastAPI: Audio (u-law 8kHz)
FastAPI->>NovaSonic: PCM 16kHz
NovaSonic->>MCP: Tool call (e.g. create_reservation)
MCP-->>NovaSonic: Tool result
NovaSonic-->>FastAPI: PCM 24kHz
FastAPI-->>Twilio: Audio (u-law 8kHz)
Twilio-->>Caller: Speech
end
sequenceDiagram
participant Staff
participant Slack
participant SlackBolt
participant Strands
participant NovaPro
participant DB/Square
Staff->>Slack: @bot we're closed this Sunday
Slack->>SlackBolt: Event pushed over Socket Mode WebSocket
SlackBolt->>Strands: Forward message + conversation context
Strands->>NovaPro: Invoke Nova Pro with tool definitions
NovaPro-->>Strands: Tool call (e.g. update_business_hours)
Strands->>DB/Square: Execute tool against PostgreSQL / Square API
DB/Square-->>Strands: Result
Strands->>NovaPro: Return tool result, request final response
NovaPro-->>Strands: Natural language confirmation
Strands-->>SlackBolt: Response text
SlackBolt-->>Staff: "Got it! Marked Sunday as closed."
The voice server is containerized and deployed to ECS via ECR. The Twilio webhook is pointed at the ECS service URL instead of ngrok.
aws ecr get-login-password --region us-east-1 --profile dev-pers \
| docker login --username AWS --password-stdin \
374834463497.dkr.ecr.us-east-1.amazonaws.comdocker build -t sonic-serve .
docker tag sonic-serve:latest \
374834463497.dkr.ecr.us-east-1.amazonaws.com/aws-nova-hack:latest
docker push \
374834463497.dkr.ecr.us-east-1.amazonaws.com/aws-nova-hack:latestaws ecs update-service \
--cluster default \
--service aws-nova-hack-dd04 \
--force-new-deployment \
--region us-east-1 \
--profile dev-persOnce the ECS task is running, update the Twilio voice webhook to your service's public URL:
https://<your-ecs-url>/incoming-call
This replaces the ngrok tunnel used in local development with no other changes required.
docker-compose up --buildThis starts the voice server on port 3000. The Slack agent has its own compose file in slack-agent/:
cd slack-agent
docker-compose up --buildInvite the bot to any channel or DM it directly. Example commands:
| Intent | Message |
|---|---|
| View all hours | what are our hours? |
| Close for a day | we're closed this Sunday |
| Change open time | Monday opens at 8am |
| Mark item sold out | the BBQ Bacon Burger ran out |
| Restock an item | BBQ Bacon Burger is back |
| List policies | show me all policies |
| Update a policy | update the dress_code policy to: smart casual only |
MCP for the voice path: Tools are registered as MCP-compliant specs so Nova Sonic can invoke them natively via Bedrock's tool-use protocol. This keeps nova_sonic.py decoupled from any specific business logic.
Bidirectional streaming with Nova Sonic: Nova Sonic runs as a persistent Bedrock session over a bidirectional WebSocket - audio flows in from the caller while synthesized speech flows back simultaneously. The server manages two async loops: one forwarding u-law audio (resampled to PCM 16kHz) to Bedrock, and another consuming PCM 24kHz responses and resampling back to u-law for Twilio. A generation ID gates outbound audio so stale chunks from interrupted responses are silently dropped rather than played.
Strands agent + Slack Bolt: The Slack agent combines Slack Bolt (for event handling) with a Strands agent (for reasoning and tool orchestration). Bolt receives the incoming message, extracts the text and channel context, and hands it to the Strands agent. The agent is initialized with Amazon Nova Pro as its model and a set of registered tool functions. Strands handles the multi-step reasoning loop - deciding which tools to call, invoking them, and synthesizing a natural language reply - which Bolt then posts back to the channel.
Socket Mode for the Slack agent: Rather than exposing a public HTTP endpoint for Slack events, the agent uses Socket Mode - Slack pushes events directly over a persistent WebSocket connection. This means no ingress rules, no ngrok tunnel, and no load balancer needed for the Slack path. The agent runs anywhere (local, ECS, a developer's laptop) and receives events instantly without any webhook infrastructure.
Barge-in handling: When Nova Sonic detects the caller speaking mid-response, the server clears Twilio's audio playback buffer, increments a generation ID to drop stale audio chunks, and reopens the audio channel without dropping the WebSocket connection.
master_data cache: Policies, business hours, and the Square menu are fetched once at startup into an in-memory dict. This avoids repeated DB and API calls on every tool invocation during a call.
