Calling

Overview

Whatomate supports WhatsApp voice calling with WebRTC-based audio bridging. Incoming calls are handled by an Interactive Voice Response (IVR) system built using a visual drag-and-drop flow editor. The IVR plays greetings, collects DTMF input, makes HTTP callbacks, and routes callers to agent teams. Agents answer calls from the browser — no phone hardware required.

Visual Flow Editor

Drag-and-drop node-based IVR builder with 8 node types

Call Transfers

Route callers to agent teams with hold music

Call Hold

Put calls on hold with instant hold music and resume anytime

Call Recording

Record agent-caller audio as OGG/Opus, stored in S3

Outgoing Calls

Agents can place outbound calls to contacts from the chat view

How It Works

A WhatsApp user calls your business number
WhatsApp sends a webhook with the SDP offer
Whatomate establishes a WebRTC peer connection and runs the IVR flow
The caller hears greetings, presses digits (DTMF) to navigate menus, and provides input
Based on the flow logic, the call is transferred to an agent team, routed to another flow, or hung up
An available agent accepts the transfer in the browser and speaks with the caller

IVR Flow Builder

IVR flows are configured from Settings > IVR Flows in the admin UI. The editor uses a visual canvas where you drag and drop nodes, then connect them with edges to define the call flow.

Creating a Flow

Each flow has:

Name and optional description
WhatsApp Account — which phone number this flow handles
Active toggle — disabled flows are skipped
Call Start toggle — marks this as the entry flow for incoming calls
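
As a rough illustration of how these four properties interact, the sketch below picks the entry flow for an incoming call. The `IVRFlow` class, its field names, and `pick_entry_flow` are hypothetical names for this example, not Whatomate's data model. The source does not say what happens when several flows match, so the sketch simply returns the first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IVRFlow:
    # Hypothetical field names mirroring the properties listed above.
    name: str
    whatsapp_account: str
    active: bool
    call_start: bool


def pick_entry_flow(flows: List[IVRFlow], account: str) -> Optional[IVRFlow]:
    """Return the first active Call Start flow for the account, if any."""
    for flow in flows:
        if flow.whatsapp_account == account and flow.active and flow.call_start:
            return flow
    return None  # no usable entry flow for this number


flows = [
    IVRFlow("Old menu", "+15550100", active=False, call_start=True),
    IVRFlow("Main menu", "+15550100", active=True, call_start=True),
    IVRFlow("Verification", "+15550100", active=True, call_start=False),
]
entry = pick_entry_flow(flows, "+15550100")
print(entry.name if entry else "no entry flow")  # Main menu
```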

Node Types

The flow editor provides 8 node types. Drag them from the palette onto the canvas, configure their properties in the side panel, and connect them with edges.

Greeting

Plays an audio message to the caller. The audio can be uploaded as a file or generated from text using TTS (text-to-speech).

Property	Description
Audio	Upload an audio file (OGG, MP3, WAV, etc.) — automatically transcoded to Opus
TTS Text	Type text to generate audio using Piper TTS
Interruptible	If enabled, the caller can press a digit to skip the greeting

The greeting node has one output (default) that connects to the next node.

Menu

Plays an audio prompt and waits for the caller to press a DTMF digit. Routes the call based on the digit pressed.

Property	Description
Audio / TTS	The prompt audio (e.g., “Press 1 for sales, press 2 for support”)
Timeout	Seconds to wait for input (default: 10)
Max Retries	Number of retries on invalid/no input before following the `max_retries` edge (default: 3)
Options	Digit-to-label mappings (e.g., `1` → “Sales”, `2` → “Support”)

Output handles:

digit:1, digit:2, etc. — one per configured option
timeout — triggered when the caller doesn’t press anything
max_retries — triggered after exhausting all retries
default — fallback for unconfigured digits

Connect each output handle to the appropriate next node.
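
The handle selection described above can be sketched as a small Python function. This is an illustration only: `resolve_menu_handle` is a hypothetical name, and retry counting (the `max_retries` handle) is left out because the exact order in which retries, `timeout`, and `default` are evaluated is not specified here.

```python
from typing import Dict, Optional


def resolve_menu_handle(digit: Optional[str], options: Dict[str, str]) -> str:
    """Map one caller input event to the name of an output handle.

    digit is the key the caller pressed, or None if the timeout elapsed.
    options maps configured digits to labels, e.g. {"1": "Sales"}.
    """
    if digit is None:
        return "timeout"           # the caller pressed nothing
    if digit in options:
        return f"digit:{digit}"    # one handle per configured option
    return "default"               # fallback for unconfigured digits


options = {"1": "Sales", "2": "Support"}
print(resolve_menu_handle("2", options))   # digit:2
print(resolve_menu_handle("9", options))   # default
print(resolve_menu_handle(None, options))  # timeout
```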

Gather

Collects multi-digit input from the caller (e.g., account number, PIN). The collected digits are stored as a context variable for use in subsequent HTTP callbacks.

Property	Description
Audio / TTS	Prompt audio (e.g., “Please enter your account number”)
Max Digits	Maximum number of digits to collect (default: 10)
Terminator	Character that ends input (default: `#`)
Store As	Variable name to store the input (e.g., `account_number`)
Timeout	Seconds to wait for input (default: 10)
Max Retries	Retries before following `max_retries` edge (default: 3)

Output handles: default, timeout, max_retries

Stored variables can be used in HTTP callback URL and body templates as {{variable_name}}.
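
To make the Max Digits, Terminator, and Store As properties concrete, here is a minimal Python sketch of digit collection. The function name `gather_digits` and the `context` dictionary are assumptions for illustration, and the sketch assumes the terminator itself is not stored, which the description above does not state explicitly.

```python
from typing import Dict, Iterable


def gather_digits(presses: Iterable[str], max_digits: int = 10, terminator: str = "#") -> str:
    """Collect key presses until the terminator or max_digits is reached."""
    collected = []
    for key in presses:
        if key == terminator:
            break                      # terminator ends input (assumed not stored)
        collected.append(key)
        if len(collected) >= max_digits:
            break                      # stop once the maximum length is reached
    return "".join(collected)


# "Store As" = account_number: keep the result in the call's context variables.
context: Dict[str, str] = {}
context["account_number"] = gather_digits("48213#")
print(context)  # {'account_number': '48213'}
```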

HTTP Callback

Makes an HTTP request to an external API during the call flow. Useful for looking up caller information, validating input, or triggering actions in other systems.

Property	Description
URL	The endpoint URL (supports `{{variable}}` interpolation)
Method	`GET` or `POST`
Headers	Custom HTTP headers (key-value pairs)
Body Template	Request body with variable interpolation (e.g., `{"phone": "{{caller_phone}}"}`)
Timeout	Request timeout in seconds (default: 10)
Store Response As	Variable name to store the response body

Built-in variables available for interpolation:

{{caller_phone}} — the caller’s phone number
{{call_id}} — the WhatsApp call ID
Any variable set by a previous Gather node

Output handles: default (continues regardless of response status)
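
The `{{variable}}` interpolation can be pictured with the following Python sketch. It is not Whatomate's template engine: `render_template` and `build_callback_request` are hypothetical helpers, the example URL is made up, the `Content-Type` header is an assumption, and leaving unknown placeholders untouched is a choice made for this example. Values are substituted verbatim, with no JSON or URL escaping.

```python
import re
import urllib.request
from typing import Dict

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, context: Dict[str, str]) -> str:
    """Replace each {{name}} with context[name]; unknown names are left as-is."""
    def substitute(match):
        return context.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template)


def build_callback_request(url_template: str, body_template: str,
                           context: Dict[str, str], method: str = "POST") -> urllib.request.Request:
    """Build (but do not send) the HTTP request for a callback node."""
    url = render_template(url_template, context)
    data = render_template(body_template, context).encode("utf-8") if method == "POST" else None
    return urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},  # assumed header
    )


context = {"caller_phone": "+15550100", "account_number": "48213"}
request = build_callback_request(
    "https://crm.example.com/lookup/{{account_number}}",  # hypothetical endpoint
    '{"phone": "{{caller_phone}}"}',
    context,
)
print(request.full_url)  # https://crm.example.com/lookup/48213
print(request.data)      # b'{"phone": "+15550100"}'
```

Sending the request (for example with `urllib.request.urlopen(request, timeout=10)`) is omitted so the sketch runs without network access.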

Transfer

Routes the caller to an agent team. This is a terminal node — it cannot have outgoing edges. Once the transfer starts, hold music plays while agents are notified.

Property	Description
Team	The agent team to transfer to

When a transfer executes:

Hold music plays for the caller
A transfer notification appears for all online agents in the target team
The first agent to accept connects to the caller via WebRTC audio bridge
If no agent accepts within the transfer timeout (`transfer_timeout_secs` in config.toml), the call is terminated
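
The "first agent to accept wins" rule and the timeout can be sketched in Python as below. This is a simplified model, not the actual implementation: `TransferOffer` is a hypothetical class, and the 120-second default only mirrors the sample `transfer_timeout_secs` value shown in the Configuration section.

```python
import threading
import time
from typing import Optional


class TransferOffer:
    """One pending transfer that several agents may try to accept."""

    def __init__(self, timeout_secs: float = 120.0, now: Optional[float] = None):
        self._lock = threading.Lock()
        started = time.monotonic() if now is None else now
        self._deadline = started + timeout_secs
        self.accepted_by: Optional[str] = None

    def accept(self, agent_id: str, now: Optional[float] = None) -> bool:
        """Return True only for the first agent to accept before the deadline."""
        current = time.monotonic() if now is None else now
        with self._lock:  # serialise concurrent accept attempts
            if self.accepted_by is not None or current > self._deadline:
                return False
            self.accepted_by = agent_id
            return True


offer = TransferOffer(timeout_secs=120, now=0.0)
print(offer.accept("agent-a", now=5.0))   # True: first to accept
print(offer.accept("agent-b", now=6.0))   # False: already taken

expired = TransferOffer(timeout_secs=120, now=0.0)
print(expired.accept("agent-c", now=121.0))  # False: timeout elapsed
```

The `now` parameters exist only to make the example deterministic; real code would rely on the monotonic clock.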

Goto Flow

Jumps to a different IVR flow. This is a terminal node. Use this to split complex IVR trees into reusable modules (e.g., a shared “Account Verification” flow).

Property	Description
Target Flow	The IVR flow to jump to

The target flow starts from its entry node. The caller’s context variables carry over.

Timing

Branches the call based on business hours. Configure a weekly schedule with per-day enable/disable and start/end times.

Property	Description
Timezone	IANA timezone (e.g., `Asia/Kolkata`, `America/New_York`)
Schedule	Per-day enabled/disabled with start and end times

Output handles:

in_hours — current time is within the configured schedule
out_of_hours — current time is outside the schedule
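
A business-hours check of this kind can be sketched with Python's standard `zoneinfo` module. The schedule shape (weekday index mapped to an enabled flag plus start and end times) and the function name `timing_handle` are assumptions for this example. The sketch treats the end time as exclusive and does not handle ranges that cross midnight.

```python
from datetime import datetime, time, timezone
from typing import Dict, Tuple
from zoneinfo import ZoneInfo

DaySchedule = Tuple[bool, time, time]  # (enabled, start, end)

# Hypothetical schedule: Monday (0) to Friday (4), 09:00 to 18:00.
WEEKDAY_9_TO_6: Dict[int, DaySchedule] = {
    day: (day < 5, time(9, 0), time(18, 0)) for day in range(7)
}


def timing_handle(now: datetime, tz_name: str, schedule: Dict[int, DaySchedule]) -> str:
    """Return "in_hours" or "out_of_hours" for a timezone-aware datetime."""
    local = now.astimezone(ZoneInfo(tz_name))   # convert to the configured IANA zone
    enabled, start, end = schedule[local.weekday()]
    if enabled and start <= local.time() < end:
        return "in_hours"
    return "out_of_hours"


monday_0400_utc = datetime(2024, 1, 1, 4, 0, tzinfo=timezone.utc)
# 09:30 on Monday in Asia/Kolkata
print(timing_handle(monday_0400_utc, "Asia/Kolkata", WEEKDAY_9_TO_6))      # in_hours
# 23:00 on Sunday in America/New_York
print(timing_handle(monday_0400_utc, "America/New_York", WEEKDAY_9_TO_6))  # out_of_hours
```

The same instant lands in different branches depending on the configured timezone, which is why the Timezone property matters.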

Hangup

Plays an optional goodbye message and terminates the call. This is a terminal node — it cannot have outgoing edges.

Property	Description
Audio / TTS	Optional goodbye message

Building a Flow

Click Add Flow, give the flow a name, select the WhatsApp account, and enable “Active” and “Call Start”
Drag nodes from the palette at the top onto the canvas
Click a node to configure its properties in the right panel
Connect nodes by dragging from an output handle (bottom/right of a node) to the input (top) of another node
The first node added becomes the entry node (marked with a green indicator). To change the entry node, delete and re-add nodes so that the intended node is added first
Click Save to persist the flow

Example Flow

Below is a screenshot of an example IVR flow built in the visual editor:

Call Transfers

When a caller reaches a Transfer node:

Hold music plays for the caller
A transfer notification appears for all online agents in the target team
The first agent to accept connects to the caller
The call is bridged — both parties hear each other through the browser

Call Hold

During an active call, agents can put the caller on hold by clicking the Hold button (pause icon) in the call panel. Hold music plays immediately to the caller with seamless audio continuity — no gaps or clicks.

Hold: Stops the audio bridge between agent and caller, starts playing the configured hold music file on loop
Resume: Stops hold music and restores two-way audio instantly

The hold button toggles between pause (hold) and play (resume) icons, with an amber highlight when the call is on hold. The status text in the call panel shows “On Hold” while held.

Hold music is configurable per organization from Settings > Organization > Calling. The default file is set in config.toml via hold_music_file.

Call Recording

When enabled, calls are recorded during the agent-caller bridge phase. Recordings are saved as OGG/Opus files and uploaded to S3.

To enable, add to your config.toml:

[calling]
recording_enabled = true

[storage]
s3_bucket = "your-bucket"
s3_region = "us-east-1"
s3_key = "AKIA..."
s3_secret = "..."

Recordings are accessible from the call log detail view, which generates time-limited presigned URLs for playback.

Call Logs

All calls (incoming and outgoing) are logged with:

Caller phone number and contact name
Direction, status, and duration
Who disconnected the call (client, agent, or system)
IVR flow traversal path (shown as a step-by-step trace)
Agent who handled the call
Recording playback (if enabled)

Filter logs by status, direction, account, or IVR flow.

Configuration

Add to your config.toml:

[calling]
audio_dir = "./audio"                  # Directory for IVR audio files
hold_music_file = "hold-music.ogg"     # Hold music file (relative to audio_dir)
ringback_file = "ringback.ogg"         # Ringback tone for outgoing calls
max_call_duration = 3600               # Max call duration in seconds
transfer_timeout_secs = 120            # Seconds to wait for agent to accept
recording_enabled = false              # Enable call recording to S3
udp_port_min = 10000                   # WebRTC UDP port range start
udp_port_max = 10100                   # WebRTC UDP port range end
public_ip = ""                         # Public IP for NAT (required on cloud/AWS)
relay_only = false                     # Force all media through TURN relay

# ICE servers (STUN/TURN) for WebRTC connectivity
[[calling.ice_servers]]
urls = ["stun:stun.l.google.com:19302"]

[[calling.ice_servers]]
urls = ["turn:your-turn-server:3478"]
username = "user"
credential = "pass"

Enabling Calling per Organization

Calling is enabled per-organization in the database. Set calling_enabled = true on the organization record to allow calls for that org.

Text-to-Speech (IVR Greetings)

Whatomate uses Piper for offline text-to-speech generation. When admins type greeting text in the IVR flow editor, the server generates OGG/Opus audio files using Piper + opusenc. This is optional — you can also upload pre-recorded audio files directly.

Install Dependencies

Piper requires the espeak-ng shared library at runtime, and opusenc is needed to convert WAV output to OGG/Opus:

# Debian/Ubuntu
sudo apt install espeak-ng opus-tools

# Fedora
sudo dnf install espeak-ng opus-tools

Install Piper

# Download Piper binary (Linux x86_64)
wget https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_x86_64.tar.gz
tar xf piper_linux_x86_64.tar.gz
sudo mv piper/piper /usr/local/bin/

Download a Voice Model

Piper voices are available at huggingface.co/rhasspy/piper-voices (mirrors at OHF-Voice). Each voice has a .onnx model file and a .onnx.json config file — both are required.

Choosing a voice:

Browse voices and listen to samples at rhasspy.github.io/piper-samples
Voices come in quality levels: low, medium, and high — medium is a good balance of quality and speed
For US English, en_US-lessac-medium is recommended (~60MB)

mkdir -p /opt/piper/models

# Download model and config
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx \
  -O /opt/piper/models/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json \
  -O /opt/piper/models/en_US-lessac-medium.onnx.json

Configure TTS

Add to your config.toml:

[tts]
piper_binary = "/usr/local/bin/piper"
piper_model = "/opt/piper/models/en_US-lessac-medium.onnx"
# opusenc_binary = "opusenc"  # defaults to finding in PATH

Test TTS

echo "Press 1 for sales, press 2 for support." | piper \
  --model /opt/piper/models/en_US-lessac-medium.onnx \
  --output_file test.wav
opusenc --bitrate 24 test.wav test.ogg
# Play: aplay test.wav  OR  ffplay test.ogg
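
The same two-step pipeline can be driven from a script. The Python sketch below only mirrors the shell commands above using `subprocess`; it is not how the Whatomate server invokes Piper, and the helper names `tts_commands` and `synthesize` are hypothetical. It assumes `piper` and `opusenc` are on the PATH.

```python
import subprocess
from typing import List, Tuple


def tts_commands(model: str, wav_path: str, ogg_path: str,
                 bitrate_kbps: int = 24) -> Tuple[List[str], List[str]]:
    """Build the piper and opusenc command lines used in the shell test above."""
    piper_cmd = ["piper", "--model", model, "--output_file", wav_path]
    opusenc_cmd = ["opusenc", "--bitrate", str(bitrate_kbps), wav_path, ogg_path]
    return piper_cmd, opusenc_cmd


def synthesize(text: str, model: str, wav_path: str, ogg_path: str) -> None:
    """Run piper (text on stdin) and then opusenc; raises if either step fails."""
    piper_cmd, opusenc_cmd = tts_commands(model, wav_path, ogg_path)
    subprocess.run(piper_cmd, input=text.encode("utf-8"), check=True)
    subprocess.run(opusenc_cmd, check=True)


piper_cmd, opusenc_cmd = tts_commands(
    "/opt/piper/models/en_US-lessac-medium.onnx", "test.wav", "test.ogg"
)
print(" ".join(piper_cmd))
print(" ".join(opusenc_cmd))
```

Calling `synthesize("Press 1 for sales, press 2 for support.", ...)` with the same paths would produce `test.wav` and `test.ogg`, provided both binaries are installed.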

Firewall & Network

For WebRTC to work, ensure the following ports are open:

Port	Protocol	Purpose
10000–10100	UDP	WebRTC media (configurable via `udp_port_min`/`udp_port_max`)
3478	TCP/UDP	TURN server (if using relay)
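
A quick local check of the media port range can be sketched in Python as follows. It only tests whether each UDP port can be bound on this machine, so it finds ports already taken by another process; it cannot verify firewall or cloud security-group rules, which have to be checked from outside the host. The function name `unavailable_udp_ports` is hypothetical, and the check should run while Whatomate is stopped, otherwise its own ports would show up as busy.

```python
import socket
from typing import List


def unavailable_udp_ports(port_min: int, port_max: int, host: str = "0.0.0.0") -> List[int]:
    """Return the ports in [port_min, port_max] that cannot be bound for UDP."""
    blocked = []
    for port in range(port_min, port_max + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                blocked.append(port)  # in use, or binding is not permitted
    return blocked


if __name__ == "__main__":
    busy = unavailable_udp_ports(10000, 10100)
    print(f"{len(busy)} of 101 ports unavailable: {busy}")
```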
