Calling

Overview

Whatomate supports WhatsApp voice calling with WebRTC-based audio bridging. Incoming calls are handled by an Interactive Voice Response (IVR) system built using a visual drag-and-drop flow editor. The IVR plays greetings, collects DTMF input, makes HTTP callbacks, and routes callers to agent teams. Agents answer calls from the browser — no phone hardware required.

Visual Flow Editor

Drag-and-drop node-based IVR builder with 8 node types

Call Transfers

Route callers to agent teams with hold music

Call Hold

Put calls on hold with instant hold music and resume anytime

Call Recording

Record agent-caller audio as OGG/Opus, stored in S3

Outgoing Calls

Agents can place outbound calls to contacts from the chat view

How It Works

  1. A WhatsApp user calls your business number
  2. WhatsApp sends a webhook with the SDP offer
  3. Whatomate establishes a WebRTC peer connection and runs the IVR flow
  4. The caller hears greetings, presses digits (DTMF) to navigate menus, and provides input
  5. Based on the flow logic, the call is transferred to an agent team, routed to another flow, or hung up
  6. An available agent accepts the transfer in the browser and speaks with the caller

IVR Flow Builder

IVR flows are configured from Settings > IVR Flows in the admin UI. The editor uses a visual canvas where you drag and drop nodes, then connect them with edges to define the call flow.

Creating a Flow

Each flow has:

  • Name and optional description
  • WhatsApp Account — which phone number this flow handles
  • Active toggle — disabled flows are skipped
  • Call Start toggle — marks this as the entry flow for incoming calls

Node Types

The flow editor provides 8 node types. Drag them from the palette onto the canvas, configure their properties in the side panel, and connect them with edges.

Greeting

Plays an audio message to the caller. The audio can be uploaded as a file or generated from text using TTS (text-to-speech).

| Property | Description |
| --- | --- |
| Audio | Upload an audio file (OGG, MP3, WAV, etc.); it is automatically transcoded to Opus |
| TTS Text | Type text to generate audio using Piper TTS |
| Interruptible | If enabled, the caller can press a digit to skip the greeting |

The greeting node has one output (default) that connects to the next node.

Menu

Plays an audio prompt and waits for the caller to press a DTMF digit. Routes the call based on the digit pressed.

| Property | Description |
| --- | --- |
| Audio / TTS | The prompt audio (e.g., “Press 1 for sales, press 2 for support”) |
| Timeout | Seconds to wait for input (default: 10) |
| Max Retries | Number of retries on invalid/no input before following the max_retries edge (default: 3) |
| Options | Digit-to-label mappings (e.g., 1 → “Sales”, 2 → “Support”) |

Output handles:

  • digit:1, digit:2, etc. — one per configured option
  • timeout — triggered when the caller doesn’t press anything
  • max_retries — triggered after exhausting all retries
  • default — fallback for unconfigured digits

Connect each output handle to the appropriate next node.
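
The handle selection described above can be sketched as a small function. This is only an illustration under assumptions: the function name, the `retries_used` counter, and the order in which the checks are applied are hypothetical and are not taken from the Whatomate source.

```python
def menu_handle(pressed, options, retries_used, max_retries=3):
    """Pick the output handle for one round of menu input (illustrative only).

    pressed      -- the DTMF key as a string, or None if the timeout elapsed
    options      -- configured digits, e.g. {"1": "Sales", "2": "Support"}
    retries_used -- retries already spent before this round (assumed counter)
    """
    if pressed in options:
        return f"digit:{pressed}"      # a configured option was pressed
    if retries_used >= max_retries:
        return "max_retries"           # bad input and no retries left
    if pressed is None:
        return "timeout"               # caller pressed nothing
    return "default"                   # a digit with no configured option
```

For example, `menu_handle("2", {"1": "Sales", "2": "Support"}, 0)` selects the `digit:2` handle.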

Gather

Collects multi-digit input from the caller (e.g., account number, PIN). The collected digits are stored as a context variable for use in subsequent HTTP callbacks.

| Property | Description |
| --- | --- |
| Audio / TTS | Prompt audio (e.g., “Please enter your account number”) |
| Max Digits | Maximum number of digits to collect (default: 10) |
| Terminator | Character that ends input (default: #) |
| Store As | Variable name to store the input (e.g., account_number) |
| Timeout | Seconds to wait for input (default: 10) |
| Max Retries | Retries before following max_retries edge (default: 3) |

Output handles: default, timeout, max_retries

Stored variables can be used in HTTP callback URL and body templates as {{variable_name}}.
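
As a rough sketch of how Max Digits and Terminator interact, the function below collects keys until either limit is reached. The function name is hypothetical, and ignoring non-digit keys such as `*` is an assumption; the actual engine may treat them differently.

```python
def gather_digits(keys, max_digits=10, terminator="#"):
    """Collect digits from a sequence of DTMF keys (illustrative only)."""
    collected = []
    for key in keys:
        if key == terminator:
            break                      # terminator ends input early
        if key.isdigit():
            collected.append(key)      # assumption: other keys are ignored
        if len(collected) >= max_digits:
            break                      # stop once Max Digits is reached
    return "".join(collected)
```

With the defaults, the key sequence `12345#` yields `12345`, which would then be stored under the Store As name (for example `account_number`).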

HTTP Callback

Makes an HTTP request to an external API during the call flow. Useful for looking up caller information, validating input, or triggering actions in other systems.

| Property | Description |
| --- | --- |
| URL | The endpoint URL (supports {{variable}} interpolation) |
| Method | GET or POST |
| Headers | Custom HTTP headers (key-value pairs) |
| Body Template | Request body with variable interpolation (e.g., {"phone": "{{caller_phone}}"}) |
| Timeout | Request timeout in seconds (default: 10) |
| Store Response As | Variable name to store the response body |

Built-in variables available for interpolation:

  • {{caller_phone}} — the caller’s phone number
  • {{call_id}} — the WhatsApp call ID
  • Any variable set by a previous Gather node

Output handles: default (continues regardless of response status)
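
The `{{variable}}` interpolation can be pictured with a minimal sketch like the one below. The function name, the example URL, and the sample values are hypothetical. Leaving unknown placeholders untouched is an assumption, and the sketch performs no JSON or URL escaping.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_template(template, variables):
    """Replace {{name}} placeholders with values from `variables`."""
    def substitute(match):
        name = match.group(1)
        # Assumption: an unknown variable is left as written.
        return str(variables.get(name, match.group(0)))
    return _PLACEHOLDER.sub(substitute, template)

# Hypothetical call context: two built-ins plus one Gather variable.
context = {
    "caller_phone": "+15551234567",
    "call_id": "call-001",
    "account_number": "12345",
}
url = render_template("https://api.example.com/accounts/{{account_number}}", context)
body = render_template('{"phone": "{{caller_phone}}"}', context)
```

Here `url` ends in `/accounts/12345` and `body` contains the caller's phone number in place of the placeholder.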

Transfer

Routes the caller to an agent team. This is a terminal node — it cannot have outgoing edges. Once the transfer starts, hold music plays while agents are notified.

| Property | Description |
| --- | --- |
| Team | The agent team to transfer to |

When a transfer executes:

  1. Hold music plays for the caller
  2. A transfer notification appears for all online agents in the target team
  3. The first agent to accept connects to the caller via WebRTC audio bridge
  4. If no agent accepts within the transfer timeout (transfer_timeout_secs in config.toml), the call is terminated
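
The "first agent to accept" rule and the timeout can be modelled with a small class. This is a sketch under assumptions, not the Whatomate implementation: the class and method names are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
import threading

class TransferOffer:
    """One pending transfer offered to every agent in a team (illustrative)."""

    def __init__(self, team_agents, started_at, timeout_secs=120):
        self.team_agents = set(team_agents)
        self.started_at = started_at
        self.timeout_secs = timeout_secs
        self.accepted_by = None
        self._lock = threading.Lock()   # serialises competing accepts

    def expired(self, now):
        """True if nobody accepted before the timeout elapsed."""
        return self.accepted_by is None and now - self.started_at >= self.timeout_secs

    def accept(self, agent_id, now):
        """Return True only for the first valid accept within the timeout."""
        with self._lock:
            if agent_id not in self.team_agents:
                return False
            if self.accepted_by is not None:
                return False            # another agent already took the call
            if now - self.started_at >= self.timeout_secs:
                return False            # too late, the call is being terminated
            self.accepted_by = agent_id
            return True
```

If two agents click accept at nearly the same moment, only one call to `accept` returns `True`; the other agent's request is rejected.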

Goto Flow

Jumps to a different IVR flow. This is a terminal node. Use this to split complex IVR trees into reusable modules (e.g., a shared “Account Verification” flow).

| Property | Description |
| --- | --- |
| Target Flow | The IVR flow to jump to |

The target flow starts from its entry node. The caller’s context variables carry over.

Timing

Branches the call based on business hours. Configure a weekly schedule with per-day enable/disable and start/end times.

| Property | Description |
| --- | --- |
| Timezone | IANA timezone (e.g., Asia/Kolkata, America/New_York) |
| Schedule | Per-day enabled/disabled with start and end times |

Output handles:

  • in_hours — current time is within the configured schedule
  • out_of_hours — current time is outside the schedule
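
A business-hours check of this kind might look like the sketch below. The schedule layout (a dict keyed by weekday, Monday = 0) and the function names are hypothetical, and treating the end time as exclusive is an assumption.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

def timing_handle(local_now: datetime, schedule) -> str:
    """Return "in_hours" or "out_of_hours" for a time already in the node's timezone."""
    day = schedule.get(local_now.weekday())   # Monday is 0
    if not day or not day["enabled"]:
        return "out_of_hours"
    if day["start"] <= local_now.time() < day["end"]:
        return "in_hours"
    return "out_of_hours"

def timing_handle_for_zone(utc_now: datetime, tz_name: str, schedule) -> str:
    """Convert an aware UTC timestamp to the configured IANA zone, then check it."""
    return timing_handle(utc_now.astimezone(ZoneInfo(tz_name)), schedule)

# Hypothetical schedule: open Mondays 09:00-18:00, closed Sundays.
schedule = {
    0: {"enabled": True, "start": time(9, 0), "end": time(18, 0)},
    6: {"enabled": False, "start": time(0, 0), "end": time(0, 0)},
}
```

Converting to the configured timezone before comparing matters: 06:00 UTC on a Monday is 11:30 in Asia/Kolkata, which falls inside the Monday window in this schedule.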

Hangup

Plays an optional goodbye message and terminates the call. This is a terminal node — it cannot have outgoing edges.

| Property | Description |
| --- | --- |
| Audio / TTS | Optional goodbye message |

Building a Flow

  1. Click Add Flow, give the flow a name, select the WhatsApp account, and enable “Active” and “Call Start”
  2. Drag nodes from the palette at the top onto the canvas
  3. Click a node to configure its properties in the right panel
  4. Connect nodes by dragging from an output handle (bottom/right of a node) to the input (top) of another node
  5. The first node added becomes the entry node (marked with a green indicator). You can change this by deleting and re-adding nodes
  6. Click Save to persist the flow
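
Conceptually, a saved flow is a graph of nodes joined by edges that leave from named output handles. The sketch below checks two rules stated in this document: the entry node must exist, and terminal nodes (Transfer, Goto Flow, Hangup) cannot have outgoing edges. The data layout, the type identifiers such as `goto_flow`, and the function name are hypothetical and do not reflect Whatomate's storage format.

```python
# Node types documented as terminal (identifiers are assumed spellings).
TERMINAL_TYPES = {"transfer", "goto_flow", "hangup"}

def validate_flow(nodes, edges, entry_id):
    """Return a list of problems found in a flow graph (illustrative only).

    nodes -- {node_id: node_type}
    edges -- list of (source_id, output_handle, target_id)
    """
    problems = []
    if entry_id not in nodes:
        problems.append(f"entry node {entry_id!r} does not exist")
    for source, handle, target in edges:
        if source not in nodes or target not in nodes:
            problems.append(f"edge {source}->{target} references an unknown node")
        elif nodes[source] in TERMINAL_TYPES:
            problems.append(f"{nodes[source]} node {source!r} cannot have outgoing edges")
    return problems

# Greeting -> Menu -> Transfer (digit 1) or Hangup (timeout).
nodes = {"g": "greeting", "m": "menu", "t": "transfer", "h": "hangup"}
edges = [("g", "default", "m"), ("m", "digit:1", "t"), ("m", "timeout", "h")]
```

Calling `validate_flow(nodes, edges, "g")` on this example returns an empty list; adding an edge that leaves the hangup node would produce one problem.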

Example Flow

Below is a screenshot of an example IVR flow built in the visual editor:

Example IVR flow in the visual editor

Call Transfers

When a caller reaches a Transfer node:

  1. Hold music plays for the caller
  2. A transfer notification appears for all online agents in the target team
  3. The first agent to accept connects to the caller
  4. The call is bridged — both parties hear each other through the browser

Call Hold

During an active call, agents can put the caller on hold by clicking the Hold button (pause icon) in the call panel. Hold music plays immediately to the caller with seamless audio continuity — no gaps or clicks.

  • Hold: Stops the audio bridge between agent and caller, starts playing the configured hold music file on loop
  • Resume: Stops hold music and restores two-way audio instantly

The hold button toggles between pause (hold) and play (resume) icons, with an amber highlight when the call is on hold. The status text in the call panel shows “On Hold” while held.

Hold music is configurable per organization from Settings > Organization > Calling. The default file is set in config.toml via hold_music_file.

Call Recording

When enabled, calls are recorded during the agent-caller bridge phase. Recordings are saved as OGG/Opus files and uploaded to S3.

To enable, add to your config.toml:

[calling]
recording_enabled = true
[storage]
s3_bucket = "your-bucket"
s3_region = "us-east-1"
s3_key = "AKIA..."
s3_secret = "..."

Recordings are accessible from the call log detail view, which generates time-limited presigned URLs for playback.

Call Logs

All calls (incoming and outgoing) are logged with:

  • Caller phone number and contact name
  • Direction, status, and duration
  • Who disconnected the call (client, agent, or system)
  • IVR flow traversal path (shown as a step-by-step trace)
  • Agent who handled the call
  • Recording playback (if enabled)

Filter logs by status, direction, account, or IVR flow.

Configuration

Add to your config.toml:

[calling]
audio_dir = "./audio" # Directory for IVR audio files
hold_music_file = "hold-music.ogg" # Hold music file (relative to audio_dir)
ringback_file = "ringback.ogg" # Ringback tone for outgoing calls
max_call_duration = 3600 # Max call duration in seconds
transfer_timeout_secs = 120 # Seconds to wait for agent to accept
recording_enabled = false # Enable call recording to S3
udp_port_min = 10000 # WebRTC UDP port range start
udp_port_max = 10100 # WebRTC UDP port range end
public_ip = "" # Public IP for NAT (required on cloud/AWS)
relay_only = false # Force all media through TURN relay
# ICE servers (STUN/TURN) for WebRTC connectivity
[[calling.ice_servers]]
urls = ["stun:stun.l.google.com:19302"]
[[calling.ice_servers]]
urls = ["turn:your-turn-server:3478"]
username = "user"
credential = "pass"
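
Before starting the server, it can help to sanity-check a few of these values. The sketch below is not part of Whatomate; the function name and the checks are illustrative. It takes a dict shaped like the [calling] table (for example, as produced by Python 3.11's `tomllib`) and relies on the general WebRTC fact that relay-only media needs a TURN server.

```python
def check_calling_config(calling):
    """Return a list of problems in a [calling]-style dict (illustrative only)."""
    problems = []

    # The UDP media range must be a valid, non-empty port range.
    lo, hi = calling["udp_port_min"], calling["udp_port_max"]
    if not (1 <= lo <= hi <= 65535):
        problems.append("udp_port_min/udp_port_max must satisfy 1 <= min <= max <= 65535")

    # Relay-only media can only work if at least one TURN URL is configured.
    if calling.get("relay_only", False):
        urls = [u for server in calling.get("ice_servers", []) for u in server.get("urls", [])]
        if not any(u.startswith(("turn:", "turns:")) for u in urls):
            problems.append("relay_only is set but no TURN server is listed in ice_servers")

    return problems

# Mirrors part of the sample configuration above.
sample = {
    "udp_port_min": 10000,
    "udp_port_max": 10100,
    "relay_only": False,
    "ice_servers": [{"urls": ["stun:stun.l.google.com:19302"]}],
}
```

For the sample values, `check_calling_config(sample)` returns an empty list.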

Enabling Calling per Organization

Calling is enabled per-organization in the database. Set calling_enabled = true on the organization record to allow calls for that org.

Text-to-Speech (IVR Greetings)

Whatomate uses Piper for offline text-to-speech generation. When admins type greeting text in the IVR flow editor, the server generates OGG/Opus audio files using Piper + opusenc. This is optional — you can also upload pre-recorded audio files directly.

Install Dependencies

Piper requires the espeak-ng shared library at runtime, and opusenc is needed to convert WAV output to OGG/Opus:

# Debian/Ubuntu
sudo apt install espeak-ng opus-tools
# Fedora
sudo dnf install espeak-ng opus-tools

Install Piper

# Download Piper binary (Linux x86_64)
wget https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_x86_64.tar.gz
tar xf piper_linux_x86_64.tar.gz
sudo mv piper/piper /usr/local/bin/

Download a Voice Model

Piper voices are available at huggingface.co/rhasspy/piper-voices (mirrors at OHF-Voice). Each voice has a .onnx model file and a .onnx.json config file — both are required.

Choosing a voice:

  • Browse voices and listen to samples at rhasspy.github.io/piper-samples
  • Voices come in quality levels: low, medium, and high; medium is a good balance of quality and speed
  • For US English, en_US-lessac-medium is recommended (~60MB)

mkdir -p /opt/piper/models
# Download model and config
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx \
-O /opt/piper/models/en_US-lessac-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json \
-O /opt/piper/models/en_US-lessac-medium.onnx.json

Configure TTS

Add to your config.toml:

[tts]
piper_binary = "/usr/local/bin/piper"
piper_model = "/opt/piper/models/en_US-lessac-medium.onnx"
# opusenc_binary = "opusenc" # defaults to finding in PATH

Test TTS

echo "Press 1 for sales, press 2 for support." | piper \
--model /opt/piper/models/en_US-lessac-medium.onnx \
--output_file test.wav
opusenc --bitrate 24 test.wav test.ogg
# Play: aplay test.wav OR ffplay test.ogg
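
The same two-step pipeline (Piper writes a WAV file, opusenc converts it to OGG/Opus) can be scripted. The sketch below reuses only the flags shown in the commands above; the function names are hypothetical, and this is not how the Whatomate server itself invokes the tools.

```python
import subprocess

def tts_commands(model_path, wav_path, ogg_path, bitrate_kbps=24):
    """Build the piper and opusenc argument lists used in the manual test."""
    piper_cmd = ["piper", "--model", model_path, "--output_file", wav_path]
    opusenc_cmd = ["opusenc", "--bitrate", str(bitrate_kbps), wav_path, ogg_path]
    return piper_cmd, opusenc_cmd

def synthesize(text, model_path, wav_path, ogg_path):
    """Run both tools; requires piper and opusenc to be on PATH."""
    piper_cmd, opusenc_cmd = tts_commands(model_path, wav_path, ogg_path)
    # Piper reads the text to speak from standard input.
    subprocess.run(piper_cmd, input=text.encode("utf-8"), check=True)
    subprocess.run(opusenc_cmd, check=True)
```

Calling `synthesize("Press 1 for sales.", "/opt/piper/models/en_US-lessac-medium.onnx", "test.wav", "test.ogg")` should produce the same files as the shell commands, and raises `CalledProcessError` if either tool exits with an error.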

Firewall & Network

For WebRTC to work, ensure the following ports are open:

| Port | Protocol | Purpose |
| --- | --- | --- |
| 10000–10100 | UDP | WebRTC media (configurable via udp_port_min/udp_port_max) |
| 3478 | TCP/UDP | TURN server (if using relay) |