Outbound Voice Agent hero
Strongly Certified · Streaming Workflow

Outbound Voice Agent

Twilio Media Streams in. mulaw 8 kHz on the wire, PCM 16 kHz inside the loop.

Voice agent for telephony - inbound or outbound calls. Twilio Media Streams WebSocket on both sides, with mulaw codec at the carrier and PCM 16 kHz inside the pipeline so STT, conversation memory, LLM, and TTS all work unchanged. Configurable DTMF + VAD barge-in.

≤1.5s
First audio response (p95)
≤4s
Turn complete (p95)
Barge-in
DTMF + VAD-driven interrupt

The voice loop, end-to-end.

No black box. Each step is a typed-frame node you can edit, monitor, and replace.

01

Twilio answers the call (or your dialler initiates the outbound). Twilio's TwiML hands the audio to a Media Streams WebSocket pointing at this workflow's session.

02

streaming-twilio-trigger parses the Twilio JSON envelope, base64-decodes mulaw, ratecv-upsamples 8 kHz to 16 kHz, emits AudioFrame.

03

Standard voice loop: VAD, STT, turn-detection, conversation-memory, LLM, TTS at 16 kHz PCM.

04

streaming-twilio-response ratecv-downsamples back to 8 kHz, mulaw-encodes via stdlib audioop, base64-wraps, frames as Twilio media envelopes. InterruptFrame triggers a Twilio clear envelope for barge-in.

Built for production. Day Two-ready.

Streaming graph contract, observability, and cost discipline come standard. The agent ships with a full test suite that runs in CI on every node version bump.

Twilio Media Streams pair

streaming-twilio-trigger + streaming-twilio-response. Both nodes share the same WebSocket the runtime accepts on the configured session URL. SDP/ICE not required - Twilio handles the carrier-side codec.

Trigger + response pairSingle WebSocketCarrier-handled SDP

Real codec, no stubs

stdlib audioop for mulaw <-> PCM linear and ratecv resampling. Filter state carries across chunks so the upsampled inbound and downsampled outbound stay phase-continuous.

audioop stdlibPhase-continuousPer-call state

Configurable barge-in

interrupt_dtmf_digit config maps a DTMF press to InterruptFrame. VAD also fires Interrupt on caller speech-onset. Both trigger streaming-twilio-response to send a Twilio clear envelope, flushing buffered outbound audio at the carrier.

DTMF interruptVAD interruptTwilio clear

Standard voice loop

VAD, STT, turn-detection, conversation-memory, LLM, TTS - same shape as the WebSocket-driven voice templates. Telephony is just a different transport at the edges.

Same loop shapeDrop-in trigger / responseFamiliar pattern

Memory feedback edge

LLM response feeds back into conversation-memory as the assistant turn (ADR-S16 feedback edge with max_iterations: 1000). The graph_validator accepts the cycle.

ADR-S16Coherent memoryValidator-clean

Live span tree

stream_sid, call_sid, ratecv state, mulaw decode counts all land on the call's span tree. Replay any call, any turn, any time.

ADR-S14Carrier IDsReplayable

Real services. Your stack.

Every dependency is a registered Strongly service or a model you control. Swap any one of them in the install wizard. The graph stays intact.

Telephony trigger
streaming-twilio-trigger - Twilio Media Streams in
Telephony response
streaming-twilio-response - Twilio Media Streams out
STT / LLM / TTS
All swappable in the wizard - whisper-1 / gpt-4o-mini / tts-1 default
audioop stdlib
mulaw codec + ratecv resampling

Tune it. Don't fork it.

The marketplace template is the graph. Every customisation below is a config change or a single-node addition - never a rewrite.

Inbound vs outbound

The graph is identical; only the Twilio call origination changes. Outbound: your dialler creates the call and points the answer-TwiML at this workflow's session WebSocket.

Different barge-in digit

Edit tel_in.config.interrupt_dtmf_digit to # or *, or any single DTMF digit. Empty disables DTMF barge-in (VAD-only still works).

Slower / faster TTS

Pick a different TTS model in the install wizard. tts-1 default; tts-1-hd, elevenlabs-v2 also work. The graph stays the same.

Tighter scope

Edit memory.system_prompt to constrain the agent to your business: hours, refusal policy, compliance disclosures.

CRM grounding

Add streaming-db-reader between trigger and LLM, configured to load the caller's CRM record by phone number (carried in the StartFrame metadata as `from`).

Production. Not pilots.

We don't leave until it runs. Talk to a forward-deployed engineer about deploying Outbound Voice Agent into your environment with your STT, your LLM, your TTS, your data.

Schedule a Demo