Twilio Media Streams in. mulaw 8 kHz on the wire, PCM 16 kHz inside the loop.
Voice agent for telephony - inbound or outbound calls. Twilio Media Streams WebSocket on both sides, with mulaw 8 kHz at the carrier and 16 kHz PCM inside the pipeline, so STT, conversation memory, LLM, and TTS all work unchanged. Configurable DTMF and VAD barge-in.
No black box. Each step is a typed-frame node you can edit, monitor, and replace.
Twilio answers the call (or your dialler initiates the outbound). Twilio's TwiML hands the audio to a Media Streams WebSocket pointing at this workflow's session.
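As a rough illustration of the hand-off, the answer-TwiML can be generated with a few lines of Python. This is a minimal sketch: `build_stream_twiml` and the example URL are hypothetical, and the real session URL comes from your deployment. It uses Twilio's `<Connect><Stream>` verb, which opens a bidirectional Media Stream.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(session_ws_url: str) -> str:
    """Return TwiML that connects the call audio to a Media Streams WebSocket.

    session_ws_url is a placeholder for the workflow's session WebSocket URL.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Stream url="wss://..."> tells Twilio where to open the WebSocket.
    ET.SubElement(connect, "Stream", {"url": session_ws_url})
    return ET.tostring(response, encoding="unicode")


# Hypothetical URL, for illustration only.
twiml = build_stream_twiml("wss://example.invalid/sessions/demo")
print(twiml)
```

The same TwiML serves inbound and outbound calls; only where it is returned from (the number's voice webhook or the dialler's answer URL) differs.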
streaming-twilio-trigger parses the Twilio JSON envelope, base64-decodes mulaw, ratecv-upsamples 8 kHz to 16 kHz, emits AudioFrame.
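The inbound decode path can be sketched as follows. This is an illustration under stated assumptions, not the node's actual code: the function names are hypothetical, and the G.711 mulaw expansion is written out in pure Python rather than calling `audioop` (which was removed from the standard library in Python 3.13). The 8 kHz to 16 kHz resampling step is left out here.

```python
import base64
import json
import struct
from typing import Optional


def ulaw_to_linear(u: int) -> int:
    """Expand one G.711 mulaw byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF                      # mulaw bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def decode_media_message(raw: str) -> Optional[bytes]:
    """Return 8 kHz little-endian 16-bit PCM for a Twilio 'media' message.

    Returns None for other events ('connected', 'start', 'stop', ...).
    """
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None
    mulaw = base64.b64decode(msg["media"]["payload"])
    samples = [ulaw_to_linear(b) for b in mulaw]
    return struct.pack(f"<{len(samples)}h", *samples)
```

A real trigger would wrap the returned bytes, after upsampling, in whatever frame type the runtime expects.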
Standard voice loop: VAD, STT, turn-detection, conversation-memory, LLM, TTS at 16 kHz PCM.
streaming-twilio-response ratecv-downsamples back to 8 kHz, mulaw-encodes via stdlib audioop, base64-wraps, frames as Twilio media envelopes. InterruptFrame triggers a Twilio clear envelope for barge-in.
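The outbound side can be sketched in the same spirit. The helper names below are hypothetical and the mulaw compression is a pure-Python stand-in for `audioop`; the input is assumed to be already downsampled to 8 kHz. The `media` and `clear` message shapes follow Twilio's Media Streams protocol.

```python
import base64
import json
import struct

BIAS = 0x84    # G.711 mulaw bias
CLIP = 32635   # largest magnitude that survives adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit PCM sample to a G.711 mulaw byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment: the position of the highest set bit, from bit 14 down to bit 7.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def media_envelope(stream_sid: str, pcm_8k: bytes) -> str:
    """Wrap 8 kHz little-endian 16-bit PCM as a Twilio outbound 'media' message."""
    count = len(pcm_8k) // 2
    samples = struct.unpack(f"<{count}h", pcm_8k[: count * 2])
    payload = bytes(linear_to_ulaw(s) for s in samples)
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(payload).decode("ascii")},
    })


def clear_envelope(stream_sid: str) -> str:
    """Build the 'clear' message that drops audio Twilio has buffered but not yet played."""
    return json.dumps({"event": "clear", "streamSid": stream_sid})
```

On an interrupt, sending `clear_envelope(stream_sid)` before any new `media` messages keeps stale TTS audio from playing over the caller.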
Streaming graph contract, observability, and cost discipline come standard. The agent ships with a full test suite that runs in CI on every node version bump.
streaming-twilio-trigger + streaming-twilio-response. Both nodes share the same WebSocket, which the runtime accepts on the configured session URL. No SDP/ICE negotiation is required - Twilio handles the carrier-side codec.
stdlib audioop for mulaw <-> linear PCM conversion and ratecv resampling. Filter state carries across chunks so the upsampled inbound and downsampled outbound stay phase-continuous.
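To see why carried state matters, here is a toy 2x upsampler. It is a sketch only: linear interpolation stands in for the actual `ratecv` filter, and `upsample_2x` is a hypothetical name. It mirrors the `(fragment, state)` return shape of `audioop.ratecv`, where the state from one call is passed into the next.

```python
from typing import List, Optional, Tuple


def upsample_2x(samples: List[int], state: Optional[int]) -> Tuple[List[int], Optional[int]]:
    """Double the sample rate by inserting midpoints between neighbours.

    state is the last input sample of the previous chunk (None on the first call),
    so the midpoint across the chunk boundary is not lost.
    """
    out: List[int] = []
    prev = state
    for s in samples:
        if prev is not None:
            out.append((prev + s) // 2)   # interpolated sample between prev and s
        out.append(s)
        prev = s
    return out, prev


signal = [0, 100, 200, 300]

# One pass over the whole signal.
one_shot, _ = upsample_2x(signal, None)

# Two chunks, carrying state across the boundary.
first, state = upsample_2x(signal[:2], None)
second, _ = upsample_2x(signal[2:], state)
chunked = first + second

# Two chunks, each starting from scratch: the boundary midpoint goes missing.
stateless = upsample_2x(signal[:2], None)[0] + upsample_2x(signal[2:], None)[0]
```

With state carried, `chunked` matches `one_shot` sample for sample; without it, `stateless` is one sample short at the chunk boundary, which on real audio shows up as a click or timing drift.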
interrupt_dtmf_digit config maps a DTMF press to InterruptFrame. VAD also fires Interrupt on caller speech-onset. Both trigger streaming-twilio-response to send a Twilio clear envelope, flushing buffered outbound audio at the carrier.
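The DTMF half of that mapping might look like the sketch below. `is_dtmf_interrupt` is a hypothetical helper, not the node's real code; the message shape follows Twilio's `dtmf` Media Streams event, and an empty configured digit is treated as "DTMF barge-in disabled".

```python
import json


def is_dtmf_interrupt(raw: str, interrupt_dtmf_digit: str) -> bool:
    """Return True when a Twilio 'dtmf' message matches the configured digit."""
    if not interrupt_dtmf_digit:
        return False                      # empty config: DTMF barge-in is off
    msg = json.loads(raw)
    if msg.get("event") != "dtmf":
        return False
    return msg.get("dtmf", {}).get("digit") == interrupt_dtmf_digit


# Example message in the shape Twilio sends when the caller presses '#'.
example = json.dumps({
    "event": "dtmf",
    "streamSid": "MZtest",
    "dtmf": {"track": "inbound_track", "digit": "#"},
})
```

When the check passes, the trigger would emit an InterruptFrame, and the response node would follow up with a `clear` message on the same stream.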
VAD, STT, turn-detection, conversation-memory, LLM, TTS - same shape as the WebSocket-driven voice templates. Telephony is just a different transport at the edges.
LLM response feeds back into conversation-memory as the assistant turn (ADR-S16 feedback edge with max_iterations: 1000). The graph_validator accepts the cycle.
stream_sid, call_sid, ratecv state, mulaw decode counts all land on the call's span tree. Replay any call, any turn, any time.
Every dependency is a registered Strongly service or a model you control. Swap any one of them in the install wizard. The graph stays intact.
The marketplace template is the graph. Every customisation below is a config change or a single-node addition - never a rewrite.
The graph is identical; only the Twilio call origination changes. Outbound: your dialler creates the call and points the answer-TwiML at this workflow's session WebSocket.
Set tel_in.config.interrupt_dtmf_digit to #, *, or any single DTMF digit. An empty value disables DTMF barge-in (VAD barge-in still works).
Pick a different TTS model in the install wizard. tts-1 is the default; tts-1-hd and elevenlabs-v2 also work. The graph stays the same.
Edit memory.system_prompt to constrain the agent to your business: hours, refusal policy, compliance disclosures.
Add streaming-db-reader between trigger and LLM, configured to load the caller's CRM record by phone number (carried in the StartFrame metadata as `from`).
We don't leave until it runs. Talk to a forward-deployed engineer about deploying Outbound Voice Agent into your environment with your STT, your LLM, your TTS, your data.