Strongly Certified · Streaming Workflow

Realtime Telephony Agent

Twilio Media Streams in / out, realtime model behind the carrier.

mulaw 8 kHz on the wire end-to-end: the realtime model emits g711_mulaw 8 kHz directly, so the response side passes audio through without re-encoding. A single-node alternative to STT + LLM + TTS for telephony.


≤1s · First audio response (p95)
≤3s · Turn complete (p95)
mulaw · Carrier codec end-to-end

What it does

The voice loop, end-to-end.

No black box. Each step is a typed-frame node you can edit, monitor, and replace.

Twilio answers the inbound call (or your dialler places an outbound one), and the call's TwiML hands the audio to a Media Streams WebSocket pointing at this workflow's session.
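A minimal sketch of the TwiML that hands a call's audio to a bidirectional Media Stream, using Twilio's `<Connect><Stream>` verb with optional `<Parameter>` nouns. The WebSocket URL and the `session` parameter name are hypothetical placeholders for however this workflow's session endpoint is actually addressed.

```python
from __future__ import annotations

from xml.sax.saxutils import quoteattr


def media_stream_twiml(ws_url: str, params: dict[str, str] | None = None) -> str:
    """Build TwiML that connects the call's audio to a bidirectional Media Stream."""
    # <Connect><Stream> opens a bidirectional stream, so audio can flow back to the caller.
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', "<Response><Connect>"]
    # quoteattr adds surrounding quotes and escapes &, <, > and quote characters.
    parts.append(f"<Stream url={quoteattr(ws_url)}>")
    for name, value in (params or {}).items():
        # Custom parameters are delivered to the WebSocket in the stream's "start" message.
        parts.append(f"<Parameter name={quoteattr(name)} value={quoteattr(value)} />")
    parts.append("</Stream></Connect></Response>")
    return "".join(parts)
```

For example, `media_stream_twiml("wss://example.com/s/abc", {"session": "abc"})` returns a document whose `<Stream>` points at that (hypothetical) session URL.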

streaming-twilio-trigger 1.1.0 receives mulaw 8 kHz envelopes and forwards 8 kHz PCM AudioFrames; streaming-realtime-agent consumes via input_audio_format=g711_mulaw and emits output_audio_format=g711_mulaw directly.
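Decoding mu-law to PCM changes sample width, not sample rate, which is why the trigger can forward 8 kHz PCM without any ratecv step. The sketch below is a standard G.711 mu-law decoder (Python's `audioop.ulaw2lin` used to do this, but `audioop` was removed in Python 3.13). The `AudioFrame` dataclass is a hypothetical stand-in, not the workflow's real frame type, and this is not the node's actual implementation.

```python
import struct
from dataclasses import dataclass


@dataclass
class AudioFrame:
    """Hypothetical stand-in for the workflow's AudioFrame type."""
    sample_rate: int
    pcm16le: bytes  # signed 16-bit little-endian mono samples


def ulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = ((mantissa << 3) + 0x84) << exponent
    magnitude -= 0x84  # remove the encoder's bias (132)
    return -magnitude if sign else magnitude


# Precompute all 256 values once; decoding is then a table lookup per byte.
_ULAW_TABLE = [ulaw_to_linear(b) for b in range(256)]


def mulaw_payload_to_frame(payload: bytes) -> AudioFrame:
    """Turn raw (already base64-decoded) mu-law bytes into an 8 kHz PCM16 frame."""
    samples = [_ULAW_TABLE[b] for b in payload]
    # Same 8 kHz rate in and out: one mu-law byte becomes one 16-bit sample.
    return AudioFrame(sample_rate=8000, pcm16le=struct.pack(f"<{len(samples)}h", *samples))
```

A 20 ms chunk at 8 kHz is 160 mu-law bytes and decodes to 320 bytes of PCM16.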

streaming-twilio-response 1.1.0 wraps the model's mulaw audio in Twilio media envelopes and sends it back to the carrier. No PCM ratecv on either side of the wire.
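Wrapping mu-law audio for Twilio is pure framing: the bytes are base64-encoded into outbound `media` messages keyed by the stream's `streamSid`, with no transcoding or resampling. A minimal sketch follows; the 160-byte (20 ms) chunk size is an illustrative assumption, not a documented setting of the response node.

```python
import base64
import json


def twilio_media_envelopes(stream_sid: str, mulaw: bytes, chunk_bytes: int = 160) -> list[str]:
    """Wrap raw mu-law bytes, unchanged, in Twilio outbound 'media' messages."""
    messages = []
    for start in range(0, len(mulaw), chunk_bytes):
        chunk = mulaw[start:start + chunk_bytes]
        messages.append(json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            # The model's mu-law bytes, base64-encoded; no re-encoding happens here.
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        }))
    return messages
```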

A DTMF- or VAD-driven InterruptFrame from the trigger sends response.cancel to the realtime model for barge-in. On EndFrame, the call's audio and transcript are archived to S3.
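The archive step on EndFrame can be sketched as two S3 writes. This assumes a boto3-style client exposing `put_object(Bucket=..., Key=..., Body=..., ContentType=...)`. The `calls/<call_sid>/` key layout and the transcript shape are hypothetical; the recorder's real layout may differ. Raw mu-law is stored as `audio/basic`, the MIME type for 8 kHz mono mu-law.

```python
from __future__ import annotations

import json
from typing import Any, Protocol


class S3Like(Protocol):
    """Anything with a boto3-compatible put_object, e.g. boto3.client("s3")."""
    def put_object(self, **kwargs: Any) -> Any: ...


def archive_call(s3: S3Like, bucket: str, call_sid: str, mulaw: bytes, transcript: list[dict]) -> list[str]:
    """On EndFrame, write the call's audio and transcript to S3 (hypothetical key layout)."""
    prefix = f"calls/{call_sid}/"
    audio_key = prefix + "audio.ulaw"
    transcript_key = prefix + "transcript.json"
    # audio/basic is 8 kHz mono mu-law, so the wire bytes are stored as-is.
    s3.put_object(Bucket=bucket, Key=audio_key, Body=mulaw, ContentType="audio/basic")
    s3.put_object(
        Bucket=bucket,
        Key=transcript_key,
        Body=json.dumps(transcript).encode("utf-8"),
        ContentType="application/json",
    )
    return [audio_key, transcript_key]
```

With boto3 installed, the call would be `archive_call(boto3.client("s3"), "my-bucket", call_sid, audio, turns)`; taking the client as a parameter keeps the function testable without AWS.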

Capabilities

Built for production. Day Two-ready.

Streaming graph contract, observability, and cost discipline come standard. The agent ships with a full test suite that runs in CI on every node version bump.

Twilio Media Streams pair

streaming-twilio-trigger + streaming-twilio-response 1.1.0+. Carrier-agnostic envelope_spec config for Bandwidth Voice / internal SIP bridges; default config is Twilio.

Carrier abstraction · envelope_spec config · Twilio default

Single-node voice loop

Replaces VAD + STT + LLM + TTS with one realtime model. mulaw 8 kHz stays on the wire end-to-end - no PCM ratecv, no extra audioop hops.

g711_mulaw end-to-end · Lower latency · One billing line

DTMF + VAD barge-in

When the caller presses the configured interrupt_dtmf_digit, the trigger emits an InterruptFrame; the realtime node sends response.cancel to the provider, and any outbound mulaw audio still buffered at the carrier is flushed with a Twilio clear envelope.
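The DTMF path can be sketched as a mapping from one Twilio stream message to two outputs: a `response.cancel` client event for the realtime model and a `clear` message that tells Twilio to drop audio it has buffered. This is an illustrative sketch, not the trigger node's code; the function name is hypothetical, and the VAD path is left out.

```python
from __future__ import annotations

import json


def on_carrier_event(raw: str, interrupt_digit: str) -> tuple[list[dict], list[dict]]:
    """Map a Twilio stream message to (messages for the model, messages for Twilio).

    An empty interrupt_digit disables DTMF barge-in entirely.
    """
    msg = json.loads(raw)
    if msg.get("event") != "dtmf" or not interrupt_digit:
        return [], []
    if msg.get("dtmf", {}).get("digit") != interrupt_digit:
        return [], []
    to_model = [{"type": "response.cancel"}]  # stop the in-flight model response
    to_carrier = [{"event": "clear", "streamSid": msg["streamSid"]}]  # flush Twilio's playback buffer
    return to_model, to_carrier
```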

Twilio clear · Server VAD · DTMF interrupt

Same loop, different transport

The telephony pair handles the carrier protocol, the realtime node handles the model, and the recorder archives the call. Each piece is independently swappable, and the loop has the same shape as the WebSocket-driven realtime template.

Drop-in transport · Same node loop · Familiar pattern

Steerable on the call

Wire a SteerFrame source into steer_in (e.g. an operator dashboard signal) to swap voice or instructions mid-call without ending the conversation. Voice swaps are deferred until response.done, so the carrier never receives a half-rendered chunk.
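The deferral rule can be sketched as a small gate that tracks whether a response is in flight (`response.created` to `response.done`) and holds a voice change until it ends. Assumptions: the `SteerFrame` dataclass here is a hypothetical shape; applying instruction changes immediately is a choice made for illustration; and both changes are expressed as Realtime `session.update` events. Whether a provider accepts a voice change after audio has already been produced should be checked against its `session.update` documentation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SteerFrame:
    """Hypothetical shape: either field may be None."""
    voice: Optional[str] = None
    instructions: Optional[str] = None


class SteerGate:
    """Apply instruction changes at once; hold voice changes until the current response ends."""

    def __init__(self) -> None:
        self.responding = False
        self.pending_voice: Optional[str] = None

    def on_steer(self, frame: SteerFrame) -> list[dict]:
        out = []
        if frame.instructions is not None:
            out.append({"type": "session.update", "session": {"instructions": frame.instructions}})
        if frame.voice is not None:
            if self.responding:
                self.pending_voice = frame.voice  # a later steer overwrites an earlier pending one
            else:
                out.append({"type": "session.update", "session": {"voice": frame.voice}})
        return out

    def on_model_event(self, event: dict) -> list[dict]:
        kind = event.get("type")
        if kind == "response.created":
            self.responding = True
        elif kind == "response.done":
            self.responding = False
            if self.pending_voice is not None:
                voice, self.pending_voice = self.pending_voice, None
                return [{"type": "session.update", "session": {"voice": voice}}]
        return []
```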

SteerFrame · Mid-call mutation · Operator-controlled

Same span path as batch

Per-frame spans land on workflow_spans, and carrier IDs (stream_sid, call_sid) are tagged on the call's span tree per ADR-S14. Replay any call, any turn, any time.
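The carrier IDs are available in the `start` message Twilio sends when a Media Stream opens, together with the negotiated `mediaFormat`. A minimal sketch that extracts them as span attributes and rejects anything other than 8 kHz mu-law follows; the attribute key names are hypothetical, since ADR-S14 defines the real ones.

```python
import json


def carrier_span_attributes(start_message: str) -> dict[str, str]:
    """Pull carrier IDs from a Twilio Media Streams 'start' message for span tagging.

    Attribute names are hypothetical placeholders.
    """
    msg = json.loads(start_message)
    if msg.get("event") != "start":
        raise ValueError("expected a Twilio 'start' message")
    start = msg["start"]
    fmt = start.get("mediaFormat", {})
    # The loop assumes 8 kHz mu-law on the wire; fail loudly on anything else.
    if fmt.get("encoding") != "audio/x-mulaw" or fmt.get("sampleRate") != 8000:
        raise ValueError(f"unexpected media format: {fmt}")
    return {"carrier.stream_sid": start["streamSid"], "carrier.call_sid": start["callSid"]}
```

If the tracer is OpenTelemetry-compatible, the result can be attached to the call's root span with `span.set_attributes(...)`.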

ADR-S14 · Carrier IDs · Replayable

Built on

Real services. Your stack.

Every dependency is a registered Strongly service or a model you control. Swap any one of them in the install wizard. The graph stays intact.

Telephony pair

streaming-twilio-trigger 1.1.0 + streaming-twilio-response 1.1.0 - carrier-agnostic

Realtime node

streaming-realtime-agent 1.0.0 - input/output_audio_format=g711_mulaw for end-to-end mulaw

Realtime model

OpenAI Realtime (gpt-4o-realtime-preview); must support G.711 mu-law (g711_ulaw) input and output
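On the provider side, keeping both directions in mu-law comes down to the session configuration. OpenAI's Realtime API spells the format `g711_ulaw`; this sketch assumes the node's `g711_mulaw` config value maps onto it. The `system_prompt` and `voice` arguments correspond to rt.config.system_prompt and rt.config.voice, and enabling server VAD is an illustrative assumption.

```python
import json


def realtime_session_update(system_prompt: str, voice: str) -> str:
    """Build a session.update event that keeps both audio directions in G.711 mu-law."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": system_prompt,
            "voice": voice,
            "input_audio_format": "g711_ulaw",   # accept Twilio's 8 kHz mu-law bytes as-is
            "output_audio_format": "g711_ulaw",  # emit bytes Twilio can play without transcoding
            "turn_detection": {"type": "server_vad"},
        },
    })
```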

Strongly tracing

Spans on workflow_spans - same path as batch

Five common customisations

Tune it. Don't fork it.

The marketplace template is the graph. Every customisation below is a config change or a single-node addition - never a rewrite.

Outbound calls

The graph is identical; only call origination changes. For outbound calls, your dialler creates the call through Twilio and points its answer TwiML at this workflow's session WebSocket.
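Outbound origination can be sketched with the Twilio Python helper library, whose `calls.create` accepts `to`, `from_`, and inline `twiml`. The function takes the `calls` resource as a parameter so it can be exercised without network access; the WebSocket URL is a hypothetical placeholder for this workflow's session endpoint.

```python
from typing import Any, Protocol
from xml.sax.saxutils import quoteattr


class CallsAPI(Protocol):
    """Matches twilio.rest.Client(...).calls for the one method used here."""
    def create(self, **kwargs: Any) -> Any: ...


def originate_call(calls: CallsAPI, to: str, from_: str, ws_url: str) -> Any:
    """Place an outbound call whose answer TwiML opens the workflow's Media Stream."""
    # Same <Connect><Stream> shape as inbound; only who starts the call differs.
    twiml = f"<Response><Connect><Stream url={quoteattr(ws_url)} /></Connect></Response>"
    return calls.create(to=to, from_=from_, twiml=twiml)
```

In production, `calls` would be `Client(account_sid, auth_token).calls` from the `twilio` package.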

Different barge-in digit

Edit tel_in.config.interrupt_dtmf_digit. An empty value disables DTMF barge-in (VAD barge-in still works).

Voice swap

rt.config.voice sets the initial voice; SteerFrame mid-call swaps voices (deferred until response.done).

Tighter system prompt

rt.config.system_prompt sets the agent's speaking style, tone, and refusal policy; use SteerFrame.instructions to change it at runtime.

Bandwidth or internal SIP

Set tel_in.config.envelope_spec + tel_out.config.response_envelope_spec to the carrier's shape; codec stays mulaw 8 kHz on the wire if the carrier is G.711.
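The idea behind a carrier-agnostic envelope_spec can be sketched as a description of where each field lives in a carrier's JSON messages, with Twilio as the default. Everything here is hypothetical: the `EnvelopeSpec` field names are not the node's real config keys, and the "bridge" shape in the usage note is invented for illustration. A response_envelope_spec would describe the outbound direction the same way.

```python
import base64
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class EnvelopeSpec:
    """Hypothetical envelope_spec shape: where each field lives in a carrier message."""
    event_key: str = "event"
    media_event: str = "media"
    payload_path: tuple[str, ...] = ("media", "payload")
    stream_id_path: tuple[str, ...] = ("streamSid",)


TWILIO = EnvelopeSpec()  # defaults match Twilio Media Streams "media" messages


def _dig(obj: Any, path: tuple[str, ...]) -> Any:
    """Follow a key path into nested dicts."""
    for key in path:
        obj = obj[key]
    return obj


def parse_inbound(raw: str, spec: EnvelopeSpec = TWILIO) -> Optional[tuple[str, bytes]]:
    """Return (stream id, mu-law bytes) for media messages, None for anything else."""
    msg = json.loads(raw)
    if msg.get(spec.event_key) != spec.media_event:
        return None
    payload = base64.b64decode(_dig(msg, spec.payload_path))
    return _dig(msg, spec.stream_id_path), payload
```

For an internal SIP bridge that sends, say, `{"type": "audio", "session": ..., "data": ...}`, the spec would be `EnvelopeSpec(event_key="type", media_event="audio", payload_path=("data",), stream_id_path=("session",))`; the decoding logic itself does not change.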