Strongly Certified · Streaming Workflow

Realtime Voice Agent (DB-Backed)

Realtime voice + caller profile + RAG + DB-backed function calls.

The canonical demonstration of the §5.10 architecture: streaming-user-profile pre-loads caller context, streaming-vector-search adds RAG mid-session, and the realtime model can call DB-backed tools through streaming-tool-executor. Three DB-integration ports working end-to-end.

≤1.2s · First audio response (p95)
3 ports · DB integration surface
RAG · Mid-session retrieval

The voice loop, end-to-end.

No black box. Each step is a typed-frame node you can edit, monitor, and replace.

01

On StartFrame, streaming-user-profile pulls the caller's profile from the configured Mongo addon and emits a ContextFrame into the realtime node's context_in.

02

On every assistant turn, streaming-realtime-agent emits the user transcript on text_out. streaming-embed embeds the transcript, and streaming-vector-search retrieves the top-K chunks from the knowledge base. The resulting ContextFrame loops back into context_in via a feedback edge, so the model uses freshly retrieved context for the next turn.

03

When the model calls lookup_account_balance, the ToolCallFrame routes to streaming-tool-executor, which hits the configured HTTP datasource and returns a ToolResultFrame.

04

ToolResultFrame returns to the realtime node's tool_result_in via a feedback edge; the node serialises it as conversation.item.create function_call_output and triggers a follow-up response.
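The four steps above can be read as a small graph of typed-frame edges. The sketch below is illustrative Python, not the template's actual definition format. The `Edge` type, the node aliases other than `rt`, and every port name other than `context_in`, `text_out`, `tool_call_out`, and `tool_result_in` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str                # "node.port" that emits the frame
    dst: str                # "node.port" that receives the frame
    feedback: bool = False  # True when the edge loops back into the realtime node


# Hypothetical aliases: rt = streaming-realtime-agent, profile = streaming-user-profile,
# embed = streaming-embed, search = streaming-vector-search, tools = streaming-tool-executor.
EDGES = [
    Edge("profile.context_out", "rt.context_in"),                       # step 01
    Edge("rt.text_out", "embed.text_in"),                               # step 02
    Edge("embed.vector_out", "search.vector_in"),                       # step 02
    Edge("search.context_out", "rt.context_in", feedback=True),         # step 02
    Edge("rt.tool_call_out", "tools.tool_call_in"),                     # step 03
    Edge("tools.tool_result_out", "rt.tool_result_in", feedback=True),  # step 04
]


def inputs_of(node: str) -> set[str]:
    """Return the input ports on `node` that have at least one incoming edge."""
    return {e.dst.split(".", 1)[1] for e in EDGES if e.dst.startswith(node + ".")}


def feedback_edges() -> list[Edge]:
    """Return only the edges marked as feedback edges."""
    return [e for e in EDGES if e.feedback]
```

Under these assumptions, `inputs_of("rt")` lists the realtime node's wired inputs, and `feedback_edges()` isolates the RAG loop and the tool-result loop.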

Built for production. Day Two-ready.

Streaming graph contract, observability, and cost discipline come standard. The agent ships with a full test suite that runs in CI on every node version bump.

Pre-session profile

streaming-user-profile loads the caller's record at StartFrame and folds it into the model's session.update so the model knows who it's talking to before the first audio frame.

MongoDB lookup · Pre-Start ContextFrame · Session-level instructions
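A minimal sketch of the pre-session step, assuming the profile is folded into the `instructions` field of a `session.update` event. The frame classes, the in-memory `PROFILES` dict standing in for the Mongo addon, and the profile fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartFrame:
    metadata: dict


@dataclass
class ContextFrame:
    text: str


# Stand-in for the Mongo addon (assumption): profiles keyed by user id.
PROFILES = {"u-42": {"name": "Dana", "plan": "premium"}}


def load_profile(start: StartFrame) -> Optional[ContextFrame]:
    """Look up the caller at StartFrame; return None when no profile exists."""
    profile = PROFILES.get(start.metadata.get("user_id"))
    if profile is None:
        return None
    return ContextFrame(text=f"Caller: {profile['name']} (plan: {profile['plan']})")


def session_update(base_instructions: str, ctx: Optional[ContextFrame]) -> dict:
    """Build a session.update event that appends the caller context, if any."""
    instructions = base_instructions if ctx is None else f"{base_instructions}\n\n{ctx.text}"
    return {"type": "session.update", "session": {"instructions": instructions}}
```

When the lookup finds nothing, the sketch falls back to the base instructions rather than failing the session. That is a design choice of the example, not a documented behaviour of the node.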

Mid-session RAG

streaming-vector-search queries a Mongo $vectorSearch index over the operator's knowledge-base chunks on every turn. The resulting ContextFrame loops back into context_in via a feedback edge so the model grounds the next reply.

$vectorSearch · ADR-S16 feedback edge · Per-turn retrieval
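To make the per-turn retrieval concrete, here is a sketch of the kind of aggregation pipeline a $vectorSearch query uses, built as plain Python data so it runs without a database. The index name, the `embedding` and `text` field names, and the `numCandidates` multiplier are assumptions. The node's real configuration may differ.

```python
def build_vector_search_pipeline(query_vector, top_k=4,
                                 index="kb_chunks_index", path="embedding"):
    """Return an aggregation pipeline that retrieves the top_k nearest chunks."""
    return [
        {
            "$vectorSearch": {
                "index": index,                   # hypothetical index name
                "path": path,                     # hypothetical vector field
                "queryVector": list(query_vector),
                "numCandidates": top_k * 20,      # assumed oversampling factor
                "limit": top_k,
            }
        },
        # Keep only the chunk text and its similarity score.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


def chunks_to_context(chunks) -> str:
    """Join retrieved chunks into the text a ContextFrame could carry."""
    return "\n".join(f"[{i}] {c['text']}" for i, c in enumerate(chunks, start=1))
```

With a driver such as pymongo, the pipeline would be passed to `collection.aggregate(...)`, and the returned documents fed to `chunks_to_context`.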

DB-backed tool call

A ToolCallFrame on tool_call_out routes to a streaming-tool-executor configured with handler_kind=service_http. The executor calls the configured HTTP datasource and returns a ToolResultFrame, which loops back to tool_result_in.

service_http handler · ToolResultFrame loop · JSON-schema tools
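The tool round trip can be sketched as two small functions: one that executes the call over HTTP, and one that serialises the result for the realtime model. The frame fields, the `http_get` callable (injected so the example runs offline), and the URL layout are assumptions. The event shapes follow the `conversation.item.create` and `function_call_output` names used in this page.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCallFrame:
    call_id: str
    name: str
    arguments: dict


@dataclass
class ToolResultFrame:
    call_id: str
    output: dict


def execute_tool(call: ToolCallFrame,
                 http_get: Callable[[str, dict], dict],
                 base_url: str) -> ToolResultFrame:
    """Call the HTTP datasource; report failures as an error payload."""
    try:
        output = http_get(f"{base_url}/{call.name}", call.arguments)
    except Exception as exc:  # the model still receives a result it can explain
        output = {"error": str(exc)}
    return ToolResultFrame(call_id=call.call_id, output=output)


def to_realtime_events(result: ToolResultFrame) -> list[dict]:
    """Serialise the result, then request a follow-up response."""
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": result.call_id,
                "output": json.dumps(result.output),
            },
        },
        {"type": "response.create"},
    ]
```

Returning an error payload instead of raising keeps the feedback loop closed: the model always gets a `function_call_output` for each call it made.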

Steerable on demand

Wire a SteerFrame source (operator dashboard, escalation signal, escalation router) into steer_in to swap the model's persona, voice, or temperature mid-conversation without ending the session.

SteerFrame · Live mutation · Operator dashboard ready

Spans bundled with archive

streaming-recorder 1.4.0 with record_spans=true bundles the session's full span trace from workflow_spans (Mongo) into the S3 archive. Single bundle for batch consumption / post-call review / compliance audit.

Span archive · Compliance-ready · Single bundle

Same span path as batch

Per-frame spans on workflow_spans, same path as the batch runtime, viewable in the canvas. No Prometheus, no Grafana, no external tracing.

ADR-S14 · Strongly tracing · Canvas-first

Real services. Your stack.

Every dependency is a registered Strongly service or a model you control. Swap any one of them in the install wizard. The graph stays intact.

Realtime node: streaming-realtime-agent 1.0.0 - all three DB-integration ports wired
Profile / RAG: streaming-user-profile + streaming-embed + streaming-vector-search
Tool execution: streaming-tool-executor with service_http handler against an HTTP datasource
Strongly tracing: Spans on workflow_spans + record_spans=true on the recorder for archive bundling

Tune it. Don't fork it.

The marketplace template is the graph. Every customisation below is a config change or a single-node addition - never a rewrite.

Different RAG cadence

Default fires on every assistant turn. Add a streaming-conditional upstream to gate retrieval, or change the vector-search top_k.
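As an illustration of gating retrieval, the predicate below skips RAG for short acknowledgements and runs it for questions or longer utterances. How streaming-conditional is actually configured is not described here, so the function name, the `min_words` threshold, and the rule itself are hypothetical.

```python
def should_retrieve(transcript: str, min_words: int = 4) -> bool:
    """Return True when a turn looks substantive enough to justify retrieval."""
    text = transcript.strip()
    # Questions always retrieve; otherwise require a minimum word count.
    return text.endswith("?") or len(text.split()) >= min_words
```

A rule like this trades a little recall for fewer embedding and vector-search calls on turns such as "ok thanks".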

More tools

Add a streaming-tool-executor instance per tool. Multi-tool routing for ToolCallFrame is a v1.1 follow-up requiring a streaming-tool-router update.

Persona swap mid-call

Wire a SteerFrame source into rt.steer_in to mutate instructions / voice on demand. Voice swaps defer until response.done.
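The deferral rule can be sketched as a small state holder. This is an assumption-laden illustration: the `SteerFrame` fields, the `Session` class, and the choice to defer a voice swap only while a response is in progress are not taken from the node's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SteerFrame:
    instructions: Optional[str] = None
    voice: Optional[str] = None


@dataclass
class Session:
    instructions: str
    voice: str
    response_active: bool = False
    pending_voice: Optional[str] = None

    def steer(self, frame: SteerFrame) -> None:
        """Apply instructions immediately; hold a voice swap during a response."""
        if frame.instructions is not None:
            self.instructions = frame.instructions
        if frame.voice is not None:
            if self.response_active:
                self.pending_voice = frame.voice
            else:
                self.voice = frame.voice

    def on_response_done(self) -> None:
        """Handle response.done: apply any voice swap that was deferred."""
        self.response_active = False
        if self.pending_voice is not None:
            self.voice = self.pending_voice
            self.pending_voice = None
```

Deferring the swap avoids changing the voice partway through a spoken reply, while instruction changes take effect for the next generated turn.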

Profile lookup key

profile.config.user_id_key (default 'user_id') controls which session metadata field the lookup uses. Match your auth integration's session shape.
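A short sketch of how a configurable lookup key might resolve against different session shapes. Only the `user_id_key` name and its `'user_id'` default come from the text above. The function and the treatment of missing or empty values are assumptions.

```python
from typing import Optional


def resolve_lookup_id(session_metadata: dict, user_id_key: str = "user_id") -> Optional[str]:
    """Return the caller id stored under user_id_key, or None if absent or empty."""
    value = session_metadata.get(user_id_key)
    return None if value in (None, "") else str(value)
```

For example, an auth integration that stores the caller under `sub` would set `user_id_key` to `"sub"`. With the default key, the same session metadata would resolve to no caller.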

Embedding model

ai_models.embeddings is configurable independently of the realtime model. text-embedding-3-small is the preferred low-cost default.

Production. Not pilots.

We don't leave until it runs. Talk to a forward-deployed engineer about deploying Realtime Voice Agent (DB-Backed) into your environment with your STT, your LLM, your TTS, your data.

Schedule a Demo