Strongly Certified · Streaming Workflow

Voice Banking Assistant

Voice support that grounds every answer in the customer's actual account.

Real-time voice agent for banking and financial services. Customer profile loads from your Postgres at session start. PII redacted before TTS. No general 'ask the LLM' path.

≤1.5s · First audio response (p95)
≤4s · Turn complete (p95)
≈$0.02 · Per-turn cost (default models)

The voice loop, end-to-end.

No black box. Each step is a typed-frame node you can edit, monitor, and replace.

01

Caller speaks. Audio streams in over WebSocket.

02

STT transcribes. The agent pairs the transcript with the customer's profile and recent transactions, loaded from Postgres at session start.

03

The LLM answers grounded in those rows. Nothing else.

04

The response passes a PII regex filter; matches are redacted before TTS synthesises them. Outbound audio streams back over WebSocket.

Built for production. Day Two-ready.

Streaming graph contract, observability, and cost discipline come standard. The agent ships with a full test suite that runs in CI on every node version bump.

Grounded answers

Customer profile and recent transactions are pulled from your Postgres at session start. The LLM is constrained to that context. No hallucinated balances, no invented merchants.

Postgres addon · Per-session load · Strict context
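What "strict context" can look like in practice: a minimal Python sketch, assuming the profile and transaction rows have already been fetched (the Postgres query itself is out of scope). `build_grounded_prompt`, the `Transaction` fields, and the instruction wording are hypothetical, not the template's shipped prompt.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    # Hypothetical row shape; real column names depend on your schema.
    posted_on: str
    merchant: str
    amount: Decimal


def build_grounded_prompt(profile: dict, transactions: list[Transaction]) -> str:
    """Render account rows into a system prompt that forbids outside facts."""
    lines = [
        "Answer ONLY from the account data below.",
        "If the answer is not in the data, say you don't have that information.",
        "",
        f"Customer: {profile['name']}",
        f"Balance: {profile['balance']} {profile['currency']}",
        "Recent transactions:",
    ]
    for tx in transactions:
        lines.append(f"- {tx.posted_on} {tx.merchant} {tx.amount}")
    if not transactions:
        # Explicit marker so the model has nothing to "fill in".
        lines.append("- (none)")
    return "\n".join(lines)
```

Writing an explicit "(none)" for an empty transaction list is a deliberate choice in this sketch: it gives the model something to refuse from rather than a blank section to fill in.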

Streaming voice loop

WebSocket in, WebSocket out. STT, LLM, and TTS run concurrently behind a typed-frame contract. Audio chunks ship as they're produced - no full-utterance waits.

WebSocket · Real-time · ADR-S11 frames
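Why concurrency removes full-utterance waits, in a minimal asyncio sketch: each stage pulls frames from an inbound queue and forwards output frames the moment it has them, so TTS starts on the LLM's first word while later words are still coming. `Frame`, `run_stage`, and the stub LLM and TTS stages are hypothetical stand-ins, not the ADR-S11 frame contract; STT would be one more stage in front.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable


@dataclass(frozen=True)
class Frame:
    kind: str      # hypothetical kinds: "text", "audio", "end"
    payload: str


END = Frame("end", "")  # sentinel object that closes a stage

Stage = Callable[[Frame], AsyncIterator[Frame]]


async def run_stage(stage: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Pump frames through one stage, forwarding each output frame immediately."""
    while (frame := await inbox.get()) is not END:
        async for out in stage(frame):
            await outbox.put(out)
    await outbox.put(END)


async def fake_llm(frame: Frame) -> AsyncIterator[Frame]:
    # Stub: stream the reply word by word instead of waiting for the full text.
    for word in f"You said {frame.payload}".split():
        yield Frame("text", word)


async def fake_tts(frame: Frame) -> AsyncIterator[Frame]:
    # Stub: pretend each text chunk becomes one audio chunk.
    yield Frame("audio", f"<audio:{frame.payload}>")


async def pipeline(utterances: list[str]) -> list[Frame]:
    q_in, q_mid, q_out = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    # Both stages run concurrently as separate tasks.
    tasks = [
        asyncio.create_task(run_stage(fake_llm, q_in, q_mid)),
        asyncio.create_task(run_stage(fake_tts, q_mid, q_out)),
    ]
    for text in utterances:
        await q_in.put(Frame("text", text))
    await q_in.put(END)
    out = []
    while (frame := await q_out.get()) is not END:
        out.append(frame)
    await asyncio.gather(*tasks)
    return out
```

Because each queue is unbounded here, a slow TTS stage would let memory grow; a bounded `asyncio.Queue(maxsize=...)` applies backpressure instead.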

PII before TTS

Outbound responses pass through streaming-safety-filter before synthesis. SSN, email, US phone, account numbers - redacted, blocked, or dropped per your policy.

Regex presets · Configurable action · Pre-TTS
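A minimal sketch of a pre-TTS filter, assuming sentence-level buffering: text is held until a sentence boundary so a pattern split across two LLM chunks is still caught. The regexes are simplified illustrations (not the template's presets), and the meaning given to each action here (`redact` masks the match, `block` swaps the whole sentence for a fixed message, `drop` omits the sentence) is an assumption.

```python
import re
from typing import Literal, Optional

Action = Literal["redact", "block", "drop"]

# Simplified illustrative patterns, applied in this order.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_phone": re.compile(r"(?<!\d)(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]\d{4}(?!\d)"),
    "account": re.compile(r"\b\d{8,17}\b"),
}
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
BLOCKED_MESSAGE = "I can't read that detail aloud."


class PiiFilter:
    def __init__(self, action: Action = "redact") -> None:
        self.action = action
        self.buffer = ""

    def _apply(self, sentence: str) -> Optional[str]:
        """Return the text to speak, or None to skip the sentence."""
        redacted = sentence
        for name, pattern in PATTERNS.items():
            redacted = pattern.sub(f"[{name}]", redacted)
        if redacted == sentence:
            return sentence  # no PII found
        if self.action == "redact":
            return redacted
        if self.action == "block":
            return BLOCKED_MESSAGE
        return None  # "drop"

    def feed(self, chunk: str) -> list[str]:
        """Buffer LLM text; release only complete, filtered sentences."""
        self.buffer += chunk
        *complete, self.buffer = SENTENCE_END.split(self.buffer)
        return [s for s in map(self._apply, complete) if s is not None]

    def flush(self) -> list[str]:
        """Release whatever is left when the response ends."""
        tail, self.buffer = self.buffer, ""
        if not tail.strip():
            return []
        out = self._apply(tail)
        return [] if out is None else [out]
```

Buffering to a sentence boundary costs a little latency on the first audio chunk; it is the price of not reading out half an SSN that arrived in two pieces.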

Rolling summary

Conversation history stays under the model's context window via streaming-summariser-rolling. Long calls don't degrade. Cost per turn stays bounded.

Token-budgeted · Per-turn · Cost-stable
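A minimal sketch of token-budgeted history, assuming a whitespace word count as a stand-in for a real tokenizer and a caller-supplied `summarise` callback in place of an LLM call. The oldest turns are folded into a running summary until the summary plus recent turns fit the budget. `RollingHistory` and its parameters are hypothetical, not the streaming-summariser-rolling node's interface.

```python
from dataclasses import dataclass, field
from typing import Callable


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return len(text.split())


@dataclass
class RollingHistory:
    budget: int                           # max tokens for summary + recent turns
    summarise: Callable[[str, str], str]  # (old_summary, evicted_turn) -> new_summary
    summary: str = ""
    turns: list[str] = field(default_factory=list)

    def size(self) -> int:
        return count_tokens(self.summary) + sum(count_tokens(t) for t in self.turns)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        # Fold the oldest turns into the summary until we fit; always keep the newest turn.
        while self.size() > self.budget and len(self.turns) > 1:
            self.summary = self.summarise(self.summary, self.turns.pop(0))

    def context(self) -> str:
        parts = [f"Summary: {self.summary}"] if self.summary else []
        return "\n".join(parts + self.turns)
```

The sketch does not re-compress the summary itself; a real summariser would need to, or the summary alone could eventually exceed the budget.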

Live span tree

Every turn writes spans to workflow_spans - node latency, frame bytes, queue depth, watermark. The canvas overlay shows what each node did, when, and why.

ADR-S14 · Per-turn · Canvas overlay
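As an illustration of what a per-node span might carry, here is a minimal sketch that times a node with a context manager and appends a record to an in-memory list standing in for the workflow_spans table. The `Span` fields mirror the metrics named above, but the field names, units, and the `record_span` helper are assumptions rather than the ADR-S14 schema.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Span:
    turn_id: str
    node: str
    parent: Optional[str]
    latency_ms: float = 0.0
    frame_bytes: int = 0
    queue_depth: int = 0
    watermark: float = 0.0  # hypothetical: timestamp of the latest frame the node has seen


SPANS: list[Span] = []  # stand-in for rows in workflow_spans


@contextmanager
def record_span(turn_id: str, node: str, parent: Optional[str] = None) -> Iterator[Span]:
    """Time one node's work and store its span, even if the node raises."""
    span = Span(turn_id, node, parent)
    start = time.perf_counter()
    try:
        yield span  # the node fills in frame_bytes, queue_depth, watermark
    finally:
        span.latency_ms = (time.perf_counter() - start) * 1000
        SPANS.append(span)
```

A parent field per span is enough to rebuild the tree: group spans by `turn_id`, then nest each under its `parent` node.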

Cost line you can quote

≈$0.02 per turn with the default models (whisper-1 + gpt-4o + tts-1). Swap a model and the figure moves; the graph doesn't.

Fixed defaults · Swappable models · Predictable spend
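The per-turn figure is a sum of three metered components. A minimal sketch of the arithmetic, assuming the usual pricing units for these model families (STT per audio minute, chat models per million input and output tokens, TTS per million characters). `Rates` and `turn_cost` are hypothetical, and no real prices are baked in; plug in your provider's current rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # All rates in USD; fill in from your provider's current price list.
    stt_per_minute: float
    llm_in_per_million: float
    llm_out_per_million: float
    tts_per_million_chars: float


def turn_cost(r: Rates, audio_seconds: float, tokens_in: int, tokens_out: int, tts_chars: int) -> float:
    """Estimated cost of one turn: STT + LLM (input and output) + TTS."""
    stt = audio_seconds / 60 * r.stt_per_minute
    llm = (tokens_in * r.llm_in_per_million + tokens_out * r.llm_out_per_million) / 1_000_000
    tts = tts_chars / 1_000_000 * r.tts_per_million_chars
    return stt + llm + tts
```

Because the rolling summary bounds `tokens_in`, the LLM term stays roughly flat across a long call, which is what keeps a quoted per-turn figure stable.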

Real services. Your stack.

Every dependency is a registered Strongly service or a model you control. Swap any one of them in the install wizard. The graph stays intact.

Postgres addon
Customer profile, balances, recent transactions
STT model
whisper-1 default - swap any registered STT
LLM model
gpt-4o default - swap any registered chat model
TTS model
tts-1 default - swap any registered TTS
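The swap rule above ("any registered STT", "any registered chat model") amounts to a role check. A minimal sketch, assuming a simple in-memory registry keyed by role; `REGISTRY`, `ModelConfig`, and `with_model` are hypothetical, and the registry contents are illustrative, standing in for whatever the install wizard actually validates against.

```python
from dataclasses import dataclass, replace

# Hypothetical registry: which models are registered for which role.
REGISTRY = {
    "stt": {"whisper-1"},
    "llm": {"gpt-4o", "gpt-4o-mini", "llama-3.1-70b"},
    "tts": {"tts-1"},
}


@dataclass(frozen=True)
class ModelConfig:
    stt: str = "whisper-1"
    llm: str = "gpt-4o"
    tts: str = "tts-1"


def with_model(cfg: ModelConfig, role: str, model: str) -> ModelConfig:
    """Return a copy with one model swapped; reject models not registered for that role."""
    if model not in REGISTRY.get(role, set()):
        raise ValueError(f"{model!r} is not a registered {role} model")
    return replace(cfg, **{role: model})
```

Rejecting an unregistered model at config time, rather than mid-call, keeps a bad swap from surfacing as a failed turn.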

Tune it. Don't fork it.

The marketplace template is the graph. Every customisation below is a config change or a single-node addition - never a rewrite.

Smaller LLM

Switch the llm node's model to gpt-4o-mini or a self-hosted llama-3.1-70b. The system prompt is short, so a smaller model usually suffices.

Tighter scope

Edit the prompt template. Common additions: scope ('checking and savings only'), tone, refusal rules ('never quote rates').

Hybrid retrieval

Replace streaming-db-reader with a join across multiple tables, or chain it with streaming-vector-retrieval. Variables flow into the prompt as {{name}} substitutions.
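Variables reaching the prompt as {{name}} substitutions can be sketched as a small renderer. This is a minimal illustration, assuming an unknown placeholder should fail loudly rather than render empty (so a renamed retrieval output doesn't silently drop context); `render` and the variable names are hypothetical, not the product's template engine.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, object]) -> str:
    """Replace each {{name}} with str(variables[name]); raise on unknown names."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"template variable {name!r} was not provided")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)
```

For example, `render("Scope: {{scope}}.", {"scope": "checking and savings only"})` yields `"Scope: checking and savings only."`.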

Language routing

Insert streaming-confidence-router between stt and prompt. Branch on detected language. Send non-English speech down a parallel pipeline.
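A minimal sketch of the branch decision, assuming the STT stage attaches a detected language code and a confidence score to each transcript. The threshold, the `PIPELINES` set, and the choice to fall back to the primary path on low confidence are all hypothetical, not streaming-confidence-router's documented behaviour.

```python
from dataclasses import dataclass

# Hypothetical: languages that have their own parallel pipeline.
PIPELINES = frozenset({"en", "es"})


@dataclass(frozen=True)
class Transcript:
    text: str
    language: str      # assumed ISO 639-1 code, e.g. "en" or "es"
    confidence: float  # assumed 0.0 to 1.0 language-detection confidence


def route(t: Transcript, min_confidence: float = 0.8, primary: str = "en") -> str:
    """Return the branch to send this transcript down."""
    # Low confidence or no matching pipeline: stay on the primary path rather than guess.
    if t.confidence < min_confidence or t.language not in PIPELINES:
        return primary
    return t.language
```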

Different sink

Swap websocket-response for telephony-response (SIP) or webrtc-response. Or chain into streaming-aggregator → kafka-producer for the data lake.
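Chaining an aggregator into a producer usually means batching many small frames into fewer, larger messages. A minimal sketch, assuming a count-or-bytes flush rule and a caller-supplied `send` callback standing in for a Kafka producer; the `Aggregator` class and its thresholds are hypothetical, and no Kafka client API is used or implied.

```python
from typing import Callable


class Aggregator:
    """Buffer records and hand them to `send` in batches."""

    def __init__(self, send: Callable[[list[bytes]], None], max_records: int = 100, max_bytes: int = 64_000) -> None:
        self.send = send
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.batch: list[bytes] = []
        self.size = 0

    def add(self, record: bytes) -> None:
        self.batch.append(record)
        self.size += len(record)
        # Flush as soon as either threshold is reached.
        if len(self.batch) >= self.max_records or self.size >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        """Emit whatever is buffered; also call this when the call ends."""
        if self.batch:
            self.send(self.batch)
            self.batch, self.size = [], 0
```

A production sink would also flush on a timer, so a quiet call doesn't strand records in the buffer.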

Production. Not pilots.

We don't leave until it runs. Talk to a forward-deployed engineer about deploying Voice Banking Assistant into your environment with your STT, your LLM, your TTS, your data.

Schedule a Demo