Per-turn quality scoring. Full session archive. QA webhook on session end.
Voice support agent with programmatic quality scoring built into every turn. Audio and transcript are archived per session. The QA inbox gets a webhook when the call ends - score, flagged turns, full replay link.
No black box. Each step is a typed-frame node you can edit, monitor, and replace.
The voice loop runs as a normal support session - STT, LLM, TTS over WebSocket.
After every turn, streaming-eval-scorer rates the response (tone, clarity, compliance). The score lands on the turn's span.
streaming-recorder writes the raw audio to S3. streaming-conversation-store writes the transcript and per-turn scores to your archive collection.
On session end, streaming-webhook-response posts the summary to your QA inbox - score, flagged turns, replay link.
Streaming graph contract, observability, and cost discipline come standard. The agent ships with a full test suite that runs in CI on every node version bump.
streaming-eval-scorer rates each agent reply against a rubric you define - tone, clarity, compliance, accuracy. Scores attach to the turn span and feed the session summary.
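As a rough illustration of how a rubric could turn into a per-turn score, here is a minimal sketch in Python. The criterion names come from the text above; the weights, the 0-to-1 rating scale, and the function names are assumptions for illustration, not the actual streaming-eval-scorer configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float


# Hypothetical rubric: the weights are illustrative, not product defaults.
RUBRIC = [
    Criterion("tone", 0.2),
    Criterion("clarity", 0.2),
    Criterion("compliance", 0.4),
    Criterion("accuracy", 0.2),
]


def turn_score(ratings: dict[str, float], rubric: list[Criterion] = RUBRIC) -> float:
    """Weighted mean of per-criterion ratings, each assumed to lie in [0, 1]."""
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * ratings[c.name] for c in rubric) / total_weight


def session_summary(turn_scores: list[float]) -> dict:
    """Roll per-turn scores up into a small session-level summary."""
    if not turn_scores:
        return {"turns": 0, "mean": None, "min": None}
    return {
        "turns": len(turn_scores),
        "mean": sum(turn_scores) / len(turn_scores),
        "min": min(turn_scores),
    }
```

Weighting compliance more heavily than tone is one possible policy; a per-region rubric would simply supply a different list of criteria and weights.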
Raw audio is archived to S3 by streaming-recorder. Transcripts and per-turn scores are written to your archive collection by streaming-conversation-store. Replay any call, any turn, any time.
When the session ends, streaming-webhook-response fires a signed POST to your QA inbox with the session summary, score histogram, and flagged-turn IDs. Standard HMAC-SHA256.
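On the receiving side, a QA inbox would typically verify the HMAC-SHA256 signature before trusting the payload. The sketch below uses only the Python standard library. It assumes the signature is a hex digest computed over the raw request body with a shared secret; the header that carries it and the exact encoding are not specified here, so treat those details as assumptions to check against your deployment.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 digest over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Return True only if the received signature matches the expected one.

    hmac.compare_digest runs in constant time, which avoids leaking
    how many leading characters of the signature were correct.
    """
    expected = sign(secret, body)
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )
```

Verify against the raw bytes exactly as received. Re-serialising the parsed JSON first can change whitespace or key order and make a valid signature fail.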
streaming-conditional routes turns below your score threshold into a separate flagged collection. QA teams see only the calls that need review - not every transcript.
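The routing rule itself is a simple threshold comparison. The following Python sketch shows the idea. The `Turn` type, the field names, and the 0.6 threshold in the usage note are hypothetical and do not reflect the streaming-conditional node's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    turn_id: str
    score: float


def route(turns: list[Turn], threshold: float) -> tuple[list[Turn], list[Turn]]:
    """Split turns into (flagged, passed) by comparing each score to the threshold."""
    flagged = [t for t in turns if t.score < threshold]
    passed = [t for t in turns if t.score >= threshold]
    return flagged, passed


def session_needs_review(turns: list[Turn], threshold: float) -> bool:
    """A session goes to QA if at least one of its turns is flagged."""
    return any(t.score < threshold for t in turns)
```

For example, `route([Turn("t1", 0.9), Turn("t2", 0.4)], threshold=0.6)` places `t2` in the flagged list and `t1` in the passed list.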
Eval scores, recording offsets, and webhook delivery status all land as span attributes. The canvas overlay shows the call as it happened, with scores annotated per turn.
Edit the eval-scorer rubric per region, per business line, or per script. The same workflow runs against multiple scoring policies - pick one at deploy time.
Every dependency is a registered Strongly service or a model you control. Swap any one of them in the install wizard. The graph stays intact.
The marketplace template is the graph. Every customisation below is a config change or a single-node addition - never a rewrite.
Edit the eval-scorer's rubric template. Common additions: brand-tone checks, regulated-disclosure verification, escalation-trigger detection.
Add a streaming-handoff-detector after the scorer. Turns below threshold escalate to a human queue mid-call instead of post-call.
Replace the always-on recorder with a flag-driven one - only flagged sessions get audio archived. Lower S3 spend, same QA coverage.
Insert streaming-pii-anonymiser upstream of streaming-recorder and streaming-conversation-store so archived audio and transcripts never carry raw PII.
Stand up the same workflow with different rubric configs (one per region or business line). Compare score distributions across deployments.
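Comparing score distributions across deployments can be as simple as bucketing per-turn scores into a histogram and reporting the mean for each deployment. The Python sketch below assumes scores are normalised to the range 0 to 1 and have already been exported from the archive. The function names and the five-bin layout are illustrative choices.

```python
from statistics import mean


def histogram(scores: list[float], bins: int = 5) -> list[int]:
    """Count scores in equal-width bins over [0, 1]; a score of 1.0 falls in the last bin."""
    counts = [0] * bins
    for s in scores:
        idx = min(int(s * bins), bins - 1)
        counts[idx] += 1
    return counts


def compare(deployments: dict[str, list[float]], bins: int = 5) -> dict[str, dict]:
    """Summarise each deployment's scores as a count, a mean, and a histogram."""
    return {
        name: {
            "turns": len(scores),
            "mean": mean(scores) if scores else None,
            "histogram": histogram(scores, bins),
        }
        for name, scores in deployments.items()
    }
```

Calling `compare({"emea": [...], "apac": [...]})` with each deployment's exported scores gives side-by-side summaries that make a stricter or looser rubric visible at a glance.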
We don't leave until it runs. Talk to a forward-deployed engineer about deploying Call Center QA into your environment with your STT, your LLM, your TTS, your data.