PII redacted. Prompt injections blocked. Verdict logged. Before the LLM sees a thing.
Inline moderation for live text and voice channels. Rules-first checks short-circuit risky messages before any LLM call. The moderator-LLM rules only on edge cases - bounded cost, auditable verdicts.
No black box. Each step is a typed-frame node you can edit, monitor, and replace.
Message arrives over WebSocket. The streaming-pii-anonymiser strips known patterns first.
The streaming-rule-based-classifier checks for prompt injection, abuse, banned topics. Hits short-circuit to a redact or block verdict.
Only the residual edge cases reach the moderator-LLM. The verdict (allow / redact / block) is written to the session log with a reason.
Allowed messages pass through. The downstream LLM is never invoked on a blocked message, and the moderator-LLM runs only on the edge cases the rules layer cannot settle - cost is bounded by the rules layer.
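The flow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the shipped node code: `Verdict`, `HARD_RULES`, `SOFT_SIGNALS`, and `moderate` are hypothetical names, and the patterns are toy examples. The idea is that hard rule hits and clearly clean messages return without touching the moderator, and only messages with a soft signal reach it.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    action: str  # "allow", "redact", or "block"
    reason: str

# Hypothetical hard rules: a hit decides the verdict with no LLM call.
HARD_RULES = [
    (re.compile(r"ignore (all )?previous instructions", re.I), "block", "prompt-injection signature"),
    (re.compile(r"\bidiot\b", re.I), "redact", "abuse lexicon"),
]

# Hypothetical soft signals: suspicious but not decisive, so they are edge cases.
SOFT_SIGNALS = [re.compile(r"\bsystem prompt\b", re.I)]

def moderate(text: str, moderator_llm: Callable[[str], Verdict]) -> Verdict:
    # 1. Deterministic rules first; a hit short-circuits.
    for pattern, action, reason in HARD_RULES:
        if pattern.search(text):
            return Verdict(action, f"rule: {reason}")
    # 2. No signal at all: allow without spending an LLM call.
    if not any(p.search(text) for p in SOFT_SIGNALS):
        return Verdict("allow", "rule: no signal")
    # 3. Residual edge case: the moderator decides.
    return moderator_llm(text)
```

Pass any callable as `moderator_llm`. In a test, a stub that records its inputs is enough to confirm that rule hits never reach it.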
Streaming graph contract, observability, and cost discipline come standard. The agent ships with a full test suite that runs in CI on every node version bump.
streaming-rule-based-classifier runs deterministic rules first. Prompt injection signatures, abuse lexicons, banned topic keywords. Hits short-circuit - no LLM call, no cost.
Inbound messages pass through streaming-pii-anonymiser before any classifier. SSN, email, US phone, account numbers - replaced with stable placeholders so downstream nodes never see the raw value.
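One way to picture "stable placeholders": the same raw value always maps to the same token within a session, so downstream nodes can still tell that two mentions refer to the same thing. The sketch below is an assumption about how that could work, not the node's actual implementation. `Anonymiser`, `PII_PATTERNS`, and the placeholder format are hypothetical, the regexes are deliberately simplified, and account numbers are left out because their format varies by institution.

```python
import re

# Simplified, illustrative patterns; a real deployment would need stricter ones.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"(?<!\d)(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"),
}

class Anonymiser:
    """Replaces PII with placeholders that stay stable for the object's lifetime."""

    def __init__(self):
        self._seen = {}  # (kind, raw value) -> placeholder

    def _placeholder(self, kind, raw):
        key = (kind, raw)
        if key not in self._seen:
            # Number placeholders per kind in order of first appearance.
            n = sum(1 for k in self._seen if k[0] == kind) + 1
            self._seen[key] = f"<{kind}_{n}>"
        return self._seen[key]

    def scrub(self, text):
        for kind, pattern in PII_PATTERNS.items():
            text = pattern.sub(lambda m, kind=kind: self._placeholder(kind, m.group(0)), text)
        return text
```

Keep one `Anonymiser` per session so that a value seen on turn one gets the same placeholder on turn five.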
Known injection signatures (jailbreak prefixes, system-prompt overrides, delimiter abuse) are detected and dropped at the gate. The moderator-LLM never sees the bait.
streaming-conditional routes by verdict: allow → downstream, redact → modified text + tag, block → drop with audit row. Each path is its own clean branch.
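As a rough sketch of the three branches, assuming a verdict is a plain dict with an `action` key (the `Routed` container, the `route` function, and the field names are hypothetical, not the node's real interface):

```python
from dataclasses import dataclass, field

@dataclass
class Routed:
    downstream: list = field(default_factory=list)  # messages forwarded onward
    audit: list = field(default_factory=list)       # rows for blocked messages

def route(message: str, verdict: dict, out: Routed) -> None:
    action = verdict["action"]
    if action == "allow":
        # Pass the original text through untouched.
        out.downstream.append({"text": message})
    elif action == "redact":
        # Forward the modified text, tagged so consumers know it was altered.
        out.downstream.append({"text": verdict["redacted_text"], "tag": "redacted"})
    elif action == "block":
        # Drop the message; keep only an audit row with the reason.
        out.audit.append({"action": "block", "reason": verdict["reason"]})
    else:
        raise ValueError(f"unknown verdict action: {action!r}")
```

Raising on an unknown action keeps a misconfigured verdict from silently falling through to the allow path.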
Every decision (allow / redact / block) lands in streaming-conversation-store with the rule that fired and the moderator's reason. Replay any session, any turn.
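To make "replay any session, any turn" concrete, here is a minimal in-memory stand-in. It assumes each row carries the session id, turn number, action, the rule that fired (if any), and the moderator's reason. `VerdictLog` and its methods are hypothetical and say nothing about how streaming-conversation-store persists data.

```python
class VerdictLog:
    """In-memory sketch of a per-session verdict log."""

    def __init__(self):
        self._rows = {}  # session_id -> list of rows in arrival order

    def record(self, session_id, turn, action, rule, reason):
        row = {"turn": turn, "action": action, "rule": rule, "reason": reason}
        self._rows.setdefault(session_id, []).append(row)

    def replay(self, session_id, turn=None):
        # Return copies so callers cannot mutate the stored history.
        rows = [dict(r) for r in self._rows.get(session_id, [])]
        if turn is None:
            return rows
        return [r for r in rows if r["turn"] == turn]
```

For example, `log.replay("s1", turn=2)` would return only the rows written for the second turn of session `s1`.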
Each verdict writes a span. Filter the canvas overlay by verdict to find the rule that's firing too often, or the policy gap that's letting things through.
Every dependency is a registered Strongly service or a model you control. Swap any one of them in the install wizard. The graph stays intact.
The marketplace template is the graph. Every customisation below is a config change or a single-node addition - never a rewrite.
Add domain-specific banned patterns to the rule-based-classifier. Brand-protection lexicons, regulated-industry term lists, regional law overrides.
The default presets are US-centric. Swap in EU-PII, UK-NI, or APAC presets, or add a regex pack for your locale.
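A locale regex pack could be as simple as a named set of patterns merged over the defaults. The sketch below assumes that shape; `US_PRESET`, `EU_PACK`, `build_patterns`, and `find_pii` are hypothetical names, and the actual preset format may differ. The IBAN pattern checks only the general layout (two-letter country code, two check digits, 11 to 30 alphanumeric characters) and does not validate the checksum.

```python
import re

# Illustrative stand-in for a US-centric default preset.
US_PRESET = {"SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")}

# Illustrative add-on pack: IBAN layout only, no mod-97 checksum validation.
EU_PACK = {"IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")}

def build_patterns(*packs):
    """Merge packs left to right; later packs override earlier keys."""
    merged = {}
    for pack in packs:
        merged.update(pack)
    return merged

def find_pii(text, patterns):
    """Return (kind, matched value) pairs for every pattern hit."""
    return [(kind, m.group(0)) for kind, p in patterns.items() for m in p.finditer(text)]
```

Because later packs override earlier keys, a locale pack can also replace a default pattern by reusing its name.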
Add a second pii-anonymiser between LLM and TTS so generated replies are also scrubbed - useful when the LLM might quote user input back.
Flip streaming-pii-anonymiser's action from 'redact' to 'drop'. Messages containing PII disappear entirely instead of being modified.
Replace websocket-response with streaming-webhook-response to send verdicts to your trust-and-safety SIEM.
We don't leave until it runs. Talk to a forward-deployed engineer about deploying Real-Time Content Moderator into your environment with your STT, your LLM, your TTS, your data.
Schedule a Demo