AI Infrastructure Architecture

Technical details

Real-time transport

WebRTC via JsSIP and an Asterisk PBX. Media encrypted with DTLS-SRTP (the same protocol Google Meet uses). Adaptive bitrate. Geo-distributed TURN/STUN servers. Sub-100 ms round-trip on most networks.

Speech-to-text

Primary: Deepgram Nova-2 (real-time, 7+ languages). Fallback: Whisper-large on self-hosted GPUs. ~200 ms transcription latency. Automatic language detection.
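
The primary/fallback arrangement above can be sketched roughly as follows. This is a minimal illustration, not the production implementation: the `Transcriber` signature, the function name, and the 1-second timeout are all assumptions, and the real Deepgram and Whisper clients would be wrapped to fit that signature.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

# Hypothetical signature: takes raw audio bytes, returns the transcript text.
Transcriber = Callable[[bytes], Awaitable[str]]


async def transcribe_with_fallback(
    audio: bytes,
    primary: Transcriber,
    fallback: Transcriber,
    timeout_s: float = 1.0,  # assumed budget, not a documented value
) -> Tuple[str, str]:
    """Return (provider_label, transcript), preferring the primary engine."""
    try:
        # Give the primary engine a bounded amount of time to answer.
        text = await asyncio.wait_for(primary(audio), timeout=timeout_s)
        return "primary", text
    except Exception:
        # Timeout or provider error: retry the same audio on the fallback.
        text = await fallback(audio)
        return "fallback", text
```

Returning the provider label alongside the transcript makes it easy to log how often the fallback path is taken.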

LLM orchestration

Multi-model routing: Claude Sonnet for nuanced conversations, GPT-4o for tool use, a fine-tuned Llama model for high-volume tier-1 requests. Per-customer routing rules. Retrieval-augmented generation (RAG) via pgvector and Qdrant.
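
One way to picture per-customer routing is a default table with customer-level overrides. The sketch below assumes three task labels (`nuanced`, `tool_use`, `tier1`) and uses illustrative model labels rather than real API model identifiers; `CustomerRules` and `pick_model` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Illustrative labels only; real provider model IDs will differ.
DEFAULT_ROUTES: Dict[str, str] = {
    "nuanced": "claude-sonnet",
    "tool_use": "gpt-4o",
    "tier1": "llama-finetuned",
}


@dataclass
class CustomerRules:
    """Hypothetical per-customer overrides, keyed by task kind."""
    overrides: Dict[str, str] = field(default_factory=dict)


def pick_model(task_kind: str, rules: Optional[CustomerRules] = None) -> str:
    """Choose a model: customer override first, then the default table."""
    if rules is not None and task_kind in rules.overrides:
        return rules.overrides[task_kind]
    # Unknown task kinds fall back to the model used for nuanced conversations.
    return DEFAULT_ROUTES.get(task_kind, DEFAULT_ROUTES["nuanced"])
```

For example, a customer that wants every tier-1 request handled by the larger model would carry `CustomerRules({"tier1": "claude-sonnet"})`.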

Text-to-speech

ElevenLabs Turbo (lowest latency) for premium voices. Azure Neural TTS as fallback. Custom voice cloning available (Enterprise). Sentence-level streaming: each sentence is synthesized and played as soon as it is ready, rather than after the full response is generated.
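
Sentence-level streaming can be sketched as a small buffer that sits between the LLM token stream and the TTS engine. This is a simplified illustration under stated assumptions: the function name is hypothetical, and the boundary rule (sentence punctuation followed by whitespace) is naive and would split on abbreviations such as "Dr. Smith".

```python
import re
from typing import Iterable, Iterator

# Naive boundary: sentence punctuation followed by whitespace.
_BOUNDARY = re.compile(r"[.!?]+\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        match = _BOUNDARY.search(buffer)
        while match:
            # Emit everything up to the boundary; keep the rest buffered.
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            match = _BOUNDARY.search(buffer)
    # Flush whatever is left once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail
```

Each yielded sentence can be handed to the TTS engine immediately, so the caller hears the first sentence while later ones are still being generated.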

Data & compliance

PostgreSQL primary store, encrypted at rest (AES-256). EU data residency option via Finnish-hosted infrastructure. Customer audio is never used to train external models. SOC 2 Type II audit in progress.

Reliability

99.9% uptime SLA (99.99% for Enterprise). Multi-region active-active. Automatic failover between STT, LLM, and TTS providers. Real-time status page. 24/7 on-call via PagerDuty.
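
To make the SLA figures concrete, the downtime each tier permits can be worked out directly. The helper below is a small arithmetic sketch; the function name and the 30-day measurement period are assumptions, since the actual SLA window is not stated here.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime percentage over a period."""
    total_minutes = period_days * 24 * 60  # 43,200 minutes in 30 days
    return total_minutes * (1 - uptime_percent / 100)
```

Over an assumed 30-day period, 99.9% allows about 43.2 minutes of downtime, while 99.99% allows about 4.3 minutes.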

How WebCallHub AI works

The voice loop (real-time)