Definition
Voice AI refers to systems that combine real-time speech recognition, large language model (LLM) reasoning, and natural-sounding text-to-speech to hold two-way spoken conversations. The term covers voice agents (which handle inbound and outbound calls), voice bots, and voice-enabled assistants; it is also applied more loosely to voice analytics, which analyzes calls rather than conducting them.
Components
- Speech-to-text (automatic speech recognition, ASR): converts caller speech to text in real time
- Reasoning (LLM): decides what to say and what to do
- Tool execution: calls external systems such as a CRM, a scheduler, or an electronic health record (EHR) to look up or update records
- Text-to-speech (TTS): converts the response to natural speech
- Voice transport: routes audio between caller and AI
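How the components above fit together in a single caller turn can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: every name in it (`Decision`, `stub_asr`, `stub_reason`, `stub_tts`, `fake_scheduler`, `handle_turn`) is hypothetical, the "audio" is plain UTF-8 bytes, and the reasoning step is a keyword rule standing in for an LLM. Voice transport is represented only by the bytes passed in and returned.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Decision:
    """Output of the reasoning step: what to say and, optionally, what to do."""
    reply: str
    tool: Optional[str] = None
    tool_arg: Optional[str] = None


def stub_asr(audio: bytes) -> str:
    # Stand-in for speech-to-text: the "audio" here is just UTF-8 text.
    return audio.decode("utf-8")


def stub_reason(transcript: str) -> Decision:
    # Stand-in for the LLM: a keyword rule instead of a model call.
    if "appointment" in transcript.lower():
        return Decision(
            reply="Let me check the schedule.",
            tool="scheduler",
            tool_arg=transcript,
        )
    return Decision(reply="How can I help you?")


def stub_tts(text: str) -> bytes:
    # Stand-in for text-to-speech: returns the text as bytes.
    return text.encode("utf-8")


def fake_scheduler(request: str) -> str:
    # Hypothetical scheduler integration; a real one would call an external system.
    return "You are booked for Tuesday at 10am."


def handle_turn(audio_in: bytes, tools: Dict[str, Callable[[str], str]]) -> bytes:
    """Run one caller turn: ASR -> reasoning -> tool execution -> TTS."""
    transcript = stub_asr(audio_in)
    decision = stub_reason(transcript)
    reply = decision.reply
    # Execute the requested tool only if it has been registered.
    if decision.tool is not None and decision.tool in tools:
        result = tools[decision.tool](decision.tool_arg or "")
        reply = f"{reply} {result}"
    return stub_tts(reply)
```

For example, `handle_turn(b"I need an appointment", {"scheduler": fake_scheduler})` passes the request through all four stages and returns the spoken reply as bytes. Passing tools in as a dictionary keeps the integrations (CRM, scheduler, EHR) swappable without touching the turn logic.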
Common uses
- AI receptionists / inbound call answering
- Outbound voice campaigns
- Voice-driven intake (healthcare, legal)
- Voice analytics for QA and coaching
- Voice-to-document workflows (proposals, contracts)