
Voice AI

Voice AI is software that holds natural spoken conversations with humans — understanding speech, reasoning about intent, and responding in real time with synthesized voice.


Definition

Voice AI refers to systems that combine real-time speech recognition, large language model reasoning, and natural-sounding text-to-speech to hold two-way spoken conversations. The term covers voice agents (inbound/outbound calls), voice bots, voice-enabled assistants, and voice analytics.

Components

  • Speech-to-text (ASR): converts caller speech to text in real time
  • Reasoning (LLM): decides what to say and what to do
  • Tool execution: calls external systems (CRM, scheduler, EHR, etc.) to look up data or carry out actions
  • Text-to-speech (TTS): converts the response to natural speech
  • Voice transport: routes audio between caller and AI
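
To make the flow between these components concrete, here is a minimal sketch of one conversational turn in Python. It is illustrative only: every name in it (handle_turn, Decision, fake_asr, fake_llm, fake_tts, book_slot) is hypothetical, and the stand-in functions replace real ASR, LLM, and TTS services with trivial placeholders. The voice transport is assumed to be whatever code calls handle_turn with the caller's audio and plays back the bytes it returns.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Decision:
    """Output of the reasoning step: a reply template and an optional tool call."""
    reply: str
    tool: Optional[str] = None
    tool_args: Dict[str, str] = field(default_factory=dict)


def handle_turn(
    audio_in: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], Decision],
    tools: Dict[str, Callable[..., str]],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one turn: speech-to-text, reasoning, optional tool call, text-to-speech."""
    text = asr(audio_in)                # Speech-to-text (ASR)
    decision = llm(text)                # Reasoning (LLM)
    reply = decision.reply
    if decision.tool is not None:       # Tool execution
        result = tools[decision.tool](**decision.tool_args)
        reply = reply.format(result=result)
    return tts(reply)                   # Text-to-speech (TTS)


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_asr(audio: bytes) -> str:
    # Placeholder: treats the "audio" bytes as UTF-8 text.
    return audio.decode("utf-8")


def fake_llm(text: str) -> Decision:
    # Placeholder: keyword matching instead of a language model.
    if "book" in text.lower():
        return Decision(
            reply="You are booked for {result}.",
            tool="book_slot",
            tool_args={"day": "Tuesday"},
        )
    return Decision(reply="How can I help you today?")


def fake_tts(text: str) -> bytes:
    # Placeholder: returns the reply as bytes instead of synthesized audio.
    return text.encode("utf-8")


def book_slot(day: str) -> str:
    # Placeholder for a scheduler integration.
    return f"{day} at 10:00"


TOOLS = {"book_slot": book_slot}

audio_out = handle_turn(
    b"I'd like to book an appointment", fake_asr, fake_llm, TOOLS, fake_tts
)
print(audio_out.decode("utf-8"))
```

Passing each component in as a plain callable keeps the turn logic independent of any particular vendor, so a stand-in can be swapped for a real service without changing handle_turn. A production system would typically stream audio and text between stages rather than process whole turns at once, which this sketch does not attempt.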

Common uses

  • AI receptionists / inbound call answering
  • Outbound voice campaigns
  • Voice-driven intake (healthcare, legal)
  • Voice analytics for QA and coaching
  • Voice-to-document workflows (proposals, contracts)