Turn Detection identifies when a speaker has finished their conversational turn and it’s appropriate for an AI to respond. It solves a critical problem in voice AI: respond too early and you interrupt the speaker; wait too long and the conversation feels awkward.
How It Works
Turn detection analyzes audio through a multi-stage pipeline:
- Voice Activity Detection (VAD): Detects when someone is speaking
- Audio Buffering: Collects speech segments for analysis
- AI Analysis: Examines speech patterns, content, and context to predict turn completion
- Event Emission: Fires `TurnStartedEvent` when speech begins and `TurnEndedEvent` when the turn is complete
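The four stages above can be sketched as a toy, frame-by-frame pipeline. This is a minimal illustration under stated assumptions, not the Vision Agents implementation. The VAD is a plain energy threshold on pre-computed per-frame energies. The "AI analysis" stage is a pluggable callback, and `silence_only` is a placeholder timeout rather than a neural model. `TurnStartedEvent` and `TurnEndedEvent` are local stand-in dataclasses whose fields (`start_ms`, `end_ms`) are hypothetical. The 20 ms frame size, `0.01` energy threshold and 600 ms timeout are illustrative values only.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class TurnStartedEvent:
    """Local stand-in for the event of the same name (fields are hypothetical)."""
    start_ms: int


@dataclass
class TurnEndedEvent:
    """Local stand-in for the event of the same name (fields are hypothetical)."""
    start_ms: int
    end_ms: int


Event = Union[TurnStartedEvent, TurnEndedEvent]
# Analysis callback: (buffered frame energies, trailing silence in ms) -> turn complete?
Predictor = Callable[[List[float], int], bool]


def silence_only(buffer: List[float], silence_ms: int) -> bool:
    """Placeholder for the AI analysis stage: a plain 600 ms silence timeout."""
    return silence_ms >= 600


class TurnPipeline:
    """Toy VAD -> buffering -> analysis -> event emission over fixed-size frames."""

    def __init__(self, is_turn_complete: Predictor = silence_only,
                 energy_threshold: float = 0.01, frame_ms: int = 20) -> None:
        self.is_turn_complete = is_turn_complete
        self.energy_threshold = energy_threshold
        self.frame_ms = frame_ms
        self.events: List[Event] = []
        self._buffer: List[float] = []
        self._in_turn = False
        self._start_ms = 0
        self._silence_ms = 0
        self._now_ms = 0

    def push(self, frame_energy: float) -> None:
        # 1. VAD: a frame counts as speech if its energy clears the threshold.
        speaking = frame_energy >= self.energy_threshold
        if speaking and not self._in_turn:
            self._in_turn = True
            self._start_ms = self._now_ms
            self._buffer = []
            self._silence_ms = 0
            self.events.append(TurnStartedEvent(self._start_ms))  # 4. emit start
        if self._in_turn:
            # 2. Buffering: keep every frame of the current turn for analysis.
            self._buffer.append(frame_energy)
            self._silence_ms = 0 if speaking else self._silence_ms + self.frame_ms
            # 3. Analysis: only ask the predictor during a pause.
            if not speaking and self.is_turn_complete(self._buffer, self._silence_ms):
                self._in_turn = False
                # 4. Emit end, stamped with the time the decision was made.
                self.events.append(TurnEndedEvent(self._start_ms, self._now_ms + self.frame_ms))
        self._now_ms += self.frame_ms


if __name__ == "__main__":
    pipeline = TurnPipeline()
    for energy in [0.1] * 10 + [0.0] * 40:  # 200 ms of speech, then 800 ms of silence
        pipeline.push(energy)
    print(pipeline.events)
    # [TurnStartedEvent(start_ms=0), TurnEndedEvent(start_ms=0, end_ms=800)]
```

In this sketch, replacing `silence_only` with a model-backed predicate is the only change needed to move from silence detection to turn detection. The VAD, buffering and event stages stay the same.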
The key is distinguishing between “I’m pausing to think” and “I’m done talking”, which simple silence detection can’t do.
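To make the difference concrete, here is a hedged sketch comparing a fixed silence timeout with a rule that also consults a turn completion probability. `p_complete` is assumed to come from some turn completion model, which is not shown. Every threshold (the 800 ms timeout, the 200 ms and 3000 ms pause bounds, the `0.5` probability cutoff) is an illustrative value, not a documented default.

```python
def silence_only(silence_ms: int, timeout_ms: int = 800) -> bool:
    """Plain silence detection: any long-enough pause ends the turn."""
    return silence_ms >= timeout_ms


def turn_aware(silence_ms: int, p_complete: float,
               min_pause_ms: int = 200, max_pause_ms: int = 3000,
               threshold: float = 0.5) -> bool:
    """Hypothetical rule: end the turn on a pause only if a model says the
    utterance sounds finished, with a long timeout as a safety net."""
    if silence_ms < min_pause_ms:
        return False  # too short to count as a pause at all
    if silence_ms >= max_pause_ms:
        return True   # the speaker has clearly gone quiet
    return p_complete >= threshold


# "I'd like to book a flight to... um": long thinking pause, low completion score
print(silence_only(1200), turn_aware(1200, p_complete=0.1))  # True False
# "Book it for Friday.": short pause after a finished sentence, high score
print(silence_only(300), turn_aware(300, p_complete=0.9))    # False True
```

The silence-only rule interrupts the thinking pause and lags on the finished sentence. The completion-aware rule gets both right, provided the completion score is accurate.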
Turn Detection vs VAD
| | VAD | Turn Detection |
|---|---|---|
| Question | “Is someone speaking?” | “Has the speaker finished?” |
| Output | Speech start/end timestamps | Turn completion signal |
| Intelligence | Simple audio analysis | Conversational context |
| Best for | Detecting presence | Knowing when to respond |
Vision Agents’ turn detection uses VAD under the hood, then applies neural models to determine turn completion.
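The split between the two layers can be illustrated with a toy example. An energy-based VAD reports only speech segments as start/end timestamps. A turn layer then merges those segments until a completion check says the speaker is done. The energy threshold, the 20 ms frame size and the `is_complete` callback are hypothetical stand-ins. In the real system the completion decision comes from a neural model rather than a hand-written predicate.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms)


def vad_segments(energies: List[float], frame_ms: int = 20,
                 threshold: float = 0.01) -> List[Segment]:
    """Toy energy VAD: answers 'is someone speaking?' as start/end timestamps."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = i * frame_ms                 # speech begins
        elif energy < threshold and start is not None:
            segments.append((start, i * frame_ms))  # speech ends
            start = None
    if start is not None:  # audio ended mid-speech
        segments.append((start, len(energies) * frame_ms))
    return segments


def group_into_turns(segments: List[Segment],
                     is_complete: Callable[[Segment], bool]) -> List[Segment]:
    """Turn layer: merge VAD segments until the (hypothetical) completion
    check says the speaker has finished."""
    turns: List[Segment] = []
    turn_start: Optional[int] = None
    for seg in segments:
        if turn_start is None:
            turn_start = seg[0]
        if is_complete(seg):
            turns.append((turn_start, seg[1]))
            turn_start = None
    return turns  # an unfinished trailing turn is not reported


# 100 ms of speech, a 200 ms pause, 100 ms more speech, then silence.
energies = [0.1] * 5 + [0.0] * 10 + [0.1] * 5 + [0.0] * 10
segments = vad_segments(energies)
print(segments)  # [(0, 100), (300, 400)]

# Pretend the model judged only the second segment to be a finished thought.
finished = {(300, 400)}
print(group_into_turns(segments, lambda seg: seg in finished))  # [(0, 400)]
```

VAD sees two separate speech segments. The turn layer treats them as one turn because the first pause was judged to be mid-thought.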
Available Plugins
| Plugin | Description |
|---|---|
| Smart Turn | Combines Silero VAD, Whisper features, and neural turn completion models |
| Vogent | Neural turn detection with high-accuracy prediction of turn completion |
Some STT plugins also include built-in turn detection via VAD, which means no separate plugin is needed:
| STT Plugin | Turn Detection |
|---|---|
| Deepgram | Built-in, with an `eager_turn_detection` option |
| ElevenLabs | Built-in via the VAD commit strategy |
For Realtime APIs (OpenAI, Gemini, AWS Bedrock, Qwen), turn detection is built in at the model level, so no separate plugin is needed.
When an STT plugin provides built-in turn detection, the Agent automatically ignores any external TurnDetector plugin to prevent conflicts.
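The precedence described above can be summarized as a small selection function. This is a hypothetical sketch of the rule, not the Agent's actual code. The `AgentConfig` fields and the returned labels are invented names. Treating a Realtime model as taking precedence over any configured plugin is an assumption based on turn detection being built in at the model level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentConfig:
    """Hypothetical summary of the components an agent was built with."""
    realtime_llm: bool = False            # a Realtime API model handles audio end to end
    stt_has_turn_detection: bool = False  # STT plugin with built-in turn detection
    turn_detector: Optional[str] = None   # name of an external turn detection plugin


def turn_detection_source(cfg: AgentConfig) -> Optional[str]:
    """Return which component decides turn boundaries (illustrative rule only)."""
    if cfg.realtime_llm:
        return "realtime model"  # assumption: model-level detection wins
    if cfg.stt_has_turn_detection:
        return "stt plugin"      # any external turn_detector is ignored
    return cfg.turn_detector     # None means nothing is configured to detect turns


print(turn_detection_source(AgentConfig(stt_has_turn_detection=True, turn_detector="smart_turn")))  # stt plugin
print(turn_detection_source(AgentConfig(turn_detector="smart_turn")))  # smart_turn
```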
Use Cases
- Voice Assistants: Respond at the right moment without interrupting
- Customer Service Bots: Natural conversation flow with customers
- Real-time Translation: Capture complete thoughts before translating
- Meeting Intelligence: Identify natural break points for summarization
- Interview Tools: AI interviewers that don’t interrupt
Next Steps