Voice Communication
This guide covers advanced voice communication features including real-time voice calls, text-to-speech (TTS), speech-to-text (STT), and LiveKit integration for WebRTC-based voice interactions.
Overview
AlooChat’s voice communication system enables:
- Real-time Voice Calls: WebRTC-based voice conversations with AI agents
- Text-to-Speech (TTS): Convert AI responses to natural speech using ElevenLabs
- Speech-to-Text (STT): Transcribe user voice input to text
- Voice Agent Configuration: Customize voice settings, language, and personality
Architecture
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Device   │────▶│  LiveKit Room   │────▶│    AI Engine    │
│    (WebRTC)     │◀────│  (WebRTC SFU)   │◀────│  (Voice Agent)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │   ElevenLabs    │
                                                │    (TTS/STT)    │
                                                └─────────────────┘
Setting Up Voice Agents
1. Create a Voice Agent
First, create a voice agent with your preferred voice settings:
curl -X POST "https://api.aloochat.ai/api/public/voice-agents" \
-H "x-api-key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"name": "Customer Support Voice",
"voice_id": "EXAVITQu4vr4xnSDxMaL",
"voice_name": "Sarah",
"language": "en",
"model_id": "eleven_turbo_v2_5",
"active": true,
"voice_stability": 0.5,
"voice_similarity_boost": 0.75,
"voice_speed": 100.0
}'
2. Link Voice Agent to Main Agent
Update your main agent to use the voice agent:
curl -X PUT "https://api.aloochat.ai/api/public/agents/{agent_id}" \
-H "x-api-key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"voice_agent": {
"id": "voice_agent_uuid_here"
}
}'
Voice Settings
Voice Parameters
| Parameter | Range | Default | Description |
|---|---|---|---|
| voice_stability | 0.0 - 1.0 | 0.5 | Higher = more consistent, lower = more expressive |
| voice_similarity_boost | 0.0 - 1.0 | 0.5 | Higher = closer to the original voice |
| voice_style | 0.0 - 1.0 | 0.0 | Style exaggeration (use sparingly) |
| voice_speed | 50 - 200 | 100 | Speech speed as a percentage of normal speed |
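The curl request in step 1 can also be issued from JavaScript. The sketch below first checks numeric settings against the ranges in the table above, so out-of-range values fail locally instead of at the API. validateVoiceSettings and createVoiceAgent are hypothetical helper names, not part of an AlooChat SDK. The sketch assumes the endpoint replies with JSON describing the new agent and signals failure with a non-2xx status. The optional fetchImpl parameter exists only so the function can be exercised without network access.

```javascript
// Documented ranges from the Voice Parameters table
const VOICE_PARAM_RANGES = {
  voice_stability: [0, 1],
  voice_similarity_boost: [0, 1],
  voice_style: [0, 1],
  voice_speed: [50, 200],
};

// Returns a list of problems; an empty list means the settings look valid
function validateVoiceSettings(settings) {
  const errors = [];
  for (const [name, [min, max]] of Object.entries(VOICE_PARAM_RANGES)) {
    if (!(name in settings)) continue; // omitted: presumably the documented default applies
    const value = settings[name];
    if (typeof value !== 'number' || Number.isNaN(value) || value < min || value > max) {
      errors.push(`${name} must be a number between ${min} and ${max} (got ${value})`);
    }
  }
  return errors;
}

// Hypothetical helper: validate locally, then POST to the create endpoint
async function createVoiceAgent(apiKey, settings, fetchImpl = fetch) {
  const errors = validateVoiceSettings(settings);
  if (errors.length > 0) throw new Error(errors.join('; '));
  const response = await fetchImpl('https://api.aloochat.ai/api/public/voice-agents', {
    method: 'POST',
    headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify(settings),
  });
  if (!response.ok) throw new Error(`Create voice agent failed: ${response.status}`);
  return response.json(); // assumed to describe the created agent
}
```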
Recommended Settings by Use Case
| Use Case | Stability | Similarity | Style | Speed |
|---|---|---|---|---|
| Customer Support | 0.5 | 0.75 | 0.0 | 100 |
| Sales/Marketing | 0.4 | 0.6 | 0.2 | 105 |
| Technical Support | 0.6 | 0.8 | 0.0 | 95 |
| Casual Chat | 0.3 | 0.5 | 0.3 | 100 |
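To reuse these recommendations, the presets can live in one lookup table that is merged into the create request body. This is a sketch. The preset keys and voiceAgentPayload are hypothetical names, and the sketch assumes the create endpoint accepts voice_style, which the parameter table lists but the example request does not send.

```javascript
// Recommended settings from the table above, keyed by use case
const VOICE_PRESETS = {
  customer_support: { voice_stability: 0.5, voice_similarity_boost: 0.75, voice_style: 0.0, voice_speed: 100 },
  sales_marketing: { voice_stability: 0.4, voice_similarity_boost: 0.6, voice_style: 0.2, voice_speed: 105 },
  technical_support: { voice_stability: 0.6, voice_similarity_boost: 0.8, voice_style: 0.0, voice_speed: 95 },
  casual_chat: { voice_stability: 0.3, voice_similarity_boost: 0.5, voice_style: 0.3, voice_speed: 100 },
};

// Merge a preset with voice-specific fields (name, voice_id, ...); explicit overrides win
function voiceAgentPayload(useCase, overrides = {}) {
  const preset = VOICE_PRESETS[useCase];
  if (!preset) {
    throw new Error(`Unknown use case "${useCase}"; expected one of: ${Object.keys(VOICE_PRESETS).join(', ')}`);
  }
  return { ...preset, ...overrides };
}

// Example: a technical support voice based on the "Technical Support" row
const payload = voiceAgentPayload('technical_support', {
  name: 'Technical Support Voice',
  voice_id: 'EXAVITQu4vr4xnSDxMaL',
});
console.log(payload.voice_speed); // 95
```

The spread order keeps the shared presets untouched and lets a caller override any single value, for example a slightly faster speed for one agent.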
Text-to-Speech (TTS)
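Convert text to speech with an ElevenLabs voice. The voice ID goes in the URL path, and the response body is the generated audio, which this example saves to response.mp3: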
curl -X POST "https://api.aloochat.ai/api/public/voice-agents/elevenlabs/voices/EXAVITQu4vr4xnSDxMaL/tts" \
-H "x-api-key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"text": "Hello! How can I help you today?",
"model_id": "eleven_turbo_v2_5",
"language_code": "en",
"voice_stability": 0.5,
"voice_similarity_boost": 0.75
}' \
--output response.mp3
Speech-to-Text (STT)
Transcribe an audio file to text by uploading it as multipart form data in the file field:
curl -X POST "https://api.aloochat.ai/api/public/voice-agents/stt" \
-H "x-api-key: your_api_key_here" \
-F "file=@recording.mp3"
Available Voices
Arabic Voices
Get curated Arabic voices optimized for Middle Eastern dialects:
const arabicVoices = await fetch('https://api.aloochat.ai/api/public/voice-agents/arabic?gender=female', {
headers: { 'x-api-key': 'your_api_key_here' }
}).then(r => r.json());
console.log('Arabic voices:', arabicVoices);
English Voices
Get curated English voices with various accents:
const englishVoices = await fetch('https://api.aloochat.ai/api/public/voice-agents/english?gender=male&accent=british', {
headers: { 'x-api-key': 'your_api_key_here' }
}).then(r => r.json());
console.log('English voices:', englishVoices);
Multilingual Voices
Get voices that support both Arabic and English:
const multilingualVoices = await fetch('https://api.aloochat.ai/api/public/voice-agents/multilingual', {
headers: { 'x-api-key': 'your_api_key_here' }
}).then(r => r.json());
console.log('Multilingual voices:', multilingualVoices);
ElevenLabs Models
| Model ID | Description | Best For |
|---|---|---|
| eleven_turbo_v2_5 | Latest turbo model, fast and high quality | Real-time conversations |
| eleven_multilingual_v2 | Best for multilingual content | Mixed Arabic/English content |
| eleven_monolingual_v1 | Original English model | English-only content |
Recommendation: Use eleven_turbo_v2_5 for most use cases. It provides the best balance of speed and quality for real-time voice interactions.
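The model table and the TTS endpoint can be combined in a small client-side helper. As a rough heuristic based on the table, this sketch picks eleven_multilingual_v2 when the text contains both Arabic script and Latin letters, and eleven_turbo_v2_5 otherwise. chooseModel and synthesizeSpeech are hypothetical names. The status property on the thrown error is this sketch's own convention, added so callers can detect rate limiting (HTTP 429). The API is not documented to provide it.

```javascript
const ARABIC_SCRIPT = /[\u0600-\u06FF]/;
const LATIN_LETTER = /[A-Za-z]/;

// Heuristic from the models table: mixed Arabic/English text -> eleven_multilingual_v2,
// everything else -> eleven_turbo_v2_5 (the recommended default)
function chooseModel(text) {
  const mixed = ARABIC_SCRIPT.test(text) && LATIN_LETTER.test(text);
  return mixed ? 'eleven_multilingual_v2' : 'eleven_turbo_v2_5';
}

// Hypothetical wrapper around the TTS endpoint; resolves to the audio as a Blob
async function synthesizeSpeech(apiKey, voiceId, text, options = {}, fetchImpl = fetch) {
  const url = `https://api.aloochat.ai/api/public/voice-agents/elevenlabs/voices/${encodeURIComponent(voiceId)}/tts`;
  const response = await fetchImpl(url, {
    method: 'POST',
    headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
    // Fields in options (language_code, voice_stability, ...) override the chosen model_id if given
    body: JSON.stringify({ text, model_id: chooseModel(text), ...options }),
  });
  if (!response.ok) {
    // Attach the HTTP status so callers can tell rate limiting (429) from other failures
    const error = new Error(`TTS failed: ${response.status}`);
    error.status = response.status;
    throw error;
  }
  return response.blob();
}
```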
Complete Voice Chat Example
Here’s a complete example of a voice chat implementation:
class VoiceChat {
constructor(apiKey, voiceId) {
this.apiKey = apiKey;
this.voiceId = voiceId;
this.conversationId = null;
this.agentKey = null;
}
async initialize(agentKey) {
this.agentKey = agentKey;
// Generate a unique conversation ID
this.conversationId = `voice-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
}
async processVoiceInput(audioBlob) {
// 1. Convert speech to text
const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');
const sttResponse = await fetch('https://api.aloochat.ai/api/public/voice-agents/stt', {
method: 'POST',
headers: { 'x-api-key': this.apiKey },
body: formData
});
if (!sttResponse.ok) throw new Error(`STT failed: ${sttResponse.status}`);
const { text: userMessage } = await sttResponse.json();
// 2. Send to chat API
const chatResponse = await fetch('https://api.aloochat.ai/api/public/chat', {
method: 'POST',
headers: {
'x-api-key': this.apiKey,
'Content-Type': 'application/json'
},
body: JSON.stringify({
agent_key: this.agentKey,
query: userMessage,
conversation_id: this.conversationId,
messages: []
})
});
if (!chatResponse.ok) throw new Error(`Chat request failed: ${chatResponse.status}`);
const { content: aiResponse } = await chatResponse.json();
// 3. Convert AI response to speech
const ttsResponse = await fetch(
`https://api.aloochat.ai/api/public/voice-agents/elevenlabs/voices/${this.voiceId}/tts`,
{
method: 'POST',
headers: {
'x-api-key': this.apiKey,
'Content-Type': 'application/json'
},
body: JSON.stringify({
text: aiResponse,
model_id: 'eleven_turbo_v2_5'
})
}
);
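if (!ttsResponse.ok) throw new Error(`TTS failed: ${ttsResponse.status}`);
// Distinct from the audioBlob parameter (redeclaring it with const would be a SyntaxError)
const speechBlob = await ttsResponse.blob();
return {
userMessage,
aiResponse,
audioUrl: URL.createObjectURL(speechBlob)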
};
}
}
// Usage
const voiceChat = new VoiceChat('your_api_key', 'EXAVITQu4vr4xnSDxMaL');
await voiceChat.initialize('your_agent_key');
// recordedAudioBlob: a Blob of the user's recorded speech (for example, from MediaRecorder)
const result = await voiceChat.processVoiceInput(recordedAudioBlob);
console.log('User said:', result.userMessage);
console.log('AI responded:', result.aiResponse);
// Play the audio response
const audio = new Audio(result.audioUrl);
audio.play();
Best Practices
1. Optimize for Latency
- Use the eleven_turbo_v2_5 model for the fastest response times
- Keep text chunks short for streaming TTS
- Pre-fetch the voice lists when the app initializes
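One way to pre-fetch voices is to request the three curated lists from Available Voices in parallel at startup and keep the resulting promise, so later lookups reuse it instead of calling the API again. This is a sketch that assumes each list endpoint returns JSON. createVoiceCatalog is a hypothetical name, and the cached promise is discarded after a failure so the next call can retry.

```javascript
const VOICE_LIST_BASE = 'https://api.aloochat.ai/api/public/voice-agents';

// Fetch the Arabic, English, and multilingual lists in parallel, once,
// and hand the same promise to every later caller
function createVoiceCatalog(apiKey, fetchImpl = fetch) {
  let cached = null;
  async function getJson(path) {
    const response = await fetchImpl(`${VOICE_LIST_BASE}/${path}`, { headers: { 'x-api-key': apiKey } });
    if (!response.ok) throw new Error(`GET ${path} failed: ${response.status}`);
    return response.json();
  }
  return function loadVoices() {
    if (!cached) {
      cached = Promise.all([getJson('arabic'), getJson('english'), getJson('multilingual')])
        .then(([arabic, english, multilingual]) => ({ arabic, english, multilingual }))
        .catch((error) => {
          cached = null; // allow a retry after a failed prefetch
          throw error;
        });
    }
    return cached;
  };
}
```

Call loadVoices() once during startup without awaiting it, so the requests start early, then await loadVoices() wherever a voice list is needed.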
2. Handle Errors Gracefully
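const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
// Assumes textToSpeech() rejects with an error whose `status` holds the HTTP status code
async function speakWithRetry(text, voiceId, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await textToSpeech(text, voiceId);
    } catch (error) {
      if (error.status === 429 && attempt < maxRetries) {
        // Rate limited: wait 1s, 2s, 4s, ... before retrying (exponential backoff)
        await sleep(1000 * 2 ** attempt);
        continue;
      }
      // Other errors, or retries exhausted: fall back to a text-only response
      displayTextResponse(text);
      return null;
    }
  }
}
3. Audio Quality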
- Use high-quality audio input (16kHz+ sample rate)
- Reduce background noise before STT
- Consider using noise suppression libraries
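In a browser, the input guidance above maps onto getUserMedia constraints, and MediaRecorder can turn the microphone stream into the Blob that the STT endpoint expects. This is a sketch under assumptions. The constraint values are illustrative. Browsers treat plain values as ideal rather than required and may ignore unsupported ones. voiceCaptureConstraints and recordClip are hypothetical helper names. The recorder constructor is a parameter only so the function can be tested outside a browser.

```javascript
// Capture constraints: plain values are "ideal" hints, not hard requirements
function voiceCaptureConstraints() {
  return {
    audio: {
      channelCount: 1,
      sampleRate: 16000, // ideal rate; the browser may choose a different (often higher) one
      echoCancellation: true,
      noiseSuppression: true, // browser-level noise suppression
      autoGainControl: true,
    },
  };
}

// Record `durationMs` of audio from a MediaStream and resolve to a single Blob
function recordClip(stream, durationMs, Recorder = globalThis.MediaRecorder) {
  return new Promise((resolve, reject) => {
    const recorder = new Recorder(stream);
    const chunks = [];
    recorder.ondataavailable = (event) => {
      if (event.data && event.data.size > 0) chunks.push(event.data);
    };
    recorder.onerror = (event) => reject(event.error || new Error('Recording failed'));
    recorder.onstop = () => resolve(new Blob(chunks, { type: recorder.mimeType || 'audio/webm' }));
    recorder.start();
    setTimeout(() => recorder.stop(), durationMs);
  });
}

// Browser usage:
// const stream = await navigator.mediaDevices.getUserMedia(voiceCaptureConstraints());
// const clip = await recordClip(stream, 5000);
// stream.getTracks().forEach((track) => track.stop()); // release the microphone
```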
4. Language Detection
For multilingual support, detect the language before selecting the appropriate voice:
async function detectAndSpeak(text) {
// Simple language detection (you can use a library for better accuracy)
const isArabic = /[\u0600-\u06FF]/.test(text);
const voiceId = isArabic ? 'arabic_voice_id' : 'english_voice_id';
const languageCode = isArabic ? 'ar' : 'en';
return textToSpeech(text, voiceId, languageCode);
}
Troubleshooting
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Audio not playing | Browser autoplay policy | Require user interaction before playing |
| Poor transcription | Low audio quality | Use noise suppression, higher sample rate |
| Slow TTS response | Large text chunks | Break text into smaller segments |
| Voice sounds robotic | Wrong stability settings | Adjust voice_stability (try 0.3-0.5) |
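For the "Slow TTS response" row, long replies can be split before synthesis, and each chunk can then be sent to TTS in order. The sketch below prefers sentence boundaries (including the Arabic question mark ؟) and falls back to word boundaries for overlong sentences. chunkText and the 250-character default are illustrative choices, not limits documented by AlooChat or ElevenLabs.

```javascript
// Split text into chunks of at most maxChars characters, preferring sentence
// boundaries and falling back to word boundaries for overlong sentences
function chunkText(text, maxChars = 250) {
  // Sentence-like pieces: text up to and including terminal punctuation
  const pieces = (text.match(/[^.!?؟]+[.!?؟]*|[.!?؟]+/g) || [])
    .map((piece) => piece.trim())
    .filter(Boolean);
  const chunks = [];
  let current = '';
  // Append a unit (sentence or word) to the current chunk, flushing when it would overflow
  const append = (unit) => {
    const candidate = current ? `${current} ${unit}` : unit;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      current = unit; // a single word longer than maxChars stays whole in its own chunk
    }
  };
  for (const piece of pieces) {
    if (piece.length <= maxChars) {
      append(piece);
    } else {
      for (const word of piece.split(/\s+/)) append(word);
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```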
Debug Mode
Enable debug logging to troubleshoot issues:
const DEBUG = true;
async function textToSpeechDebug(text, voiceId) {
if (DEBUG) {
console.log('TTS Request:', { text, voiceId });
console.time('TTS Response');
}
const response = await textToSpeech(text, voiceId);
if (DEBUG) {
console.timeEnd('TTS Response');
console.log('Audio size:', response.size, 'bytes');
}
return response;
}
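The debug helper above times a single function. To see every API call, including STT and chat requests, a wrapper around fetch can log the method, URL, status, and elapsed time. This is a sketch, and createDebugFetch is a hypothetical name. Request headers are deliberately not logged, so the x-api-key value stays out of the console.

```javascript
// Wrap any fetch-compatible function so each request logs method, URL,
// HTTP status, and elapsed time; the response is returned unchanged.
// Headers are never logged, to keep the x-api-key value out of logs.
function createDebugFetch(fetchImpl = fetch, log = console.log) {
  return async function debugFetch(url, init = {}) {
    const method = init.method || 'GET';
    const started = performance.now();
    try {
      const response = await fetchImpl(url, init);
      const elapsed = Math.round(performance.now() - started);
      log(`${method} ${url} -> ${response.status} in ${elapsed} ms`);
      return response;
    } catch (error) {
      const elapsed = Math.round(performance.now() - started);
      log(`${method} ${url} -> network error after ${elapsed} ms: ${error.message}`);
      throw error;
    }
  };
}
```

Because the wrapper has the same call shape as fetch, it can replace fetch while debugging and be removed afterward without other code changes.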