Voice Communication

This guide covers advanced voice communication features, including real-time voice calls, text-to-speech (TTS), speech-to-text (STT), and LiveKit integration for WebRTC-based voice interactions.

Overview

AlooChat’s voice communication system enables:

  • Real-time Voice Calls: WebRTC-based voice conversations with AI agents
  • Text-to-Speech (TTS): Convert AI responses to natural speech using ElevenLabs
  • Speech-to-Text (STT): Transcribe user voice input to text
  • Voice Agent Configuration: Customize voice settings, language, and personality

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Device   │────▶│   LiveKit Room  │────▶│   AI Engine     │
│   (WebRTC)      │◀────│   (WebRTC SFU)  │◀────│   (Voice Agent) │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                 ▲
                                 ▼
                        ┌─────────────────┐
                        │   ElevenLabs    │
                        │   (TTS/STT)     │
                        └─────────────────┘

Setting Up Voice Agents

1. Create a Voice Agent

First, create a voice agent with your preferred voice settings:

curl -X POST "https://api.aloochat.ai/api/public/voice-agents" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Customer Support Voice",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "voice_name": "Sarah",
    "language": "en",
    "model_id": "eleven_turbo_v2_5",
    "active": true,
    "voice_stability": 0.5,
    "voice_similarity_boost": 0.75,
    "voice_speed": 100.0
  }'

2. Attach the Voice Agent to Your Agent

Update your main agent to use the voice agent:

curl -X PUT "https://api.aloochat.ai/api/public/agents/{agent_id}" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_agent": {
      "id": "voice_agent_uuid_here"
    }
  }'
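The two calls above can be chained in JavaScript. This is a minimal sketch. It assumes the create response returns the new voice agent's ID in an id field, which this guide does not confirm. setUpVoiceAgent is a hypothetical helper name, and the optional fetchImpl parameter exists only so a stub can be swapped in for testing.

```javascript
const BASE_URL = 'https://api.aloochat.ai/api/public';

// Creates a voice agent, then links it to an existing agent.
// Assumption: the create response JSON includes the new voice agent's `id`.
async function setUpVoiceAgent(apiKey, agentId, voiceSettings, fetchImpl = fetch) {
  const headers = { 'x-api-key': apiKey, 'Content-Type': 'application/json' };

  // Step 1: POST /voice-agents with the voice settings
  const createRes = await fetchImpl(`${BASE_URL}/voice-agents`, {
    method: 'POST',
    headers,
    body: JSON.stringify(voiceSettings)
  });
  if (!createRes.ok) throw new Error(`Creating voice agent failed: HTTP ${createRes.status}`);
  const voiceAgent = await createRes.json();

  // Step 2: PUT /agents/{agent_id} so the main agent uses the new voice agent
  const updateRes = await fetchImpl(`${BASE_URL}/agents/${encodeURIComponent(agentId)}`, {
    method: 'PUT',
    headers,
    body: JSON.stringify({ voice_agent: { id: voiceAgent.id } })
  });
  if (!updateRes.ok) throw new Error(`Updating agent failed: HTTP ${updateRes.status}`);

  return voiceAgent.id;
}
```

For example, setUpVoiceAgent('your_api_key_here', 'your_agent_id', { name: 'Customer Support Voice', voice_id: 'EXAVITQu4vr4xnSDxMaL', model_id: 'eleven_turbo_v2_5', active: true }) resolves to the voice agent ID once both requests succeed.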

Voice Settings

Voice Parameters

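| Parameter              | Range     | Default | Description                                       |
|------------------------|-----------|---------|---------------------------------------------------|
| voice_stability        | 0.0 - 1.0 | 0.5     | Higher = more consistent, lower = more expressive |
| voice_similarity_boost | 0.0 - 1.0 | 0.5     | Higher = closer to original voice                 |
| voice_style            | 0.0 - 1.0 | 0.0     | Style exaggeration (use sparingly)                |
| voice_speed            | 50 - 200  | 100     | Speech speed percentage                           |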
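
Recommended Settings by Use Case

| Use Case          | Stability | Similarity | Style | Speed |
|-------------------|-----------|------------|-------|-------|
| Customer Support  | 0.5       | 0.75       | 0.0   | 100   |
| Sales/Marketing   | 0.4       | 0.6        | 0.2   | 105   |
| Technical Support | 0.6       | 0.8        | 0.0   | 95    |
| Casual Chat       | 0.3       | 0.5        | 0.3   | 100   |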
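The two tables can also be encoded for a client-side check before calling the API. This is a minimal sketch. The ranges, defaults, and presets are transcribed from the tables above. resolveVoiceSettings and the preset keys are hypothetical names, and the server remains the final authority on what it accepts.

```javascript
// Ranges and defaults transcribed from the Voice Parameters table
const VOICE_PARAM_RANGES = {
  voice_stability: { min: 0.0, max: 1.0, default: 0.5 },
  voice_similarity_boost: { min: 0.0, max: 1.0, default: 0.5 },
  voice_style: { min: 0.0, max: 1.0, default: 0.0 },
  voice_speed: { min: 50, max: 200, default: 100 }
};

// Recommended settings per use case (hypothetical key names)
const VOICE_PRESETS = {
  customerSupport: { voice_stability: 0.5, voice_similarity_boost: 0.75, voice_style: 0.0, voice_speed: 100 },
  salesMarketing: { voice_stability: 0.4, voice_similarity_boost: 0.6, voice_style: 0.2, voice_speed: 105 },
  technicalSupport: { voice_stability: 0.6, voice_similarity_boost: 0.8, voice_style: 0.0, voice_speed: 95 },
  casualChat: { voice_stability: 0.3, voice_similarity_boost: 0.5, voice_style: 0.3, voice_speed: 100 }
};

// Fills in defaults for missing values and throws a RangeError for any
// value outside its documented range. Returns only the four voice parameters.
function resolveVoiceSettings(overrides = {}) {
  const settings = {};
  for (const [name, { min, max, default: fallback }] of Object.entries(VOICE_PARAM_RANGES)) {
    const value = overrides[name] ?? fallback;
    if (typeof value !== 'number' || value < min || value > max) {
      throw new RangeError(`${name} must be between ${min} and ${max}, got ${value}`);
    }
    settings[name] = value;
  }
  return settings;
}
```

Because the result covers only the voice parameters, merge it into the full request body. For example: { name: 'Customer Support Voice', voice_id: 'EXAVITQu4vr4xnSDxMaL', ...resolveVoiceSettings(VOICE_PRESETS.customerSupport) }.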

Text-to-Speech (TTS)

Convert text to speech using ElevenLabs voices.

curl -X POST "https://api.aloochat.ai/api/public/voice-agents/elevenlabs/voices/EXAVITQu4vr4xnSDxMaL/tts" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! How can I help you today?",
    "model_id": "eleven_turbo_v2_5",
    "language_code": "en",
    "voice_stability": 0.5,
    "voice_similarity_boost": 0.75
  }' \
  --output response.mp3
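The error-handling, language-detection, and debug snippets under Best Practices and Troubleshooting call a textToSpeech helper. Here is one possible shape for it, as a sketch that wraps the TTS endpoint above. It resolves to an audio Blob and throws an Error carrying the HTTP status as error.status, so callers can single out 429 responses. The API_KEY constant, the options object, and the fetchImpl test hook are assumptions, not part of the AlooChat API.

```javascript
const API_KEY = 'your_api_key_here'; // placeholder: load from configuration in real code
const BASE_URL = 'https://api.aloochat.ai/api/public';

// Sends text to the TTS endpoint and resolves to the audio as a Blob.
// On an HTTP error, throws an Error whose `status` holds the HTTP status code.
async function textToSpeech(text, voiceId, languageCode = 'en',
                            { modelId = 'eleven_turbo_v2_5', fetchImpl = fetch } = {}) {
  const url = `${BASE_URL}/voice-agents/elevenlabs/voices/${encodeURIComponent(voiceId)}/tts`;
  const res = await fetchImpl(url, {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, model_id: modelId, language_code: languageCode })
  });
  if (!res.ok) {
    const error = new Error(`TTS failed: HTTP ${res.status}`);
    error.status = res.status;
    throw error;
  }
  return res.blob();
}
```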

Speech-to-Text (STT)

Transcribe an audio file to text by uploading it as multipart form data in the file field:

curl -X POST "https://api.aloochat.ai/api/public/voice-agents/stt" \
  -H "x-api-key: your_api_key_here" \
  -F "file=@recording.mp3"
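The same request can be made from JavaScript. This is a sketch in which speechToText is a hypothetical helper that uploads a Blob in the file field and returns the transcript. It assumes the response JSON carries the transcript in a text field, which is how the complete voice chat example reads it. API_KEY is a placeholder, and fetchImpl exists only for testing.

```javascript
const API_KEY = 'your_api_key_here'; // placeholder
const BASE_URL = 'https://api.aloochat.ai/api/public';

// Uploads audio as multipart form data and resolves to the transcript.
// Assumption: the response JSON has the transcript in a `text` field.
async function speechToText(audioBlob, filename = 'recording.webm', fetchImpl = fetch) {
  const form = new FormData();
  form.append('file', audioBlob, filename);
  // No Content-Type header: fetch sets multipart/form-data with its boundary
  const res = await fetchImpl(`${BASE_URL}/voice-agents/stt`, {
    method: 'POST',
    headers: { 'x-api-key': API_KEY },
    body: form
  });
  if (!res.ok) {
    const error = new Error(`STT failed: HTTP ${res.status}`);
    error.status = res.status;
    throw error;
  }
  const { text } = await res.json();
  return text;
}
```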

Available Voices

Arabic Voices

Get curated Arabic voices optimized for Middle Eastern dialects:

const arabicVoices = await fetch('https://api.aloochat.ai/api/public/voice-agents/arabic?gender=female', {
  headers: { 'x-api-key': 'your_api_key_here' }
}).then(r => r.json());
 
console.log('Arabic voices:', arabicVoices);

English Voices

Get curated English voices with various accents:

const englishVoices = await fetch('https://api.aloochat.ai/api/public/voice-agents/english?gender=male&accent=british', {
  headers: { 'x-api-key': 'your_api_key_here' }
}).then(r => r.json());
 
console.log('English voices:', englishVoices);

Multilingual Voices

Get voices that support both Arabic and English:

const multilingualVoices = await fetch('https://api.aloochat.ai/api/public/voice-agents/multilingual', {
  headers: { 'x-api-key': 'your_api_key_here' }
}).then(r => r.json());
 
console.log('Multilingual voices:', multilingualVoices);
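All three voice lists share one URL pattern, so a small helper can build the request and surface HTTP errors, which the one-liners above skip. This is a sketch: voiceListUrl and fetchVoices are hypothetical names. gender and accent are the only filters shown in this guide, and any other filter name is passed through unchecked.

```javascript
const BASE_URL = 'https://api.aloochat.ai/api/public';

// Builds the URL for one of the curated voice lists, adding filters
// (e.g. gender, accent) as query parameters; undefined values are skipped.
function voiceListUrl(collection, filters = {}) {
  const allowed = ['arabic', 'english', 'multilingual'];
  if (!allowed.includes(collection)) {
    throw new Error(`Unknown voice collection: ${collection}`);
  }
  const url = new URL(`${BASE_URL}/voice-agents/${collection}`);
  for (const [key, value] of Object.entries(filters)) {
    if (value !== undefined) url.searchParams.set(key, value);
  }
  return url.toString();
}

// Fetches a voice list and throws on HTTP errors instead of parsing an error body.
async function fetchVoices(apiKey, collection, filters = {}, fetchImpl = fetch) {
  const res = await fetchImpl(voiceListUrl(collection, filters), {
    headers: { 'x-api-key': apiKey }
  });
  if (!res.ok) throw new Error(`Voice list request failed: HTTP ${res.status}`);
  return res.json();
}
```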

ElevenLabs Models

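| Model ID               | Description                               | Best For                |
|------------------------|-------------------------------------------|-------------------------|
| eleven_turbo_v2_5      | Latest turbo model, fast and high quality | Real-time conversations |
| eleven_multilingual_v2 | Best for multilingual content             | Arabic/English mixed    |
| eleven_monolingual_v1  | Original English model                    | English-only content    |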

Recommendation: Use eleven_turbo_v2_5 for most use cases. It provides the best balance of speed and quality for real-time voice interactions.

Complete Voice Chat Example

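Here’s a complete example of a voice chat implementation. It chains the three steps into one round trip: speech-to-text on the user's recording, a chat request to your agent, and text-to-speech on the reply.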

class VoiceChat {
  constructor(apiKey, voiceId) {
    this.apiKey = apiKey;
    this.voiceId = voiceId;
    this.conversationId = null;
    this.agentKey = null;
  }

  async initialize(agentKey) {
    this.agentKey = agentKey;
    // Generate a unique conversation ID
    this.conversationId = `voice-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
  }

  // Throw on HTTP errors instead of parsing an error body as data
  static ensureOk(response, step) {
    if (!response.ok) {
      throw new Error(`${step} request failed: HTTP ${response.status}`);
    }
  }

  async processVoiceInput(audioBlob) {
    // 1. Convert speech to text
    const formData = new FormData();
    formData.append('file', audioBlob, 'recording.webm');

    const sttResponse = await fetch('https://api.aloochat.ai/api/public/voice-agents/stt', {
      method: 'POST',
      headers: { 'x-api-key': this.apiKey },
      body: formData
    });
    VoiceChat.ensureOk(sttResponse, 'Speech-to-text');
    const { text: userMessage } = await sttResponse.json();

    // 2. Send the transcript to the chat API
    const chatResponse = await fetch('https://api.aloochat.ai/api/public/chat', {
      method: 'POST',
      headers: {
        'x-api-key': this.apiKey,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        agent_key: this.agentKey,
        query: userMessage,
        conversation_id: this.conversationId,
        messages: []
      })
    });
    VoiceChat.ensureOk(chatResponse, 'Chat');
    const { content: aiResponse } = await chatResponse.json();

    // 3. Convert the AI response to speech
    const ttsResponse = await fetch(
      `https://api.aloochat.ai/api/public/voice-agents/elevenlabs/voices/${this.voiceId}/tts`,
      {
        method: 'POST',
        headers: {
          'x-api-key': this.apiKey,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          text: aiResponse,
          model_id: 'eleven_turbo_v2_5'
        })
      }
    );
    VoiceChat.ensureOk(ttsResponse, 'Text-to-speech');

    // Wrap the returned audio in an object URL the browser can play
    const speechBlob = await ttsResponse.blob();
    return {
      userMessage,
      aiResponse,
      audioUrl: URL.createObjectURL(speechBlob)
    };
  }
}

// Usage (top-level await requires an ES module)
const voiceChat = new VoiceChat('your_api_key', 'EXAVITQu4vr4xnSDxMaL');
await voiceChat.initialize('your_agent_key');

// recordedAudioBlob: audio captured from the user's microphone (e.g. with MediaRecorder)
const result = await voiceChat.processVoiceInput(recordedAudioBlob);
console.log('User said:', result.userMessage);
console.log('AI responded:', result.aiResponse);

// Play the audio response, then release the object URL
const audio = new Audio(result.audioUrl);
audio.addEventListener('ended', () => URL.revokeObjectURL(result.audioUrl));
audio.play();

Best Practices

1. Optimize for Latency

  • Use the eleven_turbo_v2_5 model for the fastest response times
  • Keep text chunks short when streaming TTS
  • Pre-fetch the voice list when the app initializes
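To keep TTS requests short, a long reply can be split at sentence boundaries before each piece is sent for synthesis. This is a minimal sketch: chunkText is a hypothetical helper and the 200-character default is arbitrary. The sentence regex (., !, ?, and the Arabic question mark ؟) is a heuristic, and a single sentence longer than the limit is kept whole.

```javascript
// Groups sentences into chunks of at most `maxChars` characters.
// A sentence longer than `maxChars` on its own becomes its own chunk.
function chunkText(text, maxChars = 200) {
  // Each match is a run of non-terminator characters, its terminators, and trailing spaces
  const sentences = text.match(/[^.!?؟]+[.!?؟]*\s*/g) ?? [];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && (current + sentence).length > maxChars) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Each chunk can then go to the TTS endpoint in order, so playback of the first chunk can start while later ones are still being synthesized.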

2. Handle Errors Gracefully

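// Assumes textToSpeech(text, voiceId) throws an error carrying the HTTP status
// as error.status, and displayTextResponse(text) shows the text in your UI.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function speakWithRetry(text, voiceId, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await textToSpeech(text, voiceId);
    } catch (error) {
      if (error.status === 429 && attempt < maxRetries) {
        // Rate limited: back off exponentially (1s, 2s, 4s, ...)
        await sleep(1000 * 2 ** attempt);
        continue;
      }
      // Other errors, or retries exhausted: fall back to a text-only response
      displayTextResponse(text);
      return null;
    }
  }
}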

3. Audio Quality

  • Use high-quality audio input (16kHz+ sample rate)
  • Reduce background noise before STT
  • Consider using noise suppression libraries
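The audio-quality tips map onto standard browser media constraints. This is a minimal sketch that assumes a browser environment. micConstraints asks getUserMedia for built-in noise suppression, echo cancellation, and an ideal 16 kHz sample rate, which browsers treat as hints and may ignore. The hypothetical recordClip helper records a fixed-length clip with MediaRecorder, which could serve as the recordedAudioBlob in the complete example. Only micConstraints runs outside a browser.

```javascript
// Constraints for navigator.mediaDevices.getUserMedia. `ideal` values are
// preferences, not requirements, so unsupported hardware still works.
function micConstraints(sampleRate = 16000) {
  return {
    audio: {
      sampleRate: { ideal: sampleRate },
      channelCount: { ideal: 1 },
      noiseSuppression: true,
      echoCancellation: true,
      autoGainControl: true
    }
  };
}

// Browser-only: records `durationMs` of microphone audio and resolves to a Blob.
async function recordClip(durationMs = 5000) {
  const stream = await navigator.mediaDevices.getUserMedia(micConstraints());
  const recorder = new MediaRecorder(stream);
  const parts = [];
  recorder.ondataavailable = (event) => parts.push(event.data);
  const stopped = new Promise((resolve) => { recorder.onstop = resolve; });
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;
  stream.getTracks().forEach((track) => track.stop()); // release the microphone
  return new Blob(parts, { type: recorder.mimeType });
}
```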

4. Language Detection

For multilingual support, detect the language before selecting the appropriate voice:

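async function detectAndSpeak(text) {
  // Simple check: any character in the Arabic Unicode block (U+0600 to U+06FF)
  // marks the text as Arabic. A language-detection library is more accurate.
  const isArabic = /[\u0600-\u06FF]/.test(text);

  // Placeholder IDs: substitute voice IDs from the Arabic and English voice lists
  const voiceId = isArabic ? 'arabic_voice_id' : 'english_voice_id';
  const languageCode = isArabic ? 'ar' : 'en';

  // textToSpeech(text, voiceId, languageCode) is your own TTS helper
  return textToSpeech(text, voiceId, languageCode);
}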
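The single-character test above labels a whole reply Arabic if it contains even one Arabic character. A hypothetical refinement counts Arabic-block characters against Latin letters and reports 'ar', 'en', or 'mixed'. The 0.2 threshold is an arbitrary assumption to tune. A 'mixed' result could be routed to a multilingual voice with eleven_multilingual_v2, which the model table recommends for mixed Arabic/English content.

```javascript
// Classifies text by the share of Arabic-block characters among all
// Arabic-block and Latin letters. Text with no letters defaults to 'en'.
function detectLanguage(text, mixedThreshold = 0.2) {
  const arabic = (text.match(/[\u0600-\u06FF]/g) ?? []).length;
  const latin = (text.match(/[A-Za-z]/g) ?? []).length;
  const total = arabic + latin;
  if (total === 0) return 'en';
  const arabicShare = arabic / total;
  if (arabicShare >= 1 - mixedThreshold) return 'ar';
  if (arabicShare <= mixedThreshold) return 'en';
  return 'mixed';
}
```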

Troubleshooting

Common Issues

| Issue                | Cause                    | Solution                                  |
|----------------------|--------------------------|-------------------------------------------|
| Audio not playing    | Browser autoplay policy  | Require user interaction before playing   |
| Poor transcription   | Low audio quality        | Use noise suppression, higher sample rate |
| Slow TTS response    | Large text chunks        | Break text into smaller segments          |
| Voice sounds robotic | Wrong stability settings | Adjust voice_stability (try 0.3-0.5)      |
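The autoplay row above has a concrete cause. In browsers, HTMLMediaElement.play() returns a promise that rejects with a NotAllowedError when autoplay is blocked. In this sketch, tryPlay is a hypothetical helper, and its AudioCtor parameter exists only for testing. It reports a blocked autoplay as false, so the UI can show a play button and retry from a click handler.

```javascript
// Attempts playback and resolves to true on success or false when the
// browser's autoplay policy blocks it. Other errors are rethrown.
async function tryPlay(url, AudioCtor = globalThis.Audio) {
  const audio = new AudioCtor(url);
  try {
    await audio.play();
    return true;
  } catch (error) {
    if (error.name === 'NotAllowedError') return false;
    throw error; // e.g. NotSupportedError for an unplayable source
  }
}
```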

Debug Mode

Enable debug logging to troubleshoot issues:

const DEBUG = true;

// Wraps textToSpeech (assumed to resolve to an audio Blob) with timing and size logs
async function textToSpeechDebug(text, voiceId) {
  if (DEBUG) {
    console.log('TTS Request:', { text, voiceId });
    console.time('TTS Response');
  }

  try {
    const audioBlob = await textToSpeech(text, voiceId);
    if (DEBUG) console.log('Audio size:', audioBlob.size, 'bytes');
    return audioBlob;
  } finally {
    // Stop the timer even when the request throws
    if (DEBUG) console.timeEnd('TTS Response');
  }
}