Voice Communication

This guide covers advanced voice communication features including real-time voice calls, text-to-speech (TTS), speech-to-text (STT), and LiveKit integration for WebRTC-based voice interactions.

Overview

AlooChat’s voice communication system enables:

Real-time Voice Calls: WebRTC-based voice conversations with AI agents
Text-to-Speech (TTS): Convert AI responses to natural speech using ElevenLabs
Speech-to-Text (STT): Transcribe user voice input to text
Voice Agent Configuration: Customize voice settings, language, and personality

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Device   │────▶│   LiveKit Room  │────▶│   AI Engine     │
│   (WebRTC)      │◀────│   (WebRTC SFU)  │◀────│   (Voice Agent) │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                │
                                ▼
                        ┌─────────────────┐
                        │   ElevenLabs    │
                        │   (TTS/STT)     │
                        └─────────────────┘

Setting Up Voice Agents

1. Create a Voice Agent

First, create a voice agent with your preferred voice settings:

curl -X POST "https://api.aloochat.ai/api/public/voice-agents" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Customer Support Voice",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "voice_name": "Sarah",
    "language": "en",
    "model_id": "eleven_turbo_v2_5",
    "active": true,
    "voice_stability": 0.5,
    "voice_similarity_boost": 0.75,
    "voice_speed": 100.0
  }'

const voiceAgent = await fetch('https://api.aloochat.ai/api/public/voice-agents', {
  method: 'POST',
  headers: {
    'x-api-key': 'your_api_key_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    name: 'Customer Support Voice',
    voice_id: 'EXAVITQu4vr4xnSDxMaL',
    voice_name: 'Sarah',
    language: 'en',
    model_id: 'eleven_turbo_v2_5',
    active: true,
    voice_stability: 0.5,
    voice_similarity_boost: 0.75,
    voice_speed: 100.0
  })
}).then(r => r.json());
 
console.log('Voice agent created:', voiceAgent.id);

import requests
 
response = requests.post(
    'https://api.aloochat.ai/api/public/voice-agents',
    headers={
        'x-api-key': 'your_api_key_here',
        'Content-Type': 'application/json'
    },
    json={
        'name': 'Customer Support Voice',
        'voice_id': 'EXAVITQu4vr4xnSDxMaL',
        'voice_name': 'Sarah',
        'language': 'en',
        'model_id': 'eleven_turbo_v2_5',
        'active': True,
        'voice_stability': 0.5,
        'voice_similarity_boost': 0.75,
        'voice_speed': 100.0
    }
)
 
voice_agent = response.json()
print(f"Voice agent created: {voice_agent['id']}")

2. Link Voice Agent to Main Agent

Update your main agent to use the voice agent:

curl -X PUT "https://api.aloochat.ai/api/public/agents/{agent_id}" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_agent": {
      "id": "voice_agent_uuid_here"
    }
  }'

const agentId = 'your_agent_id';
const voiceAgentId = 'your_voice_agent_id';
 
await fetch(`https://api.aloochat.ai/api/public/agents/${agentId}`, {
  method: 'PUT',
  headers: {
    'x-api-key': 'your_api_key_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    voice_agent: {
      id: voiceAgentId
    }
  })
});

import requests
 
agent_id = 'your_agent_id'
voice_agent_id = 'your_voice_agent_id'
 
requests.put(
    f'https://api.aloochat.ai/api/public/agents/{agent_id}',
    headers={
        'x-api-key': 'your_api_key_here',
        'Content-Type': 'application/json'
    },
    json={
        'voice_agent': {
            'id': voice_agent_id
        }
    }
)

Voice Settings

Voice Parameters

Parameter	Range	Default	Description
`voice_stability`	0.0 - 1.0	0.5	Higher = more consistent, lower = more expressive
`voice_similarity_boost`	0.0 - 1.0	0.5	Higher = closer to original voice
`voice_style`	0.0 - 1.0	0.0	Style exaggeration (use sparingly)
`voice_speed`	50 - 200	100	Speech speed percentage
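
The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch under that assumption: `VOICE_PARAM_RANGES` and `validate_voice_settings` are hypothetical helper names, not part of the AlooChat API, and the server may apply its own validation rules.

```python
# Hypothetical client-side check; the ranges are copied from the table above.
VOICE_PARAM_RANGES = {
    "voice_stability": (0.0, 1.0),
    "voice_similarity_boost": (0.0, 1.0),
    "voice_style": (0.0, 1.0),
    "voice_speed": (50.0, 200.0),
}


def validate_voice_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means every supplied value is in range."""
    problems = []
    for name, (low, high) in VOICE_PARAM_RANGES.items():
        if name not in settings:
            continue  # omitted parameters are left to the server-side defaults
        value = settings[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    return problems
```

Calling `validate_voice_settings({"voice_stability": 1.5})` returns one problem string, which can be shown to the user instead of sending a request that is likely to be rejected.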

Recommended Settings by Use Case

Use Case	Stability	Similarity	Style	Speed
Customer Support	0.5	0.75	0.0	100
Sales/Marketing	0.4	0.6	0.2	105
Technical Support	0.6	0.8	0.0	95
Casual Chat	0.3	0.5	0.3	100
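
One way to apply these recommendations is to keep them as named presets and merge the chosen preset into the request body. The sketch below is illustrative only: the preset keys and the `build_voice_agent_payload` helper are hypothetical, and it assumes the create endpoint accepts `voice_style` alongside the other parameters, which the create example above does not show.

```python
# Values copied from the "Recommended Settings by Use Case" table.
# Preset keys and the helper name are hypothetical, not part of the API.
VOICE_PRESETS = {
    "customer_support": {"voice_stability": 0.5, "voice_similarity_boost": 0.75,
                         "voice_style": 0.0, "voice_speed": 100.0},
    "sales_marketing": {"voice_stability": 0.4, "voice_similarity_boost": 0.6,
                        "voice_style": 0.2, "voice_speed": 105.0},
    "technical_support": {"voice_stability": 0.6, "voice_similarity_boost": 0.8,
                          "voice_style": 0.0, "voice_speed": 95.0},
    "casual_chat": {"voice_stability": 0.3, "voice_similarity_boost": 0.5,
                    "voice_style": 0.3, "voice_speed": 100.0},
}


def build_voice_agent_payload(name: str, voice_id: str, use_case: str,
                              language: str = "en",
                              model_id: str = "eleven_turbo_v2_5") -> dict:
    """Build a JSON body for creating a voice agent from a named preset."""
    if use_case not in VOICE_PRESETS:
        raise ValueError(f"unknown use case: {use_case!r}")
    return {
        "name": name,
        "voice_id": voice_id,
        "language": language,
        "model_id": model_id,
        "active": True,
        **VOICE_PRESETS[use_case],  # preset values fill in the voice_* fields
    }
```

The returned dictionary can be passed as the `json=` argument of the `requests.post` call shown in "Create a Voice Agent".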

Text-to-Speech (TTS)

Convert text to speech using ElevenLabs voices.

curl -X POST "https://api.aloochat.ai/api/public/voice-agents/elevenlabs/voices/EXAVITQu4vr4xnSDxMaL/tts" \
  -H "x-api-key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! How can I help you today?",
    "model_id": "eleven_turbo_v2_5",
    "language_code": "en",
    "voice_stability": 0.5,
    "voice_similarity_boost": 0.75
  }' \
  --output response.mp3

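async function textToSpeech(text, voiceId, languageCode = 'en') {
  const response = await fetch(
    `https://api.aloochat.ai/api/public/voice-agents/elevenlabs/voices/${voiceId}/tts`,
    {
      method: 'POST',
      headers: {
        'x-api-key': 'your_api_key_here',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: text,
        model_id: 'eleven_turbo_v2_5',
        language_code: languageCode,
        voice_stability: 0.5,
        voice_similarity_boost: 0.75
      })
    }
  );
 
  // fetch() does not reject on HTTP errors, so surface them explicitly
  if (!response.ok) {
    const error = new Error(`TTS request failed with status ${response.status}`);
    error.status = response.status;
    throw error;
  }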
 
  // Get audio blob and play it
  const audioBlob = await response.blob();
  const audioUrl = URL.createObjectURL(audioBlob);
  const audio = new Audio(audioUrl);
  audio.play();
  
  return audioUrl;
}
 
// Usage
const audioUrl = await textToSpeech('Hello! How can I help you today?', 'EXAVITQu4vr4xnSDxMaL');

import requests
 
def text_to_speech(text: str, voice_id: str, output_file: str = 'response.mp3'):
    response = requests.post(
        f'https://api.aloochat.ai/api/public/voice-agents/elevenlabs/voices/{voice_id}/tts',
        headers={
            'x-api-key': 'your_api_key_here',
            'Content-Type': 'application/json'
        },
        json={
            'text': text,
            'model_id': 'eleven_turbo_v2_5',
            'language_code': 'en',
            'voice_stability': 0.5,
            'voice_similarity_boost': 0.75
        }
    )
    
    # Fail fast so an error body is never written to disk as audio
    response.raise_for_status()
    
    with open(output_file, 'wb') as f:
        f.write(response.content)
    
    return output_file
 
# Usage
audio_file = text_to_speech('Hello! How can I help you today?', 'EXAVITQu4vr4xnSDxMaL')
print(f'Audio saved to: {audio_file}')

Speech-to-Text (STT)

Transcribe audio to text.

curl -X POST "https://api.aloochat.ai/api/public/voice-agents/stt" \
  -H "x-api-key: your_api_key_here" \
  -F "file=@recording.mp3"

async function speechToText(audioFile) {
  const formData = new FormData();
  formData.append('file', audioFile);
 
  const response = await fetch('https://api.aloochat.ai/api/public/voice-agents/stt', {
    method: 'POST',
    headers: {
      'x-api-key': 'your_api_key_here'
    },
    body: formData
  });
 
  const result = await response.json();
  return result.text;
}
 
// Usage with file input
const fileInput = document.getElementById('audioInput');
fileInput.addEventListener('change', async (e) => {
  const file = e.target.files[0];
  const transcription = await speechToText(file);
  console.log('Transcription:', transcription);
});

import requests
 
def speech_to_text(audio_file_path: str) -> str:
    with open(audio_file_path, 'rb') as audio_file:
        response = requests.post(
            'https://api.aloochat.ai/api/public/voice-agents/stt',
            headers={
                'x-api-key': 'your_api_key_here'
            },
            files={
                'file': ('audio.mp3', audio_file, 'audio/mpeg')
            }
        )
    
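    # Raise on HTTP errors instead of failing later on a missing 'text' key
    response.raise_for_status()
    result = response.json()
    return result['text']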
 
# Usage
transcription = speech_to_text('recording.mp3')
print(f'Transcription: {transcription}')

Available Voices

Arabic Voices

Get curated Arabic voices optimized for Middle Eastern dialects:

const arabicVoices = await fetch('https://api.aloochat.ai/api/public/voice-agents/arabic?gender=female', {
  headers: { 'x-api-key': 'your_api_key_here' }
}).then(r => r.json());
 
console.log('Arabic voices:', arabicVoices);

English Voices

Get curated English voices with various accents:

const englishVoices = await fetch('https://api.aloochat.ai/api/public/voice-agents/english?gender=male&accent=british', {
  headers: { 'x-api-key': 'your_api_key_here' }
}).then(r => r.json());
 
console.log('English voices:', englishVoices);

Multilingual Voices

Get voices that support both Arabic and English:

const multilingualVoices = await fetch('https://api.aloochat.ai/api/public/voice-agents/multilingual', {
  headers: { 'x-api-key': 'your_api_key_here' }
}).then(r => r.json());
 
console.log('Multilingual voices:', multilingualVoices);

ElevenLabs Models

Model ID	Description	Best For
`eleven_turbo_v2_5`	Latest turbo model, fast and high quality	Real-time conversations
`eleven_multilingual_v2`	Best for multilingual content	Arabic/English mixed
`eleven_monolingual_v1`	Original English model	English-only content

Recommendation: Use eleven_turbo_v2_5 for most use cases. It provides the best balance of speed and quality for real-time voice interactions.

Complete Voice Chat Example

Here’s a complete example of a voice chat implementation:

class VoiceChat {
  constructor(apiKey, voiceId) {
    this.apiKey = apiKey;
    this.voiceId = voiceId;
    this.conversationId = null;
    this.agentKey = null;
  }
 
  async initialize(agentKey) {
    this.agentKey = agentKey;
    // Generate a unique conversation ID
    this.conversationId = `voice-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
  }
 
  async processVoiceInput(audioBlob) {
    // 1. Convert speech to text
    const formData = new FormData();
    formData.append('file', audioBlob, 'recording.webm');
 
    const sttResponse = await fetch('https://api.aloochat.ai/api/public/voice-agents/stt', {
      method: 'POST',
      headers: { 'x-api-key': this.apiKey },
      body: formData
    });
    const { text: userMessage } = await sttResponse.json();
 
    // 2. Send to chat API
    const chatResponse = await fetch('https://api.aloochat.ai/api/public/chat', {
      method: 'POST',
      headers: {
        'x-api-key': this.apiKey,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        agent_key: this.agentKey,
        query: userMessage,
        conversation_id: this.conversationId,
        messages: []
      })
    });
    const { content: aiResponse } = await chatResponse.json();
 
    // 3. Convert AI response to speech
    const ttsResponse = await fetch(
      `https://api.aloochat.ai/api/public/voice-agents/elevenlabs/voices/${this.voiceId}/tts`,
      {
        method: 'POST',
        headers: {
          'x-api-key': this.apiKey,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          text: aiResponse,
          model_id: 'eleven_turbo_v2_5'
        })
      }
    );
 
    // Named differently from the audioBlob parameter, which cannot be redeclared
    const responseAudio = await ttsResponse.blob();
    return {
      userMessage,
      aiResponse,
      audioUrl: URL.createObjectURL(responseAudio)
    };
  }
}
 
// Usage
const voiceChat = new VoiceChat('your_api_key', 'EXAVITQu4vr4xnSDxMaL');
await voiceChat.initialize('your_agent_key');
 
// When user records audio
const result = await voiceChat.processVoiceInput(recordedAudioBlob);
console.log('User said:', result.userMessage);
console.log('AI responded:', result.aiResponse);
 
// Play the audio response
const audio = new Audio(result.audioUrl);
audio.play();
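
The same three-step flow (STT, chat, TTS) can be sketched in Python with the network calls passed in as callables, which keeps the orchestration testable offline. `VoiceTurn`, `process_voice_turn`, and the three callable parameters are hypothetical names for illustration; in practice each callable would wrap the corresponding endpoint shown earlier in this guide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    user_message: str
    ai_response: str
    audio: bytes


def process_voice_turn(
    recording: bytes,
    transcribe: Callable[[bytes], str],
    chat: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Run one STT -> chat -> TTS round trip using the supplied callables."""
    user_message = transcribe(recording)
    if not user_message.strip():
        # Nothing was recognized, so skip the chat and TTS calls entirely
        return VoiceTurn(user_message="", ai_response="", audio=b"")
    ai_response = chat(user_message)
    audio = synthesize(ai_response)
    return VoiceTurn(user_message, ai_response, audio)
```

Returning early on an empty transcription avoids spending a chat and a TTS request on silence or background noise.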

Best Practices

1. Optimize for Latency

Use eleven_turbo_v2_5 model for fastest response times
Keep text chunks short for streaming TTS
Pre-fetch voices list on app initialization
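
Keeping chunks short usually means splitting the response at sentence boundaries so that the first chunk can be synthesized while the rest is still pending. The function below is a rough sketch of that idea; `split_into_chunks` and the 200-character default are assumptions for illustration, not values documented by AlooChat or ElevenLabs.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after ., !, ? or the Arabic question mark when followed by whitespace
    sentences = re.split(r"(?<=[.!?\u061F])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to the TTS endpoint in order, and playback of the first chunk can begin before later chunks have been generated.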

2. Handle Errors Gracefully

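const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
 
// Assumes textToSpeech throws an error carrying the HTTP status code.
// displayTextResponse stands for your application's own UI function.
async function speakWithFallback(text, voiceId, retries = 3) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await textToSpeech(text, voiceId);
    } catch (error) {
      if (error.status === 429 && attempt < retries) {
        // Rate limited - back off exponentially: 1s, 2s, 4s
        await sleep(1000 * 2 ** attempt);
        continue;
      }
      // Fallback to text-only response
      displayTextResponse(text);
      return null;
    }
  }
}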
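
For Python clients, exponential backoff can be sketched as a small wrapper around any callable. This is an illustration under stated assumptions: `RateLimitError` is a hypothetical exception that your own TTS wrapper would raise on HTTP 429, and the retry count and delays are arbitrary defaults rather than documented API limits.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's TTS wrapper on HTTP 429."""


def call_with_backoff(func, *, max_retries=4, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call func(); on RateLimitError wait, doubling the delay on each retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted; let the caller fall back to text
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying at the same moment
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so the wrapper can be tested without real waiting; production code can leave it at the default.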

3. Audio Quality

Use high-quality audio input (sample rate of 16 kHz or higher)
Reduce background noise before STT
Consider using noise suppression libraries
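
A recording's sample rate can be checked before upload. The sketch below uses Python's standard `wave` module, so it only covers uncompressed WAV input; MP3 or WebM recordings would need a third-party decoder. `check_wav_sample_rate` is a hypothetical helper name, and the 16 kHz threshold is taken from the guidance above.

```python
import wave


def check_wav_sample_rate(source, min_rate: int = 16_000) -> tuple[int, bool]:
    """Return (sample_rate, meets_minimum) for a WAV file path or binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
    return rate, rate >= min_rate
```

If the second element of the result is `False`, the application can ask the user to re-record or warn that transcription quality may suffer.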

4. Language Detection

For multilingual support, detect the language before selecting the appropriate voice:

async function detectAndSpeak(text) {
  // Simple language detection (you can use a library for better accuracy)
  const isArabic = /[\u0600-\u06FF]/.test(text);
  
  const voiceId = isArabic ? 'arabic_voice_id' : 'english_voice_id';
  const languageCode = isArabic ? 'ar' : 'en';
  
  return textToSpeech(text, voiceId, languageCode);
}
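
Testing for any Arabic character classifies an English sentence containing a single Arabic word as Arabic. A slightly more robust heuristic, sketched here in Python, compares how many characters fall in the Arabic Unicode block with how many are Latin letters. `detect_language` and `VOICE_BY_LANGUAGE` are hypothetical names, and the voice IDs are the same placeholders used above.

```python
# Placeholder voice IDs, as in the JavaScript example above
VOICE_BY_LANGUAGE = {"ar": "arabic_voice_id", "en": "english_voice_id"}


def detect_language(text: str) -> str:
    """Return 'ar' if Arabic-block characters outnumber ASCII letters, else 'en'."""
    arabic = sum(1 for ch in text if "\u0600" <= ch <= "\u06FF")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return "ar" if arabic > latin else "en"


def select_voice(text: str) -> tuple[str, str]:
    """Return (voice_id, language_code) for the dominant script in text."""
    language = detect_language(text)
    return VOICE_BY_LANGUAGE[language], language
```

This remains a heuristic: it defaults to English for empty or symbol-only input and does not distinguish Arabic from other languages written in the same script, such as Persian or Urdu.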

Troubleshooting

Common Issues

Issue	Cause	Solution
Audio not playing	Browser autoplay policy	Require user interaction before playing
Poor transcription	Low audio quality	Use noise suppression, higher sample rate
Slow TTS response	Large text chunks	Break text into smaller segments
Voice sounds robotic	Wrong stability settings	Adjust voice_stability (try 0.3-0.5)

Debug Mode

Enable debug logging to troubleshoot issues:

const DEBUG = true;
 
async function textToSpeechDebug(text, voiceId) {
  if (DEBUG) {
    console.log('TTS Request:', { text, voiceId });
    console.time('TTS Response');
  }
  
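  // textToSpeech returns an object URL for the generated audio
  const audioUrl = await textToSpeech(text, voiceId);
  
  if (DEBUG) {
    console.timeEnd('TTS Response');
    // Read the blob back from the object URL to report its size
    const audioBlob = await fetch(audioUrl).then(r => r.blob());
    console.log('Audio size:', audioBlob.size, 'bytes');
  }
  
  return audioUrl;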
}
