
Ultravox Integration

Ultravox is a key integration partner for AngelCX, providing voice AI capabilities and retrieval-augmented generation (RAG) services. This document describes how Ultravox is integrated into our platform and which functions it covers.

Overview

Ultravox provides two main services to AngelCX:

  1. Voice AI Services

    • Real-time voice conversations
    • Voice cloning and synthesis
    • Call handling and management
    • Call recording and transcription
  2. RAG Services

    • Document ingestion and processing
    • Vector embeddings management
    • Semantic search capabilities
    • Knowledge base updates
    • Web crawling / scraping

Voice AI Integration

Call Flow Architecture

sequenceDiagram
    participant Visitor
    participant BotUI
    participant AIEngine
    participant Ultravox
    participant Database

    Visitor->>BotUI: Initiate voice call
    BotUI->>AIEngine: Request call session
    AIEngine->>Ultravox: Create call
    Ultravox-->>AIEngine: Return Call ID & Join URL
    AIEngine->>Database: Store session info
    AIEngine-->>BotUI: Return join details
    BotUI->>Ultravox: Connect to call (WebRTC)

    loop Voice Conversation
        Visitor->>Ultravox: Stream audio
        Ultravox->>Ultravox: Process speech
        Ultravox-->>Visitor: Stream response
    end

    Ultravox->>AIEngine: Call ended webhook
    AIEngine->>Database: Update session
    AIEngine->>Database: Store transcription
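
The "Create call" and "Return join details" steps of the diagram can be sketched as follows. This is a minimal illustration, not the AI Engine's actual code: the endpoint URL, the `systemPrompt` request field, and the `callId` / `joinUrl` response fields are assumptions that should be checked against the Ultravox API reference, and `CallSession`, `create_call_session`, `join_details`, and `fake_post_json` are hypothetical names. The HTTP call is injected as a function so the sketch runs without network access.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Assumed endpoint; verify against the Ultravox API reference before use.
ULTRAVOX_CALLS_URL = "https://api.ultravox.ai/api/calls"

# Injected HTTP function: (url, json_body) -> parsed JSON response.
PostJson = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class CallSession:
    """Hypothetical session record the AI Engine stores in the database."""
    visitor_id: str
    call_id: str
    join_url: str
    status: str = "created"


def create_call_session(post_json: PostJson, visitor_id: str, system_prompt: str) -> CallSession:
    """Ask Ultravox for a call and keep the identifiers needed later."""
    response = post_json(ULTRAVOX_CALLS_URL, {"systemPrompt": system_prompt})
    # "callId" and "joinUrl" are assumed response field names.
    return CallSession(
        visitor_id=visitor_id,
        call_id=response["callId"],
        join_url=response["joinUrl"],
    )


def join_details(session: CallSession) -> Dict[str, str]:
    """Payload handed back to the Bot UI so it can connect over WebRTC."""
    return {"callId": session.call_id, "joinUrl": session.join_url}


def fake_post_json(url: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in transport for local experiments; returns a canned response."""
    return {"callId": "call-123", "joinUrl": "wss://example.invalid/join/call-123"}
```

Calling `create_call_session(fake_post_json, "visitor-1", "You are a helpful support agent.")` yields a session whose `join_details` can be returned to the Bot UI. In a real deployment the injected function would perform an authenticated HTTPS request.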

Voice Features

  1. Real-time Voice Processing

    • Low-latency audio streaming
    • WebRTC-based communication
    • High-quality voice synthesis
    • Natural conversation flow
  2. Voice Cloning

    • Custom voice creation
    • Voice model training
    • Brand-specific voice personas
    • Multiple language support
  3. Call Management

    • Session initialization
    • Call state tracking
    • Automated disconnection handling
    • Quality monitoring
  4. Recording & Transcription

    • Real-time transcription
    • Call recording storage
    • Conversation analytics
    • Quality assurance tools
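
Call state tracking can be pictured as a small state machine. The sketch below is illustrative only: the state names and the allowed transitions are hypothetical and do not describe Ultravox's internal call lifecycle.

```python
from enum import Enum
from typing import Dict, List, Set


class CallState(Enum):
    CREATED = "created"
    ACTIVE = "active"
    ENDED = "ended"
    FAILED = "failed"


# Hypothetical transition table: which states may follow each state.
ALLOWED: Dict[CallState, Set[CallState]] = {
    CallState.CREATED: {CallState.ACTIVE, CallState.ENDED, CallState.FAILED},
    CallState.ACTIVE: {CallState.ENDED, CallState.FAILED},
    CallState.ENDED: set(),
    CallState.FAILED: set(),
}


class CallTracker:
    """Tracks one call's state and rejects transitions the table forbids."""

    def __init__(self) -> None:
        self.state = CallState.CREATED
        self.history: List[CallState] = [CallState.CREATED]

    def transition(self, new_state: CallState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)

    @property
    def is_terminal(self) -> bool:
        # A state with no outgoing transitions ends the call's lifecycle.
        return not ALLOWED[self.state]
```

Keeping the transitions in one table makes it easy to detect out-of-order events, such as an "active" update arriving after the call has already ended.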

RAG Integration

Document Processing Flow

graph TD
    A[Admin Dashboard] -->|Upload Document| B[Admin API]
    B -->|Process Request| C[Ultravox RAG API]
    C -->|Create Embeddings| D[Vector Store]
    C -->|Extract Text| E[Document Store]
    C -->|Index Content| F[Search Index]
    G[AI Engine] -->|Query| C
    C -->|Search| F
    C -->|Retrieve| D
    C -->|Format| H[Response]
    H -->|Return| G
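
To make the ingest-then-query path in the diagram concrete, here is a toy, self-contained sketch of embedding creation and similarity search. It is not how Ultravox implements RAG: the word-count "embedding" stands in for a learned embedding model, and `TinyVectorStore`, `embed`, and `cosine` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Real systems use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """In-memory stand-in for the vector store and search index."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[str, Counter]] = {}

    def ingest(self, doc_id: str, text: str) -> None:
        # Store the extracted text together with its embedding.
        self._docs[doc_id] = (text, embed(text))

    def query(self, question: str, top_k: int = 1) -> List[Tuple[str, float]]:
        # Score every document against the question, best match first.
        q = embed(question)
        scored = [(doc_id, cosine(q, vec)) for doc_id, (_, vec) in self._docs.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

After ingesting a refund-policy document and a shipping document, a question that mentions "refund policy" ranks the refund document first because it shares more terms with the query.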

RAG Features

  1. Document Management

    • Multiple format support (PDF, DOC, TXT)
    • Batch processing
    • Version control
    • Content extraction
  2. Vector Operations

    • Embedding generation
    • Vector storage
    • Similarity search
    • Clustering capabilities
  3. Knowledge Base

    • Structured data organization
    • Real-time updates
    • Content categorization
    • Context window management
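
Context window management usually means deciding which retrieved chunks fit into the model's prompt. A minimal sketch, assuming a greedy rank-order policy and a crude word-count token estimate (both are illustrative choices, not documented Ultravox behavior):

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words. Real tokenizers differ."""
    return len(text.split())


def pack_context(ranked_chunks: List[str], max_tokens: int) -> List[str]:
    """Greedily keep chunks, in rank order, that still fit in the token budget."""
    selected: List[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > max_tokens:
            continue  # This chunk does not fit; a later, smaller one still might.
        selected.append(chunk)
        used += cost
    return selected
```

With a budget of 5 and chunks of 3, 5, and 2 estimated tokens, the first and third chunks are kept and the second is skipped.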

Integration Points

Admin API Integration

The Admin API interacts with Ultravox for:

  • RAG collection management
  • Voice model configuration
  • Analytics retrieval
  • Recordings
  • System settings

AI Engine Integration

The AI Engine leverages Ultravox for:

  • Real-time voice conversations
  • RAG query processing
  • Response generation
  • Session management
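
Session management on the AI Engine side includes reacting to the call-ended webhook from the call flow by updating the session and storing the transcription. The sketch below assumes a payload shaped like `{"event": "call.ended", "call": {"callId": ...}}` and a transcript supplied as a list of messages; both shapes, the in-memory `sessions` dictionary, and `handle_call_ended` are hypothetical.

```python
from typing import Any, Dict, List

# In-memory stand-in for the database; purely illustrative.
sessions: Dict[str, Dict[str, Any]] = {}


def handle_call_ended(payload: Dict[str, Any], transcript: List[Dict[str, str]]) -> Dict[str, Any]:
    """Mark the session as ended and attach its transcript.

    The payload shape is an assumption, not a documented contract.
    """
    if payload.get("event") != "call.ended":
        raise ValueError("unexpected event type")
    call_id = payload["call"]["callId"]
    # Create a placeholder record if the session was never stored.
    session = sessions.setdefault(call_id, {"status": "created"})
    session["status"] = "ended"
    session["transcript"] = transcript
    return session
```

A production handler would also verify the webhook's authenticity and persist the record rather than hold it in memory.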

Frontend Integration with Ultravox SDK

The Ultravox SDK handles WebRTC connectivity, audio processing, and call management for our Bot UI frontend, which reduces development complexity and maintenance overhead.

SDK Benefits

  1. Simplified WebRTC Implementation

    • Built-in connection management
    • Network state monitoring
    • Fallback mechanisms for different network conditions
  2. Audio Processing

    • Automatic audio stream optimization
    • Background noise reduction
    • Echo cancellation
    • Voice activity detection
  3. Call Management

    • Automatic reconnection handling
    • Call quality monitoring
    • Session state management
    • Event-driven architecture
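
Automatic reconnection is commonly built on capped exponential backoff. The SDK's actual reconnection policy is not described in this document, so the sketch below is a generic illustration with hypothetical parameter values.

```python
from typing import List


def backoff_delays(attempts: int, base: float = 0.5, factor: float = 2.0, cap: float = 8.0) -> List[float]:
    """Delay in seconds before each retry: base * factor**i, limited to cap."""
    return [min(cap, base * factor ** i) for i in range(attempts)]
```

For six attempts with the defaults, the delays grow from 0.5 s to 4 s and then stay at the 8 s cap, which avoids hammering the network while still retrying promptly after a brief drop.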

Offloaded Complexities

The SDK handles several complex tasks automatically:

  1. Media Management

    • Audio codec negotiation
    • Sample rate adaptation
    • Buffer management
    • Stream synchronization
  2. Network Handling

    • Connection optimization
    • Bandwidth adaptation
    • Latency compensation
    • Packet loss recovery
  3. Voice Processing

    • Real-time transcription
    • Voice synthesis
    • Audio enhancement
    • Latency optimization

Offloading these tasks to the SDK reduces the effort required to implement voice capabilities in our Bot UI, so the frontend team can focus on the user experience rather than on WebRTC and audio processing internals.
