AI Interviewer Control Plane

Production voice agent system for automated phone interviews

LiveKit
Voice AI
Python
FastAPI
WebRTC

Overview

A production voice agent system that conducts automated phone interviews for candidate screening, built with LiveKit for real-time communication and a custom control plane for orchestration and monitoring.

Problem

Recruiting teams were spending hours on initial phone screens that often yielded predictable outcomes. Candidates waited days for feedback, and interviewers conducted repetitive conversations that could be standardized. We needed a system that could:

  • Conduct natural voice conversations with candidates
  • Ask structured questions while adapting to responses
  • Provide immediate transcripts and assessments
  • Scale to handle spikes during hiring campaigns

What I Built

Architecture

(Architecture diagram: the candidate's call joins a LiveKit room with an agent worker; the session orchestrator coordinates workers and persists session state externally in PostgreSQL/Redis.)

Key Components

Session Orchestrator

  • Manages interview lifecycle from scheduling to completion
  • Handles interruptions and reconnections gracefully
  • Event-sources all state transitions for replay

Agent Workers

  • LiveKit-based workers that join calls as AI participants
  • Pipeline: audio → STT → context + RAG → LLM → TTS → audio
  • Stateless with session state stored externally
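One conversational turn through that pipeline can be sketched as below. The four stage functions are hypothetical stand-ins for the real Deepgram, retrieval, GPT-4o-mini, and Azure Speech calls, which are network-bound; the point is that the worker holds no state of its own and everything it needs travels with the `session_id`.

```python
import asyncio


async def stt(audio: bytes) -> str:
    # Stand-in for the STT provider: turn audio into a transcript.
    return audio.decode()


async def retrieve_context(session_id: str, utterance: str) -> str:
    # Stand-in for the RAG step: fetch job/candidate context from storage.
    return f"[context for {session_id}]"


async def llm_reply(context: str, utterance: str) -> str:
    # Stand-in for the LLM: generate the interviewer's next turn.
    return f"Thanks. Tell me more about: {utterance}"


async def tts(text: str) -> bytes:
    # Stand-in for the TTS provider: synthesize reply audio.
    return text.encode()


async def handle_turn(session_id: str, audio_in: bytes) -> bytes:
    """audio -> STT -> context + RAG -> LLM -> TTS -> audio, statelessly."""
    utterance = await stt(audio_in)
    context = await retrieve_context(session_id, utterance)
    reply = await llm_reply(context, utterance)
    return await tts(reply)


audio_out = asyncio.run(handle_turn("int_abc123", b"I led a migration to Kubernetes"))
print(audio_out.decode())
```

Keeping the worker stateless means any replica can pick up a reconnected call after recovering the session from the event log.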

Evaluation Engine

  • Analyzes transcripts against job requirements
  • Generates structured scorecards
  • Flags responses for human review
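A deliberately naive sketch of the scoring-and-flagging shape: each requirement gets a score from signal phrases found in the transcript, and low scores are routed to human review. The production engine scores with an LLM pass rather than keyword matching, and the threshold here is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass
class Scorecard:
    scores: dict[str, float]
    flagged: list[str]          # requirements routed to human review


def evaluate(transcript: str, requirements: dict[str, list[str]]) -> Scorecard:
    """Score each requirement by the fraction of its signal phrases
    present in the transcript; flag anything under threshold."""
    text = transcript.lower()
    scores: dict[str, float] = {}
    flagged: list[str] = []
    for req, signals in requirements.items():
        hits = sum(1 for s in signals if s.lower() in text)
        score = hits / len(signals)
        scores[req] = score
        if score < 0.75:        # hypothetical review threshold
            flagged.append(req)
    return Scorecard(scores, flagged)


card = evaluate(
    "I deployed services on Kubernetes and wrote Python daily.",
    {"python": ["python"], "kubernetes": ["kubernetes", "helm"]},
)
print(card.scores, card.flagged)
```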

Control Dashboard

  • Real-time monitoring of active interviews
  • Session replay with timeline and transcript
  • Aggregate metrics and quality reports

Reliability & Operations

Latency Budget

| Stage | Target p95 | Actual p95 |
| --- | --- | --- |
| STT (to final) | 500ms | 380ms |
| LLM (first token) | 800ms | 650ms |
| TTS (first chunk) | 300ms | 220ms |
| End-to-end | 2000ms | 1450ms |
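Enforcing a budget like this comes down to comparing observed p95 per stage against the target. A minimal sketch, using the nearest-rank percentile and stage names that are assumptions, not the production metric keys:

```python
import math

# Per-stage p95 budgets in milliseconds (from the table above).
BUDGET_MS = {
    "stt": 500,
    "llm_first_token": 800,
    "tts_first_chunk": 300,
    "end_to_end": 2000,
}


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample list."""
    s = sorted(samples)
    idx = min(len(s) - 1, math.ceil(0.95 * len(s)) - 1)
    return s[idx]


def over_budget(samples_by_stage: dict[str, list[float]]) -> dict[str, float]:
    """Return {stage: observed p95} for every stage blowing its budget."""
    return {
        stage: p95(v)
        for stage, v in samples_by_stage.items()
        if p95(v) > BUDGET_MS.get(stage, float("inf"))
    }
```

In production the same comparison is done by the dashboards and alert rules rather than in application code, but the arithmetic is identical.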

Fault Tolerance

  • STT failures: Buffer audio, retry with exponential backoff, fallback to secondary provider
  • LLM timeouts: Use shorter prompts, fallback to simpler model, acknowledge delay to user
  • TTS failures: Pre-cached common phrases, graceful degradation to silence
  • Network drops: Automatic reconnection with state recovery from event log
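The STT bullet above (retry with exponential backoff, then fall back to the secondary provider) can be sketched as a small wrapper. The provider callables are hypothetical stand-ins for the Deepgram and Whisper clients, and the retry counts and delays are illustrative:

```python
import time


def transcribe_with_fallback(audio, primary, secondary,
                             retries=3, base_delay=0.5):
    """Try the primary STT provider with exponential backoff between
    attempts; after exhausting retries, fall back to the secondary."""
    for attempt in range(retries):
        try:
            return primary(audio)
        except Exception:
            # Back off: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))
    return secondary(audio)
```

The same shape (bounded retries, then a cheaper or alternate path) covers the LLM and TTS fallbacks as well; only the degraded behavior differs per stage.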

Monitoring

  • Custom Grafana dashboards tracking:
    • Sessions active, queued, completed
    • Latency percentiles by stage
    • Error rates by type
    • Cost per interview
  • Alerts on:
    • Error rate > 2% for 5 minutes
    • p95 latency > 3s for 5 minutes
    • Queue depth > 50 for 10 minutes
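Each alert above is the same rule shape: a metric must stay past a threshold for a full window before firing, which avoids paging on transient spikes. A sketch of that evaluation, assuming one sample per minute (the real rules live in Grafana, not application code):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str
    threshold: float
    for_minutes: int


RULES = [
    AlertRule("error_rate", 0.02, 5),
    AlertRule("p95_latency_s", 3.0, 5),
    AlertRule("queue_depth", 50, 10),
]


def firing(rule: AlertRule, samples: list[float]) -> bool:
    """samples: one reading per minute, newest last. Fires only if the
    metric exceeded the threshold for the entire window."""
    window = samples[-rule.for_minutes:]
    return (len(window) == rule.for_minutes
            and all(v > rule.threshold for v in window))
```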

Logging

All events logged with correlation IDs:

```json
{
  "session_id": "int_abc123",
  "event": "llm.response",
  "latency_ms": 450,
  "tokens_in": 1200,
  "tokens_out": 85,
  "model": "gpt-4o-mini",
  "correlation_id": "req_xyz789"
}
```
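Emitting records in that shape from Python can be done with a custom `logging.Formatter` that serializes each record as one JSON line. A minimal sketch; the logger name and the `fields` convention are assumptions, not the project's actual setup:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line, merging in any
    structured fields (session_id, correlation_id, ...) passed via
    logging's `extra` mechanism."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {"event": record.getMessage()}
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


logger = logging.getLogger("interviewer")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "llm.response",
    extra={"fields": {
        "session_id": "int_abc123",
        "latency_ms": 450,
        "correlation_id": "req_xyz789",
    }},
)
```

Carrying the `correlation_id` on every line is what lets a single request be traced across the orchestrator, the worker, and the provider calls.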

Tech Stack

  • Real-time: LiveKit, WebRTC
  • STT: Deepgram (primary), Whisper (fallback)
  • LLM: OpenAI GPT-4o-mini
  • TTS: Azure Speech Services
  • Backend: Python, FastAPI
  • Database: PostgreSQL, Redis
  • Infrastructure: Azure Kubernetes Service
  • Observability: Grafana, Sentry, Azure Monitor

What I Owned

  • Designed overall system architecture
  • Built the session orchestrator and event sourcing system
  • Implemented reliability patterns (retries, circuit breakers, fallbacks)
  • Set up monitoring and alerting infrastructure
  • Led team of 2 engineers through implementation

Results

  • Cut average screening time from 30 minutes to 15
  • Achieved 95% candidate completion rate
  • < 2% error rate in production
  • Handled 100+ concurrent interviews during peak
