# AI Interviewer Control Plane

## Overview
A production voice agent system that conducts automated phone interviews for candidate screening, built with LiveKit for real-time communication and a custom control plane for orchestration and monitoring.
## Problem
Recruiting teams were spending hours on initial phone screens that often yielded predictable outcomes. Candidates waited days for feedback, and interviewers conducted repetitive conversations that could be standardized. We needed a system that could:
- Conduct natural voice conversations with candidates
- Ask structured questions while adapting to responses
- Provide immediate transcripts and assessments
- Scale to handle spikes during hiring campaigns
## What I Built

### Architecture
### Key Components

#### Session Orchestrator
- Manages interview lifecycle from scheduling to completion
- Handles interruptions and reconnections gracefully
- Event-sources all state transitions for replay
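The event-sourced lifecycle above can be sketched as a pure reducer folded over an append-only log; all names here are illustrative, not the production schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    """One immutable state transition in an interview session."""
    session_id: str
    kind: str          # e.g. "connected", "question_asked", "completed"
    payload: dict = field(default_factory=dict)

@dataclass
class SessionState:
    status: str = "scheduled"
    questions_asked: int = 0

def apply(state: SessionState, event: Event) -> SessionState:
    """Pure reducer: fold one event into the session state."""
    if event.kind == "connected":
        state.status = "active"
    elif event.kind == "question_asked":
        state.questions_asked += 1
    elif event.kind == "completed":
        state.status = "completed"
    return state

def replay(events: list[Event]) -> SessionState:
    """Rebuild current state from the full event log (used after reconnects)."""
    state = SessionState()
    for ev in events:
        state = apply(state, ev)
    return state
```

Because state is derived purely from the log, session replay in the dashboard and state recovery after a network drop both reduce to the same `replay` call.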
#### Agent Workers
- LiveKit-based workers that join calls as AI participants
- Pipeline: audio → STT → context + RAG → LLM → TTS → audio
- Stateless with session state stored externally
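One stateless worker turn through that pipeline might look like this sketch, with provider-agnostic protocols standing in for the real Deepgram/OpenAI/Azure clients (all names are illustrative assumptions):

```python
from typing import Callable, Protocol

class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...

def handle_turn(
    audio: bytes,
    stt: STT,
    retriever: Callable[[str], str],  # RAG lookup over job/role material
    llm: LLM,
    tts: TTS,
    history: list[str],
) -> bytes:
    """One conversational turn: audio in, audio out. The worker holds no
    state of its own; `history` is loaded from external session storage."""
    transcript = stt.transcribe(audio)
    context = retriever(transcript)
    prompt = "\n".join([*history, f"Context: {context}", f"Candidate: {transcript}"])
    reply = llm.complete(prompt)
    history.append(f"Candidate: {transcript}")
    history.append(f"Interviewer: {reply}")
    return tts.synthesize(reply)
```

Keeping the worker stateless means any worker can pick up any turn, which is what lets the fleet scale horizontally during hiring-campaign spikes.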
#### Evaluation Engine
- Analyzes transcripts against job requirements
- Generates structured scorecards
- Flags responses for human review
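A minimal sketch of the structured scorecard and review-flagging logic, assuming a 1-5 rubric (the names and threshold are illustrative, not the production schema):

```python
from dataclasses import dataclass

@dataclass
class ScorecardItem:
    requirement: str   # job requirement being assessed
    score: int         # 1-5 rubric score from the transcript evaluation
    evidence: str      # transcript excerpt supporting the score

def needs_human_review(items: list[ScorecardItem], threshold: int = 2) -> bool:
    """Flag the interview when any requirement scores at or below threshold."""
    return any(item.score <= threshold for item in items)
```

Tying every score to a transcript excerpt keeps the assessment auditable when a flagged interview reaches a human reviewer.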
#### Control Dashboard
- Real-time monitoring of active interviews
- Session replay with timeline and transcript
- Aggregate metrics and quality reports
## Reliability & Operations

### Latency Budget
| Stage | Target p95 | Actual p95 |
|---|---|---|
| STT (to final) | 500ms | 380ms |
| LLM (first token) | 800ms | 650ms |
| TTS (first chunk) | 300ms | 220ms |
| End-to-end | 2000ms | 1450ms |
### Fault Tolerance
- STT failures: Buffer audio, retry with exponential backoff, fallback to secondary provider
- LLM timeouts: Use shorter prompts, fallback to simpler model, acknowledge delay to user
- TTS failures: Pre-cached common phrases, graceful degradation to silence
- Network drops: Automatic reconnection with state recovery from event log
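The retry-with-backoff-then-fallback pattern described for STT (and analogously for LLM and TTS) can be sketched generically; this is a simplified sketch, and the production version would also buffer audio and sit behind a circuit breaker:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_fallback(
    primary: Callable[[], T],
    secondary: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.25,
) -> T:
    """Call `primary` with exponential backoff between attempts; switch to
    `secondary` (e.g. the fallback provider) if it keeps failing."""
    for attempt in range(attempts):
        try:
            return primary()
        except Exception:
            time.sleep(base_delay * (2 ** attempt))  # 0.25s, 0.5s, 1s, ...
    return secondary()
```

In a live call the backoff ceiling has to stay well under the end-to-end latency budget, which is why the secondary provider is tried after only a few short attempts rather than retried indefinitely.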
### Monitoring
- Custom Grafana dashboards tracking:
  - Sessions active, queued, and completed
  - Latency percentiles by stage
  - Error rates by type
  - Cost per interview
- Alerts on:
  - Error rate > 2% for 5 minutes
  - p95 latency > 3s for 5 minutes
  - Queue depth > 50 for 10 minutes
### Logging

All events are logged with correlation IDs, for example:
```json
{
  "session_id": "int_abc123",
  "event": "llm.response",
  "latency_ms": 450,
  "tokens_in": 1200,
  "tokens_out": 85,
  "model": "gpt-4o-mini",
  "correlation_id": "req_xyz789"
}
```
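One way to produce logs in this shape is Python's stdlib `logging` with a `contextvars`-propagated correlation ID; this is a sketch, and the helper names are illustrative:

```python
import json
import logging
from contextvars import ContextVar

# Set once per request (e.g. in FastAPI middleware); read by every log line.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, carrying the request's correlation ID."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "event": record.getMessage(),
            "correlation_id": correlation_id.get(),
            **getattr(record, "fields", {}),  # structured extras, if any
        })

def log_event(logger: logging.Logger, event: str, **fields) -> None:
    """Log a structured event; extra fields land in the JSON payload."""
    logger.info(event, extra={"fields": fields})
```

Because the correlation ID lives in a `ContextVar`, it follows each request across async tasks without being threaded through every function signature.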
## Tech Stack
- Real-time: LiveKit, WebRTC
- STT: Deepgram (primary), Whisper (fallback)
- LLM: OpenAI GPT-4o-mini
- TTS: Azure Speech Services
- Backend: Python, FastAPI
- Database: PostgreSQL, Redis
- Infrastructure: Azure Kubernetes Service
- Observability: Grafana, Sentry, Azure Monitor
## What I Owned
- Designed overall system architecture
- Built the session orchestrator and event sourcing system
- Implemented reliability patterns (retries, circuit breakers, fallbacks)
- Set up monitoring and alerting infrastructure
- Led a team of two engineers through implementation
## Results

- Reduced average screening time from 30 minutes to 15
- Achieved 95% candidate completion rate
- < 2% error rate in production
- Handled 100+ concurrent interviews during peak
## Links
- Architecture documentation: Available on request
- Demo: Voice Agent demo on this site