# AI Interviewer Control Plane

## Overview
A production voice agent system that conducts automated phone interviews for candidate screening, built with LiveKit for real-time communication and a custom control plane for orchestration and monitoring.
## Problem
Recruiting teams were spending hours on initial phone screens that often yielded predictable outcomes. Candidates waited days for feedback, and interviewers conducted repetitive conversations that could be standardized. We needed a system that could:
- Conduct natural voice conversations with candidates
- Ask structured questions while adapting to responses
- Provide immediate transcripts and assessments
- Scale to handle spikes during hiring campaigns
## What I Built

### Architecture
### Key Components

#### Session Orchestrator
- Manages interview lifecycle from scheduling to completion
- Handles interruptions and reconnections gracefully
- Event-sources all state transitions for replay
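The event-sourced lifecycle above can be sketched as a pure reducer folded over an append-only log; all names here are illustrative, not the production schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    """One immutable state transition in an interview session."""
    session_id: str
    kind: str          # e.g. "connected", "question_asked", "completed"
    payload: dict = field(default_factory=dict)

@dataclass
class SessionState:
    status: str = "scheduled"
    questions_asked: int = 0

def apply(state: SessionState, event: Event) -> SessionState:
    """Pure reducer: fold one event into the session state."""
    if event.kind == "connected":
        state.status = "active"
    elif event.kind == "question_asked":
        state.questions_asked += 1
    elif event.kind == "completed":
        state.status = "completed"
    return state

def replay(events: list[Event]) -> SessionState:
    """Rebuild current state from the full event log (used after reconnects)."""
    state = SessionState()
    for ev in events:
        state = apply(state, ev)
    return state
```

Because state is derived purely from the log, session replay in the dashboard and state recovery after a network drop both reduce to the same `replay` call.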
#### Agent Workers
- LiveKit-based workers that join calls as AI participants
- Pipeline: audio → STT → context + RAG → LLM → TTS → audio
- Stateless with session state stored externally
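One stateless worker turn through that pipeline might look like this sketch, with provider-agnostic protocols standing in for the real Deepgram/OpenAI/Azure clients (all names are illustrative assumptions):

```python
from typing import Callable, Protocol

class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...

def handle_turn(
    audio: bytes,
    stt: STT,
    retriever: Callable[[str], str],  # RAG lookup over job/role material
    llm: LLM,
    tts: TTS,
    history: list[str],
) -> bytes:
    """One conversational turn: audio in, audio out. The worker holds no
    state of its own; `history` is loaded from external session storage."""
    transcript = stt.transcribe(audio)
    context = retriever(transcript)
    prompt = "\n".join([*history, f"Context: {context}", f"Candidate: {transcript}"])
    reply = llm.complete(prompt)
    history.append(f"Candidate: {transcript}")
    history.append(f"Interviewer: {reply}")
    return tts.synthesize(reply)
```

Keeping the worker stateless means any worker can pick up any turn, which is what lets the fleet scale horizontally during hiring-campaign spikes.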
#### Evaluation Engine
- Analyzes transcripts against job requirements
- Generates structured scorecards
- Flags responses for human review
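A minimal sketch of the structured scorecard and review-flagging logic, assuming a 1-5 rubric (the names and threshold are illustrative, not the production schema):

```python
from dataclasses import dataclass

@dataclass
class ScorecardItem:
    requirement: str   # job requirement being assessed
    score: int         # 1-5 rubric score from the transcript evaluation
    evidence: str      # transcript excerpt supporting the score

def needs_human_review(items: list[ScorecardItem], threshold: int = 2) -> bool:
    """Flag the interview when any requirement scores at or below threshold."""
    return any(item.score <= threshold for item in items)
```

Tying every score to a transcript excerpt keeps the assessment auditable when a flagged interview reaches a human reviewer.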
#### Control Dashboard
- Real-time monitoring of active interviews
- Session replay with timeline and transcript
- Aggregate metrics and quality reports
## Reliability & Operations

### Latency Budget
| Stage | Target p95 | Actual p95 |
|---|---|---|
| STT (to final) | 500ms | 380ms |
| LLM (first token) | 800ms | 650ms |
| TTS (first chunk) | 300ms | 220ms |
| End-to-end | 2000ms | 1450ms |
### Fault Tolerance
- STT failures: Buffer audio, retry with exponential backoff, fallback to secondary provider
- LLM timeouts: Use shorter prompts, fallback to simpler model, acknowledge delay to user
- TTS failures: Pre-cached common phrases, graceful degradation to silence
- Network drops: Automatic reconnection with state recovery from event log
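The retry-with-backoff-then-fallback pattern described for STT (and analogously for LLM and TTS) can be sketched generically; this is a simplified sketch, and the production version would also buffer audio and sit behind a circuit breaker:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_fallback(
    primary: Callable[[], T],
    secondary: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.25,
) -> T:
    """Call `primary` with exponential backoff between attempts; switch to
    `secondary` (e.g. the fallback provider) if it keeps failing."""
    for attempt in range(attempts):
        try:
            return primary()
        except Exception:
            time.sleep(base_delay * (2 ** attempt))  # 0.25s, 0.5s, 1s, ...
    return secondary()
```

In a live call the backoff ceiling has to stay well under the end-to-end latency budget, which is why the secondary provider is tried after only a few short attempts rather than retried indefinitely.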
### Monitoring
- Custom Grafana dashboards tracking:
  - Sessions active, queued, and completed
  - Latency percentiles by stage
  - Error rates by type
  - Cost per interview
- Alerts on:
  - Error rate > 2% for 5 minutes
  - p95 latency > 3s for 5 minutes
  - Queue depth > 50 for 10 minutes
### Logging

All events are logged with correlation IDs, for example:
```json
{
  "session_id": "int_abc123",
  "event": "llm.response",
  "latency_ms": 450,
  "tokens_in": 1200,
  "tokens_out": 85,
  "model": "gpt-4o-mini",
  "correlation_id": "req_xyz789"
}
```
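One way to produce logs in this shape is Python's stdlib `logging` with a `contextvars`-propagated correlation ID; this is a sketch, and the helper names are illustrative:

```python
import json
import logging
from contextvars import ContextVar

# Set once per request (e.g. in FastAPI middleware); read by every log line.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, carrying the request's correlation ID."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "event": record.getMessage(),
            "correlation_id": correlation_id.get(),
            **getattr(record, "fields", {}),  # structured extras, if any
        })

def log_event(logger: logging.Logger, event: str, **fields) -> None:
    """Log a structured event; extra fields land in the JSON payload."""
    logger.info(event, extra={"fields": fields})
```

Because the correlation ID lives in a `ContextVar`, it follows each request across async tasks without being threaded through every function signature.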
## Tech Stack
- Real-time: LiveKit, WebRTC
- STT: Deepgram (primary), Whisper (fallback)
- LLM: OpenAI GPT-4o-mini
- TTS: Azure Speech Services
- Backend: Python, FastAPI
- Database: PostgreSQL, Redis
- Infrastructure: Azure Kubernetes Service
- Observability: Grafana, Sentry, Azure Monitor
## What I Owned
- Designed overall system architecture
- Built the session orchestrator and event sourcing system
- Implemented reliability patterns (retries, circuit breakers, fallbacks)
- Set up monitoring and alerting infrastructure
- Led a team of two engineers through implementation
## Results

- Reduced average screening time from 30 minutes to 15
- Achieved 95% candidate completion rate
- < 2% error rate in production
- Handled 100+ concurrent interviews during peak
## Links
- Architecture documentation: Available on request
- Demo: Voice Agent demo on this site