# RAG Triage Agent

## Overview
An automated system that uses retrieval-augmented generation (RAG) to analyze incoming Linear tickets (bugs and feature requests) against product documentation and codebase context, producing an initial classification and routing recommendation.
## Problem
Engineering teams were spending hours each week triaging incoming tickets:
- Reading tickets to understand the issue
- Searching docs to find related context
- Looking through code to understand affected areas
- Assigning to the right team/person
We needed to automate the initial analysis while maintaining accuracy.
## What I Built

### Architecture

Linear webhooks enqueue incoming tickets into a Redis + Celery queue; a worker runs the RAG pipeline, classifies with GPT-4o-mini, analyzes with GPT-4o, and attaches the triage result to the ticket.
### RAG Pipeline

**Document Processing:** docs, code, and past tickets are chunked, embedded with OpenAI embeddings, and stored in pgvector; the index is updated incrementally.

**Chunking Strategy:**
- Markdown: By heading (h2/h3) with hierarchy preserved
- Code: By function/class with docstrings
- Tickets: Full ticket with comments as single chunk
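To make the markdown rule concrete, here is a minimal sketch of a heading-based chunker that preserves the h2/h3 hierarchy. This is illustrative code, not the production pipeline; the function and field names are my own.

```python
import re

def chunk_markdown(text: str) -> list:
    """Split markdown on h2/h3 headings, recording the heading
    hierarchy alongside each chunk (illustrative sketch, not the
    production chunker)."""
    chunks = []
    current_h2 = None   # most recent h2 title
    current_h3 = None   # most recent h3 title under that h2
    lines = []          # body lines of the chunk being built

    def flush():
        body = "\n".join(lines).strip()
        if body:
            hierarchy = [h for h in (current_h2, current_h3) if h]
            chunks.append({"hierarchy": hierarchy, "text": body})

    for line in text.splitlines():
        match = re.match(r"^(#{2,3})\s+(.*)", line)
        if match:
            flush()          # close the previous chunk
            lines = []
            if len(match.group(1)) == 2:
                current_h2, current_h3 = match.group(2), None
            else:
                current_h3 = match.group(2)
        else:
            lines.append(line)
    flush()                  # close the trailing chunk
    return chunks
```

Keeping the parent h2 title in each h3 chunk's metadata lets the retriever show "Dashboard Architecture > Data Loading"-style context in citations.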
**Retrieval:**
- Query expansion using LLM
- Hybrid search (vector + keyword)
- Reranking with cross-encoder
- Deduplication by source
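This document doesn't name how the vector and keyword result lists are merged, so as one common illustrative choice, reciprocal rank fusion (RRF) can combine them before the cross-encoder rerank:

```python
def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Merge ranked id lists (e.g. vector hits and keyword hits) with
    reciprocal rank fusion. Ids appearing in both lists collapse into
    a single entry with a boosted score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Standard RRF term: 1 / (k + rank), rank starting at 1.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first; the top of this list goes to the reranker.
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked well by both searches outscores one ranked well by only one, which is the behavior you want from hybrid retrieval.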
### Triage Output
For each ticket, the system produces:
```typescript
interface TriageResult {
  ticket_id: string;
  classification: {
    type: "bug" | "feature" | "question" | "task";
    severity: "critical" | "high" | "medium" | "low";
    confidence: number;
  };
  context: {
    related_docs: Citation[];
    related_code: Citation[];
    similar_tickets: Citation[];
  };
  analysis: {
    summary: string;
    affected_areas: string[];
    potential_causes: string[];
    suggested_approach: string;
  };
  routing: {
    suggested_team: string;
    suggested_assignee?: string;
    reasoning: string;
  };
}
```
### Example Output

**Input Ticket:**

> "Dashboard loading slowly after yesterday's deploy. Users reporting 10+ second load times."

**Triage Output:**
```json
{
  "classification": {
    "type": "bug",
    "severity": "high",
    "confidence": 0.92
  },
  "context": {
    "related_docs": [
      {"title": "Dashboard Architecture", "path": "docs/dashboard.md", "section": "Data Loading"}
    ],
    "related_code": [
      {"title": "DashboardLoader", "path": "src/dashboard/loader.ts", "lines": "45-120"}
    ],
    "similar_tickets": [
      {"title": "Dashboard performance regression", "id": "ENG-1234", "resolution": "Query optimization"}
    ]
  },
  "analysis": {
    "summary": "Performance regression on dashboard load after recent deployment",
    "affected_areas": ["Dashboard", "API", "Database queries"],
    "potential_causes": [
      "N+1 query introduced in recent changes",
      "Missing database index",
      "Increased data volume"
    ],
    "suggested_approach": "Check query logs for slow queries, compare with pre-deploy baseline"
  },
  "routing": {
    "suggested_team": "Platform",
    "suggested_assignee": "alice",
    "reasoning": "Alice recently worked on dashboard queries and has context"
  }
}
```
## Reliability & Operations

### Accuracy Metrics
| Metric | Target | Actual |
|---|---|---|
| Classification accuracy | > 85% | 89% |
| Severity assessment | > 80% | 84% |
| Team routing accuracy | > 75% | 78% |
### Failure Handling
- Embedding API down: Queue tickets, process in batch when recovered
- Low confidence: Flag for human review, don't auto-assign
- Missing context: Acknowledge gaps, ask clarifying questions
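The last two rules amount to a simple gate on the classifier output. A minimal sketch; the 0.75 cutoff and all field names are my assumptions, not values from the system:

```python
from typing import Optional

# Illustrative cutoff -- the real threshold is not stated in this document.
CONFIDENCE_THRESHOLD = 0.75

def route_decision(confidence: float, suggested_team: Optional[str]) -> dict:
    """Apply the failure-handling rules: ask for clarification when
    context is missing, hold low-confidence results for human review,
    and auto-assign only confident classifications."""
    if suggested_team is None:
        # Missing context: acknowledge the gap and ask clarifying questions.
        return {"action": "ask_clarifying_questions"}
    if confidence < CONFIDENCE_THRESHOLD:
        # Low confidence: flag for human review, don't auto-assign.
        return {"action": "human_review", "team_hint": suggested_team}
    return {"action": "auto_assign", "team": suggested_team}
```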
### Monitoring
- Tickets processed per hour
- Average processing latency
- Classification confidence distribution
- Human override rate (accuracy signal)
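The override-rate signal in the last bullet can be computed by joining each auto-assigned triage decision with the ticket's final state. A sketch with assumed field names:

```python
def human_override_rate(decisions: list) -> float:
    """Fraction of auto-assigned tickets whose routing a human later
    changed; a rising rate is an early signal of accuracy drift.
    Field names are illustrative assumptions."""
    auto = [d for d in decisions if d["auto_assigned"]]
    if not auto:
        return 0.0
    overridden = sum(1 for d in auto if d["final_team"] != d["suggested_team"])
    return overridden / len(auto)
```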
## Tech Stack
- Trigger: Linear webhooks
- Queue: Redis + Celery
- RAG: pgvector, OpenAI embeddings
- LLM: GPT-4o-mini for classification, GPT-4o for analysis
- Backend: Python, FastAPI
- Indexing: Custom pipeline with incremental updates
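Since Linear webhooks are the trigger, the handler has to verify payload signatures. As I understand Linear's webhook docs, the raw request body is signed with HMAC-SHA256 using the webhook secret and sent in a `Linear-Signature` header; a stdlib-only check (the surrounding FastAPI route is omitted):

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Return True if `signature` is the HMAC-SHA256 hex digest of the
    raw request body under the webhook secret."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(expected, signature)
```

Verifying against the raw bytes (not re-serialized JSON) matters, since any re-encoding can change the digest.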
## What I Owned
- Designed RAG architecture and chunking strategy
- Built the classification and analysis pipeline
- Implemented accuracy monitoring system
- Created feedback loop for continuous improvement
## Results
- 70% reduction in manual triage time
- 89% classification accuracy
- Average triage latency: 8 seconds
- Engineers receive pre-analyzed tickets with relevant context
## Links
- System documentation: Available on request
- Sample outputs: Available on request