Adaptive LLM Router
A FastAPI service that intelligently routes prompts to the most appropriate LLM based on task type, cost, latency requirements, and model availability. Supports local Ollama models and cloud providers (OpenAI, Anthropic, Groq).
Overview
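Running every prompt through GPT-4o is expensive and slow. This router classifies incoming prompts and dispatches each one to the cheapest capable model: local Llama 3 for simple tasks, local DeepSeek Coder for code generation, Groq-hosted Llama 3 70B for complex reasoning, and GPT-4o only for critical or long-context prompts.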
Architecture
Client Request
  └─► FastAPI Router
        └─► Classifier (task type detection)
              ├─► Simple / short → Ollama (Llama 3 8B): free, local
              ├─► Code generation → Ollama (DeepSeek Coder): free, local
              ├─► Complex reasoning → Groq (Llama 3 70B): fast, cheap
              └─► Critical / long context → OpenAI (GPT-4o): best quality
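The classification step in the diagram can be sketched with a few keyword heuristics. This is a minimal illustration, not the project's actual classifier: the names `classify`, `route`, and `ROUTES`, the keyword lists, the model identifier strings, and the character threshold (a rough stand-in for a token count, assuming about four characters per token) are all assumptions made for the example.

```python
import re

# Hypothetical route table mirroring the diagram. The identifier strings are
# illustrative labels, not names confirmed by the project.
ROUTES = {
    "simple": "ollama/llama3",
    "code": "ollama/deepseek-coder",
    "reasoning": "groq/llama3-70b",
    "critical": "openai/gpt-4o",
}

# Assumed keyword hints; a real classifier could be a small model instead.
CODE_HINTS = re.compile(
    r"\b(write|implement|refactor|debug|function|class|test|script)\b", re.I
)
REASONING_HINTS = re.compile(
    r"\b(why|prove|compare|analy[sz]e|trade-?offs?|step by step)\b", re.I
)


def classify(prompt: str, long_context_chars: int = 32_000) -> str:
    """Return a coarse task type for a prompt.

    Checks run from most to least expensive route so that a very long
    prompt is never sent to a small local model.
    """
    if len(prompt) > long_context_chars:
        return "critical"
    if CODE_HINTS.search(prompt):
        return "code"
    if REASONING_HINTS.search(prompt):
        return "reasoning"
    return "simple"


def route(prompt: str) -> str:
    """Map a prompt to the model label chosen for its task type."""
    return ROUTES[classify(prompt)]
```

Keyword matching is cheap enough to run on every request, which matters because the classifier sits in front of every model call; the trade-off is that it misroutes prompts whose wording does not match the hints.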
Stack
- FastAPI — Async API server
- Ollama — Local model inference
- LiteLLM — Unified interface across providers
- Redis — Response caching
- Pydantic — Request/response validation
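To illustrate how response caching could fit in, here is a small sketch that derives a cache key from the model and prompt and consults the cache before generating. It uses an in-memory dictionary as a stand-in so it runs without a Redis server. The key prefix, `cache_key`, `DictCache`, and `cached_completion` are hypothetical names for this example and are not taken from the project.

```python
import hashlib
import json
from typing import Callable, Optional, Tuple


def cache_key(model: str, prompt: str) -> str:
    """Build a deterministic key from the model name and prompt text."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return "llm-router:" + digest  # the prefix is an assumed namespace


class DictCache:
    """In-memory stand-in for a Redis client (assumption for this sketch)."""

    def __init__(self) -> None:
        self._store: dict = {}

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def set(self, key: str, value: str) -> None:
        self._store[key] = value


def cached_completion(
    cache: DictCache,
    model: str,
    prompt: str,
    generate: Callable[[str, str], str],
) -> Tuple[str, bool]:
    """Return (answer, was_cache_hit), calling generate only on a miss."""
    key = cache_key(model, prompt)
    hit = cache.get(key)
    if hit is not None:
        return hit, True
    answer = generate(model, prompt)
    cache.set(key, answer)
    return answer, False
```

Including the model name in the key keeps an answer from a small local model from being served when a request was routed to a larger one. With a real Redis client, an expiry on each entry would likely be needed so stale answers age out.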
Setup
Kali Linux
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3
ollama pull deepseek-coder
# Clone and install
git clone https://github.com/rootwithkhandal/llm-router
cd llm-router
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Install Redis
sudo apt install redis-server -y
sudo systemctl start redis
# Configure
cp .env.example .env
nano .env # Add OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY
# Run
uvicorn main:app --reload --port 8000
macOS
# Install Ollama
brew install ollama
ollama serve &
ollama pull llama3
ollama pull deepseek-coder
# Clone and install
git clone https://github.com/rootwithkhandal/llm-router
cd llm-router
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Install Redis
brew install redis && brew services start redis
# Configure
cp .env.example .env
nano .env # Add OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY
# Run
uvicorn main:app --reload --port 8000
Windows
# Install Ollama from https://ollama.com/download
ollama pull llama3
ollama pull deepseek-coder
# Clone and install
git clone https://github.com/rootwithkhandal/llm-router
cd llm-router
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
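# Install Redis via WSL2 (recommended), or use the archived, unmaintained Redis for Windows port
# https://github.com/microsoftarchive/redis/releases
# Configure
Copy-Item .env.example .env
notepad .env # Add OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY
# Run
uvicorn main:app --reload --port 8000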
API Usage
# Route a prompt
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"prompt": "Explain reentrancy attacks", "priority": "cost"}'
# Force a specific model
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"prompt": "Write a Foundry test", "model": "deepseek-coder"}'
# Check routing stats
curl http://localhost:8000/stats
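The same `/chat` calls can be made from Python using only the standard library. The sketch below builds the request shown in the curl examples; `build_chat_request` and `send` are hypothetical helper names, and the assumption that the service replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    prompt: str,
    priority: Optional[str] = None,
    model: Optional[str] = None,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build a POST /chat request with the fields used in the curl examples."""
    body = {"prompt": prompt}
    if priority is not None:
        body["priority"] = priority  # e.g. "cost"
    if model is not None:
        body["model"] = model  # forces a specific model
    return urllib.request.Request(
        base_url + "/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and parse the reply, assuming a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("Explain reentrancy attacks", priority="cost"))` would issue the first curl call above.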
Routing Logic
| Task Type | Default Model | Fallback |
|---|---|---|
| Simple Q&A | Llama 3 8B (local) | Groq Llama 3 70B |
| Code generation | DeepSeek Coder (local) | GPT-4o |
| Complex reasoning | Groq Llama 3 70B | GPT-4o |
| Long context (>8k tokens) | GPT-4o | Claude 3 |
| Security analysis | GPT-4o | Groq |
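One way to read the table is as a two-step chain: try the default model, and move to the fallback only if the call fails. The sketch below encodes the table as a dictionary and takes the model call as a parameter so it runs without any provider. `ROUTING_TABLE`, the task-type keys, and `complete_with_fallback` are hypothetical names, and treating any exception as a trigger for fallback is an assumption about the intended behavior.

```python
from typing import Callable, Dict, Tuple

# Task type -> (default model, fallback), copied from the table above.
# The dictionary keys are assumed identifiers for this sketch.
ROUTING_TABLE: Dict[str, Tuple[str, str]] = {
    "simple_qa": ("Llama 3 8B (local)", "Groq Llama 3 70B"),
    "code_generation": ("DeepSeek Coder (local)", "GPT-4o"),
    "complex_reasoning": ("Groq Llama 3 70B", "GPT-4o"),
    "long_context": ("GPT-4o", "Claude 3"),
    "security_analysis": ("GPT-4o", "Groq"),
}


def complete_with_fallback(
    task_type: str,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try the default model, then the fallback; return (model, answer)."""
    default, fallback = ROUTING_TABLE[task_type]
    errors = []
    for model in (default, fallback):
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # assumed: any failure triggers fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Returning the model name alongside the answer lets the caller record which route actually served the request, which is the kind of information a stats endpoint would need.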