Deployment
Docker
The included docker-compose.yml runs the API with persistent index storage:

    docker-compose up -d

This mounts artifacts/, docs/, and data/ into the container. The --seed flag auto-ingests sample documents on first boot.
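When a bind-mount source is missing on the host, Docker typically creates it itself, owned by root on Linux, which can lead to permission surprises later. A minimal sketch of preparing the directories first, assuming the three mounted paths sit next to docker-compose.yml and the command is run from that directory:

```shell
# Create the host-side directories up front so they belong to the current user.
# Assumption: the current directory is the one containing docker-compose.yml.
mkdir -p artifacts docs data
```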
Custom Dockerfile build

    docker build -t askbase .
    docker run -p 8080:8080 -v $(pwd)/artifacts:/app/artifacts askbase

Environment variables
| Variable | Default | Description |
|---|---|---|
| RAG_ADMIN_TOKEN | admin-demo-token | Admin API token |
| RAG_USER_TOKEN | user-demo-token | User API token |
Always override these in production.
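One way to replace the demo tokens is to generate random values and keep them in an env file. The sketch below is only an illustration: the file name askbase.env and the helper gen_token are assumptions, not part of the project.

```shell
# Produce a 64-character hex token from 32 random bytes.
gen_token() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# askbase.env is a hypothetical file name; keep it out of version control.
env_file="askbase.env"
(
  umask 077   # make the file readable by its owner only
  {
    printf 'RAG_ADMIN_TOKEN=%s\n' "$(gen_token)"
    printf 'RAG_USER_TOKEN=%s\n' "$(gen_token)"
  } > "$env_file"
)
```

The file can then be supplied at start-up, for example with `docker run --env-file askbase.env -p 8080:8080 askbase`.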
Kubernetes
Manifests live in the k8s/ directory:

    k8s/
      namespace.yaml            # Dedicated namespace
      backend-deployment.yaml   # API deployment + service
      frontend-deployment.yaml  # Frontend deployment + service
      ingress.yaml              # Ingress with TLS

Deploy

    kubectl apply -f k8s/namespace.yaml
    kubectl apply -f k8s/backend-deployment.yaml
    kubectl apply -f k8s/frontend-deployment.yaml
    kubectl apply -f k8s/ingress.yaml

Health probes
The deployment uses Kubernetes-native probes:
| Probe | Endpoint | Purpose |
|---|---|---|
| Liveness | GET /health | Restart if the process is stuck |
| Readiness | GET /readyz | Only route traffic when the index is loaded |
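The same endpoints can be polled by hand, for instance in a smoke test after a rollout. The helper below is a sketch under one assumption the table does not state: that a healthy endpoint answers with HTTP 200. The function name wait_for is made up for this example.

```shell
# Poll a URL once per second until it returns HTTP 200 or the attempts run out.
# Assumption: a ready or healthy endpoint answers with status 200.
wait_for() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Prints only the status code; "000" means the connection failed.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$code" = "200" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For a local container this might be called as `wait_for http://localhost:8080/readyz 60 && echo "index loaded"`.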
Scaling
Askbase is stateless at the API layer. The index file is read-only after ingestion, so you can scale replicas horizontally. Mount the index from a shared volume (PVC or S3-backed) for multi-replica setups.

    # In backend-deployment.yaml
    spec:
      replicas: 3
      template:
        spec:
          volumes:
            - name: index
              persistentVolumeClaim:
                claimName: askbase-index

Monitoring
The /metrics endpoint exposes Prometheus-compatible metrics. Add a ServiceMonitor or scrape config:

    - job_name: askbase
      static_configs:
        - targets: ['askbase-api:8080']
      metrics_path: /metrics

Key metrics: request latency (p50/p95/p99), error rate, requests per second, index chunk count.
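Before wiring up Prometheus, it can help to confirm that the endpoint returns any samples at all. In the Prometheus text format, lines starting with # are HELP/TYPE comments and every other non-blank line is a sample. The helper name count_samples is an assumption for this sketch, and it makes no claim about the actual metric names Askbase emits.

```shell
# Count sample lines in Prometheus text-format output read from stdin,
# skipping comment lines (starting with #) and blank lines.
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}
```

Against a local instance this could be used as `curl -s http://localhost:8080/metrics | count_samples`; a result of 0 suggests the endpoint is reachable but not yet exporting anything.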
Production checklist
- Override default auth tokens via environment variables
- Mount index on persistent storage
- Configure ingress with TLS
- Set up Prometheus scraping for /metrics
- Run rag evaluate against your golden dataset after each re-index
- Set resource limits (256 MB RAM is enough for most indexes under 100k chunks)
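For the resource-limit item, a memory limit can be applied to a running deployment with kubectl set resources. This is a sketch only: the namespace askbase and deployment name askbase-api in the usage line are hypothetical, since the manifests' actual names are not shown here, and 256Mi is used as the closest Kubernetes unit to the 256 MB figure.

```shell
# Apply a memory limit to a deployment in the given namespace.
# Arguments: namespace, deployment name, memory limit (e.g. 256Mi).
set_memory_limit() {
  ns="$1"
  deploy="$2"
  mem="$3"
  kubectl -n "$ns" set resources "deployment/$deploy" --limits="memory=$mem"
}
```

A call might look like `set_memory_limit askbase askbase-api 256Mi`. A change made this way is overwritten the next time the manifest is applied, so the durable place for the limit is backend-deployment.yaml itself.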