Production Deployment
This guide covers everything you need to deploy OJS in a production environment.
1. Choose Your Backend
| Use Case | Recommended Backend | Why |
|---|---|---|
| Speed-critical, low latency | Redis | Sub-millisecond enqueue, mature ecosystem |
| ACID guarantees, SQL queryability | PostgreSQL | Strong durability, transactional enqueue |
| Cloud-native microservices | NATS | Single binary, built-in clustering |
| Event replay, compliance | Kafka | Immutable log, very high throughput |
| AWS-native, zero ops | SQS | Fully managed, pay-per-use |
| Development / CI | Lite | Zero deps, sub-50ms startup |
See the Backend Selection Guide for a detailed comparison.
2. Deploy with Kubernetes
Helm Chart
```bash
# Add the OJS Helm repository
helm repo add openjobspec https://openjobspec.github.io/charts
helm repo update

# Install with the Redis backend
helm install ojs openjobspec/ojs-server \
  --set backend=redis \
  --set redis.url=redis://redis:6379 \
  --set auth.apiKey=your-secret-key \
  --set replicas=3
```
Key Configuration Values
```yaml
backend: redis
replicas: 3

redis:
  url: redis://redis-cluster:6379

auth:
  # Required in production. Helm does not expand environment variables in
  # values files, so substitute this value (e.g. with envsubst) or use --set.
  apiKey: "${OJS_API_KEY}"
  enabled: true

resources:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "1000m"

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilization: 70

monitoring:
  prometheus:
    enabled: true
  grafana:
    dashboards: true
```
Docker Compose (Single Node)
For simpler deployments:
```bash
cd ojs-cloud/deploy
cp .env.example .env   # Edit with your secrets
docker compose -f docker-compose.production.yml up -d
```
3. Security Hardening
Authentication
All production deployments MUST enable API key authentication:
```bash
# Environment variables
OJS_AUTH_REQUIRED=true
OJS_API_KEY=your-strong-random-key-32-chars-minimum
```
Encryption
Enable job payload encryption for sensitive data:
```bash
OJS_ENCRYPTION_ENABLED=true
OJS_ENCRYPTION_KEY=your-32-byte-aes-key-base64-encoded
```
Network Security
- Place OJS servers on a private network (not internet-facing)
- Use a reverse proxy (Nginx, Caddy, ALB) for TLS termination
- Enable rate limiting to prevent abuse
- Set CORS headers if the Admin UI is served from a different domain
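Rate limiting is often configured in the reverse proxy itself (for example Nginx's `limit_req`). If you front OJS with your own Go service instead, a token bucket is the usual technique: it allows short bursts up to a fixed capacity and refills at a steady rate. The sketch below uses only the standard library; the upstream address `http://ojs-server:8080`, the listen port and the 50 req/s budget are hypothetical, and it keeps a single global bucket rather than one per API key or client IP.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"
	"time"
)

// tokenBucket allows bursts of up to capacity requests and refills at rate tokens/second.
type tokenBucket struct {
	mu       sync.Mutex
	tokens   float64   // tokens currently available
	capacity float64   // maximum burst size
	rate     float64   // tokens added per second
	last     time.Time // time of the previous refill
}

func newTokenBucket(rate, capacity float64) *tokenBucket {
	return &tokenBucket{tokens: capacity, capacity: capacity, rate: rate, last: time.Now()}
}

// allow refills the bucket for the time elapsed since the last call,
// then spends one token if one is available.
func (b *tokenBucket) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// rateLimit rejects requests with 429 Too Many Requests when the bucket is empty.
func rateLimit(b *tokenBucket, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !b.allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	upstream, err := url.Parse("http://ojs-server:8080") // hypothetical internal address
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	// Hypothetical budget: 50 requests/second sustained, bursts of up to 100.
	bucket := newTokenBucket(50, 100)
	// TLS is still terminated in front of this service (Nginx, Caddy, ALB).
	log.Fatal(http.ListenAndServe(":8081", rateLimit(bucket, proxy)))
}
```

For production code, `golang.org/x/time/rate` provides a maintained token-bucket limiter with the same semantics.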
Policy Engine
Define governance rules for job processing:
```json
[
  {
    "id": "block-pii-queue",
    "name": "Block PII on public queues",
    "action": "deny",
    "enabled": true,
    "conditions": {
      "queues": ["public-*"],
      "tags": ["contains-pii"]
    }
  }
]
```
4. Observability
Prometheus Metrics
Every OJS backend exposes metrics at /metrics:
```text
# Key metrics to monitor
ojs_jobs_enqueued_total     # Total jobs enqueued
ojs_jobs_completed_total    # Total jobs completed
ojs_jobs_failed_total       # Total jobs failed
ojs_queue_depth             # Current queue depth
ojs_job_duration_seconds    # Job processing time (histogram)
ojs_worker_active_jobs      # Currently active jobs per worker
```
Grafana Dashboards
Import the pre-built dashboards from deploy/grafana/:
- Overview — System-wide throughput, latency, error rate
- Queues — Per-queue depth, throughput, and age
- Workers — Worker count, utilization, and heartbeat status
- Jobs — Job lifecycle timing and state distribution
- Errors — Error rate by type, retry patterns, dead letter growth
- Performance — p50/p95/p99 latency, memory, CPU
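Grafana reads these series through Prometheus, but for a quick post-deploy smoke test you can read the text exposition format from `/metrics` directly. This is a minimal sketch, not a full parser: it sums every sample of one metric across its label sets (whether `ojs_queue_depth` carries a per-queue label is an assumption; summing gives the total either way), and the `localhost:8080` address is hypothetical. For anything more, use a real parser such as the `expfmt` package from the Prometheus Go libraries.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
)

// sumMetric adds up every sample of the metric called name in Prometheus
// text-format output, across all label sets, and reports whether the metric
// appeared at all. Simplified: assumes label values contain no '}' and
// ignores the optional timestamp after the value.
func sumMetric(exposition, name string) (float64, bool, error) {
	var total float64
	found := false
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and # HELP / # TYPE comments
		}
		// The metric name ends at the label block or the first space.
		end := strings.IndexAny(line, "{ ")
		if end < 0 || line[:end] != name {
			continue
		}
		rest := line[end:]
		if strings.HasPrefix(rest, "{") {
			rb := strings.Index(rest, "}")
			if rb < 0 {
				return 0, false, fmt.Errorf("malformed sample: %q", line)
			}
			rest = rest[rb+1:]
		}
		fields := strings.Fields(rest) // value, then an optional timestamp
		if len(fields) == 0 {
			return 0, false, fmt.Errorf("missing value: %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return 0, false, err
		}
		total += v
		found = true
	}
	return total, found, sc.Err()
}

func main() {
	// Hypothetical address; point this at one of your OJS server pods.
	resp, err := http.Get("http://localhost:8080/metrics")
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	depth, ok, err := sumMetric(string(body), "ojs_queue_depth")
	switch {
	case err != nil:
		fmt.Println("parse failed:", err)
	case !ok:
		fmt.Println("ojs_queue_depth not exported")
	default:
		fmt.Printf("total queue depth: %.0f\n", depth)
	}
}
```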
Alerting
Recommended alerts:
| Alert | Condition | Severity |
|---|---|---|
| Queue backlog growing | Depth > 1000 for > 5 min | Warning |
| High failure rate | Failure rate > 10% for > 2 min | Critical |
| Worker stall | No heartbeat for > 60s | Critical |
| Dead letter growth | > 100 new dead-letter jobs in 1 hour | Warning |
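In Prometheus these rows would become alerting rules with a `for:` duration: the condition must hold at every evaluation across the whole window before the alert fires. The sketch below spells that out for the "High failure rate" row so the semantics are unambiguous. Assumptions: the rate is failed / (failed + completed), computed from the increase of `ojs_jobs_failed_total` and `ojs_jobs_completed_total` between consecutive scrapes; the `sample` type and the function names are hypothetical.

```go
package main

import "fmt"

// sample holds two counter readings from one scrape of /metrics.
type sample struct {
	at        int64   // scrape time, Unix seconds
	failed    float64 // ojs_jobs_failed_total
	completed float64 // ojs_jobs_completed_total
}

// failureRate is failed / (failed + completed) between two scrapes.
// Counter resets (for example after a restart) are not handled here.
func failureRate(prev, cur sample) float64 {
	df := cur.failed - prev.failed
	dc := cur.completed - prev.completed
	if df+dc <= 0 {
		return 0 // no finished jobs in this interval
	}
	return df / (df + dc)
}

// firing walks back from the newest scrape and returns true once the
// intervals checked so far cover at least holdSeconds and every one of them
// was above threshold. A healthy interval inside the window clears the alert.
func firing(samples []sample, threshold float64, holdSeconds int64) bool {
	if len(samples) < 2 {
		return false
	}
	newest := samples[len(samples)-1].at
	for i := len(samples) - 1; i > 0; i-- {
		if failureRate(samples[i-1], samples[i]) <= threshold {
			return false
		}
		if newest-samples[i-1].at >= holdSeconds {
			return true
		}
	}
	return false // not enough history to cover the hold period yet
}

func main() {
	// Four scrapes 60 s apart; each interval sees 15 failures and 85 completions (15%).
	samples := []sample{
		{at: 0, failed: 0, completed: 0},
		{at: 60, failed: 15, completed: 85},
		{at: 120, failed: 30, completed: 170},
		{at: 180, failed: 45, completed: 255},
	}
	// Threshold 10%, hold 2 minutes, matching the "High failure rate" row.
	fmt.Println("high failure rate firing:", firing(samples, 0.10, 120))
}
```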
OpenTelemetry
Enable distributed tracing across producers and workers:
```go
// Go SDK
worker.Use(ojs.OpenTelemetryMiddleware(ojs.OTelConfig{
	ServiceName: "payment-worker",
	Endpoint:    "otel-collector:4317",
}))
```
5. High Availability
Multi-Replica Deployment
Run 3+ OJS server replicas behind a load balancer. All backends support concurrent access from multiple server instances.
Backend Redundancy
| Backend | HA Strategy |
|---|---|
| Redis | Redis Sentinel or Redis Cluster |
| PostgreSQL | Streaming replication + PgBouncer |
| NATS | NATS Cluster (built-in) |
| Kafka | Multi-broker cluster |
| SQS | AWS-managed (multi-AZ by default) |
Graceful Shutdown
OJS backends support graceful shutdown. Kubernetes sends SIGKILL once `terminationGracePeriodSeconds` expires, so set it longer than your longest-running job:
```yaml
# Kubernetes pod spec
terminationGracePeriodSeconds: 30
```

```bash
# Or manually
kill -SIGTERM <pid>   # Starts graceful shutdown
# Active jobs complete; no new jobs are fetched
```
6. Performance Tuning
Worker Concurrency
Section titled “Worker Concurrency”# Start with: concurrency = 2 × CPU coresOJS_WORKER_CONCURRENCY=16Poll Interval
```bash
# High throughput: shorter interval
OJS_WORKER_POLL_INTERVAL=200ms

# Low throughput: longer interval to save CPU
OJS_WORKER_POLL_INTERVAL=2s
```
Auto-Tuning
Enable the auto-tuning engine for automatic optimization:
```bash
OJS_AUTOTUNE=true
OJS_AUTOTUNE_INTERVAL=30s
```
The engine analyzes throughput, latency, and queue depth to recommend optimal concurrency, poll intervals, and connection pool sizes.
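The auto-tuning engine's algorithm isn't described here, so the following is not it. It is a back-of-the-envelope check, based on Little's law (average jobs in flight L = arrival rate λ × mean processing time W), that you can use to judge whether a recommended concurrency is plausible. The function name and the 1.5× headroom factor are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// requiredConcurrency estimates how many job slots the whole worker fleet needs.
// Little's law: average jobs in flight L = λ × W, where λ is the arrival rate
// (jobs/second) and W the mean processing time (seconds). headroom >= 1 adds
// slack for bursts; the result is rounded up to a whole slot.
func requiredConcurrency(jobsPerSecond, meanSeconds, headroom float64) int {
	return int(math.Ceil(jobsPerSecond * meanSeconds * headroom))
}

func main() {
	// 40 jobs/s at 0.25 s each keeps 10 jobs in flight; 1.5× headroom gives 15 slots.
	need := requiredConcurrency(40, 0.25, 1.5)
	perWorker := 16 // OJS_WORKER_CONCURRENCY from the example above
	workers := (need + perWorker - 1) / perWorker // round up
	fmt.Printf("slots needed: %d, workers at concurrency %d: %d\n", need, perWorker, workers)
}
```

The result is a total across the fleet; dividing by per-worker concurrency, as `main` does, estimates the worker count. If the tuner recommends far fewer slots than this, the queue will grow.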
Production Checklist
- API key authentication enabled
- TLS termination configured
- Backend persistence configured (Redis AOF, Postgres WAL)
- 3+ server replicas running
- Health check endpoint monitored
- Prometheus scraping enabled
- Grafana dashboards imported
- Alerting rules configured
- Graceful shutdown tested
- Backup strategy for backend data
- Log aggregation configured
- Rate limiting enabled
- Network security reviewed
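Several checklist items reduce to environment settings that a deploy pipeline can verify before rollout. The sketch below checks the variables used earlier in this guide; the 32-character and 32-byte rules come from the authentication and encryption examples, while the `preflight` function itself is hypothetical and does not cover items such as TLS, backups or alerting.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

// preflight returns one message per failed check; an empty result means all passed.
// getenv is injected (os.Getenv in production) so the checks are easy to test.
func preflight(getenv func(string) string) []string {
	var problems []string
	if getenv("OJS_AUTH_REQUIRED") != "true" {
		problems = append(problems, "OJS_AUTH_REQUIRED must be true")
	}
	if len(getenv("OJS_API_KEY")) < 32 {
		problems = append(problems, "OJS_API_KEY must be at least 32 characters")
	}
	if getenv("OJS_ENCRYPTION_ENABLED") == "true" {
		// The key must be standard base64 that decodes to exactly 32 bytes (AES-256).
		key, err := base64.StdEncoding.DecodeString(getenv("OJS_ENCRYPTION_KEY"))
		if err != nil || len(key) != 32 {
			problems = append(problems, "OJS_ENCRYPTION_KEY must be the base64 encoding of 32 bytes")
		}
	}
	return problems
}

func main() {
	problems := preflight(os.Getenv)
	for _, p := range problems {
		fmt.Println("FAIL:", p)
	}
	if len(problems) > 0 {
		os.Exit(1) // non-zero exit stops a CI/CD pipeline step
	}
	fmt.Println("preflight checks passed")
}
```

To generate values that pass, `openssl rand -hex 32` produces a 64-character API key and `openssl rand -base64 32` produces a suitable encryption key.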