Deployment Guide¶
Best practices for deploying KruxOS in production environments.
Deployment options¶
| Method | Best for | Setup time |
|---|---|---|
| Docker | Quick evaluation, CI/CD environments | 5 minutes |
| ISO image (VM) | Dedicated agent infrastructure | 15 minutes |
| ISO image (bare metal) | Maximum performance, air-gapped environments | 30 minutes |
Production checklist¶
Before deployment¶
- [ ] Choose a policy template appropriate for your environment
- [ ] Prepare agent names and purposes for initial registration
- [ ] Decide on external service connections (Gmail, etc.)
- [ ] Plan backup strategy and retention period
- [ ] Configure network firewall rules for ports 7700, 7701, 7800
System requirements¶
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4 cores |
| RAM | 2 GB | 4 GB |
| Disk | 20 GB | 50 GB |
| Network | 1 Mbps | 10 Mbps |
| OS | x86_64 Linux | KruxOS ISO or Docker |
Network configuration¶
┌─────────────────────────────────────────────┐
│ Firewall │
│ │
│ Allow inbound: │
│ 7700/tcp ← Agent connections (internal) │
│ 7800/tcp ← Dashboard (admin network) │
│ │
│ Block inbound: │
│ 7701/tcp ← Supervision (localhost only) │
│ 7702/tcp ← OpenClaw bridge (if unused) │
│ │
│ Allow outbound: │
│ 443/tcp → Gmail API, update server │
│ 53/udp → DNS │
└─────────────────────────────────────────────┘
Supervision port
Port 7701 should never be exposed to untrusted networks. It provides session control (kill, pause) and live activity streaming. Restrict it to localhost or a management VLAN.
TLS configuration¶
The dashboard serves HTTPS with a self-signed certificate by default. For production:
Option 1: Let's Encrypt (recommended for public-facing)
Configure in /etc/kruxos/config.yaml:
dashboard:
tls:
mode: letsencrypt
domain: agents.example.com
email: [email protected]
Option 2: Reverse proxy
Place nginx or Caddy in front of KruxOS:
server {
listen 443 ssl;
server_name agents.example.com;
ssl_certificate /etc/ssl/certs/agents.pem;
ssl_certificate_key /etc/ssl/private/agents.key;
# Dashboard
location / {
proxy_pass http://localhost:7800;
proxy_set_header Host $host;
}
# Agent WebSocket
location /ws {
proxy_pass http://localhost:7700;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
Policy configuration¶
Choose a starting policy¶
| Environment | Recommended template | Philosophy |
|---|---|---|
| Development / testing | personal-permissive |
Maximum agent autonomy |
| Team use | team-moderate |
Reads auto, writes notify, destructive = approval |
| Production | enterprise-restrictive |
All writes need approval |
Customize for your needs¶
Start with a template and customize. See the Policies guide for full YAML syntax.
Key customizations for production:
# Rate limit email sending
email.send:
tier: notify
rate_limit:
max: 20
window: 3600
on_exceed: approval_required
# Block network access by default
network.*:
tier: blocked
# Allow specific network capabilities
network.http_request:
tier: approval_required
reason: "External HTTP requests require review"
Backup strategy¶
Automated daily backups¶
Daily backups run automatically on the appliance via a systemd timer at 02:00 UTC out of the box — no setup needed. Verify with systemctl list-timers '*kruxos*'. Audit-log rotation runs at 03:00 UTC with a 90-day default retention.
To add a second schedule (e.g., hourly incrementals), use the host's cron:
0 * * * * /usr/local/bin/kruxos state backup --out /data/kruxos/backups/incr-$(date +\%H).tar.gz.enc
Offsite backup¶
Copy backups to external storage:
# Example: sync to S3-compatible storage
0 3 * * * rclone copy /data/kruxos/backups/ remote:kruxos-backups/ --max-age 1d
Retention¶
Configure backup retention in your backup rotation script. Recommended:
- Daily backups: keep 7 days
- Weekly backups: keep 4 weeks
- Monthly backups: keep 12 months
Monitoring integration¶
Health check endpoint¶
The /health endpoint on port 7701 returns HTTP 200 (healthy) or 503 (unhealthy):
# Nagios / monitoring check
curl -sf http://localhost:7701/health || echo "CRITICAL: KruxOS unhealthy"
Alerting¶
Configure external alerting by polling the health endpoint or subscribing to the supervision WebSocket for real-time events.
Update strategy¶
Recommended process¶
- Back up before updating:
kruxos state backup --out /data/kruxos/backups/pre-update.tar.gz.enc - Check for updates:
kruxos update check - Apply during maintenance window:
kruxos update apply - Verify after reboot:
kruxos statusandkruxos audit stats - Rollback if issues:
kruxos update rollback
Health-driven A/B rollback ships in v0.0.2
The A/B partition layout and inactive-slot write path are in place in v0.0.1, but the automated post-reboot health probe that confirms the new slot (or fails back to the previous one) lands in v0.0.2. Until then, run step 4 manually before the boot flag is confirmed.
The A/B partition system ensures that failed updates automatically roll back. See Updating for details.
Scaling considerations¶
v0.0.x (single-node)¶
The v0.0.x line is designed for single-node deployment. Practical limits:
| Metric | Tested capacity |
|---|---|
| Concurrent agents | 50+ |
| Capability invocations/sec | 1,000+ |
| Audit entries/day | 1,000,000+ |
| State entries per agent | 10,000+ |
Enterprise (future)¶
Multi-node clustering, PostgreSQL backend, and horizontal scaling are planned for the enterprise edition.