Skip to content

Architecture

System overview

KruxOS is a layered system where every agent interaction follows a deterministic pipeline:

graph TB
    subgraph "Agents"
        Claude[Claude<br/>MCP native]
        GPT[GPT-4o<br/>Function calling]
        Gemini[Gemini<br/>Function declarations]
        Local[Ollama<br/>OpenAI-compatible]
    end

    subgraph "Layer 1: Gateway"
        GW[Agent Gateway<br/>Rust / Tokio async<br/>Port 7700]
        Auth[Authentication]
        Session[Session Management]
        GW --> Auth --> Session
    end

    subgraph "Layer 2: Capability Registry"
        Registry[47 Capabilities<br/>YAML definitions<br/>Schema validation]
    end

    subgraph "Layer 2a: Service Proxy"
        Sync[Sync Engine<br/>Read-replicas]
        Write[Write Proxy<br/>Buffer + cancel]
        Roll[Rollback Engine<br/>Point-in-time recovery]
    end

    subgraph "Layer 3: State System"
        SState[Session State<br/>In-memory]
        PState[Persistent State<br/>Per-agent SQLite]
        ShState[Shared State<br/>Cross-agent SQLite]
    end

    subgraph "Layer 4: Policy Engine"
        PE[YAML Rules<br/>Compiled evaluation tree<br/>< 1ms evaluation]
    end

    subgraph "Cross-cutting"
        Vault[Secrets Vault<br/>AES-256-GCM]
        Sandbox[Agent Sandbox<br/>5-layer kernel isolation]
        Audit[Audit Logs<br/>Hash-chained CBOR]
        Health[Health & Diagnostics<br/>HTTP /health endpoint]
        Comms[Agent Comms<br/>In-memory broker]
    end

    subgraph "Supervision"
        Dashboard[Web Dashboard<br/>Next.js + React<br/>Port 7800]
        CLI[kruxos CLI<br/>Rust / clap]
        SupWS[Supervision WebSocket<br/>Port 7701]
    end

    subgraph "External Services"
        Gmail[Gmail API]
        Future[Future services...]
    end

    Claude & GPT & Gemini & Local --> GW
    Session --> PE
    Session --> Registry
    Registry --> Sync & Write & Roll
    Sync --> Gmail & Future
    Write --> Gmail & Future
    GW --> SState & PState & ShState
    GW --> Vault
    GW --> Sandbox
    GW --> Audit
    GW --> Health
    GW --> Comms
    Dashboard & CLI --> SupWS --> GW

Request lifecycle

Every capability invocation follows this exact sequence:

sequenceDiagram
    participant Agent
    participant Gateway
    participant Policy
    participant Sandbox
    participant Registry
    participant Capability
    participant Vault
    participant Audit

    Agent->>Gateway: capabilities.call(name, inputs)
    Gateway->>Gateway: Authenticate (session lookup)
    Gateway->>Policy: evaluate(agent, capability, inputs)

    alt Blocked
        Policy-->>Gateway: Denied
        Gateway->>Audit: log(denied)
        Gateway-->>Agent: StructuredError(PolicyDenied)
    else Approval Required
        Policy-->>Gateway: ApprovalRequired
        Gateway->>Audit: log(pending)
        Gateway-->>Agent: StructuredError(ApprovalPending, request_id)
    else Autonomous / Notify
        Policy-->>Gateway: Allowed
        Gateway->>Sandbox: verify agent sandbox active
        Gateway->>Registry: dispatch(capability, inputs)
        Registry->>Capability: execute(inputs, secret_provider)
        Capability->>Vault: get_handle(secret_name)
        Vault-->>Capability: SecretHandle (opaque)
        Capability-->>Registry: CapabilityResponse
        Registry-->>Gateway: CapabilityResponse
        Gateway->>Audit: log(success, duration)
        Gateway-->>Agent: CapabilityResponse
    end

Technology stack

Component Technology Rationale
Gateway Rust (tokio async) Performance-critical hot path, memory safety
Registry Rust + YAML definitions Definitions as data, hot-reload without recompile
Policy Engine Rust Deterministic evaluation, < 1ms per decision
Sandbox Rust + Linux kernel Direct kernel API for minimal overhead
Vault Rust Security-critical, minimal attack surface
Audit Rust Append-only writes, hash chain computation
State System SQLite (WAL mode) Single-node, crash-safe, zero config
Service Proxy Rust framework + Python adapters Framework in Rust for safety, adapters in Python for flexibility
Agent Comms Rust + Protocol Buffers Low-latency in-memory message broker
Dashboard Next.js 15 + TypeScript + Tailwind Modern web stack, real-time via WebSocket
Agent SDK Python 3.11+ Primary AI agent ecosystem language
CLI Rust (clap) Single binary, fast startup, shell completions

Data storage

All persistent data lives under /data/kruxos/:

Database Engine Purpose Scope
agents.db SQLite Agent identity, metadata Global
agents/{name}/state.db SQLite Per-agent persistent state Per-agent
shared/state.db SQLite Cross-agent shared state Global
approval_queue.db SQLite Pending approval requests Global
vault.db SQLite Encrypted secrets Global
audit/audit-index.db SQLite Audit log query index Global
audit/audit-*.log CBOR files Raw audit entries (hash-chained) Daily files
proxy/{service}/sync.db SQLite Service read-replicas Per-service
proxy/{service}/write_buffer.db SQLite Buffered outbound writes Per-service

All SQLite databases use WAL (Write-Ahead Logging) mode for concurrent read performance.

Capability categories

Category Count Examples
filesystem.* 10 read, write, list, move, delete, search, stat, mkdir, copy, watch
process.* 5 run, list, kill, wait, info
network.* 4 http_request, dns_lookup, port_check, download
git.* 7 log, diff, status, commit, branch, checkout, clone
scheduler.* 3 cron_create, cron_list, cron_delete
alerts.* 3 send, list, acknowledge
system.* 4 metrics, health, info, shutdown
agent.* 4 session, capabilities, briefing, whoami
secrets.* 3 list, use, rotate
comms.* 4 send, receive, subscribe, publish
Total 47

Each capability is defined in YAML with: purpose, when_to_use, typed inputs/outputs, side effects, common patterns, and error types.

Deployment topology

Single-node (v0.0.x)

┌─────────────────────────────────┐
│         KruxOS Instance         │
│                                  │
│  Gateway ──── Registry           │
│     │                            │
│  Policy ──── Sandbox             │
│     │                            │
│  Vault ──── Audit ──── State     │
│     │                            │
│  Dashboard ──── Proxy            │
│                                  │
│  SQLite for all persistence      │
└─────────────────────────────────┘

The v0.0.x line is single-node with SQLite. All services run on one machine. This is the right architecture for personal use, small teams, and initial enterprise evaluation.

Enterprise (future)

Multi-node clustering with PostgreSQL, distributed audit collection, and centralized policy management is planned for the enterprise edition.

Port map

Port Protocol Service Access
7700 WebSocket (MCP + JSON-RPC) Agent Gateway Agent tokens (64-char hex)
7701 WebSocket Supervision (live activity, audit events) User tokens (krx_user_*)
7702 UDP Loopback trigger-wake (127.0.0.1) Loopback only
7703 HTTPS User API (bearer-auth, loopback) User tokens (krx_user_*)
7800 HTTPS Web Dashboard Operator session / User tokens

The default appliance firewall accepts TCP 22 / 7700 / 7701 / 7702 / 7800.