
Monitor Swarm Health

Apiary provides four status APIs for monitoring: node status, bee status, swarm status, and colony status.

Check Node Status

s = ap.status()
print(f"Node: {s['node_id']}")
print(f"Cores: {s['cores']}")
print(f"Memory: {s['memory_gb']:.1f} GB")
print(f"State: {s['state']}")

Check Bee Status

See per-core utilization:

bees = ap.bee_status()
for bee in bees:
    print(f"Bee {bee['bee_id']}: {bee['state']} - "
          f"{bee['memory_used_mb']:.0f}/{bee['memory_budget_mb']:.0f} MB")

Bee states:

  • idle -- Available for tasks
  • busy -- Executing a task
  • draining -- Finishing current task before shutdown
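With many bees, an aggregate view is often easier to read than one line per bee. The following is a minimal sketch, assuming `ap.bee_status()` returns a list of dicts with the `state`, `memory_used_mb`, and `memory_budget_mb` keys used above. `summarize_bees` is a hypothetical helper, not part of Apiary, and the sample records are made up.

```python
from collections import Counter


def summarize_bees(bees):
    """Return (state counts, bee with the highest memory pressure).

    Illustrative helper only; it assumes the record shape shown above.
    """
    counts = Counter(bee["state"] for bee in bees)

    def pressure(bee):
        # Fraction of the memory budget in use; 0.0 if the budget is zero.
        budget = bee["memory_budget_mb"]
        return bee["memory_used_mb"] / budget if budget else 0.0

    # default=None lets an empty list return None instead of raising.
    tightest = max(bees, key=pressure, default=None)
    return counts, tightest


# Made-up records in the assumed shape, for illustration only.
sample_bees = [
    {"bee_id": 0, "state": "idle", "memory_used_mb": 120.0, "memory_budget_mb": 1024.0},
    {"bee_id": 1, "state": "busy", "memory_used_mb": 900.0, "memory_budget_mb": 1024.0},
    {"bee_id": 2, "state": "busy", "memory_used_mb": 400.0, "memory_budget_mb": 1024.0},
]
counts, tightest = summarize_bees(sample_bees)
print(dict(counts))
print(f"Highest memory pressure: bee {tightest['bee_id']}")
```

In a live session, pass `ap.bee_status()` instead of `sample_bees`.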

Check Swarm Status

See all nodes in the swarm:

swarm = ap.swarm_status()
print(f"Nodes alive: {swarm['alive']}")
print(f"Total bees: {swarm['total_bees']}")

for node in swarm['nodes']:
    print(f"  {node['node_id']}: {node['state']} "
          f"({node['cores']} cores, {node['memory_gb']:.1f} GB)")

Node states:

  • alive -- Heartbeat updated within the last 15 seconds
  • suspect -- Heartbeat is 15-30 seconds old
  • dead -- Heartbeat is older than 30 seconds
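The heartbeat thresholds above can be written as a small classification function. This is only a sketch of the rule as documented, not Apiary's internal implementation. `node_state_from_heartbeat` is a hypothetical name, and treating exactly 15 seconds as alive and exactly 30 seconds as suspect is an assumption, since the ranges above do not say which side the boundaries fall on.

```python
ALIVE_MAX_SECONDS = 15.0    # from the list above
SUSPECT_MAX_SECONDS = 30.0  # from the list above


def node_state_from_heartbeat(age_seconds):
    """Map a heartbeat age in seconds to 'alive', 'suspect', or 'dead'.

    Boundary handling (<=) is an assumption; see the note above.
    """
    if age_seconds < 0:
        raise ValueError("heartbeat age cannot be negative")
    if age_seconds <= ALIVE_MAX_SECONDS:
        return "alive"
    if age_seconds <= SUSPECT_MAX_SECONDS:
        return "suspect"
    return "dead"


for age in (3.0, 20.0, 45.0):
    print(f"{age:>5.1f}s -> {node_state_from_heartbeat(age)}")
```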

Check Colony Status

Monitor the behavioral model:

colony = ap.colony_status()
print(f"Temperature: {colony['temperature']:.2f}")
print(f"Regulation: {colony['regulation']}")
print(f"Abandoned tasks: {colony['abandoned_tasks']}")

Temperature ranges:

  • Cold (0.0-0.3) -- System underutilized
  • Ideal (0.3-0.7) -- Normal operating range
  • Warm (0.7-0.85) -- Approaching capacity
  • Hot (0.85-0.95) -- Consider reducing write load (v2: automatic backpressure)
  • Critical (0.95-1.0) -- Investigate immediately (v2: query admission control)
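To act on the temperature in a script, it can help to map the numeric value to one of the bands above. The sketch below assumes `colony['temperature']` is a float between 0.0 and 1.0. `temperature_band` is a hypothetical helper, and assigning each shared boundary (0.3, 0.7, 0.85, 0.95) to the higher band is an assumption, because the listed ranges overlap at their endpoints.

```python
def temperature_band(temperature):
    """Return the band name for a colony temperature in [0.0, 1.0].

    Illustrative only; boundary values go to the higher band by assumption.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature out of range: {temperature}")
    if temperature < 0.3:
        return "cold"
    if temperature < 0.7:
        return "ideal"
    if temperature < 0.85:
        return "warm"
    if temperature < 0.95:
        return "hot"
    return "critical"


for t in (0.1, 0.5, 0.8, 0.9, 0.97):
    print(f"{t:.2f} -> {temperature_band(t)}")
```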

Create a Monitoring Script

Poll status at regular intervals:

import time
from apiary import Apiary

ap = Apiary("production", storage="s3://apiary-data/prod")
ap.start()

while True:
    # Colony and swarm health
    colony = ap.colony_status()
    swarm = ap.swarm_status()

    print(f"[{time.strftime('%H:%M:%S')}] "
          f"Temp: {colony['temperature']:.2f} ({colony['regulation']}) | "
          f"Nodes: {swarm['alive']} alive | "
          f"Bees: {swarm['total_bees']} | "
          f"Abandoned: {colony['abandoned_tasks']}")

    # Alert on high temperature
    if colony['temperature'] > 0.85:
        print("  WARNING: Colony temperature is HOT; consider reducing write load")

    # Alert on dead nodes
    dead_nodes = [n for n in swarm['nodes'] if n['state'] == 'dead']
    for n in dead_nodes:
        print(f"  ALERT: Node {n['node_id']} is DEAD")

    # Wait 10 seconds before the next poll
    time.sleep(10)
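If the alerts need to go somewhere other than standard output, one option is to separate the alert rules from the polling loop so they can be tested without a running swarm. The sketch below assumes the dict shapes returned by `ap.colony_status()` and `ap.swarm_status()` in the examples above. `collect_alerts` and its `hot_threshold` parameter are hypothetical, and the extra warning for suspect nodes is an addition for illustration.

```python
def collect_alerts(colony, swarm, hot_threshold=0.85):
    """Return a list of alert strings for one polling cycle.

    Pure function: it only reads the two status dicts passed in.
    Illustrative helper, not part of Apiary.
    """
    alerts = []
    if colony["temperature"] > hot_threshold:
        alerts.append(
            f"WARNING: colony temperature {colony['temperature']:.2f} "
            f"exceeds {hot_threshold:.2f}"
        )
    for node in swarm["nodes"]:
        if node["state"] == "dead":
            alerts.append(f"ALERT: node {node['node_id']} is dead")
        elif node["state"] == "suspect":
            alerts.append(f"WARNING: node {node['node_id']} is suspect")
    return alerts


# Made-up status values in the assumed shape, for illustration only.
sample_colony = {"temperature": 0.91, "regulation": "hot", "abandoned_tasks": 0}
sample_swarm = {
    "alive": 1,
    "total_bees": 8,
    "nodes": [
        {"node_id": "node-a", "state": "alive"},
        {"node_id": "node-b", "state": "dead"},
    ],
}
alerts = collect_alerts(sample_colony, sample_swarm)
for alert in alerts:
    print(alert)
```

Inside the polling loop, the same call would take the live `colony` and `swarm` values, and the returned strings could be forwarded to whatever notification channel is in use.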

Check Logs (Systemd)

If running as a systemd service:

# Recent logs
journalctl -u apiary --since "5 minutes ago"

# Follow logs in real time
journalctl -u apiary -f

# Show only lines that mention "error" (case-insensitive text match)
journalctl -u apiary | grep -i error

Adjust Log Verbosity

Set the RUST_LOG environment variable:

# Default: info level
RUST_LOG=info

# Debug query execution
RUST_LOG=info,apiary_query=debug

# Trace storage operations
RUST_LOG=info,apiary_storage=trace

# Debug everything
RUST_LOG=debug
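The values above follow a pattern: a default level, optionally followed by comma-separated `target=level` overrides. When the value is generated by a deployment script, a small builder can catch typos in level names. This is a sketch only: `build_rust_log` is a hypothetical helper, and the set of accepted level names is an assumption based on the levels commonly used with `RUST_LOG`.

```python
# Level names assumed to be accepted; adjust if your build differs.
VALID_LEVELS = {"off", "error", "warn", "info", "debug", "trace"}


def build_rust_log(default_level="info", overrides=None):
    """Build a RUST_LOG value such as 'info,apiary_query=debug'.

    overrides maps a log target name to a level. Illustrative helper only.
    """
    overrides = overrides or {}
    for level in [default_level, *overrides.values()]:
        if level not in VALID_LEVELS:
            raise ValueError(f"unknown log level: {level!r}")
    parts = [default_level]
    # Sort targets so the generated value is stable between runs.
    parts += [f"{target}={level}" for target, level in sorted(overrides.items())]
    return ",".join(parts)


print(build_rust_log("info", {"apiary_query": "debug"}))
print(build_rust_log("info", {"apiary_storage": "trace"}))
```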