Python SDK Reference

Installation

pip install maturin
maturin develop    # builds and installs the extension from a source checkout

Prerequisites: Rust 1.75+, Python 3.9+


Apiary Class

Constructor

Apiary(name: str, storage: str | None = None)

Create an Apiary instance.

| Parameter | Type | Description |
|---|---|---|
| name | str | Logical name for this apiary (used as root namespace) |
| storage | str \| None | Storage URI. Defaults to local filesystem (~/.apiary/data/). Use "s3://bucket/path" for S3-compatible storage. |
from apiary import Apiary

# Local filesystem (solo mode)
ap = Apiary("my_project")

# S3-compatible storage (multi-node capable)
ap = Apiary("production", storage="s3://my-bucket/apiary")

Lifecycle

start()

Initialize the node: detect hardware, start bee pool, begin heartbeat writer, start worker poller.

ap.start()

shutdown()

Gracefully stop the node: drain tasks, stop heartbeat, clean up resources.

ap.shutdown()
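Because start() and shutdown() should be paired even when the work in between raises, a try/finally wrapper is a convenient pattern. A minimal sketch — run_with_node is not part of the SDK; it works with any object exposing start() and shutdown():

```python
def run_with_node(ap, work):
    """Start the node, run work(ap), and always shut down cleanly."""
    ap.start()
    try:
        return work(ap)
    finally:
        ap.shutdown()

# Demo with a stand-in object that records lifecycle calls.
class FakeApiary:
    def __init__(self):
        self.calls = []
    def start(self):
        self.calls.append("start")
    def shutdown(self):
        self.calls.append("shutdown")

fake = FakeApiary()
run_with_node(fake, lambda ap: ap.calls.append("work"))
print(fake.calls)  # ['start', 'work', 'shutdown']
```

The finally clause guarantees heartbeats stop and resources are released even if the workload fails mid-task.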

Namespace Operations

Create

create_hive(name: str) -> None
create_box(hive: str, name: str) -> None
create_frame(hive: str, box_name: str, name: str, schema: dict, partition_by: list[str] | None = None) -> None

Traditional aliases: create_database(), create_schema(), create_table() accept the same signatures. Note that create_table() uses columns as the parameter name instead of schema:

create_table(database: str, schema: str, name: str, columns: dict, partition_by: list[str] | None = None) -> None
| Parameter | Type | Description |
|---|---|---|
| name | str | Name of the hive, box, or frame |
| hive | str | Parent hive name |
| box_name | str | Parent box name |
| schema | dict | Column name to type mapping |
| partition_by | list[str] \| None | Columns to partition by |

Supported schema types: int64, float64, utf8, boolean, date32, timestamp

ap.create_hive("warehouse")
ap.create_box("warehouse", "sales")
ap.create_frame("warehouse", "sales", "orders", {
    "order_id": "int64",
    "customer": "utf8",
    "amount": "float64",
    "region": "utf8",
}, partition_by=["region"])
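Since create_frame() accepts only the six type names listed above, a client-side check can catch schema mistakes before the call. A hedged sketch — SUPPORTED_TYPES mirrors the list above, and validate_schema is an illustrative helper, not part of the SDK:

```python
SUPPORTED_TYPES = {"int64", "float64", "utf8", "boolean", "date32", "timestamp"}

def validate_schema(schema, partition_by=None):
    """Raise ValueError on unknown types or partition columns missing from the schema."""
    for col, dtype in schema.items():
        if dtype not in SUPPORTED_TYPES:
            raise ValueError(f"column {col!r} has unsupported type {dtype!r}")
    for col in partition_by or []:
        if col not in schema:
            raise ValueError(f"partition column {col!r} not in schema")

validate_schema({"order_id": "int64", "region": "utf8"}, partition_by=["region"])  # passes silently
```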

List

list_hives() -> list[str]
list_boxes(hive: str) -> list[str]
list_frames(hive: str, box_name: str) -> list[str]

Traditional aliases: list_databases(), list_schemas(), list_tables()

ap.list_hives()                       # ["warehouse"]
ap.list_boxes("warehouse")            # ["sales"]
ap.list_frames("warehouse", "sales")  # ["orders"]

Get Metadata

get_frame(hive: str, box_name: str, name: str) -> dict

Traditional alias: get_table()

Returns frame metadata including schema, partition columns, max partitions, and creation timestamp.

info = ap.get_frame("warehouse", "sales", "orders")
# {
# "schema": {"order_id": "int64", "customer": "utf8", ...},
# "partition_by": ["region"],
# "max_partitions": 1024,
# "created_at": "2026-02-10T12:00:00+00:00"
# }

Data Operations

Write

write_to_frame(hive: str, box_name: str, frame_name: str, ipc_data: bytes) -> dict

Append data to a frame. Input is Arrow IPC stream bytes. Returns a write result with version, cell/row counts, bytes written, duration, and colony temperature.

| Parameter | Type | Description |
|---|---|---|
| hive | str | Target hive |
| box_name | str | Target box |
| frame_name | str | Target frame |
| ipc_data | bytes | Arrow IPC stream bytes |

Returns: {"version": int, "cells_written": int, "rows_written": int, "bytes_written": int, "duration_ms": int, "temperature": float}

import pyarrow as pa

table = pa.table({
    "order_id": [1, 2, 3],
    "customer": ["alice", "bob", "alice"],
    "amount": [100.0, 250.0, 75.0],
    "region": ["us", "eu", "us"],
})

sink = pa.BufferOutputStream()
writer = pa.ipc.new_stream(sink, table.schema)
writer.write_table(table)
writer.close()

result = ap.write_to_frame("warehouse", "sales", "orders", sink.getvalue().to_pybytes())
# {"version": 1, "cells_written": 2, "rows_written": 3, "bytes_written": 4096, "duration_ms": 42, "temperature": 0.15}
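The returned dict contains enough to derive a rough write throughput. A sketch over a sample result — the helper is illustrative, not part of the SDK:

```python
def write_throughput_mb_s(result):
    """Compute MB/s from the bytes_written and duration_ms fields of a write result."""
    seconds = result["duration_ms"] / 1000.0
    if seconds == 0:
        return float("inf")
    return result["bytes_written"] / (1024 * 1024) / seconds

sample = {"version": 1, "cells_written": 2, "rows_written": 3,
          "bytes_written": 4096, "duration_ms": 42, "temperature": 0.15}
print(f"{write_throughput_mb_s(sample):.3f} MB/s")  # 0.093 MB/s
```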

Read

read_from_frame(hive: str, box_name: str, frame_name: str, partition_filter: dict | None = None) -> bytes

Read data from a frame as Arrow IPC bytes. Optional partition filter for pruning.

| Parameter | Type | Description |
|---|---|---|
| hive | str | Target hive |
| box_name | str | Target box |
| frame_name | str | Target frame |
| partition_filter | dict \| None | Partition column values to filter by |

Returns: Arrow IPC stream bytes

data = ap.read_from_frame("warehouse", "sales", "orders")
reader = pa.ipc.open_stream(data)
table = reader.read_all()

# With partition pruning
data = ap.read_from_frame("warehouse", "sales", "orders", partition_filter={"region": "us"})

Overwrite

overwrite_frame(hive: str, box_name: str, frame_name: str, ipc_data: bytes) -> dict

Atomically replace all data in a frame. Old cells are removed and new cells are written in a single ledger entry.

result = ap.overwrite_frame("warehouse", "sales", "orders", sink.getvalue().to_pybytes())

SQL

sql(query: str) -> bytes

Execute a SQL query and return Arrow IPC stream bytes. See the SQL Reference for supported syntax.

ParameterTypeDescription
querystrSQL query string

Returns: Arrow IPC stream bytes

result_bytes = ap.sql("SELECT customer, SUM(amount) FROM warehouse.sales.orders GROUP BY customer")

reader = pa.ipc.open_stream(result_bytes)
table = reader.read_all()
print(table.to_pandas())

Custom commands (USE, SHOW, DESCRIBE) also return Arrow IPC with result metadata:

ap.sql("USE HIVE warehouse")
ap.sql("USE BOX sales")
result = ap.sql("SELECT * FROM orders LIMIT 10")
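When not using USE, queries must reference tables by their fully qualified hive.box.frame name, as in the GROUP BY example above. A tiny helper for building those names — qualified is illustrative, not part of the SDK, and assumes plain identifier names that need no quoting:

```python
def qualified(hive: str, box: str, frame: str) -> str:
    """Build a fully qualified table name for use in sql()."""
    return f"{hive}.{box}.{frame}"

query = f"SELECT * FROM {qualified('warehouse', 'sales', 'orders')} LIMIT 10"
print(query)  # SELECT * FROM warehouse.sales.orders LIMIT 10
```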

Status and Monitoring

Node Status

status() -> dict

Returns basic node information including hardware details and storage type.

s = ap.status()
# {
# "name": "production",
# "node_id": "abc123",
# "cores": 4,
# "memory_gb": 3.7,
# "bees": 4,
# "storage": "s3",
# "memory_per_bee_mb": 950,
# "target_cell_size_mb": 237
# }
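The derived fields in this sample look like simple arithmetic on the hardware numbers: total memory split evenly across bees, and a target cell size of roughly a quarter of a bee's budget. This is a speculative reconstruction from the example output, not a documented formula:

```python
# Hypothetical reconstruction of the derived status fields; the real
# formulas are internal to the node and may differ (e.g. reserved headroom).
memory_gb, bees = 3.7, 4
memory_per_bee_mb = memory_gb * 1024 / bees   # ~947, reported as 950 above
target_cell_size_mb = memory_per_bee_mb / 4   # ~237, matching the sample
print(round(memory_per_bee_mb), round(target_cell_size_mb))
```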

Bee Status

bee_status() -> list[dict]

Returns per-bee (per-core) information: memory budget (in bytes), current utilization, and task state.

bees = ap.bee_status()
for bee in bees:
    print(f"Bee {bee['bee_id']}: {bee['state']}, "
          f"{bee['memory_used']}/{bee['memory_budget']} bytes")

Swarm Status

swarm_status() -> dict

Returns the full swarm view: all discovered nodes with their state, bee count, and health metrics.

swarm = ap.swarm_status()
print(f"Total bees: {swarm['total_bees']}, Idle bees: {swarm['total_idle_bees']}")
for node in swarm['nodes']:
    print(f"  {node['node_id']}: {node['state']}, {node['bees']} bees, "
          f"mem pressure: {node['memory_pressure']:.2f}, "
          f"temp: {node['colony_temperature']:.2f}")
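For scheduling or alerting, it can help to pick out the most loaded node from the swarm view. A sketch over a sample payload — the node dicts mirror the fields used above; hottest_node is not part of the SDK:

```python
def hottest_node(swarm):
    """Return the node dict with the highest memory_pressure."""
    return max(swarm["nodes"], key=lambda n: n["memory_pressure"])

sample = {
    "total_bees": 8, "total_idle_bees": 3,
    "nodes": [
        {"node_id": "a", "state": "active", "bees": 4,
         "memory_pressure": 0.42, "colony_temperature": 0.20},
        {"node_id": "b", "state": "active", "bees": 4,
         "memory_pressure": 0.81, "colony_temperature": 0.60},
    ],
}
print(hottest_node(sample)["node_id"])  # b
```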

Colony Status

colony_status() -> dict

Returns the behavioral model state: colony temperature, regulation classification, and setpoint.

colony = ap.colony_status()
print(f"Temperature: {colony['temperature']:.2f}")
print(f"Regulation: {colony['regulation']}") # "cold", "ideal", "warm", "hot", "critical"
print(f"Setpoint: {colony['setpoint']:.2f}")
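One practical use of colony_status() is write backpressure: pause or slow ingestion while the colony runs hot. A sketch — the regulation labels come from the list above; should_throttle is an illustrative helper, not part of the SDK:

```python
def should_throttle(colony):
    """True when the colony's regulation band suggests backing off writes."""
    return colony["regulation"] in {"hot", "critical"}

assert should_throttle({"temperature": 0.9, "regulation": "hot", "setpoint": 0.5})
assert not should_throttle({"temperature": 0.3, "regulation": "ideal", "setpoint": 0.5})
```

Keying off the regulation label rather than the raw temperature avoids hard-coding the SDK's internal thresholds.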

Dual Terminology

Apiary supports both bee-themed and traditional database terminology. Every namespace operation has an alias:

| Bee-themed | Traditional | Description |
|---|---|---|
| create_hive() | create_database() | Create a top-level namespace |
| create_box() | create_schema() | Create a namespace within a hive |
| create_frame() | create_table() | Create a queryable dataset |
| list_hives() | list_databases() | List all hives |
| list_boxes() | list_schemas() | List boxes in a hive |
| list_frames() | list_tables() | List frames in a box |
| get_frame() | get_table() | Get frame metadata |

Both forms are functionally identical. Use whichever you prefer.