Python SDK Reference

Installation

pip install maturin
maturin develop    # builds and installs the extension from a source checkout

Prerequisites: Rust 1.75+, Python 3.9+


Apiary Class

Constructor

Apiary(name: str, storage: str | None = None)

Create an Apiary instance.

| Parameter | Type | Description |
|---|---|---|
| name | str | Logical name for this apiary (used as root namespace) |
| storage | str \| None | Storage URI. Defaults to local filesystem (~/.apiary/data/). Use "s3://bucket/path" for S3-compatible storage. |
from apiary import Apiary

# Local filesystem (solo mode)
ap = Apiary("my_project")

# S3-compatible storage (multi-node capable)
ap = Apiary("production", storage="s3://my-bucket/apiary")

Lifecycle

start()

Initialize the node: detect hardware, start bee pool, begin heartbeat writer, start worker poller.

ap.start()

shutdown()

Gracefully stop the node: drain tasks, stop heartbeat, clean up resources.

ap.shutdown()
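Because start() and shutdown() should be paired even when the work in between raises, a try/finally wrapper is a convenient pattern. A minimal sketch — run_with_node is not part of the SDK; it works with any object exposing start() and shutdown():

```python
def run_with_node(ap, work):
    """Start the node, run work(ap), and always shut down cleanly."""
    ap.start()
    try:
        return work(ap)
    finally:
        ap.shutdown()

# Demo with a stand-in object that records lifecycle calls.
class FakeApiary:
    def __init__(self):
        self.calls = []
    def start(self):
        self.calls.append("start")
    def shutdown(self):
        self.calls.append("shutdown")

fake = FakeApiary()
run_with_node(fake, lambda ap: ap.calls.append("work"))
print(fake.calls)  # ['start', 'work', 'shutdown']
```

The finally clause guarantees heartbeats stop and resources are released even if the workload fails mid-task.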

Namespace Operations

Create

create_hive(name: str) -> None
create_box(hive: str, name: str) -> None
create_frame(hive: str, box_name: str, name: str, schema: dict, partition_by: list[str] | None = None) -> None

Traditional aliases: create_database(), create_schema(), create_table() accept the same signatures. Note that create_table() uses columns as the parameter name instead of schema:

create_table(database: str, schema: str, name: str, columns: dict, partition_by: list[str] | None = None) -> None
| Parameter | Type | Description |
|---|---|---|
| name | str | Name of the hive, box, or frame |
| hive | str | Parent hive name |
| box_name | str | Parent box name |
| schema | dict | Column name to type mapping |
| partition_by | list[str] \| None | Columns to partition by |

Supported schema types: int64, float64, utf8, boolean, date32, timestamp

ap.create_hive("warehouse")
ap.create_box("warehouse", "sales")
ap.create_frame("warehouse", "sales", "orders", {
    "order_id": "int64",
    "customer": "utf8",
    "amount": "float64",
    "region": "utf8",
}, partition_by=["region"])
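Since create_frame() accepts only the six type names listed above, a client-side check can catch schema mistakes before the call. A hedged sketch — SUPPORTED_TYPES mirrors the list above, and validate_schema is an illustrative helper, not part of the SDK:

```python
SUPPORTED_TYPES = {"int64", "float64", "utf8", "boolean", "date32", "timestamp"}

def validate_schema(schema, partition_by=None):
    """Raise ValueError on unknown types or partition columns missing from the schema."""
    for col, dtype in schema.items():
        if dtype not in SUPPORTED_TYPES:
            raise ValueError(f"column {col!r} has unsupported type {dtype!r}")
    for col in partition_by or []:
        if col not in schema:
            raise ValueError(f"partition column {col!r} not in schema")

validate_schema({"order_id": "int64", "region": "utf8"}, partition_by=["region"])  # passes silently
```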

List

list_hives() -> list[str]
list_boxes(hive: str) -> list[str]
list_frames(hive: str, box_name: str) -> list[str]

Traditional aliases: list_databases(), list_schemas(), list_tables()

ap.list_hives()                       # ["warehouse"]
ap.list_boxes("warehouse")            # ["sales"]
ap.list_frames("warehouse", "sales")  # ["orders"]

Get Metadata

get_frame(hive: str, box_name: str, name: str) -> dict

Traditional alias: get_table()

Returns frame metadata including schema, partition columns, max partitions, and creation timestamp.

info = ap.get_frame("warehouse", "sales", "orders")
# {
# "schema": {"order_id": "int64", "customer": "utf8", ...},
# "partition_by": ["region"],
# "max_partitions": 1024,
# "created_at": "2026-02-10T12:00:00+00:00"
# }

Data Operations

Write

write_to_frame(hive: str, box_name: str, frame_name: str, ipc_data: bytes) -> dict

Append data to a frame. Input is Arrow IPC stream bytes. Returns a write result with version, cell/row counts, bytes written, duration, and colony temperature.

| Parameter | Type | Description |
|---|---|---|
| hive | str | Target hive |
| box_name | str | Target box |
| frame_name | str | Target frame |
| ipc_data | bytes | Arrow IPC stream bytes |

Returns: {"version": int, "cells_written": int, "rows_written": int, "bytes_written": int, "duration_ms": int, "temperature": float}

import pyarrow as pa

table = pa.table({
    "order_id": [1, 2, 3],
    "customer": ["alice", "bob", "alice"],
    "amount": [100.0, 250.0, 75.0],
    "region": ["us", "eu", "us"],
})

sink = pa.BufferOutputStream()
writer = pa.ipc.new_stream(sink, table.schema)
writer.write_table(table)
writer.close()

result = ap.write_to_frame("warehouse", "sales", "orders", sink.getvalue().to_pybytes())
# {"version": 1, "cells_written": 2, "rows_written": 3, "bytes_written": 4096, "duration_ms": 42, "temperature": 0.15}
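The returned dict contains enough to derive a rough write throughput. A sketch over a sample result — the helper is illustrative, not part of the SDK:

```python
def write_throughput_mb_s(result):
    """Compute MB/s from the bytes_written and duration_ms fields of a write result."""
    seconds = result["duration_ms"] / 1000.0
    if seconds == 0:
        return float("inf")
    return result["bytes_written"] / (1024 * 1024) / seconds

sample = {"version": 1, "cells_written": 2, "rows_written": 3,
          "bytes_written": 4096, "duration_ms": 42, "temperature": 0.15}
print(f"{write_throughput_mb_s(sample):.3f} MB/s")  # 0.093 MB/s
```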

Read

read_from_frame(hive: str, box_name: str, frame_name: str, partition_filter: dict | None = None) -> bytes

Read data from a frame as Arrow IPC bytes. Optional partition filter for pruning.

| Parameter | Type | Description |
|---|---|---|
| hive | str | Target hive |
| box_name | str | Target box |
| frame_name | str | Target frame |
| partition_filter | dict \| None | Partition column values to filter by |

Returns: Arrow IPC stream bytes

data = ap.read_from_frame("warehouse", "sales", "orders")
reader = pa.ipc.open_stream(data)
table = reader.read_all()

# With partition pruning
data = ap.read_from_frame("warehouse", "sales", "orders", partition_filter={"region": "us"})

Overwrite

overwrite_frame(hive: str, box_name: str, frame_name: str, ipc_data: bytes) -> dict

Atomically replace all data in a frame. Old cells are removed and new cells are written in a single ledger entry.

result = ap.overwrite_frame("warehouse", "sales", "orders", sink.getvalue().to_pybytes())

SQL

sql(query: str) -> bytes

Execute a SQL query and return Arrow IPC stream bytes. See the SQL Reference for supported syntax.

ParameterTypeDescription
querystrSQL query string

Returns: Arrow IPC stream bytes

result_bytes = ap.sql("SELECT customer, SUM(amount) FROM warehouse.sales.orders GROUP BY customer")

reader = pa.ipc.open_stream(result_bytes)
table = reader.read_all()
print(table.to_pandas())

Custom commands (USE, SHOW, DESCRIBE) also return Arrow IPC with result metadata:

ap.sql("USE HIVE warehouse")
ap.sql("USE BOX sales")
result = ap.sql("SELECT * FROM orders LIMIT 10")
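When not using USE, queries must reference tables by their fully qualified hive.box.frame name, as in the GROUP BY example above. A tiny helper for building those names — qualified is illustrative, not part of the SDK, and assumes plain identifier names that need no quoting:

```python
def qualified(hive: str, box: str, frame: str) -> str:
    """Build a fully qualified table name for use in sql()."""
    return f"{hive}.{box}.{frame}"

query = f"SELECT * FROM {qualified('warehouse', 'sales', 'orders')} LIMIT 10"
print(query)  # SELECT * FROM warehouse.sales.orders LIMIT 10
```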

Status and Monitoring

Node Status

status() -> dict

Returns basic node information including hardware details and storage type.

s = ap.status()
# {
# "name": "production",
# "node_id": "abc123",
# "cores": 4,
# "memory_gb": 3.7,
# "bees": 4,
# "storage": "s3",
# "memory_per_bee_mb": 950,
# "target_cell_size_mb": 237
# }
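The derived fields in this sample look like simple arithmetic on the hardware numbers: total memory split evenly across bees, and a target cell size of roughly a quarter of a bee's budget. This is a speculative reconstruction from the example output, not a documented formula:

```python
# Hypothetical reconstruction of the derived status fields; the real
# formulas are internal to the node and may differ (e.g. reserved headroom).
memory_gb, bees = 3.7, 4
memory_per_bee_mb = memory_gb * 1024 / bees   # ~947, reported as 950 above
target_cell_size_mb = memory_per_bee_mb / 4   # ~237, matching the sample
print(round(memory_per_bee_mb), round(target_cell_size_mb))
```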

Bee Status

bee_status() -> list[dict]

Returns per-bee (per-core) information: memory budget (in bytes), current utilization, and task state.

bees = ap.bee_status()
for bee in bees:
    print(f"Bee {bee['bee_id']}: {bee['state']}, "
          f"{bee['memory_used']}/{bee['memory_budget']} bytes")

Swarm Status

swarm_status() -> dict

Returns the full swarm view: all discovered nodes with their state, bee count, and health metrics.

swarm = ap.swarm_status()
print(f"Total bees: {swarm['total_bees']}, Idle bees: {swarm['total_idle_bees']}")
for node in swarm['nodes']:
    print(f"  {node['node_id']}: {node['state']}, {node['bees']} bees, "
          f"mem pressure: {node['memory_pressure']:.2f}, "
          f"temp: {node['colony_temperature']:.2f}")
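For scheduling or alerting, it can help to pick out the most loaded node from the swarm view. A sketch over a sample payload — the node dicts mirror the fields used above; hottest_node is not part of the SDK:

```python
def hottest_node(swarm):
    """Return the node dict with the highest memory_pressure."""
    return max(swarm["nodes"], key=lambda n: n["memory_pressure"])

sample = {
    "total_bees": 8, "total_idle_bees": 3,
    "nodes": [
        {"node_id": "a", "state": "active", "bees": 4,
         "memory_pressure": 0.42, "colony_temperature": 0.20},
        {"node_id": "b", "state": "active", "bees": 4,
         "memory_pressure": 0.81, "colony_temperature": 0.60},
    ],
}
print(hottest_node(sample)["node_id"])  # b
```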

Colony Status

colony_status() -> dict

Returns the behavioral model state: colony temperature, regulation classification, and setpoint.

colony = ap.colony_status()
print(f"Temperature: {colony['temperature']:.2f}")
print(f"Regulation: {colony['regulation']}") # "cold", "ideal", "warm", "hot", "critical"
print(f"Setpoint: {colony['setpoint']:.2f}")
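One practical use of colony_status() is write backpressure: pause or slow ingestion while the colony runs hot. A sketch — the regulation labels come from the list above; should_throttle is an illustrative helper, not part of the SDK:

```python
def should_throttle(colony):
    """True when the colony's regulation band suggests backing off writes."""
    return colony["regulation"] in {"hot", "critical"}

assert should_throttle({"temperature": 0.9, "regulation": "hot", "setpoint": 0.5})
assert not should_throttle({"temperature": 0.3, "regulation": "ideal", "setpoint": 0.5})
```

Keying off the regulation label rather than the raw temperature avoids hard-coding the SDK's internal thresholds.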

Dual Terminology

Apiary supports both bee-themed and traditional database terminology. Every namespace operation has an alias:

| Bee-themed | Traditional | Description |
|---|---|---|
| create_hive() | create_database() | Create a top-level namespace |
| create_box() | create_schema() | Create a namespace within a hive |
| create_frame() | create_table() | Create a queryable dataset |
| list_hives() | list_databases() | List all hives |
| list_boxes() | list_schemas() | List boxes in a hive |
| list_frames() | list_tables() | List frames in a box |
| get_frame() | get_table() | Get frame metadata |

Both forms are functionally identical. Use whichever you prefer.