
Benchmarks

Apiary includes a benchmark framework for measuring query performance across different hardware profiles. The framework supports both direct DataFusion execution and full Apiary Docker stack testing.

Benchmark Suites

Tier 1 -- Baseline Credibility

These are industry-standard benchmarks for establishing baseline comparability:

| Suite | Tables | Queries | Purpose |
|---|---|---|---|
| SSB (Star Schema Benchmark) | 5 | 13 | Star schema analytics, primary baseline |
| TPC-H | 8 | 22 | Industry-standard decision support benchmark |

Both suites support multiple scale factors:

| Scale Factor | Approximate Size | Use Case |
|---|---|---|
| SF1 | ~1 GB | Quick smoke tests, Pi development |
| SF10 | ~10 GB | Full Pi benchmarks, baseline measurements |
| SF100 | ~100 GB | Cloud/desktop benchmarks, stress testing |

Tier 2 -- Apiary-Specific

| Suite | What It Tests | Requirements |
|---|---|---|
| apiary_format | Query across Parquet, CSV, and Arrow simultaneously | Single node |
| apiary_heterogeneous | Mixed Pi 4/Pi 5/x86 coordination | Multi-node |
| apiary_acid | Concurrent reads + writes (0, 1, 2, 4 concurrent writers) | Single node |
| apiary_elasticity | Dynamic node join/leave during queries | Multi-node |
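The apiary_acid measurement pattern (time reads while a fixed number of background writers run concurrently) can be sketched generically. This is an illustrative harness, not part of the benchmark framework; `timed_reads_under_writers`, `read`, and `write` are hypothetical names:

```python
import statistics
import threading
import time

def timed_reads_under_writers(read, write, n_writers, n_reads=3):
    """Time n_reads calls to read() while n_writers threads issue
    writes concurrently; return the median read latency in ms."""
    stop = threading.Event()

    def writer_loop():
        # Each writer issues writes in a tight loop until signalled to stop.
        while not stop.is_set():
            write()

    writers = [threading.Thread(target=writer_loop) for _ in range(n_writers)]
    for t in writers:
        t.start()
    try:
        timings_ms = []
        for _ in range(n_reads):
            start = time.perf_counter()
            read()
            timings_ms.append((time.perf_counter() - start) * 1000)
        return statistics.median(timings_ms)
    finally:
        stop.set()
        for t in writers:
            t.join()
```

Running the same read workload at 0, 1, 2, and 4 writers then shows how write contention degrades read latency.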

Tier 3 -- Stretch Goals

| Suite | Size | Purpose |
|---|---|---|
| JOB (Join Order Benchmark) | ~3.6 GB (IMDB) | Complex join optimization |
| ClickBench (subset) | ~14 GB (Yandex) | Wide-table analytical patterns |

Execution Engines

DataFusion Direct

Runs queries through DataFusion via the Apiary Python bindings, bypassing the full Apiary stack. Useful for measuring raw query engine performance.

```bash
python bench_runner.py --suite ssb --data-dir ./data/ssb/sf1/parquet --output ./results
```

Apiary Docker

Runs the full Apiary stack in Docker containers with optional resource constraints matched to specific hardware profiles (see Pi Deploy Profiles).

```bash
python bench_runner.py --suite ssb --engine apiary-docker --image apiary:latest --profile pi4-4gb
```
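A profile such as pi4-4gb presumably maps to container resource limits. Purely as an illustration (the actual profile definitions live in the project's compose files), such a constraint might look like:

```yaml
# Illustrative only: caps a container at roughly Pi 4 (4 GB) resources.
services:
  apiary:
    image: apiary:latest
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: 4g
```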

Multi-node execution:

```bash
python bench_runner.py --suite ssb --engine apiary-docker --image apiary:latest --nodes 3
```

Execution Protocol

Each benchmark run follows this protocol:

| Parameter | Value |
|---|---|
| Runs per query | 3 |
| Warmup runs | 1 (discarded) |
| Reported metric | Median of 3 runs |
| Timeout per query | 600 seconds |
| Metrics collected | Wall-clock ms, peak RSS, rows/sec, bytes/sec, partitions pruned |
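The protocol in the table can be sketched as a small timing harness. This is a hypothetical helper for illustration, not the actual `bench_runner.py` implementation:

```python
import statistics
import time

def run_query(execute, warmup=1, runs=3, timeout_s=600):
    """Run the protocol for one query: discard warmup runs, time the
    measured runs, enforce the per-query timeout, report the median."""
    for _ in range(warmup):
        execute()  # warmup run, timing discarded
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        execute()
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > timeout_s * 1000:
            raise TimeoutError(f"query exceeded {timeout_s}s")
        timings_ms.append(elapsed_ms)
    return statistics.median(timings_ms)
```

The median is preferred over the mean because a single slow run (e.g. from background I/O on a Pi) would otherwise skew the reported figure.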

Running Benchmarks

Setup

```bash
cd benchmarks
pip install -r requirements.txt
```

Generate Test Data

```bash
# Generate SSB data at scale factor 1 (~1 GB)
python generate_datasets.py --scale-factor 1 --output ./data

# Generate TPC-H data at scale factor 1
python generate_datasets.py --suite tpc-h --scale-factor 1 --output ./data
```

Run Locally (DataFusion Direct)

```bash
python bench_runner.py --suite ssb --data-dir ./data/ssb/sf1/parquet --output ./results
```

Run with Docker (Full Stack)

```bash
# Build the Apiary image first
docker build -t apiary:latest ..

# Single node
python bench_runner.py --suite ssb --engine apiary-docker --image apiary:latest --output ./results

# Multi-node (3 nodes)
python bench_runner.py --suite ssb --engine apiary-docker --image apiary:latest --nodes 3 --output ./results

# With hardware profile constraints
python bench_runner.py --suite ssb --engine apiary-docker --image apiary:latest --profile pi4-4gb --output ./results
```

Generate Reports

```bash
python ../scripts/generate_benchmark_report.py --input ./results --output ./results/report.html
```

Results Format

Benchmark results are written as JSON files in the output directory. Each result includes:

```json
{
  "suite": "ssb",
  "query": "q1.1",
  "engine": "datafusion",
  "scale_factor": 1,
  "runs": [
    {"wall_clock_ms": 142, "peak_rss_bytes": 52428800, "rows": 100}
  ],
  "median_ms": 142,
  "system_info": {
    "cpu": "...",
    "cores": 4,
    "memory_gb": 3.7,
    "os": "Linux",
    "arch": "aarch64"
  }
}
```
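A minimal sketch of consuming these result files, assuming one JSON object per file with the field names shown above (`summarize_results` is a hypothetical helper, not part of the framework):

```python
import json
import statistics
from pathlib import Path

def summarize_results(results_dir):
    """Read every result file in results_dir and return
    (suite, query, median wall-clock ms) tuples."""
    summary = []
    for path in sorted(Path(results_dir).glob("*.json")):
        result = json.loads(path.read_text())
        # Recompute the median from the raw runs rather than
        # trusting the precomputed median_ms field.
        wall_times = [run["wall_clock_ms"] for run in result["runs"]]
        summary.append((result["suite"], result["query"],
                        statistics.median(wall_times)))
    return summary
```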

Hardware Profiles

The benchmark framework supports Docker Compose profiles that constrain resources to match specific Raspberry Pi models. See Pi Deploy Profiles for the full comparison table.