Benchmarks
==========

haystack-py includes a Docker-based benchmark suite that measures HTTP and WebSocket throughput against all three storage backends (InMemory, Redis, TimescaleDB). The suite also times the decoding of entity data in each supported wire format (JSON, Trio, Zinc).

.. contents:: On this page
   :local:
   :depth: 2

Test Setup
----------

All benchmarks were run on Docker Desktop with a single client container driving a single server container sequentially (HTTP first, then WebSocket) for each backend.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Parameter
     - Value
   * - **Server container**
     - 3 GB RAM limit
   * - **Client container**
     - 1.5 GB RAM limit
   * - **HTTP concurrency**
     - 50 connections
   * - **WebSocket concurrency**
     - 20 connections
   * - **Duration**
     - 15 s per test (+ 3 s warmup)
   * - **Entity dataset**
     - 3,109 entities (Alpha 2,032 + Bravo 1,077)
   * - **Entity mix**
     - 2,764 points, 333 equips, 2 sites, 8 VAVs, AHUs, meters
   * - **Server**
     - Uvicorn (single worker), FastAPI, SCRAM-SHA-256 auth
   * - **Wire formats decoded**
     - JSON (orjson), Trio, Zinc

The server loads entity data from JSON files (the Trio and Zinc files are decoded for timing but not double-ingested). Each client authenticates via SCRAM-SHA-256, then sends a mix of Haystack operations (``about``, ``ops``, ``read`` with filters, ``nav``) in a tight loop for the benchmark duration.

HTTP API Results
----------------

.. list-table::
   :header-rows: 1
   :widths: 20 15 15 15 15

   * - Backend
     - RPS
     - p50
     - p95
     - p99
   * - **InMemory**
     - 3,380
     - 15.3 ms
     - 20.8 ms
     - 37.9 ms
   * - **Redis**
     - 3,578
     - 13.8 ms
     - 21.7 ms
     - 40.2 ms
   * - **TimescaleDB**
     - 3,713
     - 12.7 ms
     - 23.3 ms
     - 34.3 ms

All three backends achieve comparable throughput (3,300–3,700 RPS) thanks to response caching and decode-path optimizations that eliminate storage I/O as a bottleneck after warmup. The server is bottlenecked on HTTP/ASGI overhead rather than backend speed.

Per-endpoint breakdown (InMemory, 50 connections):
.. list-table::
   :header-rows: 1
   :widths: 25 15 15 15

   * - Endpoint
     - RPS
     - p50
     - p99
   * - ``/about``
     - 677
     - 9.9 ms
     - 29.1 ms
   * - ``/ops``
     - 677
     - 9.9 ms
     - 28.9 ms
   * - ``/read`` (filter)
     - 1,352
     - 16.8 ms
     - 40.0 ms
   * - ``/nav``
     - 675
     - 16.3 ms
     - 39.2 ms

WebSocket API Results
---------------------

.. list-table::
   :header-rows: 1
   :widths: 20 18 15 15 15

   * - Backend
     - msg/s
     - p50
     - p95
     - p99
   * - **InMemory**
     - 1,552
     - 12.7 ms
     - 23.8 ms
     - 29.4 ms
   * - **Redis**
     - 1,616
     - 10.2 ms
     - 30.9 ms
     - 40.7 ms
   * - **TimescaleDB**
     - 1,797
     - 4.0 ms
     - 38.4 ms
     - 46.4 ms

WebSocket throughput ranges from 1,550 to 1,800 msg/s across backends with 20 concurrent connections. The persistent connection eliminates per-request auth and HTTP overhead. Response payloads are larger than over HTTP (full entity grids per message), which makes transport I/O the primary bottleneck. All three backends perform similarly because read caches serve repeated queries from memory.

Per-operation breakdown (TimescaleDB, 20 connections):

.. list-table::
   :header-rows: 1
   :widths: 20 15 15 15

   * - Operation
     - msg/s
     - p50
     - p99
   * - ``about``
     - 360
     - 0.8 ms
     - 12.1 ms
   * - ``ops``
     - 359
     - 0.6 ms
     - 17.4 ms
   * - ``read``
     - 718
     - 11.5 ms
     - 40.7 ms
   * - ``nav``
     - 359
     - 25.2 ms
     - 49.3 ms

Wire Format Decode Performance
------------------------------

Decoding 3,109 entities (Alpha + Bravo datasets):

.. list-table::
   :header-rows: 1
   :widths: 15 20 20

   * - Format
     - Alpha (2,032)
     - Bravo (1,077)
   * - **JSON** (orjson)
     - 34 ms
     - 56 ms
   * - **Trio**
     - 67 ms
     - 93 ms
   * - **Zinc**
     - 75 ms
     - N/A\ :sup:`1`

.. note::

   :sup:`1` The Bravo Zinc file contains a scientific-notation edge case (``10.6E``) that the current Zinc scanner does not handle. This is a known limitation tracked for a future release.

JSON (via orjson) is the fastest decoder, roughly 2× faster than Trio and Zinc for the same dataset. All formats decode the full 2,032-entity Alpha dataset in under 100 ms.
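The ``10.6E`` token is tricky because a trailing ``E`` can read either as the start of a scientific-notation exponent or as a unit suffix on the number (Zinc numbers may carry units). The sketch below illustrates the ambiguity with two hypothetical regex-based tokenizers; neither is haystack-py's actual scanner, and the function names are made up for this example:

```python
import re

# Naive pattern: greedily consumes "e"/"E" as an exponent marker even
# when no digits follow, so "10.6E" matches in full but float() then
# rejects it.
NAIVE_NUM = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d*)?")

def naive_parse(token: str) -> float:
    m = NAIVE_NUM.match(token)
    return float(m.group(0))  # raises ValueError on "10.6E"

# Tolerant pattern: only consumes "e"/"E" when exponent digits follow,
# leaving a bare "E" to be read as a unit suffix instead.
TOLERANT_NUM = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")

def tolerant_parse(token: str) -> tuple[float, str]:
    m = TOLERANT_NUM.match(token)
    value = float(m.group(0))
    unit = token[m.end():]  # whatever follows the numeric part, e.g. "E"
    return value, unit
```

With the tolerant pattern, ``tolerant_parse("10.6E")`` yields ``(10.6, "E")`` while ``tolerant_parse("1.5e3")`` still yields ``(1500.0, "")``; the naive pattern raises ``ValueError`` on the first input.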
Running the Benchmarks
----------------------

The benchmark suite lives in ``benchmarks/`` and can be run with a single script:

.. code-block:: bash

   cd benchmarks
   ./run_benchmarks.sh

This will:

1. Build the server and client Docker images
2. Start each backend (InMemory → Redis → TimescaleDB)
3. Run the HTTP then the WebSocket client sequentially per backend
4. Collect results to ``benchmarks/results/``
5. Print an aggregated summary

To run a single backend manually:

.. code-block:: bash

   # Start server with InMemory backend
   docker compose -f benchmarks/docker-compose.yml up -d --wait server-inmemory

   # Run one HTTP client
   docker compose -f benchmarks/docker-compose.yml run --rm http-inmemory

   # Run one WebSocket client
   docker compose -f benchmarks/docker-compose.yml run --rm ws-inmemory

   # Cleanup
   docker compose -f benchmarks/docker-compose.yml down -v --remove-orphans

Configuration is via environment variables:

.. list-table::
   :header-rows: 1
   :widths: 30 15 55

   * - Variable
     - Default
     - Description
   * - ``HS_PY_CONCURRENCY``
     - 50 / 20
     - Concurrent connections (HTTP / WebSocket)
   * - ``HS_PY_DURATION``
     - 15
     - Benchmark duration in seconds
   * - ``HS_PY_WARMUP``
     - 3
     - Warmup duration in seconds
   * - ``BACKEND``
     - ``inmemory``
     - Server storage backend (``inmemory``, ``redis``, ``timescale``)
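Resolving these variables amounts to reading the environment with per-protocol fallbacks. A minimal sketch of that resolution, assuming the variable names and defaults from the table above (the ``bench_config`` helper itself is hypothetical, not part of the suite):

```python
import os

def bench_config(protocol: str = "http") -> dict:
    """Resolve benchmark settings from the environment, applying
    the documented defaults when a variable is unset (hypothetical
    helper; variable names match the table above)."""
    # HS_PY_CONCURRENCY defaults differ by protocol: 50 for HTTP, 20 for WS.
    default_concurrency = 50 if protocol == "http" else 20
    return {
        "concurrency": int(os.environ.get("HS_PY_CONCURRENCY", default_concurrency)),
        "duration": int(os.environ.get("HS_PY_DURATION", 15)),
        "warmup": int(os.environ.get("HS_PY_WARMUP", 3)),
        "backend": os.environ.get("BACKEND", "inmemory"),
    }
```

For example, with no variables set, ``bench_config("ws")`` falls back to 20 connections, a 15 s duration, a 3 s warmup, and the ``inmemory`` backend; exporting ``HS_PY_CONCURRENCY=100`` overrides the connection count for either protocol.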