Encoding

Haystack wire format implementations: JSON, Zinc, and Trio encoding and decoding, plus encode-only CSV export.

JSON: from hs_py.encoding.json import ... (v3 and v4)
Zinc: from hs_py.encoding.zinc import ... (grid text format)
Trio: from hs_py.encoding.trio import ... (record text format)
CSV: from hs_py.encoding.csv import ... (lossy grid export)

For convenience, the most common JSON functions are re-exported directly from this package. Zinc, Trio, and CSV functions should be imported from their respective modules to avoid name collisions.

JSON

Haystack JSON encoding and decoding.

Supports both Haystack 4 (v4) and Haystack 3 (v3) JSON formats, with an optional pythonic decode mode that converts Haystack types to native Python equivalents where possible.

See: https://project-haystack.org/doc/docHaystack/Json

class hs_py.encoding.json.JsonVersion(*values)

Bases: Enum

Haystack JSON encoding version.

V3 = 'v3'

Haystack 3 JSON: type-prefixed strings (e.g. "n:42 °F").

V4 = 'v4'

Haystack 4 JSON: _kind object wrappers.
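To make the difference between the two versions concrete, here is a minimal standalone sketch of how a Number with a unit might look in each. It does not call hs_py; the helper names number_v3 and number_v4 are hypothetical, and the output shapes are assumed from the Haystack JSON conventions described above.

```python
def _num_text(val):
    # Render 42.0 as "42" and 72.5 as "72.5".
    f = float(val)
    return str(int(f)) if f.is_integer() else repr(f)


def number_v3(val, unit=None):
    """Haystack 3 style: a type-prefixed string such as "n:42 °F"."""
    text = "n:" + _num_text(val)
    return f"{text} {unit}" if unit else text


def number_v4(val, unit=None):
    """Haystack 4 style: a plain JSON number, or a _kind wrapper when a unit is present."""
    if unit is None:
        return val
    return {"_kind": "number", "val": val, "unit": unit}


print(number_v3(42, "°F"))
print(number_v4(42, "°F"))
```

In v3 the type lives in a string prefix, so every consumer has to parse strings; in v4 the type lives in the _kind key, so plain JSON numbers and strings can pass through unchanged.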

hs_py.encoding.json.decode_grid(data, *, version=JsonVersion.V4, pythonic=False)

Decode Haystack JSON bytes to a Grid.

Parameters:

data (bytes) – JSON bytes.
version (JsonVersion (default: <JsonVersion.V4: 'v4'>)) – JSON encoding version to decode.
pythonic (bool (default: False)) – If True, convert values to native Python types.

Return type:

Grid

Returns:

Decoded Grid.

hs_py.encoding.json.decode_grid_dict(obj, *, version=JsonVersion.V4, pythonic=False)

Decode a pre-parsed JSON dict to a Grid.

Use this when the JSON has already been deserialized (e.g. from a WebSocket message) to avoid an unnecessary orjson.dumps / orjson.loads round-trip.

Parameters:

obj (dict[str, Any]) – JSON-deserialized dict representing a grid.
version (JsonVersion (default: <JsonVersion.V4: 'v4'>)) – JSON encoding version to decode.
pythonic (bool (default: False)) – If True, convert values to native Python types.

Return type:

Grid

Returns:

Decoded Grid.

hs_py.encoding.json.decode_val(obj, *, version=JsonVersion.V4, pythonic=False)

Decode a JSON value to a Haystack kind.

Parameters:

obj (Any) – JSON-deserialized value.
version (JsonVersion (default: <JsonVersion.V4: 'v4'>)) – JSON encoding version to decode.
pythonic (bool (default: False)) – If True, convert to native Python types where possible. Marker becomes True, unitless Number becomes float, Symbol and Uri become str.

Return type:

Any

Returns:

Decoded Haystack value.
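The pythonic conversions listed above can be pictured with a small standalone sketch over already-parsed v4 JSON values. This is an illustration only, not the library's implementation: to_pythonic is a hypothetical name, and leaving unit-bearing numbers and other kinds untouched is an assumption.

```python
def to_pythonic(obj):
    """Map a parsed v4 JSON value to a native Python value where possible."""
    if isinstance(obj, dict) and "_kind" in obj:
        kind = obj["_kind"]
        if kind == "marker":
            return True
        if kind == "number" and "unit" not in obj:
            return float(obj["val"])
        if kind in ("symbol", "uri"):
            return str(obj["val"])
        return obj  # assumption: other kinds are left as they are
    if isinstance(obj, bool):
        return obj  # check bool first, since bool is a subclass of int
    if isinstance(obj, (int, float)):
        return float(obj)  # a unitless number is a plain JSON number in v4
    return obj


print(to_pythonic({"_kind": "marker"}))
print(to_pythonic({"_kind": "symbol", "val": "site"}))
```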

hs_py.encoding.json.encode_grid(grid, *, version=JsonVersion.V4)

Encode a Grid to Haystack JSON bytes.

Parameters:

grid (Grid) – Grid to encode.
version (JsonVersion (default: <JsonVersion.V4: 'v4'>)) – JSON encoding version to use.

Return type:

bytes

Returns:

JSON-encoded bytes via orjson.

hs_py.encoding.json.encode_grid_dict(grid, *, version=JsonVersion.V4)

Encode a Grid to a JSON-compatible dict (no serialization).

Use this when embedding a grid dict inside a larger JSON structure to avoid the overhead of serializing to bytes and back.

Parameters:

grid (Grid) – Grid to encode.
version (JsonVersion (default: <JsonVersion.V4: 'v4'>)) – JSON encoding version to use.

Return type:

dict[str, Any]

Returns:

JSON-serializable dict.

hs_py.encoding.json.encode_val(val, *, version=JsonVersion.V4)

Encode a single Haystack value to its JSON-compatible representation.

Parameters:

val (Any) – Haystack value to encode.
version (JsonVersion (default: <JsonVersion.V4: 'v4'>)) – JSON encoding version to use.

Return type:

Any

Returns:

JSON-serializable Python object.

Zinc

Haystack Zinc text grid format encoding and decoding.

Zinc is the primary text format for Haystack data. It encodes grids as a line-oriented text format with typed scalar values.

See: https://project-haystack.org/doc/docHaystack/Zinc
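As a rough picture of the line-oriented layout, the following standalone sketch writes a version line, a column line, and one line per row. It is a simplification that does not use hs_py: the names zinc_scalar and encode_zinc are hypothetical, only strings, booleans, plain numbers, and null are handled, and grid and column metadata are ignored.

```python
def zinc_scalar(val):
    """Encode a small subset of scalars in Zinc syntax."""
    if val is None:
        return "N"
    if val is True:
        return "T"
    if val is False:
        return "F"
    if isinstance(val, (int, float)):
        return str(val)
    # Strings are double-quoted with backslash escapes.
    escaped = str(val).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def encode_zinc(cols, rows):
    """Write a version line, a column line, then one comma-separated line per row."""
    lines = ['ver:"3.0"', ",".join(cols)]
    for row in rows:
        lines.append(",".join(zinc_scalar(row.get(col)) for col in cols))
    return "\n".join(lines) + "\n"


rows = [{"dis": "Site A", "area": 1000, "occupied": True}, {"dis": "Site B"}]
print(encode_zinc(["dis", "area", "occupied"], rows))
```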

hs_py.encoding.zinc.decode_grid(text, *, _depth=0)

Decode Zinc text into a Grid.

Parameters:

text (str) – Zinc grid text.
_depth (int) – Current nesting depth (internal; callers should normally leave the default).

Return type:

Grid

Returns:

Decoded Grid.

hs_py.encoding.zinc.decode_val(text)

Decode a Zinc-encoded scalar value string.

Parameters: text (str) – Zinc value text.
Return type: Any
Returns: Parsed Haystack value.

hs_py.encoding.zinc.encode_grid(grid)

Encode a Grid as Zinc text.

Parameters: grid (Grid) – Grid to encode.
Return type: str
Returns: Zinc-encoded grid string.

hs_py.encoding.zinc.encode_val(val)

Encode a single Haystack value as Zinc text.

Parameters: val (Any) – Haystack value to encode.
Return type: str
Returns: Zinc-encoded string.

Trio

Trio record text format parser and encoder.

Trio is a line-oriented format for hand-authoring Haystack data records. Each record contains tag name-value pairs separated by lines of dashes. Values are encoded in Zinc scalar format with Trio-specific extensions (unquoted strings, true/false booleans).

See: https://project-haystack.org/doc/docHaystack/Trio
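The record layout described above can be sketched with a deliberately simplified standalone splitter. It is not the library's parser: split_trio is a hypothetical name, values are kept as raw strings rather than decoded as Zinc scalars, a tag with no value is treated as a marker (True), and indented continuation lines are not handled.

```python
def split_trio(text):
    """Split Trio text into records of raw tag strings (simplified)."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("-"):
            # A line of dashes closes the current record.
            if current:
                records.append(current)
            current = {}
        elif not line.strip() or line.lstrip().startswith("//"):
            continue  # skip blank lines and comments
        else:
            name, sep, value = line.partition(":")
            # No colon means a marker tag; otherwise keep the raw value text.
            current[name.strip()] = value.strip() if sep else True
    if current:
        records.append(current)
    return records


sample = "dis: Site A\nsite\narea: 1000ft²\n---\ndis: Site B\nsite\n"
print(split_trio(sample))
```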

hs_py.encoding.trio.encode_trio(records)

Encode a list of tag dicts as Trio text.

Multi-line strings, nested Grid values (via Zinc), and nested record lists (via Trio) are encoded using indented continuation lines.

Parameters: records (list[dict[str, Any]]) – List of tag dicts, one per record.
Return type: str
Returns: Trio-formatted text with trailing newline.

hs_py.encoding.trio.parse_trio(text, *, _depth=0)

Parse Trio text into a list of tag dicts.

Each dict represents one record (separated by lines of ---). Supports multi-line string, Zinc, and Trio values via indented continuation lines.

Parameters:

text (str) – Trio-formatted text.
_depth (int) – Current nesting depth (internal; used to enforce the nesting limit).

Return type:

list[dict[str, Any]]

Returns:

List of tag dicts, one per record.

Raises:

ValueError – If nesting depth exceeds limit.

hs_py.encoding.trio.parse_zinc_val(text)

Parse a Zinc-encoded scalar value string.

This parses strict Zinc syntax only. For Trio-specific extensions (unquoted strings, true/false), use parse_trio().

Parameters: text (str) – Zinc value text.
Return type: Any
Returns: Parsed Haystack value.

CSV

Haystack CSV encoding: lossy, encode-only grid export.

CSV is a lossy text format for grids: grid and column metadata are discarded and type information is simplified. It is useful for exporting grid data to spreadsheets and other tools that consume RFC 4180 CSV.

See: https://project-haystack.org/doc/docHaystack/Csv

hs_py.encoding.csv.encode_grid(grid)

Encode a Grid as CSV text.

Column headers use the dis metadata value when present, otherwise the programmatic column name. Grid and column metadata are discarded. Type information is simplified per the Haystack CSV spec.

Parameters: grid (Grid) – Grid to encode.
Return type: str
Returns: CSV-formatted string (with trailing newline).
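The header rule above (use dis when present, otherwise the column name) can be illustrated with the standard-library csv module. This sketch uses plain dicts in place of a Grid; the function name encode_csv, the column dict shape, and the str() rendering of cell values are all assumptions made for illustration.

```python
import csv
import io


def encode_csv(columns, rows):
    """Write a header row (dis if present, else name) followed by the data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # default dialect: minimal quoting, CRLF line endings
    writer.writerow([col.get("dis", col["name"]) for col in columns])
    for row in rows:
        cells = []
        for col in columns:
            val = row.get(col["name"])
            cells.append("" if val is None else str(val))
        writer.writerow(cells)
    return buf.getvalue()


columns = [{"name": "id"}, {"name": "area", "dis": "Area, sq ft"}]
rows = [{"id": "a", "area": 100}, {"id": "b"}]
print(encode_csv(columns, rows))
```

Because every cell is reduced to text, a reader of the CSV cannot tell a Number from a Str, which is what makes the format lossy.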

Scanner

Shared position-based Zinc value scanning helpers.

Position-based scanner functions for Zinc-encoded scalar values. Used by both the Trio parser and the filter lexer to avoid duplicating regex constants and parsing logic.

All scan functions use the (text, pos) -> (value, end_pos) signature.
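The value of a uniform (text, pos) -> (value, end_pos) signature is that scanners chain without slicing the input. The sketch below shows the pattern with two tiny standalone helpers; they mirror the shape described here but are not the library's code, and scan_name and scan_names are hypothetical names.

```python
import string

IDENT = frozenset(string.ascii_letters + string.digits + "_")
WHITESPACE = frozenset(" \t\r\n")


def skip_ws(text, pos):
    """Return the position of the first non-whitespace character at or after pos."""
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos


def scan_name(text, pos):
    """Scan identifier characters from pos; return (name, end_pos)."""
    start = pos
    while pos < len(text) and text[pos] in IDENT:
        pos += 1
    return text[start:pos], pos


def scan_names(text):
    """Chain the scanners: each call resumes where the previous one stopped."""
    names, pos = [], 0
    while True:
        pos = skip_ws(text, pos)
        name, pos = scan_name(text, pos)
        if not name:
            return names
        names.append(name)


print(scan_names("  site  equip point"))
```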

hs_py.encoding.scanner.DATETIME_RE = re.compile('\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?(?:Z|[+-]\\d{2}:\\d{2})(?:\\s+[A-Z][a-zA-Z0-9_/]+)?')

Regex for Zinc datetime values.

hs_py.encoding.scanner.DATE_RE = re.compile('\\d{4}-\\d{2}-\\d{2}')

Regex for Zinc date values.

hs_py.encoding.scanner.DIGIT_CHARS = frozenset({'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '_'})

Digit characters and underscore (for numeric scanning).

hs_py.encoding.scanner.IDENT_CHARS = frozenset({'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'})

Characters valid in tag names and identifiers (alphanumeric + underscore).

hs_py.encoding.scanner.REF_CHARS = frozenset({'-', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '~'})

Characters valid in a Ref id.

hs_py.encoding.scanner.STR_ESCAPES: dict[str, str] = {'"': '"', '$': '$', '\\': '\\', 'b': '\x08', 'f': '\x0c', 'n': '\n', 'r': '\r', 't': '\t'}

String escape sequences per Zinc spec.

hs_py.encoding.scanner.SYMBOL_CHARS = frozenset({'-', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'})

Characters valid in symbol names (alphanumeric + hyphen, underscore, colon, dot).

hs_py.encoding.scanner.TIME_RE = re.compile('\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?')

Regex for Zinc time values.

hs_py.encoding.scanner.UNIT_STOP_BASE = frozenset({'\t', '\n', '\r', ' '})

Base characters that terminate a number unit (whitespace only). Consumers extend this with context-specific delimiters.
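To show how the three temporal patterns listed above tell values apart, this standalone sketch recompiles them locally with the standard re module and tries them from most to least specific. The classify helper is hypothetical, and the use of fullmatch is a choice made for the illustration, not a statement about how the library applies the patterns.

```python
import re

# Patterns copied from the constants documented above.
DATETIME_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?"
    r"(?:Z|[+-]\d{2}:\d{2})(?:\s+[A-Z][a-zA-Z0-9_/]+)?"
)
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
TIME_RE = re.compile(r"\d{2}:\d{2}:\d{2}(?:\.\d+)?")


def classify(text):
    """Return 'datetime', 'date', 'time', or None for the whole string."""
    for kind, pattern in (("datetime", DATETIME_RE), ("date", DATE_RE), ("time", TIME_RE)):
        if pattern.fullmatch(text):
            return kind
    return None


for sample in ("2024-01-15T10:30:00-05:00 New_York", "2024-01-15", "10:30:00.250", "72.5"):
    print(sample, "->", classify(sample))
```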

hs_py.encoding.scanner.city_to_tz(name)

Resolve a Haystack timezone name to a ZoneInfo.

Accepts both city-only names ("New_York") and full IANA names ("America/New_York"). Results are cached to avoid repeated filesystem lookups from ZoneInfo.

Parameters: name (str) – Haystack city name or full IANA timezone key.
Return type: ZoneInfo
Returns: Resolved ZoneInfo instance.
Raises: KeyError – If name cannot be resolved.

hs_py.encoding.scanner.escape_str(s)

Escape a string per the Zinc string escape spec.

Parameters: s (str) – Raw string.
Return type: str
Returns: Escaped string safe for Zinc encoding.
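One way to picture escaping is as the inverse of the STR_ESCAPES table documented above. The sketch below rebuilds that table locally and inverts it; the escape function is hypothetical, and escaping every character in the table (including $) is an assumption about behavior, not something this reference confirms.

```python
# Local copy of the documented escape table: escape code -> literal character.
STR_ESCAPES = {
    '"': '"', "$": "$", "\\": "\\",
    "b": "\x08", "f": "\x0c", "n": "\n", "r": "\r", "t": "\t",
}

# Inverse table: literal character -> backslash sequence.
TO_ESCAPED = {char: "\\" + code for code, char in STR_ESCAPES.items()}


def escape(s):
    """Replace each special character with its backslash escape sequence."""
    return "".join(TO_ESCAPED.get(ch, ch) for ch in s)


print(escape('say "hi"\n'))
```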

hs_py.encoding.scanner.format_num(val)

Format a float, dropping unnecessary trailing zeros.

Parameters: val (float) – Numeric value.
Return type: str
Returns: String representation without redundant decimal places.

hs_py.encoding.scanner.format_number(n)

Format a Number as a string with optional unit.

Handles NaN, INF, -INF, and appends the unit if present.

Parameters: n (Number) – Number to format.
Return type: str
Returns: Zinc-formatted number string.
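The two formatting helpers above can be approximated in a few lines. This sketch is standalone and makes several assumptions: Num is a hypothetical stand-in for the library's Number, whole floats are rendered without a decimal point, and the unit is appended directly after the numeric text in every case.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Num:
    """Hypothetical stand-in for a Haystack Number (value plus optional unit)."""
    val: float
    unit: Optional[str] = None


def fmt_float(val):
    """Render 72.0 as '72' and 1.5 as '1.5'."""
    return str(int(val)) if val.is_integer() else repr(val)


def fmt_number(n):
    """Format special values as NaN / INF / -INF, then append the unit if present."""
    if math.isnan(n.val):
        text = "NaN"
    elif math.isinf(n.val):
        text = "INF" if n.val > 0 else "-INF"
    else:
        text = fmt_float(float(n.val))
    return text + (n.unit or "")


print(fmt_number(Num(72.0, "°F")))
print(fmt_number(Num(float("-inf"))))
```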

hs_py.encoding.scanner.format_ref(ref, *, zinc=False)

Format a Ref as @id dis or @id "dis".

Parameters:

ref (Ref) – Ref to format.
zinc (bool (default: False)) – If True, quote the display string per Zinc syntax.

Return type:

str

Returns:

Formatted ref string.

hs_py.encoding.scanner.parse_datetime(s)

Parse a Zinc datetime string into a Python datetime.

Parameters: s (str) – Zinc datetime text (e.g. "2024-01-15T10:30:00-05:00 New_York").
Return type: datetime
Returns: Parsed timezone-aware datetime.
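A Zinc datetime is an ISO 8601 timestamp followed by an optional timezone name, so the two parts can be separated before parsing. The standalone sketch below does only that split; split_zinc_datetime is a hypothetical helper, it returns the name alongside a fixed-offset datetime instead of attaching a ZoneInfo, and it does not reproduce whatever validation the library performs.

```python
from datetime import datetime


def split_zinc_datetime(s):
    """Return (fixed-offset datetime, timezone name or None) for Zinc datetime text."""
    iso, _, tz_name = s.partition(" ")
    if iso.endswith("Z"):
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        iso = iso[:-1] + "+00:00"
    return datetime.fromisoformat(iso), (tz_name.strip() or None)


dt, name = split_zinc_datetime("2024-01-15T10:30:00-05:00 New_York")
print(dt.isoformat(), name)
```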

hs_py.encoding.scanner.scan_dict(text, pos, *, _depth=0)

Scan a Zinc dict literal starting at {.

Parameters:

text (str) – Source text.
pos (int) – Position of the opening {.
_depth (int) – Current nesting depth (internal; callers should normally leave the default).

Return type:

tuple[dict[str, Any], int]

Returns:

(dict, end_pos) tuple.

hs_py.encoding.scanner.scan_keyword(text, pos)

Scan a keyword (T/F/M/NA/…), Coord, XStr, or bare identifier.

Parameters:

text (str) – Source text.
pos (int) – Starting position (must be an alpha character).

Return type:

tuple[Any, int]

Returns:

(value, end_pos) tuple.

hs_py.encoding.scanner.scan_list(text, pos, *, _depth=0)

Scan a Zinc list literal starting at [.

Parameters:

text (str) – Source text.
pos (int) – Position of the opening [.
_depth (int) – Current nesting depth (internal; callers should normally leave the default).

Return type:

tuple[list[Any], int]

Returns:

(list, end_pos) tuple.

hs_py.encoding.scanner.scan_number(text, pos, *, unit_stop=None)

Scan a numeric literal with optional unit.

Supports underscore digit separators per the Zinc spec (e.g. 10_000).

Parameters:

text (str) – Source text.
pos (int) – Starting position.
unit_stop (frozenset[str] | None (default: None)) – Characters that terminate the unit string.

Return type:

tuple[Number, int]

Returns:

(number, end_pos) tuple.
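The role of unit_stop is easiest to see in a simplified scanner: digits (with underscore separators) are consumed first, and everything up to the next stop character is taken as the unit. The sketch below is standalone and hedged accordingly: it returns a plain (value, unit) pair rather than a Number, ignores exponents and special values, and the name scan_number_sketch is hypothetical.

```python
DIGITS = frozenset("0123456789_")
UNIT_STOP = frozenset(" \t\r\n")


def scan_number_sketch(text, pos, unit_stop=UNIT_STOP):
    """Scan '-?digits(.digits)?unit?' at pos; return ((value, unit), end_pos)."""
    start = pos
    if pos < len(text) and text[pos] == "-":
        pos += 1
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    # Take a fractional part only when a digit follows the dot.
    if pos + 1 < len(text) and text[pos] == "." and text[pos + 1].isdigit():
        pos += 1
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
    digits = text[start:pos].replace("_", "")
    if digits in ("", "-"):
        raise ValueError(f"no number at position {start}")
    value = float(digits)
    # Everything up to the next stop character is the unit.
    unit_start = pos
    while pos < len(text) and text[pos] not in unit_stop:
        pos += 1
    return (value, text[unit_start:pos] or None), pos


# In a comma-separated context the caller extends the stop set with ",".
print(scan_number_sketch("10_000kW,next", 0, unit_stop=UNIT_STOP | {","}))
```

Passing the stop set in, rather than hard-coding it, lets one scanner serve several contexts with different delimiters.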

hs_py.encoding.scanner.scan_number_or_temporal(text, pos, *, unit_stop=None)

Disambiguate and scan a number, date, time, or datetime.

Parameters:

text (str) – Source text.
pos (int) – Starting position.
unit_stop (frozenset[str] | None (default: None)) – Characters that terminate a number unit.

Return type:

tuple[Any, int]

Returns:

(value, end_pos) tuple.

hs_py.encoding.scanner.scan_ref(text, pos)

Scan a Zinc Ref literal starting at @.

Parameters:

text (str) – Source text.
pos (int) – Position of the @ character.

Return type:

tuple[Ref, int]

Returns:

(ref, end_pos) tuple.

hs_py.encoding.scanner.scan_str(text, pos)

Scan a Zinc quoted string starting at the opening ".

Parameters:

text (str) – Source text.
pos (int) – Position of the opening quote.

Return type:

tuple[str, int]

Returns:

(string_value, end_pos) tuple.

Raises:

ValueError – If the string is unterminated.
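A quoted-string scanner in this style walks from the opening quote, translates escape sequences, and returns once it reaches the closing quote. The sketch below is a standalone approximation: scan_str_sketch is a hypothetical name, the end position is assumed to be the index just past the closing quote, and Unicode escapes are not handled.

```python
# Local copy of the documented escape table: escape code -> literal character.
STR_ESCAPES = {
    '"': '"', "$": "$", "\\": "\\",
    "b": "\x08", "f": "\x0c", "n": "\n", "r": "\r", "t": "\t",
}


def scan_str_sketch(text, pos):
    """Scan a double-quoted string at pos; return (value, end_pos)."""
    if pos >= len(text) or text[pos] != '"':
        raise ValueError(f"expected opening quote at position {pos}")
    pos += 1
    chars = []
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return "".join(chars), pos + 1  # end_pos is just past the closing quote
        if ch == "\\":
            pos += 1
            if pos >= len(text):
                break  # a lone trailing backslash means the string never closed
            if text[pos] not in STR_ESCAPES:
                raise ValueError(f"unknown escape at position {pos}")
            chars.append(STR_ESCAPES[text[pos]])
        else:
            chars.append(ch)
        pos += 1
    raise ValueError("unterminated string")


print(scan_str_sketch('"a\\"b" rest', 0))
```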

hs_py.encoding.scanner.scan_symbol(text, pos)

Scan a Zinc Symbol literal starting at ^.

Parameters:

text (str) – Source text.
pos (int) – Position of the ^ character.

Return type:

tuple[Symbol, int]

Returns:

(symbol, end_pos) tuple.

hs_py.encoding.scanner.scan_tag_name(text, pos)

Scan a tag name (alphanumeric + underscore) starting at pos.

Parameters:

text (str) – Source text.
pos (int) – Starting position.

Return type:

tuple[str, int]

Returns:

(name, end_pos) tuple; name may be empty.

hs_py.encoding.scanner.scan_uri(text, pos)

Scan a Zinc Uri literal starting at back-tick.

Parameters:

text (str) – Source text.
pos (int) – Position of the opening back-tick.

Return type:

tuple[Uri, int]

Returns:

(uri, end_pos) tuple.

Raises:

ValueError – If the URI is unterminated.

hs_py.encoding.scanner.scan_val(text, pos, *, _depth=0)

Scan a Zinc value starting at pos.

Parameters:

text (str) – Source text.
pos (int) – Starting position.
_depth (int) – Current nesting depth (internal; checked against MAX_SCAN_DEPTH).

Return type:

tuple[Any, int]

Returns:

(value, end_pos) tuple.

Raises:

ValueError – If an unexpected character is encountered or nesting depth exceeds MAX_SCAN_DEPTH.

hs_py.encoding.scanner.skip_ws(text, pos)

Advance pos past whitespace.

Parameters:

text (str) – Source text.
pos (int) – Current position.

Return type:

int

Returns:

Position of the first non-whitespace character.

hs_py.encoding.scanner.tz_name(dt)

Extract the Haystack city timezone name from a datetime.

Parameters: dt (datetime) – Timezone-aware datetime.
Return type: str | None
Returns: City-only name, or None if the datetime has no timezone or uses a fixed-offset timezone without an IANA key.

hs_py.encoding.scanner.tz_to_city(tz_key)

Extract the Haystack city name from an IANA timezone key.

Haystack uses city-only timezone names per the zoneinfo convention:

"America/New_York" → "New_York"
"UTC"              → "UTC"

Parameters: tz_key (str) – Full IANA timezone key.
Return type: str
Returns: City-only timezone name.
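The mapping shown above amounts to keeping the last path segment of the IANA key, and the reverse direction can be served from an index built once. The sketch below is standalone and illustrative: city_of, build_city_index, and resolve_key are hypothetical names, and the index is built from a caller-supplied list of keys so the example does not depend on the system timezone database (with the standard library, zoneinfo.available_timezones() could supply that list).

```python
def city_of(tz_key):
    """Keep the last path segment: 'America/New_York' -> 'New_York'."""
    return tz_key.rsplit("/", 1)[-1]


def build_city_index(tz_keys):
    """Map each city-only name back to its full IANA key."""
    return {city_of(key): key for key in tz_keys}


def resolve_key(name, index):
    """Accept a full IANA key or a city-only name; raise KeyError if unknown."""
    if "/" in name:
        return name
    return index[name]


index = build_city_index(["America/New_York", "Europe/London", "UTC"])
print(resolve_key("New_York", index))
```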