binary encoding#

patterns for encoding/decoding binary wire formats (CBOR, CAR, protocol frames). distinct from JSON - you're working with raw bytes and need to handle endianness, varints, and content addressing.

anytype writer for encoders#

the core pattern: an encoder function that accepts any writer via anytype. this lets the same encoder write to fixed buffers, ArrayLists, or any other writer:

pub fn encode(allocator: Allocator, writer: anytype, value: Value) !void {
    switch (value) {
        .unsigned => |v| try writeArgument(writer, 0, v),
        .text => |t| {
            try writeArgument(writer, 3, t.len);
            try writer.writeAll(t);
        },
        .map => |entries| {
            // sort keys (DAG-CBOR determinism), needs allocator
            const sorted = try allocator.dupe(MapEntry, entries);
            defer allocator.free(sorted);
            std.mem.sort(MapEntry, sorted, {}, dagCborKeyLessThan);
            // ...
        },
        // ...
    }
}

the allocator parameter is separate from the writer - needed for temporary allocations during encoding (sorting map keys, building intermediate buffers), not for the output itself.
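a minimal sketch of why anytype works here: the encoder only calls writeByte and writeAll, so any type that provides them will do. CountingWriter and encodeShortText are hypothetical names for illustration, not part of zat. this one measures the encoded size without storing any bytes:

```zig
const std = @import("std");

/// Hypothetical writer that stores nothing and only counts bytes.
/// It satisfies the encoder because it has writeByte and writeAll.
const CountingWriter = struct {
    count: usize = 0,

    pub fn writeByte(self: *CountingWriter, byte: u8) !void {
        _ = byte;
        self.count += 1;
    }

    pub fn writeAll(self: *CountingWriter, bytes: []const u8) !void {
        self.count += bytes.len;
    }
};

/// Simplified stand-in for the text branch of encode: a CBOR text
/// header (major type 3, length below 24) followed by the raw bytes.
fn encodeShortText(writer: anytype, text: []const u8) !void {
    std.debug.assert(text.len < 24);
    try writer.writeByte(0x60 | @as(u8, @intCast(text.len)));
    try writer.writeAll(text);
}
```

calling encodeShortText(&cw, "hello") on a fresh CountingWriter leaves cw.count at 6: one header byte plus five text bytes.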

usage with different writers:

// fixed buffer (no allocation for output)
var buf: [1024]u8 = undefined;
var stream = std.io.fixedBufferStream(&buf);
try encode(alloc, stream.writer(), value);
const result = stream.getWritten();

// growable buffer
var list: std.ArrayList(u8) = .{};
defer list.deinit(alloc);
try encode(alloc, list.writer(alloc), value);

note: std.io.fixedBufferStream is deprecated in 0.15; the stdlib says to use std.Io.Writer.fixed / std.Io.Reader.fixed instead. the old API still compiles (zat uses it in 3 files), but new code should prefer the non-deprecated form. the new writer is a concrete std.Io.Writer value that you pass by pointer (&w) instead of calling .writer() on a stream. the anytype writer pattern itself is fine either way, since the encoder doesn't care which writer type backs it.
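a sketch of the non-deprecated form, assuming Zig 0.15's std.Io.Writer.fixed. encodeShortText is a hypothetical stand-in for encode and encodeToFixed is made up for illustration. buffered() returns the bytes written so far:

```zig
const std = @import("std");

/// Hypothetical stand-in for an anytype encoder: CBOR text header
/// (major type 3, length below 24) followed by the raw bytes.
fn encodeShortText(writer: anytype, text: []const u8) !void {
    std.debug.assert(text.len < 24);
    try writer.writeByte(0x60 | @as(u8, @intCast(text.len)));
    try writer.writeAll(text);
}

/// Encode into a caller-provided buffer with no allocation for output.
/// The returned slice points into buf.
fn encodeToFixed(buf: []u8, text: []const u8) ![]const u8 {
    var w: std.Io.Writer = .fixed(buf);
    try encodeShortText(&w, text);
    return w.buffered();
}
```

if the buffer is too small, the write fails with an error instead of growing, which matches the fixed-buffer behavior of the old API.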

see: zat/cbor.zig

encodeAlloc convenience#

wrap the growable-buffer pattern into a helper:

pub fn encodeAlloc(allocator: Allocator, value: Value) ![]u8 {
    var list: std.ArrayList(u8) = .{};
    errdefer list.deinit(allocator);
    try encode(allocator, list.writer(allocator), value);
    return try list.toOwnedSlice(allocator);
}

caller owns the returned slice. errdefer ensures cleanup if encoding fails partway through.
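a sketch of the ownership contract from the caller's side, assuming Zig 0.15's allocator-passing std.ArrayList. encodeTextAlloc is a hypothetical stand-in for encodeAlloc that only handles short text:

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

/// Hypothetical stand-in for encodeAlloc: builds a CBOR text string
/// (length below 24) in a growable list and hands the bytes to the caller.
fn encodeTextAlloc(allocator: Allocator, text: []const u8) ![]u8 {
    std.debug.assert(text.len < 24);
    var list: std.ArrayList(u8) = .{};
    errdefer list.deinit(allocator); // runs only if a later step fails
    try list.append(allocator, 0x60 | @as(u8, @intCast(text.len)));
    try list.appendSlice(allocator, text);
    return try list.toOwnedSlice(allocator);
}

/// Caller side: the slice must be freed with the same allocator.
fn encodedLen(allocator: Allocator, text: []const u8) !usize {
    const bytes = try encodeTextAlloc(allocator, text);
    defer allocator.free(bytes);
    return bytes.len;
}
```

running this under std.testing.allocator is a cheap way to check the contract, since it reports any slice that was not freed.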

big-endian integers without writeInt#

when writing fixed-width big-endian integers to an anytype writer, build the bytes manually rather than depending on writeInt (which may not be available on all writer types):

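fn writeArgument(writer: anytype, major: u3, val: u64) !void {
    const prefix: u8 = @as(u8, major) << 5;
    if (val < 24) {
        // 0-23 fit in the low 5 bits of the initial byte
        try writer.writeByte(prefix | @as(u8, @intCast(val)));
    } else if (val <= 0xff) {
        try writer.writeAll(&[2]u8{ prefix | 24, @intCast(val) });
    } else if (val <= 0xffff) {
        try writer.writeByte(prefix | 25);
        const v: u16 = @intCast(val);
        // high byte first, then low byte
        try writer.writeAll(&[2]u8{ @truncate(v >> 8), @truncate(v) });
    }
    // ... 4-byte (26) and 8-byte (27) forms follow the same shape
}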

@truncate on shifted values is the idiomatic way to extract individual bytes.
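the same shift-and-truncate idea extends to wider integers. a small sketch for u32: SliceWriter is a hypothetical minimal writer used only to make the example self-contained, and writeU32Be is an illustrative name:

```zig
const std = @import("std");

/// Hypothetical minimal writer over a fixed slice. It only provides
/// writeAll, which is all writeU32Be needs.
const SliceWriter = struct {
    buf: []u8,
    len: usize = 0,

    pub fn writeAll(self: *SliceWriter, bytes: []const u8) !void {
        if (bytes.len > self.buf.len - self.len) return error.NoSpaceLeft;
        @memcpy(self.buf[self.len..][0..bytes.len], bytes);
        self.len += bytes.len;
    }
};

/// Write v most-significant byte first. Each shift moves the wanted
/// byte into the low 8 bits and @truncate drops the rest.
fn writeU32Be(writer: anytype, v: u32) !void {
    try writer.writeAll(&[4]u8{
        @truncate(v >> 24),
        @truncate(v >> 16),
        @truncate(v >> 8),
        @truncate(v),
    });
}
```

writing 0x01020304 this way produces the bytes 01 02 03 04 regardless of host endianness.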

unsigned varint (LEB128)#

used by CID, CAR, and other IPLD formats for variable-length integers:

// write
pub fn writeUvarint(writer: anytype, val: u64) !void {
    var v = val;
    while (v >= 0x80) {
        try writer.writeByte(@as(u8, @truncate(v)) | 0x80);
        v >>= 7;
    }
    try writer.writeByte(@as(u8, @truncate(v)));
}

// read
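fn readUvarint(data: []const u8, pos: *usize) ?u64 {
    var result: u64 = 0;
    var shift: u7 = 0; // u7, not u6, so the >= 64 check below can fire
    while (pos.* < data.len) {
        const byte = data[pos.*];
        pos.* += 1;
        result |= @as(u64, byte & 0x7f) << @as(u6, @intCast(shift));
        if (byte & 0x80 == 0) return result;
        shift += 7;
        if (shift >= 64) return null; // more than 10 bytes: reject
    }
    return null;
}

note: shift is a u7 rather than a u6. a u6 counter with +|= (saturating add) would stick at 63, so a shift >= 64 check on it could never fire and an overlong varint would be accepted. the @intCast back to u6 is safe because shift is at most 63 whenever it is used.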
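a worked example: 300 is 0b1_0010_1100, so the low 7 bits (0x2c) go out first with the continuation bit set (0xac), followed by the remaining bits (0x02). the sketch below reads the four leading varints of a binary CIDv1 (version, codec, hash code, digest length). CidHeader and readCidHeader are hypothetical names, and the reader uses a u7 shift counter so that inputs longer than 10 bytes are rejected:

```zig
const std = @import("std");

fn readUvarint(data: []const u8, pos: *usize) ?u64 {
    var result: u64 = 0;
    var shift: u7 = 0;
    while (pos.* < data.len) {
        const byte = data[pos.*];
        pos.* += 1;
        result |= @as(u64, byte & 0x7f) << @as(u6, @intCast(shift));
        if (byte & 0x80 == 0) return result;
        shift += 7;
        if (shift >= 64) return null; // overlong input
    }
    return null; // ran out of bytes mid-varint
}

/// Hypothetical struct for illustration; not zat's CID type.
const CidHeader = struct {
    version: u64,
    codec: u64,
    hash_code: u64,
    digest_len: u64,
};

/// Read the varint fields that precede the digest in a binary CIDv1.
fn readCidHeader(data: []const u8) ?CidHeader {
    var pos: usize = 0;
    const version = readUvarint(data, &pos) orelse return null;
    const codec = readUvarint(data, &pos) orelse return null;
    const hash_code = readUvarint(data, &pos) orelse return null;
    const digest_len = readUvarint(data, &pos) orelse return null;
    return .{
        .version = version,
        .codec = codec,
        .hash_code = hash_code,
        .digest_len = digest_len,
    };
}
```

for the bytes 01 71 12 20 this yields version 1, codec 0x71 (dag-cbor), hash code 0x12 (sha2-256), and a 32-byte digest length.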

arena per message#

for streaming protocols, create an arena per incoming message. all decoding allocations go into it, then free everything at once:

pub fn serverMessage(self: *Self, data: []const u8) !void {
    var arena = std.heap.ArenaAllocator.init(self.allocator);
    defer arena.deinit();

    const event = decodeFrame(arena.allocator(), data) catch |err| {
        log.debug("decode error: {s}", .{@errorName(err)});
        return;
    };

    self.handler.onEvent(event);
    // arena freed here — all decoded data is gone
}

this means the handler's onEvent must not hold references to event data past the call. if it needs to, it must copy into its own allocator.
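a sketch of a handler that copies what it wants to keep. Event, decodeFrame, Handler, and handleMessage here are hypothetical simplifications for illustration, not zat's types:

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

const Event = struct { text: []const u8 };

/// Hypothetical decoder: copies the payload into the given allocator,
/// standing in for a real frame decoder that allocates while parsing.
fn decodeFrame(allocator: Allocator, data: []const u8) !Event {
    return .{ .text = try allocator.dupe(u8, data) };
}

/// Hypothetical handler that remembers the most recent event text.
const Handler = struct {
    allocator: Allocator,
    last_text: ?[]u8 = null,

    pub fn onEvent(self: *Handler, event: Event) !void {
        // event.text lives in the per-message arena, so copy before storing
        const copy = try self.allocator.dupe(u8, event.text);
        if (self.last_text) |old| self.allocator.free(old);
        self.last_text = copy;
    }

    pub fn deinit(self: *Handler) void {
        if (self.last_text) |old| self.allocator.free(old);
    }
};

fn handleMessage(gpa: Allocator, handler: *Handler, data: []const u8) !void {
    var arena = std.heap.ArenaAllocator.init(gpa);
    defer arena.deinit(); // frees everything decodeFrame allocated

    const event = try decodeFrame(arena.allocator(), data);
    try handler.onEvent(event);
}
```

storing event.text directly instead of the copy would leave last_text pointing at freed arena memory once handleMessage returns.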

see: zat/firehose.zig, zat/jetstream.zig

specialized decoders#

when generic decoding is too expensive, write a purpose-built parser for a known schema. the generic path builds Value unions, MapEntry arrays, and handles every CBOR type. if you know the exact shape, skip all that.

example: MST nodes are always map(2) { "e": array[entries...], "l": CID|null }. instead of cbor.decodeAll() → extract fields from Value unions, parse the CBOR bytes directly:

pub fn decodeMstNode(allocator: Allocator, data: []const u8) MstDecodeError!MstNodeData {
    // expect map(2), key "e", array(n) — known byte sequence
    // parse entries inline, zero-copy slicing into input buffer
    // only allocation: the entries array itself
}

pub const MstNodeData = struct {
    left: ?[]const u8,       // raw CID bytes (borrowed from input)
    entries: []MstEntryData, // heap-allocated array
};

pub const MstEntryData = struct {
    prefix_len: usize,
    key_suffix: []const u8,  // borrowed from input
    value_cid: []const u8,   // borrowed from input
    tree: ?[]const u8,       // borrowed from input
};

the result: the MST walk went from 45.5ms (generic decode per node) to 39.3ms (specialized decode) on 243k blocks, roughly 14% faster. the bigger win was avoiding the full tree rebuild (218ms → 39ms total) by verifying structure during the walk.
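a sketch of what "known byte sequence" means in practice. in CBOR, map(2) is the byte 0xa2, the one-character text key "e" is 0x61 0x65, and a short array header is 0x80 plus the count. parseMstHeader and MstHeader are hypothetical and only cover the node prefix with entry counts below 256; zat's decodeMstNode does more:

```zig
const std = @import("std");

/// Hypothetical result type for illustration.
const MstHeader = struct {
    entry_count: usize,
    rest: []const u8, // borrowed from input, starts at the first entry
};

/// Match map(2), key "e", array(n) at the start of an MST node
/// without building any intermediate values.
fn parseMstHeader(data: []const u8) ?MstHeader {
    // 0xa2 = map with 2 entries; 0x61 0x65 = text(1) "e"
    const prefix = [_]u8{ 0xa2, 0x61, 0x65 };
    if (!std.mem.startsWith(u8, data, &prefix)) return null;
    if (data.len < 4) return null;

    const head = data[3];
    if ((head >> 5) != 4) return null; // major type 4 = array
    const info = head & 0x1f;

    // counts 0-23 are stored inline in the header byte
    if (info < 24) return .{ .entry_count = info, .rest = data[4..] };

    // info 24 = count in the next byte; deterministic encoding
    // only allows this form for counts of 24 or more
    if (info == 24 and data.len >= 5 and data[4] >= 24) {
        return .{ .entry_count = data[4], .rest = data[5..] };
    }
    return null; // larger counts are left out of this sketch
}
```

the returned rest slice points into the input buffer, so nothing is copied and the caller must keep the input alive while using it.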

when to use this pattern:

you decode the same schema thousands of times (MST nodes, CBOR blocks)
the schema is stable and well-known
profiling shows decode time dominates

when NOT to use it:

the schema varies or is user-defined
you only decode a handful of times
generic decode is fast enough

see: zat/mst.zig decodeMstNode

deterministic encoding#

DAG-CBOR requires deterministic output (same value → same bytes). the main rules:

shortest integer encoding: 0-23 inline, 24-255 in 1 byte, etc.
map keys sorted: by byte length first, then lexicographically
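no indefinite lengths; floats, where allowed, are always 64-bit with no NaN or Infinity (atproto's data model disallows floats entirely)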
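the shortest-integer rule can be sketched as picking the narrowest of the five CBOR head forms. encodeHead is a hypothetical buffer-based helper for illustration (the encoder above writes to a writer instead):

```zig
const std = @import("std");

/// Encode a CBOR head (major type plus argument) in its shortest form.
/// Returns the used prefix of buf: 1, 2, 3, 5, or 9 bytes.
fn encodeHead(buf: *[9]u8, major: u3, val: u64) []const u8 {
    const prefix: u8 = @as(u8, major) << 5;
    if (val < 24) {
        // 0-23 are stored inline in the initial byte
        buf[0] = prefix | @as(u8, @intCast(val));
        return buf[0..1];
    }

    // otherwise pick the narrowest width that holds val
    var width: usize = 8;
    var info: u8 = 27;
    if (val <= 0xff) {
        width = 1;
        info = 24;
    } else if (val <= 0xffff) {
        width = 2;
        info = 25;
    } else if (val <= 0xffff_ffff) {
        width = 4;
        info = 26;
    }
    buf[0] = prefix | info;

    // big-endian: most significant byte first
    var i: usize = 0;
    while (i < width) : (i += 1) {
        const shift: u6 = @intCast(8 * (width - 1 - i));
        buf[1 + i] = @truncate(val >> shift);
    }
    return buf[0 .. 1 + width];
}
```

for major type 0 this gives 17 for 23, 18 18 for 24, 18 ff for 255, and 19 01 00 for 256 (hex bytes). a decoder enforcing determinism would reject any longer form of the same value.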

sorting map keys during encoding:

fn dagCborKeyLessThan(_: void, a: MapEntry, b: MapEntry) bool {
    if (a.key.len != b.key.len) return a.key.len < b.key.len;
    return std.mem.order(u8, a.key, b.key) == .lt;
}

// in encoder:
const sorted = try allocator.dupe(MapEntry, entries);
defer allocator.free(sorted);
std.mem.sort(MapEntry, sorted, {}, dagCborKeyLessThan);

the dupe + sort pattern avoids mutating the input — the caller's entries slice stays unchanged.
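a small sketch showing how length-first ordering differs from plain lexicographic ordering. the MapEntry shape and sortedCopy are assumptions for illustration:

```zig
const std = @import("std");

/// Hypothetical entry shape; the real MapEntry holds a CBOR value.
const MapEntry = struct {
    key: []const u8,
    value: u64,
};

fn dagCborKeyLessThan(_: void, a: MapEntry, b: MapEntry) bool {
    if (a.key.len != b.key.len) return a.key.len < b.key.len;
    return std.mem.order(u8, a.key, b.key) == .lt;
}

/// Return a sorted copy; the input slice is left untouched.
/// Caller owns the result and frees it with the same allocator.
fn sortedCopy(allocator: std.mem.Allocator, entries: []const MapEntry) ![]MapEntry {
    const sorted = try allocator.dupe(MapEntry, entries);
    std.mem.sort(MapEntry, sorted, {}, dagCborKeyLessThan);
    return sorted;
}
```

given the keys "createdAt", "$type", and "text", the sorted order is "text", "$type", "createdAt" (lengths 4, 5, 9). plain lexicographic order would put "$type" first.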
