# binary encoding

patterns for encoding/decoding binary wire formats (CBOR, CAR, protocol frames). distinct from JSON - you're working with raw bytes and need to handle endianness, varints, and content addressing.

## anytype writer for encoders

the core pattern: an encoder function that accepts any writer via `anytype`. this lets the same encoder write to fixed buffers, ArrayLists, or any other writer:

```zig
pub fn encode(allocator: Allocator, writer: anytype, value: Value) !void {
    switch (value) {
        .unsigned => |v| try writeArgument(writer, 0, v),
        .text => |t| {
            try writeArgument(writer, 3, t.len);
            try writer.writeAll(t);
        },
        .map => |entries| {
            // sort keys (DAG-CBOR determinism), needs allocator
            const sorted = try allocator.dupe(MapEntry, entries);
            defer allocator.free(sorted);
            std.mem.sort(MapEntry, sorted, {}, keyLessThan);
            // ...
        },
        // ...
    }
}
```

the allocator parameter is separate from the writer - needed for temporary allocations during encoding (sorting map keys, building intermediate buffers), not for the output itself.

usage with different writers:

```zig
// fixed buffer (no allocation for output)
var buf: [1024]u8 = undefined;
var stream = std.io.fixedBufferStream(&buf);
try encode(alloc, stream.writer(), value);
const result = stream.getWritten();

// growable buffer
var list: std.ArrayList(u8) = .{};
defer list.deinit(alloc);
try encode(alloc, list.writer(alloc), value);
```

**note**: `std.io.fixedBufferStream` is deprecated in 0.15: the stdlib says to use `std.Io.Writer.fixed` / `std.Io.Reader.fixed` instead. the old API still compiles (zat uses it in 3 files) but new code should prefer the non-deprecated form. the `anytype` writer pattern itself is fine either way, since the encoder doesn't care which writer type backs it.

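to illustrate that the encoder only depends on the methods it calls, here is a minimal sketch with a hand-written writer that counts bytes instead of storing them. `CountingWriter` and `encodeShortText` are hypothetical names made up for this example (they are not part of zat), and the encoder is a deliberately tiny stand-in that handles only CBOR text strings shorter than 24 bytes:

```zig
const std = @import("std");

/// Hypothetical writer (not from zat): discards bytes and only counts them.
/// It provides the two methods the encoder sketch below calls.
const CountingWriter = struct {
    count: usize = 0,

    pub fn writeByte(self: *CountingWriter, byte: u8) !void {
        _ = byte;
        self.count += 1;
    }

    pub fn writeAll(self: *CountingWriter, bytes: []const u8) !void {
        self.count += bytes.len;
    }
};

/// Simplified stand-in for an encoder: CBOR text strings under 24 bytes only,
/// where the length fits inline in the header byte.
fn encodeShortText(writer: anytype, text: []const u8) !void {
    if (text.len >= 24) return error.TooLong;
    const len: u8 = @intCast(text.len);
    try writer.writeByte((3 << 5) | len); // major type 3 (text), inline length
    try writer.writeAll(text);
}

test "size of an encoding without producing any output" {
    var counter = CountingWriter{};
    try encodeShortText(&counter, "hello");
    // 1 header byte + 5 payload bytes
    try std.testing.expectEqual(@as(usize, 6), counter.count);
}
```

run it with `zig test`. the writer is passed by pointer here because it carries its own state; the encoder body is the same either way.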

see: [zat/cbor.zig](https://tangled.sh/@zzstoatzz.io/zat/tree/main/src/internal/cbor.zig)

## encodeAlloc convenience

wrap the growable-buffer pattern into a helper:

```zig
pub fn encodeAlloc(allocator: Allocator, value: Value) ![]u8 {
    var list: std.ArrayList(u8) = .{};
    errdefer list.deinit(allocator);
    try encode(allocator, list.writer(allocator), value);
    return try list.toOwnedSlice(allocator);
}
```

caller owns the returned slice. `errdefer` ensures cleanup if encoding fails partway through.

## big-endian integers without writeInt

when writing fixed-width big-endian integers to an `anytype` writer, build the bytes manually rather than depending on `writeInt` (which may not be available on all writer types):

```zig
fn writeArgument(writer: anytype, major: u3, val: u64) !void {
    const prefix: u8 = @as(u8, major) << 5;
    if (val <= 0xffff) {
        // additional info 25: a 2-byte big-endian argument follows
        try writer.writeByte(prefix | 25);
        const v: u16 = @intCast(val);
        try writer.writeAll(&[2]u8{ @truncate(v >> 8), @truncate(v) });
    }
    // ...
}
```

`@truncate` on shifted values is the idiomatic way to extract individual bytes.

## unsigned varint (LEB128)

used by CID, CAR, and other IPLD formats for variable-length integers:

```zig
// write
pub fn writeUvarint(writer: anytype, val: u64) !void {
    var v = val;
    while (v >= 0x80) {
        try writer.writeByte(@as(u8, @truncate(v)) | 0x80);
        v >>= 7;
    }
    try writer.writeByte(@as(u8, @truncate(v)));
}

// read
fn readUvarint(data: []const u8, pos: *usize) ?u64 {
    var result: u64 = 0;
    var shift: u6 = 0;
    while (pos.* < data.len) {
        const byte = data[pos.*];
        pos.* += 1;
        result |= @as(u64, byte & 0x7f) << shift;
        if (byte & 0x80 == 0) return result;
        // shift 63 is the tenth byte; a continuation bit there means overlong input
        if (shift == 63) return null;
        shift +|= 7;
    }
    return null;
}
```

note `+|=` (saturating add) prevents overflow on the `u6` shift counter. a `u6` tops out at 63, so the overlong check tests for 63 (the shift used by the tenth byte) rather than 64, which a `u6` can never reach.
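
a worked round trip makes the byte layout concrete. the sketch below restates the varint logic over plain slices so it is self-contained; `encodeUvarint` and `decodeUvarint` are hypothetical helper names for this example, not zat's API. the value 300 is `0b10_0101100`: the low 7 bits (`0x2c`) go out first with the continuation bit set (`0xac`), followed by the remaining bits (`0x02`).

```zig
const std = @import("std");

/// Encode `val` as an unsigned LEB128 varint into `buf` and return the bytes used.
/// A u64 needs at most 10 bytes (9 * 7 = 63 bits, plus one more byte for the top bit).
fn encodeUvarint(buf: *[10]u8, val: u64) []const u8 {
    var v = val;
    var i: usize = 0;
    while (v >= 0x80) : (i += 1) {
        buf[i] = @as(u8, @truncate(v)) | 0x80; // low 7 bits plus continuation flag
        v >>= 7;
    }
    buf[i] = @as(u8, @truncate(v)); // final byte, continuation flag clear
    return buf[0 .. i + 1];
}

/// Decode a varint from the start of `data`.
/// Returns null if the input is truncated or runs past 10 bytes.
fn decodeUvarint(data: []const u8) ?u64 {
    var result: u64 = 0;
    for (data, 0..) |byte, i| {
        if (i >= 10) return null;
        const shift: u6 = @intCast(i * 7); // at most 63, so it fits in u6
        result |= @as(u64, byte & 0x7f) << shift;
        if (byte & 0x80 == 0) return result;
    }
    return null;
}

test "300 encodes to 0xac 0x02 and decodes back" {
    var buf: [10]u8 = undefined;
    const bytes = encodeUvarint(&buf, 300);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0xac, 0x02 }, bytes);
    try std.testing.expectEqual(@as(?u64, 300), decodeUvarint(bytes));
}
```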

## arena per message

for streaming protocols, create an arena per incoming message. all decoding allocations go into it, then free everything at once:

```zig
pub fn serverMessage(self: *Self, data: []const u8) !void {
    var arena = std.heap.ArenaAllocator.init(self.allocator);
    defer arena.deinit();

    const event = decodeFrame(arena.allocator(), data) catch |err| {
        log.debug("decode error: {s}", .{@errorName(err)});
        return;
    };

    self.handler.onEvent(event);
    // arena freed here; all decoded data is gone
}
```

this means the handler's `onEvent` must not hold references to event data past the call. if it needs to, it must copy into its own allocator.

see: [zat/firehose.zig](https://tangled.sh/@zzstoatzz.io/zat/tree/main/src/internal/firehose.zig), [zat/jetstream.zig](https://tangled.sh/@zzstoatzz.io/zat/tree/main/src/internal/jetstream.zig)

## specialized decoders

when generic decoding is too expensive, write a purpose-built parser for a known schema. the generic path builds `Value` unions, `MapEntry` arrays, and handles every CBOR type. if you know the exact shape, skip all that.

example: MST nodes are always `map(2) { "e": array[entries...], "l": CID|null }`.
instead of calling `cbor.decodeAll()` and then extracting fields from `Value` unions, parse the CBOR bytes directly:

```zig
pub fn decodeMstNode(allocator: Allocator, data: []const u8) MstDecodeError!MstNodeData {
    // expect map(2), key "e", array(n) — known byte sequence
    // parse entries inline, zero-copy slicing into input buffer
    // only allocation: the entries array itself
}

pub const MstNodeData = struct {
    left: ?[]const u8, // raw CID bytes (borrowed from input)
    entries: []MstEntryData, // heap-allocated array
};

pub const MstEntryData = struct {
    prefix_len: usize,
    key_suffix: []const u8, // borrowed from input
    value_cid: []const u8, // borrowed from input
    tree: ?[]const u8, // borrowed from input
};
```

the result: MST walk went from 45.5ms (generic decode per node) to 39.3ms (specialized decode) on 243k blocks. the bigger win was avoiding the full tree rebuild (218ms → 39ms total) by verifying structure during the walk.

when to use this pattern:
- you decode the same schema thousands of times (MST nodes, CBOR blocks)
- the schema is stable and well-known
- profiling shows decode time dominates

when NOT to use it:
- the schema varies or is user-defined
- you only decode a handful of times
- generic decode is fast enough

see: [zat/mst.zig decodeMstNode](https://tangled.sh/@zzstoatzz.io/zat/tree/main/src/internal/repo/mst.zig)

## deterministic encoding

DAG-CBOR requires deterministic output (same value → same bytes). the main rules:

- **shortest integer encoding**: 0-23 inline, 24-255 in 1 byte, etc.
- **map keys sorted**: by byte length first, then lexicographically
- **no floats, no indefinite lengths**

sorting map keys during encoding:

```zig
fn dagCborKeyLessThan(_: void, a: MapEntry, b: MapEntry) bool {
    if (a.key.len != b.key.len) return a.key.len < b.key.len;
    return std.mem.order(u8, a.key, b.key) == .lt;
}

// in encoder:
const sorted = try allocator.dupe(MapEntry, entries);
defer allocator.free(sorted);
std.mem.sort(MapEntry, sorted, {}, dagCborKeyLessThan);
```

the dupe + sort pattern avoids mutating the input: the caller's `entries` slice stays unchanged.
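
the length-first rule is easy to get wrong, because it differs from plain lexicographic order: `"b"` sorts before `"aa"`. a minimal sketch of that behaviour follows, assuming a simplified `MapEntry` with a `u64` value (the real type in zat may look different):

```zig
const std = @import("std");

// Hypothetical minimal entry type for illustration; the real MapEntry may carry more.
const MapEntry = struct {
    key: []const u8,
    value: u64,
};

/// DAG-CBOR key order: shorter keys first, ties broken bytewise.
fn dagCborKeyLessThan(_: void, a: MapEntry, b: MapEntry) bool {
    if (a.key.len != b.key.len) return a.key.len < b.key.len;
    return std.mem.order(u8, a.key, b.key) == .lt;
}

test "shorter keys sort first, then bytewise" {
    var entries = [_]MapEntry{
        .{ .key = "aa", .value = 1 },
        .{ .key = "b", .value = 2 },
        .{ .key = "a", .value = 3 },
    };
    std.mem.sort(MapEntry, &entries, {}, dagCborKeyLessThan);

    // plain lexicographic order would give a, aa, b
    try std.testing.expectEqualStrings("a", entries[0].key);
    try std.testing.expectEqualStrings("b", entries[1].key);
    try std.testing.expectEqualStrings("aa", entries[2].key);
}
```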