Protocol Buffers codec for hand-written schemas
0
fork

Configure Feed

Select the types of activity you want to include in your feed.

wal, block, sse, zephyr: remove [mutable] from never-reassigned fields

Warning 69 (unused-field, mutable-never-assigned). Four independent
record fields were flagged as mutable but the code only mutates their
referents in place, never rebinds the record slot itself:

- ocaml-wal/lib/wal.ml: [t.file] (the Eio file resource; methods call
Eio.File.pwrite_all etc., the slot is set once at open time).
- ocaml-block/lib/block.ml: [Memory.state.data] (the backing bytes,
written via Bytes.blit_string; [Bytes.t] is already mutable).
- ocaml-sse/lib/sse.ml: [Parser.t.data_buf] (a Buffer.t, written via
Buffer.add_*; the slot never changes).
- ocaml-zephyr/lib/zephyr.ml: drop [mode : Read | Write] entirely —
set at open-time, read nowhere. The open_read / open_write
constructors already distinguish the two call shapes, so mode
tracking was redundant.

+1594
+15
LICENSE.md
··· 1 + ISC License 2 + 3 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 4 + 5 + Permission to use, copy, modify, and distribute this software for any 6 + purpose with or without fee is hereby granted, provided that the above 7 + copyright notice and this permission notice appear in all copies. 8 + 9 + THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES 10 + WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF 11 + MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR 12 + ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES 13 + WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN 14 + ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF 15 + OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+77
README.md
··· 1 + # protobuf 2 + 3 + Protocol Buffers codec for OCaml, for hand-written message schemas. 4 + 5 + Define a codec once as a value of type `'a t` and use it for both encoding 6 + and decoding. No `.proto` code generation yet — the target use case is 7 + speaking specific gRPC or protobuf protocols by hand, where the schema 8 + change cadence is low and the combinator definition is short. 9 + 10 + ## Installation 11 + 12 + ``` 13 + opam install protobuf 14 + ``` 15 + 16 + ## Usage 17 + 18 + ```ocaml 19 + type person = { name : string; age : int32; hobbies : string list } 20 + 21 + let person_codec : person Protobuf.t = 22 + let open Protobuf.Message in 23 + let* name = required 1 (fun p -> p.name) Protobuf.string in 24 + let* age = required 2 (fun p -> p.age) Protobuf.int32 in 25 + let* hobbies = repeated 3 (fun p -> p.hobbies) Protobuf.string in 26 + return { name; age; hobbies } |> finish 27 + 28 + let wire = 29 + Protobuf.encode_string person_codec 30 + { name = "Ada"; age = 36l; hobbies = [ "math"; "sewing" ] } 31 + 32 + let () = 33 + match Protobuf.decode_string person_codec wire with 34 + | Ok p -> Printf.printf "%s, %ld, %d hobbies\n" p.name p.age (List.length p.hobbies) 35 + | Error msg -> prerr_endline msg 36 + ``` 37 + 38 + ## Scalar codecs 39 + 40 + | Codec | OCaml type | Wire | 41 + |----------------|--------------|----------------------------------| 42 + | `int32` | `int32` | varint, signed | 43 + | `int64` | `int64` | varint, signed | 44 + | `uint32` | `int32` | varint, unsigned bit pattern | 45 + | `uint64` | `int64` | varint, unsigned bit pattern | 46 + | `sint32` | `int32` | varint, zig-zag | 47 + | `sint64` | `int64` | varint, zig-zag | 48 + | `fixed32` | `int32` | 4 bytes little-endian | 49 + | `fixed64` | `int64` | 8 bytes little-endian | 50 + | `sfixed32` | `int32` | 4 bytes little-endian, signed | 51 + | `sfixed64` | `int64` | 8 bytes little-endian, signed | 52 + | `float` | `float` | 4 bytes IEEE 754 | 53 + | `double` | `float` | 8 bytes IEEE 754 | 54 + | `bool` | `bool` | varint | 55 + | `string` | `string` | length-delimited UTF-8 | 56 + | `bytes` | `string` | length-delimited bytes | 57 + 58 + ## Field combinators 59 + 60 + - `required n get c` — a field at tag `n`, codec `c`. Absent decodes as the 61 + scalar's default (`0`, `""`, `false`, etc., matching proto3 semantics). 62 + - `optional n get c` — the value as `'a option`; decoded as `None` iff the 63 + tag is absent. 64 + - `repeated n get c` — multiple values (non-packed). Each occurrence of 65 + tag `n` appends to the list. 66 + - `packed n get c` — packed repeated: one length-delimited blob holds all 67 + values concatenated. Required for numeric repeated fields in proto3. 68 + - `message n get c` — a nested message (length-delimited). 69 + 70 + ## Limitations (v0.1) 71 + 72 + - No `.proto` code generation. Write codecs by hand. 73 + - No `oneof`. 74 + - No `map<K, V>`. 75 + - Unknown fields are discarded on decode (not preserved for re-encode). 76 + - `int32`/`int64` (not zig-zag) use 10-byte varints for negative values, 77 + matching the protobuf spec.
+3
dune
··· 1 + (env 2 + (dev 3 + (flags :standard %{dune-warnings})))
+29
dune-project
··· 1 + (lang dune 3.21) 2 + (name protobuf) 3 + 4 + (generate_opam_files true) 5 + 6 + (license ISC) 7 + (authors "Thomas Gazagnaire <thomas@gazagnaire.org>") 8 + (maintainers "Thomas Gazagnaire <thomas@gazagnaire.org>") 9 + (source (tangled gazagnaire.org/ocaml-protobuf)) 10 + 11 + (package 12 + (name protobuf) 13 + (synopsis "Protocol Buffers codec for hand-written schemas") 14 + (tags (org:blacksun codec.binary)) 15 + (description 16 + "Bidirectional Protocol Buffers codec for hand-written message schemas. 17 + Defines a codec once as a value of type ['a t] and uses it for both 18 + encoding and decoding. Uses cbor-style [let*] monadic combinators for 19 + message composition and [leb128] for the varint wire primitives. No 20 + [.proto] code generation yet -- the target use case is speaking 21 + specific gRPC or protobuf protocols by hand.") 22 + (depends 23 + (ocaml (>= 5.1)) 24 + (bytesrw (>= 0.2)) 25 + (leb128 (= :version)) 26 + (fmt (>= 0.9)) 27 + (odoc :with-doc) 28 + (alcotest (and (>= 1.7) :with-test)) 29 + (crowbar (and (>= 0.2) :with-test))))
+4
lib/dune
··· 1 + (library 2 + (name protobuf) 3 + (public_name protobuf) 4 + (libraries bytesrw fmt leb128))
+479
lib/protobuf.ml
··· 1 + module Wire = Wire 2 + 3 + (* A pre-parsed wire field value. The message decoder parses the byte stream 4 + into a [(tag, wire_value list)] table before running the field GADT, so 5 + tags can appear in any order on the wire. *) 6 + type wire_value = 7 + | Varint of int64 8 + | Fixed32 of int32 9 + | Fixed64 of int64 10 + | Length_delim of string 11 + 12 + let wire_value_type = function 13 + | Varint _ -> Wire.Varint 14 + | Fixed32 _ -> Wire.Fixed32 15 + | Fixed64 _ -> Wire.Fixed64 16 + | Length_delim _ -> Wire.Length_delimited 17 + 18 + (* Unified codec. [write_value] writes a full wire value (length prefix 19 + included for length-delimited types). [write_body] writes the unwrapped 20 + body (same as [write_value] for scalars; drops the length prefix for 21 + length-delimited types) — used at top level where the outer length is 22 + implicit. *) 23 + type 'a t = { 24 + wire_type : Wire.wire_type; 25 + write_value : Buffer.t -> 'a -> unit; 26 + write_body : Buffer.t -> 'a -> unit; 27 + (* Reads from a pre-parsed wire value. Raises {!Wire.Decode_error} on type 28 + mismatch. *) 29 + read_wire : wire_value -> 'a; 30 + (* Reads from raw bytes at an offset, for packed repeated decoding. Returns 31 + [(value, new_offset)]. Only meaningful for varint, fixed32, fixed64 32 + codecs; length-delimited codecs raise if asked. *) 33 + read_bytes : string -> int -> 'a * int; 34 + default : 'a; 35 + } 36 + 37 + (* -- Scalars -- *) 38 + 39 + let type_error expected got = 40 + raise 41 + (Wire.Decode_error 42 + (Fmt.str "type mismatch: expected %a, got %a" Wire.pp_wire_type expected 43 + Wire.pp_wire_type (wire_value_type got))) 44 + 45 + let varint_of = function Varint v -> v | w -> type_error Wire.Varint w 46 + let fixed32_of = function Fixed32 v -> v | w -> type_error Wire.Fixed32 w 47 + let fixed64_of = function Fixed64 v -> v | w -> type_error Wire.Fixed64 w 48 + 49 + let length_delim_of = function 50 + | Length_delim s -> s 51 + | w -> type_error Wire.Length_delimited w 52 + 53 + let int32 : int32 t = 54 + { 55 + wire_type = Varint; 56 + write_value = Wire.write_int32; 57 + write_body = Wire.write_int32; 58 + read_wire = (fun w -> Int64.to_int32 (varint_of w)); 59 + read_bytes = Wire.read_int32; 60 + default = 0l; 61 + } 62 + 63 + let int64 : int64 t = 64 + { 65 + wire_type = Varint; 66 + write_value = Wire.write_int64; 67 + write_body = Wire.write_int64; 68 + read_wire = (fun w -> varint_of w); 69 + read_bytes = Wire.read_int64; 70 + default = 0L; 71 + } 72 + 73 + let uint32 : int32 t = 74 + { 75 + wire_type = Varint; 76 + write_value = Wire.write_uint32; 77 + write_body = Wire.write_uint32; 78 + read_wire = 79 + (fun w -> Int64.to_int32 (Int64.logand (varint_of w) 0xFFFF_FFFFL)); 80 + read_bytes = Wire.read_uint32; 81 + default = 0l; 82 + } 83 + 84 + let uint64 : int64 t = 85 + { 86 + wire_type = Varint; 87 + write_value = Wire.write_uint64; 88 + write_body = Wire.write_uint64; 89 + read_wire = (fun w -> varint_of w); 90 + read_bytes = Wire.read_uint64; 91 + default = 0L; 92 + } 93 + 94 + let sint32 : int32 t = 95 + { 96 + wire_type = Varint; 97 + write_value = Wire.write_sint32; 98 + write_body = Wire.write_sint32; 99 + read_wire = 100 + (fun w -> Int64.to_int32 (Leb128.zigzag_decode_i64 (varint_of w))); 101 + read_bytes = Wire.read_sint32; 102 + default = 0l; 103 + } 104 + 105 + let sint64 : int64 t = 106 + { 107 + wire_type = Varint; 108 + write_value = Wire.write_sint64; 109 + write_body = Wire.write_sint64; 110 + read_wire = (fun w -> Leb128.zigzag_decode_i64 (varint_of w)); 111 + read_bytes = Wire.read_sint64; 112 + default = 0L; 113 + } 114 + 115 + let fixed32 : int32 t = 116 + { 117 + wire_type = Fixed32; 118 + write_value = Wire.write_fixed32; 119 + write_body = Wire.write_fixed32; 120 + read_wire = (fun w -> fixed32_of w); 121 + read_bytes = Wire.read_fixed32; 122 + default = 0l; 123 + } 124 + 125 + let fixed64 : int64 t = 126 + { 127 + wire_type = Fixed64; 128 + write_value = Wire.write_fixed64; 129 + write_body = Wire.write_fixed64; 130 + read_wire = (fun w -> fixed64_of w); 131 + read_bytes = Wire.read_fixed64; 132 + default = 0L; 133 + } 134 + 135 + let sfixed32 : int32 t = 136 + { 137 + wire_type = Fixed32; 138 + write_value = Wire.write_sfixed32; 139 + write_body = Wire.write_sfixed32; 140 + read_wire = (fun w -> fixed32_of w); 141 + read_bytes = Wire.read_sfixed32; 142 + default = 0l; 143 + } 144 + 145 + let sfixed64 : int64 t = 146 + { 147 + wire_type = Fixed64; 148 + write_value = Wire.write_sfixed64; 149 + write_body = Wire.write_sfixed64; 150 + read_wire = (fun w -> fixed64_of w); 151 + read_bytes = Wire.read_sfixed64; 152 + default = 0L; 153 + } 154 + 155 + let float : float t = 156 + { 157 + wire_type = Fixed32; 158 + write_value = Wire.write_float; 159 + write_body = Wire.write_float; 160 + read_wire = (fun w -> Int32.float_of_bits (fixed32_of w)); 161 + read_bytes = Wire.read_float; 162 + default = 0.0; 163 + } 164 + 165 + let double : float t = 166 + { 167 + wire_type = Fixed64; 168 + write_value = Wire.write_double; 169 + write_body = Wire.write_double; 170 + read_wire = (fun w -> Int64.float_of_bits (fixed64_of w)); 171 + read_bytes = Wire.read_double; 172 + default = 0.0; 173 + } 174 + 175 + let bool : bool t = 176 + { 177 + wire_type = Varint; 178 + write_value = Wire.write_bool; 179 + write_body = Wire.write_bool; 180 + read_wire = (fun w -> not (Int64.equal (varint_of w) 0L)); 181 + read_bytes = Wire.read_bool; 182 + default = false; 183 + } 184 + 185 + let not_packable _ _ = 186 + raise 187 + (Wire.Decode_error 188 + "length-delimited codec cannot be used inside a packed field") 189 + 190 + let string : string t = 191 + { 192 + wire_type = Length_delimited; 193 + write_value = Wire.write_string; 194 + write_body = Buffer.add_string; 195 + read_wire = (fun w -> length_delim_of w); 196 + read_bytes = not_packable; 197 + default = ""; 198 + } 199 + 200 + let bytes : string t = 201 + { 202 + wire_type = Length_delimited; 203 + write_value = Wire.write_bytes; 204 + write_body = Buffer.add_string; 205 + read_wire = (fun w -> length_delim_of w); 206 + read_bytes = not_packable; 207 + default = ""; 208 + } 209 + 210 + (* -- Message combinators -- 211 + 212 + The (o, a) field GADT is adapted from cbor's Obj_int. It captures a 213 + sequence of field declarations and the continuation that builds the 214 + record value. Encoding walks the GADT in declaration order (= tag 215 + order, conventionally) and emits (tag, value) per field. Decoding 216 + pre-parses the wire into a tag -> wire_value list table and then walks 217 + the same GADT, looking each field up in the table. *) 218 + 219 + module Message = struct 220 + type (_, _) field = 221 + | Return : 'a -> ('o, 'a) field 222 + | Required : { 223 + tag : int; 224 + get : 'o -> 'x; 225 + codec : 'x t; 226 + cont : 'x -> ('o, 'a) field; 227 + } 228 + -> ('o, 'a) field 229 + | Optional : { 230 + tag : int; 231 + get : 'o -> 'x option; 232 + codec : 'x t; 233 + cont : 'x option -> ('o, 'a) field; 234 + } 235 + -> ('o, 'a) field 236 + | Repeated : { 237 + tag : int; 238 + get : 'o -> 'x list; 239 + codec : 'x t; 240 + packed : bool; 241 + cont : 'x list -> ('o, 'a) field; 242 + } 243 + -> ('o, 'a) field 244 + 245 + let return v = Return v 246 + 247 + let required tag get codec = 248 + Required { tag; get; codec; cont = (fun x -> Return x) } 249 + 250 + let optional tag get codec = 251 + Optional { tag; get; codec; cont = (fun x -> Return x) } 252 + 253 + let repeated tag get codec = 254 + Repeated { tag; get; codec; packed = false; cont = (fun x -> Return x) } 255 + 256 + let packed tag get codec = 257 + Repeated { tag; get; codec; packed = true; cont = (fun x -> Return x) } 258 + 259 + let rec ( let* ) : type o a b. 260 + (o, a) field -> (a -> (o, b) field) -> (o, b) field = 261 + fun m f -> 262 + match m with 263 + | Return a -> f a 264 + | Required r -> 265 + Required 266 + { 267 + r with 268 + cont = 269 + (fun x -> 270 + let* y = r.cont x in 271 + f y); 272 + } 273 + | Optional r -> 274 + Optional 275 + { 276 + r with 277 + cont = 278 + (fun x -> 279 + let* y = r.cont x in 280 + f y); 281 + } 282 + | Repeated r -> 283 + Repeated 284 + { 285 + r with 286 + cont = 287 + (fun x -> 288 + let* y = r.cont x in 289 + f y); 290 + } 291 + 292 + (* -- Encoding -- *) 293 + 294 + let write_field buf ~tag codec v = 295 + Wire.write_tag buf ~field_number:tag ~wire_type:codec.wire_type; 296 + codec.write_value buf v 297 + 298 + let write_packed buf ~tag codec vs = 299 + (* Concatenate raw value bytes into a scratch buffer, then emit as a 300 + single length-delimited blob. *) 301 + let body = Buffer.create 64 in 302 + List.iter (codec.write_value body) vs; 303 + Wire.write_tag buf ~field_number:tag ~wire_type:Length_delimited; 304 + Leb128.add_u63_to_buffer buf (Buffer.length body); 305 + Buffer.add_buffer buf body 306 + 307 + let rec encode_fields : type o a. Buffer.t -> o -> (o, a) field -> unit = 308 + fun buf o m -> 309 + match m with 310 + | Return _ -> () 311 + | Required { tag; get; codec; cont } -> 312 + let v = get o in 313 + write_field buf ~tag codec v; 314 + encode_fields buf o (cont v) 315 + | Optional { tag; get; codec; cont } -> 316 + let v_opt = get o in 317 + (match v_opt with Some v -> write_field buf ~tag codec v | None -> ()); 318 + encode_fields buf o (cont v_opt) 319 + | Repeated { tag; get; codec; packed; cont } -> 320 + let vs = get o in 321 + (match vs with 322 + | [] -> () 323 + | _ when packed -> write_packed buf ~tag codec vs 324 + | _ -> List.iter (write_field buf ~tag codec) vs); 325 + encode_fields buf o (cont vs) 326 + 327 + (* -- Decoding helpers -- *) 328 + 329 + (* Parse the wire into a tag -> [wire_value] table. Order within each 330 + bucket reflects wire order (first to last). *) 331 + let parse_wire s start end_ : (int, wire_value list ref) Hashtbl.t = 332 + let table = Hashtbl.create 8 in 333 + let push tag v = 334 + match Hashtbl.find_opt table tag with 335 + | Some r -> r := v :: !r 336 + | None -> Hashtbl.add table tag (ref [ v ]) 337 + in 338 + let pos = ref start in 339 + while !pos < end_ do 340 + let field_number, wt, off = Wire.read_tag s !pos in 341 + pos := off; 342 + match wt with 343 + | Wire.Varint -> 344 + let v, off = Wire.read_int64 s !pos in 345 + push field_number (Varint v); 346 + pos := off 347 + | Wire.Fixed32 -> 348 + let v, off = Wire.read_fixed32 s !pos in 349 + push field_number (Fixed32 v); 350 + pos := off 351 + | Wire.Fixed64 -> 352 + let v, off = Wire.read_fixed64 s !pos in 353 + push field_number (Fixed64 v); 354 + pos := off 355 + | Wire.Length_delimited -> 356 + let v, off = Wire.read_bytes s !pos in 357 + push field_number (Length_delim v); 358 + pos := off 359 + done; 360 + if !pos <> end_ then 361 + raise 362 + (Wire.Decode_error 363 + (Fmt.str "overran message boundary: at %d, expected end %d" !pos end_)); 364 + table 365 + 366 + let take_last table tag = 367 + match Hashtbl.find_opt table tag with 368 + | None -> None 369 + | Some r -> ( match List.rev !r with [] -> None | v :: _ -> Some v) 370 + 371 + let take_all table tag = 372 + match Hashtbl.find_opt table tag with None -> [] | Some r -> List.rev !r 373 + 374 + let decode_packed_or_repeated codec values = 375 + (* For a repeated field, each element in [values] can be either a scalar 376 + wire value (non-packed) or a length-delimited blob containing the 377 + concatenation (packed). The protobuf spec requires decoders to accept 378 + both forms on the same field for compatibility. *) 379 + let acc = ref [] in 380 + List.iter 381 + (fun w -> 382 + match w with 383 + | Length_delim body when codec.wire_type <> Length_delimited -> 384 + (* Packed form: parse body as a sequence of raw values. *) 385 + let pos = ref 0 in 386 + let len = String.length body in 387 + while !pos < len do 388 + let v, off = codec.read_bytes body !pos in 389 + acc := v :: !acc; 390 + pos := off 391 + done 392 + | _ -> acc := codec.read_wire w :: !acc) 393 + values; 394 + List.rev !acc 395 + 396 + let rec decode_fields : type o a. 397 + (int, wire_value list ref) Hashtbl.t -> (o, a) field -> a = 398 + fun table m -> 399 + match m with 400 + | Return a -> a 401 + | Required { tag; codec; cont; _ } -> 402 + let v = 403 + match take_last table tag with 404 + | Some w -> codec.read_wire w 405 + | None -> codec.default 406 + in 407 + decode_fields table (cont v) 408 + | Optional { tag; codec; cont; _ } -> 409 + let v = 410 + match take_last table tag with 411 + | Some w -> Some (codec.read_wire w) 412 + | None -> None 413 + in 414 + decode_fields table (cont v) 415 + | Repeated { tag; codec; cont; _ } -> 416 + let vs = decode_packed_or_repeated codec (take_all table tag) in 417 + decode_fields table (cont vs) 418 + 419 + let finish : type o. (o, o) field -> o t = 420 + fun spec -> 421 + let encode_body buf o = encode_fields buf o spec in 422 + let decode_body s start end_ = 423 + let table = parse_wire s start end_ in 424 + decode_fields table spec 425 + in 426 + { 427 + wire_type = Length_delimited; 428 + write_value = 429 + (fun buf v -> 430 + let body = Buffer.create 64 in 431 + encode_body body v; 432 + Leb128.add_u63_to_buffer buf (Buffer.length body); 433 + Buffer.add_buffer buf body); 434 + write_body = encode_body; 435 + read_wire = 436 + (fun w -> 437 + let body = length_delim_of w in 438 + decode_body body 0 (String.length body)); 439 + read_bytes = not_packable; 440 + default = decode_body "" 0 0; 441 + (* A message with no fields populated: all scalars take their default, 442 + repeated fields are empty, optionals are [None]. *) 443 + } 444 + end 445 + 446 + (* -- Top-level encode/decode. 447 + 448 + For messages, we write just the body (no outer length prefix or tag). 449 + For length-delimited scalars (string/bytes) we also write just the body 450 + bytes. For other scalars the body IS the value bytes. *) 451 + 452 + let encode_string codec v = 453 + let buf = Buffer.create 64 in 454 + codec.write_body buf v; 455 + Buffer.contents buf 456 + 457 + let decode_string codec s = 458 + try 459 + match codec.wire_type with 460 + | Length_delimited -> 461 + (* Both messages and string/bytes decode from the whole input as 462 + their body. [read_wire] unwraps a [Length_delim] value, and both 463 + cases expect a bare body here. *) 464 + Ok (codec.read_wire (Length_delim s)) 465 + | Varint | Fixed32 | Fixed64 -> 466 + let v, off = codec.read_bytes s 0 in 467 + if off <> String.length s then 468 + Error 469 + (Fmt.str "trailing %d bytes after scalar" (String.length s - off)) 470 + else Ok v 471 + with Wire.Decode_error msg -> Error msg 472 + 473 + let encode codec w v = 474 + let s = encode_string codec v in 475 + Bytesrw.Bytes.Writer.write_string w s 476 + 477 + let decode codec r = 478 + let s = Bytesrw.Bytes.Reader.to_string r in 479 + decode_string codec s
+149
lib/protobuf.mli
··· 1 + (** Protocol Buffers codec for hand-written schemas. 2 + 3 + A codec is a value of type ['a t] that knows how to both encode and decode 4 + messages of type ['a]. Scalar codecs cover the 15 protobuf primitives; 5 + {!message} builds composite codecs from field descriptors using a cbor / 6 + jsont-style [let*] combinator pattern. 7 + 8 + {2 Example} 9 + 10 + {[ 11 + type person = { name : string; age : int32; hobbies : string list } 12 + 13 + let person : person Protobuf.t = 14 + let open Protobuf.Message in 15 + let* name = required 1 (fun p -> p.name) Protobuf.string in 16 + let* age = required 2 (fun p -> p.age) Protobuf.int32 in 17 + let* hobbies = repeated 3 (fun p -> p.hobbies) Protobuf.string in 18 + return { name; age; hobbies } |> finish 19 + 20 + let wire = 21 + Protobuf.encode_string person 22 + { name = "Ada"; age = 36l; hobbies = [ "math" ] } 23 + ]} *) 24 + 25 + module Wire = Wire 26 + (** Low-level wire primitives. Most users should not need this directly. *) 27 + 28 + (** {1 Codec type} *) 29 + 30 + type 'a t 31 + (** A codec for protobuf messages or scalar values of type ['a]. *) 32 + 33 + (** {1 Scalar codecs} 34 + 35 + Each codec maps an OCaml type to exactly one protobuf scalar type. The 36 + default value used for missing fields (proto3 semantics) is [0] for numeric 37 + types, [false] for [bool], [""] for [string] and [bytes]. *) 38 + 39 + val int32 : int32 t 40 + (** Signed 32-bit varint. Negative values are sign-extended to 64 bits and 41 + encode as 10-byte varints (matching protoc). *) 42 + 43 + val int64 : int64 t 44 + (** Signed 64-bit varint. *) 45 + 46 + val uint32 : int32 t 47 + (** Unsigned 32-bit varint. The OCaml [int32] holds the bit pattern. *) 48 + 49 + val uint64 : int64 t 50 + (** Unsigned 64-bit varint. *) 51 + 52 + val sint32 : int32 t 53 + (** Signed 32-bit varint with zig-zag encoding. Small negatives stay small. *) 54 + 55 + val sint64 : int64 t 56 + (** Signed 64-bit varint with zig-zag encoding. *) 57 + 58 + val fixed32 : int32 t 59 + (** Unsigned 4-byte little-endian integer. *) 60 + 61 + val fixed64 : int64 t 62 + (** Unsigned 8-byte little-endian integer. *) 63 + 64 + val sfixed32 : int32 t 65 + (** Signed 4-byte little-endian integer. *) 66 + 67 + val sfixed64 : int64 t 68 + (** Signed 8-byte little-endian integer. *) 69 + 70 + val float : float t 71 + (** 4-byte IEEE 754 single-precision float. *) 72 + 73 + val double : float t 74 + (** 8-byte IEEE 754 double-precision float. *) 75 + 76 + val bool : bool t 77 + (** Boolean encoded as a single-byte varint (0 or 1). Any non-zero varint 78 + decodes to [true]. *) 79 + 80 + val string : string t 81 + (** UTF-8 text string prefixed by a varint length. Decoders accept any bytes; no 82 + UTF-8 validation is performed. *) 83 + 84 + val bytes : string t 85 + (** Arbitrary byte string prefixed by a varint length. Shares wire 86 + representation with {!string}. *) 87 + 88 + (** {1 Messages} *) 89 + 90 + module Message : sig 91 + (** Build message codecs via a [let*] chain of field declarations. 92 + 93 + A [('o, 'a) field] is a partial codec: it describes how to encode and 94 + decode a message of type ['o], building up to a value of type ['a] along 95 + the way. The chain terminates with {!return} (producing the final ['o] 96 + value) and {!finish} (sealing it into an ['o t]). *) 97 + 98 + type ('o, 'a) field 99 + 100 + val ( let* ) : ('o, 'a) field -> ('a -> ('o, 'b) field) -> ('o, 'b) field 101 + (** Monadic bind for sequencing field declarations. *) 102 + 103 + val return : 'a -> ('o, 'a) field 104 + (** [return v] produces the final value of the record. *) 105 + 106 + val finish : ('o, 'o) field -> 'o t 107 + (** [finish spec] seals a message specification into a codec. *) 108 + 109 + (** {2 Field declarations} *) 110 + 111 + val required : int -> ('o -> 'a) -> 'a t -> ('o, 'a) field 112 + (** [required tag get c] declares a field at the given tag. Proto3 omits it on 113 + the wire if equal to the scalar default, but decoders accept either 114 + presence. *) 115 + 116 + val optional : int -> ('o -> 'a option) -> 'a t -> ('o, 'a option) field 117 + (** [optional tag get c] declares a field as [None] if absent, [Some v] 118 + otherwise. On encode, [None] skips the field entirely. *) 119 + 120 + val repeated : int -> ('o -> 'a list) -> 'a t -> ('o, 'a list) field 121 + (** [repeated tag get c] declares a non-packed repeated field: each element is 122 + written as a separate (tag, value) pair, and decode concatenates all 123 + occurrences of [tag] in wire order. *) 124 + 125 + val packed : int -> ('o -> 'a list) -> 'a t -> ('o, 'a list) field 126 + (** [packed tag get c] declares a packed repeated field: all elements are 127 + concatenated into a single length-delimited blob. Only valid for varint 128 + and fixed-width codecs; strings and messages cannot be packed. 129 + 130 + The decoder accepts both packed and non-packed wire forms for the same tag 131 + (required by the protobuf spec for compatibility). *) 132 + end 133 + 134 + (** {1 Encode / Decode} *) 135 + 136 + val encode_string : 'a t -> 'a -> string 137 + (** [encode_string c v] encodes [v] as a protobuf message body (no outer length 138 + prefix). *) 139 + 140 + val decode_string : 'a t -> string -> ('a, string) result 141 + (** [decode_string c s] decodes the entire input as a message body. Returns 142 + [Error msg] on malformed input; trailing garbage is an error. *) 143 + 144 + val encode : 'a t -> Bytesrw.Bytes.Writer.t -> 'a -> unit 145 + (** [encode c w v] encodes [v] and writes it to [w] as a single slice. Useful 146 + for composition with other bytesrw pipelines. *) 147 + 148 + val decode : 'a t -> Bytesrw.Bytes.Reader.t -> ('a, string) result 149 + (** [decode c r] drains [r] to end-of-data and decodes the full content. *)
+186
lib/wire.ml
··· 1 + type wire_type = Varint | Fixed64 | Length_delimited | Fixed32 2 + 3 + let pp_wire_type ppf = function 4 + | Varint -> Fmt.string ppf "varint" 5 + | Fixed64 -> Fmt.string ppf "fixed64" 6 + | Length_delimited -> Fmt.string ppf "length-delimited" 7 + | Fixed32 -> Fmt.string ppf "fixed32" 8 + 9 + let wire_type_to_int = function 10 + | Varint -> 0 11 + | Fixed64 -> 1 12 + | Length_delimited -> 2 13 + | Fixed32 -> 5 14 + 15 + let wire_type_of_int = function 16 + | 0 -> Some Varint 17 + | 1 -> Some Fixed64 18 + | 2 -> Some Length_delimited 19 + | 5 -> Some Fixed32 20 + | _ -> None 21 + 22 + exception Decode_error of string 23 + 24 + let decode_error fmt = Fmt.kstr (fun s -> raise (Decode_error s)) fmt 25 + 26 + (* -- Tags -- *) 27 + 28 + let write_tag buf ~field_number ~wire_type = 29 + let tag = (field_number lsl 3) lor wire_type_to_int wire_type in 30 + Leb128.add_u63_to_buffer buf tag 31 + 32 + let read_tag s off = 33 + let tag, off' = 34 + try 35 + let v, n = Leb128.decode_u63_string s off in 36 + (v, off + n) 37 + with Invalid_argument msg -> decode_error "tag: %s" msg 38 + in 39 + let field_number = tag lsr 3 in 40 + let wt = tag land 0x7 in 41 + if field_number = 0 then decode_error "tag: field number 0 is reserved"; 42 + match wire_type_of_int wt with 43 + | Some w -> (field_number, w, off') 44 + | None -> decode_error "tag: unsupported wire type %d" wt 45 + 46 + (* -- Low-level varint helpers keyed on int64 / int32 -- 47 + 48 + Protobuf [int32] and [int64] (non-zigzag) encode negative values as 49 + 10-byte varints by sign-extending to 64 bits first. That matches 50 + [Leb128.encode_u64] with the int64 reinterpreted as unsigned. *) 51 + 52 + let write_varint_int64 buf v = Leb128.add_u64_to_buffer buf v 53 + 54 + (* Leb128 decoders return (value, bytes_consumed); protobuf readers use the 55 + (value, new_offset) convention. *) 56 + let read_varint_int64 s off = 57 + try 58 + let scratch = Bytes.unsafe_of_string s in 59 + let v, n = Leb128.decode_u64 scratch off in 60 + (v, off + n) 61 + with Invalid_argument msg -> decode_error "varint: %s" msg 62 + 63 + let read_varint_int s off = 64 + try 65 + let v, n = Leb128.decode_u63_string s off in 66 + (v, off + n) 67 + with Invalid_argument msg -> decode_error "varint: %s" msg 68 + 69 + (* -- Scalar writes -- *) 70 + 71 + let write_int64 = write_varint_int64 72 + 73 + let write_int32 buf v = 74 + (* Sign-extend to int64 to match protoc's 10-byte encoding for negatives. *) 75 + write_varint_int64 buf (Int64.of_int32 v) 76 + 77 + let write_uint64 = write_varint_int64 78 + 79 + let write_uint32 buf v = 80 + (* Treat as unsigned 32-bit bit pattern, zero-extend to int64. *) 81 + write_varint_int64 buf (Int64.logand (Int64.of_int32 v) 0xFFFF_FFFFL) 82 + 83 + let write_sint64 buf v = write_varint_int64 buf (Leb128.zigzag_encode_i64 v) 84 + 85 + let write_sint32 buf v = 86 + write_varint_int64 buf (Leb128.zigzag_encode_i64 (Int64.of_int32 v)) 87 + 88 + let write_fixed32 buf v = 89 + let b = Bytes.create 4 in 90 + Bytes.set_int32_le b 0 v; 91 + Buffer.add_bytes buf b 92 + 93 + let write_fixed64 buf v = 94 + let b = Bytes.create 8 in 95 + Bytes.set_int64_le b 0 v; 96 + Buffer.add_bytes buf b 97 + 98 + let write_sfixed32 = write_fixed32 99 + let write_sfixed64 = write_fixed64 100 + let write_float buf v = write_fixed32 buf (Int32.bits_of_float v) 101 + let write_double buf v = write_fixed64 buf (Int64.bits_of_float v) 102 + let write_bool buf v = Buffer.add_char buf (if v then '\x01' else '\x00') 103 + 104 + let write_string buf s = 105 + Leb128.add_u63_to_buffer buf (String.length s); 106 + Buffer.add_string buf s 107 + 108 + let write_bytes = write_string 109 + 110 + (* -- Scalar reads -- *) 111 + 112 + let read_int64 = read_varint_int64 113 + 114 + let read_int32 s off = 115 + let v, n = read_varint_int64 s off in 116 + (Int64.to_int32 v, n) 117 + 118 + let read_uint64 = read_varint_int64 119 + 120 + let read_uint32 s off = 121 + let v, n = read_varint_int64 s off in 122 + (* Low 32 bits as int32 bit pattern. *) 123 + (Int64.to_int32 (Int64.logand v 0xFFFF_FFFFL), n) 124 + 125 + let read_sint64 s off = 126 + let v, n = read_varint_int64 s off in 127 + (Leb128.zigzag_decode_i64 v, n) 128 + 129 + let read_sint32 s off = 130 + let v, n = read_varint_int64 s off in 131 + (Int64.to_int32 (Leb128.zigzag_decode_i64 v), n) 132 + 133 + let require_bytes s off n = 134 + if off + n > String.length s then 135 + decode_error "truncated: need %d bytes at offset %d, have %d" n off 136 + (String.length s - off) 137 + 138 + let read_fixed32 s off = 139 + require_bytes s off 4; 140 + let b = Bytes.unsafe_of_string s in 141 + (Bytes.get_int32_le b off, off + 4) 142 + 143 + let read_fixed64 s off = 144 + require_bytes s off 8; 145 + let b = Bytes.unsafe_of_string s in 146 + (Bytes.get_int64_le b off, off + 8) 147 + 148 + let read_sfixed32 = read_fixed32 149 + let read_sfixed64 = read_fixed64 150 + 151 + let read_float s off = 152 + let v, off' = read_fixed32 s off in 153 + (Int32.float_of_bits v, off') 154 + 155 + let read_double s off = 156 + let v, off' = read_fixed64 s off in 157 + (Int64.float_of_bits v, off') 158 + 159 + let read_bool s off = 160 + let v, n = read_varint_int64 s off in 161 + (not (Int64.equal v 0L), n) 162 + 163 + let read_length_delim_body s off = 164 + let len, start = read_varint_int s off in 165 + require_bytes s start len; 166 + (String.sub s start len, start + len) 167 + 168 + let read_string = read_length_delim_body 169 + let read_bytes = read_length_delim_body 170 + 171 + (* -- Skip -- *) 172 + 173 + let skip_field s off = function 174 + | Varint -> 175 + let _, off' = read_varint_int64 s off in 176 + off' 177 + | Fixed64 -> 178 + require_bytes s off 8; 179 + off + 8 180 + | Length_delimited -> 181 + let len, start = read_varint_int s off in 182 + require_bytes s start len; 183 + start + len 184 + | Fixed32 -> 185 + require_bytes s off 4; 186 + off + 4
+99
lib/wire.mli
··· 1 + (** Protocol Buffers wire-format primitives. 2 + 3 + Maps the 15 protobuf scalar types onto the four wire types defined in the 4 + {{:https://protobuf.dev/programming-guides/encoding/}Protocol Buffers 5 + encoding guide}. Tag encoding, scalar write/read, and length-delimited 6 + framing are here; message-level field dispatch and combinator composition 7 + live in the top-level {!module:Protobuf} module. *) 8 + 9 + (** {1 Wire types} *) 10 + 11 + (** The four wire types of Protocol Buffers (wire types 3/4 for groups are 12 + deprecated and not supported). *) 13 + type wire_type = 14 + | Varint 15 + (** Wire type 0: int32, int64, uint32, uint64, sint32, sint64, bool, enum. 16 + Length-prefixed by continuation bits. *) 17 + | Fixed64 (** Wire type 1: fixed64, sfixed64, double. 8 bytes LE. *) 18 + | Length_delimited 19 + (** Wire type 2: string, bytes, embedded messages, packed repeated fields. 20 + Varint length + bytes. *) 21 + | Fixed32 (** Wire type 5: fixed32, sfixed32, float. 4 bytes LE. *) 22 + 23 + val pp_wire_type : Format.formatter -> wire_type -> unit 24 + val wire_type_to_int : wire_type -> int 25 + 26 + val wire_type_of_int : int -> wire_type option 27 + (** [wire_type_of_int n] returns [Some w] when [n] is 0, 1, 2, or 5; [None] 28 + otherwise. *) 29 + 30 + (** {1 Decode errors} 31 + 32 + Decoders raise {!Decode_error} on any malformed input. *) 33 + 34 + exception Decode_error of string 35 + 36 + (** {1 Tags} 37 + 38 + A tag is a varint encoding [(field_number lsl 3) lor wire_type]. *) 39 + 40 + val write_tag : Buffer.t -> field_number:int -> wire_type:wire_type -> unit 41 + 42 + val read_tag : string -> int -> int * wire_type * int 43 + (** [read_tag s off] returns [(field_number, wire_type, new_offset)]. Raises 44 + {!Decode_error} if the wire-type bits are not 0/1/2/5 or if the field number 45 + is 0. *) 46 + 47 + (** {1 Scalar writes} 48 + 49 + Each [write_*] appends the value bytes to the buffer without a tag. The 50 + caller is responsible for first writing the tag. *) 51 + 52 + val write_int32 : Buffer.t -> int32 -> unit 53 + val write_int64 : Buffer.t -> int64 -> unit 54 + val write_uint32 : Buffer.t -> int32 -> unit 55 + val write_uint64 : Buffer.t -> int64 -> unit 56 + val write_sint32 : Buffer.t -> int32 -> unit 57 + val write_sint64 : Buffer.t -> int64 -> unit 58 + val write_fixed32 : Buffer.t -> int32 -> unit 59 + val write_fixed64 : Buffer.t -> int64 -> unit 60 + val write_sfixed32 : Buffer.t -> int32 -> unit 61 + val write_sfixed64 : Buffer.t -> int64 -> unit 62 + val write_float : Buffer.t -> float -> unit 63 + val write_double : Buffer.t -> float -> unit 64 + val write_bool : Buffer.t -> bool -> unit 65 + 66 + val write_string : Buffer.t -> string -> unit 67 + (** Writes a varint length prefix followed by the string bytes. *) 68 + 69 + val write_bytes : Buffer.t -> string -> unit 70 + (** Alias for {!write_string}; protobuf [string] and [bytes] share the same wire 71 + representation. *) 72 + 73 + (** {1 Scalar reads} 74 + 75 + Each [read_*] decodes from [s] starting at [off] and returns 76 + [(value, new_offset)]. Raises {!Decode_error} on truncation or overflow. *) 77 + 78 + val read_int32 : string -> int -> int32 * int 79 + val read_int64 : string -> int -> int64 * int 80 + val read_uint32 : string -> int -> int32 * int 81 + val read_uint64 : string -> int -> int64 * int 82 + val read_sint32 : string -> int -> int32 * int 83 + val read_sint64 : string -> int -> int64 * int 84 + val read_fixed32 : string -> int -> int32 * int 85 + val read_fixed64 : string -> int -> int64 * int 86 + val read_sfixed32 : string -> int -> int32 * int 87 + val read_sfixed64 : string -> int -> int64 * int 88 + val read_float : string -> int -> float * int 89 + val read_double : string -> int -> float * int 90 + val read_bool : string -> int -> bool * int 91 + val read_string : string -> int -> string * int 92 + val read_bytes : string -> int -> string * int 93 + 94 + (** {1 Skipping} *) 95 + 96 + val skip_field : string -> int -> wire_type -> int 97 + (** [skip_field s off w] returns the offset past an unknown field of wire type 98 + [w] starting at [off]. Used by the message decoder to discard fields whose 99 + tag is not in the schema. *)
+45
protobuf.opam
··· 1 + # This file is generated by dune, edit dune-project instead 2 + opam-version: "2.0" 3 + synopsis: "Protocol Buffers codec for hand-written schemas" 4 + description: """ 5 + Bidirectional Protocol Buffers codec for hand-written message schemas. 6 + Defines a codec once as a value of type ['a t] and uses it for both 7 + encoding and decoding. Uses cbor-style [let*] monadic combinators for 8 + message composition and [leb128] for the varint wire primitives. No 9 + [.proto] code generation yet -- the target use case is speaking 10 + specific gRPC or protobuf protocols by hand.""" 11 + maintainer: ["Thomas Gazagnaire <thomas@gazagnaire.org>"] 12 + authors: ["Thomas Gazagnaire <thomas@gazagnaire.org>"] 13 + license: "ISC" 14 + tags: ["org:blacksun" "codec.binary"] 15 + homepage: "https://tangled.org/gazagnaire.org/ocaml-protobuf" 16 + bug-reports: "https://tangled.org/gazagnaire.org/ocaml-protobuf/issues" 17 + depends: [ 18 + "dune" {>= "3.21"} 19 + "ocaml" {>= "5.1"} 20 + "bytesrw" {>= "0.2"} 21 + "leb128" {= version} 22 + "fmt" {>= "0.9"} 23 + "odoc" {with-doc} 24 + "alcotest" {>= "1.7" & with-test} 25 + "crowbar" {>= "0.2" & with-test} 26 + ] 27 + build: [ 28 + ["dune" "subst"] {dev} 29 + [ 30 + "dune" 31 + "build" 32 + "-p" 33 + name 34 + "-j" 35 + jobs 36 + "@install" 37 + "@runtest" {with-test} 38 + "@doc" {with-doc} 39 + ] 40 + ] 41 + dev-repo: "git+https://tangled.org/gazagnaire.org/ocaml-protobuf" 42 + x-maintenance-intent: ["(latest)"] 43 + x-quality-build: "2026-04-20" 44 + x-quality-fuzz: "2026-04-20" 45 + x-quality-test: "2026-04-20"
+3
protobuf.opam.template
··· 1 + x-quality-build: "2026-04-20" 2 + x-quality-fuzz: "2026-04-20" 3 + x-quality-test: "2026-04-20"
+3
test/dune
··· 1 + (test 2 + (name test) 3 + (libraries protobuf bytesrw alcotest fmt))
+1
test/test.ml
··· 1 + let () = Alcotest.run "protobuf" [ Test_wire.suite; Test_protobuf.suite ]
+325
test/test_protobuf.ml
··· 1 + (* High-level combinator tests. 2 + 3 + The message-level spec vectors are from the Protocol Buffers encoding 4 + guide: https://protobuf.dev/programming-guides/encoding/ *) 5 + 6 + let hex s = 7 + let b = Buffer.create (2 * String.length s) in 8 + String.iter (fun c -> Buffer.add_string b (Fmt.str "%02x" (Char.code c))) s; 9 + Buffer.contents b 10 + 11 + (* --- Test 1: message Test1 { int32 a = 1; } with a = 150. --- *) 12 + 13 + type test1 = { a : int32 } 14 + 15 + let test1_codec : test1 Protobuf.t = 16 + let open Protobuf.Message in 17 + finish 18 + (let* a = required 1 (fun r -> r.a) Protobuf.int32 in 19 + return { a }) 20 + 21 + let test_spec_test1 () = 22 + let wire = Protobuf.encode_string test1_codec { a = 150l } in 23 + Alcotest.(check string) "Test1 a=150" "089601" (hex wire); 24 + match Protobuf.decode_string test1_codec wire with 25 + | Error msg -> Alcotest.fail msg 26 + | Ok r -> Alcotest.(check int32) "decoded a" 150l r.a 27 + 28 + (* --- Test 2: message Test2 { string b = 2; } with b = "testing". --- *) 29 + 30 + type test2 = { b : string } 31 + 32 + let test2_codec : test2 Protobuf.t = 33 + let open Protobuf.Message in 34 + finish 35 + (let* b = required 2 (fun r -> r.b) Protobuf.string in 36 + return { b }) 37 + 38 + let test_spec_test2 () = 39 + let wire = Protobuf.encode_string test2_codec { b = "testing" } in 40 + Alcotest.(check string) "Test2 b=testing" "120774657374696e67" (hex wire); 41 + match Protobuf.decode_string test2_codec wire with 42 + | Error msg -> Alcotest.fail msg 43 + | Ok r -> Alcotest.(check string) "decoded b" "testing" r.b 44 + 45 + (* --- Test 3: a record with every scalar type. --- *) 46 + 47 + type all_scalars = { 48 + i32 : int32; 49 + i64 : int64; 50 + u32 : int32; 51 + u64 : int64; 52 + s32 : int32; 53 + s64 : int64; 54 + f32 : int32; 55 + f64 : int64; 56 + sf32 : int32; 57 + sf64 : int64; 58 + flt : float; 59 + dbl : float; 60 + bl : bool; 61 + str : string; 62 + byts : string; 63 + } 64 + 65 + let all_scalars_codec : all_scalars Protobuf.t = 66 + let open Protobuf.Message in 67 + finish 68 + (let* i32 = required 1 (fun r -> r.i32) Protobuf.int32 in 69 + let* i64 = required 2 (fun r -> r.i64) Protobuf.int64 in 70 + let* u32 = required 3 (fun r -> r.u32) Protobuf.uint32 in 71 + let* u64 = required 4 (fun r -> r.u64) Protobuf.uint64 in 72 + let* s32 = required 5 (fun r -> r.s32) Protobuf.sint32 in 73 + let* s64 = required 6 (fun r -> r.s64) Protobuf.sint64 in 74 + let* f32 = required 7 (fun r -> r.f32) Protobuf.fixed32 in 75 + let* f64 = required 8 (fun r -> r.f64) Protobuf.fixed64 in 76 + let* sf32 = required 9 (fun r -> r.sf32) Protobuf.sfixed32 in 77 + let* sf64 = required 10 (fun r -> r.sf64) Protobuf.sfixed64 in 78 + let* flt = required 11 (fun r -> r.flt) Protobuf.float in 79 + let* dbl = required 12 (fun r -> r.dbl) Protobuf.double in 80 + let* bl = required 13 (fun r -> r.bl) Protobuf.bool in 81 + let* str = required 14 (fun r -> r.str) Protobuf.string in 82 + let* byts = required 15 (fun r -> r.byts) Protobuf.bytes in 83 + return 84 + { 85 + i32; 86 + i64; 87 + u32; 88 + u64; 89 + s32; 90 + s64; 91 + f32; 92 + f64; 93 + sf32; 94 + sf64; 95 + flt; 96 + dbl; 97 + bl; 98 + str; 99 + byts; 100 + }) 101 + 102 + let test_all_scalars_roundtrip () = 103 + let v = 104 + { 105 + i32 = -1l; 106 + i64 = -1L; 107 + u32 = 0xdeadbeefl; 108 + u64 = 0xcafef00dbaadf00dL; 109 + s32 = -3l; 110 + s64 = 1_000_000_000_000L; 111 + f32 = 0x12345678l; 112 + f64 = 0x0123456789abcdefL; 113 + sf32 = -5l; 114 + sf64 = -999L; 115 + flt = 3.14; 116 + dbl = 2.718281828459045; 117 + bl = true; 118 + str = "hello"; 119 + byts = "\x00\x01\x02\xff"; 120 + } 121 + in 122 + let wire = Protobuf.encode_string all_scalars_codec v in 123 + match Protobuf.decode_string all_scalars_codec wire with 124 + | Error msg -> Alcotest.fail msg 125 + | Ok r -> 126 + Alcotest.(check int32) "i32" v.i32 r.i32; 127 + Alcotest.(check int64) "i64" v.i64 r.i64; 128 + Alcotest.(check int32) "u32" v.u32 r.u32; 129 + Alcotest.(check int64) "u64" v.u64 r.u64; 130 + Alcotest.(check int32) "s32" v.s32 r.s32; 131 + Alcotest.(check int64) "s64" v.s64 r.s64; 132 + Alcotest.(check int32) "f32" v.f32 r.f32; 133 + Alcotest.(check int64) "f64" v.f64 r.f64; 134 + Alcotest.(check int32) "sf32" v.sf32 r.sf32; 135 + Alcotest.(check int64) "sf64" v.sf64 r.sf64; 136 + Alcotest.(check (float 1e-6)) "flt" v.flt r.flt; 137 + Alcotest.(check (float 1e-12)) "dbl" v.dbl r.dbl; 138 + Alcotest.(check bool) "bl" v.bl r.bl; 139 + Alcotest.(check string) "str" v.str r.str; 140 + Alcotest.(check string) "byts" v.byts r.byts 141 + 142 + (* --- Test 4: optional fields. --- *) 143 + 144 + type opt_msg = { name : string option; age : int32 option } 145 + 146 + let opt_codec : opt_msg Protobuf.t = 147 + let open Protobuf.Message in 148 + finish 149 + (let* name = optional 1 (fun r -> r.name) Protobuf.string in 150 + let* age = optional 2 (fun r -> r.age) Protobuf.int32 in 151 + return { name; age }) 152 + 153 + let test_optional_both_present () = 154 + let v = { name = Some "Ada"; age = Some 36l } in 155 + let wire = Protobuf.encode_string opt_codec v in 156 + match Protobuf.decode_string opt_codec wire with 157 + | Error msg -> Alcotest.fail msg 158 + | Ok r -> 159 + Alcotest.(check (option string)) "name" v.name r.name; 160 + Alcotest.(check (option int32)) "age" v.age r.age 161 + 162 + let test_optional_both_absent () = 163 + let v = { name = None; age = None } in 164 + let wire = Protobuf.encode_string opt_codec v in 165 + Alcotest.(check int) "empty wire" 0 (String.length wire); 166 + match Protobuf.decode_string opt_codec wire with 167 + | Error msg -> Alcotest.fail msg 168 + | Ok r -> 169 + Alcotest.(check (option string)) "name" None r.name; 170 + Alcotest.(check (option int32)) "age" None r.age 171 + 172 + let test_optional_partial () = 173 + let v = { name = Some "solo"; age = None } in 174 + let wire = Protobuf.encode_string opt_codec v in 175 + match Protobuf.decode_string opt_codec wire with 176 + | Error msg -> Alcotest.fail msg 177 + | Ok r -> 178 + Alcotest.(check (option string)) "name" (Some "solo") r.name; 179 + Alcotest.(check (option int32)) "age" None r.age 180 + 181 + (* --- Test 5: repeated (non-packed). --- *) 182 + 183 + type rep_str = { tags : string list } 184 + 185 + let rep_str_codec : rep_str Protobuf.t = 186 + let open Protobuf.Message in 187 + finish 188 + (let* tags = repeated 1 (fun r -> r.tags) Protobuf.string in 189 + return { tags }) 190 + 191 + let test_repeated_strings () = 192 + let v = { tags = [ "a"; "bb"; "ccc" ] } in 193 + let wire = Protobuf.encode_string rep_str_codec v in 194 + match Protobuf.decode_string rep_str_codec wire with 195 + | Error msg -> Alcotest.fail msg 196 + | Ok r -> Alcotest.(check (list string)) "tags" v.tags r.tags 197 + 198 + (* --- Test 6: packed repeated varint. --- *) 199 + 200 + type packed_ints = { nums : int32 list } 201 + 202 + let packed_codec : packed_ints Protobuf.t = 203 + let open Protobuf.Message in 204 + finish 205 + (let* nums = packed 1 (fun r -> r.nums) Protobuf.int32 in 206 + return { nums }) 207 + 208 + let test_packed_ints () = 209 + let v = { nums = [ 1l; 2l; 3l; 150l ] } in 210 + let wire = Protobuf.encode_string packed_codec v in 211 + (* Tag 1 wire type 2 (length-delim), length 5, body = 01 02 03 96 01 *) 212 + Alcotest.(check string) "packed wire" "0a050102039601" (hex wire); 213 + match Protobuf.decode_string packed_codec wire with 214 + | Error msg -> Alcotest.fail msg 215 + | Ok r -> Alcotest.(check (list int32)) "nums" v.nums r.nums 216 + 217 + let test_packed_accepts_non_packed () = 218 + (* The protobuf spec requires decoders to accept both forms for the same 219 + tag. Emit the non-packed form manually: tag 1 varint, 4 times. *) 220 + let unpacked_codec : packed_ints Protobuf.t = 221 + let open Protobuf.Message in 222 + finish 223 + (let* nums = repeated 1 (fun r -> r.nums) Protobuf.int32 in 224 + return { nums }) 225 + in 226 + let unpacked_wire = 227 + Protobuf.encode_string unpacked_codec { nums = [ 1l; 2l; 3l; 150l ] } 228 + in 229 + match Protobuf.decode_string packed_codec unpacked_wire with 230 + | Error msg -> Alcotest.fail msg 231 + | Ok r -> Alcotest.(check (list int32)) "nums" [ 1l; 2l; 3l; 150l ] r.nums 232 + 233 + (* --- Test 7: nested messages. --- *) 234 + 235 + type inner = { x : int32 } 236 + 237 + let inner_codec : inner Protobuf.t = 238 + let open Protobuf.Message in 239 + finish 240 + (let* x = required 1 (fun r -> r.x) Protobuf.int32 in 241 + return { x }) 242 + 243 + type outer = { inner : inner; label : string } 244 + 245 + let outer_codec : outer Protobuf.t = 246 + let open Protobuf.Message in 247 + finish 248 + (let* inner = required 1 (fun r -> r.inner) inner_codec in 249 + let* label = required 2 (fun r -> r.label) Protobuf.string in 250 + return { inner; label }) 251 + 252 + let test_nested () = 253 + let v = { inner = { x = 42l }; label = "hi" } in 254 + let wire = Protobuf.encode_string outer_codec v in 255 + match Protobuf.decode_string outer_codec wire with 256 + | Error msg -> Alcotest.fail msg 257 + | Ok r -> 258 + Alcotest.(check int32) "inner.x" 42l r.inner.x; 259 + Alcotest.(check string) "label" "hi" r.label 260 + 261 + (* --- Test 8: unknown fields are skipped. --- *) 262 + 263 + let test_unknown_field_skipped () = 264 + (* Encode Test1 { a = 150 } but also append a stray field at tag 99. *) 265 + let buf = Buffer.create 16 in 266 + Protobuf.Wire.write_tag buf ~field_number:1 ~wire_type:Protobuf.Wire.Varint; 267 + Protobuf.Wire.write_int32 buf 150l; 268 + Protobuf.Wire.write_tag buf ~field_number:99 269 + ~wire_type:Protobuf.Wire.Length_delimited; 270 + Protobuf.Wire.write_string buf "extra-junk"; 271 + let wire = Buffer.contents buf in 272 + match Protobuf.decode_string test1_codec wire with 273 + | Error msg -> Alcotest.fail msg 274 + | Ok r -> Alcotest.(check int32) "a decoded despite stray tag 99" 150l r.a 275 + 276 + (* --- Test 9: field out-of-order on the wire. --- *) 277 + 278 + let test_fields_out_of_order () = 279 + let buf = Buffer.create 16 in 280 + (* Emit tag 2 (string) first, then tag 1 (int32). *) 281 + Protobuf.Wire.write_tag buf ~field_number:14 282 + ~wire_type:Protobuf.Wire.Length_delimited; 283 + Protobuf.Wire.write_string buf "backwards"; 284 + Protobuf.Wire.write_tag buf ~field_number:1 ~wire_type:Protobuf.Wire.Varint; 285 + Protobuf.Wire.write_int32 buf 7l; 286 + let wire = Buffer.contents buf in 287 + match Protobuf.decode_string all_scalars_codec wire with 288 + | Error msg -> Alcotest.fail msg 289 + | Ok r -> 290 + Alcotest.(check int32) "i32 decoded" 7l r.i32; 291 + Alcotest.(check string) "str decoded" "backwards" r.str; 292 + (* All other scalars take their default. *) 293 + Alcotest.(check int64) "i64 default" 0L r.i64; 294 + Alcotest.(check bool) "bl default" false r.bl 295 + 296 + (* --- Test 10: decode rejects malformed input. --- *) 297 + 298 + let test_truncated_rejected () = 299 + (* A valid tag but a truncated varint body. *) 300 + let bad = "\x08\x80" in 301 + match Protobuf.decode_string test1_codec bad with 302 + | Error _ -> () 303 + | Ok _ -> Alcotest.fail "truncated should be rejected" 304 + 305 + let suite = 306 + ( "protobuf", 307 + [ 308 + Alcotest.test_case "spec vector Test1" `Quick test_spec_test1; 309 + Alcotest.test_case "spec vector Test2" `Quick test_spec_test2; 310 + Alcotest.test_case "all scalars roundtrip" `Quick 311 + test_all_scalars_roundtrip; 312 + Alcotest.test_case "optional both present" `Quick 313 + test_optional_both_present; 314 + Alcotest.test_case "optional both absent" `Quick test_optional_both_absent; 315 + Alcotest.test_case "optional partial" `Quick test_optional_partial; 316 + Alcotest.test_case "repeated strings" `Quick test_repeated_strings; 317 + Alcotest.test_case "packed int32" `Quick test_packed_ints; 318 + Alcotest.test_case "packed accepts non-packed wire" `Quick 319 + test_packed_accepts_non_packed; 320 + Alcotest.test_case "nested message" `Quick test_nested; 321 + Alcotest.test_case "unknown field skipped" `Quick 322 + test_unknown_field_skipped; 323 + Alcotest.test_case "out-of-order fields" `Quick test_fields_out_of_order; 324 + Alcotest.test_case "truncated rejected" `Quick test_truncated_rejected; 325 + ] )
+1
test/test_protobuf.mli
··· 1 + val suite : string * unit Alcotest.test_case list
+174
test/test_wire.ml
··· 1 + (* Tests for the wire-format primitives. 2 + 3 + Vectors are drawn from the Protocol Buffers encoding guide: 4 + https://protobuf.dev/programming-guides/encoding/ *) 5 + 6 + open Protobuf 7 + 8 + let hex s = 9 + let b = Buffer.create (2 * String.length s) in 10 + String.iter (fun c -> Buffer.add_string b (Fmt.str "%02x" (Char.code c))) s; 11 + Buffer.contents b 12 + 13 + let write_and_hex f = 14 + let buf = Buffer.create 16 in 15 + f buf; 16 + hex (Buffer.contents buf) 17 + 18 + let test_tag () = 19 + let check field_number wire_type expected = 20 + let got = 21 + write_and_hex (fun b -> Wire.write_tag b ~field_number ~wire_type) 22 + in 23 + Alcotest.(check string) 24 + (Fmt.str "tag %d/%a" field_number Wire.pp_wire_type wire_type) 25 + expected got 26 + in 27 + check 1 Wire.Varint "08"; 28 + check 2 Wire.Length_delimited "12"; 29 + check 3 Wire.Length_delimited "1a"; 30 + check 15 Wire.Varint "78"; 31 + (* tag 16 crosses into 2-byte varint *) 32 + check 16 Wire.Varint "8001" 33 + 34 + let test_tag_roundtrip () = 35 + let cases = 36 + [ 37 + (1, Wire.Varint); 38 + (2, Wire.Length_delimited); 39 + (5, Wire.Fixed32); 40 + (100, Wire.Fixed64); 41 + (65535, Wire.Varint); 42 + ] 43 + in 44 + List.iter 45 + (fun (field_number, wire_type) -> 46 + let buf = Buffer.create 4 in 47 + Wire.write_tag buf ~field_number ~wire_type; 48 + let s = Buffer.contents buf in 49 + let fn, wt, off = Wire.read_tag s 0 in 50 + Alcotest.(check int) "field number" field_number fn; 51 + Alcotest.(check bool) 52 + "wire type" true 53 + (Wire.wire_type_to_int wire_type = Wire.wire_type_to_int wt); 54 + Alcotest.(check int) "consumed" (String.length s) off) 55 + cases 56 + 57 + let test_tag_invalid () = 58 + Alcotest.check_raises "field 0 rejected" (Wire.Decode_error "") @@ fun () -> 59 + try ignore (Wire.read_tag "\x00" 0) 60 + with Wire.Decode_error _ -> raise (Wire.Decode_error "") 61 + 62 + let test_int32_roundtrip () = 63 + let check v = 64 + let buf = Buffer.create 10 in 65 + Wire.write_int32 buf v; 66 + let s = Buffer.contents buf in 67 + let v', _ = Wire.read_int32 s 0 in 68 + Alcotest.(check int32) (Fmt.str "int32 %ld" v) v v' 69 + in 70 + List.iter check 71 + [ 0l; 1l; 127l; 128l; 150l; -1l; -128l; Int32.min_int; Int32.max_int ] 72 + 73 + let test_int32_negative_is_10_bytes () = 74 + (* int32 -1 sign-extends to a 64-bit pattern, encoded as a 10-byte varint. *) 75 + let buf = Buffer.create 10 in 76 + Wire.write_int32 buf (-1l); 77 + Alcotest.(check int) "bytes used" 10 (Buffer.length buf) 78 + 79 + let test_sint32_zigzag () = 80 + (* sint32 -1 zig-zag encodes as varint 1 = 0x01. *) 81 + let buf = Buffer.create 10 in 82 + Wire.write_sint32 buf (-1l); 83 + Alcotest.(check string) "sint32 -1" "01" (hex (Buffer.contents buf)); 84 + let v, _ = Wire.read_sint32 "\x01" 0 in 85 + Alcotest.(check int32) "decodes to -1" (-1l) v 86 + 87 + let test_fixed32_le () = 88 + let buf = Buffer.create 4 in 89 + Wire.write_fixed32 buf 0x12345678l; 90 + Alcotest.(check string) "little-endian" "78563412" (hex (Buffer.contents buf)) 91 + 92 + let test_float_roundtrip () = 93 + (* Use only values that are exact in IEEE 754 single precision. *) 94 + let check v = 95 + let buf = Buffer.create 4 in 96 + Wire.write_float buf v; 97 + let s = Buffer.contents buf in 98 + let v', _ = Wire.read_float s 0 in 99 + Alcotest.(check (float 0.0)) (Fmt.str "float %g" v) v v' 100 + in 101 + List.iter check [ 0.0; 1.0; -1.0; 0.5; 0.25; 3.125; 1024.0; -16384.0 ] 102 + 103 + let test_double_roundtrip () = 104 + let check v = 105 + let buf = Buffer.create 8 in 106 + Wire.write_double buf v; 107 + let s = Buffer.contents buf in 108 + let v', _ = Wire.read_double s 0 in 109 + Alcotest.(check bool) 110 + (Fmt.str "double %g" v) true 111 + ((Float.is_nan v && Float.is_nan v') || v = v') 112 + in 113 + List.iter check [ 0.0; 1.0; -1.0; 3.141592653589793; Float.infinity ] 114 + 115 + let test_string_vector () = 116 + (* "testing" at tag 2: 0x12 0x07 "testing" *) 117 + let buf = Buffer.create 16 in 118 + Wire.write_tag buf ~field_number:2 ~wire_type:Wire.Length_delimited; 119 + Wire.write_string buf "testing"; 120 + Alcotest.(check string) 121 + "spec vector" 122 + ("1207" ^ "74657374696e67") 123 + (hex (Buffer.contents buf)) 124 + 125 + let test_string_roundtrip () = 126 + let check s = 127 + let buf = Buffer.create 16 in 128 + Wire.write_string buf s; 129 + let s' = Buffer.contents buf in 130 + let v, n = Wire.read_string s' 0 in 131 + Alcotest.(check string) ("roundtrip " ^ s) s v; 132 + Alcotest.(check int) "consumed" (String.length s') n 133 + in 134 + List.iter check [ ""; "a"; "testing"; String.make 200 'x' ] 135 + 136 + let test_skip_field () = 137 + (* Encode tag+value for a varint, skip_field should advance past it. *) 138 + let buf = Buffer.create 8 in 139 + Wire.write_tag buf ~field_number:1 ~wire_type:Wire.Varint; 140 + Wire.write_int32 buf 42l; 141 + let s = Buffer.contents buf in 142 + let _, wt, off = Wire.read_tag s 0 in 143 + let off' = Wire.skip_field s off wt in 144 + Alcotest.(check int) "skipped past value" (String.length s) off' 145 + 146 + let test_truncated_varint () = 147 + Alcotest.check_raises "truncated" (Wire.Decode_error "") @@ fun () -> 148 + try ignore (Wire.read_int32 "\x80\x80" 0) 149 + with Wire.Decode_error _ -> raise (Wire.Decode_error "") 150 + 151 + let test_truncated_fixed32 () = 152 + Alcotest.check_raises "truncated fixed32" (Wire.Decode_error "") @@ fun () -> 153 + try ignore (Wire.read_fixed32 "\x01\x02" 0) 154 + with Wire.Decode_error _ -> raise (Wire.Decode_error "") 155 + 156 + let suite = 157 + ( "wire", 158 + [ 159 + Alcotest.test_case "tag encoding" `Quick test_tag; 160 + Alcotest.test_case "tag roundtrip" `Quick test_tag_roundtrip; 161 + Alcotest.test_case "tag field=0 rejected" `Quick test_tag_invalid; 162 + Alcotest.test_case "int32 roundtrip" `Quick test_int32_roundtrip; 163 + Alcotest.test_case "int32 negative = 10 bytes" `Quick 164 + test_int32_negative_is_10_bytes; 165 + Alcotest.test_case "sint32 zig-zag" `Quick test_sint32_zigzag; 166 + Alcotest.test_case "fixed32 little-endian" `Quick test_fixed32_le; 167 + Alcotest.test_case "float roundtrip" `Quick test_float_roundtrip; 168 + Alcotest.test_case "double roundtrip" `Quick test_double_roundtrip; 169 + Alcotest.test_case "string vector" `Quick test_string_vector; 170 + Alcotest.test_case "string roundtrip" `Quick test_string_roundtrip; 171 + Alcotest.test_case "skip_field advances" `Quick test_skip_field; 172 + Alcotest.test_case "truncated varint" `Quick test_truncated_varint; 173 + Alcotest.test_case "truncated fixed32" `Quick test_truncated_fixed32; 174 + ] )
+1
test/test_wire.mli
··· 1 + val suite : string * unit Alcotest.test_case list