leb128
LEB128 (Little-Endian Base 128) variable-length integer codec for OCaml.
LEB128 encodes a non-negative integer as a sequence of 7-bit groups, least-significant group first, one group per byte. The high bit of each byte is a continuation flag: it is set on every byte except the last. Signed values are first zig-zag encoded so that small magnitudes (positive or negative) stay small.
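The encoding described above can be sketched in a few lines of plain OCaml. This is an illustration only, not the package's implementation: the names `add_groups`, `zigzag`, `unzigzag`, `encode_unsigned` and `encode_signed` are hypothetical, and the sketch uses a `Buffer` for brevity where the library writes into `Bytes`.

```ocaml
(* Illustrative sketch only: these helpers are NOT the leb128 package's API. *)

(* Append [n], read as an unsigned bit pattern, to [b] as 7-bit groups,
   low group first. Every byte except the last gets its high bit set. *)
let add_groups b n =
  let rec go n =
    let low = n land 0x7f in
    let rest = n lsr 7 in
    if rest = 0 then Buffer.add_char b (Char.chr low)
    else begin
      Buffer.add_char b (Char.chr (low lor 0x80));
      go rest
    end
  in
  go n

(* Zig-zag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... *)
let zigzag n = (n lsl 1) lxor (n asr (Sys.int_size - 1))
let unzigzag z = (z lsr 1) lxor (-(z land 1))

(* Unsigned encoding of a non-negative OCaml int. *)
let encode_unsigned n =
  if n < 0 then invalid_arg "encode_unsigned: negative input";
  let b = Buffer.create 10 in
  add_groups b n;
  Buffer.contents b

(* Signed encoding: zig-zag first, then the same group layout. *)
let encode_signed n =
  let b = Buffer.create 10 in
  add_groups b (zigzag n);
  Buffer.contents b
```

With these definitions, `encode_unsigned 300` gives the two bytes `ac 02`: the low seven bits of 300 are `0x2c`, sent with the continuation bit as `0xac`, followed by the remaining bits `0x02`. `encode_signed (-1)` gives the single byte `01`.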
Used by:
- DWARF debug info: unsigned and signed LEB128 throughout
- WebAssembly: all integer immediates
- Protocol Buffers: called "varint", with zig-zag for sint32/sint64
- IPLD CAR files: block length prefixes
- Git pack-delta headers: source and target sizes
Installation
Install with opam:
$ opam install leb128
If opam cannot find the package, it may not yet be released in the public
opam-repository. Add the overlay repository, then install it:
$ opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
$ opam update
$ opam install leb128
Usage
(* Bytes-based core: fast path, no allocation for the value. *)
let buf = Bytes.create 10
let n = Leb128.encode_u64 300L buf 0 (* n = 2, buf.[0..1] = ac 02 *)
let v, n = Leb128.decode_u64 buf 0 (* v = 300L, n = 2 *)
(* OCaml-int fast path (no int64 boxing). *)
let n = Leb128.encode_u63 150 buf 0
let id, n = Leb128.decode_u63 buf 0 (* id = 150 *)
(* Zig-zag signed. *)
let n = Leb128.encode_int (-1) buf 0 (* encodes as 0x01 *)
let v, n = Leb128.decode_int buf 0 (* v = -1 *)
(* String convenience: reads directly from a string, no copy. *)
let v, n = Leb128.decode_u63_string "\xac\x02" 0
(* Streaming over Bytesrw. *)
let r = Bytesrw.Bytes.Reader.of_string "\xac\x02"
let v = Leb128.read_u63 r (* v = 300 *)
API shape
| Variant | Type | Width | Signed |
|---|---|---|---|
| u64 | int64 | full 64 bits | no |
| u63 | int | OCaml int | no |
| i64 | int64 | zig-zag | yes |
| int | int | zig-zag | yes |
Each variant has decode_*, encode_*, and size_*. Most have a _string
form for non-allocating reads from string, and streaming read_* / write_*
over Bytesrw.Bytes.Reader.t / .Writer.t.
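A size function is useful for sizing a buffer before encoding. The sketch below shows one way to compute the encoded length of a non-negative OCaml int. It assumes that size_* reports a byte count; `uleb128_size` is a hypothetical name and not the package's own function.

```ocaml
(* Illustrative sketch only: NOT the leb128 package's size_u63. *)

(* Number of bytes the unsigned LEB128 encoding of [n] occupies:
   one byte per started group of 7 bits, and at least one byte. *)
let uleb128_size n =
  if n < 0 then invalid_arg "uleb128_size: negative input";
  let rec go n acc = if n < 0x80 then acc else go (n lsr 7) (acc + 1) in
  go n 1
```

For example, values up to 127 take one byte, and 300 takes two.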
Decoders raise Invalid_argument on over-long encodings, truncated input,
or values outside the target type's range. Encoders never emit over-long
forms.
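The three failure cases can be made concrete with a strict decoder written against the standard library only. This is a sketch under assumptions, not the package's code: `decode_uleb128` is a hypothetical name, the error messages are invented, and the second component of the result is assumed to be the number of bytes consumed.

```ocaml
(* Illustrative sketch only: NOT the leb128 package's decoder. *)

(* Bits available in a non-negative OCaml int (62 on 64-bit systems). *)
let value_bits = Sys.int_size - 1

(* Decode an unsigned LEB128 value from [s] starting at [pos].
   Returns (value, bytes consumed); raises Invalid_argument on
   truncated input, out-of-range values, and over-long encodings. *)
let decode_uleb128 s pos =
  let len = String.length s in
  let rec go acc shift i =
    if i >= len then invalid_arg "decode_uleb128: truncated input";
    let byte = Char.code (String.get s i) in
    let group = byte land 0x7f in
    (* The group must fit entirely below bit [value_bits]. *)
    if shift >= value_bits || group lsr (value_bits - shift) <> 0 then
      invalid_arg "decode_uleb128: value out of range";
    let acc = acc lor (group lsl shift) in
    if byte land 0x80 <> 0 then go acc (shift + 7) (i + 1)
    else if group = 0 && i > pos then
      (* A zero final group after the first byte adds nothing. *)
      invalid_arg "decode_uleb128: over-long encoding"
    else (acc, i + 1 - pos)
  in
  if pos < 0 || pos > len then invalid_arg "decode_uleb128: bad position";
  go 0 0 pos
```

Under these assumptions, `decode_uleb128 "\xac\x02" 0` returns `(300, 2)`, while `"\xac"` (truncated) and `"\x80\x00"` (over-long form of zero) both raise `Invalid_argument`.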