ewah#
EWAH-compressed bitmaps for OCaml.
ewah encodes bit sets as alternating clean (all-zero or all-one run)
and dirty (literal 64-bit) words. Set algebra runs in time
proportional to the compressed size, not the bit-vector length. The
wire format is EWAH-64 big-endian, byte-for-byte identical to git's
.bitmap index files.
Reference: Lemire, Kaser, Aouiche (2009).
Installation#
Install with opam:
$ opam install ewah
If opam cannot find the package, it may not yet be released in the
public opam-repository. Add the overlay repository, then install
it:
$ opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
$ opam update
$ opam install ewah
Usage#
Build a bitmap from bit indices:
# let b = Ewah.of_indices [3; 7; 42; 1_000] in
(Ewah.length b, Ewah.cardinal b, Ewah.mem b 7, Ewah.mem b 8)
- : int * int * bool * bool = (1001, 4, true, false)
Set algebra#
Union, intersection, and difference run on the compressed form:
# let a = Ewah.of_indices [1; 2; 3; 10]
and b = Ewah.of_indices [2; 3; 4; 10] in
(Ewah.to_indices (Ewah.inter a b),
Ewah.to_indices (Ewah.diff a b))
- : int list * int list = ([2; 3; 10], [1])
Serialization#
The wire format is git-compatible. Bytes emitted by to_bytes can
be read by git; of_bytes accepts git's output:
# let bytes = Ewah.to_bytes (Ewah.of_indices [0; 63; 64; 128]) in
match Ewah.of_bytes bytes with
| Ok b -> Ewah.to_indices b
| Error (`Msg m) -> failwith m
- : int list = [0; 63; 64; 128]
Licence#
ISC