EWAH-compressed bitmaps (git-compatible)
0
fork

Configure Feed

Select the types of activity you want to include in your feed.

OCaml 86.1%
Dune 3.2%
Perl 2.1%
Shell 1.8%
Other 6.9%
7 1 0

Clone this repository

https://tangled.org/gazagnaire.org/ocaml-ewah https://tangled.org/did:plc:jhift2vwcxhou52p3sewcrpx/ocaml-ewah
git@git.recoil.org:gazagnaire.org/ocaml-ewah git@git.recoil.org:did:plc:jhift2vwcxhou52p3sewcrpx/ocaml-ewah

For self-hosted knots, clone URLs may differ based on your setup.

Download tar.gz
README.md

ewah#

EWAH-compressed bitmaps for OCaml.

ewah encodes bit sets as alternating clean (all-zero or all-one run) and dirty (literal 64-bit) words. Set algebra runs in time proportional to the compressed size, not the bit-vector length. The wire format is EWAH-64 big-endian, byte-for-byte identical to git's .bitmap index files.

Reference: Lemire, Kaser, Aouiche (2009).

Installation#

Install with opam:

$ opam install ewah

If opam cannot find the package, it may not yet be released in the public opam-repository. Add the overlay repository, then install it:

$ opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
$ opam update
$ opam install ewah

Usage#

Build a bitmap from bit indices:

# let b = Ewah.of_indices [3; 7; 42; 1_000] in
  (Ewah.length b, Ewah.cardinal b, Ewah.mem b 7, Ewah.mem b 8)
- : int * int * bool * bool = (1001, 4, true, false)

Set algebra#

Union, intersection, and difference run on the compressed form:

# let a = Ewah.of_indices [1; 2; 3; 10]
  and b = Ewah.of_indices [2; 3; 4; 10] in
  (Ewah.to_indices (Ewah.inter a b),
   Ewah.to_indices (Ewah.diff a b))
- : int list * int list = ([2; 3; 10], [1])

Serialization#

The wire format is git-compatible. Bytes emitted by to_bytes can be read by git; of_bytes accepts git's output:

# let bytes = Ewah.to_bytes (Ewah.of_indices [0; 63; 64; 128]) in
  match Ewah.of_bytes bytes with
  | Ok b -> Ewah.to_indices b
  | Error (`Msg m) -> failwith m
- : int list = [0; 63; 64; 128]

Licence#

ISC