EWAH-compressed bitmaps (git-compatible)

ocaml-linkedin: apply dune fmt

Pure formatting changes from `dune fmt`: doc comment placement moves
from above the binding to below it for `type`s, multi-line `match`
expressions collapse onto one line where they fit, and infix operator
applications pick up spaces (`Soup.($?)` -> `Soup.( $? )`). No
semantic changes.

+81 -1
+73
README.md
+ # ewah
+
+ EWAH-compressed bitmaps for OCaml.
+
+ `ewah` encodes bit sets as alternating clean (all-zero or all-one run)
+ and dirty (literal 64-bit) words. Set algebra runs in time
+ proportional to the compressed size, not the bit-vector length. The
+ wire format is EWAH-64 big-endian, byte-for-byte identical to [git's
+ .bitmap index files][git-bitmap].
+
+ Reference: [Lemire, Kaser, Aouiche (2009)][paper].
+
+ [git-bitmap]: https://git-scm.com/docs/bitmap-format
+ [paper]: https://arxiv.org/abs/0901.3751
+
+ ## Installation
+
+ Install with opam:
+
+ <!-- $MDX skip -->
+ ```sh
+ $ opam install ewah
+ ```
+
+ If opam cannot find the package, it may not yet be released in the
+ public `opam-repository`. Add the overlay repository, then install
+ it:
+
+ <!-- $MDX skip -->
+ ```sh
+ $ opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
+ $ opam update
+ $ opam install ewah
+ ```
+
+ ## Usage
+
+ Build a bitmap from bit indices:
+
+ ```ocaml
+ # let b = Ewah.of_indices [3; 7; 42; 1_000] in
+   (Ewah.length b, Ewah.cardinal b, Ewah.mem b 7, Ewah.mem b 8)
+ - : int * int * bool * bool = (1001, 4, true, false)
+ ```
+
+ ### Set algebra
+
+ Union, intersection, and difference run on the compressed form:
+
+ ```ocaml
+ # let a = Ewah.of_indices [1; 2; 3; 10]
+   and b = Ewah.of_indices [2; 3; 4; 10] in
+   (Ewah.to_indices (Ewah.inter a b),
+    Ewah.to_indices (Ewah.diff a b))
+ - : int list * int list = ([2; 3; 10], [1])
+ ```
+
+ ### Serialization
+
+ The wire format is git-compatible. Bytes emitted by `to_bytes` can
+ be read by git; `of_bytes` accepts git's output:
+
+ ```ocaml
+ # let bytes = Ewah.to_bytes (Ewah.of_indices [0; 63; 64; 128]) in
+   match Ewah.of_bytes bytes with
+   | Ok b -> Ewah.to_indices b
+   | Error (`Msg m) -> failwith m
+ - : int list = [0; 63; 64; 128]
+ ```
+
+ ## Licence
+
+ ISC
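
For readers of this diff, the clean/dirty split described in the README introduction can be pictured with a toy sketch. It is not taken from the `ewah` library and does not reproduce its internal representation or the EWAH-64 wire format; the `chunk` type and the `compress` function are hypothetical names introduced here for illustration, and the sketch uses only the OCaml standard library.

```ocaml
(* Toy illustration only: classify 64-bit words as clean or dirty and
   merge adjacent clean words with the same fill bit into one run.
   This is an assumption-laden sketch, not the ewah library's code. *)

type chunk =
  | Clean of bool * int  (* fill bit, number of consecutive 64-bit words *)
  | Dirty of int64       (* one literal word, kept verbatim *)

let compress (words : int64 list) : chunk list =
  let push acc w =
    (* A word is clean when all 64 bits are equal. *)
    let clean_bit =
      if Int64.equal w Int64.zero then Some false
      else if Int64.equal w Int64.minus_one then Some true
      else None
    in
    match clean_bit, acc with
    | Some b, Clean (b', n) :: rest when b = b' -> Clean (b, n + 1) :: rest
    | Some b, _ -> Clean (b, 1) :: acc
    | None, _ -> Dirty w :: acc
  in
  List.rev (List.fold_left push [] words)

let () =
  let words = [ 0L; 0L; 0L; 5L; Int64.minus_one; Int64.minus_one; 0L ] in
  let chunks = compress words in
  assert (
    chunks
    = [ Clean (false, 3); Dirty 5L; Clean (true, 2); Clean (false, 1) ]);
  Printf.printf "%d words -> %d chunks\n" (List.length words)
    (List.length chunks)
```

Seven words collapse to four chunks here, which gives an intuition for why operations that walk chunks rather than bits can scale with the compressed size.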
+4
dune
  (env
   (dev
    (flags :standard %{dune-warnings})))
+
+ (mdx
+  (files README.md)
+  (libraries ewah fmt))
+3 -1
dune-project
  (lang dune 3.21)
+ (using mdx 0.4)
  (name ewah)
  (version 0.1.0)
  (formatting (enabled_for ocaml))
···
    (bytesrw (>= 0.1.0))
    (fmt (>= 0.9.0))
    (alcotest (and :with-test (>= 1.7.0)))
-   (alcobar :with-test)))
+   (alcobar :with-test)
+   (mdx :with-test)))
+1
ewah.opam
···
    "fmt" {>= "0.9.0"}
    "alcotest" {with-test & >= "1.7.0"}
    "alcobar" {with-test}
+   "mdx" {with-test}
    "odoc" {with-doc}
  ]
  build: [