Bloom - Bloom filters for OCaml#
Bloom filters are memory and time efficient data structures allowing probabilistic membership queries in a set.
A query negative result ensures that the element is not present in the set, while a positive result might be a false positive, i.e. the element might not be present and the Bloom filter membership query can return true anyway.
Internal parameters of the Bloom filter allow to control its false positive rate depending on the expected number of elements in it.
Installation#
Install with opam:
$ opam install bloom
If opam cannot find the package, it may not yet be released in the public
opam-repository. Add the overlay repository, then install it:
$ opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
$ opam update
$ opam install bloom
Alternatively, you can build from sources with dune build.
Usage#
Generic interface#
(* Create a Bloom filter expecting 1000 elements with 1% error rate *)
let bf = Bloom.v ~error_rate:0.01 1000
(* Add elements *)
let () = Bloom.add bf "hello"
let () = Bloom.add bf "world"
(* Query membership *)
let _ = Bloom.mem bf "hello" (* true *)
let _ = Bloom.mem bf "other" (* probably false *)
(* Estimate the number of elements *)
let _ = Bloom.size_estimate bf
Functorial interface#
The functorial interface lets you provide a custom hash function:
module My_bloom = Bloom.Make (struct
type t = string
let hash = Hashtbl.hash
end)
let my_bf = My_bloom.v ~error_rate:0.01 1000
let () = My_bloom.add my_bf "hello"
let _ = My_bloom.mem my_bf "hello" (* true *)
Set operations#
Bloom filters support lossless union and intersection:
let bf1 = Bloom.v ~error_rate:0.01 1000
let bf2 = Bloom.v ~error_rate:0.01 1000
let () = Bloom.add bf1 "a"
let () = Bloom.add bf2 "b"
let combined = Bloom.union bf1 bf2
let _ = Bloom.mem combined "a" (* true *)
let _ = Bloom.mem combined "b" (* true *)
Serialization#
Bloom filters can be serialized to and from bytes:
let bytes = Bloom.to_bytes bf
let bf' = Bloom.of_bytes bytes (* ('a t, [`Msg of string]) result *)
Tests#
Some of the tests, measuring false positive rate or size estimation, might fail
once in a while since they are randomized. They are thus removed from dune runtest alias.
To run the whole test suite, run dune build @runtest-rand instead.
Benchmarks#
Micro benchmarks are provided for v, add, mem and size_estimate
operations. Expected error rate is 0.01.
They perform OLS regression analysis using the development version of
bechamel. To reproduce them, pin
bechamel then run dune build @bench.