Bloom filter for probabilistic membership testing

bloom: rewrite README — rename bloomf→bloom, add usage examples

+69 -10
README.md
@@ -1,21 +1,80 @@
-# Bloomf - Efficient Bloom filters for OCaml [![OCaml-CI Build Status](https://img.shields.io/endpoint?url=https%3A%2F%2Fci.ocamllabs.io%2Fbadge%2Fmirage%2Fbloomf%2Fmaster&logo=ocaml)](https://ci.ocamllabs.io/github/mirage/bloomf)
+# Bloom - Bloom filters for OCaml
+
 Bloom filters are memory and time efficient data structures allowing
 probabilistic membership queries in a set.

 A query negative result ensures that the element is not present in the set,
 while a positive result might be a false positive, i.e. the element might not be
-present and the BF membership query can return true anyway.
+present and the Bloom filter membership query can return true anyway.
+
+Internal parameters of the Bloom filter allow to control its false positive rate
+depending on the expected number of elements in it.
+
+## Install
+
+```
+opam install bloom
+```
+
+Alternatively, you can build from sources with `dune build`.
+
+## Usage
+
+### Generic interface
+
+```ocaml
+(* Create a Bloom filter expecting 1000 elements with 1% error rate *)
+let bf = Bloom.v ~error_rate:0.01 1000
+
+(* Add elements *)
+let () = Bloom.add bf "hello"
+let () = Bloom.add bf "world"
+
+(* Query membership *)
+let _ = Bloom.mem bf "hello" (* true *)
+let _ = Bloom.mem bf "other" (* probably false *)
+
+(* Estimate the number of elements *)
+let _ = Bloom.size_estimate bf
+```
+
+### Functorial interface
+
+The functorial interface lets you provide a custom hash function:
+
+```ocaml
+module My_bloom = Bloom.Make (struct
+  type t = string
+  let hash = Hashtbl.hash
+end)

-Internal parameters of the BF allow to control its false positive rate depending
-on the expected number of elements in it.
+let bf = My_bloom.v ~error_rate:0.01 1000
+let () = My_bloom.add bf "hello"
+let _ = My_bloom.mem bf "hello" (* true *)
+```

-Online documentation is available [here](https://mirage.github.io/bloomf/).
+### Set operations
+
+Bloom filters support lossless union and intersection:
+
+```ocaml
+let bf1 = Bloom.v ~error_rate:0.01 1000
+let bf2 = Bloom.v ~error_rate:0.01 1000
+let () = Bloom.add bf1 "a"
+let () = Bloom.add bf2 "b"
+let combined = Bloom.union bf1 bf2
+let _ = Bloom.mem combined "a" (* true *)
+let _ = Bloom.mem combined "b" (* true *)
+```

-## Install
+### Serialization

-The latest version of `bloomf` is available on opam with `opam install bloomf`.
+Bloom filters can be serialized to and from bytes:

-Alternatively, you can build from sources with `make` or `dune build`.
+```ocaml
+let bytes = Bloom.to_bytes bf
+let bf' = Bloom.of_bytes bytes (* ('a t, [`Msg of string]) result *)
+```

 ## Tests

@@ -27,9 +86,9 @@

 ## Benchmarks

-Micro benchmarks are provided for `create`, `add`, `mem` and `size_estimate`
+Micro benchmarks are provided for `v`, `add`, `mem` and `size_estimate`
 operations. Expected error rate is 0.01.

-They preform OLS regression analysis using the development version of
+They perform OLS regression analysis using the development version of
 [bechamel](https://github.com/dinosaure/bechamel). To reproduce them, pin
 `bechamel` then run `dune build @bench`.
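
The README text in this diff says that internal parameters control the false positive rate for an expected number of elements. As a rough illustration of what that usually means, here is a minimal sketch of the textbook Bloom filter sizing formulas. It is not taken from this library's source: the function names `optimal_bits` and `optimal_hashes` are hypothetical, and the library may round or size its filters differently.

```ocaml
(* Textbook sizing formulas for a Bloom filter; an illustration only,
   not this library's implementation. Names are hypothetical. *)

(* Number of bits m = -n * ln(p) / (ln 2)^2 for n expected elements
   and target false positive rate p, rounded up. *)
let optimal_bits ~error_rate n =
  let n = float_of_int n in
  int_of_float (ceil (-. n *. log error_rate /. (log 2. ** 2.)))

(* Number of hash functions k = (m / n) * ln 2, rounded to the nearest
   integer and never below 1. *)
let optimal_hashes ~bits n =
  let k = float_of_int bits /. float_of_int n *. log 2. in
  max 1 (int_of_float (Float.round k))

let () =
  (* Same parameters as the README example: 1000 elements, 1% error rate. *)
  let n = 1000 and error_rate = 0.01 in
  let bits = optimal_bits ~error_rate n in
  let hashes = optimal_hashes ~bits n in
  Printf.printf "bits=%d hashes=%d\n" bits hashes
```

Under these formulas, a lower error rate or a larger expected element count both increase the bit array size, which is why the README examples pass both values when creating a filter.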