Bloom filter for probabilistic membership testing
0
fork

Configure Feed

Select the types of activity you want to include in your feed.

Squashed 'ocaml-bloom/' content from commit 368ca5fc git-subtree-split: 368ca5fca912121963db23d77db46495ae53ac30

+652
+8
.gitignore
··· 1 + _bench/ 2 + _build/ 3 + _metrics/ 4 + _opam/ 5 + _tests/ 6 + *.install 7 + *.merlin 8 + **/.DS_Store
+4
.ocamlformat
··· 1 + version = 0.19.0 2 + parse-docstrings = true 3 + break-infix = fit-or-vertical 4 + module-item-spacing = compact
+12
CHANGES.md
··· 1 + # unreleased 2 + 3 + - Added `copy` 4 + 5 + # v0.2.0 2021-02-10 6 + 7 + - Added `to_bytes` and `of_bytes`. 8 + - Added `union` and `inter`. 9 + 10 + # v0.1.0 2019-03-07 11 + 12 + - Initial release
+21
LICENSE
··· 1 + The MIT License 2 + 3 + Copyright (c) 2019 Clément Pascutto 4 + 5 + Permission is hereby granted, free of charge, to any person obtaining a copy 6 + of this software and associated documentation files (the "Software"), to deal 7 + in the Software without restriction, including without limitation the rights 8 + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 + copies of the Software, and to permit persons to whom the Software is 10 + furnished to do so, subject to the following conditions: 11 + 12 + The above copyright notice and this permission notice shall be included in 13 + all copies or substantial portions of the Software. 14 + 15 + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 + THE SOFTWARE.
+16
Makefile
··· 1 + .PHONY: all test bench doc clean 2 + 3 + all: 4 + dune build 5 + 6 + test: 7 + dune runtest 8 + 9 + bench: 10 + dune build @bench 11 + 12 + clean: 13 + dune clean 14 + 15 + doc: 16 + dune build @doc
+35
README.md
··· 1 + # Bloomf - Efficient Bloom filters for OCaml [![OCaml-CI Build Status](https://img.shields.io/endpoint?url=https%3A%2F%2Fci.ocamllabs.io%2Fbadge%2Fmirage%2Fbloomf%2Fmaster&logo=ocaml)](https://ci.ocamllabs.io/github/mirage/bloomf) 2 + Bloom filters are memory and time efficient data structures allowing 3 + probabilistic membership queries in a set. 4 + 5 + A query negative result ensures that the element is not present in the set, 6 + while a positive result might be a false positive, i.e. the element might not be 7 + present and the BF membership query can return true anyway. 8 + 9 + Internal parameters of the BF allow to control its false positive rate depending 10 + on the expected number of elements in it. 11 + 12 + Online documentation is available [here](https://mirage.github.io/bloomf/). 13 + 14 + ## Install 15 + 16 + The latest version of `bloomf` is available on opam with `opam install bloomf`. 17 + 18 + Alternatively, you can build from sources with `make` or `dune build`. 19 + 20 + ## Tests 21 + 22 + Some of the tests, measuring false positive rate or size estimation, might fail 23 + once in a while since they are randomized. They are thus removed from `dune 24 + runtest` alias. 25 + 26 + To run the whole test suite, run `dune build @runtest-rand` instead. 27 + 28 + ## Benchmarks 29 + 30 + Micro benchmarks are provided for `create`, `add`, `mem` and `size_estimate` 31 + operations. Expected error rate is 0.01. 32 + 33 + They preform OLS regression analysis using the development version of 34 + [bechamel](https://github.com/dinosaure/bechamel). To reproduce them, pin 35 + `bechamel` then run `dune build @bench`.
+32
bloom.opam
··· 1 + # This file is generated by dune, edit dune-project instead 2 + opam-version: "2.0" 3 + synopsis: "Bloom filters for OCaml" 4 + description: 5 + "Bloom filters are probabilistic data structures for membership testing. A positive result means the element may be in the set, while a negative result means it is definitely not." 6 + maintainer: ["Thomas Gazagnaire"] 7 + authors: ["Clément Pascutto" "Thomas Gazagnaire"] 8 + homepage: "https://tangled.org/gazagnaire.org/ocaml-bloom" 9 + bug-reports: "https://tangled.org/gazagnaire.org/ocaml-bloom/issues" 10 + depends: [ 11 + "dune" {>= "3.0"} 12 + "ocaml" {>= "5.1"} 13 + "bitv" {>= "1.4"} 14 + "alcotest" {>= "1.0" & with-test} 15 + "crowbar" {>= "0.2" & with-test} 16 + "odoc" {with-doc} 17 + ] 18 + build: [ 19 + ["dune" "subst"] {dev} 20 + [ 21 + "dune" 22 + "build" 23 + "-p" 24 + name 25 + "-j" 26 + jobs 27 + "@install" 28 + "@runtest" {with-test} 29 + "@doc" {with-doc} 30 + ] 31 + ] 32 + dev-repo: "https://tangled.org/gazagnaire.org/ocaml-bloom"
+22
dune-project
··· 1 + (lang dune 3.0) 2 + (name bloom) 3 + 4 + (generate_opam_files true) 5 + (implicit_transitive_deps false) 6 + 7 + (source (uri https://tangled.org/gazagnaire.org/ocaml-bloom)) 8 + (bug_reports https://tangled.org/gazagnaire.org/ocaml-bloom/issues) 9 + (homepage https://tangled.org/gazagnaire.org/ocaml-bloom) 10 + 11 + (maintainers "Thomas Gazagnaire") 12 + (authors "Clément Pascutto" "Thomas Gazagnaire") 13 + 14 + (package 15 + (name bloom) 16 + (synopsis "Bloom filters for OCaml") 17 + (description "Bloom filters are probabilistic data structures for membership testing. A positive result means the element may be in the set, while a negative result means it is definitely not.") 18 + (depends 19 + (ocaml (>= 5.1)) 20 + (bitv (>= 1.4)) 21 + (alcotest (and (>= 1.0) :with-test)) 22 + (crowbar (and (>= 0.2) :with-test))))
+20
fuzz/dune
··· 1 + (executable 2 + (name fuzz) 3 + (modules fuzz fuzz_common fuzz_bloom) 4 + (libraries bloom crowbar)) 5 + 6 + ; Quick check with Crowbar (no AFL instrumentation) 7 + (rule 8 + (alias fuzz) 9 + (deps fuzz.exe) 10 + (action 11 + (run %{exe:fuzz.exe}))) 12 + 13 + ; AFL-instrumented build target (use with --profile=afl) 14 + (rule 15 + (alias fuzz-afl) 16 + (deps 17 + (source_tree input) 18 + fuzz.exe) 19 + (action 20 + (echo "AFL fuzzer built: %{exe:fuzz.exe}\n")))
+5
fuzz/fuzz.ml
··· 1 + (** Main fuzz test entry point. *) 2 + 3 + let () = 4 + Fuzz_common.run (); 5 + Fuzz_bloom.run ()
+108
fuzz/fuzz_bloom.ml
··· 1 + (** Fuzz tests for Bloom filter module. *) 2 + 3 + open Crowbar 4 + 5 + (** No false negatives - after adding an element, mem must return true. *) 6 + let test_no_false_negatives seed elts = 7 + let size = max 100 (String.length seed * 10) in 8 + let bf = Bloom.create ~error_rate:0.01 size in 9 + List.iter (fun e -> Bloom.add bf e) elts; 10 + List.iter 11 + (fun e -> if not (Bloom.mem bf e) then fail "false negative detected") 12 + elts 13 + 14 + (** Serialization roundtrip - valid filters must round-trip. *) 15 + let test_serialization_roundtrip elts = 16 + let bf = Bloom.create ~error_rate:0.01 1000 in 17 + List.iter (fun e -> Bloom.add bf e) elts; 18 + match Bloom.to_bytes bf |> Bloom.of_bytes with 19 + | Error _ -> fail "deserialization failed" 20 + | Ok bf2 -> 21 + List.iter 22 + (fun e -> 23 + if not (Bloom.mem bf2 e) then fail "element lost after roundtrip") 24 + elts 25 + 26 + (** Union contains all elements from both filters. *) 27 + let test_union elts1 elts2 = 28 + let bf1 = Bloom.create ~error_rate:0.01 1000 in 29 + let bf2 = Bloom.create ~error_rate:0.01 1000 in 30 + List.iter (fun e -> Bloom.add bf1 e) elts1; 31 + List.iter (fun e -> Bloom.add bf2 e) elts2; 32 + let bf_union = Bloom.union bf1 bf2 in 33 + List.iter 34 + (fun e -> 35 + if not (Bloom.mem bf_union e) then fail "element missing from union") 36 + elts1; 37 + List.iter 38 + (fun e -> 39 + if not (Bloom.mem bf_union e) then fail "element missing from union") 40 + elts2 41 + 42 + (** Copy creates independent filter - original elements in both. *) 43 + let test_copy_independence elts extra = 44 + let bf = Bloom.create ~error_rate:0.01 1000 in 45 + List.iter (fun e -> Bloom.add bf e) elts; 46 + let bf_copy = Bloom.copy bf in 47 + Bloom.add bf extra; 48 + List.iter 49 + (fun e -> 50 + if not (Bloom.mem bf e) then fail "element missing from original"; 51 + if not (Bloom.mem bf_copy e) then fail "element missing from copy") 52 + elts 53 + 54 + (** Clear empties filter - size estimate should be near 0. *) 55 + let test_clear elts = 56 + let bf = Bloom.create ~error_rate:0.01 1000 in 57 + List.iter (fun e -> Bloom.add bf e) elts; 58 + Bloom.clear bf; 59 + let estimate = Bloom.size_estimate bf in 60 + if estimate >= 10 then fail "filter not cleared properly" 61 + 62 + (** Intersection preserves common elements. *) 63 + let test_intersection elts = 64 + let bf1 = Bloom.create ~error_rate:0.01 1000 in 65 + let bf2 = Bloom.create ~error_rate:0.01 1000 in 66 + (* Add same elements to both *) 67 + List.iter (fun e -> Bloom.add bf1 e) elts; 68 + List.iter (fun e -> Bloom.add bf2 e) elts; 69 + let bf_inter = Bloom.inter bf1 bf2 in 70 + (* All elements should be in intersection *) 71 + List.iter 72 + (fun e -> 73 + if not (Bloom.mem bf_inter e) then 74 + fail "element missing from intersection") 75 + elts 76 + 77 + (** Create with invalid error_rate should fail. *) 78 + let test_invalid_error_rate () = 79 + (try 80 + let _ = Bloom.create ~error_rate:0.0 100 in 81 + fail "should reject error_rate=0" 82 + with Invalid_argument _ -> ()); 83 + (try 84 + let _ = Bloom.create ~error_rate:1.0 100 in 85 + fail "should reject error_rate=1" 86 + with Invalid_argument _ -> ()); 87 + (try 88 + let _ = Bloom.create ~error_rate:(-0.1) 100 in 89 + fail "should reject negative error_rate" 90 + with Invalid_argument _ -> ()); 91 + (try 92 + let _ = Bloom.create ~error_rate:1.5 100 in 93 + fail "should reject error_rate>1" 94 + with Invalid_argument _ -> ()) 95 + 96 + let run () = 97 + add_test ~name:"bloom: no false negatives" [ bytes; list bytes ] 98 + test_no_false_negatives; 99 + add_test ~name:"bloom: serialization roundtrip" [ list bytes ] 100 + test_serialization_roundtrip; 101 + add_test ~name:"bloom: union contains both" [ list bytes; list bytes ] 102 + test_union; 103 + add_test ~name:"bloom: copy independence" [ list bytes; bytes ] 104 + test_copy_independence; 105 + add_test ~name:"bloom: clear empties filter" [ list bytes ] test_clear; 106 + add_test ~name:"bloom: intersection preserves common" [ list bytes ] 107 + test_intersection; 108 + add_test ~name:"bloom: invalid error_rate" [ const () ] test_invalid_error_rate
+12
fuzz/fuzz_common.ml
··· 1 + (** Common utilities for fuzz tests. *) 2 + 3 + let to_bytes buf = 4 + let len = String.length buf in 5 + let b = Bytes.create len in 6 + Bytes.blit_string buf 0 b 0 len; 7 + b 8 + 9 + let truncate ?(max_len = 4096) buf = 10 + if String.length buf > max_len then String.sub buf 0 max_len else buf 11 + 12 + let run () = ()
fuzz/input/empty

This is a binary file and will not be displayed.

+1
fuzz/input/simple
··· 1 + test
+153
src/bloom.ml
··· 1 + type priv = { m : int; k : int; p_len : (int * int) list; b : Bitv.t } 2 + type 'a t = priv 3 + 4 + let copy t = { m = t.m; k = t.k; p_len = t.p_len; b = Bitv.copy t.b } 5 + let rec gcd a b = if b = 0 then a else gcd b (a mod b) 6 + 7 + let partition_lengths m k = 8 + let rec aux sum acc i = 9 + if List.length acc = k then (sum, acc) 10 + else 11 + let rec loop step = 12 + let k = i + step in 13 + let gcd_k = gcd k in 14 + if List.for_all (fun p -> gcd_k p = 1) acc then 15 + aux (sum + k) (k :: acc) (k + 1) 16 + else loop (step + 1) 17 + in 18 + loop 1 19 + in 20 + aux 0 [] (m / k) 21 + 22 + let v m k = 23 + let m, lengths = partition_lengths m k in 24 + let p_len = 25 + let rec aux acc off = function 26 + | [] -> acc 27 + | h :: t -> aux ((off, h) :: acc) (off + h) t 28 + in 29 + aux [] 0 lengths 30 + in 31 + try 32 + let b = Bitv.create m false in 33 + { m; k; p_len; b } 34 + with Invalid_argument _ -> invalid_arg "Bloomf.create" 35 + 36 + let estimate_parameters n p = 37 + let log2 = log 2. in 38 + let nf = float_of_int n in 39 + let m = ceil (-.nf *. log p /. log (2. ** log2)) in 40 + let k = ceil (log2 *. m /. nf) in 41 + (m, k) 42 + 43 + let create ?(error_rate = 0.01) n_items = 44 + let m, k = estimate_parameters n_items error_rate in 45 + if error_rate <= 0. || error_rate >= 1. then invalid_arg "Bloomf.create"; 46 + v (int_of_float m) (int_of_float k) 47 + 48 + let add_priv t hashed_data = 49 + let rec loop = function 50 + | [] -> () 51 + | (off, len) :: tl -> 52 + let loc = off + (hashed_data mod len) in 53 + let () = Bitv.unsafe_set t.b loc true in 54 + loop tl 55 + in 56 + loop t.p_len 57 + 58 + let add bf data = add_priv bf (Hashtbl.hash data) 59 + 60 + let op f bf1 bf2 = 61 + if bf1.k <> bf2.k || bf1.m <> bf2.m then 62 + invalid_arg "incompatible bloom filters"; 63 + { m = bf1.m; k = bf2.k; p_len = bf1.p_len; b = f bf1.b bf2.b } 64 + 65 + let union bf1 bf2 = op Bitv.bw_or bf1 bf2 66 + let inter bf1 bf2 = op Bitv.bw_and bf1 bf2 67 + 68 + let mem_priv t hashed_data = 69 + let rec loop = function 70 + | [] -> true 71 + | (off, len) :: tl -> 72 + let loc = off + (hashed_data mod len) in 73 + let res = Bitv.unsafe_get t.b loc in 74 + if res then loop tl else false 75 + in 76 + loop t.p_len 77 + 78 + let mem bf data = mem_priv bf (Hashtbl.hash data) 79 + let clear t = Bitv.fill t.b 0 t.m false 80 + 81 + (* Bitv.pop is really slow *) 82 + let size_estimate t = 83 + let mf = float_of_int t.m in 84 + let kf = float_of_int t.k in 85 + let xf = float_of_int (Bitv.pop t.b) in 86 + int_of_float (-.mf /. kf *. log (1. -. (xf /. mf))) 87 + 88 + (* Serialisers *) 89 + 90 + external set_64 : bytes -> int -> int64 -> unit = "%caml_string_set64u" 91 + external swap64 : int64 -> int64 = "%bswap_int64" 92 + 93 + let set_uint64 buf off v = 94 + if not Sys.big_endian then set_64 buf off (swap64 v) else set_64 buf off v 95 + 96 + (* type priv = { m : int; k : int; p_len : (int * int) list; b : Bitv.t } *) 97 + 98 + let to_bytes t = 99 + let enc_b = Bitv.to_bytes t.b in 100 + let enc_b_len = Bytes.length enc_b in 101 + let enc_p_len_len = 16 * List.length t.p_len in 102 + let len = 8 + 8 + 8 + enc_p_len_len + enc_b_len in 103 + let buf = Bytes.create len in 104 + set_uint64 buf 0 (Int64.of_int t.m); 105 + set_uint64 buf 8 (Int64.of_int t.k); 106 + set_uint64 buf 16 (Int64.of_int (List.length t.p_len)); 107 + List.iteri 108 + (fun i (i1, i2) -> 109 + set_uint64 buf (24 + (8 * (2 * i))) (Int64.of_int i1); 110 + set_uint64 buf (24 + (8 * ((2 * i) + 1))) (Int64.of_int i2)) 111 + t.p_len; 112 + Bytes.blit enc_b 0 buf (24 + enc_p_len_len) enc_b_len; 113 + buf 114 + 115 + external get_64 : bytes -> int -> int64 = "%caml_string_get64" 116 + 117 + let get_uint64 buf off = 118 + if not Sys.big_endian then swap64 (get_64 buf off) else get_64 buf off 119 + 120 + let of_bytes buf = 121 + try 122 + let m = get_uint64 buf 0 |> Int64.to_int in 123 + let k = get_uint64 buf 8 |> Int64.to_int in 124 + let p_len_len = get_uint64 buf 16 |> Int64.to_int in 125 + let p_len = 126 + List.init p_len_len (fun i -> 127 + let i1 = get_uint64 buf (24 + (8 * (2 * i))) |> Int64.to_int in 128 + let i2 = get_uint64 buf (24 + (8 * ((2 * i) + 1))) |> Int64.to_int in 129 + (i1, i2)) 130 + in 131 + let read = 24 + (16 * p_len_len) in 132 + let b = Bytes.sub buf read (Bytes.length buf - read) |> Bitv.of_bytes in 133 + Ok { m; k; p_len; b } 134 + with _ -> Error (`Msg "invalid serialisation format") 135 + 136 + module type Hashable = sig 137 + type t 138 + 139 + val hash : t -> int 140 + end 141 + 142 + module Make (H : Hashable) = struct 143 + type t = priv 144 + 145 + let create = create 146 + let copy = copy 147 + let add bf data = add_priv bf (H.hash data) 148 + let mem bf data = mem_priv bf (H.hash data) 149 + let clear = clear 150 + let size_estimate = size_estimate 151 + let to_bytes = to_bytes 152 + let of_bytes = of_bytes 153 + end
+93
src/bloom.mli
··· 1 + (** Bloom filters 2 + 3 + Bloom is an implementation of Bloom filters in OCaml. 4 + 5 + Bloom filters are memory and time efficient data structures allowing 6 + probabilistic membership queries in a set. A query negative result ensures 7 + that the element is not present in the set, while a positive result might be 8 + a false positive, i.e. the element might not be present and the BF 9 + membership query can return true anyway. Internal parameters of the BF allow 10 + to control its false positive rate depending on the expected number of 11 + elements in it. *) 12 + 13 + (** {1 Generic interface} *) 14 + 15 + type 'a t 16 + (** The type of the Bloom filter. *) 17 + 18 + val create : ?error_rate:float -> int -> 'a t 19 + (** [create ~error_rate size] creates a fresh BF for which expected false 20 + positive rate when filled with [size] elements is [error_rate]. 21 + 22 + @raise Invalid_argument 23 + if [error_rate] is not in \]0, 1\[, or [size] is negative. *) 24 + 25 + val copy : 'a t -> 'a t 26 + (** [copy t] copies the BF. The fresh returned BF is completely separated from 27 + the given one. *) 28 + 29 + val add : 'a t -> 'a -> unit 30 + (** [add t e] adds [e] to [t]. *) 31 + 32 + val mem : 'a t -> 'a -> bool 33 + (** [mem t e] is [true] if [e] is in [t]. *) 34 + 35 + val clear : 'a t -> unit 36 + (** [clear t] clears the contents of [t]. *) 37 + 38 + val union : 'a t -> 'a t -> 'a t 39 + (** [union t1 t2] computes the union of the two inputs. This operation is 40 + lossless in the sense that the resulting Bloom filter is the same as the 41 + Bloom filter created from scratch using the union of the two sets. 42 + 43 + Raises [Invalid_argument] if the two bloom filters were created with 44 + different parameters *) 45 + 46 + val inter : 'a t -> 'a t -> 'a t 47 + (** [inter t1 t2] computes the intersection of the two inputs. The false 48 + positive probability in the resulting Bloom filter is at most the 49 + false-positive probability in one of the constituent Bloom filters, but may 50 + be larger than the false positive probability in the Bloom filter created 51 + from scratch using the intersection of the two sets. 52 + 53 + Raises [Invalid_argument] if the two bloom filters were created with 54 + different parameters *) 55 + 56 + val size_estimate : 'a t -> int 57 + (** [size_estimate t] is an approximation of the number of elements stored in 58 + the bloom filter. Please note that this operation is costly (see 59 + benchmarks). *) 60 + 61 + (** {2 Serializers/Deserializers} *) 62 + 63 + val to_bytes : 'a t -> bytes 64 + val of_bytes : bytes -> ('a t, [ `Msg of string ]) result 65 + 66 + (** {1 Functorial interface} *) 67 + 68 + (** The functorial interface allows you to specify your own hash function. *) 69 + 70 + (** The input interface for [Bloom.Make]. *) 71 + module type Hashable = sig 72 + type t 73 + (** The type of the values to be stored. *) 74 + 75 + val hash : t -> int 76 + (** The hash function. {e This function must return positive integers.} 77 + Behavior is undefined otherwise. Please note that false positive rate 78 + might be affected by unevenly distributed hash functions. *) 79 + end 80 + 81 + (** The output interface for [Bloom.Make]. *) 82 + module Make (H : Hashable) : sig 83 + type t 84 + 85 + val create : ?error_rate:float -> int -> t 86 + val copy : t -> t 87 + val add : t -> H.t -> unit 88 + val mem : t -> H.t -> bool 89 + val clear : t -> unit 90 + val size_estimate : t -> int 91 + val to_bytes : t -> bytes 92 + val of_bytes : bytes -> (t, [ `Msg of string ]) result 93 + end
+4
src/dune
··· 1 + (library 2 + (public_name bloom) 3 + (name bloom) 4 + (libraries bitv))
+3
test/dune
··· 1 + (test 2 + (name main) 3 + (libraries bloom alcotest))
+102
test/main.ml
··· 1 + let () = Random.self_init () 2 + let random_char () = char_of_int (Random.int 256) 3 + let random_string n = String.init n (fun _i -> random_char ()) 4 + 5 + module StringSet = Set.Make (String) 6 + 7 + let expected_error_rate = 0.001 8 + 9 + let create_and_fill size = 10 + let bf = Bloom.create ~error_rate:expected_error_rate size in 11 + let rec loop i acc = 12 + if i = 0 then acc 13 + else 14 + let r = random_string 1024 in 15 + let () = Bloom.add bf r in 16 + loop (i - 1) (StringSet.add r acc) 17 + in 18 + let elts = loop size StringSet.empty in 19 + (bf, elts) 20 + 21 + let test_mem () = 22 + let sizes = [ 1_000; 10_000; 100_000 ] in 23 + List.iter 24 + (fun i -> 25 + let bf, elts = create_and_fill i in 26 + StringSet.iter 27 + (fun r -> Alcotest.(check bool) "mem (empty)" true (Bloom.mem bf r)) 28 + elts) 29 + sizes 30 + 31 + let test_errors () = 32 + let sizes = [ 1_000; 10_000; 100_000 ] in 33 + List.iter 34 + (fun i -> 35 + let bf, elts = create_and_fill i in 36 + let attempts = 100_000 in 37 + let rec loop i count = 38 + if i = 0 then count 39 + else 40 + let r = random_string 1024 in 41 + if StringSet.mem r elts then loop i count 42 + else loop (i - 1) (if Bloom.mem bf r then count + 1 else count) 43 + in 44 + let count = loop attempts 0 in 45 + let error_rate = float_of_int count /. float_of_int attempts in 46 + if error_rate > 1.15 *. expected_error_rate then 47 + Alcotest.failf "error_rate: expecting@\n%f, got@\n%f" 48 + expected_error_rate error_rate 49 + else ()) 50 + sizes 51 + 52 + let test_size () = 53 + let sizes = [ 1_000; 10_000; 100_000 ] in 54 + List.iter 55 + (fun i -> 56 + let bf, _ = create_and_fill i in 57 + let len = Bloom.size_estimate bf in 58 + if abs (len - i) > int_of_float (0.15 *. float_of_int i) then 59 + Alcotest.failf "size_estimate: expecting@\n%d, got@\n%d" i len) 60 + sizes 61 + 62 + let test_op msg bop sop = 63 + let sizes = [ 1_000; 10_000; 100_000 ] in 64 + List.iter 65 + (fun i -> 66 + let bf1, elts1 = create_and_fill i in 67 + let bf2, elts2 = create_and_fill i in 68 + let bf3 = bop bf1 bf2 in 69 + let elts3 = sop elts1 elts2 in 70 + StringSet.iter 71 + (fun r -> Alcotest.(check bool) msg true (Bloom.mem bf3 r)) 72 + elts3) 73 + sizes 74 + 75 + let test_union () = test_op "union" Bloom.union StringSet.union 76 + let test_inter () = test_op "intersection" Bloom.inter StringSet.inter 77 + 78 + let test_bytes () = 79 + let sizes = [ 1_000; 10_000; 100_000 ] in 80 + List.iter 81 + (fun i -> 82 + let bf1, _ = create_and_fill i in 83 + match Bloom.to_bytes bf1 |> Bloom.of_bytes with 84 + | Ok bf2 -> 85 + Alcotest.(check bool) 86 + "serialisation / deserialisation" true (bf1 = bf2) 87 + | Error _ -> Alcotest.failf "deserialisation failed") 88 + sizes 89 + 90 + let suite = 91 + [ 92 + ("Mem returns true when element was added", `Quick, test_mem); 93 + ( "False positive rate is as specified (15% error allowed)", 94 + `Slow, 95 + test_errors ); 96 + ("Size estimate is correct", `Slow, test_size); 97 + ("Union", `Quick, test_union); 98 + ("Intersection", `Quick, test_inter); 99 + ("Serialisation", `Quick, test_bytes); 100 + ] 101 + 102 + let () = Alcotest.run "Bloom" [ ("bloom", suite) ]
+1
test/main.mli
··· 1 + (* Left empty on purpose *)