Pure OCaml B-tree implementation for persistent storage

docs(btree): rewrite README with design rationale and spec details

Explain the two B-tree variants (B+tree for tables, B-tree for indexes),
the SQLite format as a deliberate design choice (tooling + spec maturity),
and the tradeoffs vs COW-based engines (LMDB, sanakirja). Add overflow
formula, module table, and related work section.

+87 -76
README.md
 # btree

-Pure OCaml B-tree implementation for persistent storage.
-
-> **Note:** This library currently implements SQLite's specific B-tree file
-> format (page layout, varint encoding, record serialisation). It is not a
-> generic B-tree library. A future version may abstract the serialisation
-> scheme to support other formats.
+Pure OCaml B-tree storage engine with SQLite-compatible file format.

 ## Overview

-A B-tree implementation using SQLite's page-based storage format:
+A persistent, page-based B-tree that produces valid SQLite database files.
+You get a fast embedded storage engine whose files can be inspected and
+debugged with standard `sqlite3` tools.

-- **Table B-trees**: 64-bit integer keys with data in leaves
-- **Index B-trees**: Arbitrary keys, no data (for secondary indexes)
-- **Page-based**: Interior and leaf pages with configurable size
-- **Overflow support**: Large records span multiple pages
+Two B-tree variants following the
+[SQLite file format specification](https://www.sqlite.org/fileformat.html):

-## Installation
+- **Table B-tree** (B+tree) — data lives only in leaves, interior nodes
+  hold rowid keys and child pointers. Optimised for sequential scans.
+- **Index B-tree** — keys stored in both interior and leaf nodes. Optimised
+  for point lookups.

-```
-opam install btree
-```
+### Features

-## Usage
+- Page sizes 512 to 65536 (powers of 2)
+- Overflow pages for payloads exceeding `U - 35` bytes
+- File-backed or purely in-memory operation
+- Configurable page cache

-```ocaml
-(* Create a pager backed by a file *)
-let pager = Btree.Pager.create ~page_size:4096 file in
+### Design choices

-(* Create a table B-tree *)
-let tree = Btree.Table.create pager in
+The SQLite file format is an implementation choice, not a limitation. It
+brings free tooling (`sqlite3` CLI, DB Browser, etc.) and a
+[spec](https://www.sqlite.org/fileformat.html) with 20+ years of
+battle-testing. The user-facing API is generic — persistent ordered map
+(`Table`) and persistent ordered set (`Index`).

-(* Insert records *)
-Btree.Table.insert tree ~rowid:1L "Hello";
-Btree.Table.insert tree ~rowid:2L "World";
+What the format does **not** give you (compared to LMDB/sanakirja):

-(* Lookup *)
-let data = Btree.Table.find tree 1L in (* Some "Hello" *)
+| Feature | SQLite format | LMDB / sanakirja |
+|---------|--------------|------------------|
+| Concurrency | In-place updates + WAL/journal | Copy-on-write (lock-free readers) |
+| Range scans | Via parent traversal | Leaf sibling pointers |
+| Crash safety | Rollback journal or WAL | Atomic root pointer swap |

-(* Iterate *)
-Btree.Table.iter tree (fun rowid data ->
-  Printf.printf "%Ld: %s\n" rowid data)
-```
+These are deliberate tradeoffs — compatibility and tooling vs. raw
+throughput on concurrent workloads.

-## API
+## Installation

-### Pager
+```
+opam install btree
+```

-The pager manages page I/O and caching:
+## Usage

-- `Pager.create ~page_size file` - Create pager with given page size
-- `Pager.read pager page_num` - Read a page
-- `Pager.write pager page_num data` - Write a page
-- `Pager.allocate pager` - Allocate a new page
-- `Pager.free pager page_num` - Free a page
-- `Pager.sync pager` - Sync to disk
+```ocaml
+(* File-backed table B-tree *)
+let pager = Btree.Pager.v ~page_size:4096 file in
+let table = Btree.Table.v pager in
+Btree.Table.insert table ~rowid:1L "Hello";
+Btree.Table.insert table ~rowid:2L "World";
+Btree.Table.find table 1L (* Some "Hello" *)

-### Table B-tree
+(* In-memory index B-tree *)
+let pager = Btree.Pager.mem ~page_size:4096 () in
+let index = Btree.Index.v pager in
+Btree.Index.insert index "key";
+Btree.Index.mem index "key" (* true *)

-For rowid-keyed tables (like SQLite tables):
+(* Iterate in key order *)
+Btree.Table.iter table (fun rowid data ->
+  Printf.printf "%Ld: %s\n" rowid data)
+```

-- `Table.create pager` - Create a new table B-tree
-- `Table.open_ pager root_page` - Open existing table
-- `Table.insert tree ~rowid data` - Insert a record
-- `Table.find tree rowid` - Find by rowid
-- `Table.delete tree rowid` - Delete by rowid
-- `Table.iter tree f` - Iterate all records
+## Modules

-### Index B-tree
-
-For arbitrary keys (like SQLite indexes):
-
-- `Index.create pager` - Create a new index B-tree
-- `Index.insert tree key` - Insert a key
-- `Index.mem tree key` - Check if key exists
-- `Index.delete tree key` - Delete a key
-- `Index.iter tree f` - Iterate all keys
-
-## Page Format
+| Module | Purpose |
+|--------|---------|
+| `Pager` | Page cache and file I/O (file-backed or in-memory) |
+| `Table` | B+tree for rowid-keyed records (`int64 -> string`) |
+| `Index` | B-tree for string key sets |
+| `Page` | Page header parsing, binary helpers |
+| `Cell` | Cell encoding/decoding (table leaf, interior, index) |
+| `Record` | SQLite record format (serial types, column values) |
+| `Varint` | Variable-length integer encoding |

-Following SQLite's B-tree page format:
+## File format

-### Page Header
+### Page header (8 bytes leaf, 12 bytes interior)

 | Offset | Size | Description |
 |--------|------|-------------|
-| 0 | 1 | Page type (0x02, 0x05, 0x0a, 0x0d) |
-| 1 | 2 | First freeblock offset |
-| 3 | 2 | Number of cells |
-| 5 | 2 | Cell content area start |
-| 7 | 1 | Fragmented free bytes |
-| 8 | 4 | Right-most child (interior only) |
+| 0 | 1 | Page type: `0x0d` leaf table, `0x05` interior table, `0x0a` leaf index, `0x02` interior index |
+| 1 | 2 | First freeblock offset (0 if none) |
+| 3 | 2 | Cell count |
+| 5 | 2 | Cell content area start (0 = 65536) |
+| 7 | 1 | Fragmented free bytes (max 60) |
+| 8 | 4 | Right-most child pointer (interior pages only) |

-### Page Types
+### Overflow

-- `0x02` - Interior index page
-- `0x05` - Interior table page
-- `0x0a` - Leaf index page
-- `0x0d` - Leaf table page
+When a cell payload exceeds `X = U - 35` bytes (where `U` is usable page
+size), excess data is stored in a chain of overflow pages. Each overflow
+page has a 4-byte next-page pointer followed by up to `U - 4` bytes of
+data. The local payload size is computed per the SQLite spec:

-## Related Work
+```
+M = ((U - 12) * 32 / 255) - 23
+K = M + ((P - M) mod (U - 4))
+local = if K <= X then K else M
+```

-- [SQLite file format](https://www.sqlite.org/fileformat.html) - The specification this library implements
-- [Limbo](https://github.com/tursodatabase/limbo) - Rust SQLite implementation
-- [ocaml-btree (ctk21)](https://github.com/ctk21/ocaml-btree) - Generic in-memory B-tree (different use case)
+## Related work
+
+- [SQLite file format](https://www.sqlite.org/fileformat.html) — the specification this library implements
+- [ocaml-sqlite](../ocaml-sqlite/) — database layer built on top of this library (KV API, named tables, schema)
+- [LMDB](http://www.lmdb.tech/doc/) — C B+tree with memory-mapped COW (different tradeoffs)
+- [sanakirja](https://docs.rs/sanakirja/) — Rust COW B-tree, used by Pijul
+- [bbolt](https://github.com/etcd-io/bbolt) — Go B+tree, used by etcd
+- [Limbo](https://github.com/tursodatabase/limbo) — Rust SQLite reimplementation

 ## Licence

-MIT License. See [LICENSE.md](LICENSE.md) for details.
+ISC. See [LICENSE.md](LICENSE.md) for details.
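
The overflow rule added to the README can be checked with a few lines of OCaml. The following is a minimal sketch, not the library's code: `local_payload_size` and its labelled arguments are hypothetical names, and it assumes a table leaf cell (threshold `X = U - 35`), which is the only threshold the README states. It also spells out a step the README formula leaves implicit: a payload of at most `X` bytes is stored entirely on the page.

```ocaml
(* Hypothetical helper, not part of the library's API.
   [usable] is U, the usable page size in bytes.
   [payload] is P, the total cell payload size in bytes.
   Returns the number of payload bytes kept on the B-tree page itself;
   the remaining [payload - result] bytes go to overflow pages. *)
let local_payload_size ~usable ~payload =
  (* X: largest payload that fits without overflow (table leaf cells). *)
  let x = usable - 35 in
  if payload <= x then payload
  else
    (* M: minimum number of bytes kept locally once overflow is needed. *)
    let m = ((usable - 12) * 32 / 255) - 23 in
    (* K: local size that leaves the overflow chain as full as possible. *)
    let k = m + ((payload - m) mod (usable - 4)) in
    if k <= x then k else m

let () =
  let usable = 4096 and payload = 5000 in
  let local = local_payload_size ~usable ~payload in
  Printf.printf "local=%d overflow=%d\n" local (payload - local)
```

With a 4096-byte usable page size, a 5000-byte payload keeps 908 bytes on the page and spills 4092 bytes, which is exactly one full overflow page (`U - 4`).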