btree#
Pure OCaml B-tree storage engine with SQLite-compatible file format.
Overview#
A persistent, page-based B-tree that produces valid SQLite database files.
You get a fast embedded storage engine whose files can be inspected and
debugged with standard sqlite3 tools.
Two B-tree variants following the SQLite file format specification:
- Table B-tree (B+tree) -- data lives only in leaves, interior nodes hold rowid keys and child pointers. Optimised for sequential scans.
- Index B-tree -- keys stored in both interior and leaf nodes. Optimised for point lookups.
Features#
- Page sizes 512 to 65536 (powers of 2)
- Overflow pages for payloads exceeding
U - 35bytes - File-backed or purely in-memory operation
- Configurable page cache
Installation#
Install with opam:
$ opam install btree
If opam cannot find the package, it may not yet be released in the public
opam-repository. Add the overlay repository, then install it:
$ opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
$ opam update
$ opam install btree
Usage#
Table: insert, find, iterate#
let demo () =
let pager = Btree.Pager.mem ~page_size:4096 () in
let table = Btree.Table.v pager in
Btree.Table.insert table ~rowid:1L "Hello";
Btree.Table.insert table ~rowid:2L "World";
(match Btree.Table.find table 1L with
| Some v -> Fmt.pr "1 -> %s@." v
| None -> ());
Btree.Table.iter table (fun rowid data ->
Printf.printf "%Ld: %s\n" rowid data)
Index: add, membership, range scan#
let index_demo () =
let pager = Btree.Pager.mem ~page_size:4096 () in
let index = Btree.Index.v pager in
Btree.Index.insert index "key";
Btree.Index.mem index "key"
File-backed#
Point the pager at an Eio file and call Pager.sync to persist. The result
is a file that sqlite3 can open:
let persist ~fs =
Eio.Path.with_open_out
~create:(`If_missing 0o644) Eio.Path.(fs / "demo.db")
(fun file ->
let pager = Btree.Pager.v ~page_size:4096 file in
let table = Btree.Table.v pager in
Btree.Table.insert table ~rowid:1L "persisted";
Btree.Pager.sync pager)
Reopen by constructing a pager on the same file and calling
Btree.Table.open_ pager ~root_page; the root page number comes from
Btree.Table.save_root (or 1 for a fresh file).
API#
Table#
v pager/open_ pager ~root_pageinsert t ~rowid data/find t rowid/delete t rowiditer t f/fold t ~init ~fsave_root t/restore_root t root-- persist the root for reopen
Index#
v pager/open_ pager ~root_pageinsert t key/mem t key- Iteration and range scans mirror
Table
Pager#
Pager.v ~page_size file-- file-backedPager.mem ~page_size ()-- in-memory,syncis a no-opPager.sync t-- flush dirty pagesPager.snapshot t/rollback t snap-- capture and restore for aborting a logical transaction
Internals#
Modules#
| Module | Purpose |
|---|---|
Pager |
Page cache and file I/O (file-backed or in-memory) |
Table |
B+tree for rowid-keyed records (int64 -> string) |
Index |
B-tree for string key sets |
Page |
Page header parsing, binary helpers |
Cell |
Cell encoding/decoding (table leaf, interior, index) |
Record |
SQLite record format (serial types, column values) |
Varint |
Variable-length integer encoding |
Page header (8 bytes leaf, 12 bytes interior)#
| Offset | Size | Description |
|---|---|---|
| 0 | 1 | Page type: 0x0d leaf table, 0x05 interior table, 0x0a leaf index, 0x02 interior index |
| 1 | 2 | First freeblock offset (0 if none) |
| 3 | 2 | Cell count |
| 5 | 2 | Cell content area start (0 = 65536) |
| 7 | 1 | Fragmented free bytes (max 60) |
| 8 | 4 | Right-most child pointer (interior pages only) |
Overflow#
When a cell payload exceeds X = U - 35 bytes (where U is usable page
size), excess data is stored in a chain of overflow pages. Each overflow
page has a 4-byte next-page pointer followed by up to U - 4 bytes of
data. The local payload size is computed per the SQLite spec:
M = ((U - 12) * 32 / 255) - 23
K = M + ((P - M) mod (U - 4))
local = if K <= X then K else M
Design choices#
The SQLite file format is an implementation choice, not a limitation. It
brings free tooling (sqlite3 CLI, DB Browser, etc.) and a
spec with 20+ years of
battle-testing. The user-facing API is generic -- persistent ordered map
(Table) and persistent ordered set (Index).
What the format does not give you (compared to LMDB/sanakirja):
| Feature | SQLite format | LMDB / sanakirja |
|---|---|---|
| Concurrency | In-place updates + WAL/journal | Copy-on-write (lock-free readers) |
| Range scans | Via parent traversal | Leaf sibling pointers |
| Crash safety | Rollback journal or WAL | Atomic root pointer swap |
These are deliberate tradeoffs -- compatibility and tooling vs. raw throughput on concurrent workloads.
Related work#
- SQLite file format -- the specification this library implements
- ocaml-sqlite -- database layer built on top of this library (KV API, named tables, schema)
- LMDB -- C B+tree with memory-mapped COW (different tradeoffs)
- sanakirja -- Rust COW B-tree, used by Pijul
- bbolt -- Go B+tree, used by etcd
- Limbo -- Rust SQLite reimplementation
Licence#
MIT. See LICENSE.md for details.