Persistent store with Git semantics: lazy reads, delayed writes, content-addressing

Irmin

Irmin is a content-addressable store that models data as typed, hash-linked DAGs. You navigate them with cursors, merge them with schema-defined strategies, and sync them over the network. Backends are pluggable: the same code works against a Git repository, an ATProto PDS, an OCI registry, or an in-memory heap.

This version of Irmin is rebuilt around a new core abstraction: a typed cursor over a content-addressed DAG, with codecs that define block structure and merge semantics at the schema level rather than per-backend. It uses Eio for concurrent I/O and requires OCaml >= 5.2.

Why Irmin

A Git repository is a content-addressed store. So is an ATProto repository, an OCI image, or an IPFS object. They all share the same core operations: hash a block, store it, link blocks by hash, traverse, diff, merge, sync.

Irmin makes that pattern a library. You define a schema (how blocks are structured), and Irmin gives you:

  • Typed cursors that navigate the DAG lazily, fetching blocks on demand
  • Schema-driven merge with typed merge functions (counters, sets, text, LWW) composed at the tree level
  • Merkle proofs — record a computation's reads into a subheap, replay and verify elsewhere
  • Sync protocols — anti-entropy gossip, Merkle descent, Bloom-slice transfer (Journault & Gazagnaire, 2014)
  • A CLI for inspecting, editing, and serving stores from the terminal

The cursor abstraction means application code is backend-independent. Switch from Git to ATProto by changing the heap, not the logic.

Installation

Install with opam:

$ opam install nox-irmin

If opam cannot find the package, it may not yet be released in the public opam-repository. Add the overlay repository, then install it:

$ opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
$ opam update
$ opam install nox-irmin

Quick Start

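open Irmin_git

(* Print the entries found at the head of branch "main". *)
let walk_tree () =
  Eio_main.run @@ fun env ->
  let fs = Eio.Stdenv.fs env in
  Eio.Switch.run @@ fun sw ->
  (* Open the Git repository in the current directory as a heap. *)
  let heap = open_ ~sw ~fs ~path:(Fpath.v ".") in
  match head heap ~branch:"main" with
  | None -> print_endline "empty repository"
  | Some h ->
    (* Place a cursor at h using the tree codec, then list its children. *)
    let c = at heap tree h in
    list c
    |> List.iter (fun (name, kind) ->
         (* Mark nodes with a trailing slash; leaves get no suffix. *)
         let tag = match kind with `Node -> "/" | `Leaf -> "" in
         Fmt.pr "  %s%s@." name tag)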

Define a custom schema

(* Reuse the Git tree codec but override the leaf merge strategy *)
let my_tree =
  fix (fun self ->
    node ~name:"application/x-tree"
      ~dec:tree_parse ~enc:tree_serialize
      ~merge:merge_lww              (* last-writer-wins at leaves *)
      ~rules:[ "*" => self ] ())

A child of a node is either linked (a separate content-addressed block, referenced by hash, deduplicated across the DAG) or inlined (bytes stored inside the parent block, not content-addressed on their own). The schema's dec/enc pair decides this per child; the cursor walks both kinds transparently.

(* A Git tree entry: the permission bits live inline in the
   parent, the target blob lives as a separate Link. *)
let example_children =
  let blob_hash = Git.Hash.digest_string ~kind:`Blob "hello" in
  Named
    [ ("mode",   inline "100644");
      ("target", link (irmin_hash blob_hash)) ]

Rule of thumb: anything you want to share across blocks (deduplicated, independently fetchable, reachable from proofs) should be a link. Anything you always want to materialise with the parent (a small flag, a permission, a short tag) should be inline. The schema stays pure either way — heap writes happen at flush, not inside enc.
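
To make the trade-off concrete, here is a minimal sketch of the two kinds of child. It is a toy model, not Irmin's API: the names `child`, `put`, `encode_node`, and `entry` are hypothetical, and MD5 from the standard library's Digest module stands in for the real hash function.

```ocaml
(* Toy model of linked vs. inlined children. MD5 (stdlib Digest) stands in
   for the real hash; every name here is hypothetical, not Irmin's API. *)
type hash = string

type child =
  | Inline of string  (* bytes carried inside the parent block *)
  | Link of hash      (* reference to a separately stored block *)

let store : (hash, string) Hashtbl.t = Hashtbl.create 16

(* Store a block under its own hash and return that hash. *)
let put block =
  let h = Digest.to_hex (Digest.string block) in
  Hashtbl.replace store h block;
  h

(* Serialise a node: inline bytes are embedded, links contribute only a hash. *)
let encode_node (children : (string * child) list) : string =
  children
  |> List.map (fun (name, c) ->
         match c with
         | Inline bytes -> Printf.sprintf "%s=inline:%s" name bytes
         | Link h -> Printf.sprintf "%s=link:%s" name h)
  |> String.concat "\n"

(* A tree entry: the mode is inlined, the contents are linked. *)
let entry ~mode contents =
  put (encode_node [ ("mode", Inline mode); ("target", Link (put contents)) ])
```

Two entries with different modes but identical contents produce two distinct parent blocks that share a single linked blob, which is the deduplication the rule of thumb refers to.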

Merge with typed strategies

(* Lift a typed counter merge into a block-level merge strategy *)
let counter_merge =
  Irmin.Merge.v
    ~decode:int_of_string
    ~encode:string_of_int
    Irmin.Merge.counter

let counter_leaf =
  leaf ~name:"counter" ~merge:counter_merge ()
(* Concurrent increments of +5 and +3 merge to +8 instead of conflicting *)

let text_leaf =
  leaf ~name:"text" ~merge:merge_lww ()

(* Tree-level merge composes leaf strategies automatically *)
let merged_tree =
  fix (fun self ->
    directory
      [ "*.count" => counter_leaf;
        "*.txt"   => text_leaf;
        "*"       => self ])

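The strategies above can be read as three-way merges against a common ancestor. The following is a minimal sketch under that assumption; the type `merge` and the functions `counter`, `lww`, and `lift` are hypothetical stand-ins written for illustration, not the signatures of Irmin.Merge.

```ocaml
(* Three-way merge sketch; hypothetical names, not Irmin's API. *)
type 'a merge = old:'a -> 'a -> 'a -> ('a, string) result

(* Counters never conflict: apply both sides' deltas to the ancestor. *)
let counter : int merge = fun ~old a b -> Ok (a + b - old)

(* Last-writer-wins over (timestamp, value) pairs: the newer side wins. *)
let lww : (float * string) merge =
 fun ~old:_ ((ta, _) as a) ((tb, _) as b) -> Ok (if ta >= tb then a else b)

(* Lift a typed merge to one over serialised blocks.
   Note: [decode] may raise on malformed input in this sketch. *)
let lift ~decode ~encode (m : 'a merge) : string merge =
 fun ~old a b ->
  Result.map encode (m ~old:(decode old) (decode a) (decode b))

let counter_block = lift ~decode:int_of_string ~encode:string_of_int counter
```

With an ancestor of 0 and sides 5 and 3, `counter` yields 8; with an ancestor of 10 and sides 15 and 13, it yields 18, since each side's delta is applied once.
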
CLI

$ irmin init -r mystore
$ irmin set -r mystore config/db.json '{"host":"localhost"}'
$ irmin get -r mystore config/db.json
{"host":"localhost"}
$ irmin tree -r mystore
config/
  db.json
$ irmin log -r mystore
[abc1234] Set config/db.json
$ irmin branches -r mystore
  main

Moving data in and out

Four commands, split by source medium (file on disk vs. another live store) and direction (ingest vs. emit):

| Direction | From a file       | From another store |
|-----------|-------------------|--------------------|
| in        | irmin import FILE | irmin pull REMOTE  |
| out       | irmin export FILE | irmin push REMOTE  |

Plus the onboarding shortcut:

$ irmin clone SOURCE [DIR]

which seeds a fresh store under DIR from a CAR archive (supported today) or a remote URL (planned). It is the one-shot equivalent of init, import, and setting HEAD. With no DIR, the target folder is inferred from the source basename, matching git clone's convention.

The archival pair (import / export) and the sync pair (pull / push) are deliberately different workflows:

  • Archive: a CAR file is a self-contained, hash-integral snapshot. No refs, no merge, no network. Hand it to someone, commit it to backup, re-hydrate with import. clone is the "get started" shortcut over this pair.
  • Sync: two live stores agreeing on a ref. push sends the delta, pull fetches + merges. Refs and merge strategy matter; the target must be reachable and writable for push.

CAR support currently targets the PDS/ATProto backend; Git-backed stores export via git bundle on the underlying .git.

Backends

| Module        | Type    | Block format             | Status                          |
|---------------|---------|--------------------------|---------------------------------|
| Irmin_git     | Backend | SHA-1 tree/blob/commit   | Read/write, full Heap.S         |
| Irmin_atproto | Backend | DAG-CBOR (SHA-256)       | Read/write, full Heap.S         |
| Irmin_tar     | Backend | SHA-256 file blobs       | Read/write                      |
| Irmin_json    | Codec   | In-memory JSON values    | Read-only (parse JSON blocks)   |
| Irmin_cbor    | Codec   | DAG-CBOR blocks          | Read-only (parse CBOR blocks)   |
| Irmin_oci     | Codec   | SHA-256 manifests/layers | Read-only (parse OCI manifests) |

Full backends implement Heap.S — content-addressed block storage with named refs, put, get, and mem. Codecs provide the dec/enc functions that interpret blocks but do not manage storage.

Architecture

┌──────────────────────────────────────────┐
│  Schema: codecs, cursors, merge, proofs  │  typed, backend-agnostic
├────────────────────┬─────────────────────┤
│  Sync              │  Worktree           │  coordination
│  gossip / Merkle / │  checkout / status  │
│  Bloom-slice       │  / commit           │
├────────────────────┴──────┬──────────────┤
│  Git heap                 │ ATProto heap │  backend layer
│  (SHA-1, packfiles)       │ (SHA-256)    │
└───────────────────────────┴──────────────┘

Module Structure

| Module               | Purpose                                        |
|----------------------|------------------------------------------------|
| Irmin.Hash           | Phantom-typed SHA-1 / SHA-256 hashes           |
| Irmin.Heap           | Content-addressed block store with named refs  |
| Irmin.Schema.Make(H) | Codecs, cursors, merge, diff, proofs over H    |
| Irmin.Sync           | Backend-agnostic sync (gossip, Merkle, Bloom)  |
| Irmin.Worktree       | Filesystem checkout / status / commit          |
| Irmin.SHA1           | Pre-built Schema.Make for SHA-1                |
| Irmin.SHA256         | Pre-built Schema.Make for SHA-256              |
| Irmin_git            | Git backend                                    |
| Irmin_atproto        | ATProto backend                                |
| Irmin_oci            | OCI registry backend                           |
| Irmin_json           | JSON in-memory backend                         |
| Irmin_cbor           | CBOR block backend                             |
| Irmin_tar            | TAR archive backend                            |

Key Concepts

Heap: A typed, content-addressed persistent store. Blocks are stored and retrieved by hash. Named refs (branches, HEAD) provide mutable entry points. The heap does not interpret blocks — structure is the codec's job.
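
As an illustration of the idea, a heap can be pictured as two tables: immutable blocks keyed by hash, and mutable refs pointing at hashes. This is a minimal in-memory sketch and not the Heap.S signature; the record fields and function names are assumptions, and MD5 from the standard library's Digest module stands in for SHA-1 or SHA-256.

```ocaml
(* Toy content-addressed heap. MD5 from the standard library's Digest
   module stands in for SHA-1 / SHA-256; all names are hypothetical. *)
type hash = string

type heap = {
  blocks : (hash, string) Hashtbl.t;  (* immutable blocks, keyed by hash *)
  refs : (string, hash) Hashtbl.t;    (* mutable named entry points *)
}

let create () = { blocks = Hashtbl.create 16; refs = Hashtbl.create 4 }

let hash_of (block : string) : hash = Digest.to_hex (Digest.string block)

(* Storing the same bytes twice yields the same key, so blocks deduplicate. *)
let put heap block =
  let h = hash_of block in
  Hashtbl.replace heap.blocks h block;
  h

let get heap h = Hashtbl.find_opt heap.blocks h
let mem heap h = Hashtbl.mem heap.blocks h

(* Refs are the only mutable part: moving a branch rewrites one entry. *)
let set_ref heap name h = Hashtbl.replace heap.refs name h
let head heap ~branch = Hashtbl.find_opt heap.refs branch
```

The heap never looks inside a block here, which mirrors the division of labour described above: interpretation is left to the codec.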

Codec: A typed description of a block at one level of the DAG. Codecs carry a MIME-style name, a decode/encode pair, an optional merge function, and navigation rules that map child names to codecs. The fix combinator handles recursive structures (trees that contain trees).
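
The navigation rules can be pictured as an ordered pattern table. The sketch below assumes that patterns are `*`, `*.ext`, or an exact name, and that the first matching rule wins; both the matching semantics and the names `matches` and `resolve` are illustrative assumptions, not documented Irmin behaviour.

```ocaml
(* Hypothetical rule table: patterns are "*", "*.ext", or an exact name,
   and the first matching rule wins (an assumption, not confirmed by Irmin). *)
let matches pattern name =
  if pattern = "*" then true
  else if String.length pattern > 1 && pattern.[0] = '*' then
    (* "*.ext" matches any name ending in ".ext" *)
    let suffix = String.sub pattern 1 (String.length pattern - 1) in
    String.ends_with ~suffix name
  else pattern = name

(* Return the codec attached to the first rule that matches [name]. *)
let resolve rules name =
  List.find_map
    (fun (pattern, codec) -> if matches pattern name then Some codec else None)
    rules
```

Under this reading, placing the catch-all `*` rule last lets more specific patterns take precedence over it.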

Cursor: A position in the DAG with a known value type. Reads are lazy: step descends to a child by name or typed field, fetching the block only when needed. Writes accumulate locally until flush persists them.
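
The lazy-read, delayed-write behaviour can be sketched with a write buffer in front of a backing store. This is a deliberate simplification: it uses a flat key/value table rather than a DAG, and the names `cursor`, `get`, `set`, `flush`, and the `fetches` counter are hypothetical.

```ocaml
(* Toy cursor over a flat key/value backing store; names are hypothetical. *)
type cursor = {
  backing : (string, string) Hashtbl.t;  (* stands in for the heap *)
  pending : (string, string) Hashtbl.t;  (* writes not yet flushed *)
  mutable fetches : int;                 (* counts reads that hit the store *)
}

let cursor backing = { backing; pending = Hashtbl.create 8; fetches = 0 }

(* Reads see pending writes first, and touch the store only on demand. *)
let get c key =
  match Hashtbl.find_opt c.pending key with
  | Some _ as v -> v
  | None ->
      c.fetches <- c.fetches + 1;
      Hashtbl.find_opt c.backing key

(* Writes are buffered locally and do not reach the store yet. *)
let set c key value = Hashtbl.replace c.pending key value

(* flush persists every buffered write, then clears the buffer. *)
let flush c =
  Hashtbl.iter (Hashtbl.replace c.backing) c.pending;
  Hashtbl.reset c.pending
```

Creating the cursor performs no reads, a buffered write is visible through the cursor before it is visible in the store, and only `flush` changes the backing table.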

Proof: A subheap recording every block read during a computation. produce runs a function against a full heap and captures the reads; verify replays the function against the subheap alone, confirming the result without trusting the full store.
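
A minimal sketch of the produce/verify pattern follows. It assumes the computation reaches the store only through a `read` function passed to it; the signatures of `produce` and `verify` are invented for illustration and may differ from Irmin's, and MD5 from the standard library stands in for the real hash.

```ocaml
(* Toy proof: record the blocks a computation reads, then replay it against
   only those blocks. MD5 stands in for the real hash; names are hypothetical. *)
type hash = string
type heap = (hash, string) Hashtbl.t

let put (heap : heap) block =
  let h = Digest.to_hex (Digest.string block) in
  Hashtbl.replace heap h block;
  h

(* Run [f] against the full heap, copying every block it reads into a subheap. *)
let produce (full : heap) f =
  let sub : heap = Hashtbl.create 8 in
  let read h =
    let block = Hashtbl.find full h in
    Hashtbl.replace sub h block;
    block
  in
  let result = f read in
  (sub, result)

(* Replay [f] against the subheap alone, re-hashing each block on the way.
   A missing or corrupt block, or a different result, fails verification. *)
let verify (sub : heap) f expected =
  let read h =
    let block = Hashtbl.find sub h in
    if Digest.to_hex (Digest.string block) <> h then failwith "corrupt block";
    block
  in
  match f read with
  | result -> result = expected
  | exception (Not_found | Failure _) -> false
```

The subheap contains only the blocks the computation touched, so a verifier holding it can check the claimed result without access to the rest of the store.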

Sync: The protocol for exchanging blocks between stores. Irmin provides three default strategies: anti-entropy gossip (exchange head lists), Merkle descent (walk the DAG to find divergence), and Bloom-slice sync (probabilistic set difference using layered Bloom filters — efficient for large DAGs with small deltas).
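
Of the three strategies, Merkle descent is the simplest to sketch. The version below is a toy: both stores are local tables, blocks are either leaves or lists of child hashes, and it assumes each store is closed under reachability (if it holds a block, it holds everything below it). The names `block`, `put`, and `missing` are hypothetical, and MD5 stands in for the real hash.

```ocaml
(* Toy Merkle descent. Blocks are either leaves or nodes listing child
   hashes; MD5 stands in for the real hash. Names are hypothetical. *)
type hash = string
type block = Leaf of string | Node of hash list
type store = (hash, block) Hashtbl.t

let hash_of = function
  | Leaf s -> Digest.to_hex (Digest.string ("leaf:" ^ s))
  | Node hs -> Digest.to_hex (Digest.string ("node:" ^ String.concat "," hs))

let put (s : store) b =
  let h = hash_of b in
  Hashtbl.replace s h b;
  h

(* Collect the hashes reachable from [root] in [local] that [remote] lacks.
   A hash the remote already has prunes its whole subtree, on the assumption
   that stores are closed under reachability. Shared children reached through
   several parents are reported once per path in this sketch. *)
let rec missing ~(local : store) ~(remote : store) root =
  if Hashtbl.mem remote root then []
  else
    match Hashtbl.find local root with
    | Leaf _ -> [ root ]
    | Node children ->
        root
        :: List.concat_map (fun c -> missing ~local ~remote c) children
```

Because identical subtrees hash identically, the walk only descends where the two stores diverge, which is why the cost tracks the size of the delta rather than the size of the DAG.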

References

License

ISC