Irmin is a content-addressable store that models data as typed, hash-linked DAGs. You navigate them with cursors, merge them with schema-defined strategies, and sync them over the network. Backends are pluggable: the same code works against a Git repository, an ATProto PDS, an OCI registry, or an in-memory heap.

This version of Irmin is rebuilt around a new core abstraction: a typed cursor over a content-addressed DAG, with codecs that define block structure and merge semantics at the schema level rather than per-backend. It uses Eio for concurrent I/O and requires OCaml >= 5.2.

Why Irmin

A Git repository is a content-addressed store. So is an ATProto repository, an OCI image, or an IPFS object. They all share the same core operations: hash a block, store it, link blocks by hash, traverse, diff, merge, sync.

Irmin makes that pattern a library. You define a schema (how blocks are structured), and Irmin gives you:

Typed cursors that navigate the DAG lazily, fetching blocks on demand
Schema-driven merge with typed merge functions (counters, sets, text, LWW) composed at the tree level
Merkle proofs — record a computation's reads into a subheap, replay and verify elsewhere
Sync protocols — anti-entropy gossip, Merkle descent, Bloom-slice transfer (Journault & Gazagnaire, 2014)
A CLI for inspecting, editing, and serving stores from the terminal

The cursor abstraction means application code is backend-independent. Switch from Git to ATProto by changing the heap, not the logic.

Installation

Install with opam:

$ opam install nox-irmin

If opam cannot find the package, it may not yet be released in the public opam-repository. Add the overlay repository, then install it:

$ opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
$ opam update
$ opam install nox-irmin

Quick Start

Navigate a Git repository

open Irmin_git

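let walk_tree () =
  Eio_main.run @@ fun env ->
  let fs = Eio.Stdenv.fs env in
  Eio.Switch.run @@ fun sw ->
  (* Open the Git repository in the current directory as a heap *)
  let heap = open_ ~sw ~fs ~path:(Fpath.v ".") in
  (* Resolve the tip of [main]; [None] means the branch has no commit yet *)
  match head heap ~branch:"main" with
  | None -> print_endline "empty repository"
  | Some h ->
    (* Place a cursor at [h], interpreted with the [tree] codec *)
    let c = at heap tree h in
    (* Print each child, marking sub-trees with a trailing slash *)
    list c
    |> List.iter (fun (name, kind) ->
         let tag = match kind with `Node -> "/" | `Leaf -> "" in
         Fmt.pr "  %s%s@." name tag)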

Define a custom schema

(* Reuse the Git tree codec but override the leaf merge strategy *)
let my_tree =
  fix (fun self ->
    node ~name:"application/x-tree"
      ~dec:tree_parse ~enc:tree_serialize
      ~merge:merge_lww              (* last-writer-wins at leaves *)
      ~rules:[ "*" => self ] ())

Links vs inlines

A child of a node is either linked (a separate content-addressed block, referenced by hash, deduplicated across the DAG) or inlined (bytes stored inside the parent block, not content-addressed on their own). The schema's dec/enc decides per child; the cursor walks both transparently.

(* A Git tree entry: the permission bits live inline in the
   parent, the target blob lives as a separate Link. *)
let example_children =
  let blob_hash = Git.Hash.digest_string ~kind:`Blob "hello" in
  Named
    [ ("mode",   inline "100644");
      ("target", link (irmin_hash blob_hash)) ]

Rule of thumb: anything you want to share across blocks (deduplicated, independently fetchable, reachable from proofs) should be a link. Anything you always want to materialise with the parent (a small flag, a permission, a short tag) should be inline. The schema stays pure either way — heap writes happen at flush, not inside enc.

Merge with typed strategies

(* Lift a typed counter merge into a block-level merge strategy *)
let counter_merge =
  Irmin.Merge.v
    ~decode:int_of_string
    ~encode:string_of_int
    Irmin.Merge.counter

let counter_leaf =
  leaf ~name:"counter" ~merge:counter_merge ()
(* Branches that add 5 and 3 merge to a net +8 instead of conflicting *)

let text_leaf =
  leaf ~name:"text" ~merge:merge_lww ()

(* Tree-level merge composes leaf strategies automatically *)
let merged_tree =
  fix (fun self ->
    directory
      [ "*.count" => counter_leaf;
        "*.txt"   => text_leaf;
        "*"       => self ])

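To make the arithmetic behind these strategies concrete, here is a minimal sketch of a three-way counter merge and a last-writer-wins merge. It uses only the standard library; `counter3` and `lww` are hypothetical helpers, and the explicit timestamp on LWW values is an assumption, not something Irmin's `Merge` module is known to require.

```ocaml
(* Toy merge functions over an explicit common ancestor.
   Hypothetical helpers, not Irmin's API. *)

(* Counter: apply both branches' deltas relative to the ancestor. *)
let counter3 ~old a b = a + b - old

(* Last-writer-wins: values carry a timestamp (an assumption of this
   sketch); the newer one is kept, ties go to the first argument. *)
let lww (ta, a) (tb, b) = if ta >= tb then (ta, a) else (tb, b)

let () =
  (* Ancestor 0; one branch adds 5, the other adds 3. *)
  assert (counter3 ~old:0 5 3 = 8);
  (* Ancestor 10; the branches reach 15 (+5) and 13 (+3). *)
  assert (counter3 ~old:10 15 13 = 18);
  assert (lww (1, "draft") (2, "final") = (2, "final"))
```

Neither function can fail, which is why a tree whose leaves all use such strategies merges without conflicts.
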
CLI

$ irmin init -r mystore
$ irmin set -r mystore config/db.json '{"host":"localhost"}'
$ irmin get -r mystore config/db.json
{"host":"localhost"}
$ irmin tree -r mystore
config/
  db.json
$ irmin log -r mystore
[abc1234] Set config/db.json
$ irmin branches -r mystore
  main

Moving data in and out

Four commands, split by source medium (file on disk vs. another live store) and direction (ingest vs. emit):

	from a file	from another store
in	`irmin import FILE`	`irmin pull REMOTE`
out	`irmin export FILE`	`irmin push REMOTE`

Plus the onboarding shortcut:

$ irmin clone SOURCE [DIR]

which seeds a fresh store under DIR from a CAR archive (today) or a remote URL (later) — the one-shot of init + import + setting HEAD. With no DIR, the target folder is inferred from the source basename, matching git clone's convention.
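
One plausible reading of the basename rule, sketched with the standard `Filename` module. `infer_dir` is a hypothetical helper; the exact rule the CLI applies (for instance which suffixes it strips) is not stated here, so treat this as an assumption.

```ocaml
(* Hypothetical helper: derive a target directory from a clone source,
   such as a CAR file path or a remote URL, by taking the last path
   component and dropping its extension. *)
let infer_dir (source : string) : string =
  Filename.remove_extension (Filename.basename source)

let () =
  assert (infer_dir "backups/notes.car" = "notes");
  assert (infer_dir "https://example.org/alice/store.git" = "store")
```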

The archival pair (import / export) and the sync pair (pull / push) are deliberately different workflows:

Archive: a CAR file is a self-contained, hash-integral snapshot. No refs, no merge, no network. Hand it to someone, commit it to backup, re-hydrate with import. clone is the "get started" shortcut over this pair.
Sync: two live stores agreeing on a ref. push sends the delta, pull fetches and merges. Refs and merge strategy matter; the target must be reachable and writable for push.

CAR support currently targets the PDS/ATProto backend; Git-backed stores export via git bundle on the underlying .git.

Backends

Module	Type	Block format	Status
`Irmin_git`	Backend	SHA-1 tree/blob/commit	Read/write, full `Heap.S`
`Irmin_atproto`	Backend	DAG-CBOR (SHA-256)	Read/write, full `Heap.S`
`Irmin_tar`	Backend	SHA-256 file blobs	Read/write
`Irmin_json`	Codec	In-memory JSON values	Read-only (parse JSON blocks)
`Irmin_cbor`	Codec	DAG-CBOR blocks	Read-only (parse CBOR blocks)
`Irmin_oci`	Codec	SHA-256 manifests/layers	Read-only (parse OCI manifests)

Full backends implement Heap.S — content-addressed block storage with named refs, put, get, and mem. Codecs provide the dec/enc functions that interpret blocks but do not manage storage.
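
As a rough illustration of that storage contract, the sketch below declares a small heap signature and an in-memory implementation. Only `put`, `get` and `mem` come from the description above; `create`, `set_ref` and `find_ref` are hypothetical names, blocks are plain strings, and MD5 from the standard `Digest` module stands in for SHA-1 or SHA-256. The real `Heap.S` may look quite different.

```ocaml
(* A guess at the shape of a heap signature. [put], [get] and [mem]
   are named in the text; the other names are hypothetical. *)
module type HEAP = sig
  type t
  type hash = string

  val create : unit -> t
  val put : t -> string -> hash
  val get : t -> hash -> string option
  val mem : t -> hash -> bool
  val set_ref : t -> string -> hash -> unit
  val find_ref : t -> string -> hash option
end

(* In-memory implementation; MD5 stands in for a real block hash. *)
module Mem : HEAP = struct
  type hash = string

  type t = {
    blocks : (hash, string) Hashtbl.t; (* immutable, keyed by content *)
    refs : (string, hash) Hashtbl.t;   (* mutable named entry points *)
  }

  let create () = { blocks = Hashtbl.create 16; refs = Hashtbl.create 4 }

  let put t block =
    let h = Digest.to_hex (Digest.string block) in
    Hashtbl.replace t.blocks h block;
    h

  let get t h = Hashtbl.find_opt t.blocks h
  let mem t h = Hashtbl.mem t.blocks h
  let set_ref t name h = Hashtbl.replace t.refs name h
  let find_ref t name = Hashtbl.find_opt t.refs name
end

let () =
  let heap = Mem.create () in
  let h = Mem.put heap "v1" in
  Mem.set_ref heap "main" h;
  assert (Mem.mem heap h);
  assert (Mem.find_ref heap "main" = Some h);
  assert (Mem.get heap h = Some "v1")
```

Nothing in the signature mentions block structure, which is what leaves room for codecs to be layered on top.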

Architecture

┌──────────────────────────────────────────┐
│  Schema: codecs, cursors, merge, proofs  │  typed, backend-agnostic
├────────────────────┬─────────────────────┤
│  Sync              │  Worktree           │  coordination
│  gossip / Merkle / │  checkout / status  │
│  Bloom-slice       │  / commit           │
├────────────────────┴──────┬──────────────┤
│  Git heap                 │ ATProto heap │  backend layer
│  (SHA-1, packfiles)       │ (SHA-256)    │
└───────────────────────────┴──────────────┘

Module Structure

Module	Purpose
`Irmin.Hash`	Phantom-typed SHA-1 / SHA-256 hashes
`Irmin.Heap`	Content-addressed block store with named refs
`Irmin.Schema.Make(H)`	Codecs, cursors, merge, diff, proofs over `H`
`Irmin.Sync`	Backend-agnostic sync (gossip, Merkle, Bloom)
`Irmin.Worktree`	Filesystem checkout / status / commit
`Irmin.SHA1`	Pre-built `Schema.Make` for SHA-1
`Irmin.SHA256`	Pre-built `Schema.Make` for SHA-256
`Irmin_git`	Git backend
`Irmin_atproto`	ATProto backend
`Irmin_oci`	OCI manifest/layer codec
`Irmin_json`	JSON value codec
`Irmin_cbor`	DAG-CBOR block codec
`Irmin_tar`	TAR archive backend

Key Concepts

Heap: A typed, content-addressed persistent store. Blocks are stored and retrieved by hash. Named refs (branches, HEAD) provide mutable entry points. The heap does not interpret blocks — structure is the codec's job.

Codec: A typed description of a block at one level of the DAG. Codecs carry a MIME-style name, a decode/encode pair, an optional merge function, and navigation rules that map child names to codecs. The fix combinator handles recursive structures (trees that contain trees).
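
The recursion can be sketched in a few lines. This toy `codec` keeps only a name and navigation rules (no decode, encode or merge), and its `fix` hands the callback a lazy self-reference rather than the codec itself. Both the record shape and the trailing-slash convention for subtrees are assumptions made for the example, not Irmin's definitions.

```ocaml
(* Hypothetical codec: a name plus navigation rules mapping a child
   name to the codec that interprets that child. *)
type codec = { name : string; rule : string -> codec option }

let leaf name = { name; rule = (fun _ -> None) }

(* Tie the recursive knot. [f] receives a suspended reference to the
   codec it is building and must only force it inside [rule]. *)
let fix (f : codec Lazy.t -> codec) : codec =
  let rec self = lazy (f self) in
  Lazy.force self

let blob = leaf "blob"

(* Toy convention: children whose name ends in '/' are subtrees,
   everything else is a blob. *)
let tree =
  fix (fun self ->
    { name = "tree";
      rule =
        (fun child ->
          if String.ends_with ~suffix:"/" child then Some (Lazy.force self)
          else Some blob) })

(* Follow a path of child names and return the codec reached. *)
let rec resolve codec = function
  | [] -> Some codec
  | child :: rest -> Option.bind (codec.rule child) (fun c -> resolve c rest)

let () =
  let name_at path = Option.map (fun c -> c.name) (resolve tree path) in
  assert (name_at [ "src/"; "lib/" ] = Some "tree");
  assert (name_at [ "src/"; "lib/"; "main.ml" ] = Some "blob");
  assert (name_at [ "main.ml"; "oops" ] = None)
```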

Cursor: A position in the DAG with a known value type. Reads are lazy: step descends to a child by name or typed field, fetching the block only when needed. Writes accumulate locally until flush persists them.
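
A minimal sketch of the lazy-read half of that behaviour, assuming a toy heap keyed by made-up hash labels. The `cursor` record, `at` and `step` here are illustrative only and untyped compared with the real thing; buffered writes and `flush` are left out.

```ocaml
type block = Leaf of string | Node of (string * string) list
(* A node maps child names to child hashes. *)

(* Toy heap with a fetch counter; the hashes are made-up labels. *)
let heap : (string, block) Hashtbl.t = Hashtbl.create 16
let fetches = ref 0

let fetch h =
  incr fetches;
  Hashtbl.find heap h

(* A cursor is a hash plus a suspended fetch of the block behind it. *)
type cursor = { hash : string; block : block Lazy.t }

let at h = { hash = h; block = lazy (fetch h) }

(* Descend to a named child. Only the current block is fetched; the
   child's block stays suspended until someone forces it. *)
let step c name =
  match Lazy.force c.block with
  | Leaf _ -> None
  | Node children -> Option.map at (List.assoc_opt name children)

let () =
  Hashtbl.replace heap "h-readme" (Leaf "hello");
  Hashtbl.replace heap "h-src" (Node []);
  Hashtbl.replace heap "h-root"
    (Node [ ("README", "h-readme"); ("src", "h-src") ]);
  let root = at "h-root" in
  assert (!fetches = 0); (* creating a cursor fetches nothing *)
  let readme = Option.get (step root "README") in
  assert (readme.hash = "h-readme");
  assert (!fetches = 1); (* only the root block so far *)
  assert (Lazy.force readme.block = Leaf "hello");
  assert (!fetches = 2) (* the "src" block was never fetched *)
```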

Proof: A subheap recording every block read during a computation. produce runs a function against a full heap and captures the reads; verify replays the function against the subheap alone, confirming the result without trusting the full store.
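
The produce/verify round trip can be sketched as follows, assuming a heap that is just a hash table of string blocks and a computation that receives a `read` function. The signatures are hypothetical and MD5 from `Digest` stands in for a real block hash; Irmin's own `produce` and `verify` operate on cursors and may differ substantially.

```ocaml
type heap = (string, string) Hashtbl.t

(* Content address; MD5 stands in for SHA-1 or SHA-256. *)
let hash block = Digest.to_hex (Digest.string block)

let put (heap : heap) block =
  let h = hash block in
  Hashtbl.replace heap h block;
  h

(* Run [f] with a reader over [full], copying every block it reads
   into a fresh subheap, which is the proof. *)
let produce (full : heap) (f : (string -> string) -> 'a) : heap * 'a =
  let proof = Hashtbl.create 16 in
  let read h =
    let block = Hashtbl.find full h in
    Hashtbl.replace proof h block;
    block
  in
  let result = f read in
  (proof, result)

(* Replay [f] against the proof alone, checking each block against
   its address, and compare the outcome with the claimed result. *)
let verify (proof : heap) (f : (string -> string) -> 'a) (claimed : 'a) =
  let read h =
    let block = Hashtbl.find proof h in
    if hash block <> h then failwith "corrupt block";
    block
  in
  match f read with
  | result -> result = claimed
  | exception (Not_found | Failure _) -> false

let () =
  let full = Hashtbl.create 16 in
  let a = put full "3" in
  let b = put full "4" in
  let _unread = put full "1000" in
  let sum read = int_of_string (read a) + int_of_string (read b) in
  let proof, result = produce full sum in
  assert (result = 7);
  assert (Hashtbl.length proof = 2); (* the unread block is not included *)
  assert (verify proof sum 7);
  assert (not (verify proof sum 8))
```

The verifier needs only the two blocks that were read, not the full store.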

Sync: The protocol for exchanging blocks between stores. Irmin provides three default strategies: anti-entropy gossip (exchange head lists), Merkle descent (walk the DAG to find divergence), and Bloom-slice sync (probabilistic set difference using layered Bloom filters — efficient for large DAGs with small deltas).
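
Of the three, Merkle descent is the simplest to sketch. The toy below walks from a remote head and copies blocks until it meets one the local store already holds. It assumes that a block present locally has all of its descendants present too, uses made-up string labels as hashes, and the `pull` function is hypothetical rather than the `Irmin.Sync` API.

```ocaml
(* Blocks are (payload, child hashes); hashes are plain labels here. *)
type heap = (string, string * string list) Hashtbl.t

(* Copy into [local] every block reachable from [head] in [remote]
   that [local] lacks. The walk stops at any block already present
   locally. Returns the number of blocks transferred. *)
let rec pull ~(remote : heap) ~(local : heap) (head : string) : int =
  if Hashtbl.mem local head then 0
  else begin
    let ((_, children) as block) = Hashtbl.find remote head in
    Hashtbl.replace local head block;
    List.fold_left (fun n c -> n + pull ~remote ~local c) 1 children
  end

let () =
  let remote = Hashtbl.create 16 and local = Hashtbl.create 16 in
  List.iter
    (fun (h, b) -> Hashtbl.replace remote h b)
    [ ("c1", ("first", []));
      ("c2", ("second", [ "c1" ]));
      ("c3", ("third", [ "c2" ])) ];
  (* The local store already has the oldest block. *)
  Hashtbl.replace local "c1" ("first", []);
  assert (pull ~remote ~local "c3" = 2);
  assert (Hashtbl.mem local "c2" && Hashtbl.mem local "c3");
  (* A second pull finds nothing to transfer. *)
  assert (pull ~remote ~local "c3" = 0)
```

The cost is proportional to the delta plus the boundary where the two stores meet, not to the size of either store.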

References

Irmin on GitHub — previous version of Irmin (Lwt-based, MirageOS-focused)
Git Internals — the content-addressed model that Irmin generalises
AT Protocol Repository Spec — ATProto's MST-based content-addressed repository
B. Journault and T. Gazagnaire. Irmin: a branch-consistent distributed library database. OCaml Workshop, 2014.

License

ISC
