Streaming opam file codec for OCaml

ocaml-opam#

A type-safe codec library for opam files using a combinator-based approach inspired by Jsont and the conventions of ocaml-toml and ocaml-json.

Layout#

  • lib/ — value AST, lexer, parser, printer, codec combinators.
  • lib/bytesrw/ — streaming I/O via Bytesrw.

Installation#

Install with opam:

opam install nox-opam

If opam cannot find the package, it may not yet be released in the public opam-repository. Add the overlay repository, then install it:

opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
opam update
opam install nox-opam

To read from an Eio flow, combine Opam_bytesrw with bytesrw-eio:

let read_opam_from_flow flow =
  let r = Bytesrw_eio.bytes_reader_of_flow flow in
  Opam_bytesrw.of_reader r

For Unix channels, use Bytesrw directly:

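let read_opam_from_path path =
  (* Close the channel when parsing finishes, even on error.
     Assumes of_reader consumes the reader before returning. *)
  In_channel.with_open_bin path (fun ic ->
      let r = Bytesrw.Bytes.Reader.of_in_channel ic in
      Opam_bytesrw.of_reader ~file:path r)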

Quick start#

Define a codec for a record:

type pkg = { name : string; version : string; depends : string list }

let pkg_codec : pkg Opam.File.t =
  Opam.Codec.File.(
    obj (fun name version depends -> { name; version; depends })
    |> field "name"    Opam.Codec.string ~enc:(fun p -> p.name)
    |> field "version" Opam.Codec.string ~enc:(fun p -> p.version)
    |> field "depends" Opam.Codec.(list string)
        ~dec_absent:[] ~enc:(fun p -> p.depends)
    |> finish)

let decode_pkg input =
  match Opam.decode_string pkg_codec input with
  | Ok pkg -> pkg
  | Error e -> failwith (Opam.Error.to_string e)

For raw value parsing without codecs, use Opam_bytesrw.of_string or Opam_bytesrw.of_reader.

Errors#

Errors carry a Loc.t source location. Typed kinds live as extensions of Loc.Error.kind:

let report_error codec s =
  match Opam.decode_string codec s with
  | Ok _ -> ()
  | Error { kind = Opam.Error.Missing_field _name; _ } -> ()
  | Error { kind = Opam.Error.Sort_mismatch _; _ } -> ()
  | Error e -> prerr_endline (Opam.Error.to_string e)

A printer is registered at module load so Loc.Error.to_string renders typed kinds with helpful messages.
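
The pattern described above (typed kinds added to an open type, plus a printer registered at module load) can be sketched in plain OCaml. This is a minimal illustration, not the library's code: Loc_sketch, Error_sketch, Opam_error_sketch, register_printer, and the payload of Sort_mismatch are all hypothetical names chosen for the example.

```ocaml
(* Hypothetical stand-ins; these are NOT the real Loc / Opam.Error modules. *)
module Loc_sketch = struct
  type t = { file : string; line : int; col : int }
end

module Error_sketch = struct
  type kind = ..  (* extensible variant: other modules add constructors *)
  type t = { loc : Loc_sketch.t; kind : kind }

  (* Registered printers, most recently registered first. *)
  let printers : (kind -> string option) list ref = ref []
  let register_printer p = printers := p :: !printers

  let to_string (e : t) =
    let msg =
      match List.find_map (fun p -> p e.kind) !printers with
      | Some s -> s
      | None -> "unknown error"
    in
    let { Loc_sketch.file; line; col } = e.loc in
    Printf.sprintf "%s:%d:%d: %s" file line col msg
end

module Opam_error_sketch = struct
  (* Typed kinds live as extensions of the open type. *)
  type Error_sketch.kind +=
    | Missing_field of string
    | Sort_mismatch of string * string  (* assumed payload: expected, got *)

  (* Runs once, when this module is initialised. *)
  let () =
    Error_sketch.register_printer (function
      | Missing_field name -> Some (Printf.sprintf "missing field %S" name)
      | Sort_mismatch (expected, got) ->
          Some (Printf.sprintf "expected %s but got %s" expected got)
      | _ -> None)
end
```

Because the printer is looked up at call time, a generic to_string can render kinds it knows nothing about statically, as long as the defining module has been linked and initialised.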

Benchmarks#

bench/bench.ml compares ocaml-opam against opam-file-format on real opam files. On typical opam manifests in this monorepo:

Geomean full ocaml-opam:  206.1 MB/s  (2.30x opam-file-format)
Geomean field ocaml-opam: 333.4 MB/s  (3.72x opam-file-format)
Geomean opam-file-format:  89.7 MB/s
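
The speedup factors follow directly from the geometric means. The snippet below is a small sketch of that arithmetic; geomean and speedup are helper names made up for this example, and it does not claim to reproduce how bench/bench.ml aggregates its measurements.

```ocaml
(* Geometric mean of positive values: exp of the arithmetic mean of the logs. *)
let geomean xs =
  let n = float_of_int (List.length xs) in
  exp (List.fold_left (fun acc x -> acc +. log x) 0.0 xs /. n)

(* Ratio of two throughputs in the same unit (here MB/s). *)
let speedup ~ours ~baseline = ours /. baseline

let () =
  (* Figures taken from the results above. *)
  Printf.printf "full:  %.2fx\n" (speedup ~ours:206.1 ~baseline:89.7);
  Printf.printf "field: %.2fx\n" (speedup ~ours:333.4 ~baseline:89.7)
```

Over the same set of files, the ratio of two geometric means equals the geometric mean of the per-file ratios, so either way of aggregating gives the same factor.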

Two decoding paths:

  • full: build the complete Opam.Value.file AST. Both libraries do this.
  • field: use Opam.Parser.find_field (also available as Opam_bytesrw.find_field) to extract a single top-level field. ocaml-opam token-skips unwanted fields and sections without building their values, and stops as soon as the requested field is found. opam-file-format has no skip-parse path, so it still performs a full parse in this mode.
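
To make the difference between the two paths concrete, here is a toy sketch of skip-parsing over a simplified token list. The token and value types, and the parse_value, skip_value, and find_field functions below, are illustrative assumptions written for this example; the real implementation works on a streaming lexer and a richer grammar.

```ocaml
(* Toy grammar: fields are `ident : value`; values are strings, ints or lists. *)
type token = Ident of string | Colon | String of string | Int of int
           | Lbracket | Rbracket

type value = S of string | I of int | L of value list

(* Full path: build the value, return it with the remaining tokens. *)
let rec parse_value = function
  | String s :: rest -> (S s, rest)
  | Int i :: rest -> (I i, rest)
  | Lbracket :: rest -> parse_items [] rest
  | _ -> failwith "value expected"

and parse_items acc = function
  | Rbracket :: rest -> (L (List.rev acc), rest)
  | toks ->
      let v, rest = parse_value toks in
      parse_items (v :: acc) rest

(* Skip path: same shape as parse_value, but no value is built. *)
let rec skip_value = function
  | (String _ | Int _) :: rest -> rest
  | Lbracket :: rest -> skip_items rest
  | _ -> failwith "value expected"

and skip_items = function
  | Rbracket :: rest -> rest
  | toks -> skip_items (skip_value toks)

(* Parse only the requested field; skip the others; stop at the first match. *)
let rec find_field name = function
  | Ident n :: Colon :: rest when String.equal n name ->
      Some (fst (parse_value rest))
  | Ident _ :: Colon :: rest -> find_field name (skip_value rest)
  | [] -> None
  | _ -> failwith "field expected"
```

In this sketch, looking up a field that appears late in the input walks over earlier lists with skip_items and never allocates an L node for them.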

Run with:

$ dune exec ocaml-opam/bench/bench.exe -- ocaml-paseto/paseto.opam ...

How we got there#

  1. Hand-written lexer reading from a mutable Lexer.source (bytes + pos + len + refill callback). No per-byte function calls; bytesrw slices are fed directly into the source.
  2. Single shared scratch Buffer.t reused across token scans.
  3. No Loc.t allocation per token — the lexer tracks position as mutable ints and builds Loc.t only on demand for errors.
  4. Parser lookahead caches just tokens (not (token, loc) pairs).
  5. find_field uses token-level skipping for unwanted fields: the skip_* helpers mirror the parse_* grammar but allocate nothing for the skipped spans.
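
Points 1 and 2 can be illustrated with a small self-contained sketch: a mutable source that is refilled slice by slice, and one scratch buffer reused across scans. The record layout, ensure, scan_ident, and source_of_chunks are assumptions made for this example and may differ from the actual Lexer.source.

```ocaml
(* Hypothetical source record: current slice, cursor, and a refill callback. *)
type source = {
  mutable buf : Bytes.t;
  mutable pos : int;  (* index of the next unread byte *)
  mutable len : int;  (* one past the last valid byte *)
  refill : unit -> (Bytes.t * int * int) option;
      (* next slice as (bytes, first, length); None at end of input *)
}

(* Make sure at least one byte is available; false means end of input. *)
let rec ensure src =
  if src.pos < src.len then true
  else
    match src.refill () with
    | None -> false
    | Some (b, first, length) ->
        src.buf <- b;
        src.pos <- first;
        src.len <- first + length;
        ensure src

(* One scratch buffer, cleared and reused for every token. *)
let scratch = Buffer.create 64

let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' | '_' -> true
  | _ -> false

(* Scan an identifier that may straddle slice boundaries. *)
let scan_ident src =
  Buffer.clear scratch;
  let rec loop () =
    if ensure src then begin
      let start = src.pos in
      let i = ref start in
      while !i < src.len && is_ident_char (Bytes.get src.buf !i) do
        incr i
      done;
      Buffer.add_subbytes scratch src.buf start (!i - start);
      src.pos <- !i;
      (* Reached the end of the slice: the identifier may continue. *)
      if !i = src.len then loop ()
    end
  in
  loop ();
  Buffer.contents scratch

(* Test helper: a source fed from a list of string chunks. *)
let source_of_chunks chunks =
  let rest = ref chunks in
  let refill () =
    match !rest with
    | [] -> None
    | s :: tl ->
        rest := tl;
        Some (Bytes.of_string s, 0, String.length s)
  in
  { buf = Bytes.empty; pos = 0; len = 0; refill }
```

The inner while loop runs over the bytes of the current slice directly and only goes back through the refill callback when the slice is exhausted, which is the property point 1 describes.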

Status#

  • AST: complete (booleans, integers, strings, identifiers, lists, groups, optional/filtered values, env bindings, prefix and binary relops, logical ops, sections).
  • Lexer / parser / printer: round-trip tested on hand-written inputs and real opam files from this monorepo.
  • Codec combinators: bool, int, string, ident, list, option, map, enum, filtered, constraint_, File.obj / field / opt / finish.
  • Tests: 71 cases, including 29 negative parser tests.
  • Out of scope (for now): fuzz target, jsont bridge, advanced codec combinators (fold / iterate / unknown-handling) — add them as needed.