ocaml-opam#
A type-safe codec library for opam files using a combinator-based approach inspired by Jsont and the conventions of ocaml-toml and ocaml-json.
Layout#
- lib/: value AST, lexer, parser, printer, codec combinators.
- lib/bytesrw/: streaming I/O via Bytesrw.
Installation#
Install with opam:
opam install nox-opam
If opam cannot find the package, it may not yet be released in the public
opam-repository. Add the overlay repository, then install it:
opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
opam update
opam install nox-opam
For Eio, combine with bytesrw-eio:
let read_opam_from_flow flow =
  let r = Bytesrw_eio.bytes_reader_of_flow flow in
  Opam_bytesrw.of_reader r
For Unix channels, use Bytesrw directly:
let read_opam_from_path path =
  let r = Bytesrw.Bytes.Reader.of_in_channel (open_in path) in
  Opam_bytesrw.of_reader ~file:path r
Quick start#
Define a codec for a record:
type pkg = { name : string; version : string; depends : string list }
let pkg_codec : pkg Opam.File.t =
  Opam.Codec.File.(
    obj (fun name version depends -> { name; version; depends })
    |> field "name" Opam.Codec.string ~enc:(fun p -> p.name)
    |> field "version" Opam.Codec.string ~enc:(fun p -> p.version)
    |> field "depends" Opam.Codec.(list string)
         ~dec_absent:[] ~enc:(fun p -> p.depends)
    |> finish)
let decode_pkg input =
  match Opam.decode_string pkg_codec input with
  | Ok pkg -> pkg
  | Error e -> failwith (Opam.Error.to_string e)
For raw value parsing without codecs, use Opam_bytesrw.of_string /
of_reader.
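To make the obj / field / finish pattern less opaque, here is a minimal, self-contained sketch of how such an applicative builder can be put together. It is not the library's implementation: the toy value type, the codec record, and every function below are assumptions written for illustration, and they only borrow the names used in the quick start.

```ocaml
(* Toy value type standing in for a real opam value AST (hypothetical). *)
type value = Str of string | Lst of value list

(* A codec pairs a decoder with an encoder for one OCaml type. *)
type 'a codec = { dec : value -> ('a, string) result; enc : 'a -> value }

let string : string codec =
  { dec = (function Str s -> Ok s | Lst _ -> Error "expected a string");
    enc = (fun s -> Str s) }

let list (c : 'a codec) : 'a list codec =
  let rec dec_all acc = function
    | [] -> Ok (List.rev acc)
    | v :: vs ->
      (match c.dec v with
       | Ok x -> dec_all (x :: acc) vs
       | Error e -> Error e)
  in
  { dec = (function Lst vs -> dec_all [] vs | Str _ -> Error "expected a list");
    enc = (fun xs -> Lst (List.map c.enc xs)) }

type fields = (string * value) list

(* ['o] is the record being described; ['f] is what remains of its
   constructor after the fields declared so far have been applied. *)
type ('o, 'f) obj = {
  build : fields -> ('f, string) result;
  emit : 'o -> fields;
}

let obj (make : 'f) : ('o, 'f) obj =
  { build = (fun _ -> Ok make); emit = (fun _ -> []) }

(* Each [field] feeds one more decoded argument to the constructor and
   appends one more encoded binding. *)
let field ?dec_absent name (c : 'a codec) ~(enc : 'o -> 'a)
    (o : ('o, 'a -> 'f) obj) : ('o, 'f) obj =
  let build fields =
    match o.build fields with
    | Error e -> Error e
    | Ok f ->
      (match List.assoc_opt name fields, dec_absent with
       | Some v, _ -> Result.map f (c.dec v)
       | None, Some d -> Ok (f d)
       | None, None -> Error ("missing field: " ^ name))
  in
  let emit x = o.emit x @ [ (name, c.enc (enc x)) ] in
  { build; emit }

(* Once every constructor argument is supplied, ['f] equals ['o]. *)
let finish (o : ('o, 'o) obj) = (o.build, o.emit)

type pkg = { name : string; version : string; depends : string list }

let decode_pkg, encode_pkg =
  obj (fun name version depends -> { name; version; depends })
  |> field "name" string ~enc:(fun p -> p.name)
  |> field "version" string ~enc:(fun p -> p.version)
  |> field "depends" (list string) ~dec_absent:[] ~enc:(fun p -> p.depends)
  |> finish
```

The type parameter of the builder shrinks by one arrow per field, so forgetting a field or declaring them in the wrong order is a compile-time error rather than a runtime one.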
Errors#
Errors carry a Loc.t source location. Typed kinds live as extensions
of Loc.Error.kind:
let report_error codec s =
  match Opam.decode_string codec s with
  | Ok _ -> ()
  | Error { kind = Opam.Error.Missing_field _name; _ } -> ()
  | Error { kind = Opam.Error.Sort_mismatch _; _ } -> ()
  | Error e -> prerr_endline (Opam.Error.to_string e)
A printer is registered at module load so Loc.Error.to_string
renders typed kinds with helpful messages.
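The "typed kinds as extensions plus a registered printer" design can be sketched with OCaml's extensible variants. The snippet below is a self-contained illustration, not the actual Loc or Opam.Error code: the loc record, the printer registry, and the payload of Sort_mismatch are all assumptions.

```ocaml
(* Hypothetical stand-in for a source location type. *)
type loc = { file : string; line : int; col : int }

(* Open (extensible) variant: other modules can add constructors later. *)
type kind = ..

type error = { loc : loc; kind : kind }

(* Registered printers; each returns [None] for kinds it does not know. *)
let printers : (kind -> string option) list ref = ref []
let register_printer p = printers := p :: !printers

let to_string e =
  let msg =
    match List.find_map (fun p -> p e.kind) !printers with
    | Some m -> m
    | None -> "unknown error"
  in
  Printf.sprintf "%s:%d:%d: %s" e.loc.file e.loc.line e.loc.col msg

(* A client library extends the open type with its own typed kinds.
   The payload of Sort_mismatch is invented for this sketch. *)
type kind +=
  | Missing_field of string
  | Sort_mismatch of { expected : string; got : string }

(* The printer is registered as a top-level side effect, i.e. when the
   module is loaded. *)
let () =
  register_printer (function
    | Missing_field name -> Some (Printf.sprintf "missing field %S" name)
    | Sort_mismatch { expected; got } ->
      Some (Printf.sprintf "expected %s, got %s" expected got)
    | _ -> None)

let example =
  to_string
    { loc = { file = "pkg.opam"; line = 3; col = 1 };
      kind = Missing_field "version" }
```

Callers can still pattern-match on the constructors they care about and fall back to the generic to_string for everything else, which is the shape report_error takes above.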
Benchmarks#
bench/bench.ml compares ocaml-opam against opam-file-format on real opam files.
On typical opam manifests in this monorepo:
$ Geomean full ocaml-opam: 206.1 MB/s (2.30x opam-file-format)
$ Geomean field ocaml-opam: 333.4 MB/s (3.72x opam-file-format)
$ Geomean opam-file-format: 89.7 MB/s
Two decoding paths:
- full: build the complete Opam.Value.file AST. Both libraries do this.
- field: use Opam.Parser.find_field (a.k.a. Opam_bytesrw.find_field) to extract a single top-level field. ocaml-opam token-skips unwanted fields and sections without building their values, and stops as soon as the requested field is found. opam-file-format has no skip-parse path, so it still does a full parse there.
Run with:
$ dune exec ocaml-opam/bench/bench.exe -- ocaml-paseto/paseto.opam ...
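For readers checking the summary figures, the sketch below shows how a geometric mean of per-file throughputs and the speedup ratios can be computed. It assumes the summary uses the ordinary geometric mean; how bench.ml actually aggregates is not shown here, and the per-file sample values are made up.

```ocaml
(* Geometric mean of strictly positive values: exp of the mean of the logs. *)
let geomean = function
  | [] -> invalid_arg "geomean: empty list"
  | xs ->
    let sum_logs = List.fold_left (fun acc x -> acc +. log x) 0.0 xs in
    exp (sum_logs /. float_of_int (List.length xs))

(* Speedup formatted the way the summary lines show it, e.g. "2.30x". *)
let speedup ~baseline x = Printf.sprintf "%.2fx" (x /. baseline)

let () =
  (* Made-up per-file throughputs in MB/s, for illustration only. *)
  Printf.printf "geomean: %.1f MB/s\n" (geomean [ 100.0; 400.0 ]);
  (* Geomean figures taken from the summary above. *)
  print_endline (speedup ~baseline:89.7 206.1);
  print_endline (speedup ~baseline:89.7 333.4)
```

A geometric mean keeps one unusually large or small file from dominating the summary, which is why it is a common choice for throughput ratios.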
How we got there#
- Hand-written lexer reading from a mutable Lexer.source (bytes + pos + len + refill callback). No per-byte function calls; bytesrw slices are fed directly into the source.
- Single shared scratch Buffer.t reused across token scans.
- No Loc.t allocation per token: the lexer tracks position as mutable ints and builds Loc.t only on demand for errors.
- Parser lookahead caches just tokens (not (token, loc) pairs).
- find_field uses token-level skipping for unwanted fields: the skip_* helpers mirror the parse_* grammar but allocate nothing for the skipped spans.
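The lexer source, the shared scratch buffer, and the skip helpers described above can be illustrated with a small stand-alone sketch. Everything here is hypothetical: the record layout is inferred from the "bytes + pos + len + refill callback" description, the chunk list stands in for bytesrw slices, and the toy grammar is a quoted string without escapes.

```ocaml
(* Hypothetical source: one chunk at a time, refilled on demand. *)
type source = {
  mutable buf : Bytes.t;            (* current chunk *)
  mutable pos : int;                (* next unread byte in [buf] *)
  mutable len : int;                (* number of valid bytes in [buf] *)
  refill : unit -> Bytes.t option;  (* next chunk, or [None] at end of input *)
}

(* Build a source from a list of string chunks (stand-in for slices). *)
let of_chunks chunks =
  let rest = ref chunks in
  let refill () =
    match !rest with
    | [] -> None
    | c :: tl -> rest := tl; Some (Bytes.of_string c)
  in
  { buf = Bytes.empty; pos = 0; len = 0; refill }

(* Install the next chunk; returns false at end of input. *)
let fill src =
  match src.refill () with
  | None -> false
  | Some chunk ->
    src.buf <- chunk; src.pos <- 0; src.len <- Bytes.length chunk; true

(* One scratch buffer shared by every token scan. *)
let scratch = Buffer.create 256

(* Read up to the closing quote, copying chunk spans into [scratch]. *)
let parse_string src =
  Buffer.clear scratch;
  let fin = ref false in
  while not !fin do
    if src.pos >= src.len && not (fill src) then failwith "unterminated string";
    let start = src.pos in
    let i = ref start in
    while !i < src.len && Bytes.get src.buf !i <> '"' do incr i done;
    Buffer.add_subbytes scratch src.buf start (!i - start);
    if !i < src.len then (src.pos <- !i + 1; fin := true)
    else src.pos <- src.len
  done;
  Buffer.contents scratch

(* Same grammar, but only the position moves: nothing is copied. *)
let skip_string src =
  let fin = ref false in
  while not !fin do
    if src.pos >= src.len && not (fill src) then failwith "unterminated string";
    let i = ref src.pos in
    while !i < src.len && Bytes.get src.buf !i <> '"' do incr i done;
    if !i < src.len then (src.pos <- !i + 1; fin := true)
    else src.pos <- src.len
  done

(* Skip the first string, then parse the second, across chunk boundaries. *)
let demo () =
  let src = of_chunks [ "ab"; "c\"de"; "f\"rest" ] in
  skip_string src;
  parse_string src
```

The skip path only advances the position, so the only allocation in this sketch comes from the parse path, when it builds the result string from the scratch buffer.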
Status#
- AST: complete (booleans, integers, strings, identifiers, lists, groups, optional/filtered values, env bindings, prefix and binary relops, logical ops, sections).
- Lexer / parser / printer: round-trip tested on hand-written inputs and real opam files from this monorepo.
- Codec combinators: bool, int, string, ident, list, option, map, enum, filtered, constraint_, File.obj / field / opt / finish.
- Tests: 71 cases, including 29 negative parser tests.
- Out of scope (for now): fuzz target, jsont bridge, advanced codec combinators (fold / iterate / unknown-handling) — add them as needed.