opam: new streaming codec library for opam files
A type-safe codec library inspired by Jsont and the monorepo's
ocaml-toml / ocaml-json conventions:
- lib/: value AST, hand-written streaming Lexer, recursive-descent
Parser, canonical Printer, codec combinators (bool/int/string/
ident/list/option/map/enum/filtered/constraint_, plus a File record
builder). Errors extend Loc.Error.kind with typed variants
(Unexpected_char, Unterminated_string, Sort_mismatch,
Missing_field, ...) and register a printer at load.
- lib/bytesrw/: streaming I/O. The lexer reads from a mutable
Lexer.source (bytes + pos + len + refill callback); bytesrw
feeds slices straight in via source_of_reader -- no per-byte
callback, no copy. of_string wraps the input string with
Bytes.unsafe_of_string, so string input is read in place without a copy.
- Parser.field: token-level skip-parse for extracting one
top-level field. Mirrors parse_value structurally but
allocates nothing for the skipped spans and short-circuits
on match.
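
The refillable source described under lib/bytesrw/ could be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the field names follow the description above (bytes + pos + len + refill callback), while source_of_string, source_of_chunks, next and drain are hypothetical names, and a list of string chunks stands in for a bytesrw reader.

```ocaml
(* Hypothetical sketch of a refillable lexer source. Only the general shape
   (bytes + pos + len + refill) comes from the commit text. *)
type source = {
  mutable buf : bytes;      (* current slice *)
  mutable pos : int;        (* index of the next byte to read *)
  mutable len : int;        (* end of valid data in [buf] *)
  refill : source -> bool;  (* install the next slice; false at end of input *)
}

(* Zero-copy view of a string. This is safe only because nothing below
   ever writes to [buf]. *)
let source_of_string s =
  { buf = Bytes.unsafe_of_string s; pos = 0; len = String.length s;
    refill = (fun _ -> false) }

(* A list of chunks stands in for a streaming reader: each refill installs
   the next non-empty slice directly, without copying it. *)
let source_of_chunks chunks =
  let rest = ref chunks in
  let rec refill src =
    match !rest with
    | [] -> false
    | c :: tl ->
      rest := tl;
      if String.length c = 0 then refill src
      else begin
        src.buf <- Bytes.unsafe_of_string c;
        src.pos <- 0;
        src.len <- String.length c;
        true
      end
  in
  { buf = Bytes.empty; pos = 0; len = 0; refill }

(* Next byte as an int, or -1 at end of input. The refill callback runs
   once per slice, not once per byte. *)
let rec next src =
  if src.pos < src.len then begin
    let c = Char.code (Bytes.unsafe_get src.buf src.pos) in
    src.pos <- src.pos + 1;
    c
  end
  else if src.refill src then next src
  else -1

(* Usage helper: read a source to the end and return its contents. *)
let drain src =
  let b = Buffer.create 16 in
  let rec go () =
    match next src with
    | -1 -> ()
    | c -> Buffer.add_char b (Char.chr c); go ()
  in
  go ();
  Buffer.contents b
```

With this shape the string and streaming paths share one read loop; only the refill function differs.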
Profile-guided optimisations driven by memtrace_hotspots:
- Input callback returns int (-1 = EOF) instead of char option, so
no Some allocation per byte. The int is converted to char via
Char.unsafe_chr for matching against character literals.
- Single shared Buffer.t per lexer, cleared between token scans.
- Loc.t is built lazily: the lexer tracks token start/end as
mutable ints and Lexer.last_loc / current_loc materialise a
Loc.t only when an error is raised. Parser lookahead caches
tokens (not (token, loc) pairs).
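
The three optimisations above can be illustrated together in a small sketch. It assumes a simplified lexer over a plain string; the type and function names (lexer, make, peek, is_ident_char, scan_ident) and the loc record are hypothetical stand-ins, with only last_loc borrowed from the description above.

```ocaml
(* Hypothetical stand-in for Loc.t: just a byte range. *)
type loc = { start_pos : int; end_pos : int }

type lexer = {
  input : string;
  mutable pos : int;
  mutable tok_start : int;  (* token bounds kept as plain mutable ints *)
  mutable tok_end : int;
  buf : Buffer.t;           (* one buffer shared by every token scan *)
}

let make input =
  { input; pos = 0; tok_start = 0; tok_end = 0; buf = Buffer.create 64 }

(* Returns the next byte as an int, or -1 at EOF, so no option value is
   allocated per byte. *)
let peek lx =
  if lx.pos < String.length lx.input
  then Char.code (String.unsafe_get lx.input lx.pos)
  else -1

(* The int is converted with Char.unsafe_chr only after the EOF check, so
   it can be matched against character literals. *)
let is_ident_char c =
  c >= 0 &&
  (match Char.unsafe_chr c with
   | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' | '_' -> true
   | _ -> false)

(* Scan one identifier, reusing the shared buffer and recording the token
   bounds without building a location value. *)
let scan_ident lx =
  Buffer.clear lx.buf;
  lx.tok_start <- lx.pos;
  while is_ident_char (peek lx) do
    Buffer.add_char lx.buf (Char.unsafe_chr (peek lx));
    lx.pos <- lx.pos + 1
  done;
  lx.tok_end <- lx.pos;
  Buffer.contents lx.buf

(* The location record is materialised only on request, for example when
   an error is about to be raised. *)
let last_loc lx = { start_pos = lx.tok_start; end_pos = lx.tok_end }
```

On the success path no location record is ever allocated; last_loc is the only place one is built.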
Benchmark on real opam files in this monorepo vs opam-file-format:
Geomean full ocaml-opam: 206.1 MB/s (2.30x opam-file-format)
Geomean field ocaml-opam: 333.4 MB/s (3.72x opam-file-format)
Geomean opam-file-format: 89.7 MB/s
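
For reference, the speedup factors follow directly from the geomean throughputs. The sketch below shows one way to compute a geometric mean and the ratios; geomean and speedup are hypothetical helper names, not functions from the benchmark harness, and only the three aggregate figures above are used as inputs.

```ocaml
(* Geometric mean of positive values: exp of the mean of the logs. *)
let geomean xs =
  let n = float_of_int (List.length xs) in
  exp (List.fold_left (fun acc x -> acc +. log x) 0.0 xs /. n)

(* Ratio of a throughput to the baseline, formatted to two decimals. *)
let speedup ~baseline x = Printf.sprintf "%.2fx" (x /. baseline)

let () =
  print_endline (speedup ~baseline:89.7 206.1);  (* prints 2.30x *)
  print_endline (speedup ~baseline:89.7 333.4)   (* prints 3.72x *)
```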
77 tests (value / lexer / parser / printer / codec / opam_error /
opam_bytesrw) including 29 negative parser tests covering
unbalanced delimiters, escape errors, lone operators, garbage at
top level, and error-location precision through the 2-token
lookahead.