Streaming opam file codec for OCaml
0
fork

Configure Feed

Select the types of activity you want to include in your feed.

opam: new streaming codec library for opam files

A type-safe codec library inspired by Jsont and the monorepo's
ocaml-toml / ocaml-json conventions:

- lib/: value AST, hand-written streaming Lexer, recursive-descent
Parser, canonical Printer, codec combinators (bool/int/string/
ident/list/option/map/enum/filtered/constraint_ plus File record
builder). Errors extend Loc.Error.kind with typed variants
(Unexpected_char, Unterminated_string, Sort_mismatch,
Missing_field, ...) and register a printer at load.
- lib/bytesrw/: streaming I/O. The lexer reads from a mutable
Lexer.source (bytes + pos + len + refill callback); bytesrw
feeds slices straight in via source_of_reader -- no per-byte
callback, no copy. of_string maps the input string with
Bytes.unsafe_of_string for zero-overhead reads.
- Parser.field: token-level skip-parse for extracting one
top-level field. Mirrors parse_value structurally but
allocates nothing for the skipped spans and short-circuits
on match.

Profile-guided optimisations driven by memtrace_hotspots:

- Input callback returns int (-1 = EOF) instead of char option, so
no Some-boxing per byte. Bytes are unboxed to char via
Char.unsafe_chr for matching against character literals.
- Single shared Buffer.t per lexer, cleared between token scans.
- Loc.t is built lazily: the lexer tracks token start/end as
mutable ints and Lexer.last_loc / current_loc materialise a
Loc.t only when an error is raised. Parser lookahead caches
tokens (not (token, loc) pairs).

Benchmark on real opam files in this monorepo vs opam-file-format:

Geomean full ocaml-opam: 206.1 MB/s (2.30x opam-file-format)
Geomean field ocaml-opam: 333.4 MB/s (3.72x opam-file-format)
Geomean opam-file-format: 89.7 MB/s

77 tests (value / lexer / parser / printer / codec / opam_error /
opam_bytesrw) including 29 negative parser tests covering
unbalanced delimiters, escape errors, lone operators, garbage at
top level, and error-location precision through the 2-token
lookahead.

+3457
+1
.ocamlformat
··· 1 + version = 0.29.0
+123
README.md
··· 1 + # ocaml-opam 2 + 3 + A type-safe codec library for opam files using a combinator-based 4 + approach inspired by [Jsont](https://erratique.ch/software/jsont) and 5 + the conventions of [ocaml-toml](../ocaml-toml/) and 6 + [ocaml-json](../ocaml-json/). 7 + 8 + ## Layout 9 + 10 + - `lib/` — value AST, lexer, parser, printer, codec combinators. 11 + - `lib/bytesrw/` — streaming I/O via [Bytesrw](https://erratique.ch/software/bytesrw). 12 + 13 + For Eio, combine with [`bytesrw-eio`](../ocaml-bytesrw-eio/): 14 + 15 + ```ocaml 16 + let r = Bytesrw_eio.bytes_reader_of_flow flow in 17 + Opam_bytesrw.of_reader r 18 + ``` 19 + 20 + For Unix channels, use Bytesrw directly: 21 + 22 + ```ocaml 23 + let r = Bytesrw.Bytes.Reader.of_in_channel (open_in path) in 24 + Opam_bytesrw.of_reader ~file:path r 25 + ``` 26 + 27 + ## Quick start 28 + 29 + Define a codec for a record: 30 + 31 + ```ocaml 32 + type pkg = { name : string; version : string; depends : string list } 33 + 34 + let pkg_codec : pkg Opam.File.t = 35 + Opam.File.( 36 + obj (fun name version depends -> { name; version; depends }) 37 + |> field "name" Opam.string ~enc:(fun p -> p.name) 38 + |> field "version" Opam.string ~enc:(fun p -> p.version) 39 + |> field "depends" Opam.(list string) 40 + ~dec_absent:[] ~enc:(fun p -> p.depends) 41 + |> finish) 42 + 43 + let () = 44 + match Opam.decode_string pkg_codec input with 45 + | Ok pkg -> ... 46 + | Error e -> prerr_endline (Opam.Error.to_string e) 47 + ``` 48 + 49 + For raw value parsing without codecs, use `Opam_bytesrw.of_string` / 50 + `of_reader`. 51 + 52 + ## Errors 53 + 54 + Errors carry a `Loc.t` source location. Typed kinds live as extensions 55 + of `Loc.Error.kind`: 56 + 57 + ```ocaml 58 + match Opam.decode_string pkg_codec s with 59 + | Ok _ -> () 60 + | Error (_, _, Opam.Error.Missing_field name) -> ... 61 + | Error (_, _, Opam.Error.Sort_mismatch { expected; found }) -> ... 62 + | Error e -> prerr_endline (Opam.Error.to_string e) 63 + ``` 64 + 65 + A printer is registered at module load so `Loc.Error.to_string` 66 + renders typed kinds with helpful messages. 67 + 68 + ## Benchmarks 69 + 70 + `bench/bench.ml` compares ocaml-opam against 71 + [`opam-file-format`](https://github.com/ocaml/opam-file-format) on 72 + real opam files. On typical opam manifests in this monorepo: 73 + 74 + ```text 75 + Geomean full ocaml-opam: 206.1 MB/s (2.30x opam-file-format) 76 + Geomean field ocaml-opam: 333.4 MB/s (3.72x opam-file-format) 77 + Geomean opam-file-format: 89.7 MB/s 78 + ``` 79 + 80 + Two decoding paths: 81 + 82 + - **full**: build the complete {!Opam.Value.file} AST. Both libraries 83 + do this. 84 + - **field**: use `Opam.Parser.find_field` (a.k.a. 85 + `Opam_bytesrw.find_field`) to extract a single top-level field. 86 + ocaml-opam token-skips unwanted fields and sections without building 87 + their values — and stops as soon as the requested field is found. 88 + opam-file-format has no skip-parse path, so it still does full parse 89 + there. 90 + 91 + Run with: 92 + 93 + ```sh 94 + dune exec ocaml-opam/bench/bench.exe -- ocaml-paseto/paseto.opam ... 95 + ``` 96 + 97 + ### How we got there 98 + 99 + 1. Hand-written lexer reading from a mutable {!Lexer.source} (bytes + 100 + pos + len + refill callback). No per-byte function calls; bytesrw 101 + slices are fed directly into the source. 102 + 2. Single shared scratch `Buffer.t` reused across token scans. 103 + 3. No `Loc.t` allocation per token — the lexer tracks position as 104 + mutable ints and builds `Loc.t` only on demand for errors. 105 + 4. Parser lookahead caches just tokens (not `(token, loc)` pairs). 106 + 5. `find_field` uses token-level skipping for unwanted fields: the 107 + `skip_*` helpers mirror the `parse_*` grammar but allocate nothing 108 + for the skipped spans. 109 + 110 + ## Status 111 + 112 + - AST: complete (booleans, integers, strings, identifiers, lists, 113 + groups, optional/filtered values, env bindings, prefix and binary 114 + relops, logical ops, sections). 115 + - Lexer / parser / printer: round-trip tested on hand-written inputs 116 + and real opam files from this monorepo. 117 + - Codec combinators: `bool`, `int`, `string`, `ident`, `list`, 118 + `option`, `map`, `enum`, `filtered`, `constraint_`, `File.obj` / 119 + `field` / `opt` / `finish`. 120 + - Tests: 71 cases, including 29 negative parser tests. 121 + - Out of scope (for now): fuzz target, jsont bridge, advanced codec 122 + combinators (fold / iterate / unknown-handling) — add them as 123 + needed.
+169
bench/bench.ml
··· 1 + (* Benchmark ocaml-opam against opam-file-format on real opam files. 2 + 3 + Two modes per file: 4 + - [full] : parse the whole file into the AST 5 + - [field] : parse + extract just the [depends:] field's raw value 6 + (no further structural decoding of the value) 7 + 8 + Methodology mirrors ocaml-json/bench/bench.ml: 9 + - one untimed warmup 10 + - iterate until [min_wall_time_s] seconds elapsed and at least 11 + [min_iters] iterations 12 + - report best (min) and median throughput in MB/s. 13 + *) 14 + 15 + let () = Memtrace.trace_if_requested () 16 + 17 + let read_file path = 18 + let ic = open_in_bin path in 19 + let n = in_channel_length ic in 20 + let b = Bytes.create n in 21 + really_input ic b 0 n; 22 + close_in ic; 23 + Bytes.unsafe_to_string b 24 + 25 + let min_wall_time_s = 1.0 26 + let min_iters = 5 27 + 28 + let run_mode ~content ~decode = 29 + decode content; 30 + Gc.compact (); 31 + let times = ref [] in 32 + let alloc_start = Gc.allocated_bytes () in 33 + let t_budget_end = Unix.gettimeofday () +. min_wall_time_s in 34 + let rec loop i = 35 + if i >= min_iters && Unix.gettimeofday () >= t_budget_end then i 36 + else begin 37 + Gc.full_major (); 38 + let t0 = Unix.gettimeofday () in 39 + decode content; 40 + let t1 = Unix.gettimeofday () in 41 + times := (t1 -. t0) :: !times; 42 + loop (i + 1) 43 + end 44 + in 45 + let iters = loop 0 in 46 + let alloc_end = Gc.allocated_bytes () in 47 + let ts = Array.of_list !times in 48 + Array.sort compare ts; 49 + let min_s = ts.(0) in 50 + let median_s = 51 + if iters mod 2 = 0 then (ts.((iters / 2) - 1) +. ts.(iters / 2)) /. 2.0 52 + else ts.(iters / 2) 53 + in 54 + let alloc_mb_per_iter = 55 + (alloc_end -. alloc_start) /. float_of_int iters /. 1_048_576.0 56 + in 57 + (iters, min_s, median_s, alloc_mb_per_iter) 58 + 59 + (* Decoders *) 60 + 61 + let opam_full content = 62 + let _ = Opam_bytesrw.of_string content in 63 + () 64 + 65 + let opam_field content = 66 + match Opam_bytesrw.field "depends" content with Some _ | None -> () 67 + 68 + let opamff_full content = 69 + let _ = OpamParser.FullPos.string content "<bench>" in 70 + () 71 + 72 + let opamff_field content = 73 + let f = OpamParser.FullPos.string content "<bench>" in 74 + let module OP = OpamParserTypes.FullPos in 75 + let _ = 76 + List.find_opt 77 + (fun item -> 78 + match item.OP.pelem with 79 + | OP.Variable (n, _) when n.OP.pelem = "depends" -> true 80 + | _ -> false) 81 + f.OP.file_contents 82 + in 83 + () 84 + 85 + (* Try-and-skip to handle files that one or both parsers reject. *) 86 + let try_decode label decode content = 87 + try 88 + decode content; 89 + true 90 + with e -> 91 + Fmt.epr "%s: %s@." label (Printexc.to_string e); 92 + false 93 + 94 + let throughput size_mb time_s = size_mb /. time_s 95 + 96 + let file path = 97 + let name = Filename.basename (Filename.dirname path) in 98 + let content = read_file path in 99 + let size_bytes = String.length content in 100 + let size_mb = float_of_int size_bytes /. 1_048_576.0 in 101 + if 102 + try_decode "ocaml-opam" opam_full content 103 + && try_decode "opam-file-format" opamff_full content 104 + then 105 + let r1 = run_mode ~content ~decode:opam_full in 106 + let r2 = run_mode ~content ~decode:opamff_full in 107 + let r3 = run_mode ~content ~decode:opam_field in 108 + let r4 = run_mode ~content ~decode:opamff_field in 109 + Some (name, size_mb, r1, r2, r3, r4) 110 + else None 111 + 112 + let () = 113 + let files = 114 + Array.sub Sys.argv 1 (Array.length Sys.argv - 1) |> Array.to_list 115 + in 116 + if files = [] then begin 117 + prerr_endline "usage: bench <opam-file> ..."; 118 + exit 1 119 + end; 120 + Fmt.pr 121 + "| file | size kB | full ocaml-opam MB/s | full opam-ff MB/s | speedup | \ 122 + field ocaml-opam MB/s | field opam-ff MB/s | speedup |@."; 123 + Fmt.pr 124 + "|------|---------|----------------------|-------------------|---------|----------------------|--------------------|---------|@."; 125 + let results = List.filter_map file files in 126 + List.iter 127 + (fun (name, size_mb, r1, r2, r3, r4) -> 128 + let _, t1_min, _, _ = r1 in 129 + let _, t2_min, _, _ = r2 in 130 + let _, t3_min, _, _ = r3 in 131 + let _, t4_min, _, _ = r4 in 132 + let s1 = throughput size_mb t1_min in 133 + let s2 = throughput size_mb t2_min in 134 + let s3 = throughput size_mb t3_min in 135 + let s4 = throughput size_mb t4_min in 136 + Fmt.pr 137 + "| %-22s | %7.2f | %20.1f | %17.1f | %6.2fx | %20.1f | %18.1f | %6.2fx \ 138 + |@." 139 + name (size_mb *. 1024.0) s1 s2 (s1 /. s2) s3 s4 (s3 /. s4)) 140 + results; 141 + let n = List.length results in 142 + if n > 0 then begin 143 + let log_sum f = 144 + List.fold_left (fun acc r -> acc +. log (f r)) 0.0 results 145 + in 146 + let geo f = exp (log_sum f /. float_of_int n) in 147 + let s1 r = 148 + let _, sm, (_, t, _, _), _, _, _ = r in 149 + throughput sm t 150 + in 151 + let s2 r = 152 + let _, sm, _, (_, t, _, _), _, _ = r in 153 + throughput sm t 154 + in 155 + let s3 r = 156 + let _, sm, _, _, (_, t, _, _), _ = r in 157 + throughput sm t 158 + in 159 + let s4 r = 160 + let _, sm, _, _, _, (_, t, _, _) = r in 161 + throughput sm t 162 + in 163 + Fmt.pr "@.Geomean full ocaml-opam: %.1f MB/s@." (geo s1); 164 + Fmt.pr "Geomean full opam-ff: %.1f MB/s@." (geo s2); 165 + Fmt.pr "Geomean speedup full: %.2fx@." (geo s1 /. geo s2); 166 + Fmt.pr "Geomean field ocaml-opam: %.1f MB/s@." (geo s3); 167 + Fmt.pr "Geomean field opam-ff: %.1f MB/s@." (geo s4); 168 + Fmt.pr "Geomean speedup field: %.2fx@." (geo s3 /. geo s4) 169 + end
+3
bench/dune
··· 1 + (executable 2 + (name bench) 3 + (libraries opam opam.bytesrw opam-file-format unix memtrace fmt))
+26
dune-project
··· 1 + (lang dune 3.21) 2 + (name opam) 3 + 4 + (generate_opam_files true) 5 + 6 + (license ISC) 7 + (authors "Thomas Gazagnaire <thomas@gazagnaire.org>") 8 + (maintainers "Thomas Gazagnaire <thomas@gazagnaire.org>") 9 + (source (tangled gazagnaire.org/ocaml-opam)) 10 + 11 + (package 12 + (name opam) 13 + (synopsis "Streaming opam file codec for OCaml") 14 + (tags (org:blacksun codec format opam)) 15 + (description 16 + "A type-safe codec library for opam files using a combinator-based 17 + approach inspired by Jsont and the ocaml-toml package. The core library 18 + provides a value AST and codec combinators. The opam.bytesrw subpackage 19 + provides a streaming parser and encoder built on Bytesrw; combine with 20 + bytesrw-eio to drive it from an Eio flow.") 21 + (depends 22 + (ocaml (>= 4.14.0)) 23 + (fmt (>= 0.9.0)) 24 + (bytesrw (>= 0.1.0)) 25 + (loc (>= 0.1)) 26 + (alcotest :with-test)))
+4
lib/bytesrw/dune
··· 1 + (library 2 + (name opam_bytesrw) 3 + (public_name opam.bytesrw) 4 + (libraries opam bytesrw))
+64
lib/bytesrw/opam_bytesrw.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + open Bytesrw 7 + 8 + (* Build a [Lexer.source] that pulls slices straight from a 9 + [Bytes.Reader.t]. The refill callback re-points the source's [buf] 10 + at each new slice without copying. *) 11 + let source_of_reader r : Opam.Lexer.source = 12 + let refill (s : Opam.Lexer.source) = 13 + let slice = Bytes.Reader.read r in 14 + if Bytesrw.Bytes.Slice.is_eod slice then s.eof <- true 15 + else begin 16 + s.buf <- Bytesrw.Bytes.Slice.bytes slice; 17 + s.pos <- Bytesrw.Bytes.Slice.first slice; 18 + s.len <- s.pos + Bytesrw.Bytes.Slice.length slice 19 + end 20 + in 21 + { buf = Bytes.empty; pos = 0; len = 0; eof = false; refill } 22 + 23 + let lexer_of_reader ?file r = Opam.Lexer.of_source ?file (source_of_reader r) 24 + 25 + let of_reader ?file r = 26 + let lex = lexer_of_reader ?file r in 27 + Opam.Parser.parse lex 28 + 29 + let of_string ?file s = 30 + let lex = Opam.Lexer.of_string ?file s in 31 + Opam.Parser.parse lex 32 + 33 + let parse_value_string ?file s = 34 + let lex = Opam.Lexer.of_string ?file s in 35 + Opam.Parser.parse_value lex 36 + 37 + let parse_value_reader ?file r = 38 + let lex = lexer_of_reader ?file r in 39 + Opam.Parser.parse_value lex 40 + 41 + let field ?file name s = 42 + let lex = Opam.Lexer.of_string ?file s in 43 + Opam.Parser.field lex name 44 + 45 + let field_reader ?file name r = 46 + let lex = lexer_of_reader ?file r in 47 + Opam.Parser.field lex name 48 + 49 + let to_string f = Opam.Printer.to_string f 50 + 51 + let to_writer w f = 52 + let s = to_string f in 53 + Bytes.Writer.write_string w s; 54 + Bytes.Writer.write_eod w 55 + 56 + let decode_reader ?file fc r = 57 + try 58 + let f = of_reader ?file r in 59 + Opam.File.of_file fc f 60 + with Opam.Error e -> Error e 61 + 62 + let encode_writer fc x w = 63 + let f = Opam.File.to_file fc x in 64 + to_writer w f
+59
lib/bytesrw/opam_bytesrw.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** Bytesrw integration for opam parsing and encoding. 7 + 8 + Streaming I/O that drives the {!Opam.Lexer} from a [Bytes.Reader.t] and 9 + emits printer output to a [Bytes.Writer.t]. *) 10 + 11 + open Bytesrw 12 + 13 + (** {1:parse Parsing} *) 14 + 15 + val of_string : ?file:string -> string -> Opam.Value.file 16 + (** [of_string s] parses [s] as an opam file. 17 + @raise Opam.Error on parse errors. *) 18 + 19 + val of_reader : ?file:string -> Bytes.Reader.t -> Opam.Value.file 20 + (** [of_reader r] parses an opam file from [r]. 21 + @raise Opam.Error on parse errors. *) 22 + 23 + val parse_value_string : ?file:string -> string -> Opam.Value.t 24 + (** [parse_value_string s] parses [s] as a single opam value (no surrounding 25 + [field:]). 26 + @raise Opam.Error on parse errors. *) 27 + 28 + val parse_value_reader : ?file:string -> Bytes.Reader.t -> Opam.Value.t 29 + (** [parse_value_reader r] parses a single opam value from [r]. 30 + @raise Opam.Error on parse errors. *) 31 + 32 + val field : ?file:string -> string -> string -> Opam.Value.t option 33 + (** [field ?file name s] is [Some v] if the top-level field [name] appears in 34 + [s], or [None] otherwise. Scans [s] with token-level skipping — never 35 + allocates a {!Opam.Value.t} for unwanted fields and stops as soon as [name] 36 + is found. Much cheaper than {!of_string} + {!Opam.Value.find} on large 37 + inputs. 38 + @raise Opam.Error on syntax errors encountered while scanning. *) 39 + 40 + val field_reader : 41 + ?file:string -> string -> Bytes.Reader.t -> Opam.Value.t option 42 + (** [field_reader ?file name r] is like {!field} but reads from [r]. *) 43 + 44 + (** {1:encode Encoding} *) 45 + 46 + val to_string : Opam.Value.file -> string 47 + (** [to_string f] is [f] rendered as canonical opam syntax. *) 48 + 49 + val to_writer : Bytes.Writer.t -> Opam.Value.file -> unit 50 + (** [to_writer w f] writes [f] to [w] and emits an end-of-data slice. *) 51 + 52 + (** {1:codec Codec I/O} *) 53 + 54 + val decode_reader : 55 + ?file:string -> 'a Opam.File.t -> Bytes.Reader.t -> ('a, Opam.Error.t) result 56 + (** [decode_reader c r] parses [r] and decodes with [c]. *) 57 + 58 + val encode_writer : 'a Opam.File.t -> 'a -> Bytes.Writer.t -> unit 59 + (** [encode_writer c x w] encodes [x] with [c] and writes to [w]. *)
+7
lib/dune
··· 1 + (library 2 + (name opam) 3 + (public_name opam) 4 + (modules opam value lexer parser printer opam_error) 5 + (libraries 6 + fmt 7 + (re_export loc)))
+464
lib/lexer.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + type token = 7 + | LBRACKET 8 + | RBRACKET 9 + | LBRACE 10 + | RBRACE 11 + | LPAREN 12 + | RPAREN 13 + | COLON 14 + | COMMA 15 + | EQ 16 + | NEQ 17 + | GT 18 + | GEQ 19 + | LT 20 + | LEQ 21 + | AND 22 + | OR 23 + | NOT 24 + | QMARK 25 + | PLUSEQ 26 + | EQPLUS 27 + | COLONEQ 28 + | EQCOLON 29 + | STRING of string 30 + | INT of int 31 + | BOOL of bool 32 + | IDENT of string 33 + | EOF 34 + 35 + let pp_token ppf = function 36 + | LBRACKET -> Fmt.string ppf "[" 37 + | RBRACKET -> Fmt.string ppf "]" 38 + | LBRACE -> Fmt.string ppf "{" 39 + | RBRACE -> Fmt.string ppf "}" 40 + | LPAREN -> Fmt.string ppf "(" 41 + | RPAREN -> Fmt.string ppf ")" 42 + | COLON -> Fmt.string ppf ":" 43 + | COMMA -> Fmt.string ppf "," 44 + | EQ -> Fmt.string ppf "=" 45 + | NEQ -> Fmt.string ppf "!=" 46 + | GT -> Fmt.string ppf ">" 47 + | GEQ -> Fmt.string ppf ">=" 48 + | LT -> Fmt.string ppf "<" 49 + | LEQ -> Fmt.string ppf "<=" 50 + | AND -> Fmt.string ppf "&" 51 + | OR -> Fmt.string ppf "|" 52 + | NOT -> Fmt.string ppf "!" 53 + | QMARK -> Fmt.string ppf "?" 54 + | PLUSEQ -> Fmt.string ppf "+=" 55 + | EQPLUS -> Fmt.string ppf "=+" 56 + | COLONEQ -> Fmt.string ppf ":=" 57 + | EQCOLON -> Fmt.string ppf "=:" 58 + | STRING s -> Fmt.pf ppf {|STRING %S|} s 59 + | INT i -> Fmt.pf ppf "INT %d" i 60 + | BOOL b -> Fmt.pf ppf "BOOL %b" b 61 + | IDENT s -> Fmt.pf ppf "IDENT %s" s 62 + | EOF -> Fmt.string ppf "EOF" 63 + 64 + (* The source holds the current chunk; refill is called when [pos = len] 65 + and [not eof]. Hot read path is [Bytes.unsafe_get src.buf src.pos] 66 + - no callbacks, no allocation. *) 67 + type source = { 68 + mutable buf : bytes; 69 + mutable pos : int; 70 + mutable len : int; 71 + mutable eof : bool; 72 + refill : source -> unit; 73 + } 74 + 75 + (* Position tracking is kept as raw mutable ints on the lexer. We never 76 + build a [Loc.t] per token — only on demand for errors via 77 + {!last_loc} / {!current_loc}. *) 78 + type t = { 79 + src : source; 80 + file : string; 81 + scratch : Buffer.t; 82 + mutable byte : int; 83 + mutable line : int; 84 + mutable line_byte : int; 85 + (* Position info for the most recently returned token. *) 86 + mutable tok_start_byte : int; 87 + mutable tok_start_line : int; 88 + mutable tok_start_line_byte : int; 89 + mutable tok_end_byte : int; 90 + mutable tok_end_line : int; 91 + mutable tok_end_line_byte : int; 92 + } 93 + 94 + let of_source ?(file = "-") src = 95 + { 96 + src; 97 + file; 98 + scratch = Buffer.create 64; 99 + byte = 0; 100 + line = 1; 101 + line_byte = 0; 102 + tok_start_byte = 0; 103 + tok_start_line = 1; 104 + tok_start_line_byte = 0; 105 + tok_end_byte = 0; 106 + tok_end_line = 1; 107 + tok_end_line_byte = 0; 108 + } 109 + 110 + let of_string ?file s = 111 + let buf = Bytes.unsafe_of_string s in 112 + let src = 113 + { buf; pos = 0; len = Bytes.length buf; eof = true; refill = (fun _ -> ()) } 114 + in 115 + of_source ?file src 116 + 117 + type input = unit -> int 118 + 119 + let of_input ?file f = 120 + let scratch = Bytes.create 1 in 121 + let refill src = 122 + let c = f () in 123 + if c < 0 then src.eof <- true 124 + else begin 125 + Bytes.unsafe_set scratch 0 (Char.unsafe_chr c); 126 + src.buf <- scratch; 127 + src.pos <- 0; 128 + src.len <- 1 129 + end 130 + in 131 + let src = { buf = scratch; pos = 0; len = 0; eof = false; refill } in 132 + of_source ?file src 133 + 134 + let file l = l.file 135 + let pp ppf l = Fmt.pf ppf "Lexer(%s, line %d, byte %d)" l.file l.line l.byte 136 + 137 + (* Ensure at least one byte is available, or set eof. Returns true if a 138 + byte is now available. *) 139 + let rec ensure_byte l = 140 + let s = l.src in 141 + if s.pos < s.len then true 142 + else if s.eof then false 143 + else begin 144 + s.refill s; 145 + ensure_byte l 146 + end 147 + 148 + let peek_int l = 149 + if ensure_byte l then Char.code (Bytes.unsafe_get l.src.buf l.src.pos) else -1 150 + 151 + let peek_char l = Bytes.unsafe_get l.src.buf l.src.pos 152 + let is_eof l = not (ensure_byte l) 153 + 154 + (* Consume one byte; returns its int code (-1 at EOF). *) 155 + let advance l = 156 + if ensure_byte l then begin 157 + let s = l.src in 158 + let c = Bytes.unsafe_get s.buf s.pos in 159 + s.pos <- s.pos + 1; 160 + if c = '\n' then begin 161 + l.byte <- l.byte + 1; 162 + l.line <- l.line + 1; 163 + l.line_byte <- l.byte 164 + end 165 + else l.byte <- l.byte + 1; 166 + Char.code c 167 + end 168 + else -1 169 + 170 + (* Consume one byte that the caller knows exists. *) 171 + let bump l = 172 + let s = l.src in 173 + let c = Bytes.unsafe_get s.buf s.pos in 174 + s.pos <- s.pos + 1; 175 + if c = '\n' then begin 176 + l.byte <- l.byte + 1; 177 + l.line <- l.line + 1; 178 + l.line_byte <- l.byte 179 + end 180 + else l.byte <- l.byte + 1 181 + 182 + (* Snapshot the lexer's current cursor into the [tok_start_*] fields. *) 183 + let mark_start l = 184 + l.tok_start_byte <- l.byte; 185 + l.tok_start_line <- l.line; 186 + l.tok_start_line_byte <- l.line_byte 187 + 188 + (* Snapshot the lexer's current cursor into the [tok_end_*] fields. *) 189 + let mark_end l = 190 + l.tok_end_byte <- l.byte; 191 + l.tok_end_line <- l.line; 192 + l.tok_end_line_byte <- l.line_byte 193 + 194 + let last_loc l = 195 + let last_byte = max l.tok_start_byte (l.tok_end_byte - 1) in 196 + Loc.make ~file:l.file ~first_byte:l.tok_start_byte ~last_byte 197 + ~first_line_num:l.tok_start_line ~first_line_byte:l.tok_start_line_byte 198 + ~last_line_num:l.tok_end_line ~last_line_byte:l.tok_end_line_byte 199 + 200 + let current_loc l = 201 + Loc.make ~file:l.file ~first_byte:l.byte ~last_byte:l.byte 202 + ~first_line_num:l.line ~first_line_byte:l.line_byte ~last_line_num:l.line 203 + ~last_line_byte:l.line_byte 204 + 205 + (* Internal helpers that build a one-shot loc for raising errors during 206 + lexing. Most cost is amortised here since errors are rare. *) 207 + let error l msg = 208 + mark_end l; 209 + Opam_error.fail (Loc.Meta.make (last_loc l)) msg 210 + 211 + let errorf l fmt = Fmt.kstr (error l) fmt 212 + 213 + let raise_kind l kind = 214 + mark_end l; 215 + Loc.Error.raise [] (Loc.Meta.make (last_loc l)) kind 216 + 217 + let is_ident_start = function 218 + | 'a' .. 'z' | 'A' .. 'Z' | '_' -> true 219 + | _ -> false 220 + 221 + let is_ident_continue = function 222 + | 'a' .. 'z' | 'A' .. 'Z' | '_' | '0' .. '9' | '-' | '+' | '.' -> true 223 + | _ -> false 224 + 225 + let is_digit = function '0' .. '9' -> true | _ -> false 226 + 227 + let rec skip_line_comment l = 228 + if not (ensure_byte l) then () 229 + else if peek_char l = '\n' then () 230 + else ( 231 + bump l; 232 + skip_line_comment l) 233 + 234 + let rec skip_ws l = 235 + if not (ensure_byte l) then () 236 + else 237 + match peek_char l with 238 + | ' ' | '\t' | '\r' | '\n' -> 239 + bump l; 240 + skip_ws l 241 + | '#' -> 242 + bump l; 243 + skip_line_comment l; 244 + skip_ws l 245 + | _ -> () 246 + 247 + let scratch l = 248 + Buffer.clear l.scratch; 249 + l.scratch 250 + 251 + let hex_value l = function 252 + | '0' .. '9' as c -> Char.code c - Char.code '0' 253 + | 'a' .. 'f' as c -> Char.code c - Char.code 'a' + 10 254 + | 'A' .. 'F' as c -> Char.code c - Char.code 'A' + 10 255 + | _ -> raise_kind l Opam_error.Invalid_hex_escape 256 + 257 + let read_hex_byte l = 258 + let read_hex () = 259 + let c = advance l in 260 + if c < 0 then raise_kind l Opam_error.Invalid_hex_escape 261 + else hex_value l (Char.unsafe_chr c) 262 + in 263 + let h1 = read_hex () in 264 + let h2 = read_hex () in 265 + Char.chr ((h1 lsl 4) lor h2) 266 + 267 + (* Scan a single-line basic string. Leading double-quote already consumed. *) 268 + let scan_basic_string l = 269 + let buf = scratch l in 270 + let rec loop () = 271 + let c = advance l in 272 + if c < 0 then raise_kind l Opam_error.Unterminated_string 273 + else 274 + match Char.unsafe_chr c with 275 + | '"' -> Buffer.contents buf 276 + | '\n' -> error l "newline in single-line string" 277 + | '\\' -> 278 + let e = advance l in 279 + if e < 0 then raise_kind l Opam_error.Unterminated_escape 280 + else begin 281 + (match Char.unsafe_chr e with 282 + | 'n' -> Buffer.add_char buf '\n' 283 + | 'r' -> Buffer.add_char buf '\r' 284 + | 't' -> Buffer.add_char buf '\t' 285 + | 'b' -> Buffer.add_char buf '\b' 286 + | '\\' -> Buffer.add_char buf '\\' 287 + | '"' -> Buffer.add_char buf '"' 288 + | '\n' -> () (* line continuation *) 289 + | 'x' -> Buffer.add_char buf (read_hex_byte l) 290 + | '0' .. '9' as d1c -> 291 + let d1 = Char.code d1c - Char.code '0' in 292 + let read_dec () = 293 + let c = advance l in 294 + if c >= 0 && is_digit (Char.unsafe_chr c) then 295 + c - Char.code '0' 296 + else raise_kind l Opam_error.Invalid_decimal_escape 297 + in 298 + let d2 = read_dec () in 299 + let d3 = read_dec () in 300 + let v = (d1 * 100) + (d2 * 10) + d3 in 301 + if v > 255 then 302 + raise_kind l Opam_error.Decimal_escape_out_of_range 303 + else Buffer.add_char buf (Char.chr v) 304 + | other -> 305 + raise_kind l (Opam_error.Invalid_escape (Fmt.str "\\%c" other))); 306 + loop () 307 + end 308 + | other -> 309 + Buffer.add_char buf other; 310 + loop () 311 + in 312 + loop () 313 + 314 + (* Scan a triple-quoted string. The opening triple-quote has already been 315 + consumed. *) 316 + let scan_triple_string l = 317 + let buf = scratch l in 318 + let rec loop () = 319 + let c = advance l in 320 + if c < 0 then raise_kind l Opam_error.Unterminated_string 321 + else 322 + match Char.unsafe_chr c with 323 + | '"' -> 324 + if ensure_byte l && peek_char l = '"' then begin 325 + bump l; 326 + if ensure_byte l && peek_char l = '"' then begin 327 + bump l; 328 + Buffer.contents buf 329 + end 330 + else ( 331 + Buffer.add_string buf "\"\""; 332 + loop ()) 333 + end 334 + else ( 335 + Buffer.add_char buf '"'; 336 + loop ()) 337 + | other -> 338 + Buffer.add_char buf other; 339 + loop () 340 + in 341 + loop () 342 + 343 + let scan_number l first = 344 + let buf = scratch l in 345 + Buffer.add_char buf first; 346 + let rec loop () = 347 + if ensure_byte l && is_digit (peek_char l) then begin 348 + Buffer.add_char buf (peek_char l); 349 + bump l; 350 + loop () 351 + end 352 + in 353 + loop (); 354 + let s = Buffer.contents buf in 355 + match int_of_string_opt s with 356 + | Some i -> i 357 + | None -> errorf l "invalid integer %s" s 358 + 359 + let scan_ident l first = 360 + let buf = scratch l in 361 + Buffer.add_char buf first; 362 + let rec loop () = 363 + if ensure_byte l && is_ident_continue (peek_char l) then begin 364 + Buffer.add_char buf (peek_char l); 365 + bump l; 366 + loop () 367 + end 368 + in 369 + loop (); 370 + Buffer.contents buf 371 + 372 + let scan_negative_int l = 373 + if ensure_byte l && is_digit (peek_char l) then begin 374 + let first = peek_char l in 375 + bump l; 376 + let buf = scratch l in 377 + Buffer.add_char buf '-'; 378 + Buffer.add_char buf first; 379 + let rec loop () = 380 + if ensure_byte l && is_digit (peek_char l) then begin 381 + Buffer.add_char buf (peek_char l); 382 + bump l; 383 + loop () 384 + end 385 + in 386 + loop (); 387 + let s = Buffer.contents buf in 388 + match int_of_string_opt s with 389 + | Some i -> INT i 390 + | None -> errorf l "invalid integer %s" s 391 + end 392 + else raise_kind l (Opam_error.Unexpected_char '-') 393 + 394 + let scan_string_token l = 395 + if ensure_byte l && peek_char l = '"' then begin 396 + bump l; 397 + if ensure_byte l && peek_char l = '"' then begin 398 + bump l; 399 + STRING (scan_triple_string l) 400 + end 401 + else (* The "" was an empty basic string. *) 402 + STRING "" 403 + end 404 + else STRING (scan_basic_string l) 405 + 406 + let scan_ident_token l first = 407 + let id = scan_ident l first in 408 + match id with "true" -> BOOL true | "false" -> BOOL false | _ -> IDENT id 409 + 410 + let two_char l extends_to longer shorter = 411 + if ensure_byte l && peek_char l = extends_to then begin 412 + bump l; 413 + longer 414 + end 415 + else shorter 416 + 417 + let next l = 418 + skip_ws l; 419 + mark_start l; 420 + let c = advance l in 421 + let tok = 422 + if c < 0 then EOF 423 + else 424 + match Char.unsafe_chr c with 425 + | '[' -> LBRACKET 426 + | ']' -> RBRACKET 427 + | '{' -> LBRACE 428 + | '}' -> RBRACE 429 + | '(' -> LPAREN 430 + | ')' -> RPAREN 431 + | ',' -> COMMA 432 + | '&' -> AND 433 + | '|' -> OR 434 + | '?' -> QMARK 435 + | ':' -> two_char l '=' COLONEQ COLON 436 + | '!' -> two_char l '=' NEQ NOT 437 + | '>' -> two_char l '=' GEQ GT 438 + | '<' -> two_char l '=' LEQ LT 439 + | '=' -> 440 + begin if not (ensure_byte l) then EQ 441 + else 442 + match peek_char l with 443 + | '+' -> 444 + bump l; 445 + EQPLUS 446 + | ':' -> 447 + bump l; 448 + EQCOLON 449 + | _ -> EQ 450 + end 451 + | '+' -> 452 + if ensure_byte l && peek_char l = '=' then begin 453 + bump l; 454 + PLUSEQ 455 + end 456 + else raise_kind l (Opam_error.Unexpected_char '+') 457 + | '"' -> scan_string_token l 458 + | '-' -> scan_negative_int l 459 + | c when is_digit c -> INT (scan_number l c) 460 + | c when is_ident_start c -> scan_ident_token l c 461 + | c -> raise_kind l (Opam_error.Unexpected_char c) 462 + in 463 + mark_end l; 464 + tok
+101
lib/lexer.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** Streaming lexer for the opam file grammar. 7 + 8 + The lexer reads from an [input] callback that returns the next byte (or 9 + [None] at EOF), so the caller controls buffering. Tokens carry a {!Loc.t} 10 + for error reporting. *) 11 + 12 + (** {1:tokens Tokens} *) 13 + 14 + type token = 15 + | LBRACKET 16 + | RBRACKET 17 + | LBRACE 18 + | RBRACE 19 + | LPAREN 20 + | RPAREN 21 + | COLON 22 + | COMMA 23 + | EQ 24 + | NEQ 25 + | GT 26 + | GEQ 27 + | LT 28 + | LEQ 29 + | AND 30 + | OR 31 + | NOT 32 + | QMARK 33 + | PLUSEQ 34 + | EQPLUS 35 + | COLONEQ 36 + | EQCOLON 37 + | STRING of string 38 + | INT of int 39 + | BOOL of bool 40 + | IDENT of string 41 + | EOF 42 + 43 + val pp_token : Format.formatter -> token -> unit 44 + (** [pp_token ppf t] pretty-prints token [t] for error messages. *) 45 + 46 + (** {1:lexer Lexer} *) 47 + 48 + type t 49 + (** Lexer state. *) 50 + 51 + val pp : Format.formatter -> t -> unit 52 + (** [pp ppf l] prints a short summary of [l]'s current position. *) 53 + 54 + (** {2 Buffer source} 55 + 56 + The lexer reads from a {!source}: a mutable buffer plus a refill callback. 57 + This lets sibling I/O libraries (e.g. {!Opam_bytesrw}) feed slices straight 58 + into the lexer without per-byte function calls. For string input use 59 + {!of_string}; for callback-style use {!of_input}. *) 60 + 61 + type source = { 62 + mutable buf : bytes; (** Current chunk. *) 63 + mutable pos : int; (** Next byte to consume in [buf]. *) 64 + mutable len : int; (** End of valid bytes in [buf]. *) 65 + mutable eof : bool; (** [true] if no more data will follow. *) 66 + refill : source -> unit; 67 + (** Called when [pos = len] and [not eof]. Must update fields to either 68 + supply more bytes (e.g. set [buf]/[pos]/[len]) or set [eof] to [true]. 69 + *) 70 + } 71 + 72 + val of_source : ?file:string -> source -> t 73 + (** [of_source ?file src] is a fresh lexer reading from [src]. *) 74 + 75 + val of_string : ?file:string -> string -> t 76 + (** [of_string ?file s] reads directly from [s] with zero per-byte overhead. *) 77 + 78 + type input = unit -> int 79 + (** Pull-based input source: returns the next byte (0..255) or [-1] at EOF. *) 80 + 81 + val of_input : ?file:string -> input -> t 82 + (** [of_input ?file f] wraps a callback into a lexer. Slower than {!of_source} 83 + since each byte requires a function call. *) 84 + 85 + val next : t -> token 86 + (** [next l] returns the next token. Locations are not allocated per token; use 87 + {!last_loc} from the parser when constructing errors. Raises 88 + {!Opam_error.Error} on lexical errors (unterminated string, unexpected 89 + character, etc.). *) 90 + 91 + val last_loc : t -> Loc.t 92 + (** [last_loc l] is the source location of the most recently returned token. 93 + Built lazily — only call this when constructing an error message. *) 94 + 95 + val current_loc : t -> Loc.t 96 + (** [current_loc l] is a zero-width location at the lexer's current cursor. 97 + Useful when a parser-level error is detected before the next token is read. 98 + *) 99 + 100 + val file : t -> string 101 + (** [file l] is the [?file] passed at creation, or ["-"]. *)
+252
lib/opam.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + module Loc = Loc 7 + module Meta = Loc.Meta 8 + module Path = Loc.Path 9 + module Error = Opam_error 10 + module Value = Value 11 + module Lexer = Lexer 12 + module Parser = Parser 13 + module Printer = Printer 14 + 15 + exception Error = Opam_error.Error 16 + 17 + type 'a fmt = Format.formatter -> 'a -> unit 18 + type 'a t = { kind : string; dec : Value.t -> 'a; enc : 'a -> Value.t } 19 + 20 + let kind c = c.kind 21 + 22 + let kind_name = function 23 + | Value.Bool _ -> "bool" 24 + | Value.Int _ -> "int" 25 + | Value.String _ -> "string" 26 + | Value.Ident _ -> "identifier" 27 + | Value.List _ -> "list" 28 + | Value.Group _ -> "group" 29 + | Value.Option _ -> "option" 30 + | Value.Relop _ -> "relational" 31 + | Value.Prefix_relop _ -> "prefix-relational" 32 + | Value.Logop _ -> "logical" 33 + | Value.Pfxop _ -> "prefix" 34 + | Value.Env_binding _ -> "env-binding" 35 + 36 + let err_expected exp v = 37 + Opam_error.sort_mismatch Loc.Meta.none ~expected:exp ~found:(kind_name v) 38 + 39 + let bool = 40 + { 41 + kind = "bool"; 42 + dec = (function Value.Bool b -> b | v -> err_expected "bool" v); 43 + enc = (fun b -> Value.Bool b); 44 + } 45 + 46 + let int = 47 + { 48 + kind = "int"; 49 + dec = (function Value.Int i -> i | v -> err_expected "int" v); 50 + enc = (fun i -> Value.Int i); 51 + } 52 + 53 + let string = 54 + { 55 + kind = "string"; 56 + dec = (function Value.String s -> s | v -> err_expected "string" v); 57 + enc = (fun s -> Value.String s); 58 + } 59 + 60 + let ident = 61 + { 62 + kind = "identifier"; 63 + dec = (function Value.Ident s -> s | v -> err_expected "identifier" v); 64 + enc = (fun s -> Value.Ident s); 65 + } 66 + 67 + let list elt = 68 + { 69 + kind = "list of " ^ elt.kind; 70 + dec = 71 + (function 72 + | Value.List xs -> List.map elt.dec xs 73 + | v -> err_expected "list" v); 74 + enc = (fun xs -> Value.List (List.map elt.enc xs)); 75 + } 76 + 77 + let option elt = 78 + { 79 + kind = "optional " ^ elt.kind; 80 + dec = (fun v -> Some (elt.dec v)); 81 + enc = (function Some v -> elt.enc v | None -> Value.List []); 82 + } 83 + 84 + let map ?kind:k ~dec ~enc c = 85 + let kind = match k with Some k -> k | None -> c.kind in 86 + { kind; dec = (fun v -> dec (c.dec v)); enc = (fun x -> c.enc (enc x)) } 87 + 88 + let enum ?kind:k cases = 89 + let kind = match k with Some k -> k | None -> "enum" in 90 + let rev = List.map (fun (s, v) -> (v, s)) cases in 91 + { 92 + kind; 93 + dec = 94 + (function 95 + | Value.Ident s | Value.String s -> ( 96 + match List.assoc_opt s cases with 97 + | Some v -> v 98 + | None -> 99 + Loc.Error.raise [] Loc.Meta.none (Opam_error.Unknown_enum_value s) 100 + ) 101 + | v -> err_expected "identifier or string" v); 102 + enc = 103 + (fun x -> 104 + match List.find_opt (fun (v, _) -> Stdlib.compare v x = 0) rev with 105 + | Some (_, s) -> Value.Ident s 106 + | None -> failwith "Opam.enum: value not in case list"); 107 + } 108 + 109 + let filtered inner = 110 + { 111 + kind = inner.kind ^ " (filtered)"; 112 + dec = 113 + (function 114 + | Value.Option (v, fs) -> (inner.dec v, fs) 115 + | v -> (inner.dec v, [])); 116 + enc = 117 + (function x, [] -> inner.enc x | x, fs -> Value.Option (inner.enc x, fs)); 118 + } 119 + 120 + let rec flatten_logop op = function 121 + | Value.Logop (op', a, b) when op' = op -> 122 + flatten_logop op a @ flatten_logop op b 123 + | v -> [ v ] 124 + 125 + let rec encode_constraint = function 126 + | [] -> failwith "Opam.constraint_: empty constraint" 127 + | [ (op, ver) ] -> Value.Prefix_relop (op, Value.String ver) 128 + | (op, ver) :: rest -> 129 + Value.Logop 130 + (`And, Value.Prefix_relop (op, Value.String ver), encode_constraint rest) 131 + 132 + let constraint_ = 133 + let dec_atom = function 134 + | Value.Prefix_relop (op, Value.String s) -> (op, s) 135 + | v -> err_expected "version constraint atom" v 136 + in 137 + { 138 + kind = "version constraint"; 139 + dec = 140 + (function 141 + | Value.Logop (`And, _, _) as v -> 142 + List.map dec_atom (flatten_logop `And v) 143 + | v -> [ dec_atom v ]); 144 + enc = encode_constraint; 145 + } 146 + 147 + (* ---- File / record codecs ---- *) 148 + 149 + module File = struct 150 + type 'a codec = 'a t 151 + 152 + type ('o, 'dec) builder = { 153 + kind : string; 154 + dec : Value.item list -> 'dec; 155 + encs : ('o -> Value.item list -> Value.item list) list; 156 + } 157 + 158 + type 'a t = { 159 + file_kind : string; 160 + file_dec : Value.file -> 'a; 161 + file_enc : 'a -> Value.file; 162 + } 163 + 164 + let kind (t : 'a t) = t.file_kind 165 + 166 + let obj ?kind:k ctor = 167 + let kind = match k with Some k -> k | None -> "opam record" in 168 + { kind; dec = (fun _ -> ctor); encs = [] } 169 + 170 + let lookup_value name items = 171 + let rec loop = function 172 + | [] -> None 173 + | Value.Variable (n, v) :: _ when n = name -> Some v 174 + | _ :: rest -> loop rest 175 + in 176 + loop items 177 + 178 + let push_field_ctx kind name f = 179 + try f () 180 + with Opam_error.Error e -> 181 + Opam_error.push_object (kind, Loc.Meta.none) (name, Loc.Meta.none) e 182 + 183 + let field ?dec_absent ~enc name (codec : 'a codec) 184 + (b : ('o, 'a -> 'dec) builder) : ('o, 'dec) builder = 185 + let kind = b.kind in 186 + let dec items = 187 + let f = b.dec items in 188 + match lookup_value name items with 189 + | Some v -> 190 + let decoded = push_field_ctx kind name (fun () -> codec.dec v) in 191 + f decoded 192 + | None -> ( 193 + match dec_absent with 194 + | Some default -> f default 195 + | None -> 196 + push_field_ctx kind name (fun () -> 197 + Opam_error.missing_field Loc.Meta.none name)) 198 + in 199 + let enc_step (o : 'o) acc = 200 + Value.Variable (name, codec.enc (enc o)) :: acc 201 + in 202 + { kind; dec; encs = enc_step :: b.encs } 203 + 204 + let opt ~enc name (codec : 'a codec) (b : ('o, 'a option -> 'dec) builder) : 205 + ('o, 'dec) builder = 206 + let kind = b.kind in 207 + let dec items = 208 + let f = b.dec items in 209 + match lookup_value name items with 210 + | Some v -> 211 + let decoded = 212 + push_field_ctx kind name (fun () -> Some (codec.dec v)) 213 + in 214 + f decoded 215 + | None -> f None 216 + in 217 + let enc_step (o : 'o) acc = 218 + match enc o with 219 + | None -> acc 220 + | Some v -> Value.Variable (name, codec.enc v) :: acc 221 + in 222 + { kind; dec; encs = enc_step :: b.encs } 223 + 224 + let finish (b : ('o, 'o) builder) : 'o t = 225 + let encs_in_order = List.rev b.encs in 226 + let file_dec (f : Value.file) = b.dec f.contents in 227 + let file_enc (o : 'o) = 228 + let items = List.fold_right (fun f acc -> f o acc) encs_in_order [] in 229 + Value.file items 230 + in 231 + { file_kind = b.kind; file_dec; file_enc } 232 + 233 + let of_file_exn (t : 'a t) f = t.file_dec f 234 + let to_file (t : 'a t) x = t.file_enc x 235 + 236 + let of_file t f = 237 + try Ok (of_file_exn t f) with Opam_error.Error e -> Error e 238 + end 239 + 240 + let decode_string fc s = 241 + try 242 + let lex = Lexer.of_string s in 243 + let f = Parser.parse lex in 244 + File.of_file fc f 245 + with Opam_error.Error e -> Error e 246 + 247 + let decode_string_exn fc s = 248 + match decode_string fc s with 249 + | Ok v -> v 250 + | Error e -> raise (Opam_error.Error e) 251 + 252 + let encode_string fc x = Printer.to_string (File.to_file fc x)
+150
lib/opam.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** Type-safe codecs for opam files. 7 + 8 + Two layers of codecs: 9 + - {{!t} ['a t]} maps a single {!Value.t} to/from an OCaml value. 10 + - {{!File.t} ['a File.t]} maps a whole {!Value.file} to/from an OCaml record 11 + by composing field codecs. 12 + 13 + {2 Quick Start} 14 + 15 + {[ 16 + type pkg = { name : string; version : string; depends : string list } 17 + 18 + let pkg_codec : pkg Opam.File.t = 19 + Opam.File.( 20 + obj (fun name version depends -> { name; version; depends }) 21 + |> field "name" Opam.string ~enc:(fun p -> p.name) 22 + |> field "version" Opam.string ~enc:(fun p -> p.version) 23 + |> field "depends" 24 + Opam.(list string) 25 + ~enc:(fun p -> p.depends) 26 + ~dec_absent:[] 27 + |> finish) 28 + 29 + let r = Opam.decode_string pkg_codec input 30 + ]} 31 + 32 + For raw value parsing without codecs, see {!Opam_bytesrw}. *) 33 + 34 + (** {1:re_exports Re-exports} *) 35 + 36 + module Loc = Loc 37 + module Meta = Loc.Meta 38 + module Path = Loc.Path 39 + module Error = Opam_error 40 + module Value = Value 41 + module Lexer = Lexer 42 + module Parser = Parser 43 + module Printer = Printer 44 + 45 + exception Error of Opam_error.t 46 + 47 + (** {1:codec Value codecs} *) 48 + 49 + type 'a t 50 + (** A codec for a single opam {!Value.t}. *) 51 + 52 + type 'a fmt = Format.formatter -> 'a -> unit 53 + 54 + val kind : 'a t -> string 55 + (** [kind c] is a short human-readable name used in error messages. *) 56 + 57 + val bool : bool t 58 + (** Codec for opam booleans. *) 59 + 60 + val int : int t 61 + (** Codec for opam integers. *) 62 + 63 + val string : string t 64 + (** Codec for opam strings (quoted). *) 65 + 66 + val ident : string t 67 + (** Codec for opam identifiers (unquoted). *) 68 + 69 + val list : 'a t -> 'a list t 70 + (** [list c] is the codec for a homogeneous opam list. *) 71 + 72 + val option : 'a t -> 'a option t 73 + (** [option c] wraps decoded values in [Some]; encodes [None] as the empty list. 74 + For absent record fields, see {!File.opt}. *) 75 + 76 + val map : ?kind:string -> dec:('a -> 'b) -> enc:('b -> 'a) -> 'a t -> 'b t 77 + (** [map ~dec ~enc c] transforms a codec [c] from ['a] to ['b]. *) 78 + 79 + val enum : ?kind:string -> (string * 'a) list -> 'a t 80 + (** [enum cases] is a codec restricted to the [(label, value)] cases. *) 81 + 82 + (** {1:opam Opam-specific value codecs} *) 83 + 84 + val filtered : 'a t -> ('a * Value.t list) t 85 + (** [filtered c] decodes [v {f1 f2 ...}] as a pair of underlying value and 86 + filter expressions, or just [v] (with [[]]). *) 87 + 88 + val constraint_ : (Value.relop * string) list t 89 + (** [constraint_] decodes a version constraint expression like 90 + [>= "1.0" & < "2.0"] as a list of [(operator, version)] pairs. *) 91 + 92 + (** {1:file File codecs} *) 93 + 94 + module File : sig 95 + type 'a codec = 'a t 96 + 97 + type 'a t 98 + (** A codec for a {!Value.file}. *) 99 + 100 + type ('o, 'dec) builder 101 + 102 + val obj : ?kind:string -> 'dec -> ('o, 'dec) builder 103 + (** [obj ctor] starts a record builder. [ctor] is the constructor expecting 104 + one argument per [field] / [opt] call (in declaration order). *) 105 + 106 + val field : 107 + ?dec_absent:'a -> 108 + enc:('o -> 'a) -> 109 + string -> 110 + 'a codec -> 111 + ('o, 'a -> 'dec) builder -> 112 + ('o, 'dec) builder 113 + (** [field name c b] requires field [name] of value-codec [c] in the record. 114 + [dec_absent] supplies a default if the field is missing. Without 115 + [dec_absent], the field is required. *) 116 + 117 + val opt : 118 + enc:('o -> 'a option) -> 119 + string -> 120 + 'a codec -> 121 + ('o, 'a option -> 'dec) builder -> 122 + ('o, 'dec) builder 123 + (** [opt name c b] adds an optional field. *) 124 + 125 + val finish : ('o, 'o) builder -> 'o t 126 + (** [finish b] turns a complete builder into a file-level codec. *) 127 + 128 + val kind : 'a t -> string 129 + (** [kind t] is the codec's kind label. *) 130 + 131 + val of_file : 'a t -> Value.file -> ('a, Error.t) result 132 + (** [of_file t f] decodes file [f] with codec [t]. *) 133 + 134 + val of_file_exn : 'a t -> Value.file -> 'a 135 + (** [of_file_exn t f] is like {!of_file} but raises {!Error}. *) 136 + 137 + val to_file : 'a t -> 'a -> Value.file 138 + (** [to_file t x] encodes record [x] as a file. *) 139 + end 140 + 141 + (** {1:io Decoding and encoding strings} *) 142 + 143 + val decode_string : 'a File.t -> string -> ('a, Error.t) result 144 + (** [decode_string c s] parses [s] and decodes it with [c]. *) 145 + 146 + val decode_string_exn : 'a File.t -> string -> 'a 147 + (** [decode_string_exn c s] is like {!decode_string} but raises {!Error}. *) 148 + 149 + val encode_string : 'a File.t -> 'a -> string 150 + (** [encode_string c x] encodes [x] to opam syntax. *)
+69
lib/opam_error.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + type kind = Loc.Error.kind = .. 7 + type t = Loc.Error.t 8 + 9 + module Context = Loc.Error.Context 10 + 11 + type kind += 12 + | Unexpected_char of char 13 + | Unterminated_string 14 + | Unterminated_escape 15 + | Invalid_escape of string 16 + | Invalid_hex_escape 17 + | Invalid_decimal_escape 18 + | Decimal_escape_out_of_range 19 + | Unexpected_token of { expected : string; found : string } 20 + | Sort_mismatch of { expected : string; found : string } 21 + | Missing_field of string 22 + | Unknown_enum_value of string 23 + 24 + let () = 25 + Loc.Error.register_kind_printer (function 26 + | Unexpected_char c -> 27 + Some (fun ppf -> Fmt.pf ppf "unexpected character %C" c) 28 + | Unterminated_string -> 29 + Some (fun ppf -> Fmt.string ppf "unterminated string") 30 + | Unterminated_escape -> 31 + Some (fun ppf -> Fmt.string ppf "unterminated escape sequence") 32 + | Invalid_escape s -> 33 + Some (fun ppf -> Fmt.pf ppf "invalid escape sequence %s" s) 34 + | Invalid_hex_escape -> 35 + Some (fun ppf -> Fmt.string ppf "invalid \\xHH hex escape") 36 + | Invalid_decimal_escape -> 37 + Some (fun ppf -> Fmt.string ppf "invalid \\DDD decimal escape") 38 + | Decimal_escape_out_of_range -> 39 + Some (fun ppf -> Fmt.string ppf "decimal escape out of byte range") 40 + | Unexpected_token { expected; found } -> 41 + Some (fun ppf -> Fmt.pf ppf "expected %s, got %s" expected found) 42 + | Sort_mismatch { expected; found } -> 43 + Some (fun ppf -> Fmt.pf ppf "expected %s, got %s" expected found) 44 + | Missing_field name -> Some (fun ppf -> Fmt.pf ppf "missing field %S" name) 45 + | Unknown_enum_value v -> Some (fun ppf -> Fmt.pf ppf "unknown value %S" v) 46 + | _ -> None) 47 + 48 + let v = Loc.Error.v 49 + let msg = Loc.Error.msg 50 + let raise = Loc.Error.raise 51 + let fail = Loc.Error.fail 52 + let failf = Loc.Error.failf 53 + let expected = Loc.Error.expected 54 + let push_array = Loc.Error.push_array 55 + let push_object = Loc.Error.push_object 56 + let pp = Loc.Error.pp 57 + let to_string = Loc.Error.to_string 58 + let unexpected_char meta c = raise Context.empty meta (Unexpected_char c) 59 + let unterminated_string meta = raise Context.empty meta Unterminated_string 60 + 61 + let unexpected_token meta ~expected ~found = 62 + raise Context.empty meta (Unexpected_token { expected; found }) 63 + 64 + let sort_mismatch meta ~expected ~found = 65 + raise Context.empty meta (Sort_mismatch { expected; found }) 66 + 67 + let missing_field meta name = raise Context.empty meta (Missing_field name) 68 + 69 + exception Error = Loc.Error
+81
lib/opam_error.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** Structured errors for opam parsing, encoding, and codec failures. 7 + 8 + The {!kind} type is the extensible {!Loc.Error.kind}, so callers can 9 + pattern-match on the typed variants below or fall back to the generic 10 + {!Loc.Error.Msg} cases. A printer for each typed variant is registered at 11 + module load time. *) 12 + 13 + type kind = Loc.Error.kind = .. 14 + type t = Loc.Error.t 15 + 16 + module Context = Loc.Error.Context 17 + 18 + (** {1:typed Opam-specific kinds} *) 19 + 20 + type kind += 21 + | Unexpected_char of char 22 + | Unterminated_string 23 + | Unterminated_escape 24 + | Invalid_escape of string 25 + | Invalid_hex_escape 26 + | Invalid_decimal_escape 27 + | Decimal_escape_out_of_range 28 + | Unexpected_token of { expected : string; found : string } 29 + | Sort_mismatch of { expected : string; found : string } 30 + | Missing_field of string 31 + | Unknown_enum_value of string 32 + 33 + (** {1:raise Raise helpers} *) 34 + 35 + val v : Context.t -> Loc.Meta.t -> kind -> t 36 + (** [v ctx meta k] is an error value with kind [k] at [meta]. *) 37 + 38 + val msg : Context.t -> Loc.Meta.t -> string -> t 39 + (** [msg ctx meta s] is [v ctx meta (Loc.Error.Msg s)]. *) 40 + 41 + val raise : Context.t -> Loc.Meta.t -> kind -> 'a 42 + (** [raise ctx meta k] raises {!Error} with kind [k]. *) 43 + 44 + val fail : Loc.Meta.t -> string -> 'a 45 + (** [fail meta s] raises a {!Loc.Error.Msg} error. *) 46 + 47 + val failf : Loc.Meta.t -> ('a, Format.formatter, unit, 'b) format4 -> 'a 48 + (** [failf meta fmt] raises a formatted {!Loc.Error.Msg} error. *) 49 + 50 + val expected : Loc.Meta.t -> string -> fnd:string -> 'a 51 + (** [expected meta exp ~fnd] raises a generic "expected exp, found fnd". *) 52 + 53 + val unexpected_char : Loc.Meta.t -> char -> 'a 54 + (** [unexpected_char meta c] raises {!Unexpected_char} for [c]. *) 55 + 56 + val unterminated_string : Loc.Meta.t -> 'a 57 + (** [unterminated_string meta] raises {!Unterminated_string}. *) 58 + 59 + val unexpected_token : Loc.Meta.t -> expected:string -> found:string -> 'a 60 + (** [unexpected_token meta ~expected ~found] raises {!Unexpected_token}. *) 61 + 62 + val sort_mismatch : Loc.Meta.t -> expected:string -> found:string -> 'a 63 + (** [sort_mismatch meta ~expected ~found] raises {!Sort_mismatch}. *) 64 + 65 + val missing_field : Loc.Meta.t -> string -> 'a 66 + (** [missing_field meta name] raises {!Missing_field} for [name]. *) 67 + 68 + val push_array : string Loc.node -> int Loc.node -> t -> 'a 69 + (** [push_array kn idx e] re-raises [e] with an array-index context frame. *) 70 + 71 + val push_object : string Loc.node -> string Loc.node -> t -> 'a 72 + (** [push_object kn name e] re-raises [e] with an object-member context. *) 73 + 74 + val pp : Format.formatter -> t -> unit 75 + (** [pp ppf e] pretty-prints error [e]. *) 76 + 77 + val to_string : t -> string 78 + (** [to_string e] is [e] formatted as a string. *) 79 + 80 + exception Error of t 81 + (** Raised by parser, encoder, and codec on failure. *)
+401
lib/parser.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (* Two-token lookahead wrapper around [Lexer.t]. We cache only tokens 7 + (not locations); errors call back into the lexer for the most-recent 8 + token's loc via {!Lexer.last_loc}. *) 9 + type stream = { 10 + lex : Lexer.t; 11 + mutable buf : Lexer.token array; 12 + mutable filled : int; 13 + } 14 + 15 + (* Sentinel value: stays in unused buffer slots between calls. *) 16 + let init_buf () = Array.make 4 Lexer.EOF 17 + let v lex = { lex; buf = init_buf (); filled = 0 } 18 + 19 + let fill s n = 20 + if Array.length s.buf < n then s.buf <- Array.make (max n 4) Lexer.EOF; 21 + while s.filled < n do 22 + s.buf.(s.filled) <- Lexer.next s.lex; 23 + s.filled <- s.filled + 1 24 + done 25 + 26 + let peek s = 27 + fill s 1; 28 + s.buf.(0) 29 + 30 + let peek2 s = 31 + fill s 2; 32 + s.buf.(1) 33 + 34 + let advance s = 35 + fill s 1; 36 + let t = s.buf.(0) in 37 + for i = 0 to s.filled - 2 do 38 + s.buf.(i) <- s.buf.(i + 1) 39 + done; 40 + s.filled <- s.filled - 1; 41 + t 42 + 43 + (* For error context, use the last consumed token's location (if any), 44 + otherwise the lexer's current cursor. The lookahead means [last_loc] 45 + refers to whichever token was most recently read from the lexer; the 46 + token currently at [s.buf.(0)] may have its loc reflected here. *) 47 + let error_loc s = Lexer.last_loc s.lex 48 + let _ = error_loc 49 + 50 + let expected s ~exp ~fnd = 51 + Opam_error.unexpected_token 52 + (Loc.Meta.make (Lexer.last_loc s.lex)) 53 + ~expected:exp ~found:fnd 54 + 55 + let consume s exp = 56 + let tok = advance s in 57 + if tok = exp then () 58 + else 59 + expected s 60 + ~exp:(Fmt.to_to_string Lexer.pp_token exp) 61 + ~fnd:(Fmt.to_to_string Lexer.pp_token tok) 62 + 63 + let token_relop = function 64 + | Lexer.EQ -> Some `Eq 65 + | Lexer.NEQ -> Some `Neq 66 + | Lexer.GT -> Some `Gt 67 + | Lexer.GEQ -> Some `Geq 68 + | Lexer.LT -> Some `Lt 69 + | Lexer.LEQ -> Some `Leq 70 + | _ -> None 71 + 72 + let token_envop = function 73 + | Lexer.EQ -> Some `Eq 74 + | Lexer.PLUSEQ -> Some `PlusEq 75 + | Lexer.EQPLUS -> Some `EqPlus 76 + | Lexer.COLONEQ -> Some `ColonEq 77 + | Lexer.EQCOLON -> Some `EqColon 78 + | _ -> None 79 + 80 + let token_pfxop = function 81 + | Lexer.NOT -> Some `Not 82 + | Lexer.QMARK -> Some `Defined 83 + | _ -> None 84 + 85 + let token_logop = function 86 + | Lexer.AND -> Some `And 87 + | Lexer.OR -> Some `Or 88 + | _ -> None 89 + 90 + let _ = token_logop 91 + 92 + let rec parse_atom s = 93 + let tok = advance s in 94 + match tok with 95 + | Lexer.STRING str -> Value.String str 96 + | Lexer.INT i -> Value.Int i 97 + | Lexer.BOOL b -> Value.Bool b 98 + | Lexer.IDENT id -> Value.Ident id 99 + | Lexer.LBRACKET -> 100 + let items = parse_value_seq_until s Lexer.RBRACKET in 101 + consume s Lexer.RBRACKET; 102 + Value.List items 103 + | Lexer.LPAREN -> 104 + let items = parse_value_seq_until s Lexer.RPAREN in 105 + consume s Lexer.RPAREN; 106 + Value.Group items 107 + | t -> expected s ~exp:"value atom" ~fnd:(Fmt.to_to_string Lexer.pp_token t) 108 + 109 + and parse_value_seq_until s closing = 110 + let rec loop acc = 111 + let tok = peek s in 112 + if tok = closing || tok = Lexer.EOF then List.rev acc 113 + else 114 + let v = parse_value s in 115 + loop (v :: acc) 116 + in 117 + loop [] 118 + 119 + and parse_value s = parse_options s 120 + 121 + and parse_options s = 122 + let v = parse_logop_or s in 123 + match peek s with 124 + | Lexer.LBRACE -> 125 + ignore (advance s); 126 + let items = parse_value_seq_until s Lexer.RBRACE in 127 + consume s Lexer.RBRACE; 128 + Value.Option (v, items) 129 + | _ -> v 130 + 131 + and parse_logop_or s = 132 + let v = parse_logop_and s in 133 + let rec loop v = 134 + match peek s with 135 + | Lexer.OR -> 136 + ignore (advance s); 137 + let v2 = parse_logop_and s in 138 + loop (Value.Logop (`Or, v, v2)) 139 + | _ -> v 140 + in 141 + loop v 142 + 143 + and parse_logop_and s = 144 + let v = parse_relop_value s in 145 + let rec loop v = 146 + match peek s with 147 + | Lexer.AND -> 148 + ignore (advance s); 149 + let v2 = parse_relop_value s in 150 + loop (Value.Logop (`And, v, v2)) 151 + | _ -> v 152 + in 153 + loop v 154 + 155 + and parse_relop_value s = 156 + let tok = peek s in 157 + match token_pfxop tok with 158 + | Some op -> 159 + ignore (advance s); 160 + let v = parse_relop_value s in 161 + Value.Pfxop (op, v) 162 + | None -> ( 163 + match token_relop tok with 164 + | Some op -> 165 + (* Prefix relop: op atom *) 166 + ignore (advance s); 167 + let a = parse_atom s in 168 + Value.Prefix_relop (op, a) 169 + | None -> parse_atom_post s) 170 + 171 + and parse_atom_post s = 172 + let tok = peek s in 173 + match tok with 174 + | Lexer.IDENT name -> ( 175 + let next_tok = peek2 s in 176 + match token_envop next_tok with 177 + | Some op -> 178 + ignore (advance s); 179 + ignore (advance s); 180 + let v = parse_atom s in 181 + Value.Env_binding (Value.Ident name, op, v) 182 + | None -> 183 + ignore (advance s); 184 + maybe_binary_relop s (Value.Ident name)) 185 + | _ -> 186 + let a = parse_atom s in 187 + maybe_binary_relop s a 188 + 189 + and maybe_binary_relop s a = 190 + let tok = peek s in 191 + match token_relop tok with 192 + | Some op -> 193 + ignore (advance s); 194 + let b = parse_atom s in 195 + Value.Relop (op, a, b) 196 + | None -> a 197 + 198 + let parse_value_top lex = parse_value (v lex) 199 + 200 + (* ---- Token-level skipping for {!field} ---- *) 201 + 202 + (* The [skip_*] helpers mirror [parse_*] but construct no {!Value.t}. They 203 + consume exactly the same token spans, keeping both in lockstep. *) 204 + 205 + let rec skip_value s = skip_options s 206 + 207 + and skip_options s = 208 + skip_logop_or s; 209 + match peek s with 210 + | Lexer.LBRACE -> 211 + ignore (advance s); 212 + skip_value_seq_until s Lexer.RBRACE; 213 + consume s Lexer.RBRACE 214 + | _ -> () 215 + 216 + and skip_value_seq_until s closing = 217 + let rec loop () = 218 + let tok = peek s in 219 + if tok = closing || tok = Lexer.EOF then () 220 + else ( 221 + skip_value s; 222 + loop ()) 223 + in 224 + loop () 225 + 226 + and skip_logop_or s = 227 + skip_logop_and s; 228 + let rec loop () = 229 + match peek s with 230 + | Lexer.OR -> 231 + ignore (advance s); 232 + skip_logop_and s; 233 + loop () 234 + | _ -> () 235 + in 236 + loop () 237 + 238 + and skip_logop_and s = 239 + skip_relop_value s; 240 + let rec loop () = 241 + match peek s with 242 + | Lexer.AND -> 243 + ignore (advance s); 244 + skip_relop_value s; 245 + loop () 246 + | _ -> () 247 + in 248 + loop () 249 + 250 + and skip_relop_value s = 251 + let tok = peek s in 252 + match token_pfxop tok with 253 + | Some _ -> 254 + ignore (advance s); 255 + skip_relop_value s 256 + | None -> ( 257 + match token_relop tok with 258 + | Some _ -> 259 + ignore (advance s); 260 + skip_atom s 261 + | None -> skip_atom_post s) 262 + 263 + and skip_atom_post s = 264 + let tok = peek s in 265 + match tok with 266 + | Lexer.IDENT _ -> ( 267 + let next_tok = peek2 s in 268 + match token_envop next_tok with 269 + | Some _ -> 270 + ignore (advance s); 271 + ignore (advance s); 272 + skip_atom s 273 + | None -> 274 + ignore (advance s); 275 + skip_maybe_relop s) 276 + | _ -> 277 + skip_atom s; 278 + skip_maybe_relop s 279 + 280 + and skip_maybe_relop s = 281 + let tok = peek s in 282 + match token_relop tok with 283 + | Some _ -> 284 + ignore (advance s); 285 + skip_atom s 286 + | None -> () 287 + 288 + and skip_atom s = 289 + let tok = advance s in 290 + match tok with 291 + | Lexer.STRING _ | Lexer.INT _ | Lexer.BOOL _ | Lexer.IDENT _ -> () 292 + | Lexer.LBRACKET -> 293 + skip_value_seq_until s Lexer.RBRACKET; 294 + consume s Lexer.RBRACKET 295 + | Lexer.LPAREN -> 296 + skip_value_seq_until s Lexer.RPAREN; 297 + consume s Lexer.RPAREN 298 + | t -> expected s ~exp:"value atom" ~fnd:(Fmt.to_to_string Lexer.pp_token t) 299 + 300 + (* Skip the contents of a section (the [{ ... }] body, opening brace 301 + already consumed). Does NOT recurse into the section to look for a 302 + top-level field. *) 303 + let rec skip_section_body s = 304 + let tok = peek s in 305 + match tok with 306 + | Lexer.RBRACE -> ignore (advance s) 307 + | Lexer.EOF -> expected s ~exp:"'}'" ~fnd:"EOF" 308 + | Lexer.IDENT _ -> ( 309 + ignore (advance s); 310 + let tok2 = peek s in 311 + match tok2 with 312 + | Lexer.COLON -> 313 + ignore (advance s); 314 + skip_value s; 315 + skip_section_body s 316 + | Lexer.STRING _ -> 317 + ignore (advance s); 318 + consume s Lexer.LBRACE; 319 + skip_section_body s; 320 + skip_section_body s 321 + | Lexer.LBRACE -> 322 + ignore (advance s); 323 + skip_section_body s; 324 + skip_section_body s 325 + | t -> 326 + expected s ~exp:"':' or section body" 327 + ~fnd:(Fmt.to_to_string Lexer.pp_token t)) 328 + | t -> 329 + expected s ~exp:"identifier or '}'" 330 + ~fnd:(Fmt.to_to_string Lexer.pp_token t) 331 + 332 + let field lex name = 333 + let s = v lex in 334 + let rec loop () = 335 + let tok = peek s in 336 + match tok with 337 + | Lexer.EOF -> None 338 + | Lexer.IDENT n -> ( 339 + ignore (advance s); 340 + let tok2 = peek s in 341 + match tok2 with 342 + | Lexer.COLON -> 343 + ignore (advance s); 344 + if n = name then Some (parse_value s) 345 + else begin 346 + skip_value s; 347 + loop () 348 + end 349 + | Lexer.STRING _ -> 350 + ignore (advance s); 351 + consume s Lexer.LBRACE; 352 + skip_section_body s; 353 + loop () 354 + | Lexer.LBRACE -> 355 + ignore (advance s); 356 + skip_section_body s; 357 + loop () 358 + | t -> 359 + expected s ~exp:"':' or section body" 360 + ~fnd:(Fmt.to_to_string Lexer.pp_token t)) 361 + | t -> expected s ~exp:"identifier" ~fnd:(Fmt.to_to_string Lexer.pp_token t) 362 + in 363 + loop () 364 + 365 + let rec parse_items s acc = 366 + let tok = peek s in 367 + match tok with 368 + | Lexer.EOF -> List.rev acc 369 + | Lexer.IDENT name -> ( 370 + ignore (advance s); 371 + let tok2 = peek s in 372 + match tok2 with 373 + | Lexer.COLON -> 374 + ignore (advance s); 375 + let v = parse_value s in 376 + parse_items s (Value.Variable (name, v) :: acc) 377 + | Lexer.STRING label -> 378 + ignore (advance s); 379 + consume s Lexer.LBRACE; 380 + let items = parse_items s [] in 381 + consume s Lexer.RBRACE; 382 + parse_items s 383 + (Value.Section { kind = name; name = Some label; items } :: acc) 384 + | Lexer.LBRACE -> 385 + ignore (advance s); 386 + let items = parse_items s [] in 387 + consume s Lexer.RBRACE; 388 + parse_items s 389 + (Value.Section { kind = name; name = None; items } :: acc) 390 + | t -> 391 + expected s ~exp:"':' or section body" 392 + ~fnd:(Fmt.to_to_string Lexer.pp_token t)) 393 + | Lexer.RBRACE -> List.rev acc 394 + | t -> expected s ~exp:"identifier" ~fnd:(Fmt.to_to_string Lexer.pp_token t) 395 + 396 + let parse lex = 397 + let s = v lex in 398 + let items = parse_items s [] in 399 + Value.file ~file_name:(Lexer.file lex) items 400 + 401 + let parse_value = parse_value_top
+25
lib/parser.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** Recursive-descent parser for the opam file grammar. 7 + 8 + The parser consumes tokens from a {!Lexer.t} and produces a {!Value.file}. 9 + It raises {!Opam_error.Error} on syntactic errors. *) 10 + 11 + val parse : Lexer.t -> Value.file 12 + (** [parse l] parses an opam file from lexer [l]. *) 13 + 14 + val parse_value : Lexer.t -> Value.t 15 + (** [parse_value l] parses a single opam value (no surrounding [field:] or 16 + section). Useful for testing and for parsing values stored in other contexts 17 + (e.g. command-line arguments). *) 18 + 19 + val field : Lexer.t -> string -> Value.t option 20 + (** [field l name] scans the opam file for the top-level field [name] and 21 + returns its value, or [None] if absent. Skipped fields and sections are 22 + token-skipped without building a {!Value.t}, so this is faster and allocates 23 + less than {!parse} followed by {!Value.find}. Stops as soon as the field is 24 + found — no further parsing. Raises {!Opam_error.Error} on syntax errors 25 + encountered while scanning. *)
+132
lib/printer.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + let escape_string s = 7 + let buf = Buffer.create (String.length s + 2) in 8 + String.iter 9 + (fun c -> 10 + match c with 11 + | '"' -> Buffer.add_string buf "\\\"" 12 + | '\\' -> Buffer.add_string buf "\\\\" 13 + | '\n' -> Buffer.add_string buf "\\n" 14 + | '\r' -> Buffer.add_string buf "\\r" 15 + | '\t' -> Buffer.add_string buf "\\t" 16 + | '\b' -> Buffer.add_string buf "\\b" 17 + | c when Char.code c < 0x20 -> 18 + Buffer.add_string buf (Fmt.str "\\x%02x" (Char.code c)) 19 + | c -> Buffer.add_char buf c) 20 + s; 21 + Buffer.contents buf 22 + 23 + (* Choose between a single-line [String s] and a triple-quoted form 24 + when the contents contain newlines or unprintable characters. We 25 + keep things simple: always emit a single-line escaped string. *) 26 + let write_string buf s = 27 + Buffer.add_char buf '"'; 28 + Buffer.add_string buf (escape_string s); 29 + Buffer.add_char buf '"' 30 + 31 + let needs_padding_around op = 32 + match op with `Eq | `Neq | `Gt | `Geq | `Lt | `Leq -> true 33 + 34 + let _ = needs_padding_around 35 + 36 + let rec write_value buf v = 37 + match v with 38 + | Value.Bool b -> Buffer.add_string buf (string_of_bool b) 39 + | Value.Int i -> Buffer.add_string buf (string_of_int i) 40 + | Value.String s -> write_string buf s 41 + | Value.Ident s -> Buffer.add_string buf s 42 + | Value.Relop (op, a, b) -> 43 + write_value buf a; 44 + Buffer.add_char buf ' '; 45 + Buffer.add_string buf (Value.relop_to_string op); 46 + Buffer.add_char buf ' '; 47 + write_value buf b 48 + | Value.Prefix_relop (op, a) -> 49 + Buffer.add_string buf (Value.relop_to_string op); 50 + Buffer.add_char buf ' '; 51 + write_value buf a 52 + | Value.Logop (op, a, b) -> 53 + write_value buf a; 54 + Buffer.add_char buf ' '; 55 + Buffer.add_string buf (Value.logop_to_string op); 56 + Buffer.add_char buf ' '; 57 + write_value buf b 58 + | Value.Pfxop (op, a) -> 59 + Buffer.add_string buf (Value.pfxop_to_string op); 60 + write_value buf a 61 + | Value.List xs -> 62 + Buffer.add_char buf '['; 63 + write_value_seq buf xs; 64 + Buffer.add_char buf ']' 65 + | Value.Group xs -> 66 + Buffer.add_char buf '('; 67 + write_value_seq buf xs; 68 + Buffer.add_char buf ')' 69 + | Value.Option (v, fs) -> 70 + write_value buf v; 71 + Buffer.add_string buf " {"; 72 + write_value_seq buf fs; 73 + Buffer.add_char buf '}' 74 + | Value.Env_binding (n, op, v) -> 75 + write_value buf n; 76 + Buffer.add_char buf ' '; 77 + Buffer.add_string buf (Value.env_op_to_string op); 78 + Buffer.add_char buf ' '; 79 + write_value buf v 80 + 81 + and write_value_seq buf = function 82 + | [] -> () 83 + | [ x ] -> write_value buf x 84 + | x :: rest -> 85 + write_value buf x; 86 + List.iter 87 + (fun v -> 88 + Buffer.add_char buf ' '; 89 + write_value buf v) 90 + rest 91 + 92 + let rec write_item buf = function 93 + | Value.Variable (name, v) -> 94 + Buffer.add_string buf name; 95 + Buffer.add_string buf ": "; 96 + write_value buf v; 97 + Buffer.add_char buf '\n' 98 + | Value.Section s -> 99 + Buffer.add_string buf s.kind; 100 + (match s.name with 101 + | None -> () 102 + | Some label -> 103 + Buffer.add_string buf " \""; 104 + Buffer.add_string buf (escape_string label); 105 + Buffer.add_char buf '"'); 106 + Buffer.add_string buf " {\n"; 107 + List.iter 108 + (fun item -> 109 + Buffer.add_string buf " "; 110 + write_item buf item) 111 + s.items; 112 + Buffer.add_string buf "}\n" 113 + 114 + let write_file buf f = List.iter (write_item buf) f.Value.contents 115 + let to_buffer = write_file 116 + let value_to_buffer = write_value 117 + let item_to_buffer = write_item 118 + 119 + let to_string f = 120 + let buf = Buffer.create 256 in 121 + write_file buf f; 122 + Buffer.contents buf 123 + 124 + let value_to_string v = 125 + let buf = Buffer.create 64 in 126 + write_value buf v; 127 + Buffer.contents buf 128 + 129 + let item_to_string i = 130 + let buf = Buffer.create 64 in 131 + write_item buf i; 132 + Buffer.contents buf
+27
lib/printer.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** Pretty-printer producing canonical opam syntax. 7 + 8 + The output round-trips through {!Parser}: parsing the printer's output 9 + yields a value structurally equal to the input. *) 10 + 11 + val to_string : Value.file -> string 12 + (** [to_string f] is [f] rendered as canonical opam syntax. *) 13 + 14 + val to_buffer : Buffer.t -> Value.file -> unit 15 + (** [to_buffer buf f] writes [f] to [buf]. *) 16 + 17 + val value_to_string : Value.t -> string 18 + (** [value_to_string v] is [v] rendered as a single value. *) 19 + 20 + val value_to_buffer : Buffer.t -> Value.t -> unit 21 + (** [value_to_buffer buf v] writes [v] to [buf]. *) 22 + 23 + val item_to_string : Value.item -> string 24 + (** [item_to_string i] is [i] rendered as a single item line. *) 25 + 26 + val item_to_buffer : Buffer.t -> Value.item -> unit 27 + (** [item_to_buffer buf i] writes [i] to [buf]. *)
+206
lib/value.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + type relop = [ `Eq | `Neq | `Gt | `Geq | `Lt | `Leq ] 7 + 8 + let relop_to_string = function 9 + | `Eq -> "=" 10 + | `Neq -> "!=" 11 + | `Gt -> ">" 12 + | `Geq -> ">=" 13 + | `Lt -> "<" 14 + | `Leq -> "<=" 15 + 16 + let relop_of_string = function 17 + | "=" -> Some `Eq 18 + | "!=" -> Some `Neq 19 + | ">" -> Some `Gt 20 + | ">=" -> Some `Geq 21 + | "<" -> Some `Lt 22 + | "<=" -> Some `Leq 23 + | _ -> None 24 + 25 + type logop = [ `And | `Or ] 26 + 27 + let logop_to_string = function `And -> "&" | `Or -> "|" 28 + 29 + type pfxop = [ `Not | `Defined ] 30 + 31 + let pfxop_to_string = function `Not -> "!" | `Defined -> "?" 32 + 33 + type env_op = [ `Eq | `PlusEq | `EqPlus | `ColonEq | `EqColon ] 34 + 35 + let env_op_to_string = function 36 + | `Eq -> "=" 37 + | `PlusEq -> "+=" 38 + | `EqPlus -> "=+" 39 + | `ColonEq -> ":=" 40 + | `EqColon -> "=:" 41 + 42 + let env_op_of_string = function 43 + | "=" -> Some `Eq 44 + | "+=" -> Some `PlusEq 45 + | "=+" -> Some `EqPlus 46 + | ":=" -> Some `ColonEq 47 + | "=:" -> Some `EqColon 48 + | _ -> None 49 + 50 + type t = 51 + | Bool of bool 52 + | Int of int 53 + | String of string 54 + | Relop of relop * t * t 55 + | Prefix_relop of relop * t 56 + | Logop of logop * t * t 57 + | Pfxop of pfxop * t 58 + | Ident of string 59 + | List of t list 60 + | Group of t list 61 + | Option of t * t list 62 + | Env_binding of t * env_op * t 63 + 64 + type item = Variable of string * t | Section of section 65 + and section = { kind : string; name : string option; items : item list } 66 + 67 + type file = { file_name : string; contents : item list } 68 + 69 + let bool b = Bool b 70 + let int i = Int i 71 + let string s = String s 72 + let ident s = Ident s 73 + let list xs = List xs 74 + let group xs = Group xs 75 + let option v fs = Option (v, fs) 76 + let relop op a b = Relop (op, a, b) 77 + let prefix_relop op v = Prefix_relop (op, v) 78 + let logop op a b = Logop (op, a, b) 79 + let pfxop op v = Pfxop (op, v) 80 + let env_binding name op v = Env_binding (name, op, v) 81 + let variable name v = Variable (name, v) 82 + let file ?(file_name = "-") contents = { file_name; contents } 83 + 84 + let kind_name = function 85 + | Bool _ -> "bool" 86 + | Int _ -> "int" 87 + | String _ -> "string" 88 + | Relop _ -> "relational" 89 + | Prefix_relop _ -> "prefix-relational" 90 + | Logop _ -> "logical" 91 + | Pfxop _ -> "prefix" 92 + | Ident _ -> "identifier" 93 + | List _ -> "list" 94 + | Group _ -> "group" 95 + | Option _ -> "option" 96 + | Env_binding _ -> "env-binding" 97 + 98 + let invalid label v = 99 + Fmt.invalid_arg "Opam.Value.to_%s: got %s" label (kind_name v) 100 + 101 + let to_bool = function Bool b -> b | v -> invalid "bool" v 102 + let to_int = function Int i -> i | v -> invalid "int" v 103 + let to_string = function String s -> s | v -> invalid "string" v 104 + let to_ident = function Ident s -> s | v -> invalid "ident" v 105 + let to_list = function List xs -> xs | v -> invalid "list" v 106 + let to_bool_opt = function Bool b -> Some b | _ -> None 107 + let to_int_opt = function Int i -> Some i | _ -> None 108 + let to_string_opt = function String s -> Some s | _ -> None 109 + let to_ident_opt = function Ident s -> Some s | _ -> None 110 + let to_list_opt = function List xs -> Some xs | _ -> None 111 + let is_bool = function Bool _ -> true | _ -> false 112 + let is_int = function Int _ -> true | _ -> false 113 + let is_string = function String _ -> true | _ -> false 114 + let is_ident = function Ident _ -> true | _ -> false 115 + let is_list = function List _ -> true | _ -> false 116 + 117 + let find name f = 118 + let rec loop = function 119 + | [] -> None 120 + | Variable (n, v) :: _ when n = name -> Some v 121 + | _ :: rest -> loop rest 122 + in 123 + loop f.contents 124 + 125 + let section ~kind ?name f = 126 + let rec loop = function 127 + | [] -> None 128 + | Section s :: _ when s.kind = kind && (name = None || s.name = name) -> 129 + Some s 130 + | _ :: rest -> loop rest 131 + in 132 + loop f.contents 133 + 134 + let variables f = 135 + List.filter_map 136 + (function Variable (n, v) -> Some (n, v) | _ -> None) 137 + f.contents 138 + 139 + let sections f = 140 + List.filter_map (function Section s -> Some s | _ -> None) f.contents 141 + 142 + let pp_string ppf s = Fmt.pf ppf {|"%s"|} (String.escaped s) 143 + 144 + let rec pp ppf = function 145 + | Bool b -> Fmt.bool ppf b 146 + | Int i -> Fmt.int ppf i 147 + | String s -> pp_string ppf s 148 + | Ident s -> Fmt.string ppf s 149 + | Relop (op, a, b) -> Fmt.pf ppf "%a %s %a" pp a (relop_to_string op) pp b 150 + | Prefix_relop (op, v) -> Fmt.pf ppf "%s %a" (relop_to_string op) pp v 151 + | Logop (op, a, b) -> Fmt.pf ppf "%a %s %a" pp a (logop_to_string op) pp b 152 + | Pfxop (op, v) -> Fmt.pf ppf "%s%a" (pfxop_to_string op) pp v 153 + | List xs -> Fmt.pf ppf "[ %a ]" (Fmt.list ~sep:Fmt.sp pp) xs 154 + | Group xs -> Fmt.pf ppf "( %a )" (Fmt.list ~sep:Fmt.sp pp) xs 155 + | Option (v, fs) -> Fmt.pf ppf "%a {%a}" pp v (Fmt.list ~sep:Fmt.sp pp) fs 156 + | Env_binding (n, op, v) -> 157 + Fmt.pf ppf "%a %s %a" pp n (env_op_to_string op) pp v 158 + 159 + let rec pp_item ppf = function 160 + | Variable (n, v) -> Fmt.pf ppf "%s: %a" n pp v 161 + | Section s -> 162 + let label = 163 + match s.name with None -> "" | Some n -> Fmt.str " \"%s\"" n 164 + in 165 + Fmt.pf ppf "%s%s {@ @[<v>%a@]@ }" s.kind label 166 + (Fmt.list ~sep:Fmt.cut pp_item) 167 + s.items 168 + 169 + let pp_file ppf f = Fmt.list ~sep:Fmt.cut pp_item ppf f.contents 170 + 171 + let rec equal a b = 172 + match (a, b) with 173 + | Bool x, Bool y -> x = y 174 + | Int x, Int y -> x = y 175 + | String x, String y -> x = y 176 + | Ident x, Ident y -> x = y 177 + | Relop (op1, a1, b1), Relop (op2, a2, b2) -> 178 + op1 = op2 && equal a1 a2 && equal b1 b2 179 + | Prefix_relop (op1, v1), Prefix_relop (op2, v2) -> op1 = op2 && equal v1 v2 180 + | Logop (op1, a1, b1), Logop (op2, a2, b2) -> 181 + op1 = op2 && equal a1 a2 && equal b1 b2 182 + | Pfxop (op1, v1), Pfxop (op2, v2) -> op1 = op2 && equal v1 v2 183 + | List xs, List ys | Group xs, Group ys -> equal_list xs ys 184 + | Option (v1, fs1), Option (v2, fs2) -> equal v1 v2 && equal_list fs1 fs2 185 + | Env_binding (n1, op1, v1), Env_binding (n2, op2, v2) -> 186 + op1 = op2 && equal n1 n2 && equal v1 v2 187 + | _ -> false 188 + 189 + and equal_list xs ys = 190 + List.length xs = List.length ys && List.for_all2 equal xs ys 191 + 192 + let rec equal_item a b = 193 + match (a, b) with 194 + | Variable (n1, v1), Variable (n2, v2) -> n1 = n2 && equal v1 v2 195 + | Section s1, Section s2 -> 196 + s1.kind = s2.kind && s1.name = s2.name 197 + && List.length s1.items = List.length s2.items 198 + && List.for_all2 equal_item s1.items s2.items 199 + | _ -> false 200 + 201 + let equal_file a b = 202 + a.file_name = b.file_name 203 + && List.length a.contents = List.length b.contents 204 + && List.for_all2 equal_item a.contents b.contents 205 + 206 + module Error = Opam_error
+220
lib/value.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** opam file value AST. 7 + 8 + This module defines the syntactic value forms produced by parsing an opam 9 + file. For the public codec API, see {!Opam}. For parsing and encoding, see 10 + {!Opam_bytesrw}. *) 11 + 12 + (** {1:relop Relational operators} 13 + 14 + The six version comparison operators recognised in opam values. *) 15 + 16 + type relop = [ `Eq | `Neq | `Gt | `Geq | `Lt | `Leq ] 17 + 18 + val relop_to_string : relop -> string 19 + (** [relop_to_string op] is the symbol for [op] (e.g. [">="]). *) 20 + 21 + val relop_of_string : string -> relop option 22 + (** [relop_of_string s] is the relop with symbol [s], or [None]. *) 23 + 24 + (** {1:logop Logical operators} *) 25 + 26 + type logop = [ `And | `Or ] 27 + 28 + val logop_to_string : logop -> string 29 + (** [logop_to_string op] is the symbol for [op] (["&"] or ["|"]). *) 30 + 31 + (** {1:pfxop Prefix operators} 32 + 33 + [`Not] is the boolean negation, [`Defined] is the postfix [?] written here 34 + as a prefix tag. *) 35 + 36 + type pfxop = [ `Not | `Defined ] 37 + 38 + val pfxop_to_string : pfxop -> string 39 + (** [pfxop_to_string op] is the symbol for [op] (["!"] or ["?"]). *) 40 + 41 + (** {1:envop Environment update operators} 42 + 43 + The five operators allowed in [setenv:] and [build-env:] bindings. *) 44 + 45 + type env_op = [ `Eq | `PlusEq | `EqPlus | `ColonEq | `EqColon ] 46 + 47 + val env_op_to_string : env_op -> string 48 + (** [env_op_to_string op] is the symbol for [op]. *) 49 + 50 + val env_op_of_string : string -> env_op option 51 + (** [env_op_of_string s] is the env op with symbol [s], or [None]. *) 52 + 53 + (** {1:values Values} *) 54 + 55 + type t = 56 + | Bool of bool 57 + | Int of int 58 + | String of string 59 + | Relop of relop * t * t 60 + | Prefix_relop of relop * t 61 + | Logop of logop * t * t 62 + | Pfxop of pfxop * t 63 + | Ident of string 64 + | List of t list 65 + | Group of t list 66 + | Option of t * t list 67 + | Env_binding of t * env_op * t 68 + (** An opam value. [Option (v, fs)] is the [v {f1 f2 ...}] form for 69 + optional / filtered values. *) 70 + 71 + (** {1:item Item} 72 + 73 + A top-level entry in an opam file is either a [field: value] or a section 74 + [name "label" {items}]. *) 75 + 76 + type item = Variable of string * t | Section of section 77 + and section = { kind : string; name : string option; items : item list } 78 + 79 + (** {1:file File} *) 80 + 81 + type file = { 82 + file_name : string; (** Source path or ["-"] for in-memory inputs. *) 83 + contents : item list; 84 + } 85 + 86 + (** {1:constructors Constructors} *) 87 + 88 + val bool : bool -> t 89 + (** [bool b] is the boolean value [b]. *) 90 + 91 + val int : int -> t 92 + (** [int i] is the integer value [i]. *) 93 + 94 + val string : string -> t 95 + (** [string s] is the string value [s]. *) 96 + 97 + val ident : string -> t 98 + (** [ident s] is the unquoted identifier value [s]. *) 99 + 100 + val list : t list -> t 101 + (** [list xs] is the bracketed list value [[ x1 x2 ... ]]. *) 102 + 103 + val group : t list -> t 104 + (** [group xs] is the parenthesised group value [( x1 x2 ... )]. *) 105 + 106 + val option : t -> t list -> t 107 + (** [option v fs] is [v {f1 f2 ...}] (filtered/optional value). *) 108 + 109 + val relop : relop -> t -> t -> t 110 + (** [relop op a b] is the binary relational expression [a op b]. *) 111 + 112 + val prefix_relop : relop -> t -> t 113 + (** [prefix_relop op v] is the prefix relational expression [op v]. *) 114 + 115 + val logop : logop -> t -> t -> t 116 + (** [logop op a b] is the logical expression [a op b]. *) 117 + 118 + val pfxop : pfxop -> t -> t 119 + (** [pfxop op v] is the prefix expression [op v]. *) 120 + 121 + val env_binding : t -> env_op -> t -> t 122 + (** [env_binding name op v] is the env binding [name op v]. *) 123 + 124 + val variable : string -> t -> item 125 + (** [variable name v] is the field item [name: v]. *) 126 + 127 + val file : ?file_name:string -> item list -> file 128 + (** [file ?file_name items] is a file holding [items], with [file_name] used in 129 + error messages. *) 130 + 131 + (** {1:accessors Accessors and predicates} *) 132 + 133 + val to_bool : t -> bool 134 + (** [to_bool v] extracts a boolean. @raise Invalid_argument otherwise. *) 135 + 136 + val to_int : t -> int 137 + (** [to_int v] extracts an integer. @raise Invalid_argument otherwise. *) 138 + 139 + val to_string : t -> string 140 + (** [to_string v] extracts a string. @raise Invalid_argument otherwise. *) 141 + 142 + val to_ident : t -> string 143 + (** [to_ident v] extracts an identifier. @raise Invalid_argument otherwise. *) 144 + 145 + val to_list : t -> t list 146 + (** [to_list v] extracts a list. @raise Invalid_argument otherwise. *) 147 + 148 + val to_bool_opt : t -> bool option 149 + (** [to_bool_opt v] is [Some b] if [v] is a boolean, [None] otherwise. *) 150 + 151 + val to_int_opt : t -> int option 152 + (** [to_int_opt v] is [Some i] if [v] is an integer, [None] otherwise. *) 153 + 154 + val to_string_opt : t -> string option 155 + (** [to_string_opt v] is [Some s] if [v] is a string, [None] otherwise. *) 156 + 157 + val to_ident_opt : t -> string option 158 + (** [to_ident_opt v] is [Some s] if [v] is an identifier, [None] otherwise. *) 159 + 160 + val to_list_opt : t -> t list option 161 + (** [to_list_opt v] is [Some xs] if [v] is a list, [None] otherwise. *) 162 + 163 + val is_bool : t -> bool 164 + (** [is_bool v] is [true] iff [v] is a boolean. *) 165 + 166 + val is_int : t -> bool 167 + (** [is_int v] is [true] iff [v] is an integer. *) 168 + 169 + val is_string : t -> bool 170 + (** [is_string v] is [true] iff [v] is a string. *) 171 + 172 + val is_ident : t -> bool 173 + (** [is_ident v] is [true] iff [v] is an identifier. *) 174 + 175 + val is_list : t -> bool 176 + (** [is_list v] is [true] iff [v] is a list. *) 177 + 178 + (** {1:fields Field lookup} *) 179 + 180 + val find : string -> file -> t option 181 + (** [find name f] returns the value of the first variable item whose name is 182 + [name], or [None] if absent. *) 183 + 184 + val section : kind:string -> ?name:string -> file -> section option 185 + (** [section ~kind ?name f] returns the first section matching [kind] (and 186 + optional [name] label). *) 187 + 188 + val variables : file -> (string * t) list 189 + (** [variables f] is the list of [(name, value)] for top-level [Variable] items 190 + in source order. *) 191 + 192 + val sections : file -> section list 193 + (** [sections f] is the list of top-level [Section] items in source order. *) 194 + 195 + (** {1:pp Pretty-printing} 196 + 197 + These produce a compact, single-line representation suitable for debugging. 198 + For canonical opam syntax see {!Printer}. *) 199 + 200 + val pp : Format.formatter -> t -> unit 201 + (** [pp ppf v] pretty-prints a value [v]. *) 202 + 203 + val pp_item : Format.formatter -> item -> unit 204 + (** [pp_item ppf i] pretty-prints an item [i]. *) 205 + 206 + val pp_file : Format.formatter -> file -> unit 207 + (** [pp_file ppf f] pretty-prints a file [f]. *) 208 + 209 + val equal : t -> t -> bool 210 + (** [equal a b] is structural value equality. *) 211 + 212 + val equal_item : item -> item -> bool 213 + (** [equal_item a b] is structural item equality. *) 214 + 215 + val equal_file : file -> file -> bool 216 + (** [equal_file a b] is structural file equality, including file name. *) 217 + 218 + (** {1:errors Errors} *) 219 + 220 + module Error = Opam_error
+40
opam.opam
··· 1 + # This file is generated by dune, edit dune-project instead 2 + opam-version: "2.0" 3 + synopsis: "Streaming opam file codec for OCaml" 4 + description: """ 5 + A type-safe codec library for opam files using a combinator-based 6 + approach inspired by Jsont and the ocaml-toml package. The core library 7 + provides a value AST and codec combinators. The opam.bytesrw subpackage 8 + provides a streaming parser and encoder built on Bytesrw; combine with 9 + bytesrw-eio to drive it from an Eio flow.""" 10 + maintainer: ["Thomas Gazagnaire <thomas@gazagnaire.org>"] 11 + authors: ["Thomas Gazagnaire <thomas@gazagnaire.org>"] 12 + license: "ISC" 13 + tags: ["org:blacksun" "codec" "format" "opam"] 14 + homepage: "https://tangled.org/gazagnaire.org/ocaml-opam" 15 + bug-reports: "https://tangled.org/gazagnaire.org/ocaml-opam/issues" 16 + depends: [ 17 + "dune" {>= "3.21"} 18 + "ocaml" {>= "4.14.0"} 19 + "fmt" {>= "0.9.0"} 20 + "bytesrw" {>= "0.1.0"} 21 + "loc" {>= "0.1"} 22 + "alcotest" {with-test} 23 + "odoc" {with-doc} 24 + ] 25 + build: [ 26 + ["dune" "subst"] {dev} 27 + [ 28 + "dune" 29 + "build" 30 + "-p" 31 + name 32 + "-j" 33 + jobs 34 + "@install" 35 + "@runtest" {with-test} 36 + "@doc" {with-doc} 37 + ] 38 + ] 39 + dev-repo: "git+https://tangled.org/gazagnaire.org/ocaml-opam" 40 + x-maintenance-intent: ["(latest)"]
+12
test/dune
··· 1 + (test 2 + (name test) 3 + (modules 4 + test 5 + test_value 6 + test_lexer 7 + test_parser 8 + test_printer 9 + test_opam 10 + test_opam_error 11 + test_opam_bytesrw) 12 + (libraries opam opam.bytesrw alcotest))
+16
test/test.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + let () = 7 + Alcotest.run "opam" 8 + [ 9 + Test_value.suite; 10 + Test_lexer.suite; 11 + Test_parser.suite; 12 + Test_printer.suite; 13 + Test_opam.suite; 14 + Test_opam_error.suite; 15 + Test_opam_bytesrw.suite; 16 + ]
+79
test/test_lexer.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + open Opam 7 + 8 + let collect s = 9 + let lex = Lexer.of_string s in 10 + let rec loop acc = 11 + let tok = Lexer.next lex in 12 + if tok = Lexer.EOF then List.rev acc else loop (tok :: acc) 13 + in 14 + loop [] 15 + 16 + let tok_eq = Alcotest.testable (Fmt.using (fun t -> t) Lexer.pp_token) ( = ) 17 + 18 + let check name expect input = 19 + Alcotest.(check (list tok_eq)) name expect (collect input) 20 + 21 + let test_punctuation () = 22 + check "brackets and braces" 23 + [ Lexer.LBRACKET; Lexer.RBRACKET; Lexer.LBRACE; Lexer.RBRACE ] 24 + "[ ] { }"; 25 + check "parens and colons" 26 + [ Lexer.LPAREN; Lexer.RPAREN; Lexer.COLON; Lexer.COMMA ] 27 + "( ) : ," 28 + 29 + let test_relops () = 30 + check "all relops" 31 + [ Lexer.EQ; Lexer.NEQ; Lexer.GT; Lexer.GEQ; Lexer.LT; Lexer.LEQ ] 32 + "= != > >= < <=" 33 + 34 + let test_envops () = 35 + check "envops" 36 + [ Lexer.PLUSEQ; Lexer.EQPLUS; Lexer.COLONEQ; Lexer.EQCOLON ] 37 + "+= =+ := =:" 38 + 39 + let test_logical () = 40 + check "and or" [ Lexer.AND; Lexer.OR ] "& |"; 41 + check "not qmark" [ Lexer.NOT; Lexer.QMARK ] "! ?" 42 + 43 + let test_strings () = 44 + check "basic string" [ Lexer.STRING "hello" ] {|"hello"|}; 45 + check "string with escapes" [ Lexer.STRING "a\"b\\c" ] {|"a\"b\\c"|}; 46 + check "triple-quoted" [ Lexer.STRING "raw\ntext" ] "\"\"\"raw\ntext\"\"\"" 47 + 48 + let test_numbers () = 49 + check "positive int" [ Lexer.INT 42 ] "42"; 50 + check "negative int" [ Lexer.INT (-7) ] "-7" 51 + 52 + let test_bools_idents () = 53 + check "bools" [ Lexer.BOOL true; Lexer.BOOL false ] "true false"; 54 + check "ident with dashes-and-dots" [ Lexer.IDENT "foo-bar.baz" ] "foo-bar.baz" 55 + 56 + let test_comment () = 57 + check "line comment is skipped" 58 + [ Lexer.IDENT "name"; Lexer.COLON; Lexer.STRING "foo" ] 59 + "name: # comment\n\"foo\"" 60 + 61 + let test_unterminated_string () = 62 + try 63 + let _ = collect {|"abc|} in 64 + Alcotest.fail "expected lexer error on unterminated string" 65 + with Opam.Error _ -> () 66 + 67 + let suite = 68 + ( "lexer", 69 + [ 70 + Alcotest.test_case "punctuation" `Quick test_punctuation; 71 + Alcotest.test_case "relops" `Quick test_relops; 72 + Alcotest.test_case "envops" `Quick test_envops; 73 + Alcotest.test_case "logical" `Quick test_logical; 74 + Alcotest.test_case "strings" `Quick test_strings; 75 + Alcotest.test_case "numbers" `Quick test_numbers; 76 + Alcotest.test_case "bools_idents" `Quick test_bools_idents; 77 + Alcotest.test_case "comments" `Quick test_comment; 78 + Alcotest.test_case "unterminated_string" `Quick test_unterminated_string; 79 + ] )
+2
test/test_lexer.mli
··· 1 + val suite : string * unit Alcotest.test_case list 2 + (** Alcotest suite for {!Opam.Lexer}. *)
+124
test/test_opam.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + open Opam 7 + 8 + type pkg = { 9 + opam_version : string; 10 + name : string; 11 + version : string option; 12 + maintainer : string list; 13 + depends : string list; 14 + } 15 + 16 + let pkg_codec : pkg File.t = 17 + File.( 18 + obj (fun opam_version name version maintainer depends -> 19 + { opam_version; name; version; maintainer; depends }) 20 + |> field "opam-version" string ~enc:(fun p -> p.opam_version) 21 + |> field "name" string ~enc:(fun p -> p.name) 22 + |> opt "version" string ~enc:(fun p -> p.version) 23 + |> field "maintainer" (list string) ~dec_absent:[] ~enc:(fun p -> 24 + p.maintainer) 25 + |> field "depends" (list string) ~dec_absent:[] ~enc:(fun p -> p.depends) 26 + |> finish) 27 + 28 + let test_decode_minimal () = 29 + let src = {|opam-version: "2.0" 30 + name: "foo"|} in 31 + match decode_string pkg_codec src with 32 + | Ok p -> 33 + Alcotest.(check string) "opam-version" "2.0" p.opam_version; 34 + Alcotest.(check string) "name" "foo" p.name; 35 + Alcotest.(check (option string)) "version" None p.version; 36 + Alcotest.(check (list string)) "maintainer" [] p.maintainer; 37 + Alcotest.(check (list string)) "depends" [] p.depends 38 + | Error e -> Alcotest.fail (Error.to_string e) 39 + 40 + let test_decode_full () = 41 + let src = 42 + {|opam-version: "2.0" 43 + name: "foo" 44 + version: "1.0" 45 + maintainer: ["a@example.com" "b@example.com"] 46 + depends: ["ocaml" "dune"]|} 47 + in 48 + match decode_string pkg_codec src with 49 + | Ok p -> 50 + Alcotest.(check string) "name" "foo" p.name; 51 + Alcotest.(check (option string)) "version" (Some "1.0") p.version; 52 + Alcotest.(check int) "maintainer count" 2 (List.length p.maintainer); 53 + Alcotest.(check (list string)) "depends" [ "ocaml"; "dune" ] p.depends 54 + | Error e -> Alcotest.fail (Error.to_string e) 55 + 56 + let test_missing_required () = 57 + match decode_string pkg_codec {|name: "foo"|} with 58 + | Ok _ -> Alcotest.fail "expected missing-field error" 59 + | Error _ -> () 60 + 61 + let test_encode_roundtrip () = 62 + let p = 63 + { 64 + opam_version = "2.0"; 65 + name = "foo"; 66 + version = Some "1.0"; 67 + maintainer = [ "a@example.com" ]; 68 + depends = [ "ocaml"; "dune" ]; 69 + } 70 + in 71 + let s = encode_string pkg_codec p in 72 + match decode_string pkg_codec s with 73 + | Ok p' -> 74 + Alcotest.(check string) "name" p.name p'.name; 75 + Alcotest.(check (option string)) "version" p.version p'.version; 76 + Alcotest.(check (list string)) "depends" p.depends p'.depends 77 + | Error e -> Alcotest.fail (Error.to_string e) 78 + 79 + let test_omit_optional () = 80 + let p = 81 + { 82 + opam_version = "2.0"; 83 + name = "foo"; 84 + version = None; 85 + maintainer = []; 86 + depends = []; 87 + } 88 + in 89 + let s = encode_string pkg_codec p in 90 + Alcotest.(check bool) 91 + "no version line" true 92 + (let rec find sub i = 93 + let n = String.length s in 94 + let m = String.length sub in 95 + if i + m > n then false 96 + else if String.sub s i m = sub then true 97 + else find sub (i + 1) 98 + in 99 + (* The output contains "opam-version:" but must not contain a 100 + line starting with "version:" *) 101 + not (find "\nversion:" 0)) 102 + 103 + type cwrap = { c : (Value.relop * string) list } 104 + 105 + let constraint_codec : cwrap File.t = 106 + File.( 107 + obj (fun c -> { c }) |> field "c" constraint_ ~enc:(fun w -> w.c) |> finish) 108 + 109 + let test_constraint_codec () = 110 + match decode_string constraint_codec {|c: >= "1.0" & < "2.0"|} with 111 + | Ok { c = [ (`Geq, "1.0"); (`Lt, "2.0") ] } -> () 112 + | Ok _ -> Alcotest.fail "wrong constraint clauses" 113 + | Error e -> Alcotest.fail (Error.to_string e) 114 + 115 + let suite = 116 + ( "opam", 117 + [ 118 + Alcotest.test_case "decode_minimal" `Quick test_decode_minimal; 119 + Alcotest.test_case "decode_full" `Quick test_decode_full; 120 + Alcotest.test_case "missing_required" `Quick test_missing_required; 121 + Alcotest.test_case "encode_roundtrip" `Quick test_encode_roundtrip; 122 + Alcotest.test_case "omit_optional" `Quick test_omit_optional; 123 + Alcotest.test_case "constraint" `Quick test_constraint_codec; 124 + ] )
+2
test/test_opam.mli
··· 1 + val suite : string * unit Alcotest.test_case list 2 + (** Alcotest suite for {!Opam}'s codec combinators. *)
+40
test/test_opam_bytesrw.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + open Bytesrw 7 + 8 + let test_of_string () = 9 + let f = Opam_bytesrw.of_string {|name: "foo"|} in 10 + match Opam.Value.find "name" f with 11 + | Some (Opam.Value.String "foo") -> () 12 + | _ -> Alcotest.fail "name field" 13 + 14 + let test_of_reader () = 15 + let r = Bytes.Reader.of_string {|opam-version: "2.0"|} in 16 + let f = Opam_bytesrw.of_reader r in 17 + match Opam.Value.find "opam-version" f with 18 + | Some (Opam.Value.String "2.0") -> () 19 + | _ -> Alcotest.fail "opam-version" 20 + 21 + let test_to_writer () = 22 + let f = 23 + Opam.Value.file [ Opam.Value.variable "name" (Opam.Value.string "foo") ] 24 + in 25 + let buf = Buffer.create 32 in 26 + let w = Bytes.Writer.of_buffer buf in 27 + Opam_bytesrw.to_writer w f; 28 + let s = Buffer.contents buf in 29 + Alcotest.(check bool) 30 + "writer output round-trips" true 31 + (let f' = Opam_bytesrw.of_string s in 32 + Opam.Value.equal_file { f with file_name = f'.file_name } f') 33 + 34 + let suite = 35 + ( "opam_bytesrw", 36 + [ 37 + Alcotest.test_case "of_string" `Quick test_of_string; 38 + Alcotest.test_case "of_reader" `Quick test_of_reader; 39 + Alcotest.test_case "to_writer" `Quick test_to_writer; 40 + ] )
+2
test/test_opam_bytesrw.mli
··· 1 + val suite : string * unit Alcotest.test_case list 2 + (** Alcotest suite for {!Opam_bytesrw}. *)
+38
test/test_opam_error.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + open Opam 7 + 8 + let test_typed_kind_renders () = 9 + let buf = Buffer.create 64 in 10 + let ppf = Format.formatter_of_buffer buf in 11 + let err = 12 + Error.v Error.Context.empty Loc.Meta.none (Error.Unexpected_char '@') 13 + in 14 + Error.pp ppf err; 15 + Format.pp_print_flush ppf (); 16 + let s = Buffer.contents buf in 17 + Alcotest.(check bool) 18 + "renders the typed kind, not 'Msg'" true 19 + (let n = String.length s in 20 + let rec find sub i = 21 + let m = String.length sub in 22 + if i + m > n then false 23 + else if String.sub s i m = sub then true 24 + else find sub (i + 1) 25 + in 26 + find "unexpected character '@'" 0) 27 + 28 + let test_raise_helpers () = 29 + try Error.unexpected_char Loc.Meta.none 'X' with 30 + | Error.Error (_, _, Error.Unexpected_char 'X') -> () 31 + | _ -> Alcotest.fail "expected Unexpected_char 'X'" 32 + 33 + let suite = 34 + ( "opam_error", 35 + [ 36 + Alcotest.test_case "renders typed kind" `Quick test_typed_kind_renders; 37 + Alcotest.test_case "raise_helpers" `Quick test_raise_helpers; 38 + ] )
+2
test/test_opam_error.mli
··· 1 + val suite : string * unit Alcotest.test_case list 2 + (** Alcotest suite for {!Opam.Error} typed kinds. *)
+340
test/test_parser.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + open Opam 7 + 8 + let parse s = Opam_bytesrw.of_string s 9 + 10 + let test_simple_field () = 11 + let f = parse {|name: "foo"|} in 12 + match Value.find "name" f with 13 + | Some (Value.String "foo") -> () 14 + | _ -> Alcotest.fail "name field" 15 + 16 + let test_int_field () = 17 + let f = parse "opam-version: 2" in 18 + match Value.find "opam-version" f with 19 + | Some (Value.Int 2) -> () 20 + | _ -> Alcotest.fail "opam-version int" 21 + 22 + let test_list_field () = 23 + let f = parse {|maintainer: ["a@example.com" "b@example.com"]|} in 24 + match Value.find "maintainer" f with 25 + | Some 26 + (Value.List [ Value.String "a@example.com"; Value.String "b@example.com" ]) 27 + -> 28 + () 29 + | _ -> Alcotest.fail "maintainer list" 30 + 31 + let test_constraint () = 32 + let f = parse {|depends: [ "ocaml" {>= "4.14"} ]|} in 33 + match Value.find "depends" f with 34 + | Some 35 + (Value.List 36 + [ 37 + Value.Option 38 + ( Value.String "ocaml", 39 + [ Value.Prefix_relop (`Geq, Value.String "4.14") ] ); 40 + ]) -> 41 + () 42 + | _ -> Alcotest.fail "constraint shape" 43 + 44 + let test_logop_constraint () = 45 + let f = parse {|x: ["foo" {>= "1.0" & < "2.0"}]|} in 46 + match Value.find "x" f with 47 + | Some (Value.List [ Value.Option (Value.String "foo", [ filter ]) ]) -> ( 48 + match filter with 49 + | Value.Logop 50 + ( `And, 51 + Value.Prefix_relop (`Geq, Value.String "1.0"), 52 + Value.Prefix_relop (`Lt, Value.String "2.0") ) -> 53 + () 54 + | _ -> Alcotest.fail "filter logop shape") 55 + | _ -> Alcotest.fail "logop constraint shape" 56 + 57 + let test_section () = 58 + let f = 59 + parse 60 + {| 61 + url { 62 + src: "https://example.com/foo-1.0.tar.gz" 63 + checksum: "sha256=abc" 64 + }|} 65 + in 66 + match Value.section ~kind:"url" f with 67 + | Some s -> Alcotest.(check int) "section item count" 2 (List.length s.items) 68 + | None -> Alcotest.fail "url section" 69 + 70 + let test_section_with_label () = 71 + let f = 72 + parse 73 + {| 74 + extra-source "patch.diff" { 75 + src: "https://example.com/patch.diff" 76 + }|} 77 + in 78 + match Value.section ~kind:"extra-source" f with 79 + | Some s -> 80 + Alcotest.(check (option string)) "label" (Some "patch.diff") s.name 81 + | None -> Alcotest.fail "extra-source section" 82 + 83 + let test_env_binding () = 84 + let f = parse {|setenv: PATH += "/usr/local/bin"|} in 85 + match Value.find "setenv" f with 86 + | Some 87 + (Value.Env_binding 88 + (Value.Ident "PATH", `PlusEq, Value.String "/usr/local/bin")) -> 89 + () 90 + | _ -> Alcotest.fail "env binding shape" 91 + 92 + let test_pfxop () = 93 + let f = parse {|x: !available|} in 94 + match Value.find "x" f with 95 + | Some (Value.Pfxop (`Not, Value.Ident "available")) -> () 96 + | _ -> Alcotest.fail "pfxop shape" 97 + 98 + let test_real_opam_file () = 99 + let src = 100 + {|opam-version: "2.0" 101 + synopsis: "A test package" 102 + maintainer: "thomas@example.com" 103 + authors: ["Thomas"] 104 + license: "ISC" 105 + homepage: "https://example.com" 106 + depends: [ 107 + "ocaml" {>= "4.14"} 108 + "dune" {>= "3.0"} 109 + ] 110 + build: [ 111 + ["dune" "build" "-p" name "-j" jobs] 112 + ]|} 113 + in 114 + let f = parse src in 115 + Alcotest.(check int) "items" 8 (List.length f.contents) 116 + 117 + (* Negative test helper: parsing [src] must raise [Opam.Error]. The 118 + [why] string is the failure label if no error is raised. *) 119 + let must_fail why src = 120 + try 121 + let _ = parse src in 122 + Alcotest.failf "expected error: %s (input: %S)" why src 123 + with Opam.Error _ -> () 124 + 125 + let must_fail_kind name pred src = 126 + try 127 + let _ = parse src in 128 + Alcotest.failf "expected error %s on %S" name src 129 + with 130 + | Opam.Error (_, _, kind) when pred kind -> () 131 + | Opam.Error (_, _, _) -> 132 + Alcotest.failf "wrong error kind: expected %s for %S" name src 133 + 134 + let test_neg_unbalanced_bracket () = must_fail "unbalanced [" {|x: [|} 135 + let test_neg_unbalanced_paren () = must_fail "unbalanced (" {|x: (1|} 136 + let test_neg_unbalanced_brace () = must_fail "unbalanced {" {|s {|} 137 + let test_neg_extra_close_bracket () = must_fail "extra ]" {|x: ]|} 138 + let test_neg_extra_close_paren () = must_fail "extra )" {|x: )|} 139 + let test_neg_extra_close_brace () = must_fail "extra }" "x: } " 140 + let test_neg_value_without_field () = must_fail "value at top level" {|"foo"|} 141 + let test_neg_int_without_field () = must_fail "int at top level" "42" 142 + 143 + let test_neg_string_without_field () = 144 + must_fail "string at top level" {|"unattached"|} 145 + 146 + let test_neg_double_colon () = 147 + must_fail "field followed by another colon" {|x:: 1|} 148 + 149 + let test_neg_missing_value () = must_fail "field without value" {|name:|} 150 + let test_neg_field_no_colon () = must_fail "field without colon" {|name "foo"|} 151 + let test_neg_garbage_top () = must_fail "garbage punctuation at top" {|, : ; |} 152 + 153 + let test_neg_unterminated_string () = 154 + must_fail_kind "unterminated string" 155 + (function Opam.Error.Unterminated_string -> true | _ -> false) 156 + {|name: "abc|} 157 + 158 + let test_neg_invalid_escape () = 159 + must_fail_kind "invalid escape" 160 + (function Opam.Error.Invalid_escape _ -> true | _ -> false) 161 + {|name: "\q"|} 162 + 163 + let test_neg_invalid_hex_escape () = 164 + must_fail_kind "invalid hex escape" 165 + (function Opam.Error.Invalid_hex_escape -> true | _ -> false) 166 + {|name: "\xZZ"|} 167 + 168 + let test_neg_decimal_overflow () = 169 + must_fail_kind "decimal out of range" 170 + (function Opam.Error.Decimal_escape_out_of_range -> true | _ -> false) 171 + {|name: "\999"|} 172 + 173 + let test_neg_unexpected_char () = 174 + must_fail_kind "unexpected char" 175 + (function Opam.Error.Unexpected_char _ -> true | _ -> false) 176 + "x: ^" 177 + 178 + let test_neg_section_missing_close () = 179 + must_fail "section missing }" {|url { src: "x"|} 180 + 181 + let test_neg_section_missing_open () = 182 + must_fail "section missing {" {|url "label"|} 183 + 184 + let test_neg_double_lbracket () = must_fail "[[ at top" {|[[|} 185 + 186 + let test_neg_unclosed_filter () = 187 + must_fail "unclosed { in filter" {|x: ["foo" {>= "1.0"|} 188 + 189 + let test_neg_logop_no_rhs () = must_fail "logop with no rhs" {|x: 1 &|} 190 + let test_neg_relop_only () = must_fail "lone relop" {|x: >= |} 191 + let test_neg_envop_no_rhs () = must_fail "envop with no rhs" {|x: PATH +=|} 192 + let test_neg_pfxop_no_arg () = must_fail "pfxop with no arg" {|x: !|} 193 + 194 + let test_neg_value_then_garbage () = 195 + must_fail "garbage after value" {|x: "foo" "bar" %|} 196 + 197 + let test_neg_section_label () = 198 + must_fail "section label without quotes" {|url label { }|} 199 + 200 + let test_neg_string_newline () = 201 + must_fail "newline in basic string" "x: \"foo\nbar\"" 202 + 203 + (* find_field: token-level skip-parse for extracting one top-level field. *) 204 + let test_find_field_present () = 205 + let src = 206 + {|opam-version: "2.0" 207 + name: "foo" 208 + depends: [ "ocaml" {>= "4.14"} ] 209 + authors: ["a"]|} 210 + in 211 + match Opam_bytesrw.field "depends" src with 212 + | Some (Value.List [ Value.Option (Value.String "ocaml", _) ]) -> () 213 + | Some _ -> Alcotest.fail "depends: wrong shape" 214 + | None -> Alcotest.fail "depends: not found" 215 + 216 + let test_find_field_absent () = 217 + let src = {|name: "foo" 218 + version: "1.0"|} in 219 + Alcotest.(check bool) 220 + "missing field returns None" true 221 + (Option.is_none (Opam_bytesrw.field "depends" src)) 222 + 223 + let test_find_field_skips_sections () = 224 + let src = 225 + {|name: "foo" 226 + url { 227 + src: "https://example.com/foo.tar.gz" 228 + checksum: "sha256=abc" 229 + } 230 + version: "1.0"|} 231 + in 232 + match Opam_bytesrw.field "version" src with 233 + | Some (Value.String "1.0") -> () 234 + | _ -> Alcotest.fail "version after section: not found" 235 + 236 + let test_find_field_early_exit () = 237 + (* Field appears before syntax errors — find_field must not scan past it. *) 238 + let src = {|name: "foo" 239 + THIS IS NOT VALID|} in 240 + match Opam_bytesrw.field "name" src with 241 + | Some (Value.String "foo") -> () 242 + | _ -> Alcotest.fail "early-exit failed" 243 + 244 + (* Error locations should point at the offending token, not at the file 245 + start or some earlier position. *) 246 + let test_error_location () = 247 + let src = "name: \"foo\"\nversion: garbage^here" in 248 + try 249 + let _ = parse src in 250 + Alcotest.fail "expected error" 251 + with Opam.Error (_, meta, _) -> 252 + let loc = Loc.Meta.textloc meta in 253 + let line = Loc.first_line_num loc in 254 + let byte = Loc.first_byte loc in 255 + Alcotest.(check int) "error line" 2 line; 256 + Alcotest.(check bool) 257 + (Fmt.str "error points inside line 2 (byte %d)" byte) 258 + true 259 + (byte > 12 (* past the first newline *)) 260 + 261 + (* peek2 in parse_atom_post could in theory confuse the loc reporting 262 + if the parser raised on buf.(0) after reading ahead. Verify by 263 + triggering an error right after an IDENT with peek2 lookahead. *) 264 + let test_error_location_after_peek2 () = 265 + let src = {|x: foo $|} in 266 + try 267 + let _ = parse src in 268 + Alcotest.fail "expected error" 269 + with Opam.Error (_, meta, _) -> 270 + let loc = Loc.Meta.textloc meta in 271 + let byte = Loc.first_byte loc in 272 + Alcotest.(check bool) 273 + (Fmt.str "error points at '$' (byte %d)" byte) 274 + true 275 + (byte >= 7 (* past 'x: foo ' *)) 276 + 277 + let suite = 278 + ( "parser", 279 + [ 280 + Alcotest.test_case "simple_field" `Quick test_simple_field; 281 + Alcotest.test_case "int_field" `Quick test_int_field; 282 + Alcotest.test_case "list_field" `Quick test_list_field; 283 + Alcotest.test_case "constraint" `Quick test_constraint; 284 + Alcotest.test_case "logop_constraint" `Quick test_logop_constraint; 285 + Alcotest.test_case "section" `Quick test_section; 286 + Alcotest.test_case "section_with_label" `Quick test_section_with_label; 287 + Alcotest.test_case "env_binding" `Quick test_env_binding; 288 + Alcotest.test_case "pfxop" `Quick test_pfxop; 289 + Alcotest.test_case "real_opam_file" `Quick test_real_opam_file; 290 + Alcotest.test_case "neg/unbalanced_bracket" `Quick 291 + test_neg_unbalanced_bracket; 292 + Alcotest.test_case "neg/unbalanced_paren" `Quick test_neg_unbalanced_paren; 293 + Alcotest.test_case "neg/unbalanced_brace" `Quick test_neg_unbalanced_brace; 294 + Alcotest.test_case "neg/extra_close_bracket" `Quick 295 + test_neg_extra_close_bracket; 296 + Alcotest.test_case "neg/extra_close_paren" `Quick 297 + test_neg_extra_close_paren; 298 + Alcotest.test_case "neg/extra_close_brace" `Quick 299 + test_neg_extra_close_brace; 300 + Alcotest.test_case "neg/value_without_field" `Quick 301 + test_neg_value_without_field; 302 + Alcotest.test_case "neg/int_without_field" `Quick 303 + test_neg_int_without_field; 304 + Alcotest.test_case "neg/string_without_field" `Quick 305 + test_neg_string_without_field; 306 + Alcotest.test_case "neg/double_colon" `Quick test_neg_double_colon; 307 + Alcotest.test_case "neg/missing_value" `Quick test_neg_missing_value; 308 + Alcotest.test_case "neg/field_no_colon" `Quick test_neg_field_no_colon; 309 + Alcotest.test_case "neg/garbage_top" `Quick test_neg_garbage_top; 310 + Alcotest.test_case "neg/unterminated_string" `Quick 311 + test_neg_unterminated_string; 312 + Alcotest.test_case "neg/invalid_escape" `Quick test_neg_invalid_escape; 313 + Alcotest.test_case "neg/invalid_hex_escape" `Quick 314 + test_neg_invalid_hex_escape; 315 + Alcotest.test_case "neg/decimal_overflow" `Quick test_neg_decimal_overflow; 316 + Alcotest.test_case "neg/unexpected_char" `Quick test_neg_unexpected_char; 317 + Alcotest.test_case "neg/section_missing_close" `Quick 318 + test_neg_section_missing_close; 319 + Alcotest.test_case "neg/section_missing_open" `Quick 320 + test_neg_section_missing_open; 321 + Alcotest.test_case "neg/double_lbracket" `Quick test_neg_double_lbracket; 322 + Alcotest.test_case "neg/unclosed_filter" `Quick test_neg_unclosed_filter; 323 + Alcotest.test_case "neg/logop_no_rhs" `Quick test_neg_logop_no_rhs; 324 + Alcotest.test_case "neg/relop_only" `Quick test_neg_relop_only; 325 + Alcotest.test_case "neg/envop_no_rhs" `Quick test_neg_envop_no_rhs; 326 + Alcotest.test_case "neg/pfxop_no_arg" `Quick test_neg_pfxop_no_arg; 327 + Alcotest.test_case "neg/value_then_garbage" `Quick 328 + test_neg_value_then_garbage; 329 + Alcotest.test_case "neg/section_label" `Quick test_neg_section_label; 330 + Alcotest.test_case "neg/string_newline" `Quick test_neg_string_newline; 331 + Alcotest.test_case "error_location" `Quick test_error_location; 332 + Alcotest.test_case "error_location_after_peek2" `Quick 333 + test_error_location_after_peek2; 334 + Alcotest.test_case "find_field/present" `Quick test_find_field_present; 335 + Alcotest.test_case "find_field/absent" `Quick test_find_field_absent; 336 + Alcotest.test_case "find_field/skips_sections" `Quick 337 + test_find_field_skips_sections; 338 + Alcotest.test_case "find_field/early_exit" `Quick 339 + test_find_field_early_exit; 340 + ] )
+2
test/test_parser.mli
··· 1 + val suite : string * unit Alcotest.test_case list 2 + (** Alcotest suite for {!Opam.Parser}. *)
+82
test/test_printer.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + open Opam 7 + 8 + let parse s = Opam_bytesrw.of_string s 9 + let print = Opam_bytesrw.to_string 10 + 11 + let test_roundtrip name src = 12 + let f1 = parse src in 13 + let s = print f1 in 14 + let f2 = parse s in 15 + Alcotest.(check bool) 16 + (name ^ " roundtrip") true 17 + (List.length f1.contents = List.length f2.contents 18 + && List.for_all2 Value.equal_item f1.contents f2.contents) 19 + 20 + let test_simple () = test_roundtrip "simple" {|name: "foo"|} 21 + 22 + let test_full () = 23 + test_roundtrip "full" 24 + {|opam-version: "2.0" 25 + synopsis: "A test package" 26 + maintainer: "thomas@example.com" 27 + authors: ["Thomas"] 28 + license: "ISC" 29 + depends: [ 30 + "ocaml" {>= "4.14"} 31 + "dune" {>= "3.0"} 32 + ]|} 33 + 34 + let test_section () = 35 + test_roundtrip "section" 36 + {|url { 37 + src: "https://example.com/foo.tar.gz" 38 + checksum: "sha256=abc" 39 + }|} 40 + 41 + let test_value_string_escapes () = 42 + let v = Value.string "with\"quote\nand\nnewline" in 43 + let s = Printer.value_to_string v in 44 + let lex = Lexer.of_string s in 45 + let v' = Parser.parse_value lex in 46 + Alcotest.(check bool) "escapes survive" true (Value.equal v v') 47 + 48 + let test_pretty_content () = 49 + (* A pretty-print test must check content, not just non-empty. *) 50 + let f = parse {|name: "foo"|} in 51 + let s = print f in 52 + Alcotest.(check bool) 53 + "contains name field" true 54 + (let n = String.length s in 55 + n >= 5 56 + && 57 + let rec contains_at i = 58 + if i + 4 > n then false 59 + else if String.sub s i 4 = "name" then true 60 + else contains_at (i + 1) 61 + in 62 + contains_at 0); 63 + Alcotest.(check bool) 64 + "contains foo string" true 65 + (let n = String.length s in 66 + let rec find sub i = 67 + let m = String.length sub in 68 + if i + m > n then false 69 + else if String.sub s i m = sub then true 70 + else find sub (i + 1) 71 + in 72 + find "\"foo\"" 0) 73 + 74 + let suite = 75 + ( "printer", 76 + [ 77 + Alcotest.test_case "simple" `Quick test_simple; 78 + Alcotest.test_case "full" `Quick test_full; 79 + Alcotest.test_case "section" `Quick test_section; 80 + Alcotest.test_case "string_escapes" `Quick test_value_string_escapes; 81 + Alcotest.test_case "pretty_content" `Quick test_pretty_content; 82 + ] )
+2
test/test_printer.mli
··· 1 + val suite : string * unit Alcotest.test_case list 2 + (** Alcotest suite for {!Opam.Printer}. *)
+88
test/test_value.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2026 Thomas Gazagnaire <thomas@gazagnaire.org> 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + open Opam 7 + 8 + let test_constructors () = 9 + Alcotest.(check bool) "bool" true (Value.to_bool (Value.bool true)); 10 + Alcotest.(check int) "int" 42 (Value.to_int (Value.int 42)); 11 + Alcotest.(check string) "string" "hi" (Value.to_string (Value.string "hi")); 12 + Alcotest.(check string) "ident" "name" (Value.to_ident (Value.ident "name")); 13 + let l = [ Value.string "a"; Value.string "b" ] in 14 + Alcotest.(check int) "list" 2 (List.length (Value.to_list (Value.list l))) 15 + 16 + let test_predicates () = 17 + Alcotest.(check bool) "is_string" true (Value.is_string (Value.string "x")); 18 + Alcotest.(check bool) "not is_int" false (Value.is_int (Value.string "x")); 19 + Alcotest.(check bool) "is_list" true (Value.is_list (Value.list [])); 20 + Alcotest.(check bool) 21 + "to_string_opt none" true 22 + (Option.is_none (Value.to_string_opt (Value.bool true))) 23 + 24 + let test_find () = 25 + let f = 26 + Value.file 27 + [ 28 + Value.variable "name" (Value.string "foo"); 29 + Value.variable "version" (Value.string "1.0"); 30 + ] 31 + in 32 + (match Value.find "name" f with 33 + | Some (Value.String "foo") -> () 34 + | _ -> Alcotest.fail "find: name lookup failed"); 35 + Alcotest.(check bool) "find: missing" true (Option.is_none (Value.find "x" f)) 36 + 37 + let test_section_lookup () = 38 + let f = 39 + Value.file 40 + [ 41 + Value.Section 42 + { 43 + kind = "url"; 44 + name = Some "src"; 45 + items = 46 + [ Value.variable "src" (Value.string "https://example.com") ]; 47 + }; 48 + Value.Section { kind = "extra-source"; name = Some "foo"; items = [] }; 49 + ] 50 + in 51 + (match Value.section ~kind:"url" f with 52 + | Some s -> Alcotest.(check (option string)) "url label" (Some "src") s.name 53 + | None -> Alcotest.fail "section: url not found"); 54 + Alcotest.(check bool) 55 + "section missing kind" true 56 + (Option.is_none (Value.section ~kind:"depends" f)) 57 + 58 + let test_equal_roundtrip () = 59 + let v = Value.list [ Value.int 1; Value.string "x" ] in 60 + Alcotest.(check bool) "equal reflexive" true (Value.equal v v); 61 + Alcotest.(check bool) 62 + "equal different" false 63 + (Value.equal v (Value.list [ Value.int 2 ])) 64 + 65 + let test_relop () = 66 + Alcotest.(check string) "Eq" "=" (Value.relop_to_string `Eq); 67 + Alcotest.(check string) "Geq" ">=" (Value.relop_to_string `Geq); 68 + match Value.relop_of_string ">=" with 69 + | Some `Geq -> () 70 + | _ -> Alcotest.fail "relop_of_string >=" 71 + 72 + let test_envop () = 73 + Alcotest.(check string) "PlusEq" "+=" (Value.env_op_to_string `PlusEq); 74 + match Value.env_op_of_string ":=" with 75 + | Some `ColonEq -> () 76 + | _ -> Alcotest.fail "env_op_of_string :=" 77 + 78 + let suite = 79 + ( "value", 80 + [ 81 + Alcotest.test_case "constructors" `Quick test_constructors; 82 + Alcotest.test_case "predicates" `Quick test_predicates; 83 + Alcotest.test_case "find" `Quick test_find; 84 + Alcotest.test_case "section_lookup" `Quick test_section_lookup; 85 + Alcotest.test_case "equal" `Quick test_equal_roundtrip; 86 + Alcotest.test_case "relop" `Quick test_relop; 87 + Alcotest.test_case "envop" `Quick test_envop; 88 + ] )
+2
test/test_value.mli
··· 1 + val suite : string * unit Alcotest.test_case list 2 + (** Alcotest suite for {!Opam.Value}. *)