protobuf: map<K,V>, unknown-field preservation, CVE test matrix
- [Message.map tag get key_codec value_codec] declares a [map<K,V>]
field. On the wire this is sugar for a repeated nested message with
[key = 1] and [value = 2], and the decoder handles both forms.
Internal [map_entry_codec] builds the entry submessage inline
without routing through [let* / finish] -- the entry is an ephemeral
tuple rather than a named record.
- [decode_with_unknowns_string] / [encode_with_unknowns_string] let
forward-compatible pipelines preserve fields whose tag was not in
the schema. Decode returns [Ok (value, unknown_wire)] where the
byte string can be tacked onto a later encode via the matching
[~unknowns] argument. Unknowns are re-serialized in canonical form
and sorted by tag, so round-trip preserves semantics but not
byte-identity. Standard [decode_string] / [encode_string] still
silently drop unknowns.
Implementation: [Message.take_last] and [take_all] now consume the
matched entries from the parse_wire hashtable; what remains after
[decode_fields] returns is exactly the unknown-field set.
- Hostile-input suite is rewritten around CVE identifiers. Each test
cites the upstream vulnerability:
CVE-2015-5237 (C++ 2015): huge length prefix, over-long varint,
truncated tag -- integer overflow / DoS
CVE-2021-22569 (Java 2021): many small groups -- memory
amplification
CVE-2022-1941 (C++ 2022): all-unknown-fields schema -- out-of-memory DoS
CVE-2022-3171 (Java 2022): deprecated group wire types 3 & 4
CVE-2024-7254 (Java 2024): deep nesting in known and unknown
message fields
CVE-2024-47554 (Rust prost 2024): length past end, packed
corrupt body
Plus spec-conformance tests for reserved tag 0, wire-type mismatch,
non-UTF-8 string content (must accept), empty input (proto3
defaults), overrun rejection, and map duplicate keys (last-wins
but decoder preserves wire order).
- GADT tweak: drop the [_t] suffix from [Fixed32_t] / [Fixed64_t]
codec constructors. OCaml's type-directed constructor
disambiguation resolves the name collision with [Wire.Fixed32] /
[Wire.Fixed64] by context.
- Add [Protobuf.pp : 'a t -> _] printing the codec's sort (for
debugging / merlint E415).
- Add a top-level [.ocamlformat] (version 0.29.0) to match the
monorepo convention.
- Add one-line docstrings to every [Wire.read_*] entry in [wire.mli].
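A minimal sketch of the map desugaring. It assumes a map<string, int32>-style field with non-negative values. Each binding becomes one length-delimited entry record, with the key as field 1 and the value as field 2, emitted exactly as a repeated submessage would be. The helpers (add_varint, add_len, encode_entry, encode_map) are hypothetical and are not the library's API.

```ocaml
(* Hypothetical helpers, not the library's API. Only non-negative ints are
   supported, to keep the varint encoder short. *)

(* Unsigned LEB128: 7 bits per byte, high bit set on every byte but the last. *)
let add_varint buf n =
  if n < 0 then invalid_arg "add_varint: negative value";
  let rec go n =
    if n < 0x80 then Buffer.add_char buf (Char.chr n)
    else begin
      Buffer.add_char buf (Char.chr ((n land 0x7f) lor 0x80));
      go (n lsr 7)
    end
  in
  go n

(* A tag is (field_number lsl 3) lor wire_type. *)
let add_tag buf field wire_type = add_varint buf ((field lsl 3) lor wire_type)

(* Wire type 2 (LEN): tag, byte length, payload. *)
let add_len buf field payload =
  add_tag buf field 2;
  add_varint buf (String.length payload);
  Buffer.add_string buf payload

(* One entry submessage: key = 1 (LEN), value = 2 (VARINT). *)
let encode_entry key value =
  let buf = Buffer.create 16 in
  add_len buf 1 key;
  add_tag buf 2 0;
  add_varint buf value;
  Buffer.contents buf

(* map<string, int32> field [field]: one LEN record per binding, byte for byte
   what a [repeated Entry] field with the same number would produce. *)
let encode_map field bindings =
  let buf = Buffer.create 64 in
  List.iter (fun (k, v) -> add_len buf field (encode_entry k v)) bindings;
  Buffer.contents buf
```

Because the bytes are identical in this sketch, a reader that treats the field as repeated entries and one that treats it as a map see the same input. That is why a single parser can serve both forms.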
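The consume-then-leftover approach to unknown fields can be sketched as follows. The sketch assumes parse_wire yields a Hashtbl from field number to raw values in wire order. take_last and take_all only mirror the names in this commit, and their signatures here are assumptions. The raw record, parse_table and unknowns_to_string are hypothetical.

```ocaml
(* Hypothetical model, not the library's internals: a parsed message is a
   table from field number to that field's raw values in wire order. *)
type raw = { wire_type : int; payload : string (* bytes after the tag *) }

let parse_table fields =
  let t = Hashtbl.create 16 in
  List.iter
    (fun (n, r) ->
      let prev = Option.value (Hashtbl.find_opt t n) ~default:[] in
      Hashtbl.replace t n (prev @ [ r ]))
    fields;
  t

(* Consume field [n] and return its last occurrence (singular: last wins). *)
let take_last t n =
  match Hashtbl.find_opt t n with
  | None | Some [] -> None
  | Some rs ->
      Hashtbl.remove t n;
      Some (List.nth rs (List.length rs - 1))

(* Consume field [n] and return every occurrence (repeated fields). *)
let take_all t n =
  match Hashtbl.find_opt t n with
  | None -> []
  | Some rs ->
      Hashtbl.remove t n;
      rs

(* Unsigned LEB128 for non-negative ints. *)
let rec add_varint buf n =
  if n < 0x80 then Buffer.add_char buf (Char.chr n)
  else begin
    Buffer.add_char buf (Char.chr ((n land 0x7f) lor 0x80));
    add_varint buf (n lsr 7)
  end

(* Whatever the schema did not consume is the unknown set. Re-emit it sorted
   by field number, keeping wire order within each field. *)
let unknowns_to_string t =
  let buf = Buffer.create 32 in
  Hashtbl.fold (fun n rs acc -> (n, rs) :: acc) t []
  |> List.sort (fun (a, _) (b, _) -> compare a b)
  |> List.iter (fun (n, rs) ->
         List.iter
           (fun r ->
             add_varint buf ((n lsl 3) lor r.wire_type);
             Buffer.add_string buf r.payload)
           rs);
  Buffer.contents buf
```

Fields are re-emitted from the table rather than copied from the input. As a result, unknowns that arrived out of order come back sorted by tag, which gives the "semantics but not byte-identity" round-trip described in the commit.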
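The huge-length-prefix, over-long-varint and length-past-end cases are usually defended against in two ways. First, varints are capped at 10 bytes. Second, a length prefix is checked against the bytes actually remaining before any slicing. Below is a minimal sketch with hypothetical names (read_varint, read_bytes, error), not the Wire module's actual signatures.

```ocaml
(* Hypothetical error type for this sketch. *)
type error = Truncated | Varint_too_long | Length_overrun

(* Decode an unsigned varint at [pos]. Protobuf varints are at most 10 bytes;
   bits beyond OCaml's 63-bit int wrap silently in this sketch. *)
let read_varint s pos =
  let len = String.length s in
  let rec go pos shift acc count =
    if count >= 10 then Error Varint_too_long
    else if pos >= len then Error Truncated
    else
      let byte = Char.code s.[pos] in
      let acc = acc lor ((byte land 0x7f) lsl shift) in
      if byte land 0x80 = 0 then Ok (acc, pos + 1)
      else go (pos + 1) (shift + 7) acc (count + 1)
  in
  go pos 0 0 0

(* Read a length-prefixed payload. The bound check is written as
   [n > len - pos] so it cannot overflow, and a wrapped negative length
   is rejected too. *)
let read_bytes s pos =
  match read_varint s pos with
  | Error e -> Error e
  | Ok (n, pos) ->
      if n < 0 || n > String.length s - pos then Error Length_overrun
      else Ok (String.sub s pos n, pos + n)
```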
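The deep-nesting cases come down to bounding recursion. The sketch below assumes that every length-delimited field is a nested message, and it uses an arbitrary max_depth of 16. A real decoder picks its own limit and also has to handle LEN fields that hold strings or bytes. All names are hypothetical.

```ocaml
(* Hypothetical sketch: every LEN field is assumed to hold a nested message. *)
exception Too_deep
exception Malformed

let max_depth = 16 (* assumed limit *)

(* Unsigned varint; raises [Malformed] on truncation or more than 10 bytes. *)
let read_varint s pos =
  let rec go pos shift acc =
    if pos >= String.length s || shift > 63 then raise Malformed
    else
      let b = Char.code s.[pos] in
      let acc = acc lor ((b land 0x7f) lsl shift) in
      if b < 0x80 then (acc, pos + 1) else go (pos + 1) (shift + 7) acc
  in
  go pos 0 0

(* Walk s.[start .. stop-1] as a message at nesting level [depth] and return
   the deepest level reached. Raises [Too_deep] instead of recursing without
   bound. *)
let rec walk depth s start stop =
  if depth > max_depth then raise Too_deep;
  let deepest = ref depth and cur = ref start in
  while !cur < stop do
    let tag, p = read_varint s !cur in
    cur :=
      (match tag land 7 with
      | 0 -> snd (read_varint s p)
      | 2 ->
          let n, p = read_varint s p in
          if n < 0 || n > stop - p then raise Malformed;
          deepest := max !deepest (walk (depth + 1) s p (p + n));
          p + n
      | _ -> raise Malformed (* fixed-width and group types omitted here *))
  done;
  (* A varint that ran past [stop] leaves [cur] beyond the boundary. *)
  if !cur <> stop then raise Malformed;
  !deepest
```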
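The constructor-disambiguation point can be illustrated with a toy codec GADT whose Fixed32 / Fixed64 share names with a Wire variant. The module and type names below are hypothetical stand-ins. The Format-based signature for pp is also an assumption; the commit only gives [Protobuf.pp : 'a t -> _].

```ocaml
(* Hypothetical stand-in for the Wire module's wire-type variant. *)
module Wire = struct
  type wire_type = Varint | Fixed64 | Len | Fixed32
end

open Wire

(* Toy codec GADT: Fixed32 / Fixed64 reuse Wire's constructor names and,
   being defined later, shadow them for unqualified lookup. *)
type _ t =
  | Int32 : int32 t
  | Fixed32 : int32 t
  | Fixed64 : int64 t
  | String : string t

(* The annotated result type [wire_type] lets the compiler resolve the
   right-hand Fixed32 / Fixed64 to Wire's constructors, not the codec's. *)
let wire_type : type a. a t -> wire_type = function
  | Int32 -> Varint
  | Fixed32 -> Fixed32
  | Fixed64 -> Fixed64
  | String -> Len

(* Debug printer for the codec's sort (hypothetical signature). *)
let pp : type a. Format.formatter -> a t -> unit =
 fun ppf c ->
  Format.pp_print_string ppf
    (match c with
    | Int32 -> "int32"
    | Fixed32 -> "fixed32"
    | Fixed64 -> "fixed64"
    | String -> "string")
```

Writing [Wire.Fixed32] explicitly would also compile. In this sketch, the qualification simply becomes optional once the expected type is known.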
All 49 unit + 17 fuzz + 2 protoc interop tests pass.
Remaining merlint items (queued for next session): inline
test_hostile.ml into test_protobuf.ml as a [hostile_cases] list per
the user's established pattern; shorten test identifiers to
<= 4 underscores; rename [Wire.wire_type] to [Wire.t].