Native CBOR codec with type-safe combinators

cbor: rename from cbort (partial downstream)

Drops the "t" suffix. Internal raw CBOR module moves to Value (was
Cbor in lib/cbor.ml), matching the value/codec/<pkg> layout from the
other codec packages. Low-level byte R/W moved to lib/binary.ml (was
lib/cbor_rw.ml). Library name cbor; main module Cbor via lib/cbor.ml
(was cbort.ml).

Downstream packages (ocaml-bundle, ocaml-cose, ocaml-bpsec, ocaml-scitt,
ocaml-crow, irmin) partially migrated: Cbort.Cbor -> Cbor.Value, the
internal Cbor alias shadowing in each file renamed to V to free the
top-level Cbor for the library facade. Some downstream build errors
remain because many callsites conflated raw value constructors
(Cbor.int, Cbor.int64) with schema codecs and need manual triage.
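
A minimal sketch of why those callsites need manual triage, using stand-in modules rather than the real library. A raw value constructor builds a CBOR value directly, while a schema codec describes how an OCaml type maps to and from such values, so the two have different types even when they share a name. The `Value` and `Codec` modules and every member below are hypothetical illustrations, not the actual `Cbor` API.

```ocaml
(* Hypothetical stand-ins; NOT the real Cbor library API. *)
module Value = struct
  type t = Int of int | Text of string

  (* Raw value constructor: builds a value directly. *)
  let int n = Int n
end

module Codec = struct
  (* A schema codec pairs an encoder with a decoder for one OCaml type. *)
  type 'a t = { enc : 'a -> Value.t; dec : Value.t -> ('a, string) result }

  let int : int t =
    {
      enc = (fun n -> Value.Int n);
      dec = (function Value.Int n -> Ok n | _ -> Error "expected integer");
    }

  let encode c v = c.enc v
  let decode c v = c.dec v
end

(* Alias the raw value module as V so the short name stays unambiguous. *)
module V = Value

(* [V.int] produces a value; [Codec.int] must be applied through encode. *)
let raw : V.t = V.int 42
let via_codec : V.t = Codec.encode Codec.int 42

let () =
  assert (raw = via_codec);
  assert (Codec.decode Codec.int raw = Ok 42);
  assert (Codec.decode Codec.int (V.Text "x") = Error "expected integer")
```

Under this reading, a callsite that passes `V.int` where a codec is expected fails to type-check, which is the kind of error that cannot be fixed by a mechanical rename alone.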

The lib/binary.ml R/W primitives are NOT re-exported through Cbor.Binary
due to OCaml's lazy module alias elision when the aliased module isn't
referenced by any type/value in the parent signature. A separate
cbor.bytesrw library (ocaml-cbor/lib/bytesrw/) is the right home for
that, matching json.bytesrw / toml.bytesrw; left as a follow-up.

+10349
+17
.gitignore
# OCaml build artifacts
_build/
*.install
*.merlin

# Third-party sources (fetch locally with opam source)
third_party/

# Editor and OS files
.DS_Store
*.swp
*~
.vscode/
.idea/

# Opam local switch
_opam/
+1
.ocamlformat
version = 0.29.0
+54
.tangled/workflows/build.yml
when:
  - event: ["push", "pull_request"]
    branch: ["main"]

engine: nixery

dependencies:
  nixpkgs:
    - shell
    - stdenv
    - findutils
    - binutils
    - libunwind
    - ncurses
    - opam
    - git
    - gawk
    - gnupatch
    - gnum4
    - gnumake
    - gnutar
    - gnused
    - gnugrep
    - diffutils
    - gzip
    - bzip2
    - gcc
    - ocaml
    - pkg-config
    - gmp

steps:
  - name: opam
    command: |
      opam init --disable-sandboxing -a -y
  - name: repo
    command: |
      opam repo add aoah https://tangled.org/anil.recoil.org/aoah-opam-repo.git
  - name: switch
    command: |
      opam install . --confirm-level=unsafe-yes --deps-only
  - name: build
    command: |
      opam exec -- dune build
  - name: switch-test
    command: |
      opam install . --confirm-level=unsafe-yes --deps-only --with-test
  - name: test
    command: |
      opam exec -- dune runtest --verbose
  - name: doc
    command: |
      opam install -y odoc
      opam exec -- dune build @doc
+15
LICENSE.md
ISC License

Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>

Permission to use, copy, modify, and distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+89
README.md
# cbor

Type-safe CBOR codec combinators for OCaml, with GADT-based bidirectional
maps and bytesrw streaming.

## Overview

cbor provides encoding and decoding of CBOR (RFC 8949) using a
combinator-based approach inspired by
[Jsont](https://github.com/dbuenzli/jsont). Define a codec once as a value
of type `'a t` and use it for both directions. Codecs compose from base
types through objects, arrays, variants, tags, and recursive types.

Streaming I/O is built on [bytesrw](https://github.com/dbuenzli/bytesrw)
for zero-copy reading and writing. Path-aware error messages pinpoint decode
failures (e.g., `$.items[3].name: expected string, got integer`).

## Installation

```
opam install cbor
```

## Usage

```ocaml
open Cbor

(* Define a codec for a record type *)
type person = { name : string; age : int }

let person_codec =
  let open Obj in
  let* name = mem "name" (fun p -> p.name) string in
  let* age = mem "age" (fun p -> p.age) int in
  return { name; age } |> finish

(* Encode to CBOR bytes *)
let encoded = encode_string person_codec { name = "Alice"; age = 30 }

(* Decode from CBOR bytes *)
let () =
  match decode_string person_codec encoded with
  | Ok p -> Printf.printf "%s is %d\n" p.name p.age
  | Error e -> prerr_endline (Error.to_string e)

(* Streaming decode from a bytesrw reader *)
let () =
  let reader = Bytesrw.Bytes.Reader.of_string encoded in
  match decode person_codec reader with
  | Ok p -> Printf.printf "%s\n" p.name
  | Error e -> prerr_endline (Error.to_string e)
```

### Variants and Tags

```ocaml
type shape = Circle of float | Rect of float * float

let shape_codec =
  Variant.(variant [
    case 0 float (fun r -> Circle r)
      (function Circle r -> Some r | _ -> None);
    case 1 (tuple2 float float) (fun (w, h) -> Rect (w, h))
      (function Rect (w, h) -> Some (w, h) | _ -> None);
  ])
```

## API Overview

- **Base codecs** -- `null`, `bool`, `int`, `int32`, `int64`, `float`, `string`, `bytes`, `any`
- **`nullable`**, **`option`** -- Optional values
- **`array`**, **`array_of`**, **`tuple2`**--**`tuple4`** -- Array codecs
- **`assoc`**, **`string_map`**, **`int_map`** -- Map codecs
- **`Obj`** module -- Record builder with string keys: `mem`, `mem_opt`, `mem_default`, `return`, `finish`
- **`Obj_int`** module -- Record builder with integer keys (COSE/CWT style)
- **`Variant`**, **`Variant_key`** -- Sum types via CBOR tags or string-keyed maps
- **`tag`**, **`tag_opt`** -- CBOR semantic tags
- **`map`**, **`conv`**, **`const`**, **`fix`** -- Transformations and recursion
- **`mem`**, **`int_mem`**, **`nth`** -- Query combinators
- **`decode`**, **`decode_string`** -- Decoding (streaming and string)
- **`encode`**, **`encode_string`** -- Encoding (streaming and string)

## References

- [RFC 8949](https://www.rfc-editor.org/rfc/rfc8949.html) -- Concise Binary
  Object Representation (CBOR)

## Licence

ISC License. See [LICENSE.md](LICENSE.md) for details.
+41
cbor.opam
# This file is generated by dune, edit dune-project instead
opam-version: "2.0"
synopsis: "Native CBOR codec with type-safe combinators"
description: """
Type-safe CBOR (RFC 8949) encoding and decoding using a combinator-based
approach. Define codecs once and use them for both encoding and decoding
OCaml values to and from CBOR binary format."""
maintainer: ["Anil Madhavapeddy <anil@recoil.org>"]
authors: ["Anil Madhavapeddy"]
license: "ISC"
tags: ["org:blacksun" "codec" "binary"]
homepage: "https://tangled.org/anil.recoil.org/ocaml-cbor"
bug-reports: "https://tangled.org/anil.recoil.org/ocaml-cbor/issues"
depends: [
  "dune" {>= "3.21"}
  "ocaml" {>= "5.1"}
  "bytesrw" {>= "0.2"}
  "fmt" {>= "0.9"}
  "zarith" {>= "1.12"}
  "odoc" {with-doc}
  "crowbar" {>= "0.2" & with-test}
]
build: [
  ["dune" "subst"] {dev}
  [
    "dune"
    "build"
    "-p"
    name
    "-j"
    jobs
    "@install"
    "@runtest" {with-test}
    "@doc" {with-doc}
  ]
]
dev-repo: "git+https://tangled.org/anil.recoil.org/ocaml-cbor"
x-maintenance-intent: ["(latest)"]
x-quality-build: "2026-04-15"
x-quality-fuzz: "2026-04-15"
x-quality-test: "2026-04-15"
+3
cbor.opam.template
x-quality-build: "2026-04-15"
x-quality-fuzz: "2026-04-15"
x-quality-test: "2026-04-15"
+6
dune
; Root dune file

; Ignore third_party directory (for fetched dependency sources)
; Exclude specs (reference docs) from build

(data_only_dirs third_party specs)
+25
dune-project
(lang dune 3.21)
(name cbor)

(generate_opam_files true)

(license ISC)
(authors "Anil Madhavapeddy")
(maintainers "Anil Madhavapeddy <anil@recoil.org>")
(source (tangled anil.recoil.org/ocaml-cbor))

(package
 (name cbor)
 (synopsis "Native CBOR codec with type-safe combinators")
 (tags (org:blacksun codec encoding binary))
 (description
  "Type-safe CBOR (RFC 8949) encoding and decoding using a combinator-based
   approach. Define codecs once and use them for both encoding and decoding
   OCaml values to and from CBOR binary format.")
 (depends
  (ocaml (>= 5.1))
  (bytesrw (>= 0.2))
  (fmt (>= 0.9))
  (zarith (>= 1.12))
  (odoc :with-doc)
  (crowbar (and (>= 0.2) :with-test))))
+34
fuzz/dune
; Crowbar fuzz testing for CBOR roundtripping
;
; Quick check (runs tests with random inputs):
;   dune build @fuzz
;
; With AFL instrumentation (use crow orchestrator):
;   crow start --cpus=4
;
; To generate seed corpus:

(executable
 (name fuzz)
 (modules fuzz fuzz_cbor)
 (libraries cbor bytesrw alcobar))

(rule
 (alias runtest)
 (enabled_if
  (<> %{profile} afl))
 (deps
  fuzz.exe
  (source_tree corpus))
 (action
  (run %{exe:fuzz.exe})))

(rule
 (alias fuzz)
 (enabled_if
  (= %{profile} afl))
 (deps fuzz.exe)
 (action
  (progn
   (run %{exe:fuzz.exe} --gen-corpus corpus)
   (run afl-fuzz -V 60 -i corpus -o _fuzz -- %{exe:fuzz.exe} @@))))
+1
fuzz/fuzz.ml
let () = Alcobar.run "cbor" [ Fuzz_cbor.suite ]
+132
fuzz/fuzz_cbor.ml
(*---------------------------------------------------------------------------
  Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved.
  SPDX-License-Identifier: ISC
 ---------------------------------------------------------------------------*)

(* Alcobar-based fuzz testing for CBOR roundtripping *)

open Bytesrw
open Alcobar
module Rw = Cbor.Binary
module V = Cbor.Value

(* Compare CBOR values for equality, handling floats and normalized simple values *)
let rec cbor_equal (a : V.t) (b : V.t) =
  match (a, b) with
  | V.Int x, V.Int y -> Z.equal x y
  | V.Bytes x, V.Bytes y -> String.equal x y
  | V.Text x, V.Text y -> String.equal x y
  | V.Array xs, V.Array ys ->
      List.length xs = List.length ys && List.for_all2 cbor_equal xs ys
  | V.Map xs, V.Map ys ->
      List.length xs = List.length ys
      && List.for_all2
           (fun (k1, v1) (k2, v2) -> cbor_equal k1 k2 && cbor_equal v1 v2)
           xs ys
  | V.Tag (n1, v1), V.Tag (n2, v2) -> n1 = n2 && cbor_equal v1 v2
  | V.Bool x, V.Bool y -> x = y
  | V.Null, V.Null -> true
  | V.Undefined, V.Undefined -> true
  | V.Simple x, V.Simple y -> x = y
  | V.Float x, V.Float y -> (Float.is_nan x && Float.is_nan y) || x = y
  (* Handle Simple(20-23) which decoder normalizes to Bool/Null/Undefined *)
  | V.Simple 20, V.Bool false | V.Bool false, V.Simple 20 -> true
  | V.Simple 21, V.Bool true | V.Bool true, V.Simple 21 -> true
  | V.Simple 22, V.Null | V.Null, V.Simple 22 -> true
  | V.Simple 23, V.Undefined | V.Undefined, V.Simple 23 -> true
  | _ -> false

(* Generator for valid simple values (excluding reserved 20-31) *)
let simple_gen =
  map [ uint8 ] (fun n ->
      (* Simple values 20-31 are reserved/special, avoid them *)
      if n >= 20 && n <= 31 then V.Simple (n + 12) (* shift to 32-43 *)
      else V.Simple n)

(* Generator for arbitrary CBOR values *)
let cbor_gen : V.t gen =
  fix (fun cbor_gen ->
      let leaf_gen =
        choose
          [
            map [ int64 ] (fun n -> V.Int (Z.of_int64 n));
            map [ bytes ] (fun s -> V.Bytes s);
            map [ bytes ] (fun s -> V.Text s);
            map [ bool ] (fun b -> V.Bool b);
            const V.Null;
            const V.Undefined;
            map [ float ] (fun f -> V.Float f);
            simple_gen;
          ]
      in
      let compound_gen =
        choose
          [
            map [ list cbor_gen ] (fun items -> V.Array items);
            map [ list (pair cbor_gen cbor_gen) ] (fun pairs -> V.Map pairs);
            (* Avoid tags 2 and 3 which are bignum tags handled specially *)
            map
              [ range 100; cbor_gen ]
              (fun tag v ->
                let tag = if tag = 2 || tag = 3 then tag + 100 else tag in
                V.Tag (tag, v));
          ]
      in
      (* Bias towards leaf nodes to avoid deeply nested structures *)
      choose [ leaf_gen; leaf_gen; leaf_gen; compound_gen ])

(* Test encode-decode roundtrip *)
let test_encode_decode_roundtrip cbor =
  let buf = Buffer.create 256 in
  let writer = Bytes.Writer.of_buffer buf in
  let enc = Rw.encoder writer in
  Rw.write_cbor enc cbor;
  Rw.flush_encoder enc;
  let encoded = Buffer.contents buf in
  let reader = Bytes.Reader.of_string encoded in
  let dec = Rw.decoder reader in
  let decoded = Rw.read_cbor dec in
  check_eq ~eq:cbor_equal ~pp:V.pp cbor decoded

(* Check if CBOR value has valid (non-negative) tag numbers *)
let rec has_valid_tags = function
  | V.Tag (n, _) when n < 0 -> false (* Overflow occurred *)
  | V.Tag (_, v) -> has_valid_tags v
  | V.Array items -> List.for_all has_valid_tags items
  | V.Map pairs ->
      List.for_all (fun (k, v) -> has_valid_tags k && has_valid_tags v) pairs
  | _ -> true

(* Test decode-encode roundtrip from raw bytes *)
let test_decode_encode_roundtrip input =
  let reader = Bytes.Reader.of_string input in
  let dec = Rw.decoder reader in
  match Rw.read_cbor dec with
  | cbor ->
      (* Skip values with overflowed tag numbers *)
      if not (has_valid_tags cbor) then ()
      else begin
        (* Encode back to bytes *)
        let buf = Buffer.create 256 in
        let writer = Bytes.Writer.of_buffer buf in
        let enc = Rw.encoder writer in
        Rw.write_cbor enc cbor;
        Rw.flush_encoder enc;
        let encoded = Buffer.contents buf in
        (* Decode again *)
        let reader2 = Bytes.Reader.of_string encoded in
        let dec2 = Rw.decoder reader2 in
        let cbor2 = Rw.read_cbor dec2 in
        check_eq ~eq:cbor_equal ~pp:V.pp cbor cbor2
      end
  | exception _ ->
      (* Invalid CBOR input is fine *)
      ()

let suite =
  ( "cbor",
    [
      test_case "encode-decode roundtrip" [ cbor_gen ]
        test_encode_decode_roundtrip;
      test_case "decode-encode roundtrip" [ bytes ] test_decode_encode_roundtrip;
    ] )
+4
fuzz/fuzz_cbor.mli
(** Fuzz tests for {!Cbor}. *)

val suite : string * Alcobar.test_case list
(** Test suite. *)
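
The encode-then-decode property that the fuzz suite exercises can be sketched without the library, on the smallest CBOR building block: the head byte carrying a 3-bit major type and an unsigned argument (RFC 8949, section 3). The functions `encode_head`, `decode_head` and `roundtrip` below are illustrative names that do not appear in the commit, and the sketch assumes a 64-bit OCaml and covers only arguments below 2^32.

```ocaml
(* Illustrative sketch using only the standard library; these helpers are
   hypothetical and are not part of the cbor package. *)

(* Encode a head: major type in the top 3 bits, argument in the shortest
   of the immediate, 1-byte, 2-byte or 4-byte forms. *)
let encode_head buf major arg =
  if arg < 0 || arg > 0xffff_ffff then invalid_arg "encode_head";
  let base = major lsl 5 in
  if arg < 24 then Buffer.add_uint8 buf (base lor arg)
  else if arg < 0x100 then begin
    Buffer.add_uint8 buf (base lor 24);
    Buffer.add_uint8 buf arg
  end
  else if arg < 0x10000 then begin
    Buffer.add_uint8 buf (base lor 25);
    Buffer.add_uint16_be buf arg
  end
  else begin
    Buffer.add_uint8 buf (base lor 26);
    Buffer.add_int32_be buf (Int32.of_int arg)
  end

(* Decode a head from the start of a string, returning (major, argument). *)
let decode_head s =
  let b = Char.code s.[0] in
  let major = b lsr 5 and info = b land 0x1f in
  let arg =
    if info < 24 then info
    else if info = 24 then String.get_uint8 s 1
    else if info = 25 then String.get_uint16_be s 1
    else if info = 26 then
      (* Mask to undo sign extension of the 32-bit read. *)
      Int32.to_int (String.get_int32_be s 1) land 0xffff_ffff
    else invalid_arg "decode_head: unsupported additional info"
  in
  (major, arg)

(* The roundtrip property: decoding what was encoded gives the input back. *)
let roundtrip major arg =
  let buf = Buffer.create 8 in
  encode_head buf major arg;
  decode_head (Buffer.contents buf) = (major, arg)

let () =
  (* Boundary values on each side of every encoding width. *)
  List.iter
    (fun arg -> assert (roundtrip 0 arg))
    [ 0; 23; 24; 255; 256; 65535; 65536; 0xffff_ffff ];
  (* The unsigned integer 1000 encodes as 0x19 0x03 0xe8. *)
  let buf = Buffer.create 8 in
  encode_head buf 0 1000;
  assert (Buffer.contents buf = "\x19\x03\xe8")
```

A fuzzer generalises the hand-picked boundary list to arbitrary generated inputs, which is what the `cbor_gen` generator does for whole CBOR values.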
+757
lib/binary.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved. 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + open Bytesrw 7 + 8 + (* CBOR Major Types *) 9 + let major_uint = 0 10 + let major_nint = 1 11 + let major_bytes = 2 12 + let major_text = 3 13 + let major_array = 4 14 + let major_map = 5 15 + let major_tag = 6 16 + let major_simple = 7 17 + 18 + (* CBOR Simple Values *) 19 + let simple_false = 20 20 + let simple_true = 21 21 + let simple_null = 22 22 + let simple_undefined = 23 23 + 24 + (* Additional Information *) 25 + let ai_1byte = 24 26 + let ai_2byte = 25 27 + let ai_4byte = 26 28 + let ai_8byte = 27 29 + let ai_indefinite = 31 30 + let break_code = 0xff 31 + 32 + (* Bignum tags *) 33 + let tag_positive_bignum = 2 34 + let tag_negative_bignum = 3 35 + 36 + (* Limits for integer encoding *) 37 + let max_uint64 = Z.of_string "18446744073709551615" (* 2^64 - 1 *) 38 + let max_nint64 = Z.of_string "-18446744073709551616" (* -2^64 *) 39 + 40 + (* ========== Encoder ========== *) 41 + 42 + type encoder = { writer : Bytes.Writer.t; buf : bytes; mutable buf_pos : int } 43 + 44 + let buf_size = 4096 45 + let encoder writer = { writer; buf = Stdlib.Bytes.create buf_size; buf_pos = 0 } 46 + 47 + let flush_encoder enc = 48 + if enc.buf_pos > 0 then begin 49 + let slice = Bytes.Slice.make enc.buf ~first:0 ~length:enc.buf_pos in 50 + Bytes.Writer.write enc.writer slice; 51 + enc.buf_pos <- 0 52 + end 53 + 54 + let ensure_space enc n = if enc.buf_pos + n > buf_size then flush_encoder enc 55 + 56 + let write_byte enc b = 57 + ensure_space enc 1; 58 + Stdlib.Bytes.set_uint8 enc.buf enc.buf_pos b; 59 + enc.buf_pos <- enc.buf_pos + 1 60 + 61 + let write_bytes enc s = 62 + let len = String.length s in 63 + if len <= buf_size - enc.buf_pos then begin 64 + Stdlib.Bytes.blit_string s 0 enc.buf 
enc.buf_pos len; 65 + enc.buf_pos <- enc.buf_pos + len 66 + end 67 + else begin 68 + flush_encoder enc; 69 + if len <= buf_size then begin 70 + Stdlib.Bytes.blit_string s 0 enc.buf 0 len; 71 + enc.buf_pos <- len 72 + end 73 + else begin 74 + let slice = Bytes.Slice.of_string s in 75 + Bytes.Writer.write enc.writer slice 76 + end 77 + end 78 + 79 + let write_u16_be enc v = 80 + ensure_space enc 2; 81 + Stdlib.Bytes.set_uint16_be enc.buf enc.buf_pos v; 82 + enc.buf_pos <- enc.buf_pos + 2 83 + 84 + let write_u32_be enc v = 85 + ensure_space enc 4; 86 + Stdlib.Bytes.set_int32_be enc.buf enc.buf_pos v; 87 + enc.buf_pos <- enc.buf_pos + 4 88 + 89 + let write_u64_be enc v = 90 + ensure_space enc 8; 91 + Stdlib.Bytes.set_int64_be enc.buf enc.buf_pos v; 92 + enc.buf_pos <- enc.buf_pos + 8 93 + 94 + let write_type_arg enc major arg = 95 + let base = major lsl 5 in 96 + if arg < 24 then write_byte enc (base lor arg) 97 + else if arg < 0x100 then begin 98 + write_byte enc (base lor ai_1byte); 99 + write_byte enc arg 100 + end 101 + else if arg < 0x10000 then begin 102 + write_byte enc (base lor ai_2byte); 103 + write_u16_be enc arg 104 + end 105 + else if arg < 0x100000000 then begin 106 + write_byte enc (base lor ai_4byte); 107 + write_u32_be enc (Int32.of_int arg) 108 + end 109 + else begin 110 + write_byte enc (base lor ai_8byte); 111 + write_u64_be enc (Int64.of_int arg) 112 + end 113 + 114 + let write_type_arg64 enc major arg = 115 + let base = major lsl 5 in 116 + if arg < 24L then write_byte enc (base lor Int64.to_int arg) 117 + else if arg < 0x100L then begin 118 + write_byte enc (base lor ai_1byte); 119 + write_byte enc (Int64.to_int arg) 120 + end 121 + else if arg < 0x10000L then begin 122 + write_byte enc (base lor ai_2byte); 123 + write_u16_be enc (Int64.to_int arg) 124 + end 125 + else if arg < 0x100000000L then begin 126 + write_byte enc (base lor ai_4byte); 127 + write_u32_be enc (Int64.to_int32 arg) 128 + end 129 + else begin 130 + write_byte enc (base lor 
ai_8byte); 131 + write_u64_be enc arg 132 + end 133 + 134 + let write_null enc = write_byte enc ((major_simple lsl 5) lor simple_null) 135 + 136 + let write_undefined enc = 137 + write_byte enc ((major_simple lsl 5) lor simple_undefined) 138 + 139 + let write_bool enc b = 140 + let v = if b then simple_true else simple_false in 141 + write_byte enc ((major_simple lsl 5) lor v) 142 + 143 + let write_simple enc n = 144 + if n < 24 then write_byte enc ((major_simple lsl 5) lor n) 145 + else begin 146 + write_byte enc ((major_simple lsl 5) lor ai_1byte); 147 + write_byte enc n 148 + end 149 + 150 + (* Half-precision float encoding *) 151 + let encode_half f = 152 + let bits = Int32.to_int (Int32.bits_of_float f) in 153 + let sign = (bits lsr 31) land 1 in 154 + let exp = (bits lsr 23) land 0xff in 155 + let mant = bits land 0x7fffff in 156 + if exp = 0 then 157 + (* Zero or subnormal - may lose precision *) 158 + (sign lsl 15) lor (mant lsr 13) 159 + else if exp = 0xff then 160 + (* Inf or NaN *) 161 + (sign lsl 15) lor 0x7c00 lor if mant <> 0 then 0x200 else 0 162 + else 163 + let new_exp = exp - 127 + 15 in 164 + if new_exp <= 0 then 165 + (* Subnormal in half *) 166 + (sign lsl 15) lor ((mant lor 0x800000) lsr (14 - new_exp)) 167 + else if new_exp >= 31 then 168 + (* Overflow to infinity *) 169 + (sign lsl 15) lor 0x7c00 170 + else (sign lsl 15) lor (new_exp lsl 10) lor (mant lsr 13) 171 + 172 + let decode_half bits = 173 + let sign = (bits lsr 15) land 1 in 174 + let exp = (bits lsr 10) land 0x1f in 175 + let mant = bits land 0x3ff in 176 + let f = 177 + if exp = 0 then ldexp (float_of_int mant) (-24) 178 + else if exp = 31 then if mant = 0 then infinity else nan 179 + else ldexp (float_of_int (mant lor 0x400)) (exp - 25) 180 + in 181 + if sign = 1 then -.f else f 182 + 183 + (* Check if float can be exactly represented in half precision *) 184 + let can_encode_half f = 185 + let half = encode_half f in 186 + let back = decode_half half in 187 + (* Compare bit 
patterns to handle -0.0 and NaN correctly *) 188 + Int64.bits_of_float f = Int64.bits_of_float back 189 + 190 + (* Check if float can be exactly represented in single precision *) 191 + let can_encode_single f = 192 + let single = Int32.float_of_bits (Int32.bits_of_float f) in 193 + Int64.bits_of_float f = Int64.bits_of_float single 194 + 195 + let write_float16 enc f = 196 + write_byte enc ((major_simple lsl 5) lor ai_2byte); 197 + write_u16_be enc (encode_half f) 198 + 199 + let write_float32 enc f = 200 + write_byte enc ((major_simple lsl 5) lor ai_4byte); 201 + write_u32_be enc (Int32.bits_of_float f) 202 + 203 + let write_float64 enc f = 204 + write_byte enc ((major_simple lsl 5) lor ai_8byte); 205 + write_u64_be enc (Int64.bits_of_float f) 206 + 207 + (* Write float using minimal encoding *) 208 + let write_float enc f = 209 + if can_encode_half f then write_float16 enc f 210 + else if can_encode_single f then write_float32 enc f 211 + else write_float64 enc f 212 + 213 + let write_int enc n = 214 + if n >= 0 then write_type_arg enc major_uint n 215 + else write_type_arg enc major_nint (-n - 1) 216 + 217 + let write_int64 enc n = 218 + if n >= 0L then write_type_arg64 enc major_uint n 219 + else write_type_arg64 enc major_nint (Int64.neg n |> Int64.pred) 220 + 221 + (* Convert Z.t to big-endian bytes (minimal representation) *) 222 + let bigint_to_bytes n = 223 + if Z.equal n Z.zero then "\x00" 224 + else 225 + let s = Z.to_bits n in 226 + (* Z.to_bits gives little-endian, we need big-endian *) 227 + let len = String.length s in 228 + let buf = Stdlib.Bytes.create len in 229 + for i = 0 to len - 1 do 230 + Stdlib.Bytes.set buf i s.[len - 1 - i] 231 + done; 232 + (* Remove leading zeros *) 233 + let start = ref 0 in 234 + while !start < len - 1 && Stdlib.Bytes.get buf !start = '\x00' do 235 + incr start 236 + done; 237 + Stdlib.Bytes.sub_string buf !start (len - !start) 238 + 239 + (* Convert big-endian bytes to Z.t *) 240 + let bytes_to_bigint s = 241 + let 
len = String.length s in 242 + if len = 0 then Z.zero 243 + else begin 244 + (* Convert big-endian to little-endian for Z.of_bits *) 245 + let buf = Stdlib.Bytes.create len in 246 + for i = 0 to len - 1 do 247 + Stdlib.Bytes.set buf i s.[len - 1 - i] 248 + done; 249 + Z.of_bits (Stdlib.Bytes.unsafe_to_string buf) 250 + end 251 + 252 + (* Write a Z.t integer, using bignum tags for large values *) 253 + let write_padded_u64 enc major n = 254 + write_byte enc ((major lsl 5) lor ai_8byte); 255 + let bytes = bigint_to_bytes n in 256 + let padded = Stdlib.Bytes.make 8 '\x00' in 257 + let offset = 8 - String.length bytes in 258 + Stdlib.Bytes.blit_string bytes 0 padded offset (String.length bytes); 259 + for i = 0 to 7 do 260 + write_byte enc (Char.code (Stdlib.Bytes.get padded i)) 261 + done 262 + 263 + let write_bigint enc n = 264 + if Z.sign n >= 0 then 265 + begin if Z.leq n max_uint64 then 266 + (* Fits in uint64 *) 267 + if Z.fits_int64 n then write_type_arg64 enc major_uint (Z.to_int64 n) 268 + else write_padded_u64 enc major_uint n 269 + else begin 270 + (* Need bignum tag 2 *) 271 + write_type_arg enc major_tag tag_positive_bignum; 272 + let bytes = bigint_to_bytes n in 273 + write_type_arg enc major_bytes (String.length bytes); 274 + write_bytes enc bytes 275 + end 276 + end 277 + else begin 278 + (* Negative number *) 279 + let abs_minus_1 = Z.pred (Z.neg n) in 280 + (* -1 - n = |n| - 1 *) 281 + if Z.geq n max_nint64 then 282 + (* Fits in nint64 *) 283 + if Z.fits_int64 abs_minus_1 then 284 + write_type_arg64 enc major_nint (Z.to_int64 abs_minus_1) 285 + else write_padded_u64 enc major_nint abs_minus_1 286 + else begin 287 + (* Need bignum tag 3 *) 288 + write_type_arg enc major_tag tag_negative_bignum; 289 + let bytes = bigint_to_bytes abs_minus_1 in 290 + write_type_arg enc major_bytes (String.length bytes); 291 + write_bytes enc bytes 292 + end 293 + end 294 + 295 + let write_text enc s = 296 + write_type_arg enc major_text (String.length s); 297 + 
write_bytes enc s 298 + 299 + let write_text_start enc = write_byte enc ((major_text lsl 5) lor ai_indefinite) 300 + let write_text_chunk enc s = write_text enc s 301 + 302 + let write_bytes_data enc s = 303 + write_type_arg enc major_bytes (String.length s); 304 + write_bytes enc s 305 + 306 + let write_bytes_header enc len = write_type_arg enc major_bytes len 307 + 308 + let write_bytes_start enc = 309 + write_byte enc ((major_bytes lsl 5) lor ai_indefinite) 310 + 311 + let write_bytes_chunk enc s = write_bytes_data enc s 312 + let write_array_start enc n = write_type_arg enc major_array n 313 + 314 + let write_array_indef enc = 315 + write_byte enc ((major_array lsl 5) lor ai_indefinite) 316 + 317 + let write_map_start enc n = write_type_arg enc major_map n 318 + let write_map_indef enc = write_byte enc ((major_map lsl 5) lor ai_indefinite) 319 + let write_tag enc n = write_type_arg enc major_tag n 320 + let write_break enc = write_byte enc break_code 321 + 322 + (* ========== Decoder ========== *) 323 + 324 + type decoder = { 325 + reader : Bytes.Reader.t; 326 + mutable slice : Bytes.Slice.t; 327 + mutable slice_pos : int; 328 + mutable position : int; 329 + } 330 + 331 + let decoder reader = 332 + (* Initialize with eod slice - will be replaced on first read *) 333 + { reader; slice = Bytes.Slice.eod; slice_pos = 0; position = 0 } 334 + 335 + let decoder_at_end dec = 336 + if dec.slice_pos < Bytes.Slice.length dec.slice then false 337 + else begin 338 + dec.slice <- Bytes.Reader.read dec.reader; 339 + dec.slice_pos <- 0; 340 + Bytes.Slice.is_eod dec.slice 341 + end 342 + 343 + let decoder_position dec = dec.position 344 + 345 + let refill dec = 346 + dec.slice <- Bytes.Reader.read dec.reader; 347 + dec.slice_pos <- 0; 348 + if Bytes.Slice.is_eod dec.slice then raise End_of_file 349 + 350 + let slice_get_byte slice pos = 351 + Stdlib.Bytes.get_uint8 (Bytes.Slice.bytes slice) 352 + (Bytes.Slice.first slice + pos) 353 + 354 + let peek_byte dec = 355 + if 
dec.slice_pos >= Bytes.Slice.length dec.slice then begin 356 + dec.slice <- Bytes.Reader.read dec.reader; 357 + dec.slice_pos <- 0; 358 + if Bytes.Slice.is_eod dec.slice then None 359 + else Some (slice_get_byte dec.slice dec.slice_pos) 360 + end 361 + else Some (slice_get_byte dec.slice dec.slice_pos) 362 + 363 + let read_byte dec = 364 + if dec.slice_pos >= Bytes.Slice.length dec.slice then refill dec; 365 + let b = slice_get_byte dec.slice dec.slice_pos in 366 + dec.slice_pos <- dec.slice_pos + 1; 367 + dec.position <- dec.position + 1; 368 + b 369 + 370 + let read_bytes dec n = 371 + if n = 0 then "" 372 + else if n < 0 || n > Sys.max_string_length then failwith "Invalid CBOR length" 373 + else 374 + let buf = Stdlib.Bytes.create n in 375 + let rec loop off remaining = 376 + if remaining = 0 then () 377 + else begin 378 + if dec.slice_pos >= Bytes.Slice.length dec.slice then refill dec; 379 + let avail = Bytes.Slice.length dec.slice - dec.slice_pos in 380 + let take = min avail remaining in 381 + let src = Bytes.Slice.bytes dec.slice in 382 + let src_off = Bytes.Slice.first dec.slice + dec.slice_pos in 383 + Stdlib.Bytes.blit src src_off buf off take; 384 + dec.slice_pos <- dec.slice_pos + take; 385 + dec.position <- dec.position + take; 386 + loop (off + take) (remaining - take) 387 + end 388 + in 389 + loop 0 n; 390 + Stdlib.Bytes.unsafe_to_string buf 391 + 392 + let read_u16_be dec = 393 + let b0 = read_byte dec in 394 + let b1 = read_byte dec in 395 + (b0 lsl 8) lor b1 396 + 397 + let read_u32_be dec = 398 + let b0 = read_byte dec in 399 + let b1 = read_byte dec in 400 + let b2 = read_byte dec in 401 + let b3 = read_byte dec in 402 + Int32.( 403 + logor 404 + (shift_left (of_int b0) 24) 405 + (logor 406 + (shift_left (of_int b1) 16) 407 + (logor (shift_left (of_int b2) 8) (of_int b3)))) 408 + 409 + let read_u64_be dec = 410 + let b0 = read_byte dec in 411 + let b1 = read_byte dec in 412 + let b2 = read_byte dec in 413 + let b3 = read_byte dec in 414 + let 
b4 = read_byte dec in 415 + let b5 = read_byte dec in 416 + let b6 = read_byte dec in 417 + let b7 = read_byte dec in 418 + Int64.( 419 + logor 420 + (shift_left (of_int b0) 56) 421 + (logor 422 + (shift_left (of_int b1) 48) 423 + (logor 424 + (shift_left (of_int b2) 40) 425 + (logor 426 + (shift_left (of_int b3) 32) 427 + (logor 428 + (shift_left (of_int b4) 24) 429 + (logor 430 + (shift_left (of_int b5) 16) 431 + (logor (shift_left (of_int b6) 8) (of_int b7)))))))) 432 + 433 + (* Read unsigned 64-bit as Z.t to handle full range *) 434 + let read_uint64_as_z dec = 435 + let bytes = read_bytes dec 8 in 436 + bytes_to_bigint bytes 437 + 438 + type header = { major : int; info : int } 439 + 440 + let read_header dec = 441 + let b = read_byte dec in 442 + { major = b lsr 5; info = b land 0x1f } 443 + 444 + (* Read argument as Z.t to handle full unsigned range *) 445 + let read_argument_z dec hdr = 446 + let info = hdr.info in 447 + if info < 24 then Z.of_int info 448 + else if info = ai_1byte then Z.of_int (read_byte dec) 449 + else if info = ai_2byte then Z.of_int (read_u16_be dec) 450 + else if info = ai_4byte then 451 + Z.of_int64 (Int64.logand (Int64.of_int32 (read_u32_be dec)) 0xffffffffL) 452 + else if info = ai_8byte then read_uint64_as_z dec 453 + else if info = ai_indefinite then Z.minus_one 454 + else Fmt.failwith "Invalid additional info: %d" info 455 + 456 + let read_argument dec hdr = 457 + let info = hdr.info in 458 + if info < 24 then Int64.of_int info 459 + else if info = ai_1byte then Int64.of_int (read_byte dec) 460 + else if info = ai_2byte then Int64.of_int (read_u16_be dec) 461 + else if info = ai_4byte then 462 + Int64.logand (Int64.of_int32 (read_u32_be dec)) 0xffffffffL 463 + else if info = ai_8byte then read_u64_be dec 464 + else if info = ai_indefinite then -1L 465 + else Fmt.failwith "Invalid additional info: %d" info 466 + 467 + let read_int dec = 468 + let hdr = read_header dec in 469 + let arg = read_argument dec hdr in 470 + if hdr.major 
= major_uint then arg 471 + else if hdr.major = major_nint then Int64.(neg (succ arg)) 472 + else failwith "Expected integer" 473 + 474 + let read_text dec = 475 + let hdr = read_header dec in 476 + if hdr.major <> major_text then failwith "Expected text string"; 477 + let arg = read_argument dec hdr in 478 + if arg >= 0L then read_bytes dec (Int64.to_int arg) 479 + else begin 480 + let buf = Buffer.create 64 in 481 + while 482 + match peek_byte dec with 483 + | Some 0xff -> 484 + ignore (read_byte dec); 485 + false 486 + | _ -> true 487 + do 488 + let hdr = read_header dec in 489 + if hdr.major <> major_text then failwith "Expected text chunk"; 490 + let len = read_argument dec hdr in 491 + if len < 0L then failwith "Nested indefinite text"; 492 + Buffer.add_string buf (read_bytes dec (Int64.to_int len)) 493 + done; 494 + Buffer.contents buf 495 + end 496 + 497 + let read_bytes_data dec = 498 + let hdr = read_header dec in 499 + if hdr.major <> major_bytes then failwith "Expected byte string"; 500 + let arg = read_argument dec hdr in 501 + if arg >= 0L then read_bytes dec (Int64.to_int arg) 502 + else begin 503 + let buf = Buffer.create 64 in 504 + while 505 + match peek_byte dec with 506 + | Some 0xff -> 507 + ignore (read_byte dec); 508 + false 509 + | _ -> true 510 + do 511 + let hdr = read_header dec in 512 + if hdr.major <> major_bytes then failwith "Expected bytes chunk"; 513 + let len = read_argument dec hdr in 514 + if len < 0L then failwith "Nested indefinite bytes"; 515 + Buffer.add_string buf (read_bytes dec (Int64.to_int len)) 516 + done; 517 + Buffer.contents buf 518 + end 519 + 520 + let read_float dec = 521 + let hdr = read_header dec in 522 + if hdr.major <> major_simple then failwith "Expected float"; 523 + if hdr.info = ai_2byte then decode_half (read_u16_be dec) 524 + else if hdr.info = ai_4byte then Int32.float_of_bits (read_u32_be dec) 525 + else if hdr.info = ai_8byte then Int64.float_of_bits (read_u64_be dec) 526 + else failwith "Expected 
float" 527 + 528 + let read_bool dec = 529 + let hdr = read_header dec in 530 + if hdr.major <> major_simple then failwith "Expected boolean"; 531 + if hdr.info = simple_false then false 532 + else if hdr.info = simple_true then true 533 + else failwith "Expected boolean" 534 + 535 + let read_null dec = 536 + let hdr = read_header dec in 537 + if hdr.major <> major_simple || hdr.info <> simple_null then 538 + failwith "Expected null" 539 + 540 + let read_undefined dec = 541 + let hdr = read_header dec in 542 + if hdr.major <> major_simple || hdr.info <> simple_undefined then 543 + failwith "Expected undefined" 544 + 545 + let read_simple dec = 546 + let hdr = read_header dec in 547 + if hdr.major <> major_simple then failwith "Expected simple value"; 548 + if hdr.info < 24 then hdr.info 549 + else if hdr.info = ai_1byte then read_byte dec 550 + else failwith "Expected simple value" 551 + 552 + let read_array_start dec = 553 + let hdr = read_header dec in 554 + if hdr.major <> major_array then failwith "Expected array"; 555 + let arg = read_argument dec hdr in 556 + if arg < 0L then None else Some (Int64.to_int arg) 557 + 558 + let read_map_start dec = 559 + let hdr = read_header dec in 560 + if hdr.major <> major_map then failwith "Expected map"; 561 + let arg = read_argument dec hdr in 562 + if arg < 0L then None else Some (Int64.to_int arg) 563 + 564 + let read_tag dec = 565 + let hdr = read_header dec in 566 + if hdr.major <> major_tag then failwith "Expected tag"; 567 + Int64.to_int (read_argument dec hdr) 568 + 569 + let is_break dec = match peek_byte dec with Some 0xff -> true | _ -> false 570 + 571 + let skip_break dec = 572 + let b = read_byte dec in 573 + if b <> break_code then failwith "Expected break" 574 + 575 + let rec skip dec = 576 + let hdr = read_header dec in 577 + match hdr.major with 578 + | 0 | 1 -> ignore (read_argument dec hdr) 579 + | 2 | 3 -> 580 + let arg = read_argument dec hdr in 581 + if arg >= 0L then ignore (read_bytes dec 
(Int64.to_int arg)) 582 + else ( 583 + (* Indefinite length: skip chunks, then consume the break. Parenthesised so that skip_break does not also run after a definite-length string. *) while not (is_break dec) do 584 + skip dec 585 + done; 586 + skip_break dec) 587 + | 4 -> 588 + let arg = read_argument dec hdr in 589 + if arg >= 0L then 590 + for _ = 1 to Int64.to_int arg do 591 + skip dec 592 + done 593 + else begin 594 + while not (is_break dec) do 595 + skip dec 596 + done; 597 + skip_break dec 598 + end 599 + | 5 -> 600 + let arg = read_argument dec hdr in 601 + if arg >= 0L then 602 + for _ = 1 to Int64.to_int arg do 603 + skip dec; 604 + skip dec 605 + done 606 + else begin 607 + while not (is_break dec) do 608 + skip dec; 609 + skip dec 610 + done; 611 + skip_break dec 612 + end 613 + | 6 -> 614 + ignore (read_argument dec hdr); 615 + skip dec 616 + | 7 -> 617 + let info = hdr.info in 618 + if info < 24 then () 619 + else if info = ai_1byte then ignore (read_byte dec) 620 + else if info = ai_2byte then ignore (read_u16_be dec) 621 + else if info = ai_4byte then ignore (read_u32_be dec) 622 + else if info = ai_8byte then ignore (read_u64_be dec) 623 + | _ -> failwith "Invalid major type" 624 + 625 + (* ========== CBOR Value I/O ========== *) 626 + 627 + let rec write_cbor enc (v : Value.t) = 628 + match v with 629 + | Value.Int n -> write_bigint enc n 630 + | Value.Bytes s -> write_bytes_data enc s 631 + | Value.Text s -> write_text enc s 632 + | Value.Array items -> 633 + write_array_start enc (List.length items); 634 + List.iter (write_cbor enc) items 635 + | Value.Map pairs -> 636 + write_map_start enc (List.length pairs); 637 + List.iter 638 + (fun (k, v) -> 639 + write_cbor enc k; 640 + write_cbor enc v) 641 + pairs 642 + | Value.Tag (n, v) -> 643 + write_tag enc n; 644 + write_cbor enc v 645 + | Value.Bool b -> write_bool enc b 646 + | Value.Null -> write_null enc 647 + | Value.Undefined -> write_undefined enc 648 + | Value.Simple n -> write_simple enc n 649 + | Value.Float f -> write_float enc f 650 + 651 + let read_indefinite_concat dec ~expect_major ~kind = 652 + let buf = Buffer.create 64 in 
653 + while not (is_break dec) do 654 + let hdr = read_header dec in 655 + if hdr.major <> expect_major then failwith ("Expected " ^ kind ^ " chunk"); 656 + let len = read_argument dec hdr in 657 + if len < 0L then failwith ("Nested indefinite " ^ kind); 658 + Buffer.add_string buf (read_bytes dec (Int64.to_int len)) 659 + done; 660 + skip_break dec; 661 + Buffer.contents buf 662 + 663 + let read_simple_or_float dec info = 664 + if info = simple_false then Value.Bool false 665 + else if info = simple_true then Value.Bool true 666 + else if info = simple_null then Value.Null 667 + else if info = simple_undefined then Value.Undefined 668 + else if info < 24 then Value.Simple info 669 + else if info = ai_1byte then Value.Simple (read_byte dec) 670 + else if info = ai_2byte then Value.Float (decode_half (read_u16_be dec)) 671 + else if info = ai_4byte then 672 + Value.Float (Int32.float_of_bits (read_u32_be dec)) 673 + else if info = ai_8byte then 674 + Value.Float (Int64.float_of_bits (read_u64_be dec)) 675 + else failwith "Invalid simple/float encoding" 676 + 677 + (* Structural limits for hostile input. 
*) 678 + let max_collection_items = 1_000_000 679 + let max_nesting_depth = 512 680 + 681 + let rec read_cbor dec : Value.t = read_cbor_d dec 0 682 + 683 + and read_cbor_d dec depth : Value.t = 684 + if depth > max_nesting_depth then failwith "CBOR nesting too deep"; 685 + let hdr = read_header dec in 686 + match hdr.major with 687 + | 0 -> Value.Int (read_argument_z dec hdr) 688 + | 1 -> Value.Int (Z.neg (Z.succ (read_argument_z dec hdr))) 689 + | 2 -> 690 + let arg = read_argument dec hdr in 691 + if arg >= 0L then Value.Bytes (read_bytes dec (Int64.to_int arg)) 692 + else 693 + Value.Bytes 694 + (read_indefinite_concat dec ~expect_major:major_bytes ~kind:"bytes") 695 + | 3 -> 696 + let arg = read_argument dec hdr in 697 + if arg >= 0L then Value.Text (read_bytes dec (Int64.to_int arg)) 698 + else 699 + Value.Text 700 + (read_indefinite_concat dec ~expect_major:major_text ~kind:"text") 701 + | 4 -> Value.Array (read_cbor_list dec (depth + 1) hdr) 702 + | 5 -> Value.Map (read_cbor_pairs dec (depth + 1) hdr) 703 + | 6 -> read_cbor_tag dec depth hdr 704 + | 7 -> read_simple_or_float dec hdr.info 705 + | _ -> failwith "Invalid major type" 706 + 707 + and read_cbor_list dec depth hdr = 708 + let arg = read_argument dec hdr in 709 + if arg >= 0L then ( 710 + let n = Int64.to_int arg in 711 + if n < 0 || n > max_collection_items then 712 + failwith "Invalid CBOR array length"; 713 + List.init n (fun _ -> read_cbor_d dec depth)) 714 + else 715 + let items = ref [] in 716 + let count = ref 0 in 717 + while not (is_break dec) do 718 + incr count; 719 + if !count > max_collection_items then 720 + failwith "CBOR indefinite array too large"; 721 + items := read_cbor_d dec depth :: !items 722 + done; 723 + skip_break dec; 724 + List.rev !items 725 + 726 + and read_cbor_pairs dec depth hdr = 727 + let arg = read_argument dec hdr in 728 + let read_pair () = 729 + let k = read_cbor_d dec depth in 730 + let v = read_cbor_d dec depth in 731 + (k, v) 732 + in 733 + if arg >= 0L then 
( 734 + let n = Int64.to_int arg in 735 + if n < 0 || n > max_collection_items then failwith "Invalid CBOR map length"; 736 + List.init n (fun _ -> read_pair ())) 737 + else 738 + let pairs = ref [] in 739 + let count = ref 0 in 740 + while not (is_break dec) do 741 + incr count; 742 + if !count > max_collection_items then 743 + failwith "CBOR indefinite map too large"; 744 + pairs := read_pair () :: !pairs 745 + done; 746 + skip_break dec; 747 + List.rev !pairs 748 + 749 + and read_cbor_tag dec depth hdr = 750 + let tag = Int64.to_int (read_argument dec hdr) in 751 + let content = read_cbor_d dec (depth + 1) in 752 + match (tag, content) with 753 + | t, Value.Bytes s when t = tag_positive_bignum -> 754 + Value.Int (bytes_to_bigint s) 755 + | t, Value.Bytes s when t = tag_negative_bignum -> 756 + Value.Int (Z.neg (Z.succ (bytes_to_bigint s))) 757 + | _ -> Value.Tag (tag, content)
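The header and argument handling in lib/binary.ml above can be illustrated without the library. The following is a minimal standalone sketch, not the library's code: it reads from a plain string instead of a Bytesrw-backed decoder, and the names header_of_byte and decode_int are hypothetical. It assumes a 64-bit OCaml int and OCaml 4.13 or later (for String.get_uint16_be and String.get_int32_be), and it leaves out 8-byte arguments and indefinite lengths.

```ocaml
(* Standalone sketch: stdlib only, reads from a string rather than the
   library's decoder. header_of_byte and decode_int are hypothetical names. *)
type header = { major : int; info : int }

(* High 3 bits are the major type, low 5 bits the additional info. *)
let header_of_byte b = { major = b lsr 5; info = b land 0x1f }

(* Read the argument that follows the initial byte. [pos] is the offset of
   the first byte after the initial byte. Returns (value, next offset).
   8-byte arguments (info 27) and indefinite length (info 31) are omitted. *)
let read_argument s pos hdr =
  match hdr.info with
  | n when n < 24 -> (n, pos)
  | 24 -> (Char.code s.[pos], pos + 1)
  | 25 -> (String.get_uint16_be s pos, pos + 2)
  | 26 ->
      (* Mask to treat the 32-bit value as unsigned (assumes 64-bit int). *)
      (Int32.to_int (String.get_int32_be s pos) land 0xffff_ffff, pos + 4)
  | n -> failwith (Printf.sprintf "unsupported additional info: %d" n)

(* Major type 0 is the argument itself; major type 1 encodes -1 - n. *)
let decode_int s =
  let hdr = header_of_byte (Char.code s.[0]) in
  let arg, _next = read_argument s 1 hdr in
  match hdr.major with
  | 0 -> arg
  | 1 -> -1 - arg
  | _ -> failwith "expected integer"
```

For example, decode_int "\x19\x03\xe8" reads major type 0 with a two-byte argument, and decode_int "\x38\x63" reads major type 1 with a one-byte argument of 99.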
+406
lib/binary.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved. 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** Low-level CBOR encoding and decoding primitives. 7 + 8 + This module provides basic encoding and decoding operations for CBOR 9 + (Concise Binary Object Representation) data as specified in 10 + {{:https://www.rfc-editor.org/rfc/rfc8949}RFC 8949}. 11 + 12 + Both encoder and decoder use buffered approaches for efficiency. 13 + 14 + {2 CBOR Data Model} 15 + 16 + CBOR uses a type system based on major types (0-7) that determine how the 17 + following bytes should be interpreted: 18 + 19 + - Major type 0: Unsigned integer 20 + - Major type 1: Negative integer 21 + - Major type 2: Byte string 22 + - Major type 3: Text string (UTF-8) 23 + - Major type 4: Array of data items 24 + - Major type 5: Map of pairs of data items 25 + - Major type 6: Tagged data item 26 + - Major type 7: Simple values and floating-point numbers 27 + 28 + {2 Encoding Example} 29 + 30 + {[ 31 + let encode_person name age = 32 + let buf = Buffer.create 64 in 33 + let writer = Bytesrw.Bytes.Writer.of_buffer buf in 34 + let enc = Binary.encoder writer in 35 + Binary.write_map_start enc 2; 36 + Binary.write_text enc "name"; 37 + Binary.write_text enc name; 38 + Binary.write_text enc "age"; 39 + Binary.write_int enc age; 40 + Binary.flush_encoder enc; 41 + Buffer.contents buf 42 + ]} 43 + 44 + {2 Decoding Example} 45 + 46 + {[ 47 + let decode_person bytes = 48 + let reader = Bytesrw.Bytes.Reader.of_string bytes in 49 + let dec = Binary.decoder reader in 50 + let len = Binary.read_map_start dec in 51 + (* ... read key-value pairs ... *) 52 + ]} *) 53 + 54 + (** {1 CBOR Major Types} 55 + 56 + Constants for the eight CBOR major types as defined in RFC 8949 Section 3.1. 
57 + These are used as the high 3 bits of the initial byte. *) 58 + 59 + val major_uint : int 60 + (** Major type 0: Unsigned integer (0x00-0x1b). *) 61 + 62 + val major_nint : int 63 + (** [major_nint] is major type 1: Negative integer (0x20-0x3b). The value is 64 + [-1 - n] where [n] is the encoded unsigned integer. *) 65 + 66 + val major_bytes : int 67 + (** Major type 2: Byte string (0x40-0x5b). *) 68 + 69 + val major_text : int 70 + (** Major type 3: Text string in UTF-8 encoding (0x60-0x7b). *) 71 + 72 + val major_array : int 73 + (** Major type 4: Array of data items (0x80-0x9b). *) 74 + 75 + val major_map : int 76 + (** Major type 5: Map of pairs of data items (0xa0-0xbb). *) 77 + 78 + val major_tag : int 79 + (** Major type 6: Tagged data item (0xc0-0xdb). *) 80 + 81 + val major_simple : int 82 + (** Major type 7: Simple values and floating-point (0xe0-0xfb). *) 83 + 84 + (** {1 CBOR Simple Values} 85 + 86 + Simple values are encoded using major type 7. These constants represent the 87 + most commonly used simple values. *) 88 + 89 + val simple_false : int 90 + (** Simple value 20: Boolean false. *) 91 + 92 + val simple_true : int 93 + (** Simple value 21: Boolean true. *) 94 + 95 + val simple_null : int 96 + (** Simple value 22: Null (absence of value). *) 97 + 98 + val simple_undefined : int 99 + (** Simple value 23: Undefined value. *) 100 + 101 + (** {1 CBOR Additional Information} 102 + 103 + The low 5 bits of the initial byte encode additional information about the 104 + data item. Values 0-23 encode the value directly; values 24-27 indicate that 105 + 1, 2, 4, or 8 bytes follow containing the value. *) 106 + 107 + val ai_1byte : int 108 + (** Additional info 24: One byte follows with the value. *) 109 + 110 + val ai_2byte : int 111 + (** Additional info 25: Two bytes follow in big-endian order. *) 112 + 113 + val ai_4byte : int 114 + (** Additional info 26: Four bytes follow in big-endian order. 
*) 115 + 116 + val ai_8byte : int 117 + (** Additional info 27: Eight bytes follow in big-endian order. *) 118 + 119 + val ai_indefinite : int 120 + (** Additional info 31: Indefinite-length encoding (used with break code). *) 121 + 122 + val break_code : int 123 + (** Break stop code (0xff) for terminating indefinite-length items. *) 124 + 125 + (** {1:encoder Encoder} *) 126 + 127 + type encoder 128 + (** A buffered CBOR encoder that writes to a {!Bytesrw.Bytes.Writer.t}. *) 129 + 130 + val encoder : Bytesrw.Bytes.Writer.t -> encoder 131 + (** [encoder writer] creates a new encoder that outputs to [writer]. The encoder 132 + uses an internal buffer of 4096 bytes. *) 133 + 134 + val flush_encoder : encoder -> unit 135 + (** [flush_encoder enc] writes any buffered data to the underlying writer. Must 136 + be called after encoding is complete to ensure all data is written. *) 137 + 138 + (** {2 Low-level Write Operations} 139 + 140 + These functions write raw bytes to the encoder's buffer. They are used 141 + internally by the higher-level encoding functions. *) 142 + 143 + val write_byte : encoder -> int -> unit 144 + (** [write_byte enc b] writes a single byte [b] (0-255). *) 145 + 146 + val write_bytes : encoder -> string -> unit 147 + (** [write_bytes enc s] writes the raw bytes of string [s]. *) 148 + 149 + val write_u16_be : encoder -> int -> unit 150 + (** [write_u16_be enc v] writes [v] as a 16-bit big-endian unsigned integer. *) 151 + 152 + val write_u32_be : encoder -> int32 -> unit 153 + (** [write_u32_be enc v] writes [v] as a 32-bit big-endian integer. *) 154 + 155 + val write_u64_be : encoder -> int64 -> unit 156 + (** [write_u64_be enc v] writes [v] as a 64-bit big-endian integer. *) 157 + 158 + (** {2 CBOR Type Header Encoding} 159 + 160 + These functions encode the CBOR initial byte and any following argument 161 + bytes using the shortest possible encoding (deterministic encoding). 
*) 162 + 163 + val write_type_arg : encoder -> int -> int -> unit 164 + (** [write_type_arg enc major arg] writes a CBOR type header with the given 165 + major type and argument value. Uses the shortest encoding for [arg]. *) 166 + 167 + val write_type_arg64 : encoder -> int -> int64 -> unit 168 + (** [write_type_arg64 enc major arg] writes a CBOR type header with a 64-bit 169 + argument value. Uses the shortest encoding for [arg]. *) 170 + 171 + (** {2 CBOR Value Encoding} 172 + 173 + High-level functions for encoding CBOR data items. *) 174 + 175 + val write_null : encoder -> unit 176 + (** [write_null enc] encodes a CBOR null value (simple value 22). *) 177 + 178 + val write_undefined : encoder -> unit 179 + (** [write_undefined enc] encodes a CBOR undefined value (simple value 23). *) 180 + 181 + val write_bool : encoder -> bool -> unit 182 + (** [write_bool enc b] encodes a CBOR boolean value. *) 183 + 184 + val write_simple : encoder -> int -> unit 185 + (** [write_simple enc n] encodes simple value [n] (0-255). *) 186 + 187 + val write_float16 : encoder -> float -> unit 188 + (** [write_float16 enc f] encodes a 16-bit half-precision float. *) 189 + 190 + val write_float32 : encoder -> float -> unit 191 + (** [write_float32 enc f] encodes a 32-bit single-precision float. *) 192 + 193 + val write_float : encoder -> float -> unit 194 + (** [write_float enc f] encodes a 64-bit double-precision float. *) 195 + 196 + val write_int : encoder -> int -> unit 197 + (** [write_int enc n] encodes an OCaml [int] as a CBOR integer. Positive values 198 + use major type 0; negative values use major type 1. *) 199 + 200 + val write_int64 : encoder -> int64 -> unit 201 + (** [write_int64 enc n] encodes a 64-bit integer as a CBOR integer. *) 202 + 203 + val write_text : encoder -> string -> unit 204 + (** [write_text enc s] encodes a UTF-8 text string (major type 3). The string 205 + should be valid UTF-8; no validation is performed. 
*) 206 + 207 + val write_text_start : encoder -> unit 208 + (** [write_text_start enc] starts an indefinite-length text string. Call 209 + {!write_text_chunk} for each chunk and {!write_break} to end. *) 210 + 211 + val write_text_chunk : encoder -> string -> unit 212 + (** [write_text_chunk enc s] writes a text chunk in an indefinite string. *) 213 + 214 + val write_bytes_data : encoder -> string -> unit 215 + (** [write_bytes_data enc s] encodes a byte string (major type 2). *) 216 + 217 + val write_bytes_header : encoder -> int -> unit 218 + (** [write_bytes_header enc len] writes the header for a byte string of length 219 + [len]. The caller must then write exactly [len] bytes using {!write_bytes}. 220 + *) 221 + 222 + val write_bytes_start : encoder -> unit 223 + (** [write_bytes_start enc] starts an indefinite-length byte string. *) 224 + 225 + val write_bytes_chunk : encoder -> string -> unit 226 + (** [write_bytes_chunk enc s] writes a byte chunk in an indefinite string. *) 227 + 228 + val write_array_start : encoder -> int -> unit 229 + (** [write_array_start enc n] writes the header for a definite-length array of 230 + [n] items. The caller must then encode exactly [n] data items. *) 231 + 232 + val write_array_indef : encoder -> unit 233 + (** [write_array_indef enc] starts an indefinite-length array. Encode items then 234 + call {!write_break} to end. *) 235 + 236 + val write_map_start : encoder -> int -> unit 237 + (** [write_map_start enc n] writes the header for a definite-length map of [n] 238 + key-value pairs. The caller must then encode exactly [2*n] data items 239 + (alternating keys and values). *) 240 + 241 + val write_map_indef : encoder -> unit 242 + (** [write_map_indef enc] starts an indefinite-length map. Encode key-value 243 + pairs then call {!write_break} to end. *) 244 + 245 + val write_tag : encoder -> int -> unit 246 + (** [write_tag enc n] writes tag number [n]. The next item written is the tagged 247 + content. 
*) 248 + 249 + val write_break : encoder -> unit 250 + (** [write_break enc] writes the break stop code (0xff) to terminate an 251 + indefinite-length item. *) 252 + 253 + (** {1:decoder Decoder} *) 254 + 255 + type decoder 256 + (** A buffered CBOR decoder that reads from a {!Bytesrw.Bytes.Reader.t}. *) 257 + 258 + val decoder : Bytesrw.Bytes.Reader.t -> decoder 259 + (** [decoder reader] creates a new decoder that reads from [reader]. *) 260 + 261 + (** {2 Decoder State} *) 262 + 263 + val decoder_at_end : decoder -> bool 264 + (** [decoder_at_end dec] returns [true] if the decoder has reached the end of 265 + input. *) 266 + 267 + val decoder_position : decoder -> int 268 + (** [decoder_position dec] returns the current byte offset in the input. *) 269 + 270 + (** {2 Low-level Read Operations} *) 271 + 272 + val peek_byte : decoder -> int option 273 + (** [peek_byte dec] returns the next byte without consuming it, or [None] at end 274 + of input. *) 275 + 276 + val read_byte : decoder -> int 277 + (** [read_byte dec] reads and returns a single byte (0-255). 278 + @raise End_of_file if at end of input. *) 279 + 280 + val read_bytes : decoder -> int -> string 281 + (** [read_bytes dec n] reads exactly [n] bytes as a string. 282 + @raise End_of_file if fewer than [n] bytes available. *) 283 + 284 + val read_u16_be : decoder -> int 285 + (** [read_u16_be dec] reads a 16-bit big-endian unsigned integer. *) 286 + 287 + val read_u32_be : decoder -> int32 288 + (** [read_u32_be dec] reads a 32-bit big-endian integer. *) 289 + 290 + val read_u64_be : decoder -> int64 291 + (** [read_u64_be dec] reads a 64-bit big-endian integer. *) 292 + 293 + (** {2 CBOR Type Header Decoding} *) 294 + 295 + type header = { 296 + major : int; (** Major type (0-7). *) 297 + info : int; (** Additional info (0-31). *) 298 + } 299 + (** A decoded CBOR initial byte. 
*) 300 + 301 + val read_header : decoder -> header 302 + (** [read_header dec] reads the initial byte of a CBOR item, returning the major 303 + type and additional info. *) 304 + 305 + val read_argument : decoder -> header -> int64 306 + (** [read_argument dec hdr] reads the argument value based on the additional 307 + info in [hdr]. For info 0-23, returns that value. For 24-27, reads the 308 + following 1/2/4/8 bytes. Returns -1 for indefinite length (31). *) 309 + 310 + val read_argument_z : decoder -> header -> Z.t 311 + (** [read_argument_z dec hdr] is like {!read_argument} but returns a [Z.t] for 312 + arbitrary-precision integers. *) 313 + 314 + val tag_positive_bignum : int 315 + (** CBOR tag 2: positive bignum. *) 316 + 317 + val tag_negative_bignum : int 318 + (** CBOR tag 3: negative bignum. *) 319 + 320 + val bytes_to_bigint : string -> Z.t 321 + (** [bytes_to_bigint s] converts a big-endian byte string to a [Z.t]. *) 322 + 323 + val bigint_to_bytes : Z.t -> string 324 + (** [bigint_to_bytes n] converts a non-negative [Z.t] to big-endian bytes. *) 325 + 326 + val decode_half : int -> float 327 + (** [decode_half bits] decodes an IEEE 754 half-precision float. *) 328 + 329 + (** {2 CBOR Value Decoding} *) 330 + 331 + val read_int : decoder -> int64 332 + (** [read_int dec] reads a CBOR integer (major type 0 or 1). Returns the signed 333 + value. 334 + @raise Failure if not an integer type. *) 335 + 336 + val read_text : decoder -> string 337 + (** [read_text dec] reads a CBOR text string (major type 3). Handles both 338 + definite and indefinite-length strings. 339 + @raise Failure if not a text string. *) 340 + 341 + val read_bytes_data : decoder -> string 342 + (** [read_bytes_data dec] reads a CBOR byte string (major type 2). Handles both 343 + definite and indefinite-length strings. 344 + @raise Failure if not a byte string. *) 345 + 346 + val read_float : decoder -> float 347 + (** [read_float dec] reads a CBOR float (major type 7 with info 25-27). 
Handles 348 + half, single, and double precision. 349 + @raise Failure if not a float. *) 350 + 351 + val read_bool : decoder -> bool 352 + (** [read_bool dec] reads a CBOR boolean (simple values 20 or 21). 353 + @raise Failure if not a boolean. *) 354 + 355 + val read_null : decoder -> unit 356 + (** [read_null dec] reads a CBOR null (simple value 22). 357 + @raise Failure if not null. *) 358 + 359 + val read_undefined : decoder -> unit 360 + (** [read_undefined dec] reads a CBOR undefined (simple value 23). 361 + @raise Failure if not undefined. *) 362 + 363 + val read_simple : decoder -> int 364 + (** [read_simple dec] reads a CBOR simple value (major type 7 with info 0-24). 365 + Returns the simple value number. 366 + @raise Failure if not a simple value. *) 367 + 368 + val read_array_start : decoder -> int option 369 + (** [read_array_start dec] reads an array header (major type 4). Returns 370 + [Some n] for definite-length array of [n] items, or [None] for 371 + indefinite-length array. 372 + @raise Failure if not an array. *) 373 + 374 + val read_map_start : decoder -> int option 375 + (** [read_map_start dec] reads a map header (major type 5). Returns [Some n] for 376 + definite-length map of [n] pairs, or [None] for indefinite-length map. 377 + @raise Failure if not a map. *) 378 + 379 + val read_tag : decoder -> int 380 + (** [read_tag dec] reads a tag number (major type 6). 381 + @raise Failure if not a tag. *) 382 + 383 + val is_break : decoder -> bool 384 + (** [is_break dec] returns [true] if the next byte is the break code (0xff). 385 + Does not consume the byte. Use {!skip_break} to consume it. *) 386 + 387 + val skip_break : decoder -> unit 388 + (** [skip_break dec] consumes the break code. 389 + @raise Failure if next byte is not break. *) 390 + 391 + (** {2 Skipping} *) 392 + 393 + val skip : decoder -> unit 394 + (** [skip dec] skips the next complete CBOR data item, including any nested 395 + items in arrays, maps, or tags. 
*) 396 + 397 + (** {1:cbor CBOR Value I/O} 398 + 399 + Functions for reading and writing complete {!Value.t} values. *) 400 + 401 + val write_cbor : encoder -> Value.t -> unit 402 + (** [write_cbor enc v] encodes CBOR value [v] using deterministic encoding. *) 403 + 404 + val read_cbor : decoder -> Value.t 405 + (** [read_cbor dec] reads a complete CBOR data item. 406 + @raise Failure on invalid CBOR or unexpected end of input. *)
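The interface above documents write_type_arg as using the shortest encoding for the argument. As a rough illustration of that rule, here is a standalone sketch that builds the header bytes into a string with the stdlib Buffer module. It is not the library's implementation: encode_type_arg is a hypothetical name, the argument is a plain non-negative OCaml int, and a 64-bit int is assumed.

```ocaml
(* Standalone sketch of shortest-form header encoding, stdlib only.
   encode_type_arg is a hypothetical name, not part of the library. *)
let encode_type_arg major arg =
  if arg < 0 then invalid_arg "encode_type_arg: negative argument";
  let buf = Buffer.create 9 in
  (* Initial byte: major type in the high 3 bits, info in the low 5. *)
  let initial info = Buffer.add_uint8 buf ((major lsl 5) lor info) in
  if arg < 24 then initial arg (* value fits in the initial byte *)
  else if arg < 0x100 then (
    initial 24;
    Buffer.add_uint8 buf arg)
  else if arg < 0x1_0000 then (
    initial 25;
    Buffer.add_uint16_be buf arg)
  else if arg < 0x1_0000_0000 then (
    initial 26;
    (* Int32.of_int wraps modulo 2^32, which keeps the unsigned bit pattern. *)
    Buffer.add_int32_be buf (Int32.of_int arg))
  else (
    initial 27;
    Buffer.add_int64_be buf (Int64.of_int arg));
  Buffer.contents buf
```

With this sketch, encode_type_arg 0 1000 produces the three bytes 0x19 0x03 0xe8, and encode_type_arg 4 3 produces the single byte 0x83, the header of a three-item array.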
+2060
lib/cbor.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved. 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + open Bytesrw 7 + module Binary = Binary 8 + module Value = Value 9 + 10 + module Sort = struct 11 + type t = 12 + | Unsigned 13 + | Negative 14 + | Bytes 15 + | Text 16 + | Array 17 + | Map 18 + | Tag 19 + | Bool 20 + | Null 21 + | Undefined 22 + | Simple 23 + | Float 24 + 25 + let to_string = function 26 + | Unsigned -> "unsigned integer" 27 + | Negative -> "negative integer" 28 + | Bytes -> "byte string" 29 + | Text -> "text string" 30 + | Array -> "array" 31 + | Map -> "map" 32 + | Tag -> "tag" 33 + | Bool -> "boolean" 34 + | Null -> "null" 35 + | Undefined -> "undefined" 36 + | Simple -> "simple value" 37 + | Float -> "float" 38 + 39 + let pp ppf s = Format.pp_print_string ppf (to_string s) 40 + 41 + let of_cbor = function 42 + | Value.Int z -> if Z.sign z >= 0 then Unsigned else Negative 43 + | Value.Bytes _ -> Bytes 44 + | Value.Text _ -> Text 45 + | Value.Array _ -> Array 46 + | Value.Map _ -> Map 47 + | Value.Tag _ -> Tag 48 + | Value.Bool _ -> Bool 49 + | Value.Null -> Null 50 + | Value.Undefined -> Undefined 51 + | Value.Simple _ -> Simple 52 + | Value.Float _ -> Float 53 + 54 + let kinded ~kind s = 55 + if kind = "" then to_string s else kind ^ " " ^ to_string s 56 + 57 + let or_kind ~kind s = if kind <> "" then kind else to_string s 58 + end 59 + 60 + module Error = struct 61 + type path = segment list 62 + 63 + and segment = 64 + | Root 65 + | Index of int 66 + | Key of string 67 + | Key_cbor of Value.t 68 + | Tag of int 69 + 70 + let pp_segment ppf = function 71 + | Root -> Fmt.pf ppf "$" 72 + | Index i -> Fmt.pf ppf "[%d]" i 73 + | Key k -> Fmt.pf ppf ".%s" k 74 + | Key_cbor k -> Fmt.pf ppf "[%a]" Value.pp k 75 + | Tag n -> Fmt.pf ppf "<%d>" n 76 + 77 + let pp_path 
ppf path = List.iter (pp_segment ppf) (List.rev path) 78 + let path_to_string path = Fmt.str "%a" pp_path path 79 + 80 + type kind = 81 + | Type_mismatch of { expected : string; got : string } 82 + | Missing_member of string 83 + | Unknown_member of string 84 + | Duplicate_member of string 85 + | Out_of_range of { value : string; range : string } 86 + | Invalid_value of string 87 + | Parse_error of string 88 + | Custom of string 89 + 90 + type t = { path : path; kind : kind } 91 + 92 + let make path kind = { path; kind } 93 + 94 + let pp_kind ppf = function 95 + | Type_mismatch { expected; got } -> 96 + Fmt.pf ppf "type mismatch: expected %s, got %s" expected got 97 + | Missing_member name -> Fmt.pf ppf "missing required member: %s" name 98 + | Unknown_member name -> Fmt.pf ppf "unknown member: %s" name 99 + | Duplicate_member name -> Fmt.pf ppf "duplicate member: %s" name 100 + | Out_of_range { value; range } -> 101 + Fmt.pf ppf "value %s out of range %s" value range 102 + | Invalid_value msg -> Fmt.pf ppf "invalid value: %s" msg 103 + | Parse_error msg -> Fmt.pf ppf "parse error: %s" msg 104 + | Custom msg -> Fmt.pf ppf "%s" msg 105 + 106 + let pp ppf { path; kind } = Fmt.pf ppf "%a: %a" pp_path path pp_kind kind 107 + let to_string e = Fmt.str "%a" pp e 108 + 109 + exception Decode of t 110 + end 111 + 112 + type 'a t = { 113 + kind : string; 114 + encode : 'a -> Value.t; 115 + decode : Error.path -> Value.t -> ('a, Error.t) result; 116 + decode_rw : Error.path -> Binary.decoder -> ('a, Error.t) result; 117 + } 118 + 119 + let kind c = c.kind 120 + let pp ppf c = Fmt.pf ppf "<codec:%s>" c.kind 121 + 122 + let type_name (v : Value.t) = 123 + match v with 124 + | Int _ -> "integer" 125 + | Bytes _ -> "bytes" 126 + | Text _ -> "text" 127 + | Array _ -> "array" 128 + | Map _ -> "map" 129 + | Tag _ -> "tag" 130 + | Bool _ -> "boolean" 131 + | Null -> "null" 132 + | Undefined -> "undefined" 133 + | Simple _ -> "simple" 134 + | Float _ -> "float" 135 + 136 + let 
type_error path expected v = 137 + Error (Error.make path (Type_mismatch { expected; got = type_name v })) 138 + 139 + (* Major type name for stream decoding errors *) 140 + let major_type_name = function 141 + | 0 -> "integer" 142 + | 1 -> "integer" 143 + | 2 -> "bytes" 144 + | 3 -> "text" 145 + | 4 -> "array" 146 + | 5 -> "map" 147 + | 6 -> "tag" 148 + | 7 -> "simple/float" 149 + | _ -> "unknown" 150 + 151 + let stream_type_error path expected (hdr : Binary.header) = 152 + Error 153 + (Error.make path 154 + (Type_mismatch { expected; got = major_type_name hdr.major })) 155 + 156 + (* Fallback: read a Value.t from the stream, then use the Value.t decoder *) 157 + let decode_rw_via_cbor decode path dec = 158 + let v = Binary.read_cbor dec in 159 + decode path v 160 + 161 + (* Stream-level: read a text string (major type 3), handling both definite 162 + and indefinite lengths, given an already-read header *) 163 + let read_text_with_hdr dec (hdr : Binary.header) = 164 + let arg = Binary.read_argument dec hdr in 165 + if arg >= 0L then Binary.read_bytes dec (Int64.to_int arg) 166 + else begin 167 + let buf = Buffer.create 64 in 168 + while 169 + match Binary.peek_byte dec with 170 + | Some 0xff -> 171 + ignore (Binary.read_byte dec); 172 + false 173 + | _ -> true 174 + do 175 + let hdr = Binary.read_header dec in 176 + if hdr.major <> Binary.major_text then failwith "Expected text chunk"; 177 + let len = Binary.read_argument dec hdr in 178 + if len < 0L then failwith "Nested indefinite text"; 179 + Buffer.add_string buf (Binary.read_bytes dec (Int64.to_int len)) 180 + done; 181 + Buffer.contents buf 182 + end 183 + 184 + (* Stream-level: read a byte string (major type 2), handling both definite and indefinite lengths, given an already-read header *) 185 + let read_bytes_with_hdr dec (hdr : Binary.header) = 186 + let arg = Binary.read_argument dec hdr in 187 + if arg >= 0L then Binary.read_bytes dec (Int64.to_int arg) 188 + else begin 189 + let buf = Buffer.create 64 in 190 + while 191 + match Binary.peek_byte dec with 
      | Some 0xff ->
          ignore (Binary.read_byte dec);
          false
      | _ -> true
    do
      let hdr = Binary.read_header dec in
      if hdr.major <> Binary.major_bytes then failwith "Expected bytes chunk";
      let len = Binary.read_argument dec hdr in
      if len < 0L then failwith "Nested indefinite bytes";
      Buffer.add_string buf (Binary.read_bytes dec (Int64.to_int len))
    done;
    Buffer.contents buf
  end

(* Stream-level: read a float from major type 7 header *)
let read_float_with_hdr dec (hdr : Binary.header) =
  if hdr.info = Binary.ai_2byte then Binary.decode_half (Binary.read_u16_be dec)
  else if hdr.info = Binary.ai_4byte then
    Int32.float_of_bits (Binary.read_u32_be dec)
  else if hdr.info = Binary.ai_8byte then
    Int64.float_of_bits (Binary.read_u64_be dec)
  else failwith "Expected float"

(* Base codecs *)

let null =
  {
    kind = "null";
    encode = (fun () -> Value.Null);
    decode =
      (fun path v ->
        match v with Value.Null -> Ok () | _ -> type_error path "null" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        if hdr.major = Binary.major_simple && hdr.info = Binary.simple_null then
          Ok ()
        else stream_type_error path "null" hdr);
  }

let bool =
  {
    kind = "bool";
    encode = (fun b -> Value.Bool b);
    decode =
      (fun path v ->
        match v with Value.Bool b -> Ok b | _ -> type_error path "boolean" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        if hdr.major = Binary.major_simple && hdr.info = Binary.simple_false
        then Ok false
        else if hdr.major = Binary.major_simple && hdr.info = Binary.simple_true
        then Ok true
        else stream_type_error path "boolean" hdr);
  }

let int =
  {
    kind = "int";
    encode = (fun n -> Value.Int (Z.of_int n));
    decode =
      (fun path v ->
        match v with
        | Value.Int n ->
            if Z.fits_int n then Ok (Z.to_int n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      {
                        value = Z.to_string n;
                        range = Fmt.str "[%d, %d]" min_int max_int;
                      }))
        | _ -> type_error path "integer" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        match hdr.major with
        | 0 ->
            let n = Binary.read_argument_z dec hdr in
            if Z.fits_int n then Ok (Z.to_int n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      {
                        value = Z.to_string n;
                        range = Fmt.str "[%d, %d]" min_int max_int;
                      }))
        | 1 ->
            let n = Z.neg (Z.succ (Binary.read_argument_z dec hdr)) in
            if Z.fits_int n then Ok (Z.to_int n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      {
                        value = Z.to_string n;
                        range = Fmt.str "[%d, %d]" min_int max_int;
                      }))
        | _ -> stream_type_error path "integer" hdr);
  }

let int32 =
  {
    kind = "int32";
    encode = (fun n -> Value.Int (Z.of_int32 n));
    decode =
      (fun path v ->
        match v with
        | Value.Int n ->
            if
              Z.geq n (Z.of_int32 Int32.min_int)
              && Z.leq n (Z.of_int32 Int32.max_int)
            then Ok (Z.to_int32 n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      {
                        value = Z.to_string n;
                        range = Fmt.str "[%ld, %ld]" Int32.min_int Int32.max_int;
                      }))
        | _ -> type_error path "integer" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        match hdr.major with
        | 0 | 1 ->
            let n =
              if hdr.major = 0 then Binary.read_argument_z dec hdr
              else Z.neg (Z.succ (Binary.read_argument_z dec hdr))
            in
            if
              Z.geq n (Z.of_int32 Int32.min_int)
              && Z.leq n (Z.of_int32 Int32.max_int)
            then Ok (Z.to_int32 n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      {
                        value = Z.to_string n;
                        range = Fmt.str "[%ld, %ld]" Int32.min_int Int32.max_int;
                      }))
        | _ -> stream_type_error path "integer" hdr);
  }

let int64 =
  {
    kind = "int64";
    encode = (fun n -> Value.Int (Z.of_int64 n));
    decode =
      (fun path v ->
        match v with
        | Value.Int n ->
            if Z.fits_int64 n then Ok (Z.to_int64 n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      {
                        value = Z.to_string n;
                        range = Fmt.str "[%Ld, %Ld]" Int64.min_int Int64.max_int;
                      }))
        | _ -> type_error path "integer" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        match hdr.major with
        | 0 | 1 ->
            let n =
              if hdr.major = 0 then Binary.read_argument_z dec hdr
              else Z.neg (Z.succ (Binary.read_argument_z dec hdr))
            in
            if Z.fits_int64 n then Ok (Z.to_int64 n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      {
                        value = Z.to_string n;
                        range = Fmt.str "[%Ld, %Ld]" Int64.min_int Int64.max_int;
                      }))
        | _ -> stream_type_error path "integer" hdr);
  }

let float =
  {
    kind = "float";
    encode = (fun f -> Value.Float f);
    decode =
      (fun path v ->
        match v with
        | Value.Float f -> Ok f
        | Value.Int n -> Ok (Z.to_float n)
        | _ -> type_error path "float" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        match hdr.major with
        | 7 -> Ok (read_float_with_hdr dec hdr)
        | 0 -> Ok (Z.to_float (Binary.read_argument_z dec hdr))
        | 1 -> Ok (Z.to_float (Z.neg (Z.succ (Binary.read_argument_z dec hdr))))
        | _ -> stream_type_error path "float" hdr);
  }

let string =
  {
    kind = "string";
    encode = (fun s -> Value.Text s);
    decode =
      (fun path v ->
        match v with Value.Text s -> Ok s | _ -> type_error path "text" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        if hdr.major = Binary.major_text then Ok (read_text_with_hdr dec hdr)
        else stream_type_error path "text" hdr);
  }

let bytes =
  {
    kind = "bytes";
    encode = (fun s -> Value.Bytes s);
    decode =
      (fun path v ->
        match v with Value.Bytes s -> Ok s | _ -> type_error path "bytes" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        if hdr.major = Binary.major_bytes then Ok (read_bytes_with_hdr dec hdr)
        else stream_type_error path "bytes" hdr);
  }

let any =
  {
    kind = "any";
    encode = Fun.id;
    decode = (fun _path v -> Ok v);
    decode_rw = (fun _path dec -> Ok (Binary.read_cbor dec));
  }

(* Nullable *)

let nullable c =
  {
    kind = Fmt.str "nullable(%s)" c.kind;
    encode =
      (fun opt -> match opt with None -> Value.Null | Some x -> c.encode x);
    decode =
      (fun path v ->
        match v with
        | Value.Null -> Ok None
        | _ -> Result.map Option.some (c.decode path v));
    decode_rw =
      (fun path dec ->
        match Binary.peek_byte dec with
        | Some b
          when b lsr 5 = Binary.major_simple && b land 0x1f = Binary.simple_null
          ->
            ignore (Binary.read_byte dec);
            Ok None
        | _ -> Result.map Option.some (c.decode_rw path dec));
  }

let option ~default c =
  {
    kind = Fmt.str "option(%s)" c.kind;
    encode = c.encode;
    decode =
      (fun path v ->
        match v with Value.Null -> Ok default | _ -> c.decode path v);
    decode_rw =
      (fun path dec ->
        match Binary.peek_byte dec with
        | Some b
          when b lsr 5 = Binary.major_simple && b land 0x1f = Binary.simple_null
          ->
            ignore (Binary.read_byte dec);
            Ok default
        | _ -> c.decode_rw path dec);
  }

(*
   Numeric variants *)

let uint =
  {
    kind = "uint";
    encode = (fun n -> Value.Int (Z.of_int n));
    decode =
      (fun path v ->
        match v with
        | Value.Int n ->
            if Z.sign n >= 0 && Z.fits_int n then Ok (Z.to_int n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      {
                        value = Z.to_string n;
                        range = Fmt.str "[0, %d]" max_int;
                      }))
        | _ -> type_error path "integer" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        match hdr.major with
        | 0 ->
            let n = Binary.read_argument_z dec hdr in
            if Z.fits_int n then Ok (Z.to_int n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      {
                        value = Z.to_string n;
                        range = Fmt.str "[0, %d]" max_int;
                      }))
        | 1 ->
            let n = Z.neg (Z.succ (Binary.read_argument_z dec hdr)) in
            Error
              (Error.make path
                 (Out_of_range
                    { value = Z.to_string n; range = Fmt.str "[0, %d]" max_int }))
        | _ -> stream_type_error path "integer" hdr);
  }

(* Assumption: the int32 carries the unsigned value's 32-bit pattern, so
   negative int32 values stand for [2^31, 2^32-1]. Z.to_int32 raises
   Z.Overflow above Int32.max_int, so values are wrapped into the signed
   range with Z.signed_extract before conversion. *)
let uint32 =
  {
    kind = "uint32";
    encode = (fun n -> Value.Int (Z.extract (Z.of_int32 n) 0 32));
    decode =
      (fun path v ->
        match v with
        | Value.Int n ->
            if Z.sign n >= 0 && Z.leq n (Z.of_string "4294967295") then
              Ok (Z.to_int32 (Z.signed_extract n 0 32))
            else
              Error
                (Error.make path
                   (Out_of_range
                      { value = Z.to_string n; range = "[0, 4294967295]" }))
        | _ -> type_error path "integer" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        match hdr.major with
        | 0 | 1 ->
            let n =
              if hdr.major = 0 then Binary.read_argument_z dec hdr
              else Z.neg (Z.succ (Binary.read_argument_z dec hdr))
            in
            if Z.sign n >= 0 && Z.leq n (Z.of_string "4294967295") then
              Ok (Z.to_int32 (Z.signed_extract n 0 32))
            else
              Error
                (Error.make path
                   (Out_of_range
                      { value = Z.to_string n; range = "[0, 4294967295]" }))
        | _ -> stream_type_error path "integer" hdr);
  }

let uint64 =
  {
    kind = "uint64";
    encode = (fun n -> Value.Int (Z.of_int64 n));
    decode =
      (fun path v ->
        match v with
        | Value.Int n ->
            if Z.sign n >= 0 && Z.fits_int64 n then Ok (Z.to_int64 n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      { value = Z.to_string n; range = "[0, 2^63-1]" }))
        | _ -> type_error path "integer" v);
    decode_rw =
      (fun path dec ->
        let hdr = Binary.read_header dec in
        match hdr.major with
        | 0 | 1 ->
            let n =
              if hdr.major = 0 then Binary.read_argument_z dec hdr
              else Z.neg (Z.succ (Binary.read_argument_z dec hdr))
            in
            if Z.sign n >= 0 && Z.fits_int64 n then Ok (Z.to_int64 n)
            else
              Error
                (Error.make path
                   (Out_of_range
                      { value = Z.to_string n; range = "[0, 2^63-1]" }))
        | _ -> stream_type_error path "integer" hdr);
  }

let number = float

(* Arrays *)

(* Helper: read array length from stream, returning count or None for indefinite *)
let read_array_length_rw path dec =
  let hdr = Binary.read_header dec in
  if hdr.major <> Binary.major_array then
    Error (stream_type_error path "array" hdr)
  else
    let arg = Binary.read_argument dec hdr in
    if arg < 0L then Ok None else Ok (Some (Int64.to_int arg))

(* Helper: decode n elements from stream using element codec *)
let decode_array_elements_rw c path dec n =
  let rec loop i acc =
    if i >= n then Ok (List.rev acc)
    else
      let path' = Error.Index i :: path in
      match c.decode_rw path' dec with
      | Ok v -> loop (i + 1) (v :: acc)
      | Error e -> Error e
  in
  loop 0 []

(* Helper: decode indefinite-length array elements from stream *)
let decode_array_indef_rw c path dec =
  let rec loop i acc =
    if Binary.is_break dec then (
      Binary.skip_break dec;
      Ok (List.rev acc))
    else
      let path' = Error.Index i :: path in
      match c.decode_rw path' dec with
      | Ok v -> loop (i + 1) (v :: acc)
      | Error e -> Error e
  in
  loop 0 []

let array c =
  {
    kind = Fmt.str "array(%s)" c.kind;
    encode = (fun items -> Value.Array (List.map c.encode items));
    decode =
      (fun path v ->
        match v with
        | Value.Array items ->
            let rec loop i acc = function
              | [] -> Ok (List.rev acc)
              | x :: xs -> (
                  let path' = Error.Index i :: path in
                  match c.decode path' x with
                  | Ok v -> loop (i + 1) (v :: acc) xs
                  | Error e -> Error e)
            in
            loop 0 [] items
        | _ -> type_error path "array" v);
    decode_rw =
      (fun path dec ->
        match read_array_length_rw path dec with
        | Error e -> e
        | Ok (Some n) -> decode_array_elements_rw c path dec n
        | Ok None -> decode_array_indef_rw c path dec);
  }

let array_of ~len c =
  {
    kind = Fmt.str "array_of(%d, %s)" len c.kind;
    encode =
      (fun items ->
        if List.length items <> len then
          Fmt.failwith "Expected array of length %d" len;
        Value.Array (List.map c.encode items));
    decode =
      (fun path v ->
        match v with
        | Value.Array items when List.length items = len ->
            let rec loop i acc = function
              | [] -> Ok (List.rev acc)
              | x :: xs -> (
                  let path' = Error.Index i :: path in
                  match c.decode path' x with
                  | Ok v -> loop (i + 1) (v :: acc) xs
                  | Error e -> Error e)
            in
            loop 0 [] items
        | Value.Array items ->
            Error
              (Error.make path
                 (Invalid_value
                    (Fmt.str "expected array of length %d, got %d" len
                       (List.length items))))
        | _ -> type_error path "array" v);
    decode_rw =
      (fun path dec ->
        match read_array_length_rw path dec with
        | Error e -> e
        | Ok (Some n) when n = len -> decode_array_elements_rw c path dec n
        | Ok (Some n) ->
            (* Skip the remaining elements to leave the stream in a valid state *)
            for _ = 1 to n do
              Binary.skip dec
            done;
            Error
              (Error.make path
                 (Invalid_value
                    (Fmt.str "expected array of length %d, got %d" len n)))
        | Ok None -> (
            (* Indefinite array: read all elements to find the break *)
            match decode_array_indef_rw c path dec with
            | Ok items when List.length items = len -> Ok items
            | Ok items ->
                Error
                  (Error.make path
                     (Invalid_value
                        (Fmt.str "expected array of length %d, got %d" len
                           (List.length items))))
            | Error e -> Error e));
  }

let tuple2 ca cb =
  {
    kind = Fmt.str "tuple2(%s, %s)" ca.kind cb.kind;
    encode = (fun (a, b) -> Value.Array [ ca.encode a; cb.encode b ]);
    decode =
      (fun path v ->
        match v with
        | Value.Array [ va; vb ] -> (
            match ca.decode (Error.Index 0 :: path) va with
            | Error e -> Error e
            | Ok a -> (
                match cb.decode (Error.Index 1 :: path) vb with
                | Error e -> Error e
                | Ok b -> Ok (a, b)))
        | Value.Array _ ->
            Error (Error.make path (Invalid_value "expected 2-element array"))
        | _ -> type_error path "array" v);
    decode_rw =
      (fun path dec ->
        match read_array_length_rw path dec with
        | Error e -> e
        | Ok (Some 2) -> (
            match ca.decode_rw (Error.Index 0 :: path) dec with
            | Error e -> Error e
            | Ok a -> (
                match cb.decode_rw (Error.Index 1 :: path) dec with
                | Error e -> Error e
                | Ok b -> Ok (a, b)))
        | Ok (Some _n) ->
            Error (Error.make path (Invalid_value "expected 2-element array"))
        | Ok None ->
            Error (Error.make path (Invalid_value "expected 2-element array")));
  }

let tuple3 ca cb cc =
  {
    kind = Fmt.str "tuple3(%s, %s, %s)" ca.kind cb.kind cc.kind;
    encode =
      (fun (a, b, c) -> Value.Array [ ca.encode a; cb.encode b; cc.encode c ]);
    decode =
      (fun path v ->
        match v with
        | Value.Array [ va; vb; vc ] -> (
            match ca.decode (Error.Index 0 :: path) va with
            | Error e -> Error e
            | Ok a -> (
                match cb.decode (Error.Index 1 :: path) vb with
                | Error e -> Error e
                | Ok b -> (
                    match cc.decode (Error.Index 2 :: path) vc with
                    | Error e -> Error e
                    | Ok c -> Ok (a, b, c))))
        | Value.Array _ ->
            Error (Error.make path (Invalid_value "expected 3-element array"))
        | _ -> type_error path "array" v);
    decode_rw =
      (fun path dec ->
        match read_array_length_rw path dec with
        | Error e -> e
        | Ok (Some 3) -> (
            match ca.decode_rw (Error.Index 0 :: path) dec with
            | Error e -> Error e
            | Ok a -> (
                match cb.decode_rw (Error.Index 1 :: path) dec with
                | Error e -> Error e
                | Ok b -> (
                    match cc.decode_rw (Error.Index 2 :: path) dec with
                    | Error e -> Error e
                    | Ok c -> Ok (a, b, c))))
        | Ok (Some _n) ->
            Error (Error.make path (Invalid_value "expected 3-element array"))
        | Ok None ->
            Error (Error.make path (Invalid_value "expected 3-element array")));
  }

let tuple4 ca cb cc cd =
  {
    kind = Fmt.str "tuple4(%s, %s, %s, %s)" ca.kind cb.kind cc.kind cd.kind;
    encode =
      (fun (a, b, c, d) ->
        Value.Array [ ca.encode a; cb.encode b; cc.encode c; cd.encode d ]);
    decode =
      (fun path v ->
        match v with
        | Value.Array [ va; vb; vc; vd ] -> (
            match ca.decode (Error.Index 0 :: path) va with
            | Error e -> Error e
            | Ok a -> (
                match cb.decode (Error.Index 1 :: path) vb with
                | Error e -> Error e
                | Ok b -> (
                    match cc.decode (Error.Index 2 :: path) vc with
                    | Error e -> Error e
                    | Ok c -> (
                        match cd.decode (Error.Index 3 :: path) vd with
                        | Error e -> Error e
                        | Ok d -> Ok (a, b, c, d)))))
        | Value.Array _ ->
            Error (Error.make path (Invalid_value "expected 4-element array"))
        | _ -> type_error path "array" v);
    decode_rw =
      (fun path dec ->
        match read_array_length_rw path dec with
        | Error e -> e
        | Ok (Some 4) -> (
            match ca.decode_rw (Error.Index 0 :: path) dec with
            | Error e -> Error e
            | Ok a -> (
                match cb.decode_rw (Error.Index 1 :: path) dec with
                | Error e -> Error e
                | Ok b -> (
                    match cc.decode_rw (Error.Index 2 :: path) dec with
                    | Error e -> Error e
                    | Ok c -> (
                        match cd.decode_rw (Error.Index 3 :: path) dec with
                        | Error e -> Error e
                        | Ok d -> Ok (a, b, c, d)))))
        | Ok (Some _n) ->
            Error (Error.make path (Invalid_value "expected 4-element array"))
        | Ok None ->
            Error (Error.make path (Invalid_value "expected 4-element array")));
  }

(* Maps *)

(* Helper: read map length from stream *)
let read_map_length_rw path dec =
  let hdr = Binary.read_header dec in
  if hdr.major <> Binary.major_map then Error (stream_type_error path "map" hdr)
  else
    let arg = Binary.read_argument dec hdr in
    if arg < 0L then Ok None else Ok (Some (Int64.to_int arg))

let assoc kc vc =
  {
    kind = Fmt.str "assoc(%s, %s)" kc.kind vc.kind;
    encode =
      (fun pairs ->
        Value.Map (List.map (fun (k, v) -> (kc.encode k, vc.encode v)) pairs));
    decode =
      (fun path v ->
        match v with
        | Value.Map pairs ->
            let rec loop acc = function
              | [] -> Ok (List.rev acc)
              | (ck, cv) :: rest -> (
                  let path_k = Error.Key_cbor ck :: path in
                  match kc.decode path_k ck with
                  | Error e -> Error e
                  | Ok k -> (
                      match vc.decode path_k cv with
                      | Error e -> Error e
                      | Ok v -> loop ((k, v) :: acc) rest))
            in
            loop [] pairs
        | _ -> type_error path "map" v);
    decode_rw =
      (fun path dec ->
        match read_map_length_rw path dec with
        | Error e -> e
        | Ok len_opt ->
            let n = match len_opt with Some n -> n | None -> max_int in
            let rec loop i acc =
              if i >= n then Ok (List.rev acc)
              else if len_opt = None && Binary.is_break dec then (
                Binary.skip_break dec;
                Ok (List.rev acc))
              else
                match kc.decode_rw path dec with
                | Error e -> Error e
                | Ok k -> (
                    match vc.decode_rw path dec with
                    | Error e -> Error e
                    | Ok v -> loop (i + 1) ((k, v) :: acc))
            in
            loop 0 []);
  }

let string_map vc = assoc string vc
let int_map vc = assoc int vc

(* Object codec module *)

module Obj = struct
  type enc = (string * Value.t) list

  let field name (v : Value.t) (acc : enc) : enc = (name, v) :: acc

  let find_remove key pairs =
    let rec loop acc = function
      | [] -> (None, List.rev acc)
      | (k, v) :: rest when k = key -> (Some v, List.rev_append acc rest)
      | kv :: rest -> loop (kv :: acc) rest
    in
    loop [] pairs

  type (_, _) mem =
    | Return : 'a -> ('o, 'a) mem
    | Mem : {
        name : string;
        get : 'o -> 'x;
        codec : 'x t;
        cont : 'x -> ('o, 'a) mem;
      }
        -> ('o, 'a) mem
    | Mem_opt : {
        name : string;
        get : 'o -> 'x option;
        codec : 'x t;
        cont : 'x option -> ('o, 'a) mem;
      }
        -> ('o, 'a) mem
    | Mem_default : {
        name : string;
        get : 'o -> 'x;
        codec : 'x t;
        default : 'x;
        cont : 'x -> ('o, 'a) mem;
      }
        -> ('o, 'a) mem

  let return v = Return v
  let mem name get codec = Mem { name; get; codec; cont = (fun x -> Return x) }

  let mem_opt name get codec =
    Mem_opt { name; get; codec; cont = (fun x -> Return x) }

  let mem_default name get ~default codec =
    Mem_default { name; get; codec; default; cont = (fun x -> Return x) }

  let rec ( let* ) : type o a b. (o, a) mem -> (a -> (o, b) mem) -> (o, b) mem =
   fun m f ->
    match m with
    | Return a -> f a
    | Mem r ->
        Mem
          {
            r with
            cont =
              (fun x ->
                let* y = r.cont x in
                f y);
          }
    | Mem_opt r ->
        Mem_opt
          {
            r with
            cont =
              (fun x ->
                let* y = r.cont x in
                f y);
          }
    | Mem_default r ->
        Mem_default
          {
            r with
            cont =
              (fun x ->
                let* y = r.cont x in
                f y);
          }

  let rec decode_mem : type o a.
      Error.path ->
      (string * Value.t) list ->
      (o, a) mem ->
      (a * (string * Value.t) list, Error.t) result =
   fun path pairs m ->
    match m with
    | Return a -> Ok (a, pairs)
    | Mem { name; codec; cont; _ } -> (
        match find_remove name pairs with
        | None, _ -> Error (Error.make path (Missing_member name))
        | Some v, remaining -> (
            let path' = Error.Key name :: path in
            match codec.decode path' v with
            | Error e -> Error e
            | Ok x -> decode_mem path remaining (cont x)))
    | Mem_opt { name; codec; cont; _ } -> (
        match find_remove name pairs with
        | None, remaining -> decode_mem path remaining (cont None)
        | Some Value.Null, remaining -> decode_mem path remaining (cont None)
        | Some v, remaining -> (
            let path' = Error.Key name :: path in
            match codec.decode path' v with
            | Error e -> Error e
            | Ok x -> decode_mem path remaining (cont (Some x))))
    | Mem_default { name; codec; default; cont; _ } -> (
        match find_remove name pairs with
        | None, remaining -> decode_mem path remaining (cont default)
        | Some Value.Null, remaining -> decode_mem path remaining (cont default)
        | Some v, remaining -> (
            let path' = Error.Key name :: path in
            match codec.decode path' v with
            | Error e -> Error e
            | Ok x -> decode_mem path remaining (cont x)))

  let rec encode_mem : type o a. o -> (o, a) mem -> enc -> enc =
   fun o m acc ->
    match m with
    | Return _ -> acc
    | Mem { name; get; codec; cont } ->
        let v = get o in
        let acc = field name (codec.encode v) acc in
        encode_mem o (cont v) acc
    | Mem_opt { name; get; codec; cont } ->
        let v = get o in
        let acc =
          match v with None -> acc | Some x -> field name (codec.encode x) acc
        in
        encode_mem o (cont v) acc
    | Mem_default { name; get; codec; cont; _ } ->
        let v = get o in
        let acc = field name (codec.encode v) acc in
        encode_mem o (cont v) acc

  let rec mem_names : type o a. (o, a) mem -> string list = function
    | Return _ -> []
    | Mem { name; cont; _ } ->
        (* We need a dummy value to get the continuation, but for names
           we just use Obj.magic since we only inspect structure *)
        name :: mem_names (cont (Stdlib.Obj.magic ()))
    | Mem_opt { name; cont; _ } ->
        name :: mem_names (cont (Stdlib.Obj.magic ()))
    | Mem_default { name; cont; _ } ->
        name :: mem_names (cont (Stdlib.Obj.magic ()))

  (* Build a dispatch table from member name to a streaming decoder that
     stores the typed result into a hashtable keyed by name. The stored
     values are wrapped via Stdlib.Obj.repr so that a single table can
     hold heterogeneously-typed decoded results. *)
  type mem_decoder = {
    decode_rw_store :
      Error.path ->
      Binary.decoder ->
      (string, Stdlib.Obj.t) Hashtbl.t ->
      (unit, Error.t) result;
  }

  let rec build_decoders : type o a. (o, a) mem -> (string * mem_decoder) list =
    function
    | Return _ -> []
    | Mem { name; codec; cont; _ } ->
        let dec_entry =
          {
            decode_rw_store =
              (fun path dec tbl ->
                match codec.decode_rw path dec with
                | Ok v ->
                    Hashtbl.replace tbl name (Stdlib.Obj.repr v);
                    Ok ()
                | Error e -> Error e);
          }
        in
        (name, dec_entry) :: build_decoders (cont (Stdlib.Obj.magic ()))
    | Mem_opt { name; codec; cont; _ } ->
        let dec_entry =
          {
            decode_rw_store =
              (fun path dec tbl ->
                (* Peek: if null, store None *)
                match Binary.peek_byte dec with
                | Some b
                  when b lsr 5 = Binary.major_simple
                       && b land 0x1f = Binary.simple_null ->
                    ignore (Binary.read_byte dec);
                    Hashtbl.replace tbl name (Stdlib.Obj.repr None);
                    Ok ()
                | _ -> (
                    match codec.decode_rw path dec with
                    | Ok v ->
                        Hashtbl.replace tbl name (Stdlib.Obj.repr (Some v));
                        Ok ()
                    | Error e -> Error e));
          }
        in
        (name, dec_entry) :: build_decoders (cont (Stdlib.Obj.magic ()))
    | Mem_default { name; codec; default; cont; _ } ->
        let dec_entry =
          {
            decode_rw_store =
              (fun path dec tbl ->
                match Binary.peek_byte dec with
                | Some b
                  when b lsr 5 = Binary.major_simple
                       && b land 0x1f = Binary.simple_null ->
                    ignore (Binary.read_byte dec);
                    Hashtbl.replace tbl name (Stdlib.Obj.repr default);
                    Ok ()
                | _ -> (
                    match codec.decode_rw path dec with
                    | Ok v ->
                        Hashtbl.replace tbl name (Stdlib.Obj.repr v);
                        Ok ()
                    | Error e -> Error e));
          }
        in
        (name, dec_entry) :: build_decoders (cont (Stdlib.Obj.magic ()))

  (* After reading all map pairs from the stream, walk the mem chain
     and collect typed values from the hashtable. *)
  let rec resolve_mem : type o a.
      Error.path ->
      (string, Stdlib.Obj.t) Hashtbl.t ->
      (o, a) mem ->
      (a, Error.t) result =
   fun path tbl m ->
    match m with
    | Return a -> Ok a
    | Mem { name; cont; _ } -> (
        match Hashtbl.find_opt tbl name with
        | None -> Error (Error.make path (Missing_member name))
        | Some obj -> resolve_mem path tbl (cont (Stdlib.Obj.obj obj)))
    | Mem_opt { name; cont; _ } -> (
        match Hashtbl.find_opt tbl name with
        | None -> resolve_mem path tbl (cont None)
        | Some obj -> resolve_mem path tbl (cont (Stdlib.Obj.obj obj)))
    | Mem_default { name; default; cont; _ } -> (
        match Hashtbl.find_opt tbl name with
        | None -> resolve_mem path tbl (cont default)
        | Some obj -> resolve_mem path tbl (cont (Stdlib.Obj.obj obj)))

  (* Streaming decode: read map key-value pairs directly, dispatching
     each value to the appropriate member's decode_rw. Unknown keys
     have their values skipped. *)
  let decode_map_rw path dec len_opt dispatch tbl =
    let n = match len_opt with Some n -> n | None -> max_int in
    let rec loop i =
      if i >= n then Ok ()
      else if len_opt = None && Binary.is_break dec then (
        Binary.skip_break dec;
        Ok ())
      else
        match Binary.peek_byte dec with
        | Some b when b lsr 5 = Binary.major_text -> (
            let key = Binary.read_text dec in
            match Hashtbl.find_opt dispatch key with
            | Some entry -> (
                let path' = Error.Key key :: path in
                match entry.decode_rw_store path' dec tbl with
                | Ok () -> loop (i + 1)
                | Error e -> Error e)
            | None ->
                (* Unknown member: skip the value *)
                Binary.skip dec;
                loop (i + 1))
        | _ ->
            (* Non-text key: skip key and value *)
            Binary.skip dec;
            Binary.skip dec;
            loop (i + 1)
    in
    loop 0

  let finish (m : ('o, 'o) mem) : 'o t =
    let names = mem_names m in
    let kind = Fmt.str "obj({%s})" (String.concat ", " names) in
    (* Pre-build the dispatch table for streaming decode *)
    let decoder_list = build_decoders m in
    let make_dispatch () =
      let tbl = Hashtbl.create (List.length decoder_list) in
      List.iter
        (fun (name, entry) -> Hashtbl.replace tbl name entry)
        decoder_list;
      tbl
    in
    {
      kind;
      encode =
        (fun v ->
          let fields = encode_mem v m [] in
          Value.Map
            (List.map (fun (k, v) -> (Value.Text k, v)) (List.rev fields)));
      decode =
        (fun path v ->
          match v with
          | Value.Map pairs -> (
              let text_pairs =
                List.filter_map
                  (fun (k, v) ->
                    match k with Value.Text s -> Some (s, v) | _ -> None)
                  pairs
              in
              match decode_mem path text_pairs m with
              | Error e -> Error e
              | Ok (result, _remaining) -> Ok result)
          | _ -> type_error path "map" v);
      decode_rw =
        (fun path dec ->
          match read_map_length_rw path dec with
          | Error e -> e
          | Ok len_opt -> (
              let dispatch = make_dispatch () in
              let results = Hashtbl.create (List.length decoder_list) in
              match decode_map_rw path dec len_opt dispatch results with
              | Error e -> Error e
              | Ok () -> resolve_mem path results m));
    }
end

(* Integer-keyed object codec module *)

module Obj_int = struct
  type enc = (int * Value.t) list

  let field key (v : Value.t) (acc : enc) : enc = (key, v) :: acc

  let find_remove key pairs =
    let rec loop acc = function
      | [] -> (None, List.rev acc)
      | (k, v) :: rest when k = key -> (Some v, List.rev_append acc rest)
      | kv :: rest -> loop (kv :: acc) rest
    in
    loop [] pairs

  type (_, _) mem =
    | Return : 'a -> ('o, 'a) mem
    | Mem : {
        key : int;
        get : 'o -> 'x;
        codec : 'x t;
        cont : 'x -> ('o, 'a) mem;
      }
        -> ('o, 'a) mem
    | Mem_opt : {
        key : int;
        get : 'o -> 'x option;
        codec : 'x t;
        cont : 'x option -> ('o, 'a) mem;
      }
        -> ('o, 'a) mem
    | Mem_default : {
        key : int;
        get : 'o -> 'x;
        codec : 'x t;
        default : 'x;
        cont : 'x -> ('o, 'a) mem;
      }
        -> ('o, 'a) mem

  let return v = Return v
  let mem key get codec = Mem { key; get; codec; cont = (fun x -> Return x) }

  let mem_opt key get codec =
    Mem_opt { key; get; codec; cont = (fun x -> Return x) }

  let mem_default key get ~default codec =
    Mem_default { key; get; codec; default; cont = (fun x -> Return x) }

  let rec ( let* ) : type o a b. (o, a) mem -> (a -> (o, b) mem) -> (o, b) mem =
   fun m f ->
    match m with
    | Return a -> f a
    | Mem r ->
        Mem
          {
            r with
            cont =
              (fun x ->
                let* y = r.cont x in
                f y);
          }
    | Mem_opt r ->
        Mem_opt
          {
            r with
            cont =
              (fun x ->
                let* y = r.cont x in
                f y);
          }
    | Mem_default r ->
        Mem_default
          {
            r with
            cont =
              (fun x ->
                let* y = r.cont x in
                f y);
          }

  let rec decode_mem : type o a.
      Error.path ->
      (int * Value.t) list ->
      (o, a) mem ->
      (a * (int * Value.t) list, Error.t) result =
   fun path pairs m ->
    match m with
    | Return a -> Ok (a, pairs)
    | Mem { key; codec; cont; _ } -> (
        match find_remove key pairs with
        | None, _ ->
            Error (Error.make path (Missing_member (string_of_int key)))
        | Some v, remaining -> (
            let path' = Error.Key (string_of_int key) :: path in
            match codec.decode path' v with
            | Error e -> Error e
            | Ok x -> decode_mem path remaining (cont x)))
    | Mem_opt { key; codec; cont; _ } -> (
        match find_remove key pairs with
        | None, remaining -> decode_mem path remaining (cont None)
        | Some Value.Null, remaining -> decode_mem path remaining (cont None)
        | Some v, remaining -> (
            let path' = Error.Key (string_of_int key) :: path in
            match codec.decode path' v with
            | Error e -> Error e
            | Ok x -> decode_mem path remaining (cont (Some x))))
    | Mem_default { key; codec; default; cont; _ } -> (
        match find_remove key pairs with
        | None, remaining -> decode_mem path remaining (cont default)
        | Some Value.Null, remaining -> decode_mem path remaining (cont default)
        | Some v, remaining -> (
            let path' = Error.Key (string_of_int key) :: path in
            match codec.decode path' v with
1323 + | Error e -> Error e 1324 + | Ok x -> decode_mem path remaining (cont x))) 1325 + 1326 + let rec encode_mem : type o a. o -> (o, a) mem -> enc -> enc = 1327 + fun o m acc -> 1328 + match m with 1329 + | Return _ -> acc 1330 + | Mem { key; get; codec; cont } -> 1331 + let v = get o in 1332 + let acc = field key (codec.encode v) acc in 1333 + encode_mem o (cont v) acc 1334 + | Mem_opt { key; get; codec; cont } -> 1335 + let v = get o in 1336 + let acc = 1337 + match v with None -> acc | Some x -> field key (codec.encode x) acc 1338 + in 1339 + encode_mem o (cont v) acc 1340 + | Mem_default { key; get; codec; cont; _ } -> 1341 + let v = get o in 1342 + let acc = field key (codec.encode v) acc in 1343 + encode_mem o (cont v) acc 1344 + 1345 + let rec mem_keys : type o a. (o, a) mem -> int list = function 1346 + | Return _ -> [] 1347 + | Mem { key; cont; _ } -> key :: mem_keys (cont (Stdlib.Obj.magic ())) 1348 + | Mem_opt { key; cont; _ } -> key :: mem_keys (cont (Stdlib.Obj.magic ())) 1349 + | Mem_default { key; cont; _ } -> 1350 + key :: mem_keys (cont (Stdlib.Obj.magic ())) 1351 + 1352 + (* Build a dispatch table from integer key to a streaming decoder that 1353 + stores the typed result into a hashtable keyed by int. *) 1354 + type mem_decoder = { 1355 + decode_rw_store : 1356 + Error.path -> 1357 + Binary.decoder -> 1358 + (int, Stdlib.Obj.t) Hashtbl.t -> 1359 + (unit, Error.t) result; 1360 + } 1361 + 1362 + let rec build_decoders : type o a. 
(o, a) mem -> (int * mem_decoder) list = 1363 + function 1364 + | Return _ -> [] 1365 + | Mem { key; codec; cont; _ } -> 1366 + let entry = 1367 + { 1368 + decode_rw_store = 1369 + (fun path dec tbl -> 1370 + match codec.decode_rw path dec with 1371 + | Ok v -> 1372 + Hashtbl.replace tbl key (Stdlib.Obj.repr v); 1373 + Ok () 1374 + | Error e -> Error e); 1375 + } 1376 + in 1377 + (key, entry) :: build_decoders (cont (Stdlib.Obj.magic ())) 1378 + | Mem_opt { key; codec; cont; _ } -> 1379 + let entry = 1380 + { 1381 + decode_rw_store = 1382 + (fun path dec tbl -> 1383 + match Binary.peek_byte dec with 1384 + | Some b 1385 + when b lsr 5 = Binary.major_simple 1386 + && b land 0x1f = Binary.simple_null -> 1387 + ignore (Binary.read_byte dec); 1388 + Hashtbl.replace tbl key (Stdlib.Obj.repr None); 1389 + Ok () 1390 + | _ -> ( 1391 + match codec.decode_rw path dec with 1392 + | Ok v -> 1393 + Hashtbl.replace tbl key (Stdlib.Obj.repr (Some v)); 1394 + Ok () 1395 + | Error e -> Error e)); 1396 + } 1397 + in 1398 + (key, entry) :: build_decoders (cont (Stdlib.Obj.magic ())) 1399 + | Mem_default { key; codec; default; cont; _ } -> 1400 + let entry = 1401 + { 1402 + decode_rw_store = 1403 + (fun path dec tbl -> 1404 + match Binary.peek_byte dec with 1405 + | Some b 1406 + when b lsr 5 = Binary.major_simple 1407 + && b land 0x1f = Binary.simple_null -> 1408 + ignore (Binary.read_byte dec); 1409 + Hashtbl.replace tbl key (Stdlib.Obj.repr default); 1410 + Ok () 1411 + | _ -> ( 1412 + match codec.decode_rw path dec with 1413 + | Ok v -> 1414 + Hashtbl.replace tbl key (Stdlib.Obj.repr v); 1415 + Ok () 1416 + | Error e -> Error e)); 1417 + } 1418 + in 1419 + (key, entry) :: build_decoders (cont (Stdlib.Obj.magic ())) 1420 + 1421 + let rec resolve_mem : type o a. 
1422 + Error.path -> 1423 + (int, Stdlib.Obj.t) Hashtbl.t -> 1424 + (o, a) mem -> 1425 + (a, Error.t) result = 1426 + fun path tbl m -> 1427 + match m with 1428 + | Return a -> Ok a 1429 + | Mem { key; cont; _ } -> ( 1430 + match Hashtbl.find_opt tbl key with 1431 + | None -> Error (Error.make path (Missing_member (string_of_int key))) 1432 + | Some obj -> resolve_mem path tbl (cont (Stdlib.Obj.obj obj))) 1433 + | Mem_opt { key; cont; _ } -> ( 1434 + match Hashtbl.find_opt tbl key with 1435 + | None -> resolve_mem path tbl (cont None) 1436 + | Some obj -> resolve_mem path tbl (cont (Stdlib.Obj.obj obj))) 1437 + | Mem_default { key; default; cont; _ } -> ( 1438 + match Hashtbl.find_opt tbl key with 1439 + | None -> resolve_mem path tbl (cont default) 1440 + | Some obj -> resolve_mem path tbl (cont (Stdlib.Obj.obj obj))) 1441 + 1442 + (* Read an integer key from the stream, returning Some int or None 1443 + if the key is not an integer or does not fit in OCaml int. *) 1444 + let read_int_key dec = 1445 + let key_hdr = Binary.read_header dec in 1446 + if key_hdr.major = 0 || key_hdr.major = 1 then 1447 + let kz = 1448 + if key_hdr.major = 0 then Binary.read_argument_z dec key_hdr 1449 + else Z.neg (Z.succ (Binary.read_argument_z dec key_hdr)) 1450 + in 1451 + if Z.fits_int kz then Some (Z.to_int kz) else None 1452 + else None 1453 + 1454 + let decode_int_map_rw path dec len_opt dispatch tbl = 1455 + let n = match len_opt with Some n -> n | None -> max_int in 1456 + let rec loop i = 1457 + if i >= n then Ok () 1458 + else if len_opt = None && Binary.is_break dec then ( 1459 + Binary.skip_break dec; 1460 + Ok ()) 1461 + else 1462 + match Binary.peek_byte dec with 1463 + | Some b when b lsr 5 = Binary.major_uint || b lsr 5 = Binary.major_nint 1464 + -> ( 1465 + match read_int_key dec with 1466 + | Some key -> ( 1467 + match Hashtbl.find_opt dispatch key with 1468 + | Some entry -> ( 1469 + let path' = Error.Key (string_of_int key) :: path in 1470 + match 
entry.decode_rw_store path' dec tbl with 1471 + | Ok () -> loop (i + 1) 1472 + | Error e -> Error e) 1473 + | None -> 1474 + Binary.skip dec; 1475 + loop (i + 1)) 1476 + | None -> 1477 + (* Key too large for int, skip value *) 1478 + Binary.skip dec; 1479 + loop (i + 1)) 1480 + | _ -> 1481 + (* Non-integer key: skip key and value *) 1482 + Binary.skip dec; 1483 + Binary.skip dec; 1484 + loop (i + 1) 1485 + in 1486 + loop 0 1487 + 1488 + let finish (m : ('o, 'o) mem) : 'o t = 1489 + let keys = mem_keys m in 1490 + let kind = 1491 + Fmt.str "obj_int({%s})" (String.concat ", " (List.map string_of_int keys)) 1492 + in 1493 + let decoder_list = build_decoders m in 1494 + let make_dispatch () = 1495 + let tbl = Hashtbl.create (List.length decoder_list) in 1496 + List.iter (fun (key, entry) -> Hashtbl.replace tbl key entry) decoder_list; 1497 + tbl 1498 + in 1499 + { 1500 + kind; 1501 + encode = 1502 + (fun v -> 1503 + let fields = encode_mem v m [] in 1504 + Value.Map 1505 + (List.map 1506 + (fun (k, v) -> (Value.Int (Z.of_int k), v)) 1507 + (List.rev fields))); 1508 + decode = 1509 + (fun path v -> 1510 + match v with 1511 + | Value.Map pairs -> ( 1512 + let int_pairs = 1513 + List.filter_map 1514 + (fun (k, v) -> 1515 + match k with 1516 + | Value.Int n when Z.fits_int n -> Some (Z.to_int n, v) 1517 + | _ -> None) 1518 + pairs 1519 + in 1520 + match decode_mem path int_pairs m with 1521 + | Error e -> Error e 1522 + | Ok (result, _remaining) -> Ok result) 1523 + | _ -> type_error path "map" v); 1524 + decode_rw = 1525 + (fun path dec -> 1526 + match read_map_length_rw path dec with 1527 + | Error e -> e 1528 + | Ok len_opt -> ( 1529 + let dispatch = make_dispatch () in 1530 + let results = Hashtbl.create (List.length decoder_list) in 1531 + match decode_int_map_rw path dec len_opt dispatch results with 1532 + | Error e -> Error e 1533 + | Ok () -> resolve_mem path results m)); 1534 + } 1535 + end 1536 + 1537 + (* Tags *) 1538 + 1539 + let tag n c = 1540 + { 1541 + kind 
= Fmt.str "tag(%d, %s)" n c.kind; 1542 + encode = (fun v -> Value.Tag (n, c.encode v)); 1543 + decode = 1544 + (fun path v -> 1545 + match v with 1546 + | Value.Tag (m, content) when m = n -> 1547 + c.decode (Error.Tag n :: path) content 1548 + | Value.Tag (m, _) -> 1549 + Error 1550 + (Error.make path 1551 + (Invalid_value (Fmt.str "expected tag %d, got tag %d" n m))) 1552 + | _ -> type_error path (Fmt.str "tag(%d)" n) v); 1553 + decode_rw = 1554 + (fun path dec -> 1555 + let hdr = Binary.read_header dec in 1556 + if hdr.major = Binary.major_tag then 1557 + let m = Int64.to_int (Binary.read_argument dec hdr) in 1558 + if m = n then c.decode_rw (Error.Tag n :: path) dec 1559 + else 1560 + Error 1561 + (Error.make path 1562 + (Invalid_value (Fmt.str "expected tag %d, got tag %d" n m))) 1563 + else stream_type_error path (Fmt.str "tag(%d)" n) hdr); 1564 + } 1565 + 1566 + let tag_opt n c = 1567 + { 1568 + kind = Fmt.str "tag_opt(%d, %s)" n c.kind; 1569 + encode = (fun v -> Value.Tag (n, c.encode v)); 1570 + decode = 1571 + (fun path v -> 1572 + match v with 1573 + | Value.Tag (m, content) when m = n -> 1574 + c.decode (Error.Tag n :: path) content 1575 + | _ -> c.decode path v); 1576 + decode_rw = 1577 + (fun path dec -> 1578 + match Binary.peek_byte dec with 1579 + | Some b when b lsr 5 = Binary.major_tag -> 1580 + let hdr = Binary.read_header dec in 1581 + let m = Int64.to_int (Binary.read_argument dec hdr) in 1582 + if m = n then c.decode_rw (Error.Tag n :: path) dec 1583 + else 1584 + (* Not our tag; read the content as Value.t and use the Value.t 1585 + decoder with the tag wrapper *) 1586 + let content = Binary.read_cbor dec in 1587 + c.decode path (Value.Tag (m, content)) 1588 + | _ -> c.decode_rw path dec); 1589 + } 1590 + 1591 + (* Transformations *) 1592 + 1593 + let map decode_f encode_f c = 1594 + { 1595 + kind = Fmt.str "map(%s)" c.kind; 1596 + encode = (fun v -> c.encode (encode_f v)); 1597 + decode = 1598 + (fun path v -> 1599 + match c.decode path v 
with 1600 + | Error e -> Error e 1601 + | Ok x -> Ok (decode_f x)); 1602 + decode_rw = 1603 + (fun path dec -> 1604 + match c.decode_rw path dec with 1605 + | Error e -> Error e 1606 + | Ok x -> Ok (decode_f x)); 1607 + } 1608 + 1609 + let conv decode_f encode_f c = 1610 + { 1611 + kind = Fmt.str "conv(%s)" c.kind; 1612 + encode = (fun v -> c.encode (encode_f v)); 1613 + decode = 1614 + (fun path v -> 1615 + match c.decode path v with 1616 + | Error e -> Error e 1617 + | Ok x -> ( 1618 + match decode_f x with 1619 + | Ok y -> Ok y 1620 + | Error msg -> Error (Error.make path (Custom msg)))); 1621 + decode_rw = 1622 + (fun path dec -> 1623 + match c.decode_rw path dec with 1624 + | Error e -> Error e 1625 + | Ok x -> ( 1626 + match decode_f x with 1627 + | Ok y -> Ok y 1628 + | Error msg -> Error (Error.make path (Custom msg)))); 1629 + } 1630 + 1631 + let const v c = 1632 + { 1633 + kind = Fmt.str "const(%s)" c.kind; 1634 + encode = (fun _ -> c.encode ()); 1635 + decode = 1636 + (fun path cbor -> 1637 + match c.decode path cbor with Error e -> Error e | Ok () -> Ok v); 1638 + decode_rw = 1639 + (fun path dec -> 1640 + match c.decode_rw path dec with Error e -> Error e | Ok () -> Ok v); 1641 + } 1642 + 1643 + (* Variants *) 1644 + 1645 + module Variant = struct 1646 + type 'a case = 1647 + | Case : int * 'b t * ('b -> 'a) * ('a -> 'b option) -> 'a case 1648 + | Case0 : int * 'a * ('a -> bool) -> 'a case 1649 + 1650 + let case tag c inject project = Case (tag, c, inject, project) 1651 + let case0 tag v is_v = Case0 (tag, v, is_v) 1652 + 1653 + let variant cases = 1654 + { 1655 + kind = "variant"; 1656 + encode = 1657 + (fun v -> 1658 + let rec find = function 1659 + | [] -> failwith "No matching variant case for encoding" 1660 + | Case (tag, c, _, project) :: rest -> ( 1661 + match project v with 1662 + | Some x -> Value.Tag (tag, c.encode x) 1663 + | None -> find rest) 1664 + | Case0 (tag, _, is_v) :: rest -> 1665 + if is_v v then Value.Tag (tag, Value.Null) else 
find rest 1666 + in 1667 + find cases); 1668 + decode = 1669 + (fun path v -> 1670 + match v with 1671 + | Value.Tag (tag, content) -> 1672 + let rec try_cases = function 1673 + | [] -> 1674 + Error 1675 + (Error.make path 1676 + (Invalid_value 1677 + (Fmt.str "unknown tag %d in variant" tag))) 1678 + | Case (t, c, inject, _) :: _rest when t = tag -> ( 1679 + match c.decode (Error.Tag t :: path) content with 1680 + | Error e -> Error e 1681 + | Ok x -> Ok (inject x)) 1682 + | Case0 (t, v, _) :: _ when t = tag -> Ok v 1683 + | _ :: rest -> try_cases rest 1684 + in 1685 + try_cases cases 1686 + | _ -> type_error path "tag" v); 1687 + decode_rw = 1688 + (fun path dec -> 1689 + let hdr = Binary.read_header dec in 1690 + if hdr.major <> Binary.major_tag then stream_type_error path "tag" hdr 1691 + else 1692 + let tag = Int64.to_int (Binary.read_argument dec hdr) in 1693 + let rec try_cases = function 1694 + | [] -> 1695 + Binary.skip dec; 1696 + Error 1697 + (Error.make path 1698 + (Invalid_value (Fmt.str "unknown tag %d in variant" tag))) 1699 + | Case (t, c, inject, _) :: _rest when t = tag -> ( 1700 + match c.decode_rw (Error.Tag t :: path) dec with 1701 + | Error e -> Error e 1702 + | Ok x -> Ok (inject x)) 1703 + | Case0 (t, v, _) :: _ when t = tag -> 1704 + Binary.skip dec; 1705 + Ok v 1706 + | _ :: rest -> try_cases rest 1707 + in 1708 + try_cases cases); 1709 + } 1710 + end 1711 + 1712 + module Variant_key = struct 1713 + type 'a case = 1714 + | Case : string * 'b t * ('b -> 'a) * ('a -> 'b option) -> 'a case 1715 + | Case0 : string * 'a * ('a -> bool) -> 'a case 1716 + 1717 + let case key c inject project = Case (key, c, inject, project) 1718 + let case0 key v is_v = Case0 (key, v, is_v) 1719 + 1720 + let variant cases = 1721 + { 1722 + kind = "variant_key"; 1723 + encode = 1724 + (fun v -> 1725 + let rec find = function 1726 + | [] -> failwith "No matching variant case for encoding" 1727 + | Case (key, c, _, project) :: rest -> ( 1728 + match project v with 
1729 + | Some x -> Value.Map [ (Value.Text key, c.encode x) ] 1730 + | None -> find rest) 1731 + | Case0 (key, _, is_v) :: rest -> 1732 + if is_v v then Value.Map [ (Value.Text key, Value.Null) ] 1733 + else find rest 1734 + in 1735 + find cases); 1736 + decode = 1737 + (fun path v -> 1738 + match v with 1739 + | Value.Map [ (Value.Text key, content) ] -> 1740 + let rec try_cases = function 1741 + | [] -> 1742 + Error 1743 + (Error.make path 1744 + (Invalid_value 1745 + (Fmt.str "unknown key %S in variant" key))) 1746 + | Case (k, c, inject, _) :: _rest when k = key -> ( 1747 + match c.decode (Error.Key k :: path) content with 1748 + | Error e -> Error e 1749 + | Ok x -> Ok (inject x)) 1750 + | Case0 (k, v, _) :: _ when k = key -> Ok v 1751 + | _ :: rest -> try_cases rest 1752 + in 1753 + try_cases cases 1754 + | Value.Map _ -> 1755 + Error 1756 + (Error.make path 1757 + (Invalid_value "variant map must have exactly one key")) 1758 + | _ -> type_error path "map" v); 1759 + decode_rw = 1760 + (fun path dec -> 1761 + match read_map_length_rw path dec with 1762 + | Error e -> e 1763 + | Ok (Some 1) -> ( 1764 + (* Peek at key type without consuming *) 1765 + match Binary.peek_byte dec with 1766 + | Some b when b lsr 5 <> Binary.major_text -> 1767 + (* Non-text key: skip key and value *) 1768 + Binary.skip dec; 1769 + Binary.skip dec; 1770 + Error 1771 + (Error.make path 1772 + (Invalid_value "variant map key must be text")) 1773 + | _ -> 1774 + let key = Binary.read_text dec in 1775 + let rec try_cases = function 1776 + | [] -> 1777 + Binary.skip dec; 1778 + Error 1779 + (Error.make path 1780 + (Invalid_value 1781 + (Fmt.str "unknown key %S in variant" key))) 1782 + | Case (k, c, inject, _) :: _rest when k = key -> ( 1783 + match c.decode_rw (Error.Key k :: path) dec with 1784 + | Error e -> Error e 1785 + | Ok x -> Ok (inject x)) 1786 + | Case0 (k, v, _) :: _ when k = key -> 1787 + Binary.skip dec; 1788 + Ok v 1789 + | _ :: rest -> try_cases rest 1790 + in 1791 + 
try_cases cases) 1792 + | Ok (Some n) -> 1793 + (* Skip all map entries *) 1794 + for _ = 1 to n do 1795 + Binary.skip dec; 1796 + Binary.skip dec 1797 + done; 1798 + Error 1799 + (Error.make path 1800 + (Invalid_value "variant map must have exactly one key")) 1801 + | Ok None -> ( 1802 + if 1803 + (* Indefinite map: must have exactly one entry *) 1804 + Binary.is_break dec 1805 + then ( 1806 + Binary.skip_break dec; 1807 + Error 1808 + (Error.make path 1809 + (Invalid_value "variant map must have exactly one key"))) 1810 + else 1811 + (* Read the first (and hopefully only) key *) 1812 + match Binary.peek_byte dec with 1813 + | Some b when b lsr 5 <> Binary.major_text -> 1814 + (* Non-text key: skip key and value, then drain rest *) 1815 + Binary.skip dec; 1816 + Binary.skip dec; 1817 + while not (Binary.is_break dec) do 1818 + Binary.skip dec; 1819 + Binary.skip dec 1820 + done; 1821 + Binary.skip_break dec; 1822 + Error 1823 + (Error.make path 1824 + (Invalid_value "variant map key must be text")) 1825 + | _ -> 1826 + let key = Binary.read_text dec in 1827 + let result = 1828 + let rec try_cases = function 1829 + | [] -> 1830 + Binary.skip dec; 1831 + Error 1832 + (Error.make path 1833 + (Invalid_value 1834 + (Fmt.str "unknown key %S in variant" key))) 1835 + | Case (k, c, inject, _) :: _rest when k = key -> ( 1836 + match c.decode_rw (Error.Key k :: path) dec with 1837 + | Error e -> Error e 1838 + | Ok x -> Ok (inject x)) 1839 + | Case0 (k, v, _) :: _ when k = key -> 1840 + Binary.skip dec; 1841 + Ok v 1842 + | _ :: rest -> try_cases rest 1843 + in 1844 + try_cases cases 1845 + in 1846 + (* Check that no more entries follow *) 1847 + let extra = ref 0 in 1848 + while not (Binary.is_break dec) do 1849 + incr extra; 1850 + Binary.skip dec; 1851 + Binary.skip dec 1852 + done; 1853 + Binary.skip_break dec; 1854 + if !extra > 0 then 1855 + Error 1856 + (Error.make path 1857 + (Invalid_value 1858 + "variant map must have exactly one key")) 1859 + else result)); 1860 
+ } 1861 + end 1862 + 1863 + (* Recursive types *) 1864 + 1865 + let fix f = 1866 + let rec self = 1867 + lazy 1868 + (f 1869 + { 1870 + kind = "fix"; 1871 + encode = (fun v -> (Lazy.force self).encode v); 1872 + decode = (fun path v -> (Lazy.force self).decode path v); 1873 + decode_rw = (fun path dec -> (Lazy.force self).decode_rw path dec); 1874 + }) 1875 + in 1876 + Lazy.force self 1877 + 1878 + (* Queries *) 1879 + 1880 + let mem name c = 1881 + let decode = 1882 + fun path v -> 1883 + match v with 1884 + | Value.Map pairs -> 1885 + let rec find = function 1886 + | [] -> Error (Error.make path (Missing_member name)) 1887 + | (Value.Text k, value) :: _ when k = name -> 1888 + c.decode (Error.Key name :: path) value 1889 + | _ :: rest -> find rest 1890 + in 1891 + find pairs 1892 + | _ -> type_error path "map" v 1893 + in 1894 + { 1895 + kind = Fmt.str "mem(%s, %s)" name c.kind; 1896 + encode = (fun v -> Value.Map [ (Value.Text name, c.encode v) ]); 1897 + decode; 1898 + decode_rw = decode_rw_via_cbor decode; 1899 + } 1900 + 1901 + let int_mem key c = 1902 + let decode = 1903 + fun path v -> 1904 + match v with 1905 + | Value.Map pairs -> 1906 + let key_cbor = Value.Int (Z.of_int key) in 1907 + let rec find = function 1908 + | [] -> Error (Error.make path (Missing_member (string_of_int key))) 1909 + | (k, value) :: _ when Value.equal k key_cbor -> 1910 + c.decode (Error.Key (string_of_int key) :: path) value 1911 + | _ :: rest -> find rest 1912 + in 1913 + find pairs 1914 + | _ -> type_error path "map" v 1915 + in 1916 + { 1917 + kind = Fmt.str "int_mem(%d, %s)" key c.kind; 1918 + encode = (fun v -> Value.Map [ (Value.Int (Z.of_int key), c.encode v) ]); 1919 + decode; 1920 + decode_rw = decode_rw_via_cbor decode; 1921 + } 1922 + 1923 + let nth n c = 1924 + let decode = 1925 + fun path v -> 1926 + match v with 1927 + | Value.Array items -> ( 1928 + match List.nth_opt items n with 1929 + | None -> 1930 + Error 1931 + (Error.make path 1932 + (Out_of_range 1933 + { 
1934 + value = string_of_int n; 1935 + range = Fmt.str "[0, %d)" (List.length items); 1936 + })) 1937 + | Some item -> c.decode (Error.Index n :: path) item) 1938 + | _ -> type_error path "array" v 1939 + in 1940 + { 1941 + kind = Fmt.str "nth(%d, %s)" n c.kind; 1942 + encode = 1943 + (fun v -> 1944 + let items = 1945 + List.init (n + 1) (fun i -> if i = n then c.encode v else Value.Null) 1946 + in 1947 + Value.Array items); 1948 + decode; 1949 + decode_rw = decode_rw_via_cbor decode; 1950 + } 1951 + 1952 + (* Updates *) 1953 + 1954 + let update_mem name c = 1955 + let decode = 1956 + fun path v -> 1957 + match v with 1958 + | Value.Map pairs -> 1959 + let rec find found acc = function 1960 + | [] -> 1961 + if found then Ok (Value.Map (List.rev acc)) 1962 + else Error (Error.make path (Missing_member name)) 1963 + | (Value.Text k, value) :: rest when k = name -> ( 1964 + match c.decode (Error.Key name :: path) value with 1965 + | Error e -> Error e 1966 + | Ok new_value -> 1967 + let new_pair = (Value.Text name, c.encode new_value) in 1968 + find true (new_pair :: acc) rest) 1969 + | pair :: rest -> find found (pair :: acc) rest 1970 + in 1971 + find false [] pairs 1972 + | _ -> type_error path "map" v 1973 + in 1974 + { 1975 + kind = Fmt.str "update_mem(%s, %s)" name c.kind; 1976 + encode = (fun v -> Value.Map [ (Value.Text name, v) ]); 1977 + decode; 1978 + decode_rw = decode_rw_via_cbor decode; 1979 + } 1980 + 1981 + let delete_mem name = 1982 + let decode = 1983 + fun path v -> 1984 + match v with 1985 + | Value.Map pairs -> 1986 + let filtered = 1987 + List.filter 1988 + (fun (k, _) -> match k with Value.Text k -> k <> name | _ -> true) 1989 + pairs 1990 + in 1991 + Ok (Value.Map filtered) 1992 + | _ -> type_error path "map" v 1993 + in 1994 + { 1995 + kind = Fmt.str "delete_mem(%s)" name; 1996 + encode = (fun v -> v); 1997 + decode; 1998 + decode_rw = decode_rw_via_cbor decode; 1999 + } 2000 + 2001 + (* Decoding *) 2002 + 2003 + let decode_cbor c v = c.decode 
[ Error.Root ] v 2004 + 2005 + let decode_cbor_exn c v = 2006 + match decode_cbor c v with Ok x -> x | Error e -> raise (Error.Decode e) 2007 + 2008 + let decode c reader = 2009 + let dec = Binary.decoder reader in 2010 + try 2011 + let result = c.decode_rw [ Error.Root ] dec in 2012 + match result with 2013 + | Error _ -> result 2014 + | Ok _ -> ( 2015 + match Binary.peek_byte dec with 2016 + | Some _ -> 2017 + Error 2018 + (Error.make [ Error.Root ] 2019 + (Parse_error 2020 + (Fmt.str "trailing bytes at position %d" 2021 + (Binary.decoder_position dec)))) 2022 + | None -> result) 2023 + with 2024 + | Failure msg -> Error (Error.make [ Error.Root ] (Parse_error msg)) 2025 + | Invalid_argument msg -> Error (Error.make [ Error.Root ] (Parse_error msg)) 2026 + | End_of_file -> 2027 + Error (Error.make [ Error.Root ] (Parse_error "unexpected end of input")) 2028 + 2029 + let decode_exn c reader = 2030 + match decode c reader with Ok x -> x | Error e -> raise (Error.Decode e) 2031 + 2032 + let decode_string c s = 2033 + let reader = Bytes.Reader.of_string s in 2034 + decode c reader 2035 + 2036 + let decode_string_exn c s = 2037 + match decode_string c s with Ok x -> x | Error e -> raise (Error.Decode e) 2038 + 2039 + (* Encoding *) 2040 + 2041 + let encode_cbor c v = c.encode v 2042 + 2043 + let encode c v ~eod writer = 2044 + let enc = Binary.encoder writer in 2045 + let cbor = c.encode v in 2046 + Binary.write_cbor enc cbor; 2047 + Binary.flush_encoder enc; 2048 + if eod then Bytes.Writer.write writer Bytes.Slice.eod 2049 + 2050 + let encode_string c v = 2051 + let buf = Buffer.create 256 in 2052 + let writer = Bytes.Writer.of_buffer buf in 2053 + encode c v ~eod:false writer; 2054 + Buffer.contents buf 2055 + 2056 + module Private = struct 2057 + let decode_cbor = decode_cbor 2058 + let decode_cbor_exn = decode_cbor_exn 2059 + let encode_cbor = encode_cbor 2060 + end
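The member specification used by Obj_int in lib/cbor.ml is easier to follow in miniature. What follows is a stand-alone sketch under stated assumptions, not code from this commit: it replaces Value.t with plain strings, keeps only the required-member case, and the names `codec`, `int_c`, `str_c`, `bind`, `decode` and `encode` are illustrative rather than part of the library.

```ocaml
(* Stand-in for the encoded form: integer keys mapped to string payloads
   (the real module uses Value.t instead of string). *)
type enc = (int * string) list

(* A tiny field codec: to a string and back. Hypothetical, for the sketch. *)
type 'a codec = { enc : 'a -> string; dec : string -> 'a option }

let int_c = { enc = string_of_int; dec = int_of_string_opt }
let str_c = { enc = Fun.id; dec = Option.some }

(* Member specification: 'o is the record being built, 'a the value
   available at this step. The field type 'x is existential. *)
type (_, _) mem =
  | Return : 'a -> ('o, 'a) mem
  | Mem : {
      key : int;
      get : 'o -> 'x;
      codec : 'x codec;
      cont : 'x -> ('o, 'a) mem;
    }
      -> ('o, 'a) mem

let return v = Return v
let mem key get codec = Mem { key; get; codec; cont = (fun x -> Return x) }

(* Bind pushes [f] to the end of the continuation chain. *)
let rec bind : type o a b. (o, a) mem -> (a -> (o, b) mem) -> (o, b) mem =
 fun m f ->
  match m with
  | Return a -> f a
  | Mem { key; get; codec; cont } ->
      Mem { key; get; codec; cont = (fun x -> bind (cont x) f) }

let ( let* ) = bind

(* Decoding walks the chain, feeding each decoded field to [cont]. *)
let rec decode : type o a. enc -> (o, a) mem -> (a, string) result =
 fun pairs m ->
  match m with
  | Return a -> Ok a
  | Mem { key; codec; cont; _ } -> (
      match List.assoc_opt key pairs with
      | None -> Error (Printf.sprintf "missing member %d" key)
      | Some s -> (
          match codec.dec s with
          | None -> Error (Printf.sprintf "bad value at key %d" key)
          | Some x -> decode pairs (cont x)))

(* Encoding walks the same chain, using [get] to project each field. *)
let rec encode : type o a. o -> (o, a) mem -> enc -> enc =
 fun o m acc ->
  match m with
  | Return _ -> List.rev acc
  | Mem { key; get; codec; cont } ->
      let v = get o in
      encode o (cont v) ((key, codec.enc v) :: acc)

type person = { name : string; age : int }

let person : (person, person) mem =
  let* name = mem 1 (fun p -> p.name) str_c in
  let* age = mem 2 (fun p -> p.age) int_c in
  return { name; age }

let alice = { name = "Alice"; age = 30 }
let wire = encode alice person []
```

Because one specification drives both directions, the encoder and decoder cannot drift apart on key numbers. The library additionally derives a streaming dispatch table from the same chain, which this sketch leaves out.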
+486
lib/cbor.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved. 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** CBOR codecs. 7 + 8 + This module provides type-safe CBOR encoding and decoding using a 9 + combinator-based approach. Define codecs once and use them for both encoding 10 + and decoding. 11 + 12 + The design follows the jsont pattern: codecs are values of type ['a t] that 13 + describe how to convert between OCaml values of type ['a] and CBOR data 14 + items. 15 + 16 + {2 Quick Start} 17 + 18 + {[ 19 + (* Define a codec for a record type *) 20 + type person = { name : string; age : int } 21 + 22 + let person_codec = 23 + let open Cbor.Obj in 24 + finish 25 + (let* name = mem "name" (fun p -> p.name) Cbor.string in 26 + let* age = mem "age" (fun p -> p.age) Cbor.int in 27 + return { name; age }) 28 + 29 + (* Encode to CBOR bytes *) 30 + let cbor_bytes = 31 + Cbor.encode_string person_codec { name = "Alice"; age = 30 } 32 + 33 + (* Decode from CBOR bytes *) 34 + let person = Cbor.decode_string person_codec cbor_bytes 35 + ]} 36 + 37 + {2 Data Model} 38 + 39 + This codec library maps OCaml types to CBOR types: 40 + 41 + | OCaml type | CBOR type | |------------|-----------| | [unit] | null 42 + (simple value 22) | | [bool] | true/false (simple values 21/20) | | [int], 43 + [int32], [int64] | integer (major types 0/1) | | [float] | float (major type 44 + 7) | | [string] | text string (major type 3) | | [bytes] | byte string 45 + (major type 2) | | ['a list] | array (major type 4) | | [('k, 'v) list] | 46 + map (major type 5) | | records | map with text keys | | variants | tagged or 47 + keyed encoding | *) 48 + 49 + open Bytesrw 50 + module Binary = Binary 51 + module Value = Value 52 + 53 + (** {1:sort Sorts} *) 54 + 55 + (** Sorts of CBOR values, one
per RFC 8949 major type (with major type 7 split 56 + into {!Bool} / {!Null} / {!Undefined} / {!Simple} / {!Float} by the 57 + simple-value sub-tag). *) 58 + module Sort : sig 59 + (** The sort of a CBOR value. *) 60 + type t = 61 + | Unsigned (** Unsigned integer (major type 0). *) 62 + | Negative (** Negative integer (major type 1). *) 63 + | Bytes (** Byte string (major type 2). *) 64 + | Text (** Text string (major type 3). *) 65 + | Array (** Array (major type 4). *) 66 + | Map (** Map (major type 5). *) 67 + | Tag (** Tagged data item (major type 6). *) 68 + | Bool (** Boolean simple values (20/21, major 7). *) 69 + | Null (** Null simple value (22, major 7). *) 70 + | Undefined (** Undefined simple value (23, major 7). *) 71 + | Simple (** Other simple value (major 7). *) 72 + | Float (** Floating-point (major 7 ieee). *) 73 + 74 + val to_string : t -> string 75 + (** [to_string sort] is a human-readable name for [sort]. *) 76 + 77 + val pp : Format.formatter -> t -> unit 78 + (** [pp] formats sorts. *) 79 + 80 + val of_cbor : Value.t -> t 81 + (** [of_cbor v] returns the sort of CBOR value [v]. Inspects [Z.sign] on 82 + {!Value.Int} to distinguish {!Unsigned} from {!Negative}. *) 83 + 84 + val kinded : kind:string -> t -> string 85 + (** [kinded ~kind s] is [kind ^ " " ^ to_string s] when [kind] is non-empty, 86 + otherwise {!to_string} [s]. *) 87 + 88 + val or_kind : kind:string -> t -> string 89 + (** [or_kind ~kind s] is [kind] when non-empty, otherwise {!to_string} [s]. *) 90 + end 91 + 92 + (** {1:errors Errors} *) 93 + 94 + (** Error handling for codec operations. *) 95 + module Error : sig 96 + (** {1:paths Paths} *) 97 + 98 + type path = segment list 99 + (** Path to a location in a CBOR structure. *) 100 + 101 + (** A segment of a path. *) 102 + and segment = 103 + | Root (** The root of the structure. *) 104 + | Index of int (** An array index. *) 105 + | Key of string (** A map key (text string). 
*) 106 + | Key_cbor of Value.t (** A map key (arbitrary CBOR). *) 107 + | Tag of int (** Inside a tagged value. *) 108 + 109 + val pp_path : Format.formatter -> path -> unit 110 + (** [pp_path ppf path] pretty-prints [path]. *) 111 + 112 + val path_to_string : path -> string 113 + (** [path_to_string path] returns [path] as a string. *) 114 + 115 + (** {1:errors Errors} *) 116 + 117 + (** The kind of error. *) 118 + type kind = 119 + | Type_mismatch of { expected : string; got : string } 120 + (** Expected one CBOR type but got another. *) 121 + | Missing_member of string (** Required map member not found. *) 122 + | Unknown_member of string (** Unexpected map member (when strict). *) 123 + | Duplicate_member of string (** Map contains duplicate key. *) 124 + | Out_of_range of { value : string; range : string } 125 + (** Value outside acceptable range. *) 126 + | Invalid_value of string (** Value doesn't satisfy a constraint. *) 127 + | Parse_error of string (** Low-level parsing error. *) 128 + | Custom of string (** User-defined error. *) 129 + 130 + type t = { 131 + path : path; (** Location in the structure. *) 132 + kind : kind; (** What went wrong. *) 133 + } 134 + (** A decode error with location context. *) 135 + 136 + val make : path -> kind -> t 137 + (** [make path kind] creates an error. *) 138 + 139 + val pp : Format.formatter -> t -> unit 140 + (** [pp ppf e] pretty-prints error [e]. *) 141 + 142 + val to_string : t -> string 143 + (** [to_string e] returns [e] as a string. *) 144 + 145 + exception Decode of t 146 + (** Exception raised by [_exn] decoding functions. *) 147 + end 148 + 149 + (** {1:codec Codecs} *) 150 + 151 + type 'a t 152 + (** The type of codecs for values of type ['a]. A codec knows how to both encode 153 + ['a] to CBOR and decode CBOR to ['a]. *) 154 + 155 + val pp : Format.formatter -> _ t -> unit 156 + (** [pp ppf t] pretty-prints the codec as [<codec>]. 
*) 157 + 158 + (** {1:base Base Codecs} 159 + 160 + Codecs for primitive types. *) 161 + 162 + val null : unit t 163 + (** [null] is a codec for the CBOR null value. Encodes [()] as null. *) 164 + 165 + val bool : bool t 166 + (** [bool] is a codec for CBOR booleans. *) 167 + 168 + val int : int t 169 + (** [int] is a codec for OCaml [int] as CBOR integer. 170 + @raise Error.Decode if the CBOR integer is out of [int] range. *) 171 + 172 + val int32 : int32 t 173 + (** [int32] is a codec for [int32] as CBOR integer. 174 + @raise Error.Decode if the CBOR integer is out of [int32] range. *) 175 + 176 + val int64 : int64 t 177 + (** [int64] is a codec for [int64] as CBOR integer. *) 178 + 179 + val float : float t 180 + (** [float] is a codec for CBOR floating-point numbers. Also accepts CBOR 181 + integers, converting them to float. *) 182 + 183 + val string : string t 184 + (** [string] is a codec for CBOR text strings (UTF-8). *) 185 + 186 + val bytes : string t 187 + (** [bytes] is a codec for CBOR byte strings. The OCaml [string] type is used 188 + since it can hold arbitrary bytes. *) 189 + 190 + val any : Value.t t 191 + (** [any] is a codec that accepts any CBOR value. Useful for dynamic content or 192 + when preserving unknown fields. *) 193 + 194 + (** {1:nullable Nullable Values} *) 195 + 196 + val nullable : 'a t -> 'a option t 197 + (** [nullable c] creates a codec for optional values. Encodes [None] as null, 198 + [Some x] as [c] would encode [x]. *) 199 + 200 + val option : default:'a -> 'a t -> 'a t 201 + (** [option ~default c] creates a codec that uses [default] when decoding null 202 + instead of failing. *) 203 + 204 + (** {1:numbers Numeric Variants} *) 205 + 206 + val uint : int t 207 + (** [uint] is like {!int} but only accepts non-negative integers. *) 208 + 209 + val uint32 : int32 t 210 + (** [uint32] is like {!int32} but only accepts non-negative integers. 
*) 211 + 212 + val uint64 : int64 t 213 + (** [uint64] is like {!int64} but only accepts non-negative integers. *) 214 + 215 + val number : float t 216 + (** [number] accepts both integers and floats, converting to float. Alias for 217 + {!float}. *) 218 + 219 + (** {1:arrays Arrays} *) 220 + 221 + val array : 'a t -> 'a list t 222 + (** [array c] is a codec for arrays where each element uses codec [c]. *) 223 + 224 + val array_of : len:int -> 'a t -> 'a list t 225 + (** [array_of ~len c] is like {!array} but requires exactly [len] elements. *) 226 + 227 + val tuple2 : 'a t -> 'b t -> ('a * 'b) t 228 + (** [tuple2 ca cb] is a codec for 2-element arrays as pairs. *) 229 + 230 + val tuple3 : 'a t -> 'b t -> 'c t -> ('a * 'b * 'c) t 231 + (** [tuple3 ca cb cc] is a codec for 3-element arrays as triples. *) 232 + 233 + val tuple4 : 'a t -> 'b t -> 'c t -> 'd t -> ('a * 'b * 'c * 'd) t 234 + (** [tuple4 ca cb cc cd] is a codec for 4-element arrays as quadruples. *) 235 + 236 + (** {1:maps Maps} 237 + 238 + Codecs for CBOR maps with uniform key and value types. For records and 239 + heterogeneous maps, see {!module:Obj}. *) 240 + 241 + val assoc : 'k t -> 'v t -> ('k * 'v) list t 242 + (** [assoc kc vc] is a codec for maps as association lists. Keys are decoded 243 + using [kc], values using [vc]. *) 244 + 245 + val string_map : 'v t -> (string * 'v) list t 246 + (** [string_map vc] is a codec for maps with text string keys. Equivalent to 247 + [assoc string vc]. *) 248 + 249 + val int_map : 'v t -> (int * 'v) list t 250 + (** [int_map vc] is a codec for maps with integer keys. Common in COSE and other 251 + binary protocols. *) 252 + 253 + (** {1:objects Object Codecs} 254 + 255 + Build codecs for records and objects from CBOR maps with text string keys. 256 + Uses a monadic interface for composing member codecs. *) 257 + module Obj : sig 258 + type ('o, 'a) mem 259 + (** A member specification. 
['o] is the object type being built, ['a] is the 260 + decoded value at this step. *) 261 + 262 + val ( let* ) : ('o, 'a) mem -> ('a -> ('o, 'b) mem) -> ('o, 'b) mem 263 + (** Monadic bind for sequencing member decoders. *) 264 + 265 + val mem : string -> ('o -> 'a) -> 'a t -> ('o, 'a) mem 266 + (** [mem name get c] declares a required member with key [name] decoded by 267 + [c]. The [get] function extracts the field value from the object for 268 + encoding. *) 269 + 270 + val mem_opt : string -> ('o -> 'a option) -> 'a t -> ('o, 'a option) mem 271 + (** [mem_opt name get c] declares an optional member. Returns [None] if the 272 + key is absent or the value is null. *) 273 + 274 + val mem_default : string -> ('o -> 'a) -> default:'a -> 'a t -> ('o, 'a) mem 275 + (** [mem_default name get ~default c] declares a member with a default value 276 + used when the key is absent. *) 277 + 278 + val return : 'o -> ('o, 'o) mem 279 + (** [return v] completes the object codec, returning the built value. *) 280 + 281 + val finish : ('o, 'o) mem -> 'o t 282 + (** [finish m] converts the member specification to a codec. *) 283 + end 284 + 285 + (** {1:int_objects Integer-Keyed Objects} 286 + 287 + Build codecs for maps with integer keys. Common in COSE, CWT, and other 288 + space-efficient binary protocols. *) 289 + module Obj_int : sig 290 + type ('o, 'a) mem 291 + (** A member specification. ['o] is the object type being built, ['a] is the 292 + decoded value at this step. *) 293 + 294 + val ( let* ) : ('o, 'a) mem -> ('a -> ('o, 'b) mem) -> ('o, 'b) mem 295 + (** Monadic bind for sequencing member decoders. *) 296 + 297 + val mem : int -> ('o -> 'a) -> 'a t -> ('o, 'a) mem 298 + (** [mem key get c] declares a required member with integer key [key]. The 299 + [get] function extracts the field value for encoding. *) 300 + 301 + val mem_opt : int -> ('o -> 'a option) -> 'a t -> ('o, 'a option) mem 302 + (** [mem_opt key get c] declares an optional member with integer key. 
*) 303 + 304 + val mem_default : int -> ('o -> 'a) -> default:'a -> 'a t -> ('o, 'a) mem 305 + (** [mem_default key get ~default c] declares a member with default value. *) 306 + 307 + val return : 'o -> ('o, 'o) mem 308 + (** [return v] completes the codec. *) 309 + 310 + val finish : ('o, 'o) mem -> 'o t 311 + (** [finish m] converts to a codec. *) 312 + end 313 + 314 + (** {1:tags Tagged Values} 315 + 316 + CBOR tags provide semantic information about data items. *) 317 + 318 + val tag : int -> 'a t -> 'a t 319 + (** [tag n c] wraps codec [c] with tag number [n]. On encoding, outputs the tag; 320 + on decoding, expects and strips the tag. *) 321 + 322 + val tag_opt : int -> 'a t -> 'a t 323 + (** [tag_opt n c] is like {!tag} but the tag is optional when decoding. Useful 324 + for accepting both tagged and untagged input. *) 325 + 326 + (** {1:transforms Transformations} 327 + 328 + Convert between types using codecs. *) 329 + 330 + val map : ('a -> 'b) -> ('b -> 'a) -> 'a t -> 'b t 331 + (** [map decode encode c] transforms codec [c]. The [decode] function is applied 332 + after decoding, [encode] before encoding. *) 333 + 334 + val conv : ('a -> ('b, string) result) -> ('b -> 'a) -> 'a t -> 'b t 335 + (** [conv decode encode c] is like {!map} but [decode] may fail. *) 336 + 337 + val const : 'a -> unit t -> 'a t 338 + (** [const v c] is a codec that always decodes to [v] and encodes [v] using [c]. 339 + *) 340 + 341 + (** {1:variants Variants} 342 + 343 + Encode sum types using either tags or key-based discrimination. *) 344 + 345 + (** Tag-based variant encoding. Each constructor is assigned a unique CBOR tag 346 + number. *) 347 + module Variant : sig 348 + type 'a case 349 + (** A variant case specification. 
*) 350 + 351 + val case : int -> 'a t -> ('a -> 'b) -> ('b -> 'a option) -> 'b case 352 + (** [case tag c inject project] defines a case: 353 + - [tag] is the CBOR tag number for this case 354 + - [c] is the codec for the payload 355 + - [inject] wraps decoded payload into the variant type 356 + - [project] extracts payload if this case matches *) 357 + 358 + val case0 : int -> 'a -> ('a -> bool) -> 'a case 359 + (** [case0 tag v is_v] defines a case with no payload. Encodes as an empty 360 + tag. [is_v x] should return [true] iff [x] equals [v]. *) 361 + 362 + val variant : 'a case list -> 'a t 363 + (** [variant cases] builds a codec from a list of cases. Cases are tried in 364 + order during decoding. *) 365 + end 366 + 367 + (** Key-based variant encoding. Each constructor is identified by a string key 368 + in a singleton map. *) 369 + module Variant_key : sig 370 + type 'a case 371 + (** A variant case specification. *) 372 + 373 + val case : string -> 'a t -> ('a -> 'b) -> ('b -> 'a option) -> 'b case 374 + (** [case key c inject project] defines a case identified by text key [key]. 375 + *) 376 + 377 + val case0 : string -> 'a -> ('a -> bool) -> 'a case 378 + (** [case0 key v is_v] defines a case with no payload. Encodes as 379 + [{key: null}]. *) 380 + 381 + val variant : 'a case list -> 'a t 382 + (** [variant cases] builds a codec from cases. *) 383 + end 384 + 385 + (** {1:recursive Recursive Types} *) 386 + 387 + val fix : ('a t -> 'a t) -> 'a t 388 + (** [fix f] creates a recursive codec. The function [f] receives a codec that 389 + can be used for self-reference. 
390 + 391 + {[ 392 + type tree = Leaf of int | Node of tree * tree 393 + 394 + let tree_codec = 395 + fix @@ fun self -> 396 + Variant.( 397 + variant 398 + [ 399 + case 0 int 400 + (fun x -> Leaf x) 401 + (function Leaf x -> Some x | _ -> None); 402 + case 1 (tuple2 self self) 403 + (fun (l, r) -> Node (l, r)) 404 + (function Node (l, r) -> Some (l, r) | _ -> None); 405 + ]) 406 + ]} *) 407 + 408 + (** {1:queries Queries} 409 + 410 + Queries navigate into CBOR structures. Following the soup paper approach, 411 + each query builds a new codec that knows how to find and decode a specific 412 + part of a CBOR data structure. *) 413 + 414 + val mem : string -> 'a t -> 'a t 415 + (** [mem name c] queries a map member by text key [name], decoding the value 416 + with codec [c]. Returns a codec that expects a CBOR map and extracts the 417 + member named [name]. *) 418 + 419 + val int_mem : int -> 'a t -> 'a t 420 + (** [int_mem key c] queries a map member by integer key [key], decoding the 421 + value with codec [c]. Useful for COSE/CWT style integer-keyed maps. *) 422 + 423 + val nth : int -> 'a t -> 'a t 424 + (** [nth n c] queries the [n]th element of an array, decoding the element with 425 + codec [c]. Zero-indexed. *) 426 + 427 + (** {1:updates Updates} 428 + 429 + Updates modify parts of a CBOR structure, returning new CBOR values. Like 430 + queries, updates are built as codecs. *) 431 + 432 + val update_mem : string -> 'a t -> Value.t t 433 + (** [update_mem name c] creates a codec that decodes a map, finds the member 434 + named [name], decodes it with [c], re-encodes the decoded value, and returns 435 + the whole map with the member replaced. This is an identity transform 436 + through codec [c] -- useful when [c] normalizes or transforms values. *) 437 + 438 + val delete_mem : string -> Value.t t 439 + (** [delete_mem name] creates a codec that decodes a map and returns it with the 440 + member named [name] removed. 
If the member is absent the map is returned 441 + unchanged. *) 442 + 443 + (** {1:introspection Introspection} *) 444 + 445 + val kind : 'a t -> string 446 + (** [kind c] returns a human-readable description of the codec kind (e.g., 447 + ["int"], ["string"], ["obj({name, age})"], ["mem(name, string)"]). *) 448 + 449 + (** {1:decode Decoding} *) 450 + 451 + val decode : 'a t -> Bytes.Reader.t -> ('a, Error.t) result 452 + (** [decode c r] decodes a value from CBOR reader [r] using codec [c]. *) 453 + 454 + val decode_string : 'a t -> string -> ('a, Error.t) result 455 + (** [decode_string c s] decodes from CBOR bytes [s]. *) 456 + 457 + val decode_exn : 'a t -> Bytes.Reader.t -> 'a 458 + (** [decode_exn c r] is like {!val-decode} but raises {!Error.Decode}. *) 459 + 460 + val decode_string_exn : 'a t -> string -> 'a 461 + (** [decode_string_exn c s] is like {!decode_string} but raises. *) 462 + 463 + (** {1:encode Encoding} *) 464 + 465 + val encode : 'a t -> 'a -> eod:bool -> Bytes.Writer.t -> unit 466 + (** [encode c v ~eod w] encodes [v] to writer [w] using codec [c]. If [eod] is 467 + true, signals end-of-data after encoding. *) 468 + 469 + val encode_string : 'a t -> 'a -> string 470 + (** [encode_string c v] encodes [v] to a CBOR byte string. *) 471 + 472 + (** {1:private_ Private} 473 + 474 + {b For internal use by sibling libraries and tests only.} These functions 475 + expose the intermediate [Value.t] type and are not part of the public codec 476 + API. *) 477 + module Private : sig 478 + val decode_cbor : 'a t -> Value.t -> ('a, Error.t) result 479 + (** [decode_cbor c v] decodes from a CBOR value [v]. *) 480 + 481 + val decode_cbor_exn : 'a t -> Value.t -> 'a 482 + (** [decode_cbor_exn c v] is like [decode_cbor] but raises. *) 483 + 484 + val encode_cbor : 'a t -> 'a -> Value.t 485 + (** [encode_cbor c v] encodes [v] to a [Value.t] value. *) 486 + end
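The combinator style declared in this interface (a codec value that pairs an encoder with a decoder, composed through `tuple2`, `Variant.case`, and `fix`) can be sketched in plain OCaml. The block below is a toy model under stated assumptions, not the library's implementation: the `value` and `codec` types and the bodies of `int`, `tuple2`, `fix`, `case`, and `variant` are hypothetical stand-ins that only mirror the shape of the signatures above.

```ocaml
(* Hypothetical stand-in for the raw value type; the real Value.t has more
   cases. *)
type value = Int of int | Array of value list | Tag of int * value

(* A codec pairs an encoder with a decoder that may fail. *)
type 'a codec = { enc : 'a -> value; dec : value -> ('a, string) result }

let int : int codec =
  {
    enc = (fun n -> Int n);
    dec = (function Int n -> Ok n | _ -> Error "expected int");
  }

(* Pairs travel as 2-element arrays. *)
let tuple2 ca cb =
  {
    enc = (fun (a, b) -> Array [ ca.enc a; cb.enc b ]);
    dec =
      (function
      | Array [ x; y ] ->
          Result.bind (ca.dec x) (fun a ->
              Result.map (fun b -> (a, b)) (cb.dec y))
      | _ -> Error "expected 2-element array");
  }

(* Tie the recursive knot: [f] receives a proxy that forwards to the final
   codec, which is only looked up when encoding or decoding actually runs. *)
let fix f =
  let cell = ref None in
  let force () =
    match !cell with
    | Some c -> c
    | None -> failwith "fix: codec used before construction finished"
  in
  let proxy =
    { enc = (fun v -> (force ()).enc v); dec = (fun v -> (force ()).dec v) }
  in
  let c = f proxy in
  cell := Some c;
  c

(* A case hides its payload type behind an existential. *)
type 'a case =
  | Case : int * 'b codec * ('b -> 'a) * ('a -> 'b option) -> 'a case

let case tag c inject project = Case (tag, c, inject, project)

(* Encoding picks the first case whose projection matches; decoding picks the
   case whose tag number matches. *)
let variant cases =
  let rec encode v = function
    | [] -> invalid_arg "variant: no case matches"
    | Case (tag, c, _, project) :: rest -> (
        match project v with
        | Some payload -> Tag (tag, c.enc payload)
        | None -> encode v rest)
  in
  let rec decode n payload = function
    | [] -> Error (Printf.sprintf "unknown tag %d" n)
    | Case (tag, c, inject, _) :: rest ->
        if tag = n then Result.map inject (c.dec payload)
        else decode n payload rest
  in
  {
    enc = (fun v -> encode v cases);
    dec =
      (function
      | Tag (n, payload) -> decode n payload cases
      | _ -> Error "expected tagged value");
  }

type tree = Leaf of int | Node of tree * tree

let tree_codec : tree codec =
  fix (fun self ->
      variant
        [
          case 0 int (fun x -> Leaf x) (function Leaf x -> Some x | _ -> None);
          case 1 (tuple2 self self)
            (fun (l, r) -> Node (l, r))
            (function Node (l, r) -> Some (l, r) | _ -> None);
        ])
```

In this model a round trip such as `tree_codec.dec (tree_codec.enc t)` returns `Ok t`. The proxy in `fix` is the design point: the self-reference is resolved lazily, so building the codec never loops.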
+4
lib/dune
··· 1 + (library 2 + (name cbor) 3 + (public_name cbor) 4 + (libraries bytesrw zarith fmt))
+265
lib/value.ml
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved. 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + type t = 7 + | Int of Z.t 8 + | Bytes of string 9 + | Text of string 10 + | Array of t list 11 + | Map of (t * t) list 12 + | Tag of int * t 13 + | Bool of bool 14 + | Null 15 + | Undefined 16 + | Simple of int 17 + | Float of float 18 + 19 + (* Constructors *) 20 + let int n = Int (Z.of_int n) 21 + let int64 n = Int (Z.of_int64 n) 22 + let bigint n = Int n 23 + let string s = Text s 24 + let bytes s = Bytes s 25 + let array items = Array items 26 + let map pairs = Map pairs 27 + let tag n item = Tag (n, item) 28 + let bool b = Bool b 29 + let null = Null 30 + let undefined = Undefined 31 + let float f = Float f 32 + 33 + (* Map operations *) 34 + let find key = function Map pairs -> List.assoc_opt key pairs | _ -> None 35 + let text key t = find (Text key) t 36 + let mem key = function Map pairs -> List.mem_assoc key pairs | _ -> false 37 + let mem_text key t = mem (Text key) t 38 + 39 + (* Array operations. List.nth_opt raises on a negative index, so guard it. *) 40 + let nth i = function Array items when i >= 0 -> List.nth_opt items i | _ -> None 41 + 42 + let length = function 43 + | Array items -> Some (List.length items) 44 + | Map pairs -> Some (List.length pairs) 45 + | _ -> None 46 + 47 + (* Type predicates *) 48 + let is_int = function Int _ -> true | _ -> false 49 + let is_bytes = function Bytes _ -> true | _ -> false 50 + let is_text = function Text _ -> true | _ -> false 51 + let is_array = function Array _ -> true | _ -> false 52 + let is_map = function Map _ -> true | _ -> false 53 + let is_tag = function Tag _ -> true | _ -> false 54 + let is_bool = function Bool _ -> true | _ -> false 55 + let is_null = function Null -> true | _ -> false 56 + let is_undefined = function Undefined -> true | _ -> false 57 + let is_simple = function 
Simple _ -> true | _ -> false 58 + let is_float = function Float _ -> true | _ -> false 59 + 60 + (* Type-safe accessors *) 61 + let to_int = function Int n -> Some n | _ -> None 62 + 63 + let to_int_exn = function 64 + | Int n -> n 65 + | _ -> invalid_arg "Value.to_int_exn: not an Int" 66 + 67 + let to_int64 = function 68 + | Int n -> if Z.fits_int64 n then Some (Z.to_int64 n) else None 69 + | _ -> None 70 + 71 + let to_int64_exn = function 72 + | Int n -> 73 + if Z.fits_int64 n then Z.to_int64 n 74 + else invalid_arg "Value.to_int64_exn: value doesn't fit in int64" 75 + | _ -> invalid_arg "Value.to_int64_exn: not an Int" 76 + 77 + let to_bytes = function Bytes s -> Some s | _ -> None 78 + 79 + let to_bytes_exn = function 80 + | Bytes s -> s 81 + | _ -> invalid_arg "Value.to_bytes_exn: not Bytes" 82 + 83 + let to_text = function Text s -> Some s | _ -> None 84 + 85 + let to_text_exn = function 86 + | Text s -> s 87 + | _ -> invalid_arg "Value.to_text_exn: not Text" 88 + 89 + let to_array = function Array items -> Some items | _ -> None 90 + 91 + let to_array_exn = function 92 + | Array items -> items 93 + | _ -> invalid_arg "Value.to_array_exn: not Array" 94 + 95 + let to_map = function Map pairs -> Some pairs | _ -> None 96 + 97 + let to_map_exn = function 98 + | Map pairs -> pairs 99 + | _ -> invalid_arg "Value.to_map_exn: not Map" 100 + 101 + let to_tag = function Tag (n, v) -> Some (n, v) | _ -> None 102 + 103 + let to_tag_exn = function 104 + | Tag (n, v) -> (n, v) 105 + | _ -> invalid_arg "Value.to_tag_exn: not Tag" 106 + 107 + let to_bool = function Bool b -> Some b | _ -> None 108 + 109 + let to_bool_exn = function 110 + | Bool b -> b 111 + | _ -> invalid_arg "Value.to_bool_exn: not Bool" 112 + 113 + let to_float = function Float f -> Some f | _ -> None 114 + 115 + let to_float_exn = function 116 + | Float f -> f 117 + | _ -> invalid_arg "Value.to_float_exn: not Float" 118 + 119 + (* Numeric conversions *) 120 + let to_number = function 121 + | Int n -> Some 
(Z.to_float n) 122 + | Float f -> Some f 123 + | _ -> None 124 + 125 + let to_int_of_float = function 126 + | Int n -> Some n 127 + | Float f -> 128 + (* Z.of_float raises Z.Overflow on NaN and infinities; check first. *) 129 + if Float.is_integer f then Some (Z.of_float f) else None 130 + | _ -> None 131 + 132 + (* Comparison *) 133 + let rec equal a b = 134 + match (a, b) with 135 + | Int x, Int y -> Z.equal x y 136 + | Bytes x, Bytes y -> x = y 137 + | Text x, Text y -> x = y 138 + | Array xs, Array ys -> 139 + List.length xs = List.length ys && List.for_all2 equal xs ys 140 + | Map xs, Map ys -> 141 + List.length xs = List.length ys 142 + && List.for_all2 143 + (fun (k1, v1) (k2, v2) -> equal k1 k2 && equal v1 v2) 144 + xs ys 145 + | Tag (n1, v1), Tag (n2, v2) -> n1 = n2 && equal v1 v2 146 + | Bool x, Bool y -> x = y 147 + | Null, Null -> true 148 + | Undefined, Undefined -> true 149 + | Simple x, Simple y -> x = y 150 + | Float x, Float y -> x = y 151 + | _ -> false 152 + 153 + let major_type_order = function 154 + | Int n when Z.sign n >= 0 -> 0 155 + | Int _ -> 1 156 + | Bytes _ -> 2 157 + | Text _ -> 3 158 + | Array _ -> 4 159 + | Map _ -> 5 160 + | Tag _ -> 6 161 + | Bool _ | Null | Undefined | Simple _ | Float _ -> 7 162 + 163 + let rec compare a b = 164 + let ma = major_type_order a and mb = major_type_order b in 165 + if ma <> mb then Int.compare ma mb 166 + else 167 + match (a, b) with 168 + | Int x, Int y -> Z.compare x y 169 + | Bytes x, Bytes y -> String.compare x y 170 + | Text x, Text y -> String.compare x y 171 + | Array xs, Array ys -> compare_lists xs ys 172 + | Map xs, Map ys -> compare_maps xs ys 173 + | Tag (n1, v1), Tag (n2, v2) -> 174 + let c = Int.compare n1 n2 in 175 + if c <> 0 then c else compare v1 v2 176 + | Bool x, Bool y -> Bool.compare x y 177 + | Null, Null -> 0 178 + | Undefined, Undefined -> 0 179 + | Simple x, Simple y -> Int.compare x y 180 + | Float x, Float y -> Float.compare x y 181 + | _ -> 0 182 + 183 + and compare_lists xs ys = 184 + match (xs, ys) with 185 + | [], [] -> 0 186 + | 
[], _ -> -1 187 + | _, [] -> 1 188 + | x :: xs', y :: ys' -> 189 + let c = compare x y in 190 + if c <> 0 then c else compare_lists xs' ys' 191 + 192 + and compare_maps xs ys = 193 + match (xs, ys) with 194 + | [], [] -> 0 195 + | [], _ -> -1 196 + | _, [] -> 1 197 + | (k1, v1) :: xs', (k2, v2) :: ys' -> 198 + let c = compare k1 k2 in 199 + if c <> 0 then c 200 + else 201 + let c = compare v1 v2 in 202 + if c <> 0 then c else compare_maps xs' ys' 203 + 204 + (* Pretty printing - diagnostic notation per RFC 8949 Section 8 *) 205 + let rec pp ppf = function 206 + | Int n -> Fmt.pf ppf "%s" (Z.to_string n) 207 + | Bytes s -> pp_bytes ppf s 208 + | Text s -> pp_text ppf s 209 + | Array items -> pp_array ppf items 210 + | Map pairs -> pp_map ppf pairs 211 + | Tag (n, v) -> Fmt.pf ppf "%d(%a)" n pp v 212 + | Bool true -> Fmt.pf ppf "true" 213 + | Bool false -> Fmt.pf ppf "false" 214 + | Null -> Fmt.pf ppf "null" 215 + | Undefined -> Fmt.pf ppf "undefined" 216 + | Simple n -> Fmt.pf ppf "simple(%d)" n 217 + | Float f -> pp_float ppf f 218 + 219 + and pp_bytes ppf s = 220 + Fmt.pf ppf "h'"; 221 + String.iter (fun c -> Fmt.pf ppf "%02x" (Char.code c)) s; 222 + Fmt.pf ppf "'" 223 + 224 + and pp_text ppf s = 225 + Fmt.pf ppf "\""; 226 + String.iter 227 + (fun c -> 228 + match c with 229 + | '"' -> Fmt.pf ppf "\\\"" 230 + | '\\' -> Fmt.pf ppf "\\\\" 231 + | '\n' -> Fmt.pf ppf "\\n" 232 + | '\r' -> Fmt.pf ppf "\\r" 233 + | '\t' -> Fmt.pf ppf "\\t" 234 + | c when Char.code c < 32 -> Fmt.pf ppf "\\u%04x" (Char.code c) 235 + | c -> Fmt.pf ppf "%c" c) 236 + s; 237 + Fmt.pf ppf "\"" 238 + 239 + and pp_array ppf items = 240 + Fmt.pf ppf "["; 241 + List.iteri 242 + (fun i v -> 243 + if i > 0 then Fmt.pf ppf ", "; 244 + pp ppf v) 245 + items; 246 + Fmt.pf ppf "]" 247 + 248 + and pp_map ppf pairs = 249 + Fmt.pf ppf "{"; 250 + List.iteri 251 + (fun i (k, v) -> 252 + if i > 0 then Fmt.pf ppf ", "; 253 + Fmt.pf ppf "%a: %a" pp k pp v) 254 + pairs; 255 + Fmt.pf ppf "}" 256 + 257 + and 
pp_float ppf f = 258 + match classify_float f with 259 + | FP_nan -> Fmt.pf ppf "NaN" 260 + | FP_infinite -> 261 + if f > 0.0 then Fmt.pf ppf "Infinity" else Fmt.pf ppf "-Infinity" 262 + | FP_zero when 1.0 /. f < 0.0 -> Fmt.pf ppf "-0.0" 263 + | _ -> Fmt.pf ppf "%g" f 264 + 265 + let to_diagnostic v = Fmt.str "%a" pp v
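The diagnostic notation produced by the printer above can be tried without the library. The block below is a minimal sketch under stated assumptions: the `value` type is a hypothetical, reduced stand-in (plain `int` instead of `Z.t`, no floats or simple values), and `hex`, `quote`, and `to_diagnostic` are illustrative helpers built on strings rather than on `Fmt`. Control-character escaping is left out.

```ocaml
(* Hypothetical, reduced stand-in for the value type: plain int instead of
   Z.t, and no floats or simple values. *)
type value =
  | Int of int
  | Bytes of string
  | Text of string
  | Array of value list
  | Map of (value * value) list
  | Tag of int * value
  | Bool of bool
  | Null

(* Byte strings print as lowercase hex digits, two per byte. *)
let hex s =
  String.to_seq s
  |> Seq.map (fun c -> Printf.sprintf "%02x" (Char.code c))
  |> List.of_seq |> String.concat ""

(* Text strings print double-quoted; this sketch escapes only the double
   quote and the backslash. *)
let quote s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

let rec to_diagnostic = function
  | Int n -> string_of_int n
  | Bytes s -> "h'" ^ hex s ^ "'"
  | Text s -> quote s
  | Array items ->
      "[" ^ String.concat ", " (List.map to_diagnostic items) ^ "]"
  | Map pairs ->
      let pair (k, v) = to_diagnostic k ^ ": " ^ to_diagnostic v in
      "{" ^ String.concat ", " (List.map pair pairs) ^ "}"
  | Tag (n, v) -> Printf.sprintf "%d(%s)" n (to_diagnostic v)
  | Bool b -> string_of_bool b
  | Null -> "null"

let example =
  Map
    [
      (Text "name", Text "Alice");
      (Int 1, Array [ Bytes "\x01\xff"; Tag (1, Int 0); Bool true; Null ]);
    ]

let () =
  assert (
    to_diagnostic example
    = "{\"name\": \"Alice\", 1: [h'01ff', 1(0), true, null]}")
```

Map pairs are printed in list order, which matches the order-preserving association-list representation used for maps.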
+271
lib/value.mli
··· 1 + (*--------------------------------------------------------------------------- 2 + Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved. 3 + SPDX-License-Identifier: ISC 4 + ---------------------------------------------------------------------------*) 5 + 6 + (** CBOR data items. 7 + 8 + This module defines the CBOR (Concise Binary Object Representation) data 9 + model as specified in {{:https://www.rfc-editor.org/rfc/rfc8949}RFC 8949}. 10 + 11 + A CBOR data item is one of: 12 + - Unsigned integer (major type 0) 13 + - Negative integer (major type 1) 14 + - Byte string (major type 2) 15 + - Text string (major type 3) 16 + - Array of data items (major type 4) 17 + - Map of data item pairs (major type 5) 18 + - Tagged data item (major type 6) 19 + - Simple value or float (major type 7) 20 + 21 + {2 Example} 22 + 23 + {[ 24 + let person = 25 + Value.Map 26 + [ 27 + (Value.Text "name", Value.Text "Alice"); (Value.Text "age", Value.int 30); 28 + ] 29 + ]} *) 30 + 31 + (** {1:types CBOR Data Items} *) 32 + 33 + (** The type of CBOR data items. *) 34 + type t = 35 + | Int of Z.t 36 + (** Signed integer. Non-negative values encode as major type 0, negative as 37 + major type 1. Small values (fitting in int64) are encoded directly; 38 + larger values use bignum tags (2 for positive, 3 for negative). Uses 39 + zarith for arbitrary precision. *) 40 + | Bytes of string (** Byte string (major type 2). Raw binary data. *) 41 + | Text of string (** Text string (major type 3). Must be valid UTF-8. *) 42 + | Array of t list (** Array of data items (major type 4). *) 43 + | Map of (t * t) list 44 + (** Map of key-value pairs (major type 5). Keys can be any CBOR type, 45 + though string keys are most common for JSON interoperability. *) 46 + | Tag of int * t 47 + (** Tagged data item (major type 6). The tag number provides semantic 48 + information about the enclosed item. 
Common tags include: 49 + - 0: Standard date/time string 50 + - 1: Epoch-based date/time 51 + - 2: Positive bignum 52 + - 3: Negative bignum 53 + - 21: Expected base64url encoding 54 + - 22: Expected base64 encoding 55 + - 23: Expected base16 encoding *) 56 + | Bool of bool 57 + (** Boolean simple value. [true] is simple value 21, [false] is 20. *) 58 + | Null (** Null simple value (22). Represents absence of a value. *) 59 + | Undefined (** Undefined simple value (23). Distinct from null in CBOR. *) 60 + | Simple of int 61 + (** Other simple values (0-19 and 32-255). Simple values in range 0-23 are 62 + encoded in a single byte; values 32-255 require two bytes. Values 63 + 24-31 are reserved and are not valid simple values. *) 64 + | Float of float 65 + (** Floating-point number. May be encoded as half (16-bit), single 66 + (32-bit), or double (64-bit) precision IEEE 754. *) 67 + 68 + (** {1:constructors Constructors} 69 + 70 + Convenience constructors for common cases. *) 71 + 72 + val int : int -> t 73 + (** [int n] creates an integer from an OCaml [int]. *) 74 + 75 + val int64 : int64 -> t 76 + (** [int64 n] creates an integer from an [int64]. *) 77 + 78 + val bigint : Z.t -> t 79 + (** [bigint n] creates an integer from a zarith integer. *) 80 + 81 + val string : string -> t 82 + (** [string s] creates a text string. Alias for [Text s]. *) 83 + 84 + val bytes : string -> t 85 + (** [bytes s] creates a byte string. Alias for [Bytes s]. *) 86 + 87 + val array : t list -> t 88 + (** [array items] creates an array. Alias for [Array items]. *) 89 + 90 + val map : (t * t) list -> t 91 + (** [map pairs] creates a map. Alias for [Map pairs]. *) 92 + 93 + val tag : int -> t -> t 94 + (** [tag n item] creates a tagged item. Alias for [Tag (n, item)]. *) 95 + 96 + val bool : bool -> t 97 + (** [bool b] creates a boolean. Alias for [Bool b]. *) 98 + 99 + val null : t 100 + (** [null] is the null value. Alias for [Null]. 
*) 101 + 102 + val undefined : t 103 + (** [undefined] is the undefined value. Alias for [Undefined]. *) 104 + 105 + val float : float -> t 106 + (** [float f] creates a float. Alias for [Float f]. *) 107 + 108 + (** {1:maps Map Operations} 109 + 110 + CBOR maps can have any type as keys. These functions provide convenient 111 + access patterns for the common case of text string keys. *) 112 + 113 + val find : t -> t -> t option 114 + (** [find key map] finds the value associated with [key] in [map]. Returns 115 + [None] if [map] is not a map or [key] is not found. *) 116 + 117 + val text : string -> t -> t option 118 + (** [text key map] finds the value for text key [key] in [map]. Equivalent to 119 + [find (Text key) map]. *) 120 + 121 + val mem : t -> t -> bool 122 + (** [mem key map] returns [true] if [key] exists in [map]. *) 123 + 124 + val mem_text : string -> t -> bool 125 + (** [mem_text key map] returns [true] if text key [key] exists in [map]. *) 126 + 127 + (** {1:arrays Array Operations} *) 128 + 129 + val nth : int -> t -> t option 130 + (** [nth i arr] returns the [i]th element of array [arr], or [None] if [arr] is 131 + not an array or [i] is out of bounds. Zero-indexed. *) 132 + 133 + val length : t -> int option 134 + (** [length v] returns the length of array or map [v], or [None] if [v] is 135 + neither. For maps, returns the number of key-value pairs. *) 136 + 137 + (** {1:predicates Type Predicates} *) 138 + 139 + val is_int : t -> bool 140 + (** [is_int v] is [true] iff [v] is an [Int]. *) 141 + 142 + val is_bytes : t -> bool 143 + (** [is_bytes v] is [true] iff [v] is a [Bytes]. *) 144 + 145 + val is_text : t -> bool 146 + (** [is_text v] is [true] iff [v] is a [Text]. *) 147 + 148 + val is_array : t -> bool 149 + (** [is_array v] is [true] iff [v] is an [Array]. *) 150 + 151 + val is_map : t -> bool 152 + (** [is_map v] is [true] iff [v] is a [Map]. *) 153 + 154 + val is_tag : t -> bool 155 + (** [is_tag v] is [true] iff [v] is a [Tag]. 
*) 156 + 157 + val is_bool : t -> bool 158 + (** [is_bool v] is [true] iff [v] is a [Bool]. *) 159 + 160 + val is_null : t -> bool 161 + (** [is_null v] is [true] iff [v] is [Null]. *) 162 + 163 + val is_undefined : t -> bool 164 + (** [is_undefined v] is [true] iff [v] is [Undefined]. *) 165 + 166 + val is_simple : t -> bool 167 + (** [is_simple v] is [true] iff [v] is a [Simple]. *) 168 + 169 + val is_float : t -> bool 170 + (** [is_float v] is [true] iff [v] is a [Float]. *) 171 + 172 + (** {1:accessors Type-Safe Accessors} 173 + 174 + These functions extract values with type checking. *) 175 + 176 + val to_int : t -> Z.t option 177 + (** [to_int v] extracts the integer value, or [None] if not an [Int]. *) 178 + 179 + val to_int_exn : t -> Z.t 180 + (** [to_int_exn v] extracts the integer value. 181 + @raise Invalid_argument if [v] is not an [Int]. *) 182 + 183 + val to_int64 : t -> int64 option 184 + (** [to_int64 v] extracts the integer as int64, or [None] if not an [Int] or if 185 + the value doesn't fit in int64. *) 186 + 187 + val to_int64_exn : t -> int64 188 + (** [to_int64_exn v] extracts the integer as int64. 189 + @raise Invalid_argument if [v] is not an [Int] or doesn't fit in int64. *) 190 + 191 + val to_bytes : t -> string option 192 + (** [to_bytes v] extracts the byte string, or [None] if not [Bytes]. *) 193 + 194 + val to_bytes_exn : t -> string 195 + (** [to_bytes_exn v] extracts the byte string. 196 + @raise Invalid_argument if [v] is not [Bytes]. *) 197 + 198 + val to_text : t -> string option 199 + (** [to_text v] extracts the text string, or [None] if not [Text]. *) 200 + 201 + val to_text_exn : t -> string 202 + (** [to_text_exn v] extracts the text string. 203 + @raise Invalid_argument if [v] is not [Text]. *) 204 + 205 + val to_array : t -> t list option 206 + (** [to_array v] extracts the array elements, or [None] if not [Array]. *) 207 + 208 + val to_array_exn : t -> t list 209 + (** [to_array_exn v] extracts the array elements. 
210 + @raise Invalid_argument if [v] is not [Array]. *) 211 + 212 + val to_map : t -> (t * t) list option 213 + (** [to_map v] extracts the map pairs, or [None] if not [Map]. *) 214 + 215 + val to_map_exn : t -> (t * t) list 216 + (** [to_map_exn v] extracts the map pairs. 217 + @raise Invalid_argument if [v] is not [Map]. *) 218 + 219 + val to_tag : t -> (int * t) option 220 + (** [to_tag v] extracts the tag number and content, or [None] if not [Tag]. *) 221 + 222 + val to_tag_exn : t -> int * t 223 + (** [to_tag_exn v] extracts the tag number and content. 224 + @raise Invalid_argument if [v] is not [Tag]. *) 225 + 226 + val to_bool : t -> bool option 227 + (** [to_bool v] extracts the boolean, or [None] if not [Bool]. *) 228 + 229 + val to_bool_exn : t -> bool 230 + (** [to_bool_exn v] extracts the boolean. 231 + @raise Invalid_argument if [v] is not [Bool]. *) 232 + 233 + val to_float : t -> float option 234 + (** [to_float v] extracts the float, or [None] if not [Float]. *) 235 + 236 + val to_float_exn : t -> float 237 + (** [to_float_exn v] extracts the float. 238 + @raise Invalid_argument if [v] is not [Float]. *) 239 + 240 + (** {1:numeric Numeric Conversions} 241 + 242 + These functions provide flexible numeric access, converting between integer 243 + and float representations as needed. *) 244 + 245 + val to_number : t -> float option 246 + (** [to_number v] extracts a numeric value as float. Returns [Some f] if [v] is 247 + [Int n] (converted to float) or [Float f]. Returns [None] otherwise. *) 248 + 249 + val to_int_of_float : t -> Z.t option 250 + (** [to_int_of_float v] extracts an integer, converting floats if they represent 251 + whole numbers. Returns [None] if [v] is not numeric or if the float has a 252 + fractional part. *) 253 + 254 + (** {1:comparison Comparison and Equality} *) 255 + 256 + val equal : t -> t -> bool 257 + (** [equal a b] is structural equality. For floats, uses IEEE 754 equality (NaN 258 + ≠ NaN). 
For maps, order of pairs matters. *) 259 + 260 + val compare : t -> t -> int 261 + (** [compare a b] orders values by major type first, then by value within a 262 + type. Strings compare bytewise and integers numerically, so the result can differ from the encoded-bytes order of RFC 8949 Section 4.2.1. *) 263 + 264 + (** {1:output Pretty Printing} *) 265 + 266 + val pp : Format.formatter -> t -> unit 267 + (** [pp ppf v] pretty-prints [v] in diagnostic notation as defined in RFC 8949 268 + Section 8. *) 269 + 270 + val to_diagnostic : t -> string 271 + (** [to_diagnostic v] returns [v] in diagnostic notation. *)
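Two behaviours documented in this interface are easy to miss: lookups return `None` for non-map arguments instead of raising, and equality on maps is sensitive to pair order. The block below illustrates both, assuming a hypothetical reduced stand-in for the value type (plain `int` instead of `Z.t`). The `normalize` helper is an assumption introduced for illustration and is not part of the interface.

```ocaml
(* Hypothetical reduced stand-in for the value type. *)
type value =
  | Int of int
  | Text of string
  | Array of value list
  | Map of (value * value) list

(* Lookup is a linear scan using structural equality on keys. *)
let find key = function Map pairs -> List.assoc_opt key pairs | _ -> None
let text key v = find (Text key) v

(* Hypothetical helper: sort the top-level pairs by key so that two maps with
   the same entries compare equal regardless of input order. Nested maps are
   left untouched. *)
let normalize = function
  | Map pairs -> Map (List.sort (fun (k1, _) (k2, _) -> compare k1 k2) pairs)
  | v -> v

let a = Map [ (Text "name", Text "Alice"); (Int 1, Array [ Int 10; Int 20 ]) ]
let b = Map [ (Int 1, Array [ Int 10; Int 20 ]); (Text "name", Text "Alice") ]

let () =
  assert (text "name" a = Some (Text "Alice"));
  assert (find (Int 1) a = Some (Array [ Int 10; Int 20 ]));
  assert (text "email" a = None);
  (* A non-map argument yields None rather than an exception. *)
  assert (text "name" (Int 3) = None);
  (* Same entries, different order: unequal until normalized. *)
  assert (a <> b);
  assert (normalize a = normalize b)
```

Callers that need order-insensitive comparison would have to normalize on their side, as in this sketch, since maps are kept as association lists in input order.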
+3674
specs/rfc8949.txt
··· 1 +  2 + 3 + 4 + 5 + Internet Engineering Task Force (IETF) C. Bormann 6 + Request for Comments: 8949 Universität Bremen TZI 7 + STD: 94 P. Hoffman 8 + Obsoletes: 7049 ICANN 9 + Category: Standards Track December 2020 10 + ISSN: 2070-1721 11 + 12 + 13 + Concise Binary Object Representation (CBOR) 14 + 15 + Abstract 16 + 17 + The Concise Binary Object Representation (CBOR) is a data format 18 + whose design goals include the possibility of extremely small code 19 + size, fairly small message size, and extensibility without the need 20 + for version negotiation. These design goals make it different from 21 + earlier binary serializations such as ASN.1 and MessagePack. 22 + 23 + This document obsoletes RFC 7049, providing editorial improvements, 24 + new details, and errata fixes while keeping full compatibility with 25 + the interchange format of RFC 7049. It does not create a new version 26 + of the format. 27 + 28 + Status of This Memo 29 + 30 + This is an Internet Standards Track document. 31 + 32 + This document is a product of the Internet Engineering Task Force 33 + (IETF). It represents the consensus of the IETF community. It has 34 + received public review and has been approved for publication by the 35 + Internet Engineering Steering Group (IESG). Further information on 36 + Internet Standards is available in Section 2 of RFC 7841. 37 + 38 + Information about the current status of this document, any errata, 39 + and how to provide feedback on it may be obtained at 40 + https://www.rfc-editor.org/info/rfc8949. 41 + 42 + Copyright Notice 43 + 44 + Copyright (c) 2020 IETF Trust and the persons identified as the 45 + document authors. All rights reserved. 46 + 47 + This document is subject to BCP 78 and the IETF Trust's Legal 48 + Provisions Relating to IETF Documents 49 + (https://trustee.ietf.org/license-info) in effect on the date of 50 + publication of this document. 
Please review these documents 51 + carefully, as they describe your rights and restrictions with respect 52 + to this document. Code Components extracted from this document must 53 + include Simplified BSD License text as described in Section 4.e of 54 + the Trust Legal Provisions and are provided without warranty as 55 + described in the Simplified BSD License. 56 + 57 + Table of Contents 58 + 59 + 1. Introduction 60 + 1.1. Objectives 61 + 1.2. Terminology 62 + 2. CBOR Data Models 63 + 2.1. Extended Generic Data Models 64 + 2.2. Specific Data Models 65 + 3. Specification of the CBOR Encoding 66 + 3.1. Major Types 67 + 3.2. Indefinite Lengths for Some Major Types 68 + 3.2.1. The "break" Stop Code 69 + 3.2.2. Indefinite-Length Arrays and Maps 70 + 3.2.3. Indefinite-Length Byte Strings and Text Strings 71 + 3.2.4. Summary of Indefinite-Length Use of Major Types 72 + 3.3. Floating-Point Numbers and Values with No Content 73 + 3.4. Tagging of Items 74 + 3.4.1. Standard Date/Time String 75 + 3.4.2. Epoch-Based Date/Time 76 + 3.4.3. Bignums 77 + 3.4.4. Decimal Fractions and Bigfloats 78 + 3.4.5. Content Hints 79 + 3.4.5.1. Encoded CBOR Data Item 80 + 3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters 81 + 3.4.5.3. Encoded Text 82 + 3.4.6. Self-Described CBOR 83 + 4. Serialization Considerations 84 + 4.1. Preferred Serialization 85 + 4.2. Deterministically Encoded CBOR 86 + 4.2.1. Core Deterministic Encoding Requirements 87 + 4.2.2. Additional Deterministic Encoding Considerations 88 + 4.2.3. Length-First Map Key Ordering 89 + 5. Creating CBOR-Based Protocols 90 + 5.1. CBOR in Streaming Applications 91 + 5.2. Generic Encoders and Decoders 92 + 5.3. Validity of Items 93 + 5.3.1. Basic validity 94 + 5.3.2. Tag validity 95 + 5.4. Validity and Evolution 96 + 5.5. Numbers 97 + 5.6. Specifying Keys for Maps 98 + 5.6.1. Equivalence of Keys 99 + 5.7. Undefined Values 100 + 6. Converting Data between CBOR and JSON 101 + 6.1. Converting from CBOR to JSON 102 + 6.2. 
Converting from JSON to CBOR 103 + 7. Future Evolution of CBOR 104 + 7.1. Extension Points 105 + 7.2. Curating the Additional Information Space 106 + 8. Diagnostic Notation 107 + 8.1. Encoding Indicators 108 + 9. IANA Considerations 109 + 9.1. CBOR Simple Values Registry 110 + 9.2. CBOR Tags Registry 111 + 9.3. Media Types Registry 112 + 9.4. CoAP Content-Format Registry 113 + 9.5. Structured Syntax Suffix Registry 114 + 10. Security Considerations 115 + 11. References 116 + 11.1. Normative References 117 + 11.2. Informative References 118 + Appendix A. Examples of Encoded CBOR Data Items 119 + Appendix B. Jump Table for Initial Byte 120 + Appendix C. Pseudocode 121 + Appendix D. Half-Precision 122 + Appendix E. Comparison of Other Binary Formats to CBOR's Design 123 + Objectives 124 + E.1. ASN.1 DER, BER, and PER 125 + E.2. MessagePack 126 + E.3. BSON 127 + E.4. MSDTP: RFC 713 128 + E.5. Conciseness on the Wire 129 + Appendix F. Well-Formedness Errors and Examples 130 + F.1. Examples of CBOR Data Items That Are Not Well-Formed 131 + Appendix G. Changes from RFC 7049 132 + G.1. Errata Processing and Clerical Changes 133 + G.2. Changes in IANA Considerations 134 + G.3. Changes in Suggestions and Other Informational Components 135 + Acknowledgements 136 + Authors' Addresses 137 + 138 + 1. Introduction 139 + 140 + There are hundreds of standardized formats for binary representation 141 + of structured data (also known as binary serialization formats). Of 142 + those, some are for specific domains of information, while others are 143 + generalized for arbitrary data. In the IETF, probably the best-known 144 + formats in the latter category are ASN.1's BER and DER [ASN.1]. 145 + 146 + The format defined here follows some specific design goals that are 147 + not well met by current formats. The underlying data model is an 148 + extended version of the JSON data model [RFC8259]. 
It is important 149 + to note that this is not a proposal that the grammar in RFC 8259 be 150 + extended in general, since doing so would cause a significant 151 + backwards incompatibility with already deployed JSON documents. 152 + Instead, this document simply defines its own data model that starts 153 + from JSON. 154 + 155 + Appendix E lists some existing binary formats and discusses how well 156 + they do or do not fit the design objectives of the Concise Binary 157 + Object Representation (CBOR). 158 + 159 + This document obsoletes [RFC7049], providing editorial improvements, 160 + new details, and errata fixes while keeping full compatibility with 161 + the interchange format of RFC 7049. It does not create a new version 162 + of the format. 163 + 164 + 1.1. Objectives 165 + 166 + The objectives of CBOR, roughly in decreasing order of importance, 167 + are: 168 + 169 + 1. The representation must be able to unambiguously encode most 170 + common data formats used in Internet standards. 171 + 172 + * It must represent a reasonable set of basic data types and 173 + structures using binary encoding. "Reasonable" here is 174 + largely influenced by the capabilities of JSON, with the major 175 + addition of binary byte strings. The structures supported are 176 + limited to arrays and trees; loops and lattice-style graphs 177 + are not supported. 178 + 179 + * There is no requirement that all data formats be uniquely 180 + encoded; that is, it is acceptable that the number "7" might 181 + be encoded in multiple different ways. 182 + 183 + 2. The code for an encoder or decoder must be able to be compact in 184 + order to support systems with very limited memory, processor 185 + power, and instruction sets. 186 + 187 + * An encoder and a decoder need to be implementable in a very 188 + small amount of code (for example, in class 1 constrained 189 + nodes as defined in [RFC7228]). 
190 + 191 + * The format should use contemporary machine representations of 192 + data (for example, not requiring binary-to-decimal 193 + conversion). 194 + 195 + 3. Data must be able to be decoded without a schema description. 196 + 197 + * Similar to JSON, encoded data should be self-describing so 198 + that a generic decoder can be written. 199 + 200 + 4. The serialization must be reasonably compact, but data 201 + compactness is secondary to code compactness for the encoder and 202 + decoder. 203 + 204 + * "Reasonable" here is bounded by JSON as an upper bound in size 205 + and by the implementation complexity, which limits the amount 206 + of effort that can go into achieving that compactness. Using 207 + either general compression schemes or extensive bit-fiddling 208 + violates the complexity goals. 209 + 210 + 5. The format must be applicable to both constrained nodes and high- 211 + volume applications. 212 + 213 + * This means it must be reasonably frugal in CPU usage for both 214 + encoding and decoding. This is relevant both for constrained 215 + nodes and for potential usage in applications with a very high 216 + volume of data. 217 + 218 + 6. The format must support all JSON data types for conversion to and 219 + from JSON. 220 + 221 + * It must support a reasonable level of conversion as long as 222 + the data represented is within the capabilities of JSON. It 223 + must be possible to define a unidirectional mapping towards 224 + JSON for all types of data. 225 + 226 + 7. The format must be extensible, and the extended data must be 227 + decodable by earlier decoders. 228 + 229 + * The format is designed for decades of use. 230 + 231 + * The format must support a form of extensibility that allows 232 + fallback so that a decoder that does not understand an 233 + extension can still decode the message. 234 + 235 + * The format must be able to be extended in the future by later 236 + IETF standards. 237 + 238 + 1.2. 
Terminology 239 + 240 + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 241 + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 242 + "OPTIONAL" in this document are to be interpreted as described in 243 + BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all 244 + capitals, as shown here. 245 + 246 + The term "byte" is used in its now-customary sense as a synonym for 247 + "octet". All multi-byte values are encoded in network byte order 248 + (that is, most significant byte first, also known as "big-endian"). 249 + 250 + This specification makes use of the following terminology: 251 + 252 + Data item: A single piece of CBOR data. The structure of a data 253 + item may contain zero, one, or more nested data items. The term 254 + is used both for the data item in representation format and for 255 + the abstract idea that can be derived from that by a decoder; the 256 + former can be addressed specifically by using the term "encoded 257 + data item". 258 + 259 + Decoder: A process that decodes a well-formed encoded CBOR data item 260 + and makes it available to an application. Formally speaking, a 261 + decoder contains a parser to break up the input using the syntax 262 + rules of CBOR, as well as a semantic processor to prepare the data 263 + in a form suitable to the application. 264 + 265 + Encoder: A process that generates the (well-formed) representation 266 + format of a CBOR data item from application information. 267 + 268 + Data Stream: A sequence of zero or more data items, not further 269 + assembled into a larger containing data item (see [RFC8742] for 270 + one application). The independent data items that make up a data 271 + stream are sometimes also referred to as "top-level data items". 272 + 273 + Well-formed: A data item that follows the syntactic structure of 274 + CBOR. 
A well-formed data item uses the initial bytes and the byte 275 + strings and/or data items that are implied by their values as 276 + defined in CBOR and does not include following extraneous data. 277 + CBOR decoders by definition only return contents from well-formed 278 + data items. 279 + 280 + Valid: A data item that is well-formed and also follows the semantic 281 + restrictions that apply to CBOR data items (Section 5.3). 282 + 283 + Expected: Besides its normal English meaning, the term "expected" is 284 + used to describe requirements beyond CBOR validity that an 285 + application has on its input data. Well-formed (processable at 286 + all), valid (checked by a validity-checking generic decoder), and 287 + expected (checked by the application) form a hierarchy of layers 288 + of acceptability. 289 + 290 + Stream decoder: A process that decodes a data stream and makes each 291 + of the data items in the sequence available to an application as 292 + they are received. 293 + 294 + Terms and concepts for floating-point values such as Infinity, NaN 295 + (not a number), negative zero, and subnormal are defined in 296 + [IEEE754]. 297 + 298 + Where bit arithmetic or data types are explained, this document uses 299 + the notation familiar from the programming language C [C], except 300 + that ".." denotes a range that includes both ends given, and 301 + superscript notation denotes exponentiation. For example, 2 to the 302 + power of 64 is notated: 2^(64). In the plain-text version of this 303 + specification, superscript notation is not available and therefore is 304 + rendered by a surrogate notation. That notation is not optimized for 305 + this RFC; it is unfortunately ambiguous with C's exclusive-or (which 306 + is only used in the appendices, which in turn do not use 307 + exponentiation) and requires circumspection from the reader of the 308 + plain-text version. 
309 + 310 + Examples and pseudocode assume that signed integers use two's 311 + complement representation and that right shifts of signed integers 312 + perform sign extension; these assumptions are also specified in 313 + Sections 6.8.1 (basic.fundamental) and 7.6.7 (expr.shift) of the 2020 314 + version of C++ (currently available as a final draft, [Cplusplus20]). 315 + 316 + Similar to the "0x" notation for hexadecimal numbers, numbers in 317 + binary notation are prefixed with "0b". Underscores can be added to 318 + a number solely for readability, so 0b00100001 (0x21) might be 319 + written 0b001_00001 to emphasize the desired interpretation of the 320 + bits in the byte; in this case, it is split into three bits and five 321 + bits. Encoded CBOR data items are sometimes given in the "0x" or 322 + "0b" notation; these values are first interpreted as numbers as in C 323 + and are then interpreted as byte strings in network byte order, 324 + including any leading zero bytes expressed in the notation. 325 + 326 + Words may be _italicized_ for emphasis; in the plain text form of 327 + this specification, this is indicated by surrounding words with 328 + underscore characters. Verbatim text (e.g., names from a programming 329 + language) may be set in "monospace" type; in plain text, this is 330 + approximated somewhat ambiguously by surrounding the text in double 331 + quotes (which also retain their usual meaning). 332 + 333 + 2. CBOR Data Models 334 + 335 + CBOR is explicit about its generic data model, which defines the set 336 + of all data items that can be represented in CBOR. Its basic generic 337 + data model is extensible by the registration of "simple values" and 338 + tags. Applications can then create a subset of the resulting 339 + extended generic data model to build their specific data models. 
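In an OCaml setting, the basic generic data model of this section could be pictured as a variant type with one constructor per kind of data item. The sketch below is illustrative only: the type `item`, its constructors, and the helper `kind` are assumptions made for this example and are not taken from this package's modules. Because the integer range -2^(64)..2^(64)-1 does not fit in `int64`, the sketch stores a sign flag next to an unsigned 64-bit quantity.

```ocaml
(* Hypothetical sketch of the basic generic data model; names are
   illustrative and not taken from this library. *)
type item =
  | Int of bool * int64
      (* (negative, n): the value is n, or -1 - n when negative; n is read
         as an unsigned 64-bit quantity, covering -2^64 .. 2^64 - 1 *)
  | Simple of int (* simple value 0..255, distinct from the integer *)
  | Float of float (* IEEE 754 binary64, including non-finites *)
  | Byte_string of string
  | Text_string of string (* Unicode code points, held here as UTF-8 *)
  | Array of item list
  | Map of (item * item) list
  | Tag of int64 * item (* unsigned tag number and tag content *)

(* Name of the kind of a data item. Integers and floats stay distinct
   even when they have the same numeric value. *)
let kind = function
  | Int _ -> "integer"
  | Simple _ -> "simple value"
  | Float _ -> "floating-point value"
  | Byte_string _ -> "byte string"
  | Text_string _ -> "text string"
  | Array _ -> "array"
  | Map _ -> "map"
  | Tag _ -> "tag"

(* A map with two pairs: "Fun" -> true (simple value 21), "Amt" -> -2. *)
let example =
  Map [ (Text_string "Fun", Simple 21); (Text_string "Amt", Int (true, 1L)) ]
```

Serialization choices, such as how many bytes an argument or a float occupies, deliberately have no place in a type like this.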
340 + 341 + Within environments that can represent the data items in the generic 342 + data model, generic CBOR encoders and decoders can be implemented 343 + (which usually involves defining additional implementation data types 344 + for those data items that do not already have a natural 345 + representation in the environment). The ability to provide generic 346 + encoders and decoders is an explicit design goal of CBOR; however, 347 + many applications will provide their own application-specific 348 + encoders and/or decoders. 349 + 350 + In the basic (unextended) generic data model defined in Section 3, a 351 + data item is one of the following: 352 + 353 + * an integer in the range -2^(64)..2^(64)-1 inclusive 354 + 355 + * a simple value, identified by a number between 0 and 255, but 356 + distinct from that number itself 357 + 358 + * a floating-point value, distinct from an integer, out of the set 359 + representable by IEEE 754 binary64 (including non-finites) 360 + [IEEE754] 361 + 362 + * a sequence of zero or more bytes ("byte string") 363 + 364 + * a sequence of zero or more Unicode code points ("text string") 365 + 366 + * a sequence of zero or more data items ("array") 367 + 368 + * a mapping (mathematical function) from zero or more data items 369 + ("keys") each to a data item ("values"), ("map") 370 + 371 + * a tagged data item ("tag"), comprising a tag number (an integer in 372 + the range 0..2^(64)-1) and the tag content (a data item) 373 + 374 + Note that integer and floating-point values are distinct in this 375 + model, even if they have the same numeric value. 376 + 377 + Also note that serialization variants are not visible at the generic 378 + data model level. This deliberate absence of visibility includes the 379 + number of bytes of the encoded floating-point value. 
It also 380 + includes the choice of encoding for an "argument" (see Section 3) 381 + such as the encoding for an integer, the encoding for the length of a 382 + text or byte string, the encoding for the number of elements in an 383 + array or pairs in a map, or the encoding for a tag number. 384 + 385 + 2.1. Extended Generic Data Models 386 + 387 + This basic generic data model has been extended in this document by 388 + the registration of a number of simple values and tag numbers, such 389 + as: 390 + 391 + * "false", "true", "null", and "undefined" (simple values identified 392 + by 20..23, Section 3.3) 393 + 394 + * integer and floating-point values with a larger range and 395 + precision than the above (tag numbers 2 to 5, Section 3.4) 396 + 397 + * application data types such as a point in time or date/time string 398 + defined in RFC 3339 (tag numbers 1 and 0, Section 3.4) 399 + 400 + Additional elements of the extended generic data model can be (and 401 + have been) defined via the IANA registries created for CBOR. Even if 402 + such an extension is unknown to a generic encoder or decoder, data 403 + items using that extension can be passed to or from the application 404 + by representing them at the application interface within the basic 405 + generic data model, i.e., as generic simple values or generic tags. 406 + 407 + In other words, the basic generic data model is stable as defined in 408 + this document, while the extended generic data model expands by the 409 + registration of new simple values or tag numbers, but never shrinks. 410 + 411 + While there is a strong expectation that generic encoders and 412 + decoders can represent "false", "true", and "null" ("undefined" is 413 + intentionally omitted) in the form appropriate for their programming 414 + environment, the implementation of the data model extensions created 415 + by tags is truly optional and a matter of implementation quality. 416 + 417 + 2.2. 
Specific Data Models 418 + 419 + The specific data model for a CBOR-based protocol usually takes a 420 + subset of the extended generic data model and assigns application 421 + semantics to the data items within this subset and its components. 422 + When documenting such specific data models and specifying the types 423 + of data items, it is preferable to identify the types by their 424 + generic data model names ("negative integer", "array") instead of 425 + referring to aspects of their CBOR representation ("major type 1", 426 + "major type 4"). 427 + 428 + Specific data models can also specify value equivalency (including 429 + values of different types) for the purposes of map keys and encoder 430 + freedom. For example, in the generic data model, a valid map MAY 431 + have both "0" and "0.0" as keys, and an encoder MUST NOT encode "0.0" 432 + as an integer (major type 0, Section 3.1). However, if a specific 433 + data model declares that floating-point and integer representations 434 + of integral values are equivalent, using both map keys "0" and "0.0" 435 + in a single map would be considered duplicates, even while encoded as 436 + different major types, and so invalid; and an encoder could encode 437 + integral-valued floats as integers or vice versa, perhaps to save 438 + encoded bytes. 439 + 440 + 3. Specification of the CBOR Encoding 441 + 442 + A CBOR data item (Section 2) is encoded to or decoded from a byte 443 + string carrying a well-formed encoded data item as described in this 444 + section. The encoding is summarized in Table 7 in Appendix B, 445 + indexed by the initial byte. An encoder MUST produce only well- 446 + formed encoded data items. A decoder MUST NOT return a decoded data 447 + item when it encounters input that is not a well-formed encoded CBOR 448 + data item (this does not detract from the usefulness of diagnostic 449 + and recovery tools that might make available some information from a 450 + damaged encoded CBOR data item). 
451 + 452 + The initial byte of each encoded data item contains both information 453 + about the major type (the high-order 3 bits, described in 454 + Section 3.1) and additional information (the low-order 5 bits). With 455 + a few exceptions, the additional information's value describes how to 456 + load an unsigned integer "argument": 457 + 458 + Less than 24: The argument's value is the value of the additional 459 + information. 460 + 461 + 24, 25, 26, or 27: The argument's value is held in the following 1, 462 + 2, 4, or 8 bytes, respectively, in network byte order. For major 463 + type 7 and additional information value 25, 26, 27, these bytes 464 + are not used as an integer argument, but as a floating-point value 465 + (see Section 3.3). 466 + 467 + 28, 29, 30: These values are reserved for future additions to the 468 + CBOR format. In the present version of CBOR, the encoded item is 469 + not well-formed. 470 + 471 + 31: No argument value is derived. If the major type is 0, 1, or 6, 472 + the encoded item is not well-formed. For major types 2 to 5, the 473 + item's length is indefinite, and for major type 7, the byte does 474 + not constitute a data item at all but terminates an indefinite- 475 + length item; all are described in Section 3.2. 476 + 477 + The initial byte and any additional bytes consumed to construct the 478 + argument are collectively referred to as the _head_ of the data item. 479 + 480 + The meaning of this argument depends on the major type. For example, 481 + in major type 0, the argument is the value of the data item itself 482 + (and in major type 1, the value of the data item is computed from the 483 + argument); in major type 2 and 3, it gives the length of the string 484 + data in bytes that follow; and in major types 4 and 5, it is used to 485 + determine the number of data items enclosed. 486 + 487 + If the encoded sequence of bytes ends before the end of a data item, 488 + that item is not well-formed. 
If the encoded sequence of bytes still 489 + has bytes remaining after the outermost encoded item is decoded, that 490 + encoding is not a single well-formed CBOR item. Depending on the 491 + application, the decoder may either treat the encoding as not well- 492 + formed or just identify the start of the remaining bytes to the 493 + application. 494 + 495 + A CBOR decoder implementation can be based on a jump table with all 496 + 256 defined values for the initial byte (Table 7). A decoder in a 497 + constrained implementation can instead use the structure of the 498 + initial byte and following bytes for more compact code (see 499 + Appendix C for a rough impression of how this could look). 500 + 501 + 3.1. Major Types 502 + 503 + The following lists the major types and the additional information 504 + and other bytes associated with the type. 505 + 506 + Major type 0: 507 + An unsigned integer in the range 0..2^(64)-1 inclusive. The value 508 + of the encoded item is the argument itself. For example, the 509 + integer 10 is denoted as the one byte 0b000_01010 (major type 0, 510 + additional information 10). The integer 500 would be 0b000_11001 511 + (major type 0, additional information 25) followed by the two 512 + bytes 0x01f4, which is 500 in decimal. 513 + 514 + Major type 1: 515 + A negative integer in the range -2^(64)..-1 inclusive. The value 516 + of the item is -1 minus the argument. For example, the integer 517 + -500 would be 0b001_11001 (major type 1, additional information 518 + 25) followed by the two bytes 0x01f3, which is 499 in decimal. 519 + 520 + Major type 2: 521 + A byte string. The number of bytes in the string is equal to the 522 + argument. For example, a byte string whose length is 5 would have 523 + an initial byte of 0b010_00101 (major type 2, additional 524 + information 5 for the length), followed by 5 bytes of binary 525 + content. 
A byte string whose length is 500 would have 3 initial 526 + bytes of 0b010_11001 (major type 2, additional information 25 to 527 + indicate a two-byte length) followed by the two bytes 0x01f4 for a 528 + length of 500, followed by 500 bytes of binary content. 529 + 530 + Major type 3: 531 + A text string (Section 2) encoded as UTF-8 [RFC3629]. The number 532 + of bytes in the string is equal to the argument. A string 533 + containing an invalid UTF-8 sequence is well-formed but invalid 534 + (Section 1.2). This type is provided for systems that need to 535 + interpret or display human-readable text, and allows the 536 + differentiation between unstructured bytes and text that has a 537 + specified repertoire (that of Unicode) and encoding (UTF-8). In 538 + contrast to formats such as JSON, the Unicode characters in this 539 + type are never escaped. Thus, a newline character (U+000A) is 540 + always represented in a string as the byte 0x0a, and never as the 541 + bytes 0x5c6e (the characters "\" and "n") nor as 0x5c7530303061 542 + (the characters "\", "u", "0", "0", "0", and "a"). 543 + 544 + Major type 4: 545 + An array of data items. In other formats, arrays are also called 546 + lists, sequences, or tuples (a "CBOR sequence" is something 547 + slightly different, though [RFC8742]). The argument is the number 548 + of data items in the array. Items in an array do not need to all 549 + be of the same type. For example, an array that contains 10 items 550 + of any type would have an initial byte of 0b100_01010 (major type 551 + 4, additional information 10 for the length) followed by the 10 552 + remaining items. 553 + 554 + Major type 5: 555 + A map of pairs of data items. Maps are also called tables, 556 + dictionaries, hashes, or objects (in JSON). A map is comprised of 557 + pairs of data items, each pair consisting of a key that is 558 + immediately followed by a value. The argument is the number of 559 + _pairs_ of data items in the map. 
For example, a map that 560 + contains 9 pairs would have an initial byte of 0b101_01001 (major 561 + type 5, additional information 9 for the number of pairs) followed 562 + by the 18 remaining items. The first item is the first key, the 563 + second item is the first value, the third item is the second key, 564 + and so on. Because items in a map come in pairs, their total 565 + number is always even: a map that contains an odd number of items 566 + (no value data present after the last key data item) is not well- 567 + formed. A map that has duplicate keys may be well-formed, but it 568 + is not valid, and thus it causes indeterminate decoding; see also 569 + Section 5.6. 570 + 571 + Major type 6: 572 + A tagged data item ("tag") whose tag number, an integer in the 573 + range 0..2^(64)-1 inclusive, is the argument and whose enclosed 574 + data item (_tag content_) is the single encoded data item that 575 + follows the head. See Section 3.4. 576 + 577 + Major type 7: 578 + Floating-point numbers and simple values, as well as the "break" 579 + stop code. See Section 3.3. 580 + 581 + These eight major types lead to a simple table showing which of the 582 + 256 possible values for the initial byte of a data item are used 583 + (Table 7). 584 + 585 + In major types 6 and 7, many of the possible values are reserved for 586 + future specification. See Section 9 for more information on these 587 + values. 588 + 589 + Table 1 summarizes the major types defined by CBOR, ignoring 590 + Section 3.2 for now. The number N in this table stands for the 591 + argument. 
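The worked examples in the major type descriptions above (10, 500, -500, a byte string of length 500, an array of 10 items, a map of 9 pairs) all follow from one rule for building the head. A minimal OCaml sketch follows, under stated assumptions: `encode_head`, `encode_int`, and `hex` are hypothetical helpers written for this illustration (not this library's API), a 64-bit OCaml `int` is assumed, and arguments larger than `max_int` are out of scope.

```ocaml
(* Hypothetical helper: build the head (initial byte plus argument bytes)
   for a major type 0..7 and a non-negative argument, in the shortest form.
   Assumes a 64-bit OCaml int. *)
let encode_head ~major arg =
  if major < 0 || major > 7 || arg < 0 then invalid_arg "encode_head";
  let buf = Buffer.create 9 in
  (* High-order 3 bits: major type; low-order 5 bits: additional info. *)
  let initial ai = Buffer.add_uint8 buf ((major lsl 5) lor ai) in
  if arg < 24 then initial arg
  else if arg < 0x100 then (initial 24; Buffer.add_uint8 buf arg)
  else if arg < 0x10000 then (initial 25; Buffer.add_uint16_be buf arg)
  else if arg < 0x1_0000_0000 then
    (initial 26; Buffer.add_int32_be buf (Int32.of_int arg))
  else (initial 27; Buffer.add_int64_be buf (Int64.of_int arg));
  Buffer.contents buf

(* Integers: major type 0 carries n itself, major type 1 carries -1 - n. *)
let encode_int n =
  if n >= 0 then encode_head ~major:0 n else encode_head ~major:1 (-1 - n)

(* Lowercase hex rendering, for comparing against the examples in the text. *)
let hex s =
  String.to_seq s
  |> Seq.map (fun c -> Printf.sprintf "%02x" (Char.code c))
  |> List.of_seq |> String.concat ""

let () =
  assert (hex (encode_int 10) = "0a");
  assert (hex (encode_int 500) = "1901f4");
  assert (hex (encode_int (-500)) = "3901f3");
  assert (hex (encode_head ~major:2 500) = "5901f4")
```

The same function yields the initial byte for containers: `encode_head ~major:4 10` gives 0x8a and `encode_head ~major:5 9` gives 0xa9, matching 0b100_01010 and 0b101_01001 above.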
592 + 593 + +============+=======================+=========================+ 594 + | Major Type | Meaning | Content | 595 + +============+=======================+=========================+ 596 + | 0 | unsigned integer N | - | 597 + +------------+-----------------------+-------------------------+ 598 + | 1 | negative integer -1-N | - | 599 + +------------+-----------------------+-------------------------+ 600 + | 2 | byte string | N bytes | 601 + +------------+-----------------------+-------------------------+ 602 + | 3 | text string | N bytes (UTF-8 text) | 603 + +------------+-----------------------+-------------------------+ 604 + | 4 | array | N data items (elements) | 605 + +------------+-----------------------+-------------------------+ 606 + | 5 | map | 2N data items (key/ | 607 + | | | value pairs) | 608 + +------------+-----------------------+-------------------------+ 609 + | 6 | tag of number N | 1 data item | 610 + +------------+-----------------------+-------------------------+ 611 + | 7 | simple/float | - | 612 + +------------+-----------------------+-------------------------+ 613 + 614 + Table 1: Overview over the Definite-Length Use of CBOR Major 615 + Types (N = Argument) 616 + 617 + 3.2. Indefinite Lengths for Some Major Types 618 + 619 + Four CBOR items (arrays, maps, byte strings, and text strings) can be 620 + encoded with an indefinite length using additional information value 621 + 31. This is useful if the encoding of the item needs to begin before 622 + the number of items inside the array or map, or the total length of 623 + the string, is known. (The ability to start sending a data item 624 + before all of it is known is often referred to as "streaming" within 625 + that data item.) 626 + 627 + Indefinite-length arrays and maps are dealt with differently than 628 + indefinite-length strings (byte strings and text strings). 629 + 630 + 3.2.1. 
The "break" Stop Code 631 + 632 + The "break" stop code is encoded with major type 7 and additional 633 + information value 31 (0b111_11111). It is not itself a data item: it 634 + is just a syntactic feature to close an indefinite-length item. 635 + 636 + If the "break" stop code appears where a data item is expected, other 637 + than directly inside an indefinite-length string, array, or map -- 638 + for example, directly inside a definite-length array or map -- the 639 + enclosing item is not well-formed. 640 + 641 + 3.2.2. Indefinite-Length Arrays and Maps 642 + 643 + Indefinite-length arrays and maps are represented using their major 644 + type with the additional information value of 31, followed by an 645 + arbitrary-length sequence of zero or more items for an array or key/ 646 + value pairs for a map, followed by the "break" stop code 647 + (Section 3.2.1). In other words, indefinite-length arrays and maps 648 + look identical to other arrays and maps except for beginning with the 649 + additional information value of 31 and ending with the "break" stop 650 + code. 651 + 652 + If the "break" stop code appears after a key in a map, in place of 653 + that key's value, the map is not well-formed. 654 + 655 + There is no restriction against nesting indefinite-length array or 656 + map items. A "break" only terminates a single item, so nested 657 + indefinite-length items need exactly as many "break" stop codes as 658 + there are type bytes starting an indefinite-length item. 659 + 660 + For example, assume an encoder wants to represent the abstract array 661 + [1, [2, 3], [4, 5]]. 
The definite-length encoding would be 662 + 0x8301820203820405: 663 + 664 + 83 -- Array of length 3 665 + 01 -- 1 666 + 82 -- Array of length 2 667 + 02 -- 2 668 + 03 -- 3 669 + 82 -- Array of length 2 670 + 04 -- 4 671 + 05 -- 5 672 + 673 + Indefinite-length encoding could be applied independently to each of 674 + the three arrays encoded in this data item, as required, leading to 675 + representations such as: 676 + 677 + 0x9f018202039f0405ffff 678 + 9F -- Start indefinite-length array 679 + 01 -- 1 680 + 82 -- Array of length 2 681 + 02 -- 2 682 + 03 -- 3 683 + 9F -- Start indefinite-length array 684 + 04 -- 4 685 + 05 -- 5 686 + FF -- "break" (inner array) 687 + FF -- "break" (outer array) 688 + 689 + 0x9f01820203820405ff 690 + 9F -- Start indefinite-length array 691 + 01 -- 1 692 + 82 -- Array of length 2 693 + 02 -- 2 694 + 03 -- 3 695 + 82 -- Array of length 2 696 + 04 -- 4 697 + 05 -- 5 698 + FF -- "break" 699 + 700 + 0x83018202039f0405ff 701 + 83 -- Array of length 3 702 + 01 -- 1 703 + 82 -- Array of length 2 704 + 02 -- 2 705 + 03 -- 3 706 + 9F -- Start indefinite-length array 707 + 04 -- 4 708 + 05 -- 5 709 + FF -- "break" 710 + 711 + 0x83019f0203ff820405 712 + 83 -- Array of length 3 713 + 01 -- 1 714 + 9F -- Start indefinite-length array 715 + 02 -- 2 716 + 03 -- 3 717 + FF -- "break" 718 + 82 -- Array of length 2 719 + 04 -- 4 720 + 05 -- 5 721 + 722 + An example of an indefinite-length map (that happens to have two key/ 723 + value pairs) might be: 724 + 725 + 0xbf6346756ef563416d7421ff 726 + BF -- Start indefinite-length map 727 + 63 -- First key, UTF-8 string length 3 728 + 46756e -- "Fun" 729 + F5 -- First value, true 730 + 63 -- Second key, UTF-8 string length 3 731 + 416d74 -- "Amt" 732 + 21 -- Second value, -2 733 + FF -- "break" 734 + 735 + 3.2.3. 
Indefinite-Length Byte Strings and Text Strings 736 + 737 + Indefinite-length strings are represented by a byte containing the 738 + major type for byte string or text string with an additional 739 + information value of 31, followed by a series of zero or more strings 740 + of the specified type ("chunks") that have definite lengths, and 741 + finished by the "break" stop code (Section 3.2.1). The data item 742 + represented by the indefinite-length string is the concatenation of 743 + the chunks. If no chunks are present, the data item is an empty 744 + string of the specified type. Zero-length chunks, while not 745 + particularly useful, are permitted. 746 + 747 + If any item between the indefinite-length string indicator 748 + (0b010_11111 or 0b011_11111) and the "break" stop code is not a 749 + definite-length string item of the same major type, the string is not 750 + well-formed. 751 + 752 + The design does not allow nesting indefinite-length strings as chunks 753 + into indefinite-length strings. If it were allowed, it would require 754 + decoder implementations to keep a stack, or at least a count, of 755 + nesting levels. It is unnecessary on the encoder side because the 756 + inner indefinite-length string would consist of chunks, and these 757 + could instead be put directly into the outer indefinite-length 758 + string. 759 + 760 + If any definite-length text string inside an indefinite-length text 761 + string is invalid, the indefinite-length text string is invalid. 762 + Note that this implies that the UTF-8 bytes of a single Unicode code 763 + point (scalar value) cannot be spread between chunks: a new chunk of 764 + a text string can only be started at a code point boundary. 
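The chunk rules above can be sketched as a small decoder for indefinite-length byte strings. This is a minimal illustration under stated assumptions: `decode_indefinite_bytes` is a hypothetical function, not this library's decoder; it only handles chunks whose length fits in the initial byte (additional information below 24); and it chooses to reject trailing bytes after the "break" stop code rather than hand them back to the caller.

```ocaml
(* Hypothetical sketch: decode an indefinite-length byte string (0x5f,
   chunks, 0xff) whose chunks have lengths below 24, so that every chunk
   head is a single byte. Returns the concatenated content or an error. *)
let decode_indefinite_bytes (s : string) : (string, string) result =
  let n = String.length s in
  let out = Buffer.create n in
  let rec chunks pos =
    if pos >= n then Error "input ended before the break stop code"
    else
      let b = Char.code s.[pos] in
      if b = 0xff then
        (if pos = n - 1 then Ok (Buffer.contents out)
         else Error "trailing bytes after the break stop code")
      else if b lsr 5 <> 2 then Error "chunk is not a byte string"
      else
        let len = b land 0x1f in
        if len = 31 then Error "nested indefinite-length chunk"
        else if len >= 24 then Error "chunk length form not handled here"
        else if pos + 1 + len > n then Error "chunk runs past end of input"
        else begin
          (* Append the chunk content and continue after it. *)
          Buffer.add_substring out s (pos + 1) len;
          chunks (pos + 1 + len)
        end
  in
  if n = 0 || Char.code s.[0] <> 0x5f then
    Error "not an indefinite-length byte string"
  else chunks 1

let () =
  (* 5f 42 0102 41 03 ff decodes to the three bytes 01 02 03. *)
  assert (decode_indefinite_bytes "\x5f\x42\x01\x02\x41\x03\xff"
          = Ok "\x01\x02\x03");
  (* No chunks at all: the empty byte string. *)
  assert (decode_indefinite_bytes "\x5f\xff" = Ok "")
```

Because each chunk is skipped by its declared length, a 0xff byte inside chunk content is never mistaken for the stop code. A text-string variant would additionally have to check that each chunk is valid UTF-8 on its own, which is what prevents a code point from being split across chunks.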
765 + 766 + For example, assume an encoded data item consisting of the bytes: 767 + 768 + 0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111 769 + 5F -- Start indefinite-length byte string 770 + 44 -- Byte string of length 4 771 + aabbccdd -- Bytes content 772 + 43 -- Byte string of length 3 773 + eeff99 -- Bytes content 774 + FF -- "break" 775 + 776 + After decoding, this results in a single byte string with seven 777 + bytes: 0xaabbccddeeff99. 778 + 779 + 3.2.4. Summary of Indefinite-Length Use of Major Types 780 + 781 + Table 2 summarizes the major types defined by CBOR as used for 782 + indefinite-length encoding (with additional information set to 31). 783 + 784 + +============+===================+==================================+ 785 + | Major Type | Meaning | Enclosed up to "break" Stop Code | 786 + +============+===================+==================================+ 787 + | 0 | (not well- | - | 788 + | | formed) | | 789 + +------------+-------------------+----------------------------------+ 790 + | 1 | (not well- | - | 791 + | | formed) | | 792 + +------------+-------------------+----------------------------------+ 793 + | 2 | byte string | definite-length byte strings | 794 + +------------+-------------------+----------------------------------+ 795 + | 3 | text string | definite-length text strings | 796 + +------------+-------------------+----------------------------------+ 797 + | 4 | array | data items (elements) | 798 + +------------+-------------------+----------------------------------+ 799 + | 5 | map | data items (key/value pairs) | 800 + +------------+-------------------+----------------------------------+ 801 + | 6 | (not well- | - | 802 + | | formed) | | 803 + +------------+-------------------+----------------------------------+ 804 + | 7 | "break" stop | - | 805 + | | code | | 806 + +------------+-------------------+----------------------------------+ 807 + 808 + Table 2: Overview of the Indefinite-Length Use of CBOR Major 
809 + Types (Additional Information = 31) 810 + 811 + 3.3. Floating-Point Numbers and Values with No Content 812 + 813 + Major type 7 is for two types of data: floating-point numbers and 814 + "simple values" that do not need any content. Each value of the 815 + 5-bit additional information in the initial byte has its own separate 816 + meaning, as defined in Table 3. Like the major types for integers, 817 + items of this major type do not carry content data; all the 818 + information is in the initial bytes (the head). 819 + 820 + +=============+===================================================+ 821 + | 5-Bit Value | Semantics | 822 + +=============+===================================================+ 823 + | 0..23 | Simple value (value 0..23) | 824 + +-------------+---------------------------------------------------+ 825 + | 24 | Simple value (value 32..255 in following byte) | 826 + +-------------+---------------------------------------------------+ 827 + | 25 | IEEE 754 Half-Precision Float (16 bits follow) | 828 + +-------------+---------------------------------------------------+ 829 + | 26 | IEEE 754 Single-Precision Float (32 bits follow) | 830 + +-------------+---------------------------------------------------+ 831 + | 27 | IEEE 754 Double-Precision Float (64 bits follow) | 832 + +-------------+---------------------------------------------------+ 833 + | 28-30 | Reserved, not well-formed in the present document | 834 + +-------------+---------------------------------------------------+ 835 + | 31 | "break" stop code for indefinite-length items | 836 + | | (Section 3.2.1) | 837 + +-------------+---------------------------------------------------+ 838 + 839 + Table 3: Values for Additional Information in Major Type 7 840 + 841 + As with all other major types, the 5-bit value 24 signifies a single- 842 + byte extension: it is followed by an additional byte to represent the 843 + simple value. 
(To minimize confusion, only the values 32 to 255 are
844 +    used.) This maintains the structure of the initial bytes: as for the
845 +    other major types, the length of these always depends on the
846 +    additional information in the first byte. Table 4 lists the numeric
847 +    values assigned and available for simple values.
848 +
849 +    +=========+==============+
850 +    | Value   | Semantics    |
851 +    +=========+==============+
852 +    | 0..19   | (unassigned) |
853 +    +---------+--------------+
854 +    | 20      | false        |
855 +    +---------+--------------+
856 +    | 21      | true         |
857 +    +---------+--------------+
858 +    | 22      | null         |
859 +    +---------+--------------+
860 +    | 23      | undefined    |
861 +    +---------+--------------+
862 +    | 24..31  | (reserved)   |
863 +    +---------+--------------+
864 +    | 32..255 | (unassigned) |
865 +    +---------+--------------+
866 +
867 +                       Table 4: Simple Values
868 +
869 +    An encoder MUST NOT issue two-byte sequences that start with 0xf8
870 +    (major type 7, additional information 24) and continue with a byte
871 +    less than 0x20 (32 decimal). Such sequences are not well-formed.
872 +    (This implies that an encoder cannot encode "false", "true", "null",
873 +    or "undefined" in two-byte sequences and that only the one-byte
874 +    variants of these are well-formed; more generally speaking, each
875 +    simple value only has a single representation variant).
876 +
877 +    The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
878 +    IEEE 754 binary floating-point values [IEEE754]. These floating-
879 +    point values are encoded in the additional bytes of the appropriate
880 +    size. (See Appendix D for some information about 16-bit floating-
881 +    point numbers.)
882 +
883 + 3.4.  Tagging of Items
884 +
885 +    In CBOR, a data item can be enclosed by a tag to give it some
886 +    additional semantics, as uniquely identified by a _tag number_.
The 887 + tag is major type 6, its argument (Section 3) indicates the tag 888 + number, and it contains a single enclosed data item, the _tag 889 + content_. (If a tag requires further structure to its content, this 890 + structure is provided by the enclosed data item.) We use the term 891 + _tag_ for the entire data item consisting of both a tag number and 892 + the tag content: the tag content is the data item that is being 893 + tagged. 894 + 895 + For example, assume that a byte string of length 12 is marked with a 896 + tag of number 2 to indicate it is an unsigned _bignum_ 897 + (Section 3.4.3). The encoded data item would start with a byte 898 + 0b110_00010 (major type 6, additional information 2 for the tag 899 + number) followed by the encoded tag content: 0b010_01100 (major type 900 + 2, additional information 12 for the length) followed by the 12 bytes 901 + of the bignum. 902 + 903 + In the extended generic data model, a tag number's definition 904 + describes the additional semantics conveyed with the tag number. 905 + These semantics may include equivalence of some tagged data items 906 + with other data items, including some that can be represented in the 907 + basic generic data model. For instance, 0xc24101, a bignum the tag 908 + content of which is the byte string with the single byte 0x01, is 909 + equivalent to an integer 1, which could also be encoded as 0x01, 910 + 0x1801, or 0x190001. The tag definition may specify a preferred 911 + serialization (Section 4.1) that is recommended for generic encoders; 912 + this may prefer basic generic data model representations over ones 913 + that employ a tag. 914 + 915 + The tag definition usually defines which nested data items are valid 916 + for such tags. Tag definitions may restrict their content to a very 917 + specific syntactic structure, as the tags defined in this document 918 + do, or they may define their content more semantically. 
An example 919 + for the latter is how tags 40 and 1040 accept multiple ways to 920 + represent arrays [RFC8746]. 921 + 922 + As a matter of convention, many tags do not accept "null" or 923 + "undefined" values as tag content; instead, the expectation is that a 924 + "null" or "undefined" value can be used in place of the entire tag; 925 + Section 3.4.2 provides some further considerations for one specific 926 + tag about the handling of this convention in application protocols 927 + and in mapping to platform types. 928 + 929 + Decoders do not need to understand tags of every tag number, and tags 930 + may be of little value in applications where the implementation 931 + creating a particular CBOR data item and the implementation decoding 932 + that stream know the semantic meaning of each item in the data flow. 933 + The primary purpose of tags in this specification is to define common 934 + data types such as dates. A secondary purpose is to provide 935 + conversion hints when it is foreseen that the CBOR data item needs to 936 + be translated into a different format, requiring hints about the 937 + content of items. Understanding the semantics of tags is optional 938 + for a decoder; it can simply present both the tag number and the tag 939 + content to the application, without interpreting the additional 940 + semantics of the tag. 941 + 942 + A tag applies semantics to the data item it encloses. Tags can nest: 943 + if tag A encloses tag B, which encloses data item C, tag A applies to 944 + the result of applying tag B on data item C. 945 + 946 + IANA maintains a registry of tag numbers as described in Section 9.2. 947 + Table 5 provides a list of tag numbers that were defined in [RFC7049] 948 + with definitions in the rest of this section. (Tag number 35 was 949 + also defined in [RFC7049]; a discussion of this tag number follows in 950 + Section 3.4.5.3.) 
Note that many other tag numbers have been defined
951 +    since the publication of [RFC7049]; see the registry described at
952 +    Section 9.2 for the complete list.
953 +
954 +    +=======+=============+==================================+
955 +    | Tag   | Data Item   | Semantics                        |
956 +    +=======+=============+==================================+
957 +    | 0     | text string | Standard date/time string; see   |
958 +    |       |             | Section 3.4.1                    |
959 +    +-------+-------------+----------------------------------+
960 +    | 1     | integer or  | Epoch-based date/time; see       |
961 +    |       | float       | Section 3.4.2                    |
962 +    +-------+-------------+----------------------------------+
963 +    | 2     | byte string | Unsigned bignum; see             |
964 +    |       |             | Section 3.4.3                    |
965 +    +-------+-------------+----------------------------------+
966 +    | 3     | byte string | Negative bignum; see             |
967 +    |       |             | Section 3.4.3                    |
968 +    +-------+-------------+----------------------------------+
969 +    | 4     | array       | Decimal fraction; see            |
970 +    |       |             | Section 3.4.4                    |
971 +    +-------+-------------+----------------------------------+
972 +    | 5     | array       | Bigfloat; see Section 3.4.4      |
973 +    +-------+-------------+----------------------------------+
974 +    | 21    | (any)       | Expected conversion to base64url |
975 +    |       |             | encoding; see Section 3.4.5.2    |
976 +    +-------+-------------+----------------------------------+
977 +    | 22    | (any)       | Expected conversion to base64    |
978 +    |       |             | encoding; see Section 3.4.5.2    |
979 +    +-------+-------------+----------------------------------+
980 +    | 23    | (any)       | Expected conversion to base16    |
981 +    |       |             | encoding; see Section 3.4.5.2    |
982 +    +-------+-------------+----------------------------------+
983 +    | 24    | byte string | Encoded CBOR data item; see      |
984 +    |       |             | Section 3.4.5.1                  |
985 +    +-------+-------------+----------------------------------+
986 +    | 32    | text string | URI; see Section 3.4.5.3         |
987 +    +-------+-------------+----------------------------------+
988 +    | 33    | text string | base64url; see Section 3.4.5.3   |
989 +    +-------+-------------+----------------------------------+
990 +    | 34    | text string | base64; see Section 3.4.5.3      |
991 +    +-------+-------------+----------------------------------+
992 +    | 36    | text string | MIME message; see                |
993 +    |       |             | Section 3.4.5.3                  |
994 +    +-------+-------------+----------------------------------+
995 +    | 55799 | (any)       | Self-described CBOR; see         |
996 +    |       |             | Section 3.4.6                    |
997 +    +-------+-------------+----------------------------------+
998 +
999 +              Table 5: Tag Numbers Defined in RFC 7049
1000 +
1001 +    Conceptually, tags are interpreted in the generic data model, not at
1002 +    (de-)serialization time. A small number of tags (at this time, tag
1003 +    number 25 and tag number 29 [IANA.cbor-tags]) have been registered
1004 +    with semantics that may require processing at (de-)serialization
1005 +    time: the decoder needs to be aware of, and the encoder needs to be
1006 +    in control of, the exact sequence in which data items are encoded
1007 +    into the CBOR data item. This means these tags cannot be implemented
1008 +    on top of an arbitrary generic CBOR encoder/decoder (which might not
1009 +    reflect the serialization order for entries in a map at the data
1010 +    model level and vice versa); their implementation therefore typically
1011 +    needs to be integrated into the generic encoder/decoder. The
1012 +    definition of new tags with this property is NOT RECOMMENDED.
1013 +
1014 +    IANA allocated tag numbers 65535, 4294967295, and
1015 +    18446744073709551615 (binary all-ones in 16-bit, 32-bit, and 64-bit).
1016 +    These can be used as a convenience for implementers who want a
1017 +    single-integer data structure to indicate either the presence of a
1018 +    specific tag or absence of a tag. That allocation is described in
1019 +    Section 10 of [CBOR-TAGS]. These tags are not intended to occur in
1020 +    actual CBOR data items; implementations MAY flag such an occurrence
1021 +    as an error.
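Editorial sketch (not part of the RFC text): the tag layout described in this section, a major type 6 head carrying the tag number followed by one enclosed data item, can be illustrated in a few lines of Python. The helper names (encode_head, encode_tag, read_tag_number) and the strict flag are hypothetical and chosen only for this illustration; they do not come from any CBOR library. The strict mode shows one possible way to use the permission to flag the reserved all-ones tag numbers as an error.

```python
# Hypothetical helper names; a sketch, not a complete CBOR codec.
RESERVED_TAG_NUMBERS = {0xFFFF, 0xFFFFFFFF, 0xFFFFFFFFFFFFFFFF}


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a head (major type plus argument) with the shortest argument form."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be in 0..7")
    if not 0 <= argument < 1 << 64:
        raise ValueError("argument must fit in 64 bits")
    initial = major_type << 5
    if argument < 24:
        return bytes([initial | argument])
    # Additional information 24..27 selects a 1-, 2-, 4-, or 8-byte argument.
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([initial | additional_info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")


def encode_tag(tag_number: int, encoded_content: bytes) -> bytes:
    """Prefix an already-encoded data item with a tag head (major type 6)."""
    return encode_head(6, tag_number) + encoded_content


def read_tag_number(data: bytes, strict: bool = False):
    """Return (tag_number, offset_of_tag_content) for a tag at the start of data."""
    if not data or data[0] >> 5 != 6:
        raise ValueError("data does not start with a tag")
    info = data[0] & 0x1F
    if info < 24:
        number, offset = info, 1
    elif info <= 27:
        size = 1 << (info - 24)
        if len(data) < 1 + size:
            raise ValueError("truncated tag head")
        number, offset = int.from_bytes(data[1:1 + size], "big"), 1 + size
    else:
        raise ValueError("additional information 28..31 is not well-formed for tags")
    if strict and number in RESERVED_TAG_NUMBERS:
        raise ValueError("reserved tag number")
    return number, offset


# Tag 2 around the one-byte byte string 0x01 yields the bytes c2 41 01.
assert encode_tag(2, bytes.fromhex("4101")) == bytes.fromhex("c24101")
```

A decoder that does not interpret tags can stop after read_tag_number and hand both the tag number and the undecoded tag content to the application.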
1022 + 1023 + Protocols can extend the generic data model (Section 2) with data 1024 + items representing points in time by using tag numbers 0 and 1, with 1025 + arbitrarily sized integers by using tag numbers 2 and 3, and with 1026 + floating-point values of arbitrary size and precision by using tag 1027 + numbers 4 and 5. 1028 + 1029 + 3.4.1. Standard Date/Time String 1030 + 1031 + Tag number 0 contains a text string in the standard format described 1032 + by the "date-time" production in [RFC3339], as refined by Section 3.3 1033 + of [RFC4287], representing the point in time described there. A 1034 + nested item of another type or a text string that doesn't match the 1035 + format described in [RFC4287] is invalid. 1036 + 1037 + 3.4.2. Epoch-Based Date/Time 1038 + 1039 + Tag number 1 contains a numerical value counting the number of 1040 + seconds from 1970-01-01T00:00Z in UTC time to the represented point 1041 + in civil time. 1042 + 1043 + The tag content MUST be an unsigned or negative integer (major types 1044 + 0 and 1) or a floating-point number (major type 7 with additional 1045 + information 25, 26, or 27). Other contained types are invalid. 1046 + 1047 + Nonnegative values (major type 0 and nonnegative floating-point 1048 + numbers) stand for time values on or after 1970-01-01T00:00Z UTC and 1049 + are interpreted according to POSIX [TIME_T]. (POSIX time is also 1050 + known as "UNIX Epoch time".) Leap seconds are handled specially by 1051 + POSIX time, and this results in a 1-second discontinuity several 1052 + times per decade. Note that applications that require the expression 1053 + of times beyond early 2106 cannot leave out support of 64-bit 1054 + integers for the tag content. 
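Editorial sketch (not part of the RFC text): the remark about early 2106 follows from the size of a 32-bit argument. The largest value that fits in four bytes is 2^32 - 1 seconds, and any later time needs the 8-byte argument form. The Python below illustrates this for integer tag content only; the function names are hypothetical, and the conversion to a calendar date ignores leap seconds in the same way POSIX time does.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_int(value: int) -> bytes:
    """Encode an integer as major type 0 or 1 with the shortest argument."""
    major_type, argument = (0, value) if value >= 0 else (1, -1 - value)
    if argument >= 1 << 64:
        raise OverflowError("value does not fit in major type 0 or 1")
    initial = major_type << 5
    if argument < 24:
        return bytes([initial | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([initial | additional_info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")


def encode_epoch_time(seconds: int) -> bytes:
    """Tag number 1 (head byte 0xc1) around an integer count of seconds."""
    return b"\xc1" + encode_int(seconds)


def epoch_to_utc(seconds: int) -> datetime:
    """Map a count of seconds to a UTC datetime, without leap seconds."""
    return EPOCH + timedelta(seconds=seconds)


# The last instant that still fits a 32-bit argument:
last_32bit = 2**32 - 1
print(epoch_to_utc(last_32bit))                 # 2106-02-07 06:28:15+00:00
print(encode_epoch_time(last_32bit).hex())      # c11affffffff
# One second later the content needs the 64-bit argument form:
print(encode_epoch_time(last_32bit + 1).hex())  # c11b0000000100000000
```

A decoder that leaves out 64-bit integer support would reject the last encoding, which is why such a decoder cannot represent times beyond early 2106.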
1055 + 1056 + Negative values (major type 1 and negative floating-point numbers) 1057 + are interpreted as determined by the application requirements as 1058 + there is no universal standard for UTC count-of-seconds time before 1059 + 1970-01-01T00:00Z (this is particularly true for points in time that 1060 + precede discontinuities in national calendars). The same applies to 1061 + non-finite values. 1062 + 1063 + To indicate fractional seconds, floating-point values can be used 1064 + within tag number 1 instead of integer values. Note that this 1065 + generally requires binary64 support, as binary16 and binary32 provide 1066 + nonzero fractions of seconds only for a short period of time around 1067 + early 1970. An application that requires tag number 1 support may 1068 + restrict the tag content to be an integer (or a floating-point value) 1069 + only. 1070 + 1071 + Note that platform types for date/time may include "null" or 1072 + "undefined" values, which may also be desirable at an application 1073 + protocol level. While emitting tag number 1 values with non-finite 1074 + tag content values (e.g., with NaN for undefined date/time values or 1075 + with Infinity for an expiry date that is not set) may seem an obvious 1076 + way to handle this, using untagged "null" or "undefined" avoids the 1077 + use of non-finites and results in a shorter encoding. Application 1078 + protocol designers are encouraged to consider these cases and include 1079 + clear guidelines for handling them. 1080 + 1081 + 3.4.3. Bignums 1082 + 1083 + Protocols using tag numbers 2 and 3 extend the generic data model 1084 + (Section 2) with "bignums" representing arbitrarily sized integers. 
1085 + In the basic generic data model, bignum values are not equal to 1086 + integers from the same model, but the extended generic data model 1087 + created by this tag definition defines equivalence based on numeric 1088 + value, and preferred serialization (Section 4.1) never makes use of 1089 + bignums that also can be expressed as basic integers (see below). 1090 + 1091 + Bignums are encoded as a byte string data item, which is interpreted 1092 + as an unsigned integer n in network byte order. Contained items of 1093 + other types are invalid. For tag number 2, the value of the bignum 1094 + is n. For tag number 3, the value of the bignum is -1 - n. The 1095 + preferred serialization of the byte string is to leave out any 1096 + leading zeroes (note that this means the preferred serialization for 1097 + n = 0 is the empty byte string, but see below). Decoders that 1098 + understand these tags MUST be able to decode bignums that do have 1099 + leading zeroes. The preferred serialization of an integer that can 1100 + be represented using major type 0 or 1 is to encode it this way 1101 + instead of as a bignum (which means that the empty string never 1102 + occurs in a bignum when using preferred serialization). Note that 1103 + this means the non-preferred choice of a bignum representation 1104 + instead of a basic integer for encoding a number is not intended to 1105 + have application semantics (just as the choice of a longer basic 1106 + integer representation than needed, such as 0x1800 for 0x00, does 1107 + not). 1108 + 1109 + For example, the number 18446744073709551616 (2^(64)) is represented 1110 + as 0b110_00010 (major type 6, tag number 2), followed by 0b010_01001 1111 + (major type 2, length 9), followed by 0x010000000000000000 (one byte 1112 + 0x01 and eight bytes 0x00). In hexadecimal: 1113 + 1114 + C2 -- Tag 2 1115 + 49 -- Byte string of length 9 1116 + 010000000000000000 -- Bytes content 1117 + 1118 + 3.4.4. 
Decimal Fractions and Bigfloats 1119 + 1120 + Protocols using tag number 4 extend the generic data model with data 1121 + items representing arbitrary-length decimal fractions of the form 1122 + m*(10^(e)). Protocols using tag number 5 extend the generic data 1123 + model with data items representing arbitrary-length binary fractions 1124 + of the form m*(2^(e)). As with bignums, values of different types 1125 + are not equal in the generic data model. 1126 + 1127 + Decimal fractions combine an integer mantissa with a base-10 scaling 1128 + factor. They are most useful if an application needs the exact 1129 + representation of a decimal fraction such as 1.1 because there is no 1130 + exact representation for many decimal fractions in binary floating- 1131 + point representations. 1132 + 1133 + "Bigfloats" combine an integer mantissa with a base-2 scaling factor. 1134 + They are binary floating-point values that can exceed the range or 1135 + the precision of the three IEEE 754 formats supported by CBOR 1136 + (Section 3.3). Bigfloats may also be used by constrained 1137 + applications that need some basic binary floating-point capability 1138 + without the need for supporting IEEE 754. 1139 + 1140 + A decimal fraction or a bigfloat is represented as a tagged array 1141 + that contains exactly two integer numbers: an exponent e and a 1142 + mantissa m. Decimal fractions (tag number 4) use base-10 exponents; 1143 + the value of a decimal fraction data item is m*(10^(e)). Bigfloats 1144 + (tag number 5) use base-2 exponents; the value of a bigfloat data 1145 + item is m*(2^(e)). The exponent e MUST be represented in an integer 1146 + of major type 0 or 1, while the mantissa can also be a bignum 1147 + (Section 3.4.3). Contained items with other structures are invalid. 
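Editorial sketch (not part of the RFC text): the tagged two-element array described above can be written out as a short Python helper. The names encode_fraction and fraction_value are hypothetical. As a simplifying assumption, the mantissa is restricted to major types 0 and 1, so bignum mantissas (tag numbers 2 and 3) are not covered. Exact values are computed with fractions.Fraction to avoid the binary floating-point rounding that decimal fractions are meant to sidestep.

```python
from fractions import Fraction


def encode_int(value: int) -> bytes:
    """Encode an integer as major type 0 or 1 with the shortest argument."""
    major_type, argument = (0, value) if value >= 0 else (1, -1 - value)
    if argument >= 1 << 64:
        raise OverflowError("value does not fit in major type 0 or 1")
    initial = major_type << 5
    if argument < 24:
        return bytes([initial | argument])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([initial | additional_info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")


def encode_fraction(tag_number: int, exponent: int, mantissa: int) -> bytes:
    """Encode tag 4 (decimal fraction) or tag 5 (bigfloat) as [exponent, mantissa]."""
    if tag_number not in (4, 5):
        raise ValueError("tag number must be 4 or 5")
    tag_head = bytes([0b110_00000 | tag_number])  # major type 6, tag number < 24
    array_head = bytes([0b100_00010])             # major type 4, length 2
    return tag_head + array_head + encode_int(exponent) + encode_int(mantissa)


def fraction_value(tag_number: int, exponent: int, mantissa: int) -> Fraction:
    """Exact value m * base**e, with base 10 for tag 4 and base 2 for tag 5."""
    if tag_number not in (4, 5):
        raise ValueError("tag number must be 4 or 5")
    base = 10 if tag_number == 4 else 2
    return mantissa * Fraction(base) ** exponent


# 273.15 as a decimal fraction: exponent -2, mantissa 27315.
assert encode_fraction(4, -2, 27315) == bytes.fromhex("c48221196ab3")
assert fraction_value(4, -2, 27315) == Fraction("273.15")
# 1.5 as a bigfloat: exponent -1, mantissa 3.
assert encode_fraction(5, -1, 3) == bytes.fromhex("c5822003")
assert fraction_value(5, -1, 3) == Fraction(3, 2)
```

Note that the exponent comes first in the array, so a decoder reads the scaling factor before the (possibly much larger) mantissa.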
1148 + 1149 + An example of a decimal fraction is the representation of the number 1150 + 273.15 as 0b110_00100 (major type 6 for tag, additional information 4 1151 + for the tag number), followed by 0b100_00010 (major type 4 for the 1152 + array, additional information 2 for the length of the array), 1153 + followed by 0b001_00001 (major type 1 for the first integer, 1154 + additional information 1 for the value of -2), followed by 1155 + 0b000_11001 (major type 0 for the second integer, additional 1156 + information 25 for a two-byte value), followed by 0b0110101010110011 1157 + (27315 in two bytes). In hexadecimal: 1158 + 1159 + C4 -- Tag 4 1160 + 82 -- Array of length 2 1161 + 21 -- -2 1162 + 19 6ab3 -- 27315 1163 + 1164 + An example of a bigfloat is the representation of the number 1.5 as 1165 + 0b110_00101 (major type 6 for tag, additional information 5 for the 1166 + tag number), followed by 0b100_00010 (major type 4 for the array, 1167 + additional information 2 for the length of the array), followed by 1168 + 0b001_00000 (major type 1 for the first integer, additional 1169 + information 0 for the value of -1), followed by 0b000_00011 (major 1170 + type 0 for the second integer, additional information 3 for the value 1171 + of 3). In hexadecimal: 1172 + 1173 + C5 -- Tag 5 1174 + 82 -- Array of length 2 1175 + 20 -- -1 1176 + 03 -- 3 1177 + 1178 + Decimal fractions and bigfloats provide no representation of 1179 + Infinity, -Infinity, or NaN; if these are needed in place of a 1180 + decimal fraction or bigfloat, the IEEE 754 half-precision 1181 + representations from Section 3.3 can be used. 1182 + 1183 + 3.4.5. Content Hints 1184 + 1185 + The tags in this section are for content hints that might be used by 1186 + generic CBOR processors. These content hints do not extend the 1187 + generic data model. 1188 + 1189 + 3.4.5.1. 
Encoded CBOR Data Item 1190 + 1191 + Sometimes it is beneficial to carry an embedded CBOR data item that 1192 + is not meant to be decoded immediately at the time the enclosing data 1193 + item is being decoded. Tag number 24 (CBOR data item) can be used to 1194 + tag the embedded byte string as a single data item encoded in CBOR 1195 + format. Contained items that aren't byte strings are invalid. A 1196 + contained byte string is valid if it encodes a well-formed CBOR data 1197 + item; validity checking of the decoded CBOR item is not required for 1198 + tag validity (but could be offered by a generic decoder as a special 1199 + option). 1200 + 1201 + 3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters 1202 + 1203 + Tag numbers 21 to 23 indicate that a byte string might require a 1204 + specific encoding when interoperating with a text-based 1205 + representation. These tags are useful when an encoder knows that the 1206 + byte string data it is writing is likely to be later converted to a 1207 + particular JSON-based usage. That usage specifies that some strings 1208 + are encoded as base64, base64url, and so on. The encoder uses byte 1209 + strings instead of doing the encoding itself to reduce the message 1210 + size, to reduce the code size of the encoder, or both. The encoder 1211 + does not know whether or not the converter will be generic, and 1212 + therefore wants to say what it believes is the proper way to convert 1213 + binary strings to JSON. 1214 + 1215 + The data item tagged can be a byte string or any other data item. In 1216 + the latter case, the tag applies to all of the byte string data items 1217 + contained in the data item, except for those contained in a nested 1218 + data item tagged with an expected conversion. 1219 + 1220 + These three tag numbers suggest conversions to three of the base data 1221 + encodings defined in [RFC4648]. 
Tag number 21 suggests conversion to 1222 + base64url encoding (Section 5 of [RFC4648]) where padding is not used 1223 + (see Section 3.2 of [RFC4648]); that is, all trailing equals signs 1224 + ("=") are removed from the encoded string. Tag number 22 suggests 1225 + conversion to classical base64 encoding (Section 4 of [RFC4648]) with 1226 + padding as defined in RFC 4648. For both base64url and base64, 1227 + padding bits are set to zero (see Section 3.5 of [RFC4648]), and the 1228 + conversion to alternate encoding is performed on the contents of the 1229 + byte string (that is, without adding any line breaks, whitespace, or 1230 + other additional characters). Tag number 23 suggests conversion to 1231 + base16 (hex) encoding with uppercase alphabetics (see Section 8 of 1232 + [RFC4648]). Note that, for all three tag numbers, the encoding of 1233 + the empty byte string is the empty text string. 1234 + 1235 + 3.4.5.3. Encoded Text 1236 + 1237 + Some text strings hold data that have formats widely used on the 1238 + Internet, and sometimes those formats can be validated and presented 1239 + to the application in appropriate form by the decoder. There are 1240 + tags for some of these formats. 1241 + 1242 + * Tag number 32 is for URIs, as defined in [RFC3986]. If the text 1243 + string doesn't match the "URI-reference" production, the string is 1244 + invalid. 1245 + 1246 + * Tag numbers 33 and 34 are for base64url- and base64-encoded text 1247 + strings, respectively, as defined in [RFC4648]. 
If any of the 1248 + following apply: 1249 + 1250 + - the encoded text string contains non-alphabet characters or 1251 + only 1 alphabet character in the last block of 4 (where 1252 + alphabet is defined by Section 5 of [RFC4648] for tag number 33 1253 + and Section 4 of [RFC4648] for tag number 34), or 1254 + 1255 + - the padding bits in a 2- or 3-character block are not 0, or 1256 + 1257 + - the base64 encoding has the wrong number of padding characters, 1258 + or 1259 + 1260 + - the base64url encoding has padding characters, 1261 + 1262 + the string is invalid. 1263 + 1264 + * Tag number 36 is for MIME messages (including all headers), as 1265 + defined in [RFC2045]. A text string that isn't a valid MIME 1266 + message is invalid. (For this tag, validity checking may be 1267 + particularly onerous for a generic decoder and might therefore not 1268 + be offered. Note that many MIME messages are general binary data 1269 + and therefore cannot be represented in a text string; 1270 + [IANA.cbor-tags] lists a registration for tag number 257 that is 1271 + similar to tag number 36 but uses a byte string as its tag 1272 + content.) 1273 + 1274 + Note that tag numbers 33 and 34 differ from 21 and 22 in that the 1275 + data is transported in base-encoded form for the former and in raw 1276 + byte string form for the latter. 1277 + 1278 + [RFC7049] also defined a tag number 35 for regular expressions that 1279 + are in Perl Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] 1280 + or in JavaScript regular expression syntax [ECMA262]. The state of 1281 + the art in these regular expression specifications has since advanced 1282 + and is continually advancing, so this specification does not attempt 1283 + to update the references. 
Instead, this tag remains available (as 1284 + registered in [RFC7049]) for applications that specify the particular 1285 + regular expression variant they use out-of-band (possibly by limiting 1286 + the usage to a defined common subset of both PCRE and ECMA262). As 1287 + this specification clarifies tag validity beyond [RFC7049], we note 1288 + that due to the open way the tag was defined in [RFC7049], any 1289 + contained string value needs to be valid at the CBOR tag level (but 1290 + then may not be "expected" at the application level). 1291 + 1292 + 3.4.6. Self-Described CBOR 1293 + 1294 + In many applications, it will be clear from the context that CBOR is 1295 + being employed for encoding a data item. For instance, a specific 1296 + protocol might specify the use of CBOR, or a media type is indicated 1297 + that specifies its use. However, there may be applications where 1298 + such context information is not available, such as when CBOR data is 1299 + stored in a file that does not have disambiguating metadata. Here, 1300 + it may help to have some distinguishing characteristics for the data 1301 + itself. 1302 + 1303 + Tag number 55799 is defined for this purpose, specifically for use at 1304 + the start of a stored encoded CBOR data item as specified by an 1305 + application. It does not impart any special semantics on the data 1306 + item that it encloses; that is, the semantics of the tag content 1307 + enclosed in tag number 55799 is exactly identical to the semantics of 1308 + the tag content itself. 1309 + 1310 + The serialization of this tag's head is 0xd9d9f7, which does not 1311 + appear to be in use as a distinguishing mark for any frequently used 1312 + file types. In particular, 0xd9d9f7 is not a valid start of a 1313 + Unicode text in any Unicode encoding if it is followed by a valid 1314 + CBOR data item. 1315 + 1316 + For instance, a decoder might be able to decode both CBOR and JSON. 
1317 + Such a decoder would need to mechanically distinguish the two 1318 + formats. An easy way for an encoder to help the decoder would be to 1319 + tag the entire CBOR item with tag number 55799, the serialization of 1320 + which will never be found at the beginning of a JSON text. 1321 + 1322 + 4. Serialization Considerations 1323 + 1324 + 4.1. Preferred Serialization 1325 + 1326 + For some values at the data model level, CBOR provides multiple 1327 + serializations. For many applications, it is desirable that an 1328 + encoder always chooses a preferred serialization (preferred 1329 + encoding); however, the present specification does not put the burden 1330 + of enforcing this preference on either the encoder or decoder. 1331 + 1332 + Some constrained decoders may be limited in their ability to decode 1333 + non-preferred serializations: for example, if only integers below 1334 + 1_000_000_000 (one billion) are expected in an application, the 1335 + decoder may leave out the code that would be needed to decode 64-bit 1336 + arguments in integers. An encoder that always uses preferred 1337 + serialization ("preferred encoder") interoperates with this decoder 1338 + for the numbers that can occur in this application. Generally 1339 + speaking, a preferred encoder is more universally interoperable (and 1340 + also less wasteful) than one that, say, always uses 64-bit integers. 1341 + 1342 + Similarly, a constrained encoder may be limited in the variety of 1343 + representation variants it supports such that it does not emit 1344 + preferred serializations ("variant encoder"). For instance, a 1345 + constrained encoder could be designed to always use the 32-bit 1346 + variant for an integer that it encodes even if a short representation 1347 + is available (assuming that there is no application need for integers 1348 + that can only be represented with the 64-bit variant). 
A decoder 1349 + that does not rely on receiving only preferred serializations 1350 + ("variation-tolerant decoder") can therefore be said to be more 1351 + universally interoperable (it might very well optimize for the case 1352 + of receiving preferred serializations, though). Full implementations 1353 + of CBOR decoders are by definition variation tolerant; the 1354 + distinction is only relevant if a constrained implementation of a 1355 + CBOR decoder meets a variant encoder. 1356 + 1357 + The preferred serialization always uses the shortest form of 1358 + representing the argument (Section 3); it also uses the shortest 1359 + floating-point encoding that preserves the value being encoded. 1360 + 1361 + The preferred serialization for a floating-point value is the 1362 + shortest floating-point encoding that preserves its value, e.g., 1363 + 0xf94580 for the number 5.5, and 0xfa45ad9c00 for the number 5555.5. 1364 + For NaN values, a shorter encoding is preferred if zero-padding the 1365 + shorter significand towards the right reconstitutes the original NaN 1366 + value (for many applications, the single NaN encoding 0xf97e00 will 1367 + suffice). 1368 + 1369 + Definite-length encoding is preferred whenever the length is known at 1370 + the time the serialization of the item starts. 1371 + 1372 + 4.2. Deterministically Encoded CBOR 1373 + 1374 + Some protocols may want encoders to only emit CBOR in a particular 1375 + deterministic format; those protocols might also have the decoders 1376 + check that their input is in that deterministic format. Those 1377 + protocols are free to define what they mean by a "deterministic 1378 + format" and what encoders and decoders are expected to do. This 1379 + section defines a set of restrictions that can serve as the base of 1380 + such a deterministic format. 1381 + 1382 + 4.2.1. 
Core Deterministic Encoding Requirements

   A CBOR encoding satisfies the "core deterministic encoding
   requirements" if it satisfies the following restrictions:

   *  Preferred serialization MUST be used. In particular, this means
      that arguments (see Section 3) for integers, lengths in major
      types 2 through 5, and tags MUST be as short as possible, for
      instance:

      -  0 to 23 and -1 to -24 MUST be expressed in the same byte as the
         major type;

      -  24 to 255 and -25 to -256 MUST be expressed only with an
         additional uint8_t;

      -  256 to 65535 and -257 to -65536 MUST be expressed only with an
         additional uint16_t;

      -  65536 to 4294967295 and -65537 to -4294967296 MUST be expressed
         only with an additional uint32_t.

      Floating-point values also MUST use the shortest form that
      preserves the value, e.g., 1.5 is encoded as 0xf93e00 (binary16)
      and 1000000.5 as 0xfa49742408 (binary32). (One implementation of
      this is to have all floats start as a 64-bit float, then do a test
      conversion to a 32-bit float; if the result is the same numeric
      value, use the shorter form and repeat the process with a test
      conversion to a 16-bit float. This also works to select 16-bit
      float for positive and negative Infinity as well.)

   *  Indefinite-length items MUST NOT appear. They can be encoded as
      definite-length items instead.

   *  The keys in every map MUST be sorted in the bytewise lexicographic
      order of their deterministic encodings. For example, the
      following keys are sorted correctly:

      1.  10, encoded as 0x0a.

      2.  100, encoded as 0x1864.

      3.  -1, encoded as 0x20.

      4.  "z", encoded as 0x617a.

      5.  "aa", encoded as 0x626161.

      6.  [100], encoded as 0x811864.

      7.  [-1], encoded as 0x8120.

      8.  false, encoded as 0xf4.

      |  Implementation note: the self-delimiting nature of the CBOR
      |  encoding means that there are no two well-formed CBOR encoded
      |  data items where one is a prefix of the other. The bytewise
      |  lexicographic comparison of deterministic encodings of
      |  different map keys therefore always ends in a position where
      |  the byte differs between the keys, before the end of a key is
      |  reached.

4.2.2. Additional Deterministic Encoding Considerations

   CBOR tags present additional considerations for deterministic
   encoding. If a CBOR-based protocol were to provide the same
   semantics for the presence and absence of a specific tag (e.g., by
   allowing both tag 1 data items and raw numbers in a date/time
   position, treating the latter as if they were tagged), the
   deterministic format would not allow the presence of the tag, based
   on the "shortest form" principle. For example, a protocol might give
   encoders the choice of representing a URL as either a text string or,
   using Section 3.4.5.3, tag number 32 containing a text string. This
   protocol's deterministic encoding needs either to require that the
   tag is present or to require that it is absent, not allow either one.

   In a protocol that does require tags in certain places to obtain
   specific semantics, the tag needs to appear in the deterministic
   format as well. Deterministic encoding considerations also apply to
   the content of tags.

   If a protocol includes a field that can express integers with an
   absolute value of 2^(64) or larger using tag numbers 2 or 3
   (Section 3.4.3), the protocol's deterministic encoding needs to
   specify whether smaller integers are also expressed using these tags
   or using major types 0 and 1. Preferred serialization uses the
   latter choice, which is therefore recommended.

   Protocols that include floating-point values, whether represented
   using basic floating-point values (Section 3.3) or using tags (or
   both), may need to define extra requirements on their deterministic
   encodings, such as:

   *  Although IEEE floating-point values can represent both positive
      and negative zero as distinct values, the application might not
      distinguish these and might decide to represent all zero values
      with a positive sign, disallowing negative zero. (The application
      may also want to restrict the precision of floating-point values
      in such a way that there is never a need to represent 64-bit -- or
      even 32-bit -- floating-point values.)

   *  If a protocol includes a field that can express floating-point
      values, with a specific data model that declares integer and
      floating-point values to be interchangeable, the protocol's
      deterministic encoding needs to specify whether, for example, the
      integer 1.0 is encoded as 0x01 (unsigned integer), 0xf93c00
      (binary16), 0xfa3f800000 (binary32), or 0xfb3ff0000000000000
      (binary64). Example rules for this are:

      1.  Encode integral values that fit in 64 bits as values from
          major types 0 and 1, and other values as the preferred
          (smallest of 16-, 32-, or 64-bit) floating-point
          representation that accurately represents the value,

      2.  Encode all values as the preferred floating-point
          representation that accurately represents the value, even for
          integral values, or

      3.  Encode all values as 64-bit floating-point representations.

      Rule 1 straddles the boundaries between integers and floating-
      point values, and Rule 3 does not use preferred serialization, so
      Rule 2 may be a good choice in many cases.

   *  If NaN is an allowed value, and there is no intent to support NaN
      payloads or signaling NaNs, the protocol needs to pick a single
      representation, typically 0xf97e00. If that simple choice is not
      possible, specific attention will be needed for NaN handling.

   *  Subnormal numbers (nonzero numbers with the lowest possible
      exponent of a given IEEE 754 number format) may be flushed to zero
      outputs or be treated as zero inputs in some floating-point
      implementations. A protocol's deterministic encoding may want to
      specifically accommodate such implementations while creating an
      onus on other implementations by excluding subnormal numbers from
      interchange, interchanging zero instead.

   *  The same number can be represented by different decimal fractions,
      by different bigfloats, and by different forms under other tags
      that may be defined to express numeric values. Depending on the
      implementation, it may not always be practical to determine
      whether any of these forms (or forms in the basic generic data
      model) are equivalent. An application protocol that presents
      choices of this kind for the representation format of numbers
      needs to be explicit about how the formats for deterministic
      encoding are to be chosen.

4.2.3. Length-First Map Key Ordering

   The core deterministic encoding requirements (Section 4.2.1) sort map
   keys in a different order from the one suggested by Section 3.9 of
   [RFC7049] (called "Canonical CBOR" there). Protocols that need to be
   compatible with the order specified in [RFC7049] can instead be
   specified in terms of this specification's "length-first core
   deterministic encoding requirements":

   A CBOR encoding satisfies the "length-first core deterministic
   encoding requirements" if it satisfies the core deterministic
   encoding requirements except that the keys in every map MUST be
   sorted such that:

   1.  If two keys have different lengths, the shorter one sorts
       earlier;

   2.  If two keys have the same length, the one with the lower value in
       (bytewise) lexical order sorts earlier.

   For example, under the length-first core deterministic encoding
   requirements, the following keys are sorted correctly:

   1.  10, encoded as 0x0a.

   2.  -1, encoded as 0x20.

   3.  false, encoded as 0xf4.

   4.  100, encoded as 0x1864.

   5.  "z", encoded as 0x617a.

   6.  [-1], encoded as 0x8120.

   7.  "aa", encoded as 0x626161.

   8.  [100], encoded as 0x811864.

      |  Although [RFC7049] used the term "Canonical CBOR" for its form
      |  of requirements on deterministic encoding, this document avoids
      |  this term because "canonicalization" is often associated with
      |  specific uses of deterministic encoding only. The terms are
      |  essentially interchangeable, however, and the set of core
      |  requirements in this document could also be called "Canonical
      |  CBOR", while the length-first-ordered version of that could be
      |  called "Old Canonical CBOR".

5. Creating CBOR-Based Protocols

   Data formats such as CBOR are often used in environments where there
   is no format negotiation. A specific design goal of CBOR is to not
   need any included or assumed schema: a decoder can take a CBOR item
   and decode it with no other knowledge.

   Of course, in real-world implementations, the encoder and the decoder
   will have a shared view of what should be in a CBOR data item. For
   example, an agreed-to format might be "the item is an array whose
   first value is a UTF-8 string, second value is an integer, and
   subsequent values are zero or more floating-point numbers" or "the
   item is a map that has byte strings for keys and contains a pair
   whose key is 0xab01".

   CBOR-based protocols MUST specify how their decoders handle invalid
   and other unexpected data. CBOR-based protocols MAY specify that
   they treat arbitrary valid data as unexpected. Encoders for CBOR-
   based protocols MUST produce only valid items, that is, the protocol
   cannot be designed to make use of invalid items. An encoder can be
   capable of encoding as many or as few types of values as is required
   by the protocol in which it is used; a decoder can be capable of
   understanding as many or as few types of values as is required by the
   protocols in which it is used. This lack of restrictions allows CBOR
   to be used in extremely constrained environments.

   The rest of this section discusses some considerations in creating
   CBOR-based protocols. With few exceptions, it is advisory only and
   explicitly excludes any language from BCP 14 [RFC2119] [RFC8174]
   other than words that could be interpreted as "MAY" in the sense of
   BCP 14. The exceptions aim at facilitating interoperability of CBOR-
   based protocols while making use of a wide variety of both generic
   and application-specific encoders and decoders.

5.1. CBOR in Streaming Applications

   In a streaming application, a data stream may be composed of a
   sequence of CBOR data items concatenated back-to-back. In such an
   environment, the decoder immediately begins decoding a new data item
   if data is found after the end of a previous data item.

   Not all of the bytes making up a data item may be immediately
   available to the decoder; some decoders will buffer additional data
   until a complete data item can be presented to the application.
   Other decoders can present partial information about a top-level data
   item to an application, such as the nested data items that could
   already be decoded, or even parts of a byte string that hasn't
   completely arrived yet. Such an application also MUST have a
   matching streaming security mechanism, where the desired protection
   is available for incremental data presented to the application.

   Note that some applications and protocols will not want to use
   indefinite-length encoding. Using indefinite-length encoding allows
   an encoder to not need to marshal all the data for counting, but it
   requires a decoder to allocate increasing amounts of memory while
   waiting for the end of the item. This might be fine for some
   applications but not others.

5.2. Generic Encoders and Decoders

   A generic CBOR decoder can decode all well-formed encoded CBOR data
   items and present the data items to an application. See Appendix C.
   (The diagnostic notation, Section 8, may be used to present well-
   formed CBOR values to humans.)

   Generic CBOR encoders provide an application interface that allows
   the application to specify any well-formed value to be encoded as a
   CBOR data item, including simple values and tags unknown to the
   encoder.
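As an illustration of such an application interface, the following OCaml sketch lets a caller build any value from a small generic type, including tag numbers and simple values the encoder knows nothing about. The type `value` and the functions `add_head`, `add_value`, and `encode` are hypothetical names chosen for this example and are not the API of any particular library. The sketch assumes a 64-bit OCaml, restricts integers to the native `int` range, and leaves out floating-point values and indefinite-length items.

```ocaml
(* Hypothetical generic value type for illustration only. *)
type value =
  | Int of int                   (* major types 0 and 1, native int range *)
  | Byte_string of string        (* major type 2 *)
  | Text_string of string        (* major type 3, assumed valid UTF-8 *)
  | Array of value list          (* major type 4 *)
  | Map of (value * value) list  (* major type 5 *)
  | Tag of int * value           (* major type 6, any tag number *)
  | Simple of int                (* major type 7, 0..23 and 32..255 *)

(* Write the initial byte and the shortest argument that holds n (n >= 0). *)
let add_head buf major n =
  let ib = major lsl 5 in
  if n < 24 then Buffer.add_uint8 buf (ib lor n)
  else if n < 0x100 then begin
    Buffer.add_uint8 buf (ib lor 24);
    Buffer.add_uint8 buf n
  end else if n < 0x10000 then begin
    Buffer.add_uint8 buf (ib lor 25);
    Buffer.add_uint16_be buf n
  end else if n < 0x100000000 then begin
    Buffer.add_uint8 buf (ib lor 26);
    Buffer.add_int32_be buf (Int32.of_int n)  (* wraps modulo 2^32 *)
  end else begin
    Buffer.add_uint8 buf (ib lor 27);
    Buffer.add_int64_be buf (Int64.of_int n)
  end

let rec add_value buf = function
  | Int n when n >= 0 -> add_head buf 0 n
  | Int n -> add_head buf 1 (-1 - n)
  | Byte_string s -> add_head buf 2 (String.length s); Buffer.add_string buf s
  | Text_string s -> add_head buf 3 (String.length s); Buffer.add_string buf s
  | Array vs -> add_head buf 4 (List.length vs); List.iter (add_value buf) vs
  | Map kvs ->
      add_head buf 5 (List.length kvs);
      List.iter (fun (k, v) -> add_value buf k; add_value buf v) kvs
  | Tag (n, v) when n >= 0 -> add_head buf 6 n; add_value buf v
  | Simple n when (n >= 0 && n < 24) || (n >= 32 && n < 256) ->
      add_head buf 7 n
  | Tag _ | Simple _ ->
      invalid_arg "not a well-formed tag number or simple value"

let encode v =
  let buf = Buffer.create 16 in
  add_value buf v;
  Buffer.contents buf
```

Because `Tag` and `Simple` carry plain numbers, the encoder forwards codepoints it has never heard of, which is the behavior expected of a generic encoder. A call such as `encode (Tag (32, Text_string "a"))` wraps the text string in tag number 32 without the encoder interpreting the tag.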

   Even though CBOR attempts to minimize these cases, not all well-
   formed CBOR data is valid: for example, the encoded text string
   "0x62c0ae" does not contain valid UTF-8 (because [RFC3629] requires
   always using the shortest form) and so is not a valid CBOR item.
   Also, specific tags may make semantic constraints that may be
   violated, for instance, by a bignum tag enclosing another tag or by
   an instance of tag number 0 containing a byte string or containing a
   text string with contents that do not match the "date-time"
   production of [RFC3339]. There is no requirement that generic
   encoders and decoders make unnatural choices for their application
   interface to enable the processing of invalid data. Generic encoders
   and decoders are expected to forward simple values and tags even if
   their specific codepoints are not registered at the time the encoder/
   decoder is written (Section 5.4).

5.3. Validity of Items

   A well-formed but invalid CBOR data item (Section 1.2) presents a
   problem with interpreting the data encoded in it in the CBOR data
   model. A CBOR-based protocol could be specified in several layers,
   in which the lower layers don't process the semantics of some of the
   CBOR data they forward. These layers can't notice any validity
   errors in data they don't process and MUST forward that data as-is.
   The first layer that does process the semantics of an invalid CBOR
   item MUST pick one of two choices:

   1.  Replace the problematic item with an error marker and continue
       with the next item, or

   2.  Issue an error and stop processing altogether.

   A CBOR-based protocol MUST specify which of these options its
   decoders take for each kind of invalid item they might encounter.
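The two options can be sketched for one kind of invalid item, a text string whose content is not valid UTF-8. This is a minimal OCaml illustration, not a prescribed design: the names `on_invalid`, `text_item`, and `check_text` are hypothetical, and `String.is_valid_utf_8` assumes OCaml 4.14 or later.

```ocaml
(* Hypothetical names for the two options listed above. *)
type on_invalid =
  | Replace_with_marker  (* option 1: substitute an error marker, continue *)
  | Stop                 (* option 2: issue an error, stop processing *)

type text_item =
  | Text of string       (* content that is valid UTF-8 *)
  | Invalid_text         (* error marker standing in for an invalid item *)

(* Apply the chosen policy to the content bytes of a major type 3 item. *)
let check_text policy bytes =
  if String.is_valid_utf_8 bytes then Ok (Text bytes)
  else
    match policy with
    | Replace_with_marker -> Ok Invalid_text
    | Stop -> Error "text string is not valid UTF-8"
```

With the content bytes `"\xc0\xae"` from the 0x62c0ae example, `check_text Stop` returns an error while `check_text Replace_with_marker` yields the marker and lets decoding continue with the next item.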

   Such problems might occur at the basic validity level of CBOR or in
   the context of tags (tag validity).

5.3.1. Basic validity

   Two kinds of validity errors can occur in the basic generic data
   model:

   Duplicate keys in a map: Generic decoders (Section 5.2) make data
      available to applications using the native CBOR data model. That
      data model includes maps (key-value mappings with unique keys),
      not multimaps (key-value mappings where multiple entries can have
      the same key). Thus, a generic decoder that gets a CBOR map item
      that has duplicate keys will decode to a map with only one
      instance of that key, or it might stop processing altogether. On
      the other hand, a "streaming decoder" may not even be able to
      notice. See Section 5.6 for more discussion of keys in maps.

   Invalid UTF-8 string: A decoder might or might not want to verify
      that the sequence of bytes in a UTF-8 string (major type 3) is
      actually valid UTF-8 and react appropriately.

5.3.2. Tag validity

   Two additional kinds of validity errors are introduced by adding tags
   to the basic generic data model:

   Inadmissible type for tag content: Tag numbers (Section 3.4) specify
      what type of data item is supposed to be used as their tag
      content; for example, the tag numbers for unsigned or negative
      bignums are supposed to be put on byte strings. A decoder that
      decodes the tagged data item into a native representation (a
      native big integer in this example) is expected to check the type
      of the data item being tagged. Even decoders that don't have such
      native representations available in their environment may perform
      the check on those tags known to them and react appropriately.

   Inadmissible value for tag content: The type of data item may be
      admissible for a tag's content, but the specific value may not be;
      e.g., a value of "yesterday" is not acceptable for the content of
      tag 0, even though it properly is a text string. A decoder that
      normally ingests such tags into equivalent platform types might
      present this tag to the application in a similar way to how it
      would present a tag with an unknown tag number (Section 5.4).

5.4. Validity and Evolution

   A decoder with validity checking will expend the effort to reliably
   detect data items with validity errors. For example, such a decoder
   needs to have an API that reports an error (and does not return data)
   for a CBOR data item that contains any of the validity errors listed
   in the previous subsection.

   The set of tags defined in the "Concise Binary Object Representation
   (CBOR) Tags" registry (Section 9.2), as well as the set of simple
   values defined in the "Concise Binary Object Representation (CBOR)
   Simple Values" registry (Section 9.1), can grow at any time beyond
   the set understood by a generic decoder. A validity-checking decoder
   can do one of two things when it encounters such a case that it does
   not recognize:

   *  It can report an error (and not return data). Note that treating
      this case as an error can cause ossification and is thus not
      encouraged. This error is not a validity error, per se. This
      kind of error is more likely to be raised by a decoder that would
      be performing validity checking if this were a known case.

   *  It can emit the unknown item (type, value, and, for tags, the
      decoded tagged data item) to the application calling the decoder,
      and then give the application an indication that the decoder did
      not recognize that tag number or simple value.

   The latter approach, which is also appropriate for decoders that do
   not support validity checking, provides forward compatibility with
   newly registered tags and simple values without the requirement to
   update the encoder at the same time as the calling application. (For
   this, the decoder's API needs the ability to mark unknown items so
   that the calling application can handle them in a manner appropriate
   for the program.)

   Since some of the processing needed for validity checking may have an
   appreciable cost (in particular with duplicate detection for maps),
   support of validity checking is not a requirement placed on all CBOR
   decoders.

   Some encoders will rely on their applications to provide input data
   in such a way that valid CBOR results from the encoder. A generic
   encoder may also want to provide a validity-checking mode where it
   reliably limits its output to valid CBOR, independent of whether or
   not its application is indeed providing API-conformant data.

5.5. Numbers

   CBOR-based protocols should take into account that different language
   environments pose different restrictions on the range and precision
   of numbers that are representable. For example, the basic JavaScript
   number system treats all numbers as floating-point values, which may
   result in the silent loss of precision in decoding integers with more
   than 53 significant bits. Another example is that, since CBOR keeps
   the sign bit for its integer representation in the major type, it has
   one bit more for signed numbers of a certain length (e.g.,
   -2^(64)..2^(64)-1 for 1+8-byte integers) than the typical platform
   signed integer representation of the same length (-2^(63)..2^(63)-1
   for 8-byte int64_t). A protocol that uses numbers should define its
   expectations on the handling of nontrivial numbers in decoders and
   receiving applications.

   A CBOR-based protocol that includes floating-point numbers can
   restrict which of the three formats (half-precision, single-
   precision, and double-precision) are to be supported. For an
   integer-only application, a protocol may want to completely exclude
   the use of floating-point values.

   A CBOR-based protocol designed for compactness may want to exclude
   specific integer encodings that are longer than necessary for the
   application, such as to save the need to implement 64-bit integers.
   There is an expectation that encoders will use the most compact
   integer representation that can represent a given value. However, a
   compact application that does not require deterministic encoding
   should accept values that use a longer-than-needed encoding (such as
   encoding "0" as 0b000_11001 followed by two bytes of 0x00) as long as
   the application can decode an integer of the given size. Similar
   considerations apply to floating-point values; decoding both
   preferred serializations and longer-than-needed ones is recommended.
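A decoder that follows this recommendation can accept any argument length and merely report whether the shortest one was used, so that a caller requiring deterministic encoding can still reject the item. The OCaml sketch below does this for unsigned integers (major type 0) only; `read_be` and `decode_uint` are hypothetical helper names introduced for this example.

```ocaml
(* Read a big-endian unsigned integer of len bytes starting at pos. *)
let read_be s pos len =
  let v = ref 0L in
  for i = 0 to len - 1 do
    v := Int64.logor (Int64.shift_left !v 8)
           (Int64.of_int (Char.code s.[pos + i]))
  done;
  !v

(* Decode an unsigned integer (major type 0) at the start of s.
   Returns the value and whether the preferred (shortest) argument was
   used. Values of 2^63 and above wrap to negative Int64 values and
   have to be interpreted as unsigned by the caller. *)
let decode_uint s =
  if String.length s = 0 then Error "empty input"
  else
    let ib = Char.code s.[0] in
    let ai = ib land 0x1f in
    if ib lsr 5 <> 0 then Error "not an unsigned integer"
    else if ai < 24 then Ok (Int64.of_int ai, true)
    else if ai > 27 then Error "not a definite-length integer argument"
    else
      let len = 1 lsl (ai - 24) in
      if String.length s < 1 + len then Error "truncated"
      else
        let v = read_be s 1 len in
        (* Smallest value that actually needs an argument of this size. *)
        let min_needed =
          if ai = 24 then 24L else Int64.shift_left 1L (4 * len)
        in
        Ok (v, Int64.unsigned_compare v min_needed >= 0)
```

For the example in the text, `decode_uint "\x19\x00\x00"` yields the value 0 together with `false`, meaning the item decodes correctly but is not in preferred serialization.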

   CBOR-based protocols for constrained applications that provide a
   choice between representing a specific number as an integer and as a
   decimal fraction or bigfloat (such as when the exponent is small and
   nonnegative) might express a quality-of-implementation expectation
   that the integer representation is used directly.

5.6. Specifying Keys for Maps

   The encoding and decoding applications need to agree on what types of
   keys are going to be used in maps. In applications that need to
   interwork with JSON-based applications, conversion is simplified by
   limiting keys to text strings only; otherwise, there has to be a
   specified mapping from the other CBOR types to text strings, and this
   often leads to implementation errors. In applications where keys are
   numeric in nature, and numeric ordering of keys is important to the
   application, directly using the numbers for the keys is useful.

   If multiple types of keys are to be used, consideration should be
   given to how these types would be represented in the specific
   programming environments that are to be used. For example, in
   JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished
   from a key of floating-point 1.0. This means that, if integer keys
   are used, the protocol needs to avoid the use of floating-point keys
   the values of which happen to be integer numbers in the same map.

   Decoders that deliver data items nested within a CBOR data item
   immediately on decoding them ("streaming decoders") often do not keep
   the state that is necessary to ascertain uniqueness of a key in a
   map. Similarly, an encoder that can start encoding data items before
   the enclosing data item is completely available ("streaming encoder")
   may want to reduce its overhead significantly by relying on its data
   source to maintain uniqueness.

   A CBOR-based protocol MUST define what to do when a receiving
   application sees multiple identical keys in a map. The resulting
   rule in the protocol MUST respect the CBOR data model: it cannot
   prescribe a specific handling of the entries with the identical keys,
   except that it might have a rule that having identical keys in a map
   indicates a malformed map and that the decoder has to stop with an
   error. When processing maps that exhibit entries with duplicate
   keys, a generic decoder might do one of the following:

   *  Not accept maps with duplicate keys (that is, enforce validity for
      maps, see also Section 5.4). These generic decoders are
      universally useful. An application may still need to perform its
      own duplicate checking based on application rules (for instance,
      if the application equates integers and floating-point values in
      map key positions for specific maps).

   *  Pass all map entries to the application, including ones with
      duplicate keys. This requires that the application handle (check
      against) duplicate keys, even if the application rules are
      identical to the generic data model rules.

   *  Lose some entries with duplicate keys, e.g., deliver only the
      final (or first) entry out of the entries with the same key. With
      such a generic decoder, applications may get different results for
      a specific key on different runs, and with different generic
      decoders, which value is returned is based on generic decoder
      implementation and the actual order of keys in the map. In
      particular, applications cannot validate key uniqueness on their
      own as they do not necessarily see all entries; they may not be
      able to use such a generic decoder if they need to validate key
      uniqueness. These generic decoders can only be used in situations
      where the data source and transfer always provide valid maps; this
      is not possible if the data source and transfer can be attacked.

   Generic decoders need to document which of these three approaches
   they implement.

   The CBOR data model for maps does not allow ascribing semantics to
   the order of the key/value pairs in the map representation. Thus, a
   CBOR-based protocol MUST NOT specify that changing the key/value pair
   order in a map changes the semantics, except to specify that some
   orders are disallowed, for example, where they would not meet the
   requirements of a deterministic encoding (Section 4.2). (Any
   secondary effects of map ordering such as on timing, cache usage, and
   other potential side channels are not considered part of the
   semantics but may be enough reason on their own for a protocol to
   require a deterministic encoding format.)

   Applications for constrained devices should consider using small
   integers as keys if they have maps with a small number of frequently
   used keys; for instance, a set of 24 or fewer keys can be encoded in
   a single byte as unsigned integers, up to 48 if negative integers are
   also used. Less frequently occurring keys can then use integers with
   longer encodings.

5.6.1. Equivalence of Keys

   The specific data model that applies to a CBOR data item is used to
   determine whether keys occurring in maps are duplicates or distinct.
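Since the notion of "duplicate" depends on the data model in use, a duplicate check can take the equivalence as a parameter. The OCaml sketch below is one way to express that, assuming a decoder that passes all map entries to the application as a list of pairs. The names `key`, `generic_equal`, `numeric_equal`, and `first_duplicate` are hypothetical, the key type covers only three kinds of key, and NaN keys are not handled.

```ocaml
(* A deliberately small key type for illustration. *)
type key = Int of int | Float of float | Text of string

(* Equivalence that keeps integers and floating-point values distinct.
   OCaml's float equality treats -0.0 and 0.0 as equal; NaN handling is
   left out of this sketch. *)
let generic_equal a b =
  match a, b with
  | Int x, Int y -> x = y
  | Float x, Float y -> x = y
  | Text x, Text y -> String.equal x y
  | _ -> false

(* A hypothetical application rule that equates numerically equal
   integers and floating-point values in map key positions. *)
let numeric_equal a b =
  match a, b with
  | Int x, Float y | Float y, Int x -> float_of_int x = y
  | _ -> generic_equal a b

(* Return the first key that repeats an earlier key under the given
   equivalence, if any. Quadratic in the number of entries. *)
let first_duplicate ~equal pairs =
  let rec go seen = function
    | [] -> None
    | (k, _) :: rest ->
        if List.exists (equal k) seen then Some k else go (k :: seen) rest
  in
  go [] pairs
```

A map with the keys `Int 1` and `Float 1.0` passes the check under `generic_equal` but is reported as containing a duplicate under `numeric_equal`, which is the situation an application faces when its own rules are stricter than those of the decoder.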

   At the generic data model level, numerically equivalent integer and
   floating-point values are distinct from each other, as they are from
   the various big numbers (Tags 2 to 5). Similarly, text strings are
   distinct from byte strings, even if composed of the same bytes. A
   tagged value is distinct from an untagged value or from a value
   tagged with a different tag number.

   Within each of these groups, numeric values are distinct unless they
   are numerically equal (specifically, -0.0 is equal to 0.0); for the
   purpose of map key equivalence, NaN values are equivalent if they
   have the same significand after zero-extending both significands at
   the right to 64 bits.

   Both byte strings and text strings are compared byte by byte, arrays
   are compared element by element, and are equal if they have the same
   number of bytes/elements and the same values at the same positions.
   Two maps are equal if they have the same set of pairs regardless of
   their order; pairs are equal if both the key and value are equal.

   Tagged values are equal if both the tag number and the tag content
   are equal. (Note that a generic decoder that provides processing for
   a specific tag may not be able to distinguish some semantically
   equivalent values, e.g., if leading zeroes occur in the content of
   tag 2 or tag 3 (Section 3.4.3).) Simple values are equal if they
   simply have the same value. Nothing else is equal in the generic
   data model; a simple value 2 is not equivalent to an integer 2, and
   an array is never equivalent to a map.

   As discussed in Section 2.2, specific data models can make values
   equivalent for the purpose of comparing map keys that are distinct in
   the generic data model. Note that this implies that a generic
   decoder may deliver a decoded map to an application that needs to be
   checked for duplicate map keys by that application (alternatively,
   the decoder may provide a programming interface to perform this
   service for the application). Specific data models are not able to
   distinguish values for map keys that are equal for this purpose at
   the generic data model level.

5.7. Undefined Values

   In some CBOR-based protocols, the simple value (Section 3.3) of
   "undefined" might be used by an encoder as a substitute for a data
   item with an encoding problem, in order to allow the rest of the
   enclosing data items to be encoded without harm.

6. Converting Data between CBOR and JSON

   This section gives non-normative advice about converting between CBOR
   and JSON. Implementations of converters MAY use whichever advice
   here they want.

   It is worth noting that a JSON text is a sequence of characters, not
   an encoded sequence of bytes, while a CBOR data item consists of
   bytes, not characters.

6.1. Converting from CBOR to JSON

   Most of the types in CBOR have direct analogs in JSON. However, some
   do not, and someone implementing a CBOR-to-JSON converter has to
   consider what to do in those cases. The following non-normative
   advice deals with these by converting them to a single substitute
   value, such as a JSON null.

   *  An integer (major type 0 or 1) becomes a JSON number.

   *  A byte string (major type 2) that is not embedded in a tag that
      specifies a proposed encoding is encoded in base64url without
      padding and becomes a JSON string.

   *  A UTF-8 string (major type 3) becomes a JSON string. Note that
      JSON requires escaping certain characters ([RFC8259], Section 7):
      quotation mark (U+0022), reverse solidus (U+005C), and the "C0
      control characters" (U+0000 through U+001F). All other characters
      are copied unchanged into the JSON UTF-8 string.

   *  An array (major type 4) becomes a JSON array.

   *  A map (major type 5) becomes a JSON object. This is possible
      directly only if all keys are UTF-8 strings. A converter might
      also convert other keys into UTF-8 strings (such as by converting
      integers into strings containing their decimal representation);
      however, doing so introduces a danger of key collision. Note also
      that, if tags on UTF-8 strings are ignored as proposed below, this
      will cause a key collision if the tags are different but the
      strings are the same.

   *  False (major type 7, additional information 20) becomes a JSON
      false.

   *  True (major type 7, additional information 21) becomes a JSON
      true.

   *  Null (major type 7, additional information 22) becomes a JSON
      null.

   *  A floating-point value (major type 7, additional information 25
      through 27) becomes a JSON number if it is finite (that is, it can
      be represented in a JSON number); if the value is non-finite (NaN,
      or positive or negative Infinity), it is represented by the
      substitute value.

   *  Any other simple value (major type 7, any additional information
      value not yet discussed) is represented by the substitute value.

   *  A bignum (major type 6, tag number 2 or 3) is represented by
      encoding its byte string in base64url without padding and becomes
      a JSON string. For tag number 3 (negative bignum), a "~" (ASCII
      tilde) is inserted before the base-encoded value. (The conversion
      to a binary blob instead of a number is to prevent a likely
      numeric overflow for the JSON decoder.)

   *  A byte string with an encoding hint (major type 6, tag number 21
      through 23) is encoded as described by the hint and becomes a JSON
      string.

   *  For all other tags (major type 6, any other tag number), the tag
      content is represented as a JSON value; the tag number is ignored.

   *  Indefinite-length items are made definite before conversion.

   A CBOR-to-JSON converter may want to keep to the JSON profile I-JSON
   [RFC7493], to maximize interoperability and increase confidence that
   the JSON output can be processed with predictable results. For
   example, this has implications on the range of integers that can be
   represented reliably, as well as on the top-level items that may be
   supported by older JSON implementations.

6.2. Converting from JSON to CBOR

   All JSON values, once decoded, directly map into one or more CBOR
   values. As with any kind of CBOR generation, decisions have to be
   made with respect to number representation. In a suggested
   conversion:

   *  JSON numbers without fractional parts (integer numbers) are
      represented as integers (major types 0 and 1, possibly major type
      6, tag number 2 and 3), choosing the shortest form; integers
      longer than an implementation-defined threshold may instead be
      represented as floating-point values. The default range that is
      represented as integer is -2^(53)+1..2^(53)-1 (fully exploiting
      the range for exact integers in the binary64 representation often
      used for decoding JSON [RFC7493]). A CBOR-based protocol, or a
      generic converter implementation, may choose -2^(32)..2^(32)-1 or
      -2^(64)..2^(64)-1 (fully using the integer ranges available in
      CBOR with uint32_t or uint64_t, respectively) or even
      -2^(31)..2^(31)-1 or -2^(63)..2^(63)-1 (using popular ranges for
      two's complement signed integers). (If the JSON was generated
      from a JavaScript implementation, its precision is already limited
      to 53 bits maximum.)

   *  Numbers with fractional parts are represented as floating-point
      values, performing the decimal-to-binary conversion based on the
      precision provided by IEEE 754 binary64. The mathematical value
      of the JSON number is converted to binary64 using the
      roundTiesToEven procedure in Section 4.3.1 of [IEEE754]. Then,
      when encoding in CBOR, the preferred serialization uses the
      shortest floating-point representation exactly representing this
      conversion result; for instance, 1.5 is represented in a 16-bit
      floating-point value (not all implementations will be capable of
      efficiently finding the minimum form, though). Instead of using
      the default binary64 precision, there may be an implementation-
      defined limit to the precision of the conversion that will affect
      the precision of the represented values. Decimal representation
      should only be used on the CBOR side if that is specified in a
      protocol.

   CBOR has been designed to generally provide a more compact encoding
   than JSON. One implementation strategy that might come to mind is to
   perform a JSON-to-CBOR encoding in place in a single buffer. This
   strategy would need to carefully consider a number of pathological
   cases, such as that some strings represented with no or very few
   escapes and longer (or much longer) than 255 bytes may expand when
   encoded as UTF-8 strings in CBOR. Similarly, a few of the binary
   floating-point representations might cause expansion from some short
   decimal representations (1.1, 1e9) in JSON. This may be hard to get
   right, and any ensuing vulnerabilities may be exploited by an
   attacker.

7. Future Evolution of CBOR

   Successful protocols evolve over time. New ideas appear,
   implementation platforms improve, related protocols are developed and
   evolve, and new requirements from applications and protocols are
   added. Facilitating protocol evolution is therefore an important
   design consideration for any protocol development.

   For protocols that will use CBOR, CBOR provides some useful
   mechanisms to facilitate their evolution. Best practices for this
   are well known, particularly from JSON format development of JSON-
   based protocols. Therefore, such best practices are outside the
   scope of this specification.

   However, facilitating the evolution of CBOR itself is very well
   within its scope. CBOR is designed to both provide a stable basis
   for development of CBOR-based protocols and to be able to evolve.
   Since a successful protocol may live for decades, CBOR needs to be
   designed for decades of use and evolution. This section provides
   some guidance for the evolution of CBOR. It is necessarily more
   subjective than other parts of this document. It is also necessarily
   incomplete, lest it turn into a textbook on protocol development.

7.1. Extension Points

   In a protocol design, opportunities for evolution are often included
   in the form of extension points. For example, there may be a
   codepoint space that is not fully allocated from the outset, and the
   protocol is designed to tolerate and embrace implementations that
   start using more codepoints than initially allocated.

   Sizing the codepoint space may be difficult because the range
   required may be hard to predict. Protocol designs should attempt to
   make the codepoint space large enough so that it can slowly be filled
   over the intended lifetime of the protocol.

   CBOR has three major extension points:

   the "simple" space (values in major type 7): Of the 24 efficient
      (and 224 slightly less efficient) values, only a small number have
      been allocated. Implementations receiving an unknown simple data
      item may easily be able to process it as such, given that the
      structure of the value is indeed simple. The IANA registry in
      Section 9.1 is the appropriate way to address the extensibility of
      this codepoint space.

   the "tag" space (values in major type 6): The total codepoint space
      is abundant; only a tiny part of it has been allocated. However,
      not all of these codepoints are equally efficient: the first 24
      only consume a single ("1+0") byte, and half of them have already
      been allocated. The next 232 values only consume two ("1+1")
      bytes, with nearly a quarter already allocated. These subspaces
      need some curation to last for a few more decades.
      Implementations receiving an unknown tag number can choose to
      process just the enclosed tag content or, preferably, to process
      the tag as an unknown tag number wrapping the tag content. The
      IANA registry in Section 9.2 is the appropriate way to address the
      extensibility of this codepoint space.

   the "additional information" space: An implementation receiving an
      unknown additional information value has no way to continue
      decoding, so allocating codepoints in this space is a major step
      beyond just exercising an extension point. There are also very
      few codepoints left. See also Section 7.2.

7.2.
Curating the Additional Information Space 2143 + 2144 + The human mind is sometimes drawn to filling in little perceived gaps 2145 + to make something neat. We expect the remaining gaps in the 2146 + codepoint space for the additional information values to be an 2147 + attractor for new ideas, just because they are there. 2148 + 2149 + The present specification does not manage the additional information 2150 + codepoint space by an IANA registry. Instead, allocations out of 2151 + this space can only be done by updating this specification. 2152 + 2153 + For an additional information value of n >= 24, the size of the 2154 + additional data typically is 2^(n-24) bytes. Therefore, additional 2155 + information values 28 and 29 should be viewed as candidates for 2156 + 128-bit and 256-bit quantities, in case a need arises to add them to 2157 + the protocol. Additional information value 30 is then the only 2158 + additional information value available for general allocation, and 2159 + there should be a very good reason for allocating it before assigning 2160 + it through an update of the present specification. 2161 + 2162 + 8. Diagnostic Notation 2163 + 2164 + CBOR is a binary interchange format. To facilitate documentation and 2165 + debugging, and in particular to facilitate communication between 2166 + entities cooperating in debugging, this section defines a simple 2167 + human-readable diagnostic notation. All actual interchange always 2168 + happens in the binary format. 2169 + 2170 + Note that this truly is a diagnostic format; it is not meant to be 2171 + parsed. Therefore, no formal definition (as in ABNF) is given in 2172 + this document. (Implementers looking for a text-based format for 2173 + representing CBOR data items in configuration files may also want to 2174 + consider YAML [YAML].) 2175 + 2176 + The diagnostic notation is loosely based on JSON as it is defined in 2177 + RFC 8259, extending it where needed. 
2178 + 2179 + The notation borrows the JSON syntax for numbers (integer and 2180 + floating-point), True (>true<), False (>false<), Null (>null<), UTF-8 2181 + strings, arrays, and maps (maps are called objects in JSON; the 2182 + diagnostic notation extends JSON here by allowing any data item in 2183 + the key position). Undefined is written >undefined< as in 2184 + JavaScript. The non-finite floating-point numbers Infinity, 2185 + -Infinity, and NaN are written exactly as in this sentence (this is 2186 + also a way they can be written in JavaScript, although JSON does not 2187 + allow them). A tag is written as an integer number for the tag 2188 + number, followed by the tag content in parentheses; for instance, a 2189 + date in the format specified by RFC 3339 (ISO 8601) could be notated 2190 + as: 2191 + 2192 + 0("2013-03-21T20:04:00Z") 2193 + 2194 + or the equivalent relative time as the following: 2195 + 2196 + 1(1363896240) 2197 + 2198 + Byte strings are notated in one of the base encodings, without 2199 + padding, enclosed in single quotes, prefixed by >h< for base16, >b32< 2200 + for base32, >h32< for base32hex, >b64< for base64 or base64url (the 2201 + actual encodings do not overlap, so the string remains unambiguous). 2202 + For example, the byte string 0x12345678 could be written h'12345678', 2203 + b32'CI2FM6A', or b64'EjRWeA'. 2204 + 2205 + Unassigned simple values are given as "simple()" with the appropriate 2206 + integer in the parentheses. For example, "simple(42)" indicates 2207 + major type 7, value 42. 2208 + 2209 + A number of useful extensions to the diagnostic notation defined here 2210 + are provided in Appendix G of [RFC8610], "Extended Diagnostic 2211 + Notation" (EDN). Similarly, this notation could be extended in a 2212 + separate document to provide documentation for NaN payloads, which 2213 + are not covered in this document. 2214 + 2215 + 8.1. 
Encoding Indicators 2216 + 2217 + Sometimes it is useful to indicate in the diagnostic notation which 2218 + of several alternative representations were actually used; for 2219 + example, a data item written >1.5< by a diagnostic decoder might have 2220 + been encoded as a half-, single-, or double-precision float. 2221 + 2222 + The convention for encoding indicators is that anything starting with 2223 + an underscore and all following characters that are alphanumeric or 2224 + underscore is an encoding indicator, and can be ignored by anyone not 2225 + interested in this information. For example, "_" or "_3". Encoding 2226 + indicators are always optional. 2227 + 2228 + A single underscore can be written after the opening brace of a map 2229 + or the opening bracket of an array to indicate that the data item was 2230 + represented in indefinite-length format. For example, [_ 1, 2] 2231 + contains an indicator that an indefinite-length representation was 2232 + used to represent the data item [1, 2]. 2233 + 2234 + An underscore followed by a decimal digit n indicates that the 2235 + preceding item (or, for arrays and maps, the item starting with the 2236 + preceding bracket or brace) was encoded with an additional 2237 + information value of 24+n. For example, 1.5_1 is a half-precision 2238 + floating-point number, while 1.5_3 is encoded as double precision. 2239 + This encoding indicator is not shown in Appendix A. (Note that the 2240 + encoding indicator "_" is thus an abbreviation of the full form "_7", 2241 + which is not used.) 2242 + 2243 + The detailed chunk structure of byte and text strings of indefinite 2244 + length can be notated in the form (_ h'0123', h'4567') and (_ "foo", 2245 + "bar"). However, for an indefinite-length string with no chunks 2246 + inside, (_ ) would be ambiguous as to whether a byte string (0x5fff) 2247 + or a text string (0x7fff) is meant and is therefore not used. 
The 2248 + basic forms ''_ and ""_ can be used instead and are reserved for the 2249 + case of no chunks only -- not as short forms for the (permitted, but 2250 + not really useful) encodings with only empty chunks, which need to be 2251 + notated as (_ ''), (_ ""), etc., to preserve the chunk structure. 2252 + 2253 + 9. IANA Considerations 2254 + 2255 + IANA has created two registries for new CBOR values. The registries 2256 + are separate, that is, not under an umbrella registry, and follow the 2257 + rules in [RFC8126]. IANA has also assigned a new media type, an 2258 + associated CoAP Content-Format entry, and a structured syntax suffix. 2259 + 2260 + 9.1. CBOR Simple Values Registry 2261 + 2262 + IANA has created the "Concise Binary Object Representation (CBOR) 2263 + Simple Values" registry at [IANA.cbor-simple-values]. The initial 2264 + values are shown in Table 4. 2265 + 2266 + New entries in the range 0 to 19 are assigned by Standards Action 2267 + [RFC8126]. It is suggested that IANA allocate values starting with 2268 + the number 16 in order to reserve the lower numbers for contiguous 2269 + blocks (if any). 2270 + 2271 + New entries in the range 32 to 255 are assigned by Specification 2272 + Required. 2273 + 2274 + 9.2. CBOR Tags Registry 2275 + 2276 + IANA has created the "Concise Binary Object Representation (CBOR) 2277 + Tags" registry at [IANA.cbor-tags]. The tags that were defined in 2278 + [RFC7049] are described in detail in Section 3.4, and other tags have 2279 + already been defined since then. 2280 + 2281 + New entries in the range 0 to 23 ("1+0") are assigned by Standards 2282 + Action. New entries in the ranges 24 to 255 ("1+1") and 256 to 32767 2283 + (lower half of "1+2") are assigned by Specification Required. New 2284 + entries in the range 32768 to 18446744073709551615 (upper half of 2285 + "1+2", "1+4", and "1+8") are assigned by First Come First Served. 
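The registration ranges above line up with the tag head encoding sizes, which a short sketch can make concrete. This is an illustration, not part of the specification or of this library: the names `tag_head_size` and `tag_registration_policy` are hypothetical, and the sizes assume the shortest (preferred) encoding of the tag number.

```python
MAX_TAG = 2**64 - 1  # largest tag number, 18446744073709551615


def tag_head_size(tag):
    """Bytes used by the tag head (initial byte plus argument), shortest form."""
    if not 0 <= tag <= MAX_TAG:
        raise ValueError("tag number out of range")
    if tag <= 23:
        return 1  # "1+0": the tag number fits in the initial byte
    if tag <= 0xFF:
        return 2  # "1+1"
    if tag <= 0xFFFF:
        return 3  # "1+2"
    if tag <= 0xFFFFFFFF:
        return 5  # "1+4"
    return 9  # "1+8"


def tag_registration_policy(tag):
    """Registration policy for a tag number, per the ranges in Section 9.2."""
    if not 0 <= tag <= MAX_TAG:
        raise ValueError("tag number out of range")
    if tag <= 23:
        return "Standards Action"
    if tag <= 32767:
        return "Specification Required"
    return "First Come First Served"
```

Under these assumptions, tag 32767 and tag 32768 both use a three-byte head, but only the latter falls in the First Come First Served range.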
2286 + The template for registration requests is: 2287 + 2288 + * Data item 2289 + 2290 + * Semantics (short form) 2291 + 2292 + In addition, First Come First Served requests should include: 2293 + 2294 + * Point of contact 2295 + 2296 + * Description of semantics (URL) -- This description is optional; 2297 + the URL can point to something like an Internet-Draft or a web 2298 + page. 2299 + 2300 + Applicants exercising the First Come First Served range and making a 2301 + suggestion for a tag number that is not representable in 32 bits 2302 + (i.e., larger than 4294967295) should be aware that this could reduce 2303 + interoperability with implementations that do not support 64-bit 2304 + numbers. 2305 + 2306 + 9.3. Media Types Registry 2307 + 2308 + The Internet media type [RFC6838] ("MIME type") for a single encoded 2309 + CBOR data item is "application/cbor" as defined in the "Media Types" 2310 + registry [IANA.media-types]: 2311 + 2312 + Type name: application 2313 + 2314 + Subtype name: cbor 2315 + 2316 + Required parameters: n/a 2317 + 2318 + Optional parameters: n/a 2319 + 2320 + Encoding considerations: Binary 2321 + 2322 + Security considerations: See Section 10 of RFC 8949. 2323 + 2324 + Interoperability considerations: n/a 2325 + 2326 + Published specification: RFC 8949 2327 + 2328 + Applications that use this media type: Many 2329 + 2330 + Additional information: 2331 + 2332 + Magic number(s): n/a 2333 + File extension(s): .cbor 2334 + Macintosh file type code(s): n/a 2335 + 2336 + Person & email address to contact for further information: IETF CBOR 2337 + Working Group (cbor@ietf.org) or IETF Applications and Real-Time 2338 + Area (art@ietf.org) 2339 + 2340 + Intended usage: COMMON 2341 + 2342 + Restrictions on usage: none 2343 + 2344 + Author: IETF CBOR Working Group (cbor@ietf.org) 2345 + 2346 + Change controller: The IESG (iesg@ietf.org) 2347 + 2348 + 9.4. 
CoAP Content-Format Registry 2349 + 2350 + The CoAP Content-Format for CBOR has been registered in the "CoAP 2351 + Content-Formats" subregistry within the "Constrained RESTful 2352 + Environments (CoRE) Parameters" registry [IANA.core-parameters]: 2353 + 2354 + Media Type: application/cbor 2355 + 2356 + Encoding: - 2357 + 2358 + ID: 60 2359 + 2360 + Reference: RFC 8949 2361 + 2362 + 9.5. Structured Syntax Suffix Registry 2363 + 2364 + The structured syntax suffix [RFC6838] for media types based on a 2365 + single encoded CBOR data item is +cbor, which IANA has registered in 2366 + the "Structured Syntax Suffixes" registry [IANA.structured-suffix]: 2367 + 2368 + Name: Concise Binary Object Representation (CBOR) 2369 + 2370 + +suffix: +cbor 2371 + 2372 + References: RFC 8949 2373 + 2374 + Encoding Considerations: CBOR is a binary format. 2375 + 2376 + Interoperability Considerations: n/a 2377 + 2378 + Fragment Identifier Considerations: The syntax and semantics of 2379 + fragment identifiers specified for +cbor SHOULD be as specified 2380 + for "application/cbor". (At publication of RFC 8949, there is no 2381 + fragment identification syntax defined for "application/cbor".) 2382 + 2383 + The syntax and semantics for fragment identifiers for a specific 2384 + "xxx/yyy+cbor" SHOULD be processed as follows: 2385 + 2386 + * For cases defined in +cbor, where the fragment identifier 2387 + resolves per the +cbor rules, then process as specified in 2388 + +cbor. 2389 + 2390 + * For cases defined in +cbor, where the fragment identifier does 2391 + not resolve per the +cbor rules, then process as specified in 2392 + "xxx/yyy+cbor". 2393 + 2394 + * For cases not defined in +cbor, then process as specified in 2395 + "xxx/yyy+cbor". 2396 + 2397 + Security Considerations: See Section 10 of RFC 8949. 
2398 + 2399 + Contact: IETF CBOR Working Group (cbor@ietf.org) or IETF 2400 + Applications and Real-Time Area (art@ietf.org) 2401 + 2402 + Author/Change Controller: IETF 2403 + 2404 + 10. Security Considerations 2405 + 2406 + A network-facing application can exhibit vulnerabilities in its 2407 + processing logic for incoming data. Complex parsers are well known 2408 + as a likely source of such vulnerabilities, such as the ability to 2409 + remotely crash a node, or even remotely execute arbitrary code on it. 2410 + CBOR attempts to narrow the opportunities for introducing such 2411 + vulnerabilities by reducing parser complexity, by giving the entire 2412 + range of encodable values a meaning where possible. 2413 + 2414 + Because CBOR decoders are often used as a first step in processing 2415 + unvalidated input, they need to be fully prepared for all types of 2416 + hostile input that may be designed to corrupt, overrun, or achieve 2417 + control of the system decoding the CBOR data item. A CBOR decoder 2418 + needs to assume that all input may be hostile even if it has been 2419 + checked by a firewall, has come over a secure channel such as TLS, is 2420 + encrypted or signed, or has come from some other source that is 2421 + presumed trusted. 2422 + 2423 + Section 4.1 gives examples of limitations in interoperability when 2424 + using a constrained CBOR decoder with input from a CBOR encoder that 2425 + uses a non-preferred serialization. When a single data item is 2426 + consumed both by such a constrained decoder and a full decoder, it 2427 + can lead to security issues that can be exploited by an attacker who 2428 + can inject or manipulate content. 2429 + 2430 + As discussed throughout this document, there are many values that can 2431 + be considered "equivalent" in some circumstances and "not equivalent" 2432 + in others. As just one example, the numeric value for the number 2433 + "one" might be expressed as an integer or a bignum. 
A system 2434 + interpreting CBOR input might accept either form for the number 2435 + "one", or might reject one (or both) forms. Such acceptance or 2436 + rejection can have security implications in the program that is using 2437 + the interpreted input. 2438 + 2439 + Hostile input may be constructed to overrun buffers, to overflow or 2440 + underflow integer arithmetic, or to cause other decoding disruption. 2441 + CBOR data items might have lengths or sizes that are intentionally 2442 + extremely large or too short. Resource exhaustion attacks might 2443 + attempt to lure a decoder into allocating very big data items 2444 + (strings, arrays, maps, or even arbitrary precision numbers) or 2445 + exhaust the stack depth by setting up deeply nested items. Decoders 2446 + need to have appropriate resource management to mitigate these 2447 + attacks. (Items for which very large sizes are given can also 2448 + attempt to exploit integer overflow vulnerabilities.) 2449 + 2450 + A CBOR decoder, by definition, only accepts well-formed CBOR; this is 2451 + the first step to its robustness. Input that is not well-formed CBOR 2452 + causes no further processing from the point where the lack of well- 2453 + formedness was detected. If possible, any data decoded up to this 2454 + point should have no impact on the application using the CBOR 2455 + decoder. 2456 + 2457 + In addition to ascertaining well-formedness, a CBOR decoder might 2458 + also perform validity checks on the CBOR data. Alternatively, it can 2459 + leave those checks to the application using the decoder. This choice 2460 + needs to be clearly documented in the decoder. Beyond the validity 2461 + at the CBOR level, an application also needs to ascertain that the 2462 + input is in alignment with the application protocol that is 2463 + serialized in CBOR. 2464 + 2465 + The input check itself may consume resources. 
This is usually linear 2466 + in the size of the input, which means that an attacker has to spend 2467 + resources that are commensurate to the resources spent by the 2468 + defender on input validation. However, an attacker might be able to 2469 + craft inputs that will take longer for a target decoder to process 2470 + than for the attacker to produce. Processing for arbitrary-precision 2471 + numbers may exceed linear effort. Also, some hash-table 2472 + implementations that are used by decoders to build in-memory 2473 + representations of maps can be attacked to spend quadratic effort, 2474 + unless a secret key (see Section 7 of [SIPHASH_LNCS], also 2475 + [SIPHASH_OPEN]) or some other mitigation is employed. Such 2476 + superlinear efforts can be exploited by an attacker to exhaust 2477 + resources at or before the input validator; they therefore need to be 2478 + avoided in a CBOR decoder implementation. Note that tag number 2479 + definitions and their implementations can add security considerations 2480 + of this kind; this should then be discussed in the security 2481 + considerations of the tag number definition. 2482 + 2483 + CBOR encoders do not receive input directly from the network and are 2484 + thus not directly attackable in the same way as CBOR decoders. 2485 + However, CBOR encoders often have an API that takes input from 2486 + another level in the implementation and can be attacked through that 2487 + API. The design and implementation of that API should assume the 2488 + behavior of its caller may be based on hostile input or on coding 2489 + mistakes. It should check inputs for buffer overruns, overflow and 2490 + underflow of integer arithmetic, and other such errors that are aimed 2491 + to disrupt the encoder. 2492 + 2493 + Protocols should be defined in such a way that potential multiple 2494 + interpretations are reliably reduced to a single interpretation. 
For 2495 + example, an attacker could make use of invalid input such as 2496 + duplicate keys in maps, or exploit different precision in processing 2497 + numbers to make one application base its decisions on a different 2498 + interpretation than the one that will be used by a second 2499 + application. To facilitate consistent interpretation, encoder and 2500 + decoder implementations should provide a validity-checking mode of 2501 + operation (Section 5.4). Note, however, that a generic decoder 2502 + cannot know about all requirements that an application poses on its 2503 + input data; it is therefore not relieving the application from 2504 + performing its own input checking. Also, since the set of defined 2505 + tag numbers evolves, the application may employ a tag number that is 2506 + not yet supported for validity checking by the generic decoder it 2507 + uses. Generic decoders therefore need to document which tag numbers 2508 + they support and what validity checking they provide for those tag 2509 + numbers as well as for basic CBOR (UTF-8 checking, duplicate map key 2510 + checking). 2511 + 2512 + Section 3.4.3 notes that using the non-preferred choice of a bignum 2513 + representation instead of a basic integer for encoding a number is 2514 + not intended to have application semantics, but it can have such 2515 + semantics if an application receiving CBOR data is using a decoder in 2516 + the basic generic data model. This disparity causes a security issue 2517 + if the two sets of semantics differ. Thus, applications using CBOR 2518 + need to specify the data model that they are using for each use of 2519 + CBOR data. 2520 + 2521 + It is common to convert CBOR data to other formats. In many cases, 2522 + CBOR has more expressive types than other formats; this is 2523 + particularly true for the common conversion to JSON. The loss of 2524 + type information can cause security issues for the systems that are 2525 + processing the less-expressive data. 
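One way to see the loss of type information is to convert a byte string and a text string to JSON and compare the results. The sketch below is only an illustration: `to_json_value` is a hypothetical helper, and it assumes the suggestion from Section 6.1 that an untagged byte string becomes a JSON string in base64url without padding.

```python
import base64
import json


def to_json_value(item):
    """Hypothetical helper: map a decoded CBOR item to a JSON-compatible value."""
    if isinstance(item, bytes):
        # Byte string: base64url without padding (the Section 6.1 suggestion).
        return base64.urlsafe_b64encode(item).rstrip(b"=").decode("ascii")
    if isinstance(item, str):
        # Text string: passed through unchanged.
        return item
    raise TypeError("this sketch only covers byte strings and text strings")


# The byte string h'12345678' and the text string "EjRWeA" are distinct in CBOR.
from_bytes = json.dumps(to_json_value(bytes.fromhex("12345678")))
from_text = json.dumps(to_json_value("EjRWeA"))
```

Both values serialize to the same JSON text, so a consumer of the JSON output cannot tell which CBOR type was originally sent.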
2526 + 2527 + Section 6.2 describes a possibly common usage scenario of converting 2528 + between CBOR and JSON that could allow an attack if the attacker 2529 + knows that the application is performing the conversion. 2530 + 2531 + Security considerations for the use of base16 and base64 from 2532 + [RFC4648], and the use of UTF-8 from [RFC3629], are relevant to CBOR 2533 + as well. 2534 + 2535 + 11. References 2536 + 2537 + 11.1. Normative References 2538 + 2539 + [C] International Organization for Standardization, 2540 + "Information technology - Programming languages - C", 2541 + Fourth Edition, ISO/IEC 9899:2018, June 2018, 2542 + <https://www.iso.org/standard/74528.html>. 2543 + 2544 + [Cplusplus20] 2545 + International Organization for Standardization, 2546 + "Programming languages - C++", Sixth Edition, ISO/IEC DIS 2547 + 14882, ISO/IEC ISO/IEC JTC1 SC22 WG21 N 4860, March 2020, 2548 + <https://isocpp.org/files/papers/N4860.pdf>. 2549 + 2550 + [IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE 2551 + Std 754-2019, DOI 10.1109/IEEESTD.2019.8766229, 2552 + <https://ieeexplore.ieee.org/document/8766229>. 2553 + 2554 + [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail 2555 + Extensions (MIME) Part One: Format of Internet Message 2556 + Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, 2557 + <https://www.rfc-editor.org/info/rfc2045>. 2558 + 2559 + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2560 + Requirement Levels", BCP 14, RFC 2119, 2561 + DOI 10.17487/RFC2119, March 1997, 2562 + <https://www.rfc-editor.org/info/rfc2119>. 2563 + 2564 + [RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: 2565 + Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002, 2566 + <https://www.rfc-editor.org/info/rfc3339>. 
2567 + 2568 + [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 2569 + 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 2570 + 2003, <https://www.rfc-editor.org/info/rfc3629>. 2571 + 2572 + [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 2573 + Resource Identifier (URI): Generic Syntax", STD 66, 2574 + RFC 3986, DOI 10.17487/RFC3986, January 2005, 2575 + <https://www.rfc-editor.org/info/rfc3986>. 2576 + 2577 + [RFC4287] Nottingham, M., Ed. and R. Sayre, Ed., "The Atom 2578 + Syndication Format", RFC 4287, DOI 10.17487/RFC4287, 2579 + December 2005, <https://www.rfc-editor.org/info/rfc4287>. 2580 + 2581 + [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data 2582 + Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, 2583 + <https://www.rfc-editor.org/info/rfc4648>. 2584 + 2585 + [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for 2586 + Writing an IANA Considerations Section in RFCs", BCP 26, 2587 + RFC 8126, DOI 10.17487/RFC8126, June 2017, 2588 + <https://www.rfc-editor.org/info/rfc8126>. 2589 + 2590 + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2591 + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2592 + May 2017, <https://www.rfc-editor.org/info/rfc8174>. 2593 + 2594 + [TIME_T] The Open Group, "The Open Group Base Specifications", 2595 + Section 4.16, 'Seconds Since the Epoch', Issue 7, 2018 2596 + Edition, IEEE Std 1003.1, 2018, 2597 + <https://pubs.opengroup.org/onlinepubs/9699919799/ 2598 + basedefs/V1_chap04.html#tag_04_16>. 2599 + 2600 + 11.2. Informative References 2601 + 2602 + [ASN.1] International Telecommunication Union, "Information 2603 + Technology - ASN.1 encoding rules: Specification of Basic 2604 + Encoding Rules (BER), Canonical Encoding Rules (CER) and 2605 + Distinguished Encoding Rules (DER)", ITU-T Recommendation 2606 + X.690, 2015, 2607 + <https://www.itu.int/rec/T-REC-X.690-201508-I/en>. 
2608 + 2609 + [BSON] Various, "BSON - Binary JSON", <http://bsonspec.org/>. 2610 + 2611 + [CBOR-TAGS] 2612 + Bormann, C., "Notable CBOR Tags", Work in Progress, 2613 + Internet-Draft, draft-bormann-cbor-notable-tags-02, 25 2614 + June 2020, <https://tools.ietf.org/html/draft-bormann- 2615 + cbor-notable-tags-02>. 2616 + 2617 + [ECMA262] Ecma International, "ECMAScript 2020 Language 2618 + Specification", Standard ECMA-262, 11th Edition, June 2619 + 2020, <https://www.ecma- 2620 + international.org/publications/standards/Ecma-262.htm>. 2621 + 2622 + [Err3764] RFC Errata, Erratum ID 3764, RFC 7049, 2623 + <https://www.rfc-editor.org/errata/eid3764>. 2624 + 2625 + [Err3770] RFC Errata, Erratum ID 3770, RFC 7049, 2626 + <https://www.rfc-editor.org/errata/eid3770>. 2627 + 2628 + [Err4294] RFC Errata, Erratum ID 4294, RFC 7049, 2629 + <https://www.rfc-editor.org/errata/eid4294>. 2630 + 2631 + [Err4409] RFC Errata, Erratum ID 4409, RFC 7049, 2632 + <https://www.rfc-editor.org/errata/eid4409>. 2633 + 2634 + [Err4963] RFC Errata, Erratum ID 4963, RFC 7049, 2635 + <https://www.rfc-editor.org/errata/eid4963>. 2636 + 2637 + [Err4964] RFC Errata, Erratum ID 4964, RFC 7049, 2638 + <https://www.rfc-editor.org/errata/eid4964>. 2639 + 2640 + [Err5434] RFC Errata, Erratum ID 5434, RFC 7049, 2641 + <https://www.rfc-editor.org/errata/eid5434>. 2642 + 2643 + [Err5763] RFC Errata, Erratum ID 5763, RFC 7049, 2644 + <https://www.rfc-editor.org/errata/eid5763>. 2645 + 2646 + [Err5917] RFC Errata, Erratum ID 5917, RFC 7049, 2647 + <https://www.rfc-editor.org/errata/eid5917>. 2648 + 2649 + [IANA.cbor-simple-values] 2650 + IANA, "Concise Binary Object Representation (CBOR) Simple 2651 + Values", 2652 + <https://www.iana.org/assignments/cbor-simple-values>. 2653 + 2654 + [IANA.cbor-tags] 2655 + IANA, "Concise Binary Object Representation (CBOR) Tags", 2656 + <https://www.iana.org/assignments/cbor-tags>. 
2657 + 2658 + [IANA.core-parameters] 2659 + IANA, "Constrained RESTful Environments (CoRE) 2660 + Parameters", 2661 + <https://www.iana.org/assignments/core-parameters>. 2662 + 2663 + [IANA.media-types] 2664 + IANA, "Media Types", 2665 + <https://www.iana.org/assignments/media-types>. 2666 + 2667 + [IANA.structured-suffix] 2668 + IANA, "Structured Syntax Suffixes", 2669 + <https://www.iana.org/assignments/media-type-structured- 2670 + suffix>. 2671 + 2672 + [MessagePack] 2673 + Furuhashi, S., "MessagePack", <https://msgpack.org/>. 2674 + 2675 + [PCRE] Hazel, P., "PCRE - Perl Compatible Regular Expressions", 2676 + <https://www.pcre.org/>. 2677 + 2678 + [RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission 2679 + Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976, 2680 + <https://www.rfc-editor.org/info/rfc713>. 2681 + 2682 + [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type 2683 + Specifications and Registration Procedures", BCP 13, 2684 + RFC 6838, DOI 10.17487/RFC6838, January 2013, 2685 + <https://www.rfc-editor.org/info/rfc6838>. 2686 + 2687 + [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object 2688 + Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, 2689 + October 2013, <https://www.rfc-editor.org/info/rfc7049>. 2690 + 2691 + [RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for 2692 + Constrained-Node Networks", RFC 7228, 2693 + DOI 10.17487/RFC7228, May 2014, 2694 + <https://www.rfc-editor.org/info/rfc7228>. 2695 + 2696 + [RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, 2697 + DOI 10.17487/RFC7493, March 2015, 2698 + <https://www.rfc-editor.org/info/rfc7493>. 2699 + 2700 + [RFC7991] Hoffman, P., "The "xml2rfc" Version 3 Vocabulary", 2701 + RFC 7991, DOI 10.17487/RFC7991, December 2016, 2702 + <https://www.rfc-editor.org/info/rfc7991>. 
2703 + 2704 + [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data 2705 + Interchange Format", STD 90, RFC 8259, 2706 + DOI 10.17487/RFC8259, December 2017, 2707 + <https://www.rfc-editor.org/info/rfc8259>. 2708 + 2709 + [RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data 2710 + Definition Language (CDDL): A Notational Convention to 2711 + Express Concise Binary Object Representation (CBOR) and 2712 + JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, 2713 + June 2019, <https://www.rfc-editor.org/info/rfc8610>. 2714 + 2715 + [RFC8618] Dickinson, J., Hague, J., Dickinson, S., Manderson, T., 2716 + and J. Bond, "Compacted-DNS (C-DNS): A Format for DNS 2717 + Packet Capture", RFC 8618, DOI 10.17487/RFC8618, September 2718 + 2019, <https://www.rfc-editor.org/info/rfc8618>. 2719 + 2720 + [RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) 2721 + Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020, 2722 + <https://www.rfc-editor.org/info/rfc8742>. 2723 + 2724 + [RFC8746] Bormann, C., Ed., "Concise Binary Object Representation 2725 + (CBOR) Tags for Typed Arrays", RFC 8746, 2726 + DOI 10.17487/RFC8746, February 2020, 2727 + <https://www.rfc-editor.org/info/rfc8746>. 2728 + 2729 + [SIPHASH_LNCS] 2730 + Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- 2731 + Input PRF", Progress in Cryptology - INDOCRYPT 2012, pp. 2732 + 489-508, DOI 10.1007/978-3-642-34931-7_28, 2012, 2733 + <https://doi.org/10.1007/978-3-642-34931-7_28>. 2734 + 2735 + [SIPHASH_OPEN] 2736 + Aumasson, J. and D.J. Bernstein, "SipHash: a fast short- 2737 + input PRF", <https://www.aumasson.jp/siphash/siphash.pdf>. 2738 + 2739 + [YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup 2740 + Language (YAML[TM]) Version 1.2", 3rd Edition, October 2741 + 2009, <https://www.yaml.org/spec/1.2/spec.html>. 2742 + 2743 + Appendix A. 
Examples of Encoded CBOR Data Items 2744 + 2745 + The following table provides some CBOR-encoded values in hexadecimal 2746 + (right column), together with diagnostic notation for these values 2747 + (left column). Note that the string "\u00fc" is one form of 2748 + diagnostic notation for a UTF-8 string containing the single Unicode 2749 + character U+00FC (LATIN SMALL LETTER U WITH DIAERESIS, "ü"). 2750 + Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a 2751 + single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, "水"), often 2752 + representing "water", and "\ud800\udd51" is a UTF-8 string in 2753 + diagnostic notation with a single character U+10151 (GREEK ACROPHONIC 2754 + ATTIC FIFTY STATERS, "𐅑"). (Note that all these single-character 2755 + strings could also be represented in native UTF-8 in diagnostic 2756 + notation, just not if an ASCII-only specification is required.) In 2757 + the diagnostic notation provided for bignums, their intended numeric 2758 + value is shown as a decimal number (such as 18446744073709551616) 2759 + instead of a tagged byte string (such as 2(h'010000000000000000')). 
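As a rough cross-check of the string and bignum conventions described in this paragraph, the following Python sketch encodes a few such values by hand. The helper names (head, encode_text, encode_int) are illustrative assumptions and are not part of the specification or of any particular library. Note that the JSON-style surrogate pair "\ud800\udd51" of diagnostic notation has to be written as "\U00010151" in Python, since Python string literals do not combine surrogate escapes into one character.

```python
def head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR head, using the shortest form for the argument."""
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major_type << 5) | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_text(s: str) -> bytes:
    """Definite-length text string: head counts UTF-8 bytes, not characters."""
    payload = s.encode("utf-8")
    return head(3, len(payload)) + payload


def encode_int(n: int) -> bytes:
    """Integer; values outside the 64-bit range become bignums (tags 2 and 3)."""
    if 0 <= n < 1 << 64:
        return head(0, n)               # major type 0
    if -(1 << 64) <= n < 0:
        return head(1, -1 - n)          # major type 1 carries -1-n
    tag, magnitude = (2, n) if n > 0 else (3, -1 - n)
    raw = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    return head(6, tag) + head(2, len(raw)) + raw


if __name__ == "__main__":
    for s in ("\u00fc", "\u6c34", "\U00010151"):
        print(repr(s), "0x" + encode_text(s).hex())
    for n in (1000, -1000, 2**64, -(2**64) - 1):
        print(n, "0x" + encode_int(n).hex())
```

The three strings each hold a single character, yet their heads carry the lengths 2, 3, and 4 because the count is in UTF-8 bytes.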
2760 + 2761 + +==============================+====================================+ 2762 + |Diagnostic | Encoded | 2763 + +==============================+====================================+ 2764 + |0 | 0x00 | 2765 + +------------------------------+------------------------------------+ 2766 + |1 | 0x01 | 2767 + +------------------------------+------------------------------------+ 2768 + |10 | 0x0a | 2769 + +------------------------------+------------------------------------+ 2770 + |23 | 0x17 | 2771 + +------------------------------+------------------------------------+ 2772 + |24 | 0x1818 | 2773 + +------------------------------+------------------------------------+ 2774 + |25 | 0x1819 | 2775 + +------------------------------+------------------------------------+ 2776 + |100 | 0x1864 | 2777 + +------------------------------+------------------------------------+ 2778 + |1000 | 0x1903e8 | 2779 + +------------------------------+------------------------------------+ 2780 + |1000000 | 0x1a000f4240 | 2781 + +------------------------------+------------------------------------+ 2782 + |1000000000000 | 0x1b000000e8d4a51000 | 2783 + +------------------------------+------------------------------------+ 2784 + |18446744073709551615 | 0x1bffffffffffffffff | 2785 + +------------------------------+------------------------------------+ 2786 + |18446744073709551616 | 0xc249010000000000000000 | 2787 + +------------------------------+------------------------------------+ 2788 + |-18446744073709551616 | 0x3bffffffffffffffff | 2789 + +------------------------------+------------------------------------+ 2790 + |-18446744073709551617 | 0xc349010000000000000000 | 2791 + +------------------------------+------------------------------------+ 2792 + |-1 | 0x20 | 2793 + +------------------------------+------------------------------------+ 2794 + |-10 | 0x29 | 2795 + +------------------------------+------------------------------------+ 2796 + |-100 | 0x3863 | 2797 + 
+------------------------------+------------------------------------+ 2798 + |-1000 | 0x3903e7 | 2799 + +------------------------------+------------------------------------+ 2800 + |0.0 | 0xf90000 | 2801 + +------------------------------+------------------------------------+ 2802 + |-0.0 | 0xf98000 | 2803 + +------------------------------+------------------------------------+ 2804 + |1.0 | 0xf93c00 | 2805 + +------------------------------+------------------------------------+ 2806 + |1.1 | 0xfb3ff199999999999a | 2807 + +------------------------------+------------------------------------+ 2808 + |1.5 | 0xf93e00 | 2809 + +------------------------------+------------------------------------+ 2810 + |65504.0 | 0xf97bff | 2811 + +------------------------------+------------------------------------+ 2812 + |100000.0 | 0xfa47c35000 | 2813 + +------------------------------+------------------------------------+ 2814 + |3.4028234663852886e+38 | 0xfa7f7fffff | 2815 + +------------------------------+------------------------------------+ 2816 + |1.0e+300 | 0xfb7e37e43c8800759c | 2817 + +------------------------------+------------------------------------+ 2818 + |5.960464477539063e-8 | 0xf90001 | 2819 + +------------------------------+------------------------------------+ 2820 + |0.00006103515625 | 0xf90400 | 2821 + +------------------------------+------------------------------------+ 2822 + |-4.0 | 0xf9c400 | 2823 + +------------------------------+------------------------------------+ 2824 + |-4.1 | 0xfbc010666666666666 | 2825 + +------------------------------+------------------------------------+ 2826 + |Infinity | 0xf97c00 | 2827 + +------------------------------+------------------------------------+ 2828 + |NaN | 0xf97e00 | 2829 + +------------------------------+------------------------------------+ 2830 + |-Infinity | 0xf9fc00 | 2831 + +------------------------------+------------------------------------+ 2832 + |Infinity | 0xfa7f800000 | 2833 + 
+------------------------------+------------------------------------+ 2834 + |NaN | 0xfa7fc00000 | 2835 + +------------------------------+------------------------------------+ 2836 + |-Infinity | 0xfaff800000 | 2837 + +------------------------------+------------------------------------+ 2838 + |Infinity | 0xfb7ff0000000000000 | 2839 + +------------------------------+------------------------------------+ 2840 + |NaN | 0xfb7ff8000000000000 | 2841 + +------------------------------+------------------------------------+ 2842 + |-Infinity | 0xfbfff0000000000000 | 2843 + +------------------------------+------------------------------------+ 2844 + |false | 0xf4 | 2845 + +------------------------------+------------------------------------+ 2846 + |true | 0xf5 | 2847 + +------------------------------+------------------------------------+ 2848 + |null | 0xf6 | 2849 + +------------------------------+------------------------------------+ 2850 + |undefined | 0xf7 | 2851 + +------------------------------+------------------------------------+ 2852 + |simple(16) | 0xf0 | 2853 + +------------------------------+------------------------------------+ 2854 + |simple(255) | 0xf8ff | 2855 + +------------------------------+------------------------------------+ 2856 + |0("2013-03-21T20:04:00Z") | 0xc074323031332d30332d32315432303a | 2857 + | | 30343a30305a | 2858 + +------------------------------+------------------------------------+ 2859 + |1(1363896240) | 0xc11a514b67b0 | 2860 + +------------------------------+------------------------------------+ 2861 + |1(1363896240.5) | 0xc1fb41d452d9ec200000 | 2862 + +------------------------------+------------------------------------+ 2863 + |23(h'01020304') | 0xd74401020304 | 2864 + +------------------------------+------------------------------------+ 2865 + |24(h'6449455446') | 0xd818456449455446 | 2866 + +------------------------------+------------------------------------+ 2867 + |32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 | 
2868 + | | 616d706c652e636f6d | 2869 + +------------------------------+------------------------------------+ 2870 + |h'' | 0x40 | 2871 + +------------------------------+------------------------------------+ 2872 + |h'01020304' | 0x4401020304 | 2873 + +------------------------------+------------------------------------+ 2874 + |"" | 0x60 | 2875 + +------------------------------+------------------------------------+ 2876 + |"a" | 0x6161 | 2877 + +------------------------------+------------------------------------+ 2878 + |"IETF" | 0x6449455446 | 2879 + +------------------------------+------------------------------------+ 2880 + |"\"\\" | 0x62225c | 2881 + +------------------------------+------------------------------------+ 2882 + |"\u00fc" | 0x62c3bc | 2883 + +------------------------------+------------------------------------+ 2884 + |"\u6c34" | 0x63e6b0b4 | 2885 + +------------------------------+------------------------------------+ 2886 + |"\ud800\udd51" | 0x64f0908591 | 2887 + +------------------------------+------------------------------------+ 2888 + |[] | 0x80 | 2889 + +------------------------------+------------------------------------+ 2890 + |[1, 2, 3] | 0x83010203 | 2891 + +------------------------------+------------------------------------+ 2892 + |[1, [2, 3], [4, 5]] | 0x8301820203820405 | 2893 + +------------------------------+------------------------------------+ 2894 + |[1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x98190102030405060708090a0b0c0d0e | 2895 + |10, 11, 12, 13, 14, 15, 16, | 0f101112131415161718181819 | 2896 + |17, 18, 19, 20, 21, 22, 23, | | 2897 + |24, 25] | | 2898 + +------------------------------+------------------------------------+ 2899 + |{} | 0xa0 | 2900 + +------------------------------+------------------------------------+ 2901 + |{1: 2, 3: 4} | 0xa201020304 | 2902 + +------------------------------+------------------------------------+ 2903 + |{"a": 1, "b": [2, 3]} | 0xa26161016162820203 | 2904 + 
+------------------------------+------------------------------------+ 2905 + |["a", {"b": "c"}] | 0x826161a161626163 | 2906 + +------------------------------+------------------------------------+ 2907 + |{"a": "A", "b": "B", "c": "C",| 0xa5616161416162614261636143616461 | 2908 + |"d": "D", "e": "E"} | 4461656145 | 2909 + +------------------------------+------------------------------------+ 2910 + |(_ h'0102', h'030405') | 0x5f42010243030405ff | 2911 + +------------------------------+------------------------------------+ 2912 + |(_ "strea", "ming") | 0x7f657374726561646d696e67ff | 2913 + +------------------------------+------------------------------------+ 2914 + |[_ ] | 0x9fff | 2915 + +------------------------------+------------------------------------+ 2916 + |[_ 1, [2, 3], [_ 4, 5]] | 0x9f018202039f0405ffff | 2917 + +------------------------------+------------------------------------+ 2918 + |[_ 1, [2, 3], [4, 5]] | 0x9f01820203820405ff | 2919 + +------------------------------+------------------------------------+ 2920 + |[1, [2, 3], [_ 4, 5]] | 0x83018202039f0405ff | 2921 + +------------------------------+------------------------------------+ 2922 + |[1, [_ 2, 3], [4, 5]] | 0x83019f0203ff820405 | 2923 + +------------------------------+------------------------------------+ 2924 + |[_ 1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x9f0102030405060708090a0b0c0d0e0f | 2925 + |10, 11, 12, 13, 14, 15, 16, | 101112131415161718181819ff | 2926 + |17, 18, 19, 20, 21, 22, 23, | | 2927 + |24, 25] | | 2928 + +------------------------------+------------------------------------+ 2929 + |{_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff | 2930 + +------------------------------+------------------------------------+ 2931 + |["a", {_ "b": "c"}] | 0x826161bf61626163ff | 2932 + +------------------------------+------------------------------------+ 2933 + |{_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff | 2934 + +------------------------------+------------------------------------+ 2935 + 
2936 + Table 6: Examples of Encoded CBOR Data Items 2937 + 2938 + Appendix B. Jump Table for Initial Byte 2939 + 2940 + For brevity, this jump table does not show initial bytes that are 2941 + reserved for future extension. It also only shows a selection of the 2942 + initial bytes that can be used for optional features. (All unsigned 2943 + integers are in network byte order.) 2944 + 2945 + +============+================================================+ 2946 + | Byte | Structure/Semantics | 2947 + +============+================================================+ 2948 + | 0x00..0x17 | unsigned integer 0x00..0x17 (0..23) | 2949 + +------------+------------------------------------------------+ 2950 + | 0x18 | unsigned integer (one-byte uint8_t follows) | 2951 + +------------+------------------------------------------------+ 2952 + | 0x19 | unsigned integer (two-byte uint16_t follows) | 2953 + +------------+------------------------------------------------+ 2954 + | 0x1a | unsigned integer (four-byte uint32_t follows) | 2955 + +------------+------------------------------------------------+ 2956 + | 0x1b | unsigned integer (eight-byte uint64_t follows) | 2957 + +------------+------------------------------------------------+ 2958 + | 0x20..0x37 | negative integer -1-0x00..-1-0x17 (-1..-24) | 2959 + +------------+------------------------------------------------+ 2960 + | 0x38 | negative integer -1-n (one-byte uint8_t for n | 2961 + | | follows) | 2962 + +------------+------------------------------------------------+ 2963 + | 0x39 | negative integer -1-n (two-byte uint16_t for n | 2964 + | | follows) | 2965 + +------------+------------------------------------------------+ 2966 + | 0x3a | negative integer -1-n (four-byte uint32_t for | 2967 + | | n follows) | 2968 + +------------+------------------------------------------------+ 2969 + | 0x3b | negative integer -1-n (eight-byte uint64_t for | 2970 + | | n follows) | 2971 + 
+------------+------------------------------------------------+ 2972 + | 0x40..0x57 | byte string (0x00..0x17 bytes follow) | 2973 + +------------+------------------------------------------------+ 2974 + | 0x58 | byte string (one-byte uint8_t for n, and then | 2975 + | | n bytes follow) | 2976 + +------------+------------------------------------------------+ 2977 + | 0x59 | byte string (two-byte uint16_t for n, and then | 2978 + | | n bytes follow) | 2979 + +------------+------------------------------------------------+ 2980 + | 0x5a | byte string (four-byte uint32_t for n, and | 2981 + | | then n bytes follow) | 2982 + +------------+------------------------------------------------+ 2983 + | 0x5b | byte string (eight-byte uint64_t for n, and | 2984 + | | then n bytes follow) | 2985 + +------------+------------------------------------------------+ 2986 + | 0x5f | byte string, byte strings follow, terminated | 2987 + | | by "break" | 2988 + +------------+------------------------------------------------+ 2989 + | 0x60..0x77 | UTF-8 string (0x00..0x17 bytes follow) | 2990 + +------------+------------------------------------------------+ 2991 + | 0x78 | UTF-8 string (one-byte uint8_t for n, and then | 2992 + | | n bytes follow) | 2993 + +------------+------------------------------------------------+ 2994 + | 0x79 | UTF-8 string (two-byte uint16_t for n, and | 2995 + | | then n bytes follow) | 2996 + +------------+------------------------------------------------+ 2997 + | 0x7a | UTF-8 string (four-byte uint32_t for n, and | 2998 + | | then n bytes follow) | 2999 + +------------+------------------------------------------------+ 3000 + | 0x7b | UTF-8 string (eight-byte uint64_t for n, and | 3001 + | | then n bytes follow) | 3002 + +------------+------------------------------------------------+ 3003 + | 0x7f | UTF-8 string, UTF-8 strings follow, terminated | 3004 + | | by "break" | 3005 + +------------+------------------------------------------------+ 3006 + | 0x80..0x97 | 
array (0x00..0x17 data items follow) | 3007 + +------------+------------------------------------------------+ 3008 + | 0x98 | array (one-byte uint8_t for n, and then n data | 3009 + | | items follow) | 3010 + +------------+------------------------------------------------+ 3011 + | 0x99 | array (two-byte uint16_t for n, and then n | 3012 + | | data items follow) | 3013 + +------------+------------------------------------------------+ 3014 + | 0x9a | array (four-byte uint32_t for n, and then n | 3015 + | | data items follow) | 3016 + +------------+------------------------------------------------+ 3017 + | 0x9b | array (eight-byte uint64_t for n, and then n | 3018 + | | data items follow) | 3019 + +------------+------------------------------------------------+ 3020 + | 0x9f | array, data items follow, terminated by | 3021 + | | "break" | 3022 + +------------+------------------------------------------------+ 3023 + | 0xa0..0xb7 | map (0x00..0x17 pairs of data items follow) | 3024 + +------------+------------------------------------------------+ 3025 + | 0xb8 | map (one-byte uint8_t for n, and then n pairs | 3026 + | | of data items follow) | 3027 + +------------+------------------------------------------------+ 3028 + | 0xb9 | map (two-byte uint16_t for n, and then n pairs | 3029 + | | of data items follow) | 3030 + +------------+------------------------------------------------+ 3031 + | 0xba | map (four-byte uint32_t for n, and then n | 3032 + | | pairs of data items follow) | 3033 + +------------+------------------------------------------------+ 3034 + | 0xbb | map (eight-byte uint64_t for n, and then n | 3035 + | | pairs of data items follow) | 3036 + +------------+------------------------------------------------+ 3037 + | 0xbf | map, pairs of data items follow, terminated by | 3038 + | | "break" | 3039 + +------------+------------------------------------------------+ 3040 + | 0xc0 | text-based date/time (data item follows; see | 3041 + | | Section 3.4.1) | 3042 + 
+------------+------------------------------------------------+ 3043 + | 0xc1 | epoch-based date/time (data item follows; see | 3044 + | | Section 3.4.2) | 3045 + +------------+------------------------------------------------+ 3046 + | 0xc2 | unsigned bignum (data item "byte string" | 3047 + | | follows) | 3048 + +------------+------------------------------------------------+ 3049 + | 0xc3 | negative bignum (data item "byte string" | 3050 + | | follows) | 3051 + +------------+------------------------------------------------+ 3052 + | 0xc4 | decimal Fraction (data item "array" follows; | 3053 + | | see Section 3.4.4) | 3054 + +------------+------------------------------------------------+ 3055 + | 0xc5 | bigfloat (data item "array" follows; see | 3056 + | | Section 3.4.4) | 3057 + +------------+------------------------------------------------+ 3058 + | 0xc6..0xd4 | (tag) | 3059 + +------------+------------------------------------------------+ 3060 + | 0xd5..0xd7 | expected conversion (data item follows; see | 3061 + | | Section 3.4.5.2) | 3062 + +------------+------------------------------------------------+ 3063 + | 0xd8..0xdb | (more tags; 1/2/4/8 bytes of tag number and | 3064 + | | then a data item follow) | 3065 + +------------+------------------------------------------------+ 3066 + | 0xe0..0xf3 | (simple value) | 3067 + +------------+------------------------------------------------+ 3068 + | 0xf4 | false | 3069 + +------------+------------------------------------------------+ 3070 + | 0xf5 | true | 3071 + +------------+------------------------------------------------+ 3072 + | 0xf6 | null | 3073 + +------------+------------------------------------------------+ 3074 + | 0xf7 | undefined | 3075 + +------------+------------------------------------------------+ 3076 + | 0xf8 | (simple value, one byte follows) | 3077 + +------------+------------------------------------------------+ 3078 + | 0xf9 | half-precision float (two-byte IEEE 754) | 3079 + 
+------------+------------------------------------------------+ 3080 + | 0xfa | single-precision float (four-byte IEEE 754) | 3081 + +------------+------------------------------------------------+ 3082 + | 0xfb | double-precision float (eight-byte IEEE 754) | 3083 + +------------+------------------------------------------------+ 3084 + | 0xff | "break" stop code | 3085 + +------------+------------------------------------------------+ 3086 + 3087 + Table 7: Jump Table for Initial Byte 3088 + 3089 + Appendix C. Pseudocode 3090 + 3091 + The well-formedness of a CBOR item can be checked by the pseudocode 3092 + in Figure 1. The data is well-formed if and only if: 3093 + 3094 + * the pseudocode does not "fail"; 3095 + 3096 + * after execution of the pseudocode, no bytes are left in the input 3097 + (except in streaming applications). 3098 + 3099 + The pseudocode has the following prerequisites: 3100 + 3101 + * take(n) reads n bytes from the input data and returns them as a 3102 + byte string. If n bytes are no longer available, take(n) fails. 3103 + 3104 + * uint() converts a byte string into an unsigned integer by 3105 + interpreting the byte string in network byte order. 3106 + 3107 + * Arithmetic works as in C. 3108 + 3109 + * All variables are unsigned integers of sufficient range. 3110 + 3111 + Note that "well_formed" returns the major type for well-formed 3112 + definite-length items, but 99 for an indefinite-length item (or -1 3113 + for a "break" stop code, only if "breakable" is set). This is used 3114 + in "well_formed_indefinite" to ascertain that indefinite-length 3115 + strings only contain definite-length strings as chunks. 
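For readers who want to run the check, here is a minimal Python sketch of the conventions just listed (take(n), uint(), and the 99 / -1 return values), following the structure of Figure 1. The class and function names (Reader, TooLittleData, NotWellFormed, check) are assumptions made for this illustration. The split into two exception types is likewise an illustrative choice, intended to keep "input ended early" apart from errors that more input cannot repair.

```python
class TooLittleData(Exception):
    """take(n) failed: the input ended before the item was complete."""


class NotWellFormed(Exception):
    """fail() was called: more input cannot repair the item."""


BREAK = -1        # "break" stop code seen where one is allowed
INDEFINITE = 99   # an indefinite-length item was consumed


class Reader:
    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise TooLittleData(f"need {n} bytes at offset {self.pos}")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def uint(b: bytes) -> int:
    return int.from_bytes(b, "big")  # network byte order


def well_formed(r: Reader, breakable: bool = False) -> int:
    ib = uint(r.take(1))
    mt, ai = ib >> 5, ib & 0x1F
    val = ai
    if 24 <= ai <= 27:
        val = uint(r.take(1 << (ai - 24)))   # 1, 2, 4, or 8 bytes
    elif ai in (28, 29, 30):
        raise NotWellFormed("reserved additional information")
    elif ai == 31:
        return well_formed_indefinite(r, mt, breakable)
    if mt in (2, 3):
        r.take(val)                          # bytes/UTF-8 payload
    elif mt == 4:
        for _ in range(val):
            well_formed(r)
    elif mt == 5:
        for _ in range(val * 2):
            well_formed(r)
    elif mt == 6:
        well_formed(r)                       # one embedded data item
    elif mt == 7 and ai == 24 and val < 32:
        raise NotWellFormed("bad simple value")
    return mt


def well_formed_indefinite(r: Reader, mt: int, breakable: bool) -> int:
    if mt in (2, 3):
        while (it := well_formed(r, True)) != BREAK:
            if it != mt:                     # need definite-length chunk of same type
                raise NotWellFormed("bad chunk in indefinite-length string")
    elif mt == 4:
        while well_formed(r, True) != BREAK:
            pass
    elif mt == 5:
        while well_formed(r, True) != BREAK:
            well_formed(r)                   # value position: not breakable
    elif mt == 7:
        if breakable:
            return BREAK
        raise NotWellFormed("break without enclosing indefinite-length item")
    else:
        raise NotWellFormed("indefinite length not allowed for this major type")
    return INDEFINITE


def check(data: bytes, exact: bool = True) -> str:
    r = Reader(data)
    try:
        well_formed(r)
    except TooLittleData:
        return "too little data"
    except NotWellFormed:
        return "syntax error"
    return "too much data" if exact and r.pos != len(data) else "well-formed"
```

For example, check(bytes.fromhex("9f018202039f0405ffff")) accepts the nested indefinite-length array from Table 6, while check(bytes.fromhex("ff")) reports a syntax error because the "break" stop code has no enclosing indefinite-length item. Passing exact=False corresponds to the streaming case, in which leftover bytes are not an error.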
3116 + 3117 + well_formed(breakable = false) { 3118 + // process initial bytes 3119 + ib = uint(take(1)); 3120 + mt = ib >> 5; 3121 + val = ai = ib & 0x1f; 3122 + switch (ai) { 3123 + case 24: val = uint(take(1)); break; 3124 + case 25: val = uint(take(2)); break; 3125 + case 26: val = uint(take(4)); break; 3126 + case 27: val = uint(take(8)); break; 3127 + case 28: case 29: case 30: fail(); 3128 + case 31: 3129 + return well_formed_indefinite(mt, breakable); 3130 + } 3131 + // process content 3132 + switch (mt) { 3133 + // case 0, 1, 7 do not have content; just use val 3134 + case 2: case 3: take(val); break; // bytes/UTF-8 3135 + case 4: for (i = 0; i < val; i++) well_formed(); break; 3136 + case 5: for (i = 0; i < val*2; i++) well_formed(); break; 3137 + case 6: well_formed(); break; // 1 embedded data item 3138 + case 7: if (ai == 24 && val < 32) fail(); // bad simple 3139 + } 3140 + return mt; // definite-length data item 3141 + } 3142 + 3143 + well_formed_indefinite(mt, breakable) { 3144 + switch (mt) { 3145 + case 2: case 3: 3146 + while ((it = well_formed(true)) != -1) 3147 + if (it != mt) // need definite-length chunk 3148 + fail(); // of same type 3149 + break; 3150 + case 4: while (well_formed(true) != -1); break; 3151 + case 5: while (well_formed(true) != -1) well_formed(); break; 3152 + case 7: 3153 + if (breakable) 3154 + return -1; // signal break out 3155 + else fail(); // no enclosing indefinite 3156 + default: fail(); // wrong mt 3157 + } 3158 + return 99; // indefinite-length data item 3159 + } 3160 + 3161 + Figure 1: Pseudocode for Well-Formedness Check 3162 + 3163 + Note that the remaining complexity of a complete CBOR decoder is 3164 + about presenting data that has been decoded to the application in an 3165 + appropriate form. 3166 + 3167 + Major types 0 and 1 are designed in such a way that they can be 3168 + encoded in C from a signed integer without actually doing an if-then- 3169 + else for positive/negative (Figure 2). 
This uses the fact that 3170 + (-1-n), the transformation for major type 1, is the same as ~n 3171 + (bitwise complement) in C unsigned arithmetic; ~n can then be 3172 + expressed as (-1)^n for the negative case, while 0^n leaves n 3173 + unchanged for nonnegative. The sign of a number can be converted to 3174 + -1 for negative and 0 for nonnegative (0 or positive) by arithmetic- 3175 + shifting the number by one bit less than the bit length of the number 3176 + (for example, by 63 for 64-bit numbers). 3177 + 3178 + void encode_sint(int64_t n) { 3179 + uint64_t ui = n >> 63; // extend sign to whole length 3180 + unsigned mt = ui & 0x20; // extract (shifted) major type 3181 + ui ^= n; // complement negatives 3182 + if (ui < 24) 3183 + *p++ = mt + ui; 3184 + else if (ui < 256) { 3185 + *p++ = mt + 24; 3186 + *p++ = ui; 3187 + } else 3188 + ... 3189 + 3190 + Figure 2: Pseudocode for Encoding a Signed Integer 3191 + 3192 + See Section 1.2 for some specific assumptions about the profile of 3193 + the C language used in these pieces of code. 3194 + 3195 + Appendix D. Half-Precision 3196 + 3197 + As half-precision floating-point numbers were only added to IEEE 754 3198 + in 2008 [IEEE754], today's programming platforms often still only 3199 + have limited support for them. It is very easy to include at least 3200 + decoding support for them even without such support. An example of a 3201 + small decoder for half-precision floating-point numbers in the C 3202 + language is shown in Figure 3. A similar program for Python is in 3203 + Figure 4; this code assumes that the 2-byte value has already been 3204 + decoded as an (unsigned short) integer in network byte order (as 3205 + would be done by the pseudocode in Appendix C). 
3206 + 3207 + #include <math.h> 3208 + 3209 + double decode_half(unsigned char *halfp) { 3210 + unsigned half = (halfp[0] << 8) + halfp[1]; 3211 + unsigned exp = (half >> 10) & 0x1f; 3212 + unsigned mant = half & 0x3ff; 3213 + double val; 3214 + if (exp == 0) val = ldexp(mant, -24); 3215 + else if (exp != 31) val = ldexp(mant + 1024, exp - 25); 3216 + else val = mant == 0 ? INFINITY : NAN; 3217 + return half & 0x8000 ? -val : val; 3218 + } 3219 + 3220 + Figure 3: C Code for a Half-Precision Decoder 3221 + 3222 + import struct 3223 + from math import ldexp 3224 + 3225 + def decode_single(single): 3226 + return struct.unpack("!f", struct.pack("!I", single))[0] 3227 + 3228 + def decode_half(half): 3229 + valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16 3230 + if ((half & 0x7c00) != 0x7c00): 3231 + return ldexp(decode_single(valu), 112) 3232 + return decode_single(valu | 0x7f800000) 3233 + 3234 + Figure 4: Python Code for a Half-Precision Decoder 3235 + 3236 + Appendix E. Comparison of Other Binary Formats to CBOR's Design 3237 + Objectives 3238 + 3239 + The proposal for CBOR follows a history of binary formats that is as 3240 + long as the history of computers themselves. Different formats have 3241 + had different objectives. In most cases, the objectives of the 3242 + format were never stated, although they can sometimes be implied by 3243 + the context where the format was first used. Some formats were meant 3244 + to be universally usable, although history has proven that no binary 3245 + format meets the needs of all protocols and applications. 3246 + 3247 + CBOR differs from many of these formats due to it starting with a set 3248 + of objectives and attempting to meet just those. This section 3249 + compares a few of the dozens of formats with CBOR's objectives in 3250 + order to help the reader decide if they want to use CBOR or a 3251 + different format for a particular protocol or application. 
3252 + 3253 + Note that the discussion here is not meant to be a criticism of any 3254 + format: to the best of our knowledge, no format before CBOR was meant 3255 + to cover CBOR's objectives in the priority we have assigned them. A 3256 + brief recap of the objectives from Section 1.1 is: 3257 + 3258 + 1. unambiguous encoding of most common data formats from Internet 3259 + standards 3260 + 3261 + 2. code compactness for encoder or decoder 3262 + 3263 + 3. no schema description needed 3264 + 3265 + 4. reasonably compact serialization 3266 + 3267 + 5. applicability to constrained and unconstrained applications 3268 + 3269 + 6. good JSON conversion 3270 + 3271 + 7. extensibility 3272 + 3273 + A discussion of CBOR and other formats with respect to a different 3274 + set of design objectives is provided in Section 5 and Appendix C of 3275 + [RFC8618]. 3276 + 3277 + E.1. ASN.1 DER, BER, and PER 3278 + 3279 + [ASN.1] has many serializations. In the IETF, DER and BER are the 3280 + most common. The serialized output is not particularly compact for 3281 + many items, and the code needed to decode numeric items can be 3282 + complex on a constrained device. 3283 + 3284 + Few (if any) IETF protocols have adopted one of the several variants 3285 + of Packed Encoding Rules (PER). There could be many reasons for 3286 + this, but one that is commonly stated is that PER makes use of the 3287 + schema even for parsing the surface structure of the data item, 3288 + requiring significant tool support. There are different versions of 3289 + the ASN.1 schema language in use, which has also hampered adoption. 3290 + 3291 + E.2. MessagePack 3292 + 3293 + [MessagePack] is a concise, widely implemented counted binary 3294 + serialization format, similar in many properties to CBOR, although 3295 + somewhat less regular. 
While the data model can be used to represent 3296 + JSON data, MessagePack has also been used in many remote procedure 3297 + call (RPC) applications and for long-term storage of data. 3298 + 3299 + MessagePack has been essentially stable since it was first published 3300 + around 2011; it has not yet had a transition. The evolution of 3301 + MessagePack is impeded by an imperative to maintain complete 3302 + backwards compatibility with existing stored data, while only few 3303 + bytecodes are still available for extension. Repeated requests over 3304 + the years from the MessagePack user community to separate out binary 3305 + and text strings in the encoding recently have led to an extension 3306 + proposal that would leave MessagePack's "raw" data ambiguous between 3307 + its usages for binary and text data. The extension mechanism for 3308 + MessagePack remains unclear. 3309 + 3310 + E.3. BSON 3311 + 3312 + [BSON] is a data format that was developed for the storage of JSON- 3313 + like maps (JSON objects) in the MongoDB database. Its major 3314 + distinguishing feature is the capability for in-place update, which 3315 + prevents a compact representation. BSON uses a counted 3316 + representation except for map keys, which are null-byte terminated. 3317 + While BSON can be used for the representation of JSON-like objects on 3318 + the wire, its specification is dominated by the requirements of the 3319 + database application and has become somewhat baroque. The status of 3320 + how BSON extensions will be implemented remains unclear. 3321 + 3322 + E.4. MSDTP: RFC 713 3323 + 3324 + Message Services Data Transmission (MSDTP) is a very early example of 3325 + a compact message format; it is described in [RFC0713], written in 3326 + 1976. It is included here for its historical value, not because it 3327 + was ever widely used. 3328 + 3329 + E.5. 
Conciseness on the Wire 3330 + 3331 + While CBOR's design objective of code compactness for encoders and 3332 + decoders is a higher priority than its objective of conciseness on 3333 + the wire, many people focus on the wire size. Table 8 shows some 3334 + encoding examples for the simple nested array [1, [2, 3]]; where some 3335 + form of indefinite-length encoding is supported by the encoding, 3336 + [_ 1, [2, 3]] (indefinite length on the outer array) is also shown. 3337 + 3338 + +=============+============================+================+ 3339 + | Format | [1, [2, 3]] | [_ 1, [2, 3]] | 3340 + +=============+============================+================+ 3341 + | RFC 713 | c2 05 81 c2 02 82 83 | | 3342 + +-------------+----------------------------+----------------+ 3343 + | ASN.1 BER | 30 0b 02 01 01 30 06 02 01 | 30 80 02 01 01 | 3344 + | | 02 02 01 03 | 30 06 02 01 02 | 3345 + | | | 02 01 03 00 00 | 3346 + +-------------+----------------------------+----------------+ 3347 + | MessagePack | 92 01 92 02 03 | | 3348 + +-------------+----------------------------+----------------+ 3349 + | BSON | 22 00 00 00 10 30 00 01 00 | | 3350 + | | 00 00 04 31 00 13 00 00 00 | | 3351 + | | 10 30 00 02 00 00 00 10 31 | | 3352 + | | 00 03 00 00 00 00 00 | | 3353 + +-------------+----------------------------+----------------+ 3354 + | CBOR | 82 01 82 02 03 | 9f 01 82 02 03 | 3355 + | | | ff | 3356 + +-------------+----------------------------+----------------+ 3357 + 3358 + Table 8: Examples for Different Levels of Conciseness 3359 + 3360 + Appendix F. Well-Formedness Errors and Examples 3361 + 3362 + There are three basic kinds of well-formedness errors that can occur 3363 + in decoding a CBOR data item: 3364 + 3365 + Too much data: There are input bytes left that were not consumed. 3366 + This is only an error if the application assumed that the input 3367 + bytes would span exactly one data item. 
Where the application 3368 + uses the self-delimiting nature of CBOR encoding to permit 3369 + additional data after the data item, as is done in CBOR sequences 3370 + [RFC8742], for example, the CBOR decoder can simply indicate which 3371 + part of the input has not been consumed. 3372 + 3373 + Too little data: The input data available would need additional 3374 + bytes added at their end for a complete CBOR data item. This may 3375 + indicate the input is truncated; it is also a common error when 3376 + trying to decode random data as CBOR. For some applications, 3377 + however, this may not actually be an error, as the application may 3378 + not be certain it has all the data yet and can obtain or wait for 3379 + additional input bytes. Some of these applications may have an 3380 + upper limit for how much additional data can appear; here the 3381 + decoder may be able to indicate that the encoded CBOR data item 3382 + cannot be completed within this limit. 3383 + 3384 + Syntax error: The input data are not consistent with the 3385 + requirements of the CBOR encoding, and this cannot be remedied by 3386 + adding (or removing) data at the end. 3387 + 3388 + In Appendix C, errors of the first kind are addressed in the first 3389 + paragraph and bullet list (requiring "no bytes are left"), and errors 3390 + of the second kind are addressed in the second paragraph/bullet list 3391 + (failing "if n bytes are no longer available"). 
   Errors of the third
   kind are identified in the pseudocode by specific instances of
   calling fail(), in order:

   * a reserved value is used for additional information (28, 29, 30)

   * major type 7, additional information 24, value < 32 (incorrect)

   * incorrect substructure of indefinite-length byte string or text
     string (may only contain definite-length strings of the same major
     type)

   * "break" stop code (major type 7, additional information 31) occurs
     in a value position of a map or except at a position directly in
     an indefinite-length item where also another enclosed data item
     could occur

   * additional information 31 used with major type 0, 1, or 6

F.1. Examples of CBOR Data Items That Are Not Well-Formed

   This subsection shows a few examples for CBOR data items that are not
   well-formed. Each example is a sequence of bytes, each shown in
   hexadecimal; multiple examples in a list are separated by commas.

   Examples for well-formedness error kind 1 (too much data) can easily
   be formed by adding data to a well-formed encoded CBOR data item.

   Similarly, examples for well-formedness error kind 2 (too little
   data) can be formed by truncating a well-formed encoded CBOR data
   item. In test suites, it may be beneficial to specifically test with
   incomplete data items that would require large amounts of addition to
   be completed (for instance by starting the encoding of a string of a
   very large size).

   A premature end of the input can occur in a head or within the
   enclosed data, which may be bare strings or enclosed data items that
   are either counted or should have been ended by a "break" stop code.

   End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 03
      04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa 00
      00, fb 00 00 00

   Definite-length strings with short data: 41, 61, 5a ff ff ff ff 00,
      5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f ff
      ff ff ff ff ff ff 01 02 03

   Definite-length maps and arrays not closed with enough items: 81, 81
      81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00 00

   Tag number not followed by tag content: c0

   Indefinite-length strings not closed by a "break" stop code: 5f 41
      00, 7f 61 00

   Indefinite-length maps and arrays not closed by a "break" stop
      code: 9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f
      9f ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff

   A few examples for the five subkinds of well-formedness error kind 3
   (syntax error) are shown below.

   Subkind 1:
      Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e,
         5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc,
         fd, fe,

   Subkind 2:
      Reserved two-byte encodings of simple values: f8 00, f8 01, f8
         18, f8 1f

   Subkind 3:
      Indefinite-length string chunks not of the correct type: 5f 00
         ff, 5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f
         e0 ff, 7f 41 00 ff

      Indefinite-length string chunks not definite length: 5f 5f 41 00
         ff ff, 7f 7f 61 00 ff ff

   Subkind 4:
      Break occurring on its own outside of an indefinite-length
         item: ff

      Break occurring in a definite-length array or map or a tag: 81
         ff, 82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff,
         9f 82 9f 81 9f 9f ff ff ff ff

      Break in an indefinite-length map that would lead to an odd
         number of items (break in a value position): bf 00 ff, bf 00 00
         00 ff

   Subkind 5:
      Major type 0, 1, 6 with additional information 31: 1f, 3f, df

Appendix G. Changes from RFC 7049

   As discussed in the introduction, this document formally obsoletes
   RFC 7049 while keeping full compatibility with the interchange format
   from RFC 7049. This document provides editorial improvements, added
   detail, and fixed errata. This document does not create a new
   version of the format.

G.1. Errata Processing and Clerical Changes

   The two verified errata on RFC 7049, [Err3764] and [Err3770],
   concerned two encoding examples in the text that have been corrected
   (Section 3.4.3: "29" -> "49", Section 5.5: "0b000_11101" ->
   "0b000_11001"). Also, RFC 7049 contained an example using the
   numeric value 24 for a simple value [Err5917], which is not well-
   formed; this example has been removed.
   Errata report 5763 [Err5763]
   pointed to an error in the wording of the definition of tags; this
   was resolved during a rewrite of Section 3.4. Errata report 5434
   [Err5434] pointed out that the Universal Binary JSON (UBJSON) example
   in Appendix E no longer complied with the version of UBJSON current
   at the time of the errata report submission. It turned out that the
   UBJSON specification had completely changed since 2013; this example
   therefore was removed. Other errata reports [Err4409] [Err4963]
   [Err4964] complained that the map key sorting rules for canonical
   encoding were onerous; these led to a reconsideration of the
   canonical encoding suggestions and replacement by the deterministic
   encoding suggestions (described below). An editorial suggestion in
   errata report 4294 [Err4294] was also implemented (improved symmetry
   by adding "Second value" to a comment to the last example in
   Section 3.2.2).

   Other clerical changes include:

   * the use of new xml2rfc functionality [RFC7991];

   * more explanation of the notation used;

   * the update of references, e.g., from RFC 4627 to [RFC8259], from
     CNN-TERMS to [RFC7228], and from the 5.1 edition to the 11th
     edition of [ECMA262]; the addition of a reference to [IEEE754] and
     importation of required definitions; the addition of references to
     [C] and [Cplusplus20]; and the addition of a reference to
     [RFC8618] that further illustrates the discussion in Appendix E;

   * in the discussion of diagnostic notation (Section 8), the
     "Extended Diagnostic Notation" (EDN) defined in [RFC8610] is now
     mentioned, the gap in representing NaN payloads is now
     highlighted, and an explanation of representing indefinite-length
     strings with no chunks has been added (Section 8.1);

   * the addition of this appendix.
3536 + 3537 + G.2. Changes in IANA Considerations 3538 + 3539 + The IANA considerations were generally updated (clerical changes, 3540 + e.g., now pointing to the CBOR Working Group as the author of the 3541 + specification). References to the respective IANA registries were 3542 + added to the informative references. 3543 + 3544 + In the "Concise Binary Object Representation (CBOR) Tags" registry 3545 + [IANA.cbor-tags], tags in the space from 256 to 32767 (lower half of 3546 + "1+2") are no longer assigned by First Come First Served; this range 3547 + is now Specification Required. 3548 + 3549 + G.3. Changes in Suggestions and Other Informational Components 3550 + 3551 + While revising the document, beyond the addressing of the errata 3552 + reports, the working group drew upon nearly seven years of experience 3553 + with CBOR in a diverse set of applications. This led to a number of 3554 + editorial changes, including adding tables for illustration, but also 3555 + emphasizing some aspects and de-emphasizing others. 3556 + 3557 + A significant addition is Section 2, which discusses the CBOR data 3558 + model and its small variations involved in the processing of CBOR. 3559 + The introduction of terms for those variations (basic generic, 3560 + extended generic, specific) enables more concise language in other 3561 + places of the document and also helps to clarify expectations of 3562 + implementations and of the extensibility features of the format. 3563 + 3564 + As a format derived from the JSON ecosystem, RFC 7049 was influenced 3565 + by the JSON number system that was in turn inherited from JavaScript 3566 + at the time. JSON does not provide distinct integers and floating- 3567 + point values (and the latter are decimal in the format). CBOR 3568 + provides binary representations of numbers, which do differ between 3569 + integers and floating-point values. 
Experience from implementation 3570 + and use suggested that the separation between these two number 3571 + domains should be more clearly drawn in the document; language that 3572 + suggested an integer could seamlessly stand in for a floating-point 3573 + value was removed. Also, a suggestion (based on I-JSON [RFC7493]) 3574 + was added for handling these types when converting JSON to CBOR, and 3575 + the use of a specific rounding mechanism has been recommended. 3576 + 3577 + For a single value in the data model, CBOR often provides multiple 3578 + encoding options. A new section (Section 4) introduces the term 3579 + "preferred serialization" (Section 4.1) and defines it for various 3580 + kinds of data items. On the basis of this terminology, the section 3581 + then discusses how a CBOR-based protocol can define "deterministic 3582 + encoding" (Section 4.2), which avoids terms "canonical" and 3583 + "canonicalization" from RFC 7049. The suggestion of "Core 3584 + Deterministic Encoding Requirements" (Section 4.2.1) enables generic 3585 + support for such protocol-defined encoding requirements. This 3586 + document further eases the implementation of deterministic encoding 3587 + by simplifying the map ordering suggested in RFC 7049 to a simple 3588 + lexicographic ordering of encoded keys. A description of the older 3589 + suggestion is kept as an alternative, now termed "length-first map 3590 + key ordering" (Section 4.2.3). 3591 + 3592 + The terminology for well-formed and valid data was sharpened and more 3593 + stringently used, avoiding less well-defined alternative terms such 3594 + as "syntax error", "decoding error", and "strict mode" outside of 3595 + examples. Also, a third level of requirements that an application 3596 + has on its input data beyond CBOR-level validity is now explicitly 3597 + called out. 
Well-formed (processable at all), valid (checked by a 3598 + validity-checking generic decoder), and expected input (as checked by 3599 + the application) are treated as a hierarchy of layers of 3600 + acceptability. 3601 + 3602 + The handling of non-well-formed simple values was clarified in text 3603 + and pseudocode. Appendix F was added to discuss well-formedness 3604 + errors and provide examples for them. The pseudocode was updated to 3605 + be more portable, and some portability considerations were added. 3606 + 3607 + The discussion of validity has been sharpened in two areas. Map 3608 + validity (handling of duplicate keys) was clarified, and the domain 3609 + of applicability of certain implementation choices explained. Also, 3610 + while streamlining the terminology for tags, tag numbers, and tag 3611 + content, discussion was added on tag validity, and the restrictions 3612 + were clarified on tag content, in general and specifically for tag 1. 3613 + 3614 + An implementation note (and note for future tag definitions) was 3615 + added to Section 3.4 about defining tags with semantics that depend 3616 + on serialization order. 3617 + 3618 + Tag 35 is not defined by this document; the registration based on the 3619 + definition in RFC 7049 remains in place. 3620 + 3621 + Terminology was introduced in Section 3 for "argument" and "head", 3622 + simplifying further discussion. 3623 + 3624 + The security considerations (Section 10) were mostly rewritten and 3625 + significantly expanded; in multiple other places, the document is now 3626 + more explicit that a decoder cannot simply condone well-formedness 3627 + errors. 3628 + 3629 + Acknowledgements 3630 + 3631 + CBOR was inspired by MessagePack. MessagePack was developed and 3632 + promoted by Sadayuki Furuhashi ("frsyuki"). 
This reference to 3633 + MessagePack is solely for attribution; CBOR is not intended as a 3634 + version of, or replacement for, MessagePack, as it has different 3635 + design goals and requirements. 3636 + 3637 + The need for functionality beyond the original MessagePack 3638 + specification became obvious to many people at about the same time 3639 + around the year 2012. BinaryPack is a minor derivation of 3640 + MessagePack that was developed by Eric Zhang for the binaryjs 3641 + project. A similar, but different, extension was made by Tim Caswell 3642 + for his msgpack-js and msgpack-js-browser projects. Many people have 3643 + contributed to the discussion about extending MessagePack to separate 3644 + text string representation from byte string representation. 3645 + 3646 + The encoding of the additional information in CBOR was inspired by 3647 + the encoding of length information designed by Klaus Hartke for CoAP. 3648 + 3649 + This document also incorporates suggestions made by many people, 3650 + notably Dan Frost, James Manger, Jeffrey Yasskin, Joe Hildebrand, 3651 + Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael 3652 + Richardson, Nico Williams, Peter Occil, Phillip Hallam-Baker, Ray 3653 + Polk, Stuart Cheshire, Tim Bray, Tony Finch, Tony Hansen, and Yaron 3654 + Sheffer. Benjamin Kaduk provided an extensive review during IESG 3655 + processing. Éric Vyncke, Erik Kline, Robert Wilton, and Roman Danyliw 3656 + provided further IESG comments, which included an IoT directorate 3657 + review by Eve Schooler. 3658 + 3659 + Authors' Addresses 3660 + 3661 + Carsten Bormann 3662 + Universität Bremen TZI 3663 + Postfach 330440 3664 + D-28359 Bremen 3665 + Germany 3666 + 3667 + Phone: +49-421-218-63921 3668 + Email: cabo@tzi.org 3669 + 3670 + 3671 + Paul Hoffman 3672 + ICANN 3673 + 3674 + Email: paul.hoffman@icann.org
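Two of the syntax-error subkinds listed in Appendix F of the specification text above (reserved additional information 28 to 30, and additional information 31 used with major type 0, 1, or 6) can be recognised from the initial byte alone. The following is a minimal sketch of that check, not code from this package: the type `initial_byte` and the function `classify_initial_byte` are hypothetical names introduced only for illustration, and the remaining subkinds need decoder context that this sketch does not model.

```ocaml
(* Hypothetical helper, not part of the cbor library in this commit.
   Classifies the initial byte of a CBOR head using only the
   per-byte rules from Appendix F. *)
type initial_byte =
  | Plausible                 (* no per-byte rule is violated *)
  | Break                     (* 0xff: valid only inside an indefinite-length item *)
  | Reserved_additional_info  (* subkind 1: additional information 28, 29, 30 *)
  | Indefinite_not_allowed    (* subkind 5: additional information 31 with major type 0, 1, 6 *)

let classify_initial_byte (b : int) : initial_byte =
  let major = (b lsr 5) land 0x7 in   (* top three bits: major type *)
  let ai = b land 0x1f in             (* low five bits: additional information *)
  match (major, ai) with
  | _, (28 | 29 | 30) -> Reserved_additional_info
  | (0 | 1 | 6), 31 -> Indefinite_not_allowed
  | 7, 31 -> Break
  | _ -> Plausible
```

A `Break` result is not an error by itself; whether it is well-formed depends on where the byte occurs, which is why the sketch reports it separately instead of rejecting it.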
+494
test-vectors/appendix_a.json
··· 1 + [ 2 + { 3 + "cbor": "AA==", 4 + "hex": "00", 5 + "roundtrip": true, 6 + "decoded": 0 7 + }, 8 + { 9 + "cbor": "AQ==", 10 + "hex": "01", 11 + "roundtrip": true, 12 + "decoded": 1 13 + }, 14 + { 15 + "cbor": "Cg==", 16 + "hex": "0a", 17 + "roundtrip": true, 18 + "decoded": 10 19 + }, 20 + { 21 + "cbor": "Fw==", 22 + "hex": "17", 23 + "roundtrip": true, 24 + "decoded": 23 25 + }, 26 + { 27 + "cbor": "GBg=", 28 + "hex": "1818", 29 + "roundtrip": true, 30 + "decoded": 24 31 + }, 32 + { 33 + "cbor": "GBk=", 34 + "hex": "1819", 35 + "roundtrip": true, 36 + "decoded": 25 37 + }, 38 + { 39 + "cbor": "GGQ=", 40 + "hex": "1864", 41 + "roundtrip": true, 42 + "decoded": 100 43 + }, 44 + { 45 + "cbor": "GQPo", 46 + "hex": "1903e8", 47 + "roundtrip": true, 48 + "decoded": 1000 49 + }, 50 + { 51 + "cbor": "GgAPQkA=", 52 + "hex": "1a000f4240", 53 + "roundtrip": true, 54 + "decoded": 1000000 55 + }, 56 + { 57 + "cbor": "GwAAAOjUpRAA", 58 + "hex": "1b000000e8d4a51000", 59 + "roundtrip": true, 60 + "decoded": 1000000000000 61 + }, 62 + { 63 + "cbor": "G///////////", 64 + "hex": "1bffffffffffffffff", 65 + "roundtrip": true, 66 + "decoded": 18446744073709551615 67 + }, 68 + { 69 + "cbor": "wkkBAAAAAAAAAAA=", 70 + "hex": "c249010000000000000000", 71 + "roundtrip": true, 72 + "decoded": 18446744073709551616 73 + }, 74 + { 75 + "cbor": "O///////////", 76 + "hex": "3bffffffffffffffff", 77 + "roundtrip": true, 78 + "decoded": -18446744073709551616 79 + }, 80 + { 81 + "cbor": "w0kBAAAAAAAAAAA=", 82 + "hex": "c349010000000000000000", 83 + "roundtrip": true, 84 + "decoded": -18446744073709551617 85 + }, 86 + { 87 + "cbor": "IA==", 88 + "hex": "20", 89 + "roundtrip": true, 90 + "decoded": -1 91 + }, 92 + { 93 + "cbor": "KQ==", 94 + "hex": "29", 95 + "roundtrip": true, 96 + "decoded": -10 97 + }, 98 + { 99 + "cbor": "OGM=", 100 + "hex": "3863", 101 + "roundtrip": true, 102 + "decoded": -100 103 + }, 104 + { 105 + "cbor": "OQPn", 106 + "hex": "3903e7", 107 + "roundtrip": true, 108 + 
"decoded": -1000 109 + }, 110 + { 111 + "cbor": "+QAA", 112 + "hex": "f90000", 113 + "roundtrip": true, 114 + "decoded": 0.0 115 + }, 116 + { 117 + "cbor": "+YAA", 118 + "hex": "f98000", 119 + "roundtrip": true, 120 + "decoded": -0.0 121 + }, 122 + { 123 + "cbor": "+TwA", 124 + "hex": "f93c00", 125 + "roundtrip": true, 126 + "decoded": 1.0 127 + }, 128 + { 129 + "cbor": "+z/xmZmZmZma", 130 + "hex": "fb3ff199999999999a", 131 + "roundtrip": true, 132 + "decoded": 1.1 133 + }, 134 + { 135 + "cbor": "+T4A", 136 + "hex": "f93e00", 137 + "roundtrip": true, 138 + "decoded": 1.5 139 + }, 140 + { 141 + "cbor": "+Xv/", 142 + "hex": "f97bff", 143 + "roundtrip": true, 144 + "decoded": 65504.0 145 + }, 146 + { 147 + "cbor": "+kfDUAA=", 148 + "hex": "fa47c35000", 149 + "roundtrip": true, 150 + "decoded": 100000.0 151 + }, 152 + { 153 + "cbor": "+n9///8=", 154 + "hex": "fa7f7fffff", 155 + "roundtrip": true, 156 + "decoded": 3.4028234663852886e+38 157 + }, 158 + { 159 + "cbor": "+3435DyIAHWc", 160 + "hex": "fb7e37e43c8800759c", 161 + "roundtrip": true, 162 + "decoded": 1.0e+300 163 + }, 164 + { 165 + "cbor": "+QAB", 166 + "hex": "f90001", 167 + "roundtrip": true, 168 + "decoded": 5.960464477539063e-08 169 + }, 170 + { 171 + "cbor": "+QQA", 172 + "hex": "f90400", 173 + "roundtrip": true, 174 + "decoded": 6.103515625e-05 175 + }, 176 + { 177 + "cbor": "+cQA", 178 + "hex": "f9c400", 179 + "roundtrip": true, 180 + "decoded": -4.0 181 + }, 182 + { 183 + "cbor": "+8AQZmZmZmZm", 184 + "hex": "fbc010666666666666", 185 + "roundtrip": true, 186 + "decoded": -4.1 187 + }, 188 + { 189 + "cbor": "+XwA", 190 + "hex": "f97c00", 191 + "roundtrip": true, 192 + "diagnostic": "Infinity" 193 + }, 194 + { 195 + "cbor": "+X4A", 196 + "hex": "f97e00", 197 + "roundtrip": true, 198 + "diagnostic": "NaN" 199 + }, 200 + { 201 + "cbor": "+fwA", 202 + "hex": "f9fc00", 203 + "roundtrip": true, 204 + "diagnostic": "-Infinity" 205 + }, 206 + { 207 + "cbor": "+n+AAAA=", 208 + "hex": "fa7f800000", 209 + 
"roundtrip": false, 210 + "diagnostic": "Infinity" 211 + }, 212 + { 213 + "cbor": "+n/AAAA=", 214 + "hex": "fa7fc00000", 215 + "roundtrip": false, 216 + "diagnostic": "NaN" 217 + }, 218 + { 219 + "cbor": "+v+AAAA=", 220 + "hex": "faff800000", 221 + "roundtrip": false, 222 + "diagnostic": "-Infinity" 223 + }, 224 + { 225 + "cbor": "+3/wAAAAAAAA", 226 + "hex": "fb7ff0000000000000", 227 + "roundtrip": false, 228 + "diagnostic": "Infinity" 229 + }, 230 + { 231 + "cbor": "+3/4AAAAAAAA", 232 + "hex": "fb7ff8000000000000", 233 + "roundtrip": false, 234 + "diagnostic": "NaN" 235 + }, 236 + { 237 + "cbor": "+//wAAAAAAAA", 238 + "hex": "fbfff0000000000000", 239 + "roundtrip": false, 240 + "diagnostic": "-Infinity" 241 + }, 242 + { 243 + "cbor": "9A==", 244 + "hex": "f4", 245 + "roundtrip": true, 246 + "decoded": false 247 + }, 248 + { 249 + "cbor": "9Q==", 250 + "hex": "f5", 251 + "roundtrip": true, 252 + "decoded": true 253 + }, 254 + { 255 + "cbor": "9g==", 256 + "hex": "f6", 257 + "roundtrip": true, 258 + "decoded": null 259 + }, 260 + { 261 + "cbor": "9w==", 262 + "hex": "f7", 263 + "roundtrip": true, 264 + "diagnostic": "undefined" 265 + }, 266 + { 267 + "cbor": "8A==", 268 + "hex": "f0", 269 + "roundtrip": true, 270 + "diagnostic": "simple(16)" 271 + }, 272 + { 273 + "cbor": "+Bg=", 274 + "hex": "f818", 275 + "roundtrip": true, 276 + "diagnostic": "simple(24)" 277 + }, 278 + { 279 + "cbor": "+P8=", 280 + "hex": "f8ff", 281 + "roundtrip": true, 282 + "diagnostic": "simple(255)" 283 + }, 284 + { 285 + "cbor": "wHQyMDEzLTAzLTIxVDIwOjA0OjAwWg==", 286 + "hex": "c074323031332d30332d32315432303a30343a30305a", 287 + "roundtrip": true, 288 + "diagnostic": "0(\"2013-03-21T20:04:00Z\")" 289 + }, 290 + { 291 + "cbor": "wRpRS2ew", 292 + "hex": "c11a514b67b0", 293 + "roundtrip": true, 294 + "diagnostic": "1(1363896240)" 295 + }, 296 + { 297 + "cbor": "wftB1FLZ7CAAAA==", 298 + "hex": "c1fb41d452d9ec200000", 299 + "roundtrip": true, 300 + "diagnostic": "1(1363896240.5)" 301 + }, 302 + 
{ 303 + "cbor": "10QBAgME", 304 + "hex": "d74401020304", 305 + "roundtrip": true, 306 + "diagnostic": "23(h'01020304')" 307 + }, 308 + { 309 + "cbor": "2BhFZElFVEY=", 310 + "hex": "d818456449455446", 311 + "roundtrip": true, 312 + "diagnostic": "24(h'6449455446')" 313 + }, 314 + { 315 + "cbor": "2CB2aHR0cDovL3d3dy5leGFtcGxlLmNvbQ==", 316 + "hex": "d82076687474703a2f2f7777772e6578616d706c652e636f6d", 317 + "roundtrip": true, 318 + "diagnostic": "32(\"http://www.example.com\")" 319 + }, 320 + { 321 + "cbor": "QA==", 322 + "hex": "40", 323 + "roundtrip": true, 324 + "diagnostic": "h''" 325 + }, 326 + { 327 + "cbor": "RAECAwQ=", 328 + "hex": "4401020304", 329 + "roundtrip": true, 330 + "diagnostic": "h'01020304'" 331 + }, 332 + { 333 + "cbor": "YA==", 334 + "hex": "60", 335 + "roundtrip": true, 336 + "decoded": "" 337 + }, 338 + { 339 + "cbor": "YWE=", 340 + "hex": "6161", 341 + "roundtrip": true, 342 + "decoded": "a" 343 + }, 344 + { 345 + "cbor": "ZElFVEY=", 346 + "hex": "6449455446", 347 + "roundtrip": true, 348 + "decoded": "IETF" 349 + }, 350 + { 351 + "cbor": "YiJc", 352 + "hex": "62225c", 353 + "roundtrip": true, 354 + "decoded": "\"\\" 355 + }, 356 + { 357 + "cbor": "YsO8", 358 + "hex": "62c3bc", 359 + "roundtrip": true, 360 + "decoded": "ü" 361 + }, 362 + { 363 + "cbor": "Y+awtA==", 364 + "hex": "63e6b0b4", 365 + "roundtrip": true, 366 + "decoded": "水" 367 + }, 368 + { 369 + "cbor": "ZPCQhZE=", 370 + "hex": "64f0908591", 371 + "roundtrip": true, 372 + "decoded": "𐅑" 373 + }, 374 + { 375 + "cbor": "gA==", 376 + "hex": "80", 377 + "roundtrip": true, 378 + "decoded": [] 379 + }, 380 + { 381 + "cbor": "gwECAw==", 382 + "hex": "83010203", 383 + "roundtrip": true, 384 + "decoded": [1, 2, 3] 385 + }, 386 + { 387 + "cbor": "gwGCAgOCBAU=", 388 + "hex": "8301820203820405", 389 + "roundtrip": true, 390 + "decoded": [1, [2, 3], [4, 5]] 391 + }, 392 + { 393 + "cbor": "mBkBAgMEBQYHCAkKCwwNDg8QERITFBUWFxgYGBk=", 394 + "hex": 
"98190102030405060708090a0b0c0d0e0f101112131415161718181819", 395 + "roundtrip": true, 396 + "decoded": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25] 397 + }, 398 + { 399 + "cbor": "oA==", 400 + "hex": "a0", 401 + "roundtrip": true, 402 + "decoded": {} 403 + }, 404 + { 405 + "cbor": "ogECAwQ=", 406 + "hex": "a201020304", 407 + "roundtrip": true, 408 + "diagnostic": "{1: 2, 3: 4}" 409 + }, 410 + { 411 + "cbor": "omFhAWFiggID", 412 + "hex": "a26161016162820203", 413 + "roundtrip": true, 414 + "decoded": {"a": 1, "b": [2, 3]} 415 + }, 416 + { 417 + "cbor": "gmFhoWFiYWM=", 418 + "hex": "826161a161626163", 419 + "roundtrip": true, 420 + "decoded": ["a", {"b": "c"}] 421 + }, 422 + { 423 + "cbor": "pWFhYUFhYmFCYWNhQ2FkYURhZWFF", 424 + "hex": "a56161614161626142616361436164614461656145", 425 + "roundtrip": true, 426 + "decoded": {"a": "A", "b": "B", "c": "C", "d": "D", "e": "E"} 427 + }, 428 + { 429 + "cbor": "X0IBAkMDBAX/", 430 + "hex": "5f42010243030405ff", 431 + "roundtrip": false, 432 + "diagnostic": "(_ h'0102', h'030405')" 433 + }, 434 + { 435 + "cbor": "f2VzdHJlYWRtaW5n/w==", 436 + "hex": "7f657374726561646d696e67ff", 437 + "roundtrip": false, 438 + "decoded": "streaming" 439 + }, 440 + { 441 + "cbor": "n/8=", 442 + "hex": "9fff", 443 + "roundtrip": false, 444 + "decoded": [] 445 + }, 446 + { 447 + "cbor": "nwGCAgOfBAX//w==", 448 + "hex": "9f018202039f0405ffff", 449 + "roundtrip": false, 450 + "decoded": [1, [2, 3], [4, 5]] 451 + }, 452 + { 453 + "cbor": "nwGCAgOCBAX/", 454 + "hex": "9f01820203820405ff", 455 + "roundtrip": false, 456 + "decoded": [1, [2, 3], [4, 5]] 457 + }, 458 + { 459 + "cbor": "gwGCAgOfBAX/", 460 + "hex": "83018202039f0405ff", 461 + "roundtrip": false, 462 + "decoded": [1, [2, 3], [4, 5]] 463 + }, 464 + { 465 + "cbor": "gwGfAgP/ggQF", 466 + "hex": "83019f0203ff820405", 467 + "roundtrip": false, 468 + "decoded": [1, [2, 3], [4, 5]] 469 + }, 470 + { 471 + "cbor": 
"nwECAwQFBgcICQoLDA0ODxAREhMUFRYXGBgYGf8=", 472 + "hex": "9f0102030405060708090a0b0c0d0e0f101112131415161718181819ff", 473 + "roundtrip": false, 474 + "decoded": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25] 475 + }, 476 + { 477 + "cbor": "v2FhAWFinwID//8=", 478 + "hex": "bf61610161629f0203ffff", 479 + "roundtrip": false, 480 + "decoded": {"a": 1, "b": [2, 3]} 481 + }, 482 + { 483 + "cbor": "gmFhv2FiYWP/", 484 + "hex": "826161bf61626163ff", 485 + "roundtrip": false, 486 + "decoded": ["a", {"b": "c"}] 487 + }, 488 + { 489 + "cbor": "v2NGdW71Y0FtdCH/", 490 + "hex": "bf6346756ef563416d7421ff", 491 + "roundtrip": false, 492 + "decoded": {"Fun": true, "Amt": -2} 493 + } 494 + ]
+5
test/dune
(test
 (name test)
 (libraries cbor alcotest bytesrw zarith ohex fmt)
 (deps
  (source_tree ../test-vectors)))
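The vectors made available to the test through `source_tree ../test-vectors` include several half-precision floats, for example `f93c00` for 1.0 and `f97bff` for 65504.0. As a rough illustration of what a decoder has to do with the two bytes after the `f9` initial byte, here is a sketch of IEEE 754 binary16 decoding. The function name `decode_half` is an assumption made for this example; it is not claimed to be how `lib/binary.ml` implements it.

```ocaml
(* Hypothetical helper, not taken from this package.
   Decodes an IEEE 754 half-precision value from its two big-endian
   bytes [hi] and [lo], i.e. the bytes following the 0xf9 initial byte. *)
let decode_half (hi : int) (lo : int) : float =
  let half = (hi lsl 8) lor lo in
  let exp = (half lsr 10) land 0x1f in  (* 5-bit biased exponent *)
  let mant = half land 0x3ff in         (* 10-bit mantissa *)
  let magnitude =
    if exp = 0 then
      (* zero or subnormal: mant * 2^-24 *)
      Float.ldexp (float_of_int mant) (-24)
    else if exp <> 31 then
      (* normal: (1024 + mant) * 2^(exp - 25) *)
      Float.ldexp (float_of_int (mant + 1024)) (exp - 25)
    else if mant = 0 then Float.infinity
    else Float.nan
  in
  (* the top bit is the sign *)
  if half land 0x8000 <> 0 then -.magnitude else magnitude
```

For instance, `decode_half 0x3c 0x00` evaluates to `1.0`, matching the `f93c00` vector. NaN payloads are not preserved by this sketch.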
+2
test/test.ml
let () =
  Alcotest.run "cbor" [ Test_cbor.suite; Test_value.suite; Test_cbor_rw.suite ]
+714
test/test_cbor.ml
(*---------------------------------------------------------------------------
   Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved.
   SPDX-License-Identifier: ISC
  ---------------------------------------------------------------------------*)

(* Test suite for cbor using RFC 8949 Appendix A test vectors *)

open Bytesrw

(* Module aliases for wrapped modules *)
module Cbor_rw = Cbor.Binary
module V = Cbor.Value

(* Simple JSON parser for test vectors *)
type json =
  | Jnull
  | Jbool of bool
  | Jnumber of float
  | Jstring of string
  | Jarray of json list
  | Jobject of (string * json) list

let rec skip_ws s i =
  if i >= String.length s then i
  else match s.[i] with ' ' | '\t' | '\n' | '\r' -> skip_ws s (i + 1) | _ -> i

let parse_string s i =
  if s.[i] <> '"' then failwith "Expected string";
  let buf = Buffer.create 64 in
  let rec loop i =
    if i >= String.length s then failwith "Unterminated string";
    match s.[i] with
    | '"' -> (Buffer.contents buf, i + 1)
    | '\\' -> (
        if i + 1 >= String.length s then failwith "Unterminated escape";
        match s.[i + 1] with
        | '"' ->
            Buffer.add_char buf '"';
            loop (i + 2)
        | '\\' ->
            Buffer.add_char buf '\\';
            loop (i + 2)
        | '/' ->
            Buffer.add_char buf '/';
            loop (i + 2)
        | 'b' ->
            Buffer.add_char buf '\b';
            loop (i + 2)
        | 'f' ->
            Buffer.add_char buf '\012';
            loop (i + 2)
        | 'n' ->
            Buffer.add_char buf '\n';
            loop (i + 2)
        | 'r' ->
            Buffer.add_char buf '\r';
            loop (i + 2)
        | 't' ->
            Buffer.add_char buf '\t';
            loop (i + 2)
        | 'u' ->
            let hex = String.sub s (i + 2) 4 in
            let code = int_of_string ("0x" ^ hex) in
            if code < 0x80 then Buffer.add_char buf (Char.chr code)
            else if code < 0x800 then begin
              Buffer.add_char buf (Char.chr (0xC0 lor (code lsr 6)));
              Buffer.add_char buf (Char.chr (0x80 lor (code land 0x3F)))
            end
            else begin
              Buffer.add_char buf (Char.chr (0xE0 lor (code lsr 12)));
              Buffer.add_char buf (Char.chr (0x80 lor ((code lsr 6) land 0x3F)));
              Buffer.add_char buf (Char.chr (0x80 lor (code land 0x3F)))
            end;
            loop (i + 6)
        | _ -> failwith "Invalid escape")
    | c ->
        Buffer.add_char buf c;
        loop (i + 1)
  in
  loop (i + 1)

let parse_number s i =
  let start = i in
  let rec loop i =
    if i >= String.length s then i
    else
      match s.[i] with
      | '0' .. '9' | '-' | '+' | '.' | 'e' | 'E' -> loop (i + 1)
      | _ -> i
  in
  let end_i = loop i in
  let num_str = String.sub s start (end_i - start) in
  (float_of_string num_str, end_i)

let rec parse_json s i =
  let i = skip_ws s i in
  if i >= String.length s then failwith "Unexpected end of input";
  match s.[i] with
  | 'n' ->
      if String.sub s i 4 = "null" then (Jnull, i + 4)
      else failwith "Expected null"
  | 't' ->
      if String.sub s i 4 = "true" then (Jbool true, i + 4)
      else failwith "Expected true"
  | 'f' ->
      if String.sub s i 5 = "false" then (Jbool false, i + 5)
      else failwith "Expected false"
  | '"' ->
      let str, i = parse_string s i in
      (Jstring str, i)
  | '[' ->
      let rec parse_array acc i =
        let i = skip_ws s i in
        if s.[i] = ']' then (List.rev acc, i + 1)
        else
          let v, i = parse_json s i in
          let i = skip_ws s i in
          if s.[i] = ',' then parse_array (v :: acc) (i + 1)
          else if s.[i] = ']' then (List.rev (v :: acc), i + 1)
          else failwith "Expected , or ]"
      in
      let i = skip_ws s (i + 1) in
      if s.[i] = ']' then (Jarray [], i + 1)
      else
        let arr, i = parse_array [] i in
        (Jarray arr, i)
  | '{' ->
      let rec parse_object acc i =
        let i = skip_ws s i in
        if s.[i] = '}' then (List.rev acc, i + 1)
        else
          let key, i = parse_string s i in
          let i = skip_ws s i in
          if s.[i] <> ':' then failwith "Expected :";
          let v, i = parse_json s (i + 1) in
          let i = skip_ws s i in
          if s.[i] = ',' then parse_object ((key, v) :: acc) (i + 1)
          else if s.[i] = '}' then (List.rev ((key, v) :: acc), i + 1)
          else failwith "Expected , or }"
      in
      let i = skip_ws s (i + 1) in
      if s.[i] = '}' then (Jobject [], i + 1)
      else
        let obj, i = parse_object [] i in
        (Jobject obj, i)
  | '-' | '0' .. '9' ->
      let num, i = parse_number s i in
      (Jnumber num, i)
  | c -> Fmt.failwith "Unexpected character: %c" c

let parse_json_string s =
  let json, _ = parse_json s 0 in
  json

(* Compare CBOR values for equality, handling floats specially *)
let rec cbor_equal (a : V.t) (b : V.t) =
  match (a, b) with
  | V.Int x, V.Int y -> x = y
  | V.Bytes x, V.Bytes y -> x = y
  | V.Text x, V.Text y -> x = y
  | V.Array xs, V.Array ys ->
      List.length xs = List.length ys && List.for_all2 cbor_equal xs ys
  | V.Map xs, V.Map ys ->
      List.length xs = List.length ys
      && List.for_all2
           (fun (k1, v1) (k2, v2) -> cbor_equal k1 k2 && cbor_equal v1 v2)
           xs ys
  | V.Tag (n1, v1), V.Tag (n2, v2) -> n1 = n2 && cbor_equal v1 v2
  | V.Bool x, V.Bool y -> x = y
  | V.Null, V.Null -> true
  | V.Undefined, V.Undefined -> true
  | V.Simple x, V.Simple y -> x = y
  | V.Float x, V.Float y ->
      (Float.is_nan x && Float.is_nan y)
      || x = y
      ||
      (* Handle -0.0 == 0.0 *)
      (x = 0.0 && y = 0.0)
  (* Handle Int/Float comparisons - floats that are exact integers *)
  | V.Float x, V.Int y ->
      let yi = Z.to_float y in
      x = yi
  | V.Int x, V.Float y ->
      let xi = Z.to_float x in
      xi = y
  | _ -> false

(* Convert JSON decoded value to V.t *)
let rec json_to_cbor = function
  | Jnull -> V.Null
  | Jbool b -> V.Bool b
  | Jnumber f ->
      let n = Z.of_float f in
      if Z.to_float n = f && f >= -9007199254740992.0 && f <= 9007199254740992.0
      then V.Int n
      else V.Float f
  | Jstring s -> V.Text s
  | Jarray items -> V.Array (List.map json_to_cbor items)
  | Jobject pairs ->
      V.Map (List.map (fun (k, v) -> (V.Text k, json_to_cbor v)) pairs)

(* RFC 8949 Appendix A test vectors *)
let rfc_tests =
  let test_file = "../test-vectors/appendix_a.json" in
  let content =
    let ic = open_in test_file in
    let len = in_channel_length ic in
    let s = really_input_string ic len in
    close_in ic;
    s
  in
  let json = parse_json_string content in
  let vectors =
    match json with
    | Jarray items -> items
    | _ -> failwith "Expected array of test vectors"
  in
  List.concat_map
    (fun (i, vec) ->
      match vec with
      | Jobject fields ->
          let hex =
            match List.assoc_opt "hex" fields with
            | Some (Jstring s) -> s
            | _ -> failwith "Missing hex field"
          in
          let cbor_bytes = Ohex.decode hex in
          let roundtrip =
            match List.assoc_opt "roundtrip" fields with
            | Some (Jbool b) -> b
            | _ -> true
          in
          let decoded_opt = List.assoc_opt "decoded" fields in
          let test_name = Fmt.str "vector %d (hex: %s)" i hex in
          let decode_test =
            Alcotest.test_case (test_name ^ " decode") `Quick (fun () ->
                let reader = Bytes.Reader.of_string cbor_bytes in
                let dec = Cbor_rw.decoder reader in
                let cbor = Cbor_rw.read_cbor dec in
                match decoded_opt with
                | Some expected_json ->
                    let expected = json_to_cbor expected_json in
                    if not (cbor_equal cbor expected) then
                      Alcotest.fail
                        (Fmt.str "Decoded value mismatch: got %s, expected %s"
                           (V.to_diagnostic cbor) (V.to_diagnostic expected))
                | None -> ignore cbor)
          in
          let roundtrip_test =
            if roundtrip then
              [
                Alcotest.test_case (test_name ^ " roundtrip") `Quick (fun () ->
                    let reader = Bytes.Reader.of_string cbor_bytes in
                    let dec = Cbor_rw.decoder reader in
                    let cbor = Cbor_rw.read_cbor dec in
                    let buf = Buffer.create 64 in
                    let writer = Bytes.Writer.of_buffer buf in
                    let enc = Cbor_rw.encoder writer in
                    Cbor_rw.write_cbor enc cbor;
                    Cbor_rw.flush_encoder enc;
                    let encoded = Buffer.contents buf in
                    if encoded <> cbor_bytes then
                      Alcotest.fail
                        (Fmt.str
                           "Roundtrip mismatch: encoded %d bytes, expected %d \
                            bytes"
                           (String.length encoded) (String.length cbor_bytes)));
              ]
            else []
          in
          decode_test :: roundtrip_test
      | _ -> failwith "Expected object in test vector")
    (List.mapi (fun i v -> (i, v)) vectors)

(* Define a recursive tree type for testing fix *)
type tree = Leaf of int | Node of tree * tree

(* Unit tests *)
let unit_tests =
  [
    Alcotest.test_case "codec: null" `Quick (fun () ->
        let encoded = Cbor.encode_string Cbor.null () in
        let decoded = Cbor.decode_string_exn Cbor.null encoded in
        assert (decoded = ()));
    Alcotest.test_case "codec: bool true" `Quick (fun () ->
        let encoded = Cbor.encode_string Cbor.bool true in
        let decoded = Cbor.decode_string_exn Cbor.bool encoded in
        assert (decoded = true));
    Alcotest.test_case "codec: bool false" `Quick (fun () ->
        let encoded = Cbor.encode_string Cbor.bool false in
        let decoded = Cbor.decode_string_exn Cbor.bool encoded in
        assert (decoded = false));
    Alcotest.test_case "codec: int positive" `Quick (fun () ->
        let encoded = Cbor.encode_string Cbor.int 42 in
        let decoded = Cbor.decode_string_exn Cbor.int encoded in
        assert (decoded = 42));
    Alcotest.test_case "codec: int negative" `Quick (fun () ->
        let encoded = Cbor.encode_string Cbor.int (-100) in
        let decoded = Cbor.decode_string_exn Cbor.int encoded in
        assert (decoded = -100));
    Alcotest.test_case "codec: int64" `Quick (fun () ->
        let n = 1000000000000L in
        let encoded = Cbor.encode_string Cbor.int64 n in
        let decoded = Cbor.decode_string_exn Cbor.int64 encoded in
        assert (decoded = n));
    Alcotest.test_case "codec: float" `Quick (fun () ->
        let encoded = Cbor.encode_string Cbor.float 3.14159 in
        let decoded = Cbor.decode_string_exn Cbor.float encoded in
        assert (abs_float (decoded -. 3.14159) < 0.00001));
    Alcotest.test_case "codec: string" `Quick (fun () ->
        let s = "Hello, CBOR!" in
        let encoded = Cbor.encode_string Cbor.string s in
        let decoded = Cbor.decode_string_exn Cbor.string encoded in
        assert (decoded = s));
    Alcotest.test_case "codec: bytes" `Quick (fun () ->
        let s = "\x00\x01\x02\x03" in
        let encoded = Cbor.encode_string Cbor.bytes s in
        let decoded = Cbor.decode_string_exn Cbor.bytes encoded in
        assert (decoded = s));
    Alcotest.test_case "codec: array" `Quick (fun () ->
        let arr = [ 1; 2; 3; 4; 5 ] in
        let encoded = Cbor.encode_string (Cbor.array Cbor.int) arr in
        let decoded = Cbor.decode_string_exn (Cbor.array Cbor.int) encoded in
        assert (decoded = arr));
    Alcotest.test_case "codec: tuple2" `Quick (fun () ->
        let t = ("hello", 42) in
        let encoded = Cbor.encode_string (Cbor.tuple2 Cbor.string Cbor.int) t in
        let decoded =
          Cbor.decode_string_exn (Cbor.tuple2 Cbor.string Cbor.int) encoded
        in
        assert (decoded = t));
    Alcotest.test_case "codec: string_map" `Quick (fun () ->
        let m = [ ("a", 1); ("b", 2) ] in
        let encoded = Cbor.encode_string (Cbor.string_map Cbor.int) m in
        let decoded =
          Cbor.decode_string_exn (Cbor.string_map Cbor.int) encoded
        in
        assert (decoded = m));
    Alcotest.test_case
"codec: nullable Some" `Quick (fun () -> 340 + let v = Some 42 in 341 + let encoded = Cbor.encode_string (Cbor.nullable Cbor.int) v in 342 + let decoded = Cbor.decode_string_exn (Cbor.nullable Cbor.int) encoded in 343 + assert (decoded = v)); 344 + Alcotest.test_case "codec: nullable None" `Quick (fun () -> 345 + let v = None in 346 + let encoded = Cbor.encode_string (Cbor.nullable Cbor.int) v in 347 + let decoded = Cbor.decode_string_exn (Cbor.nullable Cbor.int) encoded in 348 + assert (decoded = v)); 349 + Alcotest.test_case "codec: tag" `Quick (fun () -> 350 + let v = 12345 in 351 + let encoded = Cbor.encode_string (Cbor.tag 1 Cbor.int) v in 352 + let decoded = Cbor.decode_string_exn (Cbor.tag 1 Cbor.int) encoded in 353 + assert (decoded = v)); 354 + Alcotest.test_case "codec: Obj" `Quick (fun () -> 355 + let open Cbor.Obj in 356 + let codec = 357 + let* name = mem "name" fst Cbor.string in 358 + let* age = mem "age" snd Cbor.int in 359 + return (name, age) 360 + in 361 + let codec = finish codec in 362 + let v = ("Alice", 30) in 363 + let encoded = Cbor.encode_string codec v in 364 + let decoded = Cbor.decode_string_exn codec encoded in 365 + assert (decoded = v)); 366 + Alcotest.test_case "codec: map transform" `Quick (fun () -> 367 + let codec = 368 + Cbor.map 369 + (fun s -> String.uppercase_ascii s) 370 + (fun s -> String.lowercase_ascii s) 371 + Cbor.string 372 + in 373 + let encoded = Cbor.encode_string codec "HELLO" in 374 + let decoded = Cbor.decode_string_exn codec encoded in 375 + assert (decoded = "HELLO")); 376 + Alcotest.test_case "codec: fix (recursive)" `Quick (fun () -> 377 + let tree_codec = 378 + Cbor.fix (fun self -> 379 + Cbor.Variant.( 380 + variant 381 + [ 382 + case 0 Cbor.int 383 + (fun x -> Leaf x) 384 + (function Leaf x -> Some x | _ -> None); 385 + case 1 (Cbor.tuple2 self self) 386 + (fun (l, r) -> Node (l, r)) 387 + (function Node (l, r) -> Some (l, r) | _ -> None); 388 + ])) 389 + in 390 + let v = Node (Leaf 1, Node (Leaf 2, Leaf 
3)) in 391 + let encoded = Cbor.encode_string tree_codec v in 392 + let decoded = Cbor.decode_string_exn tree_codec encoded in 393 + let rec tree_equal a b = 394 + match (a, b) with 395 + | Leaf x, Leaf y -> x = y 396 + | Node (l1, r1), Node (l2, r2) -> tree_equal l1 l2 && tree_equal r1 r2 397 + | _ -> false 398 + in 399 + assert (tree_equal decoded v)); 400 + ] 401 + 402 + (* Hostile-input security tests (inlined from former test_hostile.ml) *) 403 + 404 + (* Helper: decode a raw byte string through [read_cbor] and expect failure. *) 405 + let must_fail ~msg bytes = 406 + match 407 + let reader = Bytes.Reader.of_string bytes in 408 + let dec = Cbor_rw.decoder reader in 409 + ignore (Cbor_rw.read_cbor dec) 410 + with 411 + | () -> Alcotest.failf "%s: expected failure but decode succeeded" msg 412 + | exception Failure _ -> () 413 + | exception End_of_file -> () 414 + 415 + let test_deep_nesting () = 416 + let depth = 513 in 417 + let buf = Buffer.create (depth + 1) in 418 + for _ = 1 to depth do 419 + Buffer.add_char buf '\x81' 420 + done; 421 + Buffer.add_char buf '\x00'; 422 + must_fail ~msg:"deep nesting (513 levels)" (Buffer.contents buf) 423 + 424 + let test_indefinite_array_too_large () = 425 + let limit = 1_000_001 in 426 + let buf = Buffer.create (limit + 2) in 427 + Buffer.add_char buf '\x9f'; 428 + for _ = 1 to limit do 429 + Buffer.add_char buf '\x00' 430 + done; 431 + Buffer.add_char buf '\xff'; 432 + must_fail ~msg:"indefinite array > 1M items" (Buffer.contents buf) 433 + 434 + let test_huge_length () = 435 + let bytes = "\x5a\xff\xff\xff\xff" in 436 + must_fail ~msg:"huge length 0xFFFFFFFF" bytes 437 + 438 + let test_huge_length_64 () = 439 + let bytes = "\x5b\x00\x00\x00\x01\x00\x00\x00\x00" in 440 + must_fail ~msg:"huge 64-bit length" bytes 441 + 442 + let test_nested_indefinite_text () = 443 + let bytes = "\x7f\x7f\xff\xff" in 444 + must_fail ~msg:"nested indefinite text" bytes 445 + 446 + let test_nested_indefinite_bytes () = 447 + let bytes = 
"\x5f\x5f\xff\xff" in 448 + must_fail ~msg:"nested indefinite bytes" bytes 449 + 450 + let test_truncated_array () = 451 + let bytes = "\x85\x01\x02" in 452 + must_fail ~msg:"truncated array" bytes 453 + 454 + let test_truncated_text () = 455 + let bytes = "\x6a\x61\x62\x63" in 456 + must_fail ~msg:"truncated text" bytes 457 + 458 + let test_empty_input () = must_fail ~msg:"empty input" "" 459 + 460 + let test_break_at_top_level () = 461 + let bytes = "\xff" in 462 + must_fail ~msg:"break at top level" bytes 463 + 464 + let test_reserved_additional_info () = 465 + must_fail ~msg:"reserved additional info 28" "\x1c"; 466 + must_fail ~msg:"reserved additional info 29" "\x1d"; 467 + must_fail ~msg:"reserved additional info 30" "\x1e" 468 + 469 + let test_definite_array_too_large () = 470 + let bytes = "\x9a\x00\x0f\x42\x41" in 471 + must_fail ~msg:"definite array > 1M items" bytes 472 + 473 + let hostile_cases = 474 + [ 475 + Alcotest.test_case "deep nesting (513 levels)" `Quick test_deep_nesting; 476 + Alcotest.test_case "indefinite array > 1M items" `Slow 477 + test_indefinite_array_too_large; 478 + Alcotest.test_case "huge length (32-bit)" `Quick test_huge_length; 479 + Alcotest.test_case "huge length (64-bit)" `Quick test_huge_length_64; 480 + Alcotest.test_case "nested indefinite text" `Quick 481 + test_nested_indefinite_text; 482 + Alcotest.test_case "nested indefinite bytes" `Quick 483 + test_nested_indefinite_bytes; 484 + Alcotest.test_case "truncated array" `Quick test_truncated_array; 485 + Alcotest.test_case "truncated text" `Quick test_truncated_text; 486 + Alcotest.test_case "empty input" `Quick test_empty_input; 487 + Alcotest.test_case "break at top level" `Quick test_break_at_top_level; 488 + Alcotest.test_case "reserved additional info" `Quick 489 + test_reserved_additional_info; 490 + Alcotest.test_case "definite array > 1M items" `Quick 491 + test_definite_array_too_large; 492 + ] 493 + 494 + (* Query / update / introspection tests (inlined from 
former test_query.ml) *) 495 + 496 + let person_cbor = 497 + V.Map 498 + [ 499 + (V.Text "name", V.Text "Alice"); 500 + (V.Text "age", V.Int (Z.of_int 30)); 501 + (V.Text "active", V.Bool true); 502 + ] 503 + 504 + let nested_cbor = 505 + V.Map [ (V.Text "user", person_cbor); (V.Text "score", V.Float 99.5) ] 506 + 507 + let int_keyed_cbor = 508 + V.Map 509 + [ 510 + (V.Int (Z.of_int 1), V.Text "alg"); (V.Int (Z.of_int 4), V.Bytes "keyid"); 511 + ] 512 + 513 + let array_cbor = 514 + V.Array [ V.Int (Z.of_int 10); V.Int (Z.of_int 20); V.Int (Z.of_int 30) ] 515 + 516 + let query_cases = 517 + [ 518 + Alcotest.test_case "mem: string field" `Quick (fun () -> 519 + let c = Cbor.mem "name" Cbor.string in 520 + let result = Cbor.Private.decode_cbor_exn c person_cbor in 521 + Alcotest.(check string) "name" "Alice" result); 522 + Alcotest.test_case "mem: int field" `Quick (fun () -> 523 + let c = Cbor.mem "age" Cbor.int in 524 + let result = Cbor.Private.decode_cbor_exn c person_cbor in 525 + Alcotest.(check int) "age" 30 result); 526 + Alcotest.test_case "mem: bool field" `Quick (fun () -> 527 + let c = Cbor.mem "active" Cbor.bool in 528 + let result = Cbor.Private.decode_cbor_exn c person_cbor in 529 + Alcotest.(check bool) "active" true result); 530 + Alcotest.test_case "mem: missing field" `Quick (fun () -> 531 + let c = Cbor.mem "email" Cbor.string in 532 + match Cbor.Private.decode_cbor c person_cbor with 533 + | Error _ -> () 534 + | Ok _ -> Alcotest.fail "expected error for missing field"); 535 + Alcotest.test_case "mem: wrong type" `Quick (fun () -> 536 + let c = Cbor.mem "name" Cbor.int in 537 + match Cbor.Private.decode_cbor c person_cbor with 538 + | Error _ -> () 539 + | Ok _ -> Alcotest.fail "expected error for wrong type"); 540 + Alcotest.test_case "mem: not a map" `Quick (fun () -> 541 + let c = Cbor.mem "name" Cbor.string in 542 + match Cbor.Private.decode_cbor c (V.Int (Z.of_int 42)) with 543 + | Error _ -> () 544 + | Ok _ -> Alcotest.fail "expected error 
for non-map"); 545 + Alcotest.test_case "mem: nested" `Quick (fun () -> 546 + let c = Cbor.mem "user" (Cbor.mem "name" Cbor.string) in 547 + let result = Cbor.Private.decode_cbor_exn c nested_cbor in 548 + Alcotest.(check string) "nested name" "Alice" result); 549 + Alcotest.test_case "mem: encode roundtrip" `Quick (fun () -> 550 + let c = Cbor.mem "name" Cbor.string in 551 + let cbor = Cbor.Private.encode_cbor c "Bob" in 552 + let result = Cbor.Private.decode_cbor_exn c cbor in 553 + Alcotest.(check string) "roundtrip" "Bob" result); 554 + Alcotest.test_case "int_mem: text field" `Quick (fun () -> 555 + let c = Cbor.int_mem 1 Cbor.string in 556 + let result = Cbor.Private.decode_cbor_exn c int_keyed_cbor in 557 + Alcotest.(check string) "key 1" "alg" result); 558 + Alcotest.test_case "int_mem: bytes field" `Quick (fun () -> 559 + let c = Cbor.int_mem 4 Cbor.bytes in 560 + let result = Cbor.Private.decode_cbor_exn c int_keyed_cbor in 561 + Alcotest.(check string) "key 4" "keyid" result); 562 + Alcotest.test_case "int_mem: missing key" `Quick (fun () -> 563 + let c = Cbor.int_mem 99 Cbor.string in 564 + match Cbor.Private.decode_cbor c int_keyed_cbor with 565 + | Error _ -> () 566 + | Ok _ -> Alcotest.fail "expected error for missing int key"); 567 + Alcotest.test_case "int_mem: encode roundtrip" `Quick (fun () -> 568 + let c = Cbor.int_mem 1 Cbor.string in 569 + let cbor = Cbor.Private.encode_cbor c "HS256" in 570 + let result = Cbor.Private.decode_cbor_exn c cbor in 571 + Alcotest.(check string) "roundtrip" "HS256" result); 572 + Alcotest.test_case "nth: first element" `Quick (fun () -> 573 + let c = Cbor.nth 0 Cbor.int in 574 + let result = Cbor.Private.decode_cbor_exn c array_cbor in 575 + Alcotest.(check int) "nth 0" 10 result); 576 + Alcotest.test_case "nth: last element" `Quick (fun () -> 577 + let c = Cbor.nth 2 Cbor.int in 578 + let result = Cbor.Private.decode_cbor_exn c array_cbor in 579 + Alcotest.(check int) "nth 2" 30 result); 580 + Alcotest.test_case 
"nth: out of bounds" `Quick (fun () -> 581 + let c = Cbor.nth 5 Cbor.int in 582 + match Cbor.Private.decode_cbor c array_cbor with 583 + | Error _ -> () 584 + | Ok _ -> Alcotest.fail "expected error for out of bounds"); 585 + Alcotest.test_case "nth: not an array" `Quick (fun () -> 586 + let c = Cbor.nth 0 Cbor.int in 587 + match Cbor.Private.decode_cbor c (V.Text "hello") with 588 + | Error _ -> () 589 + | Ok _ -> Alcotest.fail "expected error for non-array"); 590 + Alcotest.test_case "nth: encode roundtrip" `Quick (fun () -> 591 + let c = Cbor.nth 2 Cbor.int in 592 + let cbor = Cbor.Private.encode_cbor c 42 in 593 + let result = Cbor.Private.decode_cbor_exn c cbor in 594 + Alcotest.(check int) "roundtrip" 42 result); 595 + ] 596 + 597 + let update_cases = 598 + [ 599 + Alcotest.test_case "update_mem: identity transform" `Quick (fun () -> 600 + let c = Cbor.update_mem "name" Cbor.string in 601 + let result = Cbor.Private.decode_cbor_exn c person_cbor in 602 + match V.text "name" result with 603 + | Some (V.Text s) -> Alcotest.(check string) "name unchanged" "Alice" s 604 + | _ -> Alcotest.fail "expected text member 'name'"); 605 + Alcotest.test_case "update_mem: transform via map codec" `Quick (fun () -> 606 + let upper_string = 607 + Cbor.map String.uppercase_ascii String.lowercase_ascii Cbor.string 608 + in 609 + let c = Cbor.update_mem "name" upper_string in 610 + let result = Cbor.Private.decode_cbor_exn c person_cbor in 611 + match V.text "name" result with 612 + | Some (V.Text s) -> Alcotest.(check string) "name lowercased on re-encode" "alice" s 613 + | _ -> Alcotest.fail "expected text member 'name'"); 614 + Alcotest.test_case "update_mem: missing field" `Quick (fun () -> 615 + let c = Cbor.update_mem "email" Cbor.string in 616 + match Cbor.Private.decode_cbor c person_cbor with 617 + | Error _ -> () 618 + | Ok _ -> Alcotest.fail "expected error for missing field"); 619 + Alcotest.test_case "update_mem: preserves other fields" `Quick (fun () -> 620 + let c =
Cbor.update_mem "name" Cbor.string in 621 + let result = Cbor.Private.decode_cbor_exn c person_cbor in 622 + (match V.text "age" result with 623 + | Some (V.Int n) -> Alcotest.(check int) "age preserved" 30 (Z.to_int n) 624 + | _ -> Alcotest.fail "expected int member 'age'"); 625 + match V.text "active" result with 626 + | Some (V.Bool b) -> Alcotest.(check bool) "active preserved" true b 627 + | _ -> Alcotest.fail "expected bool member 'active'"); 628 + Alcotest.test_case "update_mem: not a map" `Quick (fun () -> 629 + let c = Cbor.update_mem "name" Cbor.string in 630 + match Cbor.Private.decode_cbor c (V.Int (Z.of_int 42)) with 631 + | Error _ -> () 632 + | Ok _ -> Alcotest.fail "expected error for non-map"); 633 + Alcotest.test_case "delete_mem: remove field" `Quick (fun () -> 634 + let c = Cbor.delete_mem "name" in 635 + let result = Cbor.Private.decode_cbor_exn c person_cbor in 636 + Alcotest.(check bool) "name absent" false (V.mem_text "name" result); 637 + Alcotest.(check bool) "age present" true (V.mem_text "age" result); 638 + Alcotest.(check bool) "active present" true (V.mem_text "active" result)); 639 + Alcotest.test_case "delete_mem: absent field is no-op" `Quick (fun () -> 640 + let c = Cbor.delete_mem "email" in 641 + let result = Cbor.Private.decode_cbor_exn c person_cbor in 642 + match V.length result with 643 + | Some n -> Alcotest.(check int) "same length" 3 n 644 + | None -> Alcotest.fail "expected map"); 645 + Alcotest.test_case "delete_mem: not a map" `Quick (fun () -> 646 + let c = Cbor.delete_mem "name" in 647 + match Cbor.Private.decode_cbor c (V.Array []) with 648 + | Error _ -> () 649 + | Ok _ -> Alcotest.fail "expected error for non-map"); 650 + ] 651 + 652 + let introspection_cases = 653 + [ 654 + Alcotest.test_case "kind: base codecs" `Quick (fun () -> 655 + Alcotest.(check string) "null" "null" (Cbor.kind Cbor.null); 656 + Alcotest.(check string) "bool" "bool" (Cbor.kind Cbor.bool); 657 + Alcotest.(check string) "int" "int" (Cbor.kind 
Cbor.int); 658 + Alcotest.(check string) "int32" "int32" (Cbor.kind Cbor.int32); 659 + Alcotest.(check string) "int64" "int64" (Cbor.kind Cbor.int64); 660 + Alcotest.(check string) "float" "float" (Cbor.kind Cbor.float); 661 + Alcotest.(check string) "string" "string" (Cbor.kind Cbor.string); 662 + Alcotest.(check string) "bytes" "bytes" (Cbor.kind Cbor.bytes); 663 + Alcotest.(check string) "any" "any" (Cbor.kind Cbor.any)); 664 + Alcotest.test_case "kind: composite codecs" `Quick (fun () -> 665 + Alcotest.(check string) 666 + "array(int)" "array(int)" 667 + (Cbor.kind (Cbor.array Cbor.int)); 668 + Alcotest.(check string) 669 + "nullable(string)" "nullable(string)" 670 + (Cbor.kind (Cbor.nullable Cbor.string)); 671 + Alcotest.(check string) 672 + "tuple2" "tuple2(string, int)" 673 + (Cbor.kind (Cbor.tuple2 Cbor.string Cbor.int)); 674 + Alcotest.(check string) 675 + "tag" "tag(1, int)" 676 + (Cbor.kind (Cbor.tag 1 Cbor.int))); 677 + Alcotest.test_case "kind: obj codec" `Quick (fun () -> 678 + let codec = 679 + let open Cbor.Obj in 680 + let* name = mem "name" fst Cbor.string in 681 + let* age = mem "age" snd Cbor.int in 682 + return (name, age) 683 + in 684 + let codec = Cbor.Obj.finish codec in 685 + let k = Cbor.kind codec in 686 + Alcotest.(check string) "obj kind" "obj({name, age})" k); 687 + Alcotest.test_case "kind: query codecs" `Quick (fun () -> 688 + Alcotest.(check string) 689 + "mem" "mem(name, string)" 690 + (Cbor.kind (Cbor.mem "name" Cbor.string)); 691 + Alcotest.(check string) 692 + "int_mem" "int_mem(1, string)" 693 + (Cbor.kind (Cbor.int_mem 1 Cbor.string)); 694 + Alcotest.(check string) 695 + "nth" "nth(0, int)" 696 + (Cbor.kind (Cbor.nth 0 Cbor.int))); 697 + Alcotest.test_case "kind: update codecs" `Quick (fun () -> 698 + Alcotest.(check string) 699 + "update_mem" "update_mem(name, string)" 700 + (Cbor.kind (Cbor.update_mem "name" Cbor.string)); 701 + Alcotest.(check string) 702 + "delete_mem" "delete_mem(name)" 703 + (Cbor.kind (Cbor.delete_mem 
"name"))); 704 + Alcotest.test_case "kind: map/conv" `Quick (fun () -> 705 + let codec = 706 + Cbor.map String.uppercase_ascii String.lowercase_ascii Cbor.string 707 + in 708 + Alcotest.(check string) "map(string)" "map(string)" (Cbor.kind codec)); 709 + ] 710 + 711 + let suite = 712 + ( "cbor", 713 + rfc_tests @ unit_tests @ hostile_cases @ query_cases @ update_cases 714 + @ introspection_cases )
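The hostile-input vectors in test/test_cbor.ml are easier to read once the first byte of each is split into its major type (top 3 bits) and additional info (low 5 bits), as RFC 8949 section 3 describes. The sketch below does only that. It uses the OCaml standard library alone, and `head` and `read_head` are hypothetical names for illustration, not part of the `Cbor` library or its `Binary` module. It does not reproduce the decoder's depth or size limits; it only shows what length each vector claims.

```ocaml
(* Sketch only: parse the head of a CBOR data item (RFC 8949, section 3).
   [head] and [read_head] are hypothetical helpers, not the library API. *)

type head =
  | Arg of int * int64 (* major type, argument (count, length or value) *)
  | Indefinite of int (* additional info 31; with major type 7 this is "break" *)

let read_head (s : string) : (head, string) result =
  if String.length s = 0 then Error "empty input"
  else
    let b = Char.code s.[0] in
    let major = b lsr 5 and ai = b land 0x1f in
    (* Read an [n]-byte big-endian argument that follows the initial byte.
       Note: 8-byte arguments >= 2^63 wrap to negative in a signed int64. *)
    let be n =
      if String.length s < 1 + n then Error "truncated head"
      else begin
        let v = ref 0L in
        for i = 1 to n do
          v := Int64.logor (Int64.shift_left !v 8) (Int64.of_int (Char.code s.[i]))
        done;
        Ok (Arg (major, !v))
      end
    in
    match ai with
    | n when n < 24 -> Ok (Arg (major, Int64.of_int n)) (* argument is inline *)
    | 24 -> be 1
    | 25 -> be 2
    | 26 -> be 4
    | 27 -> be 8
    | 31 -> Ok (Indefinite major)
    | _ -> Error "reserved additional info" (* 28, 29, 30 *)
```

Read this way, `"\x9a\x00\x0f\x42\x41"` is an array head (major type 4) claiming 1,000,001 items, `"\x5a\xff\xff\xff\xff"` is a byte string claiming 4,294,967,295 bytes, `"\x85\x01\x02"` announces five array items but supplies two, and `"\x1c"` uses reserved additional info 28. A real decoder has to reject each of these before allocating or recursing, which is what `must_fail` asserts.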
+2
test/test_cbor.mli
··· 1 + val suite : string * unit Alcotest.test_case list 2 + (** Test suite. *)
+1
test/test_cbor_rw.ml
··· 1 + let suite = ("cbor_rw", [ Alcotest.test_case "noop" `Quick ignore ])
+2
test/test_cbor_rw.mli
··· 1 + val suite : string * unit Alcotest.test_case list 2 + (** Test suite. *)
+782
test/test_value.ml
··· 1 + (** CBOR Encoding Tests 2 + 3 + Tests derived from RFC 8949 Appendix A (Examples of Encoded CBOR Data 4 + Items). *) 5 + 6 + (* Helper to encode to hex string *) 7 + let encode_to_hex f = 8 + let buf = Buffer.create 64 in 9 + let writer = Bytesrw.Bytes.Writer.of_buffer buf in 10 + let enc = Cbor.Binary.encoder writer in 11 + f enc; 12 + Cbor.Binary.flush_encoder enc; 13 + let bytes = Buffer.contents buf in 14 + String.concat "" 15 + (List.init (String.length bytes) (fun i -> 16 + Fmt.str "%02x" (Char.code (String.get bytes i)))) 17 + 18 + (* Helper to convert hex string to bytes for comparison *) 19 + let hex_to_bytes hex = 20 + let hex = String.lowercase_ascii hex in 21 + let len = String.length hex / 2 in 22 + let buf = Bytes.create len in 23 + for i = 0 to len - 1 do 24 + let byte = int_of_string ("0x" ^ String.sub hex (i * 2) 2) in 25 + Bytes.set_uint8 buf i byte 26 + done; 27 + Bytes.to_string buf 28 + 29 + (* ============= Integer Tests (RFC 8949 Appendix A) ============= *) 30 + 31 + let test_uint_0 () = 32 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc 0) in 33 + Alcotest.(check string) "0" "00" hex 34 + 35 + let test_uint_1 () = 36 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc 1) in 37 + Alcotest.(check string) "1" "01" hex 38 + 39 + let test_uint_10 () = 40 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc 10) in 41 + Alcotest.(check string) "10" "0a" hex 42 + 43 + let test_uint_23 () = 44 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc 23) in 45 + Alcotest.(check string) "23" "17" hex 46 + 47 + let test_uint_24 () = 48 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc 24) in 49 + Alcotest.(check string) "24" "1818" hex 50 + 51 + let test_uint_25 () = 52 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc 25) in 53 + Alcotest.(check string) "25" "1819" hex 54 + 55 + let test_uint_100 () = 56 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc 
100) in 57 + Alcotest.(check string) "100" "1864" hex 58 + 59 + let test_uint_1000 () = 60 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc 1000) in 61 + Alcotest.(check string) "1000" "1903e8" hex 62 + 63 + let test_uint_1000000 () = 64 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc 1000000) in 65 + Alcotest.(check string) "1000000" "1a000f4240" hex 66 + 67 + let test_uint_1000000000000 () = 68 + let hex = 69 + encode_to_hex (fun enc -> Cbor.Binary.write_int64 enc 1000000000000L) 70 + in 71 + Alcotest.(check string) "1000000000000" "1b000000e8d4a51000" hex 72 + 73 + (* ============= Negative Integer Tests ============= *) 74 + 75 + let test_nint_minus1 () = 76 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc (-1)) in 77 + Alcotest.(check string) "-1" "20" hex 78 + 79 + let test_nint_minus10 () = 80 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc (-10)) in 81 + Alcotest.(check string) "-10" "29" hex 82 + 83 + let test_nint_minus100 () = 84 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc (-100)) in 85 + Alcotest.(check string) "-100" "3863" hex 86 + 87 + let test_nint_minus1000 () = 88 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_int enc (-1000)) in 89 + Alcotest.(check string) "-1000" "3903e7" hex 90 + 91 + (* ============= Boolean and Null Tests ============= *) 92 + 93 + let test_false () = 94 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_bool enc false) in 95 + Alcotest.(check string) "false" "f4" hex 96 + 97 + let test_true () = 98 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_bool enc true) in 99 + Alcotest.(check string) "true" "f5" hex 100 + 101 + let test_null () = 102 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_null enc) in 103 + Alcotest.(check string) "null" "f6" hex 104 + 105 + (* ============= Float Tests ============= *) 106 + 107 + (* Note: RFC 8949 deterministic encoding uses the smallest float representation 108 + that preserves 
the value. Values like 1.0, infinity, and NaN can be represented 109 + exactly in half precision (16-bit), so they use f9 prefix. *) 110 + 111 + let test_float_1_0 () = 112 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_float enc 1.0) in 113 + (* Half precision 1.0 = 0xf93c00 per RFC 8949 deterministic encoding *) 114 + Alcotest.(check string) "1.0" "f93c00" hex 115 + 116 + let test_float_1_1 () = 117 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_float enc 1.1) in 118 + (* 1.1 cannot be exactly represented in half precision, uses double *) 119 + (* RFC: 0xfb3ff199999999999a *) 120 + Alcotest.(check string) "1.1" "fb3ff199999999999a" hex 121 + 122 + let test_float_neg_4_1 () = 123 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_float enc (-4.1)) in 124 + (* -4.1 cannot be exactly represented in half precision, uses double *) 125 + (* RFC: 0xfbc010666666666666 *) 126 + Alcotest.(check string) "-4.1" "fbc010666666666666" hex 127 + 128 + let test_float_1e300 () = 129 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_float enc 1.0e300) in 130 + (* 1.0e300 exceeds half/single precision range, uses double *) 131 + (* RFC: 0xfb7e37e43c8800759c *) 132 + Alcotest.(check string) "1.0e+300" "fb7e37e43c8800759c" hex 133 + 134 + let test_float_infinity () = 135 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_float enc infinity) in 136 + (* Half precision infinity = 0xf97c00 per RFC 8949 deterministic encoding *) 137 + Alcotest.(check string) "Infinity" "f97c00" hex 138 + 139 + let test_float_neg_infinity () = 140 + let hex = 141 + encode_to_hex (fun enc -> Cbor.Binary.write_float enc neg_infinity) 142 + in 143 + (* Half precision -infinity = 0xf9fc00 per RFC 8949 deterministic encoding *) 144 + Alcotest.(check string) "-Infinity" "f9fc00" hex 145 + 146 + let test_float_nan () = 147 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_float enc nan) in 148 + (* Half precision NaN = 0xf97e00 per RFC 8949 deterministic encoding *) 149 + 
Alcotest.(check string) "NaN" "f97e00" hex 150 + 151 + (* ============= Text String Tests ============= *) 152 + 153 + let test_text_empty () = 154 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_text enc "") in 155 + Alcotest.(check string) "empty string" "60" hex 156 + 157 + let test_text_a () = 158 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_text enc "a") in 159 + Alcotest.(check string) "\"a\"" "6161" hex 160 + 161 + let test_text_ietf () = 162 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_text enc "IETF") in 163 + Alcotest.(check string) "\"IETF\"" "6449455446" hex 164 + 165 + let test_text_quote_backslash () = 166 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_text enc "\"\\") in 167 + Alcotest.(check string) "\"\\\"\\\\\"" "62225c" hex 168 + 169 + let test_text_utf8_umlaut () = 170 + (* U+00FC = ü = 0xc3 0xbc in UTF-8 *) 171 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_text enc "\xc3\xbc") in 172 + Alcotest.(check string) "ü" "62c3bc" hex 173 + 174 + let test_text_utf8_water () = 175 + (* U+6C34 = 水 = 0xe6 0xb0 0xb4 in UTF-8 *) 176 + let hex = 177 + encode_to_hex (fun enc -> Cbor.Binary.write_text enc "\xe6\xb0\xb4") 178 + in 179 + Alcotest.(check string) "水" "63e6b0b4" hex 180 + 181 + let test_text_utf8_emoji () = 182 + (* U+10151 = 𐅑 = 0xf0 0x90 0x85 0x91 in UTF-8 *) 183 + let hex = 184 + encode_to_hex (fun enc -> Cbor.Binary.write_text enc "\xf0\x90\x85\x91") 185 + in 186 + Alcotest.(check string) "𐅑" "64f0908591" hex 187 + 188 + (* ============= Byte String Tests ============= *) 189 + 190 + let test_bytes_empty () = 191 + let hex = encode_to_hex (fun enc -> Cbor.Binary.write_bytes_header enc 0) in 192 + Alcotest.(check string) "empty bytes" "40" hex 193 + 194 + let test_bytes_01020304 () = 195 + let hex = 196 + encode_to_hex (fun enc -> 197 + Cbor.Binary.write_bytes_header enc 4; 198 + Cbor.Binary.write_bytes enc (hex_to_bytes "01020304")) 199 + in 200 + Alcotest.(check string) "h'01020304'" "4401020304" 
hex

(* ============= Array Tests ============= *)

let test_array_empty () =
  let hex = encode_to_hex (fun enc -> Cbor.Binary.write_array_start enc 0) in
  Alcotest.(check string) "[]" "80" hex

let test_array_123 () =
  let hex =
    encode_to_hex (fun enc ->
        Cbor.Binary.write_array_start enc 3;
        Cbor.Binary.write_int enc 1;
        Cbor.Binary.write_int enc 2;
        Cbor.Binary.write_int enc 3)
  in
  Alcotest.(check string) "[1, 2, 3]" "83010203" hex

let test_array_nested () =
  (* [1, [2, 3], [4, 5]] *)
  let hex =
    encode_to_hex (fun enc ->
        Cbor.Binary.write_array_start enc 3;
        Cbor.Binary.write_int enc 1;
        Cbor.Binary.write_array_start enc 2;
        Cbor.Binary.write_int enc 2;
        Cbor.Binary.write_int enc 3;
        Cbor.Binary.write_array_start enc 2;
        Cbor.Binary.write_int enc 4;
        Cbor.Binary.write_int enc 5)
  in
  Alcotest.(check string) "[1, [2, 3], [4, 5]]" "8301820203820405" hex

let test_array_25_items () =
  (* [1, 2, 3, ..., 25] - requires 1-byte length encoding *)
  let hex =
    encode_to_hex (fun enc ->
        Cbor.Binary.write_array_start enc 25;
        for i = 1 to 25 do
          Cbor.Binary.write_int enc i
        done)
  in
  (* 0x98 0x19 = array with 1-byte length (25) *)
  Alcotest.(check string)
    "[1..25]" "98190102030405060708090a0b0c0d0e0f101112131415161718181819" hex

(* ============= Map Tests ============= *)

let test_map_empty () =
  let hex = encode_to_hex (fun enc -> Cbor.Binary.write_map_start enc 0) in
  Alcotest.(check string) "{}" "a0" hex

let test_map_int_keys () =
  (* {1: 2, 3: 4} *)
  let hex =
    encode_to_hex (fun enc ->
        Cbor.Binary.write_map_start enc 2;
        Cbor.Binary.write_int enc 1;
        Cbor.Binary.write_int enc 2;
        Cbor.Binary.write_int enc 3;
        Cbor.Binary.write_int enc 4)
  in
  Alcotest.(check string) "{1: 2, 3: 4}" "a201020304" hex

let test_map_string_keys () =
  (* {"a": 1, "b": [2, 3]} *)
  let hex =
    encode_to_hex (fun enc ->
        Cbor.Binary.write_map_start enc 2;
        Cbor.Binary.write_text enc "a";
        Cbor.Binary.write_int enc 1;
        Cbor.Binary.write_text enc "b";
        Cbor.Binary.write_array_start enc 2;
        Cbor.Binary.write_int enc 2;
        Cbor.Binary.write_int enc 3)
  in
  Alcotest.(check string) "{\"a\": 1, \"b\": [2, 3]}" "a26161016162820203" hex

let test_mixed_array_map () =
  (* ["a", {"b": "c"}] *)
  let hex =
    encode_to_hex (fun enc ->
        Cbor.Binary.write_array_start enc 2;
        Cbor.Binary.write_text enc "a";
        Cbor.Binary.write_map_start enc 1;
        Cbor.Binary.write_text enc "b";
        Cbor.Binary.write_text enc "c")
  in
  Alcotest.(check string) "[\"a\", {\"b\": \"c\"}]" "826161a161626163" hex

let test_map_5_pairs () =
  (* {"a": "A", "b": "B", "c": "C", "d": "D", "e": "E"} *)
  let hex =
    encode_to_hex (fun enc ->
        Cbor.Binary.write_map_start enc 5;
        Cbor.Binary.write_text enc "a";
        Cbor.Binary.write_text enc "A";
        Cbor.Binary.write_text enc "b";
        Cbor.Binary.write_text enc "B";
        Cbor.Binary.write_text enc "c";
        Cbor.Binary.write_text enc "C";
        Cbor.Binary.write_text enc "d";
        Cbor.Binary.write_text enc "D";
        Cbor.Binary.write_text enc "e";
        Cbor.Binary.write_text enc "E")
  in
  Alcotest.(check string)
    "{a:A, b:B, c:C, d:D, e:E}" "a56161614161626142616361436164614461656145" hex

(* ============= Tag Tests ============= *)

let test_tag_epoch_timestamp () =
  (* 1(1363896240) - epoch-based date/time *)
  let hex =
    encode_to_hex (fun enc ->
        Cbor.Binary.write_type_arg enc Cbor.Binary.major_tag 1;
        Cbor.Binary.write_int enc 1363896240)
  in
  Alcotest.(check string) "1(1363896240)" "c11a514b67b0" hex

(* ============= Major Type Constants Test ============= *)

let test_major_type_constants () =
  Alcotest.(check int) "major_uint" 0 Cbor.Binary.major_uint;
  Alcotest.(check int) "major_nint" 1 Cbor.Binary.major_nint;
  Alcotest.(check int) "major_bytes" 2 Cbor.Binary.major_bytes;
  Alcotest.(check int) "major_text" 3 Cbor.Binary.major_text;
  Alcotest.(check int) "major_array" 4 Cbor.Binary.major_array;
  Alcotest.(check int) "major_map" 5 Cbor.Binary.major_map;
  Alcotest.(check int) "major_tag" 6 Cbor.Binary.major_tag;
  Alcotest.(check int) "major_simple" 7 Cbor.Binary.major_simple

let test_simple_value_constants () =
  Alcotest.(check int) "simple_false" 20 Cbor.Binary.simple_false;
  Alcotest.(check int) "simple_true" 21 Cbor.Binary.simple_true;
  Alcotest.(check int) "simple_null" 22 Cbor.Binary.simple_null;
  Alcotest.(check int) "simple_undefined" 23 Cbor.Binary.simple_undefined

let test_additional_info_constants () =
  Alcotest.(check int) "ai_1byte" 24 Cbor.Binary.ai_1byte;
  Alcotest.(check int) "ai_2byte" 25 Cbor.Binary.ai_2byte;
  Alcotest.(check int) "ai_4byte" 26 Cbor.Binary.ai_4byte;
  Alcotest.(check int) "ai_8byte" 27 Cbor.Binary.ai_8byte;
  Alcotest.(check int) "ai_indefinite" 31 Cbor.Binary.ai_indefinite

(* ============= High-level Codec API Tests ============= *)

(* Round-trip tests using Cbor.encode_string and Cbor.decode_string *)

let test_codec_int_roundtrip () =
  let values = [ 0; 1; 23; 24; 100; 1000; 1000000; -1; -10; -100; -1000 ] in
  List.iter
    (fun v ->
      let encoded = Cbor.encode_string Cbor.int v in
      match Cbor.decode_string Cbor.int encoded with
      | Ok decoded -> Alcotest.(check int) (Fmt.str "int %d" v) v decoded
      | Error e -> Alcotest.fail (Cbor.Error.to_string e))
    values

let test_codec_int64_roundtrip () =
  let values = [ 0L; 1L; 1000000000000L; -1L; Int64.max_int; Int64.min_int ] in
  List.iter
    (fun v ->
      let encoded = Cbor.encode_string Cbor.int64 v in
      match Cbor.decode_string Cbor.int64 encoded with
      | Ok decoded -> Alcotest.(check int64) (Fmt.str "int64 %Ld" v) v decoded
      | Error e -> Alcotest.fail (Cbor.Error.to_string e))
    values

let test_codec_bool_roundtrip () =
  List.iter
    (fun v ->
      let encoded = Cbor.encode_string Cbor.bool v in
      match Cbor.decode_string Cbor.bool encoded with
      | Ok decoded -> Alcotest.(check bool) (Fmt.str "bool %b" v) v decoded
      | Error e -> Alcotest.fail (Cbor.Error.to_string e))
    [ true; false ]

let test_codec_null_roundtrip () =
  let encoded = Cbor.encode_string Cbor.null () in
  match Cbor.decode_string Cbor.null encoded with
  | Ok () -> ()
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

let test_codec_float_roundtrip () =
  let values = [ 0.0; 1.0; -1.0; 1.5; 3.14159; 1e10; -1e-10 ] in
  List.iter
    (fun v ->
      let encoded = Cbor.encode_string Cbor.float v in
      match Cbor.decode_string Cbor.float encoded with
      | Ok decoded ->
          let diff = abs_float (v -. decoded) in
          Alcotest.(check bool) (Fmt.str "float %g" v) true (diff < 1e-10)
      | Error e -> Alcotest.fail (Cbor.Error.to_string e))
    values

let test_codec_string_roundtrip () =
  let values =
    [ ""; "a"; "hello"; "UTF-8: \xc3\xbc \xe6\xb0\xb4"; "with\nnewline" ]
  in
  List.iter
    (fun v ->
      let encoded = Cbor.encode_string Cbor.string v in
      match Cbor.decode_string Cbor.string encoded with
      | Ok decoded -> Alcotest.(check string) (Fmt.str "string %S" v) v decoded
      | Error e -> Alcotest.fail (Cbor.Error.to_string e))
    values

let test_codec_bytes_roundtrip () =
  let values = [ ""; "\x00\x01\x02\x03"; String.make 100 '\xff' ] in
  List.iter
    (fun v ->
      let encoded = Cbor.encode_string Cbor.bytes v in
      match Cbor.decode_string Cbor.bytes encoded with
      | Ok decoded -> Alcotest.(check string) "bytes" v decoded
      | Error e -> Alcotest.fail (Cbor.Error.to_string e))
    values

let test_codec_array_roundtrip () =
  let values = [ []; [ 1 ]; [ 1; 2; 3 ]; List.init 25 (fun i -> i) ] in
  let int_list = Cbor.array Cbor.int in
  List.iter
    (fun v ->
      let encoded = Cbor.encode_string int_list v in
      match Cbor.decode_string int_list encoded with
      | Ok decoded -> Alcotest.(check (list int)) "array" v decoded
      | Error e -> Alcotest.fail (Cbor.Error.to_string e))
    values

let test_codec_nested_array () =
  let nested = Cbor.array (Cbor.array Cbor.int) in
  let v = [ [ 1; 2 ]; [ 3; 4; 5 ]; [] ] in
  let encoded = Cbor.encode_string nested v in
  match Cbor.decode_string nested encoded with
  | Ok decoded -> Alcotest.(check (list (list int))) "nested array" v decoded
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

let test_codec_string_map_roundtrip () =
  let map = Cbor.string_map Cbor.int in
  let v = [ ("a", 1); ("b", 2); ("c", 3) ] in
  let encoded = Cbor.encode_string map v in
  match Cbor.decode_string map encoded with
  | Ok decoded ->
      (* Maps may reorder, so sort before comparing *)
      let sort = List.sort compare in
      Alcotest.(check (list (pair string int)))
        "string map" (sort v) (sort decoded)
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

let test_codec_int_map_roundtrip () =
  let map = Cbor.int_map Cbor.string in
  let v = [ (1, "one"); (2, "two"); (3, "three") ] in
  let encoded = Cbor.encode_string map v in
  match Cbor.decode_string map encoded with
  | Ok decoded ->
      let sort = List.sort compare in
      Alcotest.(check (list (pair int string)))
        "int map" (sort v) (sort decoded)
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

let test_codec_tuple2 () =
  let codec = Cbor.tuple2 Cbor.string Cbor.int in
  let v = ("hello", 42) in
  let encoded = Cbor.encode_string codec v in
  match Cbor.decode_string codec encoded with
  | Ok decoded -> Alcotest.(check (pair string int)) "tuple2" v decoded
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

let test_codec_tuple3 () =
  let codec = Cbor.tuple3 Cbor.int Cbor.string Cbor.bool in
  let v = (42, "hello", true) in
  let encoded = Cbor.encode_string codec v in
  match Cbor.decode_string codec encoded with
  | Ok decoded ->
      let a, b, c = decoded in
      Alcotest.(check int) "tuple3.0" 42 a;
      Alcotest.(check string) "tuple3.1" "hello" b;
      Alcotest.(check bool) "tuple3.2" true c
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

let test_codec_nullable () =
  let codec = Cbor.nullable Cbor.int in
  (* Test Some *)
  let v1 = Some 42 in
  let encoded1 = Cbor.encode_string codec v1 in
  (match Cbor.decode_string codec encoded1 with
  | Ok decoded -> Alcotest.(check (option int)) "nullable some" v1 decoded
  | Error e -> Alcotest.fail (Cbor.Error.to_string e));
  (* Test None *)
  let v2 = None in
  let encoded2 = Cbor.encode_string codec v2 in
  match Cbor.decode_string codec encoded2 with
  | Ok decoded -> Alcotest.(check (option int)) "nullable none" v2 decoded
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

(* ============= Obj Codec Tests (Records with String Keys) ============= *)

type person = { name : string; age : int; email : string option }

let person_codec =
  Cbor.Obj.finish
  @@
  let open Cbor.Obj in
  let* name = mem "name" (fun p -> p.name) Cbor.string in
  let* age = mem "age" (fun p -> p.age) Cbor.int in
  let* email = mem_opt "email" (fun p -> p.email) Cbor.string in
  return { name; age; email }

let test_obj_codec_basic () =
  let v = { name = "Alice"; age = 30; email = None } in
  let encoded = Cbor.encode_string person_codec v in
  match Cbor.decode_string person_codec encoded with
  | Ok decoded ->
      Alcotest.(check string) "name" v.name decoded.name;
      Alcotest.(check int) "age" v.age decoded.age;
      Alcotest.(check (option string)) "email" v.email decoded.email
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

let test_obj_codec_with_optional () =
  let v = { name = "Bob"; age = 25; email = Some "bob@example.com" } in
  let encoded = Cbor.encode_string person_codec v in
  match Cbor.decode_string person_codec encoded with
  | Ok decoded ->
      Alcotest.(check string) "name" v.name decoded.name;
      Alcotest.(check int) "age" v.age decoded.age;
      Alcotest.(check (option string)) "email" v.email decoded.email
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

(* ============= Obj_int Codec Tests (Records with Integer Keys) ============= *)

(* CWT-style claims with integer keys per RFC 8392:
   1=iss, 2=sub, 3=aud, 4=exp, 5=nbf, 6=iat, 7=cti *)
type cwt_claims = {
  iss : string option; (* key 1 *)
  sub : string option; (* key 2 *)
  exp : int64 option; (* key 4 *)
}

let cwt_claims_codec =
  Cbor.Obj_int.finish
  @@
  let open Cbor.Obj_int in
  let* iss = mem_opt 1 (fun c -> c.iss) Cbor.string in
  let* sub = mem_opt 2 (fun c -> c.sub) Cbor.string in
  let* exp = mem_opt 4 (fun c -> c.exp) Cbor.int64 in
  return { iss; sub; exp }

let test_obj_int_codec () =
  let v =
    {
      iss = Some "https://example.com";
      sub = Some "user123";
      exp = Some 1700000000L;
    }
  in
  let encoded = Cbor.encode_string cwt_claims_codec v in
  match Cbor.decode_string cwt_claims_codec encoded with
  | Ok decoded ->
      Alcotest.(check (option string)) "iss" v.iss decoded.iss;
      Alcotest.(check (option string)) "sub" v.sub decoded.sub;
      Alcotest.(check (option int64)) "exp" v.exp decoded.exp
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

let test_obj_int_partial () =
  let v = { iss = Some "issuer"; sub = None; exp = None } in
  let encoded = Cbor.encode_string cwt_claims_codec v in
  match Cbor.decode_string cwt_claims_codec encoded with
  | Ok decoded ->
      Alcotest.(check (option string)) "iss" v.iss decoded.iss;
      Alcotest.(check (option string)) "sub" v.sub decoded.sub;
      Alcotest.(check (option int64)) "exp" v.exp decoded.exp
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

(* ============= Tag Tests with Codec API ============= *)

let test_codec_tag () =
  (* Tag 1 = epoch timestamp *)
  let epoch_codec = Cbor.tag 1 Cbor.int64 in
  let v = 1363896240L in
  let encoded = Cbor.encode_string epoch_codec v in
  (* Should match RFC 8949 example: c11a514b67b0 *)
  let hex =
    String.concat ""
      (List.init (String.length encoded) (fun i ->
           Fmt.str "%02x" (Char.code (String.get encoded i))))
  in
  Alcotest.(check string) "epoch tag hex" "c11a514b67b0" hex;
  match Cbor.decode_string epoch_codec encoded with
  | Ok decoded -> Alcotest.(check int64) "epoch value" v decoded
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

let test_codec_tag_opt () =
  (* Tag 32 = URI (optional) *)
  let uri_codec = Cbor.tag_opt 32 Cbor.string in
  let v = "https://example.com" in
  (* Encode with tag *)
  let encoded = Cbor.encode_string uri_codec v in
  (match Cbor.decode_string uri_codec encoded with
  | Ok decoded -> Alcotest.(check string) "uri tagged" v decoded
  | Error e -> Alcotest.fail (Cbor.Error.to_string e));
  (* Decode without tag should also work *)
  let plain = Cbor.encode_string Cbor.string v in
  match Cbor.decode_string uri_codec plain with
  | Ok decoded -> Alcotest.(check string) "uri untagged" v decoded
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

(* ============= Decode from Hex Tests ============= *)

let test_decode_rfc_integers () =
  (* RFC 8949 Appendix A test vectors *)
  let tests =
    [
      ("00", 0L);
      ("01", 1L);
      ("0a", 10L);
      ("17", 23L);
      ("1818", 24L);
      ("1819", 25L);
      ("1864", 100L);
      ("1903e8", 1000L);
      ("1a000f4240", 1000000L);
      ("1b000000e8d4a51000", 1000000000000L);
      ("20", -1L);
      ("29", -10L);
      ("3863", -100L);
      ("3903e7", -1000L);
    ]
  in
  List.iter
    (fun (hex, expected) ->
      let bytes = hex_to_bytes hex in
      match Cbor.decode_string Cbor.int64 bytes with
      | Ok decoded -> Alcotest.(check int64) hex expected decoded
      | Error e -> Alcotest.failf "%s: %s" hex (Cbor.Error.to_string e))
    tests

let test_decode_rfc_strings () =
  let tests =
    [
      ("60", "");
      ("6161", "a");
      ("6449455446", "IETF");
      ("62225c", "\"\\");
      ("62c3bc", "\xc3\xbc");
      (* ü *)
      ("63e6b0b4", "\xe6\xb0\xb4");
      (* 水 *)
    ]
  in
  List.iter
    (fun (hex, expected) ->
      let bytes = hex_to_bytes hex in
      match Cbor.decode_string Cbor.string bytes with
      | Ok decoded -> Alcotest.(check string) hex expected decoded
      | Error e -> Alcotest.failf "%s: %s" hex (Cbor.Error.to_string e))
    tests

let test_decode_rfc_arrays () =
  let int_list = Cbor.array Cbor.int in
  let tests = [ ("80", []); ("83010203", [ 1; 2; 3 ]) ] in
  List.iter
    (fun (hex, expected) ->
      let bytes = hex_to_bytes hex in
      match Cbor.decode_string int_list bytes with
      | Ok decoded -> Alcotest.(check (list int)) hex expected decoded
      | Error e -> Alcotest.failf "%s: %s" hex (Cbor.Error.to_string e))
    tests

let test_decode_rfc_booleans () =
  let tests = [ ("f4", false); ("f5", true) ] in
  List.iter
    (fun (hex, expected) ->
      let bytes = hex_to_bytes hex in
      match Cbor.decode_string Cbor.bool bytes with
      | Ok decoded -> Alcotest.(check bool) hex expected decoded
      | Error e -> Alcotest.failf "%s: %s" hex (Cbor.Error.to_string e))
    tests

let test_decode_rfc_null () =
  let bytes = hex_to_bytes "f6" in
  match Cbor.decode_string Cbor.null bytes with
  | Ok () -> ()
  | Error e -> Alcotest.fail (Cbor.Error.to_string e)

(* ============= Error Handling Tests ============= *)

let test_decode_type_mismatch () =
  (* Try to decode an integer as a string *)
  let bytes = hex_to_bytes "01" in
  (* integer 1 *)
  match Cbor.decode_string Cbor.string bytes with
  | Ok _ -> Alcotest.fail "Expected type mismatch error"
  | Error e ->
      let msg = Cbor.Error.to_string e in
      Alcotest.(check bool)
        "error contains type info" true
        (String.length msg > 0)

let test_decode_truncated () =
  (* Truncated integer (header says 4 bytes follow but only 2 provided) *)
  let bytes = hex_to_bytes "1a0001" in
  match Cbor.decode_string Cbor.int bytes with
  | Ok _ -> Alcotest.fail "Expected parse error"
  | Error _ -> ()

let suite =
  ( "value",
    [
      Alcotest.test_case "uint 0" `Quick test_uint_0;
      Alcotest.test_case "uint 1" `Quick test_uint_1;
      Alcotest.test_case "uint 10" `Quick test_uint_10;
      Alcotest.test_case "uint 23" `Quick test_uint_23;
      Alcotest.test_case "uint 24" `Quick test_uint_24;
      Alcotest.test_case "uint 25" `Quick test_uint_25;
      Alcotest.test_case "uint 100" `Quick test_uint_100;
      Alcotest.test_case "uint 1000" `Quick test_uint_1000;
      Alcotest.test_case "uint 1000000" `Quick test_uint_1000000;
      Alcotest.test_case "uint 1000000000000" `Quick test_uint_1000000000000;
      Alcotest.test_case "nint -1" `Quick test_nint_minus1;
      Alcotest.test_case "nint -10" `Quick test_nint_minus10;
      Alcotest.test_case "nint -100" `Quick test_nint_minus100;
      Alcotest.test_case "nint -1000" `Quick test_nint_minus1000;
      Alcotest.test_case "false" `Quick test_false;
      Alcotest.test_case "true" `Quick test_true;
      Alcotest.test_case "null" `Quick test_null;
      Alcotest.test_case "float 1.0" `Quick test_float_1_0;
      Alcotest.test_case "float 1.1" `Quick test_float_1_1;
      Alcotest.test_case "float -4.1" `Quick test_float_neg_4_1;
      Alcotest.test_case "float 1.0e+300" `Quick test_float_1e300;
      Alcotest.test_case "float Infinity" `Quick test_float_infinity;
      Alcotest.test_case "float -Infinity" `Quick test_float_neg_infinity;
      Alcotest.test_case "float NaN" `Quick test_float_nan;
      Alcotest.test_case "text empty" `Quick test_text_empty;
      Alcotest.test_case "text a" `Quick test_text_a;
      Alcotest.test_case "text IETF" `Quick test_text_ietf;
      Alcotest.test_case "text quote_backslash" `Quick test_text_quote_backslash;
      Alcotest.test_case "text utf8_umlaut" `Quick test_text_utf8_umlaut;
      Alcotest.test_case "text utf8_water" `Quick test_text_utf8_water;
      Alcotest.test_case "text utf8_emoji" `Quick test_text_utf8_emoji;
      Alcotest.test_case "bytes empty" `Quick test_bytes_empty;
      Alcotest.test_case "bytes 01020304" `Quick test_bytes_01020304;
      Alcotest.test_case "array empty" `Quick test_array_empty;
      Alcotest.test_case "array [1,2,3]" `Quick test_array_123;
      Alcotest.test_case "array nested" `Quick test_array_nested;
      Alcotest.test_case "array 25_items" `Quick test_array_25_items;
      Alcotest.test_case "map empty" `Quick test_map_empty;
      Alcotest.test_case "map int_keys" `Quick test_map_int_keys;
      Alcotest.test_case "map string_keys" `Quick test_map_string_keys;
      Alcotest.test_case "map mixed" `Quick test_mixed_array_map;
      Alcotest.test_case "map 5_pairs" `Quick test_map_5_pairs;
      Alcotest.test_case "tag epoch_timestamp" `Quick test_tag_epoch_timestamp;
      Alcotest.test_case "major_types" `Quick test_major_type_constants;
      Alcotest.test_case "simple_values" `Quick test_simple_value_constants;
      Alcotest.test_case "additional_info" `Quick test_additional_info_constants;
      Alcotest.test_case "codec int" `Quick test_codec_int_roundtrip;
      Alcotest.test_case "codec int64" `Quick test_codec_int64_roundtrip;
      Alcotest.test_case "codec bool" `Quick test_codec_bool_roundtrip;
      Alcotest.test_case "codec null" `Quick test_codec_null_roundtrip;
      Alcotest.test_case "codec float" `Quick test_codec_float_roundtrip;
      Alcotest.test_case "codec string" `Quick test_codec_string_roundtrip;
      Alcotest.test_case "codec bytes" `Quick test_codec_bytes_roundtrip;
      Alcotest.test_case "codec array" `Quick test_codec_array_roundtrip;
      Alcotest.test_case "codec nested_array" `Quick test_codec_nested_array;
      Alcotest.test_case "codec string_map" `Quick
        test_codec_string_map_roundtrip;
      Alcotest.test_case "codec int_map" `Quick test_codec_int_map_roundtrip;
      Alcotest.test_case "codec tuple2" `Quick test_codec_tuple2;
      Alcotest.test_case "codec tuple3" `Quick test_codec_tuple3;
      Alcotest.test_case "codec nullable" `Quick test_codec_nullable;
      Alcotest.test_case "obj basic" `Quick test_obj_codec_basic;
      Alcotest.test_case "obj with_optional" `Quick test_obj_codec_with_optional;
      Alcotest.test_case "obj_int full" `Quick test_obj_int_codec;
      Alcotest.test_case "obj_int partial" `Quick test_obj_int_partial;
      Alcotest.test_case "tag_codec tag" `Quick test_codec_tag;
      Alcotest.test_case "tag_codec tag_opt" `Quick test_codec_tag_opt;
      Alcotest.test_case "decode integers" `Quick test_decode_rfc_integers;
      Alcotest.test_case "decode strings" `Quick test_decode_rfc_strings;
      Alcotest.test_case "decode arrays" `Quick test_decode_rfc_arrays;
      Alcotest.test_case "decode booleans" `Quick test_decode_rfc_booleans;
      Alcotest.test_case "decode null" `Quick test_decode_rfc_null;
      Alcotest.test_case "error type_mismatch" `Quick test_decode_type_mismatch;
      Alcotest.test_case "error truncated" `Quick test_decode_truncated;
    ] )
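The expected hex strings in the array, map, tag, and integer tests of test_value.ml all follow from the CBOR head rule in RFC 8949: the major type sits in the top three bits of the first byte, and the argument is either packed into the low five bits (values below 24) or written in 1, 2, 4, or 8 big-endian bytes announced by additional-info values 24 to 27. The sketch below re-derives a few of those vectors using only the OCaml standard library. The names `write_head`, `write_int`, and `hex_of` are local helpers invented for this illustration and are not the library's `Cbor.Binary` functions; the code also assumes a 64-bit OCaml and non-negative head arguments.

```ocaml
(* Minimal CBOR head encoder, standard library only.
   Assumptions: 64-bit OCaml ints, [arg] >= 0. These helpers are
   illustrative stand-ins, not the Cbor.Binary API from the commit. *)

(* Write one head: major type (0..7) plus its unsigned argument. *)
let write_head buf major arg =
  let put_byte b = Buffer.add_char buf (Char.chr (b land 0xff)) in
  (* Emit [arg] as [n] big-endian bytes. *)
  let put_be n =
    for i = n - 1 downto 0 do
      put_byte (arg lsr (8 * i))
    done
  in
  (* Initial byte: major type in bits 7..5, additional info in bits 4..0. *)
  let initial ai = put_byte ((major lsl 5) lor ai) in
  if arg < 24 then initial arg
  else if arg < 0x100 then (initial 24; put_be 1)
  else if arg < 0x1_0000 then (initial 25; put_be 2)
  else if arg < 0x1_0000_0000 then (initial 26; put_be 4)
  else (initial 27; put_be 8)

(* Integers: major 0 for n >= 0, major 1 with argument (-1 - n) otherwise. *)
let write_int buf n =
  if n >= 0 then write_head buf 0 n else write_head buf 1 (-1 - n)

(* Run a writer against a fresh buffer and return lowercase hex. *)
let hex_of f =
  let buf = Buffer.create 16 in
  f buf;
  let s = Buffer.contents buf in
  String.concat ""
    (List.init (String.length s) (fun i ->
         Printf.sprintf "%02x" (Char.code s.[i])))

let () =
  (* Array of 3 items: major 4, length 3 fits in the initial byte. *)
  assert (
    hex_of (fun b ->
        write_head b 4 3;
        List.iter (write_int b) [ 1; 2; 3 ])
    = "83010203");
  (* Array of 25 items: length needs the 1-byte form, 0x98 0x19. *)
  assert (hex_of (fun b -> write_head b 4 25) = "9819");
  (* Map with 2 pairs: major 5. *)
  assert (
    hex_of (fun b ->
        write_head b 5 2;
        List.iter (write_int b) [ 1; 2; 3; 4 ])
    = "a201020304");
  (* Tag 1 wrapping an epoch timestamp: major 6, then a 4-byte uint. *)
  assert (
    hex_of (fun b ->
        write_head b 6 1;
        write_int b 1363896240)
    = "c11a514b67b0");
  (* Negative integer -1000 is stored as major 1 with argument 999. *)
  assert (hex_of (fun b -> write_int b (-1000)) = "3903e7")
```

Because arrays, maps, and tags share the same head layout and differ only in the major type, one helper is enough to reproduce every fixed-length vector checked here; indefinite-length items (additional info 31) are left out of the sketch.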
+2
test/test_value.mli
val suite : string * unit Alcotest.test_case list
(** Test suite. *)