My working unpac space for OCaml projects in development

Add RFC 8949 (CBOR) spec

+3674
specs/rfc8949.txt
Internet Engineering Task Force (IETF)                        C. Bormann
Request for Comments: 8949                        Universität Bremen TZI
STD: 94                                                       P. Hoffman
Obsoletes: 7049                                                    ICANN
Category: Standards Track                                  December 2020
ISSN: 2070-1721


              Concise Binary Object Representation (CBOR)

Abstract

   The Concise Binary Object Representation (CBOR) is a data format
   whose design goals include the possibility of extremely small code
   size, fairly small message size, and extensibility without the need
   for version negotiation. These design goals make it different from
   earlier binary serializations such as ASN.1 and MessagePack.

   This document obsoletes RFC 7049, providing editorial improvements,
   new details, and errata fixes while keeping full compatibility with
   the interchange format of RFC 7049. It does not create a new version
   of the format.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF). It represents the consensus of the IETF community. It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG). Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8949.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Objectives
     1.2.  Terminology
   2.  CBOR Data Models
     2.1.  Extended Generic Data Models
     2.2.  Specific Data Models
   3.  Specification of the CBOR Encoding
     3.1.  Major Types
     3.2.  Indefinite Lengths for Some Major Types
       3.2.1.  The "break" Stop Code
       3.2.2.  Indefinite-Length Arrays and Maps
       3.2.3.  Indefinite-Length Byte Strings and Text Strings
       3.2.4.  Summary of Indefinite-Length Use of Major Types
     3.3.  Floating-Point Numbers and Values with No Content
     3.4.  Tagging of Items
       3.4.1.  Standard Date/Time String
       3.4.2.  Epoch-Based Date/Time
       3.4.3.  Bignums
       3.4.4.  Decimal Fractions and Bigfloats
       3.4.5.  Content Hints
         3.4.5.1.  Encoded CBOR Data Item
         3.4.5.2.  Expected Later Encoding for CBOR-to-JSON Converters
         3.4.5.3.  Encoded Text
       3.4.6.  Self-Described CBOR
   4.  Serialization Considerations
     4.1.  Preferred Serialization
     4.2.  Deterministically Encoded CBOR
       4.2.1.  Core Deterministic Encoding Requirements
       4.2.2.  Additional Deterministic Encoding Considerations
       4.2.3.  Length-First Map Key Ordering
   5.  Creating CBOR-Based Protocols
     5.1.  CBOR in Streaming Applications
     5.2.  Generic Encoders and Decoders
     5.3.  Validity of Items
       5.3.1.  Basic validity
       5.3.2.  Tag validity
     5.4.  Validity and Evolution
     5.5.  Numbers
     5.6.  Specifying Keys for Maps
       5.6.1.  Equivalence of Keys
     5.7.  Undefined Values
   6.  Converting Data between CBOR and JSON
     6.1.  Converting from CBOR to JSON
     6.2.  Converting from JSON to CBOR
   7.  Future Evolution of CBOR
     7.1.  Extension Points
     7.2.  Curating the Additional Information Space
   8.  Diagnostic Notation
     8.1.  Encoding Indicators
   9.  IANA Considerations
     9.1.  CBOR Simple Values Registry
     9.2.  CBOR Tags Registry
     9.3.  Media Types Registry
     9.4.  CoAP Content-Format Registry
     9.5.  Structured Syntax Suffix Registry
   10. Security Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Examples of Encoded CBOR Data Items
   Appendix B.  Jump Table for Initial Byte
   Appendix C.  Pseudocode
   Appendix D.  Half-Precision
   Appendix E.  Comparison of Other Binary Formats to CBOR's Design
           Objectives
     E.1.  ASN.1 DER, BER, and PER
     E.2.  MessagePack
     E.3.  BSON
     E.4.  MSDTP: RFC 713
     E.5.  Conciseness on the Wire
   Appendix F.  Well-Formedness Errors and Examples
     F.1.  Examples of CBOR Data Items That Are Not Well-Formed
   Appendix G.  Changes from RFC 7049
     G.1.  Errata Processing and Clerical Changes
     G.2.  Changes in IANA Considerations
     G.3.  Changes in Suggestions and Other Informational Components
   Acknowledgements
   Authors' Addresses

1.  Introduction

   There are hundreds of standardized formats for binary representation
   of structured data (also known as binary serialization formats). Of
   those, some are for specific domains of information, while others are
   generalized for arbitrary data. In the IETF, probably the best-known
   formats in the latter category are ASN.1's BER and DER [ASN.1].

   The format defined here follows some specific design goals that are
   not well met by current formats. The underlying data model is an
   extended version of the JSON data model [RFC8259]. It is important
   to note that this is not a proposal that the grammar in RFC 8259 be
   extended in general, since doing so would cause a significant
   backwards incompatibility with already deployed JSON documents.
   Instead, this document simply defines its own data model that starts
   from JSON.

   Appendix E lists some existing binary formats and discusses how well
   they do or do not fit the design objectives of the Concise Binary
   Object Representation (CBOR).

   This document obsoletes [RFC7049], providing editorial improvements,
   new details, and errata fixes while keeping full compatibility with
   the interchange format of RFC 7049. It does not create a new version
   of the format.

1.1.  Objectives

   The objectives of CBOR, roughly in decreasing order of importance,
   are:

   1.  The representation must be able to unambiguously encode most
       common data formats used in Internet standards.

       *  It must represent a reasonable set of basic data types and
          structures using binary encoding. "Reasonable" here is
          largely influenced by the capabilities of JSON, with the major
          addition of binary byte strings. The structures supported are
          limited to arrays and trees; loops and lattice-style graphs
          are not supported.

       *  There is no requirement that all data formats be uniquely
          encoded; that is, it is acceptable that the number "7" might
          be encoded in multiple different ways.

   2.  The code for an encoder or decoder must be able to be compact in
       order to support systems with very limited memory, processor
       power, and instruction sets.

       *  An encoder and a decoder need to be implementable in a very
          small amount of code (for example, in class 1 constrained
          nodes as defined in [RFC7228]).

       *  The format should use contemporary machine representations of
          data (for example, not requiring binary-to-decimal
          conversion).

   3.  Data must be able to be decoded without a schema description.

       *  Similar to JSON, encoded data should be self-describing so
          that a generic decoder can be written.

   4.  The serialization must be reasonably compact, but data
       compactness is secondary to code compactness for the encoder and
       decoder.

       *  "Reasonable" here is bounded by JSON as an upper bound in size
          and by the implementation complexity, which limits the amount
          of effort that can go into achieving that compactness. Using
          either general compression schemes or extensive bit-fiddling
          violates the complexity goals.

   5.  The format must be applicable to both constrained nodes and high-
       volume applications.

       *  This means it must be reasonably frugal in CPU usage for both
          encoding and decoding. This is relevant both for constrained
          nodes and for potential usage in applications with a very high
          volume of data.

   6.  The format must support all JSON data types for conversion to and
       from JSON.

       *  It must support a reasonable level of conversion as long as
          the data represented is within the capabilities of JSON. It
          must be possible to define a unidirectional mapping towards
          JSON for all types of data.

   7.  The format must be extensible, and the extended data must be
       decodable by earlier decoders.

       *  The format is designed for decades of use.

       *  The format must support a form of extensibility that allows
          fallback so that a decoder that does not understand an
          extension can still decode the message.

       *  The format must be able to be extended in the future by later
          IETF standards.

1.2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The term "byte" is used in its now-customary sense as a synonym for
   "octet". All multi-byte values are encoded in network byte order
   (that is, most significant byte first, also known as "big-endian").

   This specification makes use of the following terminology:

   Data item:  A single piece of CBOR data. The structure of a data
      item may contain zero, one, or more nested data items. The term
      is used both for the data item in representation format and for
      the abstract idea that can be derived from that by a decoder; the
      former can be addressed specifically by using the term "encoded
      data item".

   Decoder:  A process that decodes a well-formed encoded CBOR data item
      and makes it available to an application. Formally speaking, a
      decoder contains a parser to break up the input using the syntax
      rules of CBOR, as well as a semantic processor to prepare the data
      in a form suitable to the application.

   Encoder:  A process that generates the (well-formed) representation
      format of a CBOR data item from application information.

   Data Stream:  A sequence of zero or more data items, not further
      assembled into a larger containing data item (see [RFC8742] for
      one application). The independent data items that make up a data
      stream are sometimes also referred to as "top-level data items".

   Well-formed:  A data item that follows the syntactic structure of
      CBOR. A well-formed data item uses the initial bytes and the byte
      strings and/or data items that are implied by their values as
      defined in CBOR and does not include following extraneous data.
      CBOR decoders by definition only return contents from well-formed
      data items.

   Valid:  A data item that is well-formed and also follows the semantic
      restrictions that apply to CBOR data items (Section 5.3).

   Expected:  Besides its normal English meaning, the term "expected" is
      used to describe requirements beyond CBOR validity that an
      application has on its input data. Well-formed (processable at
      all), valid (checked by a validity-checking generic decoder), and
      expected (checked by the application) form a hierarchy of layers
      of acceptability.

   Stream decoder:  A process that decodes a data stream and makes each
      of the data items in the sequence available to an application as
      they are received.

   Terms and concepts for floating-point values such as Infinity, NaN
   (not a number), negative zero, and subnormal are defined in
   [IEEE754].

   Where bit arithmetic or data types are explained, this document uses
   the notation familiar from the programming language C [C], except
   that ".." denotes a range that includes both ends given, and
   superscript notation denotes exponentiation. For example, 2 to the
   power of 64 is notated: 2^(64). In the plain-text version of this
   specification, superscript notation is not available and therefore is
   rendered by a surrogate notation. That notation is not optimized for
   this RFC; it is unfortunately ambiguous with C's exclusive-or (which
   is only used in the appendices, which in turn do not use
   exponentiation) and requires circumspection from the reader of the
   plain-text version.
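
   As an informal illustration of the notation paragraph above (not
   part of RFC 8949): Python, like C, reads "^" as exclusive-or, so the
   surrogate notation 2^(64) has to be written with an explicit power
   operator. The helper name "inclusive" is an assumption introduced
   here to mirror the ".." range notation.

```python
# Sketch only: maps the RFC's plain-text notation onto Python operators.


def inclusive(lo: int, hi: int) -> range:
    """Model the RFC's "lo..hi" range, which includes both ends."""
    return range(lo, hi + 1)


# 2^(64) in the RFC's surrogate notation means 2 to the power of 64.
two_to_64 = 2 ** 64

# The same characters read as C (or Python) source mean exclusive-or.
xor_reading = 2 ^ 64

# "0..3" in RFC notation covers 0, 1, 2, and 3.
small_range = list(inclusive(0, 3))
```

   The two readings differ by many orders of magnitude, which is why
   the text asks for circumspection when reading the plain-text form.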

   Examples and pseudocode assume that signed integers use two's
   complement representation and that right shifts of signed integers
   perform sign extension; these assumptions are also specified in
   Sections 6.8.1 (basic.fundamental) and 7.6.7 (expr.shift) of the 2020
   version of C++ (currently available as a final draft, [Cplusplus20]).

   Similar to the "0x" notation for hexadecimal numbers, numbers in
   binary notation are prefixed with "0b". Underscores can be added to
   a number solely for readability, so 0b00100001 (0x21) might be
   written 0b001_00001 to emphasize the desired interpretation of the
   bits in the byte; in this case, it is split into three bits and five
   bits. Encoded CBOR data items are sometimes given in the "0x" or
   "0b" notation; these values are first interpreted as numbers as in C
   and are then interpreted as byte strings in network byte order,
   including any leading zero bytes expressed in the notation.

   Words may be _italicized_ for emphasis; in the plain text form of
   this specification, this is indicated by surrounding words with
   underscore characters. Verbatim text (e.g., names from a programming
   language) may be set in "monospace" type; in plain text, this is
   approximated somewhat ambiguously by surrounding the text in double
   quotes (which also retain their usual meaning).

2.  CBOR Data Models

   CBOR is explicit about its generic data model, which defines the set
   of all data items that can be represented in CBOR. Its basic generic
   data model is extensible by the registration of "simple values" and
   tags. Applications can then create a subset of the resulting
   extended generic data model to build their specific data models.

   Within environments that can represent the data items in the generic
   data model, generic CBOR encoders and decoders can be implemented
   (which usually involves defining additional implementation data types
   for those data items that do not already have a natural
   representation in the environment). The ability to provide generic
   encoders and decoders is an explicit design goal of CBOR; however,
   many applications will provide their own application-specific
   encoders and/or decoders.

   In the basic (unextended) generic data model defined in Section 3, a
   data item is one of the following:

   *  an integer in the range -2^(64)..2^(64)-1 inclusive

   *  a simple value, identified by a number between 0 and 255, but
      distinct from that number itself

   *  a floating-point value, distinct from an integer, out of the set
      representable by IEEE 754 binary64 (including non-finites)
      [IEEE754]

   *  a sequence of zero or more bytes ("byte string")

   *  a sequence of zero or more Unicode code points ("text string")

   *  a sequence of zero or more data items ("array")

   *  a mapping (mathematical function) from zero or more data items
      ("keys") each to a data item ("values"), ("map")

   *  a tagged data item ("tag"), comprising a tag number (an integer in
      the range 0..2^(64)-1) and the tag content (a data item)

   Note that integer and floating-point values are distinct in this
   model, even if they have the same numeric value.

   Also note that serialization variants are not visible at the generic
   data model level. This deliberate absence of visibility includes the
   number of bytes of the encoded floating-point value. It also
   includes the choice of encoding for an "argument" (see Section 3)
   such as the encoding for an integer, the encoding for the length of a
   text or byte string, the encoding for the number of elements in an
   array or pairs in a map, or the encoding for a tag number.

2.1.  Extended Generic Data Models

   This basic generic data model has been extended in this document by
   the registration of a number of simple values and tag numbers, such
   as:

   *  "false", "true", "null", and "undefined" (simple values identified
      by 20..23, Section 3.3)

   *  integer and floating-point values with a larger range and
      precision than the above (tag numbers 2 to 5, Section 3.4)

   *  application data types such as a point in time or date/time string
      defined in RFC 3339 (tag numbers 1 and 0, Section 3.4)

   Additional elements of the extended generic data model can be (and
   have been) defined via the IANA registries created for CBOR. Even if
   such an extension is unknown to a generic encoder or decoder, data
   items using that extension can be passed to or from the application
   by representing them at the application interface within the basic
   generic data model, i.e., as generic simple values or generic tags.

   In other words, the basic generic data model is stable as defined in
   this document, while the extended generic data model expands by the
   registration of new simple values or tag numbers, but never shrinks.

   While there is a strong expectation that generic encoders and
   decoders can represent "false", "true", and "null" ("undefined" is
   intentionally omitted) in the form appropriate for their programming
   environment, the implementation of the data model extensions created
   by tags is truly optional and a matter of implementation quality.

2.2.  Specific Data Models

   The specific data model for a CBOR-based protocol usually takes a
   subset of the extended generic data model and assigns application
   semantics to the data items within this subset and its components.
   When documenting such specific data models and specifying the types
   of data items, it is preferable to identify the types by their
   generic data model names ("negative integer", "array") instead of
   referring to aspects of their CBOR representation ("major type 1",
   "major type 4").

   Specific data models can also specify value equivalency (including
   values of different types) for the purposes of map keys and encoder
   freedom. For example, in the generic data model, a valid map MAY
   have both "0" and "0.0" as keys, and an encoder MUST NOT encode "0.0"
   as an integer (major type 0, Section 3.1). However, if a specific
   data model declares that floating-point and integer representations
   of integral values are equivalent, using both map keys "0" and "0.0"
   in a single map would be considered duplicates, even while encoded as
   different major types, and so invalid; and an encoder could encode
   integral-valued floats as integers or vice versa, perhaps to save
   encoded bytes.

3.  Specification of the CBOR Encoding

   A CBOR data item (Section 2) is encoded to or decoded from a byte
   string carrying a well-formed encoded data item as described in this
   section. The encoding is summarized in Table 7 in Appendix B,
   indexed by the initial byte. An encoder MUST produce only well-
   formed encoded data items. A decoder MUST NOT return a decoded data
   item when it encounters input that is not a well-formed encoded CBOR
   data item (this does not detract from the usefulness of diagnostic
   and recovery tools that might make available some information from a
   damaged encoded CBOR data item).

   The initial byte of each encoded data item contains both information
   about the major type (the high-order 3 bits, described in
   Section 3.1) and additional information (the low-order 5 bits). With
   a few exceptions, the additional information's value describes how to
   load an unsigned integer "argument":

   Less than 24:  The argument's value is the value of the additional
      information.

   24, 25, 26, or 27:  The argument's value is held in the following 1,
      2, 4, or 8 bytes, respectively, in network byte order. For major
      type 7 and additional information value 25, 26, 27, these bytes
      are not used as an integer argument, but as a floating-point value
      (see Section 3.3).

   28, 29, 30:  These values are reserved for future additions to the
      CBOR format. In the present version of CBOR, the encoded item is
      not well-formed.

   31:  No argument value is derived. If the major type is 0, 1, or 6,
      the encoded item is not well-formed. For major types 2 to 5, the
      item's length is indefinite, and for major type 7, the byte does
      not constitute a data item at all but terminates an indefinite-
      length item; all are described in Section 3.2.

   The initial byte and any additional bytes consumed to construct the
   argument are collectively referred to as the _head_ of the data item.

   The meaning of this argument depends on the major type. For example,
   in major type 0, the argument is the value of the data item itself
   (and in major type 1, the value of the data item is computed from the
   argument); in major type 2 and 3, it gives the length of the string
   data in bytes that follow; and in major types 4 and 5, it is used to
   determine the number of data items enclosed.

   If the encoded sequence of bytes ends before the end of a data item,
   that item is not well-formed. If the encoded sequence of bytes still
   has bytes remaining after the outermost encoded item is decoded, that
   encoding is not a single well-formed CBOR item. Depending on the
   application, the decoder may either treat the encoding as not well-
   formed or just identify the start of the remaining bytes to the
   application.

   A CBOR decoder implementation can be based on a jump table with all
   256 defined values for the initial byte (Table 7). A decoder in a
   constrained implementation can instead use the structure of the
   initial byte and following bytes for more compact code (see
   Appendix C for a rough impression of how this could look).

3.1.  Major Types

   The following lists the major types and the additional information
   and other bytes associated with the type.

   Major type 0:
      An unsigned integer in the range 0..2^(64)-1 inclusive. The value
      of the encoded item is the argument itself. For example, the
      integer 10 is denoted as the one byte 0b000_01010 (major type 0,
      additional information 10). The integer 500 would be 0b000_11001
      (major type 0, additional information 25) followed by the two
      bytes 0x01f4, which is 500 in decimal.

   Major type 1:
      A negative integer in the range -2^(64)..-1 inclusive. The value
      of the item is -1 minus the argument. For example, the integer
      -500 would be 0b001_11001 (major type 1, additional information
      25) followed by the two bytes 0x01f3, which is 499 in decimal.

   Major type 2:
      A byte string. The number of bytes in the string is equal to the
      argument. For example, a byte string whose length is 5 would have
      an initial byte of 0b010_00101 (major type 2, additional
      information 5 for the length), followed by 5 bytes of binary
      content. A byte string whose length is 500 would have 3 initial
      bytes of 0b010_11001 (major type 2, additional information 25 to
      indicate a two-byte length) followed by the two bytes 0x01f4 for a
      length of 500, followed by 500 bytes of binary content.

   Major type 3:
      A text string (Section 2) encoded as UTF-8 [RFC3629]. The number
      of bytes in the string is equal to the argument. A string
      containing an invalid UTF-8 sequence is well-formed but invalid
      (Section 1.2). This type is provided for systems that need to
      interpret or display human-readable text, and allows the
      differentiation between unstructured bytes and text that has a
      specified repertoire (that of Unicode) and encoding (UTF-8). In
      contrast to formats such as JSON, the Unicode characters in this
      type are never escaped. Thus, a newline character (U+000A) is
      always represented in a string as the byte 0x0a, and never as the
      bytes 0x5c6e (the characters "\" and "n") nor as 0x5c7530303061
      (the characters "\", "u", "0", "0", "0", and "a").

   Major type 4:
      An array of data items. In other formats, arrays are also called
      lists, sequences, or tuples (a "CBOR sequence" is something
      slightly different, though [RFC8742]). The argument is the number
      of data items in the array. Items in an array do not need to all
      be of the same type. For example, an array that contains 10 items
      of any type would have an initial byte of 0b100_01010 (major type
      4, additional information 10 for the length) followed by the 10
      remaining items.

   Major type 5:
      A map of pairs of data items. Maps are also called tables,
      dictionaries, hashes, or objects (in JSON). A map is comprised of
      pairs of data items, each pair consisting of a key that is
      immediately followed by a value. The argument is the number of
      _pairs_ of data items in the map. For example, a map that
      contains 9 pairs would have an initial byte of 0b101_01001 (major
      type 5, additional information 9 for the number of pairs) followed
      by the 18 remaining items. The first item is the first key, the
      second item is the first value, the third item is the second key,
      and so on. Because items in a map come in pairs, their total
      number is always even: a map that contains an odd number of items
      (no value data present after the last key data item) is not well-
      formed. A map that has duplicate keys may be well-formed, but it
      is not valid, and thus it causes indeterminate decoding; see also
      Section 5.6.

   Major type 6:
      A tagged data item ("tag") whose tag number, an integer in the
      range 0..2^(64)-1 inclusive, is the argument and whose enclosed
      data item (_tag content_) is the single encoded data item that
      follows the head. See Section 3.4.

   Major type 7:
      Floating-point numbers and simple values, as well as the "break"
      stop code. See Section 3.3.

   These eight major types lead to a simple table showing which of the
   256 possible values for the initial byte of a data item are used
   (Table 7).

   In major types 6 and 7, many of the possible values are reserved for
   future specification. See Section 9 for more information on these
   values.

   Table 1 summarizes the major types defined by CBOR, ignoring
   Section 3.2 for now. The number N in this table stands for the
   argument.
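
   The byte-level examples given for major types 0 to 5 in Section 3.1
   can be reproduced with a short sketch. It is illustrative only and
   not taken from the RFC: the names "encode_head" and "encode_int" are
   hypothetical, and the sketch assumes that the shortest available
   argument encoding is always chosen, as the examples in the text do.
   Major type 7 is left out because its following bytes are not always
   an integer argument.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Build the head (initial byte plus argument bytes) of a data item.

    Assumption of this sketch: always pick the shortest argument form.
    """
    if not 0 <= major_type <= 6:
        raise ValueError("this sketch covers major types 0..6 only")
    if not 0 <= argument < 2 ** 64:
        raise ValueError("argument must be in 0..2^(64)-1")
    if argument < 24:
        # The argument fits in the low-order 5 bits of the initial byte.
        return bytes([(major_type << 5) | argument])
    # Additional information 24, 25, 26, 27: 1, 2, 4, or 8 bytes follow,
    # in network byte order.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            initial = bytes([(major_type << 5) | info])
            return initial + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument range checked above")


def encode_int(value: int) -> bytes:
    """Encode an integer as major type 0 (N) or major type 1 (-1 - N)."""
    if value >= 0:
        return encode_head(0, value)
    return encode_head(1, -1 - value)


examples = {
    "integer 10": encode_int(10).hex(),
    "integer 500": encode_int(500).hex(),
    "integer -500": encode_int(-500).hex(),
    "byte string of length 500 (head only)": encode_head(2, 500).hex(),
    "array of 10 items (head only)": encode_head(4, 10).hex(),
    "map of 9 pairs (head only)": encode_head(5, 9).hex(),
}
```

   For strings, arrays, and maps the sketch produces only the head; the
   content bytes or nested data items would follow it.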
592 + 593 + +============+=======================+=========================+ 594 + | Major Type | Meaning | Content | 595 + +============+=======================+=========================+ 596 + | 0 | unsigned integer N | - | 597 + +------------+-----------------------+-------------------------+ 598 + | 1 | negative integer -1-N | - | 599 + +------------+-----------------------+-------------------------+ 600 + | 2 | byte string | N bytes | 601 + +------------+-----------------------+-------------------------+ 602 + | 3 | text string | N bytes (UTF-8 text) | 603 + +------------+-----------------------+-------------------------+ 604 + | 4 | array | N data items (elements) | 605 + +------------+-----------------------+-------------------------+ 606 + | 5 | map | 2N data items (key/ | 607 + | | | value pairs) | 608 + +------------+-----------------------+-------------------------+ 609 + | 6 | tag of number N | 1 data item | 610 + +------------+-----------------------+-------------------------+ 611 + | 7 | simple/float | - | 612 + +------------+-----------------------+-------------------------+ 613 + 614 + Table 1: Overview over the Definite-Length Use of CBOR Major 615 + Types (N = Argument) 616 + 617 + 3.2. Indefinite Lengths for Some Major Types 618 + 619 + Four CBOR items (arrays, maps, byte strings, and text strings) can be 620 + encoded with an indefinite length using additional information value 621 + 31. This is useful if the encoding of the item needs to begin before 622 + the number of items inside the array or map, or the total length of 623 + the string, is known. (The ability to start sending a data item 624 + before all of it is known is often referred to as "streaming" within 625 + that data item.) 626 + 627 + Indefinite-length arrays and maps are dealt with differently than 628 + indefinite-length strings (byte strings and text strings). 629 + 630 + 3.2.1. 
The "break" Stop Code 631 + 632 + The "break" stop code is encoded with major type 7 and additional 633 + information value 31 (0b111_11111). It is not itself a data item: it 634 + is just a syntactic feature to close an indefinite-length item. 635 + 636 + If the "break" stop code appears where a data item is expected, other 637 + than directly inside an indefinite-length string, array, or map -- 638 + for example, directly inside a definite-length array or map -- the 639 + enclosing item is not well-formed. 640 + 641 + 3.2.2. Indefinite-Length Arrays and Maps 642 + 643 + Indefinite-length arrays and maps are represented using their major 644 + type with the additional information value of 31, followed by an 645 + arbitrary-length sequence of zero or more items for an array or key/ 646 + value pairs for a map, followed by the "break" stop code 647 + (Section 3.2.1). In other words, indefinite-length arrays and maps 648 + look identical to other arrays and maps except for beginning with the 649 + additional information value of 31 and ending with the "break" stop 650 + code. 651 + 652 + If the "break" stop code appears after a key in a map, in place of 653 + that key's value, the map is not well-formed. 654 + 655 + There is no restriction against nesting indefinite-length array or 656 + map items. A "break" only terminates a single item, so nested 657 + indefinite-length items need exactly as many "break" stop codes as 658 + there are type bytes starting an indefinite-length item. 659 + 660 + For example, assume an encoder wants to represent the abstract array 661 + [1, [2, 3], [4, 5]]. 
   The definite-length encoding would be
   0x8301820203820405:

   83        -- Array of length 3
      01     -- 1
      82     -- Array of length 2
         02  -- 2
         03  -- 3
      82     -- Array of length 2
         04  -- 4
         05  -- 5

   Indefinite-length encoding could be applied independently to each of
   the three arrays encoded in this data item, as required, leading to
   representations such as:

   0x9f018202039f0405ffff
   9F        -- Start indefinite-length array
      01     -- 1
      82     -- Array of length 2
         02  -- 2
         03  -- 3
      9F     -- Start indefinite-length array
         04  -- 4
         05  -- 5
         FF  -- "break" (inner array)
      FF     -- "break" (outer array)

   0x9f01820203820405ff
   9F        -- Start indefinite-length array
      01     -- 1
      82     -- Array of length 2
         02  -- 2
         03  -- 3
      82     -- Array of length 2
         04  -- 4
         05  -- 5
      FF     -- "break"

   0x83018202039f0405ff
   83        -- Array of length 3
      01     -- 1
      82     -- Array of length 2
         02  -- 2
         03  -- 3
      9F     -- Start indefinite-length array
         04  -- 4
         05  -- 5
         FF  -- "break"

   0x83019f0203ff820405
   83        -- Array of length 3
      01     -- 1
      9F     -- Start indefinite-length array
         02  -- 2
         03  -- 3
         FF  -- "break"
      82     -- Array of length 2
         04  -- 4
         05  -- 5

   An example of an indefinite-length map (that happens to have two key/
   value pairs) might be:

   0xbf6346756ef563416d7421ff
   BF           -- Start indefinite-length map
      63        -- First key, UTF-8 string length 3
         46756e -- "Fun"
      F5        -- First value, true
      63        -- Second key, UTF-8 string length 3
         416d74 -- "Amt"
      21        -- Second value, -2
      FF        -- "break"

3.2.3.  Indefinite-Length Byte Strings and Text Strings

   Indefinite-length strings are represented by a byte containing the
   major type for byte string or text string with an additional
   information value of 31, followed by a series of zero or more strings
   of the specified type ("chunks") that have definite lengths, and
   finished by the "break" stop code (Section 3.2.1). The data item
   represented by the indefinite-length string is the concatenation of
   the chunks. If no chunks are present, the data item is an empty
   string of the specified type. Zero-length chunks, while not
   particularly useful, are permitted.

   If any item between the indefinite-length string indicator
   (0b010_11111 or 0b011_11111) and the "break" stop code is not a
   definite-length string item of the same major type, the string is not
   well-formed.

   The design does not allow nesting indefinite-length strings as chunks
   into indefinite-length strings. If it were allowed, it would require
   decoder implementations to keep a stack, or at least a count, of
   nesting levels. It is unnecessary on the encoder side because the
   inner indefinite-length string would consist of chunks, and these
   could instead be put directly into the outer indefinite-length
   string.

   If any definite-length text string inside an indefinite-length text
   string is invalid, the indefinite-length text string is invalid.
   Note that this implies that the UTF-8 bytes of a single Unicode code
   point (scalar value) cannot be spread between chunks: a new chunk of
   a text string can only be started at a code point boundary.
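
   The chunk rules above can be illustrated with a minimal Python
   sketch. It is not part of the specification; the helper names
   read_head and decode_indefinite_string are hypothetical, and the
   sketch assumes the whole encoded item is already in memory. It
   concatenates the chunks, rejects nested indefinite-length strings
   and chunks of another major type, and checks each text chunk as
   UTF-8 on its own.

```python
BREAK = 0xFF  # major type 7, additional information 31


def read_head(data, pos):
    """Return (major_type, additional_info, argument, next_pos) for the head at pos.

    The argument is None when additional information is 31 (indefinite length).
    """
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, info, pos
    if info in (24, 25, 26, 27):
        size = 1 << (info - 24)  # 1, 2, 4, or 8 argument bytes
        if pos + size > len(data):
            raise ValueError("truncated head")
        return major, info, int.from_bytes(data[pos:pos + size], "big"), pos + size
    if info == 31:
        return major, info, None, pos
    raise ValueError("additional information 28..30 is not well-formed")


def decode_indefinite_string(data):
    """Decode one indefinite-length byte or text string starting at data[0]."""
    major, info, _, pos = read_head(data, 0)
    if major not in (2, 3) or info != 31:
        raise ValueError("not an indefinite-length string")
    chunks = []
    while True:
        if pos >= len(data):
            raise ValueError('missing "break" stop code')
        if data[pos] == BREAK:
            break
        chunk_major, chunk_info, length, pos = read_head(data, pos)
        if chunk_major != major or chunk_info == 31:
            raise ValueError("chunk is not a definite-length string of the same major type")
        chunk = data[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("truncated chunk")
        if major == 3:
            chunk.decode("utf-8")  # each text chunk must be valid UTF-8 by itself
        chunks.append(chunk)
        pos += length
    payload = b"".join(chunks)
    return payload.decode("utf-8") if major == 3 else payload
```

   With this sketch, a text string whose chunks split the two UTF-8
   bytes of a single code point is rejected, because the first chunk
   fails UTF-8 validation before any concatenation happens.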

   For example, assume an encoded data item consisting of the bytes:

   0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111
   5F              -- Start indefinite-length byte string
      44           -- Byte string of length 4
         aabbccdd  -- Bytes content
      43           -- Byte string of length 3
         eeff99    -- Bytes content
      FF           -- "break"

   After decoding, this results in a single byte string with seven
   bytes: 0xaabbccddeeff99.

3.2.4.  Summary of Indefinite-Length Use of Major Types

   Table 2 summarizes the major types defined by CBOR as used for
   indefinite-length encoding (with additional information set to 31).

   +============+===================+==================================+
   | Major Type | Meaning           | Enclosed up to "break" Stop Code |
   +============+===================+==================================+
   | 0          | (not well-        | -                                |
   |            | formed)           |                                  |
   +------------+-------------------+----------------------------------+
   | 1          | (not well-        | -                                |
   |            | formed)           |                                  |
   +------------+-------------------+----------------------------------+
   | 2          | byte string       | definite-length byte strings     |
   +------------+-------------------+----------------------------------+
   | 3          | text string       | definite-length text strings     |
   +------------+-------------------+----------------------------------+
   | 4          | array             | data items (elements)            |
   +------------+-------------------+----------------------------------+
   | 5          | map               | data items (key/value pairs)     |
   +------------+-------------------+----------------------------------+
   | 6          | (not well-        | -                                |
   |            | formed)           |                                  |
   +------------+-------------------+----------------------------------+
   | 7          | "break" stop      | -                                |
   |            | code              |                                  |
   +------------+-------------------+----------------------------------+

      Table 2: Overview of the Indefinite-Length Use of CBOR Major
                  Types (Additional Information = 31)

3.3.  Floating-Point Numbers and Values with No Content

   Major type 7 is for two types of data: floating-point numbers and
   "simple values" that do not need any content. Each value of the
   5-bit additional information in the initial byte has its own separate
   meaning, as defined in Table 3. Like the major types for integers,
   items of this major type do not carry content data; all the
   information is in the initial bytes (the head).

   +=============+===================================================+
   | 5-Bit Value | Semantics                                         |
   +=============+===================================================+
   | 0..23       | Simple value (value 0..23)                        |
   +-------------+---------------------------------------------------+
   | 24          | Simple value (value 32..255 in following byte)    |
   +-------------+---------------------------------------------------+
   | 25          | IEEE 754 Half-Precision Float (16 bits follow)    |
   +-------------+---------------------------------------------------+
   | 26          | IEEE 754 Single-Precision Float (32 bits follow)  |
   +-------------+---------------------------------------------------+
   | 27          | IEEE 754 Double-Precision Float (64 bits follow)  |
   +-------------+---------------------------------------------------+
   | 28-30       | Reserved, not well-formed in the present document |
   +-------------+---------------------------------------------------+
   | 31          | "break" stop code for indefinite-length items     |
   |             | (Section 3.2.1)                                   |
   +-------------+---------------------------------------------------+

       Table 3: Values for Additional Information in Major Type 7

   As with all other major types, the 5-bit value 24 signifies a single-
   byte extension: it is followed by an additional byte to represent the
   simple value.
   (To minimize confusion, only the values 32 to 255 are
   used.) This maintains the structure of the initial bytes: as for the
   other major types, the length of these always depends on the
   additional information in the first byte. Table 4 lists the numeric
   values assigned and available for simple values.

   +=========+==============+
   | Value   | Semantics    |
   +=========+==============+
   | 0..19   | (unassigned) |
   +---------+--------------+
   | 20      | false        |
   +---------+--------------+
   | 21      | true         |
   +---------+--------------+
   | 22      | null         |
   +---------+--------------+
   | 23      | undefined    |
   +---------+--------------+
   | 24..31  | (reserved)   |
   +---------+--------------+
   | 32..255 | (unassigned) |
   +---------+--------------+

     Table 4: Simple Values

   An encoder MUST NOT issue two-byte sequences that start with 0xf8
   (major type 7, additional information 24) and continue with a byte
   less than 0x20 (32 decimal). Such sequences are not well-formed.
   (This implies that an encoder cannot encode "false", "true", "null",
   or "undefined" in two-byte sequences and that only the one-byte
   variants of these are well-formed; more generally speaking, each
   simple value only has a single representation variant).

   The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
   IEEE 754 binary floating-point values [IEEE754]. These floating-
   point values are encoded in the additional bytes of the appropriate
   size. (See Appendix D for some information about 16-bit floating-
   point numbers.)

3.4.  Tagging of Items

   In CBOR, a data item can be enclosed by a tag to give it some
   additional semantics, as uniquely identified by a _tag number_.
   The tag is major type 6, its argument (Section 3) indicates the tag
   number, and it contains a single enclosed data item, the _tag
   content_. (If a tag requires further structure to its content, this
   structure is provided by the enclosed data item.) We use the term
   _tag_ for the entire data item consisting of both a tag number and
   the tag content: the tag content is the data item that is being
   tagged.

   For example, assume that a byte string of length 12 is marked with a
   tag of number 2 to indicate it is an unsigned _bignum_
   (Section 3.4.3). The encoded data item would start with a byte
   0b110_00010 (major type 6, additional information 2 for the tag
   number) followed by the encoded tag content: 0b010_01100 (major type
   2, additional information 12 for the length) followed by the 12 bytes
   of the bignum.

   In the extended generic data model, a tag number's definition
   describes the additional semantics conveyed with the tag number.
   These semantics may include equivalence of some tagged data items
   with other data items, including some that can be represented in the
   basic generic data model. For instance, 0xc24101, a bignum the tag
   content of which is the byte string with the single byte 0x01, is
   equivalent to an integer 1, which could also be encoded as 0x01,
   0x1801, or 0x190001. The tag definition may specify a preferred
   serialization (Section 4.1) that is recommended for generic encoders;
   this may prefer basic generic data model representations over ones
   that employ a tag.

   The tag definition usually defines which nested data items are valid
   for such tags. Tag definitions may restrict their content to a very
   specific syntactic structure, as the tags defined in this document
   do, or they may define their content more semantically.
   An example for the latter is how tags 40 and 1040 accept multiple
   ways to represent arrays [RFC8746].

   As a matter of convention, many tags do not accept "null" or
   "undefined" values as tag content; instead, the expectation is that a
   "null" or "undefined" value can be used in place of the entire tag;
   Section 3.4.2 provides some further considerations for one specific
   tag about the handling of this convention in application protocols
   and in mapping to platform types.

   Decoders do not need to understand tags of every tag number, and tags
   may be of little value in applications where the implementation
   creating a particular CBOR data item and the implementation decoding
   that stream know the semantic meaning of each item in the data flow.
   The primary purpose of tags in this specification is to define common
   data types such as dates. A secondary purpose is to provide
   conversion hints when it is foreseen that the CBOR data item needs to
   be translated into a different format, requiring hints about the
   content of items. Understanding the semantics of tags is optional
   for a decoder; it can simply present both the tag number and the tag
   content to the application, without interpreting the additional
   semantics of the tag.

   A tag applies semantics to the data item it encloses. Tags can nest:
   if tag A encloses tag B, which encloses data item C, tag A applies to
   the result of applying tag B on data item C.

   IANA maintains a registry of tag numbers as described in Section 9.2.
   Table 5 provides a list of tag numbers that were defined in [RFC7049]
   with definitions in the rest of this section. (Tag number 35 was
   also defined in [RFC7049]; a discussion of this tag number follows in
   Section 3.4.5.3.)
   Note that many other tag numbers have been defined
   since the publication of [RFC7049]; see the registry described at
   Section 9.2 for the complete list.

   +=======+=============+==================================+
   | Tag   | Data Item   | Semantics                        |
   +=======+=============+==================================+
   | 0     | text string | Standard date/time string; see   |
   |       |             | Section 3.4.1                    |
   +-------+-------------+----------------------------------+
   | 1     | integer or  | Epoch-based date/time; see       |
   |       | float       | Section 3.4.2                    |
   +-------+-------------+----------------------------------+
   | 2     | byte string | Unsigned bignum; see             |
   |       |             | Section 3.4.3                    |
   +-------+-------------+----------------------------------+
   | 3     | byte string | Negative bignum; see             |
   |       |             | Section 3.4.3                    |
   +-------+-------------+----------------------------------+
   | 4     | array       | Decimal fraction; see            |
   |       |             | Section 3.4.4                    |
   +-------+-------------+----------------------------------+
   | 5     | array       | Bigfloat; see Section 3.4.4      |
   +-------+-------------+----------------------------------+
   | 21    | (any)       | Expected conversion to base64url |
   |       |             | encoding; see Section 3.4.5.2    |
   +-------+-------------+----------------------------------+
   | 22    | (any)       | Expected conversion to base64    |
   |       |             | encoding; see Section 3.4.5.2    |
   +-------+-------------+----------------------------------+
   | 23    | (any)       | Expected conversion to base16    |
   |       |             | encoding; see Section 3.4.5.2    |
   +-------+-------------+----------------------------------+
   | 24    | byte string | Encoded CBOR data item; see      |
   |       |             | Section 3.4.5.1                  |
   +-------+-------------+----------------------------------+
   | 32    | text string | URI; see Section 3.4.5.3         |
   +-------+-------------+----------------------------------+
   | 33    | text string | base64url; see Section 3.4.5.3   |
   +-------+-------------+----------------------------------+
   | 34    | text string | base64; see Section 3.4.5.3      |
   +-------+-------------+----------------------------------+
   | 36    | text string | MIME message; see                |
   |       |             | Section 3.4.5.3                  |
   +-------+-------------+----------------------------------+
   | 55799 | (any)       | Self-described CBOR; see         |
   |       |             | Section 3.4.6                    |
   +-------+-------------+----------------------------------+

            Table 5: Tag Numbers Defined in RFC 7049

   Conceptually, tags are interpreted in the generic data model, not at
   (de-)serialization time. A small number of tags (at this time, tag
   number 25 and tag number 29 [IANA.cbor-tags]) have been registered
   with semantics that may require processing at (de-)serialization
   time: the decoder needs to be aware of, and the encoder needs to be
   in control of, the exact sequence in which data items are encoded
   into the CBOR data item. This means these tags cannot be implemented
   on top of an arbitrary generic CBOR encoder/decoder (which might not
   reflect the serialization order for entries in a map at the data
   model level and vice versa); their implementation therefore typically
   needs to be integrated into the generic encoder/decoder. The
   definition of new tags with this property is NOT RECOMMENDED.

   IANA allocated tag numbers 65535, 4294967295, and
   18446744073709551615 (binary all-ones in 16-bit, 32-bit, and 64-bit).
   These can be used as a convenience for implementers who want a
   single-integer data structure to indicate either the presence of a
   specific tag or absence of a tag. That allocation is described in
   Section 10 of [CBOR-TAGS]. These tags are not intended to occur in
   actual CBOR data items; implementations MAY flag such an occurrence
   as an error.
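
   As a rough illustration of the single-integer convention described
   above, the following Python sketch stores either a tag number or a
   "no tag" marker in one integer field. The names NO_TAG,
   accept_tag_number, and describe are hypothetical, and rejecting the
   reserved numbers is only one permitted choice (the text above says
   MAY).

```python
# The three reserved tag numbers: binary all-ones in 16, 32, and 64 bits.
RESERVED_TAG_NUMBERS = frozenset((1 << bits) - 1 for bits in (16, 32, 64))

# Hypothetical sentinel meaning "no tag" in a 64-bit integer field.
NO_TAG = (1 << 64) - 1


def accept_tag_number(tag_number):
    """Return tag_number unless it is one of the reserved all-ones values.

    Flagging the occurrence is optional; this sketch picks the strict option.
    """
    if tag_number in RESERVED_TAG_NUMBERS:
        raise ValueError(f"reserved tag number {tag_number} found in encoded data")
    return tag_number


def describe(tag_field):
    """Interpret a field that holds either a tag number or NO_TAG."""
    return "untagged" if tag_field == NO_TAG else f"tag {tag_field}"
```

   Because the sentinel can never be a tag number seen in well-behaved
   data, the field needs no separate presence flag.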

   Protocols can extend the generic data model (Section 2) with data
   items representing points in time by using tag numbers 0 and 1, with
   arbitrarily sized integers by using tag numbers 2 and 3, and with
   floating-point values of arbitrary size and precision by using tag
   numbers 4 and 5.

3.4.1.  Standard Date/Time String

   Tag number 0 contains a text string in the standard format described
   by the "date-time" production in [RFC3339], as refined by Section 3.3
   of [RFC4287], representing the point in time described there. A
   nested item of another type or a text string that doesn't match the
   format described in [RFC4287] is invalid.

3.4.2.  Epoch-Based Date/Time

   Tag number 1 contains a numerical value counting the number of
   seconds from 1970-01-01T00:00Z in UTC time to the represented point
   in civil time.

   The tag content MUST be an unsigned or negative integer (major types
   0 and 1) or a floating-point number (major type 7 with additional
   information 25, 26, or 27). Other contained types are invalid.

   Nonnegative values (major type 0 and nonnegative floating-point
   numbers) stand for time values on or after 1970-01-01T00:00Z UTC and
   are interpreted according to POSIX [TIME_T]. (POSIX time is also
   known as "UNIX Epoch time".) Leap seconds are handled specially by
   POSIX time, and this results in a 1-second discontinuity several
   times per decade. Note that applications that require the expression
   of times beyond early 2106 cannot leave out support of 64-bit
   integers for the tag content.
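
   The "early 2106" limit follows from the largest value that fits in a
   32-bit unsigned argument, 2^32 - 1 seconds after the epoch. The
   Python sketch below works this out; epoch_to_utc is a hypothetical
   helper, and it assumes the usual POSIX mapping of exactly 86400
   seconds per day.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_utc(seconds):
    """Map nonnegative tag number 1 content (POSIX seconds) to a UTC datetime."""
    if seconds < 0:
        # Negative values are left to the application, as the text notes.
        raise ValueError("negative values are application-defined in this sketch")
    return EPOCH + timedelta(seconds=seconds)


# Largest argument that fits in 32 bits (additional information 26).
MAX_UINT32 = 2**32 - 1
print(epoch_to_utc(MAX_UINT32).isoformat())  # 2106-02-07T06:28:15+00:00
```

   Any later point in time needs an argument of 2^32 or more, which
   only the 64-bit integer form (additional information 27) can carry.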

   Negative values (major type 1 and negative floating-point numbers)
   are interpreted as determined by the application requirements as
   there is no universal standard for UTC count-of-seconds time before
   1970-01-01T00:00Z (this is particularly true for points in time that
   precede discontinuities in national calendars). The same applies to
   non-finite values.

   To indicate fractional seconds, floating-point values can be used
   within tag number 1 instead of integer values. Note that this
   generally requires binary64 support, as binary16 and binary32 provide
   nonzero fractions of seconds only for a short period of time around
   early 1970. An application that requires tag number 1 support may
   restrict the tag content to be an integer (or a floating-point value)
   only.

   Note that platform types for date/time may include "null" or
   "undefined" values, which may also be desirable at an application
   protocol level. While emitting tag number 1 values with non-finite
   tag content values (e.g., with NaN for undefined date/time values or
   with Infinity for an expiry date that is not set) may seem an obvious
   way to handle this, using untagged "null" or "undefined" avoids the
   use of non-finites and results in a shorter encoding. Application
   protocol designers are encouraged to consider these cases and include
   clear guidelines for handling them.

3.4.3.  Bignums

   Protocols using tag numbers 2 and 3 extend the generic data model
   (Section 2) with "bignums" representing arbitrarily sized integers.
   In the basic generic data model, bignum values are not equal to
   integers from the same model, but the extended generic data model
   created by this tag definition defines equivalence based on numeric
   value, and preferred serialization (Section 4.1) never makes use of
   bignums that also can be expressed as basic integers (see below).

   Bignums are encoded as a byte string data item, which is interpreted
   as an unsigned integer n in network byte order. Contained items of
   other types are invalid. For tag number 2, the value of the bignum
   is n. For tag number 3, the value of the bignum is -1 - n. The
   preferred serialization of the byte string is to leave out any
   leading zeroes (note that this means the preferred serialization for
   n = 0 is the empty byte string, but see below). Decoders that
   understand these tags MUST be able to decode bignums that do have
   leading zeroes. The preferred serialization of an integer that can
   be represented using major type 0 or 1 is to encode it this way
   instead of as a bignum (which means that the empty string never
   occurs in a bignum when using preferred serialization). Note that
   this means the non-preferred choice of a bignum representation
   instead of a basic integer for encoding a number is not intended to
   have application semantics (just as the choice of a longer basic
   integer representation than needed, such as 0x1800 for 0x00, does
   not).

   For example, the number 18446744073709551616 (2^(64)) is represented
   as 0b110_00010 (major type 6, tag number 2), followed by 0b010_01001
   (major type 2, length 9), followed by 0x010000000000000000 (one byte
   0x01 and eight bytes 0x00). In hexadecimal:

   C2                        -- Tag 2
      49                     -- Byte string of length 9
         010000000000000000  -- Bytes content

3.4.4.  Decimal Fractions and Bigfloats

   Protocols using tag number 4 extend the generic data model with data
   items representing arbitrary-length decimal fractions of the form
   m*(10^(e)). Protocols using tag number 5 extend the generic data
   model with data items representing arbitrary-length binary fractions
   of the form m*(2^(e)). As with bignums, values of different types
   are not equal in the generic data model.

   Decimal fractions combine an integer mantissa with a base-10 scaling
   factor. They are most useful if an application needs the exact
   representation of a decimal fraction such as 1.1 because there is no
   exact representation for many decimal fractions in binary floating-
   point representations.

   "Bigfloats" combine an integer mantissa with a base-2 scaling factor.
   They are binary floating-point values that can exceed the range or
   the precision of the three IEEE 754 formats supported by CBOR
   (Section 3.3). Bigfloats may also be used by constrained
   applications that need some basic binary floating-point capability
   without the need for supporting IEEE 754.

   A decimal fraction or a bigfloat is represented as a tagged array
   that contains exactly two integer numbers: an exponent e and a
   mantissa m. Decimal fractions (tag number 4) use base-10 exponents;
   the value of a decimal fraction data item is m*(10^(e)). Bigfloats
   (tag number 5) use base-2 exponents; the value of a bigfloat data
   item is m*(2^(e)). The exponent e MUST be represented in an integer
   of major type 0 or 1, while the mantissa can also be a bignum
   (Section 3.4.3). Contained items with other structures are invalid.
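
   The value rule m*(10^(e)) or m*(2^(e)) can be checked with exact
   rational arithmetic. The Python sketch below is illustrative only;
   tagged_fraction_value is a hypothetical helper that takes the
   already-decoded tag content [e, m] as Python integers (so a bignum
   mantissa is simply a large int). It does not guard against very
   large exponents, which would be impractical to expand.

```python
from fractions import Fraction


def tagged_fraction_value(tag_number, content):
    """Return the exact value of tag 4 (decimal fraction) or tag 5 (bigfloat)."""
    if tag_number not in (4, 5):
        raise ValueError("only tag numbers 4 and 5 are handled")
    if not isinstance(content, (list, tuple)) or len(content) != 2:
        raise ValueError("tag content must be an array of exactly two items")
    e, m = content
    for item in (e, m):
        if not isinstance(item, int) or isinstance(item, bool):
            raise ValueError("exponent and mantissa must be integers")
    # The exponent must fit major type 0 or 1; the mantissa may be a bignum.
    if not -(2**64) <= e <= 2**64 - 1:
        raise ValueError("exponent does not fit major type 0 or 1")
    base = 10 if tag_number == 4 else 2
    return m * Fraction(base) ** e


print(tagged_fraction_value(4, [-2, 27315]))  # 5463/20, i.e. 273.15
print(tagged_fraction_value(5, [-1, 3]))      # 3/2, i.e. 1.5
```

   Using Fraction rather than float keeps a value such as 273.15
   exact, which is the property decimal fractions exist to preserve.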

   An example of a decimal fraction is the representation of the number
   273.15 as 0b110_00100 (major type 6 for tag, additional information 4
   for the tag number), followed by 0b100_00010 (major type 4 for the
   array, additional information 2 for the length of the array),
   followed by 0b001_00001 (major type 1 for the first integer,
   additional information 1 for the value of -2), followed by
   0b000_11001 (major type 0 for the second integer, additional
   information 25 for a two-byte value), followed by 0b0110101010110011
   (27315 in two bytes). In hexadecimal:

   C4             -- Tag 4
      82          -- Array of length 2
         21       -- -2
         19 6ab3  -- 27315

   An example of a bigfloat is the representation of the number 1.5 as
   0b110_00101 (major type 6 for tag, additional information 5 for the
   tag number), followed by 0b100_00010 (major type 4 for the array,
   additional information 2 for the length of the array), followed by
   0b001_00000 (major type 1 for the first integer, additional
   information 0 for the value of -1), followed by 0b000_00011 (major
   type 0 for the second integer, additional information 3 for the value
   of 3). In hexadecimal:

   C5             -- Tag 5
      82          -- Array of length 2
         20       -- -1
         03       -- 3

   Decimal fractions and bigfloats provide no representation of
   Infinity, -Infinity, or NaN; if these are needed in place of a
   decimal fraction or bigfloat, the IEEE 754 half-precision
   representations from Section 3.3 can be used.

3.4.5.  Content Hints

   The tags in this section are for content hints that might be used by
   generic CBOR processors. These content hints do not extend the
   generic data model.

3.4.5.1.  Encoded CBOR Data Item

   Sometimes it is beneficial to carry an embedded CBOR data item that
   is not meant to be decoded immediately at the time the enclosing data
   item is being decoded. Tag number 24 (CBOR data item) can be used to
   tag the embedded byte string as a single data item encoded in CBOR
   format. Contained items that aren't byte strings are invalid. A
   contained byte string is valid if it encodes a well-formed CBOR data
   item; validity checking of the decoded CBOR item is not required for
   tag validity (but could be offered by a generic decoder as a special
   option).

3.4.5.2.  Expected Later Encoding for CBOR-to-JSON Converters

   Tag numbers 21 to 23 indicate that a byte string might require a
   specific encoding when interoperating with a text-based
   representation. These tags are useful when an encoder knows that the
   byte string data it is writing is likely to be later converted to a
   particular JSON-based usage. That usage specifies that some strings
   are encoded as base64, base64url, and so on. The encoder uses byte
   strings instead of doing the encoding itself to reduce the message
   size, to reduce the code size of the encoder, or both. The encoder
   does not know whether or not the converter will be generic, and
   therefore wants to say what it believes is the proper way to convert
   binary strings to JSON.

   The data item tagged can be a byte string or any other data item. In
   the latter case, the tag applies to all of the byte string data items
   contained in the data item, except for those contained in a nested
   data item tagged with an expected conversion.

   These three tag numbers suggest conversions to three of the base data
   encodings defined in [RFC4648].
   Tag number 21 suggests conversion to
   base64url encoding (Section 5 of [RFC4648]) where padding is not used
   (see Section 3.2 of [RFC4648]); that is, all trailing equals signs
   ("=") are removed from the encoded string. Tag number 22 suggests
   conversion to classical base64 encoding (Section 4 of [RFC4648]) with
   padding as defined in RFC 4648. For both base64url and base64,
   padding bits are set to zero (see Section 3.5 of [RFC4648]), and the
   conversion to alternate encoding is performed on the contents of the
   byte string (that is, without adding any line breaks, whitespace, or
   other additional characters). Tag number 23 suggests conversion to
   base16 (hex) encoding with uppercase alphabetics (see Section 8 of
   [RFC4648]). Note that, for all three tag numbers, the encoding of
   the empty byte string is the empty text string.

3.4.5.3.  Encoded Text

   Some text strings hold data that have formats widely used on the
   Internet, and sometimes those formats can be validated and presented
   to the application in appropriate form by the decoder. There are
   tags for some of these formats.

   *  Tag number 32 is for URIs, as defined in [RFC3986]. If the text
      string doesn't match the "URI-reference" production, the string is
      invalid.

   *  Tag numbers 33 and 34 are for base64url- and base64-encoded text
      strings, respectively, as defined in [RFC4648]. If any of the
      following apply:

      -  the encoded text string contains non-alphabet characters or
         only 1 alphabet character in the last block of 4 (where
         alphabet is defined by Section 5 of [RFC4648] for tag number 33
         and Section 4 of [RFC4648] for tag number 34), or

      -  the padding bits in a 2- or 3-character block are not 0, or

      -  the base64 encoding has the wrong number of padding characters,
         or

      -  the base64url encoding has padding characters,

      the string is invalid.

   *  Tag number 36 is for MIME messages (including all headers), as
      defined in [RFC2045]. A text string that isn't a valid MIME
      message is invalid. (For this tag, validity checking may be
      particularly onerous for a generic decoder and might therefore not
      be offered. Note that many MIME messages are general binary data
      and therefore cannot be represented in a text string;
      [IANA.cbor-tags] lists a registration for tag number 257 that is
      similar to tag number 36 but uses a byte string as its tag
      content.)

   Note that tag numbers 33 and 34 differ from 21 and 22 in that the
   data is transported in base-encoded form for the former and in raw
   byte string form for the latter.

   [RFC7049] also defined a tag number 35 for regular expressions that
   are in Perl Compatible Regular Expressions (PCRE/PCRE2) form [PCRE]
   or in JavaScript regular expression syntax [ECMA262]. The state of
   the art in these regular expression specifications has since advanced
   and is continually advancing, so this specification does not attempt
   to update the references.
   Instead, this tag remains available (as
   registered in [RFC7049]) for applications that specify the particular
   regular expression variant they use out-of-band (possibly by limiting
   the usage to a defined common subset of both PCRE and ECMA262). As
   this specification clarifies tag validity beyond [RFC7049], we note
   that due to the open way the tag was defined in [RFC7049], any
   contained string value needs to be valid at the CBOR tag level (but
   then may not be "expected" at the application level).

3.4.6.  Self-Described CBOR

   In many applications, it will be clear from the context that CBOR is
   being employed for encoding a data item. For instance, a specific
   protocol might specify the use of CBOR, or a media type is indicated
   that specifies its use. However, there may be applications where
   such context information is not available, such as when CBOR data is
   stored in a file that does not have disambiguating metadata. Here,
   it may help to have some distinguishing characteristics for the data
   itself.

   Tag number 55799 is defined for this purpose, specifically for use at
   the start of a stored encoded CBOR data item as specified by an
   application. It does not impart any special semantics on the data
   item that it encloses; that is, the semantics of the tag content
   enclosed in tag number 55799 is exactly identical to the semantics of
   the tag content itself.

   The serialization of this tag's head is 0xd9d9f7, which does not
   appear to be in use as a distinguishing mark for any frequently used
   file types. In particular, 0xd9d9f7 is not a valid start of a
   Unicode text in any Unicode encoding if it is followed by a valid
   CBOR data item.

   For instance, a decoder might be able to decode both CBOR and JSON.
   Such a decoder would need to mechanically distinguish the two
   formats. An easy way for an encoder to help the decoder would be to
   tag the entire CBOR item with tag number 55799, the serialization of
   which will never be found at the beginning of a JSON text.

4. Serialization Considerations

4.1. Preferred Serialization

   For some values at the data model level, CBOR provides multiple
   serializations. For many applications, it is desirable that an
   encoder always chooses a preferred serialization (preferred
   encoding); however, the present specification does not put the burden
   of enforcing this preference on either the encoder or decoder.

   Some constrained decoders may be limited in their ability to decode
   non-preferred serializations: for example, if only integers below
   1_000_000_000 (one billion) are expected in an application, the
   decoder may leave out the code that would be needed to decode 64-bit
   arguments in integers. An encoder that always uses preferred
   serialization ("preferred encoder") interoperates with this decoder
   for the numbers that can occur in this application. Generally
   speaking, a preferred encoder is more universally interoperable (and
   also less wasteful) than one that, say, always uses 64-bit integers.

   Similarly, a constrained encoder may be limited in the variety of
   representation variants it supports such that it does not emit
   preferred serializations ("variant encoder"). For instance, a
   constrained encoder could be designed to always use the 32-bit
   variant for an integer that it encodes even if a short representation
   is available (assuming that there is no application need for integers
   that can only be represented with the 64-bit variant).
   A decoder
   that does not rely on receiving only preferred serializations
   ("variation-tolerant decoder") can therefore be said to be more
   universally interoperable (it might very well optimize for the case
   of receiving preferred serializations, though). Full implementations
   of CBOR decoders are by definition variation tolerant; the
   distinction is only relevant if a constrained implementation of a
   CBOR decoder meets a variant encoder.

   The preferred serialization always uses the shortest form of
   representing the argument (Section 3); it also uses the shortest
   floating-point encoding that preserves the value being encoded.

   The preferred serialization for a floating-point value is the
   shortest floating-point encoding that preserves its value, e.g.,
   0xf94580 for the number 5.5, and 0xfa45ad9c00 for the number 5555.5.
   For NaN values, a shorter encoding is preferred if zero-padding the
   shorter significand towards the right reconstitutes the original NaN
   value (for many applications, the single NaN encoding 0xf97e00 will
   suffice).

   Definite-length encoding is preferred whenever the length is known at
   the time the serialization of the item starts.

4.2. Deterministically Encoded CBOR

   Some protocols may want encoders to only emit CBOR in a particular
   deterministic format; those protocols might also have the decoders
   check that their input is in that deterministic format. Those
   protocols are free to define what they mean by a "deterministic
   format" and what encoders and decoders are expected to do. This
   section defines a set of restrictions that can serve as the base of
   such a deterministic format.

4.2.1. Core Deterministic Encoding Requirements

   A CBOR encoding satisfies the "core deterministic encoding
   requirements" if it satisfies the following restrictions:

   *  Preferred serialization MUST be used. In particular, this means
      that arguments (see Section 3) for integers, lengths in major
      types 2 through 5, and tags MUST be as short as possible, for
      instance:

      -  0 to 23 and -1 to -24 MUST be expressed in the same byte as the
         major type;

      -  24 to 255 and -25 to -256 MUST be expressed only with an
         additional uint8_t;

      -  256 to 65535 and -257 to -65536 MUST be expressed only with an
         additional uint16_t;

      -  65536 to 4294967295 and -65537 to -4294967296 MUST be expressed
         only with an additional uint32_t.

      Floating-point values also MUST use the shortest form that
      preserves the value, e.g., 1.5 is encoded as 0xf93e00 (binary16)
      and 1000000.5 as 0xfa49742408 (binary32). (One implementation of
      this is to have all floats start as a 64-bit float, then do a test
      conversion to a 32-bit float; if the result is the same numeric
      value, use the shorter form and repeat the process with a test
      conversion to a 16-bit float. This also works to select 16-bit
      float for positive and negative Infinity as well.)

   *  Indefinite-length items MUST NOT appear. They can be encoded as
      definite-length items instead.

   *  The keys in every map MUST be sorted in the bytewise lexicographic
      order of their deterministic encodings. For example, the
      following keys are sorted correctly:

      1.  10, encoded as 0x0a.

      2.  100, encoded as 0x1864.

      3.  -1, encoded as 0x20.

      4.  "z", encoded as 0x617a.

      5.  "aa", encoded as 0x626161.

      6.  [100], encoded as 0x811864.

      7.  [-1], encoded as 0x8120.

      8.  false, encoded as 0xf4.

      |  Implementation note: the self-delimiting nature of the CBOR
      |  encoding means that there are no two well-formed CBOR encoded
      |  data items where one is a prefix of the other. The bytewise
      |  lexicographic comparison of deterministic encodings of
      |  different map keys therefore always ends in a position where
      |  the byte differs between the keys, before the end of a key is
      |  reached.

4.2.2. Additional Deterministic Encoding Considerations

   CBOR tags present additional considerations for deterministic
   encoding. If a CBOR-based protocol were to provide the same
   semantics for the presence and absence of a specific tag (e.g., by
   allowing both tag 1 data items and raw numbers in a date/time
   position, treating the latter as if they were tagged), the
   deterministic format would not allow the presence of the tag, based
   on the "shortest form" principle. For example, a protocol might give
   encoders the choice of representing a URL as either a text string or,
   using Section 3.4.5.3, tag number 32 containing a text string. This
   protocol's deterministic encoding needs either to require that the
   tag is present or to require that it is absent, not allow either one.

   In a protocol that does require tags in certain places to obtain
   specific semantics, the tag needs to appear in the deterministic
   format as well. Deterministic encoding considerations also apply to
   the content of tags.

   If a protocol includes a field that can express integers with an
   absolute value of 2^(64) or larger using tag numbers 2 or 3
   (Section 3.4.3), the protocol's deterministic encoding needs to
   specify whether smaller integers are also expressed using these tags
   or using major types 0 and 1.
   Preferred serialization uses the
   latter choice, which is therefore recommended.

   Protocols that include floating-point values, whether represented
   using basic floating-point values (Section 3.3) or using tags (or
   both), may need to define extra requirements on their deterministic
   encodings, such as:

   *  Although IEEE floating-point values can represent both positive
      and negative zero as distinct values, the application might not
      distinguish these and might decide to represent all zero values
      with a positive sign, disallowing negative zero. (The application
      may also want to restrict the precision of floating-point values
      in such a way that there is never a need to represent 64-bit -- or
      even 32-bit -- floating-point values.)

   *  If a protocol includes a field that can express floating-point
      values, with a specific data model that declares integer and
      floating-point values to be interchangeable, the protocol's
      deterministic encoding needs to specify whether, for example, the
      integer 1.0 is encoded as 0x01 (unsigned integer), 0xf93c00
      (binary16), 0xfa3f800000 (binary32), or 0xfb3ff0000000000000
      (binary64). Example rules for this are:

      1.  Encode integral values that fit in 64 bits as values from
          major types 0 and 1, and other values as the preferred
          (smallest of 16-, 32-, or 64-bit) floating-point
          representation that accurately represents the value,

      2.  Encode all values as the preferred floating-point
          representation that accurately represents the value, even for
          integral values, or

      3.  Encode all values as 64-bit floating-point representations.

      Rule 1 straddles the boundaries between integers and
      floating-point values, and Rule 3 does not use preferred
      serialization, so Rule 2 may be a good choice in many cases.

   *  If NaN is an allowed value, and there is no intent to support NaN
      payloads or signaling NaNs, the protocol needs to pick a single
      representation, typically 0xf97e00. If that simple choice is not
      possible, specific attention will be needed for NaN handling.

   *  Subnormal numbers (nonzero numbers with the lowest possible
      exponent of a given IEEE 754 number format) may be flushed to zero
      outputs or be treated as zero inputs in some floating-point
      implementations. A protocol's deterministic encoding may want to
      specifically accommodate such implementations while creating an
      onus on other implementations by excluding subnormal numbers from
      interchange, interchanging zero instead.

   *  The same number can be represented by different decimal fractions,
      by different bigfloats, and by different forms under other tags
      that may be defined to express numeric values. Depending on the
      implementation, it may not always be practical to determine
      whether any of these forms (or forms in the basic generic data
      model) are equivalent. An application protocol that presents
      choices of this kind for the representation format of numbers
      needs to be explicit about how the formats for deterministic
      encoding are to be chosen.

4.2.3. Length-First Map Key Ordering

   The core deterministic encoding requirements (Section 4.2.1) sort map
   keys in a different order from the one suggested by Section 3.9 of
   [RFC7049] (called "Canonical CBOR" there).
   Protocols that need to be
   compatible with the order specified in [RFC7049] can instead be
   specified in terms of this specification's "length-first core
   deterministic encoding requirements":

   A CBOR encoding satisfies the "length-first core deterministic
   encoding requirements" if it satisfies the core deterministic
   encoding requirements except that the keys in every map MUST be
   sorted such that:

   1.  If two keys have different lengths, the shorter one sorts
       earlier;

   2.  If two keys have the same length, the one with the lower value in
       (bytewise) lexical order sorts earlier.

   For example, under the length-first core deterministic encoding
   requirements, the following keys are sorted correctly:

   1.  10, encoded as 0x0a.

   2.  -1, encoded as 0x20.

   3.  false, encoded as 0xf4.

   4.  100, encoded as 0x1864.

   5.  "z", encoded as 0x617a.

   6.  [-1], encoded as 0x8120.

   7.  "aa", encoded as 0x626161.

   8.  [100], encoded as 0x811864.

      |  Although [RFC7049] used the term "Canonical CBOR" for its form
      |  of requirements on deterministic encoding, this document avoids
      |  this term because "canonicalization" is often associated with
      |  specific uses of deterministic encoding only. The terms are
      |  essentially interchangeable, however, and the set of core
      |  requirements in this document could also be called "Canonical
      |  CBOR", while the length-first-ordered version of that could be
      |  called "Old Canonical CBOR".

5. Creating CBOR-Based Protocols

   Data formats such as CBOR are often used in environments where there
   is no format negotiation.
   A specific design goal of CBOR is to not
   need any included or assumed schema: a decoder can take a CBOR item
   and decode it with no other knowledge.

   Of course, in real-world implementations, the encoder and the decoder
   will have a shared view of what should be in a CBOR data item. For
   example, an agreed-to format might be "the item is an array whose
   first value is a UTF-8 string, second value is an integer, and
   subsequent values are zero or more floating-point numbers" or "the
   item is a map that has byte strings for keys and contains a pair
   whose key is 0xab01".

   CBOR-based protocols MUST specify how their decoders handle invalid
   and other unexpected data. CBOR-based protocols MAY specify that
   they treat arbitrary valid data as unexpected. Encoders for
   CBOR-based protocols MUST produce only valid items, that is, the
   protocol cannot be designed to make use of invalid items. An encoder
   can be capable of encoding as many or as few types of values as is
   required by the protocol in which it is used; a decoder can be
   capable of understanding as many or as few types of values as is
   required by the protocols in which it is used. This lack of
   restrictions allows CBOR to be used in extremely constrained
   environments.

   The rest of this section discusses some considerations in creating
   CBOR-based protocols. With few exceptions, it is advisory only and
   explicitly excludes any language from BCP 14 [RFC2119] [RFC8174]
   other than words that could be interpreted as "MAY" in the sense of
   BCP 14. The exceptions aim at facilitating interoperability of
   CBOR-based protocols while making use of a wide variety of both
   generic and application-specific encoders and decoders.

5.1. CBOR in Streaming Applications

   In a streaming application, a data stream may be composed of a
   sequence of CBOR data items concatenated back-to-back. In such an
   environment, the decoder immediately begins decoding a new data item
   if data is found after the end of a previous data item.

   Not all of the bytes making up a data item may be immediately
   available to the decoder; some decoders will buffer additional data
   until a complete data item can be presented to the application.
   Other decoders can present partial information about a top-level data
   item to an application, such as the nested data items that could
   already be decoded, or even parts of a byte string that hasn't
   completely arrived yet. Such an application also MUST have a
   matching streaming security mechanism, where the desired protection
   is available for incremental data presented to the application.

   Note that some applications and protocols will not want to use
   indefinite-length encoding. Using indefinite-length encoding allows
   an encoder to not need to marshal all the data for counting, but it
   requires a decoder to allocate increasing amounts of memory while
   waiting for the end of the item. This might be fine for some
   applications but not others.

5.2. Generic Encoders and Decoders

   A generic CBOR decoder can decode all well-formed encoded CBOR data
   items and present the data items to an application. See Appendix C.
   (The diagnostic notation, Section 8, may be used to present
   well-formed CBOR values to humans.)

   Generic CBOR encoders provide an application interface that allows
   the application to specify any well-formed value to be encoded as a
   CBOR data item, including simple values and tags unknown to the
   encoder.
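
As a non-normative illustration of a decoder that receives data items concatenated back-to-back (Section 5.1), the Python sketch below finds where each well-formed data item ends and leaves an incomplete trailing item in the buffer. The names read_head, skip_item, and split_sequence are hypothetical and not part of this specification. The sketch performs no validity checking, and it does not verify that the chunks of an indefinite-length string are definite-length strings of the same major type.

```python
def read_head(data, pos):
    """Return (major, info, argument, next_pos); argument is None when info is 31."""
    if pos >= len(data):
        raise EOFError("need more data")
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, info, pos
    if info <= 27:
        n = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
        if pos + n > len(data):
            raise EOFError("need more data")
        return major, info, int.from_bytes(data[pos:pos + n], "big"), pos + n
    if info == 31:
        return major, info, None, pos
    raise ValueError("reserved additional information value")


def skip_item(data, pos):
    """Return the position just past the data item starting at pos."""
    major, info, arg, pos = read_head(data, pos)
    if info == 31:
        if major in (0, 1, 6, 7):
            raise ValueError("not well-formed: misplaced indefinite marker or break")
        while True:  # indefinite-length string, array, or map
            if pos >= len(data):
                raise EOFError("need more data")
            if data[pos] == 0xFF:  # "break" stop code
                return pos + 1
            pos = skip_item(data, pos)
            if major == 5:  # maps carry a value after every key
                pos = skip_item(data, pos)
    if major in (2, 3):  # byte or text string: argument is a length
        if pos + arg > len(data):
            raise EOFError("need more data")
        return pos + arg
    if major == 4:
        for _ in range(arg):
            pos = skip_item(data, pos)
    elif major == 5:
        for _ in range(2 * arg):
            pos = skip_item(data, pos)
    elif major == 6:  # tag: exactly one enclosed item
        pos = skip_item(data, pos)
    return pos  # major types 0, 1, and 7 end with the head


def split_sequence(buffer):
    """Split buffer into complete encoded items plus the unconsumed remainder."""
    items, pos = [], 0
    while pos < len(buffer):
        try:
            end = skip_item(buffer, pos)
        except EOFError:
            break  # incomplete item: keep it buffered until more data arrives
        items.append(buffer[pos:end])
        pos = end
    return items, buffer[pos:]
```

A caller would append newly received bytes to the returned remainder and call split_sequence again. A decoder that presents partial information to the application would need a different, incremental design.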

   Even though CBOR attempts to minimize these cases, not all
   well-formed CBOR data is valid: for example, the encoded text string
   "0x62c0ae" does not contain valid UTF-8 (because [RFC3629] requires
   always using the shortest form) and so is not a valid CBOR item.
   Also, specific tags may make semantic constraints that may be
   violated, for instance, by a bignum tag enclosing another tag or by
   an instance of tag number 0 containing a byte string or containing a
   text string with contents that do not match the "date-time"
   production of [RFC3339]. There is no requirement that generic
   encoders and decoders make unnatural choices for their application
   interface to enable the processing of invalid data. Generic encoders
   and decoders are expected to forward simple values and tags even if
   their specific codepoints are not registered at the time the
   encoder/decoder is written (Section 5.4).

5.3. Validity of Items

   A well-formed but invalid CBOR data item (Section 1.2) presents a
   problem with interpreting the data encoded in it in the CBOR data
   model. A CBOR-based protocol could be specified in several layers,
   in which the lower layers don't process the semantics of some of the
   CBOR data they forward. These layers can't notice any validity
   errors in data they don't process and MUST forward that data as-is.
   The first layer that does process the semantics of an invalid CBOR
   item MUST pick one of two choices:

   1.  Replace the problematic item with an error marker and continue
       with the next item, or

   2.  Issue an error and stop processing altogether.

   A CBOR-based protocol MUST specify which of these options its
   decoders take for each kind of invalid item they might encounter.
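
The two choices can be sketched as follows for one kind of invalid item, a text string whose content is not valid UTF-8. This is a non-normative Python sketch: ERROR_MARKER, InvalidItemError, decode_text_content, and process_items are hypothetical names, and the sketch relies on Python's strict UTF-8 decoder, which rejects non-shortest forms such as the bytes 0xc0 0xae.

```python
ERROR_MARKER = object()  # hypothetical stand-in for "an error marker"


class InvalidItemError(Exception):
    """Raised when the protocol's rule is to stop processing altogether."""


def decode_text_content(payload, on_invalid):
    """Decode the content bytes of a major type 3 item.

    on_invalid is "replace" (choice 1) or "stop" (choice 2).
    """
    if on_invalid not in ("replace", "stop"):
        raise ValueError("unknown policy: " + repr(on_invalid))
    try:
        return payload.decode("utf-8")  # strict decoding
    except UnicodeDecodeError as exc:
        if on_invalid == "replace":
            return ERROR_MARKER  # choice 1: mark the item, continue with the next
        raise InvalidItemError("text string is not valid UTF-8") from exc  # choice 2


def process_items(payloads, on_invalid):
    """Apply the chosen rule to a sequence of text string contents."""
    return [decode_text_content(p, on_invalid) for p in payloads]
```

With the "replace" policy, process_items([b"aa", bytes.fromhex("c0ae"), b"z"], "replace") yields the two decoded strings with ERROR_MARKER between them; with "stop", the same input raises InvalidItemError at the second element.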

   Such problems might occur at the basic validity level of CBOR or in
   the context of tags (tag validity).

5.3.1. Basic validity

   Two kinds of validity errors can occur in the basic generic data
   model:

   Duplicate keys in a map: Generic decoders (Section 5.2) make data
      available to applications using the native CBOR data model. That
      data model includes maps (key-value mappings with unique keys),
      not multimaps (key-value mappings where multiple entries can have
      the same key). Thus, a generic decoder that gets a CBOR map item
      that has duplicate keys will decode to a map with only one
      instance of that key, or it might stop processing altogether. On
      the other hand, a "streaming decoder" may not even be able to
      notice. See Section 5.6 for more discussion of keys in maps.

   Invalid UTF-8 string: A decoder might or might not want to verify
      that the sequence of bytes in a UTF-8 string (major type 3) is
      actually valid UTF-8 and react appropriately.

5.3.2. Tag validity

   Two additional kinds of validity errors are introduced by adding tags
   to the basic generic data model:

   Inadmissible type for tag content: Tag numbers (Section 3.4) specify
      what type of data item is supposed to be used as their tag
      content; for example, the tag numbers for unsigned or negative
      bignums are supposed to be put on byte strings. A decoder that
      decodes the tagged data item into a native representation (a
      native big integer in this example) is expected to check the type
      of the data item being tagged. Even decoders that don't have such
      native representations available in their environment may perform
      the check on those tags known to them and react appropriately.

   Inadmissible value for tag content: The type of data item may be
      admissible for a tag's content, but the specific value may not be;
      e.g., a value of "yesterday" is not acceptable for the content of
      tag 0, even though it properly is a text string. A decoder that
      normally ingests such tags into equivalent platform types might
      present this tag to the application in a similar way to how it
      would present a tag with an unknown tag number (Section 5.4).

5.4. Validity and Evolution

   A decoder with validity checking will expend the effort to reliably
   detect data items with validity errors. For example, such a decoder
   needs to have an API that reports an error (and does not return data)
   for a CBOR data item that contains any of the validity errors listed
   in the previous subsection.

   The set of tags defined in the "Concise Binary Object Representation
   (CBOR) Tags" registry (Section 9.2), as well as the set of simple
   values defined in the "Concise Binary Object Representation (CBOR)
   Simple Values" registry (Section 9.1), can grow at any time beyond
   the set understood by a generic decoder. A validity-checking decoder
   can do one of two things when it encounters such a case that it does
   not recognize:

   *  It can report an error (and not return data). Note that treating
      this case as an error can cause ossification and is thus not
      encouraged. This error is not a validity error, per se. This
      kind of error is more likely to be raised by a decoder that would
      be performing validity checking if this were a known case.

   *  It can emit the unknown item (type, value, and, for tags, the
      decoded tagged data item) to the application calling the decoder,
      and then give the application an indication that the decoder did
      not recognize that tag number or simple value.

   The latter approach, which is also appropriate for decoders that do
   not support validity checking, provides forward compatibility with
   newly registered tags and simple values without the requirement to
   update the encoder at the same time as the calling application. (For
   this, the decoder's API needs the ability to mark unknown items so
   that the calling application can handle them in a manner appropriate
   for the program.)

   Since some of the processing needed for validity checking may have an
   appreciable cost (in particular with duplicate detection for maps),
   support of validity checking is not a requirement placed on all CBOR
   decoders.

   Some encoders will rely on their applications to provide input data
   in such a way that valid CBOR results from the encoder. A generic
   encoder may also want to provide a validity-checking mode where it
   reliably limits its output to valid CBOR, independent of whether or
   not its application is indeed providing API-conformant data.

5.5. Numbers

   CBOR-based protocols should take into account that different language
   environments pose different restrictions on the range and precision
   of numbers that are representable. For example, the basic JavaScript
   number system treats all numbers as floating-point values, which may
   result in the silent loss of precision in decoding integers with more
   than 53 significant bits.
   Another example is that, since CBOR keeps
   the sign bit for its integer representation in the major type, it has
   one bit more for signed numbers of a certain length (e.g.,
   -2^(64)..2^(64)-1 for 1+8-byte integers) than the typical platform
   signed integer representation of the same length (-2^(63)..2^(63)-1
   for 8-byte int64_t). A protocol that uses numbers should define its
   expectations on the handling of nontrivial numbers in decoders and
   receiving applications.

   A CBOR-based protocol that includes floating-point numbers can
   restrict which of the three formats (half-precision,
   single-precision, and double-precision) are to be supported. For an
   integer-only application, a protocol may want to completely exclude
   the use of floating-point values.

   A CBOR-based protocol designed for compactness may want to exclude
   specific integer encodings that are longer than necessary for the
   application, such as to save the need to implement 64-bit integers.
   There is an expectation that encoders will use the most compact
   integer representation that can represent a given value. However, a
   compact application that does not require deterministic encoding
   should accept values that use a longer-than-needed encoding (such as
   encoding "0" as 0b000_11001 followed by two bytes of 0x00) as long as
   the application can decode an integer of the given size. Similar
   considerations apply to floating-point values; decoding both
   preferred serializations and longer-than-needed ones is recommended.
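
The integer considerations above can be illustrated with a small, non-normative Python sketch. It decodes an integer of major type 0 or 1 while accepting longer-than-needed encodings, and it checks whether the result fits a platform int64_t. The function names decode_integer and fits_int64 are hypothetical.

```python
def decode_integer(data):
    """Decode one major type 0 or 1 item, accepting non-preferred encodings."""
    initial = data[0]
    major, info = initial >> 5, initial & 0x1F
    if major not in (0, 1):
        raise ValueError("not an integer item")
    if info < 24:
        argument = info  # value carried in the initial byte
    elif info <= 27:
        n = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
        if len(data) < 1 + n:
            raise ValueError("truncated argument")
        argument = int.from_bytes(data[1:1 + n], "big")
    else:
        raise ValueError("not well-formed for an integer")
    # Major type 1 encodes -1 - argument, so the sign lives in the major type.
    return argument if major == 0 else -1 - argument


def fits_int64(value):
    """True if value is representable as an 8-byte int64_t."""
    return -(1 << 63) <= value <= (1 << 63) - 1


# "0" in the longer-than-needed form from the text: 0b000_11001, then 0x00 0x00.
zero_long = bytes([0b000_11001, 0x00, 0x00])
lowest = decode_integer(bytes.fromhex("3bffffffffffffffff"))   # -2^(64)
highest = decode_integer(bytes.fromhex("1bffffffffffffffff"))  # 2^(64)-1
```

Here decode_integer(zero_long) returns 0, the same value as the preferred encoding 0x00, while lowest and highest both fall outside the int64_t range, so fits_int64 returns False for them. A protocol would state what a receiver does with such values.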

   CBOR-based protocols for constrained applications that provide a
   choice between representing a specific number as an integer and as a
   decimal fraction or bigfloat (such as when the exponent is small and
   nonnegative) might express a quality-of-implementation expectation
   that the integer representation is used directly.

5.6. Specifying Keys for Maps

   The encoding and decoding applications need to agree on what types of
   keys are going to be used in maps. In applications that need to
   interwork with JSON-based applications, conversion is simplified by
   limiting keys to text strings only; otherwise, there has to be a
   specified mapping from the other CBOR types to text strings, and this
   often leads to implementation errors. In applications where keys are
   numeric in nature, and numeric ordering of keys is important to the
   application, directly using the numbers for the keys is useful.

   If multiple types of keys are to be used, consideration should be
   given to how these types would be represented in the specific
   programming environments that are to be used. For example, in
   JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished
   from a key of floating-point 1.0. This means that, if integer keys
   are used, the protocol needs to avoid the use of floating-point keys
   the values of which happen to be integer numbers in the same map.

   Decoders that deliver data items nested within a CBOR data item
   immediately on decoding them ("streaming decoders") often do not keep
   the state that is necessary to ascertain uniqueness of a key in a
   map.
   Similarly, an encoder that can start encoding data items before
   the enclosing data item is completely available ("streaming encoder")
   may want to reduce its overhead significantly by relying on its data
   source to maintain uniqueness.

   A CBOR-based protocol MUST define what to do when a receiving
   application sees multiple identical keys in a map. The resulting
   rule in the protocol MUST respect the CBOR data model: it cannot
   prescribe a specific handling of the entries with the identical keys,
   except that it might have a rule that having identical keys in a map
   indicates a malformed map and that the decoder has to stop with an
   error. When processing maps that exhibit entries with duplicate
   keys, a generic decoder might do one of the following:

   *  Not accept maps with duplicate keys (that is, enforce validity for
      maps, see also Section 5.4). These generic decoders are
      universally useful. An application may still need to perform its
      own duplicate checking based on application rules (for instance,
      if the application equates integers and floating-point values in
      map key positions for specific maps).

   *  Pass all map entries to the application, including ones with
      duplicate keys. This requires that the application handle (check
      against) duplicate keys, even if the application rules are
      identical to the generic data model rules.

   *  Lose some entries with duplicate keys, e.g., deliver only the
      final (or first) entry out of the entries with the same key. With
      such a generic decoder, applications may get different results for
      a specific key on different runs, and with different generic
      decoders, which value is returned is based on generic decoder
      implementation and the actual order of keys in the map.
      In
      particular, applications cannot validate key uniqueness on their
      own as they do not necessarily see all entries; they may not be
      able to use such a generic decoder if they need to validate key
      uniqueness. These generic decoders can only be used in situations
      where the data source and transfer always provide valid maps; this
      is not possible if the data source and transfer can be attacked.

   Generic decoders need to document which of these three approaches
   they implement.

   The CBOR data model for maps does not allow ascribing semantics to
   the order of the key/value pairs in the map representation. Thus, a
   CBOR-based protocol MUST NOT specify that changing the key/value pair
   order in a map changes the semantics, except to specify that some
   orders are disallowed, for example, where they would not meet the
   requirements of a deterministic encoding (Section 4.2). (Any
   secondary effects of map ordering such as on timing, cache usage, and
   other potential side channels are not considered part of the
   semantics but may be enough reason on their own for a protocol to
   require a deterministic encoding format.)

   Applications for constrained devices should consider using small
   integers as keys if they have maps with a small number of frequently
   used keys; for instance, a set of 24 or fewer keys can be encoded in
   a single byte as unsigned integers, up to 48 if negative integers are
   also used. Less frequently occurring keys can then use integers with
   longer encodings.

5.6.1. Equivalence of Keys

   The specific data model that applies to a CBOR data item is used to
   determine whether keys occurring in maps are duplicates or distinct.
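
The three approaches a generic decoder might take for duplicate keys (Section 5.6) can be sketched as follows. This is a non-normative Python sketch under stated assumptions: decoded map entries arrive as (key, value) pairs in encoding order, keys are hashable Python values, and the hypothetical helper generic_key keeps an integer key distinct from a numerically equal floating-point key, as the generic data model does. NaN keys are not handled. The names generic_key and deliver_map and the policy strings are illustrative only.

```python
POLICIES = ("reject", "pass_all", "keep_first", "keep_last")


def generic_key(key):
    """Pair a key with its Python type so that 1 and 1.0 stay distinct."""
    return (type(key), key)


def deliver_map(pairs, policy):
    """Return the (key, value) pairs a generic decoder would hand over."""
    if policy not in POLICIES:
        raise ValueError("unknown policy: " + repr(policy))
    if policy == "pass_all":
        # Second approach: the application must check for duplicates itself.
        return list(pairs)
    seen = {}
    for key, value in pairs:
        marker = generic_key(key)
        if marker in seen:
            if policy == "reject":
                # First approach: enforce validity for maps.
                raise ValueError("duplicate map key: " + repr(key))
            if policy == "keep_first":
                continue  # third approach, delivering only the first entry
        # New key, or "keep_last" overwriting an earlier entry.
        seen[marker] = (key, value)
    return list(seen.values())
```

For the entries [(1, "a"), (1.0, "b"), (1, "c")], the "reject" policy raises an error, "pass_all" returns all three entries, and "keep_first" and "keep_last" each return two entries but disagree on the value delivered for the integer key 1. An application that equates integer and floating-point keys would still need its own check on top of any of these policies.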

   At the generic data model level, numerically equivalent integer and
   floating-point values are distinct from each other, as they are from
   the various big numbers (Tags 2 to 5). Similarly, text strings are
   distinct from byte strings, even if composed of the same bytes. A
   tagged value is distinct from an untagged value or from a value
   tagged with a different tag number.

   Within each of these groups, numeric values are distinct unless they
   are numerically equal (specifically, -0.0 is equal to 0.0); for the
   purpose of map key equivalence, NaN values are equivalent if they
   have the same significand after zero-extending both significands at
   the right to 64 bits.

   Both byte strings and text strings are compared byte by byte, arrays
   are compared element by element, and are equal if they have the same
   number of bytes/elements and the same values at the same positions.
   Two maps are equal if they have the same set of pairs regardless of
   their order; pairs are equal if both the key and value are equal.

   Tagged values are equal if both the tag number and the tag content
   are equal. (Note that a generic decoder that provides processing for
   a specific tag may not be able to distinguish some semantically
   equivalent values, e.g., if leading zeroes occur in the content of
   tag 2 or tag 3 (Section 3.4.3).) Simple values are equal if they
   simply have the same value. Nothing else is equal in the generic
   data model; a simple value 2 is not equivalent to an integer 2, and
   an array is never equivalent to a map.

   As discussed in Section 2.2, specific data models can make values
   equivalent for the purpose of comparing map keys that are distinct in
   the generic data model.
   Note that this implies that a generic decoder may deliver a decoded
   map to an application that needs to be checked for duplicate map keys
   by that application (alternatively, the decoder may provide a
   programming interface to perform this service for the application).
   Specific data models are not able to distinguish values for map keys
   that are equal for this purpose at the generic data model level.

5.7. Undefined Values

   In some CBOR-based protocols, the simple value (Section 3.3) of
   "undefined" might be used by an encoder as a substitute for a data
   item with an encoding problem, in order to allow the rest of the
   enclosing data items to be encoded without harm.

6. Converting Data between CBOR and JSON

   This section gives non-normative advice about converting between CBOR
   and JSON. Implementations of converters MAY use whichever advice
   here they want.

   It is worth noting that a JSON text is a sequence of characters, not
   an encoded sequence of bytes, while a CBOR data item consists of
   bytes, not characters.

6.1. Converting from CBOR to JSON

   Most of the types in CBOR have direct analogs in JSON. However, some
   do not, and someone implementing a CBOR-to-JSON converter has to
   consider what to do in those cases. The following non-normative
   advice deals with these by converting them to a single substitute
   value, such as a JSON null.

   *  An integer (major type 0 or 1) becomes a JSON number.

   *  A byte string (major type 2) that is not embedded in a tag that
      specifies a proposed encoding is encoded in base64url without
      padding and becomes a JSON string.

   *  A UTF-8 string (major type 3) becomes a JSON string.
      Note that JSON requires escaping certain characters ([RFC8259],
      Section 7): quotation mark (U+0022), reverse solidus (U+005C), and
      the "C0 control characters" (U+0000 through U+001F). All other
      characters are copied unchanged into the JSON UTF-8 string.

   *  An array (major type 4) becomes a JSON array.

   *  A map (major type 5) becomes a JSON object. This is possible
      directly only if all keys are UTF-8 strings. A converter might
      also convert other keys into UTF-8 strings (such as by converting
      integers into strings containing their decimal representation);
      however, doing so introduces a danger of key collision. Note also
      that, if tags on UTF-8 strings are ignored as proposed below, this
      will cause a key collision if the tags are different but the
      strings are the same.

   *  False (major type 7, additional information 20) becomes a JSON
      false.

   *  True (major type 7, additional information 21) becomes a JSON
      true.

   *  Null (major type 7, additional information 22) becomes a JSON
      null.

   *  A floating-point value (major type 7, additional information 25
      through 27) becomes a JSON number if it is finite (that is, it can
      be represented in a JSON number); if the value is non-finite (NaN,
      or positive or negative Infinity), it is represented by the
      substitute value.

   *  Any other simple value (major type 7, any additional information
      value not yet discussed) is represented by the substitute value.

   *  A bignum (major type 6, tag number 2 or 3) is represented by
      encoding its byte string in base64url without padding and becomes
      a JSON string. For tag number 3 (negative bignum), a "~" (ASCII
      tilde) is inserted before the base-encoded value.
      (The conversion to a binary blob instead of a number is to prevent
      a likely numeric overflow for the JSON decoder.)

   *  A byte string with an encoding hint (major type 6, tag number 21
      through 23) is encoded as described by the hint and becomes a JSON
      string.

   *  For all other tags (major type 6, any other tag number), the tag
      content is represented as a JSON value; the tag number is ignored.

   *  Indefinite-length items are made definite before conversion.

   A CBOR-to-JSON converter may want to keep to the JSON profile I-JSON
   [RFC7493], to maximize interoperability and increase confidence that
   the JSON output can be processed with predictable results. For
   example, this has implications on the range of integers that can be
   represented reliably, as well as on the top-level items that may be
   supported by older JSON implementations.

6.2. Converting from JSON to CBOR

   All JSON values, once decoded, directly map into one or more CBOR
   values. As with any kind of CBOR generation, decisions have to be
   made with respect to number representation. In a suggested
   conversion:

   *  JSON numbers without fractional parts (integer numbers) are
      represented as integers (major types 0 and 1, possibly major type
      6, tag number 2 and 3), choosing the shortest form; integers
      longer than an implementation-defined threshold may instead be
      represented as floating-point values. The default range that is
      represented as integer is -2^(53)+1..2^(53)-1 (fully exploiting
      the range for exact integers in the binary64 representation often
      used for decoding JSON [RFC7493]).
      A CBOR-based protocol, or a generic converter implementation, may
      choose -2^(32)..2^(32)-1 or
      -2^(64)..2^(64)-1 (fully using the integer ranges available in
      CBOR with uint32_t or uint64_t, respectively) or even
      -2^(31)..2^(31)-1 or -2^(63)..2^(63)-1 (using popular ranges for
      two's complement signed integers). (If the JSON was generated
      from a JavaScript implementation, its precision is already limited
      to 53 bits maximum.)

   *  Numbers with fractional parts are represented as floating-point
      values, performing the decimal-to-binary conversion based on the
      precision provided by IEEE 754 binary64. The mathematical value
      of the JSON number is converted to binary64 using the
      roundTiesToEven procedure in Section 4.3.1 of [IEEE754]. Then,
      when encoding in CBOR, the preferred serialization uses the
      shortest floating-point representation exactly representing this
      conversion result; for instance, 1.5 is represented in a 16-bit
      floating-point value (not all implementations will be capable of
      efficiently finding the minimum form, though). Instead of using
      the default binary64 precision, there may be an
      implementation-defined limit to the precision of the conversion
      that will affect the precision of the represented values. Decimal
      representation should only be used on the CBOR side if that is
      specified in a protocol.

   CBOR has been designed to generally provide a more compact encoding
   than JSON. One implementation strategy that might come to mind is to
   perform a JSON-to-CBOR encoding in place in a single buffer. This
   strategy would need to carefully consider a number of pathological
   cases, such as that some strings represented with no or very few
   escapes and longer (or much longer) than 255 bytes may expand when
   encoded as UTF-8 strings in CBOR.
   Similarly, a few of the binary floating-point representations might
   cause expansion from some short decimal representations (1.1, 1e9) in
   JSON. This may be hard to get right, and any ensuing vulnerabilities
   may be exploited by an attacker.

7. Future Evolution of CBOR

   Successful protocols evolve over time. New ideas appear,
   implementation platforms improve, related protocols are developed and
   evolve, and new requirements from applications and protocols are
   added. Facilitating protocol evolution is therefore an important
   design consideration for any protocol development.

   For protocols that will use CBOR, CBOR provides some useful
   mechanisms to facilitate their evolution. Best practices for this
   are well known, particularly from JSON format development of
   JSON-based protocols. Therefore, such best practices are outside the
   scope of this specification.

   However, facilitating the evolution of CBOR itself is very well
   within its scope. CBOR is designed to both provide a stable basis
   for development of CBOR-based protocols and to be able to evolve.
   Since a successful protocol may live for decades, CBOR needs to be
   designed for decades of use and evolution. This section provides
   some guidance for the evolution of CBOR. It is necessarily more
   subjective than other parts of this document. It is also necessarily
   incomplete, lest it turn into a textbook on protocol development.

7.1. Extension Points

   In a protocol design, opportunities for evolution are often included
   in the form of extension points. For example, there may be a
   codepoint space that is not fully allocated from the outset, and the
   protocol is designed to tolerate and embrace implementations that
   start using more codepoints than initially allocated.

   Sizing the codepoint space may be difficult because the range
   required may be hard to predict. Protocol designs should attempt to
   make the codepoint space large enough so that it can slowly be filled
   over the intended lifetime of the protocol.

   CBOR has three major extension points:

   the "simple" space (values in major type 7): Of the 24 efficient
      (and 224 slightly less efficient) values, only a small number have
      been allocated. Implementations receiving an unknown simple data
      item may easily be able to process it as such, given that the
      structure of the value is indeed simple. The IANA registry in
      Section 9.1 is the appropriate way to address the extensibility of
      this codepoint space.

   the "tag" space (values in major type 6): The total codepoint space
      is abundant; only a tiny part of it has been allocated. However,
      not all of these codepoints are equally efficient: the first 24
      only consume a single ("1+0") byte, and half of them have already
      been allocated. The next 232 values only consume two ("1+1")
      bytes, with nearly a quarter already allocated. These subspaces
      need some curation to last for a few more decades.
      Implementations receiving an unknown tag number can choose to
      process just the enclosed tag content or, preferably, to process
      the tag as an unknown tag number wrapping the tag content. The
      IANA registry in Section 9.2 is the appropriate way to address the
      extensibility of this codepoint space.

   the "additional information" space: An implementation receiving an
      unknown additional information value has no way to continue
      decoding, so allocating codepoints in this space is a major step
      beyond just exercising an extension point. There are also very
      few codepoints left. See also Section 7.2.

7.2. Curating the Additional Information Space

   The human mind is sometimes drawn to filling in little perceived gaps
   to make something neat. We expect the remaining gaps in the
   codepoint space for the additional information values to be an
   attractor for new ideas, just because they are there.

   The present specification does not manage the additional information
   codepoint space by an IANA registry. Instead, allocations out of
   this space can only be done by updating this specification.

   For an additional information value of n >= 24, the size of the
   additional data typically is 2^(n-24) bytes. Therefore, additional
   information values 28 and 29 should be viewed as candidates for
   128-bit and 256-bit quantities, in case a need arises to add them to
   the protocol. Additional information value 30 is then the only
   additional information value available for general allocation, and
   there should be a very good reason for allocating it before assigning
   it through an update of the present specification.

8. Diagnostic Notation

   CBOR is a binary interchange format. To facilitate documentation and
   debugging, and in particular to facilitate communication between
   entities cooperating in debugging, this section defines a simple
   human-readable diagnostic notation. All actual interchange always
   happens in the binary format.

   Note that this truly is a diagnostic format; it is not meant to be
   parsed. Therefore, no formal definition (as in ABNF) is given in
   this document. (Implementers looking for a text-based format for
   representing CBOR data items in configuration files may also want to
   consider YAML [YAML].)

   The diagnostic notation is loosely based on JSON as it is defined in
   RFC 8259, extending it where needed.
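
   As a non-normative illustration of the conventions described in this
   section, a diagnostic printer for already-decoded data might be
   sketched in Python as follows. The names diag, Tagged, and Undefined
   are hypothetical and not part of any CBOR library. The sketch assumes
   decoded items arrive as ordinary Python values, always writes byte
   strings in the base16 form, and does not emit encoding indicators.

```python
import json
import math
from typing import Any, NamedTuple


class Tagged(NamedTuple):
    """Hypothetical stand-in for a decoded tagged data item."""
    number: int
    content: Any


class Undefined:
    """Hypothetical marker for the simple value 'undefined'."""


def diag(value):
    """Render a decoded item in a JSON-like diagnostic form."""
    if isinstance(value, Undefined):
        return "undefined"
    # bool is tested before int because bool is an int subclass in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, Tagged):
        # Tag number, then the tag content in parentheses.
        return f"{value.number}({diag(value.content)})"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        return repr(value)
    if isinstance(value, str):
        # json.dumps applies the JSON string escaping rules.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, bytes):
        return "h'" + value.hex() + "'"
    if isinstance(value, list):
        return "[" + ", ".join(diag(v) for v in value) + "]"
    if isinstance(value, dict):
        # Unlike JSON, any data item may appear in the key position.
        return "{" + ", ".join(
            f"{diag(k)}: {diag(v)}" for k, v in value.items()) + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

   With these assumptions, diag(Tagged(1, 1363896240)) yields
   1(1363896240) and diag(bytes.fromhex("12345678")) yields h'12345678'.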

   The notation borrows the JSON syntax for numbers (integer and
   floating-point), True (>true<), False (>false<), Null (>null<), UTF-8
   strings, arrays, and maps (maps are called objects in JSON; the
   diagnostic notation extends JSON here by allowing any data item in
   the key position). Undefined is written >undefined< as in
   JavaScript. The non-finite floating-point numbers Infinity,
   -Infinity, and NaN are written exactly as in this sentence (this is
   also a way they can be written in JavaScript, although JSON does not
   allow them). A tag is written as an integer number for the tag
   number, followed by the tag content in parentheses; for instance, a
   date in the format specified by RFC 3339 (ISO 8601) could be notated
   as:

      0("2013-03-21T20:04:00Z")

   or the equivalent relative time as the following:

      1(1363896240)

   Byte strings are notated in one of the base encodings, without
   padding, enclosed in single quotes, prefixed by >h< for base16, >b32<
   for base32, >h32< for base32hex, >b64< for base64 or base64url (the
   actual encodings do not overlap, so the string remains unambiguous).
   For example, the byte string 0x12345678 could be written h'12345678',
   b32'CI2FM6A', or b64'EjRWeA'.

   Unassigned simple values are given as "simple()" with the appropriate
   integer in the parentheses. For example, "simple(42)" indicates
   major type 7, value 42.

   A number of useful extensions to the diagnostic notation defined here
   are provided in Appendix G of [RFC8610], "Extended Diagnostic
   Notation" (EDN). Similarly, this notation could be extended in a
   separate document to provide documentation for NaN payloads, which
   are not covered in this document.

8.1. Encoding Indicators

   Sometimes it is useful to indicate in the diagnostic notation which
   of several alternative representations were actually used; for
   example, a data item written >1.5< by a diagnostic decoder might have
   been encoded as a half-, single-, or double-precision float.

   The convention for encoding indicators is that anything starting with
   an underscore and all following characters that are alphanumeric or
   underscore is an encoding indicator, and can be ignored by anyone not
   interested in this information. For example, "_" or "_3". Encoding
   indicators are always optional.

   A single underscore can be written after the opening brace of a map
   or the opening bracket of an array to indicate that the data item was
   represented in indefinite-length format. For example, [_ 1, 2]
   contains an indicator that an indefinite-length representation was
   used to represent the data item [1, 2].

   An underscore followed by a decimal digit n indicates that the
   preceding item (or, for arrays and maps, the item starting with the
   preceding bracket or brace) was encoded with an additional
   information value of 24+n. For example, 1.5_1 is a half-precision
   floating-point number, while 1.5_3 is encoded as double precision.
   This encoding indicator is not shown in Appendix A. (Note that the
   encoding indicator "_" is thus an abbreviation of the full form "_7",
   which is not used.)

   The detailed chunk structure of byte and text strings of indefinite
   length can be notated in the form (_ h'0123', h'4567') and (_ "foo",
   "bar"). However, for an indefinite-length string with no chunks
   inside, (_ ) would be ambiguous as to whether a byte string (0x5fff)
   or a text string (0x7fff) is meant and is therefore not used.
   The basic forms ''_ and ""_ can be used instead and are reserved for
   the case of no chunks only -- not as short forms for the (permitted,
   but not really useful) encodings with only empty chunks, which need
   to be notated as (_ ''), (_ ""), etc., to preserve the chunk
   structure.

9. IANA Considerations

   IANA has created two registries for new CBOR values. The registries
   are separate, that is, not under an umbrella registry, and follow the
   rules in [RFC8126]. IANA has also assigned a new media type, an
   associated CoAP Content-Format entry, and a structured syntax suffix.

9.1. CBOR Simple Values Registry

   IANA has created the "Concise Binary Object Representation (CBOR)
   Simple Values" registry at [IANA.cbor-simple-values]. The initial
   values are shown in Table 4.

   New entries in the range 0 to 19 are assigned by Standards Action
   [RFC8126]. It is suggested that IANA allocate values starting with
   the number 16 in order to reserve the lower numbers for contiguous
   blocks (if any).

   New entries in the range 32 to 255 are assigned by Specification
   Required.

9.2. CBOR Tags Registry

   IANA has created the "Concise Binary Object Representation (CBOR)
   Tags" registry at [IANA.cbor-tags]. The tags that were defined in
   [RFC7049] are described in detail in Section 3.4, and other tags have
   already been defined since then.

   New entries in the range 0 to 23 ("1+0") are assigned by Standards
   Action. New entries in the ranges 24 to 255 ("1+1") and 256 to 32767
   (lower half of "1+2") are assigned by Specification Required. New
   entries in the range 32768 to 18446744073709551615 (upper half of
   "1+2", "1+4", and "1+8") are assigned by First Come First Served.
   The template for registration requests is:

   *  Data item

   *  Semantics (short form)

   In addition, First Come First Served requests should include:

   *  Point of contact

   *  Description of semantics (URL) -- This description is optional;
      the URL can point to something like an Internet-Draft or a web
      page.

   Applicants exercising the First Come First Served range and making a
   suggestion for a tag number that is not representable in 32 bits
   (i.e., larger than 4294967295) should be aware that this could reduce
   interoperability with implementations that do not support 64-bit
   numbers.

9.3. Media Types Registry

   The Internet media type [RFC6838] ("MIME type") for a single encoded
   CBOR data item is "application/cbor" as defined in the "Media Types"
   registry [IANA.media-types]:

   Type name: application

   Subtype name: cbor

   Required parameters: n/a

   Optional parameters: n/a

   Encoding considerations: Binary

   Security considerations: See Section 10 of RFC 8949.

   Interoperability considerations: n/a

   Published specification: RFC 8949

   Applications that use this media type: Many

   Additional information:

      Magic number(s): n/a
      File extension(s): .cbor
      Macintosh file type code(s): n/a

   Person & email address to contact for further information: IETF CBOR
      Working Group (cbor@ietf.org) or IETF Applications and Real-Time
      Area (art@ietf.org)

   Intended usage: COMMON

   Restrictions on usage: none

   Author: IETF CBOR Working Group (cbor@ietf.org)

   Change controller: The IESG (iesg@ietf.org)

9.4. CoAP Content-Format Registry

   The CoAP Content-Format for CBOR has been registered in the "CoAP
   Content-Formats" subregistry within the "Constrained RESTful
   Environments (CoRE) Parameters" registry [IANA.core-parameters]:

   Media Type: application/cbor

   Encoding: -

   ID: 60

   Reference: RFC 8949

9.5. Structured Syntax Suffix Registry

   The structured syntax suffix [RFC6838] for media types based on a
   single encoded CBOR data item is +cbor, which IANA has registered in
   the "Structured Syntax Suffixes" registry [IANA.structured-suffix]:

   Name: Concise Binary Object Representation (CBOR)

   +suffix: +cbor

   References: RFC 8949

   Encoding Considerations: CBOR is a binary format.

   Interoperability Considerations: n/a

   Fragment Identifier Considerations: The syntax and semantics of
      fragment identifiers specified for +cbor SHOULD be as specified
      for "application/cbor". (At publication of RFC 8949, there is no
      fragment identification syntax defined for "application/cbor".)

      The syntax and semantics for fragment identifiers for a specific
      "xxx/yyy+cbor" SHOULD be processed as follows:

      *  For cases defined in +cbor, where the fragment identifier
         resolves per the +cbor rules, then process as specified in
         +cbor.

      *  For cases defined in +cbor, where the fragment identifier does
         not resolve per the +cbor rules, then process as specified in
         "xxx/yyy+cbor".

      *  For cases not defined in +cbor, then process as specified in
         "xxx/yyy+cbor".

   Security Considerations: See Section 10 of RFC 8949.

   Contact: IETF CBOR Working Group (cbor@ietf.org) or IETF
      Applications and Real-Time Area (art@ietf.org)

   Author/Change Controller: IETF

10. Security Considerations

   A network-facing application can exhibit vulnerabilities in its
   processing logic for incoming data. Complex parsers are well known
   as a likely source of such vulnerabilities, such as the ability to
   remotely crash a node, or even remotely execute arbitrary code on it.
   CBOR attempts to narrow the opportunities for introducing such
   vulnerabilities by reducing parser complexity, by giving the entire
   range of encodable values a meaning where possible.

   Because CBOR decoders are often used as a first step in processing
   unvalidated input, they need to be fully prepared for all types of
   hostile input that may be designed to corrupt, overrun, or achieve
   control of the system decoding the CBOR data item. A CBOR decoder
   needs to assume that all input may be hostile even if it has been
   checked by a firewall, has come over a secure channel such as TLS, is
   encrypted or signed, or has come from some other source that is
   presumed trusted.

   Section 4.1 gives examples of limitations in interoperability when
   using a constrained CBOR decoder with input from a CBOR encoder that
   uses a non-preferred serialization. When a single data item is
   consumed both by such a constrained decoder and a full decoder, it
   can lead to security issues that can be exploited by an attacker who
   can inject or manipulate content.

   As discussed throughout this document, there are many values that can
   be considered "equivalent" in some circumstances and "not equivalent"
   in others. As just one example, the numeric value for the number
   "one" might be expressed as an integer or a bignum.
   A system interpreting CBOR input might accept either form for the
   number "one", or might reject one (or both) forms. Such acceptance or
   rejection can have security implications in the program that is using
   the interpreted input.

   Hostile input may be constructed to overrun buffers, to overflow or
   underflow integer arithmetic, or to cause other decoding disruption.
   CBOR data items might have lengths or sizes that are intentionally
   extremely large or too short. Resource exhaustion attacks might
   attempt to lure a decoder into allocating very big data items
   (strings, arrays, maps, or even arbitrary precision numbers) or
   exhaust the stack depth by setting up deeply nested items. Decoders
   need to have appropriate resource management to mitigate these
   attacks. (Items for which very large sizes are given can also
   attempt to exploit integer overflow vulnerabilities.)

   A CBOR decoder, by definition, only accepts well-formed CBOR; this is
   the first step to its robustness. Input that is not well-formed CBOR
   causes no further processing from the point where the lack of
   well-formedness was detected. If possible, any data decoded up to
   this point should have no impact on the application using the CBOR
   decoder.

   In addition to ascertaining well-formedness, a CBOR decoder might
   also perform validity checks on the CBOR data. Alternatively, it can
   leave those checks to the application using the decoder. This choice
   needs to be clearly documented in the decoder. Beyond the validity
   at the CBOR level, an application also needs to ascertain that the
   input is in alignment with the application protocol that is
   serialized in CBOR.

   The input check itself may consume resources.
   This is usually linear in the size of the input, which means that an
   attacker has to spend resources that are commensurate to the
   resources spent by the defender on input validation. However, an
   attacker might be able to craft inputs that will take longer for a
   target decoder to process than for the attacker to produce.
   Processing for arbitrary-precision numbers may exceed linear effort.
   Also, some hash-table
   implementations that are used by decoders to build in-memory
   representations of maps can be attacked to spend quadratic effort,
   unless a secret key (see Section 7 of [SIPHASH_LNCS], also
   [SIPHASH_OPEN]) or some other mitigation is employed. Such
   superlinear efforts can be exploited by an attacker to exhaust
   resources at or before the input validator; they therefore need to be
   avoided in a CBOR decoder implementation. Note that tag number
   definitions and their implementations can add security considerations
   of this kind; this should then be discussed in the security
   considerations of the tag number definition.

   CBOR encoders do not receive input directly from the network and are
   thus not directly attackable in the same way as CBOR decoders.
   However, CBOR encoders often have an API that takes input from
   another level in the implementation and can be attacked through that
   API. The design and implementation of that API should assume the
   behavior of its caller may be based on hostile input or on coding
   mistakes. It should check inputs for buffer overruns, overflow and
   underflow of integer arithmetic, and other such errors that are aimed
   to disrupt the encoder.

   Protocols should be defined in such a way that potential multiple
   interpretations are reliably reduced to a single interpretation.
   For example, an attacker could make use of invalid input such as
   duplicate keys in maps, or exploit different precision in processing
   numbers to make one application base its decisions on a different
   interpretation than the one that will be used by a second
   application. To facilitate consistent interpretation, encoder and
   decoder implementations should provide a validity-checking mode of
   operation (Section 5.4). Note, however, that a generic decoder
   cannot know about all requirements that an application poses on its
   input data; it is therefore not relieving the application from
   performing its own input checking. Also, since the set of defined
   tag numbers evolves, the application may employ a tag number that is
   not yet supported for validity checking by the generic decoder it
   uses. Generic decoders therefore need to document which tag numbers
   they support and what validity checking they provide for those tag
   numbers as well as for basic CBOR (UTF-8 checking, duplicate map key
   checking).

   Section 3.4.3 notes that using the non-preferred choice of a bignum
   representation instead of a basic integer for encoding a number is
   not intended to have application semantics, but it can have such
   semantics if an application receiving CBOR data is using a decoder in
   the basic generic data model. This disparity causes a security issue
   if the two sets of semantics differ. Thus, applications using CBOR
   need to specify the data model that they are using for each use of
   CBOR data.

   It is common to convert CBOR data to other formats. In many cases,
   CBOR has more expressive types than other formats; this is
   particularly true for the common conversion to JSON. The loss of
   type information can cause security issues for the systems that are
   processing the less-expressive data.
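
   One concrete instance of such type loss is the key collision noted in
   Section 6.1: when integer map keys are converted to their decimal
   string form, the distinct CBOR keys 1 and "1" end up as the same JSON
   member name. The following non-normative Python sketch shows one way
   a converter might refuse to merge such keys silently. The function
   name json_object_from_map is hypothetical, and the policy of raising
   an error (rather than, say, dropping one entry) is an assumption of
   this sketch, not something this document prescribes.

```python
def json_object_from_map(pairs):
    """Convert the (key, value) pairs of a decoded CBOR map into a dict
    with string keys, rejecting keys that would collide in JSON."""
    result = {}
    for key, value in pairs:
        # bool is excluded explicitly because it is an int subclass.
        if isinstance(key, bool) or not isinstance(key, (str, int)):
            raise TypeError(f"unsupported key type: {type(key).__name__}")
        # Integers become their decimal representation.
        name = key if isinstance(key, str) else str(key)
        if name in result:
            raise ValueError(f"key collision on JSON name {name!r}")
        result[name] = value
    return result
```

   For example, the pairs [(1, "a"), ("2", "b")] convert to
   {"1": "a", "2": "b"}, while [(1, "a"), ("1", "b")] raises ValueError
   instead of letting one entry overwrite the other.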

   Section 6.2 describes a possibly common usage scenario of converting
   between CBOR and JSON that could allow an attack if the attacker
   knows that the application is performing the conversion.

   Security considerations for the use of base16 and base64 from
   [RFC4648], and the use of UTF-8 from [RFC3629], are relevant to CBOR
   as well.

11. References

11.1. Normative References

   [C]        International Organization for Standardization,
              "Information technology - Programming languages - C",
              Fourth Edition, ISO/IEC 9899:2018, June 2018,
              <https://www.iso.org/standard/74528.html>.

   [Cplusplus20]
              International Organization for Standardization,
              "Programming languages - C++", Sixth Edition, ISO/IEC DIS
              14882, ISO/IEC ISO/IEC JTC1 SC22 WG21 N 4860, March 2020,
              <https://isocpp.org/files/papers/N4860.pdf>.

   [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE
              Std 754-2019, DOI 10.1109/IEEESTD.2019.8766229,
              <https://ieeexplore.ieee.org/document/8766229>.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996,
              <https://www.rfc-editor.org/info/rfc2045>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002,
              <https://www.rfc-editor.org/info/rfc3339>.
2567 + 2568 + [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 2569 + 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 2570 + 2003, <https://www.rfc-editor.org/info/rfc3629>. 2571 + 2572 + [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 2573 + Resource Identifier (URI): Generic Syntax", STD 66, 2574 + RFC 3986, DOI 10.17487/RFC3986, January 2005, 2575 + <https://www.rfc-editor.org/info/rfc3986>. 2576 + 2577 + [RFC4287] Nottingham, M., Ed. and R. Sayre, Ed., "The Atom 2578 + Syndication Format", RFC 4287, DOI 10.17487/RFC4287, 2579 + December 2005, <https://www.rfc-editor.org/info/rfc4287>. 2580 + 2581 + [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data 2582 + Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, 2583 + <https://www.rfc-editor.org/info/rfc4648>. 2584 + 2585 + [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for 2586 + Writing an IANA Considerations Section in RFCs", BCP 26, 2587 + RFC 8126, DOI 10.17487/RFC8126, June 2017, 2588 + <https://www.rfc-editor.org/info/rfc8126>. 2589 + 2590 + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2591 + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2592 + May 2017, <https://www.rfc-editor.org/info/rfc8174>. 2593 + 2594 + [TIME_T] The Open Group, "The Open Group Base Specifications", 2595 + Section 4.16, 'Seconds Since the Epoch', Issue 7, 2018 2596 + Edition, IEEE Std 1003.1, 2018, 2597 + <https://pubs.opengroup.org/onlinepubs/9699919799/ 2598 + basedefs/V1_chap04.html#tag_04_16>. 2599 + 2600 + 11.2. Informative References 2601 + 2602 + [ASN.1] International Telecommunication Union, "Information 2603 + Technology - ASN.1 encoding rules: Specification of Basic 2604 + Encoding Rules (BER), Canonical Encoding Rules (CER) and 2605 + Distinguished Encoding Rules (DER)", ITU-T Recommendation 2606 + X.690, 2015, 2607 + <https://www.itu.int/rec/T-REC-X.690-201508-I/en>. 
2608 + 2609 + [BSON] Various, "BSON - Binary JSON", <http://bsonspec.org/>. 2610 + 2611 + [CBOR-TAGS] 2612 + Bormann, C., "Notable CBOR Tags", Work in Progress, 2613 + Internet-Draft, draft-bormann-cbor-notable-tags-02, 25 2614 + June 2020, <https://tools.ietf.org/html/draft-bormann- 2615 + cbor-notable-tags-02>. 2616 + 2617 + [ECMA262] Ecma International, "ECMAScript 2020 Language 2618 + Specification", Standard ECMA-262, 11th Edition, June 2619 + 2020, <https://www.ecma- 2620 + international.org/publications/standards/Ecma-262.htm>. 2621 + 2622 + [Err3764] RFC Errata, Erratum ID 3764, RFC 7049, 2623 + <https://www.rfc-editor.org/errata/eid3764>. 2624 + 2625 + [Err3770] RFC Errata, Erratum ID 3770, RFC 7049, 2626 + <https://www.rfc-editor.org/errata/eid3770>. 2627 + 2628 + [Err4294] RFC Errata, Erratum ID 4294, RFC 7049, 2629 + <https://www.rfc-editor.org/errata/eid4294>. 2630 + 2631 + [Err4409] RFC Errata, Erratum ID 4409, RFC 7049, 2632 + <https://www.rfc-editor.org/errata/eid4409>. 2633 + 2634 + [Err4963] RFC Errata, Erratum ID 4963, RFC 7049, 2635 + <https://www.rfc-editor.org/errata/eid4963>. 2636 + 2637 + [Err4964] RFC Errata, Erratum ID 4964, RFC 7049, 2638 + <https://www.rfc-editor.org/errata/eid4964>. 2639 + 2640 + [Err5434] RFC Errata, Erratum ID 5434, RFC 7049, 2641 + <https://www.rfc-editor.org/errata/eid5434>. 2642 + 2643 + [Err5763] RFC Errata, Erratum ID 5763, RFC 7049, 2644 + <https://www.rfc-editor.org/errata/eid5763>. 2645 + 2646 + [Err5917] RFC Errata, Erratum ID 5917, RFC 7049, 2647 + <https://www.rfc-editor.org/errata/eid5917>. 2648 + 2649 + [IANA.cbor-simple-values] 2650 + IANA, "Concise Binary Object Representation (CBOR) Simple 2651 + Values", 2652 + <https://www.iana.org/assignments/cbor-simple-values>. 2653 + 2654 + [IANA.cbor-tags] 2655 + IANA, "Concise Binary Object Representation (CBOR) Tags", 2656 + <https://www.iana.org/assignments/cbor-tags>. 
2657 + 2658 + [IANA.core-parameters] 2659 + IANA, "Constrained RESTful Environments (CoRE) 2660 + Parameters", 2661 + <https://www.iana.org/assignments/core-parameters>. 2662 + 2663 + [IANA.media-types] 2664 + IANA, "Media Types", 2665 + <https://www.iana.org/assignments/media-types>. 2666 + 2667 + [IANA.structured-suffix] 2668 + IANA, "Structured Syntax Suffixes", 2669 + <https://www.iana.org/assignments/media-type-structured- 2670 + suffix>. 2671 + 2672 + [MessagePack] 2673 + Furuhashi, S., "MessagePack", <https://msgpack.org/>. 2674 + 2675 + [PCRE] Hazel, P., "PCRE - Perl Compatible Regular Expressions", 2676 + <https://www.pcre.org/>. 2677 + 2678 + [RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission 2679 + Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976, 2680 + <https://www.rfc-editor.org/info/rfc713>. 2681 + 2682 + [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type 2683 + Specifications and Registration Procedures", BCP 13, 2684 + RFC 6838, DOI 10.17487/RFC6838, January 2013, 2685 + <https://www.rfc-editor.org/info/rfc6838>. 2686 + 2687 + [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object 2688 + Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, 2689 + October 2013, <https://www.rfc-editor.org/info/rfc7049>. 2690 + 2691 + [RFC7228] Bormann, C., Ersue, M., and A. Keranen, "Terminology for 2692 + Constrained-Node Networks", RFC 7228, 2693 + DOI 10.17487/RFC7228, May 2014, 2694 + <https://www.rfc-editor.org/info/rfc7228>. 2695 + 2696 + [RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, 2697 + DOI 10.17487/RFC7493, March 2015, 2698 + <https://www.rfc-editor.org/info/rfc7493>. 2699 + 2700 + [RFC7991] Hoffman, P., "The "xml2rfc" Version 3 Vocabulary", 2701 + RFC 7991, DOI 10.17487/RFC7991, December 2016, 2702 + <https://www.rfc-editor.org/info/rfc7991>. 
2703 + 2704 + [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data 2705 + Interchange Format", STD 90, RFC 8259, 2706 + DOI 10.17487/RFC8259, December 2017, 2707 + <https://www.rfc-editor.org/info/rfc8259>. 2708 + 2709 + [RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data 2710 + Definition Language (CDDL): A Notational Convention to 2711 + Express Concise Binary Object Representation (CBOR) and 2712 + JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, 2713 + June 2019, <https://www.rfc-editor.org/info/rfc8610>. 2714 + 2715 + [RFC8618] Dickinson, J., Hague, J., Dickinson, S., Manderson, T., 2716 + and J. Bond, "Compacted-DNS (C-DNS): A Format for DNS 2717 + Packet Capture", RFC 8618, DOI 10.17487/RFC8618, September 2718 + 2019, <https://www.rfc-editor.org/info/rfc8618>. 2719 + 2720 + [RFC8742] Bormann, C., "Concise Binary Object Representation (CBOR) 2721 + Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020, 2722 + <https://www.rfc-editor.org/info/rfc8742>. 2723 + 2724 + [RFC8746] Bormann, C., Ed., "Concise Binary Object Representation 2725 + (CBOR) Tags for Typed Arrays", RFC 8746, 2726 + DOI 10.17487/RFC8746, February 2020, 2727 + <https://www.rfc-editor.org/info/rfc8746>. 2728 + 2729 + [SIPHASH_LNCS] 2730 + Aumasson, J. and D. Bernstein, "SipHash: A Fast Short- 2731 + Input PRF", Progress in Cryptology - INDOCRYPT 2012, pp. 2732 + 489-508, DOI 10.1007/978-3-642-34931-7_28, 2012, 2733 + <https://doi.org/10.1007/978-3-642-34931-7_28>. 2734 + 2735 + [SIPHASH_OPEN] 2736 + Aumasson, J. and D.J. Bernstein, "SipHash: a fast short- 2737 + input PRF", <https://www.aumasson.jp/siphash/siphash.pdf>. 2738 + 2739 + [YAML] Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup 2740 + Language (YAML[TM]) Version 1.2", 3rd Edition, October 2741 + 2009, <https://www.yaml.org/spec/1.2/spec.html>. 2742 + 2743 + Appendix A. 
Examples of Encoded CBOR Data Items 2744 + 2745 + The following table provides some CBOR-encoded values in hexadecimal 2746 + (right column), together with diagnostic notation for these values 2747 + (left column). Note that the string "\u00fc" is one form of 2748 + diagnostic notation for a UTF-8 string containing the single Unicode 2749 + character U+00FC (LATIN SMALL LETTER U WITH DIAERESIS, "ü"). 2750 + Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a 2751 + single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, "水"), often 2752 + representing "water", and "\ud800\udd51" is a UTF-8 string in 2753 + diagnostic notation with a single character U+10151 (GREEK ACROPHONIC 2754 + ATTIC FIFTY STATERS, "𐅑"). (Note that all these single-character 2755 + strings could also be represented in native UTF-8 in diagnostic 2756 + notation, just not if an ASCII-only specification is required.) In 2757 + the diagnostic notation provided for bignums, their intended numeric 2758 + value is shown as a decimal number (such as 18446744073709551616) 2759 + instead of a tagged byte string (such as 2(h'010000000000000000')). 
2760 + 2761 + +==============================+====================================+ 2762 + |Diagnostic | Encoded | 2763 + +==============================+====================================+ 2764 + |0 | 0x00 | 2765 + +------------------------------+------------------------------------+ 2766 + |1 | 0x01 | 2767 + +------------------------------+------------------------------------+ 2768 + |10 | 0x0a | 2769 + +------------------------------+------------------------------------+ 2770 + |23 | 0x17 | 2771 + +------------------------------+------------------------------------+ 2772 + |24 | 0x1818 | 2773 + +------------------------------+------------------------------------+ 2774 + |25 | 0x1819 | 2775 + +------------------------------+------------------------------------+ 2776 + |100 | 0x1864 | 2777 + +------------------------------+------------------------------------+ 2778 + |1000 | 0x1903e8 | 2779 + +------------------------------+------------------------------------+ 2780 + |1000000 | 0x1a000f4240 | 2781 + +------------------------------+------------------------------------+ 2782 + |1000000000000 | 0x1b000000e8d4a51000 | 2783 + +------------------------------+------------------------------------+ 2784 + |18446744073709551615 | 0x1bffffffffffffffff | 2785 + +------------------------------+------------------------------------+ 2786 + |18446744073709551616 | 0xc249010000000000000000 | 2787 + +------------------------------+------------------------------------+ 2788 + |-18446744073709551616 | 0x3bffffffffffffffff | 2789 + +------------------------------+------------------------------------+ 2790 + |-18446744073709551617 | 0xc349010000000000000000 | 2791 + +------------------------------+------------------------------------+ 2792 + |-1 | 0x20 | 2793 + +------------------------------+------------------------------------+ 2794 + |-10 | 0x29 | 2795 + +------------------------------+------------------------------------+ 2796 + |-100 | 0x3863 | 2797 + 
+------------------------------+------------------------------------+ 2798 + |-1000 | 0x3903e7 | 2799 + +------------------------------+------------------------------------+ 2800 + |0.0 | 0xf90000 | 2801 + +------------------------------+------------------------------------+ 2802 + |-0.0 | 0xf98000 | 2803 + +------------------------------+------------------------------------+ 2804 + |1.0 | 0xf93c00 | 2805 + +------------------------------+------------------------------------+ 2806 + |1.1 | 0xfb3ff199999999999a | 2807 + +------------------------------+------------------------------------+ 2808 + |1.5 | 0xf93e00 | 2809 + +------------------------------+------------------------------------+ 2810 + |65504.0 | 0xf97bff | 2811 + +------------------------------+------------------------------------+ 2812 + |100000.0 | 0xfa47c35000 | 2813 + +------------------------------+------------------------------------+ 2814 + |3.4028234663852886e+38 | 0xfa7f7fffff | 2815 + +------------------------------+------------------------------------+ 2816 + |1.0e+300 | 0xfb7e37e43c8800759c | 2817 + +------------------------------+------------------------------------+ 2818 + |5.960464477539063e-8 | 0xf90001 | 2819 + +------------------------------+------------------------------------+ 2820 + |0.00006103515625 | 0xf90400 | 2821 + +------------------------------+------------------------------------+ 2822 + |-4.0 | 0xf9c400 | 2823 + +------------------------------+------------------------------------+ 2824 + |-4.1 | 0xfbc010666666666666 | 2825 + +------------------------------+------------------------------------+ 2826 + |Infinity | 0xf97c00 | 2827 + +------------------------------+------------------------------------+ 2828 + |NaN | 0xf97e00 | 2829 + +------------------------------+------------------------------------+ 2830 + |-Infinity | 0xf9fc00 | 2831 + +------------------------------+------------------------------------+ 2832 + |Infinity | 0xfa7f800000 | 2833 + 
+------------------------------+------------------------------------+ 2834 + |NaN | 0xfa7fc00000 | 2835 + +------------------------------+------------------------------------+ 2836 + |-Infinity | 0xfaff800000 | 2837 + +------------------------------+------------------------------------+ 2838 + |Infinity | 0xfb7ff0000000000000 | 2839 + +------------------------------+------------------------------------+ 2840 + |NaN | 0xfb7ff8000000000000 | 2841 + +------------------------------+------------------------------------+ 2842 + |-Infinity | 0xfbfff0000000000000 | 2843 + +------------------------------+------------------------------------+ 2844 + |false | 0xf4 | 2845 + +------------------------------+------------------------------------+ 2846 + |true | 0xf5 | 2847 + +------------------------------+------------------------------------+ 2848 + |null | 0xf6 | 2849 + +------------------------------+------------------------------------+ 2850 + |undefined | 0xf7 | 2851 + +------------------------------+------------------------------------+ 2852 + |simple(16) | 0xf0 | 2853 + +------------------------------+------------------------------------+ 2854 + |simple(255) | 0xf8ff | 2855 + +------------------------------+------------------------------------+ 2856 + |0("2013-03-21T20:04:00Z") | 0xc074323031332d30332d32315432303a | 2857 + | | 30343a30305a | 2858 + +------------------------------+------------------------------------+ 2859 + |1(1363896240) | 0xc11a514b67b0 | 2860 + +------------------------------+------------------------------------+ 2861 + |1(1363896240.5) | 0xc1fb41d452d9ec200000 | 2862 + +------------------------------+------------------------------------+ 2863 + |23(h'01020304') | 0xd74401020304 | 2864 + +------------------------------+------------------------------------+ 2865 + |24(h'6449455446') | 0xd818456449455446 | 2866 + +------------------------------+------------------------------------+ 2867 + |32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 | 
2868 + | | 616d706c652e636f6d | 2869 + +------------------------------+------------------------------------+ 2870 + |h'' | 0x40 | 2871 + +------------------------------+------------------------------------+ 2872 + |h'01020304' | 0x4401020304 | 2873 + +------------------------------+------------------------------------+ 2874 + |"" | 0x60 | 2875 + +------------------------------+------------------------------------+ 2876 + |"a" | 0x6161 | 2877 + +------------------------------+------------------------------------+ 2878 + |"IETF" | 0x6449455446 | 2879 + +------------------------------+------------------------------------+ 2880 + |"\"\\" | 0x62225c | 2881 + +------------------------------+------------------------------------+ 2882 + |"\u00fc" | 0x62c3bc | 2883 + +------------------------------+------------------------------------+ 2884 + |"\u6c34" | 0x63e6b0b4 | 2885 + +------------------------------+------------------------------------+ 2886 + |"\ud800\udd51" | 0x64f0908591 | 2887 + +------------------------------+------------------------------------+ 2888 + |[] | 0x80 | 2889 + +------------------------------+------------------------------------+ 2890 + |[1, 2, 3] | 0x83010203 | 2891 + +------------------------------+------------------------------------+ 2892 + |[1, [2, 3], [4, 5]] | 0x8301820203820405 | 2893 + +------------------------------+------------------------------------+ 2894 + |[1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x98190102030405060708090a0b0c0d0e | 2895 + |10, 11, 12, 13, 14, 15, 16, | 0f101112131415161718181819 | 2896 + |17, 18, 19, 20, 21, 22, 23, | | 2897 + |24, 25] | | 2898 + +------------------------------+------------------------------------+ 2899 + |{} | 0xa0 | 2900 + +------------------------------+------------------------------------+ 2901 + |{1: 2, 3: 4} | 0xa201020304 | 2902 + +------------------------------+------------------------------------+ 2903 + |{"a": 1, "b": [2, 3]} | 0xa26161016162820203 | 2904 + 
+------------------------------+------------------------------------+ 2905 + |["a", {"b": "c"}] | 0x826161a161626163 | 2906 + +------------------------------+------------------------------------+ 2907 + |{"a": "A", "b": "B", "c": "C",| 0xa5616161416162614261636143616461 | 2908 + |"d": "D", "e": "E"} | 4461656145 | 2909 + +------------------------------+------------------------------------+ 2910 + |(_ h'0102', h'030405') | 0x5f42010243030405ff | 2911 + +------------------------------+------------------------------------+ 2912 + |(_ "strea", "ming") | 0x7f657374726561646d696e67ff | 2913 + +------------------------------+------------------------------------+ 2914 + |[_ ] | 0x9fff | 2915 + +------------------------------+------------------------------------+ 2916 + |[_ 1, [2, 3], [_ 4, 5]] | 0x9f018202039f0405ffff | 2917 + +------------------------------+------------------------------------+ 2918 + |[_ 1, [2, 3], [4, 5]] | 0x9f01820203820405ff | 2919 + +------------------------------+------------------------------------+ 2920 + |[1, [2, 3], [_ 4, 5]] | 0x83018202039f0405ff | 2921 + +------------------------------+------------------------------------+ 2922 + |[1, [_ 2, 3], [4, 5]] | 0x83019f0203ff820405 | 2923 + +------------------------------+------------------------------------+ 2924 + |[_ 1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x9f0102030405060708090a0b0c0d0e0f | 2925 + |10, 11, 12, 13, 14, 15, 16, | 101112131415161718181819ff | 2926 + |17, 18, 19, 20, 21, 22, 23, | | 2927 + |24, 25] | | 2928 + +------------------------------+------------------------------------+ 2929 + |{_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff | 2930 + +------------------------------+------------------------------------+ 2931 + |["a", {_ "b": "c"}] | 0x826161bf61626163ff | 2932 + +------------------------------+------------------------------------+ 2933 + |{_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff | 2934 + +------------------------------+------------------------------------+ 2935 + 
2936 + Table 6: Examples of Encoded CBOR Data Items 2937 + 2938 + Appendix B. Jump Table for Initial Byte 2939 + 2940 + For brevity, this jump table does not show initial bytes that are 2941 + reserved for future extension. It also only shows a selection of the 2942 + initial bytes that can be used for optional features. (All unsigned 2943 + integers are in network byte order.) 2944 + 2945 + +============+================================================+ 2946 + | Byte | Structure/Semantics | 2947 + +============+================================================+ 2948 + | 0x00..0x17 | unsigned integer 0x00..0x17 (0..23) | 2949 + +------------+------------------------------------------------+ 2950 + | 0x18 | unsigned integer (one-byte uint8_t follows) | 2951 + +------------+------------------------------------------------+ 2952 + | 0x19 | unsigned integer (two-byte uint16_t follows) | 2953 + +------------+------------------------------------------------+ 2954 + | 0x1a | unsigned integer (four-byte uint32_t follows) | 2955 + +------------+------------------------------------------------+ 2956 + | 0x1b | unsigned integer (eight-byte uint64_t follows) | 2957 + +------------+------------------------------------------------+ 2958 + | 0x20..0x37 | negative integer -1-0x00..-1-0x17 (-1..-24) | 2959 + +------------+------------------------------------------------+ 2960 + | 0x38 | negative integer -1-n (one-byte uint8_t for n | 2961 + | | follows) | 2962 + +------------+------------------------------------------------+ 2963 + | 0x39 | negative integer -1-n (two-byte uint16_t for n | 2964 + | | follows) | 2965 + +------------+------------------------------------------------+ 2966 + | 0x3a | negative integer -1-n (four-byte uint32_t for | 2967 + | | n follows) | 2968 + +------------+------------------------------------------------+ 2969 + | 0x3b | negative integer -1-n (eight-byte uint64_t for | 2970 + | | n follows) | 2971 + 
+------------+------------------------------------------------+ 2972 + | 0x40..0x57 | byte string (0x00..0x17 bytes follow) | 2973 + +------------+------------------------------------------------+ 2974 + | 0x58 | byte string (one-byte uint8_t for n, and then | 2975 + | | n bytes follow) | 2976 + +------------+------------------------------------------------+ 2977 + | 0x59 | byte string (two-byte uint16_t for n, and then | 2978 + | | n bytes follow) | 2979 + +------------+------------------------------------------------+ 2980 + | 0x5a | byte string (four-byte uint32_t for n, and | 2981 + | | then n bytes follow) | 2982 + +------------+------------------------------------------------+ 2983 + | 0x5b | byte string (eight-byte uint64_t for n, and | 2984 + | | then n bytes follow) | 2985 + +------------+------------------------------------------------+ 2986 + | 0x5f | byte string, byte strings follow, terminated | 2987 + | | by "break" | 2988 + +------------+------------------------------------------------+ 2989 + | 0x60..0x77 | UTF-8 string (0x00..0x17 bytes follow) | 2990 + +------------+------------------------------------------------+ 2991 + | 0x78 | UTF-8 string (one-byte uint8_t for n, and then | 2992 + | | n bytes follow) | 2993 + +------------+------------------------------------------------+ 2994 + | 0x79 | UTF-8 string (two-byte uint16_t for n, and | 2995 + | | then n bytes follow) | 2996 + +------------+------------------------------------------------+ 2997 + | 0x7a | UTF-8 string (four-byte uint32_t for n, and | 2998 + | | then n bytes follow) | 2999 + +------------+------------------------------------------------+ 3000 + | 0x7b | UTF-8 string (eight-byte uint64_t for n, and | 3001 + | | then n bytes follow) | 3002 + +------------+------------------------------------------------+ 3003 + | 0x7f | UTF-8 string, UTF-8 strings follow, terminated | 3004 + | | by "break" | 3005 + +------------+------------------------------------------------+ 3006 + | 0x80..0x97 | 
array (0x00..0x17 data items follow) | 3007 + +------------+------------------------------------------------+ 3008 + | 0x98 | array (one-byte uint8_t for n, and then n data | 3009 + | | items follow) | 3010 + +------------+------------------------------------------------+ 3011 + | 0x99 | array (two-byte uint16_t for n, and then n | 3012 + | | data items follow) | 3013 + +------------+------------------------------------------------+ 3014 + | 0x9a | array (four-byte uint32_t for n, and then n | 3015 + | | data items follow) | 3016 + +------------+------------------------------------------------+ 3017 + | 0x9b | array (eight-byte uint64_t for n, and then n | 3018 + | | data items follow) | 3019 + +------------+------------------------------------------------+ 3020 + | 0x9f | array, data items follow, terminated by | 3021 + | | "break" | 3022 + +------------+------------------------------------------------+ 3023 + | 0xa0..0xb7 | map (0x00..0x17 pairs of data items follow) | 3024 + +------------+------------------------------------------------+ 3025 + | 0xb8 | map (one-byte uint8_t for n, and then n pairs | 3026 + | | of data items follow) | 3027 + +------------+------------------------------------------------+ 3028 + | 0xb9 | map (two-byte uint16_t for n, and then n pairs | 3029 + | | of data items follow) | 3030 + +------------+------------------------------------------------+ 3031 + | 0xba | map (four-byte uint32_t for n, and then n | 3032 + | | pairs of data items follow) | 3033 + +------------+------------------------------------------------+ 3034 + | 0xbb | map (eight-byte uint64_t for n, and then n | 3035 + | | pairs of data items follow) | 3036 + +------------+------------------------------------------------+ 3037 + | 0xbf | map, pairs of data items follow, terminated by | 3038 + | | "break" | 3039 + +------------+------------------------------------------------+ 3040 + | 0xc0 | text-based date/time (data item follows; see | 3041 + | | Section 3.4.1) | 3042 + 
+------------+------------------------------------------------+ 3043 + | 0xc1 | epoch-based date/time (data item follows; see | 3044 + | | Section 3.4.2) | 3045 + +------------+------------------------------------------------+ 3046 + | 0xc2 | unsigned bignum (data item "byte string" | 3047 + | | follows) | 3048 + +------------+------------------------------------------------+ 3049 + | 0xc3 | negative bignum (data item "byte string" | 3050 + | | follows) | 3051 + +------------+------------------------------------------------+ 3052 + | 0xc4 | decimal Fraction (data item "array" follows; | 3053 + | | see Section 3.4.4) | 3054 + +------------+------------------------------------------------+ 3055 + | 0xc5 | bigfloat (data item "array" follows; see | 3056 + | | Section 3.4.4) | 3057 + +------------+------------------------------------------------+ 3058 + | 0xc6..0xd4 | (tag) | 3059 + +------------+------------------------------------------------+ 3060 + | 0xd5..0xd7 | expected conversion (data item follows; see | 3061 + | | Section 3.4.5.2) | 3062 + +------------+------------------------------------------------+ 3063 + | 0xd8..0xdb | (more tags; 1/2/4/8 bytes of tag number and | 3064 + | | then a data item follow) | 3065 + +------------+------------------------------------------------+ 3066 + | 0xe0..0xf3 | (simple value) | 3067 + +------------+------------------------------------------------+ 3068 + | 0xf4 | false | 3069 + +------------+------------------------------------------------+ 3070 + | 0xf5 | true | 3071 + +------------+------------------------------------------------+ 3072 + | 0xf6 | null | 3073 + +------------+------------------------------------------------+ 3074 + | 0xf7 | undefined | 3075 + +------------+------------------------------------------------+ 3076 + | 0xf8 | (simple value, one byte follows) | 3077 + +------------+------------------------------------------------+ 3078 + | 0xf9 | half-precision float (two-byte IEEE 754) | 3079 + 
   +------------+------------------------------------------------+
   | 0xfa       | single-precision float (four-byte IEEE 754)    |
   +------------+------------------------------------------------+
   | 0xfb       | double-precision float (eight-byte IEEE 754)   |
   +------------+------------------------------------------------+
   | 0xff       | "break" stop code                              |
   +------------+------------------------------------------------+

                 Table 7: Jump Table for Initial Byte

Appendix C. Pseudocode

   The well-formedness of a CBOR item can be checked by the pseudocode
   in Figure 1. The data is well-formed if and only if:

   *  the pseudocode does not "fail";

   *  after execution of the pseudocode, no bytes are left in the input
      (except in streaming applications).

   The pseudocode has the following prerequisites:

   *  take(n) reads n bytes from the input data and returns them as a
      byte string. If n bytes are no longer available, take(n) fails.

   *  uint() converts a byte string into an unsigned integer by
      interpreting the byte string in network byte order.

   *  Arithmetic works as in C.

   *  All variables are unsigned integers of sufficient range.

   Note that "well_formed" returns the major type for well-formed
   definite-length items, but 99 for an indefinite-length item (or -1
   for a "break" stop code, only if "breakable" is set). This is used
   in "well_formed_indefinite" to ascertain that indefinite-length
   strings only contain definite-length strings as chunks.
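   The check described above can also be written as a runnable program.
   The following is a minimal Python sketch under stated assumptions,
   not a reference implementation: the names Reader, NotWellFormed, and
   is_well_formed are introduced here for illustration, and the return
   values -1 and 99 follow the description above. Like the pseudocode,
   it checks well-formedness only and does not validate UTF-8 or map
   keys.

```python
class NotWellFormed(Exception):
    """Raised when the input is not well-formed CBOR."""


BREAK = -1        # a "break" stop code was read
INDEFINITE = 99   # an indefinite-length item was read


class Reader:
    """Cursor over the input bytes; take(n) fails if fewer than n bytes remain."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def take(self, n):
        if self.pos + n > len(self.data):
            raise NotWellFormed("input too short")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def well_formed(r, breakable=False):
    ib = r.take(1)[0]                  # initial byte
    mt, ai = ib >> 5, ib & 0x1F        # major type, additional information
    val = ai
    if 24 <= ai <= 27:                 # argument follows in 1, 2, 4, or 8 bytes
        val = int.from_bytes(r.take(1 << (ai - 24)), "big")
    elif 28 <= ai <= 30:
        raise NotWellFormed("reserved additional information value")
    elif ai == 31:
        return well_formed_indefinite(r, mt, breakable)
    if mt in (2, 3):                   # byte string or text string content
        r.take(val)
    elif mt == 4:                      # array: val items
        for _ in range(val):
            well_formed(r)
    elif mt == 5:                      # map: val key/value pairs
        for _ in range(val * 2):
            well_formed(r)
    elif mt == 6:                      # tag: one enclosed data item
        well_formed(r)
    elif mt == 7 and ai == 24 and val < 32:
        raise NotWellFormed("bad simple value")
    return mt                          # definite-length data item


def well_formed_indefinite(r, mt, breakable):
    if mt in (2, 3):
        while True:
            it = well_formed(r, True)
            if it == BREAK:
                break
            if it != mt:               # chunks must be definite-length, same type
                raise NotWellFormed("bad chunk in indefinite-length string")
    elif mt == 4:
        while well_formed(r, True) != BREAK:
            pass
    elif mt == 5:
        while well_formed(r, True) != BREAK:
            well_formed(r)             # value belonging to the key just read
    elif mt == 7:
        if breakable:
            return BREAK
        raise NotWellFormed("break without enclosing indefinite-length item")
    else:
        raise NotWellFormed("indefinite length not allowed for this major type")
    return INDEFINITE


def is_well_formed(data):
    """True if data holds exactly one well-formed CBOR item and nothing else."""
    r = Reader(data)
    try:
        well_formed(r)
    except NotWellFormed:
        return False
    return r.pos == len(data)
```

   For instance, is_well_formed(bytes.fromhex("9f018202039f0405ffff"))
   accepts the nested indefinite-length array from Appendix A, while a
   lone "break" byte (0xff) is rejected.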

   well_formed(breakable = false) {
     // process initial bytes
     ib = uint(take(1));
     mt = ib >> 5;
     val = ai = ib & 0x1f;
     switch (ai) {
       case 24: val = uint(take(1)); break;
       case 25: val = uint(take(2)); break;
       case 26: val = uint(take(4)); break;
       case 27: val = uint(take(8)); break;
       case 28: case 29: case 30: fail();
       case 31:
         return well_formed_indefinite(mt, breakable);
     }
     // process content
     switch (mt) {
       // case 0, 1, 7 do not have content; just use val
       case 2: case 3: take(val); break; // bytes/UTF-8
       case 4: for (i = 0; i < val; i++) well_formed(); break;
       case 5: for (i = 0; i < val*2; i++) well_formed(); break;
       case 6: well_formed(); break;     // 1 embedded data item
       case 7: if (ai == 24 && val < 32) fail(); // bad simple
     }
     return mt;                    // definite-length data item
   }

   well_formed_indefinite(mt, breakable) {
     switch (mt) {
       case 2: case 3:
         while ((it = well_formed(true)) != -1)
           if (it != mt)           // need definite-length chunk
             fail();               //    of same type
         break;
       case 4: while (well_formed(true) != -1); break;
       case 5: while (well_formed(true) != -1) well_formed(); break;
       case 7:
         if (breakable)
           return -1;              // signal break out
         else fail();              // no enclosing indefinite
       default: fail();            // wrong mt
     }
     return 99;                    // indefinite-length data item
   }

           Figure 1: Pseudocode for Well-Formedness Check

   Note that the remaining complexity of a complete CBOR decoder is
   about presenting data that has been decoded to the application in an
   appropriate form.

   Major types 0 and 1 are designed in such a way that they can be
   encoded in C from a signed integer without actually doing an
   if-then-else for positive/negative (Figure 2). This uses the fact
   that (-1-n), the transformation for major type 1, is the same as ~n
   (bitwise complement) in C unsigned arithmetic; ~n can then be
   expressed as (-1)^n for the negative case, while 0^n leaves n
   unchanged for nonnegative. The sign of a number can be converted to
   -1 for negative and 0 for nonnegative (0 or positive) by
   arithmetic-shifting the number by one bit less than the bit length
   of the number (for example, by 63 for 64-bit numbers).

   void encode_sint(int64_t n) {
     uint64_t ui = n >> 63;    // extend sign to whole length
     unsigned mt = ui & 0x20;  // extract (shifted) major type
     ui ^= n;                  // complement negatives
     if (ui < 24)
       *p++ = mt + ui;
     else if (ui < 256) {
       *p++ = mt + 24;
       *p++ = ui;
     } else
          ...

         Figure 2: Pseudocode for Encoding a Signed Integer

   See Section 1.2 for some specific assumptions about the profile of
   the C language used in these pieces of code.

Appendix D. Half-Precision

   As half-precision floating-point numbers were only added to IEEE 754
   in 2008 [IEEE754], today's programming platforms often still only
   have limited support for them. It is very easy to include at least
   decoding support for them even without such support. An example of a
   small decoder for half-precision floating-point numbers in the C
   language is shown in Figure 3. A similar program for Python is in
   Figure 4; this code assumes that the 2-byte value has already been
   decoded as an (unsigned short) integer in network byte order (as
   would be done by the pseudocode in Appendix C).
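   Where a platform does offer binary16 support, that support can serve
   as a cross-check for a hand-written decoder. The following minimal
   Python sketch decodes the sign, exponent, and mantissa fields by hand,
   using the same field arithmetic as the C decoder in Figure 3, and
   compares the result with the "e" format of the standard struct module
   (available since Python 3.6). The function names decode_half_bits and
   decode_half_struct are hypothetical; the bit patterns are taken from
   the half-precision rows of the table in Appendix A.

```python
import struct
from math import isnan, ldexp


def decode_half_bits(half):
    """Decode an IEEE 754 binary16 bit pattern given as an unsigned integer."""
    exp = (half >> 10) & 0x1F
    mant = half & 0x3FF
    if exp == 0:
        val = ldexp(mant, -24)               # zero and subnormal numbers
    elif exp != 31:
        val = ldexp(mant + 1024, exp - 25)   # normal numbers (implicit leading 1)
    else:
        val = float("inf") if mant == 0 else float("nan")
    return -val if half & 0x8000 else val


def decode_half_struct(half):
    """Decode the same bit pattern via the struct module's binary16 support."""
    return struct.unpack("!e", struct.pack("!H", half))[0]


# Bit patterns of the half-precision examples listed in Appendix A.
patterns = (0x0000, 0x8000, 0x3C00, 0x3E00, 0x7BFF,
            0x0001, 0x0400, 0xC400, 0x7C00, 0xFC00)
for bits in patterns:
    assert decode_half_bits(bits) == decode_half_struct(bits)

# NaN never compares equal to itself, so it is checked separately.
assert isnan(decode_half_bits(0x7E00)) and isnan(decode_half_struct(0x7E00))
```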

   #include <math.h>

   double decode_half(unsigned char *halfp) {
     /* assemble the 16-bit value from two bytes in network byte order */
     unsigned half = (halfp[0] << 8) + halfp[1];
     unsigned exp = (half >> 10) & 0x1f;
     unsigned mant = half & 0x3ff;
     double val;
     if (exp == 0) val = ldexp(mant, -24);       /* zero, subnormal */
     else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
     else val = mant == 0 ? INFINITY : NAN;
     return half & 0x8000 ? -val : val;
   }

           Figure 3: C Code for a Half-Precision Decoder

   import struct
   from math import ldexp

   def decode_single(single):
       return struct.unpack("!f", struct.pack("!I", single))[0]

   def decode_half(half):
       # move exponent and mantissa into single-precision position,
       # and the sign bit into bit 31
       valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
       if ((half & 0x7c00) != 0x7c00):
           # finite: compensate for the exponent bias difference (127 - 15)
           return ldexp(decode_single(valu), 112)
       # infinity or NaN: set all single-precision exponent bits
       return decode_single(valu | 0x7f800000)

         Figure 4: Python Code for a Half-Precision Decoder

Appendix E. Comparison of Other Binary Formats to CBOR's Design Objectives

   The proposal for CBOR follows a history of binary formats that is as
   long as the history of computers themselves. Different formats have
   had different objectives. In most cases, the objectives of the
   format were never stated, although they can sometimes be implied by
   the context where the format was first used. Some formats were meant
   to be universally usable, although history has proven that no binary
   format meets the needs of all protocols and applications.

   CBOR differs from many of these formats in that it started with a
   set of objectives and attempts to meet just those. This section
   compares a few of the dozens of formats with CBOR's objectives in
   order to help the reader decide if they want to use CBOR or a
   different format for a particular protocol or application.

   Note that the discussion here is not meant to be a criticism of any
   format: to the best of our knowledge, no format before CBOR was meant
   to cover CBOR's objectives in the priority we have assigned them.  A
   brief recap of the objectives from Section 1.1 is:

   1.  unambiguous encoding of most common data formats from Internet
       standards

   2.  code compactness for encoder or decoder

   3.  no schema description needed

   4.  reasonably compact serialization

   5.  applicability to constrained and unconstrained applications

   6.  good JSON conversion

   7.  extensibility

   A discussion of CBOR and other formats with respect to a different
   set of design objectives is provided in Section 5 and Appendix C of
   [RFC8618].

E.1.  ASN.1 DER, BER, and PER

   [ASN.1] has many serializations.  In the IETF, DER and BER are the
   most common.  The serialized output is not particularly compact for
   many items, and the code needed to decode numeric items can be
   complex on a constrained device.

   Few (if any) IETF protocols have adopted one of the several variants
   of Packed Encoding Rules (PER).  There could be many reasons for
   this, but one that is commonly stated is that PER makes use of the
   schema even for parsing the surface structure of the data item,
   requiring significant tool support.  There are different versions of
   the ASN.1 schema language in use, which has also hampered adoption.

E.2.  MessagePack

   [MessagePack] is a concise, widely implemented counted binary
   serialization format, similar in many properties to CBOR, although
   somewhat less regular.
   While the data model can be used to represent
   JSON data, MessagePack has also been used in many remote procedure
   call (RPC) applications and for long-term storage of data.

   MessagePack has been essentially stable since it was first published
   around 2011; it has not yet had a transition.  The evolution of
   MessagePack is impeded by an imperative to maintain complete
   backwards compatibility with existing stored data, while only few
   bytecodes are still available for extension.  Repeated requests over
   the years from the MessagePack user community to separate out binary
   and text strings in the encoding recently have led to an extension
   proposal that would leave MessagePack's "raw" data ambiguous between
   its usages for binary and text data.  The extension mechanism for
   MessagePack remains unclear.

E.3.  BSON

   [BSON] is a data format that was developed for the storage of JSON-
   like maps (JSON objects) in the MongoDB database.  Its major
   distinguishing feature is the capability for in-place update, which
   prevents a compact representation.  BSON uses a counted
   representation except for map keys, which are null-byte terminated.
   While BSON can be used for the representation of JSON-like objects on
   the wire, its specification is dominated by the requirements of the
   database application and has become somewhat baroque.  The status of
   how BSON extensions will be implemented remains unclear.

E.4.  MSDTP: RFC 713

   Message Services Data Transmission (MSDTP) is a very early example of
   a compact message format; it is described in [RFC0713], written in
   1976.  It is included here for its historical value, not because it
   was ever widely used.

E.5.  Conciseness on the Wire

   While CBOR's design objective of code compactness for encoders and
   decoders is a higher priority than its objective of conciseness on
   the wire, many people focus on the wire size.  Table 8 shows some
   encoding examples for the simple nested array [1, [2, 3]]; where some
   form of indefinite-length encoding is supported by the encoding,
   [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.

   +=============+============================+================+
   | Format      | [1, [2, 3]]                | [_ 1, [2, 3]]  |
   +=============+============================+================+
   | RFC 713     | c2 05 81 c2 02 82 83       |                |
   +-------------+----------------------------+----------------+
   | ASN.1 BER   | 30 0b 02 01 01 30 06 02 01 | 30 80 02 01 01 |
   |             | 02 02 01 03                | 30 06 02 01 02 |
   |             |                            | 02 01 03 00 00 |
   +-------------+----------------------------+----------------+
   | MessagePack | 92 01 92 02 03             |                |
   +-------------+----------------------------+----------------+
   | BSON        | 22 00 00 00 10 30 00 01 00 |                |
   |             | 00 00 04 31 00 13 00 00 00 |                |
   |             | 10 30 00 02 00 00 00 10 31 |                |
   |             | 00 03 00 00 00 00 00       |                |
   +-------------+----------------------------+----------------+
   | CBOR        | 82 01 82 02 03             | 9f 01 82 02 03 |
   |             |                            | ff             |
   +-------------+----------------------------+----------------+

        Table 8: Examples for Different Levels of Conciseness

Appendix F.  Well-Formedness Errors and Examples

   There are three basic kinds of well-formedness errors that can occur
   in decoding a CBOR data item:

   Too much data:  There are input bytes left that were not consumed.
      This is only an error if the application assumed that the input
      bytes would span exactly one data item.
      Where the application
      uses the self-delimiting nature of CBOR encoding to permit
      additional data after the data item, as is done in CBOR sequences
      [RFC8742], for example, the CBOR decoder can simply indicate which
      part of the input has not been consumed.

   Too little data:  The input data available would need additional
      bytes added at their end for a complete CBOR data item.  This may
      indicate the input is truncated; it is also a common error when
      trying to decode random data as CBOR.  For some applications,
      however, this may not actually be an error, as the application may
      not be certain it has all the data yet and can obtain or wait for
      additional input bytes.  Some of these applications may have an
      upper limit for how much additional data can appear; here the
      decoder may be able to indicate that the encoded CBOR data item
      cannot be completed within this limit.

   Syntax error:  The input data are not consistent with the
      requirements of the CBOR encoding, and this cannot be remedied by
      adding (or removing) data at the end.

   In Appendix C, errors of the first kind are addressed in the first
   paragraph and bullet list (requiring "no bytes are left"), and errors
   of the second kind are addressed in the second paragraph/bullet list
   (failing "if n bytes are no longer available").
   Errors of the third
   kind are identified in the pseudocode by specific instances of
   calling fail(), in order:

   *  a reserved value is used for additional information (28, 29, 30)

   *  major type 7, additional information 24, value < 32 (incorrect)

   *  incorrect substructure of indefinite-length byte string or text
      string (may only contain definite-length strings of the same major
      type)

   *  "break" stop code (major type 7, additional information 31) occurs
      in a value position of a map or except at a position directly in
      an indefinite-length item where also another enclosed data item
      could occur

   *  additional information 31 used with major type 0, 1, or 6

F.1.  Examples of CBOR Data Items That Are Not Well-Formed

   This subsection shows a few examples for CBOR data items that are not
   well-formed.  Each example is a sequence of bytes, each shown in
   hexadecimal; multiple examples in a list are separated by commas.

   Examples for well-formedness error kind 1 (too much data) can easily
   be formed by adding data to a well-formed encoded CBOR data item.

   Similarly, examples for well-formedness error kind 2 (too little
   data) can be formed by truncating a well-formed encoded CBOR data
   item.  In test suites, it may be beneficial to specifically test with
   incomplete data items that would require large amounts of addition to
   be completed (for instance by starting the encoding of a string of a
   very large size).

   A premature end of the input can occur in a head or within the
   enclosed data, which may be bare strings or enclosed data items that
   are either counted or should have been ended by a "break" stop code.
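
   The three error kinds can be told apart mechanically.  The following
   Python sketch walks the structure of one data item without building
   any values and reports which kind applies.  It is only an
   illustration under stated assumptions: the names classify, _item,
   _indefinite, _take, TooLittleData, and NotWellFormed are hypothetical
   and not part of this specification, the input is assumed to be a
   bytes object, and recursion depth is not limited, which a production
   decoder would need to address.

```python
class TooLittleData(Exception):
    """Input ended before the data item was complete (kind 2)."""


class NotWellFormed(Exception):
    """Input violates the encoding rules (kind 3, "syntax error")."""


def _take(data, pos, n):
    # Advance over n bytes, failing if they are not available.
    if pos + n > len(data):
        raise TooLittleData()
    return pos + n


def _item(data, pos, breakable=False):
    """Skip one item starting at pos; return (next_pos, is_break)."""
    pos = _take(data, pos, 1)
    mt, ai = data[pos - 1] >> 5, data[pos - 1] & 0x1F
    val = ai
    if 24 <= ai <= 27:                       # 1-, 2-, 4-, 8-byte argument
        start = pos
        pos = _take(data, pos, 1 << (ai - 24))
        val = int.from_bytes(data[start:pos], "big")
        if mt == 7 and ai == 24 and val < 32:
            raise NotWellFormed("two-byte simple value below 32")
    elif 28 <= ai <= 30:
        raise NotWellFormed("reserved additional information")
    elif ai == 31:
        return _indefinite(data, pos, mt, breakable)
    if mt in (2, 3):                         # string payload
        pos = _take(data, pos, val)
    elif mt in (4, 5):                       # maps hold two items per pair
        for _ in range(val * (mt - 3)):
            pos, _brk = _item(data, pos)
    elif mt == 6:                            # tag content
        pos, _brk = _item(data, pos)
    return pos, False


def _indefinite(data, pos, mt, breakable):
    if mt in (2, 3):
        while True:
            _take(data, pos, 1)              # peek at the next head
            if data[pos] == 0xFF:
                return pos + 1, False
            if data[pos] >> 5 != mt or data[pos] & 0x1F == 31:
                raise NotWellFormed("bad chunk in indefinite-length string")
            pos, _brk = _item(data, pos)
    if mt in (4, 5):
        while True:
            pos, brk = _item(data, pos, breakable=True)
            if brk:
                return pos, False
            if mt == 5:                      # value position: no break here
                pos, _brk = _item(data, pos)
    if mt == 7:
        if breakable:
            return pos, True
        raise NotWellFormed("break outside an indefinite-length item")
    raise NotWellFormed("additional information 31 with major type 0, 1, 6")


def classify(data):
    """Return one of four strings describing a single encoded item."""
    try:
        end, _brk = _item(data, 0)
    except TooLittleData:
        return "too little data"
    except NotWellFormed:
        return "syntax error"
    return "well-formed" if end == len(data) else "too much data"
```

   An application that accepts CBOR sequences would treat the "too much
   data" result as the start of the next item rather than as an error.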

   End of input in a head:  18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 03
      04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa 00
      00, fb 00 00 00

   Definite-length strings with short data:  41, 61, 5a ff ff ff ff 00,
      5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f ff
      ff ff ff ff ff ff 01 02 03

   Definite-length maps and arrays not closed with enough items:  81, 81
      81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00 00

   Tag number not followed by tag content:  c0

   Indefinite-length strings not closed by a "break" stop code:  5f 41
      00, 7f 61 00

   Indefinite-length maps and arrays not closed by a "break" stop
   code:  9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f
      9f ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff

   A few examples for the five subkinds of well-formedness error kind 3
   (syntax error) are shown below.

   Subkind 1:
      Reserved additional information values:  1c, 1d, 1e, 3c, 3d, 3e,
         5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc,
         fd, fe

   Subkind 2:
      Reserved two-byte encodings of simple values:  f8 00, f8 01, f8
         18, f8 1f

   Subkind 3:
      Indefinite-length string chunks not of the correct type:  5f 00
         ff, 5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f
         e0 ff, 7f 41 00 ff

      Indefinite-length string chunks not definite length:  5f 5f 41 00
         ff ff, 7f 7f 61 00 ff ff

   Subkind 4:
      Break occurring on its own outside of an indefinite-length
      item:  ff

      Break occurring in a definite-length array or map or a tag:  81
         ff, 82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff,
         9f 82 9f 81 9f 9f ff ff ff ff

      Break in an indefinite-length map that would lead to an odd
      number of items (break in a value position):  bf 00 ff, bf 00 00
         00 ff

   Subkind 5:
      Major type 0, 1, 6 with additional information 31:  1f, 3f, df

Appendix G.  Changes from RFC 7049

   As discussed in the introduction, this document formally obsoletes
   RFC 7049 while keeping full compatibility with the interchange format
   from RFC 7049.  This document provides editorial improvements, added
   detail, and fixed errata.  This document does not create a new
   version of the format.

G.1.  Errata Processing and Clerical Changes

   The two verified errata on RFC 7049, [Err3764] and [Err3770],
   concerned two encoding examples in the text that have been corrected
   (Section 3.4.3: "29" -> "49", Section 5.5: "0b000_11101" ->
   "0b000_11001").  Also, RFC 7049 contained an example using the
   numeric value 24 for a simple value [Err5917], which is not well-
   formed; this example has been removed.
   Errata report 5763 [Err5763]
   pointed to an error in the wording of the definition of tags; this
   was resolved during a rewrite of Section 3.4.  Errata report 5434
   [Err5434] pointed out that the Universal Binary JSON (UBJSON) example
   in Appendix E no longer complied with the version of UBJSON current
   at the time of the errata report submission.  It turned out that the
   UBJSON specification had completely changed since 2013; this example
   therefore was removed.  Other errata reports [Err4409] [Err4963]
   [Err4964] complained that the map key sorting rules for canonical
   encoding were onerous; these led to a reconsideration of the
   canonical encoding suggestions and replacement by the deterministic
   encoding suggestions (described below).  An editorial suggestion in
   errata report 4294 [Err4294] was also implemented (improved symmetry
   by adding "Second value" to a comment to the last example in
   Section 3.2.2).

   Other clerical changes include:

   *  the use of new xml2rfc functionality [RFC7991];

   *  more explanation of the notation used;

   *  the update of references, e.g., from RFC 4627 to [RFC8259], from
      CNN-TERMS to [RFC7228], and from the 5.1 edition to the 11th
      edition of [ECMA262]; the addition of a reference to [IEEE754] and
      importation of required definitions; the addition of references to
      [C] and [Cplusplus20]; and the addition of a reference to
      [RFC8618] that further illustrates the discussion in Appendix E;

   *  in the discussion of diagnostic notation (Section 8), the
      "Extended Diagnostic Notation" (EDN) defined in [RFC8610] is now
      mentioned, the gap in representing NaN payloads is now
      highlighted, and an explanation of representing indefinite-length
      strings with no chunks has been added (Section 8.1);

   *  the addition of this appendix.

G.2.  Changes in IANA Considerations

   The IANA considerations were generally updated (clerical changes,
   e.g., now pointing to the CBOR Working Group as the author of the
   specification).  References to the respective IANA registries were
   added to the informative references.

   In the "Concise Binary Object Representation (CBOR) Tags" registry
   [IANA.cbor-tags], tags in the space from 256 to 32767 (lower half of
   "1+2") are no longer assigned by First Come First Served; this range
   is now Specification Required.

G.3.  Changes in Suggestions and Other Informational Components

   While revising the document, beyond the addressing of the errata
   reports, the working group drew upon nearly seven years of experience
   with CBOR in a diverse set of applications.  This led to a number of
   editorial changes, including adding tables for illustration, but also
   emphasizing some aspects and de-emphasizing others.

   A significant addition is Section 2, which discusses the CBOR data
   model and its small variations involved in the processing of CBOR.
   The introduction of terms for those variations (basic generic,
   extended generic, specific) enables more concise language in other
   places of the document and also helps to clarify expectations of
   implementations and of the extensibility features of the format.

   As a format derived from the JSON ecosystem, RFC 7049 was influenced
   by the JSON number system that was in turn inherited from JavaScript
   at the time.  JSON does not provide distinct integers and floating-
   point values (and the latter are decimal in the format).  CBOR
   provides binary representations of numbers, which do differ between
   integers and floating-point values.
   Experience from implementation
   and use suggested that the separation between these two number
   domains should be more clearly drawn in the document; language that
   suggested an integer could seamlessly stand in for a floating-point
   value was removed.  Also, a suggestion (based on I-JSON [RFC7493])
   was added for handling these types when converting JSON to CBOR, and
   the use of a specific rounding mechanism has been recommended.

   For a single value in the data model, CBOR often provides multiple
   encoding options.  A new section (Section 4) introduces the term
   "preferred serialization" (Section 4.1) and defines it for various
   kinds of data items.  On the basis of this terminology, the section
   then discusses how a CBOR-based protocol can define "deterministic
   encoding" (Section 4.2), which avoids terms "canonical" and
   "canonicalization" from RFC 7049.  The suggestion of "Core
   Deterministic Encoding Requirements" (Section 4.2.1) enables generic
   support for such protocol-defined encoding requirements.  This
   document further eases the implementation of deterministic encoding
   by simplifying the map ordering suggested in RFC 7049 to a simple
   lexicographic ordering of encoded keys.  A description of the older
   suggestion is kept as an alternative, now termed "length-first map
   key ordering" (Section 4.2.3).

   The terminology for well-formed and valid data was sharpened and more
   stringently used, avoiding less well-defined alternative terms such
   as "syntax error", "decoding error", and "strict mode" outside of
   examples.  Also, a third level of requirements that an application
   has on its input data beyond CBOR-level validity is now explicitly
   called out.
   Well-formed (processable at all), valid (checked by a
   validity-checking generic decoder), and expected input (as checked by
   the application) are treated as a hierarchy of layers of
   acceptability.

   The handling of non-well-formed simple values was clarified in text
   and pseudocode.  Appendix F was added to discuss well-formedness
   errors and provide examples for them.  The pseudocode was updated to
   be more portable, and some portability considerations were added.

   The discussion of validity has been sharpened in two areas.  Map
   validity (handling of duplicate keys) was clarified, and the domain
   of applicability of certain implementation choices explained.  Also,
   while streamlining the terminology for tags, tag numbers, and tag
   content, discussion was added on tag validity, and the restrictions
   were clarified on tag content, in general and specifically for tag 1.

   An implementation note (and note for future tag definitions) was
   added to Section 3.4 about defining tags with semantics that depend
   on serialization order.

   Tag 35 is not defined by this document; the registration based on the
   definition in RFC 7049 remains in place.

   Terminology was introduced in Section 3 for "argument" and "head",
   simplifying further discussion.

   The security considerations (Section 10) were mostly rewritten and
   significantly expanded; in multiple other places, the document is now
   more explicit that a decoder cannot simply condone well-formedness
   errors.

Acknowledgements

   CBOR was inspired by MessagePack.  MessagePack was developed and
   promoted by Sadayuki Furuhashi ("frsyuki").
   This reference to
   MessagePack is solely for attribution; CBOR is not intended as a
   version of, or replacement for, MessagePack, as it has different
   design goals and requirements.

   The need for functionality beyond the original MessagePack
   specification became obvious to many people at about the same time
   around the year 2012.  BinaryPack is a minor derivation of
   MessagePack that was developed by Eric Zhang for the binaryjs
   project.  A similar, but different, extension was made by Tim Caswell
   for his msgpack-js and msgpack-js-browser projects.  Many people have
   contributed to the discussion about extending MessagePack to separate
   text string representation from byte string representation.

   The encoding of the additional information in CBOR was inspired by
   the encoding of length information designed by Klaus Hartke for CoAP.

   This document also incorporates suggestions made by many people,
   notably Dan Frost, James Manger, Jeffrey Yasskin, Joe Hildebrand,
   Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael
   Richardson, Nico Williams, Peter Occil, Phillip Hallam-Baker, Ray
   Polk, Stuart Cheshire, Tim Bray, Tony Finch, Tony Hansen, and Yaron
   Sheffer.  Benjamin Kaduk provided an extensive review during IESG
   processing.  Éric Vyncke, Erik Kline, Robert Wilton, and Roman Danyliw
   provided further IESG comments, which included an IoT directorate
   review by Eve Schooler.

Authors' Addresses

   Carsten Bormann
   Universität Bremen TZI
   Postfach 330440
   D-28359 Bremen
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org


   Paul Hoffman
   ICANN

   Email: paul.hoffman@icann.org