Internet Engineering Task Force (IETF)                        C. Bormann
Request for Comments: 8949                        Universität Bremen TZI
STD: 94                                                       P. Hoffman
Obsoletes: 7049                                                    ICANN
Category: Standards Track                                  December 2020
ISSN: 2070-1721


              Concise Binary Object Representation (CBOR)

Abstract

   The Concise Binary Object Representation (CBOR) is a data format
   whose design goals include the possibility of extremely small code
   size, fairly small message size, and extensibility without the need
   for version negotiation. These design goals make it different from
   earlier binary serializations such as ASN.1 and MessagePack.

   This document obsoletes RFC 7049, providing editorial improvements,
   new details, and errata fixes while keeping full compatibility with
   the interchange format of RFC 7049. It does not create a new version
   of the format.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF). It represents the consensus of the IETF community. It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG). Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8949.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Objectives
     1.2. Terminology
   2. CBOR Data Models
     2.1. Extended Generic Data Models
     2.2. Specific Data Models
   3. Specification of the CBOR Encoding
     3.1. Major Types
     3.2. Indefinite Lengths for Some Major Types
       3.2.1. The "break" Stop Code
       3.2.2. Indefinite-Length Arrays and Maps
       3.2.3. Indefinite-Length Byte Strings and Text Strings
       3.2.4. Summary of Indefinite-Length Use of Major Types
     3.3. Floating-Point Numbers and Values with No Content
     3.4. Tagging of Items
       3.4.1. Standard Date/Time String
       3.4.2. Epoch-Based Date/Time
       3.4.3. Bignums
       3.4.4. Decimal Fractions and Bigfloats
       3.4.5. Content Hints
         3.4.5.1. Encoded CBOR Data Item
         3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters
         3.4.5.3. Encoded Text
       3.4.6. Self-Described CBOR
   4. Serialization Considerations
     4.1. Preferred Serialization
     4.2. Deterministically Encoded CBOR
       4.2.1. Core Deterministic Encoding Requirements
       4.2.2. Additional Deterministic Encoding Considerations
       4.2.3. Length-First Map Key Ordering
   5. Creating CBOR-Based Protocols
     5.1. CBOR in Streaming Applications
     5.2. Generic Encoders and Decoders
     5.3. Validity of Items
       5.3.1. Basic validity
       5.3.2. Tag validity
     5.4. Validity and Evolution
     5.5. Numbers
     5.6. Specifying Keys for Maps
       5.6.1. Equivalence of Keys
     5.7. Undefined Values
   6. Converting Data between CBOR and JSON
     6.1. Converting from CBOR to JSON
     6.2. Converting from JSON to CBOR
   7. Future Evolution of CBOR
     7.1. Extension Points
     7.2. Curating the Additional Information Space
   8. Diagnostic Notation
     8.1. Encoding Indicators
   9. IANA Considerations
     9.1. CBOR Simple Values Registry
     9.2. CBOR Tags Registry
     9.3. Media Types Registry
     9.4. CoAP Content-Format Registry
     9.5. Structured Syntax Suffix Registry
   10. Security Considerations
   11. References
     11.1. Normative References
     11.2. Informative References
   Appendix A. Examples of Encoded CBOR Data Items
   Appendix B. Jump Table for Initial Byte
   Appendix C. Pseudocode
   Appendix D. Half-Precision
   Appendix E. Comparison of Other Binary Formats to CBOR's Design
           Objectives
     E.1. ASN.1 DER, BER, and PER
     E.2. MessagePack
     E.3. BSON
     E.4. MSDTP: RFC 713
     E.5. Conciseness on the Wire
   Appendix F. Well-Formedness Errors and Examples
     F.1. Examples of CBOR Data Items That Are Not Well-Formed
   Appendix G. Changes from RFC 7049
     G.1. Errata Processing and Clerical Changes
     G.2. Changes in IANA Considerations
     G.3. Changes in Suggestions and Other Informational Components
   Acknowledgements
   Authors' Addresses

1. Introduction

   There are hundreds of standardized formats for binary representation
   of structured data (also known as binary serialization formats). Of
   those, some are for specific domains of information, while others are
   generalized for arbitrary data. In the IETF, probably the best-known
   formats in the latter category are ASN.1's BER and DER [ASN.1].

   The format defined here follows some specific design goals that are
   not well met by current formats. The underlying data model is an
   extended version of the JSON data model [RFC8259]. It is important
   to note that this is not a proposal that the grammar in RFC 8259 be
   extended in general, since doing so would cause a significant
   backwards incompatibility with already deployed JSON documents.
   Instead, this document simply defines its own data model that starts
   from JSON.

   Appendix E lists some existing binary formats and discusses how well
   they do or do not fit the design objectives of the Concise Binary
   Object Representation (CBOR).

   This document obsoletes [RFC7049], providing editorial improvements,
   new details, and errata fixes while keeping full compatibility with
   the interchange format of RFC 7049. It does not create a new version
   of the format.

1.1. Objectives

   The objectives of CBOR, roughly in decreasing order of importance,
   are:

   1. The representation must be able to unambiguously encode most
      common data formats used in Internet standards.

      * It must represent a reasonable set of basic data types and
        structures using binary encoding. "Reasonable" here is
        largely influenced by the capabilities of JSON, with the major
        addition of binary byte strings. The structures supported are
        limited to arrays and trees; loops and lattice-style graphs
        are not supported.

      * There is no requirement that all data formats be uniquely
        encoded; that is, it is acceptable that the number "7" might
        be encoded in multiple different ways.

   2. The code for an encoder or decoder must be able to be compact in
      order to support systems with very limited memory, processor
      power, and instruction sets.

      * An encoder and a decoder need to be implementable in a very
        small amount of code (for example, in class 1 constrained
        nodes as defined in [RFC7228]).

      * The format should use contemporary machine representations of
        data (for example, not requiring binary-to-decimal
        conversion).

   3. Data must be able to be decoded without a schema description.

      * Similar to JSON, encoded data should be self-describing so
        that a generic decoder can be written.

   4. The serialization must be reasonably compact, but data
      compactness is secondary to code compactness for the encoder and
      decoder.

      * "Reasonable" here is bounded by JSON as an upper bound in size
        and by the implementation complexity, which limits the amount
        of effort that can go into achieving that compactness. Using
        either general compression schemes or extensive bit-fiddling
        violates the complexity goals.

   5. The format must be applicable to both constrained nodes and
      high-volume applications.

      * This means it must be reasonably frugal in CPU usage for both
        encoding and decoding. This is relevant both for constrained
        nodes and for potential usage in applications with a very high
        volume of data.

   6. The format must support all JSON data types for conversion to and
      from JSON.

      * It must support a reasonable level of conversion as long as
        the data represented is within the capabilities of JSON. It
        must be possible to define a unidirectional mapping towards
        JSON for all types of data.

   7. The format must be extensible, and the extended data must be
      decodable by earlier decoders.

      * The format is designed for decades of use.

      * The format must support a form of extensibility that allows
        fallback so that a decoder that does not understand an
        extension can still decode the message.

      * The format must be able to be extended in the future by later
        IETF standards.

1.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The term "byte" is used in its now-customary sense as a synonym for
   "octet". All multi-byte values are encoded in network byte order
   (that is, most significant byte first, also known as "big-endian").

   This specification makes use of the following terminology:

   Data item: A single piece of CBOR data. The structure of a data
      item may contain zero, one, or more nested data items. The term
      is used both for the data item in representation format and for
      the abstract idea that can be derived from that by a decoder; the
      former can be addressed specifically by using the term "encoded
      data item".

   Decoder: A process that decodes a well-formed encoded CBOR data item
      and makes it available to an application. Formally speaking, a
      decoder contains a parser to break up the input using the syntax
      rules of CBOR, as well as a semantic processor to prepare the data
      in a form suitable to the application.

   Encoder: A process that generates the (well-formed) representation
      format of a CBOR data item from application information.

   Data Stream: A sequence of zero or more data items, not further
      assembled into a larger containing data item (see [RFC8742] for
      one application). The independent data items that make up a data
      stream are sometimes also referred to as "top-level data items".

   Well-formed: A data item that follows the syntactic structure of
      CBOR. A well-formed data item uses the initial bytes and the byte
      strings and/or data items that are implied by their values as
      defined in CBOR and does not include following extraneous data.
      CBOR decoders by definition only return contents from well-formed
      data items.

   Valid: A data item that is well-formed and also follows the semantic
      restrictions that apply to CBOR data items (Section 5.3).

   Expected: Besides its normal English meaning, the term "expected" is
      used to describe requirements beyond CBOR validity that an
      application has on its input data. Well-formed (processable at
      all), valid (checked by a validity-checking generic decoder), and
      expected (checked by the application) form a hierarchy of layers
      of acceptability.

   Stream decoder: A process that decodes a data stream and makes each
      of the data items in the sequence available to an application as
      they are received.

   Terms and concepts for floating-point values such as Infinity, NaN
   (not a number), negative zero, and subnormal are defined in
   [IEEE754].

   Where bit arithmetic or data types are explained, this document uses
   the notation familiar from the programming language C [C], except
   that ".." denotes a range that includes both ends given, and
   superscript notation denotes exponentiation. For example, 2 to the
   power of 64 is notated: 2^(64). In the plain-text version of this
   specification, superscript notation is not available and therefore is
   rendered by a surrogate notation. That notation is not optimized for
   this RFC; it is unfortunately ambiguous with C's exclusive-or (which
   is only used in the appendices, which in turn do not use
   exponentiation) and requires circumspection from the reader of the
   plain-text version.

   Examples and pseudocode assume that signed integers use two's
   complement representation and that right shifts of signed integers
   perform sign extension; these assumptions are also specified in
   Sections 6.8.1 (basic.fundamental) and 7.6.7 (expr.shift) of the 2020
   version of C++ (currently available as a final draft, [Cplusplus20]).

   Similar to the "0x" notation for hexadecimal numbers, numbers in
   binary notation are prefixed with "0b". Underscores can be added to
   a number solely for readability, so 0b00100001 (0x21) might be
   written 0b001_00001 to emphasize the desired interpretation of the
   bits in the byte; in this case, it is split into three bits and five
   bits. Encoded CBOR data items are sometimes given in the "0x" or
   "0b" notation; these values are first interpreted as numbers as in C
   and are then interpreted as byte strings in network byte order,
   including any leading zero bytes expressed in the notation.
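
   The notation described above can be tried out with a short Python
   sketch. It is illustrative only and not part of the specification;
   the helper names "split_initial_byte" and "notation_to_bytes" are
   assumptions introduced here.

```python
def split_initial_byte(byte):
    """Split one byte into its high-order 3 bits and low-order 5 bits."""
    return byte >> 5, byte & 0b000_11111


def notation_to_bytes(literal):
    """Turn a "0x" literal into a byte string in network byte order.

    Leading zero bytes written in the literal are kept, as the text
    above requires.
    """
    digits = literal[2:].replace("_", "")
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes.fromhex(digits)


# Underscores are purely visual: both spellings are the same number.
assert 0b001_00001 == 0b00100001 == 0x21

print(split_initial_byte(0b001_00001))   # (1, 1)
print(notation_to_bytes("0x01f4"))       # two bytes: 0x01, 0xf4
print(len(notation_to_bytes("0x0000")))  # 2, leading zeros are kept
```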

   Words may be _italicized_ for emphasis; in the plain text form of
   this specification, this is indicated by surrounding words with
   underscore characters. Verbatim text (e.g., names from a programming
   language) may be set in "monospace" type; in plain text, this is
   approximated somewhat ambiguously by surrounding the text in double
   quotes (which also retain their usual meaning).

2. CBOR Data Models

   CBOR is explicit about its generic data model, which defines the set
   of all data items that can be represented in CBOR. Its basic generic
   data model is extensible by the registration of "simple values" and
   tags. Applications can then create a subset of the resulting
   extended generic data model to build their specific data models.

   Within environments that can represent the data items in the generic
   data model, generic CBOR encoders and decoders can be implemented
   (which usually involves defining additional implementation data types
   for those data items that do not already have a natural
   representation in the environment). The ability to provide generic
   encoders and decoders is an explicit design goal of CBOR; however,
   many applications will provide their own application-specific
   encoders and/or decoders.

   In the basic (unextended) generic data model defined in Section 3, a
   data item is one of the following:

   * an integer in the range -2^(64)..2^(64)-1 inclusive

   * a simple value, identified by a number between 0 and 255, but
     distinct from that number itself

   * a floating-point value, distinct from an integer, out of the set
     representable by IEEE 754 binary64 (including non-finites)
     [IEEE754]

   * a sequence of zero or more bytes ("byte string")

   * a sequence of zero or more Unicode code points ("text string")

   * a sequence of zero or more data items ("array")

   * a mapping (mathematical function) from zero or more data items
     ("keys") each to a data item ("values"), ("map")

   * a tagged data item ("tag"), comprising a tag number (an integer in
     the range 0..2^(64)-1) and the tag content (a data item)

   Note that integer and floating-point values are distinct in this
   model, even if they have the same numeric value.

   Also note that serialization variants are not visible at the generic
   data model level. This deliberate absence of visibility includes the
   number of bytes of the encoded floating-point value. It also
   includes the choice of encoding for an "argument" (see Section 3)
   such as the encoding for an integer, the encoding for the length of a
   text or byte string, the encoding for the number of elements in an
   array or pairs in a map, or the encoding for a tag number.
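
   To illustrate that the width chosen for an argument is invisible at
   the data model level, the following Python sketch decodes several
   serializations of the unsigned integer 10, using the encoding rules
   of Section 3. The function name "decode_unsigned" is an assumption
   made for this example, and only major type 0 is handled.

```python
def decode_unsigned(encoded):
    """Decode a major type 0 item, accepting any argument width."""
    initial = encoded[0]
    if initial >> 5 != 0:
        raise ValueError("not an unsigned integer (major type 0)")
    info = initial & 0x1F
    if info < 24:
        return info  # the argument is the additional information itself
    if info > 27:
        raise ValueError("not well-formed for major type 0")
    width = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
    if len(encoded) < 1 + width:
        raise ValueError("not well-formed: truncated argument")
    return int.from_bytes(encoded[1:1 + width], "big")


# Five different byte sequences, one data item in the generic data model.
variants = [
    bytes.fromhex(h)
    for h in ("0a", "180a", "19000a", "1a0000000a", "1b000000000000000a")
]
values = {decode_unsigned(v) for v in variants}
print(values)  # {10}
```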

2.1. Extended Generic Data Models

   This basic generic data model has been extended in this document by
   the registration of a number of simple values and tag numbers, such
   as:

   * "false", "true", "null", and "undefined" (simple values identified
     by 20..23, Section 3.3)

   * integer and floating-point values with a larger range and
     precision than the above (tag numbers 2 to 5, Section 3.4)

   * application data types such as a point in time or date/time string
     defined in RFC 3339 (tag numbers 1 and 0, Section 3.4)

   Additional elements of the extended generic data model can be (and
   have been) defined via the IANA registries created for CBOR. Even if
   such an extension is unknown to a generic encoder or decoder, data
   items using that extension can be passed to or from the application
   by representing them at the application interface within the basic
   generic data model, i.e., as generic simple values or generic tags.

   In other words, the basic generic data model is stable as defined in
   this document, while the extended generic data model expands by the
   registration of new simple values or tag numbers, but never shrinks.

   While there is a strong expectation that generic encoders and
   decoders can represent "false", "true", and "null" ("undefined" is
   intentionally omitted) in the form appropriate for their programming
   environment, the implementation of the data model extensions created
   by tags is truly optional and a matter of implementation quality.

2.2. Specific Data Models

   The specific data model for a CBOR-based protocol usually takes a
   subset of the extended generic data model and assigns application
   semantics to the data items within this subset and its components.
   When documenting such specific data models and specifying the types
   of data items, it is preferable to identify the types by their
   generic data model names ("negative integer", "array") instead of
   referring to aspects of their CBOR representation ("major type 1",
   "major type 4").

   Specific data models can also specify value equivalency (including
   values of different types) for the purposes of map keys and encoder
   freedom. For example, in the generic data model, a valid map MAY
   have both "0" and "0.0" as keys, and an encoder MUST NOT encode "0.0"
   as an integer (major type 0, Section 3.1). However, if a specific
   data model declares that floating-point and integer representations
   of integral values are equivalent, using both map keys "0" and "0.0"
   in a single map would be considered duplicates, even while encoded as
   different major types, and so invalid; and an encoder could encode
   integral-valued floats as integers or vice versa, perhaps to save
   encoded bytes.
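
   The difference between the two notions of key equivalence can be
   sketched in Python. This is a hypothetical illustration only: the
   functions "generic_key", "integral_equivalent_key", and
   "has_duplicate_keys" are names invented here, and the second
   function models one possible specific data model, not a rule of
   this specification.

```python
def generic_key(item):
    """Key identity in the generic data model: type and value both count."""
    return (type(item).__name__, item)


def integral_equivalent_key(item):
    """Key identity in a hypothetical specific data model that declares
    integral-valued floats and integers to be equivalent."""
    if isinstance(item, float) and item.is_integer():
        return ("int", int(item))
    return generic_key(item)


def has_duplicate_keys(keys, identity):
    """Report whether any two keys collide under the given identity."""
    seen = set()
    for key in keys:
        ident = identity(key)
        if ident in seen:
            return True
        seen.add(ident)
    return False


keys = [0, 0.0]
print(has_duplicate_keys(keys, generic_key))              # False
print(has_duplicate_keys(keys, integral_equivalent_key))  # True
```

   Note that a plain Python "dict" already treats 0 and 0.0 as the same
   key, so an implementation that wants generic data model semantics
   would need an explicit identity such as the one sketched here.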

3. Specification of the CBOR Encoding

   A CBOR data item (Section 2) is encoded to or decoded from a byte
   string carrying a well-formed encoded data item as described in this
   section. The encoding is summarized in Table 7 in Appendix B,
   indexed by the initial byte. An encoder MUST produce only
   well-formed encoded data items. A decoder MUST NOT return a decoded
   data item when it encounters input that is not a well-formed encoded
   CBOR data item (this does not detract from the usefulness of
   diagnostic and recovery tools that might make available some
   information from a damaged encoded CBOR data item).

   The initial byte of each encoded data item contains both information
   about the major type (the high-order 3 bits, described in
   Section 3.1) and additional information (the low-order 5 bits). With
   a few exceptions, the additional information's value describes how to
   load an unsigned integer "argument":

   Less than 24: The argument's value is the value of the additional
      information.

   24, 25, 26, or 27: The argument's value is held in the following 1,
      2, 4, or 8 bytes, respectively, in network byte order. For major
      type 7 and additional information value 25, 26, 27, these bytes
      are not used as an integer argument, but as a floating-point value
      (see Section 3.3).

   28, 29, 30: These values are reserved for future additions to the
      CBOR format. In the present version of CBOR, the encoded item is
      not well-formed.

   31: No argument value is derived. If the major type is 0, 1, or 6,
      the encoded item is not well-formed. For major types 2 to 5, the
      item's length is indefinite, and for major type 7, the byte does
      not constitute a data item at all but terminates an
      indefinite-length item; all are described in Section 3.2.

   The initial byte and any additional bytes consumed to construct the
   argument are collectively referred to as the _head_ of the data item.
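
   The head-loading rules above can be sketched in Python as follows.
   This is a minimal illustration under stated assumptions, not a
   normative implementation: the function name "decode_head" and its
   return shape are invented for this example.

```python
def decode_head(data, offset=0):
    """Read the head of the item at `offset`.

    Returns (major_type, additional_info, argument, head_length), where
    argument is None for additional information 31. For major type 7
    with additional information 25..27 the returned integer holds the
    raw floating-point bits, which the caller must reinterpret.
    """
    if offset >= len(data):
        raise ValueError("not well-formed: missing initial byte")
    initial = data[offset]
    major_type = initial >> 5  # high-order 3 bits
    info = initial & 0x1F      # low-order 5 bits
    if info < 24:
        return major_type, info, info, 1
    if info <= 27:
        width = 1 << (info - 24)  # 1, 2, 4, or 8 bytes
        end = offset + 1 + width
        if end > len(data):
            raise ValueError("not well-formed: truncated argument")
        argument = int.from_bytes(data[offset + 1:end], "big")
        return major_type, info, argument, 1 + width
    if info < 31:
        raise ValueError("not well-formed: reserved value 28..30")
    if major_type in (0, 1, 6):
        raise ValueError("not well-formed: 31 is not allowed here")
    return major_type, info, None, 1  # indefinite length or "break"


print(decode_head(bytes.fromhex("1901f4")))  # (0, 25, 500, 3)
print(decode_head(bytes.fromhex("9f")))      # (4, 31, None, 1)
```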

   The meaning of this argument depends on the major type. For example,
   in major type 0, the argument is the value of the data item itself
   (and in major type 1, the value of the data item is computed from the
   argument); in major type 2 and 3, it gives the length of the string
   data in bytes that follow; and in major types 4 and 5, it is used to
   determine the number of data items enclosed.

   If the encoded sequence of bytes ends before the end of a data item,
   that item is not well-formed. If the encoded sequence of bytes still
   has bytes remaining after the outermost encoded item is decoded, that
   encoding is not a single well-formed CBOR item. Depending on the
   application, the decoder may either treat the encoding as not
   well-formed or just identify the start of the remaining bytes to the
   application.

   A CBOR decoder implementation can be based on a jump table with all
   256 defined values for the initial byte (Table 7). A decoder in a
   constrained implementation can instead use the structure of the
   initial byte and following bytes for more compact code (see
   Appendix C for a rough impression of how this could look).

3.1. Major Types

   The following lists the major types and the additional information
   and other bytes associated with the type.

   Major type 0:
      An unsigned integer in the range 0..2^(64)-1 inclusive. The value
      of the encoded item is the argument itself. For example, the
      integer 10 is denoted as the one byte 0b000_01010 (major type 0,
      additional information 10). The integer 500 would be 0b000_11001
      (major type 0, additional information 25) followed by the two
      bytes 0x01f4, which is 500 in decimal.

   Major type 1:
      A negative integer in the range -2^(64)..-1 inclusive. The value
      of the item is -1 minus the argument. For example, the integer
      -500 would be 0b001_11001 (major type 1, additional information
      25) followed by the two bytes 0x01f3, which is 499 in decimal.

   Major type 2:
      A byte string. The number of bytes in the string is equal to the
      argument. For example, a byte string whose length is 5 would have
      an initial byte of 0b010_00101 (major type 2, additional
      information 5 for the length), followed by 5 bytes of binary
      content. A byte string whose length is 500 would have a
      three-byte head: the initial byte 0b010_11001 (major type 2,
      additional information 25 to indicate a two-byte length) followed
      by the two bytes 0x01f4 for a length of 500. The head is then
      followed by 500 bytes of binary content.

   Major type 3:
      A text string (Section 2) encoded as UTF-8 [RFC3629]. The number
      of bytes in the string is equal to the argument. A string
      containing an invalid UTF-8 sequence is well-formed but invalid
      (Section 1.2). This type is provided for systems that need to
      interpret or display human-readable text, and allows the
      differentiation between unstructured bytes and text that has a
      specified repertoire (that of Unicode) and encoding (UTF-8). In
      contrast to formats such as JSON, the Unicode characters in this
      type are never escaped. Thus, a newline character (U+000A) is
      always represented in a string as the byte 0x0a, and never as the
      bytes 0x5c6e (the characters "\" and "n") nor as 0x5c7530303061
      (the characters "\", "u", "0", "0", "0", and "a").

   Major type 4:
      An array of data items. In other formats, arrays are also called
      lists, sequences, or tuples (a "CBOR sequence" is something
      slightly different, though [RFC8742]). The argument is the number
      of data items in the array. Items in an array do not need to all
      be of the same type. For example, an array that contains 10 items
      of any type would have an initial byte of 0b100_01010 (major type
      4, additional information 10 for the length) followed by the 10
      remaining items.

   Major type 5:
      A map of pairs of data items. Maps are also called tables,
      dictionaries, hashes, or objects (in JSON). A map is comprised of
      pairs of data items, each pair consisting of a key that is
      immediately followed by a value. The argument is the number of
      _pairs_ of data items in the map. For example, a map that
      contains 9 pairs would have an initial byte of 0b101_01001 (major
      type 5, additional information 9 for the number of pairs) followed
      by the 18 remaining items. The first item is the first key, the
      second item is the first value, the third item is the second key,
      and so on. Because items in a map come in pairs, their total
      number is always even: a map that contains an odd number of items
      (no value data present after the last key data item) is not
      well-formed. A map that has duplicate keys may be well-formed, but
      it is not valid, and thus it causes indeterminate decoding; see
      also Section 5.6.

   Major type 6:
      A tagged data item ("tag") whose tag number, an integer in the
      range 0..2^(64)-1 inclusive, is the argument and whose enclosed
      data item (_tag content_) is the single encoded data item that
      follows the head. See Section 3.4.

   Major type 7:
      Floating-point numbers and simple values, as well as the "break"
      stop code. See Section 3.3.
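
   The examples given for major types 0 to 5 can be reproduced with a
   small Python encoder. This is a sketch under assumptions, not a
   reference implementation: the names "encode_head" and "encode" are
   invented here, the mapping from Python types to CBOR major types is
   a choice made for illustration, and the encoder always picks the
   shortest argument form even though other widths are also
   well-formed.

```python
def encode_head(major_type, argument):
    """Encode a head using the shortest form that holds the argument."""
    if not 0 <= major_type <= 7:
        raise ValueError("major type must be in 0..7")
    if not 0 <= argument < 2**64:
        raise ValueError("argument must be in 0..2^64-1")
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    for info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * width):
            head = bytes([(major_type << 5) | info])
            return head + argument.to_bytes(width, "big")
    raise AssertionError("unreachable: argument range checked above")


def encode(item):
    """Encode a Python value as a definite-length CBOR data item."""
    if isinstance(item, bool):
        raise TypeError("simple values are outside this sketch")
    if isinstance(item, int):
        # Major type 0 for non-negative values, 1 for negative values,
        # where the argument is -1 minus the value.
        return encode_head(0, item) if item >= 0 else encode_head(1, -1 - item)
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item
    if isinstance(item, str):
        utf8 = item.encode("utf-8")  # length counts bytes, not characters
        return encode_head(3, len(utf8)) + utf8
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return encode_head(5, len(item)) + body  # argument counts pairs
    raise TypeError("type not covered by this sketch")


print(encode(500).hex())         # 1901f4
print(encode(-500).hex())        # 3901f3
print(encode("\n").hex())        # 610a
print(encode(bytes(500))[:3].hex())  # 5901f4
```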

   These eight major types lead to a simple table showing which of the
   256 possible values for the initial byte of a data item are used
   (Table 7).

   In major types 6 and 7, many of the possible values are reserved for
   future specification. See Section 9 for more information on these
   values.

   Table 1 summarizes the major types defined by CBOR, ignoring
   Section 3.2 for now. The number N in this table stands for the
   argument.

   +============+=======================+=========================+
   | Major Type | Meaning               | Content                 |
   +============+=======================+=========================+
   | 0          | unsigned integer N    | -                       |
   +------------+-----------------------+-------------------------+
   | 1          | negative integer -1-N | -                       |
   +------------+-----------------------+-------------------------+
   | 2          | byte string           | N bytes                 |
   +------------+-----------------------+-------------------------+
   | 3          | text string           | N bytes (UTF-8 text)    |
   +------------+-----------------------+-------------------------+
   | 4          | array                 | N data items (elements) |
   +------------+-----------------------+-------------------------+
   | 5          | map                   | 2N data items (key/     |
   |            |                       | value pairs)            |
   +------------+-----------------------+-------------------------+
   | 6          | tag of number N       | 1 data item             |
   +------------+-----------------------+-------------------------+
   | 7          | simple/float          | -                       |
   +------------+-----------------------+-------------------------+

       Table 1: Overview over the Definite-Length Use of CBOR Major
                           Types (N = Argument)
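
   The "Content" column of Table 1 is enough to find where a
   definite-length item ends without interpreting it. The Python sketch
   below walks an encoded item this way. It is illustrative only: the
   names "read_head" and "skip_item" are assumptions, indefinite
   lengths (Section 3.2) are deliberately rejected, and no validity
   checks (such as UTF-8 correctness) are attempted.

```python
def read_head(data, offset):
    """Return (major_type, argument, next_offset) for a definite head."""
    if offset >= len(data):
        raise ValueError("not well-formed: input ends inside an item")
    initial = data[offset]
    major_type, info = initial >> 5, initial & 0x1F
    if info < 24:
        return major_type, info, offset + 1
    if info > 27:
        raise ValueError("reserved or indefinite: outside this sketch")
    end = offset + 1 + (1 << (info - 24))
    if end > len(data):
        raise ValueError("not well-formed: truncated argument")
    return major_type, int.from_bytes(data[offset + 1:end], "big"), end


def skip_item(data, offset=0):
    """Return the offset just past the item that starts at `offset`."""
    major_type, n, pos = read_head(data, offset)
    if major_type in (2, 3):  # N bytes of string content
        if pos + n > len(data):
            raise ValueError("not well-formed: truncated string")
        return pos + n
    # Nested item counts from Table 1: N for arrays, 2N for maps,
    # 1 for tags, none for major types 0, 1, and 7.
    nested = {4: n, 5: 2 * n, 6: 1}.get(major_type, 0)
    for _ in range(nested):
        pos = skip_item(data, pos)
    return pos


data = bytes.fromhex("8301820203820405")
print(skip_item(data) == len(data))  # True: exactly one item, no trailer
```

   Comparing the returned offset with the input length is one way for
   an application to decide how to treat bytes that remain after the
   outermost item.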

3.2. Indefinite Lengths for Some Major Types

   Four CBOR items (arrays, maps, byte strings, and text strings) can be
   encoded with an indefinite length using additional information value
   31. This is useful if the encoding of the item needs to begin before
   the number of items inside the array or map, or the total length of
   the string, is known. (The ability to start sending a data item
   before all of it is known is often referred to as "streaming" within
   that data item.)

   Indefinite-length arrays and maps are dealt with differently than
   indefinite-length strings (byte strings and text strings).

3.2.1. The "break" Stop Code

   The "break" stop code is encoded with major type 7 and additional
   information value 31 (0b111_11111). It is not itself a data item: it
   is just a syntactic feature to close an indefinite-length item.

   If the "break" stop code appears where a data item is expected, other
   than directly inside an indefinite-length string, array, or map --
   for example, directly inside a definite-length array or map -- the
   enclosing item is not well-formed.

3.2.2. Indefinite-Length Arrays and Maps

   Indefinite-length arrays and maps are represented using their major
   type with the additional information value of 31, followed by an
   arbitrary-length sequence of zero or more items for an array or
   key/value pairs for a map, followed by the "break" stop code
   (Section 3.2.1). In other words, indefinite-length arrays and maps
   look identical to other arrays and maps except for beginning with the
   additional information value of 31 and ending with the "break" stop
   code.

   If the "break" stop code appears after a key in a map, in place of
   that key's value, the map is not well-formed.

   There is no restriction against nesting indefinite-length array or
   map items. A "break" only terminates a single item, so nested
   indefinite-length items need exactly as many "break" stop codes as
   there are type bytes starting an indefinite-length item.

   For example, assume an encoder wants to represent the abstract array
   [1, [2, 3], [4, 5]]. The definite-length encoding would be
   0x8301820203820405:

   83        -- Array of length 3
      01     -- 1
      82     -- Array of length 2
         02  -- 2
         03  -- 3
      82     -- Array of length 2
         04  -- 4
         05  -- 5
672672+673673+ Indefinite-length encoding could be applied independently to each of
674674+ the three arrays encoded in this data item, as required, leading to
675675+ representations such as:
676676+677677+ 0x9f018202039f0405ffff
678678+ 9F -- Start indefinite-length array
679679+ 01 -- 1
680680+ 82 -- Array of length 2
681681+ 02 -- 2
682682+ 03 -- 3
683683+ 9F -- Start indefinite-length array
684684+ 04 -- 4
685685+ 05 -- 5
686686+ FF -- "break" (inner array)
687687+ FF -- "break" (outer array)
688688+689689+ 0x9f01820203820405ff
690690+ 9F -- Start indefinite-length array
691691+ 01 -- 1
692692+ 82 -- Array of length 2
693693+ 02 -- 2
694694+ 03 -- 3
695695+ 82 -- Array of length 2
696696+ 04 -- 4
697697+ 05 -- 5
698698+ FF -- "break"
699699+700700+ 0x83018202039f0405ff
701701+ 83 -- Array of length 3
702702+ 01 -- 1
703703+ 82 -- Array of length 2
704704+ 02 -- 2
705705+ 03 -- 3
706706+ 9F -- Start indefinite-length array
707707+ 04 -- 4
708708+ 05 -- 5
709709+ FF -- "break"
710710+711711+ 0x83019f0203ff820405
712712+ 83 -- Array of length 3
713713+ 01 -- 1
714714+ 9F -- Start indefinite-length array
715715+ 02 -- 2
716716+ 03 -- 3
717717+ FF -- "break"
718718+ 82 -- Array of length 2
719719+ 04 -- 4
720720+ 05 -- 5
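
The equivalence of the five encodings above can be checked with a small decoder. The following is a minimal sketch in Python, not a complete CBOR decoder: it handles only unsigned integers (major type 0) and arrays (major type 4), and the names `decode_item`, `read_argument`, and `BREAK` are illustrative assumptions rather than part of any specification or library.

```python
BREAK = object()  # sentinel standing in for the "break" stop code (0xff)


def read_argument(data, pos, info):
    """Return (argument, new_pos) for additional information 0..27."""
    if info < 24:
        return info, pos
    if info > 27:
        raise ValueError("additional information 28..31 is not valid here")
    size = 1 << (info - 24)  # 1, 2, 4, or 8 bytes follow
    if pos + size > len(data):
        raise ValueError("truncated argument")
    return int.from_bytes(data[pos:pos + size], "big"), pos + size


def decode_item(data, pos=0, allow_break=False):
    """Decode one item (unsigned integer or array); return (value, new_pos)."""
    if pos >= len(data):
        raise ValueError("truncated data item")
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if initial == 0xFF:
        if not allow_break:
            raise ValueError('"break" outside an indefinite-length item')
        return BREAK, pos
    if major == 0:
        return read_argument(data, pos, info)
    if major == 4:
        items = []
        if info == 31:  # indefinite length: read items until "break"
            while True:
                item, pos = decode_item(data, pos, allow_break=True)
                if item is BREAK:
                    return items, pos
                items.append(item)
        count, pos = read_argument(data, pos, info)
        for _ in range(count):  # definite length: "break" is not allowed
            item, pos = decode_item(data, pos)
            items.append(item)
        return items, pos
    raise ValueError("major type not handled by this sketch")
```

Each "break" closes only the innermost open indefinite-length array, which is why the recursion needs no explicit counter: the call stack plays that role.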
721721+722722+ An example of an indefinite-length map (that happens to have two key/
723723+ value pairs) might be:
724724+725725+ 0xbf6346756ef563416d7421ff
726726+ BF -- Start indefinite-length map
727727+ 63 -- First key, UTF-8 string length 3
728728+ 46756e -- "Fun"
729729+ F5 -- First value, true
730730+ 63 -- Second key, UTF-8 string length 3
731731+ 416d74 -- "Amt"
732732+ 21 -- Second value, -2
733733+ FF -- "break"
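
On the encoder side, the benefit of the indefinite-length form is that the opening byte can be emitted before the number of pairs is known. The sketch below, in Python, illustrates this with a generator; `encode_head`, `encode_scalar`, and `stream_map` are hypothetical helper names, and only text strings, booleans, and 64-bit integers are supported.

```python
def encode_head(major, argument):
    """Encode a major type and argument in the shortest head that fits."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_scalar(value):
    """Encode a bool, int, or str (the only types this sketch supports)."""
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        if value >= 0:
            return encode_head(0, value)
        return encode_head(1, -1 - value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return encode_head(3, len(raw)) + raw
    raise TypeError("type not handled by this sketch")


def stream_map(pairs):
    """Yield an indefinite-length map piece by piece as pairs arrive."""
    yield b"\xbf"                      # start indefinite-length map
    for key, value in pairs:           # pairs may be any iterable, even lazy
        yield encode_scalar(key) + encode_scalar(value)
    yield b"\xff"                      # "break"
```

Joining the pieces for the pairs ("Fun", true) and ("Amt", -2) reproduces the byte sequence shown in the example above.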
734734+735735+3.2.3. Indefinite-Length Byte Strings and Text Strings
736736+737737+ Indefinite-length strings are represented by a byte containing the
738738+ major type for byte string or text string with an additional
739739+ information value of 31, followed by a series of zero or more strings
740740+ of the specified type ("chunks") that have definite lengths, and
741741+ finished by the "break" stop code (Section 3.2.1). The data item
742742+ represented by the indefinite-length string is the concatenation of
743743+ the chunks. If no chunks are present, the data item is an empty
744744+ string of the specified type. Zero-length chunks, while not
745745+ particularly useful, are permitted.
746746+747747+ If any item between the indefinite-length string indicator
748748+ (0b010_11111 or 0b011_11111) and the "break" stop code is not a
749749+ definite-length string item of the same major type, the string is not
750750+ well-formed.
751751+752752+ The design does not allow nesting indefinite-length strings as chunks
753753+ into indefinite-length strings. If it were allowed, it would require
754754+ decoder implementations to keep a stack, or at least a count, of
755755+ nesting levels. It is unnecessary on the encoder side because the
756756+ inner indefinite-length string would consist of chunks, and these
757757+ could instead be put directly into the outer indefinite-length
758758+ string.
759759+760760+ If any definite-length text string inside an indefinite-length text
761761+ string is invalid, the indefinite-length text string is invalid.
762762+ Note that this implies that the UTF-8 bytes of a single Unicode code
763763+ point (scalar value) cannot be spread between chunks: a new chunk of
764764+ a text string can only be started at a code point boundary.
765765+766766+ For example, assume an encoded data item consisting of the bytes:
767767+768768+ 0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111
769769+ 5F -- Start indefinite-length byte string
770770+ 44 -- Byte string of length 4
771771+ aabbccdd -- Bytes content
772772+ 43 -- Byte string of length 3
773773+ eeff99 -- Bytes content
774774+ FF -- "break"
775775+776776+ After decoding, this results in a single byte string with seven
777777+ bytes: 0xaabbccddeeff99.
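
The chunk rules can be sketched as a small Python function. This is an illustration under stated assumptions, not a reference implementation: `decode_chunked_string` is a hypothetical name, and for brevity it raises `ValueError` both for input that is not well-formed (for example, a nested indefinite-length chunk) and for input that is merely invalid (a text chunk that is not valid UTF-8 on its own).

```python
def decode_chunked_string(data):
    """Decode an indefinite-length byte or text string at the start of data.

    Returns (value, bytes_consumed).
    """
    major, info = data[0] >> 5, data[0] & 0x1F
    if major not in (2, 3) or info != 31:
        raise ValueError("not an indefinite-length string")
    pos, chunks = 1, []
    while True:
        if pos >= len(data):
            raise ValueError('missing "break" stop code')
        initial = data[pos]
        pos += 1
        if initial == 0xFF:            # "break" closes the string
            break
        chunk_major, chunk_info = initial >> 5, initial & 0x1F
        if chunk_major != major or chunk_info > 27:
            raise ValueError("chunk must be a definite-length string "
                             "of the same major type")
        if chunk_info < 24:
            length = chunk_info
        else:
            size = 1 << (chunk_info - 24)
            if pos + size > len(data):
                raise ValueError("truncated chunk length")
            length = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        if pos + length > len(data):
            raise ValueError("truncated chunk content")
        chunk = data[pos:pos + length]
        pos += length
        if major == 3:
            chunk.decode("utf-8")      # each text chunk must be valid alone
        chunks.append(chunk)
    raw = b"".join(chunks)
    return (raw.decode("utf-8") if major == 3 else raw), pos
```

Validating each text chunk separately, rather than only the concatenation, is what enforces the rule that a chunk can only start at a code point boundary.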
778778+779779+3.2.4. Summary of Indefinite-Length Use of Major Types
780780+781781+ Table 2 summarizes the major types defined by CBOR as used for
782782+ indefinite-length encoding (with additional information set to 31).
783783+784784+ +============+===================+==================================+
785785+ | Major Type | Meaning | Enclosed up to "break" Stop Code |
786786+ +============+===================+==================================+
787787+ | 0 | (not well- | - |
788788+ | | formed) | |
789789+ +------------+-------------------+----------------------------------+
790790+ | 1 | (not well- | - |
791791+ | | formed) | |
792792+ +------------+-------------------+----------------------------------+
793793+ | 2 | byte string | definite-length byte strings |
794794+ +------------+-------------------+----------------------------------+
795795+ | 3 | text string | definite-length text strings |
796796+ +------------+-------------------+----------------------------------+
797797+ | 4 | array | data items (elements) |
798798+ +------------+-------------------+----------------------------------+
799799+ | 5 | map | data items (key/value pairs) |
800800+ +------------+-------------------+----------------------------------+
801801+ | 6 | (not well- | - |
802802+ | | formed) | |
803803+ +------------+-------------------+----------------------------------+
804804+ | 7 | "break" stop | - |
805805+ | | code | |
806806+ +------------+-------------------+----------------------------------+
807807+808808+ Table 2: Overview of the Indefinite-Length Use of CBOR Major
809809+ Types (Additional Information = 31)
810810+811811+3.3. Floating-Point Numbers and Values with No Content
812812+813813+ Major type 7 is for two types of data: floating-point numbers and
814814+ "simple values" that do not need any content. Each value of the
815815+ 5-bit additional information in the initial byte has its own separate
816816+ meaning, as defined in Table 3. Like the major types for integers,
817817+ items of this major type do not carry content data; all the
818818+ information is in the initial bytes (the head).
819819+820820+ +=============+===================================================+
821821+ | 5-Bit Value | Semantics |
822822+ +=============+===================================================+
823823+ | 0..23 | Simple value (value 0..23) |
824824+ +-------------+---------------------------------------------------+
825825+ | 24 | Simple value (value 32..255 in following byte) |
826826+ +-------------+---------------------------------------------------+
827827+ | 25 | IEEE 754 Half-Precision Float (16 bits follow) |
828828+ +-------------+---------------------------------------------------+
829829+ | 26 | IEEE 754 Single-Precision Float (32 bits follow) |
830830+ +-------------+---------------------------------------------------+
831831+ | 27 | IEEE 754 Double-Precision Float (64 bits follow) |
832832+ +-------------+---------------------------------------------------+
833833+ | 28-30 | Reserved, not well-formed in the present document |
834834+ +-------------+---------------------------------------------------+
835835+ | 31 | "break" stop code for indefinite-length items |
836836+ | | (Section 3.2.1) |
837837+ +-------------+---------------------------------------------------+
838838+839839+ Table 3: Values for Additional Information in Major Type 7
840840+841841+ As with all other major types, the 5-bit value 24 signifies a single-
842842+ byte extension: it is followed by an additional byte to represent the
843843+ simple value. (To minimize confusion, only the values 32 to 255 are
844844+ used.) This maintains the structure of the initial bytes: as for the
845845+ other major types, the length of these always depends on the
846846+ additional information in the first byte. Table 4 lists the numeric
847847+ values assigned and available for simple values.
848848+849849+ +=========+==============+
850850+ | Value | Semantics |
851851+ +=========+==============+
852852+ | 0..19 | (unassigned) |
853853+ +---------+--------------+
854854+ | 20 | false |
855855+ +---------+--------------+
856856+ | 21 | true |
857857+ +---------+--------------+
858858+ | 22 | null |
859859+ +---------+--------------+
860860+ | 23 | undefined |
861861+ +---------+--------------+
862862+ | 24..31 | (reserved) |
863863+ +---------+--------------+
864864+ | 32..255 | (unassigned) |
865865+ +---------+--------------+
866866+867867+ Table 4: Simple Values
868868+869869+ An encoder MUST NOT issue two-byte sequences that start with 0xf8
870870+ (major type 7, additional information 24) and continue with a byte
871871+ less than 0x20 (32 decimal). Such sequences are not well-formed.
872872+ (This implies that an encoder cannot encode "false", "true", "null",
873873+ or "undefined" in two-byte sequences and that only the one-byte
874874+ variants of these are well-formed; more generally speaking, each
875875+ simple value only has a single representation variant.)
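
A minimal Python sketch of the one-representation rule for simple values follows. The function names `encode_simple` and `is_well_formed_simple` are assumptions made for this illustration.

```python
def encode_simple(value):
    """Encode a simple value in its single permitted representation."""
    if 0 <= value <= 23:
        return bytes([0xE0 | value])   # major type 7, value in the low 5 bits
    if 32 <= value <= 255:
        return bytes([0xF8, value])    # major type 7, additional information 24
    raise ValueError("simple values 24..31 are reserved; "
                     "values outside 0..255 do not exist")


def is_well_formed_simple(encoded):
    """Check a one-byte or two-byte encoding of a simple value."""
    if len(encoded) == 1:
        return 0xE0 <= encoded[0] <= 0xF7
    if len(encoded) == 2 and encoded[0] == 0xF8:
        return encoded[1] >= 0x20      # 0xf8 followed by < 0x20 is rejected
    return False
```

For instance, `encode_simple(20)` yields the one-byte form 0xf4 for "false", while the two-byte sequence 0xf814 is reported as not well-formed.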
876876+877877+ The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit
878878+ IEEE 754 binary floating-point values [IEEE754]. These floating-
879879+ point values are encoded in the additional bytes of the appropriate
880880+ size. (See Appendix D for some information about 16-bit floating-
881881+ point numbers.)
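
As an illustration, the three floating-point widths can be produced with Python's standard `struct` module, whose `e`, `f`, and `d` format characters correspond to IEEE 754 binary16, binary32, and binary64. The helper names `encode_float` and `decode_float` are hypothetical, and the sketch does not attempt to choose the shortest width automatically.

```python
import struct

# Width in bits -> (initial byte, struct format in network byte order).
_FLOAT_FORMATS = {
    16: (0xF9, ">e"),   # additional information 25
    32: (0xFA, ">f"),   # additional information 26
    64: (0xFB, ">d"),   # additional information 27
}


def encode_float(value, bits):
    """Encode value as a CBOR float of the requested width."""
    initial, fmt = _FLOAT_FORMATS[bits]
    return bytes([initial]) + struct.pack(fmt, value)


def decode_float(encoded):
    """Decode a CBOR float of any of the three widths."""
    for initial, fmt in _FLOAT_FORMATS.values():
        if encoded[0] == initial:
            size = struct.calcsize(fmt)
            (value,) = struct.unpack(fmt, encoded[1:1 + size])
            return value
    raise ValueError("not a major type 7 floating-point item")
```

The value 1.5 is exactly representable at all three widths, so `encode_float(1.5, 16)` gives 0xf93e00 and decoding any of the three encodings returns 1.5.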
882882+883883+3.4. Tagging of Items
884884+885885+ In CBOR, a data item can be enclosed by a tag to give it some
886886+ additional semantics, as uniquely identified by a _tag number_. The
887887+ tag is major type 6, its argument (Section 3) indicates the tag
888888+ number, and it contains a single enclosed data item, the _tag
889889+ content_. (If a tag requires further structure to its content, this
890890+ structure is provided by the enclosed data item.) We use the term
891891+ _tag_ for the entire data item consisting of both a tag number and
892892+ the tag content: the tag content is the data item that is being
893893+ tagged.
894894+895895+ For example, assume that a byte string of length 12 is marked with a
896896+ tag of number 2 to indicate it is an unsigned _bignum_
897897+ (Section 3.4.3). The encoded data item would start with a byte
898898+ 0b110_00010 (major type 6, additional information 2 for the tag
899899+ number) followed by the encoded tag content: 0b010_01100 (major type
900900+ 2, additional information 12 for the length) followed by the 12 bytes
901901+ of the bignum.
902902+903903+ In the extended generic data model, a tag number's definition
904904+ describes the additional semantics conveyed with the tag number.
905905+ These semantics may include equivalence of some tagged data items
906906+ with other data items, including some that can be represented in the
907907+ basic generic data model. For instance, 0xc24101, a bignum the tag
908908+ content of which is the byte string with the single byte 0x01, is
909909+ equivalent to an integer 1, which could also be encoded as 0x01,
910910+ 0x1801, or 0x190001. The tag definition may specify a preferred
911911+ serialization (Section 4.1) that is recommended for generic encoders;
912912+ this may prefer basic generic data model representations over ones
913913+ that employ a tag.
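
The equivalence described above can be made concrete with a short Python sketch that maps several encodings to the same number. It assumes a hypothetical helper `decode_integer` that understands only unsigned integers and tag number 2 with a definite-length byte string as content.

```python
def _argument(data, pos, info):
    """Return (argument, new_pos) for additional information 0..27."""
    if info < 24:
        return info, pos
    if info > 27:
        raise ValueError("unsupported additional information")
    size = 1 << (info - 24)
    return int.from_bytes(data[pos:pos + size], "big"), pos + size


def decode_integer(encoded):
    """Decode an unsigned integer or an unsigned bignum (tag number 2)."""
    initial = encoded[0]
    major, info = initial >> 5, initial & 0x1F
    if major == 0:
        value, _ = _argument(encoded, 1, info)
        return value
    if initial == 0xC2:                       # major type 6, tag number 2
        content_major = encoded[1] >> 5
        content_info = encoded[1] & 0x1F
        if content_major != 2:
            raise ValueError("tag 2 content must be a byte string")
        length, pos = _argument(encoded, 2, content_info)
        return int.from_bytes(encoded[pos:pos + length], "big")
    raise ValueError("item not handled by this sketch")
```

With this helper, 0x01, 0x1801, 0x190001, and 0xc24101 all decode to the integer 1, which is the sense in which the extended generic data model treats them as equivalent.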
914914+915915+ The tag definition usually defines which nested data items are valid
916916+ for such tags. Tag definitions may restrict their content to a very
917917+ specific syntactic structure, as the tags defined in this document
918918+ do, or they may define their content more semantically. An example
919919+ of the latter is how tags 40 and 1040 accept multiple ways to
920920+ represent arrays [RFC8746].
921921+922922+ As a matter of convention, many tags do not accept "null" or
923923+ "undefined" values as tag content; instead, the expectation is that a
924924+ "null" or "undefined" value can be used in place of the entire tag;
925925+ Section 3.4.2 provides some further considerations for one specific
926926+ tag about the handling of this convention in application protocols
927927+ and in mapping to platform types.
928928+929929+ Decoders do not need to understand tags of every tag number, and tags
930930+ may be of little value in applications where the implementation
931931+ creating a particular CBOR data item and the implementation decoding
932932+ that stream know the semantic meaning of each item in the data flow.
933933+ The primary purpose of tags in this specification is to define common
934934+ data types such as dates. A secondary purpose is to provide
935935+ conversion hints when it is foreseen that the CBOR data item needs to
936936+ be translated into a different format, requiring hints about the
937937+ content of items. Understanding the semantics of tags is optional
938938+ for a decoder; it can simply present both the tag number and the tag
939939+ content to the application, without interpreting the additional
940940+ semantics of the tag.
941941+942942+ A tag applies semantics to the data item it encloses. Tags can nest:
943943+ if tag A encloses tag B, which encloses data item C, tag A applies to
944944+ the result of applying tag B on data item C.
945945+946946+ IANA maintains a registry of tag numbers as described in Section 9.2.
947947+ Table 5 provides a list of tag numbers that were defined in [RFC7049]
948948+ with definitions in the rest of this section. (Tag number 35 was
949949+ also defined in [RFC7049]; a discussion of this tag number follows in
950950+ Section 3.4.5.3.) Note that many other tag numbers have been defined
951951+ since the publication of [RFC7049]; see the registry described in
952952+ Section 9.2 for the complete list.
953953+954954+ +=======+=============+==================================+
955955+ | Tag | Data Item | Semantics |
956956+ +=======+=============+==================================+
957957+ | 0 | text string | Standard date/time string; see |
958958+ | | | Section 3.4.1 |
959959+ +-------+-------------+----------------------------------+
960960+ | 1 | integer or | Epoch-based date/time; see |
961961+ | | float | Section 3.4.2 |
962962+ +-------+-------------+----------------------------------+
963963+ | 2 | byte string | Unsigned bignum; see |
964964+ | | | Section 3.4.3 |
965965+ +-------+-------------+----------------------------------+
966966+ | 3 | byte string | Negative bignum; see |
967967+ | | | Section 3.4.3 |
968968+ +-------+-------------+----------------------------------+
969969+ | 4 | array | Decimal fraction; see |
970970+ | | | Section 3.4.4 |
971971+ +-------+-------------+----------------------------------+
972972+ | 5 | array | Bigfloat; see Section 3.4.4 |
973973+ +-------+-------------+----------------------------------+
974974+ | 21 | (any) | Expected conversion to base64url |
975975+ | | | encoding; see Section 3.4.5.2 |
976976+ +-------+-------------+----------------------------------+
977977+ | 22 | (any) | Expected conversion to base64 |
978978+ | | | encoding; see Section 3.4.5.2 |
979979+ +-------+-------------+----------------------------------+
980980+ | 23 | (any) | Expected conversion to base16 |
981981+ | | | encoding; see Section 3.4.5.2 |
982982+ +-------+-------------+----------------------------------+
983983+ | 24 | byte string | Encoded CBOR data item; see |
984984+ | | | Section 3.4.5.1 |
985985+ +-------+-------------+----------------------------------+
986986+ | 32 | text string | URI; see Section 3.4.5.3 |
987987+ +-------+-------------+----------------------------------+
988988+ | 33 | text string | base64url; see Section 3.4.5.3 |
989989+ +-------+-------------+----------------------------------+
990990+ | 34 | text string | base64; see Section 3.4.5.3 |
991991+ +-------+-------------+----------------------------------+
992992+ | 36 | text string | MIME message; see |
993993+ | | | Section 3.4.5.3 |
994994+ +-------+-------------+----------------------------------+
995995+ | 55799 | (any) | Self-described CBOR; see |
996996+ | | | Section 3.4.6 |
997997+ +-------+-------------+----------------------------------+
998998+999999+ Table 5: Tag Numbers Defined in RFC 7049
10001000+10011001+ Conceptually, tags are interpreted in the generic data model, not at
10021002+ (de-)serialization time. A small number of tags (at this time, tag
10031003+ number 25 and tag number 29 [IANA.cbor-tags]) have been registered
10041004+ with semantics that may require processing at (de-)serialization
10051005+ time: the decoder needs to be aware of, and the encoder needs to be
10061006+ in control of, the exact sequence in which data items are encoded
10071007+ into the CBOR data item. This means these tags cannot be implemented
10081008+ on top of an arbitrary generic CBOR encoder/decoder (which might not
10091009+ reflect the serialization order for entries in a map at the data
10101010+ model level and vice versa); their implementation therefore typically
10111011+ needs to be integrated into the generic encoder/decoder. The
10121012+ definition of new tags with this property is NOT RECOMMENDED.
10131013+10141014+ IANA allocated tag numbers 65535, 4294967295, and
10151015+ 18446744073709551615 (binary all-ones in 16-bit, 32-bit, and 64-bit).
10161016+ These can be used as a convenience for implementers who want a
10171017+ single-integer data structure to indicate either the presence of a
10181018+ specific tag or absence of a tag. That allocation is described in
10191019+ Section 10 of [CBOR-TAGS]. These tags are not intended to occur in
10201020+ actual CBOR data items; implementations MAY flag such an occurrence
10211021+ as an error.
10221022+10231023+ Protocols can extend the generic data model (Section 2) with data
10241024+ items representing points in time by using tag numbers 0 and 1, with
10251025+ arbitrarily sized integers by using tag numbers 2 and 3, and with
10261026+ floating-point values of arbitrary size and precision by using tag
10271027+ numbers 4 and 5.
10281028+10291029+3.4.1. Standard Date/Time String
10301030+10311031+ Tag number 0 contains a text string in the standard format described
10321032+ by the "date-time" production in [RFC3339], as refined by Section 3.3
10331033+ of [RFC4287], representing the point in time described there. A
10341034+ nested item of another type or a text string that doesn't match the
10351035+ format described in [RFC4287] is invalid.
10361036+10371037+3.4.2. Epoch-Based Date/Time
10381038+10391039+ Tag number 1 contains a numerical value counting the number of
10401040+ seconds from 1970-01-01T00:00Z in UTC time to the represented point
10411041+ in civil time.
10421042+10431043+ The tag content MUST be an unsigned or negative integer (major types
10441044+ 0 and 1) or a floating-point number (major type 7 with additional
10451045+ information 25, 26, or 27). Other contained types are invalid.
10461046+10471047+ Nonnegative values (major type 0 and nonnegative floating-point
10481048+ numbers) stand for time values on or after 1970-01-01T00:00Z UTC and
10491049+ are interpreted according to POSIX [TIME_T]. (POSIX time is also
10501050+ known as "UNIX Epoch time".) Leap seconds are handled specially by
10511051+ POSIX time, and this results in a 1-second discontinuity several
10521052+ times per decade. Note that applications that require the expression
10531053+ of times beyond early 2106 cannot leave out support of 64-bit
10541054+ integers for the tag content.
10551055+10561056+ Negative values (major type 1 and negative floating-point numbers)
10571057+ are interpreted as determined by the application requirements as
10581058+ there is no universal standard for UTC count-of-seconds time before
10591059+ 1970-01-01T00:00Z (this is particularly true for points in time that
10601060+ precede discontinuities in national calendars). The same applies to
10611061+ non-finite values.
10621062+10631063+ To indicate fractional seconds, floating-point values can be used
10641064+ within tag number 1 instead of integer values. Note that this
10651065+ generally requires binary64 support, as binary16 and binary32 provide
10661066+ nonzero fractions of seconds only for a short period of time around
10671067+ early 1970. An application that requires tag number 1 support may
10681068+ restrict the tag content to be an integer (or a floating-point value)
10691069+ only.
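
The following Python sketch illustrates tag number 1 with integer and binary64 content, and checks the two range remarks made above. The names `encode_head`, `encode_epoch`, and `first_beyond_32_bits` are assumptions for this example; floats are always written as binary64 here, which is a simplification rather than a requirement.

```python
import struct
from datetime import datetime, timedelta, timezone


def encode_head(major, argument):
    """Encode a major type and argument in the shortest head that fits."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_epoch(seconds):
    """Wrap an int or float count of seconds in tag number 1 (0xc1)."""
    if isinstance(seconds, bool) or not isinstance(seconds, (int, float)):
        raise TypeError("tag 1 content must be an integer or a float")
    if isinstance(seconds, int):
        if seconds >= 0:
            item = encode_head(0, seconds)
        else:
            item = encode_head(1, -1 - seconds)
    else:
        item = b"\xfb" + struct.pack(">d", seconds)   # binary64
    return b"\xc1" + item


# First point in time whose count of seconds needs more than 32 bits.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
first_beyond_32_bits = EPOCH + timedelta(seconds=2 ** 32)

# A binary32 value near a present-day timestamp cannot hold a half second.
as_binary32 = struct.unpack(">f", struct.pack(">f", 1363896240.5))[0]
```

Here `first_beyond_32_bits` falls on 2106-02-07, and `as_binary32` comes back as a whole number of seconds, which shows why fractional seconds generally call for binary64.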
10701070+10711071+ Note that platform types for date/time may include "null" or
10721072+ "undefined" values, which may also be desirable at an application
10731073+ protocol level. While emitting tag number 1 values with non-finite
10741074+ tag content values (e.g., with NaN for undefined date/time values or
10751075+ with Infinity for an expiry date that is not set) may seem an obvious
10761076+ way to handle this, using untagged "null" or "undefined" avoids the
10771077+ use of non-finites and results in a shorter encoding. Application
10781078+ protocol designers are encouraged to consider these cases and include
10791079+ clear guidelines for handling them.
10801080+10811081+3.4.3. Bignums
10821082+10831083+ Protocols using tag numbers 2 and 3 extend the generic data model
10841084+ (Section 2) with "bignums" representing arbitrarily sized integers.
10851085+ In the basic generic data model, bignum values are not equal to
10861086+ integers from the same model, but the extended generic data model
10871087+ created by this tag definition defines equivalence based on numeric
10881088+ value, and preferred serialization (Section 4.1) never makes use of
10891089+ bignums that also can be expressed as basic integers (see below).
10901090+10911091+ Bignums are encoded as a byte string data item, which is interpreted
10921092+ as an unsigned integer n in network byte order. Contained items of
10931093+ other types are invalid. For tag number 2, the value of the bignum
10941094+ is n. For tag number 3, the value of the bignum is -1 - n. The
10951095+ preferred serialization of the byte string is to leave out any
10961096+ leading zeroes (note that this means the preferred serialization for
10971097+ n = 0 is the empty byte string, but see below). Decoders that
10981098+ understand these tags MUST be able to decode bignums that do have
10991099+ leading zeroes. The preferred serialization of an integer that can
11001100+ be represented using major type 0 or 1 is to encode it this way
11011101+ instead of as a bignum (which means that the empty string never
11021102+ occurs in a bignum when using preferred serialization). Note that
11031103+ this means the non-preferred choice of a bignum representation
11041104+ instead of a basic integer for encoding a number is not intended to
11051105+ have application semantics (just as the choice of a longer basic
11061106+ integer representation than needed, such as 0x1800 for 0x00, does
11071107+ not).
11081108+11091109+ For example, the number 18446744073709551616 (2^(64)) is represented
11101110+ as 0b110_00010 (major type 6, tag number 2), followed by 0b010_01001
11111111+ (major type 2, length 9), followed by 0x010000000000000000 (one byte
11121112+ 0x01 and eight bytes 0x00). In hexadecimal:
11131113+11141114+ C2 -- Tag 2
11151115+ 49 -- Byte string of length 9
11161116+ 010000000000000000 -- Bytes content
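
A minimal Python sketch of the preferred serialization for integers of any size is shown below. The helper names `encode_head`, `encode_integer`, and `decode_bignum` are hypothetical; the decoder side accepts leading zeroes, as required of decoders that understand these tags.

```python
def encode_head(major, argument):
    """Encode a major type and argument in the shortest head that fits."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_integer(value):
    """Use major type 0 or 1 when possible, otherwise tag number 2 or 3."""
    n = value if value >= 0 else -1 - value
    if n < 1 << 64:
        return encode_head(0 if value >= 0 else 1, n)
    # Bignum: network byte order, no leading zero bytes.
    content = n.to_bytes((n.bit_length() + 7) // 8, "big")
    tag = 0xC2 if value >= 0 else 0xC3
    return bytes([tag]) + encode_head(2, len(content)) + content


def decode_bignum(tag_number, content):
    """Return the value of a bignum given its tag number and byte string."""
    if tag_number not in (2, 3):
        raise ValueError("not a bignum tag number")
    n = int.from_bytes(content, "big")   # leading zeroes are harmless
    return n if tag_number == 2 else -1 - n
```

Because the basic integer types are tried first, the empty byte string never appears as bignum content in the output of `encode_integer`.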
11171117+11181118+3.4.4. Decimal Fractions and Bigfloats
11191119+11201120+ Protocols using tag number 4 extend the generic data model with data
11211121+ items representing arbitrary-length decimal fractions of the form
11221122+ m*(10^(e)). Protocols using tag number 5 extend the generic data
11231123+ model with data items representing arbitrary-length binary fractions
11241124+ of the form m*(2^(e)). As with bignums, values of different types
11251125+ are not equal in the generic data model.
11261126+11271127+ Decimal fractions combine an integer mantissa with a base-10 scaling
11281128+ factor. They are most useful if an application needs the exact
11291129+ representation of a decimal fraction such as 1.1 because there is no
11301130+ exact representation for many decimal fractions in binary floating-
11311131+ point representations.
11321132+11331133+ "Bigfloats" combine an integer mantissa with a base-2 scaling factor.
11341134+ They are binary floating-point values that can exceed the range or
11351135+ the precision of the three IEEE 754 formats supported by CBOR
11361136+ (Section 3.3). Bigfloats may also be used by constrained
11371137+ applications that need some basic binary floating-point capability
11381138+ without the need for supporting IEEE 754.
11391139+11401140+ A decimal fraction or a bigfloat is represented as a tagged array
11411141+ that contains exactly two integer numbers: an exponent e and a
11421142+ mantissa m. Decimal fractions (tag number 4) use base-10 exponents;
11431143+ the value of a decimal fraction data item is m*(10^(e)). Bigfloats
11441144+ (tag number 5) use base-2 exponents; the value of a bigfloat data
11451145+ item is m*(2^(e)). The exponent e MUST be represented in an integer
11461146+ of major type 0 or 1, while the mantissa can also be a bignum
11471147+ (Section 3.4.3). Contained items with other structures are invalid.
11481148+11491149+ An example of a decimal fraction is the representation of the number
11501150+ 273.15 as 0b110_00100 (major type 6 for tag, additional information 4
11511151+ for the tag number), followed by 0b100_00010 (major type 4 for the
11521152+ array, additional information 2 for the length of the array),
11531153+ followed by 0b001_00001 (major type 1 for the first integer,
11541154+ additional information 1 for the value of -2), followed by
11551155+ 0b000_11001 (major type 0 for the second integer, additional
11561156+ information 25 for a two-byte value), followed by 0b0110101010110011
11571157+ (27315 in two bytes). In hexadecimal:
11581158+11591159+ C4 -- Tag 4
11601160+ 82 -- Array of length 2
11611161+ 21 -- -2
11621162+ 19 6ab3 -- 27315
11631163+11641164+ An example of a bigfloat is the representation of the number 1.5 as
11651165+ 0b110_00101 (major type 6 for tag, additional information 5 for the
11661166+ tag number), followed by 0b100_00010 (major type 4 for the array,
11671167+ additional information 2 for the length of the array), followed by
11681168+ 0b001_00000 (major type 1 for the first integer, additional
11691169+ information 0 for the value of -1), followed by 0b000_00011 (major
11701170+ type 0 for the second integer, additional information 3 for the value
11711171+ of 3). In hexadecimal:
11721172+11731173+ C5 -- Tag 5
11741174+ 82 -- Array of length 2
11751175+ 20 -- -1
11761176+ 03 -- 3
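
Both examples can be reproduced with a short Python sketch. The names `encode_int`, `encode_fraction`, and `fraction_value` are illustrative assumptions; the sketch limits the mantissa to the 64-bit basic integer range and uses `fractions.Fraction` to compute exact values.

```python
from fractions import Fraction


def encode_int(value):
    """Encode an integer of major type 0 or 1 (64-bit range only)."""
    major, n = (0, value) if value >= 0 else (1, -1 - value)
    if n < 24:
        return bytes([major << 5 | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([major << 5 | info]) + n.to_bytes(size, "big")
    raise ValueError("a bignum mantissa is outside this sketch")


def encode_fraction(tag_number, exponent, mantissa):
    """Encode tag 4 (decimal fraction) or tag 5 (bigfloat)."""
    if tag_number not in (4, 5):
        raise ValueError("expected tag number 4 or 5")
    head = bytes([0xC0 | tag_number, 0x82])    # tag, then array of length 2
    return head + encode_int(exponent) + encode_int(mantissa)


def fraction_value(tag_number, exponent, mantissa):
    """Return the exact value m*(10^e) or m*(2^e) as a Fraction."""
    base = 10 if tag_number == 4 else 2
    return mantissa * Fraction(base) ** exponent
```

For example, `encode_fraction(4, -2, 27315)` yields 0xc48221196ab3 and `fraction_value(4, -2, 27315)` is exactly 27315/100, with no binary rounding involved.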
11771177+11781178+ Decimal fractions and bigfloats provide no representation of
11791179+ Infinity, -Infinity, or NaN; if these are needed in place of a
11801180+ decimal fraction or bigfloat, the IEEE 754 half-precision
11811181+ representations from Section 3.3 can be used.
11821182+11831183+3.4.5. Content Hints
11841184+11851185+ The tags in this section are for content hints that might be used by
11861186+ generic CBOR processors. These content hints do not extend the
11871187+ generic data model.
11881188+11891189+3.4.5.1. Encoded CBOR Data Item
11901190+11911191+ Sometimes it is beneficial to carry an embedded CBOR data item that
11921192+ is not meant to be decoded immediately at the time the enclosing data
11931193+ item is being decoded. Tag number 24 (CBOR data item) can be used to
11941194+ tag the embedded byte string as a single data item encoded in CBOR
11951195+ format. Contained items that aren't byte strings are invalid. A
11961196+ contained byte string is valid if it encodes a well-formed CBOR data
11971197+ item; validity checking of the decoded CBOR item is not required for
11981198+ tag validity (but could be offered by a generic decoder as a special
11991199+ option).
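
A sketch of wrapping and unwrapping with tag number 24 follows, in Python. `wrap_encoded_cbor` and `unwrap_encoded_cbor` are hypothetical names, and the unwrapping side handles only a definite-length byte string as tag content. Note that unwrapping returns the embedded bytes untouched, leaving the decision of when to decode them to the caller.

```python
def encode_head(major, argument):
    """Encode a major type and argument in the shortest head that fits."""
    if argument < 24:
        return bytes([major << 5 | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([major << 5 | info]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def wrap_encoded_cbor(encoded_item):
    """Carry an already-encoded data item as a tag 24 byte string."""
    # 0xd8 0x18 is major type 6 with the tag number 24 in the following byte.
    return b"\xd8\x18" + encode_head(2, len(encoded_item)) + encoded_item


def unwrap_encoded_cbor(data):
    """Return the embedded bytes of a tag 24 item without decoding them."""
    if data[:2] != b"\xd8\x18":
        raise ValueError("not tag number 24")
    major, info = data[2] >> 5, data[2] & 0x1F
    if major != 2 or info > 27:
        raise ValueError("sketch expects a definite-length byte string")
    if info < 24:
        length, pos = info, 3
    else:
        size = 1 << (info - 24)
        length, pos = int.from_bytes(data[3:3 + size], "big"), 3 + size
    if pos + length > len(data):
        raise ValueError("truncated byte string")
    return data[pos:pos + length]
```

Wrapping the five-byte encoding of the text string "IETF" (0x6449455446) this way gives 0xd818456449455446.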
12001200+12011201+3.4.5.2. Expected Later Encoding for CBOR-to-JSON Converters
12021202+12031203+ Tag numbers 21 to 23 indicate that a byte string might require a
12041204+ specific encoding when interoperating with a text-based
12051205+ representation. These tags are useful when an encoder knows that the
12061206+ byte string data it is writing is likely to be later converted to a
12071207+ particular JSON-based usage. That usage specifies that some strings
12081208+ are encoded as base64, base64url, and so on. The encoder uses byte
12091209+ strings instead of doing the encoding itself to reduce the message
12101210+ size, to reduce the code size of the encoder, or both. The encoder
12111211+ does not know whether or not the converter will be generic, and
12121212+ therefore wants to say what it believes is the proper way to convert
12131213+ binary strings to JSON.
12141214+12151215+ The data item tagged can be a byte string or any other data item. In
12161216+ the latter case, the tag applies to all of the byte string data items
12171217+ contained in the data item, except for those contained in a nested
12181218+ data item tagged with an expected conversion.
12191219+12201220+ These three tag numbers suggest conversions to three of the base data
12211221+ encodings defined in [RFC4648]. Tag number 21 suggests conversion to
12221222+ base64url encoding (Section 5 of [RFC4648]) where padding is not used
12231223+ (see Section 3.2 of [RFC4648]); that is, all trailing equals signs
12241224+ ("=") are removed from the encoded string. Tag number 22 suggests
12251225+ conversion to classical base64 encoding (Section 4 of [RFC4648]) with
12261226+ padding as defined in RFC 4648. For both base64url and base64,
12271227+ padding bits are set to zero (see Section 3.5 of [RFC4648]), and the
12281228+ conversion to alternate encoding is performed on the contents of the
12291229+ byte string (that is, without adding any line breaks, whitespace, or
12301230+ other additional characters). Tag number 23 suggests conversion to
12311231+ base16 (hex) encoding with uppercase alphabetics (see Section 8 of
12321232+ [RFC4648]). Note that, for all three tag numbers, the encoding of
12331233+ the empty byte string is the empty text string.
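
What a converter might do with these hints can be sketched with Python's standard `base64` module. The function name `expected_text` is an assumption for this illustration; it covers only the conversion of a single byte string, not the traversal of nested data items.

```python
import base64


def expected_text(tag_number, data):
    """Convert a byte string to text as suggested by tag 21, 22, or 23."""
    if tag_number == 21:
        # base64url without padding: strip trailing equals signs
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
    if tag_number == 22:
        # classical base64 with padding
        return base64.b64encode(data).decode("ascii")
    if tag_number == 23:
        # base16 with uppercase alphabetics
        return base64.b16encode(data).decode("ascii")
    raise ValueError("expected tag number 21, 22, or 23")
```

For the two bytes 0xfbff, the three tag numbers give "-_8", "+/8=", and "FBFF" respectively, and the empty byte string maps to the empty text string in all three cases.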
12341234+12351235+3.4.5.3. Encoded Text
12361236+12371237+ Some text strings hold data that have formats widely used on the
12381238+ Internet, and sometimes those formats can be validated and presented
12391239+ to the application in appropriate form by the decoder. There are
12401240+ tags for some of these formats.
12411241+12421242+ * Tag number 32 is for URIs, as defined in [RFC3986]. If the text
12431243+ string doesn't match the "URI-reference" production, the string is
12441244+ invalid.
12451245+12461246+ * Tag numbers 33 and 34 are for base64url- and base64-encoded text
12471247+ strings, respectively, as defined in [RFC4648]. If any of the
12481248+ following apply:
12491249+12501250+ - the encoded text string contains non-alphabet characters or
12511251+ only 1 alphabet character in the last block of 4 (where
12521252+ alphabet is defined by Section 5 of [RFC4648] for tag number 33
12531253+ and Section 4 of [RFC4648] for tag number 34), or
12541254+12551255+ - the padding bits in a 2- or 3-character block are not 0, or
12561256+12571257+ - the base64 encoding has the wrong number of padding characters,
12581258+ or
12591259+12601260+ - the base64url encoding has padding characters,
12611261+12621262+ the string is invalid.
12631263+12641264+ * Tag number 36 is for MIME messages (including all headers), as
12651265+ defined in [RFC2045]. A text string that isn't a valid MIME
12661266+ message is invalid. (For this tag, validity checking may be
12671267+ particularly onerous for a generic decoder and might therefore not
12681268+ be offered. Note that many MIME messages are general binary data
12691269+ and therefore cannot be represented in a text string;
12701270+ [IANA.cbor-tags] lists a registration for tag number 257 that is
12711271+ similar to tag number 36 but uses a byte string as its tag
12721272+ content.)
12731273+12741274+ Note that tag numbers 33 and 34 differ from 21 and 22 in that the
12751275+ data is transported in base-encoded form for the former and in raw
12761276+ byte string form for the latter.
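The validity conditions listed above for tag numbers 33 and 34 can be
sketched as follows. This is an illustration only; the function name
base_encoded_text_is_valid and the constant names are hypothetical, and
a real decoder may structure the checks differently.

```python
import string

_COMMON = string.ascii_uppercase + string.ascii_lowercase + string.digits
BASE64URL_ALPHABET = _COMMON + "-_"  # Section 5 of RFC 4648 (tag number 33)
BASE64_ALPHABET = _COMMON + "+/"     # Section 4 of RFC 4648 (tag number 34)


def base_encoded_text_is_valid(tag_number: int, text: str) -> bool:
    """Apply the four invalidity conditions for tag numbers 33 and 34."""
    if tag_number == 33:
        alphabet, padded = BASE64URL_ALPHABET, False
    elif tag_number == 34:
        alphabet, padded = BASE64_ALPHABET, True
    else:
        raise ValueError("only tag numbers 33 and 34 are handled")

    body = text.rstrip("=")
    padding = len(text) - len(body)

    # Non-alphabet characters (this also catches "=" before the end).
    if any(ch not in alphabet for ch in body):
        return False
    # Only 1 alphabet character in the last block of 4.
    last_block = len(body) % 4
    if last_block == 1:
        return False
    # base64 needs exactly the right padding; base64url needs none.
    if padding != ((4 - last_block) % 4 if padded else 0):
        return False
    # Padding bits: 4 unused bits in a 2-character block, 2 in a
    # 3-character block; they have to be zero.
    if last_block == 2 and alphabet.index(body[-1]) & 0b1111:
        return False
    if last_block == 3 and alphabet.index(body[-1]) & 0b11:
        return False
    return True
```

For example, "AQI=" is acceptable under tag number 34 but not under tag
number 33, while "AQI" is acceptable under tag number 33 only.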
12771277+12781278+ [RFC7049] also defined a tag number 35 for regular expressions that
12791279+ are in Perl Compatible Regular Expressions (PCRE/PCRE2) form [PCRE]
12801280+ or in JavaScript regular expression syntax [ECMA262]. The state of
12811281+ the art in these regular expression specifications has since advanced
12821282+ and is continually advancing, so this specification does not attempt
12831283+ to update the references. Instead, this tag remains available (as
12841284+ registered in [RFC7049]) for applications that specify the particular
12851285+ regular expression variant they use out-of-band (possibly by limiting
12861286+ the usage to a defined common subset of both PCRE and ECMA262). As
12871287+ this specification clarifies tag validity beyond [RFC7049], we note
12881288+ that due to the open way the tag was defined in [RFC7049], any
12891289+ contained string value needs to be valid at the CBOR tag level (but
12901290+ then may not be "expected" at the application level).
12911291+12921292+3.4.6. Self-Described CBOR
12931293+12941294+ In many applications, it will be clear from the context that CBOR is
12951295+ being employed for encoding a data item. For instance, a specific
12961296+ protocol might specify the use of CBOR, or a media type is indicated
12971297+ that specifies its use. However, there may be applications where
12981298+ such context information is not available, such as when CBOR data is
12991299+ stored in a file that does not have disambiguating metadata. Here,
13001300+ it may help to have some distinguishing characteristics for the data
13011301+ itself.
13021302+13031303+ Tag number 55799 is defined for this purpose, specifically for use at
13041304+ the start of a stored encoded CBOR data item as specified by an
13051305+ application. It does not impart any special semantics on the data
13061306+ item that it encloses; that is, the semantics of the tag content
13071307+ enclosed in tag number 55799 is exactly identical to the semantics of
13081308+ the tag content itself.
13091309+13101310+ The serialization of this tag's head is 0xd9d9f7, which does not
13111311+ appear to be in use as a distinguishing mark for any frequently used
13121312+ file types. In particular, 0xd9d9f7 is not a valid start of a
13131313+ Unicode text in any Unicode encoding if it is followed by a valid
13141314+ CBOR data item.
13151315+13161316+ For instance, a decoder might be able to decode both CBOR and JSON.
13171317+ Such a decoder would need to mechanically distinguish the two
13181318+ formats. An easy way for an encoder to help the decoder would be to
13191319+ tag the entire CBOR item with tag number 55799, the serialization of
13201320+ which will never be found at the beginning of a JSON text.
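A minimal sketch of how an encoder might add the mark and how a reader
might test for it. The helper names are hypothetical; only the three
bytes 0xd9d9f7 come from this specification.

```python
SELF_DESCRIBED_HEAD = bytes.fromhex("d9d9f7")  # head of tag number 55799


def mark_self_described(encoded_item: bytes) -> bytes:
    """Wrap an already encoded CBOR data item in tag number 55799."""
    return SELF_DESCRIBED_HEAD + encoded_item


def starts_with_self_described_tag(data: bytes) -> bool:
    """Report whether the data begins with the head of tag number 55799."""
    return data.startswith(SELF_DESCRIBED_HEAD)
```

A positive result is only a hint that CBOR follows; the bytes after the
head still need to be decoded as a CBOR data item.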
13211321+13221322+4. Serialization Considerations
13231323+13241324+4.1. Preferred Serialization
13251325+13261326+ For some values at the data model level, CBOR provides multiple
13271327+ serializations. For many applications, it is desirable that an
13281328+ encoder always chooses a preferred serialization (preferred
13291329+ encoding); however, the present specification does not put the burden
13301330+ of enforcing this preference on either the encoder or decoder.
13311331+13321332+ Some constrained decoders may be limited in their ability to decode
13331333+ non-preferred serializations: for example, if only integers below
13341334+ 1_000_000_000 (one billion) are expected in an application, the
13351335+ decoder may leave out the code that would be needed to decode 64-bit
13361336+ arguments in integers. An encoder that always uses preferred
13371337+ serialization ("preferred encoder") interoperates with this decoder
13381338+ for the numbers that can occur in this application. Generally
13391339+ speaking, a preferred encoder is more universally interoperable (and
13401340+ also less wasteful) than one that, say, always uses 64-bit integers.
13411341+13421342+ Similarly, a constrained encoder may be limited in the variety of
13431343+ representation variants it supports such that it does not emit
13441344+ preferred serializations ("variant encoder"). For instance, a
13451345+ constrained encoder could be designed to always use the 32-bit
13461346+ variant for an integer that it encodes even if a short representation
13471347+ is available (assuming that there is no application need for integers
13481348+ that can only be represented with the 64-bit variant). A decoder
13491349+ that does not rely on receiving only preferred serializations
13501350+ ("variation-tolerant decoder") can therefore be said to be more
13511351+ universally interoperable (it might very well optimize for the case
13521352+ of receiving preferred serializations, though). Full implementations
13531353+ of CBOR decoders are by definition variation tolerant; the
13541354+ distinction is only relevant if a constrained implementation of a
13551355+ CBOR decoder meets a variant encoder.
13561356+13571357+ The preferred serialization always uses the shortest form of
13581358+ representing the argument (Section 3); it also uses the shortest
13591359+ floating-point encoding that preserves the value being encoded.
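The shortest-argument rule can be sketched as below. The function names
encode_head and encode_int are hypothetical, and the sketch covers major
types 0 through 6 only (major type 7 uses the argument differently).

```python
import struct


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a head using the shortest form of the argument."""
    if not 0 <= major_type <= 6:
        raise ValueError("sketch covers major types 0 through 6 only")
    if not 0 <= argument < 2**64:
        raise ValueError("argument does not fit in 64 bits")
    initial = major_type << 5
    if argument < 24:
        return bytes([initial | argument])          # in the initial byte
    if argument < 2**8:
        return bytes([initial | 24, argument])      # one additional byte
    if argument < 2**16:
        return bytes([initial | 25]) + struct.pack(">H", argument)
    if argument < 2**32:
        return bytes([initial | 26]) + struct.pack(">I", argument)
    return bytes([initial | 27]) + struct.pack(">Q", argument)


def encode_int(value: int) -> bytes:
    """Encode an integer in the range -2**64..2**64-1."""
    if value >= 0:
        return encode_head(0, value)
    return encode_head(1, -1 - value)  # major type 1 carries -1 - n
```

With this sketch, encode_int(100) yields 0x1864 and encode_head(6, 55799)
yields 0xd9d9f7.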
13601360+13611361+ The preferred serialization for a floating-point value is the
13621362+ shortest floating-point encoding that preserves its value, e.g.,
13631363+ 0xf94580 for the number 5.5, and 0xfa45ad9c00 for the number 5555.5.
13641364+ For NaN values, a shorter encoding is preferred if zero-padding the
13651365+ shorter significand towards the right reconstitutes the original NaN
13661366+ value (for many applications, the single NaN encoding 0xf97e00 will
13671367+ suffice).
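A sketch of this selection using Python's struct module, whose "e", "f",
and "d" formats correspond to binary16, binary32, and binary64. The
function name encode_float is hypothetical, and mapping every NaN to
0xf97e00 is an assumption of this sketch that discards NaN payloads.

```python
import math
import struct


def encode_float(value: float) -> bytes:
    """Encode a float in the shortest form that preserves its value."""
    if math.isnan(value):
        # Assumption: the application needs only a single NaN encoding.
        return bytes.fromhex("f97e00")
    for initial, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this format
        # Keep the shorter form only if it round-trips to the same value.
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial]) + packed
    return b"\xfb" + struct.pack(">d", value)
```

This reproduces the examples above: 5.5 becomes 0xf94580 and 5555.5
becomes 0xfa45ad9c00.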
13681368+13691369+ Definite-length encoding is preferred whenever the length is known at
13701370+ the time the serialization of the item starts.
13711371+13721372+4.2. Deterministically Encoded CBOR
13731373+13741374+ Some protocols may want encoders to only emit CBOR in a particular
13751375+ deterministic format; those protocols might also have the decoders
13761376+ check that their input is in that deterministic format. Those
13771377+ protocols are free to define what they mean by a "deterministic
13781378+ format" and what encoders and decoders are expected to do. This
13791379+ section defines a set of restrictions that can serve as the base of
13801380+ such a deterministic format.
13811381+13821382+4.2.1. Core Deterministic Encoding Requirements
13831383+13841384+ A CBOR encoding satisfies the "core deterministic encoding
13851385+ requirements" if it satisfies the following restrictions:
13861386+13871387+ * Preferred serialization MUST be used. In particular, this means
13881388+ that arguments (see Section 3) for integers, lengths in major
13891389+ types 2 through 5, and tags MUST be as short as possible, for
13901390+ instance:
13911391+13921392+ - 0 to 23 and -1 to -24 MUST be expressed in the same byte as the
13931393+ major type;
13941394+13951395+ - 24 to 255 and -25 to -256 MUST be expressed only with an
13961396+ additional uint8_t;
13971397+13981398+ - 256 to 65535 and -257 to -65536 MUST be expressed only with an
13991399+ additional uint16_t;
14001400+14011401+ - 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed
14021402+ only with an additional uint32_t.
14031403+14041404+ Floating-point values also MUST use the shortest form that
14051405+ preserves the value, e.g., 1.5 is encoded as 0xf93e00 (binary16)
14061406+ and 1000000.5 as 0xfa49742408 (binary32). (One implementation of
14071407+ this is to have all floats start as a 64-bit float, then do a test
14081408+ conversion to a 32-bit float; if the result is the same numeric
14091409+ value, use the shorter form and repeat the process with a test
14101410+ conversion to a 16-bit float. This also works to select the 16-bit
14111411+ float for positive and negative Infinity.)
14121412+14131413+ * Indefinite-length items MUST NOT appear. They can be encoded as
14141414+ definite-length items instead.
14151415+14161416+ * The keys in every map MUST be sorted in the bytewise lexicographic
14171417+ order of their deterministic encodings. For example, the
14181418+ following keys are sorted correctly:
14191419+14201420+ 1. 10, encoded as 0x0a.
14211421+14221422+ 2. 100, encoded as 0x1864.
14231423+14241424+ 3. -1, encoded as 0x20.
14251425+14261426+ 4. "z", encoded as 0x617a.
14271427+14281428+ 5. "aa", encoded as 0x626161.
14291429+14301430+ 6. [100], encoded as 0x811864.
14311431+14321432+ 7. [-1], encoded as 0x8120.
14331433+14341434+ 8. false, encoded as 0xf4.
14351435+14361436+ | Implementation note: the self-delimiting nature of the CBOR
14371437+ | encoding means that there are no two well-formed CBOR encoded
14381438+ | data items where one is a prefix of the other. The bytewise
14391439+ | lexicographic comparison of deterministic encodings of
14401440+ | different map keys therefore always ends in a position where
14411441+ | the byte differs between the keys, before the end of a key is
14421442+ | reached.
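Because Python compares bytes objects bytewise and lexicographically,
the ordering above can be sketched with a plain sort over the encoded
keys. The function name sort_keys_core is hypothetical, and the keys are
assumed to be already deterministically encoded.

```python
def sort_keys_core(encoded_keys):
    """Sort encoded map keys in bytewise lexicographic order."""
    return sorted(encoded_keys)


# The example keys from the list above, deliberately shuffled.
EXAMPLE_KEYS = [
    bytes.fromhex(h)
    for h in ("f4", "8120", "626161", "0a", "811864", "617a", "20", "1864")
]

if __name__ == "__main__":
    for key in sort_keys_core(EXAMPLE_KEYS):
        print(key.hex())
```

In a full encoder, each key's value would be kept next to its encoded
key so that pairs are emitted in the sorted key order.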
14431443+14441444+4.2.2. Additional Deterministic Encoding Considerations
14451445+14461446+ CBOR tags present additional considerations for deterministic
14471447+ encoding. If a CBOR-based protocol were to provide the same
14481448+ semantics for the presence and absence of a specific tag (e.g., by
14491449+ allowing both tag 1 data items and raw numbers in a date/time
14501450+ position, treating the latter as if they were tagged), the
14511451+ deterministic format would not allow the presence of the tag, based
14521452+ on the "shortest form" principle. For example, a protocol might give
14531453+ encoders the choice of representing a URL as either a text string or,
14541454+ using Section 3.4.5.3, tag number 32 containing a text string. This
14551455+ protocol's deterministic encoding needs either to require that the
14561456+ tag is present or to require that it is absent; it cannot allow both.
14571457+14581458+ In a protocol that does require tags in certain places to obtain
14591459+ specific semantics, the tag needs to appear in the deterministic
14601460+ format as well. Deterministic encoding considerations also apply to
14611461+ the content of tags.
14621462+14631463+ If a protocol includes a field that can express integers with an
14641464+ absolute value of 2^(64) or larger using tag numbers 2 or 3
14651465+ (Section 3.4.3), the protocol's deterministic encoding needs to
14661466+ specify whether smaller integers are also expressed using these tags
14671467+ or using major types 0 and 1. Preferred serialization uses the
14681468+ latter choice, which is therefore recommended.
14691469+14701470+ Protocols that include floating-point values, whether represented
14711471+ using basic floating-point values (Section 3.3) or using tags (or
14721472+ both), may need to define extra requirements on their deterministic
14731473+ encodings, such as:
14741474+14751475+ * Although IEEE floating-point values can represent both positive
14761476+ and negative zero as distinct values, the application might not
14771477+ distinguish these and might decide to represent all zero values
14781478+ with a positive sign, disallowing negative zero. (The application
14791479+ may also want to restrict the precision of floating-point values
14801480+ in such a way that there is never a need to represent 64-bit -- or
14811481+ even 32-bit -- floating-point values.)
14821482+14831483+ * If a protocol includes a field that can express floating-point
14841484+ values, with a specific data model that declares integer and
14851485+ floating-point values to be interchangeable, the protocol's
14861486+ deterministic encoding needs to specify whether, for example, the
14871487+ value 1.0 is encoded as 0x01 (unsigned integer), 0xf93c00
14881488+ (binary16), 0xfa3f800000 (binary32), or 0xfb3ff0000000000000
14891489+ (binary64). Example rules for this are:
14901490+14911491+ 1. Encode integral values that fit in 64 bits as values from
14921492+ major types 0 and 1, and other values as the preferred
14931493+ (smallest of 16-, 32-, or 64-bit) floating-point
14941494+ representation that accurately represents the value,
14951495+14961496+ 2. Encode all values as the preferred floating-point
14971497+ representation that accurately represents the value, even for
14981498+ integral values, or
14991499+15001500+ 3. Encode all values as 64-bit floating-point representations.
15011501+15021502+ Rule 1 straddles the boundaries between integers and floating-
15031503+ point values, and Rule 3 does not use preferred serialization, so
15041504+ Rule 2 may be a good choice in many cases.
15051505+15061506+ * If NaN is an allowed value, and there is no intent to support NaN
15071507+ payloads or signaling NaNs, the protocol needs to pick a single
15081508+ representation, typically 0xf97e00. If that simple choice is not
15091509+ possible, specific attention will be needed for NaN handling.
15101510+15111511+ * Subnormal numbers (nonzero numbers with the lowest possible
15121512+ exponent of a given IEEE 754 number format) may be flushed to zero
15131513+ outputs or be treated as zero inputs in some floating-point
15141514+ implementations. A protocol's deterministic encoding may want to
15151515+ specifically accommodate such implementations while creating an
15161516+ onus on other implementations by excluding subnormal numbers from
15171517+ interchange, interchanging zero instead.
15181518+15191519+ * The same number can be represented by different decimal fractions,
15201520+ by different bigfloats, and by different forms under other tags
15211521+ that may be defined to express numeric values. Depending on the
15221522+ implementation, it may not always be practical to determine
15231523+ whether any of these forms (or forms in the basic generic data
15241524+ model) are equivalent. An application protocol that presents
15251525+ choices of this kind for the representation format of numbers
15261526+ needs to be explicit about how the formats for deterministic
15271527+ encoding are to be chosen.
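As an illustration of how a protocol might pin down such choices, the
sketch below applies Rule 1 from the list above together with a single
NaN and no negative zero. The profile and the function name
normalize_float are hypothetical; a real protocol would state its own
rules.

```python
import math


def normalize_float(value: float):
    """Classify a float under a hypothetical Rule 1 profile.

    Returns ("int", n) when the value is to be encoded with major type
    0 or 1, and ("float", x) when a floating-point encoding is used.
    """
    if math.isnan(value):
        return ("float", math.nan)  # one NaN; payloads are not kept
    if value == 0.0:
        return ("int", 0)           # negative zero is not distinguished
    if value.is_integer() and -2**64 <= value < 2**64:
        return ("int", int(value))  # integral and fits major type 0 or 1
    return ("float", value)         # includes the infinities
```

The "float" results would then be encoded in the shortest
floating-point form that preserves the value.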
15281528+15291529+4.2.3. Length-First Map Key Ordering
15301530+15311531+ The core deterministic encoding requirements (Section 4.2.1) sort map
15321532+ keys in a different order from the one suggested by Section 3.9 of
15331533+ [RFC7049] (called "Canonical CBOR" there). Protocols that need to be
15341534+ compatible with the order specified in [RFC7049] can instead be
15351535+ specified in terms of this specification's "length-first core
15361536+ deterministic encoding requirements":
15371537+15381538+ A CBOR encoding satisfies the "length-first core deterministic
15391539+ encoding requirements" if it satisfies the core deterministic
15401540+ encoding requirements except that the keys in every map MUST be
15411541+ sorted such that:
15421542+15431543+ 1. If two keys have different lengths, the shorter one sorts
15441544+ earlier;
15451545+15461546+ 2. If two keys have the same length, the one with the lower value in
15471547+ (bytewise) lexical order sorts earlier.
15481548+15491549+ For example, under the length-first core deterministic encoding
15501550+ requirements, the following keys are sorted correctly:
15511551+15521552+ 1. 10, encoded as 0x0a.
15531553+15541554+ 2. -1, encoded as 0x20.
15551555+15561556+ 3. false, encoded as 0xf4.
15571557+15581558+ 4. 100, encoded as 0x1864.
15591559+15601560+ 5. "z", encoded as 0x617a.
15611561+15621562+ 6. [-1], encoded as 0x8120.
15631563+15641564+ 7. "aa", encoded as 0x626161.
15651565+15661566+ 8. [100], encoded as 0x811864.
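The length-first ordering can be sketched by sorting on the pair
(length, encoded bytes). The function name sort_keys_length_first is
hypothetical, and the keys are assumed to be already encoded.

```python
def sort_keys_length_first(encoded_keys):
    """Sort encoded map keys shortest first, then bytewise."""
    return sorted(encoded_keys, key=lambda key: (len(key), key))


# The example keys from the list above, deliberately shuffled.
EXAMPLE_KEYS = [
    bytes.fromhex(h)
    for h in ("811864", "617a", "f4", "626161", "1864", "0a", "8120", "20")
]

if __name__ == "__main__":
    for key in sort_keys_length_first(EXAMPLE_KEYS):
        print(key.hex())
```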
15671567+15681568+ | Although [RFC7049] used the term "Canonical CBOR" for its form
15691569+ | of requirements on deterministic encoding, this document avoids
15701570+ | this term because "canonicalization" is often associated with
15711571+ | specific uses of deterministic encoding only. The terms are
15721572+ | essentially interchangeable, however, and the set of core
15731573+ | requirements in this document could also be called "Canonical
15741574+ | CBOR", while the length-first-ordered version of that could be
15751575+ | called "Old Canonical CBOR".
15761576+15771577+5. Creating CBOR-Based Protocols
15781578+15791579+ Data formats such as CBOR are often used in environments where there
15801580+ is no format negotiation. A specific design goal of CBOR is to not
15811581+ need any included or assumed schema: a decoder can take a CBOR item
15821582+ and decode it with no other knowledge.
15831583+15841584+ Of course, in real-world implementations, the encoder and the decoder
15851585+ will have a shared view of what should be in a CBOR data item. For
15861586+ example, an agreed-to format might be "the item is an array whose
15871587+ first value is a UTF-8 string, second value is an integer, and
15881588+ subsequent values are zero or more floating-point numbers" or "the
15891589+ item is a map that has byte strings for keys and contains a pair
15901590+ whose key is 0xab01".
15911591+15921592+ CBOR-based protocols MUST specify how their decoders handle invalid
15931593+ and other unexpected data. CBOR-based protocols MAY specify that
15941594+ they treat arbitrary valid data as unexpected. Encoders for CBOR-
15951595+ based protocols MUST produce only valid items, that is, the protocol
15961596+ cannot be designed to make use of invalid items. An encoder can be
15971597+ capable of encoding as many or as few types of values as is required
15981598+ by the protocol in which it is used; a decoder can be capable of
15991599+ understanding as many or as few types of values as is required by the
16001600+ protocols in which it is used. This lack of restrictions allows CBOR
16011601+ to be used in extremely constrained environments.
16021602+16031603+ The rest of this section discusses some considerations in creating
16041604+ CBOR-based protocols. With few exceptions, it is advisory only and
16051605+ explicitly excludes any language from BCP 14 [RFC2119] [RFC8174]
16061606+ other than words that could be interpreted as "MAY" in the sense of
16071607+ BCP 14. The exceptions aim at facilitating interoperability of CBOR-
16081608+ based protocols while making use of a wide variety of both generic
16091609+ and application-specific encoders and decoders.
16101610+16111611+5.1. CBOR in Streaming Applications
16121612+16131613+ In a streaming application, a data stream may be composed of a
16141614+ sequence of CBOR data items concatenated back-to-back. In such an
16151615+ environment, the decoder immediately begins decoding a new data item
16161616+ if data is found after the end of a previous data item.
16171617+16181618+ Not all of the bytes making up a data item may be immediately
16191619+ available to the decoder; some decoders will buffer additional data
16201620+ until a complete data item can be presented to the application.
16211621+ Other decoders can present partial information about a top-level data
16221622+ item to an application, such as the nested data items that could
16231623+ already be decoded, or even parts of a byte string that hasn't
16241624+ completely arrived yet. Such an application also MUST have a
16251625+ matching streaming security mechanism, where the desired protection
16261626+ is available for incremental data presented to the application.
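A buffering decoder of the first kind needs to find where each data item
ends. The sketch below does only that: it splits a buffer into complete
top-level items and returns the unfinished tail for later. All names are
hypothetical; the code checks basic well-formedness only, recurses on
nesting (so deeply nested input can exhaust the stack), and does not
interpret the items.

```python
def _read_head(buf: bytes, pos: int):
    """Return (major type, additional info, argument, next position)."""
    if pos >= len(buf):
        raise EOFError("head not yet available")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, info, pos
    if info <= 27:
        size = 1 << (info - 24)
        if pos + size > len(buf):
            raise EOFError("argument not yet available")
        return major, info, int.from_bytes(buf[pos:pos + size], "big"), pos + size
    if info == 31:
        return major, info, None, pos
    raise ValueError("reserved additional information value")


def _skip_bytes(buf: bytes, pos: int, length: int) -> int:
    if pos + length > len(buf):
        raise EOFError("string content not yet available")
    return pos + length


def skip_item(buf: bytes, pos: int = 0) -> int:
    """Return the position just after the data item starting at pos."""
    major, info, arg, pos = _read_head(buf, pos)
    if info == 31:  # indefinite length
        if major in (0, 1, 6, 7):
            raise ValueError("not well-formed (includes a stray break)")
        while True:
            if pos >= len(buf):
                raise EOFError("break not yet available")
            if buf[pos] == 0xFF:
                return pos + 1
            if major in (2, 3):  # chunks: definite length, same major type
                c_major, c_info, c_arg, c_pos = _read_head(buf, pos)
                if c_major != major or c_info == 31:
                    raise ValueError("bad chunk in indefinite-length string")
                pos = _skip_bytes(buf, c_pos, c_arg)
            elif major == 4:
                pos = skip_item(buf, pos)
            else:  # map: a key followed by a value
                pos = skip_item(buf, skip_item(buf, pos))
    if major == 7 and info == 24 and arg < 32:
        raise ValueError("two-byte simple value below 32")
    if major in (0, 1, 7):
        return pos
    if major in (2, 3):
        return _skip_bytes(buf, pos, arg)
    count = {4: arg, 5: 2 * arg, 6: 1}[major]  # nested items to skip
    for _ in range(count):
        pos = skip_item(buf, pos)
    return pos


def split_stream(buffer: bytes):
    """Return (list of complete encoded items, unconsumed remainder)."""
    items, pos = [], 0
    while pos < len(buffer):
        try:
            end = skip_item(buffer, pos)
        except EOFError:
            break  # keep the partial item until more data arrives
        items.append(buffer[pos:end])
        pos = end
    return items, buffer[pos:]
```

A caller would append newly received bytes to the returned remainder and
call split_stream again.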
16271627+16281628+ Note that some applications and protocols will not want to use
16291629+ indefinite-length encoding. Using indefinite-length encoding allows
16301630+ an encoder to not need to marshal all the data for counting, but it
16311631+ requires a decoder to allocate increasing amounts of memory while
16321632+ waiting for the end of the item. This might be fine for some
16331633+ applications but not others.
16341634+16351635+5.2. Generic Encoders and Decoders
16361636+16371637+ A generic CBOR decoder can decode all well-formed encoded CBOR data
16381638+ items and present the data items to an application. See Appendix C.
16391639+ (The diagnostic notation, Section 8, may be used to present well-
16401640+ formed CBOR values to humans.)
16411641+16421642+ Generic CBOR encoders provide an application interface that allows
16431643+ the application to specify any well-formed value to be encoded as a
16441644+ CBOR data item, including simple values and tags unknown to the
16451645+ encoder.
16461646+16471647+ Even though CBOR attempts to minimize these cases, not all well-
16481648+ formed CBOR data is valid: for example, the encoded text string
16491649+ "0x62c0ae" does not contain valid UTF-8 (because [RFC3629] requires
16501650+ always using the shortest form) and so is not a valid CBOR item.
16511651+ Also, specific tags may make semantic constraints that may be
16521652+ violated, for instance, by a bignum tag enclosing another tag or by
16531653+ an instance of tag number 0 containing a byte string or containing a
16541654+ text string with contents that do not match the "date-time"
16551655+ production of [RFC3339]. There is no requirement that generic
16561656+ encoders and decoders make unnatural choices for their application
16571657+ interface to enable the processing of invalid data. Generic encoders
16581658+ and decoders are expected to forward simple values and tags even if
16591659+ their specific codepoints are not registered at the time the encoder/
16601660+ decoder is written (Section 5.4).
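The UTF-8 example can be checked with a strict decoder. In this sketch,
the function name text_content_is_valid is hypothetical; it receives the
content bytes of a text string, that is, the bytes after the head.

```python
def text_content_is_valid(content: bytes) -> bool:
    """Report whether the content of a text string is valid UTF-8."""
    try:
        content.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# 0x62c0ae: head 0x62 (text string, 2 bytes), content 0xc0ae.
OVERLONG_EXAMPLE = bytes.fromhex("62c0ae")

if __name__ == "__main__":
    print(text_content_is_valid(OVERLONG_EXAMPLE[1:]))
```

Python's strict UTF-8 decoder rejects the non-shortest form 0xc0ae, so
the content in the example is reported as invalid.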
16611661+16621662+5.3. Validity of Items
16631663+16641664+ A well-formed but invalid CBOR data item (Section 1.2) presents a
16651665+ problem with interpreting the data encoded in it in the CBOR data
16661666+ model. A CBOR-based protocol could be specified in several layers,
16671667+ in which the lower layers don't process the semantics of some of the
16681668+ CBOR data they forward. These layers can't notice any validity
16691669+ errors in data they don't process and MUST forward that data as-is.
16701670+ The first layer that does process the semantics of an invalid CBOR
16711671+ item MUST pick one of two choices:
16721672+16731673+ 1. Replace the problematic item with an error marker and continue
16741674+ with the next item, or
16751675+16761676+ 2. Issue an error and stop processing altogether.
16771677+16781678+ A CBOR-based protocol MUST specify which of these options its
16791679+ decoders take for each kind of invalid item they might encounter.
16801680+16811681+ Such problems might occur at the basic validity level of CBOR or in
16821682+ the context of tags (tag validity).
16831683+16841684+5.3.1. Basic validity
16851685+16861686+ Two kinds of validity errors can occur in the basic generic data
16871687+ model:
16881688+16891689+ Duplicate keys in a map: Generic decoders (Section 5.2) make data
16901690+ available to applications using the native CBOR data model. That
16911691+ data model includes maps (key-value mappings with unique keys),
16921692+ not multimaps (key-value mappings where multiple entries can have
16931693+ the same key). Thus, a generic decoder that gets a CBOR map item
16941694+ that has duplicate keys will decode to a map with only one
16951695+ instance of that key, or it might stop processing altogether. On
16961696+ the other hand, a "streaming decoder" may not even be able to
16971697+ notice. See Section 5.6 for more discussion of keys in maps.
16981698+16991699+ Invalid UTF-8 string: A decoder might or might not want to verify
17001700+ that the sequence of bytes in a UTF-8 string (major type 3) is
17011701+ actually valid UTF-8 and react appropriately.
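Duplicate detection for one map can be sketched as below. The function
name find_duplicate_keys is hypothetical. Comparing encoded bytes is an
assumption that only holds when every key is encoded the same way (for
example, under deterministic encoding); otherwise 0x0a and 0x180a, which
both encode 10, would not be recognized as the same key.

```python
def find_duplicate_keys(encoded_keys):
    """Return the encoded keys that occur more than once, in order."""
    seen = set()
    duplicates = []
    for key in encoded_keys:
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

The set of keys seen so far is exactly the state that a streaming
decoder may not be able to keep.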
17021702+17031703+5.3.2. Tag validity
17041704+17051705+ Two additional kinds of validity errors are introduced by adding tags
17061706+ to the basic generic data model:
17071707+17081708+ Inadmissible type for tag content: Tag numbers (Section 3.4) specify
17091709+ what type of data item is supposed to be used as their tag
17101710+ content; for example, the tag numbers for unsigned or negative
17111711+ bignums are supposed to be put on byte strings. A decoder that
17121712+ decodes the tagged data item into a native representation (a
17131713+ native big integer in this example) is expected to check the type
17141714+ of the data item being tagged. Even decoders that don't have such
17151715+ native representations available in their environment may perform
17161716+ the check on those tags known to them and react appropriately.
17171717+17181718+ Inadmissible value for tag content: The type of data item may be
17191719+ admissible for a tag's content, but the specific value may not be;
17201720+ e.g., a value of "yesterday" is not acceptable for the content of
17211721+ tag 0, even though it properly is a text string. A decoder that
17221722+ normally ingests such tags into equivalent platform types might
17231723+ present this tag to the application in a similar way to how it
17241724+ would present a tag with an unknown tag number (Section 5.4).
17251725+17261726+5.4. Validity and Evolution
17271727+17281728+ A decoder with validity checking will expend the effort to reliably
17291729+ detect data items with validity errors. For example, such a decoder
17301730+ needs to have an API that reports an error (and does not return data)
17311731+ for a CBOR data item that contains any of the validity errors listed
17321732+ in the previous subsection.
17331733+17341734+ The set of tags defined in the "Concise Binary Object Representation
17351735+ (CBOR) Tags" registry (Section 9.2), as well as the set of simple
17361736+ values defined in the "Concise Binary Object Representation (CBOR)
17371737+ Simple Values" registry (Section 9.1), can grow at any time beyond
17381738+ the set understood by a generic decoder. A validity-checking decoder
17391739+ can do one of two things when it encounters such a case that it does
17401740+ not recognize:
17411741+17421742+ * It can report an error (and not return data). Note that treating
17431743+ this case as an error can cause ossification and is thus not
17441744+ encouraged. This error is not a validity error, per se. This
17451745+ kind of error is more likely to be raised by a decoder that would
17461746+ be performing validity checking if this were a known case.
17471747+17481748+ * It can emit the unknown item (type, value, and, for tags, the
17491749+ decoded tagged data item) to the application calling the decoder,
17501750+ and then give the application an indication that the decoder did
17511751+ not recognize that tag number or simple value.
17521752+17531753+ The latter approach, which is also appropriate for decoders that do
17541754+ not support validity checking, provides forward compatibility with
17551755+ newly registered tags and simple values without the requirement to
17561756+ update the decoder at the same time as the calling application. (For
17571757+ this, the decoder's API needs the ability to mark unknown items so
17581758+ that the calling application can handle them in a manner appropriate
17591759+ for the program.)
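One possible shape for such an API is sketched below. The class
TaggedItem, the function present_tag, and the set KNOWN_TAG_NUMBERS are
all hypothetical; they only illustrate passing an unknown tag through
with a marker instead of raising an error.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical: the tag numbers this particular decoder understands.
KNOWN_TAG_NUMBERS = frozenset({0, 1, 2, 3})


@dataclass(frozen=True)
class TaggedItem:
    tag_number: int
    content: Any       # the decoded tagged data item
    recognized: bool   # False when the decoder does not know the tag


def present_tag(tag_number: int, content: Any) -> TaggedItem:
    """Hand a tag to the application, marking unknown tag numbers."""
    return TaggedItem(tag_number, content, tag_number in KNOWN_TAG_NUMBERS)
```

An application that was updated for a newly registered tag can then act
on items where recognized is False, without waiting for a new decoder.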
17601760+17611761+ Since some of the processing needed for validity checking may have an
17621762+ appreciable cost (in particular with duplicate detection for maps),
17631763+ support of validity checking is not a requirement placed on all CBOR
17641764+ decoders.
17651765+17661766+ Some encoders will rely on their applications to provide input data
17671767+ in such a way that valid CBOR results from the encoder. A generic
17681768+ encoder may also want to provide a validity-checking mode where it
17691769+ reliably limits its output to valid CBOR, independent of whether or
17701770+ not its application is indeed providing API-conformant data.
17711771+17721772+5.5. Numbers
17731773+17741774+ CBOR-based protocols should take into account that different language
17751775+ environments pose different restrictions on the range and precision
17761776+ of numbers that are representable. For example, the basic JavaScript
17771777+ number system treats all numbers as floating-point values, which may
17781778+ result in the silent loss of precision in decoding integers with more
17791779+ than 53 significant bits. Another example is that, since CBOR keeps
17801780+ the sign bit for its integer representation in the major type, it has
17811781+ one bit more for signed numbers of a certain length (e.g.,
17821782+ -2^(64)..2^(64)-1 for 1+8-byte integers) than the typical platform
17831783+ signed integer representation of the same length (-2^(63)..2^(63)-1
17841784+ for 8-byte int64_t). A protocol that uses numbers should define its
17851785+ expectations on the handling of nontrivial numbers in decoders and
17861786+ receiving applications.
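Both examples can be made concrete with a short sketch. The helper names
are hypothetical; Python floats are IEEE 754 binary64 values, the same
format that the basic JavaScript number system uses.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1        # platform int64_t
CBOR_INT_MIN, CBOR_INT_MAX = -2**64, 2**64 - 1  # major types 0 and 1


def fits_int64(value: int) -> bool:
    """Report whether a decoded integer fits a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def survives_binary64(value: int) -> bool:
    """Report whether an integer converts to binary64 without loss.

    Intended for integers in the CBOR_INT_MIN..CBOR_INT_MAX range.
    """
    return int(float(value)) == value
```

For instance, 2**53 + 1 does not survive conversion to binary64, and
2**63 is a valid CBOR integer that does not fit int64_t.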
17871787+17881788+ A CBOR-based protocol that includes floating-point numbers can
17891789+ restrict which of the three formats (half-precision, single-
17901790+ precision, and double-precision) are to be supported. For an
17911791+ integer-only application, a protocol may want to completely exclude
17921792+ the use of floating-point values.
17931793+17941794+ A CBOR-based protocol designed for compactness may want to exclude
17951795+ specific integer encodings that are longer than necessary for the
17961796+ application, such as to save the need to implement 64-bit integers.
17971797+ There is an expectation that encoders will use the most compact
17981798+ integer representation that can represent a given value. However, a
17991799+ compact application that does not require deterministic encoding
18001800+ should accept values that use a longer-than-needed encoding (such as
18011801+ encoding "0" as 0b000_11001 followed by two bytes of 0x00) as long as
18021802+ the application can decode an integer of the given size. Similar
18031803+ considerations apply to floating-point values; decoding both
18041804+ preferred serializations and longer-than-needed ones is recommended.
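A decoder that accepts longer-than-needed integer encodings can be
sketched as follows. The function name decode_unsigned is hypothetical,
and the sketch handles a single unsigned integer (major type 0) that
makes up the whole input.

```python
def decode_unsigned(encoded: bytes) -> int:
    """Decode one unsigned integer, preferred encoding or not."""
    if not encoded:
        raise ValueError("empty input")
    initial = encoded[0]
    if initial >> 5 != 0:
        raise ValueError("not major type 0")
    info = initial & 0x1F
    if info < 24:
        size, value = 0, info
    elif info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4, or 8 additional bytes
        value = int.from_bytes(encoded[1:1 + size], "big")
    else:
        raise ValueError("additional information not valid for integers")
    if len(encoded) != 1 + size:
        raise ValueError("truncated input or trailing bytes")
    return value
```

Both 0x00 and 0x190000 (0b000_11001 followed by two bytes of 0x00)
decode to 0 here; a decoder enforcing deterministic encoding would
reject the latter instead.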
18051805+18061806+ CBOR-based protocols for constrained applications that provide a
18071807+ choice between representing a specific number as an integer and as a
18081808+ decimal fraction or bigfloat (such as when the exponent is small and
18091809+ nonnegative) might express a quality-of-implementation expectation
18101810+ that the integer representation is used directly.
18111811+18121812+5.6. Specifying Keys for Maps
18131813+18141814+ The encoding and decoding applications need to agree on what types of
18151815+ keys are going to be used in maps. In applications that need to
18161816+ interwork with JSON-based applications, conversion is simplified by
18171817+ limiting keys to text strings only; otherwise, there has to be a
18181818+ specified mapping from the other CBOR types to text strings, and this
18191819+ often leads to implementation errors. In applications where keys are
18201820+ numeric in nature, and numeric ordering of keys is important to the
18211821+ application, directly using the numbers for the keys is useful.
18221822+18231823+ If multiple types of keys are to be used, consideration should be
18241824+ given to how these types would be represented in the specific
18251825+ programming environments that are to be used. For example, in
18261826+ JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished
18271827+ from a key of floating-point 1.0. This means that, if integer keys
18281828+ are used, the protocol needs to avoid the use of floating-point keys
18291829+ the values of which happen to be integer numbers in the same map.
18301830+18311831+ Decoders that deliver data items nested within a CBOR data item
18321832+ immediately on decoding them ("streaming decoders") often do not keep
18331833+ the state that is necessary to ascertain uniqueness of a key in a
18341834+ map. Similarly, an encoder that can start encoding data items before
18351835+ the enclosing data item is completely available ("streaming encoder")
18361836+ may want to reduce its overhead significantly by relying on its data
18371837+ source to maintain uniqueness.
18381838+18391839+ A CBOR-based protocol MUST define what to do when a receiving
18401840+ application sees multiple identical keys in a map. The resulting
18411841+ rule in the protocol MUST respect the CBOR data model: it cannot
18421842+ prescribe a specific handling of the entries with the identical keys,
18431843+ except that it might have a rule that having identical keys in a map
18441844+ indicates a malformed map and that the decoder has to stop with an
18451845+ error. When processing maps that exhibit entries with duplicate
18461846+ keys, a generic decoder might do one of the following:
18471847+18481848+ * Not accept maps with duplicate keys (that is, enforce validity for
18491849+ maps, see also Section 5.4). These generic decoders are
18501850+ universally useful. An application may still need to perform its
18511851+ own duplicate checking based on application rules (for instance,
18521852+ if the application equates integers and floating-point values in
18531853+ map key positions for specific maps).
18541854+18551855+ * Pass all map entries to the application, including ones with
18561856+ duplicate keys. This requires that the application handle (check
18571857+ against) duplicate keys, even if the application rules are
18581858+ identical to the generic data model rules.
18591859+18601860+ * Lose some entries with duplicate keys, e.g., deliver only the
18611861+ final (or first) entry out of the entries with the same key. With
18621862+ such a generic decoder, applications may get different results for
18631863+ a specific key on different runs or with different generic
18641864+ decoders: which value is returned depends on the generic decoder
18651865+ implementation and the actual order of keys in the map. In
18661866+ particular, applications cannot validate key uniqueness on their
18671867+ own as they do not necessarily see all entries; they may not be
18681868+ able to use such a generic decoder if they need to validate key
18691869+ uniqueness. These generic decoders can only be used in situations
18701870+ where the data source and transfer always provide valid maps; this
18711871+ is not possible if the data source and transfer can be attacked.
18721872+18731873+ Generic decoders need to document which of these three approaches
18741874+ they implement.
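
The three approaches above can be sketched in Python as follows. This is a non-normative illustration: the names DuplicatePolicy and deliver_map are hypothetical, the decoded map is assumed to arrive as a sequence of (key, value) pairs, and key comparison uses Python equality, which (unlike the generic data model) treats the integer 1 and the floating-point value 1.0 as the same key.

```python
from enum import Enum


class DuplicatePolicy(Enum):
    REJECT = "reject"        # first approach: enforce validity of maps
    PASS_ALL = "pass_all"    # second approach: application sees every entry
    LAST_WINS = "last_wins"  # third approach: earlier duplicates are lost


def deliver_map(pairs, policy):
    """Deliver decoded (key, value) pairs according to the chosen policy."""
    if policy is DuplicatePolicy.PASS_ALL:
        return list(pairs)   # duplicates remain visible to the application
    result = {}
    for key, value in pairs:
        if key in result and policy is DuplicatePolicy.REJECT:
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value  # under LAST_WINS this overwrites silently
    return result
```

Only the REJECT and PASS_ALL variants let an application rely on or verify key uniqueness; with LAST_WINS the application never learns that an entry was dropped.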
18751875+18761876+ The CBOR data model for maps does not allow ascribing semantics to
18771877+ the order of the key/value pairs in the map representation. Thus, a
18781878+ CBOR-based protocol MUST NOT specify that changing the key/value pair
18791879+ order in a map changes the semantics, except to specify that some
18801880+ orders are disallowed, for example, where they would not meet the
18811881+ requirements of a deterministic encoding (Section 4.2). (Any
18821882+ secondary effects of map ordering such as on timing, cache usage, and
18831883+ other potential side channels are not considered part of the
18841884+ semantics but may be enough reason on their own for a protocol to
18851885+ require a deterministic encoding format.)
18861886+18871887+ Applications for constrained devices should consider using small
18881888+ integers as keys if they have maps with a small number of frequently
18891889+ used keys; for instance, a set of 24 or fewer keys can be encoded in
18901890+ a single byte as unsigned integers, up to 48 if negative integers are
18911891+ also used. Less frequently occurring keys can then use integers with
18921892+ longer encodings.
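
The key counts in the preceding paragraph follow from the size of the integer head. The Python sketch below (non-normative; encoded_int_length is a hypothetical helper name) computes the length of the preferred encoding of an integer and lists the integers that fit in a single byte.

```python
def encoded_int_length(n: int) -> int:
    """Return the number of bytes in the preferred encoding of integer n."""
    arg = n if n >= 0 else -1 - n      # major type 1 carries -1 - n
    if arg < 24:
        return 1                       # argument fits in the initial byte
    for extra in (1, 2, 4, 8):
        if arg < 1 << (8 * extra):
            return 1 + extra
    raise ValueError("outside the range of major types 0 and 1")


# Integers whose encoding is a single byte: -24 through 23, 48 values.
single_byte_keys = [n for n in range(-100, 100) if encoded_int_length(n) == 1]
```

Keys outside the range -24..23 still work but cost at least one additional byte each time they appear.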
18931893+18941894+5.6.1. Equivalence of Keys
18951895+18961896+ The specific data model that applies to a CBOR data item is used to
18971897+ determine whether keys occurring in maps are duplicates or distinct.
18981898+18991899+ At the generic data model level, numerically equivalent integer and
19001900+ floating-point values are distinct from each other, as they are from
19011901+ the various big numbers (Tags 2 to 5). Similarly, text strings are
19021902+ distinct from byte strings, even if composed of the same bytes. A
19031903+ tagged value is distinct from an untagged value or from a value
19041904+ tagged with a different tag number.
19051905+19061906+ Within each of these groups, numeric values are distinct unless they
19071907+ are numerically equal (specifically, -0.0 is equal to 0.0); for the
19081908+ purpose of map key equivalence, NaN values are equivalent if they
19091909+ have the same significand after zero-extending both significands at
19101910+ the right to 64 bits.
19111911+19121912+ Both byte strings and text strings are compared byte by byte, arrays
19131913+ are compared element by element, and are equal if they have the same
19141914+ number of bytes/elements and the same values at the same positions.
19151915+ Two maps are equal if they have the same set of pairs regardless of
19161916+ their order; pairs are equal if both the key and value are equal.
19171917+19181918+ Tagged values are equal if both the tag number and the tag content
19191919+ are equal. (Note that a generic decoder that provides processing for
19201920+ a specific tag may not be able to distinguish some semantically
19211921+ equivalent values, e.g., if leading zeroes occur in the content of
19221922+ tag 2 or tag 3 (Section 3.4.3).) Simple values are equal if they
19231923+ simply have the same value. Nothing else is equal in the generic
19241924+ data model; a simple value 2 is not equivalent to an integer 2, and
19251925+ an array is never equivalent to a map.
19261926+19271927+ As discussed in Section 2.2, specific data models can make values
19281928+ equivalent for the purpose of comparing map keys that are distinct in
19291929+ the generic data model. Note that this implies that a map delivered
19301930+ by a generic decoder to an application may still need to be
19311931+ checked for duplicate map keys by that application (alternatively,
19321932+ the decoder may provide a programming interface to perform this
19331933+ service for the application). Specific data models are not able to
19341934+ distinguish values for map keys that are equal for this purpose at
19351935+ the generic data model level.
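
The equivalence rules of this section can be approximated in Python by mapping each decoded key to a hashable "identity" so that two keys are duplicates exactly when their identities are equal. The sketch below is non-normative and rests on several assumptions: the Tag class and the key_identity function are hypothetical; floating-point values are assumed to have been decoded into binary64, so comparing the 52-bit significand of a NaN stands in for the zero-extension rule; and maps are represented as Python dicts, which cannot themselves hold both 1 and 1.0 as keys.

```python
import math
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:                 # hypothetical stand-in for a decoded tagged value
    number: int
    content: object


def key_identity(item):
    """Map a decoded data item to a hashable identity for key comparison."""
    if item is None or isinstance(item, bool):
        # false, true, and null are simple values 20, 21, and 22
        return ("simple", {False: 20, True: 21, None: 22}[item])
    if isinstance(item, int):
        return ("int", item)                   # distinct from any float
    if isinstance(item, float):
        if math.isnan(item):
            bits = struct.unpack(">Q", struct.pack(">d", item))[0]
            return ("nan", bits & ((1 << 52) - 1))
        return ("float", 0.0 if item == 0 else item)   # -0.0 equals 0.0
    if isinstance(item, bytes):
        return ("bytes", item)                 # distinct from text strings
    if isinstance(item, str):
        return ("text", item.encode("utf-8"))
    if isinstance(item, list):
        return ("array", tuple(key_identity(x) for x in item))
    if isinstance(item, dict):
        pairs = ((key_identity(k), key_identity(v)) for k, v in item.items())
        return ("map", frozenset(pairs))       # pair order is irrelevant
    if isinstance(item, Tag):
        return ("tag", item.number, key_identity(item.content))
    raise TypeError(f"unsupported item: {item!r}")
```

A specific data model that equates, say, integers and floating-point values would need an additional normalization step on top of this generic comparison.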
19361936+19371937+5.7. Undefined Values
19381938+19391939+ In some CBOR-based protocols, the simple value (Section 3.3) of
19401940+ "undefined" might be used by an encoder as a substitute for a data
19411941+ item with an encoding problem, in order to allow the rest of the
19421942+ enclosing data items to be encoded without harm.
19431943+19441944+6. Converting Data between CBOR and JSON
19451945+19461946+ This section gives non-normative advice about converting between CBOR
19471947+ and JSON. Implementations of converters MAY use whichever advice
19481948+ here they want.
19491949+19501950+ It is worth noting that a JSON text is a sequence of characters, not
19511951+ an encoded sequence of bytes, while a CBOR data item consists of
19521952+ bytes, not characters.
19531953+19541954+6.1. Converting from CBOR to JSON
19551955+19561956+ Most of the types in CBOR have direct analogs in JSON. However, some
19571957+ do not, and someone implementing a CBOR-to-JSON converter has to
19581958+ consider what to do in those cases. The following non-normative
19591959+ advice deals with these by converting them to a single substitute
19601960+ value, such as a JSON null.
19611961+19621962+ * An integer (major type 0 or 1) becomes a JSON number.
19631963+19641964+ * A byte string (major type 2) that is not embedded in a tag that
19651965+ specifies a proposed encoding is encoded in base64url without
19661966+ padding and becomes a JSON string.
19671967+19681968+ * A UTF-8 string (major type 3) becomes a JSON string. Note that
19691969+ JSON requires escaping certain characters ([RFC8259], Section 7):
19701970+ quotation mark (U+0022), reverse solidus (U+005C), and the "C0
19711971+ control characters" (U+0000 through U+001F). All other characters
19721972+ are copied unchanged into the JSON UTF-8 string.
19731973+19741974+ * An array (major type 4) becomes a JSON array.
19751975+19761976+ * A map (major type 5) becomes a JSON object. This is possible
19771977+ directly only if all keys are UTF-8 strings. A converter might
19781978+ also convert other keys into UTF-8 strings (such as by converting
19791979+ integers into strings containing their decimal representation);
19801980+ however, doing so introduces a danger of key collision. Note also
19811981+ that, if tags on UTF-8 strings are ignored as proposed below, this
19821982+ will cause a key collision if the tags are different but the
19831983+ strings are the same.
19841984+19851985+ * False (major type 7, additional information 20) becomes a JSON
19861986+ false.
19871987+19881988+ * True (major type 7, additional information 21) becomes a JSON
19891989+ true.
19901990+19911991+ * Null (major type 7, additional information 22) becomes a JSON
19921992+ null.
19931993+19941994+ * A floating-point value (major type 7, additional information 25
19951995+ through 27) becomes a JSON number if it is finite (that is, it can
19961996+ be represented in a JSON number); if the value is non-finite (NaN,
19971997+ or positive or negative Infinity), it is represented by the
19981998+ substitute value.
19991999+20002000+ * Any other simple value (major type 7, any additional information
20012001+ value not yet discussed) is represented by the substitute value.
20022002+20032003+ * A bignum (major type 6, tag number 2 or 3) is represented by
20042004+ encoding its byte string in base64url without padding and becomes
20052005+ a JSON string. For tag number 3 (negative bignum), a "~" (ASCII
20062006+ tilde) is inserted before the base-encoded value. (The conversion
20072007+ to a binary blob instead of a number is to prevent a likely
20082008+ numeric overflow for the JSON decoder.)
20092009+20102010+ * A byte string with an encoding hint (major type 6, tag number 21
20112011+ through 23) is encoded as described by the hint and becomes a JSON
20122012+ string.
20132013+20142014+ * For all other tags (major type 6, any other tag number), the tag
20152015+ content is represented as a JSON value; the tag number is ignored.
20162016+20172017+ * Indefinite-length items are made definite before conversion.
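
The advice in the list above can be sketched in Python for data items that have already been decoded. This is a non-normative illustration under stated assumptions: Tag and Simple are hypothetical classes standing in for whatever a decoder delivers, bignums are assumed to arrive as Tag(2, ...) or Tag(3, ...) with byte string content, the handling of tags 21 through 23 is assumed to follow Section 3.4.5.2, and maps with keys other than text strings are simply rejected.

```python
import base64
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:                 # hypothetical decoded tag
    number: int
    content: object


@dataclass(frozen=True)
class Simple:              # hypothetical "other" simple value, e.g. undefined
    value: int


def b64url(data: bytes) -> str:
    """base64url without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


BYTE_HINTS = {
    21: b64url,
    22: lambda b: base64.b64encode(b).decode("ascii"),
    23: lambda b: b.hex().upper(),
}


def to_json_value(item, substitute=None, encode_bytes=b64url):
    """Convert a decoded CBOR item to a value that a JSON encoder accepts."""
    if item is None or isinstance(item, (bool, int, str)):
        return item
    if isinstance(item, float):
        return item if math.isfinite(item) else substitute
    if isinstance(item, bytes):
        return encode_bytes(item)
    if isinstance(item, list):
        return [to_json_value(x, substitute, encode_bytes) for x in item]
    if isinstance(item, dict):
        if not all(isinstance(k, str) for k in item):
            raise ValueError("only text string keys convert directly")
        return {k: to_json_value(v, substitute, encode_bytes)
                for k, v in item.items()}
    if isinstance(item, Tag):
        if item.number in (2, 3) and isinstance(item.content, bytes):
            return ("~" if item.number == 3 else "") + b64url(item.content)
        hint = BYTE_HINTS.get(item.number, encode_bytes)
        return to_json_value(item.content, substitute, hint)  # tag dropped
    return substitute      # Simple(...) and anything not covered above
```

The substitute value defaults to None, which a JSON encoder writes as null. A converter that converts non-text keys to strings would need its own collision check.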
20182018+20192019+ A CBOR-to-JSON converter may want to keep to the JSON profile I-JSON
20202020+ [RFC7493], to maximize interoperability and increase confidence that
20212021+ the JSON output can be processed with predictable results. For
20222022+ example, this has implications for the range of integers that can be
20232023+ represented reliably, as well as on the top-level items that may be
20242024+ supported by older JSON implementations.
20252025+20262026+6.2. Converting from JSON to CBOR
20272027+20282028+ All JSON values, once decoded, directly map into one or more CBOR
20292029+ values. As with any kind of CBOR generation, decisions have to be
20302030+ made with respect to number representation. In a suggested
20312031+ conversion:
20322032+20332033+ * JSON numbers without fractional parts (integer numbers) are
20342034+ represented as integers (major types 0 and 1, possibly major type
20352035+ 6, tag number 2 and 3), choosing the shortest form; integers
20362036+ longer than an implementation-defined threshold may instead be
20372037+ represented as floating-point values. The default range that is
20382038+ represented as integer is -2^(53)+1..2^(53)-1 (fully exploiting
20392039+ the range for exact integers in the binary64 representation often
20402040+ used for decoding JSON [RFC7493]). A CBOR-based protocol, or a
20412041+ generic converter implementation, may choose -2^(32)..2^(32)-1 or
20422042+ -2^(64)..2^(64)-1 (fully using the integer ranges available in
20432043+ CBOR with uint32_t or uint64_t, respectively) or even
20442044+ -2^(31)..2^(31)-1 or -2^(63)..2^(63)-1 (using popular ranges for
20452045+ two's complement signed integers). (If the JSON was generated
20462046+ from a JavaScript implementation, its precision is already limited
20472047+ to 53 bits maximum.)
20482048+20492049+ * Numbers with fractional parts are represented as floating-point
20502050+ values, performing the decimal-to-binary conversion based on the
20512051+ precision provided by IEEE 754 binary64. The mathematical value
20522052+ of the JSON number is converted to binary64 using the
20532053+ roundTiesToEven procedure in Section 4.3.1 of [IEEE754]. Then,
20542054+ when encoding in CBOR, the preferred serialization uses the
20552055+ shortest floating-point representation exactly representing this
20562056+ conversion result; for instance, 1.5 is represented in a 16-bit
20572057+ floating-point value (not all implementations will be capable of
20582058+ efficiently finding the minimum form, though). Instead of using
20592059+ the default binary64 precision, there may be an implementation-
20602060+ defined limit to the precision of the conversion that will affect
20612061+ the precision of the represented values. Decimal representation
20622062+ should only be used on the CBOR side if that is specified in a
20632063+ protocol.
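
The suggested conversion of JSON numbers can be sketched as follows. This Python example is non-normative and makes several assumptions: encode_json_number, encode_head, and encode_float are hypothetical names; the input is the text of a single JSON number; Python's json module decides whether the number has a fractional part or exponent (in which case it yields a binary64 value); and the default integer range of -2^(53)+1..2^(53)-1 is used.

```python
import json
import struct

MAX_EXACT = 2**53 - 1      # default integer range from the text above


def encode_head(major: int, arg: int) -> bytes:
    """Encode major type and argument using the shortest form."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([major << 5 | info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_float(x: float) -> bytes:
    """Use the shortest float format that represents x exactly."""
    for fmt, initial in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:          # x is too large for this format
            continue
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([initial]) + packed
    return bytes([0xFB]) + struct.pack(">d", x)


def encode_json_number(text: str) -> bytes:
    value = json.loads(text)
    if type(value) is int and -MAX_EXACT <= value <= MAX_EXACT:
        if value >= 0:
            return encode_head(0, value)
        return encode_head(1, -1 - value)
    return encode_float(float(value))
```

For instance, the JSON number 1.5 becomes the three bytes 0xf93e00, while 1.1 has no exact half- or single-precision representation and needs the nine-byte double-precision form.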
20642064+20652065+ CBOR has been designed to generally provide a more compact encoding
20662066+ than JSON. One implementation strategy that might come to mind is to
20672067+ perform a JSON-to-CBOR encoding in place in a single buffer. This
20682068+ strategy would need to carefully consider a number of pathological
20692069+ cases: for example, strings that contain no or very few escapes
20702070+ and are longer (or much longer) than 255 bytes may expand when
20712071+ encoded as UTF-8 strings in CBOR. Similarly, a few of the binary
20722072+ floating-point representations might cause expansion from some short
20732073+ decimal representations (1.1, 1e9) in JSON. This may be hard to get
20742074+ right, and any ensuing vulnerabilities may be exploited by an
20752075+ attacker.
20762076+20772077+7. Future Evolution of CBOR
20782078+20792079+ Successful protocols evolve over time. New ideas appear,
20802080+ implementation platforms improve, related protocols are developed and
20812081+ evolve, and new requirements from applications and protocols are
20822082+ added. Facilitating protocol evolution is therefore an important
20832083+ design consideration for any protocol development.
20842084+20852085+ For protocols that will use CBOR, CBOR provides some useful
20862086+ mechanisms to facilitate their evolution. Best practices for this
20872087+ are well known, particularly from the development of JSON-based
20882088+ protocols. Therefore, such best practices are outside the
20892089+ scope of this specification.
20902090+20912091+ However, facilitating the evolution of CBOR itself is very well
20922092+ within its scope. CBOR is designed to both provide a stable basis
20932093+ for development of CBOR-based protocols and to be able to evolve.
20942094+ Since a successful protocol may live for decades, CBOR needs to be
20952095+ designed for decades of use and evolution. This section provides
20962096+ some guidance for the evolution of CBOR. It is necessarily more
20972097+ subjective than other parts of this document. It is also necessarily
20982098+ incomplete, lest it turn into a textbook on protocol development.
20992099+21002100+7.1. Extension Points
21012101+21022102+ In a protocol design, opportunities for evolution are often included
21032103+ in the form of extension points. For example, there may be a
21042104+ codepoint space that is not fully allocated from the outset, and the
21052105+ protocol is designed to tolerate and embrace implementations that
21062106+ start using more codepoints than initially allocated.
21072107+21082108+ Sizing the codepoint space may be difficult because the range
21092109+ required may be hard to predict. Protocol designs should attempt to
21102110+ make the codepoint space large enough so that it can slowly be filled
21112111+ over the intended lifetime of the protocol.
21122112+21132113+ CBOR has three major extension points:
21142114+21152115+ the "simple" space (values in major type 7): Of the 24 efficient
21162116+ (and 224 slightly less efficient) values, only a small number have
21172117+ been allocated. Implementations receiving an unknown simple data
21182118+ item may easily be able to process it as such, given that the
21192119+ structure of the value is indeed simple. The IANA registry in
21202120+ Section 9.1 is the appropriate way to address the extensibility of
21212121+ this codepoint space.
21222122+21232123+ the "tag" space (values in major type 6): The total codepoint space
21242124+ is abundant; only a tiny part of it has been allocated. However,
21252125+ not all of these codepoints are equally efficient: the first 24
21262126+ only consume a single ("1+0") byte, and half of them have already
21272127+ been allocated. The next 232 values only consume two ("1+1")
21282128+ bytes, with nearly a quarter already allocated. These subspaces
21292129+ need some curation to last for a few more decades.
21302130+ Implementations receiving an unknown tag number can choose to
21312131+ process just the enclosed tag content or, preferably, to process
21322132+ the tag as an unknown tag number wrapping the tag content. The
21332133+ IANA registry in Section 9.2 is the appropriate way to address the
21342134+ extensibility of this codepoint space.
21352135+21362136+ the "additional information" space: An implementation receiving an
21372137+ unknown additional information value has no way to continue
21382138+ decoding, so allocating codepoints in this space is a major step
21392139+ beyond just exercising an extension point. There are also very
21402140+ few codepoints left. See also Section 7.2.
21412141+21422142+7.2. Curating the Additional Information Space
21432143+21442144+ The human mind is sometimes drawn to filling in little perceived gaps
21452145+ to make something neat. We expect the remaining gaps in the
21462146+ codepoint space for the additional information values to be an
21472147+ attractor for new ideas, just because they are there.
21482148+21492149+ The present specification does not manage the additional information
21502150+ codepoint space by an IANA registry. Instead, allocations out of
21512151+ this space can only be done by updating this specification.
21522152+21532153+ For an additional information value of n >= 24, the size of the
21542154+ additional data typically is 2^(n-24) bytes. Therefore, additional
21552155+ information values 28 and 29 should be viewed as candidates for
21562156+ 128-bit and 256-bit quantities, in case a need arises to add them to
21572157+ the protocol. Additional information value 30 is then the only
21582158+ additional information value available for general allocation, and
21592159+ there should be a very good reason for allocating it before assigning
21602160+ it through an update of the present specification.
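
The size pattern described above can be written out as a short Python sketch. It is non-normative; argument_size and pattern_size are hypothetical names. The first function reflects what is defined today, and the second merely extrapolates the 2^(n-24) pattern to the unallocated values 28 and 29.

```python
def argument_size(additional_info: int):
    """Bytes that follow the initial byte, or None if no size is defined."""
    if not 0 <= additional_info <= 31:
        raise ValueError("additional information is a 5-bit value")
    if additional_info < 24:
        return 0                          # argument is in the initial byte
    if additional_info <= 27:
        return 1 << (additional_info - 24)
    return None                           # 28..30 unallocated, 31 special


def pattern_size(n: int) -> int:
    """Size in bytes that the 2^(n-24) pattern suggests for n in 24..29."""
    if not 24 <= n <= 29:
        raise ValueError("pattern discussed only for 24..29")
    return 1 << (n - 24)
```

Following the pattern, value 28 would carry 16 bytes (128 bits) and value 29 would carry 32 bytes (256 bits), which is why the text treats them as candidates for those quantities.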
21612161+21622162+8. Diagnostic Notation
21632163+21642164+ CBOR is a binary interchange format. To facilitate documentation and
21652165+ debugging, and in particular to facilitate communication between
21662166+ entities cooperating in debugging, this section defines a simple
21672167+ human-readable diagnostic notation. All actual interchange always
21682168+ happens in the binary format.
21692169+21702170+ Note that this truly is a diagnostic format; it is not meant to be
21712171+ parsed. Therefore, no formal definition (as in ABNF) is given in
21722172+ this document. (Implementers looking for a text-based format for
21732173+ representing CBOR data items in configuration files may also want to
21742174+ consider YAML [YAML].)
21752175+21762176+ The diagnostic notation is loosely based on JSON as it is defined in
21772177+ RFC 8259, extending it where needed.
21782178+21792179+ The notation borrows the JSON syntax for numbers (integer and
21802180+ floating-point), True (>true<), False (>false<), Null (>null<), UTF-8
21812181+ strings, arrays, and maps (maps are called objects in JSON; the
21822182+ diagnostic notation extends JSON here by allowing any data item in
21832183+ the key position). Undefined is written >undefined< as in
21842184+ JavaScript. The non-finite floating-point numbers Infinity,
21852185+ -Infinity, and NaN are written exactly as in this sentence (this is
21862186+ also a way they can be written in JavaScript, although JSON does not
21872187+ allow them). A tag is written as an integer number for the tag
21882188+ number, followed by the tag content in parentheses; for instance, a
21892189+ date in the format specified by RFC 3339 (ISO 8601) could be notated
21902190+ as:
21912191+21922192+ 0("2013-03-21T20:04:00Z")
21932193+21942194+ or the equivalent relative time as the following:
21952195+21962196+ 1(1363896240)
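
That the two notations above denote the same instant can be checked with a few lines of Python. The sketch is non-normative; tag0_to_tag1 is a hypothetical helper that only handles RFC 3339 timestamps written in UTC with a trailing "Z" and without fractional seconds.

```python
from datetime import datetime, timezone


def tag0_to_tag1(text: str) -> int:
    """Convert tag 0 content (date/time string) to tag 1 content (seconds)."""
    moment = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return int(moment.replace(tzinfo=timezone.utc).timestamp())


def diag_tag(number: int, content_notation: str) -> str:
    """Write a tag in diagnostic notation: tag number, content in parens."""
    return f"{number}({content_notation})"
```

Calling diag_tag(1, str(tag0_to_tag1("2013-03-21T20:04:00Z"))) reproduces the second example.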
21972197+21982198+ Byte strings are notated in one of the base encodings, without
21992199+ padding, enclosed in single quotes, prefixed by >h< for base16, >b32<
22002200+ for base32, >h32< for base32hex, >b64< for base64 or base64url (the
22012201+ actual encodings do not overlap, so the string remains unambiguous).
22022202+ For example, the byte string 0x12345678 could be written h'12345678',
22032203+ b32'CI2FM6A', or b64'EjRWeA'.
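
The byte string notations can be generated with the Python standard library, as in the following non-normative sketch. The function name diag_bytes is hypothetical; base32hex is derived here by translating the base32 alphabet so that the example does not depend on a particular library version, and the >b64< form is produced with the classical base64 alphabet.

```python
import base64

# Translation from the base32 alphabet to the base32hex alphabet.
_B32_TO_HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def diag_bytes(data: bytes, prefix: str = "h") -> str:
    """Write a byte string in diagnostic notation, without padding."""
    if prefix == "h":
        body = data.hex()
    elif prefix == "b32":
        body = base64.b32encode(data).decode("ascii")
    elif prefix == "h32":
        body = base64.b32encode(data).translate(_B32_TO_HEX).decode("ascii")
    elif prefix == "b64":
        body = base64.b64encode(data).decode("ascii")
    else:
        raise ValueError(f"unknown prefix: {prefix}")
    return f"{prefix}'{body.rstrip('=')}'"
```

Applied to the byte string 0x12345678, the prefixes "h", "b32", and "b64" yield the three forms given in the example above.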
22042204+22052205+ Unassigned simple values are given as "simple()" with the appropriate
22062206+ integer in the parentheses. For example, "simple(42)" indicates
22072207+ major type 7, value 42.
22082208+22092209+ A number of useful extensions to the diagnostic notation defined here
22102210+ are provided in Appendix G of [RFC8610], "Extended Diagnostic
22112211+ Notation" (EDN). Similarly, this notation could be extended in a
22122212+ separate document to provide documentation for NaN payloads, which
22132213+ are not covered in this document.
22142214+22152215+8.1. Encoding Indicators
22162216+22172217+ Sometimes it is useful to indicate in the diagnostic notation which
22182218+ of several alternative representations was actually used; for
22192219+ example, a data item written >1.5< by a diagnostic decoder might have
22202220+ been encoded as a half-, single-, or double-precision float.
22212221+22222222+ The convention for encoding indicators is that anything starting with
22232223+ an underscore and all following characters that are alphanumeric or
22242224+ underscore is an encoding indicator, and can be ignored by anyone not
22252225+ interested in this information. Examples are "_" and "_3". Encoding
22262226+ indicators are always optional.
22272227+22282228+ A single underscore can be written after the opening brace of a map
22292229+ or the opening bracket of an array to indicate that the data item was
22302230+ represented in indefinite-length format. For example, [_ 1, 2]
22312231+ contains an indicator that an indefinite-length representation was
22322232+ used to represent the data item [1, 2].
22332233+22342234+ An underscore followed by a decimal digit n indicates that the
22352235+ preceding item (or, for arrays and maps, the item starting with the
22362236+ preceding bracket or brace) was encoded with an additional
22372237+ information value of 24+n. For example, 1.5_1 is a half-precision
22382238+ floating-point number, while 1.5_3 is encoded as double precision.
22392239+ This encoding indicator is not shown in Appendix A. (Note that the
22402240+ encoding indicator "_" is thus an abbreviation of the full form "_7",
22412241+ which is not used.)
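
As a non-normative illustration, the following Python sketch derives the encoding indicator for an encoded floating-point item from its initial byte. The function name float_with_indicator is hypothetical, and only finite values are handled because Python writes non-finite values differently from the diagnostic notation.

```python
import struct

# Initial bytes of the three floating-point formats (major type 7).
FLOAT_FORMATS = {0xF9: ">e", 0xFA: ">f", 0xFB: ">d"}


def float_with_indicator(encoded: bytes) -> str:
    """Return diagnostic notation with an _n encoding indicator."""
    fmt = FLOAT_FORMATS.get(encoded[0])
    if fmt is None:
        raise ValueError("not an encoded floating-point value")
    size = struct.calcsize(fmt)
    (value,) = struct.unpack(fmt, encoded[1:1 + size])
    n = (encoded[0] & 0x1F) - 24       # additional information is 24+n
    return f"{value!r}_{n}"
```

The three encodings of 1.5 (0xf93e00, 0xfa3fc00000, and 0xfb3ff8000000000000) are thereby distinguished as 1.5_1, 1.5_2, and 1.5_3.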
22422242+22432243+ The detailed chunk structure of byte and text strings of indefinite
22442244+ length can be notated in the form (_ h'0123', h'4567') and (_ "foo",
22452245+ "bar"). However, for an indefinite-length string with no chunks
22462246+ inside, (_ ) would be ambiguous as to whether a byte string (0x5fff)
22472247+ or a text string (0x7fff) is meant and is therefore not used. The
22482248+ basic forms ''_ and ""_ can be used instead and are reserved for the
22492249+ case of no chunks only -- not as short forms for the (permitted, but
22502250+ not really useful) encodings with only empty chunks, which need to be
22512251+ notated as (_ ''), (_ ""), etc., to preserve the chunk structure.
22522252+22532253+9. IANA Considerations
22542254+22552255+ IANA has created two registries for new CBOR values. The registries
22562256+ are separate, that is, not under an umbrella registry, and follow the
22572257+ rules in [RFC8126]. IANA has also assigned a new media type, an
22582258+ associated CoAP Content-Format entry, and a structured syntax suffix.
22592259+22602260+9.1. CBOR Simple Values Registry
22612261+22622262+ IANA has created the "Concise Binary Object Representation (CBOR)
22632263+ Simple Values" registry at [IANA.cbor-simple-values]. The initial
22642264+ values are shown in Table 4.
22652265+22662266+ New entries in the range 0 to 19 are assigned by Standards Action
22672267+ [RFC8126]. It is suggested that IANA allocate values starting with
22682268+ the number 16 in order to reserve the lower numbers for contiguous
22692269+ blocks (if any).
22702270+22712271+ New entries in the range 32 to 255 are assigned by Specification
22722272+ Required.
22732273+22742274+9.2. CBOR Tags Registry
22752275+22762276+ IANA has created the "Concise Binary Object Representation (CBOR)
22772277+ Tags" registry at [IANA.cbor-tags]. The tags that were defined in
22782278+ [RFC7049] are described in detail in Section 3.4, and other tags have
22792279+ already been defined since then.
22802280+22812281+ New entries in the range 0 to 23 ("1+0") are assigned by Standards
22822282+ Action. New entries in the ranges 24 to 255 ("1+1") and 256 to 32767
22832283+ (lower half of "1+2") are assigned by Specification Required. New
22842284+ entries in the range 32768 to 18446744073709551615 (upper half of
22852285+ "1+2", "1+4", and "1+8") are assigned by First Come First Served.
22862286+ The template for registration requests is:
22872287+22882288+ * Data item
22892289+22902290+ * Semantics (short form)
22912291+22922292+ In addition, First Come First Served requests should include:
22932293+22942294+ * Point of contact
22952295+22962296+ * Description of semantics (URL) -- This description is optional;
22972297+ the URL can point to something like an Internet-Draft or a web
22982298+ page.
22992299+23002300+ Applicants exercising the First Come First Served range and making a
23012301+ suggestion for a tag number that is not representable in 32 bits
23022302+ (i.e., larger than 4294967295) should be aware that this could reduce
23032303+ interoperability with implementations that do not support 64-bit
23042304+ numbers.
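
The registration ranges of this section can be summarized in a small lookup, shown here as a non-normative Python sketch. The function names are hypothetical; the ranges are those stated above.

```python
MAX_TAG = 18446744073709551615         # 2^64 - 1, largest tag number


def tag_registration_policy(tag_number: int) -> str:
    """Return the registration policy that applies to a tag number."""
    if not 0 <= tag_number <= MAX_TAG:
        raise ValueError("tag number out of range")
    if tag_number <= 23:
        return "Standards Action"              # "1+0" encodings
    if tag_number <= 32767:
        return "Specification Required"        # "1+1", lower half of "1+2"
    return "First Come First Served"           # everything above


def needs_64_bit_support(tag_number: int) -> bool:
    """True if the tag number is not representable in 32 bits."""
    return tag_number > 4294967295
```

An applicant could use the second check to see whether a suggested tag number might reduce interoperability with implementations limited to 32-bit numbers.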
23052305+23062306+9.3. Media Types Registry
23072307+23082308+ The Internet media type [RFC6838] ("MIME type") for a single encoded
23092309+ CBOR data item is "application/cbor" as defined in the "Media Types"
23102310+ registry [IANA.media-types]:
23112311+23122312+ Type name: application
23132313+23142314+ Subtype name: cbor
23152315+23162316+ Required parameters: n/a
23172317+23182318+ Optional parameters: n/a
23192319+23202320+ Encoding considerations: Binary
23212321+23222322+ Security considerations: See Section 10 of RFC 8949.
23232323+23242324+ Interoperability considerations: n/a
23252325+23262326+ Published specification: RFC 8949
23272327+23282328+ Applications that use this media type: Many
23292329+23302330+ Additional information:
23312331+23322332+ Magic number(s): n/a
23332333+ File extension(s): .cbor
23342334+ Macintosh file type code(s): n/a
23352335+23362336+ Person & email address to contact for further information: IETF CBOR
23372337+ Working Group (cbor@ietf.org) or IETF Applications and Real-Time
23382338+ Area (art@ietf.org)
23392339+23402340+ Intended usage: COMMON
23412341+23422342+ Restrictions on usage: none
23432343+23442344+ Author: IETF CBOR Working Group (cbor@ietf.org)
23452345+23462346+ Change controller: The IESG (iesg@ietf.org)
23472347+23482348+9.4. CoAP Content-Format Registry
23492349+23502350+ The CoAP Content-Format for CBOR has been registered in the "CoAP
23512351+ Content-Formats" subregistry within the "Constrained RESTful
23522352+ Environments (CoRE) Parameters" registry [IANA.core-parameters]:
23532353+23542354+ Media Type: application/cbor
23552355+23562356+ Encoding: -
23572357+23582358+ ID: 60
23592359+23602360+ Reference: RFC 8949
23612361+23622362+9.5. Structured Syntax Suffix Registry
23632363+23642364+ The structured syntax suffix [RFC6838] for media types based on a
23652365+ single encoded CBOR data item is +cbor, which IANA has registered in
23662366+ the "Structured Syntax Suffixes" registry [IANA.structured-suffix]:
23672367+23682368+ Name: Concise Binary Object Representation (CBOR)
23692369+23702370+ +suffix: +cbor
23712371+23722372+ References: RFC 8949
23732373+23742374+ Encoding Considerations: CBOR is a binary format.
23752375+23762376+ Interoperability Considerations: n/a
23772377+23782378+ Fragment Identifier Considerations: The syntax and semantics of
23792379+ fragment identifiers specified for +cbor SHOULD be as specified
23802380+ for "application/cbor". (At publication of RFC 8949, there is no
23812381+ fragment identification syntax defined for "application/cbor".)
23822382+23832383+ The syntax and semantics for fragment identifiers for a specific
23842384+ "xxx/yyy+cbor" SHOULD be processed as follows:
23852385+23862386+ * For cases defined in +cbor, where the fragment identifier
23872387+ resolves per the +cbor rules, then process as specified in
23882388+ +cbor.
23892389+23902390+ * For cases defined in +cbor, where the fragment identifier does
23912391+ not resolve per the +cbor rules, then process as specified in
23922392+ "xxx/yyy+cbor".
23932393+23942394+ * For cases not defined in +cbor, then process as specified in
23952395+ "xxx/yyy+cbor".
23962396+23972397+ Security Considerations: See Section 10 of RFC 8949.
23982398+23992399+ Contact: IETF CBOR Working Group (cbor@ietf.org) or IETF
24002400+ Applications and Real-Time Area (art@ietf.org)
24012401+24022402+ Author/Change Controller: IETF
24032403+24042404+10. Security Considerations
24052405+24062406+ A network-facing application can exhibit vulnerabilities in its
24072407+ processing logic for incoming data. Complex parsers are well known
24082408+ as a likely source of such vulnerabilities, such as the ability to
24092409+ remotely crash a node, or even remotely execute arbitrary code on it.
24102410+ CBOR attempts to narrow the opportunities for introducing such
24112411+ vulnerabilities by reducing parser complexity, by giving the entire
24122412+ range of encodable values a meaning where possible.

   Because CBOR decoders are often used as a first step in processing
   unvalidated input, they need to be fully prepared for all types of
   hostile input that may be designed to corrupt, overrun, or achieve
   control of the system decoding the CBOR data item. A CBOR decoder
   needs to assume that all input may be hostile even if it has been
   checked by a firewall, has come over a secure channel such as TLS, is
   encrypted or signed, or has come from some other source that is
   presumed trusted.

   Section 4.1 gives examples of limitations in interoperability when
   using a constrained CBOR decoder with input from a CBOR encoder that
   uses a non-preferred serialization. When a single data item is
   consumed both by such a constrained decoder and by a full decoder,
   the two may disagree on whether the item is acceptable, which can
   lead to security issues that can be exploited by an attacker who can
   inject or manipulate content.
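
   As a minimal sketch of this mismatch (not part of this
   specification), the following Python fragment decodes a single
   unsigned integer in two modes. The function name decode_uint and
   its require_preferred flag are hypothetical, and the "constrained"
   mode is assumed to accept only the shortest argument form, as in
   preferred serialization.

```python
def decode_uint(data: bytes, require_preferred: bool = False) -> int:
    """Decode one major type 0 item that occupies all of `data`.

    Hypothetical helper for illustration; not a complete decoder.
    """
    if not data:
        raise ValueError("empty input")
    initial = data[0]
    major, info = initial >> 5, initial & 0x1F
    if major != 0:
        raise ValueError("not an unsigned integer")
    if info < 24:
        if len(data) != 1:
            raise ValueError("unexpected trailing bytes")
        return info
    if info > 27:
        raise ValueError("not well-formed for an unsigned integer")
    size = 1 << (info - 24)          # 1, 2, 4, or 8 argument bytes
    if len(data) != 1 + size:
        raise ValueError("wrong number of argument bytes")
    value = int.from_bytes(data[1:], "big")
    # Smallest value that actually needs this argument size:
    # 24 for one byte, then 2**8, 2**16, and 2**32.
    minimum = 24 if size == 1 else 1 << (4 * size)
    if require_preferred and value < minimum:
        raise ValueError("non-preferred serialization")
    return value


# 0x1801 is a well-formed but non-preferred encoding of 1.
full_result = decode_uint(bytes.fromhex("1801"))
try:
    decode_uint(bytes.fromhex("1801"), require_preferred=True)
    constrained_accepts = True
except ValueError:
    constrained_accepts = False
```

   Here the full decoder yields 1, while the constrained decoder
   rejects the same bytes; a deployment in which both kinds of decoder
   see the same data item has to account for that difference.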

   As discussed throughout this document, there are many values that can
   be considered "equivalent" in some circumstances and "not equivalent"
   in others. As just one example, the numeric value for the number
   "one" might be expressed as an integer or a bignum. A system
   interpreting CBOR input might accept either form for the number
   "one", or might reject one (or both) forms. Such acceptance or
   rejection can have security implications in the program that is using
   the interpreted input.
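
   The two forms of "one" can be made concrete with a small sketch.
   The Python function below is hypothetical and handles only the two
   shapes needed here: an unsigned integer carried in the initial byte,
   and tag 2 (unsigned bignum) around a short byte string. It reports
   which form was seen alongside the numeric value.

```python
def decode_number(data: bytes):
    """Return (form, value) for a tiny subset of CBOR numbers.

    Illustration only: real decoders must handle far more cases.
    """
    if not data:
        raise ValueError("empty input")
    initial = data[0]
    major, info = initial >> 5, initial & 0x1F
    if major == 0 and info < 24:
        return ("integer", info)
    if major == 6 and info == 2:
        # Tag 2: the tag content is a byte string holding the magnitude.
        if len(data) < 2:
            raise ValueError("truncated input")
        inner = data[1]
        if inner >> 5 != 2 or (inner & 0x1F) >= 24:
            raise ValueError("unsupported bignum content")
        length = inner & 0x1F
        payload = data[2:2 + length]
        if len(payload) != length:
            raise ValueError("truncated input")
        return ("bignum", int.from_bytes(payload, "big"))
    raise ValueError("unsupported item")


as_integer = decode_number(bytes.fromhex("01"))
as_bignum = decode_number(bytes.fromhex("c24101"))
```

   Both results carry the value 1, but they differ in form. Whether an
   application treats them as the same, or rejects one of them, is the
   kind of decision this paragraph asks implementers to make
   deliberately.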

   Hostile input may be constructed to overrun buffers, to overflow or
   underflow integer arithmetic, or to cause other decoding disruption.
   CBOR data items might have lengths or sizes that are intentionally
   extremely large or too short. Resource exhaustion attacks might
   attempt to lure a decoder into allocating very big data items
   (strings, arrays, maps, or even arbitrary precision numbers) or
   exhaust the stack depth by setting up deeply nested items. Decoders
   need to have appropriate resource management to mitigate these
   attacks. (Items for which very large sizes are given can also
   attempt to exploit integer overflow vulnerabilities.)
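
   One possible shape for such resource management is sketched below
   in Python. It is an illustration under stated assumptions, not a
   complete well-formedness checker: it handles definite-length items
   only, the names read_head, skip_item, and DecodeError are
   hypothetical, and MAX_DEPTH is an arbitrary example limit. The
   point is that declared lengths and counts are compared against the
   bytes actually present before anything is allocated, and that
   nesting depth is bounded. (Python integers do not overflow; an
   implementation using fixed-width integers would also need to guard
   the additions and the doubling of the map count.)

```python
MAX_DEPTH = 32  # hypothetical nesting limit; choose per application


class DecodeError(ValueError):
    """Raised when input is truncated, unsupported, or over a limit."""


def read_head(buf: bytes, pos: int):
    """Return (major type, argument, next position) for the head at pos."""
    if pos >= len(buf):
        raise DecodeError("truncated input")
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        # 28..30 are reserved; 31 (indefinite length) is out of scope here.
        raise DecodeError("unsupported additional information")
    size = 1 << (info - 24)
    if len(buf) - pos < size:
        raise DecodeError("truncated argument")
    return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size


def skip_item(buf: bytes, pos: int = 0, depth: int = 0) -> int:
    """Walk one data item and return the position just past it."""
    if depth > MAX_DEPTH:
        raise DecodeError("nesting too deep")
    major, arg, pos = read_head(buf, pos)
    remaining = len(buf) - pos
    if major in (0, 1, 7):        # integers, simple values, floats
        return pos
    if major in (2, 3):           # byte or text string: arg is a byte count
        if arg > remaining:
            raise DecodeError("declared length exceeds input")
        return pos + arg
    if major == 6:                # tag: exactly one enclosed item
        return skip_item(buf, pos, depth + 1)
    # Arrays hold arg items, maps hold 2 * arg; each item needs at least
    # one byte, so an oversized count can be rejected before looping.
    count = arg if major == 4 else 2 * arg
    if count > remaining:
        raise DecodeError("declared count exceeds input")
    for _ in range(count):
        pos = skip_item(buf, pos, depth + 1)
    return pos
```

   With this sketch, a nine-byte input that claims to be a byte string
   of 2^64-1 bytes (0x5bffffffffffffffff) is rejected without any
   allocation, and a long run of 0x81 bytes is rejected once the
   nesting limit is exceeded.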

   A CBOR decoder, by definition, only accepts well-formed CBOR; this is
   the first step to its robustness. Input that is not well-formed CBOR
   causes no further processing from the point where the lack of
   well-formedness was detected. If possible, any data decoded up to
   this point should have no impact on the application using the CBOR
   decoder.

   In addition to ascertaining well-formedness, a CBOR decoder might
   also perform validity checks on the CBOR data. Alternatively, it can
   leave those checks to the application using the decoder. This choice
   needs to be clearly documented in the decoder. Beyond the validity
   at the CBOR level, an application also needs to ascertain that the
   input is in alignment with the application protocol that is
   serialized in CBOR.
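
   The difference between the two levels of checking can be shown with
   a text string. In the Python sketch below (a hypothetical helper
   that accepts only short definite-length text strings), a wrong
   length is treated as a well-formedness failure, while a correctly
   sized payload that is not UTF-8 is treated as a validity failure.
   The two exception class names are assumptions made for this
   example.

```python
class NotWellFormed(ValueError):
    """The bytes do not form a complete CBOR data item."""


class NotValid(ValueError):
    """The item is well-formed, but its content breaks a validity rule."""


def check_text_string(item: bytes) -> str:
    """Check one short (< 24 bytes) definite-length text string."""
    if not item:
        raise NotWellFormed("empty input")
    initial = item[0]
    if initial >> 5 != 3 or (initial & 0x1F) >= 24:
        raise ValueError("this sketch only handles short text strings")
    length = initial & 0x1F
    payload = item[1:]
    if len(payload) != length:
        raise NotWellFormed("declared length does not match input")
    try:
        # Python's strict UTF-8 codec rejects malformed sequences.
        return payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise NotValid("text string is not valid UTF-8") from exc
```

   For example, 0x62c3bc passes both checks and yields "ü", whereas
   0x62c328 is well-formed (two payload bytes follow, as declared) but
   fails the UTF-8 validity check.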

   The input check itself may consume resources. This is usually linear
   in the size of the input, which means that an attacker has to spend
   resources that are commensurate to the resources spent by the
   defender on input validation. However, an attacker might be able to
   craft inputs that will take longer for a target decoder to process
   than for the attacker to produce. Processing for arbitrary-precision
   numbers may exceed linear effort. Also, some hash-table
   implementations that are used by decoders to build in-memory
   representations of maps can be attacked to spend quadratic effort,
   unless a secret key (see Section 7 of [SIPHASH_LNCS], also
   [SIPHASH_OPEN]) or some other mitigation is employed. Such
   superlinear efforts can be exploited by an attacker to exhaust
   resources at or before the input validator; they therefore need to be
   avoided in a CBOR decoder implementation. Note that tag number
   definitions and their implementations can add security considerations
   of this kind; this should then be discussed in the security
   considerations of the tag number definition.

   CBOR encoders do not receive input directly from the network and are
   thus not directly attackable in the same way as CBOR decoders.
   However, CBOR encoders often have an API that takes input from
   another level in the implementation and can be attacked through that
   API. The design and implementation of that API should assume the
   behavior of its caller may be based on hostile input or on coding
   mistakes. It should check inputs for buffer overruns, overflow and
   underflow of integer arithmetic, and other such errors that are aimed
   at disrupting the encoder.
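
   A defensive encoder API might look like the following Python
   sketch. The function names encode_head and encode_int are
   hypothetical. The sketch refuses arguments that are not integers or
   that do not fit the 64-bit argument range, rather than silently
   truncating them, and it emits the shortest argument form.

```python
def encode_head(major: int, argument: int) -> bytes:
    """Encode the head of an item of major type 0..6 (shortest form)."""
    if isinstance(major, bool) or not isinstance(major, int):
        raise TypeError("major type must be an integer")
    if not 0 <= major <= 6:
        raise ValueError("major type must be in 0..6 for this sketch")
    # bool is a subclass of int in Python; reject it explicitly so that
    # True is not silently encoded as the integer 1.
    if isinstance(argument, bool) or not isinstance(argument, int):
        raise TypeError("argument must be an integer")
    if not 0 <= argument < 1 << 64:
        raise ValueError("argument does not fit in 64 bits")
    if argument < 24:
        return bytes([(major << 5) | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            head = bytes([(major << 5) | info])
            return head + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: range was checked above")


def encode_int(value: int) -> bytes:
    """Encode an integer in -2**64 .. 2**64-1 as major type 0 or 1."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("value must be an integer")
    if value >= 0:
        return encode_head(0, value)
    return encode_head(1, -1 - value)
```

   For instance, encode_int(1000) yields 0x1903e8 and encode_int(-1000)
   yields 0x3903e7, matching the examples in Appendix A, while
   encode_int(2**64) raises an error instead of wrapping around. A
   fuller encoder would switch to a bignum tag at that point.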

   Protocols should be defined in such a way that potential multiple
   interpretations are reliably reduced to a single interpretation. For
   example, an attacker could make use of invalid input such as
   duplicate keys in maps, or exploit different precision in processing
   numbers to make one application base its decisions on a different
   interpretation than the one that will be used by a second
   application. To facilitate consistent interpretation, encoder and
   decoder implementations should provide a validity-checking mode of
   operation (Section 5.4). Note, however, that a generic decoder
   cannot know about all requirements that an application poses on its
   input data; it therefore does not relieve the application of
   performing its own input checking. Also, since the set of defined
   tag numbers evolves, the application may employ a tag number that is
   not yet supported for validity checking by the generic decoder it
   uses. Generic decoders therefore need to document which tag numbers
   they support and what validity checking they provide for those tag
   numbers as well as for basic CBOR (UTF-8 checking, duplicate map key
   checking).
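
   The duplicate-key case can be illustrated with a short Python
   sketch. The function build_map and its on_duplicate parameter are
   hypothetical; the input is assumed to be the list of key/value
   pairs that a decoder has already extracted from one map, in wire
   order.

```python
def build_map(pairs, on_duplicate="reject"):
    """Build a dict from decoded (key, value) pairs.

    on_duplicate is "reject" (validity-checking behavior), "first"
    (keep the first occurrence), or "last" (keep the last occurrence).
    """
    if on_duplicate not in ("reject", "first", "last"):
        raise ValueError("unknown duplicate policy")
    result = {}
    for key, value in pairs:
        if key in result:
            if on_duplicate == "reject":
                raise ValueError(f"duplicate map key: {key!r}")
            if on_duplicate == "first":
                continue
        result[key] = value
    return result


# The same invalid map, as seen by two lenient implementations.
pairs = [("role", "user"), ("role", "admin")]
first_wins = build_map(pairs, on_duplicate="first")
last_wins = build_map(pairs, on_duplicate="last")
```

   Here one lenient implementation sees "user" and the other sees
   "admin" for the same bytes, which is exactly the divergence an
   attacker can exploit; the "reject" policy removes the ambiguity.
   (Note that Python treats 1, 1.0, and True as the same dictionary
   key, so a real decoder written in Python would need additional care
   to keep such keys distinct.)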

   Section 3.4.3 notes that using the non-preferred choice of a bignum
   representation instead of a basic integer for encoding a number is
   not intended to have application semantics, but it can have such
   semantics if an application receiving CBOR data is using a decoder in
   the basic generic data model. This disparity causes a security issue
   if the two sets of semantics differ. Thus, applications using CBOR
   need to specify the data model that they are using for each use of
   CBOR data.

   It is common to convert CBOR data to other formats. In many cases,
   CBOR has more expressive types than other formats; this is
   particularly true for the common conversion to JSON. The loss of
   type information can cause security issues for the systems that are
   processing the less-expressive data.
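
   A small example of such a loss, sketched in Python: JSON has no
   byte string type, so a converter has to map byte strings onto
   something else. The hypothetical to_json_value function below
   assumes one common convention, base64url without padding, and shows
   that a byte string and an unrelated text string can then become
   indistinguishable.

```python
import base64
import json


def to_json_value(item):
    """Map a decoded CBOR value to something json.dumps() accepts.

    Assumption for this sketch: byte strings become base64url text
    without padding; all other values are passed through unchanged.
    """
    if isinstance(item, bytes):
        return base64.urlsafe_b64encode(item).rstrip(b"=").decode("ascii")
    return item


from_byte_string = json.dumps(to_json_value(bytes.fromhex("01020304")))
from_text_string = json.dumps(to_json_value("AQIDBA"))
```

   Both conversions produce the same JSON text, so a consumer of the
   JSON cannot tell whether the sender supplied four bytes of binary
   data or the six-character text "AQIDBA".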

   Section 6.2 describes a possibly common usage scenario of converting
   between CBOR and JSON that could allow an attack if the attacker
   knows that the application is performing the conversion.

   Security considerations for the use of base16 and base64 from
   [RFC4648], and the use of UTF-8 from [RFC3629], are relevant to CBOR
   as well.

11. References

11.1. Normative References

   [C]        International Organization for Standardization,
              "Information technology - Programming languages - C",
              Fourth Edition, ISO/IEC 9899:2018, June 2018,
              <https://www.iso.org/standard/74528.html>.

   [Cplusplus20]
              International Organization for Standardization,
              "Programming languages - C++", Sixth Edition, ISO/IEC DIS
              14882, ISO/IEC JTC1 SC22 WG21 N 4860, March 2020,
              <https://isocpp.org/files/papers/N4860.pdf>.

   [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic", IEEE
              Std 754-2019, DOI 10.1109/IEEESTD.2019.8766229,
              <https://ieeexplore.ieee.org/document/8766229>.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996,
              <https://www.rfc-editor.org/info/rfc2045>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002,
              <https://www.rfc-editor.org/info/rfc3339>.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003, <https://www.rfc-editor.org/info/rfc3629>.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, DOI 10.17487/RFC3986, January 2005,
              <https://www.rfc-editor.org/info/rfc3986>.

   [RFC4287]  Nottingham, M., Ed. and R. Sayre, Ed., "The Atom
              Syndication Format", RFC 4287, DOI 10.17487/RFC4287,
              December 2005, <https://www.rfc-editor.org/info/rfc4287>.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
              <https://www.rfc-editor.org/info/rfc4648>.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017,
              <https://www.rfc-editor.org/info/rfc8126>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [TIME_T]   The Open Group, "The Open Group Base Specifications",
              Section 4.16, 'Seconds Since the Epoch', Issue 7, 2018
              Edition, IEEE Std 1003.1, 2018,
              <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_16>.

11.2. Informative References

   [ASN.1]    International Telecommunication Union, "Information
              Technology - ASN.1 encoding rules: Specification of Basic
              Encoding Rules (BER), Canonical Encoding Rules (CER) and
              Distinguished Encoding Rules (DER)", ITU-T Recommendation
              X.690, 2015,
              <https://www.itu.int/rec/T-REC-X.690-201508-I/en>.

   [BSON]     Various, "BSON - Binary JSON", <http://bsonspec.org/>.

   [CBOR-TAGS]
              Bormann, C., "Notable CBOR Tags", Work in Progress,
              Internet-Draft, draft-bormann-cbor-notable-tags-02, 25
              June 2020,
              <https://tools.ietf.org/html/draft-bormann-cbor-notable-tags-02>.

   [ECMA262]  Ecma International, "ECMAScript 2020 Language
              Specification", Standard ECMA-262, 11th Edition, June
              2020,
              <https://www.ecma-international.org/publications/standards/Ecma-262.htm>.

   [Err3764]  RFC Errata, Erratum ID 3764, RFC 7049,
              <https://www.rfc-editor.org/errata/eid3764>.

   [Err3770]  RFC Errata, Erratum ID 3770, RFC 7049,
              <https://www.rfc-editor.org/errata/eid3770>.

   [Err4294]  RFC Errata, Erratum ID 4294, RFC 7049,
              <https://www.rfc-editor.org/errata/eid4294>.

   [Err4409]  RFC Errata, Erratum ID 4409, RFC 7049,
              <https://www.rfc-editor.org/errata/eid4409>.

   [Err4963]  RFC Errata, Erratum ID 4963, RFC 7049,
              <https://www.rfc-editor.org/errata/eid4963>.

   [Err4964]  RFC Errata, Erratum ID 4964, RFC 7049,
              <https://www.rfc-editor.org/errata/eid4964>.

   [Err5434]  RFC Errata, Erratum ID 5434, RFC 7049,
              <https://www.rfc-editor.org/errata/eid5434>.

   [Err5763]  RFC Errata, Erratum ID 5763, RFC 7049,
              <https://www.rfc-editor.org/errata/eid5763>.

   [Err5917]  RFC Errata, Erratum ID 5917, RFC 7049,
              <https://www.rfc-editor.org/errata/eid5917>.

   [IANA.cbor-simple-values]
              IANA, "Concise Binary Object Representation (CBOR) Simple
              Values",
              <https://www.iana.org/assignments/cbor-simple-values>.

   [IANA.cbor-tags]
              IANA, "Concise Binary Object Representation (CBOR) Tags",
              <https://www.iana.org/assignments/cbor-tags>.

   [IANA.core-parameters]
              IANA, "Constrained RESTful Environments (CoRE)
              Parameters",
              <https://www.iana.org/assignments/core-parameters>.

   [IANA.media-types]
              IANA, "Media Types",
              <https://www.iana.org/assignments/media-types>.

   [IANA.structured-suffix]
              IANA, "Structured Syntax Suffixes",
              <https://www.iana.org/assignments/media-type-structured-suffix>.

   [MessagePack]
              Furuhashi, S., "MessagePack", <https://msgpack.org/>.

   [PCRE]     Hazel, P., "PCRE - Perl Compatible Regular Expressions",
              <https://www.pcre.org/>.

   [RFC0713]  Haverty, J., "MSDTP-Message Services Data Transmission
              Protocol", RFC 713, DOI 10.17487/RFC0713, April 1976,
              <https://www.rfc-editor.org/info/rfc713>.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13,
              RFC 6838, DOI 10.17487/RFC6838, January 2013,
              <https://www.rfc-editor.org/info/rfc6838>.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013, <https://www.rfc-editor.org/info/rfc7049>.

   [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
              Constrained-Node Networks", RFC 7228,
              DOI 10.17487/RFC7228, May 2014,
              <https://www.rfc-editor.org/info/rfc7228>.

   [RFC7493]  Bray, T., Ed., "The I-JSON Message Format", RFC 7493,
              DOI 10.17487/RFC7493, March 2015,
              <https://www.rfc-editor.org/info/rfc7493>.

   [RFC7991]  Hoffman, P., "The "xml2rfc" Version 3 Vocabulary",
              RFC 7991, DOI 10.17487/RFC7991, December 2016,
              <https://www.rfc-editor.org/info/rfc7991>.

   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
              Interchange Format", STD 90, RFC 8259,
              DOI 10.17487/RFC8259, December 2017,
              <https://www.rfc-editor.org/info/rfc8259>.

   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
              Definition Language (CDDL): A Notational Convention to
              Express Concise Binary Object Representation (CBOR) and
              JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
              June 2019, <https://www.rfc-editor.org/info/rfc8610>.

   [RFC8618]  Dickinson, J., Hague, J., Dickinson, S., Manderson, T.,
              and J. Bond, "Compacted-DNS (C-DNS): A Format for DNS
              Packet Capture", RFC 8618, DOI 10.17487/RFC8618, September
              2019, <https://www.rfc-editor.org/info/rfc8618>.

   [RFC8742]  Bormann, C., "Concise Binary Object Representation (CBOR)
              Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020,
              <https://www.rfc-editor.org/info/rfc8742>.

   [RFC8746]  Bormann, C., Ed., "Concise Binary Object Representation
              (CBOR) Tags for Typed Arrays", RFC 8746,
              DOI 10.17487/RFC8746, February 2020,
              <https://www.rfc-editor.org/info/rfc8746>.

   [SIPHASH_LNCS]
              Aumasson, J. and D. Bernstein, "SipHash: A Fast Short-
              Input PRF", Progress in Cryptology - INDOCRYPT 2012, pp.
              489-508, DOI 10.1007/978-3-642-34931-7_28, 2012,
              <https://doi.org/10.1007/978-3-642-34931-7_28>.

   [SIPHASH_OPEN]
              Aumasson, J. and D.J. Bernstein, "SipHash: a fast short-
              input PRF", <https://www.aumasson.jp/siphash/siphash.pdf>.

   [YAML]     Ben-Kiki, O., Evans, C., and I.d. Net, "YAML Ain't Markup
              Language (YAML[TM]) Version 1.2", 3rd Edition, October
              2009, <https://www.yaml.org/spec/1.2/spec.html>.

Appendix A. Examples of Encoded CBOR Data Items

   The following table provides some CBOR-encoded values in hexadecimal
   (right column), together with diagnostic notation for these values
   (left column). Note that the string "\u00fc" is one form of
   diagnostic notation for a UTF-8 string containing the single Unicode
   character U+00FC (LATIN SMALL LETTER U WITH DIAERESIS, "ü").
   Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a
   single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, "水"), often
   representing "water", and "\ud800\udd51" is a UTF-8 string in
   diagnostic notation with a single character U+10151 (GREEK ACROPHONIC
   ATTIC FIFTY STATERS, "𐅑"). (Note that all these single-character
   strings could also be represented in native UTF-8 in diagnostic
   notation, just not if an ASCII-only specification is required.) In
   the diagnostic notation provided for bignums, their intended numeric
   value is shown as a decimal number (such as 18446744073709551616)
   instead of a tagged byte string (such as 2(h'010000000000000000')).

   +==============================+====================================+
   |Diagnostic                    | Encoded                            |
   +==============================+====================================+
   |0                             | 0x00                               |
   +------------------------------+------------------------------------+
   |1                             | 0x01                               |
   +------------------------------+------------------------------------+
   |10                            | 0x0a                               |
   +------------------------------+------------------------------------+
   |23                            | 0x17                               |
   +------------------------------+------------------------------------+
   |24                            | 0x1818                             |
   +------------------------------+------------------------------------+
   |25                            | 0x1819                             |
   +------------------------------+------------------------------------+
   |100                           | 0x1864                             |
   +------------------------------+------------------------------------+
   |1000                          | 0x1903e8                           |
   +------------------------------+------------------------------------+
   |1000000                       | 0x1a000f4240                       |
   +------------------------------+------------------------------------+
   |1000000000000                 | 0x1b000000e8d4a51000               |
   +------------------------------+------------------------------------+
   |18446744073709551615          | 0x1bffffffffffffffff               |
   +------------------------------+------------------------------------+
   |18446744073709551616          | 0xc249010000000000000000           |
   +------------------------------+------------------------------------+
   |-18446744073709551616         | 0x3bffffffffffffffff               |
   +------------------------------+------------------------------------+
   |-18446744073709551617         | 0xc349010000000000000000           |
   +------------------------------+------------------------------------+
   |-1                            | 0x20                               |
   +------------------------------+------------------------------------+
   |-10                           | 0x29                               |
   +------------------------------+------------------------------------+
   |-100                          | 0x3863                             |
   +------------------------------+------------------------------------+
   |-1000                         | 0x3903e7                           |
   +------------------------------+------------------------------------+
   |0.0                           | 0xf90000                           |
   +------------------------------+------------------------------------+
   |-0.0                          | 0xf98000                           |
   +------------------------------+------------------------------------+
   |1.0                           | 0xf93c00                           |
   +------------------------------+------------------------------------+
   |1.1                           | 0xfb3ff199999999999a               |
   +------------------------------+------------------------------------+
   |1.5                           | 0xf93e00                           |
   +------------------------------+------------------------------------+
   |65504.0                       | 0xf97bff                           |
   +------------------------------+------------------------------------+
   |100000.0                      | 0xfa47c35000                       |
   +------------------------------+------------------------------------+
   |3.4028234663852886e+38        | 0xfa7f7fffff                       |
   +------------------------------+------------------------------------+
   |1.0e+300                      | 0xfb7e37e43c8800759c               |
   +------------------------------+------------------------------------+
   |5.960464477539063e-8          | 0xf90001                           |
   +------------------------------+------------------------------------+
   |0.00006103515625              | 0xf90400                           |
   +------------------------------+------------------------------------+
   |-4.0                          | 0xf9c400                           |
   +------------------------------+------------------------------------+
   |-4.1                          | 0xfbc010666666666666               |
   +------------------------------+------------------------------------+
   |Infinity                      | 0xf97c00                           |
   +------------------------------+------------------------------------+
   |NaN                           | 0xf97e00                           |
   +------------------------------+------------------------------------+
   |-Infinity                     | 0xf9fc00                           |
   +------------------------------+------------------------------------+
   |Infinity                      | 0xfa7f800000                       |
   +------------------------------+------------------------------------+
   |NaN                           | 0xfa7fc00000                       |
   +------------------------------+------------------------------------+
   |-Infinity                     | 0xfaff800000                       |
   +------------------------------+------------------------------------+
   |Infinity                      | 0xfb7ff0000000000000               |
   +------------------------------+------------------------------------+
   |NaN                           | 0xfb7ff8000000000000               |
   +------------------------------+------------------------------------+
   |-Infinity                     | 0xfbfff0000000000000               |
   +------------------------------+------------------------------------+
   |false                         | 0xf4                               |
   +------------------------------+------------------------------------+
   |true                          | 0xf5                               |
   +------------------------------+------------------------------------+
   |null                          | 0xf6                               |
   +------------------------------+------------------------------------+
   |undefined                     | 0xf7                               |
   +------------------------------+------------------------------------+
   |simple(16)                    | 0xf0                               |
   +------------------------------+------------------------------------+
   |simple(255)                   | 0xf8ff                             |
   +------------------------------+------------------------------------+
   |0("2013-03-21T20:04:00Z")     | 0xc074323031332d30332d32315432303a |
   |                              | 30343a30305a                       |
   +------------------------------+------------------------------------+
   |1(1363896240)                 | 0xc11a514b67b0                     |
   +------------------------------+------------------------------------+
   |1(1363896240.5)               | 0xc1fb41d452d9ec200000             |
   +------------------------------+------------------------------------+
   |23(h'01020304')               | 0xd74401020304                     |
   +------------------------------+------------------------------------+
   |24(h'6449455446')             | 0xd818456449455446                 |
   +------------------------------+------------------------------------+
   |32("http://www.example.com")  | 0xd82076687474703a2f2f7777772e6578 |
   |                              | 616d706c652e636f6d                 |
   +------------------------------+------------------------------------+
   |h''                           | 0x40                               |
   +------------------------------+------------------------------------+
   |h'01020304'                   | 0x4401020304                       |
   +------------------------------+------------------------------------+
   |""                            | 0x60                               |
   +------------------------------+------------------------------------+
   |"a"                           | 0x6161                             |
   +------------------------------+------------------------------------+
   |"IETF"                        | 0x6449455446                       |
   +------------------------------+------------------------------------+
   |"\"\\"                        | 0x62225c                           |
   +------------------------------+------------------------------------+
   |"\u00fc"                      | 0x62c3bc                           |
   +------------------------------+------------------------------------+
   |"\u6c34"                      | 0x63e6b0b4                         |
   +------------------------------+------------------------------------+
   |"\ud800\udd51"                | 0x64f0908591                       |
   +------------------------------+------------------------------------+
   |[]                            | 0x80                               |
   +------------------------------+------------------------------------+
   |[1, 2, 3]                     | 0x83010203                         |
   +------------------------------+------------------------------------+
   |[1, [2, 3], [4, 5]]           | 0x8301820203820405                 |
   +------------------------------+------------------------------------+
   |[1, 2, 3, 4, 5, 6, 7, 8, 9,   | 0x98190102030405060708090a0b0c0d0e |
   |10, 11, 12, 13, 14, 15, 16,   | 0f101112131415161718181819         |
   |17, 18, 19, 20, 21, 22, 23,   |                                    |
   |24, 25]                       |                                    |
   +------------------------------+------------------------------------+
   |{}                            | 0xa0                               |
   +------------------------------+------------------------------------+
   |{1: 2, 3: 4}                  | 0xa201020304                       |
   +------------------------------+------------------------------------+
   |{"a": 1, "b": [2, 3]}         | 0xa26161016162820203               |
   +------------------------------+------------------------------------+
   |["a", {"b": "c"}]             | 0x826161a161626163                 |
   +------------------------------+------------------------------------+
   |{"a": "A", "b": "B", "c": "C",| 0xa5616161416162614261636143616461 |
   |"d": "D", "e": "E"}           | 4461656145                         |
   +------------------------------+------------------------------------+
   |(_ h'0102', h'030405')        | 0x5f42010243030405ff               |
   +------------------------------+------------------------------------+
   |(_ "strea", "ming")           | 0x7f657374726561646d696e67ff       |
   +------------------------------+------------------------------------+
   |[_ ]                          | 0x9fff                             |
   +------------------------------+------------------------------------+
   |[_ 1, [2, 3], [_ 4, 5]]       | 0x9f018202039f0405ffff             |
   +------------------------------+------------------------------------+
   |[_ 1, [2, 3], [4, 5]]         | 0x9f01820203820405ff               |
   +------------------------------+------------------------------------+
   |[1, [2, 3], [_ 4, 5]]         | 0x83018202039f0405ff               |
   +------------------------------+------------------------------------+
   |[1, [_ 2, 3], [4, 5]]         | 0x83019f0203ff820405               |
   +------------------------------+------------------------------------+
   |[_ 1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x9f0102030405060708090a0b0c0d0e0f |
   |10, 11, 12, 13, 14, 15, 16,   | 101112131415161718181819ff         |
   |17, 18, 19, 20, 21, 22, 23,   |                                    |
   |24, 25]                       |                                    |
   +------------------------------+------------------------------------+
   |{_ "a": 1, "b": [_ 2, 3]}     | 0xbf61610161629f0203ffff           |
   +------------------------------+------------------------------------+
   |["a", {_ "b": "c"}]           | 0x826161bf61626163ff               |
   +------------------------------+------------------------------------+
   |{_ "Fun": true, "Amt": -2}    | 0xbf6346756ef563416d7421ff         |
   +------------------------------+------------------------------------+

             Table 6: Examples of Encoded CBOR Data Items
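
   The floating-point rows of the table use the shortest of the three
   IEEE 754 widths that represents the value exactly. As a rough
   sketch of how an encoder might arrive at those bytes, the Python
   function below (hypothetical name encode_float) tries half, then
   single, then double precision using the standard struct module. It
   assumes that NaN is always emitted as the half-precision quiet NaN
   0xf97e00, which discards any NaN payload; an application that cares
   about NaN payloads would need different handling.

```python
import math
import struct


def encode_float(value: float) -> bytes:
    """Encode a float using the shortest width that round-trips."""
    if math.isnan(value):
        # Assumption for this sketch: one canonical NaN, payload ignored.
        return bytes.fromhex("f97e00")
    # 0xf9, 0xfa: major type 7 with a 16-bit or 32-bit float following.
    for initial, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except (OverflowError, struct.error):
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial]) + packed
    # 0xfb: major type 7 with a 64-bit float following.
    return bytes([0xFB]) + struct.pack(">d", value)
```

   For example, encode_float(1.5) gives 0xf93e00, encode_float(100000.0)
   gives 0xfa47c35000, and encode_float(1.1) falls through to the
   64-bit form 0xfb3ff199999999999a, in line with the table above. The
   longer encodings of Infinity and NaN in the table are well-formed
   alternatives that this sketch would never produce.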

Appendix B. Jump Table for Initial Byte

   For brevity, this jump table does not show initial bytes that are
   reserved for future extension. It also only shows a selection of the
   initial bytes that can be used for optional features. (All unsigned
   integers are in network byte order.)

   +============+================================================+
   | Byte       | Structure/Semantics                            |
   +============+================================================+
   | 0x00..0x17 | unsigned integer 0x00..0x17 (0..23)            |
   +------------+------------------------------------------------+
   | 0x18       | unsigned integer (one-byte uint8_t follows)    |
   +------------+------------------------------------------------+
   | 0x19       | unsigned integer (two-byte uint16_t follows)   |
   +------------+------------------------------------------------+
   | 0x1a       | unsigned integer (four-byte uint32_t follows)  |
   +------------+------------------------------------------------+
   | 0x1b       | unsigned integer (eight-byte uint64_t follows) |
   +------------+------------------------------------------------+
   | 0x20..0x37 | negative integer -1-0x00..-1-0x17 (-1..-24)    |
   +------------+------------------------------------------------+
   | 0x38       | negative integer -1-n (one-byte uint8_t for n  |
   |            | follows)                                       |
   +------------+------------------------------------------------+
   | 0x39       | negative integer -1-n (two-byte uint16_t for n |
   |            | follows)                                       |
   +------------+------------------------------------------------+
   | 0x3a       | negative integer -1-n (four-byte uint32_t for  |
   |            | n follows)                                     |
   +------------+------------------------------------------------+
   | 0x3b       | negative integer -1-n (eight-byte uint64_t for |
   |            | n follows)                                     |
   +------------+------------------------------------------------+
   | 0x40..0x57 | byte string (0x00..0x17 bytes follow)          |
   +------------+------------------------------------------------+
   | 0x58       | byte string (one-byte uint8_t for n, and then  |
   |            | n bytes follow)                                |
   +------------+------------------------------------------------+
   | 0x59       | byte string (two-byte uint16_t for n, and then |
   |            | n bytes follow)                                |
   +------------+------------------------------------------------+
   | 0x5a       | byte string (four-byte uint32_t for n, and     |
   |            | then n bytes follow)                           |
   +------------+------------------------------------------------+
   | 0x5b       | byte string (eight-byte uint64_t for n, and    |
   |            | then n bytes follow)                           |
   +------------+------------------------------------------------+
   | 0x5f       | byte string, byte strings follow, terminated   |
   |            | by "break"                                     |
   +------------+------------------------------------------------+
   | 0x60..0x77 | UTF-8 string (0x00..0x17 bytes follow)         |
   +------------+------------------------------------------------+
   | 0x78       | UTF-8 string (one-byte uint8_t for n, and then |
   |            | n bytes follow)                                |
   +------------+------------------------------------------------+
   | 0x79       | UTF-8 string (two-byte uint16_t for n, and     |
   |            | then n bytes follow)                           |
   +------------+------------------------------------------------+
   | 0x7a       | UTF-8 string (four-byte uint32_t for n, and    |
   |            | then n bytes follow)                           |
   +------------+------------------------------------------------+
   | 0x7b       | UTF-8 string (eight-byte uint64_t for n, and   |
   |            | then n bytes follow)                           |
   +------------+------------------------------------------------+
   | 0x7f       | UTF-8 string, UTF-8 strings follow, terminated |
   |            | by "break"                                     |
   +------------+------------------------------------------------+
   | 0x80..0x97 | array (0x00..0x17 data items follow)           |
   +------------+------------------------------------------------+
   | 0x98       | array (one-byte uint8_t for n, and then n data |
   |            | items follow)                                  |
   +------------+------------------------------------------------+
   | 0x99       | array (two-byte uint16_t for n, and then n     |
   |            | data items follow)                             |
   +------------+------------------------------------------------+
   | 0x9a       | array (four-byte uint32_t for n, and then n    |
   |            | data items follow)                             |
   +------------+------------------------------------------------+
   | 0x9b       | array (eight-byte uint64_t for n, and then n   |
   |            | data items follow)                             |
   +------------+------------------------------------------------+
   | 0x9f       | array, data items follow, terminated by        |
30213021+ | | "break" |
30223022+ +------------+------------------------------------------------+
30233023+ | 0xa0..0xb7 | map (0x00..0x17 pairs of data items follow) |
30243024+ +------------+------------------------------------------------+
30253025+ | 0xb8 | map (one-byte uint8_t for n, and then n pairs |
30263026+ | | of data items follow) |
30273027+ +------------+------------------------------------------------+
30283028+ | 0xb9 | map (two-byte uint16_t for n, and then n pairs |
30293029+ | | of data items follow) |
30303030+ +------------+------------------------------------------------+
30313031+ | 0xba | map (four-byte uint32_t for n, and then n |
30323032+ | | pairs of data items follow) |
30333033+ +------------+------------------------------------------------+
30343034+ | 0xbb | map (eight-byte uint64_t for n, and then n |
30353035+ | | pairs of data items follow) |
30363036+ +------------+------------------------------------------------+
30373037+ | 0xbf | map, pairs of data items follow, terminated by |
30383038+ | | "break" |
30393039+ +------------+------------------------------------------------+
30403040+ | 0xc0 | text-based date/time (data item follows; see |
30413041+ | | Section 3.4.1) |
30423042+ +------------+------------------------------------------------+
30433043+ | 0xc1 | epoch-based date/time (data item follows; see |
30443044+ | | Section 3.4.2) |
30453045+ +------------+------------------------------------------------+
30463046+ | 0xc2 | unsigned bignum (data item "byte string" |
30473047+ | | follows) |
30483048+ +------------+------------------------------------------------+
30493049+ | 0xc3 | negative bignum (data item "byte string" |
30503050+ | | follows) |
30513051+ +------------+------------------------------------------------+
30523052+ | 0xc4 | decimal fraction (data item "array" follows; |
30533053+ | | see Section 3.4.4) |
30543054+ +------------+------------------------------------------------+
30553055+ | 0xc5 | bigfloat (data item "array" follows; see |
30563056+ | | Section 3.4.4) |
30573057+ +------------+------------------------------------------------+
30583058+ | 0xc6..0xd4 | (tag) |
30593059+ +------------+------------------------------------------------+
30603060+ | 0xd5..0xd7 | expected conversion (data item follows; see |
30613061+ | | Section 3.4.5.2) |
30623062+ +------------+------------------------------------------------+
30633063+ | 0xd8..0xdb | (more tags; 1/2/4/8 bytes of tag number and |
30643064+ | | then a data item follow) |
30653065+ +------------+------------------------------------------------+
30663066+ | 0xe0..0xf3 | (simple value) |
30673067+ +------------+------------------------------------------------+
30683068+ | 0xf4 | false |
30693069+ +------------+------------------------------------------------+
30703070+ | 0xf5 | true |
30713071+ +------------+------------------------------------------------+
30723072+ | 0xf6 | null |
30733073+ +------------+------------------------------------------------+
30743074+ | 0xf7 | undefined |
30753075+ +------------+------------------------------------------------+
30763076+ | 0xf8 | (simple value, one byte follows) |
30773077+ +------------+------------------------------------------------+
30783078+ | 0xf9 | half-precision float (two-byte IEEE 754) |
30793079+ +------------+------------------------------------------------+
30803080+ | 0xfa | single-precision float (four-byte IEEE 754) |
30813081+ +------------+------------------------------------------------+
30823082+ | 0xfb | double-precision float (eight-byte IEEE 754) |
30833083+ +------------+------------------------------------------------+
30843084+ | 0xff | "break" stop code |
30853085+ +------------+------------------------------------------------+
30863086+30873087+ Table 7: Jump Table for Initial Byte
30883088+30893089+Appendix C. Pseudocode
30903090+30913091+ The well-formedness of a CBOR item can be checked by the pseudocode
30923092+ in Figure 1. The data is well-formed if and only if:
30933093+30943094+ * the pseudocode does not "fail";
30953095+30963096+ * after execution of the pseudocode, no bytes are left in the input
30973097+ (except in streaming applications).
30983098+30993099+ The pseudocode has the following prerequisites:
31003100+31013101+ * take(n) reads n bytes from the input data and returns them as a
31023102+ byte string. If n bytes are no longer available, take(n) fails.
31033103+31043104+ * uint() converts a byte string into an unsigned integer by
31053105+ interpreting the byte string in network byte order.
31063106+31073107+ * Arithmetic works as in C.
31083108+31093109+ * All variables are unsigned integers of sufficient range.
31103110+31113111+ Note that "well_formed" returns the major type for well-formed
31123112+ definite-length items, but 99 for an indefinite-length item (or -1
31133113+ for a "break" stop code, only if "breakable" is set). This is used
31143114+ in "well_formed_indefinite" to ascertain that indefinite-length
31153115+ strings only contain definite-length strings as chunks.
31163116+31173117+ well_formed(breakable = false) {
31183118+   // process initial bytes
31193119+   ib = uint(take(1));
31203120+   mt = ib >> 5;
31213121+   val = ai = ib & 0x1f;
31223122+   switch (ai) {
31233123+     case 24: val = uint(take(1)); break;
31243124+     case 25: val = uint(take(2)); break;
31253125+     case 26: val = uint(take(4)); break;
31263126+     case 27: val = uint(take(8)); break;
31273127+     case 28: case 29: case 30: fail();
31283128+     case 31:
31293129+       return well_formed_indefinite(mt, breakable);
31303130+   }
31313131+   // process content
31323132+   switch (mt) {
31333133+     // case 0, 1, 7 do not have content; just use val
31343134+     case 2: case 3: take(val); break; // bytes/UTF-8
31353135+     case 4: for (i = 0; i < val; i++) well_formed(); break;
31363136+     case 5: for (i = 0; i < val*2; i++) well_formed(); break;
31373137+     case 6: well_formed(); break;     // 1 embedded data item
31383138+     case 7: if (ai == 24 && val < 32) fail(); // bad simple
31393139+   }
31403140+   return mt;                    // definite-length data item
31413141+ }
31423142+31433143+ well_formed_indefinite(mt, breakable) {
31443144+   switch (mt) {
31453145+     case 2: case 3:
31463146+       while ((it = well_formed(true)) != -1)
31473147+         if (it != mt)           // need definite-length chunk
31483148+           fail();               //    of same type
31493149+       break;
31503150+     case 4: while (well_formed(true) != -1); break;
31513151+     case 5: while (well_formed(true) != -1) well_formed(); break;
31523152+     case 7:
31533153+       if (breakable)
31543154+         return -1;              // signal break out
31553155+       else fail();              // no enclosing indefinite
31563156+     default: fail();            // wrong mt
31573157+   }
31583158+   return 99;                    // indefinite-length data item
31593159+ }
31603160+31613161+ Figure 1: Pseudocode for Well-Formedness Check
31623162+31633163+ Note that the remaining complexity of a complete CBOR decoder is
31643164+ about presenting data that has been decoded to the application in an
31653165+ appropriate form.
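As an illustration of the prerequisites listed above, the following Python sketch implements take(n) and uint() together with the head-processing switch from the top of Figure 1. The names Reader and read_head are hypothetical and exist only for this example; handling of additional information 31 (indefinite length or "break") is deliberately left to the caller.

```python
class Reader:
    """Wraps the input bytes and tracks how many have been consumed."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        # take(n) fails if fewer than n bytes remain in the input.
        if self.pos + n > len(self.data):
            raise EOFError("fewer than %d bytes available" % n)
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def uint(b: bytes) -> int:
    # Interpret a byte string in network byte order (big-endian).
    return int.from_bytes(b, "big")


def read_head(r: Reader):
    """Return (major type, additional information, argument value)."""
    ib = uint(r.take(1))
    mt = ib >> 5
    ai = ib & 0x1F
    val = ai
    if 24 <= ai <= 27:
        val = uint(r.take(1 << (ai - 24)))  # 1, 2, 4, or 8 argument bytes
    elif 28 <= ai <= 30:
        raise ValueError("reserved additional information %d" % ai)
    # ai == 31 is returned unchanged; the caller decides what it means.
    return mt, ai, val
```

For example, read_head(Reader(bytes.fromhex("1903e8"))) yields major type 0, additional information 25, and the argument 1000.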
31663166+31673167+ Major types 0 and 1 are designed in such a way that they can be
31683168+ encoded in C from a signed integer without actually doing an if-then-
31693169+ else for positive/negative (Figure 2). This uses the fact that
31703170+ (-1-n), the transformation for major type 1, is the same as ~n
31713171+ (bitwise complement) in C unsigned arithmetic; ~n can then be
31723172+ expressed as (-1)^n for the negative case, while 0^n leaves n
31733173+ unchanged for nonnegative. The sign of a number can be converted to
31743174+ -1 for negative and 0 for nonnegative (0 or positive) by arithmetic-
31753175+ shifting the number by one bit less than the bit length of the number
31763176+ (for example, by 63 for 64-bit numbers).
31773177+31783178+ void encode_sint(int64_t n) {
31793179+   uint64_t ui = n >> 63;    // extend sign to whole length
31803180+   unsigned mt = ui & 0x20;  // extract (shifted) major type
31813181+   ui ^= n;                  // complement negatives
31823182+   if (ui < 24)
31833183+     *p++ = mt + ui;
31843184+   else if (ui < 256) {
31853185+     *p++ = mt + 24;
31863186+     *p++ = ui;
31873187+   } else
31883188+        ...
31893189+31903190+ Figure 2: Pseudocode for Encoding a Signed Integer
31913191+31923192+ See Section 1.2 for some specific assumptions about the profile of
31933193+ the C language used in these pieces of code.
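The sign-extension trick described above can also be tried out in Python, where >> on a negative integer is an arithmetic shift. The sketch below is an illustration only: it restricts the input to the int64_t range used in Figure 2, and the branches for 2-, 4-, and 8-byte arguments are this example's own completion based on the head encodings in Appendix B, not a reproduction of the part elided from Figure 2.

```python
import struct


def encode_sint(n: int) -> bytes:
    """Encode an integer in the int64_t range as major type 0 or 1."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("n does not fit in int64_t")
    sign = n >> 63    # arithmetic shift: -1 for negative, 0 otherwise
    mt = sign & 0x20  # 0x20 (major type 1, shifted) or 0x00 (major type 0)
    ui = sign ^ n     # -1 ^ n == ~n == -1 - n for negatives; 0 ^ n == n
    if ui < 24:
        return bytes([mt + ui])
    if ui < 0x100:
        return bytes([mt + 24, ui])
    if ui < 0x10000:
        return bytes([mt + 25]) + struct.pack("!H", ui)
    if ui < 0x100000000:
        return bytes([mt + 26]) + struct.pack("!I", ui)
    return bytes([mt + 27]) + struct.pack("!Q", ui)
```

With this sketch, encode_sint(-1000).hex() gives "3903e7" and encode_sint(1000).hex() gives "1903e8"; the only difference is the major type bit in the initial byte and the complemented argument.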
31943194+31953195+Appendix D. Half-Precision
31963196+31973197+ As half-precision floating-point numbers were only added to IEEE 754
31983198+ in 2008 [IEEE754], today's programming platforms often still only
31993199+ have limited support for them. It is very easy to include at least
32003200+ decoding support for them even without such support. An example of a
32013201+ small decoder for half-precision floating-point numbers in the C
32023202+ language is shown in Figure 3. A similar program for Python is in
32033203+ Figure 4; this code assumes that the 2-byte value has already been
32043204+ decoded as an (unsigned short) integer in network byte order (as
32053205+ would be done by the pseudocode in Appendix C).
32063206+32073207+ #include <math.h>
32083208+32093209+ double decode_half(unsigned char *halfp) {
32103210+   unsigned half = (halfp[0] << 8) + halfp[1];
32113211+   unsigned exp = (half >> 10) & 0x1f;
32123212+   unsigned mant = half & 0x3ff;
32133213+   double val;
32143214+   if (exp == 0) val = ldexp(mant, -24);
32153215+   else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
32163216+   else val = mant == 0 ? INFINITY : NAN;
32173217+   return half & 0x8000 ? -val : val;
32183218+ }
32193219+32203220+ Figure 3: C Code for a Half-Precision Decoder
32213221+32223222+ import struct
32233223+ from math import ldexp
32243224+32253225+ def decode_single(single):
32263226+     return struct.unpack("!f", struct.pack("!I", single))[0]
32273227+32283228+ def decode_half(half):
32293229+     valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
32303230+     if ((half & 0x7c00) != 0x7c00):
32313231+         return ldexp(decode_single(valu), 112)
32323232+     return decode_single(valu | 0x7f800000)
32333233+32343234+ Figure 4: Python Code for a Half-Precision Decoder
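Where a platform does provide half-precision support, a hand-written decoder can be cross-checked against it. The sketch below is an assumption-laden illustration rather than part of the specification: decode_half_ldexp is a hypothetical Python port of the arithmetic in Figure 3, and decode_half_struct relies on the "e" format character of Python's struct module (available in Python 3.6 and later).

```python
import math
import struct


def decode_half_ldexp(half: int) -> float:
    """Decode a 16-bit integer as IEEE 754 binary16, mirroring Figure 3."""
    exp = (half >> 10) & 0x1F
    mant = half & 0x3FF
    if exp == 0:
        val = math.ldexp(mant, -24)              # zero and subnormals
    elif exp != 31:
        val = math.ldexp(mant + 1024, exp - 25)  # normal numbers
    else:
        val = math.inf if mant == 0 else math.nan
    return -val if half & 0x8000 else val


def decode_half_struct(half: int) -> float:
    """Decode the same 16-bit value using the struct module's "e" format."""
    return struct.unpack("!e", struct.pack("!H", half))[0]
```

Looping over all 65536 possible inputs and comparing the two results (treating any two NaNs as equal) is a cheap way to gain confidence in a decoder of this kind.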
32353235+32363236+Appendix E. Comparison of Other Binary Formats to CBOR's Design
32373237+ Objectives
32383238+32393239+ The proposal for CBOR follows a history of binary formats that is as
32403240+ long as the history of computers themselves. Different formats have
32413241+ had different objectives. In most cases, the objectives of the
32413241+ format were never stated, although they can sometimes be inferred
32423242+ from the context where the format was first used. Some formats were meant
32443244+ to be universally usable, although history has proven that no binary
32453245+ format meets the needs of all protocols and applications.
32463246+32473247+ CBOR differs from many of these formats because it started with a set
32483248+ of objectives and attempts to meet just those. This section
32493249+ compares a few of the dozens of formats with CBOR's objectives in
32503250+ order to help the reader decide if they want to use CBOR or a
32513251+ different format for a particular protocol or application.
32523252+32533253+ Note that the discussion here is not meant to be a criticism of any
32543254+ format: to the best of our knowledge, no format before CBOR was meant
32553255+ to cover CBOR's objectives in the priority we have assigned them. A
32563256+ brief recap of the objectives from Section 1.1 is:
32573257+32583258+ 1. unambiguous encoding of most common data formats from Internet
32593259+ standards
32603260+32613261+ 2. code compactness for encoder or decoder
32623262+32633263+ 3. no schema description needed
32643264+32653265+ 4. reasonably compact serialization
32663266+32673267+ 5. applicability to constrained and unconstrained applications
32683268+32693269+ 6. good JSON conversion
32703270+32713271+ 7. extensibility
32723272+32733273+ A discussion of CBOR and other formats with respect to a different
32743274+ set of design objectives is provided in Section 5 and Appendix C of
32753275+ [RFC8618].
32763276+32773277+E.1. ASN.1 DER, BER, and PER
32783278+32793279+ [ASN.1] has many serializations. In the IETF, DER and BER are the
32803280+ most common. The serialized output is not particularly compact for
32813281+ many items, and the code needed to decode numeric items can be
32823282+ complex on a constrained device.
32833283+32843284+ Few (if any) IETF protocols have adopted one of the several variants
32853285+ of Packed Encoding Rules (PER). There could be many reasons for
32863286+ this, but one that is commonly stated is that PER makes use of the
32873287+ schema even for parsing the surface structure of the data item,
32883288+ requiring significant tool support. There are different versions of
32893289+ the ASN.1 schema language in use, which has also hampered adoption.
32903290+32913291+E.2. MessagePack
32923292+32933293+ [MessagePack] is a concise, widely implemented counted binary
32943294+ serialization format, similar in many properties to CBOR, although
32953295+ somewhat less regular. While the data model can be used to represent
32963296+ JSON data, MessagePack has also been used in many remote procedure
32973297+ call (RPC) applications and for long-term storage of data.
32983298+32993299+ MessagePack has been essentially stable since it was first published
33003300+ around 2011; it has not yet had a transition. The evolution of
33013301+ MessagePack is impeded by an imperative to maintain complete
33013301+ backwards compatibility with existing stored data, while only a few
33023302+ bytecodes remain available for extension. Repeated requests over
33043304+ the years from the MessagePack user community to separate out binary
33053305+ and text strings in the encoding recently have led to an extension
33063306+ proposal that would leave MessagePack's "raw" data ambiguous between
33073307+ its usages for binary and text data. The extension mechanism for
33083308+ MessagePack remains unclear.
33093309+33103310+E.3. BSON
33113311+33123312+ [BSON] is a data format that was developed for the storage of JSON-
33133313+ like maps (JSON objects) in the MongoDB database. Its major
33143314+ distinguishing feature is the capability for in-place update, which
33153315+ prevents a compact representation. BSON uses a counted
33163316+ representation except for map keys, which are null-byte terminated.
33173317+ While BSON can be used for the representation of JSON-like objects on
33183318+ the wire, its specification is dominated by the requirements of the
33183318+ database application and has become somewhat baroque. How BSON
33193319+ extensions will be implemented remains unclear.
33213321+33223322+E.4. MSDTP: RFC 713
33233323+33243324+ Message Services Data Transmission (MSDTP) is a very early example of
33253325+ a compact message format; it is described in [RFC0713], written in
33263326+ 1976. It is included here for its historical value, not because it
33273327+ was ever widely used.
33283328+33293329+E.5. Conciseness on the Wire
33303330+33313331+ While CBOR's design objective of code compactness for encoders and
33323332+ decoders is a higher priority than its objective of conciseness on
33333333+ the wire, many people focus on the wire size. Table 8 shows some
33343334+ encoding examples for the simple nested array [1, [2, 3]]; where some
33353335+ form of indefinite-length encoding is supported by the encoding,
33363336+ [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.
33373337+33383338+ +=============+============================+================+
33393339+ | Format | [1, [2, 3]] | [_ 1, [2, 3]] |
33403340+ +=============+============================+================+
33413341+ | RFC 713 | c2 05 81 c2 02 82 83 | |
33423342+ +-------------+----------------------------+----------------+
33433343+ | ASN.1 BER | 30 0b 02 01 01 30 06 02 01 | 30 80 02 01 01 |
33443344+ | | 02 02 01 03 | 30 06 02 01 02 |
33453345+ | | | 02 01 03 00 00 |
33463346+ +-------------+----------------------------+----------------+
33473347+ | MessagePack | 92 01 92 02 03 | |
33483348+ +-------------+----------------------------+----------------+
33493349+ | BSON | 22 00 00 00 10 30 00 01 00 | |
33503350+ | | 00 00 04 31 00 13 00 00 00 | |
33513351+ | | 10 30 00 02 00 00 00 10 31 | |
33523352+ | | 00 03 00 00 00 00 00 | |
33533353+ +-------------+----------------------------+----------------+
33543354+ | CBOR | 82 01 82 02 03 | 9f 01 82 02 03 |
33553355+ | | | ff |
33563356+ +-------------+----------------------------+----------------+
33573357+33583358+ Table 8: Examples for Different Levels of Conciseness
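The two CBOR entries in Table 8 can be reproduced with a few lines of Python. The function below is a deliberately tiny sketch with a hypothetical name (encode_small); it handles only unsigned integers 0..23 and arrays with fewer than 24 elements, which is all the example needs, and it is not a general encoder.

```python
def encode_small(item, indefinite_outer=False) -> bytes:
    """Encode nested lists of integers 0..23 (enough for [1, [2, 3]])."""
    if isinstance(item, int) and not isinstance(item, bool):
        if not 0 <= item < 24:
            raise ValueError("this sketch handles only 0..23")
        return bytes([item])  # major type 0, value in the initial byte
    if isinstance(item, list):
        # Only the outermost array is ever encoded with indefinite length.
        body = b"".join(encode_small(x) for x in item)
        if indefinite_outer:
            return b"\x9f" + body + b"\xff"  # 0x9f ... "break"
        if len(item) >= 24:
            raise ValueError("this sketch handles fewer than 24 elements")
        return bytes([0x80 + len(item)]) + body  # major type 4, count inline
    raise TypeError("unsupported item")
```

Here encode_small([1, [2, 3]]).hex() gives "8201820203", and passing indefinite_outer=True gives "9f01820203ff", matching the CBOR row of the table.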
33593359+33603360+Appendix F. Well-Formedness Errors and Examples
33613361+33623362+ There are three basic kinds of well-formedness errors that can occur
33633363+ in decoding a CBOR data item:
33643364+33653365+ Too much data: There are input bytes left that were not consumed.
33663366+ This is only an error if the application assumed that the input
33673367+ bytes would span exactly one data item. Where the application
33683368+ uses the self-delimiting nature of CBOR encoding to permit
33693369+ additional data after the data item, as is done in CBOR sequences
33703370+ [RFC8742], for example, the CBOR decoder can simply indicate which
33713371+ part of the input has not been consumed.
33723372+33733373+ Too little data: The input data available would need additional
33743374+ bytes added at their end for a complete CBOR data item. This may
33753375+ indicate the input is truncated; it is also a common error when
33763376+ trying to decode random data as CBOR. For some applications,
33773377+ however, this may not actually be an error, as the application may
33783378+ not be certain it has all the data yet and can obtain or wait for
33793379+ additional input bytes. Some of these applications may have an
33803380+ upper limit for how much additional data can appear; here the
33813381+ decoder may be able to indicate that the encoded CBOR data item
33823382+ cannot be completed within this limit.
33833383+33843384+ Syntax error: The input data are not consistent with the
33853385+ requirements of the CBOR encoding, and this cannot be remedied by
33863386+ adding (or removing) data at the end.
33873387+33883388+ In Appendix C, errors of the first kind are addressed in the first
33893389+ paragraph and bullet list (requiring "no bytes are left"), and errors
33903390+ of the second kind are addressed in the second paragraph/bullet list
33913391+ (failing "if n bytes are no longer available"). Errors of the third
33923392+ kind are identified in the pseudocode by specific instances of
33933393+ calling fail(), in order:
33943394+33953395+ * a reserved value is used for additional information (28, 29, 30)
33963396+33973397+ * major type 7, additional information 24, value < 32 (incorrect)
33983398+33993399+ * incorrect substructure of indefinite-length byte string or text
34003400+ string (may only contain definite-length strings of the same major
34013401+ type)
34023402+34033403+ * "break" stop code (major type 7, additional information 31) occurs
34043404+ in a value position of a map, or anywhere other than directly in
34053405+ an indefinite-length item at a position where another enclosed
34063406+ data item could also occur
34073407+34083408+ * additional information 31 used with major type 0, 1, or 6
34093409+34103410+F.1. Examples of CBOR Data Items That Are Not Well-Formed
34113411+34123412+ This subsection shows a few examples for CBOR data items that are not
34133413+ well-formed. Each example is a sequence of bytes, each shown in
34143414+ hexadecimal; multiple examples in a list are separated by commas.
34153415+34163416+ Examples for well-formedness error kind 1 (too much data) can easily
34173417+ be formed by adding data to a well-formed encoded CBOR data item.
34183418+34193419+ Similarly, examples for well-formedness error kind 2 (too little
34203420+ data) can be formed by truncating a well-formed encoded CBOR data
34213421+ item. In test suites, it may be beneficial to specifically test with
34213421+ incomplete data items that would require large amounts of additional
34223422+ data to be completed (for instance, by starting the encoding of a string of a
34243424+ very large size).
34253425+34263426+ A premature end of the input can occur in a head or within the
34273427+ enclosed data, which may be bare strings or enclosed data items that
34283428+ are either counted or should have been ended by a "break" stop code.
34293429+34303430+ End of input in a head: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 03
34313431+ 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa 00
34323432+ 00, fb 00 00 00
34333433+34343434+ Definite-length strings with short data: 41, 61, 5a ff ff ff ff 00,
34353435+ 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f ff
34363436+ ff ff ff ff ff ff 01 02 03
34373437+34383438+ Definite-length maps and arrays not closed with enough items: 81, 81
34393439+ 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00 00
34403440+34413441+ Tag number not followed by tag content: c0
34423442+34433443+ Indefinite-length strings not closed by a "break" stop code: 5f 41
34443444+ 00, 7f 61 00
34453445+34463446+ Indefinite-length maps and arrays not closed by a "break" stop
34473447+ code: 9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f
34483448+ 9f ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff
34493449+34503450+ A few examples for the five subkinds of well-formedness error kind 3
34513451+ (syntax error) are shown below.
34523452+34533453+ Subkind 1:
34543454+ Reserved additional information values: 1c, 1d, 1e, 3c, 3d, 3e,
34553455+ 5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc,
34563456+ fd, fe
34573457+34583458+ Subkind 2:
34593459+ Reserved two-byte encodings of simple values: f8 00, f8 01, f8
34603460+ 18, f8 1f
34613461+34623462+ Subkind 3:
34633463+ Indefinite-length string chunks not of the correct type: 5f 00
34643464+ ff, 5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f
34653465+ e0 ff, 7f 41 00 ff
34663466+34673467+ Indefinite-length string chunks not definite length: 5f 5f 41 00
34683468+ ff ff, 7f 7f 61 00 ff ff
34693469+34703470+ Subkind 4:
34713471+ Break occurring on its own outside of an indefinite-length
34723472+ item: ff
34733473+34743474+ Break occurring in a definite-length array or map or a tag: 81
34753475+ ff, 82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff,
34763476+ 9f 82 9f 81 9f 9f ff ff ff ff
34773477+34783478+ Break in an indefinite-length map that would lead to an odd
34793479+ number of items (break in a value position): bf 00 ff, bf 00 00
34803480+ 00 ff
34813481+34823482+ Subkind 5:
34833483+ Major type 0, 1, 6 with additional information 31: 1f, 3f, df
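The examples in this subsection lend themselves to a test harness. The following Python sketch ports the pseudocode of Figure 1 and reports which of the three error kinds applies. The function name classify, the two exception classes, and the returned strings are hypothetical choices for this illustration; when an input is both truncated and syntactically wrong, the sketch simply reports whichever problem it encounters first.

```python
class TooLittleData(Exception):
    """Input ended before the data item was complete (error kind 2)."""


class NotWellFormed(Exception):
    """Syntax error that additional input cannot repair (error kind 3)."""


BREAK = -1       # returned for a "break" stop code
INDEFINITE = 99  # returned for an indefinite-length item


def classify(data: bytes) -> str:
    pos = 0

    def take(n):
        nonlocal pos
        if pos + n > len(data):
            raise TooLittleData()
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    def well_formed(breakable=False):
        ib = take(1)[0]
        mt, ai = ib >> 5, ib & 0x1F
        val = ai
        if 24 <= ai <= 27:
            val = int.from_bytes(take(1 << (ai - 24)), "big")
        elif 28 <= ai <= 30:
            raise NotWellFormed("reserved additional information")
        elif ai == 31:
            return well_formed_indefinite(mt, breakable)
        if mt in (2, 3):
            take(val)                      # string content
        elif mt == 4:
            for _ in range(val):
                well_formed()
        elif mt == 5:
            for _ in range(val * 2):
                well_formed()
        elif mt == 6:
            well_formed()                  # one enclosed data item
        elif mt == 7 and ai == 24 and val < 32:
            raise NotWellFormed("bad simple value")
        return mt

    def well_formed_indefinite(mt, breakable):
        if mt in (2, 3):
            while True:
                it = well_formed(True)
                if it == BREAK:
                    break
                if it != mt:               # chunk must be definite, same type
                    raise NotWellFormed("bad string chunk")
        elif mt == 4:
            while well_formed(True) != BREAK:
                pass
        elif mt == 5:
            while well_formed(True) != BREAK:
                well_formed()              # value position: not breakable
        elif mt == 7:
            if breakable:
                return BREAK
            raise NotWellFormed("break outside indefinite-length item")
        else:
            raise NotWellFormed("additional information 31 with this type")
        return INDEFINITE

    try:
        well_formed()
    except TooLittleData:
        return "too little data"
    except NotWellFormed:
        return "syntax error"
    return "well-formed" if pos == len(data) else "too much data"
```

For instance, classify(bytes.fromhex("9f0102")) reports "too little data", while classify(bytes.fromhex("bf00ff")) reports "syntax error". Deeply nested inputs can exceed Python's recursion limit, so a production checker would need an explicit nesting bound.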
34843484+34853485+Appendix G. Changes from RFC 7049
34863486+34873487+ As discussed in the introduction, this document formally obsoletes
34883488+ RFC 7049 while keeping full compatibility with the interchange format
34893489+ from RFC 7049. This document provides editorial improvements, added
34903490+ detail, and fixed errata. This document does not create a new
34913491+ version of the format.
34923492+34933493+G.1. Errata Processing and Clerical Changes
34943494+34953495+ The two verified errata on RFC 7049, [Err3764] and [Err3770],
34963496+ concerned two encoding examples in the text that have been corrected
34973497+ (Section 3.4.3: "29" -> "49", Section 5.5: "0b000_11101" ->
34983498+ "0b000_11001"). Also, RFC 7049 contained an example using the
34993499+ numeric value 24 for a simple value [Err5917], which is not well-
35003500+ formed; this example has been removed. Errata report 5763 [Err5763]
35013501+ pointed to an error in the wording of the definition of tags; this
35023502+ was resolved during a rewrite of Section 3.4. Errata report 5434
35033503+ [Err5434] pointed out that the Universal Binary JSON (UBJSON) example
35043504+ in Appendix E no longer complied with the version of UBJSON current
35053505+ at the time of the errata report submission. It turned out that the
35063506+ UBJSON specification had completely changed since 2013; this example
35073507+ therefore was removed. Other errata reports [Err4409] [Err4963]
35083508+ [Err4964] complained that the map key sorting rules for canonical
35093509+ encoding were onerous; these led to a reconsideration of the
35103510+ canonical encoding suggestions and replacement by the deterministic
35113511+ encoding suggestions (described below). An editorial suggestion in
35123512+ errata report 4294 [Err4294] was also implemented (improved symmetry
35133513+ by adding "Second value" to a comment to the last example in
35143514+ Section 3.2.2).
35153515+35163516+ Other clerical changes include:
35173517+35183518+ * the use of new xml2rfc functionality [RFC7991];
35193519+35203520+ * more explanation of the notation used;
35213521+35223522+ * the update of references, e.g., from RFC 4627 to [RFC8259], from
35233523+ CNN-TERMS to [RFC7228], and from the 5.1 edition to the 11th
35243524+ edition of [ECMA262]; the addition of a reference to [IEEE754] and
35253525+ importation of required definitions; the addition of references to
35263526+ [C] and [Cplusplus20]; and the addition of a reference to
35273527+ [RFC8618] that further illustrates the discussion in Appendix E;
35283528+35293529+ * in the discussion of diagnostic notation (Section 8), the
35303530+ "Extended Diagnostic Notation" (EDN) defined in [RFC8610] is now
35313531+ mentioned, the gap in representing NaN payloads is now
35323532+ highlighted, and an explanation of representing indefinite-length
35333533+ strings with no chunks has been added (Section 8.1);
35343534+35353535+ * the addition of this appendix.
35363536+35373537+G.2. Changes in IANA Considerations
35383538+35393539+ The IANA considerations were generally updated (clerical changes,
35403540+ e.g., now pointing to the CBOR Working Group as the author of the
35413541+ specification). References to the respective IANA registries were
35423542+ added to the informative references.
35433543+35443544+ In the "Concise Binary Object Representation (CBOR) Tags" registry
35453545+ [IANA.cbor-tags], tags in the space from 256 to 32767 (lower half of
35463546+ "1+2") are no longer assigned by First Come First Served; this range
35473547+ is now Specification Required.
35483548+35493549+G.3. Changes in Suggestions and Other Informational Components
35503550+35513551+ While revising the document, beyond the addressing of the errata
35523552+ reports, the working group drew upon nearly seven years of experience
35533553+ with CBOR in a diverse set of applications. This led to a number of
35543554+ editorial changes, including adding tables for illustration, but also
35553555+ emphasizing some aspects and de-emphasizing others.
35563556+35573557+ A significant addition is Section 2, which discusses the CBOR data
35583558+ model and its small variations involved in the processing of CBOR.
35593559+ The introduction of terms for those variations (basic generic,
35603560+ extended generic, specific) enables more concise language in other
35613561+ places of the document and also helps to clarify expectations of
35623562+ implementations and of the extensibility features of the format.
35633563+35643564+ As a format derived from the JSON ecosystem, RFC 7049 was influenced
35653565+ by the JSON number system that was in turn inherited from JavaScript
35663566+ at the time. JSON does not provide distinct integers and floating-
35673567+ point values (and the latter are decimal in the format). CBOR
35683568+ provides binary representations of numbers, which do differ between
35693569+ integers and floating-point values. Experience from implementation
35703570+ and use suggested that the separation between these two number
35713571+ domains should be more clearly drawn in the document; language that
35723572+ suggested an integer could seamlessly stand in for a floating-point
35733573+ value was removed. Also, a suggestion (based on I-JSON [RFC7493])
35743574+ was added for handling these types when converting JSON to CBOR, and
35753575+ the use of a specific rounding mechanism has been recommended.
35763576+35773577+ For a single value in the data model, CBOR often provides multiple
35783578+ encoding options. A new section (Section 4) introduces the term
35793579+ "preferred serialization" (Section 4.1) and defines it for various
35803580+ kinds of data items. On the basis of this terminology, the section
35813581+ then discusses how a CBOR-based protocol can define "deterministic
35823582+ encoding" (Section 4.2), which avoids the terms "canonical" and
35833583+ "canonicalization" from RFC 7049. The suggestion of "Core
35843584+ Deterministic Encoding Requirements" (Section 4.2.1) enables generic
35853585+ support for such protocol-defined encoding requirements. This
35863586+ document further eases the implementation of deterministic encoding
35873587+ by simplifying the map ordering suggested in RFC 7049 to a simple
35883588+ lexicographic ordering of encoded keys. A description of the older
35893589+ suggestion is kept as an alternative, now termed "length-first map
35903590+ key ordering" (Section 4.2.3).
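The difference between the two map key orderings mentioned above can be sketched in Python, whose bytes objects already compare lexicographically byte by byte. The key set below is an illustrative assumption (the encoded forms of 10, 100, -1, "z", "aa", [100], [-1], and false), and the function names are hypothetical.

```python
# Encoded forms of the keys 10, 100, -1, "z", "aa", [100], [-1], false.
ENCODED_KEYS = [bytes.fromhex(h) for h in
                ("f4", "8120", "811864", "626161", "617a", "20", "1864", "0a")]


def bytewise_order(keys):
    """Bytewise lexicographic ordering of the encoded keys."""
    return sorted(keys)


def length_first_order(keys):
    """Shorter encoded keys first; bytewise order among equal lengths."""
    return sorted(keys, key=lambda k: (len(k), k))


if __name__ == "__main__":
    print([k.hex() for k in bytewise_order(ENCODED_KEYS)])
    print([k.hex() for k in length_first_order(ENCODED_KEYS)])
```

The two orderings agree for keys of equal encoded length and differ otherwise: under bytewise ordering 0x1864 sorts before 0x20, whereas under length-first ordering the one-byte key 0x20 comes first.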
35913591+35923592+ The terminology for well-formed and valid data was sharpened and more
35933593+ stringently used, avoiding less well-defined alternative terms such
35943594+ as "syntax error", "decoding error", and "strict mode" outside of
35953595+ examples. Also, a third level of requirements that an application
35963596+ has on its input data beyond CBOR-level validity is now explicitly
35973597+ called out. Well-formed (processable at all), valid (checked by a
35983598+ validity-checking generic decoder), and expected input (as checked by
35993599+ the application) are treated as a hierarchy of layers of
36003600+ acceptability.

   The handling of non-well-formed simple values was clarified in text
   and pseudocode. Appendix F was added to discuss well-formedness
   errors and provide examples for them. The pseudocode was updated to
   be more portable, and some portability considerations were added.
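
   One such well-formedness rule concerns the two-byte encoding of
   simple values: a sequence that starts with 0xf8 and continues with a
   byte less than 0x20 is not well-formed. A minimal Python sketch of
   that check follows; the function name is illustrative, and the
   function returns False for anything that is not a one- or two-byte
   simple value encoding.

```python
def is_well_formed_simple(data: bytes) -> bool:
    """Check a one- or two-byte encoding of a simple value (major type 7)."""
    if not data or data[0] >> 5 != 7:
        return False
    info = data[0] & 0x1F
    if info < 24:
        # The simple value is carried in the initial byte itself.
        return len(data) == 1
    if info == 24:
        # Two-byte form: the following byte must be at least 32.
        return len(data) == 2 and data[1] >= 32
    # 25..27 are floating-point values, 28..30 are reserved, and 31 is
    # the "break" stop code; none of these is a simple value.
    return False
```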

   The discussion of validity has been sharpened in two areas. Map
   validity (handling of duplicate keys) was clarified, and the domain
   of applicability of certain implementation choices was explained.
   Also, while streamlining the terminology for tags, tag numbers, and
   tag content, discussion was added on tag validity, and the
   restrictions were clarified on tag content, in general and
   specifically for tag 1.
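
   A duplicate-key check for a map can be sketched in Python as shown
   below. The sketch assumes that keys are compared by their encoded
   bytes and that the encoder used a single serialization form for each
   value (for example, preferred serialization); under that assumption,
   the integer 1 (0x01) and the floating-point value 1.0 (0xf93c00) are
   distinct keys. The function name is illustrative.

```python
def find_duplicate_keys(encoded_keys):
    """Return encoded keys that occur more than once, in first-seen order."""
    seen = set()
    duplicates = []
    for key in encoded_keys:
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

   A validity-checking decoder could treat a non-empty result as a
   validity error for the map; an application that considers some
   values equivalent would need its own comparison instead.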

   An implementation note (and note for future tag definitions) was
   added to Section 3.4 about defining tags with semantics that depend
   on serialization order.

   Tag 35 is not defined by this document; the registration based on the
   definition in RFC 7049 remains in place.

   Terminology was introduced in Section 3 for "argument" and "head",
   simplifying further discussion.
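
   To make the two terms concrete, the following Python sketch splits
   the head of an encoded data item into its major type and argument.
   The function name and return shape are assumptions for this example,
   and the sketch does not check that additional information 31 is
   used only with major types that permit it.

```python
def decode_head(data: bytes):
    """Return (major_type, argument, head_length) for the item at data[0]."""
    initial = data[0]
    major, info = initial >> 5, initial & 0x1F
    if info < 24:
        # The argument is the additional information itself.
        return major, info, 1
    if info <= 27:
        n = 1 << (info - 24)  # 1, 2, 4, or 8 following bytes
        if len(data) < 1 + n:
            raise ValueError("truncated head")
        return major, int.from_bytes(data[1:1 + n], "big"), 1 + n
    if info == 31:
        # Indefinite length or "break": no argument is derived.
        return major, None, 1
    raise ValueError("reserved additional information value")
```

   For example, the bytes 0x18 0x64 form a two-byte head with major
   type 0 and argument 100, while 0x62 is a one-byte head with major
   type 3 and argument 2 (the length of the text string that follows).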

   The security considerations (Section 10) were mostly rewritten and
   significantly expanded; in multiple other places, the document is now
   more explicit that a decoder cannot simply condone well-formedness
   errors.

Acknowledgements

   CBOR was inspired by MessagePack. MessagePack was developed and
   promoted by Sadayuki Furuhashi ("frsyuki"). This reference to
   MessagePack is solely for attribution; CBOR is not intended as a
   version of, or replacement for, MessagePack, as it has different
   design goals and requirements.

   The need for functionality beyond the original MessagePack
   specification became obvious to many people at about the same time
   around the year 2012. BinaryPack is a minor derivation of
   MessagePack that was developed by Eric Zhang for the binaryjs
   project. A similar, but different, extension was made by Tim Caswell
   for his msgpack-js and msgpack-js-browser projects. Many people have
   contributed to the discussion about extending MessagePack to separate
   text string representation from byte string representation.

   The encoding of the additional information in CBOR was inspired by
   the encoding of length information designed by Klaus Hartke for CoAP.

   This document also incorporates suggestions made by many people,
   notably Dan Frost, James Manger, Jeffrey Yasskin, Joe Hildebrand,
   Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael
   Richardson, Nico Williams, Peter Occil, Phillip Hallam-Baker, Ray
   Polk, Stuart Cheshire, Tim Bray, Tony Finch, Tony Hansen, and Yaron
   Sheffer. Benjamin Kaduk provided an extensive review during IESG
   processing. Éric Vyncke, Erik Kline, Robert Wilton, and Roman Danyliw
   provided further IESG comments, which included an IoT directorate
   review by Eve Schooler.

Authors' Addresses

   Carsten Bormann
   Universität Bremen TZI
   Postfach 330440
   D-28359 Bremen
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org


   Paul Hoffman
   ICANN

   Email: paul.hoffman@icann.org