OCaml Zarr jsont codecs for v2/v3 and common conventions

docs: expand ocamldoc with spec links and standalone descriptions

Every type, module, and accessor now has documentation that includes:
- Links to the relevant zarr-specs section on readthedocs
- Links to convention repositories on GitHub
- Enough spec detail to understand the type without reading the spec
- JSON encoding details (what JSON forms each type accepts)
- Field semantics (units, defaults, valid ranges)
- Section headers grouping related types
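
As an illustration of the kind of JSON encoding detail now documented,
the simple NumPy typestr form described in the dtype docs (a byte-order
character, a one-character type code, and a size in bytes) could be
split apart roughly as below. This is a standalone sketch, not the
library's implementation: parse_typestr is a hypothetical helper, and it
deliberately rejects unit-suffixed forms such as "<M8[ns]".

```ocaml
(* Byte order, mirroring the endian type documented in the .mli. *)
type endian = [ `Little | `Big | `Not_applicable ]

(* Hypothetical helper (not part of Zarr_jsont): split a simple typestr
   such as "<f8" into (byte order, type code, size in bytes).
   Returns None for anything that is not of that simple shape. *)
let parse_typestr (s : string) : (endian * char * int) option =
  if String.length s < 3 then None
  else
    let endian =
      match s.[0] with
      | '<' -> Some `Little
      | '>' -> Some `Big
      | '|' | '=' -> Some `Not_applicable
      | _ -> None
    in
    (* Everything after the type code must be a positive integer size. *)
    let size = int_of_string_opt (String.sub s 2 (String.length s - 2)) in
    match endian, size with
    | Some e, Some n when n > 0 -> Some (e, s.[1], n)
    | _ -> None
```

The result keeps the raw type code character rather than mapping it to
a dtype variant, so the sketch stays independent of the real codec.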

The .mli module doc includes quick-start examples for both decoding
and store probing.
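
The idea behind store probing can be sketched without the library. The
file names below come from the V2 and V3 module docs in this diff
(zarr.json for v3; .zarray and .zgroup for v2). Everything else is an
assumption made for illustration: the format type, the detect_format
helper, and a read callback returning string option are hypothetical,
and the real Zarr_jsont.probe may work differently.

```ocaml
(* Hypothetical classification of what a store directory contains. *)
type format = V3 | V2_array | V2_group | Unknown

(* Hypothetical helper (not Zarr_jsont.probe): decide which metadata
   layout is present by checking which well-known files can be read.
   [read] is assumed to return None when a file does not exist. *)
let detect_format ~(read : string -> string option) (base : string) : format =
  let path name =
    if base = "." || base = "" then name else base ^ "/" ^ name
  in
  let exists name = Option.is_some (read (path name)) in
  if exists "zarr.json" then V3
  else if exists ".zarray" then V2_array
  else if exists ".zgroup" then V2_group
  else Unknown

(* Usage with an in-memory association list standing in for a store. *)
let example_store = [ ("a/.zarray", "{}"); ("g/.zgroup", "{}") ]
let example_read p = List.assoc_opt p example_store
let example = detect_format ~read:example_read "a"
```

Passing the reader as a callback keeps the sketch independent of the
file system, which matches the shape of the quick-start example where
the caller supplies read.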

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

+635 -163
src/zarr_jsont.mli
··· 1 - (** Jsont codecs for Zarr v2 and v3 metadata. *) 1 + (** Type-safe bidirectional JSON codecs for 2 + {{:https://zarr-specs.readthedocs.io/en/latest/}Zarr} v2 and v3 3 + array and group metadata. 4 + 5 + This library provides {!Jsont.t} codecs for decoding and encoding the 6 + JSON metadata used by 7 + {{:https://zarr-specs.readthedocs.io/en/latest/v2/core.html}Zarr v2} 8 + and {{:https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html}Zarr v3} 9 + stores. It also includes best-effort codecs for several 10 + {{:https://github.com/zarr-conventions}zarr conventions}: geo-proj, 11 + spatial, multiscales, and geoembeddings. 12 + 13 + All types use polymorphic variants. Known extensions and codecs are 14 + enumerated with an [`Other] escape hatch for unrecognised values. 15 + Unknown JSON fields are preserved throughout via 16 + {!Jsont.Object.keep_unknown} so that round-tripping is faithful. 17 + 18 + {2 Quick start} 19 + 20 + Decode any Zarr metadata blob: 21 + {[ 22 + match Jsont_bytesrw.decode_string Zarr_jsont.jsont json_string with 23 + | Ok (`V2 node) -> (* V2 array or group *) 24 + | Ok (`V3 node) -> (* V3 array or group *) 25 + | Error e -> (* decode error *) 26 + ]} 27 + 28 + Probe a store directory: 29 + {[ 30 + let read relpath = (* read file at base/relpath *) in 31 + match Zarr_jsont.probe ~read "." with 32 + | Ok { node; children; _ } -> (* decoded hierarchy *) 33 + | Error msg -> (* probe failed *) 34 + ]} *) 2 35 3 36 [@@@ai_disclosure "ai-assisted"] 4 37 [@@@ai_model "claude-opus-4"] 5 38 [@@@ai_provider "Anthropic"] 6 39 7 - (** Fill value that encodes the value stored in unwritten or missing chunks. *) 40 + (** {1:shared Shared types} 41 + 42 + Types used by both Zarr v2 and v3 metadata. *) 43 + 44 + (** The value stored in uninitialized or missing chunks. 
45 + 46 + In JSON, fill values appear as multiple sorts: 47 + - [null] for absent fill values 48 + - [true] / [false] for booleans 49 + - numbers for integers and finite floats 50 + - ["NaN"], ["Infinity"], ["-Infinity"] strings for non-finite floats 51 + - ["0x..."] hex strings for float bit patterns (v3) 52 + - two-element arrays [[r, i]] for complex numbers (v3) 53 + - integer arrays [[0, 255, ...]] for raw bytes (v3) 54 + 55 + See 56 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#fill-value}v3 57 + fill value} and 58 + {{:https://zarr-specs.readthedocs.io/en/latest/v2/core.html#fill-value}v2 59 + fill value}. *) 8 60 type fill_value = [ 9 61 | `Null 10 62 | `Bool of bool ··· 15 67 ] 16 68 17 69 val fill_value_jsont : fill_value Jsont.t 18 - (** Codec for {!fill_value}. Dispatches on the JSON sort via {!Jsont.any}. *) 70 + (** Codec for {!fill_value}. Dispatches on the JSON sort via 71 + {!Jsont.any}: null, boolean, number, string, and array are each 72 + routed to a specialised sub-codec. *) 19 73 20 - (** Byte order of a NumPy array dtype. *) 74 + (** Byte order indicator from a NumPy dtype typestr. 75 + - [`Little] — ['<'] little-endian 76 + - [`Big] — ['>'] big-endian 77 + - [`Not_applicable] — ['|'] or ['='], used for single-byte types *) 21 78 type endian = [ `Little | `Big | `Not_applicable ] 22 79 23 - (** NumPy array dtype as used in Zarr v2 array metadata [".zarray"]. 80 + (** NumPy array dtype as used in Zarr v2 81 + {{:https://zarr-specs.readthedocs.io/en/latest/v2/core.html#data-type-encoding} 82 + array metadata}. 24 83 25 - Simple types are encoded as JSON strings in NumPy typestr format (e.g. 26 - ["<f8"], ["|b1"]). Structured types are encoded as JSON arrays of field 27 - descriptors, each of the form [["name","<dtype_str"]] or 28 - [["name","<dtype_str",[dim1,...]]]. 
*) 84 + Simple types are encoded as JSON strings in 85 + {{:https://numpy.org/doc/stable/reference/arrays.interface.html#arrays-interface} 86 + NumPy typestr} format: a one-character byte-order indicator (['<'], 87 + ['>'], or ['|']), a one-character type code, and a size in bytes. 88 + For example: 89 + - ["<f8"] — little-endian 64-bit float 90 + - ["|b1"] — boolean 91 + - ["<M8[ns]"] — little-endian datetime with nanosecond units 92 + 93 + Structured (compound) types are encoded as JSON arrays of field 94 + descriptors: [[\["name","<f4"\], \["name","<i2",\[3\]\]]]. Each 95 + field has a name, a dtype (which may itself be structured), and an 96 + optional shape. *) 29 97 type dtype = [ 30 98 | `Bool 31 - | `Int of endian * int 32 - | `Uint of endian * int 33 - | `Float of endian * int 34 - | `Complex of endian * int 35 - | `Timedelta of endian * string 36 - | `Datetime of endian * string 37 - | `String of int 38 - | `Unicode of endian * int 39 - | `Raw of int 99 + | `Int of endian * int (** Signed integer; size in bytes. *) 100 + | `Uint of endian * int (** Unsigned integer; size in bytes. *) 101 + | `Float of endian * int (** IEEE float; size in bytes. *) 102 + | `Complex of endian * int (** Complex float pair; total size in bytes. *) 103 + | `Timedelta of endian * string (** Time delta; unit string (e.g. ["s"]). *) 104 + | `Datetime of endian * string (** Datetime; unit string (e.g. ["ns"]). *) 105 + | `String of int (** Fixed-length byte string; size in bytes. *) 106 + | `Unicode of endian * int (** Fixed-length unicode; number of characters. *) 107 + | `Raw of int (** Void / raw bytes; size in bytes. *) 40 108 | `Structured of (string * dtype * int list option) list 109 + (** Compound type: list of [(field_name, field_dtype, optional_shape)]. *) 41 110 ] 42 111 43 112 val dtype_jsont : dtype Jsont.t 44 - (** Codec for {!dtype}. Simple types decode/encode as JSON strings; 45 - structured types decode/encode as JSON arrays. *) 113 + (** Codec for {!dtype}. 
Simple types round-trip through JSON strings; 114 + structured types round-trip through JSON arrays. The codec is 115 + recursive: structured field dtypes may themselves be structured. *) 116 + 117 + (** {1:escape Escape-hatch types} 118 + 119 + Catch-all types for unrecognised codecs and extensions, ensuring 120 + that unknown values are preserved and round-tripped faithfully. *) 46 121 47 - (** Catch-all type for unrecognized v2 codecs. 122 + (** Unrecognised Zarr v2 codec (compressor or filter). 48 123 49 - Represents objects with an ["id"] key plus arbitrary extra fields, 50 - for example [{"id":"custom_codec","param1":42}]. Unknown fields are 51 - preserved and round-tripped faithfully. *) 124 + V2 codecs are JSON objects with an ["id"] key naming the codec and 125 + arbitrary additional configuration fields. For example: 126 + {[{"id": "custom_codec", "param1": 42, "param2": "hello"}]} 127 + 128 + Unknown fields are captured via {!Jsont.Object.keep_unknown} and 129 + preserved on re-encoding. *) 52 130 module Other_codec : sig 53 131 type t 132 + 54 133 val name : t -> string 134 + (** The ["id"] value identifying this codec. *) 135 + 55 136 val configuration : t -> Jsont.json 137 + (** All fields other than ["id"], as a JSON object. *) 138 + 56 139 val make : string -> Jsont.json -> t 57 140 val jsont : t Jsont.t 58 141 end 59 142 60 - (** Catch-all type for unrecognized v3 extensions. 143 + (** Unrecognised Zarr v3 extension point. 144 + 145 + V3 uses a uniform extension format for data types, chunk grids, chunk 146 + key encodings, codecs, and storage transformers. 
Each is a JSON 147 + object with: 148 + - ["name"] — extension identifier 149 + - ["configuration"] — optional configuration object 150 + - ["must_understand"] — boolean (default [true]); if [false], 151 + implementations may safely ignore the extension 61 152 62 - Represents objects with a ["name"] key, an optional ["configuration"] 63 - object, and a ["must_understand"] boolean (defaults to [true] when 64 - absent; omitted on encode when [true]). Unknown fields are skipped. *) 153 + See 154 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#extension-points} 155 + v3 extension points}. *) 65 156 module Other_ext : sig 66 157 type t 158 + 67 159 val name : t -> string 160 + (** The extension identifier. *) 161 + 68 162 val configuration : t -> Jsont.json option 163 + (** The extension configuration, if any. *) 164 + 69 165 val must_understand : t -> bool 166 + (** Whether implementations must understand this extension. Defaults 167 + to [true] when absent in JSON; omitted on encode when [true]. *) 168 + 70 169 val make : string -> Jsont.json option -> bool -> t 71 170 val jsont : t Jsont.t 72 171 end 73 172 74 - (** Zarr v3 codec types. *) 173 + (** {1:v3 Zarr v3} 174 + 175 + Types and codecs for 176 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html} 177 + Zarr v3} metadata. V3 stores a single [zarr.json] file per node 178 + (array or group), with the ["node_type"] field distinguishing the 179 + two. *) 75 180 module V3 : sig 76 - (** Typed sub-codecs for known v3 codecs. *) 181 + 182 + (** {2 Codecs} 183 + 184 + A v3 array's data is processed through a 185 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#codecs} 186 + codec chain}: zero or more array-to-array codecs, then exactly one 187 + array-to-bytes codec, then zero or more bytes-to-bytes codecs. 188 + Each codec is a JSON object with ["name"] and optional 189 + ["configuration"]. *) 77 190 module Codec : sig 78 - (** Bytes codec configuration. 
*) 191 + 192 + (** The 193 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/codecs/bytes/index.html} 194 + bytes} codec (array-to-bytes). Serialises array elements in 195 + lexicographic (C) order. The [endian] field specifies byte order; 196 + it is omitted for single-byte data types. *) 79 197 module Bytes : sig 80 198 type t 81 199 val endian : t -> [ `Little | `Big ] option 82 - (** [None] when endian is not applicable (single-byte data types). *) 200 + (** [None] when endian is not applicable (single-byte data types 201 + such as [bool] or [uint8]). *) 83 202 end 84 203 85 - (** Gzip codec configuration. *) 204 + (** The 205 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/codecs/gzip/index.html} 206 + gzip} codec (bytes-to-bytes). Compresses data using the gzip 207 + algorithm. *) 86 208 module Gzip : sig 87 209 type t 88 210 val level : t -> int 211 + (** Compression level, 0 (no compression) to 9 (best). *) 89 212 end 90 213 91 - (** Blosc codec configuration. *) 214 + (** The 215 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/codecs/blosc/index.html} 216 + blosc} codec (bytes-to-bytes). Meta-compressor that delegates 217 + to one of several compression libraries. *) 92 218 module Blosc : sig 93 219 type t 94 220 val cname : t -> string 221 + (** Compressor name: ["lz4"], ["lz4hc"], ["blosclz"], ["zstd"], 222 + ["snappy"], or ["zlib"]. *) 223 + 95 224 val clevel : t -> int 225 + (** Compression level, 0–9. *) 226 + 96 227 val shuffle : t -> [ `Noshuffle | `Shuffle | `Bitshuffle ] 228 + (** Shuffle mode applied before compression. *) 229 + 97 230 val typesize : t -> int option 231 + (** Element size in bytes for shuffling. [None] when shuffle is 232 + [`Noshuffle]. *) 233 + 98 234 val blocksize : t -> int 235 + (** Internal block size; 0 means automatic. *) 99 236 end 100 237 101 - (** Transpose codec configuration. 
*) 238 + (** The 239 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/codecs/transpose/index.html} 240 + transpose} codec (array-to-array). Permutes array dimensions. *) 102 241 module Transpose : sig 103 242 type t 104 243 val order : t -> int list 244 + (** Dimension permutation, e.g. [\[1; 0; 2\]]. *) 105 245 end 106 246 107 - (** Sharding codec configuration. *) 247 + (** The 248 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/codecs/sharding-indexed/index.html} 249 + sharding} codec (array-to-bytes). Bundles multiple inner chunks 250 + into a single shard with an embedded index for random access. 251 + The codec is recursive: inner chunks have their own codec chain. *) 108 252 module Sharding : sig 109 253 type t 110 254 and codec = [ ··· 117 261 | `Other of Other_ext.t 118 262 ] 119 263 val chunk_shape : t -> int list 264 + (** Shape of inner chunks within each shard. *) 265 + 120 266 val codecs : t -> codec list 267 + (** Codec chain applied to each inner chunk. *) 268 + 121 269 val index_codecs : t -> codec list 270 + (** Codec chain for the shard index. Empty list if unspecified. *) 271 + 122 272 val index_location : t -> [ `Start | `End ] 273 + (** Where the shard index is stored. Default [`End]. *) 123 274 end 124 275 end 125 276 126 - (** A v3 codec: bytes, gzip, blosc, crc32c, transpose, sharding, or a catch-all. *) 277 + (** A v3 codec pipeline entry. The core specification defines six 278 + codecs; unrecognised codecs decode as [`Other]. 279 + 280 + - [`Bytes] — array-to-bytes serialisation 281 + - [`Gzip] — gzip compression 282 + - [`Blosc] — blosc meta-compression 283 + - [`Crc32c] — CRC-32C checksum (no configuration) 284 + - [`Transpose] — dimension permutation 285 + - [`Sharding] — sharding with inner chunks *) 127 286 type codec = Codec.Sharding.codec 128 287 129 288 val codec_jsont : codec Jsont.t 130 - (** Codec for {!codec}. Dispatches on the ["name"] field. 131 - Sharding codecs are decoded recursively. 
*) 289 + (** Codec for {!codec}. Dispatches on the ["name"] field. 290 + Sharding codecs are decoded recursively so that inner codec 291 + chains are fully typed. *) 292 + 293 + (** V3 294 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#data-type} 295 + data type}. Core types are encoded as JSON strings; extension 296 + types as JSON objects with ["name"] and ["configuration"]. 132 297 133 - (** V3 data type. Either a core named type (string) or an extension (object). *) 298 + The fifteen core types are: 299 + - Booleans: [bool] 300 + - Signed integers: [int8], [int16], [int32], [int64] 301 + - Unsigned integers: [uint8], [uint16], [uint32], [uint64] 302 + - Floats: [float16], [float32], [float64] 303 + - Complex: [complex64], [complex128] 304 + 305 + Raw bit types ([r8], [r16], etc.) use the ["r<bits>"] pattern. *) 134 306 module Data_type : sig 135 307 type t = [ 136 308 | `Bool | `Int8 | `Int16 | `Int32 | `Int64 137 309 | `Uint8 | `Uint16 | `Uint32 | `Uint64 138 310 | `Float16 | `Float32 | `Float64 139 311 | `Complex64 | `Complex128 140 - | `Raw of int 141 - | `Other of Other_ext.t 312 + | `Raw of int (** Raw bits; the integer is the bit width. *) 313 + | `Other of Other_ext.t (** Extension data type. *) 142 314 ] 143 315 end 144 316 145 317 val data_type_jsont : Data_type.t Jsont.t 146 - (** Codec for {!Data_type.t}. Core types decode from JSON strings; 147 - extension types decode from JSON objects. The [r<bits>] pattern 318 + (** Codec for {!Data_type.t}. Core types decode from JSON strings; 319 + extension types from JSON objects. The ["r<bits>"] pattern 148 320 decodes as [`Raw bits]. *) 149 321 150 - (** V3 chunk grid specification. *) 322 + (** V3 323 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#chunk-grid} 324 + chunk grid}. Defines how the array is divided into chunks. 325 + The only core grid is ["regular"] (fixed-shape chunks). *) 151 326 module Chunk_grid : sig 152 - (** Regular (fixed-shape) chunk grid. 
*) 327 + (** Regular chunk grid: every chunk has the same shape. *) 153 328 module Regular : sig 154 329 type t 155 330 val chunk_shape : t -> int list 331 + (** Dimensions of each chunk, e.g. [\[1000; 100\]]. *) 156 332 end 157 333 158 334 type t = [ `Regular of Regular.t | `Other of Other_ext.t ] 159 335 end 160 336 161 337 val chunk_grid_jsont : Chunk_grid.t Jsont.t 162 - (** Codec for {!Chunk_grid.t}. Dispatches on the ["name"] field. *) 338 + (** Codec for {!Chunk_grid.t}. Dispatches on the ["name"] field. *) 163 339 164 - (** V3 chunk key encoding specification. *) 340 + (** V3 341 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#chunk-key-encoding} 342 + chunk key encoding}. Maps chunk grid coordinates to store keys. 343 + The only core encoding is ["default"]. *) 165 344 module Chunk_key_encoding : sig 166 - (** Default chunk key encoding with configurable separator. *) 345 + (** Default key encoding. Chunk at grid position [(1, 2, 3)] is 346 + stored at key ["c/1/2/3"] (with ["/"]) or ["c.1.2.3"] (with 347 + ["."]). The prefix ["c"] distinguishes chunk keys from 348 + metadata keys. *) 167 349 module Default : sig 168 350 type t 169 351 val separator : t -> [ `Slash | `Dot ] 352 + (** The separator between coordinate components. 353 + Default is [`Slash]. *) 170 354 end 171 355 172 356 type t = [ `Default of Default.t | `Other of Other_ext.t ] 173 357 end 174 358 175 359 val chunk_key_encoding_jsont : Chunk_key_encoding.t Jsont.t 176 - (** Codec for {!Chunk_key_encoding.t}. Dispatches on the ["name"] field. *) 360 + (** Codec for {!Chunk_key_encoding.t}. Dispatches on the ["name"] 361 + field. *) 362 + 363 + (** Complete v3 364 + {{:https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#array-metadata} 365 + array metadata} as stored in [zarr.json]. 177 366 178 - (** Complete v3 array metadata. *) 367 + Required fields: [shape], [data_type], [chunk_grid], 368 + [chunk_key_encoding], [codecs], [fill_value]. 
369 + 370 + Optional fields: [dimension_names] (named dimensions, with [null] 371 + entries for unnamed ones), [storage_transformers], [attributes]. *) 179 372 module Array_meta : sig 180 373 type t 374 + 181 375 val shape : t -> int list 376 + (** Array dimensions, e.g. [\[10000; 1000\]]. *) 377 + 182 378 val data_type : t -> Data_type.t 379 + (** Element data type. *) 380 + 183 381 val chunk_grid : t -> Chunk_grid.t 382 + (** How the array is divided into chunks. *) 383 + 184 384 val chunk_key_encoding : t -> Chunk_key_encoding.t 385 + (** How chunk coordinates map to store keys. *) 386 + 185 387 val codecs : t -> codec list 388 + (** The codec pipeline applied to each chunk. *) 389 + 186 390 val fill_value : t -> fill_value 391 + (** Default value for uninitialised chunks. *) 392 + 187 393 val dimension_names : t -> string option list option 394 + (** Named dimensions. [None] if absent; [Some \[Some "x"; None\]] 395 + if partially named. *) 396 + 188 397 val storage_transformers : t -> Other_ext.t list option 398 + (** Optional storage transformer extensions. *) 399 + 189 400 val unknown : t -> Jsont.json 401 + (** Any additional JSON fields not consumed by this codec. *) 190 402 end 191 403 192 404 val array_meta_jsont : Array_meta.t Jsont.t 193 - (** Codec for {!Array_meta.t}. Decodes and encodes the full v3 array 194 - metadata object. The ["zarr_format"] and ["node_type"] fields are 195 - consumed on decode and always written as [3] / ["array"] on encode. 196 - Dimension names may be absent or contain null entries. 197 - Unknown fields are preserved. *) 405 + (** Codec for {!Array_meta.t}. Consumes ["zarr_format"] and 406 + ["node_type"] on decode (always writes [3] / ["array"] on encode). 407 + The ["attributes"] sub-object is consumed but decoded separately 408 + at the {!V3_node} level. *) 198 409 end 199 410 200 - (** Zarr v2 compressor and filter codecs. 
*) 411 + (** {1:v2 Zarr v2} 412 + 413 + Types and codecs for 414 + {{:https://zarr-specs.readthedocs.io/en/latest/v2/core.html}Zarr v2} 415 + metadata. V2 uses three separate files: [.zarray] (array metadata), 416 + [.zgroup] (group metadata, just [\{"zarr_format": 2\}]), and 417 + [.zattrs] (user attributes). *) 201 418 module V2 : sig 202 - (** Typed sub-codecs for known v2 compressors. *) 419 + 420 + (** {2 Compressors} 421 + 422 + V2 compressors appear in the ["compressor"] field of [.zarray]. 423 + Each is a JSON object with an ["id"] field naming the compressor 424 + and codec-specific configuration. A [null] value means no 425 + compression. *) 203 426 module Compressor : sig 204 - (** Blosc compressor codec. *) 427 + 428 + (** {{:https://www.blosc.org/}Blosc} meta-compressor. *) 205 429 module Blosc : sig 206 430 type t 207 431 val cname : t -> string 432 + (** Compression library name (e.g. ["lz4"], ["zstd"]). *) 433 + 208 434 val clevel : t -> int 435 + (** Compression level, 0–9. *) 436 + 209 437 val shuffle : t -> int 438 + (** Shuffle mode: 0 = none, 1 = byte, 2 = bit. *) 439 + 210 440 val blocksize : t -> int option 441 + (** Internal block size; [None] if not specified. *) 442 + 211 443 val unknown : t -> Jsont.json 444 + (** Extra fields preserved for round-tripping. *) 212 445 end 213 446 214 - (** Zlib compressor codec. *) 447 + (** {{:https://zlib.net/}Zlib} compressor. *) 215 448 module Zlib : sig 216 449 type t 217 450 val level : t -> int 451 + (** Compression level, 1–9. *) 452 + 218 453 val unknown : t -> Jsont.json 219 454 end 220 455 end 221 456 222 - (** A v2 compressor: either a known type or a catch-all. *) 457 + (** A v2 compressor. Known compressors are decoded into typed 458 + variants; unrecognised ones are captured as {!Other_codec.t}. 
*) 223 459 type compressor = [ 224 460 | `Blosc of Compressor.Blosc.t 225 461 | `Zlib of Compressor.Zlib.t ··· 227 463 ] 228 464 229 465 val compressor_jsont : compressor Jsont.t 230 - (** Codec for {!compressor}. Dispatches on the ["id"] field. *) 466 + (** Codec for {!compressor}. Dispatches on the ["id"] field. *) 467 + 468 + (** {2 Filters} 231 469 232 - (** Typed sub-codecs for known v2 filters. *) 470 + V2 filters appear in the ["filters"] field of [.zarray] as a 471 + JSON array of objects (or [null] for no filters). Each object 472 + has an ["id"] field. *) 233 473 module Filter : sig 234 - (** Delta filter codec. *) 474 + 475 + (** Delta filter: stores differences between consecutive elements. *) 235 476 module Delta : sig 236 477 type t 237 478 val dtype : t -> string 479 + (** Data type of the input, as a NumPy typestr (e.g. ["<f8"]). *) 480 + 238 481 val astype : t -> string option 482 + (** Data type to cast to before differencing, if different. *) 483 + 239 484 val unknown : t -> Jsont.json 240 485 end 241 486 end 242 487 243 - (** A v2 filter: either a known type or a catch-all. *) 488 + (** A v2 filter. *) 244 489 type filter = [ 245 490 | `Delta of Filter.Delta.t 246 491 | `Other of Other_codec.t 247 492 ] 248 493 249 494 val filter_jsont : filter Jsont.t 250 - (** Codec for {!filter}. Dispatches on the ["id"] field. *) 495 + (** Codec for {!filter}. Dispatches on the ["id"] field. *) 251 496 252 - (** Complete v2 [.zarray] array metadata. *) 497 + (** Complete v2 498 + {{:https://zarr-specs.readthedocs.io/en/latest/v2/core.html#metadata} 499 + array metadata} as stored in [.zarray]. *) 253 500 module Array_meta : sig 254 501 type t 502 + 255 503 val shape : t -> int list 504 + (** Array dimensions. *) 505 + 256 506 val chunks : t -> int list 507 + (** Chunk dimensions. All chunks have the same shape. *) 508 + 257 509 val dtype : t -> dtype 510 + (** Element data type in NumPy typestr format. 
*) 511 + 258 512 val compressor : t -> compressor option 513 + (** Primary compressor. [None] means no compression. *) 514 + 259 515 val fill_value : t -> fill_value 516 + (** Default value for uninitialised chunks. *) 517 + 260 518 val order : t -> [ `C | `F ] 519 + (** Memory layout: [`C] for row-major (C order), [`F] for 520 + column-major (Fortran order). *) 521 + 261 522 val filters : t -> filter list option 523 + (** Pre-compression filter pipeline. [None] means no filters. *) 524 + 262 525 val dimension_separator : t -> [ `Dot | `Slash ] option 526 + (** Separator used in chunk keys. [None] uses the default ["."]. *) 527 + 263 528 val unknown : t -> Jsont.json 529 + (** Extra fields preserved for round-tripping. *) 264 530 end 265 531 266 532 val array_meta_jsont : Array_meta.t Jsont.t 267 - (** Codec for {!Array_meta.t}. Decodes and encodes the full [.zarray] 268 - metadata object. The ["zarr_format"] field is consumed on decode and 269 - always written as [2] on encode. Unknown fields are preserved. *) 533 + (** Codec for {!Array_meta.t}. Consumes ["zarr_format"] on decode 534 + (always writes [2] on encode). *) 270 535 end 271 536 272 - (** Zarr convention codecs. 537 + (** {1:conv Conventions} 538 + 539 + {{:https://github.com/zarr-conventions}Zarr conventions} extend the 540 + base metadata with domain-specific attributes. Each convention uses 541 + a namespace prefix (e.g. ["proj:"], ["spatial:"]) and registers 542 + itself in the ["zarr_conventions"] array within the attributes 543 + object. 273 544 274 - Provides metadata and typed codecs for known Zarr conventions. 275 - Each sub-module exposes a [meta] constant describing the convention 276 - and a [jsont] codec for its attributes. *) 545 + This library provides typed codecs for four conventions. 
Each 546 + sub-module exposes a {!Conv.Meta.t} constant for registration and a 547 + [jsont] codec that decodes from the flat attributes object (using 548 + {!Jsont.Object.skip_unknown} to coexist with other conventions). *) 277 549 module Conv : sig 278 - (** Convention registration metadata. *) 550 + 551 + (** Convention registration metadata, as stored in the 552 + ["zarr_conventions"] array. At least one of [uuid], [schema_url], 553 + or [spec_url] should be present for identification. *) 279 554 module Meta : sig 280 555 type t 281 556 val uuid : t -> string 557 + (** Permanent identifier for the convention. *) 558 + 282 559 val name : t -> string 560 + (** Human-readable name, typically the namespace prefix. *) 561 + 283 562 val schema_url : t -> string option 563 + (** URL to the JSON Schema definition. *) 564 + 284 565 val spec_url : t -> string option 566 + (** URL to the specification document. *) 567 + 285 568 val description : t -> string option 569 + (** Brief description of the convention's purpose. *) 570 + 286 571 val jsont : t Jsont.t 287 572 end 288 573 289 - (** Geo-projection convention attributes ([proj:] prefix). *) 574 + (** {{:https://github.com/zarr-experimental/geo-proj}Geo-projection} 575 + convention ([proj:] prefix). 576 + 577 + Defines coordinate reference system (CRS) metadata for geospatial 578 + data. At least one of [code], [wkt2], or [projjson] should be 579 + present. 580 + 581 + - [proj:code] — authority:code identifier (e.g. ["EPSG:4326"]) 582 + - [proj:wkt2] — WKT2 (ISO 19162:2019) CRS string 583 + - [proj:projjson] — PROJJSON CRS object *) 290 584 module Proj : sig 291 585 type t 292 586 val code : t -> string option 587 + (** Authority:code CRS identifier, e.g. ["EPSG:4326"]. *) 588 + 293 589 val wkt2 : t -> string option 590 + (** WKT2 CRS representation. *) 591 + 294 592 val projjson : t -> Jsont.json option 593 + (** PROJJSON CRS object. 
*) 594 + 295 595 val meta : Meta.t 596 + (** Convention registration metadata (UUID 597 + [f17cb550-5864-4468-aeb7-f3180cfb622f]). *) 598 + 296 599 val jsont : t Jsont.t 297 600 end 298 601 299 - (** Spatial convention attributes ([spatial:] prefix). *) 602 + (** {{:https://github.com/zarr-conventions/spatial}Spatial} convention 603 + ([spatial:] prefix). 604 + 605 + Describes the mapping between array indices and spatial 606 + coordinates. Works for both geospatial and non-geospatial data 607 + (microscopy, medical imaging, etc.). 608 + 609 + The affine transform maps pixel coordinates [(i, j)] to spatial 610 + coordinates [(x, y)] via: 611 + {[ 612 + x = a*i + b*j + c 613 + y = d*i + e*j + f 614 + ]} 615 + where the six coefficients are stored in [transform] as 616 + [\[a; b; c; d; e; f\]]. *) 300 617 module Spatial : sig 301 618 type t 302 619 val dimensions : t -> string list 620 + (** Names of the spatial dimensions (e.g. [["y"; "x"]]). *) 621 + 303 622 val bbox : t -> float list option 623 + (** Bounding box: [\[xmin; ymin; xmax; ymax\]] for 2D, 624 + [\[xmin; ymin; zmin; xmax; ymax; zmax\]] for 3D. *) 625 + 304 626 val transform_type : t -> string option 627 + (** Type of coordinate transform. Currently only ["affine"] is 628 + defined; defaults to ["affine"] when absent. *) 629 + 305 630 val transform : t -> float list option 631 + (** Affine coefficients [\[a; b; c; d; e; f\]] for 2D. *) 632 + 306 633 val shape : t -> int list option 634 + (** Shape of the spatial dimensions [\[height; width\]]. Useful 635 + when the spatial shape differs from the full array shape. *) 636 + 307 637 val registration : t -> [ `Pixel | `Node ] option 638 + (** Grid cell registration. [`Pixel] (default) means cell 639 + boundaries align with coordinate values; [`Node] means cell 640 + centres align with coordinate values. *) 641 + 308 642 val meta : Meta.t 643 + (** Convention registration metadata (UUID 644 + [689b58e2-cf7b-45e0-9fff-9cfc0883d6b4]). 
*) 645 + 309 646 val jsont : t Jsont.t 310 647 end 311 648 312 - (** Geoembeddings convention attributes ([geoemb:] prefix). 649 + (** {{:https://github.com/geo-embeddings/embeddings-zarr-convention} 650 + Geoembeddings} convention ([geoemb:] prefix). 313 651 314 - Describes geospatial embedding provenance including encoder model, 315 - source data, quantization, and chip layout. Supports pixel and 316 - chip embedding types. *) 652 + Describes geospatial embedding groups with model provenance, 653 + processing parameters, and optional quantization metadata. 654 + Supports two embedding types: 655 + 656 + - {b Pixel embeddings}: per-pixel dense embeddings where each 657 + spatial location has an embedding vector. 658 + - {b Chip embeddings}: image patch embeddings where non-overlapping 659 + or overlapping regions are encoded into single vectors. 660 + 661 + When [type_] is [`Chip], the [chip_layout] field is also 662 + required. *) 317 663 module Geoemb : sig 318 - (** Chip layout for chip-type embeddings. *) 664 + 665 + (** Chip layout for chip-type embeddings, describing how source 666 + imagery was divided into patches. *) 319 667 module Chip_layout : sig 320 668 type t 321 669 val layout_type : t -> [ `Regular_grid | `Irregular ] 670 + (** [`Regular_grid] for uniform non-overlapping or overlapping 671 + patches; [`Irregular] for variable-size patches. *) 672 + 322 673 val chip_size : t -> int * int 674 + (** Chip dimensions [(height, width)] in pixels. *) 675 + 323 676 val stride : t -> (int * int) option 677 + (** Stride [(y, x)] between chips. Defaults to [chip_size] 678 + (non-overlapping) when absent. *) 679 + 324 680 val grid_id : t -> string option 681 + (** Identifier for a predefined grid system. *) 682 + 325 683 val grid_definition : t -> string option 684 + (** URL to a grid definition document. *) 326 685 end 327 686 328 - (** Scale parameters for dequantization. *) 687 + (** Scale parameters for dequantizing compressed embeddings. 
Uses 688 + a tagged union on the ["type"] field: 689 + 690 + - {b Scalar}: [value = quantized * scale + offset] 691 + - {b Array}: [value\[..., y, x\] = quantized\[..., y, x\] * 692 + array\[..., y, x\]]; non-finite values in the scale array 693 + indicate no-data pixels. *) 329 694 module Scale : sig 695 + (** Global scalar dequantization parameters. *) 330 696 module Scalar : sig 331 697 type t 332 698 val scale : t -> float 699 + (** Scale factor. *) 700 + 333 701 val offset : t -> float 702 + (** Additive offset (default [0.0]). *) 334 703 end 704 + 705 + (** Per-pixel scale factors stored in a sibling zarr array. *) 335 706 module Array_ : sig 336 707 type t 337 708 val array_name : t -> string 709 + (** Name of the zarr array containing per-pixel scales. *) 710 + 338 711 val nodata : t -> string option 712 + (** Value indicating no-data in the scale array (e.g. 713 + ["+inf"]). Represented as a string since the JSON value 714 + may be a number or a string. *) 339 715 end 716 + 340 717 type t = [ `Scalar of Scalar.t | `Array of Array_.t ] 341 718 val jsont : t Jsont.t 342 719 end 343 720 344 - (** Quantization details for compressed embeddings. *) 721 + (** Quantization details for embeddings compressed from their 722 + original floating-point representation. *) 345 723 module Quantization : sig 346 724 type t 347 725 val meth : t -> string 726 + (** Quantization method (e.g. ["linear"], 727 + ["per_pixel_scale"], ["product_quantization"], 728 + ["binary"]). *) 729 + 348 730 val original_dtype : t -> string 731 + (** Original data type before quantization (e.g. ["float32"]). *) 732 + 349 733 val quantized_dtype : t -> string option 734 + (** Data type after quantization (e.g. ["int8"]). *) 735 + 350 736 val scale : t -> Scale.t option 737 + (** Scale parameters for dequantization. *) 738 + 351 739 val link : t -> string option 740 + (** URL to quantization codebook or lookup table. 
*)
     end

     type t
     val type_ : t -> [ `Pixel | `Chip ]
+    (** Embedding type. *)
+
     val dimensions : t -> int
+    (** Dimensionality of the embedding vector (e.g. [128], [768]). *)
+
     val model : t -> string
+    (** URL reference to the encoder model. *)
+
     val source_data : t -> string list
+    (** URL references to the source datasets. *)
+
     val data_type : t -> string
+    (** Data type of stored embeddings (e.g. ["float32"], ["int8"]). *)
+
     val gsd : t -> float option
+    (** Ground sample distance in metres. *)
+
     val chip_layout : t -> Chip_layout.t option
+    (** Chip layout configuration. Required when [type_] is [`Chip]. *)
+
     val quantization : t -> Quantization.t option
+    (** Quantization details, if embeddings have been compressed. *)
+
     val spatial_layout : t -> string option
+    (** Spatial organisation scheme: ["utm_zones"] (one group per UTM
+        zone, named [utm01]–[utm60]) or ["global"]. *)
+
     val build_version : t -> string option
+    (** Version of the software that built this store. *)
+
     val benchmark : t -> string list option
+    (** URLs to benchmark evaluation results. *)
+
     val meta : Meta.t
+    (** Convention registration metadata (UUID
+        [61c12cc5-0e28-4056-999a-480cf3fb7e4c]). *)
+
     val jsont : t Jsont.t
   end

-  (** Multiscales convention attributes. *)
+  (** {{:https://github.com/zarr-conventions/multiscales}Multiscales}
+      convention.
+
+      Encodes multi-resolution pyramid metadata for hierarchical data
+      stored in zarr groups. Each level in the pyramid is a separate
+      zarr array (or group containing arrays), linked by relative
+      coordinate transforms.
+
+      The [layout] array describes the pyramid levels.
Each level has
+      an [asset] path (relative to the group), an optional
+      [derived_from] path, and a [transform] with [scale] and
+      [translation] vectors describing the coordinate relationship to
+      the source level. *)
   module Multiscales : sig
-    (** Coordinate transform for a layout item. *)
+
+    (** Relative coordinate transform between pyramid levels. *)
     module Transform : sig
       type t
       val scale : t -> float list option
+      (** Scale factors per axis. [> 1.0] = downsampling. *)
+
       val translation : t -> float list option
+      (** Translation offsets per axis in coordinate space. *)
+
       val unknown : t -> Jsont.json
+      (** Additional fields (e.g. domain-specific transforms like
+          [spatial:transform]). *)
     end

-    (** A single entry in the multiscales layout. *)
+    (** A single pyramid level. *)
     module Layout_item : sig
       type t
       val asset : t -> string
+      (** Path to the zarr array or group for this level, relative to
+          the parent group (e.g. ["0"], ["r10m"], ["0/data"]). *)
+
       val derived_from : t -> string option
+      (** Path to the source level this one was derived from. *)
+
       val transform : t -> Transform.t option
+      (** Coordinate transform relative to [derived_from]. Required
+          when [derived_from] is present. *)
+
       val resampling_method : t -> string option
+      (** Resampling method used to derive this level (e.g.
+          ["average"], ["nearest"], ["bilinear"]). Overrides the
+          top-level default. *)
+
       val unknown : t -> Jsont.json
+      (** Additional fields (e.g. [spatial:shape],
+          [spatial:transform]). *)
     end

     type t
     val layout : t -> Layout_item.t list
+    (** The pyramid levels, from highest to lowest resolution. *)
+
     val resampling_method : t -> string option
+    (** Default resampling method for all levels.
*)
+
     val meta : Meta.t
+    (** Convention registration metadata (UUID
+        [d35379db-88df-4056-af3a-620245f8e347]). *)
+
     val jsont : t Jsont.t
   end
 end

-(** Composable attributes layer for Zarr nodes.
+(** {1:attrs Attributes}

-    Decodes convention-namespaced keys from a flat JSON object, plus a
-    [zarr_conventions] registration array. Convention keys are recognised by
-    prefix: [proj:*], [spatial:*], and the nested [multiscales] object.
-    All remaining keys are preserved in [unknown]. *)
+    The composable attributes layer shared by all zarr node types.
+    Convention-namespaced keys are routed to the appropriate convention
+    codec; remaining keys are preserved in {!Attrs.unknown}. *)
 module Attrs : sig
   type t

   val conventions : t -> Conv.Meta.t list
-  (** The [zarr_conventions] registration entries decoded from the object. *)
+  (** The ["zarr_conventions"] registration entries. *)

   val proj : t -> Conv.Proj.t option
-  (** Projection convention attributes, present if any [proj:*] keys exist. *)
+  (** Projection convention, present if any [proj:*] keys exist. *)

   val spatial : t -> Conv.Spatial.t option
-  (** Spatial convention attributes, present if [spatial:dimensions] exists. *)
+  (** Spatial convention, present if [spatial:dimensions] exists. *)

   val multiscales : t -> Conv.Multiscales.t option
-  (** Multiscales convention attributes, present if [multiscales] key exists. *)
+  (** Multiscales convention, present if ["multiscales"] key exists. *)

   val geoemb : t -> Conv.Geoemb.t option
-  (** Geoembeddings convention attributes, present if [geoemb:type] exists. *)
+  (** Geoembeddings convention, present if [geoemb:type] exists. *)

   val unknown : t -> Jsont.json
-  (** Remaining keys not belonging to any known convention.
*)
+  (** All keys not belonging to any known convention. *)

   val empty : t
-  (** An empty attributes value with no conventions and no unknown keys. *)
+  (** Empty attributes with no conventions and no unknown keys. *)
 end

 val attrs_jsont : Attrs.t Jsont.t
 (** Codec for {!Attrs.t}.

-    Decodes a flat JSON object by routing convention-prefixed keys to the
-    appropriate sub-codec. On encode, auto-populates [zarr_conventions] from
-    whichever conventions are [Some], then merges their flat members back into
-    the object alongside [multiscales] and unknown keys. *)
+    On decode, routes convention-prefixed keys to sub-codecs. On
+    encode, auto-populates ["zarr_conventions"] from whichever
+    conventions are [Some], then merges their flat members alongside
+    ["multiscales"] and unknown keys. *)
+
+(** {1:nodes Nodes}

-(** A Zarr v2 node (array or group) with associated attributes.
+    A zarr node is either an array or a group, with associated
+    attributes. *)

-    Defined at the top level (not inside {!V2}) so it can reference {!Attrs.t},
-    which is declared after the V2 module. *)
+(** A Zarr v2 node. *)
 module V2_node : sig
   type t
   val kind : t -> [ `Array of V2.Array_meta.t | `Group ]
+  (** [`Array] for [.zarray], [`Group] for [.zgroup]. *)
+
   val attrs : t -> Attrs.t
+  (** Decoded attributes (from [.zattrs] or {!Attrs.empty}). *)
+
   val unknown : t -> Jsont.json
 end

-(** A Zarr v3 node (array or group) with associated attributes.
-
-    Defined at the top level (not inside {!V3}) so it can reference {!Attrs.t},
-    which is declared after the V3 module. *)
+(** A Zarr v3 node.
*)
 module V3_node : sig
   type t
   val kind : t -> [ `Array of V3.Array_meta.t | `Group ]
+  (** [`Array] if ["node_type"] is ["array"], [`Group] otherwise. *)
+
   val attrs : t -> Attrs.t
+  (** Decoded attributes (from the ["attributes"] sub-object). *)
+
   val unknown : t -> Jsont.json
 end

 val v2_array_jsont : V2_node.t Jsont.t
-(** Codec for a Zarr v2 array node.
-
-    Decodes the [.zarray] JSON object into a {!V2_node.t} with [`Array] kind.
-    Attributes are not decoded by this codec; use {!attrs_jsont} separately if
-    needed. *)
+(** Codec for a Zarr v2 [.zarray] file. Produces a {!V2_node.t} with
+    [`Array] kind. Attributes are not decoded (use {!attrs_jsont}
+    separately on the [.zattrs] file). *)

 val v2_group_jsont : V2_node.t Jsont.t
-(** Codec for a Zarr v2 group node.
-
-    Decodes any JSON object into a {!V2_node.t} with [`Group] kind.
-    Encodes as [{"zarr_format": 2}]. *)
+(** Codec for a Zarr v2 [.zgroup] file. Produces a {!V2_node.t}
+    with [`Group] kind. Encodes as [{"zarr_format": 2}]. *)

 val v3_jsont : V3_node.t Jsont.t
-(** Codec for a Zarr v3 node (array or group).
+(** Codec for a Zarr v3 [zarr.json] file. Dispatches on ["node_type"]:
+    arrays are fully decoded via {!V3.array_meta_jsont}; groups produce
+    a [`Group] node. The ["attributes"] sub-object is decoded via
+    {!attrs_jsont}. *)

-    Dispatches on the ["node_type"] field. For arrays, decodes the full array
-    metadata via {!V3.array_meta_jsont}; for groups, produces a [`Group] node.
-    The ["attributes"] sub-object, if present, is decoded via {!attrs_jsont} and
-    stored in the node's [attrs] field. On encode, attributes are re-inserted
-    as the ["attributes"] member only when non-empty.
*)
+(** {1:dispatch Unified dispatch} *)

 type t = [ `V2 of V2_node.t | `V3 of V3_node.t ]
-(** A unified Zarr node: either a v2 node or a v3 node. *)
+(** A zarr node of either version. *)

 val jsont : t Jsont.t
-(** Top-level dispatch codec for Zarr metadata.
+(** Top-level dispatch codec. Inspects ["zarr_format"] to choose v2 or
+    v3, then dispatches to the appropriate node codec. V2 arrays are
+    distinguished from groups by the presence of ["shape"]. *)

-    Decodes any Zarr metadata JSON object by inspecting the ["zarr_format"]
-    field and dispatching to {!v2_array_jsont}, {!v2_group_jsont}, or
-    {!v3_jsont} as appropriate. V2 arrays are distinguished from v2 groups
-    by the presence of a ["shape"] field. *)
+(** {1:consolidated Consolidated metadata}

-(** {1 Consolidated metadata} *)
+    Consolidated metadata allows loading the metadata for an entire
+    hierarchy with a single read, avoiding per-node HTTP requests for
+    remote stores. *)

-(** V3 consolidated metadata (inline in a group's [zarr.json]).
+(** V3 consolidated metadata, stored inline in a group's [zarr.json]
+    as an optional ["consolidated_metadata"] field.

-    See {{:https://github.com/zarr-developers/zarr-specs/pull/309}zarr-specs PR 309}.
-    The [consolidated_metadata] field maps relative node paths to their
-    full metadata objects. *)
+    See {{:https://github.com/zarr-developers/zarr-specs/pull/309}
+    zarr-specs PR 309}. The ["metadata"] map uses relative paths as
+    keys and full [zarr.json] objects as values. For example, a
+    hierarchy [A/B/x] consolidated at [A] would have key ["B/x"]. *)
 module Consolidated : sig
   type t
   val metadata : t -> (string * V3_node.t) list
-  (** Mapping from relative node path to decoded node metadata. *)
+  (** Mapping from relative node path to decoded metadata.
*)

   val kind : t -> string
-  (** Currently always ["inline"]. *)
+  (** Storage kind. Currently always ["inline"]. *)

   val jsont : t Jsont.t
 end

-(** V2 consolidated metadata ([.zmetadata] file).
+(** V2 consolidated metadata, stored in a [.zmetadata] file at the
+    store root.

-    Maps flat keys like ["array1/.zarray"] and ["array1/.zattrs"] to their
-    decoded JSON objects. *)
+    The file contains a ["metadata"] object mapping flat keys like
+    ["array1/.zarray"] and ["array1/.zattrs"] to their JSON contents,
+    plus a ["zarr_consolidated_format"] version number (currently [1]).
+    This codec groups the flat keys by path prefix and decodes each
+    group into a {!V2_node.t} with optional {!Attrs.t}. *)
 module V2_consolidated : sig
   type entry = {
-    path : string;
-    node : V2_node.t;
-    attrs : Attrs.t option;
+    path : string; (** Relative path within the store. *)
+    node : V2_node.t; (** Decoded node metadata. *)
+    attrs : Attrs.t option; (** Decoded [.zattrs], if present. *)
   }

   type t
   val entries : t -> entry list
+  (** All entries, sorted by path. *)
+
   val format_version : t -> int
+  (** The ["zarr_consolidated_format"] value (currently [1]). *)

   val jsont : t Jsont.t
 end

-(** {1 Store probing} *)
+(** {1:probe Store probing} *)

 type probe_result = {
   node : t;
   attrs : Attrs.t option;
-  (** For v2, the separately-fetched [.zattrs] if present. For v3, [None]
-      (attributes are inline in [zarr.json]). *)
+  (** For v2, the separately-fetched [.zattrs] if present. For v3,
+      [None] (attributes are inline in [zarr.json]). *)
   consolidated : [ `V3 of Consolidated.t | `V2 of V2_consolidated.t ] option;
-  (** Consolidated metadata if present. *)
+  (** Consolidated metadata, if present.
*)
   children : (string * probe_result) list;
-  (** Recursively probed children. Populated from consolidated metadata
-      when available, or by listing the directory (if [list] is provided). *)
+  (** Recursively probed children. Populated from consolidated
+      metadata when available. *)
 }
 (** The result of probing a zarr store path. *)

···
   (probe_result, string) result
 (** [probe ~read path] probes the zarr store rooted at [path].

-    [read relpath] is called to fetch the contents of a file at
-    [path/relpath]. It should return [Ok contents] on success or
-    [Error msg] if the file does not exist or cannot be read.
+    [read relpath] fetches a file relative to the store root. It
+    should return [Ok contents] or [Error msg].

-    The probing order is:
-    + [zarr.json] (v3), checking for [consolidated_metadata] field
+    Probing order:
+    + [zarr.json] (v3), checking for ["consolidated_metadata"]
     + [.zarray] (v2 array), with optional [.zattrs]
     + [.zgroup] (v2 group), with optional [.zattrs] and [.zmetadata]

-    When consolidated metadata is present, [children] is populated by
-    decoding each entry. Each child in consolidated metadata is itself
-    probed via [read] for any additional metadata not captured in the
-    consolidated form.
+    When consolidated metadata is present, {!probe_result.children} is
+    populated by recursively grouping entries by path component.

-    Returns [Error] if none of the expected files can be read. *)
+    Returns [Error] if no zarr metadata files can be read. *)

-(** {1 Pretty-printing} *)
+(** {1:pp Pretty-printing} *)

 val pp_dtype : Format.formatter -> dtype -> unit
 (** Pretty-print a v2 dtype (e.g. [float64], [int32], [structured]). *)
···
 (** Pretty-print a fill value.
*)

 val pp_attrs : Format.formatter -> Attrs.t -> unit
-(** Pretty-print decoded attributes: conventions then unknown keys. *)
+(** Pretty-print decoded attributes: conventions, then unknown keys. *)

 val pp_probe_result : Format.formatter -> probe_result -> unit
 (** Pretty-print a probe result as a tree with types, shapes,
-    conventions, and unknown attributes. *)
+    conventions, and unknown attributes. Example output:
+    {v
+    [group]
+      geoemb: pixel 128d model=https://... dtype=int8 gsd=10
+      quantization: per_pixel_scale float32 -> int8 scale_array=scales
+      utm01 [group]
+        proj: code=EPSG:32601
+        spatial: dims=[y,x] bbox=[...] transform=[...] shape=[1290240x65536]
+        embeddings [array int8 9x128x1290240x65536]
+    v} *)
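
The [V2_consolidated] documentation in this diff says the codec groups flat [.zmetadata] keys such as ["array1/.zarray"] and ["array1/.zattrs"] by path prefix. The following is a minimal sketch of that grouping step only, written against the OCaml standard library. The names `split_key` and `group_keys` are hypothetical and do not appear in the diff; the library's actual implementation is not shown here and may differ.

```ocaml
(* Hypothetical helpers, not part of Zarr_jsont: an illustration of grouping
   flat .zmetadata keys by their node path. *)

module String_map = Map.Make (String)

(* Split a flat key into (node path, metadata file name).
   "array1/.zarray" becomes ("array1", ".zarray"); a root-level key such as
   ".zgroup" becomes ("", ".zgroup"). *)
let split_key key =
  match String.rindex_opt key '/' with
  | None -> ("", key)
  | Some i ->
    (String.sub key 0 i, String.sub key (i + 1) (String.length key - i - 1))

(* Group keys by node path. The result is sorted by path, and each path
   carries its metadata file names in input order. *)
let group_keys keys =
  let add acc key =
    let path, file = split_key key in
    let files = Option.value (String_map.find_opt path acc) ~default:[] in
    String_map.add path (files @ [ file ]) acc
  in
  List.fold_left add String_map.empty keys |> String_map.bindings

let () =
  let keys =
    [ ".zgroup"; ".zattrs"; "array1/.zarray"; "array1/.zattrs"; "g/a/.zarray" ]
  in
  List.iter
    (fun (path, files) ->
       Printf.printf "%S -> %s\n" path (String.concat ", " files))
    (group_keys keys)
```

Each resulting group would then be handed to a decoder: a path with a [.zarray] entry is an array, one with a [.zgroup] entry is a group, and a [.zattrs] entry, when present, supplies the optional attributes.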