OCaml Zarr jsont codecs for v2/v3 and common conventions

Add zarr-jsont design spec

Defines type-safe OCaml jsont codecs for Zarr v2/v3 JSON metadata
with best-effort convention decoding for geo-proj, spatial, and
multiscales.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

+446
docs/superpowers/specs/2026-03-30-zarr-jsont-design.md
# zarr-jsont Design Spec

Type-safe OCaml jsont codecs for Zarr v2 and v3 JSON metadata, with
best-effort convention decoding.

## Overview

A single `zarr-jsont` opam package providing bidirectional jsont codecs for:

- **Zarr v2**: `.zarray`, `.zgroup`, `.zattrs` metadata files
- **Zarr v3**: `zarr.json` metadata (arrays and groups)
- **Conventions**: geo-proj (`proj:`), spatial (`spatial:`), multiscales

All types use polymorphic variants. Known extensions and codecs are
enumerated as typed variants with an `Other` escape hatch. Unknown JSON
fields are preserved throughout via `keep_unknown`.

## Top-Level Type

```ocaml
type t = [ `V2 of V2.Node.t | `V3 of V3.Node.t ]
```

Both arrays and groups share an `Attrs.t` layer (where conventions live).
The node type determines whether array-specific metadata is present:

```ocaml
(* Common pattern in both V2 and V3 *)
module Node : sig
  type t
  val kind : t -> [ `Array of Array_meta.t | `Group ]
  val attrs : t -> Attrs.t
  val unknown : t -> Jsont.json
end
```

## Shared Types

### Fill Value

Single polymorphic variant used by both v2 and v3. The codec handles
version-specific JSON encoding (e.g. `"NaN"` string for floats, hex
strings in v3, base64 for v2 structured types).

```ocaml
type fill_value = [
  | `Null
  | `Bool of bool
  | `Int of int64
  | `Float of float
  | `Complex of float * float
  | `Bytes of string
]
```

### Dtype (v2)

Fully parsed from NumPy typestr format, including structured types.

```ocaml
type endian = [ `Little | `Big | `Not_applicable ]

type dtype = [
  | `Bool
  | `Int of endian * int
  | `Uint of endian * int
  | `Float of endian * int
  | `Complex of endian * int
  | `Timedelta of endian * string
  | `Datetime of endian * string
  | `String of int
  | `Unicode of endian * int
  | `Raw of int
  | `Structured of (string * dtype * int list option) list
]
```

### Other_codec / Other_ext

Escape hatches for unrecognized codecs and v3 extensions:

```ocaml
module Other_codec : sig
  type t
  val name : t -> string
  val configuration : t -> Jsont.json
end

module Other_ext : sig
  type t
  val name : t -> string
  val configuration : t -> Jsont.json option
  val must_understand : t -> bool
end
```

## V2 Module

### V2.Array_meta

```ocaml
module Array_meta : sig
  type t
  val shape : t -> int list
  val chunks : t -> int list
  val dtype : t -> dtype
  val compressor : t -> compressor option
  val fill_value : t -> fill_value
  val order : t -> [ `C | `F ]
  val filters : t -> filter list option
  val dimension_separator : t -> [ `Dot | `Slash ] option
  val unknown : t -> Jsont.json
end
```

### V2.Compressor

```ocaml
type compressor = [
  | `Blosc of Compressor.Blosc.t
  | `Zlib of Compressor.Zlib.t
  | `Other of Other_codec.t
]

module Compressor : sig
  module Blosc : sig
    type t
    val cname : t -> string
    val clevel : t -> int
    val shuffle : t -> int
    val blocksize : t -> int option
    val unknown : t -> Jsont.json
  end
  module Zlib : sig
    type t
    val level : t -> int
    val unknown : t -> Jsont.json
  end
end
```

### V2.Filter

```ocaml
type filter = [
  | `Delta of Filter.Delta.t
  | `Other of Other_codec.t
]

module Filter : sig
  module Delta : sig
    type t
    val dtype : t -> string
    val astype : t -> string option
    val unknown : t -> Jsont.json
  end
end
```

### V2.Node

```ocaml
module Node : sig
  type t
  val kind : t -> [ `Array of Array_meta.t | `Group ]
  val attrs : t -> Attrs.t
  val unknown : t -> Jsont.json
end
```

## V3 Module

### V3.Data_type

```ocaml
type data_type = [
  | `Bool | `Int8 | `Int16 | `Int32 | `Int64
  | `Uint8 | `Uint16 | `Uint32 | `Uint64
  | `Float16 | `Float32 | `Float64
  | `Complex64 | `Complex128
  | `Raw of int
  | `Other of Other_ext.t
]
```

### V3.Chunk_grid

```ocaml
type chunk_grid = [ `Regular of Chunk_grid.Regular.t | `Other of Other_ext.t ]

module Chunk_grid : sig
  module Regular : sig
    type t
    val chunk_shape : t -> int list
  end
end
```

### V3.Chunk_key_encoding

```ocaml
type chunk_key_encoding = [
  | `Default of Chunk_key_encoding.Default.t
  | `Other of Other_ext.t
]

module Chunk_key_encoding : sig
  module Default : sig
    type t
    val separator : t -> [ `Slash | `Dot ]
  end
end
```

### V3.Codec

```ocaml
type codec = [
  | `Bytes of Codec.Bytes.t
  | `Gzip of Codec.Gzip.t
  | `Blosc of Codec.Blosc.t
  | `Crc32c
  | `Transpose of Codec.Transpose.t
  | `Sharding of Codec.Sharding.t
  | `Other of Other_ext.t
]

module Codec : sig
  module Bytes : sig
    type t
    val endian : t -> [ `Little | `Big ]
  end
  module Gzip : sig
    type t
    val level : t -> int
  end
  module Blosc : sig
    type t
    val cname : t -> string
    val clevel : t -> int
    val shuffle : t -> [ `Noshuffle | `Shuffle | `Bitshuffle ]
    val typesize : t -> int option
    val blocksize : t -> int
  end
  module Transpose : sig
    type t
    val order : t -> int list
  end
  module Sharding : sig
    type t
    val chunk_shape : t -> int list
    val codecs : t -> codec list
    val index_codecs : t -> codec list
    val index_location : t -> [ `Start | `End ]
  end
end
```

### V3.Array_meta

```ocaml
module Array_meta : sig
  type t
  val shape : t -> int list
  val data_type : t -> data_type
  val chunk_grid : t -> chunk_grid
  val chunk_key_encoding : t -> chunk_key_encoding
  val codecs : t -> codec list
  val fill_value : t -> fill_value
  val dimension_names : t -> string option list option
  val storage_transformers : t -> Other_ext.t list option
  val unknown : t -> Jsont.json
end
```

### V3.Node

```ocaml
module Node : sig
  type t
  val kind : t -> [ `Array of Array_meta.t | `Group ]
  val attrs : t -> Attrs.t
  val unknown : t -> Jsont.json
end
```

## Conventions

### Conv.Meta

Convention registration metadata, auto-populated on encode.

```ocaml
module Meta : sig
  type t
  val uuid : t -> string
  val name : t -> string
  val schema_url : t -> string option
  val spec_url : t -> string option
  val description : t -> string option
end
```

### Conv.Proj

```ocaml
module Proj : sig
  type t
  val code : t -> string option
  val wkt2 : t -> string option
  val projjson : t -> Jsont.json option
  val meta : Meta.t
end
```

### Conv.Spatial

```ocaml
module Spatial : sig
  type t
  val dimensions : t -> string list
  val bbox : t -> float list option
  val transform_type : t -> string option
  val transform : t -> float list option
  val shape : t -> int list option
  val registration : t -> [ `Pixel | `Node ] option
  val meta : Meta.t
end
```

### Conv.Multiscales

```ocaml
module Multiscales : sig
  module Transform : sig
    type t
    val scale : t -> float list option
    val translation : t -> float list option
    val unknown : t -> Jsont.json
  end
  module Layout_item : sig
    type t
    val asset : t -> string
    val derived_from : t -> string option
    val transform : t -> Transform.t option
    val resampling_method : t -> string option
    val unknown : t -> Jsont.json
  end
  type t
  val layout : t -> Layout_item.t list
  val resampling_method : t -> string option
  val meta : Meta.t
end
```

## Attrs

Composable attributes layer shared by all node types.

```ocaml
module Attrs : sig
  type t
  val conventions : t -> Conv.Meta.t list
  val proj : t -> Conv.Proj.t option
  val spatial : t -> Conv.Spatial.t option
  val multiscales : t -> Conv.Multiscales.t option
  val unknown : t -> Jsont.json
end
```

On encode, the `zarr_conventions` array is auto-populated based on which
convention values are `Some`. Users who need custom conventions can build
their own attrs type reusing the `Conv.*` codecs as building blocks.

## Public API

```ocaml
module Zarr_jsont : sig
  type t = [ `V2 of V2.Node.t | `V3 of V3.Node.t ]

  (** Smart decoder: dispatches on zarr_format, then node_type/shape *)
  val jsont : t Jsont.t

  (** Version-specific codecs *)
  val v2_array_jsont : V2.Node.t Jsont.t
  val v2_group_jsont : V2.Node.t Jsont.t
  val v3_jsont : V3.Node.t Jsont.t

  (** Standalone codecs *)
  val attrs_jsont : Attrs.t Jsont.t
  val dtype_jsont : dtype Jsont.t
  val fill_value_jsont : fill_value Jsont.t
end
```

**Dispatch logic for `jsont`**:
1. Peek at `zarr_format` field: `2` or `3`
2. V3: check `node_type` for `"array"` or `"group"`
3. V2: presence of `shape` field distinguishes `.zarray` (always has
   `shape`) from `.zgroup` (only has `zarr_format`)

**V2 file mapping**: In v2, metadata is split across separate files.
`.zarray` and `.zgroup` decode via `v2_array_jsont`/`v2_group_jsont`.
`.zattrs` is a standalone attributes file decoded via `attrs_jsont`.
The unified `jsont` codec handles the case where you have a JSON blob
and don't know its type yet (e.g. reading from a store abstraction).
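
The three dispatch steps for the unified `jsont` codec can be illustrated with a small standalone sketch. It does not use the jsont API: the `json` type below is a hypothetical stand-in for an already-decoded JSON value, and `classify` and `kind` are illustrative names that are not part of the spec. The real codec would presumably express the same decision through jsont combinators.

```ocaml
(* Hypothetical stand-in for a decoded JSON value; the library itself
   would work with Jsont.json. *)
type json =
  | Null
  | Bool of bool
  | Number of float
  | String of string
  | Array of json list
  | Object of (string * json) list

(* Illustrative result type: which codec the document should go to. *)
type kind = [ `V2_array | `V2_group | `V3_array | `V3_group ]

(* Step 1: look at zarr_format.
   Step 2 (v3): node_type selects array or group.
   Step 3 (v2): a shape member means .zarray, otherwise .zgroup. *)
let classify (doc : json) : (kind, string) result =
  match doc with
  | Object fields -> (
      match List.assoc_opt "zarr_format" fields with
      | Some (Number 2.) ->
          if List.mem_assoc "shape" fields then Ok `V2_array
          else Ok `V2_group
      | Some (Number 3.) -> (
          match List.assoc_opt "node_type" fields with
          | Some (String "array") -> Ok `V3_array
          | Some (String "group") -> Ok `V3_group
          | Some _ -> Error "node_type must be \"array\" or \"group\""
          | None -> Error "missing node_type")
      | Some _ -> Error "unsupported zarr_format"
      | None -> Error "missing zarr_format")
  | _ -> Error "metadata must be a JSON object"
```

A `.zattrs` document carries no `zarr_format` member, so this sketch rejects it with `Error "missing zarr_format"`, which matches the spec's choice to decode `.zattrs` separately via `attrs_jsont`.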

## Build System

```
zarr-jsont/
  dune-project
  zarr-jsont.opam
  src/
    dune
    zarr_jsont.ml(i)
    v2.ml
    v3.ml
    conv.ml
    attrs.ml
    fill_value.ml
    dtype.ml
    other_codec.ml
    other_ext.ml
  test/
    dune
    test_zarr_jsont.ml
```

**Dependencies**: `jsont` (>= 0.2.0), `jsont.bytesrw`

**dune library**:
```
(library
 (name zarr_jsont)
 (public_name zarr-jsont)
 (libraries jsont jsont.bytesrw))
```

## Testing

- Roundtrip tests using example JSON from spec directories
- Unit tests for dtype parsing (all NumPy typestr forms)
- Fill value edge cases: `"NaN"`, `"Infinity"`, `"-Infinity"`, hex, complex, bytes
- Convention composition: proj + spatial + multiscales together
- Unknown field preservation: decode-encode roundtrip retains extra fields
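
Two of the test targets listed above, dtype parsing and the special float fill values, can be sketched with the standard library alone. Everything here is hypothetical and only illustrates the shape such tests might take: `simple_dtype` covers just the bool, int, uint and float subset of the spec's `dtype`, and `parse_typestr`, `float_of_fill_string` and `fill_string_of_float` are illustrative helper names, not the package's API.

```ocaml
type endian = [ `Little | `Big | `Not_applicable ]

(* Subset of the spec's dtype variant, enough for fixed-width numerics. *)
type simple_dtype =
  [ `Bool
  | `Int of endian * int
  | `Uint of endian * int
  | `Float of endian * int ]

(* Parse typestrs of the form <byte order><kind><byte count>, such as
   "<f8" or "|b1". Other kinds yield None in this sketch. *)
let parse_typestr (s : string) : simple_dtype option =
  if String.length s < 3 then None
  else
    let endian : endian option =
      match s.[0] with
      | '<' -> Some `Little
      | '>' -> Some `Big
      | '|' -> Some `Not_applicable
      | _ -> None
    in
    let size = int_of_string_opt (String.sub s 2 (String.length s - 2)) in
    match (endian, s.[1], size) with
    | Some _, 'b', Some 1 -> Some `Bool
    | Some e, 'i', Some n when n > 0 -> Some (`Int (e, n))
    | Some e, 'u', Some n when n > 0 -> Some (`Uint (e, n))
    | Some e, 'f', Some n when n > 0 -> Some (`Float (e, n))
    | _ -> None

(* Non-finite floats are written as JSON strings. *)
let float_of_fill_string = function
  | "NaN" -> Some Float.nan
  | "Infinity" -> Some Float.infinity
  | "-Infinity" -> Some Float.neg_infinity
  | _ -> None

(* Inverse direction; None means the value is finite and would be
   written as a plain JSON number instead. *)
let fill_string_of_float (f : float) : string option =
  if Float.is_nan f then Some "NaN"
  else if f = Float.infinity then Some "Infinity"
  else if f = Float.neg_infinity then Some "-Infinity"
  else None
```

Because `nan` never compares equal to itself, a roundtrip test for `"NaN"` is easier to state on the string side (decode, re-encode, compare strings) than on the float side.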