# zarr-jsont Design Spec

Type-safe OCaml jsont codecs for Zarr v2 and v3 JSON metadata, with
best-effort convention decoding.

## Overview

A single `zarr-jsont` opam package providing bidirectional jsont codecs for:

- **Zarr v2**: `.zarray`, `.zgroup`, `.zattrs` metadata files
- **Zarr v3**: `zarr.json` metadata (arrays and groups)
- **Conventions**: geo-proj (`proj:`), spatial (`spatial:`), multiscales

All types use polymorphic variants. Known extensions and codecs are
enumerated as typed variants with an `Other` escape hatch. Unknown JSON
fields are preserved throughout via `keep_unknown`.

## Top-Level Type

```ocaml
type t = [ `V2 of V2.Node.t | `V3 of V3.Node.t ]
```

Both arrays and groups share an `Attrs.t` layer (where conventions live).
The node type determines whether array-specific metadata is present:

```ocaml
(* Common pattern in both V2 and V3 *)
module Node : sig
  type t
  val kind : t -> [ `Array of Array_meta.t | `Group ]
  val attrs : t -> Attrs.t
  val unknown : t -> Jsont.json
end
```

## Shared Types

### Fill Value

Single polymorphic variant used by both v2 and v3. The codec handles the
version-specific JSON encodings: `"NaN"`, `"Infinity"` and `"-Infinity"`
strings for non-finite floats, hex strings for bit-exact floats in v3, and
base64 strings for v2 fixed-length byte-string and structured types.

```ocaml
type fill_value = [
  | `Null
  | `Bool of bool
  | `Int of int64
  | `Float of float
  | `Complex of float * float
  | `Bytes of string
]
```
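
The float cases are the subtle part of this codec. The stdlib-only sketch below is not the jsont implementation. It only illustrates the mapping, using a hypothetical `json_number` type in place of real JSON values and hypothetical `encode_float`/`decode_float` names. Non-finite floats become the sentinel strings, and a v3 hex string is decoded by reinterpreting its digits as an IEEE 754 bit pattern of the given width.

```ocaml
(* Hypothetical stand-in for the two JSON shapes a float fill value takes. *)
type json_number = [ `Number of float | `String of string ]

(* Non-finite floats are written as the sentinel strings. *)
let encode_float (f : float) : json_number =
  if Float.is_nan f then `String "NaN"
  else if f = Float.infinity then `String "Infinity"
  else if f = Float.neg_infinity then `String "-Infinity"
  else `Number f

(* [bits] is the width of the float data type (32 or 64). It is only
   consulted for v3 hex strings such as "0x7fc00000", whose digits are
   reinterpreted as the raw IEEE 754 bit pattern. *)
let decode_float ~(bits : int) (j : json_number) : float option =
  match j with
  | `Number f -> Some f
  | `String "NaN" -> Some Float.nan
  | `String "Infinity" -> Some Float.infinity
  | `String "-Infinity" -> Some Float.neg_infinity
  | `String s when String.length s > 2 && String.sub s 0 2 = "0x" -> (
      match bits with
      | 32 -> Option.map Int32.float_of_bits (Int32.of_string_opt s)
      | 64 -> Option.map Int64.float_of_bits (Int64.of_string_opt s)
      | _ -> None)
  | `String _ -> None
```

In this sketch the bit width is an explicit argument. A hex string such as `"0x7fc00000"` can only be interpreted once the data type is known.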
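
### Dtype (v2)

Fully parsed from NumPy typestr format (e.g. `"<f8"`, `"|S10"`,
`"<M8[ns]"`), including structured types, which v2 writes as a JSON list
of `[name, typestr]` or `[name, typestr, shape]` entries.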

```ocaml
type endian = [ `Little | `Big | `Not_applicable ]

type dtype = [
  | `Bool
  | `Int of endian * int
  | `Uint of endian * int
  | `Float of endian * int
  | `Complex of endian * int
  | `Timedelta of endian * string
  | `Datetime of endian * string
  | `String of int
  | `Unicode of endian * int
  | `Raw of int
  | `Structured of (string * dtype * int list option) list
]
```
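
A typestr is a byte-order character, a kind character and a size, so the non-structured cases can be parsed with a small match. The sketch below is a hypothetical stdlib-only illustration (`simple_dtype` and `parse_typestr` are not part of the spec'd API). It assumes the integer payload is taken exactly as written in the typestr, which is an item size in bytes for most kinds and a character count for `U`.

```ocaml
type endian = [ `Little | `Big | `Not_applicable ]

(* The non-structured cases of the [dtype] variant. *)
type simple_dtype = [
  | `Bool
  | `Int of endian * int
  | `Uint of endian * int
  | `Float of endian * int
  | `Complex of endian * int
  | `Timedelta of endian * string
  | `Datetime of endian * string
  | `String of int
  | `Unicode of endian * int
  | `Raw of int
]

let endian_of_char (c : char) : endian option =
  match c with
  | '<' -> Some `Little
  | '>' -> Some `Big
  | '|' -> Some `Not_applicable
  | _ -> None

(* Parse a typestr such as "<i4", "|b1" or "<M8[ns]": byte-order char,
   kind char, then an item size (or "8[unit]" for datetime/timedelta). *)
let parse_typestr (s : string) : simple_dtype option =
  if String.length s < 3 then None
  else
    match endian_of_char s.[0] with
    | None -> None
    | Some e -> (
        let rest = String.sub s 2 (String.length s - 2) in
        (* "8[ns]" -> Some "ns"; anything else -> None *)
        let time_unit =
          match String.index_opt rest '[' with
          | Some 1 when rest.[0] = '8' && rest.[String.length rest - 1] = ']' ->
              Some (String.sub rest 2 (String.length rest - 3))
          | _ -> None
        in
        match (s.[1], int_of_string_opt rest) with
        | 'b', Some 1 -> Some `Bool
        | 'i', Some n -> Some (`Int (e, n))
        | 'u', Some n -> Some (`Uint (e, n))
        | 'f', Some n -> Some (`Float (e, n))
        | 'c', Some n -> Some (`Complex (e, n))
        | 'S', Some n -> Some (`String n)
        | 'U', Some n -> Some (`Unicode (e, n))
        | 'V', Some n -> Some (`Raw n)
        | 'm', None -> Option.map (fun u -> `Timedelta (e, u)) time_unit
        | 'M', None -> Option.map (fun u -> `Datetime (e, u)) time_unit
        | _ -> None)
```

Structured dtypes arrive as JSON lists rather than strings, so a full decoder would recurse on each list entry and reuse a function like this for the leaf typestrs.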

### Other_codec / Other_ext

Escape hatches for unrecognized codecs and v3 extensions:

```ocaml
module Other_codec : sig
  type t
  val name : t -> string
  val configuration : t -> Jsont.json
end

module Other_ext : sig
  type t
  val name : t -> string
  val configuration : t -> Jsont.json option
  val must_understand : t -> bool
end
```
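
The escape hatch only pays off if an unrecognised codec survives a decode/encode cycle unchanged. The sketch below uses a hypothetical minimal `json` type in place of `Jsont.json`. It assumes, for illustration, that a v2 codec object maps to `Other_codec` by taking its `"id"` member as the name and keeping every other member as configuration.

```ocaml
(* Hypothetical minimal JSON tree standing in for Jsont.json. *)
type json = [
  | `Null
  | `Bool of bool
  | `Number of float
  | `String of string
  | `Array of json list
  | `Object of (string * json) list
]

(* Assumed shape of Other_codec.t: the v2 "id" becomes [name] and all
   remaining members are kept verbatim as [configuration]. *)
type other_codec = { name : string; configuration : json }

let other_codec_of_json (j : json) : other_codec option =
  match j with
  | `Object members -> (
      match List.assoc_opt "id" members with
      | Some (`String name) ->
          let rest = List.filter (fun (k, _) -> k <> "id") members in
          Some { name; configuration = `Object rest }
      | _ -> None)
  | _ -> None

(* Re-encode with "id" first, followed by the preserved members.
   The decoder above only ever produces object configurations. *)
let json_of_other_codec (c : other_codec) : json =
  match c.configuration with
  | `Object rest -> `Object (("id", `String c.name) :: rest)
  | _ -> `Object [ ("id", `String c.name) ]
```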

## V2 Module

### V2.Array_meta

```ocaml
module Array_meta : sig
  type t
  val shape : t -> int list
  val chunks : t -> int list
  val dtype : t -> dtype
  val compressor : t -> compressor option
  val fill_value : t -> fill_value
  val order : t -> [ `C | `F ]
  val filters : t -> filter list option
  val dimension_separator : t -> [ `Dot | `Slash ] option
  val unknown : t -> Jsont.json
end
```

### V2.Compressor

```ocaml
type compressor = [
  | `Blosc of Compressor.Blosc.t
  | `Zlib of Compressor.Zlib.t
  | `Other of Other_codec.t
]

module Compressor : sig
  module Blosc : sig
    type t
    val cname : t -> string
    val clevel : t -> int
    val shuffle : t -> int
    val blocksize : t -> int option
    val unknown : t -> Jsont.json
  end
  module Zlib : sig
    type t
    val level : t -> int
    val unknown : t -> Jsont.json
  end
end
```

### V2.Filter

```ocaml
type filter = [
  | `Delta of Filter.Delta.t
  | `Other of Other_codec.t
]

module Filter : sig
  module Delta : sig
    type t
    val dtype : t -> string
    val astype : t -> string option
    val unknown : t -> Jsont.json
  end
end
```

### V2.Node

```ocaml
module Node : sig
  type t
  val kind : t -> [ `Array of Array_meta.t | `Group ]
  val attrs : t -> Attrs.t
  val unknown : t -> Jsont.json
end
```

## V3 Module

### V3.Data_type

```ocaml
type data_type = [
  | `Bool | `Int8 | `Int16 | `Int32 | `Int64
  | `Uint8 | `Uint16 | `Uint32 | `Uint64
  | `Float16 | `Float32 | `Float64
  | `Complex64 | `Complex128
  | `Raw of int
  | `Other of Other_ext.t
]
```
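
Decoding a v3 `data_type` is mostly a string match, with raw types spelled `r` followed by a bit count. In the sketch below, `data_type_of_name` is a hypothetical helper. `` `Other `` carries only the extension name instead of an `Other_ext.t`. The `` `Raw `` payload is assumed to be the bit count as written in the name.

```ocaml
(* Stand-in for the v3 data_type variant; [`Other] carries only the
   extension name here, where the spec uses Other_ext.t. *)
type data_type = [
  | `Bool | `Int8 | `Int16 | `Int32 | `Int64
  | `Uint8 | `Uint16 | `Uint32 | `Uint64
  | `Float16 | `Float32 | `Float64
  | `Complex64 | `Complex128
  | `Raw of int
  | `Other of string
]

let data_type_of_name (name : string) : data_type =
  match name with
  | "bool" -> `Bool
  | "int8" -> `Int8
  | "int16" -> `Int16
  | "int32" -> `Int32
  | "int64" -> `Int64
  | "uint8" -> `Uint8
  | "uint16" -> `Uint16
  | "uint32" -> `Uint32
  | "uint64" -> `Uint64
  | "float16" -> `Float16
  | "float32" -> `Float32
  | "float64" -> `Float64
  | "complex64" -> `Complex64
  | "complex128" -> `Complex128
  | _ -> (
      (* Raw types are spelled "r<bits>" with bits a positive multiple of 8. *)
      let n = String.length name in
      let bits =
        if n > 1 && name.[0] = 'r' then
          int_of_string_opt (String.sub name 1 (n - 1))
        else None
      in
      match bits with
      | Some b when b > 0 && b mod 8 = 0 -> `Raw b
      | _ -> `Other name)
```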

### V3.Chunk_grid

```ocaml
type chunk_grid = [ `Regular of Chunk_grid.Regular.t | `Other of Other_ext.t ]

module Chunk_grid : sig
  module Regular : sig
    type t
    val chunk_shape : t -> int list
  end
end
```
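
A regular grid tiles the array with `chunk_shape`, so the number of chunks along each dimension is the ceiling of `shape / chunk_shape`. The hypothetical helper below is not part of the spec'd API. It shows the arithmetic, counting a partial edge chunk as a full chunk.

```ocaml
(* Chunks per dimension for a regular grid: ceil (shape / chunk_shape),
   so a partial chunk at the array edge still counts as one chunk. *)
let grid_shape ~(shape : int list) ~(chunk_shape : int list) : int list =
  if List.length shape <> List.length chunk_shape then
    invalid_arg "grid_shape: rank mismatch";
  List.map2
    (fun s c ->
      if c <= 0 then invalid_arg "grid_shape: chunk sizes must be positive";
      (s + c - 1) / c)
    shape chunk_shape
```

For example, `shape = [100; 50]` with `chunk_shape = [30; 50]` gives `[4; 1]`. The last chunk along the first axis covers only 10 elements.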

### V3.Chunk_key_encoding

```ocaml
type chunk_key_encoding = [
  | `Default of Chunk_key_encoding.Default.t
  | `Other of Other_ext.t
]

module Chunk_key_encoding : sig
  module Default : sig
    type t
    val separator : t -> [ `Slash | `Dot ]
  end
end
```
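
Under the v3 `default` encoding, a chunk's key is the literal `c` followed by each chunk coordinate, all joined by the separator. A zero-dimensional array therefore has the single key `c`. The helper name below is hypothetical. Unlike v3, v2 keys, governed by `dimension_separator`, omit the `c` prefix.

```ocaml
(* Chunk key under the v3 "default" encoding: "c" followed by each chunk
   coordinate, all joined with the configured separator. *)
let default_chunk_key ~(separator : [ `Slash | `Dot ]) (coords : int list) :
    string =
  let sep = match separator with `Slash -> "/" | `Dot -> "." in
  String.concat sep ("c" :: List.map string_of_int coords)
```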

### V3.Codec

```ocaml
type codec = [
  | `Bytes of Codec.Bytes.t
  | `Gzip of Codec.Gzip.t
  | `Blosc of Codec.Blosc.t
  | `Crc32c
  | `Transpose of Codec.Transpose.t
  | `Sharding of Codec.Sharding.t
  | `Other of Other_ext.t
]

module Codec : sig
  module Bytes : sig
    type t
    (* [None] when "endian" is omitted, as allowed for single-byte data types *)
    val endian : t -> [ `Little | `Big ] option
  end
  module Gzip : sig
    type t
    val level : t -> int
  end
  module Blosc : sig
    type t
    val cname : t -> string
    val clevel : t -> int
    val shuffle : t -> [ `Noshuffle | `Shuffle | `Bitshuffle ]
    val typesize : t -> int option
    val blocksize : t -> int
  end
  module Transpose : sig
    type t
    val order : t -> int list
  end
  module Sharding : sig
    type t
    val chunk_shape : t -> int list
    val codecs : t -> codec list
    val index_codecs : t -> codec list
    val index_location : t -> [ `Start | `End ]
  end
end
```
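
The v2 `Compressor.Blosc.shuffle` is a bare `int`, while the v3 codec uses a named enum. Anyone converting between the two needs a mapping like the hypothetical one below. It assumes the numcodecs convention: `0` means no shuffle, `1` byte shuffle, `2` bit shuffle, and `-1` "auto", which picks bit shuffle for 1-byte items and byte shuffle otherwise.

```ocaml
(* Map the numcodecs integer shuffle to the v3 enum. [typesize] is the
   item size in bytes and only matters for the "auto" value -1. *)
let shuffle_of_v2 ~(typesize : int) (n : int) :
    [ `Noshuffle | `Shuffle | `Bitshuffle ] option =
  match n with
  | 0 -> Some `Noshuffle
  | 1 -> Some `Shuffle
  | 2 -> Some `Bitshuffle
  | -1 -> Some (if typesize = 1 then `Bitshuffle else `Shuffle)
  | _ -> None
```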

### V3.Array_meta

```ocaml
module Array_meta : sig
  type t
  val shape : t -> int list
  val data_type : t -> data_type
  val chunk_grid : t -> chunk_grid
  val chunk_key_encoding : t -> chunk_key_encoding
  val codecs : t -> codec list
  val fill_value : t -> fill_value
  val dimension_names : t -> string option list option
  val storage_transformers : t -> Other_ext.t list option
  val unknown : t -> Jsont.json
end
```

### V3.Node

```ocaml
module Node : sig
  type t
  val kind : t -> [ `Array of Array_meta.t | `Group ]
  val attrs : t -> Attrs.t
  val unknown : t -> Jsont.json
end
```

## Conventions

### Conv.Meta

Convention registration metadata, auto-populated on encode.

```ocaml
module Meta : sig
  type t
  val uuid : t -> string
  val name : t -> string
  val schema_url : t -> string option
  val spec_url : t -> string option
  val description : t -> string option
end
```

### Conv.Proj

```ocaml
module Proj : sig
  type t
  val code : t -> string option
  val wkt2 : t -> string option
  val projjson : t -> Jsont.json option
  val meta : Meta.t
end
```

### Conv.Spatial

```ocaml
module Spatial : sig
  type t
  val dimensions : t -> string list
  val bbox : t -> float list option
  val transform_type : t -> string option
  val transform : t -> float list option
  val shape : t -> int list option
  val registration : t -> [ `Pixel | `Node ] option
  val meta : Meta.t
end
```

### Conv.Multiscales

```ocaml
module Multiscales : sig
  module Transform : sig
    type t
    val scale : t -> float list option
    val translation : t -> float list option
    val unknown : t -> Jsont.json
  end
  module Layout_item : sig
    type t
    val asset : t -> string
    val derived_from : t -> string option
    val transform : t -> Transform.t option
    val resampling_method : t -> string option
    val unknown : t -> Jsont.json
  end
  type t
  val layout : t -> Layout_item.t list
  val resampling_method : t -> string option
  val meta : Meta.t
end
```

## Attrs

Composable attributes layer shared by all node types.

```ocaml
module Attrs : sig
  type t
  val conventions : t -> Conv.Meta.t list
  val proj : t -> Conv.Proj.t option
  val spatial : t -> Conv.Spatial.t option
  val multiscales : t -> Conv.Multiscales.t option
  val unknown : t -> Jsont.json
end
```

On encode, the `zarr_conventions` array is auto-populated with the `meta`
of each convention whose value is `Some`. Users who need custom
conventions can build their own attrs type, reusing the `Conv.*` codecs
as building blocks.
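
The auto-population rule amounts to a filter over the optional convention slots. In the sketch below, `meta` and `attrs` are hypothetical stand-ins for `Conv.Meta.t` and the convention fields of `Attrs.t`, and `conventions` is a hypothetical helper. The fixed proj, spatial, multiscales order is an assumption made so that encoding is deterministic.

```ocaml
(* Hypothetical stand-ins for Conv.Meta.t and for the convention slots of
   Attrs.t; the payload types are left abstract as type parameters. *)
type meta = { uuid : string; name : string }

type ('p, 's, 'm) attrs = {
  proj : 'p option;
  spatial : 's option;
  multiscales : 'm option;
}

(* One meta entry per convention whose value is [Some], always in the order
   proj, spatial, multiscales so that encoding is deterministic. *)
let conventions ~(proj_meta : meta) ~(spatial_meta : meta)
    ~(multiscales_meta : meta) (a : ('p, 's, 'm) attrs) : meta list =
  List.filter_map
    (fun (present, m) -> if present then Some m else None)
    [
      (Option.is_some a.proj, proj_meta);
      (Option.is_some a.spatial, spatial_meta);
      (Option.is_some a.multiscales, multiscales_meta);
    ]
```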

## Public API

```ocaml
module Zarr_jsont : sig
  type t = [ `V2 of V2.Node.t | `V3 of V3.Node.t ]

  (** Smart decoder: dispatches on zarr_format, then node_type/shape *)
  val jsont : t Jsont.t

  (** Version-specific codecs *)
  val v2_array_jsont : V2.Node.t Jsont.t
  val v2_group_jsont : V2.Node.t Jsont.t
  val v3_jsont : V3.Node.t Jsont.t

  (** Standalone codecs *)
  val attrs_jsont : Attrs.t Jsont.t
  val dtype_jsont : dtype Jsont.t
  val fill_value_jsont : fill_value Jsont.t
end
```

**Dispatch logic for `jsont`**:
1. Peek at `zarr_format` field: `2` or `3`
2. V3: check `node_type` for `"array"` or `"group"`
3. V2: presence of `shape` field distinguishes `.zarray` (always has
   `shape`) from `.zgroup` (only has `zarr_format`)
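
The three dispatch steps can be sketched as a classifier over a hypothetical minimal `json` type standing in for `Jsont.json`. The `kind` type and `classify` function are illustrative names only. In the real codec each result would select the corresponding decoder, whereas this sketch stops at classification.

```ocaml
(* Hypothetical minimal JSON tree standing in for Jsont.json. *)
type json = [
  | `Null
  | `Bool of bool
  | `Number of float
  | `String of string
  | `Array of json list
  | `Object of (string * json) list
]

type kind = [ `V2_array | `V2_group | `V3_array | `V3_group ]

(* Step 1: zarr_format; step 2 (v3): node_type; step 3 (v2): "shape". *)
let classify (j : json) : (kind, string) result =
  match j with
  | `Object members -> (
      let mem k = List.assoc_opt k members in
      match (mem "zarr_format", mem "node_type") with
      | Some (`Number 3.), Some (`String "array") -> Ok `V3_array
      | Some (`Number 3.), Some (`String "group") -> Ok `V3_group
      | Some (`Number 3.), _ -> Error "v3 metadata needs node_type array or group"
      | Some (`Number 2.), _ ->
          Ok (if Option.is_some (mem "shape") then `V2_array else `V2_group)
      | _ -> Error "missing or unsupported zarr_format")
  | _ -> Error "metadata must be a JSON object"
```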

**V2 file mapping**: In v2, metadata is split across separate files.
`.zarray` and `.zgroup` decode via `v2_array_jsont`/`v2_group_jsont`.
`.zattrs` is a standalone attributes file decoded via `attrs_jsont`.
The unified `jsont` codec handles the case where you have a JSON blob
and don't know its type yet (e.g. reading from a store abstraction).

## Build System

```
zarr-jsont/
  dune-project
  zarr-jsont.opam
  src/
    dune
    zarr_jsont.ml(i)
    v2.ml
    v3.ml
    conv.ml
    attrs.ml
    fill_value.ml
    dtype.ml
    other_codec.ml
    other_ext.ml
  test/
    dune
    test_zarr_jsont.ml
```

**Dependencies**: `jsont` (>= 0.2.0), `jsont.bytesrw`

**dune library**:
```
(library
 (name zarr_jsont)
 (public_name zarr-jsont)
 (libraries jsont jsont.bytesrw))
```

## Testing

- Roundtrip tests using example JSON from spec directories
- Unit tests for dtype parsing (all NumPy typestr forms)
- Fill value edge cases: `"NaN"`, `"Infinity"`, `"-Infinity"`, hex, complex, bytes
- Convention composition: proj + spatial + multiscales together
- Unknown field preservation: decode-encode roundtrip retains extra fields