- README.md with features, installation, quick start, zarr-inspect
example output, and documentation links
- CHANGES.md for 0.1.0 release
- All ai_disclosure annotations changed from ai-generated to ai-assisted
(human-authored with AI editing/refinement)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Every type, module, and accessor now has documentation that includes:
- Links to the relevant zarr-specs section on readthedocs
- Links to convention repositories on GitHub
- Enough spec detail to understand the type without reading the spec
- JSON encoding details (what JSON forms each type accepts)
- Field semantics (units, defaults, valid ranges)
- Section headers grouping related types
The .mli module doc includes a quick-start example for both decoding
and store probing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- .ocamlformat: formatting configuration
- .gitignore: add *.install, *.merlin
- LICENSE.md: ISC license
- dune-project: add license, authors, maintainers, source, description
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the geoemb: convention from
geo-embeddings/embeddings-zarr-convention v1 with full typed codecs:
- Geoemb: type (pixel/chip), dimensions, model, source_data, data_type
plus optional gsd, spatial_layout, build_version, benchmark
- Chip_layout: layout_type (regular_grid/irregular), chip_size, stride
- Quantization: method, original_dtype, quantized_dtype, scale, link
- Scale: tagged union with Scalar (scale+offset) and Array (array_name+nodata)
using case_mem on the "type" discriminant
The Scale tagged union uses case_mem properly — the "type" field is
the discriminant and each case codec omits it from its own members.
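A plain-OCaml sketch of the data model this describes (constructor payloads follow the commit's wording; the `"scalar"`/`"array"` tag strings are assumptions, not quoted from the convention):

```ocaml
(* Illustrative sketch of the Scale tagged union described above.
   The discriminant strings are assumed, not taken from the spec. *)
type scale =
  | Scalar of { scale : float; offset : float }
  | Array of { array_name : string; nodata : float option }

(* The "type" member a case_mem codec would dispatch on: *)
let scale_tag = function Scalar _ -> "scalar" | Array _ -> "array"
```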
zarr-inspect now displays geoemb metadata inline:
    geoemb: pixel 128d model=... dtype=int8 gsd=10 layout=utm_zones
    quantization: per_pixel_scale float32 -> int8 scale_array=scales
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace manual JSON AST pattern matching with codec-based decoding:
- jdec/jenc helpers: decode/encode via any jsont codec, raise jsont errors
- field_jsont: positional tuple via Jsont.list Jsont.json + jdec for each element
- structured_codec: just Jsont.list field_jsont wrapped in a trivial map
- No Jsont.String/Array/Number constructor matching anywhere in dtype
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- parse_typestr returns (dtype, string) result instead of raising
- Simple dtype uses Jsont.of_of_string (result-based, proper errors)
- Structured dtype uses Jsont.Error.msgf for decode errors
- decode_field and decode_shape return result types
- No more invalid_arg or manual JSON AST matching in dtype codec
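A minimal sketch of the result-based shape (the dtype representation and the handful of typestrings are illustrative, not the project's actual table):

```ocaml
(* Illustrative: parse_typestr returning a result instead of raising.
   Only a few NumPy-style typestrings are covered here; the real codec
   handles the full typestr grammar. *)
let parse_typestr = function
  | "|i1" -> Ok `Int8
  | "<i2" -> Ok `Int16
  | "<f4" -> Ok `Float32
  | "<f8" -> Ok `Float64
  | s -> Error (Printf.sprintf "unknown typestr: %S" s)
```

A result-returning `of_string` of this shape is what makes the `Jsont.of_of_string` route mentioned above produce proper decode errors instead of exceptions.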
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use let rec with a lazy value and Jsont.rec' instead of a mutable
forward reference. The lazy block binds self = Jsont.rec' dtype_jsont_lazy
and uses it directly for recursive decode/encode of nested structured
dtypes.
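The same knot-tying works with any lazy self-reference; here is a stdlib-only sketch of the pattern with illustrative names (a recursive printer in place of a recursive codec):

```ocaml
(* Stdlib-only sketch of the let-rec-with-lazy pattern the commit uses
   for Jsont.rec'. Names are illustrative, not the project's. *)
type dtype = Simple of string | Structured of (string * dtype) list

let rec pp_lazy : (dtype -> string) Lazy.t =
  lazy
    (fun d ->
      (* self-reference resolved lazily, no mutable forward ref *)
      let self = Lazy.force pp_lazy in
      match d with
      | Simple s -> s
      | Structured fields ->
          "{"
          ^ String.concat "; "
              (List.map (fun (name, d) -> name ^ ": " ^ self d) fields)
          ^ "}")

let pp d = Lazy.force pp_lazy d
```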
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add pp_dtype, pp_data_type, pp_fill_value, pp_attrs, pp_probe_result
to the library. These use plain string indentation (not Format boxes)
for clean tree output.
The zarr-inspect binary is now a one-liner calling pp_probe_result.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
zarr-inspect now shows decoded convention metadata inline:
- proj: code, wkt2, projjson
- spatial: dims, bbox, transform, shape, registration
- multiscales: level count, layout with derivation chains
- unknown keys: truncated JSON values
Root attrs and per-child attrs are both displayed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
probe now populates children by building a tree from consolidated
metadata entries. Both v3 and v2 consolidated metadata are supported,
with recursive grouping by path components.
zarr-inspect displays the tree with types and shapes, e.g.:
    utm01 [group]
      embeddings [array int8 9x128x1466368x69632]
      scales [array float32 9x1466368x69632]
No directory listing needed — children come from consolidated metadata.
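One level of the grouping can be sketched with stdlib functions only (the helper name is mine, not the library's):

```ocaml
(* Illustrative: extract the direct children of a node from flat
   consolidated paths like "utm01/embeddings". Recursing on the
   remainders after the first '/' yields the full tree. *)
let direct_children paths =
  List.filter_map
    (fun path ->
      match String.index_opt path '/' with
      | None -> Some path
      | Some i -> Some (String.sub path 0 i))
    paths
  |> List.sort_uniq compare
```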
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass empty object to sub-codecs when configuration is absent, letting
the codec's own field declarations (mem vs opt_mem with dec_absent)
determine what's required. This produces proper jsont errors for
genuinely missing required fields.
Make Codec.Bytes.endian optional (None for single-byte data types).
Tested against geotessera.org zarr store (464 nodes, some bytes codecs
without configuration).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All OCaml source files annotated with ai-generated/claude-opus-4/Anthropic.
opam template adds x-ai-* extension fields.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Consolidated module for v3 inline consolidated_metadata field
(zarr-specs PR 309) and V2_consolidated module for v2 .zmetadata files.
The probe function now checks for consolidated metadata:
- V3: parses consolidated_metadata from zarr.json if present
- V2: reads .zmetadata if present alongside .zgroup
The zarr-inspect CLI displays the consolidated node listing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Zarr_jsont.probe ~read which probes a zarr store by trying
zarr.json, .zarray, .zgroup in order. The read function is a
caller-supplied callback for fetching files.
Add zarr-inspect binary that uses curl for URLs and direct file
reads for local directories.
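A file-backed read callback of the kind described might look like this (the helper names and the probe-order sketch are illustrative, not the library's API):

```ocaml
(* Hypothetical read callback over a local directory, plus the probe
   order the commit describes: zarr.json, then .zarray, then .zgroup. *)
let read_file ~root path =
  let full = Filename.concat root path in
  if Sys.file_exists full then
    Some (In_channel.with_open_bin full In_channel.input_all)
  else None

let probe_kind read =
  match read "zarr.json" with
  | Some _ -> Some "zarr v3 node"
  | None -> (
      match read ".zarray" with
      | Some _ -> Some "zarr v2 array"
      | None -> (
          match read ".zgroup" with
          | Some _ -> Some "zarr v2 group"
          | None -> None))
```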
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Task 13 roundtrip tests exercising real-world v2/v3 Zarr JSON through
encode-decode-encode cycles, and Task 14 tests verifying that unknown fields
survive roundtrips in V2.Array_meta, V2.Compressor.Blosc, and Attrs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add V2_node and V3_node top-level modules (after Attrs to avoid ordering
issues), v2_array_jsont, v2_group_jsont, and v3_jsont codecs that
decode/encode full Zarr nodes with embedded convention attributes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add V3 module with all six core zarr v3 codec types plus Other_ext fallback.
Sharding is recursive via Jsont.rec' with a lazy value referencing codec_jsont_lazy.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
15 tasks covering bottom-up build: shared types, v2/v3 modules,
conventions, attrs composition, dispatch codec, and roundtrip tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Defines type-safe OCaml jsont codecs for Zarr v2/v3 JSON metadata
with best-effort convention decoding for geo-proj, spatial, and
multiscales.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>