My aggregated monorepo of OCaml code, automaintained

Zarr v3 + GeoTessera Zarr: interfaces and stubs

Layered architecture:
- zarr-v3: pure OCaml Zarr v3 reader (pluggable fetch + codecs, Lwt async)
- tessera-zarr: GeoTessera-specific client (bbox→UTM→shards→dequantize→reproject)
- zarr-v3-unix: curl-based fetch for testing
- tessera-zarr-jsoo: browser XHR fetch
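
Because the fetch function is pluggable, the reader layer can be exercised with no network at all. Below is a minimal sketch of an in-memory backend that honours `(offset, length)` ranges. It assumes a synchronous simplification of the plan's Lwt-based `fetch` type; `memory_fetch` and the `mem://` URL are hypothetical.

```ocaml
(* Hypothetical synchronous analogue of the plan's Zarr_v3.fetch type;
   the real interface returns [string Lwt.t]. *)
type fetch = url:string -> ?range:(int * int) -> unit -> string

(* A fetch function backed by an in-memory table of URL -> body.
   [range] is (offset, length), like "Range: bytes=off-(off+len-1)". *)
let memory_fetch (files : (string, string) Hashtbl.t) : fetch =
 fun ~url ?range () ->
  match Hashtbl.find_opt files url with
  | None -> failwith ("not found: " ^ url)
  | Some body -> (
      match range with
      | None -> body
      | Some (off, len) ->
          if off < 0 || len < 0 || off + len > String.length body then
            failwith "range out of bounds"
          else String.sub body off len)

let () =
  let files = Hashtbl.create 4 in
  Hashtbl.replace files "mem://store/c/0/0/0" "0123456789";
  let fetch = memory_fetch files in
  print_endline (fetch ~url:"mem://store/c/0/0/0" ());
  print_endline (fetch ~url:"mem://store/c/0/0/0" ~range:(6, 4) ())
```

Swapping this for a curl or XHR backend changes nothing above the fetch boundary, which is the point of the layering.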

All interfaces documented, all implementations stubbed. Compiles clean.
Expose reproject_tile from geotessera.mli for tessera-zarr.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs/superpowers/plans/2026-03-19-zarr-v3-geotessera.md
# Zarr v3 Reader + GeoTessera Zarr Client Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build a layered Zarr v3 reader and GeoTessera-specific client that fetches embeddings via efficient HTTP range requests on sharded Zarr stores, replacing the current 106 MB-per-tile `.npy` downloads.

**Architecture:** Three layers: (1) a pure OCaml Zarr v3 reader with pluggable codecs and fetch, (2) a GeoTessera-specific client that maps WGS84 bboxes to UTM pixel ranges and fetches/dequantizes/reprojects embeddings, and (3) platform backends (jsoo for the browser, unix for testing). The Zarr v3 layer uses Lwt for async I/O throughout.

**Tech Stack:** OCaml, Lwt, Yojson, js_of_ocaml (browser backend only), Bigarray

---

## Key Data Format Findings

From probing the live store at `https://dl2.geotessera.org/zarr/v1/2024.zarr/`:

- **Root metadata:** `zarr.json` contains consolidated metadata for all zones (utm29, utm30, utm31) inline, so a single fetch gets everything
- **Embeddings array:** shape `[1297408, 66560, 128]`, int8, sharded 256×256×128 with 4×4×128 inner chunks
- **Scales array:** shape `[1297408, 66560]`, float32, sharded 256×256 with 4×4 inner chunks
- **Shard index:** 4096 entries (64 × 64 inner chunks per shard) × 16 bytes (offset + nbytes as uint64 LE) + 4 bytes CRC32C, at the END of each shard
- **Blosc codec:** wraps inner chunks with a 16-byte header. For this dataset, Blosc stores data **uncompressed** (memcpy fallback), so `cbytes = nbytes + 16`. No Zstd decompression is actually needed for current data.
- **Spatial metadata:** `spatial:transform` gives the 6-element affine transform `[10, 0, origin_e, 0, -10, origin_n]`
- **Shard key path:** `c/{row_chunk}/{col_chunk}/{band_chunk}` where band_chunk=0 (all 128 bands fit in one chunk)
- **HTTP range requests:** fully supported (206 Partial Content)

## File Structure

| File | Responsibility |
|------|---------------|
| **Layer 1: zarr-v3** | |
| `zarr-v3/dune-project` | Package metadata (depends only on lwt, yojson) |
| `zarr-v3/lib/dune` | Library build config |
| `zarr-v3/lib/zarr_v3.ml` | Core types, metadata parsing, shard reader, slice API |
| `zarr-v3/lib/zarr_v3.mli` | Public API |
| `zarr-v3/lib/blosc.ml` | Blosc header parser (pure OCaml; just strips the 16-byte header) |
| `zarr-v3/lib/blosc.mli` | Blosc interface |
| `zarr-v3/test/dune` | Test build config |
| `zarr-v3/test/test_zarr_v3.ml` | Unit tests (metadata parsing, shard index, Blosc header) |
| **Layer 2: tessera-zarr** | |
| `tessera-zarr/dune-project` | Package metadata (depends on zarr-v3, tessera-geotessera) |
| `tessera-zarr/lib/dune` | Library build config |
| `tessera-zarr/lib/tessera_zarr.ml` | GeoTessera store client: bbox→UTM→shards→dequantize→reproject |
| `tessera-zarr/lib/tessera_zarr.mli` | Public API |
| **Platform: zarr-v3-unix** | |
| `zarr-v3-unix/dune-project` | Package metadata |
| `zarr-v3-unix/lib/dune` | Library build config |
| `zarr-v3-unix/lib/zarr_v3_unix.ml` | Unix HTTP fetch via `curl` subprocess |
| `zarr-v3-unix/lib/zarr_v3_unix.mli` | Public API |
| **Platform: tessera-zarr-jsoo** | |
| `tessera-zarr-jsoo/dune-project` | Package metadata |
| `tessera-zarr-jsoo/lib/dune` | Library build config |
| `tessera-zarr-jsoo/lib/tessera_zarr_jsoo.ml` | Browser fetch via XMLHttpRequest + Lwt |
| `tessera-zarr-jsoo/lib/tessera_zarr_jsoo.mli` | Public API |
| **Integration test** | |
| `zarr-v3/test/test_live.ml` | Integration test against live GeoTessera store (unix backend) |
| **Notebook** | |
| `site/notebooks/interactive_map_zarr.mld` | New notebook using Zarr-based tile fetching |
| `build-site.sh` | Add tessera-zarr-jsoo to jtw universe |

---

### Task 1: Blosc header parser

A pure OCaml module that parses the 16-byte Blosc header and extracts the raw data.
This handles the innermost codec in the Zarr chain.

**Files:**
- Create: `zarr-v3/dune-project`
- Create: `zarr-v3/lib/dune`
- Create: `zarr-v3/lib/blosc.ml`
- Create: `zarr-v3/lib/blosc.mli`
- Create: `zarr-v3/test/dune`
- Create: `zarr-v3/test/test_zarr_v3.ml`

- [ ] **Step 1: Create project scaffold**

`zarr-v3/dune-project`:
```
(lang dune 3.17)
(name zarr-v3)
(generate_opam_files true)
(license ISC)
(package
 (name zarr-v3)
 (synopsis "Pure OCaml Zarr v3 reader with pluggable codecs")
 (description "Async Zarr v3 store reader supporting sharding, pluggable compression codecs, and HTTP range requests. Platform-independent.")
 (depends
  (ocaml (>= 5.2))
  (lwt (>= 5.0))
  (yojson (>= 2.0))
  (alcotest (and :with-test (>= 0.8)))
  (lwt_alcotest (and :with-test (>= 0.1)))))
```

`zarr-v3/lib/dune`:
```
(library
 (name zarr_v3)
 (public_name zarr-v3)
 (libraries lwt yojson))
```

`zarr-v3/test/dune`:
```
(executable
 (name test_zarr_v3)
 (libraries zarr-v3 alcotest))
```

- [ ] **Step 2: Write Blosc header test**

`zarr-v3/test/test_zarr_v3.ml`:
```ocaml
let test_blosc_decode_memcpy () =
  (* Blosc frame: 16-byte header + uncompressed data *)
  (* version=2, versionlz=1, flags=0x93, typesize=4 *)
  (* nbytes=16, blocksize=16, cbytes=32 *)
  let header = Bytes.create 32 in
  Bytes.set header 0 '\x02'; (* version *)
  Bytes.set header 1 '\x01'; (* versionlz *)
  Bytes.set header 2 '\x93'; (* flags *)
  Bytes.set header 3 '\x04'; (* typesize *)
  (* nbytes = 16 (LE uint32) *)
  Bytes.set header 4 '\x10'; Bytes.set header 5 '\x00';
  Bytes.set header 6 '\x00'; Bytes.set header 7 '\x00';
  (* blocksize = 16 *)
  Bytes.set header 8 '\x10'; Bytes.set header 9 '\x00';
  Bytes.set header 10 '\x00'; Bytes.set header 11 '\x00';
  (* cbytes = 32 *)
  Bytes.set header 12 '\x20'; Bytes.set header 13 '\x00';
  Bytes.set header 14 '\x00'; Bytes.set header 15 '\x00';
  (* 16 bytes of payload: 0x01..0x10 *)
  for i = 0 to 15 do
    Bytes.set header (16 + i) (Char.chr (i + 1))
  done;
  let input = Bytes.to_string header in
  (* Blosc is a module of the wrapped zarr_v3 library *)
  let result = Zarr_v3.Blosc.decode input in
  Alcotest.(check int) "length" 16 (String.length result);
  Alcotest.(check char) "first byte" '\x01' result.[0];
  Alcotest.(check char) "last byte" '\x10' result.[15]

let test_blosc_decode_too_short () =
  Alcotest.check_raises "too short" (Failure "Blosc: frame too short")
    (fun () -> ignore (Zarr_v3.Blosc.decode "short"))

let () =
  Alcotest.run "zarr-v3" [
    ("blosc", [
      Alcotest.test_case "decode memcpy" `Quick test_blosc_decode_memcpy;
      Alcotest.test_case "too short" `Quick test_blosc_decode_too_short;
    ]);
  ]
```

- [ ] **Step 3: Run test to verify it fails**

Run: `opam exec -- dune exec zarr-v3/test/test_zarr_v3.exe`
Expected: compilation error (`Zarr_v3.Blosc` not found)

- [ ] **Step 4: Implement Blosc header parser**

`zarr-v3/lib/blosc.mli`:
```ocaml
(** Blosc frame decoder.

    Parses the 16-byte Blosc header and extracts the raw payload.
    Currently handles uncompressed (memcpy) frames only.
    For compressed frames, a [decompress] callback must be provided. *)

type header = {
  nbytes : int;     (** Uncompressed data size *)
  cbytes : int;     (** Compressed size including header *)
  typesize : int;   (** Element size for shuffle *)
  blocksize : int;  (** Block size *)
}

val parse_header : string -> header
(** Parse a Blosc header from the first 16 bytes of a frame.
    @raise Failure if input is shorter than 16 bytes. *)

val decode : ?decompress:(string -> int -> string) -> string -> string
(** Decode a Blosc frame. If the frame is uncompressed (memcpy), returns
    the raw data directly. If compressed and [decompress] is provided,
    calls [decompress compressed_data uncompressed_size]. If compressed
    and no [decompress] is given, raises [Failure].

    @raise Failure if the frame is too short or compressed without a decoder. *)
```

`zarr-v3/lib/blosc.ml`:
```ocaml
type header = {
  nbytes : int;
  cbytes : int;
  typesize : int;
  blocksize : int;
}

let get_u32_le s off =
  Char.code s.[off]
  lor (Char.code s.[off+1] lsl 8)
  lor (Char.code s.[off+2] lsl 16)
  lor (Char.code s.[off+3] lsl 24)

let parse_header s =
  if String.length s < 16 then failwith "Blosc: frame too short";
  let typesize = Char.code s.[3] in
  let nbytes = get_u32_le s 4 in
  let blocksize = get_u32_le s 8 in
  let cbytes = get_u32_le s 12 in
  { nbytes; cbytes; typesize; blocksize }

let decode ?decompress s =
  let h = parse_header s in
  if h.cbytes = h.nbytes + 16 then
    (* Memcpy mode: data is stored uncompressed after header *)
    String.sub s 16 h.nbytes
  else match decompress with
    | Some f ->
      let compressed = String.sub s 16 (h.cbytes - 16) in
      f compressed h.nbytes
    | None ->
      failwith (Printf.sprintf
        "Blosc: compressed frame (cbytes=%d, nbytes=%d) but no decompressor provided"
        h.cbytes h.nbytes)
```

- [ ] **Step 5: Run test to verify it passes**

Run: `opam exec -- dune exec zarr-v3/test/test_zarr_v3.exe`
Expected: PASS

- [ ] **Step 6: Commit**

```bash
git add zarr-v3/
git commit -m "zarr-v3: scaffold + Blosc header parser with memcpy decode"
```

---

### Task 2: Zarr v3 metadata parsing

Parse `zarr.json` for root metadata, array metadata (shape, dtype, chunk grid, codecs), and group attributes (spatial transform, CRS).
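
The fields parsed in this task (chunk grid, inner chunk shape, spatial transform) are exactly what is needed to locate a pixel in the store. The sketch below walks from a UTM coordinate to a shard key and a shard-index entry. It assumes the values from the data-format findings (10 m pixels, 256×256×128 shards, 4×4×128 inner chunks, 16-byte index entries, C-order numbering of inner chunks) and the `utm31` origin quoted in the Task 3 live test. All helper names are hypothetical.

```ocaml
(* Hypothetical helpers; constants come from the plan's data-format findings
   and the utm31 origin used in the Task 3 live test. *)
let pixel_size = 10.0
let origin_e = 167600.0
let origin_n = 6933480.0
let shard = 256              (* shard edge in pixels *)
let inner = 4                (* inner chunk edge in pixels *)
let index_entry_bytes = 16   (* offset + nbytes, two uint64 LE *)

(* Invert the affine transform [10, 0, origin_e, 0, -10, origin_n]:
   rows grow southwards, so northing is subtracted from the origin. *)
let pixel_of_utm ~easting ~northing =
  let col = int_of_float (Float.floor ((easting -. origin_e) /. pixel_size)) in
  let row = int_of_float (Float.floor ((origin_n -. northing) /. pixel_size)) in
  (row, col)

(* Embeddings shard key c/{row_chunk}/{col_chunk}/0: all 128 bands sit in
   one chunk. Assumes row and col are non-negative (inside the array). *)
let shard_key (row, col) =
  Printf.sprintf "c/%d/%d/0" (row / shard) (col / shard)

(* C-order index of the inner chunk within its shard, and the byte offset
   of its entry in the shard index. The band axis has a single inner chunk,
   so it does not change the linear index. *)
let inner_index (row, col) =
  let per_side = shard / inner in            (* 64 inner chunks per edge *)
  let r = (row mod shard) / inner and c = (col mod shard) / inner in
  let linear = r * per_side + c in
  (linear, linear * index_entry_bytes)

let () =
  let px = pixel_of_utm ~easting:305000.0 ~northing:5782000.0 in
  let linear, entry_off = inner_index px in
  Printf.printf "pixel=(%d,%d) key=%s inner=%d entry_offset=%d\n"
    (fst px) (snd px) (shard_key px) linear entry_off
```

For the Cambridge point in the live test, this gives pixel `(115148, 13740)`, shard key `c/449/53/0`, inner chunk 3307, and index entry offset 52912 within the 65540-byte index (4096 × 16 + 4).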

**Files:**
- Create: `zarr-v3/lib/zarr_v3.ml`
- Create: `zarr-v3/lib/zarr_v3.mli`
- Modify: `zarr-v3/test/test_zarr_v3.ml`

- [ ] **Step 1: Write metadata parsing test**

Add to `zarr-v3/test/test_zarr_v3.ml`:
```ocaml
let embeddings_json = {|{
  "shape": [1297408, 66560, 128],
  "data_type": "int8",
  "chunk_grid": {
    "name": "regular",
    "configuration": {"chunk_shape": [256, 256, 128]}
  },
  "chunk_key_encoding": {
    "name": "default",
    "configuration": {"separator": "/"}
  },
  "fill_value": 0,
  "codecs": [{
    "name": "sharding_indexed",
    "configuration": {
      "chunk_shape": [4, 4, 128],
      "codecs": [{"name": "bytes"}, {"name": "blosc", "configuration": {"cname": "zstd"}}],
      "index_codecs": [{"name": "bytes", "configuration": {"endian": "little"}}, {"name": "crc32c"}],
      "index_location": "end"
    }
  }],
  "zarr_format": 3,
  "node_type": "array"
}|}

let test_parse_array_meta () =
  let meta = Zarr_v3.parse_array_meta embeddings_json in
  Alcotest.(check (list int)) "shape" [1297408; 66560; 128] meta.shape;
  Alcotest.(check string) "dtype" "int8" meta.data_type;
  Alcotest.(check (list int)) "chunk_shape" [256; 256; 128] meta.chunk_shape;
  Alcotest.(check bool) "is sharded" true meta.is_sharded;
  Alcotest.(check (list int)) "inner_chunk_shape" [4; 4; 128]
    (match meta.inner_chunk_shape with Some s -> s | None -> [])
```

Update the test runner:
```ocaml
let () =
  Alcotest.run "zarr-v3" [
    ("blosc", [
      Alcotest.test_case "decode memcpy" `Quick test_blosc_decode_memcpy;
      Alcotest.test_case "too short" `Quick test_blosc_decode_too_short;
    ]);
    ("metadata", [
      Alcotest.test_case "parse array meta" `Quick test_parse_array_meta;
    ]);
  ]
```

- [ ] **Step 2: Run test to verify it fails**

Run: `opam exec -- dune exec zarr-v3/test/test_zarr_v3.exe`
Expected: compilation error (`Zarr_v3.parse_array_meta` is unbound)

- [ ] **Step 3: Implement metadata parser**

`zarr-v3/lib/zarr_v3.mli`:
```ocaml
(** Pure OCaml Zarr v3 store reader.

    Supports sharded arrays with pluggable codecs and fetch functions.
    Platform-independent: bring your own HTTP client and decompressors. *)

module Blosc = Blosc
(** Re-exported Blosc decoder. [Zarr_v3] is the library's main module, so
    sibling modules are hidden from clients unless re-exported here. *)

(** {1 Types} *)

type array_meta = {
  shape : int list;
  data_type : string;
  chunk_shape : int list;
  chunk_separator : string;
  fill_value_bytes : string;
  is_sharded : bool;
  inner_chunk_shape : int list option;
  inner_codecs : string list;  (** Codec names for inner chunks, e.g., ["bytes"; "blosc"] *)
  index_location : string;     (** "end" or "start" *)
}
(** Parsed metadata for a Zarr v3 array. *)

type group_meta = {
  attributes : (string * Yojson.Safe.t) list;
}
(** Parsed metadata for a Zarr v3 group. *)

(** {1 Metadata parsing} *)

val parse_array_meta : string -> array_meta
(** Parse array metadata from a JSON string.
    @raise Failure on invalid or unsupported metadata. *)

val parse_group_meta : string -> group_meta
(** Parse group metadata from a JSON string. *)

val parse_consolidated : string -> (string * Yojson.Safe.t) list
(** Parse consolidated metadata from root zarr.json.
    Returns a list of [(path, metadata_json)] pairs. *)

(** {1 Store access} *)

type fetch = url:string -> ?range:(int * int) -> unit -> string Lwt.t
(** HTTP fetch function. [range] is [(offset, length)] for byte-range requests. *)

type codec = string -> string
(** A codec that transforms bytes. For decompression codecs, takes compressed
    input and returns decompressed output. *)

type codec_registry = string -> codec option
(** Maps codec names (e.g., "blosc") to implementations. *)

type store
(** An open Zarr v3 store backed by HTTP. *)

type arr
(** An open Zarr v3 array with parsed metadata and access methods. *)

val open_store : fetch:fetch -> codecs:codec_registry -> string -> store Lwt.t
(** Open a Zarr v3 store at a URL. Fetches and parses the root [zarr.json]. *)

val store_meta : store -> (string * Yojson.Safe.t) list
(** Access the consolidated metadata entries. *)

val open_array : store -> string -> arr Lwt.t
(** Open an array by path within the store (e.g., ["utm31/embeddings"]).
    Uses consolidated metadata if available, otherwise fetches [path/zarr.json]. *)

val array_meta : arr -> array_meta
(** Get the parsed metadata for an array. *)

val get_shards : arr -> start:int array -> stop:int array -> string Lwt.t
(** Read a rectangular region of an array as raw bytes.
    [start] and [stop] are pixel coordinates (inclusive start, exclusive stop).
    Returns raw decoded bytes in C-order. *)
```

`zarr-v3/lib/zarr_v3.ml`:
```ocaml
module Blosc = Blosc

(* --- Types --- *)

type array_meta = {
  shape : int list;
  data_type : string;
  chunk_shape : int list;
  chunk_separator : string;
  fill_value_bytes : string;
  is_sharded : bool;
  inner_chunk_shape : int list option;
  inner_codecs : string list;
  index_location : string;
}

type group_meta = {
  attributes : (string * Yojson.Safe.t) list;
}

type fetch = url:string -> ?range:(int * int) -> unit -> string Lwt.t
type codec = string -> string
type codec_registry = string -> codec option

type store = {
  base_url : string;
  fetch : fetch;
  codecs : codec_registry;
  consolidated : (string * Yojson.Safe.t) list;
}

type arr = {
  store : store;
  path : string;
  meta : array_meta;
}

(* --- JSON helpers --- *)

let json_member key = function
  | `Assoc l -> (try List.assoc key l with Not_found -> `Null)
  | _ -> `Null

let json_to_int = function `Int i -> i | _ -> failwith "expected int"
let json_to_string = function `String s -> s | _ -> failwith "expected string"
let json_to_list f = function `List l -> List.map f l | _ -> failwith "expected list"

let json_to_int_list j = json_to_list json_to_int j

(* --- Metadata parsing --- *)

let parse_array_meta json_str =
  let j = Yojson.Safe.from_string json_str in
  let shape = json_to_int_list (json_member "shape" j) in
  let data_type = json_to_string (json_member "data_type" j) in
  let chunk_grid = json_member "chunk_grid" j in
  let chunk_shape = json_to_int_list
    (json_member "chunk_shape" (json_member "configuration" chunk_grid)) in
  let chunk_key = json_member "chunk_key_encoding" j in
  let chunk_separator = match json_member "separator"
    (json_member "configuration" chunk_key) with
    | `String s -> s | _ -> "/" in
  let fill_value_bytes = "" in (* TODO: proper fill value handling *)
  let codecs = match json_member "codecs" j with
    | `List l -> l | _ -> [] in
  (* Check for sharding *)
  let sharding = List.find_opt (fun c ->
    json_to_string (json_member "name" c) = "sharding_indexed") codecs in
  match sharding with
  | Some shard_codec ->
    let config = json_member "configuration" shard_codec in
    let inner_chunk_shape = json_to_int_list (json_member "chunk_shape" config) in
    let inner_codecs_json = match json_member "codecs" config with
      | `List l -> l | _ -> [] in
    let inner_codecs = List.map (fun c ->
      json_to_string (json_member "name" c)) inner_codecs_json in
    let index_location = match json_member "index_location" config with
      | `String s -> s | _ -> "end" in
    { shape; data_type; chunk_shape; chunk_separator; fill_value_bytes;
      is_sharded = true; inner_chunk_shape = Some inner_chunk_shape;
      inner_codecs; index_location }
  | None ->
    { shape; data_type; chunk_shape; chunk_separator; fill_value_bytes;
      is_sharded = false; inner_chunk_shape = None;
      inner_codecs = List.map (fun c ->
        json_to_string (json_member "name" c)) codecs;
      index_location = "end" }

let parse_group_meta json_str =
  let j = Yojson.Safe.from_string json_str in
  let attributes = match json_member "attributes" j with
    | `Assoc l -> l | _ -> [] in
  { attributes }

let parse_consolidated json_str =
  let j = Yojson.Safe.from_string json_str in
  let cm = json_member "consolidated_metadata" j in
  match json_member "metadata" cm with
  | `Assoc entries -> entries
  | _ -> []

(* --- Store and array access --- *)

(* The annotation is required: without it, [fetch ~url ()] would be inferred
   as a function with no optional [?range] argument and fail to match [fetch]. *)
let open_store ~(fetch : fetch) ~codecs base_url =
  let open Lwt.Syntax in
  let+ root_json = fetch ~url:(base_url ^ "/zarr.json") () in
  let consolidated = parse_consolidated root_json in
  { base_url; fetch; codecs; consolidated }

let store_meta store = store.consolidated

let open_array store path =
  let open Lwt.Syntax in
  let+ meta_json =
    match List.assoc_opt path store.consolidated with
    | Some j -> Lwt.return (Yojson.Safe.to_string j)
    | None ->
      (* Not consolidated: fall back to the array's own zarr.json *)
      store.fetch ~url:(Printf.sprintf "%s/%s/zarr.json" store.base_url path) ()
  in
  { store; path; meta = parse_array_meta meta_json }

let array_meta arr = arr.meta

(* --- Shard reading --- *)

let dtype_size = function
  | "int8" | "uint8" -> 1
  | "int16" | "uint16" -> 2
  | "int32" | "float32" -> 4
  | "float64" -> 8
  | dt -> failwith (Printf.sprintf "Unsupported dtype: %s" dt)

let get_u64_le s off =
  let b i = Int64.of_int (Char.code s.[off + i]) in
  let ( lor ) = Int64.logor in
  let ( lsl ) = Int64.shift_left in
  Int64.to_int (
    (b 0) lor ((b 1) lsl 8) lor ((b 2) lsl 16) lor ((b 3) lsl 24)
    lor ((b 4) lsl 32) lor ((b 5) lsl 40) lor ((b 6) lsl 48) lor ((b 7) lsl 56))

(* Apply codec chain to raw chunk bytes (decoding runs the list right-to-left) *)
let apply_codecs codecs codec_names data =
  List.fold_right (fun name acc ->
    match name with
    | "bytes" -> acc  (* endian handling: a no-op for int8 and on little-endian hosts *)
    | "blosc" ->
      let decompress = match codecs "zstd" with
        | Some f -> Some (fun s _nbytes -> f s)
        | None -> None
      in
      Blosc.decode ?decompress acc
    | "crc32c" -> acc  (* skip validation for now *)
    | other ->
      match codecs other with
      | Some f -> f acc
      | None -> failwith (Printf.sprintf "Unknown codec: %s" other)
  ) codec_names data

let get_shards arr ~start ~stop =
  let open Lwt.Syntax in
  let meta = arr.meta in
  let ndim = List.length meta.shape in
  if Array.length start <> ndim || Array.length stop <> ndim then
    invalid_arg "Zarr_v3.get_shards: start/stop rank does not match array";
  let shape = Array.of_list meta.shape in
  let chunk_shape = Array.of_list meta.chunk_shape in
  let inner_chunk_shape = match meta.inner_chunk_shape with
    | Some s -> Array.of_list s
    | None -> chunk_shape
  in
  let elem_size = dtype_size meta.data_type in
  (* Compute output dimensions *)
  let out_shape = Array.init ndim (fun d -> stop.(d) - start.(d)) in
  let out_total = Array.fold_left ( * ) 1 out_shape * elem_size in
  let out_buf = Bytes.make out_total '\x00' in
  (* Compute which shards we need *)
  let shard_start = Array.init ndim (fun d -> start.(d) / chunk_shape.(d)) in
  let shard_stop = Array.init ndim (fun d -> (stop.(d) - 1) / chunk_shape.(d) + 1) in
  (* Inner chunks per shard per dimension *)
  let inner_per_shard = Array.init ndim (fun d ->
    chunk_shape.(d) / inner_chunk_shape.(d)) in
  let n_inner_chunks = Array.fold_left ( * ) 1 inner_per_shard in
  let index_entry_size = 16 in (* 2 × uint64 *)
  let index_size = n_inner_chunks * index_entry_size + 4 (* CRC32C *) in
  (* Iterate over shards *)
  let shard_tasks = ref [] in
  let rec iter_shards shard_idx dim =
    if dim = ndim then begin
      (* Copy: the enclosing loops keep mutating [shard_idx], and the code
         below runs only after the asynchronous fetch resolves. *)
      let shard_idx = Array.copy shard_idx in
      let shard_key = String.concat meta.chunk_separator
        ("c" :: List.init ndim (fun d -> string_of_int shard_idx.(d))) in
      let shard_url = Printf.sprintf "%s/%s/%s" arr.store.base_url arr.path shard_key in
      let task =
        (* 1. Fetch the whole shard and slice the index from its end.
           TODO: [fetch] ranges are (offset, length) and cannot express an HTTP
           suffix range, so this downloads the full shard rather than only the
           index and the overlapping inner chunks. *)
        let* shard = arr.store.fetch ~url:shard_url () in
        let index_data =
          String.sub shard (String.length shard - index_size) index_size in
        (* 2. For each inner chunk that overlaps our region, decode and copy *)
        let rec iter_inner inner_idx dim =
          if dim = ndim then begin
            (* Compute the pixel range this inner chunk covers *)
            let chunk_start = Array.init ndim (fun d ->
              shard_idx.(d) * chunk_shape.(d) + inner_idx.(d) * inner_chunk_shape.(d)) in
            let chunk_stop = Array.init ndim (fun d ->
              min (chunk_start.(d) + inner_chunk_shape.(d)) shape.(d)) in
            (* Check overlap with requested region *)
            let overlaps = Array.for_all (fun d ->
              chunk_start.(d) < stop.(d) && chunk_stop.(d) > start.(d)) (Array.init ndim Fun.id) in
            if overlaps then begin
              (* Linearize inner chunk index (C order) *)
              let linear_idx = ref 0 in
              let stride = ref 1 in
              for d = ndim - 1 downto 0 do
                linear_idx := !linear_idx + inner_idx.(d) * !stride;
                stride := !stride * inner_per_shard.(d)
              done;
              let offset = get_u64_le index_data (!linear_idx * index_entry_size) in
              let nbytes = get_u64_le index_data (!linear_idx * index_entry_size + 8) in
              (* Empty chunks store offset = nbytes = 2^64 - 1, which
                 [get_u64_le] wraps to -1; they keep the zero fill. *)
              if offset >= 0 && nbytes > 0 then begin
                let compressed = String.sub shard offset nbytes in
                let raw = apply_codecs arr.store.codecs meta.inner_codecs compressed in
                (* Copy overlapping region into output buffer *)
                let copy_start = Array.init ndim (fun d -> max chunk_start.(d) start.(d)) in
                let copy_stop = Array.init ndim (fun d -> min chunk_stop.(d) stop.(d)) in
                (* For simplicity, iterate over the overlapping pixels *)
                let rec copy_pixels idx dim =
                  if dim = ndim then begin
                    let src_offset =
                      let off = ref 0 in
                      let stride = ref elem_size in
                      for d = ndim - 1 downto 0 do
                        off := !off + (idx.(d) - chunk_start.(d)) * !stride;
                        stride := !stride * inner_chunk_shape.(d)
                      done;
                      !off in
                    let dst_offset =
                      let off = ref 0 in
                      let stride = ref elem_size in
                      for d = ndim - 1 downto 0 do
                        off := !off + (idx.(d) - start.(d)) * !stride;
                        stride := !stride * out_shape.(d)
                      done;
                      !off in
                    Bytes.blit_string raw src_offset out_buf dst_offset elem_size
                  end else begin
                    for i = copy_start.(dim) to copy_stop.(dim) - 1 do
                      idx.(dim) <- i;
                      copy_pixels idx (dim + 1)
                    done
                  end
                in
                copy_pixels (Array.make ndim 0) 0
              end
            end;
            Lwt.return_unit
          end else begin
            (* Decoding is synchronous, so sharing [inner_idx] is safe here *)
            let tasks = ref [] in
            for i = 0 to inner_per_shard.(dim) - 1 do
              inner_idx.(dim) <- i;
              tasks := iter_inner inner_idx (dim + 1) :: !tasks
            done;
            Lwt.join !tasks
          end
        in
        iter_inner (Array.make ndim 0) 0
      in
      shard_tasks := task :: !shard_tasks
    end else begin
      for i = shard_start.(dim) to shard_stop.(dim) - 1 do
        shard_idx.(dim) <- i;
        iter_shards shard_idx (dim + 1)
      done
    end
  in
  iter_shards (Array.make ndim 0) 0;
  let+ () = Lwt.join !shard_tasks in
  Bytes.to_string out_buf
```

- [ ] **Step 4: Run tests**

Run: `opam exec -- dune exec zarr-v3/test/test_zarr_v3.exe`
Expected: PASS (metadata parsing tests + Blosc tests)

- [ ] **Step 5: Commit**

```bash
git add zarr-v3/
git commit -m "zarr-v3: metadata parsing, shard index reading, slice API"
```

---

### Task 3: Unix fetch backend + integration test

**Files:**
- Create: `zarr-v3-unix/dune-project`
- Create: `zarr-v3-unix/lib/dune`
- Create: `zarr-v3-unix/lib/zarr_v3_unix.ml`
- Create: `zarr-v3-unix/lib/zarr_v3_unix.mli`
- Create: `zarr-v3/test/test_live.ml`

- [ ] **Step 1: Create Unix backend**

`zarr-v3-unix/dune-project`:
```
(lang dune 3.17)
(name zarr-v3-unix)
(generate_opam_files true)
(license ISC)
(package
 (name zarr-v3-unix)
 (synopsis "Unix HTTP backend for zarr-v3")
 (depends
  (ocaml (>= 5.2))
  (zarr-v3 (>= 0.1))
  (lwt (>= 5.0))))
```

`zarr-v3-unix/lib/dune`:
```
(library
 (name zarr_v3_unix)
 (public_name zarr-v3-unix)
 (libraries zarr-v3 lwt lwt.unix))
```

`zarr-v3-unix/lib/zarr_v3_unix.mli`:
```ocaml
(** Unix HTTP backend for zarr-v3 using a curl subprocess. *)

val fetch : Zarr_v3.fetch
(** HTTP fetch via [curl]. Supports byte-range requests. *)

val codecs : Zarr_v3.codec_registry
(** Empty codec registry. [bytes], [blosc] and [crc32c] are handled inside
    [Zarr_v3]; no Zstd decompressor is registered, so only memcpy Blosc
    frames can be decoded. *)
```

`zarr-v3-unix/lib/zarr_v3_unix.ml`:
```ocaml
let fetch ~url ?range () =
  (* [Filename.quote] protects the URL from the shell *)
  let cmd = match range with
    | None -> Printf.sprintf "curl -sf %s" (Filename.quote url)
    | Some (off, len) ->
      Printf.sprintf "curl -sf -H 'Range: bytes=%d-%d' %s"
        off (off + len - 1) (Filename.quote url)
  in
  Lwt_process.pread (Lwt_process.shell cmd)

let codecs _name = None (* Blosc memcpy frames are decoded inside Zarr_v3 *)
```

- [ ] **Step 2: Create integration test**

Add `test_live` to `zarr-v3/test/dune` as a separate executable:
```
(executable
 (name test_zarr_v3)
 (libraries zarr-v3 alcotest))

(executable
 (name test_live)
 (libraries zarr-v3 zarr-v3-unix lwt lwt.unix))
```

`zarr-v3/test/test_live.ml`:
```ocaml
let () = Lwt_main.run begin
  let open Lwt.Syntax in
  let base = "https://dl2.geotessera.org/zarr/v1/2024.zarr" in
  Printf.printf "Opening store at %s\n%!" base;
  let* store = Zarr_v3.open_store ~fetch:Zarr_v3_unix.fetch
    ~codecs:Zarr_v3_unix.codecs base in
  Printf.printf "Consolidated metadata: %d entries\n%!"
    (List.length (Zarr_v3.store_meta store));
  let* scales = Zarr_v3.open_array store "utm31/scales" in
  let meta = Zarr_v3.array_meta scales in
  Printf.printf "Scales: shape=%s, dtype=%s\n%!"
    (String.concat "x" (List.map string_of_int meta.shape))
    meta.data_type;
  (* Read a small 4x4 region near Cambridge *)
  (* Cambridge UTM31N: easting~305000, northing~5782000 *)
  (* pixel_col = (305000 - 167600) / 10 = 13740 *)
  (* pixel_row = (6933480 - 5782000) / 10 = 115148 *)
  let start = [| 115148; 13740 |] in
  let stop = [| 115152; 13744 |] in
  let* data = Zarr_v3.get_shards scales ~start ~stop in
  Printf.printf "Read 4x4 scales region: %d bytes\n%!" (String.length data);
  (* Print the float32 values (little-endian) *)
  for i = 0 to 15 do
    let off = i * 4 in
    let b0 = Char.code data.[off] in
    let b1 = Char.code data.[off+1] in
    let b2 = Char.code data.[off+2] in
    let b3 = Char.code data.[off+3] in
    let bits = Int32.logor (Int32.of_int b0)
      (Int32.logor (Int32.shift_left (Int32.of_int b1) 8)
        (Int32.logor (Int32.shift_left (Int32.of_int b2) 16)
          (Int32.shift_left (Int32.of_int b3) 24))) in
    let v = Int32.float_of_bits bits in
    Printf.printf "  scale[%d] = %.6f\n" i v
  done;
  Lwt.return_unit
end
```

- [ ] **Step 3: Run integration test**

Run: `opam exec -- dune exec zarr-v3/test/test_live.exe`
Expected: prints scale values (should be in the ~0.05 range)

- [ ] **Step 4: Commit**

```bash
git add zarr-v3-unix/ zarr-v3/test/test_live.ml zarr-v3/test/dune
git commit -m "zarr-v3-unix: curl-based fetch backend + live integration test"
```

---

### Task 4: GeoTessera Zarr client (tessera-zarr)

Maps WGS84 bounding boxes to UTM pixel ranges, reads shards, dequantizes, and reprojects.

**Files:**
- Create: `tessera-zarr/dune-project`
- Create: `tessera-zarr/lib/dune`
- Create: `tessera-zarr/lib/tessera_zarr.ml`
- Create: `tessera-zarr/lib/tessera_zarr.mli`

- [ ] **Step 1: Create project scaffold**

`tessera-zarr/dune-project`:
```
(lang dune 3.17)
(name tessera-zarr)
(generate_opam_files true)
(license ISC)
(package
 (name tessera-zarr)
 (synopsis "GeoTessera Zarr v3 client")
 (description "Fetches GeoTessera embeddings from sharded Zarr v3 stores. Maps WGS84 bounding boxes to UTM pixel ranges, dequantizes, and reprojects.")
 (depends
  (ocaml (>= 5.2))
  (zarr-v3 (>= 0.1))
  (tessera-geotessera (>= 0.1))
  (tessera-linalg (>= 0.1))
  (lwt (>= 5.0))
  (yojson (>= 2.0))))
```

`tessera-zarr/lib/dune`:
```
(library
 (name tessera_zarr)
 (public_name tessera-zarr)
 (libraries zarr-v3 tessera-geotessera tessera-linalg lwt yojson))
```

- [ ] **Step 2: Implement the client**

`tessera-zarr/lib/tessera_zarr.mli`:
```ocaml
(** GeoTessera Zarr v3 client.

    Fetches GeoTessera embeddings from sharded Zarr v3 stores,
    mapping WGS84 bounding boxes to UTM pixel ranges. *)

val fetch_region :
  store:Zarr_v3.store ->
  year:int ->
  Geotessera.bbox ->
  (Linalg.mat * int * int * Geotessera.bbox) Lwt.t
(** [fetch_region ~store ~year bbox] fetches embeddings for a WGS84 bounding box.

    Returns [(mosaic_mat, height, width, wgs84_bounds)], the same tuple type
    as {!Geotessera.fetch_mosaic_sync}, for drop-in compatibility.

    The mosaic is reprojected from UTM to a regular WGS84 grid. *)
```

`tessera-zarr/lib/tessera_zarr.ml`:
```ocaml
let fetch_region ~store ~year:_ bbox =
  let open Lwt.Syntax in
  let open Geotessera in
  (* 1. Determine UTM zone from bbox center *)
  let center_lon = (bbox.min_lon +. bbox.max_lon) /. 2.0 in
  let zone = Utm.zone_of_lon center_lon in
  let zone_name = Printf.sprintf "utm%d" zone in
  (* 2. Look up the zone group and read its spatial metadata *)
  let zone_meta_json = try
    let (_, j) = List.find (fun (k, _) -> k = zone_name)
      (Zarr_v3.store_meta store) in
    j
  with Not_found ->
    failwith (Printf.sprintf "UTM zone %s not found in store" zone_name) in
  let attrs = match Yojson.Safe.Util.member "attributes" zone_meta_json with
    | `Assoc l -> l | _ -> [] in
  let transform = match List.assoc_opt "spatial:transform" attrs with
    | Some (`List l) -> Array.of_list (List.map (function
        | `Float f -> f | `Int i -> Float.of_int i | _ -> 0.0) l)
    | _ -> failwith "Missing spatial:transform" in
  let pixel_size_e = transform.(0) in  (* 10.0 *)
  let origin_e = transform.(2) in      (* e.g., 167600.0 *)
  let pixel_size_n = transform.(4) in  (* -10.0 *)
  let origin_n = transform.(5) in      (* e.g., 6933480.0 *)
  (* 3. Convert WGS84 bbox corners to UTM pixel range *)
  let corners = [
    (bbox.min_lon, bbox.min_lat); (bbox.max_lon, bbox.min_lat);
    (bbox.min_lon, bbox.max_lat); (bbox.max_lon, bbox.max_lat);
  ] in
  let utm_corners = List.map (fun (lon, lat) ->
    Utm.wgs84_to_utm ~zone lon lat) corners in
  let min_e = List.fold_left (fun acc (e, _) -> Float.min acc e) Float.max_float utm_corners in
  let max_e = List.fold_left (fun acc (e, _) -> Float.max acc e) Float.neg_infinity utm_corners in
  let min_n = List.fold_left (fun acc (_, n) -> Float.min acc n) Float.max_float utm_corners in
  let max_n = List.fold_left (fun acc (_, n) -> Float.max acc n) Float.neg_infinity utm_corners in
  (* Pixel coordinates (row increases with decreasing northing) *)
  let col_start = max 0 (Float.to_int (Float.floor ((min_e -. origin_e) /. pixel_size_e))) in
  let col_stop = Float.to_int (Float.ceil ((max_e -. origin_e) /. pixel_size_e)) in
  let row_start = max 0 (Float.to_int (Float.floor ((origin_n -. max_n) /. Float.abs pixel_size_n))) in
  let row_stop = Float.to_int (Float.ceil ((origin_n -. min_n) /. Float.abs pixel_size_n)) in
  let tile_h = row_stop - row_start in
  let tile_w = col_stop - col_start in
  (* 4. Fetch embeddings and scales *)
  let* emb_arr = Zarr_v3.open_array store (zone_name ^ "/embeddings") in
  let* scales_arr = Zarr_v3.open_array store (zone_name ^ "/scales") in
  let emb_start = [| row_start; col_start; 0 |] in
  let emb_stop = [| row_stop; col_stop; 128 |] in
  let scales_start = [| row_start; col_start |] in
  let scales_stop = [| row_stop; col_stop |] in
  let* emb_data = Zarr_v3.get_shards emb_arr ~start:emb_start ~stop:emb_stop in
  let* scales_data = Zarr_v3.get_shards scales_arr ~start:scales_start ~stop:scales_stop in
  (* 5.
Dequantize: float32 = int8 * float32_scale *) 930 + let n_features = 128 in 931 + let mat = Linalg.create_mat ~rows:(tile_h * tile_w) ~cols:n_features in 932 + for i = 0 to tile_h - 1 do 933 + for j = 0 to tile_w - 1 do 934 + let pixel = i * tile_w + j in 935 + (* Read float32 scale *) 936 + let s_off = pixel * 4 in 937 + let s_bits = Int32.logor (Int32.of_int (Char.code scales_data.[s_off])) 938 + (Int32.logor (Int32.shift_left (Int32.of_int (Char.code scales_data.[s_off+1])) 8) 939 + (Int32.logor (Int32.shift_left (Int32.of_int (Char.code scales_data.[s_off+2])) 16) 940 + (Int32.shift_left (Int32.of_int (Char.code scales_data.[s_off+3])) 24))) in 941 + let scale = Int32.float_of_bits s_bits in 942 + for f = 0 to n_features - 1 do 943 + let e_off = pixel * n_features + f in 944 + let emb_val = Char.code emb_data.[e_off] in 945 + let emb_signed = if emb_val >= 128 then emb_val - 256 else emb_val in 946 + Linalg.mat_set mat pixel f (Float.of_int emb_signed *. scale) 947 + done 948 + done 949 + done; 950 + (* 6. Reproject from UTM to WGS84 *) 951 + (* Compute WGS84 output grid dimensions — match the UTM pixel count *) 952 + let out_h = tile_h in 953 + let out_w = tile_w in 954 + let coord = { lon = center_lon; lat = (bbox.min_lat +. bbox.max_lat) /. 2.0 } in 955 + let utm_min_e = origin_e +. Float.of_int col_start *. pixel_size_e in 956 + let utm_max_n = origin_n +. Float.of_int row_start *. pixel_size_n in 957 + let utm_pixel_e = pixel_size_e in 958 + let utm_pixel_n = Float.abs pixel_size_n in 959 + (* Compute the WGS84 bounds of the UTM pixel grid *) 960 + let (w_lon, _) = Utm.utm_to_wgs84 ~zone utm_min_e (utm_max_n -. Float.of_int tile_h *. utm_pixel_n) in 961 + let (e_lon, _) = Utm.utm_to_wgs84 ~zone (utm_min_e +. Float.of_int tile_w *. utm_pixel_e) utm_max_n in 962 + let (_, s_lat) = Utm.utm_to_wgs84 ~zone utm_min_e (utm_max_n -. Float.of_int tile_h *. 
utm_pixel_n) in 963 + let (_, n_lat) = Utm.utm_to_wgs84 ~zone utm_min_e utm_max_n in 964 + let wgs_bbox = { min_lon = w_lon; min_lat = s_lat; max_lon = e_lon; max_lat = n_lat } in 965 + (* Reproject using existing logic *) 966 + let reprojected = Geotessera.reproject_tile coord 967 + ~tile_h ~tile_w ~n_features ~out_h ~out_w mat in 968 + Lwt.return (reprojected, out_h, out_w, wgs_bbox) 969 + ``` 970 + 971 + Note: this requires `reproject_tile` to be exposed in `geotessera.mli`. Add it: 972 + 973 + **Modify:** `tessera-geotessera/lib/geotessera.mli` — add: 974 + ```ocaml 975 + val reproject_tile : tile_coord -> tile_h:int -> tile_w:int -> 976 + n_features:int -> out_h:int -> out_w:int -> Linalg.mat -> Linalg.mat 977 + (** Reproject a tile from UTM to a regular WGS84 grid. *) 978 + ``` 979 + 980 + - [ ] **Step 3: Build** 981 + 982 + Run: `opam exec -- dune build` 983 + Expected: success 984 + 985 + - [ ] **Step 4: Commit** 986 + 987 + ```bash 988 + git add tessera-zarr/ tessera-geotessera/lib/geotessera.mli 989 + git commit -m "tessera-zarr: GeoTessera Zarr v3 client with UTM reprojection" 990 + ``` 991 + 992 + --- 993 + 994 + ### Task 5: Browser backend (tessera-zarr-jsoo) 995 + 996 + **Files:** 997 + - Create: `tessera-zarr-jsoo/dune-project` 998 + - Create: `tessera-zarr-jsoo/lib/dune` 999 + - Create: `tessera-zarr-jsoo/lib/tessera_zarr_jsoo.ml` 1000 + - Create: `tessera-zarr-jsoo/lib/tessera_zarr_jsoo.mli` 1001 + 1002 + - [ ] **Step 1: Create browser backend** 1003 + 1004 + `tessera-zarr-jsoo/dune-project`: 1005 + ``` 1006 + (lang dune 3.17) 1007 + (name tessera-zarr-jsoo) 1008 + (generate_opam_files true) 1009 + (license ISC) 1010 + (package 1011 + (name tessera-zarr-jsoo) 1012 + (synopsis "Browser backend for tessera-zarr") 1013 + (depends 1014 + (ocaml (>= 5.2)) 1015 + (zarr-v3 (>= 0.1)) 1016 + (tessera-zarr (>= 0.1)) 1017 + (js_of_ocaml (>= 5.0)) 1018 + (js_of_ocaml-ppx (>= 5.0)) 1019 + (js_of_ocaml-lwt (>= 5.0)) 1020 + (lwt (>= 5.0)))) 1021 + ``` 1022 + 1023 
+ `tessera-zarr-jsoo/lib/dune`: 1024 + ``` 1025 + (library 1026 + (name tessera_zarr_jsoo) 1027 + (public_name tessera-zarr-jsoo) 1028 + (libraries zarr-v3 tessera-zarr js_of_ocaml js_of_ocaml-lwt lwt) 1029 + (preprocess (pps js_of_ocaml-ppx))) 1030 + ``` 1031 + 1032 + `tessera-zarr-jsoo/lib/tessera_zarr_jsoo.mli`: 1033 + ```ocaml 1034 + (** Browser backend for tessera-zarr. 1035 + 1036 + Provides XHR-based fetch with byte-range support and a codec registry 1037 + suitable for the GeoTessera Zarr store (Blosc with memcpy/uncompressed). *) 1038 + 1039 + val fetch : Zarr_v3.fetch 1040 + (** Async HTTP fetch via XMLHttpRequest with Range header support. *) 1041 + 1042 + val codecs : Zarr_v3.codec_registry 1043 + (** Codec registry for GeoTessera data (handles Blosc memcpy frames). *) 1044 + 1045 + val open_store : ?base_url:string -> ?year:int -> unit -> Zarr_v3.store Lwt.t 1046 + (** Convenience: open the GeoTessera Zarr store. 1047 + Default URL: [https://dl2.geotessera.org/zarr/v1/{year}.zarr] 1048 + Default year: 2024 *) 1049 + 1050 + val fetch_region : ?year:int -> Geotessera.bbox -> 1051 + (Linalg.mat * int * int * Geotessera.bbox) Lwt.t 1052 + (** Convenience: fetch embeddings for a bbox, opening the store if needed. 
*) 1053 + ``` 1054 + 1055 + `tessera-zarr-jsoo/lib/tessera_zarr_jsoo.ml`: 1056 + ```ocaml 1057 + open Js_of_ocaml 1058 + (* Promise plumbing below uses core Lwt (Lwt.wait / Lwt.wakeup) *) 1059 + 1060 + let fetch ~url ?range () = 1061 + let open Lwt.Syntax in 1062 + let xhr = XmlHttpRequest.create () in 1063 + xhr##.responseType := Js.string "arraybuffer"; 1064 + xhr##_open (Js.string "GET") (Js.string url) Js._true; 1065 + (match range with 1066 + | Some (off, len) -> 1067 + xhr##setRequestHeader (Js.string "Range") 1068 + (Js.string (Printf.sprintf "bytes=%d-%d" off (off + len - 1))) 1069 + | None -> ()); 1070 + let (p, resolver) = Lwt.wait () in 1071 + xhr##.onload := Dom.handler (fun _ -> 1072 + let data = Js.Opt.case 1073 + (File.CoerceTo.arrayBuffer xhr##.response) 1074 + (fun () -> "") 1075 + (fun b -> Typed_array.String.of_arrayBuffer b) in 1076 + let expected = if range = None then 200 else 206 in if xhr##.status = expected then Lwt.wakeup resolver data else Lwt.wakeup_exn resolver (Failure (Printf.sprintf "HTTP %d fetching %s" xhr##.status url)); 1077 + Js._false); 1078 + xhr##.onerror := Dom.handler (fun _ -> 1079 + Lwt.wakeup_exn resolver (Failure (Printf.sprintf "XHR error fetching %s" url)); 1080 + Js._false); 1081 + xhr##send Js.null; 1082 + let+ data = p in 1083 + data 1084 + 1085 + let codecs _name = None 1086 + 1087 + let open_store ?(base_url = "https://dl2.geotessera.org/zarr/v1") ?(year = 2024) () = 1088 + let url = Printf.sprintf "%s/%d.zarr" base_url year in 1089 + Zarr_v3.open_store ~fetch ~codecs url 1090 + 1091 + let fetch_region ?(year = 2024) bbox = 1092 + let open Lwt.Syntax in 1093 + let* store = open_store ~year () in 1094 + Tessera_zarr.fetch_region ~store ~year bbox 1095 + ``` 1096 + 1097 + - [ ] **Step 2: Build** 1098 + 1099 + Run: `opam exec -- dune build` 1100 + Expected: success 1101 + 1102 + - [ ] **Step 3: Commit** 1103 + 1104 + ```bash 1105 + git add tessera-zarr-jsoo/ 1106 + git commit -m "tessera-zarr-jsoo: browser XHR backend for Zarr-based tile fetching" 1107 + ``` 1108 + 1109 + --- 1110 + 1111 + ### Task 6: New notebook + build integration 1112 + 1113 + **Files:** 1114 + - Create: `site/notebooks/interactive_map_zarr.mld` 1115 + - Modify:
`build-site.sh` 1116 + 1117 + - [ ] **Step 1: Add tessera-zarr-jsoo to jtw universe** 1118 + 1119 + In `build-site.sh`, add `tessera-zarr-jsoo` to the package list: 1120 + ```bash 1121 + dune exec -- jtw opam astring base brr note mime_printer fpath rresult \ 1122 + opam-format bos odoc.model tyxml yojson uri jsonm \ 1123 + js_top_worker-widget-leaflet \ 1124 + tessera-geotessera-jsoo tessera-viz-jsoo tessera-tfjs \ 1125 + tessera-zarr-jsoo \ 1126 + onnxrt -o "$SITE/_opam" 1127 + ``` 1128 + 1129 + - [ ] **Step 2: Create new notebook** 1130 + 1131 + `site/notebooks/interactive_map_zarr.mld`: 1132 + 1133 + This is a copy of `interactive_map.mld` with the fetch cell updated to use the Zarr API. 1134 + The key differences: 1135 + 1. The setup cell requires `tessera-zarr-jsoo` and `lwt` 1136 + 2. The fetch cell uses `Tessera_zarr_jsoo.fetch_region` (async, returns an `Lwt.t`) 1137 + 3. Cells that need async results must rely on the notebook's async cell support. `Lwt_main.run` lives in `lwt.unix` and is not available under js_of_ocaml 1138 + 1139 + The notebook structure follows the same pattern as the existing one: draw bbox, fetch, visualize, classify. Only the fetch cell changes significantly. 1140 + 1141 + - [ ] **Step 3: Build and commit** 1142 + 1143 + ```bash 1144 + git add site/notebooks/interactive_map_zarr.mld build-site.sh 1145 + git commit -m "notebook: add Zarr-based interactive map notebook" 1146 + ``` 1147 + 1148 + --- 1149 + 1150 + ### Notes 1151 + 1152 + **Fetch strategy:** For the initial implementation, each shard is fetched in its entirety, not as individual inner chunks via range requests. A 256×256×128 int8 shard is ~8.5MB: 8MiB of data plus per-chunk Blosc headers and the 64KB shard index. Each shard covers 256 × 10 m = 2.56 km on a side. A bbox up to ~2.5 km across therefore needs 1-4 embeddings shards plus 1-4 scales shards (~0.4MB each), ≈ 9-36MB total. A full 0.1° × 0.1° bbox near Cambridge spans roughly 1,110 × 680 pixels, which is 15-24 shards or roughly 130-215MB. That is still below the current worst case of 106MB × 4 tiles = 424MB, but the large savings come from small bboxes and from the future optimization below. 1153 + 1154 + **Future optimization:** Fetch only the shard index via a range request, then fetch only the needed inner chunks. This would reduce the transfer to the shard indexes (64KB per shard, per array) plus ~2KB per 4×4-pixel embeddings chunk actually covered, instead of ~8.5MB per shard.
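The first step of that optimization is decoding the shard index. Below is a minimal sketch. It assumes the index layout listed under Key Data Format Findings: 16-byte entries holding little-endian `offset` and `nbytes`, followed by a trailing 4-byte CRC32C. It also assumes the Zarr v3 convention that an absent inner chunk is recorded as all-ones. `index_range`, `parse_index` and `inner_chunk_id` are hypothetical names, and the CRC is not verified.

```ocaml
(* Sketch: decode the trailing index of a Zarr v3 shard. Assumed layout:
   [n_chunks] entries of (offset, nbytes) as little-endian uint64, then a
   4-byte CRC32C, which this sketch does not verify. *)

let entry_bytes = 16
let crc_bytes = 4

(* Byte size of the index for a shard with [n_chunks] inner chunks. *)
let index_size n_chunks = (n_chunks * entry_bytes) + crc_bytes

(* Byte range (offset, length) of the index, given the total shard size. *)
let index_range ~shard_size ~n_chunks =
  let len = index_size n_chunks in
  (shard_size - len, len)

(* A missing inner chunk is stored as offset = nbytes = 2^64 - 1, which
   String.get_int64_le returns as -1L. Offsets are relative to the shard start. *)
let parse_index ~n_chunks idx =
  if String.length idx <> index_size n_chunks then
    invalid_arg "parse_index: unexpected index length";
  Array.init n_chunks (fun i ->
      let off = String.get_int64_le idx (i * entry_bytes) in
      let len = String.get_int64_le idx ((i * entry_bytes) + 8) in
      if off = -1L && len = -1L then None
      else Some (Int64.to_int off, Int64.to_int len))

(* C-order position of inner chunk (r, c) in a shard whose inner-chunk grid
   is [grid_cols] wide (64 for 256-pixel shards made of 4-pixel chunks). *)
let inner_chunk_id ~grid_cols r c = (r * grid_cols) + c
```

For the GeoTessera shards, `index_size 4096` is 65,540 bytes. `index_range` needs the total shard size (for example, from a `Content-Length` header) because the committed `Zarr_v3.fetch` type only expresses offset/length ranges. It cannot express an HTTP suffix range such as `bytes=-65540`.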
1155 + 1156 + **Blosc compression:** The current GeoTessera data stores all chunks uncompressed (Blosc memcpy fallback). The Blosc header parser handles this case. If future data is actually Zstd-compressed, the `codec_registry` parameter allows plugging in a Zstd decompressor without changing the core library. 1157 + 1158 + **Consolidated metadata:** The root `zarr.json` contains all zone metadata inline (consolidated). A single HTTP request gets everything needed to compute pixel ranges for any zone. This is ~100KB. 1159 + 1160 + **Async model:** All I/O goes through Lwt. The Unix backend uses `Lwt_process.pread` (subprocess curl). The browser backend uses XHR with Lwt promises. The core zarr-v3 library uses `Lwt.join` to parallelize shard fetches.
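The memcpy path from the Blosc note can be sketched as below. This assumes the standard 16-byte Blosc1 header: the flags are in byte 2, with bit `0x02` marking a memcpy'd buffer, followed by little-endian `nbytes`, `blocksize` and `cbytes`. It also assumes 64-bit OCaml. Unlike the stubbed `Blosc.header`, this illustrative record also keeps the flags byte. Any frame that is not memcpy'd is rejected rather than decompressed.

```ocaml
(* Sketch of a Blosc1 frame decoder for the uncompressed (memcpy) case only.
   Header: byte 0 version, 1 versionlz, 2 flags, 3 typesize,
   4-7 nbytes, 8-11 blocksize, 12-15 cbytes (little-endian). 64-bit OCaml. *)

type header = {
  flags : int;
  typesize : int;
  nbytes : int;     (* uncompressed payload size *)
  blocksize : int;
  cbytes : int;     (* whole frame size, header included *)
}

let header_len = 16
let memcpyed_flag = 0x02 (* payload stored verbatim *)

(* Unsigned 32-bit little-endian read into a native int. *)
let u32 s off = Int32.to_int (String.get_int32_le s off) land 0xFFFF_FFFF

let parse_header s =
  if String.length s < header_len then failwith "Blosc frame shorter than 16 bytes";
  { flags = Char.code s.[2];
    typesize = Char.code s.[3];
    nbytes = u32 s 4;
    blocksize = u32 s 8;
    cbytes = u32 s 12 }

let decode_memcpy s =
  let h = parse_header s in
  (* Accept either the header flag or the plan's rule cbytes = nbytes + 16 *)
  let memcpy = h.flags land memcpyed_flag <> 0 || h.cbytes = h.nbytes + header_len in
  if not memcpy then failwith "compressed Blosc frame: a decompressor is required";
  if String.length s < header_len + h.nbytes then failwith "truncated Blosc frame";
  String.sub s header_len h.nbytes
```

A Zstd-compressed frame would fail here. That failure is the point where a decompressor obtained from the codec registry would take over.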
+9
tessera-geotessera/lib/geotessera.mli
··· 43 43 The pixel grid is the UTM bounding box of the WGS84 tile corners, 44 44 which is wider than the nominal 0.1-degree tile due to UTM projection. *) 45 45 46 + (** {1 Reprojection} *) 47 + 48 + val reproject_tile : tile_coord -> tile_h:int -> tile_w:int -> 49 + n_features:int -> out_h:int -> out_w:int -> Linalg.mat -> Linalg.mat 50 + (** [reproject_tile coord ~tile_h ~tile_w ~n_features ~out_h ~out_w data] 51 + reprojects a tile from its native UTM grid onto a regular WGS84 grid 52 + using nearest-neighbor interpolation. [coord] is used to determine the 53 + UTM zone. Input has [tile_h * tile_w] rows, output has [out_h * out_w] rows. *) 54 + 46 55 (** {1 Mosaicing} *) 47 56 48 57 val mosaic : (tile_coord * Linalg.mat * int * int) list -> bbox -> Linalg.mat * int * int * bbox
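The nearest-neighbour step documented for `reproject_tile` reduces to one index lookup per output pixel. This minimal sketch works on flat `float array`s, which is an assumption because the internal layout of `Linalg.mat` is not shown. A caller-supplied `inverse` maps each output pixel to a fractional source `(row, col)`. For UTM to WGS84, that function would chain the output grid's lon/lat with a WGS84-to-UTM conversion. Output pixels that land outside the source stay `0.0`, which may differ from what `reproject_tile` actually does.

```ocaml
(* Sketch: nearest-neighbour resampling between two pixel grids.
   Images are flat float arrays of (h * w) pixels x n_features values,
   row-major, with one pixel's features contiguous. [inverse i j] returns the
   fractional source position (row, col) of output pixel (i, j), where source
   pixel k covers [k, k + 1); flooring picks the pixel with the nearest centre.
   Output pixels that map outside the source stay 0.0. *)
let resample_nn ~src ~src_h ~src_w ~n_features ~out_h ~out_w ~inverse =
  if Array.length src <> src_h * src_w * n_features then
    invalid_arg "resample_nn: src size mismatch";
  let out = Array.make (out_h * out_w * n_features) 0.0 in
  for i = 0 to out_h - 1 do
    for j = 0 to out_w - 1 do
      let r, c = inverse i j in
      let r = Float.to_int (Float.floor r) and c = Float.to_int (Float.floor c) in
      if r >= 0 && r < src_h && c >= 0 && c < src_w then
        (* Copy all features of the chosen source pixel *)
        Array.blit src ((r * src_w + c) * n_features)
          out ((i * out_w + j) * n_features) n_features
    done
  done;
  out
```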
+14
tessera-zarr-jsoo/dune-project
··· 1 + (lang dune 3.17) 2 + (name tessera-zarr-jsoo) 3 + (generate_opam_files true) 4 + (license ISC) 5 + (package 6 + (name tessera-zarr-jsoo) 7 + (synopsis "Browser backend for tessera-zarr") 8 + (depends 9 + (ocaml (>= 5.2)) 10 + (zarr-v3 (>= 0.1)) 11 + (tessera-zarr (>= 0.1)) 12 + (js_of_ocaml (>= 5.0)) 13 + (js_of_ocaml-ppx (>= 5.0)) 14 + (lwt (>= 5.0))))
+5
tessera-zarr-jsoo/lib/dune
··· 1 + (library 2 + (name tessera_zarr_jsoo) 3 + (public_name tessera-zarr-jsoo) 4 + (libraries zarr-v3 tessera-zarr js_of_ocaml lwt) 5 + (preprocess (pps js_of_ocaml-ppx)))
+4
tessera-zarr-jsoo/lib/tessera_zarr_jsoo.ml
··· 1 + let fetch _url ?off:_ ?len:_ () = failwith "TODO" 2 + let codecs _name = None 3 + let open_store ?base_url:_ ?year:_ () = failwith "TODO" 4 + let fetch_region ?base_url:_ ?year:_ _bbox = failwith "TODO"
+34
tessera-zarr-jsoo/lib/tessera_zarr_jsoo.mli
··· 1 + (** Browser backend for tessera-zarr. 2 + 3 + Provides async XHR-based HTTP fetch with byte-range support 4 + and convenience wrappers for opening the GeoTessera Zarr store 5 + and fetching embeddings from the browser. 6 + 7 + {2 Example} 8 + 9 + {[ 10 + let open Lwt.Syntax in 11 + let* (mat, h, w, bounds) = Tessera_zarr_jsoo.fetch_region 12 + Geotessera.{ min_lon = 0.1; min_lat = 52.1; 13 + max_lon = 0.2; max_lat = 52.2 } in 14 + (* mat is a Linalg.mat with h*w rows and 128 columns *) Lwt.return (mat, h, w, bounds) 15 + ]} *) 16 + 17 + val fetch : Zarr_v3.fetch 18 + (** Async HTTP fetch via [XMLHttpRequest] with [Range] header support. 19 + Uses [responseType = "arraybuffer"] for binary data. *) 20 + 21 + val codecs : Zarr_v3.codec_registry 22 + (** Codec registry. Currently returns [None] for all codecs: 23 + the built-in Blosc memcpy handling is sufficient for 24 + GeoTessera's uncompressed data. *) 25 + 26 + val open_store : ?base_url:string -> ?year:int -> unit -> Zarr_v3.store Lwt.t 27 + (** Open the GeoTessera Zarr store. 28 + Default base URL: ["https://dl2.geotessera.org/zarr/v1"]. 29 + Default year: [2024]. *) 30 + 31 + val fetch_region : ?base_url:string -> ?year:int -> 32 + Geotessera.bbox -> (Linalg.mat * int * int * Geotessera.bbox) Lwt.t 33 + (** Convenience wrapper: opens the store and fetches embeddings 34 + for a WGS84 bounding box in one call. *)
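The `Range` value sent by an XHR `fetch` can be derived from the `?off`/`?len` arguments of `Zarr_v3.fetch`. Below is a small sketch using `range_header`, a hypothetical helper. HTTP byte ranges are inclusive. The open-ended case where only `off` is given is an assumption, because the interface documents only the `off` + `len` case.

```ocaml
(* Range header value for a request shaped like Zarr_v3.fetch (?off ?len).
   [len] bytes starting at [off] is the inclusive range "bytes=off-(off+len-1)".
   Only [off] gives an open-ended range (an assumption); neither gives no header. *)
let range_header ?off ?len () =
  match off, len with
  | Some o, Some l when o >= 0 && l > 0 ->
      Some (Printf.sprintf "bytes=%d-%d" o (o + l - 1))
  | Some _, Some _ -> invalid_arg "range_header: need off >= 0 and len > 0"
  | Some o, None -> Some (Printf.sprintf "bytes=%d-" o)
  | None, Some _ -> invalid_arg "range_header: len without off"
  | None, None -> None
```

A real `fetch` should also confirm that the server answered `206 Partial Content`. A server that ignores `Range` replies `200` with the whole object.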
+28
tessera-zarr-jsoo/tessera-zarr-jsoo.opam
··· 1 + # This file is generated by dune, edit dune-project instead 2 + opam-version: "2.0" 3 + synopsis: "Browser backend for tessera-zarr" 4 + license: "ISC" 5 + depends: [ 6 + "dune" {>= "3.17"} 7 + "ocaml" {>= "5.2"} 8 + "zarr-v3" {>= "0.1"} 9 + "tessera-zarr" {>= "0.1"} 10 + "js_of_ocaml" {>= "5.0"} 11 + "js_of_ocaml-ppx" {>= "5.0"} 12 + "lwt" {>= "5.0"} 13 + "odoc" {with-doc} 14 + ] 15 + build: [ 16 + ["dune" "subst"] {dev} 17 + [ 18 + "dune" 19 + "build" 20 + "-p" 21 + name 22 + "-j" 23 + jobs 24 + "@install" 25 + "@runtest" {with-test} 26 + "@doc" {with-doc} 27 + ] 28 + ]
+14
tessera-zarr/dune-project
··· 1 + (lang dune 3.17) 2 + (name tessera-zarr) 3 + (generate_opam_files true) 4 + (license ISC) 5 + (package 6 + (name tessera-zarr) 7 + (synopsis "GeoTessera Zarr v3 client") 8 + (description "Fetches GeoTessera embeddings from sharded Zarr v3 stores. Maps WGS84 bounding boxes to UTM pixel ranges, dequantizes, and reprojects.") 9 + (depends 10 + (ocaml (>= 5.2)) 11 + (zarr-v3 (>= 0.1)) 12 + (tessera-geotessera (>= 0.1)) 13 + (tessera-linalg (>= 0.1)) 14 + (lwt (>= 5.0))))
+4
tessera-zarr/lib/dune
··· 1 + (library 2 + (name tessera_zarr) 3 + (public_name tessera-zarr) 4 + (libraries zarr-v3 tessera-geotessera tessera-linalg lwt))
+9
tessera-zarr/lib/tessera_zarr.ml
··· 1 + type zone_info = { 2 + zone : int; 3 + origin_easting : float; 4 + origin_northing : float; 5 + pixel_size : float; 6 + } 7 + 8 + let zone_info _store _zone_name = failwith "TODO" 9 + let fetch_region ~store:_ _bbox = failwith "TODO"
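The dequantization step from the plan (`float32 = int8 * scale`) can use the stdlib's `String.get_int8` and `String.get_int32_le` (OCaml ≥ 4.13) instead of assembling bytes by hand. This sketch assumes `emb` and `scales` are the raw C-order bytes returned for the same `h × w` window. It produces a flat `float array` with one row of `n_features` values per pixel, rather than a `Linalg.mat`.

```ocaml
(* Sketch: dequantize an int8 embeddings window (H x W x F) using the matching
   float32 little-endian scales window (H x W): value = int8 * scale. *)
let dequantize ~h ~w ~n_features ~emb ~scales =
  let n = h * w in
  if String.length emb <> n * n_features || String.length scales <> n * 4 then
    invalid_arg "dequantize: buffer sizes do not match h * w";
  let out = Array.make (n * n_features) 0.0 in
  for p = 0 to n - 1 do
    (* One float32 scale per pixel *)
    let scale = Int32.float_of_bits (String.get_int32_le scales (p * 4)) in
    for f = 0 to n_features - 1 do
      (* get_int8 already sign-extends, so no manual 256 correction *)
      let q = String.get_int8 emb ((p * n_features) + f) in
      out.((p * n_features) + f) <- Float.of_int q *. scale
    done
  done;
  out
```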
+71
tessera-zarr/lib/tessera_zarr.mli
··· 1 + (** GeoTessera Zarr v3 client. 2 + 3 + Fetches GeoTessera embeddings from sharded Zarr v3 stores, 4 + mapping WGS84 bounding boxes to UTM pixel ranges. Dequantizes 5 + int8 embeddings using float32 scales and reprojects from the 6 + native UTM grid to a regular WGS84 grid. 7 + 8 + {2 Example} 9 + 10 + {[ 11 + let* store = Zarr_v3.open_store ~fetch ~codecs 12 + "https://dl2.geotessera.org/zarr/v1/2024.zarr" in 13 + let bbox = Geotessera.{ min_lon = 0.1; min_lat = 52.1; 14 + max_lon = 0.2; max_lat = 52.2 } in 15 + Tessera_zarr.fetch_region ~store bbox 16 + (* resolves to (mat, h, w, bounds); mat has h*w rows and 128 columns of dequantized embeddings; assumes [open Lwt.Syntax] *) 17 + ]} 18 + 19 + {2 Store layout} 20 + 21 + The GeoTessera Zarr store is organised by UTM zone: 22 + {[ 23 + 2024.zarr/ 24 + ├── zarr.json (consolidated metadata for all zones) 25 + ├── utm30/ 26 + │ ├── embeddings (int8, H×W×128, sharded 256×256) 27 + │ ├── scales (float32, H×W, sharded 256×256) 28 + │ └── ... 29 + └── utm31/ 30 + └── ... 31 + ]} 32 + 33 + Each zone group carries [spatial:transform] (6-element affine) 34 + and [proj:code] (e.g., ["EPSG:32631"]) attributes. *) 35 + 36 + (** {1 Spatial metadata} *) 37 + 38 + type zone_info = { 39 + zone : int; (** UTM zone number *) 40 + origin_easting : float; (** Easting of the top-left corner of pixel (0, 0) *) 41 + origin_northing : float; (** Northing of the top-left corner of pixel (0, 0) *) 42 + pixel_size : float; (** Pixel size in metres (typically 10.0) *) 43 + } 44 + (** Spatial metadata extracted from a zone group's attributes. *) 45 + 46 + val zone_info : Zarr_v3.store -> string -> zone_info 47 + (** [zone_info store zone_name] extracts spatial metadata from a zone 48 + group (e.g., ["utm31"]). 49 + @raise Failure if the group or required attributes are missing.
*) 50 + 51 + (** {1 Fetching embeddings} *) 52 + 53 + val fetch_region : 54 + store:Zarr_v3.store -> 55 + Geotessera.bbox -> 56 + (Linalg.mat * int * int * Geotessera.bbox) Lwt.t 57 + (** [fetch_region ~store bbox] fetches dequantized embeddings for 58 + a WGS84 bounding box. 59 + 60 + Steps: 61 + + Determines the UTM zone from the bbox centre 62 + + Converts bbox corners to UTM pixel coordinates 63 + + Fetches [embeddings] and [scales] shards (in parallel) 64 + + Dequantizes: [float32 = int8 × scale] 65 + + Reprojects from the UTM pixel grid to a regular WGS84 grid 66 + 67 + Returns [(mosaic_mat, height, width, wgs84_bounds)] — the same 68 + tuple shape as {!Geotessera.fetch_mosaic_sync} for compatibility 69 + with existing notebook code. 70 + 71 + @raise Failure if the UTM zone is not in the store. *)
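The pixel arithmetic behind `zone_info` and `fetch_region` can be sketched as follows. One example from the plan's live test is `pixel_col = (305000 - 167600) / 10 = 13740`. The sketch assumes that `spatial:transform` uses the `[a; b; c; d; e; f]` affine ordering that the plan indexes: `transform.(0)` is the pixel width, `.(2)` the origin easting, `.(4)` the negative pixel height, and `.(5)` the origin northing. `zone_info_of_transform` and `pixel_window` are hypothetical helpers.

```ocaml
(* Local copy of the zone_info record from tessera_zarr.mli. *)
type zone_info = {
  zone : int;
  origin_easting : float;  (* easting of the top-left corner of pixel (0, 0) *)
  origin_northing : float; (* northing of the top-left corner of pixel (0, 0) *)
  pixel_size : float;      (* metres *)
}

(* Hypothetical helper: read an affine [a; b; c; d; e; f] where
   easting = a*col + b*row + c and northing = d*col + e*row + f.
   Only north-up grids with square pixels (b = d = 0, e = -a) are accepted. *)
let zone_info_of_transform ~zone t =
  if Array.length t <> 6 then invalid_arg "expected a 6-element affine";
  if t.(1) <> 0.0 || t.(3) <> 0.0 || t.(4) <> (-. t.(0)) then
    invalid_arg "only north-up grids with square pixels are supported";
  { zone; origin_easting = t.(2); origin_northing = t.(5); pixel_size = t.(0) }

(* Half-open pixel window (row_start, row_stop, col_start, col_stop) covering a
   UTM extent. Rows grow southwards, so the first row comes from max_n. *)
let pixel_window z ~min_e ~max_e ~min_n ~max_n =
  let idx round v = Float.to_int (round (v /. z.pixel_size)) in
  let col_start = idx Float.floor (min_e -. z.origin_easting) in
  let col_stop = idx Float.ceil (max_e -. z.origin_easting) in
  let row_start = idx Float.floor (z.origin_northing -. max_n) in
  let row_stop = idx Float.ceil (z.origin_northing -. min_n) in
  (max 0 row_start, row_stop, max 0 col_start, col_stop)
```

With the utm31 origin from the plan, a 40 m × 40 m extent whose top-left corner is at (305000, 5782000) maps to rows 115148-115152 and columns 13740-13744. These match the 4×4 window read in the live test.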
+29
tessera-zarr/tessera-zarr.opam
··· 1 + # This file is generated by dune, edit dune-project instead 2 + opam-version: "2.0" 3 + synopsis: "GeoTessera Zarr v3 client" 4 + description: 5 + "Fetches GeoTessera embeddings from sharded Zarr v3 stores. Maps WGS84 bounding boxes to UTM pixel ranges, dequantizes, and reprojects." 6 + license: "ISC" 7 + depends: [ 8 + "dune" {>= "3.17"} 9 + "ocaml" {>= "5.2"} 10 + "zarr-v3" {>= "0.1"} 11 + "tessera-geotessera" {>= "0.1"} 12 + "tessera-linalg" {>= "0.1"} 13 + "lwt" {>= "5.0"} 14 + "odoc" {with-doc} 15 + ] 16 + build: [ 17 + ["dune" "subst"] {dev} 18 + [ 19 + "dune" 20 + "build" 21 + "-p" 22 + name 23 + "-j" 24 + jobs 25 + "@install" 26 + "@runtest" {with-test} 27 + "@doc" {with-doc} 28 + ] 29 + ]
+11
zarr-v3-unix/dune-project
··· 1 + (lang dune 3.17) 2 + (name zarr-v3-unix) 3 + (generate_opam_files true) 4 + (license ISC) 5 + (package 6 + (name zarr-v3-unix) 7 + (synopsis "Unix HTTP backend for zarr-v3") 8 + (depends 9 + (ocaml (>= 5.2)) 10 + (zarr-v3 (>= 0.1)) 11 + (lwt (>= 5.0))))
+4
zarr-v3-unix/lib/dune
··· 1 + (library 2 + (name zarr_v3_unix) 3 + (public_name zarr-v3-unix) 4 + (libraries zarr-v3 lwt lwt.unix))
+2
zarr-v3-unix/lib/zarr_v3_unix.ml
··· 1 + let fetch _url ?off:_ ?len:_ () = failwith "TODO" 2 + let codecs _name = None
+14
zarr-v3-unix/lib/zarr_v3_unix.mli
··· 1 + (** Unix HTTP backend for zarr-v3. 2 + 3 + Provides a {!Zarr_v3.fetch} implementation using [curl] subprocesses 4 + and a minimal {!Zarr_v3.codec_registry} suitable for GeoTessera data 5 + (which uses Blosc in memcpy/uncompressed mode). *) 6 + 7 + val fetch : Zarr_v3.fetch 8 + (** HTTP fetch via [curl] subprocess. Supports byte-range requests. 9 + Requires [curl] to be on the PATH. *) 10 + 11 + val codecs : Zarr_v3.codec_registry 12 + (** Codec registry. Currently returns [None] for all codecs — 13 + the built-in Blosc memcpy handling in zarr-v3 is sufficient 14 + for GeoTessera's uncompressed data. *)
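Most of the work in this backend's curl subprocess is building an argument vector. A sketch follows. `curl_argv` is a hypothetical helper, and the choice of flags is an assumption.

```ocaml
(* Hypothetical helper: argv for fetching [url] with curl, optionally a byte
   range. -s hides progress, -f turns HTTP errors into a non-zero exit status,
   -L follows redirects, -r requests an inclusive byte range. *)
let curl_argv ?off ?len url =
  let range =
    match off, len with
    | Some o, Some l when o >= 0 && l > 0 ->
        [ "-r"; Printf.sprintf "%d-%d" o (o + l - 1) ]
    | Some _, Some _ -> invalid_arg "curl_argv: need off >= 0 and len > 0"
    | Some o, None -> [ "-r"; Printf.sprintf "%d-" o ]
    | None, Some _ -> invalid_arg "curl_argv: len without off"
    | None, None -> []
  in
  Array.of_list ([ "curl"; "-sfL" ] @ range @ [ url ])
```

With `lwt.unix`, the array could be run through `Lwt_process.pread ("curl", curl_argv ?off ?len url)`, which collects stdout as a string. A real backend should also confirm that curl exited successfully, because `-f` reports HTTP errors only through the exit status.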
+25
zarr-v3-unix/zarr-v3-unix.opam
··· 1 + # This file is generated by dune, edit dune-project instead 2 + opam-version: "2.0" 3 + synopsis: "Unix HTTP backend for zarr-v3" 4 + license: "ISC" 5 + depends: [ 6 + "dune" {>= "3.17"} 7 + "ocaml" {>= "5.2"} 8 + "zarr-v3" {>= "0.1"} 9 + "lwt" {>= "5.0"} 10 + "odoc" {with-doc} 11 + ] 12 + build: [ 13 + ["dune" "subst"] {dev} 14 + [ 15 + "dune" 16 + "build" 17 + "-p" 18 + name 19 + "-j" 20 + jobs 21 + "@install" 22 + "@runtest" {with-test} 23 + "@doc" {with-doc} 24 + ] 25 + ]
+13
zarr-v3/dune-project
··· 1 + (lang dune 3.17) 2 + (name zarr-v3) 3 + (generate_opam_files true) 4 + (license ISC) 5 + (package 6 + (name zarr-v3) 7 + (synopsis "Pure OCaml Zarr v3 reader with pluggable codecs") 8 + (description "Async Zarr v3 store reader supporting sharding, pluggable compression codecs, and HTTP range requests. Platform-independent.") 9 + (depends 10 + (ocaml (>= 5.2)) 11 + (lwt (>= 5.0)) 12 + (yojson (>= 2.0)) 13 + (alcotest (and :with-test (>= 0.8)))))
+9
zarr-v3/lib/blosc.ml
··· 1 + type header = { 2 + nbytes : int; 3 + cbytes : int; 4 + typesize : int; 5 + blocksize : int; 6 + } 7 + 8 + let parse_header _s = failwith "TODO" 9 + let decode ?decompress:_ _s = failwith "TODO"
+33
zarr-v3/lib/blosc.mli
··· 1 + (** Blosc frame decoder. 2 + 3 + Blosc wraps compressed (or uncompressed) data in a 16-byte header 4 + containing sizes and codec metadata. In the GeoTessera Zarr store, 5 + Blosc typically stores data uncompressed (memcpy fallback) with 6 + just the 16-byte header overhead. 7 + 8 + For compressed frames, a [decompress] callback must be provided 9 + that handles the inner compression algorithm (e.g., Zstd). *) 10 + 11 + type header = { 12 + nbytes : int; (** Uncompressed data size in bytes *) 13 + cbytes : int; (** Compressed size including the 16-byte header *) 14 + typesize : int; (** Element size in bytes (for shuffle) *) 15 + blocksize : int; (** Block size in bytes *) 16 + } 17 + (** Parsed Blosc frame header. *) 18 + 19 + val parse_header : string -> header 20 + (** Parse a Blosc header from the first 16 bytes of a frame. 21 + @raise Failure if input is shorter than 16 bytes. *) 22 + 23 + val decode : ?decompress:(string -> int -> string) -> string -> string 24 + (** Decode a Blosc frame, returning the raw uncompressed data. 25 + 26 + If the frame uses memcpy mode ([cbytes = nbytes + 16]), the payload 27 + is returned directly without decompression. 28 + 29 + If the frame is actually compressed and [decompress] is provided, 30 + calls [decompress compressed_payload expected_size]. 31 + 32 + @raise Failure if the frame is too short, or if compressed 33 + without a [decompress] callback. *)
+4
zarr-v3/lib/dune
··· 1 + (library 2 + (name zarr_v3) 3 + (public_name zarr-v3) 4 + (libraries lwt yojson))
+63
zarr-v3/lib/zarr_v3.ml
··· 1 + type fetch = string -> ?off:int -> ?len:int -> unit -> string Lwt.t 2 + type codec = string -> string 3 + type codec_registry = string -> codec option 4 + 5 + type data_type = Int8 | Uint8 | Int32 | Float32 | Float64 6 + 7 + type array_meta = { 8 + shape : int array; 9 + data_type : data_type; 10 + chunk_shape : int array; 11 + chunk_separator : string; 12 + is_sharded : bool; 13 + inner_chunk_shape : int array option; 14 + inner_codecs : string list; 15 + index_location : [ `Start | `End ]; 16 + } 17 + 18 + type store = { 19 + base_url : string; 20 + fetch : fetch; 21 + codecs : codec_registry; 22 + consolidated : (string * Yojson.Safe.t) list; 23 + } 24 + 25 + type arr = { 26 + store : store; 27 + path : string; 28 + meta : array_meta; 29 + } 30 + 31 + let data_type_size = function 32 + | Int8 | Uint8 -> 1 33 + | Int32 | Float32 -> 4 34 + | Float64 -> 8 35 + 36 + let data_type_of_string = function 37 + | "int8" -> Int8 38 + | "uint8" -> Uint8 39 + | "int32" -> Int32 40 + | "float32" -> Float32 41 + | "float64" -> Float64 42 + | s -> failwith (Printf.sprintf "Unsupported data type: %s" s) 43 + 44 + let open_store ~fetch ~codecs url = 45 + ignore (fetch, codecs, url); 46 + failwith "TODO" 47 + 48 + let open_array store path = 49 + ignore (store.base_url, store.fetch, store.codecs, store.consolidated); 50 + ignore path; 51 + failwith "TODO" 52 + 53 + let array_meta arr = 54 + ignore arr.store; ignore arr.path; 55 + arr.meta 56 + 57 + let group_attrs store path = 58 + ignore (store, path); 59 + failwith "TODO" 60 + 61 + let read arr ~start ~shape = 62 + ignore (arr, start, shape); 63 + failwith "TODO"
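Locating a shard object from its grid coordinates depends on the array's chunk key encoding. Below is a sketch of the Zarr v3 `default` encoding, which is `c` followed by the coordinates joined with the separator. It assumes the GeoTessera arrays use that encoding with the default `/` separator; the plan does not state this. `chunk_url` is a hypothetical helper.

```ocaml
(* Zarr v3 "default" chunk key encoding: "c" followed by the grid coordinates,
   joined with the separator ("/" unless the metadata says otherwise).
   For a sharded array, the outer chunks are the shards. *)
let chunk_key ?(separator = "/") coords =
  String.concat separator ("c" :: List.map string_of_int (Array.to_list coords))

(* Hypothetical helper: absolute URL of a chunk or shard object. *)
let chunk_url ~base_url ~path ?separator coords =
  String.concat "/" [ base_url; path; chunk_key ?separator coords ]
```

In the live test's Cambridge window, scales pixel `(115148, 13740)` falls in shard `(115148 / 256, 13740 / 256) = (449, 53)`, so its key would be `c/449/53`.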
+112
zarr-v3/lib/zarr_v3.mli
··· 1 + (** Pure OCaml Zarr v3 store reader. 2 + 3 + Reads sharded Zarr v3 arrays over HTTP with pluggable codecs and 4 + fetch functions. Platform-independent: bring your own HTTP client 5 + and decompressors. 6 + 7 + {2 Example} 8 + 9 + {[ 10 + let* store = Zarr_v3.open_store ~fetch ~codecs url in 11 + let* arr = Zarr_v3.open_array store "utm31/embeddings" in 12 + Zarr_v3.read arr ~start:[|100; 200; 0|] ~shape:[|4; 4; 128|] 13 + (* resolves to raw bytes in C-order; assumes [open Lwt.Syntax] *) 14 + ]} 15 + 16 + {2 Pluggable I/O} 17 + 18 + The [fetch] parameter provides HTTP access. The [codecs] parameter 19 + provides decompression. Both are passed in by platform backends 20 + (e.g., zarr-v3-unix for testing, tessera-zarr-jsoo for the browser). *) 21 + 22 + (** {1 Pluggable interfaces} *) 23 + 24 + type fetch = string -> ?off:int -> ?len:int -> unit -> string Lwt.t 25 + (** [fetch url ?off ?len ()] fetches bytes from [url]. 26 + If [off] and [len] are both provided, fetches the [len] bytes starting at offset [off]. 27 + Returns the response body as a string. *) 28 + 29 + type codec = string -> string 30 + (** A decompression codec. Takes compressed bytes, returns decompressed bytes. *) 31 + 32 + type codec_registry = string -> codec option 33 + (** Maps codec names (e.g., ["zstd"]) to decompression functions. 34 + Return [None] for unknown codecs. The built-in [bytes] and [blosc] 35 + (memcpy mode) codecs are handled internally. *) 36 + 37 + (** {1 Metadata types} *) 38 + 39 + type data_type = 40 + | Int8 41 + | Uint8 42 + | Int32 43 + | Float32 44 + | Float64 45 + (** Supported Zarr data types. *) 46 + 47 + type array_meta = { 48 + shape : int array; 49 + data_type : data_type; 50 + chunk_shape : int array; 51 + chunk_separator : string; 52 + is_sharded : bool; 53 + inner_chunk_shape : int array option; 54 + inner_codecs : string list; 55 + index_location : [ `Start | `End ]; 56 + } 57 + (** Parsed metadata for a Zarr v3 array.
*) 58 + 59 + (** {1 Store and array handles} *) 60 + 61 + type store 62 + (** An open Zarr v3 store backed by HTTP. Holds the base URL, 63 + fetch function, codec registry, and consolidated metadata. *) 64 + 65 + type arr 66 + (** An open Zarr v3 array with parsed metadata and shard access methods. *) 67 + 68 + (** {1 Opening stores and arrays} *) 69 + 70 + val open_store : fetch:fetch -> codecs:codec_registry -> string -> store Lwt.t 71 + (** [open_store ~fetch ~codecs base_url] opens a Zarr v3 store. 72 + Fetches and parses the root [zarr.json], including any 73 + consolidated metadata. *) 74 + 75 + val open_array : store -> string -> arr Lwt.t 76 + (** [open_array store path] opens an array by path (e.g., ["utm31/scales"]). 77 + Uses consolidated metadata if available. 78 + @raise Failure if the array is not found. *) 79 + 80 + (** {1 Metadata access} *) 81 + 82 + val array_meta : arr -> array_meta 83 + (** Get the parsed metadata for an open array. *) 84 + 85 + val group_attrs : store -> string -> (string * Yojson.Safe.t) list 86 + (** [group_attrs store path] returns the attributes of a group 87 + (e.g., ["utm31"] for spatial transform and CRS info). 88 + @raise Failure if the group is not found. *) 89 + 90 + (** {1 Reading data} *) 91 + 92 + val read : arr -> start:int array -> shape:int array -> string Lwt.t 93 + (** [read arr ~start ~shape] reads a rectangular region of an array. 94 + 95 + [start] is the origin (inclusive) in pixel coordinates. 96 + [shape] is the size of the region in each dimension. 97 + Returns raw bytes in C-order. The caller must interpret the bytes 98 + according to {!array_meta.data_type}. 99 + 100 + For sharded arrays, fetches only the shards that overlap the 101 + requested region. Shard fetches run in parallel via [Lwt.join]. 102 + 103 + @raise Failure if the region is out of bounds. 
*) 104 + 105 + (** {1 Utility} *) 106 + 107 + val data_type_size : data_type -> int 108 + (** Size in bytes of a single element of the given data type. *) 109 + 110 + val data_type_of_string : string -> data_type 111 + (** Parse a Zarr data type string (e.g., ["int8"], ["float32"]). 112 + @raise Failure for unsupported types. *)
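`read` is documented to fetch only the shards that overlap the requested region. Those shards follow from integer division in each dimension. Below is a sketch; `overlapping_shards` is a hypothetical name, and `start` values are assumed to be non-negative.

```ocaml
(* Shard grid coordinates overlapped by the region [start, start + shape),
   listed in C order (last dimension varies fastest). [shard_shape] is the
   outer chunk shape, e.g. [|256; 256|] for the scales array. *)
let overlapping_shards ~shard_shape ~start ~shape =
  let ndim = Array.length start in
  if Array.length shape <> ndim || Array.length shard_shape <> ndim then
    invalid_arg "overlapping_shards: rank mismatch";
  if Array.exists (fun n -> n <= 0) shape then
    invalid_arg "overlapping_shards: empty region";
  (* Inclusive range of shard indices touched along each dimension *)
  let lo = Array.init ndim (fun d -> start.(d) / shard_shape.(d)) in
  let hi = Array.init ndim (fun d -> (start.(d) + shape.(d) - 1) / shard_shape.(d)) in
  (* Cartesian product of the per-dimension ranges *)
  let rec go d prefix =
    if d = ndim then [ Array.of_list (List.rev prefix) ]
    else
      List.init (hi.(d) - lo.(d) + 1) (fun k -> lo.(d) + k)
      |> List.concat_map (fun i -> go (d + 1) (i :: prefix))
  in
  go 0 []
```

Each resulting coordinate tuple could then be turned into one shard fetch, and those fetches run concurrently.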
+28
zarr-v3/zarr-v3.opam
··· 1 + # This file is generated by dune, edit dune-project instead 2 + opam-version: "2.0" 3 + synopsis: "Pure OCaml Zarr v3 reader with pluggable codecs" 4 + description: 5 + "Async Zarr v3 store reader supporting sharding, pluggable compression codecs, and HTTP range requests. Platform-independent." 6 + license: "ISC" 7 + depends: [ 8 + "dune" {>= "3.17"} 9 + "ocaml" {>= "5.2"} 10 + "lwt" {>= "5.0"} 11 + "yojson" {>= "2.0"} 12 + "alcotest" {with-test & >= "0.8"} 13 + "odoc" {with-doc} 14 + ] 15 + build: [ 16 + ["dune" "subst"] {dev} 17 + [ 18 + "dune" 19 + "build" 20 + "-p" 21 + name 22 + "-j" 23 + jobs 24 + "@install" 25 + "@runtest" {with-test} 26 + "@doc" {with-doc} 27 + ] 28 + ]