Sync from monorepo · jon.recoil.org/onnxrt@54c812b

+23

doc/index.mld

@@ -0,0 +1,23 @@
+{0 onnxrt}
+
+OCaml bindings to {{:https://onnxruntime.ai/}ONNX Runtime Web} for
+browser-based ML inference via {{:https://ocsigen.org/js_of_ocaml}js_of_ocaml}.
+
+Supports WebAssembly (CPU) and WebGPU (GPU) execution providers, typed
+tensors, and session management. Models are loaded from [.onnx] files and
+run asynchronously using Lwt.
+
+{1 Examples}
+
+- {{!page-add_example}Tensor addition} — minimal example of creating
+tensors and running a model
+- {{!page-sentiment_example}Sentiment analysis} — text classification
+using a transformer model
+
+{1 API}
+
+The main entry points are:
+
+- {!Onnxrt.Session} — load models and run inference
+- {!Onnxrt.Tensor} — create and inspect typed tensors
+- {!Onnxrt.Env} — configure execution providers (WASM threads, WebGPU, SIMD)

+10 -10

lib/onnxrt.mli

@@ -72,7 +72,7 @@
 {1 GPU tensors}
 
 When using the WebGPU backend, tensors can reside on the GPU to avoid
-CPU↔GPU transfers between chained inference calls. See {!Tensor.location},
+CPU↔GPU transfers between chained inference calls. See {!Tensor.type-location},
 {!Tensor.download}, and {!Session.create} with
 [~preferred_output_location:`Gpu_buffer].
 
@@ -138,7 +138,7 @@
 
 {2 Lifecycle}
 
-Tensors obtained from {!Session.run} should be {!dispose}d when no longer
+Tensors obtained from {!Session.run} should be {!Tensor.dispose}d when no longer
 needed. For CPU tensors this is a hint to the garbage collector; for GPU
 tensors it releases the underlying [GPUBuffer] and failure to dispose will
 leak GPU memory.
@@ -147,7 +147,7 @@
 
 When a session is configured with
 [~preferred_output_location:`Gpu_buffer], output tensors reside on the GPU.
-Their data is not accessible synchronously — use {!download} to transfer
+Their data is not accessible synchronously — use {!Tensor.download} to transfer
 to CPU, or pass them directly as input to another {!Session.run} call to
 keep computation on the GPU. *)
 module Tensor : sig
@@ -302,23 +302,23 @@
 
 {2 Session lifecycle}
 
-1. Create a session with {!create}, which loads the model, applies graph
+1. Create a session with {!Session.create}, which loads the model, applies graph
 optimizations, and partitions operators across execution providers.
-2. Run inference with {!run}, passing named input tensors and receiving
+2. Run inference with {!Session.run}, passing named input tensors and receiving
 named output tensors.
-3. Release with {!release} when done, to free model weights and any
+3. Release with {!Session.release} when done, to free model weights and any
 GPU resources.
 
 {2 Warm-up}
 
 When using WebGPU, compute shaders are compiled lazily on the first
-{!run} call. The first inference will be significantly slower than
+{!Session.run} call. The first inference will be significantly slower than
 subsequent ones. Run a warm-up inference with dummy data after session
 creation if latency matters.
 
 {2 Thread safety}
 
-Sessions do not support concurrent {!run} calls. Await each result before
+Sessions do not support concurrent {!Session.run} calls. Await each result before
 starting the next inference. *)
 module Session : sig
 (** An opaque inference session handle. *)
@@ -419,10 +419,10 @@
 - [n]: use [n] threads (requires cross-origin isolation)
 
 Multi-threading requires the page to be served with:
-{v
+{v
 Cross-Origin-Opener-Policy: same-origin
 Cross-Origin-Embedder-Policy: require-corp
-v} *)
+v} *)
 
 val set_simd : bool -> unit
 (** [set_simd enabled] enables or disables WASM SIMD. Defaults to [true]

+1 -1

onnxrt.opam

@@ -9,7 +9,7 @@
 "ocaml" {>= "5.2"}
 "js_of_ocaml" {>= "5.8"}
 "js_of_ocaml-ppx" {>= "5.8"}
-"lwt" {>= "5.7" & < "6.1.0"}
+"lwt" {>= "5.7"}
 "js_of_ocaml-lwt" {>= "5.8"}
 "odoc" {with-doc}
 ]
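
The doc comments in lib/onnxrt.mli describe a fixed order of operations: create a session, optionally warm it up, run inference, dispose of output tensors, then release the session. The sketch below illustrates that order only. It does not use the real bindings: `Stub` is a hypothetical, synchronous stand-in whose signatures are assumptions (the actual `Onnxrt` API is Lwt-based and its exact types are not shown in this diff), and `with_session` is an illustrative helper, not part of the library.

```ocaml
(* Hypothetical stand-in for the bindings. The names mirror the docs
   (create / run / release, dispose) but the signatures are assumptions,
   and everything is synchronous, whereas the real API uses Lwt. *)
module Stub = struct
  (* Event log, newest first, so the call order can be inspected. *)
  let log : string list ref = ref []
  let record event = log := event :: !log

  type tensor = { data : float array }
  type session = { mutable released : bool }

  let create (_model : string) = record "create"; { released = false }

  let run session (inputs : (string * tensor) list) =
    if session.released then invalid_arg "run: session already released";
    record "run";
    (* Fake model: echo each named input back with its values doubled. *)
    List.map
      (fun (name, t) -> (name, { data = Array.map (fun x -> 2. *. x) t.data }))
      inputs

  let dispose (_ : tensor) = record "dispose"
  let release session = session.released <- true; record "release"
end

(* Create a session, warm it up once, apply [f], and always release,
   even if [f] raises. *)
let with_session model ~warmup_inputs f =
  let session = Stub.create model in
  Fun.protect
    ~finally:(fun () -> Stub.release session)
    (fun () ->
      (* Warm-up: one run with dummy data; outputs are disposed at once. *)
      Stub.run session warmup_inputs
      |> List.iter (fun (_, t) -> Stub.dispose t);
      f session)

let result =
  with_session "model.onnx"
    ~warmup_inputs:[ ("x", { Stub.data = [| 0. |] }) ]
    (fun session ->
      let outputs = Stub.run session [ ("x", { Stub.data = [| 1.; 2. |] }) ] in
      (* Copy the values we need before disposing of the output tensors. *)
      let copy = Array.copy (List.assoc "x" outputs).Stub.data in
      List.iter (fun (_, t) -> Stub.dispose t) outputs;
      copy)

(* Calls in the order they happened. *)
let events = List.rev !Stub.log
```

Wrapping the release in `Fun.protect ~finally` is one way to make sure the session is freed on every path; with the real Lwt-based API the equivalent would presumably be `Lwt.finalize`, and each run would need to be awaited before the next one starts, as the thread-safety note requires.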
