+{0 onnxrt}
+
+OCaml bindings to {{:https://onnxruntime.ai/}ONNX Runtime Web} for
+browser-based ML inference via {{:https://ocsigen.org/js_of_ocaml}js_of_ocaml}.
+
+Supports WebAssembly (CPU) and WebGPU (GPU) execution providers, typed
+tensors, and session management. Models are loaded from [.onnx] files and
+run asynchronously using Lwt.
+
+{1 Examples}
+
+- {{!page-add_example}Tensor addition} — minimal example of creating
+ tensors and running a model
+- {{!page-sentiment_example}Sentiment analysis} — text classification
+ using a transformer model
+
+{1 API}
+
+The main entry points are:
+
+- {!Onnxrt.Session} — load models and run inference
+- {!Onnxrt.Tensor} — create and inspect typed tensors
+- {!Onnxrt.Env} — configure execution providers (WASM threads, WebGPU, SIMD)
+10-10
lib/onnxrt.mli
···
  {1 GPU tensors}

  When using the WebGPU backend, tensors can reside on the GPU to avoid
- CPU↔GPU transfers between chained inference calls. See {!Tensor.location},
+ CPU↔GPU transfers between chained inference calls. See {!Tensor.type-location},
  {!Tensor.download}, and {!Session.create} with
  [~preferred_output_location:`Gpu_buffer].

···

  {2 Lifecycle}

- Tensors obtained from {!Session.run} should be {!dispose}d when no longer
+ Tensors obtained from {!Session.run} should be {!Tensor.dispose}d when no longer
  needed. For CPU tensors this is a hint to the garbage collector; for GPU
  tensors it releases the underlying [GPUBuffer] and failure to dispose will
  leak GPU memory.
···

  When a session is configured with
  [~preferred_output_location:`Gpu_buffer], output tensors reside on the GPU.
- Their data is not accessible synchronously — use {!download} to transfer
+ Their data is not accessible synchronously — use {!Tensor.download} to transfer
  to CPU, or pass them directly as input to another {!Session.run} call to
  keep computation on the GPU. *)
 module Tensor : sig
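The lifecycle note in this hunk (dispose every tensor returned by a run, or GPU memory leaks) can be illustrated with a small bracket helper. This is only a sketch under stated assumptions: `Mock_tensor` and `with_tensor` are hypothetical names invented for illustration, the mock is synchronous, and none of it is the `Onnxrt.Tensor` API, whose signatures are not shown in this diff. In Lwt-based code the analogous combinator would be `Lwt.finalize`.

```ocaml
(* Hypothetical stand-in for a GPU-resident tensor; NOT the Onnxrt API. *)
module Mock_tensor = struct
  type t = { name : string; mutable disposed : bool }

  let create name = { name; disposed = false }
  let dispose t = t.disposed <- true
  let is_disposed t = t.disposed
end

(* Run [f] on [t] and dispose [t] afterwards, even if [f] raises. *)
let with_tensor (t : Mock_tensor.t) (f : Mock_tensor.t -> 'a) : 'a =
  Fun.protect ~finally:(fun () -> Mock_tensor.dispose t) (fun () -> f t)

let () =
  (* Normal path: the result is returned and the tensor is disposed. *)
  let t = Mock_tensor.create "logits" in
  let len = with_tensor t (fun t -> String.length t.Mock_tensor.name) in
  assert (len = 6);
  assert (Mock_tensor.is_disposed t);
  (* Error path: disposal still happens when the callback raises. *)
  let t2 = Mock_tensor.create "hidden" in
  (try with_tensor t2 (fun _ -> failwith "inference failed")
   with Failure _ -> ());
  assert (Mock_tensor.is_disposed t2)
```

Tying disposal to scope this way means an exception between receiving a tensor and disposing it cannot leave the buffer allocated.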
···

  {2 Session lifecycle}

- 1. Create a session with {!create}, which loads the model, applies graph
+ 1. Create a session with {!Session.create}, which loads the model, applies graph
  optimizations, and partitions operators across execution providers.
- 2. Run inference with {!run}, passing named input tensors and receiving
+ 2. Run inference with {!Session.run}, passing named input tensors and receiving
  named output tensors.
- 3. Release with {!release} when done, to free model weights and any
+ 3. Release with {!Session.release} when done, to free model weights and any
  GPU resources.

  {2 Warm-up}

  When using WebGPU, compute shaders are compiled lazily on the first
- {!run} call. The first inference will be significantly slower than
+ {!Session.run} call. The first inference will be significantly slower than
  subsequent ones. Run a warm-up inference with dummy data after session
  creation if latency matters.

  {2 Thread safety}

- Sessions do not support concurrent {!run} calls. Await each result before
+ Sessions do not support concurrent {!Session.run} calls. Await each result before
  starting the next inference. *)
 module Session : sig
  (** An opaque inference session handle. *)
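The create, warm-up, run, release sequence described in this hunk could be wrapped roughly as follows. This is a sketch under stated assumptions, not the library's API: `Mock_session` and `with_warm_session` are hypothetical names, the mock runs synchronously and simply doubles its inputs, and the real `Session` functions are Lwt-based with signatures that this diff does not show. In Lwt code, one way to respect the "no concurrent runs" rule would be to guard each run with `Lwt_mutex.with_lock`.

```ocaml
(* Hypothetical mock of a session; NOT the Onnxrt.Session API. *)
module Mock_session = struct
  type t = { mutable runs : int; mutable released : bool }

  let create () = { runs = 0; released = false }

  (* Pretend inference: doubles every input value. *)
  let run s (inputs : (string * float array) list) =
    if s.released then invalid_arg "run: session already released";
    s.runs <- s.runs + 1;
    List.map
      (fun (name, data) -> (name, Array.map (fun x -> 2.0 *. x) data))
      inputs

  let release s = s.released <- true
end

(* Create a session, warm it up once with dummy data, hand it to [f],
   and release it afterwards even if [f] raises. *)
let with_warm_session ~(dummy : (string * float array) list) f =
  let s = Mock_session.create () in
  Fun.protect
    ~finally:(fun () -> Mock_session.release s)
    (fun () ->
      (* Warm-up inference; the result is discarded. *)
      ignore (Mock_session.run s dummy);
      f s)

let () =
  let outputs, session =
    with_warm_session
      ~dummy:[ ("x", [| 0.0 |]) ]
      (fun s -> (Mock_session.run s [ ("x", [| 1.0; 2.5 |]) ], s))
  in
  assert (outputs = [ ("x", [| 2.0; 5.0 |]) ]);
  (* One warm-up run plus one real inference. *)
  assert (session.Mock_session.runs = 2);
  assert (session.Mock_session.released)
```

Doing the warm-up inside the helper keeps the slow first run out of the latency-sensitive path, and the `~finally` handler covers the release step that is easy to forget on error paths.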
···
  - [n]: use [n] threads (requires cross-origin isolation)

  Multi-threading requires the page to be served with:
- {v
+{v
 Cross-Origin-Opener-Policy: same-origin
 Cross-Origin-Embedder-Policy: require-corp
- v} *)
+v} *)

  val set_simd : bool -> unit
  (** [set_simd enabled] enables or disables WASM SIMD. Defaults to [true]