# csv
Declarative CSV codecs for OCaml with streaming I/O via bytesrw and jsont-style structured errors.
## Overview
Csv provides bidirectional CSV encoding and decoding using a combinator approach inspired by jsont. Define a typed row codec once and use it for both decoding CSV files into OCaml records and encoding records back to CSV. Column names are resolved once at decode time for O(1) per-row field access.
Streaming I/O is built on bytesrw: `decode` reads from a `Bytes.Reader.t` and `encode` writes to a `Bytes.Writer.t`, so CSV processing composes with other bytesrw-based codecs in a pipeline. For simple cases, `decode_file`, `decode_string`, and `fold_file` provide direct access.
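As a sketch of the streaming-fold convenience API, here is how rows of a large file might be counted without materializing them all; the exact argument order of `fold_file` and the shape of its error value are assumptions, so check `csv.mli` for the real signature:

```ocaml
(* Hypothetical sketch: fold over a CSV file row by row. Assumes
   [fold_file] takes the codec, a folding function, an initial
   accumulator, and a filename, returning a [result]. *)
let count_rows codec file =
  match Csv.fold_file codec (fun acc _row -> acc + 1) 0 file with
  | Ok n -> n
  | Error msg -> failwith msg
```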
Decode errors are reported via loc with a structural path context identifying the failing row (by index) and column (by name).
## Installation
Install with opam:

```
$ opam install nox-csv
```

If opam cannot find the package, it may not yet be released in the public opam-repository. Add the overlay repository, then install it:

```
$ opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
$ opam update
$ opam install nox-csv
```
## Usage
```ocaml
type point = { x : float; y : float; label : string }

let point_codec =
  Csv.(Row.(
    obj (fun x y label -> { x; y; label })
    |> col "x" float ~enc:(fun p -> p.x)
    |> col "y" float ~enc:(fun p -> p.y)
    |> col "label" string ~enc:(fun p -> p.label)
    |> finish
  ))

(* Decode from a string. The error is already a formatted string with
   line/column and the row/column path context. *)
let csv = "x,y,label\n1.0,2.0,A\n3.5,4.5,B\n"

let points =
  match Csv.decode_string point_codec csv with
  | Ok points -> points
  | Error _ -> failwith "invalid CSV"

(* Streaming decode from a bytesrw reader. *)
let sum =
  let r = Bytesrw.Bytes.Reader.of_string csv in
  match Csv.decode point_codec r with
  | Ok rows -> List.fold_left (fun acc p -> acc +. p.x) 0.0 rows
  | Error _ -> 0.0

(* Encode to a bytesrw writer. Raises [Csv.Invalid_utf8_encode] if any
   field string contains malformed UTF-8. *)
let encoded =
  let buf = Buffer.create 256 in
  let w = Bytesrw.Bytes.Writer.of_buffer buf in
  Csv.encode point_codec points w;
  Buffer.contents buf
```
For programmatic inspection of errors (IDE diagnostics, error aggregation), use the primed variants, which return the structured `Csv.Error.t`. `Loc`, `Meta`, `Path`, and `Sort_kind` are available directly as `Csv.Loc`, `Csv.Meta`, `Csv.Path`, `Csv.Sort_kind`:
```ocaml
let report codec csv =
  match Csv.decode_string' codec csv with
  | Ok _ -> ()
  | Error e -> ignore e (* inspect the structured [Csv.Error.t] here *)
```
## API overview
- Field codecs: `string`, `int`, `int64`, `float`, `bool`, `option`, `nullable_float`, `nullable_int`
- `col_map` -- custom field codec from a decode / encode pair
- `Row` builder -- `obj`, `col`, `finish`, with `?dec_absent` for defaulted columns
- Batch decoding: `decode_file`, `decode_channel`, `decode_string`, and primed variants returning `('a, Loc.Error.t) result`
- Streaming fold: `fold_file`, `fold_channel` and primed variants
- Streaming: `decode`/`decode'` from a bytesrw reader, `encode` to a bytesrw writer
- `col`, `update_col`, `delete_col` -- column-level queries / updates
- `col_names`, `col_count` -- codec introspection
- `Sort` module -- `Row | Field | Header` sort enum
- `Loc`, `Meta`, `Path`, `Sort_kind`, `Error` -- shared source-location and structured-error infrastructure, available as `Csv.Loc`, `Csv.Meta`, `Csv.Path`, `Csv.Sort_kind`, `Csv.Error` (same modules as in loc)
- `Invalid_utf8_encode of int` -- raised by the encoder when a field value is not well-formed UTF-8
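The `?dec_absent` option can default a column that may be missing from the input header. The sketch below assumes `?dec_absent` is a labeled optional argument on `col` carrying the default value; the actual signature lives in `csv.mli`:

```ocaml
(* Sketch: a codec that tolerates a missing "note" column by
   substituting the empty string. The placement of [?dec_absent] on
   [col] is an assumption based on the API overview above. *)
type entry = { id : int; note : string }

let entry_codec =
  Csv.(Row.(
    obj (fun id note -> { id; note })
    |> col "id" int ~enc:(fun e -> e.id)
    |> col "note" string ~enc:(fun e -> e.note) ~dec_absent:""
    |> finish
  ))
```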
All decoding entry points accept `?max_rows` and `?max_cols` to bound resource use on untrusted input.
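For example, a decoder hardened against oversized untrusted input might look like this (assuming the two options are plain labeled integers, which is an assumption about the signature):

```ocaml
(* Reject untrusted inputs with more than 100_000 rows or 64 columns
   before they can exhaust memory. *)
let safe_decode codec untrusted =
  Csv.decode_string ~max_rows:100_000 ~max_cols:64 codec untrusted
```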
Full signatures are in `csv.mli`.
## References
- RFC 4180 -- the CSV format
- jsont -- JSON codec library that inspired the approach
- bytesrw -- composable byte stream readers and writers
- loc -- shared source-location and error infrastructure
## License
ISC. See LICENSE.md.