# csv

Declarative CSV codecs for OCaml with streaming I/O via bytesrw and jsont-style structured errors.

## Overview

Csv provides bidirectional CSV encoding and decoding using a combinator approach inspired by jsont. Define a typed row codec once and use it for both decoding CSV files into OCaml records and encoding records back to CSV. Column names are resolved once at decode time for O(1) per-row field access.

Streaming I/O is built on bytesrw: decode reads from a Bytes.Reader.t and encode writes to a Bytes.Writer.t, so CSV processing composes with other bytesrw-based codecs in a pipeline. For simple cases, decode_file, decode_string, and fold_file provide direct access.

Decode errors are reported via loc with a structural path context identifying the failing row (by index) and column (by name).

## Installation

Install with opam:

```sh
$ opam install nox-csv
```

If opam cannot find the package, it may not yet be released in the public opam-repository. Add the overlay repository, then install it:

```sh
$ opam repo add samoht https://tangled.org/gazagnaire.org/opam-overlay.git
$ opam update
$ opam install nox-csv
```

## Usage

```ocaml
type point = { x : float; y : float; label : string }

let point_codec =
  Csv.(Row.(
    obj (fun x y label -> { x; y; label })
    |> col "x" float ~enc:(fun p -> p.x)
    |> col "y" float ~enc:(fun p -> p.y)
    |> col "label" string ~enc:(fun p -> p.label)
    |> finish
  ))

(* Decode from a string. The error is already a formatted string with
   line/column and the row/column path context. *)
let csv = "x,y,label\n1.0,2.0,A\n3.5,4.5,B\n"

let points =
  match Csv.decode_string point_codec csv with
  | Ok points -> points
  | Error _ -> failwith "invalid CSV"

(* Streaming decode from a bytesrw reader. *)
let sum =
  let r = Bytesrw.Bytes.Reader.of_string csv in
  match Csv.decode point_codec r with
  | Ok rows -> List.fold_left (fun acc p -> acc +. p.x) 0.0 rows
  | Error _ -> 0.0

(* Encode to a bytesrw writer. Raises [Csv.Invalid_utf8_encode] if any
   field string contains malformed UTF-8. *)
let encoded =
  let buf = Buffer.create 256 in
  let w = Bytesrw.Bytes.Writer.of_buffer buf in
  Csv.encode point_codec points w;
  Buffer.contents buf
```
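When only an aggregate is needed, the fold entry points avoid materialising the intermediate row list. A minimal sketch, assuming fold_file threads an accumulator over each decoded row and reports failures as a formatted string (csv.mli has the actual signature):

```ocaml
(* Hypothetical sketch: assumes [Csv.fold_file] has roughly the shape
   codec -> ('acc -> 'row -> 'acc) -> 'acc -> filename -> ('acc, string) result. *)
let sum_x file =
  match Csv.fold_file point_codec (fun acc p -> acc +. p.x) 0.0 file with
  | Ok total -> total
  | Error msg -> failwith msg
```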

For programmatic inspection of errors (IDE diagnostics, error aggregation), use the primed variants, which return the structured Csv.Error.t. The Loc, Meta, Path, and Sort_kind modules are available directly as Csv.Loc, Csv.Meta, Csv.Path, and Csv.Sort_kind:

```ocaml
let report codec csv =
  match Csv.decode_string' codec csv with
  | Ok _ -> ()
  | Error e -> ignore e
```

## API overview

- Field codecs: string, int, int64, float, bool, option, nullable_float, nullable_int
- col_map -- custom field codec built from a decode / encode pair
- Row builder -- obj, col, finish, with ?dec_absent for defaulted columns
- Batch decoding: decode_file, decode_channel, decode_string, plus primed variants returning ('a, Csv.Error.t) result
- Streaming fold: fold_file, fold_channel and primed variants
- Streaming: decode / decode' from a bytesrw reader, encode to a bytesrw writer
- col, update_col, delete_col -- column-level queries / updates
- col_names, col_count -- codec introspection
- Sort module -- Row | Field | Header sort enum
- Loc, Meta, Path, Sort_kind, Error -- shared source-location and structured-error infrastructure, available as Csv.Loc, Csv.Meta, Csv.Path, Csv.Sort_kind, Csv.Error (same modules as in loc)
- Invalid_utf8_encode of int -- raised by the encoder when a field value is not well-formed UTF-8
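For columns that may be absent from the input header, ?dec_absent supplies a default at decode time. A hedged sketch (the exact optional-argument shape is an assumption; csv.mli has the real signature):

```ocaml
(* Assumption: [?dec_absent] on [col] provides the value used when the
   named column is missing from the input header. *)
type reading = { temp : float; unit_ : string }

let reading_codec =
  Csv.(Row.(
    obj (fun temp unit_ -> { temp; unit_ })
    |> col "temp" float ~enc:(fun r -> r.temp)
    |> col "unit" string ~dec_absent:"C" ~enc:(fun r -> r.unit_)
    |> finish
  ))
```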

All decoding entry points accept ?max_rows and ?max_cols to bound resource use on untrusted input.
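A sketch of applying these limits, assuming both are plain integer bounds whose violation surfaces as an ordinary decode error rather than an exception:

```ocaml
(* Assumption: exceeding ?max_rows or ?max_cols makes decoding return
   [Error] rather than raising. *)
let decode_untrusted codec s =
  Csv.decode_string ~max_rows:100_000 ~max_cols:64 codec s
```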

Full signatures in csv.mli.

## References

- RFC 4180 -- the CSV format
- jsont -- JSON codec library that inspired the approach
- bytesrw -- composable byte stream readers and writers
- loc -- shared source-location and error infrastructure

## License

ISC. See LICENSE.md.