# langdetect

Language detection library for OCaml using n-gram frequency analysis.

This is an OCaml port of the [Cybozu langdetect](https://github.com/shuyo/language-detection) algorithm. It detects the natural language of text using n-gram frequency profiles. It was ported from .

## Features

- Detects 49 languages including English, Chinese, Japanese, Arabic, and many European languages
- Fast probabilistic detection using n-gram frequency analysis
- Configurable detection parameters (smoothing, convergence thresholds)
- Reproducible results with optional random seed control
- Pure OCaml implementation with minimal dependencies

## Installation

```bash
opam install langdetect
```

## Usage

```ocaml
(* Create a detector with all built-in profiles *)
let detector = Langdetect.create_default ()

(* Detect the best matching language *)
let () =
  match Langdetect.detect_best detector "Hello, world!" with
  | Some lang -> Printf.printf "Detected: %s\n" lang
  | None -> print_endline "Could not detect language"

(* Get all possible languages with probabilities *)
let () =
  let results = Langdetect.detect detector "Bonjour le monde" in
  List.iter (fun r ->
    Printf.printf "%s: %.2f\n" r.Langdetect.lang r.Langdetect.prob
  ) results

(* Use custom configuration *)
let config = { Langdetect.default_config with prob_threshold = 0.3 }
let detector = Langdetect.create_default ~config ()
```

## Supported Languages

Arabic, Bengali, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Farsi, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malayalam, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Sinhalese, Albanian, Spanish, Swedish, Tamil, Telugu, Thai, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Chinese (Simplified), Chinese (Traditional).

## License

MIT License - see LICENSE file for details.

Based on the Cybozu langdetect algorithm.
Copyright (c) 2007-2016 Mozilla Foundation and 2025 Anil Madhavapeddy.
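
## Appendix: n-gram counting sketch

To make "n-gram frequency analysis" concrete, the following is a minimal, self-contained sketch of how 1-, 2-, and 3-gram counts could be gathered from a string. It is not the library's implementation: the helper names (`ngrams`, `count_ngrams`, `frequency`) are hypothetical and not part of the `Langdetect` API, and the sketch works on bytes rather than Unicode code points, which a real detector would need for non-Latin scripts.

```ocaml
(* Illustrative only: counts byte-level n-grams of length 1..3.
   These helpers are hypothetical and not part of Langdetect. *)

(* All substrings of length [n] in [s], left to right. *)
let ngrams n s =
  let len = String.length s in
  if n <= 0 || len < n then []
  else List.init (len - n + 1) (fun i -> String.sub s i n)

(* Count how often each 1-, 2-, and 3-gram occurs in [s]. *)
let count_ngrams s =
  let counts = Hashtbl.create 64 in
  List.iter
    (fun n ->
      List.iter
        (fun g ->
          let c = Option.value (Hashtbl.find_opt counts g) ~default:0 in
          Hashtbl.replace counts g (c + 1))
        (ngrams n s))
    [1; 2; 3];
  counts

(* Relative frequency of [gram] among all n-grams of the same length. *)
let frequency counts gram =
  let n = String.length gram in
  let total =
    Hashtbl.fold
      (fun g c acc -> if String.length g = n then acc + c else acc)
      counts 0
  in
  if total = 0 then 0.0
  else
    let c = Option.value (Hashtbl.find_opt counts gram) ~default:0 in
    float_of_int c /. float_of_int total

let () =
  let counts = count_ngrams "banana" in
  Printf.printf "P(an) = %.2f\n" (frequency counts "an")
```

A detector in this style would compare such frequencies against per-language profiles built from training text. The smoothing and convergence parameters listed under Features govern that comparison step, which this sketch does not attempt to reproduce.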