Opinionated OCaml linter with Merlin integration for code quality, naming conventions, and style checks
0
fork

Configure Feed

Select the types of activity you want to include in your feed.

Delete outdated *.md files

Twelve docs that no longer track real work or describe the current
fork. Removed:

ocaml-tls/{attacks,sni,design}.md — pre-fork architecture notes from
mirleft/ocaml-tls that reference 2014 CVEs, polarssl, Lwt/Async
flows; nothing in the current Eio-based fork's design.

memtrace/CONTRIBUTING.md — Jane Street upstream guide pointing at
github.com PRs, doesn't apply on tangled.

ocaml-requests/HTTP2_{IMPLEMENTATION_PLAN,CONPOOL_INTEGRATION}.md —
1500 lines combined for an HTTP/2 rollout that hasn't moved in 6+
weeks; revive from git history if/when the work restarts.

merlint/TODO.md, merlint/todo/{new_rules,codes,duplication}.md,
ocaml-sqlite/TODO.md, prune/TODO.md — stale TODO/proposal lists,
last touched 6+ weeks ago. Active TODOs (monopam, irmin, jwt,
cookie, chor, requests SPEC-TODO, claude ARCHITECTURE) all kept.

Also drop the now-empty merlint/todo/ directory and roll in dune fmt
fallout for irmin/test/bench/dune.

-453
-91
TODO.md
··· 1 - # TODO 2 - 3 - ## High Priority - New Rules to Implement 4 - 5 - ### Code Quality Rules 6 - - [ ] **Excessive String.contains Detection** 7 - - Flag functions with multiple String.contains calls (code smell) 8 - - Suggest using Re module for pattern matching 9 - - Threshold: 3+ String.contains in a single function 10 - 11 - - [ ] **ML File Documentation Style** 12 - - Discourage `(** *)` comments in .ml files except for module headers 13 - - Function docs in .ml should use `(* *)` and be implementation notes 14 - - Reserve `(** *)` for .mli files and module-level documentation 15 - 16 - ### Idiomatic OCaml Rules 17 - - [ ] **E331: Missing Labels for Same-Type Parameters** 18 - - Functions with 2+ parameters of same type should use labels 19 - - Prevents argument order mistakes (e.g., `copy ~from ~to`) 20 - 21 - - [ ] **E326: Redundant 'get_' Prefix** 22 - - Simple accessors shouldn't have `get_` prefix 23 - - Use `User.name` not `User.get_name` 24 - - Reserve `get_` for computations that always succeed 25 - 26 - ### Code Complexity Rules 27 - - [ ] **Code Duplication Detection** 28 - - Find duplicated code blocks across the codebase 29 - - Use AST subtree hashing algorithm (most accurate) 30 - - Suggest extracting common code into functions 31 - - See todo/duplication.md for detailed algorithm specification 32 - 33 - ## High Priority - Rule Improvements 34 - 35 - ### AST-Based Rule Conversions 36 - - [ ] **E105: Catch-all Exception Handler** 37 - - Currently catches ALL `_` patterns, not just in try-with 38 - - Need AST traversal to find only Pexp_try nodes with Ppat_any 39 - 40 - - [ ] **E340: Inline Error Construction** 41 - - Pattern `Error (Fmt.str ...)` detection not working 42 - - Needs AST analysis of constructor applications 43 - 44 - - [ ] **E110: Silenced Warning** 45 - - Uses regex for `[@warning...]` attributes 46 - - Need proper attribute support in AST/dump module 47 - 48 - - [ ] **E600/E610/E615: Test File Detection Improvements** 49 - - E600: Currently only checks "test.ml" files - should check all test executables 50 - - E610/E615: Still use brittle path heuristics (checking for "lib/" or "test/" in paths) 51 - - All should use Context.executable_modules and Context.test_modules like E505 does 52 - 53 - ## Medium Priority 54 - 55 - ### Configuration & Flexibility 56 - - [ ] **E325: Function Naming Convention** 57 - - get_* vs find_* convention not universal 58 - - Should be configurable/optional 59 - 60 - - [ ] **Add Configuration Support** 61 - - Allow customizing thresholds 62 - - Enable/disable specific rule categories 63 - - Per-project overrides 64 - 65 - ### Additional Rules 66 - - [ ] **KISS Simplicity Rules** 67 - - E342: Limit function parameters (max 4-5) 68 - - E343: Flag complex boolean expressions 69 - - E345: No single-letter variables (except x/xs, i) 70 - - E348: No magic numbers 71 - 72 - - [ ] **E352: Generic Label Detection** 73 - - Flag uninformative labels (~f, ~x, ~k) 74 - - Enforce descriptive API design 75 - 76 - - [ ] **E620: Test Naming Conventions** 77 - - Test suite names: lowercase, single words 78 - - Test cases: lowercase with underscores 79 - 80 - ## Low Priority 81 - 82 - ### Infrastructure 83 - - [ ] **Profiling Integration** 84 - - Module exists but collects no data 85 - - Add timing to engine.ml rule execution 86 - - Track time per file and per rule 87 - 88 - ### Documentation & Cleanup 89 - - [ ] **Rule Helper Consolidation** 90 - - Move kind_to_string from e325.ml/e330.ml to Outline module 91 - - Standardize Dump.check_elements pattern usage
-55
todo/codes.md
··· 1 - # Error Code and Documentation Review 2 - 3 - This document lists inconsistencies, gaps, and other issues found in the error code documentation (`docs/error-codes.html`) and the style guides (`prompts/*.md`). 4 - 5 - ## 1. Redundant Testing Guides 6 - 7 - - **Issue**: There are two separate and overlapping testing guides: `prompts/test.md` and `prompts/tests.md`. 8 - - **Gasps**: 9 - - This creates confusion about which guide is authoritative. 10 - - Maintaining two documents increases the risk of them becoming inconsistent over time. 11 - - **Recommendation**: Merge the content of `prompts/test.md` and `prompts/tests.md` into a single, comprehensive testing guide. `prompts/tests.md` appears to be more detailed and could be used as the base. 12 - 13 - ## 2. Incorrect Module Naming Convention (E305) 14 - 15 - - **Issue**: The description for error `E305` in `docs/error-codes.html` is inconsistent with the project's actual convention. 16 - - **Mismatch**: 17 - - `docs/error-codes.html` states the convention is `Snake_case` and gives a confusing example (`MyModule` → `My_module`). 18 - - `prompts/code.md` and the project's file structure clearly show the convention is `lowercase_with_underscores` (e.g., `user_profile.ml`). 19 - - **Recommendation**: Correct the `E305` entry in `docs/error-codes.html` to specify `lowercase_with_underscores` and provide a clear example (e.g., `MyModule` → `my_module`). 20 - 21 - ## 3. Missing Reference for E410 22 - 23 - - **Issue**: Error code `E410` (Documentation Style Issues) is defined in `docs/error-codes.html` but is not referenced in any of the `prompts/` style guides. 24 - - **Gasp**: Developers have no context or guidance on what constitutes a "documentation style issue" or how to avoid this error. 25 - - **Recommendation**: Add a section to `prompts/code.md` that explains the expected documentation style and explicitly references `[E410]`. 26 - 27 - ## 4. Overlapping Style Guide Content 28 - 29 - - **Issue**: The style guides in `prompts/` are fragmented. For example, rules about testing are split between `test.md` and `tests.md`, and general coding style is in `code.md`. 30 - - **Inconsistency**: This fragmentation can lead to incomplete or conflicting advice, as seen with the naming conventions. 31 - - **Recommendation**: Consider consolidating all style-related documentation into a single, authoritative `STYLE_GUIDE.md` to ensure consistency and make it easier for developers to find information. 32 - 33 - ## 5. Rules Without Error Codes 34 - 35 - The following rules from the style guides do not have corresponding error codes. They are candidates for being automated by `merlint`. 36 - 37 - ### E411: Docstring Format Convention (Proposed) 38 - - **Rule**: From `prompts/code.md`, "For functions, use the `[function_name arg1 arg2] is ...` pattern." 39 - - **Mechanization**: A linter could use a regular expression to check that the docstring for a `val` in an `.mli` file starts with `[<name> ...] is`. This would enforce a consistent documentation style. 40 - 41 - ### E330: Discouraged Argument Label (Proposed) 42 - - **Rule**: From `prompts/code.md`, "Avoid `~f` and `~x`." 43 - - **Mechanization**: This is a straightforward AST check. The linter can flag any function definition or call that uses the labels `~f` or `~x`. 44 - 45 - ### E510: Missing Log Source (Proposed) 46 - - **Rule**: From `prompts/code.md`, "Each module should define its own log source." 47 - - **Mechanization**: The linter can enforce that every `.ml` file contains at least one call to `Logs.Src.create`. This ensures that logging is properly configured on a per-module basis. 48 - 49 - ### E620: Test Name Convention (Proposed) 50 - - **Rule**: From `prompts/tests.md`, "Test suite names should be lowercase, single words... Test case names should be lowercase with underscores." 51 - - **Mechanization**: The linter can inspect the string literals passed to `Alcotest.test_case` and the suite names in the exported `suite` value to enforce the naming convention. 52 - 53 - ### E326: Redundant 'get_' Prefix (Proposed) 54 - - **Rule**: Refined from `E325`. Simple accessors should not have a `get_` prefix. The name of the property is sufficient (e.g., `User.name` instead of `User.get_name`). The `get_` prefix should be reserved for functions that perform a non-trivial computation but are still guaranteed to succeed. 55 - - **Mechanization**: The linter can flag functions named `get_*` where the implementation is a simple record field access. This would require semantic analysis of the function body.
-167
todo/duplication.md
··· 1 - # Code Duplication Detection Rule Specification 2 - 3 - ## Overview 4 - 5 - This rule should detect structurally similar code blocks that are candidates for extraction into shared functions. The goal is to identify code that has the same structure but may differ in variable names, literal values, or minor details. 6 - 7 - ## Algorithm: AST Subtree Hashing 8 - 9 - This is the most robust and widely-used technique for finding structural duplicates with a very low false-positive rate. 10 - 11 - ### How It Works 12 - 13 - 1. **Parse to AST**: Convert the source code into an Abstract Syntax Tree (AST). This immediately discards irrelevant information like comments and whitespace. 14 - 15 - 2. **Normalize the Tree**: Traverse the AST and "normalize" certain nodes. This is the key step for detecting variations. 16 - - **Identifiers**: Replace all variable and function names with a generic placeholder. For example, `Pexp_ident "my_var"` becomes `Pexp_ident "_VAR_"`. 17 - - **Literals**: Replace all literal values (strings, numbers) with a placeholder. `Pexp_constant 42` becomes `Pexp_constant "_LIT_"`. 18 - 19 - 3. **Hash Subtrees**: Traverse the normalized tree. For each node, compute a hash that combines its own type with the hashes of its children. This means that any two subtrees that are structurally identical (after normalization) will produce the exact same hash. 20 - 21 - 4. **Match Hashes**: Store the hashes in a map. When a hash collision occurs, you've found a structural duplicate. 22 - 23 - ### Example 24 - 25 - - Code A: `let total = price + 10` 26 - - Code B: `let sum = cost + 25` 27 - 28 - After parsing and normalization, both of these would effectively become the AST for: 29 - ``` 30 - let _VAR_ = _VAR_ + _LIT_ 31 - ``` 32 - 33 - Since their normalized ASTs are identical, their hashes will match. 34 - 35 - ### Characteristics 36 - 37 - - **False Positives**: Extremely low. A match from this algorithm means the code has the exact same syntactic structure. A developer looking at the two snippets will immediately agree they are duplicates and good candidates for a shared function. 38 - 39 - - **Pros**: 40 - - Immune to changes in variable names, literals, comments, and whitespace 41 - - Understands code structure, making it very accurate 42 - - The "normalization" step is configurable, allowing you to tune how sensitive it is 43 - 44 - - **Cons**: 45 - - More computationally intensive than token-based methods (requires a full parse) 46 - - Will not detect reordered statements (a Type 3 variation) 47 - 48 - ## Implementation Plan 49 - 50 - ### Step 1: Parse the Code (Already Done) 51 - We already have the logic to parse a file into a Ppxlib.structure. 52 - 53 - ```ocaml 54 - let structure = Ppxlib.Parse.implementation lexbuf 55 - ``` 56 - 57 - ### Step 2: Normalize the AST 58 - Create a visitor that replaces variable names and literals. 59 - 60 - ```ocaml 61 - open Ppxlib 62 - 63 - (* A visitor that replaces identifiers and constants with placeholders *) 64 - let normalizer = 65 - object 66 - inherit Ast_traverse.map as super 67 - 68 - (* Normalize variable names *) 69 - method! expression expr = 70 - match expr.pexp_desc with 71 - | Pexp_ident _ -> 72 - { expr with pexp_desc = Pexp_ident (Located.mk (Lident "_VAR_")) } 73 - | _ -> super#expression expr 74 - 75 - (* Normalize literals *) 76 - method! constant _ = 77 - Ast_builder.Default.pconst_integer "_LIT_" None 78 - end 79 - 80 - let normalized_structure = normalizer#structure structure 81 - ``` 82 - 83 - ### Step 3: Hash the Normalized Subtrees 84 - Create another visitor that traverses the normalized_structure and computes a hash for each node. 85 - 86 - ```ocaml 87 - let hashes = Hashtbl.create 1024 88 - 89 - let hasher = 90 - object 91 - inherit Ast_traverse.iter as super 92 - 93 - method! expression expr = 94 - (* Compute a hash for the current expression node. 95 - A simple hash could combine the node type with its 96 - children's hashes. 97 - This part requires a more sophisticated hashing strategy. *) 98 - let node_hash = (* ... compute hash of expr ... *) in 99 - 100 - (* Store the hash and its location *) 101 - let existing = Hashtbl.find_opt hashes node_hash in 102 - Hashtbl.replace hashes node_hash (expr.pexp_loc :: existing); 103 - 104 - super#expression expr 105 - end 106 - 107 - let () = hasher#structure normalized_structure 108 - ``` 109 - 110 - ### Step 4: Report Duplicates 111 - Finally, iterate through your hashes table. Any entry with more than one location is a code duplicate. 112 - 113 - ## Configuration Options 114 - 115 - 1. **Minimum Size**: To avoid flagging trivial duplicates (like `x + 1`), only consider subtrees with a minimum number of nodes or a certain depth. 116 - 117 - 2. **Normalization Level**: Configure what gets normalized: 118 - - Variable names only 119 - - Literals only 120 - - Both variables and literals 121 - - Type annotations 122 - - Module names 123 - 124 - 3. **Scope**: Configure where to look for duplicates: 125 - - Within a single file 126 - - Across files in the same directory 127 - - Across the entire project 128 - 129 - ## Example Output 130 - 131 - ``` 132 - [E700] Code Duplication 133 - Found 2 structurally identical code blocks: 134 - 135 - lib/user.ml:45-48: 136 - let calculate_total price tax = 137 - let subtotal = price * quantity in 138 - let total = subtotal + tax in 139 - total 140 - 141 - lib/order.ml:123-126: 142 - let compute_amount cost fee = 143 - let base = cost * count in 144 - let amount = base + fee in 145 - amount 146 - 147 - Consider extracting into a shared function: 148 - let calculate_with_addition base multiplier addition = 149 - let subtotal = base * multiplier in 150 - let total = subtotal + addition in 151 - total 152 - ``` 153 - 154 - ## Alternative Algorithm: Suffix Tree on Token Stream 155 - 156 - For reference, here's the alternative token-based approach: 157 - 158 - 1. **Lex & Normalize**: Convert the code into a normalized token stream. 159 - 2. **Build a Suffix Tree**: Build a generalized suffix tree from the token streams of all files. 160 - 3. **Find Common Substrings**: Find the longest common substrings within this tree. 161 - 162 - **Pros**: Can be faster than AST-based methods for very large codebases. 163 - **Cons**: Doesn't understand code structure and is sensitive to statement reordering. 164 - 165 - ## Recommendation 166 - 167 - For a tool that needs to be both accurate and useful for refactoring, the AST Subtree Hashing algorithm is the state-of-the-art and the best choice. It provides actionable results that directly point to code that can be refactored into a single, parameterized function.
-140
todo/new_rules.md
··· 1 - # Proposed New Rules for OCaml Style Guide 2 - 3 - This document outlines proposed new rules to enhance the existing OCaml style guide, making it more comprehensive and aligned with modern best practices for safe and maintainable systems programming. 4 - 5 - --- 6 - 7 - ### 1. API Design and Module Structure 8 - 9 - #### [E420] Avoid Exporting Simple Module Aliases 10 - 11 - * **Description**: In an interface (`.mli`) file, avoid creating simple aliases for other modules, such as `module Foo = Bar`. This practice clutters the API and can be confusing for users, who should be encouraged to use the original module (`Bar`) directly. 12 - * **Rationale**: Exporting a module alias provides no new functionality and can create ambiguity about whether the alias is a distinct module with different properties. Keeping the API clean and direct improves usability. 13 - * **Good Example**: 14 - ```ocaml 15 - (* In http.mli *) 16 - module Request = Http_request 17 - module Response = Http_response 18 - (* User should just use Http_request and Http_response directly. *) 19 - ``` 20 - * **Bad Example**: 21 - ```ocaml 22 - (* In http.mli *) 23 - (* (No aliases - user directly accesses other modules) *) 24 - ``` 25 - 26 - #### [E425] Define and Expose Functor Signatures 27 - 28 - * **Description**: When defining a functor in an `.mli` file, its argument and return types should be explicitly defined with clear signatures. Avoid exposing anonymous or inline functor types in interfaces. 29 - * **Rationale**: Explicit functor signatures are essential for documentation and usability. They clarify the contract of the functor, making it clear what kind of modules it accepts and what kind of module it produces. 30 - * **Good Example**: 31 - ```ocaml 32 - (* In store.mli *) 33 - module type Key = sig 34 - type t 35 - val equal : t -> t -> bool 36 - end 37 - 38 - module Make (K : Key) : sig 39 - type key = K.t 40 - type 'a t 41 - val find : 'a t -> key -> 'a option 42 - end 43 - ``` 44 - * **Bad Example**: 45 - ```ocaml 46 - (* In store.mli *) 47 - module Make : functor (K : sig type t ... end) -> sig ... end 48 - ``` 49 - 50 - #### [E360] Define Custom Operators in a Scoped Module 51 - 52 - * **Description**: Custom infix operators (e.g., `>>=`, `let*`, `|->`) should not be defined at the top level of a module. Instead, they should be placed within a dedicated, scoped module, such as `Syntax`, `Infix`, or `Let_syntax`. 53 - * **Rationale**: This practice prevents polluting the global namespace and forces the user to explicitly `open` the syntax module to enable the operators. This makes code clearer and avoids conflicts between libraries that might define the same operator. 54 - * **Good Example**: 55 - ```ocaml 56 - module Result = struct 57 - type ('a, 'e) t = ('a, 'e) result 58 - module Syntax = struct 59 - let (let*) = Result.bind 60 - end 61 - end 62 - 63 - (* Usage *) 64 - let (let*) = Result.Syntax.(let*) in 65 - ... 66 - ``` 67 - * **Bad Example**: 68 - ```ocaml 69 - module Result = struct 70 - type ('a, 'e) t = ('a, 'e) result 71 - let (let*) = Result.bind (* Defined at the top level *) 72 - end 73 - ``` 74 - 75 - --- 76 - 77 - ### 2. Type and Data Structure Definitions 78 - 79 - #### [E316] Prefer Private Types over Concrete Record Types in MLIs 80 - 81 - * **Description**: In `.mli` files, prefer marking record types as `private` (e.g., `type t = private { ... }`). This prevents users from directly constructing or modifying record fields, forcing them to use your provided helper functions. 82 - * **Rationale**: Abstract types (`type t`) can be too restrictive, while concrete types (`type t = { ... }`) break encapsulation and allow users to violate invariants. Private types offer a perfect middle ground, allowing controlled construction while protecting internal invariants. 83 - * **Good Example**: 84 - ```ocaml 85 - (* In user.mli *) 86 - type t = private { id: int; name: string } 87 - val make : id:int -> name:string -> t 88 - val name : t -> string 89 - ``` 90 - * **Bad Example**: 91 - ```ocaml 92 - (* In user.mli *) 93 - type t = { id: int; name: string } (* Invariants can be broken *) 94 - ``` 95 - 96 - #### [E317] Avoid Extensible Variants in Public APIs 97 - 98 - * **Description**: Avoid using extensible variants (`type t += ...`) for core data types in public-facing library interfaces. 99 - * **Rationale**: Extensible variants allow downstream consumers to add new constructors to your type. While powerful for plugin architectures, this can break a library's internal logic, which may assume it knows all possible cases for pattern matching. Their use should be rare and explicitly justified. 100 - * **Good Example**: 101 - ```ocaml 102 - (* Standard variant is safer for library logic *) 103 - type event = Message of string | Disconnect 104 - ``` 105 - * **Bad Example**: 106 - ```ocaml 107 - (* Core event type is extensible, which can be dangerous *) 108 - type event += Message of string 109 - ``` 110 - 111 - --- 112 - 113 - ### 3. Expression-level and Idiomatic Patterns 114 - 115 - #### [E120] Use `let _ = ...` for Intentionally Ignored Non-Unit Returns 116 - 117 - * **Description**: When calling a function that returns a non-`unit` value where the result is intentionally not used, explicitly ignore it with `let _ = ...`. Do not simply call it on its own line. 118 - * **Rationale**: The OCaml compiler issues a warning (warning 32) for ignored non-`unit` expressions because it's a common source of bugs. Using `let _ = ...` signals that the result is being discarded intentionally and safely silences the warning. 119 - * **Good Example**: 120 - ```ocaml 121 - let _ = List.map (fun x -> x * 2) [1; 2; 3] 122 - ``` 123 - * **Bad Example**: 124 - ```ocaml 125 - List.map (fun x -> x * 2) [1; 2; 3]; (* This does nothing and will warn *) 126 - ``` 127 - 128 - #### [E125] Avoid Point-Free Style in Public Functions 129 - 130 - * **Description**: Avoid defining functions using a point-free style (i.e., without explicitly naming the arguments). 131 - * **Rationale**: While point-free style can be very concise (e.g., `let process = f |> g`), it often makes the code harder to read, debug, and understand, as the data being transformed is implicit. Explicitly naming the argument (e.g., `let process x = x |> f |> g`) is clearer. 132 - * **Good Example**: 133 - ```ocaml 134 - let process_user user = 135 - user |> find_profile |> normalize_name 136 - ``` 137 - * **Bad Example**: 138 - ```ocaml 139 - let process_user = find_profile |> normalize_name 140 - ```