My aggregated monorepo of OCaml code, automaintained

Add design doc for sherlodoc markdown storage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

+98
docs/plans/2026-02-19-sherlodoc-markdown-design.md
# Sherlodoc Markdown Storage

## Problem

Sherlodoc stores documentation as pre-rendered HTML (`doc_html` field in `Entry.t`).
This is unsuitable for LLM/AI consumption and terminal CLI display, where markdown
is the natural format.

## Solution

Add a `doc_markdown : string` field to `Entry.t`, populated at index time using
odoc's markdown2 renderer (with LLM-specific improvements: fully qualified paths
and unified code blocks for type definitions).

## Approach: Dual storage (Approach 1)

Store both `doc_html` and `doc_markdown` in each entry. Compute both at index time.
No runtime conversion needed. Database size increase is modest since doc comments
are typically small.

## Changes

### 1. `db/entry.ml` — Add `doc_markdown` field

```ocaml
type t =
  { name : string
  ; rhs : string option
  ; url : string
  ; kind : Kind.t
  ; cost : int
  ; doc_html : string
  ; doc_markdown : string (* NEW *)
  ; pkg : Package.t
  }
```

Update `v`, `pp`, `structural_compare` accordingly.

### 2. `odoc/src/search/` — Add markdown rendering for doc comments

Add a `markdown_string_of_doc` function that mirrors `html_string_of_doc`:

```
Comment.elements
  -> Odoc_document.Comment.to_ir
  -> Odoc_markdown.Generator.block
  -> Renderer.to_string (Renderer.Block.Blocks blocks)
  -> string
```

The search module's dune gains a dependency on `odoc_markdown`.

### 3. `index/load_doc.ml` — Populate `doc_markdown` at index time

In `register_entry`, compute `doc_markdown` alongside `doc_html`:

```ocaml
let doc_markdown =
  match doc_txt with
  | "" -> ""
  | _ -> markdown_string_of_doc doc
in
```

The index dune gains a dependency on `odoc_markdown` (via the search module).

### 4. `cli/search.ml` — Default to markdown in CLI output

- Change `--print-docstring-html` to print `elt.doc_markdown` and rename to
  `--print-docstring` (keeping `--print-docstring-html` for HTML output)
- When `--print-docstring` is set, print `elt.doc_markdown`

### 5. Storage scope

Marshal format only for now — it serializes `Entry.t` directly so adding a field
Just Works. JS and ancient formats can be extended later.

## What stays the same

- Suffix tree / type polarity indices — unchanged
- Search algorithm — unchanged
- `doc_html` field — kept for browser/HTML consumers
- Text tokenization for search indexing — still uses `Text.of_doc`

## LLM markdown improvements (cherry-picked from jonludlam/odoc)

These commits improve markdown quality for LLM consumption:

1. **Fully qualified paths** — `Foo.Bar.t` instead of just `t`
2. **Unified code blocks** — Type definitions rendered as single code blocks
   with inline doc comments instead of fragmented separate blocks
3. **Record field types** — Proper extraction of all inline elements

## Dependencies

The `index` library needs `odoc_markdown` added to its dune dependencies.
The `db` library has no new dependencies.
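
## Illustrative sketch

The dual-storage idea from changes 1 and 3 can be sketched in a few lines of standalone OCaml. This is only an illustration under stated assumptions, not the actual sherlodoc code: the `entry` record is a cut-down stand-in for `Entry.t`, the signatures of `v`, `structural_compare` and `register_entry` are hypothetical, and the two renderers are passed in as plain functions (standing in for `html_string_of_doc` and `markdown_string_of_doc`) so that the example does not depend on odoc. It also assumes that `doc_html` is guarded by the same empty-text check as `doc_markdown`.

```ocaml
(* Cut-down stand-in for sherlodoc's [Entry.t]. The real record also has
   [rhs], [url], [kind], [cost] and [pkg]; they are omitted here. *)
type entry =
  { name : string
  ; doc_html : string
  ; doc_markdown : string (* the new field proposed by the design *)
  }

(* Hypothetical constructor in the spirit of [Entry.v]. *)
let v ~name ~doc_html ~doc_markdown () = { name; doc_html; doc_markdown }

(* Hypothetical comparison: order by name, then by each doc field, so that
   two entries differing only in markdown are not considered equal. *)
let structural_compare a b =
  match String.compare a.name b.name with
  | 0 ->
    (match String.compare a.doc_html b.doc_html with
     | 0 -> String.compare a.doc_markdown b.doc_markdown
     | c -> c)
  | c -> c

(* Build an entry at index time. Both renderings are computed once, here,
   and skipped entirely when the plain-text form of the doc is empty. *)
let register_entry ~render_html ~render_markdown ~name ~doc_txt doc =
  let render f = if String.equal doc_txt "" then "" else f doc in
  v ~name
    ~doc_html:(render render_html)
    ~doc_markdown:(render render_markdown)
    ()

(* Usage with toy renderers over a string "doc comment". *)
let () =
  let e =
    register_entry
      ~render_html:(fun d -> "<p>" ^ d ^ "</p>")
      ~render_markdown:(fun d -> d)
      ~name:"List.map"
      ~doc_txt:"maps f over a list"
      "maps `f` over a list"
  in
  print_endline e.doc_markdown
```

Because both strings are stored in the record, a consumer such as the CLI only has to pick a field at print time; no conversion happens at query time.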