# Sherlodoc Markdown Storage

## Problem

Sherlodoc stores documentation as pre-rendered HTML (the `doc_html` field in
`Entry.t`). HTML is poorly suited to LLM/AI consumption and to terminal (CLI)
display, where markdown is the natural format.

## Solution

Add a `doc_markdown : string` field to `Entry.t`, populated at index time using
odoc's markdown2 renderer (with LLM-specific improvements: fully qualified paths
and unified code blocks for type definitions).

## Approach: Dual storage (Approach 1)

Store both `doc_html` and `doc_markdown` in each entry and compute both at index
time, so no runtime conversion is needed. The database size increase should be
modest, since doc comments are typically small.

## Changes

### 1. `db/entry.ml`: Add `doc_markdown` field

```ocaml
type t =
  { name : string
  ; rhs : string option
  ; url : string
  ; kind : Kind.t
  ; cost : int
  ; doc_html : string
  ; doc_markdown : string (* NEW *)
  ; pkg : Package.t
  }
```

Update `v`, `pp`, and `structural_compare` accordingly.
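
A minimal sketch of what those three updates might look like. To keep it self-contained, `Kind.t` and `Package.t` are replaced by plain strings, and the labelled-argument shape of `v` is an assumption rather than sherlodoc's actual signature.

```ocaml
(* Hypothetical, self-contained stand-in for db/entry.ml: Kind.t and
   Package.t are replaced by strings so the sketch compiles on its own. *)
type t =
  { name : string
  ; rhs : string option
  ; url : string
  ; kind : string
  ; cost : int
  ; doc_html : string
  ; doc_markdown : string (* NEW *)
  ; pkg : string
  }

(* Assumed constructor shape: the new field is one more labelled argument. *)
let v ~name ~kind ~cost ~rhs ~doc_html ~doc_markdown ~url ~pkg () =
  { name; rhs; url; kind; cost; doc_html; doc_markdown; pkg }

(* Debug printer that now includes the markdown docstring. *)
let pp fmt e =
  Format.fprintf fmt "{ name = %S; kind = %s; doc_markdown = %S }" e.name
    e.kind e.doc_markdown

(* Field-by-field comparison: the first non-zero result wins, and
   doc_markdown takes part like every other field. *)
let structural_compare a b =
  let rec first = function
    | [] -> 0
    | cmp :: rest ->
      let c = cmp () in
      if c <> 0 then c else first rest
  in
  first
    [ (fun () -> String.compare a.name b.name)
    ; (fun () -> Option.compare String.compare a.rhs b.rhs)
    ; (fun () -> String.compare a.url b.url)
    ; (fun () -> String.compare a.kind b.kind)
    ; (fun () -> Int.compare a.cost b.cost)
    ; (fun () -> String.compare a.doc_html b.doc_html)
    ; (fun () -> String.compare a.doc_markdown b.doc_markdown)
    ; (fun () -> String.compare a.pkg b.pkg)
    ]
```

With this ordering, two entries that differ only in `doc_markdown` compare as unequal.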

### 2. `odoc/src/search/`: Add markdown rendering for doc comments

Add a `markdown_string_of_doc` function that mirrors `html_string_of_doc`:

```
Comment.elements
  -> Odoc_document.Comment.to_ir
  -> Odoc_markdown.Generator.block
  -> Renderer.to_string (Renderer.Block.Blocks blocks)
  -> string
```

The search module's dune file gains a dependency on `odoc_markdown`.
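
The shape of that pipeline (structured comment elements become blocks, and blocks are joined into one markdown string) can be illustrated with a toy version. The `inline`, `element`, and `markdown_of_elements` names below are hypothetical and are not odoc's `Comment.elements` or `Odoc_markdown` API.

```ocaml
(* Hypothetical toy, not odoc's API: a tiny comment AST rendered to
   markdown in two steps, echoing elements -> blocks -> string. *)
type inline = Text of string | Code of string
type element = Paragraph of inline list | Code_block of string

(* A triple-backtick code fence. *)
let fence = String.make 3 '`'

let render_inline = function
  | Text s -> s (* escaping of markdown metacharacters is omitted here *)
  | Code s -> "`" ^ s ^ "`"

(* Step 1: each comment element becomes one markdown block. *)
let block_of_element = function
  | Paragraph inlines -> String.concat "" (List.map render_inline inlines)
  | Code_block src -> fence ^ "ocaml\n" ^ src ^ "\n" ^ fence

(* Step 2: blocks are separated by a blank line, as markdown expects. *)
let markdown_of_elements (doc : element list) : string =
  String.concat "\n\n" (List.map block_of_element doc)
```

For example, a paragraph `[Text "Applies "; Code "f"; Text " to each element."]` followed by a code block renders as one paragraph with an inline code span, a blank line, and a fenced `ocaml` block.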

### 3. `index/load_doc.ml`: Populate `doc_markdown` at index time

In `register_entry`, compute `doc_markdown` alongside `doc_html`:

```ocaml
(* Skip rendering when the plain-text doc (doc_txt) is empty. *)
let doc_markdown =
  match doc_txt with
  | "" -> ""
  | _ -> markdown_string_of_doc doc
in
```

The index dune file gains a dependency on `odoc_markdown` (via the search module).
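
A minimal sketch of how both renderings can share the empty-doc guard at index time. The renderer parameters (`text_of_doc`, `html_of_doc`, `markdown_of_doc`) are hypothetical placeholders for `Text.of_doc`, `html_string_of_doc`, and `markdown_string_of_doc`, passed in so the example runs without odoc.

```ocaml
(* Hypothetical sketch of the index-time step. The three renderers are
   parameters standing in for Text.of_doc, html_string_of_doc and
   markdown_string_of_doc. *)
let render_docs ~text_of_doc ~html_of_doc ~markdown_of_doc doc =
  let doc_txt = text_of_doc doc in
  (* One guard for both formats: undocumented items store "" and skip
     rendering entirely. *)
  let render to_string = match doc_txt with "" -> "" | _ -> to_string doc in
  let doc_html = render html_of_doc in
  let doc_markdown = render markdown_of_doc in
  (doc_txt, doc_html, doc_markdown)
```

Sharing one guard keeps the two stored formats consistent: an entry either has both renderings or neither.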

### 4. `cli/search.ml`: Default to markdown in CLI output

- Add a `--print-docstring` flag that prints `elt.doc_markdown`, so the plain
  flag yields markdown.
- Keep the existing `--print-docstring-html` flag, which prints `elt.doc_html`,
  for callers that still want HTML.
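
One way the CLI could pick which stored field to print, sketched with plain booleans in place of the real command-line parsing. The `entry` record and `docstring_to_print` are hypothetical, and letting `--print-docstring-html` win when both flags are given is an assumed policy that the design does not specify.

```ocaml
(* Hypothetical sketch: plain booleans stand in for the parsed
   --print-docstring and --print-docstring-html flags. *)
type entry = { name : string; doc_html : string; doc_markdown : string }

(* Assumed policy: the HTML flag wins if both flags are given. *)
let docstring_to_print ~print_docstring ~print_docstring_html elt =
  if print_docstring_html then Some elt.doc_html
  else if print_docstring then Some elt.doc_markdown
  else None

let print_result ~print_docstring ~print_docstring_html elt =
  print_endline elt.name;
  match docstring_to_print ~print_docstring ~print_docstring_html elt with
  | None | Some "" -> () (* nothing requested, or no doc comment *)
  | Some doc -> print_endline doc
```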

### 5. Storage scope

Only the Marshal format is covered for now. It serializes `Entry.t` directly, so
adding a field just works. Because Marshal stores no type information,
databases built before this change must be regenerated. The JS and ancient
formats can be extended later.
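
A short round trip shows why the Marshal format needs no serializer changes. The `entry` record and the `save_db`/`load_db` helpers are simplified, hypothetical stand-ins for sherlodoc's database; `Marshal.to_string` and `Marshal.from_string` are the OCaml standard library functions.

```ocaml
(* Simplified, hypothetical stand-in for the stored entries. *)
type entry = { name : string; doc_html : string; doc_markdown : string }

(* Marshal copies the in-memory representation of any value, so a record
   with an extra field is written out with no extra serializer code. *)
let save_db (entries : entry array) : string = Marshal.to_string entries []

(* The result type is trusted, not checked: reading bytes written for a
   different record layout is unsafe. *)
let load_db (bytes : string) : entry array = Marshal.from_string bytes 0
```

The type annotation on `load_db` is the only thing tying the bytes to `entry`, which is why the reader and writer must agree on the record layout.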

## What stays the same

- Suffix tree / type polarity indices: unchanged
- Search algorithm: unchanged
- `doc_html` field: kept for browser/HTML consumers
- Text tokenization for search indexing: still uses `Text.of_doc`

## LLM markdown improvements (cherry-picked from jonludlam/odoc)

These commits improve markdown quality for LLM consumption:

1. **Fully qualified paths**: `Foo.Bar.t` instead of just `t`
2. **Unified code blocks**: type definitions are rendered as single code blocks
   with inline doc comments instead of fragmented separate blocks
3. **Record field types**: proper extraction of all inline elements
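
A toy illustration of the first two improvements: type references rendered as fully qualified paths, and a record type emitted as one fenced code block with each field's doc comment inline. The `field` type and the rendering functions are hypothetical and are not the code from the cherry-picked commits.

```ocaml
(* Hypothetical toy, not odoc's code. A field's type is kept as its full
   module path, e.g. ["Foo"; "User"; "t"]. *)
type field = { fname : string; ftype_path : string list; fdoc : string option }

(* Fully qualified path: keep every module component, not only the last. *)
let qualified path = String.concat "." path

(* One line per field, with its doc comment inline when present. *)
let field_line f =
  let doc = match f.fdoc with Some d -> "  (** " ^ d ^ " *)" | None -> "" in
  "  " ^ f.fname ^ " : " ^ qualified f.ftype_path ^ ";" ^ doc

(* Unified block: the whole record type goes into a single code fence. *)
let unified_type_block name fields =
  let fence = String.make 3 '`' in
  String.concat "\n"
    ([ fence ^ "ocaml"; "type " ^ name ^ " = {" ]
     @ List.map field_line fields
     @ [ "}"; fence ])
```

For a record with an `id : int` field documented as "Unique key." and an undocumented `owner` field of type `Foo.User.t`, this emits a single `ocaml` block containing a valid OCaml record definition with the doc comment beside `id`.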

## Dependencies

The `index` library needs `odoc_markdown` added to its dune dependencies.
The `db` library has no new dependencies.