···1818opam repositories within a profile, keyed by commit SHAs.
19192020Sits near the top of the dependency hierarchy, depending on
2121-{!day11-opam-build}, {!page-solution}, {!day11-lib}, {!day11-layer},
2222-and {!day11-exec}.
2121+{!page-opam_build}, {!page-solution}, {!page-lib}, {!page-layer},
2222+and {!page-exec}.
23232424{1 Modules}
2525
+2-2
day11/doc-pages/jtw.mld
···1313index generation, dynamic_cmis.json creation, and container script
1414generation.
15151616-Depends on {!day11-opam-build}, {!day11-container}, {!day11-doc},
1717-{!day11-layer}, and {!day11-solver}.
1616+Depends on {!page-opam_build}, {!page-container}, {!page-doc},
1717+{!page-layer}, and {!page-solver}.
18181919{1 Modules}
2020
+1-1
day11/doc-pages/layer.mld
···1919eviction. {!Day11_layer.Symlinks} maintains per-identifier tracking
2020symlinks for layer discovery.
21212222-Depends on {!day11-exec} for subprocess and sudo access. Has no opam
2222+Depends on {!page-exec} for subprocess and sudo access. Has no opam
2323or package-domain knowledge — domain-specific metadata lives in
2424sidecar files owned by higher libraries.
2525
+2-2
day11/doc-pages/opam_build.mld
···1616for investigating failed builds. {!Day11_opam_build.Patches} manages per-package patch
1717files that modify builds before execution.
18181919-Depends on {!day11-runner}, {!day11-opam-layer}, {!day11-solver} (via
2020-solver_pool), {!day11-opam}, {!page-solution}, and the lower layer and
1919+Depends on {!page-runner}, {!page-opam_layer}, {!page-solver} (via
2020+solver_pool), {!page-opam}, {!page-solution}, and the lower layer and
2121container libraries.
22222323{1 Modules}
+3-3
day11/doc-pages/opam_layer.mld
···11{0 day11-opam-layer}
2233Opam-flavoured layer types that give domain meaning to the generic
44-{!day11-layer} storage. {!Day11_opam_layer.Build} is the recursive DAG node type: each
44+{!page-layer} storage. {!Day11_opam_layer.Build} is the recursive DAG node type: each
55node carries a content-addressed hash, a package, direct dependency
66nodes, and a universe identifier. {!Day11_opam_layer.Tool} aggregates multiple build
77nodes into a single tool layer (e.g. odoc + deps). {!Day11_opam_layer.Build_meta}
···1414synthetic opam switch-state files so the container sees stacked deps
1515as installed.
16161717-Depends on {!day11-layer}, {!day11-exec}, and {!page-solution}. This
1818-library defines the types that {!day11-opam-build} and {!day11-doc}
1717+Depends on {!page-layer}, {!page-exec}, and {!page-solution}. This
1818+library defines the types that {!page-opam_build} and {!page-doc}
1919operate on.
20202121{1 Modules}
+2-2
day11/doc-pages/runner.mld
···1111variables, bind mounts, commands — are injected by the caller through
1212the [prep_upper] callback and the {!Day11_container.Oci_spec.t}.
13131414-Depends on {!day11-layer} for storage and stacking, {!day11-container}
1515-for overlayfs mounts and runc execution, and {!day11-exec} for
1414+Depends on {!page-layer} for storage and stacking, {!page-container}
1515+for overlayfs mounts and runc execution, and {!page-exec} for
1616subprocess primitives.
17171818{1 Modules}
+1-1
day11/doc-pages/solver.mld
···1111invalidation. {!Day11_solver.Dot_solution} renders dependency graphs as Graphviz
1212DOT files for debugging.
13131414-Depends on {!page-solution} for solution types and {!day11-opam} for the
1414+Depends on {!page-solution} for solution types and {!page-opam} for the
1515git-backed package index.
16161717{1 Modules}
+1-1
day11/doc/universe.mli
···2323(** [write_package_refs ~pkg_html_dir ~universe_hashes] writes
2424 [universes.json] into the package's HTML directory listing which
2525 universes it references. Moves atomically with the package docs
2626- during {!Atomic_publish}. *)
2626+ during publication (see {!Day11_exec.Atomic_swap}). *)
27272828val collect_referenced :
2929 html_dir:Fpath.t -> string list
+1-1
day11/opam_build/build_layer.mli
···7474 sidecars ([build.json] for opam package builds, [doc.json] for
7575 odoc layers). NOT called on cache hits.
76767777- Default strategy is {!opam_build_strategy}. *)
7777+ The default strategy runs [opam-build] on the package. *)
+2-1
day11/opam_layer/build_meta.mli
···33 Lives next to {!Day11_layer.Meta} as [build.json] in the
44 layer directory. The presence of this file marks a layer as the
55 output of an opam package build (as opposed to e.g. a doc layer,
66- see {!Doc_meta}, or a future layer kind that doesn't yet exist).
66+ see {!Day11_doc.Doc_meta}, or a future layer kind that doesn't
77+ yet exist).
7889 The opam-specific information is kept here so that
910 {!Day11_layer.Meta} can stay generic and reusable across
+191
docs/plans/2026-04-15-native-figures.md
···11+# Native `@figure` for .mld pages
22+33+**Status:** Planned
44+**Date:** 2026-04-15
55+66+## Problem
77+88+Blog posts currently embed figures as raw HTML:
99+1010+```
1111+{%html:
1212+<figure>
1313+ <a href="…"><img src="parseff.png" alt="…"></a>
1414+ <figcaption><em>A screenshot…</em></figcaption>
1515+</figure>
1616+%}
1717+```
1818+1919+See e.g. `site/blog/2026/04/weeknotes-2026-15.mld:66-71` and several
2020+instances in
2121+`site/blog/2026/04/odoc_and_ocaml_notebooks.mld`.
2222+2323+This is verbose, HTML-only, and loses odoc's semantic layer (captions
2424+can't contain references, no consistent CSS hook, no alt-text
2525+discipline).
2626+2727+Odoc's native `{image:…}` syntax (`odoc/src/parser/token.ml:42,45`)
2828+renders bare `<img>` — no caption, no link wrapping.
2929+3030+## Goal
3131+3232+A block-level tag that produces a `<figure>` with caption, linkable
3333+image, and sensible defaults:
3434+3535+```
3636+@figure parseff.png "A screenshot of the parseff site"
3737+https://jon.ludl.am/experiments/parseff
3838+```
3939+4040+or a multi-line form where the body is the caption (so it can be
4141+formatted):
4242+4343+```
4444+@figure parseff.png
4545+Produced by the {{:…}parseff plugin}. Click for the full site.
4646+```
4747+4848+## Ground truth
4949+5050+- Block-tag extensions receive `Comment.nestable_block_element list`
5151+ as the body (`odoc/src/extension_api/odoc_extension_api.ml:121-148`)
5252+ — so the caption **can** contain formatted inlines (bold, italic,
5353+ links, references). This is the key advantage over a code-block
5454+ extension, which gets a raw string
5555+ (`odoc_extension_api.ml:174-185`).
5656+- The existing admonition extension
5757+ (`odoc-admonition-extension/src/admonition_extension.ml:125-146`)
5858+ is the closest pattern: tag + formatted block body → rendered HTML
5959+ with custom class.
6060+6161+## Design
6262+6363+### Future V3 (deferred — depends on custom inlines task)
6464+6565+Walk the first paragraph's inline AST instead of parsing strings. Use
6666+odoc's native `{image:url}` / `{{image:url}alt}` media element as the
6767+image, render everything else in the paragraph as a rich caption
6868+(preserving `{i}`, `{b}`, `{{:url}…}`, `{!refs}`):
6969+7070+```
7171+@figure link=https://…/parseff {{image:parseff.png} A screenshot of the parseff site} produced by the {i plugin}.
7272+```
7373+7474+→ `<figure><a><img></a><figcaption>produced by the <em>plugin</em>.</figcaption></figure>`
7575+7676+Because `non_link_inline_element` excludes `Media`, native odoc can't
7777+nest an image inside a link, so the wrapping URL still needs to arrive
7878+as an attribute prefix. Once the custom-inline extension point lands
7979+(`docs/plans/2026-04-15-odoc-custom-inlines.md`), a dedicated
8080+`{linked-image url src alt}` custom inline removes the attribute
8181+prefix entirely:
8282+8383+```
8484+@figure {linked-image https://…/parseff parseff.png "A screenshot"} produced by the {i plugin}.
8585+```
8686+8787+V3 is deferred until the custom-inline work is done; V1 stays in the
8888+tree as the working implementation in the meantime.
8989+9090+### Syntax (v1 — shipped)
9191+9292+```
9393+@figure <src> alt="…" [link="…"] [caption="…"] [class="…"]
9494+```
9595+9696+Everything after `@figure` is an attribute string. Bare first token is
9797+promoted to `src`. The `alt` attribute doubles as the figcaption unless
9898+a separate `caption="…"` attribute is given. Plain text only — no
9999+inline formatting in captions.
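
A minimal sketch of how such an attribute string could be parsed, assuming values are double-quoted with no escape sequences and that a bare leading token is the `src`. `parse_attrs` is a hypothetical helper name, not taken from the plugin source:

```ocaml
(* Hypothetical helper: parse [<src> key="value" ...] into an association
   list. Assumes double-quoted values with no escape sequences. *)
let parse_attrs (s : string) : (string * string) list =
  let n = String.length s in
  let is_space c = c = ' ' || c = '\t' || c = '\n' in
  let rec skip i = if i < n && is_space s.[i] then skip (i + 1) else i in
  (* Read up to the next whitespace or '='; return the token and next index. *)
  let token i =
    let rec go j =
      if j < n && (not (is_space s.[j])) && s.[j] <> '=' then go (j + 1) else j
    in
    let j = go i in
    (String.sub s i (j - i), j)
  in
  let rec loop i acc =
    let i = skip i in
    if i >= n then List.rev acc
    else
      let key, j = token i in
      if key = "" then List.rev acc (* malformed input, e.g. a stray '=' *)
      else if j + 1 < n && s.[j] = '=' && s.[j + 1] = '"' then (
        match String.index_from_opt s (j + 2) '"' with
        | Some k ->
            loop (k + 1) ((key, String.sub s (j + 2) (k - j - 2)) :: acc)
        | None -> List.rev acc (* unterminated quote: stop parsing *))
      else if acc = [] then loop j (("src", key) :: acc) (* bare first token *)
      else loop j acc (* ignore any other bare token *)
  in
  loop 0 []
```

Returning an association list keeps unknown keys available to the caller, which can then decide whether to warn about them.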
100100+101101+**Why not a rich-text caption block?** Odoc's parser collapses
102102+newlines to spaces inside a paragraph, and a blank line terminates the
103103+`@tag` body. That leaves no way to distinguish "attrs" from "caption"
104104+within the same paragraph, and no way to attach a second paragraph to
105105+the tag. A v2 using a code-block extension (`{@figure[ rich caption
106106+with {{:url}links} ]}`) would work but defers richer captions. For
107107+now, plain-text alt-as-caption covers the existing blog use cases.
108108+109109+### Rendering
110110+111111+Output:
112112+113113+```html
114114+<figure class="figure {class}">
115115+ <a href="{link}"><img src="{src}" alt="{alt}"></a> <!-- link optional -->
116116+ <figcaption>{caption}</figcaption>
117117+</figure>
118118+```
119119+120120+If no `link`, emit bare `<img>`. If no `alt`, warn (accessibility).
121121+If no caption, emit `<figure>` without `<figcaption>`.
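
The rules above can be sketched as a plain string renderer. This is an illustration only: the `figure` record and `render_figure` are hypothetical names, the output is compact rather than indented, and the missing-`alt` warning is left out. It follows the v1 rule that `alt` doubles as the caption when no `caption` is given:

```ocaml
(* Hypothetical record; field names mirror the attributes in the plan. *)
type figure = {
  src : string;
  alt : string option;
  link : string option;
  caption : string option;
  cls : string option;
}

(* Escape characters that are unsafe in HTML text and attribute values. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '"' -> Buffer.add_string b "&quot;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

let render_figure f =
  let img =
    Printf.sprintf "<img src=\"%s\" alt=\"%s\">" (escape f.src)
      (escape (Option.value f.alt ~default:""))
  in
  (* Wrap the image in a link only when one was supplied. *)
  let img =
    match f.link with
    | Some href -> Printf.sprintf "<a href=\"%s\">%s</a>" (escape href) img
    | None -> img
  in
  (* alt doubles as the caption unless an explicit caption is given. *)
  let caption = match f.caption with Some _ as c -> c | None -> f.alt in
  let figcaption =
    match caption with
    | Some c -> Printf.sprintf "<figcaption>%s</figcaption>" (escape c)
    | None -> ""
  in
  let cls =
    match f.cls with Some c -> "figure " ^ escape c | None -> "figure"
  in
  Printf.sprintf "<figure class=\"%s\">%s%s</figure>" cls img figcaption
```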
122122+123123+The caption is rendered by feeding the body blocks back through
124124+`Odoc_document` standard inline/block rendering — no need to
125125+reimplement formatting. Admonitions already do this.
126126+127127+### Asset resolution
128128+129129+The `src` is relative to the `.mld` file. Odoc's existing asset
130130+handling (`{image:…}`) resolves paths against the page's location;
131131+reuse that logic by parsing the src through the same helper rather
132132+than inlining it as raw HTML. Check what admonition does with
133133+references to see if this is straightforward.
134134+135135+### CSS
136136+137137+One block in `odoc_jons_plugins_css.ml`:
138138+139139+```css
140140+figure.figure { margin: 1.5em 0; text-align: center; }
141141+figure.figure img { max-width: 100%; height: auto; }
142142+figure.figure figcaption { font-style: italic; color: #666; }
143143+```
144144+145145+## Sequence
146146+147147+1. Add `Figure` module in
148148+ `odoc-jons-plugins/src/odoc_jons_plugins.ml` using the block-tag
149149+ extension pattern.
150150+2. Write an attribute parser for `key="value"` pairs on the tag's
151151+ first line.
152152+3. Render via the block-tag `to_document` → construct
153153+ `Odoc_document.Types` nodes (or emit raw `<figure>` wrapping
154154+ around already-rendered inner content — see admonition for which
155155+ idiom fits).
156156+4. Resolve `src` through odoc's asset path logic so the generated
157157+ `<img src>` matches what `{image:…}` would produce.
158158+5. CSS.
159159+6. Convert the April 2026 posts from `{%html: <figure>…%}` to
160160+ `@figure`.
161161+7. Warn (don't fail) when `alt` is missing.
162162+163163+## Effort
164164+165165+Small–medium. Most of the machinery already exists in
166166+`admonition_extension.ml`. Biggest risk is getting asset path
167167+resolution right without duplicating odoc internals. ~100–150 LOC.
168168+169169+## Gotchas
170170+171171+- **Attribute parsing in .mld.** Odoc's parser treats the first line
172172+ after `@figure` as prose. The extension body is whatever lands
173173+ between this tag and the next block-level sibling. Two options:
174174+ (a) parse attrs from the first paragraph's raw text, (b) require
175175+ attrs on separate marker lines like `@figure src=…`. (a) is
176176+ friendlier. Prototype and see what the received AST looks like.
177177+- **Link-wrapped images.** If the image links to itself at full
178178+ size (common pattern), default `link` to `src` when absent? Or
179179+ leave explicit. Prefer explicit — less magic.
180180+- **Multiple images per figure.** Out of scope for v1.
181181+- **Non-HTML backends.** Same concern as inline extensions. For now,
182182+ HTML-only is fine; warn on other backends.
183183+184184+## Not doing
185185+186186+- **Full Pandoc-style figure syntax** (`![caption](src)`). Keep to
187187+ odoc's `@tag` style.
188188+- **Automatic width/height detection** from image dimensions. Let
189189+ the browser handle it.
190190+- **Gallery/lightbox JS.** Out of scope; can be added later as a
191191+ separate plugin that enhances `.figure` elements.
+193
docs/plans/2026-04-15-odoc-custom-inlines.md
···11+# Custom Inline Extensions for Odoc
22+33+**Status:** Shipped (2026-04-15) — simplified implementation
44+**Date:** 2026-04-15
55+66+## Shipped
77+88+The implementation uses a lighter approach than originally sketched:
99+instead of a new AST variant, inline extensions ride on the existing
1010+`Raw_markup` variant with a synthetic target prefix (`odoc-ext:`).
1111+1212+- Lexer rule: `{&name payload}` emits
1313+ `Raw_markup (Some ("odoc-ext:" ^ name), payload)` — no new token
1414+ type. One rule added in `odoc/src/parser/lexer.mll`.
1515+- HTML generator: `raw_markup` in `odoc/src/html/generator.ml`
1616+ detects the prefix and dispatches to the inline-handler registry.
1717+ Falls back to emitting payload raw if no handler is registered.
1818+- Registry: `register_inline_handler` / `find_inline_handler` +
1919+ `inline_extension_target_prefix` constant in
2020+ `odoc/src/extension_registry/odoc_extension_registry.ml`.
2121+- Plugin API: `module type Inline_Extension` (prefix + `to_html :
2222+ string -> string`) and `Registry.register_inline` in
2323+ `odoc/src/extension_api/odoc_extension_api.ml`.
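
The dispatch described in this list can be modelled in a few lines. This is a standalone sketch, not the patched odoc code: the names `register_inline_handler`, `find_inline_handler` and `inline_extension_target_prefix` come from the list above, but their signatures here are assumptions, and `render_raw_markup` is a hypothetical stand-in for the generator's `raw_markup` case:

```ocaml
(* Standalone model of the dispatch; signatures are assumptions. *)
let inline_extension_target_prefix = "odoc-ext:"

let handlers : (string, string -> string) Hashtbl.t = Hashtbl.create 8
let register_inline_handler name to_html = Hashtbl.replace handlers name to_html
let find_inline_handler name = Hashtbl.find_opt handlers name

(* [target] models the optional target carried by Raw_markup and
   [payload] its raw text. Non-extension targets are simplified here
   to "emit the payload unchanged". *)
let render_raw_markup (target : string option) (payload : string) : string =
  let p = inline_extension_target_prefix in
  let plen = String.length p in
  match target with
  | Some t when String.length t >= plen && String.sub t 0 plen = p -> (
      let name = String.sub t plen (String.length t - plen) in
      match find_inline_handler name with
      | Some to_html -> to_html payload
      | None -> payload (* no handler registered: emit the payload raw *))
  | _ -> payload

(* Example registration, mirroring the {&kbd ...} smoke test. *)
let () = register_inline_handler "kbd" (fun s -> "<kbd>" ^ s ^ "</kbd>")
```

With this model, `{&kbd Ctrl-K}` corresponds to `render_raw_markup (Some "odoc-ext:kbd") "Ctrl-K"`.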
2424+2525+Total: ~180 LOC across 5 files, no AST changes, no document-IR
2626+changes, no backend pattern-match audit. The Raw_markup target is
2727+lightly punned but fully reversible if we ever want a clean
2828+`` `Extension `` AST variant.
2929+3030+Smoke-test plugins in `odoc-jons-plugins`:
3131+3232+- `{&kbd Ctrl-K}` → `<kbd>Ctrl-K</kbd>`
3333+- `{&margin an aside about X}` → `<span class="margin-note">…</span>`
3434+3535+Both verified end-to-end after installing the patched odoc and
3636+reinstalling the plugin.
3737+3838+## Not shipped
3939+4040+The originally-sketched new AST variant, separate token type, and
4141+document-IR change are all deferred. They buy cleaner semantics (e.g.
4242+for non-HTML backends that might want to dispatch on the variant
4343+directly) but are not needed for the inline extensions to work. The
4444+sketch below remains as documentation of that larger design.
4545+4646+---
4747+4848+## Original sketch (deferred)
4949+5050+## Problem
5151+5252+Odoc supports custom block-level tags (`@custom …`) and custom code
5353+blocks (`{@name[ … ]}`) via a plugin registry. There is no equivalent
5454+at the inline level. Authors who want marginal annotations, citations,
5555+keyboard-key styling, or other small inline decorations must fall back
5656+to `{%html: … %}` raw markup, which is verbose, HTML-only, and
5757+bypasses odoc's semantic layer.
5858+5959+## Goal
6060+6161+Let plugins register inline-level handlers, so `.mld` authors can
6262+write something like `{%margin:this is a side note}` and have a plugin
6363+render it to arbitrary inline HTML.
6464+6565+## Core constraint
6666+6767+`inline_element` in `odoc/src/model/comment.ml:35-39` is a **closed
6868+polymorphic variant**. The lexer (`odoc/src/parser/lexer.mll`) has a
6969+hard-coded set of brace commands. There is no extension point at the
7070+inline level today — block-level `@custom` works because the lexer
7171+treats any `@name` as a generic `Custom` token dispatched through the
7272+extension registry at
7373+`odoc/src/extension_api/odoc_extension_api.ml:121-148`.
7474+7575+Adding inline extensibility therefore requires an odoc patch. We can
7676+follow the shape of the block-level registry.
7777+7878+## Design
7979+8080+### 1. AST — `odoc/src/model/comment.ml`
8181+8282+Add an extension variant to `inline_element`:
8383+8484+```ocaml
8585+| `Extension of string * string (* name, raw payload *)
8686+```
8787+8888+Add the mirror variant to the document IR at
8989+`odoc/src/document/types.ml` so it can survive through to the
9090+renderer.
9191+9292+### 2. Syntax — `{%name:payload}`
9393+9494+Chosen because:
9595+9696+- `{%` is already used only for `{%html: … %}` (raw markup, closed by
9797+ `%}`). The new form uses `}` to close, so the lexer can disambiguate
9898+ by lookahead: `{%html:` → raw markup path; `{%name:` where `name ≠
9999+ html` → inline extension.
100100+- Does not collide with `{!ref}`, `{{:url} text}`, or any existing
101101+ brace command.
102102+- The `%` cues "injected content", matching the raw-markup
103103+ convention.
104104+105105+Alternative considered: `{:name payload}`. Mirrors `@name` block
106106+syntax nicely but visually close to the `{{:url}}` link form. Sticking
107107+with `{%…}` unless feedback says otherwise.
108108+109109+### 3. Lexer + parser
110110+111111+- `odoc/src/parser/token.ml` — new token `Inline_extension of string *
112112+ string`.
113113+- `odoc/src/parser/lexer.mll` — recognise `{%name:content}` where
114114+ `name` is `[a-z][a-z0-9_.-]*` and `name ≠ html`. Warn and recover on
115115+ unknown-looking forms.
116116+- `odoc/src/parser/syntax.ml` — consume the token inside
117117+ `inline_element`, produce `` `Extension (name, payload) ``.
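
For this deferred `{%name:payload}` design, the lookahead rule can be sketched on plain strings. This is an illustration rather than an ocamllex rule: `brace_form` and `classify` are hypothetical names, and the sketch does not handle `}` inside the payload:

```ocaml
type brace_form =
  | Raw_markup (* the existing {%html: ... %} path *)
  | Inline_extension of string * string (* name, payload *)
  | Unrecognised

let is_name_start c = c >= 'a' && c <= 'z'

let is_name_char c =
  is_name_start c || (c >= '0' && c <= '9') || c = '_' || c = '.' || c = '-'

(* Names follow [a-z][a-z0-9_.-]* as stated in the plan. *)
let valid_name s =
  let ok = ref (s <> "" && is_name_start s.[0]) in
  String.iter (fun c -> if not (is_name_char c) then ok := false) s;
  !ok

let classify (s : string) : brace_form =
  let n = String.length s in
  if n < 3 || s.[0] <> '{' || s.[1] <> '%' then Unrecognised
  else
    match String.index_from_opt s 2 ':' with
    | None -> Unrecognised
    | Some colon -> (
        let name = String.sub s 2 (colon - 2) in
        if name = "html" then Raw_markup
        else if not (valid_name name) then Unrecognised
        else
          match String.index_from_opt s (colon + 1) '}' with
          | None -> Unrecognised
          | Some close ->
              Inline_extension
                (name, String.sub s (colon + 1) (close - colon - 1)))
```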
118118+119119+### 4. Document phase
120120+121121+- `odoc/src/document/comment.ml` — in the inline dispatcher, map AST
122122+ `Extension` to the document IR's `Extension` variant.
123123+124124+### 5. Plugin API — `odoc/src/extension_api/odoc_extension_api.ml`
125125+126126+Add a module type alongside the block-level `Extension`:
127127+128128+```ocaml
129129+module type Inline_Extension = sig
130130+ val prefix : string
131131+ val to_inline : string -> Inline.t
132132+end
133133+```
134134+135135+Add `Registry.register_inline` that stores handlers in a parallel
136136+`Hashtbl` keyed by prefix.
137137+138138+Plugins return `Inline.t` directly (the document IR), so they can emit
139139+styled text, links, raw HTML, or any mix — same flexibility the
140140+block-level API gives them.
141141+142142+### 6. HTML rendering
143143+144144+- `odoc/src/html/generator.ml:inline` — when the IR has
145145+ `Extension (name, payload)`, look up the handler in the registry and
146146+ splice its `Inline.t` result. Fall back to rendering the payload as
147147+ plain text if no handler is registered (with a warning), so sites
148148+ don't break when a plugin is missing.
149149+150150+### 7. Smoke test — `odoc-jons-plugins`
151151+152152+Register one small inline plugin (e.g. `{%margin:…}`) rendering to a
153153+`<span class="margin-note">`, plus CSS in
154154+`odoc_jons_plugins_css.ml`. Use it in a blog post.
155155+156156+## Sequence
157157+158158+1. AST variant in `model/comment.ml` + `document/types.ml`.
159159+2. Token + lexer rule.
160160+3. Parser wiring in `syntax.ml`.
161161+4. Document-phase mapping in `document/comment.ml`.
162162+5. Plugin API module type + `register_inline`.
163163+6. HTML dispatcher + fallback.
164164+7. Tests: parse, error recovery on unknown/malformed forms, roundtrip,
165165+ handler lookup.
166166+8. Ship `{%margin:…}` via `odoc-jons-plugins`.
167167+168168+## Effort
169169+170170+~150–200 LOC across 7 files. Roughly 2× the sherlodoc patch. The
171171+parser changes are the only genuinely new territory; everything else
172172+mirrors the block-level code path.
173173+174174+## Gotchas
175175+176176+- **Payload escaping.** The payload is "everything until `}`". Need a
177177+ plan for `}` inside payload — either backslash-escape, or disallow
178178+ (plugins can accept references to external content instead).
179179+- **Non-HTML backends.** Odoc also renders to man pages / LaTeX. Either
180180+ implement `Extension` in each backend (with a "plain text of
181181+ payload" default) or restrict this feature to HTML for now and warn
182182+ on other backends.
183183+- **Upstream.** Worth proposing to odoc rather than carrying locally —
184184+ inline extensibility is generally useful and patching the parser is
185185+ costly to maintain out-of-tree.
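
As a sketch of the backslash-escape option from the first gotcha, assuming `\}` stands for a literal `}` and `\\` for a literal backslash (the plan does not fix these rules, and `read_payload` is a hypothetical name):

```ocaml
(* Read a payload starting at index [i], stopping at the first unescaped
   '}'. Returns the unescaped payload and the index just past the closing
   brace, or None if no closing brace is found. *)
let read_payload (s : string) (i : int) : (string * int) option =
  let n = String.length s in
  let b = Buffer.create 16 in
  let rec go i =
    if i >= n then None
    else
      match s.[i] with
      | '}' -> Some (Buffer.contents b, i + 1)
      | '\\' when i + 1 < n && (s.[i + 1] = '}' || s.[i + 1] = '\\') ->
          (* Escaped character: keep it and skip the backslash. *)
          Buffer.add_char b s.[i + 1];
          go (i + 2)
      | c ->
          Buffer.add_char b c;
          go (i + 1)
  in
  go i
```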
186186+187187+## Not doing
188188+189189+- **Post-processing `Raw_markup` or `Link`** to fake inline
190190+ extensibility without an odoc patch. Ugly syntax, fragile, and
191191+ breaks non-HTML backends. Rejected.
192192+- **Full attribute grammar** (`{%name attr=foo: payload}`). Start with
193193+ `name + string payload`; plugins can parse the payload themselves.
+110
docs/plans/2026-04-15-page-tags.md
···11+# `@page-tags` for .mld pages
22+33+**Status:** Planned
44+**Date:** 2026-04-15
55+66+## Problem
77+88+Pages have no machine-readable tag metadata. Cross-linking related
99+posts today means hand-written "see also" lists, which rot. A
1010+lightweight `@page-tags foo bar baz` tag would let us (a) surface a
1111+visible tag chip row on each page and (b) power cross-page queries
1212+("all pages tagged `ocaml`", "related posts by shared tags") the same
1313+way `@recent-posts` already does.
1414+1515+## Goal
1616+1717+- Author syntax: `@page-tags ocaml odoc plugins` at the top of any
1818+ `.mld`.
1919+- Visible rendering: a small chip row near the page header.
2020+- Programmatic consumption: a sibling extension (e.g. `@tagged-pages
2121+ ocaml`) that enumerates matching pages.
2222+2323+## Ground truth
2424+2525+- Block-tag extensions register via
2626+ `Odoc_extension_api.Registry.register` and receive
2727+ `Comment.nestable_block_element list` (`odoc-jons-plugins/src/odoc_jons_plugins.ml:383-401`).
2828+- Cross-page data is available at **link phase** via the `Env` API.
2929+ `@recent-posts` uses `Api.Env.lookup_page_by_path` to pull other
3030+ pages' content
3131+ (`odoc-jons-plugins/src/odoc_jons_plugins.ml:687, 696`;
3232+ `odoc/src/extension_api/odoc_extension_api.ml:116, 140-147`).
3333+3434+## Design
3535+3636+Two extensions, one for producing tags, one for consuming them.
3737+3838+### 1. `@page-tags` — producer
3939+4040+Register a block-tag extension with `prefix = "page-tags"`.
4141+4242+- Parse the block content into a flat list of tag tokens (split on
4343+ whitespace; each tag `[a-z0-9][a-z0-9-]*`; warn on anything else).
4444+- Render: a `<div class="page-tags">` with one `<a
4545+ class="tag-chip">` per tag linking to `/tags/<tag>.html` (or
4646+ wherever the index lives; see "Tag index page" below).
4747+- Emit a small CSS block via `odoc_jons_plugins_css.ml`.
4848+4949+### 2. `@tagged-pages <tag>` — consumer
5050+5151+A link-phase extension (same shape as `recent-posts`) that:
5252+5353+- Walks the page tree via `Env`.
5454+- For each page, looks at its raw `Comment.docs` for a top-level
5555+ `@page-tags` block and extracts the tags.
5656+- Emits a bulleted list of pages whose tags include the argument.
5757+5858+This is the mechanism for "quick referencing between pages". A
5959+separate per-tag index page (`/tags/ocaml.mld`) can just be a thin
6060+`.mld` containing `@tagged-pages ocaml`.
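
The matching step of the consumer can be sketched independently of odoc. The `page` record below is hypothetical; the real extension would obtain pages and their tags through `Env` at link phase, which is not modelled here:

```ocaml
(* Hypothetical stand-in for a page and its already-extracted tags. *)
type page = { path : string; tags : string list }

(* Return the paths of the pages whose tags include [tag]. The argument
   is lowercased and trimmed so that "OCaml" and "ocaml" match. *)
let tagged_pages (tag : string) (pages : page list) : string list =
  let tag = String.lowercase_ascii (String.trim tag) in
  pages
  |> List.filter (fun p -> List.mem tag p.tags)
  |> List.map (fun p -> p.path)
```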
6161+6262+### 3. Tag normalisation
6363+6464+Lowercase, trim, dedupe at extraction. Same function used in both the
6565+producer (for rendering) and the consumer (for matching), so that
6666+`Ocaml` and `ocaml` are the same tag.
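
A minimal sketch of that shared normalisation, working on plain text and using the `[a-z0-9][a-z0-9-]*` grammar from the producer section. `tags_of_text` is a hypothetical name; the planned `extract_tags` helper operates on `Comment.docs` instead:

```ocaml
let is_alnum c = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')

(* Lowercase and trim, then accept only [a-z0-9][a-z0-9-]*. *)
let normalise_tag (s : string) : string option =
  let s = String.lowercase_ascii (String.trim s) in
  let ok = ref (s <> "" && is_alnum s.[0]) in
  String.iter (fun c -> if not (is_alnum c || c = '-') then ok := false) s;
  if !ok then Some s else None

(* Split on whitespace, normalise, and dedupe while preserving order.
   Invalid tokens are dropped silently here; the plan calls for a warning. *)
let tags_of_text (text : string) : string list =
  text
  |> String.map (fun c -> if c = '\n' || c = '\t' then ' ' else c)
  |> String.split_on_char ' '
  |> List.filter_map normalise_tag
  |> List.fold_left (fun acc t -> if List.mem t acc then acc else t :: acc) []
  |> List.rev
```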
6767+6868+## Sequence
6969+7070+1. Add `Page_tags` module in
7171+ `odoc-jons-plugins/src/odoc_jons_plugins.ml` — block-tag extension,
7272+ parse + render.
7373+2. Add CSS for `.page-tags` / `.tag-chip` in
7474+ `odoc_jons_plugins_css.ml`.
7575+3. Add `Tagged_pages` link-phase extension — mirror the structure of
7676+ `Recent_posts`.
7777+4. Factor a shared `extract_tags : Comment.docs -> string list`
7878+ helper so producer and consumer agree.
7979+5. Optional: a `/tags/index.mld` listing every tag with counts; build
8080+ with a third extension `@tag-cloud` or generate offline.
8181+6. Add `@page-tags` to a handful of posts; link from the blog index.
8282+8383+## Effort
8484+8585+Small. Producer is ~30 LOC copying the `hidden_tag_extension`
8686+pattern. Consumer is ~100 LOC copying `Recent_posts`. No odoc patch
8787+needed.
8888+8989+## Gotchas
9090+9191+- **Tag discovery.** The consumer must know which tags exist. Either
9292+ (a) walk all pages in the consumer and collect, or (b) persist tag
9393+ index at build time via a hook. (a) is simpler; (b) is faster if
9494+ the site grows. Start with (a).
9595+- **Positioning in rendering.** Tags ideally render near the page
9696+ header, not wherever the `@page-tags` block physically sits. The
9797+ block-tag extension can't control position directly; simplest is to
9898+ put `@page-tags` at the top of the file and accept the block
9999+ renders in-place. If that's ugly, a post-render shell hook can
100100+ relocate the node.
101101+- **Anchor for "tagged by"**: each rendered chip links to a tag
102102+ index. Decide the URL scheme up front (`/tags/<tag>`) so links
103103+ don't need to be rewritten later.
104104+105105+## Not doing
106106+107107+- **YAML front-matter-style metadata.** Keeping with odoc's native
108108+ `@tag` style so it parses with no syntax extension.
109109+- **Full-text tag search integration with sherlodoc.** Handled
110110+ separately once mld indexing is in.
+125
docs/plans/2026-04-15-sherlodoc-mld-indexing.md
···11+# Sherlodoc mld Page-Prose Indexing
22+33+**Status:** Phase 1 shipped (2026-04-15); Phase 2 deferred
44+**Date:** 2026-04-15
55+66+## Problem
77+88+Sherlodoc does not index `.mld` page content. Its indexer explicitly
99+drops entries of kind `Doc | Page _ | Dir | Impl` before they reach the
1010+full-text index (`odoc/sherlodoc/index/load_doc.ml:220-226`). Only API
1111+items (values, types, modules) and their docstrings are searchable. A
1212+user searching the site cannot find blog posts, tutorials, or narrative
1313+pages by their content.
1414+1515+## Goal
1616+1717+Make headings (and optionally paragraphs/list items) from `.mld` pages
1818+searchable via sherlodoc, with results that deep-link to the relevant
1919+heading anchor.
2020+2121+## Scope
2222+2323+All changes live inside the vendored `odoc/` tree. No upstream
2424+coordination required; regenerate `.db` after deployment.
2525+2626+Rough size: ~150 LOC across 4–5 files.
2727+2828+## Design
2929+3030+### 1. Extract prose entries — `odoc/src/index/skeleton.ml:338-343`
3131+3232+Today `from_page` emits one `Entry` per page. Walk
3333+`p.content.elements` (`Comment.block_element`) recursively and emit
3434+child entries for:
3535+3636+- `Heading` — reuse its existing `Identifier.Label.t` as the entry
3737+ id (the same label already drives the HTML fragment anchor).
3838+- `Paragraph` / list items — synthesize child ids from the parent
3939+ label plus a counter. (Phase 2; see "Staging" below.)
4040+4141+Attach the new entries as children of the page node in the `Tree` so
4242+hierarchy is preserved.
4343+4444+### 2. Entry kind — `odoc/src/index/entry.ml:45-62` and `odoc/sherlodoc/db/entry.ml:7-34`
4545+4646+Add a `Heading` variant. Reusing `Doc` would work but loses the ability
4747+to badge results distinctly and tune ranking. Since sherlodoc is
4848+vendored, the binary format bump is local to us.
4949+5050+### 3. Unblock the filter — `odoc/sherlodoc/index/load_doc.ml:220-226`
5151+5252+`is_pure_documentation` short-circuits before `register_entry`,
5353+`register_doc`, and `register_full_name`. Remove `Heading` (and
5454+optionally `Doc`) from that guard so prose reaches tokenization. Keep
5555+`Page _` excluded (the whole-page entry is already in the tree) and
5656+`Dir`/`Impl` (not prose).
5757+5858+### 4. URL fragments — `odoc/src/search/html.ml:7-29`
5959+6060+When the entry id is a `Label`, build `<page-url>#<label>`. Odoc
6161+already emits matching `id=` anchors on headings, so no new anchor
6262+logic is needed — just make sure the search URL picks up the fragment.
6363+6464+### 5. Result display — `odoc/sherlodoc/jsoo/odoc_html_frontend.ml:39-52`
6565+6666+Add a `kind_heading` badge. Compose `name` as `"Page title › Heading"`,
6767+leave `rhs = None`, and put a short prose snippet in `doc_html`. All
6868+rendering is server-side OCaml — no JS changes.
6969+7070+### 6. Ranking — `odoc/sherlodoc/index/load_doc.ml:37-47`
7171+7272+Existing `cost_doc = 100` already ranks prose below API items.
7373+Optionally add a small bonus for top-level headings (Title <
7474+Section < Subsection) so page-level hits float up.
7575+7676+## Sequence
7777+7878+### Phase 1 — one-entry-per-page (SHIPPED)
7979+8080+Trivial change: remove `Doc` and `Page _` from the
8181+`is_pure_documentation` filter in
8282+`odoc/sherlodoc/index/load_doc.ml:220-223`. The inner `register_entry`
8383+already handles `Doc`-kind entries correctly (skips
8484+`register_full_name` and `register_kind` for them), so page body text
8585+flows straight into `register_doc`'s tokeniser. One result per page,
8686+matched by any word in the body. No new AST variants, no URL work.
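
The effect of the filter change can be illustrated with a simplified model. The `kind` type below is a stand-in (the real entry kind has more cases, and the payload of `Page` is an assumption), and the two predicate names are hypothetical:

```ocaml
(* Simplified stand-in for sherlodoc's entry kinds. *)
type kind = Doc | Page of string | Dir | Impl | Value | Type_decl | Module

(* Before Phase 1: all four documentation-only kinds are dropped. *)
let is_pure_documentation_before = function
  | Doc | Page _ | Dir | Impl -> true
  | Value | Type_decl | Module -> false

(* After Phase 1: Doc and Page entries reach the full-text index. *)
let is_pure_documentation_after = function
  | Dir | Impl -> true
  | Doc | Page _ | Value | Type_decl | Module -> false

(* Keep only the entries that the given filter lets through. *)
let indexed is_pure entries = List.filter (fun k -> not (is_pure k)) entries
```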
8787+8888+Verified via a standalone cram-style test:
8989+`sherlodoc index` + `sherlodoc search "parseff"` returns the indexed
9090+`.mld` page hit, and a nonsense query returns `[No results]`.
9191+9292+### Phase 2 — per-heading entries (DEFERRED)
9393+9494+1. Add `Heading` kind in both `entry.ml`s.
9595+2. Recurse in `skeleton.ml`; emit heading entries.
9696+3. Unblock `Heading` in `load_doc.ml`.
9797+4. Compose fragment URL in `search/html.ml`.
9898+5. Add `kind_heading` constant in `odoc_html_frontend.ml`.
9999+6. Regenerate `.db`, test search against the live site.
100100+101101+## Staging
102102+103103+- **Phase 1 (headings only):** smallest useful increment. Low noise,
104104+ high value — users usually search for section titles.
105105+- **Phase 2 (paragraph bodies):** gate behind `--index-prose`. Decide
106106+ after Phase 1 ships whether the added recall outweighs the noise.
107107+108108+## Gotchas
109109+110110+- **Anchor stability.** Auto-generated labels for unlabeled headings
111111+ change when you reorder or rename them — search results will rot.
112112+ Consider requiring explicit `{1:label ...}` syntax on headings that
113113+ should be indexed, or accept some churn.
114114+- **Result flooding.** Indexing every paragraph easily drowns API hits.
115115+ Headings-first avoids this.
116116+- **DB format bump.** Any deployed `.db` must be rebuilt on upgrade.
117117+- **Upstream.** Worth floating the design to the sherlodoc maintainer
118118+ even while we carry the patch locally.
119119+120120+## Not doing
121121+122122+- Building a separate page-only search index (lunr/pagefind). Rejected
123123+ because it splits the search UX.
124124+- Rewriting sherlodoc's ranking model. The existing cost model is good
125125+ enough for a first cut.
+609-5
odoc-jons-plugins/src/odoc_jons_plugins.ml
···66module Url = Odoc_document.Url
7788(* Register CSS and JS as support files *)
99+(* The jon-shell.css support file is registered later, after all
1010+ stylesheet fragments (including the inline-extension CSS for
1111+ {&margin}, {&kbd}, etc.) are in scope so we can concatenate
1212+ everything into one file that the page's <link> already pulls. *)
1313+914let () =
1010- Odoc_extension_registry.register_support_file ~prefix:"jon-shell"
1111- {
1212- filename = "extensions/jon-shell.css";
1313- content = Inline Odoc_jons_plugins_css.css;
1414- };
1515 Odoc_extension_registry.register_support_file ~prefix:"jon-shell"
1616 {
1717 filename = "extensions/jon-shell.js";
···399399400400let () =
401401 List.iter hidden_tag_extension [ "published"; "notanotebook"; "packages" ]
402402+403403+(* --- Page tags extension ---
404404+405405+ Produces a small row of tag chips from [@page-tags foo bar baz]. Tags
406406+ are lowercase, link to /tags/<tag>. The consumer extension
407407+ ([@tagged-pages <tag>]) is a separate plugin that walks the page tree
408408+ at link phase to collect matching pages. *)
409409+410410+module Page_tags = struct
411411+ let prefix = "page-tags"
412412+413413+ let page_tags_css = {|
414414+/* Page tags extension - neutralize the at-tags list wrapper */
415415+.jon-shell-main ul.at-tags:has(li.page-tags) {
416416+ list-style: none;
417417+ margin: 0;
418418+ padding: 0;
419419+}
420420+.jon-shell-main .at-tags li.page-tags {
421421+ list-style: none;
422422+ margin: 0;
423423+ padding: 0;
424424+ text-indent: 0;
425425+}
426426+.page-tags {
427427+ display: flex;
428428+ flex-wrap: wrap;
429429+ gap: 0.4em;
430430+ margin: 0.75em 0 1.5em;
431431+}
432432+.page-tags .tag-chip {
433433+ display: inline-block;
434434+ padding: 0.15em 0.65em;
435435+ font-size: 0.8rem;
436436+ line-height: 1.4;
437437+ color: var(--text-muted, #666);
438438+ background: var(--surface-alt, #f3f3f3);
439439+ border: 1px solid var(--border-color, #e0e0e0);
440440+ border-radius: 999px;
441441+ text-decoration: none;
442442+ transition: background 0.15s ease, color 0.15s ease;
443443+}
444444+.page-tags .tag-chip:hover {
445445+ color: var(--accent-color, #b44e2d);
446446+ background: var(--bg-hover, #eee);
447447+}
448448+@media (prefers-color-scheme: dark) {
449449+ .page-tags .tag-chip {
450450+ color: var(--text-muted, #aaa);
451451+ background: rgba(255,255,255,0.04);
452452+ border-color: rgba(255,255,255,0.1);
453453+ }
454454+}
455455+|}
456456+457457+ (* A tag is lowercase [a-z0-9] with optional internal hyphens. We
458458+ normalise by lowercasing and trimming, and reject anything that
459459+ doesn't match. *)
460460+ let is_tag_char c =
461461+ (c >= 'a' && c <= 'z')
462462+ || (c >= '0' && c <= '9')
463463+ || c = '-'
464464+465465+ let normalise_tag s =
466466+ let s = Stdlib.String.lowercase_ascii (Stdlib.String.trim s) in
467467+ if s = "" then None
468468+    else if s.[0] = '-' || s.[Stdlib.String.length s - 1] = '-' then None
469469+ else
470470+ let ok = ref true in
471471+ Stdlib.String.iter (fun c -> if not (is_tag_char c) then ok := false) s;
472472+ if !ok then Some s else None
473473+474474+ (* Parse tags from the block content: take the plain text, split on
475475+ whitespace, dedupe while preserving order. *)
476476+ let extract_tags content =
477477+ let text = Api.text_of_nestable_block_elements content in
478478+ let parts =
479479+ Stdlib.String.split_on_char ' ' text
480480+ |> List.concat_map (fun s -> Stdlib.String.split_on_char '\n' s)
481481+ |> List.concat_map (fun s -> Stdlib.String.split_on_char '\t' s)
482482+ in
483483+ let seen = Hashtbl.create 8 in
484484+ List.filter_map (fun part ->
485485+ match normalise_tag part with
486486+ | None -> None
487487+ | Some tag ->
488488+ if Hashtbl.mem seen tag then None
489489+ else (Hashtbl.add seen tag (); Some tag)
490490+ ) parts
491491+492492+ let raw_block html =
493493+ Odoc_document.Types.Block.{ attr = []; desc = Raw_markup ("html", html) }
494494+495495+ (* HTML-escape a tag (tags are already constrained to safe chars, but
496496+ be defensive). *)
497497+ let escape_attr s =
498498+ let b = Buffer.create (Stdlib.String.length s) in
499499+ Stdlib.String.iter (fun c ->
500500+ match c with
501501+        | '&' -> Buffer.add_string b "&amp;"
502502+        | '<' -> Buffer.add_string b "&lt;"
503503+        | '>' -> Buffer.add_string b "&gt;"
504504+        | '"' -> Buffer.add_string b "&quot;"
505505+ | c -> Buffer.add_char b c
506506+ ) s;
507507+ Buffer.contents b
508508+509509+ let render_chips tags =
510510+ let buf = Buffer.create 256 in
511511+ Buffer.add_string buf {|<div class="page-tags">|};
512512+ List.iter (fun tag ->
513513+ let t = escape_attr tag in
514514+ Buffer.add_string buf
515515+ (Printf.sprintf {|<a class="tag-chip" href="/tags/%s">%s</a>|} t t)
516516+ ) tags;
517517+ Buffer.add_string buf "</div>";
518518+ Buffer.contents buf
519519+520520+ let to_document ~tag:_ content =
521521+ let tags = extract_tags content in
522522+ let content =
523523+ if tags = [] then []
524524+ else [ raw_block (render_chips tags) ]
525525+ in
526526+ {
527527+ Api.content;
528528+ overrides = [];
529529+ resources = [ Api.Css_inline page_tags_css ];
530530+ assets = [];
531531+ }
532532+533533+ let link ~tag:_ env content =
534534+ let tags = extract_tags content in
535535+ List.iter (fun tag ->
536536+ let hierarchy : Odoc_model.Paths.Reference.Hierarchy.t =
537537+ (`TCurrentPackage, [ "tags"; tag ])
538538+ in
539539+ match Api.Env.lookup_page_by_path hierarchy env with
540540+ | Ok _ -> ()
541541+ | Error _ ->
542542+ failwith
543543+ (Printf.sprintf
544544+ "@page-tags: no page found for tag '%s'. Create \
545545+ site/tags/%s.mld before using this tag."
546546+ tag tag)
547547+ ) tags;
548548+ content
549549+end
550550+551551+let () =
552552+ Api.Registry.register_with_link (module Page_tags)
553553+554554+(* Whitespace-separated tokeniser with support for double-quoted
555555+ values. Used by @figure and the image / linked-image inlines. *)
556556+module Tok = struct
557557+ let is_ws c = c = ' ' || c = '\t' || c = '\n' || c = '\r'
558558+559559+ let tokenise s =
560560+ let len = Stdlib.String.length s in
561561+ let i = ref 0 in
562562+ let out = ref [] in
563563+ let skip_ws () =
564564+ while !i < len && is_ws s.[!i] do incr i done
565565+ in
566566+ while !i < len do
567567+ skip_ws ();
568568+ if !i < len then begin
569569+ let start = !i in
570570+ let tok =
571571+ if s.[!i] = '"' then begin
572572+ incr i;
573573+ let tstart = !i in
574574+ while !i < len && s.[!i] <> '"' do incr i done;
575575+ let t = Stdlib.String.sub s tstart (!i - tstart) in
576576+ if !i < len then incr i;
577577+ t
578578+ end else begin
579579+ while !i < len && not (is_ws s.[!i]) do incr i done;
580580+ Stdlib.String.sub s start (!i - start)
581581+ end
582582+ in
583583+ if tok <> "" then out := tok :: !out
584584+ end
585585+ done;
586586+ List.rev !out
587587+end
588588+589589+(* --- Figure extension ---
590590+591591+ [@figure src=foo.png alt="…" link="…"]
592592+ Caption body (one or more paragraphs, with inline formatting).
593593+594594+ Renders to <figure><a><img></a><figcaption>…</figcaption></figure>.
595595+ The caption body is rendered through the normal odoc document layer
596596+ so links, references, and emphasis work inside it. *)
597597+598598+module Figure = struct
599599+ open Odoc_document.Types
600600+601601+ let prefix = "figure"
602602+603603+ let figure_css = {|
604604+/* Figure extension - neutralize the at-tags list wrapper */
605605+.jon-shell-main ul.at-tags:has(li.figure) {
606606+ list-style: none;
607607+ margin: 0;
608608+ padding: 0;
609609+}
610610+.jon-shell-main .at-tags li.figure {
611611+ list-style: none;
612612+ margin: 0;
613613+ padding: 0;
614614+ text-indent: 0;
615615+}
616616+figure.figure {
617617+ margin: 1.5em auto;
618618+ text-align: center;
619619+}
620620+figure.figure img {
621621+ max-width: 100%;
622622+ height: auto;
623623+ border-radius: 4px;
624624+}
625625+figure.figure figcaption {
626626+ margin-top: 0.6em;
627627+ font-size: 0.9rem;
628628+ color: var(--text-muted, #666);
629629+ font-style: italic;
630630+ line-height: 1.5;
631631+}
632632+figure.figure figcaption p {
633633+ margin: 0;
634634+}
635635+|}
636636+637637+ (* Parse key=value pairs out of a single line of raw text. Supports
638638+ quoted values ("…") and bare values (stop at whitespace). Returns
639639+ an assoc list. Unknown keys are kept — the caller decides what to
640640+ do with them. *)
641641+ let parse_attrs line =
642642+ let len = Stdlib.String.length line in
643643+ let i = ref 0 in
644644+ let attrs = ref [] in
645645+ let skip_ws () =
646646+ while !i < len && (line.[!i] = ' ' || line.[!i] = '\t') do
647647+ incr i
648648+ done
649649+ in
650650+ let read_key () =
651651+ let start = !i in
652652+ while !i < len
653653+ && line.[!i] <> '='
654654+ && line.[!i] <> ' '
655655+ && line.[!i] <> '\t'
656656+ do incr i done;
657657+ Stdlib.String.sub line start (!i - start)
658658+ in
659659+ let read_quoted () =
660660+ (* assumes current char is '"' *)
661661+ incr i;
662662+ let start = !i in
663663+ while !i < len && line.[!i] <> '"' do incr i done;
664664+ let v = Stdlib.String.sub line start (!i - start) in
665665+ if !i < len then incr i; (* consume closing quote *)
666666+ v
667667+ in
668668+ let read_bare () =
669669+ let start = !i in
670670+ while !i < len && line.[!i] <> ' ' && line.[!i] <> '\t' do
671671+ incr i
672672+ done;
673673+ Stdlib.String.sub line start (!i - start)
674674+ in
675675+ (try
676676+ while !i < len do
677677+ skip_ws ();
678678+ if !i >= len then raise Exit;
679679+ let key = read_key () in
680680+ if key = "" then raise Exit;
681681+ if !i < len && line.[!i] = '=' then begin
682682+ incr i;
683683+ let value =
684684+ if !i < len && line.[!i] = '"' then read_quoted ()
685685+ else read_bare ()
686686+ in
687687+ attrs := (key, value) :: !attrs
688688+ end else begin
689689+            (* bare token: stored with an empty value; infer_src may promote it *)
690690+ attrs := (key, "") :: !attrs
691691+ end
692692+ done
693693+ with Exit -> ());
694694+ List.rev !attrs
695695+696696+ let find_attr key attrs =
697697+ List.assoc_opt key attrs
698698+699699+ (* If the author wrote [@figure foo.png …] (no src=), the first bare
700700+ token becomes the src. *)
701701+ let infer_src attrs =
702702+ match find_attr "src" attrs with
703703+ | Some _ -> attrs
704704+ | None ->
705705+ match List.find_opt (fun (_k, v) -> v = "") attrs with
706706+ | Some (k, _) ->
707707+ (* Promote the first bare token to src *)
708708+ let rest = List.filter (fun (k', v) -> not (k' = k && v = "")) attrs in
709709+ ("src", k) :: rest
710710+ | None -> attrs
711711+712712+ (* HTML-escape an attribute / text value. *)
713713+ let escape s =
714714+ let b = Buffer.create (Stdlib.String.length s) in
715715+ Stdlib.String.iter (fun c ->
716716+ match c with
717717+        | '&' -> Buffer.add_string b "&amp;"
718718+        | '<' -> Buffer.add_string b "&lt;"
719719+        | '>' -> Buffer.add_string b "&gt;"
720720+        | '"' -> Buffer.add_string b "&quot;"
721721+ | c -> Buffer.add_char b c
722722+ ) s;
723723+ Buffer.contents b
724724+725725+ let raw_block html =
726726+ Block.{ attr = []; desc = Raw_markup ("html", html) }
727727+728728+ (* V3: detect a {&image …} or {&linked-image …} custom inline in the
729729+ first paragraph and use it as the image. The rest of the
730730+ paragraph's inlines become the rich-formatted caption. Falls back
731731+ to V1 attribute-only form if no such inline is present. *)
732732+ let ext_image_prefix = "odoc-ext:image"
733733+ let ext_linked_prefix = "odoc-ext:linked-image"
734734+735735+ (* Split a paragraph's inlines around the first image/linked-image
736736+ custom inline. Returns (before, image_html, after) or None. *)
737737+ let find_image_inline
738738+ (inlines : Odoc_model.Comment.inline_element
739739+ Odoc_model.Location_.with_location list) =
740740+ let rec go acc = function
741741+ | [] -> None
742742+ | (el : Odoc_model.Comment.inline_element
743743+ Odoc_model.Location_.with_location) :: rest ->
744744+ (match el.value with
745745+ | `Raw_markup (target, payload)
746746+ when target = ext_image_prefix ->
747747+ (match Tok.tokenise payload with
748748+ | src :: rest_toks ->
749749+ let alt = Stdlib.String.concat " " rest_toks in
750750+ let html =
751751+ Printf.sprintf {|<img src="%s" alt="%s">|}
752752+ (escape src) (escape alt)
753753+ in
754754+ Some (List.rev acc, html, rest)
755755+ | [] -> go (el :: acc) rest)
756756+ | `Raw_markup (target, payload)
757757+ when target = ext_linked_prefix ->
758758+ (match Tok.tokenise payload with
759759+ | url :: src :: rest_toks ->
760760+ let alt = Stdlib.String.concat " " rest_toks in
761761+ let html =
762762+ Printf.sprintf
763763+ {|<a href="%s"><img src="%s" alt="%s"></a>|}
764764+ (escape url) (escape src) (escape alt)
765765+ in
766766+ Some (List.rev acc, html, rest)
767767+ | _ -> go (el :: acc) rest)
768768+ | _ -> go (el :: acc) rest)
769769+ in
770770+ go [] inlines
771771+772772+ (* Skip leading/trailing whitespace-only inlines for a cleaner caption. *)
773773+ let trim_inlines inlines =
774774+ let is_blank (el : Odoc_model.Comment.inline_element
775775+ Odoc_model.Location_.with_location) =
776776+ match el.value with
777777+ | `Space -> true
778778+ | _ -> false
779779+ in
780780+ let rec drop_head = function
781781+ | x :: rest when is_blank x -> drop_head rest
782782+ | xs -> xs
783783+ in
784784+ let drop_tail xs = List.rev (drop_head (List.rev xs)) in
785785+ drop_tail (drop_head inlines)
786786+787787+ let try_v3 content =
788788+ match content with
789789+ | [] -> None
790790+ | (first : Odoc_model.Comment.nestable_block_element
791791+ Odoc_model.Location_.with_location) :: _ ->
792792+ (match first.value with
793793+ | `Paragraph inlines ->
794794+ (match find_image_inline inlines with
795795+ | None -> None
796796+ | Some (before, img_html, after) ->
797797+ Some (img_html, trim_inlines (before @ after)))
798798+ | _ -> None)
799799+800800+ let render_v3 ~img_html ~caption_inlines =
801801+ let caption_ir : Inline.t =
802802+ Odoc_document.Comment.inline_element_list caption_inlines
803803+ in
804804+ let has_caption = caption_inlines <> [] in
805805+ let open_fig = {|<figure class="figure">|} in
806806+ let opens = if has_caption then "<figcaption>" else "" in
807807+ let closes = if has_caption then "</figcaption>" else "" in
808808+ let blocks =
809809+ [ raw_block (open_fig ^ img_html ^ opens) ]
810810+ @ (if has_caption
811811+ then [ Block.{ attr = []; desc = Inline caption_ir } ]
812812+ else [])
813813+ @ [ raw_block (closes ^ "</figure>") ]
814814+ in
815815+ {
816816+ Api.content = blocks;
817817+ overrides = [];
818818+ resources = [ Api.Css_inline figure_css ];
819819+ assets = [];
820820+ }
822822+  (* Dispatch: try the V3 inline-image form first, then fall back to
823823+     V1, where the entire tag body is treated as the attribute string:
824824+     [@figure src=foo.png alt="Caption text" link=https://…]. In V1 the
825825+     figcaption is plain text: [caption] if given, otherwise [alt]. *)
826826+ let to_document ~tag:_ content =
827827+ match try_v3 content with
828828+ | Some (img_html, caption_inlines) ->
829829+ render_v3 ~img_html ~caption_inlines
830830+ | None ->
831831+ (* Fall through to V1 *)
834834+ let attr_line =
835835+ Api.text_of_nestable_block_elements content
836836+ |> Stdlib.String.trim
837837+ in
838838+ let attrs = parse_attrs attr_line |> infer_src in
839839+ match find_attr "src" attrs with
840840+ | None ->
841841+ (* No src — fall back to emitting the original content so the
842842+ author sees something rather than silence. *)
843843+ Api.simple_output (Api.blocks_of_nestable_elements content)
844844+ | Some src ->
845845+ let alt = Option.value ~default:"" (find_attr "alt" attrs) in
846846+ let caption =
847847+ match find_attr "caption" attrs with
848848+ | Some c -> c
849849+ | None -> alt
850850+ in
851851+ let link = find_attr "link" attrs in
852852+ let extra_class = Option.value ~default:"" (find_attr "class" attrs) in
853853+ let cls =
854854+ if extra_class = "" then "figure"
855855+ else "figure " ^ extra_class
856856+ in
857857+ let img_html =
858858+ match link with
859859+ | Some url ->
860860+ Printf.sprintf {|<a href="%s"><img src="%s" alt="%s"></a>|}
861861+ (escape url) (escape src) (escape alt)
862862+ | None ->
863863+ Printf.sprintf {|<img src="%s" alt="%s">|}
864864+ (escape src) (escape alt)
865865+ in
866866+ let caption_html =
867867+ if caption = "" then ""
868868+ else Printf.sprintf {|<figcaption>%s</figcaption>|} (escape caption)
869869+ in
870870+ let full_html =
871871+ Printf.sprintf {|<figure class="%s">%s%s</figure>|}
872872+ (escape cls) img_html caption_html
873873+ in
874874+ {
875875+ Api.content = [ raw_block full_html ];
876876+ overrides = [];
877877+ resources = [ Api.Css_inline figure_css ];
878878+ assets = [];
879879+ }
880880+end
881881+882882+let () =
883883+ Api.Registry.register (module Figure)
884884+885885+(* --- Inline extensions ---
886886+887887+ Custom inline elements written as [{&name payload}] in .mld text.
888888+ The registry calls us with the raw payload string; we return a
889889+ chunk of HTML that is spliced into the output verbatim.
891891+   An inline extension could ship its CSS via a separate block-level
892892+   hook on first use, but that is awkward to track per page, so we
893893+   register the CSS once via the shell support file (already pulled
894894+   into every page by the jon-shell plugin). *)
895895+896896+module Margin = struct
897897+ let prefix = "margin"
898898+899899+ let escape s =
900900+ let b = Buffer.create (Stdlib.String.length s) in
901901+ Stdlib.String.iter (fun c ->
902902+ match c with
903903+ | '&' -> Buffer.add_string b "&"
904904+ | '<' -> Buffer.add_string b "<"
905905+ | '>' -> Buffer.add_string b ">"
906906+ | '"' -> Buffer.add_string b """
907907+ | c -> Buffer.add_char b c
908908+ ) s;
909909+ Buffer.contents b
910910+911911+ let to_html payload =
912912+ Printf.sprintf {|<span class="margin-note">%s</span>|} (escape payload)
913913+end
914914+915915+module Kbd = struct
916916+ let prefix = "kbd"
917917+ let escape = Margin.escape
918918+ let to_html payload =
919919+ Printf.sprintf {|<kbd>%s</kbd>|} (escape payload)
920920+end
921921+922922+module Image_inline = struct
923923+ let prefix = "image"
924924+ let escape = Margin.escape
925925+ let to_html payload =
926926+ match Tok.tokenise payload with
927927+ | [] -> ""
928928+ | src :: rest ->
929929+ let alt = Stdlib.String.concat " " rest in
930930+ Printf.sprintf {|<img src="%s" alt="%s">|}
931931+ (escape src) (escape alt)
932932+end
933933+934934+module Linked_image_inline = struct
935935+ let prefix = "linked-image"
936936+ let escape = Margin.escape
937937+ let to_html payload =
938938+ match Tok.tokenise payload with
939939+ | [] | [ _ ] -> ""
940940+ | url :: src :: rest ->
941941+ let alt = Stdlib.String.concat " " rest in
942942+ Printf.sprintf
943943+ {|<a href="%s"><img src="%s" alt="%s"></a>|}
944944+ (escape url) (escape src) (escape alt)
945945+end
946946+947947+let inline_extensions_css = {|
948948+/* Inline extension: margin note
949949+950950+ Float a small sidenote to the right of the paragraph. The main
951951+ content column is flush with the page sidebar on the right so we
952952+ can't escape into an outer gutter; instead we float inside the
953953+ column and let the paragraph text wrap around it. On narrow
954954+ viewports it falls back to a pull-quote style block. */
955955+.margin-note {
956956+ float: right;
957957+ clear: right;
958958+ width: 12em;
959959+ margin: 0.1em 0 0.4em 1.2em;
960960+ padding: 0 0 0 0.7em;
961961+ font-size: 0.8em;
962962+ line-height: 1.5;
963963+ color: var(--text-muted, #666);
964964+ border-left: 2px solid var(--accent-color, #b44e2d);
965965+ font-style: normal;
966966+}
967967+968968+/* Keep the sidenote from overlapping figures or code blocks that
969969+ come after it. */
970970+.margin-note + * {
971971+ clear: right;
972972+}
973973+974974+/* Narrow viewports: show inline before the next block. */
975975+@media (max-width: 800px) {
976976+ .margin-note {
977977+ float: none;
978978+ display: block;
979979+ width: auto;
980980+ margin: 0.4em 0 0.4em 0;
981981+ font-size: 0.85em;
982982+ }
983983+}
984984+985985+@media (prefers-color-scheme: dark) {
986986+ .margin-note {
987987+ color: var(--text-muted, #aaa);
988988+ }
989989+}
990990+|}
991991+992992+let () =
993993+ Api.Registry.register_inline (module Margin);
994994+ Api.Registry.register_inline (module Kbd);
995995+ Api.Registry.register_inline (module Image_inline);
996996+ Api.Registry.register_inline (module Linked_image_inline);
997997+ (* Now that inline_extensions_css is in scope, register
998998+ jon-shell.css with the combined stylesheet — the shell's page
999999+ <link> already references this file, so no per-page hook needed. *)
10001000+ Odoc_extension_registry.register_support_file ~prefix:"jon-shell"
10011001+ {
10021002+ filename = "extensions/jon-shell.css";
10031003+ content = Inline
10041004+ (Odoc_jons_plugins_css.css ^ "\n" ^ inline_extensions_css);
10051005+ }
40210064031007(* --- Recent posts extension --- *)
4041008
···219219 let cat = categorize entry in
220220 let is_pure_documentation =
221221 match kind with
222222- | Doc | Page _ | Dir | Impl -> true
222222+ | Dir | Impl -> true
223223 | _ -> false
224224 in
225225 if is_pure_documentation || cat = `ignore || Odoc_model.Paths.Identifier.is_hidden id
+39
odoc/src/extension_api/odoc_extension_api.ml
···165165 [`Binding (key, value)] for key=value pairs. *)
166166}
167167168168+(** {1 Inline Extensions}
169169+170170+ Extensions can handle inline-level custom elements written as
171171+ [{&name payload}] in odoc comments. The extension receives the raw
172172+ payload string and returns the HTML that should be spliced into
173173+ the output verbatim.
174174+175175+ Example:
176176+ {[
177177+ module Margin = struct
178178+ let prefix = "margin"
179179+ let to_html payload =
180180+ Printf.sprintf
181181+ {|<span class="margin-note">%s</span>|}
182182+ payload
183183+ end
184184+185185+ let () = Registry.register_inline (module Margin)
186186+ ]}
187187+188188+ In a comment: [Some text with {&margin an aside about X} inline.]
190190+    Inline extensions are HTML-only; on other backends the element
191191+    renders to the empty string. The payload is a raw string: the
192192+    extension parses and escapes it (the example omits escaping). *)
193193+194194+module type Inline_Extension = sig
195195+ val prefix : string
196196+ (** The inline tag prefix, e.g. ["margin"] handles [{&margin ...}]. *)
197197+198198+ val to_html : string -> string
199199+ (** [to_html payload] returns the raw HTML to splice into the output.
200200+ The returned string is emitted without further processing; the
201201+ extension is responsible for any escaping. *)
202202+end
203203+168204(** The signature that code block extensions must implement *)
169205module type Code_Block_Extension = sig
170206 val prefix : string
···238274 E.link ~tag (Obj.obj env) content
239275 in
240276 Odoc_extension_registry.register_link_handler ~prefix:E.prefix link_handler
277277+278278+ let register_inline (module E : Inline_Extension) =
279279+ Odoc_extension_registry.register_inline_handler ~prefix:E.prefix E.to_html
241280242281 let register_code_block (module E : Code_Block_Extension) =
243282 let handler meta content =
···235235236236let find_link_handler ~prefix =
237237 Hashtbl.find_opt link_handlers prefix
238238+239239+(** {1 Inline Extension Handlers}
240240+241241+ Extensions can register handlers for inline-level custom elements
242242+ written as [{&name payload}] in odoc comments. The handler receives
243243+ the raw payload string and returns a chunk of HTML that will be
244244+ spliced into the rendered output verbatim.
245245+246246+ These extensions are HTML-only. On other backends the element
247247+ renders to the empty string. *)
248248+249249+type inline_handler = string -> string
250250+(** [payload -> raw html]. *)
251251+252252+let inline_handlers : (string, inline_handler) Hashtbl.t = Hashtbl.create 16
253253+254254+let inline_prefixes : (string, unit) Hashtbl.t = Hashtbl.create 16
255255+256256+let register_inline_handler ~prefix (handler : inline_handler) =
257257+ Hashtbl.replace inline_handlers prefix handler;
258258+ Hashtbl.replace inline_prefixes prefix ()
259259+260260+let find_inline_handler ~prefix =
261261+ Hashtbl.find_opt inline_handlers prefix
262262+263263+let list_inline_prefixes () =
264264+ Hashtbl.fold (fun prefix () acc -> prefix :: acc) inline_prefixes []
265265+ |> List.sort String.compare
266266+267267+(** Synthetic target prefix used in [Comment.Raw_markup] to carry
268268+ inline-extension payloads through the AST without adding a new
269269+ variant. Lexer emits [Raw_markup (Some (inline_extension_target_prefix
270270+ ^ name), payload)]; renderers detect the prefix and dispatch. *)
271271+let inline_extension_target_prefix = "odoc-ext:"
+22-8
odoc/src/html/generator.ml
···83838484and raw_markup (t : Raw_markup.t) =
8585 let target, content = t in
8686- match Astring.String.Ascii.lowercase target with
8787- | "html" ->
8888- (* This is OK because we output *textual* HTML.
8989- In theory, we should try to parse the HTML with lambdasoup and rebuild
9090- the HTML tree from there.
9191- *)
9292- [ Html.Unsafe.data content ]
9393- | _ -> []
8686+ let lowercase_target = Astring.String.Ascii.lowercase target in
8787+ let ext_prefix = Odoc_extension_registry.inline_extension_target_prefix in
8888+ let ext_prefix_len = Stdlib.String.length ext_prefix in
8989+ if Stdlib.String.length lowercase_target >= ext_prefix_len
9090+ && Stdlib.String.sub lowercase_target 0 ext_prefix_len = ext_prefix
9191+ then
9292+ let name =
9393+ Stdlib.String.sub lowercase_target ext_prefix_len
9494+ (Stdlib.String.length lowercase_target - ext_prefix_len)
9595+ in
9696+ match Odoc_extension_registry.find_inline_handler ~prefix:name with
9797+ | Some handler -> [ Html.Unsafe.data (handler content) ]
9898+ | None -> [ Html.Unsafe.data content ]
9999+ else
100100+ match lowercase_target with
101101+ | "html" ->
102102+ (* This is OK because we output *textual* HTML.
103103+ In theory, we should try to parse the HTML with lambdasoup and rebuild
104104+ the HTML tree from there.
105105+ *)
106106+ [ Html.Unsafe.data content ]
107107+ | _ -> []
9410895109and source k ?a ?mode_links (t : Source.t) =
96110 let rec token (x : Source.token) =
+23-2
odoc/src/index/skeleton.ml
···148148149149 let of_docs id source_loc doc =
150150 Entry.entry ~id ~doc:doc.elements ~kind:Doc ~source_loc
151151+152152+ let of_heading label_id inline_content location =
153153+ let doc =
154154+ [ { Odoc_model.Location_.value = `Paragraph inline_content;
155155+ location } ]
156156+ in
157157+ Entry.entry ~id:label_id ~doc ~kind:Doc ~source_loc:None
151158end
152159153160let if_non_hidden id f =
···338345let from_page (p : Page.t) =
339346 match p with
340347 | { name; content; _ } ->
341341- let entry = Entry.of_docs name None content in
342342- Tree.leaf entry
348348+ let page_entry = Entry.of_docs name None content in
349349+ (* Phase 2: emit one child entry per heading so sherlodoc can
350350+ index heading text and deep-link to the anchor. The heading's
351351+ Label.t identifier carries the fragment anchor. *)
352352+ let heading_children =
353353+ List.filter_map
354354+ (fun (el : Odoc_model.Comment.block_element
355355+ Odoc_model.Location_.with_location) ->
356356+ match el.value with
357357+ | `Heading (_, label_id, inline_content) ->
358358+ Some (Tree.leaf
359359+ (Entry.of_heading label_id inline_content el.location))
360360+ | _ -> None)
361361+ content.elements
362362+ in
363363+ { Tree.node = page_entry; children = heading_children }
···7070</figure>
7171%}
72727373-There are a number of advantages and disadvantages to this. As @davesnx {{:https://sancho.dev/blog/ocaml-documentation-as-markdown}wrote},
7373+There are a number of advantages and disadvantages to this. As \@davesnx {{:https://sancho.dev/blog/ocaml-documentation-as-markdown}wrote},
7474his concern with the markdown output was to be able to integrate the odoc output seamlessly
7575with an existing site, and it does this very well. However, it's at a cost - we lose links
7676in the API docs, links to source, the source rendering itself, and so on. Whereas the plugin
+85
site/drafts/new-extensions.mld
···11+{0 New extensions}
22+33+@page-tags ocaml odoc plugins meta
44+55+A visual tour of four new pieces of machinery wired into this site:
66+page tags, native figures, custom inline extensions, and mld-page
77+search via sherlodoc.
88+99+{1 [@page-tags]}
1010+1111+The list at the top of this page was written as:
1212+1313+{@shell[
1414+@page-tags ocaml odoc plugins meta
1515+]}
1616+1717+The plugin extracts those words, normalises them, and renders a row
1818+of chip links to [/tags/<tag>]. Cross-page indexing (finding all
1919+pages tagged [odoc]) is a follow-up.
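
The extract-and-normalise step can be sketched in isolation. This is a simplified, hypothetical stand-alone version: it splits on spaces only and uses plain strings instead of odoc's block content, so the plugin's real code may differ in detail.

```ocaml
(* Simplified sketch of tag extraction; not the plugin's actual code. *)
let is_tag_char c =
  (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c = '-'

(* True when every character of [s] is an allowed tag character. *)
let all_tag_chars s =
  let ok = ref true in
  String.iter (fun c -> if not (is_tag_char c) then ok := false) s;
  !ok

(* Lowercase and trim; reject empty or ill-formed candidates. *)
let normalise_tag s =
  let s = String.lowercase_ascii (String.trim s) in
  if s <> "" && all_tag_chars s then Some s else None

(* Split on spaces, normalise, and dedupe preserving first-seen order. *)
let extract_tags line =
  String.split_on_char ' ' line
  |> List.filter_map normalise_tag
  |> List.fold_left
       (fun acc t -> if List.mem t acc then acc else acc @ [ t ])
       []

let () =
  extract_tags "OCaml odoc  ocaml bad_tag" |> List.iter print_endline
```

Ill-formed words such as `bad_tag` are dropped silently rather than reported, which matches the reject-on-mismatch behaviour described above.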
2020+2121+{1 [@figure]}
2222+2323+Two forms coexist.
2424+2525+{b V1 (attribute-only, plain-text caption):}
2626+2727+{@shell[
2828+@figure sherlodoc-search.png alt="Sherlodoc's search UI" link="https://github.com/art-w/sherlodoc"
2929+]}
3030+3131+@figure sherlodoc-search.png alt="Sherlodoc's search UI (V1, plain caption)" link="https://github.com/art-w/sherlodoc"
3232+3333+{b V3 (inline-AST caption with rich formatting):} the body carries a
3434+[{&linked-image URL SRC "alt"}] or [{&image SRC "alt"}] custom inline,
3535+and the remaining inlines become a {i richly-formatted} figcaption:
3636+3737+{@shell[
3838+@figure {&linked-image https://github.com/art-w/sherlodoc sherlodoc-search.png "Sherlodoc"} \
3939+produced by the {i sherlodoc} team.
4040+]}
4141+4242+@figure {&linked-image https://github.com/art-w/sherlodoc sherlodoc-search.png "Sherlodoc search UI"} produced by the {i sherlodoc} team — note the {b bold} and {i italics} work inside the caption.
4343+4444+{1 [{&name payload}] custom inlines}
4545+4646+Press {&kbd Ctrl-K} to open the search box. The tag here was written
4747+as [{&kbd Ctrl-K}] and the plugin turned it into a [<kbd>] element.
4848+4949+A margin note {&margin This is a side note. It floats to the right
5050+and the paragraph wraps around it.} floats to the right of the
5151+paragraph. This is a sizeable block of prose designed to wrap around
5252+the margin note and give it enough vertical space to show properly.
5353+Without the float, the note would sit inline — which is what it did
5454+in the first cut of this plugin. With the float, you get a proper
5555+sidenote experience on wide screens. On a narrow viewport it collapses
5656+to a block-quote style above the next element. Same syntax — [{&margin
5757+note text}] — same plugin registered under [margin], just better CSS.
5959+Two plugins, each a few lines of OCaml. Here is [kbd]; [escape] is the HTML-escaping helper it shares with [margin]:
6060+6161+{@ocaml[
6262+module Kbd = struct
6363+ let prefix = "kbd"
6464+ let to_html s = Printf.sprintf {|<kbd>%s</kbd>|} (escape s)
6565+end
6666+6767+let () = Api.Registry.register_inline (module Kbd)
6868+]}
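
The snippet calls an `escape` helper that it does not define. A minimal sketch of such a helper, modelled on the one in the plugin source, is shown here; `kbd_html` is a hypothetical free-standing name used only for illustration.

```ocaml
(* Replace the four characters that are unsafe in HTML text and
   double-quoted attribute values. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '"' -> Buffer.add_string b "&quot;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Hypothetical stand-alone equivalent of the Kbd plugin's to_html. *)
let kbd_html payload = Printf.sprintf "<kbd>%s</kbd>" (escape payload)

let () = print_endline (kbd_html "Ctrl-<K>")
```

Because the handler's return value is spliced into the page verbatim, escaping the payload inside the handler is what keeps author-supplied text from being interpreted as markup.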
6969+7070+{1 Sherlodoc mld indexing}
7171+7272+Searching for {b parseff}, {b sherlodoc}, or any other word that
7373+appears only in a [.mld] page now returns the page itself as a hit.
7474+Before this patch only API items (values, types, modules) were
7575+searchable; page prose was dropped on the floor by the sherlodoc
7676+indexer. One-line change in [load_doc.ml]. Per-heading anchors with
7777+fragment URLs are Phase 2.
7878+7979+{1 What's next}
8080+8181+- Cross-page tag index ([@tagged-pages <tag>]).
8282+- [{&margin …}] promoted from inline chip to real gutter placement.
8383+- [@figure] V3 using the inline-AST caption pattern, enabled by the
8484+ new custom-inline hook.
8585+- Sherlodoc Phase 2: per-heading entries with deep-link anchors.
···11+{0 Tags}
22+33+Pages grouped by tag. Each tag page lists posts that carry it.
44+55+{ul
66+{- {{!page-ocaml}ocaml}}
77+{- {{!page-odoc}odoc}}
88+{- {{!page-plugins}plugins}}
99+{- {{!page-meta}meta}}
1010+}
+6
site/tags/meta.mld
···11+{0 Tag: meta}
22+33+Pages tagged [meta].
44+55+{i (An auto-generated list of pages carrying this tag will live here
66+once the [@tagged-pages] consumer extension ships.)}
+6
site/tags/ocaml.mld
···11+{0 Tag: ocaml}
22+33+Pages tagged [ocaml].
44+55+{i (An auto-generated list of pages carrying this tag will live here
66+once the [@tagged-pages] consumer extension ships.)}
+6
site/tags/odoc.mld
···11+{0 Tag: odoc}
22+33+Pages tagged [odoc].
44+55+{i (An auto-generated list of pages carrying this tag will live here
66+once the [@tagged-pages] consumer extension ships.)}
+6
site/tags/plugins.mld
···11+{0 Tag: plugins}
22+33+Pages tagged [plugins].
44+55+{i (An auto-generated list of pages carrying this tag will live here
66+once the [@tagged-pages] consumer extension ships.)}