CROHME, MathWriting, and typeset_train manifests store bare math expressions.
Because their images are rendered as display math ($ expr $), the training
targets should be valid Typst in the same form. Wrap them in load_records(),
keyed on a _MATH_ONLY_SPLITS set, rather than touching the manifests. Mixed
splits already contain full body content.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
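
A minimal sketch of the load-time wrapping described in this commit. Only the names load_records() and _MATH_ONLY_SPLITS come from the message; the split-name spellings, the load_records() signature, the "target" record key, and the wrap_display_math() helper are assumptions made for illustration.

```python
# Assumed split identifiers; the real set may use different spellings.
_MATH_ONLY_SPLITS = {"crohme", "mathwriting", "typeset_train"}


def wrap_display_math(expr: str) -> str:
    """Wrap a bare expression as Typst display math: `$ expr $`."""
    return f"$ {expr.strip()} $"


def load_records(split: str, records: list[dict]) -> list[dict]:
    """Return copies of the records, wrapping targets for math-only splits.

    The manifests themselves are never modified; the wrapping happens
    only on the in-memory copies. The "target" key is hypothetical.
    """
    if split not in _MATH_ONLY_SPLITS:
        # Mixed splits already hold full body content.
        return [dict(r) for r in records]
    return [{**r, "target": wrap_display_math(r["target"])} for r in records]
```

Doing the wrap at load time keeps the manifests identical to their upstream form, so the display-math convention can change later without regenerating them.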
_styled() now accepts an underline param, rendered as #underline[...] wrapped
around any bold/italic markup. Bold, italic, and underline are each applied
independently with 25% probability (down from 30% for bold/italic).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
generate_typeset.py:
- _polynomial(): explicit (a x^3 + b x^2 + c x + d), indexed (a_0 + a_1 x + ... + a_n x^n),
  monic general form; variable pool includes operator letters T, A, D, L, X, S
- _schematic_matrix(): generic n×m with dots.c / dots.v / dots.down ellipsis
- New 0.73–0.76 probability slots (stole 3% from product/df/dx/partial)
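
For illustration, a generic n×m matrix with the three ellipsis symbols might be emitted as below. This is a sketch, not the code in generate_typeset.py: the function name, its parameters, and the a_(i j) entry notation are all assumptions.

```python
def schematic_matrix(sym: str = "a", rows: str = "m", cols: str = "n") -> str:
    """Build a Typst `mat(...)` body showing only the corner entries.

    dots.c fills rows, dots.v fills columns, dots.down fills the diagonal.
    """
    def entry(i, j) -> str:
        # Typst subscript group, e.g. a_(1 n)
        return f"{sym}_({i} {j})"

    top = f"{entry(1, 1)}, dots.c, {entry(1, cols)}"
    mid = "dots.v, dots.down, dots.v"
    bottom = f"{entry(rows, 1)}, dots.c, {entry(rows, cols)}"
    return f"mat({top}; {mid}; {bottom})"
```

The returned string is a bare math expression, so it would still need the `$ ... $` wrapper before being compiled as display math.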
generate_mixed.py:
- _multi_paragraph(): 2–4 inline seqs separated by blank lines (~15% of bodies)
- _para_then_list() / _list_then_para(): intro-para+list and list+outro-para (~10%)
- All reflowable bodies (multi-para, lists, mixed) get a random fixed width from
  _PARA_WIDTHS = [200, 240, 280, 320, 360, 420, 480]pt, covering narrow two-column
  journal layouts through wide single-column; tables stay width: auto
- generate_body() returns (body, page_width) tuple; _CONTENT_TEMPLATE parameterized
  with {width}; render_content() accepts page_width arg
- Image hash keyed on "width:body" to avoid collisions across reflow variants
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>