Add handwriting-font data pipeline; expand body grammar; drop typeset_* splits

- src/download_hw_fonts.py: downloads 6 Google Fonts TTFs, strips WOFF wrappers,
instantiates variable fonts at wght=400 for full character coverage
- src/generate_handwritten.py: hw mode (whole-doc font) and mix mode (per-block
font mixing); uniform sampling over 7 fonts (the 6 downloaded plus the Typst
default); manifest records the clean body (no font directives)
- src/generate_mixed.py: expand generate_body with 18% bare math and 15% short
inline (1-2 tokens); lower the complex-structured weight; minimum complexity
is now n=1
- src/data.py: replace typeset_* splits with hw_structured_* and hw_mixed_*;
update val sampling to use VAL_SPLITS
- src/train.py: fix val loading to use VAL_SPLITS from data.py; move import to
top level
- pyproject.toml: add generate-hw and download-hw-fonts entry points

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>