commits
Randomly inverts 50% of handwriting images during training to expose the
model to dark-background variants. Gated to _HANDWRITING_SPLITS only --
typeset splits are excluded because color-only-distinct emojis (colored
circles/squares) lose discriminative information under RGB inversion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
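The gating above can be sketched as a pure function (split names and the p=0.5 default are illustrative stand-ins, not the project's actual constants):

```python
import random

_HANDWRITING_SPLITS = {"mathwriting_train", "mathwriting_val"}  # illustrative names

def maybe_invert(pixels, split, p=0.5, rng=random):
    """Invert a flat list of 0-255 grayscale values with probability p,
    but only for handwriting splits; typeset splits pass through untouched."""
    if split in _HANDWRITING_SPLITS and rng.random() < p:
        return [255 - v for v in pixels]
    return pixels
```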
- _bg_color: sample image corners to detect background color
- _find_blocks: adaptive ink threshold for dark/light mode
- _transform_patch, _region_jitter, _augment: use bg color for all
canvas fills instead of hardcoded white; elastic transforms now use
pad-apply-crop to avoid white-fill artifacts
- augment_vis: simplify to single _augment call matching training exactly
- Update DVC manifests for regenerated typeset splits
- Expand _EMOJI pool with directional arrows, pin, food items
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
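The corner-sampling idea behind _bg_color can be sketched like this, operating on a plain 2-D list of grayscale rows rather than a PIL image (the k×k patch size and the median reduction are assumptions):

```python
def bg_color(img, k=3):
    """Estimate the background gray level by sampling the four k*k corner
    patches and taking the median, so a dark-mode page yields a dark fill."""
    h, w = len(img), len(img[0])
    samples = []
    for ys in (range(k), range(h - k, h)):
        for xs in (range(k), range(w - k, w)):
            samples.extend(img[y][x] for y in ys for x in xs)
    samples.sort()
    return samples[len(samples) // 2]
```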
- Extract all magic numbers into named AUG_* constants
- Fix affine crop to account for rotation corner shift and scale expansion
- Fix perspective transform clipping (pad before, crop after)
- Add _find_blocks, _transform_patch, _region_jitter for per-block jitter
- Cap per-block dy to half the inter-block gap to prevent overlap
- Soften patch elastic distortion (alpha 6→4 lowers amplitude, sigma 3→5 smooths the field)
- Content-type detection: lists get AUG_JITTER_LIST_MAX_DX=10 vs 40 for prose
- Increase ruled-line opacity (28-55→60-110) and probability (20%→30%)
- augment-vis: show orig / aug / aug+jitter columns; save NN_typst.txt;
print list detection tag and 80-char typst preview
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
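The per-block dy cap amounts to a small clamp (a sketch; the real jitter code presumably derives the gaps from _find_blocks output):

```python
def cap_dy(dy, gap_above, gap_below):
    """Clamp a block's vertical jitter so it can move at most half-way
    into the whitespace separating it from either neighbour."""
    return max(-gap_above / 2, min(gap_below / 2, dy))
```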
Adds a prose-only synthetic split (typeset_prose_{train,val,test}) rendered
in handwriting-style fonts with no math delimiters, addressing the prior bias
P(Math | Handwritten) ≈ 1 learned from math-only real handwriting datasets.
Also extends _augment() with ElasticTransform (baseline wobble) and
RandomPerspective (photographed-paper effect), plus optional ruled-line
overlay for notebook paper simulation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
modeling_deepseekocr2.py is the upstream model package with one local patch:
in the eval_mode branch of infer(), pass attention_mask=torch.ones_like(input_ids)
to generate() to suppress the spurious warning caused by pad_token_id == eos_token_id,
and reuse the already-computed _input_ids_cuda tensor in the decode step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Bump render PPI 150->250 in generate_typeset.py; increase margins y:8->12pt
- Rename typeset_structured_* splits to typeset_uniform_* (mode=uniform vs mixed)
- Consolidate generate_handwritten.py into generate_typeset.py; drop dead
generate_mixed.py and generate_handwritten.py entrypoints
- Regenerate typeset_uniform (10k train) and typeset_mixed (20k train) at new PPI
- Strengthen PROMPT: require raw Typst output, explicitly forbid LaTeX
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- LICENSE: 0BSD for project code and dataset generation scripts
- data/fonts/LICENSE-OFL-1.1.txt: covers Comic Neue, Gochi Hand, Handlee, Oswald, Dancing Script
- data/fonts/LICENSE-Apache-2.0.txt: covers Special Elite (Astigmatic)
- data/fonts/README.md: copyright notices and designer credits
- data/fonts/: flatten from data/fonts/handwriting/ (removed unnecessary nesting)
- src/download_hw_fonts.py, src/generate_handwritten.py: update default font path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- data/: remove typeset_train/val/test and typeset_mixed_* DVC tracking (old single-equation
and font-homogeneous splits); add typeset_structured_* and typeset_mixed_* DVC files
(renamed from hw_*; font-diverse renders with expanded body grammar)
- data/fonts/handwriting/: commit TTF files directly (33-172KB each)
- src/data.py: update split names to typeset_structured_* / typeset_mixed_*
- pyproject.toml: pin unsloth==2026.4.5 (2026.4.4 dropped gemma-4-E2B-it support)
- uv.lock: update accordingly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- src/download_hw_fonts.py: downloads 6 Google Fonts TTFs, strips WOFF wrappers,
instantiates variable fonts at wght=400 for full character coverage
- src/generate_handwritten.py: hw mode (whole-doc font) and mix mode (per-block
font mixing); 7-way uniform font sampling including Typst default; manifest
records clean body (no font directives)
- src/generate_mixed.py: expand generate_body -- add 18% bare math, 15% short
inline (1-2 tokens); reduce complex structured weight; min complexity now n=1
- src/data.py: replace typeset_* splits with hw_structured_* and hw_mixed_*;
update val sampling to use VAL_SPLITS
- src/train.py: fix val loading to use VAL_SPLITS from data.py; move import to
top level
- pyproject.toml: add generate-hw and download-hw-fonts entry points
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- train.py: remove dedupe, port per-split caps from DeepSeek script, add
argparse (--epochs, --lr, --output-dir, --cap), default output dir to
checkpoints/gemma-4-e2b, add tensorboard logging
- eval_deepseek.py: --n is now per-split cap (stratified sampling) instead
of a head slice across all splits combined
- data.py: add mathwriting_val/mathwriting_test to _MATH_ONLY_SPLITS so
bare-expression labels get $ ... $ wrapping at eval time
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
collate_deepseek: pass zeros for local_crops so forward() takes the
global-only branch (145 features), not local+global (289 features).
The old code passed the real image for both slots, causing masked_scatter_
to inject local-crop features and discard the global view entirely.
Also: add eval_deepseek, mine_failures, infer_debug, train_hnm scripts;
add split tracking to data.py; save every 250 steps (keep 10); backfill
TensorBoard epoch + learning_rate scalars.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- report_to=["tensorboard"] for live training run metrics
- Add tensorboard>=2.20.0 and setuptools==81.0.0 deps (setuptools 82
removed pkg_resources which tensorboard 2.20 still imports)
- src/backfill_tb.py: parse stdout training log and write TB events
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
trl 0.15 with report_to='none' suppresses stdout logging.
Empty list keeps external reporters off while preserving console output.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
batch=2, grad_accum=4 keeps effective batch 8 but improves GPU utilization.
~5.2 GB VRAM headroom at batch=1 should accommodate the extra activations.
Seed 42 -> 29979 for train/val sampling.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Val: typeset_val now sampled to 250 (was all ~1000), matching mathwriting_val.
Total val: ~1000 (250 mathwriting + 250 typeset + 500 mixed).
README: add effective training mix table with caps applied, validation
table with exact sample counts, and test split listing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
No longer Gemma 4 exclusive -- supports DeepSeek-OCR-2 and future backends.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- requires-python >= 3.12, .python-version pinned to 3.12
- Default epochs 3 -> 1 (~59h on RTX 3060 at current step time)
- Cap mathwriting_train at 10k (was 42k; real but over-represented)
- Cap mathwriting_synthetic at 20k, crohme_gen_2019 at 15k (unchanged)
- Total training samples: ~91k, ~11.4k optimizer steps per epoch
- Keep eager attention (MLA absorption trick is already efficient;
flash-attn source compile is too heavy for available hardware)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
README: detailed write-up of all training splits, label conventions, known
generalization gaps, and validation composition.
Training: cap mathwriting_synthetic→20k and crohme_gen_2019→15k so synthetic
data doesn't dominate. typeset_mixed rises to ~16%, real mathwriting becomes
the dominant handwriting source at 34%. Total: ~124k samples.
Validation: all typeset_val (1k) + all typeset_mixed_val (500) + 250 sampled
from mathwriting_val = 1,750 total.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BASE/PATCHES shape prints fired on every forward pass during training.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two patches to modeling_deepseekocr2.py for training compatibility:
1. .clone(): breaks autograd leaf-variable link so masked_scatter_ on
inputs_embeds slice doesn't raise during backprop
2. .to(bfloat16): matches vision encoder dtype (prepare_model_for_kbit_training
upcasts embedding table to fp32; vision encoder stays bfloat16)
train_deepseek.py now imports DeepseekOCR2ForCausalLM directly from the local
src/deepseek_ocr2 module instead of trust_remote_code -- weights still fetched
from hub, only the forward() code is local and version-controlled.
Smoke test: forward OK (loss 16.85), backward OK.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copied from deepseek-ai/DeepSeek-OCR-2 commit aaa02f38.
No modifications -- patches for training compatibility follow in the next commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
4-bit NF4 + LoRA r=16 targeting MLA attention and MLP layers.
Freezes SAM vision encoder (already strongly pretrained).
Custom DeepSeekTrainer subclass moves list-of-tuple images to device.
Smoke-test flag validates forward+backward before full training run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
src/collate_deepseek.py:
- Letterbox-pads images to 768×768 (gray fill, mean=0.5), normalizes
- Inserts 145 image tokens (12²+1) at sequence start
- Masks image+prompt prefix with -100; trains on response+EOS only
- Builds images_seq_mask and images_spatial_crop for forward()
- crop_mode=False: single global view, same tensor for both tuple slots
(TODO: validate images tuple format against model source)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
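The letterbox step can be sketched on a plain 2-D grayscale list (fill=128 approximates the mean=0.5 gray after /255 normalization; real code would also resize the long side to 768 first):

```python
def letterbox(img, size=768, fill=128):
    """Centre img on a size*size canvas filled with mid-gray.
    Assumes img already fits within the canvas."""
    h, w = len(img), len(img[0])
    top, left = (size - h) // 2, (size - w) // 2
    canvas = [[fill] * size for _ in range(size)]
    for y, row in enumerate(img):
        canvas[top + y][left:left + w] = row
    return canvas
```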
CROHME, MathWriting, and typeset_train manifests store bare math expressions.
Since images render as display math ($ expr $), training targets should be
valid Typst -- wrap at load_records() via _MATH_ONLY_SPLITS set rather than
touching manifests. Mixed splits already contain full body content.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
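The load-time wrapping amounts to the following (split names here are an illustrative subset of _MATH_ONLY_SPLITS, not the full set):

```python
_MATH_ONLY_SPLITS = {"typeset_train", "mathwriting_train", "crohme_gen_2019"}  # illustrative subset

def wrap_label(split, label):
    """Manifests for math-only splits store bare expressions; wrap them as
    Typst display math at load time so targets match the rendered images."""
    return f"$ {label} $" if split in _MATH_ONLY_SPLITS else label
```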
_styled() now accepts underline param, rendering as #underline[...] wrapping
any bold/italic markup. Bold, italic, and underline are each applied
independently at 25% (down from 30% for bold/italic).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
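A sketch of the independent styling rolls in Typst markup (the exact signature of _styled is assumed):

```python
import random

def styled(text, rng=random, p=0.25):
    """Apply bold, italic, and underline independently, each with prob p.
    Underline wraps whatever bold/italic markup was already applied."""
    if rng.random() < p:
        text = f"*{text}*"              # Typst strong
    if rng.random() < p:
        text = f"_{text}_"              # Typst emphasis
    if rng.random() < p:
        text = f"#underline[{text}]"
    return text
```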
generate_typeset.py:
- _polynomial(): explicit (a x^3 + b x^2 + c x + d), indexed (a_0 + a_1 x + ... + a_n x^n),
monic general form; variable pool includes operator letters T, A, D, L, X, S
- _schematic_matrix(): generic n×m with dots.c / dots.v / dots.down ellipsis
- New 0.73–0.76 probability slots (stole 3% from product/df/dx/partial)
generate_mixed.py:
- _multi_paragraph(): 2–4 inline seqs separated by blank lines (~15% of bodies)
- _para_then_list() / _list_then_para(): intro-para+list and list+outro-para (~10%)
- All reflowable bodies (multi-para, lists, mixed) get random fixed width from
_PARA_WIDTHS = [200, 240, 280, 320, 360, 420, 480]pt to cover narrow two-column
journal through wide single-column; tables stay width: auto
- generate_body() returns (body, page_width) tuple; _CONTENT_TEMPLATE parameterized
with {width}; render_content() accepts page_width arg
- Image hash keyed on "width:body" to avoid collisions across reflow variants
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tall math (sqrt of integrals, underbrace labels, nested radicals) can
exceed Typst's default em-based list spacing and visually overlap the
item above. Increase list and enum spacing to 1.0em.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
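The fix presumably amounts to a pair of Typst set rules along these lines (the 1.0em value is from the commit; placement in the template is assumed):

```typst
#set list(spacing: 1.0em)
#set enum(spacing: 1.0em)
```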
Replaces hardcoded 2x2/3x3 matrix branch with a shape table covering
1x2, 1x3, 1x4 (row vectors), 2x2, 2x3, 3x2, 3x3. Column vectors
excluded since vec() already handles those. Shape is chosen via weighted
sample; rows assembled generically so adding new shapes is trivial.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
generate_typeset.py:
- Add _CALLIGRAPHIC pool; cal(X) appears in _atom() at ~10% and cal(P)(expr) in tail
- Add phi.alt, epsilon.alt, theta.alt (LaTeX \varphi, \varepsilon, \vartheta variants)
- Add logic/sequent block (3%): tack.r, and/or/=>/<=>/xor, not, models, top/bot, type judgments
- Add vec(a,b) / vec(a,b,c) column vectors
- Add underbrace/overbrace/underbracket/overbracket/underparen/overparen with atom labels
- Add bra-ket Dirac notation: lr(|ψ⟩), lr(⟨ψ|), lr(⟨φ|ψ⟩), lr(⟨A⟩)
- Add intervals via lr(): closed [a,b], open (a,b), half-open [a,b) and (a,b]
- 3x3 matrices (35% of matrix branch, was 2x2 only)
- Cartesian product type signatures: f: A×B→C, f: A×B×C→D
- _atom() now includes calligraphic letters
generate_mixed.py:
- Add _generate_table(): 15% of bodies; 2-4 cols, 2-5 rows; cell complexity
inversely proportional to table size; delegates to _inline_seq for unified
content (inherits emoji, math/text mix automatically)
- Table inset (x:5pt, y:7pt) to prevent underbrace/overbrace label clipping
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces `d` with `dif` (Typst differential operator) in two contexts:
1. Derivative fractions: (d A)/(d B) and operator form d/(d z)
2. Integral differentials: d-tokens following an `integral` keyword
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
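A simplified version of the rewrite (the real rules are context-aware; this sketch only handles single-letter variables):

```python
import re

def fix_differentials(expr):
    """Replace derivative-fraction and integral `d` tokens with Typst `dif`."""
    # derivative fractions: (d A)/(d B) -> (dif A)/(dif B)
    expr = re.sub(r"\(d ([A-Za-z])\)/\(d ([A-Za-z])\)", r"(dif \1)/(dif \2)", expr)
    # differentials following an integral keyword: ... d x -> ... dif x
    if "integral" in expr:
        expr = re.sub(r"\bd ([a-z])\b", r"dif \1", expr)
    return expr
```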
- Web app search supports `-term` exclusion (e.g. `integral -dif`)
- search-labels CLI gains `--exclude` flag (repeatable)
- /items returns total count; UI shows it next to search box and in load-more button
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Needed for patterns containing non-word chars like ^prime.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
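Presumably the fix runs user patterns through re.escape before compiling, e.g.:

```python
import re

pattern = "^prime"                    # ^ is a regex anchor if left unescaped
rx = re.compile(re.escape(pattern))   # now matches the literal text ^prime
assert rx.search("x^prime + y")                        # literal match found
assert re.compile(pattern).search("x^prime + y") is None  # anchor: no match
```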
uv run search-labels gt
uv run search-labels prime --replace "'" --dry-run
uv run search-labels gt --replace >
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- src/review_app.py: FastAPI app serving a browser UI for paging through
dataset items, marking reviewed/flagged, and editing labels inline.
State persists in data/review.db (SQLite, git-ignored). Items table
reloads from manifests each startup; reviews/flags/edits are durable.
Cursor-based pagination, server-side filtering (pending/reviewed/
flagged/edited/all), per-split stats in toolbar.
- src/static/review.html: vanilla JS frontend. All interactions go
through an `actions` object; keyboard shortcut hook is stubbed out
(commented) for easy wiring later.
- src/apply_edits.py: CLI to flush edits table -> manifest.jsonl files,
with --dry-run and per-split filtering. Prints dvc add / git commit
instructions on completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
src/probe_deepseek.py adds quantization support (--bits 4/8/16),
integrates with src.data.load_records/TEST_SPLITS and src.eval.normalize.
pyproject.toml entrypoint already pointed here.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Copied CROHME+MathWriting raster splits from eff-mer with clean names
- Renamed data/typeset -> data/typeset_train for consistency
- Unified RASTER_ROOT+TYPESET_ROOT into single DATA_ROOT in src/data.py
- typeset_train now in TRAIN_SPLITS directly; train.py simplified accordingly
- DVC initialized; all 14 splits tracked as .dvc pointer files
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PEP 723 inline-dep scripts for isolated transformers==4.46.3 env:
- scripts/probe_deepseek.py: inference probe
- scripts/train_deepseek.py: QLoRA fine-tuning on Typst OCR data
Also adds addict, matplotlib, einops, easydict to pyproject deps
and a (non-functional) probe-deepseek entry point stub.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
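PEP 723 inline metadata pins each script's environment in a comment header; a minimal sketch (only the transformers pin is from the commit):

```python
# /// script
# dependencies = ["transformers==4.46.3"]
# ///
```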
Canvas-based frontend for drawing math expressions and getting Typst
output from the model. Includes probe.py cleanup and eff_mer/infer.py
whitespace normalization fix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copies eff_mer package (encoder, decoder, vocab, data, train, infer)
into src/eff_mer/ and adds eff-mer-evaluate entrypoint. Paths updated
to resolve eff-mer data relative to sibling repo location.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>