Fix DeepSeek-OCR-2 for training; load from local patched module
Two patches to modeling_deepseekocr2.py for training compatibility:
1. .clone(): detaches inputs_embeds from its autograd leaf status, so the
   in-place masked_scatter_ on a slice of inputs_embeds doesn't raise
   during backprop
2. .to(bfloat16): matches the vision encoder dtype
   (prepare_model_for_kbit_training upcasts the embedding table to fp32,
   while the vision encoder stays in bfloat16)
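
A minimal standalone sketch of the pattern behind both patches, not the
actual diff. Assumptions: the embedding output is a leaf tensor that requires
grad (emulated here with requires_grad_), the cast is applied to
inputs_embeds, and the names embed_tokens, merge, image_mask and vision_feats
are hypothetical stand-ins.

```python
import torch

torch.manual_seed(0)

# Frozen fp32 embedding table (assumed stand-in for the k-bit-prepared model).
embed = torch.nn.Embedding(16, 8)
embed.weight.requires_grad_(False)

input_ids = torch.tensor([[1, 2, 3, 4]])
image_mask = torch.tensor([False, True, True, False])  # positions holding image tokens
# Assumed stand-in for bfloat16 vision-encoder output (2 image tokens, dim 8).
vision_feats = torch.randn(2, 8, dtype=torch.bfloat16, requires_grad=True)


def embed_tokens(ids):
    # Emulates an embedding output that is a leaf tensor requiring grad.
    return embed(ids).requires_grad_(True)


def merge(patched):
    inputs_embeds = embed_tokens(input_ids)
    if patched:
        # Patch 1: clone() yields a non-leaf tensor, so in-place writes are allowed.
        # Patch 2: cast to bfloat16 so the dtype matches vision_feats.
        inputs_embeds = inputs_embeds.clone().to(torch.bfloat16)
    # In-place scatter of vision features into a slice of inputs_embeds.
    inputs_embeds[0].masked_scatter_(image_mask.unsqueeze(-1), vision_feats)
    return inputs_embeds


out = merge(patched=True)
out.float().sum().backward()  # gradients reach vision_feats
```

In this sketch merge(patched=False) raises a RuntimeError at the in-place
call, while the patched path runs forward and backward.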
train_deepseek.py now imports DeepseekOCR2ForCausalLM directly from the local
src/deepseek_ocr2 module instead of relying on trust_remote_code. Weights are
still fetched from the hub; only the forward() code is local and
version-controlled.
Smoke test: forward OK (loss 16.85), backward OK.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>