Optimize Zig port: remove @constCast, f64 inference fast path, readability improvements

- Eliminate all @constCast by changing [][]const *Value to [][]*Value
- Add f64 fast path for inference (doLinearF64, doSoftmaxF64, doRmsnormF64, doGptF64)
- Break up dense Mersenne Twister expressions into readable intermediates
- Use std.mem.swap in mtShuffle
- Inline the sum in doLinear to avoid a temporary allocation per row
- Pre-size ArrayLists (tokens, losses) where lengths are known
- Shrink Value.gen from u64 to u32
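
For illustration only, here is roughly what an f64 inference helper of the kind listed above could look like. This is a minimal sketch, not the actual doSoftmaxF64 body (which this message does not show). The name softmaxF64Sketch and the in-place slice signature are assumptions. The idea is that inference works on plain f64 values and so skips building *Value graph nodes.

```zig
const std = @import("std");

/// Hypothetical sketch: numerically stable softmax over plain f64 values,
/// computed in place. No allocation and no autograd nodes are involved.
fn softmaxF64Sketch(xs: []f64) void {
    if (xs.len == 0) return;

    // Subtract the maximum before exponentiating so @exp cannot overflow.
    var max_val: f64 = xs[0];
    for (xs[1..]) |x| max_val = @max(max_val, x);

    // Exponentiate in place and accumulate the normalizer.
    var sum: f64 = 0.0;
    for (xs) |*x| {
        x.* = @exp(x.* - max_val);
        sum += x.*;
    }

    // Normalize so the entries sum to 1.
    for (xs) |*x| x.* /= sum;
}

pub fn main() void {
    var logits = [_]f64{ 1.0, 2.0, 3.0 };
    softmaxF64Sketch(&logits);
    std.debug.print("{d:.4} {d:.4} {d:.4}\n", .{ logits[0], logits[1], logits[2] });
}
```

Writing into the caller's slice keeps the sketch allocation-free, in the same spirit as the doLinear change above.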