ghash_pure: constant-time GF(2^128) multiplication
Replace the bit-by-bit implementation with a constant-time shift-and-XOR loop.
All 128 iterations execute identical operations regardless of input:
- Conditional XOR uses an arithmetic mask, (-bit) land 0xff (0xff when
  bit = 1, 0x00 when bit = 0), instead of a branch
- Reduction polynomial XOR uses the same masking technique
- No lookup tables, no data-dependent memory access
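
A minimal sketch of the loop described above, not the code in this change.
It assumes 16-byte blocks in GCM bit order (bit 0 is the most significant
bit of byte 0) and the GCM reduction byte 0xe1. The name gf128_mul and the
string-based interface are hypothetical.

```ocaml
(* Hypothetical helper: multiply two 16-byte blocks in GF(2^128), GCM bit
   order. Every one of the 128 iterations does the same work; the only
   data-dependent values are the masks, never a branch or an index. *)
let gf128_mul (x : string) (y : string) : string =
  if String.length x <> 16 || String.length y <> 16 then
    invalid_arg "gf128_mul: expected 16-byte blocks";
  let z = Bytes.make 16 '\000' in   (* accumulator Z *)
  let v = Bytes.of_string y in      (* running multiple V of y *)
  for i = 0 to 127 do
    (* Bit i of x, most significant bit first. The index depends only on
       the loop counter, not on the data. *)
    let bit = ((Char.code x.[i lsr 3]) lsr (7 - (i land 7))) land 1 in
    let mask = (-bit) land 0xff in  (* 0xff if bit = 1, else 0x00 *)
    (* Z <- Z xor (V and mask): the conditional XOR without a branch. *)
    for j = 0 to 15 do
      let zj = Char.code (Bytes.get z j) in
      let vj = Char.code (Bytes.get v j) in
      Bytes.set z j (Char.chr (zj lxor (vj land mask)))
    done;
    (* Remember the bit that is about to be shifted out of V. *)
    let lsb = (Char.code (Bytes.get v 15)) land 1 in
    (* V <- V >> 1 across all 16 bytes. *)
    let carry = ref 0 in
    for j = 0 to 15 do
      let b = Char.code (Bytes.get v j) in
      Bytes.set v j (Char.chr ((b lsr 1) lor (!carry lsl 7)));
      carry := b land 1
    done;
    (* Masked reduction: XOR 0xe1 into the top byte iff a bit fell off. *)
    let rmask = (-lsb) land 0xff in
    let v0 = Char.code (Bytes.get v 0) in
    Bytes.set v 0 (Char.chr (v0 lxor (0xe1 land rmask)))
  done;
  Bytes.to_string z
```

In this bit order the multiplicative identity is the block 0x80 followed
by fifteen zero bytes, so gf128_mul one h should return h unchanged. The
sketch only shows the masking structure. Whether the compiled code is
actually constant-time depends on the compiler and target, which the
source-level structure alone does not guarantee.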
Reference: BearSSL ctmul technique (Thomas Pornin)