crypto: fix portability and security bugs in C primitives

Address issues identified in a hardware/security audit pass over
src/c/. The changes are correctness, hardening, and portability
fixes plus one macro rename; no behavioural changes for callers.

Correctness:
- detect_cpu_features.c: fix the <sys/auxval.h> -> <sys/auxv.h>
include typo. The misspelled header doesn't exist on
glibc/musl/bionic, so the Linux ARM64 detection branch was
unbuildable. Also remove the entire ARM64 detection block (and
the arm_aes/arm_pmull struct fields): no ARM AES path consumes
them, and shipping dead detection code misled audits. ARM
AES-CE/PMULL acceleration is filed as future work.
- misc.c, misc_sse.c (xor_into): the destination side of the 64-bit
and 32-bit XOR loops was read and written through a raw cast of
a possibly-unaligned pointer; only the source went through
memcpy. A misaligned access like that is undefined behaviour in
C and traps on strict-alignment targets (SPARC, MIPS, ARM with
alignment checking, etc.). Use memcpy on both sides; GCC and
Clang compile the fixed-size memcpy to a plain load/store on x86.
- misc.c (_mc_count_16_be_4): cast a uint64_t* to uint32_t* and
dereferenced through both, a strict-aliasing violation that
compilers may miscompile once strict-aliasing optimizations are
enabled (-O2 and above on GCC). Replace with a uint32_t[4]
working buffer + memcpy.
- crypto.h (_mc_switch_accel): wrap the multi-statement macro in
do { } while (0) so that it expands to a single statement and
cannot mis-pair an if/else when used as
"if (cond) _mc_switch_accel(...)". Add the now-required trailing
semicolons at every call site.
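For reference, a minimal sketch of the memcpy-on-both-sides pattern
described for xor_into above. The function name, argument order, and
loop structure are assumptions for illustration, not the code in
misc.c; the point is only that neither pointer is ever dereferenced
as a wider type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for xor_into: dst[i] ^= src[i] for n bytes.
 * Neither pointer is assumed to be aligned. */
static void xor_into_sketch(const uint8_t *src, uint8_t *dst, size_t n)
{
    /* 64-bit chunks: load both sides with memcpy, XOR in registers,
     * then store the result back with memcpy. */
    while (n >= sizeof(uint64_t)) {
        uint64_t s, d;
        memcpy(&s, src, sizeof s);
        memcpy(&d, dst, sizeof d);
        d ^= s;
        memcpy(dst, &d, sizeof d);
        src += sizeof s;
        dst += sizeof d;
        n -= sizeof s;
    }
    /* Remaining tail, one byte at a time. */
    while (n--)
        *dst++ ^= *src++;
}
```

Because each memcpy has a constant size, optimizing compilers
typically lower it to a single load or store where the target allows
unaligned access, and to byte accesses where it does not.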

Hardening:
- crypto.h: add mc_secure_bzero, an optimization-resistant
memset(0) for cryptographic stack residue. Uses the standard
GCC asm memory-clobber barrier on GCC/Clang and a volatile loop
fallback elsewhere.
- aes_aesni.c: wipe the 256-byte schedule[] stack array in
_mc_aesni_derive_e_key and the 240-byte rk[] stack array in
_mc_aesni_invert_e_key before return. Both held expanded
round-key material that previously leaked into stack residue.
- aes_generic.c: wipe the 960-byte sk_exp, q[8], and w[16]
working buffers in _mc_ct64_enc_blocks and _mc_ct64_dec_blocks.
- config/cfg.ml: add -fstack-protector-strong and
-D_FORTIFY_SOURCE=2 to the C compilation flags. This is standard
hardening for cryptographic libraries: no runtime cost on the
hot paths, and stack smashes or overflowing libc buffer calls
that previously went undetected now abort. Note that
_FORTIFY_SOURCE only takes effect when optimization is enabled.
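For reference, a minimal sketch of the barrier technique described
for mc_secure_bzero above. The function name and exact body are
assumptions for illustration and may differ from what crypto.h
contains.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of an optimization-resistant zeroing helper. */
static inline void secure_bzero_sketch(void *p, size_t n)
{
#if defined(__GNUC__) || defined(__clang__)
    memset(p, 0, n);
    /* Empty asm that takes p as an input and clobbers "memory": the
     * compiler must assume the zeroed bytes are observed here, so it
     * cannot drop the memset as a dead store. */
    __asm__ __volatile__("" : : "r"(p) : "memory");
#else
    /* Fallback: stores through a volatile pointer may not be elided. */
    volatile unsigned char *vp = (volatile unsigned char *)p;
    while (n--)
        *vp++ = 0;
#endif
}
```

A caller would invoke it on a buffer such as rk[] immediately before
return, for example secure_bzero_sketch(rk, sizeof rk).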

Modernization:
- bitfn.h: replace the hand-rolled inline-asm byte swap (i386,
ARM, x86_64) with __builtin_bswap32/64 on GCC and Clang and
_byteswap_ulong/_byteswap_uint64 on MSVC. The hand-rolled ARM
sequence used a 'bic' form that Thumb-1 cannot encode, so the
previous code was unbuildable on ARMv6-M (Cortex-M0/M0+). The
builtins emit a single 'rev' on ARMv6+ and 'bswap' on x86, with
no portability hazards.
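For reference, a minimal sketch of the compiler dispatch described
for bitfn.h above. The function names and the final shift-and-mask
fallback are assumptions for illustration; the commit itself only
names the GCC/Clang builtins and the MSVC intrinsics.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h> /* _byteswap_ulong, _byteswap_uint64 */
#endif

/* Hypothetical 32-bit byte swap using whatever the compiler offers. */
static inline uint32_t swap32_sketch(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#elif defined(_MSC_VER)
    return _byteswap_ulong(x);
#else
    /* Portable fallback (an addition of this sketch). */
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}

/* Hypothetical 64-bit byte swap, same dispatch. */
static inline uint64_t swap64_sketch(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap64(x);
#elif defined(_MSC_VER)
    return _byteswap_uint64(x);
#else
    /* Swap each half, then exchange the halves. */
    return ((uint64_t)swap32_sketch((uint32_t)x) << 32) |
           swap32_sketch((uint32_t)(x >> 32));
#endif
}
```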

Renaming:
- __mc_ARM64CE__ -> __mc_ARM64NEON__: the macro gates ARM NEON
XOR/CTR code in misc_sse.c, not Cryptography Extensions. The
previous name implied AES-CE/PMULL acceleration that doesn't
exist; rename to match what the code actually does.

All 4068 tests still pass.