Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2

Save 480 bytes of .rodata by replacing the .long constants with .bytes,
and using the vpmovzxbd instruction to expand them.

Also update the code to do the loads before incrementing %rax rather
than after. This avoids the need for the first load to use an offset.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250827151131.27733-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

+14 -14
+14 -14
lib/crypto/x86/blake2s-core.S
··· 29 29 .byte 13, 7, 12, 3, 11, 14, 1, 9, 2, 5, 15, 8, 10, 0, 4, 6 30 30 .byte 6, 14, 11, 0, 15, 9, 3, 8, 10, 12, 13, 1, 5, 2, 7, 4 31 31 .byte 10, 8, 7, 1, 2, 4, 6, 5, 13, 15, 9, 3, 0, 11, 14, 12 32 - .section .rodata.cst64.BLAKE2S_SIGMA2, "aM", @progbits, 640 32 + .section .rodata.cst64.BLAKE2S_SIGMA2, "aM", @progbits, 160 33 33 .align 64 34 34 SIGMA2: 35 - .long 0, 2, 4, 6, 1, 3, 5, 7, 14, 8, 10, 12, 15, 9, 11, 13 36 - .long 8, 2, 13, 15, 10, 9, 12, 3, 6, 4, 0, 14, 5, 11, 1, 7 37 - .long 11, 13, 8, 6, 5, 10, 14, 3, 2, 4, 12, 15, 1, 0, 7, 9 38 - .long 11, 10, 7, 0, 8, 15, 1, 13, 3, 6, 2, 12, 4, 14, 9, 5 39 - .long 4, 10, 9, 14, 15, 0, 11, 8, 1, 7, 3, 13, 2, 5, 6, 12 40 - .long 2, 11, 4, 15, 14, 3, 10, 8, 13, 6, 5, 7, 0, 12, 1, 9 41 - .long 4, 8, 15, 9, 14, 11, 13, 5, 3, 2, 1, 12, 6, 10, 7, 0 42 - .long 6, 13, 0, 14, 12, 2, 1, 11, 15, 4, 5, 8, 7, 9, 3, 10 43 - .long 15, 5, 4, 13, 10, 7, 3, 11, 12, 2, 0, 6, 9, 8, 1, 14 44 - .long 8, 7, 14, 11, 13, 15, 0, 12, 10, 4, 5, 6, 3, 2, 1, 9 35 + .byte 0, 2, 4, 6, 1, 3, 5, 7, 14, 8, 10, 12, 15, 9, 11, 13 36 + .byte 8, 2, 13, 15, 10, 9, 12, 3, 6, 4, 0, 14, 5, 11, 1, 7 37 + .byte 11, 13, 8, 6, 5, 10, 14, 3, 2, 4, 12, 15, 1, 0, 7, 9 38 + .byte 11, 10, 7, 0, 8, 15, 1, 13, 3, 6, 2, 12, 4, 14, 9, 5 39 + .byte 4, 10, 9, 14, 15, 0, 11, 8, 1, 7, 3, 13, 2, 5, 6, 12 40 + .byte 2, 11, 4, 15, 14, 3, 10, 8, 13, 6, 5, 7, 0, 12, 1, 9 41 + .byte 4, 8, 15, 9, 14, 11, 13, 5, 3, 2, 1, 12, 6, 10, 7, 0 42 + .byte 6, 13, 0, 14, 12, 2, 1, 11, 15, 4, 5, 8, 7, 9, 3, 10 43 + .byte 15, 5, 4, 13, 10, 7, 3, 11, 12, 2, 0, 6, 9, 8, 1, 14 44 + .byte 8, 7, 14, 11, 13, 15, 0, 12, 10, 4, 5, 6, 3, 2, 1, 9 45 45 46 46 .text 47 47 SYM_FUNC_START(blake2s_compress_ssse3) ··· 193 193 leaq SIGMA2(%rip),%rax 194 194 movb $0xa,%cl 195 195 .Lblake2s_compress_avx512_roundloop: 196 - addq $0x40,%rax 197 - vmovdqa -0x40(%rax),%ymm8 198 - vmovdqa -0x20(%rax),%ymm9 196 + vpmovzxbd (%rax),%ymm8 197 + vpmovzxbd 0x8(%rax),%ymm9 198 + addq $0x10,%rax 199 199 vpermi2d %ymm7,%ymm6,%ymm8 200 200 vpermi2d %ymm7,%ymm6,%ymm9 201 201 vmovdqa %ymm8,%ymm6