randomize_kstack: Improve stack alignment codegen

The codegen for adding architecture-specific stack alignment to the
effective alloca() usage is somewhat inefficient and allows a bit to get
carried beyond the desired entropy range. This isn't a functional problem,
but it is unexpected and costs extra instructions.

Quoting Mark[1], the disassembly for arm64's invoke_syscall() looks like:

    // offset = raw_cpu_read(kstack_offset)
    mov     x4, sp
    adrp    x0, kstack_offset
    mrs     x5, tpidr_el1
    add     x0, x0, #:lo12:kstack_offset
    ldr     w0, [x0, x5]

    // offset = KSTACK_OFFSET_MAX(offset)
    and     x0, x0, #0x3ff

    // alloca(offset)
    add     x0, x0, #0xf
    and     x0, x0, #0x7f0
    sub     sp, x4, x0

... which in C would be:

    offset = raw_cpu_read(kstack_offset);
    offset &= 0x3ff;    // [0x0, 0x3ff]
    offset += 0xf;      // [0xf, 0x40e]
    offset &= 0x7f0;    // [0x0, 0x400]

... so when *all* bits [3:0] are 0, they'll have no impact, and when
*any* of bits [3:0] are 1 they'll trigger a carry into bit 4, which
could ripple all the way up and spill into bit 10.
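
The carry can be reproduced with a minimal C sketch of the old
arithmetic. The helper names below (old_cap, align16_up,
old_effective_offset) are hypothetical and exist only for illustration;
the constants are the ones from the disassembly above.

```c
#include <stdint.h>

/* Old cap: keep the low 10 bits, i.e. (x) & 0x3FF. */
static inline uint32_t old_cap(uint32_t x)
{
	return x & 0x3ff;
}

/*
 * 16-byte round-up as it appears in the arm64 listing:
 * add 0xf, then mask with 0x7f0.
 */
static inline uint32_t align16_up(uint32_t x)
{
	return (x + 0xf) & 0x7f0;
}

/* Hypothetical helper: the stack adjustment the old code ends up using. */
static inline uint32_t old_effective_offset(uint32_t raw)
{
	return align16_up(old_cap(raw));
}
```

With these helpers, old_effective_offset(0x3f0) stays at 0x3f0, while
old_effective_offset(0x3f1) becomes 0x400: the set low bit carries up
through bits [9:4] and lands in bit 10, outside the 10-bit range.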

Switch the masking in KSTACK_OFFSET_MAX() to explicitly clear the bottom
bits, using 0b1111110000 instead of 0b1111111111, so that no rounding is
needed:

    // offset = raw_cpu_read(kstack_offset)
    mov     x4, sp
    adrp    x0, 0 <kstack_offset>
    mrs     x5, tpidr_el1
    add     x0, x0, #:lo12:kstack_offset
    ldr     w0, [x0, x5]

    // offset = KSTACK_OFFSET_MAX(offset)
    and     x0, x0, #0x3f0

    // alloca(offset)
    sub     sp, x4, x0
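
One way to see why the add/and pair can disappear is to check that,
once the low 4 bits are already clear, the 16-byte round-up no longer
changes the value. The following is a rough sketch of that check, not
kernel code; new_cap64, round_up16 and count_alignment_changes are
hypothetical names, and 0x3f0 is 0b1111110000 written in hex.

```c
#include <stdint.h>

/* New 64-bit cap from the patch: 0b1111110000 == 0x3f0. */
static inline uint32_t new_cap64(uint32_t x)
{
	return x & 0x3f0;
}

/* 16-byte round-up as in the arm64 disassembly (add 0xf, mask 0x7f0). */
static inline uint32_t round_up16(uint32_t x)
{
	return (x + 0xf) & 0x7f0;
}

/*
 * Hypothetical self-check: count the 16-bit raw inputs for which
 * rounding up would still change the capped offset.
 */
static unsigned int count_alignment_changes(void)
{
	unsigned int changed = 0;

	for (uint32_t raw = 0; raw <= 0xffff; raw++) {
		uint32_t capped = new_cap64(raw);

		if (round_up16(capped) != capped)
			changed++;
	}
	return changed;
}
```

count_alignment_changes() returns 0 here, which matches the compiler
dropping the rounding instructions in the listing above.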

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/lkml/ZnVfOnIuFl2kNWkT@J2N7QTR9R3/ [1]
Link: https://lore.kernel.org/r/20240702211612.work.576-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>

Kees Cook 2 years ago 872bb37f 3ccea478

+12 -6

1 changed file

include/linux/randomize_kstack.h

@@ -32,13 +32,19 @@
 #endif
 
 /*
- * Use, at most, 10 bits of entropy. We explicitly cap this to keep the
- * "VLA" from being unbounded (see above). 10 bits leaves enough room for
- * per-arch offset masks to reduce entropy (by removing higher bits, since
- * high entropy may overly constrain usable stack space), and for
- * compiler/arch-specific stack alignment to remove the lower bits.
+ * Use, at most, 6 bits of entropy (on 64-bit; 8 on 32-bit). This cap is
+ * to keep the "VLA" from being unbounded (see above). Additionally clear
+ * the bottom 4 bits (on 64-bit systems, 2 for 32-bit), since stack
+ * alignment will always be at least word size. This makes the compiler
+ * code gen better when it is applying the actual per-arch alignment to
+ * the final offset. The resulting randomness is reasonable without overly
+ * constraining usable stack space.
  */
-#define KSTACK_OFFSET_MAX(x) ((x) & 0x3FF)
+#ifdef CONFIG_64BIT
+#define KSTACK_OFFSET_MAX(x) ((x) & 0b1111110000)
+#else
+#define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
+#endif
 
 /**
  * add_random_kstack_offset - Increase stack utilization by previously
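
The "6 bits on 64-bit, 8 bits on 32-bit" figures in the new comment
follow from counting the set bits in each mask. A small sketch of that
arithmetic, with the masks written in hex for portability; the names
KSTACK_MASK_64BIT, KSTACK_MASK_32BIT and count_set_bits are
hypothetical and not part of the kernel header.

```c
#include <stdint.h>

#define KSTACK_MASK_64BIT 0x3f0u /* 0b1111110000: bits [9:4] */
#define KSTACK_MASK_32BIT 0x3fcu /* 0b1111111100: bits [9:2] */

/* Portable population count: clear the lowest set bit until none remain. */
static inline unsigned int count_set_bits(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;
		n++;
	}
	return n;
}
```

count_set_bits() gives 6 for the 64-bit mask, 8 for the 32-bit mask,
and 10 for the old 0x3ff mask. Since each mask is also the largest
offset it can produce, the offset stays within 0x3f0 on 64-bit and
0x3fc on 32-bit.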
