Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware

Modify asm_fred_entry_from_kvm() to allow it to be invoked by KVM even
when FRED isn't fully enabled, e.g. when running with
CONFIG_X86_FRED=y on non-FRED hardware. This will allow forcing KVM
to always use the FRED entry points for 64-bit kernels, which in turn
will eliminate a rather gross non-CFI indirect call that KVM uses to
trampoline IRQs by doing IDT lookups.

The point of asm_fred_entry_from_kvm() is to bridge between C
(vmx:handle_external_interrupt_irqoff()) and more C
(__fred_entry_from_kvm()) while changing the calling context to appear
like an interrupt (pt_regs), which makes the whole thing bound by the
C ABI on both ends.

All that remains for non-FRED hardware is to restore RSP (to undo the
redzone and alignment). However, the trivial change would result in
code like:

	push %rbp
	mov %rsp, %rbp

	sub $REDZONE, %rsp
	and $MASK, %rsp

	PUSH_AND_CLEAR_REGS
	push %rbp

	POP_REGS
	pop %rbp	<-- *objtool fail*

	mov %rbp, %rsp
	pop %rbp
	ret

And this will confuse objtool something wicked -- it gets confused by
the extra pop %rbp, not realizing the push and pop preserve the value.

Rather than trying to teach objtool about this, recognise that since
the code is bound by C ABI on both ends and interrupts are not allowed
to change pt_regs (only exceptions are) it is sufficient to PUSH_REGS
in order to create pt_regs, but there is no reason to POP_REGS --
provided the callee-saved registers are preserved.

So avoid clearing callee-saved regs and skip POP_REGS.

[Original patch by Sean; much of this version by Josh; Changelog,
comments and final form by Peterz]

Originally-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lkml.kernel.org/r/20250714103441.245417052@infradead.org

Authored by Josh Poimboeuf and committed by Peter Zijlstra
deed19b9 2d1435b7

+33 -14
+5 -6
arch/x86/entry/calling.h
@@ -99,7 +99,7 @@
 	.endif
 .endm
 
-.macro CLEAR_REGS clear_bp=1
+.macro CLEAR_REGS clear_callee=1
 	/*
 	 * Sanitize registers of values that a speculation attack might
 	 * otherwise want to exploit. The lower registers are likely clobbered
@@ -113,20 +113,19 @@
 	xorl	%r9d, %r9d	/* nospec r9 */
 	xorl	%r10d, %r10d	/* nospec r10 */
 	xorl	%r11d, %r11d	/* nospec r11 */
+	.if \clear_callee
 	xorl	%ebx, %ebx	/* nospec rbx */
-	.if \clear_bp
 	xorl	%ebp, %ebp	/* nospec rbp */
-	.endif
 	xorl	%r12d, %r12d	/* nospec r12 */
 	xorl	%r13d, %r13d	/* nospec r13 */
 	xorl	%r14d, %r14d	/* nospec r14 */
 	xorl	%r15d, %r15d	/* nospec r15 */
-
+	.endif
 .endm
 
-.macro PUSH_AND_CLEAR_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0 clear_bp=1 unwind_hint=1
+.macro PUSH_AND_CLEAR_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0 clear_callee=1 unwind_hint=1
 	PUSH_REGS rdx=\rdx, rcx=\rcx, rax=\rax, save_ret=\save_ret unwind_hint=\unwind_hint
-	CLEAR_REGS clear_bp=\clear_bp
+	CLEAR_REGS clear_callee=\clear_callee
 .endm
 
 .macro POP_REGS pop_rdi=1
+27 -8
arch/x86/entry/entry_64_fred.S
@@ -112,18 +112,37 @@
 	push %rax				/* Return RIP */
 	push $0					/* Error code, 0 for IRQ/NMI */
 
-	PUSH_AND_CLEAR_REGS clear_bp=0 unwind_hint=0
+	PUSH_AND_CLEAR_REGS clear_callee=0 unwind_hint=0
+
 	movq %rsp, %rdi				/* %rdi -> pt_regs */
-	call __fred_entry_from_kvm		/* Call the C entry point */
-	POP_REGS
-	ERETS
-1:
 	/*
-	 * Objtool doesn't understand what ERETS does, this hint tells it that
-	 * yes, we'll reach here and with what stack state. A save/restore pair
-	 * isn't strictly needed, but it's the simplest form.
+	 * At this point: {rdi, rsi, rdx, rcx, r8, r9}, {r10, r11}, {rax, rdx}
+	 * are clobbered, which corresponds to: arguments, extra caller-saved
+	 * and return. All registers a C function is allowed to clobber.
+	 *
+	 * Notably, the callee-saved registers: {rbx, r12, r13, r14, r15}
+	 * are untouched, with the exception of rbp, which carries the stack
+	 * frame and will be restored before exit.
+	 *
+	 * Further calling another C function will not alter this state.
+	 */
+	call __fred_entry_from_kvm		/* Call the C entry point */
+
+	/*
+	 * When FRED, use ERETS to potentially clear NMIs, otherwise simply
+	 * restore the stack pointer.
+	 */
+	ALTERNATIVE "nop; nop; mov %rbp, %rsp", \
+		    __stringify(add $C_PTREGS_SIZE, %rsp; ERETS), \
+		    X86_FEATURE_FRED
+
+1:	/*
+	 * Objtool doesn't understand ERETS, and the cfi register state is
+	 * different from initial_func_cfi due to PUSH_REGS. Tell it the state
+	 * is similar to where UNWIND_HINT_SAVE is.
 	 */
 	UNWIND_HINT_RESTORE
+
 	pop %rbp
 	RET
 
+1
arch/x86/kernel/asm-offsets.c
@@ -102,6 +102,7 @@
 
 	BLANK();
 	DEFINE(PTREGS_SIZE, sizeof(struct pt_regs));
+	OFFSET(C_PTREGS_SIZE, pt_regs, orig_ax);
 
 	/* TLB state for the entry code */
 	OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask);
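
The new C_PTREGS_SIZE constant is the offset of orig_ax within pt_regs, i.e. the number of bytes that PUSH_REGS placed below the error-code slot. The user-space sketch below illustrates why `add $C_PTREGS_SIZE, %rsp` should leave RSP where the removed POP_REGS would have left it, without reloading any register. It is not kernel code: `struct pt_regs_sketch`, `C_PTREGS_SIZE_SKETCH` and `rsp_after_skip()` are hypothetical names, and the struct is a flattened stand-in that assumes every slot is one 64-bit word.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

/*
 * Simplified stand-in for the x86-64 struct pt_regs layout.
 * Assumption: each member occupies one 64-bit slot; the real kernel
 * struct is richer (e.g. cs and ss are wrapped), but this sketch only
 * cares about slot positions.
 */
struct pt_regs_sketch {
	/* General-purpose registers saved by PUSH_REGS, lowest address first. */
	uint64_t r15, r14, r13, r12, bp, bx;
	uint64_t r11, r10, r9, r8, ax, cx, dx, si, di;
	/* Error-code slot ("push $0" in asm_fred_entry_from_kvm). */
	uint64_t orig_ax;
	/* Return frame. */
	uint64_t ip, cs, flags, sp, ss;
};

/* Mirrors OFFSET(C_PTREGS_SIZE, pt_regs, orig_ax): bytes below orig_ax. */
enum { C_PTREGS_SIZE_SKETCH = offsetof(struct pt_regs_sketch, orig_ax) };

/* Fifteen saved registers sit below the error-code slot. */
static_assert(C_PTREGS_SIZE_SKETCH == 15 * sizeof(uint64_t),
	      "orig_ax must directly follow the 15 saved registers");

/*
 * Model of "add $C_PTREGS_SIZE, %rsp": given an RSP that points at the
 * start of pt_regs, return the RSP that points at the error-code slot.
 */
static inline uintptr_t rsp_after_skip(uintptr_t rsp_at_pt_regs)
{
	return rsp_at_pt_regs + C_PTREGS_SIZE_SKETCH;
}
```

Under these assumptions, skipping the saved registers costs a single add. This is safe here only because, as the changelog argues, the callee-saved registers were never cleared and the remaining registers are ones a C function may clobber anyway.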