Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

entry: Split preemption from irqentry_exit_to_kernel_mode()

Some architecture-specific work needs to be performed between the state
management for exception entry/exit and the "real" work to handle the
exception. For example, arm64 needs to manipulate a number of exception
masking bits, with different exceptions requiring different masking.
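As loose background (an illustrative sketch, not code from this patch): arm64
controls exception masking through the PSTATE D/A/I/F bits, and different
exception classes want different subsets masked while they are handled. The
bit values and helpers below are simplified assumptions for illustration:

  /*
   * Illustrative sketch only: simplified stand-ins for arm64's D/A/I/F
   * masking. Names and values are assumptions, not the real arch code.
   */
  #define MASK_DBG     (1u << 3)   /* D: debug exceptions */
  #define MASK_SERROR  (1u << 2)   /* A: SError */
  #define MASK_IRQ     (1u << 1)   /* I: normal interrupts */
  #define MASK_FIQ     (1u << 0)   /* F: FIQ / pseudo-NMI */

  static unsigned int pstate_mask; /* stand-in for the real PSTATE bits */

  static void mask_exceptions(unsigned int bits)
  {
          pstate_mask |= bits;
  }

  static void example_masking(void)
  {
          /* An IRQ handler might keep debug/SError deliverable ... */
          mask_exceptions(MASK_IRQ | MASK_FIQ);
          /* ... while an SError handler masks everything. */
          mask_exceptions(MASK_DBG | MASK_SERROR | MASK_IRQ | MASK_FIQ);
  }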

Generally this can all be hidden in the architecture code, but for arm64
the current structure of irqentry_exit_to_kernel_mode() makes this
particularly difficult to handle in a way that is correct, maintainable,
and efficient.

The gory details are described in the thread surrounding:

https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/

The summary is (see the sketch after this list):

* Currently, irqentry_exit_to_kernel_mode() handles both involuntary
preemption AND state management necessary for exception return.

* When scheduling (including involuntary preemption), arm64 needs to
have all arm64-specific exceptions unmasked, though regular interrupts
must be masked.

* Prior to the state management for exception return, arm64 needs to
mask a number of arm64-specific exceptions, and perform some work with
these exceptions masked (with RCU watching, etc.).
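Taken together, the ordering these constraints force on the arm64 exit path
looks roughly like the sketch below. All helper names are hypothetical
stand-ins for arch internals, not functions from this patch:

  /* Hypothetical stand-ins for arch internals (illustration only). */
  static void unmask_arch_exceptions(void) { }  /* unmask arm64-specific exceptions */
  static void mask_arch_exceptions(void) { }    /* mask them again */
  static void preempt_if_needed(void) { }       /* may call the scheduler */
  static void arch_work_rcu_watching(void) { }  /* work done with RCU watching */
  static void exit_state_management(void) { }   /* trace/lockdep/RCU transitions */

  static void arm64_kernel_exit_sketch(void)
  {
          /* 1) Scheduling: arch exceptions unmasked, regular IRQs stay masked. */
          unmask_arch_exceptions();
          preempt_if_needed();

          /*
           * 2) Mask arch exceptions again and finish pending work while
           *    RCU is still watching.
           */
          mask_arch_exceptions();
          arch_work_rcu_watching();

          /* 3) Only then run the exception-return state management. */
          exit_state_management();
  }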

While in theory it is possible to handle this with a new arch_*() hook
called somewhere under irqentry_exit_to_kernel_mode(), this is fragile
and complicated, and doesn't match the flow used for exception return to
user mode, which has a separate 'prepare' step (where preemption can
occur) prior to the state management.
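For comparison, the user-mode exit path is already structured this way;
simplified from the generic entry code (shape only, details elided):

  /* Simplified shape of the existing user-mode exit path (details elided). */
  static __always_inline void irqentry_exit_to_user_mode(struct pt_regs *regs)
  {
          instrumentation_begin();
          exit_to_user_mode_prepare(regs);  /* 'prepare': preemption, signals, ... */
          instrumentation_end();
          exit_to_user_mode();              /* trace/lockdep/RCU state management */
  }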

To solve this, refactor irqentry_exit_to_kernel_mode() to match the
style of {irqentry,syscall}_exit_to_user_mode(), moving the preemption
logic into a new irqentry_exit_to_kernel_mode_preempt() function, and
moving the state management into a new
irqentry_exit_to_kernel_mode_after_preempt() function. The existing
irqentry_exit_to_kernel_mode() is left as a caller of both of these,
avoiding the need to modify existing callers.
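With the split, an architecture that needs extra work between the two steps
can open-code the sequence along the following lines. This is a hypothetical
arch-side caller (arm64's actual conversion is not part of this patch), and
arch_mask_exceptions_for_return() is an assumed helper, not a real kernel
function:

  /* Hypothetical arch-side caller of the split API (illustration only). */
  static void arch_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
  {
          instrumentation_begin();
          irqentry_exit_to_kernel_mode_preempt(regs, state);  /* may schedule */
          instrumentation_end();

          arch_mask_exceptions_for_return();  /* arch work, RCU still watching */

          irqentry_exit_to_kernel_mode_after_preempt(regs, state);
  }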

There should be no functional change as a result of this change.

[ tglx: Updated kernel doc ]

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260407131650.3813777-6-mark.rutland@arm.com

Authored by Mark Rutland, committed by Thomas Gleixner
041aa7a8 c5538d01

+59 -14
include/linux/irq-entry-common.h
···
 }
 
 /**
- * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
- *                                invoking the interrupt handler
+ * irqentry_exit_to_kernel_mode_preempt - Run preempt checks on return to kernel mode
  * @regs:  Pointer to current's pt_regs
  * @state: Return value from matching call to irqentry_enter_from_kernel_mode()
  *
- * This is the counterpart of irqentry_enter_from_kernel_mode() and runs the
- * necessary preemption check if possible and required. It returns to the caller
- * with interrupts disabled and the correct state vs. tracing, lockdep and RCU
- * required to return to the interrupted context.
+ * This is to be invoked before irqentry_exit_to_kernel_mode_after_preempt() to
+ * allow kernel preemption on return from interrupt.
  *
- * It is the last action before returning to the low level ASM code which just
- * needs to return.
+ * Must be invoked with interrupts disabled and CPU state which allows kernel
+ * preemption.
+ *
+ * After returning from this function, the caller can modify CPU state before
+ * invoking irqentry_exit_to_kernel_mode_after_preempt(), which is required to
+ * re-establish the tracing, lockdep and RCU state for returning to the
+ * interrupted context.
  */
-static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
-                                                         irqentry_state_t state)
+static inline void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs,
+                                                        irqentry_state_t state)
 {
-        lockdep_assert_irqs_disabled();
+        if (regs_irqs_disabled(regs) || state.exit_rcu)
+                return;
 
+        if (IS_ENABLED(CONFIG_PREEMPTION))
+                irqentry_exit_cond_resched();
+}
+
+/**
+ * irqentry_exit_to_kernel_mode_after_preempt - Establish trace, lockdep and RCU state
+ * @regs:  Pointer to current's pt_regs
+ * @state: Return value from matching call to irqentry_enter_from_kernel_mode()
+ *
+ * This is to be invoked after irqentry_exit_to_kernel_mode_preempt() and before
+ * actually returning to the interrupted context.
+ *
+ * There are no requirements for the CPU state other than being able to complete
+ * the tracing, lockdep and RCU state transitions. After this function returns
+ * the caller must return directly to the interrupted context.
+ */
+static __always_inline void
+irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs, irqentry_state_t state)
+{
         if (!regs_irqs_disabled(regs)) {
                 /*
                  * If RCU was not watching on entry this needs to be done
···
         }
 
         instrumentation_begin();
-        if (IS_ENABLED(CONFIG_PREEMPTION))
-                irqentry_exit_cond_resched();
-
         /* Covers both tracing and lockdep */
         trace_hardirqs_on();
         instrumentation_end();
···
                 if (state.exit_rcu)
                         ct_irq_exit();
         }
+}
+
+/**
+ * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
+ *                                invoking the interrupt handler
+ * @regs:  Pointer to current's pt_regs
+ * @state: Return value from matching call to irqentry_enter_from_kernel_mode()
+ *
+ * This is the counterpart of irqentry_enter_from_kernel_mode() and combines
+ * the calls to irqentry_exit_to_kernel_mode_preempt() and
+ * irqentry_exit_to_kernel_mode_after_preempt().
+ *
+ * The requirement for the CPU state is that it can schedule. After the function
+ * returns the tracing, lockdep and RCU state transitions are completed and the
+ * caller must return directly to the interrupted context.
+ */
+static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
+                                                         irqentry_state_t state)
+{
+        lockdep_assert_irqs_disabled();
+
+        instrumentation_begin();
+        irqentry_exit_to_kernel_mode_preempt(regs, state);
+        instrumentation_end();
+
+        irqentry_exit_to_kernel_mode_after_preempt(regs, state);
 }
 
 /**
···