Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

entry: Split kernel mode logic from irqentry_{enter,exit}()

The generic irqentry code has entry/exit functions specifically for
exceptions taken from user mode, but doesn't have entry/exit functions
specifically for exceptions taken from kernel mode.

It would be helpful to have separate entry/exit functions specifically
for exceptions taken from kernel mode. This would make the structure of
the entry code more consistent, and would make it easier for
architectures to manage logic specific to exceptions taken from kernel
mode.

Move the logic specific to kernel mode out of irqentry_enter() and
irqentry_exit() into new irqentry_enter_from_kernel_mode() and
irqentry_exit_to_kernel_mode() functions. These are marked
__always_inline and placed in irq-entry-common.h, as with
irqentry_enter_from_user_mode() and irqentry_exit_to_user_mode(), so
that they can be inlined into architecture-specific wrappers. The
existing out-of-line irqentry_enter() and irqentry_exit() functions are
retained as callers of the new functions.

The lockdep assertion from irqentry_exit() is moved into
irqentry_exit_to_user_mode() and irqentry_exit_to_kernel_mode(). This
assertion was previously missing from irqentry_exit_to_user_mode() when
called directly, so any new lockdep assertion failure resulting from
this change indicates a latent bug.

Aside from the lockdep change noted above, there should be no functional
change as a result of this patch.

[ tglx: Updated kernel doc ]

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260407131650.3813777-5-mark.rutland@arm.com

Authored by Mark Rutland, committed by Thomas Gleixner
c5538d01 eb1b51af

+142 -95
+134
include/linux/irq-entry-common.h
···
  */
 static __always_inline void irqentry_exit_to_user_mode(struct pt_regs *regs)
 {
+	lockdep_assert_irqs_disabled();
+
 	instrumentation_begin();
 	irqentry_exit_to_user_mode_prepare(regs);
 	instrumentation_end();
···
 #else /* CONFIG_PREEMPT_DYNAMIC */
 #define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
 #endif /* CONFIG_PREEMPT_DYNAMIC */
+
+/**
+ * irqentry_enter_from_kernel_mode - Establish state before invoking the irq handler
+ * @regs:	Pointer to current's pt_regs
+ *
+ * Invoked from architecture specific entry code with interrupts disabled.
+ * Can only be called when the interrupt entry came from kernel mode. The
+ * calling code must be non-instrumentable. When the function returns all
+ * state is correct and the subsequent functions can be instrumented.
+ *
+ * The function establishes state (lockdep, RCU (context tracking), tracing) and
+ * is provided for architectures which require a strict split between entry from
+ * kernel and user mode and therefore cannot use irqentry_enter() which handles
+ * both entry modes.
+ *
+ * Returns: An opaque object that must be passed to irqentry_exit_to_kernel_mode().
+ */
+static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct pt_regs *regs)
+{
+	irqentry_state_t ret = {
+		.exit_rcu = false,
+	};
+
+	/*
+	 * If this entry hit the idle task invoke ct_irq_enter() whether
+	 * RCU is watching or not.
+	 *
+	 * Interrupts can nest when the first interrupt invokes softirq
+	 * processing on return which enables interrupts.
+	 *
+	 * Scheduler ticks in the idle task can mark quiescent state and
+	 * terminate a grace period, if and only if the timer interrupt is
+	 * not nested into another interrupt.
+	 *
+	 * Checking for rcu_is_watching() here would prevent the nesting
+	 * interrupt to invoke ct_irq_enter(). If that nested interrupt is
+	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
+	 * assume that it is the first interrupt and eventually claim
+	 * quiescent state and end grace periods prematurely.
+	 *
+	 * Unconditionally invoke ct_irq_enter() so RCU state stays
+	 * consistent.
+	 *
+	 * TINY_RCU does not support EQS, so let the compiler eliminate
+	 * this part when enabled.
+	 */
+	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
+	    (is_idle_task(current) || arch_in_rcu_eqs())) {
+		/*
+		 * If RCU is not watching then the same careful
+		 * sequence vs. lockdep and tracing is required
+		 * as in irqentry_enter_from_user_mode().
+		 */
+		lockdep_hardirqs_off(CALLER_ADDR0);
+		ct_irq_enter();
+		instrumentation_begin();
+		kmsan_unpoison_entry_regs(regs);
+		trace_hardirqs_off_finish();
+		instrumentation_end();
+
+		ret.exit_rcu = true;
+		return ret;
+	}
+
+	/*
+	 * If RCU is watching then RCU only wants to check whether it needs
+	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
+	 * already contains a warning when RCU is not watching, so no point
+	 * in having another one here.
+	 */
+	lockdep_hardirqs_off(CALLER_ADDR0);
+	instrumentation_begin();
+	kmsan_unpoison_entry_regs(regs);
+	rcu_irq_enter_check_tick();
+	trace_hardirqs_off_finish();
+	instrumentation_end();
+
+	return ret;
+}
+
+/**
+ * irqentry_exit_to_kernel_mode - Run preempt checks and establish state after
+ *				  invoking the interrupt handler
+ * @regs:	Pointer to current's pt_regs
+ * @state:	Return value from matching call to irqentry_enter_from_kernel_mode()
+ *
+ * This is the counterpart of irqentry_enter_from_kernel_mode() and runs the
+ * necessary preemption check if possible and required. It returns to the caller
+ * with interrupts disabled and the correct state vs. tracing, lockdep and RCU
+ * required to return to the interrupted context.
+ *
+ * It is the last action before returning to the low level ASM code which just
+ * needs to return.
+ */
+static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs,
+							 irqentry_state_t state)
+{
+	lockdep_assert_irqs_disabled();
+
+	if (!regs_irqs_disabled(regs)) {
+		/*
+		 * If RCU was not watching on entry this needs to be done
+		 * carefully and needs the same ordering of lockdep/tracing
+		 * and RCU as the return to user mode path.
+		 */
+		if (state.exit_rcu) {
+			instrumentation_begin();
+			/* Tell the tracer that IRET will enable interrupts */
+			trace_hardirqs_on_prepare();
+			lockdep_hardirqs_on_prepare();
+			instrumentation_end();
+			ct_irq_exit();
+			lockdep_hardirqs_on(CALLER_ADDR0);
+			return;
+		}
+
+		instrumentation_begin();
+		if (IS_ENABLED(CONFIG_PREEMPTION))
+			irqentry_exit_cond_resched();
+
+		/* Covers both tracing and lockdep */
+		trace_hardirqs_on();
+		instrumentation_end();
+	} else {
+		/*
+		 * IRQ flags state is correct already. Just tell RCU if it
+		 * was not watching on entry.
+		 */
+		if (state.exit_rcu)
+			ct_irq_exit();
+	}
+}
 
 /**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
+8 -95
kernel/entry/common.c
···
 
 noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 {
-	irqentry_state_t ret = {
-		.exit_rcu = false,
-	};
-
 	if (user_mode(regs)) {
+		irqentry_state_t ret = {
+			.exit_rcu = false,
+		};
+
 		irqentry_enter_from_user_mode(regs);
 		return ret;
 	}
 
-	/*
-	 * If this entry hit the idle task invoke ct_irq_enter() whether
-	 * RCU is watching or not.
-	 *
-	 * Interrupts can nest when the first interrupt invokes softirq
-	 * processing on return which enables interrupts.
-	 *
-	 * Scheduler ticks in the idle task can mark quiescent state and
-	 * terminate a grace period, if and only if the timer interrupt is
-	 * not nested into another interrupt.
-	 *
-	 * Checking for rcu_is_watching() here would prevent the nesting
-	 * interrupt to invoke ct_irq_enter(). If that nested interrupt is
-	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
-	 * assume that it is the first interrupt and eventually claim
-	 * quiescent state and end grace periods prematurely.
-	 *
-	 * Unconditionally invoke ct_irq_enter() so RCU state stays
-	 * consistent.
-	 *
-	 * TINY_RCU does not support EQS, so let the compiler eliminate
-	 * this part when enabled.
-	 */
-	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
-	    (is_idle_task(current) || arch_in_rcu_eqs())) {
-		/*
-		 * If RCU is not watching then the same careful
-		 * sequence vs. lockdep and tracing is required
-		 * as in irqentry_enter_from_user_mode().
-		 */
-		lockdep_hardirqs_off(CALLER_ADDR0);
-		ct_irq_enter();
-		instrumentation_begin();
-		kmsan_unpoison_entry_regs(regs);
-		trace_hardirqs_off_finish();
-		instrumentation_end();
-
-		ret.exit_rcu = true;
-		return ret;
-	}
-
-	/*
-	 * If RCU is watching then RCU only wants to check whether it needs
-	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
-	 * already contains a warning when RCU is not watching, so no point
-	 * in having another one here.
-	 */
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	instrumentation_begin();
-	kmsan_unpoison_entry_regs(regs);
-	rcu_irq_enter_check_tick();
-	trace_hardirqs_off_finish();
-	instrumentation_end();
-
-	return ret;
+	return irqentry_enter_from_kernel_mode(regs);
 }
 
 /**
···
 
 noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 {
-	lockdep_assert_irqs_disabled();
-
-	/* Check whether this returns to user mode */
-	if (user_mode(regs)) {
+	if (user_mode(regs))
 		irqentry_exit_to_user_mode(regs);
-	} else if (!regs_irqs_disabled(regs)) {
-		/*
-		 * If RCU was not watching on entry this needs to be done
-		 * carefully and needs the same ordering of lockdep/tracing
-		 * and RCU as the return to user mode path.
-		 */
-		if (state.exit_rcu) {
-			instrumentation_begin();
-			/* Tell the tracer that IRET will enable interrupts */
-			trace_hardirqs_on_prepare();
-			lockdep_hardirqs_on_prepare();
-			instrumentation_end();
-			ct_irq_exit();
-			lockdep_hardirqs_on(CALLER_ADDR0);
-			return;
-		}
-
-		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION))
-			irqentry_exit_cond_resched();
-
-		/* Covers both tracing and lockdep */
-		trace_hardirqs_on();
-		instrumentation_end();
-	} else {
-		/*
-		 * IRQ flags state is correct already. Just tell RCU if it
-		 * was not watching on entry.
-		 */
-		if (state.exit_rcu)
-			ct_irq_exit();
-	}
+	else
+		irqentry_exit_to_kernel_mode(regs, state);
 }
 
 irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)