Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel

do_softirq_post_smp_call_flush() on PREEMPT_RT kernels carries a
WARN_ON_ONCE() for any SOFTIRQ raised from an SMP-call-function. Since
do_softirq_post_smp_call_flush() is called with preemption disabled,
raising a SOFTIRQ during flush_smp_call_function_queue() can lead to
longer preempt-disabled sections.
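
As a rough illustration of the check described above, the following userspace C model snapshots a pending mask, runs the queued callbacks, and reports whether anything new was raised, which is the condition that tripped WARN_ON_ONCE() before this patch. It is a sketch, not kernel code: pending_mask, raise_softirq_model(), the two sample callbacks and flush_would_warn_old() are hypothetical stand-ins, and the enum only mirrors the ordering of the kernel's softirq numbers.

```c
#include <stdbool.h>
#include <stddef.h>

#define BIT(nr) (1U << (nr))  /* simplified; the kernel's BIT() yields unsigned long */

/* Softirq numbers in the same order as the kernel's enum (subset). */
enum { HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
       BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ };

/* Hypothetical stand-in for this CPU's local_softirq_pending() mask. */
static unsigned int pending_mask;

static void raise_softirq_model(unsigned int nr)
{
	pending_mask |= BIT(nr);
}

typedef void (*csd_func_t)(void);

/* Hypothetical queued SMP-call-functions. */
static void noop_csd(void) { }
static void sched_kick_csd(void) { raise_softirq_model(SCHED_SOFTIRQ); }

/* Pre-patch RT behaviour: snapshot the mask, run the queue, and report
 * whether any softirq was newly raised (the WARN_ON_ONCE() case). */
static bool flush_would_warn_old(const csd_func_t *queue, size_t n)
{
	unsigned int was_pending = pending_mask;

	for (size_t i = 0; i < n; i++)
		queue[i]();

	return was_pending != pending_mask;
}
```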

Since commit b2a02fc43a1f ("smp: Optimize
send_call_function_single_ipi()"), IPIs to an idle CPU in
TIF_POLLING_NRFLAG mode can be optimized out by instead setting the
TIF_NEED_RESCHED bit in the idle task's thread_info and relying on the
flush_smp_call_function_queue() in the idle-exit path to run the
SMP-call-function.
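
The sender-side decision can be pictured with the sketch below, loosely modelled on the kernel's set_nr_if_polling() helper used by send_call_function_single_ipi(): if the target's flags show it is polling, set the need-resched bit with a compare-and-exchange and skip the IPI. The flag values, the atomic_uint flags word and the *_model function names are assumptions for illustration; the real thread_info flag bits are architecture specific.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bits standing in for TIF_POLLING_NRFLAG and TIF_NEED_RESCHED. */
#define MODEL_POLLING_NRFLAG (1U << 0)
#define MODEL_NEED_RESCHED   (1U << 1)

/* Returns true if the target is polling and NEED_RESCHED is (now) set,
 * meaning the polling idle loop will notice it without an IPI. */
static bool set_nr_if_polling_model(atomic_uint *ti_flags)
{
	unsigned int val = atomic_load(ti_flags);

	do {
		if (!(val & MODEL_POLLING_NRFLAG))
			return false;  /* not polling: an IPI is required */
		if (val & MODEL_NEED_RESCHED)
			return true;   /* already set: nothing more to do */
		/* On failure, val is reloaded and the checks run again. */
	} while (!atomic_compare_exchange_weak(ti_flags, &val,
					       val | MODEL_NEED_RESCHED));

	return true;
}

/* Sender side after queueing the callback (queueing omitted).
 * Returns true if an IPI would actually be sent. */
static bool send_call_function_single_ipi_model(atomic_uint *ti_flags)
{
	return !set_nr_if_polling_model(ti_flags);
}
```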

To trigger idle load balancing, the scheduler queues
nohz_csd_function(), which triggers idle load balancing on a target
nohz idle CPU, and sends an IPI. Now, however, this IPI is optimized
out and the SMP-call-function is executed from
flush_smp_call_function_queue() in do_idle(), which can raise a
SCHED_SOFTIRQ to trigger the balancing.
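
A minimal model of the target side, assuming a simplified per-CPU struct: the queued kick raises SCHED_SOFTIRQ only when the target reports idle, and because it runs from the idle-exit flush rather than from an IPI handler, the raised bit shows up as new after the flush. struct cpu_model, nohz_kick_model() and idle_exit_flush_model() are hypothetical names, and the model deliberately omits nohz_csd_function()'s need_resched() check.

```c
#include <stdbool.h>

#define BIT(nr) (1U << (nr))
enum { SCHED_SOFTIRQ = 7 };  /* same position as in the kernel's softirq enum */

/* Hypothetical per-CPU state for the model. */
struct cpu_model {
	bool idle;             /* what idle_cpu() would report */
	unsigned int pending;  /* pending-softirq mask */
};

/* Model of the queued kick: request idle load balancing on an idle target. */
static void nohz_kick_model(struct cpu_model *cpu)
{
	if (cpu->idle)
		cpu->pending |= BIT(SCHED_SOFTIRQ);
}

/* Model of the idle-exit path: the kick runs from the flush instead of an
 * IPI handler. Returns the softirqs newly raised during the flush. */
static unsigned int idle_exit_flush_model(struct cpu_model *cpu)
{
	unsigned int was_pending = cpu->pending;

	nohz_kick_model(cpu);  /* the queued SMP-call-function */
	return cpu->pending & ~was_pending;
}
```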

So far, this went undetected since the need_resched() check in
nohz_csd_function() would make it bail out of idle load balancing early,
as the idle thread does not clear TIF_POLLING_NRFLAG before calling
flush_smp_call_function_queue(). The need_resched() check was added with
the intent to catch a new task wakeup; however, it was recently
discovered to be unnecessary and will be removed in the subsequent
commit, after which nohz_csd_function() can raise a SCHED_SOFTIRQ from
flush_smp_call_function_queue() to trigger an idle load balance on an
idle target in TIF_POLLING_NRFLAG mode.
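
To see why the issue stayed hidden, the sketch below contrasts the kick with and without the need_resched() bail-out on a polling idle target that was woken by setting TIF_NEED_RESCHED rather than by an IPI. It is a simplified model under stated assumptions: struct target_model and the three helper functions are hypothetical, and need_resched is modelled as a plain boolean that is still set when the callback runs.

```c
#include <stdbool.h>

#define BIT(nr) (1U << (nr))
enum { SCHED_SOFTIRQ = 7 };  /* same position as in the kernel's softirq enum */

/* Hypothetical target-CPU state. */
struct target_model {
	bool idle;            /* idle_cpu() result */
	bool need_resched;    /* what need_resched() would return */
	unsigned int pending; /* pending-softirq mask */
};

/* Kick with the need_resched() bail-out (before the follow-up commit). */
static void kick_with_resched_check(struct target_model *t)
{
	if (t->idle && !t->need_resched)
		t->pending |= BIT(SCHED_SOFTIRQ);
}

/* Kick once the need_resched() check is dropped (after the follow-up commit). */
static void kick_without_resched_check(struct target_model *t)
{
	if (t->idle)
		t->pending |= BIT(SCHED_SOFTIRQ);
}

/* Polling idle target woken via TIF_NEED_RESCHED instead of an IPI. */
static struct target_model polling_idle_target(void)
{
	struct target_model t = { .idle = true, .need_resched = true, .pending = 0 };

	return t;
}
```

In this model, the first variant never raises SCHED_SOFTIRQ on the polling path, so the RT warning could not fire; the second variant raises it, which is the case this patch accounts for.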

nohz_csd_function() bails out early if the "idle_cpu()" check for the
target CPU fails, and does not lock the target CPU's rq until the very
end, once it has found tasks to run on the CPU; it therefore will not
inhibit the wakeup, or the running, of a newly woken up higher-priority
task. Account for this and prevent a WARN_ON_ONCE() when SCHED_SOFTIRQ
is raised from flush_smp_call_function_queue().

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241119054432.6405-2-kprateek.nayak@amd.com

Authored by K Prateek Nayak and committed by Peter Zijlstra.
6675ce20 70ee7947

+11 -4
kernel/softirq.c
@@ -280,17 +280,24 @@
 		wakeup_softirqd();
 }
 
+#define SCHED_SOFTIRQ_MASK BIT(SCHED_SOFTIRQ)
+
 /*
  * flush_smp_call_function_queue() can raise a soft interrupt in a function
- * call. On RT kernels this is undesired and the only known functionality
- * in the block layer which does this is disabled on RT. If soft interrupts
- * get raised which haven't been raised before the flush, warn so it can be
+ * call. On RT kernels this is undesired and the only known functionalities
+ * are in the block layer which is disabled on RT, and in the scheduler for
+ * idle load balancing. If soft interrupts get raised which haven't been
+ * raised before the flush, warn if it is not a SCHED_SOFTIRQ so it can be
  * investigated.
  */
 void do_softirq_post_smp_call_flush(unsigned int was_pending)
 {
-	if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
+	unsigned int is_pending = local_softirq_pending();
+
+	if (unlikely(was_pending != is_pending)) {
+		WARN_ON_ONCE(was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK));
 		invoke_softirq();
+	}
 }
 
 #else /* CONFIG_PREEMPT_RT */
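
The patched condition can be exercised outside the kernel with the small model below, which reproduces the pre- and post-patch logic of do_softirq_post_smp_call_flush() on plain bit masks. It is a sketch, not the kernel implementation: struct flush_result, post_flush_old() and post_flush_new() are hypothetical, BIT() is simplified to unsigned int, and the "once" part of WARN_ON_ONCE() is ignored.

```c
#include <stdbool.h>

#define BIT(nr) (1U << (nr))  /* simplified; the kernel's BIT() is unsigned long */

/* Softirq numbers in the same order as the kernel's enum (subset). */
enum { HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
       BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ };

#define SCHED_SOFTIRQ_MASK BIT(SCHED_SOFTIRQ)

struct flush_result {
	bool invoke;  /* invoke_softirq() would run */
	bool warn;    /* the WARN_ON_ONCE() condition is true */
};

/* Pre-patch: any change in the pending mask both warns and invokes. */
static struct flush_result post_flush_old(unsigned int was_pending,
					  unsigned int is_pending)
{
	struct flush_result r = { false, false };

	if (was_pending != is_pending) {
		r.invoke = true;
		r.warn = true;
	}
	return r;
}

/* Post-patch: any change invokes; warn unless clearing SCHED_SOFTIRQ from
 * the new mask gives back the mask seen before the flush. */
static struct flush_result post_flush_new(unsigned int was_pending,
					  unsigned int is_pending)
{
	struct flush_result r = { false, false };

	if (was_pending != is_pending) {
		r.invoke = true;
		r.warn = was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK);
	}
	return r;
}
```

With this model, a flush that only adds SCHED_SOFTIRQ still leads to invoke_softirq() but no longer warns, while a newly raised BLOCK_SOFTIRQ still warns.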