Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/core: Fix wakeup_preempt's next_class tracking

Kernel test robot reported that
tools/testing/selftests/kvm/hardware_disable_test was failing due to
commit 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt()
and rq_modified_*()")

It turns out there were two related problems that could lead to a
missed preemption:

- when newidle balance ran from the idle thread, it would raise
rq->next_class from &idle_sched_class to &fair_sched_class, so later
wakeup_preempt() calls would no longer hit the sched_class_above()
case and would not issue resched_curr().

Notably, this modification pattern should only ever lower
rq->next_class, never raise it. Add two new helper functions,
rq_modified_begin() and rq_modified_above(), to wrap this.

- in schedule_idle(), the path that keeps running the idle task when
nothing is runnable could skip (re)setting rq->next_class to
&idle_sched_class, leading to the very same problem.

Cc: Sean Christopherson <seanjc@google.com>
Fixes: 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602122157.4e861298-lkp@intel.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260218163329.GQ1395416@noisy.programming.kicks-ass.net

4 files changed, 16 insertions(+), 4 deletions(-)

kernel/sched/core.c (+1)
@@ -6830,6 +6830,7 @@
 		/* SCX must consult the BPF scheduler to tell if rq is empty */
 		if (!rq->nr_running && !scx_enabled()) {
 			next = prev;
+			rq->next_class = &idle_sched_class;
 			goto picked;
 		}
 	} else if (!preempt && prev_state) {

kernel/sched/ext.c (+2 -2)
@@ -2460,7 +2460,7 @@
 	/* see kick_cpus_irq_workfn() */
 	smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
 
-	rq->next_class = &ext_sched_class;
+	rq_modified_begin(rq, &ext_sched_class);
 
 	rq_unpin_lock(rq, rf);
 	balance_one(rq, prev);
@@ -2475,7 +2475,7 @@
 	 * If @force_scx is true, always try to pick a SCHED_EXT task,
 	 * regardless of any higher-priority sched classes activity.
 	 */
-	if (!force_scx && sched_class_above(rq->next_class, &ext_sched_class))
+	if (!force_scx && rq_modified_above(rq, &ext_sched_class))
 		return RETRY_TASK;
 
 	keep_prev = rq->scx.flags & SCX_RQ_BAL_KEEP;

kernel/sched/fair.c (+2 -2)
@@ -12982,7 +12982,7 @@
 	t0 = sched_clock_cpu(this_cpu);
 	__sched_balance_update_blocked_averages(this_rq);
 
-	this_rq->next_class = &fair_sched_class;
+	rq_modified_begin(this_rq, &fair_sched_class);
 	raw_spin_rq_unlock(this_rq);
 
 	for_each_domain(this_cpu, sd) {
@@ -13049,7 +13049,7 @@
 		pulled_task = 1;
 
 	/* If a higher prio class was modified, restart the pick */
-	if (sched_class_above(this_rq->next_class, &fair_sched_class))
+	if (rq_modified_above(this_rq, &fair_sched_class))
 		pulled_task = -1;
 
 out:

kernel/sched/sched.h (+11)
@@ -2748,6 +2748,17 @@
 
 #define sched_class_above(_a, _b) ((_a) < (_b))
 
+static inline void rq_modified_begin(struct rq *rq, const struct sched_class *class)
+{
+	if (sched_class_above(rq->next_class, class))
+		rq->next_class = class;
+}
+
+static inline bool rq_modified_above(struct rq *rq, const struct sched_class *class)
+{
+	return sched_class_above(rq->next_class, class);
+}
+
 static inline bool sched_stop_runnable(struct rq *rq)
 {
 	return rq->stop && task_on_rq_queued(rq->stop);