Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched_ext: defer queue_balance_callback() until after ops.dispatch

The sched_ext code calls queue_balance_callback() during enqueue_task()
to defer operations that drop multiple locks until the rq lock can be
unpinned. The call assumes that the rq lock stays held until the
callbacks are invoked, and that the pending callbacks will not be
visible to any other thread. rq_pin_lock() enforces this with a
WARN_ON_ONCE().
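The following minimal userspace sketch in C makes the contract concrete. It uses heavily simplified, hypothetical stand-ins for struct rq, queue_balance_callback() and rq_pin_lock(). The rq lock is reduced to a plain locked flag, and lockdep_assert_rq_held() becomes assert(). The hypothetical warnings counter records each time the WARN_ON_ONCE() condition would have been true. The hypothetical helpers run_balance_callbacks() and simulate() compare two cases: the lock owner draining its callbacks before unlocking, and the owner leaking them to the next lock holder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace model; NOT the kernel's real definitions. */
struct rq;

struct balance_callback {
	struct balance_callback *next;
	void (*func)(struct rq *rq);
};

struct rq {
	bool locked;                               /* models holding the rq lock */
	struct balance_callback *balance_callback; /* pending callbacks */
	int warnings;                              /* times WARN_ON_ONCE() would fire */
};

/* Model of queue_balance_callback(): push onto the rq's pending list. */
static void queue_balance_callback(struct rq *rq, struct balance_callback *head,
				   void (*func)(struct rq *rq))
{
	assert(rq->locked); /* stands in for lockdep_assert_rq_held() */
	head->func = func;
	head->next = rq->balance_callback;
	rq->balance_callback = head;
}

/* Model of rq_pin_lock(): a new lock holder must not see pending callbacks. */
static void rq_pin_lock(struct rq *rq)
{
	assert(rq->locked);
	if (rq->balance_callback)
		rq->warnings++;
}

/* Hypothetical helper: drain and run callbacks while still holding the lock. */
static void run_balance_callbacks(struct rq *rq)
{
	assert(rq->locked);
	while (rq->balance_callback) {
		struct balance_callback *cb = rq->balance_callback;

		rq->balance_callback = cb->next;
		cb->next = NULL;
		cb->func(rq);
	}
}

static int cb_runs; /* how many times count_cb() ran */

static void count_cb(struct rq *rq)
{
	(void)rq;
	cb_runs++;
}

/*
 * Queue a callback under the lock, optionally drain it, drop the lock, then
 * let an unrelated "thread" take and pin the lock. Returns the warning count.
 */
static int simulate(bool drain_before_unlock)
{
	struct rq rq = { 0 };
	struct balance_callback cb = { 0 };

	rq.locked = true;
	queue_balance_callback(&rq, &cb, count_cb);
	if (drain_before_unlock)
		run_balance_callbacks(&rq);
	rq.locked = false;

	rq.locked = true; /* an unrelated thread acquires the rq lock */
	rq_pin_lock(&rq);
	rq.locked = false;
	return rq.warnings;
}
```

simulate(true) returns 0 because the owner empties the list before unlocking. simulate(false) returns 1, which is the situation the WARN_ON_ONCE() in rq_pin_lock() is there to catch.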

However, balance_one() may actually drop the lock during a BPF dispatch
call. Another thread may win the race to get the rq lock and see the
pending callback. To avoid this, sched_ext must only queue the callback
after the dispatch calls have completed.

CPU 0                   CPU 1                     CPU 2

scx_balance()
  rq_unpin_lock()
  scx_balance_one()
    |= IN_BALANCE       scx_enqueue()
    ops.dispatch()
      rq_unlock()
                        rq_lock()
                        queue_balance_callback()
                        rq_unlock()
                                                  [WARN] rq_pin_lock()
      rq_lock()
    &= ~IN_BALANCE
  rq_repin_lock()
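The interleaving in the diagram can be replayed deterministically in a single thread. The sketch below is a simplified, hypothetical userspace model, not kernel code:

- rq_lock() and rq_unlock() only toggle a flag.
- rq_pin_lock() counts how often its WARN_ON_ONCE() condition holds.
- old_enqueue_defer() and replay_old_interleaving() are hypothetical names for the pre-patch enqueue path and for the three CPUs' steps.

Only the SCX_RQ_IN_BALANCE value is taken from the kernel's sched.h.

```c
#include <assert.h>
#include <stdbool.h>

#define SCX_RQ_IN_BALANCE (1u << 17) /* value from kernel/sched/sched.h */

/* Simplified userspace model; NOT the kernel's real definitions. */
struct rq;

struct balance_callback {
	struct balance_callback *next;
	void (*func)(struct rq *rq);
};

struct rq {
	bool locked;                               /* models holding the rq lock */
	unsigned int scx_flags;                    /* models rq->scx.flags */
	struct balance_callback *balance_callback; /* pending callbacks */
	struct balance_callback deferred_bal_cb;   /* models rq->scx.deferred_bal_cb */
	int warnings;                              /* times WARN_ON_ONCE() would fire */
};

static void rq_lock(struct rq *rq)
{
	assert(!rq->locked);
	rq->locked = true;
}

static void rq_unlock(struct rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/* Model of rq_pin_lock(): warn if someone else's callback is pending. */
static void rq_pin_lock(struct rq *rq)
{
	assert(rq->locked);
	if (rq->balance_callback)
		rq->warnings++;
}

static void deferred_bal_cb_workfn(struct rq *rq)
{
	(void)rq; /* the real work function is irrelevant to the race */
}

static void queue_balance_callback(struct rq *rq, struct balance_callback *head,
				   void (*func)(struct rq *rq))
{
	assert(rq->locked);
	head->func = func;
	head->next = rq->balance_callback;
	rq->balance_callback = head;
}

/* Pre-patch behavior: queue the callback directly while IN_BALANCE is set. */
static void old_enqueue_defer(struct rq *rq)
{
	if (rq->scx_flags & SCX_RQ_IN_BALANCE)
		queue_balance_callback(rq, &rq->deferred_bal_cb,
				       deferred_bal_cb_workfn);
}

/* Replays the diagram step by step; returns the number of warnings. */
static int replay_old_interleaving(void)
{
	struct rq rq = { 0 };

	rq_lock(&rq);                      /* CPU 0: scx_balance() holds the lock */
	rq.scx_flags |= SCX_RQ_IN_BALANCE; /* CPU 0: |= IN_BALANCE */
	rq_unlock(&rq);                    /* CPU 0: ops.dispatch() drops it */

	rq_lock(&rq);                      /* CPU 1 wins the race */
	old_enqueue_defer(&rq);            /* CPU 1: callback queued on the rq */
	rq_unlock(&rq);

	rq_lock(&rq);                      /* CPU 2 takes the lock... */
	rq_pin_lock(&rq);                  /* ...and sees CPU 1's callback: WARN */
	rq_unlock(&rq);

	rq_lock(&rq);                      /* CPU 0: dispatch reacquires the lock */
	rq.scx_flags &= ~SCX_RQ_IN_BALANCE;
	return rq.warnings;                /* CPU 0 would continue to rq_repin_lock() */
}
```

replay_old_interleaving() returns 1. CPU 2 observes a callback that CPU 1 queued while CPU 0 had temporarily dropped the lock.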

Changelog

v1 -> v2 (https://lore.kernel.org/sched-ext/aOgOxtHCeyRT_7jn@gpd4)

- Fixed explanation in patch description (Andrea)
- Fixed scx_rq mask state updates (Andrea)
- Added Reviewed-by tag from Andrea

Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Authored by Emil Tsalapatis and committed by Tejun Heo.
a8ad8731 efeeaac9

2 files changed, +28 -2

kernel/sched/ext.c (+27 -2)
@@ -780,13 +780,23 @@
 	if (rq->scx.flags & SCX_RQ_IN_WAKEUP)
 		return;
 
+	/* Don't do anything if there already is a deferred operation. */
+	if (rq->scx.flags & SCX_RQ_BAL_PENDING)
+		return;
+
 	/*
 	 * If in balance, the balance callbacks will be called before rq lock is
 	 * released. Schedule one.
+	 *
+	 *
+	 * We can't directly insert the callback into the
+	 * rq's list: The call can drop its lock and make the pending balance
+	 * callback visible to unrelated code paths that call rq_pin_lock().
+	 *
+	 * Just let balance_one() know that it must do it itself.
 	 */
 	if (rq->scx.flags & SCX_RQ_IN_BALANCE) {
-		queue_balance_callback(rq, &rq->scx.deferred_bal_cb,
-				       deferred_bal_cb_workfn);
+		rq->scx.flags |= SCX_RQ_BAL_CB_PENDING;
 		return;
 	}
 
@@ -2013,6 +2003,19 @@
 	dspc->cursor = 0;
 }
 
+static inline void maybe_queue_balance_callback(struct rq *rq)
+{
+	lockdep_assert_rq_held(rq);
+
+	if (!(rq->scx.flags & SCX_RQ_BAL_CB_PENDING))
+		return;
+
+	queue_balance_callback(rq, &rq->scx.deferred_bal_cb,
+			       deferred_bal_cb_workfn);
+
+	rq->scx.flags &= ~SCX_RQ_BAL_CB_PENDING;
+}
+
 static int balance_one(struct rq *rq, struct task_struct *prev)
 {
 	struct scx_sched *sch = scx_root;
@@ -2172,6 +2149,8 @@
 	}
 #endif
 	rq_repin_lock(rq, rf);
+
+	maybe_queue_balance_callback(rq);
 
 	return ret;
 }
kernel/sched/sched.h (+1)
@@ -784,6 +784,7 @@
 	SCX_RQ_BAL_KEEP = 1 << 3, /* balance decided to keep current */
 	SCX_RQ_BYPASSING = 1 << 4,
 	SCX_RQ_CLK_VALID = 1 << 5, /* RQ clock is fresh and valid */
+	SCX_RQ_BAL_CB_PENDING = 1 << 6, /* must queue a cb after dispatching */
 
 	SCX_RQ_IN_WAKEUP = 1 << 16,
 	SCX_RQ_IN_BALANCE = 1 << 17,
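A single-threaded replay of the CPU 0/1/2 interleaving from the commit message, switched to the patched logic, shows why the warning goes away. This is a simplified, hypothetical userspace model, not kernel code:

- new_enqueue_defer() and replay_new_interleaving() are hypothetical names.
- The early SCX_RQ_BAL_PENDING check and the non-balance deferral paths are omitted.
- Only maybe_queue_balance_callback() and the two flag values follow the patch directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values from the sched.h hunk of the patch. */
#define SCX_RQ_BAL_CB_PENDING (1u << 6)
#define SCX_RQ_IN_BALANCE     (1u << 17)

/* Simplified userspace model; NOT the kernel's real definitions. */
struct rq;

struct balance_callback {
	struct balance_callback *next;
	void (*func)(struct rq *rq);
};

struct rq {
	bool locked;                               /* models holding the rq lock */
	unsigned int scx_flags;                    /* models rq->scx.flags */
	struct balance_callback *balance_callback; /* pending callbacks */
	struct balance_callback deferred_bal_cb;   /* models rq->scx.deferred_bal_cb */
	int warnings;                              /* times WARN_ON_ONCE() would fire */
};

static void rq_lock(struct rq *rq)
{
	assert(!rq->locked);
	rq->locked = true;
}

static void rq_unlock(struct rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

static void rq_pin_lock(struct rq *rq)
{
	assert(rq->locked);
	if (rq->balance_callback)
		rq->warnings++;
}

static void deferred_bal_cb_workfn(struct rq *rq)
{
	(void)rq;
}

static void queue_balance_callback(struct rq *rq, struct balance_callback *head,
				   void (*func)(struct rq *rq))
{
	assert(rq->locked);
	head->func = func;
	head->next = rq->balance_callback;
	rq->balance_callback = head;
}

/* Patched enqueue path: only record that a callback is owed. */
static void new_enqueue_defer(struct rq *rq)
{
	assert(rq->locked);
	if (rq->scx_flags & SCX_RQ_IN_BALANCE)
		rq->scx_flags |= SCX_RQ_BAL_CB_PENDING;
}

/* Follows maybe_queue_balance_callback() from the patch. */
static void maybe_queue_balance_callback(struct rq *rq)
{
	assert(rq->locked); /* stands in for lockdep_assert_rq_held() */
	if (!(rq->scx_flags & SCX_RQ_BAL_CB_PENDING))
		return;
	queue_balance_callback(rq, &rq->deferred_bal_cb, deferred_bal_cb_workfn);
	rq->scx_flags &= ~SCX_RQ_BAL_CB_PENDING;
}

/* Replays the CPU 0/1/2 interleaving with the patched logic. */
static int replay_new_interleaving(struct rq *rq)
{
	rq_lock(rq);                      /* CPU 0: scx_balance() holds the lock */
	rq->scx_flags |= SCX_RQ_IN_BALANCE;
	rq_unlock(rq);                    /* CPU 0: ops.dispatch() drops it */

	rq_lock(rq);                      /* CPU 1: enqueue only sets the flag */
	new_enqueue_defer(rq);
	rq_unlock(rq);

	rq_lock(rq);                      /* CPU 2: the callback list is empty */
	rq_pin_lock(rq);
	rq_unlock(rq);

	rq_lock(rq);                      /* CPU 0: back from dispatch */
	rq->scx_flags &= ~SCX_RQ_IN_BALANCE;
	maybe_queue_balance_callback(rq); /* after rq_repin_lock(), lock held */
	return rq->warnings;
}
```

CPU 1 only sets SCX_RQ_BAL_CB_PENDING, so CPU 2 finds an empty callback list. The callback is queued only once CPU 0 holds the lock again after dispatch, so no other thread can observe it.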