sched_ext: Decouple kfunc unlocked-context check from kf_mask

scx_kf_allowed_if_unlocked() uses !current->scx.kf_mask as a proxy for "no
SCX-tracked lock held". kf_mask is removed in a follow-up patch, so its two
callers, select_cpu_from_kfunc() and scx_dsq_move(), need another way to
tell which locking context they are running in.

Add a new bool, scx_rq.in_select_cpu, which is set across the
SCX_CALL_OP_TASK_RET() that invokes ops.select_cpu(). It captures the one
case where SCX itself holds no lock but try_to_wake_up() holds @p's
pi_lock. Together with scx_locked_rq(), it expresses the same set of
accepted contexts as the old kf_mask test.

select_cpu_from_kfunc() needs a runtime test because it has to take
different locking paths depending on context. Open-code it as a three-way
branch. The unlocked branch takes @p's pi_lock directly with
raw_spin_lock_irqsave(): pi_lock alone is enough for the fields the kfunc
reads (p->cpus_ptr and p->nr_cpus_allowed), and it is lighter than
task_rq_lock().

scx_dsq_move() doesn't really need a runtime test - its accepted contexts
could be enforced at verifier load time. But since the runtime state is
already there and using it keeps the upcoming load-time filter simpler, just
write it the same way: (scx_locked_rq() || in_select_cpu) &&
!kf_allowed(DISPATCH).

scx_kf_allowed_if_unlocked() has no callers left after these conversions
and is deleted.

No semantic change.

v2: s/No functional change/No semantic change/ - the unlocked path now acquires
pi_lock instead of the heavier task_rq_lock() (Andrea Righi).

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

Tejun Heo 2 months ago 0022b328 b470e37c

4 files changed, 21 insertions(+), 28 deletions(-)

kernel/sched/ext.c (+3 -1)

--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3308,10 +3308,12 @@
 		WARN_ON_ONCE(*ddsp_taskp);
 		*ddsp_taskp = p;
 
+		this_rq()->scx.in_select_cpu = true;
 		cpu = SCX_CALL_OP_TASK_RET(sch,
 					   SCX_KF_ENQUEUE | SCX_KF_SELECT_CPU,
 					   select_cpu, NULL, p, prev_cpu,
 					   wake_flags);
+		this_rq()->scx.in_select_cpu = false;
 		p->scx.selected_cpu = cpu;
 		*ddsp_taskp = NULL;
 		if (ops_cpu_valid(sch, cpu, "from ops.select_cpu()"))
@@ -8146,7 +8144,7 @@
 	bool in_balance;
 	unsigned long flags;
 
-	if (!scx_kf_allowed_if_unlocked() &&
+	if ((scx_locked_rq() || this_rq()->scx.in_select_cpu) &&
 	    !scx_kf_allowed(sch, SCX_KF_DISPATCH))
 		return false;
 

kernel/sched/ext_idle.c (+17 -22)

--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -913,8 +913,8 @@
 				 s32 prev_cpu, u64 wake_flags,
 				 const struct cpumask *allowed, u64 flags)
 {
-	struct rq *rq;
-	struct rq_flags rf;
+	unsigned long irq_flags;
+	bool we_locked = false;
 	s32 cpu;
 
 	if (!ops_cpu_valid(sch, prev_cpu, NULL))
@@ -924,27 +924,22 @@
 		return -EBUSY;
 
 	/*
-	 * If called from an unlocked context, acquire the task's rq lock,
-	 * so that we can safely access p->cpus_ptr and p->nr_cpus_allowed.
+	 * Accessing p->cpus_ptr / p->nr_cpus_allowed needs either @p's rq
+	 * lock or @p's pi_lock. Three cases:
 	 *
-	 * Otherwise, allow to use this kfunc only from ops.select_cpu()
-	 * and ops.select_enqueue().
+	 * - inside ops.select_cpu(): try_to_wake_up() holds @p's pi_lock.
+	 * - other rq-locked SCX op: scx_locked_rq() points at the held rq.
+	 * - truly unlocked (UNLOCKED ops, SYSCALL, non-SCX struct_ops):
+	 *   nothing held, take pi_lock ourselves.
 	 */
-	if (scx_kf_allowed_if_unlocked()) {
-		rq = task_rq_lock(p, &rf);
-	} else {
-		if (!scx_kf_allowed(sch, SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
-			return -EPERM;
-		rq = scx_locked_rq();
-	}
-
-	/*
-	 * Validate locking correctness to access p->cpus_ptr and
-	 * p->nr_cpus_allowed: if we're holding an rq lock, we're safe;
-	 * otherwise, assert that p->pi_lock is held.
-	 */
-	if (!rq)
+	if (this_rq()->scx.in_select_cpu) {
 		lockdep_assert_held(&p->pi_lock);
+	} else if (!scx_locked_rq()) {
+		raw_spin_lock_irqsave(&p->pi_lock, irq_flags);
+		we_locked = true;
+	} else if (!scx_kf_allowed(sch, SCX_KF_ENQUEUE)) {
+		return -EPERM;
+	}
 
 	/*
 	 * This may also be called from ops.enqueue(), so we need to handle
@@ -958,8 +963,8 @@
 					 allowed ?: p->cpus_ptr, flags);
 	}
 
-	if (scx_kf_allowed_if_unlocked())
-		task_rq_unlock(rq, p, &rf);
+	if (we_locked)
+		raw_spin_unlock_irqrestore(&p->pi_lock, irq_flags);
 
 	return cpu;
 }

kernel/sched/ext_internal.h (-5)

--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -1372,11 +1372,6 @@
 	return __this_cpu_read(scx_locked_rq_state);
 }
 
-static inline bool scx_kf_allowed_if_unlocked(void)
-{
-	return !current->scx.kf_mask;
-}
-
 static inline bool scx_bypassing(struct scx_sched *sch, s32 cpu)
 {
 	return unlikely(per_cpu_ptr(sch->pcpu, cpu)->flags &

kernel/sched/sched.h

--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -798,6 +798,7 @@
 	u64			extra_enq_flags;	/* see move_task_to_local_dsq() */
 	u32			nr_running;
 	u32			cpuperf_target;		/* [0, SCHED_CAPACITY_SCALE] */
+	bool			in_select_cpu;
 	bool			cpu_released;
 	u32			flags;
 	u32			nr_immed;		/* ENQ_IMMED tasks on local_dsq */
