Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched_ext: Don't disable tasks in scx_sub_enable_workfn() abort path

scx_sub_enable_workfn()'s prep loop calls __scx_init_task(sch, p, false)
without transitioning task state, then sets SCX_TASK_SUB_INIT. If prep fails
partway, the abort path runs __scx_disable_and_exit_task(sch, p) on the
marked tasks. Task state is still the parent's ENABLED, so that dispatches
to the SCX_TASK_ENABLED arm and calls scx_disable_task(sch, p) - i.e.
child->ops.disable() - for tasks on which child->ops.enable() never ran. A
BPF sub-scheduler that allocates per-task state in ops.enable() and frees it
in ops.disable() would then operate on uninitialized state.
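
The unbalanced callback sequence can be illustrated with a small userspace model. This is only a sketch: every toy_* name is hypothetical and none of it is kernel code. It assumes the pre-fix abort path amounts to "disable, then exit_task with cancelled=false", and the patched path to "exit_task with cancelled=true" only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a sub-scheduler's per-task bookkeeping. */
struct toy_task {
	bool state_allocated;     /* set by toy_enable(), cleared by toy_disable() */
	int unbalanced_disables;  /* disable calls that had no matching enable */
	int cancelled_exits;      /* exit_task calls with cancelled == true */
	int normal_exits;         /* exit_task calls with cancelled == false */
};

/* Models ops.enable(): per-task state comes into existence here. */
void toy_enable(struct toy_task *t)
{
	t->state_allocated = true;
}

/* Models ops.disable(): expects state created by a prior toy_enable(). */
void toy_disable(struct toy_task *t)
{
	if (!t->state_allocated) {
		/* A real scheduler would be touching uninitialized state here. */
		t->unbalanced_disables++;
		return;
	}
	t->state_allocated = false;
}

/* Models ops.exit_task() with the cancelled argument. */
void toy_exit_task(struct toy_task *t, bool cancelled)
{
	if (cancelled)
		t->cancelled_exits++;
	else
		t->normal_exits++;
}

/* Assumed pre-fix abort behaviour: disable first, then a normal exit. */
void toy_abort_old(struct toy_task *t)
{
	toy_disable(t);
	toy_exit_task(t, false);
}

/* Patched abort behaviour as described: undo init only, flagged as cancelled. */
void toy_abort_new(struct toy_task *t)
{
	toy_exit_task(t, true);
}
```

Running toy_abort_old() on a task that was initialized but never enabled records one unbalanced disable, while toy_abort_new() records none and reports the exit as cancelled.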

The dying-task branch in scx_disable_and_exit_task() has the same problem,
and scx_enabling_sub_sched was cleared before the abort cleanup loop - a
task exiting during cleanup tripped the WARN and skipped both ops.exit_task
and the SCX_TASK_SUB_INIT clear, leaking per-task resources and leaving the
task stuck.

Introduce scx_sub_init_cancel_task() that calls ops.exit_task with
cancelled=true - matching what the top-level init path does when init_task
itself returns -errno. Use it in the abort loop and in the dying-task
branch. scx_enabling_sub_sched now stays set until the abort loop finishes
clearing SUB_INIT, so concurrent exits hitting the dying-task branch can
still find @sch. That branch also clears SCX_TASK_SUB_INIT unconditionally
when seen, leaving the task unmarked even if the WARN fires.
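
The ordering requirement can be sketched in the same style. This is a single-threaded userspace model, not the kernel implementation: the toy_* names are hypothetical, locking is omitted, and a "concurrent" exit is simulated by running the exit path for one task just before the cleanup loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SUB_INIT 0x1u  /* models SCX_TASK_SUB_INIT */

struct toy_sched {
	int cancelled_exits;  /* exit_task(cancelled=true) calls received */
};

struct toy_task {
	unsigned int flags;
};

/* Models scx_enabling_sub_sched. */
struct toy_sched *toy_enabling_sub;

/*
 * Models the patched dying-task branch. Returns true when the warning
 * would fire. The flag is cleared either way, so the task is never left
 * marked.
 */
bool toy_task_exit(struct toy_task *t)
{
	bool warned = false;

	if (t->flags & TOY_SUB_INIT) {
		if (toy_enabling_sub)
			toy_enabling_sub->cancelled_exits++;
		else
			warned = true;
		t->flags &= ~TOY_SUB_INIT;
	}
	return warned;
}

/*
 * Models the patched abort loop. Task @dying exits while cleanup is in
 * progress (pass an index >= n for no exit). The global pointer is
 * cleared only after every marked task has been handled.
 */
bool toy_abort_cleanup(struct toy_sched *sch, struct toy_task *tasks,
		       size_t n, size_t dying)
{
	bool warned = false;
	size_t i;

	if (dying < n)
		warned = toy_task_exit(&tasks[dying]);

	for (i = 0; i < n; i++) {
		if (tasks[i].flags & TOY_SUB_INIT) {
			sch->cancelled_exits++;
			tasks[i].flags &= ~TOY_SUB_INIT;
		}
	}
	toy_enabling_sub = NULL;
	return warned;
}
```

With the pointer still set during the loop, the exiting task finds its scheduler and is cancelled exactly once. If the pointer were cleared first, toy_task_exit() would take the warning path instead, which corresponds to the leak described above.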

Fixes: 337ec00b1d9c ("sched_ext: Implement cgroup sub-sched enabling and disabling")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

+30 -6
kernel/sched/ext.c
···
 		SCX_CALL_OP_TASK(sch, exit_task, task_rq(p), p, &args);
 }
 
+/*
+ * Undo a completed __scx_init_task(sch, p, false) when scx_enable_task() never
+ * ran. The task state has not been transitioned, so this mirrors the
+ * SCX_TASK_INIT branch in __scx_disable_and_exit_task().
+ */
+static void scx_sub_init_cancel_task(struct scx_sched *sch, struct task_struct *p)
+{
+	struct scx_exit_task_args args = { .cancelled = true };
+
+	lockdep_assert_held(&p->pi_lock);
+	lockdep_assert_rq_held(task_rq(p));
+
+	if (SCX_HAS_OP(sch, exit_task))
+		SCX_CALL_OP_TASK(sch, exit_task, task_rq(p), p, &args);
+}
+
 static void scx_disable_and_exit_task(struct scx_sched *sch,
 				      struct task_struct *p)
 {
···
 	/*
 	 * If set, @p exited between __scx_init_task() and scx_enable_task() in
 	 * scx_sub_enable() and is initialized for both the associated sched and
-	 * its parent. Disable and exit for the child too.
+	 * its parent. Exit for the child too - scx_enable_task() never ran for
+	 * it, so undo only init_task.
 	 */
-	if ((p->scx.flags & SCX_TASK_SUB_INIT) &&
-	    !WARN_ON_ONCE(!scx_enabling_sub_sched)) {
-		__scx_disable_and_exit_task(scx_enabling_sub_sched, p);
+	if (p->scx.flags & SCX_TASK_SUB_INIT) {
+		if (!WARN_ON_ONCE(!scx_enabling_sub_sched))
+			scx_sub_init_cancel_task(scx_enabling_sub_sched, p);
 		p->scx.flags &= ~SCX_TASK_SUB_INIT;
 	}
 
···
 abort:
 	put_task_struct(p);
 	scx_task_iter_stop(&sti);
-	scx_enabling_sub_sched = NULL;
 
+	/*
+	 * Undo __scx_init_task() for tasks we marked. scx_enable_task() never
+	 * ran for @sch on them, so calling scx_disable_task() here would invoke
+	 * ops.disable() without a matching ops.enable(). scx_enabling_sub_sched
+	 * must stay set until SUB_INIT is cleared from every marked task -
+	 * scx_disable_and_exit_task() reads it when a task exits concurrently.
+	 */
 	scx_task_iter_start(&sti, sch->cgrp);
 	while ((p = scx_task_iter_next_locked(&sti))) {
 		if (p->scx.flags & SCX_TASK_SUB_INIT) {
-			__scx_disable_and_exit_task(sch, p);
+			scx_sub_init_cancel_task(sch, p);
 			p->scx.flags &= ~SCX_TASK_SUB_INIT;
 		}
 	}
 	scx_task_iter_stop(&sti);
+	scx_enabling_sub_sched = NULL;
 err_unlock_and_disable:
 	/* we'll soon enter disable path, keep bypass on */
 	scx_cgroup_unlock();