Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

sched/fair: Make SCHED_IDLE entity be preempted in strict hierarchy

Consider the following cgroup:

                       root
                        |
             ------------------------
             |                      |
       normal_cgroup            idle_cgroup
             |                      |
   SCHED_IDLE task_A           SCHED_NORMAL task_B

According to the cgroup hierarchy, A should preempt B. But the current
check_preempt_wakeup_fair() treats the cgroup se and the task separately,
so B will preempt A unexpectedly.
Unify the wakeup logic so that it relies on {c,p}se_is_idle only. This
makes SCHED_IDLE of a task a relative policy that is effective only
within its own cgroup, similar to the behavior of NICE.
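
The decision order described above can be sketched in userspace C. This is a minimal model, not kernel code: `struct entity`, `match_level()`, `entity_is_idle()` and `wakeup_verdict()` are hypothetical names that stand in for the scheduling entity, find_matching_se(), se_is_idle() and the start of check_preempt_wakeup_fair(). It assumes WAKEUP_PREEMPTION is enabled and stops where the real function would go on to its fairness check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
enum policy { POLICY_NORMAL, POLICY_BATCH, POLICY_IDLE };
enum verdict { NO_PREEMPT, PREEMPT, FAIRNESS_CHECK };

struct entity {
	struct entity *parent;	/* NULL when queued directly under root */
	int depth;		/* 0 for direct children of root */
	bool is_task;
	bool group_idle;	/* meaningful for groups only */
	enum policy policy;	/* meaningful for tasks only */
};

/* A task is idle by policy, a group by its idle flag. */
static bool entity_is_idle(const struct entity *e)
{
	return e->is_task ? e->policy == POLICY_IDLE : e->group_idle;
}

/* Walk both entities up until they are siblings (share a parent). */
static void match_level(const struct entity **a, const struct entity **b)
{
	while ((*a)->depth > (*b)->depth)
		*a = (*a)->parent;
	while ((*b)->depth > (*a)->depth)
		*b = (*b)->parent;
	while ((*a)->parent != (*b)->parent) {
		*a = (*a)->parent;
		*b = (*b)->parent;
	}
}

/* Decide what a wakeup of `waking` does while `curr` is running. */
static enum verdict wakeup_verdict(const struct entity *curr,
				   const struct entity *waking)
{
	const struct entity *cse = curr, *pse = waking;
	bool c_idle, p_idle;

	assert(curr && waking && curr->is_task && waking->is_task);
	match_level(&cse, &pse);
	c_idle = entity_is_idle(cse);
	p_idle = entity_is_idle(pse);

	if (c_idle && !p_idle)
		return PREEMPT;		/* idle entity yields to non-idle one */
	if (c_idle != p_idle)
		return NO_PREEMPT;	/* the inverse case */
	if (waking->policy != POLICY_NORMAL)
		return NO_PREEMPT;	/* BATCH and IDLE tasks do not preempt */
	return FAIRNESS_CHECK;		/* left to the regular fairness logic */
}

/* The hierarchy from the commit message. */
static struct entity normal_cgroup = { .parent = NULL, .depth = 0,
	.is_task = false, .group_idle = false, .policy = POLICY_NORMAL };
static struct entity idle_cgroup = { .parent = NULL, .depth = 0,
	.is_task = false, .group_idle = true, .policy = POLICY_NORMAL };
static struct entity task_a = { .parent = &normal_cgroup, .depth = 1,
	.is_task = true, .group_idle = false, .policy = POLICY_IDLE };
static struct entity task_b = { .parent = &idle_cgroup, .depth = 1,
	.is_task = true, .group_idle = false, .policy = POLICY_NORMAL };
```

In this model, `wakeup_verdict(&task_b, &task_a)` (A wakes while B runs) compares the two cgroup entities and returns `PREEMPT`, while the reverse call returns `NO_PREEMPT`. The task policy is only consulted once both matched entities agree on idleness.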

Also fix se_is_idle() definition when !CONFIG_FAIR_GROUP_SCHED.
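
Why the !CONFIG_FAIR_GROUP_SCHED stub matters once the wakeup path relies on se_is_idle() alone can be illustrated with a small userspace sketch. All names here (`struct model_task`, `stub_is_idle_old()`, `stub_is_idle_new()`, `idle_check_preempts()`) are hypothetical and only mimic the two stub variants shown in the diff. The sketch assumes that without group scheduling every entity is a task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; not the kernel's task_struct or sched_entity. */
enum model_policy { MODEL_NORMAL, MODEL_BATCH, MODEL_IDLE };
struct model_task { enum model_policy policy; };

typedef bool (*is_idle_fn)(const struct model_task *);

/* Old stub: reports that no entity is ever idle. */
static bool stub_is_idle_old(const struct model_task *t)
{
	(void)t;
	return false;
}

/* Fixed stub: every entity is a task, so use the task's policy. */
static bool stub_is_idle_new(const struct model_task *t)
{
	return t->policy == MODEL_IDLE;
}

/* True when the unified idle comparison alone decides to preempt curr. */
static bool idle_check_preempts(is_idle_fn is_idle,
				const struct model_task *curr,
				const struct model_task *waking)
{
	assert(is_idle && curr && waking);
	return is_idle(curr) && !is_idle(waking);
}

static const struct model_task idle_curr = { MODEL_IDLE };
static const struct model_task normal_waker = { MODEL_NORMAL };
```

With the old stub, `idle_check_preempts(stub_is_idle_old, &idle_curr, &normal_waker)` is false, so a running SCHED_IDLE task would no longer be preempted by this check after the task-level test is removed. With the fixed stub the same call is true.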

Fixes: 304000390f88 ("sched: Cgroup SCHED_IDLE support")
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Don <joshdon@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20240626023505.1332596-1-dtcccc@linux.alibaba.com

Authored by Tianchen Ding and committed by Peter Zijlstra
faa42d29 a58501fb

+9 -13
kernel/sched/fair.c
···
 
 static int se_is_idle(struct sched_entity *se)
 {
-	return 0;
+	return task_has_idle_policy(task_of(se));
 }
 
 #endif /* CONFIG_FAIR_GROUP_SCHED */
···
 	if (test_tsk_need_resched(curr))
 		return;
 
-	/* Idle tasks are by definition preempted by non-idle tasks. */
-	if (unlikely(task_has_idle_policy(curr)) &&
-	    likely(!task_has_idle_policy(p)))
-		goto preempt;
-
-	/*
-	 * Batch and idle tasks do not preempt non-idle tasks (their preemption
-	 * is driven by the tick):
-	 */
-	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
+	if (!sched_feat(WAKEUP_PREEMPTION))
 		return;
 
 	find_matching_se(&se, &pse);
···
 	pse_is_idle = se_is_idle(pse);
 
 	/*
-	 * Preempt an idle group in favor of a non-idle group (and don't preempt
+	 * Preempt an idle entity in favor of a non-idle entity (and don't preempt
 	 * in the inverse case).
 	 */
 	if (cse_is_idle && !pse_is_idle)
···
 	if (cse_is_idle != pse_is_idle)
 		return;
 
+	/*
+	 * BATCH and IDLE tasks do not preempt others.
+	 */
+	if (unlikely(p->policy != SCHED_NORMAL))
+		return;
+
 	cfs_rq = cfs_rq_of(se);
 	update_curr(cfs_rq);
-
 	/*
 	 * XXX pick_eevdf(cfs_rq) != se ?
 	 */
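
For comparison, the task-level checks removed by this diff can be modeled in userspace to show how they produced the unexpected result from the commit message. This is a sketch under assumptions: `enum old_policy`, `enum old_verdict` and `old_task_level_checks()` are hypothetical names, WAKEUP_PREEMPTION is taken to be enabled, and the model stops where the old code would have gone on to find_matching_se().

```c
#include <assert.h>

/* Hypothetical model of the removed checks; not kernel code. */
enum old_policy { OLD_NORMAL, OLD_BATCH, OLD_IDLE };
enum old_verdict { OLD_NO_PREEMPT, OLD_PREEMPT, OLD_CONTINUE };

/*
 * The removed logic looked only at the two tasks' policies and ran
 * before any cgroup entity was compared.
 */
static enum old_verdict old_task_level_checks(enum old_policy curr,
					      enum old_policy waking)
{
	/* Idle tasks were preempted by non-idle tasks unconditionally. */
	if (curr == OLD_IDLE && waking != OLD_IDLE)
		return OLD_PREEMPT;

	/* Batch and idle tasks never preempted at wakeup. */
	if (waking != OLD_NORMAL)
		return OLD_NO_PREEMPT;

	/* Only now would the group-level idle comparison run. */
	return OLD_CONTINUE;
}
```

For the hierarchy in the commit message, `old_task_level_checks(OLD_IDLE, OLD_NORMAL)` (B wakes while A runs) returns `OLD_PREEMPT` even though B sits in the idle cgroup, and `old_task_level_checks(OLD_NORMAL, OLD_IDLE)` (A wakes while B runs) returns `OLD_NO_PREEMPT`. Neither path ever reaches the cgroup comparison.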