Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/fair: Fix wakeup_preempt_fair() vs delayed dequeue

Just as pick_next_entity() must dequeue delayed entities, so too must
wakeup_preempt_fair(). Finding a delayed task means it is eligible and
hence past the 0-lag point, ready for removal.

Worse, leaving delayed entities in consideration can skew the preemption
decision, with the end result that a short-slice wakeup does not result
in a preemption.
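
To make the 0-lag reasoning concrete, here is a minimal standalone sketch
(plain userspace C, not kernel code; the struct layout, helper names and the
simplified arithmetic are illustrative assumptions) of the EEVDF eligibility
rule the message relies on: an entity is eligible once its lag is
non-negative, i.e. its vruntime is no later than the queue's load-weighted
average vruntime, so a sched_delayed entity that turns out to be eligible has
paid back its lag and should be dequeued rather than left to distort the
preemption check.

/* Illustrative only: simplified model of EEVDF eligibility, not kernel code. */
#include <stdbool.h>
#include <stdio.h>

struct entity {
        long long vruntime;     /* virtual runtime consumed so far          */
        unsigned long weight;   /* load weight                              */
        bool sched_delayed;     /* slept with negative lag, kept enqueued   */
};

/* Load-weighted average vruntime over the queued entities. */
static long long avg_vruntime(const struct entity *q, int n)
{
        long long sum = 0;
        unsigned long w = 0;

        for (int i = 0; i < n; i++) {
                sum += q[i].vruntime * (long long)q[i].weight;
                w += q[i].weight;
        }
        return w ? sum / (long long)w : 0;
}

/* Eligible == lag >= 0 == vruntime not past the weighted average. */
static bool entity_eligible(const struct entity *e, long long avg)
{
        return e->vruntime <= avg;
}

int main(void)
{
        struct entity q[] = {
                { .vruntime = 100, .weight = 1024, .sched_delayed = false },
                { .vruntime =  90, .weight = 1024, .sched_delayed = true  },
                { .vruntime = 130, .weight = 1024, .sched_delayed = false },
        };
        long long avg = avg_vruntime(q, 3);

        for (int i = 0; i < 3; i++) {
                if (q[i].sched_delayed && entity_eligible(&q[i], avg))
                        printf("entity %d: delayed and past 0-lag -> dequeue it\n", i);
                else
                        printf("entity %d: eligible=%d\n", i, entity_eligible(&q[i], avg));
        }
        return 0;
}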

                          tip/sched/core   tip/sched/core   +this patch
cyclictest slice (ms)     2.8 (default)    8                8
hackbench slice (ms)      2.8 (default)    20               20

Total Samples          |   22559           22595            22683
Average (us)           |     157              64 ( 59%)        59 (  8%)
Median (P50) (us)      |      57              57 (  0%)        58 ( -2%)
90th Percentile (us)   |      64              60 (  6%)        60 (  0%)
99th Percentile (us)   |    2407              67 ( 97%)        67 (  0%)
99.9th Percentile (us) |    3400            2288 ( 33%)       727 ( 68%)
Maximum (us)           |    5037            9252 (-84%)      7461 ( 19%)
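
The patch does not say how the non-default slices above were configured; one
plausible way, assuming the sched_attr::sched_runtime slice hint that recent
kernels accept for SCHED_OTHER tasks, is sketched below. glibc has no
sched_setattr wrapper, so the struct and raw syscall are spelled out; treat
the field layout and the idea that the benchmarks were set up this way as
assumptions, not something stated in the patch.

/* Hedged sketch: request a custom per-task slice via sched_setattr(2). */
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;  /* for SCHED_OTHER: requested slice, in ns */
        uint64_t sched_deadline;
        uint64_t sched_period;
        uint32_t sched_util_min;
        uint32_t sched_util_max;
};

static int set_slice_ns(uint64_t slice_ns)
{
        struct sched_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.sched_policy = 0;            /* SCHED_OTHER */
        attr.sched_runtime = slice_ns;    /* slice suggestion */

        return syscall(SYS_sched_setattr, 0 /* current task */, &attr, 0);
}

int main(void)
{
        /* e.g. an 8 ms slice, as used for cyclictest in the table above */
        if (set_slice_ns(8ULL * 1000 * 1000))
                perror("sched_setattr");
        return 0;
}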

Fixes: f12e148892ed ("sched/fair: Prepare pick_next_task() for delayed dequeue")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260422093400.319251-1-vincent.guittot@linaro.org

Authored by Vincent Guittot, committed by Peter Zijlstra
ac8e69e6 c5cd6fd7
+15 -14
kernel/sched/fair.c
···
  *
  * Which allows tree pruning through eligibility.
  */
-static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq, bool protect)
+static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq, bool protect)
 {
        struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
        struct sched_entity *se = __pick_first_entity(cfs_rq);
···
                best = curr;

        return best;
-}
-
-static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
-{
-       return __pick_eevdf(cfs_rq, true);
 }

 struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
···
  *
  * 4) do not run the "skip" process, if something else is available
  */
 static struct sched_entity *
-pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
+pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq, bool protect)
 {
        struct sched_entity *se;

-       se = pick_eevdf(cfs_rq);
+       se = pick_eevdf(cfs_rq, protect);
        if (se->sched_delayed) {
                dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
                /*
···
 {
        enum preempt_wakeup_action preempt_action = PREEMPT_WAKEUP_PICK;
        struct task_struct *donor = rq->donor;
-       struct sched_entity *se = &donor->se, *pse = &p->se;
+       struct sched_entity *nse, *se = &donor->se, *pse = &p->se;
        struct cfs_rq *cfs_rq = task_cfs_rq(donor);
        int cse_is_idle, pse_is_idle;
···
        }

 pick:
-       /*
-        * If @p has become the most eligible task, force preemption.
-        */
-       if (__pick_eevdf(cfs_rq, preempt_action != PREEMPT_WAKEUP_SHORT) == pse)
+       nse = pick_next_entity(rq, cfs_rq, preempt_action != PREEMPT_WAKEUP_SHORT);
+       /* If @p has become the most eligible task, force preemption */
+       if (nse == pse)
                goto preempt;
+
+       /*
+        * Because p is enqueued, nse being null can only mean that we
+        * dequeued a delayed task.
+        */
+       if (!nse)
+               goto pick;

        if (sched_feat(RUN_TO_PARITY))
                update_protect_slice(cfs_rq, se);
···

        throttled |= check_cfs_rq_runtime(cfs_rq);

-       se = pick_next_entity(rq, cfs_rq);
+       se = pick_next_entity(rq, cfs_rq, true);
        if (!se)
                goto again;
        cfs_rq = group_cfs_rq(se);
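
Read as a whole, the patched pick block in wakeup_preempt_fair() becomes a
small retry loop. The lines below are a simplified restatement of the hunk
above, not a copy of the kernel source, with the control flow spelled out in
comments:

pick:
        /* Pick with slice protection unless this is a short-slice wakeup. */
        nse = pick_next_entity(rq, cfs_rq, preempt_action != PREEMPT_WAKEUP_SHORT);
        if (nse == pse)         /* the waking task is now the most eligible */
                goto preempt;
        if (!nse)               /* a delayed entity was dequeued; p is still
                                 * enqueued, so the queue cannot be empty:
                                 * simply pick again */
                goto pick;
        /* otherwise no forced preemption; fall through to slice protection */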