Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/fair: Do not balance task to a throttled cfs_rq

When load balancing, it is an open question whether a task should be
migrated to a target cfs_rq that is in a throttled hierarchy.

The argument for allowing it: if the target CPU is idle or less loaded
and the task being balanced holds some kernel resources, migrating it
lets the task get the CPU earlier and release those resources sooner.
The argument against: if the task holds no kernel resources, the
migration is of little use.

While this is debatable in theory, a performance test[0] with 200
cgroups, each running hackbench (20 senders, 20 receivers) in pipe
mode, showed a performance degradation on AMD Genoa when load balancing
to a throttled cfs_rq was allowed. Analysis[1] showed that hackbench
does not like task migration across LLC boundaries. For this reason,
add a check in can_migrate_task() to forbid balancing to a cfs_rq that
is in a throttled hierarchy. This greatly reduced task migration and
restored performance.

[0]: https://lore.kernel.org/lkml/20250822110701.GB289@bytedance/
[1]: https://lore.kernel.org/lkml/20250903101102.GB42@bytedance/

Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>

Authored by Aaron Lu and committed by Peter Zijlstra
0d4eaf8c 253b3f58

+18 -4
kernel/sched/fair.c
···
 	return cfs_bandwidth_used() && cfs_rq->throttle_count;
 }
 
+static inline int lb_throttled_hierarchy(struct task_struct *p, int dst_cpu)
+{
+	return throttled_hierarchy(task_group(p)->cfs_rq[dst_cpu]);
+}
+
 static inline bool task_is_throttled(struct task_struct *p)
 {
 	return cfs_bandwidth_used() && p->throttled;
···
 }
 
 static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
+{
+	return 0;
+}
+
+static inline int lb_throttled_hierarchy(struct task_struct *p, int dst_cpu)
 {
 	return 0;
 }
···
 	/*
 	 * We do not migrate tasks that are:
 	 * 1) delayed dequeued unless we migrate load, or
-	 * 2) cannot be migrated to this CPU due to cpus_ptr, or
-	 * 3) running (obviously), or
-	 * 4) are cache-hot on their current CPU, or
-	 * 5) are blocked on mutexes (if SCHED_PROXY_EXEC is enabled)
+	 * 2) target cfs_rq is in throttled hierarchy, or
+	 * 3) cannot be migrated to this CPU due to cpus_ptr, or
+	 * 4) running (obviously), or
+	 * 5) are cache-hot on their current CPU, or
+	 * 6) are blocked on mutexes (if SCHED_PROXY_EXEC is enabled)
 	 */
 	if ((p->se.sched_delayed) && (env->migration_type != migrate_load))
+		return 0;
+
+	if (lb_throttled_hierarchy(p, env->dst_cpu))
 		return 0;
 
 	/*
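
To make the effect of the new check easier to see outside the kernel, here is a minimal user-space sketch of it. It is not kernel code: `NR_CPUS`, `struct task`, `bandwidth_used`, `can_migrate_task_model()` and `model_self_check()` are hypothetical names introduced for illustration, the structs are reduced to the one field the check reads, and the per-CPU cfs_rq array is embedded in the group rather than being an array of pointers as in the kernel. Only the throttled-hierarchy test is modeled; the other conditions in can_migrate_task() are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical: a tiny machine for the model */

/* Simplified stand-in for the kernel's struct cfs_rq. */
struct cfs_rq {
	int throttle_count; /* non-zero while in a throttled hierarchy */
};

/* Simplified stand-in for struct task_group: one cfs_rq per CPU. */
struct task_group {
	struct cfs_rq cfs_rq[NR_CPUS];
};

/* Hypothetical stand-in for struct task_struct. */
struct task {
	struct task_group *tg;
};

/* Models cfs_bandwidth_used(); assumed to be on in this sketch. */
static bool bandwidth_used = true;

static int throttled_hierarchy(const struct cfs_rq *cfs_rq)
{
	return bandwidth_used && cfs_rq->throttle_count;
}

/* Is the task's group throttled on the destination CPU? */
static int lb_throttled_hierarchy(const struct task *p, int dst_cpu)
{
	return throttled_hierarchy(&p->tg->cfs_rq[dst_cpu]);
}

/* Models only the check added by this commit; returns 1 if allowed. */
static int can_migrate_task_model(const struct task *p, int dst_cpu)
{
	if (lb_throttled_hierarchy(p, dst_cpu))
		return 0;
	return 1;
}

/* Small self-check: CPU 2 is throttled for this group, CPU 0 is not. */
void model_self_check(void)
{
	struct task_group tg = { 0 };
	struct task p = { .tg = &tg };

	tg.cfs_rq[2].throttle_count = 1;
	assert(can_migrate_task_model(&p, 0) == 1);
	assert(can_migrate_task_model(&p, 2) == 0);
}
```

The decision depends only on the destination CPU's cfs_rq for the task's group, so in this model the same task can be refused on one CPU and accepted on another.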