Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/topology: Compute sd_weight considering cpuset partitions

The "sd_weight" used for calculating the load balancing interval, and
its limits, considers the span weight of the entire topology level
without accounting for cpuset partitions.

For example, consider a large system of 128 CPUs divided into eight
16-CPU partitions, which is typical when deploying virtual machines:

[ PKG Domain: 128CPUs ]

[Partition0: 16CPUs][Partition1: 16CPUs] ... [Partition7: 16CPUs]

Although each partition contains only 16 CPUs, the load balancing
interval is set to a minimum of 128 jiffies, based on the span of the
entire 128-CPU domain. This can lead to longer-lived imbalances within
a partition, even though balancing across its 16 CPUs is cheaper.

Compute the "sd_weight" after computing the "sd_span" considering the
cpu_map covered by the partition, and set the load balancing interval,
and its limits accordingly.
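
The effect of restricting the span to the partition can be sketched in userspace C. This is only an illustration under stated assumptions: `struct mask128`, `mask_set_range()`, `mask_and()`, `mask_weight()`, `weight_before()` and `weight_after()` are hypothetical stand-ins for the kernel's cpumask helpers, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, 128 CPUs. */
#define NR_CPUS 128

struct mask128 {
	uint64_t bits[NR_CPUS / 64];
};

/* Set CPUs [first, first + count) in the mask. */
static void mask_set_range(struct mask128 *m, unsigned int first,
			   unsigned int count)
{
	assert(first + count <= NR_CPUS);
	for (unsigned int cpu = first; cpu < first + count; cpu++)
		m->bits[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

/* dst = a & b, analogous in spirit to cpumask_and(). */
static void mask_and(struct mask128 *dst, const struct mask128 *a,
		     const struct mask128 *b)
{
	for (unsigned int i = 0; i < NR_CPUS / 64; i++)
		dst->bits[i] = a->bits[i] & b->bits[i];
}

/* Count the set bits, analogous in spirit to cpumask_weight(). */
static unsigned int mask_weight(const struct mask128 *m)
{
	unsigned int weight = 0;

	for (unsigned int i = 0; i < NR_CPUS / 64; i++)
		for (uint64_t w = m->bits[i]; w; w &= w - 1)
			weight++;
	return weight;
}

/* Old behaviour: weight of the whole topology level. */
static unsigned int weight_before(const struct mask128 *level)
{
	return mask_weight(level);
}

/* New behaviour: weight of the level restricted to the partition's cpu_map. */
static unsigned int weight_after(const struct mask128 *level,
				 const struct mask128 *cpu_map)
{
	struct mask128 span;

	mask_and(&span, level, cpu_map);
	return mask_weight(&span);
}
```

With a level mask covering CPUs 0-127 and a partition covering CPUs 16-31, `weight_before()` yields 128 while `weight_after()` yields 16.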

For the above example, the balancing intervals for each partition's PKG
domain change as follows:

                  before  after
balance_interval     128     16
min_interval         128     16
max_interval         256     32

Intervals are now proportional to the number of CPUs in the partitioned
domain, as intended by the original formula.
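
The numbers in the table can be reproduced with a small userspace C sketch. It assumes, based on the table and the `.min_interval = sd_weight` initializer visible in the diff, that the minimum and initial intervals equal the weight and the maximum is twice the weight. `struct lb_intervals` and `intervals_for_weight()` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical mirror of the three interval fields discussed above. */
struct lb_intervals {
	unsigned int balance_interval;
	unsigned int min_interval;
	unsigned int max_interval;
};

/*
 * Assumption: min = weight, max = 2 * weight, initial interval = weight.
 * This matches the before/after table but is not copied from the kernel.
 */
static struct lb_intervals intervals_for_weight(unsigned int sd_weight)
{
	assert(sd_weight > 0);
	return (struct lb_intervals){
		.balance_interval = sd_weight,
		.min_interval = sd_weight,
		.max_interval = 2 * sd_weight,
	};
}
```

Calling `intervals_for_weight(128)` gives the "before" column and `intervals_for_weight(16)` gives the "after" column.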

Fixes: cb83b629bae03 ("sched/numa: Rewrite the CONFIG_NUMA sched domain support")
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://patch.msgid.link/20260312044434.1974-2-kprateek.nayak@amd.com

Authored by K Prateek Nayak and committed by Peter Zijlstra
8e8e23de 786244f7

+6 -8
kernel/sched/topology.c
···
 	struct cpumask *sd_span;
 	u64 now = sched_clock();
 
-	sd_weight = cpumask_weight(tl->mask(tl, cpu));
+	sd_span = sched_domain_span(sd);
+	cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
+	sd_weight = cpumask_weight(sd_span);
+	sd_id = cpumask_first(sd_span);
 
 	if (tl->sd_flags)
 		sd_flags = (*tl->sd_flags)();
 	if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
-			"wrong sd_flags in topology description\n"))
+			"wrong sd_flags in topology description\n"))
 		sd_flags &= TOPOLOGY_SD_FLAGS;
+	sd_flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
 
 	*sd = (struct sched_domain){
 		.min_interval = sd_weight,
···
 		.child = child,
 		.name = tl->name,
 	};
-
-	sd_span = sched_domain_span(sd);
-	cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
-	sd_id = cpumask_first(sd_span);
-
-	sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
 
 	WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
 		  (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY),