Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/fair: More complex proportional newidle balance

It turns out that a few workloads (easyWave, fio) have a fairly low
success rate on newidle balance, but still benefit greatly from having
it.

Luckily these workloads have a fairly low newidle rate, so the cost of
doing the newidle balance is relatively low, even if unsuccessful.

Add a simple rate-based term to the newidle ratio computation, such
that a low newidle rate still results in a high newidle ratio.

This cures the easyWave and fio cases while leaving the schbench
numbers unaffected (schbench has a very high newidle rate).

Reported-by: Mario Roy <marioeroy@gmail.com>
Reported-by: "Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Mario Roy <marioeroy@gmail.com>
Tested-by: "Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com>
Link: https://patch.msgid.link/20260127151748.GA1079264@noisy.programming.kicks-ass.net

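 include/linux/sched/topology.h |  1 +
 kernel/sched/fair.c            | 27 +++++++++++++++++++++++++--
 kernel/sched/features.h        |  1 +
 kernel/sched/topology.c        |  3 +++
 4 files changed, 30 insertions(+), 2 deletions(-)
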
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -95,6 +95,7 @@
 	unsigned int newidle_call;
 	unsigned int newidle_success;
 	unsigned int newidle_ratio;
+	u64 newidle_stamp;
 	u64 max_newidle_lb_cost;
 	unsigned long last_decay_max_lb_cost;
 
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12289,7 +12289,30 @@
 	sd->newidle_success += success;
 
 	if (sd->newidle_call >= 1024) {
-		sd->newidle_ratio = sd->newidle_success;
+		u64 now = sched_clock();
+		s64 delta = now - sd->newidle_stamp;
+		sd->newidle_stamp = now;
+		int ratio = 0;
+
+		if (delta < 0)
+			delta = 0;
+
+		if (sched_feat(NI_RATE)) {
+			/*
+			 * ratio    delta    freq
+			 *
+			 *  1024 -    4 s -  128 Hz
+			 *   512 -    2 s -  256 Hz
+			 *   256 -    1 s -  512 Hz
+			 *   128 -   .5 s - 1024 Hz
+			 *    64 -  .25 s - 2048 Hz
+			 */
+			ratio = delta >> 22;
+		}
+
+		ratio += sd->newidle_success;
+
+		sd->newidle_ratio = min(1024, ratio);
 		sd->newidle_call /= 2;
 		sd->newidle_success /= 2;
 	}
@@ -12996,7 +13019,7 @@
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
 			unsigned int weight = 1;
 
-			if (sched_feat(NI_RANDOM)) {
+			if (sched_feat(NI_RANDOM) && sd->newidle_ratio < 1024) {
 				/*
 				 * Throw a 1k sided dice; and only run
 				 * newidle_balance according to the success
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -126,3 +126,4 @@
  * Do newidle balancing proportional to its success rate using randomization.
  */
 SCHED_FEAT(NI_RANDOM, true)
+SCHED_FEAT(NI_RATE, true)
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/sched/isolation.h>
+#include <linux/sched/clock.h>
 #include <linux/bsearch.h>
 #include "sched.h"
 
@@ -1642,6 +1643,7 @@
 	struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu);
 	int sd_id, sd_weight, sd_flags = 0;
 	struct cpumask *sd_span;
+	u64 now = sched_clock();
 
 	sd_weight = cpumask_weight(tl->mask(tl, cpu));
 
@@ -1679,6 +1681,7 @@
 		.newidle_call = 512,
 		.newidle_success = 256,
 		.newidle_ratio = 512,
+		.newidle_stamp = now,
 
 		.max_newidle_lb_cost = 0,
 		.last_decay_max_lb_cost = jiffies,