Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/deadline: Check bandwidth overflow earlier for hotplug

Currently we check for bandwidth overflow potentially due to hotplug
operations at the end of sched_cpu_deactivate(), after the cpu going
offline has already been removed from scheduling, active_mask, etc.
This can create issues for DEADLINE tasks, as there is a substantial
race window between the start of sched_cpu_deactivate() and the moment
we possibly decide to roll-back the operation if dl_bw_deactivate()
returns failure in cpuset_cpu_inactive(). An example is a throttled
task that sees its replenishment timer firing while the cpu it was
previously running on is considered offline, but before
dl_bw_deactivate() had a chance to say no and roll-back happened.

Fix this by directly calling dl_bw_deactivate() first thing in
sched_cpu_deactivate() and doing the required calculation in the former
function, considering the cpu passed as an argument as offline already.

By doing so we also simplify sched_cpu_deactivate(), as there is no need
anymore for any kind of roll-back if we fail early.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Tested-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/Zzc1DfPhbvqDDIJR@jlelli-thinkpadt14gen4.remote.csb

Authored by Juri Lelli and committed by Peter Zijlstra
53916d5f d4742f6e

+17 -17
+7 -15
kernel/sched/core.c
···
 	cpuset_update_active_cpus();
 }
 
-static int cpuset_cpu_inactive(unsigned int cpu)
+static void cpuset_cpu_inactive(unsigned int cpu)
 {
 	if (!cpuhp_tasks_frozen) {
-		int ret = dl_bw_deactivate(cpu);
-
-		if (ret)
-			return ret;
 		cpuset_update_active_cpus();
 	} else {
 		num_cpus_frozen++;
 		partition_sched_domains(1, NULL, NULL);
 	}
-	return 0;
 }
 
 static inline void sched_smt_present_inc(int cpu)
···
 	struct rq *rq = cpu_rq(cpu);
 	int ret;
 
+	ret = dl_bw_deactivate(cpu);
+
+	if (ret)
+		return ret;
+
 	/*
 	 * Remove CPU from nohz.idle_cpus_mask to prevent participating in
 	 * load balancing when not active
···
 		return 0;
 
 	sched_update_numa(cpu, false);
-	ret = cpuset_cpu_inactive(cpu);
-	if (ret) {
-		sched_smt_present_inc(cpu);
-		sched_set_rq_online(rq, cpu);
-		balance_push_set(cpu, false);
-		set_cpu_active(cpu, true);
-		sched_update_numa(cpu, true);
-		return ret;
-	}
+	cpuset_cpu_inactive(cpu);
 	sched_domains_numa_masks_clear(cpu);
 	return 0;
 }
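
The ordering change in kernel/sched/core.c can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct model_cpu, model_bw_check() and both model_deactivate_*() functions are hypothetical stand-ins invented for illustration, not kernel APIs, and the error value -1 is arbitrary. It shows that with the late check the CPU is observably inactive before the admission test can refuse, while with the early check a refusal leaves the state untouched and needs no roll-back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified CPU state; not the kernel's data structures. */
struct model_cpu {
	bool active;     /* stands in for the bit in the active mask */
	bool rq_online;  /* stands in for the runqueue being online */
	bool bw_fits;    /* answer the bandwidth check would give */
};

/* Stand-in for dl_bw_deactivate(): 0 on success, non-zero on overflow. */
static int model_bw_check(const struct model_cpu *c)
{
	assert(c);
	return c->bw_fits ? 0 : -1;
}

/* Old ordering: tear down first, check last, undo everything on failure. */
static int model_deactivate_late(struct model_cpu *c, bool *seen_inactive)
{
	int ret;

	c->active = false;
	c->rq_online = false;
	*seen_inactive = true;	/* race window: others already see it offline */

	ret = model_bw_check(c);
	if (ret) {
		/* roll-back path, which the patch removes */
		c->rq_online = true;
		c->active = true;
		return ret;
	}
	return 0;
}

/* New ordering: check first, so a refusal changes nothing. */
static int model_deactivate_early(struct model_cpu *c, bool *seen_inactive)
{
	int ret = model_bw_check(c);

	if (ret)
		return ret;

	c->active = false;
	c->rq_online = false;
	*seen_inactive = true;
	return 0;
}
```

Both variants end in the same state on failure; the difference is whether the intermediate inactive state was ever visible, which is what seen_inactive records in this model.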
+10 -2
kernel/sched/deadline.c
···
 		break;
 	case dl_bw_req_deactivate:
 		/*
+		 * cpu is not off yet, but we need to do the math by
+		 * considering it off already (i.e., what would happen if we
+		 * turn cpu off?).
+		 */
+		cap -= arch_scale_cpu_capacity(cpu);
+
+		/*
 		 * cpu is going offline and NORMAL tasks will be moved away
 		 * from it. We can thus discount dl_server bandwidth
 		 * contribution as it won't need to be servicing tasks after
···
 		if (dl_b->total_bw - fair_server_bw > 0) {
 			/*
 			 * Leaving at least one CPU for DEADLINE tasks seems a
-			 * wise thing to do.
+			 * wise thing to do. As said above, cpu is not offline
+			 * yet, so account for that.
 			 */
-			if (dl_bw_cpus(cpu))
+			if (dl_bw_cpus(cpu) - 1)
 				overflow = __dl_overflow(dl_b, cap, fair_server_bw, 0);
 			else
 				overflow = 1;
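
The arithmetic in the dl_bw_req_deactivate case can be sketched in userspace as follows. This is a rough model, not the kernel implementation: struct model_dl_bw, model_deactivate_overflow() and the fixed-point scaling by a shift of 10 are assumptions made for illustration, and the bandwidth units are arbitrary. It mirrors the two adjustments in the diff: the departing CPU's capacity is subtracted up front, and the CPU count is reduced by one before deciding whether any CPU would be left for DEADLINE tasks.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a full-size CPU has capacity 1 << 10 in this model. */
#define MODEL_CAPACITY_SHIFT 10

/* Hypothetical stand-in for the per-domain bandwidth accounting. */
struct model_dl_bw {
	uint64_t bw;        /* allowed bandwidth per full-capacity CPU */
	uint64_t total_bw;  /* bandwidth currently admitted in the domain */
};

/*
 * Returns 1 if taking one CPU away would overflow, 0 otherwise.
 * cap_all and cpus_now still include the departing CPU, because the
 * check now runs before the CPU is actually removed.
 */
static int model_deactivate_overflow(const struct model_dl_bw *dl_b,
				     uint64_t cap_all, uint64_t cap_cpu,
				     int cpus_now, uint64_t fair_server_bw)
{
	uint64_t cap, needed;

	assert(cap_cpu <= cap_all && cpus_now >= 1);
	assert(fair_server_bw <= dl_b->total_bw);

	/* Do the math as if the CPU were already off. */
	cap = cap_all - cap_cpu;

	/* The departing CPU's fair server no longer needs bandwidth. */
	needed = dl_b->total_bw - fair_server_bw;
	if (needed == 0)
		return 0;	/* nothing else is admitted */

	/* Keep at least one CPU; cpus_now still counts the departing one. */
	if (cpus_now - 1 == 0)
		return 1;

	return ((dl_b->bw * cap) >> MODEL_CAPACITY_SHIFT) < needed;
}
```

For example, with two full-capacity CPUs, bw = 950 and total_bw = 1100 of which 50 belongs to the departing CPU's fair server, the remaining capacity supports 950 units against 1050 needed, so the model refuses the deactivation.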