Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting

Running stress-ng --schedpolicy 0 on an RT kernel on a big machine
might lead to the following WARNINGs (edited).

sched: DL de-boosted task PID 22725: REPLENISH flag missing

WARNING: CPU: 93 PID: 0 at kernel/sched/deadline.c:239 dequeue_task_dl+0x15c/0x1f8
... (running_bw underflow)
Call trace:
dequeue_task_dl+0x15c/0x1f8 (P)
dequeue_task+0x80/0x168
deactivate_task+0x24/0x50
push_dl_task+0x264/0x2e0
dl_task_timer+0x1b0/0x228
__hrtimer_run_queues+0x188/0x378
hrtimer_interrupt+0xfc/0x260
...

The problem is that when a SCHED_DEADLINE task (the lock holder) is
changed to a lower priority class via sched_setscheduler(), it may
fail to properly inherit the parameters of potential DEADLINE donors
if it did not already inherit them in the past (because its deadline
was shorter than the donor's at that time). This can corrupt
bandwidth accounting, as enqueue_task_dl() won't recognize the lock
holder as boosted.
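The accounting failure can be illustrated with a toy model. If enqueue skips adding a task's bandwidth because it does not see the task as boosted, and a later dequeue subtracts that bandwidth anyway (as the call trace above suggests), the running_bw counter would go below zero. This is a simplified user-space sketch: the names toy_rq, toy_task, toy_enqueue() and toy_dequeue() are hypothetical, and the real kernel accounting is considerably more involved.

```c
#include <assert.h>	/* for the checks in the usage example */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of per-runqueue DEADLINE bandwidth accounting. */
struct toy_rq {
	uint64_t running_bw;	/* sum of bandwidths of accounted tasks */
};

struct toy_task {
	uint64_t dl_bw;		/* the task's bandwidth (arbitrary units) */
	bool seen_as_boosted;	/* what enqueue believes about the task */
};

/* Enqueue only accounts tasks it recognizes as (boosted) DEADLINE tasks. */
static void toy_enqueue(struct toy_rq *rq, const struct toy_task *p)
{
	if (!p->seen_as_boosted)
		return;		/* early return: bandwidth is never added */
	rq->running_bw += p->dl_bw;
}

/*
 * Dequeue subtracts the task's bandwidth. Returns false and leaves the
 * counter untouched where the unsigned subtraction would underflow,
 * the toy analogue of the "running_bw underflow" WARNING above.
 */
static bool toy_dequeue(struct toy_rq *rq, const struct toy_task *p)
{
	if (rq->running_bw < p->dl_bw)
		return false;
	rq->running_bw -= p->dl_bw;
	return true;
}
```

For example, enqueueing a holder with seen_as_boosted == false leaves running_bw at 0, so the subsequent toy_dequeue() reports the underflow by returning false; with seen_as_boosted == true the add and subtract balance out.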

The scenario occurs when:
1. A DEADLINE task (donor) blocks on a PI mutex held by another
DEADLINE task (holder), but the holder doesn't inherit parameters
(e.g., it already has a shorter deadline)
2. sched_setscheduler() changes the holder from DEADLINE to a lower
class while still holding the mutex
3. The holder should now inherit DEADLINE parameters from the donor
and be enqueued with ENQUEUE_REPLENISH, but this doesn't happen
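Steps 1 and 2 can be traced with a second toy model. It assumes, loosely following the kernel's is_dl_boosted() helper, that a task counts as boosted only when its pi_se points to another task's entity. Because the holder did not inherit in step 1, its pi_se still points to itself after step 2, so enqueue treats it as a de-boosted, non-DEADLINE task. All toy_* names and the classification order are simplifications, not the actual enqueue_task_dl() code.

```c
#include <assert.h>	/* for the checks in the usage example */
#include <stdbool.h>

#define TOY_ENQUEUE_REPLENISH 0x1u	/* hypothetical stand-in for ENQUEUE_REPLENISH */

struct toy_dl_entity {
	struct toy_dl_entity *pi_se;	/* points to itself unless boosted */
};

struct toy_task {
	bool dl_policy;			/* own policy is SCHED_DEADLINE */
	struct toy_dl_entity dl;
};

enum toy_enqueue_path {
	TOY_PATH_BOOSTED,	/* runs with a donor's parameters */
	TOY_PATH_PLAIN_DL,	/* proper DEADLINE task */
	TOY_PATH_DEBOOSTED,	/* treated as leaving DEADLINE: not accounted */
};

/* Boosted means pi_se points at another task's entity. */
static bool toy_is_dl_boosted(const struct toy_dl_entity *dl_se)
{
	return dl_se->pi_se != dl_se;
}

/*
 * Simplified classification of an enqueue; *warn is set when the
 * "REPLENISH flag missing" condition from the report would hold.
 */
static enum toy_enqueue_path toy_classify_enqueue(const struct toy_task *p,
						  unsigned int flags, bool *warn)
{
	*warn = false;
	if (toy_is_dl_boosted(&p->dl))
		return TOY_PATH_BOOSTED;
	if (p->dl_policy)
		return TOY_PATH_PLAIN_DL;
	*warn = !(flags & TOY_ENQUEUE_REPLENISH);
	return TOY_PATH_DEBOOSTED;
}
```

Starting from a DEADLINE holder whose pi_se points to its own entity, clearing dl_policy (step 2) and classifying an enqueue with no flags yields TOY_PATH_DEBOOSTED with warn set, matching the "REPLENISH flag missing" message.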

Fix the issue by introducing __setscheduler_dl_pi(), which detects when
a DEADLINE (proper or boosted) task gets setscheduled to a lower
priority class. If so, and a donor (pi_task) exists, the function makes
the task inherit the donor's DEADLINE parameters (pi_se) and sets the
ENQUEUE_REPLENISH flag to ensure proper bandwidth accounting during the
next enqueue operation.

Fixes: 2279f540ea7d ("sched/deadline: Fix priority inheritance with multiple scheduling classes")
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260302-upstream-fix-deadline-piboost-b4-v3-1-6ba32184a9e0@redhat.com

Authored by Juri Lelli and committed by Peter Zijlstra
d658686a 11439c46

+30
kernel/sched/syscalls.c
···
 		uid_eq(cred->euid, pcred->uid));
 }
 
+#ifdef CONFIG_RT_MUTEXES
+static inline void __setscheduler_dl_pi(int newprio, int policy,
+					struct task_struct *p,
+					struct sched_change_ctx *scope)
+{
+	/*
+	 * In case a DEADLINE task (either proper or boosted) gets
+	 * setscheduled to a lower priority class, check if it needs to
+	 * inherit parameters from a potential pi_task. In that case make
+	 * sure replenishment happens with the next enqueue.
+	 */
+
+	if (dl_prio(newprio) && !dl_policy(policy)) {
+		struct task_struct *pi_task = rt_mutex_get_top_task(p);
+
+		if (pi_task) {
+			p->dl.pi_se = pi_task->dl.pi_se;
+			scope->flags |= ENQUEUE_REPLENISH;
+		}
+	}
+}
+#else /* !CONFIG_RT_MUTEXES */
+static inline void __setscheduler_dl_pi(int newprio, int policy,
+					struct task_struct *p,
+					struct sched_change_ctx *scope)
+{
+}
+#endif /* !CONFIG_RT_MUTEXES */
+
 #ifdef CONFIG_UCLAMP_TASK
 
 static int uclamp_validate(struct task_struct *p,
···
 		__setscheduler_params(p, attr);
 		p->sched_class = next_class;
 		p->prio = newprio;
+		__setscheduler_dl_pi(newprio, policy, p, scope);
 	}
 	__setscheduler_uclamp(p, attr);
 
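The decision made by __setscheduler_dl_pi() can be exercised outside the kernel with a toy model of the same shape. In this sketch, top_waiter stands in for rt_mutex_get_top_task(), the two booleans stand in for dl_prio(newprio) and dl_policy(policy), and TOY_ENQUEUE_REPLENISH stands in for ENQUEUE_REPLENISH; all toy_* names are hypothetical and the structures are drastically simplified.

```c
#include <assert.h>	/* for the checks in the usage example */
#include <stdbool.h>
#include <stddef.h>

#define TOY_ENQUEUE_REPLENISH 0x1u	/* stand-in for ENQUEUE_REPLENISH */

struct toy_dl_entity {
	struct toy_dl_entity *pi_se;	/* self unless boosted */
};

struct toy_task {
	struct toy_dl_entity dl;
	struct toy_task *top_waiter;	/* stand-in for rt_mutex_get_top_task(p) */
};

static bool toy_is_dl_boosted(const struct toy_dl_entity *dl_se)
{
	return dl_se->pi_se != dl_se;
}

/*
 * Same shape as __setscheduler_dl_pi():
 *   prio_is_dl   ~ dl_prio(newprio): effective priority is still DEADLINE
 *   policy_is_dl ~ dl_policy(policy): requested policy is DEADLINE
 */
static void toy_setscheduler_dl_pi(bool prio_is_dl, bool policy_is_dl,
				   struct toy_task *p, unsigned int *flags)
{
	if (prio_is_dl && !policy_is_dl) {
		struct toy_task *pi_task = p->top_waiter;

		if (pi_task) {
			/* Inherit the donor's parameters ... */
			p->dl.pi_se = pi_task->dl.pi_se;
			/* ... and force a replenishment on the next enqueue. */
			*flags |= TOY_ENQUEUE_REPLENISH;
		}
	}
}
```

Calling it with policy_is_dl = true leaves the holder untouched. Calling it with prio_is_dl = true and policy_is_dl = false on a holder whose top_waiter is a DEADLINE donor redirects holder.dl.pi_se to the donor's entity and sets TOY_ENQUEUE_REPLENISH, so a toy_is_dl_boosted() check now sees the holder as boosted.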