Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
futex: Prevent lockup in requeue-PI during signal/timeout wakeup

During wait-requeue-pi (task A) and requeue-PI (task B) the following
race can happen:

     Task A                                  Task B
  futex_wait_requeue_pi()
    futex_setup_timer()
    futex_do_wait()
                                        futex_requeue()
                                          CLASS(hb, hb1)(&key1);
                                          CLASS(hb, hb2)(&key2);
    *timeout*
    futex_requeue_pi_wakeup_sync()
      requeue_state = Q_REQUEUE_PI_IGNORE

    *blocks on hb->lock*

                                          futex_proxy_trylock_atomic()
                                            futex_requeue_pi_prepare()
                                              Q_REQUEUE_PI_IGNORE => -EAGAIN
                                          double_unlock_hb(hb1, hb2)
                                          *retry*

Task B acquires both hb locks and attempts to acquire the PI-lock on
behalf of the topmost waiter (task A). Task A is leaving early due to a
signal or timeout and has started removing itself from the queue. It
updates its requeue_state but cannot remove itself from the list because
this requires the hb lock, which is owned by task B.

Usually task A is able to grab the lock after task B has unlocked it.
However, if task B has a higher priority, task A may not be able to wake
up in time and acquire the lock before task B takes it again. This is
especially likely on a UP system, where task A is never scheduled.

As a result, task A blocks on the lock and task B busy-loops: it tries
to make progress but livelocks the system instead. Tragic.
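The retry behaviour described above can be sketched as a small userspace
model. This is not kernel code: every name below (model_waiter,
model_requeue_once(), model_requeue_retry()) is hypothetical, and the
retry budget exists only so that the model terminates. It assumes that
task A never runs between retries, as on a UP system where task B has
the higher priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's requeue states. */
enum model_state { MODEL_WAITING, MODEL_IGNORE };

struct model_waiter {
	enum model_state state;
	bool queued;	/* still on the hash bucket's list? */
};

/*
 * One pass of the requeue side: look at the first queued waiter and
 * return true if it could be handled, false for the -EAGAIN case.
 */
static bool model_requeue_once(const struct model_waiter *w, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!w[i].queued)
			continue;
		/* Top waiter already gave up: nothing changes, caller retries. */
		return w[i].state != MODEL_IGNORE;
	}
	return true;	/* nothing queued, nothing to do */
}

/*
 * Retry the way task B does. Task A never runs in between, so the
 * waiter array is never modified here. Returns the number of passes
 * used, or -1 if the budget ran out (the model's stand-in for a
 * livelock).
 */
static int model_requeue_retry(const struct model_waiter *w, size_t n,
			       int budget)
{
	assert(budget > 0);

	for (int pass = 1; pass <= budget; pass++) {
		if (model_requeue_once(w, n))
			return pass;
	}
	return -1;
}
```

With a top waiter that is queued but already in the ignore state,
model_requeue_retry() exhausts any budget, because nothing the loop does
changes what the next pass will see.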

This can be fixed by removing the topmost waiter from the list in this
case. This allows task B to grab the next top waiter (if any) in the
next iteration and make progress.

Remove the topmost waiter if futex_requeue_pi_prepare() fails.
Let the waiter conditionally remove itself from the list in
handle_early_requeue_pi_wakeup().
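The two halves of the fix can be illustrated with a minimal userspace
list. This is a sketch, not the kernel's plist: all model_* names are
hypothetical, priorities are left out, and it assumes that deleting a
node leaves it in an "empty" state that the owner can test later, which
is what the plist_node_empty() check in the patch relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct model_node {
	struct model_node *prev, *next;
};

static void model_init(struct model_node *n)
{
	n->prev = n->next = n;
}

/* A node (or list head) that points at itself is not linked to anything. */
static bool model_node_empty(const struct model_node *n)
{
	return n->next == n;
}

static void model_add_tail(struct model_node *n, struct model_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and re-initialise, so a later model_node_empty() returns true. */
static void model_del_init(struct model_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	model_init(n);
}

/* Hypothetical stand-in for a hash bucket: a waiter list plus a counter. */
struct model_bucket {
	struct model_node chain;
	int waiters;
};

/* Requeue side: the top waiter refused to be requeued, so drop it. */
static void model_drop_top_waiter(struct model_bucket *hb)
{
	assert(!model_node_empty(&hb->chain));
	model_del_init(hb->chain.next);
	hb->waiters--;
}

/* Waiter side: unqueue only if the requeue side has not done so already. */
static void model_early_wakeup(struct model_bucket *hb, struct model_node *q)
{
	if (!model_node_empty(q)) {
		model_del_init(q);
		hb->waiters--;
	}
}
```

Calling model_drop_top_waiter() and then model_early_wakeup() on the
same node decrements the counter exactly once, and the next pass of the
requeue side sees the following waiter at the head of the list.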

Fixes: 07d91ef510fb1 ("futex: Prevent requeue_pi() lock nesting issue on RT")
Reported-by: Moritz Klammler <Moritz.Klammler@ferchau.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260428103425.dywXyPd3@linutronix.de
Closes: https://lore.kernel.org/all/VE1PR06MB6894BE61C173D802365BE19DFF4CA@VE1PR06MB6894.eurprd06.prod.outlook.com

Authored by Sebastian Andrzej Siewior and committed by Thomas Gleixner
bc7304f3 254f4963

+9 -4
kernel/futex/requeue.c
@@ -319,8 +319,11 @@
 		return -EINVAL;
 
 	/* Ensure that this does not race against an early wakeup */
-	if (!futex_requeue_pi_prepare(top_waiter, NULL))
+	if (!futex_requeue_pi_prepare(top_waiter, NULL)) {
+		plist_del(&top_waiter->list, &hb1->chain);
+		futex_hb_waiters_dec(hb1);
 		return -EAGAIN;
+	}
 
 	/*
 	 * Try to take the lock for top_waiter and set the FUTEX_WAITERS bit
@@ -725,10 +722,12 @@
 
 	/*
 	 * We were woken prior to requeue by a timeout or a signal.
-	 * Unqueue the futex_q and determine which it was.
+	 * Conditionally unqueue the futex_q and determine which it was.
 	 */
-	plist_del(&q->list, &hb->chain);
-	futex_hb_waiters_dec(hb);
+	if (!plist_node_empty(&q->list)) {
+		plist_del(&q->list, &hb->chain);
+		futex_hb_waiters_dec(hb);
+	}
 
 	/* Handle spurious wakeups gracefully */
 	ret = -EWOULDBLOCK;