Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
sched: fix the theoretical signal_wake_up() vs schedule() race

This is only theoretical, but after try_to_wake_up(p) was changed
to check p->state under p->pi_lock, code like

__set_current_state(TASK_INTERRUPTIBLE);
schedule();

can miss a signal. This is a special case of wait-for-condition:
it relies on the try_to_wake_up()/schedule() interaction and thus does
not need mb() between __set_current_state() and if (signal_pending).

However, this __set_current_state() can move into the critical
section protected by rq->lock. Now that try_to_wake_up() takes
another lock (p->pi_lock), we need to ensure that it can't be
reordered with the "if (signal_pending(current))" check inside
that section.
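
The ordering requirement above is the classic store-then-load pattern on both sides: the sleeper stores its state and then loads the pending flag, while the waker stores the pending flag and then loads the state. Below is a minimal user-space sketch of that pattern using C11 atomics and pthreads. It is not kernel code: task_state, sig_pending, waiter_would_sleep(), waker_sees_sleeper() and lost_wakeups() are hypothetical names, and the full seq_cst fences are stronger than the smp_wmb()-plus-lock combination the patch relies on.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for p->state and the pending-signal flag. */
enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };

static atomic_int task_state;
static atomic_bool sig_pending;

/* Sleeper side: models __set_current_state() followed by the
 * signal_pending() check made inside schedule(). */
static bool waiter_would_sleep(void)
{
	atomic_store_explicit(&task_state, TASK_INTERRUPTIBLE,
			      memory_order_relaxed);
	/* Keeps the store above from being reordered with the load below. */
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_load_explicit(&sig_pending, memory_order_relaxed);
}

/* Waker side: models signal_wake_up(): set the flag, then check state. */
static bool waker_sees_sleeper(void)
{
	atomic_store_explicit(&sig_pending, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&task_state, memory_order_relaxed)
		== TASK_INTERRUPTIBLE;
}

static void *waker_thread(void *arg)
{
	bool *saw = arg;

	*saw = waker_sees_sleeper();
	return NULL;
}

/* Counts rounds in which the sleeper decided to sleep while the waker
 * concluded that nobody needed waking, i.e. a lost wakeup. */
int lost_wakeups(int rounds)
{
	int lost = 0;

	for (int i = 0; i < rounds; i++) {
		pthread_t t;
		bool saw = false;

		atomic_store(&task_state, TASK_RUNNING);
		atomic_store(&sig_pending, false);

		int rc = pthread_create(&t, NULL, waker_thread, &saw);
		assert(rc == 0);
		bool sleeps = waiter_would_sleep();
		pthread_join(t, NULL);

		if (sleeps && !saw)
			lost++;
	}
	return lost;
}
```

With both fences in place, at least one side must observe the other's store, so lost_wakeups() should return 0 for any number of rounds. Removing the fence on the sleeper side models the reordering this commit closes, although a short test run may well not reproduce it.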

The patch is actually a one-liner: it simply adds smp_wmb() before
spin_lock_irq(rq->lock). This is what try_to_wake_up() already
does for the same reason.

We turn this wmb() into the new helper, smp_mb__before_spinlock(),
for better documentation and to allow the architectures to change
the default implementation.
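
As a rough illustration of how such an overridable default works, here is a user-space sketch of the same #ifndef pattern. The DEMO_ARCH_WANTS_FULL_BARRIER switch, demo_full_barriers and demo_uses_default() are hypothetical names, and the C11 release fence is only an approximate stand-in for smp_wmb(), not its definition.

```c
#include <stdatomic.h>

/* Hypothetical "arch" section: an architecture that wants something
 * stronger defines the macro before the generic header is seen. */
#ifdef DEMO_ARCH_WANTS_FULL_BARRIER
static int demo_full_barriers;
#define smp_mb__before_spinlock() \
	(demo_full_barriers++, atomic_thread_fence(memory_order_seq_cst))
#endif

/* Generic section: supplies the default only if nothing above did.
 * The release fence approximates smp_wmb() for this demo. */
#ifndef smp_mb__before_spinlock
#define smp_mb__before_spinlock() atomic_thread_fence(memory_order_release)
#endif

/* Returns 1 if the generic default is in use, 0 if it was overridden. */
int demo_uses_default(void)
{
	smp_mb__before_spinlock();
#ifdef DEMO_ARCH_WANTS_FULL_BARRIER
	return 0;
#else
	return 1;
#endif
}
```

Built without -DDEMO_ARCH_WANTS_FULL_BARRIER, the generic definition is picked up. Defining the switch selects the stronger variant without touching any caller.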

While at it, kill smp_mb__after_lock(); it has no callers.

Perhaps we can also add smp_mb__before/after_spinunlock() for
prepare_to_wait().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by Oleg Nesterov and committed by Linus Torvalds
e0acd0a6 584d88b2

+24 -8
-4
arch/x86/include/asm/spinlock.h
···
 #define arch_read_relax(lock)	cpu_relax()
 #define arch_write_relax(lock)	cpu_relax()
 
-/* The {read|write|spin}_lock() on x86 are full memory barriers. */
-static inline void smp_mb__after_lock(void) { }
-#define ARCH_HAS_SMP_MB_AFTER_LOCK
-
 #endif /* _ASM_X86_SPINLOCK_H */
+11 -3
include/linux/spinlock.h
···
 #endif /*arch_spin_is_contended*/
 #endif
 
-/* The lock does not imply full memory barrier. */
-#ifndef ARCH_HAS_SMP_MB_AFTER_LOCK
-static inline void smp_mb__after_lock(void) { smp_mb(); }
+/*
+ * Despite its name it doesn't necessarily has to be a full barrier.
+ * It should only guarantee that a STORE before the critical section
+ * can not be reordered with a LOAD inside this section.
+ * spin_lock() is the one-way barrier, this LOAD can not escape out
+ * of the region. So the default implementation simply ensures that
+ * a STORE can not move into the critical section, smp_wmb() should
+ * serialize it with another STORE done by spin_lock().
+ */
+#ifndef smp_mb__before_spinlock
+#define smp_mb__before_spinlock()	smp_wmb()
 #endif
 
 /**
+13 -1
kernel/sched/core.c
···
 	unsigned long flags;
 	int cpu, success = 0;
 
-	smp_wmb();
+	/*
+	 * If we are going to wake up a thread waiting for CONDITION we
+	 * need to ensure that CONDITION=1 done by the caller can not be
+	 * reordered with p->state check below. This pairs with mb() in
+	 * set_current_state() the waiting thread does.
+	 */
+	smp_mb__before_spinlock();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	if (!(p->state & state))
 		goto out;
···
 	if (sched_feat(HRTICK))
 		hrtick_clear(rq);
 
+	/*
+	 * Make sure that signal_pending_state()->signal_pending() below
+	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
+	 * done by the caller to avoid the race with signal_wake_up().
+	 */
+	smp_mb__before_spinlock();
 	raw_spin_lock_irq(&rq->lock);
 
 	switch_count = &prev->nivcsw;