Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Revert "signal: Allow tasks to cache one sigqueue struct"

This reverts commits 4bad58ebc8bc4f20d89cff95417c9b4674769709 (and
399f8dd9a866e107639eabd3c1979cd526ca3a98, which tried to fix it).

I do not believe these are correct, and I'm about to release 5.13, so am
reverting them out of an abundance of caution.

The locking is odd, and appears broken.

On the allocation side (in __sigqueue_alloc()), the locking is somewhat
straightforward: it depends on sighand->siglock. Since one caller
doesn't hold that lock, it further then tests 'sigqueue_flags' to avoid
the case with no locks held.

On the freeing side (in sigqueue_cache_or_free()), there is no locking
at all, and the logic instead depends on 'current' being a single
thread, and not able to race with itself.

To make things more exciting, there's also the data race between freeing
a signal and allocating one, which is handled by using WRITE_ONCE() and
READ_ONCE(), and being mutually exclusive wrt the initial state (ie
freeing will only free if the old state was NULL, while allocating will
obviously only use the value if it was non-NULL, so only one or the
other will actually act on the value).

However, while the free->alloc paths do seem mutually exclusive thanks
to just the data value dependency, it's not clear what the memory
ordering constraints are on it. Could writes from the previous
allocation possibly be delayed and seen by the new allocation later,
causing logical inconsistencies?

So it's all very exciting and unusual.

And in particular, it seems that the freeing side is incorrect in
depending on "current" being single-threaded. Yes, 'current' is a
single thread, but in the presense of asynchronous events even a single
thread can have data races.

And such asynchronous events can and do happen, with interrupts causing
signals to be flushed and thus free'd (for example - sending a
SIGCONT/SIGSTOP can happen from interrupt context, and can flush
previously queued process control signals).

So regardless of all the other questions about the memory ordering and
locking for this new cached allocation, the sigqueue_cache_or_free()
assumptions seem to be fundamentally incorrect.

It may be that people will show me the errors of my ways, and tell me
why this is all safe after all. We can reinstate it if so. But my
current belief is that the WRITE_ONCE() that sets the cached entry needs
to be a smp_store_release(), and the READ_ONCE() that finds a cached
entry needs to be a smp_load_acquire() to handle memory ordering
correctly.

And the sequence in sigqueue_cache_or_free() would need to either use a
lock or at least be interrupt-safe some way (perhaps by using something
like the percpu 'cmpxchg': it doesn't need to be SMP-safe, but like the
percpu operations it needs to be interrupt-safe).

Fixes: 399f8dd9a866 ("signal: Prevent sigqueue caching after task got released")
Fixes: 4bad58ebc8bc ("signal: Allow tasks to cache one sigqueue struct")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

+2 -61

include/linux/sched.h (-1)

@@ -997,7 +997,6 @@
 	/* Signal handlers: */
 	struct signal_struct *signal;
 	struct sighand_struct __rcu *sighand;
-	struct sigqueue *sigqueue_cache;
 	sigset_t blocked;
 	sigset_t real_blocked;
 	/* Restored if set_restore_sigmask() was used: */

include/linux/signal.h (-1)

@@ -267,7 +267,6 @@
 }
 
 extern void flush_sigqueue(struct sigpending *queue);
-extern void exit_task_sigqueue_cache(struct task_struct *tsk);
 
 /* Test if 'sig' is valid signal. Use this instead of testing _NSIG directly */
 static inline int valid_signal(unsigned long sig)

kernel/exit.c (-1)

@@ -162,7 +162,6 @@
 		flush_sigqueue(&sig->shared_pending);
 		tty_kref_put(tty);
 	}
-	exit_task_sigqueue_cache(tsk);
 }
 
 static void delayed_put_task_struct(struct rcu_head *rhp)

kernel/fork.c (-1)

@@ -2008,7 +2008,6 @@
 	spin_lock_init(&p->alloc_lock);
 
 	init_sigpending(&p->pending);
-	p->sigqueue_cache = NULL;
 
 	p->utime = p->stime = p->gtime = 0;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME

kernel/signal.c (+2 -57)

@@ -431,22 +431,7 @@
 	rcu_read_unlock();
 
 	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
-		/*
-		 * Preallocation does not hold sighand::siglock so it can't
-		 * use the cache. The lockless caching requires that only
-		 * one consumer and only one producer run at a time.
-		 *
-		 * For the regular allocation case it is sufficient to
-		 * check @q for NULL because this code can only be called
-		 * if the target task @t has not been reaped yet; which
-		 * means this code can never observe the error pointer which is
-		 * written to @t->sigqueue_cache in exit_task_sigqueue_cache().
-		 */
-		q = READ_ONCE(t->sigqueue_cache);
-		if (!q || sigqueue_flags)
-			q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
-		else
-			WRITE_ONCE(t->sigqueue_cache, NULL);
+		q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
 	} else {
 		print_dropped_signal(sig);
 	}

@@ -463,53 +448,13 @@
 	return q;
 }
 
-void exit_task_sigqueue_cache(struct task_struct *tsk)
-{
-	/* Race free because @tsk is mopped up */
-	struct sigqueue *q = tsk->sigqueue_cache;
-
-	if (q) {
-		/*
-		 * Hand it back to the cache as the task might
-		 * be self reaping which would leak the object.
-		 */
-		kmem_cache_free(sigqueue_cachep, q);
-	}
-
-	/*
-	 * Set an error pointer to ensure that @tsk will not cache a
-	 * sigqueue when it is reaping it's child tasks
-	 */
-	tsk->sigqueue_cache = ERR_PTR(-1);
-}
-
-static void sigqueue_cache_or_free(struct sigqueue *q)
-{
-	/*
-	 * Cache one sigqueue per task. This pairs with the consumer side
-	 * in __sigqueue_alloc() and needs READ/WRITE_ONCE() to prevent the
-	 * compiler from store tearing and to tell KCSAN that the data race
-	 * is intentional when run without holding current->sighand->siglock,
-	 * which is fine as current obviously cannot run __sigqueue_free()
-	 * concurrently.
-	 *
-	 * The NULL check is safe even if current has been reaped already,
-	 * in which case exit_task_sigqueue_cache() wrote an error pointer
-	 * into current->sigqueue_cache.
-	 */
-	if (!READ_ONCE(current->sigqueue_cache))
-		WRITE_ONCE(current->sigqueue_cache, q);
-	else
-		kmem_cache_free(sigqueue_cachep, q);
-}
-
 static void __sigqueue_free(struct sigqueue *q)
 {
 	if (q->flags & SIGQUEUE_PREALLOC)
 		return;
 	if (atomic_dec_and_test(&q->user->sigpending))
 		free_uid(q->user);
-	sigqueue_cache_or_free(q);
+	kmem_cache_free(sigqueue_cachep, q);
 }
 
 void flush_sigqueue(struct sigpending *queue)