Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()

This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in the
case of signalfd: the task which does EPOLL_CTL_ADD uses its
->sighand, which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty; the new signalfd_cleanup() helper
does this.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This makes this poll entry inconsistent, but we don't care. If you
share an epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it is used
with epoll.
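
The idea can be modelled in plain userspace C. This is only a sketch,
not kernel code: the list helpers, struct waiter and the model_*()
names are made up for illustration, and MODEL_POLLHUP/MODEL_POLLFREE
just mirror the POLLHUP/POLLFREE values. The owner of the wait queue
head wakes every entry with the "free" bit set, and each callback
unlinks itself before the head is freed.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_POLLHUP	0x0010
#define MODEL_POLLFREE	0x4000	/* same value the patch gives POLLFREE */

/* Minimal circular doubly linked list, a stand-in for struct list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static int list_empty(const struct list_node *n) { return n->next == n; }

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-in for a wait queue entry owned by the poller. */
struct waiter {
	struct list_node link;		/* plays the role of ->task_list */
	unsigned long seen;		/* events delivered so far */
	void (*func)(struct waiter *w, unsigned long key);
};

/* Models the ep_poll_callback() change: unlink when the head goes away. */
static void model_callback(struct waiter *w, unsigned long key)
{
	if (key & MODEL_POLLFREE)
		list_del_init(&w->link);
	w->seen |= key;
}

static void model_wake_up_poll(struct list_node *head, unsigned long key)
{
	struct list_node *pos = head->next, *next;

	while (pos != head) {
		struct waiter *w = (struct waiter *)
			((char *)pos - offsetof(struct waiter, link));

		next = pos->next;	/* saved first: func may unlink pos */
		w->func(w, key);
		pos = next;
	}
}

/* Models signalfd_cleanup() followed by the free of the owner. */
static void model_cleanup_and_free(struct list_node *head)
{
	if (!list_empty(head))
		model_wake_up_poll(head, MODEL_POLLHUP | MODEL_POLLFREE);
	free(head);
}

/* Returns 1 if the waiter was detached and saw both event bits. */
int pollfree_demo(void)
{
	struct list_node *wqh = malloc(sizeof(*wqh));
	struct waiter w = { .seen = 0, .func = model_callback };

	if (!wqh)
		return -1;
	list_init(wqh);
	list_init(&w.link);
	list_add_tail(&w.link, wqh);

	model_cleanup_and_free(wqh);
	return list_empty(&w.link) &&
	       (w.seen & MODEL_POLLHUP) && (w.seen & MODEL_POLLFREE);
}
```

After model_cleanup_and_free() the waiter no longer points into the
freed head, so a later unlink by the poller touches only its own
entry.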

The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.

In short: this patch is a hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks; this seems to be true.

Note:

- we do not have wake_up_all_poll() but wake_up_poll()
  is fine: poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

- signalfd_cleanup() uses POLLHUP along with POLLFREE,
  we need a couple of simple changes in eventpoll.c to
  make sure it can't be "lost".
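
Why wake_up_poll() is enough can be shown with a simplified model of
the wake-up loop. This is a sketch, not the kernel's
__wake_up_common(): struct model_waiter and the function names are
invented here, and every callback is assumed to succeed. The loop
stops early only after it has woken nr_exclusive waiters that carry
the exclusive flag, so a queue without exclusive waiters is always
walked to the end.

```c
#define MODEL_WQ_FLAG_EXCLUSIVE	0x01

/* Hypothetical waiter: just its flags and whether it was called. */
struct model_waiter {
	unsigned int flags;
	int woken;
};

/* Returns how many waiters were called before the loop stopped. */
int model_wake_up(struct model_waiter *q, int n, int nr_exclusive)
{
	int called = 0;
	int i;

	for (i = 0; i < n; i++) {
		q[i].woken = 1;		/* callback assumed to succeed */
		called++;
		if ((q[i].flags & MODEL_WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;
	}
	return called;
}

/* Three non-exclusive waiters, wake-one semantics: all are reached. */
int wake_nonexclusive_demo(void)
{
	struct model_waiter q[3] = { { 0, 0 }, { 0, 0 }, { 0, 0 } };

	return model_wake_up(q, 3, 1);
}

/* Three exclusive waiters, wake-one semantics: only the first. */
int wake_exclusive_demo(void)
{
	struct model_waiter q[3] = {
		{ MODEL_WQ_FLAG_EXCLUSIVE, 0 },
		{ MODEL_WQ_FLAG_EXCLUSIVE, 0 },
		{ MODEL_WQ_FLAG_EXCLUSIVE, 0 },
	};

	return model_wake_up(q, 3, 1);
}
```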

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Oleg Nesterov and committed by Linus Torvalds
d80e731e 855a85f7

+25 -2
+4
fs/eventpoll.c
···
 	struct epitem *epi = ep_item_from_wait(wait);
 	struct eventpoll *ep = epi->ep;
 
+	/* the caller holds eppoll_entry->whead->lock */
+	if ((unsigned long)key & POLLFREE)
+		list_del_init(&wait->task_list);
+
 	spin_lock_irqsave(&ep->lock, flags);
 
 	/*
+11
fs/signalfd.c
···
 #include <linux/signalfd.h>
 #include <linux/syscalls.h>
 
+void signalfd_cleanup(struct sighand_struct *sighand)
+{
+	wait_queue_head_t *wqh = &sighand->signalfd_wqh;
+
+	if (likely(!waitqueue_active(wqh)))
+		return;
+
+	/* wait_queue_t->func(POLLFREE) should do remove_wait_queue() */
+	wake_up_poll(wqh, POLLHUP | POLLFREE);
+}
+
 struct signalfd_ctx {
 	sigset_t sigmask;
 };
+2
include/asm-generic/poll.h
···
 #define POLLRDHUP	0x2000
 #endif
 
+#define POLLFREE	0x4000	/* currently only for epoll */
+
 struct pollfd {
 	int fd;
 	short events;
+4 -1
include/linux/signalfd.h
···
 		wake_up(&tsk->sighand->signalfd_wqh);
 }
 
+extern void signalfd_cleanup(struct sighand_struct *sighand);
+
 #else /* CONFIG_SIGNALFD */
 
 static inline void signalfd_notify(struct task_struct *tsk, int sig) { }
+
+static inline void signalfd_cleanup(struct sighand_struct *sighand) { }
 
 #endif /* CONFIG_SIGNALFD */
 
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_SIGNALFD_H */
-
+4 -1
kernel/fork.c
···
 #include <linux/user-return-notifier.h>
 #include <linux/oom.h>
 #include <linux/khugepaged.h>
+#include <linux/signalfd.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
···
 
 void __cleanup_sighand(struct sighand_struct *sighand)
 {
-	if (atomic_dec_and_test(&sighand->count))
+	if (atomic_dec_and_test(&sighand->count)) {
+		signalfd_cleanup(sighand);
 		kmem_cache_free(sighand_cachep, sighand);
+	}
 }
 
 