Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

eventpoll: fix ep_remove struct eventpoll / struct file UAF

ep_remove() (via ep_remove_file()) cleared file->f_ep under
file->f_lock but then kept using @file inside the critical section
(is_file_epoll(), hlist_del_rcu() through the head, spin_unlock).
A concurrent __fput() taking the eventpoll_release() fastpath in
that window observed the transient NULL, skipped
eventpoll_release_file() and ran to f_op->release / file_free().

For the epoll-watches-epoll case, f_op->release is
ep_eventpoll_release() -> ep_clear_and_put() -> ep_free(), which
kfree()s the watched struct eventpoll. Its embedded ->refs
hlist_head is exactly where epi->fllink.pprev points, so the
subsequent hlist_del_rcu()'s "*pprev = next" scribbles into freed
kmalloc-192 memory.
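
The pprev write described above can be illustrated with a minimal userspace sketch. It assumes simplified stand-ins for the kernel's hlist types, with no RCU and no locking; del_writes_through_head() is a hypothetical helper that exists only for this illustration. It shows that unlinking the first node on a list stores through pprev into the list head itself, which is why the head's memory must still be valid at that point.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's hlist types (no RCU, no locking). */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;	/* address of the pointer that points at us */
};

struct hlist_head {
	struct hlist_node *first;
};

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;	/* first node: pprev aims into the head itself */
}

static void hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;

	*n->pprev = next;	/* for the first node this stores into the head */
	if (next)
		next->pprev = n->pprev;
}

/* Hypothetical demo: returns 1 if deleting the sole node wrote to the head. */
static int del_writes_through_head(void)
{
	struct hlist_head refs = { NULL };
	struct hlist_node fllink;

	hlist_add_head(&fllink, &refs);
	assert(fllink.pprev == &refs.first);

	hlist_del(&fllink);
	return refs.first == NULL;
}
```

If the structure embedding the head had been freed before hlist_del() ran, that same store would land in freed memory.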

In addition, struct file is SLAB_TYPESAFE_BY_RCU, so the slot
backing @file could be recycled by alloc_empty_file() --
reinitializing f_lock and f_ep -- while ep_remove() is still
nominally inside that lock. The upshot is an attacker-controllable
kmem_cache_free() against the wrong slab cache.

Pin @file via epi_fget() at the top of ep_remove() and gate the
critical section on the pin succeeding. With the pin held, @file
cannot reach refcount zero, which holds __fput() off and
transitively keeps the watched struct eventpoll alive across the
hlist_del_rcu() and the f_lock use, closing both UAFs.
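
The pinning idea can be sketched in userspace C11 as a "take a reference only if the count is still non-zero" helper. This is an assumption-laden illustration, not the kernel's epi_fget(), whose body is not part of this diff; struct obj, obj_tryget(), obj_put() and tryget_demo() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object; not the kernel's struct file. */
struct obj {
	atomic_long refcount;
};

/* Take a reference only if the count is still non-zero. */
static bool obj_tryget(struct obj *o)
{
	long old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;	/* count already hit zero: teardown owns the object */
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}

/* Hypothetical demo: returns 0 if both the live and dying cases behave. */
static int tryget_demo(void)
{
	struct obj live, dying;

	atomic_init(&live.refcount, 1);
	atomic_init(&dying.refcount, 0);

	if (!obj_tryget(&live))		/* 1 -> 2, pin succeeds */
		return -1;
	if (obj_put(&live))		/* 2 -> 1, not the last reference */
		return -2;
	if (obj_tryget(&dying))		/* must refuse once the count is zero */
		return -3;
	return 0;
}
```

While a successful pin is held, the count cannot reach zero, so the release path cannot start. A failed pin tells the caller that release is already under way and that it must back off.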

If the pin fails, @file has already reached refcount zero and its
__fput() is in flight. Because we bailed before clearing f_ep,
that path takes the eventpoll_release() slow path into
eventpoll_release_file() and blocks on ep->mtx until the waiter
side's ep_clear_and_put() drops it. The bailed epi's share of
ep->refcount stays intact, so the trailing ep_refcount_dec_and_test()
in ep_clear_and_put() cannot free the eventpoll out from under
eventpoll_release_file(); the orphaned epi is then cleaned up
there.

A successful pin also proves we are not racing
eventpoll_release_file() on this epi, so drop the now-redundant
re-check of epi->dying under f_lock. The cheap lockless
READ_ONCE(epi->dying) fast-path bailout stays.

Fixes: 58c9b016e128 ("epoll: use refcount to reduce ep_mutex contention")
Reported-by: Jaeyoung Chung <jjy600901@snu.ac.kr>
Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-6-2470f9eec0f5@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

+10 -6
fs/eventpoll.c
@@ -912,22 +912,26 @@
  */
 static void ep_remove(struct eventpoll *ep, struct epitem *epi)
 {
-	struct file *file = epi->ffd.file;
+	struct file *file __free(fput) = NULL;
 
 	lockdep_assert_irqs_enabled();
 	lockdep_assert_held(&ep->mtx);
 
 	ep_unregister_pollwait(ep, epi);
 
-	/* sync with eventpoll_release_file() */
+	/* cheap sync with eventpoll_release_file() */
 	if (unlikely(READ_ONCE(epi->dying)))
 		return;
 
-	spin_lock(&file->f_lock);
-	if (epi->dying) {
-		spin_unlock(&file->f_lock);
+	/*
+	 * If we manage to grab a reference it means we're not in
+	 * eventpoll_release_file() and aren't going to be.
+	 */
+	file = epi_fget(epi);
+	if (!file)
 		return;
-	}
+
+	spin_lock(&file->f_lock);
 	ep_remove_file(ep, epi, file);
 
 	if (ep_remove_epi(ep, epi))
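
The `__free(fput)` annotation in the hunk above relies on the kernel's scope-based cleanup helpers, which build on the compiler's cleanup variable attribute. The following is a rough userspace sketch of that pattern, assuming GCC or Clang; struct obj, obj_tryget(), obj_put(), auto_put and the demo functions are all hypothetical names and not kernel APIs. It shows why the early return on a failed pin needs no explicit release, while the success path drops the reference automatically when the variable goes out of scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct file. */
struct obj {
	int refs;
};

static void obj_put(struct obj *o)
{
	o->refs--;
}

/* Cleanup handler: receives a pointer to the annotated variable. */
static void obj_put_cleanup(struct obj **p)
{
	if (*p)
		obj_put(*p);
}

/* Hypothetical macro, analogous in spirit to the kernel's __free(fput). */
#define auto_put __attribute__((cleanup(obj_put_cleanup)))

/* Returns the object with an extra reference, or NULL if it is dead. */
static struct obj *obj_tryget(struct obj *o)
{
	if (o->refs == 0)
		return NULL;
	o->refs++;
	return o;
}

static int use_obj(struct obj *target)
{
	struct obj *o auto_put = NULL;

	o = obj_tryget(target);
	if (!o)
		return -1;	/* cleanup sees NULL and does nothing */

	/* ... work that needs the object pinned would go here ... */
	return 0;		/* cleanup drops the reference on scope exit */
}

/* Hypothetical demo: returns 0 if reference counts balance in both cases. */
static int cleanup_demo(void)
{
	struct obj live = { 1 };
	struct obj dead = { 0 };

	if (use_obj(&live) != 0 || live.refs != 1)
		return -1;
	if (use_obj(&dead) != -1 || dead.refs != 0)
		return -2;
	return 0;
}
```

Initializing the annotated pointer to NULL, as the patch does, keeps the early-return paths that run before the pin is taken safe, because the handler only releases a non-NULL pointer.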