Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()

The race was introduced by me in commit 971316f0503a ("epoll:
ep_unregister_pollwait() can use the freed pwq->whead"). I did not
realize that nothing can protect eventpoll after ep_poll_callback() sets
->whead = NULL, only whead->lock can save us from the race with
ep_free() or ep_remove().

Move ->whead = NULL to the end of ep_poll_callback() and add the
necessary barriers.

TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even
before this patch.

Hopefully this explains use-after-free reported by syzcaller:

BUG: KASAN: use-after-free in debug_spin_lock_before
...
_raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159
ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148

this is spin_lock(eventpoll->lock),

...
Freed by task 17774:
...
kfree+0xe8/0x2c0 mm/slub.c:3883
ep_free+0x22c/0x2a0 fs/eventpoll.c:865

Fixes: 971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead")
Reported-by: 范龙飞 <long7573@126.com>
Cc: stable@vger.kernel.org
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by

Oleg Nesterov and committed by
Linus Torvalds
138e4ad6 8cf9f2a2

+26 -16
+26 -16
fs/eventpoll.c
··· 600 600 wait_queue_head_t *whead; 601 601 602 602 rcu_read_lock(); 603 - /* If it is cleared by POLLFREE, it should be rcu-safe */ 604 - whead = rcu_dereference(pwq->whead); 603 + /* 604 + * If it is cleared by POLLFREE, it should be rcu-safe. 605 + * If we read NULL we need a barrier paired with 606 + * smp_store_release() in ep_poll_callback(), otherwise 607 + * we rely on whead->lock. 608 + */ 609 + whead = smp_load_acquire(&pwq->whead); 605 610 if (whead) 606 611 remove_wait_queue(whead, &pwq->wait); 607 612 rcu_read_unlock(); ··· 1139 1134 struct eventpoll *ep = epi->ep; 1140 1135 int ewake = 0; 1141 1136 1142 - if ((unsigned long)key & POLLFREE) { 1143 - ep_pwq_from_wait(wait)->whead = NULL; 1144 - /* 1145 - * whead = NULL above can race with ep_remove_wait_queue() 1146 - * which can do another remove_wait_queue() after us, so we 1147 - * can't use __remove_wait_queue(). whead->lock is held by 1148 - * the caller. 1149 - */ 1150 - list_del_init(&wait->entry); 1151 - } 1152 - 1153 1137 spin_lock_irqsave(&ep->lock, flags); 1154 1138 1155 1139 ep_set_busy_poll_napi_id(epi); ··· 1222 1228 if (pwake) 1223 1229 ep_poll_safewake(&ep->poll_wait); 1224 1230 1225 - if (epi->event.events & EPOLLEXCLUSIVE) 1226 - return ewake; 1231 + if (!(epi->event.events & EPOLLEXCLUSIVE)) 1232 + ewake = 1; 1227 1233 1228 - return 1; 1234 + if ((unsigned long)key & POLLFREE) { 1235 + /* 1236 + * If we race with ep_remove_wait_queue() it can miss 1237 + * ->whead = NULL and do another remove_wait_queue() after 1238 + * us, so we can't use __remove_wait_queue(). 1239 + */ 1240 + list_del_init(&wait->entry); 1241 + /* 1242 + * ->whead != NULL protects us from the race with ep_free() 1243 + * or ep_remove(), ep_remove_wait_queue() takes whead->lock 1244 + * held by the caller. Once we nullify it, nothing protects 1245 + * ep/epi or even wait. 1246 + */ 1247 + smp_store_release(&ep_pwq_from_wait(wait)->whead, NULL); 1248 + } 1249 + 1250 + return ewake; 1229 1251 } 1230 1252 1231 1253 /*