Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'tcp-fix-listener-wakeup-after-reuseport-migration'

Zhenzhong Wu says:

====================
tcp: fix listener wakeup after reuseport migration

This series fixes a missing wakeup when inet_csk_listen_stop() migrates
an established child socket from a closing listener to another socket
in the same SO_REUSEPORT group after the child has already been queued
for accept.

The target listener receives the migrated accept-queue entry via
inet_csk_reqsk_queue_add(), but its waiters are not notified.
Nonblocking accept() still succeeds because it checks the accept queue
directly, but readiness-based waiters can remain asleep until another
connection generates a wakeup.
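The difference between the two kinds of waiter can be sketched in userspace with a minimal readiness-based accept loop. The sketch assumes a nonblocking IPv4 listener on loopback that is registered (level-triggered) with epoll before any client connects. `accept_when_ready()` and `demo_accept_loop()` are hypothetical helpers, not code from the series. The loop only looks at the accept queue after epoll_wait() reports EPOLLIN, so a connection queued without a wakeup would go unnoticed until some later event or the timeout. A direct nonblocking accept4() call would find it regardless.

```c
#define _GNU_SOURCE		/* accept4() */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: wait until epoll reports the listener readable,
 * then drain its accept queue with nonblocking accept4(). The queue is
 * only inspected after a wakeup. Returns the number of connections
 * accepted (0 on timeout), or -1 on error. */
static int accept_when_ready(int epfd, int lfd, int timeout_ms)
{
	struct epoll_event ev;
	int accepted = 0;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	if (n <= 0)
		return n;	/* 0: no wakeup seen before the timeout */

	for (;;) {
		int cfd = accept4(lfd, NULL, NULL, SOCK_NONBLOCK);

		if (cfd < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return accepted;	/* queue drained */
			return -1;
		}
		close(cfd);	/* this sketch only counts connections */
		accepted++;
	}
}

/* Hypothetical loopback demo: register the listener with epoll, connect
 * one client, then run the loop twice. Returns 0 if setup succeeded. */
static int demo_accept_loop(int *first, int *second)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	struct epoll_event ev = { .events = EPOLLIN };
	int lfd, cfd, epfd, ret = -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	lfd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
	if (lfd < 0)
		return -1;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) ||
	    listen(lfd, 8) ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len))
		goto close_lfd;

	epfd = epoll_create1(0);
	if (epfd < 0)
		goto close_lfd;
	ev.data.fd = lfd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev))
		goto close_epfd;

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0)
		goto close_epfd;
	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
		*first = accept_when_ready(epfd, lfd, 1000);
		*second = accept_when_ready(epfd, lfd, 0);
		ret = 0;
	}
	close(cfd);
close_epfd:
	close(epfd);
close_lfd:
	close(lfd);
	return ret;
}
```

Here the connection arrives through the normal, non-migrated path, which does deliver a wakeup. The first call is therefore expected to accept one connection, and the second, zero-timeout call should find the queue empty.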

Patch 1 notifies the target listener by calling its sk_data_ready()
callback after a successful migration in inet_csk_listen_stop(), and
wraps the inet_csk_reqsk_queue_add() call and the nsk accesses that
follow it in rcu_read_lock()/rcu_read_unlock().

Patch 2 extends the existing migrate_reuseport BPF selftest with epoll
readiness checks inside migrate_dance(), around shutdown() where the
migration happens. The test now verifies that the target listener is
not ready before migration and becomes ready immediately after it, for
both TCP_ESTABLISHED and TCP_SYN_RECV. TCP_NEW_SYN_RECV remains
excluded because it still depends on later handshake completion.
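The readiness check that Patch 2 adds can be sketched outside the BPF harness. The key detail is ordering. The fd is registered with an epoll instance before the event, and later probes call epoll_wait() with a zero timeout on that same instance, so a missing wakeup shows up as a return value of 0. In this sketch a pipe stands in for the target listener, and write() stands in for the shutdown() that triggers migration in the selftest. `probe()` and `demo_probe()` are hypothetical names.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Zero-timeout probe on a pre-registered epoll instance; never blocks.
 * Returns the number of ready fds (0 or 1 here), or -1 on error. */
static int probe(int epfd)
{
	struct epoll_event ev;

	return epoll_wait(epfd, &ev, 1, 0);
}

/* Register the pipe's read end first, then probe before and after the
 * event. Returns 0 if setup succeeded. */
static int demo_probe(int *before, int *after)
{
	struct epoll_event ev = { .events = EPOLLIN };
	int p[2], epfd, ret = -1;

	if (pipe(p))
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto close_pipe;

	ev.data.fd = p[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev))
		goto close_epoll;

	*before = probe(epfd);		/* expected 0: nothing queued yet */
	if (write(p[1], "x", 1) == 1) {
		*after = probe(epfd);	/* expected 1: wakeup already delivered */
		ret = 0;
	}
close_epoll:
	close(epfd);
close_pipe:
	close(p[0]);
	close(p[1]);
	return ret;
}
```

Creating a fresh epoll instance after the event would not work as a wakeup check. EPOLL_CTL_ADD polls the fd's current state, so it could report the fd ready even if the wakeup had been missed, much like a direct nonblocking accept().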

Testing:
- On a local unpatched kernel, the focused migrate_reuseport test
  fails for the listener-migration cases and passes for the
  TCP_NEW_SYN_RECV cases:
    not ok 1 IPv4 TCP_ESTABLISHED inet_csk_listen_stop
    not ok 2 IPv4 TCP_SYN_RECV inet_csk_listen_stop
    ok 3 IPv4 TCP_NEW_SYN_RECV reqsk_timer_handler
    ok 4 IPv4 TCP_NEW_SYN_RECV inet_csk_complete_hashdance
    not ok 5 IPv6 TCP_ESTABLISHED inet_csk_listen_stop
    not ok 6 IPv6 TCP_SYN_RECV inet_csk_listen_stop
    ok 7 IPv6 TCP_NEW_SYN_RECV reqsk_timer_handler
    ok 8 IPv6 TCP_NEW_SYN_RECV inet_csk_complete_hashdance
- On a patched kernel booted under QEMU, the full migrate_reuseport
  selftest passes:
    ok 1 IPv4 TCP_ESTABLISHED inet_csk_listen_stop
    ok 2 IPv4 TCP_SYN_RECV inet_csk_listen_stop
    ok 3 IPv4 TCP_NEW_SYN_RECV reqsk_timer_handler
    ok 4 IPv4 TCP_NEW_SYN_RECV inet_csk_complete_hashdance
    ok 5 IPv6 TCP_ESTABLISHED inet_csk_listen_stop
    ok 6 IPv6 TCP_SYN_RECV inet_csk_listen_stop
    ok 7 IPv6 TCP_NEW_SYN_RECV reqsk_timer_handler
    ok 8 IPv6 TCP_NEW_SYN_RECV inet_csk_complete_hashdance
    SELFTEST_RC=0
====================

Link: https://patch.msgid.link/20260422024554.130346-1-jt26wzz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2 files changed, 45 insertions(+), 7 deletions(-)

net/ipv4/inet_connection_sock.c (+3)
···
 			if (nreq) {
 				refcount_set(&nreq->rsk_refcnt, 1);
 
+				rcu_read_lock();
 				if (inet_csk_reqsk_queue_add(nsk, nreq, child)) {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQSUCCESS);
 					reqsk_migrate_reset(req);
+					READ_ONCE(nsk->sk_data_ready)(nsk);
 				} else {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQFAILURE);
 					reqsk_migrate_reset(nreq);
 					__reqsk_free(nreq);
 				}
+				rcu_read_unlock();
 
 				/* inet_csk_reqsk_queue_add() has already
 				 * called inet_child_forget() on failure case.

tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c (+42 -7)
···
  *   3. call listen() for 1 server socket. (migration target)
  *   4. update a map to migrate all child sockets
  *        to the last server socket (migrate_map[cookie] = 4)
- *   5. call shutdown() for first 4 server sockets
+ *   5. for TCP_ESTABLISHED and TCP_SYN_RECV cases, verify via epoll
+ *        that the last server socket is not ready before migration.
+ *   6. call shutdown() for first 4 server sockets
  *        and migrate the requests in the accept queue
  *        to the last server socket.
- *   6. call listen() for the second server socket.
- *   7. call shutdown() for the last server
+ *   7. for TCP_ESTABLISHED and TCP_SYN_RECV cases, verify via epoll
+ *        that the last server socket is ready after migration.
+ *   8. call listen() for the second server socket.
+ *   9. call shutdown() for the last server
  *        and migrate the requests in the accept queue
  *        to the second server socket.
- *   8. call listen() for the last server.
- *   9. call shutdown() for the second server
+ *  10. call listen() for the last server.
+ *  11. call shutdown() for the second server
  *        and migrate the requests in the accept queue
  *        to the last server socket.
- *  10. call accept() for the last server socket.
+ *  12. call accept() for the last server socket.
  *
  * Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
  */
 
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
+#include <sys/epoll.h>
 
 #include "test_progs.h"
 #include "test_migrate_reuseport.skel.h"
···
 
 static int migrate_dance(struct migrate_reuseport_test_case *test_case)
 {
+	struct epoll_event ev = {
+		.events = EPOLLIN,
+	};
+	int epoll = -1, nfds;
 	int i, err;
+
+	if (test_case->state != BPF_TCP_NEW_SYN_RECV) {
+		epoll = epoll_create1(0);
+		if (!ASSERT_NEQ(epoll, -1, "epoll_create1"))
+			return -1;
+
+		ev.data.fd = test_case->servers[MIGRATED_TO];
+		if (!ASSERT_OK(epoll_ctl(epoll, EPOLL_CTL_ADD,
+					 test_case->servers[MIGRATED_TO], &ev),
+			       "epoll_ctl"))
+			goto close_epoll;
+
+		nfds = epoll_wait(epoll, &ev, 1, 0);
+		if (!ASSERT_EQ(nfds, 0, "epoll_wait 1"))
+			goto close_epoll;
+	}
 
 	/* Migrate TCP_ESTABLISHED and TCP_SYN_RECV requests
 	 * to the last listener based on eBPF.
···
 	for (i = 0; i < MIGRATED_TO; i++) {
 		err = shutdown(test_case->servers[i], SHUT_RDWR);
 		if (!ASSERT_OK(err, "shutdown"))
-			return -1;
+			goto close_epoll;
 	}
 
 	/* No dance for TCP_NEW_SYN_RECV to migrate based on eBPF */
 	if (test_case->state == BPF_TCP_NEW_SYN_RECV)
 		return 0;
+
+	nfds = epoll_wait(epoll, &ev, 1, 0);
+	if (!ASSERT_EQ(nfds, 1, "epoll_wait 2")) {
+close_epoll:
+		if (epoll >= 0)
+			close(epoll);
+		return -1;
+	}
+
+	close(epoll);
 
 	/* Note that we use the second listener instead of the
 	 * first one here.