Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ksmbd: rewrite stop_sessions() with restartable iteration

stop_sessions() walks conn_list with hash_for_each() and, for every
entry, drops conn_list_lock across the transport ->shutdown() call
before re-acquiring the read lock to continue the loop. The hash
walk relies on cross-iteration state (the current bucket and the
hlist position), which is not preserved across unlock/relock: if
another thread performs a list mutation during the unlocked window,
the ongoing iteration becomes unreliable and can re-visit
connections that have already been handled or skip connections that
have not. The outer `if (!hash_empty(conn_list)) goto again;` retry
masks the symptom in the common case but does not address the
unsafe iteration itself.

Reframe the loop so it never relies on iterator state across
unlock/relock. Under conn_list_lock held for read, pick the first
connection whose ->shutdown() has not yet been issued by this path,
pin it by taking an extra reference, record that fact on the
connection and mark it EXITING while still inside the locked walk,
then drop the lock. Then call ->shutdown() outside the lock, drop
the pin (freeing the connection if the handler already released its
reference), and restart from the top.
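The pick/pin/mark/unlock/shutdown/restart sequence above can be sketched in user space as follows. This is a single-threaded model, not the kernel code: struct conn, lock_held, shutdown_conn() and stop_all() are hypothetical names, the lock is modelled by a boolean so the example can check that shutdown runs outside it, and the free-on-last-put step is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ksmbd_conn; only what the loop needs. */
struct conn {
	int refcnt;		/* plain int: this model is single-threaded */
	bool stop_called;	/* shutdown already issued by this path */
	int shutdown_calls;	/* instrumentation for the example only */
};

static bool lock_held;		/* models conn_list_lock held for read */

static void shutdown_conn(struct conn *c)
{
	assert(!lock_held);	/* must be called outside the lock */
	c->shutdown_calls++;
}

/* Returns how many shutdown calls were issued. */
static int stop_all(struct conn *conns, size_t n)
{
	int issued = 0;

	for (;;) {
		struct conn *target = NULL;

		lock_held = true;
		for (size_t i = 0; i < n; i++) {
			if (conns[i].stop_called)
				continue;
			conns[i].refcnt++;		/* pin */
			conns[i].stop_called = true;	/* mark while locked */
			target = &conns[i];
			break;
		}
		lock_held = false;

		if (!target)
			return issued;	/* nothing left to pick */

		shutdown_conn(target);
		issued++;
		target->refcnt--;	/* drop the pin, then restart from the top */
	}
}
```

No iterator state survives an unlock in this shape: every pass starts from the first entry and relies only on the per-entry marker.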

Use a new per-connection flag, conn->stop_called, as the "shutdown
issued from stop_sessions()" marker rather than reusing the status
state. ksmbd_conn_set_exiting() is also invoked by
ksmbd_sessions_deregister() on sibling channels of a multichannel
session without issuing a transport shutdown, so treating
KSMBD_SESS_EXITING as "already handled here" would skip connections
that still need shutdown() to wake their handler out of recv(),
leaving the outer retry waiting indefinitely for the hash to drain.
stop_sessions() is serialised by init_lock in
ksmbd_conn_transport_destroy(), so writing stop_called under the
read lock has no other writer.
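A minimal sketch of why the status state is not a usable marker, assuming a reduced, hypothetical status enum and struct conn: a connection that a sibling path already marked EXITING is skipped by a status-based selection even though no shutdown was issued for it, while a dedicated flag still selects it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced status set; not the kernel's definitions. */
enum status { SESS_GOOD, SESS_EXITING, SESS_RELEASING };

struct conn {
	enum status status;
	bool stop_called;	/* set only by the stop path */
};

/* Selection that reuses the status as the "already handled" marker. */
static int pick_by_status(const struct conn *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (c[i].status != SESS_EXITING)
			return (int)i;
	return -1;	/* nothing selected */
}

/* Selection that uses the dedicated flag. */
static int pick_by_flag(const struct conn *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!c[i].stop_called)
			return (int)i;
	return -1;	/* nothing selected */
}
```

With a single entry that is EXITING but has stop_called unset, pick_by_status() finds nothing to shut down, whereas pick_by_flag() returns that entry.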

Set EXITING inside the locked walk so the selection, the stop_called
marker, and the status transition all happen together, and guard
against regressing a connection that has already advanced to
KSMBD_SESS_RELEASING on its own (for example, if the handler exited
its receive loop for an unrelated reason between teardown steps).
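The guard can be modelled in user space like this. It is a sketch only: the enum values and mark_exiting() are hypothetical, and a C11 atomic load stands in for READ_ONCE().

```c
#include <stdatomic.h>

/* Hypothetical, reduced status set; not the kernel's definitions. */
enum { SESS_GOOD, SESS_EXITING, SESS_RELEASING };

struct conn {
	atomic_int status;
};

/* Move to EXITING unless the connection already reached RELEASING. */
static void mark_exiting(struct conn *c)
{
	if (atomic_load(&c->status) != SESS_RELEASING)
		atomic_store(&c->status, SESS_EXITING);
}
```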

When the pin drop is the last put, release the transport and pair
ida_destroy(&target->async_ida) with the ida_init() done in
ksmbd_conn_alloc(), so stop_sessions() retiring a connection on its
own does not leak the xarray backing of the embedded async_ida.
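The pairing rule can be illustrated with a small user-space model. Everything here is hypothetical (conn_alloc(), conn_put(), the async_ids array standing in for the ida backing storage, and the live_allocs counter); the point is only that whichever put turns out to be the last one must undo everything the allocator set up.

```c
#include <stdbool.h>
#include <stdlib.h>

struct conn {
	int refcnt;
	int *async_ids;	/* stands in for the async_ida backing storage */
};

static int live_allocs;	/* instrumentation: outstanding allocations */

static struct conn *conn_alloc(void)
{
	struct conn *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->async_ids = calloc(8, sizeof(*c->async_ids));
	if (!c->async_ids) {
		free(c);
		return NULL;
	}
	c->refcnt = 1;	/* the handler's reference */
	live_allocs += 2;
	return c;
}

/* Drops one reference; returns true if this was the last put. */
static bool conn_put(struct conn *c)
{
	if (--c->refcnt != 0)
		return false;
	free(c->async_ids);	/* pairs with the allocation in conn_alloc() */
	free(c);
	live_allocs -= 2;
	return true;
}
```

If the handler drops its reference while the stop path still holds the pin, the pin drop becomes the last put and has to perform the full release.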

The outer retry with msleep() is kept to wait for handler threads to
reach ksmbd_conn_free() and drain the hash.

Observed with an instrumented build that logs one line per visit and
widens the unlocked window before ->shutdown() by 200 ms, under
five concurrent cifs mounts (nosharesock, one connection each):

  * Current code: the same connection address is revisited many
    times during a single stop_sessions() call and ->shutdown() is
    invoked well beyond the number of live connections before the
    hash finally drains.

  * Rewritten code: each live connection produces exactly one
    ->shutdown() call; the function returns as soon as the hash is
    empty.

Functional teardown via `ksmbd.control --shutdown` with the same
five mounts completes cleanly on the rewritten path.

Performance is observably unchanged. Tearing down N concurrent
nosharesock cifs connections with `ksmbd.control --shutdown` +
`rmmod ksmbd` takes essentially the same wall time before and after
the rewrite:

    N    before   after
   10    4.93s    5.34s
   30    7.34s    7.03s
   50    7.31s    7.01s   (3-run avg: 7.04s vs 7.25s)
  100    6.98s    6.78s
  200    6.77s    6.89s

and the number of ->shutdown() calls equals the number of live
connections on both paths when the race is not widened. The
teardown is dominated by the msleep(100)-based outer retry waiting
for handler threads to run ksmbd_conn_free(), not by the iteration
itself; the restartable loop's worst-case O(N^2) visit cost is in
the microseconds even at N=200 and sits far below the msleep(100)
granularity.
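The quadratic bound can be made concrete with a short count. This sketch assumes that no connection leaves the hash while the loop runs and that already-marked entries are always visited before the next unmarked one, so the k-th restart visits k entries; worst_case_visits() is a hypothetical helper, and it counts only up to the first pass that finds nothing left to pick.

```c
/* Entry visits until the first pass that selects no target. */
static unsigned long worst_case_visits(unsigned long n)
{
	unsigned long visits = 0;

	for (unsigned long k = 1; k <= n; k++)
		visits += k;	/* k - 1 marked entries skipped, 1 picked */

	visits += n;		/* final pass: all n entries already marked */
	return visits;
}
```

For N=200 this gives 200*201/2 + 200 = 20300 visits, each a flag test on an entry that is already in the hash.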

Applied alone on top of ksmbd-for-next-next, this patch does not
introduce a new leak site. Under the same reproducer (10x
concurrent-holders + ss -K + ksmbd.control --shutdown + rmmod), the
tree still shows the pre-existing per-connection transport leak
count that arises when the last refcount drop lands in one of
ksmbd_conn_r_count_dec(), __free_opinfo() or session_fd_check() -
all of which end with a bare kfree() today. kmemleak backtraces
for the unreferenced objects point into the TCP accept path
(sk_clone -> inet_csk_clone_lock, sock_alloc_inode) and none
involve stop_sessions(). Plugging those bare-kfree sites is the
responsibility of the follow-up patch.

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

Authored by DaeMyung Kang and committed by Steve French
c444139c ab4ad35e

+40 -9
+39 -9
fs/smb/server/connection.c
@@ -540,24 +540,54 @@
 
 static void stop_sessions(void)
 {
-	struct ksmbd_conn *conn;
+	struct ksmbd_conn *conn, *target;
 	struct ksmbd_transport *t;
+	bool any;
 	int bkt;
 
+	/*
+	 * Serialised via init_lock; no concurrent stop_sessions() can
+	 * touch conn->stop_called, so writing it under the read lock is
+	 * safe.
+	 */
 again:
+	target = NULL;
+	any = false;
 	down_read(&conn_list_lock);
 	hash_for_each(conn_list, bkt, conn, hlist) {
-		t = conn->transport;
-		ksmbd_conn_set_exiting(conn);
-		if (t->ops->shutdown) {
-			up_read(&conn_list_lock);
-			t->ops->shutdown(t);
-			down_read(&conn_list_lock);
-		}
+		any = true;
+		if (conn->stop_called)
+			continue;
+		atomic_inc(&conn->refcnt);
+		conn->stop_called = true;
+		/*
+		 * Mark the connection EXITING while still holding the
+		 * read lock so the selection and the status transition
+		 * happen together. Do not regress a connection that has
+		 * already advanced to RELEASING on its own (e.g. the
+		 * handler exited its receive loop for an unrelated
+		 * reason).
+		 */
+		if (READ_ONCE(conn->status) != KSMBD_SESS_RELEASING)
+			ksmbd_conn_set_exiting(conn);
+		target = conn;
+		break;
 	}
 	up_read(&conn_list_lock);
 
-	if (!hash_empty(conn_list)) {
+	if (target) {
+		t = target->transport;
+		if (t->ops->shutdown)
+			t->ops->shutdown(t);
+		if (atomic_dec_and_test(&target->refcnt)) {
+			ida_destroy(&target->async_ida);
+			t->ops->free_transport(t);
+			kfree(target);
+		}
+		goto again;
+	}
+
+	if (any) {
 		msleep(100);
 		goto again;
 	}
+1
fs/smb/server/connection.h
@@ -49,6 +49,7 @@
 	struct mutex srv_mutex;
 	int status;
 	unsigned int cli_cap;
+	bool stop_called;
 	union {
 		__be32 inet_addr;
 #if IS_ENABLED(CONFIG_IPV6)