Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
"New Features:
- Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
- Add support for the LOCALIO protocol extension

Bugfixes:
- Fix memory leak in error path of nfs4_do_reclaim()
- Simplify and guarantee lock owner uniqueness
- Fix -Wformat-truncation warning
- Fix folio refcounts by using folio_attach_private()
- Fix failing the mount system call when the server is down
- Fix detection of "Proxying of Times" server support

Cleanups:
- Annotate struct nfs_cache_array with __counted_by()
- Remove unnecessary NULL checks before kfree()
- Convert RPC_TASK_* constants to an enum
- Remove obsolete or misleading comments and declarations"

* tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (41 commits)
nfs: Fix `make htmldocs` warnings in the localio documentation
nfs: add "NFS Client and Server Interlock" section to localio.rst
nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
nfs: add Documentation/filesystems/nfs/localio.rst
nfs: implement client support for NFS_LOCALIO_PROGRAM
nfs/localio: use dedicated workqueues for filesystem read and write
pnfs/flexfiles: enable localio support
nfs: enable localio for non-pNFS IO
nfs: add LOCALIO support
nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit
nfsd: implement server support for NFS_LOCALIO_PROGRAM
nfsd: add LOCALIO support
nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO
nfs_common: add NFS LOCALIO auxiliary protocol enablement
SUNRPC: replace program list with program array
SUNRPC: add svcauth_map_clnt_to_svc_cred_local
SUNRPC: remove call_allocate() BUG_ONs
nfsd: add nfsd_serv_try_get and nfsd_serv_put
nfsd: add nfsd_file_acquire_local()
nfsd: factor out __fh_verify to allow NULL rqstp to be passed
...

+2566 -511
+1
Documentation/filesystems/nfs/index.rst
···
 
    client-identifier
    exporting
+   localio
    pnfs
    rpc-cache
    rpc-server-gss
+357
Documentation/filesystems/nfs/localio.rst
···
+===========
+NFS LOCALIO
+===========
+
+Overview
+========
+
+The LOCALIO auxiliary RPC protocol allows the Linux NFS client and
+server to reliably handshake to determine if they are on the same
+host. Select "NFS client and server support for LOCALIO auxiliary
+protocol" in menuconfig to enable CONFIG_NFS_LOCALIO in the kernel
+config (both CONFIG_NFS_FS and CONFIG_NFSD must also be enabled).
+
+Once an NFS client and server handshake as "local", the client will
+bypass the network RPC protocol for read, write and commit operations.
+Due to this XDR and RPC bypass, these operations will operate faster.
+
+The LOCALIO auxiliary protocol's implementation, which uses the same
+connection as NFS traffic, follows the pattern established by the NFS
+ACL protocol extension.
+
+The LOCALIO auxiliary protocol is needed to allow robust discovery of
+clients local to their servers. In a private implementation that
+preceded use of this LOCALIO protocol, a fragile sockaddr network
+address based match against all local network interfaces was attempted.
+But unlike the LOCALIO protocol, the sockaddr-based matching didn't
+handle use of iptables or containers.
+
+The robust handshake between local client and server is just the
+beginning, the ultimate use case this locality makes possible is the
+client is able to open files and issue reads, writes and commits
+directly to the server without having to go over the network. The
+requirement is to perform these loopback NFS operations as efficiently
+as possible, this is particularly useful for container use cases
+(e.g. kubernetes) where it is possible to run an IO job local to the
+server.
+
+The performance advantage realized from LOCALIO's ability to bypass
+using XDR and RPC for reads, writes and commits can be extreme, e.g.:
+
+  fio for 20 secs with directio, qd of 8, 16 libaio threads:
+  - With LOCALIO:
+    4K read: IOPS=979k, BW=3825MiB/s (4011MB/s)(74.7GiB/20002msec)
+    4K write: IOPS=165k, BW=646MiB/s (678MB/s)(12.6GiB/20002msec)
+    128K read: IOPS=402k, BW=49.1GiB/s (52.7GB/s)(982GiB/20002msec)
+    128K write: IOPS=11.5k, BW=1433MiB/s (1503MB/s)(28.0GiB/20004msec)
+
+  - Without LOCALIO:
+    4K read: IOPS=79.2k, BW=309MiB/s (324MB/s)(6188MiB/20003msec)
+    4K write: IOPS=59.8k, BW=234MiB/s (245MB/s)(4671MiB/20002msec)
+    128K read: IOPS=33.9k, BW=4234MiB/s (4440MB/s)(82.7GiB/20004msec)
+    128K write: IOPS=11.5k, BW=1434MiB/s (1504MB/s)(28.0GiB/20011msec)
+
+  fio for 20 secs with directio, qd of 8, 1 libaio thread:
+  - With LOCALIO:
+    4K read: IOPS=230k, BW=898MiB/s (941MB/s)(17.5GiB/20001msec)
+    4K write: IOPS=22.6k, BW=88.3MiB/s (92.6MB/s)(1766MiB/20001msec)
+    128K read: IOPS=38.8k, BW=4855MiB/s (5091MB/s)(94.8GiB/20001msec)
+    128K write: IOPS=11.4k, BW=1428MiB/s (1497MB/s)(27.9GiB/20001msec)
+
+  - Without LOCALIO:
+    4K read: IOPS=77.1k, BW=301MiB/s (316MB/s)(6022MiB/20001msec)
+    4K write: IOPS=32.8k, BW=128MiB/s (135MB/s)(2566MiB/20001msec)
+    128K read: IOPS=24.4k, BW=3050MiB/s (3198MB/s)(59.6GiB/20001msec)
+    128K write: IOPS=11.4k, BW=1430MiB/s (1500MB/s)(27.9GiB/20001msec)
+
+FAQ
+===
+
+1. What are the use cases for LOCALIO?
+
+   a. Workloads where the NFS client and server are on the same host
+      realize improved IO performance. In particular, it is common when
+      running containerised workloads for jobs to find themselves
+      running on the same host as the knfsd server being used for
+      storage.
+
+2. What are the requirements for LOCALIO?
+
+   a. Bypass use of the network RPC protocol as much as possible. This
+      includes bypassing XDR and RPC for open, read, write and commit
+      operations.
+   b. Allow client and server to autonomously discover if they are
+      running local to each other without making any assumptions about
+      the local network topology.
+   c. Support the use of containers by being compatible with relevant
+      namespaces (e.g. network, user, mount).
+   d. Support all versions of NFS. NFSv3 is of particular importance
+      because it has wide enterprise usage and pNFS flexfiles makes use
+      of it for the data path.
+
+3. Why doesn’t LOCALIO just compare IP addresses or hostnames when
+   deciding if the NFS client and server are co-located on the same
+   host?
+
+   Since one of the main use cases is containerised workloads, we cannot
+   assume that IP addresses will be shared between the client and
+   server. This sets up a requirement for a handshake protocol that
+   needs to go over the same connection as the NFS traffic in order to
+   identify that the client and the server really are running on the
+   same host. The handshake uses a secret that is sent over the wire,
+   and can be verified by both parties by comparing with a value stored
+   in shared kernel memory if they are truly co-located.
+
+4. Does LOCALIO improve pNFS flexfiles?
+
+   Yes, LOCALIO complements pNFS flexfiles by allowing it to take
+   advantage of NFS client and server locality. Policy that initiates
+   client IO as closely to the server where the data is stored naturally
+   benefits from the data path optimization LOCALIO provides.
+
+5. Why not develop a new pNFS layout to enable LOCALIO?
+
+   A new pNFS layout could be developed, but doing so would put the
+   onus on the server to somehow discover that the client is co-located
+   when deciding to hand out the layout.
+   There is value in a simpler approach (as provided by LOCALIO) that
+   allows the NFS client to negotiate and leverage locality without
+   requiring more elaborate modeling and discovery of such locality in a
+   more centralized manner.
+
+6. Why is having the client perform a server-side file OPEN, without
+   using RPC, beneficial? Is the benefit pNFS specific?
+
+   Avoiding the use of XDR and RPC for file opens is beneficial to
+   performance regardless of whether pNFS is used. Especially when
+   dealing with small files it's best to avoid going over the wire
+   whenever possible, otherwise it could reduce or even negate the
+   benefits of avoiding the wire for doing the small file I/O itself.
+   Given LOCALIO's requirements the current approach of having the
+   client perform a server-side file open, without using RPC, is ideal.
+   If in the future requirements change then we can adapt accordingly.
+
+7. Why is LOCALIO only supported with UNIX Authentication (AUTH_UNIX)?
+
+   Strong authentication is usually tied to the connection itself. It
+   works by establishing a context that is cached by the server, and
+   that acts as the key for discovering the authorisation token, which
+   can then be passed to rpc.mountd to complete the authentication
+   process. On the other hand, in the case of AUTH_UNIX, the credential
+   that was passed over the wire is used directly as the key in the
+   upcall to rpc.mountd. This simplifies the authentication process, and
+   so makes AUTH_UNIX easier to support.
+
+8. How do export options that translate RPC user IDs behave for LOCALIO
+   operations (e.g. root_squash, all_squash)?
+
+   Export options that translate user IDs are managed by nfsd_setuser()
+   which is called by nfsd_setuser_and_check_port() which is called by
+   __fh_verify(). So they get handled exactly the same way for LOCALIO
+   as they do for non-LOCALIO.
+
+9. How does LOCALIO make certain that object lifetimes are managed
+   properly given NFSD and NFS operate in different contexts?
+
+   See the detailed "NFS Client and Server Interlock" section below.
+
+RPC
+===
+
+The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
+RPC method that allows the Linux NFS client to verify the local Linux
+NFS server can see the nonce (single-use UUID) the client generated and
+made available in nfs_common. This protocol isn't part of an IETF
+standard, nor does it need to be considering it is a Linux-to-Linux
+auxiliary RPC protocol that amounts to an implementation detail.
+
+The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
+the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode
+XDR methods are used instead of the less efficient variable sized
+methods.
+
+The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
+by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
+Linux Kernel Organization 400122 nfslocalio
+
+The LOCALIO protocol spec in rpcgen syntax is::
+
+  /* raw RFC 9562 UUID */
+  #define UUID_SIZE 16
+  typedef u8 uuid_t<UUID_SIZE>;
+
+  program NFS_LOCALIO_PROGRAM {
+      version LOCALIO_V1 {
+          void
+              NULL(void) = 0;
+
+          void
+              UUID_IS_LOCAL(uuid_t) = 1;
+      } = 1;
+  } = 400122;
+
+LOCALIO uses the same transport connection as NFS traffic. As such,
+LOCALIO is not registered with rpcbind.
+
+NFS Common and Client/Server Handshake
+======================================
+
+fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS client
+to generate a nonce (single-use UUID) and associated short-lived
+nfs_uuid_t struct, register it with nfs_common for subsequent lookup and
+verification by the NFS server and if matched the NFS server populates
+members in the nfs_uuid_t struct. The NFS client then uses nfs_common to
+transfer the nfs_uuid_t from its nfs_uuids to the nn->nfsd_serv
+clients_list from the nfs_common's uuids_list. See:
+fs/nfs/localio.c:nfs_local_probe()
+
+nfs_common's nfs_uuids list is the basis for LOCALIO enablement, as such
+it has members that point to nfsd memory for direct use by the client
+(e.g. 'net' is the server's network namespace, through it the client can
+access nn->nfsd_serv with proper rcu read access). It is this client
+and server synchronization that enables advanced usage and lifetime of
+objects to span from the host kernel's nfsd to per-container knfsd
+instances that are connected to NFS clients running on the same local
+host.
+
+NFS Client and Server Interlock
+===============================
+
+LOCALIO provides the nfs_uuid_t object and associated interfaces to
+allow proper network namespace (net-ns) and NFSD object refcounting:
+
+    We don't want to keep a long-term counted reference on each NFSD's
+    net-ns in the client because that prevents a server container from
+    completely shutting down.
+
+    So we avoid taking a reference at all and rely on the per-cpu
+    reference to the server (detailed below) being sufficient to keep
+    the net-ns active. This involves allowing the NFSD's net-ns exit
+    code to iterate all active clients and clear their ->net pointers
+    (which are needed to find the per-cpu-refcount for the nfsd_serv).
+
+    Details:
+
+     - Embed nfs_uuid_t in nfs_client. nfs_uuid_t provides a list_head
+       that can be used to find the client. It does add the 16-byte
+       uuid_t to nfs_client so it is bigger than needed (given that
+       uuid_t is only used during the initial NFS client and server
+       LOCALIO handshake to determine if they are local to each other).
+       If that is really a problem we can find a fix.
+
+     - When the nfs server confirms that the uuid_t is local, it moves
+       the nfs_uuid_t onto a per-net-ns list in NFSD's nfsd_net.
+
+     - When each server's net-ns is shutting down - in a "pre_exit"
+       handler, all these nfs_uuid_t have their ->net cleared. There is
+       a synchronize_rcu() call between pre_exit() handlers and exit()
+       handlers so any caller that sees nfs_uuid_t ->net as not NULL can
+       safely manage the per-cpu-refcount for nfsd_serv.
+
+     - The client's nfs_uuid_t is passed to nfsd_open_local_fh() so it
+       can safely dereference ->net in a private rcu_read_lock() section
+       to allow safe access to the associated nfsd_net and nfsd_serv.
+
+So LOCALIO required the introduction and use of NFSD's percpu_ref to
+interlock nfsd_destroy_serv() and nfsd_open_local_fh(), to ensure each
+nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh(), and
+warrants a more detailed explanation:
+
+    nfsd_open_local_fh() uses nfsd_serv_try_get() before opening its
+    nfsd_file handle and then the caller (NFS client) must drop the
+    reference for the nfsd_file and associated nn->nfsd_serv using
+    nfs_file_put_local() once it has completed its IO.
+
+    This interlock working relies heavily on nfsd_open_local_fh() being
+    afforded the ability to safely deal with the possibility that the
+    NFSD's net-ns (and nfsd_net by association) may have been destroyed
+    by nfsd_destroy_serv() via nfsd_shutdown_net() -- which is only
+    possible given the nfs_uuid_t ->net pointer management detailed
+    above.
+
+All told, this elaborate interlock of the NFS client and server has been
+verified to fix an easy to hit crash that would occur if an NFSD
+instance running in a container, with a LOCALIO client mounted, is
+shut down. Upon restart of the container and associated NFSD the client
+would go on to crash due to NULL pointer dereference that occurred due
+to the LOCALIO client's attempting to nfsd_open_local_fh(), using
+nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
+
+NFS Client issues IO instead of Server
+======================================
+
+Because LOCALIO is focused on protocol bypass to achieve improved IO
+performance, alternatives to the traditional NFS wire protocol (SUNRPC
+with XDR) must be provided to access the backing filesystem.
+
+See fs/nfs/localio.c:nfs_local_open_fh() and
+fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
+focused use of select nfs server objects to allow a client local to a
+server to open a file pointer without needing to go over the network.
+
+The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
+server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
+both the associated nfsd network namespace and nn->nfsd_serv in terms of
+RCU. If nfsd_open_local_fh() finds that the client no longer sees valid
+nfsd objects (be it struct net or nn->nfsd_serv) it returns -ENXIO
+to nfs_local_open_fh() and the client will try to reestablish the
+LOCALIO resources needed by calling nfs_local_probe() again. This
+recovery is needed if/when an nfsd instance running in a container were
+to reboot while a LOCALIO client is connected to it.
+
+Once the client has an open nfsd_file pointer it will issue reads,
+writes and commits directly to the underlying local filesystem (normally
+done by the nfs server). As such, for these operations, the NFS client
+is issuing IO to the underlying local filesystem that it is sharing with
+the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and
+fs/nfs/localio.c:nfs_local_commit().
+
+Security
+========
+
+Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
+AUTH_SYS) is used.
+
+Care is taken to ensure the same NFS security mechanisms are used
+(authentication, etc) regardless of whether LOCALIO or regular NFS
+access is used. The auth_domain established as part of the traditional
+NFS client access to the NFS server is also used for LOCALIO.
+
+Relative to containers, LOCALIO gives the client access to the network
+namespace the server has. This is required to allow the client to access
+the server's per-namespace nfsd_net struct. With traditional NFS, the
+client is afforded this same level of access (albeit in terms of the NFS
+protocol via SUNRPC). No other namespaces (user, mount, etc) have been
+altered or purposely extended from the server to the client.
+
+Testing
+=======
+
+The LOCALIO auxiliary protocol and associated NFS LOCALIO read, write
+and commit access have proven stable against various test scenarios:
+
+- Client and server both on the same host.
+
+- All permutations of client and server support enablement for both
+  local and remote client and server.
+
+- Testing against NFS storage products that don't support the LOCALIO
+  protocol was also performed.
+
+- Client on host, server within a container (for both v3 and v4.2).
+  The container testing was in terms of podman managed containers and
+  includes successful container stop/restart scenario.
+
+- Formalizing these test scenarios in terms of existing test
+  infrastructure is on-going. Initial regular coverage is provided in
+  terms of ktest running xfstests against a LOCALIO-enabled NFS loopback
+  mount configuration, and includes lockdep and KASAN coverage, see:
+  https://evilpiepirate.org/~testdashboard/ci?user=snitzer&branch=snitm-nfs-next
+  https://github.com/koverstreet/ktest
+
+- Various kdevops testing (in terms of "Chuck's BuildBot") has been
+  performed to regularly verify the LOCALIO changes haven't caused any
+  regressions to non-LOCALIO NFS use cases.
+
+- All of Hammerspace's various sanity tests pass with LOCALIO enabled
+  (this includes numerous pNFS and flexfiles tests).
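The handshake described in localio.rst (the client publishes a nonce in memory shared with the server, sends it over the wire, and the server checks whether it can see the same nonce) can be illustrated with a small userspace sketch. This is not the kernel implementation: `demo_uuid`, `demo_pending`, `demo_client_register()` and `demo_server_uuid_is_local()` are hypothetical names, and a fixed array stands in for the list that nfs_common maintains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_SIZE 16
#define MAX_PENDING 8

/* Hypothetical stand-in for one registered client nonce. */
struct demo_uuid {
	unsigned char uuid[UUID_SIZE];
	bool in_use;	/* slot holds a registered nonce */
	bool is_local;	/* set by the "server" once the nonce is matched */
};

/* Stand-in for memory that client and server share when on one host. */
static struct demo_uuid demo_pending[MAX_PENDING];

/* Client side: publish the nonce before sending it over the wire. */
static struct demo_uuid *demo_client_register(const unsigned char *uuid)
{
	assert(uuid != NULL);
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!demo_pending[i].in_use) {
			memcpy(demo_pending[i].uuid, uuid, UUID_SIZE);
			demo_pending[i].in_use = true;
			demo_pending[i].is_local = false;
			return &demo_pending[i];
		}
	}
	return NULL;	/* no free slot */
}

/* Server side: a nonce arrived over the wire; is it visible locally? */
static bool demo_server_uuid_is_local(const unsigned char *wire_uuid)
{
	assert(wire_uuid != NULL);
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (demo_pending[i].in_use &&
		    memcmp(demo_pending[i].uuid, wire_uuid, UUID_SIZE) == 0) {
			demo_pending[i].is_local = true;
			return true;
		}
	}
	return false;	/* not in shared memory: peer is remote */
}
```

A server on a different host would never find the nonce in its own memory, so the lookup fails and the client keeps using regular NFS over RPC.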
+23
fs/Kconfig
···
 	depends on NFSD || NFS_FS || LOCKD
 	default y
 
+config NFS_COMMON_LOCALIO_SUPPORT
+	tristate
+	default n
+	default y if NFSD=y || NFS_FS=y
+	default m if NFSD=m && NFS_FS=m
+	select SUNRPC
+
+config NFS_LOCALIO
+	bool "NFS client and server support for LOCALIO auxiliary protocol"
+	depends on NFSD && NFS_FS
+	select NFS_COMMON_LOCALIO_SUPPORT
+	default n
+	help
+	  Some NFS servers support an auxiliary NFS LOCALIO protocol
+	  that is not an official part of the NFS protocol.
+
+	  This option enables support for the LOCALIO protocol in the
+	  kernel's NFS server and client. Enable this to permit local
+	  NFS clients to bypass the network when issuing reads and
+	  writes to the local NFS server.
+
+	  If unsure, say N.
+
 config NFS_V4_2_SSC_HELPER
 	bool
 	default y if NFS_V4_2
+1
fs/nfs/Kconfig
···
 	depends on INET && FILE_LOCKING && MULTIUSER
 	select LOCKD
 	select SUNRPC
+	select NFS_COMMON
 	select NFS_ACL_SUPPORT if NFS_V3_ACL
 	help
 	  Choose Y here if you want to access files residing on other
+1
fs/nfs/Makefile
···
 nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
 nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
+nfs-$(CONFIG_NFS_LOCALIO) += localio.o
 
 obj-$(CONFIG_NFS_V2) += nfsv2.o
 nfsv2-y := nfs2super.o proc.o nfs2xdr.o
+16 -5
fs/nfs/client.c
···
 	clp->cl_max_connect = cl_init->max_connect ? cl_init->max_connect : 1;
 	clp->cl_net = get_net(cl_init->net);
 
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	seqlock_init(&clp->cl_boot_lock);
+	ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+	clp->cl_uuid.net = NULL;
+	clp->cl_uuid.dom = NULL;
+	spin_lock_init(&clp->cl_localio_lock);
+#endif /* CONFIG_NFS_LOCALIO */
+
 	clp->cl_principal = "*";
 	clp->cl_xprtsec = cl_init->xprtsec;
 	return clp;
···
  */
 void nfs_free_client(struct nfs_client *clp)
 {
+	nfs_local_disable(clp);
+
 	/* -EIO all pending I/O */
 	if (!IS_ERR(clp->cl_rpcclient))
 		rpc_shutdown_client(clp->cl_rpcclient);
···
 			list_add_tail(&new->cl_share_link,
 					&nn->nfs_client_list);
 			spin_unlock(&nn->nfs_client_lock);
-			return rpc_ops->init_client(new, cl_init);
+			new = rpc_ops->init_client(new, cl_init);
+			if (!IS_ERR(new))
+				nfs_local_probe(new);
+			return new;
 		}
 
 		spin_unlock(&nn->nfs_client_lock);
···
 	init_waitqueue_head(&server->write_congestion_wait);
 	atomic_long_set(&server->writeback, 0);
 
-	ida_init(&server->openowner_id);
-	ida_init(&server->lockowner_id);
+	atomic64_set(&server->owner_ctr, 0);
+
 	pnfs_init_server(server);
 	rpc_init_wait_queue(&server->uoc_rpcwaitq, "NFS UOC");
 
···
 	}
 	ida_free(&s_sysfs_ids, server->s_sysfs_id);
 
-	ida_destroy(&server->lockowner_id);
-	ida_destroy(&server->openowner_id);
 	put_cred(server->cred);
 	nfs_release_automount_timer();
 	call_rcu(&server->rcu, delayed_free);
+3 -3
fs/nfs/dir.c
···
 	unsigned char folio_full : 1,
 		      folio_is_eof : 1,
 		      cookies_are_ordered : 1;
-	struct nfs_cache_array_entry array[];
+	struct nfs_cache_array_entry array[] __counted_by(size);
 };
 
 struct nfs_readdir_descriptor {
···
 		goto out;
 	}
 
-	cache_entry = &array->array[array->size];
+	array->size++;
+	cache_entry = &array->array[array->size - 1];
 	cache_entry->cookie = array->last_cookie;
 	cache_entry->ino = entry->ino;
 	cache_entry->d_type = entry->d_type;
···
 	array->last_cookie = entry->cookie;
 	if (array->last_cookie <= cache_entry->cookie)
 		array->cookies_are_ordered = 0;
-	array->size++;
 	if (entry->eof != 0)
 		nfs_readdir_array_set_eof(array);
 out:
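The reordering in this hunk follows from the annotation: once a flexible array is declared as counted by `size`, a bounds-checking build treats any index at or beyond `size` as out of range, so the counter has to grow before the new slot is touched. A minimal userspace sketch of the same pattern, assuming a compiler that may or may not support the attribute (`DEMO_COUNTED_BY`, `demo_array`, `demo_alloc()` and `demo_append()` are hypothetical names, not kernel APIs):

```c
#include <assert.h>
#include <stdlib.h>

/* Expand to nothing where the compiler lacks the counted_by attribute. */
#if defined(__has_attribute)
# if __has_attribute(__counted_by__)
#  define DEMO_COUNTED_BY(m) __attribute__((__counted_by__(m)))
# endif
#endif
#ifndef DEMO_COUNTED_BY
# define DEMO_COUNTED_BY(m)
#endif

struct demo_array {
	unsigned int size;	/* number of valid entries */
	unsigned int capacity;	/* number of allocated entries */
	int entries[] DEMO_COUNTED_BY(size);
};

static struct demo_array *demo_alloc(unsigned int capacity)
{
	struct demo_array *a;

	a = calloc(1, sizeof(*a) + capacity * sizeof(a->entries[0]));
	if (a)
		a->capacity = capacity;
	return a;
}

static int demo_append(struct demo_array *a, int value)
{
	assert(a != NULL);
	if (a->size >= a->capacity)
		return -1;
	a->size++;				/* grow the counted bound first */
	a->entries[a->size - 1] = value;	/* index now lies within [0, size) */
	return 0;
}
```

Writing `a->entries[a->size]` before the increment would store to the same memory, but a sanitizer that honors the annotation would flag it as an out-of-bounds access.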
+3 -3
fs/nfs/filelayout/filelayout.c
···
 	/* Perform an asynchronous read to ds */
 	nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
 			  NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
-			  0, RPC_TASK_SOFTCONN);
+			  0, RPC_TASK_SOFTCONN, NULL);
 	return PNFS_ATTEMPTED;
 }
 
···
 	/* Perform an asynchronous write */
 	nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
 			  NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
-			  sync, RPC_TASK_SOFTCONN);
+			  sync, RPC_TASK_SOFTCONN, NULL);
 	return PNFS_ATTEMPTED;
 }
 
···
 		data->args.fh = fh;
 	return nfs_initiate_commit(ds_clnt, data, NFS_PROTO(data->inode),
 				   &filelayout_commit_call_ops, how,
-				   RPC_TASK_SOFTCONN);
+				   RPC_TASK_SOFTCONN, NULL);
 out_err:
 	pnfs_generic_prepare_to_resend_writes(data);
 	pnfs_generic_commit_release(data);
+46 -10
fs/nfs/flexfilelayout/flexfilelayout.c
···
 #include <linux/nfs_mount.h>
 #include <linux/nfs_page.h>
 #include <linux/module.h>
+#include <linux/file.h>
 #include <linux/sched/mm.h>
 
 #include <linux/sunrpc/metrics.h>
···
 	return 0;
 }
 
+static struct nfsd_file *
+ff_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+		 struct nfs_fh *fh, fmode_t mode)
+{
+	if (mode & FMODE_WRITE) {
+		/*
+		 * Always request read and write access since this corresponds
+		 * to a rw layout.
+		 */
+		mode |= FMODE_READ;
+	}
+
+	return nfs_local_open_fh(clp, cred, fh, mode);
+}
+
 static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
 		const struct nfs4_ff_layout_mirror *m2)
 {
···
 
 static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
 {
-	const struct cred *cred;
+	const struct cred *cred;
 
 	ff_layout_remove_mirror(mirror);
 	kfree(mirror->fh_versions);
···
 	struct pnfs_layout_segment *lseg = hdr->lseg;
 	struct nfs4_pnfs_ds *ds;
 	struct rpc_clnt *ds_clnt;
+	struct nfsd_file *localio;
 	struct nfs4_ff_layout_mirror *mirror;
 	const struct cred *ds_cred;
 	loff_t offset = hdr->args.offset;
···
 	hdr->args.offset = offset;
 	hdr->mds_offset = offset;
 
+	/* Start IO accounting for local read */
+	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh, FMODE_READ);
+	if (localio) {
+		hdr->task.tk_start = ktime_get();
+		ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
+	}
+
 	/* Perform an asynchronous read to ds */
 	nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
 			  vers == 3 ? &ff_layout_read_call_ops_v3 :
 				      &ff_layout_read_call_ops_v4,
-			  0, RPC_TASK_SOFTCONN);
+			  0, RPC_TASK_SOFTCONN, localio);
 	put_cred(ds_cred);
 	return PNFS_ATTEMPTED;
 
···
 	struct pnfs_layout_segment *lseg = hdr->lseg;
 	struct nfs4_pnfs_ds *ds;
 	struct rpc_clnt *ds_clnt;
+	struct nfsd_file *localio;
 	struct nfs4_ff_layout_mirror *mirror;
 	const struct cred *ds_cred;
 	loff_t offset = hdr->args.offset;
···
 	 */
 	hdr->args.offset = offset;
 
+	/* Start IO accounting for local write */
+	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
+				   FMODE_READ|FMODE_WRITE);
+	if (localio) {
+		hdr->task.tk_start = ktime_get();
+		ff_layout_write_record_layoutstats_start(&hdr->task, hdr);
+	}
+
 	/* Perform an asynchronous write */
 	nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
 			  vers == 3 ? &ff_layout_write_call_ops_v3 :
 				      &ff_layout_write_call_ops_v4,
-			  sync, RPC_TASK_SOFTCONN);
+			  sync, RPC_TASK_SOFTCONN, localio);
 	put_cred(ds_cred);
 	return PNFS_ATTEMPTED;
 
···
 	struct pnfs_layout_segment *lseg = data->lseg;
 	struct nfs4_pnfs_ds *ds;
 	struct rpc_clnt *ds_clnt;
+	struct nfsd_file *localio;
 	struct nfs4_ff_layout_mirror *mirror;
 	const struct cred *ds_cred;
 	u32 idx;
···
 	if (fh)
 		data->args.fh = fh;
 
+	/* Start IO accounting for local commit */
+	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
+				   FMODE_READ|FMODE_WRITE);
+	if (localio) {
+		data->task.tk_start = ktime_get();
+		ff_layout_commit_record_layoutstats_start(&data->task, data);
+	}
+
 	ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
 				  vers == 3 ? &ff_layout_commit_call_ops_v3 :
 					      &ff_layout_commit_call_ops_v4,
-				  how, RPC_TASK_SOFTCONN);
+				  how, RPC_TASK_SOFTCONN, localio);
 	put_cred(ds_cred);
 	return ret;
 out_err:
···
 	*start = cpu_to_be32(ff_args->num_errors);
 	/* This assume we always return _ALL_ layouts */
 	return ff_layout_encode_ds_ioerr(xdr, &ff_args->errors);
-}
-
-static void
-encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
-	WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
 }
 
 static void
+6
fs/nfs/flexfilelayout/flexfilelayoutdev.c
···
 
 	/* connect success, check rsize/wsize limit */
 	if (!status) {
+		/*
+		 * ds_clp is put in destroy_ds().
+		 * keep ds_clp even if DS is local, so that if local IO cannot
+		 * proceed somehow, we can fall back to NFS whenever we want.
+		 */
+		nfs_local_probe(ds->ds_clp);
 		max_payload =
 			nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
 				       NULL);
+8
fs/nfs/fs_context.c
···
 	Opt_bsize,
 	Opt_clientaddr,
 	Opt_cto,
+	Opt_alignwrite,
 	Opt_fg,
 	Opt_fscache,
 	Opt_fscache_flag,
···
 	fsparam_u32 ("bsize", Opt_bsize),
 	fsparam_string("clientaddr", Opt_clientaddr),
 	fsparam_flag_no("cto", Opt_cto),
+	fsparam_flag_no("alignwrite", Opt_alignwrite),
 	fsparam_flag ("fg", Opt_fg),
 	fsparam_flag_no("fsc", Opt_fscache_flag),
 	fsparam_string("fsc", Opt_fscache),
···
 			ctx->flags &= ~NFS_MOUNT_TRUNK_DISCOVERY;
 		else
 			ctx->flags |= NFS_MOUNT_TRUNK_DISCOVERY;
+		break;
+	case Opt_alignwrite:
+		if (result.negated)
+			ctx->flags |= NFS_MOUNT_NO_ALIGNWRITE;
+		else
+			ctx->flags &= ~NFS_MOUNT_NO_ALIGNWRITE;
 		break;
 	case Opt_ac:
 		if (result.negated)
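The option is registered under its positive name with `fsparam_flag_no()`, so the `no` prefix arrives as `result.negated` and it is the negated form that sets the `NO_ALIGNWRITE` bit. The inverted logic is easy to misread, so here is a userspace sketch of the same mapping. `demo_parse_alignwrite()` and the bit value of `DEMO_MOUNT_NO_ALIGNWRITE` are hypothetical and only imitate what the kernel's option parser does for this one option.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit; the real NFS_MOUNT_NO_ALIGNWRITE value is not shown here. */
#define DEMO_MOUNT_NO_ALIGNWRITE 0x1u

/*
 * Returns true if opt is "alignwrite" or "noalignwrite" and updates *flags:
 * the negated spelling sets the NO_ALIGNWRITE bit, the plain one clears it.
 */
static bool demo_parse_alignwrite(const char *opt, unsigned int *flags)
{
	bool negated = false;

	assert(opt != NULL && flags != NULL);
	if (strncmp(opt, "no", 2) == 0) {
		negated = true;
		opt += 2;
	}
	if (strcmp(opt, "alignwrite") != 0)
		return false;	/* some other option */

	if (negated)
		*flags |= DEMO_MOUNT_NO_ALIGNWRITE;
	else
		*flags &= ~DEMO_MOUNT_NO_ALIGNWRITE;
	return true;
}
```

With this mapping the bit stays clear unless `noalignwrite` is passed, and a later plain `alignwrite` clears it again.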
+1 -1
fs/nfs/getroot.c
···
 }
 
 /*
- * get an NFS2/NFS3 root dentry from the root filehandle
+ * get a root dentry from the root filehandle
  */
 int nfs_get_root(struct super_block *s, struct fs_context *fc)
 {
+38 -19
fs/nfs/inode.c
@@ -2461,35 +2461,54 @@
 	kmem_cache_destroy(nfs_inode_cachep);
 }
 
+struct workqueue_struct *nfslocaliod_workqueue;
 struct workqueue_struct *nfsiod_workqueue;
 EXPORT_SYMBOL_GPL(nfsiod_workqueue);
 
 /*
- * start up the nfsiod workqueue
- */
-static int nfsiod_start(void)
-{
-	struct workqueue_struct *wq;
-	dprintk("RPC: creating workqueue nfsiod\n");
-	wq = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
-	if (wq == NULL)
-		return -ENOMEM;
-	nfsiod_workqueue = wq;
-	return 0;
-}
-
-/*
- * Destroy the nfsiod workqueue
+ * Destroy the nfsiod workqueues
  */
 static void nfsiod_stop(void)
 {
 	struct workqueue_struct *wq;
 
 	wq = nfsiod_workqueue;
-	if (wq == NULL)
-		return;
-	nfsiod_workqueue = NULL;
-	destroy_workqueue(wq);
+	if (wq != NULL) {
+		nfsiod_workqueue = NULL;
+		destroy_workqueue(wq);
+	}
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	wq = nfslocaliod_workqueue;
+	if (wq != NULL) {
+		nfslocaliod_workqueue = NULL;
+		destroy_workqueue(wq);
+	}
+#endif /* CONFIG_NFS_LOCALIO */
+}
+
+/*
+ * Start the nfsiod workqueues
+ */
+static int nfsiod_start(void)
+{
+	dprintk("RPC: creating workqueue nfsiod\n");
+	nfsiod_workqueue = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+	if (nfsiod_workqueue == NULL)
+		return -ENOMEM;
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	/*
+	 * localio writes need to use a normal (non-memreclaim) workqueue.
+	 * When we start getting low on space, XFS goes and calls flush_work() on
+	 * a non-memreclaim work queue, which causes a priority inversion problem.
+	 */
+	dprintk("RPC: creating workqueue nfslocaliod\n");
+	nfslocaliod_workqueue = alloc_workqueue("nfslocaliod", WQ_UNBOUND, 0);
+	if (unlikely(nfslocaliod_workqueue == NULL)) {
+		nfsiod_stop();
+		return -ENOMEM;
+	}
+#endif /* CONFIG_NFS_LOCALIO */
+	return 0;
 }
 
 unsigned int nfs_net_id;
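The reordering of nfsiod_stop() ahead of nfsiod_start() in the hunk above lets the start routine reuse the stop routine to unwind a partial setup. The userspace sketch below illustrates that pattern only. `struct fake_wq`, `fake_alloc()`, `fake_destroy()`, `queues_start()`, `queues_stop()` and the `fail_second` switch are hypothetical stand-ins invented for this example, not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct workqueue_struct. */
struct fake_wq {
	const char *name;
};

static struct fake_wq *main_wq;
static struct fake_wq *local_wq;
static int live_queues;		/* number of queues currently allocated */

/* Allocate a queue, or simulate an allocation failure when fail != 0. */
static struct fake_wq *fake_alloc(const char *name, int fail)
{
	struct fake_wq *wq = fail ? NULL : malloc(sizeof(*wq));

	if (wq) {
		wq->name = name;
		live_queues++;
	}
	return wq;
}

static void fake_destroy(struct fake_wq *wq)
{
	assert(live_queues > 0);
	free(wq);
	live_queues--;
}

/* Safe to call with either or both queues unset. */
static void queues_stop(void)
{
	if (main_wq) {
		fake_destroy(main_wq);
		main_wq = NULL;
	}
	if (local_wq) {
		fake_destroy(local_wq);
		local_wq = NULL;
	}
}

/* Returns 0 on success; on failure nothing stays allocated. */
static int queues_start(int fail_second)
{
	main_wq = fake_alloc("main", 0);
	if (!main_wq)
		return -ENOMEM;
	local_wq = fake_alloc("local", fail_second);
	if (!local_wq) {
		queues_stop();	/* unwind the first allocation */
		return -ENOMEM;
	}
	return 0;
}
```

Because the stop routine checks each pointer independently, it serves both as the normal teardown and as the error path, which appears to be the motivation for the rewritten nfsiod_stop().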
+51 -3
fs/nfs/internal.h
@@ -9,6 +9,7 @@
 #include <linux/crc32.h>
 #include <linux/sunrpc/addr.h>
 #include <linux/nfs_page.h>
+#include <linux/nfslocalio.h>
 #include <linux/wait_bit.h>
 
 #define NFS_SB_MASK (SB_RDONLY|SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS)
@@ -308,7 +309,8 @@
 int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
 int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
 		const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
-		const struct rpc_call_ops *call_ops, int how, int flags);
+		const struct rpc_call_ops *call_ops, int how, int flags,
+		struct nfsd_file *localio);
 void nfs_free_request(struct nfs_page *req);
 struct nfs_pgio_mirror *
 nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc);
@@ -438,6 +440,7 @@
 
 /* inode.c */
 extern struct workqueue_struct *nfsiod_workqueue;
+extern struct workqueue_struct *nfslocaliod_workqueue;
 extern struct inode *nfs_alloc_inode(struct super_block *sb);
 extern void nfs_free_inode(struct inode *);
 extern int nfs_write_inode(struct inode *, struct writeback_control *);
@@ -448,6 +451,51 @@
 extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
 extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
 extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
+
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+/* localio.c */
+extern void nfs_local_disable(struct nfs_client *);
+extern void nfs_local_probe(struct nfs_client *);
+extern struct nfsd_file *nfs_local_open_fh(struct nfs_client *,
+					   const struct cred *,
+					   struct nfs_fh *,
+					   const fmode_t);
+extern int nfs_local_doio(struct nfs_client *,
+			  struct nfsd_file *,
+			  struct nfs_pgio_header *,
+			  const struct rpc_call_ops *);
+extern int nfs_local_commit(struct nfsd_file *,
+			    struct nfs_commit_data *,
+			    const struct rpc_call_ops *, int);
+extern bool nfs_server_is_local(const struct nfs_client *clp);
+
+#else /* CONFIG_NFS_LOCALIO */
+static inline void nfs_local_disable(struct nfs_client *clp) {}
+static inline void nfs_local_probe(struct nfs_client *clp) {}
+static inline struct nfsd_file *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+		  struct nfs_fh *fh, const fmode_t mode)
+{
+	return NULL;
+}
+static inline int nfs_local_doio(struct nfs_client *clp,
+				 struct nfsd_file *localio,
+				 struct nfs_pgio_header *hdr,
+				 const struct rpc_call_ops *call_ops)
+{
+	return -EINVAL;
+}
+static inline int nfs_local_commit(struct nfsd_file *localio,
+				   struct nfs_commit_data *data,
+				   const struct rpc_call_ops *call_ops, int how)
+{
+	return -EINVAL;
+}
+static inline bool nfs_server_is_local(const struct nfs_client *clp)
+{
+	return false;
+}
+#endif /* CONFIG_NFS_LOCALIO */
 
 /* super.c */
 extern const struct super_operations nfs_sops;
@@ -505,7 +553,6 @@
 				struct nfs_open_context *ctx,
 				struct folio *folio);
 extern void nfs_pageio_complete_read(struct nfs_pageio_descriptor *pgio);
-extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
 extern void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio);
 
 /* super.c */
@@ -528,7 +575,8 @@
 			struct nfs_commit_data *data,
 			const struct nfs_rpc_ops *nfs_ops,
 			const struct rpc_call_ops *call_ops,
-			int how, int flags);
+			int how, int flags,
+			struct nfsd_file *localio);
 extern void nfs_init_commit(struct nfs_commit_data *data,
 			    struct list_head *head,
 			    struct pnfs_layout_segment *lseg,
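The CONFIG_NFS_LOCALIO block in the header above follows a common pattern: when a feature is compiled out, static inline stubs keep every caller building without its own #ifdef. The sketch below shows the idea in isolation. `DEMO_LOCALIO`, `struct demo_client`, `demo_server_is_local()`, `demo_local_doio()` and `demo_submit()` are hypothetical names chosen for this example and do not exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_client. */
struct demo_client {
	bool local;
};

#ifdef DEMO_LOCALIO
/* Feature built in: real definitions would live in a separate .c file. */
bool demo_server_is_local(const struct demo_client *clp);
int demo_local_doio(struct demo_client *clp);
#else
/* Feature compiled out: the stubs always report "not local". */
static inline bool demo_server_is_local(const struct demo_client *clp)
{
	(void)clp;
	return false;
}

static inline int demo_local_doio(struct demo_client *clp)
{
	(void)clp;
	return -EINVAL;
}
#endif

/* The caller needs no #ifdef. Returns 1 for local I/O, 0 for the RPC path. */
static int demo_submit(struct demo_client *clp)
{
	assert(clp != NULL);
	if (demo_server_is_local(clp) && demo_local_doio(clp) == 0)
		return 1;
	return 0;	/* fall back to the normal RPC path */
}
```

With `DEMO_LOCALIO` undefined, the compiler can fold the stub results into constants, so the fallback branch costs nothing at run time.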
+757
fs/nfs/localio.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * NFS client support for local clients to bypass network stack 4 + * 5 + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com> 6 + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com> 7 + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com> 8 + * Copyright (C) 2024 NeilBrown <neilb@suse.de> 9 + */ 10 + 11 + #include <linux/module.h> 12 + #include <linux/errno.h> 13 + #include <linux/vfs.h> 14 + #include <linux/file.h> 15 + #include <linux/inet.h> 16 + #include <linux/sunrpc/addr.h> 17 + #include <linux/inetdevice.h> 18 + #include <net/addrconf.h> 19 + #include <linux/nfs_common.h> 20 + #include <linux/nfslocalio.h> 21 + #include <linux/module.h> 22 + #include <linux/bvec.h> 23 + 24 + #include <linux/nfs.h> 25 + #include <linux/nfs_fs.h> 26 + #include <linux/nfs_xdr.h> 27 + 28 + #include "internal.h" 29 + #include "pnfs.h" 30 + #include "nfstrace.h" 31 + 32 + #define NFSDBG_FACILITY NFSDBG_VFS 33 + 34 + struct nfs_local_kiocb { 35 + struct kiocb kiocb; 36 + struct bio_vec *bvec; 37 + struct nfs_pgio_header *hdr; 38 + struct work_struct work; 39 + struct nfsd_file *localio; 40 + }; 41 + 42 + struct nfs_local_fsync_ctx { 43 + struct nfsd_file *localio; 44 + struct nfs_commit_data *data; 45 + struct work_struct work; 46 + struct kref kref; 47 + struct completion *done; 48 + }; 49 + static void nfs_local_fsync_work(struct work_struct *work); 50 + 51 + static bool localio_enabled __read_mostly = true; 52 + module_param(localio_enabled, bool, 0644); 53 + 54 + static inline bool nfs_client_is_local(const struct nfs_client *clp) 55 + { 56 + return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags); 57 + } 58 + 59 + bool nfs_server_is_local(const struct nfs_client *clp) 60 + { 61 + return nfs_client_is_local(clp) && localio_enabled; 62 + } 63 + EXPORT_SYMBOL_GPL(nfs_server_is_local); 64 + 65 + /* 66 + * UUID_IS_LOCAL XDR functions 67 + */ 68 + 69 + static void 
localio_xdr_enc_uuidargs(struct rpc_rqst *req, 70 + struct xdr_stream *xdr, 71 + const void *data) 72 + { 73 + const u8 *uuid = data; 74 + 75 + encode_opaque_fixed(xdr, uuid, UUID_SIZE); 76 + } 77 + 78 + static int localio_xdr_dec_uuidres(struct rpc_rqst *req, 79 + struct xdr_stream *xdr, 80 + void *result) 81 + { 82 + /* void return */ 83 + return 0; 84 + } 85 + 86 + static const struct rpc_procinfo nfs_localio_procedures[] = { 87 + [LOCALIOPROC_UUID_IS_LOCAL] = { 88 + .p_proc = LOCALIOPROC_UUID_IS_LOCAL, 89 + .p_encode = localio_xdr_enc_uuidargs, 90 + .p_decode = localio_xdr_dec_uuidres, 91 + .p_arglen = XDR_QUADLEN(UUID_SIZE), 92 + .p_replen = 0, 93 + .p_statidx = LOCALIOPROC_UUID_IS_LOCAL, 94 + .p_name = "UUID_IS_LOCAL", 95 + }, 96 + }; 97 + 98 + static unsigned int nfs_localio_counts[ARRAY_SIZE(nfs_localio_procedures)]; 99 + static const struct rpc_version nfslocalio_version1 = { 100 + .number = 1, 101 + .nrprocs = ARRAY_SIZE(nfs_localio_procedures), 102 + .procs = nfs_localio_procedures, 103 + .counts = nfs_localio_counts, 104 + }; 105 + 106 + static const struct rpc_version *nfslocalio_version[] = { 107 + [1] = &nfslocalio_version1, 108 + }; 109 + 110 + extern const struct rpc_program nfslocalio_program; 111 + static struct rpc_stat nfslocalio_rpcstat = { &nfslocalio_program }; 112 + 113 + const struct rpc_program nfslocalio_program = { 114 + .name = "nfslocalio", 115 + .number = NFS_LOCALIO_PROGRAM, 116 + .nrvers = ARRAY_SIZE(nfslocalio_version), 117 + .version = nfslocalio_version, 118 + .stats = &nfslocalio_rpcstat, 119 + }; 120 + 121 + /* 122 + * nfs_local_enable - enable local i/o for an nfs_client 123 + */ 124 + static void nfs_local_enable(struct nfs_client *clp) 125 + { 126 + spin_lock(&clp->cl_localio_lock); 127 + set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags); 128 + trace_nfs_local_enable(clp); 129 + spin_unlock(&clp->cl_localio_lock); 130 + } 131 + 132 + /* 133 + * nfs_local_disable - disable local i/o for an nfs_client 134 + */ 135 + void 
nfs_local_disable(struct nfs_client *clp) 136 + { 137 + spin_lock(&clp->cl_localio_lock); 138 + if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) { 139 + trace_nfs_local_disable(clp); 140 + nfs_uuid_invalidate_one_client(&clp->cl_uuid); 141 + } 142 + spin_unlock(&clp->cl_localio_lock); 143 + } 144 + 145 + /* 146 + * nfs_init_localioclient - Initialise an NFS localio client connection 147 + */ 148 + static struct rpc_clnt *nfs_init_localioclient(struct nfs_client *clp) 149 + { 150 + struct rpc_clnt *rpcclient_localio; 151 + 152 + rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient, 153 + &nfslocalio_program, 1); 154 + 155 + dprintk_rcu("%s: server (%s) %s NFS LOCALIO.\n", 156 + __func__, rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR), 157 + (IS_ERR(rpcclient_localio) ? "does not support" : "supports")); 158 + 159 + return rpcclient_localio; 160 + } 161 + 162 + static bool nfs_server_uuid_is_local(struct nfs_client *clp) 163 + { 164 + u8 uuid[UUID_SIZE]; 165 + struct rpc_message msg = { 166 + .rpc_argp = &uuid, 167 + }; 168 + struct rpc_clnt *rpcclient_localio; 169 + int status; 170 + 171 + rpcclient_localio = nfs_init_localioclient(clp); 172 + if (IS_ERR(rpcclient_localio)) 173 + return false; 174 + 175 + export_uuid(uuid, &clp->cl_uuid.uuid); 176 + 177 + msg.rpc_proc = &nfs_localio_procedures[LOCALIOPROC_UUID_IS_LOCAL]; 178 + status = rpc_call_sync(rpcclient_localio, &msg, 0); 179 + dprintk("%s: NFS reply UUID_IS_LOCAL: status=%d\n", 180 + __func__, status); 181 + rpc_shutdown_client(rpcclient_localio); 182 + 183 + /* Server is only local if it initialized required struct members */ 184 + if (status || !clp->cl_uuid.net || !clp->cl_uuid.dom) 185 + return false; 186 + 187 + return true; 188 + } 189 + 190 + /* 191 + * nfs_local_probe - probe local i/o support for an nfs_server and nfs_client 192 + * - called after alloc_client and init_client (so cl_rpcclient exists) 193 + * - this function is idempotent, it can be called for old or new clients 
194 + */ 195 + void nfs_local_probe(struct nfs_client *clp) 196 + { 197 + /* Disallow localio if disabled via sysfs or AUTH_SYS isn't used */ 198 + if (!localio_enabled || 199 + clp->cl_rpcclient->cl_auth->au_flavor != RPC_AUTH_UNIX) { 200 + nfs_local_disable(clp); 201 + return; 202 + } 203 + 204 + if (nfs_client_is_local(clp)) { 205 + /* If already enabled, disable and re-enable */ 206 + nfs_local_disable(clp); 207 + } 208 + 209 + nfs_uuid_begin(&clp->cl_uuid); 210 + if (nfs_server_uuid_is_local(clp)) 211 + nfs_local_enable(clp); 212 + nfs_uuid_end(&clp->cl_uuid); 213 + } 214 + EXPORT_SYMBOL_GPL(nfs_local_probe); 215 + 216 + /* 217 + * nfs_local_open_fh - open a local filehandle in terms of nfsd_file 218 + * 219 + * Returns a pointer to a struct nfsd_file or NULL 220 + */ 221 + struct nfsd_file * 222 + nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred, 223 + struct nfs_fh *fh, const fmode_t mode) 224 + { 225 + struct nfsd_file *localio; 226 + int status; 227 + 228 + if (!nfs_server_is_local(clp)) 229 + return NULL; 230 + if (mode & ~(FMODE_READ | FMODE_WRITE)) 231 + return NULL; 232 + 233 + localio = nfs_open_local_fh(&clp->cl_uuid, clp->cl_rpcclient, 234 + cred, fh, mode); 235 + if (IS_ERR(localio)) { 236 + status = PTR_ERR(localio); 237 + trace_nfs_local_open_fh(fh, mode, status); 238 + switch (status) { 239 + case -ENOMEM: 240 + case -ENXIO: 241 + case -ENOENT: 242 + /* Revalidate localio, will disable if unsupported */ 243 + nfs_local_probe(clp); 244 + } 245 + return NULL; 246 + } 247 + return localio; 248 + } 249 + EXPORT_SYMBOL_GPL(nfs_local_open_fh); 250 + 251 + static struct bio_vec * 252 + nfs_bvec_alloc_and_import_pagevec(struct page **pagevec, 253 + unsigned int npages, gfp_t flags) 254 + { 255 + struct bio_vec *bvec, *p; 256 + 257 + bvec = kmalloc_array(npages, sizeof(*bvec), flags); 258 + if (bvec != NULL) { 259 + for (p = bvec; npages > 0; p++, pagevec++, npages--) { 260 + p->bv_page = *pagevec; 261 + p->bv_len = PAGE_SIZE; 262 + 
p->bv_offset = 0; 263 + } 264 + } 265 + return bvec; 266 + } 267 + 268 + static void 269 + nfs_local_iocb_free(struct nfs_local_kiocb *iocb) 270 + { 271 + kfree(iocb->bvec); 272 + kfree(iocb); 273 + } 274 + 275 + static struct nfs_local_kiocb * 276 + nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, 277 + struct nfsd_file *localio, gfp_t flags) 278 + { 279 + struct nfs_local_kiocb *iocb; 280 + 281 + iocb = kmalloc(sizeof(*iocb), flags); 282 + if (iocb == NULL) 283 + return NULL; 284 + iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec, 285 + hdr->page_array.npages, flags); 286 + if (iocb->bvec == NULL) { 287 + kfree(iocb); 288 + return NULL; 289 + } 290 + init_sync_kiocb(&iocb->kiocb, nfs_to->nfsd_file_file(localio)); 291 + iocb->kiocb.ki_pos = hdr->args.offset; 292 + iocb->localio = localio; 293 + iocb->hdr = hdr; 294 + iocb->kiocb.ki_flags &= ~IOCB_APPEND; 295 + return iocb; 296 + } 297 + 298 + static void 299 + nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir) 300 + { 301 + struct nfs_pgio_header *hdr = iocb->hdr; 302 + 303 + iov_iter_bvec(i, dir, iocb->bvec, hdr->page_array.npages, 304 + hdr->args.count + hdr->args.pgbase); 305 + if (hdr->args.pgbase != 0) 306 + iov_iter_advance(i, hdr->args.pgbase); 307 + } 308 + 309 + static void 310 + nfs_local_hdr_release(struct nfs_pgio_header *hdr, 311 + const struct rpc_call_ops *call_ops) 312 + { 313 + call_ops->rpc_call_done(&hdr->task, hdr); 314 + call_ops->rpc_release(hdr); 315 + } 316 + 317 + static void 318 + nfs_local_pgio_init(struct nfs_pgio_header *hdr, 319 + const struct rpc_call_ops *call_ops) 320 + { 321 + hdr->task.tk_ops = call_ops; 322 + if (!hdr->task.tk_start) 323 + hdr->task.tk_start = ktime_get(); 324 + } 325 + 326 + static void 327 + nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status) 328 + { 329 + if (status >= 0) { 330 + hdr->res.count = status; 331 + hdr->res.op_status = NFS4_OK; 332 + hdr->task.tk_status = 0; 333 + } else { 334 + 
hdr->res.op_status = nfs4_stat_to_errno(status); 335 + hdr->task.tk_status = status; 336 + } 337 + } 338 + 339 + static void 340 + nfs_local_pgio_release(struct nfs_local_kiocb *iocb) 341 + { 342 + struct nfs_pgio_header *hdr = iocb->hdr; 343 + 344 + nfs_to->nfsd_file_put_local(iocb->localio); 345 + nfs_local_iocb_free(iocb); 346 + nfs_local_hdr_release(hdr, hdr->task.tk_ops); 347 + } 348 + 349 + static void 350 + nfs_local_read_done(struct nfs_local_kiocb *iocb, long status) 351 + { 352 + struct nfs_pgio_header *hdr = iocb->hdr; 353 + struct file *filp = iocb->kiocb.ki_filp; 354 + 355 + nfs_local_pgio_done(hdr, status); 356 + 357 + if (hdr->res.count != hdr->args.count || 358 + hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp))) 359 + hdr->res.eof = true; 360 + 361 + dprintk("%s: read %ld bytes eof %d.\n", __func__, 362 + status > 0 ? status : 0, hdr->res.eof); 363 + } 364 + 365 + static void nfs_local_call_read(struct work_struct *work) 366 + { 367 + struct nfs_local_kiocb *iocb = 368 + container_of(work, struct nfs_local_kiocb, work); 369 + struct file *filp = iocb->kiocb.ki_filp; 370 + const struct cred *save_cred; 371 + struct iov_iter iter; 372 + ssize_t status; 373 + 374 + save_cred = override_creds(filp->f_cred); 375 + 376 + nfs_local_iter_init(&iter, iocb, READ); 377 + 378 + status = filp->f_op->read_iter(&iocb->kiocb, &iter); 379 + WARN_ON_ONCE(status == -EIOCBQUEUED); 380 + 381 + nfs_local_read_done(iocb, status); 382 + nfs_local_pgio_release(iocb); 383 + 384 + revert_creds(save_cred); 385 + } 386 + 387 + static int 388 + nfs_do_local_read(struct nfs_pgio_header *hdr, 389 + struct nfsd_file *localio, 390 + const struct rpc_call_ops *call_ops) 391 + { 392 + struct nfs_local_kiocb *iocb; 393 + 394 + dprintk("%s: vfs_read count=%u pos=%llu\n", 395 + __func__, hdr->args.count, hdr->args.offset); 396 + 397 + iocb = nfs_local_iocb_alloc(hdr, localio, GFP_KERNEL); 398 + if (iocb == NULL) 399 + return -ENOMEM; 400 + 401 + nfs_local_pgio_init(hdr, 
call_ops); 402 + hdr->res.eof = false; 403 + 404 + INIT_WORK(&iocb->work, nfs_local_call_read); 405 + queue_work(nfslocaliod_workqueue, &iocb->work); 406 + 407 + return 0; 408 + } 409 + 410 + static void 411 + nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode) 412 + { 413 + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client; 414 + u32 *verf = (u32 *)verifier->data; 415 + int seq = 0; 416 + 417 + do { 418 + read_seqbegin_or_lock(&clp->cl_boot_lock, &seq); 419 + verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec; 420 + verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec; 421 + } while (need_seqretry(&clp->cl_boot_lock, seq)); 422 + done_seqretry(&clp->cl_boot_lock, seq); 423 + } 424 + 425 + static void 426 + nfs_reset_boot_verifier(struct inode *inode) 427 + { 428 + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client; 429 + 430 + write_seqlock(&clp->cl_boot_lock); 431 + ktime_get_real_ts64(&clp->cl_nfssvc_boot); 432 + write_sequnlock(&clp->cl_boot_lock); 433 + } 434 + 435 + static void 436 + nfs_set_local_verifier(struct inode *inode, 437 + struct nfs_writeverf *verf, 438 + enum nfs3_stable_how how) 439 + { 440 + nfs_copy_boot_verifier(&verf->verifier, inode); 441 + verf->committed = how; 442 + } 443 + 444 + /* Factored out from fs/nfsd/vfs.h:fh_getattr() */ 445 + static int __vfs_getattr(struct path *p, struct kstat *stat, int version) 446 + { 447 + u32 request_mask = STATX_BASIC_STATS; 448 + 449 + if (version == 4) 450 + request_mask |= (STATX_BTIME | STATX_CHANGE_COOKIE); 451 + return vfs_getattr(p, stat, request_mask, AT_STATX_SYNC_AS_STAT); 452 + } 453 + 454 + /* Copied from fs/nfsd/nfsfh.c:nfsd4_change_attribute() */ 455 + static u64 __nfsd4_change_attribute(const struct kstat *stat, 456 + const struct inode *inode) 457 + { 458 + u64 chattr; 459 + 460 + if (stat->result_mask & STATX_CHANGE_COOKIE) { 461 + chattr = stat->change_cookie; 462 + if (S_ISREG(inode->i_mode) && 463 + !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) { 464 + chattr 
+= (u64)stat->ctime.tv_sec << 30; 465 + chattr += stat->ctime.tv_nsec; 466 + } 467 + } else { 468 + chattr = time_to_chattr(&stat->ctime); 469 + } 470 + return chattr; 471 + } 472 + 473 + static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb) 474 + { 475 + struct kstat stat; 476 + struct file *filp = iocb->kiocb.ki_filp; 477 + struct nfs_pgio_header *hdr = iocb->hdr; 478 + struct nfs_fattr *fattr = hdr->res.fattr; 479 + int version = NFS_PROTO(hdr->inode)->version; 480 + 481 + if (unlikely(!fattr) || __vfs_getattr(&filp->f_path, &stat, version)) 482 + return; 483 + 484 + fattr->valid = (NFS_ATTR_FATTR_FILEID | 485 + NFS_ATTR_FATTR_CHANGE | 486 + NFS_ATTR_FATTR_SIZE | 487 + NFS_ATTR_FATTR_ATIME | 488 + NFS_ATTR_FATTR_MTIME | 489 + NFS_ATTR_FATTR_CTIME | 490 + NFS_ATTR_FATTR_SPACE_USED); 491 + 492 + fattr->fileid = stat.ino; 493 + fattr->size = stat.size; 494 + fattr->atime = stat.atime; 495 + fattr->mtime = stat.mtime; 496 + fattr->ctime = stat.ctime; 497 + if (version == 4) { 498 + fattr->change_attr = 499 + __nfsd4_change_attribute(&stat, file_inode(filp)); 500 + } else 501 + fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime); 502 + fattr->du.nfs3.used = stat.blocks << 9; 503 + } 504 + 505 + static void 506 + nfs_local_write_done(struct nfs_local_kiocb *iocb, long status) 507 + { 508 + struct nfs_pgio_header *hdr = iocb->hdr; 509 + struct inode *inode = hdr->inode; 510 + 511 + dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? 
status : 0); 512 + 513 + /* Handle short writes as if they are ENOSPC */ 514 + if (status > 0 && status < hdr->args.count) { 515 + hdr->mds_offset += status; 516 + hdr->args.offset += status; 517 + hdr->args.pgbase += status; 518 + hdr->args.count -= status; 519 + nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset); 520 + status = -ENOSPC; 521 + } 522 + if (status < 0) 523 + nfs_reset_boot_verifier(inode); 524 + else if (nfs_should_remove_suid(inode)) { 525 + /* Deal with the suid/sgid bit corner case */ 526 + spin_lock(&inode->i_lock); 527 + nfs_set_cache_invalid(inode, NFS_INO_INVALID_MODE); 528 + spin_unlock(&inode->i_lock); 529 + } 530 + nfs_local_pgio_done(hdr, status); 531 + } 532 + 533 + static void nfs_local_call_write(struct work_struct *work) 534 + { 535 + struct nfs_local_kiocb *iocb = 536 + container_of(work, struct nfs_local_kiocb, work); 537 + struct file *filp = iocb->kiocb.ki_filp; 538 + unsigned long old_flags = current->flags; 539 + const struct cred *save_cred; 540 + struct iov_iter iter; 541 + ssize_t status; 542 + 543 + current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO; 544 + save_cred = override_creds(filp->f_cred); 545 + 546 + nfs_local_iter_init(&iter, iocb, WRITE); 547 + 548 + file_start_write(filp); 549 + status = filp->f_op->write_iter(&iocb->kiocb, &iter); 550 + file_end_write(filp); 551 + WARN_ON_ONCE(status == -EIOCBQUEUED); 552 + 553 + nfs_local_write_done(iocb, status); 554 + nfs_local_vfs_getattr(iocb); 555 + nfs_local_pgio_release(iocb); 556 + 557 + revert_creds(save_cred); 558 + current->flags = old_flags; 559 + } 560 + 561 + static int 562 + nfs_do_local_write(struct nfs_pgio_header *hdr, 563 + struct nfsd_file *localio, 564 + const struct rpc_call_ops *call_ops) 565 + { 566 + struct nfs_local_kiocb *iocb; 567 + 568 + dprintk("%s: vfs_write count=%u pos=%llu %s\n", 569 + __func__, hdr->args.count, hdr->args.offset, 570 + (hdr->args.stable == NFS_UNSTABLE) ? 
"unstable" : "stable"); 571 + 572 + iocb = nfs_local_iocb_alloc(hdr, localio, GFP_NOIO); 573 + if (iocb == NULL) 574 + return -ENOMEM; 575 + 576 + switch (hdr->args.stable) { 577 + default: 578 + break; 579 + case NFS_DATA_SYNC: 580 + iocb->kiocb.ki_flags |= IOCB_DSYNC; 581 + break; 582 + case NFS_FILE_SYNC: 583 + iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC; 584 + } 585 + nfs_local_pgio_init(hdr, call_ops); 586 + 587 + nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable); 588 + 589 + INIT_WORK(&iocb->work, nfs_local_call_write); 590 + queue_work(nfslocaliod_workqueue, &iocb->work); 591 + 592 + return 0; 593 + } 594 + 595 + int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio, 596 + struct nfs_pgio_header *hdr, 597 + const struct rpc_call_ops *call_ops) 598 + { 599 + int status = 0; 600 + struct file *filp = nfs_to->nfsd_file_file(localio); 601 + 602 + if (!hdr->args.count) 603 + return 0; 604 + /* Don't support filesystems without read_iter/write_iter */ 605 + if (!filp->f_op->read_iter || !filp->f_op->write_iter) { 606 + nfs_local_disable(clp); 607 + status = -EAGAIN; 608 + goto out; 609 + } 610 + 611 + switch (hdr->rw_mode) { 612 + case FMODE_READ: 613 + status = nfs_do_local_read(hdr, localio, call_ops); 614 + break; 615 + case FMODE_WRITE: 616 + status = nfs_do_local_write(hdr, localio, call_ops); 617 + break; 618 + default: 619 + dprintk("%s: invalid mode: %d\n", __func__, 620 + hdr->rw_mode); 621 + status = -EINVAL; 622 + } 623 + out: 624 + if (status != 0) { 625 + nfs_to->nfsd_file_put_local(localio); 626 + hdr->task.tk_status = status; 627 + nfs_local_hdr_release(hdr, call_ops); 628 + } 629 + return status; 630 + } 631 + 632 + static void 633 + nfs_local_init_commit(struct nfs_commit_data *data, 634 + const struct rpc_call_ops *call_ops) 635 + { 636 + data->task.tk_ops = call_ops; 637 + } 638 + 639 + static int 640 + nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data) 641 + { 642 + loff_t start = 
data->args.offset; 643 + loff_t end = LLONG_MAX; 644 + 645 + if (data->args.count > 0) { 646 + end = start + data->args.count - 1; 647 + if (end < start) 648 + end = LLONG_MAX; 649 + } 650 + 651 + dprintk("%s: commit %llu - %llu\n", __func__, start, end); 652 + return vfs_fsync_range(filp, start, end, 0); 653 + } 654 + 655 + static void 656 + nfs_local_commit_done(struct nfs_commit_data *data, int status) 657 + { 658 + if (status >= 0) { 659 + nfs_set_local_verifier(data->inode, 660 + data->res.verf, 661 + NFS_FILE_SYNC); 662 + data->res.op_status = NFS4_OK; 663 + data->task.tk_status = 0; 664 + } else { 665 + nfs_reset_boot_verifier(data->inode); 666 + data->res.op_status = nfs4_stat_to_errno(status); 667 + data->task.tk_status = status; 668 + } 669 + } 670 + 671 + static void 672 + nfs_local_release_commit_data(struct nfsd_file *localio, 673 + struct nfs_commit_data *data, 674 + const struct rpc_call_ops *call_ops) 675 + { 676 + nfs_to->nfsd_file_put_local(localio); 677 + call_ops->rpc_call_done(&data->task, data); 678 + call_ops->rpc_release(data); 679 + } 680 + 681 + static struct nfs_local_fsync_ctx * 682 + nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data, 683 + struct nfsd_file *localio, gfp_t flags) 684 + { 685 + struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags); 686 + 687 + if (ctx != NULL) { 688 + ctx->localio = localio; 689 + ctx->data = data; 690 + INIT_WORK(&ctx->work, nfs_local_fsync_work); 691 + kref_init(&ctx->kref); 692 + ctx->done = NULL; 693 + } 694 + return ctx; 695 + } 696 + 697 + static void 698 + nfs_local_fsync_ctx_kref_free(struct kref *kref) 699 + { 700 + kfree(container_of(kref, struct nfs_local_fsync_ctx, kref)); 701 + } 702 + 703 + static void 704 + nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx) 705 + { 706 + kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free); 707 + } 708 + 709 + static void 710 + nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx) 711 + { 712 + 
nfs_local_release_commit_data(ctx->localio, ctx->data, 713 + ctx->data->task.tk_ops); 714 + nfs_local_fsync_ctx_put(ctx); 715 + } 716 + 717 + static void 718 + nfs_local_fsync_work(struct work_struct *work) 719 + { 720 + struct nfs_local_fsync_ctx *ctx; 721 + int status; 722 + 723 + ctx = container_of(work, struct nfs_local_fsync_ctx, work); 724 + 725 + status = nfs_local_run_commit(nfs_to->nfsd_file_file(ctx->localio), 726 + ctx->data); 727 + nfs_local_commit_done(ctx->data, status); 728 + if (ctx->done != NULL) 729 + complete(ctx->done); 730 + nfs_local_fsync_ctx_free(ctx); 731 + } 732 + 733 + int nfs_local_commit(struct nfsd_file *localio, 734 + struct nfs_commit_data *data, 735 + const struct rpc_call_ops *call_ops, int how) 736 + { 737 + struct nfs_local_fsync_ctx *ctx; 738 + 739 + ctx = nfs_local_fsync_ctx_alloc(data, localio, GFP_KERNEL); 740 + if (!ctx) { 741 + nfs_local_commit_done(data, -ENOMEM); 742 + nfs_local_release_commit_data(localio, data, call_ops); 743 + return -ENOMEM; 744 + } 745 + 746 + nfs_local_init_commit(data, call_ops); 747 + kref_get(&ctx->kref); 748 + if (how & FLUSH_SYNC) { 749 + DECLARE_COMPLETION_ONSTACK(done); 750 + ctx->done = &done; 751 + queue_work(nfsiod_workqueue, &ctx->work); 752 + wait_for_completion(&done); 753 + } else 754 + queue_work(nfsiod_workqueue, &ctx->work); 755 + nfs_local_fsync_ctx_put(ctx); 756 + return 0; 757 + }
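The range computation in nfs_local_run_commit() above treats a zero count as "sync to end of file" and clamps a range that would wrap to LLONG_MAX. The kernel code detects the wrap after the addition, which relies on the kernel being built with options that make signed wraparound well defined. A portable userspace version has to check before adding. The sketch below is one way to write it; `commit_range_end()` is a hypothetical helper for illustration, and it assumes a non-negative start offset.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Compute the inclusive end offset for syncing `count` bytes at `start`.
 * A count of 0 means "to end of file"; a range that would overflow is
 * clamped to LLONG_MAX. Assumes start >= 0.
 */
static long long commit_range_end(long long start, uint32_t count)
{
	long long span;

	assert(start >= 0);
	if (count == 0)
		return LLONG_MAX;

	span = (long long)count - 1;
	/* Check before adding: signed overflow is undefined in ISO C. */
	if (start > LLONG_MAX - span)
		return LLONG_MAX;
	return start + span;
}
```

For example, `commit_range_end(4096, 4096)` yields 8191, the last byte of the second 4 KiB page, while an offset within a few bytes of LLONG_MAX yields LLONG_MAX instead of wrapping negative.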
+1 -69
fs/nfs/nfs2xdr.c
@@ -22,13 +22,11 @@
 #include <linux/nfs.h>
 #include <linux/nfs2.h>
 #include <linux/nfs_fs.h>
+#include <linux/nfs_common.h>
 #include "nfstrace.h"
 #include "internal.h"
 
 #define NFSDBG_FACILITY NFSDBG_XDR
-
-/* Mapping from NFS error code to "errno" error code. */
-#define errno_NFSERR_IO EIO
 
 /*
  * Declare the space requirements for NFS arguments and replies as
@@ -63,8 +61,6 @@
 #define NFS_stat_sz (1)
 #define NFS_readdirres_sz (1+NFS_pagepad_sz)
 #define NFS_statfsres_sz (1+NFS_info_sz)
-
-static int nfs_stat_to_errno(enum nfs_stat);
 
 /*
  * Encode/decode NFSv2 basic data types
@@ -1052,70 +1048,6 @@
 	return error;
 out_default:
 	return nfs_stat_to_errno(status);
-}
-
-
-/*
- * We need to translate between nfs status return values and
- * the local errno values which may not be the same.
- */
-static const struct {
-	int stat;
-	int errno;
-} nfs_errtbl[] = {
-	{ NFS_OK, 0 },
-	{ NFSERR_PERM, -EPERM },
-	{ NFSERR_NOENT, -ENOENT },
-	{ NFSERR_IO, -errno_NFSERR_IO},
-	{ NFSERR_NXIO, -ENXIO },
-/*	{ NFSERR_EAGAIN, -EAGAIN }, */
-	{ NFSERR_ACCES, -EACCES },
-	{ NFSERR_EXIST, -EEXIST },
-	{ NFSERR_XDEV, -EXDEV },
-	{ NFSERR_NODEV, -ENODEV },
-	{ NFSERR_NOTDIR, -ENOTDIR },
-	{ NFSERR_ISDIR, -EISDIR },
-	{ NFSERR_INVAL, -EINVAL },
-	{ NFSERR_FBIG, -EFBIG },
-	{ NFSERR_NOSPC, -ENOSPC },
-	{ NFSERR_ROFS, -EROFS },
-	{ NFSERR_MLINK, -EMLINK },
-	{ NFSERR_NAMETOOLONG, -ENAMETOOLONG },
-	{ NFSERR_NOTEMPTY, -ENOTEMPTY },
-	{ NFSERR_DQUOT, -EDQUOT },
-	{ NFSERR_STALE, -ESTALE },
-	{ NFSERR_REMOTE, -EREMOTE },
-#ifdef EWFLUSH
-	{ NFSERR_WFLUSH, -EWFLUSH },
-#endif
-	{ NFSERR_BADHANDLE, -EBADHANDLE },
-	{ NFSERR_NOT_SYNC, -ENOTSYNC },
-	{ NFSERR_BAD_COOKIE, -EBADCOOKIE },
-	{ NFSERR_NOTSUPP, -ENOTSUPP },
-	{ NFSERR_TOOSMALL, -ETOOSMALL },
-	{ NFSERR_SERVERFAULT, -EREMOTEIO },
-	{ NFSERR_BADTYPE, -EBADTYPE },
-	{ NFSERR_JUKEBOX, -EJUKEBOX },
-	{ -1, -EIO }
-};
-
-/**
- * nfs_stat_to_errno - convert an NFS status code to a local errno
- * @status: NFS status code to convert
- *
- * Returns a local errno value, or -EIO if the NFS status code is
- * not recognized. This function is used jointly by NFSv2 and NFSv3.
- */
-static int nfs_stat_to_errno(enum nfs_stat status)
-{
-	int i;
-
-	for (i = 0; nfs_errtbl[i].stat != -1; i++) {
-		if (nfs_errtbl[i].stat == (int)status)
-			return nfs_errtbl[i].errno;
-	}
-	dprintk("NFS: Unrecognized nfs status value: %u\n", status);
-	return nfs_errtbl[i].errno;
 }
 
 #define PROC(proc, argtype, restype, timer) \
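The table removed from nfs2xdr.c above is a sentinel-terminated lookup: the loop stops at the entry whose status is -1, and that same entry supplies the fallback errno. The added include of <linux/nfs_common.h> suggests the shared copy now lives there, though this hunk does not show it. The userspace sketch below reproduces the technique with a reduced table. `enum demo_stat`, `demo_errtbl` and `demo_stat_to_errno()` are hypothetical names, and the field is called `err` because `errno` is a macro in userspace C.

```c
#include <assert.h>	/* used by the checks in the usage note */
#include <errno.h>
#include <stddef.h>

/* Hypothetical protocol status codes, loosely modelled on NFS. */
enum demo_stat {
	DEMO_OK = 0,
	DEMO_ERR_PERM = 1,
	DEMO_ERR_NOENT = 2,
	DEMO_ERR_IO = 5,
	DEMO_ERR_NOSPC = 28,
	DEMO_ERR_STALE = 70,
};

static const struct {
	int stat;
	int err;
} demo_errtbl[] = {
	{ DEMO_OK,        0       },
	{ DEMO_ERR_PERM,  -EPERM  },
	{ DEMO_ERR_NOENT, -ENOENT },
	{ DEMO_ERR_IO,    -EIO    },
	{ DEMO_ERR_NOSPC, -ENOSPC },
	{ DEMO_ERR_STALE, -ESTALE },
	{ -1,             -EIO    },	/* sentinel, also the fallback value */
};

/* Map a protocol status to a negative errno; unknown codes map to -EIO. */
static int demo_stat_to_errno(int status)
{
	size_t i;

	for (i = 0; demo_errtbl[i].stat != -1; i++) {
		if (demo_errtbl[i].stat == status)
			return demo_errtbl[i].err;
	}
	/* i now indexes the sentinel entry. */
	return demo_errtbl[i].err;
}
```

A quick check such as `assert(demo_stat_to_errno(DEMO_ERR_NOENT) == -ENOENT);` exercises the match path, and any unlisted value exercises the sentinel fallback. Keeping one shared table avoids the duplicated NFSv2 and NFSv3 copies that this merge deletes.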
+20 -88
fs/nfs/nfs3xdr.c
··· 21 21 #include <linux/nfs3.h> 22 22 #include <linux/nfs_fs.h> 23 23 #include <linux/nfsacl.h> 24 + #include <linux/nfs_common.h> 25 + 24 26 #include "nfstrace.h" 25 27 #include "internal.h" 26 28 27 29 #define NFSDBG_FACILITY NFSDBG_XDR 28 - 29 - /* Mapping from NFS error code to "errno" error code. */ 30 - #define errno_NFSERR_IO EIO 31 30 32 31 /* 33 32 * Declare the space requirements for NFS arguments and replies as ··· 89 90 XDR_QUADLEN(NFS_ACL_INLINE_BUFSIZE)+\ 90 91 NFS3_pagepad_sz) 91 92 #define ACL3_setaclres_sz (1+NFS3_post_op_attr_sz) 92 - 93 - static int nfs3_stat_to_errno(enum nfs_stat); 94 93 95 94 /* 96 95 * Map file type to S_IFMT bits ··· 1403 1406 out: 1404 1407 return error; 1405 1408 out_default: 1406 - return nfs3_stat_to_errno(status); 1409 + return nfs_stat_to_errno(status); 1407 1410 } 1408 1411 1409 1412 /* ··· 1442 1445 out: 1443 1446 return error; 1444 1447 out_status: 1445 - return nfs3_stat_to_errno(status); 1448 + return nfs_stat_to_errno(status); 1446 1449 } 1447 1450 1448 1451 /* ··· 1492 1495 error = decode_post_op_attr(xdr, result->dir_attr, userns); 1493 1496 if (unlikely(error)) 1494 1497 goto out; 1495 - return nfs3_stat_to_errno(status); 1498 + return nfs_stat_to_errno(status); 1496 1499 } 1497 1500 1498 1501 /* ··· 1534 1537 out: 1535 1538 return error; 1536 1539 out_default: 1537 - return nfs3_stat_to_errno(status); 1540 + return nfs_stat_to_errno(status); 1538 1541 } 1539 1542 1540 1543 /* ··· 1575 1578 out: 1576 1579 return error; 1577 1580 out_default: 1578 - return nfs3_stat_to_errno(status); 1581 + return nfs_stat_to_errno(status); 1579 1582 } 1580 1583 1581 1584 /* ··· 1655 1658 out: 1656 1659 return error; 1657 1660 out_status: 1658 - return nfs3_stat_to_errno(status); 1661 + return nfs_stat_to_errno(status); 1659 1662 } 1660 1663 1661 1664 /* ··· 1725 1728 out: 1726 1729 return error; 1727 1730 out_status: 1728 - return nfs3_stat_to_errno(status); 1731 + return nfs_stat_to_errno(status); 1729 1732 } 1730 1733 1731 
1734 /* ··· 1792 1795 error = decode_wcc_data(xdr, result->dir_attr, userns); 1793 1796 if (unlikely(error)) 1794 1797 goto out; 1795 - return nfs3_stat_to_errno(status); 1798 + return nfs_stat_to_errno(status); 1796 1799 } 1797 1800 1798 1801 /* ··· 1832 1835 out: 1833 1836 return error; 1834 1837 out_status: 1835 - return nfs3_stat_to_errno(status); 1838 + return nfs_stat_to_errno(status); 1836 1839 } 1837 1840 1838 1841 /* ··· 1878 1881 out: 1879 1882 return error; 1880 1883 out_status: 1881 - return nfs3_stat_to_errno(status); 1884 + return nfs_stat_to_errno(status); 1882 1885 } 1883 1886 1884 1887 /* ··· 1923 1926 out: 1924 1927 return error; 1925 1928 out_status: 1926 - return nfs3_stat_to_errno(status); 1929 + return nfs_stat_to_errno(status); 1927 1930 } 1928 1931 1929 1932 /** ··· 2098 2101 error = decode_post_op_attr(xdr, result->dir_attr, rpc_rqst_userns(req)); 2099 2102 if (unlikely(error)) 2100 2103 goto out; 2101 - return nfs3_stat_to_errno(status); 2104 + return nfs_stat_to_errno(status); 2102 2105 } 2103 2106 2104 2107 /* ··· 2164 2167 out: 2165 2168 return error; 2166 2169 out_status: 2167 - return nfs3_stat_to_errno(status); 2170 + return nfs_stat_to_errno(status); 2168 2171 } 2169 2172 2170 2173 /* ··· 2240 2243 out: 2241 2244 return error; 2242 2245 out_status: 2243 - return nfs3_stat_to_errno(status); 2246 + return nfs_stat_to_errno(status); 2244 2247 } 2245 2248 2246 2249 /* ··· 2301 2304 out: 2302 2305 return error; 2303 2306 out_status: 2304 - return nfs3_stat_to_errno(status); 2307 + return nfs_stat_to_errno(status); 2305 2308 } 2306 2309 2307 2310 /* ··· 2347 2350 out: 2348 2351 return error; 2349 2352 out_status: 2350 - return nfs3_stat_to_errno(status); 2353 + return nfs_stat_to_errno(status); 2351 2354 } 2352 2355 2353 2356 #ifdef CONFIG_NFS_V3_ACL ··· 2413 2416 out: 2414 2417 return error; 2415 2418 out_default: 2416 - return nfs3_stat_to_errno(status); 2419 + return nfs_stat_to_errno(status); 2417 2420 } 2418 2421 2419 2422 static int 
nfs3_xdr_dec_setacl3res(struct rpc_rqst *req, ··· 2432 2435 out: 2433 2436 return error; 2434 2437 out_default: 2435 - return nfs3_stat_to_errno(status); 2438 + return nfs_stat_to_errno(status); 2436 2439 } 2437 2440 2438 2441 #endif /* CONFIG_NFS_V3_ACL */ 2439 - 2440 - 2441 - /* 2442 - * We need to translate between nfs status return values and 2443 - * the local errno values which may not be the same. 2444 - */ 2445 - static const struct { 2446 - int stat; 2447 - int errno; 2448 - } nfs_errtbl[] = { 2449 - { NFS_OK, 0 }, 2450 - { NFSERR_PERM, -EPERM }, 2451 - { NFSERR_NOENT, -ENOENT }, 2452 - { NFSERR_IO, -errno_NFSERR_IO}, 2453 - { NFSERR_NXIO, -ENXIO }, 2454 - /* { NFSERR_EAGAIN, -EAGAIN }, */ 2455 - { NFSERR_ACCES, -EACCES }, 2456 - { NFSERR_EXIST, -EEXIST }, 2457 - { NFSERR_XDEV, -EXDEV }, 2458 - { NFSERR_NODEV, -ENODEV }, 2459 - { NFSERR_NOTDIR, -ENOTDIR }, 2460 - { NFSERR_ISDIR, -EISDIR }, 2461 - { NFSERR_INVAL, -EINVAL }, 2462 - { NFSERR_FBIG, -EFBIG }, 2463 - { NFSERR_NOSPC, -ENOSPC }, 2464 - { NFSERR_ROFS, -EROFS }, 2465 - { NFSERR_MLINK, -EMLINK }, 2466 - { NFSERR_NAMETOOLONG, -ENAMETOOLONG }, 2467 - { NFSERR_NOTEMPTY, -ENOTEMPTY }, 2468 - { NFSERR_DQUOT, -EDQUOT }, 2469 - { NFSERR_STALE, -ESTALE }, 2470 - { NFSERR_REMOTE, -EREMOTE }, 2471 - #ifdef EWFLUSH 2472 - { NFSERR_WFLUSH, -EWFLUSH }, 2473 - #endif 2474 - { NFSERR_BADHANDLE, -EBADHANDLE }, 2475 - { NFSERR_NOT_SYNC, -ENOTSYNC }, 2476 - { NFSERR_BAD_COOKIE, -EBADCOOKIE }, 2477 - { NFSERR_NOTSUPP, -ENOTSUPP }, 2478 - { NFSERR_TOOSMALL, -ETOOSMALL }, 2479 - { NFSERR_SERVERFAULT, -EREMOTEIO }, 2480 - { NFSERR_BADTYPE, -EBADTYPE }, 2481 - { NFSERR_JUKEBOX, -EJUKEBOX }, 2482 - { -1, -EIO } 2483 - }; 2484 - 2485 - /** 2486 - * nfs3_stat_to_errno - convert an NFS status code to a local errno 2487 - * @status: NFS status code to convert 2488 - * 2489 - * Returns a local errno value, or -EIO if the NFS status code is 2490 - * not recognized. This function is used jointly by NFSv2 and NFSv3. 
2491 - */ 2492 - static int nfs3_stat_to_errno(enum nfs_stat status) 2493 - { 2494 - int i; 2495 - 2496 - for (i = 0; nfs_errtbl[i].stat != -1; i++) { 2497 - if (nfs_errtbl[i].stat == (int)status) 2498 - return nfs_errtbl[i].errno; 2499 - } 2500 - dprintk("NFS: Unrecognized nfs status value: %u\n", status); 2501 - return nfs_errtbl[i].errno; 2502 - } 2503 - 2504 2442 2505 2443 #define PROC(proc, argtype, restype, timer) \ 2506 2444 [NFS3PROC_##proc] = { \
+1 -1
fs/nfs/nfs4_fs.h
@@ -83,7 +83,7 @@
 #define NFS_SEQID_CONFIRMED 1
 struct nfs_seqid_counter {
 	ktime_t create_time;
-	int owner_id;
+	u64 owner_id;
 	int flags;
 	u32 counter;
 	spinlock_t lock;		/* Protects the list */
+14 -2
fs/nfs/nfs4proc.c
··· 3904 3904 #define FATTR4_WORD2_NFS41_MASK (2*FATTR4_WORD2_SUPPATTR_EXCLCREAT - 1UL) 3905 3905 #define FATTR4_WORD2_NFS42_MASK (2*FATTR4_WORD2_OPEN_ARGUMENTS - 1UL) 3906 3906 3907 + #define FATTR4_WORD2_NFS42_TIME_DELEG_MASK \ 3908 + (FATTR4_WORD2_TIME_DELEG_MODIFY|FATTR4_WORD2_TIME_DELEG_ACCESS) 3909 + static bool nfs4_server_delegtime_capable(struct nfs4_server_caps_res *res) 3910 + { 3911 + u32 share_access_want = res->open_caps.oa_share_access_want[0]; 3912 + u32 attr_bitmask = res->attr_bitmask[2]; 3913 + 3914 + return (share_access_want & NFS4_SHARE_WANT_DELEG_TIMESTAMPS) && 3915 + ((attr_bitmask & FATTR4_WORD2_NFS42_TIME_DELEG_MASK) == 3916 + FATTR4_WORD2_NFS42_TIME_DELEG_MASK); 3917 + } 3918 + 3907 3919 static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle) 3908 3920 { 3909 3921 u32 minorversion = server->nfs_client->cl_minorversion; ··· 3994 3982 #endif 3995 3983 if (res.attr_bitmask[0] & FATTR4_WORD0_FS_LOCATIONS) 3996 3984 server->caps |= NFS_CAP_FS_LOCATIONS; 3997 - if (res.attr_bitmask[2] & FATTR4_WORD2_TIME_DELEG_MODIFY) 3998 - server->caps |= NFS_CAP_DELEGTIME; 3999 3985 if (!(res.attr_bitmask[0] & FATTR4_WORD0_FILEID)) 4000 3986 server->fattr_valid &= ~NFS_ATTR_FATTR_FILEID; 4001 3987 if (!(res.attr_bitmask[1] & FATTR4_WORD1_MODE)) ··· 4021 4011 if (res.open_caps.oa_share_access_want[0] & 4022 4012 NFS4_SHARE_WANT_OPEN_XOR_DELEGATION) 4023 4013 server->caps |= NFS_CAP_OPEN_XOR; 4014 + if (nfs4_server_delegtime_capable(&res)) 4015 + server->caps |= NFS_CAP_DELEGTIME; 4024 4016 4025 4017 memcpy(server->cache_consistency_bitmask, res.attr_bitmask, sizeof(server->cache_consistency_bitmask)); 4026 4018 server->cache_consistency_bitmask[0] &= FATTR4_WORD0_CHANGE|FATTR4_WORD0_SIZE;
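The "Proxying of Times" fix above replaces a single-bit test with a check that the server advertises the share-access want flag and both delegated-time attributes. A minimal userspace sketch of that "all bits in the mask must be set" test follows; the bit values and the names `WANT_DELEG_TIMESTAMPS`, `TIME_DELEG_*` and `delegtime_capable()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for illustration. */
#define WANT_DELEG_TIMESTAMPS	0x100000u
#define TIME_DELEG_ACCESS	(1u << 20)
#define TIME_DELEG_MODIFY	(1u << 21)
#define TIME_DELEG_MASK		(TIME_DELEG_ACCESS | TIME_DELEG_MODIFY)

/*
 * Capable only if the want flag is set AND every bit of the mask is
 * present. Comparing against the mask (rather than testing for non-zero)
 * rejects a server that advertises just one of the two attributes.
 */
static bool delegtime_capable(uint32_t share_access_want, uint32_t attr_bitmask)
{
	return (share_access_want & WANT_DELEG_TIMESTAMPS) &&
	       ((attr_bitmask & TIME_DELEG_MASK) == TIME_DELEG_MASK);
}
```

The old check would have accepted a server that set only the modify bit; the mask comparison is what closes that gap.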
+9 -13
fs/nfs/nfs4state.c
··· 501 501 sp = kzalloc(sizeof(*sp), gfp_flags); 502 502 if (!sp) 503 503 return NULL; 504 - sp->so_seqid.owner_id = ida_alloc(&server->openowner_id, gfp_flags); 505 - if (sp->so_seqid.owner_id < 0) { 506 - kfree(sp); 507 - return NULL; 508 - } 504 + sp->so_seqid.owner_id = atomic64_inc_return(&server->owner_ctr); 509 505 sp->so_server = server; 510 506 sp->so_cred = get_cred(cred); 511 507 spin_lock_init(&sp->so_lock); ··· 532 536 { 533 537 nfs4_destroy_seqid_counter(&sp->so_seqid); 534 538 put_cred(sp->so_cred); 535 - ida_free(&sp->so_server->openowner_id, sp->so_seqid.owner_id); 536 539 kfree(sp); 537 540 } 538 541 ··· 874 879 refcount_set(&lsp->ls_count, 1); 875 880 lsp->ls_state = state; 876 881 lsp->ls_owner = owner; 877 - lsp->ls_seqid.owner_id = ida_alloc(&server->lockowner_id, GFP_KERNEL_ACCOUNT); 878 - if (lsp->ls_seqid.owner_id < 0) 879 - goto out_free; 882 + lsp->ls_seqid.owner_id = atomic64_inc_return(&server->owner_ctr); 880 883 INIT_LIST_HEAD(&lsp->ls_locks); 881 884 return lsp; 882 - out_free: 883 - kfree(lsp); 884 - return NULL; 885 885 } 886 886 887 887 void nfs4_free_lock_state(struct nfs_server *server, struct nfs4_lock_state *lsp) 888 888 { 889 - ida_free(&server->lockowner_id, lsp->ls_seqid.owner_id); 890 889 nfs4_destroy_seqid_counter(&lsp->ls_seqid); 891 890 kfree(lsp); 892 891 } ··· 1946 1957 set_bit(ops->owner_flag_bit, &sp->so_flags); 1947 1958 nfs4_put_state_owner(sp); 1948 1959 status = nfs4_recovery_handle_error(clp, status); 1960 + nfs4_free_state_owners(&freeme); 1949 1961 return (status != 0) ? status : -EAGAIN; 1950 1962 } 1951 1963 ··· 2013 2023 nfs_mark_client_ready(clp, -EPERM); 2014 2024 clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state); 2015 2025 return -EPERM; 2026 + case -ETIMEDOUT: 2027 + if (clp->cl_cons_state == NFS_CS_SESSION_INITING) { 2028 + nfs_mark_client_ready(clp, -EIO); 2029 + return -EIO; 2030 + } 2031 + fallthrough; 2016 2032 case -EACCES: 2017 2033 case -NFS4ERR_DELAY: 2018 2034 case -EAGAIN:
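The owner-id change above swaps two per-server IDA allocators for one shared 64-bit counter, so allocation cannot fail and an id is never handed out twice. A rough userspace analogue using C11 atomics is sketched below; `struct owner_ctr`, `owner_ctr_init()` and `owner_id_alloc()` are illustrative names, and `atomic_fetch_add(...) + 1` is assumed here as the closest equivalent of the kernel's `atomic64_inc_return()`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-server counter. */
struct owner_ctr {
	_Atomic uint64_t next;
};

static void owner_ctr_init(struct owner_ctr *ctr)
{
	atomic_init(&ctr->next, 0);
}

/*
 * Hand out a fresh id. atomic_fetch_add() returns the previous value,
 * so adding 1 yields the incremented value, mirroring "inc_return".
 * There is no failure path and nothing to free afterwards.
 */
static uint64_t owner_id_alloc(struct owner_ctr *ctr)
{
	return atomic_fetch_add(&ctr->next, 1) + 1;
}
```

Because the counter is never decremented, the matching `ida_free()` calls and the allocation error paths disappear, which is what the removed lines in the hunk reflect.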
+4 -86
fs/nfs/nfs4xdr.c
··· 52 52 #include <linux/nfs.h> 53 53 #include <linux/nfs4.h> 54 54 #include <linux/nfs_fs.h> 55 + #include <linux/nfs_common.h> 55 56 56 57 #include "nfs4_fs.h" 57 58 #include "nfs4trace.h" ··· 64 63 65 64 #define NFSDBG_FACILITY NFSDBG_XDR 66 65 67 - /* Mapping from NFS error code to "errno" error code. */ 68 - #define errno_NFSERR_IO EIO 69 - 70 66 struct compound_hdr; 71 - static int nfs4_stat_to_errno(int); 72 67 static void encode_layoutget(struct xdr_stream *xdr, 73 68 const struct nfs4_layoutget_args *args, 74 69 struct compound_hdr *hdr); ··· 972 975 return p; 973 976 } 974 977 975 - static void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len) 976 - { 977 - WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0); 978 - } 979 - 980 978 static void encode_string(struct xdr_stream *xdr, unsigned int len, const char *str) 981 979 { 982 980 WARN_ON_ONCE(xdr_stream_encode_opaque(xdr, str, len) < 0); ··· 1416 1424 */ 1417 1425 encode_nfs4_seqid(xdr, arg->seqid); 1418 1426 encode_share_access(xdr, arg->share_access); 1419 - p = reserve_space(xdr, 36); 1427 + p = reserve_space(xdr, 40); 1420 1428 p = xdr_encode_hyper(p, arg->clientid); 1421 - *p++ = cpu_to_be32(24); 1429 + *p++ = cpu_to_be32(28); 1422 1430 p = xdr_encode_opaque_fixed(p, "open id:", 8); 1423 1431 *p++ = cpu_to_be32(arg->server->s_dev); 1424 - *p++ = cpu_to_be32(arg->id.uniquifier); 1432 + p = xdr_encode_hyper(p, arg->id.uniquifier); 1425 1433 xdr_encode_hyper(p, arg->id.create_time); 1426 1434 } 1427 1435 ··· 4397 4405 acc = be32_to_cpup(p); 4398 4406 *supported = supp; 4399 4407 *access = acc; 4400 - return 0; 4401 - } 4402 - 4403 - static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len) 4404 - { 4405 - ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len); 4406 - if (unlikely(ret < 0)) 4407 - return -EIO; 4408 4408 return 0; 4409 4409 } 4410 4410 ··· 7602 7618 entry->cookie = new_cookie; 7603 7619 7604 7620 return 0; 7605 - } 7606 - 7607 
- /* 7608 - * We need to translate between nfs status return values and 7609 - * the local errno values which may not be the same. 7610 - */ 7611 - static struct { 7612 - int stat; 7613 - int errno; 7614 - } nfs_errtbl[] = { 7615 - { NFS4_OK, 0 }, 7616 - { NFS4ERR_PERM, -EPERM }, 7617 - { NFS4ERR_NOENT, -ENOENT }, 7618 - { NFS4ERR_IO, -errno_NFSERR_IO}, 7619 - { NFS4ERR_NXIO, -ENXIO }, 7620 - { NFS4ERR_ACCESS, -EACCES }, 7621 - { NFS4ERR_EXIST, -EEXIST }, 7622 - { NFS4ERR_XDEV, -EXDEV }, 7623 - { NFS4ERR_NOTDIR, -ENOTDIR }, 7624 - { NFS4ERR_ISDIR, -EISDIR }, 7625 - { NFS4ERR_INVAL, -EINVAL }, 7626 - { NFS4ERR_FBIG, -EFBIG }, 7627 - { NFS4ERR_NOSPC, -ENOSPC }, 7628 - { NFS4ERR_ROFS, -EROFS }, 7629 - { NFS4ERR_MLINK, -EMLINK }, 7630 - { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG }, 7631 - { NFS4ERR_NOTEMPTY, -ENOTEMPTY }, 7632 - { NFS4ERR_DQUOT, -EDQUOT }, 7633 - { NFS4ERR_STALE, -ESTALE }, 7634 - { NFS4ERR_BADHANDLE, -EBADHANDLE }, 7635 - { NFS4ERR_BAD_COOKIE, -EBADCOOKIE }, 7636 - { NFS4ERR_NOTSUPP, -ENOTSUPP }, 7637 - { NFS4ERR_TOOSMALL, -ETOOSMALL }, 7638 - { NFS4ERR_SERVERFAULT, -EREMOTEIO }, 7639 - { NFS4ERR_BADTYPE, -EBADTYPE }, 7640 - { NFS4ERR_LOCKED, -EAGAIN }, 7641 - { NFS4ERR_SYMLINK, -ELOOP }, 7642 - { NFS4ERR_OP_ILLEGAL, -EOPNOTSUPP }, 7643 - { NFS4ERR_DEADLOCK, -EDEADLK }, 7644 - { NFS4ERR_NOXATTR, -ENODATA }, 7645 - { NFS4ERR_XATTR2BIG, -E2BIG }, 7646 - { -1, -EIO } 7647 - }; 7648 - 7649 - /* 7650 - * Convert an NFS error code to a local one. 7651 - * This one is used jointly by NFSv2 and NFSv3. 7652 - */ 7653 - static int 7654 - nfs4_stat_to_errno(int stat) 7655 - { 7656 - int i; 7657 - for (i = 0; nfs_errtbl[i].stat != -1; i++) { 7658 - if (nfs_errtbl[i].stat == stat) 7659 - return nfs_errtbl[i].errno; 7660 - } 7661 - if (stat <= 10000 || stat > 10100) { 7662 - /* The server is looney tunes. */ 7663 - return -EREMOTEIO; 7664 - } 7665 - /* If we cannot translate the error, the recovery routines should 7666 - * handle it. 
7667 - * Note: remaining NFSv4 error codes have values > 10000, so should 7668 - * not conflict with native Linux error codes. 7669 - */ 7670 - return -stat; 7671 7621 } 7672 7622 7673 7623 #ifdef CONFIG_NFS_V4_2
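The open-owner hunk in this file changes `reserve_space(xdr, 36)` to `40` and the opaque length from `24` to `28` because the uniquifier grows from a 32-bit word to a 64-bit hyper. The arithmetic can be checked with a small sketch; the enum and function names below are illustrative only, while the field sizes are read off the encode calls in the hunk.

```c
#include <stddef.h>

/* Byte sizes of each encoded field, as they appear in the hunk. */
enum {
	CLIENTID_LEN	= 8,	/* xdr_encode_hyper(clientid) */
	LENGTH_WORD	= 4,	/* opaque length prefix */
	TAG_LEN		= 8,	/* "open id:" */
	S_DEV_LEN	= 4,	/* 32-bit s_dev */
	UNIQUIFIER_OLD	= 4,	/* 32-bit uniquifier before the change */
	UNIQUIFIER_NEW	= 8,	/* 64-bit uniquifier after the change */
	CREATE_TIME_LEN	= 8,	/* xdr_encode_hyper(create_time) */
};

/* Length of the owner opaque itself (the value written after clientid). */
static size_t owner_opaque_len(size_t uniquifier_len)
{
	return TAG_LEN + S_DEV_LEN + uniquifier_len + CREATE_TIME_LEN;
}

/* Total bytes to reserve: clientid + length word + opaque body. */
static size_t open_owner_reserve(size_t uniquifier_len)
{
	return CLIENTID_LEN + LENGTH_WORD + owner_opaque_len(uniquifier_len);
}
```

With the old 4-byte uniquifier this gives 24 and 36; with the new 8-byte one it gives 28 and 40, matching both constants in the diff.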
+61
fs/nfs/nfstrace.h
··· 1685 1685 TP_printk("path='%s'", __get_str(path)) 1686 1686 ); 1687 1687 1688 + TRACE_EVENT(nfs_local_open_fh, 1689 + TP_PROTO( 1690 + const struct nfs_fh *fh, 1691 + fmode_t fmode, 1692 + int error 1693 + ), 1694 + 1695 + TP_ARGS(fh, fmode, error), 1696 + 1697 + TP_STRUCT__entry( 1698 + __field(int, error) 1699 + __field(u32, fhandle) 1700 + __field(unsigned int, fmode) 1701 + ), 1702 + 1703 + TP_fast_assign( 1704 + __entry->error = error; 1705 + __entry->fhandle = nfs_fhandle_hash(fh); 1706 + __entry->fmode = (__force unsigned int)fmode; 1707 + ), 1708 + 1709 + TP_printk( 1710 + "error=%d fhandle=0x%08x mode=%s", 1711 + __entry->error, 1712 + __entry->fhandle, 1713 + show_fs_fmode_flags(__entry->fmode) 1714 + ) 1715 + ); 1716 + 1717 + DECLARE_EVENT_CLASS(nfs_local_client_event, 1718 + TP_PROTO( 1719 + const struct nfs_client *clp 1720 + ), 1721 + 1722 + TP_ARGS(clp), 1723 + 1724 + TP_STRUCT__entry( 1725 + __field(unsigned int, protocol) 1726 + __string(server, clp->cl_hostname) 1727 + ), 1728 + 1729 + TP_fast_assign( 1730 + __entry->protocol = clp->rpc_ops->version; 1731 + __assign_str(server); 1732 + ), 1733 + 1734 + TP_printk( 1735 + "server=%s NFSv%u", __get_str(server), __entry->protocol 1736 + ) 1737 + ); 1738 + 1739 + #define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \ 1740 + DEFINE_EVENT(nfs_local_client_event, name, \ 1741 + TP_PROTO( \ 1742 + const struct nfs_client *clp \ 1743 + ), \ 1744 + TP_ARGS(clp)) 1745 + 1746 + DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_enable); 1747 + DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_disable); 1748 + 1688 1749 DECLARE_EVENT_CLASS(nfs_xdr_event, 1689 1750 TP_PROTO( 1690 1751 const struct xdr_stream *xdr,
+14 -2
fs/nfs/pagelist.c
··· 731 731 732 732 int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr, 733 733 const struct cred *cred, const struct nfs_rpc_ops *rpc_ops, 734 - const struct rpc_call_ops *call_ops, int how, int flags) 734 + const struct rpc_call_ops *call_ops, int how, int flags, 735 + struct nfsd_file *localio) 735 736 { 736 737 struct rpc_task *task; 737 738 struct rpc_message msg = { ··· 761 760 (unsigned long long)NFS_FILEID(hdr->inode), 762 761 hdr->args.count, 763 762 (unsigned long long)hdr->args.offset); 763 + 764 + if (localio) 765 + return nfs_local_doio(NFS_SERVER(hdr->inode)->nfs_client, 766 + localio, hdr, call_ops); 764 767 765 768 task = rpc_run_task(&task_setup_data); 766 769 if (IS_ERR(task)) ··· 958 953 nfs_pgheader_init(desc, hdr, nfs_pgio_header_free); 959 954 ret = nfs_generic_pgio(desc, hdr); 960 955 if (ret == 0) { 956 + struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client; 957 + 958 + struct nfsd_file *localio = 959 + nfs_local_open_fh(clp, hdr->cred, 960 + hdr->args.fh, hdr->args.context->mode); 961 + 961 962 if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion) 962 963 task_flags = RPC_TASK_MOVEABLE; 963 964 ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode), ··· 972 961 NFS_PROTO(hdr->inode), 973 962 desc->pg_rpc_callops, 974 963 desc->pg_ioflags, 975 - RPC_TASK_CRED_NOREF | task_flags); 964 + RPC_TASK_CRED_NOREF | task_flags, 965 + localio); 976 966 } 977 967 return ret; 978 968 }
+1 -1
fs/nfs/pnfs_nfs.c
@@ -490,7 +490,7 @@
 		nfs_initiate_commit(NFS_CLIENT(inode), data,
 				    NFS_PROTO(data->inode),
 				    data->mds_ops, how,
-				    RPC_TASK_CRED_NOREF);
+				    RPC_TASK_CRED_NOREF, NULL);
 	} else {
 		nfs_init_commit(data, NULL, data->lseg, cinfo);
 		initiate_commit(data, how);
+1 -2
fs/nfs/read.c
@@ -48,8 +48,7 @@
 
 static void nfs_readhdr_free(struct nfs_pgio_header *rhdr)
 {
-	if (rhdr->res.scratch != NULL)
-		kfree(rhdr->res.scratch);
+	kfree(rhdr->res.scratch);
 	kmem_cache_free(nfs_rdata_cachep, rhdr);
 }
 
+3
fs/nfs/super.c
@@ -551,6 +551,9 @@
 	else
 		seq_puts(m, ",local_lock=posix");
 
+	if (nfss->flags & NFS_MOUNT_NO_ALIGNWRITE)
+		seq_puts(m, ",noalignwrite");
+
 	if (nfss->flags & NFS_MOUNT_WRITE_EAGER) {
 		if (nfss->flags & NFS_MOUNT_WRITE_WAIT)
 			seq_puts(m, ",write=wait");
+15 -6
fs/nfs/write.c
··· 772 772 nfs_lock_request(req); 773 773 spin_lock(&mapping->i_private_lock); 774 774 set_bit(PG_MAPPED, &req->wb_flags); 775 - folio_set_private(folio); 776 - folio->private = req; 775 + folio_attach_private(folio, req); 777 776 spin_unlock(&mapping->i_private_lock); 778 777 atomic_long_inc(&nfsi->nrequests); 779 778 /* this a head request for a page group - mark it as having an ··· 796 797 797 798 spin_lock(&mapping->i_private_lock); 798 799 if (likely(folio)) { 799 - folio->private = NULL; 800 - folio_clear_private(folio); 800 + folio_detach_private(folio); 801 801 clear_bit(PG_MAPPED, &req->wb_head->wb_flags); 802 802 } 803 803 spin_unlock(&mapping->i_private_lock); ··· 1295 1297 struct file_lock_context *flctx = locks_inode_context(inode); 1296 1298 struct file_lock *fl; 1297 1299 int ret; 1300 + unsigned int mntflags = NFS_SERVER(inode)->flags; 1298 1301 1302 + if (mntflags & NFS_MOUNT_NO_ALIGNWRITE) 1303 + return 0; 1299 1304 if (file->f_flags & O_DSYNC) 1300 1305 return 0; 1301 1306 if (!nfs_folio_write_uptodate(folio, pagelen)) ··· 1664 1663 int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data, 1665 1664 const struct nfs_rpc_ops *nfs_ops, 1666 1665 const struct rpc_call_ops *call_ops, 1667 - int how, int flags) 1666 + int how, int flags, 1667 + struct nfsd_file *localio) 1668 1668 { 1669 1669 struct rpc_task *task; 1670 1670 int priority = flush_task_priority(how); ··· 1693 1691 trace_nfs_initiate_commit(data); 1694 1692 1695 1693 dprintk("NFS: initiated commit call\n"); 1694 + 1695 + if (localio) 1696 + return nfs_local_commit(localio, data, call_ops, how); 1696 1697 1697 1698 task = rpc_run_task(&task_setup_data); 1698 1699 if (IS_ERR(task)) ··· 1796 1791 struct nfs_commit_info *cinfo) 1797 1792 { 1798 1793 struct nfs_commit_data *data; 1794 + struct nfsd_file *localio; 1799 1795 unsigned short task_flags = 0; 1800 1796 1801 1797 /* another commit raced with us */ ··· 1813 1807 nfs_init_commit(data, head, NULL, cinfo); 1814 1808 
if (NFS_SERVER(inode)->nfs_client->cl_minorversion) 1815 1809 task_flags = RPC_TASK_MOVEABLE; 1810 + 1811 + localio = nfs_local_open_fh(NFS_SERVER(inode)->nfs_client, data->cred, 1812 + data->args.fh, data->context->mode); 1816 1813 return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode), 1817 1814 data->mds_ops, how, 1818 - RPC_TASK_CRED_NOREF | task_flags); 1815 + RPC_TASK_CRED_NOREF | task_flags, localio); 1819 1816 } 1820 1817 1821 1818 /*
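The folio refcount fix in this file replaces the open-coded `folio_set_private()` / `folio->private = req` pair with `folio_attach_private()`, and the teardown with `folio_detach_private()`. The difference is that the attach helper also takes a folio reference and the detach helper drops it. A toy userspace model of that pairing is sketched below; `struct toy_folio` and both functions are hypothetical and only mimic the bookkeeping, not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: a refcount, a "has private data" flag, and the pointer. */
struct toy_folio {
	int refcount;
	bool has_private;
	void *private_data;
};

/* Attaching private data pins the folio with an extra reference. */
static void toy_attach_private(struct toy_folio *f, void *data)
{
	f->refcount++;
	f->private_data = data;
	f->has_private = true;
}

/*
 * Detaching returns the old pointer and drops the reference taken at
 * attach time. If nothing is attached it does nothing and returns NULL,
 * so a double detach cannot underflow the count.
 */
static void *toy_detach_private(struct toy_folio *f)
{
	void *data = f->private_data;

	if (!f->has_private)
		return NULL;
	f->has_private = false;
	f->private_data = NULL;
	f->refcount--;
	return data;
}
```

In this model the open-coded version corresponds to setting the flag and pointer without touching `refcount`, which is the imbalance the commit title refers to.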
+5
fs/nfs_common/Makefile
@@ -6,5 +6,10 @@
 obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
 nfs_acl-objs := nfsacl.o
 
+obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o
+nfs_localio-objs := nfslocalio.o
+
 obj-$(CONFIG_GRACE_PERIOD) += grace.o
 obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
+
+obj-$(CONFIG_NFS_COMMON) += common.o
+134
fs/nfs_common/common.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + 3 + #include <linux/module.h> 4 + #include <linux/nfs_common.h> 5 + #include <linux/nfs4.h> 6 + 7 + /* 8 + * We need to translate between nfs status return values and 9 + * the local errno values which may not be the same. 10 + */ 11 + static const struct { 12 + int stat; 13 + int errno; 14 + } nfs_errtbl[] = { 15 + { NFS_OK, 0 }, 16 + { NFSERR_PERM, -EPERM }, 17 + { NFSERR_NOENT, -ENOENT }, 18 + { NFSERR_IO, -errno_NFSERR_IO}, 19 + { NFSERR_NXIO, -ENXIO }, 20 + /* { NFSERR_EAGAIN, -EAGAIN }, */ 21 + { NFSERR_ACCES, -EACCES }, 22 + { NFSERR_EXIST, -EEXIST }, 23 + { NFSERR_XDEV, -EXDEV }, 24 + { NFSERR_NODEV, -ENODEV }, 25 + { NFSERR_NOTDIR, -ENOTDIR }, 26 + { NFSERR_ISDIR, -EISDIR }, 27 + { NFSERR_INVAL, -EINVAL }, 28 + { NFSERR_FBIG, -EFBIG }, 29 + { NFSERR_NOSPC, -ENOSPC }, 30 + { NFSERR_ROFS, -EROFS }, 31 + { NFSERR_MLINK, -EMLINK }, 32 + { NFSERR_NAMETOOLONG, -ENAMETOOLONG }, 33 + { NFSERR_NOTEMPTY, -ENOTEMPTY }, 34 + { NFSERR_DQUOT, -EDQUOT }, 35 + { NFSERR_STALE, -ESTALE }, 36 + { NFSERR_REMOTE, -EREMOTE }, 37 + #ifdef EWFLUSH 38 + { NFSERR_WFLUSH, -EWFLUSH }, 39 + #endif 40 + { NFSERR_BADHANDLE, -EBADHANDLE }, 41 + { NFSERR_NOT_SYNC, -ENOTSYNC }, 42 + { NFSERR_BAD_COOKIE, -EBADCOOKIE }, 43 + { NFSERR_NOTSUPP, -ENOTSUPP }, 44 + { NFSERR_TOOSMALL, -ETOOSMALL }, 45 + { NFSERR_SERVERFAULT, -EREMOTEIO }, 46 + { NFSERR_BADTYPE, -EBADTYPE }, 47 + { NFSERR_JUKEBOX, -EJUKEBOX }, 48 + { -1, -EIO } 49 + }; 50 + 51 + /** 52 + * nfs_stat_to_errno - convert an NFS status code to a local errno 53 + * @status: NFS status code to convert 54 + * 55 + * Returns a local errno value, or -EIO if the NFS status code is 56 + * not recognized. This function is used jointly by NFSv2 and NFSv3. 
57 + */ 58 + int nfs_stat_to_errno(enum nfs_stat status) 59 + { 60 + int i; 61 + 62 + for (i = 0; nfs_errtbl[i].stat != -1; i++) { 63 + if (nfs_errtbl[i].stat == (int)status) 64 + return nfs_errtbl[i].errno; 65 + } 66 + return nfs_errtbl[i].errno; 67 + } 68 + EXPORT_SYMBOL_GPL(nfs_stat_to_errno); 69 + 70 + /* 71 + * We need to translate between nfs v4 status return values and 72 + * the local errno values which may not be the same. 73 + */ 74 + static const struct { 75 + int stat; 76 + int errno; 77 + } nfs4_errtbl[] = { 78 + { NFS4_OK, 0 }, 79 + { NFS4ERR_PERM, -EPERM }, 80 + { NFS4ERR_NOENT, -ENOENT }, 81 + { NFS4ERR_IO, -errno_NFSERR_IO}, 82 + { NFS4ERR_NXIO, -ENXIO }, 83 + { NFS4ERR_ACCESS, -EACCES }, 84 + { NFS4ERR_EXIST, -EEXIST }, 85 + { NFS4ERR_XDEV, -EXDEV }, 86 + { NFS4ERR_NOTDIR, -ENOTDIR }, 87 + { NFS4ERR_ISDIR, -EISDIR }, 88 + { NFS4ERR_INVAL, -EINVAL }, 89 + { NFS4ERR_FBIG, -EFBIG }, 90 + { NFS4ERR_NOSPC, -ENOSPC }, 91 + { NFS4ERR_ROFS, -EROFS }, 92 + { NFS4ERR_MLINK, -EMLINK }, 93 + { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG }, 94 + { NFS4ERR_NOTEMPTY, -ENOTEMPTY }, 95 + { NFS4ERR_DQUOT, -EDQUOT }, 96 + { NFS4ERR_STALE, -ESTALE }, 97 + { NFS4ERR_BADHANDLE, -EBADHANDLE }, 98 + { NFS4ERR_BAD_COOKIE, -EBADCOOKIE }, 99 + { NFS4ERR_NOTSUPP, -ENOTSUPP }, 100 + { NFS4ERR_TOOSMALL, -ETOOSMALL }, 101 + { NFS4ERR_SERVERFAULT, -EREMOTEIO }, 102 + { NFS4ERR_BADTYPE, -EBADTYPE }, 103 + { NFS4ERR_LOCKED, -EAGAIN }, 104 + { NFS4ERR_SYMLINK, -ELOOP }, 105 + { NFS4ERR_OP_ILLEGAL, -EOPNOTSUPP }, 106 + { NFS4ERR_DEADLOCK, -EDEADLK }, 107 + { NFS4ERR_NOXATTR, -ENODATA }, 108 + { NFS4ERR_XATTR2BIG, -E2BIG }, 109 + { -1, -EIO } 110 + }; 111 + 112 + /* 113 + * Convert an NFS error code to a local one. 114 + * This one is used by NFSv4. 
115 + */ 116 + int nfs4_stat_to_errno(int stat) 117 + { 118 + int i; 119 + for (i = 0; nfs4_errtbl[i].stat != -1; i++) { 120 + if (nfs4_errtbl[i].stat == stat) 121 + return nfs4_errtbl[i].errno; 122 + } 123 + if (stat <= 10000 || stat > 10100) { 124 + /* The server is looney tunes. */ 125 + return -EREMOTEIO; 126 + } 127 + /* If we cannot translate the error, the recovery routines should 128 + * handle it. 129 + * Note: remaining NFSv4 error codes have values > 10000, so should 130 + * not conflict with native Linux error codes. 131 + */ 132 + return -stat; 133 + } 134 + EXPORT_SYMBOL_GPL(nfs4_stat_to_errno);
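Both translation functions moved into this file walk a table that ends in a `{ -1, -EIO }` sentinel, and the sentinel's own errno doubles as the "not found" result. A trimmed userspace sketch of that lookup is given below. The table holds only a few entries, the numeric status values are the NFSv3 protocol numbers for OK, PERM, NOENT and IO, and the field is named `err` because `errno` is a macro in userspace; `stat_to_errno()` is an illustrative name.

```c
#include <errno.h>
#include <stddef.h>

/* Trimmed status-to-errno table; the last row is the sentinel. */
static const struct {
	int stat;
	int err;
} errtbl[] = {
	{ 0, 0 },		/* NFS_OK */
	{ 1, -EPERM },		/* NFSERR_PERM */
	{ 2, -ENOENT },		/* NFSERR_NOENT */
	{ 5, -EIO },		/* NFSERR_IO */
	{ -1, -EIO },		/* sentinel: default for unknown codes */
};

static int stat_to_errno(int status)
{
	size_t i;

	/* Stop at the sentinel; a match returns early. */
	for (i = 0; errtbl[i].stat != -1; i++) {
		if (errtbl[i].stat == status)
			return errtbl[i].err;
	}
	/* i now indexes the sentinel row, whose err is the fallback. */
	return errtbl[i].err;
}
```

The NFSv4 variant in the diff adds one step after the loop: codes outside the range (10000, 10100] become `-EREMOTEIO`, and anything else is returned negated so the recovery code can see the raw NFSv4 error.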
+172
fs/nfs_common/nfslocalio.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com> 4 + * Copyright (C) 2024 NeilBrown <neilb@suse.de> 5 + */ 6 + 7 + #include <linux/module.h> 8 + #include <linux/rculist.h> 9 + #include <linux/nfslocalio.h> 10 + #include <net/netns/generic.h> 11 + 12 + MODULE_LICENSE("GPL"); 13 + MODULE_DESCRIPTION("NFS localio protocol bypass support"); 14 + 15 + static DEFINE_SPINLOCK(nfs_uuid_lock); 16 + 17 + /* 18 + * Global list of nfs_uuid_t instances 19 + * that is protected by nfs_uuid_lock. 20 + */ 21 + static LIST_HEAD(nfs_uuids); 22 + 23 + void nfs_uuid_begin(nfs_uuid_t *nfs_uuid) 24 + { 25 + nfs_uuid->net = NULL; 26 + nfs_uuid->dom = NULL; 27 + uuid_gen(&nfs_uuid->uuid); 28 + 29 + spin_lock(&nfs_uuid_lock); 30 + list_add_tail_rcu(&nfs_uuid->list, &nfs_uuids); 31 + spin_unlock(&nfs_uuid_lock); 32 + } 33 + EXPORT_SYMBOL_GPL(nfs_uuid_begin); 34 + 35 + void nfs_uuid_end(nfs_uuid_t *nfs_uuid) 36 + { 37 + if (nfs_uuid->net == NULL) { 38 + spin_lock(&nfs_uuid_lock); 39 + list_del_init(&nfs_uuid->list); 40 + spin_unlock(&nfs_uuid_lock); 41 + } 42 + } 43 + EXPORT_SYMBOL_GPL(nfs_uuid_end); 44 + 45 + static nfs_uuid_t * nfs_uuid_lookup_locked(const uuid_t *uuid) 46 + { 47 + nfs_uuid_t *nfs_uuid; 48 + 49 + list_for_each_entry(nfs_uuid, &nfs_uuids, list) 50 + if (uuid_equal(&nfs_uuid->uuid, uuid)) 51 + return nfs_uuid; 52 + 53 + return NULL; 54 + } 55 + 56 + static struct module *nfsd_mod; 57 + 58 + void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list, 59 + struct net *net, struct auth_domain *dom, 60 + struct module *mod) 61 + { 62 + nfs_uuid_t *nfs_uuid; 63 + 64 + spin_lock(&nfs_uuid_lock); 65 + nfs_uuid = nfs_uuid_lookup_locked(uuid); 66 + if (nfs_uuid) { 67 + kref_get(&dom->ref); 68 + nfs_uuid->dom = dom; 69 + /* 70 + * We don't hold a ref on the net, but instead put 71 + * ourselves on a list so the net pointer can be 72 + * invalidated. 
73 + */ 74 + list_move(&nfs_uuid->list, list); 75 + rcu_assign_pointer(nfs_uuid->net, net); 76 + 77 + __module_get(mod); 78 + nfsd_mod = mod; 79 + } 80 + spin_unlock(&nfs_uuid_lock); 81 + } 82 + EXPORT_SYMBOL_GPL(nfs_uuid_is_local); 83 + 84 + static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid) 85 + { 86 + if (nfs_uuid->net) { 87 + module_put(nfsd_mod); 88 + nfs_uuid->net = NULL; 89 + } 90 + if (nfs_uuid->dom) { 91 + auth_domain_put(nfs_uuid->dom); 92 + nfs_uuid->dom = NULL; 93 + } 94 + list_del_init(&nfs_uuid->list); 95 + } 96 + 97 + void nfs_uuid_invalidate_clients(struct list_head *list) 98 + { 99 + nfs_uuid_t *nfs_uuid, *tmp; 100 + 101 + spin_lock(&nfs_uuid_lock); 102 + list_for_each_entry_safe(nfs_uuid, tmp, list, list) 103 + nfs_uuid_put_locked(nfs_uuid); 104 + spin_unlock(&nfs_uuid_lock); 105 + } 106 + EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_clients); 107 + 108 + void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid) 109 + { 110 + if (nfs_uuid->net) { 111 + spin_lock(&nfs_uuid_lock); 112 + nfs_uuid_put_locked(nfs_uuid); 113 + spin_unlock(&nfs_uuid_lock); 114 + } 115 + } 116 + EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client); 117 + 118 + struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid, 119 + struct rpc_clnt *rpc_clnt, const struct cred *cred, 120 + const struct nfs_fh *nfs_fh, const fmode_t fmode) 121 + { 122 + struct net *net; 123 + struct nfsd_file *localio; 124 + 125 + /* 126 + * Not running in nfsd context, so must safely get reference on nfsd_serv. 127 + * But the server may already be shutting down, if so disallow new localio. 128 + * uuid->net is NOT a counted reference, but rcu_read_lock() ensures that 129 + * if uuid->net is not NULL, then calling nfsd_serv_try_get() is safe 130 + * and if it succeeds we will have an implied reference to the net. 131 + * 132 + * Otherwise NFS may not have ref on NFSD and therefore cannot safely 133 + * make 'nfs_to' calls. 
134 + */ 135 + rcu_read_lock(); 136 + net = rcu_dereference(uuid->net); 137 + if (!net || !nfs_to->nfsd_serv_try_get(net)) { 138 + rcu_read_unlock(); 139 + return ERR_PTR(-ENXIO); 140 + } 141 + rcu_read_unlock(); 142 + /* We have an implied reference to net thanks to nfsd_serv_try_get */ 143 + localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt, 144 + cred, nfs_fh, fmode); 145 + if (IS_ERR(localio)) 146 + nfs_to->nfsd_serv_put(net); 147 + return localio; 148 + } 149 + EXPORT_SYMBOL_GPL(nfs_open_local_fh); 150 + 151 + /* 152 + * The NFS LOCALIO code needs to call into NFSD using various symbols, 153 + * but cannot be statically linked, because that will make the NFS 154 + * module always depend on the NFSD module. 155 + * 156 + * 'nfs_to' provides NFS access to NFSD functions needed for LOCALIO, 157 + * its lifetime is tightly coupled to the NFSD module and will always 158 + * be available to NFS LOCALIO because any successful client<->server 159 + * LOCALIO handshake results in a reference on the NFSD module (above), 160 + * so NFS implicitly holds a reference to the NFSD module and its 161 + * functions in the 'nfs_to' nfsd_localio_operations cannot disappear. 162 + * 163 + * If the last NFS client using LOCALIO disconnects (and its reference 164 + * on NFSD dropped) then NFSD could be unloaded, resulting in 'nfs_to' 165 + * functions being invalid pointers. But if NFSD isn't loaded then NFS 166 + * will not be able to handshake with NFSD and will have no cause to 167 + * try to call 'nfs_to' function pointers. If/when NFSD is reloaded it 168 + * will reinitialize the 'nfs_to' function pointers and make LOCALIO 169 + * possible. 170 + */ 171 + const struct nfsd_localio_operations *nfs_to; 172 + EXPORT_SYMBOL_GPL(nfs_to);
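The closing comment in this file explains why NFS reaches NFSD through the `nfs_to` pointer instead of linking against it: the provider module publishes a table of function pointers, and the consumer only calls through it once a handshake has succeeded. A simplified userspace sketch of that indirection follows. All names here (`struct localio_ops`, `ops_table`, `provider_*`, `consumer_open()`) are hypothetical, and the explicit NULL check on the table is an addition for the sketch; the kernel code relies on the module reference described in the comment instead.

```c
#include <errno.h>
#include <stddef.h>

/* Operations the provider exposes to the consumer. */
struct localio_ops {
	int  (*serv_try_get)(void *net);	/* non-zero on success */
	void (*serv_put)(void *net);
};

/* Published by the provider; NULL while the provider is not loaded. */
static const struct localio_ops *ops_table;

static int provider_try_get(void *net) { return net != NULL; }
static void provider_put(void *net) { (void)net; }

static const struct localio_ops provider_ops = {
	.serv_try_get	= provider_try_get,
	.serv_put	= provider_put,
};

static void provider_load(void)   { ops_table = &provider_ops; }
static void provider_unload(void) { ops_table = NULL; }

/*
 * Consumer side: refuse with -ENXIO unless a namespace is known and a
 * service reference can be taken, then release it when done.
 */
static int consumer_open(void *net)
{
	if (!net || !ops_table || !ops_table->serv_try_get(net))
		return -ENXIO;
	/* ... the local open would happen here ... */
	ops_table->serv_put(net);
	return 0;
}
```

The design point is that the consumer has no link-time dependency on the provider: it compiles and loads with `ops_table` unset and simply fails the open until the provider registers.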
+1
fs/nfsd/Kconfig
@@ -7,6 +7,7 @@
 	select LOCKD
 	select SUNRPC
 	select EXPORTFS
+	select NFS_COMMON
 	select NFS_ACL_SUPPORT if NFSD_V2_ACL
 	select NFS_ACL_SUPPORT if NFSD_V3_ACL
 	depends on MULTIUSER
+1
fs/nfsd/Makefile
@@ -23,3 +23,4 @@
 nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
 nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
 nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
+nfsd-$(CONFIG_NFS_LOCALIO) += localio.o
+25 -5
fs/nfsd/export.c
··· 1074 1074 return exp; 1075 1075 } 1076 1076 1077 + /** 1078 + * check_nfsd_access - check if access to export is allowed. 1079 + * @exp: svc_export that is being accessed. 1080 + * @rqstp: svc_rqst attempting to access @exp (will be NULL for LOCALIO). 1081 + * 1082 + * Return values: 1083 + * %nfs_ok if access is granted, or 1084 + * %nfserr_wrongsec if access is denied 1085 + */ 1077 1086 __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp) 1078 1087 { 1079 1088 struct exp_flavor_info *f, *end = exp->ex_flavors + exp->ex_nflavors; 1080 - struct svc_xprt *xprt = rqstp->rq_xprt; 1089 + struct svc_xprt *xprt; 1090 + 1091 + /* 1092 + * If rqstp is NULL, this is a LOCALIO request which will only 1093 + * ever use a filehandle/credential pair for which access has 1094 + * been affirmed (by ACCESS or OPEN NFS requests) over the 1095 + * wire. So there is no need for further checks here. 1096 + */ 1097 + if (!rqstp) 1098 + return nfs_ok; 1099 + 1100 + xprt = rqstp->rq_xprt; 1081 1101 1082 1102 if (exp->ex_xprtsec_modes & NFSEXP_XPRTSEC_NONE) { 1083 1103 if (!test_bit(XPT_TLS_SESSION, &xprt->xpt_flags)) ··· 1118 1098 ok: 1119 1099 /* legacy gss-only clients are always OK: */ 1120 1100 if (exp->ex_client == rqstp->rq_gssclient) 1121 - return 0; 1101 + return nfs_ok; 1122 1102 /* ip-address based client; check sec= export option: */ 1123 1103 for (f = exp->ex_flavors; f < end; f++) { 1124 1104 if (f->pseudoflavor == rqstp->rq_cred.cr_flavor) 1125 - return 0; 1105 + return nfs_ok; 1126 1106 } 1127 1107 /* defaults in absence of sec= options: */ 1128 1108 if (exp->ex_nflavors == 0) { 1129 1109 if (rqstp->rq_cred.cr_flavor == RPC_AUTH_NULL || 1130 1110 rqstp->rq_cred.cr_flavor == RPC_AUTH_UNIX) 1131 - return 0; 1111 + return nfs_ok; 1132 1112 } 1133 1113 1134 1114 /* If the compound op contains a spo_must_allowed op, ··· 1138 1118 */ 1139 1119 1140 1120 if (nfsd4_spo_must_allow(rqstp)) 1141 - return 0; 1121 + return nfs_ok; 1142 1122 1143 1123 denied: 
1144 1124 return nfserr_wrongsec;
+93 -8
fs/nfsd/filecache.c
··· 52 52 #define NFSD_FILE_CACHE_UP (0) 53 53 54 54 /* We only care about NFSD_MAY_READ/WRITE for this cache */ 55 - #define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) 55 + #define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO) 56 56 57 57 static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); 58 58 static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); ··· 388 388 } 389 389 if (refcount_dec_and_test(&nf->nf_ref)) 390 390 nfsd_file_free(nf); 391 + } 392 + 393 + /** 394 + * nfsd_file_put_local - put the reference to nfsd_file and local nfsd_serv 395 + * @nf: nfsd_file of which to put the references 396 + * 397 + * First put the reference of the nfsd_file and then put the 398 + * reference to the associated nn->nfsd_serv. 399 + */ 400 + void 401 + nfsd_file_put_local(struct nfsd_file *nf) 402 + { 403 + struct net *net = nf->nf_net; 404 + 405 + nfsd_file_put(nf); 406 + nfsd_serv_put(net); 407 + } 408 + 409 + /** 410 + * nfsd_file_file - get the backing file of an nfsd_file 411 + * @nf: nfsd_file of which to access the backing file. 412 + * 413 + * Return backing file for @nf. 
414 + */ 415 + struct file * 416 + nfsd_file_file(struct nfsd_file *nf) 417 + { 418 + return nf->nf_file; 391 419 } 392 420 393 421 static void ··· 1010 982 } 1011 983 1012 984 static __be32 1013 - nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, 985 + nfsd_file_do_acquire(struct svc_rqst *rqstp, struct net *net, 986 + struct svc_cred *cred, 987 + struct auth_domain *client, 988 + struct svc_fh *fhp, 1014 989 unsigned int may_flags, struct file *file, 1015 990 struct nfsd_file **pnf, bool want_gc) 1016 991 { 1017 992 unsigned char need = may_flags & NFSD_FILE_MAY_MASK; 1018 - struct net *net = SVC_NET(rqstp); 1019 993 struct nfsd_file *new, *nf; 1020 994 bool stale_retry = true; 1021 995 bool open_retry = true; ··· 1026 996 int ret; 1027 997 1028 998 retry: 1029 - status = fh_verify(rqstp, fhp, S_IFREG, 1030 - may_flags|NFSD_MAY_OWNER_OVERRIDE); 999 + if (rqstp) { 1000 + status = fh_verify(rqstp, fhp, S_IFREG, 1001 + may_flags|NFSD_MAY_OWNER_OVERRIDE); 1002 + } else { 1003 + status = fh_verify_local(net, cred, client, fhp, S_IFREG, 1004 + may_flags|NFSD_MAY_OWNER_OVERRIDE); 1005 + } 1031 1006 if (status != nfs_ok) 1032 1007 return status; 1033 1008 inode = d_inode(fhp->fh_dentry); ··· 1178 1143 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, 1179 1144 unsigned int may_flags, struct nfsd_file **pnf) 1180 1145 { 1181 - return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, true); 1146 + return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL, 1147 + fhp, may_flags, NULL, pnf, true); 1182 1148 } 1183 1149 1184 1150 /** ··· 1203 1167 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, 1204 1168 unsigned int may_flags, struct nfsd_file **pnf) 1205 1169 { 1206 - return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, false); 1170 + return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL, 1171 + fhp, may_flags, NULL, pnf, false); 1172 + } 1173 + 1174 + /** 1175 + * nfsd_file_acquire_local - Get a struct 
nfsd_file with an open file for localio 1176 + * @net: The network namespace in which to perform a lookup 1177 + * @cred: the user credential with which to validate access 1178 + * @client: the auth_domain for LOCALIO lookup 1179 + * @fhp: the NFS filehandle of the file to be opened 1180 + * @may_flags: NFSD_MAY_ settings for the file 1181 + * @pnf: OUT: new or found "struct nfsd_file" object 1182 + * 1183 + * This file lookup interface provide access to a file given the 1184 + * filehandle and credential. No connection-based authorisation 1185 + * is performed and in that way it is quite different to other 1186 + * file access mediated by nfsd. It allows a kernel module such as the NFS 1187 + * client to reach across network and filesystem namespaces to access 1188 + * a file. The security implications of this should be carefully 1189 + * considered before use. 1190 + * 1191 + * The nfsd_file object returned by this API is reference-counted 1192 + * and garbage-collected. The object is retained for a few 1193 + * seconds after the final nfsd_file_put() in case the caller 1194 + * wants to re-use it. 1195 + * 1196 + * Return values: 1197 + * %nfs_ok - @pnf points to an nfsd_file with its reference 1198 + * count boosted. 1199 + * 1200 + * On error, an nfsstat value in network byte order is returned. 1201 + */ 1202 + __be32 1203 + nfsd_file_acquire_local(struct net *net, struct svc_cred *cred, 1204 + struct auth_domain *client, struct svc_fh *fhp, 1205 + unsigned int may_flags, struct nfsd_file **pnf) 1206 + { 1207 + /* 1208 + * Save creds before calling nfsd_file_do_acquire() (which calls 1209 + * nfsd_setuser). Important because caller (LOCALIO) is from 1210 + * client context. 
1211 + */ 1212 + const struct cred *save_cred = get_current_cred(); 1213 + __be32 beres; 1214 + 1215 + beres = nfsd_file_do_acquire(NULL, net, cred, client, 1216 + fhp, may_flags, NULL, pnf, true); 1217 + revert_creds(save_cred); 1218 + return beres; 1207 1219 } 1208 1220 1209 1221 /** ··· 1277 1193 unsigned int may_flags, struct file *file, 1278 1194 struct nfsd_file **pnf) 1279 1195 { 1280 - return nfsd_file_do_acquire(rqstp, fhp, may_flags, file, pnf, false); 1196 + return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL, 1197 + fhp, may_flags, file, pnf, false); 1281 1198 } 1282 1199 1283 1200 /*
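Review note: the hunks above turn `nfsd_file_do_acquire()` into a function that either takes an RPC request context or, when `rqstp` is NULL, relies on explicitly passed `net`, `cred` and `client` arguments. A minimal userspace sketch of that dispatch shape follows. Every `sketch_*` name and the integer fields are hypothetical stand-ins rather than kernel types, and the real function does far more than choose a verification path.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct svc_rqst: carries its own context. */
struct sketch_rqst {
	int net_id;
	int uid;
};

enum sketch_path { SKETCH_VIA_RPC, SKETCH_VIA_LOCAL };

struct sketch_result {
	enum sketch_path path;
	int net_id;
	int uid;
};

/* Analogue of fh_verify(): everything is derived from the request. */
struct sketch_result sketch_verify(const struct sketch_rqst *rq)
{
	struct sketch_result r = { SKETCH_VIA_RPC, rq->net_id, rq->uid };
	return r;
}

/* Analogue of fh_verify_local(): the caller supplies the context. */
struct sketch_result sketch_verify_local(int net_id, int uid)
{
	struct sketch_result r = { SKETCH_VIA_LOCAL, net_id, uid };
	return r;
}

/* Analogue of the retry: block in nfsd_file_do_acquire(). */
struct sketch_result sketch_do_acquire(const struct sketch_rqst *rq,
				       int net_id, int uid)
{
	if (rq)
		return sketch_verify(rq);
	return sketch_verify_local(net_id, uid);
}
```

Keeping one worker and passing NULL for the request context means the existing RPC entry points (`nfsd_file_acquire()`, `nfsd_file_acquire_gc()`, `nfsd_file_acquire_opened()`) only gain extra arguments, while the LOCALIO entry point reuses the same cache logic.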
+5
fs/nfsd/filecache.h
··· 55 55 int nfsd_file_cache_start_net(struct net *net); 56 56 void nfsd_file_cache_shutdown_net(struct net *net); 57 57 void nfsd_file_put(struct nfsd_file *nf); 58 + void nfsd_file_put_local(struct nfsd_file *nf); 58 59 struct nfsd_file *nfsd_file_get(struct nfsd_file *nf); 60 + struct file *nfsd_file_file(struct nfsd_file *nf); 59 61 void nfsd_file_close_inode_sync(struct inode *inode); 60 62 void nfsd_file_net_dispose(struct nfsd_net *nn); 61 63 bool nfsd_file_is_cached(struct inode *inode); ··· 68 66 __be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, 69 67 unsigned int may_flags, struct file *file, 70 68 struct nfsd_file **nfp); 69 + __be32 nfsd_file_acquire_local(struct net *net, struct svc_cred *cred, 70 + struct auth_domain *client, struct svc_fh *fhp, 71 + unsigned int may_flags, struct nfsd_file **pnf); 71 72 int nfsd_file_cache_stats_show(struct seq_file *m, void *v); 72 73 #endif /* _FS_NFSD_FILECACHE_H */
+169
fs/nfsd/localio.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * NFS server support for local clients to bypass network stack 4 + * 5 + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com> 6 + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com> 7 + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com> 8 + * Copyright (C) 2024 NeilBrown <neilb@suse.de> 9 + */ 10 + 11 + #include <linux/exportfs.h> 12 + #include <linux/sunrpc/svcauth.h> 13 + #include <linux/sunrpc/clnt.h> 14 + #include <linux/nfs.h> 15 + #include <linux/nfs_common.h> 16 + #include <linux/nfslocalio.h> 17 + #include <linux/nfs_fs.h> 18 + #include <linux/nfs_xdr.h> 19 + #include <linux/string.h> 20 + 21 + #include "nfsd.h" 22 + #include "vfs.h" 23 + #include "netns.h" 24 + #include "filecache.h" 25 + #include "cache.h" 26 + 27 + static const struct nfsd_localio_operations nfsd_localio_ops = { 28 + .nfsd_serv_try_get = nfsd_serv_try_get, 29 + .nfsd_serv_put = nfsd_serv_put, 30 + .nfsd_open_local_fh = nfsd_open_local_fh, 31 + .nfsd_file_put_local = nfsd_file_put_local, 32 + .nfsd_file_file = nfsd_file_file, 33 + }; 34 + 35 + void nfsd_localio_ops_init(void) 36 + { 37 + nfs_to = &nfsd_localio_ops; 38 + } 39 + 40 + /** 41 + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file 42 + * 43 + * @net: 'struct net' to get the proper nfsd_net required for LOCALIO access 44 + * @dom: 'struct auth_domain' required for LOCALIO access 45 + * @rpc_clnt: rpc_clnt that the client established 46 + * @cred: cred that the client established 47 + * @nfs_fh: filehandle to lookup 48 + * @fmode: fmode_t to use for open 49 + * 50 + * This function maps a local fh to a path on a local filesystem. 51 + * This is useful when the nfs client has the local server mounted - it can 52 + * avoid all the NFS overhead with reads, writes and commits. 53 + * 54 + * On successful return, returned nfsd_file will have its nf_net member 55 + * set. 
Caller (NFS client) is responsible for calling nfsd_serv_put and 56 + * nfsd_file_put (via nfs_to->nfsd_file_put_local). 57 + */ 58 + struct nfsd_file * 59 + nfsd_open_local_fh(struct net *net, struct auth_domain *dom, 60 + struct rpc_clnt *rpc_clnt, const struct cred *cred, 61 + const struct nfs_fh *nfs_fh, const fmode_t fmode) 62 + { 63 + int mayflags = NFSD_MAY_LOCALIO; 64 + struct svc_cred rq_cred; 65 + struct svc_fh fh; 66 + struct nfsd_file *localio; 67 + __be32 beres; 68 + 69 + if (nfs_fh->size > NFS4_FHSIZE) 70 + return ERR_PTR(-EINVAL); 71 + 72 + /* nfs_fh -> svc_fh */ 73 + fh_init(&fh, NFS4_FHSIZE); 74 + fh.fh_handle.fh_size = nfs_fh->size; 75 + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size); 76 + 77 + if (fmode & FMODE_READ) 78 + mayflags |= NFSD_MAY_READ; 79 + if (fmode & FMODE_WRITE) 80 + mayflags |= NFSD_MAY_WRITE; 81 + 82 + svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred); 83 + 84 + beres = nfsd_file_acquire_local(net, &rq_cred, dom, 85 + &fh, mayflags, &localio); 86 + if (beres) 87 + localio = ERR_PTR(nfs_stat_to_errno(be32_to_cpu(beres))); 88 + 89 + fh_put(&fh); 90 + if (rq_cred.cr_group_info) 91 + put_group_info(rq_cred.cr_group_info); 92 + 93 + return localio; 94 + } 95 + EXPORT_SYMBOL_GPL(nfsd_open_local_fh); 96 + 97 + /* 98 + * UUID_IS_LOCAL XDR functions 99 + */ 100 + 101 + static __be32 localio_proc_null(struct svc_rqst *rqstp) 102 + { 103 + return rpc_success; 104 + } 105 + 106 + struct localio_uuidarg { 107 + uuid_t uuid; 108 + }; 109 + 110 + static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp) 111 + { 112 + struct localio_uuidarg *argp = rqstp->rq_argp; 113 + struct net *net = SVC_NET(rqstp); 114 + struct nfsd_net *nn = net_generic(net, nfsd_net_id); 115 + 116 + nfs_uuid_is_local(&argp->uuid, &nn->local_clients, 117 + net, rqstp->rq_client, THIS_MODULE); 118 + 119 + return rpc_success; 120 + } 121 + 122 + static bool localio_decode_uuidarg(struct svc_rqst *rqstp, 123 + struct xdr_stream *xdr) 124 + { 125 + 
struct localio_uuidarg *argp = rqstp->rq_argp; 126 + u8 uuid[UUID_SIZE]; 127 + 128 + if (decode_opaque_fixed(xdr, uuid, UUID_SIZE)) 129 + return false; 130 + import_uuid(&argp->uuid, uuid); 131 + 132 + return true; 133 + } 134 + 135 + static const struct svc_procedure localio_procedures1[] = { 136 + [LOCALIOPROC_NULL] = { 137 + .pc_func = localio_proc_null, 138 + .pc_decode = nfssvc_decode_voidarg, 139 + .pc_encode = nfssvc_encode_voidres, 140 + .pc_argsize = sizeof(struct nfsd_voidargs), 141 + .pc_ressize = sizeof(struct nfsd_voidres), 142 + .pc_cachetype = RC_NOCACHE, 143 + .pc_xdrressize = 0, 144 + .pc_name = "NULL", 145 + }, 146 + [LOCALIOPROC_UUID_IS_LOCAL] = { 147 + .pc_func = localio_proc_uuid_is_local, 148 + .pc_decode = localio_decode_uuidarg, 149 + .pc_encode = nfssvc_encode_voidres, 150 + .pc_argsize = sizeof(struct localio_uuidarg), 151 + .pc_argzero = sizeof(struct localio_uuidarg), 152 + .pc_ressize = sizeof(struct nfsd_voidres), 153 + .pc_cachetype = RC_NOCACHE, 154 + .pc_name = "UUID_IS_LOCAL", 155 + }, 156 + }; 157 + 158 + #define LOCALIO_NR_PROCEDURES ARRAY_SIZE(localio_procedures1) 159 + static DEFINE_PER_CPU_ALIGNED(unsigned long, 160 + localio_count[LOCALIO_NR_PROCEDURES]); 161 + const struct svc_version localio_version1 = { 162 + .vs_vers = 1, 163 + .vs_nproc = LOCALIO_NR_PROCEDURES, 164 + .vs_proc = localio_procedures1, 165 + .vs_dispatch = nfsd_dispatch, 166 + .vs_count = localio_count, 167 + .vs_xdrsize = XDR_QUADLEN(UUID_SIZE), 168 + .vs_hidden = true, 169 + };
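Review note: `nfsd_open_local_fh()` above starts from `NFSD_MAY_LOCALIO` and ORs in read/write permission bits depending on the caller's `fmode`. The sketch below isolates that mapping so it can be exercised in userspace. Only the `0x2000` value is taken from this diff (fs/nfsd/vfs.h); the other numeric values are assumptions made for illustration, and the `SKETCH_*` names are hypothetical.

```c
/* Assumed values, for illustration only. */
#define SKETCH_FMODE_READ	0x1u
#define SKETCH_FMODE_WRITE	0x2u
#define SKETCH_MAY_WRITE	0x002u
#define SKETCH_MAY_READ		0x004u
/* Matches NFSD_MAY_LOCALIO as added to fs/nfsd/vfs.h in this merge. */
#define SKETCH_MAY_LOCALIO	0x2000u

/*
 * Translate an open mode into a may_flags mask. The LOCALIO bit is
 * always set so that the access can be told apart in tracing and in
 * the file cache key (NFSD_FILE_MAY_MASK now includes it).
 */
unsigned int sketch_fmode_to_mayflags(unsigned int fmode)
{
	unsigned int mayflags = SKETCH_MAY_LOCALIO;

	if (fmode & SKETCH_FMODE_READ)
		mayflags |= SKETCH_MAY_READ;
	if (fmode & SKETCH_FMODE_WRITE)
		mayflags |= SKETCH_MAY_WRITE;
	return mayflags;
}
```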
+11 -1
fs/nfsd/netns.h
··· 13 13 #include <linux/filelock.h> 14 14 #include <linux/nfs4.h> 15 15 #include <linux/percpu_counter.h> 16 + #include <linux/percpu-refcount.h> 16 17 #include <linux/siphash.h> 17 18 #include <linux/sunrpc/stats.h> 18 19 ··· 140 139 141 140 struct svc_info nfsd_info; 142 141 #define nfsd_serv nfsd_info.serv 143 - 142 + struct percpu_ref nfsd_serv_ref; 143 + struct completion nfsd_serv_confirm_done; 144 + struct completion nfsd_serv_free_done; 144 145 145 146 /* 146 147 * clientid and stateid data for construction of net unique COPY ··· 217 214 /* last time an admin-revoke happened for NFSv4.0 */ 218 215 time64_t nfs40_last_revoke; 219 216 217 + #if IS_ENABLED(CONFIG_NFS_LOCALIO) 218 + /* Local clients to be invalidated when net is shut down */ 219 + struct list_head local_clients; 220 + #endif 220 221 }; 221 222 222 223 /* Simple check to find out if a given net was properly initialized */ ··· 228 221 229 222 extern bool nfsd_support_version(int vers); 230 223 extern unsigned int nfsd_net_id; 224 + 225 + bool nfsd_serv_try_get(struct net *net); 226 + void nfsd_serv_put(struct net *net); 231 227 232 228 void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn); 233 229 void nfsd_reset_write_verifier(struct nfsd_net *nn);
+25 -2
fs/nfsd/nfsctl.c
··· 18 18 #include <linux/sunrpc/svc.h> 19 19 #include <linux/module.h> 20 20 #include <linux/fsnotify.h> 21 + #include <linux/nfslocalio.h> 21 22 22 23 #include "idmap.h" 23 24 #include "nfsd.h" ··· 2247 2246 if (retval) 2248 2247 goto out_repcache_error; 2249 2248 memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats)); 2250 - nn->nfsd_svcstats.program = &nfsd_program; 2249 + nn->nfsd_svcstats.program = &nfsd_programs[0]; 2251 2250 for (i = 0; i < sizeof(nn->nfsd_versions); i++) 2252 2251 nn->nfsd_versions[i] = nfsd_support_version(i); 2253 2252 for (i = 0; i < sizeof(nn->nfsd4_minorversions); i++) ··· 2258 2257 get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); 2259 2258 seqlock_init(&nn->writeverf_lock); 2260 2259 nfsd_proc_stat_init(net); 2261 - 2260 + #if IS_ENABLED(CONFIG_NFS_LOCALIO) 2261 + INIT_LIST_HEAD(&nn->local_clients); 2262 + #endif 2262 2263 return 0; 2263 2264 2264 2265 out_repcache_error: ··· 2270 2267 out_export_error: 2271 2268 return retval; 2272 2269 } 2270 + 2271 + #if IS_ENABLED(CONFIG_NFS_LOCALIO) 2272 + /** 2273 + * nfsd_net_pre_exit - Disconnect localio clients from net namespace 2274 + * @net: a network namespace that is about to be destroyed 2275 + * 2276 + * This invalidated ->net pointers held by localio clients 2277 + * while they can still safely access nn->counter. 
2278 + */ 2279 + static __net_exit void nfsd_net_pre_exit(struct net *net) 2280 + { 2281 + struct nfsd_net *nn = net_generic(net, nfsd_net_id); 2282 + 2283 + nfs_uuid_invalidate_clients(&nn->local_clients); 2284 + } 2285 + #endif 2273 2286 2274 2287 /** 2275 2288 * nfsd_net_exit - Release the nfsd_net portion of a net namespace ··· 2304 2285 2305 2286 static struct pernet_operations nfsd_net_ops = { 2306 2287 .init = nfsd_net_init, 2288 + #if IS_ENABLED(CONFIG_NFS_LOCALIO) 2289 + .pre_exit = nfsd_net_pre_exit, 2290 + #endif 2307 2291 .exit = nfsd_net_exit, 2308 2292 .id = &nfsd_net_id, 2309 2293 .size = sizeof(struct nfsd_net), ··· 2344 2322 retval = genl_register_family(&nfsd_nl_family); 2345 2323 if (retval) 2346 2324 goto out_free_all; 2325 + nfsd_localio_ops_init(); 2347 2326 2348 2327 return 0; 2349 2328 out_free_all:
+5 -1
fs/nfsd/nfsd.h
@@ -85,7 +85,7 @@
 	u32 rq_opnum[NFSD_MAX_OPS_PER_COMPOUND];
 };
 
-extern struct svc_program nfsd_program;
+extern struct svc_program nfsd_programs[];
 extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4;
 extern struct mutex nfsd_mutex;
 extern spinlock_t nfsd_drc_lock;
@@ -144,6 +144,10 @@
 #else
 #define nfsd_acl_version3 NULL
 #endif
+#endif
+
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+extern const struct svc_version localio_version1;
 #endif
 
 struct nfsd_net;
+140 -87
fs/nfsd/nfsfh.c
··· 87 87 return nfserr_wrong_type; 88 88 } 89 89 90 - static bool nfsd_originating_port_ok(struct svc_rqst *rqstp, int flags) 90 + static bool nfsd_originating_port_ok(struct svc_rqst *rqstp, 91 + struct svc_cred *cred, 92 + struct svc_export *exp) 91 93 { 92 - if (flags & NFSEXP_INSECURE_PORT) 94 + if (nfsexp_flags(cred, exp) & NFSEXP_INSECURE_PORT) 93 95 return true; 94 96 /* We don't require gss requests to use low ports: */ 95 - if (rqstp->rq_cred.cr_flavor >= RPC_AUTH_GSS) 97 + if (cred->cr_flavor >= RPC_AUTH_GSS) 96 98 return true; 97 99 return test_bit(RQ_SECURE, &rqstp->rq_flags); 98 100 } 99 101 100 102 static __be32 nfsd_setuser_and_check_port(struct svc_rqst *rqstp, 103 + struct svc_cred *cred, 101 104 struct svc_export *exp) 102 105 { 103 - int flags = nfsexp_flags(&rqstp->rq_cred, exp); 104 - 105 106 /* Check if the request originated from a secure port. */ 106 - if (!nfsd_originating_port_ok(rqstp, flags)) { 107 + if (rqstp && !nfsd_originating_port_ok(rqstp, cred, exp)) { 107 108 RPC_IFDEBUG(char buf[RPC_MAX_ADDRBUFLEN]); 108 109 dprintk("nfsd: request from insecure port %s!\n", 109 110 svc_print_addr(rqstp, buf, sizeof(buf))); ··· 112 111 } 113 112 114 113 /* Set user creds for this exportpoint */ 115 - return nfserrno(nfsd_setuser(&rqstp->rq_cred, exp)); 114 + return nfserrno(nfsd_setuser(cred, exp)); 116 115 } 117 116 118 117 static inline __be32 check_pseudo_root(struct dentry *dentry, ··· 142 141 * dentry. On success, the results are used to set fh_export and 143 142 * fh_dentry. 
144 143 */ 145 - static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) 144 + static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net, 145 + struct svc_cred *cred, 146 + struct auth_domain *client, 147 + struct auth_domain *gssclient, 148 + struct svc_fh *fhp) 146 149 { 147 150 struct knfsd_fh *fh = &fhp->fh_handle; 148 151 struct fid *fid = NULL; ··· 188 183 data_left -= len; 189 184 if (data_left < 0) 190 185 return error; 191 - exp = rqst_exp_find(&rqstp->rq_chandle, SVC_NET(rqstp), 192 - rqstp->rq_client, rqstp->rq_gssclient, 186 + exp = rqst_exp_find(rqstp ? &rqstp->rq_chandle : NULL, 187 + net, client, gssclient, 193 188 fh->fh_fsid_type, fh->fh_fsid); 194 189 fid = (struct fid *)(fh->fh_fsid + len); 195 190 ··· 224 219 put_cred(override_creds(new)); 225 220 put_cred(new); 226 221 } else { 227 - error = nfsd_setuser_and_check_port(rqstp, exp); 222 + error = nfsd_setuser_and_check_port(rqstp, cred, exp); 228 223 if (error) 229 224 goto out; 230 225 } ··· 271 266 fhp->fh_dentry = dentry; 272 267 fhp->fh_export = exp; 273 268 274 - switch (rqstp->rq_vers) { 275 - case 4: 269 + switch (fhp->fh_maxsize) { 270 + case NFS4_FHSIZE: 276 271 if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR) 277 272 fhp->fh_no_atomic_attr = true; 278 273 fhp->fh_64bit_cookies = true; 279 274 break; 280 - case 3: 275 + case NFS3_FHSIZE: 281 276 if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC) 282 277 fhp->fh_no_wcc = true; 283 278 fhp->fh_64bit_cookies = true; 284 279 if (exp->ex_flags & NFSEXP_V4ROOT) 285 280 goto out; 286 281 break; 287 - case 2: 282 + case NFS_FHSIZE: 288 283 fhp->fh_no_wcc = true; 289 284 if (EX_WGATHER(exp)) 290 285 fhp->fh_use_wgather = true; ··· 296 291 out: 297 292 exp_put(exp); 298 293 return error; 294 + } 295 + 296 + /** 297 + * __fh_verify - filehandle lookup and access checking 298 + * @rqstp: RPC transaction context, or NULL 299 + * @net: net namespace in which to perform the export lookup 300 + * 
@cred: RPC user credential 301 + * @client: RPC auth domain 302 + * @gssclient: RPC GSS auth domain, or NULL 303 + * @fhp: filehandle to be verified 304 + * @type: expected type of object pointed to by filehandle 305 + * @access: type of access needed to object 306 + * 307 + * See fh_verify() for further descriptions of @fhp, @type, and @access. 308 + */ 309 + static __be32 310 + __fh_verify(struct svc_rqst *rqstp, 311 + struct net *net, struct svc_cred *cred, 312 + struct auth_domain *client, 313 + struct auth_domain *gssclient, 314 + struct svc_fh *fhp, umode_t type, int access) 315 + { 316 + struct nfsd_net *nn = net_generic(net, nfsd_net_id); 317 + struct svc_export *exp = NULL; 318 + struct dentry *dentry; 319 + __be32 error; 320 + 321 + if (!fhp->fh_dentry) { 322 + error = nfsd_set_fh_dentry(rqstp, net, cred, client, 323 + gssclient, fhp); 324 + if (error) 325 + goto out; 326 + } 327 + dentry = fhp->fh_dentry; 328 + exp = fhp->fh_export; 329 + 330 + trace_nfsd_fh_verify(rqstp, fhp, type, access); 331 + 332 + /* 333 + * We still have to do all these permission checks, even when 334 + * fh_dentry is already set: 335 + * - fh_verify may be called multiple times with different 336 + * "access" arguments (e.g. nfsd_proc_create calls 337 + * fh_verify(...,NFSD_MAY_EXEC) first, then later (in 338 + * nfsd_create) calls fh_verify(...,NFSD_MAY_CREATE). 339 + * - in the NFSv4 case, the filehandle may have been filled 340 + * in by fh_compose, and given a dentry, but further 341 + * compound operations performed with that filehandle 342 + * still need permissions checks. In the worst case, a 343 + * mountpoint crossing may have changed the export 344 + * options, and we may now need to use a different uid 345 + * (for example, if different id-squashing options are in 346 + * effect on the new filesystem). 
347 + */ 348 + error = check_pseudo_root(dentry, exp); 349 + if (error) 350 + goto out; 351 + 352 + error = nfsd_setuser_and_check_port(rqstp, cred, exp); 353 + if (error) 354 + goto out; 355 + 356 + error = nfsd_mode_check(dentry, type); 357 + if (error) 358 + goto out; 359 + 360 + /* 361 + * pseudoflavor restrictions are not enforced on NLM, 362 + * which clients virtually always use auth_sys for, 363 + * even while using RPCSEC_GSS for NFS. 364 + */ 365 + if (access & NFSD_MAY_LOCK || access & NFSD_MAY_BYPASS_GSS) 366 + goto skip_pseudoflavor_check; 367 + /* 368 + * Clients may expect to be able to use auth_sys during mount, 369 + * even if they use gss for everything else; see section 2.3.2 370 + * of rfc 2623. 371 + */ 372 + if (access & NFSD_MAY_BYPASS_GSS_ON_ROOT 373 + && exp->ex_path.dentry == dentry) 374 + goto skip_pseudoflavor_check; 375 + 376 + error = check_nfsd_access(exp, rqstp); 377 + if (error) 378 + goto out; 379 + 380 + skip_pseudoflavor_check: 381 + /* Finally, check access permissions. */ 382 + error = nfsd_permission(cred, exp, dentry, access); 383 + out: 384 + trace_nfsd_fh_verify_err(rqstp, fhp, type, access, error); 385 + if (error == nfserr_stale) 386 + nfsd_stats_fh_stale_inc(nn, exp); 387 + return error; 388 + } 389 + 390 + /** 391 + * fh_verify_local - filehandle lookup and access checking 392 + * @net: net namespace in which to perform the export lookup 393 + * @cred: RPC user credential 394 + * @client: RPC auth domain 395 + * @fhp: filehandle to be verified 396 + * @type: expected type of object pointed to by filehandle 397 + * @access: type of access needed to object 398 + * 399 + * This API can be used by callers who do not have an RPC 400 + * transaction context (ie are not running in an nfsd thread). 401 + * 402 + * See fh_verify() for further descriptions of @fhp, @type, and @access. 
403 + */ 404 + __be32 405 + fh_verify_local(struct net *net, struct svc_cred *cred, 406 + struct auth_domain *client, struct svc_fh *fhp, 407 + umode_t type, int access) 408 + { 409 + return __fh_verify(NULL, net, cred, client, NULL, 410 + fhp, type, access); 299 411 } 300 412 301 413 /** ··· 445 323 __be32 446 324 fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) 447 325 { 448 - struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); 449 - struct svc_export *exp = NULL; 450 - struct dentry *dentry; 451 - __be32 error; 452 - 453 - if (!fhp->fh_dentry) { 454 - error = nfsd_set_fh_dentry(rqstp, fhp); 455 - if (error) 456 - goto out; 457 - } 458 - dentry = fhp->fh_dentry; 459 - exp = fhp->fh_export; 460 - 461 - trace_nfsd_fh_verify(rqstp, fhp, type, access); 462 - 463 - /* 464 - * We still have to do all these permission checks, even when 465 - * fh_dentry is already set: 466 - * - fh_verify may be called multiple times with different 467 - * "access" arguments (e.g. nfsd_proc_create calls 468 - * fh_verify(...,NFSD_MAY_EXEC) first, then later (in 469 - * nfsd_create) calls fh_verify(...,NFSD_MAY_CREATE). 470 - * - in the NFSv4 case, the filehandle may have been filled 471 - * in by fh_compose, and given a dentry, but further 472 - * compound operations performed with that filehandle 473 - * still need permissions checks. In the worst case, a 474 - * mountpoint crossing may have changed the export 475 - * options, and we may now need to use a different uid 476 - * (for example, if different id-squashing options are in 477 - * effect on the new filesystem). 
478 - */ 479 - error = check_pseudo_root(dentry, exp); 480 - if (error) 481 - goto out; 482 - 483 - error = nfsd_setuser_and_check_port(rqstp, exp); 484 - if (error) 485 - goto out; 486 - 487 - error = nfsd_mode_check(dentry, type); 488 - if (error) 489 - goto out; 490 - 491 - /* 492 - * pseudoflavor restrictions are not enforced on NLM, 493 - * which clients virtually always use auth_sys for, 494 - * even while using RPCSEC_GSS for NFS. 495 - */ 496 - if (access & NFSD_MAY_LOCK || access & NFSD_MAY_BYPASS_GSS) 497 - goto skip_pseudoflavor_check; 498 - /* 499 - * Clients may expect to be able to use auth_sys during mount, 500 - * even if they use gss for everything else; see section 2.3.2 501 - * of rfc 2623. 502 - */ 503 - if (access & NFSD_MAY_BYPASS_GSS_ON_ROOT 504 - && exp->ex_path.dentry == dentry) 505 - goto skip_pseudoflavor_check; 506 - 507 - error = check_nfsd_access(exp, rqstp); 508 - if (error) 509 - goto out; 510 - 511 - skip_pseudoflavor_check: 512 - /* Finally, check access permissions. */ 513 - error = nfsd_permission(&rqstp->rq_cred, exp, dentry, access); 514 - out: 515 - trace_nfsd_fh_verify_err(rqstp, fhp, type, access, error); 516 - if (error == nfserr_stale) 517 - nfsd_stats_fh_stale_inc(nn, exp); 518 - return error; 326 + return __fh_verify(rqstp, SVC_NET(rqstp), &rqstp->rq_cred, 327 + rqstp->rq_client, rqstp->rq_gssclient, 328 + fhp, type, access); 519 329 } 520 - 521 330 522 331 /* 523 332 * Compose a file handle for an NFS reply.
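Review note: because `rqstp` may now be NULL, `nfsd_set_fh_dentry()` above no longer switches on `rqstp->rq_vers` and instead infers the protocol version from `fhp->fh_maxsize`. The sketch below shows that inference in isolation. The sizes 32, 64 and 128 are the conventional values of `NFS_FHSIZE`, `NFS3_FHSIZE` and `NFS4_FHSIZE`; the `SKETCH_*` names and the helper are hypothetical.

```c
/* Conventional maximum filehandle sizes for NFSv2, NFSv3 and NFSv4. */
#define SKETCH_NFS_FHSIZE	32
#define SKETCH_NFS3_FHSIZE	64
#define SKETCH_NFS4_FHSIZE	128

/*
 * Map the maximum filehandle size a svc_fh was initialised with to
 * the NFS version it implies. Returns -1 for an unrecognised size.
 */
int sketch_version_from_maxsize(int fh_maxsize)
{
	switch (fh_maxsize) {
	case SKETCH_NFS4_FHSIZE:
		return 4;
	case SKETCH_NFS3_FHSIZE:
		return 3;
	case SKETCH_NFS_FHSIZE:
		return 2;
	default:
		return -1;
	}
}
```

Note that `nfsd_open_local_fh()` in this merge calls `fh_init(&fh, NFS4_FHSIZE)`, so a LOCALIO lookup appears to take the NFSv4 branch regardless of which NFS version the client mounted with.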
+2
fs/nfsd/nfsfh.h
@@ -217,6 +217,8 @@
  * Function prototypes
  */
 __be32 fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
+__be32 fh_verify_local(struct net *, struct svc_cred *, struct auth_domain *,
+		       struct svc_fh *, umode_t, int);
 __be32 fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
 __be32 fh_update(struct svc_fh *);
 void fh_put(struct svc_fh *);
+85 -20
fs/nfsd/nfssvc.c
··· 19 19 #include <linux/sunrpc/svc_xprt.h> 20 20 #include <linux/lockd/bind.h> 21 21 #include <linux/nfsacl.h> 22 + #include <linux/nfslocalio.h> 22 23 #include <linux/seq_file.h> 23 24 #include <linux/inetdevice.h> 24 25 #include <net/addrconf.h> ··· 36 35 #define NFSDDBG_FACILITY NFSDDBG_SVC 37 36 38 37 atomic_t nfsd_th_cnt = ATOMIC_INIT(0); 39 - extern struct svc_program nfsd_program; 40 38 static int nfsd(void *vrqstp); 41 39 #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) 42 40 static int nfsd_acl_rpcbind_set(struct net *, ··· 80 80 unsigned long nfsd_drc_max_mem; 81 81 unsigned long nfsd_drc_mem_used; 82 82 83 + #if IS_ENABLED(CONFIG_NFS_LOCALIO) 84 + static const struct svc_version *localio_versions[] = { 85 + [1] = &localio_version1, 86 + }; 87 + 88 + #define NFSD_LOCALIO_NRVERS ARRAY_SIZE(localio_versions) 89 + 90 + #endif /* CONFIG_NFS_LOCALIO */ 91 + 83 92 #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) 84 93 static const struct svc_version *nfsd_acl_version[] = { 85 94 # if defined(CONFIG_NFSD_V2_ACL) ··· 99 90 # endif 100 91 }; 101 92 102 - #define NFSD_ACL_MINVERS 2 93 + #define NFSD_ACL_MINVERS 2 103 94 #define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version) 104 - 105 - static struct svc_program nfsd_acl_program = { 106 - .pg_prog = NFS_ACL_PROGRAM, 107 - .pg_nvers = NFSD_ACL_NRVERS, 108 - .pg_vers = nfsd_acl_version, 109 - .pg_name = "nfsacl", 110 - .pg_class = "nfsd", 111 - .pg_authenticate = &svc_set_client, 112 - .pg_init_request = nfsd_acl_init_request, 113 - .pg_rpcbind_set = nfsd_acl_rpcbind_set, 114 - }; 115 95 116 96 #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */ 117 97 ··· 114 116 #endif 115 117 }; 116 118 117 - struct svc_program nfsd_program = { 118 - #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) 119 - .pg_next = &nfsd_acl_program, 120 - #endif 119 + struct svc_program nfsd_programs[] = { 120 + { 121 121 .pg_prog = NFS_PROGRAM, /* program number */ 122 122 .pg_nvers = 
NFSD_MAXVERS+1, /* nr of entries in nfsd_version */ 123 123 .pg_vers = nfsd_version, /* version table */ 124 124 .pg_name = "nfsd", /* program name */ 125 125 .pg_class = "nfsd", /* authentication class */ 126 - .pg_authenticate = &svc_set_client, /* export authentication */ 126 + .pg_authenticate = svc_set_client, /* export authentication */ 127 127 .pg_init_request = nfsd_init_request, 128 128 .pg_rpcbind_set = nfsd_rpcbind_set, 129 + }, 130 + #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) 131 + { 132 + .pg_prog = NFS_ACL_PROGRAM, 133 + .pg_nvers = NFSD_ACL_NRVERS, 134 + .pg_vers = nfsd_acl_version, 135 + .pg_name = "nfsacl", 136 + .pg_class = "nfsd", 137 + .pg_authenticate = svc_set_client, 138 + .pg_init_request = nfsd_acl_init_request, 139 + .pg_rpcbind_set = nfsd_acl_rpcbind_set, 140 + }, 141 + #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */ 142 + #if IS_ENABLED(CONFIG_NFS_LOCALIO) 143 + { 144 + .pg_prog = NFS_LOCALIO_PROGRAM, 145 + .pg_nvers = NFSD_LOCALIO_NRVERS, 146 + .pg_vers = localio_versions, 147 + .pg_name = "nfslocalio", 148 + .pg_class = "nfsd", 149 + .pg_authenticate = svc_set_client, 150 + .pg_init_request = svc_generic_init_request, 151 + .pg_rpcbind_set = svc_generic_rpcbind_set, 152 + } 153 + #endif /* CONFIG_NFS_LOCALIO */ 129 154 }; 130 155 131 156 bool nfsd_support_version(int vers) ··· 212 191 nfsd_vers(nn, 4, NFSD_AVAIL); 213 192 } 214 193 return 0; 194 + } 195 + 196 + bool nfsd_serv_try_get(struct net *net) 197 + { 198 + struct nfsd_net *nn = net_generic(net, nfsd_net_id); 199 + 200 + return (nn && percpu_ref_tryget_live(&nn->nfsd_serv_ref)); 201 + } 202 + 203 + void nfsd_serv_put(struct net *net) 204 + { 205 + struct nfsd_net *nn = net_generic(net, nfsd_net_id); 206 + 207 + percpu_ref_put(&nn->nfsd_serv_ref); 208 + } 209 + 210 + static void nfsd_serv_done(struct percpu_ref *ref) 211 + { 212 + struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref); 213 + 214 + 
complete(&nn->nfsd_serv_confirm_done); 215 + } 216 + 217 + static void nfsd_serv_free(struct percpu_ref *ref) 218 + { 219 + struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref); 220 + 221 + complete(&nn->nfsd_serv_free_done); 215 222 } 216 223 217 224 /* ··· 441 392 lockd_down(net); 442 393 nn->lockd_up = false; 443 394 } 395 + percpu_ref_exit(&nn->nfsd_serv_ref); 444 396 nn->nfsd_net_up = false; 445 397 nfsd_shutdown_generic(); 446 398 } ··· 520 470 { 521 471 struct nfsd_net *nn = net_generic(net, nfsd_net_id); 522 472 struct svc_serv *serv = nn->nfsd_serv; 473 + 474 + lockdep_assert_held(&nfsd_mutex); 475 + 476 + percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done); 477 + wait_for_completion(&nn->nfsd_serv_confirm_done); 478 + wait_for_completion(&nn->nfsd_serv_free_done); 479 + /* percpu_ref_exit is called in nfsd_shutdown_net */ 523 480 524 481 spin_lock(&nfsd_notifier_lock); 525 482 nn->nfsd_serv = NULL; ··· 652 595 if (nn->nfsd_serv) 653 596 return 0; 654 597 598 + error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free, 599 + 0, GFP_KERNEL); 600 + if (error) 601 + return error; 602 + init_completion(&nn->nfsd_serv_free_done); 603 + init_completion(&nn->nfsd_serv_confirm_done); 604 + 655 605 if (nfsd_max_blksize == 0) 656 606 nfsd_max_blksize = nfsd_get_default_max_blksize(); 657 607 nfsd_reset_versions(nn); 658 - serv = svc_create_pooled(&nfsd_program, &nn->nfsd_svcstats, 608 + serv = svc_create_pooled(nfsd_programs, ARRAY_SIZE(nfsd_programs), 609 + &nn->nfsd_svcstats, 659 610 nfsd_max_blksize, nfsd); 660 611 if (serv == NULL) 661 612 return -ENOMEM; ··· 970 905 } 971 906 972 907 /** 973 - * nfsd_dispatch - Process an NFS or NFSACL Request 908 + * nfsd_dispatch - Process an NFS or NFSACL or LOCALIO Request 974 909 * @rqstp: incoming request 975 910 * 976 911 * This RPC dispatcher integrates the NFS server's duplicate reply cache.
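Review note: the `nfsd_serv_ref` changes above follow a try-get / put / kill-then-wait lifecycle: LOCALIO users may only take a reference while the server is live, and shutdown waits until the last reference is dropped. The following single-threaded model illustrates only the ordering of those steps. It is not thread-safe and is not how `percpu_ref` is implemented; all `sketch_*` names are hypothetical, and the two boolean flags stand in for the `nfsd_serv_confirm_done` and `nfsd_serv_free_done` completions.

```c
#include <stdbool.h>

struct sketch_serv_ref {
	long count;		/* starts at 1: the server's own reference */
	bool dead;		/* set by kill; later try-gets fail */
	bool confirm_done;	/* models nfsd_serv_confirm_done */
	bool free_done;		/* models nfsd_serv_free_done */
};

void sketch_ref_init(struct sketch_serv_ref *r)
{
	r->count = 1;
	r->dead = false;
	r->confirm_done = false;
	r->free_done = false;
}

/* Models nfsd_serv_try_get(): succeeds only while the ref is live. */
bool sketch_ref_tryget_live(struct sketch_serv_ref *r)
{
	if (r->dead)
		return false;
	r->count++;
	return true;
}

/* Models nfsd_serv_put(): the final put signals free_done. */
void sketch_ref_put(struct sketch_serv_ref *r)
{
	if (--r->count == 0)
		r->free_done = true;
}

/* Models the kill step: mark dead, confirm, drop the initial ref. */
void sketch_ref_kill_and_confirm(struct sketch_serv_ref *r)
{
	r->dead = true;
	r->confirm_done = true;
	sketch_ref_put(r);
}
```

In this model, a shutdown path that waits on `free_done` cannot proceed while a LOCALIO user still holds a reference, which matches the intent of the two `wait_for_completion()` calls added before `nn->nfsd_serv` is cleared.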
+12 -9
fs/nfsd/trace.h
··· 86 86 { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \ 87 87 { NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \ 88 88 { NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \ 89 - { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }) 89 + { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \ 90 + { NFSD_MAY_LOCALIO, "LOCALIO" }) 90 91 91 92 TRACE_EVENT(nfsd_compound, 92 93 TP_PROTO( ··· 194 193 { S_IFIFO, "FIFO" }, \ 195 194 { S_IFSOCK, "SOCK" }) 196 195 197 - TRACE_EVENT(nfsd_fh_verify, 196 + TRACE_EVENT_CONDITION(nfsd_fh_verify, 198 197 TP_PROTO( 199 198 const struct svc_rqst *rqstp, 200 199 const struct svc_fh *fhp, ··· 202 201 int access 203 202 ), 204 203 TP_ARGS(rqstp, fhp, type, access), 204 + TP_CONDITION(rqstp != NULL), 205 205 TP_STRUCT__entry( 206 206 __field(unsigned int, netns_ino) 207 207 __sockaddr(server, rqstp->rq_xprt->xpt_remotelen) ··· 241 239 __be32 error 242 240 ), 243 241 TP_ARGS(rqstp, fhp, type, access, error), 244 - TP_CONDITION(error), 242 + TP_CONDITION(rqstp != NULL && error), 245 243 TP_STRUCT__entry( 246 244 __field(unsigned int, netns_ino) 247 245 __sockaddr(server, rqstp->rq_xprt->xpt_remotelen) ··· 297 295 __entry->status) 298 296 ) 299 297 300 - #define DEFINE_NFSD_FH_ERR_EVENT(name) \ 301 - DEFINE_EVENT(nfsd_fh_err_class, nfsd_##name, \ 302 - TP_PROTO(struct svc_rqst *rqstp, \ 303 - struct svc_fh *fhp, \ 304 - int status), \ 305 - TP_ARGS(rqstp, fhp, status)) 298 + #define DEFINE_NFSD_FH_ERR_EVENT(name) \ 299 + DEFINE_EVENT_CONDITION(nfsd_fh_err_class, nfsd_##name, \ 300 + TP_PROTO(struct svc_rqst *rqstp, \ 301 + struct svc_fh *fhp, \ 302 + int status), \ 303 + TP_ARGS(rqstp, fhp, status), \ 304 + TP_CONDITION(rqstp != NULL)) 306 305 307 306 DEFINE_NFSD_FH_ERR_EVENT(set_fh_dentry_badexport); 308 307 DEFINE_NFSD_FH_ERR_EVENT(set_fh_dentry_badhandle);
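Review note: the tracepoints above dereference `rqstp` (for example `rqstp->rq_xprt->xpt_remotelen`), so switching them to the conditional variants with `TP_CONDITION(rqstp != NULL)` keeps them from firing on the LOCALIO path where no request exists. The sketch below models that guard with an ordinary macro. It is not the kernel tracing infrastructure, and every `sketch_*` / `SKETCH_*` name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the fields a tracepoint reads from rqstp. */
struct sketch_rqst {
	int remote_port;
};

int sketch_trace_hits;
int sketch_last_port;

/* The "tracepoint body": unconditionally dereferences the request. */
void sketch_trace_emit(const struct sketch_rqst *rq)
{
	sketch_last_port = rq->remote_port;
	sketch_trace_hits++;
}

/* Evaluate the condition first, so the body never sees a NULL request. */
#define SKETCH_TRACE_CONDITION(cond, rq)	\
	do {					\
		if (cond)			\
			sketch_trace_emit(rq);	\
	} while (0)

void sketch_trace_fh_verify(const struct sketch_rqst *rq)
{
	SKETCH_TRACE_CONDITION(rq != NULL, rq);
}
```

One consequence visible in the diff is that LOCALIO filehandle verification is silent in these particular trace events; the `NFSD_MAY_LOCALIO` flag added to the flag table is what marks LOCALIO access in events that do fire.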
+2
fs/nfsd/vfs.h
@@ -33,6 +33,8 @@
 
 #define NFSD_MAY_64BIT_COOKIE 0x1000 /* 64 bit readdir cookies for >= NFSv3 */
 
+#define NFSD_MAY_LOCALIO 0x2000 /* for tracing, reflects when localio used */
+
 #define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
 #define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
 
+9
include/linux/nfs.h
@@ -8,10 +8,19 @@
 #ifndef _LINUX_NFS_H
 #define _LINUX_NFS_H
 
+#include <linux/cred.h>
+#include <linux/sunrpc/auth.h>
 #include <linux/sunrpc/msg_prot.h>
 #include <linux/string.h>
 #include <linux/crc32.h>
 #include <uapi/linux/nfs.h>
+
+/* The LOCALIO program is entirely private to Linux and is
+ * NOT part of the uapi.
+ */
+#define NFS_LOCALIO_PROGRAM 400122
+#define LOCALIOPROC_NULL 0
+#define LOCALIOPROC_UUID_IS_LOCAL 1
 
 /*
  * This is the kernel NFS client file handle representation
+17
include/linux/nfs_common.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * This file contains constants and methods used by both NFS client and server.
+ */
+#ifndef _LINUX_NFS_COMMON_H
+#define _LINUX_NFS_COMMON_H
+
+#include <linux/errno.h>
+#include <uapi/linux/nfs.h>
+
+/* Mapping from NFS error code to "errno" error code. */
+#define errno_NFSERR_IO EIO
+
+int nfs_stat_to_errno(enum nfs_stat status);
+int nfs4_stat_to_errno(int stat);
+
+#endif /* _LINUX_NFS_COMMON_H */
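The new header only declares `nfs_stat_to_errno()` and `nfs4_stat_to_errno()`; their bodies are not part of this hunk. As a rough illustration of what a status-to-errno mapping can look like, here is a table-driven userspace sketch. The `demo_` names, the small subset of status codes (values as in RFC 1813) and the fallback to `-EIO` for unknown codes are assumptions, not the kernel implementation.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of NFS status codes (numeric values per RFC 1813). */
enum demo_nfs_stat {
	DEMO_NFS_OK = 0,
	DEMO_NFSERR_PERM = 1,
	DEMO_NFSERR_NOENT = 2,
	DEMO_NFSERR_IO = 5,
};

/* Lookup table pairing each status with a negative errno value. */
static const struct {
	int stat;
	int err;
} demo_errtbl[] = {
	{ DEMO_NFS_OK, 0 },
	{ DEMO_NFSERR_PERM, -EPERM },
	{ DEMO_NFSERR_NOENT, -ENOENT },
	{ DEMO_NFSERR_IO, -EIO },
};

/* Linear search; unknown codes fall back to -EIO (an assumption here). */
static int demo_stat_to_errno(int status)
{
	size_t i;

	for (i = 0; i < sizeof(demo_errtbl) / sizeof(demo_errtbl[0]); i++)
		if (demo_errtbl[i].stat == status)
			return demo_errtbl[i].err;
	return -EIO;
}
```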
+11 -2
include/linux/nfs_fs_sb.h
···
 #include <linux/wait.h>
 #include <linux/nfs_xdr.h>
 #include <linux/sunrpc/xprt.h>
+#include <linux/nfslocalio.h>

 #include <linux/atomic.h>
 #include <linux/refcount.h>
···
 #define NFS_CS_DS 7 /* - Server is a DS */
 #define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
 #define NFS_CS_PNFS 9 /* - Server used for pnfs */
+#define NFS_CS_LOCAL_IO 10 /* - client is local */
 	struct sockaddr_storage cl_addr; /* server identifier */
 	size_t cl_addrlen;
 	char * cl_hostname; /* hostname of server */
···
 	struct net *cl_net;
 	struct list_head pending_cb_stateids;
 	struct rcu_head rcu;
+
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	struct timespec64 cl_nfssvc_boot;
+	seqlock_t cl_boot_lock;
+	nfs_uuid_t cl_uuid;
+	spinlock_t cl_localio_lock;
+#endif /* CONFIG_NFS_LOCALIO */
 };

 /*
···
 #define NFS_MOUNT_WRITE_WAIT 0x02000000
 #define NFS_MOUNT_TRUNK_DISCOVERY 0x04000000
 #define NFS_MOUNT_SHUTDOWN 0x08000000
+#define NFS_MOUNT_NO_ALIGNWRITE 0x10000000

 	unsigned int fattr_valid; /* Valid attributes */
 	unsigned int caps; /* server capabilities */
···
 	/* the following fields are protected by nfs_client->cl_lock */
 	struct rb_root state_owners;
 #endif
-	struct ida openowner_id;
-	struct ida lockowner_id;
+	atomic64_t owner_ctr;
 	struct list_head state_owners_lru;
 	struct list_head layouts;
 	struct list_head delegations;
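The last hunk above replaces the two `struct ida` allocators with a single `atomic64_t owner_ctr`, in line with the "Simplify and guarantee lock owner uniqueness" item in the pull message. How the kernel consumes the counter is not shown in this diff, so the following is only a userspace sketch of the general idea using C11 atomics; `struct demo_server` and `demo_next_owner_id` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-server state holding the counter. */
struct demo_server {
	_Atomic uint64_t owner_ctr;
};

/*
 * Hand out a fresh identifier. A 64-bit counter that only ever
 * increments never reuses a value in practice, so no free/recycle
 * bookkeeping is needed, unlike an ID allocator.
 */
static uint64_t demo_next_owner_id(struct demo_server *server)
{
	/* atomic_fetch_add returns the old value; add 1 for the new one. */
	return atomic_fetch_add(&server->owner_ctr, 1) + 1;
}
```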
+20 -2
include/linux/nfs_xdr.h
···

 struct stateowner_id {
 	__u64 create_time;
-	__u32 uniquifier;
+	__u64 uniquifier;
 };

 struct nfs4_open_delegation {
···
 };

 /*
+ * Helper functions used by NFS client and/or server
+ */
+static inline void encode_opaque_fixed(struct xdr_stream *xdr,
+				       const void *buf, size_t len)
+{
+	WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
+}
+
+static inline int decode_opaque_fixed(struct xdr_stream *xdr,
+				      void *buf, size_t len)
+{
+	ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
+	if (unlikely(ret < 0))
+		return -EIO;
+	return 0;
+}
+
+/*
  * Function vectors etc. for the NFS client
  */
 extern const struct nfs_rpc_ops nfs_v2_clientops;
···
 extern const struct rpc_version nfsacl_version3;
 extern const struct rpc_program nfsacl_program;

-#endif
+#endif /* _LINUX_NFS_XDR_H */
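The `encode_opaque_fixed()` and `decode_opaque_fixed()` helpers added above wrap the SUNRPC stream functions. For readers unfamiliar with the wire format, XDR (RFC 4506) stores fixed-length opaque data as the raw bytes followed by zero padding up to a multiple of four. The sketch below shows that layout in plain userspace C; the `demo_` functions are hypothetical and do not use `struct xdr_stream`.

```c
#include <stddef.h>
#include <string.h>

/* Round a length up to the next multiple of 4, the XDR unit size. */
static size_t demo_xdr_padded_len(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/*
 * Copy len bytes into out and zero-fill the padding.
 * Returns the number of bytes written, or -1 if out is too small.
 */
static long demo_encode_opaque_fixed(unsigned char *out, size_t outlen,
				     const void *buf, size_t len)
{
	size_t padded = demo_xdr_padded_len(len);

	if (padded > outlen)
		return -1;
	memcpy(out, buf, len);
	memset(out + len, 0, padded - len);
	return (long)padded;
}
```

A 16-byte value such as a UUID needs no padding, while a 5-byte value occupies 8 bytes on the wire.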
+74
include/linux/nfslocalio.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ * Copyright (C) 2024 NeilBrown <neilb@suse.de>
+ */
+#ifndef __LINUX_NFSLOCALIO_H
+#define __LINUX_NFSLOCALIO_H
+
+/* nfsd_file structure is purposely kept opaque to NFS client */
+struct nfsd_file;
+
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/uuid.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/sunrpc/svcauth.h>
+#include <linux/nfs.h>
+#include <net/net_namespace.h>
+
+/*
+ * Useful to allow a client to negotiate if localio
+ * possible with its server.
+ *
+ * See Documentation/filesystems/nfs/localio.rst for more detail.
+ */
+typedef struct {
+	uuid_t uuid;
+	struct list_head list;
+	struct net __rcu *net; /* nfsd's network namespace */
+	struct auth_domain *dom; /* auth_domain for localio */
+} nfs_uuid_t;
+
+void nfs_uuid_begin(nfs_uuid_t *);
+void nfs_uuid_end(nfs_uuid_t *);
+void nfs_uuid_is_local(const uuid_t *, struct list_head *,
+		       struct net *, struct auth_domain *, struct module *);
+void nfs_uuid_invalidate_clients(struct list_head *list);
+void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
+
+/* localio needs to map filehandle -> struct nfsd_file */
+extern struct nfsd_file *
+nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
+		   const struct cred *, const struct nfs_fh *,
+		   const fmode_t) __must_hold(rcu);
+
+struct nfsd_localio_operations {
+	bool (*nfsd_serv_try_get)(struct net *);
+	void (*nfsd_serv_put)(struct net *);
+	struct nfsd_file *(*nfsd_open_local_fh)(struct net *,
+						struct auth_domain *,
+						struct rpc_clnt *,
+						const struct cred *,
+						const struct nfs_fh *,
+						const fmode_t);
+	void (*nfsd_file_put_local)(struct nfsd_file *);
+	struct file *(*nfsd_file_file)(struct nfsd_file *);
+} ____cacheline_aligned;
+
+extern void nfsd_localio_ops_init(void);
+extern const struct nfsd_localio_operations *nfs_to;
+
+struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
+				    struct rpc_clnt *, const struct cred *,
+				    const struct nfs_fh *, const fmode_t);
+
+#else /* CONFIG_NFS_LOCALIO */
+static inline void nfsd_localio_ops_init(void)
+{
+}
+#endif /* CONFIG_NFS_LOCALIO */
+
+#endif /* __LINUX_NFSLOCALIO_H */
+9 -7
include/linux/sunrpc/sched.h
···
 #define RPC_WAS_SENT(t) ((t)->tk_flags & RPC_TASK_SENT)
 #define RPC_IS_MOVEABLE(t) ((t)->tk_flags & RPC_TASK_MOVEABLE)

-#define RPC_TASK_RUNNING 0
-#define RPC_TASK_QUEUED 1
-#define RPC_TASK_ACTIVE 2
-#define RPC_TASK_NEED_XMIT 3
-#define RPC_TASK_NEED_RECV 4
-#define RPC_TASK_MSG_PIN_WAIT 5
-#define RPC_TASK_SIGNALLED 6
+enum {
+	RPC_TASK_RUNNING,
+	RPC_TASK_QUEUED,
+	RPC_TASK_ACTIVE,
+	RPC_TASK_NEED_XMIT,
+	RPC_TASK_NEED_RECV,
+	RPC_TASK_MSG_PIN_WAIT,
+	RPC_TASK_SIGNALLED,
+};

 #define rpc_test_and_set_running(t) \
 	test_and_set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
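The `RPC_TASK_*` values are bit numbers rather than bit masks, which is why an enum that counts up from zero is a drop-in replacement for the old `#define` list. The sketch below mimics `test_and_set_bit()` with C11 atomics to show how such a bit number is used; `demo_test_and_set_bit` is a hypothetical userspace analogue, not the kernel primitive.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers, mirroring the first entries of the enum in the diff. */
enum {
	RPC_TASK_RUNNING,	/* 0 */
	RPC_TASK_QUEUED,	/* 1 */
	RPC_TASK_ACTIVE,	/* 2 */
};

/*
 * Atomically set bit nr in *word and report whether it was
 * already set before the call.
 */
static bool demo_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}
```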
+4 -3
include/linux/sunrpc/svc.h
···
  * We currently do not support more than one RPC program per daemon.
  */
 struct svc_serv {
-	struct svc_program * sv_program; /* RPC program */
+	struct svc_program * sv_programs; /* RPC programs */
 	struct svc_stat * sv_stats; /* RPC statistics */
 	spinlock_t sv_lock;
+	unsigned int sv_nprogs; /* Number of sv_programs */
 	unsigned int sv_nrthreads; /* # of server threads */
 	unsigned int sv_maxconn; /* max connections allowed or
 				  * '0' causing max to be based
···
 };

 /*
- * List of RPC programs on the same transport endpoint
+ * RPC program - an array of these can use the same transport endpoint
  */
 struct svc_program {
-	struct svc_program * pg_next; /* other programs (same xprt) */
 	u32 pg_prog; /* program number */
 	unsigned int pg_lovers; /* lowest version */
 	unsigned int pg_hivers; /* highest version */
···
 void svc_rqst_release_pages(struct svc_rqst *rqstp);
 void svc_exit_thread(struct svc_rqst *);
 struct svc_serv * svc_create_pooled(struct svc_program *prog,
+				     unsigned int nprog,
 				     struct svc_stat *stats,
 				     unsigned int bufsize,
 				     int (*threadfn)(void *data));
+5
include/linux/sunrpc/svcauth.h
···
 #include <linux/sunrpc/msg_prot.h>
 #include <linux/sunrpc/cache.h>
 #include <linux/sunrpc/gss_api.h>
+#include <linux/sunrpc/clnt.h>
 #include <linux/hash.h>
 #include <linux/stringhash.h>
 #include <linux/cred.h>
···
 extern enum svc_auth_status svc_set_client(struct svc_rqst *rqstp);
 extern int svc_auth_register(rpc_authflavor_t flavor, struct auth_ops *aops);
 extern void svc_auth_unregister(rpc_authflavor_t flavor);
+
+extern void svcauth_map_clnt_to_svc_cred_local(struct rpc_clnt *clnt,
+					       const struct cred *,
+					       struct svc_cred *);

 extern struct auth_domain *unix_domain_find(char *name);
 extern void auth_domain_put(struct auth_domain *item);
+3 -7
net/sunrpc/cache.c
···
 static void cache_revisit_request(struct cache_head *item)
 {
 	struct cache_deferred_req *dreq;
-	struct list_head pending;
 	struct hlist_node *tmp;
 	int hash = DFR_HASH(item);
+	LIST_HEAD(pending);

-	INIT_LIST_HEAD(&pending);
 	spin_lock(&cache_defer_lock);

 	hlist_for_each_entry_safe(dreq, tmp, &cache_defer_hash[hash], hash)
···
 void cache_clean_deferred(void *owner)
 {
 	struct cache_deferred_req *dreq, *tmp;
-	struct list_head pending;
+	LIST_HEAD(pending);

-
-	INIT_LIST_HEAD(&pending);
 	spin_lock(&cache_defer_lock);

 	list_for_each_entry_safe(dreq, tmp, &cache_defer_list, recent) {
···
 {
 	struct cache_queue *cq, *tmp;
 	struct cache_request *cr;
-	struct list_head dequeued;
+	LIST_HEAD(dequeued);

-	INIT_LIST_HEAD(&dequeued);
 	spin_lock(&queue_lock);
 	list_for_each_entry_safe(cq, tmp, &detail->queue, list)
 		if (!cq->reader) {
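The cache.c cleanup swaps a separate declaration plus `INIT_LIST_HEAD()` call for a single `LIST_HEAD()` declaration. Both forms leave an empty circular list whose head points at itself; the macro form just does it at the point of declaration. A minimal userspace re-creation of the two forms, with hypothetical `demo_`/`DEMO_` names standing in for the kernel's list API:

```c
/* Minimal doubly linked list head, modelled on the kernel's list_head. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* Declare-and-initialise form: the head points at itself from the start. */
#define DEMO_LIST_HEAD_INIT(name) { &(name), &(name) }
#define DEMO_LIST_HEAD(name) \
	struct demo_list_head name = DEMO_LIST_HEAD_INIT(name)

/* Two-step form: initialise an already declared head at run time. */
static void demo_init_list_head(struct demo_list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* A list is empty when the head's next pointer refers back to the head. */
static int demo_list_empty(const struct demo_list_head *head)
{
	return head->next == head;
}
```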
+1 -12
net/sunrpc/clnt.c
···
 # define RPCDBG_FACILITY RPCDBG_CALL
 #endif

-/*
- * All RPC clients are linked into this list
- */
-
 static DECLARE_WAIT_QUEUE_HEAD(destroy_wait);
-

 static void call_start(struct rpc_task *task);
 static void call_reserve(struct rpc_task *task);
···
 		.connect_timeout = args->connect_timeout,
 		.reconnect_timeout = args->reconnect_timeout,
 	};
-	char servername[48];
+	char servername[RPC_MAXNETNAMELEN];
 	struct rpc_clnt *clnt;
 	int i;

···

 	if (req->rq_buffer)
 		return;
-
-	if (proc->p_proc != 0) {
-		BUG_ON(proc->p_arglen == 0);
-		if (proc->p_decode != NULL)
-			BUG_ON(proc->p_replen == 0);
-	}

 	/*
 	 * Calculate the size (in quads) of the RPC call
+39 -29
net/sunrpc/svc.c
···

 static int svc_uses_rpcbind(struct svc_serv *serv)
 {
-	struct svc_program *progp;
-	unsigned int i;
+	unsigned int p, i;

-	for (progp = serv->sv_program; progp; progp = progp->pg_next) {
+	for (p = 0; p < serv->sv_nprogs; p++) {
+		struct svc_program *progp = &serv->sv_programs[p];
+
 		for (i = 0; i < progp->pg_nvers; i++) {
 			if (progp->pg_vers[i] == NULL)
 				continue;
···
  * Create an RPC service
  */
 static struct svc_serv *
-__svc_create(struct svc_program *prog, struct svc_stat *stats,
+__svc_create(struct svc_program *prog, int nprogs, struct svc_stat *stats,
 	     unsigned int bufsize, int npools, int (*threadfn)(void *data))
 {
 	struct svc_serv *serv;
···
 	if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL)))
 		return NULL;
 	serv->sv_name = prog->pg_name;
-	serv->sv_program = prog;
+	serv->sv_programs = prog;
+	serv->sv_nprogs = nprogs;
 	serv->sv_stats = stats;
 	if (bufsize > RPCSVC_MAXPAYLOAD)
 		bufsize = RPCSVC_MAXPAYLOAD;
···
 	serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE);
 	serv->sv_threadfn = threadfn;
 	xdrsize = 0;
-	while (prog) {
-		prog->pg_lovers = prog->pg_nvers-1;
-		for (vers=0; vers<prog->pg_nvers ; vers++)
-			if (prog->pg_vers[vers]) {
-				prog->pg_hivers = vers;
-				if (prog->pg_lovers > vers)
-					prog->pg_lovers = vers;
-				if (prog->pg_vers[vers]->vs_xdrsize > xdrsize)
-					xdrsize = prog->pg_vers[vers]->vs_xdrsize;
+	for (i = 0; i < nprogs; i++) {
+		struct svc_program *progp = &prog[i];
+
+		progp->pg_lovers = progp->pg_nvers-1;
+		for (vers = 0; vers < progp->pg_nvers ; vers++)
+			if (progp->pg_vers[vers]) {
+				progp->pg_hivers = vers;
+				if (progp->pg_lovers > vers)
+					progp->pg_lovers = vers;
+				if (progp->pg_vers[vers]->vs_xdrsize > xdrsize)
+					xdrsize = progp->pg_vers[vers]->vs_xdrsize;
 			}
-		prog = prog->pg_next;
 	}
 	serv->sv_xdrsize = xdrsize;
 	INIT_LIST_HEAD(&serv->sv_tempsocks);
···
 struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
 			    int (*threadfn)(void *data))
 {
-	return __svc_create(prog, NULL, bufsize, 1, threadfn);
+	return __svc_create(prog, 1, NULL, bufsize, 1, threadfn);
 }
 EXPORT_SYMBOL_GPL(svc_create);

 /**
  * svc_create_pooled - Create an RPC service with pooled threads
- * @prog: the RPC program the new service will handle
+ * @prog: Array of RPC programs the new service will handle
+ * @nprogs: Number of programs in the array
  * @stats: the stats struct if desired
  * @bufsize: maximum message size for @prog
  * @threadfn: a function to service RPC requests for @prog
···
  * Returns an instantiated struct svc_serv object or NULL.
  */
 struct svc_serv *svc_create_pooled(struct svc_program *prog,
+				   unsigned int nprogs,
 				   struct svc_stat *stats,
 				   unsigned int bufsize,
 				   int (*threadfn)(void *data))
···
 	struct svc_serv *serv;
 	unsigned int npools = svc_pool_map_get();

-	serv = __svc_create(prog, stats, bufsize, npools, threadfn);
+	serv = __svc_create(prog, nprogs, stats, bufsize, npools, threadfn);
 	if (!serv)
 		goto out_err;
 	serv->sv_is_pooled = true;
···

 	*servp = NULL;

-	dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name);
+	dprintk("svc: svc_destroy(%s)\n", serv->sv_programs->pg_name);
 	timer_shutdown_sync(&serv->sv_temptimer);

 	/*
 	 * Remaining transports at this point are not expected.
 	 */
 	WARN_ONCE(!list_empty(&serv->sv_permsocks),
-		  "SVC: permsocks remain for %s\n", serv->sv_program->pg_name);
+		  "SVC: permsocks remain for %s\n", serv->sv_programs->pg_name);
 	WARN_ONCE(!list_empty(&serv->sv_tempsocks),
-		  "SVC: tempsocks remain for %s\n", serv->sv_program->pg_name);
+		  "SVC: tempsocks remain for %s\n", serv->sv_programs->pg_name);

 	cache_clean_deferred(serv);

···
 		 const int family, const unsigned short proto,
 		 const unsigned short port)
 {
-	struct svc_program *progp;
-	unsigned int i;
+	unsigned int p, i;
 	int error = 0;

 	WARN_ON_ONCE(proto == 0 && port == 0);
 	if (proto == 0 && port == 0)
 		return -EINVAL;

-	for (progp = serv->sv_program; progp; progp = progp->pg_next) {
+	for (p = 0; p < serv->sv_nprogs; p++) {
+		struct svc_program *progp = &serv->sv_programs[p];
+
 		for (i = 0; i < progp->pg_nvers; i++) {

 			error = progp->pg_rpcbind_set(net, progp, i,
···
 static void svc_unregister(const struct svc_serv *serv, struct net *net)
 {
 	struct sighand_struct *sighand;
-	struct svc_program *progp;
 	unsigned long flags;
-	unsigned int i;
+	unsigned int p, i;

 	clear_thread_flag(TIF_SIGPENDING);

-	for (progp = serv->sv_program; progp; progp = progp->pg_next) {
+	for (p = 0; p < serv->sv_nprogs; p++) {
+		struct svc_program *progp = &serv->sv_programs[p];
+
 		for (i = 0; i < progp->pg_nvers; i++) {
 			if (progp->pg_vers[i] == NULL)
 				continue;
···
 	struct svc_process_info process;
 	enum svc_auth_status auth_res;
 	unsigned int aoffset;
-	int rc;
+	int pr, rc;
 	__be32 *p;

 	/* Will be turned off only when NFSv4 Sessions are used */
···
 	rqstp->rq_vers = be32_to_cpup(p++);
 	rqstp->rq_proc = be32_to_cpup(p);

-	for (progp = serv->sv_program; progp; progp = progp->pg_next)
+	for (pr = 0; pr < serv->sv_nprogs; pr++) {
+		progp = &serv->sv_programs[pr];
+
 		if (rqstp->rq_prog == progp->pg_prog)
 			break;
+	}

 	/*
 	 * Decode auth data, and add verifier to reply buffer.
+1 -1
net/sunrpc/svc_xprt.c
···
 	spin_unlock(&svc_xprt_class_lock);
 	newxprt = xcl->xcl_ops->xpo_create(serv, net, sap, len, flags);
 	if (IS_ERR(newxprt)) {
-		trace_svc_xprt_create_err(serv->sv_program->pg_name,
+		trace_svc_xprt_create_err(serv->sv_programs->pg_name,
 					  xcl->xcl_name, sap, len,
 					  newxprt);
 		module_put(xcl->xcl_owner);
+28
net/sunrpc/svcauth.c
···
 #include <linux/sunrpc/svcauth.h>
 #include <linux/err.h>
 #include <linux/hash.h>
+#include <linux/user_namespace.h>

 #include <trace/events/sunrpc.h>

···
 	return aops->pseudoflavor(rqstp);
 }
 EXPORT_SYMBOL_GPL(svc_auth_flavor);
+
+/**
+ * svcauth_map_clnt_to_svc_cred_local - maps a generic cred
+ * to a svc_cred suitable for use in nfsd.
+ * @clnt: rpc_clnt associated with nfs client
+ * @cred: generic cred associated with nfs client
+ * @svc: returned svc_cred that is suitable for use in nfsd
+ */
+void svcauth_map_clnt_to_svc_cred_local(struct rpc_clnt *clnt,
+					const struct cred *cred,
+					struct svc_cred *svc)
+{
+	struct user_namespace *userns = clnt->cl_cred ?
+		clnt->cl_cred->user_ns : &init_user_ns;
+
+	memset(svc, 0, sizeof(struct svc_cred));
+
+	svc->cr_uid = KUIDT_INIT(from_kuid_munged(userns, cred->fsuid));
+	svc->cr_gid = KGIDT_INIT(from_kgid_munged(userns, cred->fsgid));
+	svc->cr_flavor = clnt->cl_auth->au_flavor;
+	if (cred->group_info)
+		svc->cr_group_info = get_group_info(cred->group_info);
+	/* These aren't relevant for local (network is bypassed) */
+	svc->cr_principal = NULL;
+	svc->cr_gss_mech = NULL;
+}
+EXPORT_SYMBOL_GPL(svcauth_map_clnt_to_svc_cred_local);

 /**************************************************
  * 'auth_domains' are stored in a hash table indexed by name.
+2 -1
net/sunrpc/svcauth_unix.c
···
 	rqstp->rq_auth_stat = rpc_autherr_badcred;
 	ipm = ip_map_cached_get(xprt);
 	if (ipm == NULL)
-		ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class,
+		ipm = __ip_map_lookup(sn->ip_map_cache,
+				      rqstp->rq_server->sv_programs->pg_class,
 				      &sin6->sin6_addr);

 	if (ipm == NULL)