Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'nfsd-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:

- Mike Snitzer's mechanism for disabling I/O caching introduced in
v6.18 is extended to include using direct I/O. The goal is to further
reduce the memory footprint consumed by NFS clients accessing large
data sets via NFSD.

- The NFSD community adopted a maintainer entry profile during this
cycle. See

Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst

- Work continues on hardening NFSD's implementation of the pNFS block
layout type. This type enables pNFS clients to directly access the
underlying block devices that contain an exported file system,
reducing server overhead and increasing data throughput.

- The remaining patches are clean-ups and minor optimizations. Many
thanks to the contributors, reviewers, testers, and bug reporters who
participated during the v6.19 NFSD development cycle.

* tag 'nfsd-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (38 commits)
NFSD: nfsd-io-modes: Separate lists
NFSD: nfsd-io-modes: Wrap shell snippets in literal code blocks
NFSD: Add toctree entry for NFSD IO modes docs
NFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst
NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
NFSD: Make FILE_SYNC WRITEs comply with spec
NFSD: Add trace point for SCSI fencing operation.
NFSD: use correct reservation type in nfsd4_scsi_fence_client
xdrgen: Don't generate unnecessary semicolon
xdrgen: Fix union declarations
NFSD: don't start nfsd if sv_permsocks is empty
xdrgen: handle _XdrString in union encoder/decoder
xdrgen: Fix the variable-length opaque field decoder template
xdrgen: Make the xdrgen script location-independent
xdrgen: Generalize/harden pathname construction
lockd: don't allow locking on reexported NFSv2/3
MAINTAINERS: add a nfsd blocklayout reviewer
nfsd: Use MD5 library instead of crypto_shash
nfsd: stop pretending that we cache the SEQUENCE reply.
NFS: nfsd-maintainer-entry-profile: Inline function name prefixes
...

+1431 -373
+1
Documentation/filesystems/nfs/index.rst
   rpc-cache
   rpc-server-gss
   nfs41-server
+  nfsd-io-modes
   knfsd-stats
   reexport
+153
Documentation/filesystems/nfs/nfsd-io-modes.rst
.. SPDX-License-Identifier: GPL-2.0

=============
NFSD IO MODES
=============

Overview
========

NFSD has historically always used buffered IO when servicing READ and
WRITE operations. BUFFERED is NFSD's default IO mode, but it is possible
to override that default to use either the DONTCACHE or DIRECT IO mode.

Experimental NFSD debugfs interfaces allow the NFSD IO modes used for
READ and for WRITE to be configured independently. See both:

- /sys/kernel/debug/nfsd/io_cache_read
- /sys/kernel/debug/nfsd/io_cache_write

The default value of both io_cache_read and io_cache_write reflects
NFSD's default IO mode (NFSD_IO_BUFFERED=0).

Based on the configured settings, NFSD's IO will either be:

- cached using the page cache (NFSD_IO_BUFFERED=0)
- cached but removed from the page cache on completion (NFSD_IO_DONTCACHE=1)
- not cached, with stable_how=NFS_UNSTABLE (NFSD_IO_DIRECT=2)

To set an NFSD IO mode, write a supported value (0 - 2) to the
corresponding IO operation's debugfs interface, e.g.::

  echo 2 > /sys/kernel/debug/nfsd/io_cache_read
  echo 2 > /sys/kernel/debug/nfsd/io_cache_write

To check which IO mode NFSD is using for READ or WRITE, simply read the
corresponding IO operation's debugfs interface, e.g.::

  cat /sys/kernel/debug/nfsd/io_cache_read
  cat /sys/kernel/debug/nfsd/io_cache_write

If you experiment with NFSD's IO modes on a recent kernel and have
interesting results, please report them to linux-nfs@vger.kernel.org.

NFSD DONTCACHE
==============

DONTCACHE is a hybrid approach to servicing IO that aims to offer the
benefits of DIRECT IO without any of the strict alignment requirements
that DIRECT IO imposes.
To achieve this, buffered IO is used,
but the IO is flagged to "drop behind" (meaning the associated pages are
dropped from the page cache) when the IO completes.

DONTCACHE aims to avoid what has proven to be a fairly significant
limitation of Linux's memory management subsystem when large amounts of
data are infrequently accessed (e.g. read once _or_ written once but not
read until much later). Such use-cases are particularly problematic
because the page cache will eventually become a bottleneck to servicing
new IO requests.

For more context on DONTCACHE, please see these Linux commit headers:

- Overview: 9ad6344568cc3 ("mm/filemap: change filemap_create_folio()
  to take a struct kiocb")
- for READ: 8026e49bff9b1 ("mm/filemap: add read support for
  RWF_DONTCACHE")
- for WRITE: 974c5e6139db3 ("xfs: flag as supporting FOP_DONTCACHE")

NFSD_IO_DONTCACHE will fall back to NFSD_IO_BUFFERED if the underlying
filesystem doesn't indicate support by setting FOP_DONTCACHE.

NFSD DIRECT
===========

DIRECT IO doesn't make use of the page cache; as such, it avoids the
Linux memory management's page reclaim scalability problems without
resorting to the hybrid use of the page cache that DONTCACHE does.

Some workloads benefit from NFSD avoiding the page cache, particularly
those with a working set that is significantly larger than available
system memory. The pathological worst-case workload that NFSD DIRECT has
proven to help most is an NFS client issuing large sequential IO to a file
that is 2-3 times larger than the NFS server's available system memory.
The improvement comes from NFSD DIRECT eliminating a lot of work
that the memory management subsystem would otherwise be required to
perform (e.g. page allocation, dirty writeback, page reclaim).
When using NFSD DIRECT, kswapd and kcompactd no longer consume CPU
time trying to find adequate free pages so that forward IO progress can
be made.

The performance win associated with using NFSD DIRECT was previously
discussed on linux-nfs, see:
https://lore.kernel.org/linux-nfs/aEslwqa9iMeZjjlV@kernel.org/

But in summary:

- NFSD DIRECT can significantly reduce memory requirements
- NFSD DIRECT can reduce CPU load by avoiding costly page reclaim work
- NFSD DIRECT can offer more deterministic IO performance

As always, your mileage may vary, so it is important to carefully
consider if/when it is beneficial to make use of NFSD DIRECT. When
assessing the comparative performance of your workload, please be sure
to log relevant performance metrics during testing (e.g. memory usage,
CPU usage, IO performance). Using perf to collect profile data, and from
it a "flamegraph" of the work Linux must perform on behalf of your test,
is a meaningful way to compare the relative health of the system and how
switching NFSD's IO mode changes what is observed.

If NFSD_IO_DIRECT is specified by writing 2 (or 3 and 4 for WRITE) to
NFSD's debugfs interfaces, ideally the IO will be aligned relative to
the underlying block device's logical_block_size. Also, the memory
buffer used to store the READ or WRITE payload must be aligned relative
to the underlying block device's dma_alignment.

But NFSD DIRECT does handle IO that is misaligned in O_DIRECT terms as
best it can:

Misaligned READ:
  If NFSD_IO_DIRECT is used, expand any misaligned READ to the next
  DIO-aligned block (on either end of the READ). The expanded READ is
  verified with proper offset/len (logical_block_size) and
  dma_alignment checks.
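The READ expansion described above is plain rounding arithmetic; as a sketch (the logical_block_size, offset, and length values here are assumed examples, not queried from any device):

```shell
# Sketch only: round a misaligned READ outward to DIO-aligned boundaries,
# mirroring the expansion the nfsd_read_direct trace event reports.
lbs=4096          # assumed logical_block_size of the backing device
offset=5000       # misaligned client READ offset
length=10000      # misaligned client READ length

start=$(( offset / lbs * lbs ))                      # round start down
end=$(( (offset + length + lbs - 1) / lbs * lbs ))   # round end up
echo "expanded READ: offset=$start len=$(( end - start ))"
```

Running this prints ``expanded READ: offset=4096 len=12288``: both ends of the 5000..15000 request are pushed out to the surrounding 4096-byte boundaries.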
Misaligned WRITE:
  If NFSD_IO_DIRECT is used, split any misaligned WRITE into a start,
  middle and end segment as needed. The large middle segment is
  DIO-aligned, and the start and/or end segments are misaligned.
  Buffered IO is used for the misaligned segments and O_DIRECT is used
  for the middle DIO-aligned segment. DONTCACHE buffered IO is _not_
  used for the misaligned segments because using normal buffered IO
  offers a significant RMW performance benefit when handling streaming
  misaligned WRITEs.

Tracing:
  The nfsd_read_direct trace event shows how NFSD expands any
  misaligned READ to the next DIO-aligned block (on either end of the
  original READ, as needed).

  This combination of trace events is useful for READs::

    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_vector/enable
    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_direct/enable
    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_io_done/enable
    echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_read/enable

  The nfsd_write_direct trace event shows how NFSD splits a given
  misaligned WRITE into a DIO-aligned middle segment.

  This combination of trace events is useful for WRITEs::

    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_opened/enable
    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_direct/enable
    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_io_done/enable
    echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
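The start/middle/end WRITE split described above can be sketched with the same kind of boundary arithmetic (the values are assumed examples; this is not NFSD source code):

```shell
# Sketch only: split a misaligned WRITE into a buffered head, an O_DIRECT
# middle, and a buffered tail, as the documentation above describes.
lbs=4096                   # assumed logical_block_size
offset=5000; length=20000  # misaligned client WRITE
end=$(( offset + length ))

mid_start=$(( (offset + lbs - 1) / lbs * lbs ))   # first aligned boundary
mid_end=$(( end / lbs * lbs ))                    # last aligned boundary

echo "buffered head:   offset=$offset len=$(( mid_start - offset ))"
echo "O_DIRECT middle: offset=$mid_start len=$(( mid_end - mid_start ))"
echo "buffered tail:   offset=$mid_end len=$(( end - mid_end ))"
```

For this example the 5000..25000 WRITE yields a 3192-byte buffered head, a 16384-byte DIO-aligned middle (8192..24576), and a 424-byte buffered tail, which is the shape the nfsd_write_direct trace event would report.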
+547
Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst
NFSD Maintainer Entry Profile
=============================

A Maintainer Entry Profile supplements the top-level process
documents (found in Documentation/process/) with customs that are
specific to a subsystem and its maintainers. A contributor may use
this document to set their expectations and avoid common mistakes.
A maintainer may use these profiles to look across subsystems for
opportunities to converge on best common practices.

Overview
--------
The Network File System (NFS) is a standardized family of network
protocols that enable access to files across a set of
network-connected peer hosts. Applications on NFS clients access
files that reside on file systems that are shared by NFS servers. A
single network peer can act as both an NFS client and an NFS server.

NFSD refers to the NFS server implementation included in the Linux
kernel. An in-kernel NFS server has fast access to files stored
in file systems local to that server. NFSD can share files stored
on most of the file system types native to Linux, including xfs,
ext4, btrfs, and tmpfs.

Mailing list
------------
The linux-nfs@vger.kernel.org mailing list is a public list. Its
purpose is to enable collaboration among developers working on the
Linux NFS stack, both client and server. It is not a place for
conversations that are not directly related to the Linux NFS stack.

The linux-nfs mailing list is archived on `lore.kernel.org <https://lore.kernel.org/linux-nfs/>`_.

The Linux NFS community does not have a chat room.

Reporting bugs
--------------
If you experience an NFSD-related bug on a distribution-built
kernel, please start by working with your Linux distributor.
Bug reports against upstream Linux code bases are welcome on the
linux-nfs@vger.kernel.org mailing list, where some active triage
can be done. NFSD bugs may also be reported in the Linux kernel
community's bugzilla at:

    https://bugzilla.kernel.org

Please file NFSD-related bugs under the "Filesystems/NFSD"
component. In general, including as much detail as possible is a
good start, including pertinent system log messages from both
the client and the server.

User space software related to NFSD, such as mountd or the exportfs
command, is contained in the nfs-utils package. Report problems
with those components to linux-nfs@vger.kernel.org. You might be
directed to move the report to a specific bug tracker.

Contributor's Guide
-------------------

Standards compliance
~~~~~~~~~~~~~~~~~~~~
The priority is for NFSD to interoperate fully with the Linux NFS
client. We also test against other popular NFS client
implementations regularly at NFS bake-a-thon events (also known as
plugfests). Non-Linux NFS clients are not part of upstream NFSD
CI/CD.

The NFSD community strives to provide an NFS server implementation
that interoperates with all standards-compliant NFS client
implementations. This is done by staying as close as is sensible to
the normative mandates in the IETF's published NFS, RPC, and GSS-API
standards.

It is always useful to reference an RFC and section number in a code
comment where behavior deviates from the standard (and even when the
behavior is compliant but the implementation is obfuscatory).

On the rare occasion when a deviation from standard-mandated
behavior is needed, brief documentation of the use case or of the
deficiencies in the standard is a required part of the in-code
documentation.
Care must always be taken to avoid leaking local error codes (i.e.,
errnos) to clients of NFSD. A proper NFS status code is always
required in NFS protocol replies.

NFSD administrative interfaces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NFSD administrative interfaces include:

- an NFSD or SUNRPC module parameter

- export options in /etc/exports

- files under /proc/fs/nfsd/ or /proc/sys/sunrpc/

- the NFSD netlink protocol

Frequently, a request is made to introduce or modify one of NFSD's
traditional administrative interfaces. Certainly it is technically
easy to introduce a new administrative setting. However, there are
good reasons why the NFSD maintainers prefer to leave that as a last
resort:

- As with any API, administrative interfaces are difficult to get
  right.

- Once they are documented and have a legacy of use, administrative
  interfaces become difficult to modify or remove.

- Every new administrative setting multiplies the NFSD test matrix.

- The cost of one administrative interface is incremental, but costs
  add up across all of the existing interfaces.

It is often better for everyone if effort is made up front to
understand the underlying requirement of the new setting, and
then to make it tune itself (or become otherwise unnecessary).

If a new setting is indeed necessary, first consider adding it to
the NFSD netlink protocol. Or, if it doesn't need to be a reliable
long-term user space feature, it can be added to NFSD's menagerie of
experimental settings, which reside under /sys/kernel/debug/nfsd/ .

Field observability
~~~~~~~~~~~~~~~~~~~
NFSD employs several different mechanisms for observing operation,
including counters, printks, WARNings, and static trace points.
Each has its strengths and weaknesses. Contributors should select the
most appropriate tool for their task.

- BUG must be avoided if at all possible, as it frequently
  results in a full system crash.

- WARN is appropriate only when a full stack trace is useful.

- printk can show detailed information. It must not be used
  in code paths where it can be triggered repeatedly by remote
  users.

- dprintk can show detailed information, but can be enabled only
  in pre-set groups. The overhead of emitting output makes dprintk
  inappropriate for frequent operations like I/O.

- Counters are always on, but provide little information about
  individual events other than how frequently they occur.

- Static trace points can be enabled individually or in groups
  (via a glob). These are generally low overhead, and thus are
  favored for use in hot paths.

- Dynamic tracing, such as kprobes or eBPF, is quite flexible but
  cannot be used in certain environments (e.g., full kernel
  lockdown).

Testing
~~~~~~~
The kdevops project

    https://github.com/linux-kdevops/kdevops

contains several NFS-specific workflows, as well as the
community-standard fstests suite. These workflows are based on open
source testing tools such as ltp and fio. Contributors are
encouraged to use these tools directly, or to install and use
kdevops themselves, to verify their patches before submission.
Coding style
~~~~~~~~~~~~
Follow the coding style preferences described in

    Documentation/process/coding-style.rst

with the following exceptions:

- Add new local variables to a function in reverse Christmas tree
  order.

- Use the kernel-doc comment style for:

  + non-static functions
  + static inline functions
  + static functions that are callbacks/virtual functions

- All new function names start with ``nfsd_`` for
  non-NFS-version-specific functions.

- New function names that are specific to NFSv2 or NFSv3, or are
  used by all minor versions of NFSv4, use ``nfsdN_`` where N is
  the version.

- New function names specific to an NFSv4 minor version can be
  named with ``nfsd4M_`` where M is the minor version.

Patch preparation
~~~~~~~~~~~~~~~~~
Read and follow all guidelines in

    Documentation/process/submitting-patches.rst

Use tags to identify all patch authors. However, reviewers and
testers should be added by replying to the email patch submission.
Email is used extensively in order to publicly archive review and
testing attributions. These tags are automatically inserted into
your patches when they are applied.

The code in the body of the diff already shows /what/ is being
changed. Thus it is not necessary to repeat that in the patch
description. Instead, the description should contain one or more
of:

- A brief problem statement ("what is this patch trying to fix?")
  with a root-cause analysis.

- End-user-visible symptoms or items that a support engineer might
  use to search for the patch, like stack traces.

- A brief explanation of why the patch is the best way to address
  the problem.
- Any context that reviewers might need to understand the changes
  made by the patch.

- Any relevant benchmarking results and/or functional test results.

As detailed in Documentation/process/submitting-patches.rst,
identify the point in history where the issue being addressed was
introduced by using a Fixes: tag.

Mention in the patch description if that point in history cannot be
determined -- that is, if no Fixes: tag can be provided. In this
case, please make it clear to maintainers whether an LTS backport is
needed even though there is no Fixes: tag.

The NFSD maintainers prefer to add stable tagging themselves, after
public discussion in response to the patch submission. Contributors
may suggest stable tagging, but be aware that many version
management tools add such stable Cc's when you post your patches.
Don't add "Cc: stable" during the initial submission process unless
you are absolutely sure the patch needs to go to stable.

Patch submission
~~~~~~~~~~~~~~~~
Patches to NFSD are submitted via the kernel's email-based review
process that is common to most other kernel subsystems.

Just before each submission, rebase your patch or series on the
nfsd-testing branch at

    https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

The NFSD subsystem is maintained separately from the Linux in-kernel
NFS client. The NFSD maintainers do not normally take submissions
for client changes, nor can they respond authoritatively to bug
reports or feature requests for NFS client code.

This means that contributors might be asked to resubmit patches if
they were emailed to the incorrect set of maintainers and reviewers.
This is not a rejection, but simply a correction of the submission
process.
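For illustration, the mechanics of the patch-preparation guidance above can be sketched with git alone; the throwaway repository, subject line, and author identity below are made-up examples, not taken from this pull request:

```shell
# Sketch only: prepare a patch file whose commit message follows the
# description guidelines above (problem statement, rationale, sign-off).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name="A Contributor" -c user.email="contributor@example.org" \
    commit -q --allow-empty -m "NFSD: Example subject line

A brief problem statement with a root-cause analysis goes here,
followed by why this patch is the best way to address the problem.

Signed-off-by: A Contributor <contributor@example.org>"
git format-patch -1 -o outgoing >/dev/null
ls outgoing
```

The resulting ``outgoing/0001-NFSD-Example-subject-line.patch`` is what a tool such as ``git send-email`` would then deliver to the maintainers and mailing list.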
When in doubt, consult the NFSD entry in the MAINTAINERS file to
see which files and directories fall under the NFSD subsystem.

The proper set of email addresses for NFSD patches is:

    To: the NFSD maintainers and reviewers listed in MAINTAINERS
    Cc: linux-nfs@vger.kernel.org and optionally linux-kernel@

If other subsystems are involved in the patches (for example
MM or RDMA), their primary mailing list address can be included in
the Cc: field. Other contributors and interested parties may be
included there as well.

In general we prefer that contributors use common patch email tools
such as "git send-email" or "stg email format/send", which tend to
get the details right without a lot of fuss.

A series consisting of a single patch is not required to have a
cover letter. However, a cover letter can be included if there is
substantial context that is not appropriate to include in the
patch description.

Please note that, with an email-based submission process, series
cover letters are not part of the work that is committed to the
kernel source code base or its commit history. Therefore, always try
to keep pertinent information in the patch descriptions.

Design documentation is welcome, but as cover letters are not
preserved, a better option is often to include a patch that adds
such documentation under Documentation/filesystems/nfs/.

Reviewers will ask about test coverage and what use cases the
patches are expected to address. Please be prepared to answer these
questions.

Review comments from maintainers might be politely stated, but in
general they are not optional to address when they are actionable.
If necessary, the maintainers retain the right to not apply patches
when contributors refuse to address reasonable requests.
Post changes to kernel source code and user space source code as
separate series. You can connect the two series with comments in
your cover letters.

Generally the NFSD maintainers ask for a repost even for simple
modifications in order to publicly archive the request and the
resulting repost before it is pulled into the NFSD trees. This
also enables us to rebuild a patch series quickly without missing
changes that might have been discussed via email.

Avoid frequently reposting large series with only small changes. As
a rule of thumb, posting substantial changes more than once a week
will result in reviewer overload.

Remember, there are only a handful of subsystem maintainers and
reviewers, but potentially many sources of contributions. The
maintainers and reviewers, therefore, are always the less scalable
resource. Be kind to your friendly neighborhood maintainer.

Patch Acceptance
~~~~~~~~~~~~~~~~
There isn't a formal review process for NFSD, but we like to see
at least two Reviewed-by: notices for patches that are more than
simple clean-ups. Reviews are done in public on
linux-nfs@vger.kernel.org and are archived on lore.kernel.org.

Currently the NFSD patch queues are maintained in branches here:

    https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

The NFSD maintainers apply patches initially to the nfsd-testing
branch, which is always open to new submissions. Patches can be
applied while review is ongoing. nfsd-testing is a topic branch,
so it can change frequently, it will be rebased, and your patch
might get dropped if there is a problem with it.

Generally a script-generated "thank you" email will indicate when
your patch has been added to the nfsd-testing branch.
You can track
the progress of your patch using the linux-nfs patchwork instance:

    https://patchwork.kernel.org/project/linux-nfs/list/

While your patch is in nfsd-testing, it is exposed to a variety of
test environments, including community zero-day bots, static
analysis tools, and NFSD continuous integration testing. The soak
period is three to four weeks.

Each patch that survives in nfsd-testing for the soak period without
changes is moved to the nfsd-next branch.

The nfsd-next branch is automatically merged into linux-next and
fs-next on a nightly basis.

Patches that survive in nfsd-next are included in the next NFSD
merge window pull request. These windows typically occur once every
63 days (nine weeks).

When the upstream merge window closes, the nfsd-next branch is
renamed nfsd-fixes, and a new nfsd-next branch is created, based on
the upstream -rc1 tag.

Fixes that are destined for an upstream -rc release also run the
nfsd-testing gauntlet, but are then applied to the nfsd-fixes
branch. That branch is made available for Linus to pull after a
short time. In order to limit the risk of introducing regressions,
we limit such fixes to emergency situations or fixes for breakage
that occurred during the most recent upstream merge.

Please make it clear when submitting an emergency patch that
immediate action (either application to -rc or LTS backport) is
needed.

Sensitive patch submissions and bug reports
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CVEs are generated by specific members of the Linux kernel community
and several external entities. The Linux NFS community does not emit
or assign CVEs. CVEs are assigned after an issue and its fix are
known.
However, the NFSD maintainers sometimes receive sensitive security
reports, and at times these are significant enough to need to be
embargoed. In such rare cases, fixes can be developed and reviewed
out of the public eye.

Please be aware that many version management tools add stable
Cc's when you post your patches. This is generally a nuisance, but
it can accidentally out an embargoed security issue.
Don't add "Cc: stable" during the initial submission process unless
you are absolutely sure the patch needs to go to stable@.

Patches that are merged without ever appearing on any list, and
which carry a Reported-by: or Fixes: tag, are flagged as suspicious
by security-focused people. We encourage that, after any private
review, security-sensitive patches be posted to linux-nfs@
for the usual public review, archiving, and test period.

LLM-generated submissions
~~~~~~~~~~~~~~~~~~~~~~~~~
The Linux kernel community as a whole is still exploring the new
world of LLM-generated code. The NFSD maintainers will entertain
submission of patches that are partially or wholly generated by
LLM-based development tools. Such submissions are held to the
same standards as submissions created entirely by human authors:

- The human contributor identifies themselves via a Signed-off-by:
  tag. This tag counts as a Developer's Certificate of Origin (DCO)
  certification.

- The human contributor is solely responsible for code provenance
  and any contamination by inadvertently-included code with a
  conflicting license, as usual.

- The human contributor must be able to answer and address review
  questions. A patch description such as "This fixed my problem
  but I don't know why" is not acceptable.

- The contribution is subjected to the same test regimen as all
  other submissions.
- An indication (via a Generated-by: tag or otherwise) that the
  contribution is LLM-generated is not required.

It is easy to address review comments and fix requests in
LLM-generated code. So easy, in fact, that it becomes tempting to
repost refreshed code immediately. Please resist that temptation.

As always, please avoid reposting series revisions more than once
every 24 hours.

Clean-up patches
~~~~~~~~~~~~~~~~
The NFSD maintainers discourage patches that perform simple
clean-ups that are not in the context of other work. For example:

* Addressing ``checkpatch.pl`` warnings after merge
* Addressing :ref:`Local variable ordering<rcs>` issues
* Addressing long-standing whitespace damage

This is because the churn that such changes produce is felt to
come at a greater cost than the value of the clean-ups themselves.

Conversely, spelling and grammar fixes are encouraged.

Stable and LTS support
----------------------
Upstream NFSD continuous integration testing runs against LTS trees
whenever they are updated.

Please indicate when a patch containing a fix needs to be considered
for LTS kernels, either via a Fixes: tag or an explicit mention.

Feature requests
----------------
There is no one way to make an official feature request, but
discussion of the request should eventually make its way to
the linux-nfs@vger.kernel.org mailing list for public review by
the community.

Subsystem boundaries
~~~~~~~~~~~~~~~~~~~~
NFSD itself is not much more than a protocol engine. This means its
primary responsibility is to translate the NFS protocol into API
calls in the Linux kernel. For example, NFSD is not responsible for
knowing exactly how bytes or file attributes are managed on a block
device.
It relies on other kernel subsystems for that.

If the subsystems on which NFSD relies do not implement a particular
feature, even if the standard NFS protocols do support that feature,
that usually means NFSD cannot provide that feature without
substantial development work in other areas of the kernel.

Specificity
~~~~~~~~~~~
Feature requests can come from anywhere, and thus can often be
nebulous. A requester might not understand what a "use case" or
"user story" is. These descriptive paradigms are often used by
developers and architects to understand what is required of a
design, but are terms of art in the software trade, not used in
the everyday world.

In order to prevent contributors and maintainers from becoming
overwhelmed, we won't be afraid of saying "no" politely to
underspecified requests.

Community roles and their authority
-----------------------------------
The purpose of Linux subsystem communities is to provide expertise
and active stewardship of a narrow set of source files in the Linux
kernel. This can include managing user space tooling as well.

To contextualize the structure of the Linux NFS community that
is responsible for stewardship of the NFS server code base, we
define the community roles here.

- **Contributor** : Anyone who submits a code change, bug fix,
  recommendation, documentation fix, and so on. A contributor can
  submit regularly or infrequently.

- **Outside Contributor** : A contributor who is not a regular actor
  in the Linux NFS community. This can mean someone who contributes
  to other parts of the kernel, or someone who just noticed a
  misspelling in a comment and sent a patch.
503 + 504 + - **Reviewer** : Someone who is named in the MAINTAINERS file as a 505 + reviewer is an area expert who can request changes to contributed 506 + code, and expects that contributors will address the request. 507 + 508 + - **External Reviewer** : Someone who is not named in the 509 + MAINTAINERS file as a reviewer, but who is an area expert. 510 + Examples include Linux kernel contributors with networking, 511 + security, or persistent storage expertise, or developers who 512 + contribute primarily to other NFS implementations. 513 + 514 + One or more people will take on the following roles. These people 515 + are often generically referred to as "maintainers", and are 516 + identified in the MAINTAINERS file with the "M:" tag under the NFSD 517 + subsystem. 518 + 519 + - **Upstream Release Manager** : This role is responsible for 520 + curating contributions into a branch, reviewing test results, and 521 + then sending a pull request during merge windows. There is a 522 + trust relationship between the release manager and Linus. 523 + 524 + - **Bug Triager** : Someone who is a first responder to bug reports 525 + submitted to the linux-nfs mailing list or bug trackers, and helps 526 + troubleshoot and identify next steps. 527 + 528 + - **Security Lead** : The security lead handles contacts from the 529 + security community to resolve immediate issues, as well as dealing 530 + with long-term security issues such as supply chain concerns. For 531 + upstream, that's usually whether contributions violate licensing 532 + or other intellectual property agreements. 533 + 534 + - **Testing Lead** : The testing lead builds and runs the test 535 + infrastructure for the subsystem. The testing lead may ask for 536 + patches to be dropped because of ongoing high defect rates. 
537 + 538 + - **LTS Maintainer** : The LTS maintainer is responsible for managing 539 + the Fixes: and Cc: stable annotations on patches, and seeing that 540 + patches that cannot be automatically applied to LTS kernels get 541 + proper manual backports as necessary. 542 + 543 + - **Community Manager** : This umpire role can be asked to call balls 544 + and strikes during conflicts, but is also responsible for ensuring 545 + the health of the relationships within the community and for 546 + facilitating discussions on long-term topics such as how to manage 547 + growing technical debt.
+1
Documentation/maintainer/maintainer-entry-profile.rst
···
     ../process/maintainer-netdev
     ../driver-api/vfio-pci-device-specific-driver-acceptance
     ../nvme/feature-and-quirk-policy
+    ../filesystems/nfs/nfsd-maintainer-entry-profile
     ../filesystems/xfs/xfs-maintainer-entry-profile
     ../mm/damon/maintainer-profile
+5
MAINTAINERS
···
 R:	Tom Talpey <tom@talpey.com>
 L:	linux-nfs@vger.kernel.org
 S:	Supported
+P:	Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst
 B:	https://bugzilla.kernel.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
 F:	Documentation/filesystems/nfs/
···
 F:	include/uapi/linux/sunrpc/
 F:	net/sunrpc/
 F:	tools/net/sunrpc/
+
+KERNEL NFSD BLOCK and SCSI LAYOUT DRIVER
+R:	Christoph Hellwig <hch@lst.de>
+F:	fs/nfsd/blocklayout*

 KERNEL PACMAN PACKAGING (in addition to generic KERNEL BUILD)
 M:	Thomas Weißschuh <linux@weissschuh.net>
+12
fs/lockd/svclock.c
···
 			(long long)lock->fl.fl_end,
 			wait);

+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;
+
 	if (!locks_can_async_lock(nlmsvc_file_file(file)->f_op)) {
 		async_block = wait;
 		wait = 0;
···
 		(long long)lock->fl.fl_start,
 		(long long)lock->fl.fl_end);

+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;
+
 	if (locks_in_grace(SVC_NET(rqstp))) {
 		ret = nlm_lck_denied_grace_period;
 		goto out;
···
 		(long long)lock->fl.fl_start,
 		(long long)lock->fl.fl_end);

+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;
+
 	/* First, cancel any lock that might be there */
 	nlmsvc_cancel_blocked(net, file, lock);
···
 		lock->fl.c.flc_pid,
 		(long long)lock->fl.fl_start,
 		(long long)lock->fl.fl_end);
+
+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;

 	if (locks_in_grace(net))
 		return nlm_lck_denied_grace_period;
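Each of the hunks above inserts the same guard: refuse with nlm_lck_denied_nolocks before any grace-period or conflict handling runs, so a reexported NFSv2/3 file can never reach the locking paths. A minimal userspace sketch of that early-return ordering follows; the names `nlm_try_lock`, `nlm_demo_file`, and `reexported_v23` are illustrative stand-ins, not the kernel's API.

```c
#include <stdbool.h>

/* Illustrative status values, mirroring the nlm_* codes in spirit only. */
enum nlm_demo_status { NLM_GRANTED, NLM_DENIED_GRACE, NLM_DENIED_NOLOCKS };

struct nlm_demo_file {
	bool reexported_v23;	/* stands in for nlmsvc_file_cannot_lock() */
	bool in_grace;		/* stands in for locks_in_grace() */
};

/*
 * Deny locking up front when the export cannot support it at all;
 * only then consider the grace period, matching the order of the
 * checks added in the patch above.
 */
static enum nlm_demo_status nlm_try_lock(const struct nlm_demo_file *f)
{
	if (f->reexported_v23)
		return NLM_DENIED_NOLOCKS;
	if (f->in_grace)
		return NLM_DENIED_GRACE;
	return NLM_GRANTED;
}
```

Note that the "cannot lock" check wins even during a grace period: the client gets a permanent denial rather than a retryable grace error.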
+6
fs/lockd/svcshare.c
···
 	struct xdr_netobj *oh = &argp->lock.oh;
 	u8 *ohdata;

+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;
+
 	for (share = file->f_shares; share; share = share->s_next) {
 		if (share->s_host == host && nlm_cmp_owner(share, oh))
 			goto update;
···
 {
 	struct nlm_share *share, **shpp;
 	struct xdr_netobj *oh = &argp->lock.oh;
+
+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;

 	for (shpp = &file->f_shares; (share = *shpp) != NULL;
 	     shpp = &share->s_next) {
+3 -3
fs/nfsd/Kconfig
···
 	depends on FILE_LOCKING
 	depends on FSNOTIFY
 	select CRC32
+	select CRYPTO_LIB_MD5 if NFSD_LEGACY_CLIENT_TRACKING
 	select CRYPTO_LIB_SHA256 if NFSD_V4
 	select LOCKD
 	select SUNRPC
···
 	depends on NFSD && PROC_FS
 	select FS_POSIX_ACL
 	select RPCSEC_GSS_KRB5
-	select CRYPTO
-	select CRYPTO_MD5
+	select CRYPTO # required by RPCSEC_GSS_KRB5
 	select GRACE_PERIOD
 	select NFS_V4_2_SSC_HELPER if NFS_V4_2
 	help
···
 config NFSD_LEGACY_CLIENT_TRACKING
 	bool "Support legacy NFSv4 client tracking methods (DEPRECATED)"
 	depends on NFSD_V4
-	default y
+	default n
 	help
 	  The NFSv4 server needs to store a small amount of information on
 	  stable storage in order to handle state recovery after reboot. Most
+112 -50
fs/nfsd/blocklayout.c
···
 #include "pnfs.h"
 #include "filecache.h"
 #include "vfs.h"
+#include "trace.h"

 #define NFSDDBG_FACILITY	NFSDDBG_PNFS


+/*
+ * Get an extent from the file system that starts at offset or below
+ * and may be shorter than the requested length.
+ */
 static __be32
-nfsd4_block_proc_layoutget(struct svc_rqst *rqstp, struct inode *inode,
-		const struct svc_fh *fhp, struct nfsd4_layoutget *args)
+nfsd4_block_map_extent(struct inode *inode, const struct svc_fh *fhp,
+		u64 offset, u64 length, u32 iomode, u64 minlength,
+		struct pnfs_block_extent *bex)
 {
-	struct nfsd4_layout_seg *seg = &args->lg_seg;
 	struct super_block *sb = inode->i_sb;
-	u32 block_size = i_blocksize(inode);
-	struct pnfs_block_extent *bex;
 	struct iomap iomap;
 	u32 device_generation = 0;
 	int error;

-	if (locks_in_grace(SVC_NET(rqstp)))
-		return nfserr_grace;
-
-	if (seg->offset & (block_size - 1)) {
-		dprintk("pnfsd: I/O misaligned\n");
-		goto out_layoutunavailable;
-	}
-
-	/*
-	 * Some clients barf on non-zero block numbers for NONE or INVALID
-	 * layouts, so make sure to zero the whole structure.
-	 */
-	error = -ENOMEM;
-	bex = kzalloc(sizeof(*bex), GFP_KERNEL);
-	if (!bex)
-		goto out_error;
-	args->lg_content = bex;
-
-	error = sb->s_export_op->map_blocks(inode, seg->offset, seg->length,
-					    &iomap, seg->iomode != IOMODE_READ,
-					    &device_generation);
+	error = sb->s_export_op->map_blocks(inode, offset, length, &iomap,
+					    iomode != IOMODE_READ, &device_generation);
 	if (error) {
 		if (error == -ENXIO)
-			goto out_layoutunavailable;
-		goto out_error;
-	}
-
-	if (iomap.length < args->lg_minlength) {
-		dprintk("pnfsd: extent smaller than minlength\n");
-		goto out_layoutunavailable;
+			return nfserr_layoutunavailable;
+		return nfserrno(error);
 	}

 	switch (iomap.type) {
 	case IOMAP_MAPPED:
-		if (seg->iomode == IOMODE_READ)
+		if (iomode == IOMODE_READ)
 			bex->es = PNFS_BLOCK_READ_DATA;
 		else
 			bex->es = PNFS_BLOCK_READWRITE_DATA;
 		bex->soff = iomap.addr;
 		break;
 	case IOMAP_UNWRITTEN:
-		if (seg->iomode & IOMODE_RW) {
+		if (iomode & IOMODE_RW) {
 			/*
 			 * Crack monkey special case from section 2.3.1.
 			 */
-			if (args->lg_minlength == 0) {
+			if (minlength == 0) {
 				dprintk("pnfsd: no soup for you!\n");
-				goto out_layoutunavailable;
+				return nfserr_layoutunavailable;
 			}

 			bex->es = PNFS_BLOCK_INVALID_DATA;
···
 		}
 		fallthrough;
 	case IOMAP_HOLE:
-		if (seg->iomode == IOMODE_READ) {
+		if (iomode == IOMODE_READ) {
 			bex->es = PNFS_BLOCK_NONE_DATA;
 			break;
 		}
···
 	case IOMAP_DELALLOC:
 	default:
 		WARN(1, "pnfsd: filesystem returned %d extent\n", iomap.type);
-		goto out_layoutunavailable;
+		return nfserr_layoutunavailable;
 	}

 	error = nfsd4_set_deviceid(&bex->vol_id, fhp, device_generation);
 	if (error)
-		goto out_error;
+		return nfserrno(error);
+
 	bex->foff = iomap.offset;
 	bex->len = iomap.length;
+	return nfs_ok;
+}

-	seg->offset = iomap.offset;
-	seg->length = iomap.length;
+static __be32
+nfsd4_block_proc_layoutget(struct svc_rqst *rqstp, struct inode *inode,
+		const struct svc_fh *fhp, struct nfsd4_layoutget *args)
+{
+	struct nfsd4_layout_seg *seg = &args->lg_seg;
+	struct pnfs_block_layout *bl;
+	struct pnfs_block_extent *first_bex, *last_bex;
+	u64 offset = seg->offset, length = seg->length;
+	u32 i, nr_extents_max, block_size = i_blocksize(inode);
+	__be32 nfserr;

-	dprintk("GET: 0x%llx:0x%llx %d\n", bex->foff, bex->len, bex->es);
-	return 0;
+	if (locks_in_grace(SVC_NET(rqstp)))
+		return nfserr_grace;
+
+	nfserr = nfserr_layoutunavailable;
+	if (seg->offset & (block_size - 1)) {
+		dprintk("pnfsd: I/O misaligned\n");
+		goto out_error;
+	}
+
+	/*
+	 * RFC 8881, section 3.3.17:
+	 *   The layout4 data type defines a layout for a file.
+	 *
+	 * RFC 8881, section 18.43.3:
+	 *   The loga_maxcount field specifies the maximum layout size
+	 *   (in bytes) that the client can handle. If the size of the
+	 *   layout structure exceeds the size specified by maxcount,
+	 *   the metadata server will return the NFS4ERR_TOOSMALL error.
+	 */
+	nfserr = nfserr_toosmall;
+	if (args->lg_maxcount < PNFS_BLOCK_LAYOUT4_SIZE +
+				PNFS_BLOCK_EXTENT_SIZE)
+		goto out_error;
+
+	/*
+	 * Limit the maximum layout size to avoid allocating
+	 * a large buffer on the server for each layout request.
+	 */
+	nr_extents_max = (min(args->lg_maxcount, PAGE_SIZE) -
+			  PNFS_BLOCK_LAYOUT4_SIZE) / PNFS_BLOCK_EXTENT_SIZE;
+
+	/*
+	 * Some clients barf on non-zero block numbers for NONE or INVALID
+	 * layouts, so make sure to zero the whole structure.
+	 */
+	nfserr = nfserrno(-ENOMEM);
+	bl = kzalloc(struct_size(bl, extents, nr_extents_max), GFP_KERNEL);
+	if (!bl)
+		goto out_error;
+	bl->nr_extents = nr_extents_max;
+	args->lg_content = bl;
+
+	for (i = 0; i < bl->nr_extents; i++) {
+		struct pnfs_block_extent *bex = bl->extents + i;
+		u64 bex_length;
+
+		nfserr = nfsd4_block_map_extent(inode, fhp, offset, length,
+				seg->iomode, args->lg_minlength, bex);
+		if (nfserr != nfs_ok)
+			goto out_error;
+
+		bex_length = bex->len - (offset - bex->foff);
+		if (bex_length >= length) {
+			bl->nr_extents = i + 1;
+			break;
+		}
+
+		offset = bex->foff + bex->len;
+		length -= bex_length;
+	}
+
+	first_bex = bl->extents;
+	last_bex = bl->extents + bl->nr_extents - 1;
+
+	nfserr = nfserr_layoutunavailable;
+	length = last_bex->foff + last_bex->len - seg->offset;
+	if (length < args->lg_minlength) {
+		dprintk("pnfsd: extent smaller than minlength\n");
+		goto out_error;
+	}
+
+	seg->offset = first_bex->foff;
+	seg->length = last_bex->foff - first_bex->foff + last_bex->len;
+	return nfs_ok;

 out_error:
 	seg->length = 0;
-	return nfserrno(error);
-
-out_layoutunavailable:
-	seg->length = 0;
-	return nfserr_layoutunavailable;
+	return nfserr;
 }

 static __be32
···
 {
 	struct nfs4_client *clp = ls->ls_stid.sc_client;
 	struct block_device *bdev = file->nf_file->f_path.mnt->mnt_sb->s_bdev;
+	int status;

-	bdev->bd_disk->fops->pr_ops->pr_preempt(bdev, NFSD_MDS_PR_KEY,
-			nfsd4_scsi_pr_key(clp), 0, true);
+	status = bdev->bd_disk->fops->pr_ops->pr_preempt(bdev, NFSD_MDS_PR_KEY,
+			nfsd4_scsi_pr_key(clp),
+			PR_EXCLUSIVE_ACCESS_REG_ONLY, true);
+	trace_nfsd_pnfs_fence(clp, bdev->bd_disk->disk_name, status);
 }

 const struct nfsd4_layout_ops scsi_layout_ops = {
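The extent-count cap in the rewritten layoutget path is pure arithmetic: the server honors the client's loga_maxcount but bounds its own allocation at one page, subtracts the fixed layout4 header, and divides by the per-extent wire size. The sketch below restates that calculation under the assumption that an on-wire extent is a 16-byte deviceid4, three XDR hypers, and one state word (44 bytes); `demo_nr_extents_max` and both `DEMO_*` constants are illustrative, not kernel symbols.

```c
#include <stdint.h>

/* Header words: offset4 (2), length4 (2), layoutiomode4, layouttype4,
 * byte count, extent count -- eight XDR words, i.e. 32 bytes. */
#define DEMO_LAYOUT4_SIZE	(4 * 8)
/* Assumed extent wire size: deviceid4 (16) + 3 hypers (24) + es word (4). */
#define DEMO_EXTENT_SIZE	(16 + 8 * 3 + 4)

/*
 * How many extents fit in the client's loga_maxcount, capped at one
 * page of server-side buffer, as nfsd4_block_proc_layoutget now does.
 * Returns 0 when not even one extent fits (the NFS4ERR_TOOSMALL case).
 */
static uint32_t demo_nr_extents_max(uint32_t lg_maxcount, uint32_t page_size)
{
	uint32_t cap = lg_maxcount < page_size ? lg_maxcount : page_size;

	if (cap < DEMO_LAYOUT4_SIZE + DEMO_EXTENT_SIZE)
		return 0;
	return (cap - DEMO_LAYOUT4_SIZE) / DEMO_EXTENT_SIZE;
}
```

With a 4096-byte page cap this allows up to 92 extents per reply, which is why the loop above can stop early once the accumulated extents cover the requested length.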
+27 -9
fs/nfsd/blocklayoutxdr.c
···
 #define NFSDDBG_FACILITY	NFSDDBG_PNFS


+/**
+ * nfsd4_block_encode_layoutget - encode block/scsi layout extent array
+ * @xdr: stream for data encoding
+ * @lgp: layoutget content, actually an array of extents to encode
+ *
+ * Encode the opaque loc_body field in the layoutget response. Since the
+ * pnfs_block_layout4 and pnfs_scsi_layout4 structures on the wire are
+ * the same, this function is used by both layout drivers.
+ *
+ * Return values:
+ *   %nfs_ok: Success, all extents encoded into @xdr
+ *   %nfserr_toosmall: Not enough space in @xdr to encode all the data
+ */
 __be32
 nfsd4_block_encode_layoutget(struct xdr_stream *xdr,
 		const struct nfsd4_layoutget *lgp)
 {
-	const struct pnfs_block_extent *b = lgp->lg_content;
-	int len = sizeof(__be32) + 5 * sizeof(__be64) + sizeof(__be32);
+	const struct pnfs_block_layout *bl = lgp->lg_content;
+	u32 i, len = sizeof(__be32) + bl->nr_extents * PNFS_BLOCK_EXTENT_SIZE;
 	__be32 *p;

 	p = xdr_reserve_space(xdr, sizeof(__be32) + len);
···
 		return nfserr_toosmall;

 	*p++ = cpu_to_be32(len);
-	*p++ = cpu_to_be32(1);	/* we always return a single extent */
+	*p++ = cpu_to_be32(bl->nr_extents);

-	p = svcxdr_encode_deviceid4(p, &b->vol_id);
-	p = xdr_encode_hyper(p, b->foff);
-	p = xdr_encode_hyper(p, b->len);
-	p = xdr_encode_hyper(p, b->soff);
-	*p++ = cpu_to_be32(b->es);
-	return 0;
+	for (i = 0; i < bl->nr_extents; i++) {
+		const struct pnfs_block_extent *bex = bl->extents + i;
+
+		p = svcxdr_encode_deviceid4(p, &bex->vol_id);
+		p = xdr_encode_hyper(p, bex->foff);
+		p = xdr_encode_hyper(p, bex->len);
+		p = xdr_encode_hyper(p, bex->soff);
+		*p++ = cpu_to_be32(bex->es);
+	}
+
+	return nfs_ok;
 }

 static int
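The encoder above follows the standard XDR array shape: a count word followed by each element, with 64-bit fields split into two big-endian words. A compilable userspace sketch of that loop, reduced to the three hyper fields per extent (the real encoder also emits the deviceid4 and state word); `demo_extent`, `demo_encode_hyper`, and `demo_encode_extents` are illustrative names:

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htonl, ntohl */

/* A toy extent carrying only the three 64-bit fields; field names
 * follow the kernel's pnfs_block_extent. */
struct demo_extent { uint64_t foff, len, soff; };

/* Encode a 64-bit value as an XDR hyper (two big-endian words),
 * mirroring what xdr_encode_hyper() does. Returns the advanced cursor. */
static uint32_t *demo_encode_hyper(uint32_t *p, uint64_t v)
{
	*p++ = htonl((uint32_t)(v >> 32));
	*p++ = htonl((uint32_t)v);
	return p;
}

/* Encode the extent array as the rewritten encoder does: a count word
 * followed by each extent in order. Returns the words written. */
static size_t demo_encode_extents(uint32_t *p, const struct demo_extent *ex,
				  uint32_t nr)
{
	uint32_t *start = p;
	uint32_t i;

	*p++ = htonl(nr);
	for (i = 0; i < nr; i++) {
		p = demo_encode_hyper(p, ex[i].foff);
		p = demo_encode_hyper(p, ex[i].len);
		p = demo_encode_hyper(p, ex[i].soff);
	}
	return (size_t)(p - start);
}
```

Because the element size is fixed, the byte-count word can be computed up front from nr_extents alone, which is exactly how `len` is derived before any extent is written.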
+14
fs/nfsd/blocklayoutxdr.h
···
 struct iomap;
 struct xdr_stream;

+/* On the wire size of the layout4 struct with zero number of extents */
+#define PNFS_BLOCK_LAYOUT4_SIZE			\
+	(sizeof(__be32) * 2 +	/* offset4 */		\
+	 sizeof(__be32) * 2 +	/* length4 */		\
+	 sizeof(__be32) +	/* layoutiomode4 */	\
+	 sizeof(__be32) +	/* layouttype4 */	\
+	 sizeof(__be32) +	/* number of bytes */	\
+	 sizeof(__be32))	/* number of extents */
+
 struct pnfs_block_extent {
 	struct nfsd4_deviceid vol_id;
 	u64 foff;
···
 struct pnfs_block_range {
 	u64 foff;
 	u64 len;
+};
+
+struct pnfs_block_layout {
+	u32 nr_extents;
+	struct pnfs_block_extent extents[] __counted_by(nr_extents);
 };

 /*
+3
fs/nfsd/debugfs.c
···
  * Contents:
  *   %0: NFS READ will use buffered IO
  *   %1: NFS READ will use dontcache (buffered IO w/ dropbehind)
+ *   %2: NFS READ will use direct IO
  *
  * This setting takes immediate effect for all NFS versions,
  * all exports, and in all NFSD net namespaces.
···
 		nfsd_io_cache_read = NFSD_IO_BUFFERED;
 		break;
 	case NFSD_IO_DONTCACHE:
+	case NFSD_IO_DIRECT:
 		/*
 		 * Must disable splice_read when enabling
 		 * NFSD_IO_DONTCACHE.
···
 	switch (val) {
 	case NFSD_IO_BUFFERED:
 	case NFSD_IO_DONTCACHE:
+	case NFSD_IO_DIRECT:
 		nfsd_io_cache_write = val;
 		break;
 	default:
+57 -138
fs/nfsd/nfs4recover.c
···
  *
  */

-#include <crypto/hash.h>
+#include <crypto/md5.h>
 #include <crypto/sha2.h>
 #include <linux/file.h>
 #include <linux/slab.h>
···
 	put_cred(revert_creds(original));
 }

-static int
+static void
 nfs4_make_rec_clidname(char dname[HEXDIR_LEN], const struct xdr_netobj *clname)
 {
 	u8 digest[MD5_DIGEST_SIZE];
-	struct crypto_shash *tfm;
-	int status;

 	dprintk("NFSD: nfs4_make_rec_clidname for %.*s\n",
 			clname->len, clname->data);
-	tfm = crypto_alloc_shash("md5", 0, 0);
-	if (IS_ERR(tfm)) {
-		status = PTR_ERR(tfm);
-		goto out_no_tfm;
-	}

-	status = crypto_shash_tfm_digest(tfm, clname->data, clname->len,
-					 digest);
-	if (status)
-		goto out;
+	md5(clname->data, clname->len, digest);

 	static_assert(HEXDIR_LEN == 2 * MD5_DIGEST_SIZE + 1);
 	sprintf(dname, "%*phN", MD5_DIGEST_SIZE, digest);
-
-	status = 0;
-out:
-	crypto_free_shash(tfm);
-out_no_tfm:
-	return status;
-}
-
-/*
- * If we had an error generating the recdir name for the legacy tracker
- * then warn the admin. If the error doesn't appear to be transient,
- * then disable recovery tracking.
- */
-static void
-legacy_recdir_name_error(struct nfs4_client *clp, int error)
-{
-	printk(KERN_ERR "NFSD: unable to generate recoverydir "
-			"name (%d).\n", error);
-
-	/*
-	 * if the algorithm just doesn't exist, then disable the recovery
-	 * tracker altogether. The crypto libs will generally return this if
-	 * FIPS is enabled as well.
-	 */
-	if (error == -ENOENT) {
-		printk(KERN_ERR "NFSD: disabling legacy clientid tracking. "
-			"Reboot recovery will not function correctly!\n");
-		nfsd4_client_tracking_exit(clp->net);
-	}
 }

 static void
 __nfsd4_create_reclaim_record_grace(struct nfs4_client *clp,
-		const char *dname, int len, struct nfsd_net *nn)
+		char *dname, struct nfsd_net *nn)
 {
-	struct xdr_netobj name;
+	struct xdr_netobj name = { .len = strlen(dname), .data = dname };
 	struct xdr_netobj princhash = { .len = 0, .data = NULL };
 	struct nfs4_client_reclaim *crp;

-	name.data = kmemdup(dname, len, GFP_KERNEL);
-	if (!name.data) {
-		dprintk("%s: failed to allocate memory for name.data!\n",
-				__func__);
-		return;
-	}
-	name.len = len;
 	crp = nfs4_client_to_reclaim(name, princhash, nn);
-	if (!crp) {
-		kfree(name.data);
-		return;
-	}
 	crp->cr_clp = clp;
 }
···
 	if (!nn->rec_file)
 		return;

-	status = nfs4_make_rec_clidname(dname, &clp->cl_name);
-	if (status)
-		return legacy_recdir_name_error(clp, status);
+	nfs4_make_rec_clidname(dname, &clp->cl_name);

 	status = nfs4_save_creds(&original_cred);
 	if (status < 0)
···
 out:
 	if (status == 0) {
 		if (nn->in_grace)
-			__nfsd4_create_reclaim_record_grace(clp, dname,
-					HEXDIR_LEN, nn);
+			__nfsd4_create_reclaim_record_grace(clp, dname, nn);
 		vfs_fsync(nn->rec_file, 0);
 	} else {
 		printk(KERN_ERR "NFSD: failed to write recovery record"
···
 	nfs4_reset_creds(original_cred);
 }

-typedef int (recdir_func)(struct dentry *, struct dentry *, struct nfsd_net *);
+typedef int (recdir_func)(struct dentry *, char *, struct nfsd_net *);

 struct name_list {
 	char name[HEXDIR_LEN];
···
 	}

 	status = iterate_dir(nn->rec_file, &ctx.ctx);
-	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);

 	list_for_each_entry_safe(entry, tmp, &ctx.names, list) {
-		if (!status) {
-			struct dentry *dentry;
-			dentry = lookup_one(&nop_mnt_idmap,
-					    &QSTR(entry->name), dir);
-			if (IS_ERR(dentry)) {
-				status = PTR_ERR(dentry);
-				break;
-			}
-			status = f(dir, dentry, nn);
-			dput(dentry);
-		}
+		if (!status)
+			status = f(dir, entry->name, nn);
+
 		list_del(&entry->list);
 		kfree(entry);
 	}
-	inode_unlock(d_inode(dir));
 	nfs4_reset_creds(original_cred);

 	list_for_each_entry_safe(entry, tmp, &ctx.names, list) {
···
 	if (!nn->rec_file || !test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
 		return;

-	status = nfs4_make_rec_clidname(dname, &clp->cl_name);
-	if (status)
-		return legacy_recdir_name_error(clp, status);
+	nfs4_make_rec_clidname(dname, &clp->cl_name);

 	status = mnt_want_write_file(nn->rec_file);
 	if (status)
···
 }

 static int
-purge_old(struct dentry *parent, struct dentry *child, struct nfsd_net *nn)
+purge_old(struct dentry *parent, char *cname, struct nfsd_net *nn)
 {
 	int status;
+	struct dentry *child;
 	struct xdr_netobj name;

-	if (child->d_name.len != HEXDIR_LEN - 1) {
-		printk("%s: illegal name %pd in recovery directory\n",
-				__func__, child);
+	if (strlen(cname) != HEXDIR_LEN - 1) {
+		printk("%s: illegal name %s in recovery directory\n",
+				__func__, cname);
 		/* Keep trying; maybe the others are OK: */
 		return 0;
 	}
-	name.data = kmemdup_nul(child->d_name.name, child->d_name.len, GFP_KERNEL);
+	name.data = kstrdup(cname, GFP_KERNEL);
 	if (!name.data) {
 		dprintk("%s: failed to allocate memory for name.data!\n",
 				__func__);
···
 	if (nfs4_has_reclaimed_state(name, nn))
 		goto out_free;

-	status = vfs_rmdir(&nop_mnt_idmap, d_inode(parent), child, NULL);
-	if (status)
-		printk("failed to remove client recovery directory %pd\n",
-				child);
+	inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
+	child = lookup_one(&nop_mnt_idmap, &QSTR(cname), parent);
+	if (!IS_ERR(child)) {
+		status = vfs_rmdir(&nop_mnt_idmap, d_inode(parent), child, NULL);
+		if (status)
+			printk("failed to remove client recovery directory %pd\n",
+					child);
+		dput(child);
+	}
+	inode_unlock(d_inode(parent));
+
 out_free:
 	kfree(name.data);
 out:
···
 }

 static int
-load_recdir(struct dentry *parent, struct dentry *child, struct nfsd_net *nn)
+load_recdir(struct dentry *parent, char *cname, struct nfsd_net *nn)
 {
-	struct xdr_netobj name;
+	struct xdr_netobj name = { .len = HEXDIR_LEN, .data = cname };
 	struct xdr_netobj princhash = { .len = 0, .data = NULL };

-	if (child->d_name.len != HEXDIR_LEN - 1) {
-		printk("%s: illegal name %pd in recovery directory\n",
-				__func__, child);
+	if (strlen(cname) != HEXDIR_LEN - 1) {
+		printk("%s: illegal name %s in recovery directory\n",
+				__func__, cname);
 		/* Keep trying; maybe the others are OK: */
 		return 0;
 	}
-	name.data = kmemdup_nul(child->d_name.name, child->d_name.len, GFP_KERNEL);
-	if (!name.data) {
-		dprintk("%s: failed to allocate memory for name.data!\n",
-				__func__);
-		goto out;
-	}
-	name.len = HEXDIR_LEN;
-	if (!nfs4_client_to_reclaim(name, princhash, nn))
-		kfree(name.data);
-out:
+	nfs4_client_to_reclaim(name, princhash, nn);
 	return 0;
 }
···
 static int
 nfsd4_check_legacy_client(struct nfs4_client *clp)
 {
-	int status;
 	char dname[HEXDIR_LEN];
 	struct nfs4_client_reclaim *crp;
 	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
···
 	if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
 		return 0;

-	status = nfs4_make_rec_clidname(dname, &clp->cl_name);
-	if (status) {
-		legacy_recdir_name_error(clp, status);
-		return status;
-	}
+	nfs4_make_rec_clidname(dname, &clp->cl_name);

 	/* look for it in the reclaim hashtable otherwise */
 	name.data = kmemdup(dname, HEXDIR_LEN, GFP_KERNEL);
···
 {
 	uint8_t cmd, princhashlen;
 	struct xdr_netobj name, princhash = { .len = 0, .data = NULL };
+	char *namecopy __free(kfree) = NULL;
+	char *princhashcopy __free(kfree) = NULL;
 	uint16_t namelen;

 	if (get_user(cmd, &cmsg->cm_cmd)) {
···
 		dprintk("%s: invalid namelen (%u)", __func__, namelen);
 		return -EINVAL;
 	}
-	name.data = memdup_user(&ci->cc_name.cn_id, namelen);
-	if (IS_ERR(name.data))
-		return PTR_ERR(name.data);
+	namecopy = memdup_user(&ci->cc_name.cn_id, namelen);
+	if (IS_ERR(namecopy))
+		return PTR_ERR(namecopy);
+	name.data = namecopy;
 	name.len = namelen;
 	get_user(princhashlen, &ci->cc_princhash.cp_len);
 	if (princhashlen > 0) {
-		princhash.data = memdup_user(
-				&ci->cc_princhash.cp_data,
-				princhashlen);
-		if (IS_ERR(princhash.data)) {
-			kfree(name.data);
-			return PTR_ERR(princhash.data);
-		}
+		princhashcopy = memdup_user(
+				&ci->cc_princhash.cp_data,
+				princhashlen);
+		if (IS_ERR(princhashcopy))
+			return PTR_ERR(princhashcopy);
+		princhash.data = princhashcopy;
 		princhash.len = princhashlen;
 	} else
 		princhash.len = 0;
···
 		dprintk("%s: invalid namelen (%u)", __func__, namelen);
 		return -EINVAL;
 	}
-	name.data = memdup_user(&cnm->cn_id, namelen);
-	if (IS_ERR(name.data))
-		return PTR_ERR(name.data);
+	namecopy = memdup_user(&cnm->cn_id, namelen);
+	if (IS_ERR(namecopy))
+		return PTR_ERR(namecopy);
+	name.data = namecopy;
 	name.len = namelen;
 }
 #ifdef CONFIG_NFSD_LEGACY_CLIENT_TRACKING
···
 	struct cld_net *cn = nn->cld_net;

 	name.len = name.len - 5;
-	memmove(name.data, name.data + 5, name.len);
+	name.data = name.data + 5;
 	cn->cn_has_legacy = true;
 }
 #endif
-	if (!nfs4_client_to_reclaim(name, princhash, nn)) {
-		kfree(name.data);
-		kfree(princhash.data);
+	if (!nfs4_client_to_reclaim(name, princhash, nn))
 		return -EFAULT;
-	}
 	return nn->client_tracking_ops->msglen;
 }
 return -EFAULT;
···
 #ifdef CONFIG_NFSD_LEGACY_CLIENT_TRACKING
 	if (nn->cld_net->cn_has_legacy) {
-		int status;
 		char dname[HEXDIR_LEN];
 		struct xdr_netobj name;

-		status = nfs4_make_rec_clidname(dname, &clp->cl_name);
-		if (status)
-			return -ENOENT;
+		nfs4_make_rec_clidname(dname, &clp->cl_name);

 		name.data = kmemdup(dname, HEXDIR_LEN, GFP_KERNEL);
 		if (!name.data) {
···
 	if (cn->cn_has_legacy) {
 		struct xdr_netobj name;
 		char dname[HEXDIR_LEN];
-		int status;

-		status = nfs4_make_rec_clidname(dname, &clp->cl_name);
-		if (status)
-			return -ENOENT;
+		nfs4_make_rec_clidname(dname, &clp->cl_name);

 		name.data = kmemdup(dname, HEXDIR_LEN, GFP_KERNEL);
 		if (!name.data) {
···
 		return NULL;
 	}

-	copied = nfs4_make_rec_clidname(result + copied, name);
-	if (copied) {
-		kfree(result);
-		return NULL;
-	}
+	nfs4_make_rec_clidname(result + copied, name);

 	return result;
 }
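The recovery-directory name is simply the lowercase hex of the MD5 digest of the client identifier, which is why the switch to the library `md5()` helper lets `nfs4_make_rec_clidname()` become void: the digest-then-hex step can no longer fail. The kernel formats the digest in one `sprintf` with the `%*phN` extension; userspace needs a loop. A sketch, with the digest passed in rather than computed (`demo_make_rec_clidname` is an illustrative name, and the stdlib has no MD5):

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MD5_DIGEST_SIZE	16
/* 2 hex chars per digest byte, plus the terminating NUL. */
#define DEMO_HEXDIR_LEN		(2 * DEMO_MD5_DIGEST_SIZE + 1)

/*
 * Hex-encode a 16-byte digest into a DEMO_HEXDIR_LEN buffer, the way
 * the kernel's sprintf(dname, "%*phN", ...) call does in one shot.
 */
static void demo_make_rec_clidname(char dname[DEMO_HEXDIR_LEN],
				   const unsigned char digest[DEMO_MD5_DIGEST_SIZE])
{
	int i;

	for (i = 0; i < DEMO_MD5_DIGEST_SIZE; i++)
		sprintf(dname + 2 * i, "%02x", digest[i]);
}
```

The fixed 32-character result is what the `static_assert(HEXDIR_LEN == 2 * MD5_DIGEST_SIZE + 1)` in the patched function pins down.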
+37 -48
fs/nfsd/nfs4state.c
···
 	free_svc_cred(&slot->sl_cred);
 	copy_cred(&slot->sl_cred, &resp->rqstp->rq_cred);

-	if (!nfsd4_cache_this(resp)) {
+	if (!(resp->cstate.slot->sl_flags & NFSD4_SLOT_CACHETHIS)) {
 		slot->sl_flags &= ~NFSD4_SLOT_CACHED;
 		return;
 	}
···
 }

 /*
- * Encode the replay sequence operation from the slot values.
- * If cachethis is FALSE encode the uncached rep error on the next
- * operation which sets resp->p and increments resp->opcnt for
- * nfs4svc_encode_compoundres.
- *
- */
-static __be32
-nfsd4_enc_sequence_replay(struct nfsd4_compoundargs *args,
-			  struct nfsd4_compoundres *resp)
-{
-	struct nfsd4_op *op;
-	struct nfsd4_slot *slot = resp->cstate.slot;
-
-	/* Encode the replayed sequence operation */
-	op = &args->ops[resp->opcnt - 1];
-	nfsd4_encode_operation(resp, op);
-
-	if (slot->sl_flags & NFSD4_SLOT_CACHED)
-		return op->status;
-	if (args->opcnt == 1) {
-		/*
-		 * The original operation wasn't a solo sequence--we
-		 * always cache those--so this retry must not match the
-		 * original:
-		 */
-		op->status = nfserr_seq_false_retry;
-	} else {
-		op = &args->ops[resp->opcnt++];
-		op->status = nfserr_retry_uncached_rep;
-		nfsd4_encode_operation(resp, op);
-	}
-	return op->status;
-}
-
-/*
  * The sequence operation is not cached because we can use the slot and
  * session values.
  */
···
 nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
 			 struct nfsd4_sequence *seq)
 {
+	struct nfsd4_compoundargs *args = resp->rqstp->rq_argp;
 	struct nfsd4_slot *slot = resp->cstate.slot;
 	struct xdr_stream *xdr = resp->xdr;
 	__be32 *p;
-	__be32 status;

 	dprintk("--> %s slot %p\n", __func__, slot);

-	status = nfsd4_enc_sequence_replay(resp->rqstp->rq_argp, resp);
-	if (status)
-		return status;
+	/* Always encode the SEQUENCE response. */
+	nfsd4_encode_operation(resp, &args->ops[0]);
+	if (args->opcnt == 1)
+		/* A solo SEQUENCE - nothing was cached */
+		return args->ops[0].status;

+	if (!(slot->sl_flags & NFSD4_SLOT_CACHED)) {
+		/* We weren't asked to cache this. */
+		struct nfsd4_op *op;
+
+		op = &args->ops[resp->opcnt++];
+		op->status = nfserr_retry_uncached_rep;
+		nfsd4_encode_operation(resp, op);
+		return op->status;
+	}
+
+	/* return reply from cache */
 	p = xdr_reserve_space(xdr, slot->sl_datalen);
 	if (!p) {
 		WARN_ON_ONCE(1);
···
 	return;
 out_no_deleg:
 	open->op_delegate_type = OPEN_DELEGATE_NONE;
-	if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS &&
-	    open->op_delegate_type != OPEN_DELEGATE_NONE) {
-		dprintk("NFSD: WARNING: refusing delegation reclaim\n");
-		open->op_recall = true;
-	}

 	/* 4.1 client asking for a delegation? */
 	if (open->op_deleg_want)
···

 /*
  * failure => all reset bets are off, nfserr_no_grace...
- *
- * The caller is responsible for freeing name.data if NULL is returned (it
- * will be freed in nfs4_remove_reclaim_record in the normal case).
  */
 struct nfs4_client_reclaim *
 nfs4_client_to_reclaim(struct xdr_netobj name, struct xdr_netobj princhash,
···
 	unsigned int strhashval;
 	struct nfs4_client_reclaim *crp;

+	name.data = kmemdup(name.data, name.len, GFP_KERNEL);
+	if (!name.data) {
+		dprintk("%s: failed to allocate memory for name.data!\n",
+				__func__);
+		return NULL;
+	}
+	if (princhash.len) {
+		princhash.data = kmemdup(princhash.data, princhash.len, GFP_KERNEL);
+		if (!princhash.data) {
+			dprintk("%s: failed to allocate memory for princhash.data!\n",
+					__func__);
+			kfree(name.data);
+			return NULL;
+		}
+	} else
+		princhash.data = NULL;
 	crp = alloc_reclaim();
 	if (crp) {
 		strhashval = clientstr_hashval(name);
···
 		crp->cr_princhash.len = princhash.len;
 		crp->cr_clp = NULL;
 		nn->reclaim_str_hashtbl_size++;
+	} else {
+		kfree(name.data);
+		kfree(princhash.data);
 	}
 	return crp;
 }
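The ownership change in `nfs4_client_to_reclaim()` above is worth calling out: the function now duplicates the name (and principal hash) internally, so callers can pass stack or borrowed buffers and never worry about who frees them on either the success or the failure path. A toy userspace version of that dup-on-insert pattern; `demo_reclaim` and `demo_client_to_reclaim` are illustrative names, not the kernel's:

```c
#include <stdlib.h>
#include <string.h>

/* A toy record holding its own copy of the client name. */
struct demo_reclaim {
	char *name;
	size_t len;
};

/*
 * Duplicate the caller's buffer before inserting, as the patched
 * nfs4_client_to_reclaim() does. On any failure the copy is released
 * here, so the caller's buffer is never owned by this function.
 */
static struct demo_reclaim *demo_client_to_reclaim(const char *name, size_t len)
{
	struct demo_reclaim *crp;
	char *copy = malloc(len);

	if (!copy)
		return NULL;
	memcpy(copy, name, len);

	crp = malloc(sizeof(*crp));
	if (!crp) {
		free(copy);	/* failure path frees the internal copy */
		return NULL;
	}
	crp->name = copy;
	crp->len = len;
	return crp;
}
```

This is what lets the nfs4recover.c callers above drop their kmemdup()/kfree() bookkeeping and, in the cld path, manage the user-space copies with `__free(kfree)` cleanup attributes instead.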
+20 -8
fs/nfsd/nfs4xdr.c
··· 4472 4472 4473 4473 static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, 4474 4474 struct nfsd4_read *read, 4475 - struct file *file, unsigned long maxcount) 4475 + unsigned long maxcount) 4476 4476 { 4477 4477 struct xdr_stream *xdr = resp->xdr; 4478 4478 unsigned int base = xdr->buf->page_len & ~PAGE_MASK; ··· 4480 4480 __be32 zero = xdr_zero; 4481 4481 __be32 nfserr; 4482 4482 4483 - if (xdr_reserve_space_vec(xdr, maxcount) < 0) 4484 - return nfserr_resource; 4485 - 4486 - nfserr = nfsd_iter_read(resp->rqstp, read->rd_fhp, file, 4483 + nfserr = nfsd_iter_read(resp->rqstp, read->rd_fhp, read->rd_nf, 4487 4484 read->rd_offset, &maxcount, base, 4488 4485 &read->rd_eof); 4489 4486 read->rd_length = maxcount; 4490 4487 if (nfserr) 4491 4488 return nfserr; 4489 + 4490 + /* 4491 + * svcxdr_encode_opaque_pages() is not used here because 4492 + * we don't want to encode subsequent results in this 4493 + * COMPOUND into the xdr->buf's tail, but rather those 4494 + * results should follow the NFS READ payload in the 4495 + * buf's pages. 4496 + */ 4497 + if (xdr_reserve_space_vec(xdr, maxcount) < 0) 4498 + return nfserr_resource; 4499 + 4500 + /* 4501 + * Mark the buffer location of the NFS READ payload so that 4502 + * direct placement-capable transports send only the 4503 + * payload bytes out-of-band. 
4504 + */ 4492 4505 if (svc_encode_result_payload(resp->rqstp, starting_len, maxcount)) 4493 4506 return nfserr_io; 4494 - xdr_truncate_encode(xdr, starting_len + xdr_align_size(maxcount)); 4495 4507 4496 4508 write_bytes_to_xdr_buf(xdr->buf, starting_len + maxcount, &zero, 4497 4509 xdr_pad_size(maxcount)); ··· 4542 4530 if (file->f_op->splice_read && splice_ok) 4543 4531 nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); 4544 4532 else 4545 - nfserr = nfsd4_encode_readv(resp, read, file, maxcount); 4533 + nfserr = nfsd4_encode_readv(resp, read, maxcount); 4546 4534 if (nfserr) { 4547 4535 xdr_truncate_encode(xdr, eof_offset); 4548 4536 return nfserr; ··· 5438 5426 if (file->f_op->splice_read && splice_ok) 5439 5427 nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); 5440 5428 else 5441 - nfserr = nfsd4_encode_readv(resp, read, file, maxcount); 5429 + nfserr = nfsd4_encode_readv(resp, read, maxcount); 5442 5430 if (nfserr) 5443 5431 return nfserr; 5444 5432
+2 -2
fs/nfsd/nfsd.h
··· 160 160 /* Any new NFSD_IO enum value must be added at the end */ 161 161 NFSD_IO_BUFFERED, 162 162 NFSD_IO_DONTCACHE, 163 + NFSD_IO_DIRECT, 163 164 }; 164 165 165 166 extern u64 nfsd_io_cache_read __read_mostly; ··· 398 397 #define NFSD_CB_GETATTR_TIMEOUT NFSD_DELEGRETURN_TIMEOUT 399 398 400 399 /* 401 - * The following attributes are currently not supported by the NFSv4 server: 400 + * The following attributes are not implemented by NFSD: 402 401 * ARCHIVE (deprecated anyway) 403 402 * HIDDEN (unlikely to be supported any time soon) 404 403 * MIMETYPE (unlikely to be supported any time soon) 405 404 * QUOTA_* (will be supported in a forthcoming patch) 406 405 * SYSTEM (unlikely to be supported any time soon) 407 406 * TIME_BACKUP (unlikely to be supported any time soon) 408 - * TIME_CREATE (unlikely to be supported any time soon) 409 407 */ 410 408 #define NFSD4_SUPPORTED_ATTRS_WORD0 \ 411 409 (FATTR4_WORD0_SUPPORTED_ATTRS | FATTR4_WORD0_TYPE | FATTR4_WORD0_FH_EXPIRE_TYPE \
+5 -23
fs/nfsd/nfssvc.c
··· 249 249 return rv; 250 250 } 251 251 252 - static int nfsd_init_socks(struct net *net, const struct cred *cred) 253 - { 254 - int error; 255 - struct nfsd_net *nn = net_generic(net, nfsd_net_id); 256 - 257 - if (!list_empty(&nn->nfsd_serv->sv_permsocks)) 258 - return 0; 259 - 260 - error = svc_xprt_create(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT, 261 - SVC_SOCK_DEFAULTS, cred); 262 - if (error < 0) 263 - return error; 264 - 265 - error = svc_xprt_create(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT, 266 - SVC_SOCK_DEFAULTS, cred); 267 - if (error < 0) 268 - return error; 269 - 270 - return 0; 271 - } 272 - 273 252 static int nfsd_users = 0; 274 253 275 254 static int nfsd_startup_generic(void) ··· 356 377 ret = nfsd_startup_generic(); 357 378 if (ret) 358 379 return ret; 359 - ret = nfsd_init_socks(net, cred); 360 - if (ret) 380 + 381 + if (list_empty(&nn->nfsd_serv->sv_permsocks)) { 382 + pr_warn("NFSD: Failed to start, no listeners configured.\n"); 383 + ret = -EIO; 361 384 goto out_socks; 385 + } 362 386 363 387 if (nfsd_needs_lockd(nn) && !nn->lockd_up) { 364 388 ret = lockd_up(net, cred);
+41
fs/nfsd/trace.h
··· 464 464 DEFINE_NFSD_IO_EVENT(read_start); 465 465 DEFINE_NFSD_IO_EVENT(read_splice); 466 466 DEFINE_NFSD_IO_EVENT(read_vector); 467 + DEFINE_NFSD_IO_EVENT(read_direct); 467 468 DEFINE_NFSD_IO_EVENT(read_io_done); 468 469 DEFINE_NFSD_IO_EVENT(read_done); 469 470 DEFINE_NFSD_IO_EVENT(write_start); 470 471 DEFINE_NFSD_IO_EVENT(write_opened); 472 + DEFINE_NFSD_IO_EVENT(write_direct); 473 + DEFINE_NFSD_IO_EVENT(write_vector); 471 474 DEFINE_NFSD_IO_EVENT(write_io_done); 472 475 DEFINE_NFSD_IO_EVENT(write_done); 473 476 DEFINE_NFSD_IO_EVENT(commit_start); ··· 2616 2613 DEFINE_NFSD_VFS_GETATTR_EVENT(nfsd_vfs_getattr); 2617 2614 DEFINE_NFSD_VFS_GETATTR_EVENT(nfsd_vfs_statfs); 2618 2615 2616 + DECLARE_EVENT_CLASS(nfsd_pnfs_class, 2617 + TP_PROTO( 2618 + const struct nfs4_client *clp, 2619 + const char *dev, 2620 + int error 2621 + ), 2622 + TP_ARGS(clp, dev, error), 2623 + TP_STRUCT__entry( 2624 + __sockaddr(addr, sizeof(struct sockaddr_in6)) 2625 + __field(unsigned int, netns_ino) 2626 + __string(dev, dev) 2627 + __field(int, error) 2628 + ), 2629 + TP_fast_assign( 2630 + __assign_sockaddr(addr, &clp->cl_addr, 2631 + sizeof(struct sockaddr_in6)); 2632 + __entry->netns_ino = clp->net->ns.inum; 2633 + __assign_str(dev); 2634 + __entry->error = error; 2635 + ), 2636 + TP_printk("client=%pISpc nn=%d dev=%s error=%d", 2637 + __get_sockaddr(addr), 2638 + __entry->netns_ino, 2639 + __get_str(dev), 2640 + __entry->error 2641 + ) 2642 + ); 2643 + 2644 + #define DEFINE_NFSD_PNFS_ERR_EVENT(name) \ 2645 + DEFINE_EVENT(nfsd_pnfs_class, nfsd_pnfs_##name, \ 2646 + TP_PROTO( \ 2647 + const struct nfs4_client *clp, \ 2648 + const char *dev, \ 2649 + int error \ 2650 + ), \ 2651 + TP_ARGS(clp, dev, error)) 2652 + 2653 + DEFINE_NFSD_PNFS_ERR_EVENT(fence); 2619 2654 #endif /* _NFSD_TRACE_H */ 2620 2655 2621 2656 #undef TRACE_INCLUDE_PATH
+247 -14
fs/nfsd/vfs.c
··· 1075 1075 return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err); 1076 1076 } 1077 1077 1078 + /* 1079 + * The byte range of the client's READ request is expanded on both ends 1080 + * until it meets the underlying file system's direct I/O alignment 1081 + * requirements. After the internal read is complete, the byte range of 1082 + * the NFS READ payload is reduced to the byte range that was originally 1083 + * requested. 1084 + * 1085 + * Note that a direct read can be done only when the xdr_buf containing 1086 + * the NFS READ reply does not already have contents in its .pages array. 1087 + * This is due to potentially restrictive alignment requirements on the 1088 + * read buffer. When .page_len and @base are zero, the .pages array is 1089 + * guaranteed to be page-aligned. 1090 + */ 1091 + static noinline_for_stack __be32 1092 + nfsd_direct_read(struct svc_rqst *rqstp, struct svc_fh *fhp, 1093 + struct nfsd_file *nf, loff_t offset, unsigned long *count, 1094 + u32 *eof) 1095 + { 1096 + u64 dio_start, dio_end; 1097 + unsigned long v, total; 1098 + struct iov_iter iter; 1099 + struct kiocb kiocb; 1100 + ssize_t host_err; 1101 + size_t len; 1102 + 1103 + init_sync_kiocb(&kiocb, nf->nf_file); 1104 + kiocb.ki_flags |= IOCB_DIRECT; 1105 + 1106 + /* Read a properly-aligned region of bytes into rq_bvec */ 1107 + dio_start = round_down(offset, nf->nf_dio_read_offset_align); 1108 + dio_end = round_up((u64)offset + *count, nf->nf_dio_read_offset_align); 1109 + 1110 + kiocb.ki_pos = dio_start; 1111 + 1112 + v = 0; 1113 + total = dio_end - dio_start; 1114 + while (total && v < rqstp->rq_maxpages && 1115 + rqstp->rq_next_page < rqstp->rq_page_end) { 1116 + len = min_t(size_t, total, PAGE_SIZE); 1117 + bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page, 1118 + len, 0); 1119 + 1120 + total -= len; 1121 + ++rqstp->rq_next_page; 1122 + ++v; 1123 + } 1124 + 1125 + trace_nfsd_read_direct(rqstp, fhp, offset, *count - total); 1126 + iov_iter_bvec(&iter, 
ITER_DEST, rqstp->rq_bvec, v, 1127 + dio_end - dio_start - total); 1128 + 1129 + host_err = vfs_iocb_iter_read(nf->nf_file, &kiocb, &iter); 1130 + if (host_err >= 0) { 1131 + unsigned int pad = offset - dio_start; 1132 + 1133 + /* The returned payload starts after the pad */ 1134 + rqstp->rq_res.page_base = pad; 1135 + 1136 + /* Compute the count of bytes to be returned */ 1137 + if (host_err > pad + *count) 1138 + host_err = *count; 1139 + else if (host_err > pad) 1140 + host_err -= pad; 1141 + else 1142 + host_err = 0; 1143 + } else if (unlikely(host_err == -EINVAL)) { 1144 + struct inode *inode = d_inode(fhp->fh_dentry); 1145 + 1146 + pr_info_ratelimited("nfsd: Direct I/O alignment failure on %s/%ld\n", 1147 + inode->i_sb->s_id, inode->i_ino); 1148 + host_err = -ESERVERFAULT; 1149 + } 1150 + 1151 + return nfsd_finish_read(rqstp, fhp, nf->nf_file, offset, count, 1152 + eof, host_err); 1153 + } 1154 + 1078 1155 /** 1079 1156 * nfsd_iter_read - Perform a VFS read using an iterator 1080 1157 * @rqstp: RPC transaction context 1081 1158 * @fhp: file handle of file to be read 1082 - * @file: opened struct file of file to be read 1159 + * @nf: opened struct nfsd_file of file to be read 1083 1160 * @offset: starting byte offset 1084 1161 * @count: IN: requested number of bytes; OUT: number of bytes read 1085 1162 * @base: offset in first page of read buffer ··· 1169 1092 * returned. 
1170 1093 */ 1171 1094 __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp, 1172 - struct file *file, loff_t offset, unsigned long *count, 1095 + struct nfsd_file *nf, loff_t offset, unsigned long *count, 1173 1096 unsigned int base, u32 *eof) 1174 1097 { 1098 + struct file *file = nf->nf_file; 1175 1099 unsigned long v, total; 1176 1100 struct iov_iter iter; 1177 1101 struct kiocb kiocb; ··· 1184 1106 switch (nfsd_io_cache_read) { 1185 1107 case NFSD_IO_BUFFERED: 1186 1108 break; 1109 + case NFSD_IO_DIRECT: 1110 + /* When dio_read_offset_align is zero, dio is not supported */ 1111 + if (nf->nf_dio_read_offset_align && !rqstp->rq_res.page_len) 1112 + return nfsd_direct_read(rqstp, fhp, nf, offset, 1113 + count, eof); 1114 + fallthrough; 1187 1115 case NFSD_IO_DONTCACHE: 1188 1116 if (file->f_op->fop_flags & FOP_DONTCACHE) 1189 1117 kiocb.ki_flags = IOCB_DONTCACHE; ··· 1200 1116 1201 1117 v = 0; 1202 1118 total = *count; 1203 - while (total) { 1119 + while (total && v < rqstp->rq_maxpages && 1120 + rqstp->rq_next_page < rqstp->rq_page_end) { 1204 1121 len = min_t(size_t, total, PAGE_SIZE - base); 1205 - bvec_set_page(&rqstp->rq_bvec[v], *(rqstp->rq_next_page++), 1122 + bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page, 1206 1123 len, base); 1124 + 1207 1125 total -= len; 1126 + ++rqstp->rq_next_page; 1208 1127 ++v; 1209 1128 base = 0; 1210 1129 } 1211 - WARN_ON_ONCE(v > rqstp->rq_maxpages); 1212 1130 1213 - trace_nfsd_read_vector(rqstp, fhp, offset, *count); 1214 - iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, *count); 1131 + trace_nfsd_read_vector(rqstp, fhp, offset, *count - total); 1132 + iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, *count - total); 1215 1133 host_err = vfs_iocb_iter_read(file, &kiocb, &iter); 1216 1134 return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err); 1217 1135 } ··· 1253 1167 last_ino = inode->i_ino; 1254 1168 last_dev = inode->i_sb->s_dev; 1255 1169 return err; 1170 + } 1171 + 1172 + struct 
nfsd_write_dio_seg { 1173 + struct iov_iter iter; 1174 + int flags; 1175 + }; 1176 + 1177 + static unsigned long 1178 + iov_iter_bvec_offset(const struct iov_iter *iter) 1179 + { 1180 + return (unsigned long)(iter->bvec->bv_offset + iter->iov_offset); 1181 + } 1182 + 1183 + static void 1184 + nfsd_write_dio_seg_init(struct nfsd_write_dio_seg *segment, 1185 + struct bio_vec *bvec, unsigned int nvecs, 1186 + unsigned long total, size_t start, size_t len, 1187 + struct kiocb *iocb) 1188 + { 1189 + iov_iter_bvec(&segment->iter, ITER_SOURCE, bvec, nvecs, total); 1190 + if (start) 1191 + iov_iter_advance(&segment->iter, start); 1192 + iov_iter_truncate(&segment->iter, len); 1193 + segment->flags = iocb->ki_flags; 1194 + } 1195 + 1196 + static unsigned int 1197 + nfsd_write_dio_iters_init(struct nfsd_file *nf, struct bio_vec *bvec, 1198 + unsigned int nvecs, struct kiocb *iocb, 1199 + unsigned long total, 1200 + struct nfsd_write_dio_seg segments[3]) 1201 + { 1202 + u32 offset_align = nf->nf_dio_offset_align; 1203 + loff_t prefix_end, orig_end, middle_end; 1204 + u32 mem_align = nf->nf_dio_mem_align; 1205 + size_t prefix, middle, suffix; 1206 + loff_t offset = iocb->ki_pos; 1207 + unsigned int nsegs = 0; 1208 + 1209 + /* 1210 + * Check if direct I/O is feasible for this write request. 1211 + * If alignments are not available, the write is too small, 1212 + * or no alignment can be found, fall back to buffered I/O. 
1213 + */ 1214 + if (unlikely(!mem_align || !offset_align) || 1215 + unlikely(total < max(offset_align, mem_align))) 1216 + goto no_dio; 1217 + 1218 + prefix_end = round_up(offset, offset_align); 1219 + orig_end = offset + total; 1220 + middle_end = round_down(orig_end, offset_align); 1221 + 1222 + prefix = prefix_end - offset; 1223 + middle = middle_end - prefix_end; 1224 + suffix = orig_end - middle_end; 1225 + 1226 + if (!middle) 1227 + goto no_dio; 1228 + 1229 + if (prefix) 1230 + nfsd_write_dio_seg_init(&segments[nsegs++], bvec, 1231 + nvecs, total, 0, prefix, iocb); 1232 + 1233 + nfsd_write_dio_seg_init(&segments[nsegs], bvec, nvecs, 1234 + total, prefix, middle, iocb); 1235 + 1236 + /* 1237 + * Check if the bvec iterator is aligned for direct I/O. 1238 + * 1239 + * bvecs generated from RPC receive buffers are contiguous: After 1240 + * the first bvec, all subsequent bvecs start at bv_offset zero 1241 + * (page-aligned). Therefore, only the first bvec is checked. 1242 + */ 1243 + if (iov_iter_bvec_offset(&segments[nsegs].iter) & (mem_align - 1)) 1244 + goto no_dio; 1245 + segments[nsegs].flags |= IOCB_DIRECT; 1246 + nsegs++; 1247 + 1248 + if (suffix) 1249 + nfsd_write_dio_seg_init(&segments[nsegs++], bvec, nvecs, total, 1250 + prefix + middle, suffix, iocb); 1251 + 1252 + return nsegs; 1253 + 1254 + no_dio: 1255 + /* No DIO alignment possible - pack into single non-DIO segment. 
*/ 1256 + nfsd_write_dio_seg_init(&segments[0], bvec, nvecs, total, 0, 1257 + total, iocb); 1258 + return 1; 1259 + } 1260 + 1261 + static noinline_for_stack int 1262 + nfsd_direct_write(struct svc_rqst *rqstp, struct svc_fh *fhp, 1263 + struct nfsd_file *nf, unsigned int nvecs, 1264 + unsigned long *cnt, struct kiocb *kiocb) 1265 + { 1266 + struct nfsd_write_dio_seg segments[3]; 1267 + struct file *file = nf->nf_file; 1268 + unsigned int nsegs, i; 1269 + ssize_t host_err; 1270 + 1271 + nsegs = nfsd_write_dio_iters_init(nf, rqstp->rq_bvec, nvecs, 1272 + kiocb, *cnt, segments); 1273 + 1274 + *cnt = 0; 1275 + for (i = 0; i < nsegs; i++) { 1276 + kiocb->ki_flags = segments[i].flags; 1277 + if (kiocb->ki_flags & IOCB_DIRECT) 1278 + trace_nfsd_write_direct(rqstp, fhp, kiocb->ki_pos, 1279 + segments[i].iter.count); 1280 + else { 1281 + trace_nfsd_write_vector(rqstp, fhp, kiocb->ki_pos, 1282 + segments[i].iter.count); 1283 + /* 1284 + * Mark the I/O buffer as evict-able to reduce 1285 + * memory contention. 
1286 + */ 1287 + if (nf->nf_file->f_op->fop_flags & FOP_DONTCACHE) 1288 + kiocb->ki_flags |= IOCB_DONTCACHE; 1289 + } 1290 + 1291 + host_err = vfs_iocb_iter_write(file, kiocb, &segments[i].iter); 1292 + if (host_err < 0) 1293 + return host_err; 1294 + *cnt += host_err; 1295 + if (host_err < segments[i].iter.count) 1296 + break; /* partial write */ 1297 + } 1298 + 1299 + return 0; 1256 1300 } 1257 1301 1258 1302 /** ··· 1445 1229 stable = NFS_UNSTABLE; 1446 1230 init_sync_kiocb(&kiocb, file); 1447 1231 kiocb.ki_pos = offset; 1448 - if (stable && !fhp->fh_use_wgather) 1449 - kiocb.ki_flags |= IOCB_DSYNC; 1232 + if (likely(!fhp->fh_use_wgather)) { 1233 + switch (stable) { 1234 + case NFS_FILE_SYNC: 1235 + /* persist data and timestamps */ 1236 + kiocb.ki_flags |= IOCB_DSYNC | IOCB_SYNC; 1237 + break; 1238 + case NFS_DATA_SYNC: 1239 + /* persist data only */ 1240 + kiocb.ki_flags |= IOCB_DSYNC; 1241 + break; 1242 + } 1243 + } 1450 1244 1451 1245 nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload); 1452 - iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt); 1246 + 1453 1247 since = READ_ONCE(file->f_wb_err); 1454 1248 if (verf) 1455 1249 nfsd_copy_write_verifier(verf, nn); 1456 1250 1457 1251 switch (nfsd_io_cache_write) { 1458 - case NFSD_IO_BUFFERED: 1252 + case NFSD_IO_DIRECT: 1253 + host_err = nfsd_direct_write(rqstp, fhp, nf, nvecs, 1254 + cnt, &kiocb); 1459 1255 break; 1460 1256 case NFSD_IO_DONTCACHE: 1461 1257 if (file->f_op->fop_flags & FOP_DONTCACHE) 1462 1258 kiocb.ki_flags |= IOCB_DONTCACHE; 1259 + fallthrough; 1260 + case NFSD_IO_BUFFERED: 1261 + iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt); 1262 + host_err = vfs_iocb_iter_write(file, &kiocb, &iter); 1263 + if (host_err < 0) 1264 + break; 1265 + *cnt = host_err; 1463 1266 break; 1464 1267 } 1465 - host_err = vfs_iocb_iter_write(file, &kiocb, &iter); 1466 1268 if (host_err < 0) { 1467 1269 commit_reset_write_verifier(nn, rqstp, host_err); 1468 1270 goto 
out_nfserr; 1469 1271 } 1470 - *cnt = host_err; 1471 1272 nfsd_stats_io_write_add(nn, exp, *cnt); 1472 1273 fsnotify_modify(file); 1473 1274 host_err = filemap_check_wb_err(file->f_mapping, since); ··· 1568 1335 if (file->f_op->splice_read && nfsd_read_splice_ok(rqstp)) 1569 1336 err = nfsd_splice_read(rqstp, fhp, file, offset, count, eof); 1570 1337 else 1571 - err = nfsd_iter_read(rqstp, fhp, file, offset, count, 0, eof); 1338 + err = nfsd_iter_read(rqstp, fhp, nf, offset, count, 0, eof); 1572 1339 1573 1340 nfsd_file_put(nf); 1574 1341 trace_nfsd_read_done(rqstp, fhp, offset, *count);
+1 -1
fs/nfsd/vfs.h
··· 121 121 unsigned long *count, 122 122 u32 *eof); 123 123 __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp, 124 - struct file *file, loff_t offset, 124 + struct nfsd_file *nf, loff_t offset, 125 125 unsigned long *count, unsigned int base, 126 126 u32 *eof); 127 127 bool nfsd_read_splice_ok(struct svc_rqst *rqstp);
-21
fs/nfsd/xdr4.h
··· 924 924 struct nfsd4_compound_state cstate; 925 925 }; 926 926 927 - static inline bool nfsd4_is_solo_sequence(struct nfsd4_compoundres *resp) 928 - { 929 - struct nfsd4_compoundargs *args = resp->rqstp->rq_argp; 930 - return resp->opcnt == 1 && args->ops[0].opnum == OP_SEQUENCE; 931 - } 932 - 933 - /* 934 - * The session reply cache only needs to cache replies that the client 935 - * actually asked us to. But it's almost free for us to cache compounds 936 - * consisting of only a SEQUENCE op, so we may as well cache those too. 937 - * Also, the protocol doesn't give us a convenient response in the case 938 - * of a replay of a solo SEQUENCE op that wasn't cached 939 - * (RETRY_UNCACHED_REP can only be returned in the second op of a 940 - * compound). 941 - */ 942 - static inline bool nfsd4_cache_this(struct nfsd4_compoundres *resp) 943 - { 944 - return (resp->cstate.slot->sl_flags & NFSD4_SLOT_CACHETHIS) 945 - || nfsd4_is_solo_sequence(resp); 946 - } 947 - 948 927 static inline bool nfsd4_last_compound_op(struct svc_rqst *rqstp) 949 928 { 950 929 struct nfsd4_compoundres *resp = rqstp->rq_resp;
+8 -1
include/linux/lockd/lockd.h
··· 12 12 13 13 /* XXX: a lot of this should really be under fs/lockd. */ 14 14 15 + #include <linux/exportfs.h> 15 16 #include <linux/in.h> 16 17 #include <linux/in6.h> 17 18 #include <net/ipv6.h> ··· 308 307 int nlmsvc_unlock_all_by_sb(struct super_block *sb); 309 308 int nlmsvc_unlock_all_by_ip(struct sockaddr *server_addr); 310 309 311 - static inline struct file *nlmsvc_file_file(struct nlm_file *file) 310 + static inline struct file *nlmsvc_file_file(const struct nlm_file *file) 312 311 { 313 312 return file->f_file[O_RDONLY] ? 314 313 file->f_file[O_RDONLY] : file->f_file[O_WRONLY]; ··· 317 316 static inline struct inode *nlmsvc_file_inode(struct nlm_file *file) 318 317 { 319 318 return file_inode(nlmsvc_file_file(file)); 319 + } 320 + 321 + static inline bool 322 + nlmsvc_file_cannot_lock(const struct nlm_file *file) 323 + { 324 + return exportfs_cannot_lock(nlmsvc_file_file(file)->f_path.dentry->d_sb->s_export_op); 320 325 } 321 326 322 327 static inline int __nlm_privileged_request4(const struct sockaddr *sap)
+1 -1
include/linux/sunrpc/svc_rdma.h
··· 131 131 */ 132 132 enum { 133 133 RPCRDMA_LISTEN_BACKLOG = 10, 134 - RPCRDMA_MAX_REQUESTS = 64, 134 + RPCRDMA_MAX_REQUESTS = 128, 135 135 RPCRDMA_MAX_BC_REQUESTS = 2, 136 136 }; 137 137
+3
include/linux/sunrpc/svcsock.h
··· 26 26 void (*sk_odata)(struct sock *); 27 27 void (*sk_owspace)(struct sock *); 28 28 29 + /* For sends (protected by xpt_mutex) */ 30 + struct bio_vec *sk_bvec; 31 + 29 32 /* private TCP part */ 30 33 /* On-the-wire fragment header: */ 31 34 __be32 sk_marker;
+52 -10
net/sunrpc/svcsock.c
··· 68 68 69 69 #define RPCDBG_FACILITY RPCDBG_SVCXPRT 70 70 71 + /* 72 + * For UDP: 73 + * 1 for header page 74 + * enough pages for RPCSVC_MAXPAYLOAD_UDP 75 + * 1 in case payload is not aligned 76 + * 1 for tail page 77 + */ 78 + enum { 79 + SUNRPC_MAX_UDP_SENDPAGES = 1 + RPCSVC_MAXPAYLOAD_UDP / PAGE_SIZE + 1 + 1 80 + }; 81 + 71 82 /* To-do: to avoid tying up an nfsd thread while waiting for a 72 83 * handshake request, the request could instead be deferred. 73 84 */ ··· 751 740 if (svc_xprt_is_dead(xprt)) 752 741 goto out_notconn; 753 742 754 - count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, xdr); 743 + count = xdr_buf_to_bvec(svsk->sk_bvec, SUNRPC_MAX_UDP_SENDPAGES, xdr); 755 744 756 - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, 745 + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, svsk->sk_bvec, 757 746 count, rqstp->rq_res.len); 758 747 err = sock_sendmsg(svsk->sk_sock, &msg); 759 748 if (err == -ECONNREFUSED) { 760 749 /* ICMP error on earlier request. */ 761 - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, 750 + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, svsk->sk_bvec, 762 751 count, rqstp->rq_res.len); 763 752 err = sock_sendmsg(svsk->sk_sock, &msg); 764 753 } ··· 1073 1062 return svc_sock_reclen(svsk); 1074 1063 1075 1064 err_too_large: 1076 - net_notice_ratelimited("svc: %s %s RPC fragment too large: %d\n", 1077 - __func__, svsk->sk_xprt.xpt_server->sv_name, 1078 - svc_sock_reclen(svsk)); 1065 + net_notice_ratelimited("svc: %s oversized RPC fragment (%u octets) from %pISpc\n", 1066 + svsk->sk_xprt.xpt_server->sv_name, 1067 + svc_sock_reclen(svsk), 1068 + (struct sockaddr *)&svsk->sk_xprt.xpt_remote); 1079 1069 svc_xprt_deferred_close(&svsk->sk_xprt); 1080 1070 err_short: 1081 1071 return -EAGAIN; ··· 1247 1235 int ret; 1248 1236 1249 1237 /* The stream record marker is copied into a temporary page 1250 - * fragment buffer so that it can be included in rq_bvec. 1238 + * fragment buffer so that it can be included in sk_bvec. 
1251 1239 */ 1252 1240 buf = page_frag_alloc(&svsk->sk_frag_cache, sizeof(marker), 1253 1241 GFP_KERNEL); 1254 1242 if (!buf) 1255 1243 return -ENOMEM; 1256 1244 memcpy(buf, &marker, sizeof(marker)); 1257 - bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker)); 1245 + bvec_set_virt(svsk->sk_bvec, buf, sizeof(marker)); 1258 1246 1259 - count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages, 1247 + count = xdr_buf_to_bvec(svsk->sk_bvec + 1, rqstp->rq_maxpages, 1260 1248 &rqstp->rq_res); 1261 1249 1262 - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, 1250 + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, svsk->sk_bvec, 1263 1251 1 + count, sizeof(marker) + rqstp->rq_res.len); 1264 1252 ret = sock_sendmsg(svsk->sk_sock, &msg); 1265 1253 page_frag_free(buf); ··· 1404 1392 spin_unlock_bh(&serv->sv_lock); 1405 1393 } 1406 1394 1395 + static int svc_sock_sendpages(struct svc_serv *serv, struct socket *sock, int flags) 1396 + { 1397 + switch (sock->type) { 1398 + case SOCK_STREAM: 1399 + /* +1 for TCP record marker */ 1400 + if (flags & SVC_SOCK_TEMPORARY) 1401 + return svc_serv_maxpages(serv) + 1; 1402 + return 0; 1403 + case SOCK_DGRAM: 1404 + return SUNRPC_MAX_UDP_SENDPAGES; 1405 + } 1406 + return -EINVAL; 1407 + } 1408 + 1407 1409 /* 1408 1410 * Initialize socket for RPC use and create svc_sock struct 1409 1411 */ ··· 1428 1402 struct svc_sock *svsk; 1429 1403 struct sock *inet; 1430 1404 int pmap_register = !(flags & SVC_SOCK_ANONYMOUS); 1405 + int sendpages; 1431 1406 unsigned long pages; 1407 + 1408 + sendpages = svc_sock_sendpages(serv, sock, flags); 1409 + if (sendpages < 0) 1410 + return ERR_PTR(sendpages); 1432 1411 1433 1412 pages = svc_serv_maxpages(serv); 1434 1413 svsk = kzalloc(struct_size(svsk, sk_pages, pages), GFP_KERNEL); 1435 1414 if (!svsk) 1436 1415 return ERR_PTR(-ENOMEM); 1416 + 1417 + if (sendpages) { 1418 + svsk->sk_bvec = kcalloc(sendpages, sizeof(*svsk->sk_bvec), GFP_KERNEL); 1419 + if (!svsk->sk_bvec) { 1420 + kfree(svsk); 1421 + 
return ERR_PTR(-ENOMEM); 1422 + } 1423 + } 1424 + 1437 1425 svsk->sk_maxpages = pages; 1438 1426 1439 1427 inet = sock->sk; ··· 1459 1419 inet->sk_protocol, 1460 1420 ntohs(inet_sk(inet)->inet_sport)); 1461 1421 if (err < 0) { 1422 + kfree(svsk->sk_bvec); 1462 1423 kfree(svsk); 1463 1424 return ERR_PTR(err); 1464 1425 } ··· 1677 1636 sock_release(sock); 1678 1637 1679 1638 page_frag_cache_drain(&svsk->sk_frag_cache); 1639 + kfree(svsk->sk_bvec); 1680 1640 kfree(svsk); 1681 1641 }
+8 -11
net/sunrpc/xprtrdma/svc_rdma_transport.c
··· 591 591 rdma_disconnect(rdma->sc_cm_id); 592 592 } 593 593 594 - static void __svc_rdma_free(struct work_struct *work) 594 + /** 595 + * svc_rdma_free - Release class-specific transport resources 596 + * @xprt: Generic svc transport object 597 + */ 598 + static void svc_rdma_free(struct svc_xprt *xprt) 595 599 { 596 600 struct svcxprt_rdma *rdma = 597 - container_of(work, struct svcxprt_rdma, sc_work); 601 + container_of(xprt, struct svcxprt_rdma, sc_xprt); 598 602 struct ib_device *device = rdma->sc_cm_id->device; 603 + 604 + might_sleep(); 599 605 600 606 /* This blocks until the Completion Queues are empty */ 601 607 if (rdma->sc_qp && !IS_ERR(rdma->sc_qp)) ··· 633 627 if (!test_bit(XPT_LISTENER, &rdma->sc_xprt.xpt_flags)) 634 628 rpcrdma_rn_unregister(device, &rdma->sc_rn); 635 629 kfree(rdma); 636 - } 637 - 638 - static void svc_rdma_free(struct svc_xprt *xprt) 639 - { 640 - struct svcxprt_rdma *rdma = 641 - container_of(xprt, struct svcxprt_rdma, sc_xprt); 642 - 643 - INIT_WORK(&rdma->sc_work, __svc_rdma_free); 644 - schedule_work(&rdma->sc_work); 645 630 } 646 631 647 632 static int svc_rdma_has_wspace(struct svc_xprt *xprt)
+6 -5
tools/net/sunrpc/xdrgen/generators/__init__.py
··· 2 2 3 3 """Define a base code generator class""" 4 4 5 - import sys 5 + from pathlib import Path 6 6 from jinja2 import Environment, FileSystemLoader, Template 7 7 8 8 from xdr_ast import _XdrAst, Specification, _RpcProgram, _XdrTypeSpecifier ··· 14 14 """Open a set of templates based on output language""" 15 15 match language: 16 16 case "C": 17 + templates_dir = ( 18 + Path(__file__).parent.parent / "templates" / language / xdr_type 19 + ) 17 20 environment = Environment( 18 - loader=FileSystemLoader(sys.path[0] + "/templates/C/" + xdr_type + "/"), 21 + loader=FileSystemLoader(templates_dir), 19 22 trim_blocks=True, 20 23 lstrip_blocks=True, 21 24 ) ··· 51 48 52 49 def header_guard_infix(filename: str) -> str: 53 50 """Extract the header guard infix from the specification filename""" 54 - basename = filename.split("/")[-1] 55 - program = basename.replace(".x", "") 56 - return program.upper() 51 + return Path(filename).stem.upper() 57 52 58 53 59 54 def kernel_c_type(spec: _XdrTypeSpecifier) -> str:
+25 -9
tools/net/sunrpc/xdrgen/generators/union.py
··· 8 8 from generators import SourceGenerator 9 9 from generators import create_jinja2_environment, get_jinja2_template 10 10 11 - from xdr_ast import _XdrBasic, _XdrUnion, _XdrVoid, get_header_name 11 + from xdr_ast import _XdrBasic, _XdrUnion, _XdrVoid, _XdrString, get_header_name 12 12 from xdr_ast import _XdrDeclaration, _XdrCaseSpec, public_apis, big_endian 13 13 14 14 ··· 40 40 """Emit a definition for an XDR union's case arm""" 41 41 if isinstance(node.arm, _XdrVoid): 42 42 return 43 - assert isinstance(node.arm, _XdrBasic) 43 + if isinstance(node.arm, _XdrString): 44 + type_name = "char *" 45 + classifier = "" 46 + else: 47 + type_name = node.arm.spec.type_name 48 + classifier = node.arm.spec.c_classifier 49 + 50 + assert isinstance(node.arm, (_XdrBasic, _XdrString)) 44 51 template = get_jinja2_template(environment, "definition", "case_spec") 45 52 print( 46 53 template.render( 47 54 name=node.arm.name, 48 - type=node.arm.spec.type_name, 49 - classifier=node.arm.spec.c_classifier, 55 + type=type_name, 56 + classifier=classifier, 50 57 ) 51 58 ) 52 59 ··· 91 84 92 85 if isinstance(node.arm, _XdrVoid): 93 86 return 87 + if isinstance(node.arm, _XdrString): 88 + type_name = "char *" 89 + classifier = "" 90 + else: 91 + type_name = node.arm.spec.type_name 92 + classifier = node.arm.spec.c_classifier 94 93 95 94 if big_endian_discriminant: 96 95 template = get_jinja2_template(environment, "decoder", "case_spec_be") ··· 105 92 for case in node.values: 106 93 print(template.render(case=case)) 107 94 108 - assert isinstance(node.arm, _XdrBasic) 95 + assert isinstance(node.arm, (_XdrBasic, _XdrString)) 109 96 template = get_jinja2_template(environment, "decoder", node.arm.template) 110 97 print( 111 98 template.render( 112 99 name=node.arm.name, 113 - type=node.arm.spec.type_name, 114 - classifier=node.arm.spec.c_classifier, 100 + type=type_name, 101 + classifier=classifier, 115 102 ) 116 103 ) 117 104 ··· 182 169 183 170 if isinstance(node.arm, _XdrVoid): 184 171 
return 185 - 172 + if isinstance(node.arm, _XdrString): 173 + type_name = "char *" 174 + else: 175 + type_name = node.arm.spec.type_name 186 176 if big_endian_discriminant: 187 177 template = get_jinja2_template(environment, "encoder", "case_spec_be") 188 178 else: ··· 197 181 print( 198 182 template.render( 199 183 name=node.arm.name, 200 - type=node.arm.spec.type_name, 184 + type=type_name, 201 185 ) 202 186 ) 203 187
+1 -1
tools/net/sunrpc/xdrgen/templates/C/pointer/decoder/close.j2
··· 1 1 {# SPDX-License-Identifier: GPL-2.0 #} 2 2 return true; 3 - }; 3 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/pointer/encoder/close.j2
··· 1 1 {# SPDX-License-Identifier: GPL-2.0 #} 2 2 return true; 3 - }; 3 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/struct/decoder/close.j2
··· 1 1 {# SPDX-License-Identifier: GPL-2.0 #} 2 2 return true; 3 - }; 3 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/struct/decoder/variable_length_opaque.j2
··· 2 2 {% if annotate %} 3 3 /* member {{ name }} (variable-length opaque) */ 4 4 {% endif %} 5 - if (!xdrgen_decode_opaque(xdr, (opaque *)ptr, {{ maxsize }})) 5 + if (!xdrgen_decode_opaque(xdr, &ptr->{{ name }}, {{ maxsize }})) 6 6 return false;
+1 -1
tools/net/sunrpc/xdrgen/templates/C/struct/encoder/close.j2
··· 1 1 {# SPDX-License-Identifier: GPL-2.0 #} 2 2 return true; 3 - }; 3 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/basic.j2
··· 14 14 /* (basic) */ 15 15 {% endif %} 16 16 return xdrgen_decode_{{ type }}(xdr, ptr); 17 - }; 17 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/fixed_length_array.j2
··· 22 22 return false; 23 23 } 24 24 return true; 25 - }; 25 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/fixed_length_opaque.j2
··· 14 14 /* (fixed-length opaque) */ 15 15 {% endif %} 16 16 return xdr_stream_decode_opaque_fixed(xdr, ptr, {{ size }}) == 0; 17 - }; 17 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/string.j2
···
 /* (variable-length string) */
 {% endif %}
 	return xdrgen_decode_string(xdr, ptr, {{ maxsize }});
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/variable_length_array.j2
···
 		if (!xdrgen_decode_{{ type }}(xdr, &ptr->element[i]))
 			return false;
 	return true;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/variable_length_opaque.j2
···
 /* (variable-length opaque) */
 {% endif %}
 	return xdrgen_decode_opaque(xdr, ptr, {{ maxsize }});
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/basic.j2
···
 /* (basic) */
 {% endif %}
 	return xdrgen_encode_{{ type }}(xdr, value);
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/fixed_length_array.j2
···
 			return false;
 	}
 	return true;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/fixed_length_opaque.j2
···
 /* (fixed-length opaque) */
 {% endif %}
 	return xdr_stream_encode_opaque_fixed(xdr, value, {{ size }}) >= 0;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/string.j2
···
 /* (variable-length string) */
 {% endif %}
 	return xdr_stream_encode_opaque(xdr, value.data, value.len) >= 0;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/variable_length_array.j2
···
 {% endif %}
 			return false;
 	return true;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/variable_length_opaque.j2
···
 /* (variable-length opaque) */
 {% endif %}
 	return xdr_stream_encode_opaque(xdr, value.data, value.len) >= 0;
-};
+}
+4
tools/net/sunrpc/xdrgen/templates/C/union/declaration/close.j2
···
+{# SPDX-License-Identifier: GPL-2.0 #}
+
+bool xdrgen_decode_{{ name }}(struct xdr_stream *xdr, struct {{ name }} *ptr);
+bool xdrgen_encode_{{ name }}(struct xdr_stream *xdr, const struct {{ name }} *value);
+1 -1
tools/net/sunrpc/xdrgen/templates/C/union/decoder/close.j2
···
 {# SPDX-License-Identifier: GPL-2.0 #}
 	}
 	return true;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/union/encoder/close.j2
···
 {# SPDX-License-Identifier: GPL-2.0 #}
 	}
 	return true;
-};
+}
+6
tools/net/sunrpc/xdrgen/templates/C/union/encoder/string.j2
···
+{# SPDX-License-Identifier: GPL-2.0 #}
+{% if annotate %}
+/* member {{ name }} (variable-length string) */
+{% endif %}
+	if (!xdrgen_encode_string(xdr, ptr->u.{{ name }}, {{ maxsize }}))
+		return false;
+5
tools/net/sunrpc/xdrgen/xdrgen
···
 __version__ = "0.2"

 import sys
+from pathlib import Path
 import argparse
+
+_XDRGEN_DIR = Path(__file__).resolve().parent
+if str(_XDRGEN_DIR) not in sys.path:
+    sys.path.insert(0, str(_XDRGEN_DIR))

 from subcmds import definitions
 from subcmds import declarations
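The bootstrap above makes xdrgen location-independent: it puts the script's own directory at the front of the module search path so its sibling packages (`subcmds`, `generators`) import regardless of the caller's working directory. A sketch of the same logic, using a hypothetical `prepend_script_dir` helper so it can be exercised on a plain list instead of the real `sys.path`:

```python
# Sketch of the location-independence bootstrap; prepend_script_dir is a
# hypothetical helper, not part of xdrgen itself.
from pathlib import Path

def prepend_script_dir(path_list, script_file):
    """Idempotently insert script_file's directory at the front of path_list."""
    script_dir = str(Path(script_file).resolve().parent)
    if script_dir not in path_list:
        path_list.insert(0, script_dir)
    return path_list

paths = prepend_script_dir([], "/opt/xdrgen/xdrgen")
assert paths == ["/opt/xdrgen"]
# Running it again does not duplicate the entry:
assert prepend_script_dir(paths, "/opt/xdrgen/xdrgen") == ["/opt/xdrgen"]
```

The membership check mirrors the `if str(_XDRGEN_DIR) not in sys.path` guard in the patch, which keeps repeated imports of the script from stacking duplicate entries onto `sys.path`.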