Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'nfsd-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:

- Mike Snitzer's mechanism for disabling I/O caching introduced in
v6.18 is extended to include using direct I/O. The goal is to further
reduce the memory footprint consumed by NFS clients accessing large
data sets via NFSD.

- The NFSD community adopted a maintainer entry profile during this
cycle. See

Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst

- Work continues on hardening NFSD's implementation of the pNFS block
layout type. This type enables pNFS clients to directly access the
underlying block devices that contain an exported file system,
reducing server overhead and increasing data throughput.

- The remaining patches are clean-ups and minor optimizations. Many
thanks to the contributors, reviewers, testers, and bug reporters who
participated during the v6.19 NFSD development cycle.

* tag 'nfsd-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (38 commits)
NFSD: nfsd-io-modes: Separate lists
NFSD: nfsd-io-modes: Wrap shell snippets in literal code blocks
NFSD: Add toctree entry for NFSD IO modes docs
NFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst
NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
NFSD: Make FILE_SYNC WRITEs comply with spec
NFSD: Add trace point for SCSI fencing operation.
NFSD: use correct reservation type in nfsd4_scsi_fence_client
xdrgen: Don't generate unnecessary semicolon
xdrgen: Fix union declarations
NFSD: don't start nfsd if sv_permsocks is empty
xdrgen: handle _XdrString in union encoder/decoder
xdrgen: Fix the variable-length opaque field decoder template
xdrgen: Make the xdrgen script location-independent
xdrgen: Generalize/harden pathname construction
lockd: don't allow locking on reexported NFSv2/3
MAINTAINERS: add a nfsd blocklayout reviewer
nfsd: Use MD5 library instead of crypto_shash
nfsd: stop pretending that we cache the SEQUENCE reply.
NFS: nfsd-maintainer-entry-profile: Inline function name prefixes
...

+1431 -373
+1
Documentation/filesystems/nfs/index.rst
   rpc-cache
   rpc-server-gss
   nfs41-server
+  nfsd-io-modes
   knfsd-stats
   reexport
+153
Documentation/filesystems/nfs/nfsd-io-modes.rst
.. SPDX-License-Identifier: GPL-2.0

=============
NFSD IO MODES
=============

Overview
========

NFSD has historically always used buffered IO when servicing READ and
WRITE operations. BUFFERED is NFSD's default IO mode, but it is possible
to override that default to use either the DONTCACHE or DIRECT IO mode.

Experimental NFSD debugfs interfaces allow the NFSD IO modes used for
READ and for WRITE to be configured independently. See both:

- /sys/kernel/debug/nfsd/io_cache_read
- /sys/kernel/debug/nfsd/io_cache_write

The default value of both io_cache_read and io_cache_write reflects
NFSD's default IO mode (NFSD_IO_BUFFERED=0).

Based on the configured settings, NFSD's IO will either be:

- cached using the page cache (NFSD_IO_BUFFERED=0)
- cached but removed from the page cache on completion (NFSD_IO_DONTCACHE=1)
- not cached, with stable_how=NFS_UNSTABLE (NFSD_IO_DIRECT=2)

To set an NFSD IO mode, write a supported value (0 - 2) to the
corresponding IO operation's debugfs interface, e.g.::

  echo 2 > /sys/kernel/debug/nfsd/io_cache_read
  echo 2 > /sys/kernel/debug/nfsd/io_cache_write

To check which IO mode NFSD is using for READ or WRITE, simply read the
corresponding IO operation's debugfs interface, e.g.::

  cat /sys/kernel/debug/nfsd/io_cache_read
  cat /sys/kernel/debug/nfsd/io_cache_write

If you experiment with NFSD's IO modes on a recent kernel and have
interesting results, please report them to linux-nfs@vger.kernel.org.

NFSD DONTCACHE
==============

DONTCACHE is a hybrid approach to servicing IO that aims to offer the
benefits of DIRECT IO without any of the strict alignment requirements
that DIRECT IO imposes.
To achieve this, buffered IO is used,
but the IO is flagged to "drop behind" (meaning the associated pages are
dropped from the page cache) when the IO completes.

DONTCACHE aims to avoid what has proven to be a fairly significant
limitation of Linux's memory management subsystem when large amounts of
data are infrequently accessed (e.g. read once _or_ written once but not
read until much later). Such use-cases are particularly problematic
because the page cache will eventually become a bottleneck to servicing
new IO requests.

For more context on DONTCACHE, please see these Linux commit headers:

- Overview: 9ad6344568cc3 ("mm/filemap: change filemap_create_folio()
  to take a struct kiocb")
- for READ: 8026e49bff9b1 ("mm/filemap: add read support for
  RWF_DONTCACHE")
- for WRITE: 974c5e6139db3 ("xfs: flag as supporting FOP_DONTCACHE")

NFSD_IO_DONTCACHE will fall back to NFSD_IO_BUFFERED if the underlying
filesystem doesn't indicate support by setting FOP_DONTCACHE.

NFSD DIRECT
===========

DIRECT IO doesn't make use of the page cache; as such, it avoids the
Linux memory management's page reclaim scalability problems without
resorting to the hybrid use of the page cache that DONTCACHE does.

Some workloads benefit from NFSD avoiding the page cache, particularly
those with a working set that is significantly larger than available
system memory. The pathological worst-case workload that NFSD DIRECT has
proven to help most is an NFS client issuing large sequential IO to a file
that is 2-3 times larger than the NFS server's available system memory.
The improvement comes from NFSD DIRECT eliminating a lot of work
that the memory management subsystem would otherwise be required to
perform (e.g. page allocation, dirty writeback, page reclaim).
When using NFSD DIRECT, kswapd and kcompactd no longer consume CPU
time trying to find adequate free pages so that forward IO progress can
be made.

The performance win associated with using NFSD DIRECT was previously
discussed on linux-nfs, see:
https://lore.kernel.org/linux-nfs/aEslwqa9iMeZjjlV@kernel.org/

But in summary:

- NFSD DIRECT can significantly reduce memory requirements
- NFSD DIRECT can reduce CPU load by avoiding costly page reclaim work
- NFSD DIRECT can offer more deterministic IO performance

As always, your mileage may vary, so it is important to carefully
consider if/when it is beneficial to make use of NFSD DIRECT. When
assessing the comparative performance of your workload, please be sure
to log relevant performance metrics during testing (e.g. memory usage,
CPU usage, IO performance). Using perf to collect profile data, and from
it a "flamegraph" of the work Linux must perform on behalf of your test,
is a meaningful way to compare the relative health of the system and how
switching NFSD's IO mode changes what is observed.

If NFSD_IO_DIRECT is specified by writing 2 (or 3 and 4 for WRITE) to
NFSD's debugfs interfaces, ideally the IO will be aligned relative to
the underlying block device's logical_block_size. Also, the memory
buffer used to store the READ or WRITE payload must be aligned relative
to the underlying block device's dma_alignment.

But NFSD DIRECT does handle IO that is misaligned in O_DIRECT terms as
best it can:

Misaligned READ:
  If NFSD_IO_DIRECT is used, expand any misaligned READ to the next
  DIO-aligned block (on either end of the READ). The expanded READ is
  verified with proper offset/len (logical_block_size) and
  dma_alignment checks.
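The READ expansion described above is plain rounding arithmetic; as a sketch (the logical_block_size, offset, and length values here are assumed examples, not queried from any device):

```shell
# Sketch only: round a misaligned READ outward to DIO-aligned boundaries,
# mirroring the expansion the nfsd_read_direct trace event reports.
lbs=4096          # assumed logical_block_size of the backing device
offset=5000       # misaligned client READ offset
length=10000      # misaligned client READ length

start=$(( offset / lbs * lbs ))                      # round start down
end=$(( (offset + length + lbs - 1) / lbs * lbs ))   # round end up
echo "expanded READ: offset=$start len=$(( end - start ))"
```

Running this prints ``expanded READ: offset=4096 len=12288``: both ends of the 5000..15000 request are pushed out to the surrounding 4096-byte boundaries.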
Misaligned WRITE:
  If NFSD_IO_DIRECT is used, split any misaligned WRITE into a start,
  middle and end segment as needed. The large middle segment is
  DIO-aligned, and the start and/or end segments are misaligned.
  Buffered IO is used for the misaligned segments and O_DIRECT is used
  for the middle DIO-aligned segment. DONTCACHE buffered IO is _not_
  used for the misaligned segments because using normal buffered IO
  offers a significant RMW performance benefit when handling streaming
  misaligned WRITEs.

Tracing:
  The nfsd_read_direct trace event shows how NFSD expands any
  misaligned READ to the next DIO-aligned block (on either end of the
  original READ, as needed).

  This combination of trace events is useful for READs::

    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_vector/enable
    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_direct/enable
    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_io_done/enable
    echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_read/enable

  The nfsd_write_direct trace event shows how NFSD splits a given
  misaligned WRITE into a DIO-aligned middle segment.

  This combination of trace events is useful for WRITEs::

    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_opened/enable
    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_direct/enable
    echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_io_done/enable
    echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
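The start/middle/end WRITE split described above can be sketched with the same kind of boundary arithmetic (the values are assumed examples; this is not NFSD source code):

```shell
# Sketch only: split a misaligned WRITE into a buffered head, an O_DIRECT
# middle, and a buffered tail, as the documentation above describes.
lbs=4096                   # assumed logical_block_size
offset=5000; length=20000  # misaligned client WRITE
end=$(( offset + length ))

mid_start=$(( (offset + lbs - 1) / lbs * lbs ))   # first aligned boundary
mid_end=$(( end / lbs * lbs ))                    # last aligned boundary

echo "buffered head:   offset=$offset len=$(( mid_start - offset ))"
echo "O_DIRECT middle: offset=$mid_start len=$(( mid_end - mid_start ))"
echo "buffered tail:   offset=$mid_end len=$(( end - mid_end ))"
```

For this example the 5000..25000 WRITE yields a 3192-byte buffered head, a 16384-byte DIO-aligned middle (8192..24576), and a 424-byte buffered tail, which is the shape the nfsd_write_direct trace event would report.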
+547
Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst
NFSD Maintainer Entry Profile
=============================

A Maintainer Entry Profile supplements the top-level process
documents (found in Documentation/process/) with customs that are
specific to a subsystem and its maintainers. A contributor may use
this document to set their expectations and avoid common mistakes.
A maintainer may use these profiles to look across subsystems for
opportunities to converge on best common practices.

Overview
--------
The Network File System (NFS) is a standardized family of network
protocols that enable access to files across a set of
network-connected peer hosts. Applications on NFS clients access
files that reside on file systems that are shared by NFS servers. A
single network peer can act as both an NFS client and an NFS server.

NFSD refers to the NFS server implementation included in the Linux
kernel. An in-kernel NFS server has fast access to files stored
in file systems local to that server. NFSD can share files stored
on most of the file system types native to Linux, including xfs,
ext4, btrfs, and tmpfs.

Mailing list
------------
The linux-nfs@vger.kernel.org mailing list is a public list. Its
purpose is to enable collaboration among developers working on the
Linux NFS stack, both client and server. It is not a place for
conversations that are not directly related to the Linux NFS stack.

The linux-nfs mailing list is archived on `lore.kernel.org <https://lore.kernel.org/linux-nfs/>`_.

The Linux NFS community does not have a chat room.

Reporting bugs
--------------
If you experience an NFSD-related bug on a distribution-built
kernel, please start by working with your Linux distributor.
Bug reports against upstream Linux code bases are welcome on the
linux-nfs@vger.kernel.org mailing list, where some active triage
can be done. NFSD bugs may also be reported in the Linux kernel
community's bugzilla at:

    https://bugzilla.kernel.org

Please file NFSD-related bugs under the "Filesystems/NFSD"
component. In general, including as much detail as possible is a
good start, including pertinent system log messages from both
the client and the server.

User space software related to NFSD, such as mountd or the exportfs
command, is contained in the nfs-utils package. Report problems
with those components to linux-nfs@vger.kernel.org. You might be
directed to move the report to a specific bug tracker.

Contributor's Guide
-------------------

Standards compliance
~~~~~~~~~~~~~~~~~~~~
The priority is for NFSD to interoperate fully with the Linux NFS
client. We also test against other popular NFS client
implementations regularly at NFS bake-a-thon events (also known as
plugfests). Non-Linux NFS clients are not part of upstream NFSD
CI/CD.

The NFSD community strives to provide an NFS server implementation
that interoperates with all standards-compliant NFS client
implementations. This is done by staying as close as is sensible to
the normative mandates in the IETF's published NFS, RPC, and GSS-API
standards.

It is always useful to reference an RFC and section number in a code
comment where behavior deviates from the standard (and even when the
behavior is compliant but the implementation is obfuscatory).

On the rare occasion when a deviation from standard-mandated
behavior is needed, brief documentation of the use case or of the
deficiencies in the standard is a required part of the in-code
documentation.
Care must always be taken to avoid leaking local error codes (i.e.,
errnos) to clients of NFSD. A proper NFS status code is always
required in NFS protocol replies.

NFSD administrative interfaces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NFSD administrative interfaces include:

- an NFSD or SUNRPC module parameter

- export options in /etc/exports

- files under /proc/fs/nfsd/ or /proc/sys/sunrpc/

- the NFSD netlink protocol

Frequently, a request is made to introduce or modify one of NFSD's
traditional administrative interfaces. Certainly it is technically
easy to introduce a new administrative setting. However, there are
good reasons why the NFSD maintainers prefer to leave that as a last
resort:

- As with any API, administrative interfaces are difficult to get
  right.

- Once they are documented and have a legacy of use, administrative
  interfaces become difficult to modify or remove.

- Every new administrative setting multiplies the NFSD test matrix.

- The cost of one administrative interface is incremental, but costs
  add up across all of the existing interfaces.

It is often better for everyone if effort is made up front to
understand the underlying requirement of the new setting, and
then to make it tune itself (or become otherwise unnecessary).

If a new setting is indeed necessary, first consider adding it to
the NFSD netlink protocol. Or, if it doesn't need to be a reliable
long-term user space feature, it can be added to NFSD's menagerie of
experimental settings, which reside under /sys/kernel/debug/nfsd/ .

Field observability
~~~~~~~~~~~~~~~~~~~
NFSD employs several different mechanisms for observing operation,
including counters, printks, WARNings, and static trace points.
Each has its strengths and weaknesses. Contributors should select the
most appropriate tool for their task.

- BUG must be avoided if at all possible, as it frequently
  results in a full system crash.

- WARN is appropriate only when a full stack trace is useful.

- printk can show detailed information. It must not be used
  in code paths where it can be triggered repeatedly by remote
  users.

- dprintk can show detailed information, but can be enabled only
  in pre-set groups. The overhead of emitting output makes dprintk
  inappropriate for frequent operations like I/O.

- Counters are always on, but provide little information about
  individual events other than how frequently they occur.

- Static trace points can be enabled individually or in groups
  (via a glob). These are generally low overhead, and thus are
  favored for use in hot paths.

- Dynamic tracing, such as kprobes or eBPF, is quite flexible but
  cannot be used in certain environments (e.g., full kernel
  lockdown).

Testing
~~~~~~~
The kdevops project

    https://github.com/linux-kdevops/kdevops

contains several NFS-specific workflows, as well as the
community-standard fstests suite. These workflows are based on open
source testing tools such as ltp and fio. Contributors are
encouraged to use these tools directly, or to install and use
kdevops themselves, to verify their patches before submission.
Coding style
~~~~~~~~~~~~
Follow the coding style preferences described in

    Documentation/process/coding-style.rst

with the following exceptions:

- Add new local variables to a function in reverse Christmas tree
  order.

- Use the kernel-doc comment style for:

  + non-static functions
  + static inline functions
  + static functions that are callbacks/virtual functions

- All new function names start with ``nfsd_`` for
  non-NFS-version-specific functions.

- New function names that are specific to NFSv2 or NFSv3, or are
  used by all minor versions of NFSv4, use ``nfsdN_`` where N is
  the version.

- New function names specific to an NFSv4 minor version can be
  named with ``nfsd4M_`` where M is the minor version.

Patch preparation
~~~~~~~~~~~~~~~~~
Read and follow all guidelines in

    Documentation/process/submitting-patches.rst

Use tags to identify all patch authors. However, reviewers and
testers should be added by replying to the email patch submission.
Email is used extensively in order to publicly archive review and
testing attributions. These tags are automatically inserted into
your patches when they are applied.

The code in the body of the diff already shows /what/ is being
changed. Thus it is not necessary to repeat that in the patch
description. Instead, the description should contain one or more
of:

- A brief problem statement ("what is this patch trying to fix?")
  with a root-cause analysis.

- End-user-visible symptoms or items that a support engineer might
  use to search for the patch, like stack traces.

- A brief explanation of why the patch is the best way to address
  the problem.
- Any context that reviewers might need to understand the changes
  made by the patch.

- Any relevant benchmarking results and/or functional test results.

As detailed in Documentation/process/submitting-patches.rst,
identify the point in history where the issue being addressed was
introduced by using a Fixes: tag.

Mention in the patch description if that point in history cannot be
determined -- that is, if no Fixes: tag can be provided. In this
case, please make it clear to maintainers whether an LTS backport is
needed even though there is no Fixes: tag.

The NFSD maintainers prefer to add stable tagging themselves, after
public discussion in response to the patch submission. Contributors
may suggest stable tagging, but be aware that many version
management tools add such stable Cc's when you post your patches.
Don't add "Cc: stable" during the initial submission process unless
you are absolutely sure the patch needs to go to stable.

Patch submission
~~~~~~~~~~~~~~~~
Patches to NFSD are submitted via the kernel's email-based review
process that is common to most other kernel subsystems.

Just before each submission, rebase your patch or series on the
nfsd-testing branch at

    https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

The NFSD subsystem is maintained separately from the Linux in-kernel
NFS client. The NFSD maintainers do not normally take submissions
for client changes, nor can they respond authoritatively to bug
reports or feature requests for NFS client code.

This means that contributors might be asked to resubmit patches if
they were emailed to the incorrect set of maintainers and reviewers.
This is not a rejection, but simply a correction of the submission
process.
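For illustration, the mechanics of the patch-preparation guidance above can be sketched with git alone; the throwaway repository, subject line, and author identity below are made-up examples, not taken from this pull request:

```shell
# Sketch only: prepare a patch file whose commit message follows the
# description guidelines above (problem statement, rationale, sign-off).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name="A Contributor" -c user.email="contributor@example.org" \
    commit -q --allow-empty -m "NFSD: Example subject line

A brief problem statement with a root-cause analysis goes here,
followed by why this patch is the best way to address the problem.

Signed-off-by: A Contributor <contributor@example.org>"
git format-patch -1 -o outgoing >/dev/null
ls outgoing
```

The resulting ``outgoing/0001-NFSD-Example-subject-line.patch`` is what a tool such as ``git send-email`` would then deliver to the maintainers and mailing list.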
When in doubt, consult the NFSD entry in the MAINTAINERS file to
see which files and directories fall under the NFSD subsystem.

The proper set of email addresses for NFSD patches is:

    To: the NFSD maintainers and reviewers listed in MAINTAINERS
    Cc: linux-nfs@vger.kernel.org and optionally linux-kernel@

If other subsystems are involved in the patches (for example
MM or RDMA), their primary mailing list address can be included in
the Cc: field. Other contributors and interested parties may be
included there as well.

In general we prefer that contributors use common patch email tools
such as "git send-email" or "stg email format/send", which tend to
get the details right without a lot of fuss.

A series consisting of a single patch is not required to have a
cover letter. However, a cover letter can be included if there is
substantial context that is not appropriate to include in the
patch description.

Please note that, with an email-based submission process, series
cover letters are not part of the work that is committed to the
kernel source code base or its commit history. Therefore, always try
to keep pertinent information in the patch descriptions.

Design documentation is welcome, but as cover letters are not
preserved, a better option is often to include a patch that adds
such documentation under Documentation/filesystems/nfs/.

Reviewers will ask about test coverage and what use cases the
patches are expected to address. Please be prepared to answer these
questions.

Review comments from maintainers might be politely stated, but in
general they are not optional to address when they are actionable.
If necessary, the maintainers retain the right to not apply patches
when contributors refuse to address reasonable requests.
Post changes to kernel source code and user space source code as
separate series. You can connect the two series with comments in
your cover letters.

Generally the NFSD maintainers ask for a repost even for simple
modifications in order to publicly archive the request and the
resulting repost before it is pulled into the NFSD trees. This
also enables us to rebuild a patch series quickly without missing
changes that might have been discussed via email.

Avoid frequently reposting large series with only small changes. As
a rule of thumb, posting substantial changes more than once a week
will result in reviewer overload.

Remember, there are only a handful of subsystem maintainers and
reviewers, but potentially many sources of contributions. The
maintainers and reviewers, therefore, are always the less scalable
resource. Be kind to your friendly neighborhood maintainer.

Patch Acceptance
~~~~~~~~~~~~~~~~
There isn't a formal review process for NFSD, but we like to see
at least two Reviewed-by: notices for patches that are more than
simple clean-ups. Reviews are done in public on
linux-nfs@vger.kernel.org and are archived on lore.kernel.org.

Currently the NFSD patch queues are maintained in branches here:

    https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

The NFSD maintainers apply patches initially to the nfsd-testing
branch, which is always open to new submissions. Patches can be
applied while review is ongoing. nfsd-testing is a topic branch,
so it can change frequently, it will be rebased, and your patch
might get dropped if there is a problem with it.

Generally a script-generated "thank you" email will indicate when
your patch has been added to the nfsd-testing branch.
You can track
the progress of your patch using the linux-nfs patchwork instance:

    https://patchwork.kernel.org/project/linux-nfs/list/

While your patch is in nfsd-testing, it is exposed to a variety of
test environments, including community zero-day bots, static
analysis tools, and NFSD continuous integration testing. The soak
period is three to four weeks.

Each patch that survives in nfsd-testing for the soak period without
changes is moved to the nfsd-next branch.

The nfsd-next branch is automatically merged into linux-next and
fs-next on a nightly basis.

Patches that survive in nfsd-next are included in the next NFSD
merge window pull request. These windows typically occur once every
63 days (nine weeks).

When the upstream merge window closes, the nfsd-next branch is
renamed nfsd-fixes, and a new nfsd-next branch is created, based on
the upstream -rc1 tag.

Fixes that are destined for an upstream -rc release also run the
nfsd-testing gauntlet, but are then applied to the nfsd-fixes
branch. That branch is made available for Linus to pull after a
short time. In order to limit the risk of introducing regressions,
we limit such fixes to emergency situations or fixes for breakage
that occurred during the most recent upstream merge.

Please make it clear when submitting an emergency patch that
immediate action (either application to -rc or LTS backport) is
needed.

Sensitive patch submissions and bug reports
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CVEs are generated by specific members of the Linux kernel community
and several external entities. The Linux NFS community does not emit
or assign CVEs. CVEs are assigned after an issue and its fix are
known.
However, the NFSD maintainers sometimes receive sensitive security
reports, and at times these are significant enough to need to be
embargoed. In such rare cases, fixes can be developed and reviewed
out of the public eye.

Please be aware that many version management tools add stable
Cc's when you post your patches. This is generally a nuisance, but
it can accidentally out an embargoed security issue.
Don't add "Cc: stable" during the initial submission process unless
you are absolutely sure the patch needs to go to stable@.

Patches that are merged without ever appearing on any list, and
which carry a Reported-by: or Fixes: tag, are flagged as suspicious
by security-focused people. We encourage that, after any private
review, security-sensitive patches be posted to linux-nfs@
for the usual public review, archiving, and test period.

LLM-generated submissions
~~~~~~~~~~~~~~~~~~~~~~~~~
The Linux kernel community as a whole is still exploring the new
world of LLM-generated code. The NFSD maintainers will entertain
submission of patches that are partially or wholly generated by
LLM-based development tools. Such submissions are held to the
same standards as submissions created entirely by human authors:

- The human contributor identifies themselves via a Signed-off-by:
  tag. This tag counts as a Developer's Certificate of Origin (DCO)
  certification.

- The human contributor is solely responsible for code provenance
  and any contamination by inadvertently-included code with a
  conflicting license, as usual.

- The human contributor must be able to answer and address review
  questions. A patch description such as "This fixed my problem
  but I don't know why" is not acceptable.

- The contribution is subjected to the same test regimen as all
  other submissions.
- An indication (via a Generated-by: tag or otherwise) that the
  contribution is LLM-generated is not required.

It is easy to address review comments and fix requests in
LLM-generated code. So easy, in fact, that it becomes tempting to
repost refreshed code immediately. Please resist that temptation.

As always, please avoid reposting series revisions more than once
every 24 hours.

Clean-up patches
~~~~~~~~~~~~~~~~
The NFSD maintainers discourage patches that perform simple
clean-ups that are not in the context of other work. For example:

* Addressing ``checkpatch.pl`` warnings after merge
* Addressing :ref:`Local variable ordering<rcs>` issues
* Addressing long-standing whitespace damage

This is because the churn that such changes produce is felt to
come at a greater cost than the value of the clean-ups themselves.

Conversely, spelling and grammar fixes are encouraged.

Stable and LTS support
----------------------
Upstream NFSD continuous integration testing runs against LTS trees
whenever they are updated.

Please indicate when a patch containing a fix needs to be considered
for LTS kernels, either via a Fixes: tag or an explicit mention.

Feature requests
----------------
There is no one way to make an official feature request, but
discussion of the request should eventually make its way to
the linux-nfs@vger.kernel.org mailing list for public review by
the community.

Subsystem boundaries
~~~~~~~~~~~~~~~~~~~~
NFSD itself is not much more than a protocol engine. This means its
primary responsibility is to translate the NFS protocol into API
calls in the Linux kernel. For example, NFSD is not responsible for
knowing exactly how bytes or file attributes are managed on a block
device.
It relies on other kernel subsystems for that.

If the subsystems on which NFSD relies do not implement a particular
feature, even if the standard NFS protocols do support that feature,
that usually means NFSD cannot provide that feature without
substantial development work in other areas of the kernel.

Specificity
~~~~~~~~~~~
Feature requests can come from anywhere, and thus can often be
nebulous. A requester might not understand what a "use case" or
"user story" is. These descriptive paradigms are often used by
developers and architects to understand what is required of a
design, but are terms of art in the software trade, not used in
the everyday world.

In order to prevent contributors and maintainers from becoming
overwhelmed, we won't be afraid of saying "no" politely to
underspecified requests.

Community roles and their authority
-----------------------------------
The purpose of Linux subsystem communities is to provide expertise
and active stewardship of a narrow set of source files in the Linux
kernel. This can include managing user space tooling as well.

To contextualize the structure of the Linux NFS community that
is responsible for stewardship of the NFS server code base, we
define the community roles here.

- **Contributor** : Anyone who submits a code change, bug fix,
  recommendation, documentation fix, and so on. A contributor can
  submit regularly or infrequently.

- **Outside Contributor** : A contributor who is not a regular actor
  in the Linux NFS community. This can mean someone who contributes
  to other parts of the kernel, or someone who just noticed a
  misspelling in a comment and sent a patch.
503 + 504 + - **Reviewer** : Someone who is named in the MAINTAINERS file as a 505 + reviewer is an area expert who can request changes to contributed 506 + code, and expects that contributors will address the request. 507 + 508 + - **External Reviewer** : Someone who is not named in the 509 + MAINTAINERS file as a reviewer, but who is an area expert. 510 + Examples include Linux kernel contributors with networking, 511 + security, or persistent storage expertise, or developers who 512 + contribute primarily to other NFS implementations. 513 + 514 + One or more people will take on the following roles. These people 515 + are often generically referred to as "maintainers", and are 516 + identified in the MAINTAINERS file with the "M:" tag under the NFSD 517 + subsystem. 518 + 519 + - **Upstream Release Manager** : This role is responsible for 520 + curating contributions into a branch, reviewing test results, and 521 + then sending a pull request during merge windows. There is a 522 + trust relationship between the release manager and Linus. 523 + 524 + - **Bug Triager** : Someone who is a first responder to bug reports 525 + submitted to the linux-nfs mailing list or bug trackers, and helps 526 + troubleshoot and identify next steps. 527 + 528 + - **Security Lead** : The security lead handles contacts from the 529 + security community to resolve immediate issues, as well as dealing 530 + with long-term security issues such as supply chain concerns. For 531 + upstream, that's usually whether contributions violate licensing 532 + or other intellectual property agreements. 533 + 534 + - **Testing Lead** : The testing lead builds and runs the test 535 + infrastructure for the subsystem. The testing lead may ask for 536 + patches to be dropped because of ongoing high defect rates. 
537 + 538 + - **LTS Maintainer** : The LTS maintainer is responsible for managing 539 + the Fixes: and Cc: stable annotations on patches, and seeing that 540 + patches that cannot be automatically applied to LTS kernels get 541 + proper manual backports as necessary. 542 + 543 + - **Community Manager** : This umpire role can be asked to call balls 544 + and strikes during conflicts, but is also responsible for ensuring 545 + the health of the relationships within the community and for 546 + facilitating discussions on long-term topics such as how to manage 547 + growing technical debt.
+1
Documentation/maintainer/maintainer-entry-profile.rst
···
     ../process/maintainer-netdev
     ../driver-api/vfio-pci-device-specific-driver-acceptance
     ../nvme/feature-and-quirk-policy
+    ../filesystems/nfs/nfsd-maintainer-entry-profile
     ../filesystems/xfs/xfs-maintainer-entry-profile
     ../mm/damon/maintainer-profile
+5
MAINTAINERS
···
 R:	Tom Talpey <tom@talpey.com>
 L:	linux-nfs@vger.kernel.org
 S:	Supported
+P:	Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst
 B:	https://bugzilla.kernel.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
 F:	Documentation/filesystems/nfs/
···
 F:	include/uapi/linux/sunrpc/
 F:	net/sunrpc/
 F:	tools/net/sunrpc/
+
+KERNEL NFSD BLOCK and SCSI LAYOUT DRIVER
+R:	Christoph Hellwig <hch@lst.de>
+F:	fs/nfsd/blocklayout*

 KERNEL PACMAN PACKAGING (in addition to generic KERNEL BUILD)
 M:	Thomas Weißschuh <linux@weissschuh.net>
+12
fs/lockd/svclock.c
···
 			(long long)lock->fl.fl_end,
 			wait);

+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;
+
 	if (!locks_can_async_lock(nlmsvc_file_file(file)->f_op)) {
 		async_block = wait;
 		wait = 0;
···
 		(long long)lock->fl.fl_start,
 		(long long)lock->fl.fl_end);

+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;
+
 	if (locks_in_grace(SVC_NET(rqstp))) {
 		ret = nlm_lck_denied_grace_period;
 		goto out;
···
 		(long long)lock->fl.fl_start,
 		(long long)lock->fl.fl_end);

+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;
+
 	/* First, cancel any lock that might be there */
 	nlmsvc_cancel_blocked(net, file, lock);
···
 		lock->fl.c.flc_pid,
 		(long long)lock->fl.fl_start,
 		(long long)lock->fl.fl_end);
+
+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;

 	if (locks_in_grace(net))
 		return nlm_lck_denied_grace_period;
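Each of the hunks above inserts the same guard: refuse with nlm_lck_denied_nolocks before any grace-period or conflict handling runs, so a reexported NFSv2/3 file can never reach the locking paths. A minimal userspace sketch of that early-return ordering follows; the names `nlm_try_lock`, `nlm_demo_file`, and `reexported_v23` are illustrative stand-ins, not the kernel's API.

```c
#include <stdbool.h>

/* Illustrative status values, mirroring the nlm_* codes in spirit only. */
enum nlm_demo_status { NLM_GRANTED, NLM_DENIED_GRACE, NLM_DENIED_NOLOCKS };

struct nlm_demo_file {
	bool reexported_v23;	/* stands in for nlmsvc_file_cannot_lock() */
	bool in_grace;		/* stands in for locks_in_grace() */
};

/*
 * Deny locking up front when the export cannot support it at all;
 * only then consider the grace period, matching the order of the
 * checks added in the patch above.
 */
static enum nlm_demo_status nlm_try_lock(const struct nlm_demo_file *f)
{
	if (f->reexported_v23)
		return NLM_DENIED_NOLOCKS;
	if (f->in_grace)
		return NLM_DENIED_GRACE;
	return NLM_GRANTED;
}
```

Note that the "cannot lock" check wins even during a grace period: the client gets a permanent denial rather than a retryable grace error.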
+6
fs/lockd/svcshare.c
···
 	struct xdr_netobj *oh = &argp->lock.oh;
 	u8 *ohdata;

+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;
+
 	for (share = file->f_shares; share; share = share->s_next) {
 		if (share->s_host == host && nlm_cmp_owner(share, oh))
 			goto update;
···
 {
 	struct nlm_share *share, **shpp;
 	struct xdr_netobj *oh = &argp->lock.oh;
+
+	if (nlmsvc_file_cannot_lock(file))
+		return nlm_lck_denied_nolocks;

 	for (shpp = &file->f_shares; (share = *shpp) != NULL;
 	     shpp = &share->s_next) {
+3 -3
fs/nfsd/Kconfig
···
 	depends on FILE_LOCKING
 	depends on FSNOTIFY
 	select CRC32
+	select CRYPTO_LIB_MD5 if NFSD_LEGACY_CLIENT_TRACKING
 	select CRYPTO_LIB_SHA256 if NFSD_V4
 	select LOCKD
 	select SUNRPC
···
 	depends on NFSD && PROC_FS
 	select FS_POSIX_ACL
 	select RPCSEC_GSS_KRB5
-	select CRYPTO
-	select CRYPTO_MD5
+	select CRYPTO # required by RPCSEC_GSS_KRB5
 	select GRACE_PERIOD
 	select NFS_V4_2_SSC_HELPER if NFS_V4_2
 	help
···
 config NFSD_LEGACY_CLIENT_TRACKING
 	bool "Support legacy NFSv4 client tracking methods (DEPRECATED)"
 	depends on NFSD_V4
-	default y
+	default n
 	help
 	  The NFSv4 server needs to store a small amount of information on
 	  stable storage in order to handle state recovery after reboot. Most
+112 -50
fs/nfsd/blocklayout.c
···
 #include "pnfs.h"
 #include "filecache.h"
 #include "vfs.h"
+#include "trace.h"

 #define NFSDDBG_FACILITY	NFSDDBG_PNFS


+/*
+ * Get an extent from the file system that starts at offset or below
+ * and may be shorter than the requested length.
+ */
 static __be32
-nfsd4_block_proc_layoutget(struct svc_rqst *rqstp, struct inode *inode,
-		const struct svc_fh *fhp, struct nfsd4_layoutget *args)
+nfsd4_block_map_extent(struct inode *inode, const struct svc_fh *fhp,
+		u64 offset, u64 length, u32 iomode, u64 minlength,
+		struct pnfs_block_extent *bex)
 {
-	struct nfsd4_layout_seg *seg = &args->lg_seg;
 	struct super_block *sb = inode->i_sb;
-	u32 block_size = i_blocksize(inode);
-	struct pnfs_block_extent *bex;
 	struct iomap iomap;
 	u32 device_generation = 0;
 	int error;

-	if (locks_in_grace(SVC_NET(rqstp)))
-		return nfserr_grace;
-
-	if (seg->offset & (block_size - 1)) {
-		dprintk("pnfsd: I/O misaligned\n");
-		goto out_layoutunavailable;
-	}
-
-	/*
-	 * Some clients barf on non-zero block numbers for NONE or INVALID
-	 * layouts, so make sure to zero the whole structure.
-	 */
-	error = -ENOMEM;
-	bex = kzalloc(sizeof(*bex), GFP_KERNEL);
-	if (!bex)
-		goto out_error;
-	args->lg_content = bex;
-
-	error = sb->s_export_op->map_blocks(inode, seg->offset, seg->length,
-					    &iomap, seg->iomode != IOMODE_READ,
-					    &device_generation);
+	error = sb->s_export_op->map_blocks(inode, offset, length, &iomap,
+					    iomode != IOMODE_READ, &device_generation);
 	if (error) {
 		if (error == -ENXIO)
-			goto out_layoutunavailable;
-		goto out_error;
-	}
-
-	if (iomap.length < args->lg_minlength) {
-		dprintk("pnfsd: extent smaller than minlength\n");
-		goto out_layoutunavailable;
+			return nfserr_layoutunavailable;
+		return nfserrno(error);
 	}

 	switch (iomap.type) {
 	case IOMAP_MAPPED:
-		if (seg->iomode == IOMODE_READ)
+		if (iomode == IOMODE_READ)
 			bex->es = PNFS_BLOCK_READ_DATA;
 		else
 			bex->es = PNFS_BLOCK_READWRITE_DATA;
 		bex->soff = iomap.addr;
 		break;
 	case IOMAP_UNWRITTEN:
-		if (seg->iomode & IOMODE_RW) {
+		if (iomode & IOMODE_RW) {
 			/*
 			 * Crack monkey special case from section 2.3.1.
 			 */
-			if (args->lg_minlength == 0) {
+			if (minlength == 0) {
 				dprintk("pnfsd: no soup for you!\n");
-				goto out_layoutunavailable;
+				return nfserr_layoutunavailable;
 			}

 			bex->es = PNFS_BLOCK_INVALID_DATA;
···
 		}
 		fallthrough;
 	case IOMAP_HOLE:
-		if (seg->iomode == IOMODE_READ) {
+		if (iomode == IOMODE_READ) {
 			bex->es = PNFS_BLOCK_NONE_DATA;
 			break;
 		}
···
 	case IOMAP_DELALLOC:
 	default:
 		WARN(1, "pnfsd: filesystem returned %d extent\n", iomap.type);
-		goto out_layoutunavailable;
+		return nfserr_layoutunavailable;
 	}

 	error = nfsd4_set_deviceid(&bex->vol_id, fhp, device_generation);
 	if (error)
-		goto out_error;
+		return nfserrno(error);
+
 	bex->foff = iomap.offset;
 	bex->len = iomap.length;
+	return nfs_ok;
+}

-	seg->offset = iomap.offset;
-	seg->length = iomap.length;
+static __be32
+nfsd4_block_proc_layoutget(struct svc_rqst *rqstp, struct inode *inode,
+		const struct svc_fh *fhp, struct nfsd4_layoutget *args)
+{
+	struct nfsd4_layout_seg *seg = &args->lg_seg;
+	struct pnfs_block_layout *bl;
+	struct pnfs_block_extent *first_bex, *last_bex;
+	u64 offset = seg->offset, length = seg->length;
+	u32 i, nr_extents_max, block_size = i_blocksize(inode);
+	__be32 nfserr;

-	dprintk("GET: 0x%llx:0x%llx %d\n", bex->foff, bex->len, bex->es);
-	return 0;
+	if (locks_in_grace(SVC_NET(rqstp)))
+		return nfserr_grace;
+
+	nfserr = nfserr_layoutunavailable;
+	if (seg->offset & (block_size - 1)) {
+		dprintk("pnfsd: I/O misaligned\n");
+		goto out_error;
+	}
+
+	/*
+	 * RFC 8881, section 3.3.17:
+	 *   The layout4 data type defines a layout for a file.
+	 *
+	 * RFC 8881, section 18.43.3:
+	 *   The loga_maxcount field specifies the maximum layout size
+	 *   (in bytes) that the client can handle. If the size of the
+	 *   layout structure exceeds the size specified by maxcount,
+	 *   the metadata server will return the NFS4ERR_TOOSMALL error.
+	 */
+	nfserr = nfserr_toosmall;
+	if (args->lg_maxcount < PNFS_BLOCK_LAYOUT4_SIZE +
+				PNFS_BLOCK_EXTENT_SIZE)
+		goto out_error;
+
+	/*
+	 * Limit the maximum layout size to avoid allocating
+	 * a large buffer on the server for each layout request.
+	 */
+	nr_extents_max = (min(args->lg_maxcount, PAGE_SIZE) -
+			  PNFS_BLOCK_LAYOUT4_SIZE) / PNFS_BLOCK_EXTENT_SIZE;
+
+	/*
+	 * Some clients barf on non-zero block numbers for NONE or INVALID
+	 * layouts, so make sure to zero the whole structure.
+	 */
+	nfserr = nfserrno(-ENOMEM);
+	bl = kzalloc(struct_size(bl, extents, nr_extents_max), GFP_KERNEL);
+	if (!bl)
+		goto out_error;
+	bl->nr_extents = nr_extents_max;
+	args->lg_content = bl;
+
+	for (i = 0; i < bl->nr_extents; i++) {
+		struct pnfs_block_extent *bex = bl->extents + i;
+		u64 bex_length;
+
+		nfserr = nfsd4_block_map_extent(inode, fhp, offset, length,
+				seg->iomode, args->lg_minlength, bex);
+		if (nfserr != nfs_ok)
+			goto out_error;
+
+		bex_length = bex->len - (offset - bex->foff);
+		if (bex_length >= length) {
+			bl->nr_extents = i + 1;
+			break;
+		}
+
+		offset = bex->foff + bex->len;
+		length -= bex_length;
+	}
+
+	first_bex = bl->extents;
+	last_bex = bl->extents + bl->nr_extents - 1;
+
+	nfserr = nfserr_layoutunavailable;
+	length = last_bex->foff + last_bex->len - seg->offset;
+	if (length < args->lg_minlength) {
+		dprintk("pnfsd: extent smaller than minlength\n");
+		goto out_error;
+	}
+
+	seg->offset = first_bex->foff;
+	seg->length = last_bex->foff - first_bex->foff + last_bex->len;
+	return nfs_ok;

 out_error:
 	seg->length = 0;
-	return nfserrno(error);
-
-out_layoutunavailable:
-	seg->length = 0;
-	return nfserr_layoutunavailable;
+	return nfserr;
 }

 static __be32
···
 {
 	struct nfs4_client *clp = ls->ls_stid.sc_client;
 	struct block_device *bdev = file->nf_file->f_path.mnt->mnt_sb->s_bdev;
+	int status;

-	bdev->bd_disk->fops->pr_ops->pr_preempt(bdev, NFSD_MDS_PR_KEY,
-			nfsd4_scsi_pr_key(clp), 0, true);
+	status = bdev->bd_disk->fops->pr_ops->pr_preempt(bdev, NFSD_MDS_PR_KEY,
+			nfsd4_scsi_pr_key(clp),
+			PR_EXCLUSIVE_ACCESS_REG_ONLY, true);
+	trace_nfsd_pnfs_fence(clp, bdev->bd_disk->disk_name, status);
 }

 const struct nfsd4_layout_ops scsi_layout_ops = {
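The extent-count cap in the rewritten layoutget path is pure arithmetic: the server honors the client's loga_maxcount but bounds its own allocation at one page, subtracts the fixed layout4 header, and divides by the per-extent wire size. The sketch below restates that calculation under the assumption that an on-wire extent is a 16-byte deviceid4, three XDR hypers, and one state word (44 bytes); `demo_nr_extents_max` and both `DEMO_*` constants are illustrative, not kernel symbols.

```c
#include <stdint.h>

/* Header words: offset4 (2), length4 (2), layoutiomode4, layouttype4,
 * byte count, extent count -- eight XDR words, i.e. 32 bytes. */
#define DEMO_LAYOUT4_SIZE	(4 * 8)
/* Assumed extent wire size: deviceid4 (16) + 3 hypers (24) + es word (4). */
#define DEMO_EXTENT_SIZE	(16 + 8 * 3 + 4)

/*
 * How many extents fit in the client's loga_maxcount, capped at one
 * page of server-side buffer, as nfsd4_block_proc_layoutget now does.
 * Returns 0 when not even one extent fits (the NFS4ERR_TOOSMALL case).
 */
static uint32_t demo_nr_extents_max(uint32_t lg_maxcount, uint32_t page_size)
{
	uint32_t cap = lg_maxcount < page_size ? lg_maxcount : page_size;

	if (cap < DEMO_LAYOUT4_SIZE + DEMO_EXTENT_SIZE)
		return 0;
	return (cap - DEMO_LAYOUT4_SIZE) / DEMO_EXTENT_SIZE;
}
```

With a 4096-byte page cap this allows up to 92 extents per reply, which is why the loop above can stop early once the accumulated extents cover the requested length.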
+27 -9
fs/nfsd/blocklayoutxdr.c
···
 #define NFSDDBG_FACILITY	NFSDDBG_PNFS


+/**
+ * nfsd4_block_encode_layoutget - encode block/scsi layout extent array
+ * @xdr: stream for data encoding
+ * @lgp: layoutget content, actually an array of extents to encode
+ *
+ * Encode the opaque loc_body field in the layoutget response. Since the
+ * pnfs_block_layout4 and pnfs_scsi_layout4 structures on the wire are
+ * the same, this function is used by both layout drivers.
+ *
+ * Return values:
+ *   %nfs_ok: Success, all extents encoded into @xdr
+ *   %nfserr_toosmall: Not enough space in @xdr to encode all the data
+ */
 __be32
 nfsd4_block_encode_layoutget(struct xdr_stream *xdr,
 		const struct nfsd4_layoutget *lgp)
 {
-	const struct pnfs_block_extent *b = lgp->lg_content;
-	int len = sizeof(__be32) + 5 * sizeof(__be64) + sizeof(__be32);
+	const struct pnfs_block_layout *bl = lgp->lg_content;
+	u32 i, len = sizeof(__be32) + bl->nr_extents * PNFS_BLOCK_EXTENT_SIZE;
 	__be32 *p;

 	p = xdr_reserve_space(xdr, sizeof(__be32) + len);
···
 		return nfserr_toosmall;

 	*p++ = cpu_to_be32(len);
-	*p++ = cpu_to_be32(1);	/* we always return a single extent */
+	*p++ = cpu_to_be32(bl->nr_extents);

-	p = svcxdr_encode_deviceid4(p, &b->vol_id);
-	p = xdr_encode_hyper(p, b->foff);
-	p = xdr_encode_hyper(p, b->len);
-	p = xdr_encode_hyper(p, b->soff);
-	*p++ = cpu_to_be32(b->es);
-	return 0;
+	for (i = 0; i < bl->nr_extents; i++) {
+		const struct pnfs_block_extent *bex = bl->extents + i;
+
+		p = svcxdr_encode_deviceid4(p, &bex->vol_id);
+		p = xdr_encode_hyper(p, bex->foff);
+		p = xdr_encode_hyper(p, bex->len);
+		p = xdr_encode_hyper(p, bex->soff);
+		*p++ = cpu_to_be32(bex->es);
+	}
+
+	return nfs_ok;
 }

 static int
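The encoder above follows the standard XDR array shape: a count word followed by each element, with 64-bit fields split into two big-endian words. A compilable userspace sketch of that loop, reduced to the three hyper fields per extent (the real encoder also emits the deviceid4 and state word); `demo_extent`, `demo_encode_hyper`, and `demo_encode_extents` are illustrative names:

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htonl, ntohl */

/* A toy extent carrying only the three 64-bit fields; field names
 * follow the kernel's pnfs_block_extent. */
struct demo_extent { uint64_t foff, len, soff; };

/* Encode a 64-bit value as an XDR hyper (two big-endian words),
 * mirroring what xdr_encode_hyper() does. Returns the advanced cursor. */
static uint32_t *demo_encode_hyper(uint32_t *p, uint64_t v)
{
	*p++ = htonl((uint32_t)(v >> 32));
	*p++ = htonl((uint32_t)v);
	return p;
}

/* Encode the extent array as the rewritten encoder does: a count word
 * followed by each extent in order. Returns the words written. */
static size_t demo_encode_extents(uint32_t *p, const struct demo_extent *ex,
				  uint32_t nr)
{
	uint32_t *start = p;
	uint32_t i;

	*p++ = htonl(nr);
	for (i = 0; i < nr; i++) {
		p = demo_encode_hyper(p, ex[i].foff);
		p = demo_encode_hyper(p, ex[i].len);
		p = demo_encode_hyper(p, ex[i].soff);
	}
	return (size_t)(p - start);
}
```

Because the element size is fixed, the byte-count word can be computed up front from nr_extents alone, which is exactly how `len` is derived before any extent is written.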
+14
fs/nfsd/blocklayoutxdr.h
···
 struct iomap;
 struct xdr_stream;

+/* On the wire size of the layout4 struct with zero number of extents */
+#define PNFS_BLOCK_LAYOUT4_SIZE			\
+	(sizeof(__be32) * 2 +	/* offset4 */		\
+	 sizeof(__be32) * 2 +	/* length4 */		\
+	 sizeof(__be32) +	/* layoutiomode4 */	\
+	 sizeof(__be32) +	/* layouttype4 */	\
+	 sizeof(__be32) +	/* number of bytes */	\
+	 sizeof(__be32))	/* number of extents */
+
 struct pnfs_block_extent {
 	struct nfsd4_deviceid vol_id;
 	u64 foff;
···
 struct pnfs_block_range {
 	u64 foff;
 	u64 len;
+};
+
+struct pnfs_block_layout {
+	u32 nr_extents;
+	struct pnfs_block_extent extents[] __counted_by(nr_extents);
 };

 /*
+3
fs/nfsd/debugfs.c
···
  * Contents:
  *   %0: NFS READ will use buffered IO
  *   %1: NFS READ will use dontcache (buffered IO w/ dropbehind)
+ *   %2: NFS READ will use direct IO
  *
  * This setting takes immediate effect for all NFS versions,
  * all exports, and in all NFSD net namespaces.
···
 		nfsd_io_cache_read = NFSD_IO_BUFFERED;
 		break;
 	case NFSD_IO_DONTCACHE:
+	case NFSD_IO_DIRECT:
 		/*
 		 * Must disable splice_read when enabling
 		 * NFSD_IO_DONTCACHE.
···
 	switch (val) {
 	case NFSD_IO_BUFFERED:
 	case NFSD_IO_DONTCACHE:
+	case NFSD_IO_DIRECT:
 		nfsd_io_cache_write = val;
 		break;
 	default:
+57 -138
fs/nfsd/nfs4recover.c
···
  *
  */

-#include <crypto/hash.h>
+#include <crypto/md5.h>
 #include <crypto/sha2.h>
 #include <linux/file.h>
 #include <linux/slab.h>
···
 	put_cred(revert_creds(original));
 }

-static int
+static void
 nfs4_make_rec_clidname(char dname[HEXDIR_LEN], const struct xdr_netobj *clname)
 {
 	u8 digest[MD5_DIGEST_SIZE];
-	struct crypto_shash *tfm;
-	int status;

 	dprintk("NFSD: nfs4_make_rec_clidname for %.*s\n",
 			clname->len, clname->data);
-	tfm = crypto_alloc_shash("md5", 0, 0);
-	if (IS_ERR(tfm)) {
-		status = PTR_ERR(tfm);
-		goto out_no_tfm;
-	}

-	status = crypto_shash_tfm_digest(tfm, clname->data, clname->len,
-					 digest);
-	if (status)
-		goto out;
+	md5(clname->data, clname->len, digest);

 	static_assert(HEXDIR_LEN == 2 * MD5_DIGEST_SIZE + 1);
 	sprintf(dname, "%*phN", MD5_DIGEST_SIZE, digest);
-
-	status = 0;
-out:
-	crypto_free_shash(tfm);
-out_no_tfm:
-	return status;
-}
-
-/*
- * If we had an error generating the recdir name for the legacy tracker
- * then warn the admin. If the error doesn't appear to be transient,
- * then disable recovery tracking.
- */
-static void
-legacy_recdir_name_error(struct nfs4_client *clp, int error)
-{
-	printk(KERN_ERR "NFSD: unable to generate recoverydir "
-			"name (%d).\n", error);
-
-	/*
-	 * if the algorithm just doesn't exist, then disable the recovery
-	 * tracker altogether. The crypto libs will generally return this if
-	 * FIPS is enabled as well.
-	 */
-	if (error == -ENOENT) {
-		printk(KERN_ERR "NFSD: disabling legacy clientid tracking. "
-			"Reboot recovery will not function correctly!\n");
-		nfsd4_client_tracking_exit(clp->net);
-	}
 }

 static void
 __nfsd4_create_reclaim_record_grace(struct nfs4_client *clp,
-		const char *dname, int len, struct nfsd_net *nn)
+		char *dname, struct nfsd_net *nn)
 {
-	struct xdr_netobj name;
+	struct xdr_netobj name = { .len = strlen(dname), .data = dname };
 	struct xdr_netobj princhash = { .len = 0, .data = NULL };
 	struct nfs4_client_reclaim *crp;

-	name.data = kmemdup(dname, len, GFP_KERNEL);
-	if (!name.data) {
-		dprintk("%s: failed to allocate memory for name.data!\n",
-				__func__);
-		return;
-	}
-	name.len = len;
 	crp = nfs4_client_to_reclaim(name, princhash, nn);
-	if (!crp) {
-		kfree(name.data);
-		return;
-	}
 	crp->cr_clp = clp;
 }
···
 	if (!nn->rec_file)
 		return;

-	status = nfs4_make_rec_clidname(dname, &clp->cl_name);
-	if (status)
-		return legacy_recdir_name_error(clp, status);
+	nfs4_make_rec_clidname(dname, &clp->cl_name);

 	status = nfs4_save_creds(&original_cred);
 	if (status < 0)
···
 out:
 	if (status == 0) {
 		if (nn->in_grace)
-			__nfsd4_create_reclaim_record_grace(clp, dname,
-					HEXDIR_LEN, nn);
+			__nfsd4_create_reclaim_record_grace(clp, dname, nn);
 		vfs_fsync(nn->rec_file, 0);
 	} else {
 		printk(KERN_ERR "NFSD: failed to write recovery record"
···
 	nfs4_reset_creds(original_cred);
 }

-typedef int (recdir_func)(struct dentry *, struct dentry *, struct nfsd_net *);
+typedef int (recdir_func)(struct dentry *, char *, struct nfsd_net *);

 struct name_list {
 	char name[HEXDIR_LEN];
···
 	}

 	status = iterate_dir(nn->rec_file, &ctx.ctx);
-	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);

 	list_for_each_entry_safe(entry, tmp, &ctx.names, list) {
-		if (!status) {
-			struct dentry *dentry;
-			dentry = lookup_one(&nop_mnt_idmap,
-					    &QSTR(entry->name), dir);
-			if (IS_ERR(dentry)) {
-				status = PTR_ERR(dentry);
-				break;
-			}
-			status = f(dir, dentry, nn);
-			dput(dentry);
-		}
+		if (!status)
+			status = f(dir, entry->name, nn);
+
 		list_del(&entry->list);
 		kfree(entry);
 	}
-	inode_unlock(d_inode(dir));
 	nfs4_reset_creds(original_cred);

 	list_for_each_entry_safe(entry, tmp, &ctx.names, list) {
···
 	if (!nn->rec_file || !test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
 		return;

-	status = nfs4_make_rec_clidname(dname, &clp->cl_name);
-	if (status)
-		return legacy_recdir_name_error(clp, status);
+	nfs4_make_rec_clidname(dname, &clp->cl_name);

 	status = mnt_want_write_file(nn->rec_file);
 	if (status)
···
 }

 static int
-purge_old(struct dentry *parent, struct dentry *child, struct nfsd_net *nn)
+purge_old(struct dentry *parent, char *cname, struct nfsd_net *nn)
 {
 	int status;
+	struct dentry *child;
 	struct xdr_netobj name;

-	if (child->d_name.len != HEXDIR_LEN - 1) {
-		printk("%s: illegal name %pd in recovery directory\n",
-				__func__, child);
+	if (strlen(cname) != HEXDIR_LEN - 1) {
+		printk("%s: illegal name %s in recovery directory\n",
+				__func__, cname);
 		/* Keep trying; maybe the others are OK: */
 		return 0;
 	}
-	name.data = kmemdup_nul(child->d_name.name, child->d_name.len, GFP_KERNEL);
+	name.data = kstrdup(cname, GFP_KERNEL);
 	if (!name.data) {
 		dprintk("%s: failed to allocate memory for name.data!\n",
 				__func__);
···
 	if (nfs4_has_reclaimed_state(name, nn))
 		goto out_free;

-	status = vfs_rmdir(&nop_mnt_idmap, d_inode(parent), child, NULL);
-	if (status)
-		printk("failed to remove client recovery directory %pd\n",
-				child);
+	inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
+	child = lookup_one(&nop_mnt_idmap, &QSTR(cname), parent);
+	if (!IS_ERR(child)) {
+		status = vfs_rmdir(&nop_mnt_idmap, d_inode(parent), child, NULL);
+		if (status)
+			printk("failed to remove client recovery directory %pd\n",
+					child);
+		dput(child);
+	}
+	inode_unlock(d_inode(parent));
+
 out_free:
 	kfree(name.data);
 out:
···
 }

 static int
-load_recdir(struct dentry *parent, struct dentry *child, struct nfsd_net *nn)
+load_recdir(struct dentry *parent, char *cname, struct nfsd_net *nn)
 {
-	struct xdr_netobj name;
+	struct xdr_netobj name = { .len = HEXDIR_LEN, .data = cname };
 	struct xdr_netobj princhash = { .len = 0, .data = NULL };

-	if (child->d_name.len != HEXDIR_LEN - 1) {
-		printk("%s: illegal name %pd in recovery directory\n",
-				__func__, child);
+	if (strlen(cname) != HEXDIR_LEN - 1) {
+		printk("%s: illegal name %s in recovery directory\n",
+				__func__, cname);
 		/* Keep trying; maybe the others are OK: */
 		return 0;
 	}
-	name.data = kmemdup_nul(child->d_name.name, child->d_name.len, GFP_KERNEL);
-	if (!name.data) {
-		dprintk("%s: failed to allocate memory for name.data!\n",
-				__func__);
-		goto out;
-	}
-	name.len = HEXDIR_LEN;
-	if (!nfs4_client_to_reclaim(name, princhash, nn))
-		kfree(name.data);
-out:
+	nfs4_client_to_reclaim(name, princhash, nn);
 	return 0;
 }
···
 static int
 nfsd4_check_legacy_client(struct nfs4_client *clp)
 {
-	int status;
 	char dname[HEXDIR_LEN];
 	struct nfs4_client_reclaim *crp;
 	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
···
 	if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
 		return 0;

-	status = nfs4_make_rec_clidname(dname, &clp->cl_name);
-	if (status) {
-		legacy_recdir_name_error(clp, status);
-		return status;
-	}
+	nfs4_make_rec_clidname(dname, &clp->cl_name);

 	/* look for it in the reclaim hashtable otherwise */
 	name.data = kmemdup(dname, HEXDIR_LEN, GFP_KERNEL);
···
 {
 	uint8_t cmd, princhashlen;
 	struct xdr_netobj name, princhash = { .len = 0, .data = NULL };
+	char *namecopy __free(kfree) = NULL;
+	char *princhashcopy __free(kfree) = NULL;
 	uint16_t namelen;

 	if (get_user(cmd, &cmsg->cm_cmd)) {
···
 		dprintk("%s: invalid namelen (%u)", __func__, namelen);
 		return -EINVAL;
 	}
-	name.data = memdup_user(&ci->cc_name.cn_id, namelen);
-	if (IS_ERR(name.data))
-		return PTR_ERR(name.data);
+	namecopy = memdup_user(&ci->cc_name.cn_id, namelen);
+	if (IS_ERR(namecopy))
+		return PTR_ERR(namecopy);
+	name.data = namecopy;
 	name.len = namelen;
 	get_user(princhashlen, &ci->cc_princhash.cp_len);
 	if (princhashlen > 0) {
-		princhash.data = memdup_user(
-				&ci->cc_princhash.cp_data,
-				princhashlen);
-		if (IS_ERR(princhash.data)) {
-			kfree(name.data);
-			return PTR_ERR(princhash.data);
-		}
+		princhashcopy = memdup_user(
+				&ci->cc_princhash.cp_data,
+				princhashlen);
+		if (IS_ERR(princhashcopy))
+			return PTR_ERR(princhashcopy);
+		princhash.data = princhashcopy;
 		princhash.len = princhashlen;
 	} else
 		princhash.len = 0;
···
 		dprintk("%s: invalid namelen (%u)", __func__, namelen);
 		return -EINVAL;
 	}
-	name.data = memdup_user(&cnm->cn_id, namelen);
-	if (IS_ERR(name.data))
-		return PTR_ERR(name.data);
+	namecopy = memdup_user(&cnm->cn_id, namelen);
+	if (IS_ERR(namecopy))
+		return PTR_ERR(namecopy);
+	name.data = namecopy;
 	name.len = namelen;
 }
 #ifdef CONFIG_NFSD_LEGACY_CLIENT_TRACKING
···
 	struct cld_net *cn = nn->cld_net;

 	name.len = name.len - 5;
-	memmove(name.data, name.data + 5, name.len);
+	name.data = name.data + 5;
 	cn->cn_has_legacy = true;
 }
 #endif
-	if (!nfs4_client_to_reclaim(name, princhash, nn)) {
-		kfree(name.data);
-		kfree(princhash.data);
+	if (!nfs4_client_to_reclaim(name, princhash, nn))
 		return -EFAULT;
-	}
 	return nn->client_tracking_ops->msglen;
 }
 return -EFAULT;
···
 #ifdef CONFIG_NFSD_LEGACY_CLIENT_TRACKING
 	if (nn->cld_net->cn_has_legacy) {
-		int status;
 		char dname[HEXDIR_LEN];
 		struct xdr_netobj name;

-		status = nfs4_make_rec_clidname(dname, &clp->cl_name);
-		if (status)
-			return -ENOENT;
+		nfs4_make_rec_clidname(dname, &clp->cl_name);

 		name.data = kmemdup(dname, HEXDIR_LEN, GFP_KERNEL);
 		if (!name.data) {
···
 	if (cn->cn_has_legacy) {
 		struct xdr_netobj name;
 		char dname[HEXDIR_LEN];
-		int status;

-		status = nfs4_make_rec_clidname(dname, &clp->cl_name);
-		if (status)
-			return -ENOENT;
+		nfs4_make_rec_clidname(dname, &clp->cl_name);

 		name.data = kmemdup(dname, HEXDIR_LEN, GFP_KERNEL);
 		if (!name.data) {
···
 		return NULL;
 	}

-	copied = nfs4_make_rec_clidname(result + copied, name);
-	if (copied) {
-		kfree(result);
-		return NULL;
-	}
+	nfs4_make_rec_clidname(result + copied, name);

 	return result;
 }
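The recovery-directory name is simply the lowercase hex of the MD5 digest of the client identifier, which is why the switch to the library `md5()` helper lets `nfs4_make_rec_clidname()` become void: the digest-then-hex step can no longer fail. The kernel formats the digest in one `sprintf` with the `%*phN` extension; userspace needs a loop. A sketch, with the digest passed in rather than computed (`demo_make_rec_clidname` is an illustrative name, and the stdlib has no MD5):

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MD5_DIGEST_SIZE	16
/* 2 hex chars per digest byte, plus the terminating NUL. */
#define DEMO_HEXDIR_LEN		(2 * DEMO_MD5_DIGEST_SIZE + 1)

/*
 * Hex-encode a 16-byte digest into a DEMO_HEXDIR_LEN buffer, the way
 * the kernel's sprintf(dname, "%*phN", ...) call does in one shot.
 */
static void demo_make_rec_clidname(char dname[DEMO_HEXDIR_LEN],
				   const unsigned char digest[DEMO_MD5_DIGEST_SIZE])
{
	int i;

	for (i = 0; i < DEMO_MD5_DIGEST_SIZE; i++)
		sprintf(dname + 2 * i, "%02x", digest[i]);
}
```

The fixed 32-character result is what the `static_assert(HEXDIR_LEN == 2 * MD5_DIGEST_SIZE + 1)` in the patched function pins down.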
+37 -48
fs/nfsd/nfs4state.c
···
 	free_svc_cred(&slot->sl_cred);
 	copy_cred(&slot->sl_cred, &resp->rqstp->rq_cred);

-	if (!nfsd4_cache_this(resp)) {
+	if (!(resp->cstate.slot->sl_flags & NFSD4_SLOT_CACHETHIS)) {
 		slot->sl_flags &= ~NFSD4_SLOT_CACHED;
 		return;
 	}
···
 }

 /*
- * Encode the replay sequence operation from the slot values.
- * If cachethis is FALSE encode the uncached rep error on the next
- * operation which sets resp->p and increments resp->opcnt for
- * nfs4svc_encode_compoundres.
- *
- */
-static __be32
-nfsd4_enc_sequence_replay(struct nfsd4_compoundargs *args,
-			  struct nfsd4_compoundres *resp)
-{
-	struct nfsd4_op *op;
-	struct nfsd4_slot *slot = resp->cstate.slot;
-
-	/* Encode the replayed sequence operation */
-	op = &args->ops[resp->opcnt - 1];
-	nfsd4_encode_operation(resp, op);
-
-	if (slot->sl_flags & NFSD4_SLOT_CACHED)
-		return op->status;
-	if (args->opcnt == 1) {
-		/*
-		 * The original operation wasn't a solo sequence--we
-		 * always cache those--so this retry must not match the
-		 * original:
-		 */
-		op->status = nfserr_seq_false_retry;
-	} else {
-		op = &args->ops[resp->opcnt++];
-		op->status = nfserr_retry_uncached_rep;
-		nfsd4_encode_operation(resp, op);
-	}
-	return op->status;
-}
-
-/*
  * The sequence operation is not cached because we can use the slot and
  * session values.
  */
···
 nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
 			 struct nfsd4_sequence *seq)
 {
+	struct nfsd4_compoundargs *args = resp->rqstp->rq_argp;
 	struct nfsd4_slot *slot = resp->cstate.slot;
 	struct xdr_stream *xdr = resp->xdr;
 	__be32 *p;
-	__be32 status;

 	dprintk("--> %s slot %p\n", __func__, slot);

-	status = nfsd4_enc_sequence_replay(resp->rqstp->rq_argp, resp);
-	if (status)
-		return status;
+	/* Always encode the SEQUENCE response. */
+	nfsd4_encode_operation(resp, &args->ops[0]);
+	if (args->opcnt == 1)
+		/* A solo SEQUENCE - nothing was cached */
+		return args->ops[0].status;

+	if (!(slot->sl_flags & NFSD4_SLOT_CACHED)) {
+		/* We weren't asked to cache this. */
+		struct nfsd4_op *op;
+
+		op = &args->ops[resp->opcnt++];
+		op->status = nfserr_retry_uncached_rep;
+		nfsd4_encode_operation(resp, op);
+		return op->status;
+	}
+
+	/* return reply from cache */
 	p = xdr_reserve_space(xdr, slot->sl_datalen);
 	if (!p) {
 		WARN_ON_ONCE(1);
···
 	return;
 out_no_deleg:
 	open->op_delegate_type = OPEN_DELEGATE_NONE;
-	if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS &&
-	    open->op_delegate_type != OPEN_DELEGATE_NONE) {
-		dprintk("NFSD: WARNING: refusing delegation reclaim\n");
-		open->op_recall = true;
-	}

 	/* 4.1 client asking for a delegation? */
 	if (open->op_deleg_want)
···

 /*
  * failure => all reset bets are off, nfserr_no_grace...
- *
- * The caller is responsible for freeing name.data if NULL is returned (it
- * will be freed in nfs4_remove_reclaim_record in the normal case).
  */
 struct nfs4_client_reclaim *
 nfs4_client_to_reclaim(struct xdr_netobj name, struct xdr_netobj princhash,
···
 	unsigned int strhashval;
 	struct nfs4_client_reclaim *crp;

+	name.data = kmemdup(name.data, name.len, GFP_KERNEL);
+	if (!name.data) {
+		dprintk("%s: failed to allocate memory for name.data!\n",
+				__func__);
+		return NULL;
+	}
+	if (princhash.len) {
+		princhash.data = kmemdup(princhash.data, princhash.len, GFP_KERNEL);
+		if (!princhash.data) {
+			dprintk("%s: failed to allocate memory for princhash.data!\n",
+					__func__);
+			kfree(name.data);
+			return NULL;
+		}
+	} else
+		princhash.data = NULL;
 	crp = alloc_reclaim();
 	if (crp) {
 		strhashval = clientstr_hashval(name);
···
 		crp->cr_princhash.len = princhash.len;
 		crp->cr_clp = NULL;
 		nn->reclaim_str_hashtbl_size++;
+	} else {
+		kfree(name.data);
+		kfree(princhash.data);
 	}
 	return crp;
 }
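The ownership change in `nfs4_client_to_reclaim()` above is worth calling out: the function now duplicates the name (and principal hash) internally, so callers can pass stack or borrowed buffers and never worry about who frees them on either the success or the failure path. A toy userspace version of that dup-on-insert pattern; `demo_reclaim` and `demo_client_to_reclaim` are illustrative names, not the kernel's:

```c
#include <stdlib.h>
#include <string.h>

/* A toy record holding its own copy of the client name. */
struct demo_reclaim {
	char *name;
	size_t len;
};

/*
 * Duplicate the caller's buffer before inserting, as the patched
 * nfs4_client_to_reclaim() does. On any failure the copy is released
 * here, so the caller's buffer is never owned by this function.
 */
static struct demo_reclaim *demo_client_to_reclaim(const char *name, size_t len)
{
	struct demo_reclaim *crp;
	char *copy = malloc(len);

	if (!copy)
		return NULL;
	memcpy(copy, name, len);

	crp = malloc(sizeof(*crp));
	if (!crp) {
		free(copy);	/* failure path frees the internal copy */
		return NULL;
	}
	crp->name = copy;
	crp->len = len;
	return crp;
}
```

This is what lets the nfs4recover.c callers above drop their kmemdup()/kfree() bookkeeping and, in the cld path, manage the user-space copies with `__free(kfree)` cleanup attributes instead.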
+20 -8
fs/nfsd/nfs4xdr.c
··· 4472 4472 4473 4473 static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, 4474 4474 struct nfsd4_read *read, 4475 - struct file *file, unsigned long maxcount) 4475 + unsigned long maxcount) 4476 4476 { 4477 4477 struct xdr_stream *xdr = resp->xdr; 4478 4478 unsigned int base = xdr->buf->page_len & ~PAGE_MASK; ··· 4480 4480 __be32 zero = xdr_zero; 4481 4481 __be32 nfserr; 4482 4482 4483 - if (xdr_reserve_space_vec(xdr, maxcount) < 0) 4484 - return nfserr_resource; 4485 - 4486 - nfserr = nfsd_iter_read(resp->rqstp, read->rd_fhp, file, 4483 + nfserr = nfsd_iter_read(resp->rqstp, read->rd_fhp, read->rd_nf, 4487 4484 read->rd_offset, &maxcount, base, 4488 4485 &read->rd_eof); 4489 4486 read->rd_length = maxcount; 4490 4487 if (nfserr) 4491 4488 return nfserr; 4489 + 4490 + /* 4491 + * svcxdr_encode_opaque_pages() is not used here because 4492 + * we don't want to encode subsequent results in this 4493 + * COMPOUND into the xdr->buf's tail, but rather those 4494 + * results should follow the NFS READ payload in the 4495 + * buf's pages. 4496 + */ 4497 + if (xdr_reserve_space_vec(xdr, maxcount) < 0) 4498 + return nfserr_resource; 4499 + 4500 + /* 4501 + * Mark the buffer location of the NFS READ payload so that 4502 + * direct placement-capable transports send only the 4503 + * payload bytes out-of-band. 
4504 + */ 4492 4505 if (svc_encode_result_payload(resp->rqstp, starting_len, maxcount)) 4493 4506 return nfserr_io; 4494 - xdr_truncate_encode(xdr, starting_len + xdr_align_size(maxcount)); 4495 4507 4496 4508 write_bytes_to_xdr_buf(xdr->buf, starting_len + maxcount, &zero, 4497 4509 xdr_pad_size(maxcount)); ··· 4542 4530 if (file->f_op->splice_read && splice_ok) 4543 4531 nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); 4544 4532 else 4545 - nfserr = nfsd4_encode_readv(resp, read, file, maxcount); 4533 + nfserr = nfsd4_encode_readv(resp, read, maxcount); 4546 4534 if (nfserr) { 4547 4535 xdr_truncate_encode(xdr, eof_offset); 4548 4536 return nfserr; ··· 5438 5426 if (file->f_op->splice_read && splice_ok) 5439 5427 nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); 5440 5428 else 5441 - nfserr = nfsd4_encode_readv(resp, read, file, maxcount); 5429 + nfserr = nfsd4_encode_readv(resp, read, maxcount); 5442 5430 if (nfserr) 5443 5431 return nfserr; 5444 5432
+2 -2
fs/nfsd/nfsd.h
··· 160 160 /* Any new NFSD_IO enum value must be added at the end */ 161 161 NFSD_IO_BUFFERED, 162 162 NFSD_IO_DONTCACHE, 163 + NFSD_IO_DIRECT, 163 164 }; 164 165 165 166 extern u64 nfsd_io_cache_read __read_mostly; ··· 398 397 #define NFSD_CB_GETATTR_TIMEOUT NFSD_DELEGRETURN_TIMEOUT 399 398 400 399 /* 401 - * The following attributes are currently not supported by the NFSv4 server: 400 + * The following attributes are not implemented by NFSD: 402 401 * ARCHIVE (deprecated anyway) 403 402 * HIDDEN (unlikely to be supported any time soon) 404 403 * MIMETYPE (unlikely to be supported any time soon) 405 404 * QUOTA_* (will be supported in a forthcoming patch) 406 405 * SYSTEM (unlikely to be supported any time soon) 407 406 * TIME_BACKUP (unlikely to be supported any time soon) 408 - * TIME_CREATE (unlikely to be supported any time soon) 409 407 */ 410 408 #define NFSD4_SUPPORTED_ATTRS_WORD0 \ 411 409 (FATTR4_WORD0_SUPPORTED_ATTRS | FATTR4_WORD0_TYPE | FATTR4_WORD0_FH_EXPIRE_TYPE \
+5 -23
fs/nfsd/nfssvc.c
··· 249 249 return rv; 250 250 } 251 251 252 - static int nfsd_init_socks(struct net *net, const struct cred *cred) 253 - { 254 - int error; 255 - struct nfsd_net *nn = net_generic(net, nfsd_net_id); 256 - 257 - if (!list_empty(&nn->nfsd_serv->sv_permsocks)) 258 - return 0; 259 - 260 - error = svc_xprt_create(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT, 261 - SVC_SOCK_DEFAULTS, cred); 262 - if (error < 0) 263 - return error; 264 - 265 - error = svc_xprt_create(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT, 266 - SVC_SOCK_DEFAULTS, cred); 267 - if (error < 0) 268 - return error; 269 - 270 - return 0; 271 - } 272 - 273 252 static int nfsd_users = 0; 274 253 275 254 static int nfsd_startup_generic(void) ··· 356 377 ret = nfsd_startup_generic(); 357 378 if (ret) 358 379 return ret; 359 - ret = nfsd_init_socks(net, cred); 360 - if (ret) 380 + 381 + if (list_empty(&nn->nfsd_serv->sv_permsocks)) { 382 + pr_warn("NFSD: Failed to start, no listeners configured.\n"); 383 + ret = -EIO; 361 384 goto out_socks; 385 + } 362 386 363 387 if (nfsd_needs_lockd(nn) && !nn->lockd_up) { 364 388 ret = lockd_up(net, cred);
+41
fs/nfsd/trace.h
··· 464 464 DEFINE_NFSD_IO_EVENT(read_start); 465 465 DEFINE_NFSD_IO_EVENT(read_splice); 466 466 DEFINE_NFSD_IO_EVENT(read_vector); 467 + DEFINE_NFSD_IO_EVENT(read_direct); 467 468 DEFINE_NFSD_IO_EVENT(read_io_done); 468 469 DEFINE_NFSD_IO_EVENT(read_done); 469 470 DEFINE_NFSD_IO_EVENT(write_start); 470 471 DEFINE_NFSD_IO_EVENT(write_opened); 472 + DEFINE_NFSD_IO_EVENT(write_direct); 473 + DEFINE_NFSD_IO_EVENT(write_vector); 471 474 DEFINE_NFSD_IO_EVENT(write_io_done); 472 475 DEFINE_NFSD_IO_EVENT(write_done); 473 476 DEFINE_NFSD_IO_EVENT(commit_start); ··· 2616 2613 DEFINE_NFSD_VFS_GETATTR_EVENT(nfsd_vfs_getattr); 2617 2614 DEFINE_NFSD_VFS_GETATTR_EVENT(nfsd_vfs_statfs); 2618 2615 2616 + DECLARE_EVENT_CLASS(nfsd_pnfs_class, 2617 + TP_PROTO( 2618 + const struct nfs4_client *clp, 2619 + const char *dev, 2620 + int error 2621 + ), 2622 + TP_ARGS(clp, dev, error), 2623 + TP_STRUCT__entry( 2624 + __sockaddr(addr, sizeof(struct sockaddr_in6)) 2625 + __field(unsigned int, netns_ino) 2626 + __string(dev, dev) 2627 + __field(int, error) 2628 + ), 2629 + TP_fast_assign( 2630 + __assign_sockaddr(addr, &clp->cl_addr, 2631 + sizeof(struct sockaddr_in6)); 2632 + __entry->netns_ino = clp->net->ns.inum; 2633 + __assign_str(dev); 2634 + __entry->error = error; 2635 + ), 2636 + TP_printk("client=%pISpc nn=%d dev=%s error=%d", 2637 + __get_sockaddr(addr), 2638 + __entry->netns_ino, 2639 + __get_str(dev), 2640 + __entry->error 2641 + ) 2642 + ); 2643 + 2644 + #define DEFINE_NFSD_PNFS_ERR_EVENT(name) \ 2645 + DEFINE_EVENT(nfsd_pnfs_class, nfsd_pnfs_##name, \ 2646 + TP_PROTO( \ 2647 + const struct nfs4_client *clp, \ 2648 + const char *dev, \ 2649 + int error \ 2650 + ), \ 2651 + TP_ARGS(clp, dev, error)) 2652 + 2653 + DEFINE_NFSD_PNFS_ERR_EVENT(fence); 2619 2654 #endif /* _NFSD_TRACE_H */ 2620 2655 2621 2656 #undef TRACE_INCLUDE_PATH
+247 -14
fs/nfsd/vfs.c
··· 1075 1075 return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err); 1076 1076 } 1077 1077 1078 + /* 1079 + * The byte range of the client's READ request is expanded on both ends 1080 + * until it meets the underlying file system's direct I/O alignment 1081 + * requirements. After the internal read is complete, the byte range of 1082 + * the NFS READ payload is reduced to the byte range that was originally 1083 + * requested. 1084 + * 1085 + * Note that a direct read can be done only when the xdr_buf containing 1086 + * the NFS READ reply does not already have contents in its .pages array. 1087 + * This is due to potentially restrictive alignment requirements on the 1088 + * read buffer. When .page_len and @base are zero, the .pages array is 1089 + * guaranteed to be page-aligned. 1090 + */ 1091 + static noinline_for_stack __be32 1092 + nfsd_direct_read(struct svc_rqst *rqstp, struct svc_fh *fhp, 1093 + struct nfsd_file *nf, loff_t offset, unsigned long *count, 1094 + u32 *eof) 1095 + { 1096 + u64 dio_start, dio_end; 1097 + unsigned long v, total; 1098 + struct iov_iter iter; 1099 + struct kiocb kiocb; 1100 + ssize_t host_err; 1101 + size_t len; 1102 + 1103 + init_sync_kiocb(&kiocb, nf->nf_file); 1104 + kiocb.ki_flags |= IOCB_DIRECT; 1105 + 1106 + /* Read a properly-aligned region of bytes into rq_bvec */ 1107 + dio_start = round_down(offset, nf->nf_dio_read_offset_align); 1108 + dio_end = round_up((u64)offset + *count, nf->nf_dio_read_offset_align); 1109 + 1110 + kiocb.ki_pos = dio_start; 1111 + 1112 + v = 0; 1113 + total = dio_end - dio_start; 1114 + while (total && v < rqstp->rq_maxpages && 1115 + rqstp->rq_next_page < rqstp->rq_page_end) { 1116 + len = min_t(size_t, total, PAGE_SIZE); 1117 + bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page, 1118 + len, 0); 1119 + 1120 + total -= len; 1121 + ++rqstp->rq_next_page; 1122 + ++v; 1123 + } 1124 + 1125 + trace_nfsd_read_direct(rqstp, fhp, offset, *count - total); 1126 + iov_iter_bvec(&iter, 
ITER_DEST, rqstp->rq_bvec, v, 1127 + dio_end - dio_start - total); 1128 + 1129 + host_err = vfs_iocb_iter_read(nf->nf_file, &kiocb, &iter); 1130 + if (host_err >= 0) { 1131 + unsigned int pad = offset - dio_start; 1132 + 1133 + /* The returned payload starts after the pad */ 1134 + rqstp->rq_res.page_base = pad; 1135 + 1136 + /* Compute the count of bytes to be returned */ 1137 + if (host_err > pad + *count) 1138 + host_err = *count; 1139 + else if (host_err > pad) 1140 + host_err -= pad; 1141 + else 1142 + host_err = 0; 1143 + } else if (unlikely(host_err == -EINVAL)) { 1144 + struct inode *inode = d_inode(fhp->fh_dentry); 1145 + 1146 + pr_info_ratelimited("nfsd: Direct I/O alignment failure on %s/%ld\n", 1147 + inode->i_sb->s_id, inode->i_ino); 1148 + host_err = -ESERVERFAULT; 1149 + } 1150 + 1151 + return nfsd_finish_read(rqstp, fhp, nf->nf_file, offset, count, 1152 + eof, host_err); 1153 + } 1154 + 1078 1155 /** 1079 1156 * nfsd_iter_read - Perform a VFS read using an iterator 1080 1157 * @rqstp: RPC transaction context 1081 1158 * @fhp: file handle of file to be read 1082 - * @file: opened struct file of file to be read 1159 + * @nf: opened struct nfsd_file of file to be read 1083 1160 * @offset: starting byte offset 1084 1161 * @count: IN: requested number of bytes; OUT: number of bytes read 1085 1162 * @base: offset in first page of read buffer ··· 1169 1092 * returned. 
1170 1093 */ 1171 1094 __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp, 1172 - struct file *file, loff_t offset, unsigned long *count, 1095 + struct nfsd_file *nf, loff_t offset, unsigned long *count, 1173 1096 unsigned int base, u32 *eof) 1174 1097 { 1098 + struct file *file = nf->nf_file; 1175 1099 unsigned long v, total; 1176 1100 struct iov_iter iter; 1177 1101 struct kiocb kiocb; ··· 1184 1106 switch (nfsd_io_cache_read) { 1185 1107 case NFSD_IO_BUFFERED: 1186 1108 break; 1109 + case NFSD_IO_DIRECT: 1110 + /* When dio_read_offset_align is zero, dio is not supported */ 1111 + if (nf->nf_dio_read_offset_align && !rqstp->rq_res.page_len) 1112 + return nfsd_direct_read(rqstp, fhp, nf, offset, 1113 + count, eof); 1114 + fallthrough; 1187 1115 case NFSD_IO_DONTCACHE: 1188 1116 if (file->f_op->fop_flags & FOP_DONTCACHE) 1189 1117 kiocb.ki_flags = IOCB_DONTCACHE; ··· 1200 1116 1201 1117 v = 0; 1202 1118 total = *count; 1203 - while (total) { 1119 + while (total && v < rqstp->rq_maxpages && 1120 + rqstp->rq_next_page < rqstp->rq_page_end) { 1204 1121 len = min_t(size_t, total, PAGE_SIZE - base); 1205 - bvec_set_page(&rqstp->rq_bvec[v], *(rqstp->rq_next_page++), 1122 + bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page, 1206 1123 len, base); 1124 + 1207 1125 total -= len; 1126 + ++rqstp->rq_next_page; 1208 1127 ++v; 1209 1128 base = 0; 1210 1129 } 1211 - WARN_ON_ONCE(v > rqstp->rq_maxpages); 1212 1130 1213 - trace_nfsd_read_vector(rqstp, fhp, offset, *count); 1214 - iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, *count); 1131 + trace_nfsd_read_vector(rqstp, fhp, offset, *count - total); 1132 + iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, *count - total); 1215 1133 host_err = vfs_iocb_iter_read(file, &kiocb, &iter); 1216 1134 return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err); 1217 1135 } ··· 1253 1167 last_ino = inode->i_ino; 1254 1168 last_dev = inode->i_sb->s_dev; 1255 1169 return err; 1170 + } 1171 + 1172 + struct 
nfsd_write_dio_seg { 1173 + struct iov_iter iter; 1174 + int flags; 1175 + }; 1176 + 1177 + static unsigned long 1178 + iov_iter_bvec_offset(const struct iov_iter *iter) 1179 + { 1180 + return (unsigned long)(iter->bvec->bv_offset + iter->iov_offset); 1181 + } 1182 + 1183 + static void 1184 + nfsd_write_dio_seg_init(struct nfsd_write_dio_seg *segment, 1185 + struct bio_vec *bvec, unsigned int nvecs, 1186 + unsigned long total, size_t start, size_t len, 1187 + struct kiocb *iocb) 1188 + { 1189 + iov_iter_bvec(&segment->iter, ITER_SOURCE, bvec, nvecs, total); 1190 + if (start) 1191 + iov_iter_advance(&segment->iter, start); 1192 + iov_iter_truncate(&segment->iter, len); 1193 + segment->flags = iocb->ki_flags; 1194 + } 1195 + 1196 + static unsigned int 1197 + nfsd_write_dio_iters_init(struct nfsd_file *nf, struct bio_vec *bvec, 1198 + unsigned int nvecs, struct kiocb *iocb, 1199 + unsigned long total, 1200 + struct nfsd_write_dio_seg segments[3]) 1201 + { 1202 + u32 offset_align = nf->nf_dio_offset_align; 1203 + loff_t prefix_end, orig_end, middle_end; 1204 + u32 mem_align = nf->nf_dio_mem_align; 1205 + size_t prefix, middle, suffix; 1206 + loff_t offset = iocb->ki_pos; 1207 + unsigned int nsegs = 0; 1208 + 1209 + /* 1210 + * Check if direct I/O is feasible for this write request. 1211 + * If alignments are not available, the write is too small, 1212 + * or no alignment can be found, fall back to buffered I/O. 
1213 + */ 1214 + if (unlikely(!mem_align || !offset_align) || 1215 + unlikely(total < max(offset_align, mem_align))) 1216 + goto no_dio; 1217 + 1218 + prefix_end = round_up(offset, offset_align); 1219 + orig_end = offset + total; 1220 + middle_end = round_down(orig_end, offset_align); 1221 + 1222 + prefix = prefix_end - offset; 1223 + middle = middle_end - prefix_end; 1224 + suffix = orig_end - middle_end; 1225 + 1226 + if (!middle) 1227 + goto no_dio; 1228 + 1229 + if (prefix) 1230 + nfsd_write_dio_seg_init(&segments[nsegs++], bvec, 1231 + nvecs, total, 0, prefix, iocb); 1232 + 1233 + nfsd_write_dio_seg_init(&segments[nsegs], bvec, nvecs, 1234 + total, prefix, middle, iocb); 1235 + 1236 + /* 1237 + * Check if the bvec iterator is aligned for direct I/O. 1238 + * 1239 + * bvecs generated from RPC receive buffers are contiguous: After 1240 + * the first bvec, all subsequent bvecs start at bv_offset zero 1241 + * (page-aligned). Therefore, only the first bvec is checked. 1242 + */ 1243 + if (iov_iter_bvec_offset(&segments[nsegs].iter) & (mem_align - 1)) 1244 + goto no_dio; 1245 + segments[nsegs].flags |= IOCB_DIRECT; 1246 + nsegs++; 1247 + 1248 + if (suffix) 1249 + nfsd_write_dio_seg_init(&segments[nsegs++], bvec, nvecs, total, 1250 + prefix + middle, suffix, iocb); 1251 + 1252 + return nsegs; 1253 + 1254 + no_dio: 1255 + /* No DIO alignment possible - pack into single non-DIO segment. 
*/ 1256 + nfsd_write_dio_seg_init(&segments[0], bvec, nvecs, total, 0, 1257 + total, iocb); 1258 + return 1; 1259 + } 1260 + 1261 + static noinline_for_stack int 1262 + nfsd_direct_write(struct svc_rqst *rqstp, struct svc_fh *fhp, 1263 + struct nfsd_file *nf, unsigned int nvecs, 1264 + unsigned long *cnt, struct kiocb *kiocb) 1265 + { 1266 + struct nfsd_write_dio_seg segments[3]; 1267 + struct file *file = nf->nf_file; 1268 + unsigned int nsegs, i; 1269 + ssize_t host_err; 1270 + 1271 + nsegs = nfsd_write_dio_iters_init(nf, rqstp->rq_bvec, nvecs, 1272 + kiocb, *cnt, segments); 1273 + 1274 + *cnt = 0; 1275 + for (i = 0; i < nsegs; i++) { 1276 + kiocb->ki_flags = segments[i].flags; 1277 + if (kiocb->ki_flags & IOCB_DIRECT) 1278 + trace_nfsd_write_direct(rqstp, fhp, kiocb->ki_pos, 1279 + segments[i].iter.count); 1280 + else { 1281 + trace_nfsd_write_vector(rqstp, fhp, kiocb->ki_pos, 1282 + segments[i].iter.count); 1283 + /* 1284 + * Mark the I/O buffer as evict-able to reduce 1285 + * memory contention. 
1286 + */ 1287 + if (nf->nf_file->f_op->fop_flags & FOP_DONTCACHE) 1288 + kiocb->ki_flags |= IOCB_DONTCACHE; 1289 + } 1290 + 1291 + host_err = vfs_iocb_iter_write(file, kiocb, &segments[i].iter); 1292 + if (host_err < 0) 1293 + return host_err; 1294 + *cnt += host_err; 1295 + if (host_err < segments[i].iter.count) 1296 + break; /* partial write */ 1297 + } 1298 + 1299 + return 0; 1256 1300 } 1257 1301 1258 1302 /** ··· 1445 1229 stable = NFS_UNSTABLE; 1446 1230 init_sync_kiocb(&kiocb, file); 1447 1231 kiocb.ki_pos = offset; 1448 - if (stable && !fhp->fh_use_wgather) 1449 - kiocb.ki_flags |= IOCB_DSYNC; 1232 + if (likely(!fhp->fh_use_wgather)) { 1233 + switch (stable) { 1234 + case NFS_FILE_SYNC: 1235 + /* persist data and timestamps */ 1236 + kiocb.ki_flags |= IOCB_DSYNC | IOCB_SYNC; 1237 + break; 1238 + case NFS_DATA_SYNC: 1239 + /* persist data only */ 1240 + kiocb.ki_flags |= IOCB_DSYNC; 1241 + break; 1242 + } 1243 + } 1450 1244 1451 1245 nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload); 1452 - iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt); 1246 + 1453 1247 since = READ_ONCE(file->f_wb_err); 1454 1248 if (verf) 1455 1249 nfsd_copy_write_verifier(verf, nn); 1456 1250 1457 1251 switch (nfsd_io_cache_write) { 1458 - case NFSD_IO_BUFFERED: 1252 + case NFSD_IO_DIRECT: 1253 + host_err = nfsd_direct_write(rqstp, fhp, nf, nvecs, 1254 + cnt, &kiocb); 1459 1255 break; 1460 1256 case NFSD_IO_DONTCACHE: 1461 1257 if (file->f_op->fop_flags & FOP_DONTCACHE) 1462 1258 kiocb.ki_flags |= IOCB_DONTCACHE; 1259 + fallthrough; 1260 + case NFSD_IO_BUFFERED: 1261 + iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt); 1262 + host_err = vfs_iocb_iter_write(file, &kiocb, &iter); 1263 + if (host_err < 0) 1264 + break; 1265 + *cnt = host_err; 1463 1266 break; 1464 1267 } 1465 - host_err = vfs_iocb_iter_write(file, &kiocb, &iter); 1466 1268 if (host_err < 0) { 1467 1269 commit_reset_write_verifier(nn, rqstp, host_err); 1468 1270 goto 
out_nfserr; 1469 1271 } 1470 - *cnt = host_err; 1471 1272 nfsd_stats_io_write_add(nn, exp, *cnt); 1472 1273 fsnotify_modify(file); 1473 1274 host_err = filemap_check_wb_err(file->f_mapping, since); ··· 1568 1335 if (file->f_op->splice_read && nfsd_read_splice_ok(rqstp)) 1569 1336 err = nfsd_splice_read(rqstp, fhp, file, offset, count, eof); 1570 1337 else 1571 - err = nfsd_iter_read(rqstp, fhp, file, offset, count, 0, eof); 1338 + err = nfsd_iter_read(rqstp, fhp, nf, offset, count, 0, eof); 1572 1339 1573 1340 nfsd_file_put(nf); 1574 1341 trace_nfsd_read_done(rqstp, fhp, offset, *count);
+1 -1
fs/nfsd/vfs.h
··· 121 121 unsigned long *count, 122 122 u32 *eof); 123 123 __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp, 124 - struct file *file, loff_t offset, 124 + struct nfsd_file *nf, loff_t offset, 125 125 unsigned long *count, unsigned int base, 126 126 u32 *eof); 127 127 bool nfsd_read_splice_ok(struct svc_rqst *rqstp);
-21
fs/nfsd/xdr4.h
··· 924 924 struct nfsd4_compound_state cstate; 925 925 }; 926 926 927 - static inline bool nfsd4_is_solo_sequence(struct nfsd4_compoundres *resp) 928 - { 929 - struct nfsd4_compoundargs *args = resp->rqstp->rq_argp; 930 - return resp->opcnt == 1 && args->ops[0].opnum == OP_SEQUENCE; 931 - } 932 - 933 - /* 934 - * The session reply cache only needs to cache replies that the client 935 - * actually asked us to. But it's almost free for us to cache compounds 936 - * consisting of only a SEQUENCE op, so we may as well cache those too. 937 - * Also, the protocol doesn't give us a convenient response in the case 938 - * of a replay of a solo SEQUENCE op that wasn't cached 939 - * (RETRY_UNCACHED_REP can only be returned in the second op of a 940 - * compound). 941 - */ 942 - static inline bool nfsd4_cache_this(struct nfsd4_compoundres *resp) 943 - { 944 - return (resp->cstate.slot->sl_flags & NFSD4_SLOT_CACHETHIS) 945 - || nfsd4_is_solo_sequence(resp); 946 - } 947 - 948 927 static inline bool nfsd4_last_compound_op(struct svc_rqst *rqstp) 949 928 { 950 929 struct nfsd4_compoundres *resp = rqstp->rq_resp;
+8 -1
include/linux/lockd/lockd.h
··· 12 12 13 13 /* XXX: a lot of this should really be under fs/lockd. */ 14 14 15 + #include <linux/exportfs.h> 15 16 #include <linux/in.h> 16 17 #include <linux/in6.h> 17 18 #include <net/ipv6.h> ··· 308 307 int nlmsvc_unlock_all_by_sb(struct super_block *sb); 309 308 int nlmsvc_unlock_all_by_ip(struct sockaddr *server_addr); 310 309 311 - static inline struct file *nlmsvc_file_file(struct nlm_file *file) 310 + static inline struct file *nlmsvc_file_file(const struct nlm_file *file) 312 311 { 313 312 return file->f_file[O_RDONLY] ? 314 313 file->f_file[O_RDONLY] : file->f_file[O_WRONLY]; ··· 317 316 static inline struct inode *nlmsvc_file_inode(struct nlm_file *file) 318 317 { 319 318 return file_inode(nlmsvc_file_file(file)); 319 + } 320 + 321 + static inline bool 322 + nlmsvc_file_cannot_lock(const struct nlm_file *file) 323 + { 324 + return exportfs_cannot_lock(nlmsvc_file_file(file)->f_path.dentry->d_sb->s_export_op); 320 325 } 321 326 322 327 static inline int __nlm_privileged_request4(const struct sockaddr *sap)
+1 -1
include/linux/sunrpc/svc_rdma.h
··· 131 131 */ 132 132 enum { 133 133 RPCRDMA_LISTEN_BACKLOG = 10, 134 - RPCRDMA_MAX_REQUESTS = 64, 134 + RPCRDMA_MAX_REQUESTS = 128, 135 135 RPCRDMA_MAX_BC_REQUESTS = 2, 136 136 }; 137 137
+3
include/linux/sunrpc/svcsock.h
··· 26 26 void (*sk_odata)(struct sock *); 27 27 void (*sk_owspace)(struct sock *); 28 28 29 + /* For sends (protected by xpt_mutex) */ 30 + struct bio_vec *sk_bvec; 31 + 29 32 /* private TCP part */ 30 33 /* On-the-wire fragment header: */ 31 34 __be32 sk_marker;
+52 -10
net/sunrpc/svcsock.c
··· 68 68 69 69 #define RPCDBG_FACILITY RPCDBG_SVCXPRT 70 70 71 + /* 72 + * For UDP: 73 + * 1 for header page 74 + * enough pages for RPCSVC_MAXPAYLOAD_UDP 75 + * 1 in case payload is not aligned 76 + * 1 for tail page 77 + */ 78 + enum { 79 + SUNRPC_MAX_UDP_SENDPAGES = 1 + RPCSVC_MAXPAYLOAD_UDP / PAGE_SIZE + 1 + 1 80 + }; 81 + 71 82 /* To-do: to avoid tying up an nfsd thread while waiting for a 72 83 * handshake request, the request could instead be deferred. 73 84 */ ··· 751 740 if (svc_xprt_is_dead(xprt)) 752 741 goto out_notconn; 753 742 754 - count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, xdr); 743 + count = xdr_buf_to_bvec(svsk->sk_bvec, SUNRPC_MAX_UDP_SENDPAGES, xdr); 755 744 756 - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, 745 + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, svsk->sk_bvec, 757 746 count, rqstp->rq_res.len); 758 747 err = sock_sendmsg(svsk->sk_sock, &msg); 759 748 if (err == -ECONNREFUSED) { 760 749 /* ICMP error on earlier request. */ 761 - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, 750 + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, svsk->sk_bvec, 762 751 count, rqstp->rq_res.len); 763 752 err = sock_sendmsg(svsk->sk_sock, &msg); 764 753 } ··· 1073 1062 return svc_sock_reclen(svsk); 1074 1063 1075 1064 err_too_large: 1076 - net_notice_ratelimited("svc: %s %s RPC fragment too large: %d\n", 1077 - __func__, svsk->sk_xprt.xpt_server->sv_name, 1078 - svc_sock_reclen(svsk)); 1065 + net_notice_ratelimited("svc: %s oversized RPC fragment (%u octets) from %pISpc\n", 1066 + svsk->sk_xprt.xpt_server->sv_name, 1067 + svc_sock_reclen(svsk), 1068 + (struct sockaddr *)&svsk->sk_xprt.xpt_remote); 1079 1069 svc_xprt_deferred_close(&svsk->sk_xprt); 1080 1070 err_short: 1081 1071 return -EAGAIN; ··· 1247 1235 int ret; 1248 1236 1249 1237 /* The stream record marker is copied into a temporary page 1250 - * fragment buffer so that it can be included in rq_bvec. 1238 + * fragment buffer so that it can be included in sk_bvec. 
1251 1239 */ 1252 1240 buf = page_frag_alloc(&svsk->sk_frag_cache, sizeof(marker), 1253 1241 GFP_KERNEL); 1254 1242 if (!buf) 1255 1243 return -ENOMEM; 1256 1244 memcpy(buf, &marker, sizeof(marker)); 1257 - bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker)); 1245 + bvec_set_virt(svsk->sk_bvec, buf, sizeof(marker)); 1258 1246 1259 - count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages, 1247 + count = xdr_buf_to_bvec(svsk->sk_bvec + 1, rqstp->rq_maxpages, 1260 1248 &rqstp->rq_res); 1261 1249 1262 - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, 1250 + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, svsk->sk_bvec, 1263 1251 1 + count, sizeof(marker) + rqstp->rq_res.len); 1264 1252 ret = sock_sendmsg(svsk->sk_sock, &msg); 1265 1253 page_frag_free(buf); ··· 1404 1392 spin_unlock_bh(&serv->sv_lock); 1405 1393 } 1406 1394 1395 + static int svc_sock_sendpages(struct svc_serv *serv, struct socket *sock, int flags) 1396 + { 1397 + switch (sock->type) { 1398 + case SOCK_STREAM: 1399 + /* +1 for TCP record marker */ 1400 + if (flags & SVC_SOCK_TEMPORARY) 1401 + return svc_serv_maxpages(serv) + 1; 1402 + return 0; 1403 + case SOCK_DGRAM: 1404 + return SUNRPC_MAX_UDP_SENDPAGES; 1405 + } 1406 + return -EINVAL; 1407 + } 1408 + 1407 1409 /* 1408 1410 * Initialize socket for RPC use and create svc_sock struct 1409 1411 */ ··· 1428 1402 struct svc_sock *svsk; 1429 1403 struct sock *inet; 1430 1404 int pmap_register = !(flags & SVC_SOCK_ANONYMOUS); 1405 + int sendpages; 1431 1406 unsigned long pages; 1407 + 1408 + sendpages = svc_sock_sendpages(serv, sock, flags); 1409 + if (sendpages < 0) 1410 + return ERR_PTR(sendpages); 1432 1411 1433 1412 pages = svc_serv_maxpages(serv); 1434 1413 svsk = kzalloc(struct_size(svsk, sk_pages, pages), GFP_KERNEL); 1435 1414 if (!svsk) 1436 1415 return ERR_PTR(-ENOMEM); 1416 + 1417 + if (sendpages) { 1418 + svsk->sk_bvec = kcalloc(sendpages, sizeof(*svsk->sk_bvec), GFP_KERNEL); 1419 + if (!svsk->sk_bvec) { 1420 + kfree(svsk); 1421 + 
return ERR_PTR(-ENOMEM); 1422 + } 1423 + } 1424 + 1437 1425 svsk->sk_maxpages = pages; 1438 1426 1439 1427 inet = sock->sk; ··· 1459 1419 inet->sk_protocol, 1460 1420 ntohs(inet_sk(inet)->inet_sport)); 1461 1421 if (err < 0) { 1422 + kfree(svsk->sk_bvec); 1462 1423 kfree(svsk); 1463 1424 return ERR_PTR(err); 1464 1425 } ··· 1677 1636 sock_release(sock); 1678 1637 1679 1638 page_frag_cache_drain(&svsk->sk_frag_cache); 1639 + kfree(svsk->sk_bvec); 1680 1640 kfree(svsk); 1681 1641 }
+8 -11
net/sunrpc/xprtrdma/svc_rdma_transport.c
··· 591 591 rdma_disconnect(rdma->sc_cm_id); 592 592 } 593 593 594 - static void __svc_rdma_free(struct work_struct *work) 594 + /** 595 + * svc_rdma_free - Release class-specific transport resources 596 + * @xprt: Generic svc transport object 597 + */ 598 + static void svc_rdma_free(struct svc_xprt *xprt) 595 599 { 596 600 struct svcxprt_rdma *rdma = 597 - container_of(work, struct svcxprt_rdma, sc_work); 601 + container_of(xprt, struct svcxprt_rdma, sc_xprt); 598 602 struct ib_device *device = rdma->sc_cm_id->device; 603 + 604 + might_sleep(); 599 605 600 606 /* This blocks until the Completion Queues are empty */ 601 607 if (rdma->sc_qp && !IS_ERR(rdma->sc_qp)) ··· 633 627 if (!test_bit(XPT_LISTENER, &rdma->sc_xprt.xpt_flags)) 634 628 rpcrdma_rn_unregister(device, &rdma->sc_rn); 635 629 kfree(rdma); 636 - } 637 - 638 - static void svc_rdma_free(struct svc_xprt *xprt) 639 - { 640 - struct svcxprt_rdma *rdma = 641 - container_of(xprt, struct svcxprt_rdma, sc_xprt); 642 - 643 - INIT_WORK(&rdma->sc_work, __svc_rdma_free); 644 - schedule_work(&rdma->sc_work); 645 630 } 646 631 647 632 static int svc_rdma_has_wspace(struct svc_xprt *xprt)
+6 -5
tools/net/sunrpc/xdrgen/generators/__init__.py
··· 2 2 3 3 """Define a base code generator class""" 4 4 5 - import sys 5 + from pathlib import Path 6 6 from jinja2 import Environment, FileSystemLoader, Template 7 7 8 8 from xdr_ast import _XdrAst, Specification, _RpcProgram, _XdrTypeSpecifier ··· 14 14 """Open a set of templates based on output language""" 15 15 match language: 16 16 case "C": 17 + templates_dir = ( 18 + Path(__file__).parent.parent / "templates" / language / xdr_type 19 + ) 17 20 environment = Environment( 18 - loader=FileSystemLoader(sys.path[0] + "/templates/C/" + xdr_type + "/"), 21 + loader=FileSystemLoader(templates_dir), 19 22 trim_blocks=True, 20 23 lstrip_blocks=True, 21 24 ) ··· 51 48 52 49 def header_guard_infix(filename: str) -> str: 53 50 """Extract the header guard infix from the specification filename""" 54 - basename = filename.split("/")[-1] 55 - program = basename.replace(".x", "") 56 - return program.upper() 51 + return Path(filename).stem.upper() 57 52 58 53 59 54 def kernel_c_type(spec: _XdrTypeSpecifier) -> str:
+25 -9
tools/net/sunrpc/xdrgen/generators/union.py
··· 8 8 from generators import SourceGenerator 9 9 from generators import create_jinja2_environment, get_jinja2_template 10 10 11 - from xdr_ast import _XdrBasic, _XdrUnion, _XdrVoid, get_header_name 11 + from xdr_ast import _XdrBasic, _XdrUnion, _XdrVoid, _XdrString, get_header_name 12 12 from xdr_ast import _XdrDeclaration, _XdrCaseSpec, public_apis, big_endian 13 13 14 14 ··· 40 40 """Emit a definition for an XDR union's case arm""" 41 41 if isinstance(node.arm, _XdrVoid): 42 42 return 43 - assert isinstance(node.arm, _XdrBasic) 43 + if isinstance(node.arm, _XdrString): 44 + type_name = "char *" 45 + classifier = "" 46 + else: 47 + type_name = node.arm.spec.type_name 48 + classifier = node.arm.spec.c_classifier 49 + 50 + assert isinstance(node.arm, (_XdrBasic, _XdrString)) 44 51 template = get_jinja2_template(environment, "definition", "case_spec") 45 52 print( 46 53 template.render( 47 54 name=node.arm.name, 48 - type=node.arm.spec.type_name, 49 - classifier=node.arm.spec.c_classifier, 55 + type=type_name, 56 + classifier=classifier, 50 57 ) 51 58 ) 52 59 ··· 91 84 92 85 if isinstance(node.arm, _XdrVoid): 93 86 return 87 + if isinstance(node.arm, _XdrString): 88 + type_name = "char *" 89 + classifier = "" 90 + else: 91 + type_name = node.arm.spec.type_name 92 + classifier = node.arm.spec.c_classifier 94 93 95 94 if big_endian_discriminant: 96 95 template = get_jinja2_template(environment, "decoder", "case_spec_be") ··· 105 92 for case in node.values: 106 93 print(template.render(case=case)) 107 94 108 - assert isinstance(node.arm, _XdrBasic) 95 + assert isinstance(node.arm, (_XdrBasic, _XdrString)) 109 96 template = get_jinja2_template(environment, "decoder", node.arm.template) 110 97 print( 111 98 template.render( 112 99 name=node.arm.name, 113 - type=node.arm.spec.type_name, 114 - classifier=node.arm.spec.c_classifier, 100 + type=type_name, 101 + classifier=classifier, 115 102 ) 116 103 ) 117 104 ··· 182 169 183 170 if isinstance(node.arm, _XdrVoid): 184 171 
return 185 - 172 + if isinstance(node.arm, _XdrString): 173 + type_name = "char *" 174 + else: 175 + type_name = node.arm.spec.type_name 186 176 if big_endian_discriminant: 187 177 template = get_jinja2_template(environment, "encoder", "case_spec_be") 188 178 else: ··· 197 181 print( 198 182 template.render( 199 183 name=node.arm.name, 200 - type=node.arm.spec.type_name, 184 + type=type_name, 201 185 ) 202 186 ) 203 187
+1 -1
tools/net/sunrpc/xdrgen/templates/C/pointer/decoder/close.j2
··· 1 1 {# SPDX-License-Identifier: GPL-2.0 #} 2 2 return true; 3 - }; 3 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/pointer/encoder/close.j2
··· 1 1 {# SPDX-License-Identifier: GPL-2.0 #} 2 2 return true; 3 - }; 3 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/struct/decoder/close.j2
··· 1 1 {# SPDX-License-Identifier: GPL-2.0 #} 2 2 return true; 3 - }; 3 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/struct/decoder/variable_length_opaque.j2
··· 2 2 {% if annotate %} 3 3 /* member {{ name }} (variable-length opaque) */ 4 4 {% endif %} 5 - if (!xdrgen_decode_opaque(xdr, (opaque *)ptr, {{ maxsize }})) 5 + if (!xdrgen_decode_opaque(xdr, &ptr->{{ name }}, {{ maxsize }})) 6 6 return false;
+1 -1
tools/net/sunrpc/xdrgen/templates/C/struct/encoder/close.j2
··· 1 1 {# SPDX-License-Identifier: GPL-2.0 #} 2 2 return true; 3 - }; 3 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/basic.j2
··· 14 14 /* (basic) */ 15 15 {% endif %} 16 16 return xdrgen_decode_{{ type }}(xdr, ptr); 17 - }; 17 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/fixed_length_array.j2
··· 22 22 return false; 23 23 } 24 24 return true; 25 - }; 25 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/fixed_length_opaque.j2
··· 14 14 /* (fixed-length opaque) */ 15 15 {% endif %} 16 16 return xdr_stream_decode_opaque_fixed(xdr, ptr, {{ size }}) == 0; 17 - }; 17 + }
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/string.j2
···
 /* (variable-length string) */
 {% endif %}
 	return xdrgen_decode_string(xdr, ptr, {{ maxsize }});
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/variable_length_array.j2
···
 		if (!xdrgen_decode_{{ type }}(xdr, &ptr->element[i]))
 			return false;
 	return true;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/decoder/variable_length_opaque.j2
···
 /* (variable-length opaque) */
 {% endif %}
 	return xdrgen_decode_opaque(xdr, ptr, {{ maxsize }});
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/basic.j2
···
 /* (basic) */
 {% endif %}
 	return xdrgen_encode_{{ type }}(xdr, value);
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/fixed_length_array.j2
···
 			return false;
 	}
 	return true;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/fixed_length_opaque.j2
···
 /* (fixed-length opaque) */
 {% endif %}
 	return xdr_stream_encode_opaque_fixed(xdr, value, {{ size }}) >= 0;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/string.j2
···
 /* (variable-length string) */
 {% endif %}
 	return xdr_stream_encode_opaque(xdr, value.data, value.len) >= 0;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/variable_length_array.j2
···
 {% endif %}
 			return false;
 	return true;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/typedef/encoder/variable_length_opaque.j2
···
 /* (variable-length opaque) */
 {% endif %}
 	return xdr_stream_encode_opaque(xdr, value.data, value.len) >= 0;
-};
+}
+4
tools/net/sunrpc/xdrgen/templates/C/union/declaration/close.j2
···
+{# SPDX-License-Identifier: GPL-2.0 #}
+
+bool xdrgen_decode_{{ name }}(struct xdr_stream *xdr, struct {{ name }} *ptr);
+bool xdrgen_encode_{{ name }}(struct xdr_stream *xdr, const struct {{ name }} *value);
+1 -1
tools/net/sunrpc/xdrgen/templates/C/union/decoder/close.j2
···
 {# SPDX-License-Identifier: GPL-2.0 #}
 	}
 	return true;
-};
+}
+1 -1
tools/net/sunrpc/xdrgen/templates/C/union/encoder/close.j2
···
 {# SPDX-License-Identifier: GPL-2.0 #}
 	}
 	return true;
-};
+}
+6
tools/net/sunrpc/xdrgen/templates/C/union/encoder/string.j2
···
+{# SPDX-License-Identifier: GPL-2.0 #}
+{% if annotate %}
+/* member {{ name }} (variable-length string) */
+{% endif %}
+	if (!xdrgen_encode_string(xdr, ptr->u.{{ name }}, {{ maxsize }}))
+		return false;
+5
tools/net/sunrpc/xdrgen/xdrgen
···
 __version__ = "0.2"

 import sys
+from pathlib import Path
 import argparse
+
+_XDRGEN_DIR = Path(__file__).resolve().parent
+if str(_XDRGEN_DIR) not in sys.path:
+    sys.path.insert(0, str(_XDRGEN_DIR))

 from subcmds import definitions
 from subcmds import declarations
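The bootstrap above makes xdrgen location-independent: it puts the script's own directory at the front of the module search path so its sibling packages (`subcmds`, `generators`) import regardless of the caller's working directory. A sketch of the same logic, using a hypothetical `prepend_script_dir` helper so it can be exercised on a plain list instead of the real `sys.path`:

```python
# Sketch of the location-independence bootstrap; prepend_script_dir is a
# hypothetical helper, not part of xdrgen itself.
from pathlib import Path

def prepend_script_dir(path_list, script_file):
    """Idempotently insert script_file's directory at the front of path_list."""
    script_dir = str(Path(script_file).resolve().parent)
    if script_dir not in path_list:
        path_list.insert(0, script_dir)
    return path_list

paths = prepend_script_dir([], "/opt/xdrgen/xdrgen")
assert paths == ["/opt/xdrgen"]
# Running it again does not duplicate the entry:
assert prepend_script_dir(paths, "/opt/xdrgen/xdrgen") == ["/opt/xdrgen"]
```

The membership check mirrors the `if str(_XDRGEN_DIR) not in sys.path` guard in the patch, which keeps repeated imports of the script from stacking duplicate entries onto `sys.path`.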