docs/vfs: update references to i_mutex to i_rwsem

tjh.dev / kernel

Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

VFS has used i_rwsem for ten years now (9902af79c01a: parallel
lookups actual switch to rwsem), but the VFS documentation and comments
still have references to i_mutex.

Signed-off-by: Junxuan Liao <ljx@cs.wisc.edu>
Link: https://lore.kernel.org/72223729-5471-474a-af3c-f366691fba82@cs.wisc.edu
Signed-off-by: Christian Brauner <brauner@kernel.org>

Authored by Junxuan Liao and committed by Christian Brauner 10 months ago 2773d282 6ae58121

+48 -47

16 changed files

Documentation/filesystems/vfs.rst
fs/attr.c
fs/buffer.c
fs/dcache.c
fs/direct-io.c
fs/inode.c
fs/libfs.c
fs/locks.c
fs/namei.c
fs/namespace.c
fs/stack.c
fs/xattr.c
include/linux/exportfs.h
include/linux/fs.h
include/linux/fs_stack.h
include/linux/quotaops.h

+3 -2

Documentation/filesystems/vfs.rst

···
 dirty_folio to write data into the address_space, and
 writepages to writeback data to storage.
 
-Adding and removing pages to/from an address_space is protected by the
-inode's i_mutex.
+Removing pages from an address_space requires holding the inode's i_rwsem
+exclusively, while adding pages to the address_space requires holding the
+inode's i_mapping->invalidate_lock exclusively.
 
 When data is written to a page, the PG_Dirty flag should be set. It
 typically remains set until writepages asks for it to be written. This

+5 -5

fs/attr.c

···
  * @inode: the inode to be truncated
  * @offset: the new size to assign to the inode
  *
- * inode_newsize_ok must be called with i_mutex held.
+ * inode_newsize_ok must be called with i_rwsem held exclusively.
  *
  * inode_newsize_ok will check filesystem limits and ulimits to check that the
  * new inode size is within limits. inode_newsize_ok will also send SIGXFSZ
···
  * @inode: the inode to be updated
  * @attr: the new attributes
  *
- * setattr_copy must be called with i_mutex held.
+ * setattr_copy must be called with i_rwsem held exclusively.
  *
  * setattr_copy updates the inode's metadata with that specified
  * in attr on idmapped mounts. Necessary permission checks to determine
···
  * @attr: new attributes
  * @delegated_inode: returns inode, if the inode is delegated
  *
- * The caller must hold the i_mutex on the affected object.
+ * The caller must hold the i_rwsem exclusively on the affected object.
  *
  * If notify_change discovers a delegation in need of breaking,
  * it will return -EWOULDBLOCK and return a reference to the inode in
  * delegated_inode. The caller should then break the delegation and
  * retry. Because breaking a delegation may take a long time, the
- * caller should drop the i_mutex before doing so.
+ * caller should drop the i_rwsem before doing so.
  *
  * Alternatively, a caller may pass NULL for delegated_inode. This may
  * be appropriate for callers that expect the underlying filesystem not
···
 		if (S_ISLNK(inode->i_mode))
 			return -EOPNOTSUPP;
 
-		/* Flag setting protected by i_mutex */
+		/* Flag setting protected by i_rwsem */
 		if (is_sxid(attr->ia_mode))
 			inode->i_flags &= ~S_NOSEC;
 	}

+1 -1

fs/buffer.c

···
  * holes and correct delalloc and unwritten extent mapping on filesystems that
  * support these features.
  *
- * We are not allowed to take the i_mutex here so we have to play games to
+ * We are not allowed to take the i_rwsem here so we have to play games to
  * protect against truncate races as the page could now be beyond EOF. Because
  * truncate writes the inode size before removing pages, once we have the
  * page lock we can determine safely if the page is beyond EOF. If it is not

+5 -5

fs/dcache.c

···
  * @target: new dentry
  * @exchange: exchange the two dentries
  *
- * Update the dcache to reflect the move of a file name. Negative
- * dcache entries should not be moved in this way. Caller must hold
- * rename_lock, the i_mutex of the source and target directories,
- * and the sb->s_vfs_rename_mutex if they differ. See lock_rename().
+ * Update the dcache to reflect the move of a file name. Negative dcache
+ * entries should not be moved in this way. Caller must hold rename_lock, the
+ * i_rwsem of the source and target directories (exclusively), and the sb->
+ * s_vfs_rename_mutex if they differ. See lock_rename().
  */
 static void __d_move(struct dentry *dentry, struct dentry *target,
 		     bool exchange)
···
  * This helper attempts to cope with remotely renamed directories
  *
  * It assumes that the caller is already holding
- * dentry->d_parent->d_inode->i_mutex, and rename_lock
+ * dentry->d_parent->d_inode->i_rwsem, and rename_lock
  *
  * Note: If ever the locking in lock_rename() changes, then please
  * remember to update this too...

+4 -4

fs/direct-io.c

···
  * The locking rules are governed by the flags parameter:
  *  - if the flags value contains DIO_LOCKING we use a fancy locking
  *    scheme for dumb filesystems.
- *    For writes this function is called under i_mutex and returns with
- *    i_mutex held, for reads, i_mutex is not held on entry, but it is
+ *    For writes this function is called under i_rwsem and returns with
+ *    i_rwsem held, for reads, i_rwsem is not held on entry, but it is
  *    taken and dropped again before returning.
  *  - if the flags value does NOT contain DIO_LOCKING we don't use any
  *    internal locking but rather rely on the filesystem to synchronize
···
  *    counter before starting direct I/O, and decrement it once we are done.
  *    Truncate can wait for it to reach zero to provide exclusion. It is
  *    expected that filesystem provide exclusion between new direct I/O
- *    and truncates. For DIO_LOCKING filesystems this is done by i_mutex,
+ *    and truncates. For DIO_LOCKING filesystems this is done by i_rwsem,
  *    but other filesystems need to take care of this on their own.
  *
  * NOTE: if you pass "sdio" to anything by pointer make sure that function
···
 
 	/*
 	 * All block lookups have been performed. For READ requests
-	 * we can let i_mutex go now that its achieved its purpose
+	 * we can let i_rwsem go now that its achieved its purpose
 	 * of protecting us from looking up uninitialized blocks.
 	 */
 	if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING))

+4 -5

fs/inode.c

···
 		/* Set new key only if filesystem hasn't already changed it */
 		if (lockdep_match_class(&inode->i_rwsem, &type->i_mutex_key)) {
 			/*
-			 * ensure nobody is actually holding i_mutex
+			 * ensure nobody is actually holding i_rwsem
 			 */
-			// mutex_destroy(&inode->i_mutex);
 			init_rwsem(&inode->i_rwsem);
 			lockdep_set_class(&inode->i_rwsem,
 					  &type->i_mutex_dir_key);
···
  * proceed with a truncate or equivalent operation.
  *
  * Must be called under a lock that serializes taking new references
- * to i_dio_count, usually by inode->i_mutex.
+ * to i_dio_count, usually by inode->i_rwsem.
  */
 void inode_dio_wait(struct inode *inode)
 {
···
 /*
  * inode_set_flags - atomically set some inode flags
  *
- * Note: the caller should be holding i_mutex, or else be sure that
+ * Note: the caller should be holding i_rwsem exclusively, or else be sure that
  * they have exclusive access to the inode structure (i.e., while the
  * inode is being instantiated). The reason for the cmpxchg() loop
  * --- which wouldn't be necessary if all code paths which modify
···
  * code path which doesn't today so we use cmpxchg() out of an abundance
  * of caution.
  *
- * In the long run, i_mutex is overkill, and we should probably look
+ * In the long run, i_rwsem is overkill, and we should probably look
  * at using the i_lock spinlock to protect i_flags, and then make sure
  * it is so documented in include/linux/fs.h and that all code follows
  * the locking convention!!

+3 -2

fs/libfs.c

···
  * simple_write_end does the minimum needed for updating a folio after
  * writing is done. It has the same API signature as the .write_end of
  * address_space_operations vector. So it can just be set onto .write_end for
- * FSes that don't need any other processing. i_mutex is assumed to be held.
+ * FSes that don't need any other processing. i_rwsem is assumed to be held
+ * exclusively.
  * Block based filesystems should use generic_write_end().
  * NOTE: Even though i_size might get updated by this function, mark_inode_dirty
  * is not called, so a filesystem that actually does store data in .write_inode
···
 	}
 	/*
 	 * No need to use i_size_read() here, the i_size
-	 * cannot change under us because we hold the i_mutex.
+	 * cannot change under us because we hold the i_rwsem.
 	 */
 	if (last_pos > inode->i_size)
 		i_size_write(inode, last_pos);

+1 -1

fs/locks.c

···
 
 	/*
 	 * In the delegation case we need mutual exclusion with
-	 * a number of operations that take the i_mutex. We trylock
+	 * a number of operations that take the i_rwsem. We trylock
 	 * because delegations are an optional optimization, and if
 	 * there's some chance of a conflict--we'd rather not
 	 * bother, maybe that's a sign this just isn't a good file to

+11 -11

fs/namei.c

···
 	int ret = 0;
 
 	while (flags & DCACHE_MANAGED_DENTRY) {
-		/* Allow the filesystem to manage the transit without i_mutex
+		/* Allow the filesystem to manage the transit without i_rwsem
 		 * being held. */
 		if (flags & DCACHE_MANAGE_TRANSIT) {
 			ret = path->dentry->d_op->d_manage(path, false);
···
  * Note that this routine is purely a helper for filesystem usage and should
  * not be called by generic code. It does no permission checking.
  *
- * The caller must hold base->i_mutex.
+ * The caller must hold base->i_rwsem.
  */
 struct dentry *lookup_noperm(struct qstr *name, struct dentry *base)
 {
···
  *
  * This can be used for in-kernel filesystem clients such as file servers.
  *
- * The caller must hold base->i_mutex.
+ * The caller must hold base->i_rwsem.
  */
 struct dentry *lookup_one(struct mnt_idmap *idmap, struct qstr *name,
 			  struct dentry *base)
···
  * @dentry: victim
  * @delegated_inode: returns victim inode, if the inode is delegated.
  *
- * The caller must hold dir->i_mutex.
+ * The caller must hold dir->i_rwsem exclusively.
  *
  * If vfs_unlink discovers a delegation, it will return -EWOULDBLOCK and
  * return a reference to the inode in delegated_inode. The caller
  * should then break the delegation on that inode and retry. Because
  * breaking a delegation may take a long time, the caller should drop
- * dir->i_mutex before doing so.
+ * dir->i_rwsem before doing so.
  *
  * Alternatively, a caller may pass NULL for delegated_inode. This may
  * be appropriate for callers that expect the underlying filesystem not
···
 
 /*
  * Make sure that the actual truncation of the file will occur outside its
- * directory's i_mutex. Truncate can take a long time if there is a lot of
+ * directory's i_rwsem. Truncate can take a long time if there is a lot of
  * writeout happening, and we don't want to prevent access to the directory
  * while waiting on the I/O.
  */
···
  * @new_dentry: where to create the new link
  * @delegated_inode: returns inode needing a delegation break
  *
- * The caller must hold dir->i_mutex
+ * The caller must hold dir->i_rwsem exclusively.
  *
  * If vfs_link discovers a delegation on the to-be-linked file in need
  * of breaking, it will return -EWOULDBLOCK and return a reference to the
  * inode in delegated_inode. The caller should then break the delegation
  * and retry. Because breaking a delegation may take a long time, the
- * caller should drop the i_mutex before doing so.
+ * caller should drop the i_rwsem before doing so.
  *
  * Alternatively, a caller may pass NULL for delegated_inode. This may
  * be appropriate for callers that expect the underlying filesystem not
···
  * c) we may have to lock up to _four_ objects - parents and victim (if it exists),
  *    and source (if it's a non-directory or a subdirectory that moves to
  *    different parent).
- *    And that - after we got ->i_mutex on parents (until then we don't know
+ *    And that - after we got ->i_rwsem on parents (until then we don't know
  *    whether the target exists). Solution: try to be smart with locking
  *    order for inodes. We rely on the fact that tree topology may change
  *    only under ->s_vfs_rename_mutex _and_ that parent of the object we
···
  *    has no more than 1 dentry. If "hybrid" objects will ever appear,
  *    we'd better make sure that there's no link(2) for them.
  * d) conversion from fhandle to dentry may come in the wrong moment - when
- *    we are removing the target. Solution: we will have to grab ->i_mutex
+ *    we are removing the target. Solution: we will have to grab ->i_rwsem
  *    in the fhandle_to_dentry code. [FIXME - current nfsfh.c relies on
- *    ->i_mutex on parents, which works but leads to some truly excessive
+ *    ->i_rwsem on parents, which works but leads to some truly excessive
  *    locking].
  */
 int vfs_rename(struct renamedata *rd)

+1 -1

fs/namespace.c

···
  * detach_mounts allows lazily unmounting those mounts instead of
  * leaking them.
  *
- * The caller may hold dentry->d_inode->i_mutex.
+ * The caller may hold dentry->d_inode->i_rwsem.
  */
 void __detach_mounts(struct dentry *dentry)
 {

+2 -2

fs/stack.c

···
 #include <linux/fs.h>
 #include <linux/fs_stack.h>
 
-/* does _NOT_ require i_mutex to be held.
+/* does _NOT_ require i_rwsem to be held.
  *
  * This function cannot be inlined since i_size_{read,write} is rather
  * heavy-weight on 32-bit systems
···
 	 * If CONFIG_SMP or CONFIG_PREEMPTION on 32-bit, it's vital for
 	 * fsstack_copy_inode_size() to hold some lock around
 	 * i_size_write(), otherwise i_size_read() may spin forever (see
-	 * include/linux/fs.h). We don't necessarily hold i_mutex when this
+	 * include/linux/fs.h). We don't necessarily hold i_rwsem when this
 	 * is called, so take i_lock for that case.
 	 *
 	 * And if on 32-bit, continue our effort to keep the two halves of

+1 -1

fs/xattr.c

···
  *
  * returns the result of the internal setxattr or setsecurity operations.
  *
- * This function requires the caller to lock the inode's i_mutex before it
+ * This function requires the caller to lock the inode's i_rwsem before it
  * is executed. It also assumes that the caller will make the appropriate
  * permission checks.
  */

+2 -2

include/linux/exportfs.h

···
  *    directory. The name should be stored in the @name (with the
  *    understanding that it is already pointing to a %NAME_MAX+1 sized
  *    buffer. get_name() should return %0 on success, a negative error code
- *    or error. @get_name will be called without @parent->i_mutex held.
+ *    or error. @get_name will be called without @parent->i_rwsem held.
  *
  * get_parent:
  *    @get_parent should find the parent directory for the given @child which
···
  *    @commit_metadata should commit metadata changes to stable storage.
  *
  * Locking rules:
- *    get_parent is called with child->d_inode->i_mutex down
+ *    get_parent is called with child->d_inode->i_rwsem down
  *    get_name is not (which is possibly inconsistent)
  */
 

+3 -3

include/linux/fs.h

···
 }
 
 /*
- * inode->i_mutex nesting subclasses for the lock validator:
+ * inode->i_rwsem nesting subclasses for the lock validator:
  *
  * 0: the object of the current VFS operation
  * 1: parent
···
 
 /*
  * NOTE: unlike i_size_read(), i_size_write() does need locking around it
- * (normally i_mutex), otherwise on 32bit/SMP an update of i_size_seqcount
+ * (normally i_rwsem), otherwise on 32bit/SMP an update of i_size_seqcount
  * can be lost, resulting in subsequent i_size_read() calls spinning forever.
  */
 static inline void i_size_write(struct inode *inode, loff_t i_size)
···
  * freeze protection should be the outermost lock. In particular, we have:
  *
  * sb_start_write
- *   -> i_mutex (write path, truncate, directory ops, ...)
+ *   -> i_rwsem (write path, truncate, directory ops, ...)
  *   -> s_umount (freeze_super, thaw_super)
  */
 static inline void sb_start_write(struct super_block *sb)
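The i_size_write() note above says that an unserialized update of i_size_seqcount can be lost, after which readers spin forever. The sketch below is a simplified, single-threaded replay of one such interleaving, written as an assumption-laden model rather than the kernel's seqcount code: `struct model_isize` and all `model_*` functions are hypothetical, there are no memory barriers, and the reader is bounded by `max_tries` so the demonstration terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a sequence counter guarding a 64-bit size stored as
 * two 32-bit halves, as on a 32-bit machine. */
struct model_isize {
	unsigned seq;
	uint32_t lo, hi;
};

/* Reader: accept a snapshot only if seq was even and did not change.
 * Bounded by max_tries so that the model cannot spin forever. */
bool model_i_size_read(const struct model_isize *s, int max_tries,
		       uint64_t *out)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		unsigned start = s->seq;
		uint64_t v;

		if (start & 1)
			continue; /* a write appears to be in progress */
		v = ((uint64_t)s->hi << 32) | s->lo;
		if (s->seq == start) {
			*out = v;
			return true;
		}
	}
	return false;
}

/* Properly serialized writer: seq is odd only while the write is underway. */
void model_i_size_write(struct model_isize *s, uint64_t v)
{
	s->seq++;
	s->lo = (uint32_t)v;
	s->hi = (uint32_t)(v >> 32);
	s->seq++;
}

/* Replays one bad interleaving of two unserialized writers A and B whose
 * "begin" increments are plain load/add/store sequences. */
void model_racy_writes(struct model_isize *s, uint64_t a, uint64_t b)
{
	unsigned seen_a = s->seq; /* A loads seq */
	unsigned seen_b = s->seq; /* B loads the same value */

	s->seq = seen_a + 1;      /* A stores: seq is odd */
	s->seq = seen_b + 1;      /* B stores the same value: A's update is lost */
	s->lo = (uint32_t)a;
	s->hi = (uint32_t)(a >> 32);
	s->seq++;                 /* A ends: seq is even */
	s->lo = (uint32_t)b;
	s->hi = (uint32_t)(b >> 32);
	s->seq++;                 /* B ends: seq is odd again and stays odd */
}
```

After `model_racy_writes()` the counter is left odd with no writer active, so every later `model_i_size_read()` gives up after `max_tries`; an unbounded reader would never return. Serializing writers with a lock, normally i_rwsem according to the comment, rules this interleaving out.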

+1 -1

include/linux/fs_stack.h

···
 #define _LINUX_FS_STACK_H
 
 /* This file defines generic functions used primarily by stackable
- * filesystems; none of these functions require i_mutex to be held.
+ * filesystems; none of these functions require i_rwsem to be held.
  */
 
 #include <linux/fs.h>

+1 -1

include/linux/quotaops.h

···
 	return &sb->s_dquot;
 }
 
-/* i_mutex must being held */
+/* i_rwsem must being held */
 static inline bool is_quota_modification(struct mnt_idmap *idmap,
 					 struct inode *inode, struct iattr *ia)
 {