Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

fs: track the inode having file locks with a flag in ->i_opflags

Opening and closing an inode dirties the ->i_readcount field.

Depending on the alignment of the inode, it may happen to false-share
with other fields loaded for both operations, to varying extents.

This notably concerns the ->i_flctx field.
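
As a rough illustration of why alignment matters here, the sketch below checks whether two fields of an object land on the same cache line for a given base address. The 64-byte line size, the offsets, and the `demo_*` names are assumptions made up for the example; they are not the real `struct inode` layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is common on x86-64. */
#define DEMO_CACHE_LINE 64u

/*
 * Returns true if the fields at off_a and off_b of an object placed at
 * `base` fall on the same cache line. A write to one of them then
 * invalidates the line for CPUs that only read the other one.
 */
static bool demo_same_line(uintptr_t base, uintptr_t off_a, uintptr_t off_b)
{
	return (base + off_a) / DEMO_CACHE_LINE ==
	       (base + off_b) / DEMO_CACHE_LINE;
}

/* Hypothetical offsets: a frequently written counter and a read-mostly pointer. */
void demo_false_sharing_examples(void)
{
	/* Object aligned to a line boundary: the fields straddle two lines. */
	assert(!demo_same_line(0, 56, 64));
	/* Same object shifted by 8 bytes: both fields now share one line. */
	assert(demo_same_line(8, 56, 64));
}
```

The same pair of fields can be harmless or contended depending only on where the allocator happens to place the object.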

Since most inodes don't have the field populated, its presence can be
tracked with a flag in ->i_opflags instead, which bypasses the problem.
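
The flag-based scheme can be sketched in userspace with C11 atomics. This is only an analogue under stated assumptions: `demo_inode`, `demo_ctx`, `DEMO_FLCTX` and the tiny spinlock are hypothetical stand-ins for `struct inode`, `struct file_lock_context`, `IOP_FLCTX` and `->i_lock`, and the acquire/release operations stand in for `smp_load_acquire()`/`smp_store_release()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FLCTX 0x0100u /* hypothetical stand-in for IOP_FLCTX */

struct demo_ctx { int nlocks; };

struct demo_inode {
	atomic_flag lock;         /* stands in for ->i_lock */
	_Atomic unsigned opflags; /* stands in for ->i_opflags */
	struct demo_ctx *ctx;     /* stands in for ->i_flctx */
};

static void demo_inode_init(struct demo_inode *inode)
{
	atomic_flag_clear(&inode->lock);
	atomic_init(&inode->opflags, 0);
	inode->ctx = NULL;
}

/* Reader: if the flag is clear, return without ever loading ->ctx. */
static struct demo_ctx *demo_get_ctx(struct demo_inode *inode)
{
	unsigned flags = atomic_load_explicit(&inode->opflags,
					      memory_order_acquire);
	if (!(flags & DEMO_FLCTX))
		return NULL;
	/* The acquire load above guarantees the pointer store is visible. */
	return inode->ctx;
}

/* Writer: returns true if ctx was installed, false if one already existed. */
static bool demo_install_ctx(struct demo_inode *inode, struct demo_ctx *ctx)
{
	bool installed = false;

	assert(ctx != NULL);
	while (atomic_flag_test_and_set_explicit(&inode->lock,
						 memory_order_acquire))
		; /* spin until the lock is free */
	unsigned flags = atomic_load_explicit(&inode->opflags,
					      memory_order_relaxed);
	if (!(flags & DEMO_FLCTX)) {
		inode->ctx = ctx;
		/* Publish the pointer before the flag becomes visible. */
		atomic_store_explicit(&inode->opflags, flags | DEMO_FLCTX,
				      memory_order_release);
		installed = true;
	}
	atomic_flag_clear_explicit(&inode->lock, memory_order_release);
	return installed;
}
```

In this sketch the common case (no context) touches only the flags word, which is the point of the change: the reader never loads the pointer field unless the flag says it is populated.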

Here are results I obtained while opening a file read-only in a loop
with 24 cores doing the work on Sapphire Rapids. Use of the flag, as
opposed to reading the ->i_flctx field, was toggled at runtime while the
benchmark was running, to make sure both results come from the same
alignment.

before: 3233740
after: 3373346 (+4%)

before: 3284313
after: 3518711 (+7%)

before: 3505545
after: 4092806 (+16%)
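
The percentages above can be checked with a few lines of C. The helper name `gain_percent` is made up for this sketch. Integer division truncates, which matches the figures as quoted; the last pair works out to roughly 16.75% before truncation.

```c
#include <assert.h>
#include <stdint.h>

/* Whole-percent gain of `after` over `before`, truncated toward zero. */
static int64_t gain_percent(int64_t before, int64_t after)
{
	assert(before > 0);
	return (after - before) * 100 / before;
}
```

For example, `gain_percent(3505545, 4092806)` evaluates `58726100 / 3505545`, which truncates to 16.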

Or to put it differently, this varies wildly depending on how (un)lucky
you get.

The primary bottleneck before and after is the avoidable lockref trip in
do_dentry_open().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251203094837.290654-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

Authored by Mateusz Guzik and committed by Christian Brauner
887e9774 1fa4e69a

+24 -6
+12 -2
fs/locks.c
···
 {
 	struct file_lock_context *ctx;
 
-	/* paired with cmpxchg() below */
 	ctx = locks_inode_context(inode);
 	if (likely(ctx) || type == F_UNLCK)
 		goto out;
···
 	 * Assign the pointer if it's not already assigned. If it is, then
 	 * free the context we just allocated.
 	 */
-	if (cmpxchg(&inode->i_flctx, NULL, ctx)) {
+	spin_lock(&inode->i_lock);
+	if (!(inode->i_opflags & IOP_FLCTX)) {
+		VFS_BUG_ON_INODE(inode->i_flctx, inode);
+		WRITE_ONCE(inode->i_flctx, ctx);
+		/*
+		 * Paired with locks_inode_context().
+		 */
+		smp_store_release(&inode->i_opflags, inode->i_opflags | IOP_FLCTX);
+		spin_unlock(&inode->i_lock);
+	} else {
+		VFS_BUG_ON_INODE(!inode->i_flctx, inode);
+		spin_unlock(&inode->i_lock);
 		kmem_cache_free(flctx_cache, ctx);
 		ctx = locks_inode_context(inode);
 	}
+11 -4
include/linux/filelock.h
···
 locks_inode_context(const struct inode *inode)
 {
 	/*
-	 * Paired with the fence in locks_get_lock_context().
+	 * Paired with smp_store_release in locks_get_lock_context().
+	 *
+	 * Ensures ->i_flctx will be visible if we spotted the flag.
 	 */
+	if (likely(!(smp_load_acquire(&inode->i_opflags) & IOP_FLCTX)))
+		return NULL;
 	return READ_ONCE(inode->i_flctx);
 }
 
···
 	 * could end up racing with tasks trying to set a new lease on this
 	 * file.
 	 */
-	flctx = READ_ONCE(inode->i_flctx);
+	flctx = locks_inode_context(inode);
 	if (!flctx)
 		return 0;
 	smp_mb();
···
 	 * could end up racing with tasks trying to set a new lease on this
 	 * file.
 	 */
-	flctx = READ_ONCE(inode->i_flctx);
+	flctx = locks_inode_context(inode);
 	if (!flctx)
 		return 0;
 	smp_mb();
···
 
 static inline int break_layout(struct inode *inode, bool wait)
 {
+	struct file_lock_context *flctx;
+
 	smp_mb();
-	if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease)) {
+	flctx = locks_inode_context(inode);
+	if (flctx && !list_empty_careful(&flctx->flc_lease)) {
 		unsigned int flags = LEASE_BREAK_LAYOUT;
 
 		if (!wait)
+1
include/linux/fs.h
···
 #define IOP_MGTIME	0x0020
 #define IOP_CACHED_LINK	0x0040
 #define IOP_FASTPERM_MAY_EXEC	0x0080
+#define IOP_FLCTX	0x0100
 
 /*
  * Inode state bits. Protected by inode->i_lock