Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

afs: Fix speculative status fetch going out of order wrt to modifications

When doing a lookup in a directory, the afs filesystem uses a bulk
status fetch to speculatively retrieve the statuses of up to 48 other
vnodes found in the same directory and it will then either update extant
inodes or create new ones - effectively doing 'lookup ahead'.

To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is
locked (by the VFS).

When the operation completes, afs_inode_init_from_status() or
afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.

A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes. What can then
happen is that the speculative bulk status RPC retrieves the old status,
and whilst that is happening, the modification happens - which returns
an updated status, then the modification status is committed, then we
attempt to commit the speculative status.

This results in something like the following being seen in dmesg:

kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus

showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version
9 due to a local modification. This was causing the cache to be
invalidated for that vnode when it shouldn't have been. If it happens
on a data file, this might lead to local changes being lost.

Fix this by ignoring speculative status updates if the data version
doesn't match the expected value.

Note that it is possible to get a DV regression if a volume gets
restored from a backup - but we should get a callback break in such a
case that should trigger a recheck anyway. It might be worth checking
the volume creation time in the volsync info and, if a change is
observed in that (as would happen on a restore), invalidate all caches
associated with the volume.

Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by

David Howells and committed by
Linus Torvalds
a9e5c87c a349e4c6

+10
+1
fs/afs/dir.c
··· 823 823 vp->cb_break_before = afs_calc_vnode_cb_break(vnode); 824 824 vp->vnode = vnode; 825 825 vp->put_vnode = true; 826 + vp->speculative = true; /* vnode not locked */ 826 827 } 827 828 } 828 829 }
+8
fs/afs/inode.c
··· 294 294 op->flags &= ~AFS_OPERATION_DIR_CONFLICT; 295 295 } 296 296 } else if (vp->scb.have_status) { 297 + if (vp->dv_before + vp->dv_delta != vp->scb.status.data_version && 298 + vp->speculative) 299 + /* Ignore the result of a speculative bulk status fetch 300 + * if it splits around a modification op, thereby 301 + * appearing to regress the data version. 302 + */ 303 + goto out; 297 304 afs_apply_status(op, vp); 298 305 if (vp->scb.have_cb) 299 306 afs_apply_callback(op, vp); ··· 312 305 } 313 306 } 314 307 308 + out: 315 309 write_sequnlock(&vnode->cb_lock); 316 310 317 311 if (vp->scb.have_status)
+1
fs/afs/internal.h
··· 755 755 bool update_ctime:1; /* Need to update the ctime */ 756 756 bool set_size:1; /* Must update i_size */ 757 757 bool op_unlinked:1; /* True if file was unlinked by op */ 758 + bool speculative:1; /* T if speculative status fetch (no vnode lock) */ 758 759 }; 759 760 760 761 /*