Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

afs: Fix speculative status fetches

The generic/464 xfstest causes kAFS to emit occasional warnings of the
form:

kAFS: vnode modified {100055:8a} 30->31 YFS.StoreData64 (c=6015)

This indicates that the data version received back from the server did not
match the expected value (the DV should be incremented monotonically for
each individual modification op committed to a vnode).

What is happening is that a lookup call is doing a bulk status fetch
speculatively on a bunch of vnodes in a directory besides getting the
status of the vnode it's actually interested in. This is racing with a
StoreData operation (though it could also occur with, say, a MakeDir op).

On the client, a modification operation locks the vnode, but the bulk
status fetch only locks the parent directory, so no ordering is imposed
there (thereby avoiding an avenue to deadlock).

On the server, the StoreData op handler doesn't lock the vnode until it's
received all the request data, and downgrades the lock after committing the
data until it has finished sending change notifications to other clients -
which allows the status fetch to occur before it has finished.

This means that:

- a status fetch can access the target vnode either side of the exclusive
section of the modification

- the status fetch could start before the modification, yet finish after,
and vice-versa.

- the status fetch and the modification RPCs can complete in either order.

- the status fetch can return either the before or the after DV from the
modification.

- the status fetch might regress the locally cached DV.

Some of these are handled by the previous fix[1], but that's not sufficient
because it checks the DV it received against the DV it cached at the start
of the op, but the DV might've been updated in the meantime by a locally
generated modification op.

Fix this by the following means:

(1) Keep track of when we're performing a modification operation on a
vnode. This is done by marking vnode parameters with a 'modification'
note that causes the AFS_VNODE_MODIFYING flag to be set on the vnode
for the duration.

(2) Alter the speculation race detection to ignore speculative status
fetches if either the vnode is marked as being modified or the data
version number is not what we expected.

Note that whilst the "vnode modified" warning does get recovered from as it
causes the client to refetch the status at the next opportunity, it will
also invalidate the pagecache, so changes might get lost.

Fixes: a9e5c87ca744 ("afs: Fix speculative status fetch going out of order wrt to modifications")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-and-reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/160605082531.252452.14708077925602709042.stgit@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/linux-fsdevel/161961335926.39335.2552653972195467566.stgit@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by

David Howells and committed by
Linus Torvalds
22650f14 7af81cd0

+23 -2
+7
fs/afs/dir.c
··· 1419 1419 1420 1420 afs_op_set_vnode(op, 0, dvnode); 1421 1421 op->file[0].dv_delta = 1; 1422 + op->file[0].modification = true; 1422 1423 op->file[0].update_ctime = true; 1423 1424 op->dentry = dentry; 1424 1425 op->create.mode = S_IFDIR | mode; ··· 1501 1500 1502 1501 afs_op_set_vnode(op, 0, dvnode); 1503 1502 op->file[0].dv_delta = 1; 1503 + op->file[0].modification = true; 1504 1504 op->file[0].update_ctime = true; 1505 1505 1506 1506 op->dentry = dentry; ··· 1638 1636 1639 1637 afs_op_set_vnode(op, 0, dvnode); 1640 1638 op->file[0].dv_delta = 1; 1639 + op->file[0].modification = true; 1641 1640 op->file[0].update_ctime = true; 1642 1641 1643 1642 /* Try to make sure we have a callback promise on the victim. */ ··· 1721 1718 1722 1719 afs_op_set_vnode(op, 0, dvnode); 1723 1720 op->file[0].dv_delta = 1; 1721 + op->file[0].modification = true; 1724 1722 op->file[0].update_ctime = true; 1725 1723 1726 1724 op->dentry = dentry; ··· 1796 1792 afs_op_set_vnode(op, 0, dvnode); 1797 1793 afs_op_set_vnode(op, 1, vnode); 1798 1794 op->file[0].dv_delta = 1; 1795 + op->file[0].modification = true; 1799 1796 op->file[0].update_ctime = true; 1800 1797 op->file[1].update_ctime = true; 1801 1798 ··· 1992 1987 afs_op_set_vnode(op, 1, new_dvnode); /* May be same as orig_dvnode */ 1993 1988 op->file[0].dv_delta = 1; 1994 1989 op->file[1].dv_delta = 1; 1990 + op->file[0].modification = true; 1991 + op->file[1].modification = true; 1995 1992 op->file[0].update_ctime = true; 1996 1993 op->file[1].update_ctime = true; 1997 1994
+3
fs/afs/dir_silly.c
··· 73 73 afs_op_set_vnode(op, 1, dvnode); 74 74 op->file[0].dv_delta = 1; 75 75 op->file[1].dv_delta = 1; 76 + op->file[0].modification = true; 77 + op->file[1].modification = true; 76 78 op->file[0].update_ctime = true; 77 79 op->file[1].update_ctime = true; 78 80 ··· 203 201 afs_op_set_vnode(op, 0, dvnode); 204 202 afs_op_set_vnode(op, 1, vnode); 205 203 op->file[0].dv_delta = 1; 204 + op->file[0].modification = true; 206 205 op->file[0].update_ctime = true; 207 206 op->file[1].op_unlinked = true; 208 207 op->file[1].update_ctime = true;
+6
fs/afs/fs_operation.c
··· 118 118 vp->cb_break_before = afs_calc_vnode_cb_break(vnode); 119 119 if (vnode->lock_state != AFS_VNODE_LOCK_NONE) 120 120 op->flags |= AFS_OPERATION_CUR_ONLY; 121 + if (vp->modification) 122 + set_bit(AFS_VNODE_MODIFYING, &vnode->flags); 121 123 } 122 124 123 125 if (vp->fid.vnode) ··· 227 225 228 226 if (op->ops && op->ops->put) 229 227 op->ops->put(op); 228 + if (op->file[0].modification) 229 + clear_bit(AFS_VNODE_MODIFYING, &op->file[0].vnode->flags); 230 + if (op->file[1].modification && op->file[1].vnode != op->file[0].vnode) 231 + clear_bit(AFS_VNODE_MODIFYING, &op->file[1].vnode->flags); 230 232 if (op->file[0].put_vnode) 231 233 iput(&op->file[0].vnode->vfs_inode); 232 234 if (op->file[1].put_vnode)
+4 -2
fs/afs/inode.c
··· 294 294 op->flags &= ~AFS_OPERATION_DIR_CONFLICT; 295 295 } 296 296 } else if (vp->scb.have_status) { 297 - if (vp->dv_before + vp->dv_delta != vp->scb.status.data_version && 298 - vp->speculative) 297 + if (vp->speculative && 298 + (test_bit(AFS_VNODE_MODIFYING, &vnode->flags) || 299 + vp->dv_before != vnode->status.data_version)) 299 300 /* Ignore the result of a speculative bulk status fetch 300 301 * if it splits around a modification op, thereby 301 302 * appearing to regress the data version. ··· 912 911 } 913 912 op->ctime = attr->ia_ctime; 914 913 op->file[0].update_ctime = 1; 914 + op->file[0].modification = true; 915 915 916 916 op->ops = &afs_setattr_operation; 917 917 ret = afs_do_sync_operation(op);
+2
fs/afs/internal.h
··· 645 645 #define AFS_VNODE_PSEUDODIR 7 /* set if Vnode is a pseudo directory */ 646 646 #define AFS_VNODE_NEW_CONTENT 8 /* Set if file has new content (create/trunc-0) */ 647 647 #define AFS_VNODE_SILLY_DELETED 9 /* Set if file has been silly-deleted */ 648 + #define AFS_VNODE_MODIFYING 10 /* Set if we're performing a modification op */ 648 649 649 650 struct list_head wb_keys; /* List of keys available for writeback */ 650 651 struct list_head pending_locks; /* locks waiting to be granted */ ··· 763 762 bool set_size:1; /* Must update i_size */ 764 763 bool op_unlinked:1; /* True if file was unlinked by op */ 765 764 bool speculative:1; /* T if speculative status fetch (no vnode lock) */ 765 + bool modification:1; /* Set if the content gets modified */ 766 766 }; 767 767 768 768 /*
+1
fs/afs/write.c
··· 377 377 378 378 afs_op_set_vnode(op, 0, vnode); 379 379 op->file[0].dv_delta = 1; 380 + op->file[0].modification = true; 380 381 op->store.write_iter = iter; 381 382 op->store.pos = pos; 382 383 op->store.size = size;