Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

NFS: fix writeback in presence of errors

When running xfstest generic/751, under certain conditions a
writeback IO can get stuck in one of the following two patterns.

Pattern#1: writeback IO experiences ENOSPC on an offset smaller
than the filesize. For example:
write offset=0 len=4096 how=unstable OK
write offset=8192 len=4096 how=unstable OK
write offset=12288 len=4096 how=unstable ENOSPC
write offset=4096 len=4096 how=unstable ENOSPC
The client sends a commit and receives a verifier that differs
from that of the last successful write. It marks the pages dirty and
writeback retries. But it again sends the writes unstable and falls
into the same pattern: it runs into the ENOSPC error and sends a
commit because the writes were sent unstable.
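
The verifier check that drives this retry can be sketched in user space. This is a minimal model and not the kernel implementation: `toy_verifier`, `toy_request` and `toy_commit_done` are hypothetical names, and the only protocol fact relied on is that an NFS write verifier is 8 opaque bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* NFSv3 and NFSv4 write verifiers are 8 opaque bytes. */
#define TOY_VERF_SIZE 8

/* Hypothetical stand-ins for the kernel's verifier and request types. */
struct toy_verifier {
	unsigned char data[TOY_VERF_SIZE];
};

struct toy_request {
	struct toy_verifier wb_verf;	/* verifier saved from the WRITE reply */
	bool dirty;			/* true if the data must be written again */
};

/*
 * Compare the verifier saved at WRITE time with the one returned by COMMIT.
 * Returns true when the data is known to be on stable storage; otherwise
 * marks the request dirty so that writeback sends it again.
 */
static bool toy_commit_done(struct toy_request *req,
			    const struct toy_verifier *commit_verf)
{
	assert(req != NULL && commit_verf != NULL);

	if (memcmp(req->wb_verf.data, commit_verf->data, TOY_VERF_SIZE) == 0) {
		req->dirty = false;
		return true;
	}
	req->dirty = true;	/* mismatch: write the page again */
	return false;
}
```

If the retried write goes out unstable again and the verifier keeps changing, this check fails on every pass, which is the loop described above.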

Pattern#2: an unstable write followed by a short write and ENOSPC.
write offset=0 len=4096 how=unstable OK
write offset=4096 len=4096 how=unstable returns OK but count=100
write offset=4196 len=3996 how=stable returns ENOSPC
The client sends a commit and receives a verifier different from
that of the last unstable write. The same behaviour is retried in a loop.
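
The arithmetic behind the resend after a short write can be sketched as follows. This is a user-space illustration under stated assumptions: `toy_write_args` and `toy_resend_after_short_write` are hypothetical names, and the only behaviour modelled is that the remainder starts right after the bytes the server accepted and is sent stable, as in the trace above.

```c
#include <assert.h>
#include <stddef.h>

enum toy_stable_how { TOY_UNSTABLE, TOY_FILE_SYNC };

/* Hypothetical stand-in for the arguments of one WRITE call. */
struct toy_write_args {
	unsigned long long offset;	/* file offset of the write */
	size_t count;			/* number of bytes requested */
	enum toy_stable_how stable;	/* requested stability */
};

/*
 * Advance the arguments past the bytes the server accepted and ask for
 * the remainder to be written stably.
 */
static void toy_resend_after_short_write(struct toy_write_args *args,
					 size_t written)
{
	assert(written < args->count);	/* only meaningful for a short write */

	args->offset += written;
	args->count -= written;
	args->stable = TOY_FILE_SYNC;
}
```

For a 4096-byte write at offset 4096 that returns count=100, the remainder is 3996 bytes starting at offset 4196.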

This patch identifies those conditions and marks the affected
requests to be done synchronously. A previous solution tried to
record this in the nfs_page; however, that is not persistent, so
record it in the nfs_open_context instead.
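
The lifecycle of the new flag can be sketched in user space. This is a simplified model, not the kernel code: the bit number comes from the patch, but `toy_open_context`, the non-atomic `toy_*_bit` helpers and the three event functions are hypothetical stand-ins for the kernel's atomic bitops and write/commit completion paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NFS_CONTEXT_WRITE_SYNC 5	/* bit number taken from the patch */

enum flag_stable_how { FLAG_UNSTABLE, FLAG_FILE_SYNC };

/* Hypothetical stand-in for struct nfs_open_context. */
struct toy_open_context {
	unsigned long flags;
};

/* Non-atomic stand-ins for the kernel's set_bit/clear_bit/test_bit. */
static void toy_set_bit(int nr, unsigned long *w) { *w |= 1UL << nr; }
static void toy_clear_bit(int nr, unsigned long *w) { *w &= ~(1UL << nr); }
static bool toy_test_bit(int nr, const unsigned long *w)
{
	return (*w >> nr) & 1UL;
}

/* Choose the stability of the next write from the context flag. */
static enum flag_stable_how toy_pick_stable(const struct toy_open_context *ctx)
{
	return toy_test_bit(NFS_CONTEXT_WRITE_SYNC, &ctx->flags) ?
		FLAG_FILE_SYNC : FLAG_UNSTABLE;
}

/* WRITE reply: a short write sets the flag, a full write that still
 * needs a commit clears it. */
static void toy_on_write_reply(struct toy_open_context *ctx, size_t requested,
			       size_t written, bool needs_commit)
{
	assert(written <= requested);

	if (written < requested)
		toy_set_bit(NFS_CONTEXT_WRITE_SYNC, &ctx->flags);
	else if (needs_commit)
		toy_clear_bit(NFS_CONTEXT_WRITE_SYNC, &ctx->flags);
}

/* COMMIT reply: a verifier mismatch sets the flag before the retry. */
static void toy_on_commit_reply(struct toy_open_context *ctx,
				bool verifier_matches)
{
	if (!verifier_matches)
		toy_set_bit(NFS_CONTEXT_WRITE_SYNC, &ctx->flags);
}
```

Because the flag lives in the open context rather than in a single request, it is still set when the redirtied pages are written back again.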

Furthermore, the same problem occurs in the localio code path, so
recognize there as well that the IO needs to be done synchronously.

Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Authored by Olga Kornievskaia and committed by Trond Myklebust
5d3869a4 43ea7036

+27 -1
+14 -1
fs/nfs/localio.c
···
 	file_start_write(filp);
 	n_iters = atomic_read(&iocb->n_iters);
 	for (int i = 0; i < n_iters ; i++) {
+		size_t icount;
+
 		if (iocb->iter_is_dio_aligned[i]) {
 			iocb->kiocb.ki_flags |= IOCB_DIRECT;
 			/* Only use AIO completion if DIO-aligned segment is last */
···
 		if (status == -EIOCBQUEUED)
 			continue;
 		/* Break on completion, errors, or short writes */
+		icount = iov_iter_count(&iocb->iters[i]);
 		if (nfs_local_pgio_done(iocb, status) || status < 0 ||
-		    (size_t)status < iov_iter_count(&iocb->iters[i])) {
+		    (size_t)status < icount) {
+			if ((size_t)status < icount) {
+				struct nfs_lock_context *ctx =
+					iocb->hdr->req->wb_lock_context;
+
+				set_bit(NFS_CONTEXT_WRITE_SYNC,
+					&ctx->open_context->flags);
+			}
 			nfs_local_write_iocb_done(iocb);
 			break;
 		}
···
 		__func__, hdr->args.count, hdr->args.offset,
 		(hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");

+	if (test_bit(NFS_CONTEXT_WRITE_SYNC,
+		     &hdr->req->wb_lock_context->open_context->flags))
+		hdr->args.stable = NFS_FILE_SYNC;
 	switch (hdr->args.stable) {
 	default:
 		break;
+3
fs/nfs/pagelist.c
···

 	nfs_page_group_lock(req);

+	if (test_bit(NFS_CONTEXT_WRITE_SYNC,
+		     &req->wb_lock_context->open_context->flags))
+		desc->pg_ioflags |= FLUSH_STABLE;
 	subreq = req;
 	subreq_size = subreq->wb_bytes;
 	for(;;) {
+9
fs/nfs/write.c
···
 			goto remove_req;
 		}
 		if (nfs_write_need_commit(hdr)) {
+			struct nfs_open_context *ctx =
+				hdr->req->wb_lock_context->open_context;
+
 			/* Reset wb_nio, since the write was successful. */
 			req->wb_nio = 0;
 			memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
+			clear_bit(NFS_CONTEXT_WRITE_SYNC, &ctx->flags);
 			nfs_mark_request_commit(req, hdr->lseg, &cinfo,
 				hdr->ds_commit_idx);
 			goto next;
···

 	if (resp->count < argp->count && !list_empty(&hdr->pages)) {
 		static unsigned long complain;
+		struct nfs_open_context *ctx =
+			hdr->req->wb_lock_context->open_context;

+		set_bit(NFS_CONTEXT_WRITE_SYNC, &ctx->flags);
 		/* This a short write! */
 		nfs_inc_stats(hdr->inode, NFSIOS_SHORTWRITE);

···
 		/* We have a mismatch. Write the page again */
 		dprintk(" mismatch\n");
 		nfs_mark_request_dirty(req);
+		set_bit(NFS_CONTEXT_WRITE_SYNC,
+			&req->wb_lock_context->open_context->flags);
 		atomic_long_inc(&NFS_I(data->inode)->redirtied_pages);
 	next:
 		nfs_unlock_and_release_request(req);
+1
include/linux/nfs_fs.h
···
 #define NFS_CONTEXT_BAD			(2)
 #define NFS_CONTEXT_UNLOCK		(3)
 #define NFS_CONTEXT_FILE_OPEN		(4)
+#define NFS_CONTEXT_WRITE_SYNC		(5)

 	struct nfs4_threshold	*mdsthreshold;
 	struct list_head	list;