Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

fuse: fix fuseblk i_blkbits for iomap partial writes

On regular fuse filesystems, i_blkbits is set to PAGE_SHIFT, which means
any iomap partial write will mark the entire folio as uptodate. However,
fuseblk filesystems work differently and allow the blocksize to be less
than the page size. This may lead to data corruption if a fuseblk
filesystem sets its blocksize to less than the page size, uses the
writeback cache, and does a partial write followed by a read that happens
before the write has undergone writeback. The partial write does not mark
the folio uptodate, so the read will read in the entire folio from disk,
which will overwrite the partially written data.

The long-term solution for this, which will also be needed for fuse to
enable large folios with the writeback cache on, is to have fuse also
use iomap for folio reads, but until that is done, the cleanest
workaround is to use the page size for fuseblk's internal kernel inode
blksize/blkbits values while maintaining current behavior for stat().

This was verified using ntfs-3g:
$ sudo mkfs.ntfs -f -c 512 /dev/vdd1
$ sudo ntfs-3g /dev/vdd1 ~/fuseblk
$ stat ~/fuseblk/hi.txt
IO Block: 512

Fixes: a4c9ab1d4975 ("fuse: use iomap for buffered writes")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
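
The corruption scenario described in the commit message can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not kernel code: `struct toy_folio`, `toy_partial_write`, `toy_read`, and `toy_write_survives_read` are hypothetical names, the page size is assumed to be 4096 bytes, and the uptodate handling is reduced to the single rule stated in the message (a partial write marks the folio uptodate only when the block size equals the page size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for one cached single-page folio. */
struct toy_folio {
	unsigned char data[TOY_PAGE_SIZE];
	bool uptodate;	/* folio-level flag, the only state the read path checks */
};

/*
 * Model of a buffered partial write. Following the commit message: when the
 * inode block size equals the page size, the write leaves the whole folio
 * marked uptodate; with a smaller block size the folio-level flag stays
 * clear because only part of the folio was written.
 */
static void toy_partial_write(struct toy_folio *f, unsigned int blksize,
			      size_t off, const void *buf, size_t len)
{
	assert(off + len <= TOY_PAGE_SIZE);
	memcpy(f->data + off, buf, len);
	if (blksize == TOY_PAGE_SIZE)
		f->uptodate = true;
}

/*
 * Model of a read path with no per-block state: a folio that is not
 * uptodate is refilled from disk in full, clobbering cached dirty bytes.
 */
static void toy_read(struct toy_folio *f, const unsigned char *disk)
{
	if (!f->uptodate) {
		memcpy(f->data, disk, TOY_PAGE_SIZE);
		f->uptodate = true;
	}
}

/* Returns true if the written bytes survive a read issued before writeback. */
static bool toy_write_survives_read(unsigned int blksize)
{
	static unsigned char disk[TOY_PAGE_SIZE];	/* zero-filled "disk" */
	struct toy_folio f = { .uptodate = false };	/* data starts zeroed */
	const char msg[] = "hi";

	toy_partial_write(&f, blksize, 0, msg, sizeof(msg));
	toy_read(&f, disk);
	return memcmp(f.data, msg, sizeof(msg)) == 0;
}
```

In this model, calling `toy_write_survives_read(TOY_PAGE_SIZE)` keeps the written bytes, while `toy_write_survives_read(512)` loses them, which mirrors why the patch forces the internal block size to the page size.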

Authored by Joanne Koong and committed by Miklos Szeredi
bd24d210 79569946

+21 -2
+1 -1
fs/fuse/dir.c
@@ -1199,7 +1199,7 @@
 	if (attr->blksize != 0)
 		blkbits = ilog2(attr->blksize);
 	else
-		blkbits = inode->i_sb->s_blocksize_bits;
+		blkbits = fc->blkbits;
 
 	stat->blksize = 1 << blkbits;
 }
+8
fs/fuse/fuse_i.h
@@ -975,6 +975,14 @@
 		/* Request timeout (in jiffies). 0 = no timeout */
 		unsigned int req_timeout;
 	} timeout;
+
+	/*
+	 * This is a workaround until fuse uses iomap for reads.
+	 * For fuseblk servers, this represents the blocksize passed in at
+	 * mount time and for regular fuse servers, this is equivalent to
+	 * inode->i_blkbits.
+	 */
+	u8 blkbits;
 };
 
 /*
+12 -1
fs/fuse/inode.c
@@ -292,7 +292,7 @@
 	if (attr->blksize)
 		fi->cached_i_blkbits = ilog2(attr->blksize);
 	else
-		fi->cached_i_blkbits = inode->i_sb->s_blocksize_bits;
+		fi->cached_i_blkbits = fc->blkbits;
 
 	/*
 	 * Don't set the sticky bit in i_mode, unless we want the VFS
@@ -1810,10 +1810,21 @@
 		err = -EINVAL;
 		if (!sb_set_blocksize(sb, ctx->blksize))
 			goto err;
+		/*
+		 * This is a workaround until fuse hooks into iomap for reads.
+		 * Use PAGE_SIZE for the blocksize else if the writeback cache
+		 * is enabled, buffered writes go through iomap and a read may
+		 * overwrite partially written data if blocksize < PAGE_SIZE
+		 */
+		fc->blkbits = sb->s_blocksize_bits;
+		if (ctx->blksize != PAGE_SIZE &&
+		    !sb_set_blocksize(sb, PAGE_SIZE))
+			goto err;
 #endif
 	} else {
 		sb->s_blocksize = PAGE_SIZE;
 		sb->s_blocksize_bits = PAGE_SHIFT;
+		fc->blkbits = sb->s_blocksize_bits;
 	}
 
 	sb->s_subtype = ctx->subtype;
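
Taken together, the three hunks separate the block size reported by stat() from the block size the kernel uses internally. The following user-space sketch models that split; it is an illustration under assumptions, not the kernel implementation. `struct sketch_mount`, `sketch_ilog2`, `sketch_fill_super`, and `sketch_stat_blksize` are hypothetical names, a 4 KiB page is assumed, and `sketch_ilog2` only accepts powers of two.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12u				/* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-ins for the superblock and fuse_conn fields. */
struct sketch_mount {
	unsigned int sb_blocksize_bits;	/* models sb->s_blocksize_bits */
	unsigned int fc_blkbits;	/* models fc->blkbits */
};

/* Integer log2 for a power-of-two block size (assumption of this sketch). */
static unsigned int sketch_ilog2(unsigned int v)
{
	unsigned int bits = 0;

	assert(v != 0 && (v & (v - 1)) == 0);
	while (v >>= 1)
		bits++;
	return bits;
}

/*
 * Mount-time choice: remember the requested block size for stat(), but run
 * the superblock at page size. Regular fuse mounts (is_fuseblk == 0) use the
 * page size for both values.
 */
static struct sketch_mount sketch_fill_super(int is_fuseblk, unsigned int blksize)
{
	struct sketch_mount m;

	m.fc_blkbits = is_fuseblk ? sketch_ilog2(blksize) : SKETCH_PAGE_SHIFT;
	m.sb_blocksize_bits = SKETCH_PAGE_SHIFT;
	return m;
}

/* stat() path: prefer the server-provided blksize, else fall back to fc_blkbits. */
static unsigned int sketch_stat_blksize(const struct sketch_mount *m,
					unsigned int attr_blksize)
{
	unsigned int blkbits = attr_blksize ? sketch_ilog2(attr_blksize)
					    : m->fc_blkbits;

	return 1u << blkbits;
}
```

For a fuseblk mount with a 512-byte block size, `sketch_fill_super(1, 512)` yields an internal block size of one page while `sketch_stat_blksize(&m, 0)` still reports 512, matching the `IO Block: 512` output shown in the ntfs-3g verification.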