Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

writeback: don't block sync for filesystems with no data integrity guarantees

Add a SB_I_NO_DATA_INTEGRITY superblock flag for filesystems that cannot
guarantee data persistence on sync (e.g. fuse). For superblocks with this
flag set, sync kicks off writeback of dirty inodes but does not wait
for the flusher threads to complete the writeback.
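
As a rough illustration of the control flow described above, here is a user-space sketch, not kernel code. Only the flag name and its value come from this patch; `struct model_sb` and the `model_*` helpers are hypothetical stand-ins for the superblock, wakeup_flusher_threads_bdi() and the waiting half of sync_inodes_sb().

```c
#include <assert.h>
#include <stddef.h>

#define SB_I_NO_DATA_INTEGRITY 0x00008000 /* value taken from the patch */

/* Hypothetical, heavily reduced model of a superblock. */
struct model_sb {
	unsigned long s_iflags;
	int writeback_kicked;	/* how often writeback was started */
	int waits;		/* how often sync blocked on completion */
};

/* Stand-in for kicking the flusher threads: starts work, never blocks. */
static void model_kick_writeback(struct model_sb *sb)
{
	sb->writeback_kicked++;
}

/* Stand-in for waiting on the flusher threads and on inode writeback. */
static void model_wait_for_writeback(struct model_sb *sb)
{
	sb->waits++;
}

/* Models the new early return: flagged superblocks never reach the wait. */
static void model_sync_inodes_sb(struct model_sb *sb)
{
	assert(sb != NULL);

	if (sb->s_iflags & SB_I_NO_DATA_INTEGRITY) {
		model_kick_writeback(sb);
		return;
	}

	model_kick_writeback(sb);
	model_wait_for_writeback(sb);
}
```

Calling model_sync_inodes_sb() on a superblock with the flag set bumps only `writeback_kicked`; without the flag, `waits` is bumped as well. The real function does considerably more (locking, work queuing) that this sketch leaves out.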

This replaces the per-inode AS_NO_DATA_INTEGRITY mapping flag added in
commit f9a49aa302a0 ("fs/writeback: skip AS_NO_DATA_INTEGRITY mappings
in wait_sb_inodes()"). The flag belongs at the superblock level because
data integrity is a filesystem-wide property, not a per-inode one.
Having this flag at the superblock level also allows us to skip having
to iterate every dirty inode in wait_sb_inodes() only to skip each inode
individually.
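
To make the iteration argument concrete, the following user-space sketch compares the two schemes by counting how many inodes each one visits. It is a simplified model under assumed names: `struct walk_inode`, `struct walk_sb` and both functions are hypothetical, and the real wait_sb_inodes() walks a locked writeback list rather than an array.

```c
#include <assert.h>
#include <stddef.h>

#define SB_I_NO_DATA_INTEGRITY 0x00008000 /* value taken from the patch */

/* Hypothetical model types; not the kernel's structures. */
struct walk_inode {
	int no_data_integrity;	/* models the old per-mapping flag */
};

struct walk_sb {
	unsigned long s_iflags;
	const struct walk_inode *inodes;
	size_t nr_inodes;
};

/* Old scheme: visit every inode, then skip each flagged one individually. */
static size_t visits_per_inode_flag(const struct walk_sb *sb, size_t *waited)
{
	size_t visits = 0;

	assert(sb != NULL && waited != NULL);
	*waited = 0;
	for (size_t i = 0; i < sb->nr_inodes; i++) {
		visits++;
		if (sb->inodes[i].no_data_integrity)
			continue;
		(*waited)++;
	}
	return visits;
}

/* New scheme: a single superblock check avoids the walk entirely. */
static size_t visits_sb_flag(const struct walk_sb *sb, size_t *waited)
{
	size_t visits = 0;

	assert(sb != NULL && waited != NULL);
	*waited = 0;
	if (sb->s_iflags & SB_I_NO_DATA_INTEGRITY)
		return 0;
	for (size_t i = 0; i < sb->nr_inodes; i++) {
		visits++;
		(*waited)++;
	}
	return visits;
}
```

For a flagged superblock with N inodes, the per-inode scheme visits all N and waits on none, while the superblock check visits zero. Neither scheme waits on anything, so the walk in the old scheme was pure overhead.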

Prior to this commit, mappings with no data integrity guarantees skipped
waiting on writeback completion but still waited on the flusher threads
to finish initiating the writeback. Waiting on the flusher threads is
unnecessary. This commit kicks off writeback but does not wait on the
flusher threads. This change properly addresses a recent report [1] for
a suspend-to-RAM hang seen on fuse-overlayfs that was caused by waiting
on the flusher threads to finish:

Workqueue: pm_fs_sync pm_fs_sync_work_fn
Call Trace:
<TASK>
__schedule+0x457/0x1720
schedule+0x27/0xd0
wb_wait_for_completion+0x97/0xe0
sync_inodes_sb+0xf8/0x2e0
__iterate_supers+0xdc/0x160
ksys_sync+0x43/0xb0
pm_fs_sync_work_fn+0x17/0xa0
process_one_work+0x193/0x350
worker_thread+0x1a1/0x310
kthread+0xfc/0x240
ret_from_fork+0x243/0x280
ret_from_fork_asm+0x1a/0x30
</TASK>

On fuse this is problematic because there are paths that may cause the
flusher thread to block:

- systemd may freeze the user session cgroups first, which freezes the
  fuse daemon, before invoking the kernel suspend. The kernel suspend
  triggers ->write_inode(), which on fuse issues a synchronous setattr
  request that cannot be processed while the daemon is frozen.
- If the daemon is buggy and cannot properly complete writeback,
  initiating writeback on a dirty folio already under writeback leads
  to writeback_get_folio() -> folio_prepare_writeback() ->
  unconditional wait on writeback to finish, which causes a hang.

This commit restores fuse to its prior behavior before tmp folios were
removed, where sync was essentially a no-op.

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a-asuvfrbKXbEwwDSctvemF+6zfhdnuzO65Pt8HsFSRw@mail.gmail.com/T/#m632c4648e9cafc4239299887109ebd880ac6c5c1

Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: John <therealgraysky@proton.me>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260320005145.2483161-2-joannelkoong@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

authored by Joanne Koong and committed by Christian Brauner
76f9377c 7e575234

+15 -20
+12 -6
fs/fs-writeback.c
···
 		 * The mapping can appear untagged while still on-list since we
 		 * do not have the mapping lock. Skip it here, wb completion
 		 * will remove it.
-		 *
-		 * If the mapping does not have data integrity semantics,
-		 * there's no need to wait for the writeout to complete, as the
-		 * mapping cannot guarantee that data is persistently stored.
 		 */
-		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
-		    mapping_no_data_integrity(mapping))
+		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
 			continue;

 		spin_unlock_irq(&sb->s_inode_wblist_lock);
···
 	 */
 	if (bdi == &noop_backing_dev_info)
 		return;
+
+	/*
+	 * If the superblock has SB_I_NO_DATA_INTEGRITY set, there's no need to
+	 * wait for the writeout to complete, as the filesystem cannot guarantee
+	 * data persistence on sync. Just kick off writeback and return.
+	 */
+	if (sb->s_iflags & SB_I_NO_DATA_INTEGRITY) {
+		wakeup_flusher_threads_bdi(bdi, WB_REASON_SYNC);
+		return;
+	}
+
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));

 	/* protect against inode wb switch, see inode_switch_wbs_work_fn() */
+1 -3
fs/fuse/file.c
···

 	inode->i_fop = &fuse_file_operations;
 	inode->i_data.a_ops = &fuse_file_aops;
-	if (fc->writeback_cache) {
+	if (fc->writeback_cache)
 		mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
-		mapping_set_no_data_integrity(&inode->i_data);
-	}

 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
+1
fs/fuse/inode.c
···
 	sb->s_export_op = &fuse_export_operations;
 	sb->s_iflags |= SB_I_IMA_UNVERIFIABLE_SIGNATURE;
 	sb->s_iflags |= SB_I_NOIDMAP;
+	sb->s_iflags |= SB_I_NO_DATA_INTEGRITY;
 	if (sb->s_user_ns != &init_user_ns)
 		sb->s_iflags |= SB_I_UNTRUSTED_MOUNTER;
 	sb->s_flags &= ~(SB_NOSEC | SB_I_VERSION);
+1
include/linux/fs/super_types.h
···
 #define SB_I_NOUMASK 0x00001000 /* VFS does not apply umask */
 #define SB_I_NOIDMAP 0x00002000 /* No idmapped mounts on this superblock */
 #define SB_I_ALLOW_HSM 0x00004000 /* Allow HSM events on this superblock */
+#define SB_I_NO_DATA_INTEGRITY 0x00008000 /* fs cannot guarantee data persistence on sync */

 #endif /* _LINUX_FS_SUPER_TYPES_H */
-11
include/linux/pagemap.h
···
 	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
 	AS_KERNEL_FILE = 10,	/* mapping for a fake kernel file that shouldn't
 				   account usage to user cgroups */
-	AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
···
 static inline bool mapping_writeback_may_deadlock_on_reclaim(const struct address_space *mapping)
 {
 	return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
-}
-
-static inline void mapping_set_no_data_integrity(struct address_space *mapping)
-{
-	set_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
-}
-
-static inline bool mapping_no_data_integrity(const struct address_space *mapping)
-{
-	return test_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
 }

 static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)