Merge tag 'f2fs-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"This focuses on two primary updates for Android devices.

First, it sets hash-based file name lookup as the default method to
improve performance, while retaining an option to fall back to a
linear lookup.

Second, it resolves a persistent issue with the 'checkpoint=enable'
feature.

The update further boosts performance by prefetching node blocks,
merging FUA writes more efficiently, and optimizing block allocation
policies.

The release is rounded out by a comprehensive set of bug fixes that
address memory safety, data integrity, and potential system hangs,
along with minor documentation and code clean-ups.

Enhancements:
- add mount option and sysfs entry to tune the lookup mode
- dump more information and add a timeout when enabling/disabling
checkpoints
- readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode
- merge FUA command with the existing writes
- allocate HOT_DATA for IPU writes
- Use allocate_section_policy to control write priority in
multi-devices setups
- add reserved nodes for privileged users
- Add bggc_io_aware to adjust the priority of BG_GC when issuing IO
- show the list of donation files

Bug fixes:
- add missing dput() when printing the donation list
- fix UAF issue in f2fs_merge_page_bio()
- add sanity check on ei.len in __update_extent_tree_range()
- fix infinite loop in __insert_extent_tree()
- fix zero-sized extent for precache extents
- fix to mitigate overhead of f2fs_zero_post_eof_page()
- fix to avoid migrating empty section
- fix to truncate first page in error path of f2fs_truncate()
- fix to update map->m_next_extent correctly in f2fs_map_blocks()
- fix wrong layout information on 16KB page
- fix to do sanity check on node footer for non inode dnode
- fix to avoid NULL pointer dereference in
f2fs_check_quota_consistency()
- fix to detect potential corrupted nid in free_nid_list
- fix to clear unusable_cap for checkpoint=enable
- fix to zero data after EOF for compressed file correctly
- fix to avoid overflow while left shift operation
- fix condition in __allow_reserved_blocks()"

* tag 'f2fs-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (43 commits)
f2fs: add missing dput() when printing the donation list
f2fs: fix UAF issue in f2fs_merge_page_bio()
f2fs: readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode
f2fs: add sanity check on ei.len in __update_extent_tree_range()
f2fs: fix infinite loop in __insert_extent_tree()
f2fs: fix zero-sized extent for precache extents
f2fs: fix to mitigate overhead of f2fs_zero_post_eof_page()
f2fs: fix to avoid migrating empty section
f2fs: fix to truncate first page in error path of f2fs_truncate()
f2fs: fix to update map->m_next_extent correctly in f2fs_map_blocks()
f2fs: fix wrong layout information on 16KB page
f2fs: clean up error handing of f2fs_submit_page_read()
f2fs: avoid unnecessary folio_clear_uptodate() for cleanup
f2fs: merge FUA command with the existing writes
f2fs: allocate HOT_DATA for IPU writes
f2fs: Use allocate_section_policy to control write priority in multi-devices setups
Documentation: f2fs: Reword title
Documentation: f2fs: Indent compression_mode option list
Documentation: f2fs: Wrap snippets in literal code blocks
Documentation: f2fs: Span write hint table section rows
...

Linus Torvalds 8 months ago 86d563ac f0712c20

+697 -199

18 changed files

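Documentation/ABI/testing/sysfs-fs-f2fs
Documentation/filesystems/f2fs.rst
fs/f2fs/checkpoint.c
fs/f2fs/compress.c
fs/f2fs/data.c
fs/f2fs/dir.c
fs/f2fs/extent_cache.c
fs/f2fs/f2fs.h
fs/f2fs/file.c
fs/f2fs/gc.c
fs/f2fs/node.c
fs/f2fs/node.h
fs/f2fs/recovery.c
fs/f2fs/segment.c
fs/f2fs/segment.h
fs/f2fs/super.c
fs/f2fs/sysfs.c
include/linux/f2fs_fs.h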

+53 -3

Documentation/ABI/testing/sysfs-fs-f2fs

@@ -822,8 +822,8 @@
 Date:		September 2024
 Contact:	"Daeho Jeong" <daehojeong@google.com>
 Description:	It controls the valid block ratio threshold not to trigger excessive GC
-		for zoned deivces. The initial value of it is 95(%). F2FS will stop the
-		background GC thread from intiating GC for sections having valid blocks
+		for zoned devices. The initial value of it is 95(%). F2FS will stop the
+		background GC thread from initiating GC for sections having valid blocks
 		exceeding the ratio.
 
 What:		/sys/fs/f2fs/<disk>/max_read_extent_count
@@ -847,7 +847,7 @@
 		filesystem level GC. To do that, we can reserve the space using
 		reserved_blocks. However, it is not enough, since this extra space should
 		not be shown to users. So, with this new sysfs node, we can hide the space
-		by substracting reserved_blocks from total bytes.
+		by subtracting reserved_blocks from total bytes.
 
 What:		/sys/fs/f2fs/<disk>/encoding_flags
 Date:		April 2025
@@ -883,3 +883,53 @@
 Contact:	"Daeho Jeong" <daehojeong@google.com>
 Description:	Control GC algorithm for boost GC. 0: cost benefit, 1: greedy
 		Default: 1
+
+What:		/sys/fs/f2fs/<disk>/effective_lookup_mode
+Date:		August 2025
+Contact:	"Daniel Lee" <chullee@google.com>
+Description:
+		This is a read-only entry to show the effective directory lookup mode
+		F2FS is currently using for casefolded directories.
+		This considers both the "lookup_mode" mount option and the on-disk
+		encoding flag, SB_ENC_NO_COMPAT_FALLBACK_FL.
+
+		Possible values are:
+		- "perf": Hash-only lookup.
+		- "compat": Hash-based lookup with a linear search fallback enabled
+		- "auto:perf": lookup_mode is auto and fallback is disabled on-disk
+		- "auto:compat": lookup_mode is auto and fallback is enabled on-disk
+
+What:		/sys/fs/f2fs/<disk>/bggc_io_aware
+Date:		August 2025
+Contact:	"Liao Yuanhong" <liaoyuanhong@vivo.com>
+Description:	Used to adjust the BG_GC priority when pending IO, with a default value
+		of 0. Specifically, for ZUFS, the default value is 1.
+
+		==================  ======================================================
+		value               description
+		bggc_io_aware = 0   skip background GC if there is any kind of pending IO
+		bggc_io_aware = 1   skip background GC if there is pending read IO
+		bggc_io_aware = 2   don't aware IO for background GC
+		==================  ======================================================
+
+What:		/sys/fs/f2fs/<disk>/allocate_section_hint
+Date:		August 2025
+Contact:	"Liao Yuanhong" <liaoyuanhong@vivo.com>
+Description:	Indicates the hint section between the first device and others in multi-devices
+		setup. It defaults to the end of the first device in sections. For a single storage
+		device, it defaults to the total number of sections. It can be manually set to match
+		scenarios where multi-devices are mapped to the same dm device.
+
+What:		/sys/fs/f2fs/<disk>/allocate_section_policy
+Date:		August 2025
+Contact:	"Liao Yuanhong" <liaoyuanhong@vivo.com>
+Description:	Controls write priority in multi-devices setups. A value of 0 means normal writing.
+		A value of 1 prioritizes writing to devices before the allocate_section_hint. A value of 2
+		prioritizes writing to devices after the allocate_section_hint. The default is 0.
+
+		===========================  ==========================================================
+		value                        description
+		allocate_section_policy = 0  Normal writing
+		allocate_section_policy = 1  Prioritize writing to section before allocate_section_hint
+		allocate_section_policy = 2  Prioritize writing to section after allocate_section_hint
+		===========================  ==========================================================
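
The bggc_io_aware table above maps three values to three behaviours. A minimal userspace sketch of that decision follows. It is an illustration only, not the kernel code: the enum names are taken from the fs/f2fs/f2fs.h part of this merge, while `sketch_bggc_may_run()` and its two boolean parameters are hypothetical stand-ins for the kernel's in-flight IO checks.

```c
#include <stdbool.h>

/* The three documented bggc_io_aware values, in sysfs order (0, 1, 2). */
enum bggc_io_aware_policy {
	AWARE_ALL_IO,	/* 0: skip background GC if any IO is pending */
	AWARE_READ_IO,	/* 1: skip background GC only if read IO is pending */
	AWARE_NONE,	/* 2: do not consider pending IO at all */
};

/*
 * Hypothetical helper: returns true when pending IO does not block
 * background GC under the given policy.
 */
static bool sketch_bggc_may_run(enum bggc_io_aware_policy policy,
				bool pending_read, bool pending_write)
{
	if (policy == AWARE_READ_IO && pending_read)
		return false;
	if (policy == AWARE_ALL_IO && (pending_read || pending_write))
		return false;
	return true;
}
```

With policy 1, a device that is busy with writes only still lets background GC proceed, which is the difference from the default policy 0.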

+75 -41

Documentation/filesystems/f2fs.rst

@@ -1,8 +1,11 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-==========================================
-WHAT IS Flash-Friendly File System (F2FS)?
-==========================================
+=================================
+Flash-Friendly File System (F2FS)
+=================================
+
+Overview
+========
 
 NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
 been equipped on a variety systems ranging from mobile to server systems. Since
@@ -173,9 +176,12 @@
 			 persist data of regular and symlink.
 reserve_root=%d		 Support configuring reserved space which is used for
 			 allocation from a privileged user with specified uid or
-			 gid, unit: 4KB, the default limit is 0.2% of user blocks.
-resuid=%d		 The user ID which may use the reserved blocks.
-resgid=%d		 The group ID which may use the reserved blocks.
+			 gid, unit: 4KB, the default limit is 12.5% of user blocks.
+reserve_node=%d		 Support configuring reserved nodes which are used for
+			 allocation from a privileged user with specified uid or
+			 gid, the default limit is 12.5% of all nodes.
+resuid=%d		 The user ID which may use the reserved blocks and nodes.
+resgid=%d		 The group ID which may use the reserved blocks and nodes.
 fault_injection=%d	 Enable fault injection in all supported types with
 			 specified injection rate.
 fault_type=%d		 Support configuring fault injection type, should be
@@ -291,9 +297,13 @@
 			 "lz4", "zstd" and "lzo-rle" algorithm.
 compress_algorithm=%s:%d Control compress algorithm and its compress level, now, only
 			 "lz4" and "zstd" support compress level config.
+
+			 ========= ===========
 			 algorithm level range
+			 ========= ===========
 			 lz4       3 - 16
 			 zstd      1 - 22
+			 ========= ===========
 compress_log_size=%u	 Support configuring compress cluster size. The size will
 			 be 4KB * (1 << %u). The default and minimum sizes are 16KB.
 compress_extension=%s	 Support adding specified extension, so that f2fs can enable
@@ -357,6 +367,7 @@
 			 panic immediately, continue without doing anything, and remount
 			 the partition in read-only mode. By default it uses "continue"
 			 mode.
+
 			 ====================== =============== =============== ========
 			 mode                   continue        remount-ro      panic
 			 ====================== =============== =============== ========
@@ -370,6 +381,25 @@
 			 ====================== =============== =============== ========
 nat_bits		 Enable nat_bits feature to enhance full/empty nat blocks access,
 			 by default it's disabled.
+lookup_mode=%s		 Control the directory lookup behavior for casefolded
+			 directories. This option has no effect on directories
+			 that do not have the casefold feature enabled.
+
+			 ================== ========================================
+			 Value              Description
+			 ================== ========================================
+			 perf               (Default) Enforces a hash-only lookup.
+			                    The linear search fallback is always
+			                    disabled, ignoring the on-disk flag.
+			 compat             Enables the linear search fallback for
+			                    compatibility with directory entries
+			                    created by older kernel that used a
+			                    different case-folding algorithm.
+			                    This mode ignores the on-disk flag.
+			 auto               F2FS determines the mode based on the
+			                    on-disk `SB_ENC_NO_COMPAT_FALLBACK_FL`
+			                    flag.
+			 ================== ========================================
 ======================== ============================================================
 
 Debugfs Entries
@@ -795,11 +825,13 @@
 extension list        "                        "
 
 -- buffered io
+------------------------------------------------------------------
 N/A                   COLD_DATA                WRITE_LIFE_EXTREME
 N/A                   HOT_DATA                 WRITE_LIFE_SHORT
 N/A                   WARM_DATA                WRITE_LIFE_NOT_SET
 
 -- direct io
+------------------------------------------------------------------
 WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
 WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
 WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
@@ -915,24 +947,26 @@
 enable compression on a regular inode).
 
 1) compress_mode=fs
-This is the default option. f2fs does automatic compression in the writeback of the
-compression enabled files.
+
+   This is the default option. f2fs does automatic compression in the writeback of the
+   compression enabled files.
 
 2) compress_mode=user
-This disables the automatic compression and gives the user discretion of choosing the
-target file and the timing. The user can do manual compression/decompression on the
-compression enabled files using F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE
-ioctls like the below.
 
-To decompress a file,
+   This disables the automatic compression and gives the user discretion of choosing the
+   target file and the timing. The user can do manual compression/decompression on the
+   compression enabled files using F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE
+   ioctls like the below.
 
-fd = open(filename, O_WRONLY, 0);
-ret = ioctl(fd, F2FS_IOC_DECOMPRESS_FILE);
+   To decompress a file::
 
-To compress a file,
+     fd = open(filename, O_WRONLY, 0);
+     ret = ioctl(fd, F2FS_IOC_DECOMPRESS_FILE);
 
-fd = open(filename, O_WRONLY, 0);
-ret = ioctl(fd, F2FS_IOC_COMPRESS_FILE);
+   To compress a file::
+
+     fd = open(filename, O_WRONLY, 0);
+     ret = ioctl(fd, F2FS_IOC_COMPRESS_FILE);
 
 NVMe Zoned Namespace devices
 ----------------------------
@@ -962,32 +996,32 @@
 external usage is complete, the device aliasing file can be deleted, releasing
 the reserved space back to F2FS for its own use.
 
-<use-case>
+.. code-block::
 
-# ls /dev/vd*
-/dev/vdb (32GB) /dev/vdc (32GB)
-# mkfs.ext4 /dev/vdc
-# mkfs.f2fs -c /dev/vdc@vdc.file /dev/vdb
-# mount /dev/vdb /mnt/f2fs
-# ls -l /mnt/f2fs
-vdc.file
-# df -h
-/dev/vdb 64G 33G 32G 52% /mnt/f2fs
+   # ls /dev/vd*
+   /dev/vdb (32GB) /dev/vdc (32GB)
+   # mkfs.ext4 /dev/vdc
+   # mkfs.f2fs -c /dev/vdc@vdc.file /dev/vdb
+   # mount /dev/vdb /mnt/f2fs
+   # ls -l /mnt/f2fs
+   vdc.file
+   # df -h
+   /dev/vdb 64G 33G 32G 52% /mnt/f2fs
 
-# mount -o loop /dev/vdc /mnt/ext4
-# df -h
-/dev/vdb 64G 33G 32G 52% /mnt/f2fs
-/dev/loop7 32G 24K 30G 1% /mnt/ext4
-# umount /mnt/ext4
+   # mount -o loop /dev/vdc /mnt/ext4
+   # df -h
+   /dev/vdb 64G 33G 32G 52% /mnt/f2fs
+   /dev/loop7 32G 24K 30G 1% /mnt/ext4
+   # umount /mnt/ext4
 
-# f2fs_io getflags /mnt/f2fs/vdc.file
-get a flag on /mnt/f2fs/vdc.file ret=0, flags=nocow(pinned),immutable
-# f2fs_io setflags noimmutable /mnt/f2fs/vdc.file
-get a flag on noimmutable ret=0, flags=800010
-set a flag on /mnt/f2fs/vdc.file ret=0, flags=noimmutable
-# rm /mnt/f2fs/vdc.file
-# df -h
-/dev/vdb 64G 753M 64G 2% /mnt/f2fs
+   # f2fs_io getflags /mnt/f2fs/vdc.file
+   get a flag on /mnt/f2fs/vdc.file ret=0, flags=nocow(pinned),immutable
+   # f2fs_io setflags noimmutable /mnt/f2fs/vdc.file
+   get a flag on noimmutable ret=0, flags=800010
+   set a flag on /mnt/f2fs/vdc.file ret=0, flags=noimmutable
+   # rm /mnt/f2fs/vdc.file
+   # df -h
+   /dev/vdb 64G 753M 64G 2% /mnt/f2fs
 
 So, the key idea is, user can do any file operations on /dev/vdc, and
 reclaim the space after the use, while the space is counted as /data.

+53

fs/f2fs/checkpoint.c

@@ -1442,6 +1442,34 @@
 	return get_sectors_written(sbi->sb->s_bdev);
 }
 
+static inline void stat_cp_time(struct cp_control *cpc, enum cp_time type)
+{
+	cpc->stats.times[type] = ktime_get();
+}
+
+static inline void check_cp_time(struct f2fs_sb_info *sbi, struct cp_control *cpc)
+{
+	unsigned long long sb_diff, cur_diff;
+	enum cp_time ct;
+
+	sb_diff = (u64)ktime_ms_delta(sbi->cp_stats.times[CP_TIME_END],
+					sbi->cp_stats.times[CP_TIME_START]);
+	cur_diff = (u64)ktime_ms_delta(cpc->stats.times[CP_TIME_END],
+					cpc->stats.times[CP_TIME_START]);
+
+	if (cur_diff > sb_diff) {
+		sbi->cp_stats = cpc->stats;
+		if (cur_diff < CP_LONG_LATENCY_THRESHOLD)
+			return;
+
+		f2fs_warn(sbi, "checkpoint was blocked for %llu ms", cur_diff);
+		for (ct = CP_TIME_START; ct < CP_TIME_MAX - 1; ct++)
+			f2fs_warn(sbi, "Step#%d: %llu ms", ct,
+				(u64)ktime_ms_delta(cpc->stats.times[ct + 1],
+						cpc->stats.times[ct]));
+	}
+}
+
 static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 {
 	struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
@@ -1458,6 +1486,8 @@
 
 	/* Flush all the NAT/SIT pages */
 	f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
+
+	stat_cp_time(cpc, CP_TIME_SYNC_META);
 
 	/* start to update checkpoint, cp ver is already updated previously */
 	ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi, true));
@@ -1555,20 +1585,26 @@
 
 	/* Here, we have one bio having CP pack except cp pack 2 page */
 	f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
+	stat_cp_time(cpc, CP_TIME_SYNC_CP_META);
+
 	/* Wait for all dirty meta pages to be submitted for IO */
 	f2fs_wait_on_all_pages(sbi, F2FS_DIRTY_META);
+	stat_cp_time(cpc, CP_TIME_WAIT_DIRTY_META);
 
 	/* wait for previous submitted meta pages writeback */
 	f2fs_wait_on_all_pages(sbi, F2FS_WB_CP_DATA);
+	stat_cp_time(cpc, CP_TIME_WAIT_CP_DATA);
 
 	/* flush all device cache */
 	err = f2fs_flush_device_cache(sbi);
 	if (err)
 		return err;
+	stat_cp_time(cpc, CP_TIME_FLUSH_DEVICE);
 
 	/* barrier and flush checkpoint cp pack 2 page if it can */
 	commit_checkpoint(sbi, ckpt, start_blk);
 	f2fs_wait_on_all_pages(sbi, F2FS_WB_CP_DATA);
+	stat_cp_time(cpc, CP_TIME_WAIT_LAST_CP);
 
 	/*
 	 * invalidate intermediate page cache borrowed from meta inode which are
@@ -1613,6 +1649,8 @@
 	unsigned long long ckpt_ver;
 	int err = 0;
 
+	stat_cp_time(cpc, CP_TIME_START);
+
 	if (f2fs_readonly(sbi->sb) || f2fs_hw_is_readonly(sbi))
 		return -EROFS;
 
@@ -1623,6 +1661,8 @@
 	}
 	if (cpc->reason != CP_RESIZE)
 		f2fs_down_write(&sbi->cp_global_sem);
+
+	stat_cp_time(cpc, CP_TIME_LOCK);
 
 	if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
 		((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
@@ -1638,6 +1678,8 @@
 	err = block_operations(sbi);
 	if (err)
 		goto out;
+
+	stat_cp_time(cpc, CP_TIME_OP_LOCK);
 
 	trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish block_ops");
 
@@ -1678,6 +1720,8 @@
 
 	f2fs_flush_sit_entries(sbi, cpc);
 
+	stat_cp_time(cpc, CP_TIME_FLUSH_META);
+
 	/* save inmem log status */
 	f2fs_save_inmem_curseg(sbi);
 
@@ -1695,6 +1739,8 @@
 	stat_inc_cp_count(sbi);
 stop:
 	unblock_operations(sbi);
+	stat_cp_time(cpc, CP_TIME_END);
+	check_cp_time(sbi, cpc);
 
 	if (cpc->reason & CP_RECOVERY)
 		f2fs_notice(sbi, "checkpoint: version = %llx", ckpt_ver);
@@ -1778,6 +1824,7 @@
 	llist_for_each_entry_safe(req, next, dispatch_list, llnode) {
 		diff = (u64)ktime_ms_delta(ktime_get(), req->queue_time);
 		req->ret = ret;
+		req->delta_time = diff;
 		complete(&req->wait);
 
 		sum_diff += diff;
@@ -1872,6 +1919,12 @@
 		wait_for_completion(&req.wait);
 	else
 		flush_remained_ckpt_reqs(sbi, &req);
+
+	if (unlikely(req.delta_time >= CP_LONG_LATENCY_THRESHOLD)) {
+		f2fs_warn_ratelimited(sbi,
+			"blocked on checkpoint for %u ms", cprc->peak_time);
+		dump_stack();
+	}
 
 	return req.ret;
 }
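
The timing logic added by check_cp_time() has two parts: a timestamp is recorded after each checkpoint step, and the per-step breakdown is reported only when a run is both a new peak and at least as long as the threshold. The sketch below models that in plain userspace C with millisecond integers instead of ktime_t. Both `sketch_` helpers are hypothetical and nothing here is kernel API. Only the 5000 ms value is taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as CP_LONG_LATENCY_THRESHOLD in the patch, in milliseconds. */
#define SKETCH_LONG_LATENCY_MS 5000u

/*
 * Hypothetical helper: given n timestamps (in ms) taken at consecutive
 * steps, store the n - 1 per-step durations in steps_ms and return the
 * total elapsed time between the first and the last timestamp.
 */
static uint64_t sketch_step_durations(const uint64_t *times_ms, size_t n,
				      uint64_t *steps_ms)
{
	size_t i;

	if (n < 2)
		return 0;
	for (i = 0; i + 1 < n; i++)
		steps_ms[i] = times_ms[i + 1] - times_ms[i];
	return times_ms[n - 1] - times_ms[0];
}

/*
 * Hypothetical helper: remember the slowest run seen so far in *peak_ms
 * and return 1 only when the current run is a new peak that also reaches
 * the threshold.
 */
static int sketch_should_warn(uint64_t *peak_ms, uint64_t cur_ms)
{
	if (cur_ms <= *peak_ms)
		return 0;
	*peak_ms = cur_ms;
	return cur_ms >= SKETCH_LONG_LATENCY_MS;
}
```

Because the peak is compared first, a later run that is slow but not slower than an earlier one does not produce another breakdown in this model.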

+22 -17

fs/f2fs/compress.c

@@ -1215,9 +1215,11 @@
 {
 	void *fsdata = NULL;
 	struct page *pagep;
+	struct page **rpages;
 	int log_cluster_size = F2FS_I(inode)->i_log_cluster_size;
 	pgoff_t start_idx = from >> (PAGE_SHIFT + log_cluster_size) <<
 							log_cluster_size;
+	int i;
 	int err;
 
 	err = f2fs_is_compressed_cluster(inode, start_idx);
@@ -1238,27 +1240,30 @@
 	if (err <= 0)
 		return err;
 
-	if (err > 0) {
-		struct page **rpages = fsdata;
-		int cluster_size = F2FS_I(inode)->i_cluster_size;
-		int i;
+	rpages = fsdata;
 
-		for (i = cluster_size - 1; i >= 0; i--) {
-			struct folio *folio = page_folio(rpages[i]);
-			loff_t start = folio->index << PAGE_SHIFT;
+	for (i = (1 << log_cluster_size) - 1; i >= 0; i--) {
+		struct folio *folio = page_folio(rpages[i]);
+		loff_t start = (loff_t)folio->index << PAGE_SHIFT;
+		loff_t offset = from > start ? from - start : 0;
 
-			if (from <= start) {
-				folio_zero_segment(folio, 0, folio_size(folio));
-			} else {
-				folio_zero_segment(folio, from - start,
-						folio_size(folio));
-				break;
-			}
-		}
+		folio_zero_segment(folio, offset, folio_size(folio));
 
-		f2fs_compress_write_end(inode, fsdata, start_idx, true);
+		if (from >= start)
+			break;
 	}
-	return 0;
+
+	f2fs_compress_write_end(inode, fsdata, start_idx, true);
+
+	err = filemap_write_and_wait_range(inode->i_mapping,
+			round_down(from, 1 << log_cluster_size << PAGE_SHIFT),
+			LLONG_MAX);
+	if (err)
+		return err;
+
+	truncate_pagecache(inode, from);
+
+	return f2fs_do_truncate_blocks(inode, round_up(from, PAGE_SIZE), lock);
 }
 
 static int f2fs_write_compressed_pages(struct compress_ctx *cc,
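
The rewritten loop walks the cluster's pages from last to first, zeroes each page from an in-page offset, and stops at the first page whose start is at or below the truncation point. The arithmetic can be checked in isolation with the sketch below. It assumes 4 KiB pages and uses hypothetical helper names. It models only the offset and loop-termination logic, not folios or the page cache.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Hypothetical helper: in-page offset from which the page at page_index
 * is zeroed when the file is truncated to `from` bytes. Pages that start
 * at or after `from` are zeroed from offset 0.
 */
static uint64_t sketch_zero_offset(uint64_t page_index, uint64_t from)
{
	uint64_t start = page_index << SKETCH_PAGE_SHIFT;

	return from > start ? from - start : 0;
}

/*
 * Hypothetical helper: number of pages the backward walk touches in a
 * cluster of cluster_pages pages whose first page has index first_index.
 */
static unsigned int sketch_pages_visited(uint64_t first_index,
					 unsigned int cluster_pages,
					 uint64_t from)
{
	unsigned int visited = 0;
	int i;

	for (i = (int)cluster_pages - 1; i >= 0; i--) {
		uint64_t start = (first_index + (uint64_t)i) << SKETCH_PAGE_SHIFT;

		visited++;	/* zeroed from sketch_zero_offset() onward */
		if (from >= start)
			break;	/* earlier pages lie entirely below `from` */
	}
	return visited;
}
```

For a four-page cluster starting at index 0 and a truncation point of 5000 bytes, the walk touches pages 3, 2 and 1, and page 1 is zeroed from byte 904 of the page onward.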

+20 -39

fs/f2fs/data.c

@@ -733,9 +733,11 @@
 static bool io_type_is_mergeable(struct f2fs_bio_info *io,
 						struct f2fs_io_info *fio)
 {
+	blk_opf_t mask = ~(REQ_PREFLUSH | REQ_FUA);
+
 	if (io->fio.op != fio->op)
 		return false;
-	return io->fio.op_flags == fio->op_flags;
+	return (io->fio.op_flags & mask) == (fio->op_flags & mask);
 }
 
 static bool io_is_mergeable(struct f2fs_sb_info *sbi, struct bio *bio,
@@ -911,7 +913,7 @@
 	if (fio->io_wbc)
 		wbc_account_cgroup_owner(fio->io_wbc, folio, folio_size(folio));
 
-	inc_page_count(fio->sbi, WB_DATA_TYPE(data_folio, false));
+	inc_page_count(fio->sbi, WB_DATA_TYPE(folio, false));
 
 	*fio->last_block = fio->new_blkaddr;
 	*fio->bio = bio;
@@ -1083,7 +1085,7 @@
 }
 
 /* This can handle encryption stuffs */
-static int f2fs_submit_page_read(struct inode *inode, struct folio *folio,
+static void f2fs_submit_page_read(struct inode *inode, struct folio *folio,
 				 block_t blkaddr, blk_opf_t op_flags,
 				 bool for_write)
 {
@@ -1092,23 +1094,16 @@
 
 	bio = f2fs_grab_read_bio(inode, blkaddr, 1, op_flags,
 					folio->index, for_write);
-	if (IS_ERR(bio))
-		return PTR_ERR(bio);
 
 	/* wait for GCed page writeback via META_MAPPING */
 	f2fs_wait_on_block_writeback(inode, blkaddr);
 
-	if (!bio_add_folio(bio, folio, PAGE_SIZE, 0)) {
-		iostat_update_and_unbind_ctx(bio);
-		if (bio->bi_private)
-			mempool_free(bio->bi_private, bio_post_read_ctx_pool);
-		bio_put(bio);
-		return -EFAULT;
-	}
+	if (!bio_add_folio(bio, folio, PAGE_SIZE, 0))
+		f2fs_bug_on(sbi, 1);
+
 	inc_page_count(sbi, F2FS_RD_DATA);
 	f2fs_update_iostat(sbi, NULL, FS_DATA_READ_IO, F2FS_BLKSIZE);
 	f2fs_submit_read_bio(sbi, bio, DATA);
-	return 0;
 }
 
 static void __set_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr)
@@ -1265,10 +1260,8 @@
 		return folio;
 	}
 
-	err = f2fs_submit_page_read(inode, folio, dn.data_blkaddr,
+	f2fs_submit_page_read(inode, folio, dn.data_blkaddr,
 						op_flags, for_write);
-	if (err)
-		goto put_err;
 	return folio;
 
 put_err:
@@ -1572,6 +1565,9 @@
 	pgofs =	(pgoff_t)map->m_lblk;
 	end = pgofs + maxblocks;
 
+	if (flag == F2FS_GET_BLOCK_PRECACHE)
+		mode = LOOKUP_NODE_RA;
+
 next_dnode:
 	if (map->m_may_create) {
 		if (f2fs_lfs_mode(sbi))
@@ -1778,12 +1774,13 @@
 		if (map->m_flags & F2FS_MAP_MAPPED) {
 			unsigned int ofs = start_pgofs - map->m_lblk;
 
-			f2fs_update_read_extent_cache_range(&dn,
-				start_pgofs, map->m_pblk + ofs,
-				map->m_len - ofs);
+			if (map->m_len > ofs)
+				f2fs_update_read_extent_cache_range(&dn,
+					start_pgofs, map->m_pblk + ofs,
+					map->m_len - ofs);
 		}
 		if (map->m_next_extent)
-			*map->m_next_extent = pgofs + 1;
+			*map->m_next_extent = is_hole ? pgofs + 1 : pgofs;
 	}
 	f2fs_put_dnode(&dn);
 unlock_out:
@@ -2145,16 +2142,10 @@
 		f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA);
 		bio = NULL;
 	}
-	if (bio == NULL) {
+	if (bio == NULL)
 		bio = f2fs_grab_read_bio(inode, block_nr, nr_pages,
 				f2fs_ra_op_flags(rac), index,
 				false);
-		if (IS_ERR(bio)) {
-			ret = PTR_ERR(bio);
-			bio = NULL;
-			goto out;
-		}
-	}
 
 	/*
 	 * If the page is under writeback, we need to wait for
@@ -2303,18 +2294,10 @@
 			bio = NULL;
 		}
 
-		if (!bio) {
+		if (!bio)
 			bio = f2fs_grab_read_bio(inode, blkaddr, nr_pages - i,
 					f2fs_ra_op_flags(rac),
 					folio->index, for_write);
-			if (IS_ERR(bio)) {
-				ret = PTR_ERR(bio);
-				f2fs_decompress_end_io(dic, ret, true);
-				f2fs_put_dnode(&dn);
-				*bio_ret = NULL;
-				return ret;
-			}
-		}
 
 		if (!bio_add_folio(bio, folio, blocksize, 0))
 			goto submit_and_realloc;
@@ -3639,11 +3622,9 @@
 			err = -EFSCORRUPTED;
 			goto put_folio;
 		}
-		err = f2fs_submit_page_read(use_cow ?
+		f2fs_submit_page_read(use_cow ?
 				F2FS_I(inode)->cow_inode : inode,
 				folio, blkaddr, 0, true);
-		if (err)
-			goto put_folio;
 
 		folio_lock(folio);
 		if (unlikely(folio->mapping != mapping)) {
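
The change to io_type_is_mergeable() lets a write carrying REQ_PREFLUSH or REQ_FUA merge with queued writes whose flags are otherwise identical, by masking those two bits out before comparing. The sketch below shows the comparison on its own. The `SKETCH_REQ_*` bit positions are illustrative assumptions and do not match the kernel's real REQ_* values, and `sketch_flags_mergeable()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; the kernel's REQ_* values differ. */
#define SKETCH_REQ_SYNC		(1u << 0)
#define SKETCH_REQ_PREFLUSH	(1u << 1)
#define SKETCH_REQ_FUA		(1u << 2)

/*
 * Hypothetical helper: two sets of op flags are treated as mergeable when
 * they are equal after the flush-related bits have been masked out.
 */
static bool sketch_flags_mergeable(uint32_t queued, uint32_t incoming)
{
	const uint32_t mask = ~(SKETCH_REQ_PREFLUSH | SKETCH_REQ_FUA);

	return (queued & mask) == (incoming & mask);
}
```

A difference in any other bit, such as the sync flag in this sketch, still prevents the merge.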

+16 -1

fs/f2fs/dir.c

@@ -16,6 +16,21 @@
 #include "xattr.h"
 #include <trace/events/f2fs.h>
 
+static inline bool f2fs_should_fallback_to_linear(struct inode *dir)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
+
+	switch (F2FS_OPTION(sbi).lookup_mode) {
+	case LOOKUP_PERF:
+		return false;
+	case LOOKUP_COMPAT:
+		return true;
+	case LOOKUP_AUTO:
+		return !sb_no_casefold_compat_fallback(sbi->sb);
+	}
+	return false;
+}
+
 #if IS_ENABLED(CONFIG_UNICODE)
 extern struct kmem_cache *f2fs_cf_name_slab;
 #endif
@@ -366,7 +381,7 @@
 
 out:
 #if IS_ENABLED(CONFIG_UNICODE)
-	if (!sb_no_casefold_compat_fallback(dir->i_sb) &&
+	if (f2fs_should_fallback_to_linear(dir) &&
 		IS_CASEFOLDED(dir) && !de && use_hash) {
 		use_hash = false;
 		goto start_find_entry;
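
The fallback decision combines the lookup_mode mount option with the on-disk SB_ENC_NO_COMPAT_FALLBACK_FL flag, and the documented effective_lookup_mode strings follow the same two inputs. The sketch below restates both mappings in userspace C. The enum mirrors the one added to fs/f2fs/f2fs.h, while the two `sketch_` functions and the boolean `no_fallback_on_disk` parameter are hypothetical. How the kernel's sysfs code actually builds the string is not shown in this part of the diff.

```c
#include <stdbool.h>

/* Mirrors enum f2fs_lookup_mode from the fs/f2fs/f2fs.h change. */
enum f2fs_lookup_mode {
	LOOKUP_PERF,
	LOOKUP_COMPAT,
	LOOKUP_AUTO,
};

/*
 * Hypothetical helper: should a failed hash lookup in a casefolded
 * directory be retried as a linear search?
 */
static bool sketch_fallback_to_linear(enum f2fs_lookup_mode mode,
				      bool no_fallback_on_disk)
{
	switch (mode) {
	case LOOKUP_PERF:
		return false;			/* on-disk flag ignored */
	case LOOKUP_COMPAT:
		return true;			/* on-disk flag ignored */
	case LOOKUP_AUTO:
		return !no_fallback_on_disk;	/* follow the on-disk flag */
	}
	return false;
}

/*
 * Hypothetical helper: the documented effective_lookup_mode value for the
 * same two inputs.
 */
static const char *sketch_effective_lookup_mode(enum f2fs_lookup_mode mode,
						bool no_fallback_on_disk)
{
	switch (mode) {
	case LOOKUP_PERF:
		return "perf";
	case LOOKUP_COMPAT:
		return "compat";
	case LOOKUP_AUTO:
		return no_fallback_on_disk ? "auto:perf" : "auto:compat";
	}
	return "unknown";
}
```

Only the auto mode consults the on-disk flag. The perf and compat modes override it in either direction.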

+15

fs/f2fs/extent_cache.c

@@ -604,7 +604,13 @@
 			p = &(*p)->rb_right;
 			leftmost = false;
 		} else {
+			f2fs_err_ratelimited(sbi, "%s: corrupted extent, type: %d, "
+				"extent node in rb tree [%u, %u, %u], age [%llu, %llu], "
+				"extent node to insert [%u, %u, %u], age [%llu, %llu]",
+				__func__, et->type, en->ei.fofs, en->ei.blk, en->ei.len, en->ei.age,
+				en->ei.last_blocks, ei->fofs, ei->blk, ei->len, ei->age, ei->last_blocks);
 			f2fs_bug_on(sbi, 1);
+			return NULL;
 		}
 	}
 
@@ -663,6 +669,15 @@
 
 	if (!et)
 		return;
+
+	if (unlikely(len == 0)) {
+		f2fs_err_ratelimited(sbi, "%s: extent len is zero, type: %d, "
+			"extent [%u, %u, %u], age [%llu, %llu]",
+			__func__, type, tei->fofs, tei->blk, tei->len,
+			tei->age, tei->last_blocks);
+		f2fs_bug_on(sbi, 1);
+		return;
+	}
 
 	if (type == EX_READ)
 		trace_f2fs_update_read_extent_tree_range(inode, fofs, len,

+72 -16

fs/f2fs/f2fs.h

@@ -131,6 +131,7 @@
  * string rather than using the MS_LAZYTIME flag, so this must remain.
  */
 #define F2FS_MOUNT_LAZYTIME		0x40000000
+#define F2FS_MOUNT_RESERVE_NODE		0x80000000
 
 #define F2FS_OPTION(sbi)	((sbi)->mount_opt)
 #define clear_opt(sbi, option)	(F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option)
@@ -155,6 +156,18 @@
 	BLKZONE_ALLOC_PRIOR_CONV,	/* Prioritize writing to conventional zones */
 };
 
+enum bggc_io_aware_policy {
+	AWARE_ALL_IO,		/* skip background GC if there is any kind of pending IO */
+	AWARE_READ_IO,		/* skip background GC if there is pending read IO */
+	AWARE_NONE,		/* don't aware IO for background GC */
+};
+
+enum device_allocation_policy {
+	ALLOCATE_FORWARD_NOHINT,
+	ALLOCATE_FORWARD_WITHIN_HINT,
+	ALLOCATE_FORWARD_FROM_HINT,
+};
+
 /*
  * An implementation of an rwsem that is explicitly unfair to readers. This
  * prevents priority inversion when a low-priority reader acquires the read lock
@@ -172,6 +185,7 @@
 struct f2fs_mount_info {
 	unsigned int opt;
 	block_t root_reserved_blocks;	/* root reserved blocks */
+	block_t root_reserved_nodes;	/* root reserved nodes */
 	kuid_t s_resuid;		/* reserved blocks for uid */
 	kgid_t s_resgid;		/* reserved blocks for gid */
 	int active_logs;		/* # of active logs */
@@ -212,6 +226,7 @@
 	int compress_mode;			/* compression mode */
 	unsigned char extensions[COMPRESS_EXT_NUM][F2FS_EXTENSION_LEN];	/* extensions */
 	unsigned char noextensions[COMPRESS_EXT_NUM][F2FS_EXTENSION_LEN]; /* extensions */
+	unsigned int lookup_mode;
 };
 
 #define F2FS_FEATURE_ENCRYPT			0x00000001
@@ -266,14 +281,36 @@
 #define DEF_CP_INTERVAL			60	/* 60 secs */
 #define DEF_IDLE_INTERVAL		5	/* 5 secs */
 #define DEF_DISABLE_INTERVAL		5	/* 5 secs */
+#define DEF_ENABLE_INTERVAL		16	/* 16 secs */
 #define DEF_DISABLE_QUICK_INTERVAL	1	/* 1 secs */
 #define DEF_UMOUNT_DISCARD_TIMEOUT	5	/* 5 secs */
+
+enum cp_time {
+	CP_TIME_START,		/* begin */
+	CP_TIME_LOCK,		/* after cp_global_sem */
+	CP_TIME_OP_LOCK,	/* after block_operation */
+	CP_TIME_FLUSH_META,	/* after flush sit/nat */
+	CP_TIME_SYNC_META,	/* after sync_meta_pages */
+	CP_TIME_SYNC_CP_META,	/* after sync cp meta pages */
+	CP_TIME_WAIT_DIRTY_META,/* after wait on dirty meta */
+	CP_TIME_WAIT_CP_DATA,	/* after wait on cp data */
+	CP_TIME_FLUSH_DEVICE,	/* after flush device cache */
+	CP_TIME_WAIT_LAST_CP,	/* after wait on last cp pack */
+	CP_TIME_END,		/* after unblock_operation */
+	CP_TIME_MAX,
+};
+
+/* time cost stats of checkpoint */
+struct cp_stats {
+	ktime_t times[CP_TIME_MAX];
+};
 
 struct cp_control {
 	int reason;
 	__u64 trim_start;
 	__u64 trim_end;
 	__u64 trim_minlen;
+	struct cp_stats stats;
 };
 
 /*
@@ -334,7 +371,10 @@
 	struct completion wait;		/* completion for checkpoint done */
 	struct llist_node llnode;	/* llist_node to be linked in wait queue */
 	int ret;			/* return code of checkpoint */
-	ktime_t queue_time;		/* request queued time */
+	union {
+		ktime_t queue_time;	/* request queued time */
+		ktime_t delta_time;	/* time in queue */
+	};
 };
 
 struct ckpt_req_control {
@@ -349,6 +389,9 @@
 	unsigned int cur_time;		/* cur wait time in msec for currently issued checkpoint */
 	unsigned int peak_time;		/* peak wait time in msec until now */
 };
+
+/* a time threshold that checkpoint was blocked for, unit: ms */
+#define CP_LONG_LATENCY_THRESHOLD	5000
 
 /* for the bitmap indicate blocks to be discarded */
 struct discard_entry {
@@ -1375,6 +1418,7 @@
 	DISCARD_TIME,
 	GC_TIME,
 	DISABLE_TIME,
+	ENABLE_TIME,
 	UMOUNT_DISCARD_TIMEOUT,
 	MAX_TIME,
 };
@@ -1452,6 +1496,12 @@
 	FOREGROUND,
 	MAX_CALL_TYPE,
 	TOTAL_CALL = FOREGROUND,
+};
+
+enum f2fs_lookup_mode {
+	LOOKUP_PERF,
+	LOOKUP_COMPAT,
+	LOOKUP_AUTO,
 };
 
 static inline int f2fs_test_bit(unsigned int nr, char *addr);
@@ -1643,6 +1693,7 @@
 	unsigned long last_time[MAX_TIME];	/* to store time in jiffies */
 	long interval_time[MAX_TIME];		/* to store thresholds */
 	struct ckpt_req_control cprc_info;	/* for checkpoint request control */
+	struct cp_stats cp_stats;		/* for time stat of checkpoint */
 
 	struct inode_management im[MAX_INO_ENTRY];	/* manage inode cache */
 
@@ -1810,6 +1861,9 @@
 	spinlock_t dev_lock;			/* protect dirty_device */
 	bool aligned_blksize;			/* all devices has the same logical blksize */
 	unsigned int first_seq_zone_segno;	/* first segno in sequential zone */
+	unsigned int bggc_io_aware;		/* For adjust the BG_GC priority when pending IO */
+	unsigned int allocate_section_hint;	/* the boundary position between devices */
+	unsigned int allocate_section_policy;	/* determine the section writing priority */
 
 	/* For write statistics */
 	u64 sectors_written_start;
@@ -2362,13 +2416,11 @@
 	return ofs == XATTR_NODE_OFFSET;
 }
 
-static inline bool __allow_reserved_blocks(struct f2fs_sb_info *sbi,
+static inline bool __allow_reserved_root(struct f2fs_sb_info *sbi,
 					struct inode *inode, bool cap)
 {
 	if (!inode)
 		return true;
-	if (!test_opt(sbi, RESERVE_ROOT))
-		return false;
 	if (IS_NOQUOTA(inode))
 		return true;
 	if (uid_eq(F2FS_OPTION(sbi).s_resuid, current_fsuid()))
@@ -2389,7 +2441,7 @@
 	avail_user_block_count = sbi->user_block_count -
 					sbi->current_reserved_blocks;
 
-	if (!__allow_reserved_blocks(sbi, inode, cap))
+	if (test_opt(sbi, RESERVE_ROOT) && !__allow_reserved_root(sbi, inode, cap))
 		avail_user_block_count -= F2FS_OPTION(sbi).root_reserved_blocks;
 
 	if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
@@ -2747,7 +2799,7 @@
 					struct inode *inode, bool is_inode)
 {
 	block_t	valid_block_count;
-	unsigned int valid_node_count;
+	unsigned int valid_node_count, avail_user_node_count;
 	unsigned int avail_user_block_count;
 	int err;
 
@@ -2769,15 +2821,20 @@
 	spin_lock(&sbi->stat_lock);
 
 	valid_block_count = sbi->total_valid_block_count + 1;
-	avail_user_block_count = get_available_block_count(sbi, inode, false);
+	avail_user_block_count = get_available_block_count(sbi, inode,
+						test_opt(sbi, RESERVE_NODE));
 
 	if (unlikely(valid_block_count > avail_user_block_count)) {
 		spin_unlock(&sbi->stat_lock);
 		goto enospc;
 	}
 
+	avail_user_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
+	if (test_opt(sbi, RESERVE_NODE) &&
+			!__allow_reserved_root(sbi, inode, true))
+		avail_user_node_count -= F2FS_OPTION(sbi).root_reserved_nodes;
 	valid_node_count = sbi->total_valid_node_count + 1;
-	if (unlikely(valid_node_count > sbi->total_node_count)) {
+	if (unlikely(valid_node_count > avail_user_node_count)) {
 		spin_unlock(&sbi->stat_lock);
 		goto enospc;
 	}
@@ -3004,13 +3061,10 @@
 	if (sbi->gc_mode == GC_URGENT_HIGH)
 		return true;
 
-	if (zoned_gc) {
-		if (is_inflight_read_io(sbi))
-			return false;
-	} else {
-		if (is_inflight_io(sbi, type))
-			return false;
-	}
+	if (sbi->bggc_io_aware == AWARE_READ_IO && is_inflight_read_io(sbi))
+		return false;
+	if (sbi->bggc_io_aware == AWARE_ALL_IO && is_inflight_io(sbi, type))
+		return false;
 
 	if (sbi->gc_mode == GC_URGENT_MID)
 		return true;
··· 3824 3770 * node.c 3825 3771 */ 3826 3772 struct node_info; 3773 + enum node_type; 3827
3774 3828 3775 int f2fs_check_nid_range(struct f2fs_sb_info *sbi, nid_t nid); 3829 3776 bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type); ··· 3847 3792 struct folio *f2fs_new_inode_folio(struct inode *inode); 3848 3793 struct folio *f2fs_new_node_folio(struct dnode_of_data *dn, unsigned int ofs); 3849 3794 void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid); 3850 - struct folio *f2fs_get_node_folio(struct f2fs_sb_info *sbi, pgoff_t nid); 3795 + struct folio *f2fs_get_node_folio(struct f2fs_sb_info *sbi, pgoff_t nid, 3796 + enum node_type node_type); 3851 3797 struct folio *f2fs_get_inode_folio(struct f2fs_sb_info *sbi, pgoff_t ino); 3852 3798 struct folio *f2fs_get_xnode_folio(struct f2fs_sb_info *sbi, pgoff_t xnid); 3853 3799 int f2fs_move_node_folio(struct folio *node_folio, int gc_type);
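The idle check in the hunk above replaces the zoned/non-zoned branch with a per-filesystem policy. A minimal userspace sketch of that decision follows. It is only a model: `struct inflight_model` and `bggc_may_run()` are hypothetical names, and the two booleans stand in for the kernel's in-flight IO counters, which the real `is_inflight_io()` also filters by request type.

```c
#include <stdbool.h>

/* Mirrors enum bggc_io_aware_policy from the f2fs.h hunk. */
enum bggc_io_aware_policy {
	AWARE_ALL_IO,	/* skip background GC if any IO is pending */
	AWARE_READ_IO,	/* skip background GC only if read IO is pending */
	AWARE_NONE,	/* ignore pending IO */
};

/* Hypothetical stand-in for the kernel's in-flight IO counters. */
struct inflight_model {
	bool read_pending;
	bool write_pending;
};

/* Returns true when background GC may proceed under the given policy. */
bool bggc_may_run(enum bggc_io_aware_policy policy,
		  const struct inflight_model *io)
{
	if (policy == AWARE_READ_IO && io->read_pending)
		return false;
	if (policy == AWARE_ALL_IO &&
	    (io->read_pending || io->write_pending))
		return false;
	return true;	/* AWARE_NONE, or nothing relevant in flight */
}
```

Under this model a pending write blocks background GC only with `AWARE_ALL_IO`, which the super.c hunk sets as the default; when `CONFIG_BLK_DEV_ZONED` is built in, that default is overridden to `AWARE_READ_IO`.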

+28 -21

fs/f2fs/file.c

···
 #include <trace/events/f2fs.h>
 #include <uapi/linux/f2fs.h>
 
-static void f2fs_zero_post_eof_page(struct inode *inode, loff_t new_size)
+static void f2fs_zero_post_eof_page(struct inode *inode,
+					loff_t new_size, bool lock)
 {
 	loff_t old_size = i_size_read(inode);
 
 	if (old_size >= new_size)
 		return;
 
+	if (mapping_empty(inode->i_mapping))
+		return;
+
+	if (lock)
+		filemap_invalidate_lock(inode->i_mapping);
 	/* zero or drop pages only in range of [old_size, new_size] */
-	truncate_pagecache(inode, old_size);
+	truncate_inode_pages_range(inode->i_mapping, old_size, new_size);
+	if (lock)
+		filemap_invalidate_unlock(inode->i_mapping);
 }
 
 static vm_fault_t f2fs_filemap_fault(struct vm_fault *vmf)
···
 
 	f2fs_bug_on(sbi, f2fs_has_inline_data(inode));
 
-	filemap_invalidate_lock(inode->i_mapping);
-	f2fs_zero_post_eof_page(inode, (folio->index + 1) << PAGE_SHIFT);
-	filemap_invalidate_unlock(inode->i_mapping);
+	f2fs_zero_post_eof_page(inode, (folio->index + 1) << PAGE_SHIFT, true);
 
 	file_update_time(vmf->vma->vm_file);
 	filemap_invalidate_lock_shared(inode->i_mapping);
···
 	/* we should check inline_data size */
 	if (!f2fs_may_inline_data(inode)) {
 		err = f2fs_convert_inline_inode(inode);
-		if (err)
+		if (err) {
+			/*
+			 * Always truncate page #0 to avoid page cache
+			 * leak in evict() path.
+			 */
+			truncate_inode_pages_range(inode->i_mapping,
+					F2FS_BLK_TO_BYTES(0),
+					F2FS_BLK_END_BYTES(0));
 			return err;
+		}
 	}
 
 	err = f2fs_truncate_blocks(inode, i_size_read(inode), true);
···
 	filemap_invalidate_lock(inode->i_mapping);
 
 	if (attr->ia_size > old_size)
-		f2fs_zero_post_eof_page(inode, attr->ia_size);
+		f2fs_zero_post_eof_page(inode, attr->ia_size, false);
 	truncate_setsize(inode, attr->ia_size);
 
 	if (attr->ia_size <= old_size)
···
 	if (ret)
 		return ret;
 
-	filemap_invalidate_lock(inode->i_mapping);
-	f2fs_zero_post_eof_page(inode, offset + len);
-	filemap_invalidate_unlock(inode->i_mapping);
+	f2fs_zero_post_eof_page(inode, offset + len, true);
 
 	pg_start = ((unsigned long long) offset) >> PAGE_SHIFT;
 	pg_end = ((unsigned long long) offset + len) >> PAGE_SHIFT;
···
 	f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 	filemap_invalidate_lock(inode->i_mapping);
 
-	f2fs_zero_post_eof_page(inode, offset + len);
+	f2fs_zero_post_eof_page(inode, offset + len, false);
 
 	f2fs_lock_op(sbi);
 	f2fs_drop_extent_tree(inode);
···
 	if (ret)
 		return ret;
 
-	filemap_invalidate_lock(mapping);
-	f2fs_zero_post_eof_page(inode, offset + len);
-	filemap_invalidate_unlock(mapping);
+	f2fs_zero_post_eof_page(inode, offset + len, true);
 
 	pg_start = ((unsigned long long) offset) >> PAGE_SHIFT;
 	pg_end = ((unsigned long long) offset + len) >> PAGE_SHIFT;
···
 	f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 	filemap_invalidate_lock(mapping);
 
-	f2fs_zero_post_eof_page(inode, offset + len);
+	f2fs_zero_post_eof_page(inode, offset + len, false);
 	truncate_pagecache(inode, offset);
 
 	while (!ret && idx > pg_start) {
···
 	if (err)
 		return err;
 
-	filemap_invalidate_lock(inode->i_mapping);
-	f2fs_zero_post_eof_page(inode, offset + len);
-	filemap_invalidate_unlock(inode->i_mapping);
+	f2fs_zero_post_eof_page(inode, offset + len, true);
 
 	f2fs_balance_fs(sbi, true);
 
···
 	if (err)
 		return err;
 
-	filemap_invalidate_lock(inode->i_mapping);
-	f2fs_zero_post_eof_page(inode, iocb->ki_pos + iov_iter_count(from));
-	filemap_invalidate_unlock(inode->i_mapping);
+	f2fs_zero_post_eof_page(inode,
+		iocb->ki_pos + iov_iter_count(from), true);
 	return count;
 }
 

+22 -3

fs/f2fs/gc.c

···
 		}
 
 		/* phase == 2 */
-		node_folio = f2fs_get_node_folio(sbi, nid);
+		node_folio = f2fs_get_node_folio(sbi, nid, NODE_TYPE_REGULAR);
 		if (IS_ERR(node_folio))
 			continue;
 
···
 	nid = le32_to_cpu(sum->nid);
 	ofs_in_node = le16_to_cpu(sum->ofs_in_node);
 
-	node_folio = f2fs_get_node_folio(sbi, nid);
+	node_folio = f2fs_get_node_folio(sbi, nid, NODE_TYPE_REGULAR);
 	if (IS_ERR(node_folio))
 		return false;
 
···
 		struct folio *sum_folio = filemap_get_folio(META_MAPPING(sbi),
 					GET_SUM_BLOCK(sbi, segno));
 
+		if (is_cursec(sbi, GET_SEC_FROM_SEG(sbi, segno))) {
+			f2fs_err(sbi, "%s: segment %u is used by log",
+							__func__, segno);
+			f2fs_bug_on(sbi, 1);
+			goto skip;
+		}
+
 		if (get_valid_blocks(sbi, segno, false) == 0)
 			goto freed;
 		if (gc_type == BG_GC && __is_large_section(sbi) &&
···
 
 		sum = folio_address(sum_folio);
 		if (type != GET_SUM_TYPE((&sum->footer))) {
-			f2fs_err(sbi, "Inconsistent segment (%u) type [%d, %d] in SSA and SIT",
+			f2fs_err(sbi, "Inconsistent segment (%u) type [%d, %d] in SIT and SSA",
 				 segno, type, GET_SUM_TYPE((&sum->footer)));
 			f2fs_stop_checkpoint(sbi, false,
 				STOP_CP_REASON_CORRUPTED_SUMMARY);
···
 			.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
 		};
 
+		/*
+		 * avoid migrating empty section, as it can be allocated by
+		 * log in parallel.
+		 */
+		if (!get_valid_blocks(sbi, segno, true))
+			continue;
+
 		if (is_cursec(sbi, GET_SEC_FROM_SEG(sbi, segno)))
 			continue;
 
···
 	SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
 	MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
 	MAIN_SECS(sbi) += secs;
+	if (sbi->allocate_section_hint > MAIN_SECS(sbi))
+		sbi->allocate_section_hint = MAIN_SECS(sbi);
 	FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
 	FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
 	F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
 
 	if (f2fs_is_multi_device(sbi)) {
 		int last_dev = sbi->s_ndevs - 1;
+
+		sbi->allocate_section_hint = FDEV(0).total_segments /
+					SEGS_PER_SEC(sbi);
 
 		FDEV(last_dev).total_segments =
 				(int)FDEV(last_dev).total_segments + segs;

+56 -21

fs/f2fs/node.c

···
 static struct kmem_cache *nat_entry_set_slab;
 static struct kmem_cache *fsync_node_entry_slab;
 
+static inline bool is_invalid_nid(struct f2fs_sb_info *sbi, nid_t nid)
+{
+	return nid < F2FS_ROOT_INO(sbi) || nid >= NM_I(sbi)->max_nid;
+}
+
 /*
  * Check whether the given nid is within node id range.
  */
 int f2fs_check_nid_range(struct f2fs_sb_info *sbi, nid_t nid)
 {
-	if (unlikely(nid < F2FS_ROOT_INO(sbi) || nid >= NM_I(sbi)->max_nid)) {
+	if (unlikely(is_invalid_nid(sbi, nid))) {
 		set_sbi_flag(sbi, SBI_NEED_FSCK);
 		f2fs_warn(sbi, "%s: out-of-range nid=%x, run fsck to fix.",
 			  __func__, nid);
···
 		}
 
 		if (!done) {
-			nfolio[i] = f2fs_get_node_folio(sbi, nids[i]);
+			nfolio[i] = f2fs_get_node_folio(sbi, nids[i],
+						NODE_TYPE_NON_INODE);
 			if (IS_ERR(nfolio[i])) {
 				err = PTR_ERR(nfolio[i]);
 				f2fs_folio_put(nfolio[0], false);
···
 		return 1;
 
 	/* get direct node */
-	folio = f2fs_get_node_folio(sbi, dn->nid);
+	folio = f2fs_get_node_folio(sbi, dn->nid, NODE_TYPE_NON_INODE);
 	if (PTR_ERR(folio) == -ENOENT)
 		return 1;
 	else if (IS_ERR(folio))
···
 
 	trace_f2fs_truncate_nodes_enter(dn->inode, dn->nid, dn->data_blkaddr);
 
-	folio = f2fs_get_node_folio(F2FS_I_SB(dn->inode), dn->nid);
+	folio = f2fs_get_node_folio(F2FS_I_SB(dn->inode), dn->nid,
+						NODE_TYPE_NON_INODE);
 	if (IS_ERR(folio)) {
 		trace_f2fs_truncate_nodes_exit(dn->inode, PTR_ERR(folio));
 		return PTR_ERR(folio);
···
 	/* get indirect nodes in the path */
 	for (i = 0; i < idx + 1; i++) {
 		/* reference count'll be increased */
-		folios[i] = f2fs_get_node_folio(F2FS_I_SB(dn->inode), nid[i]);
+		folios[i] = f2fs_get_node_folio(F2FS_I_SB(dn->inode), nid[i],
+						NODE_TYPE_NON_INODE);
 		if (IS_ERR(folios[i])) {
 			err = PTR_ERR(folios[i]);
 			idx = i - 1;
···
 					struct folio *folio, pgoff_t nid,
 					enum node_type ntype)
 {
-	if (unlikely(nid != nid_of_node(folio) ||
-		(ntype == NODE_TYPE_INODE && !IS_INODE(folio)) ||
-		(ntype == NODE_TYPE_XATTR &&
-		!f2fs_has_xattr_block(ofs_of_node(folio))) ||
-		time_to_inject(sbi, FAULT_INCONSISTENT_FOOTER))) {
-		f2fs_warn(sbi, "inconsistent node block, node_type:%d, nid:%lu, "
-			  "node_footer[nid:%u,ino:%u,ofs:%u,cpver:%llu,blkaddr:%u]",
-			  ntype, nid, nid_of_node(folio), ino_of_node(folio),
-			  ofs_of_node(folio), cpver_of_node(folio),
-			  next_blkaddr_of_node(folio));
-		set_sbi_flag(sbi, SBI_NEED_FSCK);
-		f2fs_handle_error(sbi, ERROR_INCONSISTENT_FOOTER);
-		return -EFSCORRUPTED;
+	if (unlikely(nid != nid_of_node(folio)))
+		goto out_err;
+
+	switch (ntype) {
+	case NODE_TYPE_INODE:
+		if (!IS_INODE(folio))
+			goto out_err;
+		break;
+	case NODE_TYPE_XATTR:
+		if (!f2fs_has_xattr_block(ofs_of_node(folio)))
+			goto out_err;
+		break;
+	case NODE_TYPE_NON_INODE:
+		if (IS_INODE(folio))
+			goto out_err;
+		break;
+	default:
+		break;
 	}
+	if (time_to_inject(sbi, FAULT_INCONSISTENT_FOOTER))
+		goto out_err;
 	return 0;
+out_err:
+	f2fs_warn(sbi, "inconsistent node block, node_type:%d, nid:%lu, "
+		  "node_footer[nid:%u,ino:%u,ofs:%u,cpver:%llu,blkaddr:%u]",
+		  ntype, nid, nid_of_node(folio), ino_of_node(folio),
+		  ofs_of_node(folio), cpver_of_node(folio),
+		  next_blkaddr_of_node(folio));
+	set_sbi_flag(sbi, SBI_NEED_FSCK);
+	f2fs_handle_error(sbi, ERROR_INCONSISTENT_FOOTER);
+	return -EFSCORRUPTED;
 }
 
 static struct folio *__get_node_folio(struct f2fs_sb_info *sbi, pgoff_t nid,
···
 
 	if (unlikely(!folio_test_uptodate(folio))) {
 		err = -EIO;
-		goto out_err;
+		goto out_put_err;
 	}
 
 	if (!f2fs_inode_chksum_verify(sbi, folio)) {
···
 	return ERR_PTR(err);
 }
 
-struct folio *f2fs_get_node_folio(struct f2fs_sb_info *sbi, pgoff_t nid)
+struct folio *f2fs_get_node_folio(struct f2fs_sb_info *sbi, pgoff_t nid,
+						enum node_type node_type)
 {
-	return __get_node_folio(sbi, nid, NULL, 0, NODE_TYPE_REGULAR);
+	return __get_node_folio(sbi, nid, NULL, 0, node_type);
 }
 
 struct folio *f2fs_get_inode_folio(struct f2fs_sb_info *sbi, pgoff_t ino)
···
 		f2fs_bug_on(sbi, list_empty(&nm_i->free_nid_list));
 		i = list_first_entry(&nm_i->free_nid_list,
 					struct free_nid, list);
+
+		if (unlikely(is_invalid_nid(sbi, i->nid))) {
+			spin_unlock(&nm_i->nid_list_lock);
+			f2fs_err(sbi, "Corrupted nid %u in free_nid_list",
+				i->nid);
+			f2fs_stop_checkpoint(sbi, false,
+					STOP_CP_REASON_CORRUPTED_NID);
+			return false;
+		}
+
 		*nid = i->nid;
 
 		__move_free_nid(sbi, i, FREE_NID, PREALLOC_NID);
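The rewritten footer sanity check in the node.c diff above is easier to follow as a standalone function. The sketch below is an illustration only, not the kernel code: `struct footer_model` and `footer_check()` are hypothetical, the two booleans stand in for what `IS_INODE()` and `f2fs_has_xattr_block()` report, and the fault-injection branch is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors enum node_type after the node.h change. */
enum node_type {
	NODE_TYPE_REGULAR,
	NODE_TYPE_INODE,
	NODE_TYPE_XATTR,
	NODE_TYPE_NON_INODE,
};

/* Hypothetical, flattened view of what the footer helpers report. */
struct footer_model {
	uint32_t nid;	/* nid recorded in the node footer */
	bool is_inode;	/* stands in for IS_INODE(folio) */
	bool is_xattr;	/* stands in for f2fs_has_xattr_block(...) */
};

/* Returns 0 if the footer matches the caller's expectation, -1 otherwise. */
int footer_check(uint32_t nid, enum node_type ntype,
		 const struct footer_model *f)
{
	if (nid != f->nid)
		return -1;

	switch (ntype) {
	case NODE_TYPE_INODE:
		if (!f->is_inode)
			return -1;
		break;
	case NODE_TYPE_XATTR:
		if (!f->is_xattr)
			return -1;
		break;
	case NODE_TYPE_NON_INODE:
		/* new in this merge: a dnode lookup must not land on an inode */
		if (f->is_inode)
			return -1;
		break;
	default:	/* NODE_TYPE_REGULAR: only the nid is checked */
		break;
	}
	return 0;
}
```

In this model a footer that describes an inode still passes for `NODE_TYPE_REGULAR` callers (the GC and recovery paths in this diff) but is rejected for `NODE_TYPE_NON_INODE` callers such as the dnode walk in `f2fs_get_dnode_of_data()`.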

fs/f2fs/node.h

···
 	NODE_TYPE_REGULAR,
 	NODE_TYPE_INODE,
 	NODE_TYPE_XATTR,
+	NODE_TYPE_NON_INODE,
 };
 
 /*

+1 -1

fs/f2fs/recovery.c

···
 	}
 
 	/* Get the node page */
-	node_folio = f2fs_get_node_folio(sbi, nid);
+	node_folio = f2fs_get_node_folio(sbi, nid, NODE_TYPE_REGULAR);
 	if (IS_ERR(node_folio))
 		return PTR_ERR(node_folio);
 

+27 -3

fs/f2fs/segment.c

···
 	unsigned int total_zones = MAIN_SECS(sbi) / sbi->secs_per_zone;
 	unsigned int hint = GET_SEC_FROM_SEG(sbi, *newseg);
 	unsigned int old_zoneno = GET_ZONE_FROM_SEG(sbi, *newseg);
+	unsigned int alloc_policy = sbi->allocate_section_policy;
+	unsigned int alloc_hint = sbi->allocate_section_hint;
 	bool init = true;
 	int i;
 	int ret = 0;
···
 		hint = GET_SEC_FROM_SEG(sbi, segno);
 	}
 #endif
+
+	/*
+	 * Prevent allocate_section_hint from exceeding MAIN_SECS()
+	 * due to desynchronization.
+	 */
+	if (alloc_policy != ALLOCATE_FORWARD_NOHINT &&
+		alloc_hint > MAIN_SECS(sbi))
+		alloc_hint = MAIN_SECS(sbi);
+
+	if (alloc_policy == ALLOCATE_FORWARD_FROM_HINT &&
+		hint < alloc_hint)
+		hint = alloc_hint;
+	else if (alloc_policy == ALLOCATE_FORWARD_WITHIN_HINT &&
+		hint >= alloc_hint)
+		hint = 0;
 
 find_other_zone:
 	secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
···
 
 		if (file_is_hot(inode) ||
 				is_inode_flag_set(inode, FI_HOT_DATA) ||
-				f2fs_is_cow_file(inode))
+				f2fs_is_cow_file(inode) ||
+				is_inode_flag_set(inode, FI_NEED_IPU))
 			return CURSEG_HOT_DATA;
 		return f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode),
 						inode->i_write_hint);
···
 	int seg_type = log_type_to_seg_type(type);
 	bool keep_order = (f2fs_lfs_mode(fio->sbi) &&
 				seg_type == CURSEG_COLD_DATA);
+	int err;
 
 	if (keep_order)
 		f2fs_down_read(&fio->sbi->io_order_lock);
 
-	if (f2fs_allocate_data_block(fio->sbi, folio, fio->old_blkaddr,
-			&fio->new_blkaddr, sum, type, fio)) {
+	err = f2fs_allocate_data_block(fio->sbi, folio, fio->old_blkaddr,
+			&fio->new_blkaddr, sum, type, fio);
+	if (unlikely(err)) {
+		f2fs_err_ratelimited(fio->sbi,
+			"%s Failed to allocate data block, ino:%u, index:%lu, type:%d, old_blkaddr:0x%x, new_blkaddr:0x%x, err:%d",
+			__func__, fio->ino, folio->index, type,
+			fio->old_blkaddr, fio->new_blkaddr, err);
 		if (fscrypt_inode_uses_fs_layer_crypto(folio->mapping->host))
 			fscrypt_finalize_bounce_page(&fio->encrypted_page);
 		folio_end_writeback(folio);
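The hint adjustment added to the section allocator in the segment.c diff above can be read as a pure function of four values. The following is a sketch under that assumption: `adjust_hint()` is a hypothetical name, and `main_secs` stands in for `MAIN_SECS(sbi)`.

```c
/* Mirrors enum device_allocation_policy from the f2fs.h hunk. */
enum device_allocation_policy {
	ALLOCATE_FORWARD_NOHINT,
	ALLOCATE_FORWARD_WITHIN_HINT,
	ALLOCATE_FORWARD_FROM_HINT,
};

/*
 * Returns the section number from which the free-section search starts.
 * hint:       the allocator's current starting section
 * alloc_hint: boundary section between devices (allocate_section_hint)
 * main_secs:  total number of main-area sections
 */
unsigned int adjust_hint(unsigned int hint, unsigned int alloc_hint,
			 enum device_allocation_policy policy,
			 unsigned int main_secs)
{
	/* clamp a stale boundary, e.g. one left over from a resize */
	if (policy != ALLOCATE_FORWARD_NOHINT && alloc_hint > main_secs)
		alloc_hint = main_secs;

	if (policy == ALLOCATE_FORWARD_FROM_HINT && hint < alloc_hint)
		hint = alloc_hint;	/* start at or after the boundary */
	else if (policy == ALLOCATE_FORWARD_WITHIN_HINT && hint >= alloc_hint)
		hint = 0;		/* wrap back before the boundary */
	return hint;
}
```

For example, with a boundary at section 100 the `FROM_HINT` policy moves a starting point of 10 up to 100, while `WITHIN_HINT` resets a starting point of 150 to 0. The adjustment only moves where the search begins; it does not by itself limit where `find_next_zero_bit()` ends up.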

+12 -16

fs/f2fs/segment.h

···
 	return GET_SEC_FROM_SEG(sbi, reserved_segments(sbi));
 }
 
+static inline unsigned int get_left_section_blocks(struct f2fs_sb_info *sbi,
+				enum log_type type, unsigned int segno)
+{
+	if (f2fs_lfs_mode(sbi) && __is_large_section(sbi))
+		return CAP_BLKS_PER_SEC(sbi) - SEGS_TO_BLKS(sbi,
+			(segno - GET_START_SEG_FROM_SEC(sbi, segno))) -
+			CURSEG_I(sbi, type)->next_blkoff;
+	return CAP_BLKS_PER_SEC(sbi) - get_ckpt_valid_blocks(sbi, segno, true);
+}
+
 static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi,
 			unsigned int node_blocks, unsigned int data_blocks,
 			unsigned int dent_blocks)
···
 		if (unlikely(segno == NULL_SEGNO))
 			return false;
 
-		if (f2fs_lfs_mode(sbi) && __is_large_section(sbi)) {
-			left_blocks = CAP_BLKS_PER_SEC(sbi) -
-				SEGS_TO_BLKS(sbi, (segno - GET_START_SEG_FROM_SEC(sbi, segno))) -
-				CURSEG_I(sbi, i)->next_blkoff;
-		} else {
-			left_blocks = CAP_BLKS_PER_SEC(sbi) -
-				get_ckpt_valid_blocks(sbi, segno, true);
-		}
+		left_blocks = get_left_section_blocks(sbi, i, segno);
 
 		blocks = i <= CURSEG_COLD_DATA ? data_blocks : node_blocks;
 		if (blocks > left_blocks)
···
 	if (unlikely(segno == NULL_SEGNO))
 		return false;
 
-	if (f2fs_lfs_mode(sbi) && __is_large_section(sbi)) {
-		left_blocks = CAP_BLKS_PER_SEC(sbi) -
-			SEGS_TO_BLKS(sbi, (segno - GET_START_SEG_FROM_SEC(sbi, segno))) -
-			CURSEG_I(sbi, CURSEG_HOT_DATA)->next_blkoff;
-	} else {
-		left_blocks = CAP_BLKS_PER_SEC(sbi) -
-			get_ckpt_valid_blocks(sbi, segno, true);
-	}
+	left_blocks = get_left_section_blocks(sbi, CURSEG_HOT_DATA, segno);
 
 	if (dent_blocks > left_blocks)
 		return false;

+107 -14

fs/f2fs/super.c

···
 	Opt_extent_cache,
 	Opt_data_flush,
 	Opt_reserve_root,
+	Opt_reserve_node,
 	Opt_resgid,
 	Opt_resuid,
 	Opt_mode,
···
 	Opt_nat_bits,
 	Opt_jqfmt,
 	Opt_checkpoint,
+	Opt_lookup_mode,
 	Opt_err,
 };
 
···
 	{}
 };
 
+static const struct constant_table f2fs_param_lookup_mode[] = {
+	{"perf", LOOKUP_PERF},
+	{"compat", LOOKUP_COMPAT},
+	{"auto", LOOKUP_AUTO},
+	{}
+};
+
 static const struct fs_parameter_spec f2fs_param_specs[] = {
 	fsparam_enum("background_gc", Opt_gc_background, f2fs_param_background_gc),
 	fsparam_flag("disable_roll_forward", Opt_disable_roll_forward),
···
 	fsparam_flag_no("extent_cache", Opt_extent_cache),
 	fsparam_flag("data_flush", Opt_data_flush),
 	fsparam_u32("reserve_root", Opt_reserve_root),
+	fsparam_u32("reserve_node", Opt_reserve_node),
 	fsparam_gid("resgid", Opt_resgid),
 	fsparam_uid("resuid", Opt_resuid),
 	fsparam_enum("mode", Opt_mode, f2fs_param_mode),
···
 	fsparam_enum("memory", Opt_memory_mode, f2fs_param_memory_mode),
 	fsparam_flag("age_extent_cache", Opt_age_extent_cache),
 	fsparam_enum("errors", Opt_errors, f2fs_param_errors),
+	fsparam_enum("lookup_mode", Opt_lookup_mode, f2fs_param_lookup_mode),
 	{}
 };
 
···
 #define F2FS_SPEC_discard_unit (1 << 21)
 #define F2FS_SPEC_memory_mode (1 << 22)
 #define F2FS_SPEC_errors (1 << 23)
+#define F2FS_SPEC_lookup_mode (1 << 24)
+#define F2FS_SPEC_reserve_node (1 << 25)
 
 struct f2fs_fs_context {
 	struct f2fs_mount_info info;
···
 
 static inline void limit_reserve_root(struct f2fs_sb_info *sbi)
 {
-	block_t limit = min((sbi->user_block_count >> 3),
+	block_t block_limit = min((sbi->user_block_count >> 3),
 			sbi->user_block_count - sbi->reserved_blocks);
+	block_t node_limit = sbi->total_node_count >> 3;
 
 	/* limit is 12.5% */
 	if (test_opt(sbi, RESERVE_ROOT) &&
-			F2FS_OPTION(sbi).root_reserved_blocks > limit) {
-		F2FS_OPTION(sbi).root_reserved_blocks = limit;
+			F2FS_OPTION(sbi).root_reserved_blocks > block_limit) {
+		F2FS_OPTION(sbi).root_reserved_blocks = block_limit;
 		f2fs_info(sbi, "Reduce reserved blocks for root = %u",
 			  F2FS_OPTION(sbi).root_reserved_blocks);
 	}
-	if (!test_opt(sbi, RESERVE_ROOT) &&
+	if (test_opt(sbi, RESERVE_NODE) &&
+			F2FS_OPTION(sbi).root_reserved_nodes > node_limit) {
+		F2FS_OPTION(sbi).root_reserved_nodes = node_limit;
+		f2fs_info(sbi, "Reduce reserved nodes for root = %u",
+			  F2FS_OPTION(sbi).root_reserved_nodes);
+	}
+	if (!test_opt(sbi, RESERVE_ROOT) && !test_opt(sbi, RESERVE_NODE) &&
 		(!uid_eq(F2FS_OPTION(sbi).s_resuid,
 				make_kuid(&init_user_ns, F2FS_DEF_RESUID)) ||
 		!gid_eq(F2FS_OPTION(sbi).s_resgid,
 				make_kgid(&init_user_ns, F2FS_DEF_RESGID))))
-		f2fs_info(sbi, "Ignore s_resuid=%u, s_resgid=%u w/o reserve_root",
+		f2fs_info(sbi, "Ignore s_resuid=%u, s_resgid=%u w/o reserve_root"
+				" and reserve_node",
 			  from_kuid_munged(&init_user_ns,
 					   F2FS_OPTION(sbi).s_resuid),
 			  from_kgid_munged(&init_user_ns,
···
 		F2FS_CTX_INFO(ctx).root_reserved_blocks = result.uint_32;
 		ctx->spec_mask |= F2FS_SPEC_reserve_root;
 		break;
+	case Opt_reserve_node:
+		ctx_set_opt(ctx, F2FS_MOUNT_RESERVE_NODE);
+		F2FS_CTX_INFO(ctx).root_reserved_nodes = result.uint_32;
+		ctx->spec_mask |= F2FS_SPEC_reserve_node;
+		break;
 	case Opt_resuid:
 		F2FS_CTX_INFO(ctx).s_resuid = result.uid;
 		ctx->spec_mask |= F2FS_SPEC_resuid;
···
 			ctx_set_opt(ctx, F2FS_MOUNT_DISABLE_CHECKPOINT);
 			break;
 		case Opt_checkpoint_enable:
+			F2FS_CTX_INFO(ctx).unusable_cap_perc = 0;
+			ctx->spec_mask |= F2FS_SPEC_checkpoint_disable_cap_perc;
+			F2FS_CTX_INFO(ctx).unusable_cap = 0;
+			ctx->spec_mask |= F2FS_SPEC_checkpoint_disable_cap;
 			ctx_clear_opt(ctx, F2FS_MOUNT_DISABLE_CHECKPOINT);
 			break;
 		default:
···
 	case Opt_nat_bits:
 		ctx_set_opt(ctx, F2FS_MOUNT_NAT_BITS);
 		break;
+	case Opt_lookup_mode:
+		F2FS_CTX_INFO(ctx).lookup_mode = result.uint_32;
+		ctx->spec_mask |= F2FS_SPEC_lookup_mode;
+		break;
 	}
 	return 0;
 }
···
 			goto err_jquota_change;
 
 		if (old_qname) {
-			if (strcmp(old_qname, new_qname) == 0) {
+			if (!new_qname) {
+				f2fs_info(sbi, "remove qf_name %s",
+								old_qname);
+				continue;
+			} else if (strcmp(old_qname, new_qname) == 0) {
 				ctx->qname_mask &= ~(1 << i);
 				continue;
 			}
···
 		ctx_clear_opt(ctx, F2FS_MOUNT_RESERVE_ROOT);
 		ctx->opt_mask &= ~F2FS_MOUNT_RESERVE_ROOT;
 	}
+	if (test_opt(sbi, RESERVE_NODE) &&
+			(ctx->opt_mask & F2FS_MOUNT_RESERVE_NODE) &&
+			ctx_test_opt(ctx, F2FS_MOUNT_RESERVE_NODE)) {
+		f2fs_info(sbi, "Preserve previous reserve_node=%u",
+			F2FS_OPTION(sbi).root_reserved_nodes);
+		ctx_clear_opt(ctx, F2FS_MOUNT_RESERVE_NODE);
+		ctx->opt_mask &= ~F2FS_MOUNT_RESERVE_NODE;
+	}
 
 	err = f2fs_check_test_dummy_encryption(fc, sb);
 	if (err)
···
 	if (ctx->spec_mask & F2FS_SPEC_reserve_root)
 		F2FS_OPTION(sbi).root_reserved_blocks =
 					F2FS_CTX_INFO(ctx).root_reserved_blocks;
+	if (ctx->spec_mask & F2FS_SPEC_reserve_node)
+		F2FS_OPTION(sbi).root_reserved_nodes =
+					F2FS_CTX_INFO(ctx).root_reserved_nodes;
 	if (ctx->spec_mask & F2FS_SPEC_resgid)
 		F2FS_OPTION(sbi).s_resgid = F2FS_CTX_INFO(ctx).s_resgid;
 	if (ctx->spec_mask & F2FS_SPEC_resuid)
···
 		F2FS_OPTION(sbi).memory_mode = F2FS_CTX_INFO(ctx).memory_mode;
 	if (ctx->spec_mask & F2FS_SPEC_errors)
 		F2FS_OPTION(sbi).errors = F2FS_CTX_INFO(ctx).errors;
+	if (ctx->spec_mask & F2FS_SPEC_lookup_mode)
+		F2FS_OPTION(sbi).lookup_mode = F2FS_CTX_INFO(ctx).lookup_mode;
 
 	f2fs_apply_compression(fc, sb);
 	f2fs_apply_test_dummy_encryption(fc, sb);
···
 	else if (F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_BLK)
 		seq_puts(seq, "fragment:block");
 	seq_printf(seq, ",active_logs=%u", F2FS_OPTION(sbi).active_logs);
-	if (test_opt(sbi, RESERVE_ROOT))
-		seq_printf(seq, ",reserve_root=%u,resuid=%u,resgid=%u",
+	if (test_opt(sbi, RESERVE_ROOT) || test_opt(sbi, RESERVE_NODE))
+		seq_printf(seq, ",reserve_root=%u,reserve_node=%u,resuid=%u,"
+				"resgid=%u",
 				F2FS_OPTION(sbi).root_reserved_blocks,
+				F2FS_OPTION(sbi).root_reserved_nodes,
 				from_kuid_munged(&init_user_ns,
 					F2FS_OPTION(sbi).s_resuid),
 				from_kgid_munged(&init_user_ns,
···
 	if (test_opt(sbi, NAT_BITS))
 		seq_puts(seq, ",nat_bits");
 
+	if (F2FS_OPTION(sbi).lookup_mode == LOOKUP_PERF)
+		seq_show_option(seq, "lookup_mode", "perf");
+	else if (F2FS_OPTION(sbi).lookup_mode == LOOKUP_COMPAT)
+		seq_show_option(seq, "lookup_mode", "compat");
+	else if (F2FS_OPTION(sbi).lookup_mode == LOOKUP_AUTO)
+		seq_show_option(seq, "lookup_mode", "auto");
+
 	return 0;
 }
 
···
 #endif
 
 	f2fs_build_fault_attr(sbi, 0, 0, FAULT_ALL);
+
+	F2FS_OPTION(sbi).lookup_mode = LOOKUP_PERF;
 }
 
 #ifdef CONFIG_QUOTA
···
 restore_flag:
 	sbi->gc_mode = gc_mode;
 	sbi->sb->s_flags = s_flags; /* Restore SB_RDONLY status */
+	f2fs_info(sbi, "f2fs_disable_checkpoint() finish, err:%d", err);
 	return err;
 }
 
 static void f2fs_enable_checkpoint(struct f2fs_sb_info *sbi)
 {
-	int retry = DEFAULT_RETRY_IO_COUNT;
+	unsigned int nr_pages = get_pages(sbi, F2FS_DIRTY_DATA) / 16;
+	long long start, writeback, end;
+
+	f2fs_info(sbi, "f2fs_enable_checkpoint() starts, meta: %lld, node: %lld, data: %lld",
+			get_pages(sbi, F2FS_DIRTY_META),
+			get_pages(sbi, F2FS_DIRTY_NODES),
+			get_pages(sbi, F2FS_DIRTY_DATA));
+
+	f2fs_update_time(sbi, ENABLE_TIME);
+
+	start = ktime_get();
 
 	/* we should flush all the data to keep data consistency */
-	do {
-		sync_inodes_sb(sbi->sb);
+	while (get_pages(sbi, F2FS_DIRTY_DATA)) {
+		writeback_inodes_sb_nr(sbi->sb, nr_pages, WB_REASON_SYNC);
 		f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
-	} while (get_pages(sbi, F2FS_DIRTY_DATA) && retry--);
 
-	if (unlikely(retry < 0))
-		f2fs_warn(sbi, "checkpoint=enable has some unwritten data.");
+		if (f2fs_time_over(sbi, ENABLE_TIME))
+			break;
+	}
+	writeback = ktime_get();
+
+	sync_inodes_sb(sbi->sb);
+
+	if (unlikely(get_pages(sbi, F2FS_DIRTY_DATA)))
+		f2fs_warn(sbi, "checkpoint=enable has some unwritten data: %lld",
+			get_pages(sbi, F2FS_DIRTY_DATA));
 
 	f2fs_down_write(&sbi->gc_lock);
 	f2fs_dirty_to_prefree(sbi);
···
 
 	/* Let's ensure there's no pending checkpoint anymore */
 	f2fs_flush_ckpt_thread(sbi);
+
+	end = ktime_get();
+
+	f2fs_info(sbi, "f2fs_enable_checkpoint() finishes, writeback:%llu, sync:%llu",
+			ktime_ms_delta(writeback, start),
+			ktime_ms_delta(end, writeback));
 }
 
 static int __f2fs_remount(struct fs_context *fc, struct super_block *sb)
···
 	sbi->total_node_count = SEGS_TO_BLKS(sbi,
 			((le32_to_cpu(raw_super->segment_count_nat) / 2) *
 			NAT_ENTRY_PER_BLOCK));
+	sbi->allocate_section_hint = le32_to_cpu(raw_super->section_count);
+	sbi->allocate_section_policy = ALLOCATE_FORWARD_NOHINT;
 	F2FS_ROOT_INO(sbi) = le32_to_cpu(raw_super->root_ino);
 	F2FS_NODE_INO(sbi) = le32_to_cpu(raw_super->node_ino);
 	F2FS_META_INO(sbi) = le32_to_cpu(raw_super->meta_ino);
···
 	sbi->interval_time[DISCARD_TIME] = DEF_IDLE_INTERVAL;
 	sbi->interval_time[GC_TIME] = DEF_IDLE_INTERVAL;
 	sbi->interval_time[DISABLE_TIME] = DEF_DISABLE_INTERVAL;
+	sbi->interval_time[ENABLE_TIME] = DEF_ENABLE_INTERVAL;
 	sbi->interval_time[UMOUNT_DISCARD_TIMEOUT] =
 				DEF_UMOUNT_DISCARD_TIMEOUT;
 	clear_sbi_flag(sbi, SBI_NEED_FSCK);
···
 
 	logical_blksize = bdev_logical_block_size(sbi->sb->s_bdev);
 	sbi->aligned_blksize = true;
+	sbi->bggc_io_aware = AWARE_ALL_IO;
 #ifdef CONFIG_BLK_DEV_ZONED
 	sbi->max_open_zones = UINT_MAX;
 	sbi->blkzone_alloc_policy = BLKZONE_ALLOC_PRIOR_SEQ;
+	sbi->bggc_io_aware = AWARE_READ_IO;
 #endif
 
 	for (i = 0; i < max_devices; i++) {
···
 				SEGS_TO_BLKS(sbi,
 				FDEV(i).total_segments) - 1 +
 				le32_to_cpu(raw_super->segment0_blkaddr);
+			sbi->allocate_section_hint = FDEV(i).total_segments /
+						SEGS_PER_SEC(sbi);
 		} else {
 			FDEV(i).start_blk = FDEV(i - 1).end_blk + 1;
 			FDEV(i).end_blk = FDEV(i).start_blk +
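The `reserve_node` accounting in this merge has two parts: the cap applied in `limit_reserve_root()` and the subtraction applied when a new node is charged. Both can be checked with plain arithmetic. The sketch below makes several assumptions: both function names are hypothetical, `fixed_reserved` stands in for `F2FS_RESERVED_NODE_NUM` (its value is not shown in this diff), and `privileged` stands in for the result of `__allow_reserved_root()`.

```c
#include <stdbool.h>

/* Cap the requested reservation at 12.5% (1/8) of all node slots. */
unsigned int cap_reserved_nodes(unsigned int total_node_count,
				unsigned int requested)
{
	unsigned int node_limit = total_node_count >> 3;

	return requested > node_limit ? node_limit : requested;
}

/*
 * Node slots an allocation may use.  Unprivileged callers lose the
 * root-reserved nodes when the reserve_node option is set.
 */
unsigned int avail_user_nodes(unsigned int total_node_count,
			      unsigned int fixed_reserved,
			      unsigned int root_reserved_nodes,
			      bool reserve_node_enabled, bool privileged)
{
	unsigned int avail = total_node_count - fixed_reserved;

	if (reserve_node_enabled && !privileged)
		avail -= root_reserved_nodes;
	return avail;
}
```

With 8000 node slots, a request to reserve 5000 nodes is reduced to 1000. An unprivileged caller then sees 8000 minus the fixed reservation minus 1000, while a caller that passes the `__allow_reserved_root()` check keeps the full 8000 minus the fixed reservation.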

+116 -3

fs/f2fs/sysfs.c

···
 			le16_to_cpu(F2FS_RAW_SUPER(sbi)->s_encoding_flags));
 }
 
+static ssize_t effective_lookup_mode_show(struct f2fs_attr *a,
+		struct f2fs_sb_info *sbi, char *buf)
+{
+	switch (F2FS_OPTION(sbi).lookup_mode) {
+	case LOOKUP_PERF:
+		return sysfs_emit(buf, "perf\n");
+	case LOOKUP_COMPAT:
+		return sysfs_emit(buf, "compat\n");
+	case LOOKUP_AUTO:
+		if (sb_no_casefold_compat_fallback(sbi->sb))
+			return sysfs_emit(buf, "auto:perf\n");
+		return sysfs_emit(buf, "auto:compat\n");
+	}
+	return 0;
+}
+
 static ssize_t mounted_time_sec_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
···
 		return count;
 	}
 
+	if (!strcmp(a->attr.name, "bggc_io_aware")) {
+		if (t < AWARE_ALL_IO || t > AWARE_NONE)
+			return -EINVAL;
+		sbi->bggc_io_aware = t;
+		return count;
+	}
+
+	if (!strcmp(a->attr.name, "allocate_section_hint")) {
+		if (t < 0 || t > MAIN_SECS(sbi))
+			return -EINVAL;
+		sbi->allocate_section_hint = t;
+		return count;
+	}
+
+	if (!strcmp(a->attr.name, "allocate_section_policy")) {
+		if (t < ALLOCATE_FORWARD_NOHINT || t > ALLOCATE_FORWARD_FROM_HINT)
+			return -EINVAL;
+		sbi->allocate_section_policy = t;
+		return count;
+	}
+
 	*ui = (unsigned int)t;
 
 	return count;
···
 F2FS_SBI_GENERAL_RW_ATTR(migration_granularity);
 F2FS_SBI_GENERAL_RW_ATTR(migration_window_granularity);
 F2FS_SBI_GENERAL_RW_ATTR(dir_level);
+F2FS_SBI_GENERAL_RW_ATTR(allocate_section_hint);
+F2FS_SBI_GENERAL_RW_ATTR(allocate_section_policy);
 #ifdef CONFIG_F2FS_IOSTAT
 F2FS_SBI_GENERAL_RW_ATTR(iostat_enable);
 F2FS_SBI_GENERAL_RW_ATTR(iostat_period_ms);
···
 #endif
 F2FS_SBI_GENERAL_RW_ATTR(carve_out);
 F2FS_SBI_GENERAL_RW_ATTR(reserved_pin_section);
+F2FS_SBI_GENERAL_RW_ATTR(bggc_io_aware);
 
 /* STAT_INFO ATTR */
 #ifdef CONFIG_F2FS_STAT_FS
···
 F2FS_GENERAL_RO_ATTR(unusable);
 F2FS_GENERAL_RO_ATTR(encoding);
 F2FS_GENERAL_RO_ATTR(encoding_flags);
+F2FS_GENERAL_RO_ATTR(effective_lookup_mode);
 F2FS_GENERAL_RO_ATTR(mounted_time_sec);
 F2FS_GENERAL_RO_ATTR(main_blkaddr);
 F2FS_GENERAL_RO_ATTR(pending_discard);
···
 	ATTR_LIST(discard_idle_interval),
 	ATTR_LIST(gc_idle_interval),
 	ATTR_LIST(umount_discard_timeout),
+	ATTR_LIST(bggc_io_aware),
 #ifdef CONFIG_F2FS_IOSTAT
 	ATTR_LIST(iostat_enable),
 	ATTR_LIST(iostat_period_ms),
···
 	ATTR_LIST(current_reserved_blocks),
 	ATTR_LIST(encoding),
 	ATTR_LIST(encoding_flags),
+	ATTR_LIST(effective_lookup_mode),
 	ATTR_LIST(mounted_time_sec),
 #ifdef CONFIG_F2FS_STAT_FS
 	ATTR_LIST(cp_foreground_calls),
···
 	ATTR_LIST(max_read_extent_count),
 	ATTR_LIST(carve_out),
 	ATTR_LIST(reserved_pin_section),
+	ATTR_LIST(allocate_section_hint),
+	ATTR_LIST(allocate_section_policy),
 	NULL,
 };
 ATTRIBUTE_GROUPS(f2fs);
···
 	seq_printf(seq, " Main : 0x%010x (%10d)\n",
 			SM_I(sbi)->main_blkaddr,
 			le32_to_cpu(F2FS_RAW_SUPER(sbi)->segment_count_main));
-	seq_printf(seq, " # of Sections : %12d\n",
-			le32_to_cpu(F2FS_RAW_SUPER(sbi)->section_count));
+	seq_printf(seq, " Block size : %12lu KB\n", F2FS_BLKSIZE >> 10);
+	seq_printf(seq, " Segment size : %12d MB\n",
+			(BLKS_PER_SEG(sbi) << (F2FS_BLKSIZE_BITS - 10)) >> 10);
 	seq_printf(seq, " Segs/Sections : %12d\n",
 			SEGS_PER_SEC(sbi));
 	seq_printf(seq, " Section size : %12d MB\n",
-			SEGS_PER_SEC(sbi) << 1);
+			(BLKS_PER_SEC(sbi) << (F2FS_BLKSIZE_BITS - 10)) >> 10);
+	seq_printf(seq, " # of Sections : %12d\n",
+			le32_to_cpu(F2FS_RAW_SUPER(sbi)->section_count));
 
 	if (!f2fs_is_multi_device(sbi))
 		return 0;
···
 			i, bdev_is_zoned(FDEV(i).bdev),
 			FDEV(i).start_blk, FDEV(i).end_blk,
 			FDEV(i).path);
+	return 0;
+}
+
+static int __maybe_unused donation_list_seq_show(struct seq_file *seq,
+						void *offset)
+{
+	struct super_block *sb = seq->private;
+	struct f2fs_sb_info *sbi = F2FS_SB(sb);
+	struct inode *inode;
+	struct f2fs_inode_info *fi;
+	struct dentry *dentry;
+	char *buf, *path;
+	int i;
+
+	buf = f2fs_getname(sbi);
+	if (!buf)
+		return 0;
+
+	seq_printf(seq, "Donation List\n");
+	seq_printf(seq, " # of files : %u\n", sbi->donate_files);
+	seq_printf(seq, " %-50s %10s %20s %20s %22s\n",
+			"File path", "Status", "Donation offset (kb)",
+			"Donation size (kb)", "File cached size (kb)");
+	seq_printf(seq, "---\n");
+
+	for (i = 0; i < sbi->donate_files; i++) {
+		spin_lock(&sbi->inode_lock[DONATE_INODE]);
+		if (list_empty(&sbi->inode_list[DONATE_INODE])) {
+			spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+			break;
+		}
+		fi = list_first_entry(&sbi->inode_list[DONATE_INODE],
+				struct f2fs_inode_info, gdonate_list);
+		list_move_tail(&fi->gdonate_list, &sbi->inode_list[DONATE_INODE]);
+		inode = igrab(&fi->vfs_inode);
+		spin_unlock(&sbi->inode_lock[DONATE_INODE]);
+
+		if (!inode)
+			continue;
+
+		inode_lock_shared(inode);
+
+		dentry = d_find_alias(inode);
+		if (!dentry) {
+			path = NULL;
+		} else {
+			path = dentry_path_raw(dentry, buf, PATH_MAX);
+			if (IS_ERR(path))
+				goto next;
+		}
+		seq_printf(seq, " %-50s %10s %20llu %20llu %22llu\n",
+				path ? path : "<unlinked>",
+				is_inode_flag_set(inode, FI_DONATE_FINISHED) ?
+				"Evicted" : "Donated",
+				(loff_t)fi->donate_start << (PAGE_SHIFT - 10),
+				(loff_t)(fi->donate_end + 1) << (PAGE_SHIFT - 10),
+				(loff_t)inode->i_mapping->nrpages << (PAGE_SHIFT - 10));
+next:
+		dput(dentry);
+		inode_unlock_shared(inode);
+		iput(inode);
+	}
+	f2fs_putname(buf);
 	return 0;
 }
 
···
 				discard_plist_seq_show, sb);
 	proc_create_single_data("disk_map", 0444, sbi->s_proc,
 				disk_map_seq_show, sb);
+	proc_create_single_data("donation_list", 0444, sbi->s_proc,
+				donation_list_seq_show, sb);
 #ifdef CONFIG_F2FS_FAULT_INJECTION
 	proc_create_single_data("inject_stats", 0444, sbi->s_proc,
 				inject_stats_seq_show, sb);
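
The disk_map hunk above replaces `SEGS_PER_SEC(sbi) << 1`, which is only correct when a segment is 2 MB, with an expression derived from the block count and the block size. The sketch below reproduces that arithmetic in user space to show why the two results diverge on 16 KB blocks. The helper names are hypothetical, and the figure of 512 blocks per segment is an assumption used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a block count to megabytes, given log2 of the block size in bytes.
 * Same shape as the expression added in the diff:
 *   (blocks << (blksize_bits - 10)) >> 10
 * The left shift yields kilobytes, the right shift yields megabytes.
 */
static uint32_t blocks_to_mb(uint32_t blocks, unsigned int blksize_bits)
{
	assert(blksize_bits >= 10);
	return (blocks << (blksize_bits - 10)) >> 10;
}

/* The expression the diff removes: assumes every segment is 2 MB. */
static uint32_t old_section_mb(uint32_t segs_per_sec)
{
	return segs_per_sec << 1;
}
```

With 512 blocks per segment and one segment per section, `blocks_to_mb(512, 12)` and `old_section_mb(1)` agree for 4 KB blocks, while `blocks_to_mb(512, 14)` gives the larger section size that applies to 16 KB blocks.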

include/linux/f2fs_fs.h

···
 	STOP_CP_REASON_FLUSH_FAIL,
 	STOP_CP_REASON_NO_SEGMENT,
 	STOP_CP_REASON_CORRUPTED_FREE_BITMAP,
+	STOP_CP_REASON_CORRUPTED_NID,
 	STOP_CP_REASON_MAX,
 };
 
