Merge tag 'f2fs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

+59 -3

Documentation/ABI/testing/sysfs-fs-f2fs

···
 Date:           January 2021
 Contact:        "Daeho Jeong" <daehojeong@google.com>
 Description:    Give a way to change checkpoint merge daemon's io priority.
-                Its default value is "be,3", which means "BE" I/O class and
+                Its default value is "rt,3", which means "RT" I/O class and
                 I/O priority "3". We can select the class between "rt" and "be",
                 and set the I/O priority within valid range of it. "," delimiter
                 is necessary in between I/O class and priority number.
···
                 FAULT_TRUNCATE                   0x00000400
                 FAULT_READ_IO                    0x00000800
                 FAULT_CHECKPOINT                 0x00001000
-                FAULT_DISCARD                    0x00002000
+                FAULT_DISCARD                    0x00002000 (obsolete)
                 FAULT_WRITE_IO                   0x00004000
                 FAULT_SLAB_ALLOC                 0x00008000
                 FAULT_DQUOT_INIT                 0x00010000
···
                 FAULT_BLKADDR_CONSISTENCE        0x00080000
                 FAULT_NO_SEGMENT                 0x00100000
                 FAULT_INCONSISTENT_FOOTER        0x00200000
-                FAULT_TIMEOUT                    0x00400000 (1000ms)
+                FAULT_ATOMIC_TIMEOUT             0x00400000 (1000ms)
                 FAULT_VMALLOC                    0x00800000
+                FAULT_LOCK_TIMEOUT               0x01000000 (1000ms)
+                FAULT_SKIP_WRITE                 0x02000000
                 ===========================      ==========

 What:           /sys/fs/f2fs/<disk>/discard_io_aware_gran
···
                 allocate_section_policy = 1  Prioritize writing to section before allocate_section_hint
                 allocate_section_policy = 2  Prioritize writing to section after allocate_section_hint
                 ===========================  ==========================================================
+
+What:           /sys/fs/f2fs/<disk>/max_lock_elapsed_time
+Date:           December 2025
+Contact:        "Chao Yu" <chao@kernel.org>
+Description:    This is a threshold, once a thread enters critical region that lock covers, total
+                elapsed time exceeds this threshold, f2fs will print tracepoint to dump information
+                of related context. This sysfs entry can be used to control the value of threshold,
+                by default, the value is 500 ms.
+
+What:           /sys/fs/f2fs/<disk>/inject_timeout_type
+Date:           December 2025
+Contact:        "Chao Yu" <chao@kernel.org>
+Description:    This sysfs entry can be used to change type of injected timeout:
+                ==========     ===============================
+                Flag_Value     Flag_Description
+                ==========     ===============================
+                0x00000000     No timeout (default)
+                0x00000001     Simulate running time
+                0x00000002     Simulate IO type sleep time
+                0x00000003     Simulate Non-IO type sleep time
+                0x00000004     Simulate runnable time
+                ==========     ===============================
+
+What:           /sys/fs/f2fs/<disk>/adjust_lock_priority
+Date:           January 2026
+Contact:        "Chao Yu" <chao@kernel.org>
+Description:    This sysfs entry can be used to enable/disable to adjust priority for task
+                which is in critical region covered by lock.
+                ==========     ==================
+                Flag_Value     Flag_Description
+                ==========     ==================
+                0x00000000     Disabled (default)
+                0x00000001     cp_rwsem
+                0x00000002     node_change
+                0x00000004     node_write
+                0x00000008     gc_lock
+                0x00000010     cp_global
+                0x00000020     io_rwsem
+                ==========     ==================
+
+What:           /sys/fs/f2fs/<disk>/lock_duration_priority
+Date:           January 2026
+Contact:        "Chao Yu" <chao@kernel.org>
+Description:    f2fs can tune priority of thread which has entered into critical region covered by
+                f2fs rwsemphore lock. This sysfs entry can be used to control priority value, the
+                range is [100,139], by default the value is 120.
+
+What:           /sys/fs/f2fs/<disk>/critical_task_priority
+Date:           February 2026
+Contact:        "Chao Yu" <chao@kernel.org>
+Description:    It can be used to tune priority of f2fs critical task, e.g. f2fs_ckpt, f2fs_gc
+                threads, limitation as below:
+                - it requires user has CAP_SYS_NICE capability.
+                - the range is [100, 139], by default the value is 100.
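Reviewer note: the adjust_lock_priority flag values appear to be one bit per lock, matching the `BIT(sem->name - 1)` test that `need_uplift_priority()` performs in fs/f2fs/checkpoint.c. The following is a minimal user-space sketch of composing such a mask. It assumes the lock names are numbered 1 to 6 in table order, which is inferred from the table and the `BIT()` test rather than shown in this diff; the enum and the `lock_flag()` helper are hypothetical names used only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical numbering, inferred from the flag table:
 * cp_rwsem = 1 ... io_rwsem = 6. Not copied from the kernel headers.
 */
enum lock_name {
	LOCK_CP_RWSEM = 1,
	LOCK_NODE_CHANGE,
	LOCK_NODE_WRITE,
	LOCK_GC_LOCK,
	LOCK_CP_GLOBAL,
	LOCK_IO_RWSEM,
};

/* Flag bit for one lock, mirroring the BIT(name - 1) test in the patch. */
static unsigned int lock_flag(enum lock_name name)
{
	assert(name >= LOCK_CP_RWSEM && name <= LOCK_IO_RWSEM);
	return 1u << (name - 1);
}
```

Under that assumption, enabling the adjustment for cp_rwsem and gc_lock would mean writing `lock_flag(LOCK_CP_RWSEM) | lock_flag(LOCK_GC_LOCK)`, that is 0x9, to the sysfs entry.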

+47 -2

Documentation/filesystems/f2fs.rst

···
                         FAULT_TRUNCATE                   0x00000400
                         FAULT_READ_IO                    0x00000800
                         FAULT_CHECKPOINT                 0x00001000
-                        FAULT_DISCARD                    0x00002000
+                        FAULT_DISCARD                    0x00002000 (obsolete)
                         FAULT_WRITE_IO                   0x00004000
                         FAULT_SLAB_ALLOC                 0x00008000
                         FAULT_DQUOT_INIT                 0x00010000
···
                         FAULT_BLKADDR_CONSISTENCE        0x00080000
                         FAULT_NO_SEGMENT                 0x00100000
                         FAULT_INCONSISTENT_FOOTER        0x00200000
-                        FAULT_TIMEOUT                    0x00400000 (1000ms)
+                        FAULT_ATOMIC_TIMEOUT             0x00400000 (1000ms)
                         FAULT_VMALLOC                    0x00800000
+                        FAULT_LOCK_TIMEOUT               0x01000000 (1000ms)
+                        FAULT_SKIP_WRITE                 0x02000000
                         ===========================      ==========
 mode=%s                 Control block allocation mode which supports "adaptive"
                         and "lfs". In "lfs" mode, there should be no random
···
 So, the key idea is, user can do any file operations on /dev/vdc, and
 reclaim the space after the use, while the space is counted as /data.
 That doesn't require modifying partition size and filesystem format.
+
+Per-file Read-Only Large Folio Support
+--------------------------------------
+
+F2FS implements large folio support on the read path to leverage high-order
+page allocation for significant performance gains. To minimize code complexity,
+this support is currently excluded from the write path, which requires handling
+complex optimizations such as compression and block allocation modes.
+
+This optional feature is triggered only when a file's immutable bit is set.
+Consequently, F2FS will return EOPNOTSUPP if a user attempts to open a cached
+file with write permissions, even immediately after clearing the bit. Write
+access is only restored once the cached inode is dropped. The usage flow is
+demonstrated below:
+
+.. code-block::
+
+   # f2fs_io setflags immutable /data/testfile_read_seq
+
+   /* flush and reload the inode to enable the large folio */
+   # sync && echo 3 > /proc/sys/vm/drop_caches
+
+   /* mmap(MAP_POPULATE) + mlock() */
+   # f2fs_io read 128 0 1024 mmap 1 0 /data/testfile_read_seq
+
+   /* mmap() + fadvise(POSIX_FADV_WILLNEED) + mlock() */
+   # f2fs_io read 128 0 1024 fadvise 1 0 /data/testfile_read_seq
+
+   /* mmap() + mlock2(MLOCK_ONFAULT) + madvise(MADV_POPULATE_READ) */
+   # f2fs_io read 128 0 1024 madvise 1 0 /data/testfile_read_seq
+
+   # f2fs_io clearflags immutable /data/testfile_read_seq
+
+   # f2fs_io write 1 0 1 zero buffered /data/testfile_read_seq
+   Failed to open /mnt/test/test: Operation not supported
+
+   /* flush and reload the inode to disable the large folio */
+   # sync && echo 3 > /proc/sys/vm/drop_caches
+
+   # f2fs_io write 1 0 1 zero buffered /data/testfile_read_seq
+   Written 4096 bytes with pattern = zero, total_time = 29 us, max_latency = 28 us
+
+   # rm /data/testfile_read_seq
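Reviewer note: the documented behaviour means a caller may see EOPNOTSUPP from a write-mode open() even after the immutable bit has been cleared, until the cached inode is dropped. A minimal user-space sketch of probing for that state follows. The helper name `probe_write_open()` is hypothetical, and the interpretation of EOPNOTSUPP as "inode still cached with large folios enabled" is taken from the documentation text above, not from any behaviour verified here.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open `path` for writing.
 * Returns 0 on success, otherwise the negated errno from open().
 * Per the documentation in this patch, -EOPNOTSUPP on F2FS would suggest
 * the inode is still cached with large folio support enabled.
 */
static int probe_write_open(const char *path)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -errno;
	close(fd);
	return 0;
}
```

A caller that receives -EOPNOTSUPP could then follow the flow in the documentation (sync, drop caches) before retrying; whether that is appropriate in production code is a policy decision outside this sketch.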

+230 -17

fs/f2fs/checkpoint.c

···
 #include <linux/pagevec.h>
 #include <linux/swap.h>
 #include <linux/kthread.h>
+#include <linux/delayacct.h>
+#include <linux/ioprio.h>
+#include <linux/math64.h>

 #include "f2fs.h"
 #include "node.h"
 #include "segment.h"
 #include "iostat.h"
 #include <trace/events/f2fs.h>
+
+static inline void get_lock_elapsed_time(struct f2fs_time_stat *ts)
+{
+	ts->total_time = ktime_get();
+#ifdef CONFIG_64BIT
+	ts->running_time = current->se.sum_exec_runtime;
+#endif
+#if defined(CONFIG_SCHED_INFO) && defined(CONFIG_SCHEDSTATS)
+	ts->runnable_time = current->sched_info.run_delay;
+#endif
+#ifdef CONFIG_TASK_DELAY_ACCT
+	if (current->delays)
+		ts->io_sleep_time = current->delays->blkio_delay;
+#endif
+}
+
+static inline void trace_lock_elapsed_time_start(struct f2fs_rwsem *sem,
+						struct f2fs_lock_context *lc)
+{
+	lc->lock_trace = trace_f2fs_lock_elapsed_time_enabled();
+	if (!lc->lock_trace)
+		return;
+
+	get_lock_elapsed_time(&lc->ts);
+}
+
+static inline void trace_lock_elapsed_time_end(struct f2fs_rwsem *sem,
+				struct f2fs_lock_context *lc, bool is_write)
+{
+	struct f2fs_time_stat tts;
+	unsigned long long total_time;
+	unsigned long long running_time = 0;
+	unsigned long long runnable_time = 0;
+	unsigned long long io_sleep_time = 0;
+	unsigned long long other_time = 0;
+	unsigned npm = NSEC_PER_MSEC;
+
+	if (!lc->lock_trace)
+		return;
+
+	if (time_to_inject(sem->sbi, FAULT_LOCK_TIMEOUT))
+		f2fs_schedule_timeout_killable(DEFAULT_FAULT_TIMEOUT, true);
+
+	get_lock_elapsed_time(&tts);
+
+	total_time = div_u64(tts.total_time - lc->ts.total_time, npm);
+	if (total_time <= sem->sbi->max_lock_elapsed_time)
+		return;
+
+#ifdef CONFIG_64BIT
+	running_time = div_u64(tts.running_time - lc->ts.running_time, npm);
+#endif
+#if defined(CONFIG_SCHED_INFO) && defined(CONFIG_SCHEDSTATS)
+	runnable_time = div_u64(tts.runnable_time - lc->ts.runnable_time, npm);
+#endif
+#ifdef CONFIG_TASK_DELAY_ACCT
+	io_sleep_time = div_u64(tts.io_sleep_time - lc->ts.io_sleep_time, npm);
+#endif
+	if (total_time > running_time + io_sleep_time + runnable_time)
+		other_time = total_time - running_time -
+				io_sleep_time - runnable_time;
+
+	trace_f2fs_lock_elapsed_time(sem->sbi, sem->name, is_write, current,
+			get_current_ioprio(), total_time, running_time,
+			runnable_time, io_sleep_time, other_time);
+}
+
+static bool need_uplift_priority(struct f2fs_rwsem *sem, bool is_write)
+{
+	if (!(sem->sbi->adjust_lock_priority & BIT(sem->name - 1)))
+		return false;
+
+	switch (sem->name) {
+	/*
+	 * writer is checkpoint which has high priority, let's just uplift
+	 * priority for reader
+	 */
+	case LOCK_NAME_CP_RWSEM:
+	case LOCK_NAME_NODE_CHANGE:
+	case LOCK_NAME_NODE_WRITE:
+		return !is_write;
+	case LOCK_NAME_GC_LOCK:
+	case LOCK_NAME_CP_GLOBAL:
+	case LOCK_NAME_IO_RWSEM:
+		return true;
+	default:
+		f2fs_bug_on(sem->sbi, 1);
+	}
+	return false;
+}
+
+static void uplift_priority(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc,
+						bool is_write)
+{
+	lc->need_restore = false;
+	if (!sem->sbi->adjust_lock_priority)
+		return;
+	if (rt_task(current))
+		return;
+	if (!need_uplift_priority(sem, is_write))
+		return;
+	lc->orig_nice = task_nice(current);
+	lc->new_nice = PRIO_TO_NICE(sem->sbi->lock_duration_priority);
+	if (lc->orig_nice <= lc->new_nice)
+		return;
+	set_user_nice(current, lc->new_nice);
+	lc->need_restore = true;
+
+	trace_f2fs_priority_uplift(sem->sbi, sem->name, is_write, current,
+		NICE_TO_PRIO(lc->orig_nice), NICE_TO_PRIO(lc->new_nice));
+}
+
+static void restore_priority(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc,
+						bool is_write)
+{
+	if (!lc->need_restore)
+		return;
+	/* someone has updated the priority */
+	if (task_nice(current) != lc->new_nice)
+		return;
+	set_user_nice(current, lc->orig_nice);
+
+	trace_f2fs_priority_restore(sem->sbi, sem->name, is_write, current,
+		NICE_TO_PRIO(lc->orig_nice), NICE_TO_PRIO(lc->new_nice));
+}
+
+void f2fs_down_read_trace(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc)
+{
+	uplift_priority(sem, lc, false);
+	f2fs_down_read(sem);
+	trace_lock_elapsed_time_start(sem, lc);
+}
+
+int f2fs_down_read_trylock_trace(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc)
+{
+	uplift_priority(sem, lc, false);
+	if (!f2fs_down_read_trylock(sem)) {
+		restore_priority(sem, lc, false);
+		return 0;
+	}
+	trace_lock_elapsed_time_start(sem, lc);
+	return 1;
+}
+
+void f2fs_up_read_trace(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc)
+{
+	f2fs_up_read(sem);
+	restore_priority(sem, lc, false);
+	trace_lock_elapsed_time_end(sem, lc, false);
+}
+
+void f2fs_down_write_trace(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc)
+{
+	uplift_priority(sem, lc, true);
+	f2fs_down_write(sem);
+	trace_lock_elapsed_time_start(sem, lc);
+}
+
+int f2fs_down_write_trylock_trace(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc)
+{
+	uplift_priority(sem, lc, true);
+	if (!f2fs_down_write_trylock(sem)) {
+		restore_priority(sem, lc, true);
+		return 0;
+	}
+	trace_lock_elapsed_time_start(sem, lc);
+	return 1;
+}
+
+void f2fs_up_write_trace(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc)
+{
+	f2fs_up_write(sem);
+	restore_priority(sem, lc, true);
+	trace_lock_elapsed_time_end(sem, lc, true);
+}
+
+void f2fs_lock_op(struct f2fs_sb_info *sbi, struct f2fs_lock_context *lc)
+{
+	f2fs_down_read_trace(&sbi->cp_rwsem, lc);
+}
+
+int f2fs_trylock_op(struct f2fs_sb_info *sbi, struct f2fs_lock_context *lc)
+{
+	if (time_to_inject(sbi, FAULT_LOCK_OP))
+		return 0;
+
+	return f2fs_down_read_trylock_trace(&sbi->cp_rwsem, lc);
+}
+
+void f2fs_unlock_op(struct f2fs_sb_info *sbi, struct f2fs_lock_context *lc)
+{
+	f2fs_up_read_trace(&sbi->cp_rwsem, lc);
+}
+
+static inline void f2fs_lock_all(struct f2fs_sb_info *sbi)
+{
+	f2fs_down_write(&sbi->cp_rwsem);
+}
+
+static inline void f2fs_unlock_all(struct f2fs_sb_info *sbi)
+{
+	f2fs_up_write(&sbi->cp_rwsem);
+}

 #define DEFAULT_CHECKPOINT_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 3))

···
 				struct writeback_control *wbc)
 {
 	struct f2fs_sb_info *sbi = F2FS_M_SB(mapping);
+	struct f2fs_lock_context lc;
 	long diff, written;

 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
···
 		goto skip_write;

 	/* if locked failed, cp will flush dirty pages instead */
-	if (!f2fs_down_write_trylock(&sbi->cp_global_sem))
+	if (!f2fs_down_write_trylock_trace(&sbi->cp_global_sem, &lc))
 		goto skip_write;

 	trace_f2fs_writepages(mapping->host, wbc, META);
 	diff = nr_pages_to_write(sbi, META, wbc);
-	written = f2fs_sync_meta_pages(sbi, META, wbc->nr_to_write, FS_META_IO);
-	f2fs_up_write(&sbi->cp_global_sem);
+	written = f2fs_sync_meta_pages(sbi, wbc->nr_to_write, FS_META_IO);
+	f2fs_up_write_trace(&sbi->cp_global_sem, &lc);
 	wbc->nr_to_write = max((long)0, wbc->nr_to_write - written - diff);
 	return 0;

···
 	return 0;
 }

-long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
-				long nr_to_write, enum iostat_type io_type)
+long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, long nr_to_write,
+				enum iostat_type io_type)
 {
 	struct address_space *mapping = META_MAPPING(sbi);
 	pgoff_t index = 0, prev = ULONG_MAX;
···
 	}
 stop:
 	if (nwritten)
-		f2fs_submit_merged_write(sbi, type);
+		f2fs_submit_merged_write(sbi, META);

 	blk_finish_plug(&plug);

···
 			break;

 		if (type == F2FS_DIRTY_META)
-			f2fs_sync_meta_pages(sbi, META, LONG_MAX,
-							FS_CP_META_IO);
+			f2fs_sync_meta_pages(sbi, LONG_MAX, FS_CP_META_IO);
 		else if (type == F2FS_WB_CP_DATA)
 			f2fs_submit_merged_write(sbi, DATA);

···
 	int err;

 	/* Flush all the NAT/SIT pages */
-	f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
+	f2fs_sync_meta_pages(sbi, LONG_MAX, FS_CP_META_IO);

 	stat_cp_time(cpc, CP_TIME_SYNC_META);

···
 	}

 	/* Here, we have one bio having CP pack except cp pack 2 page */
-	f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
+	f2fs_sync_meta_pages(sbi, LONG_MAX, FS_CP_META_IO);
 	stat_cp_time(cpc, CP_TIME_SYNC_CP_META);

 	/* Wait for all dirty meta pages to be submitted for IO */
···
 int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 {
 	struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+	struct f2fs_lock_context lc;
 	unsigned long long ckpt_ver;
 	int err = 0;

···
 		f2fs_warn(sbi, "Start checkpoint disabled!");
 	}
 	if (cpc->reason != CP_RESIZE)
-		f2fs_down_write(&sbi->cp_global_sem);
+		f2fs_down_write_trace(&sbi->cp_global_sem, &lc);

 	stat_cp_time(cpc, CP_TIME_LOCK);

···
 			goto out;
 		}
 	}
+	stat_cp_time(cpc, CP_TIME_MERGE_WRITE);

 	/*
 	 * update checkpoint pack index
···
 		f2fs_bug_on(sbi, !f2fs_cp_error(sbi));
 		goto stop;
 	}
+	stat_cp_time(cpc, CP_TIME_FLUSH_NAT);

 	f2fs_flush_sit_entries(sbi, cpc);

-	stat_cp_time(cpc, CP_TIME_FLUSH_META);
+	stat_cp_time(cpc, CP_TIME_FLUSH_SIT);

 	/* save inmem log status */
 	f2fs_save_inmem_curseg(sbi);
···
 	trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, CP_PHASE_FINISH_CHECKPOINT);
 out:
 	if (cpc->reason != CP_RESIZE)
-		f2fs_up_write(&sbi->cp_global_sem);
+		f2fs_up_write_trace(&sbi->cp_global_sem, &lc);
 	return err;
 }

···
 static int __write_checkpoint_sync(struct f2fs_sb_info *sbi)
 {
 	struct cp_control cpc = { .reason = CP_SYNC, };
+	struct f2fs_lock_context lc;
 	int err;

-	f2fs_down_write(&sbi->gc_lock);
+	f2fs_down_write_trace(&sbi->gc_lock, &lc);
 	err = f2fs_write_checkpoint(sbi, &cpc);
-	f2fs_up_write(&sbi->gc_lock);
+	f2fs_up_write_trace(&sbi->gc_lock, &lc);

 	return err;
 }
···
 	cpc.reason = __get_cp_reason(sbi);
 	if (!test_opt(sbi, MERGE_CHECKPOINT) || cpc.reason != CP_SYNC ||
 		sbi->umount_lock_holder == current) {
+		struct f2fs_lock_context lc;
 		int ret;

-		f2fs_down_write(&sbi->gc_lock);
+		f2fs_down_write_trace(&sbi->gc_lock, &lc);
 		ret = f2fs_write_checkpoint(sbi, &cpc);
-		f2fs_up_write(&sbi->gc_lock);
+		f2fs_up_write_trace(&sbi->gc_lock, &lc);

 		return ret;
 	}
···
 	}

 	set_task_ioprio(cprc->f2fs_issue_ckpt, cprc->ckpt_thread_ioprio);
+	set_user_nice(cprc->f2fs_issue_ckpt,
+			PRIO_TO_NICE(sbi->critical_task_priority));

 	return 0;
 }
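Reviewer note: two pieces of arithmetic in this file are easy to misread, so here is a minimal user-space sketch of both. It is not kernel code. `prio_to_nice()` reproduces the kernel's `PRIO_TO_NICE()` convention (nice 0 corresponds to static priority 120), `should_uplift()` mirrors the `orig_nice <= new_nice` early return in `uplift_priority()`, and `other_time_ms()` mirrors the residual computed in `trace_lock_elapsed_time_end()`. The helper names are hypothetical.

```c
#include <assert.h>

/* Kernel convention: nice 0 corresponds to static priority 120. */
#define DEFAULT_PRIO 120

/* Same arithmetic as PRIO_TO_NICE(): maps [100, 139] to [-20, 19]. */
static int prio_to_nice(int prio)
{
	assert(prio >= 100 && prio <= 139);
	return prio - DEFAULT_PRIO;
}

/*
 * Boost only when the target nice value is strictly lower (i.e. a higher
 * priority) than the task's current nice value.
 */
static int should_uplift(int orig_nice, int lock_duration_priority)
{
	return orig_nice > prio_to_nice(lock_duration_priority);
}

/*
 * Time not attributed to running, runnable or IO sleep. Stays 0 when the
 * components add up to at least the total, so it cannot underflow.
 */
static unsigned long long other_time_ms(unsigned long long total,
					unsigned long long running,
					unsigned long long runnable,
					unsigned long long io_sleep)
{
	if (total > running + io_sleep + runnable)
		return total - running - io_sleep - runnable;
	return 0;
}
```

With the documented default lock_duration_priority of 120, only tasks running at a positive nice value would be boosted (to nice 0) while holding a tracked lock; tasks already at nice 0 or below are left alone.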

+10 -8

fs/f2fs/compress.c

···
 	struct dnode_of_data dn;
 	struct node_info ni;
 	struct compress_io_ctx *cic;
+	struct f2fs_lock_context lc;
 	pgoff_t start_idx = start_idx_of_cluster(cc);
 	unsigned int last_index = cc->cluster_size - 1;
 	loff_t psize;
···
 		 * checkpoint. This can only happen to quota writes which can cause
 		 * the below discard race condition.
 		 */
-		f2fs_down_read(&sbi->node_write);
-	} else if (!f2fs_trylock_op(sbi)) {
+		f2fs_down_read_trace(&sbi->node_write, &lc);
+	} else if (!f2fs_trylock_op(sbi, &lc)) {
 		goto out_free;
 	}

···

 	f2fs_put_dnode(&dn);
 	if (quota_inode)
-		f2fs_up_read(&sbi->node_write);
+		f2fs_up_read_trace(&sbi->node_write, &lc);
 	else
-		f2fs_unlock_op(sbi);
+		f2fs_unlock_op(sbi, &lc);

 	spin_lock(&fi->i_size_lock);
 	if (fi->last_disk_size < psize)
···
 	f2fs_put_dnode(&dn);
 out_unlock_op:
 	if (quota_inode)
-		f2fs_up_read(&sbi->node_write);
+		f2fs_up_read_trace(&sbi->node_write, &lc);
 	else
-		f2fs_unlock_op(sbi);
+		f2fs_unlock_op(sbi, &lc);
 out_free:
 	for (i = 0; i < cc->valid_nr_cpages; i++) {
 		f2fs_compress_free_page(cc->cpages[i]);
···
 {
 	struct address_space *mapping = cc->inode->i_mapping;
 	struct f2fs_sb_info *sbi = F2FS_M_SB(mapping);
+	struct f2fs_lock_context lc;
 	int submitted, compr_blocks, i;
 	int ret = 0;

···

 	/* overwrite compressed cluster w/ normal cluster */
 	if (compr_blocks > 0)
-		f2fs_lock_op(sbi);
+		f2fs_lock_op(sbi, &lc);

 	for (i = 0; i < cc->cluster_size; i++) {
 		struct folio *folio;
···

 out:
 	if (compr_blocks > 0)
-		f2fs_unlock_op(sbi);
+		f2fs_unlock_op(sbi, &lc);

 	f2fs_balance_fs(sbi, true);
 	return ret;

+376 -62

fs/f2fs/data.c

···

 static struct kmem_cache *bio_post_read_ctx_cache;
 static struct kmem_cache *bio_entry_slab;
+static struct kmem_cache *ffs_entry_slab;
 static mempool_t *bio_post_read_ctx_pool;
 static struct bio_set f2fs_bioset;
+
+struct f2fs_folio_state {
+	spinlock_t state_lock;
+	unsigned int read_pages_pending;
+};

 #define F2FS_BIO_POOL_SIZE	NR_CURSEG_TYPE

···
 {
 	struct folio_iter fi;
 	struct bio_post_read_ctx *ctx = bio->bi_private;
+	unsigned long flags;

 	bio_for_each_folio_all(fi, bio) {
 		struct folio *folio = fi.folio;
+		unsigned nr_pages = fi.length >> PAGE_SHIFT;
+		bool finished = true;

-		if (f2fs_is_compressed_page(folio)) {
+		if (!folio_test_large(folio) &&
+			f2fs_is_compressed_page(folio)) {
 			if (ctx && !ctx->decompression_attempted)
 				f2fs_end_read_compressed_page(folio, true, 0,
 							in_task);
···
 			continue;
 		}

-		dec_page_count(F2FS_F_SB(folio), __read_io_type(folio));
-		folio_end_read(folio, bio->bi_status == BLK_STS_OK);
+		if (folio_test_large(folio)) {
+			struct f2fs_folio_state *ffs = folio->private;
+
+			spin_lock_irqsave(&ffs->state_lock, flags);
+			ffs->read_pages_pending -= nr_pages;
+			finished = !ffs->read_pages_pending;
+			spin_unlock_irqrestore(&ffs->state_lock, flags);
+		}
+
+		while (nr_pages--)
+			dec_page_count(F2FS_F_SB(folio), __read_io_type(folio));
+
+		if (F2FS_F_SB(folio)->node_inode && is_node_folio(folio) &&
+			f2fs_sanity_check_node_footer(F2FS_F_SB(folio),
+				folio, folio->index, NODE_TYPE_REGULAR, true))
+			bio->bi_status = BLK_STS_IOERR;
+
+		if (finished)
+			folio_end_read(folio, bio->bi_status == BLK_STS_OK);
 	}

 	if (ctx)
···
 		struct folio *folio = fi.folio;

 		if (!f2fs_is_compressed_page(folio) &&
-		    !fsverity_verify_page(vi, &folio->page)) {
+		    !fsverity_verify_folio(vi, folio)) {
 			bio->bi_status = BLK_STS_IOERR;
 			break;
 		}
···
 					STOP_CP_REASON_WRITE_FAIL);
 		}

-		f2fs_bug_on(sbi, is_node_folio(folio) &&
-				folio->index != nid_of_node(folio));
+		if (is_node_folio(folio)) {
+			f2fs_sanity_check_node_footer(sbi, folio,
+				folio->index, NODE_TYPE_REGULAR, true);
+			f2fs_bug_on(sbi, folio->index != nid_of_node(folio));
+		}

 		dec_page_count(sbi, type);
+
+		/*
+		 * we should access sbi before folio_end_writeback() to
+		 * avoid racing w/ kill_f2fs_super()
+		 */
+		if (type == F2FS_WB_CP_DATA && !get_pages(sbi, type) &&
+				wq_has_sleeper(&sbi->cp_wait))
+			wake_up(&sbi->cp_wait);
+
 		if (f2fs_in_warm_node_list(sbi, folio))
 			f2fs_del_fsync_node_entry(sbi, folio);
 		folio_clear_f2fs_gcing(folio);
 		folio_end_writeback(folio);
 	}
-	if (!get_pages(sbi, F2FS_WB_CP_DATA) &&
-				wq_has_sleeper(&sbi->cp_wait))
-		wake_up(&sbi->cp_wait);

 	bio_put(bio);
 }
···
 void f2fs_submit_read_bio(struct f2fs_sb_info *sbi, struct bio *bio,
 				 enum page_type type)
 {
+	if (!bio)
+		return;
+
 	WARN_ON_ONCE(!is_read_io(bio_op(bio)));
 	trace_f2fs_submit_read_bio(sbi->sb, type, bio);

···
 		for (j = HOT; j < n; j++) {
 			struct f2fs_bio_info *io = &sbi->write_io[i][j];

-			init_f2fs_rwsem(&io->io_rwsem);
+			init_f2fs_rwsem_trace(&io->io_rwsem, sbi,
+						LOCK_NAME_IO_RWSEM);
 			io->sbi = sbi;
 			io->bio = NULL;
 			io->last_block_in_bio = 0;
···
 {
 	enum page_type btype = PAGE_TYPE_OF_BIO(type);
 	struct f2fs_bio_info *io = sbi->write_io[btype] + temp;
+	struct f2fs_lock_context lc;

-	f2fs_down_write(&io->io_rwsem);
+	f2fs_down_write_trace(&io->io_rwsem, &lc);

 	if (!io->bio)
 		goto unlock_out;
···
 	}
 	__submit_merged_bio(io);
 unlock_out:
-	f2fs_up_write(&io->io_rwsem);
+	f2fs_up_write_trace(&io->io_rwsem, &lc);
 }

 static void __submit_merged_write_cond(struct f2fs_sb_info *sbi,
 				struct inode *inode, struct folio *folio,
-				nid_t ino, enum page_type type, bool force)
+				nid_t ino, enum page_type type, bool writeback)
 {
 	enum temp_type temp;
 	bool ret = true;
+	bool force = !inode && !folio && !ino;

 	for (temp = HOT; temp < NR_TEMP_TYPE; temp++) {
 		if (!force)	{
 			enum page_type btype = PAGE_TYPE_OF_BIO(type);
 			struct f2fs_bio_info *io = sbi->write_io[btype] + temp;
+			struct f2fs_lock_context lc;

-			f2fs_down_read(&io->io_rwsem);
+			f2fs_down_read_trace(&io->io_rwsem, &lc);
 			ret = __has_merged_page(io->bio, inode, folio, ino);
-			f2fs_up_read(&io->io_rwsem);
+			f2fs_up_read_trace(&io->io_rwsem, &lc);
 		}
-		if (ret)
+		if (ret) {
 			__f2fs_submit_merged_write(sbi, type, temp);
+			/*
+			 * For waitting writebck case, if the bio owned by the
+			 * folio is already submitted, we do not need to submit
+			 * other types of bios.
+			 */
+			if (writeback)
+				break;
+		}

 		/* TODO: use HOT temp only for meta pages now. */
 		if (type >= META)
···

 void f2fs_submit_merged_write(struct f2fs_sb_info *sbi, enum page_type type)
 {
-	__submit_merged_write_cond(sbi, NULL, NULL, 0, type, true);
+	__submit_merged_write_cond(sbi, NULL, NULL, 0, type, false);
 }

 void f2fs_submit_merged_write_cond(struct f2fs_sb_info *sbi,
···
 				nid_t ino, enum page_type type)
 {
 	__submit_merged_write_cond(sbi, inode, folio, ino, type, false);
+}
+
+void f2fs_submit_merged_write_folio(struct f2fs_sb_info *sbi,
+				struct folio *folio, enum page_type type)
+{
+	__submit_merged_write_cond(sbi, NULL, folio, 0, type, true);
 }

 void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi)
···
 	enum page_type btype = PAGE_TYPE_OF_BIO(fio->type);
 	struct f2fs_bio_info *io = sbi->write_io[btype] + fio->temp;
 	struct folio *bio_folio;
+	struct f2fs_lock_context lc;
 	enum count_type type;

 	f2fs_bug_on(sbi, is_read_io(fio->op));

-	f2fs_down_write(&io->io_rwsem);
+	f2fs_down_write_trace(&io->io_rwsem, &lc);
 next:
 #ifdef CONFIG_BLK_DEV_ZONED
 	if (f2fs_sb_has_blkzoned(sbi) && btype < META && io->zone_pending_bio) {
···
 	if (is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) ||
 				!f2fs_is_checkpoint_ready(sbi))
 		__submit_merged_bio(io);
-	f2fs_up_write(&io->io_rwsem);
+	f2fs_up_write_trace(&io->io_rwsem, &lc);
 }

 static struct bio *f2fs_grab_read_bio(struct inode *inode,
···
 	struct dnode_of_data dn;
 	struct folio *folio;
 	int err;
-
+retry:
 	folio = f2fs_grab_cache_folio(mapping, index, for_write);
 	if (IS_ERR(folio))
 		return folio;
+
+	if (folio_test_large(folio)) {
+		pgoff_t folio_index = mapping_align_index(mapping, index);
+
+		f2fs_folio_put(folio, true);
+		invalidate_inode_pages2_range(mapping, folio_index,
+				folio_index + folio_nr_pages(folio) - 1);
+		f2fs_schedule_timeout(DEFAULT_SCHEDULE_TIMEOUT);
+		goto retry;
+	}

 	if (f2fs_lookup_read_extent_cache_block(inode, index,
 						&dn.data_blkaddr)) {
···
 	return 0;
 }

-static void f2fs_map_lock(struct f2fs_sb_info *sbi, int flag)
+static void f2fs_map_lock(struct f2fs_sb_info *sbi,
+					struct f2fs_lock_context *lc,
+					int flag)
 {
-	f2fs_down_read(&sbi->cp_enable_rwsem);
 	if (flag == F2FS_GET_BLOCK_PRE_AIO)
-		f2fs_down_read(&sbi->node_change);
+		f2fs_down_read_trace(&sbi->node_change, lc);
 	else
-		f2fs_lock_op(sbi);
+		f2fs_lock_op(sbi, lc);
 }

-static void f2fs_map_unlock(struct f2fs_sb_info *sbi, int flag)
+static void f2fs_map_unlock(struct f2fs_sb_info *sbi,
+					struct f2fs_lock_context *lc,
+					int flag)
 {
 	if (flag == F2FS_GET_BLOCK_PRE_AIO)
-		f2fs_up_read(&sbi->node_change);
+		f2fs_up_read_trace(&sbi->node_change, lc);
 	else
-		f2fs_unlock_op(sbi);
-	f2fs_up_read(&sbi->cp_enable_rwsem);
+		f2fs_unlock_op(sbi, lc);
 }

 int f2fs_get_block_locked(struct dnode_of_data *dn, pgoff_t index)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
+	struct f2fs_lock_context lc;
 	int err = 0;

-	f2fs_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO);
+	f2fs_map_lock(sbi, &lc, F2FS_GET_BLOCK_PRE_AIO);
 	if (!f2fs_lookup_read_extent_cache_block(dn->inode, index,
 						&dn->data_blkaddr))
 		err = f2fs_reserve_block(dn, index);
-	f2fs_map_unlock(sbi, F2FS_GET_BLOCK_PRE_AIO);
+	f2fs_map_unlock(sbi, &lc, F2FS_GET_BLOCK_PRE_AIO);

 	return err;
 }
···
 	unsigned int maxblocks = map->m_len;
 	struct dnode_of_data dn;
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct f2fs_lock_context lc;
 	int mode = map->m_may_create ? ALLOC_NODE : LOOKUP_NODE;
 	pgoff_t pgofs, end_offset, end;
 	int err = 0, ofs = 1;
···
 	if (map->m_may_create) {
 		if (f2fs_lfs_mode(sbi))
 			f2fs_balance_fs(sbi, true);
-		f2fs_map_lock(sbi, flag);
+		f2fs_map_lock(sbi, &lc, flag);
 	}

 	/* When reading holes, we need its node page */
···
 	f2fs_put_dnode(&dn);

 	if (map->m_may_create) {
-		f2fs_map_unlock(sbi, flag);
+		f2fs_map_unlock(sbi, &lc, flag);
 		f2fs_balance_fs(sbi, dn.node_changed);
 	}
 	goto next_dnode;
···
 	f2fs_put_dnode(&dn);
 unlock_out:
 	if (map->m_may_create) {
-		f2fs_map_unlock(sbi, flag);
+		f2fs_map_unlock(sbi, &lc, flag);
 		f2fs_balance_fs(sbi, dn.node_changed);
 	}
 out:
···
 	return err;
 }

-bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len)
+static bool __f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len,
+				bool check_first)
 {
 	struct f2fs_map_blocks map;
 	block_t last_lblk;
···
 		if (err || map.m_len == 0)
 			return false;
 		map.m_lblk += map.m_len;
+		if (check_first)
+			break;
 	}
 	return true;
+}
+
+bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len)
+{
+	return __f2fs_overwrite_io(inode, pos, len, false);
 }

 static int f2fs_xattr_fiemap(struct inode *inode,
···
 	/*
 	 * Map blocks using the previous result first.
 	 */
-	if ((map->m_flags & F2FS_MAP_MAPPED) &&
-			block_in_file > map->m_lblk &&
+	if (map->m_flags & F2FS_MAP_MAPPED) {
+		if (block_in_file > map->m_lblk &&
 			block_in_file < (map->m_lblk + map->m_len))
+			goto got_it;
+	} else if (block_in_file < *map->m_next_pgofs) {
 		goto got_it;
+	}

 	/*
 	 * Then do more f2fs_map_blocks() calls until we are
···
 }
 #endif

+static struct f2fs_folio_state *ffs_find_or_alloc(struct folio *folio)
+{
+	struct f2fs_folio_state *ffs = folio->private;
+
+	if (ffs)
+		return ffs;
+
+	ffs = f2fs_kmem_cache_alloc(ffs_entry_slab,
+			GFP_NOIO | __GFP_ZERO, true, NULL);
+
+	spin_lock_init(&ffs->state_lock);
+	folio_attach_private(folio, ffs);
+	return ffs;
+}
+
+static void ffs_detach_free(struct folio *folio)
+{
+	struct f2fs_folio_state *ffs;
+
+	if (!folio_test_large(folio)) {
+		folio_detach_private(folio);
+		return;
+	}
+
+	ffs = folio_detach_private(folio);
+	if (!ffs)
+		return;
+
+	WARN_ON_ONCE(ffs->read_pages_pending != 0);
+	kmem_cache_free(ffs_entry_slab, ffs);
+}
+
+static int f2fs_read_data_large_folio(struct inode *inode,
+		struct fsverity_info *vi,
+		struct readahead_control *rac, struct folio *folio)
+{
+	struct bio *bio = NULL;
+	sector_t last_block_in_bio = 0;
+	struct f2fs_map_blocks map = {0, };
+	pgoff_t index, offset, next_pgofs = 0;
+	unsigned max_nr_pages = rac ? readahead_count(rac) :
+				folio_nr_pages(folio);
+	unsigned nrpages;
+	struct f2fs_folio_state *ffs;
+	int ret = 0;
+	bool folio_in_bio;
+
+	if (!IS_IMMUTABLE(inode) || f2fs_compressed_file(inode)) {
+		if (folio)
+			folio_unlock(folio);
+		return -EOPNOTSUPP;
+	}
+
+	map.m_seg_type = NO_CHECK_TYPE;
+
+	if (rac)
+		folio = readahead_folio(rac);
+next_folio:
+	if (!folio)
+		goto out;
+
+	folio_in_bio = false;
+	index = folio->index;
+	offset = 0;
+	ffs = NULL;
+	nrpages = folio_nr_pages(folio);
+
+	for (; nrpages; nrpages--, max_nr_pages--, index++, offset++) {
+		sector_t block_nr;
+		/*
+		 * Map blocks using the previous result first.
+		 */
+		if (map.m_flags & F2FS_MAP_MAPPED) {
+			if (index > map.m_lblk &&
+				index < (map.m_lblk + map.m_len))
+				goto got_it;
+		} else if (index < next_pgofs) {
+			/* hole case */
+			goto got_it;
+		}
+
+		/*
+		 * Then do more f2fs_map_blocks() calls until we are
+		 * done with this page.
2430 + */ 2431 + memset(&map, 0, sizeof(map)); 2432 + map.m_next_pgofs = &next_pgofs; 2433 + map.m_seg_type = NO_CHECK_TYPE; 2434 + map.m_lblk = index; 2435 + map.m_len = max_nr_pages; 2436 + 2437 + ret = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_DEFAULT); 2438 + if (ret) 2439 + goto err_out; 2440 + got_it: 2441 + if ((map.m_flags & F2FS_MAP_MAPPED)) { 2442 + block_nr = map.m_pblk + index - map.m_lblk; 2443 + if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), block_nr, 2444 + DATA_GENERIC_ENHANCE_READ)) { 2445 + ret = -EFSCORRUPTED; 2446 + goto err_out; 2447 + } 2448 + } else { 2449 + size_t page_offset = offset << PAGE_SHIFT; 2450 + folio_zero_range(folio, page_offset, PAGE_SIZE); 2451 + if (vi && !fsverity_verify_blocks(vi, folio, PAGE_SIZE, page_offset)) { 2452 + ret = -EIO; 2453 + goto err_out; 2454 + } 2455 + continue; 2456 + } 2457 + 2458 + /* We must increment read_pages_pending before possible BIOs submitting 2459 + * to prevent from premature folio_end_read() call on folio 2460 + */ 2461 + if (folio_test_large(folio)) { 2462 + ffs = ffs_find_or_alloc(folio); 2463 + 2464 + /* set the bitmap to wait */ 2465 + spin_lock_irq(&ffs->state_lock); 2466 + ffs->read_pages_pending++; 2467 + spin_unlock_irq(&ffs->state_lock); 2468 + } 2469 + 2470 + /* 2471 + * This page will go to BIO. Do we need to send this 2472 + * BIO off first? 2473 + */ 2474 + if (bio && (!page_is_mergeable(F2FS_I_SB(inode), bio, 2475 + last_block_in_bio, block_nr) || 2476 + !f2fs_crypt_mergeable_bio(bio, inode, index, NULL))) { 2477 + submit_and_realloc: 2478 + f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA); 2479 + bio = NULL; 2480 + } 2481 + if (bio == NULL) 2482 + bio = f2fs_grab_read_bio(inode, vi, 2483 + block_nr, max_nr_pages, 2484 + f2fs_ra_op_flags(rac), 2485 + index, false); 2486 + 2487 + /* 2488 + * If the page is under writeback, we need to wait for 2489 + * its completion to see the correct decrypted data. 
2490 + */ 2491 + f2fs_wait_on_block_writeback(inode, block_nr); 2492 + 2493 + if (!bio_add_folio(bio, folio, F2FS_BLKSIZE, 2494 + offset << PAGE_SHIFT)) 2495 + goto submit_and_realloc; 2496 + 2497 + folio_in_bio = true; 2498 + inc_page_count(F2FS_I_SB(inode), F2FS_RD_DATA); 2499 + f2fs_update_iostat(F2FS_I_SB(inode), NULL, FS_DATA_READ_IO, 2500 + F2FS_BLKSIZE); 2501 + last_block_in_bio = block_nr; 2502 + } 2503 + trace_f2fs_read_folio(folio, DATA); 2504 + err_out: 2505 + if (!folio_in_bio) { 2506 + folio_end_read(folio, !ret); 2507 + if (ret) 2508 + return ret; 2509 + } 2510 + if (rac) { 2511 + folio = readahead_folio(rac); 2512 + goto next_folio; 2513 + } 2514 + out: 2515 + f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA); 2516 + if (ret) { 2517 + /* Wait bios and clear uptodate. */ 2518 + folio_lock(folio); 2519 + folio_clear_uptodate(folio); 2520 + folio_unlock(folio); 2521 + } 2522 + return ret; 2523 + } 2524 + 2429 2525 /* 2430 2526 * This function was originally taken from fs/mpage.c, and customized for f2fs. 2431 2527 * Major change was from block_size == page_size in f2fs by default. ··· 2629 2367 pgoff_t nc_cluster_idx = NULL_CLUSTER; 2630 2368 pgoff_t index; 2631 2369 #endif 2370 + pgoff_t next_pgofs = 0; 2632 2371 unsigned nr_pages = rac ? readahead_count(rac) : 1; 2372 + struct address_space *mapping = rac ? 
rac->mapping : folio->mapping; 2633 2373 unsigned max_nr_pages = nr_pages; 2634 2374 int ret = 0; 2375 + 2376 + if (mapping_large_folio_support(mapping)) 2377 + return f2fs_read_data_large_folio(inode, vi, rac, folio); 2635 2378 2636 2379 #ifdef CONFIG_F2FS_FS_COMPRESSION 2637 2380 if (f2fs_compressed_file(inode)) { ··· 2650 2383 map.m_lblk = 0; 2651 2384 map.m_len = 0; 2652 2385 map.m_flags = 0; 2653 - map.m_next_pgofs = NULL; 2386 + map.m_next_pgofs = &next_pgofs; 2654 2387 map.m_next_extent = NULL; 2655 2388 map.m_seg_type = NO_CHECK_TYPE; 2656 2389 map.m_may_create = false; ··· 2731 2464 } 2732 2465 #endif 2733 2466 } 2734 - if (bio) 2735 - f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA); 2467 + f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA); 2736 2468 return ret; 2737 2469 } 2738 2470 ··· 2929 2663 struct inode *inode = folio->mapping->host; 2930 2664 struct dnode_of_data dn; 2931 2665 struct node_info ni; 2666 + struct f2fs_lock_context lc; 2932 2667 bool ipu_force = false; 2933 2668 bool atomic_commit; 2934 2669 int err = 0; ··· 2954 2687 goto got_it; 2955 2688 } 2956 2689 2690 + if (is_sbi_flag_set(fio->sbi, SBI_ENABLE_CHECKPOINT) && 2691 + time_to_inject(fio->sbi, FAULT_SKIP_WRITE)) 2692 + return -EINVAL; 2693 + 2957 2694 /* Deadlock due to between page->lock and f2fs_lock_op */ 2958 - if (fio->need_lock == LOCK_REQ && !f2fs_trylock_op(fio->sbi)) 2695 + if (fio->need_lock == LOCK_REQ && !f2fs_trylock_op(fio->sbi, &lc)) 2959 2696 return -EAGAIN; 2960 2697 2961 2698 err = f2fs_get_dnode_of_data(&dn, folio->index, LOOKUP_NODE); ··· 3000 2729 folio_start_writeback(folio); 3001 2730 f2fs_put_dnode(&dn); 3002 2731 if (fio->need_lock == LOCK_REQ) 3003 - f2fs_unlock_op(fio->sbi); 2732 + f2fs_unlock_op(fio->sbi, &lc); 3004 2733 err = f2fs_inplace_write_data(fio); 3005 2734 if (err) { 3006 2735 if (fscrypt_inode_uses_fs_layer_crypto(inode)) ··· 3014 2743 } 3015 2744 3016 2745 if (fio->need_lock == LOCK_RETRY) { 3017 - if (!f2fs_trylock_op(fio->sbi)) { 2746 + if 
(!f2fs_trylock_op(fio->sbi, &lc)) { 3018 2747 err = -EAGAIN; 3019 2748 goto out_writepage; 3020 2749 } ··· 3046 2775 f2fs_put_dnode(&dn); 3047 2776 out: 3048 2777 if (fio->need_lock == LOCK_REQ) 3049 - f2fs_unlock_op(fio->sbi); 2778 + f2fs_unlock_op(fio->sbi, &lc); 3050 2779 return err; 3051 2780 } 3052 2781 ··· 3126 2855 write: 3127 2856 /* Dentry/quota blocks are controlled by checkpoint */ 3128 2857 if (S_ISDIR(inode->i_mode) || quota_inode) { 2858 + struct f2fs_lock_context lc; 2859 + 3129 2860 /* 3130 2861 * We need to wait for node_write to avoid block allocation during 3131 2862 * checkpoint. This can only happen to quota writes which can cause 3132 2863 * the below discard race condition. 3133 2864 */ 3134 2865 if (quota_inode) 3135 - f2fs_down_read(&sbi->node_write); 2866 + f2fs_down_read_trace(&sbi->node_write, &lc); 3136 2867 3137 2868 fio.need_lock = LOCK_DONE; 3138 2869 err = f2fs_do_write_data_page(&fio); 3139 2870 3140 2871 if (quota_inode) 3141 - f2fs_up_read(&sbi->node_write); 2872 + f2fs_up_read_trace(&sbi->node_write, &lc); 3142 2873 3143 2874 goto done; 3144 2875 } ··· 3510 3237 if (IS_NOQUOTA(inode)) 3511 3238 return false; 3512 3239 3240 + if (f2fs_is_pinned_file(inode)) 3241 + return false; 3513 3242 if (f2fs_need_compress_data(inode)) 3514 3243 return true; 3515 3244 if (wbc->sync_mode != WB_SYNC_ALL) ··· 3532 3257 else 3533 3258 atomic_dec(&F2FS_I(inode)->writeback); 3534 3259 f2fs_up_read(&F2FS_I(inode)->i_sem); 3260 + } 3261 + 3262 + static inline void update_skipped_write(struct f2fs_sb_info *sbi, 3263 + struct writeback_control *wbc) 3264 + { 3265 + long skipped = wbc->pages_skipped; 3266 + 3267 + if (is_sbi_flag_set(sbi, SBI_ENABLE_CHECKPOINT) && skipped && 3268 + wbc->sync_mode == WB_SYNC_ALL) 3269 + atomic_add(skipped, &sbi->nr_pages[F2FS_SKIPPED_WRITE]); 3535 3270 } 3536 3271 3537 3272 static int __f2fs_write_data_pages(struct address_space *mapping, ··· 3608 3323 */ 3609 3324 3610 3325 f2fs_remove_dirty_inode(inode); 3326 + 3327 + 
/* 3328 + * f2fs_write_cache_pages() has retry logic for EAGAIN case which is 3329 + * common when racing w/ checkpoint, so only update skipped write 3330 + * when ret is non-zero. 3331 + */ 3332 + if (ret) 3333 + update_skipped_write(sbi, wbc); 3611 3334 return ret; 3612 3335 3613 3336 skip_write: 3614 3337 wbc->pages_skipped += get_dirty_pages(inode); 3338 + update_skipped_write(sbi, wbc); 3615 3339 trace_f2fs_writepages(mapping->host, wbc, DATA); 3616 3340 return 0; 3617 3341 } ··· 3662 3368 struct inode *inode = folio->mapping->host; 3663 3369 pgoff_t index = folio->index; 3664 3370 struct dnode_of_data dn; 3371 + struct f2fs_lock_context lc; 3665 3372 struct folio *ifolio; 3666 3373 bool locked = false; 3667 3374 int flag = F2FS_GET_BLOCK_PRE_AIO; ··· 3679 3384 if (f2fs_has_inline_data(inode)) { 3680 3385 if (pos + len > MAX_INLINE_DATA(inode)) 3681 3386 flag = F2FS_GET_BLOCK_DEFAULT; 3682 - f2fs_map_lock(sbi, flag); 3387 + f2fs_map_lock(sbi, &lc, flag); 3683 3388 locked = true; 3684 3389 } else if ((pos & PAGE_MASK) >= i_size_read(inode)) { 3685 - f2fs_map_lock(sbi, flag); 3390 + f2fs_map_lock(sbi, &lc, flag); 3686 3391 locked = true; 3687 3392 } 3688 3393 ··· 3726 3431 if (!err && dn.data_blkaddr != NULL_ADDR) 3727 3432 goto out; 3728 3433 f2fs_put_dnode(&dn); 3729 - f2fs_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO); 3434 + f2fs_map_lock(sbi, &lc, F2FS_GET_BLOCK_PRE_AIO); 3730 3435 WARN_ON(flag != F2FS_GET_BLOCK_PRE_AIO); 3731 3436 locked = true; 3732 3437 goto restart; ··· 3740 3445 f2fs_put_dnode(&dn); 3741 3446 unlock_out: 3742 3447 if (locked) 3743 - f2fs_map_unlock(sbi, flag); 3448 + f2fs_map_unlock(sbi, &lc, flag); 3744 3449 return err; 3745 3450 } 3746 3451 ··· 3776 3481 { 3777 3482 struct f2fs_sb_info *sbi = F2FS_I_SB(inode); 3778 3483 struct dnode_of_data dn; 3484 + struct f2fs_lock_context lc; 3779 3485 struct folio *ifolio; 3780 3486 int err = 0; 3781 3487 3782 - f2fs_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO); 3488 + f2fs_map_lock(sbi, &lc, 
F2FS_GET_BLOCK_PRE_AIO); 3783 3489 3784 3490 ifolio = f2fs_get_inode_folio(sbi, inode->i_ino); 3785 3491 if (IS_ERR(ifolio)) { ··· 3798 3502 f2fs_put_dnode(&dn); 3799 3503 3800 3504 unlock_out: 3801 - f2fs_map_unlock(sbi, F2FS_GET_BLOCK_PRE_AIO); 3505 + f2fs_map_unlock(sbi, &lc, F2FS_GET_BLOCK_PRE_AIO); 3802 3506 return err; 3803 3507 } 3804 3508 ··· 4057 3761 f2fs_remove_dirty_inode(inode); 4058 3762 } 4059 3763 } 4060 - folio_detach_private(folio); 3764 + 3765 + if (offset || length != folio_size(folio)) 3766 + return; 3767 + 3768 + folio_cancel_dirty(folio); 3769 + ffs_detach_free(folio); 4061 3770 } 4062 3771 4063 3772 bool f2fs_release_folio(struct folio *folio, gfp_t wait) ··· 4071 3770 if (folio_test_dirty(folio)) 4072 3771 return false; 4073 3772 4074 - folio_detach_private(folio); 3773 + ffs_detach_free(folio); 4075 3774 return true; 4076 3775 } 4077 3776 ··· 4256 3955 4257 3956 while (cur_lblock < last_lblock && cur_lblock < sis->max) { 4258 3957 struct f2fs_map_blocks map; 3958 + bool last_extent = false; 4259 3959 retry: 4260 3960 cond_resched(); 4261 3961 ··· 4282 3980 pblock = map.m_pblk; 4283 3981 nr_pblocks = map.m_len; 4284 3982 4285 - if ((pblock - SM_I(sbi)->main_blkaddr) % blks_per_sec || 4286 - nr_pblocks % blks_per_sec || 4287 - f2fs_is_sequential_zone_area(sbi, pblock)) { 4288 - bool last_extent = false; 4289 - 3983 + if (!last_extent && 3984 + ((pblock - SM_I(sbi)->main_blkaddr) % blks_per_sec || 3985 + nr_pblocks % blks_per_sec || 3986 + f2fs_is_sequential_zone_area(sbi, pblock))) { 4290 3987 not_aligned++; 4291 3988 4292 3989 nr_pblocks = roundup(nr_pblocks, blks_per_sec); ··· 4306 4005 goto out; 4307 4006 } 4308 4007 4309 - if (!last_extent) 4310 - goto retry; 4008 + /* lookup block mapping info after block migration */ 4009 + goto retry; 4311 4010 } 4312 4011 4313 4012 if (cur_lblock + nr_pblocks >= sis->max) ··· 4477 4176 { 4478 4177 bio_entry_slab = f2fs_kmem_cache_create("f2fs_bio_entry_slab", 4479 4178 sizeof(struct bio_entry)); 4480 
- return bio_entry_slab ? 0 : -ENOMEM; 4179 + 4180 + if (!bio_entry_slab) 4181 + return -ENOMEM; 4182 + 4183 + ffs_entry_slab = f2fs_kmem_cache_create("f2fs_ffs_slab", 4184 + sizeof(struct f2fs_folio_state)); 4185 + 4186 + if (!ffs_entry_slab) { 4187 + kmem_cache_destroy(bio_entry_slab); 4188 + return -ENOMEM; 4189 + } 4190 + 4191 + return 0; 4481 4192 } 4482 4193 4483 4194 void f2fs_destroy_bio_entry_cache(void) 4484 4195 { 4485 4196 kmem_cache_destroy(bio_entry_slab); 4197 + kmem_cache_destroy(ffs_entry_slab); 4486 4198 } 4487 4199 4488 4200 static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, ··· 4521 4207 * f2fs_map_lock and f2fs_balance_fs are not necessary. 4522 4208 */ 4523 4209 if ((flags & IOMAP_WRITE) && 4524 - !f2fs_overwrite_io(inode, offset, length)) 4210 + !__f2fs_overwrite_io(inode, offset, length, true)) 4525 4211 map.m_may_create = true; 4526 4212 4527 4213 err = f2fs_map_blocks(inode, &map, F2FS_GET_BLOCK_DIO);
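Review note: the cache setup at the end of the data.c changes now creates two slab caches and destroys the first one if the second cannot be created. The same unwind-on-failure shape can be sketched in userspace as follows. This is a minimal illustration, not the kernel code: `demo_cache`, `demo_cache_create()`, `demo_cache_destroy()`, `live_caches` and `fail_second_create` are hypothetical names, and `malloc()` stands in for the slab allocator.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache; not a kernel type. */
struct demo_cache {
	size_t obj_size;
};

struct demo_cache *bio_cache;
struct demo_cache *ffs_cache;
int live_caches;          /* number of caches currently allocated */
bool fail_second_create;  /* test hook: force the second create to fail */
int create_calls;         /* how many creates the current init has attempted */

static struct demo_cache *demo_cache_create(size_t obj_size)
{
	struct demo_cache *c;

	create_calls++;
	if (fail_second_create && create_calls == 2)
		return NULL;

	c = malloc(sizeof(*c));
	if (!c)
		return NULL;
	c->obj_size = obj_size;
	live_caches++;
	return c;
}

static void demo_cache_destroy(struct demo_cache *c)
{
	if (!c)
		return;
	free(c);
	live_caches--;
}

/* Returns 0 on success, or -ENOMEM with nothing left allocated. */
int demo_init_caches(void)
{
	create_calls = 0;

	bio_cache = demo_cache_create(64);
	if (!bio_cache)
		return -ENOMEM;

	ffs_cache = demo_cache_create(32);
	if (!ffs_cache) {
		/* Undo the first step so a failed init leaks nothing. */
		demo_cache_destroy(bio_cache);
		bio_cache = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* Tears down both caches; safe to call when either pointer is NULL. */
void demo_destroy_caches(void)
{
	demo_cache_destroy(bio_cache);
	demo_cache_destroy(ffs_cache);
	bio_cache = NULL;
	ffs_cache = NULL;
}
```

Calling `demo_init_caches()` with `fail_second_create` set returns `-ENOMEM` and leaves `live_caches` at zero, which is the property the added error path in the patch appears to be after.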

+1

fs/f2fs/debug.c

···
423 423     [SBI_IS_RESIZEFS] = "resizefs",
424 424     [SBI_IS_FREEZING] = "freezefs",
425 425     [SBI_IS_WRITABLE] = "writable",
    426 +   [SBI_ENABLE_CHECKPOINT] = "enable_checkpoint",
426 427   };
427 428
428 429   static const char *ipu_mode_names[F2FS_IPU_MAX] = {
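Review note: the debug.c hunk adds a name for the new flag to a table built with designated initializers. In such a table, any enum value without an entry is left as a NULL pointer, which is presumably why the new flag needs its own line. A minimal sketch under stated assumptions: the `DEMO_*` enum, `demo_flag_names` and `demo_flag_name()` are hypothetical and only mirror the shape of the table in the diff.

```c
#include <stddef.h>

/* Hypothetical flag set; DEMO_UNNAMED deliberately has no table entry. */
enum demo_flag {
	DEMO_IS_RESIZEFS,
	DEMO_IS_FREEZING,
	DEMO_IS_WRITABLE,
	DEMO_ENABLE_CHECKPOINT,
	DEMO_UNNAMED,
	DEMO_MAX_FLAG,
};

/* Slots that are not named here are implicitly initialized to NULL. */
static const char *const demo_flag_names[DEMO_MAX_FLAG] = {
	[DEMO_IS_RESIZEFS]       = "resizefs",
	[DEMO_IS_FREEZING]       = "freezefs",
	[DEMO_IS_WRITABLE]       = "writable",
	[DEMO_ENABLE_CHECKPOINT] = "enable_checkpoint",
};

/* Returns the flag's name, or "unknown" for out-of-range or unnamed flags. */
const char *demo_flag_name(int flag)
{
	if (flag < 0 || flag >= DEMO_MAX_FLAG || !demo_flag_names[flag])
		return "unknown";
	return demo_flag_names[flag];
}
```

The NULL check in `demo_flag_name()` is a choice made for this sketch; whether the real debug code guards against missing names is not visible in the hunk.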

+182 -67

fs/f2fs/f2fs.h

··· 54 54 FAULT_TRUNCATE, 55 55 FAULT_READ_IO, 56 56 FAULT_CHECKPOINT, 57 - FAULT_DISCARD, 57 + FAULT_DISCARD, /* it's obsolete due to __blkdev_issue_discard() will never fail */ 58 58 FAULT_WRITE_IO, 59 59 FAULT_SLAB_ALLOC, 60 60 FAULT_DQUOT_INIT, ··· 63 63 FAULT_BLKADDR_CONSISTENCE, 64 64 FAULT_NO_SEGMENT, 65 65 FAULT_INCONSISTENT_FOOTER, 66 - FAULT_TIMEOUT, 66 + FAULT_ATOMIC_TIMEOUT, 67 67 FAULT_VMALLOC, 68 + FAULT_LOCK_TIMEOUT, 69 + FAULT_SKIP_WRITE, 68 70 FAULT_MAX, 69 71 }; 70 72 ··· 74 72 enum fault_option { 75 73 FAULT_RATE = 1, /* only update fault rate */ 76 74 FAULT_TYPE = 2, /* only update fault type */ 77 - FAULT_ALL = 4, /* reset all fault injection options/stats */ 75 + FAULT_TIMEOUT = 4, /* only update fault timeout type */ 76 + FAULT_ALL = 8, /* reset all fault injection options/stats */ 78 77 }; 79 78 80 79 #ifdef CONFIG_F2FS_FAULT_INJECTION ··· 85 82 unsigned int inject_type; 86 83 /* Used to account total count of injection for each type */ 87 84 unsigned int inject_count[FAULT_MAX]; 85 + unsigned int inject_lock_timeout; /* inject lock timeout */ 88 86 }; 89 87 90 88 extern const char *f2fs_fault_name[FAULT_MAX]; ··· 177 173 ALLOCATE_FORWARD_FROM_HINT, 178 174 }; 179 175 176 + enum f2fs_lock_name { 177 + LOCK_NAME_NONE, 178 + LOCK_NAME_CP_RWSEM, 179 + LOCK_NAME_NODE_CHANGE, 180 + LOCK_NAME_NODE_WRITE, 181 + LOCK_NAME_GC_LOCK, 182 + LOCK_NAME_CP_GLOBAL, 183 + LOCK_NAME_IO_RWSEM, 184 + LOCK_NAME_MAX, 185 + }; 186 + 187 + enum f2fs_timeout_type { 188 + TIMEOUT_TYPE_NONE, 189 + TIMEOUT_TYPE_RUNNING, 190 + TIMEOUT_TYPE_IO_SLEEP, 191 + TIMEOUT_TYPE_NONIO_SLEEP, 192 + TIMEOUT_TYPE_RUNNABLE, 193 + TIMEOUT_TYPE_MAX, 194 + }; 195 + 180 196 /* 181 197 * An implementation of an rwsem that is explicitly unfair to readers. 
This 182 198 * prevents priority inversion when a low-priority reader acquires the read lock ··· 205 181 */ 206 182 207 183 struct f2fs_rwsem { 184 + struct f2fs_sb_info *sbi; 185 + enum f2fs_lock_name name; 208 186 struct rw_semaphore internal_rwsem; 209 187 #ifdef CONFIG_F2FS_UNFAIR_RWSEM 210 188 wait_queue_head_t read_waiters; ··· 313 287 #define DEF_CP_INTERVAL 60 /* 60 secs */ 314 288 #define DEF_IDLE_INTERVAL 5 /* 5 secs */ 315 289 #define DEF_DISABLE_INTERVAL 5 /* 5 secs */ 316 - #define DEF_ENABLE_INTERVAL 5 /* 5 secs */ 317 290 #define DEF_DISABLE_QUICK_INTERVAL 1 /* 1 secs */ 318 291 #define DEF_UMOUNT_DISCARD_TIMEOUT 5 /* 5 secs */ 319 292 ··· 320 295 CP_TIME_START, /* begin */ 321 296 CP_TIME_LOCK, /* after cp_global_sem */ 322 297 CP_TIME_OP_LOCK, /* after block_operation */ 323 - CP_TIME_FLUSH_META, /* after flush sit/nat */ 298 + CP_TIME_MERGE_WRITE, /* after flush DATA/NODE/META */ 299 + CP_TIME_FLUSH_NAT, /* after flush nat */ 300 + CP_TIME_FLUSH_SIT, /* after flush sit */ 324 301 CP_TIME_SYNC_META, /* after sync_meta_pages */ 325 302 CP_TIME_SYNC_CP_META, /* after sync cp meta pages */ 326 303 CP_TIME_WAIT_DIRTY_META,/* after wait on dirty meta */ ··· 548 521 #define nats_in_cursum(jnl) (le16_to_cpu((jnl)->n_nats)) 549 522 #define sits_in_cursum(jnl) (le16_to_cpu((jnl)->n_sits)) 550 523 551 - #define nat_in_journal(jnl, i) ((jnl)->nat_j.entries[i].ne) 552 - #define nid_in_journal(jnl, i) ((jnl)->nat_j.entries[i].nid) 553 - #define sit_in_journal(jnl, i) ((jnl)->sit_j.entries[i].se) 554 - #define segno_in_journal(jnl, i) ((jnl)->sit_j.entries[i].segno) 524 + #define nat_in_journal(jnl, i) \ 525 + (((struct nat_journal_entry *)(jnl)->nat_j.entries)[i].ne) 526 + #define nid_in_journal(jnl, i) \ 527 + (((struct nat_journal_entry *)(jnl)->nat_j.entries)[i].nid) 528 + #define sit_in_journal(jnl, i) \ 529 + (((struct sit_journal_entry *)(jnl)->sit_j.entries)[i].se) 530 + #define segno_in_journal(jnl, i) \ 531 + (((struct sit_journal_entry 
*)(jnl)->sit_j.entries)[i].segno) 555 532 556 - #define MAX_NAT_JENTRIES(jnl) (NAT_JOURNAL_ENTRIES - nats_in_cursum(jnl)) 557 - #define MAX_SIT_JENTRIES(jnl) (SIT_JOURNAL_ENTRIES - sits_in_cursum(jnl)) 533 + #define sum_entries(sum) ((struct f2fs_summary *)(sum)) 534 + #define sum_journal(sbi, sum) \ 535 + ((struct f2fs_journal *)((char *)(sum) + \ 536 + ((sbi)->entries_in_sum * sizeof(struct f2fs_summary)))) 537 + #define sum_footer(sbi, sum) \ 538 + ((struct summary_footer *)((char *)(sum) + (sbi)->sum_blocksize - \ 539 + sizeof(struct summary_footer))) 540 + 541 + #define MAX_NAT_JENTRIES(sbi, jnl) ((sbi)->nat_journal_entries - nats_in_cursum(jnl)) 542 + #define MAX_SIT_JENTRIES(sbi, jnl) ((sbi)->sit_journal_entries - sits_in_cursum(jnl)) 558 543 559 544 static inline int update_nats_in_cursum(struct f2fs_journal *journal, int i) 560 545 { ··· 582 543 583 544 journal->n_sits = cpu_to_le16(before + i); 584 545 return before; 585 - } 586 - 587 - static inline bool __has_cursum_space(struct f2fs_journal *journal, 588 - int size, int type) 589 - { 590 - if (type == NAT_JOURNAL) 591 - return size <= MAX_NAT_JENTRIES(journal); 592 - return size <= MAX_SIT_JENTRIES(journal); 593 546 } 594 547 595 548 /* for inline stuff */ ··· 700 669 701 670 #define DEFAULT_RETRY_IO_COUNT 8 /* maximum retry read IO or flush count */ 702 671 703 - /* IO/non-IO congestion wait timeout value, default: 1ms */ 704 - #define DEFAULT_SCHEDULE_TIMEOUT (msecs_to_jiffies(1)) 672 + #define MAX_FLUSH_RETRY_COUNT 3 /* maximum flush retry count in f2fs_enable_checkpoint() */ 673 + 674 + /* IO/non-IO congestion wait timeout value, default: 1 jiffies */ 675 + #define DEFAULT_SCHEDULE_TIMEOUT 1 705 676 706 677 /* timeout value injected, default: 1000ms */ 707 678 #define DEFAULT_FAULT_TIMEOUT (msecs_to_jiffies(1000)) ··· 1241 1208 F2FS_RD_META, 1242 1209 F2FS_DIO_WRITE, 1243 1210 F2FS_DIO_READ, 1211 + F2FS_SKIPPED_WRITE, /* skip or fail during f2fs_enable_checkpoint() */ 1244 1212 NR_COUNT_TYPE, 1245 
1213 }; 1246 1214 ··· 1430 1396 unsigned long long age_threshold; /* age threshold */ 1431 1397 }; 1432 1398 1399 + struct f2fs_time_stat { 1400 + unsigned long long total_time; /* total wall clock time */ 1401 + #ifdef CONFIG_64BIT 1402 + unsigned long long running_time; /* running time */ 1403 + #endif 1404 + #if defined(CONFIG_SCHED_INFO) && defined(CONFIG_SCHEDSTATS) 1405 + unsigned long long runnable_time; /* runnable(including preempted) time */ 1406 + #endif 1407 + #ifdef CONFIG_TASK_DELAY_ACCT 1408 + unsigned long long io_sleep_time; /* IO sleep time */ 1409 + #endif 1410 + }; 1411 + 1412 + struct f2fs_lock_context { 1413 + struct f2fs_time_stat ts; 1414 + int orig_nice; 1415 + int new_nice; 1416 + bool lock_trace; 1417 + bool need_restore; 1418 + }; 1419 + 1433 1420 struct f2fs_gc_control { 1434 1421 unsigned int victim_segno; /* target victim segment number */ 1435 1422 int init_gc_type; /* FG_GC or BG_GC */ ··· 1459 1404 bool err_gc_skipped; /* return EAGAIN if GC skipped */ 1460 1405 bool one_time; /* require one time GC in one migration unit */ 1461 1406 unsigned int nr_free_secs; /* # of free sections to do GC */ 1407 + struct f2fs_lock_context lc; /* lock context for gc_lock */ 1462 1408 }; 1463 1409 1464 1410 /* ··· 1483 1427 SBI_IS_RESIZEFS, /* resizefs is in process */ 1484 1428 SBI_IS_FREEZING, /* freezefs is in process */ 1485 1429 SBI_IS_WRITABLE, /* remove ro mountoption transiently */ 1430 + SBI_ENABLE_CHECKPOINT, /* indicate it's during f2fs_enable_checkpoint() */ 1486 1431 MAX_SBI_FLAG, 1487 1432 }; 1488 1433 ··· 1493 1436 DISCARD_TIME, 1494 1437 GC_TIME, 1495 1438 DISABLE_TIME, 1496 - ENABLE_TIME, 1497 1439 UMOUNT_DISCARD_TIMEOUT, 1498 1440 MAX_TIME, 1499 1441 }; ··· 1577 1521 LOOKUP_COMPAT, 1578 1522 LOOKUP_AUTO, 1579 1523 }; 1524 + 1525 + /* For node type in __get_node_folio() */ 1526 + enum node_type { 1527 + NODE_TYPE_REGULAR, 1528 + NODE_TYPE_INODE, 1529 + NODE_TYPE_XATTR, 1530 + NODE_TYPE_NON_INODE, 1531 + }; 1532 + 1533 + /* a 
threshold of maximum elapsed time in critical region to print tracepoint */ 1534 + #define MAX_LOCK_ELAPSED_TIME 500 1535 + 1536 + #define F2FS_DEFAULT_TASK_PRIORITY (DEFAULT_PRIO) 1537 + #define F2FS_CRITICAL_TASK_PRIORITY NICE_TO_PRIO(0) 1580 1538 1581 1539 static inline int f2fs_test_bit(unsigned int nr, char *addr); 1582 1540 static inline void f2fs_set_bit(unsigned int nr, char *addr); ··· 1784 1714 long interval_time[MAX_TIME]; /* to store thresholds */ 1785 1715 struct ckpt_req_control cprc_info; /* for checkpoint request control */ 1786 1716 struct cp_stats cp_stats; /* for time stat of checkpoint */ 1787 - struct f2fs_rwsem cp_enable_rwsem; /* block cache/dio write */ 1788 1717 1789 1718 struct inode_management im[MAX_INO_ENTRY]; /* manage inode cache */ 1790 1719 ··· 1829 1760 unsigned int total_valid_node_count; /* valid node block count */ 1830 1761 int dir_level; /* directory level */ 1831 1762 bool readdir_ra; /* readahead inode in readdir */ 1832 - u64 max_io_bytes; /* max io bytes to merge IOs */ 1763 + unsigned int max_io_bytes; /* max io bytes to merge IOs */ 1764 + 1765 + /* variable summary block units */ 1766 + unsigned int sum_blocksize; /* sum block size */ 1767 + unsigned int sums_per_block; /* sum block count per block */ 1768 + unsigned int entries_in_sum; /* entry count in sum block */ 1769 + unsigned int sum_entry_size; /* total entry size in sum block */ 1770 + unsigned int sum_journal_size; /* journal size in sum block */ 1771 + unsigned int nat_journal_entries; /* nat journal entry count in the journal */ 1772 + unsigned int sit_journal_entries; /* sit journal entry count in the journal */ 1833 1773 1834 1774 block_t user_block_count; /* # of user blocks */ 1835 1775 block_t total_valid_block_count; /* # of valid blocks */ ··· 1986 1908 unsigned int gc_segment_mode; /* GC state for reclaimed segments */ 1987 1909 unsigned int gc_reclaimed_segs[MAX_GC_MODE]; /* Reclaimed segs for each mode */ 1988 1910 1989 - unsigned long 
seq_file_ra_mul; /* multiplier for ra_pages of seq. files in fadvise */ 1911 + unsigned int seq_file_ra_mul; /* multiplier for ra_pages of seq. files in fadvise */ 1990 1912 1991 1913 int max_fragment_chunk; /* max chunk size for block fragmentation mode */ 1992 1914 int max_fragment_hole; /* max hole size for block fragmentation mode */ ··· 1999 1921 2000 1922 /* carve out reserved_blocks from total blocks */ 2001 1923 bool carve_out; 1924 + 1925 + /* max elapsed time threshold in critical region that lock covered */ 1926 + unsigned long long max_lock_elapsed_time; 1927 + 1928 + /* enable/disable to adjust task priority in critical region covered by lock */ 1929 + unsigned int adjust_lock_priority; 1930 + 1931 + /* adjust priority for task which is in critical region covered by lock */ 1932 + unsigned int lock_duration_priority; 1933 + 1934 + /* priority for critical task, e.g. f2fs_ckpt, f2fs_gc threads */ 1935 + long critical_task_priority; 2002 1936 2003 1937 #ifdef CONFIG_F2FS_FS_COMPRESSION 2004 1938 struct kmem_cache *page_array_slab; /* page array entry */ ··· 2351 2261 spin_unlock_irqrestore(&sbi->cp_lock, flags); 2352 2262 } 2353 2263 2354 - #define init_f2fs_rwsem(sem) \ 2264 + #define init_f2fs_rwsem(sem) __init_f2fs_rwsem(sem, NULL, LOCK_NAME_NONE) 2265 + #define init_f2fs_rwsem_trace __init_f2fs_rwsem 2266 + 2267 + #define __init_f2fs_rwsem(sem, sbi, name) \ 2355 2268 do { \ 2356 2269 static struct lock_class_key __key; \ 2357 2270 \ 2358 - __init_f2fs_rwsem((sem), #sem, &__key); \ 2271 + do_init_f2fs_rwsem((sem), #sem, &__key, sbi, name); \ 2359 2272 } while (0) 2360 2273 2361 - static inline void __init_f2fs_rwsem(struct f2fs_rwsem *sem, 2362 - const char *sem_name, struct lock_class_key *key) 2274 + static inline void do_init_f2fs_rwsem(struct f2fs_rwsem *sem, 2275 + const char *sem_name, struct lock_class_key *key, 2276 + struct f2fs_sb_info *sbi, enum f2fs_lock_name name) 2363 2277 { 2278 + sem->sbi = sbi; 2279 + sem->name = name; 2364 2280 
__init_rwsem(&sem->internal_rwsem, sem_name, key); 2365 2281 #ifdef CONFIG_F2FS_UNFAIR_RWSEM 2366 2282 init_waitqueue_head(&sem->read_waiters); ··· 2435 2339 #endif 2436 2340 } 2437 2341 2342 + void f2fs_down_read_trace(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc); 2343 + int f2fs_down_read_trylock_trace(struct f2fs_rwsem *sem, 2344 + struct f2fs_lock_context *lc); 2345 + void f2fs_up_read_trace(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc); 2346 + void f2fs_down_write_trace(struct f2fs_rwsem *sem, 2347 + struct f2fs_lock_context *lc); 2348 + int f2fs_down_write_trylock_trace(struct f2fs_rwsem *sem, 2349 + struct f2fs_lock_context *lc); 2350 + void f2fs_up_write_trace(struct f2fs_rwsem *sem, struct f2fs_lock_context *lc); 2351 + 2438 2352 static inline void disable_nat_bits(struct f2fs_sb_info *sbi, bool lock) 2439 2353 { 2440 2354 unsigned long flags; ··· 2473 2367 bool set = is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG); 2474 2368 2475 2369 return (cpc) ? (cpc->reason & CP_UMOUNT) && set : set; 2476 - } 2477 - 2478 - static inline void f2fs_lock_op(struct f2fs_sb_info *sbi) 2479 - { 2480 - f2fs_down_read(&sbi->cp_rwsem); 2481 - } 2482 - 2483 - static inline int f2fs_trylock_op(struct f2fs_sb_info *sbi) 2484 - { 2485 - if (time_to_inject(sbi, FAULT_LOCK_OP)) 2486 - return 0; 2487 - return f2fs_down_read_trylock(&sbi->cp_rwsem); 2488 - } 2489 - 2490 - static inline void f2fs_unlock_op(struct f2fs_sb_info *sbi) 2491 - { 2492 - f2fs_up_read(&sbi->cp_rwsem); 2493 - } 2494 - 2495 - static inline void f2fs_lock_all(struct f2fs_sb_info *sbi) 2496 - { 2497 - f2fs_down_write(&sbi->cp_rwsem); 2498 - } 2499 - 2500 - static inline void f2fs_unlock_all(struct f2fs_sb_info *sbi) 2501 - { 2502 - f2fs_up_write(&sbi->cp_rwsem); 2503 2370 } 2504 2371 2505 2372 static inline int __get_cp_reason(struct f2fs_sb_info *sbi) ··· 2888 2809 static inline block_t __start_sum_addr(struct f2fs_sb_info *sbi) 2889 2810 { 2890 2811 return 
le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_start_sum); 2812 + } 2813 + 2814 + static inline bool __has_cursum_space(struct f2fs_sb_info *sbi, 2815 + struct f2fs_journal *journal, unsigned int size, int type) 2816 + { 2817 + if (type == NAT_JOURNAL) 2818 + return size <= MAX_NAT_JENTRIES(sbi, journal); 2819 + return size <= MAX_SIT_JENTRIES(sbi, journal); 2891 2820 } 2892 2821 2893 2822 extern void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync); ··· 3809 3722 int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc); 3810 3723 void f2fs_remove_donate_inode(struct inode *inode); 3811 3724 void f2fs_evict_inode(struct inode *inode); 3812 - void f2fs_handle_failed_inode(struct inode *inode); 3725 + void f2fs_handle_failed_inode(struct inode *inode, struct f2fs_lock_context *lc); 3813 3726 3814 3727 /* 3815 3728 * namei.c ··· 3942 3855 void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid); 3943 3856 struct folio *f2fs_get_node_folio(struct f2fs_sb_info *sbi, pgoff_t nid, 3944 3857 enum node_type node_type); 3858 + int f2fs_sanity_check_node_footer(struct f2fs_sb_info *sbi, 3859 + struct folio *folio, pgoff_t nid, 3860 + enum node_type ntype, bool in_irq); 3945 3861 struct folio *f2fs_get_inode_folio(struct f2fs_sb_info *sbi, pgoff_t ino); 3946 3862 struct folio *f2fs_get_xnode_folio(struct f2fs_sb_info *sbi, pgoff_t xnid); 3947 3863 int f2fs_move_node_folio(struct folio *node_folio, int gc_type); ··· 4044 3954 block_t len); 4045 3955 void f2fs_write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk); 4046 3956 void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk); 4047 - int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, 3957 + int f2fs_lookup_journal_in_cursum(struct f2fs_sb_info *sbi, 3958 + struct f2fs_journal *journal, int type, 4048 3959 unsigned int val, int alloc); 4049 3960 void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); 4050 3961 int 
f2fs_check_and_fix_write_pointer(struct f2fs_sb_info *sbi);
··· 4080 3989
 /*
  * checkpoint.c
  */
+void f2fs_lock_op(struct f2fs_sb_info *sbi, struct f2fs_lock_context *lc);
+int f2fs_trylock_op(struct f2fs_sb_info *sbi, struct f2fs_lock_context *lc);
+void f2fs_unlock_op(struct f2fs_sb_info *sbi, struct f2fs_lock_context *lc);
 void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io,
 							unsigned char reason);
 void f2fs_flush_ckpt_thread(struct f2fs_sb_info *sbi);
··· 4098 4004
 							int type, bool sync);
 void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index,
 							unsigned int ra_blocks);
-long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
-			long nr_to_write, enum iostat_type io_type);
+long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, long nr_to_write,
+			enum iostat_type io_type);
 void f2fs_add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type);
 void f2fs_remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type);
 void f2fs_release_ino_entry(struct f2fs_sb_info *sbi, bool all);
··· 4144 4050
 void f2fs_submit_merged_write_cond(struct f2fs_sb_info *sbi,
 				struct inode *inode, struct folio *folio,
 				nid_t ino, enum page_type type);
+void f2fs_submit_merged_write_folio(struct f2fs_sb_info *sbi,
+				struct folio *folio, enum page_type type);
 void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
 					struct bio **bio, struct folio *folio);
 void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi);
··· 4983 4887
 #ifdef CONFIG_F2FS_FAULT_INJECTION
 extern int f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned long rate,
 					unsigned long type, enum fault_option fo);
+extern void f2fs_simulate_lock_timeout(struct f2fs_sb_info *sbi);
 #else
 static inline int f2fs_build_fault_attr(struct f2fs_sb_info *sbi,
 					unsigned long rate, unsigned long type,
 					enum fault_option fo)
 {
 	return 0;
+}
+static inline void f2fs_simulate_lock_timeout(struct f2fs_sb_info *sbi)
+{
+	return;
 }
 #endif

··· 5006 4905
 			F2FS_OPTION(sbi).s_qf_names[GRPQUOTA] ||
 			F2FS_OPTION(sbi).s_qf_names[PRJQUOTA])
 		return true;
+#endif
+	return false;
+}
+
+static inline bool f2fs_quota_file(struct f2fs_sb_info *sbi, nid_t ino)
+{
+#ifdef CONFIG_QUOTA
+	int i;
+
+	if (!f2fs_sb_has_quota_ino(sbi))
+		return false;
+
+	for (i = 0; i < MAXQUOTAS; i++) {
+		if (f2fs_qf_ino(sbi->sb, i) == ino)
+			return true;
+	}
 #endif
 	return false;
 }
··· 5045 4928
 #define f2fs_schedule_timeout(timeout)			\
 			__f2fs_schedule_timeout(timeout, false)

-static inline void f2fs_io_schedule_timeout_killable(long timeout)
+static inline void f2fs_schedule_timeout_killable(long timeout, bool io)
 {
-	while (timeout) {
+	unsigned long last_time = jiffies + timeout;
+
+	while (jiffies < last_time) {
 		if (fatal_signal_pending(current))
 			return;
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		io_schedule_timeout(DEFAULT_SCHEDULE_TIMEOUT);
-		if (timeout <= DEFAULT_SCHEDULE_TIMEOUT)
-			return;
-		timeout -= DEFAULT_SCHEDULE_TIMEOUT;
+		__f2fs_schedule_timeout(DEFAULT_SCHEDULE_TIMEOUT, io);
 	}
 }

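
The rewritten f2fs_schedule_timeout_killable() loops until a deadline derived from jiffies instead of counting a remaining timeout down. As a general illustration of deadline loops over a tick counter that can wrap (not a statement about what this patch intends), the user-space sketch below uses a wrap-tolerant comparison in the spirit of the kernel's time_before() helper. The names tick_before and slices_until, and the 32-bit counter width, are assumptions made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a wrap-tolerant "a is earlier than b" test.
 * The unsigned difference is reinterpreted as signed; this relies on the
 * usual two's-complement conversion, which mainstream compilers provide.
 */
static bool tick_before(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(a - b) < 0;
}

/*
 * Counts how many fixed-size sleep slices are taken before the deadline
 * is reached, starting from tick value "now". Sleeping is modelled by
 * simply advancing the counter.
 */
static unsigned int slices_until(uint32_t now, uint32_t timeout, uint32_t slice)
{
	uint32_t deadline = (uint32_t)(now + timeout);	/* may wrap around */
	unsigned int n = 0;

	while (tick_before(now, deadline)) {
		now = (uint32_t)(now + slice);
		n++;
	}
	return n;
}
```

With a plain `now < deadline` test the second case in a call such as slices_until(UINT32_MAX - 5, 20, 10) would exit immediately, because the deadline wraps to a small value; the signed-difference form keeps looping until the deadline is actually reached.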

+51 -33

fs/f2fs/file.c

··· 626 626
 	if (!f2fs_is_compress_backend_ready(inode))
 		return -EOPNOTSUPP;

+	if (mapping_large_folio_support(inode->i_mapping) &&
+	    filp->f_mode & FMODE_WRITE)
+		return -EOPNOTSUPP;
+
 	err = fsverity_file_open(inode, filp);
 	if (err)
 		return err;
··· 776 772
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct dnode_of_data dn;
+	struct f2fs_lock_context lc;
 	pgoff_t free_from;
 	int count = 0, err = 0;
 	struct folio *ifolio;
··· 795 790
 		goto free_partial;

 	if (lock)
-		f2fs_lock_op(sbi);
+		f2fs_lock_op(sbi, &lc);

 	ifolio = f2fs_get_inode_folio(sbi, inode->i_ino);
 	if (IS_ERR(ifolio)) {
··· 846 841
 	err = f2fs_truncate_inode_blocks(inode, free_from);
 out:
 	if (lock)
-		f2fs_unlock_op(sbi);
+		f2fs_unlock_op(sbi, &lc);
 free_partial:
 	/* lastly zero out the first data page */
 	if (!err)
··· 1117 1112
 	}
 	if (i_uid_needs_update(idmap, attr, inode) ||
 	    i_gid_needs_update(idmap, attr, inode)) {
-		f2fs_lock_op(sbi);
+		struct f2fs_lock_context lc;
+
+		f2fs_lock_op(sbi, &lc);
 		err = dquot_transfer(idmap, inode, attr);
 		if (err) {
 			set_sbi_flag(sbi, SBI_QUOTA_NEED_REPAIR);
-			f2fs_unlock_op(sbi);
+			f2fs_unlock_op(sbi, &lc);
 			return err;
 		}
 		/*
··· 1133 1126
 		i_uid_update(idmap, attr, inode);
 		i_gid_update(idmap, attr, inode);
 		f2fs_mark_inode_dirty_sync(inode, true);
-		f2fs_unlock_op(sbi);
+		f2fs_unlock_op(sbi, &lc);
 	}

 	if (attr->ia_valid & ATTR_SIZE) {
··· 1217 1210
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct folio *folio;
+	struct f2fs_lock_context lc;

 	if (!len)
 		return 0;

 	f2fs_balance_fs(sbi, true);

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	folio = f2fs_get_new_data_folio(inode, NULL, index, false);
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
··· 1309 1301
 		if (pg_start < pg_end) {
 			loff_t blk_start, blk_end;
 			struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+			struct f2fs_lock_context lc;

 			f2fs_balance_fs(sbi, true);

··· 1321 1312

 			truncate_pagecache_range(inode, blk_start, blk_end - 1);

-			f2fs_lock_op(sbi);
+			f2fs_lock_op(sbi, &lc);
 			ret = f2fs_truncate_hole(inode, pg_start, pg_end);
-			f2fs_unlock_op(sbi);
+			f2fs_unlock_op(sbi, &lc);

 			filemap_invalidate_unlock(inode->i_mapping);
 			f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
··· 1555 1546
 static int f2fs_do_collapse(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct f2fs_lock_context lc;
 	pgoff_t nrpages = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
 	pgoff_t start = offset >> PAGE_SHIFT;
 	pgoff_t end = (offset + len) >> PAGE_SHIFT;
··· 1569 1559

 	f2fs_zero_post_eof_page(inode, offset + len, false);

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	f2fs_drop_extent_tree(inode);
 	truncate_pagecache(inode, offset);
 	ret = __exchange_data_block(inode, inode, end, start, nrpages - end, true);
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	filemap_invalidate_unlock(inode->i_mapping);
 	f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
··· 1721 1711

 		for (index = pg_start; index < pg_end;) {
 			struct dnode_of_data dn;
+			struct f2fs_lock_context lc;
 			unsigned int end_offset;
 			pgoff_t end;

··· 1732 1721
 				(loff_t)index << PAGE_SHIFT,
 				((loff_t)pg_end << PAGE_SHIFT) - 1);

-			f2fs_lock_op(sbi);
+			f2fs_lock_op(sbi, &lc);

 			set_new_dnode(&dn, inode, NULL, NULL, 0);
 			ret = f2fs_get_dnode_of_data(&dn, index, ALLOC_NODE);
 			if (ret) {
-				f2fs_unlock_op(sbi);
+				f2fs_unlock_op(sbi, &lc);
 				filemap_invalidate_unlock(mapping);
 				f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 				goto out;
··· 1749 1738
 			ret = f2fs_do_zero_range(&dn, index, end);
 			f2fs_put_dnode(&dn);

-			f2fs_unlock_op(sbi);
+			f2fs_unlock_op(sbi, &lc);
 			filemap_invalidate_unlock(mapping);
 			f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);

··· 1832 1821
 	truncate_pagecache(inode, offset);

 	while (!ret && idx > pg_start) {
+		struct f2fs_lock_context lc;
+
 		nr = idx - pg_start;
 		if (nr > delta)
 			nr = delta;
 		idx -= nr;

-		f2fs_lock_op(sbi);
+		f2fs_lock_op(sbi, &lc);
 		f2fs_drop_extent_tree(inode);

 		ret = __exchange_data_block(inode, inode, idx,
 					idx + delta, nr, false);
-		f2fs_unlock_op(sbi);
+		f2fs_unlock_op(sbi, &lc);
 	}
 	filemap_invalidate_unlock(mapping);
 	f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
··· 1926 1913

 		if (has_not_enough_free_secs(sbi, 0,
 				sbi->reserved_pin_section)) {
-			f2fs_down_write(&sbi->gc_lock);
+			f2fs_down_write_trace(&sbi->gc_lock, &gc_control.lc);
 			stat_inc_gc_call_count(sbi, FOREGROUND);
 			err = f2fs_gc(sbi, &gc_control);
 			if (err && err != -ENODATA) {
··· 2461 2448
 		f2fs_stop_checkpoint(sbi, false, STOP_CP_REASON_SHUTDOWN);
 		break;
 	case F2FS_GOING_DOWN_METAFLUSH:
-		f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_META_IO);
+		f2fs_sync_meta_pages(sbi, LONG_MAX, FS_META_IO);
 		f2fs_stop_checkpoint(sbi, false, STOP_CP_REASON_SHUTDOWN);
 		break;
 	case F2FS_GOING_DOWN_NEED_FSCK:
··· 2777 2764
 		return ret;

 	if (!sync) {
-		if (!f2fs_down_write_trylock(&sbi->gc_lock)) {
+		if (!f2fs_down_write_trylock_trace(&sbi->gc_lock,
+						&gc_control.lc)) {
 			ret = -EBUSY;
 			goto out;
 		}
 	} else {
-		f2fs_down_write(&sbi->gc_lock);
+		f2fs_down_write_trace(&sbi->gc_lock, &gc_control.lc);
 	}

 	gc_control.init_gc_type = sync ? FG_GC : BG_GC;
··· 2823 2809

 do_more:
 	if (!range->sync) {
-		if (!f2fs_down_write_trylock(&sbi->gc_lock)) {
+		if (!f2fs_down_write_trylock_trace(&sbi->gc_lock, &gc_control.lc)) {
 			ret = -EBUSY;
 			goto out;
 		}
 	} else {
-		f2fs_down_write(&sbi->gc_lock);
+		f2fs_down_write_trace(&sbi->gc_lock, &gc_control.lc);
 	}

 	gc_control.victim_segno = GET_SEGNO(sbi, range->start);
··· 3101 3087
 	struct inode *src = file_inode(file_in);
 	struct inode *dst = file_inode(file_out);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(src);
+	struct f2fs_lock_context lc;
 	size_t olen = len, dst_max_i_size = 0;
 	size_t dst_osize;
 	int ret;
··· 3197 3182
 		goto out_src;
 	}

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	ret = __exchange_data_block(src, dst, F2FS_BYTES_TO_BLK(pos_in),
 				F2FS_BYTES_TO_BLK(pos_out),
 				F2FS_BYTES_TO_BLK(len), false);
··· 3208 3193
 		else if (dst_osize != dst->i_size)
 			f2fs_i_size_write(dst, dst_osize);
 	}
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	if (src != dst)
 		f2fs_up_write(&F2FS_I(dst)->i_gc_rwsem[WRITE]);
··· 3319 3304
 	end_segno = min(start_segno + range.segments, dev_end_segno);

 	while (start_segno < end_segno) {
-		if (!f2fs_down_write_trylock(&sbi->gc_lock)) {
+		if (!f2fs_down_write_trylock_trace(&sbi->gc_lock, &gc_control.lc)) {
 			ret = -EBUSY;
 			goto out;
 		}
··· 3376 3361
 	struct f2fs_inode_info *fi = F2FS_I(inode);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct f2fs_inode *ri = NULL;
+	struct f2fs_lock_context lc;
 	kprojid_t kprojid;
 	int err;

··· 3407 3391
 	if (err)
 		return err;

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	err = f2fs_transfer_project_quota(inode, kprojid);
 	if (err)
 		goto out_unlock;
··· 3416 3400
 	inode_set_ctime_current(inode);
 	f2fs_mark_inode_dirty_sync(inode, true);
 out_unlock:
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);
 	return err;
 }
 #else
··· 3849 3833
 	struct inode *inode = file_inode(filp);
 	struct f2fs_inode_info *fi = F2FS_I(inode);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct f2fs_lock_context lc;
 	pgoff_t page_idx = 0, last_idx;
 	unsigned int released_blocks = 0;
 	int ret;
··· 3904 3887
 		struct dnode_of_data dn;
 		pgoff_t end_offset, count;

-		f2fs_lock_op(sbi);
+		f2fs_lock_op(sbi, &lc);

 		set_new_dnode(&dn, inode, NULL, NULL, 0);
 		ret = f2fs_get_dnode_of_data(&dn, page_idx, LOOKUP_NODE);
 		if (ret) {
-			f2fs_unlock_op(sbi);
+			f2fs_unlock_op(sbi, &lc);
 			if (ret == -ENOENT) {
 				page_idx = f2fs_get_next_page_offset(&dn,
 								page_idx);
··· 3927 3910

 		f2fs_put_dnode(&dn);

-		f2fs_unlock_op(sbi);
+		f2fs_unlock_op(sbi, &lc);

 		if (ret < 0)
 			break;
··· 4080 4063

 	while (page_idx < last_idx) {
 		struct dnode_of_data dn;
+		struct f2fs_lock_context lc;
 		pgoff_t end_offset, count;

-		f2fs_lock_op(sbi);
+		f2fs_lock_op(sbi, &lc);

 		set_new_dnode(&dn, inode, NULL, NULL, 0);
 		ret = f2fs_get_dnode_of_data(&dn, page_idx, LOOKUP_NODE);
 		if (ret) {
-			f2fs_unlock_op(sbi);
+			f2fs_unlock_op(sbi, &lc);
 			if (ret == -ENOENT) {
 				page_idx = f2fs_get_next_page_offset(&dn,
 								page_idx);
··· 4106 4088

 		f2fs_put_dnode(&dn);

-		f2fs_unlock_op(sbi);
+		f2fs_unlock_op(sbi, &lc);

 		if (ret < 0)
 			break;
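
Throughout file.c the callers now pass a caller-owned struct f2fs_lock_context to f2fs_lock_op() and f2fs_unlock_op(). The layout of that structure is not shown in this part of the diff, so the following is only a guess at the general idea, written as plain user-space C: the context records when the lock was taken, and the unlock side reports whether the hold time exceeded a threshold (plausibly related to the max_lock_elapsed_time entry documented earlier, whose default is given as 500 ms). All names here (demo_lock_ctx, demo_lock, demo_unlock) are hypothetical, and timestamps are passed in explicitly so the logic can be exercised without a real clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-acquisition context; the real struct f2fs_lock_context may differ. */
struct demo_lock_ctx {
	uint64_t acquired_ms;	/* timestamp taken when the lock was acquired */
};

/* Models the lock side: a real implementation would take the lock, then stamp. */
static void demo_lock(struct demo_lock_ctx *lc, uint64_t now_ms)
{
	lc->acquired_ms = now_ms;
}

/*
 * Models the unlock side: returns true when the lock was held for strictly
 * longer than threshold_ms, i.e. when a diagnostic would be emitted.
 */
static bool demo_unlock(const struct demo_lock_ctx *lc, uint64_t now_ms,
			uint64_t threshold_ms)
{
	return now_ms - lc->acquired_ms > threshold_ms;
}
```

Keeping the context on the caller's stack means each acquisition has its own timestamp, so nested or concurrent holders do not overwrite one another's start time.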

+53 -33

fs/f2fs/gc.c

··· 102 102
 		if (sbi->gc_mode == GC_URGENT_HIGH ||
 				sbi->gc_mode == GC_URGENT_MID) {
 			wait_ms = gc_th->urgent_sleep_time;
-			f2fs_down_write(&sbi->gc_lock);
+			f2fs_down_write_trace(&sbi->gc_lock, &gc_control.lc);
 			goto do_gc;
 		}

 		if (foreground) {
-			f2fs_down_write(&sbi->gc_lock);
+			f2fs_down_write_trace(&sbi->gc_lock, &gc_control.lc);
 			goto do_gc;
-		} else if (!f2fs_down_write_trylock(&sbi->gc_lock)) {
+		} else if (!f2fs_down_write_trylock_trace(&sbi->gc_lock,
+							&gc_control.lc)) {
 			stat_other_skip_bggc_count(sbi);
 			goto next;
 		}

 		if (!is_idle(sbi, GC_TIME)) {
 			increase_sleep_time(gc_th, &wait_ms);
-			f2fs_up_write(&sbi->gc_lock);
+			f2fs_up_write_trace(&sbi->gc_lock, &gc_control.lc);
 			stat_io_skip_bggc_count(sbi);
 			goto next;
 		}
··· 126 125
 			if (has_enough_free_blocks(sbi,
 				gc_th->no_zoned_gc_percent)) {
 				wait_ms = gc_th->no_gc_sleep_time;
-				f2fs_up_write(&sbi->gc_lock);
+				f2fs_up_write_trace(&sbi->gc_lock,
+							&gc_control.lc);
 				goto next;
 			}
 			if (wait_ms == gc_th->no_gc_sleep_time)
··· 234 232
 		return err;
 	}

+	set_user_nice(gc_th->f2fs_gc_task,
+			PRIO_TO_NICE(sbi->critical_task_priority));
 	return 0;
 }

··· 1035 1031
  * ignore that.
  */
 static int gc_node_segment(struct f2fs_sb_info *sbi,
-		struct f2fs_summary *sum, unsigned int segno, int gc_type)
+		struct f2fs_summary *sum, unsigned int segno, int gc_type,
+		struct blk_plug *plug)
 {
 	struct f2fs_summary *entry;
 	block_t start_addr;
··· 1105 1100
 		stat_inc_node_blk_count(sbi, 1, gc_type);
 	}

-	if (++phase < 3)
+	if (++phase < 3) {
+		blk_finish_plug(plug);
+		blk_start_plug(plug);
 		goto next_step;
+	}

 	if (fggc)
 		atomic_dec(&sbi->wb_sync_req[NODE]);
··· 1461 1453
 put_out:
 	f2fs_put_dnode(&dn);
 out:
-	f2fs_folio_put(folio, true);
+	if (!folio_test_uptodate(folio))
+		__folio_set_dropbehind(folio);
+	folio_unlock(folio);
+	folio_end_dropbehind(folio);
+	folio_put(folio);
 	return err;
 }

··· 1547 1535
  */
 static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 		struct gc_inode_list *gc_list, unsigned int segno, int gc_type,
-		bool force_migrate)
+		bool force_migrate, struct blk_plug *plug)
 {
 	struct super_block *sb = sbi->sb;
 	struct f2fs_summary *entry;
··· 1715 1703
 		}
 	}

-	if (++phase < 5)
+	if (++phase < 5) {
+		blk_finish_plug(plug);
+		blk_start_plug(plug);
 		goto next_step;
+	}

 	return submitted;
 }
··· 1784 1769

 	sanity_check_seg_type(sbi, get_seg_entry(sbi, segno)->type);

-	segno = rounddown(segno, SUMS_PER_BLOCK);
-	sum_blk_cnt = DIV_ROUND_UP(end_segno - segno, SUMS_PER_BLOCK);
+	segno = rounddown(segno, sbi->sums_per_block);
+	sum_blk_cnt = DIV_ROUND_UP(end_segno - segno, sbi->sums_per_block);
 	/* readahead multi ssa blocks those have contiguous address */
 	if (__is_large_section(sbi))
 		f2fs_ra_meta_pages(sbi, GET_SUM_BLOCK(sbi, segno),
··· 1795 1780
 	while (segno < end_segno) {
 		struct folio *sum_folio = f2fs_get_sum_folio(sbi, segno);

-		segno += SUMS_PER_BLOCK;
+		segno += sbi->sums_per_block;
 		if (IS_ERR(sum_folio)) {
 			int err = PTR_ERR(sum_folio);

-			end_segno = segno - SUMS_PER_BLOCK;
-			segno = rounddown(start_segno, SUMS_PER_BLOCK);
+			end_segno = segno - sbi->sums_per_block;
+			segno = rounddown(start_segno, sbi->sums_per_block);
 			while (segno < end_segno) {
 				sum_folio = filemap_get_folio(META_MAPPING(sbi),
 						GET_SUM_BLOCK(sbi, segno));
 				folio_put_refs(sum_folio, 2);
-				segno += SUMS_PER_BLOCK;
+				segno += sbi->sums_per_block;
 			}
 			return err;
 		}
··· 1821 1806
 		/* find segment summary of victim */
 		struct folio *sum_folio = filemap_get_folio(META_MAPPING(sbi),
 					GET_SUM_BLOCK(sbi, segno));
-		unsigned int block_end_segno = rounddown(segno, SUMS_PER_BLOCK)
-					+ SUMS_PER_BLOCK;
+		unsigned int block_end_segno = rounddown(segno, sbi->sums_per_block)
+					+ sbi->sums_per_block;

 		if (block_end_segno > end_segno)
 			block_end_segno = end_segno;
··· 1848 1833
 				migrated >= sbi->migration_granularity)
 				continue;

-			sum = SUM_BLK_PAGE_ADDR(sum_folio, cur_segno);
-			if (type != GET_SUM_TYPE((&sum->footer))) {
+			sum = SUM_BLK_PAGE_ADDR(sbi, sum_folio, cur_segno);
+			if (type != GET_SUM_TYPE(sum_footer(sbi, sum))) {
 				f2fs_err(sbi, "Inconsistent segment (%u) type "
 						"[%d, %d] in SSA and SIT",
 						cur_segno, type,
-						GET_SUM_TYPE((&sum->footer)));
+						GET_SUM_TYPE(
+							sum_footer(sbi, sum)));
 				f2fs_stop_checkpoint(sbi, false,
 						STOP_CP_REASON_CORRUPTED_SUMMARY);
 				continue;
··· 1869 1853
 			 */
 			if (type == SUM_TYPE_NODE)
 				submitted += gc_node_segment(sbi, sum->entries,
-						cur_segno, gc_type);
+						cur_segno, gc_type, &plug);
 			else
 				submitted += gc_data_segment(sbi, sum->entries,
 						gc_list, cur_segno,
-						gc_type, force_migrate);
+						gc_type, force_migrate, &plug);

 			stat_inc_gc_seg_count(sbi, data_type, gc_type);
 			sbi->gc_reclaimed_segs[sbi->gc_mode]++;
··· 2016 2000
 		goto stop;
 	}

-	__get_secs_required(sbi, NULL, &upper_secs, NULL);
+	upper_secs = __get_secs_required(sbi);

 	/*
 	 * Write checkpoint to reclaim prefree segments.
··· 2051 2035
 				reserved_segments(sbi),
 				prefree_segments(sbi));

-	f2fs_up_write(&sbi->gc_lock);
+	f2fs_up_write_trace(&sbi->gc_lock, &gc_control->lc);

 	put_gc_inode(&gc_list);

··· 2112 2096
 	if (unlikely(f2fs_cp_error(sbi)))
 		return -EIO;

+	stat_inc_gc_call_count(sbi, FOREGROUND);
 	for (segno = start_seg; segno <= end_seg; segno += SEGS_PER_SEC(sbi)) {
 		struct gc_inode_list gc_list = {
 			.ilist = LIST_HEAD_INIT(gc_list.ilist),
··· 2268 2251
 	struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
 	__u64 old_block_count, shrunk_blocks;
 	struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
+	struct f2fs_lock_context lc;
+	struct f2fs_lock_context glc;
+	struct f2fs_lock_context clc;
 	unsigned int secs;
 	int err = 0;
 	__u32 rem;
··· 2314 2294
 	secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));

 	/* stop other GC */
-	if (!f2fs_down_write_trylock(&sbi->gc_lock)) {
+	if (!f2fs_down_write_trylock_trace(&sbi->gc_lock, &glc)) {
 		err = -EAGAIN;
 		goto out_drop_write;
 	}

 	/* stop CP to protect MAIN_SEC in free_segment_range */
-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);

 	spin_lock(&sbi->stat_lock);
 	if (shrunk_blocks + valid_user_blocks(sbi) +
··· 2335 2315
 	err = free_segment_range(sbi, secs, true);

 out_unlock:
-	f2fs_unlock_op(sbi);
-	f2fs_up_write(&sbi->gc_lock);
+	f2fs_unlock_op(sbi, &lc);
+	f2fs_up_write_trace(&sbi->gc_lock, &glc);
 out_drop_write:
 	mnt_drop_write_file(filp);
 	if (err)
··· 2353 2333
 		return -EROFS;
 	}

-	f2fs_down_write(&sbi->gc_lock);
-	f2fs_down_write(&sbi->cp_global_sem);
+	f2fs_down_write_trace(&sbi->gc_lock, &glc);
+	f2fs_down_write_trace(&sbi->cp_global_sem, &clc);

 	spin_lock(&sbi->stat_lock);
 	if (shrunk_blocks + valid_user_blocks(sbi) +
··· 2402 2382
 		spin_unlock(&sbi->stat_lock);
 	}
 out_err:
-	f2fs_up_write(&sbi->cp_global_sem);
-	f2fs_up_write(&sbi->gc_lock);
+	f2fs_up_write_trace(&sbi->cp_global_sem, &clc);
+	f2fs_up_write_trace(&sbi->gc_lock, &glc);
 	thaw_super(sbi->sb, FREEZE_HOLDER_KERNEL, NULL);
 	return err;
 }
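
Several gc.c hunks replace the compile-time SUMS_PER_BLOCK with a per-superblock sbi->sums_per_block when aligning segment numbers to summary-block boundaries. The arithmetic can be sketched in isolation as below; round_down_to and div_round_up are local stand-ins for the kernel's rounddown() and DIV_ROUND_UP() macros, and summary_blocks_needed is a made-up helper name used only for this illustration.

```c
/* Local stand-in for rounddown(): largest multiple of step that is <= x. */
static unsigned int round_down_to(unsigned int x, unsigned int step)
{
	return x - (x % step);
}

/* Local stand-in for DIV_ROUND_UP(): integer division rounding upwards. */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/*
 * Number of summary blocks covering segments [start, end), when each
 * summary block holds per_block segment summaries. The start is first
 * aligned down to a summary-block boundary, as in the hunk above.
 */
static unsigned int summary_blocks_needed(unsigned int start, unsigned int end,
					  unsigned int per_block)
{
	unsigned int first = round_down_to(start, per_block);

	return div_round_up(end - first, per_block);
}
```

For instance, with four summaries per block, segments 5 through 8 straddle two summary blocks (segments 4..7 and 8..11), so summary_blocks_needed(5, 9, 4) evaluates to 2.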

+6 -4

fs/f2fs/inline.c

··· 218 218
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct dnode_of_data dn;
+	struct f2fs_lock_context lc;
 	struct folio *ifolio, *folio;
 	int err = 0;

··· 236 235
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);

 	ifolio = f2fs_get_inode_folio(sbi, inode->i_ino);
 	if (IS_ERR(ifolio)) {
··· 251 250

 	f2fs_put_dnode(&dn);
 out:
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	f2fs_folio_put(folio, true);

··· 598 597
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
 	struct folio *ifolio;
 	struct f2fs_filename fname;
+	struct f2fs_lock_context lc;
 	void *inline_dentry = NULL;
 	int err = 0;

 	if (!f2fs_has_inline_dentry(dir))
 		return 0;

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);

 	err = f2fs_setup_filename(dir, &dentry->d_name, 0, &fname);
 	if (err)
··· 630 628
 out_fname:
 	f2fs_free_filename(&fname);
 out:
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);
 	return err;
 }


+11 -5

fs/f2fs/inode.c

··· 597 597
 	if (ret)
 		goto bad_inode;
 make_now:
+	f2fs_set_inode_flags(inode);
+
 	if (ino == F2FS_NODE_INO(sbi)) {
 		inode->i_mapping->a_ops = &f2fs_node_aops;
 		mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
··· 620 618
 		inode->i_op = &f2fs_file_inode_operations;
 		inode->i_fop = &f2fs_file_operations;
 		inode->i_mapping->a_ops = &f2fs_dblock_aops;
+		if (IS_IMMUTABLE(inode) && !f2fs_compressed_file(inode) &&
+		    !f2fs_quota_file(sbi, inode->i_ino))
+			mapping_set_folio_min_order(inode->i_mapping, 0);
 	} else if (S_ISDIR(inode->i_mode)) {
 		inode->i_op = &f2fs_dir_inode_operations;
 		inode->i_fop = &f2fs_dir_operations;
··· 643 638
 		ret = -EIO;
 		goto bad_inode;
 	}
-	f2fs_set_inode_flags(inode);

 	unlock_new_inode(inode);
 	trace_f2fs_iget(inode);
··· 910 906
 		err = -EIO;

 	if (!err) {
-		f2fs_lock_op(sbi);
+		struct f2fs_lock_context lc;
+
+		f2fs_lock_op(sbi, &lc);
 		err = f2fs_remove_inode_page(inode);
-		f2fs_unlock_op(sbi);
+		f2fs_unlock_op(sbi, &lc);
 		if (err == -ENOENT) {
 			err = 0;

··· 1010 1004
 }

 /* caller should call f2fs_lock_op() */
-void f2fs_handle_failed_inode(struct inode *inode)
+void f2fs_handle_failed_inode(struct inode *inode, struct f2fs_lock_context *lc)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct node_info ni;
··· 1059 1053
 	}

 out:
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, lc);

 	/* iput will drop the inode object */
 	iput(inode);

+37 -28

fs/f2fs/namei.c

··· 354 354
 			struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
+	struct f2fs_lock_context lc;
 	struct inode *inode;
 	nid_t ino = 0;
 	int err;
··· 377 376
 	inode->i_mapping->a_ops = &f2fs_dblock_aops;
 	ino = inode->i_ino;

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	err = f2fs_add_link(dentry, inode);
 	if (err)
 		goto out;
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	f2fs_alloc_nid_done(sbi, ino);

··· 393 392
 	f2fs_balance_fs(sbi, true);
 	return 0;
 out:
-	f2fs_handle_failed_inode(inode);
+	f2fs_handle_failed_inode(inode, &lc);
 	return err;
 }

··· 402 401
 {
 	struct inode *inode = d_inode(old_dentry);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
+	struct f2fs_lock_context lc;
 	int err;

 	if (unlikely(f2fs_cp_error(sbi)))
··· 429 427
 	ihold(inode);

 	set_inode_flag(inode, FI_INC_LINK);
-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	err = f2fs_add_link(dentry, inode);
 	if (err)
 		goto out;
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	d_instantiate(dentry, inode);

··· 443 441
 out:
 	clear_inode_flag(inode, FI_INC_LINK);
 	iput(inode);
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);
 	return err;
 }

··· 547 545
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
 	struct inode *inode = d_inode(dentry);
 	struct f2fs_dir_entry *de;
+	struct f2fs_lock_context lc;
 	struct folio *folio;
 	int err;

··· 584 581

 	f2fs_balance_fs(sbi, true);

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	err = f2fs_acquire_orphan_inode(sbi);
 	if (err) {
-		f2fs_unlock_op(sbi);
+		f2fs_unlock_op(sbi, &lc);
 		f2fs_folio_put(folio, false);
 		goto out;
 	}
 	f2fs_delete_entry(de, folio, dir, inode);
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	/* VFS negative dentries are incompatible with Encoding and
 	 * Case-insensitiveness. Eventually we'll want avoid
··· 635 632
 			struct dentry *dentry, const char *symname)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
+	struct f2fs_lock_context lc;
 	struct inode *inode;
 	size_t len = strlen(symname);
 	struct fscrypt_str disk_link;
··· 666 662
 	inode_nohighmem(inode);
 	inode->i_mapping->a_ops = &f2fs_dblock_aops;

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	err = f2fs_add_link(dentry, inode);
 	if (err)
 		goto out_f2fs_handle_failed_inode;
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);
 	f2fs_alloc_nid_done(sbi, inode->i_ino);

 	err = fscrypt_encrypt_symlink(inode, symname, len, &disk_link);
··· 705 701
 	goto out_free_encrypted_link;

 out_f2fs_handle_failed_inode:
-	f2fs_handle_failed_inode(inode);
+	f2fs_handle_failed_inode(inode, &lc);
 out_free_encrypted_link:
 	if (disk_link.name != (unsigned char *)symname)
 		kfree(disk_link.name);
··· 716 712
 				struct dentry *dentry, umode_t mode)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
+	struct f2fs_lock_context lc;
 	struct inode *inode;
 	int err;

··· 737 732
 	mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);

 	set_inode_flag(inode, FI_INC_LINK);
-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	err = f2fs_add_link(dentry, inode);
 	if (err)
 		goto out_fail;
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	f2fs_alloc_nid_done(sbi, inode->i_ino);

··· 755 750

 out_fail:
 	clear_inode_flag(inode, FI_INC_LINK);
-	f2fs_handle_failed_inode(inode);
+	f2fs_handle_failed_inode(inode, &lc);
 	return ERR_PTR(err);
 }

··· 772 767
 			struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
+	struct f2fs_lock_context lc;
 	struct inode *inode;
 	int err = 0;

··· 792 786
 	init_special_inode(inode, inode->i_mode, rdev);
 	inode->i_op = &f2fs_special_inode_operations;

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	err = f2fs_add_link(dentry, inode);
 	if (err)
 		goto out;
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	f2fs_alloc_nid_done(sbi, inode->i_ino);

··· 808 802
 	f2fs_balance_fs(sbi, true);
 	return 0;
 out:
-	f2fs_handle_failed_inode(inode);
+	f2fs_handle_failed_inode(inode, &lc);
 	return err;
 }

··· 817 811
 			struct inode **new_inode, struct f2fs_filename *fname)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
+	struct f2fs_lock_context lc;
 	struct inode *inode;
 	int err;

··· 838 831
 		inode->i_mapping->a_ops = &f2fs_dblock_aops;
 	}

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	err = f2fs_acquire_orphan_inode(sbi);
 	if (err)
 		goto out;
··· 867 860
 		f2fs_i_links_write(inode, false);
 	}
 	/* link_count was changed by d_tmpfile as well. */
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);
 	unlock_new_inode(inode);

 	if (new_inode)
··· 879 872
 release_out:
 	f2fs_release_orphan_inode(sbi);
 out:
-	f2fs_handle_failed_inode(inode);
+	f2fs_handle_failed_inode(inode, &lc);
 	return err;
 }

··· 927 920
 	struct f2fs_dir_entry *old_dir_entry = NULL;
 	struct f2fs_dir_entry *old_entry;
 	struct f2fs_dir_entry *new_entry;
+	struct f2fs_lock_context lc;
 	bool old_is_dir = S_ISDIR(old_inode->i_mode);
 	int err;

··· 1016 1008

 		f2fs_balance_fs(sbi, true);

-		f2fs_lock_op(sbi);
+		f2fs_lock_op(sbi, &lc);

 		err = f2fs_acquire_orphan_inode(sbi);
 		if (err)
··· 1039 1031
 	} else {
 		f2fs_balance_fs(sbi, true);

-		f2fs_lock_op(sbi);
+		f2fs_lock_op(sbi, &lc);

 		err = f2fs_add_link(new_dentry, old_inode);
 		if (err) {
-			f2fs_unlock_op(sbi);
+			f2fs_unlock_op(sbi, &lc);
 			goto out_dir;
 		}

··· 1092 1084
 						TRANS_DIR_INO);
 	}

-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
 		f2fs_sync_fs(sbi->sb, 1);
··· 1101 1093
 	return 0;

 put_out_dir:
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);
 	f2fs_folio_put(new_folio, false);
 out_dir:
 	if (old_dir_entry)
··· 1123 1115
 	struct folio *old_folio, *new_folio;
 	struct f2fs_dir_entry *old_dir_entry = NULL, *new_dir_entry = NULL;
 	struct f2fs_dir_entry *old_entry, *new_entry;
+	struct f2fs_lock_context lc;
 	int old_nlink = 0, new_nlink = 0;
 	int err;

··· 1203 1194

 	f2fs_balance_fs(sbi, true);

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);

 	/* update ".." directory entry info of old dentry */
 	if (old_dir_entry)
··· 1256 1247
 		f2fs_add_ino_entry(sbi, new_dir->i_ino, TRANS_DIR_INO);
 	}

-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
 		f2fs_sync_fs(sbi->sb, 1);
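
In the create-style paths in namei.c, the success path calls f2fs_unlock_op(sbi, &lc) itself, while the failure path hands the same context to f2fs_handle_failed_inode(inode, &lc), which performs the unlock (see the inode.c hunk). A toy user-space model of that ownership handoff may make the control flow easier to follow. Every name below (toy_lock, toy_ctx, toy_create and so on) is invented for illustration, and a counter stands in for the real lock.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_lock { int held; };			/* stands in for the op lock */
struct toy_ctx  { struct toy_lock *owner; };	/* hypothetical lock context */

static void toy_lock_op(struct toy_lock *l, struct toy_ctx *lc)
{
	l->held++;
	lc->owner = l;
}

static void toy_unlock_op(struct toy_lock *l, struct toy_ctx *lc)
{
	l->held--;
	lc->owner = NULL;
}

/* Cleanup helper that also drops the lock its caller acquired. */
static void toy_handle_failed(struct toy_lock *l, struct toy_ctx *lc)
{
	/* ... a real helper would undo partially created state here ... */
	toy_unlock_op(l, lc);
}

/* Returns 0 on success, -1 on failure; the lock is released on both paths. */
static int toy_create(struct toy_lock *l, bool add_link_fails)
{
	struct toy_ctx lc;

	toy_lock_op(l, &lc);
	if (add_link_fails)
		goto out;
	toy_unlock_op(l, &lc);
	return 0;
out:
	toy_handle_failed(l, &lc);
	return -1;
}
```

Because the context lives on the caller's stack, the failure helper needs it passed in explicitly; that is the reason the sketch's toy_handle_failed takes the extra pointer, mirroring the new second parameter in the diff.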

+72 -27

fs/f2fs/node.c

··· 606 606
 		goto retry;
 	}

-	i = f2fs_lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 0);
+	i = f2fs_lookup_journal_in_cursum(sbi, journal, NAT_JOURNAL, nid, 0);
 	if (i >= 0) {
 		ne = nat_in_journal(journal, i);
 		node_info_from_raw_nat(ni, &ne);
··· 641 641
 			ni->ino, ni->nid, ni->blk_addr, ni->version, ni->flag);
 		f2fs_handle_error(sbi, ERROR_INCONSISTENT_NAT);
 		return -EFSCORRUPTED;
+	}
+
+	if (unlikely(f2fs_quota_file(sbi, ni->nid) &&
+		!__is_valid_data_blkaddr(ni->blk_addr))) {
+		set_sbi_flag(sbi, SBI_NEED_FSCK);
+		f2fs_err_ratelimited(sbi,
+			"f2fs_get_node_info of %pS: inconsistent nat entry from qf_ino, "
+			"ino:%u, nid:%u, blkaddr:%u, ver:%u, flag:%u",
+			__builtin_return_address(0),
+			ni->ino, ni->nid, ni->blk_addr, ni->version, ni->flag);
+		f2fs_handle_error(sbi, ERROR_INCONSISTENT_NAT);
 	}

 	/* cache nat entry */
··· 1511 1500
 	f2fs_folio_put(afolio, err ? true : false);
 }

-static int sanity_check_node_footer(struct f2fs_sb_info *sbi,
+int f2fs_sanity_check_node_footer(struct f2fs_sb_info *sbi,
 					struct folio *folio, pgoff_t nid,
-					enum node_type ntype)
+					enum node_type ntype, bool in_irq)
 {
+	bool is_inode, is_xnode;
+
 	if (unlikely(nid != nid_of_node(folio)))
 		goto out_err;

+	is_inode = IS_INODE(folio);
+	is_xnode = f2fs_has_xattr_block(ofs_of_node(folio));
+
 	switch (ntype) {
+	case NODE_TYPE_REGULAR:
+		if (is_inode && is_xnode)
+			goto out_err;
+		break;
 	case NODE_TYPE_INODE:
-		if (!IS_INODE(folio))
+		if (!is_inode || is_xnode)
 			goto out_err;
 		break;
 	case NODE_TYPE_XATTR:
-		if (!f2fs_has_xattr_block(ofs_of_node(folio)))
+		if (is_inode || !is_xnode)
 			goto out_err;
 		break;
 	case NODE_TYPE_NON_INODE:
-		if (IS_INODE(folio))
+		if (is_inode)
 			goto out_err;
 		break;
 	default:
··· 1547 1527
 		goto out_err;
 	return 0;
 out_err:
-	f2fs_warn(sbi, "inconsistent node block, node_type:%d, nid:%lu, "
-		  "node_footer[nid:%u,ino:%u,ofs:%u,cpver:%llu,blkaddr:%u]",
-		  ntype, nid, nid_of_node(folio), ino_of_node(folio),
-		  ofs_of_node(folio), cpver_of_node(folio),
-		  next_blkaddr_of_node(folio));
 	set_sbi_flag(sbi, SBI_NEED_FSCK);
+	f2fs_warn_ratelimited(sbi, "inconsistent node block, node_type:%d, nid:%lu, "
+		"node_footer[nid:%u,ino:%u,ofs:%u,cpver:%llu,blkaddr:%u]",
+		ntype, nid, nid_of_node(folio), ino_of_node(folio),
+		ofs_of_node(folio), cpver_of_node(folio),
+		next_blkaddr_of_node(folio));
+
 	f2fs_handle_error(sbi, ERROR_INCONSISTENT_FOOTER);
 	return -EFSCORRUPTED;
 }
··· 1599 1578
 		goto out_err;
 	}
 page_hit:
-	err = sanity_check_node_footer(sbi, folio, nid, ntype);
+	err = f2fs_sanity_check_node_footer(sbi, folio, nid, ntype, false);
 	if (!err)
 		return folio;
 out_err:
··· 1748 1727
 		.io_type = io_type,
 		.io_wbc = wbc,
 	};
+	struct f2fs_lock_context lc;
 	unsigned int seq;

 	trace_f2fs_writepage(folio, NODE);
··· 1773 1751

 	/* get old block addr of this node page */
 	nid = nid_of_node(folio);
-	f2fs_bug_on(sbi, folio->index != nid);
+
+	if (f2fs_sanity_check_node_footer(sbi, folio, nid,
+					NODE_TYPE_REGULAR, false)) {
+		f2fs_handle_critical_error(sbi, STOP_CP_REASON_CORRUPTED_NID);
+		goto redirty_out;
+	}

 	if (f2fs_get_node_info(sbi, nid, &ni, !do_balance))
 		goto redirty_out;

-	f2fs_down_read(&sbi->node_write);
+	f2fs_down_read_trace(&sbi->node_write, &lc);

 	/* This page is already truncated */
 	if (unlikely(ni.blk_addr == NULL_ADDR)) {
 		folio_clear_uptodate(folio);
 		dec_page_count(sbi, F2FS_DIRTY_NODES);
-		f2fs_up_read(&sbi->node_write);
+		f2fs_up_read_trace(&sbi->node_write, &lc);
 		folio_unlock(folio);
 		return true;
 	}
··· 1797 1770
 	if (__is_valid_data_blkaddr(ni.blk_addr) &&
 		!f2fs_is_valid_blkaddr(sbi, ni.blk_addr,
 					DATA_GENERIC_ENHANCE)) {
-		f2fs_up_read(&sbi->node_write);
+		f2fs_up_read_trace(&sbi->node_write, &lc);
 		goto redirty_out;
 	}

-	if (atomic && !test_opt(sbi, NOBARRIER))
-		fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
+	if (atomic) {
+		if (!test_opt(sbi, NOBARRIER))
+			fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
+		if (IS_INODE(folio))
+			set_dentry_mark(folio,
+				f2fs_need_dentry_mark(sbi, ino_of_node(folio)));
+	}

 	/* should add to global list before clearing PAGECACHE status */
 	if (f2fs_in_warm_node_list(sbi, folio)) {
··· 1822 1790
 	f2fs_do_write_node_page(nid, &fio);
 	set_node_addr(sbi, &ni, fio.new_blkaddr, is_fsync_dnode(folio));
 	dec_page_count(sbi, F2FS_DIRTY_NODES);
-	f2fs_up_read(&sbi->node_write);
+	f2fs_up_read_trace(&sbi->node_write, &lc);

 	folio_unlock(folio);

··· 1948 1916
 				if (is_inode_flag_set(inode,
 							FI_DIRTY_INODE))
 					f2fs_update_inode(inode, folio);
-				set_dentry_mark(folio,
-					f2fs_need_dentry_mark(sbi, ino));
+				if (!atomic)
+					set_dentry_mark(folio,
+						f2fs_need_dentry_mark(sbi, ino));
 			}
 			/* may be written by other thread */
 			if (!folio_test_dirty(folio))
··· 2970 2937
 	/* scan the node segment */
 	last_offset = BLKS_PER_SEG(sbi);
 	addr = START_BLOCK(sbi, segno);
-	sum_entry = &sum->entries[0];
+	sum_entry = sum_entries(sum);

 	for (i = 0; i < last_offset; i += nrpages, addr += nrpages) {
2976 2943 nrpages = bio_max_segs(last_offset - i); ··· 3111 3078 * #2, flush nat entries to nat page. 3112 3079 */ 3113 3080 if (enabled_nat_bits(sbi, cpc) || 3114 - !__has_cursum_space(journal, set->entry_cnt, NAT_JOURNAL)) 3081 + !__has_cursum_space(sbi, journal, set->entry_cnt, NAT_JOURNAL)) 3115 3082 to_journal = false; 3116 3083 3117 3084 if (to_journal) { ··· 3134 3101 f2fs_bug_on(sbi, nat_get_blkaddr(ne) == NEW_ADDR); 3135 3102 3136 3103 if (to_journal) { 3137 - offset = f2fs_lookup_journal_in_cursum(journal, 3104 + offset = f2fs_lookup_journal_in_cursum(sbi, journal, 3138 3105 NAT_JOURNAL, nid, 1); 3139 3106 f2fs_bug_on(sbi, offset < 0); 3140 3107 raw_ne = &nat_in_journal(journal, offset); ··· 3179 3146 struct f2fs_journal *journal = curseg->journal; 3180 3147 struct nat_entry_set *setvec[NAT_VEC_SIZE]; 3181 3148 struct nat_entry_set *set, *tmp; 3182 - unsigned int found; 3149 + unsigned int found, entry_count = 0; 3183 3150 nid_t set_idx = 0; 3184 3151 LIST_HEAD(sets); 3185 3152 int err = 0; ··· 3205 3172 * into nat entry set. 3206 3173 */ 3207 3174 if (enabled_nat_bits(sbi, cpc) || 3208 - !__has_cursum_space(journal, 3175 + !__has_cursum_space(sbi, journal, 3209 3176 nm_i->nat_cnt[DIRTY_NAT], NAT_JOURNAL)) 3210 3177 remove_nats_in_journal(sbi); 3211 3178 ··· 3216 3183 set_idx = setvec[found - 1]->set + 1; 3217 3184 for (idx = 0; idx < found; idx++) 3218 3185 __adjust_nat_entry_set(setvec[idx], &sets, 3219 - MAX_NAT_JENTRIES(journal)); 3186 + MAX_NAT_JENTRIES(sbi, journal)); 3220 3187 } 3221 3188 3189 + /* 3190 + * Readahead the current NAT block to prevent read requests from 3191 + * being issued and waited on one by one. 
3192 + */ 3193 + list_for_each_entry(set, &sets, set_list) { 3194 + entry_count += set->entry_cnt; 3195 + if (!enabled_nat_bits(sbi, cpc) && 3196 + __has_cursum_space(sbi, journal, 3197 + entry_count, NAT_JOURNAL)) 3198 + continue; 3199 + f2fs_ra_meta_pages(sbi, set->set, 1, META_NAT, true); 3200 + } 3222 3201 /* flush dirty nats in nat entry set */ 3223 3202 list_for_each_entry_safe(set, tmp, &sets, set_list) { 3224 3203 err = __flush_nat_entry_set(sbi, set, cpc);
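Reviewer note: the reworked switch in f2fs_sanity_check_node_footer() amounts to a small truth table over two footer properties (is_inode, is_xnode). The sketch below restates only that table in standalone C so the accepted combinations are easy to see. The names sketch_node_type and sketch_check_node_kind are hypothetical, the nid comparison and the checks elided from the hunk are not modeled, and rejecting unknown types is an assumption, since the default case is not visible in the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's enum node_type values. */
enum sketch_node_type {
	SKETCH_NODE_REGULAR,
	SKETCH_NODE_INODE,
	SKETCH_NODE_XATTR,
	SKETCH_NODE_NON_INODE,
};

/*
 * Returns 0 when the (is_inode, is_xnode) pair is acceptable for the
 * expected type and -1 otherwise. Only the switch from the diff is
 * modeled here; everything around it is left out.
 */
static int sketch_check_node_kind(enum sketch_node_type ntype,
				  bool is_inode, bool is_xnode)
{
	switch (ntype) {
	case SKETCH_NODE_REGULAR:
		/* any node is fine unless it claims to be both kinds */
		return (is_inode && is_xnode) ? -1 : 0;
	case SKETCH_NODE_INODE:
		return (!is_inode || is_xnode) ? -1 : 0;
	case SKETCH_NODE_XATTR:
		return (is_inode || !is_xnode) ? -1 : 0;
	case SKETCH_NODE_NON_INODE:
		return is_inode ? -1 : 0;
	}
	/* assumption: unknown types are rejected */
	return -1;
}
```

Compared with the removed code, the INODE and XATTR cases now also reject a footer that looks like both an inode and an xattr block, and the REGULAR case (used on the write path in this diff) rejects only that contradictory combination.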

-8

fs/f2fs/node.h

···
 	IS_PREALLOC,	/* nat entry is preallocated */
 };

-/* For node type in __get_node_folio() */
-enum node_type {
-	NODE_TYPE_REGULAR,
-	NODE_TYPE_INODE,
-	NODE_TYPE_XATTR,
-	NODE_TYPE_NON_INODE,
-};
-
 /*
  * For node information
  */

+6 -5

fs/f2fs/recovery.c

···
 		struct curseg_info *curseg = CURSEG_I(sbi, i);

 		if (curseg->segno == segno) {
-			sum = curseg->sum_blk->entries[blkoff];
+			sum = sum_entries(curseg->sum_blk)[blkoff];
 			goto got_it;
 		}
 	}
···
 	sum_folio = f2fs_get_sum_folio(sbi, segno);
 	if (IS_ERR(sum_folio))
 		return PTR_ERR(sum_folio);
-	sum_node = SUM_BLK_PAGE_ADDR(sum_folio, segno);
-	sum = sum_node->entries[blkoff];
+	sum_node = SUM_BLK_PAGE_ADDR(sbi, sum_folio, segno);
+	sum = sum_entries(sum_node)[blkoff];
 	f2fs_folio_put(sum_folio, true);
 got_it:
 	/* Use the locked dnode page and inode */
···
 	LIST_HEAD(inode_list);
 	LIST_HEAD(tmp_inode_list);
 	LIST_HEAD(dir_list);
+	struct f2fs_lock_context lc;
 	int err;
 	int ret = 0;
 	unsigned long s_flags = sbi->sb->s_flags;
···
 		f2fs_info(sbi, "recover fsync data on readonly fs");

 	/* prevent checkpoint */
-	f2fs_down_write(&sbi->cp_global_sem);
+	f2fs_down_write_trace(&sbi->cp_global_sem, &lc);

 	/* step #1: find fsynced inode numbers */
 	err = find_fsync_dnodes(sbi, &inode_list, check_only, &new_inode);
···
 	if (!err)
 		clear_sbi_flag(sbi, SBI_POR_DOING);

-	f2fs_up_write(&sbi->cp_global_sem);
+	f2fs_up_write_trace(&sbi->cp_global_sem, &lc);

 	/* let's drop all the directory inodes for clean checkpoint */
 	destroy_fsync_dnodes(&dir_list, err);
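Reviewer note: SUM_BLK_PAGE_ADDR() now takes sbi because the number of summaries per block and the summary size are per-superblock values rather than compile-time constants. A minimal sketch of the index arithmetic, assuming it follows the GET_SUM_BLOCK / GET_SUM_BLKOFF macros in this series, is shown here. The struct sketch_sum_loc and the function sketch_locate_summary are hypothetical names, and the block index is relative to ssa_blkaddr.

```c
#include <stddef.h>

/* Hypothetical result type: where one segment's summary lives. */
struct sketch_sum_loc {
	unsigned int block;	/* SSA block index, relative to ssa_blkaddr */
	size_t byte_off;	/* byte offset of the summary inside that block */
};

/*
 * Mirrors segno / sums_per_block for the block and
 * (segno % sums_per_block) * sum_blocksize for the offset.
 */
static struct sketch_sum_loc sketch_locate_summary(unsigned int segno,
						   unsigned int sums_per_block,
						   size_t sum_blocksize)
{
	struct sketch_sum_loc loc;

	loc.block = segno / sums_per_block;
	loc.byte_off = (size_t)(segno % sums_per_block) * sum_blocksize;
	return loc;
}
```

With one summary per block the offset is always zero and the lookup degenerates to one block per segment; with several summaries packed into a block, neighbouring segments share a block and differ only in the byte offset.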

+65 -68

fs/f2fs/segment.c

··· 371 371 } 372 372 373 373 out: 374 - if (time_to_inject(sbi, FAULT_TIMEOUT)) 375 - f2fs_io_schedule_timeout_killable(DEFAULT_FAULT_TIMEOUT); 374 + if (time_to_inject(sbi, FAULT_ATOMIC_TIMEOUT)) 375 + f2fs_schedule_timeout_killable(DEFAULT_FAULT_TIMEOUT, true); 376 376 377 377 if (ret) { 378 378 sbi->revoked_atomic_block += fi->atomic_write_cnt; ··· 400 400 { 401 401 struct f2fs_sb_info *sbi = F2FS_I_SB(inode); 402 402 struct f2fs_inode_info *fi = F2FS_I(inode); 403 + struct f2fs_lock_context lc; 403 404 int err; 404 405 405 406 err = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX); ··· 408 407 return err; 409 408 410 409 f2fs_down_write(&fi->i_gc_rwsem[WRITE]); 411 - f2fs_lock_op(sbi); 410 + f2fs_lock_op(sbi, &lc); 412 411 413 412 err = __f2fs_commit_atomic_write(inode); 414 413 415 - f2fs_unlock_op(sbi); 414 + f2fs_unlock_op(sbi, &lc); 416 415 f2fs_up_write(&fi->i_gc_rwsem[WRITE]); 417 416 418 417 return err; ··· 462 461 .should_migrate_blocks = false, 463 462 .err_gc_skipped = false, 464 463 .nr_free_secs = 1 }; 465 - f2fs_down_write(&sbi->gc_lock); 464 + f2fs_down_write_trace(&sbi->gc_lock, &gc_control.lc); 466 465 stat_inc_gc_call_count(sbi, FOREGROUND); 467 466 f2fs_gc(sbi, &gc_control); 468 467 } ··· 1287 1286 &(dcc->fstrim_list) : &(dcc->wait_list); 1288 1287 blk_opf_t flag = dpolicy->sync ? 
REQ_SYNC : 0; 1289 1288 block_t lstart, start, len, total_len; 1290 - int err = 0; 1291 1289 1292 1290 if (dc->state != D_PREP) 1293 1291 return 0; ··· 1327 1327 1328 1328 dc->di.len = 0; 1329 1329 1330 - while (total_len && *issued < dpolicy->max_requests && !err) { 1330 + while (total_len && *issued < dpolicy->max_requests) { 1331 1331 struct bio *bio = NULL; 1332 1332 unsigned long flags; 1333 1333 bool last = true; ··· 1342 1342 last = true; 1343 1343 1344 1344 dc->di.len += len; 1345 - 1346 - err = 0; 1347 - if (time_to_inject(sbi, FAULT_DISCARD)) { 1348 - err = -EIO; 1349 - spin_lock_irqsave(&dc->lock, flags); 1350 - if (dc->state == D_PARTIAL) 1351 - dc->state = D_SUBMIT; 1352 - spin_unlock_irqrestore(&dc->lock, flags); 1353 - 1354 - break; 1355 - } 1356 1345 1357 1346 __blkdev_issue_discard(bdev, SECTOR_FROM_BLOCK(start), 1358 1347 SECTOR_FROM_BLOCK(len), GFP_NOFS, &bio); ··· 1381 1392 len = total_len; 1382 1393 } 1383 1394 1384 - if (!err && len) { 1395 + if (len) { 1385 1396 dcc->undiscard_blks -= len; 1386 1397 __update_discard_tree_range(sbi, bdev, lstart, start, len); 1387 1398 } 1388 - return err; 1399 + return 0; 1389 1400 } 1390 1401 1391 1402 static void __insert_discard_cmd(struct f2fs_sb_info *sbi, ··· 2674 2685 valid_sum_count += f2fs_curseg_valid_blocks(sbi, i); 2675 2686 } 2676 2687 2677 - sum_in_page = (PAGE_SIZE - 2 * SUM_JOURNAL_SIZE - 2688 + sum_in_page = (sbi->blocksize - 2 * sbi->sum_journal_size - 2678 2689 SUM_FOOTER_SIZE) / SUMMARY_SIZE; 2679 2690 if (valid_sum_count <= sum_in_page) 2680 2691 return 1; 2681 2692 else if ((valid_sum_count - sum_in_page) <= 2682 - (PAGE_SIZE - SUM_FOOTER_SIZE) / SUMMARY_SIZE) 2693 + (sbi->blocksize - SUM_FOOTER_SIZE) / SUMMARY_SIZE) 2683 2694 return 2; 2684 2695 return 3; 2685 2696 } ··· 2699 2710 { 2700 2711 struct folio *folio; 2701 2712 2702 - if (SUMS_PER_BLOCK == 1) 2713 + if (!f2fs_sb_has_packed_ssa(sbi)) 2703 2714 folio = f2fs_grab_meta_folio(sbi, blk_addr); 2704 2715 else 2705 2716 folio = 
f2fs_get_meta_folio_retry(sbi, blk_addr); ··· 2717 2728 { 2718 2729 struct folio *folio; 2719 2730 2720 - if (SUMS_PER_BLOCK == 1) 2731 + if (!f2fs_sb_has_packed_ssa(sbi)) 2721 2732 return f2fs_update_meta_page(sbi, (void *)sum_blk, 2722 2733 GET_SUM_BLOCK(sbi, segno)); 2723 2734 ··· 2725 2736 if (IS_ERR(folio)) 2726 2737 return; 2727 2738 2728 - memcpy(SUM_BLK_PAGE_ADDR(folio, segno), sum_blk, sizeof(*sum_blk)); 2739 + memcpy(SUM_BLK_PAGE_ADDR(sbi, folio, segno), sum_blk, 2740 + sbi->sum_blocksize); 2729 2741 folio_mark_dirty(folio); 2730 2742 f2fs_folio_put(folio, true); 2731 2743 } ··· 2745 2755 mutex_lock(&curseg->curseg_mutex); 2746 2756 2747 2757 down_read(&curseg->journal_rwsem); 2748 - memcpy(&dst->journal, curseg->journal, SUM_JOURNAL_SIZE); 2758 + memcpy(sum_journal(sbi, dst), curseg->journal, sbi->sum_journal_size); 2749 2759 up_read(&curseg->journal_rwsem); 2750 2760 2751 - memcpy(dst->entries, src->entries, SUM_ENTRY_SIZE); 2752 - memcpy(&dst->footer, &src->footer, SUM_FOOTER_SIZE); 2761 + memcpy(sum_entries(dst), sum_entries(src), sbi->sum_entry_size); 2762 + memcpy(sum_footer(sbi, dst), sum_footer(sbi, src), SUM_FOOTER_SIZE); 2753 2763 2754 2764 mutex_unlock(&curseg->curseg_mutex); 2755 2765 ··· 2922 2932 curseg->next_blkoff = 0; 2923 2933 curseg->next_segno = NULL_SEGNO; 2924 2934 2925 - sum_footer = &(curseg->sum_blk->footer); 2935 + sum_footer = sum_footer(sbi, curseg->sum_blk); 2926 2936 memset(sum_footer, 0, sizeof(struct summary_footer)); 2927 2937 2928 2938 sanity_check_seg_type(sbi, seg_type); ··· 3068 3078 sum_folio = f2fs_get_sum_folio(sbi, new_segno); 3069 3079 if (IS_ERR(sum_folio)) { 3070 3080 /* GC won't be able to use stale summary pages by cp_error */ 3071 - memset(curseg->sum_blk, 0, SUM_ENTRY_SIZE); 3081 + memset(curseg->sum_blk, 0, sbi->sum_entry_size); 3072 3082 return PTR_ERR(sum_folio); 3073 3083 } 3074 - sum_node = SUM_BLK_PAGE_ADDR(sum_folio, new_segno); 3075 - memcpy(curseg->sum_blk, sum_node, SUM_ENTRY_SIZE); 3084 + sum_node 
= SUM_BLK_PAGE_ADDR(sbi, sum_folio, new_segno); 3085 + memcpy(curseg->sum_blk, sum_node, sbi->sum_entry_size); 3076 3086 f2fs_folio_put(sum_folio, true); 3077 3087 return 0; 3078 3088 } ··· 3352 3362 3353 3363 int f2fs_allocate_pinning_section(struct f2fs_sb_info *sbi) 3354 3364 { 3365 + struct f2fs_lock_context lc; 3355 3366 int err; 3356 3367 bool gc_required = true; 3357 3368 3358 3369 retry: 3359 - f2fs_lock_op(sbi); 3370 + f2fs_lock_op(sbi, &lc); 3360 3371 err = f2fs_allocate_new_section(sbi, CURSEG_COLD_DATA_PINNED, false); 3361 - f2fs_unlock_op(sbi); 3372 + f2fs_unlock_op(sbi, &lc); 3362 3373 3363 3374 if (f2fs_sb_has_blkzoned(sbi) && err == -EAGAIN && gc_required) { 3364 - f2fs_down_write(&sbi->gc_lock); 3375 + f2fs_down_write_trace(&sbi->gc_lock, &lc); 3365 3376 err = f2fs_gc_range(sbi, 0, sbi->first_seq_zone_segno - 1, 3366 3377 true, ZONED_PIN_SEC_REQUIRED_COUNT); 3367 - f2fs_up_write(&sbi->gc_lock); 3378 + f2fs_up_write_trace(&sbi->gc_lock, &lc); 3368 3379 3369 3380 gc_required = false; 3370 3381 if (!err) ··· 3485 3494 block_t start_block, end_block; 3486 3495 struct cp_control cpc; 3487 3496 struct discard_policy dpolicy; 3497 + struct f2fs_lock_context lc; 3488 3498 unsigned long long trimmed = 0; 3489 3499 int err = 0; 3490 3500 bool need_align = f2fs_lfs_mode(sbi) && __is_large_section(sbi); ··· 3518 3526 if (sbi->discard_blks == 0) 3519 3527 goto out; 3520 3528 3521 - f2fs_down_write(&sbi->gc_lock); 3529 + f2fs_down_write_trace(&sbi->gc_lock, &lc); 3522 3530 stat_inc_cp_call_count(sbi, TOTAL_CALL); 3523 3531 err = f2fs_write_checkpoint(sbi, &cpc); 3524 - f2fs_up_write(&sbi->gc_lock); 3532 + f2fs_up_write_trace(&sbi->gc_lock, &lc); 3525 3533 if (err) 3526 3534 goto out; 3527 3535 ··· 3806 3814 3807 3815 f2fs_wait_discard_bio(sbi, *new_blkaddr); 3808 3816 3809 - curseg->sum_blk->entries[curseg->next_blkoff] = *sum; 3817 + sum_entries(curseg->sum_blk)[curseg->next_blkoff] = *sum; 3810 3818 if (curseg->alloc_type == SSR) { 3811 3819 
curseg->next_blkoff = f2fs_find_next_ssr_block(sbi, curseg); 3812 3820 } else { ··· 4175 4183 } 4176 4184 4177 4185 curseg->next_blkoff = GET_BLKOFF_FROM_SEG0(sbi, new_blkaddr); 4178 - curseg->sum_blk->entries[curseg->next_blkoff] = *sum; 4186 + sum_entries(curseg->sum_blk)[curseg->next_blkoff] = *sum; 4179 4187 4180 4188 if (!recover_curseg || recover_newaddr) { 4181 4189 if (!from_gc) ··· 4232 4240 struct f2fs_sb_info *sbi = F2FS_F_SB(folio); 4233 4241 4234 4242 /* submit cached LFS IO */ 4235 - f2fs_submit_merged_write_cond(sbi, NULL, folio, 0, type); 4243 + f2fs_submit_merged_write_folio(sbi, folio, type); 4236 4244 /* submit cached IPU IO */ 4237 4245 f2fs_submit_merged_ipu_write(sbi, NULL, folio); 4238 4246 if (ordered) { ··· 4295 4303 4296 4304 /* Step 1: restore nat cache */ 4297 4305 seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA); 4298 - memcpy(seg_i->journal, kaddr, SUM_JOURNAL_SIZE); 4306 + memcpy(seg_i->journal, kaddr, sbi->sum_journal_size); 4299 4307 4300 4308 /* Step 2: restore sit cache */ 4301 4309 seg_i = CURSEG_I(sbi, CURSEG_COLD_DATA); 4302 - memcpy(seg_i->journal, kaddr + SUM_JOURNAL_SIZE, SUM_JOURNAL_SIZE); 4303 - offset = 2 * SUM_JOURNAL_SIZE; 4310 + memcpy(seg_i->journal, kaddr + sbi->sum_journal_size, sbi->sum_journal_size); 4311 + offset = 2 * sbi->sum_journal_size; 4304 4312 4305 4313 /* Step 3: restore summary entries */ 4306 4314 for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) { ··· 4322 4330 struct f2fs_summary *s; 4323 4331 4324 4332 s = (struct f2fs_summary *)(kaddr + offset); 4325 - seg_i->sum_blk->entries[j] = *s; 4333 + sum_entries(seg_i->sum_blk)[j] = *s; 4326 4334 offset += SUMMARY_SIZE; 4327 - if (offset + SUMMARY_SIZE <= PAGE_SIZE - 4335 + if (offset + SUMMARY_SIZE <= sbi->blocksize - 4328 4336 SUM_FOOTER_SIZE) 4329 4337 continue; 4330 4338 ··· 4380 4388 4381 4389 if (IS_NODESEG(type)) { 4382 4390 if (__exist_node_summaries(sbi)) { 4383 - struct f2fs_summary *ns = &sum->entries[0]; 4391 + struct f2fs_summary *ns = 
sum_entries(sum); 4384 4392 int i; 4385 4393 4386 4394 for (i = 0; i < BLKS_PER_SEG(sbi); i++, ns++) { ··· 4400 4408 4401 4409 /* update journal info */ 4402 4410 down_write(&curseg->journal_rwsem); 4403 - memcpy(curseg->journal, &sum->journal, SUM_JOURNAL_SIZE); 4411 + memcpy(curseg->journal, sum_journal(sbi, sum), sbi->sum_journal_size); 4404 4412 up_write(&curseg->journal_rwsem); 4405 4413 4406 - memcpy(curseg->sum_blk->entries, sum->entries, SUM_ENTRY_SIZE); 4407 - memcpy(&curseg->sum_blk->footer, &sum->footer, SUM_FOOTER_SIZE); 4414 + memcpy(sum_entries(curseg->sum_blk), sum_entries(sum), 4415 + sbi->sum_entry_size); 4416 + memcpy(sum_footer(sbi, curseg->sum_blk), sum_footer(sbi, sum), 4417 + SUM_FOOTER_SIZE); 4408 4418 curseg->next_segno = segno; 4409 4419 reset_curseg(sbi, type, 0); 4410 4420 curseg->alloc_type = ckpt->alloc_type[type]; ··· 4450 4456 } 4451 4457 4452 4458 /* sanity check for summary blocks */ 4453 - if (nats_in_cursum(nat_j) > NAT_JOURNAL_ENTRIES || 4454 - sits_in_cursum(sit_j) > SIT_JOURNAL_ENTRIES) { 4459 + if (nats_in_cursum(nat_j) > sbi->nat_journal_entries || 4460 + sits_in_cursum(sit_j) > sbi->sit_journal_entries) { 4455 4461 f2fs_err(sbi, "invalid journal entries nats %u sits %u", 4456 4462 nats_in_cursum(nat_j), sits_in_cursum(sit_j)); 4457 4463 return -EINVAL; ··· 4475 4481 4476 4482 /* Step 1: write nat cache */ 4477 4483 seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA); 4478 - memcpy(kaddr, seg_i->journal, SUM_JOURNAL_SIZE); 4479 - written_size += SUM_JOURNAL_SIZE; 4484 + memcpy(kaddr, seg_i->journal, sbi->sum_journal_size); 4485 + written_size += sbi->sum_journal_size; 4480 4486 4481 4487 /* Step 2: write sit cache */ 4482 4488 seg_i = CURSEG_I(sbi, CURSEG_COLD_DATA); 4483 - memcpy(kaddr + written_size, seg_i->journal, SUM_JOURNAL_SIZE); 4484 - written_size += SUM_JOURNAL_SIZE; 4489 + memcpy(kaddr + written_size, seg_i->journal, sbi->sum_journal_size); 4490 + written_size += sbi->sum_journal_size; 4485 4491 4486 4492 /* Step 3: write 
summary entries */ 4487 4493 for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) { ··· 4494 4500 written_size = 0; 4495 4501 } 4496 4502 summary = (struct f2fs_summary *)(kaddr + written_size); 4497 - *summary = seg_i->sum_blk->entries[j]; 4503 + *summary = sum_entries(seg_i->sum_blk)[j]; 4498 4504 written_size += SUMMARY_SIZE; 4499 4505 4500 - if (written_size + SUMMARY_SIZE <= PAGE_SIZE - 4506 + if (written_size + SUMMARY_SIZE <= sbi->blocksize - 4501 4507 SUM_FOOTER_SIZE) 4502 4508 continue; 4503 4509 ··· 4539 4545 write_normal_summaries(sbi, start_blk, CURSEG_HOT_NODE); 4540 4546 } 4541 4547 4542 - int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, 4543 - unsigned int val, int alloc) 4548 + int f2fs_lookup_journal_in_cursum(struct f2fs_sb_info *sbi, 4549 + struct f2fs_journal *journal, int type, 4550 + unsigned int val, int alloc) 4544 4551 { 4545 4552 int i; 4546 4553 ··· 4550 4555 if (le32_to_cpu(nid_in_journal(journal, i)) == val) 4551 4556 return i; 4552 4557 } 4553 - if (alloc && __has_cursum_space(journal, 1, NAT_JOURNAL)) 4558 + if (alloc && __has_cursum_space(sbi, journal, 1, NAT_JOURNAL)) 4554 4559 return update_nats_in_cursum(journal, 1); 4555 4560 } else if (type == SIT_JOURNAL) { 4556 4561 for (i = 0; i < sits_in_cursum(journal); i++) 4557 4562 if (le32_to_cpu(segno_in_journal(journal, i)) == val) 4558 4563 return i; 4559 - if (alloc && __has_cursum_space(journal, 1, SIT_JOURNAL)) 4564 + if (alloc && __has_cursum_space(sbi, journal, 1, SIT_JOURNAL)) 4560 4565 return update_sits_in_cursum(journal, 1); 4561 4566 } 4562 4567 return -1; ··· 4704 4709 * entries, remove all entries from journal and add and account 4705 4710 * them in sit entry set. 
4706 4711 */ 4707 - if (!__has_cursum_space(journal, sit_i->dirty_sentries, SIT_JOURNAL) || 4708 - !to_journal) 4712 + if (!__has_cursum_space(sbi, journal, 4713 + sit_i->dirty_sentries, SIT_JOURNAL) || !to_journal) 4709 4714 remove_sits_in_journal(sbi); 4710 4715 4711 4716 /* ··· 4722 4727 unsigned int segno = start_segno; 4723 4728 4724 4729 if (to_journal && 4725 - !__has_cursum_space(journal, ses->entry_cnt, SIT_JOURNAL)) 4730 + !__has_cursum_space(sbi, journal, ses->entry_cnt, 4731 + SIT_JOURNAL)) 4726 4732 to_journal = false; 4727 4733 4728 4734 if (to_journal) { ··· 4751 4755 } 4752 4756 4753 4757 if (to_journal) { 4754 - offset = f2fs_lookup_journal_in_cursum(journal, 4758 + offset = f2fs_lookup_journal_in_cursum(sbi, journal, 4755 4759 SIT_JOURNAL, segno, 1); 4756 4760 f2fs_bug_on(sbi, offset < 0); 4757 4761 segno_in_journal(journal, offset) = ··· 4958 4962 4959 4963 for (i = 0; i < NO_CHECK_TYPE; i++) { 4960 4964 mutex_init(&array[i].curseg_mutex); 4961 - array[i].sum_blk = f2fs_kzalloc(sbi, PAGE_SIZE, GFP_KERNEL); 4965 + array[i].sum_blk = f2fs_kzalloc(sbi, sbi->sum_blocksize, 4966 + GFP_KERNEL); 4962 4967 if (!array[i].sum_blk) 4963 4968 return -ENOMEM; 4964 4969 init_rwsem(&array[i].journal_rwsem); 4965 4970 array[i].journal = f2fs_kzalloc(sbi, 4966 - sizeof(struct f2fs_journal), GFP_KERNEL); 4971 + sbi->sum_journal_size, GFP_KERNEL); 4967 4972 if (!array[i].journal) 4968 4973 return -ENOMEM; 4969 4974 array[i].seg_type = log_type_to_seg_type(i);
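Reviewer note: several hunks above replace PAGE_SIZE and SUM_JOURNAL_SIZE with sbi->blocksize and sbi->sum_journal_size. The page-count calculation for compacted summaries is the easiest place to see the effect, so here is a standalone restatement of that arithmetic. The function name and its parameter list are hypothetical; in the kernel the sizes come from sbi and from the SUM_FOOTER_SIZE / SUMMARY_SIZE constants.

```c
/*
 * Number of blocks needed to write valid_sum_count compacted summary
 * entries. The first block also carries two journals and a footer;
 * the second carries only a footer. Anything beyond that needs a
 * third block, as in the diff.
 */
static int sketch_npages_for_summary_flush(unsigned int valid_sum_count,
					   unsigned int blocksize,
					   unsigned int journal_size,
					   unsigned int footer_size,
					   unsigned int summary_size)
{
	unsigned int sum_in_page =
		(blocksize - 2 * journal_size - footer_size) / summary_size;

	if (valid_sum_count <= sum_in_page)
		return 1;
	if (valid_sum_count - sum_in_page <=
	    (blocksize - footer_size) / summary_size)
		return 2;
	return 3;
}
```

As an illustration only, with a 4096-byte block, a 507-byte journal, a 5-byte footer and 7-byte summaries (values assumed here, not taken from this diff), the first block holds 439 entries and the second 584, so 440 entries already need two blocks and 1024 need three.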

+50 -58

fs/f2fs/segment.h

···
 #define GET_ZONE_FROM_SEG(sbi, segno) \
 	GET_ZONE_FROM_SEC(sbi, GET_SEC_FROM_SEG(sbi, segno))

-#define SUMS_PER_BLOCK (F2FS_BLKSIZE / F2FS_SUM_BLKSIZE)
 #define GET_SUM_BLOCK(sbi, segno) \
-	(SM_I(sbi)->ssa_blkaddr + (segno / SUMS_PER_BLOCK))
-#define GET_SUM_BLKOFF(segno) (segno % SUMS_PER_BLOCK)
-#define SUM_BLK_PAGE_ADDR(folio, segno) \
-	(folio_address(folio) + GET_SUM_BLKOFF(segno) * F2FS_SUM_BLKSIZE)
+	(SM_I(sbi)->ssa_blkaddr + (segno / (sbi)->sums_per_block))
+#define GET_SUM_BLKOFF(sbi, segno) (segno % (sbi)->sums_per_block)
+#define SUM_BLK_PAGE_ADDR(sbi, folio, segno) \
+	(folio_address(folio) + GET_SUM_BLKOFF(sbi, segno) * (sbi)->sum_blocksize)

 #define GET_SUM_TYPE(footer) ((footer)->entry_type)
 #define SET_SUM_TYPE(footer, type) ((footer)->entry_type = (type))
···
 	return CAP_BLKS_PER_SEC(sbi) - get_ckpt_valid_blocks(sbi, segno, true);
 }

-static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi,
-			unsigned int node_blocks, unsigned int data_blocks,
-			unsigned int dent_blocks)
+static inline void get_additional_blocks_required(struct f2fs_sb_info *sbi,
+		unsigned int *total_node_blocks, unsigned int *total_data_blocks,
+		unsigned int *total_dent_blocks, bool separate_dent)
 {
-	unsigned int segno, left_blocks, blocks;
+	unsigned int segno, left_blocks;
 	int i;
+	unsigned int min_free_node_blocks = CAP_BLKS_PER_SEC(sbi);
+	unsigned int min_free_dent_blocks = CAP_BLKS_PER_SEC(sbi);
+	unsigned int min_free_data_blocks = CAP_BLKS_PER_SEC(sbi);

 	/* check current data/node sections in the worst case. */
 	for (i = CURSEG_HOT_DATA; i < NR_PERSISTENT_LOG; i++) {
 		segno = CURSEG_I(sbi, i)->segno;

 		if (unlikely(segno == NULL_SEGNO))
-			return false;
+			return;

 		left_blocks = get_left_section_blocks(sbi, i, segno);

-		blocks = i <= CURSEG_COLD_DATA ? data_blocks : node_blocks;
-		if (blocks > left_blocks)
-			return false;
+		if (i > CURSEG_COLD_DATA)
+			min_free_node_blocks = min(min_free_node_blocks, left_blocks);
+		else if (i == CURSEG_HOT_DATA && separate_dent)
+			min_free_dent_blocks = left_blocks;
+		else
+			min_free_data_blocks = min(min_free_data_blocks, left_blocks);
 	}

-	/* check current data section for dentry blocks. */
-	segno = CURSEG_I(sbi, CURSEG_HOT_DATA)->segno;
-
-	if (unlikely(segno == NULL_SEGNO))
-		return false;
-
-	left_blocks = get_left_section_blocks(sbi, CURSEG_HOT_DATA, segno);
-
-	if (dent_blocks > left_blocks)
-		return false;
-	return true;
+	*total_node_blocks = (*total_node_blocks > min_free_node_blocks) ?
+				*total_node_blocks - min_free_node_blocks : 0;
+	*total_dent_blocks = (*total_dent_blocks > min_free_dent_blocks) ?
+				*total_dent_blocks - min_free_dent_blocks : 0;
+	*total_data_blocks = (*total_data_blocks > min_free_data_blocks) ?
+				*total_data_blocks - min_free_data_blocks : 0;
 }

 /*
- * calculate needed sections for dirty node/dentry and call
- * has_curseg_enough_space, please note that, it needs to account
- * dirty data as well in lfs mode when checkpoint is disabled.
+ * call get_additional_blocks_required to calculate dirty blocks
+ * needing to be placed in free sections, please note that, it
+ * needs to account dirty data as well in lfs mode when checkpoint
+ * is disabled.
  */
-static inline void __get_secs_required(struct f2fs_sb_info *sbi,
-		unsigned int *lower_p, unsigned int *upper_p, bool *curseg_p)
+static inline int __get_secs_required(struct f2fs_sb_info *sbi)
 {
 	unsigned int total_node_blocks = get_pages(sbi, F2FS_DIRTY_NODES) +
 					get_pages(sbi, F2FS_DIRTY_DENTS) +
 					get_pages(sbi, F2FS_DIRTY_IMETA);
 	unsigned int total_dent_blocks = get_pages(sbi, F2FS_DIRTY_DENTS);
 	unsigned int total_data_blocks = 0;
-	unsigned int node_secs = total_node_blocks / CAP_BLKS_PER_SEC(sbi);
-	unsigned int dent_secs = total_dent_blocks / CAP_BLKS_PER_SEC(sbi);
-	unsigned int data_secs = 0;
-	unsigned int node_blocks = total_node_blocks % CAP_BLKS_PER_SEC(sbi);
-	unsigned int dent_blocks = total_dent_blocks % CAP_BLKS_PER_SEC(sbi);
-	unsigned int data_blocks = 0;
+	bool separate_dent = true;

-	if (f2fs_lfs_mode(sbi)) {
+	if (f2fs_lfs_mode(sbi))
 		total_data_blocks = get_pages(sbi, F2FS_DIRTY_DATA);
-		data_secs = total_data_blocks / CAP_BLKS_PER_SEC(sbi);
-		data_blocks = total_data_blocks % CAP_BLKS_PER_SEC(sbi);
+
+	/*
+	 * When active_logs != 4, dentry blocks and data blocks can be
+	 * mixed in the same logs, so check their space together.
+	 */
+	if (F2FS_OPTION(sbi).active_logs != 4) {
+		total_data_blocks += total_dent_blocks;
+		total_dent_blocks = 0;
+		separate_dent = false;
 	}

-	if (lower_p)
-		*lower_p = node_secs + dent_secs + data_secs;
-	if (upper_p)
-		*upper_p = node_secs + dent_secs + data_secs +
-			(node_blocks ? 1 : 0) + (dent_blocks ? 1 : 0) +
-			(data_blocks ? 1 : 0);
-	if (curseg_p)
-		*curseg_p = has_curseg_enough_space(sbi,
-				node_blocks, data_blocks, dent_blocks);
+	get_additional_blocks_required(sbi, &total_node_blocks, &total_dent_blocks,
+					&total_data_blocks, separate_dent);
+
+	return DIV_ROUND_UP(total_node_blocks, CAP_BLKS_PER_SEC(sbi)) +
+		DIV_ROUND_UP(total_dent_blocks, CAP_BLKS_PER_SEC(sbi)) +
+		DIV_ROUND_UP(total_data_blocks, CAP_BLKS_PER_SEC(sbi));
 }

 static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
 					int freed, int needed)
 {
-	unsigned int free_secs, lower_secs, upper_secs;
-	bool curseg_space;
+	unsigned int free_secs, required_secs;

 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
 		return false;

-	__get_secs_required(sbi, &lower_secs, &upper_secs, &curseg_space);
-
 	free_secs = free_sections(sbi) + freed;
-	lower_secs += needed + reserved_sections(sbi);
-	upper_secs += needed + reserved_sections(sbi);
+	required_secs = needed + reserved_sections(sbi) +
+					__get_secs_required(sbi);

-	if (free_secs > upper_secs)
-		return false;
-	if (free_secs <= lower_secs)
-		return true;
-	return !curseg_space;
+	return free_secs < required_secs;
 }

 static inline bool has_enough_free_secs(struct f2fs_sb_info *sbi,
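Reviewer note: the new free-section check replaces the lower/upper bound pair with a single estimate. For each class of dirty blocks it subtracts the space still open in the current sections (saturating at zero), rounds the remainder up to whole sections, and sums the results. A minimal standalone sketch of that arithmetic follows. All sketch_* names are hypothetical, and the per-class free counts are passed in directly instead of being derived from the current segments.

```c
#include <stdbool.h>

/* Round-up division, the same result the kernel's DIV_ROUND_UP gives. */
static unsigned int sketch_div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/* Dirty blocks that do not fit in the already open space; never negative. */
static unsigned int sketch_overflow(unsigned int dirty, unsigned int free_in_cur)
{
	return dirty > free_in_cur ? dirty - free_in_cur : 0;
}

/* Sections needed for the overflow of node, dentry and data blocks. */
static unsigned int sketch_secs_required(unsigned int dirty_node,
					 unsigned int free_node,
					 unsigned int dirty_dent,
					 unsigned int free_dent,
					 unsigned int dirty_data,
					 unsigned int free_data,
					 unsigned int blks_per_sec)
{
	return sketch_div_round_up(sketch_overflow(dirty_node, free_node),
				   blks_per_sec) +
	       sketch_div_round_up(sketch_overflow(dirty_dent, free_dent),
				   blks_per_sec) +
	       sketch_div_round_up(sketch_overflow(dirty_data, free_data),
				   blks_per_sec);
}

/* Final comparison, as in has_not_enough_free_secs() after this change. */
static bool sketch_not_enough_free_secs(unsigned int free_secs,
					unsigned int needed,
					unsigned int reserved,
					unsigned int dirty_secs)
{
	return free_secs < needed + reserved + dirty_secs;
}
```

For example, with 512 blocks per section, 600 dirty node blocks against 100 free ones overflow by 500 blocks (one section), 10 dirty dentry blocks against 20 free ones overflow by nothing, and 1100 dirty data blocks against 50 free ones overflow by 1050 blocks (three sections), giving four sections in total.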

+126 -78

fs/f2fs/super.c

··· 67 67 [FAULT_BLKADDR_CONSISTENCE] = "inconsistent blkaddr", 68 68 [FAULT_NO_SEGMENT] = "no free segment", 69 69 [FAULT_INCONSISTENT_FOOTER] = "inconsistent footer", 70 - [FAULT_TIMEOUT] = "timeout", 70 + [FAULT_ATOMIC_TIMEOUT] = "atomic timeout", 71 71 [FAULT_VMALLOC] = "vmalloc", 72 + [FAULT_LOCK_TIMEOUT] = "lock timeout", 73 + [FAULT_SKIP_WRITE] = "skip write", 72 74 }; 73 75 74 76 int f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned long rate, ··· 98 96 f2fs_info(sbi, "build fault injection type: 0x%lx", type); 99 97 } 100 98 99 + if (fo & FAULT_TIMEOUT) { 100 + if (type >= TIMEOUT_TYPE_MAX) 101 + return -EINVAL; 102 + ffi->inject_lock_timeout = (unsigned int)type; 103 + f2fs_info(sbi, "build fault timeout injection type: 0x%lx", type); 104 + } 105 + 101 106 return 0; 107 + } 108 + 109 + static void inject_timeout(struct f2fs_sb_info *sbi) 110 + { 111 + struct f2fs_fault_info *ffi = &F2FS_OPTION(sbi).fault_info; 112 + enum f2fs_timeout_type type = ffi->inject_lock_timeout; 113 + unsigned long start_time = jiffies; 114 + unsigned long timeout = HZ; 115 + 116 + switch (type) { 117 + case TIMEOUT_TYPE_RUNNING: 118 + while (!time_after(jiffies, start_time + timeout)) { 119 + if (fatal_signal_pending(current)) 120 + return; 121 + ; 122 + } 123 + break; 124 + case TIMEOUT_TYPE_IO_SLEEP: 125 + f2fs_schedule_timeout_killable(timeout, true); 126 + break; 127 + case TIMEOUT_TYPE_NONIO_SLEEP: 128 + f2fs_schedule_timeout_killable(timeout, false); 129 + break; 130 + case TIMEOUT_TYPE_RUNNABLE: 131 + while (!time_after(jiffies, start_time + timeout)) { 132 + if (fatal_signal_pending(current)) 133 + return; 134 + schedule(); 135 + } 136 + break; 137 + default: 138 + return; 139 + } 140 + } 141 + 142 + void f2fs_simulate_lock_timeout(struct f2fs_sb_info *sbi) 143 + { 144 + struct f2fs_lock_context lc; 145 + 146 + f2fs_lock_op(sbi, &lc); 147 + inject_timeout(sbi); 148 + f2fs_unlock_op(sbi, &lc); 102 149 } 103 150 #endif 104 151 ··· 2607 2556 { 2608 2557 unsigned int 
s_flags = sbi->sb->s_flags; 2609 2558 struct cp_control cpc; 2559 + struct f2fs_lock_context lc; 2610 2560 unsigned int gc_mode = sbi->gc_mode; 2611 2561 int err = 0; 2612 2562 int ret; ··· 2637 2585 .no_bg_gc = true, 2638 2586 .nr_free_secs = 1 }; 2639 2587 2640 - f2fs_down_write(&sbi->gc_lock); 2588 + f2fs_down_write_trace(&sbi->gc_lock, &gc_control.lc); 2641 2589 stat_inc_gc_call_count(sbi, FOREGROUND); 2642 2590 err = f2fs_gc(sbi, &gc_control); 2643 2591 if (err == -ENODATA) { ··· 2661 2609 } 2662 2610 2663 2611 skip_gc: 2664 - f2fs_down_write(&sbi->gc_lock); 2612 + f2fs_down_write_trace(&sbi->gc_lock, &lc); 2665 2613 cpc.reason = CP_PAUSE; 2666 2614 set_sbi_flag(sbi, SBI_CP_DISABLED); 2667 2615 stat_inc_cp_call_count(sbi, TOTAL_CALL); ··· 2674 2622 spin_unlock(&sbi->stat_lock); 2675 2623 2676 2624 out_unlock: 2677 - f2fs_up_write(&sbi->gc_lock); 2625 + f2fs_up_write_trace(&sbi->gc_lock, &lc); 2678 2626 restore_flag: 2679 2627 sbi->gc_mode = gc_mode; 2680 2628 sbi->sb->s_flags = s_flags; /* Restore SB_RDONLY status */ ··· 2684 2632 2685 2633 static int f2fs_enable_checkpoint(struct f2fs_sb_info *sbi) 2686 2634 { 2687 - unsigned int nr_pages = get_pages(sbi, F2FS_DIRTY_DATA) / 16; 2688 - long long start, writeback, lock, sync_inode, end; 2635 + int retry = MAX_FLUSH_RETRY_COUNT; 2636 + long long start, writeback, end; 2689 2637 int ret; 2638 + struct f2fs_lock_context lc; 2639 + long long skipped_write, dirty_data; 2690 2640 2691 - f2fs_info(sbi, "%s start, meta: %lld, node: %lld, data: %lld", 2692 - __func__, 2641 + f2fs_info(sbi, "f2fs_enable_checkpoint() starts, meta: %lld, node: %lld, data: %lld", 2693 2642 get_pages(sbi, F2FS_DIRTY_META), 2694 2643 get_pages(sbi, F2FS_DIRTY_NODES), 2695 2644 get_pages(sbi, F2FS_DIRTY_DATA)); 2696 2645 2697 - f2fs_update_time(sbi, ENABLE_TIME); 2698 - 2699 2646 start = ktime_get(); 2700 2647 2648 + set_sbi_flag(sbi, SBI_ENABLE_CHECKPOINT); 2649 + 2701 2650 /* we should flush all the data to keep data consistency */ 2702 - 
while (get_pages(sbi, F2FS_DIRTY_DATA)) { 2703 - writeback_inodes_sb_nr(sbi->sb, nr_pages, WB_REASON_SYNC); 2651 + do { 2652 + skipped_write = get_pages(sbi, F2FS_SKIPPED_WRITE); 2653 + dirty_data = get_pages(sbi, F2FS_DIRTY_DATA); 2654 + 2655 + sync_inodes_sb(sbi->sb); 2704 2656 f2fs_io_schedule_timeout(DEFAULT_SCHEDULE_TIMEOUT); 2705 2657 2706 - if (f2fs_time_over(sbi, ENABLE_TIME)) 2658 + f2fs_info(sbi, "sync_inode_sb done, dirty_data: %lld, %lld, " 2659 + "skipped write: %lld, %lld, retry: %d", 2660 + get_pages(sbi, F2FS_DIRTY_DATA), 2661 + dirty_data, 2662 + get_pages(sbi, F2FS_SKIPPED_WRITE), 2663 + skipped_write, retry); 2664 + 2665 + /* 2666 + * sync_inodes_sb() has retry logic, so let's check dirty_data 2667 + * in prior to skipped_write in case there is no dirty data. 2668 + */ 2669 + if (!get_pages(sbi, F2FS_DIRTY_DATA)) 2707 2670 break; 2708 - } 2671 + if (get_pages(sbi, F2FS_SKIPPED_WRITE) == skipped_write) 2672 + break; 2673 + } while (retry--); 2674 + 2675 + clear_sbi_flag(sbi, SBI_ENABLE_CHECKPOINT); 2676 + 2709 2677 writeback = ktime_get(); 2710 2678 2711 - f2fs_down_write(&sbi->cp_enable_rwsem); 2679 + if (unlikely(get_pages(sbi, F2FS_DIRTY_DATA) || 2680 + get_pages(sbi, F2FS_SKIPPED_WRITE))) 2681 + f2fs_warn(sbi, "checkpoint=enable unwritten data: %lld, skipped data: %lld, retry: %d", 2682 + get_pages(sbi, F2FS_DIRTY_DATA), 2683 + get_pages(sbi, F2FS_SKIPPED_WRITE), retry); 2712 2684 2713 - lock = ktime_get(); 2685 + if (get_pages(sbi, F2FS_SKIPPED_WRITE)) 2686 + atomic_set(&sbi->nr_pages[F2FS_SKIPPED_WRITE], 0); 2714 2687 2715 - if (get_pages(sbi, F2FS_DIRTY_DATA)) 2716 - sync_inodes_sb(sbi->sb); 2717 - 2718 - if (unlikely(get_pages(sbi, F2FS_DIRTY_DATA))) 2719 - f2fs_warn(sbi, "%s: has some unwritten data: %lld", 2720 - __func__, get_pages(sbi, F2FS_DIRTY_DATA)); 2721 - 2722 - sync_inode = ktime_get(); 2723 - 2724 - f2fs_down_write(&sbi->gc_lock); 2688 + f2fs_down_write_trace(&sbi->gc_lock, &lc); 2725 2689 f2fs_dirty_to_prefree(sbi); 2726 2690 
2727 2691 clear_sbi_flag(sbi, SBI_CP_DISABLED); 2728 2692 set_sbi_flag(sbi, SBI_IS_DIRTY); 2729 - f2fs_up_write(&sbi->gc_lock); 2693 + f2fs_up_write_trace(&sbi->gc_lock, &lc); 2730 2694 2731 - f2fs_info(sbi, "%s sync_fs, meta: %lld, imeta: %lld, node: %lld, dents: %lld, qdata: %lld", 2732 - __func__, 2733 - get_pages(sbi, F2FS_DIRTY_META), 2734 - get_pages(sbi, F2FS_DIRTY_IMETA), 2735 - get_pages(sbi, F2FS_DIRTY_NODES), 2736 - get_pages(sbi, F2FS_DIRTY_DENTS), 2737 - get_pages(sbi, F2FS_DIRTY_QDATA)); 2738 2695 ret = f2fs_sync_fs(sbi->sb, 1); 2739 2696 if (ret) 2740 2697 f2fs_err(sbi, "%s sync_fs failed, ret: %d", __func__, ret); ··· 2751 2690 /* Let's ensure there's no pending checkpoint anymore */ 2752 2691 f2fs_flush_ckpt_thread(sbi); 2753 2692 2754 - f2fs_up_write(&sbi->cp_enable_rwsem); 2755 - 2756 2693 end = ktime_get(); 2757 2694 2758 - f2fs_info(sbi, "%s end, writeback:%llu, " 2759 - "lock:%llu, sync_inode:%llu, sync_fs:%llu", 2760 - __func__, 2761 - ktime_ms_delta(writeback, start), 2762 - ktime_ms_delta(lock, writeback), 2763 - ktime_ms_delta(sync_inode, lock), 2764 - ktime_ms_delta(end, sync_inode)); 2695 + f2fs_info(sbi, "f2fs_enable_checkpoint() finishes, writeback:%llu, sync:%llu", 2696 + ktime_ms_delta(writeback, start), 2697 + ktime_ms_delta(end, writeback)); 2765 2698 return ret; 2766 2699 } 2767 2700 ··· 3274 3219 } 3275 3220 3276 3221 static int f2fs_quota_enable(struct super_block *sb, int type, int format_id, 3277 - unsigned int flags) 3222 + unsigned int flags, unsigned long qf_inum) 3278 3223 { 3279 3224 struct inode *qf_inode; 3280 - unsigned long qf_inum; 3281 3225 unsigned long qf_flag = F2FS_QUOTA_DEFAULT_FL; 3282 3226 int err; 3283 - 3284 - BUG_ON(!f2fs_sb_has_quota_ino(F2FS_SB(sb))); 3285 - 3286 - qf_inum = f2fs_qf_ino(sb, type); 3287 - if (!qf_inum) 3288 - return -EPERM; 3289 3227 3290 3228 qf_inode = f2fs_iget(sb, qf_inum); 3291 3229 if (IS_ERR(qf_inode)) { ··· 3312 3264 test_opt(sbi, PRJQUOTA), 3313 3265 }; 3314 3266 3315 - if 
(is_set_ckpt_flags(F2FS_SB(sb), CP_QUOTA_NEED_FSCK_FLAG)) { 3267 + if (is_set_ckpt_flags(sbi, CP_QUOTA_NEED_FSCK_FLAG)) { 3316 3268 f2fs_err(sbi, "quota file may be corrupted, skip loading it"); 3317 3269 return 0; 3318 3270 } ··· 3324 3276 if (qf_inum) { 3325 3277 err = f2fs_quota_enable(sb, type, QFMT_VFS_V1, 3326 3278 DQUOT_USAGE_ENABLED | 3327 - (quota_mopt[type] ? DQUOT_LIMITS_ENABLED : 0)); 3279 + (quota_mopt[type] ? DQUOT_LIMITS_ENABLED : 0), qf_inum); 3328 3280 if (err) { 3329 3281 f2fs_err(sbi, "Failed to enable quota tracking (type=%d, err=%d). Please run fsck to fix.", 3330 3282 type, err); 3331 3283 for (type--; type >= 0; type--) 3332 3284 dquot_quota_off(sb, type); 3333 - set_sbi_flag(F2FS_SB(sb), 3334 - SBI_QUOTA_NEED_REPAIR); 3285 + set_sbi_flag(sbi, SBI_QUOTA_NEED_REPAIR); 3335 3286 return err; 3336 3287 } 3337 3288 } ··· 3377 3330 * that userspace sees the changes. 3378 3331 */ 3379 3332 for (cnt = 0; cnt < MAXQUOTAS; cnt++) { 3333 + struct f2fs_lock_context lc; 3380 3334 3381 3335 if (type != -1 && cnt != type) 3382 3336 continue; ··· 3397 3349 * block_operation 3398 3350 * f2fs_down_read(quota_sem) 3399 3351 */ 3400 - f2fs_lock_op(sbi); 3352 + f2fs_lock_op(sbi, &lc); 3401 3353 f2fs_down_read(&sbi->quota_sem); 3402 3354 3403 3355 ret = f2fs_quota_sync_file(sbi, cnt); 3404 3356 3405 3357 f2fs_up_read(&sbi->quota_sem); 3406 - f2fs_unlock_op(sbi); 3358 + f2fs_unlock_op(sbi, &lc); 3407 3359 3408 3360 if (!f2fs_sb_has_quota_ino(sbi)) 3409 3361 inode_unlock(dqopt->files[cnt]); ··· 4125 4077 if (sanity_check_area_boundary(sbi, folio, index)) 4126 4078 return -EFSCORRUPTED; 4127 4079 4128 - /* 4129 - * Check for legacy summary layout on 16KB+ block devices. 4130 - * Modern f2fs-tools packs multiple 4KB summary areas into one block, 4131 - * whereas legacy versions used one block per summary, leading 4132 - * to a much larger SSA. 
4133 - */ 4134 - if (SUMS_PER_BLOCK > 1 && 4135 - !(__F2FS_HAS_FEATURE(raw_super, F2FS_FEATURE_PACKED_SSA))) { 4136 - f2fs_info(sbi, "Error: Device formatted with a legacy version. " 4137 - "Please reformat with a tool supporting the packed ssa " 4138 - "feature for block sizes larger than 4kb."); 4139 - return -EOPNOTSUPP; 4140 - } 4141 - 4142 4080 return 0; 4143 4081 } 4144 4082 ··· 4334 4300 sbi->max_fragment_hole = DEF_FRAGMENT_SIZE; 4335 4301 spin_lock_init(&sbi->gc_remaining_trials_lock); 4336 4302 atomic64_set(&sbi->current_atomic_write, 0); 4303 + sbi->max_lock_elapsed_time = MAX_LOCK_ELAPSED_TIME; 4304 + sbi->adjust_lock_priority = 0; 4305 + sbi->lock_duration_priority = F2FS_DEFAULT_TASK_PRIORITY; 4306 + sbi->critical_task_priority = F2FS_CRITICAL_TASK_PRIORITY; 4307 + 4308 + sbi->sum_blocksize = f2fs_sb_has_packed_ssa(sbi) ? 4309 + 4096 : sbi->blocksize; 4310 + sbi->sums_per_block = sbi->blocksize / sbi->sum_blocksize; 4311 + sbi->entries_in_sum = sbi->sum_blocksize / 8; 4312 + sbi->sum_entry_size = SUMMARY_SIZE * sbi->entries_in_sum; 4313 + sbi->sum_journal_size = sbi->sum_blocksize - SUM_FOOTER_SIZE - 4314 + sbi->sum_entry_size; 4315 + sbi->nat_journal_entries = (sbi->sum_journal_size - 2) / 4316 + sizeof(struct nat_journal_entry); 4317 + sbi->sit_journal_entries = (sbi->sum_journal_size - 2) / 4318 + sizeof(struct sit_journal_entry); 4337 4319 4338 4320 sbi->dir_level = DEF_DIR_LEVEL; 4339 4321 sbi->interval_time[CP_TIME] = DEF_CP_INTERVAL; ··· 4357 4307 sbi->interval_time[DISCARD_TIME] = DEF_IDLE_INTERVAL; 4358 4308 sbi->interval_time[GC_TIME] = DEF_IDLE_INTERVAL; 4359 4309 sbi->interval_time[DISABLE_TIME] = DEF_DISABLE_INTERVAL; 4360 - sbi->interval_time[ENABLE_TIME] = DEF_ENABLE_INTERVAL; 4361 4310 sbi->interval_time[UMOUNT_DISCARD_TIMEOUT] = 4362 4311 DEF_UMOUNT_DISCARD_TIMEOUT; 4363 4312 clear_sbi_flag(sbi, SBI_NEED_FSCK); ··· 4945 4896 sbi->sb = sb; 4946 4897 4947 4898 /* initialize locks within allocated memory */ 4948 - 
init_f2fs_rwsem(&sbi->gc_lock); 4899 + init_f2fs_rwsem_trace(&sbi->gc_lock, sbi, LOCK_NAME_GC_LOCK); 4949 4900 mutex_init(&sbi->writepages); 4950 - init_f2fs_rwsem(&sbi->cp_global_sem); 4951 - init_f2fs_rwsem(&sbi->node_write); 4952 - init_f2fs_rwsem(&sbi->node_change); 4901 + init_f2fs_rwsem_trace(&sbi->cp_global_sem, sbi, LOCK_NAME_CP_GLOBAL); 4902 + init_f2fs_rwsem_trace(&sbi->node_write, sbi, LOCK_NAME_NODE_WRITE); 4903 + init_f2fs_rwsem_trace(&sbi->node_change, sbi, LOCK_NAME_NODE_CHANGE); 4953 4904 spin_lock_init(&sbi->stat_lock); 4954 - init_f2fs_rwsem(&sbi->cp_rwsem); 4955 - init_f2fs_rwsem(&sbi->cp_enable_rwsem); 4905 + init_f2fs_rwsem_trace(&sbi->cp_rwsem, sbi, LOCK_NAME_CP_RWSEM); 4956 4906 init_f2fs_rwsem(&sbi->quota_sem); 4957 4907 init_waitqueue_head(&sbi->cp_wait); 4958 4908 spin_lock_init(&sbi->error_lock);
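The retry loop that this diff introduces in f2fs_enable_checkpoint() can be modeled in userspace. The sketch below is not kernel code: `struct flush_state`, `flush_fn`, `flush_until_stable()` and the two simulated callbacks are hypothetical stand-ins for the `get_pages()` counters and `sync_inodes_sb()`. It only illustrates the two exit conditions and the fact that `do { ... } while (retry--)` runs at most `retry + 1` passes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the F2FS_DIRTY_DATA / F2FS_SKIPPED_WRITE counters. */
struct flush_state {
	long long dirty_data;
	long long skipped_write;
};

/* Hypothetical callback standing in for one sync_inodes_sb() pass. */
typedef void (*flush_fn)(struct flush_state *st);

/* Simulated pass that writes everything back. */
static void flush_all(struct flush_state *st)
{
	st->dirty_data = 0;
}

/* Simulated pass that writes back one page and skips one write per call. */
static void flush_one_skip_one(struct flush_state *st)
{
	if (st->dirty_data > 0) {
		st->dirty_data--;
		st->skipped_write++;
	}
}

/*
 * Runs up to retry + 1 flush passes, mirroring the do/while (retry--) shape
 * in the diff. Returns true when no dirty data remains afterwards.
 */
static bool flush_until_stable(struct flush_state *st, flush_fn flush, int retry)
{
	assert(st && flush);

	do {
		long long skipped_before = st->skipped_write;

		flush(st);

		/* Checked first, as in the diff: nothing dirty means we are done. */
		if (st->dirty_data == 0)
			break;
		/* The skipped-write counter did not move during this pass. */
		if (st->skipped_write == skipped_before)
			break;
	} while (retry--);

	return st->dirty_data == 0;
}
```

With `retry = 3` and a callback that keeps skipping writes, the loop gives up after four passes and leaves the remaining dirty count for the caller to report, much as the kernel code falls through to its `f2fs_warn()` call.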

+101 -10

fs/f2fs/sysfs.c

··· 35 35 #ifdef CONFIG_F2FS_FAULT_INJECTION 36 36 FAULT_INFO_RATE, /* struct f2fs_fault_info */ 37 37 FAULT_INFO_TYPE, /* struct f2fs_fault_info */ 38 + FAULT_INFO_TIMEOUT, /* struct f2fs_fault_info */ 38 39 #endif 39 40 RESERVED_BLOCKS, /* struct f2fs_sb_info */ 40 41 CPRC_INFO, /* struct ckpt_req_control */ ··· 59 58 const char *buf, size_t len); 60 59 int struct_type; 61 60 int offset; 61 + int size; 62 62 int id; 63 63 }; 64 64 ··· 86 84 return (unsigned char *)sbi; 87 85 #ifdef CONFIG_F2FS_FAULT_INJECTION 88 86 else if (struct_type == FAULT_INFO_RATE || 89 - struct_type == FAULT_INFO_TYPE) 87 + struct_type == FAULT_INFO_TYPE || 88 + struct_type == FAULT_INFO_TIMEOUT) 90 89 return (unsigned char *)&F2FS_OPTION(sbi).fault_info; 91 90 #endif 92 91 #ifdef CONFIG_F2FS_STAT_FS ··· 347 344 (unsigned long long)MAIN_BLKADDR(sbi)); 348 345 } 349 346 347 + static ssize_t __sbi_show_value(struct f2fs_attr *a, 348 + struct f2fs_sb_info *sbi, char *buf, 349 + unsigned char *value) 350 + { 351 + switch (a->size) { 352 + case 1: 353 + return sysfs_emit(buf, "%u\n", *(u8 *)value); 354 + case 2: 355 + return sysfs_emit(buf, "%u\n", *(u16 *)value); 356 + case 4: 357 + return sysfs_emit(buf, "%u\n", *(u32 *)value); 358 + case 8: 359 + return sysfs_emit(buf, "%llu\n", *(u64 *)value); 360 + default: 361 + f2fs_bug_on(sbi, 1); 362 + return sysfs_emit(buf, 363 + "show sysfs node value with wrong type\n"); 364 + } 365 + } 366 + 350 367 static ssize_t f2fs_sbi_show(struct f2fs_attr *a, 351 368 struct f2fs_sb_info *sbi, char *buf) 352 369 { 353 370 unsigned char *ptr = NULL; 354 - unsigned int *ui; 355 371 356 372 ptr = __struct_ptr(sbi, a->struct_type); 357 373 if (!ptr) ··· 450 428 atomic_read(&sbi->cp_call_count[BACKGROUND])); 451 429 #endif 452 430 453 - ui = (unsigned int *)(ptr + a->offset); 431 + return __sbi_show_value(a, sbi, buf, ptr + a->offset); 432 + } 454 433 455 - return sysfs_emit(buf, "%u\n", *ui); 434 + static void __sbi_store_value(struct f2fs_attr *a, 435 + struct 
f2fs_sb_info *sbi, 436 + unsigned char *ui, unsigned long value) 437 + { 438 + switch (a->size) { 439 + case 1: 440 + *(u8 *)ui = value; 441 + break; 442 + case 2: 443 + *(u16 *)ui = value; 444 + break; 445 + case 4: 446 + *(u32 *)ui = value; 447 + break; 448 + case 8: 449 + *(u64 *)ui = value; 450 + break; 451 + default: 452 + f2fs_bug_on(sbi, 1); 453 + f2fs_err(sbi, "store sysfs node value with wrong type"); 454 + } 456 455 } 457 456 458 457 static ssize_t __sbi_store(struct f2fs_attr *a, ··· 570 527 if (a->struct_type == FAULT_INFO_RATE) { 571 528 if (f2fs_build_fault_attr(sbi, t, 0, FAULT_RATE)) 572 529 return -EINVAL; 530 + return count; 531 + } 532 + if (a->struct_type == FAULT_INFO_TIMEOUT) { 533 + if (f2fs_build_fault_attr(sbi, 0, t, FAULT_TIMEOUT)) 534 + return -EINVAL; 535 + f2fs_simulate_lock_timeout(sbi); 573 536 return count; 574 537 } 575 538 #endif ··· 798 749 return count; 799 750 } 800 751 801 - if (!strcmp(a->attr.name, "gc_pin_file_threshold")) { 752 + if (!strcmp(a->attr.name, "gc_pin_file_thresh")) { 802 753 if (t > MAX_GC_FAILED_PINNED_FILES) 803 754 return -EINVAL; 804 755 sbi->gc_pin_file_threshold = t; ··· 955 906 return count; 956 907 } 957 908 958 - *ui = (unsigned int)t; 909 + if (!strcmp(a->attr.name, "adjust_lock_priority")) { 910 + if (t >= BIT(LOCK_NAME_MAX - 1)) 911 + return -EINVAL; 912 + sbi->adjust_lock_priority = t; 913 + return count; 914 + } 915 + 916 + if (!strcmp(a->attr.name, "lock_duration_priority")) { 917 + if (t < NICE_TO_PRIO(MIN_NICE) || t > NICE_TO_PRIO(MAX_NICE)) 918 + return -EINVAL; 919 + sbi->lock_duration_priority = t; 920 + return count; 921 + } 922 + 923 + if (!strcmp(a->attr.name, "critical_task_priority")) { 924 + if (t < NICE_TO_PRIO(MIN_NICE) || t > NICE_TO_PRIO(MAX_NICE)) 925 + return -EINVAL; 926 + if (!capable(CAP_SYS_NICE)) 927 + return -EPERM; 928 + sbi->critical_task_priority = t; 929 + if (sbi->cprc_info.f2fs_issue_ckpt) 930 + set_user_nice(sbi->cprc_info.f2fs_issue_ckpt, 931 + 
PRIO_TO_NICE(sbi->critical_task_priority)); 932 + if (sbi->gc_thread && sbi->gc_thread->f2fs_gc_task) 933 + set_user_nice(sbi->gc_thread->f2fs_gc_task, 934 + PRIO_TO_NICE(sbi->critical_task_priority)); 935 + return count; 936 + } 937 + 938 + __sbi_store_value(a, sbi, ptr + a->offset, t); 959 939 960 940 return count; 961 941 } ··· 1131 1053 .id = F2FS_FEATURE_##_feat, \ 1132 1054 } 1133 1055 1134 - #define F2FS_ATTR_OFFSET(_struct_type, _name, _mode, _show, _store, _offset) \ 1056 + #define F2FS_ATTR_OFFSET(_struct_type, _name, _mode, _show, _store, _offset, _size) \ 1135 1057 static struct f2fs_attr f2fs_attr_##_name = { \ 1136 1058 .attr = {.name = __stringify(_name), .mode = _mode }, \ 1137 1059 .show = _show, \ 1138 1060 .store = _store, \ 1139 1061 .struct_type = _struct_type, \ 1140 - .offset = _offset \ 1062 + .offset = _offset, \ 1063 + .size = _size \ 1141 1064 } 1142 1065 1143 1066 #define F2FS_RO_ATTR(struct_type, struct_name, name, elname) \ 1144 1067 F2FS_ATTR_OFFSET(struct_type, name, 0444, \ 1145 1068 f2fs_sbi_show, NULL, \ 1146 - offsetof(struct struct_name, elname)) 1069 + offsetof(struct struct_name, elname), \ 1070 + sizeof_field(struct struct_name, elname)) 1147 1071 1148 1072 #define F2FS_RW_ATTR(struct_type, struct_name, name, elname) \ 1149 1073 F2FS_ATTR_OFFSET(struct_type, name, 0644, \ 1150 1074 f2fs_sbi_show, f2fs_sbi_store, \ 1151 - offsetof(struct struct_name, elname)) 1075 + offsetof(struct struct_name, elname), \ 1076 + sizeof_field(struct struct_name, elname)) 1152 1077 1153 1078 #define F2FS_GENERAL_RO_ATTR(name) \ 1154 1079 static struct f2fs_attr f2fs_attr_##name = __ATTR(name, 0444, name##_show, NULL) ··· 1300 1219 F2FS_SBI_GENERAL_RW_ATTR(carve_out); 1301 1220 F2FS_SBI_GENERAL_RW_ATTR(reserved_pin_section); 1302 1221 F2FS_SBI_GENERAL_RW_ATTR(bggc_io_aware); 1222 + F2FS_SBI_GENERAL_RW_ATTR(max_lock_elapsed_time); 1223 + F2FS_SBI_GENERAL_RW_ATTR(lock_duration_priority); 1224 + F2FS_SBI_GENERAL_RW_ATTR(adjust_lock_priority); 1225 + 
F2FS_SBI_GENERAL_RW_ATTR(critical_task_priority); 1303 1226 1304 1227 /* STAT_INFO ATTR */ 1305 1228 #ifdef CONFIG_F2FS_STAT_FS ··· 1317 1232 #ifdef CONFIG_F2FS_FAULT_INJECTION 1318 1233 FAULT_INFO_GENERAL_RW_ATTR(FAULT_INFO_RATE, inject_rate); 1319 1234 FAULT_INFO_GENERAL_RW_ATTR(FAULT_INFO_TYPE, inject_type); 1235 + FAULT_INFO_GENERAL_RW_ATTR(FAULT_INFO_TIMEOUT, inject_lock_timeout); 1320 1236 #endif 1321 1237 1322 1238 /* RESERVED_BLOCKS ATTR */ ··· 1447 1361 #ifdef CONFIG_F2FS_FAULT_INJECTION 1448 1362 ATTR_LIST(inject_rate), 1449 1363 ATTR_LIST(inject_type), 1364 + ATTR_LIST(inject_lock_timeout), 1450 1365 #endif 1451 1366 ATTR_LIST(data_io_flag), 1452 1367 ATTR_LIST(node_io_flag), ··· 1509 1422 ATTR_LIST(reserved_pin_section), 1510 1423 ATTR_LIST(allocate_section_hint), 1511 1424 ATTR_LIST(allocate_section_policy), 1425 + ATTR_LIST(max_lock_elapsed_time), 1426 + ATTR_LIST(lock_duration_priority), 1427 + ATTR_LIST(adjust_lock_priority), 1428 + ATTR_LIST(critical_task_priority), 1512 1429 NULL, 1513 1430 }; 1514 1431 ATTRIBUTE_GROUPS(f2fs);
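The size-aware accessors added above (`__sbi_show_value()` and `__sbi_store_value()`) replace a store through a fixed `unsigned int` pointer, which writes four bytes regardless of the width of the target field. A minimal userspace sketch of the same dispatch follows; `struct demo_sbi`, `struct demo_attr`, `DEMO_ATTR()`, `demo_store()` and `demo_show()` are hypothetical names used only for illustration, and the error return replaces the kernel's `f2fs_bug_on()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure standing in for f2fs_sb_info. */
struct demo_sbi {
	uint8_t  small_knob;
	uint16_t medium_knob;
	uint32_t regular_knob;
	uint64_t wide_knob;
};

/* Hypothetical attribute descriptor carrying both offset and size. */
struct demo_attr {
	size_t offset;
	size_t size;
};

/* Records offset and width of a field, like offsetof() + sizeof_field(). */
#define DEMO_ATTR(field) \
	{ offsetof(struct demo_sbi, field), sizeof(((struct demo_sbi *)0)->field) }

/* Stores value into the field, truncated to the field width. Returns 0 or -1. */
static int demo_store(const struct demo_attr *a, struct demo_sbi *sbi,
		      uint64_t value)
{
	unsigned char *p;

	assert(a && sbi);
	p = (unsigned char *)sbi + a->offset;

	switch (a->size) {
	case 1: *(uint8_t *)p = (uint8_t)value; return 0;
	case 2: *(uint16_t *)p = (uint16_t)value; return 0;
	case 4: *(uint32_t *)p = (uint32_t)value; return 0;
	case 8: *(uint64_t *)p = value; return 0;
	default: return -1;	/* unsupported width */
	}
}

/* Reads the field back, widened to 64 bits. Returns 0 or -1. */
static int demo_show(const struct demo_attr *a, const struct demo_sbi *sbi,
		     uint64_t *out)
{
	const unsigned char *p;

	assert(a && sbi && out);
	p = (const unsigned char *)sbi + a->offset;

	switch (a->size) {
	case 1: *out = *(const uint8_t *)p; return 0;
	case 2: *out = *(const uint16_t *)p; return 0;
	case 4: *out = *(const uint32_t *)p; return 0;
	case 8: *out = *(const uint64_t *)p; return 0;
	default: return -1;	/* unsupported width */
	}
}
```

Because the width travels with the descriptor, a store to a one-byte field leaves its neighbors untouched, and a 64-bit field is read and written in full.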

+3 -2

fs/f2fs/xattr.c

···
 				struct folio *ifolio, int flags)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct f2fs_lock_context lc;
 	int err;

 	if (unlikely(f2fs_cp_error(sbi)))
···
 					size, ifolio, flags);
 	f2fs_balance_fs(sbi, true);

-	f2fs_lock_op(sbi);
+	f2fs_lock_op(sbi, &lc);
 	f2fs_down_write(&F2FS_I(inode)->i_xattr_sem);
 	err = __f2fs_setxattr(inode, index, name, value, size, NULL, flags);
 	f2fs_up_write(&F2FS_I(inode)->i_xattr_sem);
-	f2fs_unlock_op(sbi);
+	f2fs_unlock_op(sbi, &lc);

 	f2fs_update_time(sbi, REQ_TIME);
 	return err;

+45 -28

include/linux/f2fs_fs.h

···
 #define F2FS_LOG_SECTORS_PER_BLOCK	(PAGE_SHIFT - 9) /* log number for sector/blk */
 #define F2FS_BLKSIZE			PAGE_SIZE /* support only block == page */
 #define F2FS_BLKSIZE_BITS		PAGE_SHIFT /* bits for F2FS_BLKSIZE */
-#define F2FS_SUM_BLKSIZE		4096	/* only support 4096 byte sum block */
 #define F2FS_MAX_EXTENSION		64	/* # of extension entries */
 #define F2FS_EXTENSION_LEN		8	/* max size of extension */

···
  * from node's page's beginning to get a data block address.
  * ex) data_blkaddr = (block_t)(nodepage_start_address + ofs_in_node)
  */
-#define ENTRIES_IN_SUM		(F2FS_SUM_BLKSIZE / 8)
 #define SUMMARY_SIZE		(7)	/* sizeof(struct f2fs_summary) */
 #define SUM_FOOTER_SIZE		(5)	/* sizeof(struct summary_footer) */
-#define SUM_ENTRY_SIZE		(SUMMARY_SIZE * ENTRIES_IN_SUM)

 /* a summary entry for a block in a segment */
 struct f2fs_summary {
···
 	__le32 check_sum;		/* summary checksum */
 } __packed;

-#define SUM_JOURNAL_SIZE	(F2FS_SUM_BLKSIZE - SUM_FOOTER_SIZE -\
-				SUM_ENTRY_SIZE)
-#define NAT_JOURNAL_ENTRIES	((SUM_JOURNAL_SIZE - 2) /\
-				sizeof(struct nat_journal_entry))
-#define NAT_JOURNAL_RESERVED	((SUM_JOURNAL_SIZE - 2) %\
-				sizeof(struct nat_journal_entry))
-#define SIT_JOURNAL_ENTRIES	((SUM_JOURNAL_SIZE - 2) /\
-				sizeof(struct sit_journal_entry))
-#define SIT_JOURNAL_RESERVED	((SUM_JOURNAL_SIZE - 2) %\
-				sizeof(struct sit_journal_entry))
-
-/* Reserved area should make size of f2fs_extra_info equals to
- * that of nat_journal and sit_journal.
- */
-#define EXTRA_INFO_RESERVED	(SUM_JOURNAL_SIZE - 2 - 8)
-
 /*
  * frequently updated NAT/SIT entries can be stored in the spare area in
  * summary blocks
···
 	struct f2fs_nat_entry ne;
 } __packed;

+/*
+ * The nat_journal structure is a placeholder whose actual size varies depending
+ * on the use of packed_ssa. Therefore, it must always be accessed only through
+ * specific sets of macros and fields, and size calculations should use
+ * size-related macros instead of sizeof().
+ * Relevant macros: sbi->nat_journal_entries, nat_in_journal(),
+ * nid_in_journal(), MAX_NAT_JENTRIES().
+ */
 struct nat_journal {
-	struct nat_journal_entry entries[NAT_JOURNAL_ENTRIES];
-	__u8 reserved[NAT_JOURNAL_RESERVED];
+	struct nat_journal_entry entries[0];
 } __packed;

 struct sit_journal_entry {
···
 	struct f2fs_sit_entry se;
 } __packed;

+/*
+ * The sit_journal structure is a placeholder whose actual size varies depending
+ * on the use of packed_ssa. Therefore, it must always be accessed only through
+ * specific sets of macros and fields, and size calculations should use
+ * size-related macros instead of sizeof().
+ * Relevant macros: sbi->sit_journal_entries, sit_in_journal(),
+ * segno_in_journal(), MAX_SIT_JENTRIES().
+ */
 struct sit_journal {
-	struct sit_journal_entry entries[SIT_JOURNAL_ENTRIES];
-	__u8 reserved[SIT_JOURNAL_RESERVED];
+	struct sit_journal_entry entries[0];
 } __packed;

 struct f2fs_extra_info {
 	__le64 kbytes_written;
-	__u8 reserved[EXTRA_INFO_RESERVED];
+	__u8 reserved[];
 } __packed;

 struct f2fs_journal {
···
 	};
 } __packed;

-/* Block-sized summary block structure */
+/*
+ * Block-sized summary block structure
+ *
+ * The f2fs_summary_block structure is a placeholder whose actual size varies
+ * depending on the use of packed_ssa. Therefore, it must always be accessed
+ * only through specific sets of macros and fields, and size calculations should
+ * use size-related macros instead of sizeof().
+ * Relevant macros: sbi->sum_blocksize, sbi->entries_in_sum,
+ * sbi->sum_entry_size, sum_entries(), sum_journal(), sum_footer().
+ *
+ * Summary Block Layout
+ *
+ * +-----------------------+ <--- Block Start
+ * | struct f2fs_summary   |
+ * | entries[0]            |
+ * | ...                   |
+ * | entries[N-1]          |
+ * +-----------------------+
+ * | struct f2fs_journal   |
+ * +-----------------------+
+ * | struct summary_footer |
+ * +-----------------------+ <--- Block End
+ */
 struct f2fs_summary_block {
-	struct f2fs_summary entries[ENTRIES_IN_SUM];
-	struct f2fs_journal journal;
-	struct summary_footer footer;
+	struct f2fs_summary entries[0];
+	// struct f2fs_journal journal;
+	// struct summary_footer footer;
 } __packed;

 /*
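With the compile-time macros removed, the summary block geometry is derived at mount time from `sum_blocksize`. The arithmetic from this commit (`entries_in_sum`, `sum_entry_size`, `sum_journal_size`) can be checked with a small userspace sketch. `struct sum_geometry`, `sum_geometry_for()` and `journal_entries()` are hypothetical helpers, the two offsets simply follow the layout diagram in the header comment, and the 13-byte entry size in the usage note is an assumption about the packed size of `struct nat_journal_entry`, which this diff does not show.

```c
#include <assert.h>
#include <stddef.h>

#define SUMMARY_SIZE	7	/* sizeof(struct f2fs_summary), from the header */
#define SUM_FOOTER_SIZE	5	/* sizeof(struct summary_footer), from the header */

/* Hypothetical container for the per-superblock values computed in the diff. */
struct sum_geometry {
	size_t sum_blocksize;
	size_t entries_in_sum;
	size_t sum_entry_size;
	size_t sum_journal_size;
	size_t journal_offset;	/* journal follows the summary entries */
	size_t footer_offset;	/* footer follows the journal */
};

/* Applies the same formulas the diff uses when initializing sbi. */
static struct sum_geometry sum_geometry_for(size_t sum_blocksize)
{
	struct sum_geometry g;

	/* Small or unaligned sizes would make the journal size underflow. */
	assert(sum_blocksize >= 64 && sum_blocksize % 8 == 0);

	g.sum_blocksize = sum_blocksize;
	g.entries_in_sum = sum_blocksize / 8;
	g.sum_entry_size = SUMMARY_SIZE * g.entries_in_sum;
	g.sum_journal_size = sum_blocksize - SUM_FOOTER_SIZE - g.sum_entry_size;
	g.journal_offset = g.sum_entry_size;
	g.footer_offset = g.sum_entry_size + g.sum_journal_size;
	return g;
}

/* Journal capacity for a given entry size: (sum_journal_size - 2) / entry_bytes. */
static size_t journal_entries(const struct sum_geometry *g, size_t entry_bytes)
{
	assert(g && entry_bytes > 0 && g->sum_journal_size >= 2);
	return (g->sum_journal_size - 2) / entry_bytes;
}
```

For a 4096-byte summary block this gives 512 entries, 3584 bytes of entries and a 507-byte journal, with the footer occupying the last 5 bytes of the block. Calling `journal_entries(&g, 13)` under the stated size assumption yields 38 journal slots.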

+141 -1

include/trace/events/f2fs.h

··· 184 184 { CP_PHASE_FINISH_BLOCK_OPS, "finish block_ops" }, \ 185 185 { CP_PHASE_FINISH_CHECKPOINT, "finish checkpoint" }) 186 186 187 + #define show_lock_name(lock) \ 188 + __print_symbolic(lock, \ 189 + { LOCK_NAME_CP_RWSEM, "cp_rwsem" }, \ 190 + { LOCK_NAME_NODE_CHANGE, "node_change" }, \ 191 + { LOCK_NAME_NODE_WRITE, "node_write" }, \ 192 + { LOCK_NAME_GC_LOCK, "gc_lock" }, \ 193 + { LOCK_NAME_CP_GLOBAL, "cp_global" }, \ 194 + { LOCK_NAME_IO_RWSEM, "io_rwsem" }) 195 + 187 196 struct f2fs_sb_info; 188 197 struct f2fs_io_info; 189 198 struct extent_info; ··· 1367 1358 __field(int, type) 1368 1359 __field(int, dir) 1369 1360 __field(pgoff_t, index) 1361 + __field(pgoff_t, nrpages) 1370 1362 __field(int, dirty) 1371 1363 __field(int, uptodate) 1372 1364 ), ··· 1378 1368 __entry->type = type; 1379 1369 __entry->dir = S_ISDIR(folio->mapping->host->i_mode); 1380 1370 __entry->index = folio->index; 1371 + __entry->nrpages= folio_nr_pages(folio); 1381 1372 __entry->dirty = folio_test_dirty(folio); 1382 1373 __entry->uptodate = folio_test_uptodate(folio); 1383 1374 ), 1384 1375 1385 - TP_printk("dev = (%d,%d), ino = %lu, %s, %s, index = %lu, " 1376 + TP_printk("dev = (%d,%d), ino = %lu, %s, %s, index = %lu, nr_pages = %lu, " 1386 1377 "dirty = %d, uptodate = %d", 1387 1378 show_dev_ino(__entry), 1388 1379 show_block_type(__entry->type), 1389 1380 show_file_type(__entry->dir), 1390 1381 (unsigned long)__entry->index, 1382 + (unsigned long)__entry->nrpages, 1391 1383 __entry->dirty, 1392 1384 __entry->uptodate) 1393 1385 ); ··· 1409 1397 ); 1410 1398 1411 1399 DEFINE_EVENT(f2fs__folio, f2fs_readpage, 1400 + 1401 + TP_PROTO(struct folio *folio, int type), 1402 + 1403 + TP_ARGS(folio, type) 1404 + ); 1405 + 1406 + DEFINE_EVENT(f2fs__folio, f2fs_read_folio, 1412 1407 1413 1408 TP_PROTO(struct folio *folio, int type), 1414 1409 ··· 2459 2440 TP_PROTO(struct inode *inode, loff_t offset, int bytes), 2460 2441 2461 2442 TP_ARGS(inode, offset, bytes) 2443 + ); 2444 + 2445 + 
TRACE_EVENT(f2fs_lock_elapsed_time, 2446 + 2447 + TP_PROTO(struct f2fs_sb_info *sbi, enum f2fs_lock_name lock_name, 2448 + bool is_write, struct task_struct *p, int ioprio, 2449 + unsigned long long total_time, 2450 + unsigned long long running_time, 2451 + unsigned long long runnable_time, 2452 + unsigned long long io_sleep_time, 2453 + unsigned long long other_time), 2454 + 2455 + TP_ARGS(sbi, lock_name, is_write, p, ioprio, total_time, running_time, 2456 + runnable_time, io_sleep_time, other_time), 2457 + 2458 + TP_STRUCT__entry( 2459 + __field(dev_t, dev) 2460 + __array(char, comm, TASK_COMM_LEN) 2461 + __field(pid_t, pid) 2462 + __field(int, prio) 2463 + __field(int, ioprio_class) 2464 + __field(int, ioprio_data) 2465 + __field(unsigned int, lock_name) 2466 + __field(bool, is_write) 2467 + __field(unsigned long long, total_time) 2468 + __field(unsigned long long, running_time) 2469 + __field(unsigned long long, runnable_time) 2470 + __field(unsigned long long, io_sleep_time) 2471 + __field(unsigned long long, other_time) 2472 + ), 2473 + 2474 + TP_fast_assign( 2475 + __entry->dev = sbi->sb->s_dev; 2476 + memcpy(__entry->comm, p->comm, TASK_COMM_LEN); 2477 + __entry->pid = p->pid; 2478 + __entry->prio = p->prio; 2479 + __entry->ioprio_class = IOPRIO_PRIO_CLASS(ioprio); 2480 + __entry->ioprio_data = IOPRIO_PRIO_DATA(ioprio); 2481 + __entry->lock_name = lock_name; 2482 + __entry->is_write = is_write; 2483 + __entry->total_time = total_time; 2484 + __entry->running_time = running_time; 2485 + __entry->runnable_time = runnable_time; 2486 + __entry->io_sleep_time = io_sleep_time; 2487 + __entry->other_time = other_time; 2488 + ), 2489 + 2490 + TP_printk("dev = (%d,%d), comm: %s, pid: %d, prio: %d, " 2491 + "ioprio_class: %d, ioprio_data: %d, lock_name: %s, " 2492 + "lock_type: %s, total: %llu, running: %llu, " 2493 + "runnable: %llu, io_sleep: %llu, other: %llu", 2494 + show_dev(__entry->dev), 2495 + __entry->comm, 2496 + __entry->pid, 2497 + __entry->prio, 2498 + 
__entry->ioprio_class, 2499 + __entry->ioprio_data, 2500 + show_lock_name(__entry->lock_name), 2501 + __entry->is_write ? "wlock" : "rlock", 2502 + __entry->total_time, 2503 + __entry->running_time, 2504 + __entry->runnable_time, 2505 + __entry->io_sleep_time, 2506 + __entry->other_time) 2507 + ); 2508 + 2509 + DECLARE_EVENT_CLASS(f2fs_priority_update, 2510 + 2511 + TP_PROTO(struct f2fs_sb_info *sbi, enum f2fs_lock_name lock_name, 2512 + bool is_write, struct task_struct *p, int orig_prio, 2513 + int new_prio), 2514 + 2515 + TP_ARGS(sbi, lock_name, is_write, p, orig_prio, new_prio), 2516 + 2517 + TP_STRUCT__entry( 2518 + __field(dev_t, dev) 2519 + __array(char, comm, TASK_COMM_LEN) 2520 + __field(pid_t, pid) 2521 + __field(unsigned int, lock_name) 2522 + __field(bool, is_write) 2523 + __field(int, orig_prio) 2524 + __field(int, new_prio) 2525 + ), 2526 + 2527 + TP_fast_assign( 2528 + __entry->dev = sbi->sb->s_dev; 2529 + memcpy(__entry->comm, p->comm, TASK_COMM_LEN); 2530 + __entry->pid = p->pid; 2531 + __entry->lock_name = lock_name; 2532 + __entry->is_write = is_write; 2533 + __entry->orig_prio = orig_prio; 2534 + __entry->new_prio = new_prio; 2535 + ), 2536 + 2537 + TP_printk("dev = (%d,%d), comm: %s, pid: %d, lock_name: %s, " 2538 + "lock_type: %s, orig_prio: %d, new_prio: %d", 2539 + show_dev(__entry->dev), 2540 + __entry->comm, 2541 + __entry->pid, 2542 + show_lock_name(__entry->lock_name), 2543 + __entry->is_write ? 
"wlock" : "rlock", 2544 + __entry->orig_prio, 2545 + __entry->new_prio) 2546 + ); 2547 + 2548 + DEFINE_EVENT(f2fs_priority_update, f2fs_priority_uplift, 2549 + 2550 + TP_PROTO(struct f2fs_sb_info *sbi, enum f2fs_lock_name lock_name, 2551 + bool is_write, struct task_struct *p, int orig_prio, 2552 + int new_prio), 2553 + 2554 + TP_ARGS(sbi, lock_name, is_write, p, orig_prio, new_prio) 2555 + ); 2556 + 2557 + DEFINE_EVENT(f2fs_priority_update, f2fs_priority_restore, 2558 + 2559 + TP_PROTO(struct f2fs_sb_info *sbi, enum f2fs_lock_name lock_name, 2560 + bool is_write, struct task_struct *p, int orig_prio, 2561 + int new_prio), 2562 + 2563 + TP_ARGS(sbi, lock_name, is_write, p, orig_prio, new_prio) 2462 2564 ); 2463 2565 2464 2566 #endif /* _TRACE_F2FS_H */
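The f2fs_lock_elapsed_time event is documented as firing once the time spent in a lock-protected region exceeds `max_lock_elapsed_time` (500 ms by default, per the sysfs entry added in this commit). The following userspace sketch shows that threshold check with injected timestamps. `struct demo_lock_context`, `demo_lock_begin()`, `demo_lock_end()` and `demo_format_event()` are hypothetical names, not the kernel implementation, and the format string reproduces only a fragment of the `TP_printk()` layout above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Default threshold in milliseconds, per the sysfs documentation in this commit. */
#define DEMO_MAX_LOCK_ELAPSED_MS 500ULL

/* Hypothetical per-acquisition context, loosely analogous to f2fs_lock_context. */
struct demo_lock_context {
	unsigned long long start_ms;
};

/* Records the time at which the lock was taken. */
static void demo_lock_begin(struct demo_lock_context *lc,
			    unsigned long long now_ms)
{
	assert(lc);
	lc->start_ms = now_ms;
}

/*
 * Always fills *total_ms with the hold time; returns true only when that
 * time strictly exceeds threshold_ms, i.e. when an event should be emitted.
 */
static bool demo_lock_end(const struct demo_lock_context *lc,
			  unsigned long long now_ms,
			  unsigned long long threshold_ms,
			  unsigned long long *total_ms)
{
	assert(lc && total_ms && now_ms >= lc->start_ms);
	*total_ms = now_ms - lc->start_ms;
	return *total_ms > threshold_ms;
}

/* Formats a fragment of the trace line; returns snprintf()'s result. */
static int demo_format_event(char *buf, size_t len, const char *lock_name,
			     bool is_write, unsigned long long total)
{
	return snprintf(buf, len, "lock_name: %s, lock_type: %s, total: %llu",
			lock_name, is_write ? "wlock" : "rlock", total);
}
```

Passing timestamps in explicitly keeps the check deterministic. A real caller would read a monotonic clock at both ends of the critical section.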
