Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'ext4_for_linus-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
"New features and improvements for the ext4 file system

- Avoid unnecessary cache invalidation in the extent status cache
(es_cache) when adding extents to it without changing the extent
tree

- Add a sysfs parameter, err_report_sec, to control how frequently to
log a warning message that file system inconsistency has been
detected (previously the warning was logged every 24 hours)

- Avoid unnecessary forced ordered writes when appending to a file
with delayed allocation enabled

- Defer splitting unwritten extents to I/O completion to improve
write performance of concurrent direct I/O writes to multiple files

- Refactor the extent splitting and conversion code paths and add
kunit tests for them

Various Bug Fixes:

- Fix a panic when the debugging DOUBLE_CHECK macro is defined

- Avoid using fast commit for rare and complex file system operations
to make fast commit easier to reason about. This can also avoid
some corner cases that could result in file system inconsistency if
there is a crash between the fast commit and a subsequent full
commit

- Fix memory leaks in error paths

- Fix false positive reports, seen when running stress tests with
mixed huge-page workloads, caused by a race between page migration
and bitmap updates

- Fix a potential recursion into file system reclaim when evicting an
inode with fast commit enabled

- Fix a warning caused by a potential double decrement of the dirty
clusters counter when FS_IOC_SHUTDOWN is executed during a stress
test

- Enable mballoc optimized scanning regardless of whether the inode
uses indirect blocks or extent trees to map blocks"

* tag 'ext4_for_linus-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (45 commits)
ext4: allow zeroout when doing written to unwritten split
ext4: refactor split and convert extents
ext4: refactor zeroout path and handle all cases
ext4: propagate flags to ext4_convert_unwritten_extents_endio()
ext4: propagate flags to convert_initialized_extent()
ext4: add extent status cache support to kunit tests
ext4: kunit tests for higher level extent manipulation functions
ext4: kunit tests for extent splitting and conversion
ext4: use optimized mballoc scanning regardless of inode format
ext4: always allocate blocks only from groups inode can use
ext4: fix dirtyclusters double decrement on fs shutdown
ext4: fast commit: make s_fc_lock reclaim-safe
ext4: fix e4b bitmap inconsistency reports
ext4: remove redundant NULL check after __GFP_NOFAIL
ext4: remove EXT4_GET_BLOCKS_IO_CREATE_EXT
ext4: simplify the mapping query logic in ext4_iomap_begin()
ext4: remove unused unwritten parameter in ext4_dio_write_iter()
ext4: remove useless ext4_iomap_overwrite_ops
ext4: avoid starting handle when dio writing an unwritten extent
ext4: don't split extent before submitting I/O
...

Total: +1680 -464

fs/ext4/ext4.h (+23 -11)
···
 	 * found an unwritten extent, we need to split it.
 	 */
 #define EXT4_GET_BLOCKS_SPLIT_NOMERGE		0x0008
-	/*
-	 * Caller is from the dio or dioread_nolock buffered IO, reqest to
-	 * create an unwritten extent if it does not exist or split the
-	 * found unwritten extent. Also do not merge the newly created
-	 * unwritten extent, io end will convert unwritten to written,
-	 * and try to merge the written extent.
-	 */
-#define EXT4_GET_BLOCKS_IO_CREATE_EXT		(EXT4_GET_BLOCKS_SPLIT_NOMERGE|\
-					 EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT)
 	/* Convert unwritten extent to initialized. */
 #define EXT4_GET_BLOCKS_CONVERT			0x0010
 	/* Eventual metadata allocation (due to growing extent tree)
···

 	/* timer for periodic error stats printing */
 	struct timer_list s_err_report;
+	/* timeout in seconds for s_err_report; 0 disables the timer. */
+	unsigned long s_err_report_sec;

 	/* Lazy inode table initialization info */
 	struct ext4_li_request *s_li_request;
···
 	 * Main fast commit lock. This lock protects accesses to the
 	 * following fields:
 	 * ei->i_fc_list, s_fc_dentry_q, s_fc_q, s_fc_bytes, s_fc_bh.
+	 *
+	 * s_fc_lock can be taken from reclaim context (inode eviction) and is
+	 * thus reclaim unsafe. Use ext4_fc_lock()/ext4_fc_unlock() helpers
+	 * when acquiring / releasing the lock.
 	 */
 	struct mutex s_fc_lock;
 	struct buffer_head *s_fc_bh;
···
 {
 	memalloc_nofs_restore(ctx);
 	percpu_up_write(&EXT4_SB(sb)->s_writepages_rwsem);
+}
+
+static inline int ext4_fc_lock(struct super_block *sb)
+{
+	mutex_lock(&EXT4_SB(sb)->s_fc_lock);
+	return memalloc_nofs_save();
+}
+
+static inline void ext4_fc_unlock(struct super_block *sb, int ctx)
+{
+	memalloc_nofs_restore(ctx);
+	mutex_unlock(&EXT4_SB(sb)->s_fc_lock);
 }

 static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
···
  */
 #define EXT4_DEF_SB_UPDATE_INTERVAL_SEC (3600) /* seconds (1 hour) */
 #define EXT4_DEF_SB_UPDATE_INTERVAL_KB (16384) /* kilobytes (16MB) */
-

 /*
  * Minimum number of groups in a flexgroup before we separate out
···
 				      unsigned int flags);
 extern unsigned int ext4_num_base_meta_blocks(struct super_block *sb,
 					      ext4_group_t block_group);
+extern void print_daily_error_info(struct timer_list *t);

 extern __printf(7, 8)
 void __ext4_error(struct super_block *, const char *, unsigned int, bool,
···
 				  ext4_io_end_t *io_end);
 extern int ext4_map_blocks(handle_t *handle, struct inode *inode,
 			   struct ext4_map_blocks *map, int flags);
+extern int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
+				 struct ext4_map_blocks *map, int flags);
+extern int ext4_map_create_blocks(handle_t *handle, struct inode *inode,
+				  struct ext4_map_blocks *map, int flags);
 extern int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
 						   int num,
 						   struct ext4_ext_path *path);
···
 }

 extern const struct iomap_ops ext4_iomap_ops;
-extern const struct iomap_ops ext4_iomap_overwrite_ops;
 extern const struct iomap_ops ext4_iomap_report_ops;

 static inline int ext4_buffer_uptodate(struct buffer_head *bh)
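The s_fc_lock comment explains why the new helpers exist. The mutex can be taken during inode eviction, which may run from memory reclaim. An allocation that is allowed to enter file system reclaim while the mutex is held could therefore try to take it again. memalloc_nofs_save() marks the current task so that its allocations behave as GFP_NOFS until the matching memalloc_nofs_restore(). The following userspace model shows the same pairing. The thread-local `nofs_scope` flag stands in for the task flag, and `fake_nofs_save()`, `fake_nofs_restore()`, `model_fc_lock()` and `model_fc_unlock()` are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the task's "no fs reclaim" allocation scope (hypothetical). */
static _Thread_local bool nofs_scope;

/* Enter the scope and return the previous state so scopes can nest. */
static int fake_nofs_save(void)
{
	int old = nofs_scope;

	nofs_scope = true;
	return old;
}

/* Restore exactly the state saved by the matching fake_nofs_save(). */
static void fake_nofs_restore(int old)
{
	nofs_scope = old;
}

/* Would an allocation made now be allowed to recurse into fs reclaim? */
static bool alloc_may_enter_fs(void)
{
	return !nofs_scope;
}

static pthread_mutex_t fc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take the lock first, then enter the scope (same order as ext4_fc_lock()). */
static int model_fc_lock(void)
{
	pthread_mutex_lock(&fc_lock);
	return fake_nofs_save();
}

/* Leave the scope, then drop the lock (same order as ext4_fc_unlock()). */
static void model_fc_unlock(int ctx)
{
	assert(nofs_scope);	/* must be called inside the scope */
	fake_nofs_restore(ctx);
	pthread_mutex_unlock(&fc_lock);
}
```

Returning the saved state from the lock helper and passing it back to the unlock helper keeps nested scopes correct. An inner save/restore pair inside the critical section leaves the outer scope active.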

fs/ext4/extents-test.c (new file, +1027)
// SPDX-License-Identifier: GPL-2.0
/*
 * Written by Ojaswin Mujoo <ojaswin@linux.ibm.com> (IBM)
 *
 * These Kunit tests are designed to test the functionality of
 * extent split and conversion in ext4.
 *
 * Currently, ext4 can split extents in 2 ways:
 * 1. By splitting the extents in the extent tree and optionally converting them
 *    to written or unwritten based on flags passed.
 * 2. In case 1 encounters an error, ext4 instead zeroes out the unwritten
 *    areas of the extent and marks the complete extent written.
 *
 * The primary function that handles this is ext4_split_convert_extents().
 *
 * We test both of the methods of split. The behavior we try to enforce is:
 * 1. When passing EXT4_GET_BLOCKS_CONVERT flag to ext4_split_convert_extents(),
 *    the split extent should be converted to initialized.
 * 2. When passing EXT4_GET_BLOCKS_CONVERT_UNWRITTEN flag to
 *    ext4_split_convert_extents(), the split extent should be converted to
 *    uninitialized.
 * 3. In case we use the zeroout method, then we should correctly write zeroes
 *    to the unwritten areas of the extent and we should not corrupt/leak any
 *    data.
 *
 * Enforcing 1 and 2 is straightforward: we just set up a minimal inode with an
 * extent tree, call ext4_split_convert_extents() and check the final state of
 * the extent tree.
 *
 * For zeroout testing, we maintain a separate buffer which represents the disk
 * data corresponding to the extents. We then override ext4's zeroout functions
 * to instead write zeroes to our buffer. Then, we override
 * ext4_ext_insert_extent() to return -ENOSPC, which triggers the zeroout.
 * Finally, we check the state of the extent tree and zeroout buffer to confirm
 * everything went well.
 */

#include <kunit/test.h>
#include <kunit/static_stub.h>
#include <linux/gfp_types.h>
#include <linux/stddef.h>

#include "ext4.h"
#include "ext4_extents.h"

#define EXT_DATA_PBLK 100
#define EXT_DATA_LBLK 10
#define EXT_DATA_LEN 3

struct kunit_ctx {
	/*
	 * Ext4 inode which has only 1 unwrit extent
	 */
	struct ext4_inode_info *k_ei;
	/*
	 * Represents the underlying data area (used for zeroout testing)
	 */
	char *k_data;
} k_ctx;

/*
 * describes the state of an expected extent in extent tree.
 */
struct kunit_ext_state {
	ext4_lblk_t ex_lblk;
	ext4_lblk_t ex_len;
	bool is_unwrit;
};

/*
 * describes the state of the data area of a writ extent. Used for testing
 * correctness of zeroout.
 */
struct kunit_ext_data_state {
	char exp_char;
	ext4_lblk_t off_blk;
	ext4_lblk_t len_blk;
};

enum kunit_test_types {
	TEST_SPLIT_CONVERT,
	TEST_CREATE_BLOCKS,
};

struct kunit_ext_test_param {
	/* description of test */
	char *desc;

	/* determines which function will be tested */
	int type;

	/* is extent unwrit at beginning of test */
	bool is_unwrit_at_start;

	/* flags to pass while splitting */
	int split_flags;

	/* map describing range to split */
	struct ext4_map_blocks split_map;

	/* disable zeroout */
	bool disable_zeroout;

	/* no of extents expected after split */
	int nr_exp_ext;

	/*
	 * expected state of extents after split. We will never split into more
	 * than 3 extents
	 */
	struct kunit_ext_state exp_ext_state[3];

	/* Below fields used for zeroout tests */

	bool is_zeroout_test;
	/*
	 * no of expected data segments (zeroout tests).
	 * Example: if we expect data to be 4kb 0s, followed by 8kb non-zero,
	 * then nr_exp_data_segs==2
	 */
	int nr_exp_data_segs;

	/*
	 * expected state of data area after zeroout.
	 */
	struct kunit_ext_data_state exp_data_state[3];
};

static void ext_kill_sb(struct super_block *sb)
{
	generic_shutdown_super(sb);
}

static int ext_set(struct super_block *sb, void *data)
{
	return 0;
}

static struct file_system_type ext_fs_type = {
	.name = "extents test",
	.kill_sb = ext_kill_sb,
};

static void extents_kunit_exit(struct kunit *test)
{
	struct ext4_sb_info *sbi = k_ctx.k_ei->vfs_inode.i_sb->s_fs_info;

	kfree(sbi);
	kfree(k_ctx.k_ei);
	kfree(k_ctx.k_data);
}

static int __ext4_ext_dirty_stub(const char *where, unsigned int line,
				 handle_t *handle, struct inode *inode,
				 struct ext4_ext_path *path)
{
	return 0;
}

static struct ext4_ext_path *
ext4_ext_insert_extent_stub(handle_t *handle, struct inode *inode,
			    struct ext4_ext_path *path,
			    struct ext4_extent *newext, int gb_flags)
{
	return ERR_PTR(-ENOSPC);
}

/*
 * We will zeroout the equivalent range in the data area
 */
static int ext4_ext_zeroout_stub(struct inode *inode, struct ext4_extent *ex)
{
	ext4_lblk_t ee_block, off_blk;
	loff_t ee_len;
	loff_t off_bytes;
	struct kunit *test = kunit_get_current_test();

	ee_block = le32_to_cpu(ex->ee_block);
	ee_len = ext4_ext_get_actual_len(ex);

	KUNIT_EXPECT_EQ_MSG(test, 1, ee_block >= EXT_DATA_LBLK, "ee_block=%d",
			    ee_block);
	KUNIT_EXPECT_EQ(test, 1,
			ee_block + ee_len <= EXT_DATA_LBLK + EXT_DATA_LEN);

	off_blk = ee_block - EXT_DATA_LBLK;
	off_bytes = off_blk << inode->i_sb->s_blocksize_bits;
	memset(k_ctx.k_data + off_bytes, 0,
	       ee_len << inode->i_sb->s_blocksize_bits);

	return 0;
}

static int ext4_issue_zeroout_stub(struct inode *inode, ext4_lblk_t lblk,
				   ext4_fsblk_t pblk, ext4_lblk_t len)
{
	ext4_lblk_t off_blk;
	loff_t off_bytes;
	struct kunit *test = kunit_get_current_test();

	kunit_log(KERN_ALERT, test,
		  "%s: lblk=%u pblk=%llu len=%u", __func__, lblk, pblk, len);
	KUNIT_EXPECT_EQ(test, 1, lblk >= EXT_DATA_LBLK);
	KUNIT_EXPECT_EQ(test, 1, lblk + len <= EXT_DATA_LBLK + EXT_DATA_LEN);
	KUNIT_EXPECT_EQ(test, 1, lblk - EXT_DATA_LBLK == pblk - EXT_DATA_PBLK);

	off_blk = lblk - EXT_DATA_LBLK;
	off_bytes = off_blk << inode->i_sb->s_blocksize_bits;
	memset(k_ctx.k_data + off_bytes, 0,
	       len << inode->i_sb->s_blocksize_bits);

	return 0;
}

static int extents_kunit_init(struct kunit *test)
{
	struct ext4_extent_header *eh = NULL;
	struct ext4_inode_info *ei;
	struct inode *inode;
	struct super_block *sb;
	struct ext4_sb_info *sbi = NULL;
	struct kunit_ext_test_param *param =
		(struct kunit_ext_test_param *)(test->param_value);
	int err;

	sb = sget(&ext_fs_type, NULL, ext_set, 0, NULL);
	if (IS_ERR(sb))
		return PTR_ERR(sb);

	sb->s_blocksize = 4096;
	sb->s_blocksize_bits = 12;

	sbi = kzalloc(sizeof(struct ext4_sb_info), GFP_KERNEL);
	if (sbi == NULL)
		return -ENOMEM;

	sbi->s_sb = sb;
	sb->s_fs_info = sbi;

	if (!param || !param->disable_zeroout)
		sbi->s_extent_max_zeroout_kb = 32;

	/* setup the mock inode */
	k_ctx.k_ei = kzalloc(sizeof(struct ext4_inode_info), GFP_KERNEL);
	if (k_ctx.k_ei == NULL)
		return -ENOMEM;
	ei = k_ctx.k_ei;
	inode = &ei->vfs_inode;

	err = ext4_es_register_shrinker(sbi);
	if (err)
		return err;

	ext4_es_init_tree(&ei->i_es_tree);
	rwlock_init(&ei->i_es_lock);
	INIT_LIST_HEAD(&ei->i_es_list);
	ei->i_es_all_nr = 0;
	ei->i_es_shk_nr = 0;
	ei->i_es_shrink_lblk = 0;

	ei->i_disksize = (EXT_DATA_LBLK + EXT_DATA_LEN + 10)
			 << sb->s_blocksize_bits;
	ei->i_flags = 0;
	ext4_set_inode_flag(inode, EXT4_INODE_EXTENTS);
	inode->i_sb = sb;

	k_ctx.k_data = kzalloc(EXT_DATA_LEN * 4096, GFP_KERNEL);
	if (k_ctx.k_data == NULL)
		return -ENOMEM;

	/*
	 * set the data area to a junk value
	 */
	memset(k_ctx.k_data, 'X', EXT_DATA_LEN * 4096);

	/* create a tree with depth 0 */
	eh = (struct ext4_extent_header *)k_ctx.k_ei->i_data;

	/* Fill extent header */
	eh = ext_inode_hdr(&k_ctx.k_ei->vfs_inode);
	eh->eh_depth = 0;
	eh->eh_entries = cpu_to_le16(1);
	eh->eh_magic = EXT4_EXT_MAGIC;
	eh->eh_max =
		cpu_to_le16(ext4_ext_space_root_idx(&k_ctx.k_ei->vfs_inode, 0));
	eh->eh_generation = 0;

	/*
	 * add 1 extent in leaf node covering:
	 * - lblks: [EXT_DATA_LBLK, EXT_DATA_LBLK + EXT_DATA_LEN)
	 * - pblks: [EXT_DATA_PBLK, EXT_DATA_PBLK + EXT_DATA_LEN)
	 */
	EXT_FIRST_EXTENT(eh)->ee_block = cpu_to_le32(EXT_DATA_LBLK);
	EXT_FIRST_EXTENT(eh)->ee_len = cpu_to_le16(EXT_DATA_LEN);
	ext4_ext_store_pblock(EXT_FIRST_EXTENT(eh), EXT_DATA_PBLK);
	if (!param || param->is_unwrit_at_start)
		ext4_ext_mark_unwritten(EXT_FIRST_EXTENT(eh));

	ext4_es_insert_extent(inode, EXT_DATA_LBLK, EXT_DATA_LEN, EXT_DATA_PBLK,
			      ext4_ext_is_unwritten(EXT_FIRST_EXTENT(eh)) ?
				      EXTENT_STATUS_UNWRITTEN :
				      EXTENT_STATUS_WRITTEN,
			      0);

	/* Add stubs */
	kunit_activate_static_stub(test, __ext4_ext_dirty,
				   __ext4_ext_dirty_stub);
	kunit_activate_static_stub(test, ext4_ext_zeroout, ext4_ext_zeroout_stub);
	kunit_activate_static_stub(test, ext4_issue_zeroout,
				   ext4_issue_zeroout_stub);
	return 0;
}

/*
 * Return 0 if all bytes in buf equal c; otherwise log the offset of the
 * first mismatch and return 1.
 */
static int check_buffer(char *buf, int c, int size)
{
	void *ret = NULL;

	ret = memchr_inv(buf, c, size);
	if (ret == NULL)
		return 0;

	kunit_log(KERN_ALERT, kunit_get_current_test(),
		  "# %s: wrong char found at offset %u (expected:%d got:%d)", __func__,
		  (u32)((char *)ret - buf), c, *((char *)ret));
	return 1;
}

/*
 * Simulate a map block call by first calling ext4_map_query_blocks() to
 * correctly populate map flags and pblk and then call the
 * ext4_map_create_blocks() to do actual split and conversion. This is easier
 * than calling ext4_map_blocks() because that needs mocking a lot of unrelated
 * functions.
 */
static void ext4_map_create_blocks_helper(struct kunit *test,
					  struct inode *inode,
					  struct ext4_map_blocks *map,
					  int flags)
{
	int retval = 0;

	retval = ext4_map_query_blocks(NULL, inode, map, flags);
	if (retval < 0) {
		KUNIT_FAIL(test,
			   "ext4_map_query_blocks() failed. Cannot proceed\n");
		return;
	}

	ext4_map_create_blocks(NULL, inode, map, flags);
}

static void test_split_convert(struct kunit *test)
{
	struct ext4_ext_path *path;
	struct inode *inode = &k_ctx.k_ei->vfs_inode;
	struct ext4_extent *ex;
	struct ext4_map_blocks map;
	const struct kunit_ext_test_param *param =
		(const struct kunit_ext_test_param *)(test->param_value);
	int blkbits = inode->i_sb->s_blocksize_bits;

	if (param->is_zeroout_test)
		/*
		 * Force zeroout by making ext4_ext_insert_extent return ENOSPC
		 */
		kunit_activate_static_stub(test, ext4_ext_insert_extent,
					   ext4_ext_insert_extent_stub);

	path = ext4_find_extent(inode, EXT_DATA_LBLK, NULL, EXT4_EX_NOCACHE);
	ex = path->p_ext;
	KUNIT_EXPECT_EQ(test, EXT_DATA_LBLK, le32_to_cpu(ex->ee_block));
	KUNIT_EXPECT_EQ(test, EXT_DATA_LEN, ext4_ext_get_actual_len(ex));
	KUNIT_EXPECT_EQ(test, param->is_unwrit_at_start,
			ext4_ext_is_unwritten(ex));
	if (param->is_zeroout_test)
		KUNIT_EXPECT_EQ(test, 0,
				check_buffer(k_ctx.k_data, 'X',
					     EXT_DATA_LEN << blkbits));

	map.m_lblk = param->split_map.m_lblk;
	map.m_len = param->split_map.m_len;

	switch (param->type) {
	case TEST_SPLIT_CONVERT:
		path = ext4_split_convert_extents(NULL, inode, &map, path,
						  param->split_flags, NULL);
		break;
	case TEST_CREATE_BLOCKS:
		ext4_map_create_blocks_helper(test, inode, &map, param->split_flags);
		break;
	default:
		KUNIT_FAIL(test, "param->type %d not support.", param->type);
	}

	path = ext4_find_extent(inode, EXT_DATA_LBLK, NULL, EXT4_EX_NOCACHE);
	ex = path->p_ext;

	for (int i = 0; i < param->nr_exp_ext; i++) {
		struct kunit_ext_state exp_ext = param->exp_ext_state[i];
		bool es_check_needed = param->type != TEST_SPLIT_CONVERT;
		struct extent_status es;
		int contains_ex, ex_end, es_end, es_pblk;

		KUNIT_EXPECT_EQ(test, exp_ext.ex_lblk,
				le32_to_cpu(ex->ee_block));
		KUNIT_EXPECT_EQ(test, exp_ext.ex_len,
				ext4_ext_get_actual_len(ex));
		KUNIT_EXPECT_EQ(test, exp_ext.is_unwrit,
				ext4_ext_is_unwritten(ex));
		/*
		 * Confirm extent cache is in sync. Note that es cache can be
		 * merged even when on-disk extents are not so take that into
		 * account.
		 *
		 * Also, ext4_split_convert_extents() forces EXT4_EX_NOCACHE hence
		 * es status are ignored for that case.
		 */
		if (es_check_needed) {
			ext4_es_lookup_extent(inode, le32_to_cpu(ex->ee_block),
					      NULL, &es, NULL);

			ex_end = exp_ext.ex_lblk + exp_ext.ex_len;
			es_end = es.es_lblk + es.es_len;
			contains_ex = es.es_lblk <= exp_ext.ex_lblk &&
				      es_end >= ex_end;
			es_pblk = ext4_es_pblock(&es) +
				  (exp_ext.ex_lblk - es.es_lblk);

			KUNIT_EXPECT_EQ(test, contains_ex, 1);
			KUNIT_EXPECT_EQ(test, ext4_ext_pblock(ex), es_pblk);
			KUNIT_EXPECT_EQ(test, 1,
					(exp_ext.is_unwrit &&
					 ext4_es_is_unwritten(&es)) ||
					(!exp_ext.is_unwrit &&
					 ext4_es_is_written(&es)));
		}

		/* Only printed on failure */
		kunit_log(KERN_INFO, test,
			  "# [extent %d] exp: lblk:%d len:%d unwrit:%d \n", i,
			  exp_ext.ex_lblk, exp_ext.ex_len, exp_ext.is_unwrit);
		kunit_log(KERN_INFO, test,
			  "# [extent %d] got: lblk:%d len:%d unwrit:%d\n", i,
			  le32_to_cpu(ex->ee_block),
			  ext4_ext_get_actual_len(ex),
			  ext4_ext_is_unwritten(ex));
		if (es_check_needed)
			kunit_log(
				KERN_INFO, test,
				"# [extent %d] es: lblk:%d len:%d pblk:%lld type:0x%x\n",
				i, es.es_lblk, es.es_len, ext4_es_pblock(&es),
				ext4_es_type(&es));
		kunit_log(KERN_INFO, test, "------------------\n");

		ex = ex + 1;
	}

	if (!param->is_zeroout_test)
		return;

	/*
	 * Check that the data area has been zeroed out correctly
	 */
	for (int i = 0; i < param->nr_exp_data_segs; i++) {
		loff_t off, len;
		struct kunit_ext_data_state exp_data_seg = param->exp_data_state[i];

		off = exp_data_seg.off_blk << blkbits;
		len = exp_data_seg.len_blk << blkbits;
		KUNIT_EXPECT_EQ_MSG(test, 0,
				    check_buffer(k_ctx.k_data + off,
						 exp_data_seg.exp_char, len),
				    "# corruption in byte range [%lld, %lld)",
				    off, len);
	}

	return;
}

static const struct kunit_ext_test_param test_split_convert_params[] = {
	/* unwrit to writ splits */
	{ .desc = "split unwrit extent to 2 extents and convert 1st half writ",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 1,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT,
	  .split_map = { .m_lblk = EXT_DATA_LBLK, .m_len = 1 },
	  .nr_exp_ext = 2,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 0 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 1,
			       .is_unwrit = 1 } },
	  .is_zeroout_test = 0 },
	{ .desc = "split unwrit extent to 2 extents and convert 2nd half writ",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 1,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 1 },
	  .nr_exp_ext = 2,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 1 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 1,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 0 },
	{ .desc = "split unwrit extent to 3 extents and convert 2nd half to writ",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 1,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 2 },
	  .nr_exp_ext = 3,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 1 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 2,
			       .is_unwrit = 0 },
			     { .ex_lblk = EXT_DATA_LBLK + 1 + (EXT_DATA_LEN - 2),
			       .ex_len = 1,
			       .is_unwrit = 1 } },
	  .is_zeroout_test = 0 },

	/* writ to unwrit splits */
	{ .desc = "split writ extent to 2 extents and convert 1st half unwrit",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 0,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .split_map = { .m_lblk = EXT_DATA_LBLK, .m_len = 1 },
	  .nr_exp_ext = 2,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 1 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 1,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 0 },
	{ .desc = "split writ extent to 2 extents and convert 2nd half unwrit",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 0,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 1 },
	  .nr_exp_ext = 2,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 0 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 1,
			       .is_unwrit = 1 } },
	  .is_zeroout_test = 0 },
	{ .desc = "split writ extent to 3 extents and convert 2nd half to unwrit",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 0,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 2 },
	  .nr_exp_ext = 3,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 0 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 2,
			       .is_unwrit = 1 },
			     { .ex_lblk = EXT_DATA_LBLK + 1 + (EXT_DATA_LEN - 2),
			       .ex_len = 1,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 0 },

	/*
	 * ***** zeroout tests *****
	 */
	/* unwrit to writ splits */
	{ .desc = "split unwrit extent to 2 extents and convert 1st half writ (zeroout)",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 1,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT,
	  .split_map = { .m_lblk = EXT_DATA_LBLK, .m_len = 1 },
	  .nr_exp_ext = 1,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = EXT_DATA_LEN,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 1,
	  .nr_exp_data_segs = 2,
	  .exp_data_state = { { .exp_char = 'X', .off_blk = 0, .len_blk = 1 },
			      { .exp_char = 0,
				.off_blk = 1,
				.len_blk = EXT_DATA_LEN - 1 } } },
	{ .desc = "split unwrit extent to 2 extents and convert 2nd half writ (zeroout)",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 1,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 1 },
	  .nr_exp_ext = 1,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = EXT_DATA_LEN,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 1,
	  .nr_exp_data_segs = 2,
	  .exp_data_state = { { .exp_char = 0, .off_blk = 0, .len_blk = 1 },
			      { .exp_char = 'X',
				.off_blk = 1,
				.len_blk = EXT_DATA_LEN - 1 } } },
	{ .desc = "split unwrit extent to 3 extents and convert 2nd half writ (zeroout)",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 1,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 2 },
	  .nr_exp_ext = 1,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = EXT_DATA_LEN,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 1,
	  .nr_exp_data_segs = 3,
	  .exp_data_state = { { .exp_char = 0, .off_blk = 0, .len_blk = 1 },
			      { .exp_char = 'X', .off_blk = 1, .len_blk = EXT_DATA_LEN - 2 },
			      { .exp_char = 0, .off_blk = EXT_DATA_LEN - 1, .len_blk = 1 } } },

	/* writ to unwrit splits */
	{ .desc = "split writ extent to 2 extents and convert 1st half unwrit (zeroout)",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 0,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .split_map = { .m_lblk = EXT_DATA_LBLK, .m_len = 1 },
	  .nr_exp_ext = 1,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = EXT_DATA_LEN,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 1,
	  .nr_exp_data_segs = 2,
	  .exp_data_state = { { .exp_char = 0, .off_blk = 0, .len_blk = 1 },
			      { .exp_char = 'X',
				.off_blk = 1,
				.len_blk = EXT_DATA_LEN - 1 } } },
	{ .desc = "split writ extent to 2 extents and convert 2nd half unwrit (zeroout)",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 0,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 1 },
	  .nr_exp_ext = 1,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = EXT_DATA_LEN,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 1,
	  .nr_exp_data_segs = 2,
	  .exp_data_state = { { .exp_char = 'X', .off_blk = 0, .len_blk = 1 },
			      { .exp_char = 0,
				.off_blk = 1,
				.len_blk = EXT_DATA_LEN - 1 } } },
	{ .desc = "split writ extent to 3 extents and convert 2nd half unwrit (zeroout)",
	  .type = TEST_SPLIT_CONVERT,
	  .is_unwrit_at_start = 0,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 2 },
	  .nr_exp_ext = 1,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = EXT_DATA_LEN,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 1,
	  .nr_exp_data_segs = 3,
	  .exp_data_state = { { .exp_char = 'X', .off_blk = 0, .len_blk = 1 },
			      { .exp_char = 0,
				.off_blk = 1,
				.len_blk = EXT_DATA_LEN - 2 },
			      { .exp_char = 'X',
				.off_blk = EXT_DATA_LEN - 1,
				.len_blk = 1 } } },
};

/* Tests to trigger ext4_ext_map_blocks() -> convert_initialized_extent() */
static const struct kunit_ext_test_param test_convert_initialized_params[] = {
	/* writ to unwrit splits */
	{ .desc = "split writ extent to 2 extents and convert 1st half unwrit",
	  .type = TEST_CREATE_BLOCKS,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .is_unwrit_at_start = 0,
	  .split_map = { .m_lblk = EXT_DATA_LBLK, .m_len = 1 },
	  .nr_exp_ext = 2,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 1 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 1,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 0 },
	{ .desc = "split writ extent to 2 extents and convert 2nd half unwrit",
	  .type = TEST_CREATE_BLOCKS,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .is_unwrit_at_start = 0,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 1 },
	  .nr_exp_ext = 2,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 0 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 1,
			       .is_unwrit = 1 } },
	  .is_zeroout_test = 0 },
	{ .desc = "split writ extent to 3 extents and convert 2nd half to unwrit",
	  .type = TEST_CREATE_BLOCKS,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .is_unwrit_at_start = 0,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 2 },
	  .nr_exp_ext = 3,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 0 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 2,
			       .is_unwrit = 1 },
			     { .ex_lblk = EXT_DATA_LBLK + 1 + (EXT_DATA_LEN - 2),
			       .ex_len = 1,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 0 },

	/* writ to unwrit splits (zeroout) */
	{ .desc = "split writ extent to 2 extents and convert 1st half unwrit (zeroout)",
	  .type = TEST_CREATE_BLOCKS,
	  .is_unwrit_at_start = 0,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .split_map = { .m_lblk = EXT_DATA_LBLK, .m_len = 1 },
	  .nr_exp_ext = 1,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = EXT_DATA_LEN,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 1,
	  .nr_exp_data_segs = 2,
	  .exp_data_state = { { .exp_char = 0, .off_blk = 0, .len_blk = 1 },
			      { .exp_char = 'X',
				.off_blk = 1,
				.len_blk = EXT_DATA_LEN - 1 } } },
	{ .desc = "split writ extent to 2 extents and convert 2nd half unwrit (zeroout)",
	  .type = TEST_CREATE_BLOCKS,
	  .is_unwrit_at_start = 0,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 1 },
	  .nr_exp_ext = 1,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = EXT_DATA_LEN,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 1,
	  .nr_exp_data_segs = 2,
	  .exp_data_state = { { .exp_char = 'X', .off_blk = 0, .len_blk = 1 },
			      { .exp_char = 0,
				.off_blk = 1,
				.len_blk = EXT_DATA_LEN - 1 } } },
	{ .desc = "split writ extent to 3 extents and convert 2nd half unwrit (zeroout)",
	  .type = TEST_CREATE_BLOCKS,
	  .is_unwrit_at_start = 0,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT_UNWRITTEN,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 2 },
	  .nr_exp_ext = 1,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = EXT_DATA_LEN,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 1,
	  .nr_exp_data_segs = 3,
	  .exp_data_state = { { .exp_char = 'X', .off_blk = 0, .len_blk = 1 },
			      { .exp_char = 0,
				.off_blk = 1,
				.len_blk = EXT_DATA_LEN - 2 },
			      { .exp_char = 'X',
				.off_blk = EXT_DATA_LEN - 1,
				.len_blk = 1 } } },
};

/* Tests to trigger ext4_ext_map_blocks() -> ext4_ext_handle_unwritten_extents() */
static const struct kunit_ext_test_param test_handle_unwritten_params[] = {
	/* unwrit to writ splits via endio path */
	{ .desc = "split unwrit extent to 2 extents and convert 1st half writ (endio)",
	  .type = TEST_CREATE_BLOCKS,
	  .is_unwrit_at_start = 1,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT,
	  .split_map = { .m_lblk = EXT_DATA_LBLK, .m_len = 1 },
	  .nr_exp_ext = 2,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 0 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 1,
			       .is_unwrit = 1 } },
	  .is_zeroout_test = 0 },
	{ .desc = "split unwrit extent to 2 extents and convert 2nd half writ (endio)",
	  .type = TEST_CREATE_BLOCKS,
	  .is_unwrit_at_start = 1,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 1 },
	  .nr_exp_ext = 2,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 1 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 1,
			       .is_unwrit = 0 } },
	  .is_zeroout_test = 0 },
	{ .desc = "split unwrit extent to 3 extents and convert 2nd half to writ (endio)",
	  .type = TEST_CREATE_BLOCKS,
	  .is_unwrit_at_start = 1,
	  .split_flags = EXT4_GET_BLOCKS_CONVERT,
	  .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 2 },
	  .nr_exp_ext = 3,
	  .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK,
			       .ex_len = 1,
			       .is_unwrit = 1 },
			     { .ex_lblk = EXT_DATA_LBLK + 1,
			       .ex_len = EXT_DATA_LEN - 2,
			       .is_unwrit = 0 },
			     { .ex_lblk = EXT_DATA_LBLK + 1 + (EXT_DATA_LEN - 2),
			       .ex_len = 1,
			       .is_unwrit = 1 } },
	  .is_zeroout_test = 0 },

	/* unwrit to writ splits via non-endio path */
	{ .desc = "split unwrit extent to 2 extents and convert 1st half writ (non endio)",
	  .type = TEST_CREATE_BLOCKS,
	  .is_unwrit_at_start =
1, 818 + .split_flags = EXT4_GET_BLOCKS_CREATE, 819 + .split_map = { .m_lblk = EXT_DATA_LBLK, .m_len = 1 }, 820 + .nr_exp_ext = 2, 821 + .disable_zeroout = true, 822 + .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK, 823 + .ex_len = 1, 824 + .is_unwrit = 0 }, 825 + { .ex_lblk = EXT_DATA_LBLK + 1, 826 + .ex_len = EXT_DATA_LEN - 1, 827 + .is_unwrit = 1 } }, 828 + .is_zeroout_test = 0 }, 829 + { .desc = "split unwrit extent to 2 extents and convert 2nd half writ (non endio)", 830 + .type = TEST_CREATE_BLOCKS, 831 + .is_unwrit_at_start = 1, 832 + .split_flags = EXT4_GET_BLOCKS_CREATE, 833 + .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 1 }, 834 + .nr_exp_ext = 2, 835 + .disable_zeroout = true, 836 + .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK, 837 + .ex_len = 1, 838 + .is_unwrit = 1 }, 839 + { .ex_lblk = EXT_DATA_LBLK + 1, 840 + .ex_len = EXT_DATA_LEN - 1, 841 + .is_unwrit = 0 } }, 842 + .is_zeroout_test = 0 }, 843 + { .desc = "split unwrit extent to 3 extents and convert 2nd half to writ (non endio)", 844 + .type = TEST_CREATE_BLOCKS, 845 + .is_unwrit_at_start = 1, 846 + .split_flags = EXT4_GET_BLOCKS_CREATE, 847 + .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 2 }, 848 + .nr_exp_ext = 3, 849 + .disable_zeroout = true, 850 + .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK, 851 + .ex_len = 1, 852 + .is_unwrit = 1 }, 853 + { .ex_lblk = EXT_DATA_LBLK + 1, 854 + .ex_len = EXT_DATA_LEN - 2, 855 + .is_unwrit = 0 }, 856 + { .ex_lblk = EXT_DATA_LBLK + 1 + (EXT_DATA_LEN - 2), 857 + .ex_len = 1, 858 + .is_unwrit = 1 } }, 859 + .is_zeroout_test = 0 }, 860 + 861 + /* 862 + * ***** zeroout tests ***** 863 + */ 864 + /* unwrit to writ splits (endio)*/ 865 + { .desc = "split unwrit extent to 2 extents and convert 1st half writ (endio, zeroout)", 866 + .type = TEST_CREATE_BLOCKS, 867 + .is_unwrit_at_start = 1, 868 + .split_flags = EXT4_GET_BLOCKS_CONVERT, 869 + .split_map = { .m_lblk = EXT_DATA_LBLK, .m_len = 1 }, 870 + .nr_exp_ext = 1, 871 + 
.exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK, 872 + .ex_len = EXT_DATA_LEN, 873 + .is_unwrit = 0 } }, 874 + .is_zeroout_test = 1, 875 + .nr_exp_data_segs = 2, 876 + .exp_data_state = { { .exp_char = 'X', .off_blk = 0, .len_blk = 1 }, 877 + { .exp_char = 0, 878 + .off_blk = 1, 879 + .len_blk = EXT_DATA_LEN - 1 } } }, 880 + { .desc = "split unwrit extent to 2 extents and convert 2nd half writ (endio, zeroout)", 881 + .type = TEST_CREATE_BLOCKS, 882 + .is_unwrit_at_start = 1, 883 + .split_flags = EXT4_GET_BLOCKS_CONVERT, 884 + .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 1 }, 885 + .nr_exp_ext = 1, 886 + .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK, 887 + .ex_len = EXT_DATA_LEN, 888 + .is_unwrit = 0 } }, 889 + .is_zeroout_test = 1, 890 + .nr_exp_data_segs = 2, 891 + .exp_data_state = { { .exp_char = 0, .off_blk = 0, .len_blk = 1 }, 892 + { .exp_char = 'X', 893 + .off_blk = 1, 894 + .len_blk = EXT_DATA_LEN - 1 } } }, 895 + { .desc = "split unwrit extent to 3 extents and convert 2nd half writ (endio, zeroout)", 896 + .type = TEST_CREATE_BLOCKS, 897 + .is_unwrit_at_start = 1, 898 + .split_flags = EXT4_GET_BLOCKS_CONVERT, 899 + .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 2 }, 900 + .nr_exp_ext = 1, 901 + .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK, 902 + .ex_len = EXT_DATA_LEN, 903 + .is_unwrit = 0 } }, 904 + .is_zeroout_test = 1, 905 + .nr_exp_data_segs = 3, 906 + .exp_data_state = { { .exp_char = 0, .off_blk = 0, .len_blk = 1 }, 907 + { .exp_char = 'X', 908 + .off_blk = 1, 909 + .len_blk = EXT_DATA_LEN - 2 }, 910 + { .exp_char = 0, 911 + .off_blk = EXT_DATA_LEN - 1, 912 + .len_blk = 1 } } }, 913 + 914 + /* unwrit to writ splits (non-endio)*/ 915 + { .desc = "split unwrit extent to 2 extents and convert 1st half writ (non-endio, zeroout)", 916 + .type = TEST_CREATE_BLOCKS, 917 + .is_unwrit_at_start = 1, 918 + .split_flags = EXT4_GET_BLOCKS_CREATE, 919 + .split_map = { .m_lblk = EXT_DATA_LBLK, .m_len = 1 }, 920 + 
.nr_exp_ext = 1, 921 + .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK, 922 + .ex_len = EXT_DATA_LEN, 923 + .is_unwrit = 0 } }, 924 + .is_zeroout_test = 1, 925 + .nr_exp_data_segs = 2, 926 + .exp_data_state = { { .exp_char = 'X', .off_blk = 0, .len_blk = 1 }, 927 + { .exp_char = 0, 928 + .off_blk = 1, 929 + .len_blk = EXT_DATA_LEN - 1 } } }, 930 + { .desc = "split unwrit extent to 2 extents and convert 2nd half writ (non-endio, zeroout)", 931 + .type = TEST_CREATE_BLOCKS, 932 + .is_unwrit_at_start = 1, 933 + .split_flags = EXT4_GET_BLOCKS_CREATE, 934 + .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 1 }, 935 + .nr_exp_ext = 1, 936 + .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK, 937 + .ex_len = EXT_DATA_LEN, 938 + .is_unwrit = 0 } }, 939 + .is_zeroout_test = 1, 940 + .nr_exp_data_segs = 2, 941 + .exp_data_state = { { .exp_char = 0, .off_blk = 0, .len_blk = 1 }, 942 + { .exp_char = 'X', 943 + .off_blk = 1, 944 + .len_blk = EXT_DATA_LEN - 1 } } }, 945 + { .desc = "split unwrit extent to 3 extents and convert 2nd half writ (non-endio, zeroout)", 946 + .type = TEST_CREATE_BLOCKS, 947 + .is_unwrit_at_start = 1, 948 + .split_flags = EXT4_GET_BLOCKS_CREATE, 949 + .split_map = { .m_lblk = EXT_DATA_LBLK + 1, .m_len = EXT_DATA_LEN - 2 }, 950 + .nr_exp_ext = 1, 951 + .exp_ext_state = { { .ex_lblk = EXT_DATA_LBLK, 952 + .ex_len = EXT_DATA_LEN, 953 + .is_unwrit = 0 } }, 954 + .is_zeroout_test = 1, 955 + .nr_exp_data_segs = 3, 956 + .exp_data_state = { { .exp_char = 0, .off_blk = 0, .len_blk = 1 }, 957 + { .exp_char = 'X', 958 + .off_blk = 1, 959 + .len_blk = EXT_DATA_LEN - 2 }, 960 + { .exp_char = 0, 961 + .off_blk = EXT_DATA_LEN - 1, 962 + .len_blk = 1 } } }, 963 + }; 964 + 965 + static void ext_get_desc(struct kunit *test, const void *p, char *desc) 966 + 967 + { 968 + struct kunit_ext_test_param *param = (struct kunit_ext_test_param *)p; 969 + 970 + snprintf(desc, KUNIT_PARAM_DESC_SIZE, "%s %s\n", param->desc, 971 + (param->type & TEST_CREATE_BLOCKS) ? 
"(highlevel)" : ""); 972 + } 973 + 974 + static int test_split_convert_param_init(struct kunit *test) 975 + { 976 + size_t arr_size = ARRAY_SIZE(test_split_convert_params); 977 + 978 + kunit_register_params_array(test, test_split_convert_params, arr_size, 979 + ext_get_desc); 980 + return 0; 981 + } 982 + 983 + static int test_convert_initialized_param_init(struct kunit *test) 984 + { 985 + size_t arr_size = ARRAY_SIZE(test_convert_initialized_params); 986 + 987 + kunit_register_params_array(test, test_convert_initialized_params, 988 + arr_size, ext_get_desc); 989 + return 0; 990 + } 991 + 992 + static int test_handle_unwritten_init(struct kunit *test) 993 + { 994 + size_t arr_size = ARRAY_SIZE(test_handle_unwritten_params); 995 + 996 + kunit_register_params_array(test, test_handle_unwritten_params, 997 + arr_size, ext_get_desc); 998 + return 0; 999 + } 1000 + 1001 + /* 1002 + * Note that we use KUNIT_CASE_PARAM_WITH_INIT() instead of the more compact 1003 + * KUNIT_ARRAY_PARAM() because the later currently has a limitation causing the 1004 + * output parsing to be prone to error. 
For more context: 1005 + * 1006 + * https://lore.kernel.org/linux-kselftest/aULJpTvJDw9ctUDe@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com/ 1007 + */ 1008 + static struct kunit_case extents_test_cases[] = { 1009 + KUNIT_CASE_PARAM_WITH_INIT(test_split_convert, kunit_array_gen_params, 1010 + test_split_convert_param_init, NULL), 1011 + KUNIT_CASE_PARAM_WITH_INIT(test_split_convert, kunit_array_gen_params, 1012 + test_convert_initialized_param_init, NULL), 1013 + KUNIT_CASE_PARAM_WITH_INIT(test_split_convert, kunit_array_gen_params, 1014 + test_handle_unwritten_init, NULL), 1015 + {} 1016 + }; 1017 + 1018 + static struct kunit_suite extents_test_suite = { 1019 + .name = "ext4_extents_test", 1020 + .init = extents_kunit_init, 1021 + .exit = extents_kunit_exit, 1022 + .test_cases = extents_test_cases, 1023 + }; 1024 + 1025 + kunit_test_suites(&extents_test_suite); 1026 + 1027 + MODULE_LICENSE("GPL");
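Every non-zeroout expectation table above applies the same rule. The extent [EXT_DATA_LBLK, EXT_DATA_LBLK + EXT_DATA_LEN) is cut at the edges of split_map. The mapped piece takes the target state, and any piece outside the map keeps the original state. The userspace sketch below models only that rule and is not kernel code. `struct sketch_extent` and `sketch_split()` are hypothetical names. The values 10 and 3 used in the usage example are stand-ins, because this excerpt does not show how EXT_DATA_LBLK and EXT_DATA_LEN are defined.

```c
#include <assert.h>

/* Hypothetical simplified extent: logical start, length, unwritten flag. */
struct sketch_extent {
	unsigned int lblk;
	unsigned int len;
	int is_unwrit;
};

/*
 * Cut the extent [ee_lblk, ee_lblk + ee_len) at the edges of the map range
 * [m_lblk, m_lblk + m_len). The mapped piece gets state 'map_unwrit'; any
 * piece outside the map keeps 'ee_unwrit'. Returns the piece count (1..3).
 */
static int sketch_split(unsigned int ee_lblk, unsigned int ee_len,
			int ee_unwrit, unsigned int m_lblk,
			unsigned int m_len, int map_unwrit,
			struct sketch_extent out[3])
{
	unsigned int ee_end = ee_lblk + ee_len;
	unsigned int m_end = m_lblk + m_len;
	int n = 0;

	/* Assumption: the map range is non-empty and lies inside the extent. */
	assert(m_len > 0 && m_lblk >= ee_lblk && m_end <= ee_end);

	if (m_lblk > ee_lblk)	/* left piece keeps the original state */
		out[n++] = (struct sketch_extent){ ee_lblk, m_lblk - ee_lblk,
						   ee_unwrit };
	out[n++] = (struct sketch_extent){ m_lblk, m_len, map_unwrit };
	if (m_end < ee_end)	/* right piece keeps the original state */
		out[n++] = (struct sketch_extent){ m_end, ee_end - m_end,
						   ee_unwrit };
	return n;
}
```

For example, `sketch_split(10, 3, 0, 11, 1, 1, out)` mirrors "split writ extent to 3 extents and convert 2nd half to unwrit" and produces three pieces: {10, 1, written}, {11, 1, unwritten}, {12, 1, written}. The "(zeroout)" entries expect something different: a single written extent covering the whole range, with the data pattern recording which blocks were zeroed. This sketch does not model that case.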
+332 -276
fs/ext4/extents.c
···
 #include "ext4_jbd2.h"
 #include "ext4_extents.h"
 #include "xattr.h"
+#include <kunit/static_stub.h>
 
 #include <trace/events/ext4.h>
 
···
  */
 #define EXT4_EXT_MAY_ZEROOUT	0x1  /* safe to zeroout if split fails \
 					due to ENOSPC */
-#define EXT4_EXT_MARK_UNWRIT1	0x2  /* mark first half unwritten */
-#define EXT4_EXT_MARK_UNWRIT2	0x4  /* mark second half unwritten */
-
-#define EXT4_EXT_DATA_VALID1	0x8  /* first half contains valid data */
-#define EXT4_EXT_DATA_VALID2	0x10 /* second half contains valid data */
+static struct ext4_ext_path *ext4_split_convert_extents(
+	handle_t *handle, struct inode *inode, struct ext4_map_blocks *map,
+	struct ext4_ext_path *path, int flags, unsigned int *allocated);
 
 static __le32 ext4_extent_block_csum(struct inode *inode,
 				     struct ext4_extent_header *eh)
···
 static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
 						  struct inode *inode,
 						  struct ext4_ext_path *path,
-						  ext4_lblk_t split,
-						  int split_flag, int flags);
+						  ext4_lblk_t split, int flags);
 
 static int ext4_ext_trunc_restart_fn(struct inode *inode, int *dropped)
 {
···
 			    struct ext4_ext_path *path)
 {
 	int err;
+
+	KUNIT_STATIC_STUB_REDIRECT(__ext4_ext_dirty, where, line, handle, inode,
+				   path);
 
 	WARN_ON(!rwsem_is_locked(&EXT4_I(inode)->i_data_sem));
 	if (path->p_bh) {
···
 		  struct ext4_ext_path *path, ext4_lblk_t lblk,
 		  int nofail)
 {
-	int unwritten = ext4_ext_is_unwritten(path[path->p_depth].p_ext);
 	int flags = EXT4_EX_NOCACHE | EXT4_GET_BLOCKS_SPLIT_NOMERGE;
 
 	if (nofail)
 		flags |= EXT4_GET_BLOCKS_METADATA_NOFAIL | EXT4_EX_NOFAIL;
 
-	return ext4_split_extent_at(handle, inode, path, lblk, unwritten ?
-			EXT4_EXT_MARK_UNWRIT1|EXT4_EXT_MARK_UNWRIT2 : 0,
-			flags);
+	return ext4_split_extent_at(handle, inode, path, lblk, flags);
 }
 
 static int
···
 	struct ext4_extent *ex = EXT_FIRST_EXTENT(eh);
 	ext4_lblk_t prev = 0;
 	int i;
+
+	KUNIT_STATIC_STUB_REDIRECT(ext4_cache_extents, inode, eh);
 
 	for (i = le16_to_cpu(eh->eh_entries); i > 0; i--, ex++) {
 		unsigned int status = EXTENT_STATUS_WRITTEN;
···
 	short int depth, i, ppos = 0;
 	int ret;
 	gfp_t gfp_flags = GFP_NOFS;
+
+	KUNIT_STATIC_STUB_REDIRECT(ext4_find_extent, inode, block, path, flags);
 
 	if (flags & EXT4_EX_NOFAIL)
 		gfp_flags |= __GFP_NOFAIL;
···
 	ext4_lblk_t next;
 	int mb_flags = 0, unwritten;
 
+	KUNIT_STATIC_STUB_REDIRECT(ext4_ext_insert_extent, handle, inode, path,
+				   newext, gb_flags);
+
 	if (gb_flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
 		mb_flags |= EXT4_MB_DELALLOC_RESERVED;
 	if (unlikely(ext4_ext_get_actual_len(newext) == 0)) {
···
 	} else {
 		path = kcalloc(depth + 1, sizeof(struct ext4_ext_path),
 			       GFP_NOFS | __GFP_NOFAIL);
-		if (path == NULL) {
-			ext4_journal_stop(handle);
-			return -ENOMEM;
-		}
 		path[0].p_maxdepth = path[0].p_depth = depth;
 		path[0].p_hdr = ext_inode_hdr(inode);
 		i = 0;
···
 	ext4_fsblk_t ee_pblock;
 	unsigned int ee_len;
 
-	ee_block  = le32_to_cpu(ex->ee_block);
-	ee_len    = ext4_ext_get_actual_len(ex);
+	ee_block = le32_to_cpu(ex->ee_block);
+	ee_len = ext4_ext_get_actual_len(ex);
 	ee_pblock = ext4_ext_pblock(ex);
 
 	if (ee_len == 0)
···
 	ext4_fsblk_t ee_pblock;
 	unsigned int ee_len;
 
+	KUNIT_STATIC_STUB_REDIRECT(ext4_ext_zeroout, inode, ex);
+
 	ee_len = ext4_ext_get_actual_len(ex);
 	ee_pblock = ext4_ext_pblock(ex);
 	return ext4_issue_zeroout(inode, le32_to_cpu(ex->ee_block), ee_pblock,
···
  * @inode: the file inode
  * @path: the path to the extent
  * @split: the logical block where the extent is splitted.
- * @split_flags: indicates if the extent could be zeroout if split fails, and
- *		 the states(init or unwritten) of new extents.
  * @flags: flags used to insert new extent to extent tree.
  *
  *
  * Splits extent [a, b] into two extents [a, @split) and [@split, b], states
- * of which are determined by split_flag.
+ * of which are same as the original extent. No conversion is performed.
  *
- * There are two cases:
- *  a> the extent are splitted into two extent.
- *  b> split is not needed, and just mark the extent.
- *
- * Return an extent path pointer on success, or an error pointer on failure.
+ * Return an extent path pointer on success, or an error pointer on failure. On
+ * failure, the extent is restored to original state.
  */
 static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
 						  struct inode *inode,
 						  struct ext4_ext_path *path,
 						  ext4_lblk_t split,
-						  int split_flag, int flags)
+						  int flags)
 {
 	ext4_fsblk_t newblock;
 	ext4_lblk_t ee_block;
-	struct ext4_extent *ex, newex, orig_ex, zero_ex;
+	struct ext4_extent *ex, newex, orig_ex;
 	struct ext4_extent *ex2 = NULL;
 	unsigned int ee_len, depth;
-	int err = 0;
+	int err = 0, insert_err = 0, is_unwrit = 0;
 
-	BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2)) ==
-	       (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2));
+	/* Do not cache extents that are in the process of being modified. */
+	flags |= EXT4_EX_NOCACHE;
 
 	ext_debug(inode, "logical block %llu\n", (unsigned long long)split);
 
···
 	ee_block = le32_to_cpu(ex->ee_block);
 	ee_len = ext4_ext_get_actual_len(ex);
 	newblock = split - ee_block + ext4_ext_pblock(ex);
+	is_unwrit = ext4_ext_is_unwritten(ex);
 
 	BUG_ON(split < ee_block || split >= (ee_block + ee_len));
-	BUG_ON(!ext4_ext_is_unwritten(ex) &&
-	       split_flag & (EXT4_EXT_MAY_ZEROOUT |
-			     EXT4_EXT_MARK_UNWRIT1 |
-			     EXT4_EXT_MARK_UNWRIT2));
+
+	/*
+	 * No split needed
+	 */
+	if (split == ee_block)
+		goto out;
 
 	err = ext4_ext_get_access(handle, inode, path + depth);
 	if (err)
 		goto out;
 
-	if (split == ee_block) {
-		/*
-		 * case b: block @split is the block that the extent begins with
-		 * then we just change the state of the extent, and splitting
-		 * is not needed.
-		 */
-		if (split_flag & EXT4_EXT_MARK_UNWRIT2)
-			ext4_ext_mark_unwritten(ex);
-		else
-			ext4_ext_mark_initialized(ex);
-
-		if (!(flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE))
-			ext4_ext_try_to_merge(handle, inode, path, ex);
-
-		err = ext4_ext_dirty(handle, inode, path + path->p_depth);
-		goto out;
-	}
-
 	/* case a */
 	memcpy(&orig_ex, ex, sizeof(orig_ex));
 	ex->ee_len = cpu_to_le16(split - ee_block);
-	if (split_flag & EXT4_EXT_MARK_UNWRIT1)
+	if (is_unwrit)
 		ext4_ext_mark_unwritten(ex);
 
 	/*
···
 	ex2->ee_block = cpu_to_le32(split);
 	ex2->ee_len = cpu_to_le16(ee_len - (split - ee_block));
 	ext4_ext_store_pblock(ex2, newblock);
-	if (split_flag & EXT4_EXT_MARK_UNWRIT2)
+	if (is_unwrit)
 		ext4_ext_mark_unwritten(ex2);
 
 	path = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
 	if (!IS_ERR(path))
-		goto out;
-
-	err = PTR_ERR(path);
-	if (err != -ENOSPC && err != -EDQUOT && err != -ENOMEM)
 		return path;
+
+	insert_err = PTR_ERR(path);
+	err = 0;
 
 	/*
 	 * Get a new path to try to zeroout or fix the extent length.
···
 	if (IS_ERR(path)) {
 		EXT4_ERROR_INODE(inode, "Failed split extent on %u, err %ld",
 				 split, PTR_ERR(path));
-		return path;
+		goto out_path;
 	}
+
+	err = ext4_ext_get_access(handle, inode, path + depth);
+	if (err)
+		goto out;
+
 	depth = ext_depth(inode);
 	ex = path[depth].p_ext;
 
-	if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
-		if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
-			if (split_flag & EXT4_EXT_DATA_VALID1) {
-				err = ext4_ext_zeroout(inode, ex2);
-				zero_ex.ee_block = ex2->ee_block;
-				zero_ex.ee_len = cpu_to_le16(
-						ext4_ext_get_actual_len(ex2));
-				ext4_ext_store_pblock(&zero_ex,
-						      ext4_ext_pblock(ex2));
-			} else {
-				err = ext4_ext_zeroout(inode, ex);
-				zero_ex.ee_block = ex->ee_block;
-				zero_ex.ee_len = cpu_to_le16(
-						ext4_ext_get_actual_len(ex));
-				ext4_ext_store_pblock(&zero_ex,
-						      ext4_ext_pblock(ex));
-			}
-		} else {
-			err = ext4_ext_zeroout(inode, &orig_ex);
-			zero_ex.ee_block = orig_ex.ee_block;
-			zero_ex.ee_len = cpu_to_le16(
-					ext4_ext_get_actual_len(&orig_ex));
-			ext4_ext_store_pblock(&zero_ex,
-					      ext4_ext_pblock(&orig_ex));
-		}
-
-		if (!err) {
-			/* update the extent length and mark as initialized */
-			ex->ee_len = cpu_to_le16(ee_len);
-			ext4_ext_try_to_merge(handle, inode, path, ex);
-			err = ext4_ext_dirty(handle, inode, path + path->p_depth);
-			if (!err)
-				/* update extent status tree */
-				ext4_zeroout_es(inode, &zero_ex);
-			/* If we failed at this point, we don't know in which
-			 * state the extent tree exactly is so don't try to fix
-			 * length of the original extent as it may do even more
-			 * damage.
-			 */
-			goto out;
-		}
-	}
-
 fix_extent_len:
 	ex->ee_len = orig_ex.ee_len;
-	/*
-	 * Ignore ext4_ext_dirty return value since we are already in error path
-	 * and err is a non-zero error code.
-	 */
-	ext4_ext_dirty(handle, inode, path + path->p_depth);
+	err = ext4_ext_dirty(handle, inode, path + path->p_depth);
 out:
-	if (err) {
+	if (err || insert_err) {
 		ext4_free_ext_path(path);
-		path = ERR_PTR(err);
+		path = err ? ERR_PTR(err) : ERR_PTR(insert_err);
 	}
+out_path:
+	if (IS_ERR(path))
+		/* Remove all remaining potentially stale extents. */
+		ext4_es_remove_extent(inode, ee_block, ee_len);
 	ext4_ext_show_leaf(inode, path);
 	return path;
+}
+
+static int ext4_split_extent_zeroout(handle_t *handle, struct inode *inode,
+				     struct ext4_ext_path *path,
+				     struct ext4_map_blocks *map, int flags)
+{
+	struct ext4_extent *ex;
+	unsigned int ee_len, depth;
+	ext4_lblk_t ee_block;
+	uint64_t lblk, pblk, len;
+	int is_unwrit;
+	int err = 0;
+
+	depth = ext_depth(inode);
+	ex = path[depth].p_ext;
+	ee_block = le32_to_cpu(ex->ee_block);
+	ee_len = ext4_ext_get_actual_len(ex);
+	is_unwrit = ext4_ext_is_unwritten(ex);
+
+	if (flags & EXT4_GET_BLOCKS_CONVERT) {
+		/*
+		 * EXT4_GET_BLOCKS_CONVERT: Caller wants the range specified by
+		 * map to be initialized. Zeroout everything except the map
+		 * range.
+		 */
+
+		loff_t map_end = (loff_t) map->m_lblk + map->m_len;
+		loff_t ex_end = (loff_t) ee_block + ee_len;
+
+		if (!is_unwrit)
+			/* Shouldn't happen. Just exit */
+			return -EINVAL;
+
+		/* zeroout left */
+		if (map->m_lblk > ee_block) {
+			lblk = ee_block;
+			len = map->m_lblk - ee_block;
+			pblk = ext4_ext_pblock(ex);
+			err = ext4_issue_zeroout(inode, lblk, pblk, len);
+			if (err)
+				/* ZEROOUT failed, just return original error */
+				return err;
+		}
+
+		/* zeroout right */
+		if (map_end < ex_end) {
+			lblk = map_end;
+			len = ex_end - map_end;
+			pblk = ext4_ext_pblock(ex) + (map_end - ee_block);
+			err = ext4_issue_zeroout(inode, lblk, pblk, len);
+			if (err)
+				/* ZEROOUT failed, just return original error */
+				return err;
+		}
+	} else if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN) {
+		/*
+		 * EXT4_GET_BLOCKS_CONVERT_UNWRITTEN: Caller wants the
+		 * range specified by map to be marked unwritten.
+		 * Zeroout the map range leaving rest as it is.
+		 */
+
+		if (is_unwrit)
+			/* Shouldn't happen. Just exit */
+			return -EINVAL;
+
+		lblk = map->m_lblk;
+		len = map->m_len;
+		pblk = ext4_ext_pblock(ex) + (map->m_lblk - ee_block);
+		err = ext4_issue_zeroout(inode, lblk, pblk, len);
+		if (err)
+			/* ZEROOUT failed, just return original error */
+			return err;
+	} else {
+		/*
+		 * We no longer perform unwritten to unwritten splits in IO paths.
+		 * Hence this should not happen.
+		 */
+		WARN_ON_ONCE(true);
+		return -EINVAL;
+	}
+
+	err = ext4_ext_get_access(handle, inode, path + depth);
+	if (err)
+		return err;
+
+	ext4_ext_mark_initialized(ex);
+
+	ext4_ext_dirty(handle, inode, path + depth);
+	if (err)
+		return err;
+
+	return 0;
 }
 
 /*
···
 			     struct ext4_ext_path *path,
 			     struct ext4_map_blocks *map,
 			     int split_flag, int flags,
-			     unsigned int *allocated)
+			     unsigned int *allocated, bool *did_zeroout)
 {
-	ext4_lblk_t ee_block;
+	ext4_lblk_t ee_block, orig_ee_block;
 	struct ext4_extent *ex;
-	unsigned int ee_len, depth;
-	int unwritten;
-	int split_flag1, flags1;
+	unsigned int ee_len, orig_ee_len, depth;
+	int unwritten, orig_unwritten;
+	int orig_err = 0;
 
 	depth = ext_depth(inode);
 	ex = path[depth].p_ext;
···
 	ee_len = ext4_ext_get_actual_len(ex);
 	unwritten = ext4_ext_is_unwritten(ex);
 
+	orig_ee_block = ee_block;
+	orig_ee_len = ee_len;
+	orig_unwritten = unwritten;
+
+	/* Do not cache extents that are in the process of being modified. */
+	flags |= EXT4_EX_NOCACHE;
+
 	if (map->m_lblk + map->m_len < ee_block + ee_len) {
-		split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
-		flags1 = flags | EXT4_GET_BLOCKS_SPLIT_NOMERGE;
-		if (unwritten)
-			split_flag1 |= EXT4_EXT_MARK_UNWRIT1 |
-				       EXT4_EXT_MARK_UNWRIT2;
-		if (split_flag & EXT4_EXT_DATA_VALID2)
-			split_flag1 |= EXT4_EXT_DATA_VALID1;
 		path = ext4_split_extent_at(handle, inode, path,
-				map->m_lblk + map->m_len, split_flag1, flags1);
+				map->m_lblk + map->m_len, flags);
 		if (IS_ERR(path))
-			return path;
+			goto try_zeroout;
+
 		/*
 		 * Update path is required because previous ext4_split_extent_at
 		 * may result in split of original leaf or extent zeroout.
 		 */
 		path = ext4_find_extent(inode, map->m_lblk, path, flags);
 		if (IS_ERR(path))
-			return path;
+			goto try_zeroout;
+
 		depth = ext_depth(inode);
 		ex = path[depth].p_ext;
 		if (!ex) {
···
 			ext4_free_ext_path(path);
 			return ERR_PTR(-EFSCORRUPTED);
 		}
-		unwritten = ext4_ext_is_unwritten(ex);
+
+		/* extent would have changed so update original values */
+		orig_ee_block = le32_to_cpu(ex->ee_block);
+		orig_ee_len = ext4_ext_get_actual_len(ex);
+		orig_unwritten = ext4_ext_is_unwritten(ex);
 	}
 
 	if (map->m_lblk >= ee_block) {
-		split_flag1 = split_flag & EXT4_EXT_DATA_VALID2;
-		if (unwritten) {
-			split_flag1 |= EXT4_EXT_MARK_UNWRIT1;
-			split_flag1 |= split_flag & (EXT4_EXT_MAY_ZEROOUT |
-						     EXT4_EXT_MARK_UNWRIT2);
-		}
-		path = ext4_split_extent_at(handle, inode, path,
-				map->m_lblk, split_flag1, flags);
+		path = ext4_split_extent_at(handle, inode, path, map->m_lblk,
+					    flags);
 		if (IS_ERR(path))
-			return path;
+			goto try_zeroout;
 	}
 
+	goto success;
+
+try_zeroout:
+	/*
+	 * There was an error in splitting the extent. So instead, just zeroout
+	 * unwritten portions and convert it to initialized as a last resort. If
+	 * there is any failure here we just return the original error
+	 */
+
+	orig_err = PTR_ERR(path);
+	if (orig_err != -ENOSPC && orig_err != -EDQUOT && orig_err != -ENOMEM)
+		goto out_orig_err;
+
+	/* we can't zeroout? just return the original err */
+	if (!(split_flag & EXT4_EXT_MAY_ZEROOUT))
+		goto out_orig_err;
+
+	if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN) {
+		int max_zeroout_blks =
+			EXT4_SB(inode->i_sb)->s_extent_max_zeroout_kb >>
+			(inode->i_sb->s_blocksize_bits - 10);
+
+		if (map->m_len > max_zeroout_blks)
+			goto out_orig_err;
+	}
+
+	path = ext4_find_extent(inode, map->m_lblk, NULL, flags);
+	if (IS_ERR(path))
+		goto out_orig_err;
+
+	depth = ext_depth(inode);
+	ex = path[depth].p_ext;
+	ee_block = le32_to_cpu(ex->ee_block);
+	ee_len = ext4_ext_get_actual_len(ex);
+	unwritten = ext4_ext_is_unwritten(ex);
+
+	/* extent to zeroout should have been unchanged but its not */
+	if (WARN_ON(ee_block != orig_ee_block || ee_len != orig_ee_len ||
+		    unwritten != orig_unwritten))
+		goto out_free_path;
+
+	if (ext4_split_extent_zeroout(handle, inode, path, map, flags))
+		goto out_free_path;
+
+	/* zeroout succeeded */
+	if (did_zeroout)
+		*did_zeroout = true;
+
+success:
 	if (allocated) {
 		if (map->m_lblk + map->m_len > ee_block + ee_len)
 			*allocated = ee_len - (map->m_lblk - ee_block);
···
 	}
 	ext4_ext_show_leaf(inode, path);
 	return path;
+
+out_free_path:
+	ext4_free_ext_path(path);
+out_orig_err:
+	return ERR_PTR(orig_err);
+
 }
 
 /*
···
 	ext4_lblk_t ee_block, eof_block;
 	unsigned int ee_len, depth, map_len = map->m_len;
 	int err = 0;
-	int split_flag = EXT4_EXT_DATA_VALID2;
 	unsigned int max_zeroout = 0;
 
 	ext_debug(inode, "logical block %llu, max_blocks %u\n",
···
 	 * It is safe to convert extent to initialized via explicit
 	 * zeroout only if extent is fully inside i_size or new_size.
 	 */
-	split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
-
-	if (EXT4_EXT_MAY_ZEROOUT & split_flag)
+	if (ee_block + ee_len <= eof_block)
 		max_zeroout = sbi->s_extent_max_zeroout_kb >>
 			(inode->i_sb->s_blocksize_bits - 10);
 
···
 	}
 
 fallback:
-	path = ext4_split_extent(handle, inode, path, &split_map, split_flag,
-				 flags, NULL);
+	path = ext4_split_convert_extents(handle, inode, &split_map, path,
+					  flags | EXT4_GET_BLOCKS_CONVERT, NULL);
 	if (IS_ERR(path))
 		return path;
 out:
···
 	ext4_lblk_t ee_block;
 	struct ext4_extent *ex;
 	unsigned int ee_len;
-	int split_flag = 0, depth;
+	int split_flag = 0, depth, err = 0;
+	bool did_zeroout = false;
 
 	ext_debug(inode, "logical block %llu, max_blocks %u\n",
 		  (unsigned long long)map->m_lblk, map->m_len);
···
 	ee_block = le32_to_cpu(ex->ee_block);
 	ee_len = ext4_ext_get_actual_len(ex);
 
-	/* Convert to unwritten */
-	if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN) {
-		split_flag |= EXT4_EXT_DATA_VALID1;
-	/* Convert to initialized */
-	} else if (flags & EXT4_GET_BLOCKS_CONVERT) {
-		/*
-		 * It is safe to convert extent to initialized via explicit
-		 * zeroout only if extent is fully inside i_size or new_size.
-		 */
-		split_flag |= ee_block + ee_len <= eof_block ?
-			      EXT4_EXT_MAY_ZEROOUT : 0;
-		split_flag |= (EXT4_EXT_MARK_UNWRIT2 | EXT4_EXT_DATA_VALID2);
+	/* No split needed */
+	if (ee_block == map->m_lblk && ee_len == map->m_len)
+		goto convert;
+
+	/*
+	 * It is only safe to convert extent to initialized via explicit
+	 * zeroout only if extent is fully inside i_size or new_size.
+	 */
+	split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
+
+	/*
+	 * pass SPLIT_NOMERGE explicitly so we don't end up merging extents we
+	 * just split.
+	 */
+	path = ext4_split_extent(handle, inode, path, map, split_flag,
+				 flags | EXT4_GET_BLOCKS_SPLIT_NOMERGE,
+				 allocated, &did_zeroout);
+	if (IS_ERR(path))
+		return path;
+
+convert:
+	path = ext4_find_extent(inode, map->m_lblk, path, flags);
+	if (IS_ERR(path))
+		return path;
+
+	depth = ext_depth(inode);
+	ex = path[depth].p_ext;
+
+	/*
+	 * Conversion is already handled in case of zeroout
+	 */
+	if (!did_zeroout) {
+		err = ext4_ext_get_access(handle, inode, path + depth);
+		if (err)
+			goto err;
+
+		if (flags & EXT4_GET_BLOCKS_CONVERT)
+			ext4_ext_mark_initialized(ex);
+		else if (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN)
+			ext4_ext_mark_unwritten(ex);
+
+		if (!(flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE))
+			/*
+			 * note: ext4_ext_correct_indexes() isn't needed here because
+			 * borders are not changed
+			 */
+			ext4_ext_try_to_merge(handle, inode, path, ex);
+
+		err = ext4_ext_dirty(handle, inode, path + depth);
+		if (err)
+			goto err;
 	}
-	flags |= EXT4_GET_BLOCKS_SPLIT_NOMERGE;
-	return ext4_split_extent(handle, inode, path, map, split_flag, flags,
-				 allocated);
+
+	/* Lets update the extent status tree after conversion */
+	if (!(flags & EXT4_EX_NOCACHE))
+		ext4_es_insert_extent(inode, le32_to_cpu(ex->ee_block),
+				      ext4_ext_get_actual_len(ex),
+				      ext4_ext_pblock(ex),
+				      ext4_ext_is_unwritten(ex) ?
+					      EXTENT_STATUS_UNWRITTEN :
+					      EXTENT_STATUS_WRITTEN,
+				      false);
+
+err:
+	if (err) {
+		ext4_free_ext_path(path);
+		return ERR_PTR(err);
+	}
+
+	return path;
 }
 
 static struct ext4_ext_path *
 ext4_convert_unwritten_extents_endio(handle_t *handle, struct inode *inode,
 				     struct ext4_map_blocks *map,
-				     struct ext4_ext_path *path)
+				     struct ext4_ext_path *path, int flags)
 {
 	struct ext4_extent *ex;
 	ext4_lblk_t ee_block;
 	unsigned int ee_len;
 	int depth;
-	int err = 0;
 
 	depth = ext_depth(inode);
 	ex = path[depth].p_ext;
···
 	ext_debug(inode, "logical block %llu, max_blocks %u\n",
 		  (unsigned long long)ee_block, ee_len);
 
-	/* If extent is larger than requested it is a clear sign that we still
-	 * have some extent state machine issues left. So extent_split is still
-	 * required.
-	 * TODO: Once all related issues will be fixed this situation should be
-	 * illegal.
-	 */
-	if (ee_block != map->m_lblk || ee_len > map->m_len) {
-#ifdef CONFIG_EXT4_DEBUG
-		ext4_warning(inode->i_sb, "Inode (%ld) finished: extent logical block %llu,"
-			     " len %u; IO logical block %llu, len %u",
-			     inode->i_ino, (unsigned long long)ee_block, ee_len,
-			     (unsigned long long)map->m_lblk, map->m_len);
-#endif
-		path = ext4_split_convert_extents(handle, inode, map, path,
-						  EXT4_GET_BLOCKS_CONVERT, NULL);
-		if (IS_ERR(path))
-			return path;
-
-		path = ext4_find_extent(inode, map->m_lblk, path, 0);
-		if (IS_ERR(path))
-			return path;
-		depth = ext_depth(inode);
-		ex = path[depth].p_ext;
-	}
-
-	err = ext4_ext_get_access(handle, inode, path + depth);
-	if (err)
-		goto errout;
-	/* first mark the extent as initialized */
-	ext4_ext_mark_initialized(ex);
-
-	/* note: ext4_ext_correct_indexes() isn't needed here because
-	 * borders are not changed
-	 */
-	ext4_ext_try_to_merge(handle, inode, path, ex);
-
-	/* Mark modified extent as dirty */
-	err = ext4_ext_dirty(handle, inode, path + path->p_depth);
-	if (err)
-		goto errout;
-
-	ext4_ext_show_leaf(inode, path);
-	return path;
-
-errout:
-	ext4_free_ext_path(path);
-	return ERR_PTR(err);
+	return ext4_split_convert_extents(handle, inode, map, path, flags,
+					  NULL);
 }
 
 static struct ext4_ext_path *
 convert_initialized_extent(handle_t *handle, struct inode *inode,
 			   struct ext4_map_blocks *map,
 			   struct ext4_ext_path *path,
+			   int flags,
 			   unsigned int *allocated)
 {
 	struct ext4_extent *ex;
 	ext4_lblk_t ee_block;
 	unsigned int ee_len;
 	int depth;
-	int err = 0;
 
 	/*
 	 * Make sure that the extent is no bigger than we support with
···
ext_debug(inode, "logical block %llu, max_blocks %u\n", 3937 3840 (unsigned long long)ee_block, ee_len); 3938 3841 3939 - if (ee_block != map->m_lblk || ee_len > map->m_len) { 3940 - path = ext4_split_convert_extents(handle, inode, map, path, 3941 - EXT4_GET_BLOCKS_CONVERT_UNWRITTEN, NULL); 3942 - if (IS_ERR(path)) 3943 - return path; 3842 + path = ext4_split_convert_extents(handle, inode, map, path, flags, 3843 + NULL); 3844 + if (IS_ERR(path)) 3845 + return path; 3944 3846 3945 - path = ext4_find_extent(inode, map->m_lblk, path, 0); 3946 - if (IS_ERR(path)) 3947 - return path; 3948 - depth = ext_depth(inode); 3949 - ex = path[depth].p_ext; 3950 - if (!ex) { 3951 - EXT4_ERROR_INODE(inode, "unexpected hole at %lu", 3952 - (unsigned long) map->m_lblk); 3953 - err = -EFSCORRUPTED; 3954 - goto errout; 3955 - } 3956 - } 3957 - 3958 - err = ext4_ext_get_access(handle, inode, path + depth); 3959 - if (err) 3960 - goto errout; 3961 - /* first mark the extent as unwritten */ 3962 - ext4_ext_mark_unwritten(ex); 3963 - 3964 - /* note: ext4_ext_correct_indexes() isn't needed here because 3965 - * borders are not changed 3966 - */ 3967 - ext4_ext_try_to_merge(handle, inode, path, ex); 3968 - 3969 - /* Mark modified extent as dirty */ 3970 - err = ext4_ext_dirty(handle, inode, path + path->p_depth); 3971 - if (err) 3972 - goto errout; 3973 3847 ext4_ext_show_leaf(inode, path); 3974 3848 3975 3849 ext4_update_inode_fsync_trans(handle, inode, 1); 3976 3850 3977 - map->m_flags |= EXT4_MAP_UNWRITTEN; 3851 + /* 3852 + * The extent might be initialized in case of zeroout. 
3853 + */ 3854 + path = ext4_find_extent(inode, map->m_lblk, path, flags); 3855 + if (IS_ERR(path)) 3856 + return path; 3857 + 3858 + depth = ext_depth(inode); 3859 + ex = path[depth].p_ext; 3860 + 3861 + if (ext4_ext_is_unwritten(ex)) 3862 + map->m_flags |= EXT4_MAP_UNWRITTEN; 3863 + else 3864 + map->m_flags |= EXT4_MAP_MAPPED; 3978 3865 if (*allocated > map->m_len) 3979 3866 *allocated = map->m_len; 3980 3867 map->m_len = *allocated; 3981 3868 return path; 3982 - 3983 - errout: 3984 - ext4_free_ext_path(path); 3985 - return ERR_PTR(err); 3986 3869 } 3987 3870 3988 3871 static struct ext4_ext_path * ··· 3987 3910 trace_ext4_ext_handle_unwritten_extents(inode, map, flags, 3988 3911 *allocated, newblock); 3989 3912 3990 - /* get_block() before submitting IO, split the extent */ 3991 - if (flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE) { 3992 - path = ext4_split_convert_extents(handle, inode, map, path, 3993 - flags | EXT4_GET_BLOCKS_CONVERT, allocated); 3994 - if (IS_ERR(path)) 3995 - return path; 3996 - /* 3997 - * shouldn't get a 0 allocated when splitting an extent unless 3998 - * m_len is 0 (bug) or extent has been corrupted 3999 - */ 4000 - if (unlikely(*allocated == 0)) { 4001 - EXT4_ERROR_INODE(inode, 4002 - "unexpected allocated == 0, m_len = %u", 4003 - map->m_len); 4004 - err = -EFSCORRUPTED; 4005 - goto errout; 4006 - } 4007 - map->m_flags |= EXT4_MAP_UNWRITTEN; 4008 - goto out; 4009 - } 4010 3913 /* IO end_io complete, convert the filled extent to written */ 4011 3914 if (flags & EXT4_GET_BLOCKS_CONVERT) { 4012 3915 path = ext4_convert_unwritten_extents_endio(handle, inode, 4013 - map, path); 3916 + map, path, flags); 4014 3917 if (IS_ERR(path)) 4015 3918 return path; 4016 3919 ext4_update_inode_fsync_trans(handle, inode, 1); ··· 4040 3983 goto errout; 4041 3984 } 4042 3985 4043 - out: 4044 3986 map->m_flags |= EXT4_MAP_NEW; 4045 3987 map_out: 4046 3988 map->m_flags |= EXT4_MAP_MAPPED; ··· 4216 4160 insert_hole: 4217 4161 /* Put just found gap into cache to 
speed up subsequent requests */ 4218 4162 ext_debug(inode, " -> %u:%u\n", hole_start, len); 4219 - ext4_es_insert_extent(inode, hole_start, len, ~0, 4220 - EXTENT_STATUS_HOLE, false); 4163 + ext4_es_cache_extent(inode, hole_start, len, ~0, EXTENT_STATUS_HOLE); 4221 4164 4222 4165 /* Update hole_len to reflect hole size after lblk */ 4223 4166 if (hole_start != lblk) ··· 4312 4257 if ((!ext4_ext_is_unwritten(ex)) && 4313 4258 (flags & EXT4_GET_BLOCKS_CONVERT_UNWRITTEN)) { 4314 4259 path = convert_initialized_extent(handle, 4315 - inode, map, path, &allocated); 4260 + inode, map, path, flags, &allocated); 4316 4261 if (IS_ERR(path)) 4317 4262 err = PTR_ERR(path); 4318 4263 goto out; ··· 5430 5375 if (!extent) { 5431 5376 EXT4_ERROR_INODE(inode, "unexpected hole at %lu", 5432 5377 (unsigned long) *iterator); 5433 - return -EFSCORRUPTED; 5378 + ret = -EFSCORRUPTED; 5379 + goto out; 5434 5380 } 5435 5381 if (SHIFT == SHIFT_LEFT && *iterator > 5436 5382 le32_to_cpu(extent->ee_block)) { ··· 5597 5541 struct ext4_extent *extent; 5598 5542 ext4_lblk_t start_lblk, len_lblk, ee_start_lblk = 0; 5599 5543 unsigned int credits, ee_len; 5600 - int ret, depth, split_flag = 0; 5544 + int ret, depth; 5601 5545 loff_t start; 5602 5546 5603 5547 trace_ext4_insert_range(inode, offset, len); ··· 5668 5612 */ 5669 5613 if ((start_lblk > ee_start_lblk) && 5670 5614 (start_lblk < (ee_start_lblk + ee_len))) { 5671 - if (ext4_ext_is_unwritten(extent)) 5672 - split_flag = EXT4_EXT_MARK_UNWRIT1 | 5673 - EXT4_EXT_MARK_UNWRIT2; 5674 5615 path = ext4_split_extent_at(handle, inode, path, 5675 - start_lblk, split_flag, 5676 - EXT4_EX_NOCACHE | 5616 + start_lblk, EXT4_EX_NOCACHE | 5677 5617 EXT4_GET_BLOCKS_SPLIT_NOMERGE | 5678 5618 EXT4_GET_BLOCKS_METADATA_NOFAIL); 5679 5619 } ··· 6239 6187 ext4_free_ext_path(path); 6240 6188 return 0; 6241 6189 } 6190 + 6191 + #ifdef CONFIG_EXT4_KUNIT_TESTS 6192 + #include "extents-test.c" 6193 + #endif
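After this change every unwritten-extent conversion goes through ext4_split_convert_extents(). If the extent already matches the mapped range, it is converted in place. Otherwise it is split first, and only the piece covering the range has its unwritten bit flipped. The userspace sketch below models just that split-then-convert step on a single extent. `struct toy_extent` and `toy_split_convert()` are hypothetical names. The sketch ignores journaling, the zeroout shortcut, merging and the on-disk tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat model of an extent: logical start, length, unwritten flag. */
struct toy_extent {
	unsigned int lblk;
	unsigned int len;
	bool unwritten;
};

/*
 * Split ext around [m_lblk, m_lblk + m_len) and give the covered piece the
 * state to_unwritten. Pieces are written to out[] in logical order and their
 * count is returned; 0 means the range is empty or not inside ext.
 */
static size_t toy_split_convert(struct toy_extent ext, unsigned int m_lblk,
				unsigned int m_len, bool to_unwritten,
				struct toy_extent out[3])
{
	size_t n = 0;
	unsigned int ext_end = ext.lblk + ext.len;	/* exclusive */
	unsigned int map_end = m_lblk + m_len;		/* exclusive */

	if (m_len == 0 || m_lblk < ext.lblk || map_end > ext_end)
		return 0;

	/* No split needed: convert the whole extent in place. */
	if (ext.lblk == m_lblk && ext.len == m_len) {
		ext.unwritten = to_unwritten;
		out[n++] = ext;
		return n;
	}

	/* Head piece keeps its original state. */
	if (m_lblk > ext.lblk)
		out[n++] = (struct toy_extent){ ext.lblk, m_lblk - ext.lblk,
						ext.unwritten };

	/* Middle piece is the converted range. */
	out[n++] = (struct toy_extent){ m_lblk, m_len, to_unwritten };

	/* Tail piece keeps its original state. */
	if (map_end < ext_end)
		out[n++] = (struct toy_extent){ map_end, ext_end - map_end,
						ext.unwritten };
	return n;
}
```

Take a 10-block unwritten extent starting at block 100 and convert blocks 103-106. The result is three pieces: 100-102 and 107-109 stay unwritten, and 103-106 become written. A request that matches the extent exactly yields a single converted piece, mirroring the `goto convert` shortcut.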
+96 -29
fs/ext4/extents_status.c
@@ -16,6 +16,7 @@
 #include "ext4.h"

 #include <trace/events/ext4.h>
+#include <kunit/static_stub.h>

 /*
  * According to previous discussion in Ext4 Developer Workshop, we
@@ -179,7 +178,8 @@
 static int __es_insert_extent(struct inode *inode, struct extent_status *newes,
 			      struct extent_status *prealloc);
 static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
-			      ext4_lblk_t end, int *reserved,
+			      ext4_lblk_t end, unsigned int status,
+			      int *reserved, struct extent_status *res,
 			      struct extent_status *prealloc);
 static int es_reclaim_extents(struct ext4_inode_info *ei, int *nr_to_scan);
 static int __es_shrink(struct ext4_sb_info *sbi, int nr_to_scan,
@@ -242,6 +240,21 @@
 	struct ext4_inode_info *ei = EXT4_I(inode);

 	WRITE_ONCE(ei->i_es_seq, ei->i_es_seq + 1);
+}
+
+static inline int __es_check_extent_status(struct extent_status *es,
+					   unsigned int status,
+					   struct extent_status *res)
+{
+	if (ext4_es_type(es) & status)
+		return 0;
+
+	if (res) {
+		res->es_lblk = es->es_lblk;
+		res->es_len = es->es_len;
+		res->es_pblk = es->es_pblk;
+	}
+	return -EINVAL;
 }

 /*
@@ -899,7 +882,8 @@

 /*
  * ext4_es_insert_extent() adds information to an inode's extent
- * status tree.
+ * status tree. This interface is used for modifying extents. To cache
+ * on-disk extents, use ext4_es_cache_extent() instead.
  */
 void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 			   ext4_lblk_t len, ext4_fsblk_t pblk,
@@ -947,7 +929,7 @@
 	pr = __alloc_pending(true);
 	write_lock(&EXT4_I(inode)->i_es_lock);

-	err1 = __es_remove_extent(inode, lblk, end, &resv_used, es1);
+	err1 = __es_remove_extent(inode, lblk, end, 0, &resv_used, NULL, es1);
 	if (err1 != 0)
 		goto error;
 	/* Free preallocated extent if it didn't get used. */
@@ -979,10 +961,6 @@
 		}
 		pending = err3;
 	}
-	/*
-	 * TODO: For cache on-disk extents, there is no need to increment
-	 * the sequence counter, this requires future optimization.
-	 */
 	ext4_es_inc_seq(inode);
 error:
 	write_unlock(&EXT4_I(inode)->i_es_lock);
@@ -1012,17 +998,24 @@
 }

 /*
- * ext4_es_cache_extent() inserts information into the extent status
- * tree if and only if there isn't information about the range in
- * question already.
+ * ext4_es_cache_extent() inserts information into the extent status tree
+ * only if there is no existing information about the specified range or
+ * if the existing extents have the same status.
+ *
+ * Note that this interface is only used for caching on-disk extent
+ * information and cannot be used to convert existing extents in the extent
+ * status tree. To convert existing extents, use ext4_es_insert_extent()
+ * instead.
  */
 void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
 			  ext4_lblk_t len, ext4_fsblk_t pblk,
 			  unsigned int status)
 {
 	struct extent_status *es;
-	struct extent_status newes;
+	struct extent_status chkes, newes;
 	ext4_lblk_t end = lblk + len - 1;
+	bool conflict = false;
+	int err;

 	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
 		return;
@@ -1037,7 +1016,6 @@
 	newes.es_lblk = lblk;
 	newes.es_len = len;
 	ext4_es_store_pblock_status(&newes, pblk, status);
-	trace_ext4_es_cache_extent(inode, &newes);

 	if (!len)
 		return;
@@ -1044,11 +1024,42 @@
 	BUG_ON(end < lblk);

 	write_lock(&EXT4_I(inode)->i_es_lock);
-
 	es = __es_tree_search(&EXT4_I(inode)->i_es_tree.root, lblk);
-	if (!es || es->es_lblk > end)
-		__es_insert_extent(inode, &newes, NULL);
+	if (es && es->es_lblk <= end) {
+		/* Found an extent that covers the entire range. */
+		if (es->es_lblk <= lblk && es->es_lblk + es->es_len > end) {
+			if (__es_check_extent_status(es, status, &chkes))
+				conflict = true;
+			goto unlock;
+		}
+		/* Check and remove all extents in range. */
+		err = __es_remove_extent(inode, lblk, end, status, NULL,
+					 &chkes, NULL);
+		if (err) {
+			if (err == -EINVAL)
+				conflict = true;
+			goto unlock;
+		}
+	}
+	__es_insert_extent(inode, &newes, NULL);
+	trace_ext4_es_cache_extent(inode, &newes);
+	ext4_es_print_tree(inode);
+unlock:
 	write_unlock(&EXT4_I(inode)->i_es_lock);
+	if (!conflict)
+		return;
+	/*
+	 * A hole in the on-disk extent but a delayed extent in the extent
+	 * status tree, is allowed.
+	 */
+	if (status == EXTENT_STATUS_HOLE &&
+	    ext4_es_type(&chkes) == EXTENT_STATUS_DELAYED)
+		return;
+
+	ext4_warning_inode(inode,
+		"ES cache extent failed: add [%d,%d,%llu,0x%x] conflict with existing [%d,%d,%llu,0x%x]\n",
+		lblk, len, pblk, status, chkes.es_lblk, chkes.es_len,
+		ext4_es_pblock(&chkes), ext4_es_status(&chkes));
 }

 /*
@@ -1460,23 +1409,27 @@
 	return rc->ndelayed;
 }

-
 /*
  * __es_remove_extent - removes block range from extent status tree
  *
  * @inode - file containing range
  * @lblk - first block in range
  * @end - last block in range
+ * @status - the extent status to be checked
  * @reserved - number of cluster reservations released
+ * @res - return the extent if the status is not match
  * @prealloc - pre-allocated es to avoid memory allocation failures
  *
  * If @reserved is not NULL and delayed allocation is enabled, counts
  * block/cluster reservations freed by removing range and if bigalloc
- * enabled cancels pending reservations as needed. Returns 0 on success,
- * error code on failure.
+ * enabled cancels pending reservations as needed. If @status is not
+ * zero, check extent status type while removing extent, return -EINVAL
+ * and pass out the extent through @res if not match. Returns 0 on
+ * success, error code on failure.
  */
 static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
-			      ext4_lblk_t end, int *reserved,
+			      ext4_lblk_t end, unsigned int status,
+			      int *reserved, struct extent_status *res,
 			      struct extent_status *prealloc)
 {
 	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
@@ -1489,18 +1434,24 @@
 	struct extent_status orig_es;
 	ext4_lblk_t len1, len2;
 	ext4_fsblk_t block;
-	int err = 0;
+	int err;
 	bool count_reserved = true;
 	struct rsvd_count rc;

 	if (reserved == NULL || !test_opt(inode->i_sb, DELALLOC))
 		count_reserved = false;
+	if (status == 0)
+		status = ES_TYPE_MASK;

 	es = __es_tree_search(&tree->root, lblk);
 	if (!es)
-		goto out;
+		return 0;
 	if (es->es_lblk > end)
-		goto out;
+		return 0;
+
+	err = __es_check_extent_status(es, status, res);
+	if (err)
+		return err;

 	/* Simply invalidate cache_es. */
 	tree->cache_es = NULL;
@@ -1541,7 +1480,7 @@

 				es->es_lblk = orig_es.es_lblk;
 				es->es_len = orig_es.es_len;
-				goto out;
+				return err;
 			}
 		} else {
 			es->es_lblk = end + 1;
@@ -1555,7 +1494,7 @@
 		if (count_reserved)
 			count_rsvd(inode, orig_es.es_lblk + len1,
 				   orig_es.es_len - len1 - len2, &orig_es, &rc);
-		goto out_get_reserved;
+		goto out;
 	}

 	if (len1 > 0) {
@@ -1570,6 +1509,9 @@
 	}

 	while (es && ext4_es_end(es) <= end) {
+		err = __es_check_extent_status(es, status, res);
+		if (err)
+			return err;
 		if (count_reserved)
 			count_rsvd(inode, es->es_lblk, es->es_len, es, &rc);
 		node = rb_next(&es->rb_node);
@@ -1588,6 +1524,10 @@
 	if (es && es->es_lblk < end + 1) {
 		ext4_lblk_t orig_len = es->es_len;

+		err = __es_check_extent_status(es, status, res);
+		if (err)
+			return err;
+
 		len1 = ext4_es_end(es) - end;
 		if (count_reserved)
 			count_rsvd(inode, es->es_lblk, orig_len - len1,
@@ -1604,11 +1536,10 @@
 		}
 	}

-out_get_reserved:
+out:
 	if (count_reserved)
 		*reserved = get_rsvd(inode, end, es, &rc);
-out:
-	return err;
+	return 0;
 }

 /*
@@ -1649,7 +1582,7 @@
 	 * is reclaimed.
 	 */
 	write_lock(&EXT4_I(inode)->i_es_lock);
-	err = __es_remove_extent(inode, lblk, end, &reserved, es);
+	err = __es_remove_extent(inode, lblk, end, 0, &reserved, NULL, es);
 	if (err)
 		goto error;
 	/* Free preallocated extent if it didn't get used. */
@@ -2241,7 +2174,7 @@
 	}
 	write_lock(&EXT4_I(inode)->i_es_lock);

-	err1 = __es_remove_extent(inode, lblk, end, NULL, es1);
+	err1 = __es_remove_extent(inode, lblk, end, 0, NULL, NULL, es1);
 	if (err1 != 0)
 		goto error;
 	/* Free preallocated extent if it didn't get used. */
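The rewritten ext4_es_cache_extent() no longer silently skips a range that is already cached:

- A single covering entry with a matching status is left alone.
- Overlapping entries with a matching status are replaced.
- Any status mismatch is reported, except an on-disk hole that the cache records as delayed.

The sketch below reproduces only that decision over a sorted, non-overlapping array. The `TOY_*` status bits, `struct toy_es` and `toy_cache_check()` are hypothetical. It does not model locking, tree updates, or the entries __es_remove_extent() may already have dropped before it reaches a mismatch.

```c
#include <stddef.h>

/* Hypothetical status bits, modelled on the extent status types. */
enum { TOY_WRITTEN = 1, TOY_UNWRITTEN = 2, TOY_DELAYED = 4, TOY_HOLE = 8 };

/* One cached entry covering [lblk, lblk + len). */
struct toy_es {
	unsigned int lblk, len, status;
};

enum toy_cache_result {
	TOY_CACHE_INSERT,		/* range cached (same-status entries replaced) */
	TOY_CACHE_KEEP,			/* a same-status entry already covers it */
	TOY_CACHE_CONFLICT_ALLOWED,	/* on-disk hole vs cached delayed: tolerated */
	TOY_CACHE_CONFLICT_WARN,	/* mismatch that would be warned about */
};

/*
 * Decide what caching [lblk, lblk + len) with `status` would do, given
 * existing entries sorted by lblk. Follows the order of checks in the new
 * ext4_es_cache_extent(); the first mismatching entry plays the role of chkes.
 */
static enum toy_cache_result toy_cache_check(const struct toy_es *es, size_t n,
					     unsigned int lblk, unsigned int len,
					     unsigned int status)
{
	unsigned int end = lblk + len - 1;	/* inclusive, as in the kernel */
	const struct toy_es *first = NULL, *bad = NULL;

	for (size_t i = 0; i < n; i++) {
		unsigned int es_end = es[i].lblk + es[i].len - 1;

		if (es_end < lblk || es[i].lblk > end)
			continue;		/* no overlap with the new range */
		if (!first)
			first = &es[i];
		if (!bad && !(es[i].status & status))
			bad = &es[i];
	}

	if (!first)
		return TOY_CACHE_INSERT;

	if (!bad) {
		/* One entry covers everything: nothing to do. */
		if (first->lblk <= lblk && first->lblk + first->len > end)
			return TOY_CACHE_KEEP;
		return TOY_CACHE_INSERT;
	}

	if (status == TOY_HOLE && bad->status == TOY_DELAYED)
		return TOY_CACHE_CONFLICT_ALLOWED;
	return TOY_CACHE_CONFLICT_WARN;
}
```

The tolerated case reflects delayed allocation: blocks that are reserved in memory but not yet allocated legitimately look like a hole on disk.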
+31 -23
fs/ext4/fast_commit.c
@@ -231,16 +231,16 @@
 void ext4_fc_del(struct inode *inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct ext4_fc_dentry_update *fc_dentry;
 	wait_queue_head_t *wq;
+	int alloc_ctx;

 	if (ext4_fc_disabled(inode->i_sb))
 		return;

-	mutex_lock(&sbi->s_fc_lock);
+	alloc_ctx = ext4_fc_lock(inode->i_sb);
 	if (list_empty(&ei->i_fc_list) && list_empty(&ei->i_fc_dilist)) {
-		mutex_unlock(&sbi->s_fc_lock);
+		ext4_fc_unlock(inode->i_sb, alloc_ctx);
 		return;
 	}

@@ -275,9 +275,9 @@
 #endif
 		prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
 		if (ext4_test_inode_state(inode, EXT4_STATE_FC_FLUSHING_DATA)) {
-			mutex_unlock(&sbi->s_fc_lock);
+			ext4_fc_unlock(inode->i_sb, alloc_ctx);
 			schedule();
-			mutex_lock(&sbi->s_fc_lock);
+			alloc_ctx = ext4_fc_lock(inode->i_sb);
 		}
 		finish_wait(wq, &wait.wq_entry);
 	}
@@ -288,7 +288,7 @@
 	 * dentry create references, since it is not needed to log it anyways.
 	 */
 	if (list_empty(&ei->i_fc_dilist)) {
-		mutex_unlock(&sbi->s_fc_lock);
+		ext4_fc_unlock(inode->i_sb, alloc_ctx);
 		return;
 	}

@@ -298,7 +298,7 @@
 	list_del_init(&fc_dentry->fcd_dilist);

 	WARN_ON(!list_empty(&ei->i_fc_dilist));
-	mutex_unlock(&sbi->s_fc_lock);
+	ext4_fc_unlock(inode->i_sb, alloc_ctx);

 	release_dentry_name_snapshot(&fc_dentry->fcd_name);
 	kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
@@ -315,6 +315,7 @@
 	tid_t tid;
 	bool has_transaction = true;
 	bool is_ineligible;
+	int alloc_ctx;

 	if (ext4_fc_disabled(sb))
 		return;
@@ -330,12 +329,12 @@
 			has_transaction = false;
 		read_unlock(&sbi->s_journal->j_state_lock);
 	}
-	mutex_lock(&sbi->s_fc_lock);
+	alloc_ctx = ext4_fc_lock(sb);
 	is_ineligible = ext4_test_mount_flag(sb, EXT4_MF_FC_INELIGIBLE);
 	if (has_transaction && (!is_ineligible || tid_gt(tid, sbi->s_fc_ineligible_tid)))
 		sbi->s_fc_ineligible_tid = tid;
 	ext4_set_mount_flag(sb, EXT4_MF_FC_INELIGIBLE);
-	mutex_unlock(&sbi->s_fc_lock);
+	ext4_fc_unlock(sb, alloc_ctx);
 	WARN_ON(reason >= EXT4_FC_REASON_MAX);
 	sbi->s_fc_stats.fc_ineligible_reason_count[reason]++;
 }
@@ -359,6 +358,7 @@
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	tid_t tid = 0;
+	int alloc_ctx;
 	int ret;

 	tid = handle->h_transaction->t_tid;
@@ -375,14 +373,14 @@
 	if (!enqueue)
 		return ret;

-	mutex_lock(&sbi->s_fc_lock);
+	alloc_ctx = ext4_fc_lock(inode->i_sb);
 	if (list_empty(&EXT4_I(inode)->i_fc_list))
 		list_add_tail(&EXT4_I(inode)->i_fc_list,
 				(sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
 				 sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
 				&sbi->s_fc_q[FC_Q_STAGING] :
 				&sbi->s_fc_q[FC_Q_MAIN]);
-	mutex_unlock(&sbi->s_fc_lock);
+	ext4_fc_unlock(inode->i_sb, alloc_ctx);

 	return ret;
 }
@@ -404,6 +402,7 @@
 	struct inode *dir = dentry->d_parent->d_inode;
 	struct super_block *sb = inode->i_sb;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	int alloc_ctx;

 	spin_unlock(&ei->i_fc_lock);

@@ -428,7 +425,7 @@
 	take_dentry_name_snapshot(&node->fcd_name, dentry);
 	INIT_LIST_HEAD(&node->fcd_dilist);
 	INIT_LIST_HEAD(&node->fcd_list);
-	mutex_lock(&sbi->s_fc_lock);
+	alloc_ctx = ext4_fc_lock(sb);
 	if (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
 		sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING)
 		list_add_tail(&node->fcd_list,
@@ -449,7 +446,7 @@
 		WARN_ON(!list_empty(&ei->i_fc_dilist));
 		list_add_tail(&node->fcd_dilist, &ei->i_fc_dilist);
 	}
-	mutex_unlock(&sbi->s_fc_lock);
+	ext4_fc_unlock(sb, alloc_ctx);
 	spin_lock(&ei->i_fc_lock);

 	return 0;
@@ -1049,18 +1046,19 @@
 	struct blk_plug plug;
 	int ret = 0;
 	u32 crc = 0;
+	int alloc_ctx;

 	/*
 	 * Step 1: Mark all inodes on s_fc_q[MAIN] with
 	 * EXT4_STATE_FC_FLUSHING_DATA. This prevents these inodes from being
 	 * freed until the data flush is over.
 	 */
-	mutex_lock(&sbi->s_fc_lock);
+	alloc_ctx = ext4_fc_lock(sb);
 	list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
 		ext4_set_inode_state(&iter->vfs_inode,
 				     EXT4_STATE_FC_FLUSHING_DATA);
 	}
-	mutex_unlock(&sbi->s_fc_lock);
+	ext4_fc_unlock(sb, alloc_ctx);

 	/* Step 2: Flush data for all the eligible inodes. */
 	ret = ext4_fc_flush_data(journal);
@@ -1071,7 +1067,7 @@
 	 * any error from step 2. This ensures that waiters waiting on
 	 * EXT4_STATE_FC_FLUSHING_DATA can resume.
 	 */
-	mutex_lock(&sbi->s_fc_lock);
+	alloc_ctx = ext4_fc_lock(sb);
 	list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
 		ext4_clear_inode_state(&iter->vfs_inode,
 				       EXT4_STATE_FC_FLUSHING_DATA);
@@ -1088,7 +1084,7 @@
 	 * prepare_to_wait() in ext4_fc_del().
 	 */
 	smp_mb();
-	mutex_unlock(&sbi->s_fc_lock);
+	ext4_fc_unlock(sb, alloc_ctx);

 	/*
 	 * If we encountered error in Step 2, return it now after clearing
@@ -1105,12 +1101,12 @@
 	 * previous handles are now drained. We now mark the inodes on the
 	 * commit queue as being committed.
 	 */
-	mutex_lock(&sbi->s_fc_lock);
+	alloc_ctx = ext4_fc_lock(sb);
 	list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
 		ext4_set_inode_state(&iter->vfs_inode,
 				     EXT4_STATE_FC_COMMITTING);
 	}
-	mutex_unlock(&sbi->s_fc_lock);
+	ext4_fc_unlock(sb, alloc_ctx);
 	jbd2_journal_unlock_updates(journal);

 	/*
@@ -1121,6 +1117,7 @@
 		blkdev_issue_flush(journal->j_fs_dev);

 	blk_start_plug(&plug);
+	alloc_ctx = ext4_fc_lock(sb);
 	/* Step 6: Write fast commit blocks to disk. */
 	if (sbi->s_fc_bytes == 0) {
 		/*
@@ -1139,7 +1134,6 @@
 	}

 	/* Step 6.2: Now write all the dentry updates. */
-	mutex_lock(&sbi->s_fc_lock);
 	ret = ext4_fc_commit_dentry_updates(journal, &crc);
 	if (ret)
 		goto out;
@@ -1160,7 +1156,7 @@
 	ret = ext4_fc_write_tail(sb, crc);

 out:
-	mutex_unlock(&sbi->s_fc_lock);
+	ext4_fc_unlock(sb, alloc_ctx);
 	blk_finish_plug(&plug);
 	return ret;
 }
@@ -1294,6 +1290,7 @@
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_inode_info *ei;
 	struct ext4_fc_dentry_update *fc_dentry;
+	int alloc_ctx;

 	if (full && sbi->s_fc_bh)
 		sbi->s_fc_bh = NULL;
@@ -1302,7 +1297,7 @@
 	trace_ext4_fc_cleanup(journal, full, tid);
 	jbd2_fc_release_bufs(journal);

-	mutex_lock(&sbi->s_fc_lock);
+	alloc_ctx = ext4_fc_lock(sb);
 	while (!list_empty(&sbi->s_fc_q[FC_Q_MAIN])) {
 		ei = list_first_entry(&sbi->s_fc_q[FC_Q_MAIN],
 					struct ext4_inode_info,
@@ -1361,7 +1356,7 @@

 	if (full)
 		sbi->s_fc_bytes = 0;
-	mutex_unlock(&sbi->s_fc_lock);
+	ext4_fc_unlock(sb, alloc_ctx);
 	trace_ext4_fc_stats(sb);
 }

@@ -2307,6 +2302,9 @@
 	[EXT4_FC_REASON_FALLOC_RANGE] = "Falloc range op",
 	[EXT4_FC_REASON_INODE_JOURNAL_DATA] = "Data journalling",
 	[EXT4_FC_REASON_ENCRYPTED_FILENAME] = "Encrypted filename",
+	[EXT4_FC_REASON_MIGRATE] = "Inode format migration",
+	[EXT4_FC_REASON_VERITY] = "fs-verity enable",
+	[EXT4_FC_REASON_MOVE_EXT] = "Move extents",
 };

 int ext4_fc_info_show(struct seq_file *seq, void *v)
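In fast_commit.c, every `mutex_lock(&sbi->s_fc_lock)` becomes `alloc_ctx = ext4_fc_lock(sb)`, and every unlock hands that value back. The helper definitions are not part of this excerpt. The pull request mentions recursion into filesystem reclaim during inode eviction, so a plausible reading is this: the lock helper also enters a scoped no-filesystem-reclaim allocation context and returns the saved state, in the style of the kernel's memalloc_nofs_save()/memalloc_nofs_restore(). The userspace sketch below illustrates that save/restore pattern with a pthread mutex and a thread-local flag. Every `toy_*` name is hypothetical, and it is not the ext4 implementation.

```c
#include <pthread.h>

/* Hypothetical per-thread flag standing in for a "no fs reclaim" task flag. */
#define TOY_NOFS 0x1u
static _Thread_local unsigned int toy_task_flags;

/* Enter a no-fs-reclaim scope and return the previous state of the flag. */
static unsigned int toy_nofs_save(void)
{
	unsigned int old = toy_task_flags & TOY_NOFS;

	toy_task_flags |= TOY_NOFS;
	return old;
}

/* Leave the scope by restoring the saved state, so nested scopes work. */
static void toy_nofs_restore(unsigned int old)
{
	toy_task_flags = (toy_task_flags & ~TOY_NOFS) | old;
}

static pthread_mutex_t toy_fc_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Take the lock, then enter the scope; the caller keeps the returned context. */
static unsigned int toy_fc_lock(void)
{
	pthread_mutex_lock(&toy_fc_mutex);
	return toy_nofs_save();
}

/* Restore the allocation context, then drop the lock. */
static void toy_fc_unlock(unsigned int ctx)
{
	toy_nofs_restore(ctx);
	pthread_mutex_unlock(&toy_fc_mutex);
}

/* Would an allocation made now be allowed to recurse into the filesystem? */
static int toy_fs_reclaim_allowed(void)
{
	return !(toy_task_flags & TOY_NOFS);
}
```

The saved context is restored on unlock. Code that drops the lock to sleep, as ext4_fc_del() does around schedule(), therefore has to take a fresh context when it relocks, and the diff reassigns `alloc_ctx` at that point.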
+3
fs/ext4/fast_commit.h
@@ -97,6 +97,9 @@
 	EXT4_FC_REASON_FALLOC_RANGE,
 	EXT4_FC_REASON_INODE_JOURNAL_DATA,
 	EXT4_FC_REASON_ENCRYPTED_FILENAME,
+	EXT4_FC_REASON_MIGRATE,
+	EXT4_FC_REASON_VERITY,
+	EXT4_FC_REASON_MOVE_EXT,
 	EXT4_FC_REASON_MAX
 };

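The three new ineligibility reasons have to be added in two places: the enum in fast_commit.h and the string table in fast_commit.c. Designated initializers keep each string bound to its enum value. The sketch below adds a compile-time size check and a bounds-checked lookup; it uses hypothetical `toy_*` names and reproduces only the tail of the enum. The size check only catches a table that stops short of the last reason. A gap in the middle shows up as a NULL slot, which the lookup maps to "Unknown".

```c
#include <stddef.h>

/* Hypothetical mirror of the tail of the ineligibility-reason enum. */
enum toy_fc_reason {
	TOY_FC_REASON_ENCRYPTED_FILENAME,
	TOY_FC_REASON_MIGRATE,
	TOY_FC_REASON_VERITY,
	TOY_FC_REASON_MOVE_EXT,
	TOY_FC_REASON_MAX
};

/* Designated initializers tie each string to its enum value. */
static const char *const toy_fc_reason_str[] = {
	[TOY_FC_REASON_ENCRYPTED_FILENAME] = "Encrypted filename",
	[TOY_FC_REASON_MIGRATE]            = "Inode format migration",
	[TOY_FC_REASON_VERITY]             = "fs-verity enable",
	[TOY_FC_REASON_MOVE_EXT]           = "Move extents",
};

/* The table length is highest index + 1, so this catches a missing last entry. */
_Static_assert(sizeof(toy_fc_reason_str) / sizeof(toy_fc_reason_str[0]) ==
	       TOY_FC_REASON_MAX,
	       "reason table must extend to the last enum value");

/* Bounds-checked lookup; out-of-range values and NULL gaps read as "Unknown". */
static const char *toy_fc_reason_name(unsigned int r)
{
	if (r >= TOY_FC_REASON_MAX || !toy_fc_reason_str[r])
		return "Unknown";
	return toy_fc_reason_str[r];
}
```

The range check plays the same role as the `WARN_ON(reason >= EXT4_FC_REASON_MAX)` guard in the fast_commit.c hunk above.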
+9 -15
fs/ext4/file.c
@@ -419,22 +419,20 @@
  * updating inode i_disksize and/or orphan handling with exclusive lock.
  *
  * - shared locking will only be true mostly with overwrites, including
- *   initialized blocks and unwritten blocks. For overwrite unwritten blocks
- *   we protect splitting extents by i_data_sem in ext4_inode_info, so we can
- *   also release exclusive i_rwsem lock.
+ *   initialized blocks and unwritten blocks.
  *
  * - Otherwise we will switch to exclusive i_rwsem lock.
  */
 static ssize_t ext4_dio_write_checks(struct kiocb *iocb, struct iov_iter *from,
 				     bool *ilock_shared, bool *extend,
-				     bool *unwritten, int *dio_flags)
+				     int *dio_flags)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
 	loff_t offset;
 	size_t count;
 	ssize_t ret;
-	bool overwrite, unaligned_io;
+	bool overwrite, unaligned_io, unwritten;

 restart:
 	ret = ext4_generic_write_checks(iocb, from);
@@ -444,7 +446,7 @@

 	unaligned_io = ext4_unaligned_io(inode, from, offset);
 	*extend = ext4_extending_io(inode, offset, count);
-	overwrite = ext4_overwrite_io(inode, offset, count, unwritten);
+	overwrite = ext4_overwrite_io(inode, offset, count, &unwritten);

 	/*
 	 * Determine whether we need to upgrade to an exclusive lock. This is
@@ -459,7 +461,7 @@
 	 */
 	if (*ilock_shared &&
 	    ((!IS_NOSEC(inode) || *extend || !overwrite ||
-	     (unaligned_io && *unwritten)))) {
+	     (unaligned_io && unwritten)))) {
 		if (iocb->ki_flags & IOCB_NOWAIT) {
 			ret = -EAGAIN;
 			goto out;
@@ -482,7 +484,7 @@
 			ret = -EAGAIN;
 			goto out;
 		}
-		if (unaligned_io && (!overwrite || *unwritten))
+		if (unaligned_io && (!overwrite || unwritten))
 			inode_dio_wait(inode);
 		*dio_flags = IOMAP_DIO_FORCE_WAIT;
 	}
@@ -507,8 +509,7 @@
 	struct inode *inode = file_inode(iocb->ki_filp);
 	loff_t offset = iocb->ki_pos;
 	size_t count = iov_iter_count(from);
-	const struct iomap_ops *iomap_ops = &ext4_iomap_ops;
-	bool extend = false, unwritten = false;
+	bool extend = false;
 	bool ilock_shared = true;
 	int dio_flags = 0;

@@ -553,7 +556,7 @@
 	ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);

 	ret = ext4_dio_write_checks(iocb, from, &ilock_shared, &extend,
-				    &unwritten, &dio_flags);
+				    &dio_flags);
 	if (ret <= 0)
 		return ret;

@@ -573,9 +576,7 @@
 		goto out;
 	}

-	if (ilock_shared && !unwritten)
-		iomap_ops = &ext4_iomap_overwrite_ops;
-	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
+	ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, &ext4_dio_write_ops,
 			   dio_flags, NULL, 0);
 	if (ret == -ENOTBLK)
 		ret = 0;
@@ -854,7 +859,6 @@
 	 * when trying to sort through large numbers of block
 	 * devices or filesystem images.
 	 */
-	memset(buf, 0, sizeof(buf));
 	path.mnt = mnt;
 	path.dentry = mnt->mnt_root;
 	cp = d_path(&path, buf, sizeof(buf));
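With `unwritten` now a local in ext4_dio_write_checks(), the decision to upgrade from the shared to the exclusive i_rwsem is a pure function of five booleans. The predicate itself is unchanged by this diff; only where `unwritten` lives changed. The sketch below restates it; `struct toy_dio_write` and `toy_needs_exclusive()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical bundle of the facts ext4_dio_write_checks() derives per write. */
struct toy_dio_write {
	bool nosec;		/* IS_NOSEC(inode): no suid/security bits to strip */
	bool extend;		/* the write goes past i_size */
	bool overwrite;		/* every target block is already allocated */
	bool unwritten;		/* some of those blocks are unwritten */
	bool unaligned_io;	/* not aligned to the filesystem block size */
};

/* True when a write holding the shared lock must upgrade to exclusive. */
static bool toy_needs_exclusive(const struct toy_dio_write *w)
{
	return !w->nosec || w->extend || !w->overwrite ||
	       (w->unaligned_io && w->unwritten);
}
```

An aligned overwrite of unwritten blocks stays on the shared lock. Per the pull request, splitting such unwritten extents is now deferred to I/O completion. That fits the diff dropping the separate ext4_iomap_overwrite_ops path and always passing `&ext4_iomap_ops`.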
+37 -57
fs/ext4/inode.c
··· 48 48 #include "acl.h" 49 49 #include "truncate.h" 50 50 51 + #include <kunit/static_stub.h> 52 + 51 53 #include <trace/events/ext4.h> 52 54 53 55 static void ext4_journalled_zero_new_buffers(handle_t *handle, ··· 402 400 { 403 401 int ret; 404 402 403 + KUNIT_STATIC_STUB_REDIRECT(ext4_issue_zeroout, inode, lblk, pblk, len); 404 + 405 405 if (IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode)) 406 406 return fscrypt_zeroout_range(inode, lblk, pblk, len); 407 407 ··· 507 503 retval = ext4_ext_map_blocks(handle, inode, &map2, 0); 508 504 509 505 if (retval <= 0) { 510 - ext4_es_insert_extent(inode, map->m_lblk, map->m_len, 511 - map->m_pblk, status, false); 506 + ext4_es_cache_extent(inode, map->m_lblk, map->m_len, 507 + map->m_pblk, status); 512 508 return map->m_len; 513 509 } 514 510 ··· 529 525 */ 530 526 if (map->m_pblk + map->m_len == map2.m_pblk && 531 527 status == status2) { 532 - ext4_es_insert_extent(inode, map->m_lblk, 533 - map->m_len + map2.m_len, map->m_pblk, 534 - status, false); 528 + ext4_es_cache_extent(inode, map->m_lblk, 529 + map->m_len + map2.m_len, map->m_pblk, 530 + status); 535 531 map->m_len += map2.m_len; 536 532 } else { 537 - ext4_es_insert_extent(inode, map->m_lblk, map->m_len, 538 - map->m_pblk, status, false); 533 + ext4_es_cache_extent(inode, map->m_lblk, map->m_len, 534 + map->m_pblk, status); 539 535 } 540 536 541 537 return map->m_len; 542 538 } 543 539 544 - static int ext4_map_query_blocks(handle_t *handle, struct inode *inode, 545 - struct ext4_map_blocks *map, int flags) 540 + int ext4_map_query_blocks(handle_t *handle, struct inode *inode, 541 + struct ext4_map_blocks *map, int flags) 546 542 { 547 543 unsigned int status; 548 544 int retval; ··· 577 573 map->m_len == orig_mlen) { 578 574 status = map->m_flags & EXT4_MAP_UNWRITTEN ? 
579 575 EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN; 580 - ext4_es_insert_extent(inode, map->m_lblk, map->m_len, 581 - map->m_pblk, status, false); 576 + ext4_es_cache_extent(inode, map->m_lblk, map->m_len, 577 + map->m_pblk, status); 582 578 } else { 583 579 retval = ext4_map_query_blocks_next_in_leaf(handle, inode, map, 584 580 orig_mlen); ··· 588 584 return retval; 589 585 } 590 586 591 - static int ext4_map_create_blocks(handle_t *handle, struct inode *inode, 592 - struct ext4_map_blocks *map, int flags) 587 + int ext4_map_create_blocks(handle_t *handle, struct inode *inode, 588 + struct ext4_map_blocks *map, int flags) 593 589 { 594 - struct extent_status es; 595 590 unsigned int status; 596 591 int err, retval = 0; 597 592 ··· 649 646 map->m_len); 650 647 if (err) 651 648 return err; 652 - } 653 - 654 - /* 655 - * If the extent has been zeroed out, we don't need to update 656 - * extent status tree. 657 - */ 658 - if (flags & EXT4_GET_BLOCKS_SPLIT_NOMERGE && 659 - ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es, &map->m_seq)) { 660 - if (ext4_es_is_written(&es)) 661 - return retval; 662 649 } 663 650 664 651 status = map->m_flags & EXT4_MAP_UNWRITTEN ? 
···
 
 	dioread_nolock = ext4_should_dioread_nolock(inode);
 	if (dioread_nolock)
-		get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT;
+		get_blocks_flags |= EXT4_GET_BLOCKS_UNWRIT_EXT;
 
 	err = ext4_map_blocks(handle, inode, map, get_blocks_flags);
 	if (err < 0)
···
 	else if (EXT4_LBLK_TO_B(inode, map->m_lblk) >= i_size_read(inode))
 		m_flags = EXT4_GET_BLOCKS_CREATE;
 	else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
-		m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT;
+		m_flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
 
 	if (flags & IOMAP_ATOMIC)
 		ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags,
···
 		if (offset + length <= i_size_read(inode)) {
 			ret = ext4_map_blocks(NULL, inode, &map, 0);
 			/*
-			 * For atomic writes the entire requested length should
-			 * be mapped.
+			 * For DAX we convert extents to initialized ones before
+			 * copying the data, otherwise we do it after I/O so
+			 * there's no need to call into ext4_iomap_alloc().
 			 */
-			if (map.m_flags & EXT4_MAP_MAPPED) {
-				if ((!(flags & IOMAP_ATOMIC) && ret > 0) ||
-				   (flags & IOMAP_ATOMIC && ret >= orig_mlen))
+			if ((map.m_flags & EXT4_MAP_MAPPED) ||
+			    (!(flags & IOMAP_DAX) &&
+			     (map.m_flags & EXT4_MAP_UNWRITTEN))) {
+				/*
+				 * For atomic writes the entire requested
+				 * length should be mapped.
+				 */
+				if (ret == orig_mlen ||
+				    (!(flags & IOMAP_ATOMIC) && ret > 0))
 					goto out;
 			}
 			map.m_len = orig_mlen;
 		}
 		ret = ext4_iomap_alloc(inode, &map, flags);
 	} else {
-		/*
-		 * This can be called for overwrites path from
-		 * ext4_iomap_overwrite_begin().
-		 */
 		ret = ext4_map_blocks(NULL, inode, &map, 0);
 	}
 
···
 	return 0;
 }
 
-static int ext4_iomap_overwrite_begin(struct inode *inode, loff_t offset,
-		loff_t length, unsigned flags, struct iomap *iomap,
-		struct iomap *srcmap)
-{
-	int ret;
-
-	/*
-	 * Even for writes we don't need to allocate blocks, so just pretend
-	 * we are reading to save overhead of starting a transaction.
-	 */
-	flags &= ~IOMAP_WRITE;
-	ret = ext4_iomap_begin(inode, offset, length, flags, iomap, srcmap);
-	WARN_ON_ONCE(!ret && iomap->type != IOMAP_MAPPED);
-	return ret;
-}
-
 const struct iomap_ops ext4_iomap_ops = {
 	.iomap_begin = ext4_iomap_begin,
-};
-
-const struct iomap_ops ext4_iomap_overwrite_ops = {
-	.iomap_begin = ext4_iomap_overwrite_begin,
 };
 
 static int ext4_iomap_begin_report(struct inode *inode, loff_t offset,
···
 	if (ext4_should_journal_data(inode)) {
 		err = ext4_dirty_journalled_data(handle, bh);
 	} else {
-		err = 0;
 		mark_buffer_dirty(bh);
-		if (ext4_should_order_data(inode))
+		/*
+		 * Only the written block requires ordered data to prevent
+		 * exposing stale data.
+		 */
+		if (!buffer_unwritten(bh) && !buffer_delay(bh) &&
+		    ext4_should_order_data(inode))
 			err = ext4_jbd2_inode_add_write(handle, inode, from,
 							length);
 	}
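Two decisions in these hunks reduce to small predicates: when ext4_iomap_begin() can return an existing mapping without calling ext4_iomap_alloc(), and when a buffer still needs ordered-data handling. The sketch below restates both as plain C under simplifying assumptions. The SK_* flag bits and both function names are hypothetical and do not correspond to the kernel's EXT4_MAP_* or IOMAP_* values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only. */
#define SK_MAPPED    0x1u   /* written extent */
#define SK_UNWRITTEN 0x2u   /* allocated but unwritten extent */

/*
 * Shape of the new fast-path test in ext4_iomap_begin(): an existing
 * mapping is usable if it is written, or if it is unwritten and the
 * write is not DAX (conversion then happens after the I/O). Atomic
 * writes must additionally cover the whole requested length.
 */
static bool can_skip_alloc(unsigned int map_flags, bool is_dax,
			   bool is_atomic, int ret, int orig_mlen)
{
	bool usable;

	assert(orig_mlen > 0);
	usable = (map_flags & SK_MAPPED) ||
		 (!is_dax && (map_flags & SK_UNWRITTEN));
	if (!usable)
		return false;
	return ret == orig_mlen || (!is_atomic && ret > 0);
}

/*
 * Shape of the new ordered-data test: per the patch comment, only a
 * written block needs ordered data to avoid exposing stale data, so
 * unwritten and delayed-allocation buffers are skipped.
 */
static bool needs_ordered_write(bool buf_unwritten, bool buf_delay,
				bool order_data_mode)
{
	return !buf_unwritten && !buf_delay && order_data_mode;
}
```

Written this way, the behaviour change is easy to see: a non-DAX write over an unwritten extent now takes the fast path instead of going through the allocator, because its conversion is deferred until I/O completion.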
fs/ext4/ioctl.c (+3)
···
 
 	err = ext4_group_add(sb, input);
 	if (EXT4_SB(sb)->s_journal) {
+		ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_RESIZE, NULL);
 		jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
 		err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal, 0);
 		jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
···
 
 		err = ext4_group_extend(sb, EXT4_SB(sb)->s_es, n_blocks_count);
 		if (EXT4_SB(sb)->s_journal) {
+			ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_RESIZE,
+						NULL);
 			jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
 			err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal, 0);
 			jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
fs/ext4/mballoc-test.c (+1 -1)
···
 
 	bitmap = mbt_ctx_bitmap(sb, TEST_GOAL_GROUP);
 	memset(bitmap, 0, sb->s_blocksize);
-	ret = ext4_mb_mark_diskspace_used(ac, NULL, 0);
+	ret = ext4_mb_mark_diskspace_used(ac, NULL);
 	KUNIT_ASSERT_EQ(test, ret, 0);
 
 	max = EXT4_CLUSTERS_PER_GROUP(sb);
fs/ext4/mballoc.c (+36 -37)
···
 	}
 }
 
+static ext4_group_t ext4_get_allocation_groups_count(
+				struct ext4_allocation_context *ac)
+{
+	ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
+
+	/* non-extent files are limited to low blocks/groups */
+	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
+		ngroups = EXT4_SB(ac->ac_sb)->s_blockfile_groups;
+
+	/* Pairs with smp_wmb() in ext4_update_super() */
+	smp_rmb();
+
+	return ngroups;
+}
+
 static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
 					struct xarray *xa,
 					ext4_group_t start, ext4_group_t end)
···
 	struct super_block *sb = ac->ac_sb;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	enum criteria cr = ac->ac_criteria;
-	ext4_group_t ngroups = ext4_get_groups_count(sb);
+	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
 	unsigned long group = start;
 	struct ext4_group_info *grp;
 
···
 	ext4_group_t start, end;
 
 	start = group;
-	end = ext4_get_groups_count(ac->ac_sb);
+	end = ext4_get_allocation_groups_count(ac);
 wrap_around:
 	for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
 		ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
···
 	ext4_group_t start, end;
 
 	start = group;
-	end = ext4_get_groups_count(ac->ac_sb);
+	end = ext4_get_allocation_groups_count(ac);
 wrap_around:
 	i = mb_avg_fragment_size_order(ac->ac_sb, ac->ac_g_ex.fe_len);
 	for (; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
···
 	min_order = fls(ac->ac_o_ex.fe_len);
 
 	start = group;
-	end = ext4_get_groups_count(ac->ac_sb);
+	end = ext4_get_allocation_groups_count(ac);
 wrap_around:
 	for (i = order; i >= min_order; i--) {
 		int frag_order;
···
 		return 0;
 	if (ac->ac_criteria >= CR_GOAL_LEN_SLOW)
 		return 0;
-	if (!ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
-		return 0;
 	return 1;
 }
 
···
 	int ret = 0;
 	ext4_group_t start;
 	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
-	ext4_group_t ngroups = ext4_get_groups_count(ac->ac_sb);
-
-	/* non-extent files are limited to low blocks/groups */
-	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
-		ngroups = sbi->s_blockfile_groups;
+	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
 
 	/* searching for the right group start from the goal value specified */
 	start = ac->ac_g_ex.fe_group;
···
 
 	/* Avoid locking the folio in the fast path ... */
 	folio = __filemap_get_folio(inode->i_mapping, pnum, FGP_ACCESSED, 0);
-	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
+	if (IS_ERR(folio) || !folio_test_uptodate(folio) || folio_test_locked(folio)) {
+		/*
+		 * folio_test_locked is employed to detect ongoing folio
+		 * migrations, since concurrent migrations can lead to
+		 * bitmap inconsistency. And if we are not uptodate that
+		 * implies somebody just created the folio but is yet to
+		 * initialize it. We can drop the folio reference and
+		 * try to get the folio with lock in both cases to avoid
+		 * concurrency.
+		 */
 		if (!IS_ERR(folio))
-			/*
-			 * drop the folio reference and try
-			 * to get the folio with lock. If we
-			 * are not uptodate that implies
-			 * somebody just created the folio but
-			 * is yet to initialize it. So
-			 * wait for it to initialize.
-			 */
 			folio_put(folio);
 		folio = __filemap_get_folio(inode->i_mapping, pnum,
 				FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
···
 
 	/* we need another folio for the buddy */
 	folio = __filemap_get_folio(inode->i_mapping, pnum, FGP_ACCESSED, 0);
-	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
+	if (IS_ERR(folio) || !folio_test_uptodate(folio) || folio_test_locked(folio)) {
 		if (!IS_ERR(folio))
 			folio_put(folio);
 		folio = __filemap_get_folio(inode->i_mapping, pnum,
···
  * Returns 0 if success or error code
  */
 static noinline_for_stack int
-ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
-				handle_t *handle, unsigned int reserv_clstrs)
+ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, handle_t *handle)
 {
 	struct ext4_group_desc *gdp;
 	struct ext4_sb_info *sbi;
···
 	BUG_ON(changed != ac->ac_b_ex.fe_len);
 #endif
 	percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
-	/*
-	 * Now reduce the dirty block count also. Should not go negative
-	 */
-	if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
-		/* release all the reserved blocks if non delalloc */
-		percpu_counter_sub(&sbi->s_dirtyclusters_counter,
-				   reserv_clstrs);
 
 	return err;
 }
···
 		ext4_mb_pa_put_free(ac);
 	}
 	if (likely(ac->ac_status == AC_STATUS_FOUND)) {
-		*errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_clstrs);
+		*errp = ext4_mb_mark_diskspace_used(ac, handle);
 		if (*errp) {
 			ext4_discard_allocated_blocks(ac);
 			goto errout;
···
 out:
 	if (inquota && ar->len < inquota)
 		dquot_free_block(ar->inode, EXT4_C2B(sbi, inquota - ar->len));
-	if (!ar->len) {
-		if ((ar->flags & EXT4_MB_DELALLOC_RESERVED) == 0)
-			/* release all the reserved blocks if non delalloc */
-			percpu_counter_sub(&sbi->s_dirtyclusters_counter,
-						reserv_clstrs);
-	}
+	/* release any reserved blocks */
+	if (reserv_clstrs)
+		percpu_counter_sub(&sbi->s_dirtyclusters_counter, reserv_clstrs);
 
 	trace_ext4_allocate_blocks(ar, (unsigned long long)block);
 
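After this change, reserv_clstrs is no longer passed to ext4_mb_mark_diskspace_used(); the reservation is released in one place, at the common out: exit path, whenever it is non-zero. The toy model below illustrates why a single release point is easier to keep correct: each reservation is dropped exactly once, whether the allocation succeeds, fails before marking space used, or is undone afterwards. All names are hypothetical, and plain longs stand in for the kernel's per-cpu counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for s_freeclusters_counter and
 * s_dirtyclusters_counter. */
struct sketch_counters {
	long free_clusters;
	long dirty_clusters;   /* reserved but not yet allocated */
};

/*
 * Toy allocator following the new control flow: marking space as used
 * only touches free_clusters, and the caller's reservation is released
 * exactly once at the common exit, on success and on every error path.
 */
static int sketch_new_blocks(struct sketch_counters *c, long want,
			     long reserv, bool fail_after_mark)
{
	int err = 0;

	if (want > c->free_clusters) {
		err = -1;                   /* stand-in for -ENOSPC */
		goto out;
	}
	c->free_clusters -= want;           /* "mark disk space used" */
	if (fail_after_mark) {
		c->free_clusters += want;   /* undo, like discarding blocks */
		err = -2;
		goto out;
	}
out:
	/* release any reserved clusters, exactly once */
	if (reserv)
		c->dirty_clusters -= reserv;
	assert(c->dirty_clusters >= 0);     /* must never go negative */
	return err;
}
```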
fs/ext4/migrate.c (+12)
···
 		retval = PTR_ERR(handle);
 		goto out_unlock;
 	}
+	/*
+	 * This operation rewrites the inode's block mapping layout
+	 * (indirect to extents) and is not tracked in the fast commit
+	 * log, so disable fast commits for this transaction.
+	 */
+	ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_MIGRATE, handle);
 	goal = (((inode->i_ino - 1) / EXT4_INODES_PER_GROUP(inode->i_sb)) *
 		EXT4_INODES_PER_GROUP(inode->i_sb)) + 1;
 	owner[0] = i_uid_read(inode);
···
 		ret = PTR_ERR(handle);
 		goto out_unlock;
 	}
+	/*
+	 * This operation rewrites the inode's block mapping layout
+	 * (extents to indirect blocks) and is not tracked in the fast
+	 * commit log, so disable fast commits for this transaction.
+	 */
+	ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_MIGRATE, handle);
 
 	down_write(&EXT4_I(inode)->i_data_sem);
 	ret = ext4_ext_check_inode(inode);
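Each of these call sites (and the resize, move-extent, and verity ones in this merge) marks the transaction as ineligible for fast commit before performing an operation the fast-commit log does not describe, so that the next commit falls back to a full journal commit. The toy model below illustrates only that idea. `struct sketch_txn`, the SK_FC_REASON_* values, and both helpers are hypothetical; the real ext4_fc_mark_ineligible() keeps its state differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reasons echoing a few EXT4_FC_REASON_* names. */
enum sketch_fc_reason {
	SK_FC_REASON_RESIZE,
	SK_FC_REASON_MIGRATE,
	SK_FC_REASON_MOVE_EXT,
	SK_FC_REASON_VERITY,
	SK_FC_REASON_MAX
};

/* Toy transaction: one "ineligible" flag plus per-reason counters. */
struct sketch_txn {
	bool fc_ineligible;
	unsigned int reason_count[SK_FC_REASON_MAX];
};

/* Called before an operation the fast-commit log cannot describe. */
static void sketch_mark_ineligible(struct sketch_txn *t,
				   enum sketch_fc_reason r)
{
	assert(r < SK_FC_REASON_MAX);
	t->fc_ineligible = true;
	t->reason_count[r]++;
}

/* At commit time: a fast commit is allowed only if nothing in the
 * transaction was marked ineligible; otherwise do a full commit. */
static bool sketch_use_fast_commit(const struct sketch_txn *t)
{
	return !t->fc_ineligible;
}
```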
fs/ext4/move_extent.c (+2)
···
 		ret = PTR_ERR(handle);
 		goto out;
 	}
+	ext4_fc_mark_ineligible(orig_inode->i_sb, EXT4_FC_REASON_MOVE_EXT,
+				handle);
 
 	ret = mext_move_begin(mext, folio, &move_type);
 	if (ret)
fs/ext4/super.c (+23 -14)
···
 }
 
 /*
- * This function is called once a day if we have errors logged
- * on the file system
+ * This function is called once a day by default if we have errors logged
+ * on the file system.
+ * Use the err_report_sec sysfs attribute to disable or adjust its call
+ * freequency.
  */
-static void print_daily_error_info(struct timer_list *t)
+void print_daily_error_info(struct timer_list *t)
 {
 	struct ext4_sb_info *sbi = timer_container_of(sbi, t, s_err_report);
 	struct super_block *sb = sbi->s_sb;
···
 		       le64_to_cpu(es->s_last_error_block));
 		printk(KERN_CONT "\n");
 	}
-	mod_timer(&sbi->s_err_report, jiffies + 24*60*60*HZ); /* Once a day */
+
+	if (sbi->s_err_report_sec)
+		mod_timer(&sbi->s_err_report, jiffies + secs_to_jiffies(sbi->s_err_report_sec));
 }
 
 /* Find next suitable group and run ext4_init_inode_table */
···
 		clear_opt2(sb, MB_OPTIMIZE_SCAN);
 	}
 
+	err = ext4_percpu_param_init(sbi);
+	if (err)
+		goto failed_mount5;
+
 	err = ext4_mb_init(sb);
 	if (err) {
 		ext4_msg(sb, KERN_ERR, "failed to initialize mballoc (%d)",
···
 	if (sbi->s_journal)
 		sbi->s_journal->j_commit_callback =
 			ext4_journal_commit_callback;
-
-	err = ext4_percpu_param_init(sbi);
-	if (err)
-		goto failed_mount6;
 
 	if (ext4_has_feature_flex_bg(sb))
 		if (!ext4_fill_flex_info(sb)) {
···
 		clear_opt(sb, DISCARD);
 	}
 
-	if (es->s_error_count)
-		mod_timer(&sbi->s_err_report, jiffies + 300*HZ); /* 5 minutes */
+	if (es->s_error_count) {
+		sbi->s_err_report_sec = 5*60; /* first time 5 minutes */
+		mod_timer(&sbi->s_err_report,
+			  jiffies + secs_to_jiffies(sbi->s_err_report_sec));
+	}
+	sbi->s_err_report_sec = 24*60*60; /* Once a day */
 
 	/* Enable message ratelimiting. Default is 10 messages per 5 secs. */
 	ratelimit_state_init(&sbi->s_err_ratelimit_state, 5 * HZ, 10);
···
 failed_mount6:
 	ext4_mb_release(sb);
 	ext4_flex_groups_free(sbi);
-	ext4_percpu_param_destroy(sbi);
 failed_mount5:
+	ext4_percpu_param_destroy(sbi);
 	ext4_ext_release(sb);
 	ext4_release_system_zone(sb);
 failed_mount4a:
···
 			ext4_errno_to_code(sbi->s_last_error_code);
 		/*
 		 * Start the daily error reporting function if it hasn't been
-		 * started already
+		 * started already and sbi->s_err_report_sec is not zero
 		 */
-		if (!es->s_error_count)
-			mod_timer(&sbi->s_err_report, jiffies + 24*60*60*HZ);
+		if (!es->s_error_count && !sbi->s_err_report_sec)
+			mod_timer(&sbi->s_err_report,
+				  jiffies + secs_to_jiffies(sbi->s_err_report_sec));
 		le32_add_cpu(&es->s_error_count, sbi->s_add_error_count);
 		sbi->s_add_error_count = 0;
 	}
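The mount and timer-callback hunks above define a simple schedule: if errors are already recorded at mount time, the first report fires after five minutes; after that, the callback re-arms itself every s_err_report_sec seconds (one day by default), and a value of 0 stops it from re-arming. A small state-machine sketch of that schedule follows, with hypothetical names and seconds instead of jiffies.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_FIRST_REPORT_SEC   (5 * 60)         /* report soon after mount */
#define SK_DEFAULT_REPORT_SEC (24 * 60 * 60)   /* then once a day */

/* Hypothetical model of the s_err_report timer; seconds, not jiffies. */
struct sketch_err_report {
	unsigned long interval_sec;  /* 0 means: do not re-arm */
	bool armed;
	unsigned long expires_in;    /* seconds until the next report */
};

/* Mount time: arm a first report after five minutes if errors were
 * already recorded, then set the periodic interval to one day. */
static void sketch_mount(struct sketch_err_report *r, unsigned int error_count)
{
	r->armed = false;
	if (error_count) {
		r->armed = true;
		r->expires_in = SK_FIRST_REPORT_SEC;
	}
	r->interval_sec = SK_DEFAULT_REPORT_SEC;
}

/* Timer callback: the report itself is omitted; re-arm only when the
 * configured interval is non-zero. */
static void sketch_timer_fired(struct sketch_err_report *r)
{
	assert(r->armed);   /* only an armed timer can fire */
	r->armed = (r->interval_sec != 0);
	if (r->armed)
		r->expires_in = r->interval_sec;
}
```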
fs/ext4/sysfs.c (+36)
···
 	attr_pointer_string,
 	attr_pointer_atomic,
 	attr_journal_task,
+	attr_err_report_sec,
 } attr_id_t;
 
 typedef enum {
···
 	return count;
 }
 
+static ssize_t err_report_sec_store(struct ext4_sb_info *sbi,
+				    const char *buf, size_t count)
+{
+	unsigned long t;
+	int ret;
+
+	ret = kstrtoul(skip_spaces(buf), 0, &t);
+	if (ret)
+		return ret;
+
+	/*the maximum time interval must not exceed one year.*/
+	if (t > (365*24*60*60))
+		return -EINVAL;
+
+	if (sbi->s_err_report_sec == t) /*nothing to do*/
+		goto out;
+	else if (!sbi->s_err_report_sec && t) {
+		timer_setup(&sbi->s_err_report, print_daily_error_info, 0);
+	} else if (sbi->s_err_report_sec && !t) {
+		timer_delete_sync(&sbi->s_err_report);
+		goto out;
+	}
+
+	sbi->s_err_report_sec = t;
+	mod_timer(&sbi->s_err_report, jiffies + secs_to_jiffies(sbi->s_err_report_sec));
+
+out:
+	return count;
+}
+
 static ssize_t journal_task_show(struct ext4_sb_info *sbi, char *buf)
 {
 	if (!sbi->s_journal)
···
 		 ext4_sb_info, s_mb_group_prealloc);
 EXT4_ATTR_OFFSET(mb_best_avail_max_trim_order, 0644, mb_order,
 		 ext4_sb_info, s_mb_best_avail_max_trim_order);
+EXT4_ATTR_OFFSET(err_report_sec, 0644, err_report_sec, ext4_sb_info, s_err_report_sec);
 EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
 EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
 EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
···
 	ATTR_LIST(last_trim_minblks),
 	ATTR_LIST(sb_update_sec),
 	ATTR_LIST(sb_update_kb),
+	ATTR_LIST(err_report_sec),
 	NULL,
 };
 ATTRIBUTE_GROUPS(ext4);
···
 			return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
 		return sysfs_emit(buf, "%u\n", *((unsigned int *) ptr));
 	case attr_pointer_ul:
+	case attr_err_report_sec:
 		return sysfs_emit(buf, "%lu\n", *((unsigned long *) ptr));
 	case attr_pointer_u8:
 		return sysfs_emit(buf, "%u\n", *((unsigned char *) ptr));
···
 		return inode_readahead_blks_store(sbi, buf, len);
 	case attr_trigger_test_error:
 		return trigger_test_error(sbi, buf, len);
+	case attr_err_report_sec:
+		return err_report_sec_store(sbi, buf, len);
 	default:
 		return ext4_generic_attr_store(a, sbi, buf, len);
 	}
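err_report_sec_store() parses the written value with kstrtoul() after skipping leading spaces, rejects intervals longer than one year, and then arms or deletes the report timer depending on whether reporting is being enabled or disabled. The userspace approximation below uses strtoul() in place of kstrtoul() and a boolean in place of the timer; `struct sketch_sb` and the function name are hypothetical. Like kstrtoul(), it rejects a minus sign and any trailing characters other than a single newline. As a simplification, it stores the new value in every accepted case, including 0.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* One year in seconds, the same upper bound the patch enforces. */
#define SK_MAX_REPORT_SEC (365UL * 24 * 60 * 60)

/* Hypothetical stand-in for sbi->s_err_report_sec and its timer. */
struct sketch_sb {
	unsigned long err_report_sec;  /* 0 = periodic reporting disabled */
	bool timer_armed;
};

/*
 * Userspace approximation of err_report_sec_store(). Returns 0 on
 * success or a negative errno value, leaving *sb untouched on error.
 */
static int sketch_err_report_sec_store(struct sketch_sb *sb, const char *buf)
{
	char *end;
	unsigned long t;

	assert(sb != NULL && buf != NULL);
	while (isspace((unsigned char)*buf))   /* like skip_spaces() */
		buf++;
	if (*buf == '-')                        /* kstrtoul() rejects a minus */
		return -EINVAL;

	errno = 0;
	t = strtoul(buf, &end, 0);             /* base 0: decimal, 0x.., 0.. */
	if (errno == ERANGE)
		return -ERANGE;
	/* allow at most a single trailing newline, as kstrtoul() does */
	if (end == buf || (*end != '\0' && strcmp(end, "\n") != 0))
		return -EINVAL;
	if (t > SK_MAX_REPORT_SEC)
		return -EINVAL;

	sb->timer_armed = (t != 0);  /* 0: delete the timer; else (re)arm it */
	sb->err_report_sec = t;
	return 0;
}
```

For example, writing "3600\n" sets a one-hour interval, " 0x10" is accepted as 16 because base 0 auto-detects hexadecimal, and "31536001" (one second over a year) is rejected with -EINVAL.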
fs/ext4/verity.c (+2)
···
 		goto cleanup;
 	}
 
+	ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_VERITY, handle);
+
 	err = ext4_orphan_del(handle, inode);
 	if (err)
 		goto stop_and_cleanup;
include/trace/events/ext4.h (+7 -1)
···
 TRACE_DEFINE_ENUM(EXT4_FC_REASON_FALLOC_RANGE);
 TRACE_DEFINE_ENUM(EXT4_FC_REASON_INODE_JOURNAL_DATA);
 TRACE_DEFINE_ENUM(EXT4_FC_REASON_ENCRYPTED_FILENAME);
+TRACE_DEFINE_ENUM(EXT4_FC_REASON_MIGRATE);
+TRACE_DEFINE_ENUM(EXT4_FC_REASON_VERITY);
+TRACE_DEFINE_ENUM(EXT4_FC_REASON_MOVE_EXT);
 TRACE_DEFINE_ENUM(EXT4_FC_REASON_MAX);
 
 #define show_fc_reason(reason) \
···
 		{ EXT4_FC_REASON_RENAME_DIR, "RENAME_DIR"}, \
 		{ EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"}, \
 		{ EXT4_FC_REASON_INODE_JOURNAL_DATA, "INODE_JOURNAL_DATA"}, \
-		{ EXT4_FC_REASON_ENCRYPTED_FILENAME, "ENCRYPTED_FILENAME"})
+		{ EXT4_FC_REASON_ENCRYPTED_FILENAME, "ENCRYPTED_FILENAME"}, \
+		{ EXT4_FC_REASON_MIGRATE, "MIGRATE"}, \
+		{ EXT4_FC_REASON_VERITY, "VERITY"}, \
+		{ EXT4_FC_REASON_MOVE_EXT, "MOVE_EXT"})
 
 TRACE_DEFINE_ENUM(CR_POWER2_ALIGNED);
 TRACE_DEFINE_ENUM(CR_GOAL_LEN_FAST);
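Each new reason is added in two places. TRACE_DEFINE_ENUM() exports the enum value so that user-space trace tools can resolve it, and the show_fc_reason() table maps the value to the name printed in trace output. The sketch below mimics that value-to-name table in plain C and adds a compile-time check that the table has exactly one entry per reason. The SK_REASON_* enum and the helper are hypothetical, and their numeric values do not match the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of reasons; values are illustrative only. */
enum sketch_reason {
	SK_REASON_ENCRYPTED_FILENAME,
	SK_REASON_MIGRATE,
	SK_REASON_VERITY,
	SK_REASON_MOVE_EXT,
	SK_REASON_MAX
};

struct sketch_symbol {
	int value;
	const char *name;
};

/* Same { value, "NAME" } shape as the entries in show_fc_reason(). */
static const struct sketch_symbol sketch_reason_names[] = {
	{ SK_REASON_ENCRYPTED_FILENAME, "ENCRYPTED_FILENAME" },
	{ SK_REASON_MIGRATE,            "MIGRATE" },
	{ SK_REASON_VERITY,             "VERITY" },
	{ SK_REASON_MOVE_EXT,           "MOVE_EXT" },
};

/* Fails to compile if a reason is added without a matching name. */
static_assert(sizeof(sketch_reason_names) / sizeof(sketch_reason_names[0]) ==
	      SK_REASON_MAX, "one name per reason");

/* Linear lookup; returns NULL for values not in the table. */
static const char *sketch_reason_name(int value)
{
	size_t n = sizeof(sketch_reason_names) / sizeof(sketch_reason_names[0]);

	for (size_t i = 0; i < n; i++)
		if (sketch_reason_names[i].value == value)
			return sketch_reason_names[i].name;
	return NULL;
}
```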