Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
btrfs: refactor the main loop of cow_file_range()

Currently, the main loop of cow_file_range() performs the following
sequence:

- Reserve an extent
- Lock the IO tree range
- Create an IO extent map
- Create an ordered extent

If a later step fails, the earlier steps must be cleaned up in the
following order:

- Drop the newly created extent map
- Unlock the extent range and clean up the involved folios
- Free the reserved extent
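
The setup and cleanup ordering above can be sketched in plain userspace C. This is only an illustration of the unwind pattern, not btrfs code: the step names, the `fail_at` parameter and the `trace` log are all hypothetical, and locking is modeled as never failing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the setup steps; none of these are btrfs APIs. */
enum step { STEP_RESERVE, STEP_CREATE_EM, STEP_CREATE_ORDERED, STEP_NONE };

/* Records the cleanup actions in the order they run. */
struct trace { char log[64]; };

static void note(struct trace *t, const char *s) { strcat(t->log, s); }

/* Simulated step: fails only when it is the step selected by fail_at. */
static int do_step(enum step s, enum step fail_at) { return s == fail_at ? -1 : 0; }

static int setup_one_range(enum step fail_at, struct trace *t)
{
    int ret;

    t->log[0] = '\0';

    ret = do_step(STEP_RESERVE, fail_at);
    if (ret < 0)
        return ret;             /* nothing acquired yet, nothing to undo */

    /* The range lock is taken here; it is assumed not to fail. */

    ret = do_step(STEP_CREATE_EM, fail_at);
    if (ret < 0)
        goto free_reserved;

    ret = do_step(STEP_CREATE_ORDERED, fail_at);
    if (ret < 0) {
        note(t, "drop_em;");    /* undo the extent map first */
        goto free_reserved;
    }
    return 0;

free_reserved:
    note(t, "unlock;");         /* unlock the range and clean up folios */
    note(t, "free;");           /* finally give back the reserved extent */
    return ret;
}
```

Keeping a single label at the end of the helper means every failure path releases resources in the same order, which is the property the list above asks for.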

However, the error handling is currently done inconsistently:

- Extent map drop is handled in a dedicated tag
  The tag is outside the main loop, which makes it much harder to track.

- The extent unlock and folio cleanup are done separately
  The extent is unlocked through btrfs_unlock_extent(), then
  extent_clear_unlock_delalloc() is called again in a dedicated tag.
  Meanwhile, all other call sites (compression/encoded/nocow) just
  call extent_clear_unlock_delalloc() to handle unlock and folio
  cleanup in one go.

- Reserved extent freeing is handled in a dedicated tag
  The tag is outside the main loop, which makes it much harder to track.

- Error handling of btrfs_reloc_clone_csums() relies on out-of-loop
  tags
  This is due to the special requirement to finish ordered extents to
  handle the metadata reserved space.

Enhance the error handling and align the behavior by:

- Introducing a dedicated cow_one_range() helper
  The helper does the reserve/lock/allocation and also handles the
  errors itself.
  No more dedicated tags outside the main loop.

- Using a single extent_clear_unlock_delalloc() to unlock and clean up
  folios

- Moving the btrfs_reloc_clone_csums() error handling into the new helper
  Thankfully it's not that complex compared to other cases.
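
A minimal userspace sketch of the resulting caller/helper split is shown below. It is not the kernel code: `struct alloc_model`, `one_range()` and `cow_loop()` are hypothetical names, and the "allocator" is just a byte budget. The point is that the helper reports how much it handled through an out parameter and cleans up its own range on failure, so the loop only has to advance or stop.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical allocator model: at most max_chunk bytes per call, limited by budget. */
struct alloc_model {
    uint32_t max_chunk;
    uint64_t budget;
};

/* Handle one range. On failure the helper owns the cleanup and reports size 0. */
static int one_range(struct alloc_model *m, uint32_t want, uint32_t *ret_size)
{
    uint32_t got = want < m->max_chunk ? want : m->max_chunk;

    *ret_size = 0;
    if (got == 0)
        return -EINVAL;         /* guard against a loop that never advances */
    if (m->budget < got)
        return -ENOSPC;         /* nothing left behind for the caller to undo */
    m->budget -= got;
    *ret_size = got;
    return 0;
}

/* Main loop: no per-range cleanup labels, only "advance" or "stop". */
static int cow_loop(struct alloc_model *m, uint64_t start, uint64_t len,
                    uint64_t *done_end)
{
    uint64_t cur = start;

    while (len > 0) {
        uint32_t want = len > UINT32_MAX ? UINT32_MAX : (uint32_t)len;
        uint32_t got = 0;
        int ret = one_range(m, want, &got);

        if (ret < 0) {
            *done_end = cur;    /* ranges before cur succeeded, the rest is untouched */
            return ret;
        }
        assert(got <= len);     /* never handle more than was requested */
        len -= got;
        cur += got;
    }
    *done_end = cur;
    return 0;
}
```

In this model the caller's error path only needs to know where the successful ranges end, which mirrors the commit's claim that nothing needs to be done for the failed range itself.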

And since we're here, also reduce the width of the following local
variables to u32:

- cur_alloc_size
- min_alloc_size
Each allocation won't go beyond 128M, thus u32 is more than enough.

- blocksize
The maximum is 64K, no need for u64.
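
The width argument can be checked with a few compile-time assertions. The sketch below is userspace C11 with hypothetical macro names; the 128M and 64K limits are taken from the text above, and `advance()` only shows that adding a 32-bit size to a 64-bit offset is still done in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Limits quoted in the commit message; the macro names are made up for this sketch. */
#define SKETCH_MAX_ALLOC_SIZE (128u * 1024 * 1024)   /* 128M */
#define SKETCH_MAX_BLOCKSIZE  (64u * 1024)           /* 64K */

_Static_assert(SKETCH_MAX_ALLOC_SIZE <= UINT32_MAX, "128M fits in 32 bits");
_Static_assert(SKETCH_MAX_BLOCKSIZE <= UINT32_MAX, "64K fits in 32 bits");

/* The u32 size is converted to u64 before the addition, so the offset cannot wrap at 4G. */
static uint64_t advance(uint64_t start, uint32_t cur_alloc_size)
{
    return start + cur_alloc_size;
}
```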

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Authored by Qu Wenruo and committed by David Sterba
c28214bd 9da49784

+142 -103
fs/btrfs/inode.c
···
 }
 
 /*
+ * Handle COW for one range.
+ *
+ * @ins:		The key representing the allocated range.
+ * @file_offset:	The file offset of the COW range
+ * @num_bytes:		The expected length of the COW range
+ *			The actually allocated length can be smaller than it.
+ * @min_alloc_size:	The minimal extent size.
+ * @alloc_hint:		The hint for the extent allocator.
+ * @ret_alloc_size:	The COW range handles by this function.
+ *
+ * Return 0 if everything is fine and update @ret_alloc_size updated.  The
+ * range is still locked, and caller should unlock the range after everything
+ * is done or for error handling.
+ *
+ * Return <0 for error and @is updated for where the extra cleanup should
+ * happen. The range [file_offset, file_offset + ret_alloc_size) will be
+ * cleaned up by this function.
+ */
+static int cow_one_range(struct btrfs_inode *inode, struct folio *locked_folio,
+			 struct btrfs_key *ins, struct extent_state **cached,
+			 u64 file_offset, u32 num_bytes, u32 min_alloc_size,
+			 u64 alloc_hint, u32 *ret_alloc_size)
+{
+	struct btrfs_root *root = inode->root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_ordered_extent *ordered;
+	struct btrfs_file_extent file_extent;
+	struct extent_map *em;
+	u32 cur_len = 0;
+	u64 cur_end;
+	int ret;
+
+	ret = btrfs_reserve_extent(root, num_bytes, num_bytes, min_alloc_size,
+				   0, alloc_hint, ins, true, true);
+	if (ret < 0) {
+		*ret_alloc_size = cur_len;
+		return ret;
+	}
+
+	cur_len = ins->offset;
+	cur_end = file_offset + cur_len - 1;
+
+	file_extent.disk_bytenr = ins->objectid;
+	file_extent.disk_num_bytes = ins->offset;
+	file_extent.num_bytes = ins->offset;
+	file_extent.ram_bytes = ins->offset;
+	file_extent.offset = 0;
+	file_extent.compression = BTRFS_COMPRESS_NONE;
+
+	/*
+	 * Locked range will be released either during error clean up (inside
+	 * this function or by the caller for previously successful ranges) or
+	 * after the whole range is finished.
+	 */
+	btrfs_lock_extent(&inode->io_tree, file_offset, cur_end, cached);
+	em = btrfs_create_io_em(inode, file_offset, &file_extent, BTRFS_ORDERED_REGULAR);
+	if (IS_ERR(em)) {
+		ret = PTR_ERR(em);
+		goto free_reserved;
+	}
+	btrfs_free_extent_map(em);
+
+	ordered = btrfs_alloc_ordered_extent(inode, file_offset, &file_extent,
+					     1U << BTRFS_ORDERED_REGULAR);
+	if (IS_ERR(ordered)) {
+		btrfs_drop_extent_map_range(inode, file_offset, cur_end, false);
+		ret = PTR_ERR(ordered);
+		goto free_reserved;
+	}
+
+	if (btrfs_is_data_reloc_root(root)) {
+		ret = btrfs_reloc_clone_csums(ordered);
+
+		/*
+		 * Only drop cache here, and process as normal.
+		 *
+		 * We must not allow extent_clear_unlock_delalloc() at
+		 * free_reserved label to free meta of this ordered extent, as
+		 * its meta should be freed by btrfs_finish_ordered_io().
+		 *
+		 * So we must continue until @start is increased to
+		 * skip current ordered extent.
+		 */
+		if (ret)
+			btrfs_drop_extent_map_range(inode, file_offset,
+						    cur_end, false);
+	}
+	btrfs_put_ordered_extent(ordered);
+	btrfs_dec_block_group_reservations(fs_info, ins->objectid);
+	/*
+	 * Error handling for btrfs_reloc_clone_csums().
+	 *
+	 * Treat the range as finished, thus only clear EXTENT_LOCKED | EXTENT_DELALLOC.
+	 * The accounting will be done by ordered extents.
+	 */
+	if (unlikely(ret < 0)) {
+		btrfs_cleanup_ordered_extents(inode, file_offset, cur_len);
+		extent_clear_unlock_delalloc(inode, file_offset, cur_end, locked_folio, cached,
+					     EXTENT_LOCKED | EXTENT_DELALLOC,
+					     PAGE_UNLOCK | PAGE_START_WRITEBACK |
+					     PAGE_END_WRITEBACK);
+		mapping_set_error(inode->vfs_inode.i_mapping, -EIO);
+	}
+	*ret_alloc_size = cur_len;
+	return ret;
+
+free_reserved:
+	extent_clear_unlock_delalloc(inode, file_offset, cur_end, locked_folio, cached,
+				     EXTENT_LOCKED | EXTENT_DELALLOC |
+				     EXTENT_DELALLOC_NEW |
+				     EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING,
+				     PAGE_UNLOCK | PAGE_START_WRITEBACK |
+				     PAGE_END_WRITEBACK);
+	btrfs_qgroup_free_data(inode, NULL, file_offset, cur_len, NULL);
+	btrfs_dec_block_group_reservations(fs_info, ins->objectid);
+	btrfs_free_reserved_extent(fs_info, ins->objectid, ins->offset, true);
+	mapping_set_error(inode->vfs_inode.i_mapping, -EIO);
+	*ret_alloc_size = cur_len;
+	/*
+	 * We should not return -EAGAIN where it's a special return code for
+	 * zoned to catch btrfs_reserved_extent().
+	 */
+	ASSERT(ret != -EAGAIN);
+	return ret;
+}
+
+/*
  * when extent_io.c finds a delayed allocation range in the file,
  * the call backs end up in this code.  The basic idea is to
  * allocate extents on disk for the range, and create ordered data structs
···
 	u64 alloc_hint = 0;
 	u64 orig_start = start;
 	u64 num_bytes;
-	u64 cur_alloc_size = 0;
-	u64 min_alloc_size;
-	u64 blocksize = fs_info->sectorsize;
+	u32 min_alloc_size;
+	u32 blocksize = fs_info->sectorsize;
+	u32 cur_alloc_size = 0;
 	struct btrfs_key ins;
-	struct extent_map *em;
 	unsigned clear_bits;
 	unsigned long page_ops;
 	int ret = 0;
···
 		min_alloc_size = fs_info->sectorsize;
 
 	while (num_bytes > 0) {
-		struct btrfs_ordered_extent *ordered;
-		struct btrfs_file_extent file_extent;
+		ret = cow_one_range(inode, locked_folio, &ins, &cached, start,
+				    num_bytes, min_alloc_size, alloc_hint, &cur_alloc_size);
 
-		ret = btrfs_reserve_extent(root, num_bytes, num_bytes,
-					   min_alloc_size, 0, alloc_hint,
-					   &ins, true, true);
 		if (ret == -EAGAIN) {
 			/*
-			 * btrfs_reserve_extent only returns -EAGAIN for zoned
-			 * file systems, which is an indication that there are
+			 * cow_one_range() only returns -EAGAIN for zoned
+			 * file systems (from btrfs_reserve_extent()), which
+			 * is an indication that there are
 			 * no active zones to allocate from at the moment.
 			 *
 			 * If this is the first loop iteration, wait for at
···
 		}
 		if (ret < 0)
 			goto out_unlock;
-		cur_alloc_size = ins.offset;
 
-		file_extent.disk_bytenr = ins.objectid;
-		file_extent.disk_num_bytes = ins.offset;
-		file_extent.num_bytes = ins.offset;
-		file_extent.ram_bytes = ins.offset;
-		file_extent.offset = 0;
-		file_extent.compression = BTRFS_COMPRESS_NONE;
+		/* We should not allocate an extent larger than requested.*/
+		ASSERT(cur_alloc_size <= num_bytes);
 
-		/*
-		 * Locked range will be released either during error clean up or
-		 * after the whole range is finished.
-		 */
-		btrfs_lock_extent(&inode->io_tree, start, start + cur_alloc_size - 1,
-				  &cached);
-
-		em = btrfs_create_io_em(inode, start, &file_extent,
-					BTRFS_ORDERED_REGULAR);
-		if (IS_ERR(em)) {
-			btrfs_unlock_extent(&inode->io_tree, start,
-					    start + cur_alloc_size - 1, &cached);
-			ret = PTR_ERR(em);
-			goto out_reserve;
-		}
-		btrfs_free_extent_map(em);
-
-		ordered = btrfs_alloc_ordered_extent(inode, start, &file_extent,
-						     1U << BTRFS_ORDERED_REGULAR);
-		if (IS_ERR(ordered)) {
-			btrfs_unlock_extent(&inode->io_tree, start,
-					    start + cur_alloc_size - 1, &cached);
-			ret = PTR_ERR(ordered);
-			goto out_drop_extent_cache;
-		}
-
-		if (btrfs_is_data_reloc_root(root)) {
-			ret = btrfs_reloc_clone_csums(ordered);
-
-			/*
-			 * Only drop cache here, and process as normal.
-			 *
-			 * We must not allow extent_clear_unlock_delalloc()
-			 * at out_unlock label to free meta of this ordered
-			 * extent, as its meta should be freed by
-			 * btrfs_finish_ordered_io().
-			 *
-			 * So we must continue until @start is increased to
-			 * skip current ordered extent.
-			 */
-			if (ret)
-				btrfs_drop_extent_map_range(inode, start,
-							    start + cur_alloc_size - 1,
-							    false);
-		}
-		btrfs_put_ordered_extent(ordered);
-
-		btrfs_dec_block_group_reservations(fs_info, ins.objectid);
-
-		if (num_bytes < cur_alloc_size)
-			num_bytes = 0;
-		else
-			num_bytes -= cur_alloc_size;
+		num_bytes -= cur_alloc_size;
 		alloc_hint = ins.objectid + ins.offset;
 		start += cur_alloc_size;
 		cur_alloc_size = 0;
-
-		/*
-		 * btrfs_reloc_clone_csums() error, since start is increased
-		 * extent_clear_unlock_delalloc() at out_unlock label won't
-		 * free metadata of current ordered extent, we're OK to exit.
-		 */
-		if (ret)
-			goto out_unlock;
 	}
 	extent_clear_unlock_delalloc(inode, orig_start, end, locked_folio, &cached,
 				     EXTENT_LOCKED | EXTENT_DELALLOC, page_ops);
···
 		*done_offset = end;
 	return ret;
 
-out_drop_extent_cache:
-	btrfs_drop_extent_map_range(inode, start, start + cur_alloc_size - 1, false);
-out_reserve:
-	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
-	btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, true);
 out_unlock:
 	/*
 	 * Now, we have three regions to clean up:
···
 	page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK;
 
 	/*
-	 * For the range (2). If we reserved an extent for our delalloc range
-	 * (or a subrange) and failed to create the respective ordered extent,
-	 * then it means that when we reserved the extent we decremented the
-	 * extent's size from the data space_info's bytes_may_use counter and
-	 * incremented the space_info's bytes_reserved counter by the same
-	 * amount. We must make sure extent_clear_unlock_delalloc() does not try
-	 * to decrement again the data space_info's bytes_may_use counter,
-	 * therefore we do not pass it the flag EXTENT_CLEAR_DATA_RESV.
-	 */
-	if (cur_alloc_size) {
-		extent_clear_unlock_delalloc(inode, start,
-					     start + cur_alloc_size - 1,
-					     locked_folio, &cached, clear_bits,
-					     page_ops);
-		btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL);
-	}
-
-	/*
+	 * For the range (2) the error handling is done by cow_one_range() itself.
+	 * Nothing needs to be done.
+	 *
 	 * For the range (3). We never touched the region. In addition to the
 	 * clear_bits above, we add EXTENT_CLEAR_DATA_RESV to release the data
 	 * space_info's bytes_may_use counter, reserved in
···
 				       end - start - cur_alloc_size + 1, NULL);
 	}
 	btrfs_err(fs_info,
-"%s failed, root=%llu inode=%llu start=%llu len=%llu cur_offset=%llu cur_alloc_size=%llu: %d",
+"%s failed, root=%llu inode=%llu start=%llu len=%llu cur_offset=%llu cur_alloc_size=%u: %d",
 		  __func__, btrfs_root_id(inode->root),
 		  btrfs_ino(inode), orig_start, end + 1 - orig_start,
 		  start, cur_alloc_size, ret);
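
The last hunk changes the format specifier for cur_alloc_size because its type shrank from u64 to u32. A rough userspace equivalent is sketched below; `format_sizes()` is a hypothetical helper, and the portable `PRIu64`/`PRIu32` macros stand in for the kernel's `%llu`/`%u`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Each conversion specifier must match the width of its argument. */
static int format_sizes(char *buf, size_t len, uint64_t cur_offset,
                        uint32_t cur_alloc_size)
{
    return snprintf(buf, len, "cur_offset=%" PRIu64 " cur_alloc_size=%" PRIu32,
                    cur_offset, cur_alloc_size);
}
```

Passing a 32-bit value to a 64-bit specifier is undefined behavior in C, which is presumably why the message string had to change together with the variable width.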