Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull fallocate updates from Christian Brauner:
"fallocate() currently supports creating preallocated files
efficiently. However, on most filesystems fallocate() will preallocate
blocks in an unwritten state even if FALLOC_FL_ZERO_RANGE is specified.

The extent state must later be converted to a written state when the
user writes data into this range, which can trigger numerous metadata
changes and journal I/O. This may lead to significant write
amplification and performance degradation in synchronous write mode.

At the moment, the only method to avoid this is to create an empty
file and write zero data into it (for example, using 'dd' with a large
block size). However, this method is slow and consumes a considerable
amount of disk bandwidth.

Now that more and more flash-based storage devices are available, it is
possible to efficiently write zeros to SSDs using the unmap write
zeroes command if the devices do not write physical zeroes to the
media.

For example, if SCSI SSDs support the UNMAP bit or NVMe SSDs support
the DEAC bit[1], the write zeroes command does not write actual data
to the device; instead, NVMe converts the zeroed range to a
deallocated state, which works fast and consumes almost no disk write
bandwidth.

This series adds the max_{hw|user}_wzeroes_unmap_sectors queue limits
for SCSI, NVMe and device-mapper drivers, exposed through sysfs as
write_zeroes_unmap_max_hw_bytes and write_zeroes_unmap_max_bytes, and
adds FALLOC_FL_WRITE_ZEROES support for ext4 and raw block devices.

fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES
flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a
way that subsequent writes to that range do not require further
changes to the file mapping metadata. This flag is beneficial for
subsequent pure overwriting within this range, as it can save on block
allocation and, consequently, significant metadata changes."
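
A minimal userspace sketch of how an application might use the new flag, assuming a glibc toolchain whose <fcntl.h> declares fallocate() under _GNU_SOURCE. The helper name zero_range_written() is hypothetical. If the kernel, filesystem or device rejects FALLOC_FL_WRITE_ZEROES, the sketch falls back to the slow path described above: writing zero-filled buffers, much as 'dd' would.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Value from include/uapi/linux/falloc.h; older libc headers may lack it. */
#ifndef FALLOC_FL_WRITE_ZEROES
#define FALLOC_FL_WRITE_ZEROES 0x80
#endif

/*
 * Hypothetical helper: zero [off, off + len) so that later overwrites do
 * not need unwritten-to-written extent conversion. Tries the new flag
 * first and otherwise writes zero buffers. Returns 0, or -1 with errno set.
 */
static int zero_range_written(int fd, off_t off, off_t len)
{
	static const char zeroes[1 << 16];	/* 64 KiB of zero bytes */

	if (fallocate(fd, FALLOC_FL_WRITE_ZEROES, off, len) == 0)
		return 0;

	/* Fallback: physically write zeroes, like 'dd if=/dev/zero'. */
	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(zeroes) ?
			       (size_t)len : sizeof(zeroes);
		ssize_t n = pwrite(fd, zeroes, chunk, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		off += n;
		len -= n;
	}
	return 0;
}
```

Falling back on any fallocate() error keeps the sketch short; genuine failures such as EBADF or ENOSPC are still reported by the pwrite() path. Note that FALLOC_FL_KEEP_SIZE is deliberately not passed, since the new flag cannot be combined with it.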

* tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ext4: add FALLOC_FL_WRITE_ZEROES support
block: add FALLOC_FL_WRITE_ZEROES support
block: factor out common part in blkdev_fallocate()
fs: introduce FALLOC_FL_WRITE_ZEROES to fallocate
dm: clear unmap write zeroes limits when disabling write zeroes
scsi: sd: set max_hw_wzeroes_unmap_sectors if device supports SD_ZERO_*_UNMAP
nvmet: set WZDS and DRB if device enables unmap write zeroes operation
nvme: set max_hw_wzeroes_unmap_sectors if device supports DEAC bit
block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits

14 files changed, 217 insertions(+), 49 deletions(-)

Documentation/ABI/stable/sysfs-block (+33)
···
 		0, write zeroes is not supported by the device.
 
 
+What:		/sys/block/<disk>/queue/write_zeroes_unmap_max_hw_bytes
+Date:		January 2025
+Contact:	Zhang Yi <yi.zhang@huawei.com>
+Description:
+		[RO] This file indicates whether a device supports zeroing data
+		in a specified block range without incurring the cost of
+		physically writing zeroes to the media for each individual
+		block. If this parameter is set to write_zeroes_max_bytes, the
+		device implements a zeroing operation which opportunistically
+		avoids writing zeroes to media while still guaranteeing that
+		subsequent reads from the specified block range will return
+		zeroed data. This operation is a best-effort optimization, a
+		device may fall back to physically writing zeroes to the media
+		due to other factors such as misalignment or being asked to
+		clear a block range smaller than the device's internal
+		allocation unit. If this parameter is set to 0, the device may
+		have to write each logical block media during a zeroing
+		operation.
+
+
+What:		/sys/block/<disk>/queue/write_zeroes_unmap_max_bytes
+Date:		January 2025
+Contact:	Zhang Yi <yi.zhang@huawei.com>
+Description:
+		[RW] While write_zeroes_unmap_max_hw_bytes is the hardware limit
+		for the device, this setting is the software limit. Since the
+		unmap write zeroes operation is a best-effort optimization, some
+		devices may still physically writing zeroes to media. So the
+		speed of this operation is not guaranteed. Writing a value of
+		'0' to this file disables this operation. Otherwise, this
+		parameter should be equal to write_zeroes_unmap_max_hw_bytes.
+
+
 What:		/sys/block/<disk>/queue/zone_append_max_bytes
 Date:		May 2020
 Contact:	linux-block@vger.kernel.org
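
Userspace can check whether unmap write zeroes is usable on a disk by reading the new attributes. A minimal sketch, assuming the sysfs layout documented above; read_queue_limit() and wzeroes_unmap_enabled() are hypothetical helpers, and a missing or unparsable file is treated as "not enabled".

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one unsigned decimal value from a queue
 * limit file. Returns 0 if the file is missing or unparsable.
 */
static unsigned long long read_queue_limit(const char *path)
{
	unsigned long long val = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (fscanf(f, "%llu", &val) != 1)
		val = 0;
	fclose(f);
	return val;
}

/*
 * Nonzero when the software limit (write_zeroes_unmap_max_bytes) leaves
 * the operation enabled; 0 means disabled by the user or unsupported.
 */
static int wzeroes_unmap_enabled(const char *disk)
{
	char path[256];

	assert(disk != NULL);
	snprintf(path, sizeof(path),
		 "/sys/block/%s/queue/write_zeroes_unmap_max_bytes", disk);
	return read_queue_limit(path) != 0;
}
```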

block/blk-settings.c (+18 -2)
···
 	lim->max_sectors = UINT_MAX;
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
+	lim->max_hw_wzeroes_unmap_sectors = UINT_MAX;
+	lim->max_user_wzeroes_unmap_sectors = UINT_MAX;
 	lim->max_hw_zone_append_sectors = UINT_MAX;
 	lim->max_user_discard_sectors = UINT_MAX;
 }
···
 	if (!lim->max_segments)
 		lim->max_segments = BLK_MAX_SEGMENTS;
 
+	if (lim->max_hw_wzeroes_unmap_sectors &&
+	    lim->max_hw_wzeroes_unmap_sectors != lim->max_write_zeroes_sectors)
+		return -EINVAL;
+	lim->max_wzeroes_unmap_sectors = min(lim->max_hw_wzeroes_unmap_sectors,
+			lim->max_user_wzeroes_unmap_sectors);
+
 	lim->max_discard_sectors =
 		min(lim->max_hw_discard_sectors, lim->max_user_discard_sectors);
 
···
 {
 	/*
 	 * Most defaults are set by capping the bounds in blk_validate_limits,
-	 * but max_user_discard_sectors is special and needs an explicit
-	 * initialization to the max value here.
+	 * but these limits are special and need an explicit initialization to
+	 * the max value here.
 	 */
 	lim->max_user_discard_sectors = UINT_MAX;
+	lim->max_user_wzeroes_unmap_sectors = UINT_MAX;
 	return blk_validate_limits(lim);
 }
 
···
 	t->max_dev_sectors = min_not_zero(t->max_dev_sectors, b->max_dev_sectors);
 	t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
 					b->max_write_zeroes_sectors);
+	t->max_user_wzeroes_unmap_sectors =
+			min(t->max_user_wzeroes_unmap_sectors,
+			    b->max_user_wzeroes_unmap_sectors);
+	t->max_hw_wzeroes_unmap_sectors =
+			min(t->max_hw_wzeroes_unmap_sectors,
+			    b->max_hw_wzeroes_unmap_sectors);
+
 	t->max_hw_zone_append_sectors = min(t->max_hw_zone_append_sectors,
 					b->max_hw_zone_append_sectors);
 
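
The rules added to blk-settings.c boil down to three steps: the hardware limit must be 0 or equal to max_write_zeroes_sectors, the effective limit is the minimum of the hardware and user limits, and stacking takes the minimum of each field. A userspace model of those rules, with a hypothetical struct wz_limits standing in for struct queue_limits; it shows how a single component without unmap support disables the feature for the whole stack (which is also what the dm-table.c change below relies on).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for the relevant queue_limits fields (sectors). */
struct wz_limits {
	unsigned int max_write_zeroes_sectors;
	unsigned int max_hw_wzeroes_unmap_sectors;
	unsigned int max_user_wzeroes_unmap_sectors;
	unsigned int max_wzeroes_unmap_sectors;	/* effective limit */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Models the check added to blk_validate_limits(). */
static int wz_validate(struct wz_limits *lim)
{
	if (lim->max_hw_wzeroes_unmap_sectors &&
	    lim->max_hw_wzeroes_unmap_sectors != lim->max_write_zeroes_sectors)
		return -EINVAL;
	lim->max_wzeroes_unmap_sectors =
		min_u(lim->max_hw_wzeroes_unmap_sectors,
		      lim->max_user_wzeroes_unmap_sectors);
	return 0;
}

/* Models the stacking hunk: top limits become the minimum of top and bottom. */
static void wz_stack(struct wz_limits *t, const struct wz_limits *b)
{
	t->max_write_zeroes_sectors = min_u(t->max_write_zeroes_sectors,
					    b->max_write_zeroes_sectors);
	t->max_user_wzeroes_unmap_sectors =
		min_u(t->max_user_wzeroes_unmap_sectors,
		      b->max_user_wzeroes_unmap_sectors);
	t->max_hw_wzeroes_unmap_sectors =
		min_u(t->max_hw_wzeroes_unmap_sectors,
		      b->max_hw_wzeroes_unmap_sectors);
}
```

A usage example: stacking a device that supports unmap write zeroes with one that does not yields an effective limit of 0 on the stacked device.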

block/blk-sysfs.c (+26)
···
 QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(max_discard_sectors)
 QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(max_hw_discard_sectors)
 QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(max_write_zeroes_sectors)
+QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(max_hw_wzeroes_unmap_sectors)
+QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(max_wzeroes_unmap_sectors)
 QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(atomic_write_max_sectors)
 QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(atomic_write_boundary_sectors)
 QUEUE_SYSFS_LIMIT_SHOW_SECTORS_TO_BYTES(max_zone_append_sectors)
···
 		return -EINVAL;
 
 	lim->max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
+	return 0;
+}
+
+static int queue_max_wzeroes_unmap_sectors_store(struct gendisk *disk,
+		const char *page, size_t count, struct queue_limits *lim)
+{
+	unsigned long max_zeroes_bytes, max_hw_zeroes_bytes;
+	ssize_t ret;
+
+	ret = queue_var_store(&max_zeroes_bytes, page, count);
+	if (ret < 0)
+		return ret;
+
+	max_hw_zeroes_bytes = lim->max_hw_wzeroes_unmap_sectors << SECTOR_SHIFT;
+	if (max_zeroes_bytes != 0 && max_zeroes_bytes != max_hw_zeroes_bytes)
+		return -EINVAL;
+
+	lim->max_user_wzeroes_unmap_sectors = max_zeroes_bytes >> SECTOR_SHIFT;
 	return 0;
 }
 
···
 
 QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
 QUEUE_LIM_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_max_hw_wzeroes_unmap_sectors,
+		"write_zeroes_unmap_max_hw_bytes");
+QUEUE_LIM_RW_ENTRY(queue_max_wzeroes_unmap_sectors,
+		"write_zeroes_unmap_max_bytes");
 QUEUE_LIM_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
 QUEUE_LIM_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
 
···
 	&queue_atomic_write_unit_min_entry.attr,
 	&queue_atomic_write_unit_max_entry.attr,
 	&queue_max_write_zeroes_sectors_entry.attr,
+	&queue_max_hw_wzeroes_unmap_sectors_entry.attr,
+	&queue_max_wzeroes_unmap_sectors_entry.attr,
 	&queue_max_zone_append_sectors_entry.attr,
 	&queue_zone_write_granularity_entry.attr,
 	&queue_rotational_entry.attr,
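
The new store handler accepts only two values. A userspace model of queue_max_wzeroes_unmap_sectors_store(), assuming 512-byte sectors (SECTOR_SHIFT = 9); wz_store() is a hypothetical name and the parsing done by queue_var_store() is left out.

```c
#include <assert.h>
#include <errno.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/*
 * Hypothetical model of the sysfs store: accept 0 (disable) or exactly
 * the hardware limit in bytes; anything else is rejected.
 */
static int wz_store(unsigned int hw_sectors, unsigned long bytes,
		    unsigned int *user_sectors)
{
	/* Widen before shifting so large limits cannot overflow. */
	unsigned long hw_bytes = (unsigned long)hw_sectors << SECTOR_SHIFT;

	if (bytes != 0 && bytes != hw_bytes)
		return -EINVAL;
	*user_sectors = (unsigned int)(bytes >> SECTOR_SHIFT);
	return 0;
}
```

In other words, writing 0 to write_zeroes_unmap_max_bytes disables the operation, writing back the value read from write_zeroes_unmap_max_hw_bytes re-enables it, and any other value fails with EINVAL.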

block/fops.c (+30 -24)
···
 
 #define BLKDEV_FALLOC_FL_SUPPORTED					\
 		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
-		 FALLOC_FL_ZERO_RANGE)
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_WRITE_ZEROES)
 
 static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 			     loff_t len)
···
 	struct block_device *bdev = I_BDEV(inode);
 	loff_t end = start + len - 1;
 	loff_t isize;
+	unsigned int flags;
 	int error;
 
 	/* Fail if we don't recognize the flags. */
 	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+	/*
+	 * Don't allow writing zeroes if the device does not enable the
+	 * unmap write zeroes operation.
+	 */
+	if ((mode & FALLOC_FL_WRITE_ZEROES) &&
+	    !bdev_write_zeroes_unmap_sectors(bdev))
 		return -EOPNOTSUPP;
 
 	/* Don't go off the end of the device. */
···
 	inode_lock(inode);
 	filemap_invalidate_lock(inode->i_mapping);
 
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		flags = BLKDEV_ZERO_NOUNMAP;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		flags = BLKDEV_ZERO_NOFALLBACK;
+		break;
+	case FALLOC_FL_WRITE_ZEROES:
+		flags = 0;
+		break;
+	default:
+		error = -EOPNOTSUPP;
+		goto fail;
+	}
+
 	/*
 	 * Invalidate the page cache, including dirty pages, for valid
 	 * de-allocate mode calls to fallocate().
 	 */
-	switch (mode) {
-	case FALLOC_FL_ZERO_RANGE:
-	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
-		error = truncate_bdev_range(bdev, file_to_blk_mode(file), start, end);
-		if (error)
-			goto fail;
+	error = truncate_bdev_range(bdev, file_to_blk_mode(file), start, end);
+	if (error)
+		goto fail;
 
-		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
-					     len >> SECTOR_SHIFT, GFP_KERNEL,
-					     BLKDEV_ZERO_NOUNMAP);
-		break;
-	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
-		error = truncate_bdev_range(bdev, file_to_blk_mode(file), start, end);
-		if (error)
-			goto fail;
-
-		error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
-					     len >> SECTOR_SHIFT, GFP_KERNEL,
-					     BLKDEV_ZERO_NOFALLBACK);
-		break;
-	default:
-		error = -EOPNOTSUPP;
-	}
-
+	error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
+				     len >> SECTOR_SHIFT, GFP_KERNEL, flags);
  fail:
 	filemap_invalidate_unlock(inode->i_mapping);
 	inode_unlock(inode);
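
After the refactor, blkdev_fallocate() picks the blkdev_issue_zeroout() flags in one switch and then shares the truncate-and-zero path for every mode. A standalone model of that selection; bdev_falloc_flags() is a hypothetical name, the wzeroes_unmap argument stands in for bdev_write_zeroes_unmap_sectors(bdev), and the BLKDEV_ZERO_* values below are local placeholders, not necessarily the kernel's numbers.

```c
#include <assert.h>
#include <errno.h>

/* Mode bits as defined in include/uapi/linux/falloc.h. */
#define FALLOC_FL_KEEP_SIZE	0x01
#define FALLOC_FL_PUNCH_HOLE	0x02
#define FALLOC_FL_ZERO_RANGE	0x10
#define FALLOC_FL_WRITE_ZEROES	0x80

/* Local placeholders for the kernel's zeroout flags (distinct values only). */
#define BLKDEV_ZERO_NOUNMAP	(1 << 0)
#define BLKDEV_ZERO_NOFALLBACK	(1 << 1)

/* Returns the zeroout flags for a mode, or -EOPNOTSUPP. */
static int bdev_falloc_flags(int mode, unsigned int wzeroes_unmap)
{
	if ((mode & FALLOC_FL_WRITE_ZEROES) && !wzeroes_unmap)
		return -EOPNOTSUPP;

	switch (mode) {
	case FALLOC_FL_ZERO_RANGE:
	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
		return BLKDEV_ZERO_NOUNMAP;	/* zero, keep blocks allocated */
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
		return BLKDEV_ZERO_NOFALLBACK;	/* offload only, never write zero pages */
	case FALLOC_FL_WRITE_ZEROES:
		return 0;	/* device may unmap; fall back to writing zeroes */
	default:
		return -EOPNOTSUPP;
	}
}
```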

drivers/md/dm-table.c (+3 -1)
···
 		limits->discard_alignment = 0;
 	}
 
-	if (!dm_table_supports_write_zeroes(t))
+	if (!dm_table_supports_write_zeroes(t)) {
 		limits->max_write_zeroes_sectors = 0;
+		limits->max_hw_wzeroes_unmap_sectors = 0;
+	}
 
 	if (!dm_table_supports_secure_erase(t))
 		limits->max_secure_erase_sectors = 0;

drivers/nvme/host/core.c (+11 -9)
···
 	else
 		lim.write_stream_granularity = 0;
 
+	/*
+	 * Only set the DEAC bit if the device guarantees that reads from
+	 * deallocated data return zeroes. While the DEAC bit does not
+	 * require that, it must be a no-op if reads from deallocated data
+	 * do not return zeroes.
+	 */
+	if ((id->dlfeat & 0x7) == 0x1 && (id->dlfeat & (1 << 3))) {
+		ns->head->features |= NVME_NS_DEAC;
+		lim.max_hw_wzeroes_unmap_sectors = lim.max_write_zeroes_sectors;
+	}
+
 	ret = queue_limits_commit_update(ns->disk->queue, &lim);
 	if (ret) {
 		blk_mq_unfreeze_queue(ns->disk->queue, memflags);
···
 	}
 
 	set_capacity_and_notify(ns->disk, capacity);
-
-	/*
-	 * Only set the DEAC bit if the device guarantees that reads from
-	 * deallocated data return zeroes. While the DEAC bit does not
-	 * require that, it must be a no-op if reads from deallocated data
-	 * do not return zeroes.
-	 */
-	if ((id->dlfeat & 0x7) == 0x1 && (id->dlfeat & (1 << 3)))
-		ns->head->features |= NVME_NS_DEAC;
 	set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
 	set_bit(NVME_NS_READY, &ns->flags);
 	blk_mq_unfreeze_queue(ns->disk->queue, memflags);

drivers/nvme/target/io-cmd-bdev.c (+4)
···
 	id->npda = id->npdg;
 	/* NOWS = Namespace Optimal Write Size */
 	id->nows = to0based(bdev_io_opt(bdev) / bdev_logical_block_size(bdev));
+
+	/* Set WZDS and DRB if device supports unmapped write zeroes */
+	if (bdev_write_zeroes_unmap_sectors(bdev))
+		id->dlfeat = (1 << 3) | 0x1;
 }
 
 void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
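
The host and target hunks are two ends of the same DLFEAT handshake: nvmet advertises WZDS (bit 3) plus a deallocated-read behaviour of 0x1 (reads return zeroes) when its backing device enables unmap write zeroes, and the host only sets NVME_NS_DEAC and max_hw_wzeroes_unmap_sectors when it sees exactly that combination. A small model of both sides; the macro and function names are hypothetical, and it assumes the target's dlfeat field is otherwise left at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the DLFEAT bits tested in the hunks above. */
#define DLFEAT_DRB_MASK		0x7		/* deallocated read behaviour */
#define DLFEAT_DRB_ZEROES	0x1		/* deallocated blocks read as zeroes */
#define DLFEAT_WZDS		(1 << 3)	/* write zeroes may deallocate */

/* Target side: what nvmet reports for a given backing-device limit. */
static uint8_t nvmet_dlfeat(unsigned int wzeroes_unmap_sectors)
{
	return wzeroes_unmap_sectors ? (DLFEAT_WZDS | DLFEAT_DRB_ZEROES) : 0;
}

/* Host side: the condition that gates NVME_NS_DEAC. */
static bool nvme_can_deac(uint8_t dlfeat)
{
	return (dlfeat & DLFEAT_DRB_MASK) == DLFEAT_DRB_ZEROES &&
	       (dlfeat & DLFEAT_WZDS);
}
```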

drivers/scsi/sd.c (+5)
···
 out:
 	lim->max_write_zeroes_sectors =
 		sdkp->max_ws_blocks * (logical_block_size >> SECTOR_SHIFT);
+
+	if (sdkp->zeroing_mode == SD_ZERO_WS16_UNMAP ||
+	    sdkp->zeroing_mode == SD_ZERO_WS10_UNMAP)
+		lim->max_hw_wzeroes_unmap_sectors =
+				lim->max_write_zeroes_sectors;
 }
 
 static blk_status_t sd_setup_flush_cmnd(struct scsi_cmnd *cmd)

fs/ext4/extents.c (+55 -11)
···
 	struct ext4_map_blocks map;
 	unsigned int credits;
 	loff_t epos, old_size = i_size_read(inode);
+	unsigned int blkbits = inode->i_blkbits;
+	bool alloc_zero = false;
 
 	BUG_ON(!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS));
 	map.m_lblk = offset;
···
 	 */
 	if (len <= EXT_UNWRITTEN_MAX_LEN)
 		flags |= EXT4_GET_BLOCKS_NO_NORMALIZE;
+
+	/*
+	 * Do the actual write zero during a running journal transaction
+	 * costs a lot. First allocate an unwritten extent and then
+	 * convert it to written after zeroing it out.
+	 */
+	if (flags & EXT4_GET_BLOCKS_ZERO) {
+		flags &= ~EXT4_GET_BLOCKS_ZERO;
+		flags |= EXT4_GET_BLOCKS_UNWRIT_EXT;
+		alloc_zero = true;
+	}
 
 	/*
 	 * credits to insert 1 extent into extent tree
···
 		 * allow a full retry cycle for any remaining allocations
 		 */
 		retries = 0;
-		map.m_lblk += ret;
-		map.m_len = len = len - ret;
-		epos = (loff_t)map.m_lblk << inode->i_blkbits;
+		epos = (loff_t)(map.m_lblk + ret) << blkbits;
 		inode_set_ctime_current(inode);
 		if (new_size) {
 			if (epos > new_size)
···
 		ret2 = ret3 ? ret3 : ret2;
 		if (unlikely(ret2))
 			break;
+
+		if (alloc_zero &&
+		    (map.m_flags & (EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN))) {
+			ret2 = ext4_issue_zeroout(inode, map.m_lblk, map.m_pblk,
+						  map.m_len);
+			if (likely(!ret2))
+				ret2 = ext4_convert_unwritten_extents(NULL,
+					inode, (loff_t)map.m_lblk << blkbits,
+					(loff_t)map.m_len << blkbits);
+			if (ret2)
+				break;
+		}
+
+		map.m_lblk += ret;
+		map.m_len = len = len - ret;
 	}
 	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
 		goto retry;
···
 	if (end_lblk > start_lblk) {
 		ext4_lblk_t zero_blks = end_lblk - start_lblk;
 
-		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN | EXT4_EX_NOCACHE);
+		if (mode & FALLOC_FL_WRITE_ZEROES)
+			flags = EXT4_GET_BLOCKS_CREATE_ZERO | EXT4_EX_NOCACHE;
+		else
+			flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN |
+				  EXT4_EX_NOCACHE);
 		ret = ext4_alloc_file_blocks(file, start_lblk, zero_blks,
 					     new_size, flags);
 		if (ret)
···
 	if (IS_ENCRYPTED(inode) &&
 	    (mode & (FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_INSERT_RANGE)))
 		return -EOPNOTSUPP;
+	/*
+	 * Don't allow writing zeroes if the underlying device does not
+	 * enable the unmap write zeroes operation.
+	 */
+	if ((mode & FALLOC_FL_WRITE_ZEROES) &&
+	    !bdev_write_zeroes_unmap_sectors(inode->i_sb->s_bdev))
+		return -EOPNOTSUPP;
 
 	/* Return error if mode is not supported */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
-		     FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
-		     FALLOC_FL_INSERT_RANGE))
+		     FALLOC_FL_ZERO_RANGE | FALLOC_FL_COLLAPSE_RANGE |
+		     FALLOC_FL_INSERT_RANGE | FALLOC_FL_WRITE_ZEROES))
 		return -EOPNOTSUPP;
 
 	inode_lock(inode);
···
 	if (ret)
 		goto out_invalidate_lock;
 
-	if (mode & FALLOC_FL_PUNCH_HOLE)
+	switch (mode & FALLOC_FL_MODE_MASK) {
+	case FALLOC_FL_PUNCH_HOLE:
 		ret = ext4_punch_hole(file, offset, len);
-	else if (mode & FALLOC_FL_COLLAPSE_RANGE)
+		break;
+	case FALLOC_FL_COLLAPSE_RANGE:
 		ret = ext4_collapse_range(file, offset, len);
-	else if (mode & FALLOC_FL_INSERT_RANGE)
+		break;
+	case FALLOC_FL_INSERT_RANGE:
 		ret = ext4_insert_range(file, offset, len);
-	else if (mode & FALLOC_FL_ZERO_RANGE)
+		break;
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_WRITE_ZEROES:
 		ret = ext4_zero_range(file, offset, len, mode);
-	else
+		break;
+	default:
 		ret = -EOPNOTSUPP;
+	}
 
 out_invalidate_lock:
 	filemap_invalidate_unlock(mapping);
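
To make the metadata argument from the pull request concrete, here is a deliberately tiny toy model, not ext4 code: each block is a hole, unwritten or written, and the model counts how many blocks of a later overwrite still need an extent-tree update. All names are hypothetical; real ext4 tracks extents rather than individual blocks and performs these updates inside journal transactions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-block extent states. */
enum blk_state { HOLE, UNWRITTEN, WRITTEN };

/* FALLOC_FL_ZERO_RANGE-style: blocks end up allocated but unwritten. */
static void toy_zero_range(enum blk_state *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		b[i] = UNWRITTEN;
}

/*
 * FALLOC_FL_WRITE_ZEROES-style, following the ext4_alloc_file_blocks()
 * change: allocate unwritten, zero on disk (I/O not modelled), then
 * convert to written.
 */
static void toy_write_zeroes(enum blk_state *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		b[i] = UNWRITTEN;	/* EXT4_GET_BLOCKS_UNWRIT_EXT */
	/* ext4_issue_zeroout() would run here. */
	for (size_t i = 0; i < n; i++)
		b[i] = WRITTEN;		/* ext4_convert_unwritten_extents() */
}

/*
 * Overwrite every block; return how many writes needed an extent-tree
 * update (allocation of a hole or unwritten-to-written conversion).
 */
static size_t toy_overwrite(enum blk_state *b, size_t n)
{
	size_t updates = 0;

	for (size_t i = 0; i < n; i++) {
		if (b[i] != WRITTEN)
			updates++;
		b[i] = WRITTEN;
	}
	return updates;
}
```

With FALLOC_FL_ZERO_RANGE every block of the first overwrite needs a conversion; with FALLOC_FL_WRITE_ZEROES that cost is paid once, up front, outside the later write path.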

fs/open.c (+1)
···
 		break;
 	case FALLOC_FL_COLLAPSE_RANGE:
 	case FALLOC_FL_INSERT_RANGE:
+	case FALLOC_FL_WRITE_ZEROES:
 		if (mode & FALLOC_FL_KEEP_SIZE)
 			return -EOPNOTSUPP;
 		break;
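
The fs/open.c hunk adds FALLOC_FL_WRITE_ZEROES to the modes that reject FALLOC_FL_KEEP_SIZE, so a write-zeroes call that extends past EOF always grows the file. A simplified userspace model of the mode validation; only the WRITE_ZEROES/KEEP_SIZE rule comes from the hunk, while the remaining cases are assumptions about the surrounding vfs_fallocate() code, which the hunk shows only in part. check_falloc_mode() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>

/* Mode bits from include/uapi/linux/falloc.h. */
#define FALLOC_FL_KEEP_SIZE		0x01
#define FALLOC_FL_PUNCH_HOLE		0x02
#define FALLOC_FL_COLLAPSE_RANGE	0x08
#define FALLOC_FL_ZERO_RANGE		0x10
#define FALLOC_FL_INSERT_RANGE		0x20
#define FALLOC_FL_UNSHARE_RANGE		0x40
#define FALLOC_FL_WRITE_ZEROES		0x80

/* Mirrors FALLOC_FL_MODE_MASK as extended in include/linux/falloc.h. */
#define MODE_MASK (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE | \
		   FALLOC_FL_ZERO_RANGE | FALLOC_FL_INSERT_RANGE | \
		   FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_WRITE_ZEROES)

/* Returns 0 if the mode combination is acceptable, else -EOPNOTSUPP. */
static int check_falloc_mode(int mode)
{
	if (mode & ~(MODE_MASK | FALLOC_FL_KEEP_SIZE))
		return -EOPNOTSUPP;

	switch (mode & MODE_MASK) {
	case 0:				/* plain preallocation */
	case FALLOC_FL_UNSHARE_RANGE:
	case FALLOC_FL_ZERO_RANGE:
		return 0;
	case FALLOC_FL_PUNCH_HOLE:
		return (mode & FALLOC_FL_KEEP_SIZE) ? 0 : -EOPNOTSUPP;
	case FALLOC_FL_COLLAPSE_RANGE:
	case FALLOC_FL_INSERT_RANGE:
	case FALLOC_FL_WRITE_ZEROES:
		return (mode & FALLOC_FL_KEEP_SIZE) ? -EOPNOTSUPP : 0;
	default:			/* more than one mode bit set */
		return -EOPNOTSUPP;
	}
}
```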

include/linux/blkdev.h (+10)
···
 	unsigned int max_user_discard_sectors;
 	unsigned int max_secure_erase_sectors;
 	unsigned int max_write_zeroes_sectors;
+	unsigned int max_wzeroes_unmap_sectors;
+	unsigned int max_hw_wzeroes_unmap_sectors;
+	unsigned int max_user_wzeroes_unmap_sectors;
 	unsigned int max_hw_zone_append_sectors;
 	unsigned int max_zone_append_sectors;
 	unsigned int discard_granularity;
···
 static inline void blk_queue_disable_write_zeroes(struct request_queue *q)
 {
 	q->limits.max_write_zeroes_sectors = 0;
+	q->limits.max_wzeroes_unmap_sectors = 0;
 }
 
 /*
···
 static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
 {
 	return bdev_limits(bdev)->max_write_zeroes_sectors;
+}
+
+static inline unsigned int
+bdev_write_zeroes_unmap_sectors(struct block_device *bdev)
+{
+	return bdev_limits(bdev)->max_wzeroes_unmap_sectors;
 }
 
 static inline bool bdev_nonrot(struct block_device *bdev)

include/linux/falloc.h (+2 -1)
···
 			 FALLOC_FL_COLLAPSE_RANGE |	\
 			 FALLOC_FL_ZERO_RANGE |		\
 			 FALLOC_FL_INSERT_RANGE |	\
-			 FALLOC_FL_UNSHARE_RANGE)
+			 FALLOC_FL_UNSHARE_RANGE |	\
+			 FALLOC_FL_WRITE_ZEROES)
 
 /* on ia32 l_start is on a 32-bit boundary */
 #if defined(CONFIG_X86_64)

include/trace/events/ext4.h (+2 -1)
···
 	{ FALLOC_FL_KEEP_SIZE,		"KEEP_SIZE"},		\
 	{ FALLOC_FL_PUNCH_HOLE,		"PUNCH_HOLE"},		\
 	{ FALLOC_FL_COLLAPSE_RANGE,	"COLLAPSE_RANGE"},	\
-	{ FALLOC_FL_ZERO_RANGE,		"ZERO_RANGE"})
+	{ FALLOC_FL_ZERO_RANGE,		"ZERO_RANGE"},		\
+	{ FALLOC_FL_WRITE_ZEROES,	"WRITE_ZEROES"})
 
 TRACE_DEFINE_ENUM(EXT4_FC_REASON_XATTR);
 TRACE_DEFINE_ENUM(EXT4_FC_REASON_CROSS_RENAME);

include/uapi/linux/falloc.h (+17)
···
  */
 #define FALLOC_FL_UNSHARE_RANGE		0x40
 
+/*
+ * FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a way that
+ * subsequent writes to that range do not require further changes to the file
+ * mapping metadata. This flag is beneficial for subsequent pure overwriting
+ * within this range, as it can save on block allocation and, consequently,
+ * significant metadata changes. Therefore, filesystems that always require
+ * out-of-place writes should not support this flag.
+ *
+ * Different filesystems may implement different limitations on the
+ * granularity of the zeroing operation. Most will preferably be accelerated
+ * by submitting write zeroes command if the backing storage supports, which
+ * may not physically write zeros to the media.
+ *
+ * This flag cannot be specified in conjunction with the FALLOC_FL_KEEP_SIZE.
+ */
+#define FALLOC_FL_WRITE_ZEROES		0x80
+
 #endif /* _UAPI_FALLOC_H_ */