Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

xfs: add tunable threshold parameter for triggering zone GC

Presently we start garbage collection late: when we start running
out of free zones to backfill max_open_zones. This is a reasonable
default as it minimizes write amplification. The longer we wait,
the more blocks are invalidated and the less reclaim costs in terms
of blocks to relocate.

Starting this late, however, introduces a risk of GC being outcompeted
by user writes. If GC can't keep up, user writes will be forced to
wait for free zones, resulting in high tail latencies.

This is not a problem under normal circumstances, but if fragmentation
is bad and user write pressure is high (multiple full-throttle
writers), we will "bottom out" of free zones.

To mitigate this, introduce a zonegc_low_space tunable that lets the
user specify what percentage of the unused space GC should keep
available for writing. A high value will reclaim more of the space
occupied by unused blocks, creating a larger buffer against write
bursts.

This comes at the cost of increased write amplification. To
illustrate with a sample workload: setting zonegc_low_space to 60%
avoids high (500ms) max latencies while increasing write
amplification by 15%.

Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Authored by Hans Holmberg and committed by Carlos Maiolino.
845abeb1 a1a56f54

5 files changed, 75 insertions(+), 2 deletions(-)
Documentation/admin-guide/xfs.rst (+21)

```diff
@@ -542,3 +542,24 @@
   nice             Relative priority of scheduling the threads. These are the
                    same nice levels that can be applied to userspace processes.
   ============     ===========
+
+Zoned Filesystems
+=================
+
+For zoned file systems, the following attributes are exposed in:
+
+  /sys/fs/xfs/<dev>/zoned/
+
+  max_open_zones                (Min: 1 Default: Varies Max: UINTMAX)
+        This read-only attribute exposes the maximum number of open zones
+        available for data placement. The value is determined at mount time and
+        is limited by the capabilities of the backing zoned device, file system
+        size and the max_open_zones mount option.
+
+  zonegc_low_space              (Min: 0 Default: 0 Max: 100)
+        Define a percentage for how much of the unused space that GC should keep
+        available for writing. A high value will reclaim more of the space
+        occupied by unused blocks, creating a larger buffer against write
+        bursts at the cost of increased write amplification. Regardless
+        of this value, garbage collection will always aim to free a minimum
+        amount of blocks to keep max_open_zones open for data placement purposes.
```
fs/xfs/xfs_mount.h (+1)

```diff
@@ -229,6 +229,7 @@
 	bool			m_finobt_nores; /* no per-AG finobt resv. */
 	bool			m_update_sb;	/* sb needs update in mount */
 	unsigned int		m_max_open_zones;
+	unsigned int		m_zonegc_low_space;
 
 	/*
 	 * Bitsets of per-fs metadata that have been checked and/or are sick.
```
fs/xfs/xfs_sysfs.c (+32)

```diff
@@ -718,8 +718,40 @@
 }
 XFS_SYSFS_ATTR_RO(max_open_zones);
 
+static ssize_t
+zonegc_low_space_store(
+	struct kobject		*kobj,
+	const char		*buf,
+	size_t			count)
+{
+	int			ret;
+	unsigned int		val;
+
+	ret = kstrtouint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	if (val > 100)
+		return -EINVAL;
+
+	zoned_to_mp(kobj)->m_zonegc_low_space = val;
+
+	return count;
+}
+
+static ssize_t
+zonegc_low_space_show(
+	struct kobject		*kobj,
+	char			*buf)
+{
+	return sysfs_emit(buf, "%u\n",
+			zoned_to_mp(kobj)->m_zonegc_low_space);
+}
+XFS_SYSFS_ATTR_RW(zonegc_low_space);
+
 static struct attribute *xfs_zoned_attrs[] = {
 	ATTR_LIST(max_open_zones),
+	ATTR_LIST(zonegc_low_space),
 	NULL,
 };
 ATTRIBUTE_GROUPS(xfs_zoned);
```
fs/xfs/xfs_zone_alloc.c (+7)

```diff
@@ -1201,6 +1201,13 @@
 	xfs_set_freecounter(mp, XC_FREE_RTEXTENTS,
 			iz.available + iz.reclaimable);
 
+	/*
+	 * The user may configure GC to free up a percentage of unused blocks.
+	 * By default this is 0. GC will always trigger at the minimum level
+	 * for keeping max_open_zones available for data placement.
+	 */
+	mp->m_zonegc_low_space = 0;
+
 	error = xfs_zone_gc_mount(mp);
 	if (error)
 		goto out_free_zone_info;
```
fs/xfs/xfs_zone_gc.c (+14 -2)

```diff
@@ -162,18 +162,30 @@
 
 /*
  * We aim to keep enough zones free in stock to fully use the open zone limit
- * for data placement purposes.
+ * for data placement purposes. Additionally, the m_zonegc_low_space tunable
+ * can be set to make sure a fraction of the unused blocks are available for
+ * writing.
  */
 bool
 xfs_zoned_need_gc(
 	struct xfs_mount	*mp)
 {
+	s64			available, free;
+
 	if (!xfs_group_marked(mp, XG_TYPE_RTG, XFS_RTG_RECLAIMABLE))
 		return false;
-	if (xfs_estimate_freecounter(mp, XC_FREE_RTAVAILABLE) <
+
+	available = xfs_estimate_freecounter(mp, XC_FREE_RTAVAILABLE);
+
+	if (available <
 	    mp->m_groups[XG_TYPE_RTG].blocks *
 	    (mp->m_max_open_zones - XFS_OPEN_GC_ZONES))
 		return true;
+
+	free = xfs_estimate_freecounter(mp, XC_FREE_RTEXTENTS);
+	if (available < mult_frac(free, mp->m_zonegc_low_space, 100))
+		return true;
+
 	return false;
 }
 
```