Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: rename huge_zero_page to huge_zero_folio

Patch series "add persistent huge zero folio support", v3.

Many places in the kernel need to zero out larger chunks, but the maximum
segment we can zero out at a time with ZERO_PAGE is limited to PAGE_SIZE.

This concern was raised during the review of adding Large Block Size
support to XFS[1][2].

This is especially annoying in block devices and filesystems where
multiple ZERO_PAGEs are attached to the bio in different bvecs. With
multipage bvec support in the block layer, it is much more efficient to
send out larger zero pages as part of a single bvec.
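
To make the segment arithmetic concrete, here is a minimal userspace sketch. It is not kernel code. The sizes (4 KiB base pages, a 2 MiB PMD-sized huge zero folio) and the helper name zero_segments_needed() are assumptions for illustration only; real sizes depend on the architecture and configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for illustration: 4 KiB base page, 2 MiB huge zero folio. */
#define EX_PAGE_SIZE      ((size_t)4096)
#define EX_HUGE_ZERO_SIZE ((size_t)2 * 1024 * 1024)

/*
 * Hypothetical helper: number of segments needed to cover len bytes of
 * zeroes when each segment can reference at most seg_size bytes
 * (division rounded up).
 */
static size_t zero_segments_needed(size_t len, size_t seg_size)
{
	assert(seg_size > 0);
	return len / seg_size + (len % seg_size != 0);
}
```

Under these assumptions, zeroing 8 MiB takes 2048 page-sized segments but only 4 segments backed by the huge zero folio.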

Some examples of places in the kernel where this could be useful:
- blkdev_issue_zero_pages()
- iomap_dio_zero()
- vmalloc.c:zero_iter()
- rxperf_process_call()
- fscrypt_zeroout_range_inline_crypt()
- bch2_checksum_update()
...

Usually huge_zero_folio is allocated on demand, and it will be deallocated
by the shrinker if there are no users of it left. At the moment, the
huge_zero_folio refcount is tied to the lifetime of the process that
created it. This might not work for the bio layer as the completions can
be async and the process that created the huge_zero_folio might no longer
be alive. Also, one of the main points that came up during discussion is
to have something bigger than the zero page as a drop-in replacement.
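
The on-demand lifecycle described above can be modelled in userspace roughly as follows. This is a simplified sketch and not the kernel implementation: the model_* names are hypothetical, C11 atomics and calloc() stand in for the kernel primitives, the 2 MiB size is an assumption, and the per-mm bookkeeping that ties a reference to a process is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_ZERO_SIZE ((size_t)2 * 1024 * 1024) /* assumed 2 MiB */

static atomic_int model_refcount;   /* 0 means "not allocated" */
static void *_Atomic model_zero_buf;

/* Take a reference, allocating the zero buffer on first use. */
static bool model_get(void)
{
	for (;;) {
		int old = atomic_load(&model_refcount);

		/* Fast path: increment only if the count is non-zero. */
		while (old != 0) {
			if (atomic_compare_exchange_weak(&model_refcount,
							 &old, old + 1))
				return true;
		}

		void *buf = calloc(1, MODEL_ZERO_SIZE);
		if (!buf)
			return false;

		void *expected = NULL;
		if (!atomic_compare_exchange_strong(&model_zero_buf,
						    &expected, buf)) {
			free(buf);	/* lost the race, retry */
			continue;
		}

		/* One reference for the caller, one kept for the shrinker. */
		atomic_store(&model_refcount, 2);
		return true;
	}
}

/* Drop a user reference; users never drop the last one. */
static void model_put(void)
{
	int prev = atomic_fetch_sub(&model_refcount, 1);
	assert(prev > 1);
}

/* Shrinker analogue: free only if the extra reference is the last one. */
static size_t model_shrink(void)
{
	int one = 1;

	if (!atomic_compare_exchange_strong(&model_refcount, &one, 0))
		return 0;	/* still in use, or never allocated */
	free(atomic_exchange(&model_zero_buf, NULL));
	return MODEL_ZERO_SIZE;
}
```

In this model a caller that finishes asynchronously has to keep its reference until completion, which is the lifetime coupling the paragraph above describes as a poor fit for the bio layer.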

Add a config option PERSISTENT_HUGE_ZERO_FOLIO that will always allocate
the huge_zero_folio, and disable the shrinker so that huge_zero_folio is
never freed. This makes it possible to use the huge_zero_folio without
passing any mm struct and does not tie the lifetime of the zero folio to
anything, making it a drop-in replacement for ZERO_PAGE.
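
As a rough userspace analogue of the persistent mode, the sketch below allocates the zero buffer once and never frees it, so the getter needs neither an owner argument nor a matching put. The persist_* names and the 2 MiB size are hypothetical, and this is not how the series implements the option in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PERSIST_ZERO_SIZE ((size_t)2 * 1024 * 1024) /* assumed 2 MiB */

/* Set once by persist_zero_init() and never freed afterwards. */
static const unsigned char *persist_zero_buf;

/* Hypothetical init hook: allocate the zero buffer once, up front. */
static int persist_zero_init(void)
{
	if (!persist_zero_buf)
		persist_zero_buf = calloc(1, PERSIST_ZERO_SIZE);
	return persist_zero_buf ? 0 : -1;
}

/*
 * The getter takes no owner argument and has no matching put: the
 * buffer outlives every caller, including asynchronous completions.
 */
static const unsigned char *persist_zero_get(void)
{
	return persist_zero_buf;
}
```

The trade-off in this model is that the buffer stays allocated even while nothing is using it, in exchange for callers that no longer track references.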

I have converted blkdev_issue_zero_pages() as an example as part of this
series. I also noticed a close to 4% performance improvement just by
replacing ZERO_PAGE with the persistent huge_zero_folio.

I will send patches to individual subsystems using the huge_zero_folio
once this gets upstreamed.

[1] https://lore.kernel.org/linux-xfs/20231027051847.GA7885@lst.de/
[2] https://lore.kernel.org/linux-xfs/ZitIK5OnR7ZNY0IG@infradead.org/


As the transition already happened from exposing huge_zero_page to
huge_zero_folio, change the name of the shrinker and the other helper
function to reflect that.

No functional changes.

Link: https://lkml.kernel.org/r/20250811084113.647267-1-kernel@pankajraghav.com
Link: https://lkml.kernel.org/r/20250811084113.647267-2-kernel@pankajraghav.com
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kiryl Shutsemau <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by Pankaj Raghav and committed by Andrew Morton
b912586b 4c89792e

+17 -17
mm/huge_memory.c
···
 	return orders;
 }

-static bool get_huge_zero_page(void)
+static bool get_huge_zero_folio(void)
 {
 	struct folio *zero_folio;
 retry:
···
 	return true;
 }

-static void put_huge_zero_page(void)
+static void put_huge_zero_folio(void)
 {
 	/*
 	 * Counter should never go to zero here. Only shrinker can put
···
 	if (test_bit(MMF_HUGE_ZERO_PAGE, &mm->flags))
 		return READ_ONCE(huge_zero_folio);

-	if (!get_huge_zero_page())
+	if (!get_huge_zero_folio())
 		return NULL;

 	if (test_and_set_bit(MMF_HUGE_ZERO_PAGE, &mm->flags))
-		put_huge_zero_page();
+		put_huge_zero_folio();

 	return READ_ONCE(huge_zero_folio);
 }
···
 void mm_put_huge_zero_folio(struct mm_struct *mm)
 {
 	if (test_bit(MMF_HUGE_ZERO_PAGE, &mm->flags))
-		put_huge_zero_page();
+		put_huge_zero_folio();
 }

-static unsigned long shrink_huge_zero_page_count(struct shrinker *shrink,
-						 struct shrink_control *sc)
+static unsigned long shrink_huge_zero_folio_count(struct shrinker *shrink,
+						  struct shrink_control *sc)
 {
 	/* we can free zero page only if last reference remains */
 	return atomic_read(&huge_zero_refcount) == 1 ? HPAGE_PMD_NR : 0;
 }

-static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
-						struct shrink_control *sc)
+static unsigned long shrink_huge_zero_folio_scan(struct shrinker *shrink,
+						 struct shrink_control *sc)
 {
 	if (atomic_cmpxchg(&huge_zero_refcount, 1, 0) == 1) {
 		struct folio *zero_folio = xchg(&huge_zero_folio, NULL);
···
 	return 0;
 }

-static struct shrinker *huge_zero_page_shrinker;
+static struct shrinker *huge_zero_folio_shrinker;

 #ifdef CONFIG_SYSFS
 static ssize_t enabled_show(struct kobject *kobj,
···

 static int __init thp_shrinker_init(void)
 {
-	huge_zero_page_shrinker = shrinker_alloc(0, "thp-zero");
-	if (!huge_zero_page_shrinker)
+	huge_zero_folio_shrinker = shrinker_alloc(0, "thp-zero");
+	if (!huge_zero_folio_shrinker)
 		return -ENOMEM;

 	deferred_split_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE |
···
 						 SHRINKER_NONSLAB,
 						 "thp-deferred_split");
 	if (!deferred_split_shrinker) {
-		shrinker_free(huge_zero_page_shrinker);
+		shrinker_free(huge_zero_folio_shrinker);
 		return -ENOMEM;
 	}

-	huge_zero_page_shrinker->count_objects = shrink_huge_zero_page_count;
-	huge_zero_page_shrinker->scan_objects = shrink_huge_zero_page_scan;
-	shrinker_register(huge_zero_page_shrinker);
+	huge_zero_folio_shrinker->count_objects = shrink_huge_zero_folio_count;
+	huge_zero_folio_shrinker->scan_objects = shrink_huge_zero_folio_scan;
+	shrinker_register(huge_zero_folio_shrinker);

 	deferred_split_shrinker->count_objects = deferred_split_count;
 	deferred_split_shrinker->scan_objects = deferred_split_scan;
···

 static void __init thp_shrinker_exit(void)
 {
-	shrinker_free(huge_zero_page_shrinker);
+	shrinker_free(huge_zero_folio_shrinker);
 	shrinker_free(deferred_split_shrinker);
 }