Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/page_alloc: don't increase highatomic reserve after pcp alloc

Higher-order GFP_ATOMIC allocations can be served from a PCP list with
ALLOC_HIGHATOMIC set. This can happen, e.g., if a zone's free memory is
between the low and min watermarks, and get_page_from_freelist() is
retried after the alloc_flags are relaxed.

The call to reserve_highatomic_pageblock() after such a PCP allocation
grows the highatomic reserve every single time: the page from the
(unmovable) PCP list will never have migrate type MIGRATE_HIGHATOMIC,
since MIGRATE_HIGHATOMIC pages do not appear on the unmovable PCP list.
So each time, a new pageblock is converted to MIGRATE_HIGHATOMIC.

Eventually this leads to the maximum of 1% of the zone being used up by
(often mostly free) MIGRATE_HIGHATOMIC pageblocks, for no good reason.
Since this space is not available for normal allocations, this wastes
memory and pushes the system into reclaim too soon.

This was observed on a system that ran a test with bursts of memory
activity, paired with GFP_ATOMIC SLUB activity. These would lead to a new
slab being allocated with GFP_ATOMIC, sometimes hitting the
get_page_from_freelist retry path by being below the low watermark. While
the frequency of those allocations was low, it kept adding up over time,
and the number of MIGRATE_HIGHATOMIC pageblocks kept increasing.

If a higher-order atomic allocation can be served by the unmovable PCP
list, there is probably no need to extend the reserves yet. So, move the
check and possible extension of the highatomic reserves to the buddy case
only, and do not refill the PCP list for ALLOC_HIGHATOMIC if it is empty.
This way, ALLOC_HIGHATOMIC allocations still try the PCP list first for a
fast atomic allocation, but fall back immediately to rmqueue_buddy() if
the list is empty. In rmqueue_buddy(), the MIGRATE_HIGHATOMIC buddy lists
are tried first (as before), and the reserves are extended only if that
fails.

With this change, the test was stable. Highatomic reserves were built up,
but only to a normal level. No highatomic failures were seen.

This is similar to the patch proposed in [1] by Zhiguo Jiang, but
re-arranged a bit.

Link: https://lkml.kernel.org/r/20260320173426.1831267-1-fvdl@google.com
Link: https://lore.kernel.org/all/20231122013925.1507-1-justinjiang@vivo.com/ [1]
Fixes: 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
Signed-off-by: Frank van der Linden <fvdl@google.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zhiguo Jiang <justinjiang@vivo.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Frank van der Linden and committed by Andrew Morton
b480cbb0 d885a076

+23 -7
mm/page_alloc.c
```diff
@@ -207,6 +207,8 @@
 
 static void __free_pages_ok(struct page *page, unsigned int order,
                             fpi_t fpi_flags);
+static void reserve_highatomic_pageblock(struct page *page, int order,
+                                         struct zone *zone);
 
 /*
  * results with 256, 32 in the lowmem_reserve sysctl:
@@ -3241,6 +3239,13 @@
                 spin_unlock_irqrestore(&zone->lock, flags);
         } while (check_new_pages(page, order));
 
+        /*
+         * If this is a high-order atomic allocation then check
+         * if the pageblock should be reserved for the future
+         */
+        if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
+                reserve_highatomic_pageblock(page, order, zone);
+
         __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
         zone_statistics(preferred_zone, zone, 1);
 
@@ -3318,6 +3309,20 @@
                 if (list_empty(list)) {
                         int batch = nr_pcp_alloc(pcp, zone, order);
                         int alloced;
+
+                        /*
+                         * Don't refill the list for a higher order atomic
+                         * allocation under memory pressure, as this would
+                         * not build up any HIGHATOMIC reserves, which
+                         * might be needed soon.
+                         *
+                         * Instead, direct it towards the reserves by
+                         * returning NULL, which will make the caller fall
+                         * back to rmqueue_buddy. This will try to use the
+                         * reserves first and grow them if needed.
+                         */
+                        if (alloc_flags & ALLOC_HIGHATOMIC)
+                                return NULL;
 
                         alloced = rmqueue_bulk(zone, order,
                                         batch, list,
@@ -3946,13 +3923,6 @@
                                 gfp_mask, alloc_flags, ac->migratetype);
                 if (page) {
                         prep_new_page(page, order, gfp_mask, alloc_flags);
-
-                        /*
-                         * If this is a high-order atomic allocation then check
-                         * if the pageblock should be reserved for the future
-                         */
-                        if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
-                                reserve_highatomic_pageblock(page, order, zone);
 
                         return page;
                 } else {
```