Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/page_alloc: batch page freeing in free_frozen_page_commit

Before returning, free_frozen_page_commit calls free_pcppages_bulk using
nr_pcp_free to determine how many pages can appropriately be freed, based
on the tunable parameters stored in pcp. While this number is an accurate
representation of how many pages should be freed in total, it is not an
appropriate number of pages to free at once using free_pcppages_bulk,
since we have seen the value consistently go above 2000 in the Meta fleet
on larger machines.

As such, free the pages in batches of at most pcp->batch per call to
free_pcppages_bulk. To ensure that other processes are not starved of
the zone lock, release both the zone lock and the pcp lock between
batches to yield to other threads.
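
The batching described above can be sketched in userspace C. This is only an illustration under stated assumptions: struct demo_pcp, demo_free_bulk and demo_free_batched are hypothetical names, a C11 atomic_flag stands in for the pcp spinlock, and the zone lock is not modelled at all. The real loop appears in the diff further down.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-CPU page list; not the kernel's struct. */
struct demo_pcp {
	atomic_flag lock;  /* set while the list is locked */
	int count;         /* pages currently held on the list */
	int batch;         /* upper bound on pages freed per lock hold */
	int max_chunk;     /* largest single chunk freed, kept for inspection */
};

static int demo_min(int a, int b)
{
	return a < b ? a : b;
}

/* Free up to nr pages. The caller must hold pcp->lock. */
static void demo_free_bulk(struct demo_pcp *pcp, int nr)
{
	nr = demo_min(nr, pcp->count);
	pcp->count -= nr;
	if (nr > pcp->max_chunk)
		pcp->max_chunk = nr;
}

/*
 * Free to_free pages in chunks of at most pcp->batch, dropping the lock
 * between chunks so that other threads get a chance to take it.
 * The caller holds pcp->lock on entry. Returns true if the lock is still
 * held on exit, false if it could not be retaken.
 */
static bool demo_free_batched(struct demo_pcp *pcp, int to_free)
{
	assert(pcp->batch > 0);

	while (to_free > 0 && pcp->count > 0) {
		int chunk = demo_min(to_free, pcp->batch);

		demo_free_bulk(pcp, chunk);
		to_free -= chunk;
		if (to_free == 0 || pcp->count == 0)
			break;

		/* Yield: drop the lock, then try once to take it back. */
		atomic_flag_clear(&pcp->lock);
		if (atomic_flag_test_and_set(&pcp->lock))
			return false;
	}
	return true;
}
```

With a request of 2000 pages and an illustrative batch of 63 (a value picked for the sketch, not taken from the commit), no single lock hold frees more than 63 pages, while the total freed is unchanged.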

Note that because free_frozen_page_commit now reacquires the pcp spinlock
with a trylock inside the function (which can fail), the function may now
return with an unlocked pcp. To handle this, return true if the pcp is
locked on exit and false otherwise.
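
A minimal sketch of the resulting caller contract follows. It assumes hypothetical helpers (demo_trylock, demo_unlock, demo_commit, demo_caller) and a simulate_contention switch that forces the reacquire to fail; none of these exist in the kernel. The point it shows is that the caller unlocks only when the callee reports the lock as still held.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock with an unlock counter, used only for this sketch. */
struct demo_lock {
	atomic_flag held;
	int unlocks;  /* number of unlock operations performed so far */
};

static bool demo_trylock(struct demo_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void demo_unlock(struct demo_lock *l)
{
	atomic_flag_clear(&l->held);
	l->unlocks++;
}

/*
 * Entered with the lock held. Drops it once to yield, then tries to take
 * it back. Returns true if the lock is held on exit, false otherwise.
 */
static bool demo_commit(struct demo_lock *l, bool simulate_contention)
{
	demo_unlock(l);
	if (simulate_contention || !demo_trylock(l))
		return false;
	return true;
}

/*
 * Returns true if the caller itself released the lock, false if the lock
 * was never taken or was already released inside demo_commit().
 */
static bool demo_caller(struct demo_lock *l, bool simulate_contention)
{
	if (!demo_trylock(l))
		return false;
	if (!demo_commit(l, simulate_contention))
		return false;  /* already unlocked; unlocking again would be a bug */
	demo_unlock(l);
	return true;
}
```

In both outcomes the lock ends up released exactly once per acquisition, which is the invariant the boolean return value is meant to preserve.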

In addition, since free_frozen_page_commit must now be aware of what UP
flags were stored at the time of the spin lock, and because we must be
able to report new UP flags to the callers, add a new unsigned long*
parameter UP_flags to keep track of this.

The following are a few synthetic benchmarks, made on three machines. The
first is a large machine with 754GiB memory and 316 processors. The
second is a relatively smaller machine with 251GiB memory and 176
processors. The third and final is the smallest of the three, which has
62GiB memory and 36 processors.

On all machines, I kick off a kernel build with -j$(nproc). Negative
delta is better (faster compilation).

Large machine (754GiB memory, 316 processors)
make -j$(nproc)
+------------+---------------+----------+
| Metric (s) | Variation (%) | Delta(%) |
+------------+---------------+----------+
| real       |        0.8070 |  -1.4865 |
| user       |        0.2823 |  +0.4081 |
| sys        |        5.0267 | -11.8737 |
+------------+---------------+----------+

Medium machine (251GiB memory, 176 processors)
make -j$(nproc)
+------------+---------------+----------+
| Metric (s) | Variation (%) | Delta(%) |
+------------+---------------+----------+
| real       |        0.2806 |  +0.0351 |
| user       |        0.0994 |  +0.3170 |
| sys        |        0.6229 |  -0.6277 |
+------------+---------------+----------+

Small machine (62GiB memory, 36 processors)
make -j$(nproc)
+------------+---------------+----------+
| Metric (s) | Variation (%) | Delta(%) |
+------------+---------------+----------+
| real       |        0.1503 |  -2.6585 |
| user       |        0.0431 |  -2.2984 |
| sys        |        0.1870 |  -3.2013 |
+------------+---------------+----------+

Here, variation is the coefficient of variation, i.e. standard deviation
/ mean.
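
As a worked illustration of that definition, the sketch below computes the coefficient of variation as a percentage. Two things are assumptions here, since the commit message does not specify them: the sample (n - 1) standard deviation is used, and the helper names demo_sqrt and demo_cv_percent are made up. A small Newton iteration stands in for sqrt() so the sketch needs no math library at link time.

```c
#include <stddef.h>

/* Square root by Newton iteration; adequate for this illustration only. */
static double demo_sqrt(double v)
{
	double r = v > 1.0 ? v : 1.0;

	if (v <= 0.0)
		return 0.0;
	for (int i = 0; i < 60; i++)
		r = 0.5 * (r + v / r);
	return r;
}

/*
 * Coefficient of variation in percent: 100 * standard deviation / mean.
 * Assumes the sample (n - 1) standard deviation. Returns 0.0 when fewer
 * than two samples are given or the mean is zero.
 */
static double demo_cv_percent(const double *x, size_t n)
{
	double sum = 0.0, sq = 0.0, mean;

	if (n < 2)
		return 0.0;
	for (size_t i = 0; i < n; i++)
		sum += x[i];
	mean = sum / (double)n;
	if (mean == 0.0)
		return 0.0;
	for (size_t i = 0; i < n; i++)
		sq += (x[i] - mean) * (x[i] - mean);
	return 100.0 * demo_sqrt(sq / (double)(n - 1)) / mean;
}
```

For example, three made-up timings of 10, 12 and 14 seconds have mean 12 and sample standard deviation 2, so the function returns 100 * 2 / 12.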

[joshua.hahnjy@gmail.com: simplify checks]
Link: https://lkml.kernel.org/r/20251014192827.851389-1-joshua.hahnjy@gmail.com
Link: https://lkml.kernel.org/r/20251014145011.3427205-4-joshua.hahnjy@gmail.com
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Suggested-by: Chris Mason <clm@fb.com>
Co-developed-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Joshua Hahn and committed by Andrew Morton
91e69129 fc4b909c

+56 -9
mm/page_alloc.c
···
 	return high;
 }
 
-static void free_frozen_page_commit(struct zone *zone,
+/*
+ * Tune pcp alloc factor and adjust count & free_count. Free pages to bring the
+ * pcp's watermarks below high.
+ *
+ * May return a freed pcp, if during page freeing the pcp spinlock cannot be
+ * reacquired. Return true if pcp is locked, false otherwise.
+ */
+static bool free_frozen_page_commit(struct zone *zone,
 		struct per_cpu_pages *pcp, struct page *page, int migratetype,
-		unsigned int order, fpi_t fpi_flags)
+		unsigned int order, fpi_t fpi_flags, unsigned long *UP_flags)
 {
 	int high, batch;
+	int to_free, to_free_batched;
 	int pindex;
+	int cpu = smp_processor_id();
+	int ret = true;
 	bool free_high = false;
 
 	/*
···
 		 * Do not attempt to take a zone lock. Let pcp->count get
 		 * over high mark temporarily.
 		 */
-		return;
+		return true;
 	}
 
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count < high)
-		return;
+		return true;
 
-	free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
-			   pcp, pindex);
+	to_free = nr_pcp_free(pcp, batch, high, free_high);
+	while (to_free > 0 && pcp->count > 0) {
+		to_free_batched = min(to_free, batch);
+		free_pcppages_bulk(zone, to_free_batched, pcp, pindex);
+		to_free -= to_free_batched;
+
+		if (to_free == 0 || pcp->count == 0)
+			break;
+
+		pcp_spin_unlock(pcp);
+		pcp_trylock_finish(*UP_flags);
+
+		pcp_trylock_prepare(*UP_flags);
+		pcp = pcp_spin_trylock(zone->per_cpu_pageset);
+		if (!pcp) {
+			pcp_trylock_finish(*UP_flags);
+			ret = false;
+			break;
+		}
+
+		/*
+		 * Check if this thread has been migrated to a different CPU.
+		 * If that is the case, give up and indicate that the pcp is
+		 * returned in an unlocked state.
+		 */
+		if (smp_processor_id() != cpu) {
+			pcp_spin_unlock(pcp);
+			pcp_trylock_finish(*UP_flags);
+			ret = false;
+			break;
+		}
+	}
+
 	if (test_bit(ZONE_BELOW_HIGH, &zone->flags) &&
 	    zone_watermark_ok(zone, 0, high_wmark_pages(zone),
 			      ZONE_MOVABLE, 0)) {
···
 		    next_memory_node(pgdat->node_id) < MAX_NUMNODES)
 			atomic_set(&pgdat->kswapd_failures, 0);
 	}
+	return ret;
 }
 
 /*
···
 	pcp_trylock_prepare(UP_flags);
 	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 	if (pcp) {
-		free_frozen_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
+		if (!free_frozen_page_commit(zone, pcp, page, migratetype,
+						order, fpi_flags, &UP_flags))
+			return;
 		pcp_spin_unlock(pcp);
 	} else {
 		free_one_page(zone, page, pfn, order, fpi_flags);
···
 			migratetype = MIGRATE_MOVABLE;
 
 		trace_mm_page_free_batched(&folio->page);
-		free_frozen_page_commit(zone, pcp, &folio->page, migratetype,
-				order, FPI_NONE);
+		if (!free_frozen_page_commit(zone, pcp, &folio->page,
+				migratetype, order, FPI_NONE, &UP_flags)) {
+			pcp = NULL;
+			locked_zone = NULL;
+		}
 	}
 
 	if (pcp) {