Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity

calculate_totalreserve_pages() currently finds the maximum
lowmem_reserve[j] for a zone by scanning the full forward range [j =
zone_idx .. MAX_NR_ZONES). However, for a given zone i, the
lowmem_reserve[j] entries (for j > i) are expected to form a
monotonically non-decreasing sequence in j. This is not an implementation
detail but a direct consequence of the semantics of lowmem_reserve[].

For zone "i", lowmem_reserve[j] expresses how many pages in zone i must
effectively be kept in reserve when deciding whether an allocation class
that may allocate from zones up to j is allowed to fall back into i. It
protects less flexible allocation classes (which cannot use higher zones)
from being starved by more flexible ones.
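
As a rough illustration of these semantics, the sketch below models the
check in plain C. The names toy_zone, toy_zone_allows and TOY_NR_ZONES are
hypothetical and exist only for this example; the real watermark check
considers more than the two terms shown here.

```c
#include <stdbool.h>

#define TOY_NR_ZONES 4	/* hypothetical zone count for this sketch */

/* Hypothetical, stripped-down stand-in for struct zone. */
struct toy_zone {
	long free_pages;
	long min_watermark;
	long lowmem_reserve[TOY_NR_ZONES];
};

/*
 * May an allocation class whose highest usable zone is highest_zoneidx
 * take pages from z? Simplified: only the min watermark and the
 * matching lowmem_reserve entry are taken into account.
 */
static bool toy_zone_allows(const struct toy_zone *z, int highest_zoneidx)
{
	return z->free_pages >
	       z->min_watermark + z->lowmem_reserve[highest_zoneidx];
}
```

With free_pages = 1000, min_watermark = 100 and lowmem_reserve = {0, 200,
800, 950}, classes limited to zones 0 through 2 are admitted, while the
class that may use zone 3 is refused because 1000 does not exceed 1050.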

Under these semantics an ordering in j follows naturally: as j increases,
the allocation class gains access to a strictly larger set of fallback
zones. lowmem_reserve[j] is therefore expected to be monotonically
non-decreasing in j, because more flexible allocation classes must not be
allowed to deplete low zones more aggressively than less flexible ones.

In other words, if lowmem_reserve[j] were ever observed to *decrease* as j
grows, that would be unexpected from the reserve semantics' point of view
and would likely indicate a semantic change or a misconfiguration.
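
The expectation can be phrased as a small check. The helper below is a
hypothetical debugging aid, not part of this patch; it assumes that zero
entries mean "no reserve set" and skips them, mirroring how the patched
loop ignores zero entries.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the non-zero entries of reserve[zone_idx + 1 ..
 * nr_zones - 1] never decrease as the index grows. Zero entries are
 * skipped (an assumption of this sketch).
 */
static bool reserve_is_monotonic(const long *reserve, size_t zone_idx,
				 size_t nr_zones)
{
	long prev = 0;

	for (size_t j = zone_idx + 1; j < nr_zones; j++) {
		if (reserve[j] == 0)
			continue;
		if (reserve[j] < prev)
			return false;
		prev = reserve[j];
	}
	return true;
}
```

For example, {0, 10, 10, 40} passes for zone_idx 0, while {0, 30, 10, 40}
fails because the entry at index 2 is smaller than the one at index 1.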

The current implementation in setup_per_zone_lowmem_reserve() reflects
this policy by accumulating managed pages from higher zones and applying
the configured ratio, which results in a non-decreasing sequence. This
patch makes calculate_totalreserve_pages() rely on that monotonicity
explicitly and finds the maximum reserve value by scanning backward and
stopping at the first non-zero entry. This avoids unnecessary iteration
and reflects the conceptual model more directly. No functional behavior
changes.
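
The equivalence of the two scans can be tried out in a user-space model.
The sketch below is a simplification under stated assumptions: a single
ratio for the zone, a fixed SKETCH_NR_ZONES of 4, and hypothetical helper
names (toy_setup_reserve, max_forward, max_backward) that do not exist in
the kernel.

```c
#define SKETCH_NR_ZONES 4	/* hypothetical zone count for this sketch */

/*
 * Fill reserve[] for zone i by accumulating the managed pages of all
 * higher zones and dividing by the ratio, as the commit message
 * describes. A ratio of 0 leaves every entry at 0.
 */
static void toy_setup_reserve(const long managed[SKETCH_NR_ZONES], long ratio,
			      int i, long reserve[SKETCH_NR_ZONES])
{
	long sum = 0;

	for (int j = 0; j < SKETCH_NR_ZONES; j++)
		reserve[j] = 0;

	for (int j = i + 1; j < SKETCH_NR_ZONES; j++) {
		sum += managed[j];
		reserve[j] = ratio ? sum / ratio : 0;
	}
}

/* Old approach: scan forward over the full range and keep the maximum. */
static long max_forward(const long reserve[SKETCH_NR_ZONES], int i)
{
	long max = 0;

	for (int j = i; j < SKETCH_NR_ZONES; j++)
		if (reserve[j] > max)
			max = reserve[j];
	return max;
}

/* New approach: scan backward and stop at the first non-zero entry. */
static long max_backward(const long reserve[SKETCH_NR_ZONES], int i)
{
	for (int j = SKETCH_NR_ZONES - 1; j > i; j--)
		if (reserve[j])
			return reserve[j];
	return 0;
}
```

With managed = {4000, 100000, 500000, 0} and a ratio of 256, the reserve
for zone 0 becomes {0, 390, 2343, 2343}, and both scans return 2343. The
backward scan returns after one comparison, while the forward scan always
visits every entry.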

To maintain this assumption explicitly, a comment is added next to
setup_per_zone_lowmem_reserve() documenting the monotonicity expectation
and noting that calculate_totalreserve_pages() relies on it.

Link: https://lkml.kernel.org/r/tencent_EB0FED91B01B1F8B6DAEE96719C5F5797F07@qq.com
Signed-off-by: fujunjie <fujunjie1@qq.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by fujunjie and committed by Andrew Morton
a493c7a6 3cf41edc

+29 -4
mm/page_alloc.c
···
 			long max = 0;
 			unsigned long managed_pages = zone_managed_pages(zone);
 
-			/* Find valid and maximum lowmem_reserve in the zone */
-			for (j = i; j < MAX_NR_ZONES; j++)
-				max = max(max, zone->lowmem_reserve[j]);
+			/*
+			 * lowmem_reserve[j] is monotonically non-decreasing
+			 * in j for a given zone (see
+			 * setup_per_zone_lowmem_reserve()). The maximum
+			 * valid reserve lives at the highest index with a
+			 * non-zero value, so scan backwards and stop at the
+			 * first hit.
+			 */
+			for (j = MAX_NR_ZONES - 1; j > i; j--) {
+				if (!zone->lowmem_reserve[j])
+					continue;
 
+				max = zone->lowmem_reserve[j];
+				break;
+			}
 			/* we treat the high watermark as reserved pages. */
 			max += high_wmark_pages(zone);
 
···
 {
 	struct pglist_data *pgdat;
 	enum zone_type i, j;
-
+	/*
+	 * For a given zone node_zones[i], lowmem_reserve[j] (j > i)
+	 * represents how many pages in zone i must effectively be kept
+	 * in reserve when deciding whether an allocation class that is
+	 * allowed to allocate from zones up to j may fall back into
+	 * zone i.
+	 *
+	 * As j increases, the allocation class can use a strictly larger
+	 * set of fallback zones and therefore must not be allowed to
+	 * deplete low zones more aggressively than a less flexible one.
+	 * As a result, lowmem_reserve[j] is required to be monotonically
+	 * non-decreasing in j for each zone i. Callers such as
+	 * calculate_totalreserve_pages() rely on this monotonicity when
+	 * selecting the maximum reserve entry.
+	 */
 	for_each_online_pgdat(pgdat) {
 		for (i = 0; i < MAX_NR_ZONES - 1; i++) {
 			struct zone *zone = &pgdat->node_zones[i];