Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/vmscan: select the closest preferred node in demote_folio_list()

The preferred demotion node (migration_target_control.nid) should be the
one closest to the source node to minimize migration latency. Currently,
however, demote_folio_list() falls back to a random allowed node when the
preferred node returned by next_demotion_node() is not set in
mems_effective, so the chosen target may be farther away than necessary.

To address this, update next_demotion_node() to choose a preferred target
from among the allowed nodes, and to return the closest allowed demotion
target when none of the preferred nodes are in mems_effective.

This ensures that the preferred demotion target is consistently the
closest available node to the source node.

[akpm@linux-foundation.org: fix comment typo, per Shakeel]
Link: https://lkml.kernel.org/r/20260114205305.2869796-3-bingjiao@google.com
Signed-off-by: Bing Jiao <bingjiao@google.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Gregory Price <gourry@gourry.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Waiman Long <longman@redhat.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
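The selection policy described above can be modeled in user space. What follows is a simplified, hypothetical sketch and not kernel code. Node masks are plain uint64_t bitmaps. The five-node distance table is invented for illustration. The names pick_demotion_target(), closest_node(), random_node() and NODE_BIT() are made up for this sketch. random_node() plays the role of node_random(), and closest_node() is a plain minimum-distance search that stands in for find_next_best_node(). The real find_next_best_node() applies additional heuristics that this sketch leaves out.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_NODES 5
#define NO_NODE (-1)
#define NODE_BIT(n) (UINT64_C(1) << (n))          /* hypothetical helper */
#define ALL_NODES ((UINT64_C(1) << NR_NODES) - 1)

/* Hypothetical distance table: node 0 is DRAM, nodes 1-4 are lower tiers. */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 25, 30, 40 },
	{ 20, 10, 30, 30, 30 },
	{ 25, 30, 10, 30, 30 },
	{ 30, 30, 30, 10, 30 },
	{ 40, 30, 30, 30, 10 },
};

/* Pick a uniformly random set bit of a non-empty @mask (cf. node_random()). */
static int random_node(uint64_t mask)
{
	int count = 0;

	for (int n = 0; n < NR_NODES; n++)
		if (mask & NODE_BIT(n))
			count++;
	assert(count > 0);

	int pick = rand() % count;
	for (int n = 0; n < NR_NODES; n++) {
		if (!(mask & NODE_BIT(n)))
			continue;
		if (pick-- == 0)
			return n;
	}
	return NO_NODE; /* not reached for a non-empty mask */
}

/*
 * Simplified stand-in for find_next_best_node(): the allowed node with the
 * smallest distance from @src. The source node itself is skipped, and the
 * lowest node id wins ties.
 */
static int closest_node(int src, uint64_t allowed)
{
	int best = NO_NODE;
	int best_dist = INT_MAX;

	for (int n = 0; n < NR_NODES; n++) {
		if (n == src || !(allowed & NODE_BIT(n)))
			continue;
		if (distance[src][n] < best_dist) {
			best_dist = distance[src][n];
			best = n;
		}
	}
	return best;
}

/* Model of the patched policy: preferred AND allowed first, else closest allowed. */
static int pick_demotion_target(int src, uint64_t preferred, uint64_t allowed)
{
	uint64_t mask = preferred & allowed & ALL_NODES;

	assert(src >= 0 && src < NR_NODES);
	if (mask)
		return random_node(mask);
	return closest_node(src, allowed);
}
```

Take source node 0 with preferred set {1} and allowed set {3, 4}. In this case pick_demotion_target(0, NODE_BIT(1), NODE_BIT(3) | NODE_BIT(4)) returns node 3, which is the nearer of the two allowed nodes. Under the old caller-side fallback in demote_folio_list(), node_random(&allowed_mask) could have picked either 3 or 4.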

Authored by Bing Jiao and committed by Andrew Morton.
7ec9ecf2 1aceed56

3 files changed, 21 insertions(+), 11 deletions(-)

include/linux/memory-tiers.h (+3 -3)

--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -53,11 +53,11 @@
 						 struct list_head *memory_types);
 void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
-int next_demotion_node(int node);
+int next_demotion_node(int node, const nodemask_t *allowed_mask);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
 bool node_is_toptier(int node);
 #else
-static inline int next_demotion_node(int node)
+static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
 {
 	return NUMA_NO_NODE;
 }
@@ -101,7 +101,7 @@
 
 }
 
-static inline int next_demotion_node(int node)
+static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
 {
 	return NUMA_NO_NODE;
 }
mm/memory-tiers.c (+16 -5)

--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -320,16 +320,17 @@
 /**
  * next_demotion_node() - Get the next node in the demotion path
  * @node: The starting node to lookup the next node
+ * @allowed_mask: The pointer to allowed node mask
  *
  * Return: node id for next memory node in the demotion path hierarchy
  * from @node; NUMA_NO_NODE if @node is terminal. This does not keep
  * @node online or guarantee that it *continues* to be the next demotion
  * target.
  */
-int next_demotion_node(int node)
+int next_demotion_node(int node, const nodemask_t *allowed_mask)
 {
 	struct demotion_nodes *nd;
-	int target;
+	nodemask_t mask;
 
 	if (!node_demotion)
 		return NUMA_NO_NODE;
@@ -345,6 +344,10 @@
 	 * node_demotion[] reads need to be consistent.
 	 */
 	rcu_read_lock();
+	/* Filter out nodes that are not in allowed_mask. */
+	nodes_and(mask, nd->preferred, *allowed_mask);
+	rcu_read_unlock();
+
 	/*
 	 * If there are multiple target nodes, just select one
 	 * target node randomly.
@@ -361,10 +356,16 @@
 	 * caching issue, which seems more complicated. So selecting
 	 * target node randomly seems better until now.
 	 */
-	target = node_random(&nd->preferred);
-	rcu_read_unlock();
+	if (!nodes_empty(mask))
+		return node_random(&mask);
 
-	return target;
+	/*
+	 * Preferred nodes are not in allowed_mask. Flip bits in
+	 * allowed_mask as used node mask. Then, use it to get the
+	 * closest demotion target.
+	 */
+	nodes_complement(mask, *allowed_mask);
+	return find_next_best_node(node, &mask);
 }
 
 static void disable_all_demotion_targets(void)
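The fallback path reuses find_next_best_node(), whose second argument is a mask of nodes to treat as already used and skip. Complementing allowed_mask marks every disallowed node as used, so the search can only settle on an allowed node. Below is a minimal user-space sketch of that inversion trick. It assumes a hypothetical four-node system with invented distances from a fixed source node. next_best_unused() and closest_allowed() are made-up names, and the search is simplified; the real find_next_best_node() weighs more than raw distance.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define NR_NODES 4
#define NO_NODE (-1)

/* Hypothetical distances from a fixed source node to nodes 0..3. */
static const int dist_from_src[NR_NODES] = { 10, 20, 30, 40 };

/*
 * Simplified stand-in for find_next_best_node(): return the closest node
 * whose bit is clear in @used, or NO_NODE if every node is marked used.
 */
static int next_best_unused(uint64_t used)
{
	int best = NO_NODE;
	int best_dist = INT_MAX;

	for (int n = 0; n < NR_NODES; n++) {
		if (used & (UINT64_C(1) << n))
			continue; /* skip nodes marked as used */
		if (dist_from_src[n] < best_dist) {
			best_dist = dist_from_src[n];
			best = n;
		}
	}
	return best;
}

/* Mirrors nodes_complement(mask, *allowed_mask) followed by the search. */
static int closest_allowed(uint64_t allowed)
{
	uint64_t used = ~allowed; /* every disallowed node becomes "used" */

	assert((allowed >> NR_NODES) == 0); /* this model has NR_NODES nodes */
	return next_best_unused(used);
}
```

With allowed = 0xC (nodes 2 and 3), closest_allowed() returns 2, because the inverted mask hides nodes 0 and 1 from the search.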
mm/vmscan.c (+2 -3)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1046,12 +1046,11 @@
 	if (nodes_empty(allowed_mask))
 		return 0;
 
-	target_nid = next_demotion_node(pgdat->node_id);
+	target_nid = next_demotion_node(pgdat->node_id, &allowed_mask);
 	if (target_nid == NUMA_NO_NODE)
 		/* No lower-tier nodes or nodes were hot-unplugged. */
 		return 0;
-	if (!node_isset(target_nid, allowed_mask))
-		target_nid = node_random(&allowed_mask);
+
 	mtc.nid = target_nid;
 
 	/* Demotion ignores all cpuset and mempolicy settings */