Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/vmscan: add tracepoint and reason for kswapd_failures reset

Currently, kswapd_failures is reset in multiple places (kswapd, direct
reclaim, PCP freeing, memory-tiers), but there's no way to trace when and
why it was reset, making it difficult to debug memory reclaim issues.

This patch:

1. Make kswapd_clear_hopeless() non-static and give it a reason
argument, so that every kswapd_failures reset goes through one wrapper.

2. Introduce kswapd_test_hopeless() to encapsulate the hopeless-node
check (kswapd_failures >= MAX_RECLAIM_RETRIES), replacing all
open-coded kswapd_failures comparisons.

3. Add kswapd_clear_hopeless_reason enum to distinguish reset sources:
- KSWAPD_CLEAR_HOPELESS_KSWAPD: reset from kswapd context
- KSWAPD_CLEAR_HOPELESS_DIRECT: reset from direct reclaim
- KSWAPD_CLEAR_HOPELESS_PCP: reset from PCP page freeing
- KSWAPD_CLEAR_HOPELESS_OTHER: reset from other paths (e.g. memory-tiers)

4. Add tracepoints for better observability:
- mm_vmscan_kswapd_clear_hopeless: traces each reset with reason
- mm_vmscan_kswapd_reclaim_fail: traces each kswapd reclaim failure
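The contract of the two helpers can be modeled in user space. The sketch below is not kernel code: it assumes C11 stdatomic.h as a stand-in for the kernel's atomic_t API, a hypothetical struct node_model in place of pg_data_t, and MAX_RECLAIM_RETRIES equal to 16 (consistent with the trace output below, where per-node failure counts stop at 16). It illustrates why the reset uses an exchange: the old value comes back from the same atomic operation, so a zero-to-zero reset produces no event.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed value; the trace output shows failure counts stopping at 16. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for the relevant part of pg_data_t. */
struct node_model {
	int node_id;
	atomic_int kswapd_failures;
	int clear_events;	/* times the clear tracepoint would have fired */
};

/* Models kswapd_test_hopeless(): hopeless once failures reach the limit. */
static bool model_test_hopeless(struct node_model *n)
{
	return atomic_load(&n->kswapd_failures) >= MAX_RECLAIM_RETRIES;
}

/*
 * Models kswapd_clear_hopeless(): atomic_exchange() (the C11 analogue of
 * atomic_xchg()) zeroes the counter and returns the previous value in a
 * single step, so an event is recorded only when the counter was
 * actually non-zero. A zero-to-zero reset records nothing.
 */
static void model_clear_hopeless(struct node_model *n)
{
	if (atomic_exchange(&n->kswapd_failures, 0))
		n->clear_events++;
}
```

For example, storing 16 in kswapd_failures makes model_test_hopeless() return true; a following model_clear_hopeless() bumps clear_events once, and calling it again on the now-zero counter leaves clear_events unchanged.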

Test results:

$ trace-cmd record -e vmscan:mm_vmscan_kswapd_clear_hopeless -e vmscan:mm_vmscan_kswapd_reclaim_fail
$ # generate memory pressure
$ trace-cmd report
cpus=4
kswapd0-71 [000] 27.216563: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=1
kswapd0-71 [000] 27.217169: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=2
kswapd0-71 [000] 27.217764: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=3
kswapd0-71 [000] 27.218353: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=4
kswapd0-71 [000] 27.218993: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=5
kswapd0-71 [000] 27.219744: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=6
kswapd0-71 [000] 27.220488: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=7
kswapd0-71 [000] 27.221206: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=8
kswapd0-71 [000] 27.221806: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=9
kswapd0-71 [000] 27.222634: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=10
kswapd0-71 [000] 27.223286: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=11
kswapd0-71 [000] 27.223894: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=12
kswapd0-71 [000] 27.224712: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=13
kswapd0-71 [000] 27.225424: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=14
kswapd0-71 [000] 27.226082: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=15
kswapd0-71 [000] 27.226810: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=16
kswapd1-72 [002] 27.386869: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=1
kswapd1-72 [002] 27.387435: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=2
kswapd1-72 [002] 27.388016: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=3
kswapd1-72 [002] 27.388586: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=4
kswapd1-72 [002] 27.389155: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=5
kswapd1-72 [002] 27.389723: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=6
kswapd1-72 [002] 27.390292: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=7
kswapd1-72 [002] 27.392364: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=8
kswapd1-72 [002] 27.392934: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=9
kswapd1-72 [002] 27.393504: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=10
kswapd1-72 [002] 27.394073: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=11
kswapd1-72 [002] 27.394899: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=12
kswapd1-72 [002] 27.395472: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=13
kswapd1-72 [002] 27.396055: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=14
kswapd1-72 [002] 27.396628: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=15
kswapd1-72 [002] 27.397199: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=16
kworker/u18:0-40 [002] 27.410151: mm_vmscan_kswapd_clear_hopeless: nid=0 reason=DIRECT
kswapd0-71 [000] 27.439454: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=1
kswapd0-71 [000] 27.440048: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=2
kswapd0-71 [000] 27.440634: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=3
kswapd0-71 [000] 27.441211: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=4
kswapd0-71 [000] 27.441787: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=5
kswapd0-71 [000] 27.442363: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=6
kswapd0-71 [000] 27.443030: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=7
kswapd0-71 [000] 27.443725: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=8
kswapd0-71 [000] 27.444315: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=9
kswapd0-71 [000] 27.444898: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=10
kswapd0-71 [000] 27.445476: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=11
kswapd0-71 [000] 27.446053: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=12
kswapd0-71 [000] 27.446646: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=13
kswapd0-71 [000] 27.447230: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=14
kswapd0-71 [000] 27.447812: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=15
kswapd0-71 [000] 27.448391: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=16
ann-423 [003] 28.028285: mm_vmscan_kswapd_clear_hopeless: nid=0 reason=PCP
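To summarize a report like the one above, a small parser can pick out the two events. This is a hypothetical helper, not part of trace-cmd; it assumes the exact text layout shown above ("mm_vmscan_kswapd_reclaim_fail: nid=N failures=N" and "mm_vmscan_kswapd_clear_hopeless: nid=N reason=NAME") and an arbitrary MAX_NODES bound of 8.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8	/* assumed upper bound for this sketch */

struct trace_summary {
	int max_failures[MAX_NODES];		/* highest failures= seen per nid */
	int clears[MAX_NODES];			/* clear_hopeless events per nid */
	char last_reason[MAX_NODES][16];	/* reason= of the latest clear */
};

/*
 * Feed one line of trace-cmd report output in the format shown above.
 * Returns 1 if the line held one of the two events, 0 otherwise.
 */
static int summarize_line(struct trace_summary *s, const char *line)
{
	const char *p;
	int nid, failures;
	char reason[16];

	if ((p = strstr(line, "mm_vmscan_kswapd_reclaim_fail: ")) &&
	    sscanf(p, "mm_vmscan_kswapd_reclaim_fail: nid=%d failures=%d",
		   &nid, &failures) == 2 &&
	    nid >= 0 && nid < MAX_NODES) {
		if (failures > s->max_failures[nid])
			s->max_failures[nid] = failures;
		return 1;
	}
	if ((p = strstr(line, "mm_vmscan_kswapd_clear_hopeless: ")) &&
	    sscanf(p, "mm_vmscan_kswapd_clear_hopeless: nid=%d reason=%15s",
		   &nid, reason) == 2 &&
	    nid >= 0 && nid < MAX_NODES) {
		s->clears[nid]++;
		snprintf(s->last_reason[nid], sizeof(s->last_reason[nid]),
			 "%s", reason);
		return 1;
	}
	return 0;
}
```

Fed the report above line by line (with the summary zeroed first, e.g. via memset), it would record a peak of 16 failures on both nodes, two resets on nid 0 (DIRECT, then PCP) and none on nid 1.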

Link: https://lkml.kernel.org/r/20260120024402.387576-3-jiayuan.chen@linux.dev
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> [tracing]
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Jiayuan Chen and committed by Andrew Morton.
a4508837 dc9fe9b7

7 files changed, +91 -19
include/linux/mmzone.h (+15 -4)
···
 #include <linux/memory_hotplug.h>
 
 void build_all_zonelists(pg_data_t *pgdat);
-void wakeup_kswapd(struct zone *zone, gfp_t gfp_mask, int order,
-		   enum zone_type highest_zoneidx);
-void kswapd_try_clear_hopeless(struct pglist_data *pgdat,
-			       unsigned int order, int highest_zoneidx);
 bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			 int highest_zoneidx, unsigned int alloc_flags,
 			 long free_pages);
 bool zone_watermark_ok(struct zone *z, unsigned int order,
 		unsigned long mark, int highest_zoneidx,
 		unsigned int alloc_flags);
+
+enum kswapd_clear_hopeless_reason {
+	KSWAPD_CLEAR_HOPELESS_OTHER = 0,
+	KSWAPD_CLEAR_HOPELESS_KSWAPD,
+	KSWAPD_CLEAR_HOPELESS_DIRECT,
+	KSWAPD_CLEAR_HOPELESS_PCP,
+};
+
+void wakeup_kswapd(struct zone *zone, gfp_t gfp_mask, int order,
+		   enum zone_type highest_zoneidx);
+void kswapd_try_clear_hopeless(struct pglist_data *pgdat,
+			       unsigned int order, int highest_zoneidx);
+void kswapd_clear_hopeless(pg_data_t *pgdat, enum kswapd_clear_hopeless_reason reason);
+bool kswapd_test_hopeless(pg_data_t *pgdat);
+
 /*
  * Memory initialization context, use to differentiate memory added by
  * the platform statically or via memory hotplug interface.
include/trace/events/vmscan.h (+51)
···
 		{_VMSCAN_THROTTLE_CONGESTED,	"VMSCAN_THROTTLE_CONGESTED"}	\
 		) : "VMSCAN_THROTTLE_NONE"
 
+TRACE_DEFINE_ENUM(KSWAPD_CLEAR_HOPELESS_OTHER);
+TRACE_DEFINE_ENUM(KSWAPD_CLEAR_HOPELESS_KSWAPD);
+TRACE_DEFINE_ENUM(KSWAPD_CLEAR_HOPELESS_DIRECT);
+TRACE_DEFINE_ENUM(KSWAPD_CLEAR_HOPELESS_PCP);
+
+#define kswapd_clear_hopeless_reason_ops		\
+	{KSWAPD_CLEAR_HOPELESS_KSWAPD,	"KSWAPD"},	\
+	{KSWAPD_CLEAR_HOPELESS_DIRECT,	"DIRECT"},	\
+	{KSWAPD_CLEAR_HOPELESS_PCP,	"PCP"},		\
+	{KSWAPD_CLEAR_HOPELESS_OTHER,	"OTHER"}
 
 #define trace_reclaim_flags(file) ( \
 	(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
···
 		__entry->usec_timeout,
 		__entry->usec_delayed,
 		show_throttle_flags(__entry->reason))
+);
+
+TRACE_EVENT(mm_vmscan_kswapd_reclaim_fail,
+
+	TP_PROTO(int nid, int failures),
+
+	TP_ARGS(nid, failures),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(int, failures)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+		__entry->failures = failures;
+	),
+
+	TP_printk("nid=%d failures=%d",
+		  __entry->nid, __entry->failures)
+);
+
+TRACE_EVENT(mm_vmscan_kswapd_clear_hopeless,
+
+	TP_PROTO(int nid, int reason),
+
+	TP_ARGS(nid, reason),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(int, reason)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+		__entry->reason = reason;
+	),
+
+	TP_printk("nid=%d reason=%s",
+		  __entry->nid,
+		  __print_symbolic(__entry->reason, kswapd_clear_hopeless_reason_ops))
 );
 #endif /* _TRACE_VMSCAN_H */
 
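The TRACE_DEFINE_ENUM() lines export the enum values so that user-space tools reading the event format can resolve them, and __print_symbolic() turns the stored integer into a name at output time. The following user-space sketch mimics that lookup and the TP_printk() format. The table and the reason_name(), reason_from_name() and format_clear_hopeless() functions are illustrative names, not kernel APIs, and the "?" fallback for unknown values is this sketch's own choice rather than the kernel's behavior.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same enumerators and values as the mmzone.h hunk above. */
enum kswapd_clear_hopeless_reason {
	KSWAPD_CLEAR_HOPELESS_OTHER = 0,
	KSWAPD_CLEAR_HOPELESS_KSWAPD,
	KSWAPD_CLEAR_HOPELESS_DIRECT,
	KSWAPD_CLEAR_HOPELESS_PCP,
};

/* Value/name pairs in the spirit of kswapd_clear_hopeless_reason_ops. */
static const struct {
	int value;
	const char *name;
} reason_names[] = {
	{ KSWAPD_CLEAR_HOPELESS_KSWAPD, "KSWAPD" },
	{ KSWAPD_CLEAR_HOPELESS_DIRECT, "DIRECT" },
	{ KSWAPD_CLEAR_HOPELESS_PCP,    "PCP" },
	{ KSWAPD_CLEAR_HOPELESS_OTHER,  "OTHER" },
};

#define N_REASONS (sizeof(reason_names) / sizeof(reason_names[0]))

/* Value -> name; NULL for unmapped values. */
static const char *reason_name(int value)
{
	for (size_t i = 0; i < N_REASONS; i++)
		if (reason_names[i].value == value)
			return reason_names[i].name;
	return NULL;
}

/* Name -> value, e.g. for parsing "reason=PCP"; -1 if unknown. */
static int reason_from_name(const char *name)
{
	for (size_t i = 0; i < N_REASONS; i++)
		if (strcmp(reason_names[i].name, name) == 0)
			return reason_names[i].value;
	return -1;
}

/* Formats a record like TP_printk("nid=%d reason=%s", ...) would. */
static int format_clear_hopeless(char *buf, size_t len, int nid, int reason)
{
	const char *name = reason_name(reason);

	return snprintf(buf, len, "nid=%d reason=%s", nid, name ? name : "?");
}
```

Giving OTHER the value 0 means a zero-initialized reason maps to the catch-all name rather than to a specific caller.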
mm/memory-tiers.c (+1 -1)
···
 		struct pglist_data *pgdat;
 
 		for_each_online_pgdat(pgdat)
-			atomic_set(&pgdat->kswapd_failures, 0);
+			kswapd_clear_hopeless(pgdat, KSWAPD_CLEAR_HOPELESS_OTHER);
 	}
 
 	return count;
mm/page_alloc.c (+2 -2)
···
 		 * 'hopeless node' to stay in that state for a while.  Let
 		 * kswapd work again by resetting kswapd_failures.
 		 */
-		if (atomic_read(&pgdat->kswapd_failures) >= MAX_RECLAIM_RETRIES &&
+		if (kswapd_test_hopeless(pgdat) &&
 		    next_memory_node(pgdat->node_id) < MAX_NUMNODES)
-			atomic_set(&pgdat->kswapd_failures, 0);
+			kswapd_clear_hopeless(pgdat, KSWAPD_CLEAR_HOPELESS_PCP);
 	}
 	return ret;
 }
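The PCP-freeing reset fires only when the node is hopeless and next_memory_node() finds another node with memory. The sketch below models that condition in user space. It assumes a 64-bit mask in place of nodemask_t, MAX_NUMNODES of 64 and MAX_RECLAIM_RETRIES of 16, and it assumes next_memory_node(nid) behaves like the kernel's next_node(), searching upward from nid + 1; the model_* names are hypothetical. Under that model a two-node machine can take the PCP reset on nid 0, which is consistent with the "nid=0 reason=PCP" line in the trace above.

```c
#include <stdbool.h>

#define MAX_NUMNODES 64		/* assumed for this sketch */
#define MAX_RECLAIM_RETRIES 16	/* assumed; matches the trace output */

/*
 * Models next_memory_node(nid): the lowest node id greater than nid whose
 * bit is set in the memory-node mask, or MAX_NUMNODES if there is none.
 */
static int model_next_memory_node(unsigned long long mem_nodes, int nid)
{
	for (int n = nid + 1; n < MAX_NUMNODES; n++)
		if (mem_nodes & (1ULL << n))
			return n;
	return MAX_NUMNODES;
}

/*
 * Models the PCP-freeing condition: reset only a hopeless node, and only
 * if next_memory_node() reports another node with memory.
 */
static bool model_pcp_should_clear(int failures, int nid,
				   unsigned long long mem_nodes)
{
	return failures >= MAX_RECLAIM_RETRIES &&
	       model_next_memory_node(mem_nodes, nid) < MAX_NUMNODES;
}
```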
mm/show_mem.c (+1 -2)
···
 #endif
 			K(node_page_state(pgdat, NR_PAGETABLE)),
 			K(node_page_state(pgdat, NR_SECONDARY_PAGETABLE)),
-			str_yes_no(atomic_read(&pgdat->kswapd_failures) >=
-				   MAX_RECLAIM_RETRIES),
+			str_yes_no(kswapd_test_hopeless(pgdat)),
 			K(node_page_state(pgdat, NR_BALLOON_PAGES)));
 	}
 
mm/vmscan.c (+20 -9)
···
 	 * If kswapd is disabled, reschedule if necessary but do not
 	 * throttle as the system is likely near OOM.
 	 */
-	if (atomic_read(&pgdat->kswapd_failures) >= MAX_RECLAIM_RETRIES)
+	if (kswapd_test_hopeless(pgdat))
 		return true;
 
 	/*
···
 	int i;
 	bool wmark_ok;
 
-	if (atomic_read(&pgdat->kswapd_failures) >= MAX_RECLAIM_RETRIES)
+	if (kswapd_test_hopeless(pgdat))
 		return true;
 
 	for_each_managed_zone_pgdat(zone, pgdat, i, ZONE_NORMAL) {
···
 		wake_up_all(&pgdat->pfmemalloc_wait);
 
 	/* Hopeless node, leave it to direct reclaim */
-	if (atomic_read(&pgdat->kswapd_failures) >= MAX_RECLAIM_RETRIES)
+	if (kswapd_test_hopeless(pgdat))
 		return true;
 
 	if (pgdat_balanced(pgdat, order, highest_zoneidx)) {
···
 	 * watermark_high at this point. We need to avoid increasing the
 	 * failure count to prevent the kswapd thread from stopping.
 	 */
-	if (!sc.nr_reclaimed && !boosted)
-		atomic_inc(&pgdat->kswapd_failures);
+	if (!sc.nr_reclaimed && !boosted) {
+		int fail_cnt = atomic_inc_return(&pgdat->kswapd_failures);
+		/* kswapd context, low overhead to trace every failure */
+		trace_mm_vmscan_kswapd_reclaim_fail(pgdat->node_id, fail_cnt);
+	}
 
 out:
 	clear_reclaim_active(pgdat, highest_zoneidx);
···
 		return;
 
 	/* Hopeless node, leave it to direct reclaim if possible */
-	if (atomic_read(&pgdat->kswapd_failures) >= MAX_RECLAIM_RETRIES ||
+	if (kswapd_test_hopeless(pgdat) ||
 	    (pgdat_balanced(pgdat, order, highest_zoneidx) &&
 	     !pgdat_watermark_boosted(pgdat, highest_zoneidx))) {
 		/*
···
 	wake_up_interruptible(&pgdat->kswapd_wait);
 }
 
-static void kswapd_clear_hopeless(pg_data_t *pgdat)
+void kswapd_clear_hopeless(pg_data_t *pgdat, enum kswapd_clear_hopeless_reason reason)
 {
-	atomic_set(&pgdat->kswapd_failures, 0);
+	/* Only trace actual resets, not redundant zero-to-zero */
+	if (atomic_xchg(&pgdat->kswapd_failures, 0))
+		trace_mm_vmscan_kswapd_clear_hopeless(pgdat->node_id, reason);
 }
 
 /*
···
 			       unsigned int order, int highest_zoneidx)
 {
 	if (pgdat_balanced(pgdat, order, highest_zoneidx))
-		kswapd_clear_hopeless(pgdat);
+		kswapd_clear_hopeless(pgdat, current_is_kswapd() ?
+			KSWAPD_CLEAR_HOPELESS_KSWAPD : KSWAPD_CLEAR_HOPELESS_DIRECT);
+}
+
+bool kswapd_test_hopeless(pg_data_t *pgdat)
+{
+	return atomic_read(&pgdat->kswapd_failures) >= MAX_RECLAIM_RETRIES;
 }
 
 #ifdef CONFIG_HIBERNATION
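Two details in the vmscan.c hunk can be modeled in user space: the failure path uses atomic_inc_return(), so the traced count is the value this particular increment produced, and kswapd_try_clear_hopeless() picks KSWAPD or DIRECT from current_is_kswapd(). The sketch below is an approximation with C11 atomics, not kernel code: atomic_fetch_add(...) + 1 stands in for atomic_inc_return(), a plain is_kswapd flag replaces current_is_kswapd(), and struct node_model, enum model_reason and the model_* functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum model_reason { REASON_OTHER = 0, REASON_KSWAPD, REASON_DIRECT, REASON_PCP };

struct node_model {
	int node_id;
	atomic_int kswapd_failures;
	int last_traced_failures;	/* value the reclaim_fail event would carry */
	int last_clear_reason;		/* reason the clear event would carry, or -1 */
};

/*
 * Models the failure path at the end of balance_pgdat(): atomic_fetch_add()
 * returns the old value, so adding 1 gives what atomic_inc_return() returns.
 * Using the returned value instead of re-reading the counter means the
 * traced count is the one this increment produced, even if another thread
 * resets or increments the counter concurrently.
 */
static void model_record_failure(struct node_model *n)
{
	int fail_cnt = atomic_fetch_add(&n->kswapd_failures, 1) + 1;

	n->last_traced_failures = fail_cnt;
}

/*
 * Models kswapd_try_clear_hopeless(): only a balanced node is reset, and
 * the reason depends on whether the caller is kswapd (is_kswapd flag).
 * As in kswapd_clear_hopeless(), nothing is recorded for a zero counter.
 */
static void model_try_clear_hopeless(struct node_model *n, bool balanced,
				     bool is_kswapd)
{
	if (!balanced)
		return;
	if (atomic_exchange(&n->kswapd_failures, 0))
		n->last_clear_reason = is_kswapd ? REASON_KSWAPD : REASON_DIRECT;
}
```

Sixteen calls to model_record_failure() leave last_traced_failures at 16, like the failures=16 lines in the trace, and clearing a balanced node with is_kswapd false reports REASON_DIRECT, matching the kworker "reason=DIRECT" line.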
mm/vmstat.c (+1 -1)
···
 		   "\n start_pfn: %lu"
 		   "\n reserved_highatomic: %lu"
 		   "\n free_highatomic: %lu",
-		   atomic_read(&pgdat->kswapd_failures) >= MAX_RECLAIM_RETRIES,
+		   kswapd_test_hopeless(pgdat),
 		   zone->zone_start_pfn,
 		   zone->nr_reserved_highatomic,
 		   zone->nr_free_highatomic);