Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/numa: add tracepoint that tracks the skipping of numa balancing due to cpuset memory pinning

Unlike the sched_skip_vma_numa tracepoint, which tracks skipped VMAs, this
tracepoint tracks the task subjected to cpuset.mems pinning and prints out
its allowed memory node mask.

Link: https://lkml.kernel.org/r/20250424024523.2298272-3-libo.chen@oracle.com
Signed-off-by: Libo Chen <libo.chen@oracle.com>
Cc: "Chen, Tim C" <tim.c.chen@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Srikanth Aithal <sraithal@amd.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Libo Chen and committed by Andrew Morton
3fc567e4 1f6c6ac0

+36 -1
+33
include/trace/events/sched.h
@@ -745,6 +745,39 @@
 		  __entry->vm_end,
 		  __print_symbolic(__entry->reason, NUMAB_SKIP_REASON))
 );
+
+TRACE_EVENT(sched_skip_cpuset_numa,
+
+	TP_PROTO(struct task_struct *tsk, nodemask_t *mem_allowed_ptr),
+
+	TP_ARGS(tsk, mem_allowed_ptr),
+
+	TP_STRUCT__entry(
+		__array( char,		comm,		TASK_COMM_LEN		)
+		__field( pid_t,		pid					)
+		__field( pid_t,		tgid					)
+		__field( pid_t,		ngid					)
+		__array( unsigned long, mem_allowed, BITS_TO_LONGS(MAX_NUMNODES))
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+		__entry->pid		 = task_pid_nr(tsk);
+		__entry->tgid		 = task_tgid_nr(tsk);
+		__entry->ngid		 = task_numa_group_id(tsk);
+		BUILD_BUG_ON(sizeof(nodemask_t) != \
+			     BITS_TO_LONGS(MAX_NUMNODES) * sizeof(long));
+		memcpy(__entry->mem_allowed, mem_allowed_ptr->bits,
+		       sizeof(__entry->mem_allowed));
+	),
+
+	TP_printk("comm=%s pid=%d tgid=%d ngid=%d mem_nodes_allowed=%*pbl",
+		  __entry->comm,
+		  __entry->pid,
+		  __entry->tgid,
+		  __entry->ngid,
+		  MAX_NUMNODES, __entry->mem_allowed)
+);
 #endif /* CONFIG_NUMA_BALANCING */
 
 /*
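The new event stores the node mask as an array of BITS_TO_LONGS(MAX_NUMNODES) unsigned longs and prints it with the kernel's %*pbl specifier, which renders a bitmap as a comma-separated list of ranges. The following userspace sketch only illustrates that storage layout and output style; it is not the kernel's implementation, and every name in it (BITS_PER_LONG_U, BITS_TO_LONGS_U, set_bit_u, test_bit_u, format_bitmap_list) is hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for the kernel's BITS_PER_LONG / BITS_TO_LONGS(). */
#define BITS_PER_LONG_U    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_U(n) (((n) + BITS_PER_LONG_U - 1) / BITS_PER_LONG_U)

/* Set one bit in a bitmap stored as an array of unsigned long. */
static void set_bit_u(unsigned long *map, unsigned int bit)
{
	map[bit / BITS_PER_LONG_U] |= 1UL << (bit % BITS_PER_LONG_U);
}

/* Return non-zero if the given bit is set. */
static int test_bit_u(const unsigned long *map, unsigned int bit)
{
	return (int)((map[bit / BITS_PER_LONG_U] >> (bit % BITS_PER_LONG_U)) & 1UL);
}

/*
 * Write the set bits of map[0..nbits) into buf as a range list such as
 * "0-1,3". Returns the string length, or -1 if buf is too small.
 */
static int format_bitmap_list(char *buf, size_t len,
			      const unsigned long *map, unsigned int nbits)
{
	size_t used = 0;
	unsigned int bit = 0;
	int first = 1;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	while (bit < nbits) {
		unsigned int start, end;
		int n;

		if (!test_bit_u(map, bit)) {
			bit++;
			continue;
		}
		/* Extend the run while the next bit is also set. */
		start = bit;
		while (bit + 1 < nbits && test_bit_u(map, bit + 1))
			bit++;
		end = bit++;

		if (start == end)
			n = snprintf(buf + used, len - used, "%s%u",
				     first ? "" : ",", start);
		else
			n = snprintf(buf + used, len - used, "%s%u-%u",
				     first ? "" : ",", start, end);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
		first = 0;
	}
	return (int)used;
}
```

With bits 0, 1 and 3 set in a 128-bit map, format_bitmap_list() writes 0-1,3, which is the shape of the value that follows mem_nodes_allowed= in the trace output.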
+3 -1
kernel/sched/fair.c
@@ -3333,8 +3333,10 @@
 	 * Memory is pinned to only one NUMA node via cpuset.mems, naturally
 	 * no page can be migrated.
 	 */
-	if (cpusets_enabled() && nodes_weight(cpuset_current_mems_allowed) == 1)
+	if (cpusets_enabled() && nodes_weight(cpuset_current_mems_allowed) == 1) {
+		trace_sched_skip_cpuset_numa(current, &cpuset_current_mems_allowed);
 		return;
+	}
 
 	if (!mm->numa_next_scan) {
 		mm->numa_next_scan = now +
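The hunk above changes control flow in only one way: when cpusets are enabled and the allowed node mask has exactly one bit set, the tracepoint fires before the early return. A minimal userspace model of that decision is sketched below. It assumes a plain array of unsigned long in place of nodemask_t and a callback in place of the tracepoint; mask_weight, skip_hook_t, count_skip and numa_scan_should_skip are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the set bits in a node mask (models what nodes_weight() reports). */
static unsigned int mask_weight(const unsigned long *mask, size_t nlongs)
{
	unsigned int weight = 0;

	for (size_t i = 0; i < nlongs; i++) {
		unsigned long v = mask[i];

		while (v) {
			v &= v - 1;	/* clear the lowest set bit */
			weight++;
		}
	}
	return weight;
}

/* Hypothetical stand-in for the tracepoint: called once per skipped scan. */
typedef void (*skip_hook_t)(const unsigned long *mask, size_t nlongs, void *ctx);

/* Example hook that counts skips; ctx must point to an unsigned int. */
static void count_skip(const unsigned long *mask, size_t nlongs, void *ctx)
{
	(void)mask;
	(void)nlongs;
	++*(unsigned int *)ctx;
}

/*
 * Mirror of the patched condition: skip scanning when memory is pinned to
 * exactly one node, and report the skip through the hook before returning.
 */
static bool numa_scan_should_skip(bool cpusets_on,
				  const unsigned long *mems_allowed,
				  size_t nlongs, skip_hook_t hook, void *ctx)
{
	if (cpusets_on && mask_weight(mems_allowed, nlongs) == 1) {
		if (hook)
			hook(mems_allowed, nlongs, ctx);
		return true;
	}
	return false;
}
```

Passing count_skip with a counter as ctx shows that the hook runs only on the single-node path, which matches the placement of trace_sched_skip_cpuset_numa() inside the new braces.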