Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

exit: move and extend sched_process_exit() tracepoint

It is useful to be able to access current->mm at task exit to, say, record
a bunch of VMA information right before the task exits (e.g., for stack
symbolization reasons when dealing with short-lived processes that exit in
the middle of a profiling session). Currently, trace_sched_process_exit()
is triggered after exit_mm(), which resets current->mm to NULL, making this
tracepoint unsuitable for inspecting and recording the task's
mm_struct-related data when tracing process lifetimes.

There is a particularly suitable place, though, right after
taskstats_exit() is called, but before we do exit_mm() and other exit_*()
resource teardowns. taskstats performs a similar kind of accounting that
some applications do with BPF, and so co-locating them seems like a good
fit. So that's where trace_sched_process_exit() is moved with this patch.

Also, the existing trace_sched_process_exit() tracepoint is notoriously
missing the `group_dead` flag, which is certainly useful in practice, and
some of our production applications have to work around this. So plumb
`group_dead` through while at it, to have a richer and more complete
tracepoint.

Note that we can't use sched_process_template anymore, and so we use a
TRACE_EVENT()-based tracepoint definition. But all the field names and
order, as well as assign and output logic remain intact. We just add one
extra field at the end in a backwards-compatible way.

[andrii@kernel.org: document sched_process_exit and sched_process_template relation]
Link: https://lkml.kernel.org/r/20250403174120.4087794-1-andrii@kernel.org
Link: https://lkml.kernel.org/r/20250402180925.90914-1-andrii@kernel.org
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Andrii Nakryiko and committed by Andrew Morton
3ca55ca2 82f2b0b9

+31 -5
+30 -4
include/trace/events/sched.h
 	     TP_ARGS(p));

 /*
- * Tracepoint for a task exiting:
+ * Tracepoint for a task exiting.
+ * Note, it's a superset of sched_process_template and should be kept
+ * compatible as much as possible. sched_process_exits has an extra
+ * `group_dead` argument, so sched_process_template can't be used,
+ * unfortunately, just like sched_migrate_task above.
  */
-DEFINE_EVENT(sched_process_template, sched_process_exit,
-	     TP_PROTO(struct task_struct *p),
-	     TP_ARGS(p));
+TRACE_EVENT(sched_process_exit,
+
+	TP_PROTO(struct task_struct *p, bool group_dead),
+
+	TP_ARGS(p, group_dead),
+
+	TP_STRUCT__entry(
+		__array(	char,	comm,	TASK_COMM_LEN	)
+		__field(	pid_t,	pid			)
+		__field(	int,	prio			)
+		__field(	bool,	group_dead		)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
+		__entry->pid		= p->pid;
+		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
+		__entry->group_dead	= group_dead;
+	),
+
+	TP_printk("comm=%s pid=%d prio=%d group_dead=%s",
+		  __entry->comm, __entry->pid, __entry->prio,
+		  __entry->group_dead ? "true" : "false"
+	)
+);

 /*
  * Tracepoint for waiting on task to unschedule:
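Because `group_dead` is appended after the existing fields, a consumer of the event's text output can accept both the old and the new layout. The following is only a userspace sketch of that idea, not part of the patch: `struct exit_event`, `parse_exit_event()` and the `has_group_dead` flag are hypothetical names, the input is assumed to be just the payload produced by the TP_printk() format above (without the timestamp and CPU prefix that trace output normally carries), and `comm` is assumed to contain no whitespace.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16 /* same size as the comm array in the tracepoint */

/* Hypothetical userspace view of one sched_process_exit record. */
struct exit_event {
	char comm[TASK_COMM_LEN];
	int pid;
	int prio;
	bool has_group_dead; /* false when the line came from the old layout */
	bool group_dead;
};

/*
 * Parse "comm=<name> pid=<n> prio=<n>[ group_dead=<true|false>]".
 * Returns 0 on success and -1 on malformed input.
 * Assumption: comm has no embedded whitespace; real task names may,
 * and a robust parser would have to handle that differently.
 */
static int parse_exit_event(const char *text, struct exit_event *ev)
{
	char flag[6] = ""; /* room for "false" plus the terminating NUL */
	int n;

	memset(ev, 0, sizeof(*ev));
	n = sscanf(text, "comm=%15s pid=%d prio=%d group_dead=%5s",
		   ev->comm, &ev->pid, &ev->prio, flag);
	if (n < 3)
		return -1; /* not even the pre-existing fields matched */
	if (n == 3)
		return 0;  /* old layout: no group_dead field present */

	if (strcmp(flag, "true") == 0)
		ev->group_dead = true;
	else if (strcmp(flag, "false") != 0)
		return -1; /* unexpected value for group_dead */
	ev->has_group_dead = true;
	return 0;
}
```

With this sketch, a payload such as `comm=bash pid=4242 prio=120 group_dead=true` fills all fields, while the same payload without the trailing `group_dead=...` part still parses and simply leaves `has_group_dead` unset.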
+1 -1
kernel/exit.c

 	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
+	trace_sched_process_exit(tsk, group_dead);

 	exit_mm();

 	if (group_dead)
 		acct_process();
-	trace_sched_process_exit(tsk);

 	exit_sem(tsk);
 	exit_shm(tsk);
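The effect of the reordering can be illustrated with a toy userspace model. This is a sketch under loud assumptions: none of the `toy_*` names exist in the kernel, the structures are drastically simplified stand-ins for task_struct and mm_struct, and the "tracepoint" is an ordinary function call. It only shows why a hook that runs after the mm teardown can no longer see mm-related data, while one that runs before it can.

```c
#include <stddef.h>

/* Hypothetical stand-in for mm_struct: just a VMA count. */
struct toy_mm {
	int map_count;
};

/* Hypothetical stand-in for task_struct. */
struct toy_task {
	struct toy_mm *mm;
	int seen_map_count; /* what the hook observed, -1 if mm was gone */
};

/* Plays the role of a tracepoint consumer inspecting the task's mm. */
static void toy_trace_exit(struct toy_task *t)
{
	t->seen_map_count = t->mm ? t->mm->map_count : -1;
}

/* Plays the role of exit_mm(): the task loses its mm pointer. */
static void toy_exit_mm(struct toy_task *t)
{
	t->mm = NULL;
}

/* Ordering before the patch: the hook fires after the mm teardown. */
static void toy_do_exit_old(struct toy_task *t)
{
	toy_exit_mm(t);
	toy_trace_exit(t);
}

/* Ordering after the patch: the hook fires while mm is still set. */
static void toy_do_exit_new(struct toy_task *t)
{
	toy_trace_exit(t);
	toy_exit_mm(t);
}
```

Running `toy_do_exit_old()` on a task whose `mm` has `map_count` set to 12 leaves `seen_map_count` at -1, whereas `toy_do_exit_new()` records 12; in both cases `mm` is NULL afterwards.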