Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

- Extend tracing option mask to 64 bits

The trace options were defined by a 32 bit variable. This limits the
tracing instances to have a total of 32 different options. As that
limit has been hit, and more options are being added, increase the
option mask to a 64 bit number, doubling the number of options
available.

As this is required for the kprobe topic branches as well as the
tracing topic branch, a separate branch was created and merged into
both.
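As a rough illustration of the limit described above, the following user-space sketch shows why option numbers at or above 32 need a 64-bit mask. The helper names (`DEMO_OPT`, `demo_opt_set`, `demo_opt_test`, `demo_truncate32`) are hypothetical and are not the kernel's macros or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit mask for option number 'bit' (0..63). */
#define DEMO_OPT(bit) (UINT64_C(1) << (bit))

/* Set or clear one option in a 64-bit option mask. */
static inline uint64_t demo_opt_set(uint64_t flags, unsigned int bit, bool on)
{
	assert(bit < 64);
	return on ? (flags | DEMO_OPT(bit)) : (flags & ~DEMO_OPT(bit));
}

/* Test whether an option is set. */
static inline bool demo_opt_test(uint64_t flags, unsigned int bit)
{
	assert(bit < 64);
	return (flags & DEMO_OPT(bit)) != 0;
}

/* What a 32-bit variable would keep: options 32..63 are silently lost. */
static inline uint32_t demo_truncate32(uint64_t flags)
{
	return (uint32_t)flags;
}
```

Setting option 40 and then storing the mask in a 32-bit variable loses the option, which is the situation the wider mask avoids.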

- Make trace_user_fault_read() available for the rest of tracing

The function trace_user_fault_read() is used by trace_marker file
read to allow reading user space to be done fast and without locking
or allocations. Make this available so that the system call trace
events can use it too.

- Have system call trace events read user space values

Now that the system call trace events callbacks are called in a
faultable context, take advantage of this and read the user space
buffers for various system calls. For example, show the path name of
the openat system call instead of just showing the pointer to that
path name in user space. Also show the contents of the buffer of the
write system call. Several system call trace events are updated to
make tracing into a light weight strace tool for all applications in
the system.

- Update perf system call tracing to do the same

- Add a config and syscall_user_buf_size file to control the size of
the buffer

Limit the amount of data that can be read from user space. The
default size is 63 bytes but that can be expanded to 165 bytes.

- Allow the persistent ring buffer to print system calls normally

The persistent ring buffer prints trace events by their type and
ignores the print_fmt. This is because the print_fmt may change from
kernel to kernel. As the system call output is fixed by the system
call ABI itself, there's no reason to limit that. This makes reading
the system call events in the persistent ring buffer much nicer and
easier to understand.

- Add options to show text offset to function profiler

The function profiler that counts the number of times a function is
hit currently lists all functions by their name and offset. But this
becomes ambiguous when there are several functions with the same
name.

Add a tracing option that changes the output to be that of
'_text+offset' instead. Now a user space tool can use this
information to map the '_text+offset' to the unique function it is
counting.

- Report bad dynamic event command

If a bad command is passed to the dynamic_events file, report it
properly in the error log.

- Clean up tracer options

Clean up the tracer option code a bit, by removing some useless code
and also using switch statements instead of a series of if
statements.

- Have tracing options be instance specific

Tracers can have their own options (function tracer, irqsoff tracer,
function graph tracer, etc). But now that the same tracer can be
enabled in multiple trace instances, their options are still global.
The API is per instance, thus changing one affects other instances.
This isn't even consistent, as the options take effect differently
depending on when a tracer started in an instance. Make the options
for instances only affect the instance they are changed under.

- Optimize pid_list lock contention

Whenever the pid_list is read, it uses a spin lock. This happens at
every sched switch. Taking the lock at sched switch can be removed by
instead using a seqlock counter.

- Clean up the trace trigger structures

The trigger code uses two different structures to implement a single
trigger. This was due to trying to reuse code for the two different
types of triggers (always on trigger, and count limited trigger). But
by adding a single field to one structure, the other structure could
be absorbed into the first structure making the code easier to
understand.

- Create a bulk garbage collector for trace triggers

If user space has triggers for several hundreds of events and then
removes them, it can take several seconds to complete. This is
because each removal calls tracepoint_synchronize_unregister() that
can take hundreds of milliseconds to complete.

Instead, create a helper thread that will do the clean up. When a
trigger is removed, it will create the kthread if it isn't already
created, and then add the trigger to a llist. The kthread will take
the items off the llist, call tracepoint_synchronize_unregister(),
and then remove the items it took off. It will then check if there are
more items to free before sleeping.

This lets user space finish removing all these triggers in less
than a second.
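The batching idea above can be sketched in user space roughly as follows. This is not the kernel implementation: `demo_synchronize()` is a hypothetical stand-in for the expensive tracepoint_synchronize_unregister() call, the pending list is a simple lock-free stack built on C11 atomics rather than the kernel's llist, and there is no kthread, only a single `demo_gc_pass()` that a worker would call in a loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical item queued for deferred freeing. */
struct demo_node {
	struct demo_node *next;
};

/* Head of the pending list; statically zero-initialized (empty). */
static _Atomic(struct demo_node *) demo_pending;

/* Counts how often the expensive synchronization was needed. */
static unsigned int demo_sync_calls;

/* Stand-in for a slow "wait until nobody uses the old items" call. */
static void demo_synchronize(void)
{
	demo_sync_calls++;
}

/* Remover side: push the item and return immediately, no waiting. */
static void demo_push(struct demo_node *n)
{
	struct demo_node *old;

	assert(n != NULL);
	old = atomic_load(&demo_pending);
	do {
		n->next = old;	/* 'old' is refreshed when the exchange fails */
	} while (!atomic_compare_exchange_weak(&demo_pending, &old, n));
}

/*
 * Worker side: detach everything queued so far, synchronize once for
 * the whole batch, then free each item. Returns the number freed.
 */
static unsigned int demo_gc_pass(void)
{
	struct demo_node *list;
	unsigned int freed = 0;

	list = atomic_exchange(&demo_pending, (struct demo_node *)NULL);
	if (!list)
		return 0;

	demo_synchronize();
	while (list) {
		struct demo_node *next = list->next;

		free(list);
		list = next;
		freed++;
	}
	return freed;
}
```

Queueing several hundred items and then running one pass costs a single synchronization call for the whole batch instead of one per item, which is where the speedup described in the text comes from.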

- Allow function tracing of some of the tracing infrastructure code

Because the tracing code can cause recursion issues if it is traced
by the function tracer, the entire tracing directory disables function
tracing. But not all of tracing causes issues if it is traced.
Namely, the event tracing code. Add a config that enables some of the
tracing code to be traced to help in debugging it. Note, when this is
enabled, it does add noise to general function tracing, especially if
events are enabled as well (which is a common case).

- Add boot-time backup instance for persistent buffer

The persistent ring buffer is used mostly for kernel crash analysis
in the field. One issue is that if there's a crash, the data in the
persistent ring buffer must be read before tracing can begin using
it. This slows down the boot process. Once tracing starts in the
persistent ring buffer, the old data must be freed and the addresses
no longer match and old events can't be in the buffer with new
events.

Add a way to create a backup buffer that copies the persistent
ring buffer at boot up. Then after a crash, the always on tracer can
begin immediately as well as the normal boot process while the crash
analysis tooling uses the backup buffer. After the backup buffer is
finished being read, it can be removed.

- Enable function graph args and return address options at the same
time

Currently, when reading of arguments in the function graph tracer
is enabled, the option to record the parent function in the entry
event cannot be enabled. Update the code so that it can.

- Add new struct_offset() helper macro

Add a new macro that takes a pointer to a structure and a name of one
of its members and it will return the offset of that member. This
allows the ring buffer code to simplify the following:

From: size = struct_size(entry, buf, cnt - sizeof(entry->id));
To: size = struct_offset(entry, id) + cnt;

There should be other simplifications that this macro can help out
with as well.

* tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (42 commits)
overflow: Introduce struct_offset() to get offset of member
function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously
tracing: Add boot-time backup of persistent ring buffer
ftrace: Allow tracing of some of the tracing code
tracing: Use strim() in trigger_process_regex() instead of skip_spaces()
tracing: Add bulk garbage collection of freeing event_trigger_data
tracing: Remove unneeded event_mutex lock in event_trigger_regex_release()
tracing: Merge struct event_trigger_ops into struct event_command
tracing: Remove get_trigger_ops() and add count_func() from trigger ops
tracing: Show the tracer options in boot-time created instance
ftrace: Avoid redundant initialization in register_ftrace_direct
tracing: Remove unused variable in tracing_trace_options_show()
fgraph: Make fgraph_no_sleep_time signed
tracing: Convert function graph set_flags() to use a switch() statement
tracing: Have function graph tracer option sleep-time be per instance
tracing: Move graph-time out of function graph options
tracing: Have function graph tracer option funcgraph-irqs be per instance
trace/pid_list: optimize pid_list->lock contention
tracing: Have function graph tracer define options per instance
tracing: Have function tracer define options per instance
...

+2306 -907
+8
Documentation/trace/ftrace.rst
···
 	for each function. The displayed address is the patch-site address
 	and can differ from /proc/kallsyms address.

+  syscall_user_buf_size:
+
+	Some system call trace events will record the data from a user
+	space address that one of the parameters point to. The amount of
+	data per event is limited. This file holds the max number of bytes
+	that will be recorded into the ring buffer to hold this data.
+	The max value is currently 165.
+
   dyn_ftrace_total_info:

 	This file is for debugging purposes. The number of functions that
+2 -5
include/linux/ftrace.h
···
  */
 struct ftrace_graph_ent {
 	unsigned long func; /* Current function */
-	int depth;
+	unsigned long depth;
 } __packed;

 /*
  * Structure that defines an entry function trace with retaddr.
- * It's already packed but the attribute "packed" is needed
- * to remove extra padding at the end.
  */
 struct fgraph_retaddr_ent {
-	unsigned long func; /* Current function */
-	int depth;
+	struct ftrace_graph_ent ent;
 	unsigned long retaddr; /* Return address */
 } __packed;

+12
include/linux/overflow.h
···
 	struct_size((type *)NULL, member, count)

 /**
+ * struct_offset() - Calculate the offset of a member within a struct
+ * @p: Pointer to the struct
+ * @member: Name of the member to get the offset of
+ *
+ * Calculates the offset of a particular @member of the structure pointed
+ * to by @p.
+ *
+ * Return: number of bytes to the location of @member.
+ */
+#define struct_offset(p, member) (offsetof(typeof(*(p)), member))
+
+/**
  * __DEFINE_FLEX() - helper macro for DEFINE_FLEX() family.
  * Enables caller macro to pass arbitrary trailing expressions
  *
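To make the "From/To" simplification in the pull message concrete, here is a small user-space sketch. It assumes GCC or Clang (for `__typeof__`), and `struct demo_entry` is a hypothetical layout chosen so that `id` sits directly before the flexible array and the struct has no tail padding; it is not the kernel's ring buffer entry. Under those assumptions the old and new size computations agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the macro added to include/linux/overflow.h. */
#define struct_offset(p, member) (offsetof(__typeof__(*(p)), member))

/* Hypothetical entry: fixed header, then a payload that starts at 'id'. */
struct demo_entry {
	uint32_t hdr[2];
	uint32_t id;
	char buf[];	/* flexible array, directly after 'id' */
};

/*
 * Old form: size of the whole struct plus the part of the payload that
 * does not fit in 'id'. Requires cnt >= sizeof(id).
 */
static size_t demo_size_old(const struct demo_entry *e, size_t cnt)
{
	assert(cnt >= sizeof(e->id));
	return sizeof(*e) + (cnt - sizeof(e->id));
}

/* New form: offset of 'id' plus the full payload length. */
static size_t demo_size_new(const struct demo_entry *e, size_t cnt)
{
	return struct_offset(e, id) + cnt;
}
```

Neither function dereferences `e`; the pointer is only used for its type, just as in the macro. With a layout that has padding after `id` the two forms would differ, so the equivalence depends on the struct.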
+17
include/linux/seq_buf.h
···
 	}
 }

+/**
+ * seq_buf_pop - pop off the last written character
+ * @s: the seq_buf handle
+ *
+ * Removes the last written character to the seq_buf @s.
+ *
+ * Returns the last character or -1 if it is empty.
+ */
+static inline int seq_buf_pop(struct seq_buf *s)
+{
+	if (!s->len)
+		return -1;
+
+	s->len--;
+	return (unsigned int)s->buffer[s->len];
+}
+
 extern __printf(2, 3)
 int seq_buf_printf(struct seq_buf *s, const char *fmt, ...);
 extern __printf(2, 0)
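A typical use for a pop helper like the one in this diff is trimming a trailing separator after writing a list. The sketch below is a user-space analogue with hypothetical names (`struct demo_buf`, `demo_buf_putc`, `demo_buf_pop`); it is not the kernel's seq_buf API. One deliberate difference, flagged as an assumption: the return value is cast through `unsigned char` so that a stored byte can never be confused with the -1 "empty" result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal buffer: storage, capacity, and bytes written. */
struct demo_buf {
	char *buffer;
	size_t size;
	size_t len;
};

/* Append one character; returns 0 on success, -1 if the buffer is full. */
static int demo_buf_putc(struct demo_buf *s, char c)
{
	assert(s && s->buffer);
	if (s->len >= s->size)
		return -1;
	s->buffer[s->len++] = c;
	return 0;
}

/* Remove and return the last written character, or -1 if empty. */
static int demo_buf_pop(struct demo_buf *s)
{
	assert(s && s->buffer);
	if (!s->len)
		return -1;

	s->len--;
	return (unsigned char)s->buffer[s->len];
}
```

After writing "a," a caller can pop once to drop the trailing comma and keep "a".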
+13
include/linux/trace_seq.h
···
 	return s->full || seq_buf_has_overflowed(&s->seq);
 }

+/**
+ * trace_seq_pop - pop off the last written character
+ * @s: trace sequence descriptor
+ *
+ * Removes the last written character to the trace_seq @s.
+ *
+ * Returns the last character or -1 if it is empty.
+ */
+static inline int trace_seq_pop(struct trace_seq *s)
+{
+	return seq_buf_pop(&s->seq);
+}
+
 /*
  * Currently only defined when tracing is enabled.
  */
+7 -1
include/trace/syscall.h
···
  * @name: name of the syscall
  * @syscall_nr: number of the syscall
  * @nb_args: number of parameters it takes
+ * @user_arg_is_str: set if the arg for @user_arg_size is a string
+ * @user_arg_size: holds @arg that has size of the user space to read
+ * @user_mask: mask of @args that will read user space
  * @types: list of types as strings
  * @args: list of args as strings (args[i] matches types[i])
  * @enter_fields: list of fields for syscall_enter trace event
···
 struct syscall_metadata {
 	const char *name;
 	int syscall_nr;
-	int nb_args;
+	u8 nb_args:7;
+	u8 user_arg_is_str:1;
+	s8 user_arg_size;
+	short user_mask;
 	const char **types;
 	const char **args;
 	struct list_head enter_fields;
+28
kernel/trace/Kconfig
···
 	depends on DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	depends on HAVE_DYNAMIC_FTRACE_WITH_JMP

+config FUNCTION_SELF_TRACING
+	bool "Function trace tracing code"
+	depends on FUNCTION_TRACER
+	help
+	  Normally all the tracing code is set to notrace, where the function
+	  tracer will ignore all the tracing functions. Sometimes it is useful
+	  for debugging to trace some of the tracing infratructure itself.
+	  Enable this to allow some of the tracing infrastructure to be traced
+	  by the function tracer. Note, this will likely add noise to function
+	  tracing if events and other tracing features are enabled along with
+	  function tracing.
+
+	  If unsure, say N.
+
 config FPROBE
 	bool "Kernel Function Probe (fprobe)"
 	depends on HAVE_FUNCTION_GRAPH_FREGS && HAVE_FTRACE_GRAPH_FUNC
···
 	select KALLSYMS
 	help
 	  Basic tracer to catch the syscall entry and exit events.
+
+config TRACE_SYSCALL_BUF_SIZE_DEFAULT
+	int "System call user read max size"
+	range 0 165
+	default 63
+	depends on FTRACE_SYSCALLS
+	help
+	  Some system call trace events will record the data from a user
+	  space address that one of the parameters point to. The amount of
+	  data per event is limited. That limit is set by this config and
+	  this config also affects how much user space data perf can read.
+
+	  For a tracing instance, this size may be changed by writing into
+	  its syscall_user_buf_size file.

 config TRACER_SNAPSHOT
 	bool "Create a snapshot trace buffer"
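The Kconfig entry above bounds the per-event user data to the range 0 to 165 bytes, with a default of 63. A minimal user-space sketch of how such a limit could be applied when copying data follows. The names `demo_set_limit` and `demo_record` are hypothetical, and rejecting out-of-range values is an assumption made for the example, not a description of what the kernel does when a bad value is written to syscall_user_buf_size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_MAX     165	/* upper bound from the Kconfig range */
#define DEMO_BUF_DEFAULT  63	/* default from the Kconfig entry */

/* Hypothetical setter: accept only values inside 0..DEMO_BUF_MAX. */
static int demo_set_limit(size_t *limit, long val)
{
	if (val < 0 || val > DEMO_BUF_MAX)
		return -1;	/* assumption: out-of-range values are rejected */
	*limit = (size_t)val;
	return 0;
}

/* Copy at most 'limit' bytes of 'src' into 'dst'; return bytes recorded. */
static size_t demo_record(char *dst, const char *src, size_t len, size_t limit)
{
	size_t n = len < limit ? len : limit;

	assert(dst && src);
	memcpy(dst, src, n);
	return n;
}
```

With the limit lowered to 5, an 11-byte write buffer would be recorded as its first 5 bytes only.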
+17
kernel/trace/Makefile
···
 endif
 endif

+# Allow some files to be function traced
+ifdef CONFIG_FUNCTION_SELF_TRACING
+CFLAGS_trace_output.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_seq.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_stat.o = $(CC_FLAGS_FTRACE)
+CFLAGS_tracing_map.o = $(CC_FLAGS_FTRACE)
+CFLAGS_synth_event_gen_test.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_syscalls.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events_filter.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events_trigger.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events_synth.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events_hist.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events_user.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_dynevent.o = $(CC_FLAGS_FTRACE)
+endif
+
 ifdef CONFIG_FTRACE_STARTUP_TEST
 CFLAGS_trace_kprobe_selftest.o = $(CC_FLAGS_FTRACE)
 obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe_selftest.o
+3 -3
kernel/trace/blktrace.c
···

 	t = te_blk_io_trace(iter->ent);
 	what = (t->action & ((1 << BLK_TC_SHIFT) - 1)) & ~__BLK_TA_CGROUP;
-	long_act = !!(tr->trace_flags & TRACE_ITER_VERBOSE);
+	long_act = !!(tr->trace_flags & TRACE_ITER(VERBOSE));
 	log_action = classic ? &blk_log_action_classic : &blk_log_action;
 	has_cg = t->action & __BLK_TA_CGROUP;

···
 	/* don't output context-info for blk_classic output */
 	if (bit == TRACE_BLK_OPT_CLASSIC) {
 		if (set)
-			tr->trace_flags &= ~TRACE_ITER_CONTEXT_INFO;
+			tr->trace_flags &= ~TRACE_ITER(CONTEXT_INFO);
 		else
-			tr->trace_flags |= TRACE_ITER_CONTEXT_INFO;
+			tr->trace_flags |= TRACE_ITER(CONTEXT_INFO);
 	}
 	return 0;
 }
+1 -9
kernel/trace/fgraph.c
···
 	return get_data_type_data(current, offset);
 }

-/* Both enabled by default (can be cleared by function_graph tracer flags */
-bool fgraph_sleep_time = true;
-
 #ifdef CONFIG_DYNAMIC_FTRACE
 /*
  * archs can override this function if they must do something
···
 #endif
 }

-void ftrace_graph_sleep_time_control(bool enable)
-{
-	fgraph_sleep_time = enable;
-}
-
 /*
  * Simply points to ftrace_stub, but with the proper protocol.
  * Defined by the linker script in linux/vmlinux.lds.h
···
 	 * Does the user want to count the time a function was asleep.
 	 * If so, do not update the time stamps.
 	 */
-	if (fgraph_sleep_time)
+	if (!fgraph_no_sleep_time)
 		return;

 	timestamp = trace_clock_local();
+29 -3
kernel/trace/ftrace.c
···

 static int function_stat_show(struct seq_file *m, void *v)
 {
+	struct trace_array *tr = trace_get_global_array();
 	struct ftrace_profile *rec = v;
+	const char *refsymbol = NULL;
 	char str[KSYM_SYMBOL_LEN];
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	static struct trace_seq s;
···
 		return 0;
 #endif

-	kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
+	if (tr->trace_flags & TRACE_ITER(PROF_TEXT_OFFSET)) {
+		unsigned long offset;
+
+		if (core_kernel_text(rec->ip)) {
+			refsymbol = "_text";
+			offset = rec->ip - (unsigned long)_text;
+		} else {
+			struct module *mod;
+
+			guard(rcu)();
+			mod = __module_text_address(rec->ip);
+			if (mod) {
+				refsymbol = mod->name;
+				/* Calculate offset from module's text entry address. */
+				offset = rec->ip - (unsigned long)mod->mem[MOD_TEXT].base;
+			}
+		}
+		if (refsymbol)
+			snprintf(str, sizeof(str), " %s+%#lx", refsymbol, offset);
+	}
+	if (!refsymbol)
+		kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
+
 	seq_printf(m, " %-30.30s %10lu", str, rec->counter);

 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
···
 	return 1;
 }

+bool fprofile_no_sleep_time;
+
 static void profile_graph_return(struct ftrace_graph_ret *trace,
 				 struct fgraph_ops *gops,
 				 struct ftrace_regs *fregs)
···

 	calltime = rettime - profile_data->calltime;

-	if (!fgraph_sleep_time) {
+	if (fprofile_no_sleep_time) {
 		if (current->ftrace_sleeptime)
 			calltime -= current->ftrace_sleeptime - profile_data->sleeptime;
 	}
···
 	new_hash = NULL;

 	ops->func = call_direct_funcs;
-	ops->flags = MULTI_FLAGS;
+	ops->flags |= MULTI_FLAGS;
 	ops->trampoline = FTRACE_REGS_ADDR;
 	ops->direct_call = addr;

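The profiler change in this diff replaces a symbol name with a reference symbol plus an offset from its base address. A minimal user-space sketch of that formatting step is below. The function name `demo_format_offset` and the example addresses are hypothetical; only the "%s+%#lx" shape is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "refsymbol+offset" where offset = ip - base.
 * Returns what snprintf() returns: the length that was (or would have
 * been) written, not counting the terminating NUL.
 */
static int demo_format_offset(char *str, size_t size, const char *refsymbol,
			      unsigned long base, unsigned long ip)
{
	assert(ip >= base);	/* the address must lie at or after the base */
	return snprintf(str, size, "%s+%#lx", refsymbol, ip - base);
}
```

A user-space tool that knows the load address of the reference symbol can add the offset back to recover the exact address, which stays unambiguous even when several functions share a name.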
+21 -9
kernel/trace/pid_list.c
···
  * Copyright (C) 2021 VMware Inc, Steven Rostedt <rostedt@goodmis.org>
  */
 #include <linux/spinlock.h>
+#include <linux/seqlock.h>
 #include <linux/irq_work.h>
 #include <linux/slab.h>
 #include "trace.h"
···
 {
 	union upper_chunk *upper_chunk;
 	union lower_chunk *lower_chunk;
-	unsigned long flags;
+	unsigned int seq;
 	unsigned int upper1;
 	unsigned int upper2;
 	unsigned int lower;
···
 	if (pid_split(pid, &upper1, &upper2, &lower) < 0)
 		return false;

-	raw_spin_lock_irqsave(&pid_list->lock, flags);
-	upper_chunk = pid_list->upper[upper1];
-	if (upper_chunk) {
-		lower_chunk = upper_chunk->data[upper2];
-		if (lower_chunk)
-			ret = test_bit(lower, lower_chunk->data);
-	}
-	raw_spin_unlock_irqrestore(&pid_list->lock, flags);
+	do {
+		seq = read_seqcount_begin(&pid_list->seqcount);
+		ret = false;
+		upper_chunk = pid_list->upper[upper1];
+		if (upper_chunk) {
+			lower_chunk = upper_chunk->data[upper2];
+			if (lower_chunk)
+				ret = test_bit(lower, lower_chunk->data);
+		}
+	} while (read_seqcount_retry(&pid_list->seqcount, seq));

 	return ret;
 }
···
 		return -EINVAL;

 	raw_spin_lock_irqsave(&pid_list->lock, flags);
+	write_seqcount_begin(&pid_list->seqcount);
 	upper_chunk = pid_list->upper[upper1];
 	if (!upper_chunk) {
 		upper_chunk = get_upper_chunk(pid_list);
···
 	set_bit(lower, lower_chunk->data);
 	ret = 0;
  out:
+	write_seqcount_end(&pid_list->seqcount);
 	raw_spin_unlock_irqrestore(&pid_list->lock, flags);
 	return ret;
 }
···
 		return -EINVAL;

 	raw_spin_lock_irqsave(&pid_list->lock, flags);
+	write_seqcount_begin(&pid_list->seqcount);
 	upper_chunk = pid_list->upper[upper1];
 	if (!upper_chunk)
 		goto out;
···
 		}
 	}
  out:
+	write_seqcount_end(&pid_list->seqcount);
 	raw_spin_unlock_irqrestore(&pid_list->lock, flags);
 	return 0;
 }
···

  again:
 	raw_spin_lock(&pid_list->lock);
+	write_seqcount_begin(&pid_list->seqcount);
 	upper_count = CHUNK_ALLOC - pid_list->free_upper_chunks;
 	lower_count = CHUNK_ALLOC - pid_list->free_lower_chunks;
+	write_seqcount_end(&pid_list->seqcount);
 	raw_spin_unlock(&pid_list->lock);

 	if (upper_count <= 0 && lower_count <= 0)
···
 	}

 	raw_spin_lock(&pid_list->lock);
+	write_seqcount_begin(&pid_list->seqcount);
 	if (upper) {
 		*upper_next = pid_list->upper_list;
 		pid_list->upper_list = upper;
···
 		pid_list->lower_list = lower;
 		pid_list->free_lower_chunks += lcnt;
 	}
+	write_seqcount_end(&pid_list->seqcount);
 	raw_spin_unlock(&pid_list->lock);

 	/*
···
 	init_irq_work(&pid_list->refill_irqwork, pid_list_refill_irq);

 	raw_spin_lock_init(&pid_list->lock);
+	seqcount_raw_spinlock_init(&pid_list->seqcount, &pid_list->lock);

 	for (i = 0; i < CHUNK_ALLOC; i++) {
 		union upper_chunk *chunk;
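The read side in this diff follows the usual sequence-counter pattern: sample the counter, read the data without taking a lock, and retry if the counter changed. The sketch below reproduces that pattern in user space with C11 atomics. All names (`struct demo_seq`, `demo_write`, `demo_test`, and so on) are hypothetical, the protected data is reduced to a single flag, and the default sequentially consistent ordering is used for simplicity, whereas the kernel primitives use weaker, carefully placed barriers. Writers are assumed to be serialized externally, which is the role pid_list->lock plays in the diff.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_seq {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	atomic_bool bit;	/* the protected data (one flag here) */
};

static void demo_init(struct demo_seq *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->bit, false);
}

/* Writer: callers must already hold an external lock. */
static void demo_write(struct demo_seq *s, bool val)
{
	assert((atomic_load(&s->seq) & 1) == 0);	/* no writer active */
	atomic_fetch_add(&s->seq, 1);	/* begin: counter becomes odd */
	atomic_store(&s->bit, val);
	atomic_fetch_add(&s->seq, 1);	/* end: counter is even again */
}

/* Reader: wait for an even counter and return it. */
static unsigned int demo_read_begin(struct demo_seq *s)
{
	unsigned int seq;

	while ((seq = atomic_load(&s->seq)) & 1)
		;	/* a write is in progress, sample again */
	return seq;
}

/* Reader: true if a writer ran since demo_read_begin(). */
static bool demo_read_retry(struct demo_seq *s, unsigned int start)
{
	return atomic_load(&s->seq) != start;
}

/* Lockless test of the flag, retried if it raced with a writer. */
static bool demo_test(struct demo_seq *s)
{
	unsigned int seq;
	bool ret;

	do {
		seq = demo_read_begin(s);
		ret = atomic_load(&s->bit);
	} while (demo_read_retry(s, seq));

	return ret;
}
```

The reader never writes to shared state, so many CPUs can test the flag at once without bouncing a lock cache line, which is the contention the commit is aimed at.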
+1
kernel/trace/pid_list.h
···
 };

 struct trace_pid_list {
+	seqcount_raw_spinlock_t	seqcount;
 	raw_spinlock_t	lock;
 	struct irq_work	refill_irqwork;
 	union upper_chunk	*upper[UPPER1_SIZE]; // 1 or 2K in size
+636 -263
kernel/trace/trace.c
··· 20 20 #include <linux/security.h> 21 21 #include <linux/seq_file.h> 22 22 #include <linux/irqflags.h> 23 + #include <linux/syscalls.h> 23 24 #include <linux/debugfs.h> 24 25 #include <linux/tracefs.h> 25 26 #include <linux/pagemap.h> ··· 94 93 static bool traceoff_after_boot __initdata; 95 94 static DEFINE_STATIC_KEY_FALSE(tracepoint_printk_key); 96 95 97 - /* For tracers that don't implement custom flags */ 98 - static struct tracer_opt dummy_tracer_opt[] = { 99 - { } 96 + /* Store tracers and their flags per instance */ 97 + struct tracers { 98 + struct list_head list; 99 + struct tracer *tracer; 100 + struct tracer_flags *flags; 100 101 }; 101 - 102 - static int 103 - dummy_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set) 104 - { 105 - return 0; 106 - } 107 102 108 103 /* 109 104 * To prevent the comm cache from being overwritten when no ··· 509 512 510 513 /* trace_flags holds trace_options default values */ 511 514 #define TRACE_DEFAULT_FLAGS \ 512 - (FUNCTION_DEFAULT_FLAGS | \ 513 - TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK | \ 514 - TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | \ 515 - TRACE_ITER_RECORD_CMD | TRACE_ITER_OVERWRITE | \ 516 - TRACE_ITER_IRQ_INFO | TRACE_ITER_MARKERS | \ 517 - TRACE_ITER_HASH_PTR | TRACE_ITER_TRACE_PRINTK | \ 518 - TRACE_ITER_COPY_MARKER) 515 + (FUNCTION_DEFAULT_FLAGS | FPROFILE_DEFAULT_FLAGS | \ 516 + TRACE_ITER(PRINT_PARENT) | TRACE_ITER(PRINTK) | \ 517 + TRACE_ITER(ANNOTATE) | TRACE_ITER(CONTEXT_INFO) | \ 518 + TRACE_ITER(RECORD_CMD) | TRACE_ITER(OVERWRITE) | \ 519 + TRACE_ITER(IRQ_INFO) | TRACE_ITER(MARKERS) | \ 520 + TRACE_ITER(HASH_PTR) | TRACE_ITER(TRACE_PRINTK) | \ 521 + TRACE_ITER(COPY_MARKER)) 519 522 520 523 /* trace_options that are only supported by global_trace */ 521 - #define TOP_LEVEL_TRACE_FLAGS (TRACE_ITER_PRINTK | \ 522 - TRACE_ITER_PRINTK_MSGONLY | TRACE_ITER_RECORD_CMD) 524 + #define TOP_LEVEL_TRACE_FLAGS (TRACE_ITER(PRINTK) | \ 525 + TRACE_ITER(PRINTK_MSGONLY) | 
TRACE_ITER(RECORD_CMD) | \ 526 + TRACE_ITER(PROF_TEXT_OFFSET) | FPROFILE_DEFAULT_FLAGS) 523 527 524 528 /* trace_flags that are default zero for instances */ 525 529 #define ZEROED_TRACE_FLAGS \ 526 - (TRACE_ITER_EVENT_FORK | TRACE_ITER_FUNC_FORK | TRACE_ITER_TRACE_PRINTK | \ 527 - TRACE_ITER_COPY_MARKER) 530 + (TRACE_ITER(EVENT_FORK) | TRACE_ITER(FUNC_FORK) | TRACE_ITER(TRACE_PRINTK) | \ 531 + TRACE_ITER(COPY_MARKER)) 528 532 529 533 /* 530 534 * The global_trace is the descriptor that holds the top-level tracing ··· 556 558 if (printk_trace == tr) 557 559 return; 558 560 559 - printk_trace->trace_flags &= ~TRACE_ITER_TRACE_PRINTK; 561 + printk_trace->trace_flags &= ~TRACE_ITER(TRACE_PRINTK); 560 562 printk_trace = tr; 561 - tr->trace_flags |= TRACE_ITER_TRACE_PRINTK; 563 + tr->trace_flags |= TRACE_ITER(TRACE_PRINTK); 562 564 } 563 565 564 566 /* Returns true if the status of tr changed */ ··· 571 573 return false; 572 574 573 575 list_add_rcu(&tr->marker_list, &marker_copies); 574 - tr->trace_flags |= TRACE_ITER_COPY_MARKER; 576 + tr->trace_flags |= TRACE_ITER(COPY_MARKER); 575 577 return true; 576 578 } 577 579 ··· 579 581 return false; 580 582 581 583 list_del_init(&tr->marker_list); 582 - tr->trace_flags &= ~TRACE_ITER_COPY_MARKER; 584 + tr->trace_flags &= ~TRACE_ITER(COPY_MARKER); 583 585 return true; 584 586 } 585 587 ··· 1137 1139 unsigned int trace_ctx; 1138 1140 int alloc; 1139 1141 1140 - if (!(tr->trace_flags & TRACE_ITER_PRINTK)) 1142 + if (!(tr->trace_flags & TRACE_ITER(PRINTK))) 1141 1143 return 0; 1142 1144 1143 1145 if (unlikely(tracing_selftest_running && tr == &global_trace)) ··· 1203 1205 if (!printk_binsafe(tr)) 1204 1206 return __trace_puts(ip, str, strlen(str)); 1205 1207 1206 - if (!(tr->trace_flags & TRACE_ITER_PRINTK)) 1208 + if (!(tr->trace_flags & TRACE_ITER(PRINTK))) 1207 1209 return 0; 1208 1210 1209 1211 if (unlikely(tracing_selftest_running || tracing_disabled)) ··· 2171 2173 static int run_tracer_selftest(struct tracer *type) 2172 
2174 { 2173 2175 struct trace_array *tr = &global_trace; 2176 + struct tracer_flags *saved_flags = tr->current_trace_flags; 2174 2177 struct tracer *saved_tracer = tr->current_trace; 2175 2178 int ret; 2176 2179 ··· 2202 2203 tracing_reset_online_cpus(&tr->array_buffer); 2203 2204 2204 2205 tr->current_trace = type; 2206 + tr->current_trace_flags = type->flags ? : type->default_flags; 2205 2207 2206 2208 #ifdef CONFIG_TRACER_MAX_TRACE 2207 2209 if (type->use_max_tr) { ··· 2219 2219 ret = type->selftest(type, tr); 2220 2220 /* the test is responsible for resetting too */ 2221 2221 tr->current_trace = saved_tracer; 2222 + tr->current_trace_flags = saved_flags; 2222 2223 if (ret) { 2223 2224 printk(KERN_CONT "FAILED!\n"); 2224 2225 /* Add the warning after printing 'FAILED' */ ··· 2312 2311 } 2313 2312 #endif /* CONFIG_FTRACE_STARTUP_TEST */ 2314 2313 2315 - static void add_tracer_options(struct trace_array *tr, struct tracer *t); 2314 + static int add_tracer(struct trace_array *tr, struct tracer *t); 2316 2315 2317 2316 static void __init apply_trace_boot_options(void); 2317 + 2318 + static void free_tracers(struct trace_array *tr) 2319 + { 2320 + struct tracers *t, *n; 2321 + 2322 + lockdep_assert_held(&trace_types_lock); 2323 + 2324 + list_for_each_entry_safe(t, n, &tr->tracers, list) { 2325 + list_del(&t->list); 2326 + kfree(t->flags); 2327 + kfree(t); 2328 + } 2329 + } 2318 2330 2319 2331 /** 2320 2332 * register_tracer - register a tracer with the ftrace system. 
··· 2337 2323 */ 2338 2324 int __init register_tracer(struct tracer *type) 2339 2325 { 2326 + struct trace_array *tr; 2340 2327 struct tracer *t; 2341 2328 int ret = 0; 2342 2329 ··· 2369 2354 } 2370 2355 } 2371 2356 2372 - if (!type->set_flag) 2373 - type->set_flag = &dummy_set_flag; 2374 - if (!type->flags) { 2375 - /*allocate a dummy tracer_flags*/ 2376 - type->flags = kmalloc(sizeof(*type->flags), GFP_KERNEL); 2377 - if (!type->flags) { 2378 - ret = -ENOMEM; 2379 - goto out; 2380 - } 2381 - type->flags->val = 0; 2382 - type->flags->opts = dummy_tracer_opt; 2383 - } else 2384 - if (!type->flags->opts) 2385 - type->flags->opts = dummy_tracer_opt; 2386 - 2387 2357 /* store the tracer for __set_tracer_option */ 2388 - type->flags->trace = type; 2358 + if (type->flags) 2359 + type->flags->trace = type; 2389 2360 2390 2361 ret = do_run_tracer_selftest(type); 2391 2362 if (ret < 0) 2392 2363 goto out; 2393 2364 2365 + list_for_each_entry(tr, &ftrace_trace_arrays, list) { 2366 + ret = add_tracer(tr, type); 2367 + if (ret < 0) { 2368 + /* The tracer will still exist but without options */ 2369 + pr_warn("Failed to create tracer options for %s\n", type->name); 2370 + break; 2371 + } 2372 + } 2373 + 2394 2374 type->next = trace_types; 2395 2375 trace_types = type; 2396 - add_tracer_options(&global_trace, type); 2397 2376 2398 2377 out: 2399 2378 mutex_unlock(&trace_types_lock); ··· 2400 2391 2401 2392 printk(KERN_INFO "Starting tracer '%s'\n", type->name); 2402 2393 /* Do we want this tracer to start on bootup? 
*/ 2403 - tracing_set_tracer(&global_trace, type->name); 2394 + WARN_ON(tracing_set_tracer(&global_trace, type->name) < 0); 2404 2395 default_bootup_tracer = NULL; 2405 2396 2406 2397 apply_trace_boot_options(); ··· 3087 3078 unsigned int trace_ctx, 3088 3079 int skip, struct pt_regs *regs) 3089 3080 { 3090 - if (!(tr->trace_flags & TRACE_ITER_STACKTRACE)) 3081 + if (!(tr->trace_flags & TRACE_ITER(STACKTRACE))) 3091 3082 return; 3092 3083 3093 3084 __ftrace_trace_stack(tr, buffer, trace_ctx, skip, regs); ··· 3148 3139 struct ring_buffer_event *event; 3149 3140 struct userstack_entry *entry; 3150 3141 3151 - if (!(tr->trace_flags & TRACE_ITER_USERSTACKTRACE)) 3142 + if (!(tr->trace_flags & TRACE_ITER(USERSTACKTRACE))) 3152 3143 return; 3153 3144 3154 3145 /* ··· 3493 3484 if (tr == &global_trace) 3494 3485 return 0; 3495 3486 3496 - if (!(tr->trace_flags & TRACE_ITER_PRINTK)) 3487 + if (!(tr->trace_flags & TRACE_ITER(PRINTK))) 3497 3488 return 0; 3498 3489 3499 3490 va_start(ap, fmt); ··· 3530 3521 int ret; 3531 3522 va_list ap; 3532 3523 3533 - if (!(printk_trace->trace_flags & TRACE_ITER_PRINTK)) 3524 + if (!(printk_trace->trace_flags & TRACE_ITER(PRINTK))) 3534 3525 return 0; 3535 3526 3536 3527 va_start(ap, fmt); ··· 3800 3791 if (WARN_ON_ONCE(!fmt)) 3801 3792 return fmt; 3802 3793 3803 - if (!iter->tr || iter->tr->trace_flags & TRACE_ITER_HASH_PTR) 3794 + if (!iter->tr || iter->tr->trace_flags & TRACE_ITER(HASH_PTR)) 3804 3795 return fmt; 3805 3796 3806 3797 p = fmt; ··· 4122 4113 static void print_func_help_header(struct array_buffer *buf, struct seq_file *m, 4123 4114 unsigned int flags) 4124 4115 { 4125 - bool tgid = flags & TRACE_ITER_RECORD_TGID; 4116 + bool tgid = flags & TRACE_ITER(RECORD_TGID); 4126 4117 4127 4118 print_event_info(buf, m); 4128 4119 ··· 4133 4124 static void print_func_help_header_irq(struct array_buffer *buf, struct seq_file *m, 4134 4125 unsigned int flags) 4135 4126 { 4136 - bool tgid = flags & TRACE_ITER_RECORD_TGID; 4127 + bool 
tgid = flags & TRACE_ITER(RECORD_TGID); 4137 4128 static const char space[] = " "; 4138 4129 int prec = tgid ? 12 : 2; 4139 4130 ··· 4206 4197 struct trace_seq *s = &iter->seq; 4207 4198 struct trace_array *tr = iter->tr; 4208 4199 4209 - if (!(tr->trace_flags & TRACE_ITER_ANNOTATE)) 4200 + if (!(tr->trace_flags & TRACE_ITER(ANNOTATE))) 4210 4201 return; 4211 4202 4212 4203 if (!(iter->iter_flags & TRACE_FILE_ANNOTATE)) ··· 4228 4219 iter->cpu); 4229 4220 } 4230 4221 4222 + #ifdef CONFIG_FTRACE_SYSCALLS 4223 + static bool is_syscall_event(struct trace_event *event) 4224 + { 4225 + return (event->funcs == &enter_syscall_print_funcs) || 4226 + (event->funcs == &exit_syscall_print_funcs); 4227 + 4228 + } 4229 + #define syscall_buf_size CONFIG_TRACE_SYSCALL_BUF_SIZE_DEFAULT 4230 + #else 4231 + static inline bool is_syscall_event(struct trace_event *event) 4232 + { 4233 + return false; 4234 + } 4235 + #define syscall_buf_size 0 4236 + #endif /* CONFIG_FTRACE_SYSCALLS */ 4237 + 4231 4238 static enum print_line_t print_trace_fmt(struct trace_iterator *iter) 4232 4239 { 4233 4240 struct trace_array *tr = iter->tr; ··· 4258 4233 4259 4234 event = ftrace_find_event(entry->type); 4260 4235 4261 - if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) { 4236 + if (tr->trace_flags & TRACE_ITER(CONTEXT_INFO)) { 4262 4237 if (iter->iter_flags & TRACE_FILE_LAT_FMT) 4263 4238 trace_print_lat_context(iter); 4264 4239 else ··· 4269 4244 return TRACE_TYPE_PARTIAL_LINE; 4270 4245 4271 4246 if (event) { 4272 - if (tr->trace_flags & TRACE_ITER_FIELDS) 4247 + if (tr->trace_flags & TRACE_ITER(FIELDS)) 4273 4248 return print_event_fields(iter, event); 4274 4249 /* 4275 4250 * For TRACE_EVENT() events, the print_fmt is not 4276 4251 * safe to use if the array has delta offsets 4277 4252 * Force printing via the fields. 
4278 4253 */ 4279 - if ((tr->text_delta) && 4280 - event->type > __TRACE_LAST_TYPE) 4254 + if ((tr->text_delta)) { 4255 + /* ftrace and system call events are still OK */ 4256 + if ((event->type > __TRACE_LAST_TYPE) && 4257 + !is_syscall_event(event)) 4281 4258 return print_event_fields(iter, event); 4282 - 4259 + } 4283 4260 return event->funcs->trace(iter, sym_flags, event); 4284 4261 } 4285 4262 ··· 4299 4272 4300 4273 entry = iter->ent; 4301 4274 4302 - if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) 4275 + if (tr->trace_flags & TRACE_ITER(CONTEXT_INFO)) 4303 4276 trace_seq_printf(s, "%d %d %llu ", 4304 4277 entry->pid, iter->cpu, iter->ts); 4305 4278 ··· 4325 4298 4326 4299 entry = iter->ent; 4327 4300 4328 - if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) { 4301 + if (tr->trace_flags & TRACE_ITER(CONTEXT_INFO)) { 4329 4302 SEQ_PUT_HEX_FIELD(s, entry->pid); 4330 4303 SEQ_PUT_HEX_FIELD(s, iter->cpu); 4331 4304 SEQ_PUT_HEX_FIELD(s, iter->ts); ··· 4354 4327 4355 4328 entry = iter->ent; 4356 4329 4357 - if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) { 4330 + if (tr->trace_flags & TRACE_ITER(CONTEXT_INFO)) { 4358 4331 SEQ_PUT_FIELD(s, entry->pid); 4359 4332 SEQ_PUT_FIELD(s, iter->cpu); 4360 4333 SEQ_PUT_FIELD(s, iter->ts); ··· 4425 4398 } 4426 4399 4427 4400 if (iter->ent->type == TRACE_BPUTS && 4428 - trace_flags & TRACE_ITER_PRINTK && 4429 - trace_flags & TRACE_ITER_PRINTK_MSGONLY) 4401 + trace_flags & TRACE_ITER(PRINTK) && 4402 + trace_flags & TRACE_ITER(PRINTK_MSGONLY)) 4430 4403 return trace_print_bputs_msg_only(iter); 4431 4404 4432 4405 if (iter->ent->type == TRACE_BPRINT && 4433 - trace_flags & TRACE_ITER_PRINTK && 4434 - trace_flags & TRACE_ITER_PRINTK_MSGONLY) 4406 + trace_flags & TRACE_ITER(PRINTK) && 4407 + trace_flags & TRACE_ITER(PRINTK_MSGONLY)) 4435 4408 return trace_print_bprintk_msg_only(iter); 4436 4409 4437 4410 if (iter->ent->type == TRACE_PRINT && 4438 - trace_flags & TRACE_ITER_PRINTK && 4439 - trace_flags & TRACE_ITER_PRINTK_MSGONLY) 4411 + 
trace_flags & TRACE_ITER(PRINTK) && 4412 + trace_flags & TRACE_ITER(PRINTK_MSGONLY)) 4440 4413 return trace_print_printk_msg_only(iter); 4441 4414 4442 - if (trace_flags & TRACE_ITER_BIN) 4415 + if (trace_flags & TRACE_ITER(BIN)) 4443 4416 return print_bin_fmt(iter); 4444 4417 4445 - if (trace_flags & TRACE_ITER_HEX) 4418 + if (trace_flags & TRACE_ITER(HEX)) 4446 4419 return print_hex_fmt(iter); 4447 4420 4448 - if (trace_flags & TRACE_ITER_RAW) 4421 + if (trace_flags & TRACE_ITER(RAW)) 4449 4422 return print_raw_fmt(iter); 4450 4423 4451 4424 return print_trace_fmt(iter); ··· 4463 4436 if (iter->iter_flags & TRACE_FILE_LAT_FMT) 4464 4437 print_trace_header(m, iter); 4465 4438 4466 - if (!(tr->trace_flags & TRACE_ITER_VERBOSE)) 4439 + if (!(tr->trace_flags & TRACE_ITER(VERBOSE))) 4467 4440 print_lat_help_header(m); 4468 4441 } 4469 4442 ··· 4473 4446 struct trace_array *tr = iter->tr; 4474 4447 unsigned long trace_flags = tr->trace_flags; 4475 4448 4476 - if (!(trace_flags & TRACE_ITER_CONTEXT_INFO)) 4449 + if (!(trace_flags & TRACE_ITER(CONTEXT_INFO))) 4477 4450 return; 4478 4451 4479 4452 if (iter->iter_flags & TRACE_FILE_LAT_FMT) { ··· 4481 4454 if (trace_empty(iter)) 4482 4455 return; 4483 4456 print_trace_header(m, iter); 4484 - if (!(trace_flags & TRACE_ITER_VERBOSE)) 4457 + if (!(trace_flags & TRACE_ITER(VERBOSE))) 4485 4458 print_lat_help_header(m); 4486 4459 } else { 4487 - if (!(trace_flags & TRACE_ITER_VERBOSE)) { 4488 - if (trace_flags & TRACE_ITER_IRQ_INFO) 4460 + if (!(trace_flags & TRACE_ITER(VERBOSE))) { 4461 + if (trace_flags & TRACE_ITER(IRQ_INFO)) 4489 4462 print_func_help_header_irq(iter->array_buffer, 4490 4463 m, trace_flags); 4491 4464 else ··· 4709 4682 * If pause-on-trace is enabled, then stop the trace while 4710 4683 * dumping, unless this is the "snapshot" file 4711 4684 */ 4712 - if (!iter->snapshot && (tr->trace_flags & TRACE_ITER_PAUSE_ON_TRACE)) 4685 + if (!iter->snapshot && (tr->trace_flags & TRACE_ITER(PAUSE_ON_TRACE))) 4713 4686 
tracing_stop_tr(tr); 4714 4687 4715 4688 if (iter->cpu_file == RING_BUFFER_ALL_CPUS) { ··· 4903 4876 iter = __tracing_open(inode, file, false); 4904 4877 if (IS_ERR(iter)) 4905 4878 ret = PTR_ERR(iter); 4906 - else if (tr->trace_flags & TRACE_ITER_LATENCY_FMT) 4879 + else if (tr->trace_flags & TRACE_ITER(LATENCY_FMT)) 4907 4880 iter->iter_flags |= TRACE_FILE_LAT_FMT; 4908 4881 } 4909 4882 ··· 5166 5139 { 5167 5140 struct tracer_opt *trace_opts; 5168 5141 struct trace_array *tr = m->private; 5142 + struct tracer_flags *flags; 5169 5143 u32 tracer_flags; 5170 5144 int i; 5171 5145 5172 5146 guard(mutex)(&trace_types_lock); 5173 5147 5174 - tracer_flags = tr->current_trace->flags->val; 5175 - trace_opts = tr->current_trace->flags->opts; 5176 - 5177 5148 for (i = 0; trace_options[i]; i++) { 5178 - if (tr->trace_flags & (1 << i)) 5149 + if (tr->trace_flags & (1ULL << i)) 5179 5150 seq_printf(m, "%s\n", trace_options[i]); 5180 5151 else 5181 5152 seq_printf(m, "no%s\n", trace_options[i]); 5182 5153 } 5154 + 5155 + flags = tr->current_trace_flags; 5156 + if (!flags || !flags->opts) 5157 + return 0; 5158 + 5159 + tracer_flags = flags->val; 5160 + trace_opts = flags->opts; 5183 5161 5184 5162 for (i = 0; trace_opts[i].name; i++) { 5185 5163 if (tracer_flags & trace_opts[i].bit) ··· 5201 5169 struct tracer_opt *opts, int neg) 5202 5170 { 5203 5171 struct tracer *trace = tracer_flags->trace; 5204 - int ret; 5172 + int ret = 0; 5205 5173 5206 - ret = trace->set_flag(tr, tracer_flags->val, opts->bit, !neg); 5174 + if (trace->set_flag) 5175 + ret = trace->set_flag(tr, tracer_flags->val, opts->bit, !neg); 5207 5176 if (ret) 5208 5177 return ret; 5209 5178 ··· 5218 5185 /* Try to assign a tracer specific option */ 5219 5186 static int set_tracer_option(struct trace_array *tr, char *cmp, int neg) 5220 5187 { 5221 - struct tracer *trace = tr->current_trace; 5222 - struct tracer_flags *tracer_flags = trace->flags; 5188 + struct tracer_flags *tracer_flags = tr->current_trace_flags; 
5223 5189 struct tracer_opt *opts = NULL; 5224 5190 int i; 5191 + 5192 + if (!tracer_flags || !tracer_flags->opts) 5193 + return 0; 5225 5194 5226 5195 for (i = 0; tracer_flags->opts[i].name; i++) { 5227 5196 opts = &tracer_flags->opts[i]; 5228 5197 5229 5198 if (strcmp(cmp, opts->name) == 0) 5230 - return __set_tracer_option(tr, trace->flags, opts, neg); 5199 + return __set_tracer_option(tr, tracer_flags, opts, neg); 5231 5200 } 5232 5201 5233 5202 return -EINVAL; 5234 5203 } 5235 5204 5236 5205 /* Some tracers require overwrite to stay enabled */ 5237 - int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set) 5206 + int trace_keep_overwrite(struct tracer *tracer, u64 mask, int set) 5238 5207 { 5239 - if (tracer->enabled && (mask & TRACE_ITER_OVERWRITE) && !set) 5208 + if (tracer->enabled && (mask & TRACE_ITER(OVERWRITE)) && !set) 5240 5209 return -1; 5241 5210 5242 5211 return 0; 5243 5212 } 5244 5213 5245 - int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled) 5214 + int set_tracer_flag(struct trace_array *tr, u64 mask, int enabled) 5246 5215 { 5247 - if ((mask == TRACE_ITER_RECORD_TGID) || 5248 - (mask == TRACE_ITER_RECORD_CMD) || 5249 - (mask == TRACE_ITER_TRACE_PRINTK) || 5250 - (mask == TRACE_ITER_COPY_MARKER)) 5216 + switch (mask) { 5217 + case TRACE_ITER(RECORD_TGID): 5218 + case TRACE_ITER(RECORD_CMD): 5219 + case TRACE_ITER(TRACE_PRINTK): 5220 + case TRACE_ITER(COPY_MARKER): 5251 5221 lockdep_assert_held(&event_mutex); 5222 + } 5252 5223 5253 5224 /* do nothing if flag is already set */ 5254 5225 if (!!(tr->trace_flags & mask) == !!enabled) ··· 5263 5226 if (tr->current_trace->flag_changed(tr, mask, !!enabled)) 5264 5227 return -EINVAL; 5265 5228 5266 - if (mask == TRACE_ITER_TRACE_PRINTK) { 5229 + switch (mask) { 5230 + case TRACE_ITER(TRACE_PRINTK): 5267 5231 if (enabled) { 5268 5232 update_printk_trace(tr); 5269 5233 } else { ··· 5281 5243 if (printk_trace == tr) 5282 5244 update_printk_trace(&global_trace); 5283 5245 } 
5284 - } 5246 + break; 5285 5247 5286 - if (mask == TRACE_ITER_COPY_MARKER) 5248 + case TRACE_ITER(COPY_MARKER): 5287 5249 update_marker_trace(tr, enabled); 5250 + /* update_marker_trace updates the tr->trace_flags */ 5251 + return 0; 5252 + } 5288 5253 5289 5254 if (enabled) 5290 5255 tr->trace_flags |= mask; 5291 5256 else 5292 5257 tr->trace_flags &= ~mask; 5293 5258 5294 - if (mask == TRACE_ITER_RECORD_CMD) 5259 + switch (mask) { 5260 + case TRACE_ITER(RECORD_CMD): 5295 5261 trace_event_enable_cmd_record(enabled); 5262 + break; 5296 5263 5297 - if (mask == TRACE_ITER_RECORD_TGID) { 5264 + case TRACE_ITER(RECORD_TGID): 5298 5265 5299 5266 if (trace_alloc_tgid_map() < 0) { 5300 - tr->trace_flags &= ~TRACE_ITER_RECORD_TGID; 5267 + tr->trace_flags &= ~TRACE_ITER(RECORD_TGID); 5301 5268 return -ENOMEM; 5302 5269 } 5303 5270 5304 5271 trace_event_enable_tgid_record(enabled); 5305 - } 5272 + break; 5306 5273 5307 - if (mask == TRACE_ITER_EVENT_FORK) 5274 + case TRACE_ITER(EVENT_FORK): 5308 5275 trace_event_follow_fork(tr, enabled); 5276 + break; 5309 5277 5310 - if (mask == TRACE_ITER_FUNC_FORK) 5278 + case TRACE_ITER(FUNC_FORK): 5311 5279 ftrace_pid_follow_fork(tr, enabled); 5280 + break; 5312 5281 5313 - if (mask == TRACE_ITER_OVERWRITE) { 5282 + case TRACE_ITER(OVERWRITE): 5314 5283 ring_buffer_change_overwrite(tr->array_buffer.buffer, enabled); 5315 5284 #ifdef CONFIG_TRACER_MAX_TRACE 5316 5285 ring_buffer_change_overwrite(tr->max_buffer.buffer, enabled); 5317 5286 #endif 5318 - } 5287 + break; 5319 5288 5320 - if (mask == TRACE_ITER_PRINTK) { 5289 + case TRACE_ITER(PRINTK): 5321 5290 trace_printk_start_stop_comm(enabled); 5322 5291 trace_printk_control(enabled); 5292 + break; 5293 + 5294 + #if defined(CONFIG_FUNCTION_PROFILER) && defined(CONFIG_FUNCTION_GRAPH_TRACER) 5295 + case TRACE_GRAPH_GRAPH_TIME: 5296 + ftrace_graph_graph_time_control(enabled); 5297 + break; 5298 + #endif 5323 5299 } 5324 5300 5325 5301 return 0; ··· 5363 5311 if (ret < 0) 5364 5312 ret = 
set_tracer_option(tr, cmp, neg); 5365 5313 else 5366 - ret = set_tracer_flag(tr, 1 << ret, !neg); 5314 + ret = set_tracer_flag(tr, 1ULL << ret, !neg); 5367 5315 5368 5316 mutex_unlock(&trace_types_lock); 5369 5317 mutex_unlock(&event_mutex); ··· 6267 6215 return ret; 6268 6216 } 6269 6217 6270 - struct trace_option_dentry; 6271 - 6272 - static void 6273 - create_trace_option_files(struct trace_array *tr, struct tracer *tracer); 6274 - 6275 6218 /* 6276 6219 * Used to clear out the tracer before deletion of an instance. 6277 6220 * Must have trace_types_lock held. ··· 6282 6235 tr->current_trace->reset(tr); 6283 6236 6284 6237 tr->current_trace = &nop_trace; 6238 + tr->current_trace_flags = nop_trace.flags; 6285 6239 } 6286 6240 6287 6241 static bool tracer_options_updated; 6288 6242 6289 - static void add_tracer_options(struct trace_array *tr, struct tracer *t) 6290 - { 6291 - /* Only enable if the directory has been created already. */ 6292 - if (!tr->dir && !(tr->flags & TRACE_ARRAY_FL_GLOBAL)) 6293 - return; 6294 - 6295 - /* Only create trace option files after update_tracer_options finish */ 6296 - if (!tracer_options_updated) 6297 - return; 6298 - 6299 - create_trace_option_files(tr, t); 6300 - } 6301 - 6302 6243 int tracing_set_tracer(struct trace_array *tr, const char *buf) 6303 6244 { 6304 - struct tracer *t; 6245 + struct tracer *trace = NULL; 6246 + struct tracers *t; 6305 6247 #ifdef CONFIG_TRACER_MAX_TRACE 6306 6248 bool had_max_tr; 6307 6249 #endif ··· 6308 6272 ret = 0; 6309 6273 } 6310 6274 6311 - for (t = trace_types; t; t = t->next) { 6312 - if (strcmp(t->name, buf) == 0) 6275 + list_for_each_entry(t, &tr->tracers, list) { 6276 + if (strcmp(t->tracer->name, buf) == 0) { 6277 + trace = t->tracer; 6313 6278 break; 6279 + } 6314 6280 } 6315 - if (!t) 6281 + if (!trace) 6316 6282 return -EINVAL; 6317 6283 6318 - if (t == tr->current_trace) 6284 + if (trace == tr->current_trace) 6319 6285 return 0; 6320 6286 6321 6287 #ifdef CONFIG_TRACER_SNAPSHOT 6322 
- if (t->use_max_tr) { 6288 + if (trace->use_max_tr) { 6323 6289 local_irq_disable(); 6324 6290 arch_spin_lock(&tr->max_lock); 6325 6291 ret = tr->cond_snapshot ? -EBUSY : 0; ··· 6332 6294 } 6333 6295 #endif 6334 6296 /* Some tracers won't work on kernel command line */ 6335 - if (system_state < SYSTEM_RUNNING && t->noboot) { 6297 + if (system_state < SYSTEM_RUNNING && trace->noboot) { 6336 6298 pr_warn("Tracer '%s' is not allowed on command line, ignored\n", 6337 - t->name); 6299 + trace->name); 6338 6300 return -EINVAL; 6339 6301 } 6340 6302 6341 6303 /* Some tracers are only allowed for the top level buffer */ 6342 - if (!trace_ok_for_array(t, tr)) 6304 + if (!trace_ok_for_array(trace, tr)) 6343 6305 return -EINVAL; 6344 6306 6345 6307 /* If trace pipe files are being read, we can't change the tracer */ ··· 6358 6320 6359 6321 /* Current trace needs to be nop_trace before synchronize_rcu */ 6360 6322 tr->current_trace = &nop_trace; 6323 + tr->current_trace_flags = nop_trace.flags; 6361 6324 6362 - if (had_max_tr && !t->use_max_tr) { 6325 + if (had_max_tr && !trace->use_max_tr) { 6363 6326 /* 6364 6327 * We need to make sure that the update_max_tr sees that 6365 6328 * current_trace changed to nop_trace to keep it from ··· 6373 6334 tracing_disarm_snapshot(tr); 6374 6335 } 6375 6336 6376 - if (!had_max_tr && t->use_max_tr) { 6337 + if (!had_max_tr && trace->use_max_tr) { 6377 6338 ret = tracing_arm_snapshot_locked(tr); 6378 6339 if (ret) 6379 6340 return ret; ··· 6382 6343 tr->current_trace = &nop_trace; 6383 6344 #endif 6384 6345 6385 - if (t->init) { 6386 - ret = tracer_init(t, tr); 6346 + tr->current_trace_flags = t->flags ? 
: t->tracer->flags; 6347 + 6348 + if (trace->init) { 6349 + ret = tracer_init(trace, tr); 6387 6350 if (ret) { 6388 6351 #ifdef CONFIG_TRACER_MAX_TRACE 6389 - if (t->use_max_tr) 6352 + if (trace->use_max_tr) 6390 6353 tracing_disarm_snapshot(tr); 6391 6354 #endif 6355 + tr->current_trace_flags = nop_trace.flags; 6392 6356 return ret; 6393 6357 } 6394 6358 } 6395 6359 6396 - tr->current_trace = t; 6360 + tr->current_trace = trace; 6397 6361 tr->current_trace->enabled++; 6398 6362 trace_branch_enable(tr); 6399 6363 ··· 6574 6532 /* trace pipe does not show start of buffer */ 6575 6533 cpumask_setall(iter->started); 6576 6534 6577 - if (tr->trace_flags & TRACE_ITER_LATENCY_FMT) 6535 + if (tr->trace_flags & TRACE_ITER(LATENCY_FMT)) 6578 6536 iter->iter_flags |= TRACE_FILE_LAT_FMT; 6579 6537 6580 6538 /* Output in nanoseconds only if we are using a clock in nanoseconds. */ ··· 6635 6593 if (trace_buffer_iter(iter, iter->cpu_file)) 6636 6594 return EPOLLIN | EPOLLRDNORM; 6637 6595 6638 - if (tr->trace_flags & TRACE_ITER_BLOCK) 6596 + if (tr->trace_flags & TRACE_ITER(BLOCK)) 6639 6597 /* 6640 6598 * Always select as readable when in blocking mode 6641 6599 */ ··· 6954 6912 } 6955 6913 6956 6914 static ssize_t 6915 + tracing_syscall_buf_read(struct file *filp, char __user *ubuf, 6916 + size_t cnt, loff_t *ppos) 6917 + { 6918 + struct inode *inode = file_inode(filp); 6919 + struct trace_array *tr = inode->i_private; 6920 + char buf[64]; 6921 + int r; 6922 + 6923 + r = snprintf(buf, 64, "%d\n", tr->syscall_buf_sz); 6924 + 6925 + return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); 6926 + } 6927 + 6928 + static ssize_t 6929 + tracing_syscall_buf_write(struct file *filp, const char __user *ubuf, 6930 + size_t cnt, loff_t *ppos) 6931 + { 6932 + struct inode *inode = file_inode(filp); 6933 + struct trace_array *tr = inode->i_private; 6934 + unsigned long val; 6935 + int ret; 6936 + 6937 + ret = kstrtoul_from_user(ubuf, cnt, 10, &val); 6938 + if (ret) 6939 + return ret; 6940 
+ 6941 + if (val > SYSCALL_FAULT_USER_MAX) 6942 + val = SYSCALL_FAULT_USER_MAX; 6943 + 6944 + tr->syscall_buf_sz = val; 6945 + 6946 + *ppos += cnt; 6947 + 6948 + return cnt; 6949 + } 6950 + 6951 + static ssize_t 6957 6952 tracing_entries_read(struct file *filp, char __user *ubuf, 6958 6953 size_t cnt, loff_t *ppos) 6959 6954 { ··· 7224 7145 struct trace_array *tr = inode->i_private; 7225 7146 7226 7147 /* disable tracing ? */ 7227 - if (tr->trace_flags & TRACE_ITER_STOP_ON_FREE) 7148 + if (tr->trace_flags & TRACE_ITER(STOP_ON_FREE)) 7228 7149 tracer_tracing_off(tr); 7229 7150 /* resize the ring buffer to 0 */ 7230 7151 tracing_resize_ring_buffer(tr, 0, RING_BUFFER_ALL_CPUS); ··· 7302 7223 char *buf; 7303 7224 }; 7304 7225 7305 - struct trace_user_buf_info { 7306 - struct trace_user_buf __percpu *tbuf; 7307 - int ref; 7308 - }; 7309 - 7310 - 7311 7226 static DEFINE_MUTEX(trace_user_buffer_mutex); 7312 7227 static struct trace_user_buf_info *trace_user_buffer; 7313 7228 7314 - static void trace_user_fault_buffer_free(struct trace_user_buf_info *tinfo) 7229 + /** 7230 + * trace_user_fault_destroy - free up allocated memory of a trace user buffer 7231 + * @tinfo: The descriptor to free up 7232 + * 7233 + * Frees any data allocated in the trace info dsecriptor. 
7234 + */ 7235 + void trace_user_fault_destroy(struct trace_user_buf_info *tinfo) 7315 7236 { 7316 7237 char *buf; 7317 7238 int cpu; 7239 + 7240 + if (!tinfo || !tinfo->tbuf) 7241 + return; 7318 7242 7319 7243 for_each_possible_cpu(cpu) { 7320 7244 buf = per_cpu_ptr(tinfo->tbuf, cpu)->buf; 7321 7245 kfree(buf); 7322 7246 } 7323 7247 free_percpu(tinfo->tbuf); 7324 - kfree(tinfo); 7325 7248 } 7326 7249 7327 - static int trace_user_fault_buffer_enable(void) 7250 + static int user_fault_buffer_enable(struct trace_user_buf_info *tinfo, size_t size) 7328 7251 { 7329 - struct trace_user_buf_info *tinfo; 7330 7252 char *buf; 7331 7253 int cpu; 7332 7254 7333 - guard(mutex)(&trace_user_buffer_mutex); 7334 - 7335 - if (trace_user_buffer) { 7336 - trace_user_buffer->ref++; 7337 - return 0; 7338 - } 7339 - 7340 - tinfo = kmalloc(sizeof(*tinfo), GFP_KERNEL); 7341 - if (!tinfo) 7342 - return -ENOMEM; 7255 + lockdep_assert_held(&trace_user_buffer_mutex); 7343 7256 7344 7257 tinfo->tbuf = alloc_percpu(struct trace_user_buf); 7345 - if (!tinfo->tbuf) { 7346 - kfree(tinfo); 7258 + if (!tinfo->tbuf) 7347 7259 return -ENOMEM; 7348 - } 7349 7260 7350 7261 tinfo->ref = 1; 7262 + tinfo->size = size; 7351 7263 7352 7264 /* Clear each buffer in case of error */ 7353 7265 for_each_possible_cpu(cpu) { ··· 7346 7276 } 7347 7277 7348 7278 for_each_possible_cpu(cpu) { 7349 - buf = kmalloc_node(TRACE_MARKER_MAX_SIZE, GFP_KERNEL, 7279 + buf = kmalloc_node(size, GFP_KERNEL, 7350 7280 cpu_to_node(cpu)); 7351 - if (!buf) { 7352 - trace_user_fault_buffer_free(tinfo); 7281 + if (!buf) 7353 7282 return -ENOMEM; 7354 - } 7355 7283 per_cpu_ptr(tinfo->tbuf, cpu)->buf = buf; 7356 7284 } 7357 - 7358 - trace_user_buffer = tinfo; 7359 7285 7360 7286 return 0; 7361 7287 } 7362 7288 7363 - static void trace_user_fault_buffer_disable(void) 7289 + /* For internal use. 
Free and reinitialize */ 7290 + static void user_buffer_free(struct trace_user_buf_info **tinfo) 7364 7291 { 7365 - struct trace_user_buf_info *tinfo; 7292 + lockdep_assert_held(&trace_user_buffer_mutex); 7293 + 7294 + trace_user_fault_destroy(*tinfo); 7295 + kfree(*tinfo); 7296 + *tinfo = NULL; 7297 + } 7298 + 7299 + /* For internal use. Initialize and allocate */ 7300 + static int user_buffer_init(struct trace_user_buf_info **tinfo, size_t size) 7301 + { 7302 + bool alloc = false; 7303 + int ret; 7304 + 7305 + lockdep_assert_held(&trace_user_buffer_mutex); 7306 + 7307 + if (!*tinfo) { 7308 + alloc = true; 7309 + *tinfo = kzalloc(sizeof(**tinfo), GFP_KERNEL); 7310 + if (!*tinfo) 7311 + return -ENOMEM; 7312 + } 7313 + 7314 + ret = user_fault_buffer_enable(*tinfo, size); 7315 + if (ret < 0 && alloc) 7316 + user_buffer_free(tinfo); 7317 + 7318 + return ret; 7319 + } 7320 + 7321 + /* For internal use, derefrence and free if necessary */ 7322 + static void user_buffer_put(struct trace_user_buf_info **tinfo) 7323 + { 7324 + guard(mutex)(&trace_user_buffer_mutex); 7325 + 7326 + if (WARN_ON_ONCE(!*tinfo || !(*tinfo)->ref)) 7327 + return; 7328 + 7329 + if (--(*tinfo)->ref) 7330 + return; 7331 + 7332 + user_buffer_free(tinfo); 7333 + } 7334 + 7335 + /** 7336 + * trace_user_fault_init - Allocated or reference a per CPU buffer 7337 + * @tinfo: A pointer to the trace buffer descriptor 7338 + * @size: The size to allocate each per CPU buffer 7339 + * 7340 + * Create a per CPU buffer that can be used to copy from user space 7341 + * in a task context. When calling trace_user_fault_read(), preemption 7342 + * must be disabled, and it will enable preemption and copy user 7343 + * space data to the buffer. If any schedule switches occur, it will 7344 + * retry until it succeeds without a schedule switch knowing the buffer 7345 + * is still valid. 7346 + * 7347 + * Returns 0 on success, negative on failure. 
7348 + */ 7349 + int trace_user_fault_init(struct trace_user_buf_info *tinfo, size_t size) 7350 + { 7351 + int ret; 7352 + 7353 + if (!tinfo) 7354 + return -EINVAL; 7366 7355 7367 7356 guard(mutex)(&trace_user_buffer_mutex); 7368 7357 7369 - tinfo = trace_user_buffer; 7358 + ret = user_buffer_init(&tinfo, size); 7359 + if (ret < 0) 7360 + trace_user_fault_destroy(tinfo); 7370 7361 7371 - if (WARN_ON_ONCE(!tinfo)) 7372 - return; 7373 - 7374 - if (--tinfo->ref) 7375 - return; 7376 - 7377 - trace_user_fault_buffer_free(tinfo); 7378 - trace_user_buffer = NULL; 7362 + return ret; 7379 7363 } 7380 7364 7381 - /* Must be called with preemption disabled */ 7382 - static char *trace_user_fault_read(struct trace_user_buf_info *tinfo, 7383 - const char __user *ptr, size_t size, 7384 - size_t *read_size) 7365 + /** 7366 + * trace_user_fault_get - up the ref count for the user buffer 7367 + * @tinfo: A pointer to a pointer to the trace buffer descriptor 7368 + * 7369 + * Ups the ref count of the trace buffer. 7370 + * 7371 + * Returns the new ref count. 7372 + */ 7373 + int trace_user_fault_get(struct trace_user_buf_info *tinfo) 7374 + { 7375 + if (!tinfo) 7376 + return -1; 7377 + 7378 + guard(mutex)(&trace_user_buffer_mutex); 7379 + 7380 + tinfo->ref++; 7381 + return tinfo->ref; 7382 + } 7383 + 7384 + /** 7385 + * trace_user_fault_put - dereference a per cpu trace buffer 7386 + * @tinfo: The @tinfo that was passed to trace_user_fault_get() 7387 + * 7388 + * Decrement the ref count of @tinfo. 7389 + * 7390 + * Returns the new refcount (negative on error). 
7391 + */ 7392 + int trace_user_fault_put(struct trace_user_buf_info *tinfo) 7393 + { 7394 + guard(mutex)(&trace_user_buffer_mutex); 7395 + 7396 + if (WARN_ON_ONCE(!tinfo || !tinfo->ref)) 7397 + return -1; 7398 + 7399 + --tinfo->ref; 7400 + return tinfo->ref; 7401 + } 7402 + 7403 + /** 7404 + * trace_user_fault_read - Read user space into a per CPU buffer 7405 + * @tinfo: The @tinfo allocated by trace_user_fault_get() 7406 + * @ptr: The user space pointer to read 7407 + * @size: The size of user space to read. 7408 + * @copy_func: Optional function to use to copy from user space 7409 + * @data: Data to pass to copy_func if it was supplied 7410 + * 7411 + * Preemption must be disabled when this is called, and must not 7412 + * be enabled while using the returned buffer. 7413 + * This does the copying from user space into a per CPU buffer. 7414 + * 7415 + * The @size must not be greater than the size passed in to 7416 + * trace_user_fault_init(). 7417 + * 7418 + * If @copy_func is NULL, trace_user_fault_read() will use copy_from_user(), 7419 + * otherwise it will call @copy_func. It will call @copy_func with: 7420 + * 7421 + * buffer: the per CPU buffer of the @tinfo. 7422 + * ptr: The pointer @ptr to user space to read 7423 + * size: The @size of the ptr to read 7424 + * data: The @data parameter 7425 + * 7426 + * It is expected that @copy_func will return 0 on success and non zero 7427 + * if there was a fault. 7428 + * 7429 + * Returns a pointer to the buffer with the content read from @ptr. 7430 + * Preemption must remain disabled while the caller accesses the 7431 + * buffer returned by this function. 7432 + * Returns NULL if there was a fault, or the size passed in is 7433 + * greater than the size passed to trace_user_fault_init(). 
7434 + */ 7435 + char *trace_user_fault_read(struct trace_user_buf_info *tinfo, 7436 + const char __user *ptr, size_t size, 7437 + trace_user_buf_copy copy_func, void *data) 7385 7438 { 7386 7439 int cpu = smp_processor_id(); 7387 7440 char *buffer = per_cpu_ptr(tinfo->tbuf, cpu)->buf; ··· 7512 7319 int trys = 0; 7513 7320 int ret; 7514 7321 7515 - if (size > TRACE_MARKER_MAX_SIZE) 7516 - size = TRACE_MARKER_MAX_SIZE; 7517 - *read_size = 0; 7322 + lockdep_assert_preemption_disabled(); 7323 + 7324 + /* 7325 + * It's up to the caller to not try to copy more than it said 7326 + * it would. 7327 + */ 7328 + if (size > tinfo->size) 7329 + return NULL; 7518 7330 7519 7331 /* 7520 7332 * This acts similar to a seqcount. The per CPU context switches are ··· 7559 7361 */ 7560 7362 preempt_enable_notrace(); 7561 7363 7562 - ret = __copy_from_user(buffer, ptr, size); 7364 + /* Make sure preemption is enabled here */ 7365 + lockdep_assert_preemption_enabled(); 7366 + 7367 + if (copy_func) { 7368 + ret = copy_func(buffer, ptr, size, data); 7369 + } else { 7370 + ret = __copy_from_user(buffer, ptr, size); 7371 + } 7563 7372 7564 7373 preempt_disable_notrace(); 7565 7374 migrate_enable(); ··· 7583 7378 */ 7584 7379 } while (nr_context_switches_cpu(cpu) != cnt); 7585 7380 7586 - *read_size = size; 7587 7381 return buffer; 7588 7382 } 7589 7383 ··· 7593 7389 struct trace_array *tr = filp->private_data; 7594 7390 ssize_t written = -ENODEV; 7595 7391 unsigned long ip; 7596 - size_t size; 7597 7392 char *buf; 7598 7393 7599 7394 if (tracing_disabled) 7600 7395 return -EINVAL; 7601 7396 7602 - if (!(tr->trace_flags & TRACE_ITER_MARKERS)) 7397 + if (!(tr->trace_flags & TRACE_ITER(MARKERS))) 7603 7398 return -EINVAL; 7604 7399 7605 7400 if ((ssize_t)cnt < 0) ··· 7610 7407 /* Must have preemption disabled while having access to the buffer */ 7611 7408 guard(preempt_notrace)(); 7612 7409 7613 - buf = trace_user_fault_read(trace_user_buffer, ubuf, cnt, &size); 7410 + buf = 
trace_user_fault_read(trace_user_buffer, ubuf, cnt, NULL, NULL); 7614 7411 if (!buf) 7615 7412 return -EFAULT; 7616 - 7617 - if (cnt > size) 7618 - cnt = size; 7619 7413 7620 7414 /* The selftests expect this function to be the IP address */ 7621 7415 ip = _THIS_IP_; ··· 7642 7442 size_t size; 7643 7443 7644 7444 /* cnt includes both the entry->id and the data behind it. */ 7645 - size = struct_size(entry, buf, cnt - sizeof(entry->id)); 7445 + size = struct_offset(entry, id) + cnt; 7646 7446 7647 7447 buffer = tr->array_buffer.buffer; 7648 7448 ··· 7673 7473 { 7674 7474 struct trace_array *tr = filp->private_data; 7675 7475 ssize_t written = -ENODEV; 7676 - size_t size; 7677 7476 char *buf; 7678 7477 7679 7478 if (tracing_disabled) 7680 7479 return -EINVAL; 7681 7480 7682 - if (!(tr->trace_flags & TRACE_ITER_MARKERS)) 7481 + if (!(tr->trace_flags & TRACE_ITER(MARKERS))) 7683 7482 return -EINVAL; 7684 7483 7685 7484 /* The marker must at least have a tag id */ 7686 7485 if (cnt < sizeof(unsigned int)) 7687 7486 return -EINVAL; 7688 7487 7488 + /* raw write is all or nothing */ 7489 + if (cnt > TRACE_MARKER_MAX_SIZE) 7490 + return -EINVAL; 7491 + 7689 7492 /* Must have preemption disabled while having access to the buffer */ 7690 7493 guard(preempt_notrace)(); 7691 7494 7692 - buf = trace_user_fault_read(trace_user_buffer, ubuf, cnt, &size); 7495 + buf = trace_user_fault_read(trace_user_buffer, ubuf, cnt, NULL, NULL); 7693 7496 if (!buf) 7694 7497 return -EFAULT; 7695 - 7696 - /* raw write is all or nothing */ 7697 - if (cnt > size) 7698 - return -EINVAL; 7699 7498 7700 7499 /* The global trace_marker_raw can go to multiple instances */ 7701 7500 if (tr == &global_trace) { ··· 7715 7516 { 7716 7517 int ret; 7717 7518 7718 - ret = trace_user_fault_buffer_enable(); 7719 - if (ret < 0) 7720 - return ret; 7519 + scoped_guard(mutex, &trace_user_buffer_mutex) { 7520 + if (!trace_user_buffer) { 7521 + ret = user_buffer_init(&trace_user_buffer, TRACE_MARKER_MAX_SIZE); 7522 + 
if (ret < 0) 7523 + return ret; 7524 + } else { 7525 + trace_user_buffer->ref++; 7526 + } 7527 + } 7721 7528 7722 7529 stream_open(inode, filp); 7723 7530 ret = tracing_open_generic_tr(inode, filp); 7724 7531 if (ret < 0) 7725 - trace_user_fault_buffer_disable(); 7532 + user_buffer_put(&trace_user_buffer); 7726 7533 return ret; 7727 7534 } 7728 7535 7729 7536 static int tracing_mark_release(struct inode *inode, struct file *file) 7730 7537 { 7731 - trace_user_fault_buffer_disable(); 7538 + user_buffer_put(&trace_user_buffer); 7732 7539 return tracing_release_generic_tr(inode, file); 7733 7540 } 7734 7541 ··· 8118 7913 .open = tracing_open_generic_tr, 8119 7914 .read = tracing_entries_read, 8120 7915 .write = tracing_entries_write, 7916 + .llseek = generic_file_llseek, 7917 + .release = tracing_release_generic_tr, 7918 + }; 7919 + 7920 + static const struct file_operations tracing_syscall_buf_fops = { 7921 + .open = tracing_open_generic_tr, 7922 + .read = tracing_syscall_buf_read, 7923 + .write = tracing_syscall_buf_write, 8121 7924 .llseek = generic_file_llseek, 8122 7925 .release = tracing_release_generic_tr, 8123 7926 }; ··· 9014 8801 struct trace_iterator *iter = &info->iter; 9015 8802 int ret = 0; 9016 8803 9017 - /* A memmap'ed buffer is not supported for user space mmap */ 9018 - if (iter->tr->flags & TRACE_ARRAY_FL_MEMMAP) 8804 + /* A memmap'ed and backup buffers are not supported for user space mmap */ 8805 + if (iter->tr->flags & (TRACE_ARRAY_FL_MEMMAP | TRACE_ARRAY_FL_VMALLOC)) 9019 8806 return -ENODEV; 9020 8807 9021 8808 ret = get_snapshot_map(iter->tr); ··· 9528 9315 9529 9316 get_tr_index(tr_index, &tr, &index); 9530 9317 9531 - if (tr->trace_flags & (1 << index)) 9318 + if (tr->trace_flags & (1ULL << index)) 9532 9319 buf = "1\n"; 9533 9320 else 9534 9321 buf = "0\n"; ··· 9557 9344 9558 9345 mutex_lock(&event_mutex); 9559 9346 mutex_lock(&trace_types_lock); 9560 - ret = set_tracer_flag(tr, 1 << index, val); 9347 + ret = set_tracer_flag(tr, 1ULL << 
index, val); 9561 9348 mutex_unlock(&trace_types_lock); 9562 9349 mutex_unlock(&event_mutex); 9563 9350 ··· 9630 9417 9631 9418 topt->entry = trace_create_file(opt->name, TRACE_MODE_WRITE, 9632 9419 t_options, topt, &trace_options_fops); 9633 - 9634 9420 } 9635 9421 9636 - static void 9637 - create_trace_option_files(struct trace_array *tr, struct tracer *tracer) 9422 + static int 9423 + create_trace_option_files(struct trace_array *tr, struct tracer *tracer, 9424 + struct tracer_flags *flags) 9638 9425 { 9639 9426 struct trace_option_dentry *topts; 9640 9427 struct trace_options *tr_topts; 9641 - struct tracer_flags *flags; 9642 9428 struct tracer_opt *opts; 9643 9429 int cnt; 9644 - int i; 9645 - 9646 - if (!tracer) 9647 - return; 9648 - 9649 - flags = tracer->flags; 9650 9430 9651 9431 if (!flags || !flags->opts) 9652 - return; 9653 - 9654 - /* 9655 - * If this is an instance, only create flags for tracers 9656 - * the instance may have. 9657 - */ 9658 - if (!trace_ok_for_array(tracer, tr)) 9659 - return; 9660 - 9661 - for (i = 0; i < tr->nr_topts; i++) { 9662 - /* Make sure there's no duplicate flags. 
*/ 9663 - if (WARN_ON_ONCE(tr->topts[i].tracer->flags == tracer->flags)) 9664 - return; 9665 - } 9432 + return 0; 9666 9433 9667 9434 opts = flags->opts; 9668 9435 ··· 9651 9458 9652 9459 topts = kcalloc(cnt + 1, sizeof(*topts), GFP_KERNEL); 9653 9460 if (!topts) 9654 - return; 9461 + return 0; 9655 9462 9656 9463 tr_topts = krealloc(tr->topts, sizeof(*tr->topts) * (tr->nr_topts + 1), 9657 9464 GFP_KERNEL); 9658 9465 if (!tr_topts) { 9659 9466 kfree(topts); 9660 - return; 9467 + return -ENOMEM; 9661 9468 } 9662 9469 9663 9470 tr->topts = tr_topts; ··· 9672 9479 "Failed to create trace option: %s", 9673 9480 opts[cnt].name); 9674 9481 } 9482 + return 0; 9483 + } 9484 + 9485 + static int get_global_flags_val(struct tracer *tracer) 9486 + { 9487 + struct tracers *t; 9488 + 9489 + list_for_each_entry(t, &global_trace.tracers, list) { 9490 + if (t->tracer != tracer) 9491 + continue; 9492 + if (!t->flags) 9493 + return -1; 9494 + return t->flags->val; 9495 + } 9496 + return -1; 9497 + } 9498 + 9499 + static int add_tracer_options(struct trace_array *tr, struct tracers *t) 9500 + { 9501 + struct tracer *tracer = t->tracer; 9502 + struct tracer_flags *flags = t->flags ?: tracer->flags; 9503 + 9504 + if (!flags) 9505 + return 0; 9506 + 9507 + /* Only add tracer options after update_tracer_options finish */ 9508 + if (!tracer_options_updated) 9509 + return 0; 9510 + 9511 + return create_trace_option_files(tr, tracer, flags); 9512 + } 9513 + 9514 + static int add_tracer(struct trace_array *tr, struct tracer *tracer) 9515 + { 9516 + struct tracer_flags *flags; 9517 + struct tracers *t; 9518 + int ret; 9519 + 9520 + /* Only enable if the directory has been created already. */ 9521 + if (!tr->dir && !(tr->flags & TRACE_ARRAY_FL_GLOBAL)) 9522 + return 0; 9523 + 9524 + /* 9525 + * If this is an instance, only create flags for tracers 9526 + * the instance may have. 
9527 + */ 9528 + if (!trace_ok_for_array(tracer, tr)) 9529 + return 0; 9530 + 9531 + t = kmalloc(sizeof(*t), GFP_KERNEL); 9532 + if (!t) 9533 + return -ENOMEM; 9534 + 9535 + t->tracer = tracer; 9536 + t->flags = NULL; 9537 + list_add(&t->list, &tr->tracers); 9538 + 9539 + flags = tracer->flags; 9540 + if (!flags) { 9541 + if (!tracer->default_flags) 9542 + return 0; 9543 + 9544 + /* 9545 + * If the tracer defines default flags, it means the flags are 9546 + * per trace instance. 9547 + */ 9548 + flags = kmalloc(sizeof(*flags), GFP_KERNEL); 9549 + if (!flags) 9550 + return -ENOMEM; 9551 + 9552 + *flags = *tracer->default_flags; 9553 + flags->trace = tracer; 9554 + 9555 + t->flags = flags; 9556 + 9557 + /* If this is an instance, inherit the global_trace flags */ 9558 + if (!(tr->flags & TRACE_ARRAY_FL_GLOBAL)) { 9559 + int val = get_global_flags_val(tracer); 9560 + if (!WARN_ON_ONCE(val < 0)) 9561 + flags->val = val; 9562 + } 9563 + } 9564 + 9565 + ret = add_tracer_options(tr, t); 9566 + if (ret < 0) { 9567 + list_del(&t->list); 9568 + kfree(t->flags); 9569 + kfree(t); 9570 + } 9571 + 9572 + return ret; 9675 9573 } 9676 9574 9677 9575 static struct dentry * ··· 9792 9508 9793 9509 for (i = 0; trace_options[i]; i++) { 9794 9510 if (top_level || 9795 - !((1 << i) & TOP_LEVEL_TRACE_FLAGS)) 9511 + !((1ULL << i) & TOP_LEVEL_TRACE_FLAGS)) { 9796 9512 create_trace_option_core_file(tr, trace_options[i], i); 9513 + } 9797 9514 } 9798 9515 } 9799 9516 ··· 10115 9830 struct trace_scratch *tscratch; 10116 9831 unsigned int scratch_size = 0; 10117 9832 10118 - rb_flags = tr->trace_flags & TRACE_ITER_OVERWRITE ? RB_FL_OVERWRITE : 0; 9833 + rb_flags = tr->trace_flags & TRACE_ITER(OVERWRITE) ? 
RB_FL_OVERWRITE : 0; 10119 9834 10120 9835 buf->tr = tr; 10121 9836 ··· 10213 9928 tr->trace_flags_index[i] = i; 10214 9929 } 10215 9930 10216 - static void __update_tracer_options(struct trace_array *tr) 9931 + static int __update_tracer(struct trace_array *tr) 10217 9932 { 10218 9933 struct tracer *t; 9934 + int ret = 0; 10219 9935 10220 - for (t = trace_types; t; t = t->next) 10221 - add_tracer_options(tr, t); 9936 + for (t = trace_types; t && !ret; t = t->next) 9937 + ret = add_tracer(tr, t); 9938 + 9939 + return ret; 10222 9940 } 10223 9941 10224 - static void update_tracer_options(struct trace_array *tr) 9942 + static __init int __update_tracer_options(struct trace_array *tr) 10225 9943 { 9944 + struct tracers *t; 9945 + int ret = 0; 9946 + 9947 + list_for_each_entry(t, &tr->tracers, list) { 9948 + ret = add_tracer_options(tr, t); 9949 + if (ret < 0) 9950 + break; 9951 + } 9952 + 9953 + return ret; 9954 + } 9955 + 9956 + static __init void update_tracer_options(void) 9957 + { 9958 + struct trace_array *tr; 9959 + 10226 9960 guard(mutex)(&trace_types_lock); 10227 9961 tracer_options_updated = true; 10228 - __update_tracer_options(tr); 9962 + list_for_each_entry(tr, &ftrace_trace_arrays, list) 9963 + __update_tracer_options(tr); 10229 9964 } 10230 9965 10231 9966 /* Must have trace_types_lock held */ ··· 10290 9985 } 10291 9986 10292 9987 init_tracer_tracefs(tr, tr->dir); 10293 - __update_tracer_options(tr); 10294 - 10295 - return ret; 9988 + ret = __update_tracer(tr); 9989 + if (ret) { 9990 + event_trace_del_tracer(tr); 9991 + tracefs_remove(tr->dir); 9992 + return ret; 9993 + } 9994 + return 0; 10296 9995 } 10297 9996 10298 9997 static struct trace_array * ··· 10338 10029 10339 10030 raw_spin_lock_init(&tr->start_lock); 10340 10031 10032 + tr->syscall_buf_sz = global_trace.syscall_buf_sz; 10033 + 10341 10034 tr->max_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; 10342 10035 #ifdef CONFIG_TRACER_MAX_TRACE 10343 10036 
spin_lock_init(&tr->snapshot_trigger_lock); 10344 10037 #endif 10345 10038 tr->current_trace = &nop_trace; 10039 + tr->current_trace_flags = nop_trace.flags; 10346 10040 10347 10041 INIT_LIST_HEAD(&tr->systems); 10348 10042 INIT_LIST_HEAD(&tr->events); 10349 10043 INIT_LIST_HEAD(&tr->hist_vars); 10350 10044 INIT_LIST_HEAD(&tr->err_log); 10045 + INIT_LIST_HEAD(&tr->tracers); 10351 10046 INIT_LIST_HEAD(&tr->marker_list); 10352 10047 10353 10048 #ifdef CONFIG_MODULES ··· 10506 10193 /* Disable all the flags that were enabled coming in */ 10507 10194 for (i = 0; i < TRACE_FLAGS_MAX_SIZE; i++) { 10508 10195 if ((1 << i) & ZEROED_TRACE_FLAGS) 10509 - set_tracer_flag(tr, 1 << i, 0); 10196 + set_tracer_flag(tr, 1ULL << i, 0); 10510 10197 } 10511 10198 10512 10199 if (printk_trace == tr) ··· 10524 10211 free_percpu(tr->last_func_repeats); 10525 10212 free_trace_buffers(tr); 10526 10213 clear_tracing_err_log(tr); 10214 + free_tracers(tr); 10527 10215 10528 10216 if (tr->range_name) { 10529 10217 reserve_mem_release_by_name(tr->range_name); 10530 10218 kfree(tr->range_name); 10531 10219 } 10220 + if (tr->flags & TRACE_ARRAY_FL_VMALLOC) 10221 + vfree((void *)tr->range_addr_start); 10532 10222 10533 10223 for (i = 0; i < tr->nr_topts; i++) { 10534 10224 kfree(tr->topts[i].topts); ··· 10660 10344 10661 10345 trace_create_file("buffer_subbuf_size_kb", TRACE_MODE_WRITE, d_tracer, 10662 10346 tr, &buffer_subbuf_size_fops); 10347 + 10348 + trace_create_file("syscall_user_buf_size", TRACE_MODE_WRITE, d_tracer, 10349 + tr, &tracing_syscall_buf_fops); 10663 10350 10664 10351 create_trace_options_dir(tr); 10665 10352 ··· 10949 10630 10950 10631 create_trace_instances(NULL); 10951 10632 10952 - update_tracer_options(&global_trace); 10633 + update_tracer_options(); 10953 10634 } 10954 10635 10955 10636 static __init int tracer_init_tracefs(void) ··· 11102 10783 /* While dumping, do not allow the buffer to be enable */ 11103 10784 tracer_tracing_disable(tr); 11104 10785 11105 - old_userobj 
= tr->trace_flags & TRACE_ITER_SYM_USEROBJ; 10786 + old_userobj = tr->trace_flags & TRACE_ITER(SYM_USEROBJ); 11106 10787 11107 10788 /* don't look at user memory in panic mode */ 11108 - tr->trace_flags &= ~TRACE_ITER_SYM_USEROBJ; 10789 + tr->trace_flags &= ~TRACE_ITER(SYM_USEROBJ); 11109 10790 11110 10791 if (dump_mode == DUMP_ORIG) 11111 10792 iter.cpu_file = raw_smp_processor_id(); ··· 11337 11018 static inline void do_allocate_snapshot(const char *name) { } 11338 11019 #endif 11339 11020 11021 + __init static int backup_instance_area(const char *backup, 11022 + unsigned long *addr, phys_addr_t *size) 11023 + { 11024 + struct trace_array *backup_tr; 11025 + void *allocated_vaddr = NULL; 11026 + 11027 + backup_tr = trace_array_get_by_name(backup, NULL); 11028 + if (!backup_tr) { 11029 + pr_warn("Tracing: Instance %s is not found.\n", backup); 11030 + return -ENOENT; 11031 + } 11032 + 11033 + if (!(backup_tr->flags & TRACE_ARRAY_FL_BOOT)) { 11034 + pr_warn("Tracing: Instance %s is not boot mapped.\n", backup); 11035 + trace_array_put(backup_tr); 11036 + return -EINVAL; 11037 + } 11038 + 11039 + *size = backup_tr->range_addr_size; 11040 + 11041 + allocated_vaddr = vzalloc(*size); 11042 + if (!allocated_vaddr) { 11043 + pr_warn("Tracing: Failed to allocate memory for copying instance %s (size 0x%lx)\n", 11044 + backup, (unsigned long)*size); 11045 + trace_array_put(backup_tr); 11046 + return -ENOMEM; 11047 + } 11048 + 11049 + memcpy(allocated_vaddr, 11050 + (void *)backup_tr->range_addr_start, (size_t)*size); 11051 + *addr = (unsigned long)allocated_vaddr; 11052 + 11053 + trace_array_put(backup_tr); 11054 + return 0; 11055 + } 11056 + 11340 11057 __init static void enable_instances(void) 11341 11058 { 11342 11059 struct trace_array *tr; ··· 11395 11040 char *flag_delim; 11396 11041 char *addr_delim; 11397 11042 char *rname __free(kfree) = NULL; 11043 + char *backup; 11398 11044 11399 11045 tok = strsep(&curr_str, ","); 11400 11046 11401 - flag_delim = strchr(tok, 
'^'); 11402 - addr_delim = strchr(tok, '@'); 11047 + name = strsep(&tok, "="); 11048 + backup = tok; 11049 + 11050 + flag_delim = strchr(name, '^'); 11051 + addr_delim = strchr(name, '@'); 11403 11052 11404 11053 if (addr_delim) 11405 11054 *addr_delim++ = '\0'; ··· 11411 11052 if (flag_delim) 11412 11053 *flag_delim++ = '\0'; 11413 11054 11414 - name = tok; 11055 + if (backup) { 11056 + if (backup_instance_area(backup, &addr, &size) < 0) 11057 + continue; 11058 + } 11415 11059 11416 11060 if (flag_delim) { 11417 11061 char *flag; ··· 11510 11148 tr->ref++; 11511 11149 } 11512 11150 11513 - if (start) { 11151 + /* 11152 + * Backup buffers can be freed but need vfree(). 11153 + */ 11154 + if (backup) 11155 + tr->flags |= TRACE_ARRAY_FL_VMALLOC; 11156 + 11157 + if (start || backup) { 11514 11158 tr->flags |= TRACE_ARRAY_FL_BOOT | TRACE_ARRAY_FL_LAST_BOOT; 11515 11159 tr->range_name = no_free_ptr(rname); 11516 11160 } ··· 11610 11242 * just a bootstrap of current_trace anyway. 11611 11243 */ 11612 11244 global_trace.current_trace = &nop_trace; 11245 + global_trace.current_trace_flags = nop_trace.flags; 11613 11246 11614 11247 global_trace.max_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; 11615 11248 #ifdef CONFIG_TRACER_MAX_TRACE ··· 11624 11255 11625 11256 init_trace_flags_index(&global_trace); 11626 11257 11627 - register_tracer(&nop_trace); 11628 - 11629 - /* Function tracing may start here (via kernel command line) */ 11630 - init_function_trace(); 11258 + INIT_LIST_HEAD(&global_trace.tracers); 11631 11259 11632 11260 /* All seems OK, enable tracing */ 11633 11261 tracing_disabled = 0; ··· 11636 11270 11637 11271 global_trace.flags = TRACE_ARRAY_FL_GLOBAL; 11638 11272 11273 + global_trace.syscall_buf_sz = syscall_buf_size; 11274 + 11639 11275 INIT_LIST_HEAD(&global_trace.systems); 11640 11276 INIT_LIST_HEAD(&global_trace.events); 11641 11277 INIT_LIST_HEAD(&global_trace.hist_vars); 11642 11278 INIT_LIST_HEAD(&global_trace.err_log); 11643 11279 
list_add(&global_trace.marker_list, &marker_copies); 11644 11280 list_add(&global_trace.list, &ftrace_trace_arrays); 11281 + 11282 + register_tracer(&nop_trace); 11283 + 11284 + /* Function tracing may start here (via kernel command line) */ 11285 + init_function_trace(); 11645 11286 11646 11287 apply_trace_boot_options(); 11647 11288 ··· 11673 11300 11674 11301 #ifdef CONFIG_FUNCTION_TRACER 11675 11302 /* Used to set module cached ftrace filtering at boot up */ 11676 - __init struct trace_array *trace_get_global_array(void) 11303 + struct trace_array *trace_get_global_array(void) 11677 11304 { 11678 11305 return &global_trace; 11679 11306 }
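Several hunks in kernel/trace/trace.c replace `1 << i` with `1ULL << i`. The following is a minimal userspace sketch of why that matters once the option mask is 64 bits wide: a plain `1` has type int, so the shift has to be done in a 64-bit type to reach bits 32 and above. The names `flag_mask`, `flag_is_set` and `FLAGS_MAX_SIZE` are hypothetical and are not kernel identifiers. Note that in the hunk shown above, the loop condition `(1 << i) & ZEROED_TRACE_FLAGS` is an unchanged context line and still uses a plain `1`.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the new value of TRACE_FLAGS_MAX_SIZE; the name is hypothetical. */
#define FLAGS_MAX_SIZE 64

/* Hypothetical helper: build the mask for option bit 'bit' in a 64-bit word. */
static inline uint64_t flag_mask(unsigned int bit)
{
	assert(bit < FLAGS_MAX_SIZE);
	/*
	 * 1ULL keeps the shift in a type that is at least 64 bits wide.
	 * (1 << bit) would be evaluated as int, which cannot hold bit 32+.
	 */
	return 1ULL << bit;
}

/* Hypothetical helper: test one option bit in a 64-bit flags word. */
static inline int flag_is_set(uint64_t flags, unsigned int bit)
{
	return (flags & flag_mask(bit)) != 0;
}
```

A caller would use `flag_mask(i)` wherever a mask for option `i` is passed around, so that options numbered 32 to 63 behave the same as the first 32.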
+140 -90
kernel/trace/trace.h
··· 22 22 #include <linux/ctype.h> 23 23 #include <linux/once_lite.h> 24 24 #include <linux/ftrace_regs.h> 25 + #include <linux/llist.h> 25 26 26 27 #include "pid_list.h" 27 28 ··· 132 131 #define HIST_STACKTRACE_SIZE (HIST_STACKTRACE_DEPTH * sizeof(unsigned long)) 133 132 #define HIST_STACKTRACE_SKIP 5 134 133 134 + #define SYSCALL_FAULT_USER_MAX 165 135 + 135 136 /* 136 137 * syscalls are special, and need special handling, this is why 137 138 * they are not included in trace_entries.h ··· 219 216 int cpu; 220 217 }; 221 218 222 - #define TRACE_FLAGS_MAX_SIZE 32 219 + #define TRACE_FLAGS_MAX_SIZE 64 223 220 224 221 struct trace_options { 225 222 struct tracer *tracer; ··· 393 390 int buffer_percent; 394 391 unsigned int n_err_log_entries; 395 392 struct tracer *current_trace; 396 - unsigned int trace_flags; 393 + struct tracer_flags *current_trace_flags; 394 + u64 trace_flags; 397 395 unsigned char trace_flags_index[TRACE_FLAGS_MAX_SIZE]; 398 396 unsigned int flags; 399 397 raw_spinlock_t start_lock; ··· 408 404 struct list_head systems; 409 405 struct list_head events; 410 406 struct list_head marker_list; 407 + struct list_head tracers; 411 408 struct trace_event_file *trace_marker_file; 412 409 cpumask_var_t tracing_cpumask; /* only trace on set CPUs */ 413 410 /* one per_cpu trace_pipe can be opened by only one user */ ··· 435 430 int function_enabled; 436 431 #endif 437 432 int no_filter_buffering_ref; 433 + unsigned int syscall_buf_sz; 438 434 struct list_head hist_vars; 439 435 #ifdef CONFIG_TRACER_SNAPSHOT 440 436 struct cond_snapshot *cond_snapshot; ··· 454 448 TRACE_ARRAY_FL_LAST_BOOT = BIT(2), 455 449 TRACE_ARRAY_FL_MOD_INIT = BIT(3), 456 450 TRACE_ARRAY_FL_MEMMAP = BIT(4), 451 + TRACE_ARRAY_FL_VMALLOC = BIT(5), 457 452 }; 458 453 459 454 #ifdef CONFIG_MODULES ··· 638 631 u32 old_flags, u32 bit, int set); 639 632 /* Return 0 if OK with change, else return non-zero */ 640 633 int (*flag_changed)(struct trace_array *tr, 641 - u32 mask, int set); 634 + 
u64 mask, int set); 642 635 struct tracer *next; 643 636 struct tracer_flags *flags; 637 + struct tracer_flags *default_flags; 644 638 int enabled; 645 639 bool print_max; 646 640 bool allow_instances; ··· 945 937 #define TRACE_GRAPH_PRINT_FILL_SHIFT 28 946 938 #define TRACE_GRAPH_PRINT_FILL_MASK (0x3 << TRACE_GRAPH_PRINT_FILL_SHIFT) 947 939 948 - extern void ftrace_graph_sleep_time_control(bool enable); 949 - 950 940 #ifdef CONFIG_FUNCTION_PROFILER 951 941 extern void ftrace_graph_graph_time_control(bool enable); 952 942 #else ··· 964 958 extern int __trace_graph_retaddr_entry(struct trace_array *tr, 965 959 struct ftrace_graph_ent *trace, 966 960 unsigned int trace_ctx, 967 - unsigned long retaddr); 961 + unsigned long retaddr, 962 + struct ftrace_regs *fregs); 968 963 extern void __trace_graph_return(struct trace_array *tr, 969 964 struct ftrace_graph_ret *trace, 970 965 unsigned int trace_ctx, ··· 1116 1109 #endif /* CONFIG_DYNAMIC_FTRACE */ 1117 1110 1118 1111 extern unsigned int fgraph_max_depth; 1119 - extern bool fgraph_sleep_time; 1112 + extern int fgraph_no_sleep_time; 1113 + extern bool fprofile_no_sleep_time; 1120 1114 1121 1115 static inline bool 1122 1116 ftrace_graph_ignore_func(struct fgraph_ops *gops, struct ftrace_graph_ent *trace) ··· 1353 1345 # define FUNCTION_FLAGS \ 1354 1346 C(FUNCTION, "function-trace"), \ 1355 1347 C(FUNC_FORK, "function-fork"), 1356 - # define FUNCTION_DEFAULT_FLAGS TRACE_ITER_FUNCTION 1348 + # define FUNCTION_DEFAULT_FLAGS TRACE_ITER(FUNCTION) 1357 1349 #else 1358 1350 # define FUNCTION_FLAGS 1359 1351 # define FUNCTION_DEFAULT_FLAGS 0UL 1360 - # define TRACE_ITER_FUNC_FORK 0UL 1352 + # define TRACE_ITER_FUNC_FORK_BIT -1 1361 1353 #endif 1362 1354 1363 1355 #ifdef CONFIG_STACKTRACE ··· 1365 1357 C(STACKTRACE, "stacktrace"), 1366 1358 #else 1367 1359 # define STACK_FLAGS 1360 + #endif 1361 + 1362 + #ifdef CONFIG_FUNCTION_PROFILER 1363 + # define PROFILER_FLAGS \ 1364 + C(PROF_TEXT_OFFSET, "prof-text-offset"), 1365 + # 
ifdef CONFIG_FUNCTION_GRAPH_TRACER 1366 + # define FPROFILE_FLAGS \ 1367 + C(GRAPH_TIME, "graph-time"), 1368 + # define FPROFILE_DEFAULT_FLAGS TRACE_ITER(GRAPH_TIME) 1369 + # else 1370 + # define FPROFILE_FLAGS 1371 + # define FPROFILE_DEFAULT_FLAGS 0UL 1372 + # endif 1373 + #else 1374 + # define PROFILER_FLAGS 1375 + # define FPROFILE_FLAGS 1376 + # define FPROFILE_DEFAULT_FLAGS 0UL 1377 + # define TRACE_ITER_PROF_TEXT_OFFSET_BIT -1 1368 1378 #endif 1369 1379 1370 1380 /* ··· 1417 1391 C(MARKERS, "markers"), \ 1418 1392 C(EVENT_FORK, "event-fork"), \ 1419 1393 C(TRACE_PRINTK, "trace_printk_dest"), \ 1420 - C(COPY_MARKER, "copy_trace_marker"),\ 1394 + C(COPY_MARKER, "copy_trace_marker"), \ 1421 1395 C(PAUSE_ON_TRACE, "pause-on-trace"), \ 1422 1396 C(HASH_PTR, "hash-ptr"), /* Print hashed pointer */ \ 1423 1397 FUNCTION_FLAGS \ 1424 1398 FGRAPH_FLAGS \ 1425 1399 STACK_FLAGS \ 1426 - BRANCH_FLAGS 1400 + BRANCH_FLAGS \ 1401 + PROFILER_FLAGS \ 1402 + FPROFILE_FLAGS 1427 1403 1428 1404 /* 1429 1405 * By defining C, we can make TRACE_FLAGS a list of bit names ··· 1441 1413 }; 1442 1414 1443 1415 /* 1444 - * By redefining C, we can make TRACE_FLAGS a list of masks that 1445 - * use the bits as defined above. 1416 + * And use TRACE_ITER(flag) to define the bit masks. 1446 1417 */ 1447 - #undef C 1448 - #define C(a, b) TRACE_ITER_##a = (1 << TRACE_ITER_##a##_BIT) 1449 - 1450 - enum trace_iterator_flags { TRACE_FLAGS }; 1418 + #define TRACE_ITER(flag) \ 1419 + (TRACE_ITER_##flag##_BIT < 0 ? 0 : 1ULL << (TRACE_ITER_##flag##_BIT)) 1451 1420 1452 1421 /* 1453 1422 * TRACE_ITER_SYM_MASK masks the options in trace_flags that 1454 1423 * control the output of kernel symbols. 
1455 1424 */ 1456 1425 #define TRACE_ITER_SYM_MASK \ 1457 - (TRACE_ITER_PRINT_PARENT|TRACE_ITER_SYM_OFFSET|TRACE_ITER_SYM_ADDR) 1426 + (TRACE_ITER(PRINT_PARENT)|TRACE_ITER(SYM_OFFSET)|TRACE_ITER(SYM_ADDR)) 1458 1427 1459 1428 extern struct tracer nop_trace; 1460 1429 ··· 1460 1435 extern void disable_branch_tracing(void); 1461 1436 static inline int trace_branch_enable(struct trace_array *tr) 1462 1437 { 1463 - if (tr->trace_flags & TRACE_ITER_BRANCH) 1438 + if (tr->trace_flags & TRACE_ITER(BRANCH)) 1464 1439 return enable_branch_tracing(tr); 1465 1440 return 0; 1466 1441 } ··· 1555 1530 void trace_buffered_event_enable(void); 1556 1531 1557 1532 void early_enable_events(struct trace_array *tr, char *buf, bool disable_first); 1533 + 1534 + struct trace_user_buf; 1535 + struct trace_user_buf_info { 1536 + struct trace_user_buf __percpu *tbuf; 1537 + size_t size; 1538 + int ref; 1539 + }; 1540 + 1541 + typedef int (*trace_user_buf_copy)(char *dst, const char __user *src, 1542 + size_t size, void *data); 1543 + int trace_user_fault_init(struct trace_user_buf_info *tinfo, size_t size); 1544 + int trace_user_fault_get(struct trace_user_buf_info *tinfo); 1545 + int trace_user_fault_put(struct trace_user_buf_info *tinfo); 1546 + void trace_user_fault_destroy(struct trace_user_buf_info *tinfo); 1547 + char *trace_user_fault_read(struct trace_user_buf_info *tinfo, 1548 + const char __user *ptr, size_t size, 1549 + trace_user_buf_copy copy_func, void *data); 1558 1550 1559 1551 static inline void 1560 1552 __trace_event_discard_commit(struct trace_buffer *buffer, ··· 1794 1752 1795 1753 enum { 1796 1754 EVENT_TRIGGER_FL_PROBE = BIT(0), 1755 + EVENT_TRIGGER_FL_COUNT = BIT(1), 1797 1756 }; 1798 1757 1799 1758 struct event_trigger_data { 1800 1759 unsigned long count; 1801 1760 int ref; 1802 1761 int flags; 1803 - const struct event_trigger_ops *ops; 1804 1762 struct event_command *cmd_ops; 1805 1763 struct event_filter __rcu *filter; 1806 1764 char *filter_str; ··· 1811 1769 
char *name; 1812 1770 struct list_head named_list; 1813 1771 struct event_trigger_data *named_data; 1772 + struct llist_node llist; 1814 1773 }; 1815 1774 1816 1775 /* Avoid typos */ ··· 1825 1782 bool enable; 1826 1783 bool hist; 1827 1784 }; 1785 + 1786 + bool event_trigger_count(struct event_trigger_data *data, 1787 + struct trace_buffer *buffer, void *rec, 1788 + struct ring_buffer_event *event); 1828 1789 1829 1790 extern int event_enable_trigger_print(struct seq_file *m, 1830 1791 struct event_trigger_data *data); ··· 1893 1846 extern void event_file_put(struct trace_event_file *file); 1894 1847 1895 1848 /** 1896 - * struct event_trigger_ops - callbacks for trace event triggers 1897 - * 1898 - * The methods in this structure provide per-event trigger hooks for 1899 - * various trigger operations. 1900 - * 1901 - * The @init and @free methods are used during trigger setup and 1902 - * teardown, typically called from an event_command's @parse() 1903 - * function implementation. 1904 - * 1905 - * The @print method is used to print the trigger spec. 1906 - * 1907 - * The @trigger method is the function that actually implements the 1908 - * trigger and is called in the context of the triggering event 1909 - * whenever that event occurs. 1910 - * 1911 - * All the methods below, except for @init() and @free(), must be 1912 - * implemented. 1913 - * 1914 - * @trigger: The trigger 'probe' function called when the triggering 1915 - * event occurs. The data passed into this callback is the data 1916 - * that was supplied to the event_command @reg() function that 1917 - * registered the trigger (see struct event_command) along with 1918 - * the trace record, rec. 1919 - * 1920 - * @init: An optional initialization function called for the trigger 1921 - * when the trigger is registered (via the event_command reg() 1922 - * function). This can be used to perform per-trigger 1923 - * initialization such as incrementing a per-trigger reference 1924 - * count, for instance. 
This is usually implemented by the 1925 - * generic utility function @event_trigger_init() (see 1926 - * trace_event_triggers.c). 1927 - * 1928 - * @free: An optional de-initialization function called for the 1929 - * trigger when the trigger is unregistered (via the 1930 - * event_command @reg() function). This can be used to perform 1931 - * per-trigger de-initialization such as decrementing a 1932 - * per-trigger reference count and freeing corresponding trigger 1933 - * data, for instance. This is usually implemented by the 1934 - * generic utility function @event_trigger_free() (see 1935 - * trace_event_triggers.c). 1936 - * 1937 - * @print: The callback function invoked to have the trigger print 1938 - * itself. This is usually implemented by a wrapper function 1939 - * that calls the generic utility function @event_trigger_print() 1940 - * (see trace_event_triggers.c). 1941 - */ 1942 - struct event_trigger_ops { 1943 - void (*trigger)(struct event_trigger_data *data, 1944 - struct trace_buffer *buffer, 1945 - void *rec, 1946 - struct ring_buffer_event *rbe); 1947 - int (*init)(struct event_trigger_data *data); 1948 - void (*free)(struct event_trigger_data *data); 1949 - int (*print)(struct seq_file *m, 1950 - struct event_trigger_data *data); 1951 - }; 1952 - 1953 - /** 1954 1849 * struct event_command - callbacks and data members for event commands 1955 1850 * 1956 1851 * Event commands are invoked by users by writing the command name ··· 1941 1952 * 1942 1953 * @reg: Adds the trigger to the list of triggers associated with the 1943 1954 * event, and enables the event trigger itself, after 1944 - * initializing it (via the event_trigger_ops @init() function). 1955 + * initializing it (via the event_command @init() function). 1945 1956 * This is also where commands can use the @trigger_type value to 1946 1957 * make the decision as to whether or not multiple instances of 1947 1958 * the trigger should be allowed. 
This is usually implemented by ··· 1950 1961 * 1951 1962 * @unreg: Removes the trigger from the list of triggers associated 1952 1963 * with the event, and disables the event trigger itself, after 1953 - * initializing it (via the event_trigger_ops @free() function). 1964 + * initializing it (via the event_command @free() function). 1954 1965 * This is usually implemented by the generic utility function 1955 1966 * @unregister_trigger() (see trace_event_triggers.c). 1956 1967 * ··· 1964 1975 * ignored. This is usually implemented by the generic utility 1965 1976 * function @set_trigger_filter() (see trace_event_triggers.c). 1966 1977 * 1967 - * @get_trigger_ops: The callback function invoked to retrieve the 1968 - * event_trigger_ops implementation associated with the command. 1969 - * This callback function allows a single event_command to 1970 - * support multiple trigger implementations via different sets of 1971 - * event_trigger_ops, depending on the value of the @param 1972 - * string. 1978 + * All the methods below, except for @init() and @free(), must be 1979 + * implemented. 1980 + * 1981 + * @trigger: The trigger 'probe' function called when the triggering 1982 + * event occurs. The data passed into this callback is the data 1983 + * that was supplied to the event_command @reg() function that 1984 + * registered the trigger (see struct event_command) along with 1985 + * the trace record, rec. 1986 + * 1987 + * @count_func: If defined and a numeric parameter is passed to the 1988 + * trigger, then this function will be called before @trigger 1989 + * is called. If this function returns false, then @trigger is not 1990 + * executed. 1991 + * 1992 + * @init: An optional initialization function called for the trigger 1993 + * when the trigger is registered (via the event_command reg() 1994 + * function). This can be used to perform per-trigger 1995 + * initialization such as incrementing a per-trigger reference 1996 + * count, for instance. 
This is usually implemented by the 1997 + * generic utility function @event_trigger_init() (see 1998 + * trace_event_triggers.c). 1999 + * 2000 + * @free: An optional de-initialization function called for the 2001 + * trigger when the trigger is unregistered (via the 2002 + * event_command @reg() function). This can be used to perform 2003 + * per-trigger de-initialization such as decrementing a 2004 + * per-trigger reference count and freeing corresponding trigger 2005 + * data, for instance. This is usually implemented by the 2006 + * generic utility function @event_trigger_free() (see 2007 + * trace_event_triggers.c). 2008 + * 2009 + * @print: The callback function invoked to have the trigger print 2010 + * itself. This is usually implemented by a wrapper function 2011 + * that calls the generic utility function @event_trigger_print() 2012 + * (see trace_event_triggers.c). 1973 2013 */ 1974 2014 struct event_command { 1975 2015 struct list_head list; ··· 2019 2001 int (*set_filter)(char *filter_str, 2020 2002 struct event_trigger_data *data, 2021 2003 struct trace_event_file *file); 2022 - const struct event_trigger_ops *(*get_trigger_ops)(char *cmd, char *param); 2004 + void (*trigger)(struct event_trigger_data *data, 2005 + struct trace_buffer *buffer, 2006 + void *rec, 2007 + struct ring_buffer_event *rbe); 2008 + bool (*count_func)(struct event_trigger_data *data, 2009 + struct trace_buffer *buffer, 2010 + void *rec, 2011 + struct ring_buffer_event *rbe); 2012 + int (*init)(struct event_trigger_data *data); 2013 + void (*free)(struct event_trigger_data *data); 2014 + int (*print)(struct seq_file *m, 2015 + struct event_trigger_data *data); 2023 2016 }; 2024 2017 2025 2018 /** ··· 2051 2022 * either committed or discarded. At that point, if any commands 2052 2023 * have deferred their triggers, those commands are finally 2053 2024 * invoked following the close of the current event. 
In other 2054 - * words, if the event_trigger_ops @func() probe implementation 2025 + * words, if the event_command @func() probe implementation 2055 2026 * itself logs to the trace buffer, this flag should be set, 2056 2027 * otherwise it can be left unspecified. 2057 2028 * ··· 2093 2064 2094 2065 void trace_printk_control(bool enabled); 2095 2066 void trace_printk_start_comm(void); 2096 - int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set); 2097 - int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled); 2067 + int trace_keep_overwrite(struct tracer *tracer, u64 mask, int set); 2068 + int set_tracer_flag(struct trace_array *tr, u64 mask, int enabled); 2098 2069 2099 2070 /* Used from boot time tracer */ 2100 2071 extern int trace_set_options(struct trace_array *tr, char *option); ··· 2276 2247 * So this value has no meaning. 2277 2248 */ 2278 2249 #define FTRACE_TRAMPOLINE_MARKER ((unsigned long) INT_MAX) 2250 + 2251 + /* 2252 + * This is used to get the address of the args array based on 2253 + * the type of the entry. 2254 + */ 2255 + #define FGRAPH_ENTRY_ARGS(e) \ 2256 + ({ \ 2257 + unsigned long *_args; \ 2258 + struct ftrace_graph_ent_entry *_e = e; \ 2259 + \ 2260 + if (IS_ENABLED(CONFIG_FUNCTION_GRAPH_RETADDR) && \ 2261 + e->ent.type == TRACE_GRAPH_RETADDR_ENT) { \ 2262 + struct fgraph_retaddr_ent_entry *_re; \ 2263 + \ 2264 + _re = (typeof(_re))_e; \ 2265 + _args = _re->args; \ 2266 + } else { \ 2267 + _args = _e->args; \ 2268 + } \ 2269 + _args; \ 2270 + }) 2279 2271 2280 2272 #endif /* _LINUX_KERNEL_TRACE_H */
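The trace.h hunks drop the `enum trace_iterator_flags` masks in favor of a `TRACE_ITER(flag)` macro, and give compiled-out options a bit number of -1. A small sketch of that pattern, assuming made-up option names (`OVERWRITE`, `HIGH`, `ABSENT`) and a `DEMO_` prefix that do not exist in the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit numbers; in trace.h they are generated from TRACE_FLAGS. */
enum {
	DEMO_ITER_OVERWRITE_BIT = 3,
	DEMO_ITER_HIGH_BIT = 40,	/* only reachable with a 64-bit mask */
};

/* An option that is compiled out gets a negative bit number. */
#define DEMO_ITER_ABSENT_BIT -1

/* Same shape as TRACE_ITER(): negative bit gives an empty mask. */
#define DEMO_ITER(flag) \
	(DEMO_ITER_##flag##_BIT < 0 ? 0 : 1ULL << (DEMO_ITER_##flag##_BIT))

/* Test an option against a 64-bit flags word, as tr->trace_flags now is. */
static inline int demo_option_enabled(uint64_t trace_flags, uint64_t mask)
{
	return (trace_flags & mask) != 0;
}

/* Small self-check a caller could run once. */
static inline void demo_iter_selfcheck(void)
{
	assert(DEMO_ITER(ABSENT) == 0);
	assert(DEMO_ITER(HIGH) > UINT32_MAX);
}
```

Because an absent option expands to a mask of 0, a test such as `tr->trace_flags & TRACE_ITER(FUNC_FORK)` is simply always false when the option is not configured, with no `#ifdef` needed at the call site.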
+9 -2
kernel/trace/trace_dynevent.c
···
 		if (!ret || ret != -ECANCELED)
 			break;
 	}
-	mutex_unlock(&dyn_event_ops_mutex);
-	if (ret == -ECANCELED)
+	if (ret == -ECANCELED) {
+		static const char *err_msg[] = {"No matching dynamic event type"};
+
+		/* Wrong dynamic event. Leave an error message. */
+		tracing_log_err(NULL, "dynevent", raw_command, err_msg,
+				0, 0);
 		ret = -EINVAL;
+	}
+
+	mutex_unlock(&dyn_event_ops_mutex);

 	return ret;
 }
+8 -7
kernel/trace/trace_entries.h
···
 	F_STRUCT(
 		__field_struct( struct ftrace_graph_ent, graph_ent )
 		__field_packed( unsigned long, graph_ent, func )
-		__field_packed( unsigned int, graph_ent, depth )
+		__field_packed( unsigned long, graph_ent, depth )
 		__dynamic_array(unsigned long, args )
 	),

-	F_printk("--> %ps (%u)", (void *)__entry->func, __entry->depth)
+	F_printk("--> %ps (%lu)", (void *)__entry->func, __entry->depth)
 );

 #ifdef CONFIG_FUNCTION_GRAPH_RETADDR
···
 	TRACE_GRAPH_RETADDR_ENT,

 	F_STRUCT(
-		__field_struct( struct fgraph_retaddr_ent, graph_ent )
-		__field_packed( unsigned long, graph_ent, func )
-		__field_packed( unsigned int, graph_ent, depth )
-		__field_packed( unsigned long, graph_ent, retaddr )
+		__field_struct( struct fgraph_retaddr_ent, graph_rent )
+		__field_packed( unsigned long, graph_rent.ent, func )
+		__field_packed( unsigned long, graph_rent.ent, depth )
+		__field_packed( unsigned long, graph_rent, retaddr )
+		__dynamic_array(unsigned long, args )
 	),

-	F_printk("--> %ps (%u) <- %ps", (void *)__entry->func, __entry->depth,
+	F_printk("--> %ps (%lu) <- %ps", (void *)__entry->func, __entry->depth,
 		(void *)__entry->retaddr)
 );

+4 -15
kernel/trace/trace_eprobe.c
···
 	__eprobe_trace_func(edata, rec);
 }

-static const struct event_trigger_ops eprobe_trigger_ops = {
-	.trigger = eprobe_trigger_func,
-	.print = eprobe_trigger_print,
-	.init = eprobe_trigger_init,
-	.free = eprobe_trigger_free,
-};
-
 static int eprobe_trigger_cmd_parse(struct event_command *cmd_ops,
 				    struct trace_event_file *file,
 				    char *glob, char *cmd,
···

 }

-static const struct event_trigger_ops *eprobe_trigger_get_ops(char *cmd,
-							       char *param)
-{
-	return &eprobe_trigger_ops;
-}
-
 static struct event_command event_trigger_cmd = {
 	.name = "eprobe",
 	.trigger_type = ETT_EVENT_EPROBE,
···
 	.reg = eprobe_trigger_reg_func,
 	.unreg = eprobe_trigger_unreg_func,
 	.unreg_all = NULL,
-	.get_trigger_ops = eprobe_trigger_get_ops,
 	.set_filter = NULL,
+	.trigger = eprobe_trigger_func,
+	.print = eprobe_trigger_print,
+	.init = eprobe_trigger_init,
+	.free = eprobe_trigger_free,
 };

 static struct event_trigger_data *
···

 	trigger->flags = EVENT_TRIGGER_FL_PROBE;
 	trigger->count = -1;
-	trigger->ops = &eprobe_trigger_ops;

 	/*
 	 * EVENT PROBE triggers are not registered as commands with
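With `struct event_trigger_ops` gone, the `trigger`, `init`, `free` and `print` callbacks live directly in `struct event_command`, and the new `count_func` is documented in trace.h as running before `trigger`. The sketch below shows one way such a dispatch could look. It is an illustration only: the `demo_` types and functions are hypothetical, the structures are heavily trimmed, and the way the kernel decides when to call `count_func` is not shown in this section (the sketch just checks for a non-NULL pointer). The counting rule follows the `hist_enable_count_trigger()` function that this same diff removes from trace_events_hist.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_trigger_data;

/* Hypothetical, trimmed-down stand-in for struct event_command. */
struct demo_command {
	void (*trigger)(struct demo_trigger_data *data);
	bool (*count_func)(struct demo_trigger_data *data);	/* optional */
};

struct demo_trigger_data {
	long count;				/* -1 means "no limit" */
	int fired;				/* how often trigger ran */
	const struct demo_command *cmd_ops;
};

/* Return false once the count is used up; -1 is never decremented. */
static bool demo_trigger_count(struct demo_trigger_data *data)
{
	if (!data->count)
		return false;
	if (data->count != -1)
		data->count--;
	return true;
}

static void demo_trigger(struct demo_trigger_data *data)
{
	data->fired++;
}

/* Call count_func first, if set; run trigger only when it returns true. */
static void demo_invoke(struct demo_trigger_data *data)
{
	const struct demo_command *ops = data->cmd_ops;

	assert(ops && ops->trigger);
	if (ops->count_func && !ops->count_func(data))
		return;
	ops->trigger(data);
}

static const struct demo_command demo_cmd = {
	.trigger = demo_trigger,
	.count_func = demo_trigger_count,
};
```

Keeping the counting in a separate callback means a command needs only one `trigger` implementation, where the old code carried a counted and an uncounted `event_trigger_ops` for each.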
+2 -2
kernel/trace/trace_events.c
···
 			if (soft_disable)
 				set_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags);

-			if (tr->trace_flags & TRACE_ITER_RECORD_CMD) {
+			if (tr->trace_flags & TRACE_ITER(RECORD_CMD)) {
 				cmd = true;
 				tracing_start_cmdline_record();
 				set_bit(EVENT_FILE_FL_RECORDED_CMD_BIT, &file->flags);
 			}

-			if (tr->trace_flags & TRACE_ITER_RECORD_TGID) {
+			if (tr->trace_flags & TRACE_ITER(RECORD_TGID)) {
 				tgid = true;
 				tracing_start_tgid_record();
 				set_bit(EVENT_FILE_FL_RECORDED_TGID_BIT, &file->flags);
+48 -95
kernel/trace/trace_events_hist.c
··· 5696 5696 seq_puts(m, "\n\n"); 5697 5697 5698 5698 seq_puts(m, "# event histogram\n#\n# trigger info: "); 5699 - data->ops->print(m, data); 5699 + data->cmd_ops->print(m, data); 5700 5700 seq_puts(m, "#\n\n"); 5701 5701 5702 5702 hist_data = data->private_data; ··· 6018 6018 seq_puts(m, "\n\n"); 6019 6019 6020 6020 seq_puts(m, "# event histogram\n#\n# trigger info: "); 6021 - data->ops->print(m, data); 6021 + data->cmd_ops->print(m, data); 6022 6022 seq_puts(m, "#\n\n"); 6023 6023 6024 6024 hist_data = data->private_data; ··· 6328 6328 free_hist_pad(); 6329 6329 } 6330 6330 6331 - static const struct event_trigger_ops event_hist_trigger_ops = { 6332 - .trigger = event_hist_trigger, 6333 - .print = event_hist_trigger_print, 6334 - .init = event_hist_trigger_init, 6335 - .free = event_hist_trigger_free, 6336 - }; 6337 - 6338 6331 static int event_hist_trigger_named_init(struct event_trigger_data *data) 6339 6332 { 6333 + int ret; 6334 + 6340 6335 data->ref++; 6341 6336 6342 6337 save_named_trigger(data->named_data->name, data); 6343 6338 6344 - return event_hist_trigger_init(data->named_data); 6339 + ret = event_hist_trigger_init(data->named_data); 6340 + if (ret < 0) { 6341 + kfree(data->cmd_ops); 6342 + data->cmd_ops = &trigger_hist_cmd; 6343 + } 6344 + 6345 + return ret; 6345 6346 } 6346 6347 6347 6348 static void event_hist_trigger_named_free(struct event_trigger_data *data) ··· 6354 6353 6355 6354 data->ref--; 6356 6355 if (!data->ref) { 6356 + struct event_command *cmd_ops = data->cmd_ops; 6357 + 6357 6358 del_named_trigger(data); 6358 6359 trigger_data_free(data); 6360 + kfree(cmd_ops); 6359 6361 } 6360 - } 6361 - 6362 - static const struct event_trigger_ops event_hist_trigger_named_ops = { 6363 - .trigger = event_hist_trigger, 6364 - .print = event_hist_trigger_print, 6365 - .init = event_hist_trigger_named_init, 6366 - .free = event_hist_trigger_named_free, 6367 - }; 6368 - 6369 - static const struct event_trigger_ops *event_hist_get_trigger_ops(char 
*cmd, 6370 - char *param) 6371 - { 6372 - return &event_hist_trigger_ops; 6373 6362 } 6374 6363 6375 6364 static void hist_clear(struct event_trigger_data *data) ··· 6555 6564 data->paused = true; 6556 6565 6557 6566 if (named_data) { 6567 + struct event_command *cmd_ops; 6568 + 6558 6569 data->private_data = named_data->private_data; 6559 6570 set_named_trigger_data(data, named_data); 6560 - data->ops = &event_hist_trigger_named_ops; 6571 + /* Copy the command ops and update some of the functions */ 6572 + cmd_ops = kmalloc(sizeof(*cmd_ops), GFP_KERNEL); 6573 + if (!cmd_ops) { 6574 + ret = -ENOMEM; 6575 + goto out; 6576 + } 6577 + *cmd_ops = *data->cmd_ops; 6578 + cmd_ops->init = event_hist_trigger_named_init; 6579 + cmd_ops->free = event_hist_trigger_named_free; 6580 + data->cmd_ops = cmd_ops; 6561 6581 } 6562 6582 6563 - if (data->ops->init) { 6564 - ret = data->ops->init(data); 6583 + if (data->cmd_ops->init) { 6584 + ret = data->cmd_ops->init(data); 6565 6585 if (ret < 0) 6566 6586 goto out; 6567 6587 } ··· 6686 6684 } 6687 6685 } 6688 6686 6689 - if (test && test->ops->free) 6690 - test->ops->free(test); 6687 + if (test && test->cmd_ops->free) 6688 + test->cmd_ops->free(test); 6691 6689 6692 6690 if (hist_data->enable_timestamps) { 6693 6691 if (!hist_data->remove || test) ··· 6739 6737 update_cond_flag(file); 6740 6738 if (hist_data->enable_timestamps) 6741 6739 tracing_set_filter_buffering(file->tr, false); 6742 - if (test->ops->free) 6743 - test->ops->free(test); 6740 + if (test->cmd_ops->free) 6741 + test->cmd_ops->free(test); 6744 6742 } 6745 6743 } 6746 6744 } ··· 6916 6914 .reg = hist_register_trigger, 6917 6915 .unreg = hist_unregister_trigger, 6918 6916 .unreg_all = hist_unreg_all, 6919 - .get_trigger_ops = event_hist_get_trigger_ops, 6920 6917 .set_filter = set_trigger_filter, 6918 + .trigger = event_hist_trigger, 6919 + .print = event_hist_trigger_print, 6920 + .init = event_hist_trigger_init, 6921 + .free = event_hist_trigger_free, 6921 6922 }; 
6922 6923 6923 6924 __init int register_trigger_hist_cmd(void) ··· 6952 6947 } 6953 6948 } 6954 6949 6955 - static void 6956 - hist_enable_count_trigger(struct event_trigger_data *data, 6957 - struct trace_buffer *buffer, void *rec, 6958 - struct ring_buffer_event *event) 6959 - { 6960 - if (!data->count) 6961 - return; 6962 - 6963 - if (data->count != -1) 6964 - (data->count)--; 6965 - 6966 - hist_enable_trigger(data, buffer, rec, event); 6967 - } 6968 - 6969 - static const struct event_trigger_ops hist_enable_trigger_ops = { 6970 - .trigger = hist_enable_trigger, 6971 - .print = event_enable_trigger_print, 6972 - .init = event_trigger_init, 6973 - .free = event_enable_trigger_free, 6974 - }; 6975 - 6976 - static const struct event_trigger_ops hist_enable_count_trigger_ops = { 6977 - .trigger = hist_enable_count_trigger, 6978 - .print = event_enable_trigger_print, 6979 - .init = event_trigger_init, 6980 - .free = event_enable_trigger_free, 6981 - }; 6982 - 6983 - static const struct event_trigger_ops hist_disable_trigger_ops = { 6984 - .trigger = hist_enable_trigger, 6985 - .print = event_enable_trigger_print, 6986 - .init = event_trigger_init, 6987 - .free = event_enable_trigger_free, 6988 - }; 6989 - 6990 - static const struct event_trigger_ops hist_disable_count_trigger_ops = { 6991 - .trigger = hist_enable_count_trigger, 6992 - .print = event_enable_trigger_print, 6993 - .init = event_trigger_init, 6994 - .free = event_enable_trigger_free, 6995 - }; 6996 - 6997 - static const struct event_trigger_ops * 6998 - hist_enable_get_trigger_ops(char *cmd, char *param) 6999 - { 7000 - const struct event_trigger_ops *ops; 7001 - bool enable; 7002 - 7003 - enable = (strcmp(cmd, ENABLE_HIST_STR) == 0); 7004 - 7005 - if (enable) 7006 - ops = param ? &hist_enable_count_trigger_ops : 7007 - &hist_enable_trigger_ops; 7008 - else 7009 - ops = param ? 
&hist_disable_count_trigger_ops : 7010 - &hist_disable_trigger_ops; 7011 - 7012 - return ops; 7013 - } 7014 - 7015 6950 static void hist_enable_unreg_all(struct trace_event_file *file) 7016 6951 { 7017 6952 struct event_trigger_data *test, *n; ··· 6961 7016 list_del_rcu(&test->list); 6962 7017 update_cond_flag(file); 6963 7018 trace_event_trigger_enable_disable(file, 0); 6964 - if (test->ops->free) 6965 - test->ops->free(test); 7019 + if (test->cmd_ops->free) 7020 + test->cmd_ops->free(test); 6966 7021 } 6967 7022 } 6968 7023 } ··· 6974 7029 .reg = event_enable_register_trigger, 6975 7030 .unreg = event_enable_unregister_trigger, 6976 7031 .unreg_all = hist_enable_unreg_all, 6977 - .get_trigger_ops = hist_enable_get_trigger_ops, 6978 7032 .set_filter = set_trigger_filter, 7033 + .trigger = hist_enable_trigger, 7034 + .count_func = event_trigger_count, 7035 + .print = event_enable_trigger_print, 7036 + .init = event_trigger_init, 7037 + .free = event_enable_trigger_free, 6979 7038 }; 6980 7039 6981 7040 static struct event_command trigger_hist_disable_cmd = { ··· 6989 7040 .reg = event_enable_register_trigger, 6990 7041 .unreg = event_enable_unregister_trigger, 6991 7042 .unreg_all = hist_enable_unreg_all, 6992 - .get_trigger_ops = hist_enable_get_trigger_ops, 6993 7043 .set_filter = set_trigger_filter, 7044 + .trigger = hist_enable_trigger, 7045 + .count_func = event_trigger_count, 7046 + .print = event_enable_trigger_print, 7047 + .init = event_trigger_init, 7048 + .free = event_enable_trigger_free, 6994 7049 }; 6995 7050 6996 7051 static __init void unregister_trigger_hist_enable_disable_cmds(void)
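The named-trigger path in the trace_events_hist.c hunks above no longer points at a separate ops table. It makes a private copy of the command's callbacks and overrides only init and free. The following is a minimal user-space sketch of that copy-and-override pattern, not the kernel code: every demo_* name is a hypothetical stand-in, and malloc()/free() stand in for kmalloc()/kfree().

```c
#include <assert.h>
#include <stdlib.h>

struct demo_data;

/* Hypothetical, trimmed-down stand-in for struct event_command. */
struct demo_cmd {
	const char *name;
	int (*init)(struct demo_data *data);
	void (*free)(struct demo_data *data);
};

struct demo_data {
	const struct demo_cmd *cmd_ops;
	int ref;
};

static int demo_init(struct demo_data *data)
{
	data->ref++;
	return 0;
}

static void demo_free(struct demo_data *data)
{
	data->ref--;
}

/* Variants used only by "named" triggers in this sketch. */
static int demo_named_init(struct demo_data *data)
{
	data->ref += 2;
	return 0;
}

static void demo_named_free(struct demo_data *data)
{
	data->ref -= 2;
}

/* The shared table that every ordinary trigger points at. */
static const struct demo_cmd demo_hist_cmd = {
	.name = "hist",
	.init = demo_init,
	.free = demo_free,
};

/* Give @data a private copy of its ops with init/free replaced. */
static int demo_use_named_ops(struct demo_data *data)
{
	struct demo_cmd *copy = malloc(sizeof(*copy));

	if (!copy)
		return -1;

	*copy = *data->cmd_ops;		/* inherit every other callback */
	copy->init = demo_named_init;
	copy->free = demo_named_free;
	data->cmd_ops = copy;
	return 0;
}

/* Undo demo_use_named_ops(): release the copy, point back at the shared table. */
static void demo_drop_named_ops(struct demo_data *data)
{
	assert(data->cmd_ops != &demo_hist_cmd);	/* must be a heap copy */
	free((void *)data->cmd_ops);
	data->cmd_ops = &demo_hist_cmd;
}
```

The cost of this design is that the copy has an owner: whoever installed it must free it on both the error path and the final release, which is why the hunk saves the cmd_ops pointer before freeing the trigger data.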
+1 -1
kernel/trace/trace_events_synth.c
@@ -359,7 +359,7 @@
 		fmt = synth_field_fmt(se->fields[i]->type);

 		/* parameter types */
-		if (tr && tr->trace_flags & TRACE_ITER_VERBOSE)
+		if (tr && tr->trace_flags & TRACE_ITER(VERBOSE))
 			trace_seq_printf(s, "%s ", fmt);

 		snprintf(print_fmt, sizeof(print_fmt), "%%s=%s%%s", fmt);
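The hunk above swaps the TRACE_ITER_VERBOSE constant for a TRACE_ITER(VERBOSE) macro call, in line with the move to a 64-bit option mask described in the merge message. The macro's real definition is not part of this diff, so the sketch below only shows one plausible shape for it: token-pasting an option name onto a bit-index enum and shifting a 64-bit one. All DEMO_* and demo_* names and the bit positions are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; the real list lives in the kernel headers. */
enum demo_iter_bits {
	DEMO_ITER_VERBOSE_BIT = 3,
	DEMO_ITER_LATE_OPTION_BIT = 40,	/* only representable in a 64-bit mask */
};

/* Paste the option name onto the _BIT enumerator and build a 64-bit mask. */
#define DEMO_ITER(flag) (UINT64_C(1) << DEMO_ITER_##flag##_BIT)

static_assert(DEMO_ITER_LATE_OPTION_BIT < 64,
	      "bit index must fit in a 64-bit mask");

/* Test a single option in a 64-bit flags word. */
static int demo_flag_is_set(uint64_t trace_flags, uint64_t mask)
{
	return (trace_flags & mask) != 0;
}
```

Building the mask from a bit index keeps every user on the wide type. A plain int constant shifted past bit 31 would overflow before it ever reached the 64-bit flags word.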
+177 -231
kernel/trace/trace_events_trigger.c
··· 6 6 */ 7 7 8 8 #include <linux/security.h> 9 + #include <linux/kthread.h> 9 10 #include <linux/module.h> 10 11 #include <linux/ctype.h> 11 12 #include <linux/mutex.h> ··· 18 17 static LIST_HEAD(trigger_commands); 19 18 static DEFINE_MUTEX(trigger_cmd_mutex); 20 19 20 + static struct task_struct *trigger_kthread; 21 + static struct llist_head trigger_data_free_list; 22 + static DEFINE_MUTEX(trigger_data_kthread_mutex); 23 + 24 + /* Bulk garbage collection of event_trigger_data elements */ 25 + static int trigger_kthread_fn(void *ignore) 26 + { 27 + struct event_trigger_data *data, *tmp; 28 + struct llist_node *llnodes; 29 + 30 + /* Once this task starts, it lives forever */ 31 + for (;;) { 32 + set_current_state(TASK_INTERRUPTIBLE); 33 + if (llist_empty(&trigger_data_free_list)) 34 + schedule(); 35 + 36 + __set_current_state(TASK_RUNNING); 37 + 38 + llnodes = llist_del_all(&trigger_data_free_list); 39 + 40 + /* make sure current triggers exit before free */ 41 + tracepoint_synchronize_unregister(); 42 + 43 + llist_for_each_entry_safe(data, tmp, llnodes, llist) 44 + kfree(data); 45 + } 46 + 47 + return 0; 48 + } 49 + 21 50 void trigger_data_free(struct event_trigger_data *data) 22 51 { 23 52 if (data->cmd_ops->set_filter) 24 53 data->cmd_ops->set_filter(NULL, data, NULL); 25 54 26 - /* make sure current triggers exit before free */ 27 - tracepoint_synchronize_unregister(); 55 + if (unlikely(!trigger_kthread)) { 56 + guard(mutex)(&trigger_data_kthread_mutex); 57 + /* Check again after taking mutex */ 58 + if (!trigger_kthread) { 59 + struct task_struct *kthread; 28 60 29 - kfree(data); 61 + kthread = kthread_create(trigger_kthread_fn, NULL, 62 + "trigger_data_free"); 63 + if (!IS_ERR(kthread)) 64 + WRITE_ONCE(trigger_kthread, kthread); 65 + } 66 + } 67 + 68 + if (!trigger_kthread) { 69 + /* Do it the slow way */ 70 + tracepoint_synchronize_unregister(); 71 + kfree(data); 72 + return; 73 + } 74 + 75 + llist_add(&data->llist, &trigger_data_free_list); 76 + 
wake_up_process(trigger_kthread); 77 + } 78 + 79 + static inline void data_ops_trigger(struct event_trigger_data *data, 80 + struct trace_buffer *buffer, void *rec, 81 + struct ring_buffer_event *event) 82 + { 83 + const struct event_command *cmd_ops = data->cmd_ops; 84 + 85 + if (data->flags & EVENT_TRIGGER_FL_COUNT) { 86 + if (!cmd_ops->count_func(data, buffer, rec, event)) 87 + return; 88 + } 89 + 90 + cmd_ops->trigger(data, buffer, rec, event); 30 91 } 31 92 32 93 /** ··· 133 70 if (data->paused) 134 71 continue; 135 72 if (!rec) { 136 - data->ops->trigger(data, buffer, rec, event); 73 + data_ops_trigger(data, buffer, rec, event); 137 74 continue; 138 75 } 139 76 filter = rcu_dereference_sched(data->filter); ··· 143 80 tt |= data->cmd_ops->trigger_type; 144 81 continue; 145 82 } 146 - data->ops->trigger(data, buffer, rec, event); 83 + data_ops_trigger(data, buffer, rec, event); 147 84 } 148 85 return tt; 149 86 } ··· 185 122 if (data->paused) 186 123 continue; 187 124 if (data->cmd_ops->trigger_type & tt) 188 - data->ops->trigger(data, NULL, NULL, NULL); 125 + data_ops_trigger(data, NULL, NULL, NULL); 189 126 } 190 127 } 191 128 EXPORT_SYMBOL_GPL(event_triggers_post_call); ··· 254 191 } 255 192 256 193 data = list_entry(v, struct event_trigger_data, list); 257 - data->ops->print(m, data); 194 + data->cmd_ops->print(m, data); 258 195 259 196 return 0; 260 197 } ··· 308 245 char *command, *next; 309 246 struct event_command *p; 310 247 311 - next = buff = skip_spaces(buff); 248 + next = buff = strim(buff); 249 + 312 250 command = strsep(&next, ": \t"); 313 251 if (next) { 314 252 next = skip_spaces(next); ··· 346 282 if (IS_ERR(buf)) 347 283 return PTR_ERR(buf); 348 284 349 - strim(buf); 350 - 351 285 guard(mutex)(&event_mutex); 352 286 353 287 event_file = event_file_file(file); ··· 362 300 363 301 static int event_trigger_regex_release(struct inode *inode, struct file *file) 364 302 { 365 - mutex_lock(&event_mutex); 366 - 367 303 if (file->f_mode & FMODE_READ) 
368 304 seq_release(inode, file); 369 - 370 - mutex_unlock(&event_mutex); 371 305 372 306 return 0; 373 307 } ··· 436 378 } 437 379 438 380 /** 439 - * event_trigger_print - Generic event_trigger_ops @print implementation 381 + * event_trigger_count - Optional count function for event triggers 382 + * @data: Trigger-specific data 383 + * @buffer: The ring buffer that the event is being written to 384 + * @rec: The trace entry for the event, NULL for unconditional invocation 385 + * @event: The event meta data in the ring buffer 386 + * 387 + * For triggers that can take a count parameter that doesn't do anything 388 + * special, they can use this function to assign to their .count_func 389 + * field. 390 + * 391 + * This simply does a count down of the @data->count field. 392 + * 393 + * If the @data->count is greater than zero, it will decrement it. 394 + * 395 + * Returns false if @data->count is zero, otherwise true. 396 + */ 397 + bool event_trigger_count(struct event_trigger_data *data, 398 + struct trace_buffer *buffer, void *rec, 399 + struct ring_buffer_event *event) 400 + { 401 + if (!data->count) 402 + return false; 403 + 404 + if (data->count != -1) 405 + (data->count)--; 406 + 407 + return true; 408 + } 409 + 410 + /** 411 + * event_trigger_print - Generic event_command @print implementation 440 412 * @name: The name of the event trigger 441 413 * @m: The seq_file being printed to 442 414 * @data: Trigger-specific data ··· 501 413 } 502 414 503 415 /** 504 - * event_trigger_init - Generic event_trigger_ops @init implementation 416 + * event_trigger_init - Generic event_command @init implementation 505 417 * @data: Trigger-specific data 506 418 * 507 419 * Common implementation of event trigger initialization. 
··· 518 430 } 519 431 520 432 /** 521 - * event_trigger_free - Generic event_trigger_ops @free implementation 433 + * event_trigger_free - Generic event_command @free implementation 522 434 * @data: Trigger-specific data 523 435 * 524 436 * Common implementation of event trigger de-initialization. ··· 580 492 list_for_each_entry_safe(data, n, &file->triggers, list) { 581 493 trace_event_trigger_enable_disable(file, 0); 582 494 list_del_rcu(&data->list); 583 - if (data->ops->free) 584 - data->ops->free(data); 495 + if (data->cmd_ops->free) 496 + data->cmd_ops->free(data); 585 497 } 586 498 } 587 499 } ··· 644 556 return -EEXIST; 645 557 } 646 558 647 - if (data->ops->init) { 648 - ret = data->ops->init(data); 559 + if (data->cmd_ops->init) { 560 + ret = data->cmd_ops->init(data); 649 561 if (ret < 0) 650 562 return ret; 651 563 } ··· 683 595 } 684 596 685 597 if (data) { 686 - if (data->ops->free) 687 - data->ops->free(data); 598 + if (data->cmd_ops->free) 599 + data->cmd_ops->free(data); 688 600 689 601 return true; 690 602 } ··· 895 807 * @private_data: User data to associate with the event trigger 896 808 * 897 809 * Allocate an event_trigger_data instance and initialize it. The 898 - * @cmd_ops are used along with the @cmd and @param to get the 899 - * trigger_ops to assign to the event_trigger_data. @private_data can 900 - * also be passed in and associated with the event_trigger_data. 810 + * @cmd_ops defines how the trigger will operate. If @param is set, 811 + * and @cmd_ops->trigger_ops->count_func is non NULL, then the 812 + * data->count is set to @param and before the trigger is executed, the 813 + * @cmd_ops->trigger_ops->count_func() is called. If that function returns 814 + * false, the @cmd_ops->trigger_ops->trigger() function will not be called. 815 + * @private_data can also be passed in and associated with the 816 + * event_trigger_data. 901 817 * 902 818 * Use trigger_data_free() to free an event_trigger_data object. 
903 819 * ··· 913 821 void *private_data) 914 822 { 915 823 struct event_trigger_data *trigger_data; 916 - const struct event_trigger_ops *trigger_ops; 917 - 918 - trigger_ops = cmd_ops->get_trigger_ops(cmd, param); 919 824 920 825 trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL); 921 826 if (!trigger_data) 922 827 return NULL; 923 828 924 829 trigger_data->count = -1; 925 - trigger_data->ops = trigger_ops; 926 830 trigger_data->cmd_ops = cmd_ops; 927 831 trigger_data->private_data = private_data; 832 + if (param && cmd_ops->count_func) 833 + trigger_data->flags |= EVENT_TRIGGER_FL_COUNT; 928 834 929 835 INIT_LIST_HEAD(&trigger_data->list); 930 836 INIT_LIST_HEAD(&trigger_data->named_list); ··· 1361 1271 tracing_on(); 1362 1272 } 1363 1273 1364 - static void 1365 - traceon_count_trigger(struct event_trigger_data *data, 1366 - struct trace_buffer *buffer, void *rec, 1367 - struct ring_buffer_event *event) 1274 + static bool 1275 + traceon_count_func(struct event_trigger_data *data, 1276 + struct trace_buffer *buffer, void *rec, 1277 + struct ring_buffer_event *event) 1368 1278 { 1369 1279 struct trace_event_file *file = data->private_data; 1370 1280 1371 1281 if (file) { 1372 1282 if (tracer_tracing_is_on(file->tr)) 1373 - return; 1283 + return false; 1374 1284 } else { 1375 1285 if (tracing_is_on()) 1376 - return; 1286 + return false; 1377 1287 } 1378 1288 1379 1289 if (!data->count) 1380 - return; 1290 + return false; 1381 1291 1382 1292 if (data->count != -1) 1383 1293 (data->count)--; 1384 1294 1385 - if (file) 1386 - tracer_tracing_on(file->tr); 1387 - else 1388 - tracing_on(); 1295 + return true; 1389 1296 } 1390 1297 1391 1298 static void ··· 1406 1319 tracing_off(); 1407 1320 } 1408 1321 1409 - static void 1410 - traceoff_count_trigger(struct event_trigger_data *data, 1411 - struct trace_buffer *buffer, void *rec, 1412 - struct ring_buffer_event *event) 1322 + static bool 1323 + traceoff_count_func(struct event_trigger_data *data, 1324 + struct 
trace_buffer *buffer, void *rec, 1325 + struct ring_buffer_event *event) 1413 1326 { 1414 1327 struct trace_event_file *file = data->private_data; 1415 1328 1416 1329 if (file) { 1417 1330 if (!tracer_tracing_is_on(file->tr)) 1418 - return; 1331 + return false; 1419 1332 } else { 1420 1333 if (!tracing_is_on()) 1421 - return; 1334 + return false; 1422 1335 } 1423 1336 1424 1337 if (!data->count) 1425 - return; 1338 + return false; 1426 1339 1427 1340 if (data->count != -1) 1428 1341 (data->count)--; 1429 1342 1430 - if (file) 1431 - tracer_tracing_off(file->tr); 1432 - else 1433 - tracing_off(); 1343 + return true; 1434 1344 } 1435 1345 1436 1346 static int ··· 1444 1360 data->filter_str); 1445 1361 } 1446 1362 1447 - static const struct event_trigger_ops traceon_trigger_ops = { 1448 - .trigger = traceon_trigger, 1449 - .print = traceon_trigger_print, 1450 - .init = event_trigger_init, 1451 - .free = event_trigger_free, 1452 - }; 1453 - 1454 - static const struct event_trigger_ops traceon_count_trigger_ops = { 1455 - .trigger = traceon_count_trigger, 1456 - .print = traceon_trigger_print, 1457 - .init = event_trigger_init, 1458 - .free = event_trigger_free, 1459 - }; 1460 - 1461 - static const struct event_trigger_ops traceoff_trigger_ops = { 1462 - .trigger = traceoff_trigger, 1463 - .print = traceoff_trigger_print, 1464 - .init = event_trigger_init, 1465 - .free = event_trigger_free, 1466 - }; 1467 - 1468 - static const struct event_trigger_ops traceoff_count_trigger_ops = { 1469 - .trigger = traceoff_count_trigger, 1470 - .print = traceoff_trigger_print, 1471 - .init = event_trigger_init, 1472 - .free = event_trigger_free, 1473 - }; 1474 - 1475 - static const struct event_trigger_ops * 1476 - onoff_get_trigger_ops(char *cmd, char *param) 1477 - { 1478 - const struct event_trigger_ops *ops; 1479 - 1480 - /* we register both traceon and traceoff to this callback */ 1481 - if (strcmp(cmd, "traceon") == 0) 1482 - ops = param ? 
&traceon_count_trigger_ops : 1483 - &traceon_trigger_ops; 1484 - else 1485 - ops = param ? &traceoff_count_trigger_ops : 1486 - &traceoff_trigger_ops; 1487 - 1488 - return ops; 1489 - } 1490 - 1491 1363 static struct event_command trigger_traceon_cmd = { 1492 1364 .name = "traceon", 1493 1365 .trigger_type = ETT_TRACE_ONOFF, 1494 1366 .parse = event_trigger_parse, 1495 1367 .reg = register_trigger, 1496 1368 .unreg = unregister_trigger, 1497 - .get_trigger_ops = onoff_get_trigger_ops, 1498 1369 .set_filter = set_trigger_filter, 1370 + .trigger = traceon_trigger, 1371 + .count_func = traceon_count_func, 1372 + .print = traceon_trigger_print, 1373 + .init = event_trigger_init, 1374 + .free = event_trigger_free, 1499 1375 }; 1500 1376 1501 1377 static struct event_command trigger_traceoff_cmd = { ··· 1465 1421 .parse = event_trigger_parse, 1466 1422 .reg = register_trigger, 1467 1423 .unreg = unregister_trigger, 1468 - .get_trigger_ops = onoff_get_trigger_ops, 1469 1424 .set_filter = set_trigger_filter, 1425 + .trigger = traceoff_trigger, 1426 + .count_func = traceoff_count_func, 1427 + .print = traceoff_trigger_print, 1428 + .init = event_trigger_init, 1429 + .free = event_trigger_free, 1470 1430 }; 1471 1431 1472 1432 #ifdef CONFIG_TRACER_SNAPSHOT ··· 1485 1437 tracing_snapshot_instance(file->tr); 1486 1438 else 1487 1439 tracing_snapshot(); 1488 - } 1489 - 1490 - static void 1491 - snapshot_count_trigger(struct event_trigger_data *data, 1492 - struct trace_buffer *buffer, void *rec, 1493 - struct ring_buffer_event *event) 1494 - { 1495 - if (!data->count) 1496 - return; 1497 - 1498 - if (data->count != -1) 1499 - (data->count)--; 1500 - 1501 - snapshot_trigger(data, buffer, rec, event); 1502 1440 } 1503 1441 1504 1442 static int ··· 1518 1484 data->filter_str); 1519 1485 } 1520 1486 1521 - static const struct event_trigger_ops snapshot_trigger_ops = { 1522 - .trigger = snapshot_trigger, 1523 - .print = snapshot_trigger_print, 1524 - .init = event_trigger_init, 1525 
- .free = event_trigger_free, 1526 - }; 1527 - 1528 - static const struct event_trigger_ops snapshot_count_trigger_ops = { 1529 - .trigger = snapshot_count_trigger, 1530 - .print = snapshot_trigger_print, 1531 - .init = event_trigger_init, 1532 - .free = event_trigger_free, 1533 - }; 1534 - 1535 - static const struct event_trigger_ops * 1536 - snapshot_get_trigger_ops(char *cmd, char *param) 1537 - { 1538 - return param ? &snapshot_count_trigger_ops : &snapshot_trigger_ops; 1539 - } 1540 - 1541 1487 static struct event_command trigger_snapshot_cmd = { 1542 1488 .name = "snapshot", 1543 1489 .trigger_type = ETT_SNAPSHOT, 1544 1490 .parse = event_trigger_parse, 1545 1491 .reg = register_snapshot_trigger, 1546 1492 .unreg = unregister_snapshot_trigger, 1547 - .get_trigger_ops = snapshot_get_trigger_ops, 1548 1493 .set_filter = set_trigger_filter, 1494 + .trigger = snapshot_trigger, 1495 + .count_func = event_trigger_count, 1496 + .print = snapshot_trigger_print, 1497 + .init = event_trigger_init, 1498 + .free = event_trigger_free, 1549 1499 }; 1550 1500 1551 1501 static __init int register_trigger_snapshot_cmd(void) ··· 1576 1558 trace_dump_stack(STACK_SKIP); 1577 1559 } 1578 1560 1579 - static void 1580 - stacktrace_count_trigger(struct event_trigger_data *data, 1581 - struct trace_buffer *buffer, void *rec, 1582 - struct ring_buffer_event *event) 1583 - { 1584 - if (!data->count) 1585 - return; 1586 - 1587 - if (data->count != -1) 1588 - (data->count)--; 1589 - 1590 - stacktrace_trigger(data, buffer, rec, event); 1591 - } 1592 - 1593 1561 static int 1594 1562 stacktrace_trigger_print(struct seq_file *m, struct event_trigger_data *data) 1595 1563 { 1596 1564 return event_trigger_print("stacktrace", m, (void *)data->count, 1597 1565 data->filter_str); 1598 - } 1599 - 1600 - static const struct event_trigger_ops stacktrace_trigger_ops = { 1601 - .trigger = stacktrace_trigger, 1602 - .print = stacktrace_trigger_print, 1603 - .init = event_trigger_init, 1604 - .free = 
event_trigger_free, 1605 - }; 1606 - 1607 - static const struct event_trigger_ops stacktrace_count_trigger_ops = { 1608 - .trigger = stacktrace_count_trigger, 1609 - .print = stacktrace_trigger_print, 1610 - .init = event_trigger_init, 1611 - .free = event_trigger_free, 1612 - }; 1613 - 1614 - static const struct event_trigger_ops * 1615 - stacktrace_get_trigger_ops(char *cmd, char *param) 1616 - { 1617 - return param ? &stacktrace_count_trigger_ops : &stacktrace_trigger_ops; 1618 1566 } 1619 1567 1620 1568 static struct event_command trigger_stacktrace_cmd = { ··· 1590 1606 .parse = event_trigger_parse, 1591 1607 .reg = register_trigger, 1592 1608 .unreg = unregister_trigger, 1593 - .get_trigger_ops = stacktrace_get_trigger_ops, 1594 1609 .set_filter = set_trigger_filter, 1610 + .trigger = stacktrace_trigger, 1611 + .count_func = event_trigger_count, 1612 + .print = stacktrace_trigger_print, 1613 + .init = event_trigger_init, 1614 + .free = event_trigger_free, 1595 1615 }; 1596 1616 1597 1617 static __init int register_trigger_stacktrace_cmd(void) ··· 1630 1642 set_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &enable_data->file->flags); 1631 1643 } 1632 1644 1633 - static void 1634 - event_enable_count_trigger(struct event_trigger_data *data, 1635 - struct trace_buffer *buffer, void *rec, 1636 - struct ring_buffer_event *event) 1645 + static bool 1646 + event_enable_count_func(struct event_trigger_data *data, 1647 + struct trace_buffer *buffer, void *rec, 1648 + struct ring_buffer_event *event) 1637 1649 { 1638 1650 struct enable_trigger_data *enable_data = data->private_data; 1639 1651 1640 1652 if (!data->count) 1641 - return; 1653 + return false; 1642 1654 1643 1655 /* Skip if the event is in a state we want to switch to */ 1644 1656 if (enable_data->enable == !(enable_data->file->flags & EVENT_FILE_FL_SOFT_DISABLED)) 1645 - return; 1657 + return false; 1646 1658 1647 1659 if (data->count != -1) 1648 1660 (data->count)--; 1649 1661 1650 - event_enable_trigger(data, 
buffer, rec, event); 1662 + return true; 1651 1663 } 1652 1664 1653 1665 int event_enable_trigger_print(struct seq_file *m, ··· 1691 1703 kfree(enable_data); 1692 1704 } 1693 1705 } 1694 - 1695 - static const struct event_trigger_ops event_enable_trigger_ops = { 1696 - .trigger = event_enable_trigger, 1697 - .print = event_enable_trigger_print, 1698 - .init = event_trigger_init, 1699 - .free = event_enable_trigger_free, 1700 - }; 1701 - 1702 - static const struct event_trigger_ops event_enable_count_trigger_ops = { 1703 - .trigger = event_enable_count_trigger, 1704 - .print = event_enable_trigger_print, 1705 - .init = event_trigger_init, 1706 - .free = event_enable_trigger_free, 1707 - }; 1708 - 1709 - static const struct event_trigger_ops event_disable_trigger_ops = { 1710 - .trigger = event_enable_trigger, 1711 - .print = event_enable_trigger_print, 1712 - .init = event_trigger_init, 1713 - .free = event_enable_trigger_free, 1714 - }; 1715 - 1716 - static const struct event_trigger_ops event_disable_count_trigger_ops = { 1717 - .trigger = event_enable_count_trigger, 1718 - .print = event_enable_trigger_print, 1719 - .init = event_trigger_init, 1720 - .free = event_enable_trigger_free, 1721 - }; 1722 1706 1723 1707 int event_enable_trigger_parse(struct event_command *cmd_ops, 1724 1708 struct trace_event_file *file, ··· 1821 1861 } 1822 1862 } 1823 1863 1824 - if (data->ops->init) { 1825 - ret = data->ops->init(data); 1864 + if (data->cmd_ops->init) { 1865 + ret = data->cmd_ops->init(data); 1826 1866 if (ret < 0) 1827 1867 return ret; 1828 1868 } ··· 1862 1902 } 1863 1903 } 1864 1904 1865 - if (data && data->ops->free) 1866 - data->ops->free(data); 1867 - } 1868 - 1869 - static const struct event_trigger_ops * 1870 - event_enable_get_trigger_ops(char *cmd, char *param) 1871 - { 1872 - const struct event_trigger_ops *ops; 1873 - bool enable; 1874 - 1875 - #ifdef CONFIG_HIST_TRIGGERS 1876 - enable = ((strcmp(cmd, ENABLE_EVENT_STR) == 0) || 1877 - (strcmp(cmd, 
ENABLE_HIST_STR) == 0)); 1878 - #else 1879 - enable = strcmp(cmd, ENABLE_EVENT_STR) == 0; 1880 - #endif 1881 - if (enable) 1882 - ops = param ? &event_enable_count_trigger_ops : 1883 - &event_enable_trigger_ops; 1884 - else 1885 - ops = param ? &event_disable_count_trigger_ops : 1886 - &event_disable_trigger_ops; 1887 - 1888 - return ops; 1905 + if (data && data->cmd_ops->free) 1906 + data->cmd_ops->free(data); 1889 1907 } 1890 1908 1891 1909 static struct event_command trigger_enable_cmd = { ··· 1872 1934 .parse = event_enable_trigger_parse, 1873 1935 .reg = event_enable_register_trigger, 1874 1936 .unreg = event_enable_unregister_trigger, 1875 - .get_trigger_ops = event_enable_get_trigger_ops, 1876 1937 .set_filter = set_trigger_filter, 1938 + .trigger = event_enable_trigger, 1939 + .count_func = event_enable_count_func, 1940 + .print = event_enable_trigger_print, 1941 + .init = event_trigger_init, 1942 + .free = event_enable_trigger_free, 1877 1943 }; 1878 1944 1879 1945 static struct event_command trigger_disable_cmd = { ··· 1886 1944 .parse = event_enable_trigger_parse, 1887 1945 .reg = event_enable_register_trigger, 1888 1946 .unreg = event_enable_unregister_trigger, 1889 - .get_trigger_ops = event_enable_get_trigger_ops, 1890 1947 .set_filter = set_trigger_filter, 1948 + .trigger = event_enable_trigger, 1949 + .count_func = event_enable_count_func, 1950 + .print = event_enable_trigger_print, 1951 + .init = event_trigger_init, 1952 + .free = event_enable_trigger_free, 1891 1953 }; 1892 1954 1893 1955 static __init void unregister_trigger_enable_disable_cmds(void)
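The trace_events_trigger.c changes above replace the paired "plain" and "count" ops tables with a single table plus an optional count_func that is consulted only when the trigger was created with a count. The sketch below is a user-space approximation of that dispatch. The demo_* names are hypothetical, the callback signatures are simplified (no buffer, rec, or event arguments), and setting the count at init time is a simplification of how the kernel parses it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FL_COUNT 0x1u	/* stand-in for EVENT_TRIGGER_FL_COUNT */

struct demo_data;

struct demo_cmd {
	void (*trigger)(struct demo_data *data);
	bool (*count_func)(struct demo_data *data);	/* optional */
};

struct demo_data {
	const struct demo_cmd *cmd_ops;
	unsigned int flags;
	long count;	/* -1 means "no limit" */
	int fired;	/* how many times the trigger actually ran */
};

/* Generic countdown: false once the budget is used up. */
static bool demo_count(struct demo_data *data)
{
	if (!data->count)
		return false;

	if (data->count != -1)
		data->count--;

	return true;
}

static void demo_trigger(struct demo_data *data)
{
	data->fired++;
}

static const struct demo_cmd demo_snapshot_cmd = {
	.trigger = demo_trigger,
	.count_func = demo_count,
};

/* Mark the data as counted only if a count was given and is supported. */
static void demo_data_init(struct demo_data *data, const struct demo_cmd *ops,
			   const long *param)
{
	data->cmd_ops = ops;
	data->flags = 0;
	data->count = -1;
	data->fired = 0;

	if (param && ops->count_func) {
		data->flags |= DEMO_FL_COUNT;
		data->count = *param;
	}
}

/* Single entry point: gate on count_func first, then run the trigger. */
static void demo_dispatch(struct demo_data *data)
{
	const struct demo_cmd *ops = data->cmd_ops;

	if (data->flags & DEMO_FL_COUNT) {
		assert(ops->count_func);	/* flag implies a count_func */
		if (!ops->count_func(data))
			return;
	}

	ops->trigger(data);
}
```

Commands with extra preconditions, such as the traceon and event-enable count functions in the diff, can supply their own count_func and return false before touching the counter, so a skipped invocation does not consume the budget.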
+3 -3
kernel/trace/trace_fprobe.c
@@ -632,7 +632,7 @@

 	trace_seq_printf(s, "%s: (", trace_probe_name(tp));

-	if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
+	if (!seq_print_ip_sym_offset(s, field->ip, flags))
 		goto out;

 	trace_seq_putc(s, ')');
@@ -662,12 +662,12 @@

 	trace_seq_printf(s, "%s: (", trace_probe_name(tp));

-	if (!seq_print_ip_sym(s, field->ret_ip, flags | TRACE_ITER_SYM_OFFSET))
+	if (!seq_print_ip_sym_offset(s, field->ret_ip, flags))
 		goto out;

 	trace_seq_puts(s, " <- ");

-	if (!seq_print_ip_sym(s, field->func, flags & ~TRACE_ITER_SYM_OFFSET))
+	if (!seq_print_ip_sym_no_offset(s, field->func, flags))
 		goto out;

 	trace_seq_putc(s, ')');
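In the trace_fprobe.c hunks above, call sites stop OR-ing or masking the symbol-offset bit themselves and call seq_print_ip_sym_offset() or seq_print_ip_sym_no_offset() instead. Those helpers are defined outside this diff, so the following is only a guess at the idea, written as a user-space sketch: thin wrappers that force the bit on or off before calling a common printer. The demo_* names, the flag value, and the fake symbol lookup are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_SYM_OFFSET 0x1UL	/* hypothetical "print +offset" flag */

/* Common printer: pretends every address resolves to "demo_func". */
static int demo_print_ip_sym(char *buf, size_t len, unsigned long ip,
			     unsigned long flags)
{
	assert(buf && len);

	if (flags & DEMO_SYM_OFFSET)
		return snprintf(buf, len, "demo_func+0x%lx", ip & 0xfff);

	return snprintf(buf, len, "demo_func");
}

/* Always print the offset, whatever the caller's flags say. */
static inline int demo_print_ip_sym_offset(char *buf, size_t len,
					   unsigned long ip,
					   unsigned long flags)
{
	return demo_print_ip_sym(buf, len, ip, flags | DEMO_SYM_OFFSET);
}

/* Never print the offset, whatever the caller's flags say. */
static inline int demo_print_ip_sym_no_offset(char *buf, size_t len,
					      unsigned long ip,
					      unsigned long flags)
{
	return demo_print_ip_sym(buf, len, ip, flags & ~DEMO_SYM_OFFSET);
}
```

Hiding the bit manipulation behind named helpers means call sites do not need to know the width or spelling of the flag, which matters once the option mask changes type.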
+5 -5
kernel/trace/trace_functions.c
@@ -154,11 +154,11 @@
 	if (!tr->ops)
 		return -ENOMEM;

-	func = select_trace_function(func_flags.val);
+	func = select_trace_function(tr->current_trace_flags->val);
 	if (!func)
 		return -EINVAL;

-	if (!handle_func_repeats(tr, func_flags.val))
+	if (!handle_func_repeats(tr, tr->current_trace_flags->val))
 		return -ENOMEM;

 	ftrace_init_array_ops(tr, func);
@@ -459,14 +459,14 @@
 	u32 new_flags;

 	/* Do nothing if already set. */
-	if (!!set == !!(func_flags.val & bit))
+	if (!!set == !!(tr->current_trace_flags->val & bit))
 		return 0;

 	/* We can change this flag only when not running. */
 	if (tr->current_trace != &function_trace)
 		return 0;

-	new_flags = (func_flags.val & ~bit) | (set ? bit : 0);
+	new_flags = (tr->current_trace_flags->val & ~bit) | (set ? bit : 0);
 	func = select_trace_function(new_flags);
 	if (!func)
 		return -EINVAL;
@@ -491,7 +491,7 @@
 	.init = function_trace_init,
 	.reset = function_trace_reset,
 	.start = function_trace_start,
-	.flags = &func_flags,
+	.default_flags = &func_flags,
 	.set_flag = func_set_flag,
 	.allow_instances = true,
 #ifdef CONFIG_FTRACE_SELFTEST
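The trace_functions.c hunks above read option bits through tr->current_trace_flags instead of the file-scope func_flags, and the tracer now exposes func_flags as .default_flags. How the kernel populates current_trace_flags is not visible in this diff. The sketch below shows one way such per-instance state could work, assuming each instance takes its own copy of the tracer's defaults. All demo_* names and the storage layout are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct demo_tracer_flags {
	uint32_t val;
};

struct demo_tracer {
	const char *name;
	const struct demo_tracer_flags *default_flags;	/* shared, read-only */
};

struct demo_instance {
	const struct demo_tracer *current_trace;
	struct demo_tracer_flags flags_storage;		/* assumed per-instance copy */
	struct demo_tracer_flags *current_trace_flags;
};

/* Select a tracer and start the instance from that tracer's defaults. */
static void demo_set_tracer(struct demo_instance *tr,
			    const struct demo_tracer *tracer)
{
	assert(tracer->default_flags);

	tr->current_trace = tracer;
	tr->flags_storage = *tracer->default_flags;
	tr->current_trace_flags = &tr->flags_storage;
}

/* Set or clear one bit on this instance only; returns 1 if it changed. */
static int demo_set_flag(struct demo_instance *tr, uint32_t bit, int set)
{
	uint32_t old = tr->current_trace_flags->val;

	/* Do nothing if already in the requested state. */
	if (!!set == !!(old & bit))
		return 0;

	tr->current_trace_flags->val = (old & ~bit) | (set ? bit : 0);
	return 1;
}
```

With state kept per instance, toggling an option in one tracing instance cannot silently change the behavior of another instance running the same tracer.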
+153 -67
kernel/trace/trace_functions_graph.c
··· 16 16 #include "trace.h" 17 17 #include "trace_output.h" 18 18 19 - /* When set, irq functions will be ignored */ 19 + /* When set, irq functions might be ignored */ 20 20 static int ftrace_graph_skip_irqs; 21 + 22 + /* Do not record function time when task is sleeping */ 23 + int fgraph_no_sleep_time; 21 24 22 25 struct fgraph_cpu_data { 23 26 pid_t last_pid; ··· 36 33 unsigned long args[FTRACE_REGS_MAX_ARGS]; 37 34 }; 38 35 36 + struct fgraph_retaddr_ent_args { 37 + struct fgraph_retaddr_ent_entry ent; 38 + /* Force the sizeof of args[] to have FTRACE_REGS_MAX_ARGS entries */ 39 + unsigned long args[FTRACE_REGS_MAX_ARGS]; 40 + }; 41 + 39 42 struct fgraph_data { 40 43 struct fgraph_cpu_data __percpu *cpu_data; 41 44 42 45 /* Place to preserve last processed entry. */ 43 46 union { 44 47 struct fgraph_ent_args ent; 45 - /* TODO allow retaddr to have args */ 46 - struct fgraph_retaddr_ent_entry rent; 48 + struct fgraph_retaddr_ent_args rent; 47 49 }; 48 50 struct ftrace_graph_ret_entry ret; 49 51 int failed; ··· 93 85 /* Include sleep time (scheduled out) between entry and return */ 94 86 { TRACER_OPT(sleep-time, TRACE_GRAPH_SLEEP_TIME) }, 95 87 96 - #ifdef CONFIG_FUNCTION_PROFILER 97 - /* Include time within nested functions */ 98 - { TRACER_OPT(graph-time, TRACE_GRAPH_GRAPH_TIME) }, 99 - #endif 100 - 101 88 { } /* Empty entry */ 102 89 }; 103 90 ··· 100 97 /* Don't display overruns, proc, or tail by default */ 101 98 .val = TRACE_GRAPH_PRINT_CPU | TRACE_GRAPH_PRINT_OVERHEAD | 102 99 TRACE_GRAPH_PRINT_DURATION | TRACE_GRAPH_PRINT_IRQS | 103 - TRACE_GRAPH_SLEEP_TIME | TRACE_GRAPH_GRAPH_TIME, 100 + TRACE_GRAPH_SLEEP_TIME, 104 101 .opts = trace_opts 105 102 }; 106 103 107 - static bool tracer_flags_is_set(u32 flags) 104 + static bool tracer_flags_is_set(struct trace_array *tr, u32 flags) 108 105 { 109 - return (tracer_flags.val & flags) == flags; 106 + return (tr->current_trace_flags->val & flags) == flags; 110 107 } 111 108 112 109 /* ··· 165 162 int 
 __trace_graph_retaddr_entry(struct trace_array *tr,
 				struct ftrace_graph_ent *trace,
 				unsigned int trace_ctx,
-				unsigned long retaddr)
+				unsigned long retaddr,
+				struct ftrace_regs *fregs)
 {
 	struct ring_buffer_event *event;
 	struct trace_buffer *buffer = tr->array_buffer.buffer;
 	struct fgraph_retaddr_ent_entry *entry;
+	int size;
+
+	/* If fregs is defined, add FTRACE_REGS_MAX_ARGS long size words */
+	size = sizeof(*entry) + (FTRACE_REGS_MAX_ARGS * !!fregs * sizeof(long));

 	event = trace_buffer_lock_reserve(buffer, TRACE_GRAPH_RETADDR_ENT,
-					  sizeof(*entry), trace_ctx);
+					  size, trace_ctx);
 	if (!event)
 		return 0;
 	entry = ring_buffer_event_data(event);
-	entry->graph_ent.func = trace->func;
-	entry->graph_ent.depth = trace->depth;
-	entry->graph_ent.retaddr = retaddr;
+	entry->graph_rent.ent = *trace;
+	entry->graph_rent.retaddr = retaddr;
+
+#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
+	if (fregs) {
+		for (int i = 0; i < FTRACE_REGS_MAX_ARGS; i++)
+			entry->args[i] = ftrace_regs_get_argument(fregs, i);
+	}
+#endif
+
 	trace_buffer_unlock_commit_nostack(buffer, event);

 	return 1;
···
 int __trace_graph_retaddr_entry(struct trace_array *tr,
 				struct ftrace_graph_ent *trace,
 				unsigned int trace_ctx,
-				unsigned long retaddr)
+				unsigned long retaddr,
+				struct ftrace_regs *fregs)
 {
 	return 1;
 }
 #endif

-static inline int ftrace_graph_ignore_irqs(void)
+static inline int ftrace_graph_ignore_irqs(struct trace_array *tr)
 {
 	if (!ftrace_graph_skip_irqs || trace_recursion_test(TRACE_IRQ_BIT))
+		return 0;
+
+	if (tracer_flags_is_set(tr, TRACE_GRAPH_PRINT_IRQS))
 		return 0;

 	return in_hardirq();
···
 	if (ftrace_graph_ignore_func(gops, trace))
 		return 0;

-	if (ftrace_graph_ignore_irqs())
+	if (ftrace_graph_ignore_irqs(tr))
 		return 0;

-	if (fgraph_sleep_time) {
-		/* Only need to record the calltime */
-		ftimes = fgraph_reserve_data(gops->idx, sizeof(ftimes->calltime));
-	} else {
+	if (fgraph_no_sleep_time &&
+	    !tracer_flags_is_set(tr, TRACE_GRAPH_SLEEP_TIME)) {
 		ftimes = fgraph_reserve_data(gops->idx, sizeof(*ftimes));
 		if (ftimes)
 			ftimes->sleeptime = current->ftrace_sleeptime;
+	} else {
+		/* Only need to record the calltime */
+		ftimes = fgraph_reserve_data(gops->idx, sizeof(ftimes->calltime));
 	}
 	if (!ftimes)
 		return 0;
···

 	trace_ctx = tracing_gen_ctx();
 	if (IS_ENABLED(CONFIG_FUNCTION_GRAPH_RETADDR) &&
-	    tracer_flags_is_set(TRACE_GRAPH_PRINT_RETADDR)) {
+	    tracer_flags_is_set(tr, TRACE_GRAPH_PRINT_RETADDR)) {
 		unsigned long retaddr = ftrace_graph_top_ret_addr(current);
-		ret = __trace_graph_retaddr_entry(tr, trace, trace_ctx, retaddr);
+		ret = __trace_graph_retaddr_entry(tr, trace, trace_ctx,
+						  retaddr, fregs);
 	} else {
 		ret = __graph_entry(tr, trace, trace_ctx, fregs);
 	}
···
 	trace_buffer_unlock_commit_nostack(buffer, event);
 }

-static void handle_nosleeptime(struct ftrace_graph_ret *trace,
+static void handle_nosleeptime(struct trace_array *tr,
+			       struct ftrace_graph_ret *trace,
 			       struct fgraph_times *ftimes,
 			       int size)
 {
-	if (fgraph_sleep_time || size < sizeof(*ftimes))
+	if (size < sizeof(*ftimes))
+		return;
+
+	if (!fgraph_no_sleep_time || tracer_flags_is_set(tr, TRACE_GRAPH_SLEEP_TIME))
 		return;

 	ftimes->calltime += current->ftrace_sleeptime - ftimes->sleeptime;
···
 	if (!ftimes)
 		return;

-	handle_nosleeptime(trace, ftimes, size);
+	handle_nosleeptime(tr, trace, ftimes, size);

 	calltime = ftimes->calltime;

···
 				      struct ftrace_regs *fregs)
 {
 	struct fgraph_times *ftimes;
+	struct trace_array *tr;
 	int size;

 	ftrace_graph_addr_finish(gops, trace);
···
 	if (!ftimes)
 		return;

-	handle_nosleeptime(trace, ftimes, size);
+	tr = gops->private;
+	handle_nosleeptime(tr, trace, ftimes, size);

 	if (tracing_thresh &&
 	    (trace_clock_local() - ftimes->calltime < tracing_thresh))
···
 {
 	int ret;

-	if (tracer_flags_is_set(TRACE_GRAPH_ARGS))
+	if (tracer_flags_is_set(tr, TRACE_GRAPH_ARGS))
 		tr->gops->entryfunc = trace_graph_entry_args;
 	else
 		tr->gops->entryfunc = trace_graph_entry;
···
 		tr->gops->retfunc = trace_graph_thresh_return;
 	else
 		tr->gops->retfunc = trace_graph_return;
+
+	if (!tracer_flags_is_set(tr, TRACE_GRAPH_PRINT_IRQS))
+		ftrace_graph_skip_irqs++;
+
+	if (!tracer_flags_is_set(tr, TRACE_GRAPH_SLEEP_TIME))
+		fgraph_no_sleep_time++;

 	/* Make gops functions visible before we start tracing */
 	smp_mb();
···
 static int ftrace_graph_trace_args(struct trace_array *tr, int set)
 {
 	trace_func_graph_ent_t entry;
-
-	/* Do nothing if the current tracer is not this tracer */
-	if (tr->current_trace != &graph_trace)
-		return 0;

 	if (set)
 		entry = trace_graph_entry_args;
···

 static void graph_trace_reset(struct trace_array *tr)
 {
+	if (!tracer_flags_is_set(tr, TRACE_GRAPH_PRINT_IRQS))
+		ftrace_graph_skip_irqs--;
+	if (WARN_ON_ONCE(ftrace_graph_skip_irqs < 0))
+		ftrace_graph_skip_irqs = 0;
+
+	if (!tracer_flags_is_set(tr, TRACE_GRAPH_SLEEP_TIME))
+		fgraph_no_sleep_time--;
+	if (WARN_ON_ONCE(fgraph_no_sleep_time < 0))
+		fgraph_no_sleep_time = 0;
+
 	tracing_stop_cmdline_record();
 	unregister_ftrace_graph(tr->gops);
 }
···
 		 * Save current and next entries for later reference
 		 * if the output fails.
 		 */
-		if (unlikely(curr->ent.type == TRACE_GRAPH_RETADDR_ENT)) {
-			data->rent = *(struct fgraph_retaddr_ent_entry *)curr;
-		} else {
-			int size = min((int)sizeof(data->ent), (int)iter->ent_size);
+		int size = min_t(int, sizeof(data->rent), iter->ent_size);

-			memcpy(&data->ent, curr, size);
-		}
+		memcpy(&data->rent, curr, size);
 		/*
 		 * If the next event is not a return type, then
 		 * we only care about what type it is. Otherwise we can
···
 	    addr >= (unsigned long)__irqentry_text_end)
 		return;

-	if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) {
+	if (tr->trace_flags & TRACE_ITER(CONTEXT_INFO)) {
 		/* Absolute time */
 		if (flags & TRACE_GRAPH_PRINT_ABS_TIME)
 			print_graph_abs_time(iter->ts, s);
···
 	}

 	/* Latency format */
-	if (tr->trace_flags & TRACE_ITER_LATENCY_FMT)
+	if (tr->trace_flags & TRACE_ITER(LATENCY_FMT))
 		print_graph_lat_fmt(s, ent);
 }

···
 				 struct trace_seq *s, u32 flags)
 {
 	if (!(flags & TRACE_GRAPH_PRINT_DURATION) ||
-	    !(tr->trace_flags & TRACE_ITER_CONTEXT_INFO))
+	    !(tr->trace_flags & TRACE_ITER(CONTEXT_INFO)))
 		return;

 	/* No real adata, just filling the column with spaces */
···
 		trace_seq_puts(s, " /*");

 	trace_seq_puts(s, " <-");
-	seq_print_ip_sym(s, entry->graph_ent.retaddr, trace_flags | TRACE_ITER_SYM_OFFSET);
+	seq_print_ip_sym_offset(s, entry->graph_rent.retaddr, trace_flags);

 	if (comment)
 		trace_seq_puts(s, " */");
···
 		trace_seq_printf(s, "%ps", (void *)ret_func);

 		if (args_size >= FTRACE_REGS_MAX_ARGS * sizeof(long)) {
-			print_function_args(s, entry->args, ret_func);
+			print_function_args(s, FGRAPH_ENTRY_ARGS(entry), ret_func);
 			trace_seq_putc(s, ';');
 		} else
 			trace_seq_puts(s, "();");
···
 	args_size = iter->ent_size - offsetof(struct ftrace_graph_ent_entry, args);

 	if (args_size >= FTRACE_REGS_MAX_ARGS * sizeof(long))
-		print_function_args(s, entry->args, func);
+		print_function_args(s, FGRAPH_ENTRY_ARGS(entry), func);
 	else
 		trace_seq_puts(s, "()");

···
 	/* Interrupt */
 	print_graph_irq(iter, addr, type, cpu, ent->pid, flags);

-	if (!(tr->trace_flags & TRACE_ITER_CONTEXT_INFO))
+	if (!(tr->trace_flags & TRACE_ITER(CONTEXT_INFO)))
 		return;

 	/* Absolute time */
···
 	}

 	/* Latency format */
-	if (tr->trace_flags & TRACE_ITER_LATENCY_FMT)
+	if (tr->trace_flags & TRACE_ITER(LATENCY_FMT))
 		print_graph_lat_fmt(s, ent);

 	return;
···
 	/*
 	 * print_graph_entry() may consume the current event,
 	 * thus @field may become invalid, so we need to save it.
-	 * sizeof(struct ftrace_graph_ent_entry) is very small,
-	 * it can be safely saved at the stack.
+	 * This function is shared by ftrace_graph_ent_entry and
+	 * fgraph_retaddr_ent_entry, the size of the latter one
+	 * is larger, but it is very small and can be safely saved
+	 * at the stack.
 	 */
 	struct ftrace_graph_ent_entry *entry;
-	u8 save_buf[sizeof(*entry) + FTRACE_REGS_MAX_ARGS * sizeof(long)];
+	struct fgraph_retaddr_ent_entry *rentry;
+	u8 save_buf[sizeof(*rentry) + FTRACE_REGS_MAX_ARGS * sizeof(long)];

 	/* The ent_size is expected to be as big as the entry */
 	if (iter->ent_size > sizeof(save_buf))
···
 	}
 #ifdef CONFIG_FUNCTION_GRAPH_RETADDR
 	case TRACE_GRAPH_RETADDR_ENT: {
-		struct fgraph_retaddr_ent_entry saved;
+		/*
+		 * ftrace_graph_ent_entry and fgraph_retaddr_ent_entry have
+		 * similar functions and memory layouts. The only difference
+		 * is that the latter one has an extra retaddr member, so
+		 * they can share most of the logic.
+		 */
 		struct fgraph_retaddr_ent_entry *rfield;

 		trace_assign_type(rfield, entry);
-		saved = *rfield;
-		return print_graph_entry((struct ftrace_graph_ent_entry *)&saved, s, iter, flags);
+		return print_graph_entry((struct ftrace_graph_ent_entry *)rfield,
+					 s, iter, flags);
 	}
 #endif
 	case TRACE_GRAPH_RET: {
···
 static enum print_line_t
 print_graph_function(struct trace_iterator *iter)
 {
-	return print_graph_function_flags(iter, tracer_flags.val);
+	struct trace_array *tr = iter->tr;
+	return print_graph_function_flags(iter, tr->current_trace_flags->val);
 }

 static enum print_line_t
···
 static void __print_graph_headers_flags(struct trace_array *tr,
 					struct seq_file *s, u32 flags)
 {
-	int lat = tr->trace_flags & TRACE_ITER_LATENCY_FMT;
+	int lat = tr->trace_flags & TRACE_ITER(LATENCY_FMT);

 	if (lat)
 		print_lat_header(s, flags);
···

 static void print_graph_headers(struct seq_file *s)
 {
-	print_graph_headers_flags(s, tracer_flags.val);
+	struct trace_iterator *iter = s->private;
+	struct trace_array *tr = iter->tr;
+
+	print_graph_headers_flags(s, tr->current_trace_flags->val);
 }

 void print_graph_headers_flags(struct seq_file *s, u32 flags)
···
 	struct trace_iterator *iter = s->private;
 	struct trace_array *tr = iter->tr;

-	if (!(tr->trace_flags & TRACE_ITER_CONTEXT_INFO))
+	if (!(tr->trace_flags & TRACE_ITER(CONTEXT_INFO)))
 		return;

-	if (tr->trace_flags & TRACE_ITER_LATENCY_FMT) {
+	if (tr->trace_flags & TRACE_ITER(LATENCY_FMT)) {
 		/* print nothing if the buffers are empty */
 		if (trace_empty(iter))
 			return;
···
 static int
 func_graph_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set)
 {
-	if (bit == TRACE_GRAPH_PRINT_IRQS)
-		ftrace_graph_skip_irqs = !set;
+	/*
+	 * The function profiler gets updated even if function graph
+	 * isn't the current tracer. Handle it separately.
+	 */
+#ifdef CONFIG_FUNCTION_PROFILER
+	if (bit == TRACE_GRAPH_SLEEP_TIME && (tr->flags & TRACE_ARRAY_FL_GLOBAL) &&
+	    !!set == fprofile_no_sleep_time) {
+		if (set) {
+			fgraph_no_sleep_time--;
+			if (WARN_ON_ONCE(fgraph_no_sleep_time < 0))
+				fgraph_no_sleep_time = 0;
+			fprofile_no_sleep_time = false;
+		} else {
+			fgraph_no_sleep_time++;
+			fprofile_no_sleep_time = true;
+		}
+	}
+#endif

-	if (bit == TRACE_GRAPH_SLEEP_TIME)
-		ftrace_graph_sleep_time_control(set);
+	/* Do nothing if the current tracer is not this tracer */
+	if (tr->current_trace != &graph_trace)
+		return 0;

-	if (bit == TRACE_GRAPH_GRAPH_TIME)
-		ftrace_graph_graph_time_control(set);
+	/* Do nothing if already set. */
+	if (!!set == !!(tr->current_trace_flags->val & bit))
+		return 0;

-	if (bit == TRACE_GRAPH_ARGS)
+	switch (bit) {
+	case TRACE_GRAPH_SLEEP_TIME:
+		if (set) {
+			fgraph_no_sleep_time--;
+			if (WARN_ON_ONCE(fgraph_no_sleep_time < 0))
+				fgraph_no_sleep_time = 0;
+		} else {
+			fgraph_no_sleep_time++;
+		}
+		break;
+
+	case TRACE_GRAPH_PRINT_IRQS:
+		if (set)
+			ftrace_graph_skip_irqs--;
+		else
+			ftrace_graph_skip_irqs++;
+		if (WARN_ON_ONCE(ftrace_graph_skip_irqs < 0))
+			ftrace_graph_skip_irqs = 0;
+		break;
+
+	case TRACE_GRAPH_ARGS:
 		return ftrace_graph_trace_args(tr, set);
+	}

 	return 0;
 }
···
 	.reset		= graph_trace_reset,
 	.print_line	= print_graph_function,
 	.print_header	= print_graph_headers,
-	.flags		= &tracer_flags,
+	.default_flags	= &tracer_flags,
 	.set_flag	= func_graph_set_flag,
 	.allow_instances = true,
 #ifdef CONFIG_FTRACE_SELFTEST
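The function graph changes above replace single global on/off switches with counters: each instance that starts the tracer with an option cleared bumps a counter, and the hot path checks the counter first and the per-instance flag second. The following user-space sketch models that bookkeeping. All `demo_` names are hypothetical stand-ins and are not kernel symbols, and the handling of inactive instances is a simplification of what `func_graph_set_flag()` does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one tracing instance; not the kernel's trace_array. */
struct demo_instance {
	bool active;		/* is the graph tracer running in this instance? */
	bool sleep_time;	/* models the per-instance sleep-time option */
};

/* Number of active instances that have sleep_time cleared. */
static int demo_no_sleep_time;

/* Models tracer init: count this instance if its option is cleared. */
static void demo_init(struct demo_instance *inst)
{
	assert(inst != NULL);
	inst->active = true;
	if (!inst->sleep_time)
		demo_no_sleep_time++;
}

/* Models tracer reset: undo the contribution made at init time. */
static void demo_reset(struct demo_instance *inst)
{
	assert(inst != NULL);
	if (inst->active && !inst->sleep_time)
		demo_no_sleep_time--;
	if (demo_no_sleep_time < 0)	/* clamp, as the diff does after WARN_ON_ONCE() */
		demo_no_sleep_time = 0;
	inst->active = false;
}

/* Only an active instance whose option really changes moves the counter. */
static void demo_set_sleep_time(struct demo_instance *inst, bool set)
{
	assert(inst != NULL);
	if (inst->active && set != inst->sleep_time) {
		if (set)
			demo_no_sleep_time--;
		else
			demo_no_sleep_time++;
	}
	inst->sleep_time = set;
}

/* Fast path: a zero counter means no instance needs the adjustment. */
static bool demo_excludes_sleep_time(const struct demo_instance *inst)
{
	assert(inst != NULL);
	return demo_no_sleep_time && !inst->sleep_time;
}
```

The counter keeps the common case cheap (one global read) while still letting two instances disagree about the option.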
+15 -15
kernel/trace/trace_irqsoff.c
···

 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 static int irqsoff_display_graph(struct trace_array *tr, int set);
-# define is_graph(tr) ((tr)->trace_flags & TRACE_ITER_DISPLAY_GRAPH)
+# define is_graph(tr) ((tr)->trace_flags & TRACE_ITER(DISPLAY_GRAPH))
 #else
 static inline int irqsoff_display_graph(struct trace_array *tr, int set)
 {
···
 {
 	int ret;

-	/* 'set' is set if TRACE_ITER_FUNCTION is about to be set */
-	if (function_enabled || (!set && !(tr->trace_flags & TRACE_ITER_FUNCTION)))
+	/* 'set' is set if TRACE_ITER(FUNCTION) is about to be set */
+	if (function_enabled || (!set && !(tr->trace_flags & TRACE_ITER(FUNCTION))))
 		return 0;

 	if (graph)
···

 static int irqsoff_function_set(struct trace_array *tr, u32 mask, int set)
 {
-	if (!(mask & TRACE_ITER_FUNCTION))
+	if (!(mask & TRACE_ITER(FUNCTION)))
 		return 0;

 	if (set)
···
 }
 #endif /* CONFIG_FUNCTION_TRACER */

-static int irqsoff_flag_changed(struct trace_array *tr, u32 mask, int set)
+static int irqsoff_flag_changed(struct trace_array *tr, u64 mask, int set)
 {
 	struct tracer *tracer = tr->current_trace;

···
 		return 0;

 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	if (mask & TRACE_ITER_DISPLAY_GRAPH)
+	if (mask & TRACE_ITER(DISPLAY_GRAPH))
 		return irqsoff_display_graph(tr, set);
 #endif

···
 	save_flags = tr->trace_flags;

 	/* non overwrite screws up the latency tracers */
-	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, 1);
-	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, 1);
+	set_tracer_flag(tr, TRACE_ITER(OVERWRITE), 1);
+	set_tracer_flag(tr, TRACE_ITER(LATENCY_FMT), 1);
 	/* without pause, we will produce garbage if another latency occurs */
-	set_tracer_flag(tr, TRACE_ITER_PAUSE_ON_TRACE, 1);
+	set_tracer_flag(tr, TRACE_ITER(PAUSE_ON_TRACE), 1);

 	tr->max_latency = 0;
 	irqsoff_trace = tr;
···

 static void __irqsoff_tracer_reset(struct trace_array *tr)
 {
-	int lat_flag = save_flags & TRACE_ITER_LATENCY_FMT;
-	int overwrite_flag = save_flags & TRACE_ITER_OVERWRITE;
-	int pause_flag = save_flags & TRACE_ITER_PAUSE_ON_TRACE;
+	int lat_flag = save_flags & TRACE_ITER(LATENCY_FMT);
+	int overwrite_flag = save_flags & TRACE_ITER(OVERWRITE);
+	int pause_flag = save_flags & TRACE_ITER(PAUSE_ON_TRACE);

 	stop_irqsoff_tracer(tr, is_graph(tr));

-	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, lat_flag);
-	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag);
-	set_tracer_flag(tr, TRACE_ITER_PAUSE_ON_TRACE, pause_flag);
+	set_tracer_flag(tr, TRACE_ITER(LATENCY_FMT), lat_flag);
+	set_tracer_flag(tr, TRACE_ITER(OVERWRITE), overwrite_flag);
+	set_tracer_flag(tr, TRACE_ITER(PAUSE_ON_TRACE), pause_flag);
 	ftrace_reset_array_ops(tr);

 	irqsoff_busy = false;
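The switch of `irqsoff_flag_changed()` from a `u32` to a `u64` mask follows from the option mask growing to 64 bits: a callback that receives the mask through a 32-bit parameter can never observe an option whose bit index is 32 or higher. The sketch below illustrates that in plain C. The `DEMO_` bit positions and function names are invented for the illustration and do not correspond to real trace options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option bits; 35 stands for any option beyond the first 32. */
enum { DEMO_OPT_LOW_BIT = 3, DEMO_OPT_HIGH_BIT = 35 };
#define DEMO_OPT(bit) (UINT64_C(1) << (bit))

/* Old-style callback: the mask parameter is only 32 bits wide. */
static bool demo_flag_changed_u32(uint32_t mask, uint64_t option)
{
	return (mask & option) != 0;
}

/* New-style callback: the full 64-bit mask is visible. */
static bool demo_flag_changed_u64(uint64_t mask, uint64_t option)
{
	return (mask & option) != 0;
}

/* Dispatch through the narrow callback; the upper 32 bits are dropped here. */
static bool demo_notify_u32(uint64_t changed, uint64_t option)
{
	return demo_flag_changed_u32((uint32_t)changed, option);
}

/* Dispatch through the wide callback; nothing is lost. */
static bool demo_notify_u64(uint64_t changed, uint64_t option)
{
	assert(option != 0);
	return demo_flag_changed_u64(changed, option);
}
```

With these helpers, a change to the low option is seen through both paths, while a change to the high option is only seen through the 64-bit path.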
+1 -1
kernel/trace/trace_kdb.c
···
 	old_userobj = tr->trace_flags;

 	/* don't look at user memory in panic mode */
-	tr->trace_flags &= ~TRACE_ITER_SYM_USEROBJ;
+	tr->trace_flags &= ~TRACE_ITER(SYM_USEROBJ);

 	kdb_printf("Dumping ftrace buffer:\n");
 	if (skip_entries)
+3 -3
kernel/trace/trace_kprobe.c
···

 	trace_seq_printf(s, "%s: (", trace_probe_name(tp));

-	if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
+	if (!seq_print_ip_sym_offset(s, field->ip, flags))
 		goto out;

 	trace_seq_putc(s, ')');
···

 	trace_seq_printf(s, "%s: (", trace_probe_name(tp));

-	if (!seq_print_ip_sym(s, field->ret_ip, flags | TRACE_ITER_SYM_OFFSET))
+	if (!seq_print_ip_sym_offset(s, field->ret_ip, flags))
 		goto out;

 	trace_seq_puts(s, " <- ");

-	if (!seq_print_ip_sym(s, field->func, flags & ~TRACE_ITER_SYM_OFFSET))
+	if (!seq_print_ip_sym_no_offset(s, field->func, flags))
 		goto out;

 	trace_seq_putc(s, ')');
+34 -11
kernel/trace/trace_output.c
···
 		}
 		mmap_read_unlock(mm);
 	}
-	if (ret && ((sym_flags & TRACE_ITER_SYM_ADDR) || !file))
+	if (ret && ((sym_flags & TRACE_ITER(SYM_ADDR)) || !file))
 		trace_seq_printf(s, " <" IP_FMT ">", ip);
 	return !trace_seq_has_overflowed(s);
 }
···
 		goto out;
 	}

-	trace_seq_print_sym(s, ip, sym_flags & TRACE_ITER_SYM_OFFSET);
+	trace_seq_print_sym(s, ip, sym_flags & TRACE_ITER(SYM_OFFSET));

-	if (sym_flags & TRACE_ITER_SYM_ADDR)
+	if (sym_flags & TRACE_ITER(SYM_ADDR))
 		trace_seq_printf(s, " <" IP_FMT ">", ip);

 out:
···
 lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
 {
 	struct trace_array *tr = iter->tr;
-	unsigned long verbose = tr->trace_flags & TRACE_ITER_VERBOSE;
+	unsigned long verbose = tr->trace_flags & TRACE_ITER(VERBOSE);
 	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
 	unsigned long long abs_ts = iter->ts - iter->array_buffer->time_start;
 	unsigned long long rel_ts = next_ts - iter->ts;
···

 	trace_seq_printf(s, "%16s-%-7d ", comm, entry->pid);

-	if (tr->trace_flags & TRACE_ITER_RECORD_TGID) {
+	if (tr->trace_flags & TRACE_ITER(RECORD_TGID)) {
 		unsigned int tgid = trace_find_tgid(entry->pid);

 		if (!tgid)
···

 	trace_seq_printf(s, "[%03d] ", iter->cpu);

-	if (tr->trace_flags & TRACE_ITER_IRQ_INFO)
+	if (tr->trace_flags & TRACE_ITER(IRQ_INFO))
 		trace_print_lat_fmt(s, entry);

 	trace_print_time(s, iter, iter->ts);
···
 	struct trace_entry *entry, *next_entry;
 	struct trace_array *tr = iter->tr;
 	struct trace_seq *s = &iter->seq;
-	unsigned long verbose = (tr->trace_flags & TRACE_ITER_VERBOSE);
+	unsigned long verbose = (tr->trace_flags & TRACE_ITER(VERBOSE));
 	u64 next_ts;

 	next_entry = trace_find_next_entry(iter, NULL, &next_ts);
···
 	int offset;
 	int len;
 	int ret;
+	int i;
 	void *pos;
+	char *str;

 	list_for_each_entry_reverse(field, head, link) {
 		trace_seq_printf(&iter->seq, " %s=", field->name);
···
 				trace_seq_puts(&iter->seq, "<OVERFLOW>");
 				break;
 			}
-			pos = (void *)iter->ent + offset;
-			trace_seq_printf(&iter->seq, "%.*s", len, (char *)pos);
+			str = (char *)iter->ent + offset;
+			/* Check if there's any non printable strings */
+			for (i = 0; i < len; i++) {
+				if (str[i] && !(isascii(str[i]) && isprint(str[i])))
+					break;
+			}
+			if (i < len) {
+				for (i = 0; i < len; i++) {
+					if (isascii(str[i]) && isprint(str[i]))
+						trace_seq_putc(&iter->seq, str[i]);
+					else
+						trace_seq_putc(&iter->seq, '.');
+				}
+				trace_seq_puts(&iter->seq, " (");
+				for (i = 0; i < len; i++) {
+					if (i)
+						trace_seq_putc(&iter->seq, ':');
+					trace_seq_printf(&iter->seq, "%02x", str[i]);
+				}
+				trace_seq_putc(&iter->seq, ')');
+			} else {
+				trace_seq_printf(&iter->seq, "%.*s", len, str);
+			}
 			break;
 		case FILTER_PTR_STRING:
 			if (!iter->fmt_size)
···
 	if (args)
 		print_function_args(s, args, ip);

-	if ((flags & TRACE_ITER_PRINT_PARENT) && parent_ip) {
+	if ((flags & TRACE_ITER(PRINT_PARENT)) && parent_ip) {
 		trace_seq_puts(s, " <-");
 		seq_print_ip_sym(s, parent_ip, flags);
 	}
···

 	trace_seq_puts(s, "<user stack trace>\n");

-	if (tr->trace_flags & TRACE_ITER_SYM_USEROBJ) {
+	if (tr->trace_flags & TRACE_ITER(SYM_USEROBJ)) {
 		struct task_struct *task;
 		/*
 		 * we do the lookup on the thread group leader,
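The dynamic string hunk above changes the field printer so that a buffer containing any non-printable, non-NUL byte is shown twice: once with unprintable bytes replaced by `.`, and once as colon-separated hex in parentheses. A user-space approximation of that logic is sketched below. `demo_format_field()` and `demo_printable()` are hypothetical names, the output goes to a caller-supplied buffer instead of a `trace_seq`, and bytes are handled as `unsigned char`, which is a choice of this sketch and not something taken from the diff.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Printable 7-bit ASCII only, mirroring the isascii() && isprint() test. */
static int demo_printable(unsigned char c)
{
	return c < 0x80 && isprint(c);
}

/*
 * Format len bytes of src into dst. Returns the number of characters
 * written (excluding the NUL terminator), or -1 if dst is too small.
 */
static int demo_format_field(char *dst, size_t dst_sz,
			     const unsigned char *src, size_t len)
{
	size_t pos = 0;
	size_t i;
	int n;

	assert(dst != NULL && src != NULL);

	/* Look for any byte that is neither NUL nor printable ASCII. */
	for (i = 0; i < len; i++) {
		if (src[i] && !demo_printable(src[i]))
			break;
	}

	/* Nothing unusual found: print it as a plain string. */
	if (i == len) {
		n = snprintf(dst, dst_sz, "%.*s", (int)len, (const char *)src);
		return (n < 0 || (size_t)n >= dst_sz) ? -1 : n;
	}

	/* len chars + " (" + (3 * len - 1) hex chars + ")" + NUL */
	if (dst_sz < 4 * len + 3)
		return -1;

	for (i = 0; i < len; i++)
		dst[pos++] = demo_printable(src[i]) ? (char)src[i] : '.';
	dst[pos++] = ' ';
	dst[pos++] = '(';
	for (i = 0; i < len; i++) {
		if (i)
			dst[pos++] = ':';
		pos += (size_t)snprintf(dst + pos, dst_sz - pos, "%02x",
					(unsigned int)src[i]);
	}
	dst[pos++] = ')';
	dst[pos] = '\0';
	return (int)pos;
}
```

For the three bytes `'a'`, `'b'`, `0x01` this produces `ab. (61:62:01)`, while a buffer holding only printable text is emitted unchanged.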
+11
kernel/trace/trace_output.h
···
 seq_print_ip_sym(struct trace_seq *s, unsigned long ip,
 		unsigned long sym_flags);

+static inline int seq_print_ip_sym_offset(struct trace_seq *s, unsigned long ip,
+					  unsigned long sym_flags)
+{
+	return seq_print_ip_sym(s, ip, sym_flags | TRACE_ITER(SYM_OFFSET));
+}
+static inline int seq_print_ip_sym_no_offset(struct trace_seq *s, unsigned long ip,
+					     unsigned long sym_flags)
+{
+	return seq_print_ip_sym(s, ip, sym_flags & ~TRACE_ITER(SYM_OFFSET));
+}
+
 extern void trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset);
 extern int trace_print_context(struct trace_iterator *iter);
 extern int trace_print_lat_context(struct trace_iterator *iter);
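The two inline helpers above wrap `seq_print_ip_sym()` so that callers no longer OR in or mask out the `SYM_OFFSET` bit by hand. A self-contained sketch of the same wrapper pattern follows. The `DEMO_ITER()` macro is only one plausible way to turn a bit index into a 64-bit mask; the real `TRACE_ITER()` definition is not part of this diff, and every `demo_` name here is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit indexes; the real ones are defined elsewhere in the kernel. */
enum demo_iter_bits {
	DEMO_ITER_SYM_OFFSET_BIT,
	DEMO_ITER_SYM_ADDR_BIT,
};

/* Assumed shape of a name-to-mask macro: paste the name, shift a 64-bit one. */
#define DEMO_ITER(flag) (UINT64_C(1) << DEMO_ITER_##flag##_BIT)

/* Base printer: appends "+0x<offset>" only when SYM_OFFSET is set. */
static int demo_print_sym(char *buf, size_t size, const char *name,
			  unsigned long offset, uint64_t flags)
{
	assert(buf != NULL && name != NULL);
	if (flags & DEMO_ITER(SYM_OFFSET))
		return snprintf(buf, size, "%s+0x%lx", name, offset);
	return snprintf(buf, size, "%s", name);
}

/* Always print the offset, whatever the caller's flags say. */
static inline int demo_print_sym_offset(char *buf, size_t size, const char *name,
					unsigned long offset, uint64_t flags)
{
	return demo_print_sym(buf, size, name, offset, flags | DEMO_ITER(SYM_OFFSET));
}

/* Never print the offset, whatever the caller's flags say. */
static inline int demo_print_sym_no_offset(char *buf, size_t size, const char *name,
					   unsigned long offset, uint64_t flags)
{
	return demo_print_sym(buf, size, name, offset, flags & ~DEMO_ITER(SYM_OFFSET));
}
```

Keeping the bit manipulation inside the wrappers means a later change to how the flag is encoded touches one header instead of every call site, which appears to be the motivation for the kprobe and function graph call-site changes in this merge.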
+12 -12
kernel/trace/trace_sched_wakeup.c
···
 static int save_flags;

 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-# define is_graph(tr) ((tr)->trace_flags & TRACE_ITER_DISPLAY_GRAPH)
+# define is_graph(tr) ((tr)->trace_flags & TRACE_ITER(DISPLAY_GRAPH))
 #else
 # define is_graph(tr) false
 #endif
···
 {
 	int ret;

-	/* 'set' is set if TRACE_ITER_FUNCTION is about to be set */
-	if (function_enabled || (!set && !(tr->trace_flags & TRACE_ITER_FUNCTION)))
+	/* 'set' is set if TRACE_ITER(FUNCTION) is about to be set */
+	if (function_enabled || (!set && !(tr->trace_flags & TRACE_ITER(FUNCTION))))
 		return 0;

 	if (graph)
···

 static int wakeup_function_set(struct trace_array *tr, u32 mask, int set)
 {
-	if (!(mask & TRACE_ITER_FUNCTION))
+	if (!(mask & TRACE_ITER(FUNCTION)))
 		return 0;

 	if (set)
···
 	trace_function(tr, ip, parent_ip, trace_ctx, NULL);
 }

-static int wakeup_flag_changed(struct trace_array *tr, u32 mask, int set)
+static int wakeup_flag_changed(struct trace_array *tr, u64 mask, int set)
 {
 	struct tracer *tracer = tr->current_trace;

···
 		return 0;

 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	if (mask & TRACE_ITER_DISPLAY_GRAPH)
+	if (mask & TRACE_ITER(DISPLAY_GRAPH))
 		return wakeup_display_graph(tr, set);
 #endif

···
 	save_flags = tr->trace_flags;

 	/* non overwrite screws up the latency tracers */
-	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, 1);
-	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, 1);
+	set_tracer_flag(tr, TRACE_ITER(OVERWRITE), 1);
+	set_tracer_flag(tr, TRACE_ITER(LATENCY_FMT), 1);

 	tr->max_latency = 0;
 	wakeup_trace = tr;
···

 static void wakeup_tracer_reset(struct trace_array *tr)
 {
-	int lat_flag = save_flags & TRACE_ITER_LATENCY_FMT;
-	int overwrite_flag = save_flags & TRACE_ITER_OVERWRITE;
+	int lat_flag = save_flags & TRACE_ITER(LATENCY_FMT);
+	int overwrite_flag = save_flags & TRACE_ITER(OVERWRITE);

 	stop_wakeup_tracer(tr);
 	/* make sure we put back any tasks we are tracing */
 	wakeup_reset(tr);

-	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, lat_flag);
-	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag);
+	set_tracer_flag(tr, TRACE_ITER(LATENCY_FMT), lat_flag);
+	set_tracer_flag(tr, TRACE_ITER(OVERWRITE), overwrite_flag);
 	ftrace_reset_array_ops(tr);
 	wakeup_busy = false;
 }
+885 -54
kernel/trace/trace_syscalls.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 #include <trace/syscall.h> 3 3 #include <trace/events/syscalls.h> 4 + #include <linux/kernel_stat.h> 4 5 #include <linux/syscalls.h> 5 6 #include <linux/slab.h> 6 7 #include <linux/kernel.h> ··· 124 123 return entry->name; 125 124 } 126 125 126 + /* Added to user strings or arrays when max limit is reached */ 127 + #define EXTRA "..." 128 + 129 + static void get_dynamic_len_ptr(struct syscall_trace_enter *trace, 130 + struct syscall_metadata *entry, 131 + int *offset_p, int *len_p, unsigned char **ptr_p) 132 + { 133 + unsigned char *ptr; 134 + int offset = *offset_p; 135 + int val; 136 + 137 + /* This arg points to a user space string */ 138 + ptr = (void *)trace->args + sizeof(long) * entry->nb_args + offset; 139 + val = *(int *)ptr; 140 + 141 + /* The value is a dynamic string (len << 16 | offset) */ 142 + ptr = (void *)trace + (val & 0xffff); 143 + *len_p = val >> 16; 144 + offset += 4; 145 + 146 + *ptr_p = ptr; 147 + *offset_p = offset; 148 + } 149 + 150 + static enum print_line_t 151 + sys_enter_openat_print(struct syscall_trace_enter *trace, struct syscall_metadata *entry, 152 + struct trace_seq *s, struct trace_event *event) 153 + { 154 + unsigned char *ptr; 155 + int offset = 0; 156 + int bits, len; 157 + bool done = false; 158 + static const struct trace_print_flags __flags[] = 159 + { 160 + { O_TMPFILE, "O_TMPFILE" }, 161 + { O_WRONLY, "O_WRONLY" }, 162 + { O_RDWR, "O_RDWR" }, 163 + { O_CREAT, "O_CREAT" }, 164 + { O_EXCL, "O_EXCL" }, 165 + { O_NOCTTY, "O_NOCTTY" }, 166 + { O_TRUNC, "O_TRUNC" }, 167 + { O_APPEND, "O_APPEND" }, 168 + { O_NONBLOCK, "O_NONBLOCK" }, 169 + { O_DSYNC, "O_DSYNC" }, 170 + { O_DIRECT, "O_DIRECT" }, 171 + { O_LARGEFILE, "O_LARGEFILE" }, 172 + { O_DIRECTORY, "O_DIRECTORY" }, 173 + { O_NOFOLLOW, "O_NOFOLLOW" }, 174 + { O_NOATIME, "O_NOATIME" }, 175 + { O_CLOEXEC, "O_CLOEXEC" }, 176 + { -1, NULL } 177 + }; 178 + 179 + trace_seq_printf(s, "%s(", entry->name); 180 + 181 + for (int i = 0; 
!done && i < entry->nb_args; i++) { 182 + 183 + if (trace_seq_has_overflowed(s)) 184 + goto end; 185 + 186 + if (i) 187 + trace_seq_puts(s, ", "); 188 + 189 + switch (i) { 190 + case 2: 191 + bits = trace->args[2]; 192 + 193 + trace_seq_puts(s, "flags: "); 194 + 195 + /* No need to show mode when not creating the file */ 196 + if (!(bits & (O_CREAT|O_TMPFILE))) 197 + done = true; 198 + 199 + if (!(bits & O_ACCMODE)) { 200 + if (!bits) { 201 + trace_seq_puts(s, "O_RDONLY"); 202 + continue; 203 + } 204 + trace_seq_puts(s, "O_RDONLY|"); 205 + } 206 + 207 + trace_print_flags_seq(s, "|", bits, __flags); 208 + /* 209 + * trace_print_flags_seq() adds a '\0' to the 210 + * buffer, but this needs to append more to the seq. 211 + */ 212 + if (!trace_seq_has_overflowed(s)) 213 + trace_seq_pop(s); 214 + 215 + continue; 216 + case 3: 217 + trace_seq_printf(s, "%s: 0%03o", entry->args[i], 218 + (unsigned int)trace->args[i]); 219 + continue; 220 + } 221 + 222 + trace_seq_printf(s, "%s: %lu", entry->args[i], 223 + trace->args[i]); 224 + 225 + if (!(BIT(i) & entry->user_mask)) 226 + continue; 227 + 228 + get_dynamic_len_ptr(trace, entry, &offset, &len, &ptr); 229 + trace_seq_printf(s, " \"%.*s\"", len, ptr); 230 + } 231 + 232 + trace_seq_putc(s, ')'); 233 + end: 234 + trace_seq_putc(s, '\n'); 235 + 236 + return trace_handle_return(s); 237 + } 238 + 127 239 static enum print_line_t 128 240 print_syscall_enter(struct trace_iterator *iter, int flags, 129 241 struct trace_event *event) ··· 246 132 struct trace_entry *ent = iter->ent; 247 133 struct syscall_trace_enter *trace; 248 134 struct syscall_metadata *entry; 249 - int i, syscall; 135 + int i, syscall, val, len; 136 + unsigned char *ptr; 137 + int offset = 0; 250 138 251 139 trace = (typeof(trace))ent; 252 140 syscall = trace->nr; ··· 262 146 goto end; 263 147 } 264 148 149 + switch (entry->syscall_nr) { 150 + case __NR_openat: 151 + if (!tr || !(tr->trace_flags & TRACE_ITER(VERBOSE))) 152 + return sys_enter_openat_print(trace, 
entry, s, event); 153 + break; 154 + default: 155 + break; 156 + } 157 + 265 158 trace_seq_printf(s, "%s(", entry->name); 266 159 267 160 for (i = 0; i < entry->nb_args; i++) { 161 + bool printable = false; 162 + char *str; 268 163 269 164 if (trace_seq_has_overflowed(s)) 270 165 goto end; ··· 284 157 trace_seq_puts(s, ", "); 285 158 286 159 /* parameter types */ 287 - if (tr && tr->trace_flags & TRACE_ITER_VERBOSE) 160 + if (tr && tr->trace_flags & TRACE_ITER(VERBOSE)) 288 161 trace_seq_printf(s, "%s ", entry->types[i]); 289 162 290 163 /* parameter values */ ··· 294 167 else 295 168 trace_seq_printf(s, "%s: 0x%lx", entry->args[i], 296 169 trace->args[i]); 170 + 171 + if (!(BIT(i) & entry->user_mask)) 172 + continue; 173 + 174 + get_dynamic_len_ptr(trace, entry, &offset, &len, &ptr); 175 + 176 + if (entry->user_arg_size < 0 || entry->user_arg_is_str) { 177 + trace_seq_printf(s, " \"%.*s\"", len, ptr); 178 + continue; 179 + } 180 + 181 + val = trace->args[entry->user_arg_size]; 182 + 183 + str = ptr; 184 + trace_seq_puts(s, " ("); 185 + for (int x = 0; x < len; x++, ptr++) { 186 + if (isascii(*ptr) && isprint(*ptr)) 187 + printable = true; 188 + if (x) 189 + trace_seq_putc(s, ':'); 190 + trace_seq_printf(s, "%02x", *ptr); 191 + } 192 + if (len < val) 193 + trace_seq_printf(s, ", %s", EXTRA); 194 + 195 + trace_seq_putc(s, ')'); 196 + 197 + /* If nothing is printable, don't bother printing anything */ 198 + if (!printable) 199 + continue; 200 + 201 + trace_seq_puts(s, " \""); 202 + for (int x = 0; x < len; x++) { 203 + if (isascii(str[x]) && isprint(str[x])) 204 + trace_seq_putc(s, str[x]); 205 + else 206 + trace_seq_putc(s, '.'); 207 + } 208 + if (len < val) 209 + trace_seq_printf(s, "\"%s", EXTRA); 210 + else 211 + trace_seq_putc(s, '"'); 297 212 } 298 213 299 214 trace_seq_putc(s, ')'); ··· 381 212 .size = sizeof(_type), .align = __alignof__(_type), \ 382 213 .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER } 383 214 215 + /* When len=0, we just 
calculate the needed length */
+#define LEN_OR_ZERO (len ? len - pos : 0)
+
+static int __init
+sys_enter_openat_print_fmt(struct syscall_metadata *entry, char *buf, int len)
+{
+	int pos = 0;
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"\"dfd: 0x%%08lx, filename: 0x%%08lx \\\"%%s\\\", flags: %%s%%s, mode: 0%%03o\",");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" ((unsigned long)(REC->dfd)),");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" ((unsigned long)(REC->filename)),");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" __get_str(__filename_val),");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" (REC->flags & ~3) && !(REC->flags & 3) ? \"O_RDONLY|\" : \"\", ");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" REC->flags ? __print_flags(REC->flags, \"|\", ");
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_WRONLY\" }, ", O_WRONLY);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_RDWR\" }, ", O_RDWR);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_CREAT\" }, ", O_CREAT);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_EXCL\" }, ", O_EXCL);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_NOCTTY\" }, ", O_NOCTTY);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_TRUNC\" }, ", O_TRUNC);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_APPEND\" }, ", O_APPEND);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_NONBLOCK\" }, ", O_NONBLOCK);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_DSYNC\" }, ", O_DSYNC);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_DIRECT\" }, ", O_DIRECT);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_LARGEFILE\" }, ", O_LARGEFILE);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_DIRECTORY\" }, ", O_DIRECTORY);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_NOFOLLOW\" }, ", O_NOFOLLOW);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_NOATIME\" }, ", O_NOATIME);
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			"{ 0x%x, \"O_CLOEXEC\" }) : \"O_RDONLY\", ", O_CLOEXEC);
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO,
+			" ((unsigned long)(REC->mode))");
+	return pos;
+}
+
 static int __init
 __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
 {
+	bool is_string = entry->user_arg_is_str;
 	int i;
 	int pos = 0;

-	/* When len=0, we just calculate the needed length */
-#define LEN_OR_ZERO (len ? len - pos : 0)
+	switch (entry->syscall_nr) {
+	case __NR_openat:
+		return sys_enter_openat_print_fmt(entry, buf, len);
+	default:
+		break;
+	}

 	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
 	for (i = 0; i < entry->nb_args; i++) {
-		pos += snprintf(buf + pos, LEN_OR_ZERO, "%s: 0x%%0%zulx%s",
-				entry->args[i], sizeof(unsigned long),
-				i == entry->nb_args - 1 ? "" : ", ");
+		if (i)
+			pos += snprintf(buf + pos, LEN_OR_ZERO, ", ");
+		pos += snprintf(buf + pos, LEN_OR_ZERO, "%s: 0x%%0%zulx",
+				entry->args[i], sizeof(unsigned long));
+
+		if (!(BIT(i) & entry->user_mask))
+			continue;
+
+		/* Add the format for the user space string or array */
+		if (entry->user_arg_size < 0 || is_string)
+			pos += snprintf(buf + pos, LEN_OR_ZERO, " \\\"%%s\\\"");
+		else
+			pos += snprintf(buf + pos, LEN_OR_ZERO, " (%%s)");
 	}
 	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");

 	for (i = 0; i < entry->nb_args; i++) {
 		pos += snprintf(buf + pos, LEN_OR_ZERO,
 				", ((unsigned long)(REC->%s))", entry->args[i]);
+		if (!(BIT(i) & entry->user_mask))
+			continue;
+		/* The user space data for arg has name __<arg>_val */
+		if (entry->user_arg_size < 0 || is_string) {
+			pos += snprintf(buf + pos, LEN_OR_ZERO, ", __get_str(__%s_val)",
+					entry->args[i]);
+		} else {
+			pos += snprintf(buf + pos, LEN_OR_ZERO, ", __print_dynamic_array(__%s_val, 1)",
+					entry->args[i]);
+		}
 	}

 #undef LEN_OR_ZERO
···
 {
 	struct syscall_trace_enter trace;
 	struct syscall_metadata *meta = call->data;
+	unsigned long mask;
+	char *arg;
 	int offset = offsetof(typeof(trace), args);
 	int ret = 0;
+	int len;
 	int i;

 	for (i = 0; i < meta->nb_args; i++) {
···
 		offset += sizeof(unsigned long);
 	}

+	if (ret || !meta->user_mask)
+		return ret;
+
+	mask = meta->user_mask;
+
+	while (mask) {
+		int idx = ffs(mask) - 1;
+		mask &= ~BIT(idx);
+
+		/*
+		 * User space data is faulted into a temporary buffer and then
+		 * added as a dynamic string or array to the end of the event.
+		 * The user space data name for the arg pointer is
+		 * "__<arg>_val".
+		 */
+		len = strlen(meta->args[idx]) + sizeof("___val");
+		arg = kmalloc(len, GFP_KERNEL);
+		if (WARN_ON_ONCE(!arg)) {
+			meta->user_mask = 0;
+			return -ENOMEM;
+		}
+
+		snprintf(arg, len, "__%s_val", meta->args[idx]);
+
+		ret = trace_define_field(call, "__data_loc char[]",
+					 arg, offset, sizeof(int), 0,
+					 FILTER_OTHER);
+		if (ret) {
+			kfree(arg);
+			break;
+		}
+		offset += 4;
+	}
 	return ret;
+}
+
+/*
+ * Create a per CPU temporary buffer to copy user space pointers into.
+ *
+ * SYSCALL_FAULT_USER_MAX is the amount to copy from user space.
+ * (defined in kernel/trace/trace.h)
+
+ * SYSCALL_FAULT_ARG_SZ is the amount to copy from user space plus the
+ * nul terminating byte and possibly appended EXTRA (4 bytes).
+ *
+ * SYSCALL_FAULT_BUF_SZ holds the size of the per CPU buffer to use
+ * to copy memory from user space addresses into that will hold
+ * 3 args as only 3 args are allowed to be copied from system calls.
+ */
+#define SYSCALL_FAULT_ARG_SZ (SYSCALL_FAULT_USER_MAX + 1 + 4)
+#define SYSCALL_FAULT_MAX_CNT 3
+#define SYSCALL_FAULT_BUF_SZ (SYSCALL_FAULT_ARG_SZ * SYSCALL_FAULT_MAX_CNT)
+
+/* Use the tracing per CPU buffer infrastructure to copy from user space */
+struct syscall_user_buffer {
+	struct trace_user_buf_info buf;
+	struct rcu_head rcu;
+};
+
+static struct syscall_user_buffer *syscall_buffer;
+
+static int syscall_fault_buffer_enable(void)
+{
+	struct syscall_user_buffer *sbuf;
+	int ret;
+
+	lockdep_assert_held(&syscall_trace_lock);
+
+	if (syscall_buffer) {
+		trace_user_fault_get(&syscall_buffer->buf);
+		return 0;
+	}
+
+	sbuf = kmalloc(sizeof(*sbuf), GFP_KERNEL);
+	if (!sbuf)
+		return -ENOMEM;
+
+	ret = trace_user_fault_init(&sbuf->buf, SYSCALL_FAULT_BUF_SZ);
+	if (ret < 0) {
+		kfree(sbuf);
+		return ret;
+	}
+
+	WRITE_ONCE(syscall_buffer, sbuf);
+
+	return 0;
+}
+
+static void rcu_free_syscall_buffer(struct rcu_head *rcu)
+{
+	struct syscall_user_buffer *sbuf =
+		container_of(rcu, struct syscall_user_buffer, rcu);
+
+	trace_user_fault_destroy(&sbuf->buf);
+	kfree(sbuf);
+}
+
+
+static void syscall_fault_buffer_disable(void)
+{
+	struct syscall_user_buffer *sbuf = syscall_buffer;
+
+	lockdep_assert_held(&syscall_trace_lock);
+
+	if (trace_user_fault_put(&sbuf->buf))
+		return;
+
+	WRITE_ONCE(syscall_buffer, NULL);
+	call_rcu_tasks_trace(&sbuf->rcu, rcu_free_syscall_buffer);
+}
+
+struct syscall_args {
+	char *ptr_array[SYSCALL_FAULT_MAX_CNT];
+	int read[SYSCALL_FAULT_MAX_CNT];
+	int uargs;
+};
+
+static int syscall_copy_user(char *buf, const char __user *ptr,
+			     size_t size, void *data)
+{
+	struct syscall_args *args = data;
+	int ret;
+
+	for (int i = 0; i < args->uargs; i++, buf += SYSCALL_FAULT_ARG_SZ) {
+		ptr = (char __user *)args->ptr_array[i];
+		ret = strncpy_from_user(buf, ptr, size);
+		args->read[i] = ret;
+	}
+	return 0;
+}
+
+static int syscall_copy_user_array(char *buf, const char __user *ptr,
+				   size_t size, void *data)
+{
+	struct syscall_args *args = data;
+	int ret;
+
+	for (int i = 0; i < args->uargs; i++, buf += SYSCALL_FAULT_ARG_SZ) {
+		ptr = (char __user *)args->ptr_array[i];
+		ret = __copy_from_user(buf, ptr, size);
+		args->read[i] = ret ? -1 : size;
+	}
+	return 0;
+}
+
+static char *sys_fault_user(unsigned int buf_size,
+			    struct syscall_metadata *sys_data,
+			    struct syscall_user_buffer *sbuf,
+			    unsigned long *args,
+			    unsigned int data_size[SYSCALL_FAULT_MAX_CNT])
+{
+	trace_user_buf_copy syscall_copy = syscall_copy_user;
+	unsigned long mask = sys_data->user_mask;
+	unsigned long size = SYSCALL_FAULT_ARG_SZ - 1;
+	struct syscall_args sargs;
+	bool array = false;
+	char *buffer;
+	char *buf;
+	int ret;
+	int i = 0;
+
+	/* The extra is appended to the user data in the buffer */
+	BUILD_BUG_ON(SYSCALL_FAULT_USER_MAX + sizeof(EXTRA) >=
+		     SYSCALL_FAULT_ARG_SZ);
+
+	/*
+	 * If this system call event has a size argument, use
+	 * it to define how much of user space memory to read,
+	 * and read it as an array and not a string.
+	 */
+	if (sys_data->user_arg_size >= 0) {
+		array = true;
+		size = args[sys_data->user_arg_size];
+		if (size > SYSCALL_FAULT_ARG_SZ - 1)
+			size = SYSCALL_FAULT_ARG_SZ - 1;
+		syscall_copy = syscall_copy_user_array;
+	}
+
+	while (mask) {
+		int idx = ffs(mask) - 1;
+		mask &= ~BIT(idx);
+
+		if (WARN_ON_ONCE(i == SYSCALL_FAULT_MAX_CNT))
+			break;
+
+		/* Get the pointer to user space memory to read */
+		sargs.ptr_array[i++] = (char *)args[idx];
+	}
+
+	sargs.uargs = i;
+
+	/* Clear the values that are not used */
+	for (; i < SYSCALL_FAULT_MAX_CNT; i++) {
+		data_size[i] = -1; /* Denotes no pointer */
+	}
+
+	/* A zero size means do not even try */
+	if (!buf_size)
+		return NULL;
+
+	buffer = trace_user_fault_read(&sbuf->buf, NULL, size,
+				       syscall_copy, &sargs);
+	if (!buffer)
+		return NULL;
+
+	buf = buffer;
+	for (i = 0; i < sargs.uargs; i++, buf += SYSCALL_FAULT_ARG_SZ) {
+
+		ret = sargs.read[i];
+		if (ret < 0)
+			continue;
+		buf[ret] = '\0';
+
+		/* For strings, replace any non-printable characters with '.' */
+		if (!array) {
+			for (int x = 0; x < ret; x++) {
+				if (!isprint(buf[x]))
+					buf[x] = '.';
+			}
+
+			size = min(buf_size, SYSCALL_FAULT_USER_MAX);
+
+			/*
+			 * If the text was truncated due to our max limit,
+			 * add "..." to the string.
+			 */
+			if (ret > size) {
+				strscpy(buf + size, EXTRA, sizeof(EXTRA));
+				ret = size + sizeof(EXTRA);
+			} else {
+				buf[ret++] = '\0';
+			}
+		} else {
+			ret = min((unsigned int)ret, buf_size);
+		}
+		data_size[i] = ret;
+	}
+
+	return buffer;
+}
+
+static int
+syscall_get_data(struct syscall_metadata *sys_data, unsigned long *args,
+		 char **buffer, int *size, int *user_sizes, int *uargs,
+		 int buf_size)
+{
+	struct syscall_user_buffer *sbuf;
+	int i;
+
+	/* If the syscall_buffer is NULL, tracing is being shutdown */
+	sbuf = READ_ONCE(syscall_buffer);
+	if (!sbuf)
+		return -1;
+
+	*buffer = sys_fault_user(buf_size, sys_data, sbuf, args, user_sizes);
+	/*
+	 * user_size is the amount of data to append.
+	 * Need to add 4 for the meta field that points to
+	 * the user memory at the end of the event and also
+	 * stores its size.
+	 */
+	for (i = 0; i < SYSCALL_FAULT_MAX_CNT; i++) {
+		if (user_sizes[i] < 0)
+			break;
+		*size += user_sizes[i] + 4;
+	}
+	/* Save the number of user read arguments of this syscall */
+	*uargs = i;
+	return 0;
+}
+
+static void syscall_put_data(struct syscall_metadata *sys_data,
+			     struct syscall_trace_enter *entry,
+			     char *buffer, int size, int *user_sizes, int uargs)
+{
+	char *buf = buffer;
+	void *ptr;
+	int val;
+
+	/*
+	 * Set the pointer to point to the meta data of the event
+	 * that has information about the stored user space memory.
+	 */
+	ptr = (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args;
+
+	/*
+	 * The meta data will store the offset of the user data from
+	 * the beginning of the event. That is after the static arguments
+	 * and the meta data fields.
+	 */
+	val = (ptr - (void *)entry) + 4 * uargs;
+
+	for (int i = 0; i < uargs; i++) {
+
+		if (i)
+			val += user_sizes[i - 1];
+
+		/* Store the offset and the size into the meta data */
+		*(int *)ptr = val | (user_sizes[i] << 16);
+
+		/* Skip the meta data */
+		ptr += 4;
+	}
+
+	for (int i = 0; i < uargs; i++, buf += SYSCALL_FAULT_ARG_SZ) {
+		/* Nothing to do if the user space was empty or faulted */
+		if (!user_sizes[i])
+			continue;
+
+		memcpy(ptr, buf, user_sizes[i]);
+		ptr += user_sizes[i];
+	}
 }

 static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
···
 	struct syscall_metadata *sys_data;
 	struct trace_event_buffer fbuffer;
 	unsigned long args[6];
+	char *user_ptr;
+	int user_sizes[SYSCALL_FAULT_MAX_CNT] = {};
 	int syscall_nr;
-	int size;
+	int size = 0;
+	int uargs = 0;
+	bool mayfault;

 	/*
 	 * Syscall probe called with preemption enabled, but the ring
 	 * buffer and per-cpu data require preemption to be disabled.
 	 */
 	might_fault();
-	guard(preempt_notrace)();

 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
···
 	if (!sys_data)
 		return;

-	size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
+	/* Check if this syscall event faults in user space memory */
+	mayfault = sys_data->user_mask != 0;
+
+	guard(preempt_notrace)();
+
+	syscall_get_arguments(current, regs, args);
+
+	if (mayfault) {
+		if (syscall_get_data(sys_data, args, &user_ptr,
+				     &size, user_sizes, &uargs, tr->syscall_buf_sz) < 0)
+			return;
+	}
+
+	size += sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;

 	entry = trace_event_buffer_reserve(&fbuffer, trace_file, size);
 	if (!entry)
···

 	entry = ring_buffer_event_data(fbuffer.event);
 	entry->nr = syscall_nr;
-	syscall_get_arguments(current, regs, args);
+
 	memcpy(entry->args, args, sizeof(unsigned long) * sys_data->nb_args);
+
+	if (mayfault)
+		syscall_put_data(sys_data, entry, user_ptr, size, user_sizes, uargs);

 	trace_event_buffer_commit(&fbuffer);
 }
···
 static int reg_event_syscall_enter(struct trace_event_file *file,
 				   struct trace_event_call *call)
 {
+	struct syscall_metadata *sys_data = call->data;
 	struct trace_array *tr = file->tr;
 	int ret = 0;
 	int num;

-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	num = sys_data->syscall_nr;
 	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return -ENOSYS;
-	mutex_lock(&syscall_trace_lock);
-	if (!tr->sys_refcount_enter)
-		ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
-	if (!ret) {
-		WRITE_ONCE(tr->enter_syscall_files[num], file);
-		tr->sys_refcount_enter++;
+	guard(mutex)(&syscall_trace_lock);
+	if (sys_data->user_mask) {
+		ret = syscall_fault_buffer_enable();
+		if (ret < 0)
+			return ret;
 	}
-	mutex_unlock(&syscall_trace_lock);
-	return ret;
+	if (!tr->sys_refcount_enter) {
+		ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
+		if (ret < 0) {
+			if (sys_data->user_mask)
+				syscall_fault_buffer_disable();
+			return ret;
+		}
+	}
+	WRITE_ONCE(tr->enter_syscall_files[num], file);
+	tr->sys_refcount_enter++;
+	return 0;
 }

 static void unreg_event_syscall_enter(struct trace_event_file *file,
 				      struct trace_event_call *call)
 {
+	struct syscall_metadata *sys_data = call->data;
 	struct trace_array *tr = file->tr;
 	int num;

-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	num = sys_data->syscall_nr;
 	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return;
-	mutex_lock(&syscall_trace_lock);
+	guard(mutex)(&syscall_trace_lock);
 	tr->sys_refcount_enter--;
 	WRITE_ONCE(tr->enter_syscall_files[num], NULL);
 	if (!tr->sys_refcount_enter)
 		unregister_trace_sys_enter(ftrace_syscall_enter, tr);
-	mutex_unlock(&syscall_trace_lock);
+	if (sys_data->user_mask)
+		syscall_fault_buffer_disable();
 }

 static int reg_event_syscall_exit(struct trace_event_file *file,
···
 	mutex_unlock(&syscall_trace_lock);
 }

+/*
+ * For system calls that reference user space memory that can
+ * be recorded into the event, set the system call meta data's user_mask
+ * to the "args" index that points to the user space memory to retrieve.
+ */
+static void check_faultable_syscall(struct trace_event_call *call, int nr)
+{
+	struct syscall_metadata *sys_data = call->data;
+	unsigned long mask;
+
+	/* Only work on entry */
+	if (sys_data->enter_event != call)
+		return;
+
+	sys_data->user_arg_size = -1;
+
+	switch (nr) {
+	/* user arg 1 with size arg at 2 */
+	case __NR_write:
+#ifdef __NR_mq_timedsend
+	case __NR_mq_timedsend:
+#endif
+	case __NR_pwrite64:
+		sys_data->user_mask = BIT(1);
+		sys_data->user_arg_size = 2;
+		break;
+	/* user arg 0 with size arg at 1 as string */
+	case __NR_setdomainname:
+	case __NR_sethostname:
+		sys_data->user_mask = BIT(0);
+		sys_data->user_arg_size = 1;
+		sys_data->user_arg_is_str = 1;
+		break;
+#ifdef __NR_kexec_file_load
+	/* user arg 4 with size arg at 3 as string */
+	case __NR_kexec_file_load:
+		sys_data->user_mask = BIT(4);
+		sys_data->user_arg_size = 3;
+		sys_data->user_arg_is_str = 1;
+		break;
+#endif
+	/* user arg at position 0 */
+#ifdef __NR_access
+	case __NR_access:
+#endif
+	case __NR_acct:
+	case __NR_chdir:
+#ifdef __NR_chown
+	case __NR_chown:
+#endif
+#ifdef __NR_chmod
+	case __NR_chmod:
+#endif
+	case __NR_chroot:
+#ifdef __NR_creat
+	case __NR_creat:
+#endif
+	case __NR_delete_module:
+	case __NR_execve:
+	case __NR_fsopen:
+#ifdef __NR_lchown
+	case __NR_lchown:
+#endif
+#ifdef __NR_open
+	case __NR_open:
+#endif
+	case __NR_memfd_create:
+#ifdef __NR_mkdir
+	case __NR_mkdir:
+#endif
+#ifdef __NR_mknod
+	case __NR_mknod:
+#endif
+	case __NR_mq_open:
+	case __NR_mq_unlink:
+#ifdef __NR_readlink
+	case __NR_readlink:
+#endif
+#ifdef __NR_rmdir
+	case __NR_rmdir:
+#endif
+	case __NR_shmdt:
+#ifdef __NR_statfs
+	case __NR_statfs:
+#endif
+	case __NR_swapon:
+	case __NR_swapoff:
+#ifdef __NR_truncate
+	case __NR_truncate:
+#endif
+#ifdef __NR_unlink
+	case __NR_unlink:
+#endif
+	case __NR_umount2:
+#ifdef __NR_utime
+	case __NR_utime:
+#endif
+#ifdef __NR_utimes
+	case __NR_utimes:
+#endif
+		sys_data->user_mask = BIT(0);
+		break;
+	/* user arg at position 1 */
+	case __NR_execveat:
+	case __NR_faccessat:
+	case __NR_faccessat2:
+	case __NR_finit_module:
+	case __NR_fchmodat:
+	case __NR_fchmodat2:
+	case __NR_fchownat:
+	case __NR_fgetxattr:
+	case __NR_flistxattr:
+	case __NR_fsetxattr:
+	case __NR_fspick:
+	case __NR_fremovexattr:
+#ifdef __NR_futimesat
+	case __NR_futimesat:
+#endif
+	case __NR_inotify_add_watch:
+	case __NR_mkdirat:
+	case __NR_mknodat:
+	case __NR_mount_setattr:
+	case __NR_name_to_handle_at:
+#ifdef __NR_newfstatat
+	case __NR_newfstatat:
+#endif
+	case __NR_openat:
+	case __NR_openat2:
+	case __NR_open_tree:
+	case __NR_open_tree_attr:
+	case __NR_readlinkat:
+	case __NR_quotactl:
+	case __NR_syslog:
+	case __NR_statx:
+	case __NR_unlinkat:
+#ifdef __NR_utimensat
+	case __NR_utimensat:
+#endif
+		sys_data->user_mask = BIT(1);
+		break;
+	/* user arg at position 2 */
+	case __NR_init_module:
+	case __NR_fsconfig:
+		sys_data->user_mask = BIT(2);
+		break;
+	/* user arg at position 4 */
+	case __NR_fanotify_mark:
+		sys_data->user_mask = BIT(4);
+		break;
+	/* 2 user args, 0 and 1 */
+	case __NR_add_key:
+	case __NR_getxattr:
+	case __NR_lgetxattr:
+	case __NR_lremovexattr:
+#ifdef __NR_link
+	case __NR_link:
+#endif
+	case __NR_listxattr:
+	case __NR_llistxattr:
+	case __NR_lsetxattr:
+	case __NR_pivot_root:
+	case __NR_removexattr:
+#ifdef __NR_rename
+	case __NR_rename:
+#endif
+	case __NR_request_key:
+	case __NR_setxattr:
+#ifdef __NR_symlink
+	case __NR_symlink:
+#endif
+		sys_data->user_mask = BIT(0) | BIT(1);
+		break;
+	/* 2 user args, 0 and 2 */
+	case __NR_symlinkat:
+		sys_data->user_mask = BIT(0) | BIT(2);
+		break;
+	/* 2 user args, 1 and 3 */
+	case __NR_getxattrat:
+	case __NR_linkat:
+	case __NR_listxattrat:
+	case __NR_move_mount:
+#ifdef __NR_renameat
+	case __NR_renameat:
+#endif
+	case __NR_renameat2:
+	case __NR_removexattrat:
+	case __NR_setxattrat:
+		sys_data->user_mask = BIT(1) | BIT(3);
+		break;
+	case __NR_mount: /* Just dev_name and dir_name, TODO add type */
+		sys_data->user_mask = BIT(0) | BIT(1) | BIT(2);
+		break;
+	default:
+		sys_data->user_mask = 0;
+		return;
+	}
+
+	if (sys_data->user_arg_size < 0)
+		return;
+
+	/*
+	 * The user_arg_size can only be used when the system call
+	 * is reading only a single address from user space.
+	 */
+	mask = sys_data->user_mask;
+	if (WARN_ON(mask & (mask - 1)))
+		sys_data->user_arg_size = -1;
+}
+
 static int __init init_syscall_trace(struct trace_event_call *call)
 {
 	int id;
···
 			((struct syscall_metadata *)call->data)->name);
 		return -ENOSYS;
 	}
+
+	check_faultable_syscall(call, num);

 	if (set_syscall_print_fmt(call) < 0)
 		return -ENOMEM;
···
 	struct hlist_head *head;
 	unsigned long args[6];
 	bool valid_prog_array;
+	bool mayfault;
+	char *user_ptr;
+	int user_sizes[SYSCALL_FAULT_MAX_CNT] = {};
+	int buf_size = CONFIG_TRACE_SYSCALL_BUF_SIZE_DEFAULT;
 	int syscall_nr;
 	int rctx;
-	int size;
+	int size = 0;
+	int uargs = 0;

 	/*
 	 * Syscall probe called with preemption enabled, but the ring
···
 	if (!sys_data)
 		return;

+	syscall_get_arguments(current, regs, args);
+
+	/* Check if this syscall event faults in user space memory */
+	mayfault = sys_data->user_mask != 0;
+
+	if (mayfault) {
+		if (syscall_get_data(sys_data, args, &user_ptr,
+				     &size, user_sizes, &uargs, buf_size) < 0)
+			return;
+	}
+
 	head = this_cpu_ptr(sys_data->enter_event->perf_events);
 	valid_prog_array = bpf_prog_array_valid(sys_data->enter_event);
 	if (!valid_prog_array && hlist_empty(head))
 		return;

 	/* get the size after alignment with the u32 buffer size field */
-	size = sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec);
+	size += sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec);
 	size = ALIGN(size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);

···
 		return;

 	rec->nr = syscall_nr;
-	syscall_get_arguments(current, regs, args);
 	memcpy(&rec->args, args, sizeof(unsigned long) * sys_data->nb_args);
+
+	if (mayfault)
+		syscall_put_data(sys_data, rec, user_ptr, size, user_sizes, uargs);

 	if ((valid_prog_array &&
 	     !perf_call_bpf_enter(sys_data->enter_event, fake_regs, sys_data, rec)) ||
···

 static int perf_sysenter_enable(struct trace_event_call *call)
 {
-	int ret = 0;
+	struct syscall_metadata *sys_data = call->data;
 	int num;
+	int ret;

-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	num = sys_data->syscall_nr;

-	mutex_lock(&syscall_trace_lock);
-	if (!sys_perf_refcount_enter)
-		ret = register_trace_sys_enter(perf_syscall_enter, NULL);
-	if (ret) {
-		pr_info("event trace: Could not activate syscall entry trace point");
-	} else {
-		set_bit(num, enabled_perf_enter_syscalls);
-		sys_perf_refcount_enter++;
+	guard(mutex)(&syscall_trace_lock);
+	if (sys_data->user_mask) {
+		ret = syscall_fault_buffer_enable();
+		if (ret < 0)
+			return ret;
 	}
-	mutex_unlock(&syscall_trace_lock);
-	return ret;
+	if (!sys_perf_refcount_enter) {
+		ret = register_trace_sys_enter(perf_syscall_enter, NULL);
+		if (ret) {
+			pr_info("event trace: Could not activate syscall entry trace point");
+			if (sys_data->user_mask)
+				syscall_fault_buffer_disable();
+			return ret;
+		}
+	}
+	set_bit(num, enabled_perf_enter_syscalls);
+	sys_perf_refcount_enter++;
+	return 0;
 }

 static void perf_sysenter_disable(struct trace_event_call *call)
 {
+	struct syscall_metadata *sys_data = call->data;
 	int num;

-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	num = sys_data->syscall_nr;

-	mutex_lock(&syscall_trace_lock);
+	guard(mutex)(&syscall_trace_lock);
 	sys_perf_refcount_enter--;
 	clear_bit(num, enabled_perf_enter_syscalls);
 	if (!sys_perf_refcount_enter)
 		unregister_trace_sys_enter(perf_syscall_enter, NULL);
-	mutex_unlock(&syscall_trace_lock);
+	if (sys_data->user_mask)
+		syscall_fault_buffer_disable();
 }

 static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_regs *regs,
···

 static int perf_sysexit_enable(struct trace_event_call *call)
 {
-	int ret = 0;
 	int num;

 	num = ((struct syscall_metadata *)call->data)->syscall_nr;

-	mutex_lock(&syscall_trace_lock);
-	if (!sys_perf_refcount_exit)
-		ret = register_trace_sys_exit(perf_syscall_exit, NULL);
-	if (ret) {
-		pr_info("event trace: Could not activate syscall exit trace point");
-	} else {
-		set_bit(num, enabled_perf_exit_syscalls);
-		sys_perf_refcount_exit++;
+	guard(mutex)(&syscall_trace_lock);
+	if (!sys_perf_refcount_exit) {
+		int ret = register_trace_sys_exit(perf_syscall_exit, NULL);
+		if (ret) {
+			pr_info("event trace: Could not activate syscall exit trace point");
+			return ret;
+		}
 	}
-	mutex_unlock(&syscall_trace_lock);
-	return ret;
+	set_bit(num, enabled_perf_exit_syscalls);
+	sys_perf_refcount_exit++;
+	return 0;
 }

 static void perf_sysexit_disable(struct trace_event_call *call)
···

 	num = ((struct syscall_metadata *)call->data)->syscall_nr;

-	mutex_lock(&syscall_trace_lock);
+	guard(mutex)(&syscall_trace_lock);
 	sys_perf_refcount_exit--;
 	clear_bit(num, enabled_perf_exit_syscalls);
 	if (!sys_perf_refcount_exit)
 		unregister_trace_sys_exit(perf_syscall_exit, NULL);
-	mutex_unlock(&syscall_trace_lock);
 }

 #endif /* CONFIG_PERF_EVENTS */
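The event layout that syscall_put_data() builds in this diff can be hard to picture from the pointer arithmetic alone: the static arguments come first, then one 4-byte meta word per user-space argument, then the copied payloads. Each meta word stores the payload's offset from the start of the event in its low 16 bits and the payload's size in its high 16 bits. The following is a minimal user-space sketch of that packing, not the kernel code. The names data_loc_pack(), data_loc_offset(), data_loc_len(), event_append_user_data() and MAX_USER_ARGS are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative limit, mirroring SYSCALL_FAULT_MAX_CNT (3) in the diff. */
#define MAX_USER_ARGS 3

/* Pack offset (low 16 bits) and length (high 16 bits) into one meta word. */
static uint32_t data_loc_pack(uint16_t offset, uint16_t len)
{
	return (uint32_t)offset | ((uint32_t)len << 16);
}

/* Extract the offset from a meta word. */
static uint16_t data_loc_offset(uint32_t word)
{
	return (uint16_t)(word & 0xffff);
}

/* Extract the length from a meta word. */
static uint16_t data_loc_len(uint32_t word)
{
	return (uint16_t)(word >> 16);
}

/*
 * Hypothetical helper: append `count` payloads to `event`, whose first
 * `static_size` bytes are fixed fields. Layout written:
 *   [static fields][meta word 0..count-1][payload 0][payload 1]...
 * Returns the total number of bytes used in the event.
 */
static size_t event_append_user_data(unsigned char *event, size_t event_sz,
				     size_t static_size,
				     const char *const data[],
				     const uint16_t sizes[], int count)
{
	unsigned char *meta = event + static_size;
	/* The first payload starts after all of the meta words. */
	size_t off = static_size + 4 * (size_t)count;

	assert(count >= 0 && count <= MAX_USER_ARGS);

	for (int i = 0; i < count; i++) {
		uint32_t word;

		/* Offsets must fit in 16 bits and payloads in the event. */
		assert(off <= UINT16_MAX);
		assert(off + sizes[i] <= event_sz);

		word = data_loc_pack((uint16_t)off, sizes[i]);
		memcpy(meta + 4 * (size_t)i, &word, sizeof(word));
		memcpy(event + off, data[i], sizes[i]);
		off += sizes[i];
	}
	return off;
}
```

With 8 bytes of static fields and two payloads of 12 and 3 bytes (including their terminating NUL), the meta words sit at offsets 8 and 12 and the payloads at offsets 16 and 28. Because the sketch keeps each size in 16 bits, an offset or size above 65535 would not fit, which is why it asserts on the offset.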