Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:
"Major changes:

- Changed location of tracing repo from personal git repo to:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git

- Added Masami Hiramatsu as co-maintainer

- Updated MAINTAINERS file to separate out FTRACE as it is more than
just TRACING.

Minor changes:

- Added Mark Rutland as FTRACE reviewer

- Updated user_events to put it on its way to having the BROKEN tag
removed. The changes should now be acceptable, but they will run
through a full cycle, and hopefully the BROKEN tag can be removed next
release.

- Added filtering to eprobes

- Added a delta time to the benchmark trace event

- Have the histogram and filter callbacks called via a switch
statement instead of indirect function calls, which speeds them up by
avoiding retpolines.

- Add a way to wake up ring buffer waiters waiting for the ring
buffer to fill up to its watermark.

- New ioctl() on the trace_pipe_raw file to wake up ring buffer
waiters.

- Wake up waiters when the ring buffer is disabled. A reader may
block waiting on the ring buffer, and if the ring buffer is disabled
while it is blocked, it should be woken up.

Fixes:

- Allow splice to read partially read ring buffer pages. This fixes
splice never moving forward.

- Fix inverted compare that made the "shortest" ring buffer wait
queue actually the longest.

- Fix a race in the ring buffer between the reader and a page being
reset while the writer moves on to another page.

- Fix ftrace accounting bug when function hooks are added at boot up
before the weak functions are set to "disabled".

- Fix a bug that freed a user-allocated snapshot buffer when enabling
a tracer.

- Fix possible recursive locks in osnoise tracer

- Fix recursive locking in direct functions

- Other minor clean ups and fixes"

* tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
ftrace: Create separate entry in MAINTAINERS for function hooks
tracing: Update MAINTAINERS to reflect new tracing git repo
tracing: Do not free snapshot if tracer is on cmdline
ftrace: Still disable enabled records marked as disabled
tracing/user_events: Move pages/locks into groups to prepare for namespaces
tracing: Add Masami Hiramatsu as co-maintainer
tracing: Remove unused variable 'dups'
MAINTAINERS: add myself as a tracing reviewer
ring-buffer: Fix race between reset page and reading page
tracing/user_events: Update ABI documentation to align to bits vs bytes
tracing/user_events: Use bits vs bytes for enabled status page data
tracing/user_events: Use refcount instead of atomic for ref tracking
tracing/user_events: Ensure user provided strings are safely formatted
tracing/user_events: Use WRITE instead of READ for io vector import
tracing/user_events: Use NULL for strstr checks
tracing: Fix spelling mistake "preapre" -> "prepare"
tracing: Wake up waiters when tracing is disabled
tracing: Add ioctl() to force ring buffer waiters to wake up
tracing: Wake up ring buffer waiters on closing of the file
ring-buffer: Add ring_buffer_wake_waiters()
...

+1302 -489
+58 -28
Documentation/trace/user_events.rst
··· 20 20 21 21 Typically programs will register a set of events that they wish to expose to 22 22 tools that can read trace_events (such as ftrace and perf). The registration 23 - process gives back two ints to the program for each event. The first int is the 24 - status index. This index describes which byte in the 23 + process gives back two ints to the program for each event. The first int is 24 + the status bit. This describes which bit in little-endian format in the 25 25 /sys/kernel/debug/tracing/user_events_status file represents this event. The 26 - second int is the write index. This index describes the data when a write() or 26 + second int is the write index which describes the data when a write() or 27 27 writev() is called on the /sys/kernel/debug/tracing/user_events_data file. 28 28 29 - The structures referenced in this document are contained with the 30 - /include/uap/linux/user_events.h file in the source tree. 29 + The structures referenced in this document are contained within the 30 + /include/uapi/linux/user_events.h file in the source tree. 31 31 32 32 **NOTE:** *Both user_events_status and user_events_data are under the tracefs 33 33 filesystem and may be mounted at different paths than above.* ··· 38 38 /sys/kernel/debug/tracing/user_events_data file. The command to issue is 39 39 DIAG_IOCSREG. 40 40 41 - This command takes a struct user_reg as an argument:: 41 + This command takes a packed struct user_reg as an argument:: 42 42 43 43 struct user_reg { 44 44 u32 size; 45 45 u64 name_args; 46 - u32 status_index; 46 + u32 status_bit; 47 47 u32 write_index; 48 48 }; 49 49 50 50 The struct user_reg requires two inputs, the first is the size of the structure 51 51 to ensure forward and backward compatibility. The second is the command string 52 - to issue for registering. Upon success two outputs are set, the status index 52 + to issue for registering. Upon success two outputs are set, the status bit 53 53 and the write index. 54 54 55 55 User based events show up under tracefs like any other event under the ··· 111 111 writev() calls when something is actively attached to the event. 112 112 113 113 User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to 114 - check the status for each event that is registered. The byte to check in the 115 - file is given back after the register ioctl() via user_reg.status_index. 114 + check the status for each event that is registered. The bit to check in the 115 + file is given back after the register ioctl() via user_reg.status_bit. The bit 116 + is always in little-endian format. Programs can check if the bit is set either 117 + using a byte-wise index with a mask or a long-wise index with a little-endian 118 + mask. 119 + 116 120 Currently the size of user_events_status is a single page, however, custom 117 121 kernel configurations can change this size to allow more user based events. In 118 122 all cases the size of the file is a multiple of a page size. 119 123 120 - For example, if the register ioctl() gives back a status_index of 3 you would 121 - check byte 3 of the returned mmap data to see if anything is attached to that 122 - event. 124 + For example, if the register ioctl() gives back a status_bit of 3 you would 125 + check byte 0 (3 / 8) of the returned mmap data and then AND the result with 8 126 + (1 << (3 % 8)) to see if anything is attached to that event. 
127 + 128 + A byte-wise index check is performed as follows:: 129 + 130 + int index, mask; 131 + char *status_page; 132 + 133 + index = status_bit / 8; 134 + mask = 1 << (status_bit % 8); 135 + 136 + ... 137 + 138 + if (status_page[index] & mask) { 139 + /* Enabled */ 140 + } 141 + 142 + A long-wise index check is performed as follows:: 143 + 144 + #include <asm/bitsperlong.h> 145 + #include <endian.h> 146 + 147 + #if __BITS_PER_LONG == 64 148 + #define endian_swap(x) htole64(x) 149 + #else 150 + #define endian_swap(x) htole32(x) 151 + #endif 152 + 153 + long index, mask, *status_page; 154 + 155 + index = status_bit / __BITS_PER_LONG; 156 + mask = 1L << (status_bit % __BITS_PER_LONG); 157 + mask = endian_swap(mask); 158 + 159 + ... 160 + 161 + if (status_page[index] & mask) { 162 + /* Enabled */ 163 + } 123 164 124 165 Administrators can easily check the status of all registered events by reading 125 166 the user_events_status file directly via a terminal. The output is as follows:: ··· 178 137 179 138 Active: 1 180 139 Busy: 0 181 - Max: 4096 140 + Max: 32768 182 141 183 142 If a user enables the user event via ftrace, the output would change to this:: 184 143 ··· 186 145 187 146 Active: 1 188 147 Busy: 1 189 - Max: 4096 148 + Max: 32768 190 149 191 - **NOTE:** *A status index of 0 will never be returned. This allows user 192 - programs to have an index that can be used on error cases.* 193 - 194 - Status Bits 195 - ^^^^^^^^^^^ 196 - The byte being checked will be non-zero if anything is attached. Programs can 197 - check specific bits in the byte to see what mechanism has been attached. 198 - 199 - The following values are defined to aid in checking what has been attached: 200 - 201 - **EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0). 202 - 203 - **EVENT_STATUS_PERF** - Bit set if perf has been attached (Bit 1). 150 + **NOTE:** *A status bit of 0 will never be returned. This allows user programs 151 + to have a bit that can be used on error cases.* 204 152 205 153 Writing Data 206 154 ------------
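
As a quick illustration of the updated ABI described above, here is a minimal
user-space sketch that registers a hypothetical event and performs the
documented byte-wise status check. It assumes the default tracefs mount point
shown in the document, elides error handling, and the event definition string
and names are illustrative only:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/user_events.h>

    int main(void)
    {
    	struct user_reg reg = {0};
    	long page_size = sysconf(_SC_PAGESIZE);
    	char *status_page;
    	int data_fd, status_fd;

    	/* Hypothetical event: one u32 payload field named "count" */
    	reg.size = sizeof(reg);
    	reg.name_args = (__u64)(unsigned long)"example_event u32 count";

    	data_fd = open("/sys/kernel/debug/tracing/user_events_data", O_RDWR);
    	if (data_fd < 0 || ioctl(data_fd, DIAG_IOCSREG, &reg) < 0)
    		return 1;

    	status_fd = open("/sys/kernel/debug/tracing/user_events_status",
    			 O_RDONLY);
    	status_page = mmap(NULL, page_size, PROT_READ, MAP_SHARED,
    			   status_fd, 0);

    	/* Byte-wise check of the little-endian status bit, as documented */
    	if (status_page[reg.status_bit / 8] & (1 << (reg.status_bit % 8)))
    		printf("event enabled, write_index=%u\n", reg.write_index);

    	return 0;
    }
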
+18 -8
MAINTAINERS
··· 8433 8433 S: Maintained 8434 8434 F: drivers/platform/x86/fujitsu-tablet.c 8435 8435 8436 + FUNCTION HOOKS (FTRACE) 8437 + M: Steven Rostedt <rostedt@goodmis.org> 8438 + M: Masami Hiramatsu <mhiramat@kernel.org> 8439 + R: Mark Rutland <mark.rutland@arm.com> 8440 + S: Maintained 8441 + T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git 8442 + F: Documentation/trace/ftrace* 8443 + F: kernel/trace/ftrace* 8444 + F: kernel/trace/fgraph.c 8445 + F: arch/*/*/*/*ftrace* 8446 + F: arch/*/*/*ftrace* 8447 + F: include/*/ftrace.h 8448 + 8436 8449 FUNGIBLE ETHERNET DRIVERS 8437 8450 M: Dimitris Michailidis <dmichail@fungible.com> 8438 8451 L: netdev@vger.kernel.org ··· 11435 11422 M: "David S. Miller" <davem@davemloft.net> 11436 11423 M: Masami Hiramatsu <mhiramat@kernel.org> 11437 11424 S: Maintained 11438 - T: git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git 11425 + T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git 11439 11426 F: Documentation/trace/kprobes.rst 11440 11427 F: include/asm-generic/kprobes.h 11441 11428 F: include/linux/kprobes.h ··· 20784 20771 20785 20772 TRACING 20786 20773 M: Steven Rostedt <rostedt@goodmis.org> 20787 - M: Ingo Molnar <mingo@redhat.com> 20774 + M: Masami Hiramatsu <mhiramat@kernel.org> 20788 20775 S: Maintained 20789 - T: git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git 20790 - F: Documentation/trace/ftrace.rst 20791 - F: arch/*/*/*/*ftrace* 20792 - F: arch/*/*/*ftrace* 20776 + T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git 20777 + F: Documentation/trace/* 20793 20778 F: fs/tracefs/ 20794 - F: include/*/ftrace.h 20795 20779 F: include/linux/trace*.h 20796 20780 F: include/trace/ 20797 20781 F: kernel/trace/ ··· 20797 20787 20798 20788 TRACING MMIO ACCESSES (MMIOTRACE) 20799 20789 M: Steven Rostedt <rostedt@goodmis.org> 20800 - M: Ingo Molnar <mingo@kernel.org> 20790 + M: Masami Hiramatsu <mhiramat@kernel.org> 20801 20791 R: Karol Herbst <karolherbst@gmail.com> 20802 20792 R: Pekka Paalanen <ppaalanen@gmail.com> 20803 20793 L: linux-kernel@vger.kernel.org
-1
arch/x86/include/asm/ftrace.h
··· 23 23 #define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR 24 24 25 25 #ifndef __ASSEMBLY__ 26 - extern atomic_t modifying_ftrace_code; 27 26 extern void __fentry__(void); 28 27 29 28 static inline unsigned long ftrace_call_adjust(unsigned long addr)
-2
arch/x86/include/asm/kprobes.h
··· 50 50 51 51 void arch_remove_kprobe(struct kprobe *p); 52 52 53 - extern void arch_kprobe_override_function(struct pt_regs *regs); 54 - 55 53 /* Architecture specific copy of original instruction*/ 56 54 struct arch_specific_insn { 57 55 /* copy of the original instruction */
-2
arch/x86/kernel/kprobes/core.c
··· 59 59 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL; 60 60 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); 61 61 62 - #define stack_addr(regs) ((unsigned long *)regs->sp) 63 - 64 62 #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\ 65 63 (((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \ 66 64 (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) | \
-41
include/linux/ftrace.h
··· 1122 1122 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */ 1123 1123 1124 1124 #ifdef CONFIG_TRACING 1125 - 1126 - /* flags for current->trace */ 1127 - enum { 1128 - TSK_TRACE_FL_TRACE_BIT = 0, 1129 - TSK_TRACE_FL_GRAPH_BIT = 1, 1130 - }; 1131 - enum { 1132 - TSK_TRACE_FL_TRACE = 1 << TSK_TRACE_FL_TRACE_BIT, 1133 - TSK_TRACE_FL_GRAPH = 1 << TSK_TRACE_FL_GRAPH_BIT, 1134 - }; 1135 - 1136 - static inline void set_tsk_trace_trace(struct task_struct *tsk) 1137 - { 1138 - set_bit(TSK_TRACE_FL_TRACE_BIT, &tsk->trace); 1139 - } 1140 - 1141 - static inline void clear_tsk_trace_trace(struct task_struct *tsk) 1142 - { 1143 - clear_bit(TSK_TRACE_FL_TRACE_BIT, &tsk->trace); 1144 - } 1145 - 1146 - static inline int test_tsk_trace_trace(struct task_struct *tsk) 1147 - { 1148 - return tsk->trace & TSK_TRACE_FL_TRACE; 1149 - } 1150 - 1151 - static inline void set_tsk_trace_graph(struct task_struct *tsk) 1152 - { 1153 - set_bit(TSK_TRACE_FL_GRAPH_BIT, &tsk->trace); 1154 - } 1155 - 1156 - static inline void clear_tsk_trace_graph(struct task_struct *tsk) 1157 - { 1158 - clear_bit(TSK_TRACE_FL_GRAPH_BIT, &tsk->trace); 1159 - } 1160 - 1161 - static inline int test_tsk_trace_graph(struct task_struct *tsk) 1162 - { 1163 - return tsk->trace & TSK_TRACE_FL_GRAPH; 1164 - } 1165 - 1166 1125 enum ftrace_dump_mode; 1167 1126 1168 1127 extern enum ftrace_dump_mode ftrace_dump_on_oops;
+1 -1
include/linux/ring_buffer.h
··· 101 101 int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full); 102 102 __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu, 103 103 struct file *filp, poll_table *poll_table); 104 - 104 + void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu); 105 105 106 106 #define RING_BUFFER_ALL_CPUS -1 107 107
-3
include/linux/sched.h
··· 1390 1390 #endif 1391 1391 1392 1392 #ifdef CONFIG_TRACING 1393 - /* State flags for use by tracers: */ 1394 - unsigned long trace; 1395 - 1396 1393 /* Bitmask and counter of trace recursion: */ 1397 1394 unsigned long trace_recursion; 1398 1395 #endif /* CONFIG_TRACING */
+1
include/linux/trace_events.h
··· 92 92 unsigned int temp_size; 93 93 char *fmt; /* modified format holder */ 94 94 unsigned int fmt_size; 95 + long wait_index; 95 96 96 97 /* trace_seq for __print_flags() and __print_symbolic() etc. */ 97 98 struct trace_seq tmp_seq;
+3 -12
include/linux/user_events.h
··· 20 20 #define USER_EVENTS_SYSTEM "user_events" 21 21 #define USER_EVENTS_PREFIX "u:" 22 22 23 - /* Bits 0-6 are for known probe types, Bit 7 is for unknown probes */ 24 - #define EVENT_BIT_FTRACE 0 25 - #define EVENT_BIT_PERF 1 26 - #define EVENT_BIT_OTHER 7 27 - 28 - #define EVENT_STATUS_FTRACE (1 << EVENT_BIT_FTRACE) 29 - #define EVENT_STATUS_PERF (1 << EVENT_BIT_PERF) 30 - #define EVENT_STATUS_OTHER (1 << EVENT_BIT_OTHER) 31 - 32 23 /* Create dynamic location entry within a 32-bit value */ 33 24 #define DYN_LOC(offset, size) ((size) << 16 | (offset)) 34 25 ··· 36 45 /* Input: Pointer to string with event name, description and flags */ 37 46 __u64 name_args; 38 47 39 - /* Output: Byte index of the event within the status page */ 40 - __u32 status_index; 48 + /* Output: Bitwise index of the event within the status page */ 49 + __u32 status_bit; 41 50 42 51 /* Output: Index of the event to use when writing data */ 43 52 __u32 write_index; 44 - }; 53 + } __attribute__((__packed__)); 45 54 46 55 #define DIAG_IOC_MAGIC '*' 47 56
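
To complement the header change above, here is a short sketch of how the
returned write_index is used when emitting data. My reading of the documented
ABI is that the write index is passed as the first element of the io vector,
followed by the payload; treat the details as an assumption and check the
user_events documentation:

    #include <sys/uio.h>
    #include <linux/types.h>

    /* Assumed ABI: iov[0] carries the write_index from DIAG_IOCSREG,
     * the remaining entries carry the event payload (here one u32). */
    static void write_sample(int data_fd, __u32 write_index, __u32 count)
    {
    	struct iovec io[2] = {
    		{ .iov_base = &write_index, .iov_len = sizeof(write_index) },
    		{ .iov_base = &count,       .iov_len = sizeof(count) },
    	};

    	writev(data_fd, io, 2);	/* data only reaches tracing when enabled */
    }
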
+27 -7
kernel/trace/ftrace.c
··· 1644 1644 static struct ftrace_ops * 1645 1645 ftrace_find_tramp_ops_next(struct dyn_ftrace *rec, struct ftrace_ops *ops); 1646 1646 1647 + static bool skip_record(struct dyn_ftrace *rec) 1648 + { 1649 + /* 1650 + * At boot up, weak functions are set to disable. Function tracing 1651 + * can be enabled before they are, and they still need to be disabled now. 1652 + * If the record is disabled, still continue if it is marked as already 1653 + * enabled (this is needed to keep the accounting working). 1654 + */ 1655 + return rec->flags & FTRACE_FL_DISABLED && 1656 + !(rec->flags & FTRACE_FL_ENABLED); 1657 + } 1658 + 1647 1659 static bool __ftrace_hash_rec_update(struct ftrace_ops *ops, 1648 1660 int filter_hash, 1649 1661 bool inc) ··· 1705 1693 int in_hash = 0; 1706 1694 int match = 0; 1707 1695 1708 - if (rec->flags & FTRACE_FL_DISABLED) 1696 + if (skip_record(rec)) 1709 1697 continue; 1710 1698 1711 1699 if (all) { ··· 2138 2126 2139 2127 ftrace_bug_type = FTRACE_BUG_UNKNOWN; 2140 2128 2141 - if (rec->flags & FTRACE_FL_DISABLED) 2129 + if (skip_record(rec)) 2142 2130 return FTRACE_UPDATE_IGNORE; 2143 2131 2144 2132 /* ··· 2253 2241 if (update) { 2254 2242 /* If there's no more users, clear all flags */ 2255 2243 if (!ftrace_rec_count(rec)) 2256 - rec->flags = 0; 2244 + rec->flags &= FTRACE_FL_DISABLED; 2257 2245 else 2258 2246 /* 2259 2247 * Just disable the record, but keep the ops TRAMP ··· 2646 2634 2647 2635 do_for_each_ftrace_rec(pg, rec) { 2648 2636 2649 - if (rec->flags & FTRACE_FL_DISABLED) 2637 + if (skip_record(rec)) 2650 2638 continue; 2651 2639 2652 2640 failed = __ftrace_replace_code(rec, enable); ··· 5439 5427 * it is safe to modify the ftrace record, where it should be 5440 5428 * currently calling @old_addr directly, to call @new_addr. 5441 5429 * 5430 + * This is called with direct_mutex locked. 5431 + * 5442 5432 * Safety checks should be made to make sure that the code at 5443 5433 * @rec->ip is currently calling @old_addr. And this must 5444 5434 * also update entry->direct to @new_addr. ··· 5452 5438 { 5453 5439 unsigned long ip = rec->ip; 5454 5440 int ret; 5441 + 5442 + lockdep_assert_held(&direct_mutex); 5455 5443 5456 5444 /* 5457 5445 * The ftrace_lock was used to determine if the record ··· 5477 5461 if (ret) 5478 5462 goto out_lock; 5479 5463 5480 - ret = register_ftrace_function(&stub_ops); 5464 + ret = register_ftrace_function_nolock(&stub_ops); 5481 5465 if (ret) { 5482 5466 ftrace_set_filter_ip(&stub_ops, ip, 1, 0); 5483 5467 goto out_lock; ··· 6097 6081 6098 6082 if (filter_hash) { 6099 6083 orig_hash = &iter->ops->func_hash->filter_hash; 6100 - if (iter->tr && !list_empty(&iter->tr->mod_trace)) 6101 - iter->hash->flags |= FTRACE_HASH_FL_MOD; 6084 + if (iter->tr) { 6085 + if (list_empty(&iter->tr->mod_trace)) 6086 + iter->hash->flags &= ~FTRACE_HASH_FL_MOD; 6087 + else 6088 + iter->hash->flags |= FTRACE_HASH_FL_MOD; 6089 + } 6102 6090 } else 6103 6091 orig_hash = &iter->ops->func_hash->notrace_hash; 6104 6092
+44 -5
kernel/trace/kprobe_event_gen_test.c
··· 35 35 static struct trace_event_file *gen_kprobe_test; 36 36 static struct trace_event_file *gen_kretprobe_test; 37 37 38 + #define KPROBE_GEN_TEST_FUNC "do_sys_open" 39 + 40 + /* X86 */ 41 + #if defined(CONFIG_X86_64) || defined(CONFIG_X86_32) 42 + #define KPROBE_GEN_TEST_ARG0 "dfd=%ax" 43 + #define KPROBE_GEN_TEST_ARG1 "filename=%dx" 44 + #define KPROBE_GEN_TEST_ARG2 "flags=%cx" 45 + #define KPROBE_GEN_TEST_ARG3 "mode=+4($stack)" 46 + 47 + /* ARM64 */ 48 + #elif defined(CONFIG_ARM64) 49 + #define KPROBE_GEN_TEST_ARG0 "dfd=%x0" 50 + #define KPROBE_GEN_TEST_ARG1 "filename=%x1" 51 + #define KPROBE_GEN_TEST_ARG2 "flags=%x2" 52 + #define KPROBE_GEN_TEST_ARG3 "mode=%x3" 53 + 54 + /* ARM */ 55 + #elif defined(CONFIG_ARM) 56 + #define KPROBE_GEN_TEST_ARG0 "dfd=%r0" 57 + #define KPROBE_GEN_TEST_ARG1 "filename=%r1" 58 + #define KPROBE_GEN_TEST_ARG2 "flags=%r2" 59 + #define KPROBE_GEN_TEST_ARG3 "mode=%r3" 60 + 61 + /* RISCV */ 62 + #elif defined(CONFIG_RISCV) 63 + #define KPROBE_GEN_TEST_ARG0 "dfd=%a0" 64 + #define KPROBE_GEN_TEST_ARG1 "filename=%a1" 65 + #define KPROBE_GEN_TEST_ARG2 "flags=%a2" 66 + #define KPROBE_GEN_TEST_ARG3 "mode=%a3" 67 + 68 + /* others */ 69 + #else 70 + #define KPROBE_GEN_TEST_ARG0 NULL 71 + #define KPROBE_GEN_TEST_ARG1 NULL 72 + #define KPROBE_GEN_TEST_ARG2 NULL 73 + #define KPROBE_GEN_TEST_ARG3 NULL 74 + #endif 75 + 76 + 38 77 /* 39 78 * Test to make sure we can create a kprobe event, then add more 40 79 * fields. ··· 97 58 * fields. 98 59 */ 99 60 ret = kprobe_event_gen_cmd_start(&cmd, "gen_kprobe_test", 100 - "do_sys_open", 101 - "dfd=%ax", "filename=%dx"); 61 + KPROBE_GEN_TEST_FUNC, 62 + KPROBE_GEN_TEST_ARG0, KPROBE_GEN_TEST_ARG1); 102 63 if (ret) 103 64 goto free; 104 65 105 66 /* Use kprobe_event_add_fields to add the rest of the fields */ 106 67 107 - ret = kprobe_event_add_fields(&cmd, "flags=%cx", "mode=+4($stack)"); 68 + ret = kprobe_event_add_fields(&cmd, KPROBE_GEN_TEST_ARG2, KPROBE_GEN_TEST_ARG3); 108 69 if (ret) 109 70 goto free; 110 71 ··· 167 128 * Define the kretprobe event. 168 129 */ 169 130 ret = kretprobe_event_gen_cmd_start(&cmd, "gen_kretprobe_test", 170 - "do_sys_open", 131 + KPROBE_GEN_TEST_FUNC, 171 132 "$retval"); 172 133 if (ret) 173 134 goto free; ··· 245 206 WARN_ON(kprobe_event_delete("gen_kprobe_test")); 246 207 247 208 /* Disable the event or you can't remove it */ 248 - WARN_ON(trace_array_set_clr_event(gen_kprobe_test->tr, 209 + WARN_ON(trace_array_set_clr_event(gen_kretprobe_test->tr, 249 210 "kprobes", 250 211 "gen_kretprobe_test", false)); 251 212
+84 -3
kernel/trace/ring_buffer.c
··· 413 413 struct irq_work work; 414 414 wait_queue_head_t waiters; 415 415 wait_queue_head_t full_waiters; 416 + long wait_index; 416 417 bool waiters_pending; 417 418 bool full_waiters_pending; 418 419 bool wakeup_full; ··· 918 917 struct rb_irq_work *rbwork = container_of(work, struct rb_irq_work, work); 919 918 920 919 wake_up_all(&rbwork->waiters); 921 - if (rbwork->wakeup_full) { 920 + if (rbwork->full_waiters_pending || rbwork->wakeup_full) { 922 921 rbwork->wakeup_full = false; 922 + rbwork->full_waiters_pending = false; 923 923 wake_up_all(&rbwork->full_waiters); 924 924 } 925 + } 926 + 927 + /** 928 + * ring_buffer_wake_waiters - wake up any waiters on this ring buffer 929 + * @buffer: The ring buffer to wake waiters on 930 + * 931 + * In the case of a file that represents a ring buffer is closing, 932 + * it is prudent to wake up any waiters that are on this. 933 + */ 934 + void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu) 935 + { 936 + struct ring_buffer_per_cpu *cpu_buffer; 937 + struct rb_irq_work *rbwork; 938 + 939 + if (cpu == RING_BUFFER_ALL_CPUS) { 940 + 941 + /* Wake up individual ones too. One level recursion */ 942 + for_each_buffer_cpu(buffer, cpu) 943 + ring_buffer_wake_waiters(buffer, cpu); 944 + 945 + rbwork = &buffer->irq_work; 946 + } else { 947 + cpu_buffer = buffer->buffers[cpu]; 948 + rbwork = &cpu_buffer->irq_work; 949 + } 950 + 951 + rbwork->wait_index++; 952 + /* make sure the waiters see the new index */ 953 + smp_wmb(); 954 + 955 + rb_wake_up_waiters(&rbwork->work); 925 956 } 926 957 927 958 /** ··· 971 938 struct ring_buffer_per_cpu *cpu_buffer; 972 939 DEFINE_WAIT(wait); 973 940 struct rb_irq_work *work; 941 + long wait_index; 974 942 int ret = 0; 975 943 976 944 /* ··· 990 956 work = &cpu_buffer->irq_work; 991 957 } 992 958 959 + wait_index = READ_ONCE(work->wait_index); 993 960 994 961 while (true) { 995 962 if (full) ··· 1046 1011 nr_pages = cpu_buffer->nr_pages; 1047 1012 dirty = ring_buffer_nr_dirty_pages(buffer, cpu); 1048 1013 if (!cpu_buffer->shortest_full || 1049 - cpu_buffer->shortest_full < full) 1014 + cpu_buffer->shortest_full > full) 1050 1015 cpu_buffer->shortest_full = full; 1051 1016 raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); 1052 1017 if (!pagebusy && ··· 1055 1020 } 1056 1021 1057 1022 schedule(); 1023 + 1024 + /* Make sure to see the new wait index */ 1025 + smp_rmb(); 1026 + if (wait_index != work->wait_index) 1027 + break; 1058 1028 } 1059 1029 1060 1030 if (full) ··· 2648 2608 /* Mark the rest of the page with padding */ 2649 2609 rb_event_set_padding(event); 2650 2610 2611 + /* Make sure the padding is visible before the write update */ 2612 + smp_wmb(); 2613 + 2651 2614 /* Set the write back to the previous setting */ 2652 2615 local_sub(length, &tail_page->write); 2653 2616 return; ··· 2661 2618 event->type_len = RINGBUF_TYPE_PADDING; 2662 2619 /* time delta must be non zero */ 2663 2620 event->time_delta = 1; 2621 + 2622 + /* Make sure the padding is visible before the tail_page->write update */ 2623 + smp_wmb(); 2664 2624 2665 2625 /* Set write to end of buffer */ 2666 2626 length = (tail + length) - BUF_PAGE_SIZE; ··· 4633 4587 arch_spin_unlock(&cpu_buffer->lock); 4634 4588 local_irq_restore(flags); 4635 4589 4590 + /* 4591 + * The writer has preempt disable, wait for it. 
But not forever 4592 + * Although, 1 second is pretty much "forever" 4593 + */ 4594 + #define USECS_WAIT 1000000 4595 + for (nr_loops = 0; nr_loops < USECS_WAIT; nr_loops++) { 4596 + /* If the write is past the end of page, a writer is still updating it */ 4597 + if (likely(!reader || rb_page_write(reader) <= BUF_PAGE_SIZE)) 4598 + break; 4599 + 4600 + udelay(1); 4601 + 4602 + /* Get the latest version of the reader write value */ 4603 + smp_rmb(); 4604 + } 4605 + 4606 + /* The writer is not moving forward? Something is wrong */ 4607 + if (RB_WARN_ON(cpu_buffer, nr_loops == USECS_WAIT)) 4608 + reader = NULL; 4609 + 4610 + /* 4611 + * Make sure we see any padding after the write update 4612 + * (see rb_reset_tail()) 4613 + */ 4614 + smp_rmb(); 4615 + 4616 + 4636 4617 return reader; 4637 4618 } 4638 4619 ··· 5689 5616 unsigned int pos = 0; 5690 5617 unsigned int size; 5691 5618 5692 - if (full) 5619 + /* 5620 + * If a full page is expected, this can still be returned 5621 + * if there's been a previous partial read and the 5622 + * rest of the page can be read and the commit page is off 5623 + * the reader page. 5624 + */ 5625 + if (full && 5626 + (!read || (len < (commit - read)) || 5627 + cpu_buffer->reader_page == cpu_buffer->commit_page)) 5693 5628 goto out_unlock; 5694 5629 5695 5630 if (len > (commit - read))
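
The wait_index handshake added above is easy to miss in the diff noise. Below
is a distilled user-space analogue, illustrative only: the kernel pairs the
increment with smp_wmb() and the re-read with smp_rmb() on wait queues rather
than using pthreads. A forced wake-up bumps wait_index, and a sleeper that
re-reads a changed index gives up instead of going back to sleep:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static long wait_index;
    static int waiting;

    static void *waiter(void *arg)
    {
    	long seen;

    	(void)arg;
    	pthread_mutex_lock(&lock);
    	seen = wait_index;		/* snapshot before sleeping */
    	waiting = 1;
    	while (wait_index == seen)	/* the real code also checks for data */
    		pthread_cond_wait(&cond, &lock);
    	pthread_mutex_unlock(&lock);

    	printf("waiter: woken by wait_index bump\n");
    	return NULL;
    }

    int main(void)
    {
    	pthread_t t;

    	pthread_create(&t, NULL, waiter, NULL);

    	pthread_mutex_lock(&lock);
    	while (!waiting) {		/* wait for the waiter to block */
    		pthread_mutex_unlock(&lock);
    		pthread_mutex_lock(&lock);
    	}
    	wait_index++;			/* like ring_buffer_wake_waiters() */
    	pthread_cond_broadcast(&cond);
    	pthread_mutex_unlock(&lock);

    	pthread_join(t, NULL);
    	return 0;
    }
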
+4 -4
kernel/trace/rv/monitors/wip/wip.c
··· 16 16 17 17 #include "wip.h" 18 18 19 - struct rv_monitor rv_wip; 19 + static struct rv_monitor rv_wip; 20 20 DECLARE_DA_MON_PER_CPU(wip, unsigned char); 21 21 22 22 static void handle_preempt_disable(void *data, unsigned long ip, unsigned long parent_ip) ··· 60 60 da_monitor_destroy_wip(); 61 61 } 62 62 63 - struct rv_monitor rv_wip = { 63 + static struct rv_monitor rv_wip = { 64 64 .name = "wip", 65 65 .description = "wakeup in preemptive per-cpu testing monitor.", 66 66 .enable = enable_wip, ··· 69 69 .enabled = 0, 70 70 }; 71 71 72 - static int register_wip(void) 72 + static int __init register_wip(void) 73 73 { 74 74 rv_register_monitor(&rv_wip); 75 75 return 0; 76 76 } 77 77 78 - static void unregister_wip(void) 78 + static void __exit unregister_wip(void) 79 79 { 80 80 rv_unregister_monitor(&rv_wip); 81 81 }
+4 -4
kernel/trace/rv/monitors/wwnr/wwnr.c
··· 15 15 16 16 #include "wwnr.h" 17 17 18 - struct rv_monitor rv_wwnr; 18 + static struct rv_monitor rv_wwnr; 19 19 DECLARE_DA_MON_PER_TASK(wwnr, unsigned char); 20 20 21 21 static void handle_switch(void *data, bool preempt, struct task_struct *p, ··· 59 59 da_monitor_destroy_wwnr(); 60 60 } 61 61 62 - struct rv_monitor rv_wwnr = { 62 + static struct rv_monitor rv_wwnr = { 63 63 .name = "wwnr", 64 64 .description = "wakeup while not running per-task testing model.", 65 65 .enable = enable_wwnr, ··· 68 68 .enabled = 0, 69 69 }; 70 70 71 - static int register_wwnr(void) 71 + static int __init register_wwnr(void) 72 72 { 73 73 rv_register_monitor(&rv_wwnr); 74 74 return 0; 75 75 } 76 76 77 - static void unregister_wwnr(void) 77 + static void __exit unregister_wwnr(void) 78 78 { 79 79 rv_unregister_monitor(&rv_wwnr); 80 80 }
+73 -5
kernel/trace/trace.c
··· 1193 1193 { 1194 1194 void *cond_data = NULL; 1195 1195 1196 + local_irq_disable(); 1196 1197 arch_spin_lock(&tr->max_lock); 1197 1198 1198 1199 if (tr->cond_snapshot) 1199 1200 cond_data = tr->cond_snapshot->cond_data; 1200 1201 1201 1202 arch_spin_unlock(&tr->max_lock); 1203 + local_irq_enable(); 1202 1204 1203 1205 return cond_data; 1204 1206 } ··· 1336 1334 goto fail_unlock; 1337 1335 } 1338 1336 1337 + local_irq_disable(); 1339 1338 arch_spin_lock(&tr->max_lock); 1340 1339 tr->cond_snapshot = cond_snapshot; 1341 1340 arch_spin_unlock(&tr->max_lock); 1341 + local_irq_enable(); 1342 1342 1343 1343 mutex_unlock(&trace_types_lock); 1344 1344 ··· 1367 1363 { 1368 1364 int ret = 0; 1369 1365 1366 + local_irq_disable(); 1370 1367 arch_spin_lock(&tr->max_lock); 1371 1368 1372 1369 if (!tr->cond_snapshot) ··· 1378 1373 } 1379 1374 1380 1375 arch_spin_unlock(&tr->max_lock); 1376 + local_irq_enable(); 1381 1377 1382 1378 return ret; 1383 1379 } ··· 2206 2200 2207 2201 #define SAVED_CMDLINES_DEFAULT 128 2208 2202 #define NO_CMDLINE_MAP UINT_MAX 2203 + /* 2204 + * Preemption must be disabled before acquiring trace_cmdline_lock. 2205 + * The various trace_arrays' max_lock must be acquired in a context 2206 + * where interrupt is disabled. 2207 + */ 2209 2208 static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED; 2210 2209 struct saved_cmdlines_buffer { 2211 2210 unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1]; ··· 2423 2412 * the lock, but we also don't want to spin 2424 2413 * nor do we want to disable interrupts, 2425 2414 * so if we miss here, then better luck next time. 2415 + * 2416 + * This is called within the scheduler and wake up, so interrupts 2417 + * had better been disabled and run queue lock been held. 2426 2418 */ 2419 + lockdep_assert_preemption_disabled(); 2427 2420 if (!arch_spin_trylock(&trace_cmdline_lock)) 2428 2421 return 0; 2429 2422 ··· 5905 5890 char buf[64]; 5906 5891 int r; 5907 5892 5893 + preempt_disable(); 5908 5894 arch_spin_lock(&trace_cmdline_lock); 5909 5895 r = scnprintf(buf, sizeof(buf), "%u\n", savedcmd->cmdline_num); 5910 5896 arch_spin_unlock(&trace_cmdline_lock); 5897 + preempt_enable(); 5911 5898 5912 5899 return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); 5913 5900 } ··· 5934 5917 return -ENOMEM; 5935 5918 } 5936 5919 5920 + preempt_disable(); 5937 5921 arch_spin_lock(&trace_cmdline_lock); 5938 5922 savedcmd_temp = savedcmd; 5939 5923 savedcmd = s; 5940 5924 arch_spin_unlock(&trace_cmdline_lock); 5925 + preempt_enable(); 5941 5926 free_saved_cmdlines_buffer(savedcmd_temp); 5942 5927 5943 5928 return 0; ··· 6392 6373 6393 6374 #ifdef CONFIG_TRACER_SNAPSHOT 6394 6375 if (t->use_max_tr) { 6376 + local_irq_disable(); 6395 6377 arch_spin_lock(&tr->max_lock); 6396 6378 if (tr->cond_snapshot) 6397 6379 ret = -EBUSY; 6398 6380 arch_spin_unlock(&tr->max_lock); 6381 + local_irq_enable(); 6399 6382 if (ret) 6400 6383 goto out; 6401 6384 } ··· 6428 6407 if (tr->current_trace->reset) 6429 6408 tr->current_trace->reset(tr); 6430 6409 6410 + #ifdef CONFIG_TRACER_MAX_TRACE 6411 + had_max_tr = tr->current_trace->use_max_tr; 6412 + 6431 6413 /* Current trace needs to be nop_trace before synchronize_rcu */ 6432 6414 tr->current_trace = &nop_trace; 6433 - 6434 - #ifdef CONFIG_TRACER_MAX_TRACE 6435 - had_max_tr = tr->allocated_snapshot; 6436 6415 6437 6416 if (had_max_tr && !t->use_max_tr) { 6438 6417 /* ··· 6446 6425 free_snapshot(tr); 6447 6426 } 6448 6427 6449 - if (t->use_max_tr && !had_max_tr) { 6428 + if (t->use_max_tr && !tr->allocated_snapshot) 
{ 6450 6429 ret = tracing_alloc_snapshot_instance(tr); 6451 6430 if (ret < 0) 6452 6431 goto out; 6453 6432 } 6433 + #else 6434 + tr->current_trace = &nop_trace; 6454 6435 #endif 6455 6436 6456 6437 if (t->init) { ··· 7459 7436 goto out; 7460 7437 } 7461 7438 7439 + local_irq_disable(); 7462 7440 arch_spin_lock(&tr->max_lock); 7463 7441 if (tr->cond_snapshot) 7464 7442 ret = -EBUSY; 7465 7443 arch_spin_unlock(&tr->max_lock); 7444 + local_irq_enable(); 7466 7445 if (ret) 7467 7446 goto out; 7468 7447 ··· 8162 8137 8163 8138 __trace_array_put(iter->tr); 8164 8139 8140 + iter->wait_index++; 8141 + /* Make sure the waiters see the new wait_index */ 8142 + smp_wmb(); 8143 + 8144 + ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file); 8145 + 8165 8146 if (info->spare) 8166 8147 ring_buffer_free_read_page(iter->array_buffer->buffer, 8167 8148 info->spare_cpu, info->spare); ··· 8321 8290 8322 8291 /* did we read anything? */ 8323 8292 if (!spd.nr_pages) { 8293 + long wait_index; 8294 + 8324 8295 if (ret) 8325 8296 goto out; 8326 8297 ··· 8330 8297 if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK)) 8331 8298 goto out; 8332 8299 8300 + wait_index = READ_ONCE(iter->wait_index); 8301 + 8333 8302 ret = wait_on_pipe(iter, iter->tr->buffer_percent); 8334 8303 if (ret) 8304 + goto out; 8305 + 8306 + /* No need to wait after waking up when tracing is off */ 8307 + if (!tracer_tracing_is_on(iter->tr)) 8308 + goto out; 8309 + 8310 + /* Make sure we see the new wait_index */ 8311 + smp_rmb(); 8312 + if (wait_index != iter->wait_index) 8335 8313 goto out; 8336 8314 8337 8315 goto again; ··· 8355 8311 return ret; 8356 8312 } 8357 8313 8314 + /* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */ 8315 + static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg) 8316 + { 8317 + struct ftrace_buffer_info *info = file->private_data; 8318 + struct trace_iterator *iter = &info->iter; 8319 + 8320 + if (cmd) 8321 + return -ENOIOCTLCMD; 8322 + 8323 + mutex_lock(&trace_types_lock); 8324 + 8325 + iter->wait_index++; 8326 + /* Make sure the waiters see the new wait_index */ 8327 + smp_wmb(); 8328 + 8329 + ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file); 8330 + 8331 + mutex_unlock(&trace_types_lock); 8332 + return 0; 8333 + } 8334 + 8358 8335 static const struct file_operations tracing_buffers_fops = { 8359 8336 .open = tracing_buffers_open, 8360 8337 .read = tracing_buffers_read, 8361 8338 .poll = tracing_buffers_poll, 8362 8339 .release = tracing_buffers_release, 8363 8340 .splice_read = tracing_buffers_splice_read, 8341 + .unlocked_ioctl = tracing_buffers_ioctl, 8364 8342 .llseek = no_llseek, 8365 8343 }; 8366 8344 ··· 9071 9005 tracer_tracing_off(tr); 9072 9006 if (tr->current_trace->stop) 9073 9007 tr->current_trace->stop(tr); 9008 + /* Wake up any waiters */ 9009 + ring_buffer_wake_waiters(buffer, RING_BUFFER_ALL_CPUS); 9074 9010 } 9075 9011 mutex_unlock(&trace_types_lock); 9076 9012 } ··· 10159 10091 * buffer. The memory will be removed once the "instance" is removed. 10160 10092 */ 10161 10093 ret = cpuhp_setup_state_multi(CPUHP_TRACE_RB_PREPARE, 10162 - "trace/RB:preapre", trace_rb_cpu_prepare, 10094 + "trace/RB:prepare", trace_rb_cpu_prepare, 10163 10095 NULL); 10164 10096 if (ret < 0) 10165 10097 goto out_free_cpumask;
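
The new tracing_buffers_ioctl() above gives readers of trace_pipe_raw a way to
be forcibly woken. A minimal sketch of the user-space side, assuming the
default tracefs mount point; per the comment in the hunk, any ioctl with cmd 0
wakes all waiters:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Wake up anyone blocked reading this per-CPU raw ring buffer */
    static void wake_trace_pipe_raw_waiters(void)
    {
    	int fd = open("/sys/kernel/debug/tracing/per_cpu/cpu0/trace_pipe_raw",
    		      O_RDONLY);

    	if (fd >= 0) {
    		ioctl(fd, 0);	/* cmd 0: wake all ring buffer waiters */
    		close(fd);
    	}
    }
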
-13
kernel/trace/trace.h
··· 1435 1435 struct filter_pred; 1436 1436 struct regex; 1437 1437 1438 - typedef int (*filter_pred_fn_t) (struct filter_pred *pred, void *event); 1439 - 1440 1438 typedef int (*regex_match_func)(char *str, struct regex *r, int len); 1441 1439 1442 1440 enum regex_type { ··· 1451 1453 int len; 1452 1454 int field_len; 1453 1455 regex_match_func match; 1454 - }; 1455 - 1456 - struct filter_pred { 1457 - filter_pred_fn_t fn; 1458 - u64 val; 1459 - struct regex regex; 1460 - unsigned short *ops; 1461 - struct ftrace_event_field *field; 1462 - int offset; 1463 - int not; 1464 - int op; 1465 1456 }; 1466 1457 1467 1458 static inline bool is_string_field(struct ftrace_event_field *field)
+1 -1
kernel/trace/trace_benchmark.c
··· 51 51 52 52 local_irq_disable(); 53 53 start = trace_clock_local(); 54 - trace_benchmark_event(bm_str); 54 + trace_benchmark_event(bm_str, bm_last); 55 55 stop = trace_clock_local(); 56 56 local_irq_enable(); 57 57
+5 -3
kernel/trace/trace_benchmark.h
··· 14 14 15 15 TRACE_EVENT_FN(benchmark_event, 16 16 17 - TP_PROTO(const char *str), 17 + TP_PROTO(const char *str, u64 delta), 18 18 19 - TP_ARGS(str), 19 + TP_ARGS(str, delta), 20 20 21 21 TP_STRUCT__entry( 22 22 __array( char, str, BENCHMARK_EVENT_STRLEN ) 23 + __field( u64, delta) 23 24 ), 24 25 25 26 TP_fast_assign( 26 27 memcpy(__entry->str, str, BENCHMARK_EVENT_STRLEN); 28 + __entry->delta = delta; 27 29 ), 28 30 29 - TP_printk("%s", __entry->str), 31 + TP_printk("%s delta=%llu", __entry->str, __entry->delta), 30 32 31 33 trace_benchmark_reg, trace_benchmark_unreg 32 34 );
+97 -10
kernel/trace/trace_eprobe.c
··· 26 26 /* tracepoint event */ 27 27 const char *event_name; 28 28 29 + /* filter string for the tracepoint */ 30 + char *filter_str; 31 + 29 32 struct trace_event_call *event; 30 33 31 34 struct dyn_event devent; ··· 667 664 new_eprobe_trigger(struct trace_eprobe *ep, struct trace_event_file *file) 668 665 { 669 666 struct event_trigger_data *trigger; 667 + struct event_filter *filter = NULL; 670 668 struct eprobe_data *edata; 669 + int ret; 671 670 672 671 edata = kzalloc(sizeof(*edata), GFP_KERNEL); 673 672 trigger = kzalloc(sizeof(*trigger), GFP_KERNEL); 674 673 if (!trigger || !edata) { 675 - kfree(edata); 676 - kfree(trigger); 677 - return ERR_PTR(-ENOMEM); 674 + ret = -ENOMEM; 675 + goto error; 678 676 } 679 677 680 678 trigger->flags = EVENT_TRIGGER_FL_PROBE; ··· 690 686 trigger->cmd_ops = &event_trigger_cmd; 691 687 692 688 INIT_LIST_HEAD(&trigger->list); 693 - RCU_INIT_POINTER(trigger->filter, NULL); 689 + 690 + if (ep->filter_str) { 691 + ret = create_event_filter(file->tr, file->event_call, 692 + ep->filter_str, false, &filter); 693 + if (ret) 694 + goto error; 695 + } 696 + RCU_INIT_POINTER(trigger->filter, filter); 694 697 695 698 edata->file = file; 696 699 edata->ep = ep; 697 700 trigger->private_data = edata; 698 701 699 702 return trigger; 703 + error: 704 + free_event_filter(filter); 705 + kfree(edata); 706 + kfree(trigger); 707 + return ERR_PTR(ret); 700 708 } 701 709 702 710 static int enable_eprobe(struct trace_eprobe *ep, ··· 742 726 { 743 727 struct event_trigger_data *trigger = NULL, *iter; 744 728 struct trace_event_file *file; 729 + struct event_filter *filter; 745 730 struct eprobe_data *edata; 746 731 747 732 file = find_event_file(tr, ep->event_system, ep->event_name); ··· 769 752 /* Make sure nothing is using the edata or trigger */ 770 753 tracepoint_synchronize_unregister(); 771 754 755 + filter = rcu_access_pointer(trigger->filter); 756 + 757 + if (filter) 758 + free_event_filter(filter); 772 759 kfree(edata); 773 760 kfree(trigger); 774 761 ··· 948 927 return ret; 949 928 } 950 929 930 + static int trace_eprobe_parse_filter(struct trace_eprobe *ep, int argc, const char *argv[]) 931 + { 932 + struct event_filter *dummy; 933 + int i, ret, len = 0; 934 + char *p; 935 + 936 + if (argc == 0) { 937 + trace_probe_log_err(0, NO_EP_FILTER); 938 + return -EINVAL; 939 + } 940 + 941 + /* Recover the filter string */ 942 + for (i = 0; i < argc; i++) 943 + len += strlen(argv[i]) + 1; 944 + 945 + ep->filter_str = kzalloc(len, GFP_KERNEL); 946 + if (!ep->filter_str) 947 + return -ENOMEM; 948 + 949 + p = ep->filter_str; 950 + for (i = 0; i < argc; i++) { 951 + ret = snprintf(p, len, "%s ", argv[i]); 952 + if (ret < 0) 953 + goto error; 954 + if (ret > len) { 955 + ret = -E2BIG; 956 + goto error; 957 + } 958 + p += ret; 959 + len -= ret; 960 + } 961 + p[-1] = '\0'; 962 + 963 + /* 964 + * Ensure the filter string can be parsed correctly. Note, this 965 + * filter string is for the original event, not for the eprobe. 
966 + */ 967 + ret = create_event_filter(top_trace_array(), ep->event, ep->filter_str, 968 + true, &dummy); 969 + free_event_filter(dummy); 970 + if (ret) 971 + goto error; 972 + 973 + return 0; 974 + error: 975 + kfree(ep->filter_str); 976 + ep->filter_str = NULL; 977 + return ret; 978 + } 979 + 951 980 static int __trace_eprobe_create(int argc, const char *argv[]) 952 981 { 953 982 /* 954 983 * Argument syntax: 955 - * e[:[GRP/][ENAME]] SYSTEM.EVENT [FETCHARGS] 956 - * Fetch args: 984 + * e[:[GRP/][ENAME]] SYSTEM.EVENT [FETCHARGS] [if FILTER] 985 + * Fetch args (no space): 957 986 * <name>=$<field>[:TYPE] 958 987 */ 959 988 const char *event = NULL, *group = EPROBE_EVENT_SYSTEM; ··· 1013 942 char buf1[MAX_EVENT_NAME_LEN]; 1014 943 char buf2[MAX_EVENT_NAME_LEN]; 1015 944 char gbuf[MAX_EVENT_NAME_LEN]; 1016 - int ret = 0; 1017 - int i; 945 + int ret = 0, filter_idx = 0; 946 + int i, filter_cnt; 1018 947 1019 948 if (argc < 2 || argv[0][0] != 'e') 1020 949 return -ECANCELED; ··· 1039 968 } 1040 969 1041 970 if (!event) { 1042 - strscpy(buf1, argv[1], MAX_EVENT_NAME_LEN); 1043 - sanitize_event_name(buf1); 971 + strscpy(buf1, sys_event, MAX_EVENT_NAME_LEN); 1044 972 event = buf1; 973 + } 974 + 975 + for (i = 2; i < argc; i++) { 976 + if (!strcmp(argv[i], "if")) { 977 + filter_idx = i + 1; 978 + filter_cnt = argc - filter_idx; 979 + argc = i; 980 + break; 981 + } 1045 982 } 1046 983 1047 984 mutex_lock(&event_mutex); ··· 1066 987 ep = NULL; 1067 988 goto error; 1068 989 } 990 + 991 + if (filter_idx) { 992 + trace_probe_log_set_index(filter_idx); 993 + ret = trace_eprobe_parse_filter(ep, filter_cnt, argv + filter_idx); 994 + if (ret) 995 + goto parse_error; 996 + } else 997 + ep->filter_str = NULL; 1069 998 1070 999 argc -= 2; argv += 2; 1071 1000 /* parse arguments */
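
Putting the new syntax together, here is a sketch of creating a filtered
eprobe from user space. The event, fetch arg, and filter are hypothetical, and
as the code above notes, the "if FILTER" clause applies to the fields of the
original event:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Create an eprobe on an existing event with an "if" filter clause */
    static void create_filtered_eprobe(void)
    {
    	const char *cmd =
    		"e:myprobe sched.sched_waking wpid=$pid if pid < 128\n";
    	int fd = open("/sys/kernel/debug/tracing/dynamic_events",
    		      O_WRONLY | O_APPEND);

    	if (fd >= 0) {
    		write(fd, cmd, strlen(cmd));
    		close(fd);
    	}
    }
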
+169 -72
kernel/trace/trace_events_filter.c
··· 43 43 44 44 static const char * ops[] = { OPS }; 45 45 46 + enum filter_pred_fn { 47 + FILTER_PRED_FN_NOP, 48 + FILTER_PRED_FN_64, 49 + FILTER_PRED_FN_S64, 50 + FILTER_PRED_FN_U64, 51 + FILTER_PRED_FN_32, 52 + FILTER_PRED_FN_S32, 53 + FILTER_PRED_FN_U32, 54 + FILTER_PRED_FN_16, 55 + FILTER_PRED_FN_S16, 56 + FILTER_PRED_FN_U16, 57 + FILTER_PRED_FN_8, 58 + FILTER_PRED_FN_S8, 59 + FILTER_PRED_FN_U8, 60 + FILTER_PRED_FN_COMM, 61 + FILTER_PRED_FN_STRING, 62 + FILTER_PRED_FN_STRLOC, 63 + FILTER_PRED_FN_STRRELLOC, 64 + FILTER_PRED_FN_PCHAR_USER, 65 + FILTER_PRED_FN_PCHAR, 66 + FILTER_PRED_FN_CPU, 67 + FILTER_PRED_FN_, 68 + FILTER_PRED_TEST_VISITED, 69 + }; 70 + 71 + struct filter_pred { 72 + enum filter_pred_fn fn_num; 73 + u64 val; 74 + struct regex regex; 75 + unsigned short *ops; 76 + struct ftrace_event_field *field; 77 + int offset; 78 + int not; 79 + int op; 80 + }; 81 + 46 82 /* 47 83 * pred functions are OP_LE, OP_LT, OP_GE, OP_GT, and OP_BAND 48 84 * pred_funcs_##type below must match the order of them above. ··· 626 590 return ERR_PTR(ret); 627 591 } 628 592 629 - #define DEFINE_COMPARISON_PRED(type) \ 630 - static int filter_pred_LT_##type(struct filter_pred *pred, void *event) \ 631 - { \ 632 - type *addr = (type *)(event + pred->offset); \ 633 - type val = (type)pred->val; \ 634 - return *addr < val; \ 635 - } \ 636 - static int filter_pred_LE_##type(struct filter_pred *pred, void *event) \ 637 - { \ 638 - type *addr = (type *)(event + pred->offset); \ 639 - type val = (type)pred->val; \ 640 - return *addr <= val; \ 641 - } \ 642 - static int filter_pred_GT_##type(struct filter_pred *pred, void *event) \ 643 - { \ 644 - type *addr = (type *)(event + pred->offset); \ 645 - type val = (type)pred->val; \ 646 - return *addr > val; \ 647 - } \ 648 - static int filter_pred_GE_##type(struct filter_pred *pred, void *event) \ 649 - { \ 650 - type *addr = (type *)(event + pred->offset); \ 651 - type val = (type)pred->val; \ 652 - return *addr >= val; \ 653 - } \ 654 - static int filter_pred_BAND_##type(struct filter_pred *pred, void *event) \ 655 - { \ 656 - type *addr = (type *)(event + pred->offset); \ 657 - type val = (type)pred->val; \ 658 - return !!(*addr & val); \ 659 - } \ 660 - static const filter_pred_fn_t pred_funcs_##type[] = { \ 661 - filter_pred_LE_##type, \ 662 - filter_pred_LT_##type, \ 663 - filter_pred_GE_##type, \ 664 - filter_pred_GT_##type, \ 665 - filter_pred_BAND_##type, \ 593 + enum pred_cmp_types { 594 + PRED_CMP_TYPE_NOP, 595 + PRED_CMP_TYPE_LT, 596 + PRED_CMP_TYPE_LE, 597 + PRED_CMP_TYPE_GT, 598 + PRED_CMP_TYPE_GE, 599 + PRED_CMP_TYPE_BAND, 666 600 }; 601 + 602 + #define DEFINE_COMPARISON_PRED(type) \ 603 + static int filter_pred_##type(struct filter_pred *pred, void *event) \ 604 + { \ 605 + switch (pred->op) { \ 606 + case OP_LT: { \ 607 + type *addr = (type *)(event + pred->offset); \ 608 + type val = (type)pred->val; \ 609 + return *addr < val; \ 610 + } \ 611 + case OP_LE: { \ 612 + type *addr = (type *)(event + pred->offset); \ 613 + type val = (type)pred->val; \ 614 + return *addr <= val; \ 615 + } \ 616 + case OP_GT: { \ 617 + type *addr = (type *)(event + pred->offset); \ 618 + type val = (type)pred->val; \ 619 + return *addr > val; \ 620 + } \ 621 + case OP_GE: { \ 622 + type *addr = (type *)(event + pred->offset); \ 623 + type val = (type)pred->val; \ 624 + return *addr >= val; \ 625 + } \ 626 + case OP_BAND: { \ 627 + type *addr = (type *)(event + pred->offset); \ 628 + type val = (type)pred->val; \ 629 + return !!(*addr & val); \ 630 + } \ 631 + 
default: \ 632 + return 0; \ 633 + } \ 634 + } 667 635 668 636 #define DEFINE_EQUALITY_PRED(size) \ 669 637 static int filter_pred_##size(struct filter_pred *pred, void *event) \ ··· 876 836 return cmp ^ pred->not; 877 837 } 878 838 879 - static int filter_pred_none(struct filter_pred *pred, void *event) 880 - { 881 - return 0; 882 - } 883 - 884 839 /* 885 840 * regex_match_foo - Basic regex callbacks 886 841 * ··· 1021 986 } 1022 987 } 1023 988 989 + 990 + #ifdef CONFIG_FTRACE_STARTUP_TEST 991 + static int test_pred_visited_fn(struct filter_pred *pred, void *event); 992 + #else 993 + static int test_pred_visited_fn(struct filter_pred *pred, void *event) 994 + { 995 + return 0; 996 + } 997 + #endif 998 + 999 + 1000 + static int filter_pred_fn_call(struct filter_pred *pred, void *event); 1001 + 1024 1002 /* return 1 if event matches, 0 otherwise (discard) */ 1025 1003 int filter_match_preds(struct event_filter *filter, void *rec) 1026 1004 { ··· 1051 1003 1052 1004 for (i = 0; prog[i].pred; i++) { 1053 1005 struct filter_pred *pred = prog[i].pred; 1054 - int match = pred->fn(pred, rec); 1006 + int match = filter_pred_fn_call(pred, rec); 1055 1007 if (match == prog[i].when_to_branch) 1056 1008 i = prog[i].target; 1057 1009 } ··· 1237 1189 return FILTER_OTHER; 1238 1190 } 1239 1191 1240 - static filter_pred_fn_t select_comparison_fn(enum filter_op_ids op, 1241 - int field_size, int field_is_signed) 1192 + static enum filter_pred_fn select_comparison_fn(enum filter_op_ids op, 1193 + int field_size, int field_is_signed) 1242 1194 { 1243 - filter_pred_fn_t fn = NULL; 1195 + enum filter_pred_fn fn = FILTER_PRED_FN_NOP; 1244 1196 int pred_func_index = -1; 1245 1197 1246 1198 switch (op) { ··· 1249 1201 break; 1250 1202 default: 1251 1203 if (WARN_ON_ONCE(op < PRED_FUNC_START)) 1252 - return NULL; 1204 + return fn; 1253 1205 pred_func_index = op - PRED_FUNC_START; 1254 1206 if (WARN_ON_ONCE(pred_func_index > PRED_FUNC_MAX)) 1255 - return NULL; 1207 + return fn; 1256 1208 } 1257 1209 1258 1210 switch (field_size) { 1259 1211 case 8: 1260 1212 if (pred_func_index < 0) 1261 - fn = filter_pred_64; 1213 + fn = FILTER_PRED_FN_64; 1262 1214 else if (field_is_signed) 1263 - fn = pred_funcs_s64[pred_func_index]; 1215 + fn = FILTER_PRED_FN_S64; 1264 1216 else 1265 - fn = pred_funcs_u64[pred_func_index]; 1217 + fn = FILTER_PRED_FN_U64; 1266 1218 break; 1267 1219 case 4: 1268 1220 if (pred_func_index < 0) 1269 - fn = filter_pred_32; 1221 + fn = FILTER_PRED_FN_32; 1270 1222 else if (field_is_signed) 1271 - fn = pred_funcs_s32[pred_func_index]; 1223 + fn = FILTER_PRED_FN_S32; 1272 1224 else 1273 - fn = pred_funcs_u32[pred_func_index]; 1225 + fn = FILTER_PRED_FN_U32; 1274 1226 break; 1275 1227 case 2: 1276 1228 if (pred_func_index < 0) 1277 - fn = filter_pred_16; 1229 + fn = FILTER_PRED_FN_16; 1278 1230 else if (field_is_signed) 1279 - fn = pred_funcs_s16[pred_func_index]; 1231 + fn = FILTER_PRED_FN_S16; 1280 1232 else 1281 - fn = pred_funcs_u16[pred_func_index]; 1233 + fn = FILTER_PRED_FN_U16; 1282 1234 break; 1283 1235 case 1: 1284 1236 if (pred_func_index < 0) 1285 - fn = filter_pred_8; 1237 + fn = FILTER_PRED_FN_8; 1286 1238 else if (field_is_signed) 1287 - fn = pred_funcs_s8[pred_func_index]; 1239 + fn = FILTER_PRED_FN_S8; 1288 1240 else 1289 - fn = pred_funcs_u8[pred_func_index]; 1241 + fn = FILTER_PRED_FN_U8; 1290 1242 break; 1291 1243 } 1292 1244 1293 1245 return fn; 1246 + } 1247 + 1248 + 1249 + static int filter_pred_fn_call(struct filter_pred *pred, void *event) 1250 + { 1251 + switch (pred->fn_num) { 
1252 + case FILTER_PRED_FN_64: 1253 + return filter_pred_64(pred, event); 1254 + case FILTER_PRED_FN_S64: 1255 + return filter_pred_s64(pred, event); 1256 + case FILTER_PRED_FN_U64: 1257 + return filter_pred_u64(pred, event); 1258 + case FILTER_PRED_FN_32: 1259 + return filter_pred_32(pred, event); 1260 + case FILTER_PRED_FN_S32: 1261 + return filter_pred_s32(pred, event); 1262 + case FILTER_PRED_FN_U32: 1263 + return filter_pred_u32(pred, event); 1264 + case FILTER_PRED_FN_16: 1265 + return filter_pred_16(pred, event); 1266 + case FILTER_PRED_FN_S16: 1267 + return filter_pred_s16(pred, event); 1268 + case FILTER_PRED_FN_U16: 1269 + return filter_pred_u16(pred, event); 1270 + case FILTER_PRED_FN_8: 1271 + return filter_pred_8(pred, event); 1272 + case FILTER_PRED_FN_S8: 1273 + return filter_pred_s8(pred, event); 1274 + case FILTER_PRED_FN_U8: 1275 + return filter_pred_u8(pred, event); 1276 + case FILTER_PRED_FN_COMM: 1277 + return filter_pred_comm(pred, event); 1278 + case FILTER_PRED_FN_STRING: 1279 + return filter_pred_string(pred, event); 1280 + case FILTER_PRED_FN_STRLOC: 1281 + return filter_pred_strloc(pred, event); 1282 + case FILTER_PRED_FN_STRRELLOC: 1283 + return filter_pred_strrelloc(pred, event); 1284 + case FILTER_PRED_FN_PCHAR_USER: 1285 + return filter_pred_pchar_user(pred, event); 1286 + case FILTER_PRED_FN_PCHAR: 1287 + return filter_pred_pchar(pred, event); 1288 + case FILTER_PRED_FN_CPU: 1289 + return filter_pred_cpu(pred, event); 1290 + case FILTER_PRED_TEST_VISITED: 1291 + return test_pred_visited_fn(pred, event); 1292 + default: 1293 + return 0; 1294 + } 1294 1295 } 1295 1296 1296 1297 /* Called when a predicate is encountered by predicate_parse() */ ··· 1435 1338 parse_error(pe, FILT_ERR_IP_FIELD_ONLY, pos + i); 1436 1339 goto err_free; 1437 1340 } 1438 - pred->fn = filter_pred_none; 1341 + pred->fn_num = FILTER_PRED_FN_NOP; 1439 1342 1440 1343 /* 1441 1344 * Quotes are not required, but if they exist then we need ··· 1513 1416 filter_build_regex(pred); 1514 1417 1515 1418 if (field->filter_type == FILTER_COMM) { 1516 - pred->fn = filter_pred_comm; 1419 + pred->fn_num = FILTER_PRED_FN_COMM; 1517 1420 1518 1421 } else if (field->filter_type == FILTER_STATIC_STRING) { 1519 - pred->fn = filter_pred_string; 1422 + pred->fn_num = FILTER_PRED_FN_STRING; 1520 1423 pred->regex.field_len = field->size; 1521 1424 1522 1425 } else if (field->filter_type == FILTER_DYN_STRING) { 1523 - pred->fn = filter_pred_strloc; 1426 + pred->fn_num = FILTER_PRED_FN_STRLOC; 1524 1427 } else if (field->filter_type == FILTER_RDYN_STRING) 1525 - pred->fn = filter_pred_strrelloc; 1428 + pred->fn_num = FILTER_PRED_FN_STRRELLOC; 1526 1429 else { 1527 1430 1528 1431 if (!ustring_per_cpu) { ··· 1533 1436 } 1534 1437 1535 1438 if (ustring) 1536 - pred->fn = filter_pred_pchar_user; 1439 + pred->fn_num = FILTER_PRED_FN_PCHAR_USER; 1537 1440 else 1538 - pred->fn = filter_pred_pchar; 1441 + pred->fn_num = FILTER_PRED_FN_PCHAR; 1539 1442 } 1540 1443 /* go past the last quote */ 1541 1444 i++; ··· 1583 1486 pred->val = val; 1584 1487 1585 1488 if (field->filter_type == FILTER_CPU) 1586 - pred->fn = filter_pred_cpu; 1489 + pred->fn_num = FILTER_PRED_FN_CPU; 1587 1490 else { 1588 - pred->fn = select_comparison_fn(pred->op, field->size, 1589 - field->is_signed); 1491 + pred->fn_num = select_comparison_fn(pred->op, field->size, 1492 + field->is_signed); 1590 1493 if (pred->op == OP_NE) 1591 1494 pred->not = 1; 1592 1495 } ··· 2393 2296 struct filter_pred *pred = prog[i].pred; 2394 2297 struct ftrace_event_field 
*field = pred->field; 2395 2298 2396 - WARN_ON_ONCE(!pred->fn); 2299 + WARN_ON_ONCE(pred->fn_num == FILTER_PRED_FN_NOP); 2397 2300 2398 2301 if (!field) { 2399 2302 WARN_ONCE(1, "all leafs should have field defined %d", i); ··· 2403 2306 if (!strchr(fields, *field->name)) 2404 2307 continue; 2405 2308 2406 - pred->fn = test_pred_visited_fn; 2309 + pred->fn_num = FILTER_PRED_TEST_VISITED; 2407 2310 } 2408 2311 } 2409 2312
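
The conversion above (and the matching one in trace_events_hist.c below)
follows one pattern: replace a stored function pointer with an enum and
dispatch through a switch, trading an indirect call, which costs a retpoline
on affected CPUs, for direct branches. A generic, self-contained sketch of the
pattern; names here are illustrative, not the kernel's:

    #include <stdio.h>

    enum pred_fn { PRED_FN_NOP, PRED_FN_EQ_U32, PRED_FN_LT_U32 };

    struct pred {
    	enum pred_fn fn_num;	/* was: int (*fn)(struct pred *, void *) */
    	unsigned int val;
    	unsigned int offset;
    };

    static int pred_call(struct pred *p, void *event)
    {
    	unsigned int field = *(unsigned int *)((char *)event + p->offset);

    	switch (p->fn_num) {	/* direct branches, no indirect jump */
    	case PRED_FN_EQ_U32: return field == p->val;
    	case PRED_FN_LT_U32: return field < p->val;
    	default:             return 0;
    	}
    }

    int main(void)
    {
    	unsigned int event[1] = { 42 };
    	struct pred p = { .fn_num = PRED_FN_LT_U32, .val = 100, .offset = 0 };

    	printf("match=%d\n", pred_call(&p, event));	/* prints match=1 */
    	return 0;
    }
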
+169 -77
kernel/trace/trace_events_hist.c
··· 104 104 FIELD_OP_MULT, 105 105 }; 106 106 107 + enum hist_field_fn { 108 + HIST_FIELD_FN_NOP, 109 + HIST_FIELD_FN_VAR_REF, 110 + HIST_FIELD_FN_COUNTER, 111 + HIST_FIELD_FN_CONST, 112 + HIST_FIELD_FN_LOG2, 113 + HIST_FIELD_FN_BUCKET, 114 + HIST_FIELD_FN_TIMESTAMP, 115 + HIST_FIELD_FN_CPU, 116 + HIST_FIELD_FN_STRING, 117 + HIST_FIELD_FN_DYNSTRING, 118 + HIST_FIELD_FN_RELDYNSTRING, 119 + HIST_FIELD_FN_PSTRING, 120 + HIST_FIELD_FN_S64, 121 + HIST_FIELD_FN_U64, 122 + HIST_FIELD_FN_S32, 123 + HIST_FIELD_FN_U32, 124 + HIST_FIELD_FN_S16, 125 + HIST_FIELD_FN_U16, 126 + HIST_FIELD_FN_S8, 127 + HIST_FIELD_FN_U8, 128 + HIST_FIELD_FN_UMINUS, 129 + HIST_FIELD_FN_MINUS, 130 + HIST_FIELD_FN_PLUS, 131 + HIST_FIELD_FN_DIV, 132 + HIST_FIELD_FN_MULT, 133 + HIST_FIELD_FN_DIV_POWER2, 134 + HIST_FIELD_FN_DIV_NOT_POWER2, 135 + HIST_FIELD_FN_DIV_MULT_SHIFT, 136 + HIST_FIELD_FN_EXECNAME, 137 + }; 138 + 107 139 /* 108 140 * A hist_var (histogram variable) contains variable information for 109 141 * hist_fields having the HIST_FIELD_FL_VAR or HIST_FIELD_FL_VAR_REF ··· 155 123 struct hist_field { 156 124 struct ftrace_event_field *field; 157 125 unsigned long flags; 158 - hist_field_fn_t fn; 159 - unsigned int ref; 160 - unsigned int size; 161 - unsigned int offset; 162 - unsigned int is_signed; 163 126 unsigned long buckets; 164 127 const char *type; 165 128 struct hist_field *operands[HIST_FIELD_OPERANDS_MAX]; 166 129 struct hist_trigger_data *hist_data; 130 + enum hist_field_fn fn_num; 131 + unsigned int ref; 132 + unsigned int size; 133 + unsigned int offset; 134 + unsigned int is_signed; 167 135 168 136 /* 169 137 * Variable fields contain variable-specific info in var. ··· 198 166 u64 div_multiplier; 199 167 }; 200 168 201 - static u64 hist_field_none(struct hist_field *field, 202 - struct tracing_map_elt *elt, 203 - struct trace_buffer *buffer, 204 - struct ring_buffer_event *rbe, 205 - void *event) 206 - { 207 - return 0; 208 - } 169 + static u64 hist_fn_call(struct hist_field *hist_field, 170 + struct tracing_map_elt *elt, 171 + struct trace_buffer *buffer, 172 + struct ring_buffer_event *rbe, 173 + void *event); 209 174 210 175 static u64 hist_field_const(struct hist_field *field, 211 176 struct tracing_map_elt *elt, ··· 279 250 { 280 251 struct hist_field *operand = hist_field->operands[0]; 281 252 282 - u64 val = operand->fn(operand, elt, buffer, rbe, event); 253 + u64 val = hist_fn_call(operand, elt, buffer, rbe, event); 283 254 284 255 return (u64) ilog2(roundup_pow_of_two(val)); 285 256 } ··· 293 264 struct hist_field *operand = hist_field->operands[0]; 294 265 unsigned long buckets = hist_field->buckets; 295 266 296 - u64 val = operand->fn(operand, elt, buffer, rbe, event); 267 + u64 val = hist_fn_call(operand, elt, buffer, rbe, event); 297 268 298 269 if (WARN_ON_ONCE(!buckets)) 299 270 return val; ··· 314 285 struct hist_field *operand1 = hist_field->operands[0]; 315 286 struct hist_field *operand2 = hist_field->operands[1]; 316 287 317 - u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event); 318 - u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event); 288 + u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event); 289 + u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event); 319 290 320 291 return val1 + val2; 321 292 } ··· 329 300 struct hist_field *operand1 = hist_field->operands[0]; 330 301 struct hist_field *operand2 = hist_field->operands[1]; 331 302 332 - u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event); 333 - u64 val2 = operand2->fn(operand2, elt, buffer, rbe, 
event); 303 + u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event); 304 + u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event); 334 305 335 306 return val1 - val2; 336 307 } ··· 344 315 struct hist_field *operand1 = hist_field->operands[0]; 345 316 struct hist_field *operand2 = hist_field->operands[1]; 346 317 347 - u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event); 348 - u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event); 318 + u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event); 319 + u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event); 349 320 350 321 /* Return -1 for the undefined case */ 351 322 if (!val2) ··· 367 338 struct hist_field *operand1 = hist_field->operands[0]; 368 339 struct hist_field *operand2 = hist_field->operands[1]; 369 340 370 - u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event); 341 + u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event); 371 342 372 343 return val1 >> __ffs64(operand2->constant); 373 344 } ··· 381 352 struct hist_field *operand1 = hist_field->operands[0]; 382 353 struct hist_field *operand2 = hist_field->operands[1]; 383 354 384 - u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event); 355 + u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event); 385 356 386 357 return div64_u64(val1, operand2->constant); 387 358 } ··· 395 366 struct hist_field *operand1 = hist_field->operands[0]; 396 367 struct hist_field *operand2 = hist_field->operands[1]; 397 368 398 - u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event); 369 + u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event); 399 370 400 371 /* 401 372 * If the divisor is a constant, do a multiplication and shift instead. ··· 429 400 struct hist_field *operand1 = hist_field->operands[0]; 430 401 struct hist_field *operand2 = hist_field->operands[1]; 431 402 432 - u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event); 433 - u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event); 403 + u64 val1 = hist_fn_call(operand1, elt, buffer, rbe, event); 404 + u64 val2 = hist_fn_call(operand2, elt, buffer, rbe, event); 434 405 435 406 return val1 * val2; 436 407 } ··· 443 414 { 444 415 struct hist_field *operand = hist_field->operands[0]; 445 416 446 - s64 sval = (s64)operand->fn(operand, elt, buffer, rbe, event); 417 + s64 sval = (s64)hist_fn_call(operand, elt, buffer, rbe, event); 447 418 u64 val = (u64)-sval; 448 419 449 420 return val; ··· 686 657 * Returns the specific division function to use if the divisor 687 658 * is constant. This avoids extra branches when the trigger is hit. 
688 659 */ 689 - static hist_field_fn_t hist_field_get_div_fn(struct hist_field *divisor) 660 + static enum hist_field_fn hist_field_get_div_fn(struct hist_field *divisor) 690 661 { 691 662 u64 div = divisor->constant; 692 663 693 664 if (!(div & (div - 1))) 694 - return div_by_power_of_two; 665 + return HIST_FIELD_FN_DIV_POWER2; 695 666 696 667 /* If the divisor is too large, do a regular division */ 697 668 if (div > (1 << HIST_DIV_SHIFT)) 698 - return div_by_not_power_of_two; 669 + return HIST_FIELD_FN_DIV_NOT_POWER2; 699 670 700 671 divisor->div_multiplier = div64_u64((u64)(1 << HIST_DIV_SHIFT), div); 701 - return div_by_mult_and_shift; 672 + return HIST_FIELD_FN_DIV_MULT_SHIFT; 702 673 } 703 674 704 675 static void track_data_free(struct track_data *track_data) ··· 1363 1334 return field_name; 1364 1335 } 1365 1336 1366 - static hist_field_fn_t select_value_fn(int field_size, int field_is_signed) 1337 + static enum hist_field_fn select_value_fn(int field_size, int field_is_signed) 1367 1338 { 1368 - hist_field_fn_t fn = NULL; 1369 - 1370 1339 switch (field_size) { 1371 1340 case 8: 1372 1341 if (field_is_signed) 1373 - fn = hist_field_s64; 1342 + return HIST_FIELD_FN_S64; 1374 1343 else 1375 - fn = hist_field_u64; 1376 - break; 1344 + return HIST_FIELD_FN_U64; 1377 1345 case 4: 1378 1346 if (field_is_signed) 1379 - fn = hist_field_s32; 1347 + return HIST_FIELD_FN_S32; 1380 1348 else 1381 - fn = hist_field_u32; 1382 - break; 1349 + return HIST_FIELD_FN_U32; 1383 1350 case 2: 1384 1351 if (field_is_signed) 1385 - fn = hist_field_s16; 1352 + return HIST_FIELD_FN_S16; 1386 1353 else 1387 - fn = hist_field_u16; 1388 - break; 1354 + return HIST_FIELD_FN_U16; 1389 1355 case 1: 1390 1356 if (field_is_signed) 1391 - fn = hist_field_s8; 1357 + return HIST_FIELD_FN_S8; 1392 1358 else 1393 - fn = hist_field_u8; 1394 - break; 1359 + return HIST_FIELD_FN_U8; 1395 1360 } 1396 1361 1397 - return fn; 1362 + return HIST_FIELD_FN_NOP; 1398 1363 } 1399 1364 1400 1365 static int parse_map_size(char *str) ··· 1945 1922 goto out; /* caller will populate */ 1946 1923 1947 1924 if (flags & HIST_FIELD_FL_VAR_REF) { 1948 - hist_field->fn = hist_field_var_ref; 1925 + hist_field->fn_num = HIST_FIELD_FN_VAR_REF; 1949 1926 goto out; 1950 1927 } 1951 1928 1952 1929 if (flags & HIST_FIELD_FL_HITCOUNT) { 1953 - hist_field->fn = hist_field_counter; 1930 + hist_field->fn_num = HIST_FIELD_FN_COUNTER; 1954 1931 hist_field->size = sizeof(u64); 1955 1932 hist_field->type = "u64"; 1956 1933 goto out; 1957 1934 } 1958 1935 1959 1936 if (flags & HIST_FIELD_FL_CONST) { 1960 - hist_field->fn = hist_field_const; 1937 + hist_field->fn_num = HIST_FIELD_FN_CONST; 1961 1938 hist_field->size = sizeof(u64); 1962 1939 hist_field->type = kstrdup("u64", GFP_KERNEL); 1963 1940 if (!hist_field->type) ··· 1966 1943 } 1967 1944 1968 1945 if (flags & HIST_FIELD_FL_STACKTRACE) { 1969 - hist_field->fn = hist_field_none; 1946 + hist_field->fn_num = HIST_FIELD_FN_NOP; 1970 1947 goto out; 1971 1948 } 1972 1949 1973 1950 if (flags & (HIST_FIELD_FL_LOG2 | HIST_FIELD_FL_BUCKET)) { 1974 1951 unsigned long fl = flags & ~(HIST_FIELD_FL_LOG2 | HIST_FIELD_FL_BUCKET); 1975 - hist_field->fn = flags & HIST_FIELD_FL_LOG2 ? hist_field_log2 : 1976 - hist_field_bucket; 1952 + hist_field->fn_num = flags & HIST_FIELD_FL_LOG2 ? 
HIST_FIELD_FN_LOG2 : 1953 + HIST_FIELD_FN_BUCKET; 1977 1954 hist_field->operands[0] = create_hist_field(hist_data, field, fl, NULL); 1978 1955 hist_field->size = hist_field->operands[0]->size; 1979 1956 hist_field->type = kstrdup_const(hist_field->operands[0]->type, GFP_KERNEL); ··· 1983 1960 } 1984 1961 1985 1962 if (flags & HIST_FIELD_FL_TIMESTAMP) { 1986 - hist_field->fn = hist_field_timestamp; 1963 + hist_field->fn_num = HIST_FIELD_FN_TIMESTAMP; 1987 1964 hist_field->size = sizeof(u64); 1988 1965 hist_field->type = "u64"; 1989 1966 goto out; 1990 1967 } 1991 1968 1992 1969 if (flags & HIST_FIELD_FL_CPU) { 1993 - hist_field->fn = hist_field_cpu; 1970 + hist_field->fn_num = HIST_FIELD_FN_CPU; 1994 1971 hist_field->size = sizeof(int); 1995 1972 hist_field->type = "unsigned int"; 1996 1973 goto out; ··· 2010 1987 goto free; 2011 1988 2012 1989 if (field->filter_type == FILTER_STATIC_STRING) { 2013 - hist_field->fn = hist_field_string; 1990 + hist_field->fn_num = HIST_FIELD_FN_STRING; 2014 1991 hist_field->size = field->size; 2015 1992 } else if (field->filter_type == FILTER_DYN_STRING) { 2016 - hist_field->fn = hist_field_dynstring; 1993 + hist_field->fn_num = HIST_FIELD_FN_DYNSTRING; 2017 1994 } else if (field->filter_type == FILTER_RDYN_STRING) 2018 - hist_field->fn = hist_field_reldynstring; 1995 + hist_field->fn_num = HIST_FIELD_FN_RELDYNSTRING; 2019 1996 else 2020 - hist_field->fn = hist_field_pstring; 1997 + hist_field->fn_num = HIST_FIELD_FN_PSTRING; 2021 1998 } else { 2022 1999 hist_field->size = field->size; 2023 2000 hist_field->is_signed = field->is_signed; ··· 2025 2002 if (!hist_field->type) 2026 2003 goto free; 2027 2004 2028 - hist_field->fn = select_value_fn(field->size, 2029 - field->is_signed); 2030 - if (!hist_field->fn) { 2005 + hist_field->fn_num = select_value_fn(field->size, 2006 + field->is_signed); 2007 + if (hist_field->fn_num == HIST_FIELD_FN_NOP) { 2031 2008 destroy_hist_field(hist_field, 0); 2032 2009 return NULL; 2033 2010 } ··· 2363 2340 if (!alias) 2364 2341 return NULL; 2365 2342 2366 - alias->fn = var_ref->fn; 2343 + alias->fn_num = var_ref->fn_num; 2367 2344 alias->operands[0] = var_ref; 2368 2345 2369 2346 if (init_var_ref(alias, var_ref, var_ref->system, var_ref->event_name)) { ··· 2546 2523 2547 2524 expr->flags |= operand1->flags & 2548 2525 (HIST_FIELD_FL_TIMESTAMP | HIST_FIELD_FL_TIMESTAMP_USECS); 2549 - expr->fn = hist_field_unary_minus; 2526 + expr->fn_num = HIST_FIELD_FN_UMINUS; 2550 2527 expr->operands[0] = operand1; 2551 2528 expr->size = operand1->size; 2552 2529 expr->is_signed = operand1->is_signed; ··· 2618 2595 unsigned long operand_flags, operand2_flags; 2619 2596 int field_op, ret = -EINVAL; 2620 2597 char *sep, *operand1_str; 2621 - hist_field_fn_t op_fn; 2598 + enum hist_field_fn op_fn; 2622 2599 bool combine_consts; 2623 2600 2624 2601 if (*n_subexprs > 3) { ··· 2677 2654 2678 2655 switch (field_op) { 2679 2656 case FIELD_OP_MINUS: 2680 - op_fn = hist_field_minus; 2657 + op_fn = HIST_FIELD_FN_MINUS; 2681 2658 break; 2682 2659 case FIELD_OP_PLUS: 2683 - op_fn = hist_field_plus; 2660 + op_fn = HIST_FIELD_FN_PLUS; 2684 2661 break; 2685 2662 case FIELD_OP_DIV: 2686 - op_fn = hist_field_div; 2663 + op_fn = HIST_FIELD_FN_DIV; 2687 2664 break; 2688 2665 case FIELD_OP_MULT: 2689 - op_fn = hist_field_mult; 2666 + op_fn = HIST_FIELD_FN_MULT; 2690 2667 break; 2691 2668 default: 2692 2669 ret = -EINVAL; ··· 2742 2719 op_fn = hist_field_get_div_fn(operand2); 2743 2720 } 2744 2721 2722 + expr->fn_num = op_fn; 2723 + 2745 2724 if (combine_consts) { 
2746 2725 if (var1) 2747 2726 expr->operands[0] = var1; 2748 2727 if (var2) 2749 2728 expr->operands[1] = var2; 2750 2729 2751 - expr->constant = op_fn(expr, NULL, NULL, NULL, NULL); 2730 + expr->constant = hist_fn_call(expr, NULL, NULL, NULL, NULL); 2731 + expr->fn_num = HIST_FIELD_FN_CONST; 2752 2732 2753 2733 expr->operands[0] = NULL; 2754 2734 expr->operands[1] = NULL; ··· 2765 2739 2766 2740 expr->name = expr_str(expr, 0); 2767 2741 } else { 2768 - expr->fn = op_fn; 2769 - 2770 2742 /* The operand sizes should be the same, so just pick one */ 2771 2743 expr->size = operand1->size; 2772 2744 expr->is_signed = operand1->is_signed; ··· 3089 3065 struct hist_field *var = field_var->var; 3090 3066 struct hist_field *val = field_var->val; 3091 3067 3092 - var_val = val->fn(val, elt, buffer, rbe, rec); 3068 + var_val = hist_fn_call(val, elt, buffer, rbe, rec); 3093 3069 var_idx = var->var.idx; 3094 3070 3095 3071 if (val->flags & HIST_FIELD_FL_STRING) { ··· 4210 4186 return (u64)(unsigned long)(elt_data->comm); 4211 4187 } 4212 4188 4189 + static u64 hist_fn_call(struct hist_field *hist_field, 4190 + struct tracing_map_elt *elt, 4191 + struct trace_buffer *buffer, 4192 + struct ring_buffer_event *rbe, 4193 + void *event) 4194 + { 4195 + switch (hist_field->fn_num) { 4196 + case HIST_FIELD_FN_VAR_REF: 4197 + return hist_field_var_ref(hist_field, elt, buffer, rbe, event); 4198 + case HIST_FIELD_FN_COUNTER: 4199 + return hist_field_counter(hist_field, elt, buffer, rbe, event); 4200 + case HIST_FIELD_FN_CONST: 4201 + return hist_field_const(hist_field, elt, buffer, rbe, event); 4202 + case HIST_FIELD_FN_LOG2: 4203 + return hist_field_log2(hist_field, elt, buffer, rbe, event); 4204 + case HIST_FIELD_FN_BUCKET: 4205 + return hist_field_bucket(hist_field, elt, buffer, rbe, event); 4206 + case HIST_FIELD_FN_TIMESTAMP: 4207 + return hist_field_timestamp(hist_field, elt, buffer, rbe, event); 4208 + case HIST_FIELD_FN_CPU: 4209 + return hist_field_cpu(hist_field, elt, buffer, rbe, event); 4210 + case HIST_FIELD_FN_STRING: 4211 + return hist_field_string(hist_field, elt, buffer, rbe, event); 4212 + case HIST_FIELD_FN_DYNSTRING: 4213 + return hist_field_dynstring(hist_field, elt, buffer, rbe, event); 4214 + case HIST_FIELD_FN_RELDYNSTRING: 4215 + return hist_field_reldynstring(hist_field, elt, buffer, rbe, event); 4216 + case HIST_FIELD_FN_PSTRING: 4217 + return hist_field_pstring(hist_field, elt, buffer, rbe, event); 4218 + case HIST_FIELD_FN_S64: 4219 + return hist_field_s64(hist_field, elt, buffer, rbe, event); 4220 + case HIST_FIELD_FN_U64: 4221 + return hist_field_u64(hist_field, elt, buffer, rbe, event); 4222 + case HIST_FIELD_FN_S32: 4223 + return hist_field_s32(hist_field, elt, buffer, rbe, event); 4224 + case HIST_FIELD_FN_U32: 4225 + return hist_field_u32(hist_field, elt, buffer, rbe, event); 4226 + case HIST_FIELD_FN_S16: 4227 + return hist_field_s16(hist_field, elt, buffer, rbe, event); 4228 + case HIST_FIELD_FN_U16: 4229 + return hist_field_u16(hist_field, elt, buffer, rbe, event); 4230 + case HIST_FIELD_FN_S8: 4231 + return hist_field_s8(hist_field, elt, buffer, rbe, event); 4232 + case HIST_FIELD_FN_U8: 4233 + return hist_field_u8(hist_field, elt, buffer, rbe, event); 4234 + case HIST_FIELD_FN_UMINUS: 4235 + return hist_field_unary_minus(hist_field, elt, buffer, rbe, event); 4236 + case HIST_FIELD_FN_MINUS: 4237 + return hist_field_minus(hist_field, elt, buffer, rbe, event); 4238 + case HIST_FIELD_FN_PLUS: 4239 + return hist_field_plus(hist_field, elt, buffer, rbe, event); 4240 + case 
HIST_FIELD_FN_DIV: 4241 + return hist_field_div(hist_field, elt, buffer, rbe, event); 4242 + case HIST_FIELD_FN_MULT: 4243 + return hist_field_mult(hist_field, elt, buffer, rbe, event); 4244 + case HIST_FIELD_FN_DIV_POWER2: 4245 + return div_by_power_of_two(hist_field, elt, buffer, rbe, event); 4246 + case HIST_FIELD_FN_DIV_NOT_POWER2: 4247 + return div_by_not_power_of_two(hist_field, elt, buffer, rbe, event); 4248 + case HIST_FIELD_FN_DIV_MULT_SHIFT: 4249 + return div_by_mult_and_shift(hist_field, elt, buffer, rbe, event); 4250 + case HIST_FIELD_FN_EXECNAME: 4251 + return hist_field_execname(hist_field, elt, buffer, rbe, event); 4252 + default: 4253 + return 0; 4254 + } 4255 + } 4256 + 4213 4257 /* Convert a var that points to common_pid.execname to a string */ 4214 4258 static void update_var_execname(struct hist_field *hist_field) 4215 4259 { ··· 4289 4197 kfree_const(hist_field->type); 4290 4198 hist_field->type = "char[]"; 4291 4199 4292 - hist_field->fn = hist_field_execname; 4200 + hist_field->fn_num = HIST_FIELD_FN_EXECNAME; 4293 4201 } 4294 4202 4295 4203 static int create_var_field(struct hist_trigger_data *hist_data, ··· 5048 4956 5049 4957 for_each_hist_val_field(i, hist_data) { 5050 4958 hist_field = hist_data->fields[i]; 5051 - hist_val = hist_field->fn(hist_field, elt, buffer, rbe, rec); 4959 + hist_val = hist_fn_call(hist_field, elt, buffer, rbe, rec); 5052 4960 if (hist_field->flags & HIST_FIELD_FL_VAR) { 5053 4961 var_idx = hist_field->var.idx; 5054 4962 ··· 5079 4987 for_each_hist_key_field(i, hist_data) { 5080 4988 hist_field = hist_data->fields[i]; 5081 4989 if (hist_field->flags & HIST_FIELD_FL_VAR) { 5082 - hist_val = hist_field->fn(hist_field, elt, buffer, rbe, rec); 4990 + hist_val = hist_fn_call(hist_field, elt, buffer, rbe, rec); 5083 4991 var_idx = hist_field->var.idx; 5084 4992 tracing_map_set_var(elt, var_idx, hist_val); 5085 4993 } ··· 5154 5062 HIST_STACKTRACE_SKIP); 5155 5063 key = entries; 5156 5064 } else { 5157 - field_contents = key_field->fn(key_field, elt, buffer, rbe, rec); 5065 + field_contents = hist_fn_call(key_field, elt, buffer, rbe, rec); 5158 5066 if (key_field->flags & HIST_FIELD_FL_STRING) { 5159 5067 key = (void *)(unsigned long)field_contents; 5160 5068 use_compound_key = true;
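The trace_events_hist.c hunks above replace the per-field ->fn pointer with an fn_num enum tag that is resolved in one place, hist_fn_call(). An indirect call through a stored pointer needs a retpoline on mitigated CPUs, while a switch of this shape compiles to direct branches or a jump table. A minimal user-space sketch of the same pattern, with hypothetical op/OP_FN_* names:

#include <stdio.h>

/* Hypothetical sketch of the fn_num pattern: store an enum tag per
 * object instead of a function pointer, and branch on it in one place. */
enum op_fn { OP_FN_NOP, OP_FN_PLUS, OP_FN_MINUS };

struct op {
	enum op_fn fn_num;
	long a, b;
};

static long op_plus(struct op *op)  { return op->a + op->b; }
static long op_minus(struct op *op) { return op->a - op->b; }

/* The switch is direct branches or a jump table; no indirect call is
 * made, so no retpoline is emitted for the dispatch. */
static long op_call(struct op *op)
{
	switch (op->fn_num) {
	case OP_FN_PLUS:
		return op_plus(op);
	case OP_FN_MINUS:
		return op_minus(op);
	default:
		return 0;
	}
}

int main(void)
{
	struct op op = { .fn_num = OP_FN_MINUS, .a = 7, .b = 3 };

	printf("%ld\n", op_call(&op));	/* prints 4 */
	return 0;
}

The trade-off is that the dispatcher must know every callee up front, which is why enum hist_field_fn spells out each helper, down to the three division specializations.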
+428 -144
kernel/trace/trace_events_user.c
··· 14 14 #include <linux/uio.h>
15 15 #include <linux/ioctl.h>
16 16 #include <linux/jhash.h>
17 + #include <linux/refcount.h>
17 18 #include <linux/trace_events.h>
18 19 #include <linux/tracefs.h>
19 20 #include <linux/types.h>
··· 40 39 */
41 40 #define MAX_PAGE_ORDER 0
42 41 #define MAX_PAGES (1 << MAX_PAGE_ORDER)
43 - #define MAX_EVENTS (MAX_PAGES * PAGE_SIZE)
42 + #define MAX_BYTES (MAX_PAGES * PAGE_SIZE)
43 + #define MAX_EVENTS (MAX_BYTES * 8)
44 44
45 45 /* Limit how long of an event name plus args within the subsystem. */
46 46 #define MAX_EVENT_DESC 512
47 47 #define EVENT_NAME(user_event) ((user_event)->tracepoint.name)
48 48 #define MAX_FIELD_ARRAY_SIZE 1024
49 - #define MAX_FIELD_ARG_NAME 256
50 49
51 - static char *register_page_data;
50 + /*
51 + * The MAP_STATUS_* macros are used for taking an index and determining the
52 + * appropriate byte and the bit in the byte to set/reset for an event.
53 + *
54 + * The lower 3 bits of the index decide which bit to set.
55 + * The remaining upper bits of the index decide which byte to use for the bit.
56 + *
57 + * This is used when an event has a probe attached/removed to reflect live
58 + * status of the event wanting tracing or not to user programs via shared
59 + * memory maps.
60 + */
61 + #define MAP_STATUS_BYTE(index) ((index) >> 3)
62 + #define MAP_STATUS_MASK(index) BIT((index) & 7)
52 63
53 - static DEFINE_MUTEX(reg_mutex);
54 - static DEFINE_HASHTABLE(register_table, 4);
55 - static DECLARE_BITMAP(page_bitmap, MAX_EVENTS);
64 + /*
65 + * Internal bits (kernel side only) to keep track of connected probes:
66 + * These are used when status is requested in text form about an event. These
67 + * bits are compared against an internal byte on the event to determine which
68 + * probes to print out to the user.
69 + *
70 + * These do not reflect the mapped bytes between the user and kernel space.
71 + */
72 + #define EVENT_STATUS_FTRACE BIT(0)
73 + #define EVENT_STATUS_PERF BIT(1)
74 + #define EVENT_STATUS_OTHER BIT(7)
75 +
76 + /*
77 + * Stores the pages, tables, and locks for a group of events.
78 + * Each logical grouping of events has its own group, with a
79 + * matching page for status checks within user programs. This
80 + * allows for isolation of events to user programs by various
81 + * means.
82 + */
83 + struct user_event_group {
84 + struct page *pages;
85 + char *register_page_data;
86 + char *system_name;
87 + struct hlist_node node;
88 + struct mutex reg_mutex;
89 + DECLARE_HASHTABLE(register_table, 8);
90 + DECLARE_BITMAP(page_bitmap, MAX_EVENTS);
91 + };
92 +
93 + /* Group for init_user_ns mapping, top-most group */
94 + static struct user_event_group *init_group;
56 95
57 96 /*
58 97 * Stores per-event properties, as users register events
59 98 * within a file a user_event might be created if it does not
60 99 * already exist. These are globally used and their lifetime
61 100 * is tied to the refcnt member. These cannot go away until the
62 - * refcnt reaches zero.
101 + * refcnt reaches one.
63 102 */
64 103 struct user_event {
104 + struct user_event_group *group;
65 105 struct tracepoint tracepoint;
66 106 struct trace_event_call call;
67 107 struct trace_event_class class;
··· 110 68 struct hlist_node node;
111 69 struct list_head fields;
112 70 struct list_head validators;
113 - atomic_t refcnt;
71 + refcount_t refcnt;
114 72 int index;
115 73 int flags;
116 74 int min_size;
75 + char status;
117 76 };
118 77
119 78 /*
··· 129 86 struct user_event *events[];
130 87 };
131 88
89 + struct user_event_file_info {
90 + struct user_event_group *group;
91 + struct user_event_refs *refs;
92 + };
93 +
132 94 #define VALIDATOR_ENSURE_NULL (1 << 0)
133 95 #define VALIDATOR_REL (1 << 1)
134 96
··· 146 98 typedef void (*user_event_func_t) (struct user_event *user, struct iov_iter *i,
147 99 void *tpdata, bool *faulted);
148 100
149 - static int user_event_parse(char *name, char *args, char *flags,
101 + static int user_event_parse(struct user_event_group *group, char *name,
102 + char *args, char *flags,
150 103 struct user_event **newuser);
151 104
152 105 static u32 user_event_key(char *name)
153 106 {
154 107 return jhash(name, strlen(name), 0);
108 + }
109 +
110 + static void set_page_reservations(char *pages, bool set)
111 + {
112 + int page;
113 +
114 + for (page = 0; page < MAX_PAGES; ++page) {
115 + void *addr = pages + (PAGE_SIZE * page);
116 +
117 + if (set)
118 + SetPageReserved(virt_to_page(addr));
119 + else
120 + ClearPageReserved(virt_to_page(addr));
121 + }
122 + }
123 +
124 + static void user_event_group_destroy(struct user_event_group *group)
125 + {
126 + if (group->register_page_data)
127 + set_page_reservations(group->register_page_data, false);
128 +
129 + if (group->pages)
130 + __free_pages(group->pages, MAX_PAGE_ORDER);
131 +
132 + kfree(group->system_name);
133 + kfree(group);
134 + }
135 +
136 + static char *user_event_group_system_name(struct user_namespace *user_ns)
137 + {
138 + char *system_name;
139 + int len = sizeof(USER_EVENTS_SYSTEM) + 1;
140 +
141 + if (user_ns != &init_user_ns) {
142 + /*
143 + * Unexpected at this point:
144 + * We only currently support init_user_ns.
145 + * When we enable more, this will trigger a failure, so log it.
146 + */ 147 + pr_warn("user_events: Namespace other than init_user_ns!\n"); 148 + return NULL; 149 + } 150 + 151 + system_name = kmalloc(len, GFP_KERNEL); 152 + 153 + if (!system_name) 154 + return NULL; 155 + 156 + snprintf(system_name, len, "%s", USER_EVENTS_SYSTEM); 157 + 158 + return system_name; 159 + } 160 + 161 + static inline struct user_event_group 162 + *user_event_group_from_user_ns(struct user_namespace *user_ns) 163 + { 164 + if (user_ns == &init_user_ns) 165 + return init_group; 166 + 167 + return NULL; 168 + } 169 + 170 + static struct user_event_group *current_user_event_group(void) 171 + { 172 + struct user_namespace *user_ns = current_user_ns(); 173 + struct user_event_group *group = NULL; 174 + 175 + while (user_ns) { 176 + group = user_event_group_from_user_ns(user_ns); 177 + 178 + if (group) 179 + break; 180 + 181 + user_ns = user_ns->parent; 182 + } 183 + 184 + return group; 185 + } 186 + 187 + static struct user_event_group 188 + *user_event_group_create(struct user_namespace *user_ns) 189 + { 190 + struct user_event_group *group; 191 + 192 + group = kzalloc(sizeof(*group), GFP_KERNEL); 193 + 194 + if (!group) 195 + return NULL; 196 + 197 + group->system_name = user_event_group_system_name(user_ns); 198 + 199 + if (!group->system_name) 200 + goto error; 201 + 202 + group->pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, MAX_PAGE_ORDER); 203 + 204 + if (!group->pages) 205 + goto error; 206 + 207 + group->register_page_data = page_address(group->pages); 208 + 209 + set_page_reservations(group->register_page_data, true); 210 + 211 + /* Zero all bits beside 0 (which is reserved for failures) */ 212 + bitmap_zero(group->page_bitmap, MAX_EVENTS); 213 + set_bit(0, group->page_bitmap); 214 + 215 + mutex_init(&group->reg_mutex); 216 + hash_init(group->register_table); 217 + 218 + return group; 219 + error: 220 + if (group) 221 + user_event_group_destroy(group); 222 + 223 + return NULL; 224 + }; 225 + 226 + static __always_inline 227 + void user_event_register_set(struct user_event *user) 228 + { 229 + int i = user->index; 230 + 231 + user->group->register_page_data[MAP_STATUS_BYTE(i)] |= MAP_STATUS_MASK(i); 232 + } 233 + 234 + static __always_inline 235 + void user_event_register_clear(struct user_event *user) 236 + { 237 + int i = user->index; 238 + 239 + user->group->register_page_data[MAP_STATUS_BYTE(i)] &= ~MAP_STATUS_MASK(i); 240 + } 241 + 242 + static __always_inline __must_check 243 + bool user_event_last_ref(struct user_event *user) 244 + { 245 + return refcount_read(&user->refcnt) == 1; 155 246 } 156 247 157 248 static __always_inline __must_check ··· 328 141 * 329 142 * Upon success user_event has its ref count increased by 1. 
330 143 */ 331 - static int user_event_parse_cmd(char *raw_command, struct user_event **newuser) 144 + static int user_event_parse_cmd(struct user_event_group *group, 145 + char *raw_command, struct user_event **newuser) 332 146 { 333 147 char *name = raw_command; 334 148 char *args = strpbrk(name, " "); ··· 343 155 if (flags) 344 156 *flags++ = '\0'; 345 157 346 - return user_event_parse(name, args, flags, newuser); 158 + return user_event_parse(group, name, args, flags, newuser); 347 159 } 348 160 349 161 static int user_field_array_size(const char *type) ··· 465 277 goto add_field; 466 278 467 279 add_validator: 468 - if (strstr(type, "char") != 0) 280 + if (strstr(type, "char") != NULL) 469 281 validator_flags |= VALIDATOR_ENSURE_NULL; 470 282 471 283 validator = kmalloc(sizeof(*validator), GFP_KERNEL); ··· 646 458 return "%d"; 647 459 if (strcmp(type, "unsigned char") == 0) 648 460 return "%u"; 649 - if (strstr(type, "char[") != 0) 461 + if (strstr(type, "char[") != NULL) 650 462 return "%s"; 651 463 652 464 /* Unknown, likely struct, allowed treat as 64-bit */ ··· 667 479 668 480 return false; 669 481 check: 670 - return strstr(type, "char") != 0; 482 + return strstr(type, "char") != NULL; 671 483 } 672 484 673 485 #define LEN_OR_ZERO (len ? len - pos : 0) 486 + static int user_dyn_field_set_string(int argc, const char **argv, int *iout, 487 + char *buf, int len, bool *colon) 488 + { 489 + int pos = 0, i = *iout; 490 + 491 + *colon = false; 492 + 493 + for (; i < argc; ++i) { 494 + if (i != *iout) 495 + pos += snprintf(buf + pos, LEN_OR_ZERO, " "); 496 + 497 + pos += snprintf(buf + pos, LEN_OR_ZERO, "%s", argv[i]); 498 + 499 + if (strchr(argv[i], ';')) { 500 + ++i; 501 + *colon = true; 502 + break; 503 + } 504 + } 505 + 506 + /* Actual set, advance i */ 507 + if (len != 0) 508 + *iout = i; 509 + 510 + return pos + 1; 511 + } 512 + 513 + static int user_field_set_string(struct ftrace_event_field *field, 514 + char *buf, int len, bool colon) 515 + { 516 + int pos = 0; 517 + 518 + pos += snprintf(buf + pos, LEN_OR_ZERO, "%s", field->type); 519 + pos += snprintf(buf + pos, LEN_OR_ZERO, " "); 520 + pos += snprintf(buf + pos, LEN_OR_ZERO, "%s", field->name); 521 + 522 + if (colon) 523 + pos += snprintf(buf + pos, LEN_OR_ZERO, ";"); 524 + 525 + return pos + 1; 526 + } 527 + 674 528 static int user_event_set_print_fmt(struct user_event *user, char *buf, int len) 675 529 { 676 530 struct ftrace_event_field *field, *next; ··· 830 600 831 601 dyn_event_remove(&user->devent); 832 602 833 - register_page_data[user->index] = 0; 834 - clear_bit(user->index, page_bitmap); 603 + user_event_register_clear(user); 604 + clear_bit(user->index, user->group->page_bitmap); 835 605 hash_del(&user->node); 836 606 837 607 user_event_destroy_validators(user); ··· 842 612 return ret; 843 613 } 844 614 845 - static struct user_event *find_user_event(char *name, u32 *outkey) 615 + static struct user_event *find_user_event(struct user_event_group *group, 616 + char *name, u32 *outkey) 846 617 { 847 618 struct user_event *user; 848 619 u32 key = user_event_key(name); 849 620 850 621 *outkey = key; 851 622 852 - hash_for_each_possible(register_table, user, node, key) 623 + hash_for_each_possible(group->register_table, user, node, key) 853 624 if (!strcmp(EVENT_NAME(user), name)) { 854 - atomic_inc(&user->refcnt); 625 + refcount_inc(&user->refcnt); 855 626 return user; 856 627 } 857 628 ··· 1010 779 rcu_read_unlock_sched(); 1011 780 } 1012 781 1013 - register_page_data[user->index] = status; 782 + if (status) 783 + 
user_event_register_set(user); 784 + else 785 + user_event_register_clear(user); 786 + 787 + user->status = status; 1014 788 } 1015 789 1016 790 /* ··· 1071 835 1072 836 return ret; 1073 837 inc: 1074 - atomic_inc(&user->refcnt); 838 + refcount_inc(&user->refcnt); 1075 839 update_reg_page_for(user); 1076 840 return 0; 1077 841 dec: 1078 842 update_reg_page_for(user); 1079 - atomic_dec(&user->refcnt); 843 + refcount_dec(&user->refcnt); 1080 844 return 0; 1081 845 } 1082 846 1083 847 static int user_event_create(const char *raw_command) 1084 848 { 849 + struct user_event_group *group; 1085 850 struct user_event *user; 1086 851 char *name; 1087 852 int ret; ··· 1098 861 if (!name) 1099 862 return -ENOMEM; 1100 863 1101 - mutex_lock(&reg_mutex); 864 + group = current_user_event_group(); 1102 865 1103 - ret = user_event_parse_cmd(name, &user); 866 + if (!group) 867 + return -ENOENT; 868 + 869 + mutex_lock(&group->reg_mutex); 870 + 871 + ret = user_event_parse_cmd(group, name, &user); 1104 872 1105 873 if (!ret) 1106 - atomic_dec(&user->refcnt); 874 + refcount_dec(&user->refcnt); 1107 875 1108 - mutex_unlock(&reg_mutex); 876 + mutex_unlock(&group->reg_mutex); 1109 877 1110 878 if (ret) 1111 879 kfree(name); ··· 1152 910 { 1153 911 struct user_event *user = container_of(ev, struct user_event, devent); 1154 912 1155 - return atomic_read(&user->refcnt) != 0; 913 + return !user_event_last_ref(user); 1156 914 } 1157 915 1158 916 static int user_event_free(struct dyn_event *ev) 1159 917 { 1160 918 struct user_event *user = container_of(ev, struct user_event, devent); 1161 919 1162 - if (atomic_read(&user->refcnt) != 0) 920 + if (!user_event_last_ref(user)) 1163 921 return -EBUSY; 1164 922 1165 923 return destroy_user_event(user); ··· 1168 926 static bool user_field_match(struct ftrace_event_field *field, int argc, 1169 927 const char **argv, int *iout) 1170 928 { 1171 - char *field_name, *arg_name; 1172 - int len, pos, i = *iout; 929 + char *field_name = NULL, *dyn_field_name = NULL; 1173 930 bool colon = false, match = false; 931 + int dyn_len, len; 1174 932 1175 - if (i >= argc) 933 + if (*iout >= argc) 1176 934 return false; 1177 935 1178 - len = MAX_FIELD_ARG_NAME; 1179 - field_name = kmalloc(len, GFP_KERNEL); 1180 - arg_name = kmalloc(len, GFP_KERNEL); 936 + dyn_len = user_dyn_field_set_string(argc, argv, iout, dyn_field_name, 937 + 0, &colon); 1181 938 1182 - if (!arg_name || !field_name) 939 + len = user_field_set_string(field, field_name, 0, colon); 940 + 941 + if (dyn_len != len) 942 + return false; 943 + 944 + dyn_field_name = kmalloc(dyn_len, GFP_KERNEL); 945 + field_name = kmalloc(len, GFP_KERNEL); 946 + 947 + if (!dyn_field_name || !field_name) 1183 948 goto out; 1184 949 1185 - pos = 0; 950 + user_dyn_field_set_string(argc, argv, iout, dyn_field_name, 951 + dyn_len, &colon); 1186 952 1187 - for (; i < argc; ++i) { 1188 - if (i != *iout) 1189 - pos += snprintf(arg_name + pos, len - pos, " "); 953 + user_field_set_string(field, field_name, len, colon); 1190 954 1191 - pos += snprintf(arg_name + pos, len - pos, argv[i]); 1192 - 1193 - if (strchr(argv[i], ';')) { 1194 - ++i; 1195 - colon = true; 1196 - break; 1197 - } 1198 - } 1199 - 1200 - pos = 0; 1201 - 1202 - pos += snprintf(field_name + pos, len - pos, field->type); 1203 - pos += snprintf(field_name + pos, len - pos, " "); 1204 - pos += snprintf(field_name + pos, len - pos, field->name); 1205 - 1206 - if (colon) 1207 - pos += snprintf(field_name + pos, len - pos, ";"); 1208 - 1209 - *iout = i; 1210 - 1211 - match = strcmp(arg_name, 
field_name) == 0; 955 + match = strcmp(dyn_field_name, field_name) == 0; 1212 956 out: 1213 - kfree(arg_name); 957 + kfree(dyn_field_name); 1214 958 kfree(field_name); 1215 959 1216 960 return match; ··· 1264 1036 * The name buffer lifetime is owned by this method for success cases only. 1265 1037 * Upon success the returned user_event has its ref count increased by 1. 1266 1038 */ 1267 - static int user_event_parse(char *name, char *args, char *flags, 1039 + static int user_event_parse(struct user_event_group *group, char *name, 1040 + char *args, char *flags, 1268 1041 struct user_event **newuser) 1269 1042 { 1270 1043 int ret; ··· 1275 1046 1276 1047 /* Prevent dyn_event from racing */ 1277 1048 mutex_lock(&event_mutex); 1278 - user = find_user_event(name, &key); 1049 + user = find_user_event(group, name, &key); 1279 1050 mutex_unlock(&event_mutex); 1280 1051 1281 1052 if (user) { ··· 1288 1059 return 0; 1289 1060 } 1290 1061 1291 - index = find_first_zero_bit(page_bitmap, MAX_EVENTS); 1062 + index = find_first_zero_bit(group->page_bitmap, MAX_EVENTS); 1292 1063 1293 1064 if (index == MAX_EVENTS) 1294 1065 return -EMFILE; ··· 1302 1073 INIT_LIST_HEAD(&user->fields); 1303 1074 INIT_LIST_HEAD(&user->validators); 1304 1075 1076 + user->group = group; 1305 1077 user->tracepoint.name = name; 1306 1078 1307 1079 ret = user_event_parse_fields(user, args); ··· 1321 1091 user->call.flags = TRACE_EVENT_FL_TRACEPOINT; 1322 1092 user->call.tp = &user->tracepoint; 1323 1093 user->call.event.funcs = &user_event_funcs; 1094 + user->class.system = group->system_name; 1324 1095 1325 - user->class.system = USER_EVENTS_SYSTEM; 1326 1096 user->class.fields_array = user_event_fields_array; 1327 1097 user->class.get_fields = user_event_get_fields; 1328 1098 user->class.reg = user_event_reg; ··· 1340 1110 1341 1111 user->index = index; 1342 1112 1343 - /* Ensure we track ref */ 1344 - atomic_inc(&user->refcnt); 1113 + /* Ensure we track self ref and caller ref (2) */ 1114 + refcount_set(&user->refcnt, 2); 1345 1115 1346 1116 dyn_event_init(&user->devent, &user_event_dops); 1347 1117 dyn_event_add(&user->devent, &user->call); 1348 - set_bit(user->index, page_bitmap); 1349 - hash_add(register_table, &user->node, key); 1118 + set_bit(user->index, group->page_bitmap); 1119 + hash_add(group->register_table, &user->node, key); 1350 1120 1351 1121 mutex_unlock(&event_mutex); 1352 1122 ··· 1364 1134 /* 1365 1135 * Deletes a previously created event if it is no longer being used. 
1366 1136 */ 1367 - static int delete_user_event(char *name) 1137 + static int delete_user_event(struct user_event_group *group, char *name) 1368 1138 { 1369 1139 u32 key; 1370 - int ret; 1371 - struct user_event *user = find_user_event(name, &key); 1140 + struct user_event *user = find_user_event(group, name, &key); 1372 1141 1373 1142 if (!user) 1374 1143 return -ENOENT; 1375 1144 1376 - /* Ensure we are the last ref */ 1377 - if (atomic_read(&user->refcnt) != 1) { 1378 - ret = -EBUSY; 1379 - goto put_ref; 1380 - } 1145 + refcount_dec(&user->refcnt); 1381 1146 1382 - ret = destroy_user_event(user); 1147 + if (!user_event_last_ref(user)) 1148 + return -EBUSY; 1383 1149 1384 - if (ret) 1385 - goto put_ref; 1386 - 1387 - return ret; 1388 - put_ref: 1389 - /* No longer have this ref */ 1390 - atomic_dec(&user->refcnt); 1391 - 1392 - return ret; 1150 + return destroy_user_event(user); 1393 1151 } 1394 1152 1395 1153 /* ··· 1385 1167 */ 1386 1168 static ssize_t user_events_write_core(struct file *file, struct iov_iter *i) 1387 1169 { 1170 + struct user_event_file_info *info = file->private_data; 1388 1171 struct user_event_refs *refs; 1389 1172 struct user_event *user = NULL; 1390 1173 struct tracepoint *tp; ··· 1397 1178 1398 1179 rcu_read_lock_sched(); 1399 1180 1400 - refs = rcu_dereference_sched(file->private_data); 1181 + refs = rcu_dereference_sched(info->refs); 1401 1182 1402 1183 /* 1403 1184 * The refs->events array is protected by RCU, and new items may be ··· 1455 1236 return ret; 1456 1237 } 1457 1238 1239 + static int user_events_open(struct inode *node, struct file *file) 1240 + { 1241 + struct user_event_group *group; 1242 + struct user_event_file_info *info; 1243 + 1244 + group = current_user_event_group(); 1245 + 1246 + if (!group) 1247 + return -ENOENT; 1248 + 1249 + info = kzalloc(sizeof(*info), GFP_KERNEL); 1250 + 1251 + if (!info) 1252 + return -ENOMEM; 1253 + 1254 + info->group = group; 1255 + 1256 + file->private_data = info; 1257 + 1258 + return 0; 1259 + } 1260 + 1458 1261 static ssize_t user_events_write(struct file *file, const char __user *ubuf, 1459 1262 size_t count, loff_t *ppos) 1460 1263 { ··· 1486 1245 if (unlikely(*ppos != 0)) 1487 1246 return -EFAULT; 1488 1247 1489 - if (unlikely(import_single_range(READ, (char *)ubuf, count, &iov, &i))) 1248 + if (unlikely(import_single_range(WRITE, (char __user *)ubuf, 1249 + count, &iov, &i))) 1490 1250 return -EFAULT; 1491 1251 1492 1252 return user_events_write_core(file, &i); ··· 1498 1256 return user_events_write_core(kp->ki_filp, i); 1499 1257 } 1500 1258 1501 - static int user_events_ref_add(struct file *file, struct user_event *user) 1259 + static int user_events_ref_add(struct user_event_file_info *info, 1260 + struct user_event *user) 1502 1261 { 1262 + struct user_event_group *group = info->group; 1503 1263 struct user_event_refs *refs, *new_refs; 1504 1264 int i, size, count = 0; 1505 1265 1506 - refs = rcu_dereference_protected(file->private_data, 1507 - lockdep_is_held(&reg_mutex)); 1266 + refs = rcu_dereference_protected(info->refs, 1267 + lockdep_is_held(&group->reg_mutex)); 1508 1268 1509 1269 if (refs) { 1510 1270 count = refs->count; ··· 1530 1286 1531 1287 new_refs->events[i] = user; 1532 1288 1533 - atomic_inc(&user->refcnt); 1289 + refcount_inc(&user->refcnt); 1534 1290 1535 - rcu_assign_pointer(file->private_data, new_refs); 1291 + rcu_assign_pointer(info->refs, new_refs); 1536 1292 1537 1293 if (refs) 1538 1294 kfree_rcu(refs, rcu); ··· 1553 1309 if (size > PAGE_SIZE) 1554 1310 return -E2BIG; 1555 
1311 1556 - return copy_struct_from_user(kreg, sizeof(*kreg), ureg, size); 1312 + if (size < offsetofend(struct user_reg, write_index)) 1313 + return -EINVAL; 1314 + 1315 + ret = copy_struct_from_user(kreg, sizeof(*kreg), ureg, size); 1316 + 1317 + if (ret) 1318 + return ret; 1319 + 1320 + kreg->size = size; 1321 + 1322 + return 0; 1557 1323 } 1558 1324 1559 1325 /* 1560 1326 * Registers a user_event on behalf of a user process. 1561 1327 */ 1562 - static long user_events_ioctl_reg(struct file *file, unsigned long uarg) 1328 + static long user_events_ioctl_reg(struct user_event_file_info *info, 1329 + unsigned long uarg) 1563 1330 { 1564 1331 struct user_reg __user *ureg = (struct user_reg __user *)uarg; 1565 1332 struct user_reg reg; ··· 1591 1336 return ret; 1592 1337 } 1593 1338 1594 - ret = user_event_parse_cmd(name, &user); 1339 + ret = user_event_parse_cmd(info->group, name, &user); 1595 1340 1596 1341 if (ret) { 1597 1342 kfree(name); 1598 1343 return ret; 1599 1344 } 1600 1345 1601 - ret = user_events_ref_add(file, user); 1346 + ret = user_events_ref_add(info, user); 1602 1347 1603 1348 /* No longer need parse ref, ref_add either worked or not */ 1604 - atomic_dec(&user->refcnt); 1349 + refcount_dec(&user->refcnt); 1605 1350 1606 1351 /* Positive number is index and valid */ 1607 1352 if (ret < 0) 1608 1353 return ret; 1609 1354 1610 1355 put_user((u32)ret, &ureg->write_index); 1611 - put_user(user->index, &ureg->status_index); 1356 + put_user(user->index, &ureg->status_bit); 1612 1357 1613 1358 return 0; 1614 1359 } ··· 1616 1361 /* 1617 1362 * Deletes a user_event on behalf of a user process. 1618 1363 */ 1619 - static long user_events_ioctl_del(struct file *file, unsigned long uarg) 1364 + static long user_events_ioctl_del(struct user_event_file_info *info, 1365 + unsigned long uarg) 1620 1366 { 1621 1367 void __user *ubuf = (void __user *)uarg; 1622 1368 char *name; ··· 1630 1374 1631 1375 /* event_mutex prevents dyn_event from racing */ 1632 1376 mutex_lock(&event_mutex); 1633 - ret = delete_user_event(name); 1377 + ret = delete_user_event(info->group, name); 1634 1378 mutex_unlock(&event_mutex); 1635 1379 1636 1380 kfree(name); ··· 1644 1388 static long user_events_ioctl(struct file *file, unsigned int cmd, 1645 1389 unsigned long uarg) 1646 1390 { 1391 + struct user_event_file_info *info = file->private_data; 1392 + struct user_event_group *group = info->group; 1647 1393 long ret = -ENOTTY; 1648 1394 1649 1395 switch (cmd) { 1650 1396 case DIAG_IOCSREG: 1651 - mutex_lock(&reg_mutex); 1652 - ret = user_events_ioctl_reg(file, uarg); 1653 - mutex_unlock(&reg_mutex); 1397 + mutex_lock(&group->reg_mutex); 1398 + ret = user_events_ioctl_reg(info, uarg); 1399 + mutex_unlock(&group->reg_mutex); 1654 1400 break; 1655 1401 1656 1402 case DIAG_IOCSDEL: 1657 - mutex_lock(&reg_mutex); 1658 - ret = user_events_ioctl_del(file, uarg); 1659 - mutex_unlock(&reg_mutex); 1403 + mutex_lock(&group->reg_mutex); 1404 + ret = user_events_ioctl_del(info, uarg); 1405 + mutex_unlock(&group->reg_mutex); 1660 1406 break; 1661 1407 } 1662 1408 ··· 1670 1412 */ 1671 1413 static int user_events_release(struct inode *node, struct file *file) 1672 1414 { 1415 + struct user_event_file_info *info = file->private_data; 1416 + struct user_event_group *group; 1673 1417 struct user_event_refs *refs; 1674 1418 struct user_event *user; 1675 1419 int i; 1420 + 1421 + if (!info) 1422 + return -EINVAL; 1423 + 1424 + group = info->group; 1676 1425 1677 1426 /* 1678 1427 * Ensure refs cannot change under any situation by 
taking the 1679 1428 * register mutex during the final freeing of the references. 1680 1429 */ 1681 - mutex_lock(&reg_mutex); 1430 + mutex_lock(&group->reg_mutex); 1682 1431 1683 - refs = file->private_data; 1432 + refs = info->refs; 1684 1433 1685 1434 if (!refs) 1686 1435 goto out; ··· 1701 1436 user = refs->events[i]; 1702 1437 1703 1438 if (user) 1704 - atomic_dec(&user->refcnt); 1439 + refcount_dec(&user->refcnt); 1705 1440 } 1706 1441 out: 1707 1442 file->private_data = NULL; 1708 1443 1709 - mutex_unlock(&reg_mutex); 1444 + mutex_unlock(&group->reg_mutex); 1710 1445 1711 1446 kfree(refs); 1447 + kfree(info); 1712 1448 1713 1449 return 0; 1714 1450 } 1715 1451 1716 1452 static const struct file_operations user_data_fops = { 1453 + .open = user_events_open, 1717 1454 .write = user_events_write, 1718 1455 .write_iter = user_events_write_iter, 1719 1456 .unlocked_ioctl = user_events_ioctl, 1720 1457 .release = user_events_release, 1721 1458 }; 1722 1459 1460 + static struct user_event_group *user_status_group(struct file *file) 1461 + { 1462 + struct seq_file *m = file->private_data; 1463 + 1464 + if (!m) 1465 + return NULL; 1466 + 1467 + return m->private; 1468 + } 1469 + 1723 1470 /* 1724 1471 * Maps the shared page into the user process for checking if event is enabled. 1725 1472 */ 1726 1473 static int user_status_mmap(struct file *file, struct vm_area_struct *vma) 1727 1474 { 1475 + char *pages; 1476 + struct user_event_group *group = user_status_group(file); 1728 1477 unsigned long size = vma->vm_end - vma->vm_start; 1729 1478 1730 - if (size != MAX_EVENTS) 1479 + if (size != MAX_BYTES) 1731 1480 return -EINVAL; 1732 1481 1482 + if (!group) 1483 + return -EINVAL; 1484 + 1485 + pages = group->register_page_data; 1486 + 1733 1487 return remap_pfn_range(vma, vma->vm_start, 1734 - virt_to_phys(register_page_data) >> PAGE_SHIFT, 1488 + virt_to_phys(pages) >> PAGE_SHIFT, 1735 1489 size, vm_get_page_prot(VM_READ)); 1736 1490 } 1737 1491 ··· 1774 1490 1775 1491 static int user_seq_show(struct seq_file *m, void *p) 1776 1492 { 1493 + struct user_event_group *group = m->private; 1777 1494 struct user_event *user; 1778 1495 char status; 1779 1496 int i, active = 0, busy = 0, flags; 1780 1497 1781 - mutex_lock(&reg_mutex); 1498 + if (!group) 1499 + return -EINVAL; 1782 1500 1783 - hash_for_each(register_table, i, user, node) { 1784 - status = register_page_data[user->index]; 1501 + mutex_lock(&group->reg_mutex); 1502 + 1503 + hash_for_each(group->register_table, i, user, node) { 1504 + status = user->status; 1785 1505 flags = user->flags; 1786 1506 1787 1507 seq_printf(m, "%d:%s", user->index, EVENT_NAME(user)); ··· 1808 1520 active++; 1809 1521 } 1810 1522 1811 - mutex_unlock(&reg_mutex); 1523 + mutex_unlock(&group->reg_mutex); 1812 1524 1813 1525 seq_puts(m, "\n"); 1814 1526 seq_printf(m, "Active: %d\n", active); ··· 1827 1539 1828 1540 static int user_status_open(struct inode *node, struct file *file) 1829 1541 { 1830 - return seq_open(file, &user_seq_ops); 1542 + struct user_event_group *group; 1543 + int ret; 1544 + 1545 + group = current_user_event_group(); 1546 + 1547 + if (!group) 1548 + return -ENOENT; 1549 + 1550 + ret = seq_open(file, &user_seq_ops); 1551 + 1552 + if (!ret) { 1553 + /* Chain group to seq_file */ 1554 + struct seq_file *m = file->private_data; 1555 + 1556 + m->private = group; 1557 + } 1558 + 1559 + return ret; 1831 1560 } 1832 1561 1833 1562 static const struct file_operations user_status_fops = { ··· 1885 1580 return -ENODEV; 1886 1581 } 1887 1582 1888 - static 
void set_page_reservations(bool set) 1889 - { 1890 - int page; 1891 - 1892 - for (page = 0; page < MAX_PAGES; ++page) { 1893 - void *addr = register_page_data + (PAGE_SIZE * page); 1894 - 1895 - if (set) 1896 - SetPageReserved(virt_to_page(addr)); 1897 - else 1898 - ClearPageReserved(virt_to_page(addr)); 1899 - } 1900 - } 1901 - 1902 1583 static int __init trace_events_user_init(void) 1903 1584 { 1904 - struct page *pages; 1905 1585 int ret; 1906 1586 1907 - /* Zero all bits beside 0 (which is reserved for failures) */ 1908 - bitmap_zero(page_bitmap, MAX_EVENTS); 1909 - set_bit(0, page_bitmap); 1587 + init_group = user_event_group_create(&init_user_ns); 1910 1588 1911 - pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, MAX_PAGE_ORDER); 1912 - if (!pages) 1589 + if (!init_group) 1913 1590 return -ENOMEM; 1914 - register_page_data = page_address(pages); 1915 - 1916 - set_page_reservations(true); 1917 1591 1918 1592 ret = create_user_tracefs(); 1919 1593 1920 1594 if (ret) { 1921 1595 pr_warn("user_events could not register with tracefs\n"); 1922 - set_page_reservations(false); 1923 - __free_pages(pages, MAX_PAGE_ORDER); 1596 + user_event_group_destroy(init_group); 1597 + init_group = NULL; 1924 1598 return ret; 1925 1599 } 1926 1600
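The MAP_STATUS_BYTE()/MAP_STATUS_MASK() macros in the hunks above fix the shared-page layout: the bit for event index N sits at bit (N & 7) of byte (N >> 3). A user program that mmap()s user_events_status can test it byte-wise, exactly as the selftests' status_check() helper further down does. A minimal sketch against a stand-in buffer (the page[] array and the names here are hypothetical; a real program would use the mmap()ed page):

#include <stdio.h>

/* User-side mirror of the kernel's MAP_STATUS_BYTE()/MAP_STATUS_MASK(). */
#define STATUS_BYTE(bit)	((bit) >> 3)
#define STATUS_MASK(bit)	(1 << ((bit) & 7))

static int event_enabled(const char *status_page, int status_bit)
{
	return status_page[STATUS_BYTE(status_bit)] & STATUS_MASK(status_bit);
}

int main(void)
{
	/* Stand-in for the mmap'd status page. */
	char page[4096] = {0};

	page[1] = 1 << 1;	/* pretend the kernel set status_bit 9 */
	printf("%d\n", !!event_enabled(page, 9));	/* prints 1 */
	return 0;
}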
+2 -1
kernel/trace/trace_osnoise.c
··· 1786 1786 for_each_cpu(cpu, current_mask) { 1787 1787 retval = start_kthread(cpu); 1788 1788 if (retval) { 1789 + cpus_read_unlock(); 1789 1790 stop_per_cpu_kthreads(); 1790 - break; 1791 + return retval; 1791 1792 } 1792 1793 } 1793 1794
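The decisive part of the osnoise fix above is the cpus_read_unlock() added before stop_per_cpu_kthreads(): the loop runs with the CPU-hotplug read lock held, and the unlock-first ordering suggests stop_per_cpu_kthreads() acquires that lock itself, so calling it from the error path without dropping the lock first could recurse on it. Returning retval directly also replaces the old break out of the loop, so the failure path no longer falls through the normal exit.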
+2 -1
kernel/trace/trace_probe.h
··· 445 445 C(SAME_PROBE, "There is already the exact same probe event"),\ 446 446 C(NO_EVENT_INFO, "This requires both group and event name to attach"),\ 447 447 C(BAD_ATTACH_EVENT, "Attached event does not exist"),\ 448 - C(BAD_ATTACH_ARG, "Attached event does not have this field"), 448 + C(BAD_ATTACH_ARG, "Attached event does not have this field"),\ 449 + C(NO_EP_FILTER, "No filter rule after 'if'"), 449 450 450 451 #undef C 451 452 #define C(a, b) TP_ERR_##a
+2 -3
kernel/trace/tracing_map.c
··· 961 961 static void detect_dups(struct tracing_map_sort_entry **sort_entries, 962 962 int n_entries, unsigned int key_size) 963 963 { 964 - unsigned int dups = 0, total_dups = 0; 964 + unsigned int total_dups = 0; 965 965 int i; 966 966 void *key; 967 967 ··· 974 974 key = sort_entries[0]->key; 975 975 for (i = 1; i < n_entries; i++) { 976 976 if (!memcmp(sort_entries[i]->key, key, key_size)) { 977 - dups++; total_dups++; 977 + total_dups++; 978 978 continue; 979 979 } 980 980 key = sort_entries[i]->key; 981 - dups = 0; 982 981 } 983 982 984 983 WARN_ONCE(total_dups > 0,
+6 -8
kernel/tracepoint.c
··· 640 640 static int tracepoint_module_coming(struct module *mod) 641 641 { 642 642 struct tp_module *tp_mod; 643 - int ret = 0; 644 643 645 644 if (!mod->num_tracepoints) 646 645 return 0; ··· 651 652 */ 652 653 if (trace_module_has_bad_taint(mod)) 653 654 return 0; 654 - mutex_lock(&tracepoint_module_list_mutex); 655 + 655 656 tp_mod = kmalloc(sizeof(struct tp_module), GFP_KERNEL); 656 - if (!tp_mod) { 657 - ret = -ENOMEM; 658 - goto end; 659 - } 657 + if (!tp_mod) 658 + return -ENOMEM; 660 659 tp_mod->mod = mod; 660 + 661 + mutex_lock(&tracepoint_module_list_mutex); 661 662 list_add_tail(&tp_mod->list, &tracepoint_module_list); 662 663 blocking_notifier_call_chain(&tracepoint_notify_list, 663 664 MODULE_STATE_COMING, tp_mod); 664 - end: 665 665 mutex_unlock(&tracepoint_module_list_mutex); 666 - return ret; 666 + return 0; 667 667 } 668 668 669 669 static void tracepoint_module_going(struct module *mod)
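The tracepoint_module_coming() rework above moves the kmalloc() ahead of mutex_lock(), so the only fallible step happens with no lock held; the end label, the ret variable, and the unlock-on-error path all disappear. A small user-space sketch of the same shape (list and lock names are hypothetical):

#include <pthread.h>
#include <stdlib.h>

struct node {
	struct node *next;
	int val;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *head;

/* Allocate before locking: the only step that can fail runs while no
 * lock is held, so there is no unlock-on-error path or goto label. */
static int add_node(int val)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->val = val;

	pthread_mutex_lock(&list_lock);
	n->next = head;
	head = n;
	pthread_mutex_unlock(&list_lock);

	return 0;
}

int main(void)
{
	return add_node(42) ? 1 : 0;
}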
+18 -7
samples/user_events/example.c
··· 12 12 #include <fcntl.h> 13 13 #include <stdio.h> 14 14 #include <unistd.h> 15 + #include <asm/bitsperlong.h> 16 + #include <endian.h> 15 17 #include <linux/user_events.h> 18 + 19 + #if __BITS_PER_LONG == 64 20 + #define endian_swap(x) htole64(x) 21 + #else 22 + #define endian_swap(x) htole32(x) 23 + #endif 16 24 17 25 /* Assumes debugfs is mounted */ 18 26 const char *data_file = "/sys/kernel/debug/tracing/user_events_data"; 19 27 const char *status_file = "/sys/kernel/debug/tracing/user_events_status"; 20 28 21 - static int event_status(char **status) 29 + static int event_status(long **status) 22 30 { 23 31 int fd = open(status_file, O_RDONLY); 24 32 ··· 41 33 return 0; 42 34 } 43 35 44 - static int event_reg(int fd, const char *command, int *status, int *write) 36 + static int event_reg(int fd, const char *command, long *index, long *mask, 37 + int *write) 45 38 { 46 39 struct user_reg reg = {0}; 47 40 ··· 52 43 if (ioctl(fd, DIAG_IOCSREG, &reg) == -1) 53 44 return -1; 54 45 55 - *status = reg.status_index; 46 + *index = reg.status_bit / __BITS_PER_LONG; 47 + *mask = endian_swap(1L << (reg.status_bit % __BITS_PER_LONG)); 56 48 *write = reg.write_index; 57 49 58 50 return 0; ··· 61 51 62 52 int main(int argc, char **argv) 63 53 { 64 - int data_fd, status, write; 65 - char *status_page; 54 + int data_fd, write; 55 + long index, mask; 56 + long *status_page; 66 57 struct iovec io[2]; 67 58 __u32 count = 0; 68 59 ··· 72 61 73 62 data_fd = open(data_file, O_RDWR); 74 63 75 - if (event_reg(data_fd, "test u32 count", &status, &write) == -1) 64 + if (event_reg(data_fd, "test u32 count", &index, &mask, &write) == -1) 76 65 return errno; 77 66 78 67 /* Setup iovec */ ··· 86 75 getchar(); 87 76 88 77 /* Check if anyone is listening */ 89 - if (status_page[status]) { 78 + if (status_page[index] & mask) { 90 79 /* Yep, trace out our data */ 91 80 writev(data_fd, (const struct iovec *)io, 2); 92 81
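A note on the sample's endian_swap(): the status page is a byte array in which bit N lives at bit (N & 7) of byte (N >> 3) regardless of host byte order. Reading the page one long at a time, as the sample now does, only works if the long-wise mask is stored little-endian, which htole32()/htole64() provide: a no-op on little-endian hosts and a byte swap on big-endian ones.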
+27
tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc
··· 1 + #!/bin/sh 2 + # SPDX-License-Identifier: GPL-2.0 3 + # description: Event probe event parser error log check 4 + # requires: dynamic_events events/syscalls/sys_enter_openat "<attached-group>.<attached-event> [<args>]":README error_log 5 + 6 + check_error() { # command-with-error-pos-by-^ 7 + ftrace_errlog_check 'event_probe' "$1" 'dynamic_events' 8 + } 9 + 10 + check_error 'e ^a.' # NO_EVENT_INFO 11 + check_error 'e ^.b' # NO_EVENT_INFO 12 + check_error 'e ^a.b' # BAD_ATTACH_EVENT 13 + check_error 'e syscalls/sys_enter_openat ^foo' # BAD_ATTACH_ARG 14 + check_error 'e:^/bar syscalls/sys_enter_openat' # NO_GROUP_NAME 15 + check_error 'e:^12345678901234567890123456789012345678901234567890123456789012345/bar syscalls/sys_enter_openat' # GROUP_TOO_LONG 16 + 17 + check_error 'e:^foo.1/bar syscalls/sys_enter_openat' # BAD_GROUP_NAME 18 + check_error 'e:^ syscalls/sys_enter_openat' # NO_EVENT_NAME 19 + check_error 'e:foo/^12345678901234567890123456789012345678901234567890123456789012345 syscalls/sys_enter_openat' # EVENT_TOO_LONG 20 + check_error 'e:foo/^bar.1 syscalls/sys_enter_openat' # BAD_EVENT_NAME 21 + 22 + check_error 'e:foo/bar syscalls/sys_enter_openat arg=^dfd' # BAD_FETCH_ARG 23 + check_error 'e:foo/bar syscalls/sys_enter_openat ^arg=$foo' # BAD_ATTACH_ARG 24 + 25 + check_error 'e:foo/bar syscalls/sys_enter_openat if ^' # NO_EP_FILTER 26 + 27 + exit 0
+39 -8
tools/testing/selftests/user_events/ftrace_test.c
··· 22 22 const char *trace_file = "/sys/kernel/debug/tracing/trace"; 23 23 const char *fmt_file = "/sys/kernel/debug/tracing/events/user_events/__test_event/format"; 24 24 25 + static inline int status_check(char *status_page, int status_bit) 26 + { 27 + return status_page[status_bit >> 3] & (1 << (status_bit & 7)); 28 + } 29 + 25 30 static int trace_bytes(void) 26 31 { 27 32 int fd = open(trace_file, O_RDONLY); ··· 202 197 /* Register should work */ 203 198 ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg)); 204 199 ASSERT_EQ(0, reg.write_index); 205 - ASSERT_NE(0, reg.status_index); 200 + ASSERT_NE(0, reg.status_bit); 206 201 207 202 /* Multiple registers should result in same index */ 208 203 ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg)); 209 204 ASSERT_EQ(0, reg.write_index); 210 - ASSERT_NE(0, reg.status_index); 205 + ASSERT_NE(0, reg.status_bit); 211 206 212 207 /* Ensure disabled */ 213 208 self->enable_fd = open(enable_file, O_RDWR); ··· 217 212 /* MMAP should work and be zero'd */ 218 213 ASSERT_NE(MAP_FAILED, status_page); 219 214 ASSERT_NE(NULL, status_page); 220 - ASSERT_EQ(0, status_page[reg.status_index]); 215 + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); 221 216 222 217 /* Enable event and ensure bits updated in status */ 223 218 ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1"))) 224 - ASSERT_EQ(EVENT_STATUS_FTRACE, status_page[reg.status_index]); 219 + ASSERT_NE(0, status_check(status_page, reg.status_bit)); 225 220 226 221 /* Disable event and ensure bits updated in status */ 227 222 ASSERT_NE(-1, write(self->enable_fd, "0", sizeof("0"))) 228 - ASSERT_EQ(0, status_page[reg.status_index]); 223 + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); 229 224 230 225 /* File still open should return -EBUSY for delete */ 231 226 ASSERT_EQ(-1, ioctl(self->data_fd, DIAG_IOCSDEL, "__test_event")); ··· 245 240 struct iovec io[3]; 246 241 __u32 field1, field2; 247 242 int before = 0, after = 0; 243 + int page_size = sysconf(_SC_PAGESIZE); 244 + char *status_page; 248 245 249 246 reg.size = sizeof(reg); 250 247 reg.name_args = (__u64)"__test_event u32 field1; u32 field2"; ··· 261 254 io[2].iov_base = &field2; 262 255 io[2].iov_len = sizeof(field2); 263 256 257 + status_page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, 258 + self->status_fd, 0); 259 + 264 260 /* Register should work */ 265 261 ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg)); 266 262 ASSERT_EQ(0, reg.write_index); 267 - ASSERT_NE(0, reg.status_index); 263 + ASSERT_NE(0, reg.status_bit); 264 + 265 + /* MMAP should work and be zero'd */ 266 + ASSERT_NE(MAP_FAILED, status_page); 267 + ASSERT_NE(NULL, status_page); 268 + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); 268 269 269 270 /* Write should fail on invalid slot with ENOENT */ 270 271 io[0].iov_base = &field2; ··· 285 270 /* Enable event */ 286 271 self->enable_fd = open(enable_file, O_RDWR); 287 272 ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1"))) 273 + 274 + /* Event should now be enabled */ 275 + ASSERT_NE(0, status_check(status_page, reg.status_bit)); 288 276 289 277 /* Write should make it out to ftrace buffers */ 290 278 before = trace_bytes(); ··· 316 298 /* Register should work */ 317 299 ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg)); 318 300 ASSERT_EQ(0, reg.write_index); 319 - ASSERT_NE(0, reg.status_index); 301 + ASSERT_NE(0, reg.status_bit); 320 302 321 303 /* Write should work normally */ 322 304 ASSERT_NE(-1, writev(self->data_fd, (const struct iovec *)io, 2)); ··· 333 315 int loc, 
bytes; 334 316 char data[8]; 335 317 int before = 0, after = 0; 318 + int page_size = sysconf(_SC_PAGESIZE); 319 + char *status_page; 320 + 321 + status_page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, 322 + self->status_fd, 0); 336 323 337 324 reg.size = sizeof(reg); 338 325 reg.name_args = (__u64)"__test_event __rel_loc char[] data"; ··· 345 322 /* Register should work */ 346 323 ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg)); 347 324 ASSERT_EQ(0, reg.write_index); 348 - ASSERT_NE(0, reg.status_index); 325 + ASSERT_NE(0, reg.status_bit); 326 + 327 + /* MMAP should work and be zero'd */ 328 + ASSERT_NE(MAP_FAILED, status_page); 329 + ASSERT_NE(NULL, status_page); 330 + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); 349 331 350 332 io[0].iov_base = &reg.write_index; 351 333 io[0].iov_len = sizeof(reg.write_index); ··· 367 339 /* Enable event */ 368 340 self->enable_fd = open(enable_file, O_RDWR); 369 341 ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1"))) 342 + 343 + /* Event should now be enabled */ 344 + ASSERT_NE(0, status_check(status_page, reg.status_bit)); 370 345 371 346 /* Full in-bounds write should work */ 372 347 before = trace_bytes();
+8 -3
tools/testing/selftests/user_events/perf_test.c
··· 35 35 return syscall(__NR_perf_event_open, pe, pid, cpu, group_fd, flags); 36 36 } 37 37 38 + static inline int status_check(char *status_page, int status_bit) 39 + { 40 + return status_page[status_bit >> 3] & (1 << (status_bit & 7)); 41 + } 42 + 38 43 static int get_id(void) 39 44 { 40 45 FILE *fp = fopen(id_file, "r"); ··· 125 120 /* Register should work */ 126 121 ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg)); 127 122 ASSERT_EQ(0, reg.write_index); 128 - ASSERT_NE(0, reg.status_index); 129 - ASSERT_EQ(0, status_page[reg.status_index]); 123 + ASSERT_NE(0, reg.status_bit); 124 + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); 130 125 131 126 /* Id should be there */ 132 127 id = get_id(); ··· 149 144 ASSERT_NE(MAP_FAILED, perf_page); 150 145 151 146 /* Status should be updated */ 152 - ASSERT_EQ(EVENT_STATUS_PERF, status_page[reg.status_index]); 147 + ASSERT_NE(0, status_check(status_page, reg.status_bit)); 153 148 154 149 event.index = reg.write_index; 155 150 event.field1 = 0xc001;
+4 -4
tools/verification/dot2/dot2k_templates/main_global.c
··· 27 27 * 28 28 * The rv monitor reference is needed for the monitor declaration. 29 29 */ 30 - struct rv_monitor rv_MODEL_NAME; 30 + static struct rv_monitor rv_MODEL_NAME; 31 31 DECLARE_DA_MON_GLOBAL(MODEL_NAME, MIN_TYPE); 32 32 33 33 /* ··· 63 63 /* 64 64 * This is the monitor register section. 65 65 */ 66 - struct rv_monitor rv_MODEL_NAME = { 66 + static struct rv_monitor rv_MODEL_NAME = { 67 67 .name = "MODEL_NAME", 68 68 .description = "auto-generated MODEL_NAME", 69 69 .enable = enable_MODEL_NAME, ··· 72 72 .enabled = 0, 73 73 }; 74 74 75 - static int register_MODEL_NAME(void) 75 + static int __init register_MODEL_NAME(void) 76 76 { 77 77 rv_register_monitor(&rv_MODEL_NAME); 78 78 return 0; 79 79 } 80 80 81 - static void unregister_MODEL_NAME(void) 81 + static void __exit unregister_MODEL_NAME(void) 82 82 { 83 83 rv_unregister_monitor(&rv_MODEL_NAME); 84 84 }
+4 -4
tools/verification/dot2/dot2k_templates/main_per_cpu.c
··· 27 27 * 28 28 * The rv monitor reference is needed for the monitor declaration. 29 29 */ 30 - struct rv_monitor rv_MODEL_NAME; 30 + static struct rv_monitor rv_MODEL_NAME; 31 31 DECLARE_DA_MON_PER_CPU(MODEL_NAME, MIN_TYPE); 32 32 33 33 /* ··· 63 63 /* 64 64 * This is the monitor register section. 65 65 */ 66 - struct rv_monitor rv_MODEL_NAME = { 66 + static struct rv_monitor rv_MODEL_NAME = { 67 67 .name = "MODEL_NAME", 68 68 .description = "auto-generated MODEL_NAME", 69 69 .enable = enable_MODEL_NAME, ··· 72 72 .enabled = 0, 73 73 }; 74 74 75 - static int register_MODEL_NAME(void) 75 + static int __init register_MODEL_NAME(void) 76 76 { 77 77 rv_register_monitor(&rv_MODEL_NAME); 78 78 return 0; 79 79 } 80 80 81 - static void unregister_MODEL_NAME(void) 81 + static void __exit unregister_MODEL_NAME(void) 82 82 { 83 83 rv_unregister_monitor(&rv_MODEL_NAME); 84 84 }
+4 -4
tools/verification/dot2/dot2k_templates/main_per_task.c
··· 27 27 * 28 28 * The rv monitor reference is needed for the monitor declaration. 29 29 */ 30 - struct rv_monitor rv_MODEL_NAME; 30 + static struct rv_monitor rv_MODEL_NAME; 31 31 DECLARE_DA_MON_PER_TASK(MODEL_NAME, MIN_TYPE); 32 32 33 33 /* ··· 63 63 /* 64 64 * This is the monitor register section. 65 65 */ 66 - struct rv_monitor rv_MODEL_NAME = { 66 + static struct rv_monitor rv_MODEL_NAME = { 67 67 .name = "MODEL_NAME", 68 68 .description = "auto-generated MODEL_NAME", 69 69 .enable = enable_MODEL_NAME, ··· 72 72 .enabled = 0, 73 73 }; 74 74 75 - static int register_MODEL_NAME(void) 75 + static int __init register_MODEL_NAME(void) 76 76 { 77 77 rv_register_monitor(&rv_MODEL_NAME); 78 78 return 0; 79 79 } 80 80 81 - static void unregister_MODEL_NAME(void) 81 + static void __exit unregister_MODEL_NAME(void) 82 82 { 83 83 rv_unregister_monitor(&rv_MODEL_NAME); 84 84 }
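Across all three dot2k templates, the generated register/unregister helpers become static (nothing outside the file calls them) and gain __init/__exit. Code in the __init section is freed once initialization finishes, and __exit code is dropped entirely for built-in objects, so the annotations shrink the resident image. A minimal, hypothetical module skeleton using the same pattern:

#include <linux/init.h>
#include <linux/module.h>

/* Hypothetical module: demo_init lives in .init.text and is freed
 * after loading; demo_exit is discarded when the code is built in. */
static int __init demo_init(void)
{
	pr_info("demo: loaded\n");
	return 0;
}

static void __exit demo_exit(void)
{
	pr_info("demo: unloaded\n");
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");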