Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

- User events are finally ready!

After lots of collaboration between various parties, we finally
locked down on a stable interface for user events that can also work
with user space only tracing.

This is implemented by telling the kernel (or a user space library, but
that part is user space only and not part of this patch set) the
location of the variable that the application uses to know whether
something is listening to the trace.

There's also an interface to tell the kernel about these events,
which will show up in the /sys/kernel/tracing/events/user_events/
directory, where it can be enabled.

When it's enabled, the kernel will update the variable, to tell the
application to start writing to the kernel.

See https://lwn.net/Articles/927595/

- Cleaned up the direct trampolines code to simplify arm64 addition of
direct trampolines.

Direct trampolines use the ftrace interface but instead of jumping to
the ftrace trampoline, applications (mostly BPF) can register their
own trampoline for performance reasons.

- Some updates to the fprobe infrastructure. fprobes are more efficient
than kprobes, as they do not need to save all the registers that
kprobes on ftrace do. More work needs to be done before fprobes
are exposed as dynamic events.

- More updates changing references from the obsolete
/sys/kernel/debug/tracing path to the new /sys/kernel/tracing path.

- Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer
line by line instead of all at once.

There are users in production kernels that have a large data dump
that originally used printk() directly, but the data dump was larger
than what printk() allowed as a single print.

Using seq_buf() to do the printing fixes that.

- Add /sys/kernel/tracing/touched_functions that shows all functions
that were ever traced by ftrace or a direct trampoline. This is used
for debugging issues where a traced function could have caused a
crash by a bpf program or live patching.

- Add a "fields" option that is similar to "raw" but outputs the fields
of the events. This is easier for humans to read.

- Some minor fixes and clean ups.

* tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (41 commits)
ring-buffer: Sync IRQ works before buffer destruction
tracing: Add missing spaces in trace_print_hex_seq()
ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus
recordmcount: Fix memory leaks in the uwrite function
tracing/user_events: Limit max fault-in attempts
tracing/user_events: Prevent same address and bit per process
tracing/user_events: Ensure bit is cleared on unregister
tracing/user_events: Ensure write index cannot be negative
seq_buf: Add seq_buf_do_printk() helper
tracing: Fix print_fields() for __dyn_loc/__rel_loc
tracing/user_events: Set event filter_type from type
ring-buffer: Clearly check null ptr returned by rb_set_head_page()
tracing: Unbreak user events
tracing/user_events: Use print_format_fields() for trace output
tracing/user_events: Align structs with tabs for readability
tracing/user_events: Limit global user_event count
tracing/user_events: Charge event allocs to cgroups
tracing/user_events: Update documentation for ABI
tracing/user_events: Use write ABI in example
tracing/user_events: Add ABI self-test
...

+1973 -532
+12 -4
Documentation/trace/fprobe.rst
···
 The fprobe entry/exit handler
 =============================

-The prototype of the entry/exit callback function is as follows:
+The prototype of the entry/exit callback function are as follows:

 .. code-block:: c

- void callback_func(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
+ int entry_callback(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs, void *entry_data);

-Note that both entry and exit callbacks have same ptototype. The @entry_ip is
-saved at function entry and passed to exit handler.
+ void exit_callback(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs, void *entry_data);
+
+Note that the @entry_ip is saved at function entry and passed to exit handler.
+If the entry callback function returns !0, the corresponding exit callback will be cancelled.

 @fp
         This is the address of `fprobe` data structure related to this handler.
···
         in the entry_handler. If you need traced instruction pointer, you need
         to use @entry_ip. On the other hand, in the exit_handler, the instruction
         pointer of @regs is set to the currect return address.
+
+@entry_data
+        This is a local storage to share the data between entry and exit handlers.
+        This storage is NULL by default. If the user specify `exit_handler` field
+        and `entry_data_size` field when registering the fprobe, the storage is
+        allocated and passed to both `entry_handler` and `exit_handler`.

 Share the callbacks with kprobes
 ================================
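The entry/exit contract documented in this hunk can be modelled in user space. The following is only a sketch under stated assumptions: `struct fprobe_model`, `struct pt_regs_model`, `model_traced_call()` and the sample handlers are hypothetical stand-ins invented for illustration, not kernel APIs. It shows the two documented rules: a non-zero return from the entry callback cancels the exit callback, and `entry_data` is non-NULL only when both an exit handler and a data size are set.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct pt_regs_model { unsigned long ip; };
struct fprobe_model;

typedef int (*entry_cb)(struct fprobe_model *fp, unsigned long entry_ip,
			struct pt_regs_model *regs, void *entry_data);
typedef void (*exit_cb)(struct fprobe_model *fp, unsigned long entry_ip,
			struct pt_regs_model *regs, void *entry_data);

struct fprobe_model {
	size_t entry_data_size;		/* bytes of per-call storage, 0 for none */
	entry_cb entry_handler;
	exit_cb exit_handler;
	unsigned long exits_seen;	/* bookkeeping for this sketch only */
};

/* Model one traced call: run entry, then exit unless entry cancelled it. */
static int model_traced_call(struct fprobe_model *fp, unsigned long ip)
{
	struct pt_regs_model regs = { .ip = ip };
	unsigned long storage[8] = { 0 };	/* zeroed per-call storage */
	void *entry_data = NULL;
	int ret = 0;

	/* Storage exists only when an exit handler and a size are both set. */
	if (fp->exit_handler && fp->entry_data_size &&
	    fp->entry_data_size <= sizeof(storage))
		entry_data = storage;

	if (fp->entry_handler)
		ret = fp->entry_handler(fp, ip, &regs, entry_data);

	/* A non-zero return from the entry handler cancels the exit handler. */
	if (fp->exit_handler && !ret)
		fp->exit_handler(fp, ip, &regs, entry_data);

	return ret;
}

/* Sample entry handler: skip odd addresses, stash the address otherwise. */
static int sample_entry(struct fprobe_model *fp, unsigned long entry_ip,
			struct pt_regs_model *regs, void *entry_data)
{
	(void)fp;
	(void)regs;
	if (entry_ip & 1)
		return 1;
	if (entry_data)
		*(unsigned long *)entry_data = entry_ip;
	return 0;
}

/* Sample exit handler: count calls whose stashed address matches. */
static void sample_exit(struct fprobe_model *fp, unsigned long entry_ip,
			struct pt_regs_model *regs, void *entry_data)
{
	(void)regs;
	if (entry_data && *(unsigned long *)entry_data == entry_ip)
		fp->exits_seen++;
}
```

In the kernel the storage lives in the rethook node rather than on the stack; the stack buffer here is a simplification of the model.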
+6
Documentation/trace/ftrace.rst
···
 	nohex
 	nobin
 	noblock
+	nofields
 	trace_printk
 	annotate
 	nouserstacktrace
···

   block
 	When set, reading trace_pipe will not block when polled.
+
+  fields
+	Print the fields as described by their types. This is a better
+	option than using hex, bin or raw, as it gives a better parsing
+	of the content of the event.

   trace_printk
 	Can disable trace_printk() from writing into the buffer.
+104 -77
Documentation/trace/user_events.rst
··· 20 20 21 21 Typically programs will register a set of events that they wish to expose to 22 22 tools that can read trace_events (such as ftrace and perf). The registration 23 - process gives back two ints to the program for each event. The first int is 24 - the status bit. This describes which bit in little-endian format in the 25 - /sys/kernel/tracing/user_events_status file represents this event. The 26 - second int is the write index which describes the data when a write() or 27 - writev() is called on the /sys/kernel/tracing/user_events_data file. 23 + process tells the kernel which address and bit to reflect if any tool has 24 + enabled the event and data should be written. The registration will give back 25 + a write index which describes the data when a write() or writev() is called 26 + on the /sys/kernel/tracing/user_events_data file. 28 27 29 28 The structures referenced in this document are contained within the 30 29 /include/uapi/linux/user_events.h file in the source tree. ··· 40 41 This command takes a packed struct user_reg as an argument:: 41 42 42 43 struct user_reg { 43 - u32 size; 44 - u64 name_args; 45 - u32 status_bit; 46 - u32 write_index; 47 - }; 44 + /* Input: Size of the user_reg structure being used */ 45 + __u32 size; 48 46 49 - The struct user_reg requires two inputs, the first is the size of the structure 50 - to ensure forward and backward compatibility. The second is the command string 51 - to issue for registering. Upon success two outputs are set, the status bit 52 - and the write index. 
47 + /* Input: Bit in enable address to use */ 48 + __u8 enable_bit; 49 + 50 + /* Input: Enable size in bytes at address */ 51 + __u8 enable_size; 52 + 53 + /* Input: Flags for future use, set to 0 */ 54 + __u16 flags; 55 + 56 + /* Input: Address to update when enabled */ 57 + __u64 enable_addr; 58 + 59 + /* Input: Pointer to string with event name, description and flags */ 60 + __u64 name_args; 61 + 62 + /* Output: Index of the event to use when writing data */ 63 + __u32 write_index; 64 + } __attribute__((__packed__)); 65 + 66 + The struct user_reg requires all the above inputs to be set appropriately. 67 + 68 + + size: This must be set to sizeof(struct user_reg). 69 + 70 + + enable_bit: The bit to reflect the event status at the address specified by 71 + enable_addr. 72 + 73 + + enable_size: The size of the value specified by enable_addr. 74 + This must be 4 (32-bit) or 8 (64-bit). 64-bit values are only allowed to be 75 + used on 64-bit kernels, however, 32-bit can be used on all kernels. 76 + 77 + + flags: The flags to use, if any. For the initial version this must be 0. 78 + Callers should first attempt to use flags and retry without flags to ensure 79 + support for lower versions of the kernel. If a flag is not supported -EINVAL 80 + is returned. 81 + 82 + + enable_addr: The address of the value to use to reflect event status. This 83 + must be naturally aligned and write accessible within the user program. 84 + 85 + + name_args: The name and arguments to describe the event, see command format 86 + for details. 87 + 88 + Upon successful registration the following is set. 89 + 90 + + write_index: The index to use for this file descriptor that represents this 91 + event when writing out data. The index is unique to this instance of the file 92 + descriptor that was used for the registration. See writing data for details. 53 93 54 94 User based events show up under tracefs like any other event under the 55 95 subsystem named "user_events". 
This means tools that wish to attach to the 56 96 events need to use /sys/kernel/tracing/events/user_events/[name]/enable 57 97 or perf record -e user_events:[name] when attaching/recording. 58 98 59 - **NOTE:** *The write_index returned is only valid for the FD that was used* 99 + **NOTE:** The event subsystem name by default is "user_events". Callers should 100 + not assume it will always be "user_events". Operators reserve the right in the 101 + future to change the subsystem name per-process to accomodate event isolation. 60 102 61 103 Command Format 62 104 ^^^^^^^^^^^^^^ ··· 134 94 struct mytype myname 20 135 95 136 96 Deleting 137 - ----------- 97 + -------- 138 98 Deleting an event from within a user process is done via ioctl() out to the 139 99 /sys/kernel/tracing/user_events_data file. The command to issue is 140 100 DIAG_IOCSDEL. ··· 144 104 event (in both user and kernel space). User programs should use a separate file 145 105 to request deletes than the one used for registration due to this. 146 106 107 + Unregistering 108 + ------------- 109 + If after registering an event it is no longer wanted to be updated then it can 110 + be disabled via ioctl() out to the /sys/kernel/tracing/user_events_data file. 111 + The command to issue is DIAG_IOCSUNREG. This is different than deleting, where 112 + deleting actually removes the event from the system. Unregistering simply tells 113 + the kernel your process is no longer interested in updates to the event. 
114 + 115 + This command takes a packed struct user_unreg as an argument:: 116 + 117 + struct user_unreg { 118 + /* Input: Size of the user_unreg structure being used */ 119 + __u32 size; 120 + 121 + /* Input: Bit to unregister */ 122 + __u8 disable_bit; 123 + 124 + /* Input: Reserved, set to 0 */ 125 + __u8 __reserved; 126 + 127 + /* Input: Reserved, set to 0 */ 128 + __u16 __reserved2; 129 + 130 + /* Input: Address to unregister */ 131 + __u64 disable_addr; 132 + } __attribute__((__packed__)); 133 + 134 + The struct user_unreg requires all the above inputs to be set appropriately. 135 + 136 + + size: This must be set to sizeof(struct user_unreg). 137 + 138 + + disable_bit: This must be set to the bit to disable (same bit that was 139 + previously registered via enable_bit). 140 + 141 + + disable_addr: This must be set to the address to disable (same address that was 142 + previously registered via enable_addr). 143 + 144 + **NOTE:** Events are automatically unregistered when execve() is invoked. During 145 + fork() the registered events will be retained and must be unregistered manually 146 + in each process if wanted. 147 + 147 148 Status 148 149 ------ 149 150 When tools attach/record user based events the status of the event is updated 150 151 in realtime. This allows user programs to only incur the cost of the write() or 151 152 writev() calls when something is actively attached to the event. 152 153 153 - User programs call mmap() on /sys/kernel/tracing/user_events_status to 154 - check the status for each event that is registered. The bit to check in the 155 - file is given back after the register ioctl() via user_reg.status_bit. The bit 156 - is always in little-endian format. Programs can check if the bit is set either 157 - using a byte-wise index with a mask or a long-wise index with a little-endian 158 - mask. 
159 - 160 - Currently the size of user_events_status is a single page, however, custom 161 - kernel configurations can change this size to allow more user based events. In 162 - all cases the size of the file is a multiple of a page size. 163 - 164 - For example, if the register ioctl() gives back a status_bit of 3 you would 165 - check byte 0 (3 / 8) of the returned mmap data and then AND the result with 8 166 - (1 << (3 % 8)) to see if anything is attached to that event. 167 - 168 - A byte-wise index check is performed as follows:: 169 - 170 - int index, mask; 171 - char *status_page; 172 - 173 - index = status_bit / 8; 174 - mask = 1 << (status_bit % 8); 175 - 176 - ... 177 - 178 - if (status_page[index] & mask) { 179 - /* Enabled */ 180 - } 181 - 182 - A long-wise index check is performed as follows:: 183 - 184 - #include <asm/bitsperlong.h> 185 - #include <endian.h> 186 - 187 - #if __BITS_PER_LONG == 64 188 - #define endian_swap(x) htole64(x) 189 - #else 190 - #define endian_swap(x) htole32(x) 191 - #endif 192 - 193 - long index, mask, *status_page; 194 - 195 - index = status_bit / __BITS_PER_LONG; 196 - mask = 1L << (status_bit % __BITS_PER_LONG); 197 - mask = endian_swap(mask); 198 - 199 - ... 200 - 201 - if (status_page[index] & mask) { 202 - /* Enabled */ 203 - } 154 + The kernel will update the specified bit that was registered for the event as 155 + tools attach/detach from the event. User programs simply check if the bit is set 156 + to see if something is attached or not. 204 157 205 158 Administrators can easily check the status of all registered events by reading 206 159 the user_events_status file directly via a terminal. The output is as follows:: 207 160 208 - Byte:Name [# Comments] 161 + Name [# Comments] 209 162 ... 
210 163 211 164 Active: ActiveCount 212 165 Busy: BusyCount 213 - Max: MaxCount 214 166 215 167 For example, on a system that has a single event the output looks like this:: 216 168 217 - 1:test 169 + test 218 170 219 171 Active: 1 220 172 Busy: 0 221 - Max: 32768 222 173 223 174 If a user enables the user event via ftrace, the output would change to this:: 224 175 225 - 1:test # Used by ftrace 176 + test # Used by ftrace 226 177 227 178 Active: 1 228 179 Busy: 1 229 - Max: 32768 230 - 231 - **NOTE:** *A status bit of 0 will never be returned. This allows user programs 232 - to have a bit that can be used on error cases.* 233 180 234 181 Writing Data 235 182 ------------ ··· 244 217 int src; 245 218 int dst; 246 219 int flags; 247 - }; 220 + } __attribute__((__packed__)); 248 221 249 222 It's advised for user programs to do the following:: 250 223
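The registration inputs and the status check described in the user_events.rst changes above can be sketched in user space. This is a minimal sketch, not the kernel's code: `struct user_reg_sketch` is a local mirror of the documented packed layout (a real program would include `<linux/user_events.h>` and issue the `DIAG_IOCSREG` ioctl on the user_events_data file), and the helper names and the choice of bit 31 are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented struct user_reg layout (assumption:
 * real code uses the definition from <linux/user_events.h>). */
struct user_reg_sketch {
	uint32_t size;		/* Input: sizeof the structure */
	uint8_t  enable_bit;	/* Input: bit within the enable word */
	uint8_t  enable_size;	/* Input: 4 or 8 bytes */
	uint16_t flags;		/* Input: must be 0 for now */
	uint64_t enable_addr;	/* Input: address of the enable word */
	uint64_t name_args;	/* Input: pointer to the command string */
	uint32_t write_index;	/* Output: index to use when writing */
} __attribute__((__packed__));

/* Word the kernel would update; bit 31 is an arbitrary choice here. */
static uint32_t my_event_enabled;
#define MY_EVENT_BIT 31

/* Fill every input field; write_index is left 0 for the kernel to set. */
static void fill_user_reg(struct user_reg_sketch *reg, const char *name_args,
			  uint32_t *enable_word, uint8_t bit)
{
	memset(reg, 0, sizeof(*reg));
	reg->size = sizeof(*reg);
	reg->enable_bit = bit;
	reg->enable_size = sizeof(*enable_word);	/* 4: valid on all kernels */
	reg->enable_addr = (uint64_t)(uintptr_t)enable_word;
	reg->name_args = (uint64_t)(uintptr_t)name_args;
}

/* The cheap check a program performs before paying for write()/writev(). */
static int event_is_enabled(const volatile uint32_t *enable_word, uint8_t bit)
{
	return (int)((*enable_word >> bit) & 1u);
}
```

A caller would fill the structure with an event string of its choosing (for example a hypothetical `"test u32 count"`), pass it to the register ioctl, and then guard each write with `event_is_enabled(&my_event_enabled, MY_EVENT_BIT)`.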
+2
fs/exec.c
···
 #include <linux/syscall_user_dispatch.h>
 #include <linux/coredump.h>
 #include <linux/time_namespace.h>
+#include <linux/user_events.h>

 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
···
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
 	rseq_execve(current);
+	user_events_execve(current);
 	acct_update_integrals(current);
 	task_numa_free(current, false);
 	return retval;
+8 -2
include/linux/fprobe.h
···
  * @nmissed: The counter for missing events.
  * @flags: The status flag.
  * @rethook: The rethook data structure. (internal data)
+ * @entry_data_size: The private data storage size.
+ * @nr_maxactive: The max number of active functions.
  * @entry_handler: The callback function for function entry.
  * @exit_handler: The callback function for function exit.
  */
···
 	unsigned long nmissed;
 	unsigned int flags;
 	struct rethook *rethook;
+	size_t entry_data_size;
+	int nr_maxactive;

-	void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
-	void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
+	int (*entry_handler)(struct fprobe *fp, unsigned long entry_ip,
+			     struct pt_regs *regs, void *entry_data);
+	void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip,
+			     struct pt_regs *regs, void *entry_data);
 };

 /* This fprobe is soft-disabled. */
+4 -1
include/linux/ftrace.h
···
  * DIRECT - there is a direct function to call
  * CALL_OPS - the record can use callsite-specific ops
  * CALL_OPS_EN - the function is set up to use callsite-specific ops
+ * TOUCHED - A callback was added since boot up
  *
  * When a new ftrace_ops is registered and wants a function to save
  * pt_regs, the rec->flags REGS is set. When the function has been
···
 	FTRACE_FL_DIRECT_EN = (1UL << 23),
 	FTRACE_FL_CALL_OPS = (1UL << 22),
 	FTRACE_FL_CALL_OPS_EN = (1UL << 21),
+	FTRACE_FL_TOUCHED = (1UL << 20),
 };

-#define FTRACE_REF_MAX_SHIFT 21
+#define FTRACE_REF_MAX_SHIFT 20
 #define FTRACE_REF_MAX ((1UL << FTRACE_REF_MAX_SHIFT) - 1)

 #define ftrace_rec_count(rec) ((rec)->flags & FTRACE_REF_MAX)
···
 	FTRACE_ITER_PROBE = (1 << 4),
 	FTRACE_ITER_MOD = (1 << 5),
 	FTRACE_ITER_ENABLED = (1 << 6),
+	FTRACE_ITER_TOUCHED = (1 << 7),
 };

 void arch_ftrace_update_code(int command);
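The reason `FTRACE_REF_MAX_SHIFT` drops from 21 to 20 in this hunk is that the new TOUCHED flag takes bit 20, which previously belonged to the reference counter packed into the low bits of `rec->flags`. A small sketch of that layout follows; the `SK_`-prefixed names and helper functions are hypothetical copies made so the example compiles outside the kernel.

```c
/* Values copied from the hunk above; the SK_ names are local to this sketch. */
#define SK_FL_CALL_OPS_EN	(1UL << 21)
#define SK_FL_TOUCHED		(1UL << 20)

#define SK_REF_MAX_SHIFT	20
#define SK_REF_MAX		((1UL << SK_REF_MAX_SHIFT) - 1)

/* Number of users of a record: the low 20 bits of the flags word. */
static unsigned long sk_rec_count(unsigned long flags)
{
	return flags & SK_REF_MAX;
}

/* Non-zero if a callback was ever attached to the record. */
static int sk_rec_touched(unsigned long flags)
{
	return (flags & SK_FL_TOUCHED) != 0;
}
```

With the shift at 20 the counter mask is 0xFFFFF and does not overlap the new flag bit, so setting TOUCHED never changes the value that the count helper reports.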
+5
include/linux/sched.h
···
 struct signal_struct;
 struct task_delay_info;
 struct task_group;
+struct user_event_mm;

 /*
  * Task state bitmask. NOTE! These bits are also
···
 	 * none of these are justified.
 	 */
 	union rv_task_monitor rv[RV_PER_TASK_MONITORS];
+#endif
+
+#ifdef CONFIG_USER_EVENTS
+	struct user_event_mm *user_event_mm;
 #endif

 	/*
+2
include/linux/seq_buf.h
···
 seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary);
 #endif

+void seq_buf_do_printk(struct seq_buf *s, const char *lvl);
+
 #endif /* _LINUX_SEQ_BUF_H */
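Only the declaration of `seq_buf_do_printk()` appears in this hunk. The idea stated in the merge message, printing a large buffer one line at a time instead of in a single call, can be sketched in user space as below. This is an illustrative model and not the kernel implementation: `print_by_line()` is a hypothetical name, and `fprintf()` with a prefix string stands in for `printk()` with a log level.

```c
#include <stdio.h>
#include <string.h>

/*
 * Emit buf one line at a time, each prefixed with `prefix`.
 * Returns the number of lines written. A final line without a
 * trailing newline is still emitted.
 */
static int print_by_line(const char *buf, size_t len, const char *prefix,
			 FILE *out)
{
	const char *start = buf;
	const char *end = buf + len;
	int lines = 0;

	while (start < end) {
		const char *nl = memchr(start, '\n', (size_t)(end - start));
		size_t n = nl ? (size_t)(nl - start) : (size_t)(end - start);

		fprintf(out, "%s%.*s\n", prefix, (int)n, start);
		lines++;

		/* Step past the line and its newline, if there was one. */
		start += n + (nl ? 1 : 0);
	}

	return lines;
}
```

Because each call handles one line, no single print has to carry the whole dump, which is the limitation the merge message describes for direct printk() use.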
+72 -43
include/linux/user_events.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 2 /* 3 - * Copyright (c) 2021, Microsoft Corporation. 3 + * Copyright (c) 2022, Microsoft Corporation. 4 4 * 5 5 * Authors: 6 6 * Beau Belgrave <beaub@linux.microsoft.com> 7 7 */ 8 - #ifndef _UAPI_LINUX_USER_EVENTS_H 9 - #define _UAPI_LINUX_USER_EVENTS_H 10 8 11 - #include <linux/types.h> 12 - #include <linux/ioctl.h> 9 + #ifndef _LINUX_USER_EVENTS_H 10 + #define _LINUX_USER_EVENTS_H 13 11 14 - #ifdef __KERNEL__ 15 - #include <linux/uio.h> 12 + #include <linux/list.h> 13 + #include <linux/refcount.h> 14 + #include <linux/mm_types.h> 15 + #include <linux/workqueue.h> 16 + #include <uapi/linux/user_events.h> 17 + 18 + #ifdef CONFIG_USER_EVENTS 19 + struct user_event_mm { 20 + struct list_head link; 21 + struct list_head enablers; 22 + struct mm_struct *mm; 23 + struct user_event_mm *next; 24 + refcount_t refcnt; 25 + refcount_t tasks; 26 + struct rcu_work put_rwork; 27 + }; 28 + 29 + extern void user_event_mm_dup(struct task_struct *t, 30 + struct user_event_mm *old_mm); 31 + 32 + extern void user_event_mm_remove(struct task_struct *t); 33 + 34 + static inline void user_events_fork(struct task_struct *t, 35 + unsigned long clone_flags) 36 + { 37 + struct user_event_mm *old_mm; 38 + 39 + if (!t || !current->user_event_mm) 40 + return; 41 + 42 + old_mm = current->user_event_mm; 43 + 44 + if (clone_flags & CLONE_VM) { 45 + t->user_event_mm = old_mm; 46 + refcount_inc(&old_mm->tasks); 47 + return; 48 + } 49 + 50 + user_event_mm_dup(t, old_mm); 51 + } 52 + 53 + static inline void user_events_execve(struct task_struct *t) 54 + { 55 + if (!t || !t->user_event_mm) 56 + return; 57 + 58 + user_event_mm_remove(t); 59 + } 60 + 61 + static inline void user_events_exit(struct task_struct *t) 62 + { 63 + if (!t || !t->user_event_mm) 64 + return; 65 + 66 + user_event_mm_remove(t); 67 + } 16 68 #else 17 - #include <sys/uio.h> 18 - #endif 69 + static inline void 
user_events_fork(struct task_struct *t, 70 + unsigned long clone_flags) 71 + { 72 + } 19 73 20 - #define USER_EVENTS_SYSTEM "user_events" 21 - #define USER_EVENTS_PREFIX "u:" 74 + static inline void user_events_execve(struct task_struct *t) 75 + { 76 + } 22 77 23 - /* Create dynamic location entry within a 32-bit value */ 24 - #define DYN_LOC(offset, size) ((size) << 16 | (offset)) 78 + static inline void user_events_exit(struct task_struct *t) 79 + { 80 + } 81 + #endif /* CONFIG_USER_EVENTS */ 25 82 26 - /* 27 - * Describes an event registration and stores the results of the registration. 28 - * This structure is passed to the DIAG_IOCSREG ioctl, callers at a minimum 29 - * must set the size and name_args before invocation. 30 - */ 31 - struct user_reg { 32 - 33 - /* Input: Size of the user_reg structure being used */ 34 - __u32 size; 35 - 36 - /* Input: Pointer to string with event name, description and flags */ 37 - __u64 name_args; 38 - 39 - /* Output: Bitwise index of the event within the status page */ 40 - __u32 status_bit; 41 - 42 - /* Output: Index of the event to use when writing data */ 43 - __u32 write_index; 44 - } __attribute__((__packed__)); 45 - 46 - #define DIAG_IOC_MAGIC '*' 47 - 48 - /* Requests to register a user_event */ 49 - #define DIAG_IOCSREG _IOWR(DIAG_IOC_MAGIC, 0, struct user_reg*) 50 - 51 - /* Requests to delete a user_event */ 52 - #define DIAG_IOCSDEL _IOW(DIAG_IOC_MAGIC, 1, char*) 53 - 54 - #endif /* _UAPI_LINUX_USER_EVENTS_H */ 83 + #endif /* _LINUX_USER_EVENTS_H */
+81
include/uapi/linux/user_events.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ 2 + /* 3 + * Copyright (c) 2021-2022, Microsoft Corporation. 4 + * 5 + * Authors: 6 + * Beau Belgrave <beaub@linux.microsoft.com> 7 + */ 8 + #ifndef _UAPI_LINUX_USER_EVENTS_H 9 + #define _UAPI_LINUX_USER_EVENTS_H 10 + 11 + #include <linux/types.h> 12 + #include <linux/ioctl.h> 13 + 14 + #define USER_EVENTS_SYSTEM "user_events" 15 + #define USER_EVENTS_PREFIX "u:" 16 + 17 + /* Create dynamic location entry within a 32-bit value */ 18 + #define DYN_LOC(offset, size) ((size) << 16 | (offset)) 19 + 20 + /* 21 + * Describes an event registration and stores the results of the registration. 22 + * This structure is passed to the DIAG_IOCSREG ioctl, callers at a minimum 23 + * must set the size and name_args before invocation. 24 + */ 25 + struct user_reg { 26 + 27 + /* Input: Size of the user_reg structure being used */ 28 + __u32 size; 29 + 30 + /* Input: Bit in enable address to use */ 31 + __u8 enable_bit; 32 + 33 + /* Input: Enable size in bytes at address */ 34 + __u8 enable_size; 35 + 36 + /* Input: Flags for future use, set to 0 */ 37 + __u16 flags; 38 + 39 + /* Input: Address to update when enabled */ 40 + __u64 enable_addr; 41 + 42 + /* Input: Pointer to string with event name, description and flags */ 43 + __u64 name_args; 44 + 45 + /* Output: Index of the event to use when writing data */ 46 + __u32 write_index; 47 + } __attribute__((__packed__)); 48 + 49 + /* 50 + * Describes an event unregister, callers must set the size, address and bit. 51 + * This structure is passed to the DIAG_IOCSUNREG ioctl to disable bit updates. 
52 + */ 53 + struct user_unreg { 54 + /* Input: Size of the user_unreg structure being used */ 55 + __u32 size; 56 + 57 + /* Input: Bit to unregister */ 58 + __u8 disable_bit; 59 + 60 + /* Input: Reserved, set to 0 */ 61 + __u8 __reserved; 62 + 63 + /* Input: Reserved, set to 0 */ 64 + __u16 __reserved2; 65 + 66 + /* Input: Address to unregister */ 67 + __u64 disable_addr; 68 + } __attribute__((__packed__)); 69 + 70 + #define DIAG_IOC_MAGIC '*' 71 + 72 + /* Request to register a user_event */ 73 + #define DIAG_IOCSREG _IOWR(DIAG_IOC_MAGIC, 0, struct user_reg *) 74 + 75 + /* Request to delete a user_event */ 76 + #define DIAG_IOCSDEL _IOW(DIAG_IOC_MAGIC, 1, char *) 77 + 78 + /* Requests to unregister a user_event */ 79 + #define DIAG_IOCSUNREG _IOW(DIAG_IOC_MAGIC, 2, struct user_unreg*) 80 + 81 + #endif /* _UAPI_LINUX_USER_EVENTS_H */
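The `DYN_LOC()` macro in this header packs a size and an offset into one 32-bit value, with the size in the upper 16 bits and the offset in the lower 16. A short sketch of that encoding follows. The macro body is copied from the header; the two decode helpers are hypothetical names added for illustration and are not part of the UAPI.

```c
#include <stdint.h>

/* Same definition as in the header above. */
#define DYN_LOC(offset, size) ((size) << 16 | (offset))

/* Hypothetical helpers: recover the two halves of a dynamic location. */
static uint16_t dyn_loc_offset(uint32_t loc)
{
	return (uint16_t)(loc & 0xFFFFu);
}

static uint16_t dyn_loc_size(uint32_t loc)
{
	return (uint16_t)(loc >> 16);
}
```

For example, a field that starts at offset 4 and is 8 bytes long encodes as 0x00080004. Both inputs have to fit in 16 bits for the round trip to be lossless.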
+2
kernel/exit.c
···
 #include <linux/kprobes.h>
 #include <linux/rethook.h>
 #include <linux/sysfs.h>
+#include <linux/user_events.h>

 #include <linux/uaccess.h>
 #include <asm/unistd.h>
···

 	coredump_task_exit(tsk);
 	ptrace_event(PTRACE_EVENT_EXIT, code);
+	user_events_exit(tsk);

 	validate_creds_for_do_exit(tsk);
+2
kernel/fork.c
···
 #include <linux/io_uring.h>
 #include <linux/bpf.h>
 #include <linux/stackprotector.h>
+#include <linux/user_events.h>

 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
···

 	trace_task_newtask(p, clone_flags);
 	uprobe_copy_process(p, clone_flags);
+	user_events_fork(p, clone_flags);

 	copy_oom_score_adj(clone_flags, p);
+3 -3
kernel/trace/Kconfig
···
 	bool "User trace events"
 	select TRACING
 	select DYNAMIC_EVENTS
-	depends on BROKEN || COMPILE_TEST # API needs to be straighten out
 	help
 	  User trace events are user-defined trace events that
 	  can be used like an existing kernel trace event. User trace
 	  events are generated by writing to a tracefs file. User
 	  processes can determine if their tracing events should be
-	  generated by memory mapping a tracefs file and checking for
-	  an associated byte being non-zero.
+	  generated by registering a value and bit with the kernel
+	  that reflects when it is enabled or not.

+	  See Documentation/trace/user_events.rst.
 	  If in doubt, say N.

 config HIST_TRIGGERS
+14 -3
kernel/trace/bpf_trace.c
···
 	return err;
 }

-static void
+static int
 kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
-			  struct pt_regs *regs)
+			  struct pt_regs *regs, void *data)
+{
+	struct bpf_kprobe_multi_link *link;
+
+	link = container_of(fp, struct bpf_kprobe_multi_link, fp);
+	kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
+	return 0;
+}
+
+static void
+kprobe_multi_link_exit_handler(struct fprobe *fp, unsigned long fentry_ip,
+			       struct pt_regs *regs, void *data)
 {
 	struct bpf_kprobe_multi_link *link;

···
 		goto error;

 	if (flags & BPF_F_KPROBE_MULTI_RETURN)
-		link->fp.exit_handler = kprobe_multi_link_handler;
+		link->fp.exit_handler = kprobe_multi_link_exit_handler;
 	else
 		link->fp.entry_handler = kprobe_multi_link_handler;
+23 -9
kernel/trace/fprobe.c
··· 17 17 struct fprobe_rethook_node { 18 18 struct rethook_node node; 19 19 unsigned long entry_ip; 20 + char data[]; 20 21 }; 21 22 22 23 static void fprobe_handler(unsigned long ip, unsigned long parent_ip, 23 24 struct ftrace_ops *ops, struct ftrace_regs *fregs) 24 25 { 25 26 struct fprobe_rethook_node *fpr; 26 - struct rethook_node *rh; 27 + struct rethook_node *rh = NULL; 27 28 struct fprobe *fp; 28 - int bit; 29 + void *entry_data = NULL; 30 + int bit, ret; 29 31 30 32 fp = container_of(ops, struct fprobe, ops); 31 33 if (fprobe_disabled(fp)) ··· 39 37 return; 40 38 } 41 39 42 - if (fp->entry_handler) 43 - fp->entry_handler(fp, ip, ftrace_get_regs(fregs)); 44 - 45 40 if (fp->exit_handler) { 46 41 rh = rethook_try_get(fp->rethook); 47 42 if (!rh) { ··· 47 48 } 48 49 fpr = container_of(rh, struct fprobe_rethook_node, node); 49 50 fpr->entry_ip = ip; 50 - rethook_hook(rh, ftrace_get_regs(fregs), true); 51 + if (fp->entry_data_size) 52 + entry_data = fpr->data; 51 53 } 52 54 55 + if (fp->entry_handler) 56 + ret = fp->entry_handler(fp, ip, ftrace_get_regs(fregs), entry_data); 57 + 58 + /* If entry_handler returns !0, nmissed is not counted. */ 59 + if (rh) { 60 + if (ret) 61 + rethook_recycle(rh); 62 + else 63 + rethook_hook(rh, ftrace_get_regs(fregs), true); 64 + } 53 65 out: 54 66 ftrace_test_recursion_unlock(bit); 55 67 } ··· 91 81 92 82 fpr = container_of(rh, struct fprobe_rethook_node, node); 93 83 94 - fp->exit_handler(fp, fpr->entry_ip, regs); 84 + fp->exit_handler(fp, fpr->entry_ip, regs, 85 + fp->entry_data_size ? 
(void *)fpr->data : NULL); 95 86 } 96 87 NOKPROBE_SYMBOL(fprobe_exit_handler); 97 88 ··· 147 136 } 148 137 149 138 /* Initialize rethook if needed */ 150 - size = num * num_possible_cpus() * 2; 139 + if (fp->nr_maxactive) 140 + size = fp->nr_maxactive; 141 + else 142 + size = num * num_possible_cpus() * 2; 151 143 if (size < 0) 152 144 return -E2BIG; 153 145 ··· 160 146 for (i = 0; i < size; i++) { 161 147 struct fprobe_rethook_node *node; 162 148 163 - node = kzalloc(sizeof(*node), GFP_KERNEL); 149 + node = kzalloc(sizeof(*node) + fp->entry_data_size, GFP_KERNEL); 164 150 if (!node) { 165 151 rethook_free(fp->rethook); 166 152 fp->rethook = NULL;
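The rethook initialisation hunk above chooses how many return-hook nodes to allocate: the caller's `nr_maxactive` when it is set, otherwise a default derived from the number of probed functions and possible CPUs. A userspace sketch of that sizing rule follows. `rethook_pool_size()` is a hypothetical name, and the sketch computes in a wider type to detect overflow explicitly, which is a choice made here rather than what the kernel code does.

```c
#include <errno.h>
#include <limits.h>

/*
 * Pick the node pool size for an fprobe with an exit handler.
 * nr_maxactive: caller override, 0 means "use the default".
 * num:          number of probed functions.
 * ncpus:        number of possible CPUs.
 * Returns the size, or -E2BIG if it does not fit in a non-negative int.
 */
static int rethook_pool_size(int nr_maxactive, int num, int ncpus)
{
	long long size;

	if (nr_maxactive)
		size = nr_maxactive;
	else
		size = (long long)num * ncpus * 2;

	if (size < 0 || size > INT_MAX)
		return -E2BIG;

	return (int)size;
}
```

Each node in the real code is then allocated with `entry_data_size` extra bytes, so a large pool combined with a large per-call storage size multiplies the memory cost.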
+46 -5
kernel/trace/ftrace.c
··· 45 45 #include "trace_output.h" 46 46 #include "trace_stat.h" 47 47 48 + /* Flags that do not get reset */ 49 + #define FTRACE_NOCLEAR_FLAGS (FTRACE_FL_DISABLED | FTRACE_FL_TOUCHED) 50 + 48 51 #define FTRACE_INVALID_FUNCTION "__ftrace_invalid_address__" 49 52 50 53 #define FTRACE_WARN_ON(cond) \ ··· 2259 2256 flag ^= rec->flags & FTRACE_FL_ENABLED; 2260 2257 2261 2258 if (update) { 2262 - rec->flags |= FTRACE_FL_ENABLED; 2259 + rec->flags |= FTRACE_FL_ENABLED | FTRACE_FL_TOUCHED; 2263 2260 if (flag & FTRACE_FL_REGS) { 2264 2261 if (rec->flags & FTRACE_FL_REGS) 2265 2262 rec->flags |= FTRACE_FL_REGS_EN; ··· 2329 2326 if (update) { 2330 2327 /* If there's no more users, clear all flags */ 2331 2328 if (!ftrace_rec_count(rec)) 2332 - rec->flags &= FTRACE_FL_DISABLED; 2329 + rec->flags &= FTRACE_NOCLEAR_FLAGS; 2333 2330 else 2334 2331 /* 2335 2332 * Just disable the record, but keep the ops TRAMP ··· 3150 3147 struct dyn_ftrace *rec; 3151 3148 3152 3149 do_for_each_ftrace_rec(pg, rec) { 3153 - if (FTRACE_WARN_ON_ONCE(rec->flags & ~FTRACE_FL_DISABLED)) 3150 + if (FTRACE_WARN_ON_ONCE(rec->flags & ~FTRACE_NOCLEAR_FLAGS)) 3154 3151 pr_warn(" %pS flags:%lx\n", 3155 3152 (void *)rec->ip, rec->flags); 3156 3153 } while_for_each_ftrace_rec(); ··· 3601 3598 !ftrace_lookup_ip(iter->hash, rec->ip)) || 3602 3599 3603 3600 ((iter->flags & FTRACE_ITER_ENABLED) && 3604 - !(rec->flags & FTRACE_FL_ENABLED))) { 3601 + !(rec->flags & FTRACE_FL_ENABLED)) || 3602 + 3603 + ((iter->flags & FTRACE_ITER_TOUCHED) && 3604 + !(rec->flags & FTRACE_FL_TOUCHED))) { 3605 3605 3606 3606 rec = NULL; 3607 3607 goto retry; ··· 3863 3857 return 0; 3864 3858 } 3865 3859 3866 - if (iter->flags & FTRACE_ITER_ENABLED) { 3860 + if (iter->flags & (FTRACE_ITER_ENABLED | FTRACE_ITER_TOUCHED)) { 3867 3861 struct ftrace_ops *ops; 3868 3862 3869 3863 seq_printf(m, " (%ld)%s%s%s%s", ··· 3960 3954 3961 3955 iter->pg = ftrace_pages_start; 3962 3956 iter->flags = FTRACE_ITER_ENABLED; 3957 + iter->ops = &global_ops; 
3958 + 3959 + return 0; 3960 + } 3961 + 3962 + static int 3963 + ftrace_touched_open(struct inode *inode, struct file *file) 3964 + { 3965 + struct ftrace_iterator *iter; 3966 + 3967 + /* 3968 + * This shows us what functions have ever been enabled 3969 + * (traced, direct, patched, etc). Not sure if we want lockdown 3970 + * to hide such critical information for an admin. 3971 + * Although, perhaps it can show information we don't 3972 + * want people to see, but if something had traced 3973 + * something, we probably want to know about it. 3974 + */ 3975 + 3976 + iter = __seq_open_private(file, &show_ftrace_seq_ops, sizeof(*iter)); 3977 + if (!iter) 3978 + return -ENOMEM; 3979 + 3980 + iter->pg = ftrace_pages_start; 3981 + iter->flags = FTRACE_ITER_TOUCHED; 3963 3982 iter->ops = &global_ops; 3964 3983 3965 3984 return 0; ··· 5903 5872 .release = seq_release_private, 5904 5873 }; 5905 5874 5875 + static const struct file_operations ftrace_touched_fops = { 5876 + .open = ftrace_touched_open, 5877 + .read = seq_read, 5878 + .llseek = seq_lseek, 5879 + .release = seq_release_private, 5880 + }; 5881 + 5906 5882 static const struct file_operations ftrace_filter_fops = { 5907 5883 .open = ftrace_filter_open, 5908 5884 .read = seq_read, ··· 6373 6335 6374 6336 trace_create_file("enabled_functions", TRACE_MODE_READ, 6375 6337 d_tracer, NULL, &ftrace_enabled_fops); 6338 + 6339 + trace_create_file("touched_functions", TRACE_MODE_READ, 6340 + d_tracer, NULL, &ftrace_touched_fops); 6376 6341 6377 6342 ftrace_create_filter_files(&global_ops, d_tracer); 6378 6343
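The ftrace.c changes above make TOUCHED a sticky flag: it is set whenever a record is enabled and it survives the "no more users, clear all flags" path because the clear now masks with `FTRACE_NOCLEAR_FLAGS`. The sketch below models only that behaviour. The `SK_` names and helpers are hypothetical, and the bit positions used for ENABLED and DISABLED are assumptions for illustration since they do not appear in this diff.

```c
/* Bit 20 for TOUCHED comes from the diff; the other two are assumed. */
#define SK_FL_ENABLED	(1UL << 31)
#define SK_FL_DISABLED	(1UL << 25)
#define SK_FL_TOUCHED	(1UL << 20)

/* Flags that are preserved when a record loses its last user. */
#define SK_NOCLEAR_FLAGS (SK_FL_DISABLED | SK_FL_TOUCHED)

/* Enabling a record also marks it as touched. */
static unsigned long sk_rec_enable(unsigned long flags)
{
	return flags | SK_FL_ENABLED | SK_FL_TOUCHED;
}

/* Last user gone: drop everything except the no-clear flags. */
static unsigned long sk_rec_clear_unused(unsigned long flags)
{
	return flags & SK_NOCLEAR_FLAGS;
}
```

After an enable followed by a clear, only the TOUCHED bit remains, which is what lets the touched_functions file list functions that are no longer being traced.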
+54 -48
kernel/trace/ring_buffer.c
··· 163 163 #define extended_time(event) \ 164 164 (event->type_len >= RINGBUF_TYPE_TIME_EXTEND) 165 165 166 - static inline int rb_null_event(struct ring_buffer_event *event) 166 + static inline bool rb_null_event(struct ring_buffer_event *event) 167 167 { 168 168 return event->type_len == RINGBUF_TYPE_PADDING && !event->time_delta; 169 169 } ··· 363 363 /* 364 364 * We need to fit the time_stamp delta into 27 bits. 365 365 */ 366 - static inline int test_time_stamp(u64 delta) 366 + static inline bool test_time_stamp(u64 delta) 367 367 { 368 - if (delta & TS_DELTA_TEST) 369 - return 1; 370 - return 0; 368 + return !!(delta & TS_DELTA_TEST); 371 369 } 372 370 373 371 #define BUF_PAGE_SIZE (PAGE_SIZE - BUF_PAGE_HDR_SIZE) ··· 694 696 return ret == expect; 695 697 } 696 698 697 - static int rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) 699 + static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) 698 700 { 699 701 unsigned long cnt, top, bottom, msb; 700 702 unsigned long cnt2, top2, bottom2, msb2; ··· 1484 1486 return NULL; 1485 1487 } 1486 1488 1487 - static int rb_head_page_replace(struct buffer_page *old, 1489 + static bool rb_head_page_replace(struct buffer_page *old, 1488 1490 struct buffer_page *new) 1489 1491 { 1490 1492 unsigned long *ptr = (unsigned long *)&old->list.prev->next; ··· 1563 1565 } 1564 1566 } 1565 1567 1566 - static int rb_check_bpage(struct ring_buffer_per_cpu *cpu_buffer, 1568 + static void rb_check_bpage(struct ring_buffer_per_cpu *cpu_buffer, 1567 1569 struct buffer_page *bpage) 1568 1570 { 1569 1571 unsigned long val = (unsigned long)bpage; 1570 1572 1571 - if (RB_WARN_ON(cpu_buffer, val & RB_FLAG_MASK)) 1572 - return 1; 1573 - 1574 - return 0; 1573 + RB_WARN_ON(cpu_buffer, val & RB_FLAG_MASK); 1575 1574 } 1576 1575 1577 1576 /** ··· 1578 1583 * As a safety measure we check to make sure the data pages have not 1579 1584 * been corrupted. 
1580 1585 */ 1581 - static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) 1586 + static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) 1582 1587 { 1583 1588 struct list_head *head = rb_list_head(cpu_buffer->pages); 1584 1589 struct list_head *tmp; 1585 1590 1586 1591 if (RB_WARN_ON(cpu_buffer, 1587 1592 rb_list_head(rb_list_head(head->next)->prev) != head)) 1588 - return -1; 1593 + return; 1589 1594 1590 1595 if (RB_WARN_ON(cpu_buffer, 1591 1596 rb_list_head(rb_list_head(head->prev)->next) != head)) 1592 - return -1; 1597 + return; 1593 1598 1594 1599 for (tmp = rb_list_head(head->next); tmp != head; tmp = rb_list_head(tmp->next)) { 1595 1600 if (RB_WARN_ON(cpu_buffer, 1596 1601 rb_list_head(rb_list_head(tmp->next)->prev) != tmp)) 1597 - return -1; 1602 + return; 1598 1603 1599 1604 if (RB_WARN_ON(cpu_buffer, 1600 1605 rb_list_head(rb_list_head(tmp->prev)->next) != tmp)) 1601 - return -1; 1606 + return; 1602 1607 } 1603 - 1604 - return 0; 1605 1608 } 1606 1609 1607 1610 static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, ··· 1767 1774 struct list_head *head = cpu_buffer->pages; 1768 1775 struct buffer_page *bpage, *tmp; 1769 1776 1777 + irq_work_sync(&cpu_buffer->irq_work.work); 1778 + 1770 1779 free_buffer_page(cpu_buffer->reader_page); 1771 1780 1772 1781 if (head) { ··· 1875 1880 1876 1881 cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node); 1877 1882 1883 + irq_work_sync(&buffer->irq_work.work); 1884 + 1878 1885 for_each_buffer_cpu(buffer, cpu) 1879 1886 rb_free_cpu_buffer(buffer->buffers[cpu]); 1880 1887 ··· 1915 1918 return local_read(&bpage->write) & RB_WRITE_MASK; 1916 1919 } 1917 1920 1918 - static int 1921 + static bool 1919 1922 rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages) 1920 1923 { 1921 1924 struct list_head *tail_page, *to_remove, *next_page; ··· 2028 2031 return nr_removed == 0; 2029 2032 } 2030 2033 2031 - static int 2034 + static bool 2032 2035 
rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer) 2033 2036 { 2034 2037 struct list_head *pages = &cpu_buffer->new_pages; 2035 - int retries, success; 2036 2038 unsigned long flags; 2039 + bool success; 2040 + int retries; 2037 2041 2038 2042 /* Can be called at early boot up, where interrupts must not been enabled */ 2039 2043 raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); ··· 2053 2055 * spinning. 2054 2056 */ 2055 2057 retries = 10; 2056 - success = 0; 2058 + success = false; 2057 2059 while (retries--) { 2058 2060 struct list_head *head_page, *prev_page, *r; 2059 2061 struct list_head *last_page, *first_page; 2060 2062 struct list_head *head_page_with_bit; 2063 + struct buffer_page *hpage = rb_set_head_page(cpu_buffer); 2061 2064 2062 - head_page = &rb_set_head_page(cpu_buffer)->list; 2063 - if (!head_page) 2065 + if (!hpage) 2064 2066 break; 2067 + head_page = &hpage->list; 2065 2068 prev_page = head_page->prev; 2066 2069 2067 2070 first_page = pages->next; ··· 2083 2084 * pointer to point to end of list 2084 2085 */ 2085 2086 head_page->prev = last_page; 2086 - success = 1; 2087 + success = true; 2087 2088 break; 2088 2089 } 2089 2090 } ··· 2111 2112 2112 2113 static void rb_update_pages(struct ring_buffer_per_cpu *cpu_buffer) 2113 2114 { 2114 - int success; 2115 + bool success; 2115 2116 2116 2117 if (cpu_buffer->nr_pages_to_update > 0) 2117 2118 success = rb_insert_pages(cpu_buffer); ··· 2994 2995 } 2995 2996 } 2996 2997 2997 - static inline int 2998 + static inline bool 2998 2999 rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer, 2999 3000 struct ring_buffer_event *event) 3000 3001 { ··· 3015 3016 delta = rb_time_delta(event); 3016 3017 3017 3018 if (!rb_time_read(&cpu_buffer->write_stamp, &write_stamp)) 3018 - return 0; 3019 + return false; 3019 3020 3020 3021 /* Make sure the write stamp is read before testing the location */ 3021 3022 barrier(); ··· 3028 3029 /* Something came in, can't discard */ 3029 3030 if 
(!rb_time_cmpxchg(&cpu_buffer->write_stamp, 3030 3031 write_stamp, write_stamp - delta)) 3031 - return 0; 3032 + return false; 3032 3033 3033 3034 /* 3034 3035 * It's possible that the event time delta is zero ··· 3061 3062 if (index == old_index) { 3062 3063 /* update counters */ 3063 3064 local_sub(event_length, &cpu_buffer->entries_bytes); 3064 - return 1; 3065 + return true; 3065 3066 } 3066 3067 } 3067 3068 3068 3069 /* could not discard */ 3069 - return 0; 3070 + return false; 3070 3071 } 3071 3072 3072 3073 static void rb_start_commit(struct ring_buffer_per_cpu *cpu_buffer) ··· 3287 3288 * Note: The TRANSITION bit only handles a single transition between context. 3288 3289 */ 3289 3290 3290 - static __always_inline int 3291 + static __always_inline bool 3291 3292 trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer) 3292 3293 { 3293 3294 unsigned int val = cpu_buffer->current_context; ··· 3304 3305 bit = RB_CTX_TRANSITION; 3305 3306 if (val & (1 << (bit + cpu_buffer->nest))) { 3306 3307 do_ring_buffer_record_recursion(); 3307 - return 1; 3308 + return true; 3308 3309 } 3309 3310 } 3310 3311 3311 3312 val |= (1 << (bit + cpu_buffer->nest)); 3312 3313 cpu_buffer->current_context = val; 3313 3314 3314 - return 0; 3315 + return false; 3315 3316 } 3316 3317 3317 3318 static __always_inline void ··· 4068 4069 unsigned int rd; 4069 4070 unsigned int new_rd; 4070 4071 4072 + rd = atomic_read(&buffer->record_disabled); 4071 4073 do { 4072 - rd = atomic_read(&buffer->record_disabled); 4073 4074 new_rd = rd | RB_BUFFER_OFF; 4074 - } while (atomic_cmpxchg(&buffer->record_disabled, rd, new_rd) != rd); 4075 + } while (!atomic_try_cmpxchg(&buffer->record_disabled, &rd, new_rd)); 4075 4076 } 4076 4077 EXPORT_SYMBOL_GPL(ring_buffer_record_off); 4077 4078 ··· 4091 4092 unsigned int rd; 4092 4093 unsigned int new_rd; 4093 4094 4095 + rd = atomic_read(&buffer->record_disabled); 4094 4096 do { 4095 - rd = atomic_read(&buffer->record_disabled); 4096 4097 new_rd = rd & 
~RB_BUFFER_OFF; 4097 - } while (atomic_cmpxchg(&buffer->record_disabled, rd, new_rd) != rd); 4098 + } while (!atomic_try_cmpxchg(&buffer->record_disabled, &rd, new_rd)); 4098 4099 } 4099 4100 EXPORT_SYMBOL_GPL(ring_buffer_record_on); 4100 4101 ··· 4501 4502 default: 4502 4503 RB_WARN_ON(cpu_buffer, 1); 4503 4504 } 4504 - return; 4505 4505 } 4506 4506 4507 4507 static void ··· 4531 4533 default: 4532 4534 RB_WARN_ON(iter->cpu_buffer, 1); 4533 4535 } 4534 - return; 4535 4536 } 4536 4537 4537 4538 static struct buffer_page * ··· 4540 4543 unsigned long overwrite; 4541 4544 unsigned long flags; 4542 4545 int nr_loops = 0; 4543 - int ret; 4546 + bool ret; 4544 4547 4545 4548 local_irq_save(flags); 4546 4549 arch_spin_lock(&cpu_buffer->lock); ··· 4950 4953 { 4951 4954 if (likely(locked)) 4952 4955 raw_spin_unlock(&cpu_buffer->reader_lock); 4953 - return; 4954 4956 } 4955 4957 4956 4958 /** ··· 5341 5345 } 5342 5346 EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu); 5343 5347 5348 + /* Flag to ensure proper resetting of atomic variables */ 5349 + #define RESET_BIT (1 << 30) 5350 + 5344 5351 /** 5345 5352 * ring_buffer_reset_online_cpus - reset a ring buffer per CPU buffer 5346 5353 * @buffer: The ring buffer to reset a per cpu buffer of ··· 5360 5361 for_each_online_buffer_cpu(buffer, cpu) { 5361 5362 cpu_buffer = buffer->buffers[cpu]; 5362 5363 5363 - atomic_inc(&cpu_buffer->resize_disabled); 5364 + atomic_add(RESET_BIT, &cpu_buffer->resize_disabled); 5364 5365 atomic_inc(&cpu_buffer->record_disabled); 5365 5366 } 5366 5367 5367 5368 /* Make sure all commits have finished */ 5368 5369 synchronize_rcu(); 5369 5370 5370 - for_each_online_buffer_cpu(buffer, cpu) { 5371 + for_each_buffer_cpu(buffer, cpu) { 5371 5372 cpu_buffer = buffer->buffers[cpu]; 5373 + 5374 + /* 5375 + * If a CPU came online during the synchronize_rcu(), then 5376 + * ignore it. 
5377 + */ 5378 + if (!(atomic_read(&cpu_buffer->resize_disabled) & RESET_BIT)) 5379 + continue; 5372 5380 5373 5381 reset_disabled_cpu_buffer(cpu_buffer); 5374 5382 5375 5383 atomic_dec(&cpu_buffer->record_disabled); 5376 - atomic_dec(&cpu_buffer->resize_disabled); 5384 + atomic_sub(RESET_BIT, &cpu_buffer->resize_disabled); 5377 5385 } 5378 5386 5379 5387 mutex_unlock(&buffer->mutex); ··· 5430 5424 struct ring_buffer_per_cpu *cpu_buffer; 5431 5425 unsigned long flags; 5432 5426 bool dolock; 5427 + bool ret; 5433 5428 int cpu; 5434 - int ret; 5435 5429 5436 5430 /* yes this is racy, but if you don't like the race, lock the buffer */ 5437 5431 for_each_buffer_cpu(buffer, cpu) { ··· 5460 5454 struct ring_buffer_per_cpu *cpu_buffer; 5461 5455 unsigned long flags; 5462 5456 bool dolock; 5463 - int ret; 5457 + bool ret; 5464 5458 5465 5459 if (!cpumask_test_cpu(cpu, buffer->cpumask)) 5466 5460 return true;
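The ring_buffer_record_off()/ring_buffer_record_on() hunks above move the initial read out of the loop and switch to atomic_try_cmpxchg(), which writes the current value back into `rd` when the exchange fails. A rough user-space analogue can be written with C11 `<stdatomic.h>`, where `atomic_compare_exchange_weak()` has the same "refresh expected on failure" behaviour. This is only a sketch: `BUFFER_OFF` and the function names are assumptions for illustration, not the kernel's definitions.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for RB_BUFFER_OFF. */
#define BUFFER_OFF	(1u << 20)

/* Set the OFF bit without disturbing the counter bits below it. */
static void record_off(atomic_uint *record_disabled)
{
	unsigned int rd = atomic_load(record_disabled);
	unsigned int new_rd;

	do {
		new_rd = rd | BUFFER_OFF;
		/* On failure, rd is updated to the current value for the retry. */
	} while (!atomic_compare_exchange_weak(record_disabled, &rd, new_rd));
}

/* Clear the OFF bit, again leaving the other bits untouched. */
static void record_on(atomic_uint *record_disabled)
{
	unsigned int rd = atomic_load(record_disabled);
	unsigned int new_rd;

	do {
		new_rd = rd & ~BUFFER_OFF;
	} while (!atomic_compare_exchange_weak(record_disabled, &rd, new_rd));
}
```

Because the failed exchange already hands back the fresh value, the loop body does not need a second explicit load, which is the simplification the diff makes.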
+5 -2
kernel/trace/trace.c
···
 #define STATIC_FMT_BUF_SIZE 128
 static char static_fmt_buf[STATIC_FMT_BUF_SIZE];

-static char *trace_iter_expand_format(struct trace_iterator *iter)
+char *trace_iter_expand_format(struct trace_iterator *iter)
 {
 	char *tmp;

···
 	if (trace_seq_has_overflowed(s))
 		return TRACE_TYPE_PARTIAL_LINE;

-	if (event)
+	if (event) {
+		if (tr->trace_flags & TRACE_ITER_FIELDS)
+			return print_event_fields(iter, event);
 		return event->funcs->trace(iter, sym_flags, event);
+	}

 	trace_seq_printf(s, "Unknown type %d\n", entry->type);

+2
kernel/trace/trace.h
···
 const char *trace_event_format(struct trace_iterator *iter, const char *fmt);
 void trace_check_vprintf(struct trace_iterator *iter, const char *fmt,
 			 va_list ap) __printf(2, 0);
+char *trace_iter_expand_format(struct trace_iterator *iter);

 int trace_empty(struct trace_iterator *iter);

···
 		C(HEX,			"hex"),			\
 		C(BIN,			"bin"),			\
 		C(BLOCK,		"block"),		\
+		C(FIELDS,		"fields"),		\
 		C(PRINTK,		"trace_printk"),	\
 		C(ANNOTATE,		"annotate"),		\
 		C(USERSTACKTRACE,	"userstacktrace"),	\
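The one-line C(FIELDS, "fields") addition above works because the flag list is expanded several times with different definitions of C(), so a single entry yields a bit number, a mask, and an option name. The sketch below shows that X-macro pattern in isolation. The list is cut down to the five entries visible in the hunk, and the `ITER_*` names, `option_names`, and `option_bit()` are hypothetical, not the kernel's identifiers.

```c
#include <stddef.h>
#include <string.h>

/* Reduced flag list; each entry is C(NAME, "option string"). */
#define TRACE_FLAGS			\
	C(HEX,		"hex"),		\
	C(BIN,		"bin"),		\
	C(BLOCK,	"block"),	\
	C(FIELDS,	"fields"),	\
	C(PRINTK,	"trace_printk"),

/* Expansion 1: consecutive bit numbers. */
#undef C
#define C(a, b) ITER_##a##_BIT
enum iter_bits { TRACE_FLAGS ITER_LAST_BIT };

/* Expansion 2: one mask per bit number. */
#undef C
#define C(a, b) ITER_##a = (1 << ITER_##a##_BIT)
enum iter_flags { TRACE_FLAGS };

/* Expansion 3: the option names, in the same order as the bits. */
#undef C
#define C(a, b) b
static const char *const option_names[] = { TRACE_FLAGS NULL };

/* Look up an option by name; returns its bit number or -1 if unknown. */
static int option_bit(const char *name)
{
	int i;

	for (i = 0; option_names[i]; i++)
		if (strcmp(option_names[i], name) == 0)
			return i;
	return -1;
}
```

Inserting a new entry in the middle of the list shifts the bit numbers of the entries after it, so the enum constants should be used by name rather than by hard-coded value.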
+845 -189
kernel/trace/trace_events_user.c
··· 19 19 #include <linux/tracefs.h> 20 20 #include <linux/types.h> 21 21 #include <linux/uaccess.h> 22 - /* Reminder to move to uapi when everything works */ 23 - #ifdef CONFIG_COMPILE_TEST 22 + #include <linux/highmem.h> 23 + #include <linux/init.h> 24 24 #include <linux/user_events.h> 25 - #else 26 - #include <uapi/linux/user_events.h> 27 - #endif 28 - #include "trace.h" 29 25 #include "trace_dynevent.h" 26 + #include "trace_output.h" 27 + #include "trace.h" 30 28 31 29 #define USER_EVENTS_PREFIX_LEN (sizeof(USER_EVENTS_PREFIX)-1) 32 30 ··· 32 34 #define FIELD_DEPTH_NAME 1 33 35 #define FIELD_DEPTH_SIZE 2 34 36 35 - /* 36 - * Limits how many trace_event calls user processes can create: 37 - * Must be a power of two of PAGE_SIZE. 38 - */ 39 - #define MAX_PAGE_ORDER 0 40 - #define MAX_PAGES (1 << MAX_PAGE_ORDER) 41 - #define MAX_BYTES (MAX_PAGES * PAGE_SIZE) 42 - #define MAX_EVENTS (MAX_BYTES * 8) 43 - 44 37 /* Limit how long of an event name plus args within the subsystem. */ 45 38 #define MAX_EVENT_DESC 512 46 39 #define EVENT_NAME(user_event) ((user_event)->tracepoint.name) 47 40 #define MAX_FIELD_ARRAY_SIZE 1024 48 - 49 - /* 50 - * The MAP_STATUS_* macros are used for taking a index and determining the 51 - * appropriate byte and the bit in the byte to set/reset for an event. 52 - * 53 - * The lower 3 bits of the index decide which bit to set. 54 - * The remaining upper bits of the index decide which byte to use for the bit. 55 - * 56 - * This is used when an event has a probe attached/removed to reflect live 57 - * status of the event wanting tracing or not to user-programs via shared 58 - * memory maps. 59 - */ 60 - #define MAP_STATUS_BYTE(index) ((index) >> 3) 61 - #define MAP_STATUS_MASK(index) BIT((index) & 7) 62 41 63 42 /* 64 43 * Internal bits (kernel side only) to keep track of connected probes: ··· 50 75 #define EVENT_STATUS_OTHER BIT(7) 51 76 52 77 /* 53 - * Stores the pages, tables, and locks for a group of events. 
54 - * Each logical grouping of events has its own group, with a 55 - * matching page for status checks within user programs. This 56 - * allows for isolation of events to user programs by various 57 - * means. 78 + * Stores the system name, tables, and locks for a group of events. This 79 + * allows isolation for events by various means. 58 80 */ 59 81 struct user_event_group { 60 - struct page *pages; 61 - char *register_page_data; 62 - char *system_name; 63 - struct hlist_node node; 64 - struct mutex reg_mutex; 82 + char *system_name; 83 + struct hlist_node node; 84 + struct mutex reg_mutex; 65 85 DECLARE_HASHTABLE(register_table, 8); 66 - DECLARE_BITMAP(page_bitmap, MAX_EVENTS); 67 86 }; 68 87 69 88 /* Group for init_user_ns mapping, top-most group */ 70 89 static struct user_event_group *init_group; 90 + 91 + /* Max allowed events for the whole system */ 92 + static unsigned int max_user_events = 32768; 93 + 94 + /* Current number of events on the whole system */ 95 + static unsigned int current_user_events; 71 96 72 97 /* 73 98 * Stores per-event properties, as users register events ··· 77 102 * refcnt reaches one. 
78 103 */ 79 104 struct user_event { 80 - struct user_event_group *group; 81 - struct tracepoint tracepoint; 82 - struct trace_event_call call; 83 - struct trace_event_class class; 84 - struct dyn_event devent; 85 - struct hlist_node node; 86 - struct list_head fields; 87 - struct list_head validators; 88 - refcount_t refcnt; 89 - int index; 90 - int flags; 91 - int min_size; 92 - char status; 105 + struct user_event_group *group; 106 + struct tracepoint tracepoint; 107 + struct trace_event_call call; 108 + struct trace_event_class class; 109 + struct dyn_event devent; 110 + struct hlist_node node; 111 + struct list_head fields; 112 + struct list_head validators; 113 + refcount_t refcnt; 114 + int min_size; 115 + char status; 93 116 }; 117 + 118 + /* 119 + * Stores per-mm/event properties that enable an address to be 120 + * updated properly for each task. As tasks are forked, we use 121 + * these to track enablement sites that are tied to an event. 122 + */ 123 + struct user_event_enabler { 124 + struct list_head link; 125 + struct user_event *event; 126 + unsigned long addr; 127 + 128 + /* Track enable bit, flags, etc. Aligned for bitops. 
*/ 129 + unsigned int values; 130 + }; 131 + 132 + /* Bits 0-5 are for the bit to update upon enable/disable (0-63 allowed) */ 133 + #define ENABLE_VAL_BIT_MASK 0x3F 134 + 135 + /* Bit 6 is for faulting status of enablement */ 136 + #define ENABLE_VAL_FAULTING_BIT 6 137 + 138 + /* Bit 7 is for freeing status of enablement */ 139 + #define ENABLE_VAL_FREEING_BIT 7 140 + 141 + /* Only duplicate the bit value */ 142 + #define ENABLE_VAL_DUP_MASK ENABLE_VAL_BIT_MASK 143 + 144 + #define ENABLE_BITOPS(e) ((unsigned long *)&(e)->values) 145 + 146 + /* Used for asynchronous faulting in of pages */ 147 + struct user_event_enabler_fault { 148 + struct work_struct work; 149 + struct user_event_mm *mm; 150 + struct user_event_enabler *enabler; 151 + int attempt; 152 + }; 153 + 154 + static struct kmem_cache *fault_cache; 155 + 156 + /* Global list of memory descriptors using user_events */ 157 + static LIST_HEAD(user_event_mms); 158 + static DEFINE_SPINLOCK(user_event_mms_lock); 94 159 95 160 /* 96 161 * Stores per-file events references, as users register events ··· 139 124 * These are not shared and only accessible by the file that created it. 
140 125 */ 141 126 struct user_event_refs { 142 - struct rcu_head rcu; 143 - int count; 144 - struct user_event *events[]; 127 + struct rcu_head rcu; 128 + int count; 129 + struct user_event *events[]; 145 130 }; 146 131 147 132 struct user_event_file_info { 148 - struct user_event_group *group; 149 - struct user_event_refs *refs; 133 + struct user_event_group *group; 134 + struct user_event_refs *refs; 150 135 }; 151 136 152 137 #define VALIDATOR_ENSURE_NULL (1 << 0) 153 138 #define VALIDATOR_REL (1 << 1) 154 139 155 140 struct user_event_validator { 156 - struct list_head link; 157 - int offset; 158 - int flags; 141 + struct list_head link; 142 + int offset; 143 + int flags; 159 144 }; 160 145 161 146 typedef void (*user_event_func_t) (struct user_event *user, struct iov_iter *i, ··· 165 150 char *args, char *flags, 166 151 struct user_event **newuser); 167 152 153 + static struct user_event_mm *user_event_mm_get(struct user_event_mm *mm); 154 + static struct user_event_mm *user_event_mm_get_all(struct user_event *user); 155 + static void user_event_mm_put(struct user_event_mm *mm); 156 + 168 157 static u32 user_event_key(char *name) 169 158 { 170 159 return jhash(name, strlen(name), 0); 171 160 } 172 161 173 - static void set_page_reservations(char *pages, bool set) 174 - { 175 - int page; 176 - 177 - for (page = 0; page < MAX_PAGES; ++page) { 178 - void *addr = pages + (PAGE_SIZE * page); 179 - 180 - if (set) 181 - SetPageReserved(virt_to_page(addr)); 182 - else 183 - ClearPageReserved(virt_to_page(addr)); 184 - } 185 - } 186 - 187 162 static void user_event_group_destroy(struct user_event_group *group) 188 163 { 189 - if (group->register_page_data) 190 - set_page_reservations(group->register_page_data, false); 191 - 192 - if (group->pages) 193 - __free_pages(group->pages, MAX_PAGE_ORDER); 194 - 195 164 kfree(group->system_name); 196 165 kfree(group); 197 166 } ··· 246 247 if (!group->system_name) 247 248 goto error; 248 249 249 - group->pages = 
alloc_pages(GFP_KERNEL | __GFP_ZERO, MAX_PAGE_ORDER); 250 - 251 - if (!group->pages) 252 - goto error; 253 - 254 - group->register_page_data = page_address(group->pages); 255 - 256 - set_page_reservations(group->register_page_data, true); 257 - 258 - /* Zero all bits beside 0 (which is reserved for failures) */ 259 - bitmap_zero(group->page_bitmap, MAX_EVENTS); 260 - set_bit(0, group->page_bitmap); 261 - 262 250 mutex_init(&group->reg_mutex); 263 251 hash_init(group->register_table); 264 252 ··· 257 271 return NULL; 258 272 }; 259 273 260 - static __always_inline 261 - void user_event_register_set(struct user_event *user) 274 + static void user_event_enabler_destroy(struct user_event_enabler *enabler) 262 275 { 263 - int i = user->index; 276 + list_del_rcu(&enabler->link); 264 277 265 - user->group->register_page_data[MAP_STATUS_BYTE(i)] |= MAP_STATUS_MASK(i); 278 + /* No longer tracking the event via the enabler */ 279 + refcount_dec(&enabler->event->refcnt); 280 + 281 + kfree(enabler); 266 282 } 267 283 268 - static __always_inline 269 - void user_event_register_clear(struct user_event *user) 284 + static int user_event_mm_fault_in(struct user_event_mm *mm, unsigned long uaddr, 285 + int attempt) 270 286 { 271 - int i = user->index; 287 + bool unlocked; 288 + int ret; 272 289 273 - user->group->register_page_data[MAP_STATUS_BYTE(i)] &= ~MAP_STATUS_MASK(i); 290 + /* 291 + * Normally this is low, ensure that it cannot be taken advantage of by 292 + * bad user processes to cause excessive looping. 
293 + */ 294 + if (attempt > 10) 295 + return -EFAULT; 296 + 297 + mmap_read_lock(mm->mm); 298 + 299 + /* Ensure MM has tasks, cannot use after exit_mm() */ 300 + if (refcount_read(&mm->tasks) == 0) { 301 + ret = -ENOENT; 302 + goto out; 303 + } 304 + 305 + ret = fixup_user_fault(mm->mm, uaddr, FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE, 306 + &unlocked); 307 + out: 308 + mmap_read_unlock(mm->mm); 309 + 310 + return ret; 311 + } 312 + 313 + static int user_event_enabler_write(struct user_event_mm *mm, 314 + struct user_event_enabler *enabler, 315 + bool fixup_fault, int *attempt); 316 + 317 + static void user_event_enabler_fault_fixup(struct work_struct *work) 318 + { 319 + struct user_event_enabler_fault *fault = container_of( 320 + work, struct user_event_enabler_fault, work); 321 + struct user_event_enabler *enabler = fault->enabler; 322 + struct user_event_mm *mm = fault->mm; 323 + unsigned long uaddr = enabler->addr; 324 + int attempt = fault->attempt; 325 + int ret; 326 + 327 + ret = user_event_mm_fault_in(mm, uaddr, attempt); 328 + 329 + if (ret && ret != -ENOENT) { 330 + struct user_event *user = enabler->event; 331 + 332 + pr_warn("user_events: Fault for mm: 0x%pK @ 0x%llx event: %s\n", 333 + mm->mm, (unsigned long long)uaddr, EVENT_NAME(user)); 334 + } 335 + 336 + /* Prevent state changes from racing */ 337 + mutex_lock(&event_mutex); 338 + 339 + /* User asked for enabler to be removed during fault */ 340 + if (test_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler))) { 341 + user_event_enabler_destroy(enabler); 342 + goto out; 343 + } 344 + 345 + /* 346 + * If we managed to get the page, re-issue the write. We do not 347 + * want to get into a possible infinite loop, which is why we only 348 + * attempt again directly if the page came in. If we couldn't get 349 + * the page here, then we will try again the next time the event is 350 + * enabled/disabled. 
351 + */ 352 + clear_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)); 353 + 354 + if (!ret) { 355 + mmap_read_lock(mm->mm); 356 + user_event_enabler_write(mm, enabler, true, &attempt); 357 + mmap_read_unlock(mm->mm); 358 + } 359 + out: 360 + mutex_unlock(&event_mutex); 361 + 362 + /* In all cases we no longer need the mm or fault */ 363 + user_event_mm_put(mm); 364 + kmem_cache_free(fault_cache, fault); 365 + } 366 + 367 + static bool user_event_enabler_queue_fault(struct user_event_mm *mm, 368 + struct user_event_enabler *enabler, 369 + int attempt) 370 + { 371 + struct user_event_enabler_fault *fault; 372 + 373 + fault = kmem_cache_zalloc(fault_cache, GFP_NOWAIT | __GFP_NOWARN); 374 + 375 + if (!fault) 376 + return false; 377 + 378 + INIT_WORK(&fault->work, user_event_enabler_fault_fixup); 379 + fault->mm = user_event_mm_get(mm); 380 + fault->enabler = enabler; 381 + fault->attempt = attempt; 382 + 383 + /* Don't try to queue in again while we have a pending fault */ 384 + set_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)); 385 + 386 + if (!schedule_work(&fault->work)) { 387 + /* Allow another attempt later */ 388 + clear_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)); 389 + 390 + user_event_mm_put(mm); 391 + kmem_cache_free(fault_cache, fault); 392 + 393 + return false; 394 + } 395 + 396 + return true; 397 + } 398 + 399 + static int user_event_enabler_write(struct user_event_mm *mm, 400 + struct user_event_enabler *enabler, 401 + bool fixup_fault, int *attempt) 402 + { 403 + unsigned long uaddr = enabler->addr; 404 + unsigned long *ptr; 405 + struct page *page; 406 + void *kaddr; 407 + int ret; 408 + 409 + lockdep_assert_held(&event_mutex); 410 + mmap_assert_locked(mm->mm); 411 + 412 + *attempt += 1; 413 + 414 + /* Ensure MM has tasks, cannot use after exit_mm() */ 415 + if (refcount_read(&mm->tasks) == 0) 416 + return -ENOENT; 417 + 418 + if (unlikely(test_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)) || 419 + 
test_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler)))) 420 + return -EBUSY; 421 + 422 + ret = pin_user_pages_remote(mm->mm, uaddr, 1, FOLL_WRITE | FOLL_NOFAULT, 423 + &page, NULL, NULL); 424 + 425 + if (unlikely(ret <= 0)) { 426 + if (!fixup_fault) 427 + return -EFAULT; 428 + 429 + if (!user_event_enabler_queue_fault(mm, enabler, *attempt)) 430 + pr_warn("user_events: Unable to queue fault handler\n"); 431 + 432 + return -EFAULT; 433 + } 434 + 435 + kaddr = kmap_local_page(page); 436 + ptr = kaddr + (uaddr & ~PAGE_MASK); 437 + 438 + /* Update bit atomically, user tracers must be atomic as well */ 439 + if (enabler->event && enabler->event->status) 440 + set_bit(enabler->values & ENABLE_VAL_BIT_MASK, ptr); 441 + else 442 + clear_bit(enabler->values & ENABLE_VAL_BIT_MASK, ptr); 443 + 444 + kunmap_local(kaddr); 445 + unpin_user_pages_dirty_lock(&page, 1, true); 446 + 447 + return 0; 448 + } 449 + 450 + static bool user_event_enabler_exists(struct user_event_mm *mm, 451 + unsigned long uaddr, unsigned char bit) 452 + { 453 + struct user_event_enabler *enabler; 454 + struct user_event_enabler *next; 455 + 456 + list_for_each_entry_safe(enabler, next, &mm->enablers, link) { 457 + if (enabler->addr == uaddr && 458 + (enabler->values & ENABLE_VAL_BIT_MASK) == bit) 459 + return true; 460 + } 461 + 462 + return false; 463 + } 464 + 465 + static void user_event_enabler_update(struct user_event *user) 466 + { 467 + struct user_event_enabler *enabler; 468 + struct user_event_mm *mm = user_event_mm_get_all(user); 469 + struct user_event_mm *next; 470 + int attempt; 471 + 472 + while (mm) { 473 + next = mm->next; 474 + mmap_read_lock(mm->mm); 475 + rcu_read_lock(); 476 + 477 + list_for_each_entry_rcu(enabler, &mm->enablers, link) { 478 + if (enabler->event == user) { 479 + attempt = 0; 480 + user_event_enabler_write(mm, enabler, true, &attempt); 481 + } 482 + } 483 + 484 + rcu_read_unlock(); 485 + mmap_read_unlock(mm->mm); 486 + user_event_mm_put(mm); 487 + mm = next; 488 + } 
489 + } 490 + 491 + static bool user_event_enabler_dup(struct user_event_enabler *orig, 492 + struct user_event_mm *mm) 493 + { 494 + struct user_event_enabler *enabler; 495 + 496 + /* Skip pending frees */ 497 + if (unlikely(test_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(orig)))) 498 + return true; 499 + 500 + enabler = kzalloc(sizeof(*enabler), GFP_NOWAIT | __GFP_ACCOUNT); 501 + 502 + if (!enabler) 503 + return false; 504 + 505 + enabler->event = orig->event; 506 + enabler->addr = orig->addr; 507 + 508 + /* Only dup part of value (ignore future flags, etc) */ 509 + enabler->values = orig->values & ENABLE_VAL_DUP_MASK; 510 + 511 + refcount_inc(&enabler->event->refcnt); 512 + list_add_rcu(&enabler->link, &mm->enablers); 513 + 514 + return true; 515 + } 516 + 517 + static struct user_event_mm *user_event_mm_get(struct user_event_mm *mm) 518 + { 519 + refcount_inc(&mm->refcnt); 520 + 521 + return mm; 522 + } 523 + 524 + static struct user_event_mm *user_event_mm_get_all(struct user_event *user) 525 + { 526 + struct user_event_mm *found = NULL; 527 + struct user_event_enabler *enabler; 528 + struct user_event_mm *mm; 529 + 530 + /* 531 + * We do not want to block fork/exec while enablements are being 532 + * updated, so we use RCU to walk the current tasks that have used 533 + * user_events ABI for 1 or more events. Each enabler found in each 534 + * task that matches the event being updated has a write to reflect 535 + * the kernel state back into the process. Waits/faults must not occur 536 + * during this. So we scan the list under RCU for all the mm that have 537 + * the event within it. This is needed because mm_read_lock() can wait. 538 + * Each user mm returned has a ref inc to handle remove RCU races. 
539 + */ 540 + rcu_read_lock(); 541 + 542 + list_for_each_entry_rcu(mm, &user_event_mms, link) 543 + list_for_each_entry_rcu(enabler, &mm->enablers, link) 544 + if (enabler->event == user) { 545 + mm->next = found; 546 + found = user_event_mm_get(mm); 547 + break; 548 + } 549 + 550 + rcu_read_unlock(); 551 + 552 + return found; 553 + } 554 + 555 + static struct user_event_mm *user_event_mm_create(struct task_struct *t) 556 + { 557 + struct user_event_mm *user_mm; 558 + unsigned long flags; 559 + 560 + user_mm = kzalloc(sizeof(*user_mm), GFP_KERNEL_ACCOUNT); 561 + 562 + if (!user_mm) 563 + return NULL; 564 + 565 + user_mm->mm = t->mm; 566 + INIT_LIST_HEAD(&user_mm->enablers); 567 + refcount_set(&user_mm->refcnt, 1); 568 + refcount_set(&user_mm->tasks, 1); 569 + 570 + spin_lock_irqsave(&user_event_mms_lock, flags); 571 + list_add_rcu(&user_mm->link, &user_event_mms); 572 + spin_unlock_irqrestore(&user_event_mms_lock, flags); 573 + 574 + t->user_event_mm = user_mm; 575 + 576 + /* 577 + * The lifetime of the memory descriptor can slightly outlast 578 + * the task lifetime if a ref to the user_event_mm is taken 579 + * between list_del_rcu() and call_rcu(). Therefore we need 580 + * to take a reference to it to ensure it can live this long 581 + * under this corner case. This can also occur in clones that 582 + * outlast the parent. 
583 + */ 584 + mmgrab(user_mm->mm); 585 + 586 + return user_mm; 587 + } 588 + 589 + static struct user_event_mm *current_user_event_mm(void) 590 + { 591 + struct user_event_mm *user_mm = current->user_event_mm; 592 + 593 + if (user_mm) 594 + goto inc; 595 + 596 + user_mm = user_event_mm_create(current); 597 + 598 + if (!user_mm) 599 + goto error; 600 + inc: 601 + refcount_inc(&user_mm->refcnt); 602 + error: 603 + return user_mm; 604 + } 605 + 606 + static void user_event_mm_destroy(struct user_event_mm *mm) 607 + { 608 + struct user_event_enabler *enabler, *next; 609 + 610 + list_for_each_entry_safe(enabler, next, &mm->enablers, link) 611 + user_event_enabler_destroy(enabler); 612 + 613 + mmdrop(mm->mm); 614 + kfree(mm); 615 + } 616 + 617 + static void user_event_mm_put(struct user_event_mm *mm) 618 + { 619 + if (mm && refcount_dec_and_test(&mm->refcnt)) 620 + user_event_mm_destroy(mm); 621 + } 622 + 623 + static void delayed_user_event_mm_put(struct work_struct *work) 624 + { 625 + struct user_event_mm *mm; 626 + 627 + mm = container_of(to_rcu_work(work), struct user_event_mm, put_rwork); 628 + user_event_mm_put(mm); 629 + } 630 + 631 + void user_event_mm_remove(struct task_struct *t) 632 + { 633 + struct user_event_mm *mm; 634 + unsigned long flags; 635 + 636 + might_sleep(); 637 + 638 + mm = t->user_event_mm; 639 + t->user_event_mm = NULL; 640 + 641 + /* Clone will increment the tasks, only remove if last clone */ 642 + if (!refcount_dec_and_test(&mm->tasks)) 643 + return; 644 + 645 + /* Remove the mm from the list, so it can no longer be enabled */ 646 + spin_lock_irqsave(&user_event_mms_lock, flags); 647 + list_del_rcu(&mm->link); 648 + spin_unlock_irqrestore(&user_event_mms_lock, flags); 649 + 650 + /* 651 + * We need to wait for currently occurring writes to stop within 652 + * the mm. This is required since exit_mm() snaps the current rss 653 + * stats and clears them. On the final mmdrop(), check_mm() will 654 + * report a bug if these increment. 
655 + * 656 + * All writes/pins are done under mmap_read lock, take the write 657 + * lock to ensure in-progress faults have completed. Faults that 658 + * are pending but yet to run will check the task count and skip 659 + * the fault since the mm is going away. 660 + */ 661 + mmap_write_lock(mm->mm); 662 + mmap_write_unlock(mm->mm); 663 + 664 + /* 665 + * Put for mm must be done after RCU delay to handle new refs in 666 + * between the list_del_rcu() and now. This ensures any get refs 667 + * during rcu_read_lock() are accounted for during list removal. 668 + * 669 + * CPU A | CPU B 670 + * --------------------------------------------------------------- 671 + * user_event_mm_remove() | rcu_read_lock(); 672 + * list_del_rcu() | list_for_each_entry_rcu(); 673 + * call_rcu() | refcount_inc(); 674 + * . | rcu_read_unlock(); 675 + * schedule_work() | . 676 + * user_event_mm_put() | . 677 + * 678 + * mmdrop() cannot be called in the softirq context of call_rcu() 679 + * so we use a work queue after call_rcu() to run within. 
680 + */ 681 + INIT_RCU_WORK(&mm->put_rwork, delayed_user_event_mm_put); 682 + queue_rcu_work(system_wq, &mm->put_rwork); 683 + } 684 + 685 + void user_event_mm_dup(struct task_struct *t, struct user_event_mm *old_mm) 686 + { 687 + struct user_event_mm *mm = user_event_mm_create(t); 688 + struct user_event_enabler *enabler; 689 + 690 + if (!mm) 691 + return; 692 + 693 + rcu_read_lock(); 694 + 695 + list_for_each_entry_rcu(enabler, &old_mm->enablers, link) 696 + if (!user_event_enabler_dup(enabler, mm)) 697 + goto error; 698 + 699 + rcu_read_unlock(); 700 + 701 + return; 702 + error: 703 + rcu_read_unlock(); 704 + user_event_mm_remove(t); 705 + } 706 + 707 + static bool current_user_event_enabler_exists(unsigned long uaddr, 708 + unsigned char bit) 709 + { 710 + struct user_event_mm *user_mm = current_user_event_mm(); 711 + bool exists; 712 + 713 + if (!user_mm) 714 + return false; 715 + 716 + exists = user_event_enabler_exists(user_mm, uaddr, bit); 717 + 718 + user_event_mm_put(user_mm); 719 + 720 + return exists; 721 + } 722 + 723 + static struct user_event_enabler 724 + *user_event_enabler_create(struct user_reg *reg, struct user_event *user, 725 + int *write_result) 726 + { 727 + struct user_event_enabler *enabler; 728 + struct user_event_mm *user_mm; 729 + unsigned long uaddr = (unsigned long)reg->enable_addr; 730 + int attempt = 0; 731 + 732 + user_mm = current_user_event_mm(); 733 + 734 + if (!user_mm) 735 + return NULL; 736 + 737 + enabler = kzalloc(sizeof(*enabler), GFP_KERNEL_ACCOUNT); 738 + 739 + if (!enabler) 740 + goto out; 741 + 742 + enabler->event = user; 743 + enabler->addr = uaddr; 744 + enabler->values = reg->enable_bit; 745 + retry: 746 + /* Prevents state changes from racing with new enablers */ 747 + mutex_lock(&event_mutex); 748 + 749 + /* Attempt to reflect the current state within the process */ 750 + mmap_read_lock(user_mm->mm); 751 + *write_result = user_event_enabler_write(user_mm, enabler, false, 752 + &attempt); 753 + 
mmap_read_unlock(user_mm->mm); 754 + 755 + /* 756 + * If the write works, then we will track the enabler. A ref to the 757 + * underlying user_event is held by the enabler to prevent it going 758 + * away while the enabler is still in use by a process. The ref is 759 + * removed when the enabler is destroyed. This means a event cannot 760 + * be forcefully deleted from the system until all tasks using it 761 + * exit or run exec(), which includes forks and clones. 762 + */ 763 + if (!*write_result) { 764 + refcount_inc(&enabler->event->refcnt); 765 + list_add_rcu(&enabler->link, &user_mm->enablers); 766 + } 767 + 768 + mutex_unlock(&event_mutex); 769 + 770 + if (*write_result) { 771 + /* Attempt to fault-in and retry if it worked */ 772 + if (!user_event_mm_fault_in(user_mm, uaddr, attempt)) 773 + goto retry; 774 + 775 + kfree(enabler); 776 + enabler = NULL; 777 + } 778 + out: 779 + user_event_mm_put(user_mm); 780 + 781 + return enabler; 274 782 } 275 783 276 784 static __always_inline __must_check ··· 929 449 struct ftrace_event_field *field; 930 450 int validator_flags = 0; 931 451 932 - field = kmalloc(sizeof(*field), GFP_KERNEL); 452 + field = kmalloc(sizeof(*field), GFP_KERNEL_ACCOUNT); 933 453 934 454 if (!field) 935 455 return -ENOMEM; ··· 948 468 if (strstr(type, "char") != NULL) 949 469 validator_flags |= VALIDATOR_ENSURE_NULL; 950 470 951 - validator = kmalloc(sizeof(*validator), GFP_KERNEL); 471 + validator = kmalloc(sizeof(*validator), GFP_KERNEL_ACCOUNT); 952 472 953 473 if (!validator) { 954 474 kfree(field); ··· 968 488 field->size = size; 969 489 field->is_signed = is_signed; 970 490 field->filter_type = filter_type; 491 + 492 + if (filter_type == FILTER_OTHER) 493 + field->filter_type = filter_assign_type(type); 971 494 972 495 list_add(&field->link, &user->fields); 973 496 ··· 1237 754 1238 755 len = user_event_set_print_fmt(user, NULL, 0); 1239 756 1240 - print_fmt = kmalloc(len, GFP_KERNEL); 757 + print_fmt = kmalloc(len, GFP_KERNEL_ACCOUNT); 
1241 758 1242 759 if (!print_fmt) 1243 760 return -ENOMEM; ··· 1253 770 int flags, 1254 771 struct trace_event *event) 1255 772 { 1256 - /* Unsafe to try to decode user provided print_fmt, use hex */ 1257 - trace_print_hex_dump_seq(&iter->seq, "", DUMP_PREFIX_OFFSET, 16, 1258 - 1, iter->ent, iter->ent_size, true); 1259 - 1260 - return trace_handle_return(&iter->seq); 773 + return print_event_fields(iter, event); 1261 774 } 1262 775 1263 776 static struct trace_event_functions user_event_funcs = { ··· 1299 820 { 1300 821 int ret = 0; 1301 822 823 + lockdep_assert_held(&event_mutex); 824 + 1302 825 /* Must destroy fields before call removal */ 1303 826 user_event_destroy_fields(user); 1304 827 ··· 1310 829 return ret; 1311 830 1312 831 dyn_event_remove(&user->devent); 1313 - 1314 - user_event_register_clear(user); 1315 - clear_bit(user->index, user->group->page_bitmap); 1316 832 hash_del(&user->node); 1317 833 1318 834 user_event_destroy_validators(user); 1319 835 kfree(user->call.print_fmt); 1320 836 kfree(EVENT_NAME(user)); 1321 837 kfree(user); 838 + 839 + if (current_user_events > 0) 840 + current_user_events--; 841 + else 842 + pr_alert("BUG: Bad current_user_events\n"); 1322 843 1323 844 return ret; 1324 845 } ··· 1460 977 #endif 1461 978 1462 979 /* 1463 - * Update the register page that is shared between user processes. 980 + * Update the enabled bit among all user processes. 
1464 981 */ 1465 - static void update_reg_page_for(struct user_event *user) 982 + static void update_enable_bit_for(struct user_event *user) 1466 983 { 1467 984 struct tracepoint *tp = &user->tracepoint; 1468 985 char status = 0; ··· 1493 1010 rcu_read_unlock_sched(); 1494 1011 } 1495 1012 1496 - if (status) 1497 - user_event_register_set(user); 1498 - else 1499 - user_event_register_clear(user); 1500 - 1501 1013 user->status = status; 1014 + 1015 + user_event_enabler_update(user); 1502 1016 } 1503 1017 1504 1018 /* ··· 1552 1072 return ret; 1553 1073 inc: 1554 1074 refcount_inc(&user->refcnt); 1555 - update_reg_page_for(user); 1075 + update_enable_bit_for(user); 1556 1076 return 0; 1557 1077 dec: 1558 - update_reg_page_for(user); 1078 + update_enable_bit_for(user); 1559 1079 refcount_dec(&user->refcnt); 1560 1080 return 0; 1561 1081 } ··· 1573 1093 raw_command += USER_EVENTS_PREFIX_LEN; 1574 1094 raw_command = skip_spaces(raw_command); 1575 1095 1576 - name = kstrdup(raw_command, GFP_KERNEL); 1096 + name = kstrdup(raw_command, GFP_KERNEL_ACCOUNT); 1577 1097 1578 1098 if (!name) 1579 1099 return -ENOMEM; ··· 1751 1271 struct user_event **newuser) 1752 1272 { 1753 1273 int ret; 1754 - int index; 1755 1274 u32 key; 1756 1275 struct user_event *user; 1757 1276 ··· 1769 1290 return 0; 1770 1291 } 1771 1292 1772 - index = find_first_zero_bit(group->page_bitmap, MAX_EVENTS); 1773 - 1774 - if (index == MAX_EVENTS) 1775 - return -EMFILE; 1776 - 1777 - user = kzalloc(sizeof(*user), GFP_KERNEL); 1293 + user = kzalloc(sizeof(*user), GFP_KERNEL_ACCOUNT); 1778 1294 1779 1295 if (!user) 1780 1296 return -ENOMEM; ··· 1809 1335 1810 1336 mutex_lock(&event_mutex); 1811 1337 1338 + if (current_user_events >= max_user_events) { 1339 + ret = -EMFILE; 1340 + goto put_user_lock; 1341 + } 1342 + 1812 1343 ret = user_event_trace_register(user); 1813 1344 1814 1345 if (ret) 1815 1346 goto put_user_lock; 1816 - 1817 - user->index = index; 1818 1347 1819 1348 /* Ensure we track self ref and 
caller ref (2) */ 1820 1349 refcount_set(&user->refcnt, 2); 1821 1350 1822 1351 dyn_event_init(&user->devent, &user_event_dops); 1823 1352 dyn_event_add(&user->devent, &user->call); 1824 - set_bit(user->index, group->page_bitmap); 1825 1353 hash_add(group->register_table, &user->node, key); 1354 + current_user_events++; 1826 1355 1827 1356 mutex_unlock(&event_mutex); 1828 1357 ··· 1874 1397 1875 1398 if (unlikely(copy_from_iter(&idx, sizeof(idx), i) != sizeof(idx))) 1876 1399 return -EFAULT; 1400 + 1401 + if (idx < 0) 1402 + return -EINVAL; 1877 1403 1878 1404 rcu_read_lock_sched(); 1879 1405 ··· 1948 1468 if (!group) 1949 1469 return -ENOENT; 1950 1470 1951 - info = kzalloc(sizeof(*info), GFP_KERNEL); 1471 + info = kzalloc(sizeof(*info), GFP_KERNEL_ACCOUNT); 1952 1472 1953 1473 if (!info) 1954 1474 return -ENOMEM; ··· 2001 1521 2002 1522 size = struct_size(refs, events, count + 1); 2003 1523 2004 - new_refs = kzalloc(size, GFP_KERNEL); 1524 + new_refs = kzalloc(size, GFP_KERNEL_ACCOUNT); 2005 1525 2006 1526 if (!new_refs) 2007 1527 return -ENOMEM; ··· 2044 1564 if (ret) 2045 1565 return ret; 2046 1566 1567 + /* Ensure no flags, since we don't support any yet */ 1568 + if (kreg->flags != 0) 1569 + return -EINVAL; 1570 + 1571 + /* Ensure supported size */ 1572 + switch (kreg->enable_size) { 1573 + case 4: 1574 + /* 32-bit */ 1575 + break; 1576 + #if BITS_PER_LONG >= 64 1577 + case 8: 1578 + /* 64-bit */ 1579 + break; 1580 + #endif 1581 + default: 1582 + return -EINVAL; 1583 + } 1584 + 1585 + /* Ensure natural alignment */ 1586 + if (kreg->enable_addr % kreg->enable_size) 1587 + return -EINVAL; 1588 + 1589 + /* Ensure bit range for size */ 1590 + if (kreg->enable_bit > (kreg->enable_size * BITS_PER_BYTE) - 1) 1591 + return -EINVAL; 1592 + 1593 + /* Ensure accessible */ 1594 + if (!access_ok((const void __user *)(uintptr_t)kreg->enable_addr, 1595 + kreg->enable_size)) 1596 + return -EFAULT; 1597 + 2047 1598 kreg->size = size; 2048 1599 2049 1600 return 0; ··· 2089 
1578 struct user_reg __user *ureg = (struct user_reg __user *)uarg; 2090 1579 struct user_reg reg; 2091 1580 struct user_event *user; 1581 + struct user_event_enabler *enabler; 2092 1582 char *name; 2093 1583 long ret; 1584 + int write_result; 2094 1585 2095 1586 ret = user_reg_get(ureg, &reg); 2096 1587 2097 1588 if (ret) 2098 1589 return ret; 1590 + 1591 + /* 1592 + * Prevent users from using the same address and bit multiple times 1593 + * within the same mm address space. This can cause unexpected behavior 1594 + * for user processes that is far easier to debug if this is explictly 1595 + * an error upon registering. 1596 + */ 1597 + if (current_user_event_enabler_exists((unsigned long)reg.enable_addr, 1598 + reg.enable_bit)) 1599 + return -EADDRINUSE; 2099 1600 2100 1601 name = strndup_user((const char __user *)(uintptr_t)reg.name_args, 2101 1602 MAX_EVENT_DESC); ··· 2133 1610 if (ret < 0) 2134 1611 return ret; 2135 1612 1613 + /* 1614 + * user_events_ref_add succeeded: 1615 + * At this point we have a user_event, it's lifetime is bound by the 1616 + * reference count, not this file. If anything fails, the user_event 1617 + * still has a reference until the file is released. During release 1618 + * any remaining references (from user_events_ref_add) are decremented. 1619 + * 1620 + * Attempt to create an enabler, which too has a lifetime tied in the 1621 + * same way for the event. Once the task that caused the enabler to be 1622 + * created exits or issues exec() then the enablers it has created 1623 + * will be destroyed and the ref to the event will be decremented. 
1624 + */ 1625 + enabler = user_event_enabler_create(&reg, user, &write_result); 1626 + 1627 + if (!enabler) 1628 + return -ENOMEM; 1629 + 1630 + /* Write failed/faulted, give error back to caller */ 1631 + if (write_result) 1632 + return write_result; 1633 + 2136 1634 put_user((u32)ret, &ureg->write_index); 2137 - put_user(user->index, &ureg->status_bit); 2138 1635 2139 1636 return 0; 2140 1637 } ··· 2184 1641 return ret; 2185 1642 } 2186 1643 1644 + static long user_unreg_get(struct user_unreg __user *ureg, 1645 + struct user_unreg *kreg) 1646 + { 1647 + u32 size; 1648 + long ret; 1649 + 1650 + ret = get_user(size, &ureg->size); 1651 + 1652 + if (ret) 1653 + return ret; 1654 + 1655 + if (size > PAGE_SIZE) 1656 + return -E2BIG; 1657 + 1658 + if (size < offsetofend(struct user_unreg, disable_addr)) 1659 + return -EINVAL; 1660 + 1661 + ret = copy_struct_from_user(kreg, sizeof(*kreg), ureg, size); 1662 + 1663 + /* Ensure no reserved values, since we don't support any yet */ 1664 + if (kreg->__reserved || kreg->__reserved2) 1665 + return -EINVAL; 1666 + 1667 + return ret; 1668 + } 1669 + 1670 + static int user_event_mm_clear_bit(struct user_event_mm *user_mm, 1671 + unsigned long uaddr, unsigned char bit) 1672 + { 1673 + struct user_event_enabler enabler; 1674 + int result; 1675 + int attempt = 0; 1676 + 1677 + memset(&enabler, 0, sizeof(enabler)); 1678 + enabler.addr = uaddr; 1679 + enabler.values = bit; 1680 + retry: 1681 + /* Prevents state changes from racing with new enablers */ 1682 + mutex_lock(&event_mutex); 1683 + 1684 + /* Force the bit to be cleared, since no event is attached */ 1685 + mmap_read_lock(user_mm->mm); 1686 + result = user_event_enabler_write(user_mm, &enabler, false, &attempt); 1687 + mmap_read_unlock(user_mm->mm); 1688 + 1689 + mutex_unlock(&event_mutex); 1690 + 1691 + if (result) { 1692 + /* Attempt to fault-in and retry if it worked */ 1693 + if (!user_event_mm_fault_in(user_mm, uaddr, attempt)) 1694 + goto retry; 1695 + } 1696 + 1697 + 
return result; 1698 + } 1699 + 1700 + /* 1701 + * Unregisters an enablement address/bit within a task/user mm. 1702 + */ 1703 + static long user_events_ioctl_unreg(unsigned long uarg) 1704 + { 1705 + struct user_unreg __user *ureg = (struct user_unreg __user *)uarg; 1706 + struct user_event_mm *mm = current->user_event_mm; 1707 + struct user_event_enabler *enabler, *next; 1708 + struct user_unreg reg; 1709 + long ret; 1710 + 1711 + ret = user_unreg_get(ureg, &reg); 1712 + 1713 + if (ret) 1714 + return ret; 1715 + 1716 + if (!mm) 1717 + return -ENOENT; 1718 + 1719 + ret = -ENOENT; 1720 + 1721 + /* 1722 + * Flags freeing and faulting are used to indicate if the enabler is in 1723 + * use at all. When faulting is set a page-fault is occurring asyncly. 1724 + * During async fault if freeing is set, the enabler will be destroyed. 1725 + * If no async fault is happening, we can destroy it now since we hold 1726 + * the event_mutex during these checks. 1727 + */ 1728 + mutex_lock(&event_mutex); 1729 + 1730 + list_for_each_entry_safe(enabler, next, &mm->enablers, link) 1731 + if (enabler->addr == reg.disable_addr && 1732 + (enabler->values & ENABLE_VAL_BIT_MASK) == reg.disable_bit) { 1733 + set_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler)); 1734 + 1735 + if (!test_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler))) 1736 + user_event_enabler_destroy(enabler); 1737 + 1738 + /* Removed at least one */ 1739 + ret = 0; 1740 + } 1741 + 1742 + mutex_unlock(&event_mutex); 1743 + 1744 + /* Ensure bit is now cleared for user, regardless of event status */ 1745 + if (!ret) 1746 + ret = user_event_mm_clear_bit(mm, reg.disable_addr, 1747 + reg.disable_bit); 1748 + 1749 + return ret; 1750 + } 1751 + 2187 1752 /* 2188 1753 * Handles the ioctl from user mode to register or alter operations. 
2189 1754 */ ··· 2312 1661 case DIAG_IOCSDEL: 2313 1662 mutex_lock(&group->reg_mutex); 2314 1663 ret = user_events_ioctl_del(info, uarg); 1664 + mutex_unlock(&group->reg_mutex); 1665 + break; 1666 + 1667 + case DIAG_IOCSUNREG: 1668 + mutex_lock(&group->reg_mutex); 1669 + ret = user_events_ioctl_unreg(uarg); 2315 1670 mutex_unlock(&group->reg_mutex); 2316 1671 break; 2317 1672 } ··· 2375 1718 } 2376 1719 2377 1720 static const struct file_operations user_data_fops = { 2378 - .open = user_events_open, 2379 - .write = user_events_write, 2380 - .write_iter = user_events_write_iter, 1721 + .open = user_events_open, 1722 + .write = user_events_write, 1723 + .write_iter = user_events_write_iter, 2381 1724 .unlocked_ioctl = user_events_ioctl, 2382 - .release = user_events_release, 1725 + .release = user_events_release, 2383 1726 }; 2384 - 2385 - static struct user_event_group *user_status_group(struct file *file) 2386 - { 2387 - struct seq_file *m = file->private_data; 2388 - 2389 - if (!m) 2390 - return NULL; 2391 - 2392 - return m->private; 2393 - } 2394 - 2395 - /* 2396 - * Maps the shared page into the user process for checking if event is enabled. 
2397 - */ 2398 - static int user_status_mmap(struct file *file, struct vm_area_struct *vma) 2399 - { 2400 - char *pages; 2401 - struct user_event_group *group = user_status_group(file); 2402 - unsigned long size = vma->vm_end - vma->vm_start; 2403 - 2404 - if (size != MAX_BYTES) 2405 - return -EINVAL; 2406 - 2407 - if (!group) 2408 - return -EINVAL; 2409 - 2410 - pages = group->register_page_data; 2411 - 2412 - return remap_pfn_range(vma, vma->vm_start, 2413 - virt_to_phys(pages) >> PAGE_SHIFT, 2414 - size, vm_get_page_prot(VM_READ)); 2415 - } 2416 1727 2417 1728 static void *user_seq_start(struct seq_file *m, loff_t *pos) 2418 1729 { ··· 2405 1780 struct user_event_group *group = m->private; 2406 1781 struct user_event *user; 2407 1782 char status; 2408 - int i, active = 0, busy = 0, flags; 1783 + int i, active = 0, busy = 0; 2409 1784 2410 1785 if (!group) 2411 1786 return -EINVAL; ··· 2414 1789 2415 1790 hash_for_each(group->register_table, i, user, node) { 2416 1791 status = user->status; 2417 - flags = user->flags; 2418 1792 2419 - seq_printf(m, "%d:%s", user->index, EVENT_NAME(user)); 1793 + seq_printf(m, "%s", EVENT_NAME(user)); 2420 1794 2421 - if (flags != 0 || status != 0) 1795 + if (status != 0) 2422 1796 seq_puts(m, " #"); 2423 1797 2424 1798 if (status != 0) { ··· 2440 1816 seq_puts(m, "\n"); 2441 1817 seq_printf(m, "Active: %d\n", active); 2442 1818 seq_printf(m, "Busy: %d\n", busy); 2443 - seq_printf(m, "Max: %ld\n", MAX_EVENTS); 2444 1819 2445 1820 return 0; 2446 1821 } 2447 1822 2448 1823 static const struct seq_operations user_seq_ops = { 2449 - .start = user_seq_start, 2450 - .next = user_seq_next, 2451 - .stop = user_seq_stop, 2452 - .show = user_seq_show, 1824 + .start = user_seq_start, 1825 + .next = user_seq_next, 1826 + .stop = user_seq_stop, 1827 + .show = user_seq_show, 2453 1828 }; 2454 1829 2455 1830 static int user_status_open(struct inode *node, struct file *file) ··· 2474 1851 } 2475 1852 2476 1853 static const struct file_operations 
user_status_fops = { 2477 - .open = user_status_open, 2478 - .mmap = user_status_mmap, 2479 - .read = seq_read, 2480 - .llseek = seq_lseek, 2481 - .release = seq_release, 1854 + .open = user_status_open, 1855 + .read = seq_read, 1856 + .llseek = seq_lseek, 1857 + .release = seq_release, 2482 1858 }; 2483 1859 2484 1860 /* ··· 2495 1873 goto err; 2496 1874 } 2497 1875 2498 - /* mmap with MAP_SHARED requires writable fd */ 2499 - emmap = tracefs_create_file("user_events_status", TRACE_MODE_WRITE, 1876 + emmap = tracefs_create_file("user_events_status", TRACE_MODE_READ, 2500 1877 NULL, NULL, &user_status_fops); 2501 1878 2502 1879 if (!emmap) { ··· 2509 1888 return -ENODEV; 2510 1889 } 2511 1890 1891 + static int set_max_user_events_sysctl(struct ctl_table *table, int write, 1892 + void *buffer, size_t *lenp, loff_t *ppos) 1893 + { 1894 + int ret; 1895 + 1896 + mutex_lock(&event_mutex); 1897 + 1898 + ret = proc_douintvec(table, write, buffer, lenp, ppos); 1899 + 1900 + mutex_unlock(&event_mutex); 1901 + 1902 + return ret; 1903 + } 1904 + 1905 + static struct ctl_table user_event_sysctls[] = { 1906 + { 1907 + .procname = "user_events_max", 1908 + .data = &max_user_events, 1909 + .maxlen = sizeof(unsigned int), 1910 + .mode = 0644, 1911 + .proc_handler = set_max_user_events_sysctl, 1912 + }, 1913 + {} 1914 + }; 1915 + 2512 1916 static int __init trace_events_user_init(void) 2513 1917 { 2514 1918 int ret; 2515 1919 1920 + fault_cache = KMEM_CACHE(user_event_enabler_fault, 0); 1921 + 1922 + if (!fault_cache) 1923 + return -ENOMEM; 1924 + 2516 1925 init_group = user_event_group_create(&init_user_ns); 2517 1926 2518 - if (!init_group) 1927 + if (!init_group) { 1928 + kmem_cache_destroy(fault_cache); 2519 1929 return -ENOMEM; 1930 + } 2520 1931 2521 1932 ret = create_user_tracefs(); 2522 1933 2523 1934 if (ret) { 2524 1935 pr_warn("user_events could not register with tracefs\n"); 2525 1936 user_event_group_destroy(init_group); 1937 + kmem_cache_destroy(fault_cache); 2526 
1938 init_group = NULL; 2527 1939 return ret; 2528 1940 } 2529 1941 2530 1942 if (dyn_event_register(&user_event_dops)) 2531 1943 pr_warn("user_events could not register with dyn_events\n"); 1944 + 1945 + register_sysctl_init("kernel", user_event_sysctls); 2532 1946 2533 1947 return 0; 2534 1948 }
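The checks added to user_reg_get() in this file (supported enable_size, natural alignment, bit range) can be mirrored in user space before issuing the registration ioctl. The following is a minimal sketch, not the kernel code: check_enable_target() is a hypothetical helper name, sizeof(long) stands in for the kernel's BITS_PER_LONG test, and the kernel's access_ok() check has no equivalent here.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical user-space mirror of the validation in user_reg_get().
 * Returns 0 if the kernel would likely accept the enable target,
 * -EINVAL otherwise. The access_ok() step is not reproduced.
 */
static int check_enable_target(uint64_t enable_addr, uint8_t enable_size,
			       uint8_t enable_bit)
{
	/* Only 32-bit values, plus 64-bit values on 64-bit builds */
	switch (enable_size) {
	case 4:
		break;
	case 8:
		/* Assumption: sizeof(long) approximates BITS_PER_LONG >= 64 */
		if (sizeof(long) < 8)
			return -EINVAL;
		break;
	default:
		return -EINVAL;
	}

	/* The address must be naturally aligned for the chosen size */
	if (enable_addr % enable_size)
		return -EINVAL;

	/* The bit index must fit inside the value */
	if (enable_bit > enable_size * 8 - 1)
		return -EINVAL;

	return 0;
}
```

With a 4-byte value, bit 31 (the one used by samples/user_events/example.c) passes, while bit 32 or an address that is not 4-byte aligned is rejected.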
+174 -1
kernel/trace/trace_output.c
···
 	const char *ret = trace_seq_buffer_ptr(p);
 	const char *fmt = concatenate ? "%*phN" : "%*ph";

-	for (i = 0; i < buf_len; i += 16)
+	for (i = 0; i < buf_len; i += 16) {
+		if (!concatenate && i != 0)
+			trace_seq_putc(p, ' ');
 		trace_seq_printf(p, fmt, min(buf_len - i, 16), &buf[i]);
+	}
 	trace_seq_putc(p, 0);

 	return ret;
···
 /*
  * Standard events
  */
+
+static void print_array(struct trace_iterator *iter, void *pos,
+			struct ftrace_event_field *field)
+{
+	int offset;
+	int len;
+	int i;
+
+	offset = *(int *)pos & 0xffff;
+	len = *(int *)pos >> 16;
+
+	if (field)
+		offset += field->offset + sizeof(int);
+
+	if (offset + len > iter->ent_size) {
+		trace_seq_puts(&iter->seq, "<OVERFLOW>");
+		return;
+	}
+
+	pos = (void *)iter->ent + offset;
+
+	for (i = 0; i < len; i++, pos++) {
+		if (i)
+			trace_seq_putc(&iter->seq, ',');
+		trace_seq_printf(&iter->seq, "%02x", *(unsigned char *)pos);
+	}
+}
+
+static void print_fields(struct trace_iterator *iter, struct trace_event_call *call,
+			 struct list_head *head)
+{
+	struct ftrace_event_field *field;
+	int offset;
+	int len;
+	int ret;
+	void *pos;
+
+	list_for_each_entry(field, head, link) {
+		trace_seq_printf(&iter->seq, " %s=", field->name);
+		if (field->offset + field->size > iter->ent_size) {
+			trace_seq_puts(&iter->seq, "<OVERFLOW>");
+			continue;
+		}
+		pos = (void *)iter->ent + field->offset;
+
+		switch (field->filter_type) {
+		case FILTER_COMM:
+		case FILTER_STATIC_STRING:
+			trace_seq_printf(&iter->seq, "%.*s", field->size, (char *)pos);
+			break;
+		case FILTER_RDYN_STRING:
+		case FILTER_DYN_STRING:
+			offset = *(int *)pos & 0xffff;
+			len = *(int *)pos >> 16;
+
+			if (field->filter_type == FILTER_RDYN_STRING)
+				offset += field->offset + sizeof(int);
+
+			if (offset + len > iter->ent_size) {
+				trace_seq_puts(&iter->seq, "<OVERFLOW>");
+				break;
+			}
+			pos = (void *)iter->ent + offset;
+			trace_seq_printf(&iter->seq, "%.*s", len, (char *)pos);
+			break;
+		case FILTER_PTR_STRING:
+			if (!iter->fmt_size)
+				trace_iter_expand_format(iter);
+			pos = *(void **)pos;
+			ret = strncpy_from_kernel_nofault(iter->fmt, pos,
+							  iter->fmt_size);
+			if (ret < 0)
+				trace_seq_printf(&iter->seq, "(0x%px)", pos);
+			else
+				trace_seq_printf(&iter->seq, "(0x%px:%s)",
+						 pos, iter->fmt);
+			break;
+		case FILTER_TRACE_FN:
+			pos = *(void **)pos;
+			trace_seq_printf(&iter->seq, "%pS", pos);
+			break;
+		case FILTER_CPU:
+		case FILTER_OTHER:
+			switch (field->size) {
+			case 1:
+				if (isprint(*(char *)pos)) {
+					trace_seq_printf(&iter->seq, "'%c'",
+							 *(unsigned char *)pos);
+				}
+				trace_seq_printf(&iter->seq, "(%d)",
+						 *(unsigned char *)pos);
+				break;
+			case 2:
+				trace_seq_printf(&iter->seq, "0x%x (%d)",
+						 *(unsigned short *)pos,
+						 *(unsigned short *)pos);
+				break;
+			case 4:
+				/* dynamic array info is 4 bytes */
+				if (strstr(field->type, "__data_loc")) {
+					print_array(iter, pos, NULL);
+					break;
+				}
+
+				if (strstr(field->type, "__rel_loc")) {
+					print_array(iter, pos, field);
+					break;
+				}
+
+				trace_seq_printf(&iter->seq, "0x%x (%d)",
+						 *(unsigned int *)pos,
+						 *(unsigned int *)pos);
+				break;
+			case 8:
+				trace_seq_printf(&iter->seq, "0x%llx (%lld)",
+						 *(unsigned long long *)pos,
+						 *(unsigned long long *)pos);
+				break;
+			default:
+				trace_seq_puts(&iter->seq, "<INVALID-SIZE>");
+				break;
+			}
+			break;
+		default:
+			trace_seq_puts(&iter->seq, "<INVALID-TYPE>");
+		}
+	}
+	trace_seq_putc(&iter->seq, '\n');
+}
+
+enum print_line_t print_event_fields(struct trace_iterator *iter,
+				     struct trace_event *event)
+{
+	struct trace_event_call *call;
+	struct list_head *head;
+
+	/* ftrace defined events have separate call structures */
+	if (event->type <= __TRACE_LAST_TYPE) {
+		bool found = false;
+
+		down_read(&trace_event_sem);
+		list_for_each_entry(call, &ftrace_events, list) {
+			if (call->event.type == event->type) {
+				found = true;
+				break;
+			}
+			/* No need to search all events */
+			if (call->event.type > __TRACE_LAST_TYPE)
+				break;
+		}
+		up_read(&trace_event_sem);
+		if (!found) {
+			trace_seq_printf(&iter->seq, "UNKNOWN TYPE %d\n", event->type);
+			goto out;
+		}
+	} else {
+		call = container_of(event, struct trace_event_call, event);
+	}
+	head = trace_get_fields(call);
+
+	trace_seq_printf(&iter->seq, "%s:", trace_event_name(call));
+
+	if (head && !list_empty(head))
+		print_fields(iter, call, head);
+	else
+		trace_seq_puts(&iter->seq, "No fields found\n");
+
+out:
+	return trace_handle_return(&iter->seq);
+}

 enum print_line_t trace_nop_print(struct trace_iterator *iter, int flags,
 				  struct trace_event *event)
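print_array() in the hunk above reads a 4-byte descriptor whose low 16 bits hold an offset and whose upper 16 bits hold a length, then prints the bytes as comma-separated hex after a bounds check. A minimal user-space sketch of the __data_loc case (offset counted from the start of the record) follows. The name format_data_loc() and the output-buffer interface are assumptions for illustration, and the sketch uses unsigned arithmetic where the kernel code uses int.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the byte array described by a __data_loc
 * style descriptor into out. Returns the number of characters written,
 * or -1 if out is too small.
 */
static int format_data_loc(const unsigned char *ent, size_t ent_size,
			   uint32_t desc, char *out, size_t out_size)
{
	size_t offset = desc & 0xffff;	/* low 16 bits: offset into the record */
	size_t len = desc >> 16;	/* high 16 bits: length in bytes */
	size_t used = 0;
	size_t i;

	/* Same guard as print_array(): never read past the record */
	if (offset + len > ent_size)
		return snprintf(out, out_size, "<OVERFLOW>");

	if (out_size)
		out[0] = '\0';

	for (i = 0; i < len; i++) {
		int n = snprintf(out + used, out_size - used, "%s%02x",
				 i ? "," : "", ent[offset + i]);

		if (n < 0 || (size_t)n >= out_size - used)
			return -1;	/* output buffer too small */
		used += (size_t)n;
	}

	return (int)used;
}
```

A __rel_loc descriptor would additionally add the field's own offset plus sizeof(int) before the bounds check, as the kernel code does when field is non-NULL.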
+2
kernel/trace/trace_output.h
···
 extern void trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset);
 extern int trace_print_context(struct trace_iterator *iter);
 extern int trace_print_lat_context(struct trace_iterator *iter);
+extern enum print_line_t print_event_fields(struct trace_iterator *iter,
+					    struct trace_event *event);

 extern void trace_event_read_lock(void);
 extern void trace_event_read_unlock(void);
+32
lib/seq_buf.c
···
 }
 EXPORT_SYMBOL_GPL(seq_buf_printf);

+/**
+ * seq_buf_do_printk - printk seq_buf line by line
+ * @s: seq_buf descriptor
+ * @lvl: printk level
+ *
+ * printk()-s a multi-line sequential buffer line by line. The function
+ * makes sure that the buffer in @s is nul terminated and safe to read
+ * as a string.
+ */
+void seq_buf_do_printk(struct seq_buf *s, const char *lvl)
+{
+	const char *start, *lf;
+
+	if (s->size == 0 || s->len == 0)
+		return;
+
+	seq_buf_terminate(s);
+
+	start = s->buffer;
+	while ((lf = strchr(start, '\n'))) {
+		int len = lf - start + 1;
+
+		printk("%s%.*s", lvl, len, start);
+		start = ++lf;
+	}
+
+	/* No trailing LF */
+	if (start < s->buffer + s->len)
+		printk("%s%s\n", lvl, start);
+}
+EXPORT_SYMBOL_GPL(seq_buf_do_printk);
+
 #ifdef CONFIG_BINARY_PRINTF
 /**
  * seq_buf_bprintf - Write the printf string from binary arguments
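The line-splitting loop in seq_buf_do_printk() can be tried outside the kernel. The sketch below is a user-space analogue under stated assumptions: emit_lines() is a hypothetical name, fprintf() stands in for printk(), the input is an ordinary NUL-terminated string rather than a seq_buf, and the returned line count is added only to make the behavior easy to check.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical analogue of seq_buf_do_printk(): write buf to out one line
 * at a time, each line prefixed with prefix. Returns the number of lines
 * written.
 */
static int emit_lines(FILE *out, const char *prefix, const char *buf)
{
	const char *start = buf;
	const char *lf;
	int lines = 0;

	while ((lf = strchr(start, '\n'))) {
		int len = (int)(lf - start) + 1;	/* include the newline */

		fprintf(out, "%s%.*s", prefix, len, start);
		start = lf + 1;
		lines++;
	}

	/* Last line has no trailing newline: add one */
	if (*start) {
		fprintf(out, "%s%s\n", prefix, start);
		lines++;
	}

	return lines;
}
```

Each call to the output function therefore carries at most one line, which is how the helper avoids the single-print size limit mentioned in the pull request text.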
+103 -2
lib/test_fprobe.c
···
 /* Use indirect calls to avoid inlining the target functions */
 static u32 (*target)(u32 value);
 static u32 (*target2)(u32 value);
+static u32 (*target_nest)(u32 value, u32 (*nest)(u32));
 static unsigned long target_ip;
 static unsigned long target2_ip;
+static unsigned long target_nest_ip;
+static int entry_return_value;

 static noinline u32 fprobe_selftest_target(u32 value)
 {
···
 	return (value / div_factor) + 1;
 }

-static notrace void fp_entry_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
+static noinline u32 fprobe_selftest_nest_target(u32 value, u32 (*nest)(u32))
+{
+	return nest(value + 2);
+}
+
+static notrace int fp_entry_handler(struct fprobe *fp, unsigned long ip,
+				    struct pt_regs *regs, void *data)
 {
 	KUNIT_EXPECT_FALSE(current_test, preemptible());
 	/* This can be called on the fprobe_selftest_target and the fprobe_selftest_target2 */
 	if (ip != target_ip)
 		KUNIT_EXPECT_EQ(current_test, ip, target2_ip);
 	entry_val = (rand1 / div_factor);
+	if (fp->entry_data_size) {
+		KUNIT_EXPECT_NOT_NULL(current_test, data);
+		if (data)
+			*(u32 *)data = entry_val;
+	} else
+		KUNIT_EXPECT_NULL(current_test, data);
+
+	return entry_return_value;
 }

-static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
+static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip,
+				    struct pt_regs *regs, void *data)
 {
 	unsigned long ret = regs_return_value(regs);

···
 		KUNIT_EXPECT_EQ(current_test, ret, (rand1 / div_factor));
 	KUNIT_EXPECT_EQ(current_test, entry_val, (rand1 / div_factor));
 	exit_val = entry_val + div_factor;
+	if (fp->entry_data_size) {
+		KUNIT_EXPECT_NOT_NULL(current_test, data);
+		if (data)
+			KUNIT_EXPECT_EQ(current_test, *(u32 *)data, entry_val);
+	} else
+		KUNIT_EXPECT_NULL(current_test, data);
+}
+
+static notrace int nest_entry_handler(struct fprobe *fp, unsigned long ip,
+				      struct pt_regs *regs, void *data)
+{
+	KUNIT_EXPECT_FALSE(current_test, preemptible());
+	return 0;
+}
+
+static notrace void nest_exit_handler(struct fprobe *fp, unsigned long ip,
+				      struct pt_regs *regs, void *data)
+{
+	KUNIT_EXPECT_FALSE(current_test, preemptible());
+	KUNIT_EXPECT_EQ(current_test, ip, target_nest_ip);
 }

 /* Test entry only (no rethook) */
···
 	KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
 }

+/* Test private entry_data */
+static void test_fprobe_data(struct kunit *test)
+{
+	struct fprobe fp = {
+		.entry_handler = fp_entry_handler,
+		.exit_handler = fp_exit_handler,
+		.entry_data_size = sizeof(u32),
+	};
+
+	current_test = test;
+	KUNIT_EXPECT_EQ(test, 0, register_fprobe(&fp, "fprobe_selftest_target", NULL));
+
+	target(rand1);
+
+	KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
+}
+
+/* Test nr_maxactive */
+static void test_fprobe_nest(struct kunit *test)
+{
+	static const char *syms[] = {"fprobe_selftest_target", "fprobe_selftest_nest_target"};
+	struct fprobe fp = {
+		.entry_handler = nest_entry_handler,
+		.exit_handler = nest_exit_handler,
+		.nr_maxactive = 1,
+	};
+
+	current_test = test;
+	KUNIT_EXPECT_EQ(test, 0, register_fprobe_syms(&fp, syms, 2));
+
+	target_nest(rand1, target);
+	KUNIT_EXPECT_EQ(test, 1, fp.nmissed);
+
+	KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
+}
+
+static void test_fprobe_skip(struct kunit *test)
+{
+	struct fprobe fp = {
+		.entry_handler = fp_entry_handler,
+		.exit_handler = fp_exit_handler,
+	};
+
+	current_test = test;
+	KUNIT_EXPECT_EQ(test, 0, register_fprobe(&fp, "fprobe_selftest_target", NULL));
+
+	entry_return_value = 1;
+	entry_val = 0;
+	exit_val = 0;
+	target(rand1);
+	KUNIT_EXPECT_NE(test, 0, entry_val);
+	KUNIT_EXPECT_EQ(test, 0, exit_val);
+	KUNIT_EXPECT_EQ(test, 0, fp.nmissed);
+	entry_return_value = 0;
+
+	KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
+}
+
 static unsigned long get_ftrace_location(void *func)
 {
 	unsigned long size, addr = (unsigned long)func;
···
 	rand1 = get_random_u32_above(div_factor);
 	target = fprobe_selftest_target;
 	target2 = fprobe_selftest_target2;
+	target_nest = fprobe_selftest_nest_target;
 	target_ip = get_ftrace_location(target);
 	target2_ip = get_ftrace_location(target2);
+	target_nest_ip = get_ftrace_location(target_nest);

 	return 0;
 }
···
 	KUNIT_CASE(test_fprobe_entry),
 	KUNIT_CASE(test_fprobe),
 	KUNIT_CASE(test_fprobe_syms),
+	KUNIT_CASE(test_fprobe_data),
+	KUNIT_CASE(test_fprobe_nest),
+	KUNIT_CASE(test_fprobe_skip),
 	{}
 };

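Two behaviors exercised by the new tests can be modelled outside the kernel: entry_data_size gives the entry and exit handlers a shared scratch area (test_fprobe_data), and a non-zero return from the entry handler suppresses the exit handler (test_fprobe_skip). The sketch below is a toy user-space model only. struct toy_probe, toy_probe_call() and the toy_* handlers are hypothetical names, and nothing here uses ftrace or rethook.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for struct fprobe; not the kernel structure */
struct toy_probe {
	int (*entry_handler)(struct toy_probe *p, void *data);
	void (*exit_handler)(struct toy_probe *p, void *data);
	size_t entry_data_size;	/* bytes of scratch shared by both handlers */
	int entries_run;
	int exits_run;
	uint32_t seen_at_exit;
};

/* Call fn(arg) with the toy probe attached around it */
static uint32_t toy_probe_call(struct toy_probe *p, uint32_t (*fn)(uint32_t),
			       uint32_t arg)
{
	unsigned char scratch[64];
	void *data = NULL;
	uint32_t ret;
	int skip_exit = 0;

	/* Handlers only get a data pointer when a size was requested */
	if (p->entry_data_size && p->entry_data_size <= sizeof(scratch)) {
		memset(scratch, 0, sizeof(scratch));
		data = scratch;
	}

	if (p->entry_handler)
		skip_exit = p->entry_handler(p, data);

	ret = fn(arg);

	/* A non-zero return from the entry handler cancels the exit handler */
	if (!skip_exit && p->exit_handler)
		p->exit_handler(p, data);

	return ret;
}

static int toy_entry(struct toy_probe *p, void *data)
{
	uint32_t v = 42;

	p->entries_run++;
	if (data)
		memcpy(data, &v, sizeof(v));	/* stash a value for the exit handler */
	return 0;
}

static int toy_entry_skip(struct toy_probe *p, void *data)
{
	(void)data;
	p->entries_run++;
	return 1;	/* ask for the exit handler to be skipped */
}

static void toy_exit(struct toy_probe *p, void *data)
{
	p->exits_run++;
	if (data)
		memcpy(&p->seen_at_exit, data, sizeof(p->seen_at_exit));
}

static uint32_t toy_target(uint32_t value)
{
	return value + 1;
}
```

Attaching toy_entry and toy_exit with entry_data_size set to sizeof(uint32_t) lets the exit handler read back the value stored at entry. Swapping in toy_entry_skip leaves exits_run at zero, which is the same expectation test_fprobe_skip places on exit_val.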
+5 -2
samples/fprobe/fprobe_example.c
···
 	stack_trace_print(stacks, len, 24);
 }

-static void sample_entry_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
+static int sample_entry_handler(struct fprobe *fp, unsigned long ip,
+				struct pt_regs *regs, void *data)
 {
 	if (use_trace)
 		/*
···
 	nhit++;
 	if (stackdump)
 		show_backtrace();
+	return 0;
 }

-static void sample_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
+static void sample_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs,
+				void *data)
 {
 	unsigned long rip = instruction_pointer(regs);
+8 -37
samples/user_events/example.c
···
 #include <errno.h>
 #include <sys/ioctl.h>
 #include <sys/mman.h>
+#include <sys/uio.h>
 #include <fcntl.h>
 #include <stdio.h>
 #include <unistd.h>
-#include <asm/bitsperlong.h>
-#include <endian.h>
 #include <linux/user_events.h>

-#if __BITS_PER_LONG == 64
-#define endian_swap(x) htole64(x)
-#else
-#define endian_swap(x) htole32(x)
-#endif
-
-/* Assumes debugfs is mounted */
 const char *data_file = "/sys/kernel/tracing/user_events_data";
-const char *status_file = "/sys/kernel/tracing/user_events_status";
+int enabled = 0;

-static int event_status(long **status)
-{
-	int fd = open(status_file, O_RDONLY);
-
-	*status = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ,
-		       MAP_SHARED, fd, 0);
-
-	close(fd);
-
-	if (*status == MAP_FAILED)
-		return -1;
-
-	return 0;
-}
-
-static int event_reg(int fd, const char *command, long *index, long *mask,
-		     int *write)
+static int event_reg(int fd, const char *command, int *write, int *enabled)
 {
 	struct user_reg reg = {0};

 	reg.size = sizeof(reg);
+	reg.enable_bit = 31;
+	reg.enable_size = sizeof(*enabled);
+	reg.enable_addr = (__u64)enabled;
 	reg.name_args = (__u64)command;

 	if (ioctl(fd, DIAG_IOCSREG, &reg) == -1)
 		return -1;

-	*index = reg.status_bit / __BITS_PER_LONG;
-	*mask = endian_swap(1L << (reg.status_bit % __BITS_PER_LONG));
 	*write = reg.write_index;

 	return 0;
···
 int main(int argc, char **argv)
 {
 	int data_fd, write;
-	long index, mask;
-	long *status_page;
 	struct iovec io[2];
 	__u32 count = 0;

-	if (event_status(&status_page) == -1)
-		return errno;
-
 	data_fd = open(data_file, O_RDWR);

-	if (event_reg(data_fd, "test u32 count", &index, &mask, &write) == -1)
+	if (event_reg(data_fd, "test u32 count", &write, &enabled) == -1)
 		return errno;

 	/* Setup iovec */
···
 	io[0].iov_len = sizeof(write);
 	io[1].iov_base = &count;
 	io[1].iov_len = sizeof(count);
-
 ask:
 	printf("Press enter to check status...\n");
 	getchar();

 	/* Check if anyone is listening */
-	if (status_page[index] & mask) {
+	if (enabled) {
 		/* Yep, trace out our data */
 		writev(data_fd, (const struct iovec *)io, 2);
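With this change the sample no longer maps a status page. It tests the registered `enabled` word directly and, when it is non-zero, writes the event as two iovecs: the write index first, then the payload. A rough sketch of that gate-then-write step as a standalone helper follows. The name `emit_count` and its return convention are assumptions made for illustration; the only system call used is `writev()`, so the helper can be tried against any file descriptor, for example a pipe.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the sample: emit one "u32 count" event
 * if tracing is enabled. `enabled` points at the word registered through
 * enable_addr and `write_index` is the value returned in reg.write_index.
 * Returns the number of bytes written, 0 when the event is disabled, or
 * -1 on error with errno set by writev().
 */
static ssize_t emit_count(int fd, const volatile int *enabled,
			  int write_index, uint32_t count)
{
	struct iovec io[2];

	assert(enabled);
	if (!*enabled)
		return 0;	/* nobody is listening: skip the syscall */

	io[0].iov_base = &write_index;
	io[0].iov_len = sizeof(write_index);
	io[1].iov_base = &count;
	io[1].iov_len = sizeof(count);

	return writev(fd, io, 2);
}
```

Reading the word through a `volatile` pointer is a design choice of this sketch, intended to make each call re-read a value that the kernel may change at any time; the sample itself uses a plain `int`.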
+1
scripts/leaking_addresses.pl
···
 	'/proc/device-tree',
 	'/proc/1/syscall',
 	'/sys/firmware/devicetree',
+	'/sys/kernel/tracing/trace_pipe',
 	'/sys/kernel/debug/tracing/trace_pipe',
 	'/sys/kernel/security/apparmor/revision');
+5 -1
scripts/recordmcount.c
···
 {
 	size_t cnt = count;
 	off_t idx = 0;
+	void *p = NULL;

 	file_updated = 1;

···
 		off_t aoffset = (file_ptr + count) - file_end;

 		if (aoffset > file_append_size) {
-			file_append = realloc(file_append, aoffset);
+			p = realloc(file_append, aoffset);
+			if (!p)
+				free(file_append);
+			file_append = p;
 			file_append_size = aoffset;
 		}
 		if (!file_append) {
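The recordmcount hunk fixes a common leak: assigning the result of `realloc()` straight back to the only pointer loses the old block when the call fails. The same pattern is sketched below as a standalone helper. `grow_buffer` is a hypothetical name, and like the patch it frees the old block on failure instead of keeping it.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: grow *buf to new_size bytes.
 * On failure the old allocation is freed and *buf becomes NULL, mirroring
 * the hunk above. Returns 0 on success, -1 on failure.
 * new_size must be non-zero, because realloc(ptr, 0) may itself free ptr.
 */
static int grow_buffer(void **buf, size_t new_size)
{
	void *p;

	assert(buf && new_size > 0);
	p = realloc(*buf, new_size);
	if (!p) {
		free(*buf);	/* realloc left the old block allocated */
		*buf = NULL;
		return -1;
	}
	*buf = p;
	return 0;
}
```

A caller that can still use the old data after a failed grow would skip the `free()` and leave `*buf` alone; freeing suits recordmcount because it gives up on the file in that case.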
+1 -1
tools/kvm/kvm_stat/kvm_stat
···
         name)'.

         All available events have directories under
-        /sys/kernel/debug/tracing/events/ which export information
+        /sys/kernel/tracing/events/ which export information
         about the specific event. Therefore, listing the dirs gives us
         a list of all available events.
+2 -2
tools/testing/selftests/mm/protection_keys.c
···
 void tracing_on(void)
 {
 #if CONTROL_TRACING > 0
-#define TRACEDIR "/sys/kernel/debug/tracing"
+#define TRACEDIR "/sys/kernel/tracing"
 	char pidstr[32];

 	if (!tracing_root_ok())
···
 #if CONTROL_TRACING > 0
 	if (!tracing_root_ok())
 		return;
-	cat_into_file("0", "/sys/kernel/debug/tracing/tracing_on");
+	cat_into_file("0", "/sys/kernel/tracing/tracing_on");
 #endif
 }
+1 -1
tools/testing/selftests/user_events/Makefile
···
 # This test will not compile until user_events.h is added
 # back to uapi.

-TEST_GEN_PROGS = ftrace_test dyn_test perf_test
+TEST_GEN_PROGS = ftrace_test dyn_test perf_test abi_test

 TEST_FILES := settings
+229
tools/testing/selftests/user_events/abi_test.c
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * User Events ABI Test Program
+ *
+ * Copyright (c) 2022 Beau Belgrave <beaub@linux.microsoft.com>
+ */
+
+#define _GNU_SOURCE
+#include <sched.h>
+
+#include <errno.h>
+#include <linux/user_events.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <asm/unistd.h>
+
+#include "../kselftest_harness.h"
+
+const char *data_file = "/sys/kernel/tracing/user_events_data";
+const char *enable_file = "/sys/kernel/tracing/events/user_events/__abi_event/enable";
+
+static int change_event(bool enable)
+{
+	int fd = open(enable_file, O_RDWR);
+	int ret;
+
+	if (fd < 0)
+		return -1;
+
+	if (enable)
+		ret = write(fd, "1", 1);
+	else
+		ret = write(fd, "0", 1);
+
+	close(fd);
+
+	if (ret == 1)
+		ret = 0;
+	else
+		ret = -1;
+
+	return ret;
+}
+
+static int reg_enable(long *enable, int size, int bit)
+{
+	struct user_reg reg = {0};
+	int fd = open(data_file, O_RDWR);
+	int ret;
+
+	if (fd < 0)
+		return -1;
+
+	reg.size = sizeof(reg);
+	reg.name_args = (__u64)"__abi_event";
+	reg.enable_bit = bit;
+	reg.enable_addr = (__u64)enable;
+	reg.enable_size = size;
+
+	ret = ioctl(fd, DIAG_IOCSREG, &reg);
+
+	close(fd);
+
+	return ret;
+}
+
+static int reg_disable(long *enable, int bit)
+{
+	struct user_unreg reg = {0};
+	int fd = open(data_file, O_RDWR);
+	int ret;
+
+	if (fd < 0)
+		return -1;
+
+	reg.size = sizeof(reg);
+	reg.disable_bit = bit;
+	reg.disable_addr = (__u64)enable;
+
+	ret = ioctl(fd, DIAG_IOCSUNREG, &reg);
+
+	close(fd);
+
+	return ret;
+}
+
+FIXTURE(user) {
+	long check;
+};
+
+FIXTURE_SETUP(user) {
+	change_event(false);
+	self->check = 0;
+}
+
+FIXTURE_TEARDOWN(user) {
+}
+
+TEST_F(user, enablement) {
+	/* Changes should be reflected immediately */
+	ASSERT_EQ(0, self->check);
+	ASSERT_EQ(0, reg_enable(&self->check, sizeof(int), 0));
+	ASSERT_EQ(0, change_event(true));
+	ASSERT_EQ(1, self->check);
+	ASSERT_EQ(0, change_event(false));
+	ASSERT_EQ(0, self->check);
+
+	/* Ensure kernel clears bit after disable */
+	ASSERT_EQ(0, change_event(true));
+	ASSERT_EQ(1, self->check);
+	ASSERT_EQ(0, reg_disable(&self->check, 0));
+	ASSERT_EQ(0, self->check);
+
+	/* Ensure doesn't change after unreg */
+	ASSERT_EQ(0, change_event(true));
+	ASSERT_EQ(0, self->check);
+	ASSERT_EQ(0, change_event(false));
+}
+
+TEST_F(user, bit_sizes) {
+	/* Allow 0-31 bits for 32-bit */
+	ASSERT_EQ(0, reg_enable(&self->check, sizeof(int), 0));
+	ASSERT_EQ(0, reg_enable(&self->check, sizeof(int), 31));
+	ASSERT_NE(0, reg_enable(&self->check, sizeof(int), 32));
+	ASSERT_EQ(0, reg_disable(&self->check, 0));
+	ASSERT_EQ(0, reg_disable(&self->check, 31));
+
+#if BITS_PER_LONG == 8
+	/* Allow 0-64 bits for 64-bit */
+	ASSERT_EQ(0, reg_enable(&self->check, sizeof(long), 63));
+	ASSERT_NE(0, reg_enable(&self->check, sizeof(long), 64));
+	ASSERT_EQ(0, reg_disable(&self->check, 63));
+#endif
+
+	/* Disallowed sizes (everything beside 4 and 8) */
+	ASSERT_NE(0, reg_enable(&self->check, 1, 0));
+	ASSERT_NE(0, reg_enable(&self->check, 2, 0));
+	ASSERT_NE(0, reg_enable(&self->check, 3, 0));
+	ASSERT_NE(0, reg_enable(&self->check, 5, 0));
+	ASSERT_NE(0, reg_enable(&self->check, 6, 0));
+	ASSERT_NE(0, reg_enable(&self->check, 7, 0));
+	ASSERT_NE(0, reg_enable(&self->check, 9, 0));
+	ASSERT_NE(0, reg_enable(&self->check, 128, 0));
+}
+
+TEST_F(user, forks) {
+	int i;
+
+	/* Ensure COW pages get updated after fork */
+	ASSERT_EQ(0, reg_enable(&self->check, sizeof(int), 0));
+	ASSERT_EQ(0, self->check);
+
+	if (fork() == 0) {
+		/* Force COW */
+		self->check = 0;
+
+		/* Up to 1 sec for enablement */
+		for (i = 0; i < 10; ++i) {
+			usleep(100000);
+
+			if (self->check)
+				exit(0);
+		}
+
+		exit(1);
+	}
+
+	/* Allow generous time for COW, then enable */
+	usleep(100000);
+	ASSERT_EQ(0, change_event(true));
+
+	ASSERT_NE(-1, wait(&i));
+	ASSERT_EQ(0, WEXITSTATUS(i));
+
+	/* Ensure child doesn't disable parent */
+	if (fork() == 0)
+		exit(reg_disable(&self->check, 0));
+
+	ASSERT_NE(-1, wait(&i));
+	ASSERT_EQ(0, WEXITSTATUS(i));
+	ASSERT_EQ(1, self->check);
+	ASSERT_EQ(0, change_event(false));
+	ASSERT_EQ(0, self->check);
+}
+
+/* Waits up to 1 sec for enablement */
+static int clone_check(void *check)
+{
+	int i;
+
+	for (i = 0; i < 10; ++i) {
+		usleep(100000);
+
+		if (*(long *)check)
+			return 0;
+	}
+
+	return 1;
+}
+
+TEST_F(user, clones) {
+	int i, stack_size = 4096;
+	void *stack = mmap(NULL, stack_size, PROT_READ | PROT_WRITE,
+			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK,
+			  -1, 0);
+
+	ASSERT_NE(MAP_FAILED, stack);
+	ASSERT_EQ(0, reg_enable(&self->check, sizeof(int), 0));
+	ASSERT_EQ(0, self->check);
+
+	/* Shared VM should see enablements */
+	ASSERT_NE(-1, clone(&clone_check, stack + stack_size,
+			    CLONE_VM | SIGCHLD, &self->check));
+
+	ASSERT_EQ(0, change_event(true));
+	ASSERT_NE(-1, wait(&i));
+	ASSERT_EQ(0, WEXITSTATUS(i));
+	munmap(stack, stack_size);
+	ASSERT_EQ(0, change_event(false));
+}
+
+int main(int argc, char **argv)
+{
+	return test_harness_run(argc, argv);
+}
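The `bit_sizes` test pins down which `enable_size` and `enable_bit` combinations registration should accept: a 4-byte word with bits 0 to 31, an 8-byte word with bits 0 to 63 on 64-bit, and nothing else. A small predicate expressing that expectation is sketched below. `enable_args_ok` and `enable_args_examples` are hypothetical names, and the code restates what the test asserts; it is not the kernel's validation code. One tentative observation: the `#if BITS_PER_LONG == 8` guard compares a bit count with a byte count, so the 64-bit assertions look as though they are never compiled in, and the "0-64" in the comment presumably means bits 0 to 63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check, not kernel code: accept a 4-byte enable word with
 * bits 0..31, or an 8-byte word with bits 0..63 when long is 8 bytes.
 */
static bool enable_args_ok(size_t size, unsigned int bit)
{
	if (size == 4)
		return bit < 32;
	if (size == 8 && sizeof(long) == 8)
		return bit < 64;
	return false;
}

/* Usage: the same boundaries the bit_sizes test probes (assumes 4-byte int). */
static void enable_args_examples(void)
{
	assert(enable_args_ok(sizeof(int), 0));
	assert(enable_args_ok(sizeof(int), 31));
	assert(!enable_args_ok(sizeof(int), 32));
	assert(!enable_args_ok(1, 0));
	assert(!enable_args_ok(128, 0));
}
```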
+1 -1
tools/testing/selftests/user_events/dyn_test.c
···

 #include "../kselftest_harness.h"

-const char *dyn_file = "/sys/kernel/debug/tracing/dynamic_events";
+const char *dyn_file = "/sys/kernel/tracing/dynamic_events";
 const char *clear = "!u:__test_event";

 static int Append(const char *value)
+101 -75
tools/testing/selftests/user_events/ftrace_test.c
···
 #include <fcntl.h>
 #include <sys/ioctl.h>
 #include <sys/stat.h>
+#include <sys/uio.h>
 #include <unistd.h>

 #include "../kselftest_harness.h"

-const char *data_file = "/sys/kernel/debug/tracing/user_events_data";
-const char *status_file = "/sys/kernel/debug/tracing/user_events_status";
-const char *enable_file = "/sys/kernel/debug/tracing/events/user_events/__test_event/enable";
-const char *trace_file = "/sys/kernel/debug/tracing/trace";
-const char *fmt_file = "/sys/kernel/debug/tracing/events/user_events/__test_event/format";
-
-static inline int status_check(char *status_page, int status_bit)
-{
-	return status_page[status_bit >> 3] & (1 << (status_bit & 7));
-}
+const char *data_file = "/sys/kernel/tracing/user_events_data";
+const char *status_file = "/sys/kernel/tracing/user_events_status";
+const char *enable_file = "/sys/kernel/tracing/events/user_events/__test_event/enable";
+const char *trace_file = "/sys/kernel/tracing/trace";
+const char *fmt_file = "/sys/kernel/tracing/events/user_events/__test_event/format";

 static int trace_bytes(void)
 {
···
 	return -1;
 }

-static int clear(void)
+static int clear(int *check)
 {
+	struct user_unreg unreg = {0};
+
+	unreg.size = sizeof(unreg);
+	unreg.disable_bit = 31;
+	unreg.disable_addr = (__u64)check;
+
 	int fd = open(data_file, O_RDWR);

 	if (fd == -1)
 		return -1;
+
+	if (ioctl(fd, DIAG_IOCSUNREG, &unreg) == -1)
+		if (errno != ENOENT)
+			return -1;

 	if (ioctl(fd, DIAG_IOCSDEL, "__test_event") == -1)
 		if (errno != ENOENT)
···
 	return 0;
 }

-static int check_print_fmt(const char *event, const char *expected)
+static int check_print_fmt(const char *event, const char *expected, int *check)
 {
 	struct user_reg reg = {0};
 	char print_fmt[256];
···
 	int fd;

 	/* Ensure cleared */
-	ret = clear();
+	ret = clear(check);

 	if (ret != 0)
 		return ret;
···

 	reg.size = sizeof(reg);
 	reg.name_args = (__u64)event;
+	reg.enable_bit = 31;
+	reg.enable_addr = (__u64)check;
+	reg.enable_size = sizeof(*check);

 	/* Register should work */
 	ret = ioctl(fd, DIAG_IOCSREG, &reg);

 	close(fd);

-	if (ret != 0)
+	if (ret != 0) {
+		printf("Reg failed in fmt\n");
 		return ret;
+	}

 	/* Ensure correct print_fmt */
 	ret = get_print_fmt(print_fmt, sizeof(print_fmt));
···
 	int status_fd;
 	int data_fd;
 	int enable_fd;
+	int check;
 };

 FIXTURE_SETUP(user) {
···
 		close(self->enable_fd);
 	}

-	ASSERT_EQ(0, clear());
+	if (clear(&self->check) != 0)
+		printf("WARNING: Clear didn't work!\n");
 }

 TEST_F(user, register_events) {
 	struct user_reg reg = {0};
-	int page_size = sysconf(_SC_PAGESIZE);
-	char *status_page;
+	struct user_unreg unreg = {0};

 	reg.size = sizeof(reg);
 	reg.name_args = (__u64)"__test_event u32 field1; u32 field2";
+	reg.enable_bit = 31;
+	reg.enable_addr = (__u64)&self->check;
+	reg.enable_size = sizeof(self->check);

-	status_page = mmap(NULL, page_size, PROT_READ, MAP_SHARED,
-			   self->status_fd, 0);
+	unreg.size = sizeof(unreg);
+	unreg.disable_bit = 31;
+	unreg.disable_addr = (__u64)&self->check;

 	/* Register should work */
 	ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
 	ASSERT_EQ(0, reg.write_index);
-	ASSERT_NE(0, reg.status_bit);

-	/* Multiple registers should result in same index */
+	/* Multiple registers to the same addr + bit should fail */
+	ASSERT_EQ(-1, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
+	ASSERT_EQ(EADDRINUSE, errno);
+
+	/* Multiple registers to same name should result in same index */
+	reg.enable_bit = 30;
 	ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
 	ASSERT_EQ(0, reg.write_index);
-	ASSERT_NE(0, reg.status_bit);

 	/* Ensure disabled */
 	self->enable_fd = open(enable_file, O_RDWR);
 	ASSERT_NE(-1, self->enable_fd);
 	ASSERT_NE(-1, write(self->enable_fd, "0", sizeof("0")))

-	/* MMAP should work and be zero'd */
-	ASSERT_NE(MAP_FAILED, status_page);
-	ASSERT_NE(NULL, status_page);
-	ASSERT_EQ(0, status_check(status_page, reg.status_bit));
-
 	/* Enable event and ensure bits updated in status */
 	ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1")))
-	ASSERT_NE(0, status_check(status_page, reg.status_bit));
+	ASSERT_EQ(1 << reg.enable_bit, self->check);

 	/* Disable event and ensure bits updated in status */
 	ASSERT_NE(-1, write(self->enable_fd, "0", sizeof("0")))
-	ASSERT_EQ(0, status_check(status_page, reg.status_bit));
+	ASSERT_EQ(0, self->check);

 	/* File still open should return -EBUSY for delete */
 	ASSERT_EQ(-1, ioctl(self->data_fd, DIAG_IOCSDEL, "__test_event"));
 	ASSERT_EQ(EBUSY, errno);

-	/* Delete should work only after close */
+	/* Unregister */
+	ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSUNREG, &unreg));
+	unreg.disable_bit = 30;
+	ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSUNREG, &unreg));
+
+	/* Delete should work only after close and unregister */
 	close(self->data_fd);
 	self->data_fd = open(data_file, O_RDWR);
 	ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSDEL, "__test_event"));
-
-	/* Unmap should work */
-	ASSERT_EQ(0, munmap(status_page, page_size));
 }

 TEST_F(user, write_events) {
···
 	struct iovec io[3];
 	__u32 field1, field2;
 	int before = 0, after = 0;
-	int page_size = sysconf(_SC_PAGESIZE);
-	char *status_page;

 	reg.size = sizeof(reg);
 	reg.name_args = (__u64)"__test_event u32 field1; u32 field2";
+	reg.enable_bit = 31;
+	reg.enable_addr = (__u64)&self->check;
+	reg.enable_size = sizeof(self->check);

 	field1 = 1;
 	field2 = 2;
···
 	io[2].iov_base = &field2;
 	io[2].iov_len = sizeof(field2);

-	status_page = mmap(NULL, page_size, PROT_READ, MAP_SHARED,
-			   self->status_fd, 0);
-
 	/* Register should work */
 	ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
 	ASSERT_EQ(0, reg.write_index);
-	ASSERT_NE(0, reg.status_bit);
-
-	/* MMAP should work and be zero'd */
-	ASSERT_NE(MAP_FAILED, status_page);
-	ASSERT_NE(NULL, status_page);
-	ASSERT_EQ(0, status_check(status_page, reg.status_bit));
+	ASSERT_EQ(0, self->check);

 	/* Write should fail on invalid slot with ENOENT */
 	io[0].iov_base = &field2;
···
 	ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1")))

 	/* Event should now be enabled */
-	ASSERT_NE(0, status_check(status_page, reg.status_bit));
+	ASSERT_NE(1 << reg.enable_bit, self->check);

 	/* Write should make it out to ftrace buffers */
 	before = trace_bytes();
 	ASSERT_NE(-1, writev(self->data_fd, (const struct iovec *)io, 3));
 	after = trace_bytes();
 	ASSERT_GT(after, before);
+
+	/* Negative index should fail with EINVAL */
+	reg.write_index = -1;
+	ASSERT_EQ(-1, writev(self->data_fd, (const struct iovec *)io, 3));
+	ASSERT_EQ(EINVAL, errno);
 }

 TEST_F(user, write_fault) {
···

 	reg.size = sizeof(reg);
 	reg.name_args = (__u64)"__test_event u64 anon";
+	reg.enable_bit = 31;
+	reg.enable_addr = (__u64)&self->check;
+	reg.enable_size = sizeof(self->check);

 	anon = mmap(NULL, l, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
 	ASSERT_NE(MAP_FAILED, anon);
···
 	/* Register should work */
 	ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
 	ASSERT_EQ(0, reg.write_index);
-	ASSERT_NE(0, reg.status_bit);

 	/* Write should work normally */
 	ASSERT_NE(-1, writev(self->data_fd, (const struct iovec *)io, 2));
···
 	int loc, bytes;
 	char data[8];
 	int before = 0, after = 0;
-	int page_size = sysconf(_SC_PAGESIZE);
-	char *status_page;
-
-	status_page = mmap(NULL, page_size, PROT_READ, MAP_SHARED,
-			   self->status_fd, 0);

 	reg.size = sizeof(reg);
 	reg.name_args = (__u64)"__test_event __rel_loc char[] data";
+	reg.enable_bit = 31;
+	reg.enable_addr = (__u64)&self->check;
+	reg.enable_size = sizeof(self->check);

 	/* Register should work */
 	ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
 	ASSERT_EQ(0, reg.write_index);
-	ASSERT_NE(0, reg.status_bit);
-
-	/* MMAP should work and be zero'd */
-	ASSERT_NE(MAP_FAILED, status_page);
-	ASSERT_NE(NULL, status_page);
-	ASSERT_EQ(0, status_check(status_page, reg.status_bit));
+	ASSERT_EQ(0, self->check);

 	io[0].iov_base = &reg.write_index;
 	io[0].iov_len = sizeof(reg.write_index);
···
 	ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1")))

 	/* Event should now be enabled */
-	ASSERT_NE(0, status_check(status_page, reg.status_bit));
+	ASSERT_EQ(1 << reg.enable_bit, self->check);

 	/* Full in-bounds write should work */
 	before = trace_bytes();
···
 	int ret;

 	ret = check_print_fmt("__test_event __rel_loc char[] data",
-			      "print fmt: \"data=%s\", __get_rel_str(data)");
+			      "print fmt: \"data=%s\", __get_rel_str(data)",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event __data_loc char[] data",
-			      "print fmt: \"data=%s\", __get_str(data)");
+			      "print fmt: \"data=%s\", __get_str(data)",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event s64 data",
-			      "print fmt: \"data=%lld\", REC->data");
+			      "print fmt: \"data=%lld\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event u64 data",
-			      "print fmt: \"data=%llu\", REC->data");
+			      "print fmt: \"data=%llu\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event s32 data",
-			      "print fmt: \"data=%d\", REC->data");
+			      "print fmt: \"data=%d\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event u32 data",
-			      "print fmt: \"data=%u\", REC->data");
+			      "print fmt: \"data=%u\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event int data",
-			      "print fmt: \"data=%d\", REC->data");
+			      "print fmt: \"data=%d\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event unsigned int data",
-			      "print fmt: \"data=%u\", REC->data");
+			      "print fmt: \"data=%u\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event s16 data",
-			      "print fmt: \"data=%d\", REC->data");
+			      "print fmt: \"data=%d\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event u16 data",
-			      "print fmt: \"data=%u\", REC->data");
+			      "print fmt: \"data=%u\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event short data",
-			      "print fmt: \"data=%d\", REC->data");
+			      "print fmt: \"data=%d\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event unsigned short data",
-			      "print fmt: \"data=%u\", REC->data");
+			      "print fmt: \"data=%u\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event s8 data",
-			      "print fmt: \"data=%d\", REC->data");
+			      "print fmt: \"data=%d\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event u8 data",
-			      "print fmt: \"data=%u\", REC->data");
+			      "print fmt: \"data=%u\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event char data",
-			      "print fmt: \"data=%d\", REC->data");
+			      "print fmt: \"data=%d\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event unsigned char data",
-			      "print fmt: \"data=%u\", REC->data");
+			      "print fmt: \"data=%u\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);

 	ret = check_print_fmt("__test_event char[4] data",
-			      "print fmt: \"data=%s\", REC->data");
+			      "print fmt: \"data=%s\", REC->data",
+			      &self->check);
 	ASSERT_EQ(0, ret);
 }
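These tests register bit 31 of a 4-byte word and then compare the word against `1 << reg.enable_bit`. As far as I know, shifting a signed `1` left by 31 is not well defined in ISO C for a 32-bit `int`, although common compilers produce the expected bit pattern. A sketch of an unsigned formulation that avoids the question is below. `enable_mask32` and `enable_bit_set` are hypothetical helpers and are not used by the selftests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask for enable bit `bit` in a 32-bit enable word. */
static uint32_t enable_mask32(unsigned int bit)
{
	assert(bit < 32);
	return UINT32_C(1) << bit;
}

/* Non-zero when `bit` is set in the registered word. */
static int enable_bit_set(uint32_t word, unsigned int bit)
{
	return (word & enable_mask32(bit)) != 0;
}
```

Testing a single bit instead of comparing the whole word for equality also keeps working when several events share one word through different bits, as `register_events` does with bits 30 and 31.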
+16 -23
tools/testing/selftests/user_events/perf_test.c
···

 #include "../kselftest_harness.h"

-const char *data_file = "/sys/kernel/debug/tracing/user_events_data";
-const char *status_file = "/sys/kernel/debug/tracing/user_events_status";
-const char *id_file = "/sys/kernel/debug/tracing/events/user_events/__test_event/id";
-const char *fmt_file = "/sys/kernel/debug/tracing/events/user_events/__test_event/format";
+const char *data_file = "/sys/kernel/tracing/user_events_data";
+const char *id_file = "/sys/kernel/tracing/events/user_events/__test_event/id";
+const char *fmt_file = "/sys/kernel/tracing/events/user_events/__test_event/format";

 struct event {
 	__u32 index;
···
 		    int cpu, int group_fd, unsigned long flags)
 {
 	return syscall(__NR_perf_event_open, pe, pid, cpu, group_fd, flags);
-}
-
-static inline int status_check(char *status_page, int status_bit)
-{
-	return status_page[status_bit >> 3] & (1 << (status_bit & 7));
 }

 static int get_id(void)
···
 }

 FIXTURE(user) {
-	int status_fd;
 	int data_fd;
+	int check;
 };

 FIXTURE_SETUP(user) {
-	self->status_fd = open(status_file, O_RDONLY);
-	ASSERT_NE(-1, self->status_fd);
-
 	self->data_fd = open(data_file, O_RDWR);
 	ASSERT_NE(-1, self->data_fd);
 }

 FIXTURE_TEARDOWN(user) {
-	close(self->status_fd);
 	close(self->data_fd);
 }

 TEST_F(user, perf_write) {
 	struct perf_event_attr pe = {0};
 	struct user_reg reg = {0};
-	int page_size = sysconf(_SC_PAGESIZE);
-	char *status_page;
 	struct event event;
 	struct perf_event_mmap_page *perf_page;
+	int page_size = sysconf(_SC_PAGESIZE);
 	int id, fd, offset;
 	__u32 *val;

 	reg.size = sizeof(reg);
 	reg.name_args = (__u64)"__test_event u32 field1; u32 field2";
-
-	status_page = mmap(NULL, page_size, PROT_READ, MAP_SHARED,
-			   self->status_fd, 0);
-	ASSERT_NE(MAP_FAILED, status_page);
+	reg.enable_bit = 31;
+	reg.enable_addr = (__u64)&self->check;
+	reg.enable_size = sizeof(self->check);

 	/* Register should work */
 	ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
 	ASSERT_EQ(0, reg.write_index);
-	ASSERT_NE(0, reg.status_bit);
-	ASSERT_EQ(0, status_check(status_page, reg.status_bit));
+	ASSERT_EQ(0, self->check);

 	/* Id should be there */
 	id = get_id();
···
 	ASSERT_NE(MAP_FAILED, perf_page);

 	/* Status should be updated */
-	ASSERT_NE(0, status_check(status_page, reg.status_bit));
+	ASSERT_EQ(1 << reg.enable_bit, self->check);

 	event.index = reg.write_index;
 	event.field1 = 0xc001;
···
 	/* Ensure correct */
 	ASSERT_EQ(event.field1, *val++);
 	ASSERT_EQ(event.field2, *val++);
+
+	munmap(perf_page, page_size * 2);
+	close(fd);
+
+	/* Status should be updated */
+	ASSERT_EQ(0, self->check);
 }

 int main(int argc, char **argv)