Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

- Remove eventfs_file descriptor

This is the biggest change, and the second part of making eventfs
create its files dynamically.

In v6.6 the first part was added, and that maintained a one to one
mapping between eventfs meta descriptors and the directories and file
inodes and dentries that were dynamically created. The directories
were represented by an eventfs_inode and the files were represented by
an eventfs_file.

In v6.7 the eventfs_file is removed. As all events have the same
directory makeup (sched_switch has "enable", "id", "format", etc.
files), the handling of what files are underneath each leaf eventfs
directory is moved back to the tracing subsystem via a callback.

When an event is added to the eventfs, it registers an array of
eventfs_entry structures. These hold the names of the files and the
callbacks to call when the file is referenced. The callback gets the
name so that the same callback may be used by multiple files. The
callback then supplies the file_operations structure needed to create
this file.

This has brought the memory footprint of creating multiple eventfs
instances down by 2 megs each!

- User events now have persistent events that are not associated with a
single process. These are privileged events that hang around even
if no process is attached to them.

- Clean up of seq_buf

There's talk about using seq_buf more to replace strscpy() and
friends. But this first requires some minor modifications of seq_buf
to make that possible.

- Expand instance ring buffers individually

Currently, if boot up creates an instance and a trace event is
enabled on that instance, both the ring buffer for that instance and
the top level ring buffer are expanded (1.4 MB per CPU). This wastes
memory, as it happens even when nothing is using the top level instance.

- Other minor clean ups and fixes

* tag 'trace-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits)
seq_buf: Export seq_buf_puts()
seq_buf: Export seq_buf_putc()
eventfs: Use simple_recursive_removal() to clean up dentries
eventfs: Remove special processing of dput() of events directory
eventfs: Delete eventfs_inode when the last dentry is freed
eventfs: Hold eventfs_mutex when calling callback functions
eventfs: Save ownership and mode
eventfs: Test for ei->is_freed when accessing ei->dentry
eventfs: Have a free_ei() that just frees the eventfs_inode
eventfs: Remove "is_freed" union with rcu head
eventfs: Fix kerneldoc of eventfs_remove_rec()
tracing: Have the user copy of synthetic event address use correct context
eventfs: Remove extra dget() in eventfs_create_events_dir()
tracing: Have trace_event_file have ref counters
seq_buf: Introduce DECLARE_SEQ_BUF and seq_buf_str()
eventfs: Fix typo in eventfs_inode union comment
eventfs: Fix WARN_ON() in create_file_dentry()
powerpc: Remove initialisation of readpos
tracing/histograms: Simplify last_cmd_set()
seq_buf: fix a misleading comment
...
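
The "eventfs: Save ownership and mode" commit in the list above records permission changes made by the user so they can be reapplied when a dentry is created again. The diff does this by keeping "saved" flags in the bits above the 16-bit mode value. The following user-space sketch shows that packing. The names `struct saved_attr`, `struct fake_inode`, `save_mode()`, `save_uid()` and `apply_attr()` are hypothetical; only the bit positions and the masking follow the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits placed above the 16 bits used by a mode value. */
enum {
	SAVE_MODE = 1u << 16,
	SAVE_UID  = 1u << 17,
	SAVE_GID  = 1u << 18,
};

#define MODE_MASK (SAVE_MODE - 1)

/* Hypothetical cache of what the user changed. */
struct saved_attr {
	uint32_t mode; /* low 16 bits: mode, upper bits: SAVE_* flags */
	uint32_t uid;
	uint32_t gid;
};

/* Hypothetical stand-in for the inode fields that matter here. */
struct fake_inode {
	uint32_t mode;
	uint32_t uid;
	uint32_t gid;
};

/* Record a new mode while preserving the flag bits already set. */
void save_mode(struct saved_attr *attr, uint32_t new_mode)
{
	attr->mode = (attr->mode & ~MODE_MASK) | (new_mode & MODE_MASK) | SAVE_MODE;
}

/* Record a new owner and mark it as saved. */
void save_uid(struct saved_attr *attr, uint32_t uid)
{
	attr->mode |= SAVE_UID;
	attr->uid = uid;
}

/* Apply saved values to a freshly created inode, else use the default mode. */
void apply_attr(struct fake_inode *inode, const struct saved_attr *attr,
		uint32_t default_mode)
{
	assert(inode != NULL);

	if (!attr) {
		inode->mode = default_mode;
		return;
	}
	if (attr->mode & SAVE_MODE)
		inode->mode = attr->mode & MODE_MASK;
	else
		inode->mode = default_mode;
	if (attr->mode & SAVE_UID)
		inode->uid = attr->uid;
	if (attr->mode & SAVE_GID)
		inode->gid = attr->gid;
}
```

With nothing saved, `apply_attr()` falls back to the default mode. After `save_mode(&attr, 0100600)` the saved mode wins, and the flag bits never leak into the inode because of the `MODE_MASK` filter.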

+1310 -724
+18 -3
Documentation/trace/user_events.rst
···
 /sys/kernel/tracing/user_events_status and can both register and write
 data out via /sys/kernel/tracing/user_events_data.

+Programs can also use /sys/kernel/tracing/dynamic_events to register and
+delete user based events via the u: prefix. The format of the command to
+dynamic_events is the same as the ioctl with the u: prefix applied. This
+requires CAP_PERFMON due to the event persisting, otherwise -EPERM is returned.
+
 Typically programs will register a set of events that they wish to expose to
 tools that can read trace_events (such as ftrace and perf). The registration
 process tells the kernel which address and bit to reflect if any tool has
···
         /* Input: Enable size in bytes at address */
         __u8 enable_size;

-        /* Input: Flags for future use, set to 0 */
+        /* Input: Flags to use, if any */
         __u16 flags;

         /* Input: Address to update when enabled */
···
   This must be 4 (32-bit) or 8 (64-bit). 64-bit values are only allowed to be
   used on 64-bit kernels, however, 32-bit can be used on all kernels.

-+ flags: The flags to use, if any. For the initial version this must be 0.
++ flags: The flags to use, if any.
   Callers should first attempt to use flags and retry without flags to ensure
   support for lower versions of the kernel. If a flag is not supported -EINVAL
   is returned.
···

 + name_args: The name and arguments to describe the event, see command format
   for details.
+
+The following flags are currently supported.
+
++ USER_EVENT_REG_PERSIST: The event will not delete upon the last reference
+  closing. Callers may use this if an event should exist even after the
+  process closes or unregisters the event. Requires CAP_PERFMON otherwise
+  -EPERM is returned.

 Upon successful registration the following is set.

···
 to request deletes than the one used for registration due to this.

 **NOTE:** By default events will auto-delete when there are no references left
-to the event. Flags in the future may change this logic.
+to the event. If programs do not want auto-delete, they must use the
+USER_EVENT_REG_PERSIST flag when registering the event. Once that flag is used
+the event exists until DIAG_IOCSDEL is invoked. Both register and delete of an
+event that persists requires CAP_PERFMON, otherwise -EPERM is returned.

 Unregistering
 -------------
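
The documentation change above advises callers to try registering with flags first and to retry without them if the kernel answers -EINVAL. A minimal sketch of that retry logic follows. It makes several assumptions: the registration step is abstracted behind a hypothetical `register_fn` hook (in a real program this would wrap the registration ioctl on user_events_data), `PERSIST_FLAG` is a placeholder value standing in for USER_EVENT_REG_PERSIST from the kernel headers, and the three `mock_*` functions exist only to make the behaviour visible without a kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for illustration; real code takes the flag from the uapi header. */
#define PERSIST_FLAG ((uint16_t)(1u << 0))

/* Hypothetical registration hook: returns 0 on success or a negative errno. */
typedef int (*register_fn)(uint16_t flags, void *ctx);

/*
 * Try with the wanted flags; on -EINVAL (flag unknown to this kernel) retry
 * with no flags. Any other error, such as -EPERM, is passed to the caller.
 */
int register_with_fallback(register_fn do_register, void *ctx,
			   uint16_t wanted_flags, uint16_t *used_flags)
{
	int ret;

	assert(do_register != NULL && used_flags != NULL);

	ret = do_register(wanted_flags, ctx);
	if (ret == -EINVAL && wanted_flags != 0) {
		wanted_flags = 0;
		ret = do_register(wanted_flags, ctx);
	}
	if (ret == 0)
		*used_flags = wanted_flags;
	return ret;
}

/* Mock: a kernel that predates flags and rejects any non-zero value. */
int mock_old_kernel(uint16_t flags, void *ctx)
{
	(void)ctx;
	return flags ? -EINVAL : 0;
}

/* Mock: a kernel that accepts the persist flag from a privileged caller. */
int mock_new_kernel(uint16_t flags, void *ctx)
{
	(void)ctx;
	return (flags & ~PERSIST_FLAG) ? -EINVAL : 0;
}

/* Mock: the caller lacks CAP_PERFMON, so a persistent event is refused. */
int mock_unprivileged(uint16_t flags, void *ctx)
{
	(void)ctx;
	return (flags & PERSIST_FLAG) ? -EPERM : 0;
}
```

The caller should check `used_flags` afterwards: if the fallback path was taken, the event is not persistent and will auto-delete once the last reference is gone. The sketch deliberately does not retry on -EPERM, since silently dropping persistence for an unprivileged caller is a policy decision the documentation leaves to the program.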
-1
arch/powerpc/kernel/setup-common.c
···
 	.buffer = ppc_hw_desc_buf,
 	.size = sizeof(ppc_hw_desc_buf),
 	.len = 0,
-	.readpos = 0,
 };

 static __init void probe_machine(void)
+666 -491
fs/tracefs/event_inode.c
··· 2 2 /* 3 3 * event_inode.c - part of tracefs, a pseudo file system for activating tracing 4 4 * 5 - * Copyright (C) 2020-23 VMware Inc, author: Steven Rostedt (VMware) <rostedt@goodmis.org> 5 + * Copyright (C) 2020-23 VMware Inc, author: Steven Rostedt <rostedt@goodmis.org> 6 6 * Copyright (C) 2020-23 VMware Inc, author: Ajay Kaher <akaher@vmware.com> 7 + * Copyright (C) 2023 Google, author: Steven Rostedt <rostedt@goodmis.org> 7 8 * 8 9 * eventfs is used to dynamically create inodes and dentries based on the 9 10 * meta data provided by the tracing system. ··· 24 23 #include <linux/delay.h> 25 24 #include "internal.h" 26 25 27 - struct eventfs_inode { 28 - struct list_head e_top_files; 29 - }; 26 + /* 27 + * eventfs_mutex protects the eventfs_inode (ei) dentry. Any access 28 + * to the ei->dentry must be done under this mutex and after checking 29 + * if ei->is_freed is not set. The ei->dentry is released under the 30 + * mutex at the same time ei->is_freed is set. If ei->is_freed is set 31 + * then the ei->dentry is invalid. 32 + */ 33 + static DEFINE_MUTEX(eventfs_mutex); 30 34 31 35 /* 32 - * struct eventfs_file - hold the properties of the eventfs files and 33 - * directories. 34 - * @name: the name of the file or directory to create 35 - * @d_parent: holds parent's dentry 36 - * @dentry: once accessed holds dentry 37 - * @list: file or directory to be added to parent directory 38 - * @ei: list of files and directories within directory 39 - * @fop: file_operations for file or directory 40 - * @iop: inode_operations for file or directory 41 - * @data: something that the caller will want to get to later on 42 - * @mode: the permission that the file or directory should have 36 + * The eventfs_inode (ei) itself is protected by SRCU. It is released from 37 + * its parent's list and will have is_freed set (under eventfs_mutex). 38 + * After the SRCU grace period is over, the ei may be freed. 
43 39 */ 44 - struct eventfs_file { 45 - const char *name; 46 - struct dentry *d_parent; 47 - struct dentry *dentry; 48 - struct list_head list; 49 - struct eventfs_inode *ei; 50 - const struct file_operations *fop; 51 - const struct inode_operations *iop; 52 - /* 53 - * Union - used for deletion 54 - * @del_list: list of eventfs_file to delete 55 - * @rcu: eventfs_file to delete in RCU 56 - * @is_freed: node is freed if one of the above is set 57 - */ 58 - union { 59 - struct list_head del_list; 60 - struct rcu_head rcu; 61 - unsigned long is_freed; 62 - }; 63 - void *data; 64 - umode_t mode; 40 + DEFINE_STATIC_SRCU(eventfs_srcu); 41 + 42 + /* Mode is unsigned short, use the upper bits for flags */ 43 + enum { 44 + EVENTFS_SAVE_MODE = BIT(16), 45 + EVENTFS_SAVE_UID = BIT(17), 46 + EVENTFS_SAVE_GID = BIT(18), 65 47 }; 66 48 67 - static DEFINE_MUTEX(eventfs_mutex); 68 - DEFINE_STATIC_SRCU(eventfs_srcu); 49 + #define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1) 69 50 70 51 static struct dentry *eventfs_root_lookup(struct inode *dir, 71 52 struct dentry *dentry, ··· 56 73 static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx); 57 74 static int eventfs_release(struct inode *inode, struct file *file); 58 75 76 + static void update_attr(struct eventfs_attr *attr, struct iattr *iattr) 77 + { 78 + unsigned int ia_valid = iattr->ia_valid; 79 + 80 + if (ia_valid & ATTR_MODE) { 81 + attr->mode = (attr->mode & ~EVENTFS_MODE_MASK) | 82 + (iattr->ia_mode & EVENTFS_MODE_MASK) | 83 + EVENTFS_SAVE_MODE; 84 + } 85 + if (ia_valid & ATTR_UID) { 86 + attr->mode |= EVENTFS_SAVE_UID; 87 + attr->uid = iattr->ia_uid; 88 + } 89 + if (ia_valid & ATTR_GID) { 90 + attr->mode |= EVENTFS_SAVE_GID; 91 + attr->gid = iattr->ia_gid; 92 + } 93 + } 94 + 95 + static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, 96 + struct iattr *iattr) 97 + { 98 + const struct eventfs_entry *entry; 99 + struct eventfs_inode *ei; 100 + const char *name; 101 + int ret; 102 + 
103 + mutex_lock(&eventfs_mutex); 104 + ei = dentry->d_fsdata; 105 + if (ei->is_freed) { 106 + /* Do not allow changes if the event is about to be removed. */ 107 + mutex_unlock(&eventfs_mutex); 108 + return -ENODEV; 109 + } 110 + 111 + /* Preallocate the children mode array if necessary */ 112 + if (!(dentry->d_inode->i_mode & S_IFDIR)) { 113 + if (!ei->entry_attrs) { 114 + ei->entry_attrs = kzalloc(sizeof(*ei->entry_attrs) * ei->nr_entries, 115 + GFP_KERNEL); 116 + if (!ei->entry_attrs) { 117 + ret = -ENOMEM; 118 + goto out; 119 + } 120 + } 121 + } 122 + 123 + ret = simple_setattr(idmap, dentry, iattr); 124 + if (ret < 0) 125 + goto out; 126 + 127 + /* 128 + * If this is a dir, then update the ei cache, only the file 129 + * mode is saved in the ei->m_children, and the ownership is 130 + * determined by the parent directory. 131 + */ 132 + if (dentry->d_inode->i_mode & S_IFDIR) { 133 + update_attr(&ei->attr, iattr); 134 + 135 + } else { 136 + name = dentry->d_name.name; 137 + 138 + for (int i = 0; i < ei->nr_entries; i++) { 139 + entry = &ei->entries[i]; 140 + if (strcmp(name, entry->name) == 0) { 141 + update_attr(&ei->entry_attrs[i], iattr); 142 + break; 143 + } 144 + } 145 + } 146 + out: 147 + mutex_unlock(&eventfs_mutex); 148 + return ret; 149 + } 150 + 59 151 static const struct inode_operations eventfs_root_dir_inode_operations = { 60 152 .lookup = eventfs_root_lookup, 153 + .setattr = eventfs_set_attr, 154 + }; 155 + 156 + static const struct inode_operations eventfs_file_inode_operations = { 157 + .setattr = eventfs_set_attr, 61 158 }; 62 159 63 160 static const struct file_operations eventfs_file_operations = { ··· 148 85 .release = eventfs_release, 149 86 }; 150 87 88 + static void update_inode_attr(struct inode *inode, struct eventfs_attr *attr, umode_t mode) 89 + { 90 + if (!attr) { 91 + inode->i_mode = mode; 92 + return; 93 + } 94 + 95 + if (attr->mode & EVENTFS_SAVE_MODE) 96 + inode->i_mode = attr->mode & EVENTFS_MODE_MASK; 97 + else 98 + 
inode->i_mode = mode; 99 + 100 + if (attr->mode & EVENTFS_SAVE_UID) 101 + inode->i_uid = attr->uid; 102 + 103 + if (attr->mode & EVENTFS_SAVE_GID) 104 + inode->i_gid = attr->gid; 105 + } 106 + 151 107 /** 152 108 * create_file - create a file in the tracefs filesystem 153 109 * @name: the name of the file to create. 154 110 * @mode: the permission that the file should have. 111 + * @attr: saved attributes changed by user 155 112 * @parent: parent dentry for this file. 156 113 * @data: something that the caller will want to get to later on. 157 114 * @fop: struct file_operations that should be used for this file. 158 115 * 159 - * This is the basic "create a file" function for tracefs. It allows for a 160 - * wide range of flexibility in creating a file. 161 - * 162 - * This function will return a pointer to a dentry if it succeeds. This 163 - * pointer must be passed to the tracefs_remove() function when the file is 164 - * to be removed (no automatic cleanup happens if your module is unloaded, 165 - * you are responsible here.) If an error occurs, %NULL will be returned. 166 - * 167 - * If tracefs is not enabled in the kernel, the value -%ENODEV will be 168 - * returned. 116 + * This function creates a dentry that represents a file in the eventsfs_inode 117 + * directory. The inode.i_private pointer will point to @data in the open() 118 + * call. 
169 119 */ 170 120 static struct dentry *create_file(const char *name, umode_t mode, 121 + struct eventfs_attr *attr, 171 122 struct dentry *parent, void *data, 172 123 const struct file_operations *fop) 173 124 { ··· 195 118 if (WARN_ON_ONCE(!S_ISREG(mode))) 196 119 return NULL; 197 120 121 + WARN_ON_ONCE(!parent); 198 122 dentry = eventfs_start_creating(name, parent); 199 123 200 124 if (IS_ERR(dentry)) ··· 205 127 if (unlikely(!inode)) 206 128 return eventfs_failed_creating(dentry); 207 129 208 - inode->i_mode = mode; 130 + /* If the user updated the directory's attributes, use them */ 131 + update_inode_attr(inode, attr, mode); 132 + 133 + inode->i_op = &eventfs_file_inode_operations; 209 134 inode->i_fop = fop; 210 135 inode->i_private = data; 211 136 ··· 221 140 222 141 /** 223 142 * create_dir - create a dir in the tracefs filesystem 224 - * @name: the name of the file to create. 143 + * @ei: the eventfs_inode that represents the directory to create 225 144 * @parent: parent dentry for this file. 226 - * @data: something that the caller will want to get to later on. 227 145 * 228 - * This is the basic "create a dir" function for eventfs. It allows for a 229 - * wide range of flexibility in creating a dir. 230 - * 231 - * This function will return a pointer to a dentry if it succeeds. This 232 - * pointer must be passed to the tracefs_remove() function when the file is 233 - * to be removed (no automatic cleanup happens if your module is unloaded, 234 - * you are responsible here.) If an error occurs, %NULL will be returned. 235 - * 236 - * If tracefs is not enabled in the kernel, the value -%ENODEV will be 237 - * returned. 146 + * This function will create a dentry for a directory represented by 147 + * a eventfs_inode. 
238 148 */ 239 - static struct dentry *create_dir(const char *name, struct dentry *parent, void *data) 149 + static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent) 240 150 { 241 151 struct tracefs_inode *ti; 242 152 struct dentry *dentry; 243 153 struct inode *inode; 244 154 245 - dentry = eventfs_start_creating(name, parent); 155 + dentry = eventfs_start_creating(ei->name, parent); 246 156 if (IS_ERR(dentry)) 247 157 return dentry; 248 158 ··· 241 169 if (unlikely(!inode)) 242 170 return eventfs_failed_creating(dentry); 243 171 244 - inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; 172 + /* If the user updated the directory's attributes, use them */ 173 + update_inode_attr(inode, &ei->attr, S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO); 174 + 245 175 inode->i_op = &eventfs_root_dir_inode_operations; 246 176 inode->i_fop = &eventfs_file_operations; 247 - inode->i_private = data; 248 177 249 178 ti = get_tracefs(inode); 250 179 ti->flags |= TRACEFS_EVENT_INODE; ··· 257 184 return eventfs_end_creating(dentry); 258 185 } 259 186 260 - /** 261 - * eventfs_set_ef_status_free - set the ef->status to free 262 - * @ti: the tracefs_inode of the dentry 263 - * @dentry: dentry who's status to be freed 264 - * 265 - * eventfs_set_ef_status_free will be called if no more 266 - * references remain 267 - */ 268 - void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry) 187 + static void free_ei(struct eventfs_inode *ei) 269 188 { 270 - struct tracefs_inode *ti_parent; 189 + kfree_const(ei->name); 190 + kfree(ei->d_children); 191 + kfree(ei->entry_attrs); 192 + kfree(ei); 193 + } 194 + 195 + /** 196 + * eventfs_set_ei_status_free - remove the dentry reference from an eventfs_inode 197 + * @ti: the tracefs_inode of the dentry 198 + * @dentry: dentry which has the reference to remove. 199 + * 200 + * Remove the association between a dentry from an eventfs_inode. 
201 + */ 202 + void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) 203 + { 271 204 struct eventfs_inode *ei; 272 - struct eventfs_file *ef, *tmp; 273 - 274 - /* The top level events directory may be freed by this */ 275 - if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) { 276 - LIST_HEAD(ef_del_list); 277 - 278 - mutex_lock(&eventfs_mutex); 279 - 280 - ei = ti->private; 281 - 282 - /* Record all the top level files */ 283 - list_for_each_entry_srcu(ef, &ei->e_top_files, list, 284 - lockdep_is_held(&eventfs_mutex)) { 285 - list_add_tail(&ef->del_list, &ef_del_list); 286 - } 287 - 288 - /* Nothing should access this, but just in case! */ 289 - ti->private = NULL; 290 - 291 - mutex_unlock(&eventfs_mutex); 292 - 293 - /* Now safely free the top level files and their children */ 294 - list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) { 295 - list_del(&ef->del_list); 296 - eventfs_remove(ef); 297 - } 298 - 299 - kfree(ei); 300 - return; 301 - } 205 + int i; 302 206 303 207 mutex_lock(&eventfs_mutex); 304 208 305 - ti_parent = get_tracefs(dentry->d_parent->d_inode); 306 - if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE)) 209 + ei = dentry->d_fsdata; 210 + if (!ei) 307 211 goto out; 308 212 309 - ef = dentry->d_fsdata; 310 - if (!ef) 311 - goto out; 312 - 313 - /* 314 - * If ef was freed, then the LSB bit is set for d_fsdata. 315 - * But this should not happen, as it should still have a 316 - * ref count that prevents it. Warn in case it does. 
317 - */ 318 - if (WARN_ON_ONCE((unsigned long)ef & 1)) 319 - goto out; 213 + /* This could belong to one of the files of the ei */ 214 + if (ei->dentry != dentry) { 215 + for (i = 0; i < ei->nr_entries; i++) { 216 + if (ei->d_children[i] == dentry) 217 + break; 218 + } 219 + if (WARN_ON_ONCE(i == ei->nr_entries)) 220 + goto out; 221 + ei->d_children[i] = NULL; 222 + } else if (ei->is_freed) { 223 + free_ei(ei); 224 + } else { 225 + ei->dentry = NULL; 226 + } 320 227 321 228 dentry->d_fsdata = NULL; 322 - ef->dentry = NULL; 323 - out: 229 + out: 324 230 mutex_unlock(&eventfs_mutex); 325 231 } 326 232 327 233 /** 328 - * eventfs_post_create_dir - post create dir routine 329 - * @ef: eventfs_file of recently created dir 234 + * create_file_dentry - create a dentry for a file of an eventfs_inode 235 + * @ei: the eventfs_inode that the file will be created under 236 + * @idx: the index into the d_children[] of the @ei 237 + * @parent: The parent dentry of the created file. 238 + * @name: The name of the file to create 239 + * @mode: The mode of the file. 240 + * @data: The data to use to set the inode of the file with on open() 241 + * @fops: The fops of the file to be created. 242 + * @lookup: If called by the lookup routine, in which case, dput() the created dentry. 
330 243 * 331 - * Map the meta-data of files within an eventfs dir to their parent dentry 332 - */ 333 - static void eventfs_post_create_dir(struct eventfs_file *ef) 334 - { 335 - struct eventfs_file *ef_child; 336 - struct tracefs_inode *ti; 337 - 338 - /* srcu lock already held */ 339 - /* fill parent-child relation */ 340 - list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, 341 - srcu_read_lock_held(&eventfs_srcu)) { 342 - ef_child->d_parent = ef->dentry; 343 - } 344 - 345 - ti = get_tracefs(ef->dentry->d_inode); 346 - ti->private = ef->ei; 347 - } 348 - 349 - /** 350 - * create_dentry - helper function to create dentry 351 - * @ef: eventfs_file of file or directory to create 352 - * @parent: parent dentry 353 - * @lookup: true if called from lookup routine 354 - * 355 - * Used to create a dentry for file/dir, executes post dentry creation routine 244 + * Create a dentry for a file of an eventfs_inode @ei and place it into the 245 + * address located at @e_dentry. If the @e_dentry already has a dentry, then 246 + * just do a dget() on it and return. Otherwise create the dentry and attach it. 
356 247 */ 357 248 static struct dentry * 358 - create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup) 249 + create_file_dentry(struct eventfs_inode *ei, int idx, 250 + struct dentry *parent, const char *name, umode_t mode, void *data, 251 + const struct file_operations *fops, bool lookup) 359 252 { 360 - bool invalidate = false; 253 + struct eventfs_attr *attr = NULL; 254 + struct dentry **e_dentry = &ei->d_children[idx]; 361 255 struct dentry *dentry; 256 + bool invalidate = false; 362 257 363 258 mutex_lock(&eventfs_mutex); 364 - if (ef->is_freed) { 259 + if (ei->is_freed) { 365 260 mutex_unlock(&eventfs_mutex); 366 261 return NULL; 367 262 } 368 - if (ef->dentry) { 369 - dentry = ef->dentry; 370 - /* On dir open, up the ref count */ 263 + /* If the e_dentry already has a dentry, use it */ 264 + if (*e_dentry) { 265 + /* lookup does not need to up the ref count */ 266 + if (!lookup) 267 + dget(*e_dentry); 268 + mutex_unlock(&eventfs_mutex); 269 + return *e_dentry; 270 + } 271 + 272 + /* ei->entry_attrs are protected by SRCU */ 273 + if (ei->entry_attrs) 274 + attr = &ei->entry_attrs[idx]; 275 + 276 + mutex_unlock(&eventfs_mutex); 277 + 278 + /* The lookup already has the parent->d_inode locked */ 279 + if (!lookup) 280 + inode_lock(parent->d_inode); 281 + 282 + dentry = create_file(name, mode, attr, parent, data, fops); 283 + 284 + if (!lookup) 285 + inode_unlock(parent->d_inode); 286 + 287 + mutex_lock(&eventfs_mutex); 288 + 289 + if (IS_ERR_OR_NULL(dentry)) { 290 + /* 291 + * When the mutex was released, something else could have 292 + * created the dentry for this e_dentry. In which case 293 + * use that one. 294 + * 295 + * Note, with the mutex held, the e_dentry cannot have content 296 + * and the ei->is_freed be true at the same time. 
297 + */ 298 + dentry = *e_dentry; 299 + if (WARN_ON_ONCE(dentry && ei->is_freed)) 300 + dentry = NULL; 301 + /* The lookup does not need to up the dentry refcount */ 302 + if (dentry && !lookup) 303 + dget(dentry); 304 + mutex_unlock(&eventfs_mutex); 305 + return dentry; 306 + } 307 + 308 + if (!*e_dentry && !ei->is_freed) { 309 + *e_dentry = dentry; 310 + dentry->d_fsdata = ei; 311 + } else { 312 + /* 313 + * Should never happen unless we get here due to being freed. 314 + * Otherwise it means two dentries exist with the same name. 315 + */ 316 + WARN_ON_ONCE(!ei->is_freed); 317 + invalidate = true; 318 + } 319 + mutex_unlock(&eventfs_mutex); 320 + 321 + if (invalidate) 322 + d_invalidate(dentry); 323 + 324 + if (lookup || invalidate) 325 + dput(dentry); 326 + 327 + return invalidate ? NULL : dentry; 328 + } 329 + 330 + /** 331 + * eventfs_post_create_dir - post create dir routine 332 + * @ei: eventfs_inode of recently created dir 333 + * 334 + * Map the meta-data of files within an eventfs dir to their parent dentry 335 + */ 336 + static void eventfs_post_create_dir(struct eventfs_inode *ei) 337 + { 338 + struct eventfs_inode *ei_child; 339 + struct tracefs_inode *ti; 340 + 341 + lockdep_assert_held(&eventfs_mutex); 342 + 343 + /* srcu lock already held */ 344 + /* fill parent-child relation */ 345 + list_for_each_entry_srcu(ei_child, &ei->children, list, 346 + srcu_read_lock_held(&eventfs_srcu)) { 347 + ei_child->d_parent = ei->dentry; 348 + } 349 + 350 + ti = get_tracefs(ei->dentry->d_inode); 351 + ti->private = ei; 352 + } 353 + 354 + /** 355 + * create_dir_dentry - Create a directory dentry for the eventfs_inode 356 + * @pei: The eventfs_inode parent of ei. 357 + * @ei: The eventfs_inode to create the directory for 358 + * @parent: The dentry of the parent of this directory 359 + * @lookup: True if this is called by the lookup code 360 + * 361 + * This creates and attaches a directory dentry to the eventfs_inode @ei. 
362 + */ 363 + static struct dentry * 364 + create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, 365 + struct dentry *parent, bool lookup) 366 + { 367 + bool invalidate = false; 368 + struct dentry *dentry = NULL; 369 + 370 + mutex_lock(&eventfs_mutex); 371 + if (pei->is_freed || ei->is_freed) { 372 + mutex_unlock(&eventfs_mutex); 373 + return NULL; 374 + } 375 + if (ei->dentry) { 376 + /* If the dentry already has a dentry, use it */ 377 + dentry = ei->dentry; 378 + /* lookup does not need to up the ref count */ 371 379 if (!lookup) 372 380 dget(dentry); 373 381 mutex_unlock(&eventfs_mutex); ··· 456 302 } 457 303 mutex_unlock(&eventfs_mutex); 458 304 305 + /* The lookup already has the parent->d_inode locked */ 459 306 if (!lookup) 460 307 inode_lock(parent->d_inode); 461 308 462 - if (ef->ei) 463 - dentry = create_dir(ef->name, parent, ef->data); 464 - else 465 - dentry = create_file(ef->name, ef->mode, parent, 466 - ef->data, ef->fop); 309 + dentry = create_dir(ei, parent); 467 310 468 311 if (!lookup) 469 312 inode_unlock(parent->d_inode); 470 313 471 314 mutex_lock(&eventfs_mutex); 472 - if (IS_ERR_OR_NULL(dentry)) { 473 - /* If the ef was already updated get it */ 474 - dentry = ef->dentry; 315 + 316 + if (IS_ERR_OR_NULL(dentry) && !ei->is_freed) { 317 + /* 318 + * When the mutex was released, something else could have 319 + * created the dentry for this e_dentry. In which case 320 + * use that one. 321 + * 322 + * Note, with the mutex held, the e_dentry cannot have content 323 + * and the ei->is_freed be true at the same time. 
324 + */ 325 + dentry = ei->dentry; 475 326 if (dentry && !lookup) 476 327 dget(dentry); 477 328 mutex_unlock(&eventfs_mutex); 478 329 return dentry; 479 330 } 480 331 481 - if (!ef->dentry && !ef->is_freed) { 482 - ef->dentry = dentry; 483 - if (ef->ei) 484 - eventfs_post_create_dir(ef); 485 - dentry->d_fsdata = ef; 332 + if (!ei->dentry && !ei->is_freed) { 333 + ei->dentry = dentry; 334 + eventfs_post_create_dir(ei); 335 + dentry->d_fsdata = ei; 486 336 } else { 487 - /* A race here, should try again (unless freed) */ 488 - invalidate = true; 489 - 490 337 /* 491 338 * Should never happen unless we get here due to being freed. 492 339 * Otherwise it means two dentries exist with the same name. 493 340 */ 494 - WARN_ON_ONCE(!ef->is_freed); 341 + WARN_ON_ONCE(!ei->is_freed); 342 + invalidate = true; 495 343 } 496 344 mutex_unlock(&eventfs_mutex); 497 345 if (invalidate) ··· 505 349 return invalidate ? NULL : dentry; 506 350 } 507 351 508 - static bool match_event_file(struct eventfs_file *ef, const char *name) 509 - { 510 - bool ret; 511 - 512 - mutex_lock(&eventfs_mutex); 513 - ret = !ef->is_freed && strcmp(ef->name, name) == 0; 514 - mutex_unlock(&eventfs_mutex); 515 - 516 - return ret; 517 - } 518 - 519 352 /** 520 353 * eventfs_root_lookup - lookup routine to create file/dir 521 354 * @dir: in which a lookup is being done 522 355 * @dentry: file/dir dentry 523 - * @flags: to pass as flags parameter to simple lookup 356 + * @flags: Just passed to simple_lookup() 524 357 * 525 - * Used to create a dynamic file/dir within @dir. Use the eventfs_inode 526 - * list of meta data to find the information needed to create the file/dir. 
358 + * Used to create dynamic file/dir with-in @dir, search with-in @ei 359 + * list, if @dentry found go ahead and create the file/dir 527 360 */ 361 + 528 362 static struct dentry *eventfs_root_lookup(struct inode *dir, 529 363 struct dentry *dentry, 530 364 unsigned int flags) 531 365 { 366 + const struct file_operations *fops; 367 + const struct eventfs_entry *entry; 368 + struct eventfs_inode *ei_child; 532 369 struct tracefs_inode *ti; 533 370 struct eventfs_inode *ei; 534 - struct eventfs_file *ef; 371 + struct dentry *ei_dentry = NULL; 535 372 struct dentry *ret = NULL; 373 + const char *name = dentry->d_name.name; 374 + bool created = false; 375 + umode_t mode; 376 + void *data; 536 377 int idx; 378 + int i; 379 + int r; 537 380 538 381 ti = get_tracefs(dir); 539 382 if (!(ti->flags & TRACEFS_EVENT_INODE)) 540 383 return NULL; 541 384 542 - ei = ti->private; 385 + /* Grab srcu to prevent the ei from going away */ 543 386 idx = srcu_read_lock(&eventfs_srcu); 544 - list_for_each_entry_srcu(ef, &ei->e_top_files, list, 387 + 388 + /* 389 + * Grab the eventfs_mutex to consistent value from ti->private. 
390 + * This s 391 + */ 392 + mutex_lock(&eventfs_mutex); 393 + ei = READ_ONCE(ti->private); 394 + if (ei && !ei->is_freed) 395 + ei_dentry = READ_ONCE(ei->dentry); 396 + mutex_unlock(&eventfs_mutex); 397 + 398 + if (!ei || !ei_dentry) 399 + goto out; 400 + 401 + data = ei->data; 402 + 403 + list_for_each_entry_srcu(ei_child, &ei->children, list, 545 404 srcu_read_lock_held(&eventfs_srcu)) { 546 - if (!match_event_file(ef, dentry->d_name.name)) 405 + if (strcmp(ei_child->name, name) != 0) 547 406 continue; 548 407 ret = simple_lookup(dir, dentry, flags); 549 - create_dentry(ef, ef->d_parent, true); 408 + create_dir_dentry(ei, ei_child, ei_dentry, true); 409 + created = true; 550 410 break; 551 411 } 412 + 413 + if (created) 414 + goto out; 415 + 416 + for (i = 0; i < ei->nr_entries; i++) { 417 + entry = &ei->entries[i]; 418 + if (strcmp(name, entry->name) == 0) { 419 + void *cdata = data; 420 + mutex_lock(&eventfs_mutex); 421 + /* If ei->is_freed, then the event itself may be too */ 422 + if (!ei->is_freed) 423 + r = entry->callback(name, &mode, &cdata, &fops); 424 + else 425 + r = -1; 426 + mutex_unlock(&eventfs_mutex); 427 + if (r <= 0) 428 + continue; 429 + ret = simple_lookup(dir, dentry, flags); 430 + create_file_dentry(ei, i, ei_dentry, name, mode, cdata, 431 + fops, true); 432 + break; 433 + } 434 + } 435 + out: 552 436 srcu_read_unlock(&eventfs_srcu, idx); 553 437 return ret; 554 438 } ··· 628 432 return dcache_dir_close(inode, file); 629 433 } 630 434 435 + static int add_dentries(struct dentry ***dentries, struct dentry *d, int cnt) 436 + { 437 + struct dentry **tmp; 438 + 439 + tmp = krealloc(*dentries, sizeof(d) * (cnt + 2), GFP_KERNEL); 440 + if (!tmp) 441 + return -1; 442 + tmp[cnt] = d; 443 + tmp[cnt + 1] = NULL; 444 + *dentries = tmp; 445 + return 0; 446 + } 447 + 631 448 /** 632 449 * dcache_dir_open_wrapper - eventfs open wrapper 633 450 * @inode: not used 634 - * @file: dir to be opened (to create its child) 451 + * @file: dir to be opened (to 
create it's children) 635 452 * 636 - * Used to dynamically create the file/dir within @file. @file is really a 637 - * directory and all the files/dirs of the children within @file will be 638 - * created. If any of the files/dirs have already been created, their 639 - * reference count will be incremented. 453 + * Used to dynamic create file/dir with-in @file, all the 454 + * file/dir will be created. If already created then references 455 + * will be increased 640 456 */ 641 457 static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) 642 458 { 459 + const struct file_operations *fops; 460 + const struct eventfs_entry *entry; 461 + struct eventfs_inode *ei_child; 643 462 struct tracefs_inode *ti; 644 463 struct eventfs_inode *ei; 645 - struct eventfs_file *ef; 646 464 struct dentry_list *dlist; 647 465 struct dentry **dentries = NULL; 648 - struct dentry *dentry = file_dentry(file); 466 + struct dentry *parent = file_dentry(file); 649 467 struct dentry *d; 650 468 struct inode *f_inode = file_inode(file); 469 + const char *name = parent->d_name.name; 470 + umode_t mode; 471 + void *data; 651 472 int cnt = 0; 652 473 int idx; 653 474 int ret; 475 + int i; 476 + int r; 654 477 655 478 ti = get_tracefs(f_inode); 656 479 if (!(ti->flags & TRACEFS_EVENT_INODE)) ··· 678 463 if (WARN_ON_ONCE(file->private_data)) 679 464 return -EINVAL; 680 465 681 - dlist = kmalloc(sizeof(*dlist), GFP_KERNEL); 682 - if (!dlist) 683 - return -ENOMEM; 684 - 685 - ei = ti->private; 686 466 idx = srcu_read_lock(&eventfs_srcu); 687 - list_for_each_entry_srcu(ef, &ei->e_top_files, list, 688 - srcu_read_lock_held(&eventfs_srcu)) { 689 - d = create_dentry(ef, dentry, false); 690 - if (d) { 691 - struct dentry **tmp; 692 467 693 - tmp = krealloc(dentries, sizeof(d) * (cnt + 2), GFP_KERNEL); 694 - if (!tmp) 468 + mutex_lock(&eventfs_mutex); 469 + ei = READ_ONCE(ti->private); 470 + mutex_unlock(&eventfs_mutex); 471 + 472 + if (!ei) { 473 + srcu_read_unlock(&eventfs_srcu, idx); 
+		return -EINVAL;
+	}
+
+
+	data = ei->data;
+
+	dlist = kmalloc(sizeof(*dlist), GFP_KERNEL);
+	if (!dlist) {
+		srcu_read_unlock(&eventfs_srcu, idx);
+		return -ENOMEM;
+	}
+
+	list_for_each_entry_srcu(ei_child, &ei->children, list,
+				 srcu_read_lock_held(&eventfs_srcu)) {
+		d = create_dir_dentry(ei, ei_child, parent, false);
+		if (d) {
+			ret = add_dentries(&dentries, d, cnt);
+			if (ret < 0)
 				break;
-			tmp[cnt] = d;
-			tmp[cnt + 1] = NULL;
 			cnt++;
-			dentries = tmp;
+		}
+	}
+
+	for (i = 0; i < ei->nr_entries; i++) {
+		void *cdata = data;
+		entry = &ei->entries[i];
+		name = entry->name;
+		mutex_lock(&eventfs_mutex);
+		/* If ei->is_freed, then the event itself may be too */
+		if (!ei->is_freed)
+			r = entry->callback(name, &mode, &cdata, &fops);
+		else
+			r = -1;
+		mutex_unlock(&eventfs_mutex);
+		if (r <= 0)
+			continue;
+		d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, false);
+		if (d) {
+			ret = add_dentries(&dentries, d, cnt);
+			if (ret < 0)
+				break;
+			cnt++;
 		}
 	}
 	srcu_read_unlock(&eventfs_srcu, idx);
···
 }

 /**
- * eventfs_prepare_ef - helper function to prepare eventfs_file
- * @name: the name of the file/directory to create.
- * @mode: the permission that the file should have.
- * @fop: struct file_operations that should be used for this file/directory.
- * @iop: struct inode_operations that should be used for this file/directory.
- * @data: something that the caller will want to get to later on. The
- *        inode.i_private pointer will point to this value on the open() call.
+ * eventfs_create_dir - Create the eventfs_inode for this directory
+ * @name: The name of the directory to create.
+ * @parent: The eventfs_inode of the parent directory.
+ * @entries: A list of entries that represent the files under this directory
+ * @size: The number of @entries
+ * @data: The default data to pass to the files (an entry may override it).
  *
- * This function allocates and fills the eventfs_file structure.
+ * This function creates the descriptor to represent a directory in the
+ * eventfs. This descriptor is an eventfs_inode, and it is returned to be
+ * used to create other children underneath.
+ *
+ * The @entries is an array of eventfs_entry structures which has:
+ *	const char		 *name
+ *	eventfs_callback	callback;
+ *
+ * The name is the name of the file, and the callback is a pointer to a function
+ * that will be called when the file is reference (either by lookup or by
+ * reading a directory). The callback is of the prototype:
+ *
+ *    int callback(const char *name, umode_t *mode, void **data,
+ *		   const struct file_operations **fops);
+ *
+ * When a file needs to be created, this callback will be called with
+ *   name = the name of the file being created (so that the same callback
+ *          may be used for multiple files).
+ *   mode = a place to set the file's mode
+ *   data = A pointer to @data, and the callback may replace it, which will
+ *         cause the file created to pass the new data to the open() call.
+ *   fops = the fops to use for the created file.
+ *
+ * NB. @callback is called while holding internal locks of the eventfs
+ *     system. The callback must not call any code that might also call into
+ *     the tracefs or eventfs system or it will risk creating a deadlock.
  */
-static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode,
-					const struct file_operations *fop,
-					const struct inode_operations *iop,
-					void *data)
+struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode *parent,
+					 const struct eventfs_entry *entries,
+					 int size, void *data)
 {
-	struct eventfs_file *ef;
+	struct eventfs_inode *ei;

-	ef = kzalloc(sizeof(*ef), GFP_KERNEL);
-	if (!ef)
+	if (!parent)
+		return ERR_PTR(-EINVAL);
+
+	ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+	if (!ei)
 		return ERR_PTR(-ENOMEM);

-	ef->name = kstrdup(name, GFP_KERNEL);
-	if (!ef->name) {
-		kfree(ef);
+	ei->name = kstrdup_const(name, GFP_KERNEL);
+	if (!ei->name) {
+		kfree(ei);
 		return ERR_PTR(-ENOMEM);
 	}

-	if (S_ISDIR(mode)) {
-		ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL);
-		if (!ef->ei) {
-			kfree(ef->name);
-			kfree(ef);
+	if (size) {
+		ei->d_children = kzalloc(sizeof(*ei->d_children) * size, GFP_KERNEL);
+		if (!ei->d_children) {
+			kfree_const(ei->name);
+			kfree(ei);
 			return ERR_PTR(-ENOMEM);
 		}
-		INIT_LIST_HEAD(&ef->ei->e_top_files);
-	} else {
-		ef->ei = NULL;
 	}

-	ef->iop = iop;
-	ef->fop = fop;
-	ef->mode = mode;
-	ef->data = data;
-	return ef;
+	ei->entries = entries;
+	ei->nr_entries = size;
+	ei->data = data;
+	INIT_LIST_HEAD(&ei->children);
+	INIT_LIST_HEAD(&ei->list);
+
+	mutex_lock(&eventfs_mutex);
+	if (!parent->is_freed) {
+		list_add_tail(&ei->list, &parent->children);
+		ei->d_parent = parent->dentry;
+	}
+	mutex_unlock(&eventfs_mutex);
+
+	/* Was the parent freed? */
+	if (list_empty(&ei->list)) {
+		free_ei(ei);
+		ei = NULL;
+	}
+	return ei;
 }

 /**
- * eventfs_create_events_dir - create the trace event structure
- * @name: the name of the directory to create.
- * @parent: parent dentry for this file.  This should be a directory dentry
- *          if set.  If this parameter is NULL, then the directory will be
- *          created in the root of the tracefs filesystem.
+ * eventfs_create_events_dir - create the top level events directory
+ * @name: The name of the top level directory to create.
+ * @parent: Parent dentry for this file in the tracefs directory.
+ * @entries: A list of entries that represent the files under this directory
+ * @size: The number of @entries
+ * @data: The default data to pass to the files (an entry may override it).
  *
  * This function creates the top of the trace event directory.
+ *
+ * See eventfs_create_dir() for use of @entries.
  */
-struct dentry *eventfs_create_events_dir(const char *name,
-					 struct dentry *parent)
+struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry *parent,
+						const struct eventfs_entry *entries,
+						int size, void *data)
 {
 	struct dentry *dentry = tracefs_start_creating(name, parent);
 	struct eventfs_inode *ei;
···
 		return NULL;

 	if (IS_ERR(dentry))
-		return dentry;
+		return ERR_CAST(dentry);

 	ei = kzalloc(sizeof(*ei), GFP_KERNEL);
 	if (!ei)
-		return ERR_PTR(-ENOMEM);
+		goto fail_ei;
+
 	inode = tracefs_get_inode(dentry->d_sb);
-	if (unlikely(!inode)) {
-		kfree(ei);
-		tracefs_failed_creating(dentry);
-		return ERR_PTR(-ENOMEM);
+	if (unlikely(!inode))
+		goto fail;
+
+	if (size) {
+		ei->d_children = kzalloc(sizeof(*ei->d_children) * size, GFP_KERNEL);
+		if (!ei->d_children)
+			goto fail;
 	}

-	INIT_LIST_HEAD(&ei->e_top_files);
+	ei->dentry = dentry;
+	ei->entries = entries;
+	ei->nr_entries = size;
+	ei->data = data;
+	ei->name = kstrdup_const(name, GFP_KERNEL);
+	if (!ei->name)
+		goto fail;
+
+	INIT_LIST_HEAD(&ei->children);
+	INIT_LIST_HEAD(&ei->list);

 	ti = get_tracefs(inode);
 	ti->flags |= TRACEFS_EVENT_INODE | TRACEFS_EVENT_TOP_INODE;
···
 	inode->i_op = &eventfs_root_dir_inode_operations;
 	inode->i_fop = &eventfs_file_operations;

+	dentry->d_fsdata = ei;
+
 	/* directory inodes start off with i_nlink == 2 (for "." entry) */
 	inc_nlink(inode);
 	d_instantiate(dentry, inode);
 	inc_nlink(dentry->d_parent->d_inode);
 	fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
-	return tracefs_end_creating(dentry);
+	tracefs_end_creating(dentry);
+
+	return ei;
+
+ fail:
+	kfree(ei->d_children);
+	kfree(ei);
+ fail_ei:
+	tracefs_failed_creating(dentry);
+	return ERR_PTR(-ENOMEM);
 }

-/**
- * eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later
- * @name: the name of the file to create.
- * @parent: parent dentry for this dir.
- *
- * This function adds eventfs subsystem dir to list.
- * And all these dirs are created on the fly when they are looked up,
- * and the dentry and inodes will be removed when they are done.
- */
-struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
-					       struct dentry *parent)
+static LLIST_HEAD(free_list);
+
+static void eventfs_workfn(struct work_struct *work)
 {
-	struct tracefs_inode *ti_parent;
-	struct eventfs_inode *ei_parent;
-	struct eventfs_file *ef;
+	struct eventfs_inode *ei, *tmp;
+	struct llist_node *llnode;

-	if (security_locked_down(LOCKDOWN_TRACEFS))
-		return NULL;
-
-	if (!parent)
-		return ERR_PTR(-EINVAL);
-
-	ti_parent = get_tracefs(parent->d_inode);
-	ei_parent = ti_parent->private;
-
-	ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
-	if (IS_ERR(ef))
-		return ef;
-
-	mutex_lock(&eventfs_mutex);
-	list_add_tail(&ef->list, &ei_parent->e_top_files);
-	ef->d_parent = parent;
-	mutex_unlock(&eventfs_mutex);
-	return ef;
+	llnode = llist_del_all(&free_list);
+	llist_for_each_entry_safe(ei, tmp, llnode, llist) {
+		/* This dput() matches the dget() from unhook_dentry() */
+		for (int i = 0; i < ei->nr_entries; i++) {
+			if (ei->d_children[i])
+				dput(ei->d_children[i]);
+		}
+		/* This should only get here if it had a dentry */
+		if (!WARN_ON_ONCE(!ei->dentry))
+			dput(ei->dentry);
+	}
 }

-/**
- * eventfs_add_dir - add eventfs dir to list to create later
- * @name: the name of the file to create.
- * @ef_parent: parent eventfs_file for this dir.
- *
- * This function adds eventfs dir to list.
- * And all these dirs are created on the fly when they are looked up,
- * and the dentry and inodes will be removed when they are done.
- */
-struct eventfs_file *eventfs_add_dir(const char *name,
-				     struct eventfs_file *ef_parent)
+static DECLARE_WORK(eventfs_work, eventfs_workfn);
+
+static void free_rcu_ei(struct rcu_head *head)
 {
-	struct eventfs_file *ef;
+	struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu);

-	if (security_locked_down(LOCKDOWN_TRACEFS))
-		return NULL;
+	if (ei->dentry) {
+		/* Do not free the ei until all references of dentry are gone */
+		if (llist_add(&ei->llist, &free_list))
+			queue_work(system_unbound_wq, &eventfs_work);
+		return;
+	}

-	if (!ef_parent)
-		return ERR_PTR(-EINVAL);
+	/* If the ei doesn't have a dentry, neither should its children */
+	for (int i = 0; i < ei->nr_entries; i++) {
+		WARN_ON_ONCE(ei->d_children[i]);
+	}

-	ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
-	if (IS_ERR(ef))
-		return ef;
-
-	mutex_lock(&eventfs_mutex);
-	list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
-	ef->d_parent = ef_parent->dentry;
-	mutex_unlock(&eventfs_mutex);
-	return ef;
+	free_ei(ei);
 }

-/**
- * eventfs_add_events_file - add the data needed to create a file for later reference
- * @name: the name of the file to create.
- * @mode: the permission that the file should have.
- * @parent: parent dentry for this file.
- * @data: something that the caller will want to get to later on.
- * @fop: struct file_operations that should be used for this file.
- *
- * This function is used to add the information needed to create a
- * dentry/inode within the top level events directory. The file created
- * will have the @mode permissions. The @data will be used to fill the
- * inode.i_private when the open() call is done. The dentry and inodes are
- * all created when they are referenced, and removed when they are no
- * longer referenced.
- */
-int eventfs_add_events_file(const char *name, umode_t mode,
-			    struct dentry *parent, void *data,
-			    const struct file_operations *fop)
+static void unhook_dentry(struct dentry *dentry)
 {
-	struct tracefs_inode *ti;
-	struct eventfs_inode *ei;
-	struct eventfs_file *ef;
+	if (!dentry)
+		return;
+	/*
+	 * Need to add a reference to the dentry that is expected by
+	 * simple_recursive_removal(), which will include a dput().
+	 */
+	dget(dentry);

-	if (security_locked_down(LOCKDOWN_TRACEFS))
-		return -ENODEV;
-
-	if (!parent)
-		return -EINVAL;
-
-	if (!(mode & S_IFMT))
-		mode |= S_IFREG;
-
-	if (!parent->d_inode)
-		return -EINVAL;
-
-	ti = get_tracefs(parent->d_inode);
-	if (!(ti->flags & TRACEFS_EVENT_INODE))
-		return -EINVAL;
-
-	ei = ti->private;
-	ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
-
-	if (IS_ERR(ef))
-		return -ENOMEM;
-
-	mutex_lock(&eventfs_mutex);
-	list_add_tail(&ef->list, &ei->e_top_files);
-	ef->d_parent = parent;
-	mutex_unlock(&eventfs_mutex);
-	return 0;
-}
-
-/**
- * eventfs_add_file - add eventfs file to list to create later
- * @name: the name of the file to create.
- * @mode: the permission that the file should have.
- * @ef_parent: parent eventfs_file for this file.
- * @data: something that the caller will want to get to later on.
- * @fop: struct file_operations that should be used for this file.
- *
- * This function is used to add the information needed to create a
- * file within a subdirectory of the events directory. The file created
- * will have the @mode permissions. The @data will be used to fill the
- * inode.i_private when the open() call is done. The dentry and inodes are
- * all created when they are referenced, and removed when they are no
- * longer referenced.
- */
-int eventfs_add_file(const char *name, umode_t mode,
-		     struct eventfs_file *ef_parent,
-		     void *data,
-		     const struct file_operations *fop)
-{
-	struct eventfs_file *ef;
-
-	if (security_locked_down(LOCKDOWN_TRACEFS))
-		return -ENODEV;
-
-	if (!ef_parent)
-		return -EINVAL;
-
-	if (!(mode & S_IFMT))
-		mode |= S_IFREG;
-
-	ef = eventfs_prepare_ef(name, mode, fop, NULL, data);
-	if (IS_ERR(ef))
-		return -ENOMEM;
-
-	mutex_lock(&eventfs_mutex);
-	list_add_tail(&ef->list, &ef_parent->ei->e_top_files);
-	ef->d_parent = ef_parent->dentry;
-	mutex_unlock(&eventfs_mutex);
-	return 0;
-}
-
-static void free_ef(struct rcu_head *head)
-{
-	struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
-
-	kfree(ef->name);
-	kfree(ef->ei);
-	kfree(ef);
+	/*
+	 * Also add a reference for the dput() in eventfs_workfn().
+	 * That is required as that dput() will free the ei after
+	 * the SRCU grace period is over.
+	 */
+	dget(dentry);
 }

 /**
  * eventfs_remove_rec - remove eventfs dir or file from list
- * @ef: eventfs_file to be removed.
- * @head: to create list of eventfs_file to be deleted
- * @level: to check recursion depth
+ * @ei: eventfs_inode to be removed.
+ * @level: prevent recursion from going more than 3 levels deep.
  *
- * The helper function eventfs_remove_rec() is used to clean up and free the
- * associated data from eventfs for both of the added functions.
+ * This function recursively removes eventfs_inodes which
+ * contains info of files and/or directories.
  */
-static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
+static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
 {
-	struct eventfs_file *ef_child;
+	struct eventfs_inode *ei_child;

-	if (!ef)
+	if (!ei)
 		return;
 	/*
 	 * Check recursion depth. It should never be greater than 3:
···
 	if (WARN_ON_ONCE(level > 3))
 		return;

-	if (ef->ei) {
-		/* search for nested folders or files */
-		list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
-					 lockdep_is_held(&eventfs_mutex)) {
-			eventfs_remove_rec(ef_child, head, level + 1);
+	/* search for nested folders or files */
+	list_for_each_entry_srcu(ei_child, &ei->children, list,
+				 lockdep_is_held(&eventfs_mutex)) {
+		/* Children only have dentry if parent does */
+		WARN_ON_ONCE(ei_child->dentry && !ei->dentry);
+		eventfs_remove_rec(ei_child, level + 1);
+	}
+
+
+	ei->is_freed = 1;
+
+	for (int i = 0; i < ei->nr_entries; i++) {
+		if (ei->d_children[i]) {
+			/* Children only have dentry if parent does */
+			WARN_ON_ONCE(!ei->dentry);
+			unhook_dentry(ei->d_children[i]);
 		}
 	}

-	list_del_rcu(&ef->list);
-	list_add_tail(&ef->del_list, head);
+	unhook_dentry(ei->dentry);
+
+	list_del_rcu(&ei->list);
+	call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
 }

 /**
- * eventfs_remove - remove eventfs dir or file from list
- * @ef: eventfs_file to be removed.
+ * eventfs_remove_dir - remove eventfs dir or file from list
+ * @ei: eventfs_inode to be removed.
  *
  * This function acquire the eventfs_mutex lock and call eventfs_remove_rec()
  */
-void eventfs_remove(struct eventfs_file *ef)
+void eventfs_remove_dir(struct eventfs_inode *ei)
 {
-	struct eventfs_file *tmp;
-	LIST_HEAD(ef_del_list);
-	struct dentry *dentry_list = NULL;
 	struct dentry *dentry;

-	if (!ef)
+	if (!ei)
 		return;

 	mutex_lock(&eventfs_mutex);
-	eventfs_remove_rec(ef, &ef_del_list, 0);
-	list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
-		if (ef->dentry) {
-			unsigned long ptr = (unsigned long)dentry_list;
-
-			/* Keep the dentry from being freed yet */
-			dget(ef->dentry);
-
-			/*
-			 * Paranoid: The dget() above should prevent the dentry
-			 * from being freed and calling eventfs_set_ef_status_free().
-			 * But just in case, set the link list LSB pointer to 1
-			 * and have eventfs_set_ef_status_free() check that to
-			 * make sure that if it does happen, it will not think
-			 * the d_fsdata is an event_file.
-			 *
-			 * For this to work, no event_file should be allocated
-			 * on a odd space, as the ef should always be allocated
-			 * to be at least word aligned. Check for that too.
-			 */
-			WARN_ON_ONCE(ptr & 1);
-
-			ef->dentry->d_fsdata = (void *)(ptr | 1);
-			dentry_list = ef->dentry;
-			ef->dentry = NULL;
-		}
-		call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
-	}
+	dentry = ei->dentry;
+	eventfs_remove_rec(ei, 0);
 	mutex_unlock(&eventfs_mutex);

-	while (dentry_list) {
-		unsigned long ptr;
-
-		dentry = dentry_list;
-		ptr = (unsigned long)dentry->d_fsdata & ~1UL;
-		dentry_list = (struct dentry *)ptr;
-		dentry->d_fsdata = NULL;
-		d_invalidate(dentry);
-		mutex_lock(&eventfs_mutex);
-		/* dentry should now have at least a single reference */
-		WARN_ONCE((int)d_count(dentry) < 1,
-			  "dentry %p less than one reference (%d) after invalidate\n",
-			  dentry, d_count(dentry));
-		mutex_unlock(&eventfs_mutex);
-		dput(dentry);
-	}
+	/*
+	 * If any of the ei children has a dentry, then the ei itself
+	 * must have a dentry.
+	 */
+	if (dentry)
+		simple_recursive_removal(dentry, NULL);
 }

 /**
- * eventfs_remove_events_dir - remove eventfs dir or file from list
- * @dentry: events's dentry to be removed.
+ * eventfs_remove_events_dir - remove the top level eventfs directory
+ * @ei: the event_inode returned by eventfs_create_events_dir().
  *
- * This function remove events main directory
+ * This function removes the events main directory
  */
-void eventfs_remove_events_dir(struct dentry *dentry)
+void eventfs_remove_events_dir(struct eventfs_inode *ei)
 {
-	struct tracefs_inode *ti;
+	struct dentry *dentry;

-	if (!dentry || !dentry->d_inode)
-		return;
+	dentry = ei->dentry;
+	eventfs_remove_dir(ei);

-	ti = get_tracefs(dentry->d_inode);
-	if (!ti || !(ti->flags & TRACEFS_EVENT_INODE))
-		return;
-
-	d_invalidate(dentry);
+	/*
+	 * Matches the dget() done by tracefs_start_creating()
+	 * in eventfs_create_events_dir() when it the dentry was
+	 * created. In other words, it's a normal dentry that
+	 * sticks around while the other ei->dentry are created
+	 * and destroyed dynamically.
+	 */
 	dput(dentry);
 }
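The add_dentries() helper introduced in this file grows a NULL-terminated array of pointers one element at a time, reserving cnt + 2 slots so that the terminator always fits. A minimal userspace sketch of that pattern follows. It is an illustration only: add_item() and count_items() are hypothetical names, realloc() stands in for krealloc(), and plain void pointers stand in for struct dentry pointers.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for add_dentries(): append one
 * pointer to a NULL-terminated array, growing it with realloc().
 */
static int add_item(void ***items, void *item, int cnt)
{
	void **tmp;

	assert(cnt >= 0);

	/* cnt existing items + the new item + the NULL terminator */
	tmp = realloc(*items, sizeof(item) * (cnt + 2));
	if (!tmp)
		return -1;	/* *items is untouched and still valid */

	tmp[cnt] = item;
	tmp[cnt + 1] = NULL;
	*items = tmp;
	return 0;
}

/* Count entries by walking to the NULL terminator. */
static int count_items(void **items)
{
	int n = 0;

	while (items && items[n])
		n++;
	return n;
}
```

As in the diff, the caller keeps its own count and increments it only after a successful append, so a failed allocation leaves the existing array usable.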
+1 -1
fs/tracefs/inode.c
···

 	ti = get_tracefs(inode);
 	if (ti && ti->flags & TRACEFS_EVENT_INODE)
-		eventfs_set_ef_status_free(ti, dentry);
+		eventfs_set_ei_status_free(ti, dentry);
 	iput(inode);
 }
+53 -1
fs/tracefs/internal.h
···
 	struct inode vfs_inode;
 };

+/*
+ * struct eventfs_attr - cache the mode and ownership of a eventfs entry
+ * @mode:	saved mode plus flags of what is saved
+ * @uid:	saved uid if changed
+ * @gid:	saved gid if changed
+ */
+struct eventfs_attr {
+	int				mode;
+	kuid_t				uid;
+	kgid_t				gid;
+};
+
+/*
+ * struct eventfs_inode - hold the properties of the eventfs directories.
+ * @list:	link list into the parent directory
+ * @entries:	the array of entries representing the files in the directory
+ * @name:	the name of the directory to create
+ * @children:	link list into the child eventfs_inode
+ * @dentry:     the dentry of the directory
+ * @d_parent:   pointer to the parent's dentry
+ * @d_children: The array of dentries to represent the files when created
+ * @entry_attrs: Saved mode and ownership of the @d_children
+ * @attr:	Saved mode and ownership of eventfs_inode itself
+ * @data:	The private data to pass to the callbacks
+ * @is_freed:	Flag set if the eventfs is on its way to be freed
+ *                Note if is_freed is set, then dentry is corrupted.
+ * @nr_entries: The number of items in @entries
+ */
+struct eventfs_inode {
+	struct list_head		list;
+	const struct eventfs_entry	*entries;
+	const char			*name;
+	struct list_head		children;
+	struct dentry			*dentry; /* Check is_freed to access */
+	struct dentry			*d_parent;
+	struct dentry			**d_children;
+	struct eventfs_attr		*entry_attrs;
+	struct eventfs_attr		attr;
+	void				*data;
+	/*
+	 * Union - used for deletion
+	 * @llist:	for calling dput() if needed after RCU
+	 * @rcu:	eventfs_inode to delete in RCU
+	 */
+	union {
+		struct llist_node	llist;
+		struct rcu_head		rcu;
+	};
+	unsigned int			is_freed:1;
+	unsigned int			nr_entries:31;
+};
+
 static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
 {
 	return container_of(inode, struct tracefs_inode, vfs_inode);
···
 struct dentry *eventfs_start_creating(const char *name, struct dentry *parent);
 struct dentry *eventfs_failed_creating(struct dentry *dentry);
 struct dentry *eventfs_end_creating(struct dentry *dentry);
-void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry);
+void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry);

 #endif /* _TRACEFS_INTERNAL_H */
+19 -9
include/linux/seq_buf.h
···
  * @buffer:	pointer to the buffer
  * @size:	size of the buffer
  * @len:	the amount of data inside the buffer
- * @readpos:	The next position to read in the buffer.
  */
 struct seq_buf {
 	char			*buffer;
 	size_t			size;
 	size_t			len;
-	loff_t			readpos;
 };
+
+#define DECLARE_SEQ_BUF(NAME, SIZE)			\
+	char __ ## NAME ## _buffer[SIZE] = "";		\
+	struct seq_buf NAME = {				\
+		.buffer = &__ ## NAME ## _buffer,	\
+		.size = SIZE,				\
+	}

 static inline void seq_buf_clear(struct seq_buf *s)
 {
 	s->len = 0;
-	s->readpos = 0;
+	if (s->size)
+		s->buffer[0] = '\0';
 }

 static inline void
···

 /*
  * seq_buf have a buffer that might overflow. When this happens
- * the len and size are set to be equal.
+ * len is set to be greater than size.
  */
 static inline bool
 seq_buf_has_overflowed(struct seq_buf *s)
···
 }

 /**
- * seq_buf_terminate - Make sure buffer is nul terminated
- * @s: the seq_buf descriptor to terminate.
+ * seq_buf_str - get %NUL-terminated C string from seq_buf
+ * @s: the seq_buf handle
  *
  * This makes sure that the buffer in @s is nul terminated and
  * safe to read as a string.
···
  *
  * After this function is called, s->buffer is safe to use
  * in string operations.
+ *
+ * Returns @s->buf after making sure it is terminated.
  */
-static inline void seq_buf_terminate(struct seq_buf *s)
+static inline const char *seq_buf_str(struct seq_buf *s)
 {
 	if (WARN_ON(s->size == 0))
-		return;
+		return "";

 	if (seq_buf_buffer_left(s))
 		s->buffer[s->len] = 0;
 	else
 		s->buffer[s->size - 1] = 0;
+
+	return s->buffer;
 }

 /**
···
 int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args);
 extern int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s);
 extern int seq_buf_to_user(struct seq_buf *s, char __user *ubuf,
-			   int cnt);
+			   size_t start, int cnt);
 extern int seq_buf_puts(struct seq_buf *s, const char *str);
 extern int seq_buf_putc(struct seq_buf *s, unsigned char c);
 extern int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len);
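The seq_buf.h changes above do two things: overflow is now signalled by len being greater than size, and seq_buf_str() returns the buffer after terminating it. The userspace sketch below mirrors only those two rules. Everything prefixed with mock_ is hypothetical, and the append helper is an assumption written for this example, not the kernel's seq_buf_puts().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace mock of struct seq_buf without readpos. */
struct seq_buf_mock {
	char *buffer;
	size_t size;
	size_t len;
};

static void mock_init(struct seq_buf_mock *s, char *buf, size_t size)
{
	assert(buf != NULL);
	s->buffer = buf;
	s->size = size;
	s->len = 0;
	if (size)
		buf[0] = '\0';
}

/* Overflow is recorded by pushing len past size. */
static int mock_has_overflowed(const struct seq_buf_mock *s)
{
	return s->len > s->size;
}

static size_t mock_buffer_left(const struct seq_buf_mock *s)
{
	return mock_has_overflowed(s) ? 0 : s->size - s->len;
}

/* Assumed append helper: copy if it fits, otherwise mark overflow. */
static int mock_puts(struct seq_buf_mock *s, const char *str)
{
	size_t n = strlen(str);

	if (n <= mock_buffer_left(s)) {
		memcpy(s->buffer + s->len, str, n);
		s->len += n;
		return 0;
	}
	s->len = s->size + 1;	/* overflow marker: len > size */
	return -1;
}

/* Same termination rule as seq_buf_str() in the diff. */
static const char *mock_str(struct seq_buf_mock *s)
{
	if (s->size == 0)
		return "";
	if (mock_buffer_left(s))
		s->buffer[s->len] = '\0';
	else
		s->buffer[s->size - 1] = '\0';
	return s->buffer;
}
```

Note that when the buffer is exactly full, terminating it overwrites the last stored byte, which is the trade-off the else branch in seq_buf_str() accepts.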
+5 -1
include/linux/trace_events.h
···
 	EVENT_FILE_FL_TRIGGER_COND_BIT,
 	EVENT_FILE_FL_PID_FILTER_BIT,
 	EVENT_FILE_FL_WAS_ENABLED_BIT,
+	EVENT_FILE_FL_FREED_BIT,
 };

 extern struct trace_event_file *trace_get_event_file(const char *instance,
···
  *  TRIGGER_COND  - When set, one or more triggers has an associated filter
  *  PID_FILTER    - When set, the event is filtered based on pid
  *  WAS_ENABLED   - Set when enabled to know to clear trace on module removal
+ *  FREED         - File descriptor is freed, all fields should be considered invalid
  */
 enum {
 	EVENT_FILE_FL_ENABLED		= (1 << EVENT_FILE_FL_ENABLED_BIT),
···
 	EVENT_FILE_FL_TRIGGER_COND	= (1 << EVENT_FILE_FL_TRIGGER_COND_BIT),
 	EVENT_FILE_FL_PID_FILTER	= (1 << EVENT_FILE_FL_PID_FILTER_BIT),
 	EVENT_FILE_FL_WAS_ENABLED	= (1 << EVENT_FILE_FL_WAS_ENABLED_BIT),
+	EVENT_FILE_FL_FREED		= (1 << EVENT_FILE_FL_FREED_BIT),
 };

 struct trace_event_file {
 	struct list_head		list;
 	struct trace_event_call		*event_call;
 	struct event_filter __rcu	*filter;
-	struct eventfs_file		*ef;
+	struct eventfs_inode		*ei;
 	struct trace_array		*tr;
 	struct trace_subsystem_dir	*system;
 	struct list_head		triggers;
···
 	 * caching and such. Which is mostly OK ;-)
 	 */
 	unsigned long		flags;
+	atomic_t		ref;	/* ref count for opened files */
 	atomic_t		sm_ref;	/* soft-mode reference counter */
 	atomic_t		tm_ref;	/* trigger-mode reference counter */
 };
+2
include/linux/trace_seq.h
···
 struct trace_seq {
 	char			buffer[PAGE_SIZE];
 	struct seq_buf		seq;
+	size_t			readpos;
 	int			full;
 };

···
 {
 	seq_buf_init(&s->seq, s->buffer, PAGE_SIZE);
 	s->full = 0;
+	s->readpos = 0;
 }

 /**
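With readpos moved out of seq_buf and into trace_seq, the read cursor belongs to the consumer side only. The kernel/trace/trace.c hunk of this commit shows the copy-out routine using s->readpos; the sketch below reproduces that cursor logic in userspace. The struct reader_mock type and mock_to_buffer() are hypothetical names for this example, and the used field stands in for trace_seq_used().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a trace_seq with its own read cursor. */
struct reader_mock {
	char buffer[32];
	size_t used;		/* bytes written so far */
	size_t readpos;		/* next byte to hand to the reader */
};

/* Copy up to cnt unread bytes out and advance the cursor. */
static int mock_to_buffer(struct reader_mock *s, void *buf, size_t cnt)
{
	size_t len;

	assert(s->used <= sizeof(s->buffer));

	if (s->used <= s->readpos)
		return -EBUSY;	/* nothing left to read */

	len = s->used - s->readpos;
	if (cnt > len)
		cnt = len;
	memcpy(buf, s->buffer + s->readpos, cnt);

	s->readpos += cnt;
	return (int)cnt;
}
```

Repeated calls drain the buffer in pieces; once readpos catches up with used, the function reports -EBUSY, as the kernel routine in the diff does.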
+58 -15
include/linux/tracefs.h
···

 struct eventfs_file;

-struct dentry *eventfs_create_events_dir(const char *name,
-					 struct dentry *parent);
+/**
+ * eventfs_callback - A callback function to create dynamic files in eventfs
+ * @name: The name of the file that is to be created
+ * @mode: return the file mode for the file (RW access, etc)
+ * @data: data to pass to the created file ops
+ * @fops: the file operations of the created file
+ *
+ * The evetnfs files are dynamically created. The struct eventfs_entry array
+ * is passed to eventfs_create_dir() or eventfs_create_events_dir() that will
+ * be used to create the files within those directories. When a lookup
+ * or access to a file within the directory is made, the struct eventfs_entry
+ * array is used to find a callback() with the matching name that is being
+ * referenced (for lookups, the entire array is iterated and each callback
+ * will be called).
+ *
+ * The callback will be called with @name for the name of the file to create.
+ * The callback can return less than 1 to indicate  that no file should be
+ * created.
+ *
+ * If a file is to be created, then @mode should be populated with the file
+ * mode (permissions) for which the file is created for. This would be
+ * used to set the created inode i_mode field.
+ *
+ * The @data should be set to the data passed to the other file operations
+ * (read, write, etc). Note, @data will also point to the data passed in
+ * to eventfs_create_dir() or eventfs_create_events_dir(), but the callback
+ * can replace the data if it chooses to. Otherwise, the original data
+ * will be used for the file operation functions.
+ *
+ * The @fops should be set to the file operations that will be used to create
+ * the inode.
+ *
+ * NB. This callback is called while holding internal locks of the eventfs
+ *     system. The callback must not call any code that might also call into
+ *     the tracefs or eventfs system or it will risk creating a deadlock.
+ */
+typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
+				const struct file_operations **fops);

-struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
-					       struct dentry *parent);
+/**
+ * struct eventfs_entry - dynamically created eventfs file call back handler
+ * @name:	Then name of the dynamic file in an eventfs directory
+ * @callback:	The callback to get the fops of the file when it is created
+ *
+ * See evenfs_callback() typedef for how to set up @callback.
+ */
+struct eventfs_entry {
+	const char			*name;
+	eventfs_callback		callback;
+};

-struct eventfs_file *eventfs_add_dir(const char *name,
-				     struct eventfs_file *ef_parent);
+struct eventfs_inode;

-int eventfs_add_file(const char *name, umode_t mode,
-		     struct eventfs_file *ef_parent, void *data,
-		     const struct file_operations *fops);
+struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry *parent,
+						const struct eventfs_entry *entries,
+						int size, void *data);

-int eventfs_add_events_file(const char *name, umode_t mode,
-			    struct dentry *parent, void *data,
-			    const struct file_operations *fops);
+struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode *parent,
+					 const struct eventfs_entry *entries,
+					 int size, void *data);

-void eventfs_remove(struct eventfs_file *ef);
-
-void eventfs_remove_events_dir(struct dentry *dentry);
+void eventfs_remove_events_dir(struct eventfs_inode *ei);
+void eventfs_remove_dir(struct eventfs_inode *ei);

 struct dentry *tracefs_create_file(const char *name, umode_t mode,
 				   struct dentry *parent, void *data,
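The eventfs_entry array plus a shared callback is the core of the new interface: one callback can serve several files, picks its behaviour from the name, and returns less than 1 to suppress a file. The userspace sketch below shows that dispatch. All mock_ types are hypothetical stand-ins for the kernel types (umode_t, struct file_operations), and mock_lookup() only imitates the name-matching loop seen in the lookup code of this commit; it creates no dentries and takes no locks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel types, for illustration only. */
typedef unsigned short mock_umode_t;
struct mock_fops { const char *label; };

typedef int (*mock_callback)(const char *name, mock_umode_t *mode, void **data,
			     const struct mock_fops **fops);

struct mock_entry {
	const char *name;
	mock_callback callback;
};

static const struct mock_fops enable_fops = { "enable" };
static const struct mock_fops id_fops = { "id" };

/* One callback serving several files, selected by name. */
static int event_callback(const char *name, mock_umode_t *mode, void **data,
			  const struct mock_fops **fops)
{
	(void)data;	/* keep the default data supplied by the caller */

	if (strcmp(name, "enable") == 0) {
		*mode = 0644;
		*fops = &enable_fops;
		return 1;
	}
	if (strcmp(name, "id") == 0) {
		*mode = 0444;
		*fops = &id_fops;
		return 1;
	}
	return 0;	/* less than 1: do not create this file */
}

static const struct mock_entry event_entries[] = {
	{ "enable", event_callback },
	{ "id", event_callback },
	{ "hidden", event_callback },
};

/* Find the entry by name, then ask its callback whether to create it. */
static int mock_lookup(const struct mock_entry *entries, int nr, void *data,
		       const char *name, mock_umode_t *mode,
		       const struct mock_fops **fops)
{
	for (int i = 0; i < nr; i++) {
		void *cdata = data;	/* the callback may replace this copy */

		if (strcmp(name, entries[i].name) != 0)
			continue;
		if (entries[i].callback(name, mode, &cdata, fops) <= 0)
			continue;
		return 1;
	}
	return 0;
}
```

Here "hidden" is registered but its callback declines, so a lookup for it behaves like a lookup for a name that was never registered.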
+10 -1
include/uapi/linux/user_events.h
···
 /* Create dynamic location entry within a 32-bit value */
 #define DYN_LOC(offset, size) ((size) << 16 | (offset))
 
+/* List of supported registration flags */
+enum user_reg_flag {
+	/* Event will not delete upon last reference closing */
+	USER_EVENT_REG_PERSIST = 1U << 0,
+
+	/* This value or above is currently non-ABI */
+	USER_EVENT_REG_MAX = 1U << 1,
+};
+
 /*
  * Describes an event registration and stores the results of the registration.
  * This structure is passed to the DIAG_IOCSREG ioctl, callers at a minimum
···
 	/* Input: Enable size in bytes at address */
 	__u8 enable_size;
 
-	/* Input: Flags for future use, set to 0 */
+	/* Input: Flags to use, if any */
 	__u16 flags;
 
 	/* Input: Address to update when enabled */
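The enum comment says values at or above `USER_EVENT_REG_MAX` are non-ABI. A minimal sketch of what a range check on the 16-bit `flags` field could look like follows. The enum values are copied from the hunk; `demo_reg_flags_valid` and `demo_reg_is_persistent` are hypothetical helpers, and the kernel's actual validation code is not part of this section, so treat the check as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the uapi hunk */
enum user_reg_flag {
	/* Event will not delete upon last reference closing */
	USER_EVENT_REG_PERSIST = 1U << 0,

	/* This value or above is currently non-ABI */
	USER_EVENT_REG_MAX = 1U << 1,
};

/* Hypothetical helper: accept only values below USER_EVENT_REG_MAX */
static bool demo_reg_flags_valid(uint16_t flags)
{
	return flags < (uint16_t)USER_EVENT_REG_MAX;
}

/* Hypothetical helper: test the persist bit of an already validated value */
static bool demo_reg_is_persistent(uint16_t flags)
{
	assert(demo_reg_flags_valid(flags));
	return (flags & USER_EVENT_REG_PERSIST) != 0;
}
```

Comparing against the `MAX` sentinel rather than masking known bits means a future flag only needs the sentinel moved, which appears to be the intent of the "or above" wording.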
+4 -4
kernel/trace/ring_buffer.c
···
 	retries = 10;
 	success = false;
 	while (retries--) {
-		struct list_head *head_page, *prev_page, *r;
+		struct list_head *head_page, *prev_page;
 		struct list_head *last_page, *first_page;
 		struct list_head *head_page_with_bit;
 		struct buffer_page *hpage = rb_set_head_page(cpu_buffer);
···
 		last_page->next = head_page_with_bit;
 		first_page->prev = prev_page;
 
-		r = cmpxchg(&prev_page->next, head_page_with_bit, first_page);
-
-		if (r == head_page_with_bit) {
+		/* caution: head_page_with_bit gets updated on cmpxchg failure */
+		if (try_cmpxchg(&prev_page->next,
+				&head_page_with_bit, first_page)) {
 			/*
 			 * yay, we replaced the page pointer to our new list,
 			 * now, we just have to update to head page's prev
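The "caution" comment in this hunk points at the main behavioral difference between the two primitives: on failure, the expected value is overwritten with what was actually found. C11's `atomic_compare_exchange_strong()` has the same property, so it can serve as a user-space illustration. This is a sketch under that assumption; `demo_try_cmpxchg` and `demo_add_with_retry` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space stand-in for try_cmpxchg(): returns true on
 * success; on failure *expected is updated to the current value.
 */
static bool demo_try_cmpxchg(_Atomic int *ptr, int *expected, int desired)
{
	assert(ptr && expected);
	return atomic_compare_exchange_strong(ptr, expected, desired);
}

/* Typical retry loop: no explicit reload is needed after a failure */
static int demo_add_with_retry(_Atomic int *ptr, int delta)
{
	int old = atomic_load(ptr);
	int attempts = 1;

	while (!demo_try_cmpxchg(ptr, &old, old + delta))
		attempts++;	/* old now holds the value that was found */

	return attempts;
}
```

Because the expected value is refreshed automatically, code that reuses that variable after a failed exchange sees the new value, which is why the kernel hunk carries the warning about `head_page_with_bit`.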
+48 -42
kernel/trace/trace.c
··· 54 54 #include "trace.h" 55 55 #include "trace_output.h" 56 56 57 - /* 58 - * On boot up, the ring buffer is set to the minimum size, so that 59 - * we do not waste memory on systems that are not using tracing. 60 - */ 61 - bool ring_buffer_expanded; 62 - 63 57 #ifdef CONFIG_FTRACE_STARTUP_TEST 64 58 /* 65 59 * We need to change this state when a selftest is running. ··· 196 202 strscpy(bootup_tracer_buf, str, MAX_TRACER_SIZE); 197 203 default_bootup_tracer = bootup_tracer_buf; 198 204 /* We are using ftrace early, expand it */ 199 - ring_buffer_expanded = true; 205 + trace_set_ring_buffer_expanded(NULL); 200 206 return 1; 201 207 } 202 208 __setup("ftrace=", set_cmdline_ftrace); ··· 241 247 } else { 242 248 allocate_snapshot = true; 243 249 /* We also need the main ring buffer expanded */ 244 - ring_buffer_expanded = true; 250 + trace_set_ring_buffer_expanded(NULL); 245 251 } 246 252 return 1; 247 253 } ··· 483 489 static struct trace_array global_trace = { 484 490 .trace_flags = TRACE_DEFAULT_FLAGS, 485 491 }; 492 + 493 + void trace_set_ring_buffer_expanded(struct trace_array *tr) 494 + { 495 + if (!tr) 496 + tr = &global_trace; 497 + tr->ring_buffer_expanded = true; 498 + } 486 499 487 500 LIST_HEAD(ftrace_trace_arrays); 488 501 ··· 1731 1730 { 1732 1731 int len; 1733 1732 1734 - if (trace_seq_used(s) <= s->seq.readpos) 1733 + if (trace_seq_used(s) <= s->readpos) 1735 1734 return -EBUSY; 1736 1735 1737 - len = trace_seq_used(s) - s->seq.readpos; 1736 + len = trace_seq_used(s) - s->readpos; 1738 1737 if (cnt > len) 1739 1738 cnt = len; 1740 - memcpy(buf, s->buffer + s->seq.readpos, cnt); 1739 + memcpy(buf, s->buffer + s->readpos, cnt); 1741 1740 1742 - s->seq.readpos += cnt; 1741 + s->readpos += cnt; 1743 1742 return cnt; 1744 1743 } 1745 1744 ··· 2013 2012 #ifdef CONFIG_TRACER_MAX_TRACE 2014 2013 if (type->use_max_tr) { 2015 2014 /* If we expanded the buffers, make sure the max is expanded too */ 2016 - if (ring_buffer_expanded) 2015 + if 
(tr->ring_buffer_expanded) 2017 2016 ring_buffer_resize(tr->max_buffer.buffer, trace_buf_size, 2018 2017 RING_BUFFER_ALL_CPUS); 2019 2018 tr->allocated_snapshot = true; ··· 2039 2038 tr->allocated_snapshot = false; 2040 2039 2041 2040 /* Shrink the max buffer again */ 2042 - if (ring_buffer_expanded) 2041 + if (tr->ring_buffer_expanded) 2043 2042 ring_buffer_resize(tr->max_buffer.buffer, 1, 2044 2043 RING_BUFFER_ALL_CPUS); 2045 2044 } ··· 3404 3403 pr_warn("**********************************************************\n"); 3405 3404 3406 3405 /* Expand the buffers to set size */ 3407 - tracing_update_buffers(); 3406 + tracing_update_buffers(&global_trace); 3408 3407 3409 3408 buffers_allocated = 1; 3410 3409 ··· 3828 3827 return false; 3829 3828 } 3830 3829 3831 - static const char *show_buffer(struct trace_seq *s) 3832 - { 3833 - struct seq_buf *seq = &s->seq; 3834 - 3835 - seq_buf_terminate(seq); 3836 - 3837 - return seq->buffer; 3838 - } 3839 - 3840 3830 static DEFINE_STATIC_KEY_FALSE(trace_no_verify); 3841 3831 3842 3832 static int test_can_verify_check(const char *fmt, ...) 
··· 3967 3975 */ 3968 3976 if (WARN_ONCE(!trace_safe_str(iter, str, star, len), 3969 3977 "fmt: '%s' current_buffer: '%s'", 3970 - fmt, show_buffer(&iter->seq))) { 3978 + fmt, seq_buf_str(&iter->seq.seq))) { 3971 3979 int ret; 3972 3980 3973 3981 /* Try to safely read the string */ ··· 4978 4986 if (ret) 4979 4987 return ret; 4980 4988 4989 + mutex_lock(&event_mutex); 4990 + 4991 + /* Fail if the file is marked for removal */ 4992 + if (file->flags & EVENT_FILE_FL_FREED) { 4993 + trace_array_put(file->tr); 4994 + ret = -ENODEV; 4995 + } else { 4996 + event_file_get(file); 4997 + } 4998 + 4999 + mutex_unlock(&event_mutex); 5000 + if (ret) 5001 + return ret; 5002 + 4981 5003 filp->private_data = inode->i_private; 4982 5004 4983 5005 return 0; ··· 5002 4996 struct trace_event_file *file = inode->i_private; 5003 4997 5004 4998 trace_array_put(file->tr); 4999 + event_file_put(file); 5005 5000 5006 5001 return 0; 5007 5002 } ··· 6381 6374 * we use the size that was given, and we can forget about 6382 6375 * expanding it later. 6383 6376 */ 6384 - ring_buffer_expanded = true; 6377 + trace_set_ring_buffer_expanded(tr); 6385 6378 6386 6379 /* May be called before buffers are initialized */ 6387 6380 if (!tr->array_buffer.buffer) ··· 6459 6452 6460 6453 /** 6461 6454 * tracing_update_buffers - used by tracing facility to expand ring buffers 6455 + * @tr: The tracing instance 6462 6456 * 6463 6457 * To save on memory when the tracing is never used on a system with it 6464 6458 * configured in. The ring buffers are set to a minimum size. But once ··· 6468 6460 * 6469 6461 * This function is to be called when a tracer is about to be used. 
6470 6462 */ 6471 - int tracing_update_buffers(void) 6463 + int tracing_update_buffers(struct trace_array *tr) 6472 6464 { 6473 6465 int ret = 0; 6474 6466 6475 6467 mutex_lock(&trace_types_lock); 6476 - if (!ring_buffer_expanded) 6477 - ret = __tracing_resize_ring_buffer(&global_trace, trace_buf_size, 6468 + if (!tr->ring_buffer_expanded) 6469 + ret = __tracing_resize_ring_buffer(tr, trace_buf_size, 6478 6470 RING_BUFFER_ALL_CPUS); 6479 6471 mutex_unlock(&trace_types_lock); 6480 6472 ··· 6528 6520 6529 6521 mutex_lock(&trace_types_lock); 6530 6522 6531 - if (!ring_buffer_expanded) { 6523 + if (!tr->ring_buffer_expanded) { 6532 6524 ret = __tracing_resize_ring_buffer(tr, trace_buf_size, 6533 6525 RING_BUFFER_ALL_CPUS); 6534 6526 if (ret < 0) ··· 7014 7006 7015 7007 /* Now copy what we have to the user */ 7016 7008 sret = trace_seq_to_user(&iter->seq, ubuf, cnt); 7017 - if (iter->seq.seq.readpos >= trace_seq_used(&iter->seq)) 7009 + if (iter->seq.readpos >= trace_seq_used(&iter->seq)) 7018 7010 trace_seq_init(&iter->seq); 7019 7011 7020 7012 /* ··· 7200 7192 } 7201 7193 7202 7194 if (buf_size_same) { 7203 - if (!ring_buffer_expanded) 7195 + if (!tr->ring_buffer_expanded) 7204 7196 r = sprintf(buf, "%lu (expanded: %lu)\n", 7205 7197 size >> 10, 7206 7198 trace_buf_size >> 10); ··· 7257 7249 mutex_lock(&trace_types_lock); 7258 7250 for_each_tracing_cpu(cpu) { 7259 7251 size += per_cpu_ptr(tr->array_buffer.data, cpu)->entries >> 10; 7260 - if (!ring_buffer_expanded) 7252 + if (!tr->ring_buffer_expanded) 7261 7253 expanded_size += trace_buf_size >> 10; 7262 7254 } 7263 - if (ring_buffer_expanded) 7255 + if (tr->ring_buffer_expanded) 7264 7256 r = sprintf(buf, "%lu\n", size); 7265 7257 else 7266 7258 r = sprintf(buf, "%lu (expanded: %lu)\n", size, expanded_size); ··· 7654 7646 unsigned long val; 7655 7647 int ret; 7656 7648 7657 - ret = tracing_update_buffers(); 7649 + ret = tracing_update_buffers(tr); 7658 7650 if (ret < 0) 7659 7651 return ret; 7660 7652 ··· 9558 9550 
if (allocate_trace_buffers(tr, trace_buf_size) < 0) 9559 9551 goto out_free_tr; 9560 9552 9553 + /* The ring buffer is defaultly expanded */ 9554 + trace_set_ring_buffer_expanded(tr); 9555 + 9561 9556 if (ftrace_allocate_ftrace_ops(tr) < 0) 9562 9557 goto out_free_tr; 9563 9558 ··· 9770 9759 static void 9771 9760 init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) 9772 9761 { 9773 - struct trace_event_file *file; 9774 9762 int cpu; 9775 9763 9776 9764 trace_create_file("available_tracers", TRACE_MODE_READ, d_tracer, ··· 9802 9792 trace_create_file("trace_marker", 0220, d_tracer, 9803 9793 tr, &tracing_mark_fops); 9804 9794 9805 - file = __find_event_file(tr, "ftrace", "print"); 9806 - if (file && file->ef) 9807 - eventfs_add_file("trigger", TRACE_MODE_WRITE, file->ef, 9808 - file, &event_trigger_fops); 9809 - tr->trace_marker_file = file; 9795 + tr->trace_marker_file = __find_event_file(tr, "ftrace", "print"); 9810 9796 9811 9797 trace_create_file("trace_marker_raw", 0220, d_tracer, 9812 9798 tr, &tracing_mark_raw_fops); ··· 10450 10444 trace_printk_init_buffers(); 10451 10445 10452 10446 /* To save memory, keep the ring buffer size to its minimum */ 10453 - if (ring_buffer_expanded) 10447 + if (global_trace.ring_buffer_expanded) 10454 10448 ring_buf_size = trace_buf_size; 10455 10449 else 10456 10450 ring_buf_size = 1;
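The trace.c changes move the expanded state from a single global into each instance, with NULL meaning the top-level instance. A small user-space model of that pattern is sketched below. `demo_trace_array`, `demo_global_trace`, `demo_set_ring_buffer_expanded` and `demo_update_buffers` are hypothetical names that mirror the shape of the kernel code; the actual resize and the locking are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct trace_array */
struct demo_trace_array {
	bool ring_buffer_expanded;
};

/* Stand-in for the top-level instance */
static struct demo_trace_array demo_global_trace;

/* NULL selects the top-level instance, as in the kernel helper */
static void demo_set_ring_buffer_expanded(struct demo_trace_array *tr)
{
	if (!tr)
		tr = &demo_global_trace;
	tr->ring_buffer_expanded = true;
}

/* Returns true when this call had to expand the given instance */
static bool demo_update_buffers(struct demo_trace_array *tr)
{
	assert(tr);

	if (tr->ring_buffer_expanded)
		return false;

	/* A real implementation would resize the ring buffer here */
	demo_set_ring_buffer_expanded(tr);
	return true;
}
```

With per-instance state, expanding the top-level buffer leaves other instances at their minimum size until they are used themselves.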
+12 -4
kernel/trace/trace.h
···
 	struct dentry *dir;
 	struct dentry *options;
 	struct dentry *percpu_dir;
-	struct dentry *event_dir;
+	struct eventfs_inode *event_dir;
 	struct trace_options *topts;
 	struct list_head systems;
 	struct list_head events;
···
 	struct cond_snapshot *cond_snapshot;
 #endif
 	struct trace_func_repeats __percpu *last_func_repeats;
+	/*
+	 * On boot up, the ring buffer is set to the minimum size, so that
+	 * we do not waste memory on systems that are not using tracing.
+	 */
+	bool ring_buffer_expanded;
 };
 
 enum {
···
 #define DYN_FTRACE_TEST_NAME2 trace_selftest_dynamic_test_func2
 extern int DYN_FTRACE_TEST_NAME2(void);
 
-extern bool ring_buffer_expanded;
+extern void trace_set_ring_buffer_expanded(struct trace_array *tr);
 extern bool tracing_selftest_disabled;
 
 #ifdef CONFIG_FTRACE_STARTUP_TEST
···
 #endif /* CONFIG_BRANCH_TRACER */
 
 /* set ring buffers to default size if not already done so */
-int tracing_update_buffers(void);
+int tracing_update_buffers(struct trace_array *tr);
 
 union trace_synth_field {
 	u8 as_u8;
···
 	struct list_head list;
 	struct event_subsystem *subsystem;
 	struct trace_array *tr;
-	struct eventfs_file *ef;
+	struct eventfs_inode *ei;
 	int ref_count;
 	int nr_events;
 };
···
 			 struct trace_event_file *file,
 			 char *glob,
 			 struct event_trigger_data *trigger_data);
+
+extern void event_file_get(struct trace_event_file *file);
+extern void event_file_put(struct trace_event_file *file);
 
 /**
  * struct event_trigger_ops - callbacks for trace event triggers
+255 -107
kernel/trace/trace_events.c
··· 984 984 return; 985 985 986 986 if (!--dir->nr_events) { 987 - eventfs_remove(dir->ef); 987 + eventfs_remove_dir(dir->ei); 988 988 list_del(&dir->list); 989 989 __put_system_dir(dir); 990 990 } 991 991 } 992 992 993 + void event_file_get(struct trace_event_file *file) 994 + { 995 + atomic_inc(&file->ref); 996 + } 997 + 998 + void event_file_put(struct trace_event_file *file) 999 + { 1000 + if (WARN_ON_ONCE(!atomic_read(&file->ref))) { 1001 + if (file->flags & EVENT_FILE_FL_FREED) 1002 + kmem_cache_free(file_cachep, file); 1003 + return; 1004 + } 1005 + 1006 + if (atomic_dec_and_test(&file->ref)) { 1007 + /* Count should only go to zero when it is freed */ 1008 + if (WARN_ON_ONCE(!(file->flags & EVENT_FILE_FL_FREED))) 1009 + return; 1010 + kmem_cache_free(file_cachep, file); 1011 + } 1012 + } 1013 + 993 1014 static void remove_event_file_dir(struct trace_event_file *file) 994 1015 { 995 - eventfs_remove(file->ef); 1016 + eventfs_remove_dir(file->ei); 996 1017 list_del(&file->list); 997 1018 remove_subsystem(file->system); 998 1019 free_event_filter(file->filter); 999 - kmem_cache_free(file_cachep, file); 1020 + file->flags |= EVENT_FILE_FL_FREED; 1021 + event_file_put(file); 1000 1022 } 1001 1023 1002 1024 /* ··· 1188 1166 if (!cnt) 1189 1167 return 0; 1190 1168 1191 - ret = tracing_update_buffers(); 1169 + ret = tracing_update_buffers(tr); 1192 1170 if (ret < 0) 1193 1171 return ret; 1194 1172 ··· 1391 1369 flags = file->flags; 1392 1370 mutex_unlock(&event_mutex); 1393 1371 1394 - if (!file) 1372 + if (!file || flags & EVENT_FILE_FL_FREED) 1395 1373 return -ENODEV; 1396 1374 1397 1375 if (flags & EVENT_FILE_FL_ENABLED && ··· 1419 1397 if (ret) 1420 1398 return ret; 1421 1399 1422 - ret = tracing_update_buffers(); 1423 - if (ret < 0) 1424 - return ret; 1425 - 1426 1400 switch (val) { 1427 1401 case 0: 1428 1402 case 1: 1429 1403 ret = -ENODEV; 1430 1404 mutex_lock(&event_mutex); 1431 1405 file = event_file_data(filp); 1432 - if (likely(file)) 1406 + if 
(likely(file && !(file->flags & EVENT_FILE_FL_FREED))) { 1407 + ret = tracing_update_buffers(file->tr); 1408 + if (ret < 0) { 1409 + mutex_unlock(&event_mutex); 1410 + return ret; 1411 + } 1433 1412 ret = ftrace_event_enable_disable(file, val); 1413 + } 1434 1414 mutex_unlock(&event_mutex); 1435 1415 break; 1436 1416 ··· 1506 1482 if (ret) 1507 1483 return ret; 1508 1484 1509 - ret = tracing_update_buffers(); 1485 + ret = tracing_update_buffers(dir->tr); 1510 1486 if (ret < 0) 1511 1487 return ret; 1512 1488 ··· 1705 1681 1706 1682 mutex_lock(&event_mutex); 1707 1683 file = event_file_data(filp); 1708 - if (file) 1684 + if (file && !(file->flags & EVENT_FILE_FL_FREED)) 1709 1685 print_event_filter(file, s); 1710 1686 mutex_unlock(&event_mutex); 1711 1687 ··· 1980 1956 if (!cnt) 1981 1957 return 0; 1982 1958 1983 - ret = tracing_update_buffers(); 1959 + ret = tracing_update_buffers(tr); 1984 1960 if (ret < 0) 1985 1961 return ret; 1986 1962 ··· 2304 2280 return NULL; 2305 2281 } 2306 2282 2307 - static struct eventfs_file * 2283 + static int system_callback(const char *name, umode_t *mode, void **data, 2284 + const struct file_operations **fops) 2285 + { 2286 + if (strcmp(name, "filter") == 0) 2287 + *fops = &ftrace_subsystem_filter_fops; 2288 + 2289 + else if (strcmp(name, "enable") == 0) 2290 + *fops = &ftrace_system_enable_fops; 2291 + 2292 + else 2293 + return 0; 2294 + 2295 + *mode = TRACE_MODE_WRITE; 2296 + return 1; 2297 + } 2298 + 2299 + static struct eventfs_inode * 2308 2300 event_subsystem_dir(struct trace_array *tr, const char *name, 2309 - struct trace_event_file *file, struct dentry *parent) 2301 + struct trace_event_file *file, struct eventfs_inode *parent) 2310 2302 { 2311 2303 struct event_subsystem *system, *iter; 2312 2304 struct trace_subsystem_dir *dir; 2313 - struct eventfs_file *ef; 2314 - int res; 2305 + struct eventfs_inode *ei; 2306 + int nr_entries; 2307 + static struct eventfs_entry system_entries[] = { 2308 + { 2309 + .name = "filter", 
2310 + .callback = system_callback, 2311 + }, 2312 + { 2313 + .name = "enable", 2314 + .callback = system_callback, 2315 + } 2316 + }; 2315 2317 2316 2318 /* First see if we did not already create this dir */ 2317 2319 list_for_each_entry(dir, &tr->systems, list) { ··· 2345 2295 if (strcmp(system->name, name) == 0) { 2346 2296 dir->nr_events++; 2347 2297 file->system = dir; 2348 - return dir->ef; 2298 + return dir->ei; 2349 2299 } 2350 2300 } 2351 2301 ··· 2369 2319 } else 2370 2320 __get_system(system); 2371 2321 2372 - ef = eventfs_add_subsystem_dir(name, parent); 2373 - if (IS_ERR(ef)) { 2322 + /* ftrace only has directories no files */ 2323 + if (strcmp(name, "ftrace") == 0) 2324 + nr_entries = 0; 2325 + else 2326 + nr_entries = ARRAY_SIZE(system_entries); 2327 + 2328 + ei = eventfs_create_dir(name, parent, system_entries, nr_entries, dir); 2329 + if (IS_ERR(ei)) { 2374 2330 pr_warn("Failed to create system directory %s\n", name); 2375 2331 __put_system(system); 2376 2332 goto out_free; 2377 2333 } 2378 2334 2379 - dir->ef = ef; 2335 + dir->ei = ei; 2380 2336 dir->tr = tr; 2381 2337 dir->ref_count = 1; 2382 2338 dir->nr_events = 1; 2383 2339 dir->subsystem = system; 2384 2340 file->system = dir; 2385 2341 2386 - /* the ftrace system is special, do not create enable or filter files */ 2387 - if (strcmp(name, "ftrace") != 0) { 2388 - 2389 - res = eventfs_add_file("filter", TRACE_MODE_WRITE, 2390 - dir->ef, dir, 2391 - &ftrace_subsystem_filter_fops); 2392 - if (res) { 2393 - kfree(system->filter); 2394 - system->filter = NULL; 2395 - pr_warn("Could not create tracefs '%s/filter' entry\n", name); 2396 - } 2397 - 2398 - eventfs_add_file("enable", TRACE_MODE_WRITE, dir->ef, dir, 2399 - &ftrace_system_enable_fops); 2400 - } 2401 - 2402 2342 list_add(&dir->list, &tr->systems); 2403 2343 2404 - return dir->ef; 2344 + return dir->ei; 2405 2345 2406 2346 out_free: 2407 2347 kfree(dir); ··· 2440 2400 return ret; 2441 2401 } 2442 2402 2403 + static int event_callback(const 
char *name, umode_t *mode, void **data, 2404 + const struct file_operations **fops) 2405 + { 2406 + struct trace_event_file *file = *data; 2407 + struct trace_event_call *call = file->event_call; 2408 + 2409 + if (strcmp(name, "format") == 0) { 2410 + *mode = TRACE_MODE_READ; 2411 + *fops = &ftrace_event_format_fops; 2412 + *data = call; 2413 + return 1; 2414 + } 2415 + 2416 + /* 2417 + * Only event directories that can be enabled should have 2418 + * triggers or filters, with the exception of the "print" 2419 + * event that can have a "trigger" file. 2420 + */ 2421 + if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) { 2422 + if (call->class->reg && strcmp(name, "enable") == 0) { 2423 + *mode = TRACE_MODE_WRITE; 2424 + *fops = &ftrace_enable_fops; 2425 + return 1; 2426 + } 2427 + 2428 + if (strcmp(name, "filter") == 0) { 2429 + *mode = TRACE_MODE_WRITE; 2430 + *fops = &ftrace_event_filter_fops; 2431 + return 1; 2432 + } 2433 + } 2434 + 2435 + if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE) || 2436 + strcmp(trace_event_name(call), "print") == 0) { 2437 + if (strcmp(name, "trigger") == 0) { 2438 + *mode = TRACE_MODE_WRITE; 2439 + *fops = &event_trigger_fops; 2440 + return 1; 2441 + } 2442 + } 2443 + 2444 + #ifdef CONFIG_PERF_EVENTS 2445 + if (call->event.type && call->class->reg && 2446 + strcmp(name, "id") == 0) { 2447 + *mode = TRACE_MODE_READ; 2448 + *data = (void *)(long)call->event.type; 2449 + *fops = &ftrace_event_id_fops; 2450 + return 1; 2451 + } 2452 + #endif 2453 + 2454 + #ifdef CONFIG_HIST_TRIGGERS 2455 + if (strcmp(name, "hist") == 0) { 2456 + *mode = TRACE_MODE_READ; 2457 + *fops = &event_hist_fops; 2458 + return 1; 2459 + } 2460 + #endif 2461 + #ifdef CONFIG_HIST_TRIGGERS_DEBUG 2462 + if (strcmp(name, "hist_debug") == 0) { 2463 + *mode = TRACE_MODE_READ; 2464 + *fops = &event_hist_debug_fops; 2465 + return 1; 2466 + } 2467 + #endif 2468 + #ifdef CONFIG_TRACE_EVENT_INJECT 2469 + if (call->event.type && call->class->reg && 2470 + strcmp(name, 
"inject") == 0) { 2471 + *mode = 0200; 2472 + *fops = &event_inject_fops; 2473 + return 1; 2474 + } 2475 + #endif 2476 + return 0; 2477 + } 2478 + 2443 2479 static int 2444 - event_create_dir(struct dentry *parent, struct trace_event_file *file) 2480 + event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file) 2445 2481 { 2446 2482 struct trace_event_call *call = file->event_call; 2447 - struct eventfs_file *ef_subsystem = NULL; 2448 2483 struct trace_array *tr = file->tr; 2449 - struct eventfs_file *ef; 2484 + struct eventfs_inode *e_events; 2485 + struct eventfs_inode *ei; 2450 2486 const char *name; 2487 + int nr_entries; 2451 2488 int ret; 2489 + static struct eventfs_entry event_entries[] = { 2490 + { 2491 + .name = "enable", 2492 + .callback = event_callback, 2493 + }, 2494 + { 2495 + .name = "filter", 2496 + .callback = event_callback, 2497 + }, 2498 + { 2499 + .name = "trigger", 2500 + .callback = event_callback, 2501 + }, 2502 + { 2503 + .name = "format", 2504 + .callback = event_callback, 2505 + }, 2506 + #ifdef CONFIG_PERF_EVENTS 2507 + { 2508 + .name = "id", 2509 + .callback = event_callback, 2510 + }, 2511 + #endif 2512 + #ifdef CONFIG_HIST_TRIGGERS 2513 + { 2514 + .name = "hist", 2515 + .callback = event_callback, 2516 + }, 2517 + #endif 2518 + #ifdef CONFIG_HIST_TRIGGERS_DEBUG 2519 + { 2520 + .name = "hist_debug", 2521 + .callback = event_callback, 2522 + }, 2523 + #endif 2524 + #ifdef CONFIG_TRACE_EVENT_INJECT 2525 + { 2526 + .name = "inject", 2527 + .callback = event_callback, 2528 + }, 2529 + #endif 2530 + }; 2452 2531 2453 2532 /* 2454 2533 * If the trace point header did not define TRACE_SYSTEM ··· 2577 2418 if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0)) 2578 2419 return -ENODEV; 2579 2420 2580 - ef_subsystem = event_subsystem_dir(tr, call->class->system, file, parent); 2581 - if (!ef_subsystem) 2421 + e_events = event_subsystem_dir(tr, call->class->system, file, parent); 2422 + if (!e_events) 2582 2423 
return -ENOMEM; 2583 2424 2425 + nr_entries = ARRAY_SIZE(event_entries); 2426 + 2584 2427 name = trace_event_name(call); 2585 - ef = eventfs_add_dir(name, ef_subsystem); 2586 - if (IS_ERR(ef)) { 2428 + ei = eventfs_create_dir(name, e_events, event_entries, nr_entries, file); 2429 + if (IS_ERR(ei)) { 2587 2430 pr_warn("Could not create tracefs '%s' directory\n", name); 2588 2431 return -1; 2589 2432 } 2590 2433 2591 - file->ef = ef; 2592 - 2593 - if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) 2594 - eventfs_add_file("enable", TRACE_MODE_WRITE, file->ef, file, 2595 - &ftrace_enable_fops); 2596 - 2597 - #ifdef CONFIG_PERF_EVENTS 2598 - if (call->event.type && call->class->reg) 2599 - eventfs_add_file("id", TRACE_MODE_READ, file->ef, 2600 - (void *)(long)call->event.type, 2601 - &ftrace_event_id_fops); 2602 - #endif 2434 + file->ei = ei; 2603 2435 2604 2436 ret = event_define_fields(call); 2605 2437 if (ret < 0) { 2606 2438 pr_warn("Could not initialize trace point events/%s\n", name); 2607 2439 return ret; 2608 2440 } 2609 - 2610 - /* 2611 - * Only event directories that can be enabled should have 2612 - * triggers or filters. 
2613 - */ 2614 - if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) { 2615 - eventfs_add_file("filter", TRACE_MODE_WRITE, file->ef, 2616 - file, &ftrace_event_filter_fops); 2617 - 2618 - eventfs_add_file("trigger", TRACE_MODE_WRITE, file->ef, 2619 - file, &event_trigger_fops); 2620 - } 2621 - 2622 - #ifdef CONFIG_HIST_TRIGGERS 2623 - eventfs_add_file("hist", TRACE_MODE_READ, file->ef, file, 2624 - &event_hist_fops); 2625 - #endif 2626 - #ifdef CONFIG_HIST_TRIGGERS_DEBUG 2627 - eventfs_add_file("hist_debug", TRACE_MODE_READ, file->ef, file, 2628 - &event_hist_debug_fops); 2629 - #endif 2630 - eventfs_add_file("format", TRACE_MODE_READ, file->ef, call, 2631 - &ftrace_event_format_fops); 2632 - 2633 - #ifdef CONFIG_TRACE_EVENT_INJECT 2634 - if (call->event.type && call->class->reg) 2635 - eventfs_add_file("inject", 0200, file->ef, file, 2636 - &event_inject_fops); 2637 - #endif 2638 2441 2639 2442 return 0; 2640 2443 } ··· 2924 2803 atomic_set(&file->tm_ref, 0); 2925 2804 INIT_LIST_HEAD(&file->triggers); 2926 2805 list_add(&file->list, &tr->events); 2806 + event_file_get(file); 2927 2807 2928 2808 return file; 2929 2809 } ··· 2946 2824 int i; 2947 2825 2948 2826 strscpy(bootup_trigger_buf, str, COMMAND_LINE_SIZE); 2949 - ring_buffer_expanded = true; 2827 + trace_set_ring_buffer_expanded(NULL); 2950 2828 disable_tracing_selftest("running event triggers"); 2951 2829 2952 2830 buf = bootup_trigger_buf; ··· 3736 3614 static __init int setup_trace_event(char *str) 3737 3615 { 3738 3616 strscpy(bootup_event_buf, str, COMMAND_LINE_SIZE); 3739 - ring_buffer_expanded = true; 3617 + trace_set_ring_buffer_expanded(NULL); 3740 3618 disable_tracing_selftest("running event tracing"); 3741 3619 3742 3620 return 1; 3743 3621 } 3744 3622 __setup("trace_event=", setup_trace_event); 3745 3623 3624 + static int events_callback(const char *name, umode_t *mode, void **data, 3625 + const struct file_operations **fops) 3626 + { 3627 + if (strcmp(name, "enable") == 0) { 3628 + *mode = 
TRACE_MODE_WRITE; 3629 + *fops = &ftrace_tr_enable_fops; 3630 + return 1; 3631 + } 3632 + 3633 + if (strcmp(name, "header_page") == 0) 3634 + *data = ring_buffer_print_page_header; 3635 + 3636 + else if (strcmp(name, "header_event") == 0) 3637 + *data = ring_buffer_print_entry_header; 3638 + 3639 + else 3640 + return 0; 3641 + 3642 + *mode = TRACE_MODE_READ; 3643 + *fops = &ftrace_show_header_fops; 3644 + return 1; 3645 + } 3646 + 3746 3647 /* Expects to have event_mutex held when called */ 3747 3648 static int 3748 3649 create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) 3749 3650 { 3750 - struct dentry *d_events; 3651 + struct eventfs_inode *e_events; 3751 3652 struct dentry *entry; 3752 - int error = 0; 3653 + int nr_entries; 3654 + static struct eventfs_entry events_entries[] = { 3655 + { 3656 + .name = "enable", 3657 + .callback = events_callback, 3658 + }, 3659 + { 3660 + .name = "header_page", 3661 + .callback = events_callback, 3662 + }, 3663 + { 3664 + .name = "header_event", 3665 + .callback = events_callback, 3666 + }, 3667 + }; 3753 3668 3754 3669 entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent, 3755 3670 tr, &ftrace_set_event_fops); 3756 3671 if (!entry) 3757 3672 return -ENOMEM; 3758 3673 3759 - d_events = eventfs_create_events_dir("events", parent); 3760 - if (IS_ERR(d_events)) { 3674 + nr_entries = ARRAY_SIZE(events_entries); 3675 + 3676 + e_events = eventfs_create_events_dir("events", parent, events_entries, 3677 + nr_entries, tr); 3678 + if (IS_ERR(e_events)) { 3761 3679 pr_warn("Could not create tracefs 'events' directory\n"); 3762 3680 return -ENOMEM; 3763 3681 } 3764 - 3765 - error = eventfs_add_events_file("enable", TRACE_MODE_WRITE, d_events, 3766 - tr, &ftrace_tr_enable_fops); 3767 - if (error) 3768 - return -ENOMEM; 3769 3682 3770 3683 /* There are not as crucial, just warn if they are not created */ 3771 3684 ··· 3811 3654 TRACE_MODE_WRITE, parent, tr, 3812 3655 &ftrace_set_event_notrace_pid_fops); 
3813 3656 3814 - /* ring buffer internal formats */ 3815 - eventfs_add_events_file("header_page", TRACE_MODE_READ, d_events, 3816 - ring_buffer_print_page_header, 3817 - &ftrace_show_header_fops); 3818 - 3819 - eventfs_add_events_file("header_event", TRACE_MODE_READ, d_events, 3820 - ring_buffer_print_entry_header, 3821 - &ftrace_show_header_fops); 3822 - 3823 - tr->event_dir = d_events; 3657 + tr->event_dir = e_events; 3824 3658 3825 3659 return 0; 3826 3660 }
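The `event_file_get()` / `event_file_put()` pair added in this file defers freeing until the last reference is dropped, while a FREED flag makes new opens fail. The following user-space sketch models that lifetime rule. All names (`demo_file`, `DEMO_FL_FREED`, `demo_file_open`, and so on) are hypothetical, and the model is single-threaded: the kernel checks the flag under `event_mutex`, which is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_FL_FREED 0x1u	/* hypothetical stand-in for EVENT_FILE_FL_FREED */

struct demo_file {
	atomic_int ref;
	unsigned int flags;
};

/* Creation holds the first reference */
static struct demo_file *demo_file_new(void)
{
	struct demo_file *f = calloc(1, sizeof(*f));

	if (f)
		atomic_init(&f->ref, 1);
	return f;
}

static void demo_file_get(struct demo_file *f)
{
	atomic_fetch_add(&f->ref, 1);
}

/* Drop one reference; returns true if this call released the memory */
static bool demo_file_put(struct demo_file *f)
{
	if (atomic_fetch_sub(&f->ref, 1) != 1)
		return false;

	/* The count should only reach zero after removal */
	assert(f->flags & DEMO_FL_FREED);
	free(f);
	return true;
}

/* Opening fails once the file is marked for removal */
static bool demo_file_open(struct demo_file *f)
{
	if (f->flags & DEMO_FL_FREED)
		return false;
	demo_file_get(f);
	return true;
}

/* Removal marks the file and drops the creation reference */
static bool demo_file_remove(struct demo_file *f)
{
	f->flags |= DEMO_FL_FREED;
	return demo_file_put(f);
}
```

An opener that still holds a reference keeps the memory valid after removal; the memory is released only when that opener calls `demo_file_put()`.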
+3
kernel/trace/trace_events_filter.c
···
 	struct event_filter *filter = NULL;
 	int err;
 
+	if (file->flags & EVENT_FILE_FL_FREED)
+		return -ENODEV;
+
 	if (!strcmp(strstrip(filter_string), "0")) {
 		filter_disable(file);
 		filter = event_filter(file);
+2 -9
kernel/trace/trace_events_hist.c
···
 {
 	const char *system = NULL, *name = NULL;
 	struct trace_event_call *call;
-	int len;
 
 	if (!str)
 		return;
 
-	/* sizeof() contains the nul byte */
-	len = sizeof(HIST_PREFIX) + strlen(str);
 	kfree(last_cmd);
-	last_cmd = kzalloc(len, GFP_KERNEL);
+
+	last_cmd = kasprintf(GFP_KERNEL, HIST_PREFIX "%s", str);
 	if (!last_cmd)
 		return;
-
-	strcpy(last_cmd, HIST_PREFIX);
-	/* Again, sizeof() contains the nul byte */
-	len -= sizeof(HIST_PREFIX);
-	strncat(last_cmd, str, len);
 
 	if (file) {
 		call = file->event_call;
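This hunk replaces a manual length calculation, `strcpy()` and `strncat()` with a single formatted allocation. A rough user-space analog of the same idea, using `snprintf()` to size the buffer, is sketched below. `DEMO_PREFIX` and `demo_prefixed_dup` are hypothetical; the prefix string is an illustrative value and not taken from the kernel's `HIST_PREFIX`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PREFIX "prefix:"	/* hypothetical value for illustration */

/*
 * Allocate and return DEMO_PREFIX followed by @str, or NULL on failure.
 * The caller owns the result and must free() it.
 */
static char *demo_prefixed_dup(const char *str)
{
	int len;
	char *out;

	assert(str);

	/* First pass only measures the formatted length */
	len = snprintf(NULL, 0, DEMO_PREFIX "%s", str);
	if (len < 0)
		return NULL;

	out = malloc((size_t)len + 1);
	if (!out)
		return NULL;

	/* Second pass writes the string, including the terminating nul */
	snprintf(out, (size_t)len + 1, DEMO_PREFIX "%s", str);
	return out;
}
```

Letting the formatter compute the size removes the hand-maintained arithmetic around the nul byte that the deleted comments had to explain.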
+1 -1
kernel/trace/trace_events_synth.c
···
 
 #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	if ((unsigned long)str_val < TASK_SIZE)
-		ret = strncpy_from_user_nofault(str_field, str_val, STR_VAR_LEN_MAX);
+		ret = strncpy_from_user_nofault(str_field, (const void __user *)str_val, STR_VAR_LEN_MAX);
 	else
 #endif
 		ret = strncpy_from_kernel_nofault(str_field, str_val, STR_VAR_LEN_MAX);
+22 -14
kernel/trace/trace_events_user.c
···
 #define EVENT_STATUS_OTHER BIT(7)
 
 /*
- * User register flags are not allowed yet, keep them here until we are
- * ready to expose them out to the user ABI.
- */
-enum user_reg_flag {
-	/* Event will not delete upon last reference closing */
-	USER_EVENT_REG_PERSIST = 1U << 0,
-
-	/* This value or above is currently non-ABI */
-	USER_EVENT_REG_MAX = 1U << 1,
-};
-
-/*
  * Stores the system name, tables, and locks for a group of events. This
  * allows isolation for events by various means.
  */
···
 static u32 user_event_key(char *name)
 {
 	return jhash(name, strlen(name), 0);
+}
+
+static bool user_event_capable(u16 reg_flags)
+{
+	/* Persistent events require CAP_PERFMON / CAP_SYS_ADMIN */
+	if (reg_flags & USER_EVENT_REG_PERSIST) {
+		if (!perfmon_capable())
+			return false;
+	}
+
+	return true;
 }
 
 static struct user_event *user_event_get(struct user_event *user)
···
 	if (!user_event_last_ref(user))
 		return -EBUSY;
 
+	if (!user_event_capable(user->reg_flags))
+		return -EPERM;
+
 	return destroy_user_event(user);
 }
 
···
 	int argc = 0;
 	char **argv;
 
-	/* User register flags are not ready yet */
-	if (reg_flags != 0 || flags != NULL)
+	/* Currently don't support any text based flags */
+	if (flags != NULL)
 		return -EINVAL;
+
+	if (!user_event_capable(reg_flags))
+		return -EPERM;
 
 	/* Prevent dyn_event from racing */
 	mutex_lock(&event_mutex);
···
 
 	if (!user_event_last_ref(user))
 		return -EBUSY;
+
+	if (!user_event_capable(user->reg_flags))
+		return -EPERM;
 
 	return destroy_user_event(user);
 }
+5 -1
kernel/trace/trace_seq.c
···
  */
 int trace_seq_to_user(struct trace_seq *s, char __user *ubuf, int cnt)
 {
+	int ret;
 	__trace_seq_init(s);
-	return seq_buf_to_user(&s->seq, ubuf, cnt);
+	ret = seq_buf_to_user(&s->seq, ubuf, s->readpos, cnt);
+	if (ret > 0)
+		s->readpos += ret;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(trace_seq_to_user);
 
+13 -15
lib/seq_buf.c
···
 	if (s->size == 0 || s->len == 0)
 		return;

-	seq_buf_terminate(s);
-
-	start = s->buffer;
+	start = seq_buf_str(s);
 	while ((lf = strchr(start, '\n'))) {
 		int len = lf - start + 1;

···
 	seq_buf_set_overflow(s);
 	return -1;
 }
+EXPORT_SYMBOL_GPL(seq_buf_puts);

 /**
  * seq_buf_putc - sequence printing of simple character
···
 	seq_buf_set_overflow(s);
 	return -1;
 }
+EXPORT_SYMBOL_GPL(seq_buf_putc);

 /**
  * seq_buf_putmem - write raw data into the sequenc buffer
···
  * seq_buf_to_user - copy the sequence buffer to user space
  * @s: seq_buf descriptor
  * @ubuf: The userspace memory location to copy to
+ * @start: The first byte in the buffer to copy
  * @cnt: The amount to copy
  *
  * Copies the sequence buffer into the userspace memory pointed to
- * by @ubuf. It starts from the last read position (@s->readpos)
- * and writes up to @cnt characters or till it reaches the end of
- * the content in the buffer (@s->len), which ever comes first.
+ * by @ubuf. It starts from @start and writes up to @cnt characters
+ * or until it reaches the end of the content in the buffer (@s->len),
+ * whichever comes first.
  *
  * On success, it returns a positive number of the number of bytes
  * it copied.
  *
  * On failure it returns -EBUSY if all of the content in the
  * sequence has been already read, which includes nothing in the
- * sequence (@s->len == @s->readpos).
+ * sequence (@s->len == @start).
  *
  * Returns -EFAULT if the copy to userspace fails.
  */
-int seq_buf_to_user(struct seq_buf *s, char __user *ubuf, int cnt)
+int seq_buf_to_user(struct seq_buf *s, char __user *ubuf, size_t start, int cnt)
 {
 	int len;
 	int ret;
···

 	len = seq_buf_used(s);

-	if (len <= s->readpos)
+	if (len <= start)
 		return -EBUSY;

-	len -= s->readpos;
+	len -= start;
 	if (cnt > len)
 		cnt = len;
-	ret = copy_to_user(ubuf, s->buffer + s->readpos, cnt);
+	ret = copy_to_user(ubuf, s->buffer + start, cnt);
 	if (ret == cnt)
 		return -EFAULT;

-	cnt -= ret;
-
-	s->readpos += cnt;
-	return cnt;
+	return cnt - ret;
 }

 /**
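Reading note on the hunk above: seq_buf_to_user() no longer advances a read position itself. The caller passes the offset in as @start and, as trace_seq_to_user() now does, adds the return value to its own readpos. The following is a minimal userspace sketch of that contract, not the kernel code. The names model_buf, model_reader, model_to_user, model_read and model_demo are hypothetical, memcpy() stands in for copy_to_user() so the -EFAULT path is not modelled, and the early return for a non-positive cnt is an assumption about the part of the function that the diff elides.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct seq_buf: only what this sketch needs. */
struct model_buf {
	const char *buffer;	/* content */
	size_t len;		/* number of valid bytes in buffer */
};

/* Hypothetical stand-in for struct trace_seq: the caller owns readpos. */
struct model_reader {
	struct model_buf buf;
	size_t readpos;
};

/*
 * Copy up to cnt bytes beginning at offset start.
 * Returns the number of bytes copied, or -EBUSY when start is at or
 * past the end of the content. The buffer itself keeps no read state.
 */
int model_to_user(const struct model_buf *s, char *ubuf, size_t start, int cnt)
{
	size_t avail;

	if (cnt <= 0)	/* assumption: nothing requested, nothing copied */
		return 0;

	if (s->len <= start)
		return -EBUSY;

	avail = s->len - start;
	if ((size_t)cnt > avail)
		cnt = (int)avail;

	memcpy(ubuf, s->buffer + start, (size_t)cnt);
	return cnt;
}

/* Caller side: advance readpos only when bytes were actually copied. */
int model_read(struct model_reader *r, char *ubuf, int cnt)
{
	int ret = model_to_user(&r->buf, ubuf, r->readpos, cnt);

	if (ret > 0)
		r->readpos += (size_t)ret;
	return ret;
}

/* Drain a six byte buffer in two reads, then observe -EBUSY. */
void model_demo(void)
{
	struct model_reader r = { { "hello\n", 6 }, 0 };
	char out[8];

	assert(model_read(&r, out, 4) == 4);
	assert(memcmp(out, "hell", 4) == 0);
	assert(model_read(&r, out, 4) == 2);
	assert(memcmp(out, "o\n", 2) == 0);
	assert(r.readpos == 6);
	assert(model_read(&r, out, 4) == -EBUSY);
}
```

Because the offset lives with the caller, two readers can walk the same buffer independently in this model, which the old in-buffer readpos did not allow.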
+3 -1
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
···
 esac

 : "Test get argument (1)"
-if grep -q eventfs_add_dir available_filter_functions; then
+if grep -q eventfs_create_dir available_filter_functions; then
+  DIR_NAME="eventfs_create_dir"
+elif grep -q eventfs_add_dir available_filter_functions; then
   DIR_NAME="eventfs_add_dir"
 else
   DIR_NAME="tracefs_create_dir"
+3 -1
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
···
 esac

 : "Test get argument (1)"
-if grep -q eventfs_add_dir available_filter_functions; then
+if grep -q eventfs_create_dir available_filter_functions; then
+  DIR_NAME="eventfs_create_dir"
+elif grep -q eventfs_add_dir available_filter_functions; then
   DIR_NAME="eventfs_add_dir"
 else
   DIR_NAME="tracefs_create_dir"
+54 -1
tools/testing/selftests/user_events/abi_test.c
···
 const char *data_file = "/sys/kernel/tracing/user_events_data";
 const char *enable_file = "/sys/kernel/tracing/events/user_events/__abi_event/enable";

+static bool event_exists(void)
+{
+	int fd = open(enable_file, O_RDWR);
+
+	if (fd < 0)
+		return false;
+
+	close(fd);
+
+	return true;
+}
+
 static int change_event(bool enable)
 {
 	int fd = open(enable_file, O_RDWR);
···
 	return ret;
 }

-static int reg_enable(void *enable, int size, int bit)
+static int event_delete(void)
+{
+	int fd = open(data_file, O_RDWR);
+	int ret;
+
+	if (fd < 0)
+		return -1;
+
+	ret = ioctl(fd, DIAG_IOCSDEL, "__abi_event");
+
+	close(fd);
+
+	return ret;
+}
+
+static int reg_enable_flags(void *enable, int size, int bit, int flags)
 {
 	struct user_reg reg = {0};
 	int fd = open(data_file, O_RDWR);
···

 	reg.size = sizeof(reg);
 	reg.name_args = (__u64)"__abi_event";
+	reg.flags = flags;
 	reg.enable_bit = bit;
 	reg.enable_addr = (__u64)enable;
 	reg.enable_size = size;
···
 	close(fd);

 	return ret;
+}
+
+static int reg_enable(void *enable, int size, int bit)
+{
+	return reg_enable_flags(enable, size, bit, 0);
 }

 static int reg_disable(void *enable, int bit)
···
 	ASSERT_EQ(0, change_event(true));
 	ASSERT_EQ(0, self->check);
 	ASSERT_EQ(0, change_event(false));
+}
+
+TEST_F(user, flags) {
+	/* USER_EVENT_REG_PERSIST is allowed */
+	ASSERT_EQ(0, reg_enable_flags(&self->check, sizeof(int), 0,
+				      USER_EVENT_REG_PERSIST));
+	ASSERT_EQ(0, reg_disable(&self->check, 0));
+
+	/* Ensure it exists after close and disable */
+	ASSERT_TRUE(event_exists());
+
+	/* Ensure we can delete it */
+	ASSERT_EQ(0, event_delete());
+
+	/* USER_EVENT_REG_MAX or above is not allowed */
+	ASSERT_EQ(-1, reg_enable_flags(&self->check, sizeof(int), 0,
+				       USER_EVENT_REG_MAX));
+
+	/* Ensure it does not exist after invalid flags */
+	ASSERT_FALSE(event_exists());
 }

 TEST_F(user, bit_sizes) {
+53 -1
tools/testing/selftests/user_events/dyn_test.c
···
 #include "../kselftest_harness.h"
 #include "user_events_selftests.h"

+const char *dyn_file = "/sys/kernel/tracing/dynamic_events";
 const char *abi_file = "/sys/kernel/tracing/user_events_data";
 const char *enable_file = "/sys/kernel/tracing/events/user_events/__test_event/enable";
+
+static int event_delete(void)
+{
+	int fd = open(abi_file, O_RDWR);
+	int ret;
+
+	if (fd < 0)
+		return -1;
+
+	ret = ioctl(fd, DIAG_IOCSDEL, "__test_event");
+
+	close(fd);
+
+	return ret;
+}

 static bool wait_for_delete(void)
 {
···
 	return ioctl(fd, DIAG_IOCSUNREG, &unreg);
 }

-static int parse(int *check, const char *value)
+static int parse_dyn(const char *value)
+{
+	int fd = open(dyn_file, O_RDWR | O_APPEND);
+	int len = strlen(value);
+	int ret;
+
+	if (fd == -1)
+		return -1;
+
+	ret = write(fd, value, len);
+
+	if (ret == len)
+		ret = 0;
+	else
+		ret = -1;
+
+	close(fd);
+
+	if (ret == 0)
+		event_delete();
+
+	return ret;
+}
+
+static int parse_abi(int *check, const char *value)
 {
 	int fd = open(abi_file, O_RDWR);
 	int ret;
···
 	close(fd);

 	return ret;
+}
+
+static int parse(int *check, const char *value)
+{
+	int abi_ret = parse_abi(check, value);
+	int dyn_ret = parse_dyn(value);
+
+	/* Ensure both ABI and DYN parse the same way */
+	if (dyn_ret != abi_ret)
+		return -1;
+
+	return dyn_ret;
 }

 static int check_match(int *check, const char *first, const char *second, bool *match)