Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

eventfs: Hold eventfs_mutex and SRCU when remount walks events

Commit 340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the
events descriptor") had eventfs_set_attrs() recurse through ei->children
on remount. The walk only holds the rcu_read_lock() taken by
tracefs_apply_options() over tracefs_inodes, which is wrong:

- list_for_each_entry over ei->children races with the list_del_rcu()
  in eventfs_remove_rec() -- LIST_POISON1 deref, same shape as
  d2603279c7d6.
- eventfs_inodes are freed via call_srcu(&eventfs_srcu, ...).
  rcu_read_lock() does not extend an SRCU grace period, so ti->private
  can be reclaimed under the walk.
- The writes to ei->attr race with eventfs_set_attr(), which holds
  eventfs_mutex.
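
The second point can be pictured with a toy, single-threaded model. This is only a sketch, not kernel code: `struct toy_domain` and `toy_srcu_callback_may_run()` are hypothetical names invented here. The one assumption it encodes is that a callback queued with call_srcu() waits for readers that entered through srcu_read_lock() on the same domain, and does not consult rcu_read_lock() readers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: counts of readers currently inside each kind of section. */
struct toy_domain {
	int rcu_readers;	/* readers under plain rcu_read_lock() */
	int srcu_readers;	/* readers under srcu_read_lock() on this domain */
};

/*
 * Models when a callback queued via call_srcu() may run: only SRCU
 * readers of the domain delay it. rcu_readers is deliberately ignored.
 */
static bool toy_srcu_callback_may_run(const struct toy_domain *d)
{
	assert(d->rcu_readers >= 0 && d->srcu_readers >= 0);
	return d->srcu_readers == 0;
}
```

With one reader under rcu_read_lock() only, the model lets the free callback run, which corresponds to the walk touching reclaimed memory. Once the walker also counts as an SRCU reader, the callback is held back.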

Reproducer:

  while :; do mount -o remount,uid=$((RANDOM%1000)) /sys/kernel/tracing; done &
  while :; do
    echo "p:kp submit_bio" > /sys/kernel/tracing/kprobe_events
    echo > /sys/kernel/tracing/kprobe_events
  done

Wrap the events portion of tracefs_apply_options() in
eventfs_remount_lock()/_unlock() that take eventfs_mutex and
srcu_read_lock(&eventfs_srcu). eventfs_set_attrs() doesn't sleep so the
nested rcu_read_lock() is fine; lockdep_assert_held() pins the contract.
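
The acquire and release order of the new helpers can be sketched in userspace with single-threaded stand-ins. Every `toy_*` name below is hypothetical and exists only for this illustration; the real helpers use mutex_lock(), srcu_read_lock() and lockdep_assert_held() as shown in the diff. The sketch only models ordering and the "must be held" contract, not real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for eventfs_mutex and eventfs_srcu. */
struct toy_mutex { bool held; };
struct toy_srcu { int idx; int readers[2]; };

static struct toy_mutex toy_eventfs_mutex;
static struct toy_srcu toy_eventfs_srcu;

static void toy_mutex_lock(struct toy_mutex *m)
{
	assert(!m->held);	/* the toy mutex is not recursive */
	m->held = true;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/* Returns the slot index the reader registered in, like srcu_read_lock(). */
static int toy_srcu_read_lock(struct toy_srcu *s)
{
	int idx = s->idx & 1;

	s->readers[idx]++;
	return idx;
}

static void toy_srcu_read_unlock(struct toy_srcu *s, int idx)
{
	assert(s->readers[idx] > 0);
	s->readers[idx]--;
}

/* Same shape as eventfs_remount_lock(): mutex first, then the SRCU read side. */
static int toy_remount_lock(void)
{
	toy_mutex_lock(&toy_eventfs_mutex);
	return toy_srcu_read_lock(&toy_eventfs_srcu);
}

/* Same shape as eventfs_remount_unlock(): release in reverse order. */
static void toy_remount_unlock(int srcu_idx)
{
	toy_srcu_read_unlock(&toy_eventfs_srcu, srcu_idx);
	toy_mutex_unlock(&toy_eventfs_mutex);
}

/* What the walk relies on: the mutex is held and an SRCU reader is active. */
static bool toy_walk_is_protected(void)
{
	return toy_eventfs_mutex.held &&
	       toy_eventfs_srcu.readers[0] + toy_eventfs_srcu.readers[1] > 0;
}
```

A caller would bracket the walk with `int idx = toy_remount_lock();` and `toy_remount_unlock(idx);`, and `toy_walk_is_protected()` is true only between the two calls.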

Comment in tracefs_drop_inode() said "RCU cycle" -- it is SRCU.

Fixes: 340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the events descriptor")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260418191737.10289-1-devnexen@gmail.com
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Authored by David Carlier and committed by Steven Rostedt
07004a8c f67950b2

+21 -1
+14
fs/tracefs/event_inode.c
···
 {
 	struct eventfs_inode *ei_child;
 
+	lockdep_assert_held(&eventfs_mutex);
+
 	/* Update events/<system>/<event> */
 	if (WARN_ON_ONCE(level > 3))
 		return;
···
 	 */
 	d_invalidate(dentry);
 	d_make_discardable(dentry);
+}
+
+int eventfs_remount_lock(void)
+{
+	mutex_lock(&eventfs_mutex);
+	return srcu_read_lock(&eventfs_srcu);
+}
+
+void eventfs_remount_unlock(int srcu_idx)
+{
+	srcu_read_unlock(&eventfs_srcu, srcu_idx);
+	mutex_unlock(&eventfs_mutex);
 }
+4 -1
fs/tracefs/inode.c
···
 	struct inode *inode = d_inode(sb->s_root);
 	struct tracefs_inode *ti;
 	bool update_uid, update_gid;
+	int srcu_idx;
 	umode_t tmp_mode;
 
 	/*
···
 		update_uid = fsi->opts & BIT(Opt_uid);
 		update_gid = fsi->opts & BIT(Opt_gid);
 
+		srcu_idx = eventfs_remount_lock();
 		rcu_read_lock();
 		list_for_each_entry_rcu(ti, &tracefs_inodes, list) {
 			if (update_uid) {
···
 				eventfs_remount(ti, update_uid, update_gid);
 		}
 		rcu_read_unlock();
+		eventfs_remount_unlock(srcu_idx);
 	}
 
 	return 0;
···
 	 * This inode is being freed and cannot be used for
 	 * eventfs. Clear the flag so that it doesn't call into
 	 * eventfs during the remount flag updates. The eventfs_inode
-	 * gets freed after an RCU cycle, so the content will still
+	 * gets freed after an SRCU cycle, so the content will still
 	 * be safe if the iteration is going on now.
 	 */
 	ti->flags &= ~TRACEFS_EVENT_INODE;
+3
fs/tracefs/internal.h
···
 void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid);
 void eventfs_d_release(struct dentry *dentry);
 
+int eventfs_remount_lock(void);
+void eventfs_remount_unlock(int srcu_idx);
+
 #endif /* _TRACEFS_INTERNAL_H */