Merge tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs pidfs updates from Christian Brauner:

- Allow retrieving exit information after a process has been reaped
through pidfds via the new PIDFD_INFO_EXIT extension for the
PIDFD_GET_INFO ioctl. Various tools need access to information about
a process/task even after it has already been reaped.

Pidfd polling allows waiting on either task exit or for a task to
have been reaped. The contract for PIDFD_INFO_EXIT is simply that
EPOLLHUP must be observed before exit information can be retrieved,
i.e., exit information is only provided once the task has been reaped
and then can be retrieved as long as the pidfd is open.

- Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to
forgo allocating a file descriptor for their own process. This is
useful in scenarios where users want to act on their own process
through pidfds and is akin to AT_FDCWD.

- Improve premature thread-group leader and subthread exec behavior
when polling on pidfds:

(1) During a multi-threaded exec by a subthread, i.e.,
non-thread-group leader thread, all other threads in the
thread-group including the thread-group leader are killed and the
struct pid of the thread-group leader will be taken over by the
subthread that called exec. IOW, two tasks change their TIDs.

(2) A premature thread-group leader exit means that the thread-group
leader exited before all of the other subthreads in the
thread-group have exited.

Both cases lead to inconsistencies for pidfd polling with
PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the
current thread-group leader may or may not see an exit notification
on the file descriptor depending on when poll is performed. If the
poll is performed before the exec of the subthread has concluded, an
exit notification is generated for the old thread-group leader. If
the poll is performed after the exec of the subthread has concluded,
no exit notification is generated for the old thread-group leader.

The correct behavior is to simply not generate an exit notification
on the struct pid of a subthread exec because the struct pid is
taken over by the subthread and thus remains alive.

But this is difficult to handle because a thread-group leader may
exit prematurely as mentioned in (2). In that case an exit
notification is reliably generated but the subthreads may continue to
run for an indeterminate amount of time and thus also may exec at
some point.

After this pull no exit notifications will be generated for a
PIDFD_THREAD pidfd for a thread-group leader until all subthreads
have been reaped. If a subthread execs before that, no exit
notification will be generated until that task exits or it creates
subthreads and repeats the cycle.

This means an exit notification indicates the ability for the parent
to reap the child.
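
The polling change described in the last item can be sketched as a small userspace model of the decision made in pidfd_poll(). This is only an illustration: `task_model`, `poll_before()`, `poll_after()` and the `MODEL_*` bits are hypothetical names, and delay_group_leader() is modeled as "thread-group leader with subthreads remaining", which is an assumption about that helper since its body is not part of this pull.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the EPOLL* bits; only this model uses them. */
enum { MODEL_IN = 1, MODEL_HUP = 2 };

/* Hypothetical snapshot of the task state that the poll handler inspects. */
struct task_model {
	bool reaped;      /* no task is attached to the struct pid anymore */
	bool exited;      /* task->exit_state is non-zero */
	bool is_leader;   /* task is the thread-group leader */
	bool group_empty; /* no other thread of the group is left */
};

/* Assumption: a leader that exited while subthreads remain is "delayed". */
static bool delay_group_leader_model(const struct task_model *t)
{
	return t->is_leader && !t->group_empty;
}

/* Old condition: PIDFD_THREAD pidfds were notified as soon as the task exited. */
static int poll_before(const struct task_model *t, bool pidfd_thread)
{
	assert(t != NULL);
	if (t->reaped)
		return MODEL_IN | MODEL_HUP;
	if (t->exited && (pidfd_thread || t->group_empty))
		return MODEL_IN;
	return 0;
}

/* New condition: a prematurely exited leader stays silent. */
static int poll_after(const struct task_model *t)
{
	assert(t != NULL);
	if (t->reaped)
		return MODEL_IN | MODEL_HUP;
	if (t->exited && !delay_group_leader_model(t))
		return MODEL_IN;
	return 0;
}
```

In this model the only input whose result changes is an exited leader with live subthreads observed through a PIDFD_THREAD pidfd: `poll_before()` reports it as readable, `poll_after()` reports nothing until the group is empty.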

* tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
selftests/pidfd: third test for multi-threaded exec polling
selftests/pidfd: second test for multi-threaded exec polling
selftests/pidfd: first test for multi-threaded exec polling
pidfs: improve multi-threaded exec and premature thread-group leader exit polling
pidfs: ensure that PIDFS_INFO_EXIT is available
selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest
selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest
selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
selftests/pidfd: add third PIDFD_INFO_EXIT selftest
selftests/pidfd: add second PIDFD_INFO_EXIT selftest
selftests/pidfd: add first PIDFD_INFO_EXIT selftest
selftests/pidfd: expand common pidfd header
pidfs/selftests: ensure correct headers for ioctl handling
selftests/pidfd: fix header inclusion
pidfs: allow to retrieve exit information
pidfs: record exit code and cgroupid at exit
pidfs: use private inode slab cache
pidfs: move setting flags into pidfs_alloc_file()
pidfd: rely on automatic cleanup in __pidfd_prepare()
...

Linus Torvalds 1 year ago df00ded2 71ee2fde

+1249 -200

19 changed files

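fs/internal.h
fs/libfs.c
fs/pidfs.c
include/linux/pidfs.h
include/uapi/linux/pidfd.h
kernel/exit.c
kernel/fork.c
kernel/pid.c
kernel/signal.c
tools/testing/selftests/mm/guard-pages.c
tools/testing/selftests/pidfd/.gitignore
tools/testing/selftests/pidfd/Makefile
tools/testing/selftests/pidfd/pidfd.h
tools/testing/selftests/pidfd/pidfd_exec_helper.c
tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
tools/testing/selftests/pidfd/pidfd_info_test.c
tools/testing/selftests/pidfd/pidfd_open_test.c
tools/testing/selftests/pidfd/pidfd_setns_test.c
tools/testing/selftests/pidfd/pidfd_test.c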

fs/internal.h

···
 int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
 		      struct path *path);
 void stashed_dentry_prune(struct dentry *dentry);
+struct dentry *stashed_dentry_get(struct dentry **stashed);
 /**
  * path_mounted - check whether path is mounted
  * @path: path to check

+2 -2

fs/libfs.c

···
 }
 EXPORT_SYMBOL(simple_inode_init_ts);
 
-static inline struct dentry *get_stashed_dentry(struct dentry **stashed)
+struct dentry *stashed_dentry_get(struct dentry **stashed)
 {
 	struct dentry *dentry;
 
···
 	const struct stashed_operations *sops = mnt->mnt_sb->s_fs_info;
 
 	/* See if dentry can be reused. */
-	path->dentry = get_stashed_dentry(stashed);
+	path->dentry = stashed_dentry_get(stashed);
 	if (path->dentry) {
 		sops->put_data(data);
 		goto out_path;

+221 -26

fs/pidfs.c

···
 #include "internal.h"
 #include "mount.h"
 
+static struct kmem_cache *pidfs_cachep __ro_after_init;
+
+/*
+ * Stashes information that userspace needs to access even after the
+ * process has been reaped.
+ */
+struct pidfs_exit_info {
+	__u64 cgroupid;
+	__s32 exit_code;
+};
+
+struct pidfs_inode {
+	struct pidfs_exit_info __pei;
+	struct pidfs_exit_info *exit_info;
+	struct inode vfs_inode;
+};
+
+static inline struct pidfs_inode *pidfs_i(struct inode *inode)
+{
+	return container_of(inode, struct pidfs_inode, vfs_inode);
+}
+
 static struct rb_root pidfs_ino_tree = RB_ROOT;
 
 #if BITS_PER_LONG == 32
···
 static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
 {
 	struct pid *pid = pidfd_pid(file);
-	bool thread = file->f_flags & PIDFD_THREAD;
 	struct task_struct *task;
 	__poll_t poll_flags = 0;
 
 	poll_wait(file, &pid->wait_pidfd, pts);
 	/*
-	 * Depending on PIDFD_THREAD, inform pollers when the thread
-	 * or the whole thread-group exits.
+	 * Don't wake waiters if the thread-group leader exited
+	 * prematurely. They either get notified when the last subthread
+	 * exits or not at all if one of the remaining subthreads execs
+	 * and assumes the struct pid of the old thread-group leader.
 	 */
 	guard(rcu)();
 	task = pid_task(pid, PIDTYPE_PID);
 	if (!task)
 		poll_flags = EPOLLIN | EPOLLRDNORM | EPOLLHUP;
-	else if (task->exit_state && (thread || thread_group_empty(task)))
+	else if (task->exit_state && !delay_group_leader(task))
 		poll_flags = EPOLLIN | EPOLLRDNORM;
 
 	return poll_flags;
 }
 
-static long pidfd_info(struct task_struct *task, unsigned int cmd, unsigned long arg)
+static inline bool pid_in_current_pidns(const struct pid *pid)
+{
+	const struct pid_namespace *ns = task_active_pid_ns(current);
+
+	if (ns->level <= pid->level)
+		return pid->numbers[ns->level].ns == ns;
+
+	return false;
+}
+
+static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct pidfd_info __user *uinfo = (struct pidfd_info __user *)arg;
+	struct inode *inode = file_inode(file);
+	struct pid *pid = pidfd_pid(file);
 	size_t usize = _IOC_SIZE(cmd);
 	struct pidfd_info kinfo = {};
+	struct pidfs_exit_info *exit_info;
 	struct user_namespace *user_ns;
+	struct task_struct *task;
 	const struct cred *c;
 	__u64 mask;
-#ifdef CONFIG_CGROUPS
-	struct cgroup *cgrp;
-#endif
 
 	if (!uinfo)
 		return -EINVAL;
···
 
 	if (copy_from_user(&mask, &uinfo->mask, sizeof(mask)))
 		return -EFAULT;
+
+	/*
+	 * Restrict information retrieval to tasks within the caller's pid
+	 * namespace hierarchy.
+	 */
+	if (!pid_in_current_pidns(pid))
+		return -ESRCH;
+
+	if (mask & PIDFD_INFO_EXIT) {
+		exit_info = READ_ONCE(pidfs_i(inode)->exit_info);
+		if (exit_info) {
+			kinfo.mask |= PIDFD_INFO_EXIT;
+#ifdef CONFIG_CGROUPS
+			kinfo.cgroupid = exit_info->cgroupid;
+			kinfo.mask |= PIDFD_INFO_CGROUPID;
+#endif
+			kinfo.exit_code = exit_info->exit_code;
+		}
+	}
+
+	task = get_pid_task(pid, PIDTYPE_PID);
+	if (!task) {
+		/*
+		 * If the task has already been reaped, only exit
+		 * information is available
+		 */
+		if (!(mask & PIDFD_INFO_EXIT))
+			return -ESRCH;
+
+		goto copy_out;
+	}
 
 	c = get_task_cred(task);
 	if (!c)
···
 	put_cred(c);
 
 #ifdef CONFIG_CGROUPS
-	rcu_read_lock();
-	cgrp = task_dfl_cgroup(task);
-	kinfo.cgroupid = cgroup_id(cgrp);
-	kinfo.mask |= PIDFD_INFO_CGROUPID;
-	rcu_read_unlock();
+	if (!kinfo.cgroupid) {
+		struct cgroup *cgrp;
+
+		rcu_read_lock();
+		cgrp = task_dfl_cgroup(task);
+		kinfo.cgroupid = cgroup_id(cgrp);
+		kinfo.mask |= PIDFD_INFO_CGROUPID;
+		rcu_read_unlock();
+	}
 #endif
 
 	/*
···
 	if (kinfo.pid == 0 || kinfo.tgid == 0 || (kinfo.ppid == 0 && kinfo.pid != 1))
 		return -ESRCH;
 
+copy_out:
 	/*
 	 * If userspace and the kernel have the same struct size it can just
 	 * be copied. If userspace provides an older struct, only the bits that
 	 * userspace knows about will be copied. If userspace provides a new
 	 * struct, only the bits that the kernel knows about will be copied.
 	 */
-	if (copy_to_user(uinfo, &kinfo, min(usize, sizeof(kinfo))))
-		return -EFAULT;
-
-	return 0;
+	return copy_struct_to_user(uinfo, usize, &kinfo, sizeof(kinfo), NULL);
 }
 
 static bool pidfs_ioctl_valid(unsigned int cmd)
···
 {
 	struct task_struct *task __free(put_task) = NULL;
 	struct nsproxy *nsp __free(put_nsproxy) = NULL;
-	struct pid *pid = pidfd_pid(file);
 	struct ns_common *ns_common = NULL;
 	struct pid_namespace *pid_ns;
 
···
 		return put_user(file_inode(file)->i_generation, argp);
 	}
 
-	task = get_pid_task(pid, PIDTYPE_PID);
-	if (!task)
-		return -ESRCH;
-
 	/* Extensible IOCTL that does not open namespace FDs, take a shortcut */
 	if (_IOC_NR(cmd) == _IOC_NR(PIDFD_GET_INFO))
-		return pidfd_info(task, cmd, arg);
+		return pidfd_info(file, cmd, arg);
+
+	task = get_pid_task(pidfd_pid(file), PIDTYPE_PID);
+	if (!task)
+		return -ESRCH;
 
 	if (arg)
 		return -EINVAL;
···
 	return file_inode(file)->i_private;
 }
 
+/*
+ * We're called from release_task(). We know there's at least one
+ * reference to struct pid being held that won't be released until the
+ * task has been reaped which cannot happen until we're out of
+ * release_task().
+ *
+ * If this struct pid is referred to by a pidfd then
+ * stashed_dentry_get() will return the dentry and inode for that struct
+ * pid. Since we've taken a reference on it there's now an additional
+ * reference from the exit path on it. Which is fine. We're going to put
+ * it again in a second and we know that the pid is kept alive anyway.
+ *
+ * Worst case is that we've filled in the info and immediately free the
+ * dentry and inode afterwards since the pidfd has been closed. Since
+ * pidfs_exit() currently is placed after exit_task_work() we know that
+ * it cannot be us aka the exiting task holding a pidfd to ourselves.
+ */
+void pidfs_exit(struct task_struct *tsk)
+{
+	struct dentry *dentry;
+
+	might_sleep();
+
+	dentry = stashed_dentry_get(&task_pid(tsk)->stashed);
+	if (dentry) {
+		struct inode *inode = d_inode(dentry);
+		struct pidfs_exit_info *exit_info = &pidfs_i(inode)->__pei;
+#ifdef CONFIG_CGROUPS
+		struct cgroup *cgrp;
+
+		rcu_read_lock();
+		cgrp = task_dfl_cgroup(tsk);
+		exit_info->cgroupid = cgroup_id(cgrp);
+		rcu_read_unlock();
+#endif
+		exit_info->exit_code = tsk->exit_code;
+
+		/* Ensure that PIDFD_GET_INFO sees either all or nothing. */
+		smp_store_release(&pidfs_i(inode)->exit_info, &pidfs_i(inode)->__pei);
+		dput(dentry);
+	}
+}
+
 static struct vfsmount *pidfs_mnt __ro_after_init;
 
 /*
···
 	put_pid(pid);
 }
 
+static struct inode *pidfs_alloc_inode(struct super_block *sb)
+{
+	struct pidfs_inode *pi;
+
+	pi = alloc_inode_sb(sb, pidfs_cachep, GFP_KERNEL);
+	if (!pi)
+		return NULL;
+
+	memset(&pi->__pei, 0, sizeof(pi->__pei));
+	pi->exit_info = NULL;
+
+	return &pi->vfs_inode;
+}
+
+static void pidfs_free_inode(struct inode *inode)
+{
+	kmem_cache_free(pidfs_cachep, pidfs_i(inode));
+}
+
 static const struct super_operations pidfs_sops = {
+	.alloc_inode	= pidfs_alloc_inode,
 	.drop_inode	= generic_delete_inode,
 	.evict_inode	= pidfs_evict_inode,
+	.free_inode	= pidfs_free_inode,
 	.statfs		= simple_statfs,
 };
 
···
 	return 0;
 }
 
+static inline bool pidfs_pid_valid(struct pid *pid, const struct path *path,
+				   unsigned int flags)
+{
+	enum pid_type type;
+
+	if (flags & PIDFD_CLONE)
+		return true;
+
+	/*
+	 * Make sure that if a pidfd is created PIDFD_INFO_EXIT
+	 * information will be available. So after an inode for the
+	 * pidfd has been allocated perform another check that the pid
+	 * is still alive. If it is exit information is available even
+	 * if the task gets reaped before the pidfd is returned to
+	 * userspace. The only exception is PIDFD_CLONE where no task
+	 * linkage has been established for @pid yet and the kernel is
+	 * in the middle of process creation so there's nothing for
+	 * pidfs to miss.
+	 */
+	if (flags & PIDFD_THREAD)
+		type = PIDTYPE_PID;
+	else
+		type = PIDTYPE_TGID;
+
+	/*
+	 * Since pidfs_exit() is called before struct pid's task linkage
+	 * is removed the case where the task got reaped but a dentry
+	 * was already attached to struct pid and exit information was
+	 * recorded and published can be handled correctly.
+	 */
+	if (unlikely(!pid_has_task(pid, type))) {
+		struct inode *inode = d_inode(path->dentry);
+		return !!READ_ONCE(pidfs_i(inode)->exit_info);
+	}
+
+	return true;
+}
+
 static struct file *pidfs_export_open(struct path *path, unsigned int oflags)
 {
+	if (!pidfs_pid_valid(d_inode(path->dentry)->i_private, path, oflags))
+		return ERR_PTR(-ESRCH);
+
 	/*
 	 * Clear O_LARGEFILE as open_by_handle_at() forces it and raise
 	 * O_RDWR as pidfds always are.
···
 
 struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
 {
-
 	struct file *pidfd_file;
-	struct path path;
+	struct path path __free(path_put) = {};
 	int ret;
+
+	/*
+	 * Ensure that PIDFD_CLONE can be passed as a flag without
+	 * overloading other uapi pidfd flags.
+	 */
+	BUILD_BUG_ON(PIDFD_CLONE == PIDFD_THREAD);
+	BUILD_BUG_ON(PIDFD_CLONE == PIDFD_NONBLOCK);
 
 	ret = path_from_stashed(&pid->stashed, pidfs_mnt, get_pid(pid), &path);
 	if (ret < 0)
 		return ERR_PTR(ret);
 
+	if (!pidfs_pid_valid(pid, &path, flags))
+		return ERR_PTR(-ESRCH);
+
+	flags &= ~PIDFD_CLONE;
 	pidfd_file = dentry_open(&path, flags, current_cred());
-	path_put(&path);
+	/* Raise PIDFD_THREAD explicitly as do_dentry_open() strips it. */
+	if (!IS_ERR(pidfd_file))
+		pidfd_file->f_flags |= (flags & PIDFD_THREAD);
+
 	return pidfd_file;
+}
+
+static void pidfs_inode_init_once(void *data)
+{
+	struct pidfs_inode *pi = data;
+
+	inode_init_once(&pi->vfs_inode);
 }
 
 void __init pidfs_init(void)
 {
+	pidfs_cachep = kmem_cache_create("pidfs_cache", sizeof(struct pidfs_inode), 0,
+					 (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
+					  SLAB_ACCOUNT | SLAB_PANIC),
+					 pidfs_inode_init_once);
 	pidfs_mnt = kern_mount(&pidfs_type);
 	if (IS_ERR(pidfs_mnt))
 		panic("Failed to mount pidfs pseudo filesystem");
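
The fs/pidfs.c change above replaces the open-coded copy_to_user() with copy_struct_to_user() for the extensible struct pidfd_info. The size handling can be sketched with a userspace model; `copy_struct_model()` is a hypothetical name, and the zero-filling of the trailing bytes of a larger userspace struct is an assumption about the kernel helper (the comment in the diff only states that the known bits are copied).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of an extensible-struct copy, not kernel code.
 * usize: size the caller's struct has, ksize: size the "kernel" struct has.
 */
static int copy_struct_model(void *dst, size_t usize,
			     const void *src, size_t ksize,
			     bool *ignored_trailing)
{
	size_t size = usize < ksize ? usize : ksize;
	size_t rest = (usize > ksize ? usize : ksize) - size;

	assert(dst != NULL && src != NULL);

	/* Newer caller, older producer: clear the fields the producer lacks. */
	if (usize > ksize)
		memset((char *)dst + size, 0, rest);

	/* Older caller, newer producer: report whether non-zero data was cut. */
	if (ignored_trailing) {
		const unsigned char *tail = (const unsigned char *)src + size;

		*ignored_trailing = false;
		for (size_t i = 0; ksize > usize && i < rest; i++) {
			if (tail[i]) {
				*ignored_trailing = true;
				break;
			}
		}
	}

	/* Copy the part both sides know about. */
	memcpy(dst, src, size);
	return 0;
}
```

pidfd_info() passes NULL for the last argument, so in this model it never learns whether an older caller missed non-zero fields such as exit_code.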

include/linux/pidfs.h

···
 void __init pidfs_init(void);
 void pidfs_add_pid(struct pid *pid);
 void pidfs_remove_pid(struct pid *pid);
+void pidfs_exit(struct task_struct *tsk);
 extern const struct dentry_operations pidfs_dentry_operations;
 
 #endif /* _LINUX_PID_FS_H */

+30 -1

include/uapi/linux/pidfd.h

···
 /* Flags for pidfd_open().  */
 #define PIDFD_NONBLOCK	O_NONBLOCK
 #define PIDFD_THREAD	O_EXCL
+#ifdef __KERNEL__
+#include <linux/sched.h>
+#define PIDFD_CLONE CLONE_PIDFD
+#endif
 
 /* Flags for pidfd_send_signal(). */
 #define PIDFD_SIGNAL_THREAD	(1UL << 0)
···
 #define PIDFD_INFO_PID	(1UL << 0) /* Always returned, even if not requested */
 #define PIDFD_INFO_CREDS	(1UL << 1) /* Always returned, even if not requested */
 #define PIDFD_INFO_CGROUPID	(1UL << 2) /* Always returned if available, even if not requested */
+#define PIDFD_INFO_EXIT	(1UL << 3) /* Only returned if requested. */
 
 #define PIDFD_INFO_SIZE_VER0	64 /* sizeof first published struct */
+
+/*
+ * The concept of process and threads in userland and the kernel is a confusing
+ * one - within the kernel every thread is a 'task' with its own individual PID,
+ * however from userland's point of view threads are grouped by a single PID,
+ * which is that of the 'thread group leader', typically the first thread
+ * spawned.
+ *
+ * To cut the Gideon knot, for internal kernel usage, we refer to
+ * PIDFD_SELF_THREAD to refer to the current thread (or task from a kernel
+ * perspective), and PIDFD_SELF_THREAD_GROUP to refer to the current thread
+ * group leader...
+ */
+#define PIDFD_SELF_THREAD	-10000 /* Current thread. */
+#define PIDFD_SELF_THREAD_GROUP	-20000 /* Current thread group leader. */
+
+/*
+ * ...and for userland we make life simpler - PIDFD_SELF refers to the current
+ * thread, PIDFD_SELF_PROCESS refers to the process thread group leader.
+ *
+ * For nearly all practical uses, a user will want to use PIDFD_SELF.
+ */
+#define PIDFD_SELF	PIDFD_SELF_THREAD
+#define PIDFD_SELF_PROCESS	PIDFD_SELF_THREAD_GROUP
 
 struct pidfd_info {
 	/*
···
 	__u32 sgid;
 	__u32 fsuid;
 	__u32 fsgid;
-	__u32 spare0[1];
+	__s32 exit_code;
 };
 
 #define PIDFS_IOCTL_MAGIC 0xFF
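
With PIDFD_INFO_EXIT and the exit_code field added to the uapi header, a userspace caller might retrieve the exit status of an already reaped child roughly as follows. This is a minimal sketch under stated assumptions: `sketch_pidfd_info`, the `SKETCH_*` macros and `exit_status_after_reap()` are hypothetical names that mirror the layout and ioctl number shown in this pull so the example builds against older headers, and exit_code is assumed to carry the raw wait-style status because pidfs_exit() copies it from tsk->exit_code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Local mirror of struct pidfd_info; renamed to avoid header clashes. */
struct sketch_pidfd_info {
	uint64_t mask;
	uint64_t cgroupid;
	uint32_t pid, tgid, ppid;
	uint32_t ruid, rgid, euid, egid, suid, sgid, fsuid, fsgid;
	int32_t exit_code;
};

#define SKETCH_PIDFD_INFO_EXIT (1UL << 3)
#define SKETCH_PIDFD_GET_INFO _IOWR(0xFF, 11, struct sketch_pidfd_info)

/*
 * Forks a child that exits with child_exit, reaps it, then asks the pidfd
 * for the exit status. Returns 0 on success, -1 if the information is not
 * available (for example on kernels without PIDFD_INFO_EXIT).
 */
static int exit_status_after_reap(int child_exit, int *status_out)
{
	struct sketch_pidfd_info info;
	struct pollfd pfd;
	int pidfd, ret = -1;
	pid_t pid;

	assert(status_out != NULL);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(child_exit);

	/* Open the pidfd before reaping so exit information gets recorded. */
	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (waitpid(pid, NULL, 0) != pid || pidfd < 0)
		goto out;

	/* Contract: the hangup event must be observed before asking. */
	pfd.fd = pidfd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	if (poll(&pfd, 1, 1000) <= 0 || !(pfd.revents & POLLHUP))
		goto out;

	memset(&info, 0, sizeof(info));
	info.mask = SKETCH_PIDFD_INFO_EXIT;
	if (ioctl(pidfd, SKETCH_PIDFD_GET_INFO, &info) < 0)
		goto out;
	if (!(info.mask & SKETCH_PIDFD_INFO_EXIT))
		goto out;

	*status_out = info.exit_code;
	ret = 0;
out:
	if (pidfd >= 0)
		close(pidfd);
	return ret;
}
```

On success the caller can decode `*status_out` with WIFEXITED() and WEXITSTATUS(), again assuming the raw wait-status encoding.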

+5 -3

kernel/exit.c

···
 #include <linux/sysfs.h>
 #include <linux/user_events.h>
 #include <linux/uaccess.h>
+#include <linux/pidfs.h>
 
 #include <uapi/linux/wait.h>
 
···
 	dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
 	rcu_read_unlock();
 
+	pidfs_exit(p);
 	cgroup_release(p);
 
 	write_lock_irq(&tasklist_lock);
···
 
 	tsk->exit_state = EXIT_ZOMBIE;
 	/*
-	 * sub-thread or delay_group_leader(), wake up the
-	 * PIDFD_THREAD waiters.
+	 * Ignore thread-group leaders that exited before all
+	 * subthreads did.
 	 */
-	if (!thread_group_empty(tsk))
+	if (!delay_group_leader(tsk))
 		do_notify_pidfd(tsk);
 
 	if (unlikely(tsk->ptrace)) {

+9 -13

kernel/fork.c

···
  */
 static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
 {
-	int pidfd;
 	struct file *pidfd_file;
 
-	pidfd = get_unused_fd_flags(O_CLOEXEC);
+	CLASS(get_unused_fd, pidfd)(O_CLOEXEC);
 	if (pidfd < 0)
 		return pidfd;
 
 	pidfd_file = pidfs_alloc_file(pid, flags | O_RDWR);
-	if (IS_ERR(pidfd_file)) {
-		put_unused_fd(pidfd);
+	if (IS_ERR(pidfd_file))
 		return PTR_ERR(pidfd_file);
-	}
-	/*
-	 * anon_inode_getfile() ignores everything outside of the
-	 * O_ACCMODE | O_NONBLOCK mask, set PIDFD_THREAD manually.
-	 */
-	pidfd_file->f_flags |= (flags & PIDFD_THREAD);
+
 	*ret = pidfd_file;
-	return pidfd;
+	return take_fd(pidfd);
 }
 
 /**
···
 	if (clone_flags & CLONE_PIDFD) {
 		int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
 
-		/* Note that no task has been attached to @pid yet. */
-		retval = __pidfd_prepare(pid, flags, &pidfile);
+		/*
+		 * Note that no task has been attached to @pid yet indicate
+		 * that via CLONE_PIDFD.
+		 */
+		retval = __pidfd_prepare(pid, flags | PIDFD_CLONE, &pidfile);
 		if (retval < 0)
 			goto bad_fork_free_pid;
 		pidfd = retval;
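
The __pidfd_prepare() change above swaps a manual put_unused_fd() on the error path for scope-based cleanup with take_fd() on success. A userspace analogy using the GCC/Clang cleanup attribute is sketched below; `SCOPED_FD`, `take_fd_model()` and `dup_or_fail()` are hypothetical names, and the analogy closes a real descriptor where the kernel only releases a reserved descriptor number.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Runs when a SCOPED_FD variable goes out of scope. */
static void close_fd_cleanup(int *fd)
{
	if (*fd >= 0)
		close(*fd);
}

#define SCOPED_FD __attribute__((cleanup(close_fd_cleanup))) int

/* Hands ownership to the caller and disarms the cleanup. */
static int take_fd_model(int *fd)
{
	int ret = *fd;

	*fd = -1;
	return ret;
}

/*
 * Duplicates src. When fail is non-zero an error after acquisition is
 * simulated; the descriptor is then closed automatically on return.
 * probe, if not NULL, receives the descriptor number that was acquired.
 */
static int dup_or_fail(int src, int fail, int *probe)
{
	SCOPED_FD fd = fcntl(src, F_DUPFD_CLOEXEC, 0);

	if (fd < 0)
		return -1;
	if (probe != NULL)
		*probe = fd;
	if (fail)
		return -1;		/* cleanup closes fd here */

	return take_fd_model(&fd);	/* cleanup sees -1 and does nothing */
}
```

The design point is the same as in the diff: every early return releases the resource without an explicit error label, and only the success path opts out.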

+19 -5

kernel/pid.c

···
  */
 struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags)
 {
-	unsigned int f_flags;
+	unsigned int f_flags = 0;
 	struct pid *pid;
 	struct task_struct *task;
+	enum pid_type type;
 
-	pid = pidfd_get_pid(pidfd, &f_flags);
-	if (IS_ERR(pid))
-		return ERR_CAST(pid);
+	switch (pidfd) {
+	case PIDFD_SELF_THREAD:
+		type = PIDTYPE_PID;
+		pid = get_task_pid(current, type);
+		break;
+	case PIDFD_SELF_THREAD_GROUP:
+		type = PIDTYPE_TGID;
+		pid = get_task_pid(current, type);
+		break;
+	default:
+		pid = pidfd_get_pid(pidfd, &f_flags);
+		if (IS_ERR(pid))
+			return ERR_CAST(pid);
+		type = PIDTYPE_TGID;
+		break;
+	}
 
-	task = get_pid_task(pid, PIDTYPE_TGID);
+	task = get_pid_task(pid, type);
 	put_pid(pid);
 	if (!task)
 		return ERR_PTR(-ESRCH);

+70 -50

kernel/signal.c

···
 	WARN_ON_ONCE(!tsk->ptrace &&
 	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
 	/*
-	 * tsk is a group leader and has no threads, wake up the
-	 * non-PIDFD_THREAD waiters.
+	 * Notify for thread-group leaders without subthreads.
 	 */
 	if (thread_group_empty(tsk))
 		do_notify_pidfd(tsk);
···
 	(PIDFD_SIGNAL_THREAD | PIDFD_SIGNAL_THREAD_GROUP | \
 	 PIDFD_SIGNAL_PROCESS_GROUP)
 
-/**
- * sys_pidfd_send_signal - Signal a process through a pidfd
- * @pidfd:  file descriptor of the process
- * @sig:    signal to send
- * @info:   signal info
- * @flags:  future flags
- *
- * Send the signal to the thread group or to the individual thread depending
- * on PIDFD_THREAD.
- * In the future extension to @flags may be used to override the default scope
- * of @pidfd.
- *
- * Return: 0 on success, negative errno on failure
- */
-SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
-		siginfo_t __user *, info, unsigned int, flags)
+static int do_pidfd_send_signal(struct pid *pid, int sig, enum pid_type type,
+				siginfo_t __user *info, unsigned int flags)
 {
-	int ret;
-	struct pid *pid;
 	kernel_siginfo_t kinfo;
-	enum pid_type type;
-
-	/* Enforce flags be set to 0 until we add an extension. */
-	if (flags & ~PIDFD_SEND_SIGNAL_FLAGS)
-		return -EINVAL;
-
-	/* Ensure that only a single signal scope determining flag is set. */
-	if (hweight32(flags & PIDFD_SEND_SIGNAL_FLAGS) > 1)
-		return -EINVAL;
-
-	CLASS(fd, f)(pidfd);
-	if (fd_empty(f))
-		return -EBADF;
-
-	/* Is this a pidfd? */
-	pid = pidfd_to_pid(fd_file(f));
-	if (IS_ERR(pid))
-		return PTR_ERR(pid);
-
-	if (!access_pidfd_pidns(pid))
-		return -EINVAL;
 
 	switch (flags) {
-	case 0:
-		/* Infer scope from the type of pidfd. */
-		if (fd_file(f)->f_flags & PIDFD_THREAD)
-			type = PIDTYPE_PID;
-		else
-			type = PIDTYPE_TGID;
-		break;
 	case PIDFD_SIGNAL_THREAD:
 		type = PIDTYPE_PID;
 		break;
···
 	}
 
 	if (info) {
+		int ret;
+
 		ret = copy_siginfo_from_user_any(&kinfo, info);
 		if (unlikely(ret))
 			return ret;
···
 
 	if (type == PIDTYPE_PGID)
 		return kill_pgrp_info(sig, &kinfo, pid);
-	else
-		return kill_pid_info_type(sig, &kinfo, pid, type);
+
+	return kill_pid_info_type(sig, &kinfo, pid, type);
+}
+
+/**
+ * sys_pidfd_send_signal - Signal a process through a pidfd
+ * @pidfd:  file descriptor of the process
+ * @sig:    signal to send
+ * @info:   signal info
+ * @flags:  future flags
+ *
+ * Send the signal to the thread group or to the individual thread depending
+ * on PIDFD_THREAD.
+ * In the future extension to @flags may be used to override the default scope
+ * of @pidfd.
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
+		siginfo_t __user *, info, unsigned int, flags)
+{
+	struct pid *pid;
+	enum pid_type type;
+
+	/* Enforce flags be set to 0 until we add an extension. */
+	if (flags & ~PIDFD_SEND_SIGNAL_FLAGS)
+		return -EINVAL;
+
+	/* Ensure that only a single signal scope determining flag is set. */
+	if (hweight32(flags & PIDFD_SEND_SIGNAL_FLAGS) > 1)
+		return -EINVAL;
+
+	switch (pidfd) {
+	case PIDFD_SELF_THREAD:
+		pid = get_task_pid(current, PIDTYPE_PID);
+		type = PIDTYPE_PID;
+		break;
+	case PIDFD_SELF_THREAD_GROUP:
+		pid = get_task_pid(current, PIDTYPE_TGID);
+		type = PIDTYPE_TGID;
+		break;
+	default: {
+		CLASS(fd, f)(pidfd);
+		if (fd_empty(f))
+			return -EBADF;
+
+		/* Is this a pidfd? */
+		pid = pidfd_to_pid(fd_file(f));
+		if (IS_ERR(pid))
+			return PTR_ERR(pid);
+
+		if (!access_pidfd_pidns(pid))
+			return -EINVAL;
+
+		/* Infer scope from the type of pidfd. */
+		if (fd_file(f)->f_flags & PIDFD_THREAD)
+			type = PIDTYPE_PID;
+		else
+			type = PIDTYPE_TGID;
+
+		return do_pidfd_send_signal(pid, sig, type, info, flags);
+	}
+	}
+
+	return do_pidfd_send_signal(pid, sig, type, info, flags);
 }
 
 static int
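
Since pidfd_send_signal() now accepts the PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels, a caller no longer has to open a pidfd to act on itself. A minimal sketch follows; the `SKETCH_*` macros repeat the sentinel values from the uapi header so the code builds against older headers, `is_self_sentinel()` and `probe_self()` are hypothetical helpers, and using signal 0 as a pure permission and existence check is assumed to behave as it does for kill(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

/* Values copied from the uapi header in this pull. */
#define SKETCH_PIDFD_SELF_THREAD	(-10000)
#define SKETCH_PIDFD_SELF_THREAD_GROUP	(-20000)

/* True if pidfd is one of the two sentinels rather than a real descriptor. */
static bool is_self_sentinel(int pidfd)
{
	return pidfd == SKETCH_PIDFD_SELF_THREAD ||
	       pidfd == SKETCH_PIDFD_SELF_THREAD_GROUP;
}

/*
 * Sends signal 0 to the calling thread or thread group through a sentinel.
 * Returns 0 on success or a negative errno value; kernels without sentinel
 * support are expected to reject the negative descriptor.
 */
static int probe_self(int sentinel)
{
	assert(is_self_sentinel(sentinel));

	if (syscall(SYS_pidfd_send_signal, sentinel, 0, NULL, 0) == 0)
		return 0;

	return -errno;
}
```

A real signal can be sent the same way by replacing the 0 with a signal number; the sentinel then selects thread or thread-group scope exactly as the switch statement in the syscall does.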

+4 -12

tools/testing/selftests/mm/guard-pages.c

···
 #include <sys/uio.h>
 #include <unistd.h>
 
+#include "../pidfd/pidfd.h"
+
 /*
  * Ignore the checkpatch warning, as per the C99 standard, section 7.14.1.1:
  *
···
 		return;
 
 	siglongjmp(signal_jmp_buf, c);
-}
-
-static int pidfd_open(pid_t pid, unsigned int flags)
-{
-	return syscall(SYS_pidfd_open, pid, flags);
 }
 
 static ssize_t sys_process_madvise(int pidfd, const struct iovec *iovec,
···
 TEST_F(guard_pages, process_madvise)
 {
 	const unsigned long page_size = self->page_size;
-	pid_t pid = getpid();
-	int pidfd = pidfd_open(pid, 0);
 	char *ptr_region, *ptr1, *ptr2, *ptr3;
 	ssize_t count;
 	struct iovec vec[6];
-
-	ASSERT_NE(pidfd, -1);
 
 	/* Reserve region to map over. */
 	ptr_region = mmap(NULL, 100 * page_size, PROT_NONE,
···
 	ASSERT_EQ(munmap(&ptr_region[99 * page_size], page_size), 0);
 
 	/* Now guard in one step. */
-	count = sys_process_madvise(pidfd, vec, 6, MADV_GUARD_INSTALL, 0);
+	count = sys_process_madvise(PIDFD_SELF, vec, 6, MADV_GUARD_INSTALL, 0);
 
 	/* OK we don't have permission to do this, skip. */
 	if (count == -1 && errno == EPERM)
···
 	ASSERT_FALSE(try_read_write_buf(&ptr3[19 * page_size]));
 
 	/* Now do the same with unguard... */
-	count = sys_process_madvise(pidfd, vec, 6, MADV_GUARD_REMOVE, 0);
+	count = sys_process_madvise(PIDFD_SELF, vec, 6, MADV_GUARD_REMOVE, 0);
 
 	/* ...and everything should now succeed. */
 
···
 	ASSERT_EQ(munmap(ptr1, 10 * page_size), 0);
 	ASSERT_EQ(munmap(ptr2, 5 * page_size), 0);
 	ASSERT_EQ(munmap(ptr3, 20 * page_size), 0);
-	close(pidfd);
 }
 
 /* Assert that unmapping ranges does not leave guard markers behind. */

tools/testing/selftests/pidfd/.gitignore

···
 pidfd_setns_test
 pidfd_file_handle_test
 pidfd_bind_mount
+pidfd_info_test
+pidfd_exec_helper

+3 -1

tools/testing/selftests/pidfd/Makefile

···
 
 TEST_GEN_PROGS := pidfd_test pidfd_fdinfo_test pidfd_open_test \
 	pidfd_poll_test pidfd_wait pidfd_getfd_test pidfd_setns_test \
-	pidfd_file_handle_test pidfd_bind_mount
+	pidfd_file_handle_test pidfd_bind_mount pidfd_info_test
+
+TEST_GEN_PROGS_EXTENDED := pidfd_exec_helper
 
 include ../lib.mk
 

+109

tools/testing/selftests/pidfd/pidfd.h

···
 #include <stdlib.h>
 #include <string.h>
 #include <syscall.h>
+#include <sys/ioctl.h>
 #include <sys/types.h>
 #include <sys/wait.h>
 
···
 #ifndef PIDFD_NONBLOCK
 #define PIDFD_NONBLOCK O_NONBLOCK
 #endif
+
+#ifndef PIDFD_SELF_THREAD
+#define PIDFD_SELF_THREAD -10000 /* Current thread. */
+#endif
+
+#ifndef PIDFD_SELF_THREAD_GROUP
+#define PIDFD_SELF_THREAD_GROUP -20000 /* Current thread group leader. */
+#endif
+
+#ifndef PIDFD_SELF
+#define PIDFD_SELF PIDFD_SELF_THREAD
+#endif
+
+#ifndef PIDFD_SELF_PROCESS
+#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP
+#endif
+
+#ifndef PIDFS_IOCTL_MAGIC
+#define PIDFS_IOCTL_MAGIC 0xFF
+#endif
+
+#ifndef PIDFD_GET_CGROUP_NAMESPACE
+#define PIDFD_GET_CGROUP_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 1)
+#endif
+
+#ifndef PIDFD_GET_IPC_NAMESPACE
+#define PIDFD_GET_IPC_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 2)
+#endif
+
+#ifndef PIDFD_GET_MNT_NAMESPACE
+#define PIDFD_GET_MNT_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 3)
+#endif
+
+#ifndef PIDFD_GET_NET_NAMESPACE
+#define PIDFD_GET_NET_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 4)
+#endif
+
+#ifndef PIDFD_GET_PID_NAMESPACE
+#define PIDFD_GET_PID_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 5)
+#endif
+
+#ifndef PIDFD_GET_PID_FOR_CHILDREN_NAMESPACE
+#define PIDFD_GET_PID_FOR_CHILDREN_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 6)
+#endif
+
+#ifndef PIDFD_GET_TIME_NAMESPACE
+#define PIDFD_GET_TIME_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 7)
+#endif
+
+#ifndef PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE
+#define PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 8)
+#endif
+
+#ifndef PIDFD_GET_USER_NAMESPACE
+#define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9)
+#endif
+
+#ifndef PIDFD_GET_UTS_NAMESPACE
+#define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10)
+#endif
+
+#ifndef PIDFD_GET_INFO
+#define PIDFD_GET_INFO _IOWR(PIDFS_IOCTL_MAGIC, 11, struct pidfd_info)
+#endif
+
+#ifndef PIDFD_INFO_PID
+#define PIDFD_INFO_PID (1UL << 0) /* Always returned, even if not requested */
+#endif
+
+#ifndef PIDFD_INFO_CREDS
+#define PIDFD_INFO_CREDS (1UL << 1) /* Always returned, even if not requested */
+#endif
+
+#ifndef PIDFD_INFO_CGROUPID
+#define PIDFD_INFO_CGROUPID (1UL << 2) /* Always returned if available, even if not requested */
+#endif
+
+#ifndef PIDFD_INFO_EXIT
+#define PIDFD_INFO_EXIT (1UL << 3) /* Always returned if available, even if not requested */
+#endif
+
+#ifndef PIDFD_THREAD
+#define PIDFD_THREAD O_EXCL
+#endif
+
+struct pidfd_info {
+	__u64 mask;
+	__u64 cgroupid;
+	__u32 pid;
+	__u32 tgid;
+	__u32 ppid;
+	__u32 ruid;
+	__u32 rgid;
+	__u32 euid;
+	__u32 egid;
+	__u32 suid;
+	__u32 sgid;
+	__u32 fsuid;
+	__u32 fsgid;
+	__s32 exit_code;
+};
 
 /*
  * The kernel reserves 300 pids via RESERVED_PIDS in kernel/pid.c
···
 	} while (ret < 0 && errno == EINTR);
 
 	return ret;
+}
+
+static inline int sys_execveat(int dirfd, const char *pathname,
+			       char *const argv[], char *const envp[],
+			       int flags)
+{
+	return syscall(__NR_execveat, dirfd, pathname, argv, envp, flags);
 }
 
 #endif /* __PIDFD_H */

+12

tools/testing/selftests/pidfd/pidfd_exec_helper.c

···
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+int main(int argc, char *argv[])
+{
+	if (pause())
+		_exit(EXIT_FAILURE);
+
+	_exit(EXIT_SUCCESS);
+}

tools/testing/selftests/pidfd/pidfd_fdinfo_test.c

···
 #include <syscall.h>
 #include <sys/wait.h>
 #include <sys/mman.h>
+#include <sys/mount.h>
 
 #include "pidfd.h"
 #include "../kselftest.h"

+692

tools/testing/selftests/pidfd/pidfd_info_test.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #define _GNU_SOURCE 4 + #include <errno.h> 5 + #include <fcntl.h> 6 + #include <limits.h> 7 + #include <linux/types.h> 8 + #include <poll.h> 9 + #include <pthread.h> 10 + #include <sched.h> 11 + #include <signal.h> 12 + #include <stdio.h> 13 + #include <stdlib.h> 14 + #include <string.h> 15 + #include <syscall.h> 16 + #include <sys/prctl.h> 17 + #include <sys/wait.h> 18 + #include <unistd.h> 19 + #include <sys/socket.h> 20 + #include <linux/kcmp.h> 21 + #include <sys/stat.h> 22 + 23 + #include "pidfd.h" 24 + #include "../kselftest_harness.h" 25 + 26 + FIXTURE(pidfd_info) 27 + { 28 + pid_t child_pid1; 29 + int child_pidfd1; 30 + 31 + pid_t child_pid2; 32 + int child_pidfd2; 33 + 34 + pid_t child_pid3; 35 + int child_pidfd3; 36 + 37 + pid_t child_pid4; 38 + int child_pidfd4; 39 + }; 40 + 41 + FIXTURE_SETUP(pidfd_info) 42 + { 43 + int ret; 44 + int ipc_sockets[2]; 45 + char c; 46 + 47 + ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets); 48 + EXPECT_EQ(ret, 0); 49 + 50 + self->child_pid1 = create_child(&self->child_pidfd1, 0); 51 + EXPECT_GE(self->child_pid1, 0); 52 + 53 + if (self->child_pid1 == 0) { 54 + close(ipc_sockets[0]); 55 + 56 + if (write_nointr(ipc_sockets[1], "1", 1) < 0) 57 + _exit(EXIT_FAILURE); 58 + 59 + close(ipc_sockets[1]); 60 + 61 + pause(); 62 + _exit(EXIT_SUCCESS); 63 + } 64 + 65 + EXPECT_EQ(close(ipc_sockets[1]), 0); 66 + ASSERT_EQ(read_nointr(ipc_sockets[0], &c, 1), 1); 67 + EXPECT_EQ(close(ipc_sockets[0]), 0); 68 + 69 + /* SIGKILL but don't reap. 
*/ 70 + EXPECT_EQ(sys_pidfd_send_signal(self->child_pidfd1, SIGKILL, NULL, 0), 0); 71 + 72 + ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets); 73 + EXPECT_EQ(ret, 0); 74 + 75 + self->child_pid2 = create_child(&self->child_pidfd2, 0); 76 + EXPECT_GE(self->child_pid2, 0); 77 + 78 + if (self->child_pid2 == 0) { 79 + close(ipc_sockets[0]); 80 + 81 + if (write_nointr(ipc_sockets[1], "1", 1) < 0) 82 + _exit(EXIT_FAILURE); 83 + 84 + close(ipc_sockets[1]); 85 + 86 + pause(); 87 + _exit(EXIT_SUCCESS); 88 + } 89 + 90 + EXPECT_EQ(close(ipc_sockets[1]), 0); 91 + ASSERT_EQ(read_nointr(ipc_sockets[0], &c, 1), 1); 92 + EXPECT_EQ(close(ipc_sockets[0]), 0); 93 + 94 + /* SIGKILL and reap. */ 95 + EXPECT_EQ(sys_pidfd_send_signal(self->child_pidfd2, SIGKILL, NULL, 0), 0); 96 + EXPECT_EQ(sys_waitid(P_PID, self->child_pid2, NULL, WEXITED), 0); 97 + 98 + self->child_pid3 = create_child(&self->child_pidfd3, CLONE_NEWUSER | CLONE_NEWPID); 99 + EXPECT_GE(self->child_pid3, 0); 100 + 101 + if (self->child_pid3 == 0) 102 + _exit(EXIT_SUCCESS); 103 + 104 + self->child_pid4 = create_child(&self->child_pidfd4, CLONE_NEWUSER | CLONE_NEWPID); 105 + EXPECT_GE(self->child_pid4, 0); 106 + 107 + if (self->child_pid4 == 0) 108 + _exit(EXIT_SUCCESS); 109 + 110 + EXPECT_EQ(sys_waitid(P_PID, self->child_pid4, NULL, WEXITED), 0); 111 + } 112 + 113 + FIXTURE_TEARDOWN(pidfd_info) 114 + { 115 + sys_pidfd_send_signal(self->child_pidfd1, SIGKILL, NULL, 0); 116 + if (self->child_pidfd1 >= 0) 117 + EXPECT_EQ(0, close(self->child_pidfd1)); 118 + 119 + sys_waitid(P_PID, self->child_pid1, NULL, WEXITED); 120 + 121 + sys_pidfd_send_signal(self->child_pidfd2, SIGKILL, NULL, 0); 122 + if (self->child_pidfd2 >= 0) 123 + EXPECT_EQ(0, close(self->child_pidfd2)); 124 + 125 + sys_waitid(P_PID, self->child_pid2, NULL, WEXITED); 126 + sys_waitid(P_PID, self->child_pid3, NULL, WEXITED); 127 + sys_waitid(P_PID, self->child_pid4, NULL, WEXITED); 128 + } 129 + 130 + TEST_F(pidfd_info, sigkill_exit) 131 + { 
132 + struct pidfd_info info = { 133 + .mask = PIDFD_INFO_CGROUPID, 134 + }; 135 + 136 + /* Process has exited but not been reaped so this must work. */ 137 + ASSERT_EQ(ioctl(self->child_pidfd1, PIDFD_GET_INFO, &info), 0); 138 + 139 + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 140 + ASSERT_EQ(ioctl(self->child_pidfd1, PIDFD_GET_INFO, &info), 0); 141 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_CREDS)); 142 + /* Process has exited but not been reaped, so no PIDFD_INFO_EXIT information yet. */ 143 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_EXIT)); 144 + } 145 + 146 + TEST_F(pidfd_info, sigkill_reaped) 147 + { 148 + struct pidfd_info info = { 149 + .mask = PIDFD_INFO_CGROUPID, 150 + }; 151 + 152 + /* Process has already been reaped and PIDFD_INFO_EXIT hasn't been set. */ 153 + ASSERT_NE(ioctl(self->child_pidfd2, PIDFD_GET_INFO, &info), 0); 154 + ASSERT_EQ(errno, ESRCH); 155 + 156 + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 157 + ASSERT_EQ(ioctl(self->child_pidfd2, PIDFD_GET_INFO, &info), 0); 158 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); 159 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); 160 + ASSERT_TRUE(WIFSIGNALED(info.exit_code)); 161 + ASSERT_EQ(WTERMSIG(info.exit_code), SIGKILL); 162 + } 163 + 164 + TEST_F(pidfd_info, success_exit) 165 + { 166 + struct pidfd_info info = { 167 + .mask = PIDFD_INFO_CGROUPID, 168 + }; 169 + 170 + /* Process has exited but not been reaped so this must work. */ 171 + ASSERT_EQ(ioctl(self->child_pidfd3, PIDFD_GET_INFO, &info), 0); 172 + 173 + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 174 + ASSERT_EQ(ioctl(self->child_pidfd3, PIDFD_GET_INFO, &info), 0); 175 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_CREDS)); 176 + /* Process has exited but not been reaped, so no PIDFD_INFO_EXIT information yet. 
*/ 177 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_EXIT)); 178 + } 179 + 180 + TEST_F(pidfd_info, success_reaped) 181 + { 182 + struct pidfd_info info = { 183 + .mask = PIDFD_INFO_CGROUPID, 184 + }; 185 + 186 + /* Process has already been reaped and PIDFD_INFO_EXIT hasn't been set. */ 187 + ASSERT_NE(ioctl(self->child_pidfd4, PIDFD_GET_INFO, &info), 0); 188 + ASSERT_EQ(errno, ESRCH); 189 + 190 + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 191 + ASSERT_EQ(ioctl(self->child_pidfd4, PIDFD_GET_INFO, &info), 0); 192 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); 193 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); 194 + ASSERT_TRUE(WIFEXITED(info.exit_code)); 195 + ASSERT_EQ(WEXITSTATUS(info.exit_code), 0); 196 + } 197 + 198 + TEST_F(pidfd_info, success_reaped_poll) 199 + { 200 + struct pidfd_info info = { 201 + .mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT, 202 + }; 203 + struct pollfd fds = {}; 204 + int nevents; 205 + 206 + fds.events = POLLIN; 207 + fds.fd = self->child_pidfd2; 208 + 209 + nevents = poll(&fds, 1, -1); 210 + ASSERT_EQ(nevents, 1); 211 + ASSERT_TRUE(!!(fds.revents & POLLIN)); 212 + ASSERT_TRUE(!!(fds.revents & POLLHUP)); 213 + 214 + ASSERT_EQ(ioctl(self->child_pidfd2, PIDFD_GET_INFO, &info), 0); 215 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); 216 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); 217 + ASSERT_TRUE(WIFSIGNALED(info.exit_code)); 218 + ASSERT_EQ(WTERMSIG(info.exit_code), SIGKILL); 219 + } 220 + 221 + static void *pidfd_info_pause_thread(void *arg) 222 + { 223 + pid_t pid_thread = gettid(); 224 + int ipc_socket = *(int *)arg; 225 + 226 + /* Inform the grand-parent what the tid of this thread is. */ 227 + if (write_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread)) 228 + return NULL; 229 + 230 + close(ipc_socket); 231 + 232 + /* Sleep until we're killed. 
*/ 233 + pause(); 234 + return NULL; 235 + } 236 + 237 + TEST_F(pidfd_info, thread_group) 238 + { 239 + pid_t pid_leader, pid_poller, pid_thread; 240 + pthread_t thread; 241 + int nevents, pidfd_leader, pidfd_thread, pidfd_leader_thread, ret; 242 + int ipc_sockets[2]; 243 + struct pollfd fds = {}; 244 + struct pidfd_info info = { 245 + .mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT, 246 + }, info2; 247 + 248 + ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets); 249 + EXPECT_EQ(ret, 0); 250 + 251 + pid_leader = create_child(&pidfd_leader, 0); 252 + EXPECT_GE(pid_leader, 0); 253 + 254 + if (pid_leader == 0) { 255 + close(ipc_sockets[0]); 256 + 257 + /* The thread will outlive the thread-group leader. */ 258 + if (pthread_create(&thread, NULL, pidfd_info_pause_thread, &ipc_sockets[1])) 259 + syscall(__NR_exit, EXIT_FAILURE); 260 + 261 + /* Make the thread-group leader exit prematurely. */ 262 + syscall(__NR_exit, EXIT_SUCCESS); 263 + } 264 + 265 + /* 266 + * Opening a PIDFD_THREAD aka thread-specific pidfd based on a 267 + * thread-group leader must succeed. 268 + */ 269 + pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD); 270 + ASSERT_GE(pidfd_leader_thread, 0); 271 + 272 + pid_poller = fork(); 273 + ASSERT_GE(pid_poller, 0); 274 + if (pid_poller == 0) { 275 + /* 276 + * We can't poll and wait for the old thread-group 277 + * leader to exit using a thread-specific pidfd. The 278 + * thread-group leader exited prematurely and 279 + * notification is delayed until all subthreads have 280 + * exited. 281 + */ 282 + fds.events = POLLIN; 283 + fds.fd = pidfd_leader_thread; 284 + nevents = poll(&fds, 1, 10000 /* wait 10 seconds */); 285 + if (nevents != 0) 286 + _exit(EXIT_FAILURE); 287 + if (fds.revents & POLLIN) 288 + _exit(EXIT_FAILURE); 289 + if (fds.revents & POLLHUP) 290 + _exit(EXIT_FAILURE); 291 + _exit(EXIT_SUCCESS); 292 + } 293 + 294 + /* Retrieve the tid of the thread. 
*/ 295 + EXPECT_EQ(close(ipc_sockets[1]), 0); 296 + ASSERT_EQ(read_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread)); 297 + EXPECT_EQ(close(ipc_sockets[0]), 0); 298 + 299 + /* Opening a thread as a thread-group leader must fail. */ 300 + pidfd_thread = sys_pidfd_open(pid_thread, 0); 301 + ASSERT_LT(pidfd_thread, 0); 302 + 303 + /* Opening a thread as a PIDFD_THREAD must succeed. */ 304 + pidfd_thread = sys_pidfd_open(pid_thread, PIDFD_THREAD); 305 + ASSERT_GE(pidfd_thread, 0); 306 + 307 + ASSERT_EQ(wait_for_pid(pid_poller), 0); 308 + 309 + /* 310 + * Note that pidfd_leader is a thread-group pidfd, so polling on it 311 + * would only notify us once all threads in the thread-group have 312 + * exited. So we can't poll before we have taken down the whole 313 + * thread-group. 314 + */ 315 + 316 + /* Get PIDFD_GET_INFO using the thread-group leader pidfd. */ 317 + ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0); 318 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_CREDS)); 319 + /* Process has exited but not been reaped, so no PIDFD_INFO_EXIT information yet. */ 320 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_EXIT)); 321 + ASSERT_EQ(info.pid, pid_leader); 322 + 323 + /* 324 + * Now retrieve the same info using the thread-specific pidfd 325 + * for the thread-group leader. 326 + */ 327 + info2.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 328 + ASSERT_EQ(ioctl(pidfd_leader_thread, PIDFD_GET_INFO, &info2), 0); 329 + ASSERT_TRUE(!!(info2.mask & PIDFD_INFO_CREDS)); 330 + /* Process has exited but not been reaped, so no PIDFD_INFO_EXIT information yet. */ 331 + ASSERT_FALSE(!!(info2.mask & PIDFD_INFO_EXIT)); 332 + ASSERT_EQ(info2.pid, pid_leader); 333 + 334 + /* Now try the thread-specific pidfd. */ 335 + ASSERT_EQ(ioctl(pidfd_thread, PIDFD_GET_INFO, &info), 0); 336 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_CREDS)); 337 + /* The thread hasn't exited, so no PIDFD_INFO_EXIT information yet. 
*/ 338 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_EXIT)); 339 + ASSERT_EQ(info.pid, pid_thread); 340 + 341 + /* 342 + * Take down the whole thread-group. The thread-group leader 343 + * exited successfully but the thread will now be SIGKILLed. 344 + * This must be reflected in the recorded exit information. 345 + */ 346 + EXPECT_EQ(sys_pidfd_send_signal(pidfd_leader, SIGKILL, NULL, 0), 0); 347 + EXPECT_EQ(sys_waitid(P_PIDFD, pidfd_leader, NULL, WEXITED), 0); 348 + 349 + fds.events = POLLIN; 350 + fds.fd = pidfd_leader; 351 + nevents = poll(&fds, 1, -1); 352 + ASSERT_EQ(nevents, 1); 353 + ASSERT_TRUE(!!(fds.revents & POLLIN)); 354 + /* The thread-group leader has been reaped. */ 355 + ASSERT_TRUE(!!(fds.revents & POLLHUP)); 356 + 357 + /* 358 + * Retrieve exit information for the thread-group leader via the 359 + * thread-group leader pidfd. 360 + */ 361 + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 362 + ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0); 363 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); 364 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); 365 + /* The thread-group leader exited successfully. Only the specific thread was SIGKILLed. */ 366 + ASSERT_TRUE(WIFEXITED(info.exit_code)); 367 + ASSERT_EQ(WEXITSTATUS(info.exit_code), 0); 368 + 369 + /* 370 + * Retrieve exit information for the thread-group leader via the 371 + * thread-specific pidfd. 372 + */ 373 + info2.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 374 + ASSERT_EQ(ioctl(pidfd_leader_thread, PIDFD_GET_INFO, &info2), 0); 375 + ASSERT_FALSE(!!(info2.mask & PIDFD_INFO_CREDS)); 376 + ASSERT_TRUE(!!(info2.mask & PIDFD_INFO_EXIT)); 377 + 378 + /* The thread-group leader exited successfully. Only the specific thread was SIGKILLed. */ 379 + ASSERT_TRUE(WIFEXITED(info2.exit_code)); 380 + ASSERT_EQ(WEXITSTATUS(info2.exit_code), 0); 381 + 382 + /* Retrieve exit information for the thread. 
*/ 383 + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 384 + ASSERT_EQ(ioctl(pidfd_thread, PIDFD_GET_INFO, &info), 0); 385 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); 386 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); 387 + 388 + /* The thread got SIGKILLed. */ 389 + ASSERT_TRUE(WIFSIGNALED(info.exit_code)); 390 + ASSERT_EQ(WTERMSIG(info.exit_code), SIGKILL); 391 + 392 + EXPECT_EQ(close(pidfd_leader), 0); 393 + EXPECT_EQ(close(pidfd_thread), 0); 394 + } 395 + 396 + static void *pidfd_info_thread_exec(void *arg) 397 + { 398 + pid_t pid_thread = gettid(); 399 + int ipc_socket = *(int *)arg; 400 + 401 + /* Inform the grand-parent what the tid of this thread is. */ 402 + if (write_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread)) 403 + return NULL; 404 + 405 + if (read_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread)) 406 + return NULL; 407 + 408 + close(ipc_socket); 409 + 410 + sys_execveat(AT_FDCWD, "pidfd_exec_helper", NULL, NULL, 0); 411 + return NULL; 412 + } 413 + 414 + TEST_F(pidfd_info, thread_group_exec) 415 + { 416 + pid_t pid_leader, pid_poller, pid_thread; 417 + pthread_t thread; 418 + int nevents, pidfd_leader, pidfd_leader_thread, pidfd_thread, ret; 419 + int ipc_sockets[2]; 420 + struct pollfd fds = {}; 421 + struct pidfd_info info = { 422 + .mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT, 423 + }; 424 + 425 + ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets); 426 + EXPECT_EQ(ret, 0); 427 + 428 + pid_leader = create_child(&pidfd_leader, 0); 429 + EXPECT_GE(pid_leader, 0); 430 + 431 + if (pid_leader == 0) { 432 + close(ipc_sockets[0]); 433 + 434 + /* The thread will outlive the thread-group leader. */ 435 + if (pthread_create(&thread, NULL, pidfd_info_thread_exec, &ipc_sockets[1])) 436 + syscall(__NR_exit, EXIT_FAILURE); 437 + 438 + /* Make the thread-group leader exit prematurely. 
*/ 439 + syscall(__NR_exit, EXIT_SUCCESS); 440 + } 441 + 442 + /* Open a thread-specific pidfd for the thread-group leader. */ 443 + pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD); 444 + ASSERT_GE(pidfd_leader_thread, 0); 445 + 446 + pid_poller = fork(); 447 + ASSERT_GE(pid_poller, 0); 448 + if (pid_poller == 0) { 449 + /* 450 + * We can't poll and wait for the old thread-group 451 + * leader to exit using a thread-specific pidfd. The 452 + * thread-group leader exited prematurely and 453 + * notification is delayed until all subthreads have 454 + * exited. 455 + * 456 + * When the thread has execed it will have taken over the old 457 + * thread-group leader's struct pid. Calling poll after 458 + * the thread execed will thus block again because a new 459 + * thread-group has started. 460 + */ 461 + fds.events = POLLIN; 462 + fds.fd = pidfd_leader_thread; 463 + nevents = poll(&fds, 1, 10000 /* wait 10 seconds */); 464 + if (nevents != 0) 465 + _exit(EXIT_FAILURE); 466 + if (fds.revents & POLLIN) 467 + _exit(EXIT_FAILURE); 468 + if (fds.revents & POLLHUP) 469 + _exit(EXIT_FAILURE); 470 + _exit(EXIT_SUCCESS); 471 + } 472 + 473 + /* Retrieve the tid of the thread. */ 474 + EXPECT_EQ(close(ipc_sockets[1]), 0); 475 + ASSERT_EQ(read_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread)); 476 + 477 + /* Opening a thread as a PIDFD_THREAD must succeed. */ 478 + pidfd_thread = sys_pidfd_open(pid_thread, PIDFD_THREAD); 479 + ASSERT_GE(pidfd_thread, 0); 480 + 481 + /* Now that we've opened a thread-specific pidfd the thread can exec. */ 482 + ASSERT_EQ(write_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread)); 483 + EXPECT_EQ(close(ipc_sockets[0]), 0); 484 + 485 + ASSERT_EQ(wait_for_pid(pid_poller), 0); 486 + 487 + /* Wait until the kernel has SIGKILLed the thread. 
*/ 488 + fds.events = POLLHUP; 489 + fds.fd = pidfd_thread; 490 + nevents = poll(&fds, 1, -1); 491 + ASSERT_EQ(nevents, 1); 492 + /* The thread has been reaped. */ 493 + ASSERT_TRUE(!!(fds.revents & POLLHUP)); 494 + 495 + /* Retrieve thread-specific exit info from pidfd. */ 496 + ASSERT_EQ(ioctl(pidfd_thread, PIDFD_GET_INFO, &info), 0); 497 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); 498 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); 499 + /* 500 + * While the kernel will have SIGKILLed the whole thread-group 501 + * during exec it will cause the individual threads to exit 502 + * cleanly. 503 + */ 504 + ASSERT_TRUE(WIFEXITED(info.exit_code)); 505 + ASSERT_EQ(WEXITSTATUS(info.exit_code), 0); 506 + 507 + /* 508 + * The thread-group leader is still alive, the thread has taken 509 + * over its struct pid and thus its pid number. 510 + */ 511 + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 512 + ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0); 513 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_CREDS)); 514 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_EXIT)); 515 + ASSERT_EQ(info.pid, pid_leader); 516 + 517 + /* Take down the thread-group leader. */ 518 + EXPECT_EQ(sys_pidfd_send_signal(pidfd_leader, SIGKILL, NULL, 0), 0); 519 + 520 + /* 521 + * After the exec we're dealing with an empty thread-group so now 522 + * we must see an exit notification on the thread-specific pidfd 523 + * for the thread-group leader as there's no subthread that can 524 + * revive the struct pid. 525 + */ 526 + fds.events = POLLIN; 527 + fds.fd = pidfd_leader_thread; 528 + nevents = poll(&fds, 1, -1); 529 + ASSERT_EQ(nevents, 1); 530 + ASSERT_TRUE(!!(fds.revents & POLLIN)); 531 + ASSERT_FALSE(!!(fds.revents & POLLHUP)); 532 + 533 + EXPECT_EQ(sys_waitid(P_PIDFD, pidfd_leader, NULL, WEXITED), 0); 534 + 535 + /* Retrieve exit information for the thread-group leader. 
*/ 536 + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 537 + ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0); 538 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); 539 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); 540 + 541 + EXPECT_EQ(close(pidfd_leader), 0); 542 + EXPECT_EQ(close(pidfd_thread), 0); 543 + } 544 + 545 + static void *pidfd_info_thread_exec_sane(void *arg) 546 + { 547 + pid_t pid_thread = gettid(); 548 + int ipc_socket = *(int *)arg; 549 + 550 + /* Inform the grand-parent what the tid of this thread is. */ 551 + if (write_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread)) 552 + return NULL; 553 + 554 + if (read_nointr(ipc_socket, &pid_thread, sizeof(pid_thread)) != sizeof(pid_thread)) 555 + return NULL; 556 + 557 + close(ipc_socket); 558 + 559 + sys_execveat(AT_FDCWD, "pidfd_exec_helper", NULL, NULL, 0); 560 + return NULL; 561 + } 562 + 563 + TEST_F(pidfd_info, thread_group_exec_thread) 564 + { 565 + pid_t pid_leader, pid_poller, pid_thread; 566 + pthread_t thread; 567 + int nevents, pidfd_leader, pidfd_leader_thread, pidfd_thread, ret; 568 + int ipc_sockets[2]; 569 + struct pollfd fds = {}; 570 + struct pidfd_info info = { 571 + .mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT, 572 + }; 573 + 574 + ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets); 575 + EXPECT_EQ(ret, 0); 576 + 577 + pid_leader = create_child(&pidfd_leader, 0); 578 + EXPECT_GE(pid_leader, 0); 579 + 580 + if (pid_leader == 0) { 581 + close(ipc_sockets[0]); 582 + 583 + /* The thread will outlive the thread-group leader. */ 584 + if (pthread_create(&thread, NULL, pidfd_info_thread_exec_sane, &ipc_sockets[1])) 585 + syscall(__NR_exit, EXIT_FAILURE); 586 + 587 + /* 588 + * Pause the thread-group leader. It will be killed once 589 + * the subthread execs. 590 + */ 591 + pause(); 592 + syscall(__NR_exit, EXIT_SUCCESS); 593 + } 594 + 595 + /* Retrieve the tid of the thread. 
*/ 596 + EXPECT_EQ(close(ipc_sockets[1]), 0); 597 + ASSERT_EQ(read_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread)); 598 + 599 + /* Opening a thread as a PIDFD_THREAD must succeed. */ 600 + pidfd_thread = sys_pidfd_open(pid_thread, PIDFD_THREAD); 601 + ASSERT_GE(pidfd_thread, 0); 602 + 603 + /* Open a thread-specific pidfd for the thread-group leader. */ 604 + pidfd_leader_thread = sys_pidfd_open(pid_leader, PIDFD_THREAD); 605 + ASSERT_GE(pidfd_leader_thread, 0); 606 + 607 + pid_poller = fork(); 608 + ASSERT_GE(pid_poller, 0); 609 + if (pid_poller == 0) { 610 + /* 611 + * The subthread will now exec. The struct pid of the old 612 + * thread-group leader will be assumed by the subthread which 613 + * becomes the new thread-group leader. So no exit notification 614 + * must be generated. Wait for 10 seconds and call it a success 615 + * if no notification has been received. 616 + */ 617 + fds.events = POLLIN; 618 + fds.fd = pidfd_leader_thread; 619 + nevents = poll(&fds, 1, 10000 /* wait 10 seconds */); 620 + if (nevents != 0) 621 + _exit(EXIT_FAILURE); 622 + if (fds.revents & POLLIN) 623 + _exit(EXIT_FAILURE); 624 + if (fds.revents & POLLHUP) 625 + _exit(EXIT_FAILURE); 626 + _exit(EXIT_SUCCESS); 627 + } 628 + 629 + /* Now that we've opened a thread-specific pidfd the thread can exec. */ 630 + ASSERT_EQ(write_nointr(ipc_sockets[0], &pid_thread, sizeof(pid_thread)), sizeof(pid_thread)); 631 + EXPECT_EQ(close(ipc_sockets[0]), 0); 632 + ASSERT_EQ(wait_for_pid(pid_poller), 0); 633 + 634 + /* Wait until the kernel has SIGKILLed the thread. */ 635 + fds.events = POLLHUP; 636 + fds.fd = pidfd_thread; 637 + nevents = poll(&fds, 1, -1); 638 + ASSERT_EQ(nevents, 1); 639 + /* The thread has been reaped. */ 640 + ASSERT_TRUE(!!(fds.revents & POLLHUP)); 641 + 642 + /* Retrieve thread-specific exit info from pidfd. 
*/ 643 + ASSERT_EQ(ioctl(pidfd_thread, PIDFD_GET_INFO, &info), 0); 644 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); 645 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); 646 + /* 647 + * While the kernel will have SIGKILLed the whole thread-group 648 + * during exec it will cause the individual threads to exit 649 + * cleanly. 650 + */ 651 + ASSERT_TRUE(WIFEXITED(info.exit_code)); 652 + ASSERT_EQ(WEXITSTATUS(info.exit_code), 0); 653 + 654 + /* 655 + * The thread-group leader is still alive, the thread has taken 656 + * over its struct pid and thus its pid number. 657 + */ 658 + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 659 + ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0); 660 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_CREDS)); 661 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_EXIT)); 662 + ASSERT_EQ(info.pid, pid_leader); 663 + 664 + /* Take down the thread-group leader. */ 665 + EXPECT_EQ(sys_pidfd_send_signal(pidfd_leader, SIGKILL, NULL, 0), 0); 666 + 667 + /* 668 + * After the exec we're dealing with an empty thread-group so now 669 + * we must see an exit notification on the thread-specific pidfd 670 + * for the thread-group leader as there's no subthread that can 671 + * revive the struct pid. 672 + */ 673 + fds.events = POLLIN; 674 + fds.fd = pidfd_leader_thread; 675 + nevents = poll(&fds, 1, -1); 676 + ASSERT_EQ(nevents, 1); 677 + ASSERT_TRUE(!!(fds.revents & POLLIN)); 678 + ASSERT_FALSE(!!(fds.revents & POLLHUP)); 679 + 680 + EXPECT_EQ(sys_waitid(P_PIDFD, pidfd_leader, NULL, WEXITED), 0); 681 + 682 + /* Retrieve exit information for the thread-group leader. */ 683 + info.mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT; 684 + ASSERT_EQ(ioctl(pidfd_leader, PIDFD_GET_INFO, &info), 0); 685 + ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); 686 + ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); 687 + 688 + EXPECT_EQ(close(pidfd_leader), 0); 689 + EXPECT_EQ(close(pidfd_thread), 0); 690 + } 691 + 692 + TEST_HARNESS_MAIN
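Throughout pidfd_info_test.c the returned `info.mask` is checked the same way: a task that has not been reaped reports `PIDFD_INFO_CREDS` without `PIDFD_INFO_EXIT`, and a reaped task reports `PIDFD_INFO_EXIT` without `PIDFD_INFO_CREDS`. The sketch below puts that pattern into one helper. The enum and function names are hypothetical, and treating every other bit combination as indeterminate is an assumption rather than documented kernel behavior:

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in the pidfd.h hunk of this merge. */
#ifndef PIDFD_INFO_CREDS
#define PIDFD_INFO_CREDS (1UL << 1)
#endif

#ifndef PIDFD_INFO_EXIT
#define PIDFD_INFO_EXIT (1UL << 3)
#endif

/* Hypothetical states derived from the mask returned by PIDFD_GET_INFO. */
enum pidfd_task_state {
	PIDFD_TASK_NOT_REAPED,		/* credentials reported, no exit info yet */
	PIDFD_TASK_REAPED,		/* exit info reported, credentials gone */
	PIDFD_TASK_INDETERMINATE,	/* any other combination (assumption) */
};

/* Classify a returned mask the way the selftests assert on it. */
static enum pidfd_task_state classify_returned_mask(uint64_t mask)
{
	int has_creds = !!(mask & PIDFD_INFO_CREDS);
	int has_exit = !!(mask & PIDFD_INFO_EXIT);

	if (has_creds && !has_exit)
		return PIDFD_TASK_NOT_REAPED;
	if (!has_creds && has_exit)
		return PIDFD_TASK_REAPED;
	return PIDFD_TASK_INDETERMINATE;
}
```

In the tests, the reaped state is only observed after `poll()` has reported `POLLHUP` on the pidfd or after the task has been waited on.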

+2 -28

tools/testing/selftests/pidfd/pidfd_open_test.c

···
 #include "pidfd.h"
 #include "../kselftest.h"
 
-#ifndef PIDFS_IOCTL_MAGIC
-#define PIDFS_IOCTL_MAGIC 0xFF
-#endif
-
-#ifndef PIDFD_GET_INFO
-#define PIDFD_GET_INFO _IOWR(PIDFS_IOCTL_MAGIC, 11, struct pidfd_info)
-#define PIDFD_INFO_CGROUPID (1UL << 0)
-
-struct pidfd_info {
-	__u64 request_mask;
-	__u64 cgroupid;
-	__u32 pid;
-	__u32 tgid;
-	__u32 ppid;
-	__u32 ruid;
-	__u32 rgid;
-	__u32 euid;
-	__u32 egid;
-	__u32 suid;
-	__u32 sgid;
-	__u32 fsuid;
-	__u32 fsgid;
-	__u32 spare0[1];
-};
-#endif
-
 static int safe_int(const char *numstr, int *converted)
 {
 	char *err = NULL;
···
 int main(int argc, char **argv)
 {
 	struct pidfd_info info = {
-		.request_mask = PIDFD_INFO_CGROUPID,
+		.mask = PIDFD_INFO_CGROUPID,
 	};
 	int pidfd = -1, ret = 1;
 	pid_t pid;
···
 			getegid(), info.sgid);
 		goto on_error;
 	}
-	if ((info.request_mask & PIDFD_INFO_CGROUPID) && info.cgroupid == 0) {
+	if ((info.mask & PIDFD_INFO_CGROUPID) && info.cgroupid == 0) {
 		ksft_print_msg("cgroupid should not be 0 when PIDFD_INFO_CGROUPID is set\n");
 		goto on_error;
 	}

-45

tools/testing/selftests/pidfd/pidfd_setns_test.c

···
 #include <unistd.h>
 #include <sys/socket.h>
 #include <sys/stat.h>
-#include <linux/ioctl.h>
 
 #include "pidfd.h"
 #include "../kselftest_harness.h"
-
-#ifndef PIDFS_IOCTL_MAGIC
-#define PIDFS_IOCTL_MAGIC 0xFF
-#endif
-
-#ifndef PIDFD_GET_CGROUP_NAMESPACE
-#define PIDFD_GET_CGROUP_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 1)
-#endif
-
-#ifndef PIDFD_GET_IPC_NAMESPACE
-#define PIDFD_GET_IPC_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 2)
-#endif
-
-#ifndef PIDFD_GET_MNT_NAMESPACE
-#define PIDFD_GET_MNT_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 3)
-#endif
-
-#ifndef PIDFD_GET_NET_NAMESPACE
-#define PIDFD_GET_NET_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 4)
-#endif
-
-#ifndef PIDFD_GET_PID_NAMESPACE
-#define PIDFD_GET_PID_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 5)
-#endif
-
-#ifndef PIDFD_GET_PID_FOR_CHILDREN_NAMESPACE
-#define PIDFD_GET_PID_FOR_CHILDREN_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 6)
-#endif
-
-#ifndef PIDFD_GET_TIME_NAMESPACE
-#define PIDFD_GET_TIME_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 7)
-#endif
-
-#ifndef PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE
-#define PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 8)
-#endif
-
-#ifndef PIDFD_GET_USER_NAMESPACE
-#define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9)
-#endif
-
-#ifndef PIDFD_GET_UTS_NAMESPACE
-#define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10)
-#endif
 
 enum {
 	PIDFD_NS_USER,

+66 -14

tools/testing/selftests/pidfd/pidfd_test.c

··· 42 42 #endif 43 43 } 44 44 45 - static int signal_received; 45 + static pthread_t signal_received; 46 46 47 47 static void set_signal_received_on_sigusr1(int sig) 48 48 { 49 49 if (sig == SIGUSR1) 50 - signal_received = 1; 50 + signal_received = pthread_self(); 51 + } 52 + 53 + static int send_signal(int pidfd) 54 + { 55 + int ret = 0; 56 + 57 + if (sys_pidfd_send_signal(pidfd, SIGUSR1, NULL, 0) < 0) { 58 + ret = -EINVAL; 59 + goto exit; 60 + } 61 + 62 + if (signal_received != pthread_self()) { 63 + ret = -EINVAL; 64 + goto exit; 65 + } 66 + 67 + exit: 68 + signal_received = 0; 69 + return ret; 70 + } 71 + 72 + static void *send_signal_worker(void *arg) 73 + { 74 + int pidfd = (int)(intptr_t)arg; 75 + int ret; 76 + 77 + /* We forward any errors for the caller to handle. */ 78 + ret = send_signal(pidfd); 79 + return (void *)(intptr_t)ret; 51 80 } 52 81 53 82 /* ··· 85 56 */ 86 57 static int test_pidfd_send_signal_simple_success(void) 87 58 { 88 - int pidfd, ret; 59 + int pidfd; 89 60 const char *test_name = "pidfd_send_signal send SIGUSR1"; 61 + pthread_t thread; 62 + void *thread_res; 63 + int err; 90 64 91 65 if (!have_pidfd_send_signal) { 92 66 ksft_test_result_skip( ··· 98 66 return 0; 99 67 } 100 68 69 + signal(SIGUSR1, set_signal_received_on_sigusr1); 70 + 71 + /* Try sending a signal to ourselves via /proc/self. 
*/ 101 72 pidfd = open("/proc/self", O_DIRECTORY | O_CLOEXEC); 102 73 if (pidfd < 0) 103 74 ksft_exit_fail_msg( 104 75 "%s test: Failed to open process file descriptor\n", 105 76 test_name); 106 - 107 - signal(SIGUSR1, set_signal_received_on_sigusr1); 108 - 109 - ret = sys_pidfd_send_signal(pidfd, SIGUSR1, NULL, 0); 77 + err = send_signal(pidfd); 78 + if (err) 79 + ksft_exit_fail_msg( 80 + "%s test: Error %d on sending pidfd signal\n", 81 + test_name, err); 110 82 close(pidfd); 111 - if (ret < 0) 112 - ksft_exit_fail_msg("%s test: Failed to send signal\n", 113 - test_name); 114 83 115 - if (signal_received != 1) 116 - ksft_exit_fail_msg("%s test: Failed to receive signal\n", 117 - test_name); 84 + /* Now try the same thing only using PIDFD_SELF_THREAD_GROUP. */ 85 + err = send_signal(PIDFD_SELF_THREAD_GROUP); 86 + if (err) 87 + ksft_exit_fail_msg( 88 + "%s test: Error %d on PIDFD_SELF_THREAD_GROUP signal\n", 89 + test_name, err); 118 90 119 - signal_received = 0; 91 + /* 92 + * Now try the same thing in a thread and assert thread ID is equal to 93 + * worker thread ID. 94 + */ 95 + if (pthread_create(&thread, NULL, send_signal_worker, 96 + (void *)(intptr_t)PIDFD_SELF_THREAD)) 97 + ksft_exit_fail_msg("%s test: Failed to create thread\n", 98 + test_name); 99 + if (pthread_join(thread, &thread_res)) 100 + ksft_exit_fail_msg("%s test: Failed to join thread\n", 101 + test_name); 102 + err = (int)(intptr_t)thread_res; 103 + if (err) 104 + ksft_exit_fail_msg( 105 + "%s test: Error %d on PIDFD_SELF_THREAD signal\n", 106 + test_name, err); 107 + 120 108 ksft_test_result_pass("%s test: Sent signal\n", test_name); 121 109 return 0; 122 110 }
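The reworked pidfd_test.c passes `PIDFD_SELF_THREAD_GROUP` straight to `pidfd_send_signal()` instead of opening a descriptor first. A minimal sketch of the same idea outside the selftest harness follows. `signal_own_process` is a hypothetical helper, and the fallback to `kill(getpid(), sig)` is an assumption made here for kernels or build environments that do not know the sentinel; the merge itself does not prescribe it:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Sentinel value as defined in the pidfd.h hunk of this merge. */
#ifndef PIDFD_SELF_THREAD_GROUP
#define PIDFD_SELF_THREAD_GROUP -20000
#endif

static volatile sig_atomic_t got_sigusr1;

/* Record delivery of SIGUSR1 so the caller can check for it afterwards. */
static void on_sigusr1(int sig)
{
	if (sig == SIGUSR1)
		got_sigusr1 = 1;
}

/*
 * Hypothetical helper: signal the calling process without allocating a
 * pidfd. Returns 0 on success and -1 with errno set on failure.
 */
static int signal_own_process(int sig)
{
#ifdef __NR_pidfd_send_signal
	/* Preferred path: the kernel resolves the sentinel to our thread group. */
	if (syscall(__NR_pidfd_send_signal, PIDFD_SELF_THREAD_GROUP, sig, NULL, 0) == 0)
		return 0;
#endif
	/* Assumed fallback when the sentinel or the syscall is unavailable. */
	return kill(getpid(), sig);
}
```

As with `AT_FDCWD`, the sentinel is never closed, so the caller has no descriptor to clean up.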
