Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'vfs-7.1-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull clone and pidfs updates from Christian Brauner:
"Add three new clone3() flags for pidfd-based process lifecycle
management.

CLONE_AUTOREAP:

CLONE_AUTOREAP makes a child process auto-reap on exit without ever
becoming a zombie. This is a per-process property in contrast to
the existing auto-reap mechanism via SA_NOCLDWAIT or SIG_IGN for
SIGCHLD which applies to all children of a given parent.

Currently the only way to automatically reap children is to set
SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped
property affecting all children which makes it unsuitable for
libraries or applications that need selective auto-reaping of
specific children while still being able to wait() on others.

CLONE_AUTOREAP stores an autoreap flag in the child's
signal_struct. When the child exits do_notify_parent() checks this
flag and causes exit_notify() to transition the task directly to
EXIT_DEAD. Since the flag lives on the child it survives
reparenting: if the original parent exits and the child is
reparented to a subreaper or init the child still auto-reaps when
it eventually exits. This is cleaner than forcing the subreaper to
get SIGCHLD and then reaping it. If the parent doesn't care the
subreaper won't care. If there's a subreaper that would care it
would be easy enough to add a prctl() that either just turns back
on SIGCHLD and turns off auto-reaping or a prctl() that just
notifies the subreaper whenever a child is reparented to it.

CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent
to monitor the child's exit via poll() and retrieve exit status via
PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget
pattern. No exit signal is delivered so exit_signal must be zero.
CLONE_THREAD and CLONE_PARENT are rejected: CLONE_THREAD because
autoreap is a process-level property, and CLONE_PARENT because an
autoreap child reparented via CLONE_PARENT could become an
invisible zombie under a parent that never calls wait().

The flag is not inherited by the autoreap process's own children.
Each child that should be autoreaped must be explicitly created
with CLONE_AUTOREAP.

CLONE_NNP:

CLONE_NNP sets no_new_privs on the child at clone time. Unlike
prctl(PR_SET_NO_NEW_PRIVS) which a process sets on itself,
CLONE_NNP allows the parent to impose no_new_privs on the child at
creation without affecting the parent's own privileges.
CLONE_THREAD is rejected because threads share credentials.
CLONE_NNP is useful on its own for any spawn-and-sandbox pattern
but was specifically introduced to enable unprivileged usage of
CLONE_PIDFD_AUTOKILL.

CLONE_PIDFD_AUTOKILL:

This flag ties a child's lifetime to the pidfd returned from
clone3(). When the last reference to the struct file created by
clone3() is closed the kernel sends SIGKILL to the child. A pidfd
obtained via pidfd_open() for the same process does not keep the
child alive and does not trigger autokill - only the specific
struct file from clone3() has this property. This is useful for
container runtimes, service managers, and sandboxed subprocess
execution - any scenario where the child must die if the parent
crashes or abandons the pidfd or just wants a throwaway helper
process.

CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD and CLONE_AUTOREAP.
It requires CLONE_PIDFD because the whole point is tying the
child's lifetime to the pidfd. It requires CLONE_AUTOREAP because a
killed child with no one to reap it would become a zombie - the
primary use case is the parent crashing or abandoning the pidfd so
no one is around to call waitpid(). CLONE_THREAD is rejected
because autokill targets a process not a thread.

If CLONE_NNP is specified together with CLONE_PIDFD_AUTOKILL an
unprivileged user may spawn a process that is autokilled. The child
cannot escalate privileges via setuid/setgid exec after being
spawned. If CLONE_PIDFD_AUTOKILL is specified without CLONE_NNP the
caller must have CAP_SYS_ADMIN in its user namespace."

* tag 'vfs-7.1-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: check pidfd_info->coredump_code correctness
pidfds: add coredump_code field to pidfd_info
kselftest/coredump: reintroduce null pointer dereference
selftests/pidfd: add CLONE_PIDFD_AUTOKILL tests
selftests/pidfd: add CLONE_NNP tests
selftests/pidfd: add CLONE_AUTOREAP tests
pidfd: add CLONE_PIDFD_AUTOKILL
clone: add CLONE_NNP
clone: add CLONE_AUTOREAP
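
The CLONE_PIDFD plus CLONE_AUTOREAP pattern described in the message can be sketched from user space roughly as follows. This is a minimal sketch, not code from the series: `autoreap_demo()` is a hypothetical helper name, the fallback definition of `CLONE_AUTOREAP` uses the value from the uapi hunk in this merge, and it assumes kernel headers that provide `struct clone_args` and `SYS_clone3`. On a kernel without the flag, clone3() is expected to fail (the selftests treat EINVAL as "not supported"), which the helper reports as 1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/sched.h>   /* struct clone_args, CLONE_PIDFD */
#include <poll.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_AUTOREAP
#define CLONE_AUTOREAP (1ULL << 34) /* value from the uapi hunk in this merge */
#endif

/*
 * Hypothetical helper: spawn a child with CLONE_PIDFD | CLONE_AUTOREAP and
 * watch it exit through the pidfd.
 *
 * Returns 0 if the child exited and left nothing to wait for,
 *         1 if clone3() failed (e.g. the kernel lacks CLONE_AUTOREAP),
 *         2 if the exit was not observed as expected.
 */
static int autoreap_demo(void)
{
	int pidfd = -1;
	struct clone_args args = {
		.flags = CLONE_PIDFD | CLONE_AUTOREAP,
		.pidfd = (uint64_t)(uintptr_t)&pidfd,
		.exit_signal = 0, /* must be zero with CLONE_AUTOREAP */
	};
	struct pollfd pfd;
	long pid;
	int ret = 2;

	pid = syscall(SYS_clone3, &args, sizeof(args));
	if (pid < 0)
		return 1;
	if (pid == 0)
		_exit(42); /* child: exit immediately */

	/* CLONE_PIDFD stores the new pidfd through args.pidfd. */
	assert(pidfd >= 0);

	/* The pidfd becomes readable once the child has exited. */
	pfd.fd = pidfd;
	pfd.events = POLLIN;
	if (poll(&pfd, 1, 5000) == 1 && (pfd.revents & POLLIN)) {
		/* No zombie is expected, so there is nothing to reap. */
		if (waitpid((pid_t)pid, NULL, WNOHANG | __WALL) == -1 &&
		    errno == ECHILD)
			ret = 0;
	}

	close(pidfd);
	return ret;
}
```

A caller would invoke `autoreap_demo()` once and branch on the result. Retrieving the exit status through PIDFD_GET_INFO is left out here so that the sketch builds against older uapi headers.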

+1075 -20
+40 -10
fs/pidfs.c
···
 #include <linux/mount.h>
 #include <linux/pid.h>
 #include <linux/pidfs.h>
+#include <linux/sched/signal.h>
+#include <linux/signal.h>
 #include <linux/pid_namespace.h>
 #include <linux/poll.h>
 #include <linux/proc_fs.h>
···
 	};
 	__u32 coredump_mask;
 	__u32 coredump_signal;
+	__u32 coredump_code;
 };

 static struct rhashtable pidfs_ino_ht;
···
 	PIDFD_INFO_EXIT | \
 	PIDFD_INFO_COREDUMP | \
 	PIDFD_INFO_SUPPORTED_MASK | \
-	PIDFD_INFO_COREDUMP_SIGNAL)
+	PIDFD_INFO_COREDUMP_SIGNAL | \
+	PIDFD_INFO_COREDUMP_CODE)

 static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
 {
···
 	const struct cred *c;
 	__u64 mask;

-	BUILD_BUG_ON(sizeof(struct pidfd_info) != PIDFD_INFO_SIZE_VER2);
+	BUILD_BUG_ON(sizeof(struct pidfd_info) != PIDFD_INFO_SIZE_VER3);

 	if (!uinfo)
 		return -EINVAL;
···
 	if (mask & PIDFD_INFO_COREDUMP) {
 		if (test_bit(PIDFS_ATTR_BIT_COREDUMP, &attr->attr_mask)) {
 			smp_rmb();
-			kinfo.mask |= PIDFD_INFO_COREDUMP | PIDFD_INFO_COREDUMP_SIGNAL;
+			kinfo.mask |= PIDFD_INFO_COREDUMP | PIDFD_INFO_COREDUMP_SIGNAL | PIDFD_INFO_COREDUMP_CODE;
 			kinfo.coredump_mask = attr->coredump_mask;
 			kinfo.coredump_signal = attr->coredump_signal;
+			kinfo.coredump_code = attr->coredump_code;
 		}
 	}

···
 	return open_namespace(ns_common);
 }

+static int pidfs_file_release(struct inode *inode, struct file *file)
+{
+	struct pid *pid = inode->i_private;
+	struct task_struct *task;
+
+	if (!(file->f_flags & PIDFD_AUTOKILL))
+		return 0;
+
+	guard(rcu)();
+	task = pid_task(pid, PIDTYPE_TGID);
+	if (!task)
+		return 0;
+
+	/* Not available for kthreads or user workers for now. */
+	if (WARN_ON_ONCE(task->flags & (PF_KTHREAD | PF_USER_WORKER)))
+		return 0;
+	do_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_TGID);
+	return 0;
+}
+
 static const struct file_operations pidfs_file_operations = {
+	.release = pidfs_file_release,
 	.poll = pidfd_poll,
 #ifdef CONFIG_PROC_FS
 	.show_fdinfo = pidfd_show_fdinfo,
···
 			PIDFD_COREDUMPED;
 		/* If coredumping is set to skip we should never end up here. */
 		VFS_WARN_ON_ONCE(attr->coredump_mask & PIDFD_COREDUMP_SKIP);
-		/* Expose the signal number that caused the coredump. */
+		/* Expose the signal number and code that caused the coredump. */
 		attr->coredump_signal = cprm->siginfo->si_signo;
+		attr->coredump_code = cprm->siginfo->si_code;
 		smp_wmb();
 		set_bit(PIDFS_ATTR_BIT_COREDUMP, &attr->attr_mask);
 	}
···
 	int ret;

 	/*
-	 * Ensure that PIDFD_STALE can be passed as a flag without
-	 * overloading other uapi pidfd flags.
+	 * Ensure that internal pidfd flags don't overlap with each
+	 * other or with uapi pidfd flags.
 	 */
-	BUILD_BUG_ON(PIDFD_STALE == PIDFD_THREAD);
-	BUILD_BUG_ON(PIDFD_STALE == PIDFD_NONBLOCK);
+	BUILD_BUG_ON(hweight32(PIDFD_THREAD | PIDFD_NONBLOCK |
+			       PIDFD_STALE | PIDFD_AUTOKILL) != 4);

 	ret = path_from_stashed(&pid->stashed, pidfs_mnt, get_pid(pid), &path);
 	if (ret < 0)
···
 	flags &= ~PIDFD_STALE;
 	flags |= O_RDWR;
 	pidfd_file = dentry_open(&path, flags, current_cred());
-	/* Raise PIDFD_THREAD explicitly as do_dentry_open() strips it. */
+	/*
+	 * Raise PIDFD_THREAD and PIDFD_AUTOKILL explicitly as
+	 * do_dentry_open() strips O_EXCL and O_TRUNC.
+	 */
 	if (!IS_ERR(pidfd_file))
-		pidfd_file->f_flags |= (flags & (PIDFD_THREAD));
+		pidfd_file->f_flags |= (flags & (PIDFD_THREAD | PIDFD_AUTOKILL));

 	return pidfd_file;
 }
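
The hweight32() check in the fs/pidfs.c diff above relies on a counting argument: if every flag is a single bit, the OR of n flags has exactly n bits set only when no two flags share a bit. A user-space sketch of the same idea follows. The `SK_` names, `popcount32()` and `flags_are_distinct_bits()` are hypothetical; the mapping of PIDFD_THREAD to O_EXCL and PIDFD_NONBLOCK to O_NONBLOCK is an assumption based on the uapi pidfd header, while PIDFD_STALE and PIDFD_AUTOKILL follow the pidfd.h hunk in this merge.

```c
#include <assert.h>
#include <fcntl.h>   /* O_EXCL, O_NONBLOCK, O_TRUNC */
#include <stdint.h>

/* User-space stand-ins for the kernel names (assumed mapping, see lead-in). */
#define SK_PIDFD_THREAD   ((uint32_t)O_EXCL)
#define SK_PIDFD_NONBLOCK ((uint32_t)O_NONBLOCK)
#define SK_PIDFD_STALE    ((uint32_t)0x00001000) /* CLONE_PIDFD */
#define SK_PIDFD_AUTOKILL ((uint32_t)O_TRUNC)

/* Count the set bits in v; plays the role of the kernel's hweight32(). */
static unsigned popcount32(uint32_t v)
{
	unsigned n = 0;

	while (v) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * For single-bit flags: returns 1 when all n flags occupy different bits,
 * 0 when at least two of them collide.
 */
static int flags_are_distinct_bits(const uint32_t *flags, unsigned n)
{
	uint32_t all = 0;
	unsigned i;

	for (i = 0; i < n; i++) {
		assert(popcount32(flags[i]) == 1); /* precondition of the trick */
		all |= flags[i];
	}
	return popcount32(all) == n;
}
```

One popcount comparison replaces the pairwise `==` checks that the old code used, and it keeps working unchanged when another flag is appended to the list.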
+1
include/linux/sched/signal.h
···
 	 */
 	unsigned int is_child_subreaper:1;
 	unsigned int has_child_subreaper:1;
+	unsigned int autoreap:1;

 #ifdef CONFIG_POSIX_TIMERS

+5
include/uapi/linux/pidfd.h
···
 #ifdef __KERNEL__
 #include <linux/sched.h>
 #define PIDFD_STALE CLONE_PIDFD
+#define PIDFD_AUTOKILL O_TRUNC
 #endif

 /* Flags for pidfd_send_signal(). */
···
 #define PIDFD_INFO_COREDUMP (1UL << 4) /* Only returned if requested. */
 #define PIDFD_INFO_SUPPORTED_MASK (1UL << 5) /* Want/got supported mask flags */
 #define PIDFD_INFO_COREDUMP_SIGNAL (1UL << 6) /* Always returned if PIDFD_INFO_COREDUMP is requested. */
+#define PIDFD_INFO_COREDUMP_CODE (1UL << 7) /* Always returned if PIDFD_INFO_COREDUMP is requested. */

 #define PIDFD_INFO_SIZE_VER0 64 /* sizeof first published struct */
 #define PIDFD_INFO_SIZE_VER1 72 /* sizeof second published struct */
 #define PIDFD_INFO_SIZE_VER2 80 /* sizeof third published struct */
+#define PIDFD_INFO_SIZE_VER3 88 /* sizeof fourth published struct */

 /*
  * Values for @coredump_mask in pidfd_info.
···
 	struct /* coredump info */ {
 		__u32 coredump_mask;
 		__u32 coredump_signal;
+		__u32 coredump_code;
+		__u32 coredump_pad; /* align supported_mask to 8 bytes */
 	};
 	__u64 supported_mask; /* Mask flags that this kernel supports */
 };
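
The explicit `coredump_pad` member and the step from 80 to 88 bytes can be checked with a small layout sketch. `struct tail_sketch` is a hypothetical cut-down record that mirrors only the tail of `struct pidfd_info` shown in the hunk above; it is not the real uapi structure, and the `SK_` size constants simply repeat the values from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PIDFD_INFO_SIZE_VER2 80 /* value from the hunk above */
#define SK_PIDFD_INFO_SIZE_VER3 88 /* value from the hunk above */

/* Hypothetical stand-in for the tail of struct pidfd_info. */
struct tail_sketch {
	uint32_t coredump_mask;
	uint32_t coredump_signal;
	uint32_t coredump_code;
	uint32_t coredump_pad;   /* explicit padding, as in the diff */
	uint64_t supported_mask; /* lands on an 8-byte boundary */
};

/* With the explicit pad, the layout does not depend on ABI padding rules. */
static_assert(offsetof(struct tail_sketch, supported_mask) == 16,
	      "supported_mask should start at a multiple of 8");
static_assert(sizeof(struct tail_sketch) == 24, "no hidden padding expected");

/* Bytes added by the two new 32-bit members. */
static size_t coredump_growth(void)
{
	return sizeof(uint32_t) /* coredump_code */ +
	       sizeof(uint32_t) /* coredump_pad */;
}
```

Without the pad member, an ABI that aligns 64-bit integers to 4 bytes could place `supported_mask` directly after `coredump_code`, so the structure size would differ between architectures. Spelling the padding out avoids that.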
+5 -2
include/uapi/linux/sched.h
···
 #define CLONE_IO 0x80000000 /* Clone io context */

 /* Flags for the clone3() syscall. */
-#define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
-#define CLONE_INTO_CGROUP 0x200000000ULL /* Clone into a specific cgroup given the right permissions. */
+#define CLONE_CLEAR_SIGHAND (1ULL << 32) /* Clear any signal handler and reset to SIG_DFL. */
+#define CLONE_INTO_CGROUP (1ULL << 33) /* Clone into a specific cgroup given the right permissions. */
+#define CLONE_AUTOREAP (1ULL << 34) /* Auto-reap child on exit. */
+#define CLONE_NNP (1ULL << 35) /* Set no_new_privs on child. */
+#define CLONE_PIDFD_AUTOKILL (1ULL << 36) /* Kill child when clone pidfd closes. */

 /*
  * cloning flags intersect with CSIGNAL so can be used with unshare and clone3
+49 -3
kernel/fork.c
···
 		return ERR_PTR(-EINVAL);
 	}

+	if (clone_flags & CLONE_AUTOREAP) {
+		if (clone_flags & CLONE_THREAD)
+			return ERR_PTR(-EINVAL);
+		if (clone_flags & CLONE_PARENT)
+			return ERR_PTR(-EINVAL);
+		if (args->exit_signal)
+			return ERR_PTR(-EINVAL);
+	}
+
+	if ((clone_flags & CLONE_PARENT) && current->signal->autoreap)
+		return ERR_PTR(-EINVAL);
+
+	if (clone_flags & CLONE_NNP) {
+		if (clone_flags & CLONE_THREAD)
+			return ERR_PTR(-EINVAL);
+	}
+
+	if (clone_flags & CLONE_PIDFD_AUTOKILL) {
+		if (!(clone_flags & CLONE_PIDFD))
+			return ERR_PTR(-EINVAL);
+		if (!(clone_flags & CLONE_AUTOREAP))
+			return ERR_PTR(-EINVAL);
+		if (clone_flags & CLONE_THREAD)
+			return ERR_PTR(-EINVAL);
+		/*
+		 * Without CLONE_NNP the child could escalate privileges
+		 * after being spawned, so require CAP_SYS_ADMIN.
+		 * With CLONE_NNP the child can't gain new privileges,
+		 * so allow unprivileged usage.
+		 */
+		if (!(clone_flags & CLONE_NNP) &&
+		    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+			return ERR_PTR(-EPERM);
+	}
+
 	/*
 	 * Force any signals received before this point to be delivered
 	 * before the fork happens. Collect up signals sent to multiple
···
 	 * if the fd table isn't shared).
 	 */
 	if (clone_flags & CLONE_PIDFD) {
-		int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+		unsigned flags = PIDFD_STALE;
+
+		if (clone_flags & CLONE_THREAD)
+			flags |= PIDFD_THREAD;
+		if (clone_flags & CLONE_PIDFD_AUTOKILL)
+			flags |= PIDFD_AUTOKILL;

 		/*
 		 * Note that no task has been attached to @pid yet indicate
 		 * that via CLONE_PIDFD.
 		 */
-		retval = pidfd_prepare(pid, flags | PIDFD_STALE, &pidfile);
+		retval = pidfd_prepare(pid, flags, &pidfile);
 		if (retval < 0)
 			goto bad_fork_free_pid;
 		pidfd = retval;
···
 	 */
 	copy_seccomp(p);

+	if (clone_flags & CLONE_NNP)
+		task_set_no_new_privs(p);
+
 	init_task_pid_links(p);
 	if (likely(p->pid)) {
 		ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);
···
 			 */
 			p->signal->has_child_subreaper = p->real_parent->signal->has_child_subreaper ||
 							 p->real_parent->signal->is_child_subreaper;
+			if (clone_flags & CLONE_AUTOREAP)
+				p->signal->autoreap = 1;
 			list_add_tail(&p->sibling, &p->real_parent->children);
 			list_add_tail_rcu(&p->tasks, &init_task.tasks);
 			attach_pid(p, PIDTYPE_TGID);
···
 {
 	/* Verify that no unknown flags are passed along. */
 	if (kargs->flags &
-	    ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND | CLONE_INTO_CGROUP))
+	    ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND | CLONE_INTO_CGROUP |
+	      CLONE_AUTOREAP | CLONE_NNP | CLONE_PIDFD_AUTOKILL))
 		return false;

 	/*
+2 -1
kernel/ptrace.c
···
 	if (!dead && thread_group_empty(p)) {
 		if (!same_thread_group(p->real_parent, tracer))
 			dead = do_notify_parent(p, p->exit_signal);
-		else if (ignoring_children(tracer->sighand)) {
+		else if (ignoring_children(tracer->sighand) ||
+			 p->signal->autoreap) {
 			__wake_up_parent(p, tracer);
 			dead = true;
 		}
+4
kernel/signal.c
···
 		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
 			sig = 0;
 	}
+	if (!tsk->ptrace && tsk->signal->autoreap) {
+		autoreap = true;
+		sig = 0;
+	}
 	/*
 	 * Send with __send_signal as si_pid and si_uid are in the
 	 * parent's namespaces.
+26
tools/testing/selftests/coredump/coredump_socket_protocol_test.c
···
  *
  * Verify that when using socket-based coredump protocol,
  * the coredump_signal field is correctly exposed as SIGSEGV.
+ * Also check that the coredump_code field is correctly exposed
+ * as SEGV_MAPERR.
  */
 TEST_F(coredump, socket_coredump_signal_sigsegv)
 {
···
 			goto out;
 		}

+		/* Verify coredump_code is available and correct */
+		if (!(info.mask & PIDFD_INFO_COREDUMP_CODE)) {
+			fprintf(stderr, "socket_coredump_signal_sigsegv: PIDFD_INFO_COREDUMP_CODE not set in mask\n");
+			goto out;
+		}
+
+		if (info.coredump_code != SEGV_MAPERR) {
+			fprintf(stderr, "socket_coredump_signal_sigsegv: coredump_code=%d, expected SEGV_MAPERR=%d\n",
+				info.coredump_code, SEGV_MAPERR);
+			goto out;
+		}
+
 		if (!read_coredump_req(fd_coredump, &req)) {
 			fprintf(stderr, "socket_coredump_signal_sigsegv: read_coredump_req failed\n");
 			goto out;
···
 	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP));
 	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP_SIGNAL));
 	ASSERT_EQ(info.coredump_signal, SIGSEGV);
+	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP_CODE));
+	ASSERT_EQ(info.coredump_code, SEGV_MAPERR);

 	wait_and_check_coredump_server(pid_coredump_server, _metadata, self);
 }
···
  *
  * Verify that when using socket-based coredump protocol,
  * the coredump_signal field is correctly exposed as SIGABRT.
+ * Also check that the coredump_code field is correctly exposed
+ * as SI_TKILL.
  */
 TEST_F(coredump, socket_coredump_signal_sigabrt)
 {
···
 			goto out;
 		}

+		if (info.coredump_code != SI_TKILL) {
+			fprintf(stderr, "socket_coredump_signal_sigabrt: coredump_code=%d, expected SI_TKILL=%d\n",
+				info.coredump_code, SI_TKILL);
+			goto out;
+		}
+
 		if (!read_coredump_req(fd_coredump, &req)) {
 			fprintf(stderr, "socket_coredump_signal_sigabrt: read_coredump_req failed\n");
 			goto out;
···
 	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP));
 	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP_SIGNAL));
 	ASSERT_EQ(info.coredump_signal, SIGABRT);
+	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP_CODE));
+	ASSERT_EQ(info.coredump_code, SI_TKILL);

 	wait_and_check_coredump_server(pid_coredump_server, _metadata, self);
 }
+32
tools/testing/selftests/coredump/coredump_socket_test.c
···
  *
  * Verify that when using simple socket-based coredump (@ pattern),
  * the coredump_signal field is correctly exposed as SIGSEGV.
+ * Also check that the coredump_code field is correctly exposed
+ * as SEGV_MAPERR.
  */
 TEST_F(coredump, socket_coredump_signal_sigsegv)
 {
···
 			goto out;
 		}

+		/* Verify coredump_code is available and correct */
+		if (!(info.mask & PIDFD_INFO_COREDUMP_CODE)) {
+			fprintf(stderr, "socket_coredump_signal_sigsegv: PIDFD_INFO_COREDUMP_CODE not set in mask\n");
+			goto out;
+		}
+
+		if (info.coredump_code != SEGV_MAPERR) {
+			fprintf(stderr, "socket_coredump_signal_sigsegv: coredump_code=%d, expected SEGV_MAPERR=%d\n",
+				info.coredump_code, SEGV_MAPERR);
+			goto out;
+		}
+
 		fd_core_file = open_coredump_tmpfile(self->fd_tmpfs_detached);
 		if (fd_core_file < 0) {
 			fprintf(stderr, "socket_coredump_signal_sigsegv: open_coredump_tmpfile failed: %m\n");
···
 	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP));
 	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP_SIGNAL));
 	ASSERT_EQ(info.coredump_signal, SIGSEGV);
+	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP_CODE));
+	ASSERT_EQ(info.coredump_code, SEGV_MAPERR);

 	wait_and_check_coredump_server(pid_coredump_server, _metadata, self);
 }
···
  *
  * Verify that when using simple socket-based coredump (@ pattern),
  * the coredump_signal field is correctly exposed as SIGABRT.
+ * Also check that the coredump_code field is correctly exposed
+ * as SI_TKILL.
  */
 TEST_F(coredump, socket_coredump_signal_sigabrt)
 {
···
 			goto out;
 		}

+		/* Verify coredump_code is available and correct */
+		if (!(info.mask & PIDFD_INFO_COREDUMP_CODE)) {
+			fprintf(stderr, "socket_coredump_signal_sigabrt: PIDFD_INFO_COREDUMP_CODE not set in mask\n");
+			goto out;
+		}
+
+		if (info.coredump_code != SI_TKILL) {
+			fprintf(stderr, "socket_coredump_signal_sigabrt: coredump_code=%d, expected SI_TKILL=%d\n",
+				info.coredump_code, SI_TKILL);
+			goto out;
+		}
+
 		fd_core_file = open_coredump_tmpfile(self->fd_tmpfs_detached);
 		if (fd_core_file < 0) {
 			fprintf(stderr, "socket_coredump_signal_sigabrt: open_coredump_tmpfile failed: %m\n");
···
 	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP));
 	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP_SIGNAL));
 	ASSERT_EQ(info.coredump_signal, SIGABRT);
+	ASSERT_TRUE(!!(info.mask & PIDFD_INFO_COREDUMP_CODE));
+	ASSERT_EQ(info.coredump_code, SI_TKILL);

 	wait_and_check_coredump_server(pid_coredump_server, _metadata, self);
 }
+3 -3
tools/testing/selftests/coredump/coredump_test_helpers.c
···
 	pthread_create(&thread, NULL, do_nothing, NULL);

 	/* crash on purpose */
-	__builtin_trap();
+	i = *(volatile int *)NULL;
 }

 int create_detached_tmpfs(void)
···
 		fprintf(stderr, "get_pidfd_info: ioctl(PIDFD_GET_INFO) failed: %m\n");
 		return false;
 	}
-	fprintf(stderr, "get_pidfd_info: mask=0x%llx, coredump_mask=0x%x, coredump_signal=%d\n",
-		(unsigned long long)info->mask, info->coredump_mask, info->coredump_signal);
+	fprintf(stderr, "get_pidfd_info: mask=0x%llx, coredump_mask=0x%x, coredump_signal=%d, coredump_code=%d\n",
+		(unsigned long long)info->mask, info->coredump_mask, info->coredump_signal, info->coredump_code);
 	return true;
 }

+1
tools/testing/selftests/pidfd/.gitignore
···
 pidfd_exec_helper
 pidfd_xattr_test
 pidfd_setattr_test
+pidfd_autoreap_test
+1 -1
tools/testing/selftests/pidfd/Makefile
···
 TEST_GEN_PROGS := pidfd_test pidfd_fdinfo_test pidfd_open_test \
 	pidfd_poll_test pidfd_wait pidfd_getfd_test pidfd_setns_test \
 	pidfd_file_handle_test pidfd_bind_mount pidfd_info_test \
-	pidfd_xattr_test pidfd_setattr_test
+	pidfd_xattr_test pidfd_setattr_test pidfd_autoreap_test

 TEST_GEN_PROGS_EXTENDED := pidfd_exec_helper

+5
tools/testing/selftests/pidfd/pidfd.h
···
 #define PIDFD_INFO_COREDUMP_SIGNAL (1UL << 6)
 #endif

+#ifndef PIDFD_INFO_COREDUMP_CODE
+#define PIDFD_INFO_COREDUMP_CODE (1UL << 7)
+#endif
+
 #ifndef PIDFD_COREDUMPED
 #define PIDFD_COREDUMPED (1U << 0) /* Did crash and... */
 #endif
···
 	struct {
 		__u32 coredump_mask;
 		__u32 coredump_signal;
+		__u32 coredump_code;
 	};
 	__u64 supported_mask;
 };
+900
tools/testing/selftests/pidfd/pidfd_autoreap_test.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2026 Christian Brauner <brauner@kernel.org> 3 + 4 + #define _GNU_SOURCE 5 + #include <errno.h> 6 + #include <linux/types.h> 7 + #include <poll.h> 8 + #include <pthread.h> 9 + #include <sched.h> 10 + #include <signal.h> 11 + #include <stdio.h> 12 + #include <stdlib.h> 13 + #include <string.h> 14 + #include <syscall.h> 15 + #include <sys/ioctl.h> 16 + #include <sys/prctl.h> 17 + #include <sys/socket.h> 18 + #include <sys/types.h> 19 + #include <sys/wait.h> 20 + #include <unistd.h> 21 + 22 + #include "pidfd.h" 23 + #include "kselftest_harness.h" 24 + 25 + #ifndef CLONE_AUTOREAP 26 + #define CLONE_AUTOREAP (1ULL << 34) 27 + #endif 28 + 29 + #ifndef CLONE_NNP 30 + #define CLONE_NNP (1ULL << 35) 31 + #endif 32 + 33 + #ifndef CLONE_PIDFD_AUTOKILL 34 + #define CLONE_PIDFD_AUTOKILL (1ULL << 36) 35 + #endif 36 + 37 + #ifndef _LINUX_CAPABILITY_VERSION_3 38 + #define _LINUX_CAPABILITY_VERSION_3 0x20080522 39 + #endif 40 + 41 + struct cap_header { 42 + __u32 version; 43 + int pid; 44 + }; 45 + 46 + struct cap_data { 47 + __u32 effective; 48 + __u32 permitted; 49 + __u32 inheritable; 50 + }; 51 + 52 + static int drop_all_caps(void) 53 + { 54 + struct cap_header hdr = { .version = _LINUX_CAPABILITY_VERSION_3 }; 55 + struct cap_data data[2] = {}; 56 + 57 + return syscall(__NR_capset, &hdr, data); 58 + } 59 + 60 + static pid_t create_autoreap_child(int *pidfd) 61 + { 62 + struct __clone_args args = { 63 + .flags = CLONE_PIDFD | CLONE_AUTOREAP, 64 + .exit_signal = 0, 65 + .pidfd = ptr_to_u64(pidfd), 66 + }; 67 + 68 + return sys_clone3(&args, sizeof(args)); 69 + } 70 + 71 + /* 72 + * Test that CLONE_AUTOREAP works without CLONE_PIDFD (fire-and-forget). 
73 + */ 74 + TEST(autoreap_without_pidfd) 75 + { 76 + struct __clone_args args = { 77 + .flags = CLONE_AUTOREAP, 78 + .exit_signal = 0, 79 + }; 80 + pid_t pid; 81 + int ret; 82 + 83 + pid = sys_clone3(&args, sizeof(args)); 84 + if (pid < 0 && errno == EINVAL) 85 + SKIP(return, "CLONE_AUTOREAP not supported"); 86 + ASSERT_GE(pid, 0); 87 + 88 + if (pid == 0) 89 + _exit(0); 90 + 91 + /* 92 + * Give the child a moment to exit and be autoreaped. 93 + * Then verify no zombie remains. 94 + */ 95 + usleep(200000); 96 + ret = waitpid(pid, NULL, WNOHANG); 97 + ASSERT_EQ(ret, -1); 98 + ASSERT_EQ(errno, ECHILD); 99 + } 100 + 101 + /* 102 + * Test that CLONE_AUTOREAP with a non-zero exit_signal fails. 103 + */ 104 + TEST(autoreap_rejects_exit_signal) 105 + { 106 + struct __clone_args args = { 107 + .flags = CLONE_AUTOREAP, 108 + .exit_signal = SIGCHLD, 109 + }; 110 + pid_t pid; 111 + 112 + pid = sys_clone3(&args, sizeof(args)); 113 + ASSERT_EQ(pid, -1); 114 + ASSERT_EQ(errno, EINVAL); 115 + } 116 + 117 + /* 118 + * Test that CLONE_AUTOREAP with CLONE_PARENT fails. 119 + */ 120 + TEST(autoreap_rejects_parent) 121 + { 122 + struct __clone_args args = { 123 + .flags = CLONE_AUTOREAP | CLONE_PARENT, 124 + .exit_signal = 0, 125 + }; 126 + pid_t pid; 127 + 128 + pid = sys_clone3(&args, sizeof(args)); 129 + ASSERT_EQ(pid, -1); 130 + ASSERT_EQ(errno, EINVAL); 131 + } 132 + 133 + /* 134 + * Test that CLONE_AUTOREAP with CLONE_THREAD fails. 
135 + */ 136 + TEST(autoreap_rejects_thread) 137 + { 138 + struct __clone_args args = { 139 + .flags = CLONE_AUTOREAP | CLONE_THREAD | 140 + CLONE_SIGHAND | CLONE_VM, 141 + .exit_signal = 0, 142 + }; 143 + pid_t pid; 144 + 145 + pid = sys_clone3(&args, sizeof(args)); 146 + ASSERT_EQ(pid, -1); 147 + ASSERT_EQ(errno, EINVAL); 148 + } 149 + 150 + /* 151 + * Basic test: create an autoreap child, let it exit, verify: 152 + * - pidfd becomes readable (poll returns POLLIN) 153 + * - PIDFD_GET_INFO returns the correct exit code 154 + * - waitpid() returns -1/ECHILD (no zombie) 155 + */ 156 + TEST(autoreap_basic) 157 + { 158 + struct pidfd_info info = { .mask = PIDFD_INFO_EXIT }; 159 + int pidfd = -1, ret; 160 + struct pollfd pfd; 161 + pid_t pid; 162 + 163 + pid = create_autoreap_child(&pidfd); 164 + if (pid < 0 && errno == EINVAL) 165 + SKIP(return, "CLONE_AUTOREAP not supported"); 166 + ASSERT_GE(pid, 0); 167 + 168 + if (pid == 0) 169 + _exit(42); 170 + 171 + ASSERT_GE(pidfd, 0); 172 + 173 + /* Wait for the child to exit via pidfd poll. */ 174 + pfd.fd = pidfd; 175 + pfd.events = POLLIN; 176 + ret = poll(&pfd, 1, 5000); 177 + ASSERT_EQ(ret, 1); 178 + ASSERT_TRUE(pfd.revents & POLLIN); 179 + 180 + /* Verify exit info via PIDFD_GET_INFO. */ 181 + ret = ioctl(pidfd, PIDFD_GET_INFO, &info); 182 + ASSERT_EQ(ret, 0); 183 + ASSERT_TRUE(info.mask & PIDFD_INFO_EXIT); 184 + /* 185 + * exit_code is in waitpid format: for _exit(42), 186 + * WIFEXITED is true and WEXITSTATUS is 42. 187 + */ 188 + ASSERT_TRUE(WIFEXITED(info.exit_code)); 189 + ASSERT_EQ(WEXITSTATUS(info.exit_code), 42); 190 + 191 + /* Verify no zombie: waitpid should fail with ECHILD. */ 192 + ret = waitpid(pid, NULL, WNOHANG); 193 + ASSERT_EQ(ret, -1); 194 + ASSERT_EQ(errno, ECHILD); 195 + 196 + close(pidfd); 197 + } 198 + 199 + /* 200 + * Test that an autoreap child killed by a signal reports 201 + * the correct exit info. 
202 + */ 203 + TEST(autoreap_signaled) 204 + { 205 + struct pidfd_info info = { .mask = PIDFD_INFO_EXIT }; 206 + int pidfd = -1, ret; 207 + struct pollfd pfd; 208 + pid_t pid; 209 + 210 + pid = create_autoreap_child(&pidfd); 211 + if (pid < 0 && errno == EINVAL) 212 + SKIP(return, "CLONE_AUTOREAP not supported"); 213 + ASSERT_GE(pid, 0); 214 + 215 + if (pid == 0) { 216 + pause(); 217 + _exit(1); 218 + } 219 + 220 + ASSERT_GE(pidfd, 0); 221 + 222 + /* Kill the child. */ 223 + ret = sys_pidfd_send_signal(pidfd, SIGKILL, NULL, 0); 224 + ASSERT_EQ(ret, 0); 225 + 226 + /* Wait for exit via pidfd. */ 227 + pfd.fd = pidfd; 228 + pfd.events = POLLIN; 229 + ret = poll(&pfd, 1, 5000); 230 + ASSERT_EQ(ret, 1); 231 + ASSERT_TRUE(pfd.revents & POLLIN); 232 + 233 + /* Verify signal info. */ 234 + ret = ioctl(pidfd, PIDFD_GET_INFO, &info); 235 + ASSERT_EQ(ret, 0); 236 + ASSERT_TRUE(info.mask & PIDFD_INFO_EXIT); 237 + ASSERT_TRUE(WIFSIGNALED(info.exit_code)); 238 + ASSERT_EQ(WTERMSIG(info.exit_code), SIGKILL); 239 + 240 + /* No zombie. */ 241 + ret = waitpid(pid, NULL, WNOHANG); 242 + ASSERT_EQ(ret, -1); 243 + ASSERT_EQ(errno, ECHILD); 244 + 245 + close(pidfd); 246 + } 247 + 248 + /* 249 + * Test autoreap survives reparenting: middle process creates an 250 + * autoreap grandchild, then exits. The grandchild gets reparented 251 + * to us (the grandparent, which is a subreaper). When the grandchild 252 + * exits, it should still be autoreaped - no zombie under us. 253 + */ 254 + TEST(autoreap_reparent) 255 + { 256 + int ipc_sockets[2], ret; 257 + int pidfd = -1; 258 + struct pollfd pfd; 259 + pid_t mid_pid, grandchild_pid; 260 + char buf[32] = {}; 261 + 262 + /* Make ourselves a subreaper so reparented children come to us. 
*/ 263 + ret = prctl(PR_SET_CHILD_SUBREAPER, 1); 264 + ASSERT_EQ(ret, 0); 265 + 266 + ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets); 267 + ASSERT_EQ(ret, 0); 268 + 269 + mid_pid = fork(); 270 + ASSERT_GE(mid_pid, 0); 271 + 272 + if (mid_pid == 0) { 273 + /* Middle child: create an autoreap grandchild. */ 274 + int gc_pidfd = -1; 275 + 276 + close(ipc_sockets[0]); 277 + 278 + grandchild_pid = create_autoreap_child(&gc_pidfd); 279 + if (grandchild_pid < 0) { 280 + write_nointr(ipc_sockets[1], "E", 1); 281 + close(ipc_sockets[1]); 282 + _exit(1); 283 + } 284 + 285 + if (grandchild_pid == 0) { 286 + /* Grandchild: wait for signal to exit. */ 287 + close(ipc_sockets[1]); 288 + if (gc_pidfd >= 0) 289 + close(gc_pidfd); 290 + pause(); 291 + _exit(0); 292 + } 293 + 294 + /* Send grandchild PID to grandparent. */ 295 + snprintf(buf, sizeof(buf), "%d", grandchild_pid); 296 + write_nointr(ipc_sockets[1], buf, strlen(buf)); 297 + close(ipc_sockets[1]); 298 + if (gc_pidfd >= 0) 299 + close(gc_pidfd); 300 + 301 + /* Middle child exits, grandchild gets reparented. */ 302 + _exit(0); 303 + } 304 + 305 + close(ipc_sockets[1]); 306 + 307 + /* Read grandchild's PID. */ 308 + ret = read_nointr(ipc_sockets[0], buf, sizeof(buf) - 1); 309 + close(ipc_sockets[0]); 310 + ASSERT_GT(ret, 0); 311 + 312 + if (buf[0] == 'E') { 313 + waitpid(mid_pid, NULL, 0); 314 + prctl(PR_SET_CHILD_SUBREAPER, 0); 315 + SKIP(return, "CLONE_AUTOREAP not supported"); 316 + } 317 + 318 + grandchild_pid = atoi(buf); 319 + ASSERT_GT(grandchild_pid, 0); 320 + 321 + /* Wait for the middle child to exit. */ 322 + ret = waitpid(mid_pid, NULL, 0); 323 + ASSERT_EQ(ret, mid_pid); 324 + 325 + /* 326 + * Now the grandchild is reparented to us (subreaper). 327 + * Open a pidfd for the grandchild and kill it. 
328 + */ 329 + pidfd = sys_pidfd_open(grandchild_pid, 0); 330 + ASSERT_GE(pidfd, 0); 331 + 332 + ret = sys_pidfd_send_signal(pidfd, SIGKILL, NULL, 0); 333 + ASSERT_EQ(ret, 0); 334 + 335 + /* Wait for it to exit via pidfd poll. */ 336 + pfd.fd = pidfd; 337 + pfd.events = POLLIN; 338 + ret = poll(&pfd, 1, 5000); 339 + ASSERT_EQ(ret, 1); 340 + ASSERT_TRUE(pfd.revents & POLLIN); 341 + 342 + /* 343 + * The grandchild should have been autoreaped even though 344 + * we (the new parent) haven't set SA_NOCLDWAIT. 345 + * waitpid should return -1/ECHILD. 346 + */ 347 + ret = waitpid(grandchild_pid, NULL, WNOHANG); 348 + EXPECT_EQ(ret, -1); 349 + EXPECT_EQ(errno, ECHILD); 350 + 351 + close(pidfd); 352 + 353 + /* Clean up subreaper status. */ 354 + prctl(PR_SET_CHILD_SUBREAPER, 0); 355 + } 356 + 357 + static int thread_sock_fd; 358 + 359 + static void *thread_func(void *arg) 360 + { 361 + /* Signal parent we're running. */ 362 + write_nointr(thread_sock_fd, "1", 1); 363 + 364 + /* Give main thread time to call _exit() first. */ 365 + usleep(200000); 366 + 367 + return NULL; 368 + } 369 + 370 + /* 371 + * Test that an autoreap child with multiple threads is properly 372 + * autoreaped only after all threads have exited. 373 + */ 374 + TEST(autoreap_multithreaded) 375 + { 376 + struct pidfd_info info = { .mask = PIDFD_INFO_EXIT }; 377 + int ipc_sockets[2], ret; 378 + int pidfd = -1; 379 + struct pollfd pfd; 380 + pid_t pid; 381 + char c; 382 + 383 + ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets); 384 + ASSERT_EQ(ret, 0); 385 + 386 + pid = create_autoreap_child(&pidfd); 387 + if (pid < 0 && errno == EINVAL) { 388 + close(ipc_sockets[0]); 389 + close(ipc_sockets[1]); 390 + SKIP(return, "CLONE_AUTOREAP not supported"); 391 + } 392 + ASSERT_GE(pid, 0); 393 + 394 + if (pid == 0) { 395 + pthread_t thread; 396 + 397 + close(ipc_sockets[0]); 398 + 399 + /* 400 + * Create a sub-thread that outlives the main thread. 
+		 * The thread signals readiness, then sleeps.
+		 * The main thread waits briefly, then calls _exit().
+		 */
+		thread_sock_fd = ipc_sockets[1];
+		pthread_create(&thread, NULL, thread_func, NULL);
+		pthread_detach(thread);
+
+		/* Wait for thread to be running. */
+		usleep(100000);
+
+		/* Main thread exits; sub-thread is still alive. */
+		_exit(99);
+	}
+
+	close(ipc_sockets[1]);
+
+	/* Wait for the sub-thread to signal readiness. */
+	ret = read_nointr(ipc_sockets[0], &c, 1);
+	close(ipc_sockets[0]);
+	ASSERT_EQ(ret, 1);
+
+	/* Wait for the process to fully exit via pidfd poll. */
+	pfd.fd = pidfd;
+	pfd.events = POLLIN;
+	ret = poll(&pfd, 1, 5000);
+	ASSERT_EQ(ret, 1);
+	ASSERT_TRUE(pfd.revents & POLLIN);
+
+	/* Verify exit info. */
+	ret = ioctl(pidfd, PIDFD_GET_INFO, &info);
+	ASSERT_EQ(ret, 0);
+	ASSERT_TRUE(info.mask & PIDFD_INFO_EXIT);
+	ASSERT_TRUE(WIFEXITED(info.exit_code));
+	ASSERT_EQ(WEXITSTATUS(info.exit_code), 99);
+
+	/* No zombie. */
+	ret = waitpid(pid, NULL, WNOHANG);
+	ASSERT_EQ(ret, -1);
+	ASSERT_EQ(errno, ECHILD);
+
+	close(pidfd);
+}
+
+/*
+ * Test that autoreap is NOT inherited by grandchildren.
+ */
+TEST(autoreap_no_inherit)
+{
+	int ipc_sockets[2], ret;
+	int pidfd = -1;
+	pid_t pid;
+	char buf[2] = {};
+	struct pollfd pfd;
+
+	ret = socketpair(AF_LOCAL, SOCK_STREAM | SOCK_CLOEXEC, 0, ipc_sockets);
+	ASSERT_EQ(ret, 0);
+
+	pid = create_autoreap_child(&pidfd);
+	if (pid < 0 && errno == EINVAL) {
+		close(ipc_sockets[0]);
+		close(ipc_sockets[1]);
+		SKIP(return, "CLONE_AUTOREAP not supported");
+	}
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		pid_t gc;
+		int status;
+
+		close(ipc_sockets[0]);
+
+		/* Autoreap child forks a grandchild (without autoreap). */
+		gc = fork();
+		if (gc < 0) {
+			write_nointr(ipc_sockets[1], "E", 1);
+			_exit(1);
+		}
+		if (gc == 0) {
+			/* Grandchild: exit immediately. */
+			close(ipc_sockets[1]);
+			_exit(77);
+		}
+
+		/*
+		 * The grandchild should become a regular zombie
+		 * since it was NOT created with CLONE_AUTOREAP.
+		 * Wait for it to verify.
+		 */
+		ret = waitpid(gc, &status, 0);
+		if (ret == gc && WIFEXITED(status) &&
+		    WEXITSTATUS(status) == 77) {
+			write_nointr(ipc_sockets[1], "P", 1);
+		} else {
+			write_nointr(ipc_sockets[1], "F", 1);
+		}
+		close(ipc_sockets[1]);
+		_exit(0);
+	}
+
+	close(ipc_sockets[1]);
+
+	ret = read_nointr(ipc_sockets[0], buf, 1);
+	close(ipc_sockets[0]);
+	ASSERT_EQ(ret, 1);
+
+	/*
+	 * 'P' means the autoreap child was able to waitpid() its
+	 * grandchild (correct - grandchild should be a normal zombie,
+	 * not autoreaped).
+	 */
+	ASSERT_EQ(buf[0], 'P');
+
+	/* Wait for the autoreap child to exit. */
+	pfd.fd = pidfd;
+	pfd.events = POLLIN;
+	ret = poll(&pfd, 1, 5000);
+	ASSERT_EQ(ret, 1);
+
+	/* Autoreap child itself should be autoreaped. */
+	ret = waitpid(pid, NULL, WNOHANG);
+	ASSERT_EQ(ret, -1);
+	ASSERT_EQ(errno, ECHILD);
+
+	close(pidfd);
+}
+
+/*
+ * Test that CLONE_NNP sets no_new_privs on the child.
+ * The child checks via prctl(PR_GET_NO_NEW_PRIVS) and reports back.
+ * The parent must NOT have no_new_privs set afterwards.
+ */
+TEST(nnp_sets_no_new_privs)
+{
+	struct __clone_args args = {
+		.flags = CLONE_PIDFD | CLONE_AUTOREAP | CLONE_NNP,
+		.exit_signal = 0,
+	};
+	struct pidfd_info info = { .mask = PIDFD_INFO_EXIT };
+	int pidfd = -1, ret;
+	struct pollfd pfd;
+	pid_t pid;
+
+	/* Ensure parent does not already have no_new_privs. */
+	ret = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
+	ASSERT_EQ(ret, 0) {
+		TH_LOG("Parent already has no_new_privs set, cannot run test");
+	}
+
+	args.pidfd = ptr_to_u64(&pidfd);
+
+	pid = sys_clone3(&args, sizeof(args));
+	if (pid < 0 && errno == EINVAL)
+		SKIP(return, "CLONE_NNP not supported");
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		/*
+		 * Child: check no_new_privs. Exit 0 if set, 1 if not.
+		 */
+		ret = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
+		_exit(ret == 1 ? 0 : 1);
+	}
+
+	ASSERT_GE(pidfd, 0);
+
+	/* Parent must still NOT have no_new_privs. */
+	ret = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
+	ASSERT_EQ(ret, 0) {
+		TH_LOG("Parent got no_new_privs after creating CLONE_NNP child");
+	}
+
+	/* Wait for child to exit. */
+	pfd.fd = pidfd;
+	pfd.events = POLLIN;
+	ret = poll(&pfd, 1, 5000);
+	ASSERT_EQ(ret, 1);
+
+	/* Verify child exited with 0 (no_new_privs was set). */
+	ret = ioctl(pidfd, PIDFD_GET_INFO, &info);
+	ASSERT_EQ(ret, 0);
+	ASSERT_TRUE(info.mask & PIDFD_INFO_EXIT);
+	ASSERT_TRUE(WIFEXITED(info.exit_code));
+	ASSERT_EQ(WEXITSTATUS(info.exit_code), 0) {
+		TH_LOG("Child did not have no_new_privs set");
+	}
+
+	close(pidfd);
+}
+
+/*
+ * Test that CLONE_NNP with CLONE_THREAD fails with EINVAL.
+ */
+TEST(nnp_rejects_thread)
+{
+	struct __clone_args args = {
+		.flags = CLONE_NNP | CLONE_THREAD |
+			 CLONE_SIGHAND | CLONE_VM,
+		.exit_signal = 0,
+	};
+	pid_t pid;
+
+	pid = sys_clone3(&args, sizeof(args));
+	ASSERT_EQ(pid, -1);
+	ASSERT_EQ(errno, EINVAL);
+}
+
+/*
+ * Test that a plain CLONE_AUTOREAP child does NOT get no_new_privs.
+ * Only CLONE_NNP should set it.
+ */
+TEST(autoreap_no_new_privs_unset)
+{
+	struct pidfd_info info = { .mask = PIDFD_INFO_EXIT };
+	int pidfd = -1, ret;
+	struct pollfd pfd;
+	pid_t pid;
+
+	pid = create_autoreap_child(&pidfd);
+	if (pid < 0 && errno == EINVAL)
+		SKIP(return, "CLONE_AUTOREAP not supported");
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		/*
+		 * Child: check no_new_privs. Exit 0 if NOT set, 1 if set.
+		 */
+		ret = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
+		_exit(ret == 0 ? 0 : 1);
+	}
+
+	ASSERT_GE(pidfd, 0);
+
+	pfd.fd = pidfd;
+	pfd.events = POLLIN;
+	ret = poll(&pfd, 1, 5000);
+	ASSERT_EQ(ret, 1);
+
+	ret = ioctl(pidfd, PIDFD_GET_INFO, &info);
+	ASSERT_EQ(ret, 0);
+	ASSERT_TRUE(info.mask & PIDFD_INFO_EXIT);
+	ASSERT_TRUE(WIFEXITED(info.exit_code));
+	ASSERT_EQ(WEXITSTATUS(info.exit_code), 0) {
+		TH_LOG("Plain autoreap child unexpectedly has no_new_privs");
+	}
+
+	close(pidfd);
+}
+
+/*
+ * Helper: create a child with CLONE_PIDFD | CLONE_PIDFD_AUTOKILL | CLONE_AUTOREAP | CLONE_NNP.
+ */
+static pid_t create_autokill_child(int *pidfd)
+{
+	struct __clone_args args = {
+		.flags = CLONE_PIDFD | CLONE_PIDFD_AUTOKILL |
+			 CLONE_AUTOREAP | CLONE_NNP,
+		.exit_signal = 0,
+		.pidfd = ptr_to_u64(pidfd),
+	};
+
+	return sys_clone3(&args, sizeof(args));
+}
+
+/*
+ * Basic autokill test: child blocks in pause(), parent closes the
+ * clone3 pidfd, child should be killed and autoreaped.
+ */
+TEST(autokill_basic)
+{
+	int pidfd = -1, pollfd_fd = -1, ret;
+	struct pollfd pfd;
+	pid_t pid;
+
+	pid = create_autokill_child(&pidfd);
+	if (pid < 0 && errno == EINVAL)
+		SKIP(return, "CLONE_PIDFD_AUTOKILL not supported");
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		pause();
+		_exit(1);
+	}
+
+	ASSERT_GE(pidfd, 0);
+
+	/*
+	 * Open a second pidfd via pidfd_open() so we can observe the
+	 * child's death after closing the clone3 pidfd.
+	 */
+	pollfd_fd = sys_pidfd_open(pid, 0);
+	ASSERT_GE(pollfd_fd, 0);
+
+	/* Close the clone3 pidfd — this should trigger autokill. */
+	close(pidfd);
+
+	/* Wait for the child to die via the pidfd_open'd fd. */
+	pfd.fd = pollfd_fd;
+	pfd.events = POLLIN;
+	ret = poll(&pfd, 1, 5000);
+	ASSERT_EQ(ret, 1);
+	ASSERT_TRUE(pfd.revents & POLLIN);
+
+	/* Child should be autoreaped — no zombie. */
+	usleep(100000);
+	ret = waitpid(pid, NULL, WNOHANG);
+	ASSERT_EQ(ret, -1);
+	ASSERT_EQ(errno, ECHILD);
+
+	close(pollfd_fd);
+}
+
+/*
+ * CLONE_PIDFD_AUTOKILL without CLONE_PIDFD must fail with EINVAL.
+ */
+TEST(autokill_requires_pidfd)
+{
+	struct __clone_args args = {
+		.flags = CLONE_PIDFD_AUTOKILL | CLONE_AUTOREAP,
+		.exit_signal = 0,
+	};
+	pid_t pid;
+
+	pid = sys_clone3(&args, sizeof(args));
+	ASSERT_EQ(pid, -1);
+	ASSERT_EQ(errno, EINVAL);
+}
+
+/*
+ * CLONE_PIDFD_AUTOKILL without CLONE_AUTOREAP must fail with EINVAL.
+ */
+TEST(autokill_requires_autoreap)
+{
+	int pidfd = -1;
+	struct __clone_args args = {
+		.flags = CLONE_PIDFD | CLONE_PIDFD_AUTOKILL,
+		.exit_signal = 0,
+		.pidfd = ptr_to_u64(&pidfd),
+	};
+	pid_t pid;
+
+	pid = sys_clone3(&args, sizeof(args));
+	ASSERT_EQ(pid, -1);
+	ASSERT_EQ(errno, EINVAL);
+}
+
+/*
+ * CLONE_PIDFD_AUTOKILL with CLONE_THREAD must fail with EINVAL.
+ */
+TEST(autokill_rejects_thread)
+{
+	int pidfd = -1;
+	struct __clone_args args = {
+		.flags = CLONE_PIDFD | CLONE_PIDFD_AUTOKILL |
+			 CLONE_AUTOREAP | CLONE_THREAD |
+			 CLONE_SIGHAND | CLONE_VM,
+		.exit_signal = 0,
+		.pidfd = ptr_to_u64(&pidfd),
+	};
+	pid_t pid;
+
+	pid = sys_clone3(&args, sizeof(args));
+	ASSERT_EQ(pid, -1);
+	ASSERT_EQ(errno, EINVAL);
+}
+
+/*
+ * Test that only the clone3 pidfd triggers autokill, not pidfd_open().
+ * Close the pidfd_open'd fd first — child should survive.
+ * Then close the clone3 pidfd — child should be killed and autoreaped.
+ */
+TEST(autokill_pidfd_open_no_effect)
+{
+	int pidfd = -1, open_fd = -1, ret;
+	struct pollfd pfd;
+	pid_t pid;
+
+	pid = create_autokill_child(&pidfd);
+	if (pid < 0 && errno == EINVAL)
+		SKIP(return, "CLONE_PIDFD_AUTOKILL not supported");
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		pause();
+		_exit(1);
+	}
+
+	ASSERT_GE(pidfd, 0);
+
+	/* Open a second pidfd via pidfd_open(). */
+	open_fd = sys_pidfd_open(pid, 0);
+	ASSERT_GE(open_fd, 0);
+
+	/*
+	 * Close the pidfd_open'd fd — child should survive because
+	 * only the clone3 pidfd has autokill.
+	 */
+	close(open_fd);
+	usleep(200000);
+
+	/* Verify child is still alive by polling the clone3 pidfd. */
+	pfd.fd = pidfd;
+	pfd.events = POLLIN;
+	ret = poll(&pfd, 1, 0);
+	ASSERT_EQ(ret, 0) {
+		TH_LOG("Child died after closing pidfd_open fd — should still be alive");
+	}
+
+	/* Open another observation fd before triggering autokill. */
+	open_fd = sys_pidfd_open(pid, 0);
+	ASSERT_GE(open_fd, 0);
+
+	/* Now close the clone3 pidfd — this triggers autokill. */
+	close(pidfd);
+
+	pfd.fd = open_fd;
+	pfd.events = POLLIN;
+	ret = poll(&pfd, 1, 5000);
+	ASSERT_EQ(ret, 1);
+	ASSERT_TRUE(pfd.revents & POLLIN);
+
+	/* Child should be autoreaped — no zombie. */
+	usleep(100000);
+	ret = waitpid(pid, NULL, WNOHANG);
+	ASSERT_EQ(ret, -1);
+	ASSERT_EQ(errno, ECHILD);
+
+	close(open_fd);
+}
+
+/*
+ * Test that CLONE_PIDFD_AUTOKILL without CLONE_NNP fails with EPERM
+ * for an unprivileged caller.
+ */
+TEST(autokill_requires_cap_sys_admin)
+{
+	int pidfd = -1, ret;
+	struct __clone_args args = {
+		.flags = CLONE_PIDFD | CLONE_PIDFD_AUTOKILL |
+			 CLONE_AUTOREAP,
+		.exit_signal = 0,
+		.pidfd = ptr_to_u64(&pidfd),
+	};
+	pid_t pid;
+
+	/* Drop all capabilities so we lack CAP_SYS_ADMIN. */
+	ret = drop_all_caps();
+	ASSERT_EQ(ret, 0);
+
+	pid = sys_clone3(&args, sizeof(args));
+	ASSERT_EQ(pid, -1);
+	ASSERT_EQ(errno, EPERM);
+}
+
+/*
+ * Test that CLONE_PIDFD_AUTOKILL without CLONE_NNP succeeds with
+ * CAP_SYS_ADMIN.
+ */
+TEST(autokill_without_nnp_with_cap)
+{
+	struct __clone_args args = {
+		.flags = CLONE_PIDFD | CLONE_PIDFD_AUTOKILL |
+			 CLONE_AUTOREAP,
+		.exit_signal = 0,
+	};
+	struct pidfd_info info = { .mask = PIDFD_INFO_EXIT };
+	int pidfd = -1, ret;
+	struct pollfd pfd;
+	pid_t pid;
+
+	if (geteuid() != 0)
+		SKIP(return, "Need root/CAP_SYS_ADMIN");
+
+	args.pidfd = ptr_to_u64(&pidfd);
+
+	pid = sys_clone3(&args, sizeof(args));
+	if (pid < 0 && errno == EINVAL)
+		SKIP(return, "CLONE_PIDFD_AUTOKILL not supported");
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0)
+		_exit(0);
+
+	ASSERT_GE(pidfd, 0);
+
+	/* Wait for child to exit. */
+	pfd.fd = pidfd;
+	pfd.events = POLLIN;
+	ret = poll(&pfd, 1, 5000);
+	ASSERT_EQ(ret, 1);
+
+	ret = ioctl(pidfd, PIDFD_GET_INFO, &info);
+	ASSERT_EQ(ret, 0);
+	ASSERT_TRUE(info.mask & PIDFD_INFO_EXIT);
+	ASSERT_TRUE(WIFEXITED(info.exit_code));
+	ASSERT_EQ(WEXITSTATUS(info.exit_code), 0);
+
+	close(pidfd);
+}
+
+TEST_HARNESS_MAIN
+1
tools/testing/selftests/pidfd/pidfd_info_test.c
···
 	ASSERT_TRUE(!!(info.supported_mask & PIDFD_INFO_COREDUMP));
 	ASSERT_TRUE(!!(info.supported_mask & PIDFD_INFO_SUPPORTED_MASK));
 	ASSERT_TRUE(!!(info.supported_mask & PIDFD_INFO_COREDUMP_SIGNAL));
+	ASSERT_TRUE(!!(info.supported_mask & PIDFD_INFO_COREDUMP_CODE));
 
 	/* Clean up */
 	sys_pidfd_send_signal(pidfd, SIGKILL, NULL, 0);