Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'vfs-6.11.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull namespace-fs updates from Christian Brauner:
"This adds ioctls that translate PIDs between PID namespaces.

The motivating use-case comes from LXCFS, which is a tiny FUSE
filesystem used to virtualize various aspects of procfs. LXCFS is run
on the host. The files and directories it creates can be bind-mounted
by e.g. a container at startup and mounted over the various procfs
files the container wishes to have virtualized.

When e.g. a read request for uptime is received, LXCFS will receive
the pid of the reader. In order to virtualize the corresponding read,
LXCFS needs to know the pid of the init process of the reader's pid
namespace.

In order to do this, LXCFS first needs to fork() two helper processes.
The first helper process calls setns() to enter the reader's pid
namespace. The second helper process is needed to create a process
that is a proper member of the pid namespace.

The second helper process then creates a ucred message with ucred.pid
set to 1 and sends it back to LXCFS. The kernel will translate the
ucred.pid field to the corresponding pid number in LXCFS's pid
namespace. This way LXCFS can learn the init pid number of the
reader's pid namespace and can go on to virtualize.

Since these two fork()s are costly, LXCFS maintains an init pid cache
that caches a given pid for a fixed amount of time. The cache is
pruned during new read requests. However, even with the cache, the hit
of the two fork()s is significant when a very large number of
containers are running.

So this adds a simple set of ioctls that let a caller translate PIDs
from and into a given PID namespace. This significantly improves
performance with a very simple change.

To protect against races, pidfds can be used to check whether the
process is still valid."

* tag 'vfs-6.11.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nsfs: add pid translation ioctls
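
The commit message's closing remark about pidfds can be read as: pin the reader with a pidfd before translating, then confirm the pidfd still refers to a live process before trusting the result. The following is only a sketch of that idea, not code from LXCFS or from this series. The helper names pin_process() and still_alive() are hypothetical, and the fallback syscall numbers are an assumption for toolchains whose headers predate pidfd support.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed fallbacks for old headers; newer libcs already define these. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

/*
 * Hypothetical helper: obtain a pidfd for the process currently known as
 * `pid`. Returns the pidfd, or -1 with errno set.
 */
int pin_process(pid_t pid)
{
	return (int)syscall(SYS_pidfd_open, pid, 0U);
}

/*
 * Hypothetical helper: check whether the process behind `pidfd` still
 * exists. Returns 1 if it does, 0 if it is gone (ESRCH), and -1 on any
 * other error (errno is left set).
 */
int still_alive(int pidfd)
{
	/* Signal 0 delivers nothing; only the existence and permission
	 * checks are performed. */
	if (syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0U) == 0)
		return 1;
	return errno == ESRCH ? 0 : -1;
}
```

A caller would pin the reader first, perform the PID translation, and discard the translated value if still_alive() reports 0, since the numeric pid may have been recycled in the meantime.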

+60 -1
+52 -1
fs/nsfs.c
···
 #include <linux/magic.h>
 #include <linux/ktime.h>
 #include <linux/seq_file.h>
+#include <linux/pid_namespace.h>
 #include <linux/user_namespace.h>
 #include <linux/nsfs.h>
 #include <linux/uaccess.h>
···
 			unsigned long arg)
 {
 	struct user_namespace *user_ns;
+	struct pid_namespace *pid_ns;
+	struct task_struct *tsk;
 	struct ns_common *ns = get_proc_ns(file_inode(filp));
 	uid_t __user *argp;
 	uid_t uid;
+	int ret;
 
 	switch (ioctl) {
 	case NS_GET_USERNS:
···
 		id = mnt_ns->seq;
 		return put_user(id, idp);
 	}
+	case NS_GET_PID_FROM_PIDNS:
+		fallthrough;
+	case NS_GET_TGID_FROM_PIDNS:
+		fallthrough;
+	case NS_GET_PID_IN_PIDNS:
+		fallthrough;
+	case NS_GET_TGID_IN_PIDNS:
+		if (ns->ops->type != CLONE_NEWPID)
+			return -EINVAL;
+
+		ret = -ESRCH;
+		pid_ns = container_of(ns, struct pid_namespace, ns);
+
+		rcu_read_lock();
+
+		if (ioctl == NS_GET_PID_IN_PIDNS ||
+		    ioctl == NS_GET_TGID_IN_PIDNS)
+			tsk = find_task_by_vpid(arg);
+		else
+			tsk = find_task_by_pid_ns(arg, pid_ns);
+		if (!tsk)
+			break;
+
+		switch (ioctl) {
+		case NS_GET_PID_FROM_PIDNS:
+			ret = task_pid_vnr(tsk);
+			break;
+		case NS_GET_TGID_FROM_PIDNS:
+			ret = task_tgid_vnr(tsk);
+			break;
+		case NS_GET_PID_IN_PIDNS:
+			ret = task_pid_nr_ns(tsk, pid_ns);
+			break;
+		case NS_GET_TGID_IN_PIDNS:
+			ret = task_tgid_nr_ns(tsk, pid_ns);
+			break;
+		default:
+			ret = 0;
+			break;
+		}
+		rcu_read_unlock();
+
+		if (!ret)
+			ret = -ESRCH;
+		break;
 	default:
-		return -ENOTTY;
+		ret = -ENOTTY;
 	}
+
+	return ret;
 }
 
 int ns_get_name(char *buf, size_t size, struct task_struct *task,
+8
include/uapi/linux/nsfs.h
···
 #define NS_GET_OWNER_UID	_IO(NSIO, 0x4)
 /* Get the id for a mount namespace */
 #define NS_GET_MNTNS_ID		_IO(NSIO, 0x5)
+/* Translate pid from target pid namespace into the caller's pid namespace. */
+#define NS_GET_PID_FROM_PIDNS	_IOR(NSIO, 0x6, int)
+/* Return thread-group leader id of pid in the callers pid namespace. */
+#define NS_GET_TGID_FROM_PIDNS	_IOR(NSIO, 0x7, int)
+/* Translate pid from caller's pid namespace into a target pid namespace. */
+#define NS_GET_PID_IN_PIDNS	_IOR(NSIO, 0x8, int)
+/* Return thread-group leader id of pid in the target pid namespace. */
+#define NS_GET_TGID_IN_PIDNS	_IOR(NSIO, 0x9, int)
 
 #endif /* __LINUX_NSFS_H */
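
For orientation, here is a hedged sketch of how userspace might call the new ioctls to replace the two-fork dance described in the commit message. It is not taken from LXCFS. The function names pid_from_pidns(), pid_in_pidns() and init_pid_of() are hypothetical, the NS_GET_* fallbacks repeat the header definitions from this diff for older uapi headers, and the NSIO value 0xb7 is assumed from the existing nsfs.h rather than shown in this diff. As in ns_ioctl() above, the pid is passed by value as the ioctl argument and the translated pid comes back as the return value.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for uapi headers that predate this merge (assumed values). */
#ifndef NSIO
#define NSIO 0xb7
#endif
#ifndef NS_GET_PID_FROM_PIDNS
#define NS_GET_PID_FROM_PIDNS _IOR(NSIO, 0x6, int)
#endif
#ifndef NS_GET_PID_IN_PIDNS
#define NS_GET_PID_IN_PIDNS _IOR(NSIO, 0x8, int)
#endif

/*
 * Translate `pid`, as numbered inside the PID namespace behind `nsfd`,
 * into the caller's PID namespace. Returns the translated pid, or -1
 * with errno set (ESRCH if no such task is found or visible, ENOTTY if
 * the running kernel does not implement the ioctl).
 */
pid_t pid_from_pidns(int nsfd, pid_t pid)
{
	return ioctl(nsfd, NS_GET_PID_FROM_PIDNS, (unsigned long)pid);
}

/*
 * Translate `pid`, as numbered in the caller's PID namespace, into the
 * PID namespace behind `nsfd`. Same return convention as above.
 */
pid_t pid_in_pidns(int nsfd, pid_t pid)
{
	return ioctl(nsfd, NS_GET_PID_IN_PIDNS, (unsigned long)pid);
}

/*
 * Find, in the caller's numbering, the init process (pid 1) of the PID
 * namespace that `reader` belongs to. Returns -1 with errno set on
 * failure.
 */
pid_t init_pid_of(pid_t reader)
{
	char path[64];
	pid_t init;
	int nsfd, saved_errno;

	snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)reader);
	nsfd = open(path, O_RDONLY | O_CLOEXEC);
	if (nsfd < 0)
		return -1;

	init = pid_from_pidns(nsfd, 1);

	/* Preserve the ioctl's errno across close(). */
	saved_errno = errno;
	close(nsfd);
	errno = saved_errno;
	return init;
}
```

On kernels without these ioctls the calls fail with ENOTTY, so a caller would still need the older fork-based path as a fallback.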