Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
sched/psi: fix race between file release and pressure write

A race condition exists between pressure write and cgroup file release
on the priv member of struct kernfs_open_file. It triggers the
use-after-free (UAF) reported in [1].

Consider the following scenario involving execution on two separate CPUs:

CPU0                                    CPU1
====                                    ====
                                        vfs_rmdir()
                                        kernfs_iop_rmdir()
                                        cgroup_rmdir()
                                        cgroup_kn_lock_live()
                                        cgroup_destroy_locked()
                                        cgroup_addrm_files()
                                        cgroup_rm_file()
                                        kernfs_remove_by_name()
                                        kernfs_remove_by_name_ns()
vfs_write()                             __kernfs_remove()
new_sync_write()                        kernfs_drain()
kernfs_fop_write_iter()                 kernfs_drain_open_files()
cgroup_file_write()                     kernfs_release_file()
pressure_write()                        cgroup_file_release()
ctx = of->priv;
                                        kfree(ctx);
                                        of->priv = NULL;
                                        cgroup_kn_unlock()
cgroup_kn_lock_live()
cgroup_get(cgrp)
cgroup_kn_unlock()
if (ctx->psi.trigger) // here, trigger uaf for ctx, that is of->priv

cgroup_rmdir() is protected by cgroup_mutex, which also safeguards the
deallocation of of->priv performed within cgroup_file_release().
However, the accesses to of->priv in pressure_write() are not entirely
covered by cgroup_mutex. Consequently, if the code in pressure_write()
that handles the ctx variable executes after cgroup_file_release() has
completed, a UAF on of->priv is triggered.

Therefore, the issue can be resolved by extending the scope of the
cgroup_mutex lock within pressure_write() to encompass all code paths
involving of->priv, thereby properly synchronizing the race condition
occurring between cgroup_file_release() and pressure_write().
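
The ordering problem can be replayed in a small single-threaded
userspace model. This is only a sketch under stated assumptions:
struct model_ctx, struct model_file and every model_* function are
hypothetical stand-ins, not kernel code; the mutex is a plain flag,
kfree() is modeled by clearing a "live" flag so the replay has no
undefined behavior, and the liveness check done by
cgroup_kn_lock_live() is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup_file_ctx. */
struct model_ctx {
	bool live;	/* false once the context has been "freed" */
};

/* Hypothetical stand-in for struct kernfs_open_file. */
struct model_file {
	bool locked;			/* plays the role of cgroup_mutex */
	struct model_ctx storage;	/* backing memory for the context */
	struct model_ctx *priv;		/* plays the role of of->priv */
};

static void model_lock(struct model_file *of)
{
	assert(!of->locked);
	of->locked = true;
}

static void model_unlock(struct model_file *of)
{
	assert(of->locked);
	of->locked = false;
}

/* Release path: "free" the context and clear priv under the lock. */
static void model_release(struct model_file *of)
{
	model_lock(of);
	of->priv->live = false;	/* models kfree(ctx) */
	of->priv = NULL;
	model_unlock(of);
}

/*
 * Replay "release runs to completion, then the writer takes the lock".
 * Returns true if the writer ends up holding a pointer to a freed context.
 */
bool writer_sees_freed_ctx(bool load_after_lock)
{
	struct model_file of = { .locked = false, .storage = { .live = true } };
	struct model_ctx *ctx = NULL;
	bool stale;

	of.priv = &of.storage;

	if (!load_after_lock)
		ctx = of.priv;	/* old order: snapshot before locking */

	model_release(&of);	/* the other CPU wins the race */

	model_lock(&of);
	if (load_after_lock)
		ctx = of.priv;	/* new order: load under the lock */
	stale = ctx && !ctx->live;
	model_unlock(&of);

	return stale;
}
```

In this model, writer_sees_freed_ctx(false) reports a dangling pointer,
while writer_sees_freed_ctx(true) does not, because the pointer is only
read while the lock is held.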

Moreover, if the live kn lock is successfully acquired during the
pressure write, the cgroup deletion process has not yet reached its
final stage, so the priv pointer of the open file cannot be NULL.
The retrieval of ctx must therefore be moved to a point *after* the
live kn lock has been successfully acquired.

In another situation, specifically after entering cgroup_kn_lock_live()
but before acquiring cgroup_mutex, there exists a different class of
race condition:

CPU0: write memory.pressure             CPU1: write cgroup.pressure=0
===========================             =============================

kernfs_fop_write_iter()
kernfs_get_active_of(of)
pressure_write()
cgroup_kn_lock_live(memory.pressure)
cgroup_tryget(cgrp)
kernfs_break_active_protection(kn)
... blocks on cgroup_mutex

                                        cgroup_pressure_write()
                                        cgroup_kn_lock_live(cgroup.pressure)
                                        cgroup_file_show(memory.pressure, false)
                                        kernfs_show(false)
                                        kernfs_drain_open_files()
                                        cgroup_file_release(of)
                                        kfree(ctx)
                                        of->priv = NULL
                                        cgroup_kn_unlock()

... acquires cgroup_mutex
ctx = of->priv; // may now be NULL
if (ctx->psi.trigger) // NULL dereference

Consequently, of->priv may be NULL at this point, and the pressure
write needs to check for this.
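
The resulting control flow (load under the lock, NULL check, single
unlock exit) can be sketched in userspace as follows. This is a
minimal model, not the kernel function: model_pressure_write(),
model_demo() and both structs are hypothetical names, the lock is a
plain flag, and the trigger is an opaque pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for cgroup_file_ctx and kernfs_open_file. */
struct model_ctx {
	void *trigger;
};

struct model_file {
	bool locked;		/* plays the role of cgroup_mutex */
	struct model_ctx *priv;	/* plays the role of of->priv */
};

long model_pressure_write(struct model_file *of, void *new_trigger,
			  size_t nbytes)
{
	struct model_ctx *ctx;
	long ret = 0;

	assert(!of->locked);
	of->locked = true;	/* models the live kn lock being acquired */

	ctx = of->priv;		/* read only while the lock is held */
	if (!ctx) {		/* release already ran */
		ret = -ENODEV;
		goto out_unlock;
	}
	if (ctx->trigger) {	/* only one trigger per open file */
		ret = -EBUSY;
		goto out_unlock;
	}
	ctx->trigger = new_trigger;

out_unlock:
	of->locked = false;
	return ret ? ret : (long)nbytes;
}

/* Hypothetical driver: two writes, a simulated release, a third write. */
void model_demo(long results[3])
{
	static int dummy_trigger;
	struct model_ctx ctx = { .trigger = NULL };
	struct model_file of = { .locked = false, .priv = &ctx };

	results[0] = model_pressure_write(&of, &dummy_trigger, 8);
	results[1] = model_pressure_write(&of, &dummy_trigger, 8);
	of.priv = NULL;		/* models cgroup_file_release() */
	results[2] = model_pressure_write(&of, &dummy_trigger, 8);
}
```

In the model, the first write installs the trigger and returns the
byte count, the second fails with -EBUSY, and the write after the
simulated release fails with -ENODEV instead of dereferencing NULL.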

Now that the scope of cgroup_mutex has been expanded, the original
explicit cgroup_get/put operations are no longer necessary, because
acquiring and releasing the live kn lock inherently performs a cgroup
get/put.
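
Why the explicit get/put becomes redundant can be illustrated with a
toy reference-counting model. It assumes only what is stated above:
taking the live kn lock takes a reference and dropping it puts the
reference. struct model_cgroup and the model_* functions are
hypothetical, and the lock is a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup. */
struct model_cgroup {
	int refcnt;
	bool locked;	/* plays the role of cgroup_mutex */
};

/* Models the live kn lock: take a reference first, then the lock. */
bool model_lock_live(struct model_cgroup *cgrp)
{
	if (cgrp->refcnt <= 0)	/* models the tryget failing */
		return false;
	cgrp->refcnt++;
	assert(!cgrp->locked);
	cgrp->locked = true;
	return true;
}

/* Models the unlock: drop the lock, then put the reference. */
void model_unlock(struct model_cgroup *cgrp)
{
	assert(cgrp->locked);
	cgrp->locked = false;
	cgrp->refcnt--;
}

/* Returns the reference count seen inside the critical section, or -1. */
int model_refcnt_while_locked(struct model_cgroup *cgrp)
{
	int seen;

	if (!model_lock_live(cgrp))
		return -1;
	seen = cgrp->refcnt;	/* already pinned, no extra get needed */
	model_unlock(cgrp);
	return seen;
}
```

For a cgroup that starts with one reference, the count observed while
the lock is held is two and drops back to one after unlock, so the
object stays pinned for the whole critical section without a separate
get/put pair.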

[1]
BUG: KASAN: slab-use-after-free in pressure_write+0xa4/0x210 kernel/cgroup/cgroup.c:4011
Call Trace:
pressure_write+0xa4/0x210 kernel/cgroup/cgroup.c:4011
cgroup_file_write+0x36f/0x790 kernel/cgroup/cgroup.c:4311
kernfs_fop_write_iter+0x3b0/0x540 fs/kernfs/file.c:352

Allocated by task 9352:
cgroup_file_open+0x90/0x3a0 kernel/cgroup/cgroup.c:4256
kernfs_fop_open+0x9eb/0xcb0 fs/kernfs/file.c:724
do_dentry_open+0x83d/0x13e0 fs/open.c:949

Freed by task 9353:
cgroup_file_release+0xd6/0x100 kernel/cgroup/cgroup.c:4283
kernfs_release_file fs/kernfs/file.c:764 [inline]
kernfs_drain_open_files+0x392/0x720 fs/kernfs/file.c:834
kernfs_drain+0x470/0x600 fs/kernfs/dir.c:525

Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Reported-by: syzbot+33e571025d88efd1312c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=33e571025d88efd1312c
Tested-by: syzbot+33e571025d88efd1312c@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Authored by Edward Adam Davis and committed by Tejun Heo
a5b98009 d730905b

+16 -8
kernel/cgroup/cgroup.c
@@ -3934,33 +3934,41 @@
 static ssize_t pressure_write(struct kernfs_open_file *of, char *buf,
 			      size_t nbytes, enum psi_res res)
 {
-	struct cgroup_file_ctx *ctx = of->priv;
+	struct cgroup_file_ctx *ctx;
 	struct psi_trigger *new;
 	struct cgroup *cgrp;
 	struct psi_group *psi;
+	ssize_t ret = 0;
 
 	cgrp = cgroup_kn_lock_live(of->kn, false);
 	if (!cgrp)
 		return -ENODEV;
 
-	cgroup_get(cgrp);
-	cgroup_kn_unlock(of->kn);
+	ctx = of->priv;
+	if (!ctx) {
+		ret = -ENODEV;
+		goto out_unlock;
+	}
 
 	/* Allow only one trigger per file descriptor */
 	if (ctx->psi.trigger) {
-		cgroup_put(cgrp);
-		return -EBUSY;
+		ret = -EBUSY;
+		goto out_unlock;
 	}
 
 	psi = cgroup_psi(cgrp);
 	new = psi_trigger_create(psi, buf, res, of->file, of);
 	if (IS_ERR(new)) {
-		cgroup_put(cgrp);
-		return PTR_ERR(new);
+		ret = PTR_ERR(new);
+		goto out_unlock;
 	}
 
 	smp_store_release(&ctx->psi.trigger, new);
-	cgroup_put(cgrp);
+
+out_unlock:
+	cgroup_kn_unlock(of->kn);
+	if (ret)
+		return ret;
 
 	return nbytes;
 }