sched: Allow sched_cgroup_fork() to fail and introduce sched_cancel_fork()

A new BPF extensible sched_class will need more control over the forking
process. It wants to be able to fail from sched_cgroup_fork() after the new
task's sched_task_group is initialized so that the loaded BPF program can
prepare the task once its cgroup association is established and reject the
fork if, e.g., allocation fails.

Allow sched_cgroup_fork() to fail by making it return int instead of void
and adding sched_cancel_fork() to undo sched_fork() in the error path.

sched_cgroup_fork() doesn't fail yet and this patch shouldn't cause any
behavior changes.
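The contract described above can be sketched in user space. This is a minimal model, not kernel code: `struct model_task`, the `model_*` functions, and the `inject_error` parameter are all hypothetical names introduced for illustration. In the patch itself sched_cgroup_fork() always returns 0 and sched_cancel_fork() is empty.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct task_struct. */
struct model_task {
	bool sched_prepared;	/* set by model_sched_fork() */
	bool on_runqueue;	/* set once the fork can no longer fail */
};

/* Stand-in for sched_fork(): sets up scheduler state for the child. */
static int model_sched_fork(struct model_task *p)
{
	p->sched_prepared = true;
	return 0;
}

/*
 * Stand-in for sched_cgroup_fork(). The inject_error parameter is an
 * assumption that models a future sched_class rejecting the fork,
 * e.g. with -ENOMEM; pass 0 to model today's always-succeeding behavior.
 */
static int model_sched_cgroup_fork(struct model_task *p, int inject_error)
{
	(void)p;
	return inject_error;
}

/* Stand-in for sched_cancel_fork(): undoes model_sched_fork(). */
static void model_sched_cancel_fork(struct model_task *p)
{
	assert(p->sched_prepared);
	p->sched_prepared = false;
}

/* Caller modeled loosely on copy_process(): prepare, try, undo on failure. */
static int model_fork(struct model_task *p, int inject_error)
{
	int retval;

	retval = model_sched_fork(p);
	if (retval)
		return retval;

	retval = model_sched_cgroup_fork(p, inject_error);
	if (retval)
		goto cancel_fork;

	p->on_runqueue = true;
	return 0;

cancel_fork:
	model_sched_cancel_fork(p);
	return retval;
}
```

With a zero-initialized task, `model_fork(&t, 0)` returns 0 and leaves both flags set, while `model_fork(&t, -ENOMEM)` returns `-ENOMEM` and leaves both flags clear because the cancel hook has undone the preparation.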

v2: Patch description updated to detail the expected use.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>

Tejun Heo 2 years ago 304b3f2b df268382

+19 -7

3 changed files

+2 -1

include/linux/sched/task.h

···
 extern void init_idle(struct task_struct *idle, int cpu);

 extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
-extern void sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs);
+extern int sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs);
+extern void sched_cancel_fork(struct task_struct *p);
 extern void sched_post_fork(struct task_struct *p);
 extern void sched_dead(struct task_struct *p);

+10 -5

kernel/fork.c

···

 	retval = perf_event_init_task(p, clone_flags);
 	if (retval)
-		goto bad_fork_cleanup_policy;
+		goto bad_fork_sched_cancel_fork;
 	retval = audit_alloc(p);
 	if (retval)
 		goto bad_fork_cleanup_perf;
···
 	 * cgroup specific, it unconditionally needs to place the task on a
 	 * runqueue.
 	 */
-	sched_cgroup_fork(p, args);
+	retval = sched_cgroup_fork(p, args);
+	if (retval)
+		goto bad_fork_cancel_cgroup;

 	/*
 	 * From this point on we must avoid any synchronous user-space
···
 	/* Don't start children in a dying pid namespace */
 	if (unlikely(!(ns_of_pid(pid)->pid_allocated & PIDNS_ADDING))) {
 		retval = -ENOMEM;
-		goto bad_fork_cancel_cgroup;
+		goto bad_fork_core_free;
 	}

 	/* Let kill terminate clone/fork in the middle */
 	if (fatal_signal_pending(current)) {
 		retval = -EINTR;
-		goto bad_fork_cancel_cgroup;
+		goto bad_fork_core_free;
 	}

 	/* No more failure paths after this point. */
···

 	return p;

-bad_fork_cancel_cgroup:
+bad_fork_core_free:
 	sched_core_free(p);
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
+bad_fork_cancel_cgroup:
 	cgroup_cancel_fork(p, args);
 bad_fork_put_pidfd:
 	if (clone_flags & CLONE_PIDFD) {
···
 	audit_free(p);
 bad_fork_cleanup_perf:
 	perf_event_free_task(p);
+bad_fork_sched_cancel_fork:
+	sched_cancel_fork(p);
 bad_fork_cleanup_policy:
 	lockdep_free_task(p);
 #ifdef CONFIG_NUMA
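The label split in kernel/fork.c matters because sched_cgroup_fork() runs before tasklist_lock and siglock are taken, so its failure path must skip the unlock that the later failure paths need. A minimal user-space sketch of that fall-through ordering follows. The `fail_point` enum, `log_step()`, `model_copy_process()`, the step names, and the error code used for the first two failure points are assumptions made for illustration, and most intermediate cleanup labels are elided.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical failure points, in the order copy_process() reaches them. */
enum fail_point {
	FAIL_NONE,
	FAIL_PERF_INIT,		/* perf_event_init_task() fails */
	FAIL_SCHED_CGROUP,	/* sched_cgroup_fork() fails, before locks are taken */
	FAIL_FATAL_SIGNAL,	/* fatal signal seen while locks are held */
};

/* Append one cleanup step name to a log so the unwind order is visible. */
static void log_step(char *log, size_t size, const char *step)
{
	assert(strlen(log) + strlen(step) + 2 <= size);
	if (log[0] != '\0')
		strcat(log, ",");
	strcat(log, step);
}

/* Model of the goto-based unwind: later labels fall through to earlier ones. */
static int model_copy_process(enum fail_point fail, char *log, size_t size)
{
	int retval = 0;

	log[0] = '\0';

	if (fail == FAIL_PERF_INIT) {
		retval = -ENOMEM;	/* assumed error code */
		goto bad_fork_sched_cancel_fork;
	}
	if (fail == FAIL_SCHED_CGROUP) {
		retval = -ENOMEM;	/* assumed error code */
		goto bad_fork_cancel_cgroup;
	}
	/* tasklist_lock and siglock would be taken here. */
	if (fail == FAIL_FATAL_SIGNAL) {
		retval = -EINTR;
		goto bad_fork_core_free;
	}
	return 0;

bad_fork_core_free:
	log_step(log, size, "core_free");
	log_step(log, size, "unlock");
bad_fork_cancel_cgroup:
	log_step(log, size, "cgroup_cancel_fork");
	/* ... intermediate cleanup labels elided ... */
	log_step(log, size, "perf_free");
bad_fork_sched_cancel_fork:
	log_step(log, size, "sched_cancel_fork");
	return retval;
}
```

In this model a failure at `FAIL_SCHED_CGROUP` logs `cgroup_cancel_fork,perf_free,sched_cancel_fork` with no `unlock` step, while `FAIL_FATAL_SIGNAL` additionally logs `core_free,unlock` first.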

+7 -1

kernel/sched/core.c

···
 	return 0;
 }

-void sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs)
+int sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs)
 {
 	unsigned long flags;

···
 	if (p->sched_class->task_fork)
 		p->sched_class->task_fork(p);
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+	return 0;
+}
+
+void sched_cancel_fork(struct task_struct *p)
+{
 }

 void sched_post_fork(struct task_struct *p)
