Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched_ext: Defer scx_hardlockup() out of NMI

scx_hardlockup() runs from NMI and eventually calls scx_claim_exit(),
which takes scx_sched_lock. scx_sched_lock isn't NMI-safe and grabbing
it from NMI context can lead to deadlocks.

The hardlockup handler is best-effort recovery and the disable path it
triggers runs off of irq_work anyway. Move the handle_lockup() call into
an irq_work so it runs in IRQ context.

Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

+27 -6
kernel/sched/ext.c
@@ -4940,6 +4940,25 @@
 		smp_processor_id(), dur_s);
 }
 
+/*
+ * scx_hardlockup() runs from NMI and eventually calls scx_claim_exit(),
+ * which takes scx_sched_lock. scx_sched_lock isn't NMI-safe and grabbing
+ * it from NMI context can lead to deadlocks. Defer via irq_work; the
+ * disable path runs off irq_work anyway.
+ */
+static atomic_t scx_hardlockup_cpu = ATOMIC_INIT(-1);
+
+static void scx_hardlockup_irq_workfn(struct irq_work *work)
+{
+	int cpu = atomic_xchg(&scx_hardlockup_cpu, -1);
+
+	if (cpu >= 0 && handle_lockup("hard lockup - CPU %d", cpu))
+		printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
+				cpu);
+}
+
+static DEFINE_IRQ_WORK(scx_hardlockup_irq_work, scx_hardlockup_irq_workfn);
+
 /**
  * scx_hardlockup - sched_ext hardlockup handler
  *
@@ -4948,17 +4967,19 @@
  * Try kicking out the current scheduler in an attempt to recover the system to
  * a good state before taking more drastic actions.
  *
- * Returns %true if sched_ext is enabled and abort was initiated, which may
- * resolve the reported hardlockup. %false if sched_ext is not enabled or
- * someone else already initiated abort.
+ * Queues an irq_work; the handle_lockup() call happens in IRQ context (see
+ * scx_hardlockup_irq_workfn).
+ *
+ * Returns %true if sched_ext is enabled and the work was queued, %false
+ * otherwise.
  */
 bool scx_hardlockup(int cpu)
 {
-	if (!handle_lockup("hard lockup - CPU %d", cpu))
+	if (!rcu_access_pointer(scx_root))
 		return false;
 
-	printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
-			cpu);
+	atomic_cmpxchg(&scx_hardlockup_cpu, -1, cpu);
+	irq_work_queue(&scx_hardlockup_irq_work);
 	return true;
 }