Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched: Add missing memory barrier in switch_mm_cid

Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
which the core scheduler code has depended upon since commit:

commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")

If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
unset an actively used cid when it fails to observe the active task
after it sets lazy_put.

There *is* a memory barrier between storing to rq->curr and _return to
userspace_ (as required by membarrier), but the rseq mm_cid has stricter
requirements: the barrier needs to be issued between store to rq->curr
and switch_mm_cid(), which happens earlier than:

- spin_unlock(),
- switch_to().

So it's fine when the architecture switch_mm() happens to have that
barrier already, but less so when the architecture only provides the
full barrier in switch_to() or spin_unlock().

It is a bug in the rseq switch_mm_cid() implementation. All architectures
that don't have memory barriers in switch_mm(), but rather have the full
barrier either in finish_lock_switch() or switch_to() have them too late
for the needs of switch_mm_cid().
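
The ordering requirement described above has the shape of a store-buffering
pattern: each side stores to one location and then loads the other. The
following is a minimal user-space sketch of that pattern using C11 atomics
and pthreads, not kernel code. The variables model_rq_curr and
model_lazy_put, and the function run_sb_litmus(), are hypothetical stand-ins
chosen for illustration; they are not kernel symbols.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: one models rq->curr, the other the lazy-put mark. */
static atomic_int model_rq_curr, model_lazy_put;
static int seen_by_sched, seen_by_remote;

/* Scheduler side: store rq->curr, full barrier, then load the cid state. */
static void *sched_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&model_rq_curr, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	seen_by_sched = atomic_load_explicit(&model_lazy_put, memory_order_relaxed);
	return NULL;
}

/* Remote-clear side: set lazy-put, full barrier, then load rq->curr. */
static void *remote_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&model_lazy_put, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	seen_by_remote = atomic_load_explicit(&model_rq_curr, memory_order_relaxed);
	return NULL;
}

/*
 * Run the litmus test and count iterations in which neither side saw the
 * other's store. Returns -1 if a thread could not be created.
 */
int run_sb_litmus(int iterations)
{
	int both_missed = 0;

	assert(iterations >= 0);
	for (int i = 0; i < iterations; i++) {
		pthread_t a, b;

		atomic_store(&model_rq_curr, 0);
		atomic_store(&model_lazy_put, 0);
		if (pthread_create(&a, NULL, sched_side, NULL) != 0)
			return -1;
		if (pthread_create(&b, NULL, remote_side, NULL) != 0) {
			pthread_join(a, NULL);
			return -1;
		}
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (seen_by_sched == 0 && seen_by_remote == 0)
			both_missed++;
	}
	return both_missed;
}
```

With both fences in place, the C11 memory model forbids the outcome where
neither thread sees the other's store, so run_sb_litmus() should return 0.
Dropping the fence in sched_side() models a switch_mm() without a barrier:
the outcome is then permitted, and the remote side can miss the running task.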

Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.

Architectures can override smp_mb__after_switch_mm() if their
switch_mm() implementation provides an implicit memory barrier.
Override it with a no-op on x86, which implicitly provides this memory
barrier by writing to CR3.
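
The override mechanism can be sketched in a single user-space translation
unit, assuming the usual rule that the architecture definition is seen
before the generic fallback. DEMO_ARCH_SWITCH_MM_HAS_BARRIER, the barrier
counter, and the user-space smp_mb() stand-in are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Counts full barriers issued by the stand-in below (illustration only). */
static int demo_full_barriers;

/* User-space stand-in for the kernel's smp_mb(). */
#define smp_mb() \
	do { atomic_thread_fence(memory_order_seq_cst); demo_full_barriers++; } while (0)

/* "Architecture header": a hypothetical switch plays the role of x86. */
#ifdef DEMO_ARCH_SWITCH_MM_HAS_BARRIER
# define smp_mb__after_switch_mm() do { } while (0)
#endif

/* "Generic header": fall back to a full barrier unless already defined. */
#ifndef smp_mb__after_switch_mm
# define smp_mb__after_switch_mm() smp_mb()
#endif

/* Returns how many full barriers one smp_mb__after_switch_mm() issued. */
int demo_barriers_after_switch_mm(void)
{
	demo_full_barriers = 0;
	smp_mb__after_switch_mm();
	return demo_full_barriers;
}

/* Usage: one barrier by default, none when the architecture overrides it. */
void demo_selfcheck_override(void)
{
#ifdef DEMO_ARCH_SWITCH_MM_HAS_BARRIER
	assert(demo_barriers_after_switch_mm() == 0);
#else
	assert(demo_barriers_after_switch_mm() == 1);
#endif
}
```

Building with -DDEMO_ARCH_SWITCH_MM_HAS_BARRIER selects the no-op, which
mirrors what the x86 header does in this patch.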

Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Reported-by: levi.yun <yeoreum.yun@arm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for x86
Cc: <stable@vger.kernel.org> # 6.4.x
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240415152114.59122-2-mathieu.desnoyers@efficios.com

Authored by Mathieu Desnoyers and committed by Ingo Molnar
fe90f396 0bbac3fa

+25 -6
+3
arch/x86/include/asm/barrier.h
@@ -79,6 +79,9 @@
 #define __smp_mb__before_atomic() do { } while (0)
 #define __smp_mb__after_atomic() do { } while (0)
 
+/* Writing to CR3 provides a full memory barrier in switch_mm(). */
+#define smp_mb__after_switch_mm() do { } while (0)
+
 #include <asm-generic/barrier.h>
 
 #endif /* _ASM_X86_BARRIER_H */
+8
include/asm-generic/barrier.h
@@ -294,5 +294,13 @@
 #define io_stop_wc() do { } while (0)
 #endif
 
+/*
+ * Architectures that guarantee an implicit smp_mb() in switch_mm()
+ * can override smp_mb__after_switch_mm.
+ */
+#ifndef smp_mb__after_switch_mm
+# define smp_mb__after_switch_mm() smp_mb()
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
+14 -6
kernel/sched/sched.h
@@ -79,6 +79,8 @@
 # include <asm/paravirt_api_clock.h>
 #endif
 
+#include <asm/barrier.h>
+
 #include "cpupri.h"
 #include "cpudeadline.h"
 
@@ -3445,13 +3447,19 @@
 		 * between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
 		 * Provide it here.
 		 */
-		if (!prev->mm) // from kernel
+		if (!prev->mm) { // from kernel
 			smp_mb();
-		/*
-		 * user -> user transition guarantees a memory barrier through
-		 * switch_mm() when current->mm changes. If current->mm is
-		 * unchanged, no barrier is needed.
-		 */
+		} else { // from user
+			/*
+			 * user->user transition relies on an implicit
+			 * memory barrier in switch_mm() when
+			 * current->mm changes. If the architecture
+			 * switch_mm() does not have an implicit memory
+			 * barrier, it is emitted here. If current->mm
+			 * is unchanged, no barrier is needed.
+			 */
+			smp_mb__after_switch_mm();
+		}
 	}
 	if (prev->mm_cid_active) {
 		mm_cid_snapshot_time(rq, prev->mm);
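
The barrier choice made in the switch_mm_cid() hunk can be modeled as a
small pure function. This is a sketch under one assumption: that the hunk
sits in the branch taken when the next task has an mm, as its comments
suggest. The enum, the function name, and the boolean parameters are
hypothetical; the path taken when the next task has no mm lies outside the
hunk and is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the barrier the hunk would issue. */
enum demo_barrier {
	DEMO_OUTSIDE_HUNK,            /* next task has no mm: not shown in the diff */
	DEMO_SMP_MB,                  /* kernel -> user: explicit smp_mb() */
	DEMO_SMP_MB_AFTER_SWITCH_MM,  /* user -> user: arch-overridable barrier */
};

/* Mirror the if/else added by the patch for transitions to a user task. */
enum demo_barrier demo_switch_mm_cid_barrier(bool prev_has_mm, bool next_has_mm)
{
	if (!next_has_mm)
		return DEMO_OUTSIDE_HUNK;
	if (!prev_has_mm)
		return DEMO_SMP_MB;                 /* from kernel */
	return DEMO_SMP_MB_AFTER_SWITCH_MM;         /* from user */
}

/* Usage: check both transitions covered by the hunk. */
void demo_selfcheck_barrier_choice(void)
{
	assert(demo_switch_mm_cid_barrier(false, true) == DEMO_SMP_MB);
	assert(demo_switch_mm_cid_barrier(true, true) == DEMO_SMP_MB_AFTER_SWITCH_MM);
}
```

On x86 the second case costs nothing, since smp_mb__after_switch_mm() is a
no-op there; on architectures using the generic definition it is a full
smp_mb().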