Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

timers: Annotate possible non critical data race of next_expiry

Global timers could be expired remotely when the target CPU is idle. After
a remote timer expiry, the remote timer_base->next_expiry value is updated
while holding the timer_base->lock. When the formerly idle CPU becomes
active at the same time and checks whether timers need to expire, this
check is done lockless as it is on the local CPU. This could lead to a data
race, which was reported by sysbot:

https://lore.kernel.org/r/000000000000916e55061f969e14@google.com

When the value is read lockless but changed by the remote CPU, only two non
critical scenarios could happen:

1) The already update value is read -> everything is perfect

2) The old value is read -> a superfluous timer soft interrupt is raised

The same situation could happen when enqueueing a new first pinned timer by
a remote CPU also with non critical scenarios:

1) The already update value is read -> everything is perfect

2) The old value is read -> when the CPU is idle, an IPI is executed
nevertheless and when the CPU isn't idle, the updated value will be visible
on the next tick and the timer might be late one jiffie.

As this is very unlikely to happen, the overhead of doing the check under
the lock is a way more effort, than a superfluous timer soft interrupt or a
possible 1 jiffie delay of the timer.

Document and annotate this non critical behavior in the code by using
READ/WRITE_ONCE() pair when accessing timer_base->next_expiry.

Reported-by: syzbot+bf285fcc0a048e028118@syzkaller.appspotmail.com
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/all/20240829154305.19259-1-anna-maria@linutronix.de
Closes: https://lore.kernel.org/lkml/000000000000916e55061f969e14@google.com

authored by

Anna-Maria Behnsen and committed by
Thomas Gleixner
79f8b28e 4381b895

+37 -5
+37 -5
kernel/time/timer.c
··· 672 672 * Set the next expiry time and kick the CPU so it 673 673 * can reevaluate the wheel: 674 674 */ 675 - base->next_expiry = bucket_expiry; 675 + WRITE_ONCE(base->next_expiry, bucket_expiry); 676 676 base->timers_pending = true; 677 677 base->next_expiry_recalc = false; 678 678 trigger_dyntick_cpu(base, timer); ··· 1966 1966 clk += adj; 1967 1967 } 1968 1968 1969 - base->next_expiry = next; 1969 + WRITE_ONCE(base->next_expiry, next); 1970 1970 base->next_expiry_recalc = false; 1971 1971 base->timers_pending = !(next == base->clk + NEXT_TIMER_MAX_DELTA); 1972 1972 } ··· 2020 2020 * easy comparable to find out which base holds the first pending timer. 2021 2021 */ 2022 2022 if (!base->timers_pending) 2023 - base->next_expiry = basej + NEXT_TIMER_MAX_DELTA; 2023 + WRITE_ONCE(base->next_expiry, basej + NEXT_TIMER_MAX_DELTA); 2024 2024 2025 2025 return base->next_expiry; 2026 2026 } ··· 2464 2464 hrtimer_run_queues(); 2465 2465 2466 2466 for (int i = 0; i < NR_BASES; i++, base++) { 2467 - /* Raise the softirq only if required. */ 2468 - if (time_after_eq(jiffies, base->next_expiry) || 2467 + /* 2468 + * Raise the softirq only if required. 2469 + * 2470 + * timer_base::next_expiry can be written by a remote CPU while 2471 + * holding the lock. If this write happens at the same time than 2472 + * the lockless local read, sanity checker could complain about 2473 + * data corruption. 2474 + * 2475 + * There are two possible situations where 2476 + * timer_base::next_expiry is written by a remote CPU: 2477 + * 2478 + * 1. Remote CPU expires global timers of this CPU and updates 2479 + * timer_base::next_expiry of BASE_GLOBAL afterwards in 2480 + * next_timer_interrupt() or timer_recalc_next_expiry(). The 2481 + * worst outcome is a superfluous raise of the timer softirq 2482 + * when the not yet updated value is read. 2483 + * 2484 + * 2. A new first pinned timer is enqueued by a remote CPU 2485 + * and therefore timer_base::next_expiry of BASE_LOCAL is 2486 + * updated. When this update is missed, this isn't a 2487 + * problem, as an IPI is executed nevertheless when the CPU 2488 + * was idle before. When the CPU wasn't idle but the update 2489 + * is missed, then the timer would expire one jiffie late - 2490 + * bad luck. 2491 + * 2492 + * Those unlikely corner cases where the worst outcome is only a 2493 + * one jiffie delay or a superfluous raise of the softirq are 2494 + * not that expensive as doing the check always while holding 2495 + * the lock. 2496 + * 2497 + * Possible remote writers are using WRITE_ONCE(). Local reader 2498 + * uses therefore READ_ONCE(). 2499 + */ 2500 + if (time_after_eq(jiffies, READ_ONCE(base->next_expiry)) || 2469 2501 (i == BASE_DEF && tmigr_requires_handle_remote())) { 2470 2502 raise_softirq(TIMER_SOFTIRQ); 2471 2503 return;