Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

clockevents: Prevent timer interrupt starvation

Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
up in user space. He provided a reproducer, which sets up a timerfd based
timer and then rearms it in a loop with an absolute expiry time of 1ns.

As the expiry time is in the past, the timer ends up as the first expiring
timer in the per CPU hrtimer base and the clockevent device is programmed
with the minimum delta value. If the machine is fast enough, this ends up
in an endless loop of programming the delta value to the minimum value
defined by the clock event device, before the timer interrupt can fire,
which starves the interrupt and consequently triggers the lockup detector
because the hrtimer callback of the lockup mechanism is never invoked.

As a first step to prevent this, avoid reprogramming the clock event device
when:
- a forced minimum delta event is pending
- the new expiry delta is less than or equal to the minimum delta

Thanks to Calvin for providing the reproducer and to Borislav for testing
and providing data from his Zen5 machine.

The problem is not limited to Zen5; depending on the underlying
clock event device (e.g. the TSC deadline timer on Intel) and the CPU speed
it is not necessarily observable.

This change serves only as a last resort; further changes will be made
to prevent this scenario earlier in the call chain as far as possible.

[ tglx: Updated to restore the old behaviour vs. !force and delta <= 0 and
fixed up the tick-broadcast handlers as pointed out by Borislav ]

Fixes: d316c57ff6bf ("[PATCH] clockevents: add core functionality")
Reported-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Calvin Owens <calvin@wbinvd.org>
Tested-by: Borislav Petkov <bp@alien8.de>
Link: https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/
Link: https://patch.msgid.link/20260407083247.562657657@kernel.org

31 insertions(+), 9 deletions(-)
include/linux/clockchips.h (+2)
@@
  * @shift:		nanoseconds to cycles divisor (power of two)
  * @state_use_accessors:current state of the device, assigned by the core code
  * @features:		features
+ * @next_event_forced:	True if the last programming was a forced event
  * @retries:		number of forced programming retries
  * @set_state_periodic:	switch state to periodic
  * @set_state_oneshot:	switch state to oneshot
@@
 	u32			shift;
 	enum clock_event_state	state_use_accessors;
 	unsigned int		features;
+	unsigned int		next_event_forced;
 	unsigned long		retries;
 
 	int			(*set_state_periodic)(struct clock_event_device *);
kernel/time/clockevents.c (+19 -8)
@@
 {
 	clockevents_switch_state(dev, CLOCK_EVT_STATE_SHUTDOWN);
 	dev->next_event = KTIME_MAX;
+	dev->next_event_forced = 0;
 }
@@
 {
 	unsigned long long clc;
 	int64_t delta;
-	int rc;
 
 	if (WARN_ON_ONCE(expires < 0))
 		return -ETIME;
@@
 		return dev->set_next_ktime(expires, dev);
 
 	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
-	if (delta <= 0)
-		return force ? clockevents_program_min_delta(dev) : -ETIME;
 
-	delta = min(delta, (int64_t) dev->max_delta_ns);
-	delta = max(delta, (int64_t) dev->min_delta_ns);
+	/* Required for tick_periodic() during early boot */
+	if (delta <= 0 && !force)
+		return -ETIME;
 
-	clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
-	rc = dev->set_next_event((unsigned long) clc, dev);
+	if (delta > (int64_t)dev->min_delta_ns) {
+		delta = min(delta, (int64_t) dev->max_delta_ns);
+		clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
+		if (!dev->set_next_event((unsigned long) clc, dev))
+			return 0;
+	}
 
-	return (rc && force) ? clockevents_program_min_delta(dev) : rc;
+	if (dev->next_event_forced)
+		return 0;
+
+	if (dev->set_next_event(dev->min_delta_ticks, dev)) {
+		if (!force || clockevents_program_min_delta(dev))
+			return -ETIME;
+	}
+	dev->next_event_forced = 1;
+	return 0;
 }
 
 /*
kernel/time/hrtimer.c (+1)
@@
 	BUG_ON(!cpu_base->hres_active);
 	cpu_base->nr_events++;
 	dev->next_event = KTIME_MAX;
+	dev->next_event_forced = 0;
 
 	raw_spin_lock_irqsave(&cpu_base->lock, flags);
 	entry_time = now = hrtimer_update_base(cpu_base);
kernel/time/tick-broadcast.c (+7 -1)
@@
  */
 static void tick_broadcast_start_periodic(struct clock_event_device *bc)
 {
-	if (bc)
+	if (bc) {
+		bc->next_event_forced = 0;
 		tick_setup_periodic(bc, 1);
+	}
 }
@@
 	bool bc_local;
 
 	raw_spin_lock(&tick_broadcast_lock);
+	tick_broadcast_device.evtdev->next_event_forced = 0;
 
 	/* Handle spurious interrupts gracefully */
 	if (clockevent_state_shutdown(tick_broadcast_device.evtdev)) {
@@
 	raw_spin_lock(&tick_broadcast_lock);
 	dev->next_event = KTIME_MAX;
+	tick_broadcast_device.evtdev->next_event_forced = 0;
 	next_event = KTIME_MAX;
 	cpumask_clear(tmpmask);
 	now = ktime_get();
@@
 	bc->event_handler = tick_handle_oneshot_broadcast;
+	bc->next_event_forced = 0;
 	bc->next_event = KTIME_MAX;
 
 	/*
@@
 	}
 
 		/* This moves the broadcast assignment to this CPU: */
+		bc->next_event_forced = 0;
 		clockevents_program_event(bc, bc->next_event, 1);
 	}
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
kernel/time/tick-common.c (+1)
@@
 	int cpu = smp_processor_id();
 	ktime_t next = dev->next_event;
 
+	dev->next_event_forced = 0;
 	tick_periodic(cpu);
 
 	/*
kernel/time/tick-sched.c (+1)
@@
 	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 
 	dev->next_event = KTIME_MAX;
+	dev->next_event_forced = 0;
 
 	if (likely(tick_nohz_handler(&ts->sched_timer) == HRTIMER_RESTART))
 		tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);