watchdog/softlockup: fix incorrect CPU utilization output during softlockup

Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


Because the statistics are kept with 16-bit precision, the raw nanosecond
values undergo integer division by 2^24, which discards the low-order
bits. This can make the calculated CPU utilization slightly inaccurate.
Under normal circumstances this isn't an issue. However, when CPU
utilization reaches 100%, the calculated result might exceed 100%. For
example, with raw data like the following:

sample_period 400000134 new_stat 83648414036 old_stat 83247417494

sample_period=400000134/2^24=23
new_stat=83648414036/2^24=4985
old_stat=83247417494/2^24=4961
util=DIV_ROUND_UP(100*(4985-4961), 23)=105%
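
The arithmetic above can be reproduced in user space. The sketch below is
illustrative only: to_16bit_trunc() and util_percent() are hypothetical names,
not kernel functions. The first mirrors the pre-patch `data_ns >> 24`, the
second mirrors `DIV_ROUND_UP(100 * (new_stat - old_stat), sample_period_16)`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the pre-patch get_16bit_precision():
 * a truncating shift, 2^24 ns ~= 16.8 ms.
 */
static uint16_t to_16bit_trunc(uint64_t data_ns)
{
	return (uint16_t)(data_ns >> 24);
}

/* Hypothetical helper: percentage rounded up, as DIV_ROUND_UP() does. */
static int util_percent(uint16_t new_stat, uint16_t old_stat, uint16_t period)
{
	int delta = new_stat - old_stat;

	assert(period != 0);
	return (100 * delta + period - 1) / period;
}
```

With the raw values from the example, the truncated samples are 23, 4985 and
4961, so util_percent(4985, 4961, 23) evaluates to 105.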

The following log is then printed:

CPU#3 Utilization every 0s during lockup:
#1: 0% system, 0% softirq, 105% hardirq, 0% idle
#2: 0% system, 0% softirq, 105% hardirq, 0% idle
#3: 0% system, 0% softirq, 100% hardirq, 0% idle
#4: 0% system, 0% softirq, 105% hardirq, 0% idle
#5: 0% system, 0% softirq, 105% hardirq, 0% idle

To avoid confusion, we enforce a 100% display cap when calculations exceed
this threshold.

We also round to the nearest multiple of 2^24 ns (about 16.8 milliseconds)
instead of truncating, to improve the accuracy.
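
As a rough user-space illustration of both changes, the sketch below applies
round-to-nearest and the 100% cap to the sample values quoted earlier.
to_16bit_round() and util_capped() are hypothetical names chosen for this
example; only the expressions inside them follow the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patched get_16bit_precision():
 * add half of 2^24 before shifting so the result rounds to nearest.
 */
static uint16_t to_16bit_round(uint64_t data_ns)
{
	return (uint16_t)((data_ns + (1ULL << 23)) >> 24);
}

/* Hypothetical helper: rounded-up percentage, clamped to 100. */
static int util_capped(uint16_t new_stat, uint16_t old_stat, uint16_t period)
{
	int delta = new_stat - old_stat;
	int util;

	assert(period != 0);
	util = (100 * delta + period - 1) / period;
	if (util > 100)
		util = 100;
	return util;
}
```

For the raw data above, rounding gives 24, 4986 and 4962, so the utilization
comes out at exactly 100. Rounding alone need not rule out every overshoot,
which is presumably why the cap is kept as well: util_capped(4985, 4961, 23)
also yields 100 rather than 105.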

[yaozhenguo1@gmail.com: make get_16bit_precision() more accurate, fix comment layout]
Link: https://lkml.kernel.org/r/20250818081438.40540-1-yaozhenguo@jd.com
Link: https://lkml.kernel.org/r/20250812082510.32291-1-yaozhenguo@jd.com
Signed-off-by: ZhenguoYao <yaozhenguo1@gmail.com>
Cc: Bitao Hu <yaoma@linux.alibaba.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by ZhenguoYao and committed by Andrew Morton 9 months ago 95f09127 41f88ddf

+13 -1

1 changed file

kernel/watchdog.c

···
  */
 static u16 get_16bit_precision(u64 data_ns)
 {
-	return data_ns >> 24LL; /* 2^24ns ~= 16.8ms */
+	/*
+	 * 2^24ns ~= 16.8ms
+	 * Round to the nearest multiple of 16.8 milliseconds.
+	 */
+	return (data_ns + (1 << 23)) >> 24LL;
 }
 
 static void update_cpustat(void)
···
 		old_stat = __this_cpu_read(cpustat_old[i]);
 		new_stat = get_16bit_precision(cpustat[tracked_stats[i]]);
 		util = DIV_ROUND_UP(100 * (new_stat - old_stat), sample_period_16);
+		/*
+		 * Since we use 16-bit precision, the raw data will undergo
+		 * integer division, which may sometimes result in data loss,
+		 * and then result might exceed 100%. To avoid confusion,
+		 * we enforce a 100% display cap when calculations exceed this threshold.
+		 */
+		if (util > 100)
+			util = 100;
 		__this_cpu_write(cpustat_util[tail][i], util);
 		__this_cpu_write(cpustat_old[i], new_stat);
 	}
