Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

watchdog: fix watchdog may detect false positive of softlockup

When updating `watchdog_thresh`, there is a race condition between writing
the new `watchdog_thresh` value and stopping the old watchdog timer. If
the old timer triggers during this window, it may falsely detect a
softlockup due to the old interval and the new `watchdog_thresh` value
being used. The problem can be described as follow:

# We asuume previous watchdog_thresh is 60, so the watchdog timer is
# coming every 24s.
echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
|
+------>+ update watchdog_thresh (We are in kernel now)
|
| # using old interval and new `watchdog_thresh`
+------>+ watchdog hrtimer (irq context: detect softlockup)
|
|
+-------+
|
|
+ softlockup_stop_all

To fix this problem, introduce a shadow variable for `watchdog_thresh`.
The update to the actual `watchdog_thresh` is delayed until after the old
timer is stopped, preventing false positives.

The following testcase may help to understand this problem.

---------------------------------------------
echo RT_RUNTIME_SHARE > /sys/kernel/debug/sched/features
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
echo 0 > /sys/kernel/debug/sched/fair_server/cpu3/runtime
echo 60 > /proc/sys/kernel/watchdog_thresh
taskset -c 3 chrt -r 99 /bin/bash -c "while true;do true; done" &
echo 10 > /proc/sys/kernel/watchdog_thresh &
---------------------------------------------

The test case above first removes the throttling restrictions for
real-time tasks. It then sets watchdog_thresh to 60 and executes a
real-time task ,a simple while(1) loop, on cpu3. Consequently, the final
command gets blocked because the presence of this real-time thread
prevents kworker:3 from being selected by the scheduler. This eventually
triggers a softlockup detection on cpu3 due to watchdog_timer_fn operating
with inconsistent variable - using both the old interval and the updated
watchdog_thresh simultaneously.

[nysal@linux.ibm.com: fix the SOFTLOCKUP_DETECTOR=n case]
Link: https://lkml.kernel.org/r/20250502111120.282690-1-nysal@linux.ibm.com
Link: https://lkml.kernel.org/r/20250421035021.3507649-1-luogengkun@huaweicloud.com
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Nysal Jan K.A." <nysal@linux.ibm.com>
Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by

Luo Gengkun and committed by
Andrew Morton
7123dbbe 2e27fa94

+27 -14
+27 -14
kernel/watchdog.c
··· 47 47 static int __read_mostly watchdog_hardlockup_user_enabled = WATCHDOG_HARDLOCKUP_DEFAULT; 48 48 static int __read_mostly watchdog_softlockup_user_enabled = 1; 49 49 int __read_mostly watchdog_thresh = 10; 50 + static int __read_mostly watchdog_thresh_next; 50 51 static int __read_mostly watchdog_hardlockup_available; 51 52 52 53 struct cpumask watchdog_cpumask __read_mostly; ··· 871 870 return 0; 872 871 } 873 872 874 - static void __lockup_detector_reconfigure(void) 873 + static void __lockup_detector_reconfigure(bool thresh_changed) 875 874 { 876 875 cpus_read_lock(); 877 876 watchdog_hardlockup_stop(); 878 877 879 878 softlockup_stop_all(); 879 + /* 880 + * To prevent watchdog_timer_fn from using the old interval and 881 + * the new watchdog_thresh at the same time, which could lead to 882 + * false softlockup reports, it is necessary to update the 883 + * watchdog_thresh after the softlockup is completed. 884 + */ 885 + if (thresh_changed) 886 + watchdog_thresh = READ_ONCE(watchdog_thresh_next); 880 887 set_sample_period(); 881 888 lockup_detector_update_enable(); 882 889 if (watchdog_enabled && watchdog_thresh) ··· 897 888 void lockup_detector_reconfigure(void) 898 889 { 899 890 mutex_lock(&watchdog_mutex); 900 - __lockup_detector_reconfigure(); 891 + __lockup_detector_reconfigure(false); 901 892 mutex_unlock(&watchdog_mutex); 902 893 } 903 894 ··· 917 908 return; 918 909 919 910 mutex_lock(&watchdog_mutex); 920 - __lockup_detector_reconfigure(); 911 + __lockup_detector_reconfigure(false); 921 912 softlockup_initialized = true; 922 913 mutex_unlock(&watchdog_mutex); 923 914 } 924 915 925 916 #else /* CONFIG_SOFTLOCKUP_DETECTOR */ 926 - static void __lockup_detector_reconfigure(void) 917 + static void __lockup_detector_reconfigure(bool thresh_changed) 927 918 { 928 919 cpus_read_lock(); 929 920 watchdog_hardlockup_stop(); 921 + if (thresh_changed) 922 + watchdog_thresh = READ_ONCE(watchdog_thresh_next); 930 923 lockup_detector_update_enable(); 931 924 watchdog_hardlockup_start(); 932 925 cpus_read_unlock(); 933 926 } 934 927 void lockup_detector_reconfigure(void) 935 928 { 936 - __lockup_detector_reconfigure(); 929 + __lockup_detector_reconfigure(false); 937 930 } 938 931 static inline void lockup_detector_setup(void) 939 932 { 940 - __lockup_detector_reconfigure(); 933 + __lockup_detector_reconfigure(false); 941 934 } 942 935 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */ 943 936 ··· 957 946 #ifdef CONFIG_SYSCTL 958 947 959 948 /* Propagate any changes to the watchdog infrastructure */ 960 - static void proc_watchdog_update(void) 949 + static void proc_watchdog_update(bool thresh_changed) 961 950 { 962 951 /* Remove impossible cpus to keep sysctl output clean. */ 963 952 cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask); 964 - __lockup_detector_reconfigure(); 953 + __lockup_detector_reconfigure(thresh_changed); 965 954 } 966 955 967 956 /* ··· 995 984 } else { 996 985 err = proc_dointvec_minmax(table, write, buffer, lenp, ppos); 997 986 if (!err && old != READ_ONCE(*param)) 998 - proc_watchdog_update(); 987 + proc_watchdog_update(false); 999 988 } 1000 989 mutex_unlock(&watchdog_mutex); 1001 990 return err; ··· 1046 1035 1047 1036 mutex_lock(&watchdog_mutex); 1048 1037 1049 - old = READ_ONCE(watchdog_thresh); 1038 + watchdog_thresh_next = READ_ONCE(watchdog_thresh); 1039 + 1040 + old = watchdog_thresh_next; 1050 1041 err = proc_dointvec_minmax(table, write, buffer, lenp, ppos); 1051 1042 1052 - if (!err && write && old != READ_ONCE(watchdog_thresh)) 1053 - proc_watchdog_update(); 1043 + if (!err && write && old != READ_ONCE(watchdog_thresh_next)) 1044 + proc_watchdog_update(true); 1054 1045 1055 1046 mutex_unlock(&watchdog_mutex); 1056 1047 return err; ··· 1073 1060 1074 1061 err = proc_do_large_bitmap(table, write, buffer, lenp, ppos); 1075 1062 if (!err && write) 1076 - proc_watchdog_update(); 1063 + proc_watchdog_update(false); 1077 1064 1078 1065 mutex_unlock(&watchdog_mutex); 1079 1066 return err; ··· 1093 1080 }, 1094 1081 { 1095 1082 .procname = "watchdog_thresh", 1096 - .data = &watchdog_thresh, 1083 + .data = &watchdog_thresh_next, 1097 1084 .maxlen = sizeof(int), 1098 1085 .mode = 0644, 1099 1086 .proc_handler = proc_watchdog_thresh,