Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Revert "x86/smp: Put CPUs into INIT on shutdown if possible"

This reverts commit 45e34c8af58f23db4474e2bfe79183efec09a18b, and the
two subsequent fixes to it:

3f874c9b2aae ("x86/smp: Don't send INIT to non-present and non-booted CPUs")
b1472a60a584 ("x86/smp: Don't send INIT to boot CPU")

because it seems to result in hung machines at shutdown, particularly
some Dell machines, but Thomas says

"The rest seems to be Lenovo and Sony with Alderlake/Raptorlake CPUs -
at least that's what I could figure out from the various bug reports.

I don't know which CPUs the DELL machines have, so I can't say it's a
pattern.

I agree with the revert for now"

Ashok Raj chimes in:

"There was a report (probably this same one), and it turns out it was a
bug in the BIOS SMI handler.

The client BIOS's were waiting for the lowest APICID to be the SMI
rendezvous master. If this is MeteorLake, the BSP wasn't the one with
the lowest APIC and it tripped here.

The BIOS change is also being pushed to others for assimilation :)

Server BIOS's had this correctly for a while now"

and it does look likely to be some bad interaction between SMI and the
non-BSP cores having been put into INIT (and thus unresponsive until reset).

Link: https://bbs.archlinux.org/viewtopic.php?pid=2124429
Link: https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
Link: https://forum.artixlinux.org/index.php/topic,5997.0.html
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2241279
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
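
The failure mode Ashok Raj describes can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not BIOS or kernel code: struct model_cpu, lowest_apicid_cpu() and smi_rendezvous_completes() are hypothetical names, and the model assumes the SMI handler simply waits for whichever CPU has the lowest APIC ID, and that a CPU parked in INIT never answers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one logical CPU as seen by an SMI handler. */
struct model_cpu {
	unsigned int apicid;	/* APIC ID of this CPU */
	bool parked_in_init;	/* true if the CPU sits in INIT and cannot respond */
};

/* Pick the CPU with the lowest APIC ID (assumed rendezvous master policy). */
static const struct model_cpu *lowest_apicid_cpu(const struct model_cpu *cpus,
						 size_t n)
{
	const struct model_cpu *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!best || cpus[i].apicid < best->apicid)
			best = &cpus[i];
	}
	return best;
}

/*
 * Return true if the modelled rendezvous can complete, false if the
 * handler would wait forever for a master that is parked in INIT.
 */
static bool smi_rendezvous_completes(const struct model_cpu *cpus, size_t n)
{
	const struct model_cpu *master = lowest_apicid_cpu(cpus, n);

	return master && !master->parked_in_init;
}
```

In this model, a system whose boot CPU also has the lowest APIC ID still completes the rendezvous after the other CPUs are parked, while a system where a parked CPU holds the lowest APIC ID does not, which matches the hang described in the quoted report.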

+7 -60
-1
arch/x86/include/asm/smp.h
···
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
 
-bool smp_park_other_cpus_in_init(void);
 void smp_store_cpu_info(int id);
 
 asmlinkage __visible void smp_reboot_interrupt(void);
+7 -32
arch/x86/kernel/smp.c
···
 }
 
 /*
- * Disable virtualization, APIC etc. and park the CPU in a HLT loop
+ * this function calls the 'stop' function on all other CPUs in the system.
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 {
···
 	 * 2) Wait for all other CPUs to report that they reached the
 	 *    HLT loop in stop_this_cpu()
 	 *
-	 * 3) If the system uses INIT/STARTUP for CPU bringup, then
-	 *    send all present CPUs an INIT vector, which brings them
-	 *    completely out of the way.
+	 * 3) If #2 timed out send an NMI to the CPUs which did not
+	 *    yet report
 	 *
-	 * 4) If #3 is not possible and #2 timed out send an NMI to the
-	 *    CPUs which did not yet report
-	 *
-	 * 5) Wait for all other CPUs to report that they reached the
+	 * 4) Wait for all other CPUs to report that they reached the
 	 *    HLT loop in stop_this_cpu()
 	 *
-	 * #4 can obviously race against a CPU reaching the HLT loop late.
+	 * #3 can obviously race against a CPU reaching the HLT loop late.
 	 * That CPU will have reported already and the "have all CPUs
 	 * reached HLT" condition will be true despite the fact that the
 	 * other CPU is still handling the NMI. Again, there is no
···
 		/*
 		 * Don't wait longer than a second for IPI completion. The
 		 * wait request is not checked here because that would
-		 * prevent an NMI/INIT shutdown in case that not all
+		 * prevent an NMI shutdown attempt in case that not all
 		 * CPUs reach shutdown state.
 		 */
 		timeout = USEC_PER_SEC;
···
 			udelay(1);
 	}
 
-	/*
-	 * Park all other CPUs in INIT including "offline" CPUs, if
-	 * possible. That's a safe place where they can't resume execution
-	 * of HLT and then execute the HLT loop from overwritten text or
-	 * page tables.
-	 *
-	 * The only downside is a broadcast MCE, but up to the point where
-	 * the kexec() kernel brought all APs online again an MCE will just
-	 * make HLT resume and handle the MCE. The machine crashes and burns
-	 * due to overwritten text, page tables and data. So there is a
-	 * choice between fire and frying pan. The result is pretty much
-	 * the same. Chose frying pan until x86 provides a sane mechanism
-	 * to park a CPU.
-	 */
-	if (smp_park_other_cpus_in_init())
-		goto done;
-
-	/*
-	 * If park with INIT was not possible and the REBOOT_VECTOR didn't
-	 * take all secondary CPUs offline, try with the NMI.
-	 */
+	/* if the REBOOT_VECTOR didn't work, try with the NMI */
 	if (!cpumask_empty(&cpus_stop_mask)) {
 		/*
 		 * If NMI IPI is enabled, try to register the stop handler
···
 			udelay(1);
 	}
 
-done:
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
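
The four-step ordering that the comment in arch/x86/kernel/smp.c describes after this revert (reboot-vector IPI, wait, NMI fallback, wait) can be sketched as a user-space model. The code below is an illustration under assumptions and not the kernel implementation: struct stop_result and model_stop_other_cpus() are hypothetical names, CPUs are represented as bits in a 64-bit mask, and timeouts and NMI handler registration are collapsed into the answers_ipi and answers_nmi masks.

```c
#include <stdint.h>

/* Hypothetical outcome of the modelled shutdown sequence. */
struct stop_result {
	uint64_t still_running;	/* CPUs that never reported reaching HLT */
	int nmi_sent;		/* 1 if the NMI fallback was needed */
};

/*
 * others:      bitmask of all CPUs except the one doing the shutdown
 * answers_ipi: CPUs that react to the reboot-vector IPI in time
 * answers_nmi: CPUs that react to the NMI in time
 */
static struct stop_result model_stop_other_cpus(uint64_t others,
						 uint64_t answers_ipi,
						 uint64_t answers_nmi)
{
	struct stop_result r = { .still_running = others, .nmi_sent = 0 };

	/* Steps 1 and 2: send the reboot-vector IPI, then wait for reports. */
	r.still_running &= ~answers_ipi;

	/* Step 3: only if some CPUs did not report, fall back to the NMI. */
	if (r.still_running) {
		r.nmi_sent = 1;
		/* Step 4: wait again for the remaining CPUs to report. */
		r.still_running &= ~answers_nmi;
	}
	return r;
}
```

With the INIT parking step removed, the model has no path that puts a CPU into a state where it cannot respond at all: a CPU either reports after the IPI, reports after the NMI, or is left in still_running.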
-27
arch/x86/kernel/smpboot.c
···
 	cache_aps_init();
 }
 
-bool smp_park_other_cpus_in_init(void)
-{
-	unsigned int cpu, this_cpu = smp_processor_id();
-	unsigned int apicid;
-
-	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
-		return false;
-
-	/*
-	 * If this is a crash stop which does not execute on the boot CPU,
-	 * then this cannot use the INIT mechanism because INIT to the boot
-	 * CPU will reset the machine.
-	 */
-	if (this_cpu)
-		return false;
-
-	for_each_cpu_and(cpu, &cpus_booted_once_mask, cpu_present_mask) {
-		if (cpu == this_cpu)
-			continue;
-		apicid = apic->cpu_present_to_apicid(cpu);
-		if (apicid == BAD_APICID)
-			continue;
-		send_init_sequence(apicid);
-	}
-	return true;
-}
-
 /*
  * Early setup to make printk work.
  */