Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'x86-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

- Fix CPU topology related regression that limited Xen PV guests to a
single CPU

- Fix ancient e820__register_nosave_regions() bugs that were causing
problems with kexec's artificial memory maps

- Fix an S4 hibernation crash caused by two missing ENDBR's that were
mistakenly removed in a recent commit

- Fix a resctrl serialization bug

- Fix early_printk documentation and comments

- Fix RSB bugs, combined with preparatory updates to better match the
code to vendor recommendations.

- Add RSB mitigation document

- Fix/update documentation

- Fix the erratum_1386_microcode[] table to be NULL terminated

* tag 'x86-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ibt: Fix hibernate
x86/cpu: Avoid running off the end of an AMD erratum table
Documentation/x86: Zap the subsection letters
Documentation/x86: Update the naming of CPU features for /proc/cpuinfo
x86/bugs: Add RSB mitigation document
x86/bugs: Don't fill RSB on context switch with eIBRS
x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
x86/bugs: Fix RSB clearing in indirect_branch_prediction_barrier()
x86/bugs: Use SBPB in write_ibpb() if applicable
x86/bugs: Rename entry_ibpb() to write_ibpb()
x86/early_printk: Use 'mmio32' for consistency, fix comments
x86/resctrl: Fix rdtgroup_mkdir()'s unlocked use of kernfs_node::name
x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions()
x86/acpi: Don't limit CPUs to 1 for Xen PV guests due to disabled ACPI

+408 -160
+1
Documentation/admin-guide/hw-vuln/index.rst
···
 srso
 gather_data_sampling
 reg-file-data-sampling
+rsb
+268
Documentation/admin-guide/hw-vuln/rsb.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+RSB-related mitigations
+=======================
+
+.. warning::
+   Please keep this document up-to-date, otherwise you will be
+   volunteered to update it and convert it to a very long comment in
+   bugs.c!
+
+Since 2018 there have been many Spectre CVEs related to the Return Stack
+Buffer (RSB) (sometimes referred to as the Return Address Stack (RAS) or
+Return Address Predictor (RAP) on AMD).
+
+Information about these CVEs and how to mitigate them is scattered
+amongst a myriad of microarchitecture-specific documents.
+
+This document attempts to consolidate all the relevant information in
+one place and clarify the reasoning behind the current RSB-related
+mitigations. It's meant to be as concise as possible, focused only on
+the current kernel mitigations: what are the RSB-related attack vectors
+and how are they currently being mitigated?
+
+It's *not* meant to describe how the RSB mechanism operates or how the
+exploits work. More details about those can be found in the references
+below.
+
+Rather, this is basically a glorified comment, but too long to actually
+be one. So when the next CVE comes along, a kernel developer can
+quickly refer to this as a refresher to see what we're actually doing
+and why.
+
+At a high level, there are two classes of RSB attacks: RSB poisoning
+(Intel and AMD) and RSB underflow (Intel only). They must each be
+considered individually for each attack vector (and microarchitecture
+where applicable).
+
+----
+
+RSB poisoning (Intel and AMD)
+=============================
+
+SpectreRSB
+~~~~~~~~~~
+
+RSB poisoning is a technique used by SpectreRSB [#spectre-rsb]_ where
+an attacker poisons an RSB entry to cause a victim's return instruction
+to speculate to an attacker-controlled address. This can happen when
+there are unbalanced CALLs/RETs after a context switch or VMEXIT.
+
+* All attack vectors can potentially be mitigated by flushing out any
+  poisoned RSB entries using an RSB filling sequence
+  [#intel-rsb-filling]_ [#amd-rsb-filling]_ when transitioning between
+  untrusted and trusted domains. But this has a performance impact and
+  should be avoided whenever possible.
+
+  .. DANGER::
+     **FIXME**: Currently we're flushing 32 entries. However, some CPU
+     models have more than 32 entries. The loop count needs to be
+     increased for those. More detailed information is needed about RSB
+     sizes.
+
+* On context switch, the user->user mitigation requires ensuring the
+  RSB gets filled or cleared whenever IBPB gets written [#cond-ibpb]_
+  during a context switch:
+
+  * AMD:
+	On Zen 4+, IBPB (or SBPB [#amd-sbpb]_ if used) clears the RSB.
+	This is indicated by IBPB_RET in CPUID [#amd-ibpb-rsb]_.
+
+	On Zen < 4, the RSB filling sequence [#amd-rsb-filling]_ must
+	always be done in addition to IBPB [#amd-ibpb-no-rsb]_. This is
+	indicated by X86_BUG_IBPB_NO_RET.
+
+  * Intel:
+	IBPB always clears the RSB:
+
+	"Software that executed before the IBPB command cannot control
+	the predicted targets of indirect branches executed after the
+	command on the same logical processor. The term indirect branch
+	in this context includes near return instructions, so these
+	predicted targets may come from the RSB." [#intel-ibpb-rsb]_
+
+* On context switch, user->kernel attacks are prevented by SMEP. User
+  space can only insert user space addresses into the RSB. Even
+  non-canonical addresses can't be inserted due to the page gap at the
+  end of the user canonical address space reserved by TASK_SIZE_MAX.
+  A SMEP #PF at instruction fetch prevents the kernel from speculatively
+  executing user space.
+
+  * AMD:
+	"Finally, branches that are predicted as 'ret' instructions get
+	their predicted targets from the Return Address Predictor (RAP).
+	AMD recommends software use a RAP stuffing sequence (mitigation
+	V2-3 in [2]) and/or Supervisor Mode Execution Protection (SMEP)
+	to ensure that the addresses in the RAP are safe for
+	speculation. Collectively, we refer to these mitigations as "RAP
+	Protection"." [#amd-smep-rsb]_
+
+  * Intel:
+	"On processors with enhanced IBRS, an RSB overwrite sequence may
+	not suffice to prevent the predicted target of a near return
+	from using an RSB entry created in a less privileged predictor
+	mode. Software can prevent this by enabling SMEP (for
+	transitions from user mode to supervisor mode) and by having
+	IA32_SPEC_CTRL.IBRS set during VM exits." [#intel-smep-rsb]_
+
+* On VMEXIT, guest->host attacks are mitigated by eIBRS (and PBRSB
+  mitigation if needed):
+
+  * AMD:
+	"When Automatic IBRS is enabled, the internal return address
+	stack used for return address predictions is cleared on VMEXIT."
+	[#amd-eibrs-vmexit]_
+
+  * Intel:
+	"On processors with enhanced IBRS, an RSB overwrite sequence may
+	not suffice to prevent the predicted target of a near return
+	from using an RSB entry created in a less privileged predictor
+	mode. Software can prevent this by enabling SMEP (for
+	transitions from user mode to supervisor mode) and by having
+	IA32_SPEC_CTRL.IBRS set during VM exits. Processors with
+	enhanced IBRS still support the usage model where IBRS is set
+	only in the OS/VMM for OSes that enable SMEP. To do this, such
+	processors will ensure that guest behavior cannot control the
+	RSB after a VM exit once IBRS is set, even if IBRS was not set
+	at the time of the VM exit." [#intel-eibrs-vmexit]_
+
+    Note that some Intel CPUs are susceptible to Post-barrier Return
+    Stack Buffer Predictions (PBRSB) [#intel-pbrsb]_, where the last
+    CALL from the guest can be used to predict the first unbalanced RET.
+    In this case the PBRSB mitigation is needed in addition to eIBRS.
+
+AMD RETBleed / SRSO / Branch Type Confusion
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On AMD, poisoned RSB entries can also be created by the AMD RETBleed
+variant [#retbleed-paper]_ [#amd-btc]_ or by Speculative Return Stack
+Overflow [#amd-srso]_ (Inception [#inception-paper]_). The kernel
+protects itself by replacing every RET in the kernel with a branch to a
+single safe RET.
+
+----
+
+RSB underflow (Intel only)
+==========================
+
+RSB Alternate (RSBA) ("Intel Retbleed")
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some Intel Skylake-generation CPUs are susceptible to the Intel variant
+of RETBleed [#retbleed-paper]_ (Return Stack Buffer Underflow
+[#intel-rsbu]_). If a RET is executed when the RSB buffer is empty due
+to mismatched CALLs/RETs or returning from a deep call stack, the branch
+predictor can fall back to using the Branch Target Buffer (BTB). If a
+user forces a BTB collision then the RET can speculatively branch to a
+user-controlled address.
+
+* Note that RSB filling doesn't fully mitigate this issue. If there
+  are enough unbalanced RETs, the RSB may still underflow and fall back
+  to using a poisoned BTB entry.
+
+* On context switch, user->user underflow attacks are mitigated by the
+  conditional IBPB [#cond-ibpb]_ on context switch which effectively
+  clears the BTB:
+
+  * "The indirect branch predictor barrier (IBPB) is an indirect branch
+    control mechanism that establishes a barrier, preventing software
+    that executed before the barrier from controlling the predicted
+    targets of indirect branches executed after the barrier on the same
+    logical processor." [#intel-ibpb-btb]_
+
+* On context switch and VMEXIT, user->kernel and guest->host RSB
+  underflows are mitigated by IBRS or eIBRS:
+
+  * "Enabling IBRS (including enhanced IBRS) will mitigate the "RSBU"
+    attack demonstrated by the researchers. As previously documented,
+    Intel recommends the use of enhanced IBRS, where supported. This
+    includes any processor that enumerates RRSBA but not RRSBA_DIS_S."
+    [#intel-rsbu]_
+
+  However, note that eIBRS and IBRS do not mitigate intra-mode attacks.
+  Like RRSBA below, this is mitigated by clearing the BHB on kernel
+  entry.
+
+  As an alternative to classic IBRS, call depth tracking (combined with
+  retpolines) can be used to track kernel returns and fill the RSB when
+  it gets close to being empty.
+
+Restricted RSB Alternate (RRSBA)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some newer Intel CPUs have Restricted RSB Alternate (RRSBA) behavior,
+which, similar to RSBA described above, also falls back to using the BTB
+on RSB underflow. The only difference is that the predicted targets are
+restricted to the current domain when eIBRS is enabled:
+
+* "Restricted RSB Alternate (RRSBA) behavior allows alternate branch
+  predictors to be used by near RET instructions when the RSB is
+  empty. When eIBRS is enabled, the predicted targets of these
+  alternate predictors are restricted to those belonging to the
+  indirect branch predictor entries of the current prediction domain."
+  [#intel-eibrs-rrsba]_
+
+When a CPU with RRSBA is vulnerable to Branch History Injection
+[#bhi-paper]_ [#intel-bhi]_, an RSB underflow could be used for an
+intra-mode BTI attack. This is mitigated by clearing the BHB on
+kernel entry.
+
+However if the kernel uses retpolines instead of eIBRS, it needs to
+disable RRSBA:
+
+* "Where software is using retpoline as a mitigation for BHI or
+  intra-mode BTI, and the processor both enumerates RRSBA and
+  enumerates RRSBA_DIS controls, it should disable this behavior."
+  [#intel-retpoline-rrsba]_
+
+----
+
+References
+==========
+
+.. [#spectre-rsb] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://arxiv.org/pdf/1807.07940.pdf>`_
+
+.. [#intel-rsb-filling] "Empty RSB Mitigation on Skylake-generation" in `Retpoline: A Branch Target Injection Mitigation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/retpoline-branch-target-injection-mitigation.html#inpage-nav-5-1>`_
+
+.. [#amd-rsb-filling] "Mitigation V2-3" in `Software Techniques for Managing Speculation <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf>`_
+
+.. [#cond-ibpb] Whether IBPB is written depends on whether the prev and/or next task is protected from Spectre attacks. It typically requires opting in per task or system-wide. For more details see the documentation for the ``spectre_v2_user`` cmdline option in Documentation/admin-guide/kernel-parameters.txt.
+
+.. [#amd-sbpb] IBPB without flushing of branch type predictions. Only exists for AMD.
+
+.. [#amd-ibpb-rsb] "Function 8000_0008h -- Processor Capacity Parameters and Extended Feature Identification" in `AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf>`_. SBPB behaves the same way according to `this email <https://lore.kernel.org/5175b163a3736ca5fd01cedf406735636c99a>`_.
+
+.. [#amd-ibpb-no-rsb] `Spectre Attacks: Exploiting Speculative Execution <https://comsec.ethz.ch/wp-content/files/ibpb_sp25.pdf>`_
+
+.. [#intel-ibpb-rsb] "Introduction" in `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html>`_
+
+.. [#amd-smep-rsb] "Existing Mitigations" in `Technical Guidance for Mitigating Branch Type Confusion <https://www.amd.com/content/dam/amd/en/documents/resources/technical-guidance-for-mitigating-branch-type-confusion.pdf>`_
+
+.. [#intel-smep-rsb] "Enhanced IBRS" in `Indirect Branch Restricted Speculation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html>`_
+
+.. [#amd-eibrs-vmexit] "Extended Feature Enable Register (EFER)" in `AMD64 Architecture Programmer's Manual Volume 2: System Programming <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf>`_
+
+.. [#intel-eibrs-vmexit] "Enhanced IBRS" in `Indirect Branch Restricted Speculation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html>`_
+
+.. [#intel-pbrsb] `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html>`_
+
+.. [#retbleed-paper] `RETBleed: Arbitrary Speculative Code Execution with Return Instruction <https://comsec.ethz.ch/wp-content/files/retbleed_sec22.pdf>`_
+
+.. [#amd-btc] `Technical Guidance for Mitigating Branch Type Confusion <https://www.amd.com/content/dam/amd/en/documents/resources/technical-guidance-for-mitigating-branch-type-confusion.pdf>`_
+
+.. [#amd-srso] `Technical Update Regarding Speculative Return Stack Overflow <https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf>`_
+
+.. [#inception-paper] `Inception: Exposing New Attack Surfaces with Training in Transient Execution <https://comsec.ethz.ch/wp-content/files/inception_sec23.pdf>`_
+
+.. [#intel-rsbu] `Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html>`_
+
+.. [#intel-ibpb-btb] `Indirect Branch Predictor Barrier <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-predictor-barrier.html>`_
+
+.. [#intel-eibrs-rrsba] "Guidance for RSBU" in `Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html>`_
+
+.. [#bhi-paper] `Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks <http://download.vusec.net/papers/bhi-spectre-bhb_sec22.pdf>`_
+
+.. [#intel-bhi] `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html>`_
+
+.. [#intel-retpoline-rrsba] "Retpoline" in `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html>`_
+1 -4
Documentation/admin-guide/kernel-parameters.txt
···
 earlyprintk=serial[,0x...[,baudrate]]
 earlyprintk=ttySn[,baudrate]
 earlyprintk=dbgp[debugController#]
+earlyprintk=mmio32,membase[,{nocfg|baudrate}]
 earlyprintk=pciserial[,force],bus:device.function[,{nocfg|baudrate}]
 earlyprintk=xdbc[xhciController#]
 earlyprintk=bios
-earlyprintk=mmio,membase[,{nocfg|baudrate}]

 earlyprintk is useful when the kernel crashes before
 the normal console is initialized. It is not enabled by
 default because it has some cosmetic problems.
-
-Only 32-bit memory addresses are supported for "mmio"
-and "pciserial" devices.

 Use "nocfg" to skip UART configuration, assume
 BIOS/firmware has configured UART correctly.
+36 -33
Documentation/arch/x86/cpuinfo.rst
···
 How are feature flags created?
 ==============================

-a: Feature flags can be derived from the contents of CPUID leaves.
-------------------------------------------------------------------
+Feature flags can be derived from the contents of CPUID leaves
+--------------------------------------------------------------
+
 These feature definitions are organized mirroring the layout of CPUID
 leaves and grouped in words with offsets as mapped in enum cpuid_leafs
 in cpufeatures.h (see arch/x86/include/asm/cpufeatures.h for details).
···
 displayed accordingly in /proc/cpuinfo. For example, the flag "avx2"
 comes from X86_FEATURE_AVX2 in cpufeatures.h.

-b: Flags can be from scattered CPUID-based features.
-----------------------------------------------------
+Flags can be from scattered CPUID-based features
+------------------------------------------------
+
 Hardware features enumerated in sparsely populated CPUID leaves get
 software-defined values. Still, CPUID needs to be queried to determine
 if a given feature is present. This is done in init_scattered_cpuid_features().
···
 array. Since there is a struct cpuinfo_x86 for each possible CPU, the wasted
 memory is not trivial.

-c: Flags can be created synthetically under certain conditions for hardware features.
--------------------------------------------------------------------------------------
+Flags can be created synthetically under certain conditions for hardware features
+---------------------------------------------------------------------------------
+
 Examples of conditions include whether certain features are present in
 MSR_IA32_CORE_CAPS or specific CPU models are identified. If the needed
 conditions are met, the features are enabled by the set_cpu_cap or
···
 "split_lock_detect" will be displayed. The flag "ring3mwait" will be
 displayed only when running on INTEL_XEON_PHI_[KNL|KNM] processors.

-d: Flags can represent purely software features.
-------------------------------------------------
+Flags can represent purely software features
+--------------------------------------------
 These flags do not represent hardware features. Instead, they represent a
 software feature implemented in the kernel. For example, Kernel Page Table
 Isolation is purely software feature and its feature flag X86_FEATURE_PTI is
···
 resulting x86_cap/bug_flags[] are used to populate /proc/cpuinfo. The naming
 of flags in the x86_cap/bug_flags[] are as follows:

-a: The name of the flag is from the string in X86_FEATURE_<name> by default.
-----------------------------------------------------------------------------
-By default, the flag <name> in /proc/cpuinfo is extracted from the respective
-X86_FEATURE_<name> in cpufeatures.h. For example, the flag "avx2" is from
-X86_FEATURE_AVX2.
+Flags do not appear by default in /proc/cpuinfo
+-----------------------------------------------

-b: The naming can be overridden.
---------------------------------
+Feature flags are omitted by default from /proc/cpuinfo as it does not make
+sense for the feature to be exposed to userspace in most cases. For example,
+X86_FEATURE_ALWAYS is defined in cpufeatures.h but that flag is an internal
+kernel feature used in the alternative runtime patching functionality. So the
+flag does not appear in /proc/cpuinfo.
+
+Specify a flag name if absolutely needed
+----------------------------------------
+
 If the comment on the line for the #define X86_FEATURE_* starts with a
 double-quote character (""), the string inside the double-quote characters
 will be the name of the flags. For example, the flag "sse4_1" comes from
···
 constant. If, for some reason, the naming of X86_FEATURE_<name> changes, one
 shall override the new naming with the name already used in /proc/cpuinfo.

-c: The naming override can be "", which means it will not appear in /proc/cpuinfo.
-----------------------------------------------------------------------------------
-The feature shall be omitted from /proc/cpuinfo if it does not make sense for
-the feature to be exposed to userspace. For example, X86_FEATURE_ALWAYS is
-defined in cpufeatures.h but that flag is an internal kernel feature used
-in the alternative runtime patching functionality. So, its name is overridden
-with "". Its flag will not appear in /proc/cpuinfo.
-
 Flags are missing when one or more of these happen
 ==================================================

-a: The hardware does not enumerate support for it.
---------------------------------------------------
+The hardware does not enumerate support for it
+----------------------------------------------
+
 For example, when a new kernel is running on old hardware or the feature is
 not enabled by boot firmware. Even if the hardware is new, there might be a
 problem enabling the feature at run time, the flag will not be displayed.

-b: The kernel does not know about the flag.
--------------------------------------------
+The kernel does not know about the flag
+---------------------------------------
+
 For example, when an old kernel is running on new hardware.

-c: The kernel disabled support for it at compile-time.
-------------------------------------------------------
+The kernel disabled support for it at compile-time
+--------------------------------------------------
+
 For example, if 5-level-paging is not enabled when building (i.e.,
 CONFIG_X86_5LEVEL is not selected) the flag "la57" will not show up [#f1]_.
 Even though the feature will still be detected via CPUID, the kernel disables
 it by clearing via setup_clear_cpu_cap(X86_FEATURE_LA57).

-d: The feature is disabled at boot-time.
-----------------------------------------
+The feature is disabled at boot-time
+------------------------------------
 A feature can be disabled either using a command-line parameter or because
 it failed to be enabled. The command-line parameter clearcpuid= can be used
 to disable features using the feature number as defined in
···
 to, nofsgsbase, nosgx, noxsave, etc. 5-level paging can also be disabled using
 "no5lvl".

-e: The feature was known to be non-functional.
-----------------------------------------------
+The feature was known to be non-functional
+------------------------------------------
+
 The feature was known to be non-functional because a dependency was
 missing at runtime. For example, AVX flags will not show up if XSAVE feature
 is disabled since they depend on XSAVE feature. Another example would be broken
+5 -4
arch/x86/entry/entry.S
···

 .pushsection .noinstr.text, "ax"

-SYM_FUNC_START(entry_ibpb)
+/* Clobbers AX, CX, DX */
+SYM_FUNC_START(write_ibpb)
 	ANNOTATE_NOENDBR
 	movl	$MSR_IA32_PRED_CMD, %ecx
-	movl	$PRED_CMD_IBPB, %eax
+	movl	_ASM_RIP(x86_pred_cmd), %eax
 	xorl	%edx, %edx
 	wrmsr

 	/* Make sure IBPB clears return stack preductions too. */
 	FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_BUG_IBPB_NO_RET
 	RET
-SYM_FUNC_END(entry_ibpb)
+SYM_FUNC_END(write_ibpb)
 /* For KVM */
-EXPORT_SYMBOL_GPL(entry_ibpb);
+EXPORT_SYMBOL_GPL(write_ibpb);

 .popsection

+6 -6
arch/x86/include/asm/nospec-branch.h
···
  * typically has NO_MELTDOWN).
  *
  * While retbleed_untrain_ret() doesn't clobber anything but requires stack,
- * entry_ibpb() will clobber AX, CX, DX.
+ * write_ibpb() will clobber AX, CX, DX.
  *
  * As such, this must be placed after every *SWITCH_TO_KERNEL_CR3 at a point
  * where we have a stack but before any RET instruction.
···
 	VALIDATE_UNRET_END
 	CALL_UNTRAIN_RET
 	ALTERNATIVE_2 "", \
-		"call entry_ibpb", \ibpb_feature, \
+		"call write_ibpb", \ibpb_feature, \
 		__stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH
 #endif
 .endm
···
 extern void srso_alias_return_thunk(void);

 extern void entry_untrain_ret(void);
-extern void entry_ibpb(void);
+extern void write_ibpb(void);

 #ifdef CONFIG_X86_64
 extern void clear_bhb_loop(void);
···
 		: "memory");
 }

-extern u64 x86_pred_cmd;
-
 static inline void indirect_branch_prediction_barrier(void)
 {
-	alternative_msr_write(MSR_IA32_PRED_CMD, x86_pred_cmd, X86_FEATURE_IBPB);
+	asm_inline volatile(ALTERNATIVE("", "call write_ibpb", X86_FEATURE_IBPB)
+			    : ASM_CALL_CONSTRAINT
+			    :: "rax", "rcx", "rdx", "memory");
 }

 /* The Intel SPEC CTRL MSR base value cache */
+11
arch/x86/kernel/acpi/boot.c
···
 #include <linux/serial_core.h>
 #include <linux/pgtable.h>

+#include <xen/xen.h>
+
 #include <asm/e820/api.h>
 #include <asm/irqdomain.h>
 #include <asm/pci_x86.h>
···
 {
 #if defined(CONFIG_X86_LOCAL_APIC) && !defined(CONFIG_X86_MPPARSE)
 	/* mptable code is not built-in*/
+
+	/*
+	 * Xen disables ACPI in PV DomU guests but it still emulates APIC and
+	 * supports SMP. Returning early here ensures that APIC is not disabled
+	 * unnecessarily and the guest is not limited to a single vCPU.
+	 */
+	if (xen_pv_domain() && !xen_initial_domain())
+		return 0;
+
 	if (acpi_disabled || acpi_noirq) {
 		pr_warn("MPS support code is not built-in, using acpi=off or acpi=noirq or pci=noacpi may have problem\n");
 		return 1;
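The ordering of the two checks in the boot.c hunk above is what matters: the Xen PV DomU test has to come before the acpi_disabled test. A minimal userspace sketch of that decision follows. It makes several assumptions: struct boot_env and mps_check_model() are hypothetical names that do not exist in the kernel, the final "return 0" fall-through is not visible in the hunk, and the reading of a return value of 1 as "the caller disables the APIC" is inferred from the comment in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for boot-time state; not a kernel type. */
struct boot_env {
	bool xen_pv_domain;      /* models xen_pv_domain() */
	bool xen_initial_domain; /* models xen_initial_domain(), i.e. Dom0 */
	bool acpi_disabled;
	bool acpi_noirq;
};

/*
 * Models the check order from the hunk. Returns 1 when the missing MPS
 * support is treated as a problem, 0 otherwise (the trailing 0 is an
 * assumption; the hunk is cut off before the end of the function).
 */
static int mps_check_model(const struct boot_env *env)
{
	/* PV DomU: ACPI is off, but APIC is emulated, so do not complain. */
	if (env->xen_pv_domain && !env->xen_initial_domain)
		return 0;

	if (env->acpi_disabled || env->acpi_noirq)
		return 1;

	return 0;
}
```

With the checks in the opposite order, a PV DomU guest (which always has ACPI disabled) would take the "return 1" path, which matches the single-CPU regression described in the commit message.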
+1
arch/x86/kernel/cpu/amd.c
···
 static const struct x86_cpu_id erratum_1386_microcode[] = {
 	X86_MATCH_VFM_STEPS(VFM_MAKE(X86_VENDOR_AMD, 0x17, 0x01), 0x2, 0x2, 0x0800126e),
 	X86_MATCH_VFM_STEPS(VFM_MAKE(X86_VENDOR_AMD, 0x17, 0x31), 0x0, 0x0, 0x08301052),
+	{}
 };

 static void fix_erratum_1386(struct cpuinfo_x86 *c)
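The one-line amd.c fix above adds the empty terminator that a sentinel-based table walk depends on. The sketch below illustrates why, under simplifying assumptions: struct cpu_id and match_cpu() are hypothetical and much smaller than the kernel's struct x86_cpu_id and its matcher, and the sentinel is written as `{ 0 }` rather than `{}` so the example compiles as standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry; the kernel's struct x86_cpu_id has more fields. */
struct cpu_id {
	uint16_t family;
	uint16_t model;
	uint32_t min_microcode;
};

/*
 * Walks the table until it reaches an all-zero entry. There is no length
 * argument, so the all-zero entry is the only thing that stops the loop.
 */
static const struct cpu_id *match_cpu(const struct cpu_id *table,
				      uint16_t family, uint16_t model)
{
	for (const struct cpu_id *id = table;
	     id->family || id->model || id->min_microcode; id++) {
		if (id->family == family && id->model == model)
			return id;
	}
	return NULL;
}

static const struct cpu_id erratum_table[] = {
	{ 0x17, 0x01, 0x0800126e },
	{ 0x17, 0x31, 0x08301052 },
	{ 0 },	/* sentinel: without it, a lookup miss reads past the array */
};
```

A lookup that matches returns before reaching the end, so a missing terminator only shows up on CPUs that are not in the table, which is why such bugs can go unnoticed for a while.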
+34 -73
arch/x86/kernel/cpu/bugs.c
··· 59 59 EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current); 60 60 61 61 u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB; 62 - EXPORT_SYMBOL_GPL(x86_pred_cmd); 63 62 64 63 static u64 __ro_after_init x86_arch_cap_msr; 65 64 ··· 1141 1142 setup_clear_cpu_cap(X86_FEATURE_RETHUNK); 1142 1143 1143 1144 /* 1144 - * There is no need for RSB filling: entry_ibpb() ensures 1145 + * There is no need for RSB filling: write_ibpb() ensures 1145 1146 * all predictions, including the RSB, are invalidated, 1146 1147 * regardless of IBPB implementation. 1147 1148 */ ··· 1591 1592 rrsba_disabled = true; 1592 1593 } 1593 1594 1594 - static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode) 1595 + static void __init spectre_v2_select_rsb_mitigation(enum spectre_v2_mitigation mode) 1595 1596 { 1596 1597 /* 1597 - * Similar to context switches, there are two types of RSB attacks 1598 - * after VM exit: 1598 + * WARNING! There are many subtleties to consider when changing *any* 1599 + * code related to RSB-related mitigations. Before doing so, carefully 1600 + * read the following document, and update if necessary: 1599 1601 * 1600 - * 1) RSB underflow 1602 + * Documentation/admin-guide/hw-vuln/rsb.rst 1601 1603 * 1602 - * 2) Poisoned RSB entry 1604 + * In an overly simplified nutshell: 1603 1605 * 1604 - * When retpoline is enabled, both are mitigated by filling/clearing 1605 - * the RSB. 1606 + * - User->user RSB attacks are conditionally mitigated during 1607 + * context switches by cond_mitigation -> write_ibpb(). 1606 1608 * 1607 - * When IBRS is enabled, while #1 would be mitigated by the IBRS branch 1608 - * prediction isolation protections, RSB still needs to be cleared 1609 - * because of #2. Note that SMEP provides no protection here, unlike 1610 - * user-space-poisoned RSB entries. 1609 + * - User->kernel and guest->host attacks are mitigated by eIBRS or 1610 + * RSB filling. 
 	 *
-	 * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB
-	 * bug is present then a LITE version of RSB protection is required,
-	 * just a single call needs to retire before a RET is executed.
+	 * Though, depending on config, note that other alternative
+	 * mitigations may end up getting used instead, e.g., IBPB on
+	 * entry/vmexit, call depth tracking, or return thunks.
 	 */
+
 	switch (mode) {
 	case SPECTRE_V2_NONE:
-		return;
+		break;
 
-	case SPECTRE_V2_EIBRS_LFENCE:
 	case SPECTRE_V2_EIBRS:
-		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
-			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
-			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
-		}
-		return;
-
+	case SPECTRE_V2_EIBRS_LFENCE:
 	case SPECTRE_V2_EIBRS_RETPOLINE:
+		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
+			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
+			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
+		}
+		break;
+
 	case SPECTRE_V2_RETPOLINE:
 	case SPECTRE_V2_LFENCE:
 	case SPECTRE_V2_IBRS:
+		pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n");
+		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
 		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
-		pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
-		return;
-	}
+		break;
 
-	pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation at VM exit");
-	dump_stack();
+	default:
+		pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation\n");
+		dump_stack();
+		break;
+	}
 }
 
 /*
@@ -1830,48 +1832,7 @@
 	spectre_v2_enabled = mode;
 	pr_info("%s\n", spectre_v2_strings[mode]);
 
-	/*
-	 * If Spectre v2 protection has been enabled, fill the RSB during a
-	 * context switch. In general there are two types of RSB attacks
-	 * across context switches, for which the CALLs/RETs may be unbalanced.
-	 *
-	 * 1) RSB underflow
-	 *
-	 * Some Intel parts have "bottomless RSB". When the RSB is empty,
-	 * speculated return targets may come from the branch predictor,
-	 * which could have a user-poisoned BTB or BHB entry.
-	 *
-	 * AMD has it even worse: *all* returns are speculated from the BTB,
-	 * regardless of the state of the RSB.
-	 *
-	 * When IBRS or eIBRS is enabled, the "user -> kernel" attack
-	 * scenario is mitigated by the IBRS branch prediction isolation
-	 * properties, so the RSB buffer filling wouldn't be necessary to
-	 * protect against this type of attack.
-	 *
-	 * The "user -> user" attack scenario is mitigated by RSB filling.
-	 *
-	 * 2) Poisoned RSB entry
-	 *
-	 * If the 'next' in-kernel return stack is shorter than 'prev',
-	 * 'next' could be tricked into speculating with a user-poisoned RSB
-	 * entry.
-	 *
-	 * The "user -> kernel" attack scenario is mitigated by SMEP and
-	 * eIBRS.
-	 *
-	 * The "user -> user" scenario, also known as SpectreBHB, requires
-	 * RSB clearing.
-	 *
-	 * So to mitigate all cases, unconditionally fill RSB on context
-	 * switches.
-	 *
-	 * FIXME: Is this pointless for retbleed-affected AMD?
-	 */
-	setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-	pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
-
-	spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+	spectre_v2_select_rsb_mitigation(mode);
 
 	/*
 	 * Retpoline protects the kernel, but doesn't protect firmware. IBRS
@@ -2676,7 +2637,7 @@
 		setup_clear_cpu_cap(X86_FEATURE_RETHUNK);
 
 		/*
-		 * There is no need for RSB filling: entry_ibpb() ensures
+		 * There is no need for RSB filling: write_ibpb() ensures
 		 * all predictions, including the RSB, are invalidated,
 		 * regardless of IBPB implementation.
 		 */
@@ -2701,7 +2662,7 @@
 			srso_mitigation = SRSO_MITIGATION_IBPB_ON_VMEXIT;
 
 			/*
-			 * There is no need for RSB filling: entry_ibpb() ensures
+			 * There is no need for RSB filling: write_ibpb() ensures
 			 * all predictions, including the RSB, are invalidated,
 			 * regardless of IBPB implementation.
 			 */
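The restructured switch in the first bugs.c hunk maps each Spectre v2 mode to a set of RSB-related feature flags. A minimal user-space sketch of that mapping follows. The enum, the flag macros, and select_rsb_caps() are hypothetical stand-ins for the kernel's SPECTRE_V2_* modes, X86_FEATURE_RSB_* bits, and setup_force_cpu_cap() calls; the logging and the dump_stack() call in the default case are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's SPECTRE_V2_* modes. */
enum v2_mode {
	V2_NONE,
	V2_RETPOLINE,
	V2_LFENCE,
	V2_EIBRS,
	V2_EIBRS_RETPOLINE,
	V2_EIBRS_LFENCE,
	V2_IBRS,
};

/* Hypothetical flag bits standing in for X86_FEATURE_RSB_*. */
#define RSB_CTXSW	0x1u
#define RSB_VMEXIT	0x2u
#define RSB_VMEXIT_LITE	0x4u

/* Return the RSB flags the new switch would force for a given mode. */
static unsigned int select_rsb_caps(enum v2_mode mode, bool has_pbrsb_bug)
{
	switch (mode) {
	case V2_NONE:
		return 0;

	case V2_EIBRS:
	case V2_EIBRS_LFENCE:
	case V2_EIBRS_RETPOLINE:
		/* All eIBRS variants: only the LITE VMEXIT sequence, and only with the PBRSB bug. */
		return has_pbrsb_bug ? RSB_VMEXIT_LITE : 0;

	case V2_RETPOLINE:
	case V2_LFENCE:
	case V2_IBRS:
		/* Non-eIBRS modes: fill on both context switch and VMEXIT. */
		return RSB_CTXSW | RSB_VMEXIT;

	default:
		/* Unknown mode: no RSB mitigation selected. */
		return 0;
	}
}
```

In this sketch, select_rsb_caps(V2_EIBRS_RETPOLINE, true) yields only RSB_VMEXIT_LITE, while select_rsb_caps(V2_RETPOLINE, false) yields RSB_CTXSW | RSB_VMEXIT.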
+27 -21
arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3553,6 +3553,22 @@
 	free_rmid(rgrp->closid, rgrp->mon.rmid);
 }
 
+/*
+ * We allow creating mon groups only with in a directory called "mon_groups"
+ * which is present in every ctrl_mon group. Check if this is a valid
+ * "mon_groups" directory.
+ *
+ * 1. The directory should be named "mon_groups".
+ * 2. The mon group itself should "not" be named "mon_groups".
+ *   This makes sure "mon_groups" directory always has a ctrl_mon group
+ *   as parent.
+ */
+static bool is_mon_groups(struct kernfs_node *kn, const char *name)
+{
+	return (!strcmp(rdt_kn_name(kn), "mon_groups") &&
+		strcmp(name, "mon_groups"));
+}
+
 static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 			     const char *name, umode_t mode,
 			     enum rdt_group_type rtype, struct rdtgroup **r)
@@ -3565,6 +3581,15 @@
 	prdtgrp = rdtgroup_kn_lock_live(parent_kn);
 	if (!prdtgrp) {
 		ret = -ENODEV;
+		goto out_unlock;
+	}
+
+	/*
+	 * Check that the parent directory for a monitor group is a "mon_groups"
+	 * directory.
+	 */
+	if (rtype == RDTMON_GROUP && !is_mon_groups(parent_kn, name)) {
+		ret = -EPERM;
 		goto out_unlock;
 	}
 
@@ -3751,22 +3776,6 @@
 	return ret;
 }
 
-/*
- * We allow creating mon groups only with in a directory called "mon_groups"
- * which is present in every ctrl_mon group. Check if this is a valid
- * "mon_groups" directory.
- *
- * 1. The directory should be named "mon_groups".
- * 2. The mon group itself should "not" be named "mon_groups".
- *   This makes sure "mon_groups" directory always has a ctrl_mon group
- *   as parent.
- */
-static bool is_mon_groups(struct kernfs_node *kn, const char *name)
-{
-	return (!strcmp(rdt_kn_name(kn), "mon_groups") &&
-		strcmp(name, "mon_groups"));
-}
-
 static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 			  umode_t mode)
 {
@@ -3782,11 +3791,8 @@
 	if (resctrl_arch_alloc_capable() && parent_kn == rdtgroup_default.kn)
 		return rdtgroup_mkdir_ctrl_mon(parent_kn, name, mode);
 
-	/*
-	 * If RDT monitoring is supported and the parent directory is a valid
-	 * "mon_groups" directory, add a monitoring subdirectory.
-	 */
-	if (resctrl_arch_mon_capable() && is_mon_groups(parent_kn, name))
+	/* Else, attempt to add a monitoring subdirectory. */
+	if (resctrl_arch_mon_capable())
 		return rdtgroup_mkdir_mon(parent_kn, name, mode);
 
 	return -EPERM;
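The rdtgroup.c change moves the "mon_groups" name check out of rdtgroup_mkdir() and into mkdir_rdt_prepare(), after rdtgroup_kn_lock_live() has succeeded. The naming rule itself can be sketched in user space as below. This is an illustration under stated assumptions: the parent directory is represented by a plain string instead of a kernfs node, enum group_type and check_mkdir() are hypothetical names, and locking is not modelled at all.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's enum rdt_group_type. */
enum group_type { CTRL_GROUP, MON_GROUP };

/*
 * Same rule as is_mon_groups() in the diff, but taking the parent's name
 * as a string (assumption) rather than looking it up from a kernfs node.
 */
static bool is_mon_groups(const char *parent_name, const char *name)
{
	return !strcmp(parent_name, "mon_groups") &&
	       strcmp(name, "mon_groups");
}

/* Hypothetical helper: returns 0 if the mkdir may proceed, -EPERM otherwise. */
static int check_mkdir(enum group_type rtype, const char *parent_name,
		       const char *name)
{
	/* Only monitor groups are restricted to a "mon_groups" parent. */
	if (rtype == MON_GROUP && !is_mon_groups(parent_name, name))
		return -EPERM;

	return 0;
}
```

With this sketch, creating "m1" under "mon_groups" is accepted, while creating a monitor group that is itself named "mon_groups", or creating one under any other parent, returns -EPERM.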
+8 -9
arch/x86/kernel/e820.c
@@ -753,22 +753,21 @@
 void __init e820__register_nosave_regions(unsigned long limit_pfn)
 {
 	int i;
-	unsigned long pfn = 0;
+	u64 last_addr = 0;
 
 	for (i = 0; i < e820_table->nr_entries; i++) {
 		struct e820_entry *entry = &e820_table->entries[i];
 
-		if (pfn < PFN_UP(entry->addr))
-			register_nosave_region(pfn, PFN_UP(entry->addr));
-
-		pfn = PFN_DOWN(entry->addr + entry->size);
-
 		if (entry->type != E820_TYPE_RAM)
-			register_nosave_region(PFN_UP(entry->addr), pfn);
+			continue;
 
-		if (pfn >= limit_pfn)
-			break;
+		if (last_addr < entry->addr)
+			register_nosave_region(PFN_DOWN(last_addr), PFN_UP(entry->addr));
+
+		last_addr = entry->addr + entry->size;
 	}
+
+	register_nosave_region(PFN_DOWN(last_addr), limit_pfn);
 }
 
 #ifdef CONFIG_ACPI
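The rewritten loop tracks the end address of the last RAM entry and treats every gap before the next RAM entry as a nosave range, rounding the start down and the end up to page boundaries. A stand-alone sketch of that calculation is given below. It rests on several assumptions: 4 KiB pages, a table sorted by address, and hypothetical names (struct entry, struct pfn_range, collect_nosave(), add_range()). Ranges are collected into an array instead of being passed to register_nosave_region(), and empty ranges are dropped by the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define PFN_DOWN(x)	((uint64_t)(x) >> PAGE_SHIFT)
#define PFN_UP(x)	(((uint64_t)(x) + (1ULL << PAGE_SHIFT) - 1) >> PAGE_SHIFT)

/* Hypothetical stand-ins for E820_TYPE_* and struct e820_entry. */
enum { TYPE_RAM = 1, TYPE_RESERVED = 2 };

struct entry {
	uint64_t addr;
	uint64_t size;
	int type;
};

struct pfn_range {
	uint64_t start;	/* first nosave PFN */
	uint64_t end;	/* one past the last nosave PFN */
};

/* Record [start, end) unless it is empty or the output array is full. */
static size_t add_range(struct pfn_range *out, size_t count, size_t max,
			uint64_t start, uint64_t end)
{
	if (start >= end || count >= max)
		return count;

	out[count].start = start;
	out[count].end = end;
	return count + 1;
}

/* Mirror of the new loop: gaps between RAM entries become nosave ranges. */
static size_t collect_nosave(const struct entry *tbl, size_t n,
			     uint64_t limit_pfn,
			     struct pfn_range *out, size_t max)
{
	uint64_t last_addr = 0;
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].type != TYPE_RAM)
			continue;

		/* A partially covered page on either side of the gap is included. */
		if (last_addr < tbl[i].addr)
			count = add_range(out, count, max,
					  PFN_DOWN(last_addr), PFN_UP(tbl[i].addr));

		last_addr = tbl[i].addr + tbl[i].size;
	}

	/* Everything after the last RAM entry, up to the limit. */
	return add_range(out, count, max, PFN_DOWN(last_addr), limit_pfn);
}
```

For a table with RAM at 0x0 to 0x9f800, a reserved entry after it, and RAM at 0x100000 to 0x200000, a limit of PFN 0x300 gives two ranges in this sketch: PFNs 0x9f to 0x100 (including the partially used page 0x9f) and PFNs 0x200 to 0x300.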
+5 -5
arch/x86/kernel/early_printk.c
@@ -389,10 +389,10 @@
 	keep = (strstr(buf, "keep") != NULL);
 
 	while (*buf != '\0') {
-		if (!strncmp(buf, "mmio", 4)) {
-			early_mmio_serial_init(buf + 4);
+		if (!strncmp(buf, "mmio32", 6)) {
+			buf += 6;
+			early_mmio_serial_init(buf);
 			early_console_register(&early_serial_console, keep);
-			buf += 4;
 		}
 		if (!strncmp(buf, "serial", 6)) {
 			buf += 6;
@@ -407,9 +407,9 @@
 		}
 #ifdef CONFIG_PCI
 		if (!strncmp(buf, "pciserial", 9)) {
-			early_pci_serial_init(buf + 9);
+			buf += 9; /* Keep from match the above "pciserial" */
+			early_pci_serial_init(buf);
 			early_console_register(&early_serial_console, keep);
-			buf += 9; /* Keep from match the above "serial" */
 		}
 #endif
 		if (!strncmp(buf, "vga", 3) &&
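The early_printk.c hunks keep the same scanning pattern but now advance buf past the keyword before calling the init helper. The pattern can be sketched in user space as follows. The sketch is only modelled on the shape of the loop: struct hits and scan() are hypothetical, the init and register calls are replaced by counters, and the end-of-string guard before the final increment is an addition of the sketch.

```c
#include <string.h>

/* Hypothetical counters replacing the kernel's init/register calls. */
struct hits {
	int mmio32;
	int serial;
	int pciserial;
};

/*
 * Test each keyword at the current position; on a match, skip over the
 * whole keyword so that none of its characters are examined again.
 */
static struct hits scan(const char *buf)
{
	struct hits h = { 0, 0, 0 };

	while (*buf != '\0') {
		if (!strncmp(buf, "mmio32", 6)) {
			buf += 6;
			h.mmio32++;
		}
		if (!strncmp(buf, "serial", 6)) {
			buf += 6;
			h.serial++;
		}
		if (!strncmp(buf, "pciserial", 9)) {
			/* Without this skip, the tail "serial" would match on a later pass. */
			buf += 9;
			h.pciserial++;
		}
		/* Sketch-only guard: do not step past the terminator. */
		if (*buf != '\0')
			buf++;
	}

	return h;
}
```

In this sketch, scan("pciserial,force") counts one pciserial hit and no serial hit, because the nine-character skip moves the cursor past the embedded "serial".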
+3 -3
arch/x86/mm/tlb.c
@@ -667,9 +667,9 @@
 	prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
 
 	/*
-	 * Avoid user/user BTB poisoning by flushing the branch predictor
-	 * when switching between processes. This stops one process from
-	 * doing Spectre-v2 attacks on another.
+	 * Avoid user->user BTB/RSB poisoning by flushing them when switching
+	 * between processes. This stops one process from doing Spectre-v2
+	 * attacks on another.
 	 *
 	 * Both, the conditional and the always IBPB mode use the mm
 	 * pointer to avoid the IBPB when switching between tasks of the
+2 -2
arch/x86/power/hibernate_asm_64.S
@@ -26,7 +26,7 @@
 	/* code below belongs to the image kernel */
 	.align PAGE_SIZE
 SYM_FUNC_START(restore_registers)
-	ANNOTATE_NOENDBR
+	ENDBR
 	/* go back to the original page tables */
 	movq %r9, %cr3
 
@@ -120,7 +120,7 @@
 
 	/* code below has been relocated to a safe page */
 SYM_FUNC_START(core_restore_code)
-	ANNOTATE_NOENDBR
+	ENDBR
 	/* switch to temporary page tables */
 	movq %rax, %cr3
 	/* flush TLB */