Linux kernel mirror (for testing) - git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 pti updates from Thomas Gleixner:
"This contains:

- a PTI bugfix to avoid setting reserved CR3 bits when PCID is
disabled. This seems to cause issues on a virtual machine at least
and is incorrect according to the AMD manual.

- a PTI bugfix which disables the perf BTS facility if PTI is
enabled. The BTS AUX buffer is not globally visible and causes the
CPU to fault when the mapping disappears on switching CR3 to user
space. A full fix which restores BTS on PTI is non-trivial and will
be worked on.

- PTI bugfixes for EFI and trusted boot which make sure that the user
space visible page table entries have the NX bit cleared

- removal of dead code in the PTI pagetable setup functions

- add PTI documentation

- add a selftest for vsyscall to verify that the kernel actually
implements what it advertises.

- a sysfs interface to expose vulnerability and mitigation
information so there is a coherent way for users to retrieve the
status.

- the initial spectre_v2 mitigations, aka retpoline:

+ The necessary ASM thunk and compiler support

+ The ASM variants of retpoline and the conversion of affected ASM
code

+ Make LFENCE serializing on AMD so it can be used as speculation
trap

+ The RSB fill after vmexit

- initial objtool support for retpoline

As I said in the status mail, this is most of the set of patches that
should go into 4.15, except for two straightforward patches still on
hold:

- the LFENCE retpoline add-on, which is waiting for ACKs

- the RSB fill after context switch

Both should be ready to go early next week, and with that we'll have
covered the major holes of spectre_v2 and can go back to normality"
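
As context for the retpoline entries below: a retpoline replaces an
indirect branch ("jmp *%reg" or "call *%reg") with a call/ret construct
whose speculative return path is captured in a harmless pause/jmp loop,
so the indirect branch predictor never gets to steer speculation. A
minimal, hypothetical user-space sketch of the trick (x86-64, GCC
syntax; the thunk name is made up) - the kernel's real macros are in
the asm/nospec-branch.h hunk further down:

    /* retpoline_sketch.c - illustration only, not the kernel's code.
     * Build (assumption): gcc -O2 -o retpoline_sketch retpoline_sketch.c
     */
    #include <stdio.h>

    __asm__(
    "   .text\n"
    "demo_thunk_rax:\n"          /* jump to the address in %rax...     */
    "   call 1f\n"               /* ...by pushing a return address     */
    "0: pause\n"                 /* speculation trap: predicted 'ret'  */
    "   jmp 0b\n"                /*   lands here and spins harmlessly  */
    "1: mov %rax, (%rsp)\n"      /* replace return address with target */
    "   ret\n"                   /* architecturally jumps to *%rax     */
    );

    static void hello(void)
    {
        puts("reached via retpoline");
    }

    int main(void)
    {
        /* Instead of "call *%rax", call the thunk with the target in
         * %rax. (A real build might want -mno-red-zone; sketch only.) */
        __asm__ volatile("call demo_thunk_rax"
                         : : "a" (hello)
                         : "rcx", "rdx", "rsi", "rdi", "r8", "r9",
                           "r10", "r11", "cc", "memory");
        return 0;
    }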

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
x86,perf: Disable intel_bts when PTI
security/Kconfig: Correct the Documentation reference for PTI
x86/pti: Fix !PCID and sanitize defines
selftests/x86: Add test_vsyscall
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline: Add initial retpoline support
objtool: Allow alternatives to be ignored
objtool: Detect jumps to retpoline thunks
x86/pti: Make unpoison of pgd for trusted boot work for real
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
...

+1525 -100
+16
Documentation/ABI/testing/sysfs-devices-system-cpu
···
  Description:	information about CPUs heterogeneity.
  
  		cpu_capacity: capacity of cpu#.
+ 
+ What:		/sys/devices/system/cpu/vulnerabilities
+ 		/sys/devices/system/cpu/vulnerabilities/meltdown
+ 		/sys/devices/system/cpu/vulnerabilities/spectre_v1
+ 		/sys/devices/system/cpu/vulnerabilities/spectre_v2
+ Date:		January 2018
+ Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+ Description:	Information about CPU vulnerabilities
+ 
+ 		The files are named after the code names of CPU
+ 		vulnerabilities. The output of those files reflects the
+ 		state of the CPUs in the system. Possible output values:
+ 
+ 		"Not affected"	  CPU is not affected by the vulnerability
+ 		"Vulnerable"	  CPU is affected and no mitigation in effect
+ 		"Mitigation: $M"  CPU is affected and mitigation $M is in effect
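
The new files are plain text, one status line each, so they are trivial
to consume from user space. A hedged sketch (paths from the ABI entry
above; everything else is illustrative):

    /* vuln_status.c - dump the three vulnerability files. */
    #include <stdio.h>

    int main(void)
    {
        static const char *names[] = { "meltdown", "spectre_v1", "spectre_v2" };
        char path[96], line[128];

        for (int i = 0; i < 3; i++) {
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/vulnerabilities/%s", names[i]);
            FILE *f = fopen(path, "r");

            if (f && fgets(line, sizeof(line), f))
                printf("%-10s %s", names[i], line); /* e.g. "Mitigation: PTI" */
            if (f)
                fclose(f);
        }
        return 0;
    }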
+42 -7
Documentation/admin-guide/kernel-parameters.txt
···
  	nosmt		[KNL,S390] Disable symmetric multithreading (SMT).
  			Equivalent to smt=1.
  
+ 	nospectre_v2	[X86] Disable all mitigations for the Spectre variant 2
+ 			(indirect branch prediction) vulnerability. System may
+ 			allow data leaks with this option, which is equivalent
+ 			to spectre_v2=off.
+ 
  	noxsave		[BUGS=X86] Disables x86 extended register state save
  			and restore using xsave. The kernel will fallback to
  			enabling legacy floating-point and sse state.
···
  	no-steal-acc	[X86,KVM] Disable paravirtualized steal time accounting.
  			steal time is computed, but won't influence scheduler
  			behaviour
- 
- 	nopti		[X86-64] Disable kernel page table isolation
  
  	nolapic		[X86-32,APIC] Do not enable or use the local APIC.
···
  	pt.		[PARIDE]
  			See Documentation/blockdev/paride.txt.
  
- 	pti=		[X86_64]
- 			Control user/kernel address space isolation:
- 			on - enable
- 			off - disable
- 			auto - default setting
+ 	pti=		[X86_64] Control Page Table Isolation of user and
+ 			kernel address spaces.  Disabling this feature
+ 			removes hardening, but improves performance of
+ 			system calls and interrupts.
+ 
+ 			on   - unconditionally enable
+ 			off  - unconditionally disable
+ 			auto - kernel detects whether your CPU model is
+ 			       vulnerable to issues that PTI mitigates
+ 
+ 			Not specifying this option is equivalent to pti=auto.
+ 
+ 	nopti		[X86_64]
+ 			Equivalent to pti=off
  
  	pty.legacy_count=
  			[KNL] Number of legacy pty's. Overwrites compiled-in
···
  
  	sonypi.*=	[HW] Sony Programmable I/O Control Device driver
  			See Documentation/laptops/sonypi.txt
+ 
+ 	spectre_v2=	[X86] Control mitigation of Spectre variant 2
+ 			(indirect branch speculation) vulnerability.
+ 
+ 			on   - unconditionally enable
+ 			off  - unconditionally disable
+ 			auto - kernel detects whether your CPU model is
+ 			       vulnerable
+ 
+ 			Selecting 'on' will, and 'auto' may, choose a
+ 			mitigation method at run time according to the
+ 			CPU, the available microcode, the setting of the
+ 			CONFIG_RETPOLINE configuration option, and the
+ 			compiler with which the kernel was built.
+ 
+ 			Specific mitigations can also be selected manually:
+ 
+ 			retpoline	  - replace indirect branches
+ 			retpoline,generic - google's original retpoline
+ 			retpoline,amd	  - AMD-specific minimal thunk
+ 
+ 			Not specifying this option is equivalent to
+ 			spectre_v2=auto.
  
  	spia_io_base=	[HW,MTD]
  	spia_fio_base=
+186
Documentation/x86/pti.txt
···
+ Overview
+ ========
+ 
+ Page Table Isolation (pti, previously known as KAISER[1]) is a
+ countermeasure against attacks on the shared user/kernel address
+ space such as the "Meltdown" approach[2].
+ 
+ To mitigate this class of attacks, we create an independent set of
+ page tables for use only when running userspace applications.  When
+ the kernel is entered via syscalls, interrupts or exceptions, the
+ page tables are switched to the full "kernel" copy.  When the system
+ switches back to user mode, the user copy is used again.
+ 
+ The userspace page tables contain only a minimal amount of kernel
+ data: only what is needed to enter/exit the kernel such as the
+ entry/exit functions themselves and the interrupt descriptor table
+ (IDT).  There are a few strictly unnecessary things that get mapped
+ such as the first C function when entering an interrupt (see
+ comments in pti.c).
+ 
+ This approach helps to ensure that side-channel attacks leveraging
+ the paging structures do not function when PTI is enabled.  It can be
+ enabled by setting CONFIG_PAGE_TABLE_ISOLATION=y at compile time.
+ Once enabled at compile-time, it can be disabled at boot with the
+ 'nopti' or 'pti=' kernel parameters (see kernel-parameters.txt).
+ 
+ Page Table Management
+ =====================
+ 
+ When PTI is enabled, the kernel manages two sets of page tables.
+ The first set is very similar to the single set which is present in
+ kernels without PTI.  This includes a complete mapping of userspace
+ that the kernel can use for things like copy_to_user().
+ 
+ Although _complete_, the user portion of the kernel page tables is
+ crippled by setting the NX bit in the top level.  This ensures
+ that any missed kernel->user CR3 switch will immediately crash
+ userspace upon executing its first instruction.
+ 
+ The userspace page tables map only the kernel data needed to enter
+ and exit the kernel.  This data is entirely contained in the 'struct
+ cpu_entry_area' structure which is placed in the fixmap which gives
+ each CPU's copy of the area a compile-time-fixed virtual address.
+ 
+ For new userspace mappings, the kernel makes the entries in its
+ page tables like normal.  The only difference is when the kernel
+ makes entries in the top (PGD) level.  In addition to setting the
+ entry in the main kernel PGD, a copy of the entry is made in the
+ userspace page tables' PGD.
+ 
+ This sharing at the PGD level also inherently shares all the lower
+ layers of the page tables.  This leaves a single, shared set of
+ userspace page tables to manage.  One PTE to lock, one set of
+ accessed bits, dirty bits, etc...
+ 
+ Overhead
+ ========
+ 
+ Protection against side-channel attacks is important.  But,
+ this protection comes at a cost:
+ 
+ 1. Increased Memory Use
+   a. Each process now needs an order-1 PGD instead of order-0.
+      (Consumes an additional 4k per process).
+   b. The 'cpu_entry_area' structure must be 2MB in size and 2MB
+      aligned so that it can be mapped by setting a single PMD
+      entry.  This consumes nearly 2MB of RAM once the kernel
+      is decompressed, but no space in the kernel image itself.
+ 
+ 2. Runtime Cost
+   a. CR3 manipulation to switch between the page table copies
+      must be done at interrupt, syscall, and exception entry
+      and exit (it can be skipped when the kernel is interrupted,
+      though.)  Moves to CR3 are on the order of a hundred
+      cycles, and are required at every entry and exit.
+   b. A "trampoline" must be used for SYSCALL entry.  This
+      trampoline depends on a smaller set of resources than the
+      non-PTI SYSCALL entry code, so requires mapping fewer
+      things into the userspace page tables.  The downside is
+      that stacks must be switched at entry time.
+   c. Global pages are disabled for all kernel structures not
+      mapped into both kernel and userspace page tables.  This
+      feature of the MMU allows different processes to share TLB
+      entries mapping the kernel.  Losing the feature means more
+      TLB misses after a context switch.  The actual loss of
+      performance is very small, however, never exceeding 1%.
+   d. Process Context IDentifiers (PCID) is a CPU feature that
+      allows us to skip flushing the entire TLB when switching page
+      tables by setting a special bit in CR3 when the page tables
+      are changed.  This makes switching the page tables (at context
+      switch, or kernel entry/exit) cheaper.  But, on systems with
+      PCID support, the context switch code must flush both the user
+      and kernel entries out of the TLB.  The user PCID TLB flush is
+      deferred until the exit to userspace, minimizing the cost.
+      See intel.com/sdm for the gory PCID/INVPCID details.
+   e. The userspace page tables must be populated for each new
+      process.  Even without PTI, the shared kernel mappings
+      are created by copying top-level (PGD) entries into each
+      new process.  But, with PTI, there are now *two* kernel
+      mappings: one in the kernel page tables that maps everything
+      and one for the entry/exit structures.  At fork(), we need to
+      copy both.
+   f. In addition to the fork()-time copying, there must also
+      be an update to the userspace PGD any time a set_pgd() is done
+      on a PGD used to map userspace.  This ensures that the kernel
+      and userspace copies always map the same userspace
+      memory.
+   g. On systems without PCID support, each CR3 write flushes
+      the entire TLB.  That means that each syscall, interrupt
+      or exception flushes the TLB.
+   h. INVPCID is a TLB-flushing instruction which allows flushing
+      of TLB entries for non-current PCIDs.  Some systems support
+      PCIDs, but do not support INVPCID.  On these systems, addresses
+      can only be flushed from the TLB for the current PCID.  When
+      flushing a kernel address, we need to flush all PCIDs, so a
+      single kernel address flush will require a TLB-flushing CR3
+      write upon the next use of every PCID.
+ 
+ Possible Future Work
+ ====================
+ 1. We can be more careful about not actually writing to CR3
+    unless its value is actually changed.
+ 2. Allow PTI to be enabled/disabled at runtime in addition to the
+    boot-time switching.
+ 
+ Testing
+ ========
+ 
+ To test stability of PTI, the following test procedure is recommended,
+ ideally doing all of these in parallel:
+ 
+ 1. Set CONFIG_DEBUG_ENTRY=y
+ 2. Run several copies of all of the tools/testing/selftests/x86/ tests
+    (excluding MPX and protection_keys) in a loop on multiple CPUs for
+    several minutes.  These tests frequently uncover corner cases in the
+    kernel entry code.  In general, old kernels might cause these tests
+    themselves to crash, but they should never crash the kernel.
+ 3. Run the 'perf' tool in a mode (top or record) that generates many
+    frequent performance monitoring non-maskable interrupts (see "NMI"
+    in /proc/interrupts).  This exercises the NMI entry/exit code which
+    is known to trigger bugs in code paths that did not expect to be
+    interrupted, including nested NMIs.  Using "-c" boosts the rate of
+    NMIs, and using two -c with separate counters encourages nested NMIs
+    and less deterministic behavior.
+ 
+    while true; do perf record -c 10000 -e instructions,cycles -a sleep 10; done
+ 
+ 4. Launch a KVM virtual machine.
+ 5. Run 32-bit binaries on systems supporting the SYSCALL instruction.
+    This has been a lightly-tested code path and needs extra scrutiny.
+ 
+ Debugging
+ =========
+ 
+ Bugs in PTI cause a few different signatures of crashes
+ that are worth noting here.
+ 
+  * Failures of the selftests/x86 code.  Usually a bug in one of the
+    more obscure corners of entry_64.S
+  * Crashes in early boot, especially around CPU bringup.  Bugs
+    in the trampoline code or mappings cause these.
+  * Crashes at the first interrupt.  Caused by bugs in entry_64.S,
+    like screwing up a page table switch.  Also caused by
+    incorrectly mapping the IRQ handler entry code.
+  * Crashes at the first NMI.  The NMI code is separate from main
+    interrupt handlers and can have bugs that do not affect
+    normal interrupts.  Also caused by incorrectly mapping NMI
+    code.  NMIs that interrupt the entry code must be very
+    careful and can be the cause of crashes that show up when
+    running perf.
+  * Kernel crashes at the first exit to userspace.  entry_64.S
+    bugs, or failing to map some of the exit code.
+  * Crashes at first interrupt that interrupts userspace.  The paths
+    in entry_64.S that return to userspace are sometimes separate
+    from the ones that return to the kernel.
+  * Double faults: overflowing the kernel stack because of page
+    faults upon page faults.  Caused by touching non-pti-mapped
+    data in the entry code, or forgetting to switch to kernel
+    CR3 before calling into C functions which are not pti-mapped.
+  * Userspace segfaults early in boot, sometimes manifesting
+    as mount(8) failing to mount the rootfs.  These have
+    tended to be TLB invalidation issues.  Usually invalidating
+    the wrong PCID, or otherwise missing an invalidation.
+ 
+ 1. https://gruss.cc/files/kaiser.pdf
+ 2. https://meltdownattack.com/meltdown.pdf
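
Whether PTI actually ended up enabled on a given boot can also be
checked from user space: the X86_FEATURE_PTI bit (see the cpufeatures.h
hunk below) surfaces as a "pti" entry in the /proc/cpuinfo flags line.
A hedged sketch of that check (the flag name is assumed from the
feature definition):

    /* pti_check.c - report whether the running kernel enabled PTI. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/cpuinfo", "r");
        char line[4096];
        int found = 0;

        if (!f)
            return 1;
        while (fgets(line, sizeof(line), f)) {
            /* flags are space-separated; " pti " avoids substring hits */
            if (!strncmp(line, "flags", 5) &&
                (strstr(line, " pti ") || strstr(line, " pti\n"))) {
                found = 1;
                break;
            }
        }
        fclose(f);
        printf("PTI %s\n", found ? "enabled" : "not reported");
        return 0;
    }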
+14
arch/x86/Kconfig
···
  	select GENERIC_CLOCKEVENTS_MIN_ADJUST
  	select GENERIC_CMOS_UPDATE
  	select GENERIC_CPU_AUTOPROBE
+ 	select GENERIC_CPU_VULNERABILITIES
  	select GENERIC_EARLY_IOREMAP
  	select GENERIC_FIND_FIRST_BIT
  	select GENERIC_IOMAP
···
  config GOLDFISH
  	def_bool y
  	depends on X86_GOLDFISH
+ 
+ config RETPOLINE
+ 	bool "Avoid speculative indirect branches in kernel"
+ 	default y
+ 	help
+ 	  Compile kernel with the retpoline compiler options to guard against
+ 	  kernel-to-user data leaks by avoiding speculative indirect
+ 	  branches. Requires a compiler with -mindirect-branch=thunk-extern
+ 	  support for full protection. The kernel may run slower.
+ 
+ 	  Without compiler support, at least indirect branches in assembler
+ 	  code are eliminated. Since this includes the syscall entry path,
+ 	  it is not entirely pointless.
  
  config INTEL_RDT
  	bool "Intel Resource Director Technology support"
+10
arch/x86/Makefile
···
  #
  KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
  
+ # Avoid indirect branches in kernel to deal with Spectre
+ ifdef CONFIG_RETPOLINE
+     RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
+     ifneq ($(RETPOLINE_CFLAGS),)
+         KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
+     else
+         $(warning CONFIG_RETPOLINE=y, but not supported by the compiler. Toolchain update recommended.)
+     endif
+ endif
+ 
  archscripts: scripts_basic
  	$(Q)$(MAKE) $(build)=arch/x86/tools relocs
  
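
What those flags do to generated code can be seen on any small test
case. A hedged sketch (requires a retpoline-capable gcc, i.e. one that
accepts -mindirect-branch=thunk-extern; the register gcc picks may
vary):

    /* indirect.c - compile twice and compare the assembly:
     *
     *   gcc -O2 -S indirect.c
     *     -> the call is emitted as "jmp *%rax" / "call *%rax"
     *   gcc -O2 -S -mindirect-branch=thunk-extern \
     *       -mindirect-branch-register indirect.c
     *     -> it becomes "jmp __x86_indirect_thunk_rax", and the thunk
     *        symbol must be provided externally - in the kernel, by
     *        arch/x86/lib/retpoline.S further down.
     */
    void (*handler)(void);

    void fire(void)
    {
        handler();  /* the indirect branch the compiler rewrites */
    }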
+3 -2
arch/x86/crypto/aesni-intel_asm.S
···
  #include <linux/linkage.h>
  #include <asm/inst.h>
  #include <asm/frame.h>
+ #include <asm/nospec-branch.h>
  
  /*
   * The following macros are used to move an (un)aligned 16 byte value to/from
···
  	pxor INC, STATE4
  	movdqu IV, 0x30(OUTP)
  
- 	call *%r11
+ 	CALL_NOSPEC %r11
  
  	movdqu 0x00(OUTP), INC
  	pxor INC, STATE1
···
  	_aesni_gf128mul_x_ble()
  	movups IV, (IVP)
  
- 	call *%r11
+ 	CALL_NOSPEC %r11
  
  	movdqu 0x40(OUTP), INC
  	pxor INC, STATE1
+2 -1
arch/x86/crypto/camellia-aesni-avx-asm_64.S
···
  
  #include <linux/linkage.h>
  #include <asm/frame.h>
+ #include <asm/nospec-branch.h>
  
  #define CAMELLIA_TABLE_BYTE_LEN 272
  
···
  	vpxor 14 * 16(%rax), %xmm15, %xmm14;
  	vpxor 15 * 16(%rax), %xmm15, %xmm15;
  
- 	call *%r9;
+ 	CALL_NOSPEC %r9;
  
  	addq $(16 * 16), %rsp;
  
+2 -1
arch/x86/crypto/camellia-aesni-avx2-asm_64.S
···
  
  #include <linux/linkage.h>
  #include <asm/frame.h>
+ #include <asm/nospec-branch.h>
  
  #define CAMELLIA_TABLE_BYTE_LEN 272
  
···
  	vpxor 14 * 32(%rax), %ymm15, %ymm14;
  	vpxor 15 * 32(%rax), %ymm15, %ymm15;
  
- 	call *%r9;
+ 	CALL_NOSPEC %r9;
  
  	addq $(16 * 32), %rsp;
  
+2 -1
arch/x86/crypto/crc32c-pcl-intel-asm_64.S
···
  
  #include <asm/inst.h>
  #include <linux/linkage.h>
+ #include <asm/nospec-branch.h>
  
  ## ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instruction
  
···
  	movzxw  (bufp, %rax, 2), len
  	lea	crc_array(%rip), bufp
  	lea     (bufp, len, 1), bufp
- 	jmp     *bufp
+ 	JMP_NOSPEC bufp
  
  ################################################################
  ## 2a) PROCESS FULL BLOCKS:
+19 -17
arch/x86/entry/calling.h
···
   * PAGE_TABLE_ISOLATION PGDs are 8k.  Flip bit 12 to switch between the two
   * halves:
   */
- #define PTI_SWITCH_PGTABLES_MASK	(1<<PAGE_SHIFT)
- #define PTI_SWITCH_MASK		(PTI_SWITCH_PGTABLES_MASK|(1<<X86_CR3_PTI_SWITCH_BIT))
+ #define PTI_USER_PGTABLE_BIT		PAGE_SHIFT
+ #define PTI_USER_PGTABLE_MASK		(1 << PTI_USER_PGTABLE_BIT)
+ #define PTI_USER_PCID_BIT		X86_CR3_PTI_PCID_USER_BIT
+ #define PTI_USER_PCID_MASK		(1 << PTI_USER_PCID_BIT)
+ #define PTI_USER_PGTABLE_AND_PCID_MASK  (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
  
  .macro SET_NOFLUSH_BIT reg:req
  	bts	$X86_CR3_PCID_NOFLUSH_BIT, \reg
···
  .macro ADJUST_KERNEL_CR3 reg:req
  	ALTERNATIVE "", "SET_NOFLUSH_BIT \reg", X86_FEATURE_PCID
  	/* Clear PCID and "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
- 	andq	$(~PTI_SWITCH_MASK), \reg
+ 	andq	$(~PTI_USER_PGTABLE_AND_PCID_MASK), \reg
  .endm
  
  .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
···
  	/* Flush needed, clear the bit */
  	btr	\scratch_reg, THIS_CPU_user_pcid_flush_mask
  	movq	\scratch_reg2, \scratch_reg
- 	jmp	.Lwrcr3_\@
+ 	jmp	.Lwrcr3_pcid_\@
  
  .Lnoflush_\@:
  	movq	\scratch_reg2, \scratch_reg
  	SET_NOFLUSH_BIT \scratch_reg
  
+ .Lwrcr3_pcid_\@:
+ 	/* Flip the ASID to the user version */
+ 	orq	$(PTI_USER_PCID_MASK), \scratch_reg
+ 
  .Lwrcr3_\@:
- 	/* Flip the PGD and ASID to the user version */
- 	orq	$(PTI_SWITCH_MASK), \scratch_reg
+ 	/* Flip the PGD to the user version */
+ 	orq	$(PTI_USER_PGTABLE_MASK), \scratch_reg
  	mov	\scratch_reg, %cr3
  .Lend_\@:
  .endm
···
  	movq	%cr3, \scratch_reg
  	movq	\scratch_reg, \save_reg
  	/*
- 	 * Is the "switch mask" all zero?  That means that both of
- 	 * these are zero:
- 	 *
- 	 *	1. The user/kernel PCID bit, and
- 	 *	2. The user/kernel "bit" that points CR3 to the
- 	 *	   bottom half of the 8k PGD
- 	 *
- 	 * That indicates a kernel CR3 value, not a user CR3.
+ 	 * Test the user pagetable bit. If set, then the user page tables
+ 	 * are active. If clear CR3 already has the kernel page table
+ 	 * active.
  	 */
- 	testq	$(PTI_SWITCH_MASK), \scratch_reg
- 	jz	.Ldone_\@
+ 	bt	$PTI_USER_PGTABLE_BIT, \scratch_reg
+ 	jnc	.Ldone_\@
  
  	ADJUST_KERNEL_CR3 \scratch_reg
  	movq	\scratch_reg, %cr3
···
  	 * KERNEL pages can always resume with NOFLUSH as we do
  	 * explicit flushes.
  	 */
- 	bt	$X86_CR3_PTI_SWITCH_BIT, \save_reg
+ 	bt	$PTI_USER_PGTABLE_BIT, \save_reg
  	jnc	.Lnoflush_\@
  
  	/*
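
The bit arithmetic behind the renamed masks is easy to check in
isolation: with an 8k, order-1 PGD, bit 12 of CR3 selects the user
half, and bit 11 is the user PCID bit. A hedged sketch (constants
mirror the hunk above; the CR3 value is hypothetical):

    /* cr3_bits.c - kernel vs. user CR3 under PTI. */
    #include <stdint.h>
    #include <stdio.h>

    #define PTI_USER_PGTABLE_BIT 12  /* PAGE_SHIFT */
    #define PTI_USER_PCID_BIT    11  /* X86_CR3_PTI_PCID_USER_BIT */

    int main(void)
    {
        uint64_t kernel_cr3 = 0x1000000;  /* hypothetical PGD address */
        uint64_t user_cr3 = kernel_cr3
                          | (1ULL << PTI_USER_PGTABLE_BIT)
                          | (1ULL << PTI_USER_PCID_BIT);

        /* SWITCH_TO_KERNEL_CR3 is the reverse: clear both bits again. */
        printf("kernel CR3 %#llx -> user CR3 %#llx\n",
               (unsigned long long)kernel_cr3,
               (unsigned long long)user_cr3);
        return 0;
    }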
+3 -2
arch/x86/entry/entry_32.S
···
  #include <asm/asm.h>
  #include <asm/smap.h>
  #include <asm/frame.h>
+ #include <asm/nospec-branch.h>
  
  	.section .entry.text, "ax"
  
···
  
  	/* kernel thread */
  1:	movl	%edi, %eax
- 	call	*%ebx
+ 	CALL_NOSPEC %ebx
  	/*
  	 * A kernel thread is allowed to return here after successfully
  	 * calling do_execve().  Exit to userspace to complete the execve()
···
  	movl	%ecx, %es
  	TRACE_IRQS_OFF
  	movl	%esp, %eax			# pt_regs pointer
- 	call	*%edi
+ 	CALL_NOSPEC %edi
  	jmp	ret_from_exception
  END(common_exception)
  
+9 -3
arch/x86/entry/entry_64.S
···
  #include <asm/pgtable_types.h>
  #include <asm/export.h>
  #include <asm/frame.h>
+ #include <asm/nospec-branch.h>
  #include <linux/err.h>
  
  #include "calling.h"
···
  	 */
  	pushq	%rdi
  	movq	$entry_SYSCALL_64_stage2, %rdi
- 	jmp	*%rdi
+ 	JMP_NOSPEC %rdi
  END(entry_SYSCALL_64_trampoline)
  
  	.popsection
···
  	 * It might end up jumping to the slow path.  If it jumps, RAX
  	 * and all argument registers are clobbered.
  	 */
+ #ifdef CONFIG_RETPOLINE
+ 	movq	sys_call_table(, %rax, 8), %rax
+ 	call	__x86_indirect_thunk_rax
+ #else
  	call	*sys_call_table(, %rax, 8)
+ #endif
  .Lentry_SYSCALL_64_after_fastpath_call:
  
  	movq	%rax, RAX(%rsp)
···
  	jmp	entry_SYSCALL64_slow_path
  
  1:
- 	jmp	*%rax				/* Called from C */
+ 	JMP_NOSPEC %rax			/* Called from C */
  END(stub_ptregs_64)
  
  .macro ptregs_stub func
···
  1:
  	/* kernel thread */
  	movq	%r12, %rdi
- 	call	*%rbx
+ 	CALL_NOSPEC %rbx
  	/*
  	 * A kernel thread is allowed to return here after successfully
  	 * calling do_execve().  Exit to userspace to complete the execve()
+18
arch/x86/events/intel/bts.c
···
  	if (!boot_cpu_has(X86_FEATURE_DTES64) || !x86_pmu.bts)
  		return -ENODEV;
  
+ 	if (boot_cpu_has(X86_FEATURE_PTI)) {
+ 		/*
+ 		 * BTS hardware writes through a virtual memory map; we must
+ 		 * either use the kernel physical map, or the user mapping of
+ 		 * the AUX buffer.
+ 		 *
+ 		 * However, since this driver supports per-CPU and per-task inherit
+ 		 * we cannot use the user mapping since it will not be available
+ 		 * if we're not running the owning process.
+ 		 *
+ 		 * With PTI we can't use the kernel map either, because it's not
+ 		 * there when we run userspace.
+ 		 *
+ 		 * For now, disable this driver when using PTI.
+ 		 */
+ 		return -ENODEV;
+ 	}
+ 
  	bts_pmu.capabilities	= PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_ITRACE |
  				  PERF_PMU_CAP_EXCLUSIVE;
  	bts_pmu.task_ctx_nr	= perf_sw_context;
+25
arch/x86/include/asm/asm-prototypes.h
···
  #include <asm/pgtable.h>
  #include <asm/special_insns.h>
  #include <asm/preempt.h>
+ #include <asm/asm.h>
  
  #ifndef CONFIG_X86_CMPXCHG64
  extern void cmpxchg8b_emu(void);
  #endif
+ 
+ #ifdef CONFIG_RETPOLINE
+ #ifdef CONFIG_X86_32
+ #define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void);
+ #else
+ #define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void);
+ INDIRECT_THUNK(8)
+ INDIRECT_THUNK(9)
+ INDIRECT_THUNK(10)
+ INDIRECT_THUNK(11)
+ INDIRECT_THUNK(12)
+ INDIRECT_THUNK(13)
+ INDIRECT_THUNK(14)
+ INDIRECT_THUNK(15)
+ #endif
+ INDIRECT_THUNK(ax)
+ INDIRECT_THUNK(bx)
+ INDIRECT_THUNK(cx)
+ INDIRECT_THUNK(dx)
+ INDIRECT_THUNK(si)
+ INDIRECT_THUNK(di)
+ INDIRECT_THUNK(bp)
+ INDIRECT_THUNK(sp)
+ #endif /* CONFIG_RETPOLINE */
+4
arch/x86/include/asm/cpufeatures.h
···
  #define X86_FEATURE_PROC_FEEDBACK	( 7*32+ 9) /* AMD ProcFeedbackInterface */
  #define X86_FEATURE_SME			( 7*32+10) /* AMD Secure Memory Encryption */
  #define X86_FEATURE_PTI			( 7*32+11) /* Kernel Page Table Isolation enabled */
+ #define X86_FEATURE_RETPOLINE		( 7*32+12) /* Generic Retpoline mitigation for Spectre variant 2 */
+ #define X86_FEATURE_RETPOLINE_AMD	( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */
  #define X86_FEATURE_INTEL_PPIN		( 7*32+14) /* Intel Processor Inventory Number */
  #define X86_FEATURE_INTEL_PT		( 7*32+15) /* Intel Processor Trace */
  #define X86_FEATURE_AVX512_4VNNIW	( 7*32+16) /* AVX-512 Neural Network Instructions */
···
  #define X86_BUG_MONITOR			X86_BUG(12) /* IPI required to wake up remote CPU */
  #define X86_BUG_AMD_E400		X86_BUG(13) /* CPU is among the affected by Erratum 400 */
  #define X86_BUG_CPU_MELTDOWN		X86_BUG(14) /* CPU is affected by meltdown attack and needs kernel page table isolation */
+ #define X86_BUG_SPECTRE_V1		X86_BUG(15) /* CPU is affected by Spectre variant 1 attack with conditional branches */
+ #define X86_BUG_SPECTRE_V2		X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
  
  #endif /* _ASM_X86_CPUFEATURES_H */
+10 -8
arch/x86/include/asm/mshyperv.h
···
  #include <linux/nmi.h>
  #include <asm/io.h>
  #include <asm/hyperv.h>
+ #include <asm/nospec-branch.h>
  
  /*
   * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
···
  		return U64_MAX;
  
  	__asm__ __volatile__("mov %4, %%r8\n"
- 			     "call *%5"
+ 			     CALL_NOSPEC
  			     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
  			       "+c" (control), "+d" (input_address)
- 			     :  "r" (output_address), "m" (hv_hypercall_pg)
+ 			     :  "r" (output_address),
+ 				THUNK_TARGET(hv_hypercall_pg)
  			     : "cc", "memory", "r8", "r9", "r10", "r11");
  #else
  	u32 input_address_hi = upper_32_bits(input_address);
···
  	if (!hv_hypercall_pg)
  		return U64_MAX;
  
- 	__asm__ __volatile__("call *%7"
+ 	__asm__ __volatile__(CALL_NOSPEC
  			     : "=A" (hv_status),
  			       "+c" (input_address_lo), ASM_CALL_CONSTRAINT
  			     : "A" (control),
  			       "b" (input_address_hi),
  			       "D"(output_address_hi), "S"(output_address_lo),
- 			       "m" (hv_hypercall_pg)
+ 			       THUNK_TARGET(hv_hypercall_pg)
  			     : "cc", "memory");
  #endif /* !x86_64 */
  	return hv_status;
···
  
  #ifdef CONFIG_X86_64
  	{
- 		__asm__ __volatile__("call *%4"
+ 		__asm__ __volatile__(CALL_NOSPEC
  				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
  				       "+c" (control), "+d" (input1)
- 				     : "m" (hv_hypercall_pg)
+ 				     : THUNK_TARGET(hv_hypercall_pg)
  				     : "cc", "r8", "r9", "r10", "r11");
  	}
  #else
···
  		u32 input1_hi = upper_32_bits(input1);
  		u32 input1_lo = lower_32_bits(input1);
  
- 		__asm__ __volatile__ ("call *%5"
+ 		__asm__ __volatile__ (CALL_NOSPEC
  				      : "=A"(hv_status),
  					"+c"(input1_lo),
  					ASM_CALL_CONSTRAINT
  				      : "A" (control),
  					"b" (input1_hi),
- 					"m" (hv_hypercall_pg)
+ 					THUNK_TARGET(hv_hypercall_pg)
  				      : "cc", "edi", "esi");
  	}
  #endif
+3
arch/x86/include/asm/msr-index.h
···
  #define FAM10H_MMIO_CONF_BASE_MASK	0xfffffffULL
  #define FAM10H_MMIO_CONF_BASE_SHIFT	20
  #define MSR_FAM10H_NODE_ID		0xc001100c
+ #define MSR_F10H_DECFG			0xc0011029
+ #define MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT	1
+ #define MSR_F10H_DECFG_LFENCE_SERIALIZE	BIT_ULL(MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT)
  
  /* K8 MSRs */
  #define MSR_K8_TOP_MEM1			0xc001001a
+214
arch/x86/include/asm/nospec-branch.h
···
+ /* SPDX-License-Identifier: GPL-2.0 */
+ 
+ #ifndef __NOSPEC_BRANCH_H__
+ #define __NOSPEC_BRANCH_H__
+ 
+ #include <asm/alternative.h>
+ #include <asm/alternative-asm.h>
+ #include <asm/cpufeatures.h>
+ 
+ /*
+  * Fill the CPU return stack buffer.
+  *
+  * Each entry in the RSB, if used for a speculative 'ret', contains an
+  * infinite 'pause; jmp' loop to capture speculative execution.
+  *
+  * This is required in various cases for retpoline and IBRS-based
+  * mitigations for the Spectre variant 2 vulnerability. Sometimes to
+  * eliminate potentially bogus entries from the RSB, and sometimes
+  * purely to ensure that it doesn't get empty, which on some CPUs would
+  * allow predictions from other (unwanted!) sources to be used.
+  *
+  * We define a CPP macro such that it can be used from both .S files and
+  * inline assembly. It's possible to do a .macro and then include that
+  * from C via asm(".include <asm/nospec-branch.h>") but let's not go there.
+  */
+ 
+ #define RSB_CLEAR_LOOPS		32	/* To forcibly overwrite all entries */
+ #define RSB_FILL_LOOPS		16	/* To avoid underflow */
+ 
+ /*
+  * Google experimented with loop-unrolling and this turned out to be
+  * the optimal version - two calls, each with their own speculation
+  * trap should their return address end up getting used, in a loop.
+  */
+ #define __FILL_RETURN_BUFFER(reg, nr, sp)	\
+ 	mov	$(nr/2), reg;			\
+ 771:						\
+ 	call	772f;				\
+ 773:	/* speculation trap */			\
+ 	pause;					\
+ 	jmp	773b;				\
+ 772:						\
+ 	call	774f;				\
+ 775:	/* speculation trap */			\
+ 	pause;					\
+ 	jmp	775b;				\
+ 774:						\
+ 	dec	reg;				\
+ 	jnz	771b;				\
+ 	add	$(BITS_PER_LONG/8) * nr, sp;
+ 
+ #ifdef __ASSEMBLY__
+ 
+ /*
+  * This should be used immediately before a retpoline alternative.  It tells
+  * objtool where the retpolines are so that it can make sense of the control
+  * flow by just reading the original instruction(s) and ignoring the
+  * alternatives.
+  */
+ .macro ANNOTATE_NOSPEC_ALTERNATIVE
+ 	.Lannotate_\@:
+ 	.pushsection .discard.nospec
+ 	.long .Lannotate_\@ - .
+ 	.popsection
+ .endm
+ 
+ /*
+  * These are the bare retpoline primitives for indirect jmp and call.
+  * Do not use these directly; they only exist to make the ALTERNATIVE
+  * invocation below less ugly.
+  */
+ .macro RETPOLINE_JMP reg:req
+ 	call	.Ldo_rop_\@
+ .Lspec_trap_\@:
+ 	pause
+ 	jmp	.Lspec_trap_\@
+ .Ldo_rop_\@:
+ 	mov	\reg, (%_ASM_SP)
+ 	ret
+ .endm
+ 
+ /*
+  * This is a wrapper around RETPOLINE_JMP so the called function in reg
+  * returns to the instruction after the macro.
+  */
+ .macro RETPOLINE_CALL reg:req
+ 	jmp	.Ldo_call_\@
+ .Ldo_retpoline_jmp_\@:
+ 	RETPOLINE_JMP	\reg
+ .Ldo_call_\@:
+ 	call	.Ldo_retpoline_jmp_\@
+ .endm
+ 
+ /*
+  * JMP_NOSPEC and CALL_NOSPEC macros can be used instead of a simple
+  * indirect jmp/call which may be susceptible to the Spectre variant 2
+  * attack.
+  */
+ .macro JMP_NOSPEC reg:req
+ #ifdef CONFIG_RETPOLINE
+ 	ANNOTATE_NOSPEC_ALTERNATIVE
+ 	ALTERNATIVE_2 __stringify(jmp *\reg),				\
+ 		__stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE,	\
+ 		__stringify(lfence; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
+ #else
+ 	jmp	*\reg
+ #endif
+ .endm
+ 
+ .macro CALL_NOSPEC reg:req
+ #ifdef CONFIG_RETPOLINE
+ 	ANNOTATE_NOSPEC_ALTERNATIVE
+ 	ALTERNATIVE_2 __stringify(call *\reg),				\
+ 		__stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\
+ 		__stringify(lfence; call *\reg), X86_FEATURE_RETPOLINE_AMD
+ #else
+ 	call	*\reg
+ #endif
+ .endm
+ 
+ /*
+  * A simpler FILL_RETURN_BUFFER macro. Don't make people use the CPP
+  * monstrosity above, manually.
+  */
+ .macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
+ #ifdef CONFIG_RETPOLINE
+ 	ANNOTATE_NOSPEC_ALTERNATIVE
+ 	ALTERNATIVE "jmp .Lskip_rsb_\@",				\
+ 		__stringify(__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP))	\
+ 		\ftr
+ .Lskip_rsb_\@:
+ #endif
+ .endm
+ 
+ #else /* __ASSEMBLY__ */
+ 
+ #define ANNOTATE_NOSPEC_ALTERNATIVE				\
+ 	"999:\n\t"						\
+ 	".pushsection .discard.nospec\n\t"			\
+ 	".long 999b - .\n\t"					\
+ 	".popsection\n\t"
+ 
+ #if defined(CONFIG_X86_64) && defined(RETPOLINE)
+ 
+ /*
+  * Since the inline asm uses the %V modifier which is only in newer GCC,
+  * the 64-bit one is dependent on RETPOLINE not CONFIG_RETPOLINE.
+  */
+ # define CALL_NOSPEC						\
+ 	ANNOTATE_NOSPEC_ALTERNATIVE				\
+ 	ALTERNATIVE(						\
+ 	"call *%[thunk_target]\n",				\
+ 	"call __x86_indirect_thunk_%V[thunk_target]\n",		\
+ 	X86_FEATURE_RETPOLINE)
+ # define THUNK_TARGET(addr) [thunk_target] "r" (addr)
+ 
+ #elif defined(CONFIG_X86_32) && defined(CONFIG_RETPOLINE)
+ /*
+  * For i386 we use the original ret-equivalent retpoline, because
+  * otherwise we'll run out of registers. We don't care about CET
+  * here, anyway.
+  */
+ # define CALL_NOSPEC ALTERNATIVE("call *%[thunk_target]\n",	\
+ 	"       jmp    904f;\n"					\
+ 	"       .align 16\n"					\
+ 	"901:	call   903f;\n"					\
+ 	"902:	pause;\n"					\
+ 	"       jmp    902b;\n"					\
+ 	"       .align 16\n"					\
+ 	"903:	addl   $4, %%esp;\n"				\
+ 	"       pushl  %[thunk_target];\n"			\
+ 	"       ret;\n"						\
+ 	"       .align 16\n"					\
+ 	"904:	call   901b;\n",				\
+ 	X86_FEATURE_RETPOLINE)
+ 
+ # define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
+ #else /* No retpoline for C / inline asm */
+ # define CALL_NOSPEC "call *%[thunk_target]\n"
+ # define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
+ #endif
+ 
+ /* The Spectre V2 mitigation variants */
+ enum spectre_v2_mitigation {
+ 	SPECTRE_V2_NONE,
+ 	SPECTRE_V2_RETPOLINE_MINIMAL,
+ 	SPECTRE_V2_RETPOLINE_MINIMAL_AMD,
+ 	SPECTRE_V2_RETPOLINE_GENERIC,
+ 	SPECTRE_V2_RETPOLINE_AMD,
+ 	SPECTRE_V2_IBRS,
+ };
+ 
+ /*
+  * On VMEXIT we must ensure that no RSB predictions learned in the guest
+  * can be followed in the host, by overwriting the RSB completely. Both
+  * retpoline and IBRS mitigations for Spectre v2 need this; only on future
+  * CPUs with IBRS_ATT *might* it be avoided.
+  */
+ static inline void vmexit_fill_RSB(void)
+ {
+ #ifdef CONFIG_RETPOLINE
+ 	unsigned long loops = RSB_CLEAR_LOOPS / 2;
+ 
+ 	asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
+ 		      ALTERNATIVE("jmp 910f",
+ 				  __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
+ 				  X86_FEATURE_RETPOLINE)
+ 		      "910:"
+ 		      : "=&r" (loops), ASM_CALL_CONSTRAINT
+ 		      : "r" (loops) : "memory" );
+ #endif
+ }
+ #endif /* __ASSEMBLY__ */
+ #endif /* __NOSPEC_BRANCH_H__ */
+1 -1
arch/x86/include/asm/processor-flags.h
···
  #define CR3_NOFLUSH	BIT_ULL(63)
  
  #ifdef CONFIG_PAGE_TABLE_ISOLATION
- # define X86_CR3_PTI_SWITCH_BIT	11
+ # define X86_CR3_PTI_PCID_USER_BIT	11
  #endif
  
  #else
+3 -3
arch/x86/include/asm/tlbflush.h
···
  	 * Make sure that the dynamic ASID space does not conflict with the
  	 * bit we are using to switch between user and kernel ASIDs.
  	 */
- 	BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_SWITCH_BIT));
+ 	BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_PCID_USER_BIT));
  
  	/*
  	 * The ASID being passed in here should have respected the
  	 * MAX_ASID_AVAILABLE and thus never have the switch bit set.
  	 */
- 	VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_SWITCH_BIT));
+ 	VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_PCID_USER_BIT));
  #endif
  	/*
  	 * The dynamically-assigned ASIDs that get passed in are small
···
  {
  	u16 ret = kern_pcid(asid);
  #ifdef CONFIG_PAGE_TABLE_ISOLATION
- 	ret |= 1 << X86_CR3_PTI_SWITCH_BIT;
+ 	ret |= 1 << X86_CR3_PTI_PCID_USER_BIT;
  #endif
  	return ret;
  }
+3 -2
arch/x86/include/asm/xen/hypercall.h
···
  #include <asm/page.h>
  #include <asm/pgtable.h>
  #include <asm/smap.h>
+ #include <asm/nospec-branch.h>
  
  #include <xen/interface/xen.h>
  #include <xen/interface/sched.h>
···
  	__HYPERCALL_5ARG(a1, a2, a3, a4, a5);
  
  	stac();
- 	asm volatile("call *%[call]"
+ 	asm volatile(CALL_NOSPEC
  		     : __HYPERCALL_5PARAM
- 		     : [call] "a" (&hypercall_page[call])
+ 		     : [thunk_target] "a" (&hypercall_page[call])
  		     : __HYPERCALL_CLOBBER5);
  	clac();
  
+5 -2
arch/x86/kernel/alternative.c
···
  static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
  {
  	unsigned long flags;
+ 	int i;
  
- 	if (instr[0] != 0x90)
- 		return;
+ 	for (i = 0; i < a->padlen; i++) {
+ 		if (instr[i] != 0x90)
+ 			return;
+ 	}
  
  	local_irq_save(flags);
  	add_nops(instr + (a->instrlen - a->padlen), a->padlen);
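
The logic of that fix in isolation: the padding area following a
patched instruction may only be collapsed into an optimized NOP if
every byte of it is a NOP (0x90), not just the first. A hedged,
stand-alone restatement (buffer contents are hypothetical):

    /* nops_check.c - the optimize_nops() guard as plain C. */
    #include <stdio.h>

    static int all_nops(const unsigned char *pad, int padlen)
    {
        for (int i = 0; i < padlen; i++)
            if (pad[i] != 0x90)
                return 0;
        return 1;
    }

    int main(void)
    {
        unsigned char ok[]  = { 0x90, 0x90, 0x90 };
        unsigned char bad[] = { 0x90, 0xcc, 0x90 };  /* 0xcc is not a NOP */

        printf("ok:  %s\n", all_nops(ok, 3)  ? "optimize" : "leave alone");
        printf("bad: %s\n", all_nops(bad, 3) ? "optimize" : "leave alone");
        return 0;
    }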
+26 -2
arch/x86/kernel/cpu/amd.c
···
  		set_cpu_cap(c, X86_FEATURE_K8);
  
  	if (cpu_has(c, X86_FEATURE_XMM2)) {
- 		/* MFENCE stops RDTSC speculation */
- 		set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
+ 		unsigned long long val;
+ 		int ret;
+ 
+ 		/*
+ 		 * A serializing LFENCE has less overhead than MFENCE, so
+ 		 * use it for execution serialization.  On families which
+ 		 * don't have that MSR, LFENCE is already serializing.
+ 		 * msr_set_bit() uses the safe accessors, too, even if the MSR
+ 		 * is not present.
+ 		 */
+ 		msr_set_bit(MSR_F10H_DECFG,
+ 			    MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT);
+ 
+ 		/*
+ 		 * Verify that the MSR write was successful (could be running
+ 		 * under a hypervisor) and only then assume that LFENCE is
+ 		 * serializing.
+ 		 */
+ 		ret = rdmsrl_safe(MSR_F10H_DECFG, &val);
+ 		if (!ret && (val & MSR_F10H_DECFG_LFENCE_SERIALIZE)) {
+ 			/* A serializing LFENCE stops RDTSC speculation */
+ 			set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
+ 		} else {
+ 			/* MFENCE stops RDTSC speculation */
+ 			set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
+ 		}
  	}
  
  	/*
+185
arch/x86/kernel/cpu/bugs.c
···
   */
  #include <linux/init.h>
  #include <linux/utsname.h>
+ #include <linux/cpu.h>
+ 
+ #include <asm/nospec-branch.h>
+ #include <asm/cmdline.h>
  #include <asm/bugs.h>
  #include <asm/processor.h>
  #include <asm/processor-flags.h>
···
  #include <asm/pgtable.h>
  #include <asm/set_memory.h>
  
+ static void __init spectre_v2_select_mitigation(void);
+ 
  void __init check_bugs(void)
  {
  	identify_boot_cpu();
···
  		pr_info("CPU: ");
  		print_cpu_info(&boot_cpu_data);
  	}
+ 
+ 	/* Select the proper spectre mitigation before patching alternatives */
+ 	spectre_v2_select_mitigation();
  
  #ifdef CONFIG_X86_32
  	/*
···
  	set_memory_4k((unsigned long)__va(0), 1);
  #endif
  }
+ 
+ /* The kernel command line selection */
+ enum spectre_v2_mitigation_cmd {
+ 	SPECTRE_V2_CMD_NONE,
+ 	SPECTRE_V2_CMD_AUTO,
+ 	SPECTRE_V2_CMD_FORCE,
+ 	SPECTRE_V2_CMD_RETPOLINE,
+ 	SPECTRE_V2_CMD_RETPOLINE_GENERIC,
+ 	SPECTRE_V2_CMD_RETPOLINE_AMD,
+ };
+ 
+ static const char *spectre_v2_strings[] = {
+ 	[SPECTRE_V2_NONE]			= "Vulnerable",
+ 	[SPECTRE_V2_RETPOLINE_MINIMAL]		= "Vulnerable: Minimal generic ASM retpoline",
+ 	[SPECTRE_V2_RETPOLINE_MINIMAL_AMD]	= "Vulnerable: Minimal AMD ASM retpoline",
+ 	[SPECTRE_V2_RETPOLINE_GENERIC]		= "Mitigation: Full generic retpoline",
+ 	[SPECTRE_V2_RETPOLINE_AMD]		= "Mitigation: Full AMD retpoline",
+ };
+ 
+ #undef pr_fmt
+ #define pr_fmt(fmt)     "Spectre V2 mitigation: " fmt
+ 
+ static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE;
+ 
+ static void __init spec2_print_if_insecure(const char *reason)
+ {
+ 	if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+ 		pr_info("%s\n", reason);
+ }
+ 
+ static void __init spec2_print_if_secure(const char *reason)
+ {
+ 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+ 		pr_info("%s\n", reason);
+ }
+ 
+ static inline bool retp_compiler(void)
+ {
+ 	return __is_defined(RETPOLINE);
+ }
+ 
+ static inline bool match_option(const char *arg, int arglen, const char *opt)
+ {
+ 	int len = strlen(opt);
+ 
+ 	return len == arglen && !strncmp(arg, opt, len);
+ }
+ 
+ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
+ {
+ 	char arg[20];
+ 	int ret;
+ 
+ 	ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
+ 				  sizeof(arg));
+ 	if (ret > 0)  {
+ 		if (match_option(arg, ret, "off")) {
+ 			goto disable;
+ 		} else if (match_option(arg, ret, "on")) {
+ 			spec2_print_if_secure("force enabled on command line.");
+ 			return SPECTRE_V2_CMD_FORCE;
+ 		} else if (match_option(arg, ret, "retpoline")) {
+ 			spec2_print_if_insecure("retpoline selected on command line.");
+ 			return SPECTRE_V2_CMD_RETPOLINE;
+ 		} else if (match_option(arg, ret, "retpoline,amd")) {
+ 			if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
+ 				pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
+ 				return SPECTRE_V2_CMD_AUTO;
+ 			}
+ 			spec2_print_if_insecure("AMD retpoline selected on command line.");
+ 			return SPECTRE_V2_CMD_RETPOLINE_AMD;
+ 		} else if (match_option(arg, ret, "retpoline,generic")) {
+ 			spec2_print_if_insecure("generic retpoline selected on command line.");
+ 			return SPECTRE_V2_CMD_RETPOLINE_GENERIC;
+ 		} else if (match_option(arg, ret, "auto")) {
+ 			return SPECTRE_V2_CMD_AUTO;
+ 		}
+ 	}
+ 
+ 	if (!cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
+ 		return SPECTRE_V2_CMD_AUTO;
+ disable:
+ 	spec2_print_if_insecure("disabled on command line.");
+ 	return SPECTRE_V2_CMD_NONE;
+ }
+ 
+ static void __init spectre_v2_select_mitigation(void)
+ {
+ 	enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
+ 	enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
+ 
+ 	/*
+ 	 * If the CPU is not affected and the command line mode is NONE or AUTO
+ 	 * then nothing to do.
+ 	 */
+ 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2) &&
+ 	    (cmd == SPECTRE_V2_CMD_NONE || cmd == SPECTRE_V2_CMD_AUTO))
+ 		return;
+ 
+ 	switch (cmd) {
+ 	case SPECTRE_V2_CMD_NONE:
+ 		return;
+ 
+ 	case SPECTRE_V2_CMD_FORCE:
+ 		/* FALLTHRU */
+ 	case SPECTRE_V2_CMD_AUTO:
+ 		goto retpoline_auto;
+ 
+ 	case SPECTRE_V2_CMD_RETPOLINE_AMD:
+ 		if (IS_ENABLED(CONFIG_RETPOLINE))
+ 			goto retpoline_amd;
+ 		break;
+ 	case SPECTRE_V2_CMD_RETPOLINE_GENERIC:
+ 		if (IS_ENABLED(CONFIG_RETPOLINE))
+ 			goto retpoline_generic;
+ 		break;
+ 	case SPECTRE_V2_CMD_RETPOLINE:
+ 		if (IS_ENABLED(CONFIG_RETPOLINE))
+ 			goto retpoline_auto;
+ 		break;
+ 	}
+ 	pr_err("kernel not compiled with retpoline; no mitigation available!");
+ 	return;
+ 
+ retpoline_auto:
+ 	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+ 	retpoline_amd:
+ 		if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
+ 			pr_err("LFENCE not serializing. Switching to generic retpoline\n");
+ 			goto retpoline_generic;
+ 		}
+ 		mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
+ 					 SPECTRE_V2_RETPOLINE_MINIMAL_AMD;
+ 		setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
+ 		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ 	} else {
+ 	retpoline_generic:
+ 		mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
+ 					 SPECTRE_V2_RETPOLINE_MINIMAL;
+ 		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ 	}
+ 
+ 	spectre_v2_enabled = mode;
+ 	pr_info("%s\n", spectre_v2_strings[mode]);
+ }
+ 
+ #undef pr_fmt
+ 
+ #ifdef CONFIG_SYSFS
+ ssize_t cpu_show_meltdown(struct device *dev,
+ 			  struct device_attribute *attr, char *buf)
+ {
+ 	if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
+ 		return sprintf(buf, "Not affected\n");
+ 	if (boot_cpu_has(X86_FEATURE_PTI))
+ 		return sprintf(buf, "Mitigation: PTI\n");
+ 	return sprintf(buf, "Vulnerable\n");
+ }
+ 
+ ssize_t cpu_show_spectre_v1(struct device *dev,
+ 			    struct device_attribute *attr, char *buf)
+ {
+ 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
+ 		return sprintf(buf, "Not affected\n");
+ 	return sprintf(buf, "Vulnerable\n");
+ }
+ 
+ ssize_t cpu_show_spectre_v2(struct device *dev,
+ 			    struct device_attribute *attr, char *buf)
+ {
+ 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+ 		return sprintf(buf, "Not affected\n");
+ 
+ 	return sprintf(buf, "%s\n", spectre_v2_strings[spectre_v2_enabled]);
+ }
+ #endif
+3
arch/x86/kernel/cpu/common.c
···
  	if (c->x86_vendor != X86_VENDOR_AMD)
  		setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
  
+ 	setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
+ 	setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
+ 
  	fpu__init_system(c);
  
  #ifdef CONFIG_X86_32
+4 -2
arch/x86/kernel/ftrace_32.S
···
  #include <asm/segment.h>
  #include <asm/export.h>
  #include <asm/ftrace.h>
+ #include <asm/nospec-branch.h>
  
  #ifdef CC_USING_FENTRY
  # define function_hook	__fentry__
···
  	movl	0x4(%ebp), %edx
  	subl	$MCOUNT_INSN_SIZE, %eax
  
- 	call	*ftrace_trace_function
+ 	movl	ftrace_trace_function, %ecx
+ 	CALL_NOSPEC %ecx
  
  	popl	%edx
  	popl	%ecx
···
  	movl	%eax, %ecx
  	popl	%edx
  	popl	%eax
- 	jmp	*%ecx
+ 	JMP_NOSPEC %ecx
  #endif
+4 -4
arch/x86/kernel/ftrace_64.S
···
  #include <asm/ptrace.h>
  #include <asm/ftrace.h>
  #include <asm/export.h>
- 
+ #include <asm/nospec-branch.h>
  
  	.code64
  	.section .entry.text, "ax"
···
   * ip and parent ip are used and the list function is called when
   * function tracing is enabled.
   */
- 	call   *ftrace_trace_function
- 
+ 	movq ftrace_trace_function, %r8
+ 	CALL_NOSPEC %r8
  	restore_mcount_regs
  
  	jmp fgraph_trace
···
  	movq 8(%rsp), %rdx
  	movq (%rsp), %rax
  	addq $24, %rsp
- 	jmp *%rdi
+ 	JMP_NOSPEC %rdi
  #endif
+5 -4
arch/x86/kernel/irq_32.c
···
  #include <linux/mm.h>
  
  #include <asm/apic.h>
+ #include <asm/nospec-branch.h>
  
  #ifdef CONFIG_DEBUG_STACKOVERFLOW
  
···
  static void call_on_stack(void *func, void *stack)
  {
  	asm volatile("xchgl	%%ebx,%%esp	\n"
- 		     "call	*%%edi		\n"
+ 		     CALL_NOSPEC
  		     "movl	%%ebx,%%esp	\n"
  		     : "=b" (stack)
  		     : "0" (stack),
- 		       "D"(func)
+ 		       [thunk_target] "D"(func)
  		     : "memory", "cc", "edx", "ecx", "eax");
  }
  
···
  		call_on_stack(print_stack_overflow, isp);
  
  	asm volatile("xchgl	%%ebx,%%esp	\n"
- 		     "call	*%%edi		\n"
+ 		     CALL_NOSPEC
  		     "movl	%%ebx,%%esp	\n"
  		     : "=a" (arg1), "=b" (isp)
  		     :  "0" (desc),   "1" (isp),
- 			"D" (desc->handle_irq)
+ 			[thunk_target] "D" (desc->handle_irq)
  		     : "memory", "cc", "ecx");
  	return 1;
  }
+11
arch/x86/kernel/tboot.c
···
  		return -1;
  	set_pte_at(&tboot_mm, vaddr, pte, pfn_pte(pfn, prot));
  	pte_unmap(pte);
+ 
+ 	/*
+ 	 * PTI poisons low addresses in the kernel page tables in the
+ 	 * name of making them unusable for userspace.  To execute
+ 	 * code at such a low address, the poison must be cleared.
+ 	 *
+ 	 * Note: 'pgd' actually gets set in p4d_alloc() _or_
+ 	 * pud_alloc() depending on 4/5-level paging.
+ 	 */
+ 	pgd->pgd &= ~_PAGE_NX;
+ 
  	return 0;
  }
  
+4
arch/x86/kvm/svm.c
···
  #include <asm/debugreg.h>
  #include <asm/kvm_para.h>
  #include <asm/irq_remapping.h>
+ #include <asm/nospec-branch.h>
  
  #include <asm/virtext.h>
  #include "trace.h"
···
  		, "ebx", "ecx", "edx", "esi", "edi"
  #endif
  		);
+ 
+ 	/* Eliminate branch target predictions from guest mode */
+ 	vmexit_fill_RSB();
  
  #ifdef CONFIG_X86_64
  	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
+4
arch/x86/kvm/vmx.c
···
  #include <asm/apic.h>
  #include <asm/irq_remapping.h>
  #include <asm/mmu_context.h>
+ #include <asm/nospec-branch.h>
  
  #include "trace.h"
  #include "pmu.h"
···
  		, "eax", "ebx", "edi", "esi"
  #endif
  	      );
+ 
+ 	/* Eliminate branch target predictions from guest mode */
+ 	vmexit_fill_RSB();
  
  	/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
  	if (debugctlmsr)
+1
arch/x86/lib/Makefile
···
  lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
  lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
  lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
+ lib-$(CONFIG_RETPOLINE) += retpoline.o
  
  obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
  
+4 -3
arch/x86/lib/checksum_32.S
···
  #include <asm/errno.h>
  #include <asm/asm.h>
  #include <asm/export.h>
- 
+ #include <asm/nospec-branch.h>
+ 
  /*
   * computes a partial checksum, e.g. for TCP/UDP fragments
   */
···
  	negl %ebx
  	lea 45f(%ebx,%ebx,2), %ebx
  	testl %esi, %esi
- 	jmp *%ebx
+ 	JMP_NOSPEC %ebx
  
  	# Handle 2-byte-aligned regions
  20:	addw (%esi), %ax
···
  	andl $-32,%edx
  	lea 3f(%ebx,%ebx), %ebx
  	testl %esi, %esi
- 	jmp *%ebx
+ 	JMP_NOSPEC %ebx
  1:	addl $64,%esi
  	addl $64,%edi
  	SRC(movb -32(%edx),%bl)	; SRC(movb (%edx),%bl)
+48
arch/x86/lib/retpoline.S
···
+ /* SPDX-License-Identifier: GPL-2.0 */
+ 
+ #include <linux/stringify.h>
+ #include <linux/linkage.h>
+ #include <asm/dwarf2.h>
+ #include <asm/cpufeatures.h>
+ #include <asm/alternative-asm.h>
+ #include <asm/export.h>
+ #include <asm/nospec-branch.h>
+ 
+ .macro THUNK reg
+ 	.section .text.__x86.indirect_thunk.\reg
+ 
+ ENTRY(__x86_indirect_thunk_\reg)
+ 	CFI_STARTPROC
+ 	JMP_NOSPEC %\reg
+ 	CFI_ENDPROC
+ ENDPROC(__x86_indirect_thunk_\reg)
+ .endm
+ 
+ /*
+  * Despite being an assembler file we can't just use .irp here
+  * because __KSYM_DEPS__ only uses the C preprocessor and would
+  * only see one instance of "__x86_indirect_thunk_\reg" rather
+  * than one per register with the correct names. So we do it
+  * the simple and nasty way...
+  */
+ #define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
+ #define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
+ 
+ GENERATE_THUNK(_ASM_AX)
+ GENERATE_THUNK(_ASM_BX)
+ GENERATE_THUNK(_ASM_CX)
+ GENERATE_THUNK(_ASM_DX)
+ GENERATE_THUNK(_ASM_SI)
+ GENERATE_THUNK(_ASM_DI)
+ GENERATE_THUNK(_ASM_BP)
+ GENERATE_THUNK(_ASM_SP)
+ #ifdef CONFIG_64BIT
+ GENERATE_THUNK(r8)
+ GENERATE_THUNK(r9)
+ GENERATE_THUNK(r10)
+ GENERATE_THUNK(r11)
+ GENERATE_THUNK(r12)
+ GENERATE_THUNK(r13)
+ GENERATE_THUNK(r14)
+ GENERATE_THUNK(r15)
+ #endif
+6 -26
arch/x86/mm/pti.c
···
   *
   * Returns a pointer to a P4D on success, or NULL on failure.
   */
- static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
+ static __init p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
  {
  	pgd_t *pgd = kernel_to_user_pgdp(pgd_offset_k(address));
  	gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
···
  		if (!new_p4d_page)
  			return NULL;
  
- 		if (pgd_none(*pgd)) {
- 			set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
- 			new_p4d_page = 0;
- 		}
- 		if (new_p4d_page)
- 			free_page(new_p4d_page);
+ 		set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
  	}
  	BUILD_BUG_ON(pgd_large(*pgd) != 0);
  
···
   *
   * Returns a pointer to a PMD on success, or NULL on failure.
   */
- static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
+ static __init pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
  {
  	gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
  	p4d_t *p4d = pti_user_pagetable_walk_p4d(address);
···
  		if (!new_pud_page)
  			return NULL;
  
- 		if (p4d_none(*p4d)) {
- 			set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
- 			new_pud_page = 0;
- 		}
- 		if (new_pud_page)
- 			free_page(new_pud_page);
+ 		set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
  	}
  
  	pud = pud_offset(p4d, address);
···
  		if (!new_pmd_page)
  			return NULL;
  
- 		if (pud_none(*pud)) {
- 			set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
- 			new_pmd_page = 0;
- 		}
- 		if (new_pmd_page)
- 			free_page(new_pmd_page);
+ 		set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
  	}
  
  	return pmd_offset(pud, address);
···
  		if (!new_pte_page)
  			return NULL;
  
- 		if (pmd_none(*pmd)) {
- 			set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
- 			new_pte_page = 0;
- 		}
- 		if (new_pte_page)
- 			free_page(new_pte_page);
+ 		set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
  	}
  
  	pte = pte_offset_kernel(pmd, address);
+2
arch/x86/platform/efi/efi_64.c
···
  			pud[j] = *pud_offset(p4d_k, vaddr);
  		}
  	}
+ 	pgd_offset_k(pgd * PGDIR_SIZE)->pgd &= ~_PAGE_NX;
  }
+ 
  out:
  	__flush_tlb_all();
  
+3
drivers/base/Kconfig
···
  config GENERIC_CPU_AUTOPROBE
  	bool
  
+ config GENERIC_CPU_VULNERABILITIES
+ 	bool
+ 
  config SOC_BUS
  	bool
  	select GLOB
+48
drivers/base/cpu.c
···
  #endif
  }
  
+ #ifdef CONFIG_GENERIC_CPU_VULNERABILITIES
+ 
+ ssize_t __weak cpu_show_meltdown(struct device *dev,
+ 				 struct device_attribute *attr, char *buf)
+ {
+ 	return sprintf(buf, "Not affected\n");
+ }
+ 
+ ssize_t __weak cpu_show_spectre_v1(struct device *dev,
+ 				   struct device_attribute *attr, char *buf)
+ {
+ 	return sprintf(buf, "Not affected\n");
+ }
+ 
+ ssize_t __weak cpu_show_spectre_v2(struct device *dev,
+ 				   struct device_attribute *attr, char *buf)
+ {
+ 	return sprintf(buf, "Not affected\n");
+ }
+ 
+ static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
+ static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
+ static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
+ 
+ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
+ 	&dev_attr_meltdown.attr,
+ 	&dev_attr_spectre_v1.attr,
+ 	&dev_attr_spectre_v2.attr,
+ 	NULL
+ };
+ 
+ static const struct attribute_group cpu_root_vulnerabilities_group = {
+ 	.name  = "vulnerabilities",
+ 	.attrs = cpu_root_vulnerabilities_attrs,
+ };
+ 
+ static void __init cpu_register_vulnerabilities(void)
+ {
+ 	if (sysfs_create_group(&cpu_subsys.dev_root->kobj,
+ 			       &cpu_root_vulnerabilities_group))
+ 		pr_err("Unable to register CPU vulnerabilities\n");
+ }
+ 
+ #else
+ static inline void cpu_register_vulnerabilities(void) { }
+ #endif
+ 
  void __init cpu_dev_init(void)
  {
  	if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
  		panic("Failed to register CPU subsystem");
  
  	cpu_dev_register_generic();
+ 	cpu_register_vulnerabilities();
  }
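
The __weak functions above are plain link-time defaults: an
architecture that knows better overrides them simply by defining a
strong symbol of the same name, as arch/x86/kernel/cpu/bugs.c does in
this series. The pattern in isolation (a hedged sketch; the function
name is hypothetical):

    /* weak_demo.c - weak default, strong override wins at link time. */
    #include <stdio.h>

    __attribute__((weak)) const char *show_meltdown(void)
    {
        return "Not affected\n";  /* generic default */
    }

    /* Linking in another object that defines a strong show_meltdown()
     * - say, returning "Mitigation: PTI\n" - silently replaces this. */

    int main(void)
    {
        fputs(show_meltdown(), stdout);
        return 0;
    }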
+7
include/linux/cpu.h
··· 47 47 extern int cpu_add_dev_attr_group(struct attribute_group *attrs); 48 48 extern void cpu_remove_dev_attr_group(struct attribute_group *attrs); 49 49 50 + extern ssize_t cpu_show_meltdown(struct device *dev, 51 + struct device_attribute *attr, char *buf); 52 + extern ssize_t cpu_show_spectre_v1(struct device *dev, 53 + struct device_attribute *attr, char *buf); 54 + extern ssize_t cpu_show_spectre_v2(struct device *dev, 55 + struct device_attribute *attr, char *buf); 56 + 50 57 extern __printf(4, 5) 51 58 struct device *cpu_device_create(struct device *parent, void *drvdata, 52 59 const struct attribute_group **groups,
+1 -1
security/Kconfig
··· 63 63 ensuring that the majority of kernel addresses are not mapped 64 64 into userspace. 65 65 66 - See Documentation/x86/pagetable-isolation.txt for more details. 66 + See Documentation/x86/pti.txt for more details. 67 67 68 68 config SECURITY_INFINIBAND 69 69 bool "Infiniband Security Hooks"
+63 -6
tools/objtool/check.c
··· 428 428 } 429 429 430 430 /* 431 + * FIXME: For now, just ignore any alternatives which add retpolines. This is 432 + * a temporary hack, as it doesn't allow ORC to unwind from inside a retpoline. 433 + * But it at least allows objtool to understand the control flow *around* the 434 + * retpoline. 435 + */ 436 + static int add_nospec_ignores(struct objtool_file *file) 437 + { 438 + struct section *sec; 439 + struct rela *rela; 440 + struct instruction *insn; 441 + 442 + sec = find_section_by_name(file->elf, ".rela.discard.nospec"); 443 + if (!sec) 444 + return 0; 445 + 446 + list_for_each_entry(rela, &sec->rela_list, list) { 447 + if (rela->sym->type != STT_SECTION) { 448 + WARN("unexpected relocation symbol type in %s", sec->name); 449 + return -1; 450 + } 451 + 452 + insn = find_insn(file, rela->sym->sec, rela->addend); 453 + if (!insn) { 454 + WARN("bad .discard.nospec entry"); 455 + return -1; 456 + } 457 + 458 + insn->ignore_alts = true; 459 + } 460 + 461 + return 0; 462 + } 463 + 464 + /* 431 465 * Find the destination instructions for all jumps. 432 466 */ 433 467 static int add_jump_destinations(struct objtool_file *file) ··· 490 456 } else if (rela->sym->sec->idx) { 491 457 dest_sec = rela->sym->sec; 492 458 dest_off = rela->sym->sym.st_value + rela->addend + 4; 459 + } else if (strstr(rela->sym->name, "_indirect_thunk_")) { 460 + /* 461 + * Retpoline jumps are really dynamic jumps in 462 + * disguise, so convert them accordingly. 463 + */ 464 + insn->type = INSN_JUMP_DYNAMIC; 465 + continue; 493 466 } else { 494 467 /* sibling call */ 495 468 insn->jump_dest = 0; ··· 543 502 dest_off = insn->offset + insn->len + insn->immediate; 544 503 insn->call_dest = find_symbol_by_offset(insn->sec, 545 504 dest_off); 505 + /* 506 + * FIXME: Thanks to retpolines, it's now considered 507 + * normal for a function to call within itself. So 508 + * disable this warning for now. 509 + */ 510 + #if 0 546 511 if (!insn->call_dest) { 547 512 WARN_FUNC("can't find call dest symbol at offset 0x%lx", 548 513 insn->sec, insn->offset, dest_off); 549 514 return -1; 550 515 } 516 + #endif 551 517 } else if (rela->sym->type == STT_SECTION) { 552 518 insn->call_dest = find_symbol_by_offset(rela->sym->sec, 553 519 rela->addend+4); ··· 719 671 return ret; 720 672 721 673 list_for_each_entry_safe(special_alt, tmp, &special_alts, list) { 722 - alt = malloc(sizeof(*alt)); 723 - if (!alt) { 724 - WARN("malloc failed"); 725 - ret = -1; 726 - goto out; 727 - } 728 674 729 675 orig_insn = find_insn(file, special_alt->orig_sec, 730 676 special_alt->orig_off); ··· 728 686 ret = -1; 729 687 goto out; 730 688 } 689 + 690 + /* Ignore retpoline alternatives. */ 691 + if (orig_insn->ignore_alts) 692 + continue; 731 693 732 694 new_insn = NULL; 733 695 if (!special_alt->group || special_alt->new_len) { ··· 756 710 &new_insn); 757 711 if (ret) 758 712 goto out; 713 + } 714 + 715 + alt = malloc(sizeof(*alt)); 716 + if (!alt) { 717 + WARN("malloc failed"); 718 + ret = -1; 719 + goto out; 759 720 } 760 721 761 722 alt->insn = new_insn; ··· 1080 1027 return ret; 1081 1028 1082 1029 add_ignores(file); 1030 + 1031 + ret = add_nospec_ignores(file); 1032 + if (ret) 1033 + return ret; 1083 1034 1084 1035 ret = add_jump_destinations(file); 1085 1036 if (ret)
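How the .discard.nospec handshake above works: the kernel's retpoline macros emit a self-relative pointer to the start of each retpoline alternative into a section that is discarded at link time; add_nospec_ignores() walks the matching .rela.discard.nospec relocations, resolves each addend back to an instruction with find_insn(), and sets ignore_alts so the alternative-processing loop skips it. The producer side looks roughly like this (from asm/nospec-branch.h in this series; reproduced from memory, treat details as approximate):

	#define ANNOTATE_NOSPEC_ALTERNATIVE			\
		"999:\n\t"					\
		".pushsection .discard.nospec\n\t"		\
		".long 999b - .\n\t"				\
		".popsection\n\t"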
+1 -1
tools/objtool/check.h
··· 44 44 unsigned int len; 45 45 unsigned char type; 46 46 unsigned long immediate; 47 - bool alt_group, visited, dead_end, ignore, hint, save, restore; 47 + bool alt_group, visited, dead_end, ignore, hint, save, restore, ignore_alts; 48 48 struct symbol *call_dest; 49 49 struct instruction *jump_dest; 50 50 struct list_head alts;
+1 -1
tools/testing/selftests/x86/Makefile
··· 7 7 8 8 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_syscall test_mremap_vdso \ 9 9 check_initial_reg_state sigreturn ldt_gdt iopl mpx-mini-test ioperm \ 10 - protection_keys test_vdso 10 + protection_keys test_vdso test_vsyscall 11 11 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \ 12 12 test_FCMOV test_FCOMI test_FISTTP \ 13 13 vdso_restorer
+500
tools/testing/selftests/x86/test_vsyscall.c
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */
2 +
3 + #define _GNU_SOURCE
4 +
5 + #include <stdio.h>
6 + #include <sys/time.h>
7 + #include <time.h>
8 + #include <stdlib.h>
9 + #include <sys/syscall.h>
10 + #include <unistd.h>
11 + #include <dlfcn.h>
12 + #include <string.h>
13 + #include <inttypes.h>
14 + #include <signal.h>
15 + #include <sys/ucontext.h>
16 + #include <errno.h>
17 + #include <err.h>
18 + #include <sched.h>
19 + #include <stdbool.h>
20 + #include <setjmp.h>
21 +
22 + #ifdef __x86_64__
23 + # define VSYS(x) (x)
24 + #else
25 + # define VSYS(x) 0
26 + #endif
27 +
28 + #ifndef SYS_getcpu
29 + # ifdef __x86_64__
30 + # define SYS_getcpu 309
31 + # else
32 + # define SYS_getcpu 318
33 + # endif
34 + #endif
35 +
36 + static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
37 + int flags)
38 + {
39 + struct sigaction sa;
40 + memset(&sa, 0, sizeof(sa));
41 + sa.sa_sigaction = handler;
42 + sa.sa_flags = SA_SIGINFO | flags;
43 + sigemptyset(&sa.sa_mask);
44 + if (sigaction(sig, &sa, 0))
45 + err(1, "sigaction");
46 + }
47 +
48 + /* vsyscalls and vDSO */
49 + bool should_read_vsyscall = false;
50 +
51 + typedef long (*gtod_t)(struct timeval *tv, struct timezone *tz);
52 + gtod_t vgtod = (gtod_t)VSYS(0xffffffffff600000);
53 + gtod_t vdso_gtod;
54 +
55 + typedef int (*vgettime_t)(clockid_t, struct timespec *);
56 + vgettime_t vdso_gettime;
57 +
58 + typedef long (*time_func_t)(time_t *t);
59 + time_func_t vtime = (time_func_t)VSYS(0xffffffffff600400);
60 + time_func_t vdso_time;
61 +
62 + typedef long (*getcpu_t)(unsigned *, unsigned *, void *);
63 + getcpu_t vgetcpu = (getcpu_t)VSYS(0xffffffffff600800);
64 + getcpu_t vdso_getcpu;
65 +
66 + static void init_vdso(void)
67 + {
68 + void *vdso = dlopen("linux-vdso.so.1", RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
69 + if (!vdso)
70 + vdso = dlopen("linux-gate.so.1", RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD);
71 + if (!vdso) {
72 + printf("[WARN]\tfailed to find vDSO\n");
73 + return;
74 + }
75 +
76 + vdso_gtod = (gtod_t)dlsym(vdso, "__vdso_gettimeofday");
77 + if (!vdso_gtod)
78 + printf("[WARN]\tfailed to find gettimeofday in vDSO\n");
79 +
80 + vdso_gettime = (vgettime_t)dlsym(vdso, "__vdso_clock_gettime");
81 + if (!vdso_gettime)
82 + printf("[WARN]\tfailed to find clock_gettime in vDSO\n");
83 +
84 + vdso_time = (time_func_t)dlsym(vdso, "__vdso_time");
85 + if (!vdso_time)
86 + printf("[WARN]\tfailed to find time in vDSO\n");
87 +
88 + vdso_getcpu = (getcpu_t)dlsym(vdso, "__vdso_getcpu");
89 + if (!vdso_getcpu) {
90 + /* getcpu() was never wired up in the 32-bit vDSO. */
91 + printf("[%s]\tfailed to find getcpu in vDSO\n",
92 + sizeof(long) == 8 ? "WARN" : "NOTE");
93 + }
94 + }
95 +
96 + static int init_vsys(void)
97 + {
98 + #ifdef __x86_64__
99 + int nerrs = 0;
100 + FILE *maps;
101 + char line[128];
102 + bool found = false;
103 +
104 + maps = fopen("/proc/self/maps", "r");
105 + if (!maps) {
106 + printf("[WARN]\tCould not open /proc/self/maps -- assuming vsyscall is r-x\n");
107 + should_read_vsyscall = true;
108 + return 0;
109 + }
110 +
111 + while (fgets(line, sizeof(line), maps)) {
112 + char r, x;
113 + void *start, *end;
114 + char name[128];
115 + if (sscanf(line, "%p-%p %c-%cp %*x %*x:%*x %*u %s",
116 + &start, &end, &r, &x, name) != 5)
117 + continue;
118 +
119 + if (strcmp(name, "[vsyscall]"))
120 + continue;
121 +
122 + printf("\tvsyscall map: %s", line);
123 +
124 + if (start != (void *)0xffffffffff600000 ||
125 + end != (void *)0xffffffffff601000) {
126 + printf("[FAIL]\taddress range is nonsense\n");
127 + nerrs++;
128 + }
129 +
130 + printf("\tvsyscall permissions are %c-%c\n", r, x);
131 + should_read_vsyscall = (r == 'r');
132 + if (x != 'x') {
133 + vgtod = NULL;
134 + vtime = NULL;
135 + vgetcpu = NULL;
136 + }
137 +
138 + found = true;
139 + break;
140 + }
141 +
142 + fclose(maps);
143 +
144 + if (!found) {
145 + printf("\tno vsyscall map in /proc/self/maps\n");
146 + should_read_vsyscall = false;
147 + vgtod = NULL;
148 + vtime = NULL;
149 + vgetcpu = NULL;
150 + }
151 +
152 + return nerrs;
153 + #else
154 + return 0;
155 + #endif
156 + }
157 +
158 + /* syscalls */
159 + static inline long sys_gtod(struct timeval *tv, struct timezone *tz)
160 + {
161 + return syscall(SYS_gettimeofday, tv, tz);
162 + }
163 +
164 + static inline int sys_clock_gettime(clockid_t id, struct timespec *ts)
165 + {
166 + return syscall(SYS_clock_gettime, id, ts);
167 + }
168 +
169 + static inline long sys_time(time_t *t)
170 + {
171 + return syscall(SYS_time, t);
172 + }
173 +
174 + static inline long sys_getcpu(unsigned * cpu, unsigned * node,
175 + void* cache)
176 + {
177 + return syscall(SYS_getcpu, cpu, node, cache);
178 + }
179 +
180 + static jmp_buf jmpbuf;
181 +
182 + static void sigsegv(int sig, siginfo_t *info, void *ctx_void)
183 + {
184 + siglongjmp(jmpbuf, 1);
185 + }
186 +
187 + static double tv_diff(const struct timeval *a, const struct timeval *b)
188 + {
189 + return (double)(a->tv_sec - b->tv_sec) +
190 + (double)((int)a->tv_usec - (int)b->tv_usec) * 1e-6;
191 + }
192 +
193 + static int check_gtod(const struct timeval *tv_sys1,
194 + const struct timeval *tv_sys2,
195 + const struct timezone *tz_sys,
196 + const char *which,
197 + const struct timeval *tv_other,
198 + const struct timezone *tz_other)
199 + {
200 + int nerrs = 0;
201 + double d1, d2;
202 +
203 + if (tz_other && (tz_sys->tz_minuteswest != tz_other->tz_minuteswest || tz_sys->tz_dsttime != tz_other->tz_dsttime)) {
204 + printf("[FAIL] %s tz mismatch\n", which);
205 + nerrs++;
206 + }
207 +
208 + d1 = tv_diff(tv_other, tv_sys1);
209 + d2 = tv_diff(tv_sys2, tv_other);
210 + printf("\t%s time offsets: %lf %lf\n", which, d1, d2);
211 +
212 + if (d1 < 0 || d2 < 0) {
213 + printf("[FAIL]\t%s time was inconsistent with the syscall\n", which);
214 + nerrs++;
215 + } else {
216 + printf("[OK]\t%s gettimeofday()'s timeval was okay\n", which);
217 + }
218 +
219 + return nerrs;
220 + }
221 +
222 + static int test_gtod(void)
223 + {
224 + struct timeval tv_sys1, tv_sys2, tv_vdso, tv_vsys;
225 + struct timezone tz_sys, tz_vdso, tz_vsys;
226 + long ret_vdso = -1;
227 + long ret_vsys = -1;
228 + int nerrs = 0;
229 +
230 + printf("[RUN]\ttest gettimeofday()\n");
231 +
232 + if (sys_gtod(&tv_sys1, &tz_sys) != 0)
233 + err(1, "syscall gettimeofday");
234 + if (vdso_gtod)
235 + ret_vdso = vdso_gtod(&tv_vdso, &tz_vdso);
236 + if (vgtod)
237 + ret_vsys = vgtod(&tv_vsys, &tz_vsys);
238 + if (sys_gtod(&tv_sys2, &tz_sys) != 0)
239 + err(1, "syscall gettimeofday");
240 +
241 + if (vdso_gtod) {
242 + if (ret_vdso == 0) {
243 + nerrs += check_gtod(&tv_sys1, &tv_sys2, &tz_sys, "vDSO", &tv_vdso, &tz_vdso);
244 + } else {
245 + printf("[FAIL]\tvDSO gettimeofday() failed: %ld\n", ret_vdso);
246 + nerrs++;
247 + }
248 + }
249 +
250 + if (vgtod) {
251 + if (ret_vsys == 0) {
252 + nerrs += check_gtod(&tv_sys1, &tv_sys2, &tz_sys, "vsyscall", &tv_vsys, &tz_vsys);
253 + } else {
254 + printf("[FAIL]\tvsys gettimeofday() failed: %ld\n", ret_vsys);
255 + nerrs++;
256 + }
257 + }
258 +
259 + return nerrs;
260 + }
261 +
262 + static int test_time(void) {
263 + int nerrs = 0;
264 +
265 + printf("[RUN]\ttest time()\n");
266 + long t_sys1, t_sys2, t_vdso = 0, t_vsys = 0;
267 + long t2_sys1 = -1, t2_sys2 = -1, t2_vdso = -1, t2_vsys = -1;
268 + t_sys1 = sys_time(&t2_sys1);
269 + if (vdso_time)
270 + t_vdso = vdso_time(&t2_vdso);
271 + if (vtime)
272 + t_vsys = vtime(&t2_vsys);
273 + t_sys2 = sys_time(&t2_sys2);
274 + if (t_sys1 < 0 || t_sys1 != t2_sys1 || t_sys2 < 0 || t_sys2 != t2_sys2) {
275 + printf("[FAIL]\tsyscall failed (ret1:%ld output1:%ld ret2:%ld output2:%ld)\n", t_sys1, t2_sys1, t_sys2, t2_sys2);
276 + nerrs++;
277 + return nerrs;
278 + }
279 +
280 + if (vdso_time) {
281 + if (t_vdso < 0 || t_vdso != t2_vdso) {
282 + printf("[FAIL]\tvDSO failed (ret:%ld output:%ld)\n", t_vdso, t2_vdso);
283 + nerrs++;
284 + } else if (t_vdso < t_sys1 || t_vdso > t_sys2) {
285 + printf("[FAIL]\tvDSO returned the wrong time (%ld %ld %ld)\n", t_sys1, t_vdso, t_sys2);
286 + nerrs++;
287 + } else {
288 + printf("[OK]\tvDSO time() is okay\n");
289 + }
290 + }
291 +
292 + if (vtime) {
293 + if (t_vsys < 0 || t_vsys != t2_vsys) {
294 + printf("[FAIL]\tvsyscall failed (ret:%ld output:%ld)\n", t_vsys, t2_vsys);
295 + nerrs++;
296 + } else if (t_vsys < t_sys1 || t_vsys > t_sys2) {
297 + printf("[FAIL]\tvsyscall returned the wrong time (%ld %ld %ld)\n", t_sys1, t_vsys, t_sys2);
298 + nerrs++;
299 + } else {
300 + printf("[OK]\tvsyscall time() is okay\n");
301 + }
302 + }
303 +
304 + return nerrs;
305 + }
306 +
307 + static int test_getcpu(int cpu)
308 + {
309 + int nerrs = 0;
310 + long ret_sys, ret_vdso = -1, ret_vsys = -1;
311 +
312 + printf("[RUN]\tgetcpu() on CPU %d\n", cpu);
313 +
314 + cpu_set_t cpuset;
315 + CPU_ZERO(&cpuset);
316 + CPU_SET(cpu, &cpuset);
317 + if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) {
318 + printf("[SKIP]\tfailed to force CPU %d\n", cpu);
319 + return nerrs;
320 + }
321 +
322 + unsigned cpu_sys, cpu_vdso, cpu_vsys, node_sys, node_vdso, node_vsys;
323 + unsigned node = 0;
324 + bool have_node = false;
325 + ret_sys = sys_getcpu(&cpu_sys, &node_sys, 0);
326 + if (vdso_getcpu)
327 + ret_vdso = vdso_getcpu(&cpu_vdso, &node_vdso, 0);
328 + if (vgetcpu)
329 + ret_vsys = vgetcpu(&cpu_vsys, &node_vsys, 0);
330 +
331 + if (ret_sys == 0) {
332 + if (cpu_sys != cpu) {
333 + printf("[FAIL]\tsyscall reported CPU %hu but should be %d\n", cpu_sys, cpu);
334 + nerrs++;
335 + }
336 +
337 + have_node = true;
338 + node = node_sys;
339 + }
340 +
341 + if (vdso_getcpu) {
342 + if (ret_vdso) {
343 + printf("[FAIL]\tvDSO getcpu() failed\n");
344 + nerrs++;
345 + } else {
346 + if (!have_node) {
347 + have_node = true;
348 + node = node_vdso;
349 + }
350 +
351 + if (cpu_vdso != cpu) {
352 + printf("[FAIL]\tvDSO reported CPU %hu but should be %d\n", cpu_vdso, cpu);
353 + nerrs++;
354 + } else {
355 + printf("[OK]\tvDSO reported correct CPU\n");
356 + }
357 +
358 + if (node_vdso != node) {
359 + printf("[FAIL]\tvDSO reported node %hu but should be %hu\n", node_vdso, node);
360 + nerrs++;
361 + } else {
362 + printf("[OK]\tvDSO reported correct node\n");
363 + }
364 + }
365 + }
366 +
367 + if (vgetcpu) {
368 + if (ret_vsys) {
369 + printf("[FAIL]\tvsyscall getcpu() failed\n");
370 + nerrs++;
371 + } else {
372 + if (!have_node) {
373 + have_node = true;
374 + node = node_vsys;
375 + }
376 +
377 + if (cpu_vsys != cpu) {
378 + printf("[FAIL]\tvsyscall reported CPU %hu but should be %d\n", cpu_vsys, cpu);
379 + nerrs++;
380 + } else {
381 + printf("[OK]\tvsyscall reported correct CPU\n");
382 + }
383 +
384 + if (node_vsys != node) {
385 + printf("[FAIL]\tvsyscall reported node %hu but should be %hu\n", node_vsys, node);
386 + nerrs++;
387 + } else {
388 + printf("[OK]\tvsyscall reported correct node\n");
389 + }
390 + }
391 + }
392 +
393 + return nerrs;
394 + }
395 +
396 + static int test_vsys_r(void)
397 + {
398 + #ifdef __x86_64__
399 + printf("[RUN]\tChecking read access to the vsyscall page\n");
400 + bool can_read;
401 + if (sigsetjmp(jmpbuf, 1) == 0) {
402 + *(volatile int *)0xffffffffff600000;
403 + can_read = true;
404 + } else {
405 + can_read = false;
406 + }
407 +
408 + if (can_read && !should_read_vsyscall) {
409 + printf("[FAIL]\tWe have read access, but we shouldn't\n");
410 + return 1;
411 + } else if (!can_read && should_read_vsyscall) {
412 + printf("[FAIL]\tWe don't have read access, but we should\n");
413 + return 1;
414 + } else {
415 + printf("[OK]\tgot expected result\n");
416 + }
417 + #endif
418 +
419 + return 0;
420 + }
421 +
422 +
423 + #ifdef __x86_64__
424 + #define X86_EFLAGS_TF (1UL << 8)
425 + static volatile sig_atomic_t num_vsyscall_traps;
426 +
427 + static unsigned long get_eflags(void)
428 + {
429 + unsigned long eflags;
430 + asm volatile ("pushfq\n\tpopq %0" : "=rm" (eflags));
431 + return eflags;
432 + }
433 +
434 + static void set_eflags(unsigned long eflags)
435 + {
436 + asm volatile ("pushq %0\n\tpopfq" : : "rm" (eflags) : "flags");
437 + }
438 +
439 + static void sigtrap(int sig, siginfo_t *info, void *ctx_void)
440 + {
441 + ucontext_t *ctx = (ucontext_t *)ctx_void;
442 + unsigned long ip = ctx->uc_mcontext.gregs[REG_RIP];
443 +
444 + if (((ip ^ 0xffffffffff600000UL) & ~0xfffUL) == 0)
445 + num_vsyscall_traps++;
446 + }
447 +
448 + static int test_native_vsyscall(void)
449 + {
450 + time_t tmp;
451 + bool is_native;
452 +
453 + if (!vtime)
454 + return 0;
455 +
456 + printf("[RUN]\tchecking for native vsyscall\n");
457 + sethandler(SIGTRAP, sigtrap, 0);
458 + set_eflags(get_eflags() | X86_EFLAGS_TF);
459 + vtime(&tmp);
460 + set_eflags(get_eflags() & ~X86_EFLAGS_TF);
461 +
462 + /*
463 + * If vsyscalls are emulated, we expect a single trap in the
464 + * vsyscall page -- the call instruction will trap with RIP
465 + * pointing to the entry point before emulation takes over.
466 + * In native mode, we expect two traps, since whatever code
467 + * the vsyscall page contains will be more than just a ret
468 + * instruction.
469 + */
470 + is_native = (num_vsyscall_traps > 1);
471 +
472 + printf("\tvsyscalls are %s (%d instructions in vsyscall page)\n",
473 + (is_native ? "native" : "emulated"),
474 + (int)num_vsyscall_traps);
475 +
476 + return 0;
477 + }
478 + #endif
479 +
480 + int main(int argc, char **argv)
481 + {
482 + int nerrs = 0;
483 +
484 + init_vdso();
485 + nerrs += init_vsys();
486 +
487 + nerrs += test_gtod();
488 + nerrs += test_time();
489 + nerrs += test_getcpu(0);
490 + nerrs += test_getcpu(1);
491 +
492 + sethandler(SIGSEGV, sigsegv, 0);
493 + nerrs += test_vsys_r();
494 +
495 + #ifdef __x86_64__
496 + nerrs += test_native_vsyscall();
497 + #endif
498 +
499 + return nerrs ? 1 : 0;
500 + }