Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'x86-boot-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Ingo Molnar:

- A large and involved preparatory series that paves the way for adding
exception handling to relocate_kernel. That facility is a debugging
aid that has already helped in the field with an exceptionally
hard-to-debug early boot bug. Plus assorted cleanups and fixes
discovered along the way, by David Woodhouse:

- Clean up and document register use in relocate_kernel_64.S
- Use named labels in swap_pages in relocate_kernel_64.S
- Only swap pages for ::preserve_context mode
- Allocate PGD for x86_64 transition page tables separately
- Copy control page into place in machine_kexec_prepare()
- Invoke copy of relocate_kernel() instead of the original
- Move relocate_kernel to kernel .data section
- Add data section to relocate_kernel
- Drop page_list argument from relocate_kernel()
- Eliminate writes through kernel mapping of relocate_kernel page
- Clean up register usage in relocate_kernel()
- Mark relocate_kernel page as ROX instead of RWX
- Disable global pages before writing to control page
- Ensure preserve_context flag is set on return to kernel
- Use correct swap page in swap_pages function
- Fix stack and handling of re-entry point for ::preserve_context
- Mark machine_kexec() with __nocfi
- Cope with relocate_kernel() not being at the start of the page
- Use typedef for relocate_kernel_fn function prototype
- Fix location of relocate_kernel with -ffunction-sections (fix by Nathan Chancellor)

- A series to remove the last remaining absolute symbol references from
.head.text, and enforce this at build time, by Ard Biesheuvel:

- Avoid WARN()s and panic()s in early boot code
- Don't hang but terminate on failure to remap SVSM CA
- Determine VA/PA offset before entering C code
- Avoid intentional absolute symbol references in .head.text
- Disable UBSAN in early boot code
- Move ENTRY_TEXT to the start of the image
- Move .head.text into its own output section
- Reject absolute references in .head.text

- The above build-time enforcement uncovered a handful of bugs in
essentially non-working code, and a workaround for a toolchain bug,
fixed by Ard Biesheuvel as well:

- Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
- Disable UBSAN on SEV code that may execute very early
- Disable ftrace branch profiling in SEV startup code

- And miscellaneous cleanups:

- kexec_core: Add and update comments regarding the KEXEC_JUMP flow (Rafael J. Wysocki)
- x86/sysfs: Constify 'struct bin_attribute' (Thomas Weißschuh)

* tag 'x86-boot-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
x86/sev: Disable ftrace branch profiling in SEV startup code
x86/kexec: Use typedef for relocate_kernel_fn function prototype
x86/kexec: Cope with relocate_kernel() not being at the start of the page
kexec_core: Add and update comments regarding the KEXEC_JUMP flow
x86/kexec: Mark machine_kexec() with __nocfi
x86/kexec: Fix location of relocate_kernel with -ffunction-sections
x86/kexec: Fix stack and handling of re-entry point for ::preserve_context
x86/kexec: Use correct swap page in swap_pages function
x86/kexec: Ensure preserve_context flag is set on return to kernel
x86/kexec: Disable global pages before writing to control page
x86/sev: Don't hang but terminate on failure to remap SVSM CA
x86/sev: Disable UBSAN on SEV code that may execute very early
x86/boot/64: Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
x86/sysfs: Constify 'struct bin_attribute'
x86/kexec: Mark relocate_kernel page as ROX instead of RWX
x86/kexec: Clean up register usage in relocate_kernel()
x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page
x86/kexec: Drop page_list argument from relocate_kernel()
x86/kexec: Add data section to relocate_kernel
x86/kexec: Move relocate_kernel to kernel .data section
...

+318 -225
+3
arch/x86/coco/sev/Makefile
···
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.
 KCSAN_SANITIZE := n
+
+# Clang 14 and older may fail to respect __no_sanitize_undefined when inlining
+UBSAN_SANITIZE := n
+6 -9
arch/x86/coco/sev/core.c
···

 #define pr_fmt(fmt) "SEV: " fmt

+#define DISABLE_BRANCH_PROFILING
+
 #include <linux/sched/debug.h> /* For show_regs() */
 #include <linux/percpu-defs.h>
 #include <linux/cc_platform.h>
···

 		val = sev_es_rd_ghcb_msr();

-		if (WARN(GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP,
-			 "Wrong PSC response code: 0x%x\n",
-			 (unsigned int)GHCB_RESP_CODE(val)))
+		if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)
 			goto e_term;

-		if (WARN(GHCB_MSR_PSC_RESP_VAL(val),
-			 "Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",
-			 op == SNP_PAGE_STATE_PRIVATE ? "private" : "shared",
-			 paddr, GHCB_MSR_PSC_RESP_VAL(val)))
+		if (GHCB_MSR_PSC_RESP_VAL(val))
 			goto e_term;

 		/* Page validation must be performed after changing to private */
···
 	early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE);
 }

-void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
 					unsigned long npages)
 {
 	/*
···
 	call.rcx = pa;
 	ret = svsm_perform_call_protocol(&call);
 	if (ret)
-		panic("Can't remap the SVSM CA, ret=%d, rax_out=0x%llx\n", ret, call.rax_out);
+		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CA_REMAP_FAIL);

 	RIP_REL_REF(boot_svsm_caa) = (struct svsm_ca *)pa;
 	RIP_REL_REF(boot_svsm_caa_pa) = pa;
+9 -7
arch/x86/coco/sev/shared.c
···
  *
  * Return: XSAVE area size on success, 0 otherwise.
  */
-static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
+static u32 __head snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
 {
 	const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
 	u64 xfeatures_found = 0;
···
 	sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
 }

-static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
-				 struct cpuid_leaf *leaf)
+static int __head
+snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+		      struct cpuid_leaf *leaf)
 {
 	struct cpuid_leaf leaf_hv = *leaf;

···
 	__pval_terminate(pfn, action, page_size, ret, svsm_ret);
 }

-static void svsm_pval_4k_page(unsigned long paddr, bool validate)
+static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
 {
 	struct svsm_pvalidate_call *pc;
 	struct svsm_call call = {};
···

 	ret = svsm_perform_call_protocol(&call);
 	if (ret)
-		svsm_pval_terminate(pc, ret, call.rax_out);
+		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);

 	native_local_irq_restore(flags);
 }

-static void pvalidate_4k_page(unsigned long vaddr, unsigned long paddr, bool validate)
+static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
+				     bool validate)
 {
 	int ret;

···
 	} else {
 		ret = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
 		if (ret)
-			__pval_terminate(PHYS_PFN(paddr), validate, RMP_PG_SIZE_4K, ret, 0);
+			sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
 	}
 }

+1 -1
arch/x86/include/asm/init.h
···
 #ifndef _ASM_X86_INIT_H
 #define _ASM_X86_INIT_H

-#define __head __section(".head.text")
+#define __head __section(".head.text") __no_sanitize_undefined

 struct x86_mapping_info {
 	void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
+31 -23
arch/x86/include/asm/kexec.h
···
 # define PA_PGD 2
 # define PA_SWAP_PAGE 3
 # define PAGES_NR 4
-#else
-# define PA_CONTROL_PAGE 0
-# define VA_CONTROL_PAGE 1
-# define PA_TABLE_PAGE 2
-# define PA_SWAP_PAGE 3
-# define PAGES_NR 4
 #endif

+# define KEXEC_CONTROL_PAGE_SIZE 4096
 # define KEXEC_CONTROL_CODE_MAX_SIZE 2048

 #ifndef __ASSEMBLY__
···
 /* Maximum address we can use for the control code buffer */
 # define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE

-# define KEXEC_CONTROL_PAGE_SIZE 4096

 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_386
···
 /* Maximum address we can use for the control pages */
 # define KEXEC_CONTROL_MEMORY_LIMIT (MAXMEM-1)

-/* Allocate one page for the pdp and the second for the code */
-# define KEXEC_CONTROL_PAGE_SIZE (4096UL + 4096UL)
-
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
+
+extern unsigned long kexec_va_control_page;
+extern unsigned long kexec_pa_table_page;
+extern unsigned long kexec_pa_swap_page;
 #endif

 /*
···
 }

 #ifdef CONFIG_X86_32
-asmlinkage unsigned long
-relocate_kernel(unsigned long indirection_page,
-		unsigned long control_page,
-		unsigned long start_address,
-		unsigned int has_pae,
-		unsigned int preserve_context);
+typedef asmlinkage unsigned long
+relocate_kernel_fn(unsigned long indirection_page,
+		   unsigned long control_page,
+		   unsigned long start_address,
+		   unsigned int has_pae,
+		   unsigned int preserve_context);
 #else
-unsigned long
-relocate_kernel(unsigned long indirection_page,
-		unsigned long page_list,
-		unsigned long start_address,
-		unsigned int preserve_context,
-		unsigned int host_mem_enc_active);
+typedef unsigned long
+relocate_kernel_fn(unsigned long indirection_page,
+		   unsigned long pa_control_page,
+		   unsigned long start_address,
+		   unsigned int preserve_context,
+		   unsigned int host_mem_enc_active);
 #endif
-
+extern relocate_kernel_fn relocate_kernel;
 #define ARCH_HAS_KIMAGE_ARCH

 #ifdef CONFIG_X86_32
···
 };
 #else
 struct kimage_arch {
+	/*
+	 * This is a kimage control page, as it must not overlap with either
+	 * source or destination address ranges.
+	 */
+	pgd_t *pgd;
+	/*
+	 * The virtual mapping of the control code page itself is used only
+	 * during the transition, while the current kernel's pages are all
+	 * in place. Thus the intermediate page table pages used to map it
+	 * are not control pages, but instead just normal pages obtained
+	 * with get_zeroed_page(). And have to be tracked (below) so that
+	 * they can be freed.
+	 */
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
+1
arch/x86/include/asm/sections.h
···
 #include <asm-generic/sections.h>
 #include <asm/extable.h>

+extern char __relocate_kernel_start[], __relocate_kernel_end[];
 extern char __brk_base[], __brk_limit[];
 extern char __end_rodata_aligned[];

+1 -1
arch/x86/include/asm/setup.h
···

 extern void reserve_standard_io_resources(void);
 extern void i386_reserve_resources(void);
-extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
+extern unsigned long __startup_64(unsigned long p2v_offset, struct boot_params *bp);
 extern void startup_64_setup_gdt_idt(void);
 extern void early_setup_idt(void);
 extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
+1
arch/x86/include/asm/sev-common.h
···
 #define GHCB_TERM_SVSM_VMPL0 8 /* SVSM is present but has set VMPL to 0 */
 #define GHCB_TERM_SVSM_CAA 9 /* SVSM is present but CAA is not page aligned */
 #define GHCB_TERM_SECURE_TSC 10 /* Secure TSC initialization failed */
+#define GHCB_TERM_SVSM_CA_REMAP_FAIL 11 /* SVSM is present but CA could not be remapped */

 #define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)

+6
arch/x86/kernel/callthunks.c
···
 	return true;
 #endif
 #ifdef CONFIG_KEXEC_CORE
+# ifdef CONFIG_X86_64
+	if (dest >= (void *)__relocate_kernel_start &&
+	    dest < (void *)__relocate_kernel_end)
+		return true;
+# else
 	if (dest >= (void *)relocate_kernel &&
 	    dest < (void*)relocate_kernel + KEXEC_CONTROL_CODE_MAX_SIZE)
 		return true;
+# endif
 #endif
 	return false;
 }
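
The callthunks.c hunk above treats the relocation code as a half-open address interval bounded by two linker symbols. A minimal user-space sketch of that test follows. `in_half_open_range()` is a hypothetical helper written for this illustration, not a kernel function, and it compares through `uintptr_t` so the comparison stays well defined for arbitrary pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when dest lies in [start, end).
 * The kernel hunk performs the equivalent comparison against
 * __relocate_kernel_start and __relocate_kernel_end.
 */
bool in_half_open_range(const void *dest, const void *start, const void *end)
{
	uintptr_t d = (uintptr_t)dest;
	uintptr_t s = (uintptr_t)start;
	uintptr_t e = (uintptr_t)end;

	assert(s <= e);	/* a reversed range would be a caller bug */
	return d >= s && d < e;
}
```

Because the upper bound is exclusive, an address equal to the end symbol is not considered part of the region, which matches the `dest < (void *)__relocate_kernel_end` comparison in the hunk.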
+24 -16
arch/x86/kernel/head64.c
···
 	return true;
 }

-static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
+static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
+						    pmdval_t *pmd,
+						    unsigned long p2v_offset)
 {
-	unsigned long vaddr, vaddr_end;
+	unsigned long paddr, paddr_end;
 	int i;

 	/* Encrypt the kernel and related (if SME is active) */
···
 	 * attribute.
 	 */
 	if (sme_get_me_mask()) {
-		vaddr = (unsigned long)__start_bss_decrypted;
-		vaddr_end = (unsigned long)__end_bss_decrypted;
+		paddr = (unsigned long)&RIP_REL_REF(__start_bss_decrypted);
+		paddr_end = (unsigned long)&RIP_REL_REF(__end_bss_decrypted);

-		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+		for (; paddr < paddr_end; paddr += PMD_SIZE) {
 			/*
 			 * On SNP, transition the page to shared in the RMP table so that
 			 * it is consistent with the page table attribute change.
···
 			 * mapping (kernel .text). PVALIDATE, by way of
 			 * early_snp_set_memory_shared(), requires a valid virtual
 			 * address but the kernel is currently running off of the identity
-			 * mapping so use __pa() to get a *currently* valid virtual address.
+			 * mapping so use the PA to get a *currently* valid virtual address.
 			 */
-			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
+			early_snp_set_memory_shared(paddr, paddr, PTRS_PER_PMD);

-			i = pmd_index(vaddr);
+			i = pmd_index(paddr - p2v_offset);
 			pmd[i] -= sme_get_me_mask();
 		}
 	}
···
  * doesn't have to generate PC-relative relocations when accessing globals from
  * that function. Clang actually does not generate them, which leads to
  * boot-time crashes. To work around this problem, every global pointer must
- * be accessed using RIP_REL_REF().
+ * be accessed using RIP_REL_REF(). Kernel virtual addresses can be determined
+ * by subtracting p2v_offset from the RIP-relative address.
  */
-unsigned long __head __startup_64(unsigned long physaddr,
+unsigned long __head __startup_64(unsigned long p2v_offset,
 				  struct boot_params *bp)
 {
 	pmd_t (*early_pgts)[PTRS_PER_PMD] = RIP_REL_REF(early_dynamic_pgts);
+	unsigned long physaddr = (unsigned long)&RIP_REL_REF(_text);
+	unsigned long va_text, va_end;
 	unsigned long pgtable_flags;
 	unsigned long load_delta;
 	pgdval_t *pgd;
···
 	 * Compute the delta between the address I am compiled to run at
 	 * and the address I am actually running at.
 	 */
-	load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+	load_delta = __START_KERNEL_map + p2v_offset;
 	RIP_REL_REF(phys_base) = load_delta;

 	/* Is the address not 2M aligned? */
 	if (load_delta & ~PMD_MASK)
 		for (;;);
+
+	va_text = physaddr - p2v_offset;
+	va_end = (unsigned long)&RIP_REL_REF(_end) - p2v_offset;

 	/* Include the SME encryption mask in the fixup value */
 	load_delta += sme_get_me_mask();
···
 	pgd = &RIP_REL_REF(early_top_pgt)->pgd;
 	pgd[pgd_index(__START_KERNEL_map)] += load_delta;

-	if (la57) {
+	if (IS_ENABLED(CONFIG_X86_5LEVEL) && la57) {
 		p4d = (p4dval_t *)&RIP_REL_REF(level4_kernel_pgt);
 		p4d[MAX_PTRS_PER_P4D - 1] += load_delta;

···
 	pmd_entry += sme_get_me_mask();
 	pmd_entry += physaddr;

-	for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
+	for (i = 0; i < DIV_ROUND_UP(va_end - va_text, PMD_SIZE); i++) {
 		int idx = i + (physaddr >> PMD_SHIFT);

 		pmd[idx % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE;
···
 	pmd = &RIP_REL_REF(level2_kernel_pgt)->pmd;

 	/* invalidate pages before the kernel image */
-	for (i = 0; i < pmd_index((unsigned long)_text); i++)
+	for (i = 0; i < pmd_index(va_text); i++)
 		pmd[i] &= ~_PAGE_PRESENT;

 	/* fixup pages that are part of the kernel image */
-	for (; i <= pmd_index((unsigned long)_end); i++)
+	for (; i <= pmd_index(va_end); i++)
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;

···
 	for (; i < PTRS_PER_PMD; i++)
 		pmd[i] &= ~_PAGE_PRESENT;

-	return sme_postprocess_startup(bp, pmd, p2v_offset);
+	return sme_postprocess_startup(bp, pmd);
+	return sme_postprocess_startup(bp, pmd, p2v_offset);
 }

 /* Wipe all early page tables except for the kernel symbol map */
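
The head64.c hunks above replace absolute symbol references with arithmetic on `p2v_offset`. The sketch below restates that arithmetic in plain user-space C so the relationships can be checked by hand. The helper names are hypothetical, and the example addresses (a kernel linked at 0xffffffff81000000 and loaded at 16 MiB or 48 MiB) are assumptions chosen for illustration; only the `__START_KERNEL_map` value and the formulas are taken from the kernel source and the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Value of __START_KERNEL_map on x86-64. */
#define START_KERNEL_MAP 0xffffffff80000000ULL
/* 2 MiB: the PMD size that the alignment check in __startup_64() relies on. */
#define PMD_SIZE_BYTES   (1ULL << 21)

/* Offset = where a symbol currently is minus where it was linked to run. */
uint64_t calc_p2v_offset(uint64_t phys_addr, uint64_t link_vaddr)
{
	return phys_addr - link_vaddr;	/* wraps modulo 2^64 by design */
}

/* "Kernel virtual addresses can be determined by subtracting p2v_offset". */
uint64_t phys_to_kernel_va(uint64_t phys_addr, uint64_t p2v_offset)
{
	return phys_addr - p2v_offset;
}

/* Mirrors: load_delta = __START_KERNEL_map + p2v_offset; */
uint64_t calc_load_delta(uint64_t p2v_offset)
{
	return START_KERNEL_MAP + p2v_offset;
}

/* Mirrors the check: if (load_delta & ~PMD_MASK) for (;;); */
int load_delta_is_2m_aligned(uint64_t load_delta)
{
	return (load_delta & (PMD_SIZE_BYTES - 1)) == 0;
}

/* Hypothetical self-check using the assumed example addresses. */
void p2v_self_check(void)
{
	const uint64_t link_text = 0xffffffff81000000ULL; /* assumed link address */
	uint64_t off = calc_p2v_offset(0x3000000ULL, link_text);

	assert(phys_to_kernel_va(0x3000000ULL, off) == link_text);
	assert(calc_load_delta(off) == 0x2000000ULL);
	assert(load_delta_is_2m_aligned(calc_load_delta(off)));
}
```

With these example numbers, loading the image 32 MiB above its assumed default physical location yields a `load_delta` of 0x2000000, the same result the removed expression `physaddr - (_text - __START_KERNEL_map)` would give.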
+9 -3
arch/x86/kernel/head_64.S
···
 	call verify_cpu

 	/*
+	 * Derive the kernel's physical-to-virtual offset from the physical and
+	 * virtual addresses of common_startup_64().
+	 */
+	leaq common_startup_64(%rip), %rdi
+	subq .Lcommon_startup_64(%rip), %rdi
+
+	/*
 	 * Perform pagetable fixups. Additionally, if SME is active, encrypt
 	 * the kernel and retrieve the modifier (SME encryption mask if SME
 	 * is active) to be added to the initial pgdir entry that will be
 	 * programmed into CR3.
 	 */
-	leaq _text(%rip), %rdi
 	movq %r15, %rsi
 	call __startup_64

···

 	/* Branch to the common startup code at its kernel virtual address */
 	ANNOTATE_RETPOLINE_SAFE
-	jmp *0f(%rip)
+	jmp *.Lcommon_startup_64(%rip)
 SYM_CODE_END(startup_64)

 	__INITRODATA
-0:	.quad common_startup_64
+SYM_DATA_LOCAL(.Lcommon_startup_64, .quad common_startup_64)

 	.text
 SYM_CODE_START(secondary_startup_64)
+9 -9
arch/x86/kernel/ksysfs.c
···
 static struct kobj_attribute boot_params_version_attr = __ATTR_RO(version);

 static ssize_t boot_params_data_read(struct file *fp, struct kobject *kobj,
-				     struct bin_attribute *bin_attr,
+				     const struct bin_attribute *bin_attr,
 				     char *buf, loff_t off, size_t count)
 {
 	memcpy(buf, (void *)&boot_params + off, count);
 	return count;
 }

-static struct bin_attribute boot_params_data_attr = {
+static const struct bin_attribute boot_params_data_attr = {
 	.attr = {
 		.name = "data",
 		.mode = S_IRUGO,
 	},
-	.read = boot_params_data_read,
+	.read_new = boot_params_data_read,
 	.size = sizeof(boot_params),
 };

···
 	NULL,
 };

-static struct bin_attribute *boot_params_data_attrs[] = {
+static const struct bin_attribute *const boot_params_data_attrs[] = {
 	&boot_params_data_attr,
 	NULL,
 };

 static const struct attribute_group boot_params_attr_group = {
 	.attrs = boot_params_version_attrs,
-	.bin_attrs = boot_params_data_attrs,
+	.bin_attrs_new = boot_params_data_attrs,
 };

 static int kobj_to_setup_data_nr(struct kobject *kobj, int *nr)
···

 static ssize_t setup_data_data_read(struct file *fp,
 				    struct kobject *kobj,
-				    struct bin_attribute *bin_attr,
+				    const struct bin_attribute *bin_attr,
 				    char *buf,
 				    loff_t off, size_t count)
 {
···
 		.name = "data",
 		.mode = S_IRUGO,
 	},
-	.read = setup_data_data_read,
+	.read_new = setup_data_data_read,
 };

 static struct attribute *setup_data_type_attrs[] = {
···
 	NULL,
 };

-static struct bin_attribute *setup_data_data_attrs[] = {
+static const struct bin_attribute *const setup_data_data_attrs[] = {
 	&data_attr,
 	NULL,
 };

 static const struct attribute_group setup_data_attr_group = {
 	.attrs = setup_data_type_attrs,
-	.bin_attrs = setup_data_data_attrs,
+	.bin_attrs_new = setup_data_data_attrs,
 };

 static int __init create_setup_data_node(struct kobject *parent,
+1 -6
arch/x86/kernel/machine_kexec_32.c
···
  */
 void machine_kexec(struct kimage *image)
 {
+	relocate_kernel_fn *relocate_kernel_ptr;
 	unsigned long page_list[PAGES_NR];
 	void *control_page;
 	int save_ftrace_enabled;
-	asmlinkage unsigned long
-	(*relocate_kernel_ptr)(unsigned long indirection_page,
-			       unsigned long control_page,
-			       unsigned long start_address,
-			       unsigned int has_pae,
-			       unsigned int preserve_context);

 #ifdef CONFIG_KEXEC_JUMP
 	if (image->preserve_context)
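
The machine_kexec_32.c hunk above swaps a spelled-out function pointer declaration for a pointer to a typedef'd function type. A minimal sketch of that C idiom in user space follows. All names here (`demo_relocate_fn`, `demo_relocate`, `call_through`) are hypothetical and only mimic the shape of `relocate_kernel_fn`; the parameter list is shortened for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A typedef that names a function type (not a pointer-to-function type). */
typedef unsigned long demo_relocate_fn(unsigned long indirection_page,
				       unsigned long control_page,
				       unsigned long start_address);

/* Declares a function of that type, like 'extern relocate_kernel_fn relocate_kernel;'. */
extern demo_relocate_fn demo_relocate;

/* The definition still has to spell out the parameters. */
unsigned long demo_relocate(unsigned long indirection_page,
			    unsigned long control_page,
			    unsigned long start_address)
{
	(void)indirection_page;
	(void)control_page;
	return start_address;	/* stand-in behaviour for the sketch */
}

/* A pointer to the function type replaces the long hand-written declarator. */
unsigned long call_through(demo_relocate_fn *fn, unsigned long start_address)
{
	assert(fn != NULL);
	return fn(0, 0, start_address);
}
```

Keeping the prototype in one typedef means the declaration and every pointer to it are checked against the same signature, so they cannot drift apart when an argument changes.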
+57 -38
arch/x86/kernel/machine_kexec_64.c
···
 	image->arch.pte = NULL;
 }

-static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
+static int init_transition_pgtable(struct kimage *image, pgd_t *pgd,
+				   unsigned long control_page)
 {
 	pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
 	unsigned long vaddr, paddr;
···
 	pmd_t *pmd;
 	pte_t *pte;

-	vaddr = (unsigned long)relocate_kernel;
-	paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+	/*
+	 * For the transition to the identity mapped page tables, the control
+	 * code page also needs to be mapped at the virtual address it starts
+	 * off running from.
+	 */
+	vaddr = (unsigned long)__va(control_page);
+	paddr = control_page;
 	pgd += pgd_index(vaddr);
 	if (!pgd_present(*pgd)) {
 		p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
···
 	return p;
 }

-static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
+static int init_pgtable(struct kimage *image, unsigned long control_page)
 {
 	struct x86_mapping_info info = {
 		.alloc_pgt_page = alloc_pgt_page,
···
 		.kernpg_flag = _KERNPG_TABLE_NOENC,
 	};
 	unsigned long mstart, mend;
-	pgd_t *level4p;
 	int result;
 	int i;

-	level4p = (pgd_t *)__va(start_pgtable);
-	clear_page(level4p);
+	image->arch.pgd = alloc_pgt_page(image);
+	if (!image->arch.pgd)
+		return -ENOMEM;

 	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
 		info.page_flag |= _PAGE_ENC;
···
 		mstart = pfn_mapped[i].start << PAGE_SHIFT;
 		mend = pfn_mapped[i].end << PAGE_SHIFT;

-		result = kernel_ident_mapping_init(&info,
-						   level4p, mstart, mend);
+		result = kernel_ident_mapping_init(&info, image->arch.pgd,
+						   mstart, mend);
 		if (result)
 			return result;
 	}
···
 		mstart = image->segment[i].mem;
 		mend = mstart + image->segment[i].memsz;

-		result = kernel_ident_mapping_init(&info,
-						   level4p, mstart, mend);
+		result = kernel_ident_mapping_init(&info, image->arch.pgd,
+						   mstart, mend);

 		if (result)
 			return result;
···
 	 * Prepare EFI systab and ACPI tables for kexec kernel since they are
 	 * not covered by pfn_mapped.
 	 */
-	result = map_efi_systab(&info, level4p);
+	result = map_efi_systab(&info, image->arch.pgd);
 	if (result)
 		return result;

-	result = map_acpi_tables(&info, level4p);
+	result = map_acpi_tables(&info, image->arch.pgd);
 	if (result)
 		return result;

-	return init_transition_pgtable(image, level4p);
+	/*
+	 * This must be last because the intermediate page table pages it
+	 * allocates will not be control pages and may overlap the image.
+	 */
+	return init_transition_pgtable(image, image->arch.pgd, control_page);
 }

 static void load_segments(void)
···

 int machine_kexec_prepare(struct kimage *image)
 {
-	unsigned long start_pgtable;
+	void *control_page = page_address(image->control_code_page);
+	unsigned long reloc_start = (unsigned long)__relocate_kernel_start;
+	unsigned long reloc_end = (unsigned long)__relocate_kernel_end;
 	int result;

-	/* Calculate the offsets */
-	start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
-
 	/* Setup the identity mapped 64bit page table */
-	result = init_pgtable(image, start_pgtable);
+	result = init_pgtable(image, __pa(control_page));
 	if (result)
 		return result;
+	kexec_va_control_page = (unsigned long)control_page;
+	kexec_pa_table_page = (unsigned long)__pa(image->arch.pgd);
+
+	if (image->type == KEXEC_TYPE_DEFAULT)
+		kexec_pa_swap_page = page_to_pfn(image->swap_page) << PAGE_SHIFT;
+
+	__memcpy(control_page, __relocate_kernel_start, reloc_end - reloc_start);
+
+	set_memory_rox((unsigned long)control_page, 1);

 	return 0;
 }

 void machine_kexec_cleanup(struct kimage *image)
 {
+	void *control_page = page_address(image->control_code_page);
+
+	set_memory_nx((unsigned long)control_page, 1);
+	set_memory_rw((unsigned long)control_page, 1);
+
 	free_transition_pgtable(image);
 }

···
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-void machine_kexec(struct kimage *image)
+void __nocfi machine_kexec(struct kimage *image)
 {
-	unsigned long page_list[PAGES_NR];
+	unsigned long reloc_start = (unsigned long)__relocate_kernel_start;
+	relocate_kernel_fn *relocate_kernel_ptr;
 	unsigned int host_mem_enc_active;
 	int save_ftrace_enabled;
 	void *control_page;
···
 #endif
 	}

-	control_page = page_address(image->control_code_page) + PAGE_SIZE;
-	__memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
+	control_page = page_address(image->control_code_page);

-	page_list[PA_CONTROL_PAGE] = virt_to_phys(control_page);
-	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
-	page_list[PA_TABLE_PAGE] =
-		(unsigned long)__pa(page_address(image->control_code_page));
-
-	if (image->type == KEXEC_TYPE_DEFAULT)
-		page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
-						<< PAGE_SHIFT);
+	/*
+	 * Allow for the possibility that relocate_kernel might not be at
+	 * the very start of the page.
+	 */
+	relocate_kernel_ptr = control_page + (unsigned long)relocate_kernel - reloc_start;

 	/*
 	 * The segment registers are funny things, they have both a
···
 	native_gdt_invalidate();

 	/* now call it */
-	image->start = relocate_kernel((unsigned long)image->head,
-				       (unsigned long)page_list,
-				       image->start,
-				       image->preserve_context,
-				       host_mem_enc_active);
+	image->start = relocate_kernel_ptr((unsigned long)image->head,
+					   virt_to_phys(control_page),
+					   image->start,
+					   image->preserve_context,
+					   host_mem_enc_active);

 #ifdef CONFIG_KEXEC_JUMP
 	if (image->preserve_context)
···

 	/* Don't touch the control code page used in crash_kexec().*/
 	control = PFN_PHYS(page_to_pfn(kexec_crash_image->control_code_page));
-	/* Control code page is located in the 2nd page. */
-	kexec_mark_range(crashk_res.start, control + PAGE_SIZE - 1, protect);
+	kexec_mark_range(crashk_res.start, control - 1, protect);
 	control += KEXEC_CONTROL_PAGE_SIZE;
 	kexec_mark_range(control, crashk_res.end, protect);
 }
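
The machine_kexec_64.c hunks above copy the relocation code into the control page and then call the copy at `control_page + (relocate_kernel - reloc_start)`, which still works when the entry point is not the first byte of the copied region. The user-space sketch below shows only that offset arithmetic on ordinary byte buffers. `relocated_entry()` is a hypothetical helper; it does not execute copied code and does not model page permissions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the region [start, end) to dst and return
 * the address inside the copy that corresponds to 'entry' in the
 * original. The offset of the entry point from the region start is
 * preserved by the copy.
 */
void *relocated_entry(void *dst, const char *start, const char *end,
		      const char *entry)
{
	assert(start <= entry && entry < end);	/* entry must lie in the region */

	memcpy(dst, start, (size_t)(end - start));
	return (char *)dst + (entry - start);
}
```

For example, if the entry point sits 5 bytes into the source region, the returned pointer is 5 bytes into the destination buffer and refers to the same byte value as the original.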
+105 -92
arch/x86/kernel/relocate_kernel_64.S
··· 24 24 #define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) 25 25 26 26 /* 27 - * control_page + KEXEC_CONTROL_CODE_MAX_SIZE 28 - * ~ control_page + PAGE_SIZE are used as data storage and stack for 29 - * jumping back 27 + * The .text..relocate_kernel and .data..relocate_kernel sections are copied 28 + * into the control page, and the remainder of the page is used as the stack. 30 29 */ 31 - #define DATA(offset) (KEXEC_CONTROL_CODE_MAX_SIZE+(offset)) 32 30 31 + .section .data..relocate_kernel,"a"; 33 32 /* Minimal CPU state */ 34 - #define RSP DATA(0x0) 35 - #define CR0 DATA(0x8) 36 - #define CR3 DATA(0x10) 37 - #define CR4 DATA(0x18) 33 + SYM_DATA_LOCAL(saved_rsp, .quad 0) 34 + SYM_DATA_LOCAL(saved_cr0, .quad 0) 35 + SYM_DATA_LOCAL(saved_cr3, .quad 0) 36 + SYM_DATA_LOCAL(saved_cr4, .quad 0) 37 + /* other data */ 38 + SYM_DATA(kexec_va_control_page, .quad 0) 39 + SYM_DATA(kexec_pa_table_page, .quad 0) 40 + SYM_DATA(kexec_pa_swap_page, .quad 0) 41 + SYM_DATA_LOCAL(pa_backup_pages_map, .quad 0) 38 42 39 - /* other data */ 40 - #define CP_PA_TABLE_PAGE DATA(0x20) 41 - #define CP_PA_SWAP_PAGE DATA(0x28) 42 - #define CP_PA_BACKUP_PAGES_MAP DATA(0x30) 43 - 44 - .text 45 - .align PAGE_SIZE 43 + .section .text..relocate_kernel,"ax"; 46 44 .code64 47 - SYM_CODE_START_NOALIGN(relocate_range) 48 45 SYM_CODE_START_NOALIGN(relocate_kernel) 49 46 UNWIND_HINT_END_OF_STACK 50 47 ANNOTATE_NOENDBR 51 48 /* 52 49 * %rdi indirection_page 53 - * %rsi page_list 50 + * %rsi pa_control_page 54 51 * %rdx start address 55 52 * %rcx preserve_context 56 53 * %r8 host_mem_enc_active ··· 62 65 pushq %r15 63 66 pushf 64 67 65 - movq PTR(VA_CONTROL_PAGE)(%rsi), %r11 66 - movq %rsp, RSP(%r11) 67 - movq %cr0, %rax 68 - movq %rax, CR0(%r11) 69 - movq %cr3, %rax 70 - movq %rax, CR3(%r11) 71 - movq %cr4, %rax 72 - movq %rax, CR4(%r11) 73 - 74 - /* Save CR4. Required to enable the right paging mode later. 
*/
-	movq	%rax, %r13
-
 	/* zero out flags, and disable interrupts */
 	pushq	$0
 	popfq

-	/* Save SME active flag */
-	movq	%r8, %r12
-
-	/*
-	 * get physical address of control page now
-	 * this is impossible after page table switch
-	 */
-	movq	PTR(PA_CONTROL_PAGE)(%rsi), %r8
-
-	/* get physical address of page table now too */
-	movq	PTR(PA_TABLE_PAGE)(%rsi), %r9
-
-	/* get physical address of swap page now */
-	movq	PTR(PA_SWAP_PAGE)(%rsi), %r10
-
-	/* save some information for jumping back */
-	movq	%r9, CP_PA_TABLE_PAGE(%r11)
-	movq	%r10, CP_PA_SWAP_PAGE(%r11)
-	movq	%rdi, CP_PA_BACKUP_PAGES_MAP(%r11)
-
 	/* Switch to the identity mapped page tables */
+	movq	%cr3, %rax
+	movq	kexec_pa_table_page(%rip), %r9
 	movq	%r9, %cr3

+	/* Leave CR4 in %r13 to enable the right paging mode later. */
+	movq	%cr4, %r13
+
+	/* Disable global pages immediately to ensure this mapping is RWX */
+	movq	%r13, %r12
+	andq	$~(X86_CR4_PGE), %r12
+	movq	%r12, %cr4
+
+	/* Save %rsp and CRs. */
+	movq	%r13, saved_cr4(%rip)
+	movq	%rsp, saved_rsp(%rip)
+	movq	%rax, saved_cr3(%rip)
+	movq	%cr0, %rax
+	movq	%rax, saved_cr0(%rip)
+
+	/* save indirection list for jumping back */
+	movq	%rdi, pa_backup_pages_map(%rip)
+
+	/* Save the preserve_context to %r11 as swap_pages clobbers %rcx. */
+	movq	%rcx, %r11
+
 	/* setup a new stack at the end of the physical control page */
-	lea	PAGE_SIZE(%r8), %rsp
+	lea	PAGE_SIZE(%rsi), %rsp

 	/* jump to identity mapped page */
-	addq	$(identity_mapped - relocate_kernel), %r8
-	pushq	%r8
-	ANNOTATE_UNRET_SAFE
-	ret
-	int3
+0:	addq	$identity_mapped - 0b, %rsi
+	subq	$__relocate_kernel_start - 0b, %rsi
+	ANNOTATE_RETPOLINE_SAFE
+	jmp	*%rsi
 SYM_CODE_END(relocate_kernel)

 SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	UNWIND_HINT_END_OF_STACK
-	/* set return address to 0 if not preserving context */
-	pushq	$0
+	/*
+	 * %rdi	indirection page
+	 * %rdx start address
+	 * %r8 host_mem_enc_active
+	 * %r9 page table page
+	 * %r11 preserve_context
+	 * %r13 original CR4 when relocate_kernel() was invoked
+	 */
+
 	/* store the start address on the stack */
 	pushq	%rdx

···
 	 * entries that will conflict with the now unencrypted memory
 	 * used by kexec. Flush the caches before copying the kernel.
 	 */
-	testq	%r12, %r12
+	testq	%r8, %r8
 	jz	.Lsme_off
 	wbinvd
 .Lsme_off:

-	/* Save the preserve_context to %r11 as swap_pages clobbers %rcx. */
-	movq	%rcx, %r11
 	call	swap_pages

 	/*
···
 	movq	%cr3, %rax
 	movq	%rax, %cr3

+	testq	%r11, %r11	/* preserve_context */
+	jnz	.Lrelocate
+
 	/*
 	 * set all of the registers to known values
 	 * leave %rsp alone
 	 */

-	testq	%r11, %r11
-	jnz	.Lrelocate
 	xorl	%eax, %eax
 	xorl	%ebx, %ebx
 	xorl	%ecx, %ecx
···

 .Lrelocate:
 	popq	%rdx
+
+	/* Use the swap page for the callee's stack */
+	movq	kexec_pa_swap_page(%rip), %r10
 	leaq	PAGE_SIZE(%r10), %rsp
+
+	/* push the existing entry point onto the callee's stack */
+	pushq	%rdx
+
 	ANNOTATE_RETPOLINE_SAFE
 	call	*%rdx

 	/* get the re-entry point of the peer system */
-	movq	0(%rsp), %rbp
-	leaq	relocate_kernel(%rip), %r8
-	movq	CP_PA_SWAP_PAGE(%r8), %r10
-	movq	CP_PA_BACKUP_PAGES_MAP(%r8), %rdi
-	movq	CP_PA_TABLE_PAGE(%r8), %rax
+	popq	%rbp
+	movq	kexec_pa_swap_page(%rip), %r10
+	movq	pa_backup_pages_map(%rip), %rdi
+	movq	kexec_pa_table_page(%rip), %rax
 	movq	%rax, %cr3
+
+	/* Find start (and end) of this physical mapping of control page */
+	leaq	(%rip), %r8
+	ANNOTATE_NOENDBR
+	andq	$PAGE_MASK, %r8
 	lea	PAGE_SIZE(%r8), %rsp
+	movl	$1, %r11d	/* Ensure preserve_context flag is set */
 	call	swap_pages
-	movq	$virtual_mapped, %rax
+	movq	kexec_va_control_page(%rip), %rax
+0:	addq	$virtual_mapped - 0b, %rax
+	subq	$__relocate_kernel_start - 0b, %rax
 	pushq	%rax
 	ANNOTATE_UNRET_SAFE
 	ret
···
 SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
 	UNWIND_HINT_END_OF_STACK
 	ANNOTATE_NOENDBR // RET target, above
-	movq	RSP(%r8), %rsp
-	movq	CR4(%r8), %rax
+	movq	saved_rsp(%rip), %rsp
+	movq	saved_cr4(%rip), %rax
 	movq	%rax, %cr4
-	movq	CR3(%r8), %rax
-	movq	CR0(%r8), %r8
+	movq	saved_cr3(%rip), %rax
+	movq	saved_cr0(%rip), %r8
 	movq	%rax, %cr3
 	movq	%r8, %cr0

···
 	lgdt	saved_context_gdt_desc(%rax)
 #endif

+	/* relocate_kernel() returns the re-entry point for next time */
 	movq	%rbp, %rax

 	popf
···
 	/* Do the copies */
 SYM_CODE_START_LOCAL_NOALIGN(swap_pages)
 	UNWIND_HINT_END_OF_STACK
+	/*
+	 * %rdi indirection page
+	 * %r11 preserve_context
+	 */
 	movq	%rdi, %rcx	/* Put the indirection_page in %rcx */
 	xorl	%edi, %edi
 	xorl	%esi, %esi
-	jmp	1f
+	jmp	.Lstart		/* Should start with an indirection record */

-0:	/* top, read another word for the indirection page */
+.Lloop:	/* top, read another word for the indirection page */

 	movq	(%rbx), %rcx
 	addq	$8, %rbx
-1:
+.Lstart:
 	testb	$0x1, %cl	/* is it a destination page? */
-	jz	2f
+	jz	.Lnotdest
 	movq	%rcx, %rdi
 	andq	$0xfffffffffffff000, %rdi
-	jmp	0b
-2:
+	jmp	.Lloop
+.Lnotdest:
 	testb	$0x2, %cl	/* is it an indirection page? */
-	jz	2f
+	jz	.Lnotind
 	movq	%rcx, %rbx
 	andq	$0xfffffffffffff000, %rbx
-	jmp	0b
-2:
+	jmp	.Lloop
+.Lnotind:
 	testb	$0x4, %cl	/* is it the done indicator? */
-	jz	2f
-	jmp	3f
-2:
+	jz	.Lnotdone
+	jmp	.Ldone
+.Lnotdone:
 	testb	$0x8, %cl	/* is it the source indicator? */
-	jz	0b		/* Ignore it otherwise */
+	jz	.Lloop		/* Ignore it otherwise */
 	movq	%rcx, %rsi	/* For ever source page do a copy */
 	andq	$0xfffffffffffff000, %rsi

 	movq	%rdi, %rdx	/* Save destination page to %rdx */
 	movq	%rsi, %rax	/* Save source page to %rax */

+	testq	%r11, %r11	/* Only actually swap for ::preserve_context */
+	jz	.Lnoswap
+
 	/* copy source page to swap page */
-	movq	%r10, %rdi
+	movq	kexec_pa_swap_page(%rip), %rdi
 	movl	$512, %ecx
 	rep ; movsq

···

 	/* copy swap page to destination page */
 	movq	%rdx, %rdi
-	movq	%r10, %rsi
+	movq	kexec_pa_swap_page(%rip), %rsi
+.Lnoswap:
 	movl	$512, %ecx
 	rep ; movsq

 	lea	PAGE_SIZE(%rax), %rsi
-	jmp	0b
-3:
+	jmp	.Lloop
+.Ldone:
 	ANNOTATE_UNRET_SAFE
 	ret
 	int3
 SYM_CODE_END(swap_pages)
-
-	.skip KEXEC_CONTROL_CODE_MAX_SIZE - (. - relocate_kernel), 0xcc
-SYM_CODE_END(relocate_range);
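The swap_pages loop above decodes each indirection entry by its low flag bits (0x1 destination, 0x2 indirection, 0x4 done, 0x8 source) and, after this change, only exchanges pages through the swap page when preserve_context is set. The following is a rough user-space C sketch of that control flow, not the kernel's code: the name `swap_pages_sketch`, the `IND_*` macro names and the use of `memcpy` on ordinary pointers are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE        4096u
#define IND_DESTINATION  0x1u  /* entry sets the current destination page */
#define IND_INDIRECTION  0x2u  /* entry points at the next page of entries */
#define IND_DONE         0x4u  /* end of the list */
#define IND_SOURCE       0x8u  /* entry names a source page to copy */
#define IND_FLAGS        ((uintptr_t)0xfff)

/*
 * Hypothetical user-space model of the loop. `head` is the first entry
 * (the assembly receives it in %rdi) and `swap` is a scratch page.
 * With preserve_context == 0 each source page is simply copied to the
 * destination; otherwise the two pages are exchanged via `swap`.
 */
static void swap_pages_sketch(uintptr_t head, void *swap, int preserve_context)
{
	uintptr_t entry = head;
	const uintptr_t *next = NULL;
	unsigned char *dest = NULL;

	for (;;) {
		if (entry & IND_DESTINATION) {
			dest = (unsigned char *)(entry & ~IND_FLAGS);
		} else if (entry & IND_INDIRECTION) {
			next = (const uintptr_t *)(entry & ~IND_FLAGS);
		} else if (entry & IND_DONE) {
			return;
		} else if (entry & IND_SOURCE) {
			unsigned char *src = (unsigned char *)(entry & ~IND_FLAGS);

			assert(dest != NULL);
			if (preserve_context) {
				memcpy(swap, src, PAGE_SIZE);  /* source -> swap */
				memcpy(src, dest, PAGE_SIZE);  /* destination -> source */
				memcpy(dest, swap, PAGE_SIZE); /* swap -> destination */
			} else {
				memcpy(dest, src, PAGE_SIZE);  /* plain copy, no swap */
			}
			dest += PAGE_SIZE; /* destination advances after each page */
		}
		/* Entries with none of the four flags are ignored. */
		assert(next != NULL); /* the list should start with an indirection record */
		entry = *next++;
	}
}
```

In the plain kexec case the old contents of the destination are discarded, which is why the extra two copies can be skipped; only the jump-back (preserve_context) path needs the original pages to survive.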
+30 -13
arch/x86/kernel/vmlinux.lds.S
···
 #include <asm/orc_lookup.h>
 #include <asm/cache.h>
 #include <asm/boot.h>
+#include <asm/kexec.h>

 #undef i386     /* in case the preprocessor is a 32bit one */

···
 #define BSS_DECRYPTED

 #endif
+#if defined(CONFIG_X86_64) && defined(CONFIG_KEXEC_CORE)
+#define KEXEC_RELOCATE_KERNEL					\
+	. = ALIGN(0x100);					\
+	__relocate_kernel_start = .;				\
+	*(.text..relocate_kernel);				\
+	*(.data..relocate_kernel);				\
+	__relocate_kernel_end = .;

+ASSERT(__relocate_kernel_end - __relocate_kernel_start <= KEXEC_CONTROL_CODE_MAX_SIZE,
+       "relocate_kernel code too large!")
+#else
+#define KEXEC_RELOCATE_KERNEL
+#endif
 PHDRS {
 	text PT_LOAD FLAGS(5);          /* R_E */
 	data PT_LOAD FLAGS(6);          /* RW_ */
···
 	.text :  AT(ADDR(.text) - LOAD_OFFSET) {
 		_text = .;
 		_stext = .;
-		/* bootstrapping code */
-		HEAD_TEXT
-		TEXT_TEXT
-		SCHED_TEXT
-		LOCK_TEXT
-		KPROBES_TEXT
-		SOFTIRQENTRY_TEXT
-#ifdef CONFIG_MITIGATION_RETPOLINE
-		*(.text..__x86.indirect_thunk)
-		*(.text..__x86.return_thunk)
-#endif
-		STATIC_CALL_TEXT
-
 		ALIGN_ENTRY_TEXT_BEGIN
 		*(.text..__x86.rethunk_untrain)
 		ENTRY_TEXT
···
 		*(.text..__x86.rethunk_safe)
 #endif
 		ALIGN_ENTRY_TEXT_END
+
+		TEXT_TEXT
+		SCHED_TEXT
+		LOCK_TEXT
+		KPROBES_TEXT
+		SOFTIRQENTRY_TEXT
+#ifdef CONFIG_MITIGATION_RETPOLINE
+		*(.text..__x86.indirect_thunk)
+		*(.text..__x86.return_thunk)
+#endif
+		STATIC_CALL_TEXT
 		*(.gnu.warning)

+	} :text = 0xcccccccc
+
+	/* bootstrapping code */
+	.head.text : AT(ADDR(.head.text) - LOAD_OFFSET) {
+		HEAD_TEXT
 	} :text = 0xcccccccc

 	/* End of text section, which should occupy whole number of pages */
···

 		DATA_DATA
 		CONSTRUCTORS
+		KEXEC_RELOCATE_KERNEL

 		/* rarely changed data like cpu maps */
 		READ_MOSTLY_DATA(INTERNODE_CACHE_BYTES)
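The linker script now brackets the relocate_kernel text and data with `__relocate_kernel_start` and `__relocate_kernel_end` and asserts that the blob fits in the reserved size. The assembly relies on the same markers when it computes where a label lives inside the copied control page (the `addq $identity_mapped - 0b` / `subq $__relocate_kernel_start - 0b` pair). A minimal C sketch of both calculations, assuming plain integer addresses, could look like this; the function names and the 2048-byte limit are illustrative placeholders, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit for illustration; the real value comes from <asm/kexec.h>. */
#define CONTROL_CODE_MAX_SIZE_SKETCH 2048u

/* Mirrors the linker ASSERT: does the [start, end) blob fit in its slot? */
static int blob_fits(uintptr_t start, uintptr_t end)
{
	return end >= start && end - start <= CONTROL_CODE_MAX_SIZE_SKETCH;
}

/*
 * Address of `sym` inside a copy of the blob placed at `copy_base`:
 * the symbol keeps its offset from the blob start, only the base changes.
 */
static uintptr_t addr_in_copy(uintptr_t copy_base, uintptr_t blob_start,
			      uintptr_t sym)
{
	assert(sym >= blob_start);
	return copy_base + (sym - blob_start);
}
```

Because only the difference between two symbols in the same section is used, the result does not depend on where the original was linked, which is what lets the code avoid absolute symbol references.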
+7 -1
arch/x86/tools/relocs.c
···
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
 {
+	int headtext = !strcmp(sec_name(sec->shdr.sh_info), ".head.text");
 	unsigned r_type = ELF64_R_TYPE(rel->r_info);
 	ElfW(Addr) offset = rel->r_offset;
 	int shn_abs = (sym->st_shndx == SHN_ABS) && !is_reloc(S_REL, symname);
-
 	if (sym->st_shndx == SHN_UNDEF)
 		return 0;

···
 				break;

 			die("Invalid absolute %s relocation: %s\n", rel_type(r_type), symname);
+			break;
+		}
+
+		if (headtext) {
+			die("Absolute reference to symbol '%s' not permitted in .head.text\n",
+			    symname);
 			break;
 		}

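The new `headtext` check makes the relocs tool fail the build when code in `.head.text` takes an absolute reference to a symbol. A simplified C sketch of that decision order is shown below; `struct reloc_info`, its fields and `abs_reloc_error` are hypothetical names, and the handling of `SHN_ABS` symbols before the new check is reduced to two flags because most of that branch is elided from the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened view of what the tool knows about one relocation. */
struct reloc_info {
	const char *section;  /* section the relocation applies to */
	int shn_abs;          /* target symbol is SHN_ABS */
	int whitelisted;      /* SHN_ABS symbol that needs no relocation (assumed) */
};

/* Returns NULL if the absolute relocation is acceptable, else an error string. */
static const char *abs_reloc_error(const struct reloc_info *r)
{
	if (r->shn_abs)
		return r->whitelisted ? NULL : "invalid absolute relocation";

	/* New in this series: no absolute references from .head.text. */
	if (strcmp(r->section, ".head.text") == 0)
		return "absolute reference not permitted in .head.text";

	return NULL;
}
```

The real tool calls `die()` instead of returning a string, so a violation stops the build rather than surfacing at boot.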
+17 -6
kernel/kexec_core.c
···

 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
+		/*
+		 * This flow is analogous to hibernation flows that occur
+		 * before creating an image and before jumping from the
+		 * restore kernel to the image one, so it uses the same
+		 * device callbacks as those two flows.
+		 */
 		pm_prepare_console();
 		error = freeze_processes();
 		if (error) {
···
 		error = dpm_suspend_start(PMSG_FREEZE);
 		if (error)
 			goto Resume_console;
-		/* At this point, dpm_suspend_start() has been called,
-		 * but *not* dpm_suspend_end(). We *must* call
-		 * dpm_suspend_end() now. Otherwise, drivers for
-		 * some devices (e.g. interrupt controllers) become
-		 * desynchronized with the actual state of the
-		 * hardware at resume time, and evil weirdness ensues.
+		/*
+		 * dpm_suspend_end() must be called after dpm_suspend_start()
+		 * to complete the transition, like in the hibernation flows
+		 * mentioned above.
 		 */
 		error = dpm_suspend_end(PMSG_FREEZE);
 		if (error)
···

 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
+		/*
+		 * This flow is analogous to hibernation flows that occur after
+		 * creating an image and after the image kernel has got control
+		 * back, and in case the devices have been reset or otherwise
+		 * manipulated in the meantime, it uses the device callbacks
+		 * used by the latter.
+		 */
 		syscore_resume();
  Enable_irqs:
 		local_irq_enable();