Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

arm64: mm: Fix rodata=full block mapping support for realm guests

Commit a166563e7ec3 ("arm64: mm: support large block mapping when
rodata=full") enabled the linear map to be mapped by block/cont while
still allowing granular permission changes on BBML2_NOABORT systems by
lazily splitting the live mappings. This mechanism was intended to be
usable by realm guests since they need to dynamically share dma buffers
with the host by "decrypting" them - which for Arm CCA, means marking
them as shared in the page tables.

However, it turns out that the mechanism was failing for realm guests
because realms need to share their dma buffers (via
__set_memory_enc_dec()) much earlier during boot than
split_kernel_leaf_mapping() was able to handle. The report linked below
showed that GIC's ITS was one such user. But during the investigation I
found other callsites that could not meet the
split_kernel_leaf_mapping() constraints.

The problem is that we block map the linear map based on the boot CPU
supporting BBML2_NOABORT, then check that all the other CPUs support it
too when finalizing the caps. If they don't, then we stop_machine() and
split to ptes. For safety, split_kernel_leaf_mapping() previously
wouldn't permit splitting until after the caps were finalized. That
ensured that if any secondary cpus were running that didn't support
BBML2_NOABORT, we wouldn't risk breaking them.

I've fixed this problem by reducing the black-out window where we refuse
to split; there are now 2 windows. The first is from T0 until the page
allocator is initialized: splitting allocates memory from the page
allocator, so the allocator must be up. The second covers the period
from starting to online the secondary cpus until the system caps are
finalized (this is a very small window).

All of the problematic callers are calling __set_memory_enc_dec() before
the secondary cpus come online, so this solves the problem. However, one
of these callers, swiotlb_update_mem_attributes(), was trying to split
before the page allocator was initialized. So I have moved this call
from arch_mm_preinit() to mem_init(), which solves the ordering issue.

I've added warnings and now return an error if any attempt is made to
split during the black-out windows.

Note there are other issues which prevent booting all the way to user
space, which will be fixed in subsequent patches.

Reported-by: Jinjiang Tu <tujinjiang@huawei.com>
Closes: https://lore.kernel.org/all/0b2a4ae5-fc51-4d77-b177-b2e9db74f11d@huawei.com/
Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Authored by Ryan Roberts, committed by Catalin Marinas
f12b435d 1f318b96

+42 -14
+2
arch/arm64/include/asm/mmu.h
@@ -112,5 +112,7 @@
 static inline void kpti_install_ng_mappings(void) {}
 #endif
 
+extern bool page_alloc_available;
+
 #endif /* !__ASSEMBLER__ */
 #endif
+8 -1
arch/arm64/mm/init.c
@@ -350,7 +350,6 @@
 	}
 
 	swiotlb_init(swiotlb, flags);
-	swiotlb_update_mem_attributes();
 
 	/*
 	 * Check boundaries twice: Some fundamental inconsistencies can be
@@ -374,6 +375,14 @@
 	 */
 		sysctl_overcommit_memory = OVERCOMMIT_ALWAYS;
 	}
+}
+
+bool page_alloc_available __ro_after_init;
+
+void __init mem_init(void)
+{
+	page_alloc_available = true;
+	swiotlb_update_mem_attributes();
 }
 
 void free_initmem(void)
+32 -13
arch/arm64/mm/mmu.c
@@ -768,29 +768,50 @@
 }
 
 static DEFINE_MUTEX(pgtable_split_lock);
+static bool linear_map_requires_bbml2;
 
 int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
 {
 	int ret;
 
 	/*
-	 * !BBML2_NOABORT systems should not be trying to change permissions on
-	 * anything that is not pte-mapped in the first place. Just return early
-	 * and let the permission change code raise a warning if not already
-	 * pte-mapped.
-	 */
-	if (!system_supports_bbml2_noabort())
-		return 0;
-
-	/*
	 * If the region is within a pte-mapped area, there is no need to try to
	 * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
	 * change permissions from atomic context so for those cases (which are
	 * always pte-mapped), we must not go any further because taking the
-	 * mutex below may sleep.
+	 * mutex below may sleep. Do not call force_pte_mapping() here because
+	 * it could return a confusing result if called from a secondary cpu
+	 * prior to finalizing caps. Instead, linear_map_requires_bbml2 gives us
+	 * what we need.
	 */
-	if (force_pte_mapping() || is_kfence_address((void *)start))
+	if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))
 		return 0;
+
+	if (!system_supports_bbml2_noabort()) {
+		/*
+		 * !BBML2_NOABORT systems should not be trying to change
+		 * permissions on anything that is not pte-mapped in the first
+		 * place. Just return early and let the permission change code
+		 * raise a warning if not already pte-mapped.
+		 */
+		if (system_capabilities_finalized())
+			return 0;
+
+		/*
+		 * Boot-time: split_kernel_leaf_mapping_locked() allocates from
+		 * page allocator. Can't split until it's available.
+		 */
+		if (WARN_ON(!page_alloc_available))
+			return -EBUSY;
+
+		/*
+		 * Boot-time: Started secondary cpus but don't know if they
+		 * support BBML2_NOABORT yet. Can't allow splitting in this
+		 * window in case they don't.
+		 */
+		if (WARN_ON(num_online_cpus() > 1))
+			return -EBUSY;
+	}
 
 	/*
 	 * Ensure start and end are at least page-aligned since this is the
@@ -911,8 +890,6 @@
 
 	return ret;
 }
-
-static bool linear_map_requires_bbml2 __initdata;
 
 u32 idmap_kpti_bbml2_flag;