commits
We currently have three versions of the ASID retrieval code, one
in the S1 walker, and two in the VNCR handling (although the last
two are limited to the EL2&0 translation regime).
Make this code common, and take this opportunity to also simplify
the code a bit while switching over to the TTBRx_EL1_ASID macro.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260225104718.14209-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
It appears that a !! became ! during a cleanup, resulting in inverted
logic when detecting if a host GICv5 implementation is capable of
virtualization.
Re-add the missing !, fixing the behaviour.
Fixes: 3227c3a89d65f ("irqchip/gic-v5: Check if impl is virt capable")
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://patch.msgid.link/20260225083130.3378490-1-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Commit 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is
not supported") added an early return to several functions in
arch/arm64/kvm/nested.c to prevent a UBSAN shift-out-of-bounds error
when accessing the pgt union for non-nested VMs.
However, this early return was inadvertently applied to
kvm_arch_flush_shadow_all() as well, causing it to skip the call to
kvm_uninit_stage2_mmu(kvm) for all non-nested VMs.
For pKVM, skipping this teardown means the host never unshares the
guest's memory with the EL2 hypervisor. When the host kernel later
recycles these leaked pages for a new VM, it attempts to re-share them.
The hypervisor correctly rejects this with -EPERM, triggering a host
WARN_ON and hanging the guest.
Fix this by dropping the early return from kvm_arch_flush_shadow_all().
The for-loop guarding the nested MMU cleanup already bounds itself when
nested_mmus_size == 0, allowing execution to proceed to
kvm_uninit_stage2_mmu() as intended.
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/60916cb6-f460-4751-b910-f63c58700ad0@sirena.org.uk/
Fixes: 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported")
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20260222083352.89503-1-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
pKVM tracks the memory that has been mapped into a guest in a
side data structure. Crucially, it uses it to find out whether
a page has already been mapped, and therefore refuses to map it
twice. So far, so good.
However, this very patch completely breaks non-4kB page support,
with guests being unable to boot. The most obvious symptom is that
we take the same fault repeatedly, and not making forward progress.
A quick investigation shows that this is because of the above
rejection code.
As it turns out, there are multiple issues at play:
- while the HPFAR_EL2 register gives you the faulting IPA minus
the bottom 12 bits, it will still give you the extra bits that
are part of the page offset for anything larger than 4kB,
even for a level-3 mapping
- pkvm_pgtable_stage2_map() assumes that the address passed as
a parameter is aligned to the size of the intended mapping
- the faulting address is only aligned for a non-page mapping
When the planets are suitably aligned (pun intended), the guest
faults on a page by accessing it past the bottom 4kB, and extra bits
get set in the HPFAR_EL2 register. If this results in a page mapping
(which is likely with large granule sizes), nothing aligns it further
down, and pkvm_mapping_iter_first() finds an intersection that
doesn't really exist. We assume this is a spurious fault and return
-EAGAIN. And again...
This doesn't hit outside of the protected code, as the page table
code always aligns the IPA down to a page boundary, hiding the issue
for everyone else.
Fix it by always forcing the alignment on vma_pagesize, irrespective
of the value of vma_pagesize.
Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings")
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://https://patch.msgid.link/20260222141000.3084258-1-maz@kernel.org
Cc: stable@vger.kernel.org
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct gic_kvm_info", but the returned type,
while matching, is const qualified. To get them exactly matching, just
use the dereferenced pointer for the sizeof().
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260206223022.it.052-kees@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
The `sve_state` pointer in `hyp_vcpu->vcpu.arch` is initialized as a
hypervisor virtual address during vCPU initialization in
`pkvm_vcpu_init_sve()`.
`unpin_host_sve_state()` calls `kern_hyp_va()` on this address. Since
`kern_hyp_va()` is idempotent, it's not a bug. However, it is
unnecessary and potentially confusing. Remove the redundant conversion.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
In protected mode, the hypervisor maintains a separate instance of
the `kvm` structure for each VM. For non-protected VMs, this structure is
initialized from the host's `kvm` state.
Currently, `pkvm_init_features_from_host()` copies the
`KVM_ARCH_FLAG_ID_REGS_INITIALIZED` flag from the host without the
underlying `id_regs` data being initialized. This results in the
hypervisor seeing the flag as set while the ID registers remain zeroed.
Consequently, `kvm_has_feat()` checks at EL2 fail (return 0) for
non-protected VMs. This breaks logic that relies on feature detection,
such as `ctxt_has_tcrx()` for TCR2_EL1 support. As a result, certain
system registers (e.g., TCR2_EL1, PIR_EL1, POR_EL1) are not
saved/restored during the world switch, which could lead to state
corruption.
Fix this by explicitly copying the ID registers from the host `kvm` to
the hypervisor `kvm` for non-protected VMs during initialization, since
we trust the host with its non-protected guests' features. Also ensure
`KVM_ARCH_FLAG_ID_REGS_INITIALIZED` is cleared initially in
`pkvm_init_features_from_host` so that `vm_copy_id_regs` can properly
initialize them and set the flag once done.
Fixes: 41d6028e28bd ("KVM: arm64: Convert the SVE guest vcpu flag to a vm flag")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Although ID register sanitisation prevents guests from seeing the
feature, adding this check to the helper allows the compiler to entirely
eliminate S1POE-specific code paths (such as context switching POR_EL1)
when the host kernel is compiled without support (CONFIG_ARM64_POE is
disabled).
This aligns with the pattern used for other optional features like SVE
(kvm_has_sve()) and FPMR (kvm_has_fpmr()), ensuring no POE logic if the
host lacks support, regardless of the guest configuration state.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
When CONFIG_ARM64_POE is disabled, KVM does not save/restore POR_EL1.
However, ID_AA64MMFR3_EL1 sanitisation currently exposes the feature to
guests whenever the hardware supports it, ignoring the host kernel
configuration.
If a guest detects this feature and attempts to use it, the host will
fail to context-switch POR_EL1, potentially leading to state corruption.
Fix this by masking ID_AA64MMFR3_EL1.S1POE in the sanitised system
registers, preventing KVM from advertising the feature when the host
does not support it (i.e. system_supports_poe() is false).
Fixes: 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/misc-6.20:
: .
: Misc KVM/arm64 changes for 6.20
:
: - Trivial FPSIMD cleanups
:
: - Calculate hyp VA size only once, avoiding potential mapping issues when
: VA bits is smaller than expected
:
: - Silence sparse warning for the HYP stack base
:
: - Fix error checking when handling FFA_VERSION
:
: - Add missing trap configuration for DBGWCR15_EL1
:
: - Don't try to deal with nested S2 when NV isn't enabled for a guest
:
: - Various spelling fixes
: .
KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported
KVM: arm64: Fix various comments
KVM: arm64: nv: Add trap config for DBGWCR<15>_EL1
KVM: arm64: Fix error checking for FFA_VERSION
KVM: arm64: Fix missing <asm/stackpage/nvhe.h> include
KVM: arm64: Calculate hyp VA size only once
KVM: arm64: Remove ISB after writing FPEXC32_EL2
KVM: arm64: Shuffle KVM_HOST_DATA_FLAG_* indices
KVM: arm64: Fix comment in fpsimd_lazy_switch_to_host()
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/resx:
: .
: Add infrastructure to deal with the full gamut of RESx bits
: for NV. As a result, it is now possible to have the expected
: semantics for some bits such as SCTLR_EL2.SPAN.
: .
KVM: arm64: Add debugfs file dumping computed RESx values
KVM: arm64: Add sanitisation to SCTLR_EL2
KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
KVM: arm64: Remove all traces of FEAT_TME
KVM: arm64: Simplify handling of full register invalid constraint
KVM: arm64: Get rid of FIXED_VALUE altogether
KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
KVM: arm64: Move RESx into individual register descriptors
KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
KVM: arm64: Simplify FIXED_VALUE handling
KVM: arm64: Convert HCR_EL2.RW to AS_RES1
KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features
KVM: arm64: Allow RES1 bits to be inferred from configuration
KVM: arm64: Inherit RESx bits from FGT register descriptors
KVM: arm64: Extend unified RESx handling to runtime sanitisation
KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits
KVM: arm64: Introduce standalone FGU computing primitive
KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E}
arm64: Convert SCTLR_EL2 to sysreg infrastructure
Signed-off-by: Marc Zyngier <maz@kernel.org>
The NV stage-2 manipulation functions kvm_nested_s2_unmap(),
kvm_nested_s2_wp(), and others, are being called for any stage-2
manipulation regardless of whether nested virtualization is supported or
enabled for the VM.
For protected KVM (pKVM), `struct kvm_pgtable` uses the
`pkvm_mappings` member of the union. This member aliases `ia_bits`,
which is used by the non-protected NV code paths. Attempting to
read `pgt->ia_bits` in these functions results in treating
protected mapping pointers or state values as bit-shift amounts.
This triggers a UBSAN shift-out-of-bounds error:
UBSAN: shift-out-of-bounds in arch/arm64/kvm/nested.c:1127:34
shift exponent 174565952 is too large for 64-bit type 'unsigned long'
Call trace:
__ubsan_handle_shift_out_of_bounds+0x28c/0x2c0
kvm_nested_s2_unmap+0x228/0x248
kvm_arch_flush_shadow_memslot+0x98/0xc0
kvm_set_memslot+0x248/0xce0
Since pKVM and NV are mutually exclusive, prevent entry into these
NV handling functions if the VM has not allocated any nested MMUs
(i.e., `kvm->arch.nested_mmus_size` is 0).
Fixes: 7270cc9157f47 ("KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202152310.113467-1-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/debugfs-fixes:
: .
: Cleanup of the debugfs iterator, which are way more complicated
: than they ought to be, courtesy of Fuad Tabba. From the cover letter:
:
: "This series refactors the debugfs implementations for `idregs` and
: `vgic-state` to use standard `seq_file` iterator patterns.
:
: The existing implementations relied on storing iterator state within
: global VM structures (`kvm_arch` and `vgic_dist`). This approach
: prevented concurrent reads of the debugfs files (returning -EBUSY) and
: created improper dependencies between transient file operations and
: long-lived VM state."
: .
KVM: arm64: Use standard seq_file iterator for vgic-debug debugfs
KVM: arm64: Reimplement vgic-debug XArray iteration
KVM: arm64: Use standard seq_file iterator for idregs debugfs
Signed-off-by: Marc Zyngier <maz@kernel.org>
Computing RESx values is hard. Verifying that they are correct is
harder. Add a debugfs file called "resx" that will dump all the RESx
values for a given VM.
I found it useful, maybe you will too.
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-21-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Use tab instead of whitespaces, as well as 2 minor typo fixes.
Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260128075208.23024-1-zenghui.yu@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/gicv5-prologue:
: .
: Prologue to GICv5 support, courtesy of Sascha Bischoff.
:
: This is preliminary work that sets the scene for the full-blow
: support.
: .
irqchip/gic-v5: Check if impl is virt capable
KVM: arm64: gic: Set vgic_model before initing private IRQs
arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1
KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2
Signed-off-by: Marc Zyngier <maz@kernel.org>
The current implementation uses `vgic_state_iter` in `struct
vgic_dist` to track the sequence position. This effectively makes the
iterator shared across all open file descriptors for the VM.
This approach has significant drawbacks:
- It enforces mutual exclusion, preventing concurrent reads of the
debugfs file (returning -EBUSY).
- It relies on storing transient iterator state in the long-lived
VM structure (`vgic_dist`).
Refactor the implementation to use the standard `seq_file` iterator.
Instead of storing state in `kvm_arch`, rely on the `pos` argument
passed to the `start` and `next` callbacks, which tracks the logical
index specific to the file descriptor.
This change enables concurrent access and eliminates the
`vgic_state_iter` field from `struct vgic_dist`.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202085721.3954942-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Sanitise SCTLR_EL2 the usual way. The most important aspect of
this is that we benefit from SCTLR_EL2.SPAN being RES1 when
HCR_EL2.E2H==0.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-20-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Seems that it was missed when MDCR_EL2 was first added to the trap
forwarding infrastructure. Add it back.
Fixes: cb31632c4452 ("KVM: arm64: nv: Add trap forwarding for MDCR_EL2")
Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260130094435.39942-1-zenghui.yu@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/gicv3-tdir-fixes:
: .
: Address two trapping-related issues when running legacy (i.e. GICv3)
: guests on GICv5 hosts, courtesy of Sascha Bischoff.
: .
KVM: arm64: Correct test for ICH_HCR_EL2_TDIR cap for GICv5 hosts
KVM: arm64: gic: Enable GICv3 CPUIF trapping on GICv5 hosts if required
Signed-off-by: Marc Zyngier <maz@kernel.org>
Now that there is support for creating a GICv5-based guest with KVM,
check that the hardware itself supports virtualisation, skipping the
setting of struct gic_kvm_info if not.
Note: If native GICv5 virt is not supported, then nor is
FEAT_GCIE_LEGACY, so we are able to skip altogether.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260128175919.3828384-33-sascha.bischoff@arm.com
[maz: cosmetic changes]
Signed-off-by: Marc Zyngier <maz@kernel.org>
The vgic-debug interface implementation uses XArray marks
(`LPI_XA_MARK_DEBUG_ITER`) to "snapshot" LPIs at the start of iteration.
This modifies global state for a read-only operation and complicates
reference counting, leading to leaks if iteration is aborted or fails.
Reimplement the iterator to use dynamic iteration logic:
- Remove `lpi_idx` from `struct vgic_state_iter`.
- Replace the XArray marking mechanism with dynamic iteration using
`xa_find_after(..., XA_PRESENT)`.
- Wrap XArray traversals in `rcu_read_lock()`/`rcu_read_unlock()` to
ensure safety against concurrent modifications (e.g., LPI unmapping).
- Handle potential races where an LPI is removed during iteration by
gracefully skipping it in `show()`, rather than warning.
- Remove the unused `LPI_XA_MARK_DEBUG_ITER` definition.
This simplifies the lifecycle management of the iterator and prevents
resource leaks associated with the marking mechanism, and paves the way
for using a standard seq_file iterator.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202085721.3954942-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
MIOCNCE had the potential to eat your data, and also was never
implemented by anyone. It's been retrospectively removed from
the architecture, and we're happy to follow that lead.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-19-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
According to section 13.2 of the DEN0077 FF-A specification, when
firmware does not support the requested version, it should reply with
FFA_RET_NOT_SUPPORTED(-1). Table 13.6 specifies the type of the error
code as int32.
Currently, the error checking logic compares the unsigned long return
value it got from the SMC layer, against a "-1" literal. This fails due
to a type mismatch: the literal is extended to 64 bits, whereas the
register contains only 32 bits of ones(0x00000000ffffffff).
Consequently, hyp_ffa_init misinterprets the "-1" return value as an
invalid FF-A version. This prevents pKVM initialization on devices where
FF-A is not supported in firmware.
Fix this by explicitly casting res.a0 to s32.
Signed-off-by: Kornel Dulęba <korneld@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251114-pkvm_init_noffa-v1-1-87a82e87c345@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/fwb-for-all:
: .
: Allow pKVM's host stage-2 mappings to use the Force Write Back version
: of the memory attributes by using the "pass-through' encoding.
:
: This avoids having two separate encodings for S2 on a given platform.
: .
KVM: arm64: Simplify PAGE_S2_MEMATTR
KVM: arm64: Kill KVM_PGTABLE_S2_NOFWB
KVM: arm64: Switch pKVM host S2 over to KVM_PGTABLE_S2_AS_S1
KVM: arm64: Add KVM_PGTABLE_S2_AS_S1 flag
arm64: Add MT_S2{,_FWB}_AS_S1 encodings
Signed-off-by: Marc Zyngier <maz@kernel.org>
The original order of checks in the ICH_HCR_EL2_TDIR test returned
with false early in the case where the native GICv3 CPUIF was not
present. The result was that on GICv5 hosts with legacy support -
which do not have the GICv3 CPUIF - the test always returned false.
Reshuffle the checks such that support for GICv5 legacy is checked
prior to checking for the native GICv3 CPUIF.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: 2a28810cbb8b2 ("KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping")
Link: https://patch.msgid.link/20251208152724.3637157-4-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Different GIC types require the private IRQs to be initialised
differently. GICv5 is the culprit as it supports both a different
number of private IRQs, and all of these are PPIs (there are no
SGIs). Moreover, as GICv5 uses the top bits of the interrupt ID to
encode the type, the intid also needs to computed differently.
Up until now, the GIC model has been set after initialising the
private IRQs for a VCPU. Move this earlier to ensure that the GIC
model is available when configuring the private IRQs. While we're at
it, also move the setting of the in_kernel flag and implementation
revision to keep them grouped together as before.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260128175919.3828384-7-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
The current implementation uses `idreg_debugfs_iter` in `struct
kvm_arch` to track the sequence position. This effectively makes the
iterator shared across all open file descriptors for the VM.
This approach has significant drawbacks:
- It enforces mutual exclusion, preventing concurrent reads of the
debugfs file (returning -EBUSY).
- It relies on storing transient iterator state in the long-lived VM
structure (`kvm_arch`).
- The use of `u8` for the iterator index imposes an implicit limit of
255 registers. While not currently exceeded, this is fragile against
future architectural growth. Switching to `loff_t` eliminates this
overflow risk.
Refactor the implementation to use the standard `seq_file` iterator.
Instead of storing state in `kvm_arch`, rely on the `pos` argument
passed to the `start` and `next` callbacks, which tracks the logical
index specific to the file descriptor.
This change enables concurrent access and eliminates the
`idreg_debugfs_iter` field from `struct kvm_arch`.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202085721.3954942-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
FEAT_TME has been dropped from the architecture. Retrospectively.
I'm sure someone is crying somewhere, but most of us won't.
Clean-up time.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-18-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Include <asm/stackpage/nvhe.h> for kvm_arm_hyp_stack_base
declaration which fixes the following sparse warning:
arch/arm64/kvm/arm.c:63:1: warning: symbol 'kvm_arm_hyp_stack_base' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://patch.msgid.link/20260112160413.603493-1-ben.dooks@codethink.co.uk
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/pkvm-no-mte:
: .
: pKVM updates preventing the host from using MTE-related system
: sysrem registers when the feature is disabled from the kernel
: command-line (arm64.nomte), courtesy of Fuad Taba.
:
: From the cover letter:
:
: "If MTE is supported by the hardware (and is enabled at EL3), it remains
: available to lower exception levels by default. Disabling it in the host
: kernel (e.g., via 'arm64.nomte') only stops the kernel from advertising
: the feature; it does not physically disable MTE in the hardware.
:
: The ability to disable MTE in the host kernel is used by some systems,
: such as Android, so that the physical memory otherwise used as tag
: storage can be used for other things (i.e. treated just like the rest of
: memory). In this scenario, a malicious host could still access tags in
: pages donated to a guest using MTE instructions (e.g., STG and LDG),
: bypassing the kernel's configuration."
: .
KVM: arm64: Use kvm_has_mte() in pKVM trap initialization
KVM: arm64: Inject UNDEF when accessing MTE sysregs with MTE disabled
KVM: arm64: Trap MTE access and discovery when MTE is disabled
KVM: arm64: Remove dead code resetting HCR_EL2 for pKVM
Signed-off-by: Marc Zyngier <maz@kernel.org>
Restore PAGE_S2_MEMATTR() to its former glory, keeping the use of
FWB as an implementation detail.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260123191637.715429-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Factor out the enable (and printing of) the GICv3 CPUIF traps from the
main GICv3 probe into a separate function. Call said function from the
GICv5 probe for legacy support, ensuring that any required GICv3 CPUIF
traps on GICv5 hosts will be correctly handled, rather than injecting
an undef into the guest.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://patch.msgid.link/20251208152724.3637157-3-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
The GICv5 architecture is dropping the ICC_HAPR_EL1 and ICV_HAPR_EL1
system registers. These registers were never added to the sysregs, but
the traps for them were.
Drop the trap bit from the ICH_HFGRTR_EL2 and make it Res1 as per the
upcoming GICv5 spec change. Additionally, update the EL2 setup code to
not attempt to set that bit.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260128175919.3828384-4-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Now that we embed the RESx bits in the register description, it becomes
easier to deal with registers that are simply not valid, as their
existence is not satisfied by the configuration (SCTLR2_ELx without
FEAT_SCTLR2, for example). Such registers essentially become RES0 for
any bit that wasn't already advertised as RESx.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-17-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Calculate the hypervisor's VA size only once to maintain consistency
between the memory layout and MMU initialization logic. Previously the
two would be inconsistent when the kernel is configured for less than
IDMAP_VA_BITS of VA space.
Signed-off-by: Petteri Kangaslampi <pekangas@google.com>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260113194409.2970324-2-pekangas@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/pkvm-features-6.20:
: .
: pKVM guest feature trapping fixes, courtesy of Fuad Tabba.
: .
KVM: arm64: Prevent host from managing timer offsets for protected VMs
KVM: arm64: Check whether a VM IOCTL is allowed in pKVM
KVM: arm64: Track KVM IOCTLs and their associated KVM caps
KVM: arm64: Do not allow KVM_CAP_ARM_MTE for any guest in pKVM
KVM: arm64: Include VM type when checking VM capabilities in pKVM
KVM: arm64: Introduce helper to calculate fault IPA offset
KVM: arm64: Fix MTE flag initialization for protected VMs
KVM: arm64: Fix Trace Buffer trap polarity for protected VMs
KVM: arm64: Fix Trace Buffer trapping for protected VMs
Signed-off-by: Marc Zyngier <maz@kernel.org>
When initializing HCR traps in protected mode, use kvm_has_mte() to
check for MTE support rather than kvm_has_feat(kvm, ID_AA64PFR1_EL1,
MTE, IMP).
kvm_has_mte() provides a more comprehensive check:
- kvm_has_feat() only checks if MTE is in the guest's ID register view
(i.e., what we advertise to the guest)
- kvm_has_mte() checks both system_supports_mte() AND whether
KVM_ARCH_FLAG_MTE_ENABLED is set for this VM instance
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260122112218.531948-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Nobody is using this flag anymore, so remove it. This allows
some cleanup by removing stage2_has_fwb(), which is can be replaced
by a direct check on the capability.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260123191637.715429-5-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
The VGIC-v3 code relied on hand-written definitions for the
ICH_VMCR_EL2 register. This register, and the associated fields, is
now generated as part of the sysreg framework. Move to using the
generated definitions instead of the hand-written ones.
There are no functional changes as part of this change.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260128175919.3828384-3-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull core entry fix from Borislav Petkov:
- Make sure clang inlines trivial local_irq_* helpers
* tag 'core_urgent_for_v6.19_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Always inline local_irq_{enable,disable}_exit_to_user()
We have now killed every occurrences of FIXED_VALUE, and we can therefore
drop the whole infrastructure. Good riddance.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-16-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
The value of FPEX32_EL2 has no effect on execution in AArch64 state, and
consequently there's no need for an ISB after writing to it in the hyp
code (which executes in AArch64 state). When performing an exception
return to AArch32 state, the exception return will provide the necessary
context synchronization event.
Remove the redundant ISB.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260106173707.3292074-4-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/feat_idst:
: .
: Add support for FEAT_IDST, allowing ID registers that are not implemented
: to be reported as a normal trap rather than as an UNDEF exception.
: .
KVM: arm64: selftests: Add a test for FEAT_IDST
KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome
KVM: arm64: pkvm: Add a generic synchronous exception injection primitive
KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE
KVM: arm64: Handle CSSIDR2_EL1 and SMIDR_EL1 in a generic way
KVM: arm64: Handle FEAT_IDST for sysregs without specific handlers
KVM: arm64: Add a generic synchronous exception injection primitive
KVM: arm64: Add trap routing for GMID_EL1
arm64: Repaint ID_AA64MMFR2_EL1.IDS description
Signed-off-by: Marc Zyngier <maz@kernel.org>
For protected VMs, the guest's timer offset state should not be
controlled by the host and must always run with a virtual counter offset
of 0. The existing timer logic allowed the host to set and manage the
timer counter offsets for protected VMs in certain cases.
Disable all host-side management of timer offsets for protected VMs by
adding checks in the relevant code paths.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251211104710.151771-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
When MTE hardware is present but disabled via software (`arm64.nomte` or
`CONFIG_ARM64_MTE=n`), the kernel clears `HCR_EL2.ATA` and sets
`HCR_EL2.TID5`, to prevent the use of MTE instructions.
Additionally, accesses to certain MTE system registers trap to EL2 with
exception class ESR_ELx_EC_SYS64. To emulate hardware without MTE (where
such accesses would cause an Undefined Instruction exception), inject
UNDEF into the host.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260122112218.531948-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Since we have the basics to use the S1 memory attributes as the
final ones with FWB, flip the host over to that when FWB is present.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260123191637.715429-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull pmdomain fixes from Ulf Hansson:
- mediatek: Fix spinlock recursion fix during probe
- imx: Fix reference count leak during probe
* tag 'pmdomain-v6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: imx: Fix reference count leak in imx_gpc_probe()
pmdomain: mtk-pm-domains: Fix spinlock recursion fix in probe
clang needs __always_inline instead of inline, even for tiny helpers.
This saves some cycles in system call fast path, and saves 195 bytes
on x86_64 build:
$ size vmlinux.before vmlinux.after
text data bss dec hex filename
34652814 22291961 5875180 62819955 3be8e73 vmlinux.before
34652619 22291961 5875180 62819760 3be8db0 vmlinux.after
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251204153127.1321824-1-edumazet@google.com
Now that we can link the RESx behaviour with the value of HCR_EL2.E2H,
we can trivially express the tautological constraint that makes E2H
a reserved value at all times.
Fun, isn't it?
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-15-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
There's a gap in the KVM_HOST_DATA_FLAG_* indices since the removal of
KVM_HOST_DATA_FLAG_HOST_SVE_ENABLED and
KVM_HOST_DATA_FLAG_HOST_SME_ENABLED in commits:
* 459f059be702 ("KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN")
* 407a99c4654e ("KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN")
Shuffle the indices to remove the gap, as Oliver requested at the time of
the removals:
https://lore.kernel.org/linux-arm-kernel/Z6qC4qn47ONfDCSH@linux.dev/
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20260106173707.3292074-3-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftests-6.20:
: .
: Some selftest fixes addressing page alignment issues as well as
: a bad MMU setup bug, courtesy of Fuad Tabba.
: .
KVM: selftests: Fix typos and stale comments in kvm_util
KVM: selftests: Move page_align() to shared header
KVM: riscv: selftests: Fix incorrect rounding in page_align()
KVM: arm64: selftests: Fix incorrect rounding in page_align()
KVM: arm64: selftests: Disable unused TTBR1_EL1 translations
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add a very basic test checking that FEAT_IDST actually works for
the {GMID,SMIDR,CSSIDR2}_EL1 registers.
Link: https://patch.msgid.link/20260108173233.2911955-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Certain VM IOCTLs are tied to specific VM features. Since pKVM does not
support all features, restrict which IOCTLs are allowed depending on
whether the associated feature is supported.
Use the existing VM capability check as the source of truth to whether
an IOCTL is allowed for a particular VM by mapping the IOCTLs with their
associated capabilities.
Suggested-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251211104710.151771-9-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
If MTE is not supported by the hardware, or is disabled in the kernel
configuration (`CONFIG_ARM64_MTE=n`) or command line (`arm64.nomte`),
the kernel stops advertising MTE to userspace and avoids using MTE
instructions. However, this is a software-level disable only.
When MTE hardware is present and enabled by EL3 firmware, leaving
`HCR_EL2.ATA` set allows the host to execute MTE instructions (STG, LDG,
etc.) and access allocation tags in physical memory.
Prevent this by clearing `HCR_EL2.ATA` when MTE is disabled. Remove it
from the `HCR_HOST_NVHE_FLAGS` default, and conditionally set it in
`cpu_prepare_hyp_mode()` only when `system_supports_mte()` returns true.
This causes MTE instructions to trap to EL2 when `HCR_EL2.ATA` is
cleared.
Additionally, set `HCR_EL2.TID5` when MTE is disabled. This traps reads
of `GMID_EL1` (Multiple tag transfer ID register) to EL2, preventing the
discovery of MTE parameters (such as tag block size) when the feature is
suppressed.
Early boot code in `head.S` temporarily keeps `HCR_ATA` set to avoid
special-casing initialization paths. This is safe because this code
executes before untrusted code runs and will clear `HCR_ATA` if MTE is
disabled.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260122112218.531948-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Plumb the MT_S2{,_FWB}_AS_S1 memory types into the KVM_S2_MEMATTR()
macro with a new KVM_PGTABLE_S2_AS_S1 flag.
Nobody selects it yet.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260123191637.715429-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull perf tool fixes and from Namhyung Kim:
- skip building BPF skeletons if libopenssl is missing
- a couple of test updates
- handle error cases of filename__read_build_id()
- support NVIDIA Olympus for ARM SPE profiling
- update tool headers to sync with the kernel
* tag 'perf-tools-fixes-for-v6.19-2026-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
tools build: Fix the common set of features test wrt libopenssl
tools headers: Sync syscall table with kernel sources
tools headers: Sync linux/socket.h with kernel sources
tools headers: Sync linux/gfp_types.h with kernel sources
tools headers: Sync arm64 headers with kernel sources
tools headers: Sync x86 headers with kernel sources
tools headers: Sync UAPI sound/asound.h with kernel sources
tools headers: Sync UAPI linux/mount.h with kernel sources
tools headers: Sync UAPI linux/fs.h with kernel sources
tools headers: Sync UAPI linux/fcntl.h with kernel sources
tools headers: Sync UAPI KVM headers with kernel sources
tools headers: Sync UAPI drm/drm.h with kernel sources
perf arm-spe: Add NVIDIA Olympus to neoverse list
tools headers arm64: Add NVIDIA Olympus part
perf tests top: Make the test exclusive
perf tests kvm: Avoid leaving perf.data.guest file around
perf symbol: Fix ENOENT case for filename__read_build_id
perf tools: Disable BPF skeleton if no libopenssl found
tools/build: Add a feature test for libopenssl
of_get_child_by_name() returns a node pointer with refcount incremented.
Use the __free() attribute to manage the pgc_node reference, ensuring
automatic of_node_put() cleanup when pgc_node goes out of scope.
This eliminates the need for explicit error handling paths and avoids
reference count leaks.
Fixes: 721cabf6c660 ("soc: imx: move PGC handling to a new GPC driver")
Cc: stable@vger.kernel.org
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
We currently have three versions of the ASID retrieval code, one
in the S1 walker, and two in the VNCR handling (although the last
two are limited to the EL2&0 translation regime).
Make this code common, and take this opportunity to also simplify
the code a bit while switching over to the TTBRx_EL1_ASID macro.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260225104718.14209-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
It appears that a !! became ! during a cleanup, resulting in inverted
logic when detecting if a host GICv5 implementation is capable of
virtualization.
Re-add the missing !, fixing the behaviour.
Fixes: 3227c3a89d65f ("irqchip/gic-v5: Check if impl is virt capable")
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://patch.msgid.link/20260225083130.3378490-1-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Commit 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is
not supported") added an early return to several functions in
arch/arm64/kvm/nested.c to prevent a UBSAN shift-out-of-bounds error
when accessing the pgt union for non-nested VMs.
However, this early return was inadvertently applied to
kvm_arch_flush_shadow_all() as well, causing it to skip the call to
kvm_uninit_stage2_mmu(kvm) for all non-nested VMs.
For pKVM, skipping this teardown means the host never unshares the
guest's memory with the EL2 hypervisor. When the host kernel later
recycles these leaked pages for a new VM, it attempts to re-share them.
The hypervisor correctly rejects this with -EPERM, triggering a host
WARN_ON and hanging the guest.
Fix this by dropping the early return from kvm_arch_flush_shadow_all().
The for-loop guarding the nested MMU cleanup already bounds itself when
nested_mmus_size == 0, allowing execution to proceed to
kvm_uninit_stage2_mmu() as intended.
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/60916cb6-f460-4751-b910-f63c58700ad0@sirena.org.uk/
Fixes: 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported")
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20260222083352.89503-1-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
pKVM tracks the memory that has been mapped into a guest in a
side data structure. Crucially, it uses it to find out whether
a page has already been mapped, and therefore refuses to map it
twice. So far, so good.
However, this very patch completely breaks non-4kB page support,
with guests being unable to boot. The most obvious symptom is that
we take the same fault repeatedly, and not making forward progress.
A quick investigation shows that this is because of the above
rejection code.
As it turns out, there are multiple issues at play:
- while the HPFAR_EL2 register gives you the faulting IPA minus
the bottom 12 bits, it will still give you the extra bits that
are part of the page offset for anything larger than 4kB,
even for a level-3 mapping
- pkvm_pgtable_stage2_map() assumes that the address passed as
a parameter is aligned to the size of the intended mapping
- the faulting address is only aligned for a non-page mapping
When the planets are suitably aligned (pun intended), the guest
faults on a page by accessing it past the bottom 4kB, and extra bits
get set in the HPFAR_EL2 register. If this results in a page mapping
(which is likely with large granule sizes), nothing aligns it further
down, and pkvm_mapping_iter_first() finds an intersection that
doesn't really exist. We assume this is a spurious fault and return
-EAGAIN. And again...
This doesn't hit outside of the protected code, as the page table
code always aligns the IPA down to a page boundary, hiding the issue
for everyone else.
Fix it by always forcing the alignment on vma_pagesize, irrespective
of the value of vma_pagesize.
Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings")
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://https://patch.msgid.link/20260222141000.3084258-1-maz@kernel.org
Cc: stable@vger.kernel.org
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct gic_kvm_info", but the returned type,
while matching, is const qualified. To get them exactly matching, just
use the dereferenced pointer for the sizeof().
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20260206223022.it.052-kees@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
The `sve_state` pointer in `hyp_vcpu->vcpu.arch` is initialized as a
hypervisor virtual address during vCPU initialization in
`pkvm_vcpu_init_sve()`.
`unpin_host_sve_state()` calls `kern_hyp_va()` on this address. Since
`kern_hyp_va()` is idempotent, it's not a bug. However, it is
unnecessary and potentially confusing. Remove the redundant conversion.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
In protected mode, the hypervisor maintains a separate instance of
the `kvm` structure for each VM. For non-protected VMs, this structure is
initialized from the host's `kvm` state.
Currently, `pkvm_init_features_from_host()` copies the
`KVM_ARCH_FLAG_ID_REGS_INITIALIZED` flag from the host without the
underlying `id_regs` data being initialized. This results in the
hypervisor seeing the flag as set while the ID registers remain zeroed.
Consequently, `kvm_has_feat()` checks at EL2 fail (return 0) for
non-protected VMs. This breaks logic that relies on feature detection,
such as `ctxt_has_tcrx()` for TCR2_EL1 support. As a result, certain
system registers (e.g., TCR2_EL1, PIR_EL1, POR_EL1) are not
saved/restored during the world switch, which could lead to state
corruption.
Fix this by explicitly copying the ID registers from the host `kvm` to
the hypervisor `kvm` for non-protected VMs during initialization, since
we trust the host with its non-protected guests' features. Also ensure
`KVM_ARCH_FLAG_ID_REGS_INITIALIZED` is cleared initially in
`pkvm_init_features_from_host` so that `vm_copy_id_regs` can properly
initialize them and set the flag once done.
Fixes: 41d6028e28bd ("KVM: arm64: Convert the SVE guest vcpu flag to a vm flag")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Although ID register sanitisation prevents guests from seeing the
feature, adding this check to the helper allows the compiler to entirely
eliminate S1POE-specific code paths (such as context switching POR_EL1)
when the host kernel is compiled without support (CONFIG_ARM64_POE is
disabled).
This aligns with the pattern used for other optional features like SVE
(kvm_has_sve()) and FPMR (kvm_has_fpmr()), ensuring no POE logic if the
host lacks support, regardless of the guest configuration state.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
When CONFIG_ARM64_POE is disabled, KVM does not save/restore POR_EL1.
However, ID_AA64MMFR3_EL1 sanitisation currently exposes the feature to
guests whenever the hardware supports it, ignoring the host kernel
configuration.
If a guest detects this feature and attempts to use it, the host will
fail to context-switch POR_EL1, potentially leading to state corruption.
Fix this by masking ID_AA64MMFR3_EL1.S1POE in the sanitised system
registers, preventing KVM from advertising the feature when the host
does not support it (i.e. system_supports_poe() is false).
Fixes: 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260213143815.1732675-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/misc-6.20:
: .
: Misc KVM/arm64 changes for 6.20
:
: - Trivial FPSIMD cleanups
:
: - Calculate hyp VA size only once, avoiding potential mapping issues when
: VA bits is smaller than expected
:
: - Silence sparse warning for the HYP stack base
:
: - Fix error checking when handling FFA_VERSION
:
: - Add missing trap configuration for DBGWCR15_EL1
:
: - Don't try to deal with nested S2 when NV isn't enabled for a guest
:
: - Various spelling fixes
: .
KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported
KVM: arm64: Fix various comments
KVM: arm64: nv: Add trap config for DBGWCR<15>_EL1
KVM: arm64: Fix error checking for FFA_VERSION
KVM: arm64: Fix missing <asm/stackpage/nvhe.h> include
KVM: arm64: Calculate hyp VA size only once
KVM: arm64: Remove ISB after writing FPEXC32_EL2
KVM: arm64: Shuffle KVM_HOST_DATA_FLAG_* indices
KVM: arm64: Fix comment in fpsimd_lazy_switch_to_host()
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/resx:
: .
: Add infrastructure to deal with the full gamut of RESx bits
: for NV. As a result, it is now possible to have the expected
: semantics for some bits such as SCTLR_EL2.SPAN.
: .
KVM: arm64: Add debugfs file dumping computed RESx values
KVM: arm64: Add sanitisation to SCTLR_EL2
KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
KVM: arm64: Remove all traces of FEAT_TME
KVM: arm64: Simplify handling of full register invalid constraint
KVM: arm64: Get rid of FIXED_VALUE altogether
KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
KVM: arm64: Move RESx into individual register descriptors
KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
KVM: arm64: Simplify FIXED_VALUE handling
KVM: arm64: Convert HCR_EL2.RW to AS_RES1
KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features
KVM: arm64: Allow RES1 bits to be inferred from configuration
KVM: arm64: Inherit RESx bits from FGT register descriptors
KVM: arm64: Extend unified RESx handling to runtime sanitisation
KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits
KVM: arm64: Introduce standalone FGU computing primitive
KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E}
arm64: Convert SCTLR_EL2 to sysreg infrastructure
Signed-off-by: Marc Zyngier <maz@kernel.org>
The NV stage-2 manipulation functions kvm_nested_s2_unmap(),
kvm_nested_s2_wp(), and others, are being called for any stage-2
manipulation regardless of whether nested virtualization is supported or
enabled for the VM.
For protected KVM (pKVM), `struct kvm_pgtable` uses the
`pkvm_mappings` member of the union. This member aliases `ia_bits`,
which is used by the non-protected NV code paths. Attempting to
read `pgt->ia_bits` in these functions results in treating
protected mapping pointers or state values as bit-shift amounts.
This triggers a UBSAN shift-out-of-bounds error:
UBSAN: shift-out-of-bounds in arch/arm64/kvm/nested.c:1127:34
shift exponent 174565952 is too large for 64-bit type 'unsigned long'
Call trace:
__ubsan_handle_shift_out_of_bounds+0x28c/0x2c0
kvm_nested_s2_unmap+0x228/0x248
kvm_arch_flush_shadow_memslot+0x98/0xc0
kvm_set_memslot+0x248/0xce0
Since pKVM and NV are mutually exclusive, prevent entry into these
NV handling functions if the VM has not allocated any nested MMUs
(i.e., `kvm->arch.nested_mmus_size` is 0).
Fixes: 7270cc9157f47 ("KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202152310.113467-1-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/debugfs-fixes:
: .
: Cleanup of the debugfs iterator, which are way more complicated
: than they ought to be, courtesy of Fuad Tabba. From the cover letter:
:
: "This series refactors the debugfs implementations for `idregs` and
: `vgic-state` to use standard `seq_file` iterator patterns.
:
: The existing implementations relied on storing iterator state within
: global VM structures (`kvm_arch` and `vgic_dist`). This approach
: prevented concurrent reads of the debugfs files (returning -EBUSY) and
: created improper dependencies between transient file operations and
: long-lived VM state."
: .
KVM: arm64: Use standard seq_file iterator for vgic-debug debugfs
KVM: arm64: Reimplement vgic-debug XArray iteration
KVM: arm64: Use standard seq_file iterator for idregs debugfs
Signed-off-by: Marc Zyngier <maz@kernel.org>
Computing RESx values is hard. Verifying that they are correct is
harder. Add a debugfs file called "resx" that will dump all the RESx
values for a given VM.
I found it useful, maybe you will too.
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-21-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/gicv5-prologue:
: .
: Prologue to GICv5 support, courtesy of Sascha Bischoff.
:
: This is preliminary work that sets the scene for the full-blow
: support.
: .
irqchip/gic-v5: Check if impl is virt capable
KVM: arm64: gic: Set vgic_model before initing private IRQs
arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1
KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2
Signed-off-by: Marc Zyngier <maz@kernel.org>
The current implementation uses `vgic_state_iter` in `struct
vgic_dist` to track the sequence position. This effectively makes the
iterator shared across all open file descriptors for the VM.
This approach has significant drawbacks:
- It enforces mutual exclusion, preventing concurrent reads of the
debugfs file (returning -EBUSY).
- It relies on storing transient iterator state in the long-lived
VM structure (`vgic_dist`).
Refactor the implementation to use the standard `seq_file` iterator.
Instead of storing state in `kvm_arch`, rely on the `pos` argument
passed to the `start` and `next` callbacks, which tracks the logical
index specific to the file descriptor.
This change enables concurrent access and eliminates the
`vgic_state_iter` field from `struct vgic_dist`.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202085721.3954942-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Sanitise SCTLR_EL2 the usual way. The most important aspect of
this is that we benefit from SCTLR_EL2.SPAN being RES1 when
HCR_EL2.E2H==0.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-20-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Seems that it was missed when MDCR_EL2 was first added to the trap
forwarding infrastructure. Add it back.
Fixes: cb31632c4452 ("KVM: arm64: nv: Add trap forwarding for MDCR_EL2")
Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260130094435.39942-1-zenghui.yu@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/gicv3-tdir-fixes:
: .
: Address two trapping-related issues when running legacy (i.e. GICv3)
: guests on GICv5 hosts, courtesy of Sascha Bischoff.
: .
KVM: arm64: Correct test for ICH_HCR_EL2_TDIR cap for GICv5 hosts
KVM: arm64: gic: Enable GICv3 CPUIF trapping on GICv5 hosts if required
Signed-off-by: Marc Zyngier <maz@kernel.org>
Now that there is support for creating a GICv5-based guest with KVM,
check that the hardware itself supports virtualisation, skipping the
setting of struct gic_kvm_info if not.
Note: If native GICv5 virt is not supported, then nor is
FEAT_GCIE_LEGACY, so we are able to skip altogether.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260128175919.3828384-33-sascha.bischoff@arm.com
[maz: cosmetic changes]
Signed-off-by: Marc Zyngier <maz@kernel.org>
The vgic-debug interface implementation uses XArray marks
(`LPI_XA_MARK_DEBUG_ITER`) to "snapshot" LPIs at the start of iteration.
This modifies global state for a read-only operation and complicates
reference counting, leading to leaks if iteration is aborted or fails.
Reimplement the iterator to use dynamic iteration logic:
- Remove `lpi_idx` from `struct vgic_state_iter`.
- Replace the XArray marking mechanism with dynamic iteration using
`xa_find_after(..., XA_PRESENT)`.
- Wrap XArray traversals in `rcu_read_lock()`/`rcu_read_unlock()` to
ensure safety against concurrent modifications (e.g., LPI unmapping).
- Handle potential races where an LPI is removed during iteration by
gracefully skipping it in `show()`, rather than warning.
- Remove the unused `LPI_XA_MARK_DEBUG_ITER` definition.
This simplifies the lifecycle management of the iterator and prevents
resource leaks associated with the marking mechanism, and paves the way
for using a standard seq_file iterator.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202085721.3954942-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
MIOCNCE had the potential to eat your data, and also was never
implemented by anyone. It's been retrospectively removed from
the architecture, and we're happy to follow that lead.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-19-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
According to section 13.2 of the DEN0077 FF-A specification, when
firmware does not support the requested version, it should reply with
FFA_RET_NOT_SUPPORTED(-1). Table 13.6 specifies the type of the error
code as int32.
Currently, the error checking logic compares the unsigned long return
value it got from the SMC layer, against a "-1" literal. This fails due
to a type mismatch: the literal is extended to 64 bits, whereas the
register contains only 32 bits of ones(0x00000000ffffffff).
Consequently, hyp_ffa_init misinterprets the "-1" return value as an
invalid FF-A version. This prevents pKVM initialization on devices where
FF-A is not supported in firmware.
Fix this by explicitly casting res.a0 to s32.
Signed-off-by: Kornel Dulęba <korneld@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251114-pkvm_init_noffa-v1-1-87a82e87c345@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/fwb-for-all:
: .
: Allow pKVM's host stage-2 mappings to use the Force Write Back version
: of the memory attributes by using the "pass-through' encoding.
:
: This avoids having two separate encodings for S2 on a given platform.
: .
KVM: arm64: Simplify PAGE_S2_MEMATTR
KVM: arm64: Kill KVM_PGTABLE_S2_NOFWB
KVM: arm64: Switch pKVM host S2 over to KVM_PGTABLE_S2_AS_S1
KVM: arm64: Add KVM_PGTABLE_S2_AS_S1 flag
arm64: Add MT_S2{,_FWB}_AS_S1 encodings
Signed-off-by: Marc Zyngier <maz@kernel.org>
The original order of checks in the ICH_HCR_EL2_TDIR test returned
with false early in the case where the native GICv3 CPUIF was not
present. The result was that on GICv5 hosts with legacy support -
which do not have the GICv3 CPUIF - the test always returned false.
Reshuffle the checks such that support for GICv5 legacy is checked
prior to checking for the native GICv3 CPUIF.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Fixes: 2a28810cbb8b2 ("KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping")
Link: https://patch.msgid.link/20251208152724.3637157-4-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Different GIC types require the private IRQs to be initialised
differently. GICv5 is the culprit as it supports both a different
number of private IRQs, and all of these are PPIs (there are no
SGIs). Moreover, as GICv5 uses the top bits of the interrupt ID to
encode the type, the intid also needs to computed differently.
Up until now, the GIC model has been set after initialising the
private IRQs for a VCPU. Move this earlier to ensure that the GIC
model is available when configuring the private IRQs. While we're at
it, also move the setting of the in_kernel flag and implementation
revision to keep them grouped together as before.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260128175919.3828384-7-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
The current implementation uses `idreg_debugfs_iter` in `struct
kvm_arch` to track the sequence position. This effectively makes the
iterator shared across all open file descriptors for the VM.
This approach has significant drawbacks:
- It enforces mutual exclusion, preventing concurrent reads of the
debugfs file (returning -EBUSY).
- It relies on storing transient iterator state in the long-lived VM
structure (`kvm_arch`).
- The use of `u8` for the iterator index imposes an implicit limit of
255 registers. While not currently exceeded, this is fragile against
future architectural growth. Switching to `loff_t` eliminates this
overflow risk.
Refactor the implementation to use the standard `seq_file` iterator.
Instead of storing state in `kvm_arch`, rely on the `pos` argument
passed to the `start` and `next` callbacks, which tracks the logical
index specific to the file descriptor.
This change enables concurrent access and eliminates the
`idreg_debugfs_iter` field from `struct kvm_arch`.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202085721.3954942-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
FEAT_TME has been dropped from the architecture. Retrospectively.
I'm sure someone is crying somewhere, but most of us won't.
Clean-up time.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-18-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Include <asm/stackpage/nvhe.h> for kvm_arm_hyp_stack_base
declaration which fixes the following sparse warning:
arch/arm64/kvm/arm.c:63:1: warning: symbol 'kvm_arm_hyp_stack_base' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://patch.msgid.link/20260112160413.603493-1-ben.dooks@codethink.co.uk
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/pkvm-no-mte:
: .
: pKVM updates preventing the host from using MTE-related system
: sysrem registers when the feature is disabled from the kernel
: command-line (arm64.nomte), courtesy of Fuad Taba.
:
: From the cover letter:
:
: "If MTE is supported by the hardware (and is enabled at EL3), it remains
: available to lower exception levels by default. Disabling it in the host
: kernel (e.g., via 'arm64.nomte') only stops the kernel from advertising
: the feature; it does not physically disable MTE in the hardware.
:
: The ability to disable MTE in the host kernel is used by some systems,
: such as Android, so that the physical memory otherwise used as tag
: storage can be used for other things (i.e. treated just like the rest of
: memory). In this scenario, a malicious host could still access tags in
: pages donated to a guest using MTE instructions (e.g., STG and LDG),
: bypassing the kernel's configuration."
: .
KVM: arm64: Use kvm_has_mte() in pKVM trap initialization
KVM: arm64: Inject UNDEF when accessing MTE sysregs with MTE disabled
KVM: arm64: Trap MTE access and discovery when MTE is disabled
KVM: arm64: Remove dead code resetting HCR_EL2 for pKVM
Signed-off-by: Marc Zyngier <maz@kernel.org>
Restore PAGE_S2_MEMATTR() to its former glory, keeping the use of
FWB as an implementation detail.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260123191637.715429-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Factor out the enable (and printing of) the GICv3 CPUIF traps from the
main GICv3 probe into a separate function. Call said function from the
GICv5 probe for legacy support, ensuring that any required GICv3 CPUIF
traps on GICv5 hosts will be correctly handled, rather than injecting
an undef into the guest.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://patch.msgid.link/20251208152724.3637157-3-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
The GICv5 architecture is dropping the ICC_HAPR_EL1 and ICV_HAPR_EL1
system registers. These registers were never added to the sysregs, but
the traps for them were.
Drop the trap bit from the ICH_HFGRTR_EL2 and make it Res1 as per the
upcoming GICv5 spec change. Additionally, update the EL2 setup code to
not attempt to set that bit.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260128175919.3828384-4-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Now that we embed the RESx bits in the register description, it becomes
easier to deal with registers that are simply not valid, as their
existence is not satisfied by the configuration (SCTLR2_ELx without
FEAT_SCTLR2, for example). Such registers essentially become RES0 for
any bit that wasn't already advertised as RESx.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-17-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Calculate the hypervisor's VA size only once to maintain consistency
between the memory layout and MMU initialization logic. Previously the
two would be inconsistent when the kernel is configured for less than
IDMAP_VA_BITS of VA space.
Signed-off-by: Petteri Kangaslampi <pekangas@google.com>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260113194409.2970324-2-pekangas@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/pkvm-features-6.20:
: .
: pKVM guest feature trapping fixes, courtesy of Fuad Tabba.
: .
KVM: arm64: Prevent host from managing timer offsets for protected VMs
KVM: arm64: Check whether a VM IOCTL is allowed in pKVM
KVM: arm64: Track KVM IOCTLs and their associated KVM caps
KVM: arm64: Do not allow KVM_CAP_ARM_MTE for any guest in pKVM
KVM: arm64: Include VM type when checking VM capabilities in pKVM
KVM: arm64: Introduce helper to calculate fault IPA offset
KVM: arm64: Fix MTE flag initialization for protected VMs
KVM: arm64: Fix Trace Buffer trap polarity for protected VMs
KVM: arm64: Fix Trace Buffer trapping for protected VMs
Signed-off-by: Marc Zyngier <maz@kernel.org>
When initializing HCR traps in protected mode, use kvm_has_mte() to
check for MTE support rather than kvm_has_feat(kvm, ID_AA64PFR1_EL1,
MTE, IMP).
kvm_has_mte() provides a more comprehensive check:
- kvm_has_feat() only checks if MTE is in the guest's ID register view
(i.e., what we advertise to the guest)
- kvm_has_mte() checks both system_supports_mte() AND whether
KVM_ARCH_FLAG_MTE_ENABLED is set for this VM instance
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260122112218.531948-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Nobody is using this flag anymore, so remove it. This allows
some cleanup by removing stage2_has_fwb(), which is can be replaced
by a direct check on the capability.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260123191637.715429-5-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
The VGIC-v3 code relied on hand-written definitions for the
ICH_VMCR_EL2 register. This register, and the associated fields, is
now generated as part of the sysreg framework. Move to using the
generated definitions instead of the hand-written ones.
There are no functional changes as part of this change.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260128175919.3828384-3-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
We have now killed every occurrences of FIXED_VALUE, and we can therefore
drop the whole infrastructure. Good riddance.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-16-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
The value of FPEX32_EL2 has no effect on execution in AArch64 state, and
consequently there's no need for an ISB after writing to it in the hyp
code (which executes in AArch64 state). When performing an exception
return to AArch32 state, the exception return will provide the necessary
context synchronization event.
Remove the redundant ISB.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260106173707.3292074-4-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/feat_idst:
: .
: Add support for FEAT_IDST, allowing ID registers that are not implemented
: to be reported as a normal trap rather than as an UNDEF exception.
: .
KVM: arm64: selftests: Add a test for FEAT_IDST
KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome
KVM: arm64: pkvm: Add a generic synchronous exception injection primitive
KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE
KVM: arm64: Handle CSSIDR2_EL1 and SMIDR_EL1 in a generic way
KVM: arm64: Handle FEAT_IDST for sysregs without specific handlers
KVM: arm64: Add a generic synchronous exception injection primitive
KVM: arm64: Add trap routing for GMID_EL1
arm64: Repaint ID_AA64MMFR2_EL1.IDS description
Signed-off-by: Marc Zyngier <maz@kernel.org>
For protected VMs, the guest's timer offset state should not be
controlled by the host and must always run with a virtual counter offset
of 0. The existing timer logic allowed the host to set and manage the
timer counter offsets for protected VMs in certain cases.
Disable all host-side management of timer offsets for protected VMs by
adding checks in the relevant code paths.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251211104710.151771-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
When MTE hardware is present but disabled via software (`arm64.nomte` or
`CONFIG_ARM64_MTE=n`), the kernel clears `HCR_EL2.ATA` and sets
`HCR_EL2.TID5`, to prevent the use of MTE instructions.
Additionally, accesses to certain MTE system registers trap to EL2 with
exception class ESR_ELx_EC_SYS64. To emulate hardware without MTE (where
such accesses would cause an Undefined Instruction exception), inject
UNDEF into the host.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260122112218.531948-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Since we have the basics to use the S1 memory attributes as the
final ones with FWB, flip the host over to that when FWB is present.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260123191637.715429-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull pmdomain fixes from Ulf Hansson:
- mediatek: Fix spinlock recursion fix during probe
- imx: Fix reference count leak during probe
* tag 'pmdomain-v6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: imx: Fix reference count leak in imx_gpc_probe()
pmdomain: mtk-pm-domains: Fix spinlock recursion fix in probe
clang needs __always_inline instead of inline, even for tiny helpers.
This saves some cycles in system call fast path, and saves 195 bytes
on x86_64 build:
$ size vmlinux.before vmlinux.after
text data bss dec hex filename
34652814 22291961 5875180 62819955 3be8e73 vmlinux.before
34652619 22291961 5875180 62819760 3be8db0 vmlinux.after
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251204153127.1321824-1-edumazet@google.com
Now that we can link the RESx behaviour with the value of HCR_EL2.E2H,
we can trivially express the tautological constraint that makes E2H
a reserved value at all times.
Fun, isn't it?
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-15-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
There's a gap in the KVM_HOST_DATA_FLAG_* indices since the removal of
KVM_HOST_DATA_FLAG_HOST_SVE_ENABLED and
KVM_HOST_DATA_FLAG_HOST_SME_ENABLED in commits:
* 459f059be702 ("KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN")
* 407a99c4654e ("KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN")
Shuffle the indices to remove the gap, as Oliver requested at the time of
the removals:
https://lore.kernel.org/linux-arm-kernel/Z6qC4qn47ONfDCSH@linux.dev/
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20260106173707.3292074-3-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftests-6.20:
: .
: Some selftest fixes addressing page alignment issues as well as
: a bad MMU setup bug, courtesy of Fuad Tabba.
: .
KVM: selftests: Fix typos and stale comments in kvm_util
KVM: selftests: Move page_align() to shared header
KVM: riscv: selftests: Fix incorrect rounding in page_align()
KVM: arm64: selftests: Fix incorrect rounding in page_align()
KVM: arm64: selftests: Disable unused TTBR1_EL1 translations
Signed-off-by: Marc Zyngier <maz@kernel.org>
Certain VM IOCTLs are tied to specific VM features. Since pKVM does not
support all features, restrict which IOCTLs are allowed depending on
whether the associated feature is supported.
Use the existing VM capability check as the source of truth to whether
an IOCTL is allowed for a particular VM by mapping the IOCTLs with their
associated capabilities.
Suggested-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251211104710.151771-9-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
If MTE is not supported by the hardware, or is disabled in the kernel
configuration (`CONFIG_ARM64_MTE=n`) or command line (`arm64.nomte`),
the kernel stops advertising MTE to userspace and avoids using MTE
instructions. However, this is a software-level disable only.
When MTE hardware is present and enabled by EL3 firmware, leaving
`HCR_EL2.ATA` set allows the host to execute MTE instructions (STG, LDG,
etc.) and access allocation tags in physical memory.
Prevent this by clearing `HCR_EL2.ATA` when MTE is disabled. Remove it
from the `HCR_HOST_NVHE_FLAGS` default, and conditionally set it in
`cpu_prepare_hyp_mode()` only when `system_supports_mte()` returns true.
This causes MTE instructions to trap to EL2 when `HCR_EL2.ATA` is
cleared.
Additionally, set `HCR_EL2.TID5` when MTE is disabled. This traps reads
of `GMID_EL1` (Multiple tag transfer ID register) to EL2, preventing the
discovery of MTE parameters (such as tag block size) when the feature is
suppressed.
Early boot code in `head.S` temporarily keeps `HCR_ATA` set to avoid
special-casing initialization paths. This is safe because this code
executes before untrusted code runs and will clear `HCR_ATA` if MTE is
disabled.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260122112218.531948-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Plumb the MT_S2{,_FWB}_AS_S1 memory types into the KVM_S2_MEMATTR()
macro with a new KVM_PGTABLE_S2_AS_S1 flag.
Nobody selects it yet.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260123191637.715429-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull perf tool fixes and from Namhyung Kim:
- skip building BPF skeletons if libopenssl is missing
- a couple of test updates
- handle error cases of filename__read_build_id()
- support NVIDIA Olympus for ARM SPE profiling
- update tool headers to sync with the kernel
* tag 'perf-tools-fixes-for-v6.19-2026-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
tools build: Fix the common set of features test wrt libopenssl
tools headers: Sync syscall table with kernel sources
tools headers: Sync linux/socket.h with kernel sources
tools headers: Sync linux/gfp_types.h with kernel sources
tools headers: Sync arm64 headers with kernel sources
tools headers: Sync x86 headers with kernel sources
tools headers: Sync UAPI sound/asound.h with kernel sources
tools headers: Sync UAPI linux/mount.h with kernel sources
tools headers: Sync UAPI linux/fs.h with kernel sources
tools headers: Sync UAPI linux/fcntl.h with kernel sources
tools headers: Sync UAPI KVM headers with kernel sources
tools headers: Sync UAPI drm/drm.h with kernel sources
perf arm-spe: Add NVIDIA Olympus to neoverse list
tools headers arm64: Add NVIDIA Olympus part
perf tests top: Make the test exclusive
perf tests kvm: Avoid leaving perf.data.guest file around
perf symbol: Fix ENOENT case for filename__read_build_id
perf tools: Disable BPF skeleton if no libopenssl found
tools/build: Add a feature test for libopenssl
of_get_child_by_name() returns a node pointer with refcount incremented.
Use the __free() attribute to manage the pgc_node reference, ensuring
automatic of_node_put() cleanup when pgc_node goes out of scope.
This eliminates the need for explicit error handling paths and avoids
reference count leaks.
Fixes: 721cabf6c660 ("soc: imx: move PGC handling to a new GPC driver")
Cc: stable@vger.kernel.org
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>