.. SPDX-License-Identifier: GPL-2.0

====================
Protected KVM (pKVM)
====================

**NOTE**: pKVM is currently an experimental, development feature and
subject to breaking changes as new isolation features are implemented.
Please reach out to the developers at kvmarm@lists.linux.dev if you have
any questions.

Overview
========

Booting a host kernel with '``kvm-arm.mode=protected``' enables
"Protected KVM" (pKVM). During boot, pKVM installs a stage-2 identity
map page-table for the host and uses it to isolate the hypervisor
running at EL2 from the rest of the host running at EL1/0.

pKVM permits creation of protected virtual machines (pVMs) by passing
the ``KVM_VM_TYPE_ARM_PROTECTED`` machine type identifier to the
``KVM_CREATE_VM`` ioctl(). The hypervisor isolates pVMs from the host by
unmapping pages from the stage-2 identity map as they are accessed by a
pVM. Hypercalls are provided for a pVM to share specific regions of its
IPA space back with the host, allowing for communication with the VMM.
A Linux guest must be configured with ``CONFIG_ARM_PKVM_GUEST=y`` in
order to issue these hypercalls.

See hypercalls.rst for more details.
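
As a rough illustration of the creation flow described above, a VMM might
request a protected VM along the following lines. This is a minimal
sketch rather than a reference implementation: ``pvm_machine_type()`` and
``create_vm()`` are hypothetical helper names, and
``KVM_VM_TYPE_ARM_PROTECTED`` is assumed to be provided by the installed
``<linux/kvm.h>`` when the headers are recent enough, so its value is
deliberately not hard-coded here.

.. code-block:: c

    #include <errno.h>
    #include <fcntl.h>
    #include <stdbool.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/kvm.h>

    /*
     * Hypothetical helper: report the protected machine type if the
     * installed UAPI headers define it. Returns false (leaving *type
     * untouched) when the headers do not know about pVMs.
     */
    static bool pvm_machine_type(unsigned long *type)
    {
    #ifdef KVM_VM_TYPE_ARM_PROTECTED
        *type = KVM_VM_TYPE_ARM_PROTECTED;
        return true;
    #else
        (void)type;
        return false;
    #endif
    }

    /*
     * Hypothetical helper: create a VM of the given machine type using
     * the KVM device node at kvm_path (normally "/dev/kvm").
     * Returns a VM file descriptor, or -errno on failure.
     */
    static int create_vm(const char *kvm_path, unsigned long type)
    {
        int kvm_fd, vm_fd, err;

        kvm_fd = open(kvm_path, O_RDWR | O_CLOEXEC);
        if (kvm_fd < 0)
            return -errno;

        vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, type);
        err = errno;    /* only meaningful if the ioctl failed */
        close(kvm_fd);

        return vm_fd < 0 ? -err : vm_fd;
    }

A caller would typically bail out early if ``pvm_machine_type()`` returns
false, and otherwise pass the reported type to ``create_vm()``. On a host
that was not booted with '``kvm-arm.mode=protected``', the ioctl() is
expected to fail.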

Isolation mechanisms
====================

pKVM relies on a number of mechanisms to isolate pVMs from the host:

CPU memory isolation
--------------------

Status: Isolation of anonymous memory and metadata pages.

Metadata pages (e.g. page-table pages and '``struct kvm_vcpu``' pages)
are donated from the host to the hypervisor during pVM creation and
are consequently unmapped from the stage-2 identity map until the pVM is
destroyed.

Similarly to regular KVM, pages are lazily mapped into the guest in
response to stage-2 page faults handled by the host. However, when
running a pVM, these pages are first pinned and then unmapped from the
stage-2 identity map as part of the donation procedure. This gives rise
to some user-visible differences when compared to non-protected VMs,
largely due to the lack of MMU notifiers:

* Memslots cannot be moved or deleted once the pVM has started running.
* Read-only memslots and dirty logging are not supported.
* With the exception of swap, file-backed pages cannot be mapped into a
  pVM.
* Donated pages are accounted against ``RLIMIT_MEMLOCK`` and so the VMM
  must have a sufficient resource limit or be granted ``CAP_IPC_LOCK``.
  The lack of a runtime reclaim mechanism means that memory locked for
  a pVM will remain locked until the pVM is destroyed.
* Changes to the VMM address space (e.g. a ``MAP_FIXED`` mmap() over a
  mapping associated with a memslot) are not reflected in the guest and
  may lead to loss of coherency.
* Accessing pVM memory that has not been shared back will result in the
  delivery of a SIGSEGV.
* If a system call accesses pVM memory that has not been shared back
  then it will either return ``-EFAULT`` or forcefully reclaim the
  memory pages. Reclaimed memory is zeroed by the hypervisor and a
  subsequent attempt to access it in the pVM will return ``-EFAULT``
  from the ``KVM_RUN`` ioctl().
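
The constraints above suggest that a VMM should back pVM memslots with
anonymous memory and check its locked-memory limit before populating the
guest. The following is a minimal sketch under those assumptions:
``memlock_limit_covers()`` and ``alloc_guest_memory()`` are hypothetical
helper names and not part of any KVM API, and the check is deliberately
simplistic in that it ignores both memory the process has already locked
and the effect of ``CAP_IPC_LOCK``.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/resource.h>

    /*
     * Hypothetical helper: does the soft RLIMIT_MEMLOCK limit cover
     * 'bytes' of guest memory? Returns false if the limit cannot be read.
     */
    static bool memlock_limit_covers(size_t bytes)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_MEMLOCK, &rl))
            return false;

        return rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= bytes;
    }

    /*
     * Hypothetical helper: allocate private anonymous memory suitable
     * for backing a memslot. Returns NULL on failure.
     */
    static void *alloc_guest_memory(size_t bytes)
    {
        void *mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        return mem == MAP_FAILED ? NULL : mem;
    }

A VMM would then register the returned buffer as a memslot, for example
with the ``KVM_SET_USER_MEMORY_REGION`` ioctl(), and avoid remapping that
range for the lifetime of the pVM.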

CPU state isolation
-------------------

Status: **Unimplemented.**

DMA isolation using an IOMMU
----------------------------

Status: **Unimplemented.**

Proxying of TrustZone services
------------------------------

Status: FF-A and PSCI calls from the host are proxied by the pKVM
hypervisor.

The FF-A proxy ensures that the host cannot share pVM or hypervisor
memory with TrustZone as part of a "confused deputy" attack.

The PSCI proxy ensures that CPUs always have the stage-2 identity map
installed when they are executing in the host.

Protected VM firmware (pvmfw)
-----------------------------

Status: **Unimplemented.**

Resources
=========

Quentin Perret's KVM Forum 2022 talk entitled "Protected KVM on arm64: A
technical deep dive" remains a good resource for learning more about
pKVM, despite some of the details having changed in the meantime:

https://www.youtube.com/watch?v=9npebeVFbFw